Riff: LLMs are Software Diamonds

The making of a diamond is a repeatable, but naturally non-reproducible, process. The exact same input of carbon, subjected to the exact same configuration of pressure, temperature, forge, time, and process control, will never produce the exact same diamond twice. Once made, a diamond is unique. And once made, a diamond is forever.


[Image: Lt. Cdr. La Forge and Dr. Brahms. (image source: screenrant)]

Every Large Language Model is a spectacularly faceted diamond.

LLM training is a repeatable, but naturally non-reproducible process 1. The exact same corpus of data processed in the exact same data center on the exact same hardware and software configuration using the exact same training program will never produce identical LLMs across training runs.
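
To make that concrete, here is a minimal sketch, in plain Python, of one well-known contributor: floating-point addition is not associative, so parallel hardware that sums gradients in whatever order the scheduler happens to pick will accumulate slightly different numbers on every run, and those differences compound across billions of steps. (Purely illustrative; real training runs have many more sources of nondeterminism, such as kernel selection and data-loading order.)

```python
# Toy illustration: the same 100,000 "gradients" summed in two different
# orders give two (slightly) different totals, because float addition
# is not associative. No GPU required to see the effect.
import random

random.seed(0)
grads = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

left_to_right = sum(grads)      # one summation order

shuffled = grads[:]
random.shuffle(shuffled)
reordered = sum(shuffled)       # same numbers, different order

print(left_to_right == reordered)       # almost certainly False
print(abs(left_to_right - reordered))   # tiny, but nonzero -- and it compounds
```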

To continue the analogy, a diamond is a fixed relationship of carbon atoms (tokens). It will refract light in myriad bewildering and surprising ways. However, in its now-set crystalline form, the rearranged carbon atoms will always refract the exact same incident light into the exact same multi-spectral beauty every single time 2.

An LLM too, once created, is a pure function; an impossibly fine crystalline next-token computer. The spectral catalogue we make by studying an unadorned diamond in vacuum will only ever be a partial reflection of that untouched diamond. We can spend entire lifetimes and fail to catalogue all the rearrangements of incident light a single diamond can effect. 3
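
A hedged toy of what "pure function" means here: the bigram table below is a hypothetical stand-in for billions of frozen weights, and greedy decoding over it returns the same completion for the same prompt, every single time.

```python
# Toy sketch of the "crystalline next-token computer": once the weights are
# frozen, greedy decoding is a pure function -- same prompt in, same tokens
# out. FROZEN_WEIGHTS is an invented stand-in for a real model's parameters.
FROZEN_WEIGHTS = {
    "the": {"diamond": 0.6, "rock": 0.4},
    "diamond": {"is": 0.9, "refracts": 0.1},
    "is": {"forever": 1.0},
}

def next_token(token: str) -> str:
    """Pure function: no mutation, no hidden state, no randomness."""
    candidates = FROZEN_WEIGHTS.get(token, {})
    return max(candidates, key=candidates.get) if candidates else "<eos>"

def complete(prompt: str, max_tokens: int = 4) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        tokens.append(next_token(tokens[-1]))
        if tokens[-1] == "<eos>":
            break
    return " ".join(tokens)

print(complete("the diamond"))  # always: "the diamond is forever <eos>"
```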

Can an inert diamond be useful, beautiful, expensive, unique, surprising, delightful, and endlessly entertaining?

Of course it can. Industrially, personally, and aesthetically!

The degree of intelligence and aliveness is a whole other ballgame.

An unadorned LLM can "think" only to the extent that it is a calculating function; it is not itself a stateful process.

Thinking requires the thinker to mutate: record new information, alter its system state and capabilities, adapt and/or respond to the environment. The thought and the change in the thinker are concomitant processes.

This is why, to make diamond-based pattern generators behave, we must prefix and suffix dynamic optical middleware: gates, filters, polarisers, gravity, lenses, surrounding media, vacuum, etc. For more dynamism and new capabilities, we can compose various arrangements of such augmented diamond-based pattern generators.

In the case of LLMs, the LLM hosts, third-party orchestrators, and we the users provide the prefix and postfix middleware… our prompts and tacked-on labels and post-training "fine-tuning" and augmented memories and trampoline API calls etc… Our very minds are the middleware: modulating, interpreting, interrogating, and manipulating the ephemeral stateful mind-context in which an LLM-diamond is submerged and operates, by continually experimenting with, and fine-tuning, prompts and process (LLM chaining etc.). 4
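
A minimal sketch of that middleware idea, assuming a hypothetical `llm` stand-in for any frozen model: the prefix middleware rewrites the prompt on the way in, the postfix middleware mutates state on the way out, and the diamond in the middle stays pure.

```python
# Sketch: composing stateful middleware around a pure, frozen "LLM".
# All the statefulness (memory, system prompts) lives outside the model.
from typing import Callable

LLM = Callable[[str], str]

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API; pure in its input.
    return f"<completion of: {prompt!r}>"

def with_system_prompt(model: LLM, system: str) -> LLM:
    return lambda prompt: model(f"{system}\n\n{prompt}")  # prefix middleware

def with_memory(model: LLM, history: list[str]) -> LLM:
    def wrapped(prompt: str) -> str:
        reply = model("\n".join(history + [prompt]))  # prefix: replay past context
        history.extend([prompt, reply])               # postfix: mutate state OUTSIDE the model
        return reply
    return wrapped

history: list[str] = []
assistant = with_memory(with_system_prompt(llm, "You are terse."), history)
print(assistant("What is a diamond?"))
print(assistant("And an LLM?"))  # second call sees the first exchange via `history`
```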

A reader may object, saying these are human-centric criteria. Why are LLMs not "alive"?

As far as I'm concerned, we are already ubiquitously using reliable AI 5, and LLMs certainly are a kind of mind 6 on the contiguous spectrum of intelligence… surprising, unexpected, novel, shockingly useful…

Just not alive in the sense of even a virus… I find it difficult to equate a stunningly high degree of surprise with emergent behaviour. Even behaviour of the sort we can observe in the simplest arrangement of mechanical parts, governed by universal laws. The three-body problem eludes a general solution because of process… it is, generally, a dynamic system tending to chaos. The best we can do is calculate approximate solutions on a case-by-case basis (there is no one-size-fits-all solution).
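
A hedged numerical sketch of that sensitivity, with invented initial conditions and a deliberately crude integrator: two copies of a planar three-body system, differing by one part in a billion, are stepped forward and the gap between them is measured.

```python
# Two near-identical three-body systems, integrated case-by-case (the only
# way we can), drift apart: sensitive dependence on initial conditions.
import math

def step(pos, vel, dt=0.001, g=1.0, eps=1e-6):
    # Pairwise Newtonian gravity between three unit masses; `eps` is a tiny
    # softening term that keeps close encounters numerically sane.
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
            r3 = (dx * dx + dy * dy + eps) ** 1.5
            acc[i][0] += g * dx / r3
            acc[i][1] += g * dy / r3
    for i in range(3):
        vel[i][0] += acc[i][0] * dt
        vel[i][1] += acc[i][1] * dt
        pos[i][0] += vel[i][0] * dt
        pos[i][1] += vel[i][1] * dt

def run(nudge):
    pos = [[-1.0, 0.0], [1.0, 0.0], [0.0, 0.5 + nudge]]
    vel = [[0.0, -0.5], [0.0, 0.5], [0.5, 0.0]]
    for _ in range(20_000):
        step(pos, vel)
    return pos

a, b = run(0.0), run(1e-9)
# The initial nudge was 1e-9; after 20 simulated time units the gap is
# typically many orders of magnitude larger.
print(max(math.dist(p, q) for p, q in zip(a, b)))
```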

Meanwhile, LLM technology has been used to remarkably accelerate protein-fold computation, attacks on Fields Medal-level proofs, and other research-hard problems.

What does this mean?

  • Is it a property of the LLM? viz. that the LLM is itself an intelligent problem-solver in the anthropocentric sense, or at least the biological sense (spontaneously auto-introspecting, auto-adapting)?
  • Is it a property of those particular problem spaces? Were they always vulnerable to LLM-style brute-forcing; a sort of "permute through a search space until the output parts fit just so"?
  • Is it a property of those incredibly sophisticated users of LLM technology; veritable Geordi La Forges 7? Is it because they already know how to tell when parts fit: viz. when the completely novel-to-them solution satisfies an internally consistent system of axioms, and/or obeys well-understood rules and laws of chemistry and physics, and/or matches the long-term stable, convergent behaviour of biological evolution itself?

A pure function can only approximately model a process. A process itself is a live, non-fungible phenomenon. It is much more than the sum total of its model-as-equations or functions. Framed this way…

  • Rocks are more alive than diamonds, and contain more intelligence than diamonds do, because rocks mutate over time and encode within themselves a story of whatever was going on: did a river flow, and then become a desert, and then did a test nuclear bomb detonate nearby, which places the rock in Nevada, though now it gives mute company to its nameless, faceless kin in a wall far from its original home? 8 etc. etc. etc.
  • A fly-by-wire system is way smarter than human pilots at governing a control loop in an impossibly dynamic environment. It reconfigures the aircraft itself, compensating for atmospheric effects, passenger movement, and fuel sloshing about, to keep the whole system within the envelope of control. We can trust this evolution of the humble thermostat with our lives (a toy sketch of such a loop follows this list).
  • Aristotle's writing is dead because it's just inert data, same as the computed LLM. A snapshot of a moment frozen in time. Completely devoid of any subtext or context, which we must rely on other people to supply, to varying degrees of detail, completeness, fidelity / truth, etc. The writing and the LLM live through other real-world processes 9.
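
As promised a bullet ago, a toy sketch of the fly-by-wire point, reduced to its thermostat ancestry: measure, compare against a setpoint, actuate, repeat. Gains and disturbances here are invented for illustration; real flight control is redundant, certified, and vastly more sophisticated.

```python
# A closed control loop, the thermostat's descendant: a PD controller
# nudging a disturbed "pitch" back to the setpoint while gusts push back.
import random

random.seed(42)
setpoint = 0.0            # desired pitch angle, degrees
pitch = 5.0               # initial disturbance
kp, kd = 0.4, 0.2         # proportional and derivative gains (invented values)
prev_error = setpoint - pitch

for tick in range(20):
    error = setpoint - pitch
    command = kp * error + kd * (error - prev_error)   # PD control law
    prev_error = error
    gust = random.uniform(-0.2, 0.2)                   # the environment pushes back
    pitch += command + gust                            # actuate, then re-measure next tick
    print(f"t={tick:2d}  pitch={pitch:+.3f}")
```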

Speech patterns (text, audio, visual) generated by LLM tech simultaneously blow my mind and sober me up… They look surprisingly sensible to me, while making it painfully clear that we humans, as a collective, are boringly predictable creatures.

We are tribal, and each tribe and sub-tribe and sub-sub-tribe vocalises in distinct but relatively stable echo chambers. Echo chambers that are littered across the corpus of the Internet. Putting them all inside one giant search construct exposes us to thoughts we never had, but a hell of a lot of other people did. Alive, context-imbued thoughts whose very rough textual approximations were frozen in time, and then number-crunched into a pattern space of set ways of thinking, networked inside the forever-unchanging, pre-computed graph of probabilities inside the LLM.

We are also (predictably) surprised because we get to know what we didn't, because we didn't even know where to look, because we had no way to interpolate connections. What's going on with LLMs, from a user's point of view, is that they are search tech that autocompletes pattern spaces. LLM technology cannot extrapolate into holes outside the training set. It can interpolate novel connections within the training set. 10
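
A hedged sketch of that interpolate-versus-extrapolate distinction, using a polynomial fit as a stand-in for pattern-space autocomplete: inside the training range the fit fills gaps gracefully; outside it, the same model goes wild.

```python
# Fit a polynomial on noisy sin(2*pi*x) samples drawn from [0, 1], then
# query it inside and outside that patch. Interpolation is good;
# extrapolation blows up. Degree and noise level are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.05, 50)

model = np.poly1d(np.polyfit(x_train, y_train, deg=7))   # fit *inside* [0, 1]

x_in, x_out = 0.37, 2.0
print(f"inside  the patch: f({x_in}) = {model(x_in):+.3f}  "
      f"(true {np.sin(2 * np.pi * x_in):+.3f})")
print(f"outside the patch: f({x_out}) = {model(x_out):+.3e}  "
      f"(true {np.sin(2 * np.pi * x_out):+.3f})")
```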

Further, our degree of surprise and incredulity is heavily tainted by our overwhelmingly anthropocentric conditioning. We are perpetually amazed by the crazy stuff an octopus can do. Or the tool-making felicity of corvids.

Anyway, this has gone on long enough…

Let's just say I just finished reading SB Divya's Meru, and I wouldn't mind that future at all; one where alloys and megaconstructs keep us way too happy to shoot each other any more.

[Image: Soapboxing. (Credit: Culture Club and Getty Images. Image source: grunge.com)]

~Fin~