Promise in a broken mirror

Leafless tree branches against a pale sky with several small birds perched on the highest twigs.

Here I am, sitting in bed. It’s quiet, early Friday morning, just before setting off on holiday back to Latvia. I’m looking forward to enjoying the snow, nature, and a bit of time outdoors.

Anyway, back to the topic. Something is keeping me up this early on a day when I’d usually still be asleep, so I thought I’d write it down.

I’ve been quietly observing how LLMs have changed things since they first came out publicly. People close to this field generally understand how they work and how they should be used. However, I keep seeing cases of misuse. That made me stop and reflect.

I don’t think we explain clearly enough what LLMs actually are. Vague lines like “this might hallucinate” or “it might get things wrong” aren’t strong explanations and don’t do them justice. These systems are the human-made result of decades of research. During my time at university, I learned about and built a Markov chain text generator. It was nowhere near the sophistication of modern LLMs, but it still gave me a sense of what it feels like to see a machine produce readable text. Even Andrej Karpathy’s microgpt project is just a single file, around 200 lines of code, with no external dependencies, yet it can train and run a small GPT. So thinking of LLMs as all-knowing entities straight out of science fiction is simply wrong. LLMs aren’t unique. They’re built on longstanding concepts, repackaged for a competitive landscape.
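To make the Markov chain idea concrete, here is a minimal sketch of a word-level generator in Python. It isn’t the one I built at university; the function names, the toy corpus, and the order-2 context are all illustrative choices.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each sequence of `order` words to the words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=15):
    """Start at a random context, then repeatedly sample one of the observed next words."""
    key = random.choice(list(chain.keys()))
    output = list(key)
    for _ in range(length):
        followers = chain.get(key)
        if not followers:  # dead end: this context never continued in the corpus
            break
        output.append(random.choice(followers))
        key = tuple(output[-len(key):])
    return " ".join(output)

# Toy corpus purely for illustration.
corpus = "the quick brown fox jumps over the lazy dog and the quick brown cat sleeps"
print(generate(build_chain(corpus, order=2)))
```

The whole mechanism is just counting which words follow which context and sampling from those counts. Modern LLMs are vastly more capable, but the framing of predicting the next token from what came before belongs to the same family of ideas.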

They’re more like a shattered mirror. Thousands of tiny fragments, with quite a few pieces missing. You can sort of see your reflection, but it isn’t really you. LLMs are trained on huge datasets squeezed into billions of parameters, and that compression already implies limits: a fixed number of parameters can’t hold everything in the training data, so detail is inevitably lost. They are inherently incomplete and bound by physical constraints. They’re trained on past data yet respond in real time, which can blur our perception of what is real and what isn’t. Almost like an illusion we want to believe in.

Sometimes I find myself pondering how these massive datasets were sourced, and the ethics behind that process, since it has huge implications for the results an LLM can produce. Reporting such as a TorrentFreak piece about alleged access to large collections of pirated books for AI training raises important questions about consent and ownership. Biases are baked in, knowingly or not. Either way, it’s something I keep coming back to.

How do we measure their effectiveness? Through benchmarks designed by humans. But those are relative too, because as models improve, so do the tests. Nothing is constant. Today even the best models might score below 40% on something like Humanity’s Last Exam. Tomorrow they might not. But why does that even matter?

LLMs mimic humanity, but they are not human and never will be. That’s why I find the level of obsession around AI both fascinating and unsettling. It can feel like a kind of collective hype. In the end, though, it’s still technology: disruptive, yes, but still just a tool. People adopt tools and move on.

Much of the confusion comes from not really understanding what these systems are, how they work, or how they should be used. It’s about choosing the right tool for the job: you wouldn’t use a fork to hammer in a nail.

And once you see it clearly, it stops looking like magic and starts looking like a tool.