It's that time of year again. As usual, we took a couple of weeks off and came south: a bit of Portugal, a bit of Italy, a lot of friends and family. My family has a long-running joke that I hate people and love my computer. That's not (entirely) true. I talked to lots of people this summer. But something felt off.
ChatGPT this, AI that, Copilot here, Gemini there. Nobody had a clue what they were talking about. I’m not an AI guru, but I have worked in the field for some years. People are not stupid; people are misinformed.
If you know what a Mixture-of-Experts or a KV cache is, some of these misconceptions might be surprising. If you don’t, then it’s time to clear some things up.
Large Language Models are connected to the internet
Large language models (LLMs) are not connected to the internet. A model is a zip file, a file you can even run on your own laptop.[^1] A file that compresses the whole internet it was trained on. But that training had a beginning and an end. The model knows nothing about what happened after it was trained. Ask a model, in its purest form, for today's news and it will refuse or make something up.
"But Duarte, when I ask ChatGPT today's news it knows". What ChatGPT - and many other apps - are doing is stuffing search results into the model context. ChatGPT used to have a Search button, remember? (I know my sister Cata was pissed I didn't tell her sooner). ChatGPT still has a "Search Web" button, it just now automatically detects if your question needs a web search and uses it. So you don’t have to think about it.
Large Language Models should cite their sources
Let’s say you are training your own LLM. You start by collecting the entire internet into a single Word document. You run a probabilistic model over it. The model starts learning. For example, whenever the model sees “The president of the USA is,” it always sees “Barack Obama” next to it. This is a large document; it has seen that pattern many times: on websites, Wikipedia, blogs, books, etc. And so it learns it.
Once training stops, you prompt the model: “The president of the USA is,” and it responds: “Barack Obama,” simply because that is the likeliest continuation. It sounds wrong today, and that’s exactly the point: the model only knows what it saw during training. How can we cite this process? It wasn’t a single website that made the LLM respond like that; it was the effect of the entire corpus. But what exactly has it seen, you might ask, and how much? Well, that’s where things get tricky.
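You can watch this mechanism work at toy scale: "training" is just counting which word follows a phrase, and "inference" is picking the most frequent one. A sketch with a made-up three-line corpus:

```python
from collections import Counter

# Toy "training corpus" - in reality this would be a scrape of the internet.
corpus = [
    "the president of the usa is barack obama",
    "the president of the usa is barack obama",
    "the president of the usa is a politician",
]

prefix = "the president of the usa is"

# "Training": count what comes right after the prefix, across the whole corpus.
next_words = Counter(
    line[len(prefix):].split()[0]
    for line in corpus
    if line.startswith(prefix) and line[len(prefix):].strip()
)

# "Inference": predict the most likely next word. No source, just statistics.
print(next_words.most_common(1)[0][0])  # -> "barack"
```

No single line of the corpus is "the source" of the answer; the prediction falls out of the counts over all of them. Real models learn much smoother statistics over billions of documents, which is why a clean citation usually doesn't exist.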
An ice-cream flavor generated by artificial intelligence
There are a lot of things that can be generated by AI. Some of them good, some of them bad, a lot of them useless. Just because you can do something with AI does not mean you should. Someone prompted ChatGPT for weird flavors of ice cream, posted them online, and got thousands of likes. We have a name for that. It’s called slop.
$$ \text{AI Generated} \neq \text{Good} $$
Half our feeds are low-quality AI content; the other half is filled with influencers who know that adding those two magic letters will triple their interactions. In summary: be skeptical of anything with “AI” slapped on it.
It automatically learns from what you tell it
My entire family thinks AI is some sort of all-powerful monster that learns continuously from whatever they tell it. Another myth. A model is an artifact that is stuck in time. For it to learn something new, you need to retrain it. Retraining is a long and expensive process. If you tell ChatGPT that your mom's name is Elsa and then create another conversation, the model itself does not know that.[^2]
I know it feels like it's continuously learning. AI providers use small "tricks" to make it feel that way. They write what you said to a file, then inject that file into the context when you start a conversation, effectively giving you the idea that it's learning even though it’s not. The model needs to be retrained to get new knowledge.
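That "trick" fits in a few lines. A sketch, assuming a hypothetical local `memory.json` file standing in for whatever store a real provider uses:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical per-user memory store

def load_facts() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def remember(fact: str) -> None:
    # Write what the user said to a file...
    facts = load_facts()
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts))

def build_system_prompt() -> str:
    # ...and inject that file into the context of every new conversation.
    memory = "\n".join(f"- {fact}" for fact in load_facts())
    return f"You are a helpful assistant. The user previously told you:\n{memory}"

remember("The user's mom is called Elsa.")
print(build_system_prompt())  # the model's weights never changed - only the prompt did
```

The "learning" lives in a file next to the model, not inside it.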
Are we in a bubble?
Of course we are, even Sam has said it. Many companies are pivoting overnight to AI, and getting crazy investments as a byproduct, much like the dot-com days. Researchers are getting paid millions of dollars. Influencers are getting crazy engagement just by mentioning AI. This is peak stupid.
But AI is also insanely useful. That has real effects. Just look at your everyday life now: are you using Google as much? Are you taking advantage of models in any way? I am pretty sure you are. These models are useful. These models are powerful. And the open-source ecosystem of models is genuinely growing and exciting. The whole world in your pocket? The whole internet’s knowledge without WiFi!
As usual, we are somewhere between the useful and the overhyped.
Humanity is doomed and technology sucks
Why are we investing money in going to Mars? Why are we inventing these new things that destroy creativity? Why are we subjecting our children to such a future? We should go back to pen and paper, we should stop this right now, we should invest only in our planet and nothing else. I understand: change is scary, and we are scared of the unknown. Especially when we are older and used to things in their normal state, whatever that is.
In a lot of ways though, it has never been more interesting to be alive. Code is easier, learning is easier, diagnosing is easier, boring things are easier. That is progress. But using all of those easier things to build something stupid has also never been easier. And that feels like the opposite of progress.
Final thoughts
I'm an optimist. Where a lot of people see doom, I see something interesting. That’s why I am in love with technology. When changes are this big, fear, uncertainty, and doubt prevail. Keep two things in mind: it’s worth understanding how AI works, at least to some extent, and remember that no one knows anything.
When the dust has settled, another thing will become clear: taste, creativity, and the will to do interesting things have never been more important. If I had used ChatGPT to write this entire post, it would probably suck. But if I use ChatGPT strategically, to get feedback on how to make the post better, it probably won't.
Like Vitto says: there are many fun things to do. Go do them.
[^1]: You should try LM Studio.
[^2]: Even if they use your data for training, that loop takes a while; it's not instant.