AI’s Bitter Lesson hits everyone different
There’s nothing you can think that can’t be thunk, scale is all you need.
Back in 2019, Canadian computer scientist Rich Sutton published a seminal article titled “The Bitter Lesson.” The point of the article, as we see it, was to urge the AI sector to focus its efforts on development methods that show the greatest potential to advance the field.
Ultimately, the bitter lesson boils down to not repeating the same mistake: trying to imbue machines with human knowledge instead of teaching them how to solve problems.
The phrase “how to solve problems” is our wording, not Sutton’s. We use it because, though Sutton advocates a paradigm of “taking advantage of general methods leveraging computation,” he appears to intrinsically marry solving the AI generalization problem with increasing computational resources without limit.
2019 was a long time ago
It’s been nearly six years since “The Bitter Lesson” was published, and it’s unclear whether the scaling argument still holds up. Firms such as OpenAI and xAI are clearly putting the bulk of their resources into expanding compute. Nvidia rode the resulting demand for its GPUs to one of the fastest periods of revenue growth in corporate history in 2023 and 2024.
But then there’s DeepSeek. Its models are comparable in capability to those made by OpenAI and xAI, and there’s a popular narrative going around that its flagship model cost less than $6 million to train.
OpenAI and xAI have sunk billions into their models. If it were possible to reach the same level of performance for less than 1 percent of the cost, that would be a huge deal. But it isn’t. As numerous reports have indicated since DeepSeek’s launch, the actual cost of development was likely well above a billion dollars.
Rather than get too deep into the weeds, a good analogy is: if you buy a whole pig to make a pound of bacon, the total cost of having bacon and eggs for breakfast isn’t equal to the average cost of a plate of bacon and eggs. It’s the cost of a whole pig plus the eggs and any other expense incurred in the cooking process.
And that’s not even counting the farmer’s costs. In this case, orgs such as OpenAI, Microsoft, Google, and Meta did a lot of the heavy lifting for DeepSeek’s arrival. That doesn’t make DeepSeek any less impressive. But it does call into question whether there’s a point at which no amount of additional compute resources will produce demonstrably better results.
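The pig-and-bacon accounting can be sketched in a few lines of arithmetic. All the numbers below are hypothetical, chosen only to illustrate how a headline “training run” figure can sit orders of magnitude below the true all-in cost; they are not DeepSeek’s actual books.

```python
def total_cost(final_run, prior_rnd, infrastructure):
    """All-in cost of producing a model: the final training run
    plus everything spent getting to it (hypothetical breakdown)."""
    return final_run + prior_rnd + infrastructure

reported = 6_000_000  # the widely quoted "training run" figure

# Hypothetical figures for the spending the headline number leaves out:
total = total_cost(
    final_run=6_000_000,           # the headline training run
    prior_rnd=500_000_000,         # failed runs, experiments, salaries
    infrastructure=700_000_000,    # acquiring and operating the GPU cluster
)

print(total)          # 1206000000
print(total / reported)  # ~201x the headline figure
```

The point isn’t the specific numbers; it’s that dividing out one successful run hides the whole pig.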
Real brains don’t wait
In the real world, intelligence doesn’t emerge, evolve, and adapt to score higher on benchmarks. Even at the individual level, our intellectual capabilities grow in tandem with our ability to interact with our environments.
Data is all you need?
The Bitter Lesson has been interpreted in many different ways, but one of the most interesting takes we’ve seen comes from The Transmitter, a neuroscience publication supported by the Simons Foundation.
In a March 26 article titled “Accepting ‘the bitter lesson’ and embracing the brain’s complexity,” they write:
“To gain insight into complex neural data, we must move toward a data-driven regime, training large models on vast amounts of information.”
A panel of nine experts opined on the path forward for neuroscience and AI. The consensus was that researchers should leverage state-of-the-art machine learning techniques to generate novel insights into organic brains.
The big idea seems to be that if we can scrape up all the “neural data” we have on brains and toss it into foundation models developed and tuned for the purpose, we might see gains in neuroscience similar to those we’ve seen in chemistry.
Our take: Heck yeah, that’s what AI is for. We aren’t sure if increasing the size of data troves in order to glean insight is exactly what Sutton meant, but this sure seems like the way forward for neuroscience.
When it comes to getting deeper insights into data, we agree that scale is all you need. If you have a question about chemical combinations or neural network permutations, and you need to conduct a data analysis that’s so complex it would take a million humans a million years to go through all the data, machine learning presents a path forward.
We don’t think large language models or chatbots are necessarily the way to do that, but whatever architecture works for the data is the right one to use.
However, if the goal of a given AI endeavor is to create an oracle, a “human-level” agent, or a general intelligence, then we disagree. You need a lot more than scale to do any of that.
To us, the problem of building the best AI to do a specific task is not comparable to the task of building an AI that can do any task or any task a human can do.
For example, if your goal was to drive from one end of France to the other in the shortest amount of time, you wouldn’t simply take a current car and scale its engine power to the limits of engineering.
A car that can travel 1,000 kph might be faster than a street-legal production car, but there are almost no circumstances under which it would be more efficient for the specific task at hand.
Art by Nicole Greene