Anthropic bends over backwards calling transformer work ‘thought’ in 2 new papers
They say good developers need to remain flexible, but this is quite the stretch.
What is “thought”? To hear Anthropic define it, it’s basically just whatever happens inside of an AI model’s black box.
In two new papers, Anthropic describes a new methodology by which developers are attempting to peek under the models’ hoods to discern exactly what’s occurring inside. We read the papers and found them enlightening.
But there’s nothing in them that persuades us that forcing an AI model to conduct “autocomplete” at scale is in any way related to “thinking.”
The first paper
Circuit Tracing: Revealing Computational Graphs in Language Models
The first paper describes a method for determining how a model chooses what to say next — large language models (LLMs) are often described as “autocomplete” tools at scale.
To accomplish this, the team built a secondary “replacement model” that allows them to replicate and observe certain neural network functions in response to prompting.
The result, according to the paper, allowed the team to develop a graphing system that traces the model’s “thought process.”
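For intuition, here’s a rough sketch of the replacement-model idea. This is not Anthropic’s code: the sizes are made up and the weights are untrained random matrices standing in for everything the real method learns from data. The shape of the trick is simply to swap one opaque layer for an encoder/decoder whose hidden units are meant to be individually inspectable features.

```python
# Sketch only: an untrained stand-in for the "replacement model" idea.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 16, 64              # hypothetical sizes, purely for illustration

# Stand-in for one layer's original MLP: a black box we can only run end to end.
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
def original_mlp(x):
    return np.maximum(x @ W1, 0) @ W2

# The "replacement" layer: its hidden units are meant to be sparse, nameable
# features. Here the weights are random; in the real method they are trained so
# that the replacement closely imitates the original layer's output.
W_enc = rng.normal(size=(d_model, n_features))
W_dec = rng.normal(size=(n_features, d_model))
def replacement_mlp(x):
    features = np.maximum(x @ W_enc, 0)   # which interpretable features fired
    return features @ W_dec, features

x = rng.normal(size=(1, d_model))         # one token's residual-stream vector
approx, features = replacement_mlp(x)

# In the real method the two outputs would nearly match after training; with the
# random weights above the gap is of course huge -- this is only a shape sketch.
print("gap between original and replacement:", float(np.linalg.norm(original_mlp(x) - approx)))
print("most active replacement features:   ", np.argsort(features[0])[-5:][::-1])
```

The payoff, once the replacement is trained, is that you can ask which features fired for a given input, something the original layer’s entangled neurons don’t let you read off directly.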
The work is incredible and we’re still poring over the paper. But, upon first reading, it’s pretty easy to dismiss the notion that these models can “think.” Anthropic takes great pains to describe the scale at which functions occur inside of these models, laying bare the fact that they’re doing the same trick all algorithms do. It’s just more impressive at scale.
While building a secondary network to graph “circuits” certainly provides new insight into what’s happening inside of that secondary network, the paper doesn’t make it clear how these evaluations could work at meaningful scale. Currently, they’re only locally interpretable, which seemingly means the resources necessary to graph and evaluate a single prompt’s path through the model would likely not translate globally.
In other words: If you measure a prompt once, there’s no indication that your measurements will apply to subsequent responses given to the exact same prompt by the exact same model.
Part of the problem, as the paper states, is that there is no practical method by which to observe these processes. Many of them might even involve “superposition,” where the network represents more features than it has neurons, so activity spreads across shifting “clusters” of neurons that may or may not repeat from one run to the next.
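A toy example helps show what “superposition” means in practice. Nothing below comes from Anthropic’s papers; the sizes and random directions are invented, and the point is only that a layer can carry more features than it has neurons, at the cost of every individual neuron’s reading becoming ambiguous.

```python
# Toy illustration of "superposition": a layer can represent more features than
# it has neurons by giving each feature its own direction in neuron-space.
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_features = 20, 100                    # more features than neurons
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse input: only 3 of the 100 features are "on" for this prompt.
active = rng.choice(n_features, size=3, replace=False)
neuron_activity = directions[active].sum(axis=0)   # what we'd observe in the layer

# Each neuron's value mixes contributions from all active features, and the same
# neuron also participates in many other features entirely -- which is why raw
# neuron readings are noisy evidence rather than a clean answer.
overlaps = directions @ neuron_activity            # noisy per-feature evidence
print("features actually active:", sorted(active.tolist()))
print("top-scoring candidates:  ", sorted(np.argsort(overlaps)[-3:].tolist()))
```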
Essentially, the more layers and neurons a neural network has, the more technically intractable the task of mapping the potential connections becomes.
If we start with a two-neuron network and keep scaling it up, we eventually end up with more possible activation patterns, or “thought” patterns, than there are atoms in our galaxy.
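That claim is easy to sanity-check with back-of-the-envelope arithmetic: even if each unit is treated as simply “on” or “off,” n units allow 2^n activation patterns, and a few hundred units already exceed a common ~10^68 order-of-magnitude estimate for the number of atoms in the Milky Way. Real models have vastly more than a few hundred units, and their activations aren’t binary.

```python
# Rough arithmetic only; 10**68 is an order-of-magnitude estimate for the
# number of atoms in the Milky Way, not a precise figure.
ATOMS_IN_GALAXY = 10**68

n = 1
while 2**n <= ATOMS_IN_GALAXY:
    n += 1
print(f"with just {n} on/off units, 2**{n} activation patterns exceed ~10^68")  # n == 226
```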
It bears mention that we still disagree with the notion that what’s occurring inside of these AI models could be considered “thought” by any measure.
The second paper
On the Biology of a Large Language Model
This one is a bit more nebulous. The premise of the second paper is that the graphing technique demonstrated in the first paper can provide deep insight into the circuits a model uses for “thinking.”
From where we’re sitting, it looks a lot like they’re trying to reverse engineer how a Plinko ball falls by tracing paths from its end position to its starting point.
This seems like a cool idea. But, to make that analogy work, we must imagine a Plinko game inside of an opaque machine. We see the ball go in, we see where it comes out, and we’re left guessing about what happens in between. If there are only a handful of potential paths, it’s quite easy to do the math. But, as we scale the number of paths and account for the “superposition” notion, it becomes nearly impossible to definitively determine which path was taken.
With each subsequent run, the circuits created through the Anthropic team’s observations are as likely to be ground truth as they are merely good guesses.
And, since the latent space they’re attempting to observe is presumably always larger than the sample space in which they’re able to take measurements, even simple circuits could be incorrect. What if, for example, processes involving the word “apple” can occur in multiple, nearly identical circuit spaces?
In another example, the team seems to believe that an AI model’s ability to generate novel, coherent, rhyming poetry is an indication that it “plans ahead.” It could simply be that the model runs more than one “autocomplete” sequence: one to determine the next word in a phrase, and another to determine the next rhyming word in the couplet.
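To make that alternative concrete, here is a purely hypothetical sketch of how a rhyming line could fall out of two ordinary scoring passes with no “planning” at all. The vocabulary, the crude rhyme check, and the score_next_word stand-in are all invented for illustration; a real model would supply the scores.

```python
# Hypothetical two-pass sketch: pick a rhyming end word, then autocomplete up to it.
import random

random.seed(0)

VOCAB = ["cat", "hat", "mat", "dog", "log", "fog", "sun", "fun", "run"]

def rhymes(a: str, b: str) -> bool:
    # Crude stand-in for a rhyme check: a shared two-letter ending.
    return a != b and a[-2:] == b[-2:]

def score_next_word(context: list[str], candidate: str) -> float:
    # Stand-in for a language model's "autocomplete" score; random here.
    return random.random()

def complete_couplet(first_line: list[str], line_len: int = 4) -> list[str]:
    last_word = first_line[-1]

    # Pass 1: choose the second line's ending word from rhyming candidates,
    # using the same kind of next-word scoring as everything else.
    rhyming = [w for w in VOCAB if rhymes(w, last_word)]
    end_word = max(rhyming, key=lambda w: score_next_word(first_line, w))

    # Pass 2: ordinary left-to-right autocomplete for the rest of the line.
    line = []
    for _ in range(line_len - 1):
        line.append(max(VOCAB, key=lambda w: score_next_word(first_line + line, w)))
    return line + [end_word]

print(complete_couplet(["the", "cat", "sat", "on", "the", "mat"]))
```

Whether anything like this happens inside a transformer is an open question; the point is only that a rhyming couplet doesn’t, by itself, force the “plans ahead” interpretation.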
But it seems a lot like AI developers have lost access to Occam’s Razor. Devs across multiple firms are seemingly very impressed with the speed and efficiency at which multiple passes at a prompt can be run. We remain unconvinced, for example, that ChatGPT or Gemini models are capable of “reasoning” or “thinking” simply because they output two responses to a prompt, with one supposedly showing their thinking.
Learn more: Google, OpenAI, Artificial Intelligence benchmarks, and AGI — Center for AGI Investigations
It’s hard to make the case that Anthropic’s models can think simply because they’re capable of completing more than one autocomplete task at a time.
That being said, if you strip away the hyperbole and the silly references to “biology” and “thinking,” the Anthropic team has clearly moved the needle on interpretability in the right direction. We may not be able to see what’s happening inside of the Plinko machine, but it’s important to develop techniques to understand why certain outputs meet or don’t meet our expectations.
Read more: AI’s Bitter Lesson hits everyone different — Center for AGI Investigations