These AI agents kill zombies with angels, their creators say they can ‘think’
Well, that’s certainly one way to look at it.
A team of researchers from the Astera Institute, a nonprofit organization that’s “doing fundamental research in artificial general intelligence,” recently published a preprint research article that’s got everything.
AI agents? Check. Action, combat, and treasure? Check, check, and check. Unleashing hidden flocks of angels on unwitting hordes of zombies to prove the existence of artificial “thought?”
Check?
Artificial thought
A group called the “Obelisk Team” at Astera published the aforementioned paper, titled “Thinking agents for zero-shot generalization to qualitatively novel tasks,” on March 25, 2025.
While relatively short at seven total pages (before references and appendices), it manages to squeeze in a whole lot of information.
Essentially, the team wants a way to demonstrate that an AI agent can “think”: design a puzzle the agent has never been trained on and couldn’t have memorized, yet can solve using what appears to be fancy Markov chain reasoning trees.
Here’s a snippet from the paper:
“By ‘thinking,’ that is, by internally manipulating concepts and behaviors and evaluating likely outcomes, agents can tackle novel problems never encountered before, by recombining existing knowledge into new solutions. This ability is perhaps the hallmark of what we think of as truly “intelligent” behavior: it is highly prevalent in humans, but it is debated whether it even exists in non-human animals…”
And how, pray tell, does an agent demonstrate this incredibly human-like, advanced behavior?
By killing zombies, of course
The Obelisk team developed a grid-based game environment for the AI agent to play in. Inside the grids are walls, doors, cows, zombies, tools, other obstructions, hidden passages, and more zombies. If the agent touches a zombie, it gets “negative rewards,” and if it figures out how to survive a challenge, kill the zombies, or solve the test, it gets “positive rewards.”
The training environment used in the “Thinking agents for zero-shot generalization to qualitatively novel tasks” paper.
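To make that reward setup concrete, here’s a minimal sketch of how a grid world like this might score an agent’s moves. To be clear, this is our illustration, not the Obelisk Team’s code: the map, the entity letters, and the reward values are all assumptions.

```python
# Illustrative sketch only -- not the Obelisk Team's implementation.
# The map, entity letters, and reward values are our assumptions.

GRID = [
    "#########",
    "#A..Z..D#",   # A = agent, Z = zombie, D = door
    "#..###..#",
    "#..C..T.#",   # C = cow, T = tool
    "#########",
]

def step_reward(tile_reached: str, task_solved: bool) -> float:
    """Score a single move onto a tile."""
    if tile_reached == "Z":   # touching a zombie -> negative reward
        return -1.0
    if task_solved:           # surviving / solving the test -> positive reward
        return 1.0
    return 0.0                # every other move is neutral

print(step_reward("Z", task_solved=False))   # -1.0
print(step_reward(".", task_solved=True))    #  1.0
```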
The big idea behind this test is that certain combinations of environmental factors are withheld during training, so each generalization attempt presents the agent with a combination it has never seen. This ostensibly creates a novel task for each run. And, from what we can tell, it pretty much guarantees that the agent doesn’t already know how to solve a given task within the confines of the testing environment.
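In spirit, that protocol looks something like the sketch below: enumerate combinations of environmental factors, hold some out of training entirely, and test zero-shot on the withheld ones. The factor names here are placeholders we invented, not the paper’s actual task set.

```python
# Illustrative sketch of holding out factor combinations for zero-shot tests.
# The factors and values below are invented placeholders, not the paper's tasks.
from itertools import product

factors = {
    "enemy":    ["zombie", "none"],
    "obstacle": ["wall", "hidden_passage"],
    "item":     ["tool", "angel"],
}

all_tasks = list(product(*factors.values()))

# Withhold every task that pairs a zombie with an angel; the agent trains
# on the rest and is evaluated zero-shot on the never-seen combinations.
held_out = [t for t in all_tasks if t[0] == "zombie" and t[2] == "angel"]
training = [t for t in all_tasks if t not in held_out]

print(len(all_tasks), len(training), len(held_out))   # 8 6 2
```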
And that’s where things get a bit odd.
First things first: the paper explicitly states that the term “thinking” is used for lack of a better one. The authors acknowledge that “thinking” is a nebulous concept, and they don’t appear to be making empirical claims about the nature of thought.
That said, using the term in the title of the paper and to describe agent behavior is still troublesome.
If “thinking” can be described as using a set of pre-defined, hard-coded rules about an environment to extrapolate from what would ordinarily be considered “incomplete information,” then it’s hard to argue with the notion that the Obelisk Team’s agents are doing exactly that.
But, by that logic, it’s also hard to argue that AlphaZero wasn’t thinking. In fact, it would be hard to argue that any organism whose cells are capable of the division of labor (i.e., specialization, resource seeking, harm avoidance, etc.) isn’t “thinking” by this definition.
We’re not trying to mince words or debate semantics. But when the same word is used to describe what occurs inside the human mind during problem-solving, chatbots that respond to prompts twice, and an agent that can navigate an adventure game strongly resembling Exidy’s 1981 arcade title “Venture,” it seems like everyone is using it to mean something completely different.
Can AI think? Probably not. There’s certainly no evidence to indicate it can.
What really matters is whether it can actually solve real-world problems. And, so far, we’re not sure how well an agent’s ability to evade zombies and seek out angels will translate to the challenges faced by beings on this side of the computer screen.
We think the Obelisk Team at Astera is doing awesome work, and it was really cool to read their paper. But, much like the ARC-AGI-2 benchmark we talked about in yesterday’s news, trying to build a scaffold to prove that AI can “think” or has reached “human-level” intelligence may be an intractable challenge for the scientific method.
Read more: Google, OpenAI, Artificial Intelligence benchmarks, and AGI — Center for AGI Investigations
Art by Nicole Greene