Why I Am Agnostic About AGI & ASI
I don't know if it's possible or what the timelines are, and neither does anyone else. So I don't tend to get worked up about it, unless someone is trying to control me with p(doom) scare stories.
I often get asked if I believe in imminent artificial general intelligence (AGI) or artificial superintelligence (ASI), and I always answer that I’m extremely skeptical that we’re close to such things. But that’s more of a “tribal signaling” answer that I give to plant a flag, and it doesn’t capture what I really think.
So here’s what I really think: Nobody can possibly know how close we are to AGI or ASI, especially not professional AI-knowers — in fact, AI nerds are the least well-equipped people to know how close we are or are not to AGI or ASI.
The short version of why I think people working in AI are the last ones you should listen to: their whole conception of “progress” in this field is dominated by a benchmark-based arms race dynamic that most informed observers in other domains would recognize as dysfunctional if they saw it in their own space.
In this post, I want to pull on this benchmarking/measurement thread, because if it’s pulled hard enough, the whole “imminent AGI/ASI disaster” sweater will unravel.
When a measurement becomes a target
Everybody knows Goodhart’s Law, which is popularly phrased as follows: "When a measure becomes a target, it ceases to be a good measure".
The insight behind this law is straightforward: When you take some benchmark meant to evaluate progress in some project, and you focus your entire project specifically on beating that benchmark, then the main bit of information the benchmark gives about the individual things being benchmarked is how good they are at beating the benchmark.
This problem has long been recognized in standardized testing, which is one reason tests like the SAT get revised every few years. When students are focused on the test itself, then the test measures how well they prepared for the test, instead of how well they prepared for college. (I do know the SAT is supposedly more robust to this kind of thing, but it’s still a factor.)
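To make the Goodhart dynamic concrete, here is a deliberately silly toy sketch in Python (the “models,” questions, and numbers are all made up; this is an illustration of the logic, not a claim about any real model or benchmark): a “model” that has simply memorized a published benchmark posts a perfect score on it and collapses on fresh questions, while a model with genuine but modest ability scores roughly the same on both. The leaderboard only ever shows the first column.

```python
import random

# A toy "public benchmark": a fixed, published set of question -> answer pairs.
public_benchmark = {f"public question {i}": f"answer {i}" for i in range(100)}
# Fresh questions nobody has published yet.
fresh_questions = {f"fresh question {i}": f"answer {i}" for i in range(100)}

def memorizer(question):
    """A 'model' that has seen the public benchmark and memorized it outright."""
    return public_benchmark.get(question, "no idea")

def generalizer(question):
    """A 'model' with genuine but modest ability: right about 70% of the time, anywhere."""
    index = question.split()[-1]              # toy convention: the answer echoes the index
    return f"answer {index}" if random.random() < 0.7 else "wrong"

def score(model, dataset):
    return sum(model(q) == a for q, a in dataset.items()) / len(dataset)

random.seed(0)
for name, model in [("memorizer", memorizer), ("generalizer", generalizer)]:
    print(f"{name:12s}  public benchmark: {score(model, public_benchmark):.2f}"
          f"  fresh questions: {score(model, fresh_questions):.2f}")
# The memorizer posts a perfect score on the public benchmark and collapses on
# fresh questions; the generalizer scores roughly the same modest number on both.
# A leaderboard built on the public benchmark can't tell you which one you have.
```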
This is also a problem I encountered frequently in my former life as an editor and publisher of PC hardware reviews — microprocessors, gaming graphics cards, and the like. There was a constant problem of hardware makers hacking and tweaking their products to blow the doors off some synthetic benchmark suite so that they’d have the longest or shortest bar in a graph in a Tom’s Hardware Guide bakeoff. This led to an arms race between the benchmark makers and the hardware makers, where the former were always trying to make their benchmarks realistic and relevant, while the latter were actively trying to make those same benchmarks as unrepresentative as possible in order to capture market share by any and all means.
So I’ve lived this Goodhart dynamic in my earlier career in tech journalism, and I know exactly how the game works when there are hundreds of billions of dollars in earnings, and literally the NASDAQ’s performance, riding on benchmark results.
Imagine my (utter lack of) surprise to see this same dynamic take hold of the AI industry. Every day, I watch the following Types of Guy post benchmark results that show LLMs going straight up and to the right in performance:
Founders who have untold billions at stake in a contract clause that triggers when they reach AGI. (Ok, maybe just one founder. And his investors and assorted dependents and hangers on.)
Patriots who want to warn that America is behind in the race to AGI.
Mercenaries (investors, engineers) who are arm-in-arm with the aforementioned patriots, because they want to dip their beak in some of the government money that will flow towards efforts to beat China to AGI.
X-risk Doomers who sincerely believe that AGI will kill us all, therefore we need to enact shockingly illiberal, draconian global measures (which they have come up with and would be in charge of) to stop it.
E/accs who sincerely believe that AGI will usher in a literal post-biological utopia, free of suffering and death, and the rest of Samsara.
Threadbois who farm engagement.
I’m sure I’ve missed a few with this list, but you get the idea. There is a lot at stake in “AI number go up” discourse, both financially and spiritually (i.e., national pride, the hope of summoning all-powerful daemons, or the dread of failing to prevent the summoning of all-powerful demons).
The focus of all of this intense financial and spiritual pressure is a suite of benchmarks: a mix of tests we already use to measure humans and benchmarks we’ve invented specifically to measure LLMs.
If you just take a step back and think about how the world works, and what happens when there’s this much at stake in beating a specific set of concrete benchmarks, then you can very easily back into the fact that the benchmark numbers will absolutely, definitely keep going up, regardless of whether these benchmarks are actually measuring anything useful anymore.
It also turns out that when you look very closely at the benchmarks and how they were developed, you find that not only are they targets with a lot of money at stake in them being hit, but many of them have been developed specifically as targets.
The Large Language Model Benchmark Survey
A recent paper by a group of Chinese researchers surveys the AI benchmark landscape and provides a historical overview of AI benchmark development. The paper paints a broad picture of how, at this point, the AI benchmark situation has gone from a basic “the measurements became targets” to a state of “we’re publishing these targets and calling them ‘benchmarks’”.
For example, this passage from the section on the evolution of knowledge evaluations is representative:
This established a rigorous standard and catalyzed an arms race in both model development and benchmark design. In response to emergent model saturation on MMLU, subsequent benchmarks have pushed the frontiers of difficulty and scope. For instance, MMLU-Pro raised the adversarial bar by increasing the number of choices and the proportion of reasoning-intensive questions. Concurrently, benchmarks like GPQA were designed by domain experts to be “Google-Proof,” directly addressing the challenge of models retrieving answers from web search rather than relying on internalized knowledge, while SuperGPQA further escalated the challenge into hundreds of highly specialized, graduate-level domains. This evolutionary arc reflects a continuous effort to create evaluations that remain challenging for even the most capable models.
What’s going on here, with the models and benchmarks co-evolving in response to one another under an arms race dynamic, follows from the fact that “intelligence” is highly contested, ill-defined, and difficult to identify positively across an essentially infinite number of domains.
So a solution presents itself: We’ll identify some very narrow thing that humans can do — coreference resolution in linguistics, solving a specific suite of toy logic problems, constructing certain types of mathematical proofs, etc. — and define a measurable target based on that thing so that everyone can now try to hit this new target.
Then, once we hit that target, we can congratulate ourselves that we are closer to artificial “intelligence” and then set about formulating a new target that, once hit, will bring us even closer.
A wonderful, recent example of this piecemeal, target-to-target-based approach is Francois Chollet’s ARC Prize, which has gone through a few predecessor versions and is now literally a set of targets that, if you hit them, you can say you’ve achieved AGI and you win a bunch of money.
Again, this is subtly but crucially different from the more mundane sense of benchmarks as tools for discriminating between multiple different approaches to determine which one is the best for some intended, non-benchmark-beating purpose.
The sense I have of the folk wisdom that’s emerging here — I say “folk wisdom” because it seems to be more than a collective intuition, but less than a rigorous set of claims — is that the real “scale” in the scale-based approach to AGI is not so much increases in parameter count or training token count, but something like this: we keep scaling up the number of discrete “intelligent” things these models can do (which humans can already do), until we cross some threshold at which they’re somehow general enough to take over and begin acquiring new capabilities on their own in the same way we humans do both individually and collectively.
In other words, we’re still looking for a magic phase change in some future scale regime, where the basic material we’re working with goes from being one kind of thing to being another kind of thing with different properties.
This isn’t an unreasonable expectation. It’s pretty clear there was exactly this kind of phase change somewhere between about GPT-2 and GPT-3.5. So the plan is that we will get to yet another one, soon. And then perhaps another one or two after that.
To this plan, I say: Maybe, or maybe not.
The case for agnosticism
At the beginning of this post, I was careful in my phrasing of what I really think. I specifically recommended not skepticism but agnosticism on the AGI/ASI question.
(Note: To nuance my take a bit further, I do actually recommend extreme skepticism of the motives of any people who are pushing a baldly authoritarian agenda in the name of some existential threat. But this is different from agnosticism about the threat itself. This is a hard line to walk, but it’s important.)
Agnosticism about a supposed existential threat is a hard sell. It’s too energizing and useful to get yourself and your peers worked up into a full lather about a Big Bad on the horizon. Such threats breed unity, as enemies make common cause with one another, and everyone has a clear, singular purpose. Believe me, I know how this goes first-hand… but I don’t want to re-litigate the COVID response, and what I did right and wrong, right now (I’ll do that someday, though).
So I’ll make a few narrow points to hopefully convince you that agnosticism on the X-risk threat of AGI is rational and warranted.
When you’re sounding the alarm about AGI/ASI wiping out humanity, you’re dealing with at least three unknowns that are arranged in a sequence, where the second depends on the first, and the third depends on the first and second:
Whether AGI/ASI is even possible at all.
The timing of our encounter with it, assuming it’s possible.
The inevitable nature of ASI as threatening or benevolent, assuming it’s possible and we actually encounter it before the heat death of the universe.
There are some other sci-fi unknowns I could add to the list, like a #4 that says, “The existence (or not) of benevolent ASIs that are watching over us and preventing the threatening version of it from taking us out.”
You probably read my candidate for #4 and thought, “Ok, now he’s just making stuff up.” You’re right, I am just making stuff up, and in fact, the only thing that separates X-risk AI doomers from “just making stuff up” is the presence of benchmarks and timelines in their stories about the apocalypse. But these Goodhartified benchmark charts and timelines are all cosmetic and still amount to “just making stuff up, but with statistics.”
Anyway, let me unpack the three points above, very briefly.
Is AGI/ASI possible?
I think AGI is certainly possible, because it just boils down to an electronic version of a thing that already exists in the universe, i.e., human intelligence. I want to leave aside the implications of that for now and talk about ASI, instead.
My intuition about intelligence is that it’s not an exponential but a sigmoid curve. I wrote about this already, so I’m just going to link it and not repeat it. I think it’s likely that humans are near the top end of this curve, and that we’re not likely to make anything that relates to us the way we do to a dog or an ant.
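One way to see why the “number go up” charts can’t settle this is a quick toy sketch (arbitrary units and made-up constants, not a model of anything real): while a logistic curve is still far below its ceiling, it is well approximated by a pure exponential, so data collected in that regime can’t tell you which curve you’re on.

```python
import numpy as np

# Toy curves in arbitrary units, just to show the shape argument.
L, k, t0 = 12.0, 0.8, 5.5                       # ceiling, steepness, midpoint (made up)
t = np.linspace(0, 10, 11)

sigmoid = L / (1 + np.exp(-k * (t - t0)))       # logistic curve with a ceiling at L
exponential = L * np.exp(k * (t - t0))          # the logistic's own early-time approximation

for ti, s, e in zip(t, sigmoid, exponential):
    print(f"t={ti:4.1f}  sigmoid={s:7.2f}  exponential={e:9.2f}  ratio={s / e:5.2f}")
# While the sigmoid is still a small fraction of its ceiling, the ratio stays close
# to 1: from inside that regime, the data alone can't tell you which curve you're on.
# The curves only part ways near the top of the S, the region we haven't observed.
```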
The doomer counter to my instinct is, of course, to cite benchmarks showing number go up. But again, you know what I think of that.
To sum up:
I am certain that AGI is theoretically possible, because the “GI” part already exists.
I strongly lean toward thinking ASI is theoretically impossible — at least in the strong Yudkowskian sense of “godlike, and as far from us as we are from insects”. I’m reasonably sure it’s possible to be smarter than humans, but until I see an existence proof of something much smarter than us, I’ll continue to assume a sigmoid that we’re in the upper bend of.
The timing of AGI
This timing issue is the main place I’m at odds with both the daemon-worshipper and demon-fighter versions of AGI/ASI maximalism. I just don’t think we have any way of knowing how near or far we are from such a moment. And I think we don’t know because the whole AI effort is one long run of experimentation and surprises.
The fact that neural networks can create coherent, useful, human-intelligible sequences of words, computer code, genomes, pixels, sound waves, Go moves, etc., has come as a huge surprise to even the most informed observers.
It’s not that the functioning of these things is mysterious — what’s mysterious is that these simple, strange little constructions can be made to work as well as they do for certain types of applications. As David Chapman puts it:
It is widely acknowledged in the field that it is mysterious why backprop works at all, even with all this tweaking. It’s easy to understand why gradient descent works in the abstract. It’s not easy to understand why overparameterized function approximation doesn’t overfit. It’s not easy to understand how enough error signal gets propagated back through a densely-connected non-linear deep network without getting smeared into meaninglessness. These are scientifically interesting questions. Investigation may lead to insights that could—ideally—help design a better replacement technology.
In current practice, however, getting backprop to work depends on hyperparameter search to tweak each epicyclic modification just right. Each modification to the algorithm has an understandable explanation abstractly, but none does the job individually, and it’s not easy to understand why they work well enough in combination—when they do.
If it seems likely that the resulting system would have unpredictable properties and fragile performance… that is usually the case.
The unpredictability and fragility that Chapman describes are painfully, constantly apparent to anyone who works closely with these systems on a day-to-day basis. It takes a ton of benchmarks to paper over all this and shape it into a steady, uniform “scientific progress” narrative.
We are literally just poking at these mathematical objects, guided by our hunches and intuitions, and seeing how they respond. Then, based on those responses, we’re trying to back into a provisional model of an underlying “intelligence” that has some predictive power. We are very, very early in these efforts.
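For a tiny, concrete taste of the fragility Chapman is describing, here is a toy sketch: plain gradient descent on a made-up linear regression problem, with arbitrary numbers that have nothing to do with real LLM training. The same model, data, and code either barely learns, converges nicely, or blows up depending on a single hyperparameter, and nothing in the setup tells you the good value in advance; you find it by poking.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                  # made-up data: 200 samples, 10 features
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=200)     # targets with a little noise

def train(lr, steps=50):
    """Plain gradient descent on mean-squared error for a linear model."""
    w = np.zeros(10)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

for lr in (0.001, 0.05, 2.0):
    print(f"learning rate {lr}: final loss {train(lr):.4g}")
# Too small: the model barely moves within the step budget. In the sweet spot:
# it converges close to the noise floor. Too large: the iteration diverges and the
# loss explodes. Nothing in the setup tells you the sweet spot; you find it by poking.
```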
There are roughly three ways you can interpret this “poke at it, and hope for a little surprise” approach to “progress” in AI:
Dark forest: Like in “The Three-Body Problem”, we should stop sending out probes into new regions of darkest latent space, lest we awaken an extremely unpleasant surprise in the form of a Shoggoth.
Penetration testing: We should keep port-spamming the numinous via AI and psychedelics, in the hopes that we’ll uncover a backdoor to reality that lets us get root.
Same old technological progress: Every new thing we discover how to do with a neural network is just that, and only exactly that — a new thing we can do with a neural network. It’s not a milestone on some journey towards a Shoggoth or root access, but just another cool thing we can do now that will hopefully unlock other cool things we can do later. And because all this is a series of surprises, there’s no way to know what’s next.
If you haven’t guessed, I’m in camp #3. In my view, all of us in AI are just doing some engineering with a new toolset, trying to discover what we can and can’t build right now. Sure, you can arrange a series of technological surprises into a classic tech tree, but tech trees are always a hindsight construction that we impose on past innovations, and not a real thing we actually navigate our way along. “Progress” is only directional when you’re looking backward at it — it’s a social and hermeneutic reality, not a natural law.
Contrast this to people in AI who are in either the doomer camp or the accelerationist camp, and who have convinced themselves and many others that they are the very ones chosen to lead humanity through the final phase of some technological journey into either the abyss of extinction or the glory of a post-human heaven.
The inevitably threatening nature of ASI
Once you have convinced yourself, contrary to all available evidence, that “intelligence” is a quantifiable thing that exists as an exponential in nature, and you have also convinced yourself that each random thing we figure out how to do with neural nets is in fact a definite milestone on some larger path with a clear direction of “up and to the right” on that same exponential, then to my mind you are well into sci-fi fantasy land and should believe whatever you like from there on out. If you’re a pessimist, then by all means knock yourself out with paperclip scenarios; or if you’re an optimist, then go nuts imagining Star Trek immortality. As I said, we’re all just making stuff up.
I mean the above to be less dismissive than it probably sounds. We’re all way down our own rabbit holes of woo and supposition on various things. I doubt any of the rationalists at MIRI are hot to spend a minute of their time on intra-Christian fights over Sabellian Modalism vs. Trinitarianism, and which is the correct way to think about the ineffable nature of the Godhead. For similar reasons, I’m disinclined to think too hard about ASI once we’re this far down the chain of improbable what-ifs. I understand if it’s your jam — it’s just not mine.
I use this Godhead analogy advisedly, because I think God is real and that he is interested in MIRI, whether or not MIRI is interested in him. Similarly, the folks at MIRI think ASI is real and that it is interested in Christians whether or not Christians are interested in it. And in this respect, the AI policy wars put us in the thick of a very ancient and familiar type of battle over which camp’s ultimate vision of the world will govern our shared lives.
To come at it from a slightly different angle: By comparing AI doomerism to Christian theology, I am paying it a compliment by generously framing it as a peer competitor to a thing I take quite seriously. The doomers won’t see it that way, but of course, I don’t particularly care how they see it. All I care about is that my side wins, and I understand that they feel similarly. This is how one naturally behaves when the stakes are very high.