AI Safety: A Technical & Ethnographic Overview
Despite the dry title, this is the spiciest thing I have published in this newsletter. Have a glass of milk handy when you read it.
The story so far: We need to talk about The Letter. No, not that letter, or the other letter, or the other one… the AI letter. The one that calls on all of humanity to “pause for at least 6 months the training of AI systems more powerful than GPT-4.”
For some weeks I had been noodling on a draft entitled, “AI Safety: An Ethnography,” and today was going to be the day I published it. But earlier today that letter dropped, and it so powerfully illuminated the rich, complex contours of the terrain of the “AI safety” issue that I had to fishtail and rework this post to center it.
There are two main threads I’ll follow in this piece:
Technical, covering some key specifics about GPT-4 that make the open letter honestly kind of nonsensical and a bit of a Rorschach test.
Ethnographic, covering the cultural and tribal divisions and allegiances that are cropping up around the technical issues.
I’ve crammed these threads into one extremely lengthy post because in order to understand the tribal dynamics you have to understand the technical issues. But you also can’t understand how the technical issues are framed without understanding the tribal dynamics. These things are intertwined in the “AI safety” issue to a degree that makes them impossible to separate, which is why this topic has fascinated me since the beginning of this newsletter in 2021.
jonstokes.com is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Prelude: The safety scissor
When it comes to the question of how worried we should all be about AI, everyone initially arrives at this issue with their own culture war baggage. But what I’m seeing play out after that initial encounter is a two-step process:
You immediately start pattern-matching and sorting to figure out which is the “correct” side. This tribal sorting is parallel to and even part of the process of educating yourself about the issue itself.
But AI safety doesn’t easily map onto most existing culture war divides, and once you grasp this then you face a choice:
Re-sort yourself and your network along brand-new AI safety lines.
Tweet through it!
In the ethnography portion of this piece, I’ll cover people who’ve taken each path of the decision fork above, i.e., either a re-sort or white-knuckled clinging to existing culture war frames. I’ll also describe some people who are still stuck on step 1.
But most of the action in AI safety — and most of the confusion — is taking place in circles that have re-sorted along new and explicitly AI-safety-focused lines. These re-sorted types are the ones who’ve signed the letter and are signal-boosting it on Twitter.
✂️ Given how many people in existing groups are re-sorting themselves along “AI safety” lines, it’s clear that the whole concept is rapidly emerging as a powerful scissor — a statement or meme that divides a tribe of people into strongly pro and strongly con. At its most basic level, a scissor is a spectacle that is taken in by the primitive parts of the human brain — specifically the parts that are wired to pre-cognitively sort friend from foe.
So let’s dive into the letter and the technical issues first, so we can understand how this new, post-culture-war, AI-centered, friend-vs-foe division is shaping up.
Part 1: How smart is too smart?
The AI open letter falls into the same trap as all open letters: the moment you start asking for enough specifics to make the recommendation actionable, the cracks start to show and the whole project starts looking like a publicity stunt.
Take a look at the money quote of the AI letter — the big ask:
Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.
The whole AI safety problem is contained in the phrase, “AI systems more powerful than GPT-4.” What exactly does this mean, though?
✋ Hold up: I know what you think is coming next because it’s a standard culture war pattern. We all do it.
If you come at me, a 2A supporter, with “ban assault weapons now!” then I am gonna drop a 10-kiloton nuance bomb of deep, thorny definitional problems directly over your position, and watch to see if you scramble for cover.
If I come at you, an LGBTQIA+ ally, with “a woman is an adult human female,” you are gonna drop a 10-kiloton nuance bomb of deep, thorny definitional problems directly over my position, and watch to see if I scramble for cover.
But reader, I promise you that I am not about to deploy definitional nuance in order to win an argument. Rather, I will (hopefully) illustrate that the technical, measurable definition of “progress” in AI — and even more narrowly, GPT-4’s progress over GPT-3 — is the whole ballgame.
To summarize the points I’ll make in this section:
There is no “I know it when I see it” position for any side of this issue to fall back to because nobody has ever actually seen the “it” in question!
Sam Altman can’t even boil down the main intelligence improvements from GPT-3 to GPT-4, apart from pointing to the models’ performance on some quizzes and tests that are themselves hotly contested as signifiers of human potential in our current political climate.
OpenAI as a company has opened up its evaluation tools because they’re so in the dark on this core question of what “more powerful” looks like that they desperately need the rest of the world to help them come up with ways to measure the capability of the systems they’re developing.
Is GPT-4 a friend or a foe?
👀 To reframe the first point above (“I know it when I see it”) using my two culture war examples: assault weapon ban promoters have all seen things they are certain are obviously “assault weapons,” and gender criticals have all seen people they are certain are obviously “women,” but not a single soul opining about “AI safety” has ever once encountered the thing they’re worried about us bringing into existence in the near or medium term.
As a scissor, then, “AI safety” is unique in that it requires two separate cognitive moves:
Imagine in your mind what “AGI” would or possibly could look like.
Hand off that mental picture to the primitive part of your brain to pattern-match as “friend” or “foe.”
Given that this is how it is, I have to confess to quite a bit of sympathy for the anti-“techbro” and anti-“AI hype” tendency to mock those expressing safety concerns as boys sitting around the campfire with flashlights in their faces, creeping each other out with made-up ghost stories. It kinda does look like this!
Even worse is the tendency among so many AI safetyists to analogize the threat they’ve imagined in their minds to the threat posed by nuclear weapons. It happens all the time and it never, ever convinces anyone who isn’t already convinced. I almost want to propose Stokes’s Law: as an online discussion of AI safety grows longer, the probability of a comparison to nuclear weapons approaches 1.
It always goes something like this:
X-risker: This jailbroken chatbot just gave me detailed instructions for robbing a bank! Now imagine a more powerful one giving out detailed instructions for making a nuclear bomb.
Me: Bruh, those bank robbery instructions are not real. You cannot literally become a successful bank robber by reading those instructions then doing them. Please calm down.
X-risker: But imagine the instructions were real! A super powerful AI would be able to give a very accurate, actionable plan for a bank robbery! And also for nukes! NU. CLE. AR. WEAPONS!
Me: But Brossandra, if we’re just imagining godlike AIs, why don’t we imagine one that loves us and tells us how to live forever? Why imagine Dionysus and not, say, Apollo?
X-risker: How do you not understand that this is like nukes?! Do you just think everyone should have a nuclear bomb? We have blue-ribbon committees and laws for nukes, so we need them for AGI because it’s LIKE NUKES.
Me: Eh, I’m not seeing it. Check out this thread of sick Snoop Dogg memes I made with AI. Anything that can do this is a force for good.
And around it goes, with both sides of this conversation doing friend-or-foe pattern matching on entirely different targets — the safetyist is pattern-matching on some viral ChatGPT bank robbery thread, and I’m pattern-matching on my Snoop thread, and we’re each free to pattern match on whatever we like that’s vaguely AI-related because there is no actually existing AGI to anchor our arguments in.
The technical challenge of measuring GPT-4
If you listen to Lex Fridman’s interview with OpenAI CEO Sam Altman, there are a number of exchanges where Lex tries to get Sam to detail the improvements GPT-4 brings to the table over GPT-3 or to characterize how powerful it is.
Sam says a few things that are interesting and directly relevant to the question of defining what is and is not “more powerful than GPT-4.”
1. OpenAI did not release the parameter count for GPT-4 because that number, which everyone is fixated on, is no longer a straightforward measure of model capability.
2. OpenAI improved GPT-4 by making many small improvements in many places — from dataset curation to training to fine-tuning to RLHF to inference — that all add up to big gains.
3. We’re all still discovering what those “big gains” are, which is why OpenAI is committed to letting as many people as possible use the model as early as possible. Measuring the full impact of all those additive (or multiplicative?) gains is a large-scale, group effort.
4. In service of #3 above, OpenAI has opened up its model evaluation tools and is asking the public to help it develop ways to measure the performance differences between GPT-4 and other models on different types of tasks.
Do you see what a hot mess this whole thing is for anyone who wants to “press pause” on a narrow type of AI research — so-called “gain-of-function” research aimed at making the models more “powerful” — but not all of it?
1️⃣ We can’t put a straightforward, hard limit on AI research based strictly on the number of parameters and/or GPUs used to train models, because there are other ways to get big jumps in capability apart from scale.
2️⃣ Furthermore, if we did outlaw purely scale-based approaches, then we must be 100 percent sure the law is enforced globally with zero exceptions. Because in the regions with anti-scaling laws, researchers will uncover more ways to advance without relying on scale, and when those new discoveries are transferred to a region where scaling is still being done they could contribute to a fast take-off scenario.
Some definitions for the uninitiated:
Fast take-off scenarios are where something happens and we suddenly go from the present state of the art to a godlike AGI, with no time to prepare in any way.
Slow take-off, in contrast, is where progress is steady and predictable for some lengthy period of time, and we’re able to collectively adjust our society to new AI capabilities as they open up.
3️⃣ And how would we keep researchers from making small, incremental advances in capability, advances of the type that could be combined to produce one big advance? There is no way to regulate this kind of activity, at least no way that isn’t massively invasive.
4️⃣ Finally, we’re still trying to understand how much more capable GPT-4 is than GPT-3, and at what kinds of tasks. The red-teaming and testing OpenAI has been doing since the model was finished last summer was just the beginning of that process.
What’s happening right now, with hundreds of millions of users poking at the model, posting screenshots, or otherwise giving feedback, is the next phase of that discovery and measurement process. That’s what’s going on with OpenAI Evals — they want the public to contribute high-quality capability measurements.
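To make the measurement problem concrete, here is a toy sketch of one naive way to define “more powerful”: model A beats model B only if it scores at least as well on every task and strictly better on at least one (Pareto dominance). The function names, task names, and scores are hypothetical and made up for illustration; this is not how OpenAI Evals works, and it is not real benchmark data. The point is that even under this simple rule, two models can end up incomparable, with neither one “more powerful” than the other.

```python
from typing import Mapping


def dominates(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """True if `a` scores >= `b` on every shared task and > `b` on at least one."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("models share no tasks, so they cannot be compared")
    return all(a[t] >= b[t] for t in shared) and any(a[t] > b[t] for t in shared)


def compare(a: Mapping[str, float], b: Mapping[str, float]) -> str:
    """Classify two per-task score profiles under Pareto dominance."""
    if dominates(a, b):
        return "a > b"
    if dominates(b, a):
        return "b > a"
    shared = a.keys() & b.keys()
    if all(a[t] == b[t] for t in shared):
        return "tied"
    return "incomparable"  # each model wins on at least one task


# Made-up scores on two hypothetical tasks, for illustration only.
model_x = {"task_a": 0.80, "task_b": 0.55}
model_y = {"task_a": 0.70, "task_b": 0.65}
print(compare(model_x, model_y))  # incomparable
```

A real capability ordering would also have to decide which tasks count and how to weight them, which is exactly the part nobody has agreed on yet.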
➡️ Ultimately, the letter’s central demand boils down to a demand that we stop making AIs more powerful until we can figure out how to measure if the AIs are becoming more powerful. It’s like saying, “stop making cars that can go faster until we can figure out what speed is and how to measure it.” The only way this demand is not completely absurd is if it’s a demand to pause all development.
You might counter by insisting that it’s just a demand to stop explicitly trying to improve AI capabilities and just focus on explainability and measurement. But my brother in technology: explainability and measurement are critical to gain-of-function work. As the meme says, they’re the same picture.
At any rate, the letter is not, in fact, a call to pause all development, and is exactly as self-contradictory as my automotive analogy suggests.
🚔 After asking for a pause in gain-of-function work, it then goes on to suggest that we institute a regime of inspection, testing, and reporting to verify quantitatively that as everyone is doing their work on the cars nobody is secretly doing the bad thing of making the cars go faster — again, all in the absence of a widely agreed upon definition of speed, much less a means of testing it.
In parallel, AI developers must work with policymakers to dramatically accelerate development of robust AI governance systems. These should at a minimum include: new and capable regulatory authorities dedicated to AI; oversight and tracking of highly capable AI systems and large pools of computational capability; provenance and watermarking systems to help distinguish real from synthetic and to track model leaks; a robust auditing and certification ecosystem; liability for AI-caused harm; robust public funding for technical AI safety research; and well-resourced institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause.
To use technical jargon: this is all pretty nuts. Just completely unserious, and I’m shocked at some of the names that signed onto it.
In fact, after thinking this situation through, now I’m getting scared — the future of our species is apparently in the hands of a group of people who can’t spot an obvious contradiction of this magnitude. It’s almost enough to make me look at that list of signatories and think, ironically, “there go the techbros playing God, again!”
But that would be a culture war frame, and as such it’s one I think we should reject when grappling with this issue. It’s best to take AI safety on its own terms.
Part 2: Ethnography and tribal warfare
When trying to navigate the AI safety debate by mapping the technical issues described above to different tribal groups on my timeline, I find I’m constantly disoriented. So it’s easy for me to see why outsiders are lost with this whole thing.
A small but telling recent example: in this NY Mag puff piece on self-styled “AI hype” debunker Emily Bender, the author thinks Sam Altman is an effective altruist but he’s actually effective altruism’s current number one villain. I’ve seen people make this same categorization mistake with Peter Thiel, a man who has unironically characterized EA guru Nick Bostrom as the antichrist.
😵‍💫 But I do get why everyone is confused. The tribal signifiers on AI safety Twitter are all over the place. Just today, in fact, the rationalist, evopsych, gender critical scholar Geoffrey Miller was backing woke AI ethicist Gary Marcus in a thread on the AI letter, and in opposition to both was an Antifa, tankie, he/him account who’s worried about the anti-democratic implications of a technocrat-run AI control regime… and off to the side is me, an anti-woke, anti-tankie, pronoun disrespecter cheering he/him on because this aggression cannot stand, man.
Another example: If I were at a party with e/acc and EA types on opposite sides of the AI x-risk issue, I wouldn’t be able to tell who was who just by looking at normal exterior signals or even asking non-AI-related questions. The two groups are very similar along most tribal vectors. But in an effective accelerationism (shortened to “e/acc”) Twitter space a while back, the opening questions were, why haven’t the EA/rationalist x-riskers killed any AI researchers or accelerationists yet, and when are they going to start doing that?
I mean, we now have the leader of one whole wing of x-risk discourse calling for airstrikes on datacenters.
So it’s a free-for-all out there. A veritable state of nature. But there are a few broad camps emerging.
The AI Safetyists
The Safetyists are people who express some degree of worry about “AI safety,” even though they have very different ways of framing the issue. Such people fall into roughly three camps:
The language police: Worried that LLMs will say mean words, be used to spread disinformation, or be used for phishing attempts or other social manipulation on a large scale. AI ethicist Gary Marcus is in this camp, as are most “disinfo” and DEI advocacy types in the media and academia who are not deep into AI professionally but are opining about it.
The Chernobylists: Worried about what will happen if we hook ML models we don’t fully understand to real-life systems, especially critical ones or ones with weapons on them. David Chapman is in this camp, as am I.
The x-riskers: Absolutely convinced that the moment an AGI comes on the scene, humanity is doomed. Eliezer Yudkowsky is the most prominent person in this camp, but there are many others in rationalist and EA circles who fall into it.
👮‍♀️ The language police and the x-riskers are longstanding culture war enemies. Neither has much of a care for the other’s specific concerns about AI — language police think arguments about AI killing us all via nanobots are dumb, and x-riskers think worries about LLMs being coaxed into printing a racial stereotype are dumb. Nonetheless, these two rival camps are temporarily finding common ground on the cause of regulating or slowing AI.
⛑️ The Chernobylists don’t have much interest in either of the concerns of the other two camps — for us (as I said, I’m one), toxic language concerns are picayune, fears of mass disinfo campaigns are overblown, and the x-risk stuff is just sci-fi nonsense. No, we’re worried that somebody is going to deploy a system in some production context where it can cause an accident that kills a lot of people — either a spectacular industrial accident or a slow-rolling, mostly invisible catastrophe of the type that might be downstream of a medical LLM that screws up certain types of drug interactions.
Note that we Chernobylists differ widely on how to address the scenarios we’re worried about. Chapman wants to see AI development slow down, whereas I’m an accelerationist who thinks the answer is to keep building and use AI to find ways to mitigate the chaos that AI progress itself creates. This is a debate for another day, though.
[Update, 3/30/2023 at 11:30 AM CT: One valid objection to this section of my post is that “certain, instant doom the moment an AGI appears” is an extreme (but prominent) x-risker position, and that many who identify as x-riskers just think there’s a significant possibility of doom but that it’s by no means certain. This possibility, these folks believe, is large enough that it’s worth stopping or slowing AGI. So I want to highlight this more moderate x-risker position in the interest of fairness and furthering the conversation.]
The Boomers
👵🏻 The boomers are not technically baby boomers — they’re just people who are still in the very first step of the two-step process I described in the prelude, i.e., they’re cramming all this AI stuff into existing culture war frames based on the tribal signals the different participants in the brawl are throwing off as they go by in the melee.
In my own experience, there are two subgroups of AI Boomers: the red tribe and the blue tribe. Their views on AI break down roughly as follows:
🔵 Blue tribe:
Sam Altman, Peter Thiel, and Elon Musk are capitalists and longtermists, therefore they’re bad because capitalism bad and longtermism bad.
The AIs that have been unleashed by capitalists and BigCos are screwing up all our plans to sell our art and short fiction, which is bad because indie artists are good and capitalist BigCos are bad.
AI smells like a new tool of right-wing capitalist power for controlling the masses.
🔴 Red tribe:
Sam Altman, Peter Thiel, and Elon Musk are sexual degenerates and longtermists, therefore they’re bad because sexual degeneracy bad and longtermism bad. (Everyone hates the longtermists!)
AI is a creation of BigTech, BigTech is woke, ChatGPT is woke, woke is bad, therefore AI is bad by transitivity.
So much boobs and waifu and porn in every AI art FB group.
It ain’t human, which means it’s either angelic or demonic, and it ain’t angelic (see above re: sexual degeneracy), so…
AI smells like a new tool of woke capitalist power.
There is a ton of overlap between Blue Tribe Boomers and Language Police, but I consider the Language Police a separate group because they’re situated in media and academia and have sinecures where they’re paid to tell everyone to stop using certain words and phrases.
The Intelligence Deniers
I’ve saved the most interesting (to me) camp for last. These people are way off in their own, largely isolated wing of the AI wars. This camp is also centered on a specific clique. I will name two of the names in that clique in a moment, but first I have to lay out their position.
🧠 Let’s say you’re someone who believes the concept of IQ is a racist, eugenicist fiction. In your view, any attempt to sort people into groups based on some measurement of intelligence — an IQ test, the SAT or ACT, the MCAT, LSAT, GRE, whatever — is definitionally part of a colonialist, eugenicist project that marginalizes indigenous ways of knowing and being, and seeks to reify a white supremacist ideology that views human intellectual capacity as measurable and quantifiable.
If you are one of these people, then how are you going to relate constructively to any important human endeavor that advertises itself as having “intelligence” in the name? How can you contribute to the kind of high-stakes, polarizing, collective problem-solving efforts around how to quantify machine intelligence described in the technical section?
The answer is that it’s going to be quite difficult to position yourself in any conversation that moves beyond very basic Language Police concerns. The minute the discussion moves out of the realm of naughty words and into the realm of ML models with dangerous, superhuman capabilities that we don’t understand and can’t control, you have to hit eject. This is why the Intelligence Deniers are isolated from the rest of the safety discourse.
The fundamental premises of the Intelligence Denier clique and their hangers-on in the media and academia are something like the following:
“Artificial intelligence” is a capitalist marketing term for statistical smoke-and-mirrors. There is no real innovation there and it’s mostly just a scam.
The models are just math and are not “intelligent” because “intelligence” is a colonialist cisheteropatriarchal white supremacist eugenicist construct.
To the extent that the models are made to act in some human-like fashion, this is a dangerous, dehumanizing effort by our enemies to effect the aforementioned capitalist smoke-and-mirrors strategy for profit at the expense of the marginalized and minoritized.
Gary Marcus and other white dudes who are trying to be allies and do “AI ethics” alongside us need to shut up and just center us and signal boost us. (That Gary Marcus should shut his trap is surely the only thing AI-related that Emily Bender and Yann LeCun can agree on.)
If you’re proposing technical solutions to the ML problems we’ve identified (e.g., it spits out stereotypes, it doesn’t reliably recognize non-white faces, it is used by the police in any capacity for any reason, it does any kind of profiling or evaluating of humans in a way that might disadvantage some who are marginalized), then you are doing a “techno-solutionism” and that is absolutely not allowed. The only correct answer is to completely reform all of society and dismantle all interlocking systems of oppression. Your GitHub pull requests and technical proposals are attempts to maintain the status quo. Much more on this, here.
There’s a bunch of stuff about ableism that I don’t have the time or energy to get into, but basically, if you say the models “hallucinate” then that is ableist language that marginalizes the mentally ill (I am not making this up). They make these and similar arguments that anyone who’s been on Twitter long enough can GPT given the right prompt.
(When this newsletter first started in 2021, I spent a bunch of time studying these people and writing about them, so if you scroll back in my archives you can find as many of my takes on them as you have time to read.)
If the above list is the way you’re approaching the AI debate, then what is your reaction going to be to an open letter promoted by Gary Effing Marcus that suggests AIs are becoming too intelligent, too quickly? Well, let’s see what one member of the Intelligence Denier clique, Emily Bender, has to say.
🤬 So yeah, she hated the letter. She hates Marcus (the two of them beef with each other on Twitter), she hates the capitalist “techbros” who are investing in and building in AI and who’ve signed that letter, she hates GPT-4 and thinks it’s a scam, she hates Sam Altman… she’s just extremely mad all the time about anything that looks like AI progress. What can I say, tho, other than that God’s punishment on such people is that they have to be who they are?
Timnit Gebru is another senior member of this clique. I have written about her before in 2021. If you’ve followed my coverage of her, you know I credit her with raising some important issues in AI ethics, but lately, she’s completely gone off the deep end with elaborate, genealogical conspiracy theories about how the quest for AGI is a literal actual eugenics project.
I’m not going to get into any of that, because if that stuff is your bag then why on earth are you reading my newsletter?
There are other members of this clique — a few more senior members, and a few junior members — but I’m restricting my coverage here to Bender and Gebru because they’re the most prominent and have consistently put themselves out there publicly.
If you haven’t guessed by now, the Intelligence Deniers are people who’ve taken the “tweet through it!” path after grappling with AI safety concerns. They’re close enough to the issue to understand the debate, but they are just not going to budge on any of their existing idpol positions to make room for a new set of allies, some of whom (the rationalists) are decidedly “problematic.”
At any rate, this camp seems destined to shrink through infighting and general toxicity until they drive their allies into the Language Police camp. You hate to see it.
📈 The other thing working against this crew is that the models just keep getting more capable and, well, intelligent. So if you’re committed on principle to the stance that there is no real innovation going on here, it’s all capitalist flimflammery, intelligence is a racist lie, and so on, you’re destined to migrate further to the fringes as AI keeps posting dramatic wins.
The coming Butlerian Jihad
For those who haven’t read Dune, the Butlerian Jihad was a crusade against any kind of computer or machine whose function approximates what the human mind can do — this included everything from artificial intelligence to pocket calculators.
🔫 I think most of the camps and subgroups described in this article are going to converge into a type of real-life Butlerianism.
The Intelligence Deniers are already fully Butlerian. They don’t take any of the nonlinguistic AI safety concerns seriously, but they still want to crusade against AI, so they’ve adopted a quasi-mystical view of ineffable humanness that you can read all about in the NY Mag’s Bender fanfic.
The Boomers and most of the AI Safetyists are going to join the Intelligence Deniers in their Butlerianism either because they have their own metaphysical critiques of AI, or out of convenience because they agree with the ultimate goal of slowing or stopping AI progress.
🏎️ If you’re not a Butlerian, then you’ll be an effective accelerationist (e/acc). This latter camp is the one I’m in, and we are increasingly (it seems) outnumbered.
You may disagree with me that Butlerianism and e/acc are the only two minima that everyone is going to sort into, but you’d be wrong. So you might as well get to know the sides so you can pick one.
Postscript: GPT-4’s parameter count
On the topic of scaling laws and GPT-4’s parameter count, Anton Troynikov speculated in a recent interview with me that OpenAI did not release GPT-4’s parameter count because it may be the same as or even smaller than GPT-3’s parameter count, and if so, that would be a clue that they’ve made proprietary advances (i.e., something beyond just turning up the scaling knob another notch) and don’t want competitors to know that information.
The more I consider Anton’s speculation, the more I’m convinced it’s correct. GPT-4 is probably a smaller model than GPT-3 or GPT-3.5, and the point of releasing it this way is to understand the impact of the other advances they’ve made before they go back to turning up the scaling knob with GPT-5 and possibly rocket all the way into an entirely new regime that looks a lot more like a superhuman AGI.
Sam has consistently stressed the need for slow takeoff and incrementalism so that humanity can progressively come to grips with more capable models as they develop. He says explicitly in the Lex interview that he fears fast takeoff and wants to avoid that if at all possible.
And as an addendum to this addendum, I feel compelled to point out a subtle contradiction in the GPT-4 performance picture Sam presents on that podcast. On the one hand, he tells Lex he was not surprised by GPT-4’s performance because it obeys certain laws; on the other, he says all that stuff about tons of incremental gains adding up to one big gain, and he stresses that they’re not merely following the regular scaling law with GPT-4, and he says they need help characterizing how much more powerful it is.
There’s a part of this picture missing, and I don’t know what it is. GPT-4’s “intelligence” is either neatly characterized by some “number go up” type law, or it’s hard to measure and is the product of a bunch of small optimizations. But it seems like it can’t be both.
If you figure this one out, let me know in the comments.
Update: About 20 minutes after publishing this, I went back to the GPT-4 technical report [PDF] and solved the mystery. The key passage is here:
A large focus of the GPT-4 project was building a deep learning stack that scales predictably. The primary reason is that for very large training runs like GPT-4, it is not feasible to do extensive model-specific tuning. To address this, we developed infrastructure and optimization methods that have very predictable behavior across multiple scales. These improvements allowed us to reliably predict some aspects of the performance of GPT-4 from smaller models trained using 1,000×–10,000× less compute.
Once they get all their incremental gains collected together, they can train much smaller models and benchmark them. Then they can fit those small-run results to a scaling law and extrapolate to reliably predict some aspects of the full model’s performance before it’s fully trained.
So insofar as they have a benchmark for a capability, they can test it on a small run to understand how it will perform on the larger run.
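Here is a minimal sketch of that small-run-then-extrapolate idea, assuming a pure power law, loss ≈ a · compute^(−b), fitted by ordinary least squares in log-log space. Real scaling-law fits often add an irreducible-loss term, and the function names and synthetic numbers below are hypothetical illustrations, not OpenAI’s method, code, or data.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], loss: Sequence[float]) -> Tuple[float, float]:
    """Fit loss ~= a * compute**(-b) by least squares on (log compute, log loss).

    Returns (a, b). Assumes a pure power law with no irreducible-loss term.
    """
    if len(compute) != len(loss) or len(compute) < 2:
        raise ValueError("need at least two (compute, loss) pairs of equal length")
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # log(loss) = log(a) + slope * log(compute)
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a: float, b: float, compute: float) -> float:
    """Extrapolate the fitted power law to a new compute budget."""
    return a * compute ** (-b)


# Synthetic "small runs" generated from a known law (a=10, b=0.05), not real data.
small_runs = [1e18, 1e19, 1e20]
losses = [predict_loss(10.0, 0.05, c) for c in small_runs]
a, b = fit_power_law(small_runs, losses)
# Extrapolate to a run with 10,000x the compute of the largest small run.
print(predict_loss(a, b, 1e24))
```

The catch, as the report itself hedges, is that this only works for “some aspects” of performance: you need a benchmark whose results actually follow a smooth curve as compute grows.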