AI Wars, Part 2: Meta’s Galactica AI & The End Of Science
Science is in deep trouble for reasons that have zero to do with deep learning.
On November 15, the AI team at Meta launched a brand new open-source model that promised to revolutionize science. Meta’s Galactica AI aimed to be a kind of GitHub Copilot for scientific writing — scientists could type sentences in a paper and it would suggest relevant literature citations and even lines of text.
When the model launched, some yahoos did what yahoos always do with every ML model that’s released to the public — they executed a play I’ve come to think of as the GIGO dunk. The GIGO dunk is when you type garbage into an ML model and get garbage out of it, then you rush to Twitter to post a dunk thread full of screencaps purporting to demonstrate that, hey, this new model’s output is garbage!
But this always happens. So it goes, right?
Eh, not quite. This GIGO dunk hit different this time because in this particular case the yahoos who employed it were well-credentialed academics and prominent researchers from multiple fields (chemistry, linguistics, computer science, machine learning, etc.) and FAANGs.
In fact, you know that game where you start texting and then just accept whatever autocomplete throws at you, and the end result is this vaguely coherent but funny (in a random way) string of words? That’s literally what these folks did with Galactica. Then they used the resulting garbage output to claim that the model is a “dangerous” tool for producing “fake science” with the potential to wreck our scientific publishing ecosystem. I realize it sounds fantastic that a bunch of grownup PhDs would behave this way; nonetheless, it happened.
jonstokes.com is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
The model was subsequently yanked offline and a clarification posted on the main Galactica page.
At any rate, this post won’t get any further into the Very Online drama surrounding Galactica for a few reasons:
It’s depressing. I just felt a mix of rage and abject despair while reading this material, and I’m not keen to put others through the same experience.
The fact of the drama matters, but the details do not. If you’ve followed this Substack since 2021, then once I give you a few details you’ll be able to guess all the rest because the blow-up follows a tiresomely familiar pattern and involves the same old worn-out themes.
I don’t want to give too much free exposure to the haters. Everyone at this point is savvy to the way negative press from one’s enemies brings with it even more clout and satisfaction than positive press from one’s allies, so I don’t want to do bad actors any favors by naming them here or linking their threads.
YouTuber Yannic Kilcher has provided a great overview of the hate and the problems with it, so I’ll direct you to that in my appendix at the end of this article.
Galactica raises far better and more interesting issues for science than are covered in the Twitter hysterics surrounding it. So I’d rather talk about real issues.
That last item above is the most important — Galactica is ambitious, and it kinda matters. So let’s get right into the whats and whys, and I’ll link the drama in an appendix for the masochists in the audience.
What is Galactica?
Meta’s Galactica is a machine learning model that’s meant to assist in the writing of scientific papers: a kind of turbocharged autocomplete for science writing. But while “autocomplete for science” gives you a sense of how it works, the metaphor doesn’t really do justice to the tool or to the full scope of its makers’ ambitions for it.
Here’s a quick breakdown of Galactica’s highlights:
It’s a “large language model that can store, combine and reason about scientific knowledge.” (Per Meta’s paper on it).
It can suggest relevant citations based on what you’ve typed.
It can suggest text.
It can explain scientific equations and blocks of code in plain English.
It can simplify code and equations.
It can find errors in math derivations.
The tool’s makers are quite ambitious about it: “Our ultimate vision is a single neural network for powering scientific tasks. We believe this is will be the next interface for how humans access scientific knowledge, and we get started in this paper.”
The Galactica paper’s authors frame the current problem with science as one of information overload. There are too many papers, bits of code, nucleotide sequences, and other objects that make up the rapidly expanding universe of formal scientific knowledge. It’s not just that one person can’t hope to get a solid handle on the current state of any field or discipline — this has been true for a while — but there’s even a sense that humanity collectively has lost its grip on “science.”
The current state-of-the-art in wrangling all this scientific knowledge into a paper is the search engine. When scientists are in the process of writing a paper that describes something they’re working on, and they need to get a handle on the work that has been done in their area that they should be reading and citing, they do what everyone else in the world does in a similar situation: they Google it. Or they use some more specialized search tool.
Now, the scenario I’ve just described (you’re pecking away in a text editor, and as you work you regularly reach a point where you have to pause your typing and go Google something so that you can return to your text editor and get on with it, often pasting some of what you got from Google into the text editor) will be intimately familiar to software developers. This is exactly why StackOverflow exists and is so popular.
But if you’re a software developer in the waning months of 2022, you suddenly have access to a miraculous bit of technology that makes the above type-pause-Google-copy-paste-type loop mostly obsolete: ML-assisted code completion. Coders who use GitHub’s Copilot or Replit’s Generate Code tools get elaborate, context-aware autocompletion suggestions that they can either ignore or accept. The AI watches what they’re typing and suggests the code that they’d normally have to pause and do a search for. It’s fantastic.
ML gives coders more than just autocomplete — you can also type in a text description of some code you want to write, and the model will take a stab at generating the relevant code. Or, you can run this process in reverse by feeding some code into the model and having it give you an explanation of what that code does.
Coders are finding these new capabilities powerfully useful — this has changed programming overnight, and will lead to a large increase in per-programmer productivity. So Galactica’s creators figured why not try to bring this same productivity boost to the process of creating new science?
Fake harms vs. real problems
Galactica’s haters have proposed a bunch of laughably silly fake “harms” that you can acquaint yourself with in the appendix to this post. I won’t entertain any of this foolishness myself, because if you’ve made it this far then as you watch Yannic’s video embedded in the appendix you’ll know exactly what’s wrong with the bedwetters’ arguments (in the few cases where they actually tried to make them… most of it is just lazy QT clout-chasing).
But I have some concerns with Galactica as a kind of GitHub Copilot for science, and they’re rooted in two realities:
1. Science is in very bad shape in ways that have little or nothing to do with the fact that scientists have to pause their typing and Google things, or need equations explained.
2. Scientific papers are not computer programs, so tools that may net-net improve the field of software development might have a different impact on science because while the process is similar the goals are quite different.
Expanding on point #2, computer programs have the following qualities that are not shared in any respect by scientific papers:
1. Programs are meant to be read by computers, not humans.
2. Following on #1 above, it never has and never will matter if there are too many programs for humanity to get some kind of collective mental handle on. The number of useful computer programs is limited by constraints like the amount of hardware that can store and run them, not by humanity’s ability to access them, read them, and synthesize them into new programs.
3. A body of executable programs is not a body of knowledge in the scientific sense. Sure, code is speech (sorta) and all that; and scientific papers often feature code while code is improved by scientific papers. But a corpus of programs just doesn’t have the same origins or goals in the world as a corpus of scientific literature.
So what does all this mean for Galactica?
As I explained above, the Galactica paper frames the problem with science as one of too many scientific papers published for even specialists to keep abreast of. But Galactica’s creators don’t seem to have gamed out how giving each scientist a huge productivity boost in paper writing seems very likely to make this exact problem much worse.
The incentive for scientists in most fields is to “publish or perish” — volume and citations count for promotions and pay, which is a major reason the number of papers is so high. So if you give all these citation-maxxing scientists a tool that, say, cuts the amount of time they spend on one paper in half, they’re most likely to just produce twice as many papers. (As opposed to, say, using the extra time to make the same number of papers twice as good.)
Would twice as many papers per scientist be good for science as it currently exists, with all its problems? This depends entirely on the quality of the additional work, but whatever the case it will certainly make the “too many papers to keep track of” problem worse.
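To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. Every number in it (the writing hours, the hours per paper, the headcount) is a hypothetical I made up for illustration, and it assumes scientists spend all of the saved time on more papers rather than better ones.

```python
def papers_per_year(writing_hours_per_year: float, hours_per_paper: float) -> float:
    """Papers one scientist can finish if all writing time goes to output."""
    return writing_hours_per_year / hours_per_paper


# Hypothetical numbers, chosen only for illustration.
WRITING_HOURS = 400.0    # hours per year one scientist spends writing papers
HOURS_PER_PAPER = 100.0  # hours per paper without an AI assistant
SCIENTISTS = 1_000       # size of an imaginary field

before = papers_per_year(WRITING_HOURS, HOURS_PER_PAPER)
# Assumption: the tool cuts the time per paper in half.
after = papers_per_year(WRITING_HOURS, HOURS_PER_PAPER / 2)

print(f"Per scientist: {before:.0f} papers/year before, {after:.0f} after")
print(f"Whole field:   {SCIENTISTS * before:.0f} before, {SCIENTISTS * after:.0f} after")
```

The per-scientist doubling carries straight through to the field as a whole, so the pile of papers everyone has to keep up with doubles too, while nobody's reading time changes.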
Boosting the paper productivity of individual scientists could lead to something like the phenomenon where in a crowded restaurant everyone raises their voice because everyone else is talking too loud. The volume will just keep going up in a runaway fashion until we reach a point of such deafening white noise (scientifically speaking) that not even computers can make sense of it.
Science is also facing a massive replicability crisis (more on this in the next section), and the volume issue feeds back into it because the incentives are all geared toward publishing “new” science instead of trying to replicate existing science. If its main use is amping up the volume of “new” science, then Galactica at best does nothing for the replicability crisis. At worst, it might exacerbate the crisis by blowing out the number of papers that need to be replicated before they can be relied on.
Note that all the problems I’ve cited above aren’t really problems with Galactica — they’re problems with science.
If science was structured to reward replications over novel results, Galactica could be a great tool in the quest to validate existing work. Literally, the whole point of Galactica is to give scientists better access to existing science, so it seems pretty ideal as a tool for replicating work and should probably have been pitched as such.
Science is already in so much trouble
Just like I said above that scientific papers are not computer programs, it’s also the case that scientific papers are not news articles. Worries about “fake news” and “disinformation” that the model’s critics have imported from the news ecosystem into the scientific ecosystem are not appropriate… or, at least, they shouldn’t be appropriate. If they are, it means something is way wrong. (Spoiler alert: something is way wrong.)
But before I go further, I want to acknowledge that a news article is closer to a scientific paper than a computer program is. News articles form an important input into different academic disciplines, and they do play a role in our collective knowledge supply chain — a supply chain that serves the reading public, academics, and news professionals — that’s not entirely dissimilar to the role played by scientific publications.
Nonetheless, it seems to me that news articles have two key qualities that aren’t shared by scientific papers:
1. The audience for news is predominantly made up of people who are not in the business of trying to synthesize news articles together to make novel news articles, articles that will then go on to advance the state of human knowledge in some specialized area.
2. Following #1 above, the audience for news pieces does not have an individual or collective duty to reproduce reporting and validate it (i.e., a duty comparable to science’s duty to replicate and validate important scientific work).
The point of the above is that the audience for scientific papers is overwhelmingly other scientists, and as scientists, they are obliged to a level of diligence in their use of previous work that’s just not comparable to the level of diligence we can reasonably ask of the news-reading public.
(Of course, this doesn’t stop “disinformation” crusaders and tech platforms from urging the public to exercise greater diligence in their news consumption and sharing habits. But I have a pretty low opinion of any argument that amounts to, “if only each individual in this large collective would resist every powerful incentive and behave more virtuously on an individual level, we could solve this problem.”)
Unlike your clickbait-sharing uncle on Facebook, scientists are informed specialists who are supposed to read and understand the papers they use in their own work, and to form a reasonable assessment of the quality of that prior work. To put it in AI terms, Galactica assumes there is a human in the loop — and not just any human, but a human with a Ph.D. in the area that the AI is offering suggestions about.
My point: If randos and bad actors really can use Galactica to invent fake science that other scientists will then just blindly use and cite without even bothering to check anything about it (who wrote it, their institutional affiliations, if the paper even makes sense, etc.), doesn’t that mean science is already screwed?
It seems to me there are two ways a Galactica-powered science grifter could introduce reams of fake science into our scientific supply chain:
By getting the fake science past peer review. In which case… hello? What even is the point of peer review, then? (I know… I know… the practice has many problems. The question is almost rhetorical, here.)
By pushing the fake science to arXiv.org, where it’s published without peer review and cited by other scientists.
Leaving aside the problem of peer review (the current sorry state of it, and whether it was ever a real guarantee of scientific quality), I want to say that if spamming arXiv.org with fake papers is really what we’re worried about, then, folks, I have questions.
Are working scientists really citing arXiv.org papers from total strangers without even doing basic diligence on the authors’ credentials?
Or, is there no way to verify that the authors listed were involved with the paper, so even if the credentials could check out you’re still using fake science without knowing it?
Or maybe the incentives in science are so screwed up that we’re worried that actual working, credentialed scientists will just start pushing up a bunch of fake science to get their numbers up?
What I’m really suggesting here is that any successful attack on the scientific supply chain that depends entirely on spamming publication venues (peer-reviewed or not) with fake science would seem to demonstrate that science is already massively compromised — that there are no real filters or guardrails in place.
And now, the plot twist: Science is already massively compromised and there are no filters or guardrails in place, as is evidenced by the fact that fake science is already everywhere.
The replication crisis is real and has been widely acknowledged since about 2010, but nobody even knows how bad it is. Many suspect it’s very, very bad.
Fake scientific papers are a major problem in science, with “paper mills” churning out reams of fake papers on demand. Scientists buy these papers to burnish their credentials. Again, nobody knows how big the problem really is, though insiders suspect it’s huge.
Fake peer reviews, where papers’ authors essentially “review” themselves, are another big problem.
Some scientific papers now have fake authors, where the name of an author who may not even exist has been added to the paper.
The picture is grim, especially when ideally there should be zero fake science in any venue that legitimate scientists rely on for inputs to their work. None. And yet not only are all the venues apparently plagued with it, but nobody really knows how much of it there is out there.
That’s kind of what cracks me up about the criticism that Galactica is dangerous because bad guys might use it to make more fake science. The victim is already drowning, and instead of pulling him out of the water entirely, you’re running around trying to stop people from pouring more water on him. He might drown even more!
This whole thing almost makes me hope for a WallStreetBets-style grassroots effort to just totally and finally discredit the scientific apparatus by flooding it with AI-generated garbage. Just like the WSB goons mooned AMC to highlight the absurdity of a financial system that’s obviously rigged by big players, some group of Galactica users could pump venues full of fake science to highlight the fact that nobody should be relying on any of them because the quality filters are so degraded.
Don’t do this, by the way. This would be bad. I’m just venting. No need to, as I said, pour water on the drowning guy. Instead, we should fix the incentives.
How about we start with the simple, widely understood fact that when you reward quantity and ignore quality, you create a market for counterfeits? I suspect everyone knows this is the issue, but there’s a massive coordination problem that prevents everyone from coming together to fix it. If only we had guilds, or organizations, or a government apparatus or something that could coordinate a solution. Maybe one day, in a better world.
Appendix: Threads of big mad
The first fifteen minutes or so of the video below will walk you through a sampling of the caterwauling and Twitter haterism that accompanied Galactica’s launch, but before you watch it I offer you a trigger warning: if the sight of academic and industry insiders with platinum-plated institutional affiliations taking up torches and pitchforks like a bunch of medieval villagers on the trail of some witchcraft makes you despair for humanity, then don’t watch this.
Kilcher does a capable job of swatting down a lot of the sillier critiques, so watch that whole first section if you can stomach it.
What Kilcher’s lowlights reel exposes is, of course, the other big crisis in science that I left implicit in this post — the fact that so many working scientists and researchers are giant babies who throw these insane Twitter tantrums and won’t even answer any questions or respond to pushback. They just want to dunk and chase clout, and they don’t care a bit about the collateral damage they’re doing to the institutions they represent.
Given the current oversupply of scientists and the resulting publish-or-perish pressure, Galactica probably has the most value as a tool for replication, error checking, and automation of meta and synthesis studies.
"1. The audience for news is predominantly made up of people who are themselves in the business of trying to synthesize news articles together to make novel news articles, articles that will then go on to advance the state of human knowledge in some specialized area."
Is there a 'not' missing from this sentence?