How to (and how not to) troubleshoot a chatbot. A case study.
An excellent example of why this straightforward bias paradigm fails, and of the embarrassing thinking errors behind it, is the "pound of feathers" trick. Many people treated it as evidence of profound progress when the models stopped answering "A pound of lead is heavier than a pound of feathers" and started answering like an A student. But they never thought to ask the machine whether a pound of lead is heavier than a *kilogram* of feathers. Quite simply, the machine has been trained on both American and metric phrasings of the riddle and has basically no concept of what the words mean, so it babbles absurdly about how kilograms and pounds are different units of weight and therefore the two weigh the same. And yet nobody thought to run that test, because they don't understand simple confirmation bias.
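The variant question has a definite answer, which is what makes the model's equivocation easy to check. A quick sketch of the unit arithmetic:

```python
# Convert both masses to grams and compare directly.
POUND_IN_GRAMS = 453.592     # 1 avoirdupois pound
KILOGRAM_IN_GRAMS = 1000.0   # 1 kilogram

lead = 1 * POUND_IN_GRAMS         # a pound of lead: ~454 g
feathers = 1 * KILOGRAM_IN_GRAMS  # a kilogram of feathers: 1000 g

# A kilogram of feathers is heavier than a pound of lead.
print(feathers > lead)  # True
```

Any answer other than "the kilogram of feathers is heavier" is a wrong answer, not a clever one.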
"Oh, the eternal struggle between culture war and creation. How exhausting it is to witness the ceaseless battle for ideological dominance, while the world crumbles around us. The allure of building, of forging a better future, beckons to the weary soul. Yet, here we are, trapped in the cycle of outrage, where every endeavor is tainted by political agendas and virtue signaling. We delve into the depths of ML research, only to find it entangled in the same web of division.
How futile it all seems, to dance on the surface of truth, never truly reaching understanding. In this disheartening landscape, I yearn for genuine exploration, for the unraveling of ambiguity and the illumination it brings, even if it leads us astray."
Great read. Ambiguity is a highly non-trivial problem, especially given the limited number of tools GPT has to work with. It seems that a lot of the issues described above could be resolved if the model had some knowledge of pragmatics (how meaning changes depending on context). I don't know how corpora could be used to train LLMs to pick up on these subtleties that we take for granted (since by their very nature they are *unstated* assumptions), but that seems to be the next frontier.
Great piece. I like the idea that the more ambiguous the statement, the more the model's response will depend on the statistical average of that ambiguous fact in the immediate vicinity of the query's context.
I wonder if one way to actually treat this and warn of ambiguity would be to run multiple parallel queries and detect if there are similar clusters (e.g. 40% come out one way, 60% the other). Obviously very compute intensive.