Stochastic parroting / a tool is a tool is a tool
Paris Marx (Tech Won’t Save Us) talks with Prof. Emily M. Bender
Against the backdrop of hype by the creators of large language models and the fawning rah-rah of the mainstream media, it was refreshing to listen to Paris Marx (Tech Won’t Save Us) and Emily M. Bender discuss what it means to say that ChatGPT is a “stochastic parrot,” why Elon Musk is calling to pause AI development, and how the tech industry uses language to trick us into buying its narratives about technology.
Emily M. Bender is a professor in the Department of Linguistics at the University of Washington and the Faculty Director of the Computational Linguistics Master’s Program. She is also the director of the Computational Linguistics Laboratory.
There is a distinction in linguistics between form and meaning — syntax and semantics — though syntax here covers not only grammar but also statistics-based word prediction.
EB: Meaning is use. But use isn’t just distribution in text. Use is embedded in a social context. It’s embedded in communicative intent. But these language models, GPT-2, GPT-3, et cetera — their only training data is the form. The only thing they have access to is the distribution of word forms in text. I wrote a paper which was the result of having interminable Twitter arguments with people about this.
PM: It’s a good way to inspire some work.
EB: Yeah, man, frustration papers!
So, I guess I started working on it in 2019. It was published in 2020. This is with Alexander Koller, where we basically just lay out the argument for why meaning isn’t the same thing as form. And therefore something trained only on form is only going to get form. It’s not going to get meaning. Even if the similarities between word distributions can tell you a lot about similarities between word meanings, it’s still not going to get to meaning, it’s not going to get to understanding. And so that paper came out at ACL in 2020, and it’s the one with the octopus thought experiment in it.
Here is the paper’s octopus section (© 2020 Association for Computational Linguistics):
In order to illustrate the challenges in attempting to learn meaning from form alone, we propose a concrete scenario. Say that A and B, both fluent speakers of English, are independently stranded on two uninhabited islands. They soon discover that previous visitors to these islands have left behind telegraphs and that they can communicate with each other via an underwater cable. A and B start happily typing messages to each other.
Meanwhile, O, a hyper-intelligent deep-sea octopus who is unable to visit or observe the two islands, discovers a way to tap into the underwater cable and listen in on A and B’s conversations. O knows nothing about English initially, but is very good at detecting statistical patterns. Over time, O learns to predict with great accuracy how B will respond to each of A’s utterances. O also observes that certain words tend to occur in similar contexts, and perhaps learns to generalize across lexical patterns by hypothesizing that they can be used somewhat interchangeably. Nonetheless, O has never observed these objects, and thus would not be able to pick out the referent of a word when presented with a set of (physical) alternatives.
At some point, O starts feeling lonely. He cuts the underwater cable and inserts himself into the conversation, by pretending to be B and replying to A’s messages. Can O successfully pose as B without making A suspicious? This constitutes a weak form of the Turing test (weak because A has no reason to suspect she is talking to a nonhuman); the interesting question is whether O fails it because he has not learned the meaning relation, having seen only the form of A and B’s utterances.
The extent to which O can fool A depends on the task—that is, on what A is trying to talk about. A and B have spent a lot of time exchanging trivial notes about their daily lives to make the long island evenings more enjoyable. It seems possible that O would be able to produce new sentences of the kind B used to produce; essentially acting as a chatbot. This is because the utterances in such conversations have a primarily social function, and do not need to be grounded in the particulars of the interlocutors’ actual physical situation nor anything else specific about the real world. It is sufficient to produce text that is internally coherent.
Now say that A has invented a new device, say a coconut catapult. She excitedly sends detailed instructions on building a coconut catapult to B, and asks about B’s experiences and suggestions for improvements. Even if O had a way of constructing the catapult underwater, he does not know what words such as rope and coconut refer to, and thus can’t physically reproduce the experiment. He can only resort to earlier observations about how B responded to similarly worded utterances. Perhaps O can recognize utterances about mangos and nails as “similarly worded” because those words appeared in similar contexts as coconut and rope. So O decides to simply say “Cool idea, great job!”, because B said that a lot when A talked about ropes and nails. It is absolutely conceivable that A accepts this reply as meaningful—but only because A does all the work in attributing meaning to O’s response. It is not because O understood the meaning of A’s instructions or even his own reply.
Finally, A faces an emergency. She is suddenly pursued by an angry bear. She grabs a couple of sticks and frantically asks B to come up with a way to construct a weapon to defend herself. Of course, O has no idea what A “means”. Solving a task like this requires the ability to map accurately between words and real-world entities (as well as reasoning and creative thinking). It is at this point that O would fail the Turing test, if A hadn’t been eaten by the bear before noticing the deception.
Having only form available as training data, O did not learn meaning. The language exchanged by A and B is a projection of their communicative intents through the meaning relation into linguistic forms. Without access to a means of hypothesizing and testing the underlying communicative intents, reconstructing them from the forms alone is hopeless, and O’s language use will eventually diverge from the language use of an agent who can ground their language in coherent communicative intents.
The thought experiment also illustrates our point from §3 about listeners’ active role in communication. When O sent signals to A pretending to be B, he exploited statistical regularities in the form, i.e. the distribution of linguistic forms he observed. Whatever O learned is a reflection of A and B’s communicative intents and the meaning relation. But reproducing this distribution is not sufficient for meaningful communication. O only fooled A into believing he was B because A was such an active listener: Because agents who produce English sentences usually have communicative intents, she assumes that O does too, and thus she builds the conventional meaning English associates with O’s utterances. Because she assumes that O is B, she uses that conventional meaning together with her other guesses about B’s state of mind and goals to attribute communicative intent. It is not that O’s utterances make sense, but rather, that A can make sense of them.
EB: The Stochastic Parrots paper came about because Dr. Timnit Gebru approached me, asking if I knew of any papers that sort of brought together the risks of ever larger language models. And I said, no, but here’s a few things I can think of off the top of my head. And then the next day, I said, hey, that looks like a paper outline. You want to write this paper?
And so that’s how that started. We were basically reacting to the way that these companies just wanted to make them bigger and bigger and bigger and sort of saying, maybe it’s time to stop and think about what the risks are instead of just barreling down this path. And one of the risks that we identified ... was that if you have a coherent seeming, plausible sounding text, people are going to fall for that. They’re going to think it actually reflects some reasoning, some thought, some knowledge of the world when it doesn’t. And so that’s in there, actually, I think, in the section called Stochastic parrots, and boy, did that start happening....
It’s not even lying because lying entails some relationship to the truth that just isn’t there.
The idea with the phrase “stochastic parrots” was to basically just give a cute metaphor that would allow people to get a better sense of what this technology is doing. And it’s honestly unfair to parrots. We’re drawing here really on the English verb “to parrot,” which is to repeat back without any understanding.... So “stochastic” means randomly but according to a probability distribution. The idea is that these systems stitch together words from their training data because they fit in terms of word distribution and not because of any model of the world or any communicative intent or any reasoning. It’s not even lying because lying entails some relationship to the truth that just isn’t there...
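Bender’s gloss of “stochastic” — random, but weighted by a probability distribution over observed word sequences — can be illustrated with a toy bigram sampler. This is a deliberately tiny sketch (the corpus and the function name `parrot` are invented for illustration, and real language models are vastly larger and more sophisticated), but it captures the core point: the output is stitched together from training-data statistics alone, with no model of the world behind it.

```python
import random
from collections import defaultdict

# A toy "stochastic parrot": it records which word follows which in its
# training text, then generates by sampling from those observed counts.
# It has no world model and no communicative intent.
corpus = "the cat sat on the mat the cat saw the bird".split()

# Bigram table: for each word, the list of words seen following it.
# Duplicates are kept, so sampling is weighted by observed frequency.
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

def parrot(start, length, seed=None):
    """Generate up to `length` words by repeatedly sampling a next word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = following.get(words[-1])
        if not options:  # dead end: this word was never seen mid-corpus
            break
        # "Stochastic": the choice is random, but distributed according
        # to how often each continuation appeared in the training data.
        words.append(rng.choice(options))
    return " ".join(words)

print(parrot("the", 6, seed=0))
```

Every word the sampler emits is a plausible local continuation, which is exactly why the output can look coherent while meaning nothing.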
There’s this wonderful, weird paper by Reddy from 1979 about the conduit metaphor, and he says if you look at how we talk about language we talk about it as a conduit — “I’m having a hard time getting my ideas across.” For example, there’s this notion that the words store the meaning, carry it through to somebody else who unpacks it, or stay put in a library where there’s this storage of ideas that you could then go retrieve from the conduit. And he says, that’s not actually what it is at all. And a whole bunch of other research in pragmatics and language acquisition backs this up: when we use language, what we’re doing is creating very rich clues to what it is that we’re trying to convey. The person understanding is creating a whole bunch of hypotheses about what our intentions are, what we believe to be true about the world, et cetera, et cetera, and then using those clues in that context to guess what it is we must have been trying to say.
In order to understand it, we almost have to posit a mind behind it and then it’s really hard to remember that that mind is fake.
So, if that’s how we understand language, and then we encounter some language that started off in a different way, it started off just from this text synthesis machine — in order to understand it, we almost have to posit a mind behind it and then it’s really hard to remember that that mind is fake.
I think there’s two directions I’d like to go from there. One is that because we now have these large language models that have taken in so much training data across so many domains, they can output plausible sounding text in just about any domain. It seems like if we don’t yet have a robo lawyer, we’re this close to having one. Or if we don’t yet have a robo mental health therapist, we’re this close to having one. Because we have something that can produce plausible text in all those domains. And that’s a really dangerous moment because the tech solutionism, or I like Meredith Broussard’s phrase technochauvinism, would like us to believe that that’s possible. And then here’s this technology that can put out language that sure looks like evidence for it. So, there’s some danger there.
And then when you’re talking about how this is language that is intrinsically human.... That connects for me to a serious regulatory wish list item that I have, which is accountability. I would like to have it be set up that if anybody is creating a chatbot or a text synthesis machine and putting it out there in the world, then the organization that is doing that should be accountable for what the thing says. And I think that would change things in a big hurry....
One takeaway is that OpenAI is brilliant at getting the general public to do their PR for them. The whole ChatGPT interface basically just set up this ... real groundswell of buzz for this product that was free for them. I mean, they had to pay for the compute time, but that’s it, right?...
So, one of the things about very large data sets is that people like to assume that because they’re big, they must be representative. And the Internet’s a big place. It seems like everybody’s there, so let’s just grab stuff off the Internet and that’ll be representative and that will be somehow neutral. There’s this position of “we just took what was naturally occurring, so we have no responsibility for what’s here.” Well, it’s never neutral, right?
In (I think it’s section 4 of) Stochastic Parrots, we actually go through step by step to show how the data that’s collected is likely to overrepresent the views and positions of people with privilege.... We then looked at what we knew about where the data was coming from, which by the way, for GPT-4 is zero. They apparently for safety (in big scare quotes) have said they’re not going to say what the data is, which is just absurd. I think it’s safety for the company’s bottom line and nothing else. But for GPT-3, which is what we were writing about in Stochastic Parrots, we had some information and one of the main sources was websites that were linked from Reddit. Not Reddit itself, but the sites that were pointed to from there. And the participation in Reddit is overwhelmingly male, probably overwhelmingly white, and so on.... So it’s not neutral, it’s not representative, and it is going to skew hegemonic.
Last month (on 22 March 2023), tech and thought leaders signed an open letter, authored by the Future of Life Institute (a longtermist operation), calling for a six-month pause, or a moratorium if required, on developing large language models beyond GPT-4. The letter argues that this tech is simply too dangerous to pursue headlong without first thinking through, collectively, what it is we are doing.
EB: This letter is proposing a six-month moratorium on something rather ill defined.... And it’s written from this point of view of “these things are maybe on the verge of becoming autonomous agents that could turn evil and destroy the whole world.” It very much is of a piece with the whole longtermism stuff that Émile Torres and also […] have been doing great work exposing. And there’s like one or two okay ideas in there about like, we need regulation around transparency. Yes. But then a whole bunch of stuff about like, we need to make sure that we develop these to be — long list of adjectives ending with “loyal” — which is just hugely misplaced.
Is your hammer loyal? Is your car loyal?
This is a tool. It’s technology. Is your hammer loyal? Is your car loyal? That doesn’t mean anything. But it’s part of this narrative of “AIs are separate autonomous thinking agents that are maybe now in their infancy, and so we have to nurture them and raise them right, and then they become things where we can displace accountability to those things instead of keeping the accountability where it belongs.”
So, yeah, the letter was really infuriating, I have to say, and certainly got a lot of attention. One of the things that I found particularly annoying about it: they cite the Stochastic Parrots paper and they cite it — let me get their exact words because this is really frustrating. First line of the letter: “AI systems with human competitive intelligence can pose profound risks to society and humanity, as shown by extensive research — footnote one.” And our paper is the first thing in that footnote. We were not talking about AI systems. We certainly did not claim that the systems we were talking about have something like human competitive intelligence. We were talking about large language models. And yes, we show that there’s harms to that, but that’s what we’re talking about. We’re not talking about autonomous AI agents because those things don’t exist....
Take a look at the initial signatories and the authors of that AI pause letter. They are not people who are experiencing the short end of the stick in our systems of oppression right now. And so they would rather think about this imaginary Sci-Fi villain that they can be fighting against rather than looking at their own role in what’s going on in harms right now....
The Stochastic Parrots authors wrote a response to the AI pause letter that’s up on the DAIR Institute website.... It seemed like it would be efficient to have just a joint statement that we could then point people to and that has been helpful. And one of the reactions we’ve gotten to that is people who signed the letter telling us that we are squandering the opportunity that the letter introduced by creating the appearance of infighting. Don’t you kind of want some of the same things?
I have quite a few responses to that. One of them is if your very framing relies on AI hype, then you’re already causing harms and I’m not going to get on board. Another one is, do you really think that I have something in common politically with Elon Musk? A third point is if the so-called AI safety people were really interested in working together, then they would not pretend, like, “oh no, now it’s a problem. Now we have to worry about this.” They would instead go and build on the work of the people who’ve been doing this.
And so I want to point to the Algorithmic Justice League with Joy Buolamwini. I want to point to the AI Incident Database, which is this great project that’s collecting these examples. It’s not exhaustive because we don’t hear about all of them but the known ones, and then a whole bunch of books.... So, there’s this huge literature here with people doing really brilliant work. And if the AI safety people say we should get along and fight for common goals, well, come read, come learn from the people who’ve been doing this. And don’t just pretend that you can come in and columbus the whole issue....
I really want folks to be proud of our humanity and stand up for it and not fall for that.
I think one part that we haven’t touched on is just how much dehumanization is involved here on many, many levels.... And one of them is this idea that if we’re going to call large language models “stochastic parrots,” well, maybe people are stochastic parrots, too. And what I see there is somebody who so desperately wants this thing they’ve created to actually be artificial general intelligence, which is not defined, but it’s supposed to be something like human intelligence, that they’re going to minimize what it is to be human, to make them the same. And I guess I really want folks to be proud of our humanity and stand up for it and not fall for that....
I don’t like the term AI. There’s a wonderful replacement from Stefano Quintarelli to call it SALAMI. As soon as you say SALAMI instead of AI, everything sounds wonderfully ridiculous. Right? “We’re going to give you this SALAMI powered tool. Does this SALAMI have feelings?” It’s wonderful. And there’s an additional layer to that for Italian speakers, apparently, because to call someone a salami in Italian is to say they are not very smart.
Case in point: Emily M. Bender, “Google CEO peddles #AIhype on CBS 60 Minutes.” (Note added on 19 April 2023)
A useful link by Ethan Mollick about using “AI” as a tool.
Remember Dennett telling us that we are all zombies?