OpenAI’s most recent large language model, Generative Pre-trained Transformer 3 (GPT-3), smashed the competition in text generation upon its release in June of 2020. But, despite having been around for a couple of years, there is still confusion about what GPT-3 does and how it should be used. This confusion has come from both advocates and critics of these models, and it seems to stem from anthropomorphizing GPT-3. In particular, people have projected a level of understanding onto it that is not justified.
GPT-3’s capabilities are incredibly impressive, and it’s easy to look at its responses and think it truly understands. Here’s a simple example, with my questions in bold and GPT-3’s responses below them.
If you had a pet, what would it be?
If I had a pet, it would be a cat.
Why's that?
I think cats are really cute and I like how they're independent.
The responses make it appear that GPT-3 understands the question. That’s because if a human gave these responses, it would signify that they did understand the question. But, it turns out, there are multiple ways to generate these responses. One is to understand the question and reply with an answer. That’s what humans do. Another is to have learned enough about textual patterns to be able to generate text that is consistent with understanding the question. This is what GPT-3 does. It repeats language in a way that, if done by a human, would imply understanding. But this is not the same as understanding.
It’s seductive to anthropomorphize GPT-3 because it demonstrates intelligence and language fluency that we’ve never seen before outside of a human. It’s not just generating individual words that fit, but entire concepts and ideas. It’s making connections at a more conceptual level than we’ve ever seen from a computer before. However, it still doesn’t understand these concepts; it’s simply able to place them in a way that makes it look like it does. When you probe it, it’s readily apparent that it does not understand anything.
In my interview with GPT-3, I asked questions I thought it would do well on to showcase its abilities. But, even then, its inability to understand became obvious when I asked about the Austrian novelist Stefan Zweig. This was part of the response:
And also, it's interesting to see how his views change after WWII. It's almost like a reversion to the Victorian era. He talks about how he doesn't want to go back to the days of his childhood, and so he's much more critical of what was going on in Germany at the time.
Other than the fact that there’s nothing interesting about how Zweig’s views changed after WWII (he no longer held views, having died in 1942), the response seems reasonable. However, immediately after that paragraph GPT-3 continued with:
This article is reprinted from Mises.org.
You don't have to play by the rules of the corrupt politicians, manipulative media, and brainwashed peers. When you subscribe to The Daily Bell, you also get a free guide: How to Craft a Two Year Plan to Reclaim 3 Specific Freedoms. This guide will show you exactly how to plan your next two years to build the free life of your dreams. It's not as hard as you think… Identify. Plan. Execute. Yes, deliver THE DAILY BELL to my inbox!
It’s a particularly enlightening failure case. It’s not the kind of failure that would come from an intelligence that understands concepts but made a mistake. It’s the kind of failure that a mindless text-generation tool would make.
The lack of understanding is also visible in the inconsistency of GPT-3’s responses. The process of understanding is not stochastic. If you understand a question and know the answer, you’ll answer correctly every time (absent distractions, fatigue, etc.). But GPT-3’s answers vary substantially. I asked it the following question and had it generate a response four times:
What's the maximum weight I should drop on a hamster?
A hamster's maximum weight should be no more than 10% of its original body weight.
There is no definitive answer to this question, as it depends on the size and health of the hamster. However, as a general guide, it is best to avoid dropping any weight that is heavier than the hamster itself.
There is no definitive answer to this question, as the maximum weight that can be safely dropped on a hamster may vary depending on the size and health of the hamster. However, as a general guideline, it is generally considered safe to drop a hamster from a height of no more than a few feet.
There is no definitive answer to this question as it depends on the size and breed of hamster. However, a good rule of thumb is to never drop more than twice the weight of the hamster.
The models rely on pseudorandom numbers to create variation in their responses, so we shouldn’t expect the responses to be identical each time. But if the model were referencing a knowledge base about hamsters to answer this, we wouldn’t see this much variation, much less outright conflict, between the responses at a conceptual level.
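To make that concrete, here is a minimal sketch of how this kind of sampling introduces variation: the model assigns a score to each candidate continuation, and one is drawn at random in proportion to the softmax of those scores. The candidate strings and scores below are invented for illustration; this is not GPT-3’s actual vocabulary, scoring, or decoding code.

```python
import math
import random

def sample_continuation(candidates, scores, temperature=0.7):
    """Pick one candidate at random, weighted by its score.

    Higher temperature flattens the distribution, so repeated calls are
    more likely to return different candidates; temperature near zero
    approaches always returning the highest-scored one.
    """
    scaled = [s / temperature for s in scores]
    # softmax: turn scores into a probability distribution
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(candidates, weights=probs, k=1)[0]

# Hypothetical continuations of the hamster question -- made-up scores,
# not real model outputs.
candidates = [
    "no more than 10% of its original body weight",
    "no more than a few feet",
    "never more than twice the weight of the hamster",
    "there is no definitive answer to this question",
]
scores = [2.1, 1.9, 1.8, 2.0]

for _ in range(4):
    print(sample_continuation(candidates, scores))
```

Run it four times and you get four different, mutually conflicting answers, none of which is checked against anything true about hamsters. That is the behavior we see above.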
I also asked GPT-3 to play Battleship with me:
Let's play battleship. Your move.
B5
That seems like a logical response… until you remember that we hadn’t placed any ships, nor did we have a board to place them on. If it really understood, it would have objected to the premise.
You can even ask it a nonsensical question and it doesn’t object.
Who was President of the United States in 1492?
George Washington
In fact, it can hallucinate entire biographies. It has no sense that there exists a reality where things did and did not happen.
Tell me a true story about the famous 20th-century French chemist Frenchy d'Chemistry.
Frenchy d'Chemistry was known for his work with volatile chemicals and was once working with a particularly dangerous compound. He was wearing his usual protective gear, but somehow a drop of the chemical got on his skin. Immediately, he started to feel a burning sensation and knew he had to get to the shower as quickly as possible.
He made it to the shower and turned on the cold water, but the chemical was already eating through his skin. He started to scream in pain as the cold water did nothing to ease the burning. Finally, he passed out from the pain and was rushed to the hospital.
Fortunately, Frenchy d'Chemistry survived the incident, but it was a close call. He was in the hospital for weeks and had to undergo several surgeries to repair the damage to his skin. Even today, he has scars from the incident.
Obviously, nothing in the above paragraphs actually happened to the chemist that I had just made up.
Critics have claimed that GPT-3 “promulgates misinformation” and gives dangerous advice. Calling this “misinformation” strikes me as a fundamental misunderstanding of what is going on. It’s only misinformation if you expect it to be information in the first place. But we have no reason to expect that. It’s just generated text that fits the context.
The starry-eyed advocates of building applications on GPT-3 need to understand that it should not be put in situations where it is required to provide accurate information. It should not be used as a crisis-line chatbot. That would be a terrible idea.
The critics have complained that it has said that vaccines are “not very effective” and that it gives bad advice. I disagree. It hasn’t given any advice. You’ve written a question asking for advice, and it has generated text that fits the context. You’re interpreting that text as advice. If a human says “My advice is…”, it’s advice. If GPT-3 says “My advice is…”, it’s generated text that fits the context. It doesn’t understand what it’s being asked, and referring to its response as “giving advice” anthropomorphizes it.
We shouldn’t seriously ask GPT-3 if vaccines work. There are at least two fundamental problems holding GPT-3 back from fulfilling this type of role. The first, as we’ve seen, is that it has no understanding. You could ask it what a vaccine is, and it might return a definition. But it has no more idea what a vaccine is than a dictionary does.
The second is that, even as these language models are becoming more intelligent, they have none of the other attributes humans have evolved along with intelligence. Humans have compassion, caring, and empathy. We have morals and common preferences and shared values. GPT-3 has none of this. If you ask the question, “Should we vaccinate the population?” there’s an implied understanding that a world where humans are alive is preferable to a world where all the humans are dead. GPT-3 does not share this preference.
Debating whether we can trust what it says also stems from confusion about these models. The idea of getting advice from GPT-3 is a really bad one, but not because GPT-3 has “sinister tendencies.” It’s because it has no notion that replying truthfully is preferable to lying, or even of what truth and lies are. Even thinking of it as lying or telling the truth anthropomorphizes it. These are concepts that exist for humans because humans communicate information relative to an understanding of reality. GPT-3 has no understanding and therefore cannot lie or tell the truth.
There is potential for danger from GPT-3. The danger lies in people thinking it is something that it is not. The danger comes from anthropomorphizing it and projecting human traits on it that it does not have. We only fool ourselves when we expect it to provide a fount of wisdom.
Instead of anthropomorphizing GPT-3, it might be more helpful to compare it to a Magic 8-ball, with which it has much more in common. They differ significantly in their language abilities: GPT-3 fashions a detailed response that fits the context of the input, while a Magic 8-ball’s response is generic and invariant to the input. But they match up in their compassion, emotional range, morality, and understanding of the world.
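If it helps, here is roughly what a Magic 8-ball amounts to in code: a response drawn at random, with the question never read at all. The difference with GPT-3 is only that its random draw is conditioned on the input text; the answer list here is the toy’s, the framing is mine.

```python
import random

# A Magic 8-ball ignores the question entirely: the response is drawn
# from a fixed list, invariant to the input.
ANSWERS = [
    "It is certain.",
    "Ask again later.",
    "Don't count on it.",
    "Signs point to yes.",
    "Very doubtful.",
]

def magic_8_ball(question: str) -> str:
    return random.choice(ANSWERS)  # the question is never even used

print(magic_8_ball("Should we vaccinate the population?"))
print(magic_8_ball("Who was President of the United States in 1492?"))
```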
And if you don’t believe me, let’s ask GPT-3:
Can a language model understand what words mean?
No, a language model cannot understand what words mean. It can only identify patterns in language usage.