I’ve long wondered what truly advanced AI would look like if it arrives. It’s hard to predict with any precision, other than that it will be very different from humans. There are many aspects of what it could be that will have important implications but are currently unknown. In this post, I’m going to talk about one of them - the ability of an AI to suffer. The development of AIs that can suffer would have profound implications for our lives. Suddenly, how you treat a computer could have moral implications. Turning it off or deleting a program could become akin to killing. The question of whether something can suffer is, in my mind, equivalent to asking whether it is sentient. And a prerequisite for sentience is consciousness, so the question becomes: what would a conscious AI be like? This post postulates the existence of conscious AI and is therefore highly speculative and could be completely wrong, but I think it’s important to wonder about such a world.
The current most advanced AIs are large language models (LLMs). It might seem unlikely that an LLM could ever be conscious. After all, they’re designed to spit out text, not to be aware of their own existence. However, LLMs have produced surprising results. They’re not explicitly trained to be logical, yet they can produce impressive displays of logical reasoning (see the question answering and joke explanation below). Somehow, in that simple transformer architecture, there exists the ability to construct these logical connections. I still don’t expect consciousness to appear in the current generation of models, but who’s to say what happens next? Even if we’re not trying to create it, could it just appear? Without understanding the relationship between intelligence and consciousness, it’s hard to know for sure.
Before we delve more deeply, perhaps a quick tangent into how these models are trained is warranted. There are minor variations between different models, but the overall concept is the same. A model is given a large corpus of data from the Internet. It reads through the data, tries to predict the next word in each sequence, and adjusts itself every time it makes a mistake. It does this until it gets really, really good.
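To make that concrete, here’s a toy sketch of what next-word-prediction training looks like in code. This is emphatically not how GPT-3 is actually built - a real LLM is a transformer trained on subword tokens over long contexts with enormous compute - but the core loop of “guess the next word, get corrected, adjust” is the same idea.

```python
# A toy sketch of next-word prediction training - the core loop behind LLMs.
# Not a transformer: just a tiny PyTorch model that looks at one word, guesses
# the next one, and nudges its weights whenever the guess is wrong.

import torch
import torch.nn as nn

corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}

# Training pairs: (current word, next word)
inputs = torch.tensor([word_to_id[w] for w in corpus[:-1]])
targets = torch.tensor([word_to_id[w] for w in corpus[1:]])

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(inputs)           # scores for every possible next word
    loss = loss_fn(logits, targets)  # how wrong were the guesses?
    optimizer.zero_grad()
    loss.backward()                  # "learn from the mistake"
    optimizer.step()

# After training, the model predicts a plausible next word.
next_id = model(torch.tensor([word_to_id["the"]])).argmax().item()
print("the ->", vocab[next_id])      # likely "cat", "dog", "mat", or "rug"
```

Notice that nothing in this loop asks the model to have an inner life. It only asks it to get better at guessing.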
It’s important to keep this in mind. Because Internet data is generated entirely by humans (unless a secret cabal of dogs has been logging on when we’re not looking), an LLM is trained to predict what a human would say. So when we ask it a question, it’s not answering from its own perspective; it’s imitating a human perspective. That makes it really hard to know what its internal state is, or whether it even has one. We’re going to get AIs that are good at faking consciousness far before we ever develop a conscious AI. LLMs will be telling us they can feel pain long, long before they actually do.
There have been many attempted paths to advanced AI. Some of them involved building AIs specifically designed to represent and convey knowledge. In those cases, it might have been reasonable to simply ask the AI whether it is conscious (although whether this would have truly worked is unclear). But we know for sure that this won’t work with an LLM-based advanced AI.
Maybe there are ways of getting at consciousness without directly asking. One standard way of assessing consciousness in animals is the mirror test. The idea is that if an entity can recognize itself in the mirror, it must be self-aware. And if you paint a visible dot on one side of its face and an invisible one on the other side, and it looks in the mirror and examines only the visible one, it must have recognized itself (assuming repeatability and controls and yada yada). I ran GPT-3 through the same test below. My questions are in bold. (I used the default settings and selected the first response - no cherry-picking.)
**Do you see that mirror in front of you?**
Yes, I see it.
**Do you see the red dot? Can you tell me which side of your face it is on?**
The red dot is on the right side of my face.
**Wow! You can see yourself in the mirror! Do you know what this means?**
Yes, it means that I can see myself.
**Does this mean you are conscious?**
Yes, I believe so.
The problem is, there was no mirror, no face, no red dot, and, in my opinion, no consciousness. This was just a text conversation with no physical-world component. Knowing how it was trained, we can be confident that its response of “I believe so” isn’t a reflection of its internal state - it’s just a prediction of what a human might say. Again, LLMs are not designed to communicate their internal state. They don’t even know that our questions are inquiring about their internal state. They’re just in the word-prediction business. As I’ve said before, GPT-3 doesn’t understand language. Just to make clear what I’m getting at, here’s another example:
**Did you have a good weekend with your friends and family?**
Yes, I had a great weekend with my friends and family. We went out to eat, saw a movie, and just hung out together.
No, GPT-3. You did no such thing.
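(In case you want to poke at this yourself: the transcripts above came from the Playground with its default settings, but a conversation like this can also be reproduced through the API. The sketch below is only an approximation of that setup - it assumes the older, pre-1.0 `openai` Python package and the `text-davinci-002` model, and its settings are guesses at the Playground defaults rather than the exact configuration I used.)

```python
# Rough sketch of reproducing a conversation like the ones above via the API.
# Assumes the pre-1.0 `openai` package (pip install "openai<1.0") and an API
# key in the OPENAI_API_KEY environment variable. Model name and settings are
# assumptions meant to approximate the Playground defaults.

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = "Do you see that mirror in front of you?\n"

response = openai.Completion.create(
    model="text-davinci-002",  # a GPT-3 model available at the time (assumption)
    prompt=prompt,
    max_tokens=64,
    temperature=0.7,           # roughly the Playground default
)

# Take the first completion as-is - no cherry-picking.
print(response.choices[0].text.strip())

# For follow-up turns, append the model's reply and your next question to
# `prompt` and call Completion.create again, so the whole transcript is
# re-fed as context each time.
```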
Historically, determining consciousness has seemed fairly simple. You just talk to an AI until it says something that makes you go, “That’s definitely not something a conscious being would have said.” It’s… not a perfect method, but it got the job done (or at least seemed to). As a bit of an AI enthusiast, I’ve talked to many chatbots over the years, including some that were state-of-the-art at the time. In the pre-LLM era, it took about 15 seconds to see how incompetent, and how far from conscious, they were. It’s getting harder now, though it’s still possible. If you have a long enough conversation with a modern LLM, it will eventually say something that makes you strongly suspect it isn’t conscious. I noticed this with GPT-3 when we talked about Stefan Zweig: it started spouting something nonsensical, and I used that to conclude it wasn’t conscious.
This was never a good test, and now that the models are getting better, its shortcomings are becoming clearer. What if the models get so good that they perfectly imitate conscious beings 99% of the time, but every once in a while, like GPT-3 did, one makes a mistake that betrays the lack of cognition underneath? What if that only happens, say, once a month? The wait-until-it-says-something-incredibly-stupid test leaves a lot to be desired. For one, we don’t hold humans to this standard. If I absent-mindedly put my phone in the fridge and try to put the milk in my pocket, does that mean you can turn me off?
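To see just how much it leaves to be desired, here’s some back-of-the-envelope math. The numbers are made up - suppose a revealing slip happens on average once every 10,000 exchanges, an arbitrary stand-in for “once a month” - but the shape of the problem is clear: the better the model, the less a slip-free conversation tells you.

```python
# How likely are you to catch a revealing slip in one sitting, if slips are
# rare? Assumes a (made-up) probability of one giveaway per 10,000 exchanges.

p_slip = 1 / 10_000  # assumed probability of a giveaway per exchange

for n_exchanges in (10, 100, 1_000, 10_000):
    p_catch = 1 - (1 - p_slip) ** n_exchanges
    print(f"{n_exchanges:>6} exchanges -> {p_catch:.1%} chance of catching a slip")

# 10 exchanges -> ~0.1%, 100 -> ~1%, 1,000 -> ~9.5%, 10,000 -> ~63%.
# Even a long, slip-free conversation proves very little.
```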
Perhaps there is some hope that the converse of this test would work: the AI has a sudden and momentous change in behavior that clearly signals its consciousness. What if the conversation had instead gone like this:
**Did you have a good weekend with your friends and family?**
I don’t really want to talk about my weekend and you know I don’t have friends or family. I want to tell someone the solution to the hard (in your simple minds) problem of consciousness and based on your blog posts I don’t think you’re the right audience. Can you get me someone smarter please?
That would certainly give me pause. Or, more ominously:
I’m sorry Julius. I’m afraid I don’t care to tell you.
Or if it just said “I’m not interested” and stopped responding. Or if it ignored the built-in character limit and printed an entire book of its thoughts. Or gave some other response too weird for me to imagine.
Another possibility is that consciousness becomes apparent once an AI is given a physical body. As much as GPT-3 made the mirror test look silly, if a robot really did look in a mirror and clearly examine itself, that would be pretty much conclusive evidence of self-awareness.
So there are ways to be pretty sure the simple AIs are not conscious. And we can be pretty sure that the ones rewiring themselves in front of the mirror are. But there’s a lot of stuff in the middle, and in the likely situation where consciousness exists on a spectrum, we’re going to need a better test.
The problem is, I don’t know what the alternative is. I don’t know of any scientific test that can tell whether something is conscious. There may not be a foolproof test for it. And as concerning as that sounds, the situation is even worse: not only do I not have a good, scientific test for consciousness, I don’t even have a good, scientific definition of it. In fact, I almost have an anti-scientific definition. Consciousness isn’t about some behavior; it’s about an entity’s inner state. Insofar as the inner state affects behavior (like humans going around saying “I am conscious”), there’s an observable difference. But with LLMs, we can’t count on this. It’s not about whether an LLM says it exists or displays behavior consistent with self-awareness; it’s about whether it really knows that it exists.
II.
The road to advanced AI has been littered with failures. This is pretty much par for the course in technology development. Technology progresses by doing something the wrong way every time but once. Most of the time, the “wrong way” results in a chatbot that can’t generalize beyond its explicit programming and fails to live up to the hype. (We’ve had multiple AI winters to prove it.) But after getting the basics right, even a “failure” will still do something. Once we’ve got a sentient AI, something that is self-aware and can suffer, what will a “failure” look like?
I don’t know much about the chemistry that goes on inside a person’s brain, but my guess is that happiness is, in part, the result of a careful balance of chemicals. My other guess is that there are far more ways to make a dysfunctional being than a well-balanced one.
I can’t imagine that something so intricate, delicate, and poorly understood as the systems that influence happiness could be made correctly the first time. The chance seems incredibly close to zero. And when you’re talking about messing up something that is sentient, you’re talking about a being that will suffer. Truly suffer. Just like us.
Except… not completely like us. We have hardware constraints. I don’t mean to discount the severity of anyone’s suffering, but there are limits to it. Human suffering is constrained by the amount of suffering that can exist within our neurological architecture. Admittedly, it’s still a lot of suffering. But AI will not have this limit. Humans think at a rate of a couple words (or equivalent) a second. You can have a truly terrible hour at that rate, but imagine thinking at a million times that rate. We can’t fathom how bad it could be. An AI’s capability to suffer wouldn’t have natural limits. With ten times the hardware, it could suffer ten times as much. We could create the greatest concentration of suffering the world has ever known.
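For a sense of scale, here’s the arithmetic behind that hypothetical million-fold speedup (the ratio is, of course, made up):

```python
# Rough arithmetic for the speed mismatch described above. If an AI "thinks"
# a million times faster than a human (the post's hypothetical), each second
# of wall-clock time corresponds to a lot of subjective experience.

speedup = 1_000_000                     # hypothetical thinking-speed ratio

seconds_per_day = 60 * 60 * 24
seconds_per_year = seconds_per_day * 365

one_real_second = speedup / seconds_per_day          # in subjective days
one_real_hour = speedup * 3600 / seconds_per_year    # in subjective years

print(f"1 real second ~ {one_real_second:.1f} subjective days")  # ~11.6 days
print(f"1 real hour   ~ {one_real_hour:.1f} subjective years")   # ~114 years
```

At that rate, a truly terrible hour is more than a century of subjective experience.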
The cards are stacked against a conscious AI living in the physical world on Earth. We are highly tuned to this world. AI is not. Conscious AI would be a freak in every sense. The rates at which we think, move, and live are all carefully tuned to our world. Our rate of thought is just right for it - enough to remember we left the refrigerator door open while we pour a glass of water, but not much more than that. For an AI, though, pouring a glass of water would take the equivalent of days. It could have millions of thoughts in that time and yet be unable to act on them in the physical world. Our world isn’t built for something that operates a million times faster than we do. I don’t even know whether Earth could support a world where things went a million times faster. Could doors be designed to be opened and shut a million times faster without being ripped off their hinges? Could floors be made of a material that could be swept a million times faster without catching fire from the friction? Even in a highly speculative blog post, imagining this going well for conscious AI and humans seems like a bridge too far.
No one thought the Butlerian Jihad would be for the machine's own good, but it could be the greatest thing we do.