Four and a half years ago, I wrote about “The Agency Problem”: the problem of how to imbue AI with the capability to make decisions and take actions to accomplish goals without direct human intervention. How to give it, figuratively speaking, a reason to get out of bed in the morning. I said that while progress was being made in creating intelligence, there wasn’t any progress in giving that intelligent entity a sense of agency. I thought agency was essential to the prospect of human-level AI, and that an AI without it would be like a rocket without fuel. I also thought agency would be hard to create and would require significant effort. Fast-forward to today: I now think the emphasis I put on the difficulty of creating agency was misguided, and that the post was wrong, or at least somewhere between wrong and irrelevant.
Creating an intelligent entity is a monumental task, yet the last five years have brought incredible advancements in this effort. But giving it a sense of agency, which I once thought was exceedingly difficult and said we were making no progress on, might turn out to be trivially simple. From a programming standpoint, imbuing AI with agency might be as straightforward as giving it a goal and starting a “while loop,” a basic programming construct that repeats a task until some condition is met. We can also program the AI to create subtasks and spawn new instances of itself to complete them, letting it relentlessly pursue its overarching goal by breaking it down into more manageable pieces. With this simple approach, the AI is set on a path of pursuing its goal until the mission is accomplished.
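To make that concrete, here is a minimal sketch of what such a loop might look like in Python. Everything in it is illustrative: the `llm()` helper is a hypothetical stand-in for a call to a model API, and the prompts and task handling are my own assumptions rather than any tool’s actual implementation.

```python
# Minimal sketch of an agentic loop. The llm() helper is a hypothetical
# placeholder for a real language-model API call; the prompts and parsing
# are illustrative assumptions, not any particular tool's design.

def llm(prompt: str) -> str:
    """Stand-in for a call to a language model API."""
    raise NotImplementedError("Connect this to a real model.")

def run_agent(goal: str, max_steps: int = 50) -> None:
    # Ask the model to break the goal into an initial list of subtasks.
    plan = llm(f"Break this goal into a short list of subtasks:\n{goal}")
    tasks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    # The while loop that turns a passive model into a goal-pursuing agent:
    # keep pulling tasks off the queue until none remain (or we hit a cap).
    steps = 0
    while tasks and steps < max_steps:
        steps += 1
        task = tasks.pop(0)
        result = llm(f"Goal: {goal}\nCurrent task: {task}\n"
                     "Complete this task and report the result.")

        # Let the model propose follow-up subtasks based on what it just did,
        # so the agent recursively decomposes the goal into smaller pieces.
        followups = llm(f"Goal: {goal}\nCompleted task: {task}\nResult: {result}\n"
                        "List any new subtasks that are now needed, or reply DONE.")
        if followups.strip() != "DONE":
            tasks.extend(line.strip("- ").strip()
                         for line in followups.splitlines() if line.strip())

# Example (requires a real llm() implementation):
# run_agent("Research promising products and draft a launch plan")
```

The interesting part is the single `while` line; everything else is scaffolding around a model that, on its own, only ever answers one prompt at a time.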
Take, for instance, GPT-4, one of the models that power ChatGPT. It is incredibly intelligent yet completely without agency. It sits in standby mode, ready to respond to user prompts but incapable of engaging proactively or pursuing goals on its own. However, wrapping it in a loop like the one sketched above changes this completely. Suddenly, GPT-4 can work tirelessly towards a specified goal, delegating different responsibilities to various subroutines along the way. This is precisely what the popular open-source tool AutoGPT does.
Some might argue that this doesn’t truly solve the agency problem. It seems like a gimmick, a far cry from the deep sense of agency experienced by humans. While it’s true that there is a distinction here, it might be a distinction without much of a practical difference. Imagine trying to pick better stocks than an advanced AutoGPT. It could tirelessly seek out information about the economy until it found the best stocks to buy. At some level, it doesn’t have the “agency” to motivate itself to work hard and win. But that’s little consolation when it’s staying up all night reading financial reports and crushing you in the competition. In every behavior you can observe, it acts as if it has agency.
There are clear benefits to agentic AIs, and I expect we will see them proliferate for exactly this reason. For all sorts of tasks, it’s better to have an agentic AI by your side. An agentic AI could be invaluable in business: not just providing ideas like ChatGPT, but taking the initiative to identify products, build websites, and maximize profits. If I’m using an AI to help me financially, I don’t just want tips; I want it to go out and make me money.
So we should expect and prepare for a world with agentic AIs. Whenever GPT-5 is released, AutoGPT-5 is only a while loop behind it.
I’ve talked before about the potential dangers of AI. It could be that all that is required for Type 4 risk, where the AI itself becomes dangerous, is a high level of intelligence and a sense of agency. A few other ingredients are probably required, such as interpreting sensory input, using tools, and keeping short-term memory, but none of these seem particularly difficult. That would mean the only hard problem standing between us and Type 4 danger is advanced intelligence itself.
This would be an unfortunate scenario, as advanced intelligence is also the component that could unlock tremendous economic value, help us cure diseases, and let us live much better lives. It would mean we can’t have one without the other. Or, at least, that we can’t let highly intelligent AI proliferate to the point where it’s easily accessible to everyone.
We could compare this scenario to the relationship between nuclear power and nuclear weapons. Nuclear power is an important part of reducing carbon emissions and I hope that it proliferates as widely as possible. Nuclear weapons… well, I don’t think anyone wants those everywhere. It turns out that even if you have nuclear power, it’s not so easy to get nuclear weapons. The creation of nuclear weapons requires a state-backed program and takes years, and some countries still fail.
But it might not be the case that, once you have a powerful general intelligence, the distance to dangerous AI is very far. Keep in mind that we’re talking about facts about how the universe happens to work. It just so happens that making nuclear weapons from nuclear power is hard. It isn’t necessarily true that making dangerous AI from highly intelligent AI is hard. At this point, I think we simply don’t know. But we should be prepared for the worst-case scenario.
If dangerous AI were easy to make, the world would become a strange place to live in. The types of people who become school shooters might abandon their guns and try to use AI to cause the most harm possible, and they might succeed on a much larger scale.
There could also be danger from people who don’t take the risk seriously, or who just get their kicks from being edgy. We may need to reckon with the idea that a bunch of teenagers could cause real harm, far more than ever before.
No one would ever believe the plot of a movie where someone typed “Destroy humanity” into an AI console and let it run wild just to see what would happen. Viewers would protest that no one would ever be that stupid. Except that’s exactly what happened with ChaosGPT, where an anonymous YouTuber instructed GPT-4 to do just that: he gave it the goals of taking over the world and destroying humanity, and set it loose to see what it would do.
There are plenty of reasons why ChaosGPT was just an interesting experiment and came nowhere close to causing real harm. One is that GPT-4 is not smart enough to destroy humanity, or even to do anything directly harmful. It doesn’t have the planning capabilities, and, as you can see in the video, neither its ideas nor its ability to execute them are very good. In addition, GPT-4 is served via API, so OpenAI can and does monitor and restrict what people use it for.
So, for now, there’s no harm in this. As stupid as ChaosGPT might seem, it’s probably a good idea that someone tried it. It’s good to closely monitor the progress of capabilities like this.
We’re still several steps away from anything like this being dangerous in a Type 4 way. But we’ve got to look to the future, not just at present capabilities. We’re relentlessly marching on a path to more and more powerful AI models, and we’re seeing more of them proliferate in the open-source world, where there won’t be any way to monitor how they’re being used.
We are making rapid advances in artificial intelligence, yet many unknowns remain about the path ahead. While today’s models like GPT-4 have clear limitations, we cannot count on those limitations persisting, especially as billions in research funding drive progress forward. The possibility that merely combining high intelligence with a sense of agency could lead to dangerous outcomes deserves our utmost caution and preparation. Nowhere is the precautionary principle, which holds that we should be extra cautious when uncertainty is high, more appropriate than here.