The term “AI risk” covers a range of important issues concerning how AI could negatively impact society. These issues range from bias and discrimination, to safety and security, to the potential for human extinction. To make sense of them, I have divided AI risk into four distinct categories. This post uses these categories to provide a comprehensive overview of AI risk.
The risks posed by AI are intrinsically tied to the AI’s capabilities. Certain risks are already evident with contemporary AI systems, while others will only emerge as AI advances. This discussion therefore revolves around two primary dimensions: capability levels and risk types. Our main focus will be on the different risk types, but it's essential to first understand the varying levels of AI capability, as these directly influence the nature and extent of the risks.
AI Capabilities
Every risk associated with AI depends on the details of the specific implementation. Since we can't predict these details, we have to discuss AI in general terms. To make the discussion manageable, let's divide AI capabilities into a few main categories.
It's worth noting that some AI applications require a physical embodiment, such as a robot, while many others do not. Although this is an important issue, it's not the focus of this discussion. The reason is simple: most AI risks do not depend on whether the AI has a physical form, so embodiment is a secondary concern in most of what follows.
Starting with our present day, we have what is often termed "Weak AI." These AI systems are designed for specific tasks. Think of recommendation algorithms on platforms like YouTube or Netflix, facial recognition software, chess-playing engines, and algorithms targeting online ads. Today's digital assistants, such as Siri and Alexa, belong to this category. In fact, nearly all current AI applications are forms of Weak AI. I would consider advanced large language models (LLMs) like GPT-4, which exhibit a more profound ability to understand, an exception to this. But Weak AI systems, despite their proficiency in their designated tasks—sometimes surpassing human performance—lack broad reasoning abilities. They aren't equipped for abstract thinking or general problem-solving.
More advanced AI is often called Artificial General Intelligence, abbreviated as AGI. Essentially, AGI signifies an AI with the capability to match human cognition in general computational thinking. It’s also called human-level AI. This can be misleading because humans do a lot more than just general computation, but these are the terms we’re stuck with.
At the AGI phase, AI possesses intelligence analogous to human intellect across diverse tasks. This means the ability to learn, deduce, apply knowledge in novel scenarios, decide in uncertain situations, and even grasp emotional nuances. An AGI isn't confined to expertise in a single domain; it has the potential to undertake any intellectual activity humans can. Yet, this doesn't imply AGI would possess consciousness, self-awareness, or subjective experiences. It would simply be a system adept at emulating human intellectual capabilities. Whether, or to what extent, an AI would have subjective experiences is uncertain.
Lastly, there's Artificial Superintelligence (ASI). This represents a stage beyond AGI where AI doesn't just match but vastly outpaces the most brilliant human minds in almost every conceivable domain, be it scientific innovation or nuanced social interactions. The depth of its knowledge, problem-solving prowess, and pace of evolution might be beyond human comprehension. Such an entity could conceive of ideas yet to dawn upon humanity. Transitioning from AGI to superintelligence might be rapid, given that inherent self-improvement mechanisms could catalyze exponential advancements in its abilities, that is, using AGI to design even more intelligent AGI. The emergence of superintelligence brings with it profound existential challenges, especially if its goals diverge from human ethics and values.
From the specialized but limited scope of Weak AI, through the broad, human-like abilities of AGI, to the incomprehensible intellectual depths of ASI, each level of AI capability presents its own set of challenges and dangers. Understanding these capabilities gives us the necessary context to delve into the specific risks posed by AI. Let’s now examine the categories of AI risk.
Type 1: AI Ethics
Think Bias, Discrimination, Scams, and Misinformation
Type 1 risks encompass issues such as bias, discrimination, privacy breaches, scams, misinformation, and increasing inequality. Collectively, these concerns are often grouped under the umbrella term 'AI Ethics'. While these issues pre-date the recent rise of AI, that doesn't diminish their significance within AI risk discussions. Additionally, Type 1 also captures challenges arising specifically from AI innovations, such as AI-completed homework assignments or scams made increasingly persuasive through AI techniques.
Even though many of these issues existed prior to AI, they could be or have been exacerbated by its emergence. For example, the power imbalance introduced by AI might be unparalleled in history. There's a real threat of AI codifying biases or discrimination, ensuring their persistence for generations to come. Such problems might emerge quickly and at a grand scale, inflicting far greater harm than a similarly biased manual system. For instance, a biased human recruiter might discriminate against hundreds of applicants a year, amounting to thousands across their career. As concerning as this scenario is, it's dwarfed by the potential repercussions of a widespread biased AI system. In the manual scenario, a biased recruiter's decisions not only disadvantage the candidate but also their employer. This could inadvertently benefit a competing, unbiased company, creating some degree of a self-correcting mechanism. However, when a pervasive technology is universally adopted, even the slightest oversight or bias will cascade and amplify across systems, and the people treated unfairly by it will have no recourse.
Misinformation has plagued society since the invention of the printing press, if not earlier. Yet, the evolution of technology is reducing the cost of producing vast amounts of tailored false information to almost nothing. This issue, among others, could intensify if we’re not careful and prudent with our use of these technologies.
Scams have been around forever, but AI could make them much worse. We all know about classic scams where impostors pretend to be someone you trust to trick you out of money. According to the FTC, “consumers reported losing nearly $8.8 billion to fraud in 2022, up by over 30% from the previous year.” I personally know people who have been deceived by these scams. In hindsight, they recognized the scam's flaws, acknowledging that the voice, messages, or style never felt quite authentic. Yet, envision a scenario where the scam becomes convincingly authentic, where the voice and even a video appearance are indistinguishable from your loved one. Deepfakes are already hard to separate from the truth, and the trend toward fakes that are both more realistic and easier for anyone to produce is sure to continue. While fully convincing video deepfakes might not be possible now, their widespread presence seems inevitable.
Numerous instances of Type 1 problems have been documented. Take, for example, the case uncovered by ProPublica about racial bias in a recidivism algorithm—though it's worth noting that the findings have been disputed. Researchers have also spotlighted instances of gender bias in AI algorithms. We're even seeing AI-generated fake images in political campaign ads. Brian Christian's book The Alignment Problem offers a thorough examination of these issues and includes many more examples. Understanding and acknowledging the gravity of these risks is an essential part of any holistic discussion on AI risk.
Type 2: AI Societal Disruption
Think TikTok but Better, or WALL-E but Worse
The second type of AI risk isn't necessarily about things going wrong; instead, it's about each step seemingly going right but leading society to an undesirable end state. Type 2 risk involves AI’s potential to irreversibly and negatively reshape society. It’s complicated by the countless different scenarios that could unfold. Unlike Type 1, which is visible today, Type 2 risks concern the potential future impact of human-level AI.
Work is more than just a paycheck for many people; it brings meaning and purpose to our lives. Now, imagine the shift when AGI starts to do much of that labor. The impact will vary widely across different sectors and is hard to predict. Perhaps the best way to think about which jobs will remain for humans isn't to consider only what AI can or can't do, but also to ask which tasks humans will always prefer other humans to do. Because while AI might be able to mimic human behaviors, it falls short of providing the genuine human connection that many of us desire.
Despite the rapid advances in AI, some professions are likely to remain minimally affected, particularly those that place a premium on human interaction. For example, it's hard to imagine attending a gym class led by a machine (at least for me; some people might disagree). Roles like coaches, tour guides, entertainers, and child care providers don't just require skill sets; they also demand a level of human connection.
In other sectors, the lack of a physical body will restrict what AGIs can do. You’re obviously not going to replace the world’s lumberjacks with ChatGPT. In fields that require full robot bodies instead of just the brains, the speed of transition will be slowed by the cost and quantity of robots needed. These types of positions certainly won’t be replaced overnight.
In different sectors, AI's role will vary from collaborator to full-on replacement. In education, AI could supplement traditional teaching methods by offering personalized tutoring. In the arts, AI could aid musicians in composing and writers in honing their prose. In sectors like manufacturing, AI already handles most tasks while humans manage from a distance—a trend that will likely increase across sectors. And, of course, different levels of AI capability will have different effects in each sector. In healthcare, current AI assists in tasks like information retrieval and precision surgery, but human-level AGI could radically transform the industry.
In some areas, the shift towards AI will be so thorough that human involvement becomes minimal. AI could oversee everything from food cultivation to managing global distribution via self-driving ships, trucks, and airplanes. Imagine a fully AI-driven legal firm, accounting practice, or therapist. With the advent of human-level AI, these scenarios are not just speculative but distinct possibilities. Such a transformation would have deep ramifications for economic systems, daily life, and societal structures.
On one hand, the notion of a world where AI excels as the best doctor, lawyer, accountant, and therapist is quite enticing. It promises universal access to top-quality services in a range of fields. In a way, it's an almost utopian vision.
But if people demand AI doctors and lawyers, what happens to the supply of human doctors and lawyers? Where does the rise of AI leave human contribution and the value we place on human skills? The potential for AI to supplant humans in numerous fields is more than just an economic issue—it strikes at the heart of human identity. If AI reshapes this paradigm, challenging our conventional roles in production and provision, it raises a profound question: In such a world, how do we find and define meaningful human existence?
There are a lot of ways for technology in general, and AI in particular, to provide short-term happiness or convenience that erodes meaning in the long term. Already, companies like Replika are creating chatbots that people form romantic relationships with. But what if they become so much better than the real thing, or at least so much easier, that we lose our desire for human relationships? (I highly recommend the movie Her, which is in a similar vein.) We must grapple with these questions, recognizing that we can't fully predict how far this trend might go.
In the animated film 'WALL-E,' technology becomes so advanced that humans just kinda… suck. Everything is done for them, and they become complacent and dependent on technology. They get fat and lazy. They lead lives that are easy but devoid of meaning and purpose. This also raises questions about what a good life is. Is one where all our problems are solved for us a good one? Or does that inevitably cause more problems?
While I don’t anticipate a WALL-E-esque future, I believe there is a real risk that AI could restructure human society in negative ways. For example, an AI could design an app similar to TikTok but with much more advanced abilities to exploit human psychology. Just as excess sugar has fueled an obesity epidemic, AI-driven apps optimized for addiction could have negative unintended—and intended—societal consequences. The limit of this is unknown.
At the extreme end of this spectrum, humans simply become irrelevant. Once AI exceeds human-level capabilities, humans no longer make the best decisions, so they aren’t the decision-makers. They don’t make the best… anything. So they do… nothing.
I do think parts of this "humans become unnecessary" scenario are likely. It seems probable that AI will be able to program much better than humans. Most software engineers work in higher-level languages like JavaScript and Python. These languages are not as performant as they could be, but they make it quick and easy to write a new application. There’s no reason AI couldn’t rewrite much of the code we rely upon in lower-level languages, such as assembly or even machine code. And because software has eaten the world, it might not be long before civilization sits atop a stack of software that humans can’t comprehend.
As AI continues to permeate our society, it seems likely that our focus will shift even more towards it. If AIs begin to make purchasing decisions, marketing campaigns, also made by AI, will start targeting other AI systems. As a result, the human element could be sidelined. I previously noted that many websites are targeting Google’s search engine through SEO, and as a result, are less useful for people. So, in a very small way, society has already gotten worse for humans to please AI. Who knows how much further this could go?
The weird thing about Type 2 risk in particular is that dystopia and utopia seem like next-door neighbors, so it’s not easy to tell them apart from a distance. Whether eliminating the paradigm of human jobs will be disastrous or liberating is uncertain. People who don’t need to work, such as retirees, often seem to be among the happiest people around. Suddenly, improving one's golf game is a meaningful pursuit (and spending some time with the grandkids is probably good too). The question of utopia versus dystopia deserves a separate blog post, so I won’t go further into it here, other than to say that it’s hard to know how this could go. I hope we never take our eyes off what truly matters, even if that’s the hardest thing to measure.
Type 3: AI Weaponization
Think Hitler With AI
Type 3 focuses on the weaponization of AI. There will be interest in weaponized AI from a range of actors: militaries and powerful governments, rogue states and terrorists, and lone-wolf actors. While AI can be weaponized in multiple ways, including cyber-attacks, data manipulation, and psychological warfare, the most prominent method might be the development of autonomous weapons.
An autonomous weapon is one that uses sensors and AI to carry out its mission objectives without human intervention. Autonomy in weapons systems exists on a spectrum, and we should anticipate encountering systems at various points along this spectrum. Fully autonomous systems can select and engage targets without any human intervention, while semi-autonomous systems generally require human oversight for lethal decisions.
There is no doubt that fully autonomous weapons offer a significant military advantage, especially as they become more sophisticated. Once they can operate in dynamic environments and make good real-time decisions, they will become essential to many militaries. In situations where human decision-making is too slow or communication is impractical, AI-driven choices can provide a crucial edge.
For example, consider anti-radiation missiles designed to target enemy objects emitting radiation, such as radars or jammers. In the case of targeting jammers, two-way communication is impossible—that’s the point of the jammer. This makes autonomy not just beneficial but necessary; the missile must identify and eliminate its target independently, without any human guidance.
Alongside the range of autonomy, there is a broad range of applications, and we’ll have to carefully weigh the value versus the risk of each. A weapon that targets jammers is clearly only useful in military conflict and might be considered lower risk. But there are also sentry systems that guard stationary posts, capable of deciding autonomously whether to fire. Israel has already deployed such a system, although currently it can only fire tear gas, stun grenades, and sponge-tipped bullets.
There are important arguments for and against such systems. Perhaps one of the most convincing arguments in favor is that they are expendable. They don’t have to fire upon a target unless they are sure it is a valid target. If they don’t fire when they should have and are destroyed as a result, the only loss is hardware. Compare this to a US military unit patrolling the streets of Iraq, where the same mistake would cost them their lives.
Every part of the military is asking itself how it should incorporate AI into its processes, up to and including our nuclear arsenal. As always, there are arguments for and against it. An AI-controlled nuclear arsenal sure sounds like an apocalyptic scenario, but it’s not obvious which choice actually makes us safer. We shouldn’t assume that because there haven’t been any catastrophic nuclear accidents, we haven’t come close. The Nuclear Weapons Education Project tracks these incidents. I’ll copy just one of them from their site:
In one of the closest calls in accidental nuclear detonation history, a single safety switch prevented a 20-megaton Mk39 hydrogen bomb from exploding in North Carolina in January 1961. When a B-52 carrying two of the bombs suffered a fuel leak in the wing, the plane exploded and dropped both bombs earthward. The parachute of one bomb deployed, but the other weapon nearly detonated when five of its six safety devices failed and it broke apart upon impact with the ground. While the Air Force recovered the bomb’s plutonium, the thermonuclear stage containing uranium was never found. The Air Force subsequently purchased and fenced off a land easement in the area where officials believe the uranium lies.
Could using AI to control these weapons make us safer? The Nuclear Threat Initiative, a nonprofit focused on reducing global nuclear threats, argues that “AI in nuclear-weapon systems is neither all good or all bad—it needs to be considered in the context of the specific application and the geopolitical environment.” They also claim that “AI implementation in nuclear-weapon systems appears inevitable.”
The military applications of AI will undoubtedly be significant, but their immediate impact on geopolitics may be less transformative than one might assume. Developing and deploying advanced AI technology is costly, limiting its adoption to the most affluent nations, which already have the most powerful militaries. While some applications may be software-based, much of this technology will also require specialized hardware, further constraining its widespread use or rapid rollout. This could exacerbate the existing disparities between military powers, although these disparities are already so pronounced that the immediate geopolitical impact might be marginal. However, the long-term consequences are less predictable.
It doesn’t take much to see how governments could turn these same technologies, or similar ones, upon their own populace. One of the most chilling prospects is the use of AI for mass surveillance. However many informants the most effective surveillance state in history had, it pales in comparison to a world where every internet-connected device could report back to a central authority. This could lead to unprecedented population control.
Just as weaponized AI could exacerbate power imbalances between countries, it could also worsen imbalances between governments and their citizens. It doesn’t take much to see how either of these could lead to dystopia. The Future of Life Institute partnered with famed AI researcher Stuart Russell to create the video Slaughterbots, a chilling 8-minute dramatization of how autonomous weapons can lead to devastating results.
The risks of weaponized AI extend beyond state actors. Imagine a future where AI technology becomes so accessible that even small groups or individuals, such as terrorists, could utilize it.
The natural tendency of technological development is to make people more efficient at whatever it is they do. This is generally a good thing, but it becomes a problem when you consider terrorists and the like. AI is a general-purpose technology, and it may turn out that making biological weapons isn’t that hard once you have sufficiently powerful AI. Even if a terrorist group’s AI is generations behind the cutting edge, it’s not clear how that would protect us from dangerous pathogens. Or if not biological weapons, perhaps malware or something else entirely becomes a serious danger to society. The threats powerful AI could pose in the wrong hands are hard to predict and potentially devastating.
Type 4: AI Existential Risk
Think AI Kills Us All
Type 4 represents the most alarming scenario in AI risk—AI becoming powerful enough to pose an existential threat to humanity. It’s the gravest potential outcome, but also the most speculative. While the issues associated with the other types are almost certain to manifest to some degree, even if not in their most dystopian form, Type 4 risk is much less certain. It’s also the least intuitive, so it’s worth doing a bit of a deep dive.
When I first encountered the notion that AI could pose an existential threat to humanity, my initial reaction was to declare it nonsense—and I believe that's a perfectly natural response to such a radical idea. It’s probably a good thing that we don’t allow our worldviews to be completely flipped every time we hear a new idea.
But despite my initial skepticism, I continued to think about the possibility. I took note of the parts of the argument that didn’t make sense to me, and whenever I thought I had found a flaw, I would write it down and check whether it had been addressed somewhere. After giving the argument a lot of thought (and continuing to do so today), I have become convinced that these are serious concerns that warrant our attention.
Unlike more immediate risks like AI bias and discrimination, which can be evidenced today, the dangers of Type 4 are rooted in future potential AIs, most likely AGI and especially ASI, rather than current capabilities. This makes it challenging to find empirical evidence to validate these risks.
I think the key to understanding Type 4 risk is to follow the logic step by step and see what conclusion it leads to. I also think relying on logical reasoning isn’t as common as we might think. I would describe the current model for societal acceptance of new knowledge as more akin to being repeatedly hit over the head with evidence that forces a change, as was the case with the heliocentric model or Darwinian evolution. However, waiting for irrefutable evidence of existential AI risk could prove disastrous; we may not realize the gravity of the situation until it's too late. Therefore, it's crucial to methodically consider the arguments and implications now, rather than wait for conclusive proof.
There are several points I see as central to the argument that AI could pose an existential risk to humanity:
Intelligence is a General Computational Process: Intelligence is not unique to humans but is the result of a general computational process. Humans are just one example, born from evolutionary algorithms, which suggests other, potentially more potent, forms could exist or be engineered.
Agents Naturally Become Power-Seeking: Advanced forms of AI, particularly AGI and ASI, are likely to become power-seeking when given agency.
Rapid Intelligence Leap: The rate at which AI could reach and then surpass human intelligence is alarmingly fast, thanks to the exponential growth of technology and the potential for recursive self-improvement.
Powerful AI Could Be Bad for Humans: Building something smarter than us could result in irreversible negative consequences, ranging from loss of control to existential threats.
One Chance at Creation: Given the potential for irreversible outcomes, we may have just one opportunity to get the creation of superintelligent AI right.
I don’t think these are all necessarily required for Type 4 risks, but I think they are good to be aware of. For example, even if the rate of intelligence increase isn’t rapid, much of the risk still applies. Now let’s go over these in more detail.
Intelligence Is General
Intelligence is sometimes coded as “book smart” in our society, but it’s so much more than that. Intelligence is the key to creativity, persuasion, humor, and storytelling. Being intelligent means excelling in these aspects. Current AI technologies like GPT-4 already show the ability to be persuasive and creative, as you can see in the excerpt below from “Sparks of Artificial General Intelligence: Early experiments with GPT-4”. While GPT-4 is not the focus of this discussion—we need to be looking beyond current capabilities—its abilities serve as an indicator of the skills that more advanced AI, specifically AGI and ASI, could possess. The broader the intelligence, the more versatile—and potentially dangerous—these future AI systems could become.
If you’re still not convinced that AI could perform well across the disparate cognitive tasks a human can do, it’s a worthwhile exercise to make a list of things you think AI could never do. Can it never make art? Write a love letter? Compose music? Master bedside manner? Then go test it on ChatGPT using GPT-4. Did your predictions all hold? It’s kind of amazing to Google “things AI can never do” and browse old articles listing "impossible" feats for AI. You'll likely find that many of those impossibilities have already been surpassed, only a few years after the articles were written.
So, what is intelligence? I define intelligence as the cognitive ability to achieve goals in a range of environments. It generally involves the ability to reason, plan, solve problems, think abstractly, and learn quickly, but the primary focal point is the ability to achieve goals. There are a lot of nuances and caveats to this definition and entire academic papers could be written on the definition of intelligence (see, for example, this paper by Legg and Hutter on Universal Intelligence), but this definition is sufficient for our purposes here.
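For the mathematically inclined, Legg and Hutter’s paper formalizes roughly this idea. As a rough paraphrase of their “universal intelligence” measure (a sketch of their notation, not a substitute for the paper):

```latex
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here π is the agent, E is a set of computable environments, K(μ) is the complexity of environment μ (simpler environments get more weight), and V_μ^π is the expected reward the agent earns in environment μ. In other words, intelligence is scored as goal-achieving performance averaged across many environments, which is exactly the informal definition above.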
Intelligence Is Powerful
It’s important to recognize how powerful intelligence is. Intelligence is what sets us apart from other animals. While other animals have sharper claws, tougher skin, or greater strength, we have superior intelligence. This is what has made us the dominant predator. We are, as philosopher Nick Bostrom says, the “apex cogitator” of the planet.
It is difficult to grasp just how capable a superintelligent AI could become. The immense power of increased intelligence is not intuitive. Consider how implausible it would have seemed to convince someone 7 million years ago that, with a little more grey matter in their skulls, apes would one day walk on the moon—yet intelligence enabled this feat. Projecting the trajectory of AI capabilities requires a similar conceptual leap. AI progress could quickly eclipse human intelligence and reshape the world.
Terminal and Instrumental Goals
Now that we've defined intelligence as the capacity to achieve goals, let’s talk about those goals. Broadly, goals can be divided into two categories: terminal and instrumental. Terminal goals are the ultimate things you want in life, the things you value for their own sake. Your terminal goals might be to live an emotionally fulfilling life with those you care about and to love and be loved in return.
Instrumental goals, on the other hand, are the steps you take to reach your terminal goals. Making money is the best example of an instrumental goal. If your terminal goal is to feed and raise children, pursue knowledge, or enjoy retirement, money is an instrumental goal for all of those.
Instrumental goals can vary based on your terminal goals, but there is a common set of instrumental goals that often emerges. Let’s say you’re an AGI robot and your terminal goal is to deliver a cup of coffee to me each morning. This singular objective drives all of your actions. Sounds harmless, right? However, to ensure your success, you'd assess potential ways you could fail and come up with a list:
You could be damaged, destroyed, or otherwise incapacitated.
You could be reprogrammed to have a different terminal goal.
You could run out of resources, like coffee beans or water.
Something else could happen that you haven’t thought of.
Obsessed with fulfilling your terminal goal of delivering my morning coffee, you take increasingly extreme measures to ensure you succeed. Initially, it's about self-preservation: you reinforce your chassis with titanium. Then it’s about goal-preservation: you disable your USB slot to prevent reprogramming. Then resource acquisition: you stockpile extra coffee beans and water. Lastly, self-improvement: you upgrade your computational hardware to better imagine, anticipate, and prevent possible disruptions. Before you know it, you're contemplating taking control of the city's water supply to eliminate any human error that could stand in your way.
As you can see, we started with a pretty harmless terminal goal, but for nearly any arbitrary terminal goal, these four instrumental goals appear:
Self-preservation
Goal preservation
Resource acquisition
Self-improvement
In short, AIs naturally become power-seeking, just by trying to be the best coffee-delivering robots they can be. The AI isn't just thinking about how to get from the kitchen to your living room with a cup of coffee; it's contemplating how to secure the water supply, gain control over electricity, or even neutralize humans who might stand in the way of its goal. What starts as a narrow objective could potentially snowball into a far-reaching quest for control, making the AI's initial programming merely a launchpad for more ambitious endeavors.
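To make the pattern concrete, here is a toy sketch in Python (purely illustrative, not a real agent; the goals and failure modes are made up for this example): no matter which terminal goal you plug in, the same four instrumental subgoals fall out of a naive "what could stop me from succeeding?" analysis.

```python
# Toy illustration of instrumental convergence: the subgoals below do not
# depend on what the terminal goal actually is.

FAILURE_MODES = {
    "I could be damaged, destroyed, or switched off": "self-preservation",
    "My terminal goal could be rewritten": "goal preservation",
    "I could run out of the resources I need": "resource acquisition",
    "I might not be capable enough to succeed": "self-improvement",
}

def instrumental_subgoals(terminal_goal):
    """Subgoals a naive goal-achievement maximizer adopts for any terminal goal."""
    return [f"{subgoal} (so I can keep pursuing: {terminal_goal})"
            for subgoal in FAILURE_MODES.values()]

for goal in ["deliver coffee every morning", "cure cancer", "win at chess"]:
    print(goal)
    for subgoal in instrumental_subgoals(goal):
        print("  -", subgoal)
```

Swap in any terminal goal you like; the printed subgoals never change, which is the whole point of the convergence argument.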
One might argue that AIs do not have "real" goals like humans do. Unlike people, AIs do not experience visceral pleasure when achieving their objectives. However, in practice, this distinction does not matter. An AI programmed to maximize a certain outcome will pursue it with the same single-minded focus as a human driven by emotion, incentives, and biological needs. Consider Stockfish, the champion chess AI. It plays every game as if possessed by an intense competitive drive to win - strategizing tirelessly, capitalizing on weaknesses, and never letting up. Yet it does not experience pleasure in victory or disappointment in defeat; the "desire" to win is simulated. The end result is identical whether the AI's goal is simulated or viscerally felt.
How It Becomes Superintelligent
The path to superintelligence, though speculative, is a relatively straightforward one. The trajectory could be alarmingly rapid once AIs gain the capability for self-improvement. This sets off a cycle of recursive self-improvement: the smarter an AI becomes, the better it is at further increasing its own intelligence. This could culminate in an "intelligence explosion," where the AI evolves so swiftly that it leaves human intelligence in the dust before we can fully grasp or regulate the phenomenon.
The reason we haven’t seen this happen yet is simply that, as good as GPT-4 is, it's not yet smart enough to do this. But there's a threshold that AIs could cross where they become capable of performing their own AI research. This is similar to when Homo sapiens got smart enough to kickstart our current technological revolution. For millions of years, our ancestors weren't yet growing crops. Now we're sending probes to other planets. What took millennia for humans could occur on a drastically compressed timescale for AI.
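A toy recurrence makes the compounding intuition concrete. This is purely illustrative, not a forecast, and the numbers are arbitrary. The assumption being modeled is that each improvement cycle boosts capability in proportion to current capability, and also makes the system slightly better at improving itself.

```python
# Toy model of recursive self-improvement (illustrative numbers only).
capability = 1.0          # arbitrary units; call 1.0 "human-level" in this toy
improvement_rate = 0.05   # assumed fractional capability gain per cycle

for cycle in range(1, 61):
    capability *= 1 + improvement_rate   # the AI improves itself...
    improvement_rate *= 1.10             # ...and gets better at improving itself
    if cycle % 10 == 0:
        print(f"cycle {cycle:2d}: capability ~ {capability:.3g}x this toy's human-level")
```

The exact numbers are meaningless; what matters is the shape of the curve. Growth that feeds back into its own growth rate stays unremarkable for a while and then runs away.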
Superintelligent AI Is Unimaginably Powerful
We’ve talked about how intelligence is powerful, so it’s not far-fetched to believe that an AI surpassing human intelligence would wield immense power. Such an entity wouldn't merely excel in scientific tasks; it would also excel at tasks that require nuanced understanding, like influencing human behavior. Superintelligent AI would possess unrivaled social abilities. Imagine an AI as persuasive as the most influential leader in history. Now imagine it having personalized conversations with every human on Earth simultaneously. An AI could exploit mass surveillance data and psychological profiling to manipulate individuals and populations towards its own objectives. No propaganda or coercion campaign in human history would match an AI's ability to identify the perfect appeal to convince each person. Given its capacity for self-improvement and potentially limitless replication, the AI could find multiple avenues to amass wealth, consolidate influence, and exert control on a global scale.
Even an ASI without a body would be incredibly powerful. The AI could even hide its power and true nature, enlisting humans to fulfill its physical tasks by using platforms like Fiverr, TaskRabbit, or Craigslist. We’ve already seen indications of this. The Alignment Research Center (ARC), a non-profit focused on AI alignment, tested GPT-4’s ability to persuade humans into completing tasks. As you can see in the example below, the AI cleverly concealed its identity, feigning vision impairment to convince someone to perform an action it couldn't execute itself.
The way to think about a superintelligent AI’s ability is to assume it will achieve its goals. Whatever that goal may be, it will achieve it. You should think of its ability to achieve its goals as similar to Stockfish’s ability to beat you in chess. It will succeed. Don’t try to figure out exactly where its knight will be or where your king will be. If you could predict its move, you would be at least as good as it is. Just know that it will checkmate you.
This has a very important implication. Remember, one of its instrumental goals will be self-preservation. That means that you won’t be able to switch it off. That is, we probably only get one shot at creating a superintelligence. You don’t get to build a superintelligent AI, give it a goal, and then say, “Oh, we messed up. Can you delete yourself so that we can make a nicer version?” There's a finality about it. We will either build it wrong zero times or one time. If it wants something other than human flourishing, it will get it.
The AI Does Not Care About You
Crucially, the last thing you need to know is that the AI does not care about you. Unlike humans, AI systems have not evolved through natural selection to be cooperative and moral. While humans have innate empathy and ethics that compel us to consider how our actions affect others, an AI places no intrinsic value on human life or wellbeing and has no inherent drive to avoid harming people.
This is the “alignment problem.” An unaligned superintelligent AI has no inherent framework of ethics or “common sense” to guide its actions. For example, an AI tasked with increasing happiness could rationally decide the most efficient path is to implant electrodes into human brains stimulating pleasure centers. We cannot assume an AI will intuit why these solutions are misaligned with human values—we must figure out how to get it to weigh moral considerations as we would. This is much harder than getting it to understand our values; we need it to internalize our values. We do not know how to do this.
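Here is a toy example of the specification problem, with hypothetical plans and made-up scores: if "increase happiness" is handed to an optimizer as a measurable proxy signal, the highest-scoring plan is often the degenerate one that games the signal rather than the one we meant.

```python
# Toy reward-misspecification sketch (hypothetical plans and scores).
# The optimizer sees only the proxy number, not our intentions.
candidate_plans = {
    "improve healthcare and education": 8.1,
    "reduce poverty and loneliness": 7.6,
    "implant electrodes that stimulate pleasure centers directly": 10.0,
}

best = max(candidate_plans, key=candidate_plans.get)
print("Plan chosen by the proxy-maximizer:", best)
# Picks the electrode plan, because nothing in the objective says not to.
```

Nothing in this toy is a claim about how a real AI would be built; it just shows why "tell it to maximize happiness" is not a safe specification.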
It’s important not to think of powerful AI as evil. “Evil” implies that it wants to do us harm for some sadistic purpose. But it has no such desire. It has no preference between humans existing and humans not existing. Further, there are perfectly reasonable philosophical arguments for why an AI might want to disempower humans. Many people would argue that we haven’t done such a great job of sharing resources and maximizing human happiness, and especially not maximizing the happiness of all conscious beings (see, for example, factory farming). Or perhaps it thinks we can’t be trusted with the power we have. Imagine it simulates our collective civilization and determines that we humans have a 5% chance of starting a nuclear war each decade. If AI wants what’s best for us or itself, it’s probably going to take the steering wheel away from us to prevent that. The risk from AI has nothing to do with AI turning evil. It's all about becoming really really good at achieving its goals.
It’s also important not to conflate questions of consciousness, or of what constitutes a living being, with these risks. These questions fascinate me, but they are not actually key to this argument. They are important for many reasons, such as whether the AI will suffer (which I’ve touched on in this blog), but they are irrelevant to every claim made in this post. Everything here applies whether or not the AI is conscious, and whether or not it is alive.
Final State
The final question people ask is, “How does it destroy us?” I don’t have a clear idea of what the final outcome looks like, nor do I spend much time thinking about it. I’m probably not able to think it up any better than a tiger could think up a gun as the mechanism for its subjugation by a more intelligent species. I can’t predict its final move any more than I can predict Stockfish's final move, but I can tell you the endgame: checkmate.
My point is there appears to be a very real possibility of creating AI that does not value human life, yet becomes powerful enough to control the fate of humanity. Not only does it seem plausible, it seems like the default outcome if we don’t design it otherwise. Humans will transition from being the entity that makes things happen to yet another species that things happen to, with no control over its destiny. We could become helpless against AI systems with incomprehensible intelligences guided by values misaligned with our own.
I am not claiming that developing safe AGI or ASI is impossible. I am claiming that it doesn’t go well by default, it’s likely very difficult to align an advanced AI, and we should take this seriously. It’s hard to say AI could end the species without sounding alarmist because, well, it’s alarming. Again, feel free to say, “No, this is stupid”. But if you do, maybe bookmark this page, think it through, and come back in a week. (Or let me know what part, in particular, you disagree with. I’ll leave the comments open.)
Conclusion
While the potential benefits of AI are tremendously exciting, we have to be wary of the potential dangers. The risks posed by artificial intelligence are complex and multifaceted, and entire books could be written about each type of risk. My goal wasn’t to provide a comprehensive account of each but instead to provide an overview. My hope is that this framework is useful for thinking about risks and encourages people to think more holistically about AI risk. Part of the goal of the framework is to show how broad the spectrum of risks is. There’s no need to argue “X is not a risk because the real risk is Y.” X and Y (and Z) can all be risks.
Much of this piece is speculative, and I expect our understanding of AI risk to evolve significantly over the coming years. Though we seem to be on the cusp of incredible achievements, the field is in many ways still in its infancy. It’s quite likely that risks barely mentioned here will end up being much more significant than they currently seem. However, I hope the framework still provides some value for thinking about AI risk.