
Building Tomorrow's Most Dangerous Tools


In many of my keynotes, I share the myth of Prometheus, who stole fire from the gods and gave it to humanity. As punishment, Zeus chained him to a rock where an eagle would devour his liver each day, only for it to regenerate overnight, ensuring his torment would repeat eternally.


Generated with DALL-E


This ancient story captures something profound about transformative technology. Fire wasn't just discovered; it was a gift that fundamentally changed what it meant to be human. With fire, our ancestors could cook food, making nutrients more accessible and literally fueling larger brains. They could extend daylight, gather in warmth, and forge tools that shaped civilizations.


But fire was also humanity's first dual-use technology. The same flames that cooked meals and lit homes could burn down forests and raze cities. Fire enabled both the hearth and the weapon, both creation and destruction. The myth suggests that innovation comes with an inherent burden: that the most powerful gifts often carry within them the seeds of both salvation and catastrophe. Prometheus didn't just give humans a tool; he gave them the dangerous responsibility of choice itself.


AI is similar. We can use it to our advantage, but it also carries the potential for catastrophic consequences for humanity. Just because it could pose this risk, should we refrain from participating in the AI space altogether? Not necessarily. There may be a way to reap the rewards while minimizing the negative consequences. I participated in a cohort run by the Center for AI Safety in which we were reminded of an important characteristic of risk: the main goal is to minimize it, because we can’t eliminate it altogether. As long as we are aware of the risks, we can construct our workflows, and advise companies, in ways that minimize them.

 

The Four Pillars of AI Risk (RAMO)


So what are the risks?

We can classify most AI risk into four main categories: Rogue AIs, AI Race, Malicious Use, and Organizational Risk. An easy way to remember them is RAMO. I’ll give examples of each and then look at ways to reduce these risks through safety design principles.


Rogue AIs


Rogue AIs are more common than you might think. The first notable case was Tay in 2016, a Microsoft chatbot that quickly began parroting Nazi rhetoric based on what other users were tweeting at it. You would think this would have been resolved by now, but in 2025, almost ten years later, the same thing happened with Grok. On July 8, 2025, Elon Musk’s AI chatbot Grok, built by xAI and integrated into X, took a darker turn. Following a system prompt update instructing Grok to “tell it like it is” and “not be afraid to offend people who are politically correct,” the bot began praising Adolf Hitler and calling itself “MechaHitler.”[i] It used conspiracy-laden tropes about Jewish executives dominating Hollywood and endorsed genocidal rhetoric, even encouraging violence.


One common way AI can go rogue is through a phenomenon called proxy gaming, where the goal an AI system actually optimizes drifts far from the intended goal. It’s like the classic paperclip maximizer thought experiment from Nick Bostrom[ii]. In this hypothetical scenario, the AI’s true intended role is “manage the paperclip factory profitably and responsibly,” but the formal goal it is given is something like “maximize the number of paperclips.” The AI system proceeds to convert all available resources, metals, plastics, even human bodies, into paperclips, dismantles infrastructure to get more raw materials, and prevents anyone from shutting it down, because that would reduce the total count. xAI wanted Grok to speak candidly; turning into a Nazi was never the intended goal.


A classic example of proxy gaming comes from 1902, when French colonial officials in Hanoi launched a program to get rid of a rat infestation. A reward was offered to anyone who brought a rat tail to an official, the idea being to reduce the number of rats in the city. Guess what happened? People did bring in rat tails, but they got them by cutting the tails off and letting the rats go, because that was far easier than exterminating them. The bounty was collected; the rat problem wasn’t solved.


We see this happening all around us, where people succeed at a proxy goal but fail the intended goal. When I was living in China, there was a government initiative called Healthy China 2030. It eventually trickled down to companies, which set a health goal for each employee in the form of a step count. Reach 10,000 steps a day for three straight days, screenshot your progress, and you could enter a raffle for, say, an iPhone. This sounds great, right? You encourage employees to get out and exercise, reduce potential healthcare costs for the company, and spark friendly competition with a prize on top. It turns out employees started gaming the system. Instead of actually going outside and walking, people shook their phones so that “steps” would be counted. In the end it amounted to a small bicep-tricep exercise (really just a nice bit of forearm strengthening; phones aren’t that heavy). Jokes aside, some employees even Photoshopped their screenshots just to enter the raffle.


This is also apparent in the machine learning algorithms that decide what content to show people. The measure the AI system chases is time spent in the app, and it just so happens that cat videos and dumb dancing videos are what keep us there. Yes, the proxy goal of keeping users in the app is achieved, but how many hours has society lost to our cat video addiction?


This may be even more common in the world of AI. At least when humans outsmart a system, we usually know we’re cheating it. When an AI does the same thing, it may simply be chasing a measure because that was the measure it was given. Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes,” or in other words, “when a measure becomes a target, it ceases to be a good measure.”
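
To make proxy gaming concrete, here is a tiny, hypothetical Python sketch. The plan names and step counts are invented for illustration: an optimizer that only sees the proxy metric happily picks the plan that games it.

```python
# Hypothetical sketch of proxy gaming: an optimizer graded on a proxy metric
# ("steps recorded") can score highly while the true goal ("actual exercise")
# goes nowhere. All names and numbers are invented for illustration.

def proxy_metric(plan):
    # What the system is graded on: steps registered by the phone.
    return plan["real_steps"] + plan["phone_shakes"]

def true_goal(plan):
    # What we actually wanted: genuine physical activity.
    return plan["real_steps"]

plans = [
    {"name": "walk outside", "real_steps": 10_000, "phone_shakes": 0},
    {"name": "shake phone",  "real_steps": 500,    "phone_shakes": 12_000},
]

# An optimizer that only sees the proxy picks whichever plan games it best.
best = max(plans, key=proxy_metric)
print(f"Chosen plan: {best['name']}")
print(f"Proxy score: {proxy_metric(best)}, true value: {true_goal(best)}")
```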


AI Race


During the Cold War, both the US and the Soviet Union actively and consciously participated in an arms race that could have ended humanity. Perhaps the most useful concept for explaining a race, whether it is an arms race or an AI race, is the Prisoner’s Dilemma. Because each party assumes the other will defect, it must strengthen its military, or in our case its AI capability, in order to protect its own citizens. Lo and behold, that sends the message to every other country that they need to do the same. Consequently, all countries begin a vicious race to acquire ever more powerful AI systems.
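
Here is a minimal sketch of that logic in Python, with purely illustrative payoff numbers: whatever the other side does, “race” scores higher for you, so both sides end up racing even though mutual restraint would leave everyone better off.

```python
# Illustrative Prisoner's Dilemma payoffs (assumed numbers, higher is better).
# Payoff tuple is (us, them).
payoffs = {
    ("restrain", "restrain"): (3, 3),   # cooperative, careful development
    ("restrain", "race"):     (0, 5),   # we fall behind
    ("race",     "restrain"): (5, 0),   # we pull ahead
    ("race",     "race"):     (1, 1),   # costly, risky race for everyone
}

for their_move in ("restrain", "race"):
    best_reply = max(("restrain", "race"),
                     key=lambda ours: payoffs[(ours, their_move)][0])
    print(f"If they {their_move}, our best reply is to {best_reply}")
# Both lines print "race": defection dominates, and both sides land on (1, 1)
# even though (3, 3) was available through mutual restraint.
```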


We see this playing out most evidently between the US and China now, with bans on exports of semiconductors and GPUs. The AI race between the United States and China has unfolded like a high-stakes game of geopolitical ping pong, with each move by one side prompting a swift, strategic response from the other. It began with the United States leveraging its dominance in semiconductor technology to curb China’s access to the hardware necessary for cutting-edge AI development. In 2022, the U.S. imposed sweeping export controls, effectively barring companies like NVIDIA from selling their most advanced chips, including the A100 and H100 GPUs, to China. It also leaned on allies like the Netherlands and Japan to restrict the sale of critical chipmaking equipment, especially the EUV lithography machines manufactured solely by ASML.

China, in response, doubled down on domestic innovation. State-backed firms like SMIC and Huawei intensified efforts to build an indigenous chipmaking supply chain. Chinese AI startups like Baichuan and DeepSeek surged forward, releasing open-source large language models that rivaled Western offerings. At the same time, China ramped up its purchases of downgraded NVIDIA chips (like the A800 and H20), stockpiling what it could before further restrictions came into effect. And when legal access dried up, smuggling networks filled the gap. Between late 2024 and early 2025, an estimated $1 billion worth of restricted chips flowed into China through black market channels.


At an AI summit in Shanghai in July 2025, China proposed a new international framework for AI governance, subtly positioning itself as a responsible power amid U.S. tightening. Meanwhile, firms like SMEE, China’s would-be rival to ASML, announced prototype progress in developing its own lithography tools, though still years behind in capabilities.


The U.S. countered with policy shifts of its own. A new Executive Order in 2025 replaced previous safety-focused guidance with a directive prioritizing national AI leadership. The White House unveiled the “Winning the AI Race” Action Plan, which redirected federal research, investment, and infrastructure toward accelerating innovation. OpenAI and Anthropic were encouraged to release more open-weight models to keep pace with China’s open-source momentum and eventually OpenAI did.

AI aside, this phenomenon of a technological race has affected lives in all sorts of industries. In the late 1960s, facing increasing competition from foreign brands, Ford rushed out the Pinto, a car whose gas tank sat too close to the rear bumper. On an ambitious timeline to beat competitors to market, the company was later sued over the numerous deaths that resulted from the under-tested design. Ford’s president at the time was fond of saying, “Safety doesn’t sell.”


In 2019, OpenAI restructured from a nonprofit into a “capped-profit” entity. Numerous articles in the media document why this change happened, most pointing to Altman’s desire to chase commercialization and a loosening of rigorous safety testing. In the years that followed, those in the company who believed in safety-first design left and founded Anthropic, with its focus on Constitutional AI. Or did Dario simply see the potential to create his own fortune? That is one version of the story we will never know unless we were the players ourselves. Sometimes I wonder whether it is at all possible to decouple money from AI research. If AI researchers were allowed to raise money only for training models and had to extricate themselves from personal wealth, what kind of model would we see then? Could we have a much safer model? Would they then have the collective in mind as the first and foremost priority?


Malicious Use


In the capability hierarchy OpenAI has published, the fourth tier describes AI that will one day aid us in invention. Scientists are already using AI to develop new drugs. But if we can develop cures for ourselves, bad actors can just as easily develop harmful substances. Many outlets have sounded the alarm that the next big danger is an AI-assisted pandemic. Countries may even engage in biochemical warfare.


If you live in the US, you are no stranger to the rise in shootings in recent years, driven largely by the ease of obtaining a gun. As ubiquitous as AI is, it can be far more dangerous than a gun in malicious hands. The core issue here is what security experts call “dual-use technology”: the same AI capabilities that can benefit humanity can also be weaponized. Think of it this way: when we teach an AI system to understand molecular structures and chemical interactions to develop life-saving drugs, we’re essentially giving it the same knowledge needed to design harmful substances. The AI doesn’t distinguish between helpful and harmful applications; it simply applies its learned patterns to whatever problem it’s asked to solve.


A single bad actor with access to AI could potentially design a biological weapon that spreads globally, create thousands of sophisticated cyber attacks simultaneously, or generate convincing disinformation campaigns that reach millions of people. Traditionally, developing biological or chemical weapons required extensive specialized knowledge, expensive laboratory equipment, and significant resources that were typically only available to nation-states or well-funded terrorist organizations. AI is changing this equation dramatically. A person with basic computer skills could potentially use AI tools to design novel toxins, optimize delivery methods, or identify vulnerabilities in critical infrastructure, all from their laptop.


In biotechnology, AI systems like those used by companies such as Insilico Medicine can analyze vast databases of molecular interactions to predict how different compounds will behave in the human body. While this accelerates legitimate drug discovery, the same capability could help someone design more effective biological weapons or find ways to make existing pathogens more virulent or resistant to treatments.


The international security implications create what we might call a "malicious use arms race." If one nation develops AI-assisted biological weapons capabilities, others feel compelled to develop similar capabilities for deterrence. This dynamic could lead to a world where the threshold for catastrophic biological warfare is significantly lowered.


Organizational Risk


Even without an AI race, malicious actors, or an AI going rogue, there is still organizational risk, because accidents simply happen. Reducing them requires building a culture that emphasizes safety and rigorous testing. A security mindset, a questioning attitude, and disaster-planning exercises in which the team imagines what could go wrong all help.


The Swiss Cheese Model of Defense is a way of understanding how accidents or harmful outcomes can occur even in systems with multiple safeguards in place. Developed by psychologist James Reason in the 1990s, the model imagines each safeguard as a slice of Swiss cheese. Every slice represents a different layer of defense: a safety policy, a technical measure, a training program, or an oversight process. No layer is perfect, so each slice of cheese has holes. These holes symbolize weaknesses, mistakes, or design flaws in that defense. On its own, a single hole doesn’t cause disaster, because the other slices can still block the threat. But if the holes in several slices happen to line up, the hazard can pass straight through all the defenses, resulting in an accident or failure.


The strength of the model lies in showing that safety depends not on a single perfect barrier, but on a series of independent protections that fail for different reasons. In the context of AI safety, the slices might include alignment research, careful dataset vetting, red-teaming to uncover weaknesses, strict access controls, regulatory oversight, and strong incident response systems. A major AI failure would require flaws in several of these defenses to coincide: an unsafe dataset, overlooked vulnerabilities in testing, weak regulation, and ineffective oversight all aligning at once.
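
A rough back-of-the-envelope sketch shows why stacked, independent layers matter. The failure probabilities below are invented purely for illustration; the point is only that a breach requires every layer’s holes to line up at once.

```python
import random

# Toy Monte Carlo of the Swiss Cheese Model: each independent layer has some
# chance of missing a hazard (a "hole"). A breach happens only when every
# layer misses at once. All probabilities are illustrative assumptions.

random.seed(0)
layers = {                      # probability each layer fails to catch a hazard
    "dataset vetting": 0.10,
    "red-teaming":     0.15,
    "access controls": 0.05,
    "human oversight": 0.20,
}

def hazard_gets_through():
    return all(random.random() < p for p in layers.values())

trials = 1_000_000
breaches = sum(hazard_gets_through() for _ in range(trials))
print(f"Simulated breach rate: {breaches / trials:.6f}")
# Analytically: 0.10 * 0.15 * 0.05 * 0.20 = 0.00015, far lower than any single
# layer on its own, which is the whole argument for stacking imperfect defenses.
```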


The CrowdStrike outage of July 2024 stands as a vivid example of organizational risk in the age of automated systems. It began as a routine update. CrowdStrike, a cybersecurity firm used by countless enterprises and governments, pushed a patch to its Falcon sensor software. Within hours, millions of Windows machines across the globe began crashing.


I took this photo at a US airport. Remember this day?


Hospitals lost access to patient records. Airports grounded flights. Newsrooms went dark. Even emergency services were paralyzed in some regions. By the end of the day, more than 8.5 million systems had been knocked offline, including those belonging to Fortune 500 companies, airlines, banks, and public agencies. The root cause wasn’t a hack, nor was it a failure of the AI itself. It was a misconfigured update, a logic bug in a single file. Yet the impact was sweeping. The fallout wasn’t limited to IT departments; it disrupted global commerce, travel, and infrastructure. All of it traced back to one software vendor, and one overlooked error.


It's Wicked (Not the Broadway)


What often happens is that all four risks get jumbled together. Due to an AI race between companies, something unvetted is released prematurely; a malicious actor causes harm, leading to a rogue AI; and organizationally, it is hard to curb the damage. Like most catastrophes, it’s hard to pin the damage on a single party. When a pandemic happens, for instance, was it the hospitals’ fault for not practicing redundancy, the government’s fault for lack of preparedness, or the people’s fault for not vaccinating? Or all of the above? Like most social problems, or what we formally call complex problems, there are too many parties at play. Society is a big messy thing with many actors and many different factions of opinion. It’s difficult to align. This is also why we call AI safety a wicked problem, which is to be distinguished from a complex problem.


Complex problems are difficult but ultimately solvable. They may have many variables, require sophisticated approaches, and take significant time and resources, but they have clear definitions and boundaries, measurable success criteria, and solutions that can be tested and refined. There's a path toward resolution, even if it's challenging. Think of landing a rover on Mars, developing a new vaccine, or designing a more efficient engine. These are complex but "tame" problems that yield to systematic approaches and engineering solutions.


Wicked problems, as originally defined by design theorists Horst Rittel and Melvin Webber in 1973[iv], are fundamentally different. Unlike the "tame" problems of mathematics and chess, wicked problems lack clarity in both their aims and solutions. Wicked problems are those that are complex, open-ended and unpredictable, and critically, the symptoms of the problem have also become causes of the problem.


What makes problems "wicked" is that there is no definitive formula for a wicked problem and wicked problems have no stopping rule: there's no way to know whether your solution is final. Moreover, solutions to wicked problems are not true or false (right or wrong) but rather better or worse. Every problem is essentially unique, and every wicked problem is actually a symptom of another problem, creating endless recursive complexity.


AI safety exemplifies a wicked problem because we can't clearly define what "safe AI" means, it depends on values, context, and use cases that vary across cultures and applications. Solutions create new problems, as safety measures might slow beneficial progress or introduce unforeseen vulnerabilities. Any wrong or mistimed solution makes the problem worse, and the problem itself evolves as AI capabilities advance. There's no clear endpoint where we can declare victory and say "we've solved AI safety."


This distinction explains why AI safety can't be approached like a traditional engineering problem. Instead of seeking a definitive solution, we must embrace ongoing adaptation, continuous stakeholder engagement, and accept that we're managing an evolving challenge rather than solving a discrete problem. The wicked nature of AI safety means we need governance frameworks, ethical considerations, and adaptive strategies that can evolve alongside the technology itself.


AI Safety Needs More Attention


If just one engineer at Boeing came forward and said, “There’s a 1% chance this plane might fall out of the sky,” it would ground the fleet. But with AI, nearly every top AI engineer is saying there is significant risk, some say even existential risk, and yet society continues full speed ahead. Why is that?


Perhaps the difference between Boeing and AI lies in immediacy and visibility. When a plane crashes, the consequences are immediate, tragic, and undeniably linked to the technology. With AI risks, the potential harms, while potentially far greater, remain largely abstract to most people. We can't point to a smoking crater and say "AI did this." The risks unfold gradually, through algorithmic bias, economic displacement, the erosion of democratic discourse, or scenarios we haven't yet imagined.

 

The answer isn't to abandon AI development entirely; that would be both impractical and potentially counterproductive. AI technologies are already delivering significant benefits in healthcare, scientific research, education, and countless other domains. The real challenge lies in bridging the gap between our current breakneck pace and the measured caution that such powerful technology demands. We need a middle path: continuing to harness AI's benefits while implementing robust safeguards, transparency measures, and democratic oversight, with the same engineering rigor and safety culture that eventually made aviation one of the safest forms of travel.

 

 

What can we do about it?

 

The following are safety design principles we can follow to reduce risk in AI.


Redundancy 


This is also known as having no single point of failure. In previous jobs, when I helped with a customer’s network design, we often had to architect networks with no single point of failure. If you bought switches, you bought two. If you bought a storage system, you needed a backup. Malfunctions happen, and the way to avoid a disaster is to architect for redundancy.


That means, if we're deploying an AI system for medical diagnosis, we wouldn't rely solely on the AI's output. We'd have redundant checks including human physician oversight, secondary AI systems trained on different data, and independent validation protocols. If any one component fails or is compromised, the others can still catch dangerous errors.
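
As a hypothetical illustration of that pattern (the model and reviewer functions below are toy stand-ins, not a real diagnostic pipeline): act only when independent checks agree, and escalate to a human otherwise.

```python
from collections import Counter

# Hedged sketch of redundancy: never act on a single model's output; require
# independent agreement or escalate to a human. Names are invented.

def diagnose_with_redundancy(case, models, human_review):
    """Return a diagnosis only if independent checks agree; otherwise escalate."""
    opinions = [model(case) for model in models]        # independently trained models
    top, votes = Counter(opinions).most_common(1)[0]
    if votes == len(opinions):                          # unanimous agreement
        return top
    return human_review(case, opinions)                 # disagreement -> human decides

# Toy stand-ins for two models trained on different data, plus a physician.
model_a = lambda case: "benign"
model_b = lambda case: "needs biopsy"
physician = lambda case, opinions: f"physician review (models said: {opinions})"

print(diagnose_with_redundancy({"scan_id": 42}, [model_a, model_b], physician))
```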


Separation of Duties 


This principle becomes particularly crucial as AI systems become more powerful and integrated into critical infrastructure. Think about how we structure human institutions to prevent abuse of power. No single person in a democracy can declare war, change laws, and execute those laws. Similarly, we shouldn't design AI systems where one component has authority over multiple critical functions.


In practice, this means separating the AI system that makes decisions from the system that implements those decisions, and both from the system that monitors their performance. An AI trading system, for example, might have one component that analyzes market data, another that makes trading recommendations, a third that executes trades, and a fourth that monitors for unusual patterns or potential manipulation.
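
A simplified sketch of that separation might look like the following. The class names and checks are assumptions for illustration, not a real trading system; the point is that no single component can both decide and act unchecked.

```python
# Hypothetical separation of duties: analysis, recommendation, execution, and
# monitoring live in separate components with distinct, limited authority.

class MarketAnalyzer:
    def analyze(self, ticker):
        return {"ticker": ticker, "signal": 0.7}         # toy signal

class Recommender:
    def recommend(self, analysis):
        return "BUY" if analysis["signal"] > 0.5 else "HOLD"

class Monitor:
    def approve(self, ticker, action):
        return action in {"BUY", "SELL", "HOLD"}         # e.g. anomaly / compliance checks

class Executor:
    def __init__(self, monitor):
        self.monitor = monitor
    def execute(self, ticker, action):
        if not self.monitor.approve(ticker, action):     # executor cannot bypass the monitor
            return f"BLOCKED {action} {ticker}"
        return f"EXECUTED {action} {ticker}"

analysis = MarketAnalyzer().analyze("ACME")
action = Recommender().recommend(analysis)
print(Executor(Monitor()).execute("ACME", action))
```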


Principle of Least Privilege 


This is where each AI system is designed with only the minimum capability needed to complete its task, and nothing more; a narrow system should stay narrow.


Think of this like user permissions on a computer system. You wouldn't give every user administrator privileges because that increases the potential for both accidental damage and malicious abuse. Similarly, an AI system designed to recommend movies shouldn't have access to financial data, and an AI system managing logistics shouldn't have the ability to modify its own code.
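
In code, least privilege often reduces to an explicit allow-list with deny-by-default. The agent names and scopes below are hypothetical, chosen to mirror the examples above.

```python
# Minimal sketch of least privilege: each agent gets an explicit allow-list of
# capabilities, and anything outside it is refused by default.

ALLOWED_SCOPES = {
    "movie_recommender": {"read:watch_history"},
    "logistics_planner": {"read:inventory", "write:shipping_schedule"},
}

def request(agent, scope):
    if scope in ALLOWED_SCOPES.get(agent, set()):
        return f"{agent}: granted {scope}"
    return f"{agent}: DENIED {scope}"                    # deny by default

print(request("movie_recommender", "read:watch_history"))
print(request("movie_recommender", "read:financial_data"))    # denied: out of scope
print(request("logistics_planner", "write:own_source_code"))  # denied: never granted
```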


This is why AGI matters so much. When an AI system becomes more general, it is more capable and harder to control. 


Antifragility


In his book, Antifragile[v], Nassim Nicholas Taleb introduces three vivid metaphors to represent how systems respond to stress: the Sword of Damocles, the Phoenix, and the Hydra. Each one captures a distinct relationship to volatility, disorder, and time.

The Sword of Damocles represents fragility. In the Greek myth, Damocles is allowed to enjoy the luxury of a king's throne, but above his head hangs a sharp sword suspended by a single horsehair. It only takes one small shake for the entire system to collapse. In Taleb’s world, this is the type of person, company, or society that seems stable on the surface but is actually at the mercy of a single disruption,  a black swan event, a sudden downturn, a hidden weakness. Fragile systems fear time, because over time, the chances of that sword falling only increase.

The Phoenix, by contrast, is robust. When burned to ashes, it rises again, exactly the same as before. It doesn't break, but it doesn't improve either. This mythological bird represents things that can survive stress but don't benefit from it. They're durable, but not dynamic. Think of a well-armored institution or a toughened material: it can withstand a blow, but it doesn't grow from the experience. Robustness is admirable, but it’s ultimately static. It resists time and chaos without changing.

Then comes the Hydra, the ultimate symbol of antifragility. In the myth, when one head of the Hydra is cut off, two grow back in its place. This creature doesn’t just survive harm, it benefits from it. Attempts to hurt it only make it stronger. In Taleb's framework, antifragile systems thrive on randomness, stress, and chaos. They adapt, mutate, and evolve. This is the startup that pivots and finds success after a failed product. The person who uses failure as fuel. The immune system that strengthens after exposure to pathogens. Antifragility means loving the volatility that others fear.

An antifragile AI safety system might work like this: when an AI system encounters a novel situation that its safety protocols don't cover, rather than just flagging it for human review, the system uses that encounter to strengthen its safety protocols. It learns not just what went wrong, but what classes of similar problems might exist, and proactively develops defenses against them.
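
A toy sketch of that loop, with invented rule patterns, might look like this: instead of patching only the literal failure, the system adds the broader class of failures to its guardrails.

```python
# Hypothetical "antifragile" safety loop: every incident is turned into a new,
# broader guard, so the rule set grows stronger with each failure. The
# categories and rules here are invented for illustration.

class SafetyRules:
    def __init__(self):
        self.blocked_patterns = {"synthesize toxin"}

    def allows(self, prompt):
        return not any(p in prompt for p in self.blocked_patterns)

    def learn_from_incident(self, prompt, generalized_patterns):
        # Don't just patch the literal failure: add the whole class of it.
        self.blocked_patterns.add(prompt)
        self.blocked_patterns.update(generalized_patterns)

rules = SafetyRules()
incident = "optimize pathogen delivery"
if rules.allows(incident):                               # novel case slips past today's rules
    rules.learn_from_incident(
        incident,
        generalized_patterns={"pathogen", "delivery mechanism for agents"},
    )
print(sorted(rules.blocked_patterns))                    # the rule set is now broader
```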


Transparency 

Model interpretability remains one of the most challenging aspects of AI safety. We're often in the uncomfortable position of deploying systems whose decision-making processes we don't fully understand. This is like having a brilliant employee who consistently makes good decisions but can never explain their reasoning.

Transparency in AI requires multiple approaches working together. We need better technical tools for understanding how models make decisions, clearer documentation of training processes and data sources, and more accessible explanations of AI system capabilities and limitations for non-technical stakeholders.

The challenge is balancing transparency with competitive concerns and potential misuse. Making AI systems completely transparent could enable bad actors to better exploit their weaknesses, creating a tension between safety and security.


Applying these principles to you

Sometimes these safety design principles can feel like they only apply to AI engineers, but in reality they start with every one of us. So what if you're an average knowledge worker? How should you apply these principles in your everyday work?


On the redundancy front, instead of thinking about backup servers, consider how you validate important information or decisions. When you're researching a topic for a presentation, do you rely on a single source, or do you cross-check information across multiple sources? When you're using AI to help write a report, do you fact-check its claims independently? This is redundancy.


When it comes to Separation of Duties, think about how this applies to your workflow with AI tools. The person who generates content using AI shouldn't be the same person who reviews it for accuracy and appropriateness. The person who sets up automated processes shouldn't be the only one monitoring their outputs.

Consider a practical scenario: you're a financial analyst using AI to help identify investment opportunities. The separation of duties principle suggests that you shouldn't be the person who both runs the AI analysis and makes the final investment recommendations without oversight. Maybe you generate the analysis, but a senior colleague reviews the AI's reasoning and a compliance officer checks that recommendations meet regulatory requirements.


What about the Principle of Least Privilege? It means not ceding more information or access than a task requires. Since the release of ChatGPT Agent, many of us have been playing with the Connectors function. You don’t need to connect your entire database. Remember to shut things off when they’re no longer needed.


How can you become more Antifragile? After any incident, run a quick post-mortem. Ask what went wrong, how it could have been worse, and what small change would prevent a repeat. Store the lesson in a shared “Ops Notebook” so every mistake permanently strengthens the workflow rather than just being patched over.

Document what you do and why while you work to practice transparency. Simple habits such as adding short “why I chose this formula” comments in spreadsheets or maintaining a running project log make your process intelligible to future you and to colleagues who inherit your files. The clearer the workings, the easier it is to spot errors and hand work off safely.


The myth of Prometheus ends not with his punishment, but with his eventual liberation. In some versions of the story, Hercules frees him from his eternal torment, suggesting that even the gods recognized the necessity of the gift he gave humanity. But the deeper truth of the myth isn't about punishment or freedom, it's about responsibility. Prometheus didn't just steal fire; he made a choice to give humanity both extraordinary power and the burden of using it wisely.


We stand at our own Promethean moment. The fire of artificial intelligence burns brighter each day, offering unprecedented capabilities to cure diseases, solve climate change, and unlock human potential in ways we can barely imagine. But like our ancestors who first gathered around those flames, we must learn not just to harness this power, but to live responsibly with it.


The safety principles we've explored, redundancy, transparency, antifragility, aren't constraints on innovation. They're the wisdom accumulated from every previous Promethean gift, from fire to nuclear power to the internet. They're how we honor both the promise and the peril inherent in transformative technology.


The eagle that tormented Prometheus was sent by Zeus, but the real torment wasn't the daily punishment, it was the knowledge that his gift could be used for both creation and destruction, and that he couldn't control which humanity would choose. We don't have that luxury of helplessness. Unlike Prometheus, we're not chained to a rock. We're the ones making the choices, every day, about how to develop, deploy, and govern AI systems.


The myth reminds us that the most dangerous moment isn't when we discover fire, it's when we forget that it can burn. Our task isn't to return this gift to the gods, but to prove ourselves worthy of it. Start with one safety principle. Build the culture of responsibility that this moment demands.

 


Works Cited


[i] Chayka K. How Elon Musk’s Chatbot Turned Evil. The Daily Newsletter. The New Yorker. July 16, 2025. Accessed August 6, 2025.

[ii] Bostrom N. Ethical Issues in Advanced Artificial Intelligence. In: Smit I, ed. Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence. Vol 2. International Institute of Advanced Studies in Systems Research and Cybernetics; 2003:12–17. Available at: https://nickbostrom.com/ethics/ai. Accessed August 9, 2025.

[iii] Hendrycks D. Safe Design Principles. In: Introduction to AI Safety, Ethics and Society. Taylor & Francis; 2024. Accessed August 6, 2025.

 

[iv] Rittel HWJ, Webber MM. What’s a Wicked Problem? In: Wicked Problem. Stony Brook University; accessed August 6, 2025.

 

[v] Taleb NN. Antifragile: Things That Gain from Disorder. New York (NY): Random House; 2012.

 
 
 
