Why Humans Are Still Ahead
- Sharon Gai
Sometimes we read sweeping headlines about AI and lose a little hope in ourselves. AI seems to get exponentially better at everything, and our confidence starts to wane. But recently, I came across a test where humans scored 100% and AI scored around 1%.
Whenever a new model update is rolled out, it is measured against industry benchmarks. (These include tests such as MMLU, which measures broad knowledge across many subjects, GSM8K for mathematical reasoning, HumanEval for programming ability, and reasoning challenges like ARC or HellaSwag.) These are like the SATs or LSATs or the other standardized tests we give humans: a way to measure every model in the same format so we can compare them directly.
Beating these tests means media headlines. Headlines mean a rise in stock price. A rise in stock price means a literal raise in employee paychecks. In short, every frontier model company is obsessed with these scores and incentivized to do well on these tests.
We’ve thrown AI at many tests where it has far exceeded human scores. But this is the test I mentioned above: humans score 100%, and the best models score below 1%. (Google’s Gemini 3.1 Pro scored 0.37%, followed by GPT-5.4 High at 0.26% and Claude Opus 4.6 Max at 0.25%.)
You can visit ARC AGI to try the tests yourself. It’s made by the ARC Prize Foundation, a non-profit that advances open-source artificial general intelligence (AGI) research by creating scientifically grounded benchmarks and hosting global prize competitions to measure, and close, the gap between human and artificial intelligence.
Its latest test is essentially a game with no explicit written rules; you have to “figure it out” to win. Given no rules, AI is extremely weak at experimentation. But humans have been training this skill since we first opened our eyes. This is how a young toddler sees the world. They bump into things. It hurts. They remember never to do that again. All of these small experiments teach them how to function as a human.
I was speaking at a sales conference the other day, and I used this example to illustrate the difference between a junior sales rep and a senior one. The junior rep sticks to the script and will likely have a lower close rate than someone more senior, who knows when to shut up, when to interject, and what the sweet-spot offer number is to close the deal. All of that wisdom comes from the hundreds of experiments run over years of experience.
PS: if you like these types of games, I recently came upon one called Baba Is You. It’s a good brain teaser, one grade above Candy Crush!
Why This Gap Exists
Large language models, including the most advanced ones, are fundamentally reactive systems. They process a prompt and generate a response based on patterns they absorbed during training. They are extraordinary at this. Ask an LLM to write, summarize, analyze, translate, or code, and it will often produce work that rivals or exceeds what most humans can do on those specific tasks.
But all of that performance is interpolation. The model is drawing on a massive library of patterns it has already seen and recombining them in sophisticated ways. It is not learning anything new in the moment. It is not updating its understanding of the world based on real-time feedback. It is not exploring.
ARC-AGI-3 is designed to measure how quickly a system can understand and master a completely new environment. This is the core of what intelligence actually is: not what you already know, but how fast you can figure out something you have never encountered before.
When you drop a human into one of these games, they start poking around. They try moving in different directions. They observe what happens when they interact with objects. Within a few moments, they have built a rough mental model. A few moments later, they have refined it. And then they win.
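To make that toddler loop concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the HiddenRuleWorld environment, its walls, and its rewards are my own toy, not the actual ARC-AGI-3 interface. The point is the loop itself: act, get feedback, remember what hurt, and never do it again.

```python
import random

# A toy world with hidden rules: the agent is never told that walls
# hurt or that reaching the goal wins. It has to find out by acting.
# (This environment is invented for illustration; it is not ARC-AGI-3.)
class HiddenRuleWorld:
    def __init__(self):
        self.pos = (0, 0)
        self.goal = (2, 2)
        self.walls = {(1, 0), (0, 2)}  # unknown to the agent

    def step(self, move):
        x, y = self.pos
        dx, dy = move
        nxt = (min(max(x + dx, 0), 2), min(max(y + dy, 0), 2))
        if nxt in self.walls:
            return -1, False          # it hurts; the agent stays put
        self.pos = nxt
        return (1, True) if nxt == self.goal else (0, False)

# The toddler loop: try things, remember what hurt, never repeat it.
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
world = HiddenRuleWorld()
bad = set()  # (position, move) pairs that hurt before

for step in range(100):
    options = [m for m in moves if (world.pos, m) not in bad] or moves
    move = random.choice(options)
    before = world.pos
    reward, done = world.step(move)
    if reward < 0:
        bad.add((before, move))       # one bad experience is enough
    if done:
        print(f"Solved in {step + 1} moves, with no written rules.")
        break
```

The sketch usually converges within a handful of moves because a single painful step permanently prunes an action from that position. That in-the-moment updating is exactly what today’s models do not do between prompts.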
The Bee and the Beekeeper
This is why I wrote about the bee versus the beekeeper in my book. AI is the bee, not the beekeeper.
Bees are remarkable at executing known patterns. They follow established routines with extraordinary precision, speed, and scale. In a business context, AI does the same thing. It processes documents, generates content, analyzes data, and automates workflows faster and cheaper than humans ever could. If you know exactly what needs to be done and can define the task clearly, AI is your best employee.
But the beekeeper is the one who walks into unfamiliar territory. The beekeeper reads the environment, makes judgment calls with incomplete information, adapts the strategy when conditions change, and decides what the bees should be doing in the first place.
ARC-AGI-3 is, in essence, a beekeeper test. It asks: can you figure out a world you have never seen before? Can you set your own goals when nobody has told you what to do? Can you learn from your own experience and transfer that learning to new situations?
Humans pass this test effortlessly. AI fails completely. And that should fundamentally shape how organizations think about where to deploy AI and where to invest in human capability.
What does this mean for humans?
Is there a way we can train ourselves to stay ahead of AI?
The answer lies in seeking out the unfamiliar. AI excels at processing known patterns, but it struggles with “edge cases”: the messy, unpredictable realities of life. To train your brain to outperform an algorithm, you must embrace discomfort. Travel to new places, navigate complex systems, and allow yourself to get lost.
While specializing in your field is important, true innovation comes from "cross-pollination." By studying subjects far outside your professional lane, you diversify your internal knowledge base and learn to apply foreign concepts to your work. In my book, I call this Fluid Intelligence. Unlike Crystallized Intelligence, the static, trained data that AI relies on, Fluid Intelligence is the human ability to solve problems creatively and adapt concepts across seemingly unrelated fields. Historically, this adaptability has been the bedrock of our greatest inventions.
So, explore, experiment, learn, and adapt. And get better at being human.
Hello! I’m Sharon Gai, author of How to Do More with Less: Future-Proofing in an AI-Driven World, and a keynote speaker on AI and its effects on workers and the future of work.


