Why Humans Are Still Ahead
- Sharon Gai
Sometimes we read sweeping headlines about AI and lose a little hope in ourselves. AI seems to get exponentially better at everything, and our confidence starts to wane. But recently, I came across a test where humans scored 100% and AI scored around 1%.
Whenever a new model update is rolled out, it is measured against industry benchmarks. (These include tests such as MMLU, which measures broad knowledge across many subjects, GSM8K for mathematical reasoning, HumanEval for programming ability, and reasoning challenges like ARC or HellaSwag.) These are like the SATs or LSATs or the other standardized tests we give humans: a way to measure every model in the same format so we can compare them directly.
Beating these tests means media headlines. Headlines mean a rise in stock price. A rise in stock price means a literal raise in employee paychecks. In short, every frontier model company is obsessed with these scores and incentivized to do well on these tests.
We’ve thrown AI at many tests where it has far exceeded human scores. But this is the test I mentioned above: humans score 100%, and the best models score below 1%. (Google’s Gemini 3.1 Pro scored 0.37%, followed by GPT-5.4 High at 0.26% and Claude Opus 4.6 Max at 0.25%.)
You can visit ARC AGI to try the tests yourself. It’s made by the ARC Prize Foundation, a non-profit that advances open-source artificial general intelligence (AGI) research by creating scientifically grounded benchmarks and hosting global prize competitions to measure, and close, the gap between human and artificial intelligence.
Its latest test is essentially a game with no explicit written rules; you have to “figure it out” to win. Given no rules, AI is extremely weak at experimentation. But humans have been training this skill since we first opened our eyes. This is how a young toddler sees the world. They bump into things. It hurts. They remember never to do that again. All of these small experiments teach them how to function as a human.
I was speaking at a sales conference the other day, and I used this example to illustrate the difference between a junior sales rep and a senior one. The junior rep sticks to the script and will likely have a lower close rate than someone more senior, who knows when to shut up, when to interject, and what the sweet-spot offer number is to close the deal. All of that wisdom comes from the hundreds of experiments run over years of experience.
PS: if you like these types of games, I recently came upon one called Baba Is You. It’s a good brain teaser, one grade above Candy Crush!
Why This Gap Exists
Large language models, including the most advanced ones, are fundamentally reactive systems. They process a prompt and generate a response based on patterns they absorbed during training. They are extraordinary at this. Ask an LLM to write, summarize, analyze, translate, or code, and it will often produce work that rivals or exceeds what most humans can do on those specific tasks.
But all of that performance is interpolation. The model is drawing on a massive library of patterns it has already seen and recombining them in sophisticated ways. It is not learning anything new in the moment. It is not updating its understanding of the world based on real-time feedback. It is not exploring.
ARC-AGI-3 is designed to measure how quickly a system can understand and master a completely new environment. This is the core of what intelligence actually is: not what you already know, but how fast you can figure out something you have never encountered before.
When you drop a human into one of these games, they start poking around. They try moving in different directions. They observe what happens when they interact with objects. Within a few moments, they have built a rough mental model. A few moments later, they have refined it. And then they win.
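To make that toddler loop concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the HiddenRuleWorld environment, its walls, and its rewards are my own toy, not the actual ARC-AGI-3 interface. The point is the loop itself: act, get feedback, remember what hurt, and never do it again.

```python
import random

# A toy world with hidden rules: the agent is never told that walls
# hurt or that reaching the goal wins. It has to find out by acting.
# (This environment is invented for illustration; it is not ARC-AGI-3.)
class HiddenRuleWorld:
    def __init__(self):
        self.pos = (0, 0)
        self.goal = (2, 2)
        self.walls = {(1, 0), (0, 2)}  # unknown to the agent

    def step(self, move):
        x, y = self.pos
        dx, dy = move
        nxt = (min(max(x + dx, 0), 2), min(max(y + dy, 0), 2))
        if nxt in self.walls:
            return -1, False          # it hurts; the agent stays put
        self.pos = nxt
        return (1, True) if nxt == self.goal else (0, False)

# The toddler loop: try things, remember what hurt, never repeat it.
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
world = HiddenRuleWorld()
bad = set()  # (position, move) pairs that hurt before

for step in range(100):
    options = [m for m in moves if (world.pos, m) not in bad] or moves
    move = random.choice(options)
    before = world.pos
    reward, done = world.step(move)
    if reward < 0:
        bad.add((before, move))       # one bad experience is enough
    if done:
        print(f"Solved in {step + 1} moves, with no written rules.")
        break
```

The sketch usually converges within a handful of moves because a single painful step permanently prunes an action from that position. That in-the-moment updating is exactly what today’s models do not do between prompts.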
The Bee and the Beekeeper
This is why I wrote about the bee versus the beekeeper in my book. AI is the bee, not the beekeeper.
Bees are remarkable at executing known patterns. They follow established routines with extraordinary precision, speed, and scale. In a business context, AI does the same thing. It processes documents, generates content, analyzes data, and automates workflows faster and cheaper than humans ever could. If you know exactly what needs to be done and can define the task clearly, AI is your best employee.
But the beekeeper is the one who walks into unfamiliar territory. The beekeeper reads the environment, makes judgment calls with incomplete information, adapts the strategy when conditions change, and decides what the bees should be doing in the first place.
ARC-AGI-3 is, in essence, a beekeeper test. It asks: can you figure out a world you have never seen before? Can you set your own goals when nobody has told you what to do? Can you learn from your own experience and transfer that learning to new situations?
Humans pass this test effortlessly. AI fails completely. And that should fundamentally shape how organizations think about where to deploy AI and where to invest in human capability.
What does this mean for humans?
Is there a way we can train ourselves to stay ahead of AI?
The answer lies in seeking out the unfamiliar. AI excels at processing known patterns, but it struggles with “edge cases”: the messy, unpredictable realities of life. To train your brain to outperform an algorithm, you must embrace discomfort. Travel to new places, navigate complex systems, and allow yourself to get lost.
While specializing in your field is important, true innovation comes from "cross-pollination." By studying subjects far outside your professional lane, you diversify your internal knowledge base and learn to apply foreign concepts to your work. In my book, I call this Fluid Intelligence. Unlike Crystallized Intelligence, the static, trained data that AI relies on, Fluid Intelligence is the human ability to solve problems creatively and adapt concepts across seemingly unrelated fields. Historically, this adaptability has been the bedrock of our greatest inventions.
So, explore, experiment, learn, and adapt. And get better at being human.
Hello! I’m Sharon Gai, author of How to Do More with Less: Future-Proofing in an AI-Driven World, and a keynote speaker on AI and its effects on workers and the future of work.


