
Reframing AI Success: Introducing 'Humanity's Best Exam' for Social Good

Current AI benchmarks focus on risks, fueling public fear. This article proposes 'Humanity's Best Exam,' a new evaluation standard designed to steer AI development toward solving the world's biggest challenges in health, climate, and education, fostering a more hopeful and productive future for artificial intelligence.

It feels like we're stuck in a loop. A new AI model is announced, and the news cycle spins up. We hear about its incredible power, see it pass another standardized test, and then, almost immediately, we're hit with headlines about the potential dangers. Could it be used to design a weapon? Will it automate away our jobs? The narrative is so focused on technical specs and potential threats that it's easy to feel like every step forward for AI is a step back for humanity.

The way we measure progress—our benchmarks—is largely to blame. These tests often feel like they're designed to scare us, with names like "Humanity's Last Exam." They excel at measuring an AI's raw power or its potential for misuse, but they miss a huge part of the story: AI's incredible potential to do good.

This isn't to say we should ignore the risks. Vigilance is essential. But the current landscape is dangerously imbalanced. It's time to change the test. It's time for "Humanity's Best Exam."

A New Yardstick for AI Success

Imagine a different kind of evaluation. Instead of asking an AI to solve abstract puzzles, we challenge it with real-world problems that matter to all of us. Think about an AI benchmark that measures success by:

  • Saving Lives: Identifying early-stage diseases from medical scans with superhuman accuracy, sparing millions of people from preventable illness.
  • Healing the Planet: Designing new, cost-effective catalysts that capture carbon dioxide from the air, or building models that predict natural disasters with enough lead time for communities to evacuate.
  • Unlocking Potential: Generating personalized learning plans that help students master complex subjects, effectively closing educational gaps.

This is the vision behind Humanity's Best Exam—a benchmark that measures an AI's capacity to serve the public good and advance human flourishing.

Why This New Exam Matters

Adopting a benchmark focused on positive impact would be transformative for several key reasons:

  1. Channels Competition for Good: AI labs are fiercely competitive and driven to top the leaderboards. By creating leaderboards for societal challenges, we can redirect that immense talent and energy toward the world's most pressing problems.

  2. Reshapes the Public Narrative: The story we tell about technology matters. If the most visible metrics are about danger, the public will naturally be fearful. A steady stream of news about AI helping to cure diseases or fight climate change would create a more balanced, hopeful, and informed public conversation.

  3. Guides a Smarter Path Forward: Policymakers, investors, and researchers need clear signals on where to focus their efforts. A benchmark for social good would highlight where AI can deliver the most significant returns for society, encouraging proactive investment and supportive policies rather than reactive, fear-based regulations.

Who Builds the Test?

Creating Humanity's Best Exam is a big job that can't be left to a single entity. It would require an independent, multi-stakeholder group: a consortium of experts from academia, nonprofits, international organizations, and civil society. This group would define the challenges, perhaps drawing inspiration from global frameworks like the UN's Sustainable Development Goals, and ensure the tests are rigorous and fair, and that they evolve as technology and society's needs change.

Key Takeaways

Our current focus on AI's potential downsides is creating an atmosphere of anxiety. We're meticulously tracking how AI could go wrong while failing to systematically encourage and measure its profound potential to go right.

Here’s a summary of the path forward:

  • The Problem: Current AI benchmarks are imbalanced, focusing on risks and fueling public fear.
  • The Solution: Create "Humanity's Best Exam," a new benchmark that evaluates AI on its ability to solve major societal problems.
  • The Benefit: This would harness AI competition for good, foster a more hopeful public narrative, and guide policymakers toward proactive innovation.
  • The How: An independent, multi-stakeholder consortium should be formed to build and govern this new benchmark.

It's time to reorient our focus. Let's not just prepare for AI's worst-case scenarios; let's actively architect its very best.
