Is your AI benchmark lying to you?

By Michael Brooks, Nature, August 6, 2025


Anshul Kundaje sums up his frustration with the use of artificial intelligence in science in three words: “bad benchmarks propagate”.

Kundaje researches computational genomics at Stanford University in California. He is keen to incorporate any form of artificial intelligence (AI) that helps to accelerate progress in his field — and countless researchers have stepped up to offer tools for this purpose. But finding the ones that work best is becoming ever harder because some researchers have been making questionable claims about the AI models they have developed. These claims can take months to check. And they often turn out to be false — mainly because the benchmarks used to demonstrate and compare performance of these tools are not fit for purpose.

By then, it’s often too late: Kundaje and his colleagues are left playing whack-a-mole after the flawed benchmarks have been adopted and ‘improved’ by enthusiastic, but naive, users. “In the meantime, everyone has been using these [benchmarks] for all kinds of wrong stuff, and then you have wrong information and wrong predictions out there,” he says.
