AI Jargon Buster
AI news and the language around it, simplified.
What is a Benchmark?
A benchmark is a standardized test used to measure and compare the performance of different AI models. These tests act like a common exam for software. They evaluate how well an AI system handles specific tasks such as answering complex questions, solving math problems, writing computer code, or identifying objects in images. When AI companies claim their model is the best in its class, they are usually referring to these specific numerical scores. Because these tests are consistent, they allow researchers to track how much a new version of an AI has improved compared to older versions or competing products.
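For readers who like to see the idea in code, here is a minimal sketch of what "a common exam for software" could look like. Everything in it is made up for illustration: the four-question test, the names BENCHMARK, score, model_a and model_b, and the two toy "models", which are just lookup tables standing in for real AI systems. Real benchmarks are far larger and are graded more carefully.

```python
from typing import Callable

# Hypothetical mini-benchmark: a fixed list of questions with known correct answers.
# Every model is asked exactly the same questions.
BENCHMARK = [
    ("2 + 2", "4"),
    ("10 / 2", "5"),
    ("3 * 3", "9"),
    ("7 - 4", "3"),
]


def score(model: Callable[[str], str], benchmark=BENCHMARK) -> float:
    """Return the fraction of benchmark questions the model answers correctly."""
    correct = sum(1 for question, expected in benchmark if model(question) == expected)
    return correct / len(benchmark)


# Two toy stand-ins for AI models (illustrative only, not real systems).
def model_a(question: str) -> str:
    answers = {"2 + 2": "4", "10 / 2": "5", "3 * 3": "9", "7 - 4": "3"}
    return answers.get(question, "")


def model_b(question: str) -> str:
    answers = {"2 + 2": "4", "10 / 2": "5", "3 * 3": "6", "7 - 4": "2"}
    return answers.get(question, "")


# Because both models sat the same test, their scores can be compared directly.
scores = {"model_a": score(model_a), "model_b": score(model_b)}
print(scores)
```

The single number each model receives is the kind of figure that ends up in a product announcement. Note that it only describes performance on these particular questions.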
Why this matters to you
Benchmark results appear frequently in news reports when new AI models are released. Knowing that these figures come from standardized tests helps you interpret marketing claims about performance. A high score does not guarantee that a tool will be useful for your own daily tasks, because a benchmark only measures the tasks it contains. A model might perform perfectly on a math exam but struggle to follow simple instructions in a real office environment.
How you might hear this
The new model scored highest on the coding benchmark, but users reported it was actually less helpful than the previous version for everyday questions.