Vision AI Checkup
Take your LLM to the optometrist
Run on 85 prompts.
Rank  Model                    Score   Time
#1    GPT-4.1 Mini             77.6%   6.85s
#1    ChatGPT-4o               77.6%   6.47s
#2    OpenAI O4 Mini           75.0%   11.78s
#3    GPT-4.1                  73.7%   7.13s
#4    Gemini 2.5 Pro Preview   72.4%   12.46s
#5    Llama 4 Scout 17B        69.7%   3.06s
#5    Gemma 3 27B              69.7%   2.86s
#5    Gemini 2.0 Flash         69.7%   3.04s
#5    Claude 3.5 Haiku         69.7%   4.42s
#6    Claude 3.7 Sonnet        68.4%   5.32s
Explore Prompts
Explore the prompts we run as part of the Vision AI Checkup.
(p.s.: you can add your own too!)
Is the glass rim cracked? Answer only yes or no.

How wide is the sticker in inches? Answer only a number.

How many bottles are in the image? Answer only a number.

What date is picked on the calendar? Answer like January 1 2020

How much tax was paid? Answer only like $1.00.

What is the serial number on the tire? Answer only the serial number.

About Vision AI Checkup
Vision AI Checkup measures how well new multimodal models perform at real-world use cases.
Our assessment consists of dozens of images, questions, and answers that we use to benchmark models. We run the checkup every time we add a new model to the leaderboard.
You can use the Vision AI Checkup to gauge how well a model performs overall, without having to parse a complex benchmark with thousands of data points.
The assessment and models are constantly evolving. As more tasks are added and models receive updates, we build a clearer picture of the current state-of-the-art models in real time.
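The prompts above each ask for a single constrained answer ("Answer only yes or no", "Answer only a number"), so a score like 77.6% can be read as the fraction of prompts a model answered exactly. The sketch below shows one way such grading could work; it is a minimal illustration with hypothetical helper names, not the project's actual scoring code, and it assumes simple case- and whitespace-insensitive exact matching.

```python
def grade(model_answer: str, expected: str) -> bool:
    # Normalize whitespace and case before comparing, since prompts
    # constrain the answer format ("yes"/"no", a number, "$1.00", ...).
    return model_answer.strip().lower() == expected.strip().lower()

def checkup_score(results: list[tuple[str, str]]) -> float:
    # `results` pairs each model answer with the expected answer.
    correct = sum(grade(got, want) for got, want in results)
    return 100 * correct / len(results)

# Example: 3 of 4 answers match exactly -> 75.0
score = checkup_score([
    ("Yes", "yes"),
    ("2", "2"),
    ("$1.00", "$1.00"),
    ("May 1 2020", "January 1 2020"),
])
```

Stricter variants (e.g. no normalization at all) would penalize formatting drift more heavily, which is why the prompts spell out the expected answer shape.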
Contribute a Prompt
Have an idea for a prompt? Submit it to the project repository on GitHub!