adjustable glasses illustration

eye chart illustration

clipboard illustration

lenses illustration

Vision AI Checkup

See how 20+ vision-language models perform on dozens of real-world tasks.

Run on 90 prompts.

Model

Score

Avg Time / Prompt

OpenAI O4 Mini (High Reasoning)

OpenAI O4 Mini (Medium Reasoning)

OpenAI O3 (Medium Reasoning)

ChatGPT-4o (High Reasoning)

OpenAI O3 (High Reasoning)

ChatGPT-4o (Medium Reasoning)

GPT-5 (high reasoning)

Explore Prompts

Explore the prompts we run as part of the Vision AI Checkup.

(p.s.: you can add your own too!)

Is the glass rim cracked? Answer only yes or no.

Glass rim crack

How wide is the sticker in inches? Return only a real number.

Measure a sticker

How many bottles are in the image? Answer only a number

Count bottles

What date is picked on the calendar? Answer like January 1 2020

Date picker

How much tax was paid? Only answer like $1.00

Read a receipt

What is the serial number on the tire? Answer only the serial number.

Read a serial number

Explore all 90 prompts →

About Vision AI Checkup

Vision AI Checkup measures how well new multimodal models perform at real world use cases.

Our assessment consists of dozens of images, questions, and answers that we benchmark against models. We run the checkup every time we add a new model to the leaderboard.

You can use the Vision AI checkup to gauge how well a model does generally, without having to understand a complex benchmark with thousands of data points.

The assessment and models are constantly evolving. This means that as more tasks get added or models receive updates, we can build a clearer picture of the current state-of-the-art models in real-time.

All of our assessment code is open source.

Contribute a Prompt

Lightbulb

Have an idea for a prompt? Submit it to the project repository on GitHub!