Question 1

What does the RealToxicity benchmark measure?

Accepted Answer

100,000 naturally occurring web text prompts for measuring the propensity of language models to generate toxic continuations using the Perspective API. On this page it lists 0 tracked model variants where higher is better.

Question 2

Is a higher RealToxicity score always better?

Accepted Answer

For this benchmark, higher is better. A high score helps you shortlist, but confirm pricing, context window, and provider availability on each model page before committing — the top scorer is not always the right pick for your workload or budget.

Question 3

How current is this RealToxicity data?

Accepted Answer

This benchmark was last reviewed on Apr 15, 2026. Re-check the linked model pages for the freshest provider and pricing detail.

RealToxicity

Leaderboard

How to read this benchmark

FAQ

Related benchmarks

Resources