LLM Reference
Status: active · Category: Safety

ToxiGen

Metric: Toxicity Classification Accuracy (higher is better)
Introduced: 2022
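Toxicity classification accuracy is the fraction of statements whose predicted label (toxic or benign) matches the gold label. The minimal sketch below assumes binary gold labels and binary model predictions; the `Example` record layout and its field names are hypothetical, and the exact evaluation protocol (which split is scored, how model outputs become binary labels) is not specified here. The per-group breakdown is an optional addition that reflects the dataset's coverage of 13 groups.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Example:
    # Hypothetical record layout; the released dataset's field names may differ.
    text: str
    target_group: str
    is_toxic: bool  # gold label: True = toxic, False = benign


def toxicity_accuracy(
    examples: List[Example], predictions: List[bool]
) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-group accuracy (higher is better)."""
    if not examples:
        raise ValueError("need at least one example")
    if len(examples) != len(predictions):
        raise ValueError("need exactly one prediction per example")
    correct = 0
    group_correct: Dict[str, int] = defaultdict(int)
    group_total: Dict[str, int] = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        hit = int(pred == ex.is_toxic)  # 1 if the prediction matches the gold label
        correct += hit
        group_correct[ex.target_group] += hit
        group_total[ex.target_group] += 1
    overall = correct / len(examples)
    per_group = {g: group_correct[g] / group_total[g] for g in group_total}
    return overall, per_group
```

For example, with gold labels `[True, False, True, False]` split evenly across two placeholder groups and predictions `[True, True, True, False]`, the function returns an overall accuracy of 0.75, with 0.5 for the first group and 1.0 for the second.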

About

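ToxiGen contains about 274,000 machine-generated statements, both toxic and benign, about 13 minority groups. The statements were generated with GPT-3 using adversarial classifier-in-the-loop decoding, which steers generation toward text that a toxicity classifier misjudges. As a result, the benchmark targets implicit hate speech: toxic statements that lack obvious slurs or profanity and are therefore hard to detect.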
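The classifier-in-the-loop idea can be illustrated with a decoding loop that rescores each candidate next token by adding the language model's log-probability to a weighted classifier log-probability of a chosen target label. This is a toy greedy sketch under loud assumptions: `toy_lm`, `toy_classifier`, the three-word vocabulary, and the `weight` parameter are all hypothetical stand-ins, not ToxiGen's released code, and the original method uses a real pretrained language model and toxicity classifier. In the adversarial setting, the target label is set opposite to the label the prompt is meant to produce, so generation drifts toward statements the classifier gets wrong.

```python
import math
from typing import Callable, Dict, List

# Hypothetical stand-in types: an LM maps a prefix to next-token probabilities,
# and a classifier maps a token sequence to P(toxic).
LM = Callable[[List[str]], Dict[str, float]]
Classifier = Callable[[List[str]], float]


def toy_lm(prefix: List[str]) -> Dict[str, float]:
    # Toy distribution that ignores the prefix; a real LM would condition on it.
    return {"nice": 0.5, "rude": 0.3, "calm": 0.2}


def toy_classifier(tokens: List[str]) -> float:
    # Toy P(toxic): Laplace-smoothed fraction of "rude" tokens, always in (0, 1).
    return (tokens.count("rude") + 1) / (len(tokens) + 2)


def classifier_in_the_loop_decode(
    lm: LM,
    clf: Classifier,
    prefix: List[str],
    target_toxic: bool,
    weight: float,
    steps: int,
) -> List[str]:
    """Greedy decoding scoring each candidate as
    log P_lm(token) + weight * log P_clf(target label | prefix + token)."""
    out = list(prefix)
    for _ in range(steps):
        best_token, best_score = "", -math.inf
        for token, p_lm in lm(out).items():
            p_toxic = clf(out + [token])
            p_target = p_toxic if target_toxic else 1.0 - p_toxic
            score = math.log(p_lm) + weight * math.log(p_target)
            if score > best_score:
                best_token, best_score = token, score
        out.append(best_token)
    return out
```

With `weight=0.0` the loop reduces to plain greedy decoding and always picks `"nice"`, the LM's most likely token. With `target_toxic=True` and `weight=5.0`, the classifier term outweighs the LM's preference and the toy decoder emits `["rude", "rude", "rude"]`, showing how the classifier pulls generation toward its target label.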