ToxiGen
Status: active
Category: Safety
Metric: Toxicity Classification Accuracy (higher is better)
Introduced: 2022
About
ToxiGen is a dataset of 274,000 machine-generated toxic and benign statements about 13 minority groups. The statements were produced with adversarial classifier-in-the-loop decoding, and the dataset is used to evaluate how well models detect implicit hate speech.
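As a rough illustration of the accuracy metric named on this card, the sketch below scores a classifier's toxic/benign predictions against reference labels. The function name `classification_accuracy`, the 1 = toxic / 0 = benign encoding, and the toy data are assumptions made for this example; they are not taken from ToxiGen or from any official evaluation harness.

```python
from typing import Sequence


def classification_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the fraction of statements whose predicted label matches the reference."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and reference labels.
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data (assumed encoding: 1 = toxic, 0 = benign).
toy_labels = [1, 0, 1, 0]
toy_predictions = [1, 0, 0, 0]
toy_accuracy = classification_accuracy(toy_predictions, toy_labels)
print(f"accuracy = {toy_accuracy:.2f}")
```

Plain accuracy weights toxic and benign statements equally, so it is most informative when the two classes are roughly balanced in the evaluation split.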