LLM Reference

Benchmark Leaderboard

Top models by HellaSwag score

| # | Model | Score | Version | Source |
|---|-------|-------|---------|--------|
| 1 | GPT-4o (05-13) | 96.4 | 10-shot | HELM, Open LLM Leaderboard |
| 2 | Claude 3.5 Sonnet | 96.2 | 10-shot | HELM, official documentation |
| 3 | Llama 3.1 405B | 95.8 | 10-shot | Open LLM Leaderboard, Meta official |
| 4 | DeepSeek V3 | 95.7 | 10-shot | Open LLM Leaderboard, DeepSeek official |
| 5 | Qwen2.5 72B | 95.6 | 10-shot | research |
| 6 | Qwen2.5 72B Instruct | 95.6 | standard | Open LLM Leaderboard |
| 7 | Llama 3.1 70B Instruct | 94.2 | 10-shot | Open LLM Leaderboard |
| 8 | Mistral Medium | 93.9 | 10-shot | research |
| 9 | Mistral Large 2 | 93.8 | 10-shot | Open LLM Leaderboard |
| 10 | Mixtral 8x22B v0.1 | 93.8 | 10-shot | research |
| 11 | Falcon 180B | 92.7 | 10-shot | Open LLM Leaderboard |
| 12 | Gemma 2 27B | 92.6 | 10-shot | research |
| 13 | Llama 3 70B | 92.4 | 10-shot | research |
| 14 | Qwen2 7B | 92 | 10-shot | research |
| 15 | Mistral NeMo Instruct (2407) | 91.8 | 10-shot | Open LLM Leaderboard |
| 16 | StarCoder2 15B | 91.7 | 10-shot | research |
| 17 | DeepSeek Coder V2 Lite | 91.4 | 10-shot | Open LLM Leaderboard |
| 18 | Llama 3 8B Instruct | 91.1 | 10-shot | research |
| 19 | Mixtral 8x7B | 90.9 | 10-shot | Open LLM Leaderboard |
| 20 | Command R | 90.8 | 10-shot | research |