Hermes 3 Models by Nous Research
About
The Hermes 3 family of large language models (LLMs), developed by NousResearch, represents a significant advancement in generalist instruction models 146. Built upon the Llama 3.1 foundation model, Hermes 3 models are available in 8B, 70B, and 405B parameter versions 146. A key design principle is enhanced steerability, achieved through targeted training to precisely follow system and instruction prompts in a neutral and adaptive manner 146. This leads to models that are highly responsive to system prompts, allowing fine-grained control over behavior and persona 8. In addition to instruction following, Hermes 3 features long-term context retention, multi-turn conversation, complex role-playing, internal monologue abilities, and enhanced agentic function-calling 146. These models excel in structured output generation, utilizing XML tags and scratchpads for transparency and accuracy 1. The training data is a carefully curated blend of approximately 390 million tokens, including a significant portion of synthetically generated responses to encourage precise instruction following and nuanced reasoning 13. While NousResearch claims superior performance over Llama 3.1 in certain areas 46, independent evaluations indicate mixed results, underscoring the challenges in benchmarking and comparing LLMs 3.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
Use when the workload needs 128k context and 405B parameters.
Use when the workload needs 128k context and 70B parameters.
Use when the workload needs 128k context and 8B parameters.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Hermes 3 Llama 3.1 405B | Use when the workload needs 128k context and 405B parameters. | 2024-11 | 128k context405B parameters | Current |
| Hermes 3 Llama 3.1 70B | Use when the workload needs 128k context and 70B parameters. | 2024-11 | 128k context70B parameters | Current |
| Hermes 3 Llama 3.1 8B | Use when the workload needs 128k context and 8B parameters. | 2024-11 | 128k context8B parameters | Current |
Release Timeline
1 release groupSpecifications(3 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| Hermes 3 Llama 3.1 405B | 2024-11 | 128k | 405B |
| Hermes 3 Llama 3.1 70B | 2024-11 | 128k | 70B |
| Hermes 3 Llama 3.1 8B | 2024-11 | 128k | 8B |
Frequently Asked Questions
- What is Hermes 3 used for?
- Hermes 3 is used for agent workflows, structured outputs, and chatbot and role-playing use cases. The family description and listed model capabilities point to those workloads as the best fit.
- How does Hermes 3 compare to MOSS-Audio?
- Hermes 3 by Nous Research is strongest where you need agent workflows, while MOSS-Audio by MOSI Intelligence is the closest related family to check for multimodal. Hermes 3 has 3 listed variants and reaches up to 128k context, so compare the specs and pricing tables before choosing a production model.
- Which Hermes 3 model should I use?
- If price is the main constraint, use the pricing table first because Hermes 3 does not have complete provider pricing in the local data. For the most capable/latest local choice, evaluate Hermes 3 Llama 3.1 405B with 128k context.




