GPT-JT Models by Together.ai
About
GPT-JT is a series of large language models fine-tuned from EleutherAI's GPT-J 6B model. The models were trained with a decentralized training algorithm that remains efficient over a network with relatively slow interconnect speeds, which makes it possible to use diverse hardware resources. The training process combines several open-source techniques and datasets, including Google Research's UL2 training objective, Chain-of-Thought prompting, BigScience's Public Pool of Prompts (P3), and AllenAI's Natural Instructions (NI). As a result, GPT-JT models perform strongly on classification benchmarks and are known to outperform models with significantly more parameters. The models are open source, and the community is invited to contribute further improvements.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| GPT-JT 6B V0 | Use when the workload needs 6B parameters. | 2023-03 | 6B parameters | Current |
| GPT-JT 6B V1 | Use when the workload needs 6B parameters. | 2023-03 | 6B parameters | Current |
| GPT-JT Moderation 6B | Use when the workload needs safety, 6B parameters, and structured outputs. | 2023-03 | safety, 6B parameters, structured outputs | Current |
Release Timeline
1 release group
Specifications (3 models)
| Model | Released | Parameters | Structured Outputs |
|---|---|---|---|
| GPT-JT 6B V0 | 2023-03 | 6B | No |
| GPT-JT 6B V1 | 2023-03 | 6B | No |
| GPT-JT Moderation 6B | 2023-03 | 6B | Yes |
Available From (2 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| GPT-JT Moderation 6B | Together AI | $0.2 | $0.2 | Serverless |
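To make the per-million-token prices in the pricing table concrete, the following is a minimal sketch of a cost estimate. It assumes input and output tokens are billed linearly at the listed serverless rates. The function name `estimate_cost` and the example workload are hypothetical and chosen for illustration only.

```python
from decimal import Decimal

# Prices taken from the pricing table (USD per 1M tokens, serverless).
INPUT_PRICE_PER_M = Decimal("0.2")
OUTPUT_PRICE_PER_M = Decimal("0.2")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate USD cost, assuming simple linear per-token billing."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10,000 moderation calls,
    # each with about 500 input tokens and 10 output tokens.
    calls = 10_000
    total = estimate_cost(500 * calls, 10 * calls)
    print(f"Estimated cost: ${total:.2f}")
```

Under these assumptions, the hypothetical workload uses 5M input tokens and 0.1M output tokens, which comes to about $1.02. Actual billing may differ, so check the provider's current pricing before budgeting.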
Frequently Asked Questions
- What is GPT-JT used for?
- The listed model capabilities point to safety and structured-output workloads, both covered by GPT-JT Moderation 6B. The family description also highlights strong performance on classification benchmarks.
- How does GPT-JT compare to Together General?
- GPT-JT by Together.ai is strongest where you need safety. Together General, also by Together.ai, is the closest related family to check when selecting an adjacent model. GPT-JT lists 3 variants, all with 6B parameters, and Together General reaches up to 4k context. Compare the specs and pricing tables before choosing a production model.
- Which GPT-JT model should I use?
- For the lowest listed input price, start with GPT-JT Moderation 6B through Together AI at $0.2 per 1M input tokens. It is the only variant with listed pricing and the only one with structured outputs, so it is also the one to evaluate when you need the most capable option.




