LLM Reference
Concepts & capability filters
Capability filter · capability · intermediate

Structured outputs

Also known as: JSON mode, schema outputs, constrained decoding

schema-shaped JSON

390 matching active models

43 tracked providers

345 models with routes

model.structured_outputs

Definition

Structured outputs are constrained responses shaped to a schema, JSON object, or function-call contract instead of free-form prose. They are useful for extraction, classification, routing, and agent control because downstream code can parse the result more reliably.
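In practice, a structured-output request attaches a JSON Schema to the API call and parses the reply with an ordinary JSON parser. A minimal offline sketch in the OpenAI-compatible `response_format` style — field names vary by provider, the model id is hypothetical, and the sample reply is simulated rather than fetched:

```python
import json

# JSON Schema the model's reply must conform to (a ticket-extraction example).
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "feature"]},
        "urgent": {"type": "boolean"},
    },
    "required": ["category", "urgent"],
}

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat payload asking for schema-shaped JSON.

    The response_format shape follows the OpenAI-compatible convention;
    other providers expose the same idea under different field names.
    """
    return {
        "model": "example-model",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "ticket", "schema": TICKET_SCHEMA, "strict": True},
        },
    }

def parse_reply(raw: str) -> dict:
    """Parse the model reply and check that required fields are present."""
    data = json.loads(raw)  # with constrained decoding, this should already be valid JSON
    for key in TICKET_SCHEMA["required"]:
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    return data

# Simulated model reply (no network call is made in this sketch).
reply = '{"category": "billing", "urgent": true}'
ticket = parse_reply(reply)
print(ticket["category"])  # billing
```

The downstream-reliability point from the definition shows up in `parse_reply`: code consumes fixed field names instead of scraping free-form prose.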

Models With Structured outputs

Showing the first 80 decision-sorted matches, with model flags and provider-route evidence from seed data.

390 matches
Mistral Small 3

Mistral Small 3 is a January 2025 Mistral model. Provider-specific rows, such as Together AI, belong in modelProvider coverage rather than in the model name.

2025-01-01

Researched 1d ago

33K

32,768 tokens

Tool useFunctionsJSON
Together AI

$0.100 in / $0.300 out / 1M tokens

1 route

Provider docs
Qwen3.5-9B

Open-weight small dense Qwen3.5 model. Apache 2.0.

2026-03-02

Researched 1d ago

262K

262,144 tokens

262K contextVisionMultimodalTool useFunctionsJSON
Alibaba Cloud PAI-EAS

$0.100 in / $0.150 out / 1M tokens

3 routes

Provider docs
Together AI - Gemma 3n-e4B

Efficient 4B parameter model from Google, available on Together AI. Gemma 3 nano-edge model optimized for low-latency inference.

2026-03-15

Researched 26d ago

8K

8,192 tokens

Tool useFunctionsJSON
Together AI

$0.020 in / $0.040 out / 1M tokens

1 route

Provider docs
Llama Guard 7B

Llama Guard 7B is a specialized content moderation model based on the Llama 2 architecture, designed to safeguard AI interactions. With 7 billion parameters, it excels in classifying and moderating both input prompts and output responses from large language models. The model employs a comprehensive risk taxonomy to identify various categories of harmful content, including violence, hate speech, and sexual content. Trained on diverse datasets, including prompts from the Anthropic dataset and in-house generated responses, Llama Guard 7B has demonstrated superior performance compared to industry-standard content moderation APIs. This makes it an invaluable tool for AI engineers focused on deploying safe and responsible AI systems. For more information, visit the model's page on Hugging Face.

2023-12-07

Researched 26d ago

2K

2,000 tokens

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

3 routes

Provider docs
Llama Guard 3 8B

Llama Guard 3 8B is a specialized large language model developed by Meta for content safety classification. Fine-tuned from Llama 3.1, this 8-billion parameter model excels in moderation tasks, classifying both inputs and outputs across 14 hazard categories based on the MLCommons taxonomy. It supports multiple languages, including English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Designed for AI engineers focusing on safe and responsible AI systems, Llama Guard 3 offers improved accuracy and reduced false positive rates in identifying unsafe content, making it a valuable tool for developing robust content moderation systems in conversational AI applications.

2024-07-23

Researched 26d ago

8K

8,000 tokens

JSON
OpenRouter

$0.480 in / $0.030 out / 1M tokens

4 routes

Provider docs
GPT-3.5 Turbo 16k

GPT-3.5 Turbo 16k, developed by OpenAI, is an advanced language model featuring a significantly enhanced context window of 16,384 tokens—four times larger than its predecessor's 4,096 tokens. This extension allows it to process and comprehend extended texts, up to approximately 20 pages, in a single interaction, while maintaining the speed and efficiency of earlier versions. Although it is a chat-centric model not compatible with the completions endpoint, it remains highly effective for tasks requiring prolonged relevance and coherence through OpenAI's API. With a knowledge cutoff in September 2021, it surpasses its predecessors in capability, yet it is less advanced than GPT-4 in visual and multilingual contexts.

2023-06-13

Researched 26d ago

16K

16,000 tokens

JSON
OpenRouter

$3.00 in / $4.00 out / 1M tokens

2 routes

Provider docs
Xiaomi MiMo-V2.5

Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.

2026-04-22

Researched 22d ago

1M

1,048,576 tokens

1M contextReasoningVisionMultimodalTool useFunctions
OpenRouter

$0.400 in / $2.00 out / 1M tokens

1 route

Provider docs
Xiaomi MiMo-V2.5-Pro

Xiaomi's April 22, 2026 public-beta flagship in the MiMo-V2.5 series. The official Xiaomi MiMo page describes MiMo-V2.5-Pro as its most capable model to date, focused on general agentic capability, complex software engineering, long-horizon tasks, and ultra-long-context instruction following. OpenRouter lists it as text-to-text with 1,048,576 token context, 131,072 max completion tokens, reasoning controls, tool use, and response_format support. Xiaomi says the V2.5 series will be open-sourced soon, but no public weights/license were verified at research time.

2026-04-22

Researched 22d ago

1M

1,048,576 tokens

1M contextTool useFunctionsJSON
OpenRouter

$1.00 in / $3.00 out / 1M tokens

1 route

Provider docs
Llama 2 7B Chat

The Llama 2 7B Chat model is a fine-tuned variant of Meta's Llama 2 series, optimized for conversational AI applications. Built on an auto-regressive transformer architecture, it boasts 7 billion parameters and has been trained on a diverse dataset of 2 trillion tokens. The model underwent supervised fine-tuning and reinforcement learning with human feedback to enhance its performance in dialogue scenarios. It demonstrates competitive capabilities in terms of helpfulness and safety compared to both open-source and closed-source alternatives like ChatGPT and PaLM. Designed for commercial and research use, particularly in English language tasks, it's well-suited for developing chatbots, virtual assistants, and other interactive AI systems. More details can be found on its Hugging Face page.

2023-07-18

Researched 26d ago

4K

4,000 tokens

JSON
DeepInfra

$0.070 in / $0.070 out / 1M tokens

10 routes

Provider docs
Llama 2 13B Chat

The Llama 2 13B Chat model is a 13 billion parameter generative text model developed by Meta, optimized for conversational applications. Released on July 18, 2023, it's part of the Llama 2 family and excels in dialogue scenarios. The model leverages supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to generate coherent and contextually relevant responses. Trained on 2 trillion tokens from diverse public sources, it outperforms many open-source chat models and matches popular closed-source models in helpfulness and safety. This model is ideal for AI engineers working on chatbots, virtual assistants, and customer service automation. For more details, visit the model's Hugging Face page [1].

2023-07-18

Researched 26d ago

4K

4,000 tokens

JSON
DeepInfra

$0.130 in / $0.130 out / 1M tokens

12 routes

Provider docs
CodeLlama 7B Python

CodeLlama 7B Python is a specialized variant of Meta's CodeLlama family, designed for Python programming tasks. With 7 billion parameters, it excels in code completion, infilling, and instruction following. The model utilizes an optimized auto-regressive transformer architecture and has been trained on diverse programming tasks. It's suitable for both commercial and research applications, offering AI engineers a powerful tool for enhancing productivity in Python-centric environments. For more details, visit the model's page on Hugging Face.

2023-08-24

Researched 26d ago

100K

100,000 tokens

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

4 routes

Provider docs
CodeLlama 7B

CodeLlama 7B is a specialized code generation model released by Meta on August 24, 2023. With 7 billion parameters, it excels in code completion, infilling, and instruction-following tasks. Built on an optimized transformer architecture, it's designed for both commercial and research applications, particularly in English and various programming languages. The model offers robust capabilities for AI engineers looking to enhance coding workflows or develop code generation applications. More details can be found on the official Hugging Face page.

2023-08-24

Researched 26d ago

100K

100,000 tokens

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

5 routes

Provider docs
CodeLlama 13B Python

CodeLlama 13B Python is a specialized variant of the CodeLlama family, featuring 13 billion parameters and optimized for Python programming tasks. This model excels at code synthesis, completion, and infilling, while also supporting instruction following and chat-based interactions. Leveraging advanced transformer architecture and trained on a diverse dataset, it offers robust understanding of programming concepts and syntax. Designed for AI engineers and developers, it serves as a powerful tool for integrating AI into coding workflows, making it particularly valuable for Python-related applications.

2023-08-24

Researched 26d ago

100K

100,000 tokens

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

4 routes

Provider docs
CodeLlama 13B

CodeLlama 13B is a state-of-the-art generative text model developed by Meta, specifically designed for code synthesis and understanding tasks. Released on August 24, 2023, this 13-billion-parameter model excels in general code generation and comprehension, making it suitable for a wide range of programming tasks, including code completion, infilling, and instruction following. It utilizes an optimized transformer architecture and has been trained on a diverse dataset similar to Llama 2, ensuring robust understanding of programming languages and coding practices. AI engineers can integrate CodeLlama 13B into various coding environments and tools for both commercial and research applications, leveraging its powerful capabilities to enhance productivity and streamline the coding process.

2023-08-24

Researched 26d ago

100K

100,000 tokens

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

4 routes

Provider docs
CodeLlama 34B Python

CodeLlama 34B Python is a specialized code generation model released by Meta on August 24, 2023. With 34 billion parameters, it excels in Python-specific tasks like code completion, infilling, and instruction following. This model offers AI engineers a powerful tool for enhancing coding workflows and productivity. Its architecture is optimized for understanding and generating complex code structures, making it particularly useful for software development tasks. The model is available in the Hugging Face Transformers format, facilitating easy integration into existing projects.

2023-08-24

Researched 26d ago

100K

100,000 tokens

JSON
Together AI

$0.800 in / $0.800 out / 1M tokens

4 routes

Provider docs
CodeLlama 34B

CodeLlama 34B is a powerful generative text model developed by Meta, specifically tailored for code synthesis and understanding. With 34 billion parameters, it excels in code completion, infilling, and instruction following, particularly for Python programming. The model utilizes an auto-regressive transformer architecture and has been trained on a diverse dataset of programming languages, making it versatile for various coding tasks. Designed for both commercial and research applications, CodeLlama 34B offers AI engineers a robust tool for integrating advanced code generation capabilities into their projects. More details can be found on the model's Hugging Face page.

2023-08-24

Researched 26d ago

100K

100,000 tokens

JSON
DeepInfra

$0.200 in / $0.450 out / 1M tokens

6 routes

Provider docs
GLM-5.1

Post-training variant of GLM-5 from Zhipu AI with enhanced reasoning and coding capabilities. 754B parameters (40B active) in Mixture of Experts architecture. Optimized for complex agentic workflows and multi-step reasoning. Available via Z.AI API and open weights under the MIT license.

2026-04-07

Researched 11d ago

200k

200,000 tokens

200k contextReasoningTool useFunctionsJSONCode exec
OpenRouter

$1.05 in / $3.50 out / 1M tokens

3 routes

Provider docs
Grok-2

Enhanced contextual memory with limited image input; political filter added.

2024-08-01

Researched 26d ago

128K

128,000 tokens

128K contextReasoningJSON
SiliconFlow

$0.500 in / $0.500 out / 1M tokens

1 route

Provider docs
Qwen1.5-32B

Qwen1.5-32B is a robust large language model from the Qwen1.5 series, serving as a beta version of Qwen2. It is a transformer-based, decoder-only model, pretrained on an extensive dataset. Key features include its 32 billion parameters and a support for up to 32K context length, alongside multilingual capabilities. The model demonstrates substantial performance enhancements over its predecessor, especially in chat applications, using advanced techniques like SwiGLU activation and group query attention. While there's a base version, the chat variant is fine-tuned for conversational AI. It's accessible on Hugging Face and other platforms for diverse applications.

2024-02-05

Researched 26d ago

No window data

JSON
Together AI

$0.800 in / $0.800 out / 1M tokens

2 routes

Provider docs
Gemma 7B Instruct

Gemma 7B Instruct is a cutting-edge large language model developed by Google DeepMind, boasting 7 billion parameters. As part of the Gemma family, it benefits from the advanced research underpinning Google's Gemini models. This model is optimized for text generation tasks, excelling in areas like question answering and summarization, and it is finely tuned to follow instructions effectively. Despite its compact size, Gemma 7B Instruct performs impressively on benchmarks, making it versatile for deployment across various hardware platforms, from laptops to cloud infrastructure. Moreover, it is open-source, with accessible weights and incorporates responsible AI practices, such as data filtering and human feedback, to ensure safe and ethical use.

2024-02-21

Researched 26d ago

8K

8,000 tokens

JSON
Lepton AI API

$0.070 in / $0.070 out / 1M tokens

8 routes

Provider docs
CodeLlama 70B Python

CodeLlama 70B Python is a specialized AI model by Meta, designed for Python code synthesis and understanding. With 70 billion parameters, it excels in code completion, infilling, and instruction following tasks. The model leverages an optimized transformer architecture and has been fine-tuned to handle contexts of up to 16,000 tokens, making it particularly effective for Python-centric development workflows. While it doesn't reach the 100,000-token contexts of some other CodeLlama variants, it offers powerful capabilities for both commercial and research applications in Python programming environments. More details can be found in the research paper "Code Llama: Open Foundation Models for Code".

2024-01-29

Researched 26d ago

16K

16,000 tokens

JSON
Fireworks AI

$0.900 in / $0.900 out / 1M tokens

4 routes

Provider docs
Falcon 7B

Falcon-7B, developed by the Technology Innovation Institute, is a cutting-edge large language model boasting a decoder-only architecture with 7 billion parameters. It's trained on 1,500 billion tokens from the curated web dataset, RefinedWeb, enhancing its performance in language tasks. The model is equipped with advanced features like FlashAttention and multiquery attention, optimizing speed and memory usage. With 32 layers and rotary positional embeddings, it manages a sequence length of up to 2048 tokens efficiently. Renowned for tasks such as text generation, summarization, translation, and conversational AI, Falcon-7B is open-source under Apache 2.0, suitable even for consumer hardware, needing at least 16GB of memory for inference.

2023-11-28

Researched 26d ago

No window data

JSON
Microsoft Foundry

$0.520 in / $0.670 out / 1M tokens

4 routes

Provider docs
Nova Pro

Amazon Nova Pro available on AWS Bedrock

2025-03-17

Researched 26d ago

300K

300,000 tokens

300K contextJSON
AWS Bedrock

$0.800 in / $3.20 out / 1M tokens

2 routes

Provider docs
Reka Edge

7B dense multimodal model. Outperforms much larger models.

2024-02-12

Researched 26d ago

64K

64,000 tokens

MultimodalJSON
OpenRouter

$0.100 in / $0.100 out / 1M tokens

2 routes

Provider docs
WizardLM-2 7B

WizardLM-2 7B is a large language model developed by WizardLM in collaboration with Microsoft AI. It is part of the WizardLM-2 family, which includes larger models but is notable for its quick processing speed, achieving performance comparable to open-source models that are much larger. This multilingual model can process diverse input types, such as natural language text, code, and mathematical expressions. It showcases capabilities in text generation, question answering, summarization, as well as code generation and mathematical problem-solving. Particularly adept at complex chat scenarios and multilingual tasks, it is based on the Mistral-7B-v0.1 base model and is available as open-source under the Apache 2.0 license. Different versions exist with various quantizations, providing options that balance model size and performance.

2024-01-09

Researched 26d ago

No window data

JSON
Lepton AI API

$0.070 in / $0.070 out / 1M tokens

2 routes

Provider docs
Nous Hermes Llama 2 7B

The Nous Hermes Llama 2 7B is a state-of-the-art large language model built on the efficient Llama 2 transformer architecture. Fine-tuned on over 300,000 instructions, it exhibits several impressive features, such as generating long, detailed responses with a low hallucination rate. Notably, it lacks OpenAI's censorship, enabling more open discussions. The model excels in knowledge retention and task completion through extensive training on synthetic GPT-4 outputs and supports prompts in the versatile Alpaca format. Its benchmark performance varies across tasks like GPT4All and BigBench, and quantized versions are available, providing flexible deployment across various platforms.

2023-12-15

Researched 26d ago

No window data

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

2 routes

Provider docs
Mistral Medium

Mistral Medium is a versatile large language model developed by Mistral AI, designed to handle a wide array of tasks with a robust 32k token context window, allowing it to process approximately 24,000 words. Built on a transformer architecture, it offers native fluency in multiple languages, including English, French, Spanish, German, and Italian, enhancing its multilingual reasoning capabilities. Available via API, Mistral Medium is proprietary and stronger than some of Mistral AI's open-source models like Mixtral 8x7B and Mistral-7B. While it is described as more cost-effective than models such as GPT-4, specific pricing details are not provided.

2023-12-11

Researched 10d ago

32K

32,000 tokens

JSON
OpenRouter

$0.400 in / $2.00 out / 1M tokens

2 routes

Provider docs
Titan Text Lite

Amazon Titan Text Lite is a lightweight and efficient large language model designed specifically for English-language tasks. It excels in fine-tuning applications such as summarization and copywriting, providing a cost-effective and highly customizable solution for users. Despite being smaller and less expensive than other Titan Text models, it supports a variety of text generation tasks. The model supports a maximum context length of 4,000 tokens while maintaining efficiency and performance.

2023-11-29

Researched 26d ago

No window data

JSON
AWS Bedrock

$0.150 in / $0.200 out / 1M tokens

1 route

Provider docs
WizardLM 13B V1.0

The WizardLM 13B V1.0 is a transformer-based large language model built on the LLaMA architecture, featuring 13 billion parameters, enabling it to efficiently process a variety of natural language tasks. Its capabilities include text generation, coherent conversations, summarization, translation, and coding assistance. Trained on diverse datasets, the model has some limitations, such as potential biases, lack of content filtering, performance trade-offs due to quantization, and a context length limited to 2048 tokens. Despite these constraints, it is a powerful tool for numerous applications in natural language processing.

2023-04-28

Researched 26d ago

No window data

JSON
Together AI

$0.300 in / $0.300 out / 1M tokens

1 route

Provider docs
Llama 3 8B Instruct

The Llama 3 8B Instruct model, released on April 18, 2024, is Meta's latest instruction-following language model with 8 billion parameters. It utilizes an auto-regressive transformer architecture with Grouped-Query Attention for improved scalability. Trained on over 15 trillion tokens and fine-tuned with 10 million human-annotated examples, it excels in dialogue and conversational tasks. The model outperforms its predecessors on industry benchmarks, scoring 68.4 on MMLU (5-shot). Designed for commercial and research applications, it prioritizes safety and helpfulness, making it suitable for chatbots, virtual assistants, and other interactive AI applications. For more details, visit the Hugging Face page [1].

2024-04-18

Researched 26d ago

8K

8,000 tokens

JSON
OpenRouter

$0.030 in / $0.040 out / 1M tokens

16 routes

Provider docs
OpenChat 3.5 (0106)

OpenChat 3.5 is an open-source large language model renowned for delivering performance on par with ChatGPT despite having just 7 billion parameters compared to ChatGPT's 70 billion. Built on a transformer architecture, this model uses C-RLFT (Conditioned Reinforcement Learning Fine-Tuning) to learn from mixed-quality data without requiring explicit preference labels. It adeptly handles tasks such as text and code generation, including languages like HTML5 and JavaScript, and excels in multi-turn conversations. OpenChat 3.5 is open-source, making it an appealing choice for developers and researchers, and it is optimized to run efficiently on consumer-grade GPUs with 24GB of RAM. The model also includes specialized modes like "Mathematical Reasoning Mode" for task-specific enhancements. However, it is not without its challenges, as it shares common LLM limitations, such as hallucination and potential safety concerns related to biased or harmful outputs.

2024-01-06

Researched 26d ago

8K

8,000 tokens

JSON
Lepton AI API

$0.070 in / $0.070 out / 1M tokens

4 routes

Provider docs
Falcon 40B

Falcon 40B is a leading open-source large language model developed by the Technology Innovation Institute in Abu Dhabi, featuring a causal decoder-only architecture with 40 billion parameters. It stands out with its use of rotary positional embeddings, multi-query attention, and FlashAttention, enhancing its contextual understanding and processing efficiency. Trained on 1 trillion tokens using the enriched RefinedWeb dataset, Falcon 40B excels in various natural language processing tasks, ranging from text generation to language translation and question answering. It supports multiple languages and is open under the Apache 2.0 license, promoting both research and commercial use. The model efficiently utilizes standard hardware, requiring around 85-100 GB of memory for inference, setting a benchmark for performance and scalability in its category.

2023-11-28

Researched 26d ago

No window data

JSON
Microsoft Foundry

$1.54 in / $1.77 out / 1M tokens

4 routes

Provider docs
Mistral Large 2 (2407)

Flagship dense Mistral model (123B parameters) with 128K context. Strong performance on complex reasoning and long-context processing.

2024-07-23

Researched 22d ago

128K

128,000 tokens

128K contextVisionJSON
Chutes AI

$0.500 in / $1.50 out / 1M tokens

3 routes

Phi-3 Medium 4K

The Phi-3 Medium 4K, developed by Microsoft, is a state-of-the-art large language model with 14 billion parameters. It is engineered for efficiency across various tasks, particularly excelling in reasoning capabilities. This model is designed to handle 4,096 token context lengths, allowing for the processing of longer input sequences. Leveraging a dense, decoder-only Transformer architecture, it incorporates techniques like supervised fine-tuning and direct preference optimization to align with human preferences and safety standards. The model supports multilingual data, although it is primarily trained in English. Its lightweight nature allows for deployment on diverse hardware platforms, making it accessible and versatile for both commercial and research purposes. Safety measures are embedded, although further precautions are advised for applications with higher risks.

2024-05-21

Researched 26d ago

4K

4,000 tokens

JSON
DeepInfra

$0.140 in / $0.410 out / 1M tokens

3 routes

Provider docs
Nous Hermes Llama 2 13B

The Nous Hermes Llama 2 13B is an advanced large language model fine-tuned on over 300,000 instructions, building upon the Llama 2 architecture. Developed by Nous Research with contributions from Teknium, Emozilla, and compute sponsorship from Redmond AI, the model boasts features like extended response generation, reduced hallucinations, and compatibility with Alpaca prompt format. It is trained on a diverse dataset, including synthetic GPT-4 outputs, enhancing its knowledge and task completion capabilities. Known for strong performance across various benchmarks, it is suitable for applications like chatbots, content generation, and code generation, while still acknowledging limitations such as data bias and handling of open-ended prompts.

2023-12-15

Researched 26d ago

No window data

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

3 routes

Provider docs
GPT-3.5 Turbo (Instruct)

GPT-3.5 Turbo Instruct by OpenAI is designed to excel in precise instruction following and task completion, focusing on accuracy and clarity over conversational abilities. It offers key enhancements like efficient instruction adherence, reduced hallucination, and lower toxicity compared to previous models. Compatible with legacy completion endpoints, it retains the speed and affordability of the standard GPT-3.5 Turbo model while using a 4K context window and training data up to September 2021. Not specifically built for chat, it still supports diverse tasks like question answering, text completion, and code generation, aiming to enhance AI usability with safer and more accurate interactions.

2023-09-19

Researched 5d ago

4K

4,000 tokens

JSONBatch
OpenAI API

$1.50 in / $2.00 out / 1M tokens

3 routes · 1 batch

Provider docs
Nova Lite

Amazon Nova Lite available on AWS Bedrock

2025-03-17

Researched 26d ago

300K

300,000 tokens

300K contextJSON
AWS Bedrock

$0.060 in / $0.240 out / 1M tokens

2 routes

Provider docs
Jamba 1.5 Mini

Jamba 1.5 Mini available on AWS Bedrock

2024-08-22

Researched 26d ago

256K

256,000 tokens

256K contextFunctionsJSON
AWS Bedrock

$0.200 in / $0.400 out / 1M tokens

2 routes

Provider docs
Qwen1.5-110B

The Qwen1.5-110B is a large language model created by Alibaba Cloud, distinguished as the largest in the Qwen1.5 series. It is a transformer-based, decoder-only model equipped with 110 billion parameters and optimized for efficiency using features like SwiGLU activation and Grouped Query Attention (GQA). Pretrained on an extensive dataset, it supports a 32K context length and multilingual capabilities, handling various languages including English and Chinese. The model excels in tasks like text generation, dialogue systems, and is noted for its competitive performance and advanced tokenizer, making it highly versatile and applicable across multiple NLP tasks. Various quantized versions are available to accommodate different hardware specifications.

2024-04-25

Researched 26d ago

No window data

JSON
Together AI

$1.80 in / $1.80 out / 1M tokens

2 routes

Provider docs
Claude 3 Sonnet

Claude 3 Sonnet by Anthropic is a versatile large language AI model, balancing intelligence and speed for diverse enterprise use cases. It is part of the Claude 3 family, positioned between the powerful Opus and the faster Haiku models. Sonnet excels in nuanced content creation, accurate summarization, and complex scientific query handling while also showcasing proficiency in non-English languages and coding tasks. Additionally, it enhances vision capabilities with exceptional skills in visual reasoning, such as interpreting charts, graphs, and transcribing text from imperfect images, which benefits industries like retail, logistics, and finance. Operated at twice the speed of Claude 3 Opus, Sonnet is efficient in context-sensitive customer support and multi-step workflows. It has achieved AI Safety Level 2 (ASL-2) and is accessible through multiple platforms, including Claude.ai, the Claude iOS app, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.

2024-03-04

Researched 26d ago

200K

200,000 tokens

200K contextReasoningVisionMultimodalJSONCode exec
AWS Bedrock

$3.00 in / $15.00 out / 1M tokens

2 routes · 1 cache

Provider docs
Jurassic-2 Mid

Jurassic-2 Mid, developed by AI21 Labs, is a large language model that balances quality, speed, and cost, making it well-suited for complex language tasks such as chatbots and conversational interfaces. It has a parameter size of 17 billion and supports multiple languages including Spanish, French, German, Portuguese, Italian, and Dutch. Optimized for generating precise text from instruction-only prompts, the model is capable of zero-shot text generation without requiring examples. Despite its powerful capabilities, it shares common limitations with other LLMs, such as potential inaccuracies, lack of coherence, and the presence of training data biases. It was trained on a vast corpus of roughly 1.2 trillion tokens from diverse sources like CommonCrawl, Wikipedia, and Stack Exchange. Jurassic-2 Mid has an end-of-life date set for November 14, 2024.

2023-03-09

Researched 26d ago

No window data

JSON
AI21 Studio

$12.50 in / $12.50 out / 1M tokens

2 routes

Provider docs
Text Bison

Text Bison is an advanced language model developed by Google AI as a refined iteration of the Pathways Language Model (PaLM). It is particularly effective at handling a range of natural language processing tasks, including classification, sentiment analysis, entity extraction, question answering, summarization, and text rewriting. The model boasts a substantial token limit, initially set at 4096, later expanded to 32,000, allowing it to process longer texts compared to some earlier models. Despite its capabilities, Text Bison shares common large language model limitations, such as training data biases and the potential for generating incorrect or nonsensical outputs. Its knowledge is up-to-date only until February 2023, and it is scheduled for discontinuation in April 2025.

2023-12-06

Researched 26d ago

No window data

JSON
GCP Vertex AI

$0.500 in / $0.500 out / 1M tokens

1 route

Provider docs
Titan Text Express

Amazon Titan Text Express is a large language model (LLM) crafted by AWS, offering a balance of price and performance for text generation. As part of the Amazon Titan family, this versatile model excels in tasks such as open-ended text generation, conversational chat, and Retrieval Augmented Generation (RAG). It supports a context length of up to 8,000 tokens, enabling it to handle extensive text inputs effectively. While optimized for English, it provides preview support for over 100 other languages. This model can be fine-tuned with user data to enhance accuracy for specific tasks like summarization, code generation, table creation, data formatting, paraphrasing, and question answering.

2023-11-29

Researched 26d ago

No window data

JSON
AWS Bedrock

$0.200 in / $0.600 out / 1M tokens

1 route

Provider docs
DeepSeek 67B Chat

DeepSeek LLM 67B Chat is a sophisticated language model with 67 billion parameters, leveraging the LLaMA architecture with enhancements such as Grouped-Query Attention across 95 layers. Trained on a vast corpus of 2 trillion tokens in English and Chinese, it excels in tasks like text generation, question answering, and fluent conversation, demonstrating superior performance in reasoning, coding, and mathematics compared to some larger models. Despite its advanced capabilities, the model can exhibit biases from its training data, experience hallucinations, and produce repetitive outputs. Due to its size, substantial computational resources are needed for inference, although quantization methods can reduce its size with potential trade-offs in quality.

2023-11-29

Researched 26d ago

No window data

JSON

No tracked provider route

Llama 3 70B Instruct

The Llama 3 70B Instruct model is a large language model with 70 billion parameters, released by Meta on April 18, 2024. It's an instruction-tuned variant optimized for conversational applications, utilizing an advanced auto-regressive transformer architecture. The model excels in following instructions and engaging in dialogue, having been trained on over 15 trillion tokens with a December 2023 knowledge cutoff. It demonstrates superior performance on industry benchmarks, scoring 82.0 on the MMLU (5-shot) test. The model incorporates extensive safety measures and optimizations, including RLHF, to enhance helpfulness and reduce harmful content generation. For more details, visit the model's Hugging Face page.

2024-04-18

Researched 26d ago

8K

8,000 tokens

JSON
Hyperbolic AI Inference

$0.400 in / $0.400 out / 1M tokens

17 routes

Provider docs
DeepSeek R1

DeepSeek R1: Reasoning-optimized model with extended thinking capabilities. 128K context.

2025-01-20

Researched 26d ago

128K

128,000 tokens

128K contextReasoningJSONCode exec
Bitdeer AI

$0.100 in / $0.300 out / 1M tokens

13 routes

Provider docs
DeepSeek V3

DeepSeek V3: Latest flagship model. 685B total with MoE. 128K context. Open-source.

2024-12-26

Researched 26d ago

64K

64,000 tokens

Tool useFunctionsJSON
DeepSeek Platform

$0.140 in / $0.280 out / 1M tokens

12 routes

Provider docs
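Several cards above expose JSON mode through an OpenAI-compatible chat endpoint, DeepSeek Platform among them. A minimal sketch of a schema-shaped request (the model name `deepseek-chat`, the prompt wording, and the sample reply are illustrative assumptions; check the provider's current docs before relying on them):

```python
import json

# Sketch of a JSON-mode request body for an OpenAI-compatible chat endpoint.
# Field names follow the chat-completions shape; "deepseek-chat" and the
# instruction text are assumptions, not taken from provider documentation.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {
            "role": "system",
            "content": 'Reply only with JSON of the form '
                       '{"sentiment": "positive" | "negative", "score": <0..1>}.',
        },
        {"role": "user", "content": "The battery life is fantastic."},
    ],
    # JSON mode: asks the provider to constrain decoding to valid JSON.
    "response_format": {"type": "json_object"},
}

# Downstream code can then parse the reply deterministically:
reply = '{"sentiment": "positive", "score": 0.94}'  # illustrative response body
parsed = json.loads(reply)
print(parsed["sentiment"])  # → positive
```

The `response_format` field is the whole point: without it, the same prompt can come back wrapped in prose or markdown fences, and `json.loads` fails intermittently.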
Kimi K2.5

Moonshot AI's Kimi K2.5. Provider-specific rows, such as AWS Bedrock, belong in provider coverage rather than in the model description.

2026-03-15

Researched 26d ago

256K

256,000 tokens

256K contextFunctionsJSON
OpenRouter

$0.440 in / $2.00 out / 1M tokens

7 routes

Provider docs
Qwen2.5-72B-Instruct

Instruction-optimized flagship variant for demanding production applications requiring high-accuracy complex problem-solving across industries.

2024-06-07

Researched 26d ago

128K

128,000 tokens

128K contextJSON
SiliconFlow

$0.280 in / $0.280 out / 1M tokens

7 routes

Provider docs
Qwen2.5-7B-Instruct

Instruction-tuned 7B variant combining strong reasoning with real-time inference on single GPUs, ideal for developer tools and vision applications.

2024-06-07

Researched 26d ago

128K

128,000 tokens

128K contextJSON
DeepInfra

$0.030 in / $0.030 out / 1M tokens

6 routes

Provider docs
Command R+

Command R+ is a powerful large language model from Cohere, tailored for robust enterprise applications. It features 104 billion parameters and a 128K-token context window, allowing it to manage complex tasks and long dialogue sessions. Its capabilities include retrieval-augmented generation (RAG) with inline citations, multilingual support across ten major languages, and multi-step tool use to automate complex workflows. The model suits diverse business operations such as financial analysis, customer support, and content creation, and can be accessed via Cohere's API or through platforms like Microsoft Azure and Oracle Cloud Infrastructure. Its deployment is subject to a non-commercial license, though exceptions can be considered.

2024-04-04

Researched 26d ago

128K

128,000 tokens

128K contextJSON
Cohere API

$2.50 in / $10.00 out / 1M tokens

6 routes

Provider docs
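RAG replies like Command R+'s pair an answer with inline citations; whatever the provider, a schema-shaped reply is only safe downstream after a shape check, since JSON mode guarantees valid JSON but not the fields you expect. A minimal stdlib sketch, assuming a hypothetical `{answer, citations}` contract:

```python
import json

# Expected shape of the reply. These field names are hypothetical,
# not part of any provider's documented response contract.
EXPECTED = {"answer": str, "citations": list}

def parse_structured(reply: str) -> dict:
    """Parse a JSON-mode reply and verify it matches the expected shape."""
    data = json.loads(reply)  # raises ValueError on malformed JSON
    for key, typ in EXPECTED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

doc = parse_structured('{"answer": "Berlin", "citations": ["doc_3"]}')
print(doc["answer"])  # → Berlin
```

For anything beyond a couple of fields, a real JSON Schema validator is the better design choice; the hand-rolled check above only shows where validation belongs in the pipeline.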
Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.

2024-03-04

Researched 26d ago

200K

200,000 tokens

200K contextReasoningVisionMultimodalTool useFunctions
AWS Bedrock

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch

Provider docs
GLM-5

Flagship open-weight foundation model from Zhipu AI with 744B parameters (40B active per token) in Mixture of Experts architecture. Trained on 28.5T tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. Achieves state-of-the-art performance on coding and agentic benchmarks (SWE-bench Verified: 77.8%). Supports autonomous planning, multi-step tool use, and self-correction.

2026-02-11

Researched 26d ago

200K

200,000 tokens

200K contextReasoningTool useFunctionsJSON
OpenRouter

$0.600 in / $2.08 out / 1M tokens

5 routes

Provider docs
Qwen2.5-Coder-32B-Instruct

Instruction-optimized 32B code flagship for production systems requiring top-tier code reasoning, generation, and multi-file analysis.

2024-11-12

Researched 26d ago

No window data

JSONCode exec
SiliconFlow

$0.180 in / $0.180 out / 1M tokens

5 routes

Provider docs
Llama 3.2 11B Vision Instruct

Instruction-tuned 11B Llama 3.2 Vision model for image reasoning, visual question answering, document understanding, and captioning. NVIDIA NIM lists text plus image input, text output, and a 128K context window for the Llama 3.2 Vision collection.

2024-09-25

Researched 9d ago

128K

128,000 tokens

128K contextVisionMultimodalJSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

5 routes

Provider docs
Gemma 2 27B Instruct

Gemma 2 27B Instruct is a cutting-edge large language model from Google, excelling in text generation, question answering, summarization, and reasoning tasks. It features a decoder-only transformer architecture with 27 billion parameters and supports a context length of up to 8,192 tokens. The model incorporates mechanisms such as Grouped Query Attention and Sliding Window Attention to handle long texts efficiently. Its instruction-tuned variants are designed for improved interaction in conversational tasks, and it benefits from knowledge-distillation techniques for enhanced performance. Gemma 2 27B Instruct is openly accessible, promoting wider innovation in AI applications.

2024-06-27

Researched 26d ago

8K

8,000 tokens

JSON
Replicate API

$0.400 in / $0.400 out / 1M tokens

5 routes

Provider docs
Gemma 2 9B Instruct

Gemma 2 9B Instruct, developed by Google, is a state-of-the-art large language model based on the advanced Gemini framework. It is a decoder-only transformer model with 9 billion parameters, offering a balance between size and performance. The model is trained on an expansive dataset comprising 8 trillion tokens, including web documents, code, and mathematical text, a notable 30% increase from its predecessor, Gemma 1.1. This allows it to adeptly handle diverse tasks such as question answering, creative writing, coding, and mathematical problem-solving. However, it shares common limitations of large language models, such as potential biases and the risk of generating inaccuracies or outdated information. Notably, Gemma 2 9B Instruct incorporates Grouped-Query Attention (GQA) and uses the GeGLU activation function, and is specifically fine-tuned to follow instructions and participate effectively in multi-turn dialogues.

2024-06-27

Researched 26d ago

8K

8,000 tokens

JSON
Replicate API

$0.100 in / $0.100 out / 1M tokens

5 routes

Provider docs
SOLAR 10.7B

SOLAR 10.7B is a robust large language model created by Upstage AI in South Korea, featuring 10.7 billion parameters. It is tailored for high efficiency and performance through its "Depth Up-Scaling" (DUS) approach, which deepens the model's layers rather than widening them, enhancing capability without significantly increasing computational cost. This method distinguishes it from models that use more complex techniques such as Mixture of Experts. By integrating pre-trained weights from the Mistral 7B model with the Llama 2 framework, SOLAR 10.7B outperforms even some models with up to 30 billion parameters. It is available under the Apache 2.0 license, with a fine-tuned instruction-following variant under CC-BY-NC-4.0 that is optimized for single-turn conversations and diverse NLP tasks, albeit with limitations in multi-turn dialogue and complex context. The model is grounded in the transformer architecture widely adopted in advanced language models.

2024-06-24

Researched 26d ago

No window data

JSON
Fireworks AI

$0.200 in / $0.200 out / 1M tokens

5 routes

Provider docs
Phi-2

Phi-2 is a compact language model by Microsoft with 2.7 billion parameters, part of their Phi series. It shows formidable reasoning and language-understanding capabilities, outshining much larger models, including some with up to 25 times more parameters. Phi-2's training used a vast and diverse dataset of 1.4 trillion tokens, incorporating high-quality synthetic data and curated web content to bolster its common-sense reasoning and general knowledge. Notably, despite lacking fine-tuning via reinforcement learning from human feedback (RLHF), it exhibits enhanced safety characteristics and reduced bias. This makes Phi-2 a particularly useful asset in natural language processing research and development.

2023-12-12

Researched 26d ago

No window data

JSON
Microsoft Foundry

$0.070 in / $0.070 out / 1M tokens

5 routes

Provider docs
Command

Command R is a 35-billion parameter enterprise-grade LLM developed by CohereForAI, emphasizing scalability and performance. It features retrieval-augmented generation, optimized for long-context tasks up to 128k tokens, and robust multilingual capabilities across ten major languages. Additionally, it can interface with external tools and APIs. Command R excels in retrieving information from documents and enterprise data, providing well-cited responses to mitigate hallucinations. These capabilities are further amplified in its larger variant, Command R+, with 104 billion parameters. While highly capable, it requires human oversight to address domain-specific performance limitations and ethical concerns, including potential biases.

2023-11-14

Researched 26d ago

4K

4,000 tokens

JSON
Cohere API

$1.00 in / $2.00 out / 1M tokens

5 routes

Provider docs
GLM-4.7

GLM-4.7 is a language model from Zhipu AI in the GLM-4 family. Provider-specific rows, such as Fireworks AI, belong in provider coverage rather than in the model description.

2025-01-01

Researched 26d ago

No window data

JSON
OpenRouter

$0.380 in / $1.74 out / 1M tokens

4 routes

Provider docs
Qwen2-72B

Qwen2-72B is a cutting-edge large language model developed by Alibaba's Qwen team, featuring 72 billion parameters on the Transformer architecture. It employs enhancements such as SwiGLU activation, attention QKV bias, and grouped query attention to improve efficiency and precision. The model performs strongly across diverse benchmarks, excelling in language understanding, generation, coding, mathematics, and multilingual tasks, often surpassing other open-source models and challenging proprietary alternatives. With support for up to 128,000 tokens of context and proficiency in around 30 languages, it offers extensive input capabilities. However, the base model is not intended for direct text generation; post-training is advisable for specific applications.

2024-06-05

Researched 26d ago

128K

128,000 tokens

128K contextJSON
DeepInfra

$0.450 in / $0.650 out / 1M tokens

4 routes

Provider docs
WizardLM-2 8x22B

WizardLM-2 8x22B, developed by WizardLM@Microsoft AI, is a powerful large language model (LLM) featuring 141 billion parameters and a Mixture of Experts (MoE) architecture. It excels in complex tasks such as chat, multilingual conversation, reasoning, and agent-based interaction. Trained with an AI-powered synthetic-data system incorporating techniques like Evol-Instruct and AI Align AI, the model surpasses many open-source alternatives. Despite strong performance on various benchmarks, further research is needed to address potential biases and improve reliability beyond its initial toxicity testing.

2024-01-09

Researched 26d ago

No window data

JSON
Lepton AI API

$0.500 in / $0.500 out / 1M tokens

4 routes

Provider docs
Yi 34B

34B base model from 01.AI.

2023-11-02

Researched 26d ago

200K

200,000 tokens

200K contextJSON
DeepInfra

$0.250 in / $0.380 out / 1M tokens

4 routes

Provider docs
MythoMax L2 13B

MythoMax L2 13B is an advanced large language model developed by Gryphe, designed specifically for creative text generation, with a focus on storytelling and role-playing applications. It builds upon the Llama 2 architecture and utilizes a unique tensor merging technique that combines strengths from the MythoLogic-L2 and Huginn models. This enhances its ability to produce coherent, contextually relevant text for extended narratives and complex character interactions. The model features a substantial parameter count of 13 billion, supporting high-quality, fluent text generation. It is configured for optimal use with Alpaca-style prompt formatting and offers multiple quantized versions to suit different hardware configurations. The model can manage extensive context lengths, making it ideal for interactive storytelling, roleplaying games, and more. However, it requires substantial computational resources and careful prompt engineering to function effectively. Potential biases or inaccuracies are a consideration, as with any large language model.

2023-10-27

Researched 26d ago

No window data

JSON
OpenRouter

$0.060 in / $0.060 out / 1M tokens

4 routes

Provider docs