Mistral Small 3.1 24B Instruct
Mistral Small 3.1 24B Instruct is worth evaluating for RAG, long-context, and vision workloads when its provider route and context window match the workload.
Use it for
- Teams evaluating RAG, long-context, and vision workloads
- Workloads that can use a 128k context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Workloads where another current model has stronger sourced task evidence
Specifications
- Family: Mistral Small
- Released: 2025-12-15
- Context: 128k
- Parameters: 24B
- Architecture: Decoder-only
- Knowledge cutoff: 2023-10
- Specialization: General
- Openness: Open source
- License: Apache 2.0 (OSI-approved; commercial use permitted)
- Training: Pretrained
Cheapest of 6 routes · Together AI
About
Mistral Small 3.1 24B is a Mistral model with multimodal vision understanding. It is optimized for cost-efficient deployment, offers a 128k-token context window, and is available on Cloudflare Workers AI.
Mistral Small 3.1 24B Instruct is an open-source model in the Mistral Small family. The structured metadata tracks a 128k-token context window, multimodal input, and structured outputs. This page tracks provider routes through Cloudflare Workers AI, OpenRouter, Fireworks AI, and 3 more, with the cheapest tracked route listed at $0.10 input and $0.30 output per 1M tokens. No headline benchmark score is tracked for Mistral Small 3.1 24B Instruct yet.
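Whether a workload "can use" the 128k window mostly comes down to a token budget. The sketch below is a rough pre-check, not a tokenizer: it assumes "128k" means 128,000 tokens (a provider may expose a different exact limit), estimates tokens at about 4 characters each, and reserves room for the reply. The names `estimate_tokens` and `fits_context` are hypothetical helpers introduced for illustration.

```python
# Assumption: "128k" is read as 128,000 tokens; check the provider's exact limit.
CONTEXT_WINDOW_TOKENS = 128_000
# Assumption: ~4 characters per token, a common rough heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate using ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(
    prompt: str,
    reserved_output_tokens: int = 2_000,
    window: int = CONTEXT_WINDOW_TOKENS,
) -> bool:
    """True if the estimated prompt plus reserved output fits the window."""
    return estimate_tokens(prompt) + reserved_output_tokens <= window


if __name__ == "__main__":
    document = "x" * 400_000  # stand-in for retrieved RAG context
    print(fits_context(document))
```

For a real deployment, the estimate should be replaced with a count from the model's own tokenizer, since character-based heuristics can be far off for code or non-English text.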
Top use-case fit
RAG
Included by capability and metadata signals in the decision map.
Long context
Included by capability and metadata signals in the decision map.
Vision
Included by capability and metadata signals in the decision map.
Provider price ladder
Compare API pricing across 4 providers for input and output tokens, plus batch and cached reads when available.
| Provider | Input / 1M tokens | Output / 1M tokens | Route |
|---|---|---|---|
| Together AI | $0.100 | $0.300 | Serverless |
| Cloudflare Workers AI | $0.351 | $0.555 | Serverless |
| OpenRouter | $0.350 | $0.560 | Serverless |
| Fireworks AI | $0.900 | $0.900 | Serverless |
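The per-1M prices above translate into a per-request cost once an input/output token mix is fixed. The following is a minimal sketch of that arithmetic using only the four rows in the table; `request_cost` and `rank_providers` are hypothetical helper names, and the prices are hard-coded from this page rather than fetched from any provider.

```python
# Prices in USD per 1M tokens, copied from the table: (input, output).
PRICES_PER_1M = {
    "Together AI": (0.100, 0.300),
    "Cloudflare Workers AI": (0.351, 0.555),
    "OpenRouter": (0.350, 0.560),
    "Fireworks AI": (0.900, 0.900),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the tabulated per-1M-token prices."""
    in_price, out_price = PRICES_PER_1M[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_providers(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Providers sorted from cheapest to most expensive for this token mix."""
    costs = {
        name: request_cost(name, input_tokens, output_tokens)
        for name in PRICES_PER_1M
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Example: a long-context request with 100k input tokens and 1k output tokens.
    for name, cost in rank_providers(100_000, 1_000):
        print(f"{name}: ${cost:.6f}")
```

Together AI is cheapest for any mix at these prices, but the order of Cloudflare Workers AI and OpenRouter depends on the ratio: OpenRouter's input price is slightly lower and its output price slightly higher, so input-heavy requests favor OpenRouter and output-heavy requests favor Cloudflare Workers AI.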
Available via routers & gateways (11)
AIRouter
Router: Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.
LiteLLM
Gateway: Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.
Martian
Router: AI-powered LLM router that analyzes each prompt in real time to select the optimal model, targeting 20–97% cost reduction while maintaining quality; San Francisco startup reportedly nearing $1.3B valuation.
Neutrino AI
Router: Commercial LLM router that dynamically routes each query to the best-suited model with load balancing and fallback handling, charging 3% of underlying AI spend.
Not Diamond
Router: Predictive model router that determines the best LLM for each query; claims up to 25% accuracy gains and 10x cost reduction; powers OpenRouter's auto mode and is positioned specifically for coding agents.
NVIDIA LLM Router Blueprint
Router: NVIDIA's open-source AI blueprint for LLM routing that selects the optimal model per prompt via intent classification or neural auto-routing; being deprecated 2026-06-20.
Capabilities
Benchmark peer bars for RAG
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.
Comparison and alternatives