14 models across 11 families · Latest: FireMoE 3B Chat v2 (2025-08)
Blazing-fast inference for generative AI
Fireworks AI's portfolio covers 14 active models across 11 non-obsolete families, with task labels spanning long context and vision. Open a model detail page to compare provider routes and sourced benchmarks.
Portfolio context: 2 decision-task tags, 14 active tracked models, latest research stamp 2026-05-19.
Use this portfolio page for
- Teams evaluating long context and vision across this lab's releases
- Readers comparing families before locking a flagship SKU
- 14 tracked SKUs for migration and pricing follow-ups
Do not stop here for
- Choosing a hosting provider without opening a model page for price ladders
Active models: 14 (non-deprecated SKUs linked to this lab)
Active families: 11 (non-obsolete families in coverage)
Open catalog: 8 OSS, 0 open-weight (text match)
Decision task tags: 2 (mapped to the site-wide task taxonomy)
Latest dated release: FireMoE 3B Chat v2 (2025-08-22)
Freshness: researched 2026-05-19 (16d ago)
Release cadence
Showing the 5 most recent dated releases (full timeline below). Latest spotlight: FireMoE 3B Chat v2 (2025-08-22).
Where this lab wins
- Long-context: 1 tracked model with context-token or InfiniteBench-class signal.
- Vision: 1 tracked model with multimodal benchmark coverage.
Flagship quality / price signal
Anchor SKU: Firefunction V1 (the coding quality-per-dollar anchor for this portfolio).
Quality per dollar is unavailable for this anchor: benchmark coverage and/or the output-token price on the cheapest route in the price ladder is missing. Open the model detail page once pricing lands.
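A minimal sketch of how a quality-per-dollar figure of this kind could be computed. It assumes, hypothetically, that the figure is a sourced coding benchmark score divided by the output-token price of the cheapest route in the model's price ladder, and that it is reported as unavailable when either input is missing. The `Route` type, its field names, and the formula are illustrative assumptions, not this site's documented method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Route:
    """One provider route in a model's price ladder (hypothetical schema)."""
    provider: str
    output_price_per_1m: Optional[float]  # USD per 1M output tokens; None if not yet sourced


def quality_per_dollar(score: Optional[float], routes: Sequence[Route]) -> Optional[float]:
    """Divide a benchmark score by the cheapest route's output price.

    Returns None (the "unavailable" case) when the score is missing or
    no route has a positive, sourced output price.
    """
    if score is None:
        return None
    priced = [r.output_price_per_1m for r in routes
              if r.output_price_per_1m is not None and r.output_price_per_1m > 0]
    if not priced:
        return None
    return score / min(priced)


# Hypothetical ladder: the cheapest priced route costs $0.50 per 1M output tokens.
ladder = [Route("provider-a", 0.80), Route("provider-b", 0.50), Route("provider-c", None)]
print(quality_per_dollar(72.0, ladder))  # 144.0
print(quality_per_dollar(None, ladder))  # None
```

Using only the cheapest priced route keeps the figure comparable across models, though a real ranking might also weight input prices or route reliability.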
Fireworks AI is an American AI research organization founded in 2022, focused on fast inference for generative AI. It ships 11 model families totaling 14 models; the most recent release is FireMoE 3B Chat v2 (2025-08). Notable families include FireMoE, Fire Qwen, and FireGemini. Use this page as a stable reference for lab background and release coverage, and follow the model pages as they are published. Each Fireworks AI model page lists official API endpoints, benchmark performance, and coding/agent fit.
About
Fireworks AI was founded in October 2022 and is based in Redwood City, California. Its platform lets developers and businesses build and deploy generative AI applications, bridging the gap between prototypes and production systems with an emphasis on fast deployment, cost optimization, and scalability.
The company has raised $77 million in total, including a $52 million Series B round in July 2024 that valued it at $552 million. Its backers include Sequoia Capital, Benchmark, Nvidia, and AMD.
Fireworks AI is best known for its proprietary inference engine. The company reports up to 1,000 tokens per second with speculative decoding and claims 9x faster inference for RAG workloads than competitors such as Groq. The platform serves popular models including Llama 3, Mixtral, and Stable Diffusion, and offers a LoRA-based fine-tuning service aimed at lower cost and easier customization.
A defining feature is its compound AI system approach, which combines multiple models and data modalities with external tools such as databases, APIs, and knowledge graphs. FireFunction, the company's function-calling model, supports this approach for applications such as RAG, search, and AI-driven expert systems. The team includes veterans of Meta's PyTorch team, and the company offers production-grade infrastructure with both serverless and dedicated deployment options.
Commercial and security features include pay-per-token pricing, on-demand GPUs, SOC 2 Type II and HIPAA compliance, and secure VPC and VPN connectivity. In short, Fireworks AI differentiates itself in generative AI through inference speed, cost efficiency, and compound AI systems, backed by substantial funding and an experienced team.
Featured models
| Model | Released | Context (tokens) | Input price ($/1M tokens) | Output price ($/1M tokens) | License |
|---|---|---|---|---|---|
| FireMoE 3B Chat v2 | 2025-08-22 | - | - | - | Apache 2.0 |
| FireQwen2.5-7B-Instruct | 2025-05-10 | - | - | - | Apache 2.0 |
| FireGemini 7B | 2025-02-14 | - | - | - | Apache 2.0 |
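Prices in the table are quoted per 1M tokens, so the cost of a single request under pay-per-token pricing is a weighted sum of its input and output token counts. A minimal sketch, assuming separate per-1M input and output prices as in the table columns; the prices in the example are hypothetical placeholders, since no Fireworks AI prices are listed here yet.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Return the USD cost of one request, given prices quoted per 1M tokens."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


# Hypothetical prices: $0.25 per 1M input tokens, $1.00 per 1M output tokens.
cost = request_cost_usd(2_000, 500, 0.25, 1.00)
print(f"${cost:.6f}")  # $0.001000
```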
Recent releases
- FireMoE 3B Chat v2 (2025-08-22)
- FireQwen2.5-7B-Instruct (2025-05-10)
- FireGemini 7B (2025-02-14)
- FARE-20B (2025-01-01)
- FireLlama 3 8B Instruct (2024-12-20)
FAQ
Who founded Fireworks AI and when?
Fireworks AI was founded in October 2022 and is based in Redwood City, California, United States. Its team includes veterans of Meta's PyTorch team.
What models has Fireworks AI released?
Fireworks AI ships 14 models across 11 families, including FireMoE, Fire Qwen, and FireGemini.
Is Fireworks AI's technology open source?
Some Fireworks AI models are open-weight (FireMoE 3B Chat v2, FireMoE 1B Chat, and FireQwen2.5-7B-Instruct); others are proprietary (FARE-20B).
Where is Fireworks AI headquartered?
Fireworks AI is headquartered in Redwood City, California, United States.
What is Fireworks AI known for?
Fireworks AI is known for fast inference for generative AI. Its most prominent tracked family is FireMoE.
How can I access Fireworks AI's models?
Fireworks AI's models are available through the Fireworks AI platform, which offers serverless (pay-per-token) and dedicated deployment options.
Last reviewed: 2026-05-19. Data sourced from public lab announcements and provider documentation.