Fireworks AI

14 models across 11 families · Latest: FireMoE 3B Chat v2 (2025-08)

Researched 60d ago

Blazing-fast inference for generative AI

Long context · Vision

Fireworks AI's portfolio covers 14 active models across 11 current families, spanning long context and vision. Open a model detail page to compare provider routes and sourced benchmarks.
Covers 2 workload areas across 14 active tracked models; last verified 2026-05-19.

Use it for

Teams evaluating long context and vision across this lab's releases
Comparing model families before committing to a flagship
Migration and pricing follow-ups across 14 tracked models

Do not use it for

Choosing a hosting provider without opening a model page for price ladders

Active models

14

Current models from this lab, excluding deprecated ones

Active families

11

Current model families from this lab

Open catalog

11 open

7 open source / 4 open weights

Lowest output price

$0.500 /1M

Cheapest tracked output across active models, per 1M tokens
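As a rough illustration of what a per-1M-token price means in practice, the sketch below converts a token count into a dollar cost. The function name `output_cost_usd` is hypothetical, and the $0.500 rate is simply the lowest tracked output price quoted above; a real bill would also include input tokens and depends on the model chosen.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of generated tokens at a flat per-1M-token output price."""
    if output_tokens < 0 or price_per_million_usd < 0:
        raise ValueError("token count and price must be non-negative")
    return output_tokens / 1_000_000 * price_per_million_usd


# Example: 250,000 output tokens at the lowest tracked rate of $0.500 per 1M.
print(f"${output_cost_usd(250_000, 0.500):.3f}")  # prints $0.125
```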

Latest dated release

2025-08-22

FireMoE 3B Chat v2

Freshness

2026-05-19

aging

Information

Founded: 2022

Redwood City, California, United States

Links

Website · GitHub · X / Twitter · LinkedIn · HuggingFace · Crunchbase

Release cadence

Showing 5 recent dated releases (full timeline below). Latest: FireMoE 3B Chat v2 (2025-08-22).

Where this lab wins

Long-context: 1 tracked model with context-token or InfiniteBench-class signal.
Vision: 1 tracked model with multimodal benchmark coverage.

Flagship quality / price signal

Flagship: Firefunction V1 (listed as the best sourced coding quality-per-dollar pick in this portfolio).

A quality-per-dollar figure is not yet available for this flagship because benchmark coverage or output token pricing is still missing.
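The page does not define how quality-per-dollar is computed. As one plausible reading, the sketch below divides a benchmark score by the output price per 1M tokens and returns `None` when either input is missing, which is the situation described for this flagship. The function name and the formula are assumptions for illustration, not the catalog's actual method.

```python
from typing import Optional


def quality_per_dollar(
    benchmark_score: Optional[float],
    output_price_per_million_usd: Optional[float],
) -> Optional[float]:
    """Benchmark points per dollar of output (assumed formula, see lead-in)."""
    if benchmark_score is None or output_price_per_million_usd is None:
        return None  # benchmark coverage or pricing is missing
    if output_price_per_million_usd <= 0:
        return None  # a zero or negative price gives no meaningful ratio
    return benchmark_score / output_price_per_million_usd


# Hypothetical inputs: a score of 60.0 at $0.500 per 1M output tokens.
print(quality_per_dollar(60.0, 0.500))  # 120.0
print(quality_per_dollar(None, 0.500))  # None
```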

Fireworks AI is an American AI research organization founded in 2022, with the tagline "Blazing-fast inference for generative AI." It ships 11 model families totaling 14 models; the most recent release is FireMoE 3B Chat v2 (2025-08). Notable families include FireMoE, Fire Qwen, and FireGemini. Use this page as a stable reference for lab background, release coverage, and follow-up model pages. Open a model page to view official API endpoints, benchmark performance, and coding/agent fit for any Fireworks AI model.

About

Fireworks AI was founded in October 2022 and is based in Redwood City, California. Its platform lets developers and businesses build and deploy generative AI applications, with an emphasis on moving prototypes into production quickly, controlling cost, and scaling.

The company has raised $77 million in funding, including a $52 million Series B round in July 2024 that valued it at $552 million. Its investors include Sequoia Capital, Benchmark, Nvidia, and AMD.

Fireworks AI is best known for its proprietary inference engine. Performance figures cited for it include up to 1000 tokens per second with speculative decoding and 9x faster inference for RAG workloads than competitors such as Groq. The platform supports popular models such as Llama 3, Mixtral, and Stable Diffusion, and offers a LoRA-based fine-tuning service aimed at lower-cost customization.

The company also promotes a compound AI system approach, in which multiple models and data modalities are combined with external tools such as databases, APIs, and knowledge graphs. Its FireFunction function-calling model supports this approach in applications such as RAG, search, and AI-driven expert systems.

The team includes veterans of Meta's PyTorch team. The production infrastructure offers serverless and dedicated deployment options, pay-per-token pricing, on-demand GPUs, SOC2 Type II and HIPAA compliance, and secure VPC and VPN connectivity. Overall, the company's focus is speed, cost efficiency, and compound AI systems.

Featured models

Model	Released	Context	Input price ($/1M)	Output price ($/1M)	License	Openness
FireMoE 3B Chat v2	2025-08-22	-	-	-	Apache 2.0	Open source
FireQwen2.5-7B-Instruct	2025-05-10	-	-	-	Apache 2.0	Open source
FireGemini 7B	2025-02-14	-	-	-	Apache 2.0	Open source

Model families

FireMoE

Fire Qwen

FireGemini

FARE

Fire Llama 3

Fireworks Dev

Fireworks Chat

Fireworks Functions

FireLLaVA

Firefunction

Recent releases

FireMoE 3B Chat v2 - 2025-08-22
FireQwen2.5-7B-Instruct - 2025-05-10
FireGemini 7B - 2025-02-14
FARE-20B - 2025-01-01
FireLlama 3 8B Instruct - 2024-12-20

FAQ

Who founded Fireworks AI and when?

Fireworks AI was founded in October 2022 and is based in Redwood City, California, United States. Its team includes veterans of Meta's PyTorch team.

What models has Fireworks AI released?

Fireworks AI ships 14 models across 11 families, including FireMoE, Fire Qwen, and FireGemini.

Is Fireworks AI's technology open source?

Partly. Of the 14 tracked models, 11 are open (7 open source, 4 open weights), including FireMoE 3B Chat v2, FireMoE 1B Chat, and FireQwen2.5-7B-Instruct; the rest are proprietary (FARE-20B, f1, and f1-mini).

Where is Fireworks AI headquartered?

Fireworks AI is headquartered in Redwood City, California, United States.

What is Fireworks AI known for?

Fireworks AI is known for fast inference for generative AI. Its most prominent tracked family is FireMoE.

How can I access Fireworks AI's models?

Fireworks AI's models are available through Fireworks AI's own platform, which offers serverless and dedicated deployment options.

Last reviewed: 2026-05-19. Data sourced from public lab announcements and provider documentation.