Get Started with Falcon 40B on Microsoft Foundry
Microsoft Foundry offers access to Falcon 40B. Foundry is a unified enterprise AI platform that extends well beyond Azure OpenAI: it hosts and deploys LLMs from multiple providers, including OpenAI, Anthropic, DeepSeek, xAI, Meta, Mistral, and NVIDIA. It integrates agent services, evaluation, observability, and governance into a single Azure control plane. Key capabilities include a multi-provider model catalog, Model Router for intelligent prompt routing, Foundry Agent Service for building and deploying AI agents with built-in tracing and monitoring, and enterprise-grade governance with RBAC, compliance, and regional deployments. For a broader model catalog, including Claude, DeepSeek, Grok, Llama, Mistral, and NVIDIA Nemotron, Foundry is the recommended platform over Azure OpenAI.
Pricing
| Type | Price (per 1M tokens) |
|---|---|
| Input tokens | $1.54 |
| Output tokens | $1.77 |
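As a rough illustration of how the per-token rates above turn into a bill, the following sketch estimates the cost of a workload. It assumes the listed prices apply linearly per token, with no minimums, tiers, or discounts; the function name `estimate_cost` and the example token counts are hypothetical.

```python
# Listed prices in dollars per 1 million tokens (taken from the table above).
INPUT_PRICE_PER_M = 1.54
OUTPUT_PRICE_PER_M = 1.77


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars, assuming purely linear pricing."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 3M input tokens and 1M output tokens.
    cost = estimate_cost(3_000_000, 1_000_000)
    print(f"Estimated cost: ${cost:.2f}")
```

For this hypothetical workload the estimate is 3 × $1.54 + 1 × $1.77 = $6.39. Actual billing may differ, so treat the result as a planning figure only.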
About Falcon 40B
Falcon 40B is an open-source large language model developed by the Technology Innovation Institute (TII) in Abu Dhabi. It is a causal decoder-only model with 40 billion parameters that uses rotary positional embeddings to encode token position, and multi-query attention and FlashAttention to make attention more efficient. It was trained on 1 trillion tokens from the RefinedWeb dataset enriched with curated corpora, and it handles natural language processing tasks such as text generation, translation, and question answering. It supports multiple languages and is released under the Apache 2.0 license, which permits both research and commercial use. Inference requires around 85-100 GB of memory, which generally means multiple GPUs or a single high-memory accelerator.
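To see where a memory figure of this size can come from, the sketch below computes the memory needed for the model weights alone at a few numeric precisions. It assumes a nominal 40 billion parameters and counts 1 GB as 10^9 bytes; the helper `weight_memory_gb` and the precision table are illustrative, not part of any Falcon or Foundry API.

```python
# Nominal parameter count for Falcon 40B (assumption: exactly 40 billion).
NUM_PARAMS = 40_000_000_000

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
```

At 16-bit precision the weights alone come to about 80 GB. The 85-100 GB range quoted above is consistent with that figure plus overhead for activations and the attention cache, although the exact breakdown is an assumption here rather than something the source states.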