Using Phi-3 Medium 128K on NVIDIA NIM
Implementation guide · Phi-3 · Microsoft Research
Quick Start
- 1
- 2Use the NVIDIA NIM SDK or REST API to call
phi-3-medium-128k— see the documentation for request format. - 3
Code Examples
About NVIDIA NIM
NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.
NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices. Developers can try hosted NIM APIs through the NVIDIA API Catalog on build.nvidia.com, then move the same model families into self-hosted NIM containers on NVIDIA GPUs in a data center, private cloud, public cloud, or workstation. The catalog positions NIM around optimized open and NVIDIA models, including chat, coding, reasoning, retrieval, vision, speech, and safety use cases, with downloadable model cards and API endpoints where NVIDIA exposes them.
Pricing on NVIDIA NIM
| Type | Rate |
|---|---|
| GPU Hour Rate | $1.00/GPU·hr |
| GPU Config | 1xH100 |
Capabilities
No model capability flags are currently sourced.
About Phi-3 Medium 128K
The Phi-3 Medium 128K is an open-source, 14-billion parameter language model by Microsoft, designed for efficient operation in resource-limited environments. Noted for its state-of-the-art performance on reasoning tasks, it excels in language understanding, code generation, and logical reasoning while offering a long context window of up to 128,000 tokens, making it ideal for applications like summarizing lengthy documents. Its dense decoder-only Transformer architecture has been refined with supervised fine-tuning and preference optimization to enhance instruction-following capabilities. Additionally, Phi-3 Medium 128K is optimized for diverse hardware platforms, ensuring broad accessibility and performance 12.