Nemotron 3 8B on Azure OpenAI

Provisioned

Pricing

Type	Price (USD per 1M tokens)
Input tokens	$0.37
Output tokens	$1.10
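
As a rough illustration of how the per-token rates above translate into a bill, here is a minimal sketch. It assumes simple pay-per-token billing at exactly the listed rates; the function name `estimate_cost` and the constants are hypothetical and not part of any Azure SDK. Provisioned deployments may be billed differently.

```python
from decimal import Decimal

# Rates copied from the pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.37")
OUTPUT_PRICE_PER_M = Decimal("1.10")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts.

    Hypothetical helper: assumes linear per-token billing with no
    minimums, discounts, or provisioned-capacity charges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 2M input tokens and 500K output tokens.
    print(estimate_cost(2_000_000, 500_000))
```

`Decimal` is used instead of `float` so that currency arithmetic is not affected by binary rounding.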

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, JSON Mode, Code Execution

About Nemotron 3 8B

Nemotron-3 8B is a family of large language models from NVIDIA, aimed at enterprises building custom LLMs. The base model uses a GPT-3-style transformer architecture with 8 billion parameters and a 4,096-token context length. It underpins several specialized variants: Nemotron-3-8B-Base-4k, intended for customization; the Nemotron-3-8B-Chat models, which are refined via RLHF and support steerable outputs; and Nemotron-3-8B-QA, optimized for question answering. The models are compatible with the NVIDIA NeMo framework, support fine-tuning methods such as LoRA, and are designed for efficient deployment on NVIDIA GPUs. They were trained on 3.5 to 3.8 trillion tokens of multilingual data spanning a diverse range of languages and have been evaluated on a range of benchmarks, but they may exhibit biases and inaccuracies inherited from their training data.
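
The 4,096-token context length bounds how much text a single request can carry. The sketch below shows one way to budget for it, under the assumption (typical for decoder-only models, but not stated in the listing) that prompt and generated tokens share the same window. The helper names are hypothetical, and token counts are expected to come from the model's own tokenizer, which is not shown here.

```python
CONTEXT_LENGTH = 4096  # token context length stated in the description


def max_output_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for generation after the prompt is counted.

    Assumes prompt and output share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def fits_in_context(
    prompt_tokens: int,
    requested_output_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> bool:
    """True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + requested_output_tokens <= context_length


if __name__ == "__main__":
    print(max_output_tokens(3000))
    print(fits_in_context(3000, 1200))
```

A check like this is most useful before sending a request, so an oversized prompt can be truncated or split rather than rejected.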

Model Specs

Released	2026-03-01
Parameters	8B
Context	4K
Architecture	Decoder Only

Provider

Azure OpenAI

Microsoft
