llmreference
NVIDIA NIM

Using DBRX Instruct on NVIDIA NIM

Implementation guide · DBRX · Databricks Mosaic

Provisioned

Quick Start

  1. 1
    Create an account at NVIDIA NIM and generate an API key.
  2. 2
    Use the NVIDIA NIM SDK or REST API to call dbrx-instruct — see the documentation for request format.
  3. 3
    You'll be billed $1.00/GPU·hr. See full pricing.

Code Examples

See NVIDIA NIM documentation for integration details.

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices. Developers can try hosted NIM APIs through the NVIDIA API Catalog on build.nvidia.com, then move the same model families into self-hosted NIM containers on NVIDIA GPUs in a data center, private cloud, public cloud, or workstation. The catalog positions NIM around optimized open and NVIDIA models, including chat, coding, reasoning, retrieval, vision, speech, and safety use cases, with downloadable model cards and API endpoints where NVIDIA exposes them.

Pricing on NVIDIA NIM

TypeRate
GPU Hour Rate$1.00/GPU·hr
GPU Config1xH100

Capabilities

Structured Outputs

About DBRX Instruct

DBRX Instruct, developed by Databricks, is a cutting-edge large language model designed for various natural language processing tasks. It excels in text summarization, question answering, information extraction, and code generation, utilizing a fine-grained mixture-of-experts architecture with 132 billion parameters. With advanced features like rotary position encodings, gated linear units, and grouped query attention, it performs exceptionally across multiple benchmarks, even outperforming some closed-source models. Trained on a vast 12 trillion token dataset, it supports contexts up to 32,000 tokens. Although primarily effective in English, its multilingual strength isn't fully explored. Users should be cautious as it may generate inaccurate or biased outputs.

Model Specs

Released2024-03-27
Parameters132B
Context32K
ArchitectureMixture of Experts

Provider

NVIDIA NIM
NVIDIA NIM

NVIDIA

Santa Clara, California, United States