Using LLaVA 1.6 Hermes Yi 34B on NVIDIA NIM
Implementation guide · LLaVA 1.6 · Haotian Liu
Quick Start
- 1
- 2Use the NVIDIA NIM SDK or REST API to call
llava-1.6-hermes-yi-34b— see the documentation for request format. - 3
Code Examples
About NVIDIA NIM
NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.
NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices. Developers can try hosted NIM APIs through the NVIDIA API Catalog on build.nvidia.com, then move the same model families into self-hosted NIM containers on NVIDIA GPUs in a data center, private cloud, public cloud, or workstation. The catalog positions NIM around optimized open and NVIDIA models, including chat, coding, reasoning, retrieval, vision, speech, and safety use cases, with downloadable model cards and API endpoints where NVIDIA exposes them.
Pricing on NVIDIA NIM
| Type | Rate |
|---|---|
| GPU Hour Rate | $1.00/GPU·hr |
| GPU Config | 1xH100 |
Capabilities
No model capability flags are currently sourced.
About LLaVA 1.6 Hermes Yi 34B
LLaVA-1.6, specifically the Hermes Yi 34B variant, represents a leap in multimodal AI capabilities, enhanced from its predecessor, LLaVA 1.5. This open-source chatbot excels in processing and responding to both text and image inputs. The model boasts a fourfold increase in image resolution support, enhanced visual reasoning and OCR capabilities, and improved visual conversation and world knowledge. It leverages the Nous-Hermes-2-Yi-34B language model as its backbone, offering superior commercial licenses and bilingual support. LLaVA-1.6-34B outshines other open-source models and even competes with Google's Gemini Pro on some tasks. Its training efficiency is impressive, requiring just one day on 32 A100 GPUs, and a demo for chat, image captioning, and visual question answering is accessible online.