PaliGemma 3B 896 on NVIDIA NIM

Name: PaliGemma 3B 896 on NVIDIA NIM
Brand: Google DeepMind
SKU: paligemma-3b-896-nvidia-nim

PaliGemma · Google DeepMind

ProvisionedOpen Weights

Last refreshed 2026-05-01. Next refresh: weekly.

Why use PaliGemma 3B 896 on NVIDIA NIM?

NVIDIA NIM offers PaliGemma 3B 896 with competitive pricing. NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices.

Input / 1M

Output / 1M

Cache

Not sourced

Batch

Not sourced

Setup recipe

Docs fallback

Install

Use the provider REST API or SDK

Auth

Create a provider API key

Call

model: paligemma-3b-896

Model ID

paligemma-3b-896

Request example

Curated snippets for this provider are not sourced yet. Use NVIDIA NIM documentation with model ID paligemma-3b-896.

Gotchas

No curated gotchas have been sourced for this exact provider/model route yet.

Pricing

Type	Rate
GPU Hour Rate	$1.00/GPU·hr
GPU Config	1xH100

Capabilities

VisionMultimodal

About PaliGemma 3B 896

PaliGemma 3B 896 is a versatile and lightweight vision-language model developed by Google, designed to process and integrate both images and text. Inspired by the PaLI-3 model, it employs components like the SigLIP vision model and the Gemma-2B language model, featuring a linear projection layer for seamless integration of visual and textual inputs. Capable of handling tasks such as image captioning, visual question answering, object detection, and segmentation, it supports multilingual text processing. Despite requiring task-specific fine-tuning for optimal performance, PaliGemma highlights strong capabilities across various vision-language applications, although it may encounter challenges with contextual understanding, biases, and computational demands 124.