Last refreshed 2026-05-01. Next refresh: weekly.
Why use PaliGemma 3B 896 on NVIDIA NIM?
NVIDIA NIM offers PaliGemma 3B 896 with competitive pricing. NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices.
Setup recipe
Docs fallbackUse the provider REST API or SDKCreate a provider API keymodel: paligemma-3b-896paligemma-3b-896Request example
paligemma-3b-896.Gotchas
No curated gotchas have been sourced for this exact provider/model route yet.
Pricing
| Type | Rate |
|---|---|
| GPU Hour Rate | $1.00/GPU·hr |
| GPU Config | 1xH100 |
Capabilities
About PaliGemma 3B 896
PaliGemma 3B 896 is a versatile and lightweight vision-language model developed by Google, designed to process and integrate both images and text. Inspired by the PaLI-3 model, it employs components like the SigLIP vision model and the Gemma-2B language model, featuring a linear projection layer for seamless integration of visual and textual inputs. Capable of handling tasks such as image captioning, visual question answering, object detection, and segmentation, it supports multilingual text processing. Despite requiring task-specific fine-tuning for optimal performance, PaliGemma highlights strong capabilities across various vision-language applications, although it may encounter challenges with contextual understanding, biases, and computational demands 124.
FAQ
What is the context window for PaliGemma 3B 896 on NVIDIA NIM?
PaliGemma 3B 896 supports a 512 token context window on NVIDIA NIM.
Who created PaliGemma 3B 896?
PaliGemma 3B 896 was created by Google DeepMind as part of the PaliGemma model family.
Is PaliGemma 3B 896 open source?
PaliGemma 3B 896 is open source according to the seed data.