PaliGemma

About

PaliGemma is a family of open vision-language models (VLMs) developed by Google, designed to be lightweight and efficient compared with larger models. Built from open components, namely the SigLIP vision model and the Gemma language model, PaliGemma models take images and text as input and produce text as output. This makes them well suited to tasks such as image captioning, visual question answering, and object detection. The models are available at input resolutions from 224x224 to 896x896 and come in pre-trained, mix, and fine-tuned variants to cover a range of research and practical needs. They can be used for direct inference, but they perform best when fine-tuned for a specific application.
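Because the models return plain text even for object detection, the output has to be decoded before it can be drawn on an image. The sketch below is a minimal illustration, not an official API. It assumes the location-token convention described in the PaliGemma model documentation, which this page does not state: each box is four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, with values in 0..1023 normalized to the image size, followed by a label, and detections separated by `;`. The function name `parse_detections` and the sample string are hypothetical.

```python
import re

# Assumed format: four <locNNNN> tokens (y_min, x_min, y_max, x_max), then a label.
_DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text, width, height):
    """Convert detection text into pixel boxes (x_min, y_min, x_max, y_max)."""
    detections = []
    for match in _DETECTION.finditer(text):
        # Location values are assumed to be binned into 1024 steps per axis.
        y0, x0, y1, x1 = (int(match.group(i)) / 1024 for i in range(1, 5))
        detections.append(
            {
                "label": match.group(5).strip(),
                "box": (
                    round(x0 * width),
                    round(y0 * height),
                    round(x1 * width),
                    round(y1 * height),
                ),
            }
        )
    return detections


# Hypothetical model output for a 448x448 input image.
sample = "<loc0256><loc0128><loc0768><loc0512> cat ; <loc0000><loc0512><loc0512><loc0896> dog"
detections = parse_detections(sample, width=448, height=448)
```

Passing the dimensions of the original image rather than the model's input resolution scales the boxes back to the source picture, since the coordinates are relative.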

Details

Researcher: Google DeepMind
Models: 3