Using Gemma 4 12B IT on Hugging Face Inference Endpoints

Implementation guide · Gemma 4 · Google DeepMind

Open Source

Quick Start

1
Create an account at Hugging Face Inference Endpoints and generate an API key.
2
Use the Hugging Face Inference Endpoints SDK or REST API to call google/gemma-4-12B-it — see the documentation for request format.

API Portal Documentation Pricing Model Card

Code Examples

See Hugging Face Inference Endpoints documentation for integration details.

About Hugging Face Inference Endpoints

Hugging Face's AI platform serves as a comprehensive ecosystem for machine learning, centered around the Hugging Face Hub. This hub hosts an extensive collection of over 450,000 pre-trained models and 90,000 datasets, covering a wide range of AI tasks including natural language processing, computer vision, and audio processing. Users can easily access and utilize these resources for various applications such as text classification, translation, image generation, and speech recognition. The platform's Transformers library simplifies the implementation of these models, providing user-friendly interfaces for tasks like fine-tuning and model evaluation. The platform extends its capabilities through Spaces, which are customizable environments for deploying and showcasing machine learning applications. These Spaces enable users to create interactive demos and engage with AI technology without requiring extensive technical expertise. The platform also supports integration with popular machine learning frameworks like TensorFlow and PyTorch, enhancing its versatility for developers. By combining a vast repository of models and datasets with tools for collaboration and deployment, the platform empowers users to efficiently build, train, and deploy AI models while fostering a community-driven approach to AI development and innovation.

Hugging Face is a leading AI community and platform dedicated to democratizing artificial intelligence. They provide a comprehensive ecosystem for machine learning, focusing on natural language processing and deep learning. Their platform offers: 1. A vast repository of pre-trained models and datasets 2. Tools for model training, fine-tuning, and deployment 3. Collaborative spaces for AI researchers and developers 4. Open-source libraries like Transformers for state-of-the-art NLP Founded in 2016, Hugging Face has grown rapidly, now serving over 5 million users. They emphasize open-source development and community-driven innovation, fostering a collaborative environment for AI advancement. The platform supports various AI tasks, including text generation, image processing, and speech recognition, making it a versatile hub for both beginners and experts in the field of artificial intelligence.

View all models on Hugging Face Inference Endpoints →

Pricing on Hugging Face Inference Endpoints

Capabilities

VisionMultimodalReasoningFunction CallingTool UseStructured OutputsAudioFine-tuning

About Gemma 4 12B IT

Instruction-tuned version of Gemma 4 12B. Open weight (Apache 2.0), 12B parameters, encoder-free multimodal (text, image, audio). Optimized for chat and instruction-following. Runs on a 16GB laptop.

Full model details →

Model Specs

Released2026-06-03

Parameters12B

Context256k

ArchitectureDecoder Only

Knowledge cutoff2025-01

Hugging Face

New York City, New York, United States