LLM Reference
Hugging Face Inference Endpoints

Using Gemma 4 12B IT on Hugging Face Inference Endpoints

Implementation guide · Gemma 4 · Google DeepMind

Open Source

Quick Start

  1. 1
    Create an account at Hugging Face Inference Endpoints and generate an API key.
  2. 2
    Use the Hugging Face Inference Endpoints SDK or REST API to call google/gemma-4-12B-it — see the documentation for request format.

Code Examples

About Hugging Face Inference Endpoints

Hugging Face's AI platform serves as a comprehensive ecosystem for machine learning, centered around the Hugging Face Hub. This hub hosts an extensive collection of over 450,000 pre-trained models and 90,000 datasets, covering a wide range of AI tasks including natural language processing, computer vision, and audio processing. Users can easily access and utilize these resources for various applications such as text classification, translation, image generation, and speech recognition. The platform's Transformers library simplifies the implementation of these models, providing user-friendly interfaces for tasks like fine-tuning and model evaluation. The platform extends its capabilities through Spaces, which are customizable environments for deploying and showcasing machine learning applications. These Spaces enable users to create interactive demos and engage with AI technology without requiring extensive technical expertise. The platform also supports integration with popular machine learning frameworks like TensorFlow and PyTorch, enhancing its versatility for developers. By combining a vast repository of models and datasets with tools for collaboration and deployment, the platform empowers users to efficiently build, train, and deploy AI models while fostering a community-driven approach to AI development and innovation.

Hugging Face is a leading AI community and platform dedicated to democratizing artificial intelligence. They provide a comprehensive ecosystem for machine learning, focusing on natural language processing and deep learning. Their platform offers: 1. A vast repository of pre-trained models and datasets 2. Tools for model training, fine-tuning, and deployment 3. Collaborative spaces for AI researchers and developers 4. Open-source libraries like Transformers for state-of-the-art NLP Founded in 2016, Hugging Face has grown rapidly, now serving over 5 million users. They emphasize open-source development and community-driven innovation, fostering a collaborative environment for AI advancement. The platform supports various AI tasks, including text generation, image processing, and speech recognition, making it a versatile hub for both beginners and experts in the field of artificial intelligence.

Pricing on Hugging Face Inference Endpoints

Capabilities

VisionMultimodalReasoningFunction CallingTool UseStructured OutputsAudio

About Gemma 4 12B IT

Instruction-tuned 12B Gemma 4 model with native text, image, video, and audio input through an encoder-free unified architecture. It runs on 16 GB VRAM in BF16, supports a 256K context window, configurable thinking mode, function calling, structured outputs, and 140+ languages, making it the mid-sized Gemma 4 option between E4B and the 26B MoE.

Model Specs

Released2026-06-03
Parameters11.9B
Context256k
Architectureencoder_free_unified_multimodal
Knowledge cutoff2025-01

More Models on Hugging Face Inference Endpoints

Provider

Hugging Face Inference Endpoints
Hugging Face Inference Endpoints

Hugging Face

New York City, New York, United States