
Mamba 1.4B on Replicate API

Mamba · State Spaces

Serverless

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · JSON Mode · Code Execution

About Mamba 1.4B

Mamba 1.4B is a large language model built on a state-space model (SSM) architecture designed for efficient processing of long sequences. Unlike Transformer models, whose attention cost grows quadratically, Mamba scales linearly with sequence length and can handle sequences of up to a million elements, thanks to a selective SSM layer that filters which token information is carried forward in the state. By forgoing the attention mechanism entirely, it achieves roughly 5x higher inference throughput than similarly sized Transformers. The model is optimized for NVIDIA GPUs and performs on par with larger models on many benchmarks, though it can trail larger, fine-tuned models on some downstream tasks. It was trained on the Pile dataset; reported training details vary between sources.
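The linear scaling described above comes from the SSM recurrence: the model carries a fixed-size hidden state forward one token at a time, so cost grows with sequence length rather than with its square. The toy scan below is a minimal sketch of that idea with scalar parameters; real Mamba uses input-dependent ("selective") parameters and a fused GPU scan, neither of which is shown here.

```python
# Toy diagonal state-space recurrence (illustration only; real Mamba
# uses input-dependent "selective" parameters and a hardware-aware scan).
def ssm_scan(x, a, b, c):
    """Sequentially update a hidden state h and emit one output per input.

    x: list of input scalars (length L)
    a, b, c: scalar SSM parameters (state decay, input gate, readout)
    Runs in O(L) time with O(1) state, unlike attention's O(L^2) cost.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t   # state update: exponentially decaying memory
        ys.append(c * h)      # readout from the current state
    return ys
```

For example, `ssm_scan([1, 0, 0, 0], 0.5, 1.0, 1.0)` returns `[1.0, 0.5, 0.25, 0.125]`: the impulse at the first position decays geometrically through the state, showing how past tokens influence later outputs without any pairwise attention.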

Get Started
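A minimal sketch of calling a model through Replicate's HTTP predictions API is shown below. The request body shape (`version` plus an `input` object) follows Replicate's predictions endpoint; the version hash placeholder and the `max_tokens` input name are assumptions and should be taken from the model's page on Replicate. Authentication uses the `REPLICATE_API_TOKEN` environment variable.

```python
import json
import os

# Replicate's predictions endpoint; authenticate with your API token.
API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(version, prompt, max_tokens=256):
    """Build the JSON body for a Replicate prediction request.

    version: the model version hash from the model's Replicate page
    (the placeholder below is NOT a real hash). "max_tokens" is an
    assumed input name; check the model's input schema.
    """
    return {
        "version": version,
        "input": {"prompt": prompt, "max_tokens": max_tokens},
    }

body = build_prediction_request("<mamba-1.4b-version-hash>", "Hello, Mamba!")
payload = json.dumps(body)
headers = {
    "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
    "Content-Type": "application/json",
}
# Send with your HTTP client of choice, e.g.:
# requests.post(API_URL, data=payload, headers=headers)
```

Replicate also ships an official `replicate` Python client that wraps this endpoint and polls the prediction until it completes, which is usually more convenient than raw HTTP.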

Model Specs

Released: 2023-12-01
Parameters: 1.4B
Architecture: Decoder-only
