Mamba 370M on Replicate API

Name: Mamba 370M on Replicate API
Brand: State Spaces
SKU: mamba-370m-replicate

Serverless · Open Source

Last refreshed 2026-05-22. Refreshed weekly.

Why use Mamba 370M on Replicate API?

Replicate API offers Mamba 370M with competitive pricing. Replicate is a cloud platform that lets you run machine learning models through an API instead of managing your own infrastructure.

Input / 1M

Output / 1M

Cache: Not sourced

Batch: Not sourced

Setup recipe

Python + curl

Install

pip install replicate

Auth

export REPLICATE_API_TOKEN=...

Call

import replicate

# reads REPLICATE_API_TOKEN from env
output = replicate.run(
    "mamba-370m",
    input={"prompt": "Hello"}
)

Model ID

mamba-370m

Request example

import replicate

# reads REPLICATE_API_TOKEN from env
# Model ID format: "owner/model-name" (latest version) or "owner/model-name:version-hash";
# "mamba-370m" below is this listing's ID, so prefix it with the owner shown on Replicate
output = replicate.run(
    "mamba-370m",
    input={"prompt": "Hello"}
)
# Output is a list or generator depending on the model
print("".join(output))
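Because the output may arrive as a list or as a generator, it can help to normalize it in one place. The following is a minimal sketch; `collect_text` is a hypothetical helper name, not part of the replicate package, and it assumes each chunk is text or converts cleanly to text.

```python
from typing import Iterable, Union


def collect_text(output: Union[str, Iterable]) -> str:
    """Join model output into one string.

    Hypothetical helper: accepts a plain string, a list of chunks,
    or a generator that yields chunks as they are produced.
    """
    if isinstance(output, str):
        # Already a single string; nothing to join.
        return output
    # Lists and generators are both iterable, so one code path covers them.
    return "".join(str(chunk) for chunk in output)
```

With this in place, `print(collect_text(output))` behaves the same way for either output shape.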

Gotchas

Replicate uses the "owner/model-name" format (e.g. "meta/meta-llama-3-8b-instruct") for the latest version, or "owner/model-name:version-sha" to pin a specific version. The REST endpoint splits the owner and model name into the path: /v1/models/{owner}/{model-name}/predictions.
The examples expect the API token in the REPLICATE_API_TOKEN environment variable; use a different variable name only if your application config maps it to this one.
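The identifier format described above can be checked before any request is sent. This is a rough sketch: `ModelRef`, `parse_model_ref`, and `predictions_path` are hypothetical names introduced for illustration, and the path template is the one quoted in the gotcha. How a pinned version is sent to the API is not covered here, so the path helper ignores it.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None means "latest version"


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'owner/model-name' or 'owner/model-name:version' into parts."""
    base, _, version = ref.partition(":")
    owner, sep, name = base.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/model-name[:version]', got {ref!r}")
    return ModelRef(owner, name, version or None)


def predictions_path(ref: str) -> str:
    """Build the REST path quoted above; any pinned version is ignored."""
    model = parse_model_ref(ref)
    return f"/v1/models/{model.owner}/{model.name}/predictions"
```

A bare ID such as "mamba-370m" fails this check with a ValueError, which is a quick way to catch a missing owner prefix before calling the API.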

Capabilities

No model capability flags are currently sourced.

About Mamba 370M

Mamba 370M is a 370-million-parameter language model built on a state-space model (SSM) architecture. Unlike traditional transformers, it uses no attention or MLP blocks, and its compute scales linearly with sequence length [6][9]. This design makes long sequences efficient to process and is optimized for parallel execution on GPUs [6]. The model is used mainly for text generation and has also been applied to Japanese language processing [10]. Sources differ on the details of its training data; some cite the Pile dataset [1]. A known limitation is "state collapse," in which performance degrades on longer sequences; mitigation techniques have been proposed for it [7]. Some studies report that, with the right training, Mamba models can handle sequences of up to 256K tokens accurately [7].
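To make the linear-scaling claim concrete, here is a toy state-space recurrence. It is an illustrative sketch only, not Mamba's actual layer: the scalar parameters `a`, `b`, and `c` are fixed, assumed values, whereas Mamba makes its parameters depend on the input and computes the recurrence with a parallel scan on the GPU.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state-space model (illustrative, not Mamba's layer).

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0
    ys = []
    for x in xs:
        # One constant-cost update per token, so total work grows
        # linearly with the sequence length.
        h = a * h + b * x
        ys.append(c * h)
    return ys
```

Each step touches only the fixed-size state `h`, never the full history, which is where the contrast with attention comes from. The same fixed-size state is also what has to summarize an arbitrarily long input.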