WizardCoder 33B on Replicate API

Name: WizardCoder 33B on Replicate API
Brand: WizardLM Team
SKU: wizardcoder-33b-replicate
Price: 0.20 USD per 1M input tokens

Serverless · Open Source

Last refreshed 2026-06-29. Next refresh: weekly.

Why use WizardCoder 33B on Replicate API?

Replicate offers WizardCoder 33B with pay-as-you-go pricing: $0.20 per 1M input tokens and $1.00 per 1M output tokens. Replicate is a cloud platform for running machine learning models through an API, so there is no infrastructure of your own to manage.

Input / 1M: $0.20
Output / 1M: $1.00
Cache: Not sourced
Batch: Not sourced

Setup recipe

Python + curl

Install

pip install replicate

Auth

export REPLICATE_API_TOKEN=...

Call

import replicate

# "wizardcoder-33b" is the model ID listed on this page
output = replicate.run(
    "wizardcoder-33b",
    input={"prompt": "Hello"}
)

Model ID

wizardcoder-33b

Request example

import replicate

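# The client reads REPLICATE_API_TOKEN from the environment.
# Model ID format: "owner/model-name" (latest version) or "owner/model-name:version-hash".
# The bare ID below has no owner prefix; confirm the full identifier on Replicate before running.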
output = replicate.run(
    "wizardcoder-33b",
    input={"prompt": "Hello"}
)
# Output is a list or generator depending on the model
print("".join(output))
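Because the result may arrive as a list, a generator, or possibly a plain string, it can help to normalize it in one place. The helper below is a minimal sketch; collect_output is a hypothetical name, not part of the replicate package, and it assumes each chunk converts cleanly to text.

```python
from typing import Any, Iterable, Union


def collect_output(output: Union[str, Iterable[Any]]) -> str:
    """Join a model result into one string.

    Hypothetical helper: accepts a plain string, a list of chunks,
    or a generator that yields chunks as they are produced.
    """
    if isinstance(output, str):
        # Already a complete string; joining it would be a no-op.
        return output
    # Lists and generators are both iterables of chunks.
    return "".join(str(chunk) for chunk in output)
```

Usage would look like print(collect_output(output)) in place of the join in the request example.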

Gotchas

Replicate uses "owner/model-name" format (e.g. "meta/meta-llama-3-8b-instruct") for the latest version, or "owner/model-name:version-sha" to pin to a specific version. The REST endpoint splits owner and model-name into the path: /v1/models/{owner}/{model-name}/predictions.
The examples expect the token in REPLICATE_API_TOKEN. If you rename the variable, map the new name back in your application config.
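The identifier format above can be checked before any request is sent. The sketch below splits an identifier into owner, model name, and optional version, then builds the REST path quoted above. ModelRef, parse_model_ref, and predictions_path are hypothetical names, and the path is built only for the unpinned "owner/model-name" form described in this section.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None when no version is pinned


def parse_model_ref(ref: str) -> ModelRef:
    """Split "owner/model-name" or "owner/model-name:version" into parts."""
    model_part, _, version = ref.partition(":")
    owner, sep, name = model_part.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/model-name', got {ref!r}")
    return ModelRef(owner, name, version or None)


def predictions_path(ref: str) -> str:
    """Build the REST path for the latest version of a model."""
    model = parse_model_ref(ref)
    return f"/v1/models/{model.owner}/{model.name}/predictions"
```

A bare ID such as "wizardcoder-33b" raises ValueError here, which flags a missing owner prefix early.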

Pricing

Type	Price (per 1M)
Input tokens	$0.20
Output tokens	$1.00
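
To see what these rates mean per request, the arithmetic can be sketched as below. The two rates come from the table above; estimate_cost_usd is a hypothetical helper, and the token counts are assumed to come from your own usage data.

```python
INPUT_USD_PER_MILLION = 0.20   # input rate from the pricing table
OUTPUT_USD_PER_MILLION = 1.00  # output rate from the pricing table


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000
```

For example, a request with 2,000 input tokens and 500 output tokens comes to about $0.0009.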

Capabilities

No model capability flags are currently sourced.

About WizardCoder 33B

WizardCoder-33B-V1.1 is a large language model built for code generation. Developed by the WizardLM team, it is based on DeepSeek-Coder-33B-base and fine-tuned with the Evol-Instruct method to improve both code generation and comprehension. It generates code in multiple languages and suits automated code completion and rapid prototyping. It scores well on the HumanEval and MBPP benchmarks, outperforming ChatGPT 3.5 in certain areas.

The model is a transformer with a context length of 16384 tokens. Quantized versions run on a wider range of hardware, though some quality may be lost. The model gives its best results with a properly written system prompt, which makes prompt setup worth getting right for both educational and productivity use.
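
The 16384-token context length puts a ceiling on how much output a long prompt leaves room for. The sketch below assumes the prompt and the generated tokens share that single window, which is typical for this kind of model but is not stated in the description. max_new_tokens is a hypothetical helper, and the prompt token count is assumed to come from your own tokenizer.

```python
CONTEXT_LENGTH = 16384  # context length from the model description


def max_new_tokens(prompt_tokens: int, requested: int,
                   context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp a requested output length to what the context window allows.

    Assumes prompt and output tokens share one window of context_length.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    # Never return a negative budget when the prompt alone overflows.
    return max(0, min(requested, remaining))
```

With a 16,000-token prompt and 1,024 tokens requested, the helper returns 384, the space left in the window.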