Using Phi-3 Medium 4K on DeepInfra

Implementation guide · Phi-3 · Microsoft Research

Serverless · Open Source

Quick Start

1. Create an account at DeepInfra and generate an API key.
2. Use the DeepInfra SDK or REST API to call phi-3-medium-4k; see the documentation for the request format.
3. You'll be billed $0.14 per 1M input tokens and $0.41 per 1M output tokens. See full pricing.
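Step 2 mentions a REST API without showing a request. The sketch below builds one with the Python standard library only. It assumes the chat endpoint sits at /chat/completions under the OpenAI-compatible base URL used in the SDK example of this guide, and that the key is sent as a Bearer token; build_chat_request and send are hypothetical helper names, not part of any DeepInfra SDK.

```python
import json
import os
import urllib.request

# Base URL taken from the SDK example in this guide.
BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(prompt, model="phi-3-medium-4k", api_key=None):
    """Build (but do not send) an OpenAI-style chat completion request."""
    # Assumption: the key lives in DEEPINFRA_API_KEY, as in the SDK example.
    api_key = api_key or os.environ.get("DEEPINFRA_API_KEY", "")
    body = {
        "model": model,  # confirm the exact ID in the DeepInfra model catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(build_chat_request("Hello")) performs the network call; keeping the build step separate makes the payload easy to inspect before anything is billed.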


Code Examples

Install

pip install openai

API key

DEEPINFRA_API_KEY

Model ID

phi-3-medium-4k

DeepInfra uses "organization/model-name" format, e.g. "meta-llama/Meta-Llama-3-8B-Instruct" or "mistralai/Mistral-7B-Instruct-v0.3". The short ID shown above has no organization prefix, so check the DeepInfra model catalog for the exact ID before calling the API.

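import os
from openai import OpenAI

# Point the OpenAI client at DeepInfra's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],  # raises KeyError if the variable is unset
    base_url="https://api.deepinfra.com/v1/openai",
)

# Model ID as listed in this guide; confirm the exact ID in the DeepInfra model catalog.
response = client.chat.completions.create(
    model="phi-3-medium-4k",
    messages=[{"role": "user", "content": "Hello"}],
)

# The reply text is in the first choice.
print(response.choices[0].message.content)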

About DeepInfra

DeepInfra is a cloud inference platform that provides serverless access to hundreds of open-source AI models through a simple API. The catalog spans text generation, embeddings, and more, and includes models from Meta, Mistral, Alibaba, and others. Pricing is pay-per-token, with no upfront commitment.


Pricing on DeepInfra

Type	Price (USD per 1M tokens)
Input tokens	$0.14
Output tokens	$0.41
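
The per-token rates above translate into a per-request cost with simple arithmetic. The following is a minimal sketch using the two listed prices; estimate_cost_usd is a hypothetical helper, and the token counts are whatever your client reports for a request.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens, from the pricing table in this guide.
INPUT_USD_PER_M = Decimal("0.14")
OUTPUT_USD_PER_M = Decimal("0.41")


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request from its token counts."""
    million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / million


# Example: a 3,000-token prompt with a 500-token reply.
print(estimate_cost_usd(3000, 500))
```

With the OpenAI Python SDK, the counts are usually available as response.usage.prompt_tokens and response.usage.completion_tokens, assuming the endpoint populates the usage field.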

Capabilities

Structured Outputs
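
The listing names Structured Outputs as a capability but does not show a request. The sketch below assumes the endpoint accepts the OpenAI-style response_format={"type": "json_object"} parameter, which this guide does not confirm; check the DeepInfra documentation for the supported form. build_json_mode_kwargs and parse_json_reply are hypothetical helpers, and the 'answer' and 'confidence' keys are illustrative only.

```python
import json


def build_json_mode_kwargs(question, model="phi-3-medium-4k"):
    """Build keyword arguments for a chat completion that asks for JSON."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "Reply with a single JSON object with keys "
                           "'answer' and 'confidence'.",
            },
            {"role": "user", "content": question},
        ],
        # Assumption: OpenAI-style JSON mode is honored by the endpoint.
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(text):
    """Return the reply as a dict, or None if it is not a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

With a client configured as in the Code Examples section, the call would be client.chat.completions.create(**build_json_mode_kwargs("What is 2 + 2?")), followed by parse_json_reply on the returned message content. Validating the reply is still worthwhile, since JSON mode does not by itself guarantee the keys you asked for.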

About Phi-3 Medium 4K

Phi-3 Medium 4K is a 14-billion-parameter large language model developed by Microsoft. It is designed for efficiency and is strongest on reasoning tasks. Its context window is 4,096 tokens, which the prompt and the generated output share. The model uses a dense, decoder-only Transformer architecture and was aligned with human preferences and safety standards through supervised fine-tuning and direct preference optimization. Its training data is primarily English, with some multilingual data. It is small enough to deploy on a range of hardware and is available for both commercial and research use. Safety measures are built in, but additional precautions are advised for higher-risk applications.
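
A 4,096-token window fills quickly in multi-turn chats, so older messages often need to be dropped before each call. The sketch below keeps the most recent messages that fit a budget. It uses a crude estimate of about four characters per token, which is an assumption and not the model's tokenizer; trim_history and the 512-token output reserve are likewise illustrative choices.

```python
CONTEXT_TOKENS = 4096  # context length stated for this model


def rough_token_count(text):
    """Crude estimate: about 4 characters per token (not the real tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages, reserve_for_output=512, limit=CONTEXT_TOKENS):
    """Keep the newest messages whose estimated tokens fit the budget."""
    budget = limit - reserve_for_output
    kept = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = rough_token_count(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

For tighter budgeting, replace rough_token_count with a count from the model's actual tokenizer; the heuristic can be off by a wide margin for code or non-English text.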


Model Specs

Released: 2024-05-21

Parameters: 14B

Context: 4k

Architecture: Decoder Only

Knowledge cutoff: 2023-10
