Using GPT-3.5 Turbo 16k on Azure OpenAI

Implementation guide · GPT-3.5 · OpenAI

Serverless

Quick Start

1
Create an account at Azure OpenAI and generate an API key.
2
Use the Azure OpenAI SDK or REST API to call gpt-3.5-turbo-16k — see the documentation for request format.
3
You'll be billed $0.50/1M input, $2.00/1M output tokens. See full pricing.

API Portal Documentation Pricing Model Card

Code Examples

Install

pip install openai

API key

AZURE_OPENAI_API_KEY

Model ID

gpt-3.5-turbo-16k

gpt-3.5-turbo-16k is your Azure deployment name, not the underlying model name. Deployment names are set when you deploy a model in Azure AI Foundry / Azure OpenAI Studio.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://{resource}.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01"
)
response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",  # your deployment name
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

About Azure OpenAI

Azure OpenAI Service hosts OpenAI's GPT-4o, GPT-4, GPT-3.5, and embedding models on Microsoft Azure with enterprise SLAs. Deployments run in customer-selected regions with private networking, role-based access control, and capacity options spanning Standard pay-per-token, Provisioned Throughput Units (PTUs) for reserved capacity, Global Standard shared capacity, and Batch processing. Azure OpenAI sits inside the wider Microsoft Foundry / Azure AI Studio control plane, which adds an evaluation, monitoring, and Agent Service layer on top of the base model APIs. For workloads that need non-OpenAI models (Claude, DeepSeek, Grok, Llama, Mistral, NVIDIA Nemotron), Microsoft Foundry is the broader catalog; Azure OpenAI is the OpenAI-specific entry point. The service is API-compatible with the OpenAI SDK in most flows, so teams typically swap base URLs and authentication rather than rewriting calls.

Azure OpenAI Service hosts OpenAI's GPT-4o, GPT-4, GPT-3.5, and embedding models on Microsoft Azure with enterprise SLAs. Microsoft's broader AI platform spans Azure Machine Learning, Azure Cognitive Services, and Azure AI Search, plus productivity tools like Microsoft 365 Copilot and developer tooling in Visual Studio and the .NET framework. The platform emphasizes responsible AI, security, and regional compliance.

View all models on Azure OpenAI →

Pricing on Azure OpenAI

Type	Price (per 1M)
Input tokens	$0.50
Output tokens	$2.00

Capabilities

Structured Outputs

About GPT-3.5 Turbo 16k

GPT-3.5 Turbo 16k, developed by OpenAI, is an advanced language model featuring a significantly enhanced context window of 16,384 tokens—four times larger than its predecessor's 4,096 tokens 245. This extension allows it to process and comprehend extended texts, up to approximately 20 pages, in a single interaction, while maintaining the speed and efficiency of earlier versions 10. Although it is a chat-centric model not compatible with the completions endpoint 8, it remains highly effective for tasks requiring prolonged relevance and coherence through OpenAI's API 46. With a knowledge cutoff in September 2021, it surpasses its predecessors in capability, yet it is less advanced than GPT-4 in visual and multilingual contexts 35.

Full model details →

Model Specs

Released2023-06-13

Parameters20B

Context16k

ArchitectureDecoder Only

Knowledge cutoff2021-09

Microsoft

Redmond, Washington, United States