LLM Reference

UI-TARS Models by ByteDance

ByteDance, Apache 2.0, open source
1 model, released 2026, up to 128k context

Details

Researcher: ByteDance
License: Apache 2.0 (OSI)
Commercial use: Allowed
Models: 1
Released: 2026
Max context: 128k

Capabilities

Vision: All models
Multimodal: All models
Tool Use: All models

About

UI-TARS is ByteDance's multimodal vision-language agent series optimized for GUI automation across desktop, web, and mobile environments.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

Use when the workload needs agents, 128k context, and 7B parameters.

2026-02, agents, 128k context, 7B parameters
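As a rough illustration of how a use-when line like the one above could be assembled from capability, context, and replacement fields, here is a minimal Python sketch. The `ModelRecord` shape, its field names, and the `use_when` helper are hypothetical and are not the site's actual schema or logic; the wording for replaced models is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRecord:
    # Hypothetical record shape; field names are assumptions for illustration.
    name: str
    released: str
    context_k: int                      # context window in thousands of tokens
    parameters: str                     # e.g. "7B"
    capabilities: List[str] = field(default_factory=list)
    replaced_by: Optional[str] = None   # name of a successor model, if any


def use_when(record: ModelRecord) -> str:
    """Build a one-line 'use when' hint from a record's fields."""
    # Assumed behavior: a replaced model points the reader to its successor.
    if record.replaced_by is not None:
        return f"Prefer {record.replaced_by}; {record.name} has been replaced."
    needs = list(record.capabilities)
    needs.append(f"{record.context_k}k context")
    needs.append(f"{record.parameters} parameters")
    # Join as "a, b, and c"; needs always holds at least two items here.
    return "Use when the workload needs " + ", ".join(needs[:-1]) + ", and " + needs[-1] + "."


# Values taken from the listing on this page.
ui_tars = ModelRecord(
    name="UI-TARS 1.5 7B",
    released="2026-02",
    context_k=128,
    parameters="7B",
    capabilities=["agents"],
)
print(use_when(ui_tars))
```

With the values listed on this page, the helper reproduces the guidance sentence shown above.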

Release Timeline

1 release group
2026-02 (1 current)
UI-TARS 1.5 7B (Current): agents, 128k context, 7B parameters

Specifications (1 model)

UI-TARS model specifications comparison
Model | Released | Context | Parameters | Vision | Multimodal | Tool Use
UI-TARS 1.5 7B | 2026-02 | 128k | 7B | Yes | Yes | Yes

Frequently Asked Questions

What is UI-TARS used for?
UI-TARS is used for agent workflows with tool use and for vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
How does UI-TARS compare to Seed?
UI-TARS by ByteDance is strongest where you need agents, while Seed by ByteDance is the closest related family to check for vision and multimodal work. UI-TARS has 1 listed variant and reaches up to 128k context; Seed reaches up to 256k context. Compare the specs and pricing tables before choosing a production model.
Which UI-TARS model should I use?
If price is the main constraint, check the pricing table first, because provider pricing for UI-TARS is incomplete in the local data. UI-TARS 1.5 7B is the only listed variant, so it is also the most capable and latest local choice: it offers 128k context, tool use, and multimodal inputs.