UI-TARS 1.5 7B
ui-tars-1.5-7b
Open Source, Multimodal
About
UI-TARS-1.5 is ByteDance's multimodal vision-language agent model optimized for GUI-based environments including desktop interfaces, web browsers, and mobile apps. It supports grounding, planning, and action execution for computer-use tasks.
UI-TARS 1.5 7B has a 128K-token context window.
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution
Specifications
Family: UI-TARS
Released: 2026-02-01
Parameters: 7B
Context: 128K
Architecture: Decoder-only
Specialization: Agents
License: Apache 2.0
Training: Fine-tuned