Grok-1.5V

About

Grok-1.5V, created by xAI, is a multimodal large language model that processes both text and images. It interprets a wide range of visual data, including documents, diagrams, charts, screenshots, and photographs. Its multimodal design supports tasks such as translating diagrams into code, generating image descriptions, and answering questions about visual inputs, and it shows a strong grasp of spatial information. Grok-1.5V has performed competitively against top models such as GPT-4V and Gemini Pro 1.5, particularly on tasks that require spatial reasoning. Access is initially limited mainly to early testers and existing Grok users, with broader availability planned.

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Providers (1)

Provider	Input (per 1M)	Output (per 1M)	Type
xAI PromptIDE	—	—	Serverless

Specifications

Family: Grok

Released: 2024-04-12

Architecture: Decoder Only

Specialization: General