Grok-1.5V on xAI

Grok · xAI

Serverless

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, JSON Mode, Code Execution

About Grok-1.5V

Grok-1.5V, created by xAI, is a multimodal large language model that processes both text and images. It interprets a wide range of visual data, including documents, diagrams, charts, screenshots, and photographs. This lets it handle tasks such as translating diagrams into code, generating image descriptions, and answering questions about visual inputs, with a strong grasp of spatial information. Grok-1.5V has shown competitive results against top models such as GPT-4V and Gemini Pro 1.5, particularly on tasks that require spatial reasoning. Initially, access is limited mainly to early testers and existing Grok users, with broader availability planned.

Model Specs

Released: 2024-04-12
Architecture: Decoder Only

Related Models on xAI