Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, JSON Mode, Code Execution
About Step-1.5V
Step-1.5V is StepFun's multimodal language model. It builds on Step-1 by adding image understanding.
Model Specs
Released: 2024-06-01
Context: 128K
Architecture: Decoder-only