xAI, Elon Musk’s rival to OpenAI, has debuted visual-data processing in its AI chatbot, Grok.
Dubbed Grok-1.5V, this is xAI’s first multimodal AI model, adept not only at text comprehension but also at analyzing “documents, diagrams, charts, screenshots, and photographs.”
In its announcement, xAI showcased several practical applications of Grok’s expanded abilities. Users can present Grok with a photo of a flow chart and ask it to convert the diagram into Python code, compose a narrative based on a drawing, or explain a perplexing meme. After all, not everyone can effortlessly decipher the internet’s eclectic content.
This latest release follows on the heels of Grok-1.5’s introduction just a few weeks prior. Grok-1.5 was engineered to outperform its forerunner in coding and mathematics, while also boasting enhanced contextual processing capabilities to scrutinize data from diverse sources for deeper comprehension of specific inquiries.
While xAI has promised that early testers and current users will soon benefit from Grok-1.5V’s functionalities, it refrained from providing a precise timeline for its widespread availability.
Alongside Grok-1.5V, xAI has released a benchmark dataset named RealWorldQA. The dataset comprises 700 images for evaluating AI models, each paired with a question and an easily verifiable answer, yet posing potential challenges for multimodal models like Grok.
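To make the dataset's image-question-answer structure concrete, here is a minimal sketch of how such a benchmark might be scored. The `BenchmarkItem` fields, the exact-match scoring, and the `model_fn` callback are all illustrative assumptions, not xAI's actual schema or evaluation protocol.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkItem:
    # Hypothetical entry shape: an image paired with a question
    # and a verifiable reference answer (assumed, not xAI's schema).
    image_path: str
    question: str
    answer: str

def matches(model_answer: str, expected: str) -> bool:
    # Naive normalized exact-match; real benchmarks often use
    # more tolerant answer matching.
    return model_answer.strip().lower() == expected.strip().lower()

def evaluate(items, model_fn):
    """Return accuracy of model_fn (image_path, question) -> answer
    over the benchmark items."""
    if not items:
        return 0.0
    correct = sum(
        matches(model_fn(it.image_path, it.question), it.answer)
        for it in items
    )
    return correct / len(items)
```

A model under test would be plugged in as `model_fn`; scores like those xAI reports would then be accuracy over all 700 items.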
xAI asserted that Grok-1.5V achieved the highest score on RealWorldQA, beating competitors including OpenAI’s GPT-4V and Google’s Gemini Pro 1.5.