Anthropic, the artificial intelligence startup founded by former OpenAI researchers and backed by Amazon, has unveiled its latest language models, Claude 3, claiming they outperform rival systems like Google's Gemini and OpenAI's vaunted GPT-4.
The Google- and Amazon-backed company says its new Claude 3 Opus model scored 50.4% on a graduate-level reasoning benchmark, higher than GPT-4's 35.7%, and achieved better results on undergraduate knowledge, basic maths skills, and other measures.
One major new feature is multimodal support, allowing Claude 3 to analyse images, documents, charts, and other data formats in addition to text. This makes it the first truly multimodal AI assistant from Anthropic, putting it on par with offerings like the image-understanding versions of GPT-4.
"The world is multimodal...For us, text and code always felt like incomplete modalities," said Daniela Amodei, Anthropic's co-founder, echoing similar statements from OpenAI about the importance of multimodal AI.
Beyond increased accuracy, Anthropic claims the Claude 3 models are significantly faster than their predecessors, with the smallest Haiku version able to summarise long documents like research papers in just seconds.
The company also says Claude 3 is less likely to refuse to answer prompts that its previous models judged to be off-limits or too close to the safety "guardrails." This suggests more nuanced, context-aware responses.
"In our quest to have a highly harmless model, Claude 2 would sometimes over-refuse," Amodei told CNBC. "Claude 3 has a more nuanced understanding of prompts."
Anthropic's three new language models - Opus, Sonnet, and Haiku - are available today through the company's API and website, as well as cloud platforms like Amazon Bedrock and Google Vertex AI.
The release of Claude 3 marks another salvo in the fierce competition among AI labs to build ever-more capable language models and meet soaring commercial demand.
OpenAI, Google, Anthropic, and others have released a rapid succession of more powerful AI systems in recent months, each claiming superiority over rivals on select benchmarks while acknowledging remaining limitations.