News

A discrepancy between first- and third-party benchmark results for OpenAI's o3 AI model is raising questions about the ...
OpenAI’s newest LLM, o3, is facing scrutiny after independent tests found it solved far fewer tough math problems ...
Benchmark performance results typically accompany the launch of every new AI model to showcase how well the models can ...
AI models are numerous and confusing to navigate, and the benchmarks used to measure their performance are just as challenging.
Artificial intelligence is poised to outperform humans in writing code as leading groups, including OpenAI, Anthropic and ...
Through the Pioneers Program, OpenAI hopes to create benchmarks for specific domains like legal, finance, insurance, healthcare, and accounting. The lab says that, in the coming months, it’ll work ...
OpenAI launches GPT-4.1 with improved coding, long-context support, and updated data. Available via API only, it outperforms ...
OpenAI has announced the OpenAI Pioneers Program, a new initiative that will have the company working with startups to devise ...
OpenAI launches groundbreaking o3 and o4-mini AI models that can manipulate and reason with images, representing a major ...
By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.
OpenAI slashes GPT-4.1 API prices by up to 75% while offering superior coding performance and million-token context windows, ...