According to Sam Altman, an OpenAI LLM has attained IMO gold-level mathematical ability, and the GPT-5 rollout is imminent


An experimental large language model (LLM) developed by OpenAI has achieved a historic milestone in artificial intelligence by attaining gold medal-level performance at the 2025 International Math Olympiad (IMO) — the world’s most challenging mathematics competition for high school students.

According to OpenAI researcher Alexander Wei, the model solved five out of six problems from the 2025 IMO under human exam conditions, earning 35 out of 42 points — enough to qualify for a gold medal in the actual competition.

What Makes This a Breakthrough

  • The IMO problems are notoriously complex and test creative mathematical reasoning, often taking even the best students hours to solve.

  • The LLM was evaluated over two 4.5-hour sessions under the same conditions as human participants — no internet access and no tools — and was required to write natural-language proofs.

  • Three former IMO medalists independently graded the model's solutions, unanimously confirming its answers to problems 1 through 5. The model did not solve problem 6, traditionally the hardest on the exam.

Wei highlighted how far AI has advanced in math:

“We’ve now progressed from GSM8K (~0.1 min for top humans), MATH benchmark (~1 min), AIME (~10 mins), to IMO (~100 mins).”

He added that this progress reflects a shift beyond traditional reinforcement learning: the model had to produce long, logically rigorous proofs — output that is hard to automatically verify and reward in most AI training pipelines.

Not GPT-5, and Not Coming Soon

Wei clarified that this is not GPT-5, but part of a separate research path. The IMO-capable model will not be released publicly for months, if at all:

“We don’t plan to release a model with IMO gold level of capability for many months.”

OpenAI CEO Sam Altman echoed this:

“We are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model... not a specialised math system, but a general-purpose reasoning model.”

Reflections and Significance

  • In a personal note, Wei recalled a 2021 forecast he made as a PhD student — estimating 30% accuracy on the MATH benchmark by 2025. Instead, OpenAI hit IMO gold, far exceeding expectations.

  • He credited collaborators including Sheryl Hsu and Noam Brown, and acknowledged the real 2025 IMO participants, many of whom share backgrounds with OpenAI researchers.

Why It Matters

This result is arguably one of the most significant achievements in AI reasoning to date. Unlike prior benchmarks focused on short answers or multiple-choice problems, IMO problems demand:

  • Deep understanding

  • Precise logical argumentation

  • Mathematical creativity

Successfully solving such problems suggests AI is approaching expert-level performance in areas previously thought to be exclusively human domains.

