DesoLina on
AI solves cookie-cutter problems with known solutions, what a shock!
navetzz on
My dictionary knows more words than any scholar throughout history…
Arachnode on
Whoa, it can do math?
Next they’ll claim it could win a spelling bee.
MongolianMango on
Calculators are better than international mathematical gold medalists too…
ftgyhujikolp on
Looking at the paper: it was given hundreds to thousands of attempts per problem.
6 Comments
“The system, called AlphaGeometry2, is an improved version of a system, AlphaGeometry, [that DeepMind released last January](https://techcrunch.com/2024/01/17/deepminds-latest-ai-can-solve-geometry-problems/). In a [newly published study](https://arxiv.org/pdf/2502.03544), the DeepMind researchers behind AlphaGeometry2 claim their AI can solve 84% of all geometry problems over the last 25 years in the International Mathematical Olympiad (IMO)
Proving mathematical theorems, or logically explaining why a theorem (e.g. the Pythagorean theorem) is true, requires both reasoning and the ability to choose from a range of possible steps toward a solution. These problem-solving skills could — if DeepMind’s right — turn out to be a useful component of future general-purpose AI models.”
Looking at the paper: it was given hundreds to thousands of attempts per problem.
Section 8 – Results https://arxiv.org/pdf/2502.03544
Far less impressive when you read between the lines.
OpenAI fudges stats with math in the same ways, which made me suspicious enough to go look at the paper.