
When A.I. Passes This Test, Look Out | The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.
https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html

11 Comments
“If you’re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that A.I. systems can’t pass.
For years, A.I. systems were measured by giving new models a variety of standardized benchmark tests. Many of these tests consisted of challenging, S.A.T.-caliber problems in areas like math, science and logic. Comparing the models’ scores over time served as a rough measure of A.I. progress.
But A.I. systems eventually got too good at those tests, so new, harder tests were created — often with the types of questions graduate students might encounter on their exams.
Those tests aren’t in good shape, either. New models from companies like OpenAI, Google and Anthropic have been getting high scores on many Ph.D.-level challenges, limiting those tests’ usefulness and leading to a chilling question: Are A.I. systems getting too smart for us to measure?
This week, researchers at the Center for AI Safety and Scale AI are releasing a possible answer to that question: A new evaluation, called “Humanity’s Last Exam,” that they claim is the hardest test ever administered to A.I. systems.”
This is so stupid. Tech companies are just going to build a model that is capable of answering these questions, and people who have no clue will claim it's over, that AGI has been achieved.
Got news for humans: we are outmoded and no longer necessary to keep society moving. We'll just eat shit, and billionaires will live with robot butlers.
So I know this sounds like a really dumb question, but based on my knowledge of AI and learning models, it would be an easy feat for an AI to solve any problem, so long as that problem already has a known solution to serve as an answer key.
Do these tests ask the AI to conceive of something novel from minimal input?
If not, then it's just rote recall on a massive scale.
Why don't they ask each AI to make a test, see how every other model performs on it, and rank the models by how hard they make it for the others?
So are we on to the Blade Runner timeline now… just need to know if we will be moving to space anytime soon…
I will be impressed only when AI looks at a horn shed by a ram, lying on the ground, and, without consulting any databases other than basic knowledge of its environment, creates the following:
– a drinking vessel
– a musical instrument
– a method for scraping animal hides free of hair
– a cutting instrument sharp enough to slice flesh
Or even better, give AI a pile of sticks and, again without any additional knowledge beyond the basic properties of sticks (no science stuff, nothing about friction coefficients and temperatures and shit like that), ask it to devise a way of manipulating the sticks to make fire.
AI “knows” only what we teach it and let it know. It is a trained monkey being sold as a brilliant mathematician.
It still fails at creative tasks that people are good at. Composing good, meaningful music or creating real literature still seems to be beyond it.
Ask the AI to build the test. These questions are so easy to answer.
Engagement bait. AI is only as powerful as we allow it to be. Pass this test: what is my favorite memory of all time?
> Some of the smartest humans in the world are struggling to create tests that A.I. systems can’t pass.
Well, I’m certainly not one of those people, but I can cite three examples of AI being profoundly wrong:
1. I’m a structural engineer, so I asked ChatGPT for the strength of a cantilever beam. The answer was not just incorrect – it was off by a factor of 1,000.
2. My wife is an attorney. She asked for a legal opinion, and ChatGPT cited two previous opinions – neither of which exist.
3. I asked ChatGPT if Trump had any felony convictions. It replied, “no”.
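For what it's worth, a factor-of-1,000 miss like the one in example 1 is the classic signature of a unit mix-up (N·mm vs. N·m, mm vs. m) rather than a wrong formula. A minimal sketch of the standard tip-loaded cantilever check, with purely illustrative numbers (none of these values come from the comment):

```python
# Tip-loaded cantilever: max bending moment M = P * L at the fixed end,
# max bending stress sigma = M * c / I (Euler-Bernoulli beam theory).
# All values below are illustrative assumptions, worked in SI units.
P = 10_000.0            # tip load, N
L = 2.0                 # span, m
b, h = 0.1, 0.2         # rectangular cross-section: width x depth, m
I = b * h**3 / 12       # second moment of area, m^4
c = h / 2               # distance from neutral axis to extreme fibre, m

M = P * L               # N*m
sigma = M * c / I       # Pa
sigma_mpa = sigma / 1e6 # convert to MPa

# The classic blunder: feeding lengths in mm while treating the result
# as if it were in consistent SI units shifts the answer by powers of 1000.
print(f"max bending stress ~ {sigma_mpa:.1f} MPa")
```

Any model (or human) that keeps units consistent gets roughly 30 MPa here; slipping between mm and m anywhere in the chain produces exactly the kind of thousand-fold error described.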