
When A.I. Passes This Test, Look Out | The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.
https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html

11 Comments
“If you’re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that A.I. systems can’t pass.
For years, A.I. systems were measured by giving new models a variety of standardized benchmark tests. Many of these tests consisted of challenging, S.A.T.-caliber problems in areas like math, science and logic. Comparing the models’ scores over time served as a rough measure of A.I. progress.
But A.I. systems eventually got too good at those tests, so new, harder tests were created — often with the types of questions graduate students might encounter on their exams.
Those tests aren’t in good shape, either. New models from companies like OpenAI, Google and Anthropic have been getting high scores on many Ph.D.-level challenges, limiting those tests’ usefulness and leading to a chilling question: Are A.I. systems getting too smart for us to measure?
This week, researchers at the Center for AI Safety and Scale AI are releasing a possible answer to that question: A new evaluation, called “Humanity’s Last Exam,” that they claim is the hardest test ever administered to A.I. systems.”
This is so stupid. Tech companies are just going to build a model that is capable of answering these questions, and people who have no clue will claim it's over, that AGI has been achieved.
Got news for humans: we are outmoded and no longer necessary to keep society moving. We'll just eat shit, and billionaires will live with robot butlers.
So I know this sounds like a really dumb question, but based on my knowledge of AI and learning models, it would be an easy feat for an AI to solve any problem, so long as that problem already has a known solution to serve as an answer key.
Do these tests ask the AI to conceive of something novel from minimal input?
If not, then it's just rote recall on a massive scale.
Why don't they ask each AI to make a test, see how every other model performs on it, and rank the models by how hard they make it for the others?
So are we on to the Blade Runner timeline now… just need to know if we will be moving to space anytime soon…
I will be impressed only when AI looks at a horn shed by a ram, lying on the ground, and, without consulting any databases other than basic knowledge of its environment, creates the following:
– a drinking vessel
– a musical instrument
– a method for scraping animal hides free of hair
– a cutting instrument sharp enough to slice flesh
Or even better, give AI a pile of sticks and, again without any additional knowledge beyond the basic properties of sticks (no science stuff, nothing about friction coefficients and temperatures and shit like that), ask it to devise a way of manipulating the sticks to make fire.
AI “knows” only what we teach it and let it know. It is a trained monkey being sold as a brilliant mathematician.
It still fails at creative tasks that people are good at. Composing good, meaningful music or creating real literature still seems to be beyond it.
Ask the AI to build the test. These questions are so easy to answer.
Engagement bait. AI is only as powerful as we allow it to be. Pass this test: what is my favorite memory of all time?
> Some of the smartest humans in the world are struggling to create tests that A.I. systems can’t pass.
Well, I’m certainly not one of those people, but I can cite three examples of AI being profoundly wrong:
1. I’m a structural engineer, so I asked ChatGPT for the strength of a cantilever beam. The answer was not just incorrect – it was off by a factor of 1,000.
2. My wife is an attorney. She asked for a legal opinion, and ChatGPT cited two previous opinions – neither of which exist.
3. I asked ChatGPT if Trump had any felony convictions. It replied, “no”.
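For what it's worth, a factor-of-1,000 miss like the one in example 1 is the classic signature of a unit mix-up (N·mm vs. N·m, mm vs. m) rather than a wrong formula. A minimal sketch of the standard tip-loaded cantilever check, with purely illustrative numbers (none of these values come from the comment):

```python
# Tip-loaded cantilever: max bending moment M = P * L at the fixed end,
# max bending stress sigma = M * c / I (Euler-Bernoulli beam theory).
# All values below are illustrative assumptions, worked in SI units.
P = 10_000.0            # tip load, N
L = 2.0                 # span, m
b, h = 0.1, 0.2         # rectangular cross-section: width x depth, m
I = b * h**3 / 12       # second moment of area, m^4
c = h / 2               # distance from neutral axis to extreme fibre, m

M = P * L               # N*m
sigma = M * c / I       # Pa
sigma_mpa = sigma / 1e6 # convert to MPa

# The classic blunder: feeding lengths in mm while treating the result
# as if it were in consistent SI units shifts the answer by powers of 1000.
print(f"max bending stress ~ {sigma_mpa:.1f} MPa")
```

Any model (or human) that keeps units consistent gets roughly 30 MPa here; slipping between mm and m anywhere in the chain produces exactly the kind of thousand-fold error described.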