According to research organization METR: The capabilities of key LLMs are doubling every seven months. This realization leads to a second conclusion, equally stunning: By 2030, the most advanced LLMs should be able to complete, with 50 percent reliability, a software-based task that takes humans *a full month* of 40-hour workweeks. And the LLMs would likely be able to do many of these tasks much more quickly than humans, taking only days, or even just hours.
At the heart of the METR work is a metric the researchers devised called “[task-completion time horizon.](https://www.perplexity.ai/page/ai-task-capacity-doubles-every-F0hK0HIkQVKjH.88ZDYArg)” It’s the amount of time human programmers would take, on average, to do a task that an LLM can complete with some specified degree of reliability, such as 50 percent.
A plot of this metric for some general-purpose LLMs going back several years shows clear exponential growth, with a doubling period of about seven months. The researchers also considered the “messiness” factor of the tasks, with “messy” tasks being those that more resembled ones in the “real world.”
sciolisticism on
> If the pace continues until 2030
Found the catch.
navetzz on
Let s ignore ceilings and assume everything is an exponential: this sub
3 Comments
According to research organization METR: The capabilities of key LLMs are doubling every seven months. This realization leads to a second conclusion, equally stunning: By 2030, the most advanced LLMs should be able to complete, with 50 percent reliability, a software-based task that takes humans *a full month* of 40-hour workweeks. And the LLMs would likely be able to do many of these tasks much more quickly than humans, taking only days, or even just hours.
At the heart of the METR work is a metric the researchers devised called “[task-completion time horizon.](https://www.perplexity.ai/page/ai-task-capacity-doubles-every-F0hK0HIkQVKjH.88ZDYArg)” It’s the amount of time human programmers would take, on average, to do a task that an LLM can complete with some specified degree of reliability, such as 50 percent.
A plot of this metric for some general-purpose LLMs going back several years shows clear exponential growth, with a doubling period of about seven months. The researchers also considered the “messiness” factor of the tasks, with “messy” tasks being those that more resembled ones in the “real world.”
> If the pace continues until 2030
Found the catch.
Let s ignore ceilings and assume everything is an exponential: this sub