12 Comments

  1. Claude Sonnet 4.5, released Monday, outperforms prior versions at coding, finance, cybersecurity and long-duration autonomous work, Anthropic said.

    To act as an agent, AI models must sustain work on a single task for hours — something many earlier models couldn’t do.

    The new version of Claude can work for 30 hours or more on its own, a big step up from the seven hours of autonomous work with Claude Opus 4.

    Anthropic said the rapid progress, marked by major Sonnet updates in February and May, shows a pattern where every six months its new model can handle tasks that are twice as complex.

    “This is a continued evolution on Claude, going from an assistant to more of a collaborator to a full, autonomous agent that’s capable of working for extended time horizons,” White said.

  2. However, fifteen minutes in it goes off the rails. Then it spends an incredible number of tokens on 29.75 hours of hallucinating, and you throw the result away.

    Anthropic loses $100 of compute on the attempt, and nothing of value was made.

  3. When is it economically unsustainable to let an agent go on its own vs. a human overseeing it and preventing it from going off course?

  4. ohyeathatsright on

    “The quirky sycophantic intern will now complete the entire project without supervision!”

  5. Literally used it to code yesterday: it keeps forgetting the context it’s in, doesn’t show its work, keeps hallucinating, and at one point suggested I redo an entire page from the ground up instead of adding a small helper method.

    10/10, will use again on Monday.

  6. After 30 hours of work on a task that takes 30 hours, it only has 360 hours of work to fix the bugs.

  7. This_They_Those_Them on

    Sonnet 4.5 was pushed out probably before it was ready. It took much longer to train than anticipated and was only released to align with an ad campaign.

  8. EarlobeGreyTea on

    Okay, but could you publish actual research on this instead of parroting what Anthropic said?  
    This is just an advertisement for Anthropic. It can all be bullshit, and there are no consequences when it is shown to be bullshit.

  9. It’s funny to see this sub consistently upvote the most skeptical and negative takes about the capabilities of AI, only to watch them shift the goalposts further every few months with the next round of advancements. It’s like watching a literal hole in collective human reasoning.

    Fascinating.

  10. I trained an AI bot on ingested resource data and fine-tuned the workflow to get the exact behavior I’m looking for, and let me tell you, Claude models are by far superior to Gemini. I’ve been working on this for three months and the results are consistent with Anthropic’s LLMs. Just my experience.