“Claude’s success caught even Anthropic’s own red-team hackers off guard.
“Originally it was just me at a hotel realizing that PicoCTF had started and being like, ‘Oh, I wonder if Claude could do some of these challenges,’” Lucas [an Anthropic red-teamer] said.
* PicoCTF is the largest capture-the-flag competition for middle school, high school, and college students. Participants are tasked with reverse-engineering malware, breaking into systems, and decrypting files.
* Lucas began by just pasting the first challenge verbatim into Claude.ai. The only hiccup he encountered was the need to download a third-party tool, but once that was done, Claude instantly solved the problem.
* “Claude was able to solve most of those challenges and get in the top 3% of PicoCTF,” he said.
As Lucas continued this laissez-faire experiment in other competitions, Claude kept surpassing expectations.
* Lucas entered a few more using only Claude.ai and Claude Code. At the time, Sonnet 3.7 was Anthropic’s most advanced available model.
* The red team provided only minimal help — usually when Claude needed to install a piece of software. Besides that, Claude was on its own.”
LookAtThatBacon on
> Keane Lucas, a member of Anthropic’s red team, first entered Claude into a hacking competition — Carnegie Mellon’s PicoCTF — on a whim this past spring.
–
> PicoCTF is the largest capture-the-flag competition for middle school, high school, and college students.
It’s good for the kids to be forced to face the tool that will take away any future job prospects for them. /s
thedabking123 on
Yeah- not surprising considering it’s trained on almost all the public code available online… of which there is considerable.
I think the danger here is someone else doing the same and not blocking the prompts from bad actors.
Remember boys and girls… transformers are simple text-in and text-out at the end of the day (well, slightly more than that with agents and multimodal, but still the principle is the same… it’s not conscious).
Eelroots on
So, you can’t ask them to draw a boob, but you can ask them to breach in a bank network?
kurtatwork on
Considering you can basically google most of the answers……. im not that impressed…
ftgyhujikolp on
Claude can’t tell me what dependencies are in a lock file lol.
formerdaywalker on
Breaking News: AI trained on programming is average at programming. In unrelated news, AI companies hope this increases their stock prices.
Talinoth on
God these kids are fucking hopeless. I mean, I was too, but I was hoping it’d get better, not worse over time… AI is going to steal 90%+ of their jobs – not necessarily because the LLM is actually that good, but because using it is *much cheaper* than paying top dollar for easily replaced trash college grads who struggle with basic tasks below the level of current LLMs in the hopes you get 1/10 who’s actually got the potential for senior work.
One Redditor replied that Claude’s performance wasn’t impressive because:
>Considering you can basically google most of the answers……. im not that impressed…
Uhuh. Okay. *Why weren’t the competitors doing that then in other competitions?* In the article I just read, AI-based teams entered multiple different competitions, and the teams using AI were substantially more effective than most human competitors (who could presumably also search Google or Stack Exchange for answers).