[Design Arena](https://www.designarena.ai/) is a crowdsourced benchmark where users give large language models a prompt and then compare generations (e.g. websites, games, images) from several randomly chosen models. So far, the voting platform has amassed 30K+ unique users.
The leaderboard above is determined by win rate (the percentage of comparisons in which a user picked a generation from model X over the other generation). The Elo rating is an approximation based on win rate that adjusts for the number of battles a model has participated in.
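To make the two metrics concrete, here is a rough Python sketch. It is not Design Arena's actual formula, which isn't spelled out here: the win rate follows the definition above, while the Elo part assumes the textbook sequential Elo update. The names `win_rates` and `elo_ratings`, the K-factor of 32, and the starting rating of 1000 are all illustrative assumptions.

```python
from collections import defaultdict


def win_rates(battles):
    """Fraction of battles each model won.

    `battles` is a list of (winner, loser) pairs, one per user vote.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in battles:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}


def elo_ratings(battles, k=32.0, start=1000.0):
    """Textbook sequential Elo (an assumption, not the site's formula).

    Each vote moves the winner up and the loser down by the same amount,
    scaled by how surprising the result was given current ratings.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in battles:
        # Probability the winner was expected to win before this vote.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes between three placeholder models.
    votes = [("A", "B"), ("A", "C"), ("B", "C")]
    print(win_rates(votes))
    print(elo_ratings(votes))
```

Under this kind of update, a model with few battles stays close to its starting rating, whereas its raw win rate can swing to 0% or 100% after only a handful of votes.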
We’re always trying to improve the benchmark, so let us know if you have feedback!
UchuYagi on
Probably anecdotal, but from my experience in the last ~1yr of heavy usage:
New Code and Refactoring:
1. Claude Sonnet 4
2. Gemini 2.5 Pro
3. o4
Debugging:
1. o4
2. Gemini 2.5 Pro
3. Claude Sonnet 4
This is on massive corporate React and Vue codebases with a few additional libraries.
nut-sack on
I'm amazed at how many people are using DeepSeek even though it has been shown to communicate back with .cn hosts.