LiveCodeBench is a contamination-free benchmark that continuously collects new coding problems from LeetCode, AtCoder, and Codeforces. LiveCodeBench uses problems released after model training cutoffs to measure true generalization. It evaluates models on code generation, self-repair (fixing buggy code given error feedback), code execution prediction, and test output prediction.
Each line represents that lab's highest-scoring model at each point in time.
Calculation method:
1. Models are split into open and closed categories
2. For each month, the running maximum is calculated within each category
3. Lines carry forward until a new model beats the previous best
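
The steps above can be sketched roughly like this. This is only an illustration of the running-maximum idea, not the actual code behind the chart: the function name `running_best`, the `(month, category, score)` tuple layout, and the sample scores are all made up for the example.

```python
from collections import defaultdict

def running_best(releases):
    """Return {category: [(month, best_score_so_far), ...]}.

    `releases` is a list of (month "YYYY-MM", category, score) tuples.
    This layout is an assumption for the sketch, not the chart's real schema.
    """
    months = sorted({month for month, _, _ in releases})

    # Step 1: split by category, keeping the top score released in each month.
    monthly = defaultdict(dict)
    for month, category, score in releases:
        previous = monthly[category].get(month, float("-inf"))
        monthly[category][month] = max(previous, score)

    # Steps 2 and 3: running maximum per category, carried forward
    # through months where nothing beats the previous best.
    series = {}
    for category, by_month in monthly.items():
        best = None  # stays None until the category has its first release
        line = []
        for month in months:
            score = by_month.get(month)
            if score is not None and (best is None or score > best):
                best = score
            line.append((month, best))
        series[category] = line
    return series

# Hypothetical scores, for illustration only (not real benchmark data).
releases = [
    ("2024-01", "open", 30.0),
    ("2024-01", "closed", 45.0),
    ("2024-02", "open", 28.0),
    ("2024-03", "closed", 52.5),
    ("2024-03", "open", 41.0),
]
series = running_best(releases)
```

In this sample the open line stays flat at 30.0 through February, because the 28.0 release does not beat the previous best, and only steps up in March.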
Google has massive distribution through Chrome and Android. Soon iOS. Seems like a default win due to their reach. Anthropic and OpenAI both have millions of paying users.
Who pays for xAI? At this point it seems Gemini could starve the others through pricing and win if they keep up with the progress.
Every programmer I know uses Claude for coding, not Gemini or OpenAI
Google is coming back in the AI race!
Data Source: Benchmark scores originally from [https://artificialanalysis.ai/](https://artificialanalysis.ai/), which aggregates results from [https://livecodebench.github.io/](https://livecodebench.github.io/). The chart is displayed on [https://pricepertoken.com/trends](https://pricepertoken.com/trends).
Tool: Built with ECharts, data from [https://pricepertoken.com/trends](https://pricepertoken.com/trends)