
LLM Leaderboard
Compare how different large language models perform at writing Clerk code and select the one that best fits your requirements.
| Rank | Model | Average | Authentication | Users | Organizations | Webhooks | API Routes | Checkout Flow |
|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5 | 69% | 30% | 50% | 86% | 83% | 56% | 18% |
| 2 | GPT-5 Chat | 67% | 10% | 50% | 57% | 75% | 67% | 100% |
| 3 | Claude Sonnet 4.5 | 60% | 20% | 67% | 14% | 91% | 78% | 9% |
| 4 | v0-1.5-md | 60% | 30% | 33% | 14% | 67% | 100% | 90% |
| 5 | Claude Sonnet 4 | 56% | 30% | 33% | 43% | 60% | 44% | 9% |
| 6 | Claude Opus 4 | 52% | 10% | 33% | 29% | 67% | 78% | 9% |
| 7 | GPT-4o | 50% | 20% | 25% | 14% | 60% | 11% | 0% |
Last updated: October 21, 2025

