There are many LLM leaderboards available. As with search rankings and Search Engine Optimization (SEO), a leaderboard score is a measurement that many model providers “game”. The most common problem is that a model “cheats” by being trained on the leaderboard's test questions. It then scores well on the leaderboard but does worse in real applications, where it meets material it has not seen before.

<aside> 💡

As leaderboards become popular, the AI companies notice. One thing that can happen is overfitting, where a new model is trained on the leaderboard questions themselves. This is the equivalent of a student memorising past exam papers: their exam results may improve, but in the real world, outside of those exam questions, they do badly. In the same way, leaderboard results can be gamed to look better than the model really is. So your personal experience really matters!

</aside>
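To make the exam analogy concrete, here is a minimal toy sketch. It is not how any real leaderboard or model works: the questions, the `memorising_model` function, and the exact-match scoring are all made up for illustration. It shows a "model" that has only memorised the leaderboard questions scoring perfectly on them and failing on fresh questions of the same kind.

```python
# Toy illustration only: a "model" that has memorised leaderboard answers.
# All questions and names here are hypothetical.
LEADERBOARD = {
    "What is 2 + 2?": "4",
    "What is the capital of France?": "Paris",
    "How many days are in a week?": "7",
}

# Fresh questions of the same kind that the model never saw in training.
REAL_WORLD = {
    "What is 3 + 5?": "8",
    "What is the capital of Italy?": "Rome",
    "How many months are in a year?": "12",
}


def memorising_model(question: str) -> str:
    """Return the memorised answer if the question was seen, else give up."""
    return LEADERBOARD.get(question, "I don't know")


def accuracy(model, questions: dict) -> float:
    """Fraction of questions the model answers exactly right."""
    correct = sum(model(q) == a for q, a in questions.items())
    return correct / len(questions)


leaderboard_score = accuracy(memorising_model, LEADERBOARD)
real_world_score = accuracy(memorising_model, REAL_WORLD)
print(f"Leaderboard: {leaderboard_score:.0%}, real world: {real_world_score:.0%}")
# prints "Leaderboard: 100%, real world: 0%"
```

Real models are far less extreme than this, but the gap between the two numbers is the effect to watch out for.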

Locky’s favourite leaderboards are:

1. Chatbot Arena (formerly known as the LMSYS leaderboard)

https://lmarena.ai/

<aside> 🥷

Locky’s Notes

<aside> 💡

Feeling confused already? The good news is that with Expanse you don’t have to commit to a single model or provider. You can switch as often as you like, even in the middle of a conversation!

</aside>

</aside>

2. Aider leaderboard (for coding)

https://aider.chat/docs/leaderboards/

<aside> 🥷

Locky’s Notes

</aside>

3. Personal Experience

Everyone uses AI in a different way and for different use cases, so the best way to find the best LLM for you is to try several models on your own tasks and compare the results.

<aside> 💡

Your personal experience is one of the most important LLM leaderboards.

</aside>
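If you want to keep track of your own comparisons, a small script is enough. The sketch below is only one possible way to do it, and everything in it is an assumption for illustration: the model names `model-a` and `model-b`, the prompts, and the 1 to 5 scores are hypothetical. You would replace them with your own ratings after trying the same prompt on each model.

```python
from collections import defaultdict

# Hypothetical ratings: (your prompt, model name, your score from 1 to 5).
ratings = [
    ("Summarise my meeting notes", "model-a", 4),
    ("Summarise my meeting notes", "model-b", 5),
    ("Fix this Python bug", "model-a", 5),
    ("Fix this Python bug", "model-b", 3),
    ("Draft a polite email", "model-a", 3),
    ("Draft a polite email", "model-b", 5),
]


def personal_leaderboard(rows):
    """Average each model's scores and sort from best to worst."""
    scores = defaultdict(list)
    for _prompt, model, score in rows:
        scores[model].append(score)
    averages = {m: sum(s) / len(s) for m, s in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for model, average in personal_leaderboard(ratings):
    print(f"{model}: {average:.2f}")
```

An overall average hides detail: in this made-up data `model-b` comes out ahead overall, yet `model-a` scored higher on the coding prompt, so it can be worth keeping a separate tally for each kind of task.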