[Interactive highlight: the text on this page can be colored by the probability a selected Gemma 3 model (base or instruct, various sizes) assigns to each true token, bucketed from p < 1e-5 up to p ≥ 0.5. All models are from the Gemma 3 family. Thanks Google!]
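The coloring itself is a standard computation: run the text through a causal language model and, at each position, read off the probability the model assigned to the token that actually follows. Below is a minimal sketch of that computation, not the site's own implementation; it assumes the Hugging Face transformers API, a guessed Gemma 3 checkpoint id, and bucket edges that simply mirror the legend above.

```python
# Minimal sketch: per-token probability of the true token under a causal LM.
# The checkpoint id and bucket edges are assumptions for illustration only.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-3-1b-pt"  # assumed checkpoint id; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

# Bucket edges matching the page's legend (p < 1e-5, ..., p >= 0.5).
BUCKET_EDGES = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.5]

def bucket(p: float) -> int:
    """Index of the legend bucket that probability p falls into (0 = lowest)."""
    for i, edge in enumerate(BUCKET_EDGES):
        if p < edge:
            return i
    return len(BUCKET_EDGES)  # p >= 0.5

@torch.no_grad()
def true_token_probs(text: str):
    """For every token after the first, the probability the model gave it."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits  # shape (1, seq_len, vocab)
    # Logits at position i predict token i+1, so align them with the targets.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    chosen = log_probs[torch.arange(targets.numel()), targets]
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return [(tok, math.exp(lp.item())) for tok, lp in zip(tokens, chosen)]

for tok, p in true_token_probs("Deep learning works."):
    print(f"{tok!r:>12}  p={p:.1e}  bucket={bucket(p)}")
```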

Deep learning works. Any other facts about this era or about me are relatively unimportant, but sometimes I write about them anyway. Hi, I'm Lovre!

The last few years I have spent fascinated by the insides of large language models. The riches! Such simple structures holding so much complexity! Had I but a thousand years to spend on studying a model as simple as Llama 3 8b, it would not be enough. Of my public output, see my early investigations into in-context learning (my work got cited in a few papers, and the later literature, see the many papers that cite Garg et al., has been quite interesting, but I still feel a lot of mystery in in-context learning; I am still thinking about what exactly would dispel it for me) and a bit of benchmarking of visual LLMs on Tetris.

These days I'm thinking a great deal about how to create language models with more "unusual and diverse" minds, with an emphasis on models' metacognition: synthesizing data (>10B tokens so far), thinking about some truly queer training objectives, etc.

My best guess is that my expected contribution to the project of understanding LLMs is tiny relative to what it could be, were I to have access to significantly more resources (employment at an ambitious lab, more compute, etc.). If you have liked something that I have done and would like to see more of such things, feel free to reach out.

What's new

2025-10-16. A tree showing the diversity of continuations produced by LLMs.

2025-07-30. How has the distribution of papers across arXiv's categories changed over the past 28 years? A bar chart race!