Paper Presentation

Large Language Models as Markov Chains

- By Oussama Zekri, Student Researcher, ENS Paris-Saclay and Imperial College London

Read the Paper

In recent years, large language models (LLMs) have emerged as transformative tools in natural language processing (NLP), demonstrating exceptional performance across a myriad of tasks. Despite their success, the theoretical foundations explaining how these models achieve such impressive capabilities remain inadequately understood. Oussama Zekri's research takes a significant step toward unraveling this mystery by establishing a crucial equivalence between autoregressive language models and Markov chains, thereby providing a novel lens through which to analyze LLMs.

Join us for an insightful webinar where Oussama Zekri presents his innovative research paper titled “Large Language Models as Markov Chains.” In this session, Oussama will delve into several key findings:

  • Equivalence between LLMs and Markov Chains: Oussama establishes an equivalence between generic autoregressive language models with a vocabulary of size T and a context window of size K and Markov chains defined on a finite state space of size O(T^K). This insight paves the way for a deeper understanding of the mechanics behind LLMs (a toy illustration follows this list).
  • Theoretical Analysis: The presentation will highlight several surprising results derived from this view, including the existence of a stationary distribution for the Markov chain that captures an LLM's inference behavior, the rate at which the chain converges to it, and the role of temperature in these dynamics.
  • Generalization Bounds: Oussama will explain the theoretical guarantees regarding pre-training and in-context generalization, enriching our interpretation of LLM performance.
  • Empirical Validation: The discussion will include experimental results showing that recent LLMs conform to the theoretical predictions of the analysis and can effectively learn Markov chains in context.
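
To make the equivalence concrete, here is a minimal Python sketch, not taken from the paper or its code: it treats a toy next-token model with vocabulary size T and context window K as a Markov chain over token windows and estimates its stationary distribution by power iteration. The `lm_logits` stand-in, the parameter values, and the state enumeration are illustrative assumptions standing in for a real LLM.

```python
from itertools import product

import numpy as np

# Illustrative sketch of the LLM-as-Markov-chain view (not the paper's code).
# A next-token model with vocabulary size T and context window K induces a
# Markov chain whose states are token sequences of length at most K,
# i.e. a state space of size O(T^K).

T, K = 3, 2  # toy vocabulary size and context window

def lm_logits(context):
    """Stand-in for an LLM's next-token logits given a context tuple."""
    # Deterministic pseudo-random logits keyed on the context, so the
    # example is reproducible without a real model.
    seed = hash(context) % (2**32)
    return np.random.default_rng(seed).normal(size=T)

def next_token_probs(context, temperature=1.0):
    """Softmax of the logits at a given sampling temperature."""
    z = lm_logits(context) / temperature
    z = z - z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

# States: all sequences of 1..K tokens (shorter contexts are transient;
# the chain settles on full length-K windows).
states = [s for length in range(1, K + 1)
          for s in product(range(T), repeat=length)]
index = {s: i for i, s in enumerate(states)}

# Transition matrix P: from state s, emit token t with probability p[t]
# and slide the window to the last K tokens of s + (t,).
P = np.zeros((len(states), len(states)))
for s in states:
    p = next_token_probs(s)
    for t in range(T):
        nxt = (s + (t,))[-K:]
        P[index[s], index[nxt]] += p[t]

# Stationary distribution pi (satisfying pi @ P = pi) via power iteration.
pi = np.ones(len(states)) / len(states)
for _ in range(1_000):
    pi = pi @ P
print("stationary mass on full-window states:",
      pi[[index[s] for s in states if len(s) == K]].sum())
```

Lowering the `temperature` argument sharpens each row of the transition matrix; effects of this kind on the chain's dynamics are what the convergence-rate results discussed in the talk quantify.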

Meet our Speaker:

Oussama Zekri

Oussama Zekri is a final-year mathematics student at ENS Paris-Saclay and currently an intern at Imperial College London. His research spans applied mathematics and machine learning, with recent work focused on generative models. Oussama has completed internships at Huawei Noah's Ark Lab, Kyoto University's System Optimization Laboratory, and Centre Borelli, ENS Paris-Saclay, contributing to projects on large language models, convex optimization, time series, and optimal transport. He has authored multiple research papers and co-authors a research blog, logB.