
What to Read and Watch about LLM Training and Reasoning?


8 Resources to Better Understand How Modern Models Work and Learn

1. Andrej Karpathy’s 3-hour video “Deep Dive into LLMs like ChatGPT”

A comprehensive high-level overview covering the basics: architecture, fine-tuning, reasoning, and reinforcement learning for LLMs. Perfect as your first deep dive into LLM theory.

There is also a follow-up video on the practical use of LLMs.

2. “Transformers” by 3Blue1Brown.

A visually intuitive video explaining the internal structure of transformers. Essential to conceptually grasp the architecture before exploring modern techniques.
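To make the architecture concrete before (or after) watching: the heart of a transformer layer is scaled dot-product attention. Here is a minimal single-head sketch in plain Python, with the learned query/key/value projection matrices and multi-head machinery left out for clarity:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each output vector is a
    softmax-weighted mix of the value vectors, where the weights
    come from query-key similarity (dot products)."""
    d = len(keys[0])  # key dimension, used for the 1/sqrt(d) scaling
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

Because the weights sum to 1, each output is a convex combination of the values: a query most similar to the first key pulls the output toward the first value vector.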

3. Free NLP course by Hugging Face.

Great illustrations, clear examples—everything you need to practically run and fine-tune models yourself.

Introduction - Hugging Face LLM Course

4. Paper on DeepSeekMath by DeepSeek.

Though not focused on reasoning models per se, this paper thoroughly explains data collection, pretraining, experimentation, and reinforcement learning, providing a solid foundation for understanding how models are trained in industry.

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models


5. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” from Google Research.

One of the pioneering and most influential papers discussing the concept and impact of Chain-of-Thought (CoT). Essential historical foundation with many illustrative examples.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
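The mechanics described in the abstract are simple to sketch: a few-shot CoT prompt prepends worked exemplars, where each answer shows its reasoning steps before the final result. The snippet below is an illustrative helper (the `build_cot_prompt` name is hypothetical; the Roger exemplar is adapted from the paper's examples):

```python
# One chain-of-thought exemplar: the answer includes the intermediate
# reasoning, not just the final number. The paper uses eight such
# exemplars; one is shown here for brevity.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates the
    step-by-step reasoning pattern before stating its answer."""
    return COT_EXEMPLAR + "\nQ: " + question + "\nA:"
```

Contrast this with standard few-shot prompting, where the exemplar would read only "A: The answer is 11." The paper's finding is that including the intermediate steps is what elicits reasoning in sufficiently large models.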

6. DeepSeek-R1 Paper.

Clearly and concisely written, with practical insights. Reading it will give you a better understanding of reasoning than 99.9% of ChatGPT users have.

DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1

7. “Learning to Reason with LLMs,” talk by Noam Brown from OpenAI.

Insightful discussion on reasoning, scaling compute, gaming environments, and how the industry developed reasoning-based models and agents.

8. MIT’s legendary, recently relaunched course “6.S191: Introduction to Deep Learning.”

The course covers NLP, computer vision, LLMs, and applications in medicine, end to end, with both theory and practical implementations using current libraries. It is suitable even for beginners: only basic calculus and matrix multiplication are required; the rest is explained along the way. New lectures are uploaded to YouTube every Monday.