Introduction
Among the notable features of DeepSeek R1 are its open-source model, energy efficiency, cost-effectiveness compared to other large language models, and strong reasoning capabilities.
Architecture of DeepSeek R1
- Mixture-of-Experts (MoE):
DeepSeek R1 employs an MoE architecture. Imagine it as a team of specialized experts, each tackling a specific aspect of a problem. When faced with a query, the model activates only the experts relevant to that input, reducing the compute spent per token and improving efficiency.
- Reinforcement Learning (RL) algorithm:
Unlike many LLMs trained primarily on massive datasets, DeepSeek R1 heavily leverages Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm introduced in the DeepSeekMath paper in 2024. This approach allows the model to learn through trial and error, refining its reasoning strategies based on rewards and feedback. It's similar to teaching a child to solve puzzles by rewarding successful attempts and guiding them toward better approaches.
- Open-Source Philosophy:
DeepSeek R1 is open-source, making its code and architecture accessible to the broader AI community. This promotes collaboration, innovation, and the potential for further advancements in LLM development.
- Supervised Fine-Tuning (SFT):
The model is initially fine-tuned on a curated dataset of high-quality examples, providing a foundation for basic language understanding and reasoning.
- Reasoning-Oriented RL:
The model then enters an intensive RL phase. It's tasked with solving complex reasoning problems, and its performance is evaluated based on accuracy and the clarity of its reasoning steps. This iterative process refines the model's ability to break down problems, generate intermediate steps, and arrive at correct solutions.
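The MoE routing described above can be sketched in a few lines. This is an illustrative top-k router, not DeepSeek R1's actual implementation (the real model uses learned gating with shared experts and many additional refinements); all names and shapes here are our own assumptions.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Illustrative top-k MoE layer: route input x to the k highest-scoring
    experts and return their probability-weighted combination."""
    # Router scores: one logit per expert for this input.
    logits = x @ gate_weights                        # shape (num_experts,)
    top = np.argsort(logits)[-top_k:]                # indices of the top-k experts
    # Softmax over the selected experts' logits only.
    shifted = np.exp(logits[top] - logits[top].max())
    probs = shifted / shifted.sum()
    # Only the chosen experts actually run; the rest stay idle, saving compute.
    return sum(p * (x @ expert_weights[e]) for p, e in zip(probs, top))
```

The key efficiency point is in the last line: however many experts exist, only `top_k` matrix multiplications are performed per input.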
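A distinctive piece of the GRPO training mentioned above is that it needs no learned value critic: for each prompt, a group of candidate answers is sampled, and each answer's reward is normalized against the group's mean and standard deviation to form an advantage. Below is a minimal sketch of that advantage computation only; the full GRPO objective also involves a clipped policy ratio and a KL penalty, which are omitted, and the function name is ours.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: score each sampled answer's reward
    relative to its group's mean, scaled by the group's std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon avoids division by zero
```

For example, in a group where half the sampled answers are correct (reward 1) and half are wrong (reward 0), correct answers get a positive advantage and wrong ones a negative advantage, pushing the policy toward the behavior that earned reward.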
Overview of DeepSeek R1's Performance
Reasoning Tasks:
Below are some benchmarks on which DeepSeek R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks:
- AIME 2024: DeepSeek R1 achieves a score of 79.8% Pass@1, slightly surpassing OpenAI-o1-1217.
- MATH-500: It attains an impressive score of 97.3%, performing on par with OpenAI-o1-1217 and significantly outperforming other models.
- Coding: DeepSeek R1 demonstrates expert-level performance in code competitions, achieving a 2,029 Elo rating on Codeforces, outperforming 96.3% of human participants.
- Engineering: It performs slightly better than DeepSeek-V3, aiding developers in real-world tasks.
- MMLU: 90.8% on MMLU.
- MMLU-Pro: 84.0% on MMLU-Pro.
- GPQA Diamond: 71.5% on GPQA Diamond.
- SimpleQA: Outperforms DeepSeek-V3, excelling in handling fact-based queries.
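The Pass@1 figures above come from sampling-based evaluation. The standard unbiased pass@k estimator commonly used for such benchmarks can be computed as follows; this is a sketch of the general metric, not the benchmarks' official evaluation harness.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total (c of them correct), is correct."""
    if n - c < k:
        return 1.0  # too few wrong samples to fill k slots: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k=1 this reduces to the fraction of sampled answers that are correct, which is what a Pass@1 score like 79.8% reports on average over problems.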
DeepSeek R1 excels in creative writing, general question answering, editing, summarization, and more. Additionally, it demonstrates outstanding performance in tasks requiring long-context understanding:
- AlpacaEval 2.0: Achieves an impressive length-controlled win-rate of 87.6%.
- Arena-Hard: Attains a win-rate of 92.3%.