Meta scientist reveals Llama 3.1 training process; Llama 4 development begins

Meta researcher Thomas Scialom discusses the Llama 3.1 model and its future prospects.

Llama 3.1 Research and Development Approach

How the Parameter Scale Was Determined

  • The choice had to balance multiple factors: scaling laws, training time, and GPU hardware constraints.
  • Meta considered not only its own hardware but also what the broader AI community can run.
  • Quantization has shifted the balance between inference cost and training/fine-tuning cost.
  • Under the available compute and these constraints, 405B emerged as the balance point (a back-of-the-envelope compute estimate follows this list).
  • The goal was to create an open-source model comparable to GPT-4.
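
For a rough sense of what that balance implies: a widely used approximation puts dense-Transformer training compute at about 6 FLOPs per parameter per token. A minimal sketch, using the roughly 15.6T-token training count Meta reported for Llama 3.1:

```python
# Back-of-the-envelope training compute via the common ~6 * N * D rule,
# where N is the parameter count and D the number of training tokens.
# The token count is the ~15.6T figure reported for Llama 3.1.
N = 405e9            # parameters
D = 15.6e12          # training tokens
flops = 6 * N * D
print(f"~{flops:.1e} training FLOPs")  # ~3.8e+25
```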

Revisiting Scaling Laws

  • Classic scaling laws focus on two dimensions: model size and training data volume.
  • The Chinchilla paper emphasized the importance of the total number of training tokens.
  • Meta chose to increase the token count and training duration, deliberately "over-training" the model.
  • This departs from Chinchilla-optimality, which only minimizes training compute; a model trained past that point is stronger for its size, which pays off at inference time (the arithmetic below illustrates the gap).
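
To see how far past compute-optimal this goes, take the common reading of Chinchilla's result as roughly 20 training tokens per parameter (a rule-of-thumb approximation, not the paper's exact fit):

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter is
# compute-optimal; Llama 3.1 deliberately trains well beyond that.
params = 405e9
chinchilla_optimal = 20 * params   # ~8.1e12 tokens (8.1T)
actual_tokens = 15.6e12            # reported Llama 3.1 training tokens
print(f"over-training factor: ~{actual_tokens / chinchilla_optimal:.1f}x")  # ~1.9x
```

The extra training compute is "wasted" by Chinchilla's training-efficiency criterion, but it buys a stronger model at a fixed 405B serving cost.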

Model Architecture

  • Little changed architecturally from Llama 2; the gains came mainly from scaling up data quantity and quality.
  • Future improvements may involve deeper architectural changes, not necessarily limited to the Transformer.
  • The current Transformer architecture still lacks flexibility.
  • Meta is exploring mixture-of-experts (MoE) architectures (a minimal sketch follows this list).
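
Meta has not described a specific MoE design here, but the general idea is a feed-forward layer whose tokens are each routed to a small subset of expert subnetworks. A minimal top-2-routing sketch in PyTorch (dimensions, expert count, and routing scheme are all illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-2 mixture-of-experts feed-forward layer (illustrative,
    not Meta's implementation)."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                           nn.Linear(d_ff, d_model)) for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts)  # token -> expert logits
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); each token activates only top_k experts
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e  # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer(d_model=64, d_ff=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

The appeal over a dense layer is that total parameter count grows with the number of experts while per-token compute stays roughly constant.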

On Synthetic Data

  • The public internet contains large amounts of low-quality text.
  • Meta used Llama itself as a classifier to filter for high-quality tokens (see the filtering sketch after this list).
  • Llama 3's post-training used entirely synthetic data generated by Llama 2.
  • Scialom is optimistic about the prospects of synthetic data.
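
The exact prompt and setup for this filtering are not public; a minimal sketch of the idea, where `llm` is a hypothetical stand-in for any prompt-to-completion callable:

```python
# Illustrative LLM-as-quality-classifier filter. The prompt wording and
# the `llm` interface are assumptions, not Meta's actual pipeline.
QUALITY_PROMPT = (
    "Is the following text high-quality, educational prose? "
    "Answer with a single word: GOOD or BAD.\n\nText:\n{doc}"
)

def filter_corpus(docs: list[str], llm) -> list[str]:
    """Keep only documents the model classifies as high quality."""
    kept = []
    for doc in docs:
        verdict = llm(QUALITY_PROMPT.format(doc=doc[:4000]))  # truncate long docs
        if verdict.strip().upper().startswith("GOOD"):
            kept.append(doc)
    return kept
```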

LLM Evaluation and Improvement

  • Raising benchmark scores through post-training carries a risk of overfitting to the benchmarks.
  • Evaluating language models remains a hard problem.
  • The team tried a variety of evaluation methods, such as reward models and model-as-a-judge (a pairwise-judge sketch follows this list).
  • Multi-round RLHF, comparing each new model version against the previous one, is a good way to measure progress.
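
A minimal model-as-a-judge sketch for pairwise comparison (the prompt and the model/judge callables are illustrative assumptions, not a specific Meta setup):

```python
# Pairwise model-as-a-judge win rate. All callables map a prompt string
# to a completion string; the prompt format is an illustrative assumption.
JUDGE_PROMPT = (
    "You are comparing two answers to the same question.\n\n"
    "Question: {q}\nAnswer A: {a}\nAnswer B: {b}\n\n"
    'Which answer is better? Reply with exactly "A" or "B".'
)

def pairwise_winrate(questions, model_a, model_b, judge) -> float:
    """Fraction of questions on which the judge prefers model A's answer."""
    wins = 0
    for q in questions:
        verdict = judge(JUDGE_PROMPT.format(q=q, a=model_a(q), b=model_b(q)))
        wins += verdict.strip().upper().startswith("A")
    return wins / len(questions)
```

In practice one would also swap the A/B positions and average the results, since judge models are known to show position bias.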

Llama 4 and Agents

  • Meta began training the Llama 4 model in June.
  • The focus may be on agent technology.
  • Meta has already done some work on tool use, such as Toolformer.
  • A strong instruction-following model is the foundation for expanding agent capabilities (a minimal agent loop is sketched after this list).
  • Meta's GAIA benchmark evaluates models on their ability to solve real-world problems.
  • Agent capabilities are closely tied to the underlying model's level of intelligence.
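
As a rough illustration of the kind of loop an instruction-following model enables (the TOOL[name](args) convention and the `llm` callable are invented for this sketch; Toolformer's actual training scheme differs):

```python
# Minimal tool-use agent loop: the model may emit TOOL[name](args) calls,
# whose results are fed back as observations until it answers directly.
import re

TOOLS = {
    # toy calculator with builtins stripped; real tools need real sandboxing
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(task: str, llm, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        call = re.search(r"TOOL\[(\w+)\]\((.*?)\)", reply)
        if call is None:
            return reply  # no tool call: treat the reply as the final answer
        name, args = call.groups()
        result = TOOLS.get(name, lambda _: "unknown tool")(args)
        transcript += f"{reply}\nObservation: {result}\n"
    return "max steps reached"
```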
