Llama 3.1 Research and Development Approach
How to Determine Parameter Scale
- Multiple factors had to be weighed, including scaling laws, training time, and GPU hardware constraints
- The decision considered not only Meta's own hardware but also what the broader AI community can run
- Quantization has shifted the cost balance between inference and training/fine-tuning
- Under the available compute and these constraints, 405B parameters emerged as the balance point (see the rough compute estimate after this list)
- The goal was to build an open-source model comparable to GPT-4
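
As a back-of-the-envelope illustration of the compute arithmetic behind such a decision (not Meta's actual planning figures), here is a minimal sketch using the common C ≈ 6·N·D approximation for dense-Transformer training FLOPs, with the publicly reported 405B parameters and roughly 15.6T training tokens; the H100 throughput and utilization numbers are assumptions:

```python
# Rough training-compute estimate via the common approximation
# C ≈ 6 * N * D (FLOPs), N = parameter count, D = training tokens.
# Illustrative numbers only; not Meta's internal planning figures.

N = 405e9     # parameters (Llama 3.1 405B)
D = 15.6e12   # training tokens reported for Llama 3.1

flops = 6 * N * D
print(f"Training compute: {flops:.2e} FLOPs")  # ~3.79e+25

# Translate into GPU time under an assumed sustained throughput.
bf16_peak = 989e12  # H100 dense BF16 peak, FLOP/s (approximate)
mfu = 0.40          # assumed model FLOPs utilization
gpu_hours = flops / (bf16_peak * mfu) / 3600
print(f"~{gpu_hours:.2e} H100 GPU-hours")      # ~2.7e+07
```
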
Revisiting Scaling Law
- Traditional scaling laws focus on two dimensions: model parameter count and the amount of training
- Chinchilla emphasized the importance of the total number of training tokens
- Meta chose to increase the token count and training duration, deliberately "over-training" the model
- This departs from Chinchilla's compute-optimal recipe, but yields better performance at inference time for a model of this size (see the comparison below)
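
A minimal sketch of that trade-off, assuming the common Chinchilla rule of thumb of roughly 20 training tokens per parameter and the publicly reported Llama 3.1 figures:

```python
# Compare the Chinchilla-optimal token budget (D ~ 20 * N) with the
# "over-trained" recipe described above. Illustrative sketch only.

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return 20 * n_params

N = 405e9           # Llama 3.1 405B parameters
D_actual = 15.6e12  # tokens reportedly trained on

D_opt = chinchilla_optimal_tokens(N)
print(f"Chinchilla-optimal tokens: {D_opt:.1e}")             # ~8.1e+12
print(f"Actual tokens:             {D_actual:.1e}")          # ~1.6e+13
print(f"Over-training factor:      {D_actual / D_opt:.1f}x") # ~1.9x
```
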
Model Architecture
- Little architectural change compared to Llama 2; the gains come mainly from expanded data scale and improved data quality
- Future improvements may involve larger architectural changes, not limited to the Transformer
- The current Transformer architecture still lacks flexibility
- Mixture-of-Experts (MoE) architectures are being explored (a minimal sketch follows this list)
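
For orientation, a minimal sketch of a top-2 gated MoE feed-forward layer in PyTorch; this illustrates the general routing idea only and is not the architecture of any Llama model (all dimensions, expert counts, and the SiLU activation are assumptions):

```python
# Minimal sketch of a top-2 gated Mixture-of-Experts (MoE) feed-forward
# layer. Hypothetical illustration of the general idea, not Llama's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k experts
        # and mix expert outputs with the normalized gate weights.
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token expert choice
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# Usage: drop in where a dense FFN block would sit.
layer = MoELayer(d_model=512, d_ff=2048)
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

The appeal over a dense FFN is that only top_k of the n_experts run per token, so parameter count grows without a proportional increase in per-token compute.
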
On Synthetic Data
- Large amounts of low-quality text exist on the public internet
- Llama is used as a classifier to filter for high-quality tokens (sketched after this list)
- Llama 3's post-training relies entirely on synthetic data generated with Llama 2
- Optimistic about the prospects of synthetic data
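
A hedged sketch of what LLM-based quality filtering can look like; `llm_complete`, the prompt, and the 0-5 rating scale are all hypothetical stand-ins, not Meta's pipeline:

```python
# Sketch of using an LLM as a data-quality classifier. `llm_complete`
# is a hypothetical stand-in for whatever API serves the model.

def llm_complete(prompt: str) -> str:
    """Hypothetical call into an LLM inference endpoint."""
    raise NotImplementedError("wire up your own model here")

QUALITY_PROMPT = (
    "Rate the educational quality of the following web text "
    "from 0 (spam) to 5 (excellent). Answer with a single digit.\n\n"
    "Text:\n{text}\n\nRating:"
)

def keep_document(text: str, threshold: int = 3) -> bool:
    """Keep a document only if the model rates it at or above threshold."""
    answer = llm_complete(QUALITY_PROMPT.format(text=text[:2000]))
    digits = [c for c in answer if c.isdigit()]
    return bool(digits) and int(digits[0]) >= threshold

corpus = ["...raw web documents..."]
filtered = [doc for doc in corpus if keep_document(doc)]
```
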
LLM Evaluation and Improvement
- Improving benchmark scores through post-training carries a risk of overfitting to the benchmarks
- Language model evaluation is a difficult problem
- Various evaluation methods have been tried, such as reward models and model-as-a-judge (a judging sketch follows this list)
- Multi-round RLHF is a good way to compare models against each other
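
A sketch of pairwise model-as-a-judge comparison under stated assumptions: `llm_complete` and the prompt format are hypothetical, and the double query with swapped positions is one common mitigation for a judge's order bias:

```python
# Pairwise model-as-a-judge evaluation, sketched: a judge model picks
# the better of two candidate answers to the same question.

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("wire up a judge model here")

JUDGE_PROMPT = (
    "Question:\n{q}\n\nAnswer A:\n{a}\n\nAnswer B:\n{b}\n\n"
    "Which answer is better? Reply with exactly 'A' or 'B'."
)

def judge(question: str, ans_a: str, ans_b: str) -> str:
    # Query twice with the answers swapped to reduce position bias.
    first = llm_complete(JUDGE_PROMPT.format(q=question, a=ans_a, b=ans_b)).strip()
    second = llm_complete(JUDGE_PROMPT.format(q=question, a=ans_b, b=ans_a)).strip()
    if first == "A" and second == "B":
        return "model_a"
    if first == "B" and second == "A":
        return "model_b"
    return "tie"  # the judge disagreed with itself
```
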
Llama 4 and Agents
- Meta began training the Llama 4 model in June
- The focus may be on agent technology
- Some work has already been done on agent tooling, such as Toolformer (a Toolformer-style sketch follows this list)
- Strong instruction-following models are the foundation for expanding agent capabilities
- Meta's released GAIA benchmark evaluates the ability to solve real-world problems
- Agent capabilities are closely tied to the underlying model's level of intelligence
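
To make the Toolformer idea concrete: the model emits inline calls such as `[Calculator(6*7)]`, and a wrapper executes them and splices the results back into the text. A toy sketch, with a hypothetical tool registry that is not the paper's implementation:

```python
# Toolformer-style tool use, simplified: find [Tool(args)] markers in
# model output, run the tool, and substitute the result into the text.
import re

def calculator(expr: str) -> str:
    # Restrict eval to arithmetic characters for safety in this toy example.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
        return "ERR"
    return str(eval(expr))

TOOLS = {"Calculator": calculator}  # hypothetical tool registry
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def run_tool_calls(text: str) -> str:
    """Replace each [Tool(args)] marker with the tool's output."""
    def execute(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        return tool(args) if tool else match.group(0)
    return CALL.sub(execute, text)

print(run_tool_calls("The answer is [Calculator(6*7)]."))  # The answer is 42.
```
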