Meta has released the Llama 3.1 model family in three sizes: 8B, 70B, and 405B parameters. The main features are:
- Maximum context length extended to 128K tokens
- Multilingual support
- Excellent code generation performance
- Complex reasoning capabilities
From benchmark results:
- Llama 3.1 405B surpasses GPT-4 (0125) and is competitive with GPT-4o and Claude 3.5
- Llama 3.1 8B outperforms Gemma 2 9B IT and Mistral 7B Instruct
- Llama 3.1 70B outperforms GPT-3.5 Turbo
Training details for Llama 3.1 405B:
- Trained on over 15 trillion tokens
- Trained on over 16,000 H100 GPUs
- Used an iterative post-training procedure combining supervised fine-tuning (SFT) and direct preference optimization (DPO)
- Improved both the quantity and quality of pre-training and post-training data
- Quantized for inference from 16-bit (BF16) to 8-bit (FP8) precision, reducing compute requirements
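Meta's FP8 scheme is more involved than this, but as a rough illustration of how 8-bit quantization trades a small amount of precision for large memory and compute savings, here is a minimal symmetric per-tensor int8 sketch. The function names and the per-tensor scaling scheme are illustrative assumptions, not Meta's actual implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127].
    (Illustrative sketch; Llama 3.1 405B inference uses FP8, not int8.)"""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Each weight now occupies 1 byte instead of 2, and the
# reconstruction error is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

The same idea underlies lower-precision inference in general: store and multiply in a cheap format, rescale back to floating point where needed.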
Other highlights:
- Openly released model weights and code, free to use
- The license allows users to fine-tune the models, distill them into other models, and deploy them anywhere
- Offers Llama Stack API for easy integration
- Supports orchestration of multiple components, including calling external tools
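Llama Stack's actual interface is defined by Meta; as a generic illustration of the tool-calling pattern it enables (the model emits a structured call, the host executes it and feeds the result back), here is a minimal dispatcher sketch. The tool names and the JSON call format are assumptions for illustration, not the Llama Stack specification:

```python
import json

# Hypothetical tool registry: functions the host exposes to the model.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
    "add": lambda a, b: a + b,
}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call of the (assumed) form
    {"name": ..., "arguments": {...}} and return the result as JSON text,
    which would be appended to the conversation for the model to read."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    result = fn(**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# Simulated model output requesting a tool call:
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

In a real deployment the loop runs until the model stops requesting tools and produces a final answer; the orchestration layer is what Llama Stack aims to standardize.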
Meta no longer prohibits using Llama 3 to improve other models, reflecting a more open stance. This release marks the first time openly available large models have matched closed-source models in performance, ushering in a new era led by open models.