Llama 3.1 405B: An Open-Source Giant That Surpasses GPT-4 and Ushers in a New Era

Meta has just launched the latest version of the Llama model, Llama 3.1, as planned.

The release includes versions at 8B, 70B, and 405B parameters. The main features are:

  • Maximum context length increased to 128K
  • Multilingual support
  • Excellent code generation performance
  • Complex reasoning capabilities

From benchmark results:

  • Llama 3.1 405B surpasses GPT-4 (0125) and is competitive with GPT-4o and Claude 3.5 Sonnet
  • Llama 3.1 8B outperforms Gemma 2 9B IT and Mistral 7B Instruct
  • Llama 3.1 70B outperforms GPT-3.5 Turbo

Training details for Llama 3.1 405B:

  • Trained on over 15 trillion tokens
  • Trained on over 16,000 H100 GPUs
  • Used an iterative post-training procedure combining supervised fine-tuning and direct preference optimization (a loss sketch follows this list)
  • Improved the quantity and quality of both pre-training and post-training data
  • Quantized from 16-bit (BF16) to 8-bit (FP8) precision for inference, reducing compute requirements (see the second sketch below)
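As a rough illustration of the direct preference optimization (DPO) objective used in post-training, here is a minimal PyTorch sketch of the per-batch loss. This is not Meta's training code; it assumes you have already computed sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    All inputs are per-example sequence log-probabilities of shape (batch,).
    `beta` controls how strongly the policy is kept close to the reference.
    """
    # Log-ratio of policy vs. reference for preferred and dispreferred answers
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```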

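The exact FP8 inference recipe is described in the paper; the following is only a simplified, hypothetical sketch of the basic idea (per-tensor scaling from BF16 into the 8-bit e4m3 range, keeping the scale for dequantization). It assumes a recent PyTorch build with the `float8_e4m3fn` dtype.

```python
import torch

def quantize_fp8_e4m3(weight_bf16: torch.Tensor):
    """Toy per-tensor FP8 (e4m3) quantization: scale weights into the
    representable range, cast down, and return the scale for later use."""
    fp8_max = 448.0  # largest finite value in the e4m3 format
    scale = weight_bf16.abs().max().float() / fp8_max
    w_fp8 = (weight_bf16.float() / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate 16-bit weight for matmuls without FP8 kernels."""
    return (w_fp8.float() * scale).to(torch.bfloat16)
```
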
Other highlights:

  • Provides open/free model weights and code
  • The license allows users to fine-tune the models, distill them into smaller models, and deploy them anywhere
  • Offers the Llama Stack API for easy integration
  • Supports orchestration of multiple components, including calls to external tools (a minimal inference sketch follows this list)

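The Llama Stack API itself is a separate specification; as a placeholder, here is a minimal sketch of running the instruct model through Hugging Face Transformers. The model ID and generation settings are illustrative assumptions, not part of Meta's announcement.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name on the Hugging Face hub (requires accepting the license)
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Llama 3.1 release in one sentence."},
]
# Build the chat prompt with the model's template, then generate a reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
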
Meta no longer prohibits using Llama 3.1 outputs to improve other models, reflecting a more open stance. This release marks the first time an open-source large model has matched closed-source models in performance, opening a new era led by open source.

Resources:

  • Model download link
  • 92-page training report paper