Meta has released the Llama 3.1 model family in three sizes: 8B, 70B, and 405B parameters. The main features are:
- Maximum context length extended to 128K tokens
- Multilingual support
- Excellent code generation performance
- Complex reasoning capabilities
From benchmark results:
- Llama 3.1 405B surpasses GPT-4 (0125) and is competitive with GPT-4o and Claude 3.5
- Llama 3.1 8B outperforms Gemma 2 9B IT and Mistral 7B Instruct
- Llama 3.1 70B outperforms GPT-3.5 Turbo
Training details for Llama 3.1 405B:
- Trained on over 15 trillion tokens
- Trained on over 16,000 H100 GPUs
- Used an iterative post-training procedure combining supervised fine-tuning (SFT) and direct preference optimization (DPO)
- Improved both the quantity and quality of pre-training and post-training data
- Quantized for inference from 16-bit (BF16) to 8-bit (FP8) precision, reducing compute requirements
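Meta's FP8 scheme is more involved than this, but as a rough illustration of how 8-bit quantization trades a small amount of precision for large memory and compute savings, here is a minimal symmetric per-tensor int8 sketch. The function names and the per-tensor scaling scheme are illustrative assumptions, not Meta's actual implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127].
    (Illustrative sketch; Llama 3.1 405B inference uses FP8, not int8.)"""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Each weight now occupies 1 byte instead of 2, and the
# reconstruction error is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

The same idea underlies lower-precision inference in general: store and multiply in a cheap format, rescale back to floating point where needed.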
Other highlights:
- Openly released model weights and code, free to use
- The license allows users to fine-tune the models, distill them into other models, and deploy them anywhere
- Offers Llama Stack API for easy integration
- Supports orchestration of multiple components, including calling external tools
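Llama Stack's actual interface is defined by Meta; as a generic illustration of the tool-calling pattern it enables (the model emits a structured call, the host executes it and feeds the result back), here is a minimal dispatcher sketch. The tool names and the JSON call format are assumptions for illustration, not the Llama Stack specification:

```python
import json

# Hypothetical tool registry: functions the host exposes to the model.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
    "add": lambda a, b: a + b,
}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call of the (assumed) form
    {"name": ..., "arguments": {...}} and return the result as JSON text,
    which would be appended to the conversation for the model to read."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    result = fn(**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# Simulated model output requesting a tool call:
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

In a real deployment the loop runs until the model stops requesting tools and produces a final answer; the orchestration layer is what Llama Stack aims to standardize.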
Meta no longer prohibits using Llama 3 to improve other models, reflecting a more open stance. This release marks the first time openly available large models have matched closed-source models in performance, ushering in a new era led by open models.