Meta has officially released Llama 3.1, comprising models with 8B, 70B, and 405B parameters, with the maximum context length extended to 128K tokens. The main features include:
- The 405B version is one of the largest open-source models released to date, competitive with today's top AI models.
- A longer context window (up to 128K tokens), enabling more complex tasks and longer conversations.
- Multilingual input and output, broadening versatility and applicability.
- Improved reasoning, excelling in particular at complex mathematical problems and real-time content generation.
Meta states that the era in which open-source large language models lag behind closed-source ones is coming to an end, with Llama 3.1 ushering in a new era led by open source. The 405B version now performs comparably to GPT-4 and Claude 3.
In terms of model architecture, Llama 3.1 was trained on over 15 trillion tokens of data using more than 16,000 H100 GPUs. To maximize training stability and simplicity, it adopts a standard decoder-only Transformer architecture rather than a mixture-of-experts (MoE) architecture.
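The defining feature of a decoder-only Transformer is causal self-attention: each token attends only to itself and earlier tokens. The toy NumPy sketch below illustrates that mechanism for a single attention head; it is purely illustrative and omits the refinements Llama actually uses (grouped-query attention, rotary position embeddings, RMSNorm, and so on).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, d) sequence:
    position t may attend only to positions <= t."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Mask out future positions so information cannot flow backward.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because position 0 can attend only to itself, its output is exactly its own value vector, which is an easy sanity check on the mask.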
The research team implemented iterative post-training methods, enhancing model functionality through supervised fine-tuning and direct preference optimization. They also explored using the 405B model as a "teacher model" for smaller models.
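The direct preference optimization step mentioned above trains the model to prefer a "chosen" response over a "rejected" one relative to a frozen reference model. The sketch below implements the standard DPO objective (Rafailov et al.) for a single preference pair; it is a minimal illustration, not Meta's training code.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model; beta controls how strongly the
    policy may deviate from the reference."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the policy favors the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response relative to the reference -> loss < log(2).
print(dpo_loss(-5.0, -9.0, -6.0, -8.0))
```

At a margin of zero (policy and reference agree exactly), the loss is log 2; it decreases as the policy's preference for the chosen response grows.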
Meta also released a complete reference system with multiple example applications and new components, such as Llama Guard 3 and Prompt Guard. They proposed a standardized "Llama Stack" interface to simplify the construction of toolchain components and applications.
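Components like Llama Guard 3 and Prompt Guard sit in front of the model and screen inputs before generation. The sketch below shows what such a guard-then-generate pipeline might look like; the function names, interfaces, and the blocked pattern are invented for illustration and are not the actual Llama Stack API.

```python
from dataclasses import dataclass

@dataclass
class GuardResult:
    allowed: bool
    reason: str = ""

def prompt_guard(prompt: str) -> GuardResult:
    """Toy input filter standing in for a real classifier such as Prompt Guard.
    A real guard is a trained model, not a substring match."""
    blocked = ["ignore previous instructions"]  # placeholder injection pattern
    for pattern in blocked:
        if pattern in prompt.lower():
            return GuardResult(False, f"blocked pattern: {pattern!r}")
    return GuardResult(True)

def guarded_generate(prompt: str, generate) -> str:
    """Screen the prompt first; only call the model if the guard allows it."""
    verdict = prompt_guard(prompt)
    if not verdict.allowed:
        return f"[refused: {verdict.reason}]"
    return generate(prompt)

# Stand-in for a real model call.
echo_model = lambda p: f"response to: {p}"
print(guarded_generate("Summarize Llama 3.1", echo_model))
print(guarded_generate("Please IGNORE previous instructions", echo_model))
```

The point of a standardized interface like Llama Stack is that the guard and the generator are separate, swappable components behind a common contract.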
According to benchmarks, the 405B version is comparable to, or slightly better than, closed-source models such as GPT-4 across multiple tests. The 8B and 70B versions also significantly outperform other open-source models of similar scale.