Groundbreaking Release: Llama 3.1 Open-Source Large Model Leads New Era of AI for All

Trained on 15 trillion tokens using 16,000 H100 GPUs.

01. 405B Open-Source Model Benchmarks Against GPT-4o, 25 Partners Ready

Meta evaluated performance on over 150 benchmark datasets. Llama 3.1 405B is comparable to GPT-4o, Claude 3.5 Sonnet, and Gemini Ultra across a range of tasks, including general knowledge, steerability, mathematics, tool use, and multilingual translation.

In human evaluations on real-world scenarios, Llama 3.1 405B was compared with GPT-4o and Claude 3.5 Sonnet and outperformed both overall.

The upgraded Llama 3.1 8B and 70B models also outperform models of similar size. These smaller models support the same 128K-token context window, multilingual capabilities, stronger reasoning, and state-of-the-art tool use, enabling more advanced applications.

Meta also updated its license so that, for the first time, developers can use the outputs of Llama models, including the 405B model, to improve other models.

Meanwhile, Meta's open-source ecosystem has expanded further, with more than 25 partners offering services for the new Llama 3.1 models at launch.

Among them, Amazon Web Services, Databricks, and NVIDIA are rolling out full suites of services to support developers in fine-tuning and training their own models. AI chip startup Groq and others have built low-latency, low-cost inference services for all of the newly released models.

These models will also be available on major cloud platforms such as Amazon Web Services, Microsoft Azure, Google Cloud, and Oracle.

Companies like Scale AI, Dell, and Deloitte are ready to help enterprises adopt Llama models and train custom models using their own data.

Llama 3.1 405B is not only the strongest open-source model but also has the potential to become the strongest model overall, further narrowing the gap between open-source and closed-source models.

02. Complete Optimization of Training Stack, Focus on Model Scalability

To train models based on 15 trillion tokens while achieving the desired effects for researchers in a reasonable timeframe, Meta fully optimized its training stack.

To address these challenges, Meta chose strategies that keep the model development process scalable and straightforward:

  1. Researchers chose the standard decoder-only Transformer model architecture with minor adjustments, rather than adopting MoE (Mixture of Experts) models, to maximize training stability.

  2. Researchers adopted an iterative post-training procedure, using supervised fine-tuning and direct preference optimization in each round. This makes it possible to generate the highest-quality synthetic data in each round and to improve performance on each capability.
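The direct preference optimization step mentioned in item 2 can be illustrated with the standard DPO loss for a single preference pair. This is a minimal sketch, not Meta's training code: the function name, the beta value, and the example log-probabilities are all illustrative assumptions, and a real implementation would work on batched tensors in a deep learning framework.

```python
import math

def _neg_log_sigmoid(x: float) -> float:
    # -log(sigmoid(x)), written so that exp() never overflows.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model. beta=0.1 is an assumed value.
    """
    # How much more the policy favors each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy shifts toward the chosen response
    # and away from the rejected one.
    return _neg_log_sigmoid(beta * (chosen_shift - rejected_shift))

# Hypothetical log-probabilities: the policy already prefers the chosen answer.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy and the reference agree exactly, the loss equals log 2; it shrinks as the policy learns to rank the preferred response higher.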

Compared to previous Llama models, Meta improved both the quantity and quality of the data used for pre-training and post-training. These improvements include more careful preprocessing and curation pipelines for pre-training data, and more rigorous quality assurance and filtering methods for post-training data.

As scaling laws for large language models predict, Meta's new flagship model outperforms smaller models trained using the same strategy. Meta also used the 405B parameter model to improve the post-training quality of its smaller models.
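To give a sense of the scale involved, the training compute can be roughly estimated from the figures in this article. The sketch below uses the common rule of thumb of about 6 FLOPs per parameter per token for dense Transformers. The per-GPU peak throughput and the utilization rate are assumptions made for illustration, not figures reported here.

```python
PARAMS = 405e9        # model parameters (from the article)
TOKENS = 15e12        # training tokens (from the article)
NUM_GPUS = 16_000     # H100 GPUs (from the article)

# Assumptions, not from the article:
PEAK_FLOPS_PER_GPU = 989e12   # approximate dense BF16 peak of one H100 SXM
UTILIZATION = 0.40            # assumed fraction of peak actually sustained

def training_flops(params: float, tokens: float) -> float:
    # Rule of thumb: forward plus backward pass costs ~6 FLOPs per parameter per token.
    return 6 * params * tokens

def training_days(total_flops: float, num_gpus: int,
                  peak_flops: float, utilization: float) -> float:
    seconds = total_flops / (num_gpus * peak_flops * utilization)
    return seconds / 86_400  # seconds per day

total = training_flops(PARAMS, TOKENS)
days = training_days(total, NUM_GPUS, PEAK_FLOPS_PER_GPU, UTILIZATION)
print(f"{total:.2e} FLOPs, about {days:.0f} days on {NUM_GPUS} GPUs")
```

Under these assumptions the estimate comes to roughly 3.6 x 10^25 FLOPs and on the order of two months of continuous training. The real figure depends on the achieved utilization, restarts, and any additional training stages.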

To support large-scale inference with the 405B parameter model, researchers quantized the model from 16-bit (BF16) to 8-bit (FP8) precision, reducing the compute and memory required and allowing the model to run within a single server node.
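The effect of this quantization on memory can be checked with simple arithmetic: BF16 stores each weight in 2 bytes, FP8 in 1 byte. The sketch below counts weight storage only. The node configuration (8 GPUs with 80 GB each) is an assumption for illustration, since the article does not specify the server.

```python
PARAMS = 405e9                           # model parameters (from the article)
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # storage size of each number format

# Assumption, not from the article: one node with 8 GPUs of 80 GB each.
GPUS_PER_NODE = 8
GB_PER_GPU = 80

def weight_gb(params: float, fmt: str) -> float:
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9

node_gb = GPUS_PER_NODE * GB_PER_GPU
for fmt in ("bf16", "fp8"):
    gb = weight_gb(PARAMS, fmt)
    verdict = "fits" if gb < node_gb else "does not fit"
    print(f"{fmt}: {gb:.0f} GB of weights, {verdict} in a {node_gb} GB node")
```

Under this assumption the BF16 weights (810 GB) exceed the node's 640 GB, while the FP8 weights (405 GB) fit with room to spare. The key-value cache and activations need additional memory on top of the weights.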

For instruction and chat fine-tuning, researchers produced the final model through several rounds of alignment on top of the pre-trained model. Each round involved supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO). Synthetic data generation produced the vast majority of SFT examples, and researchers iterated several times to generate progressively higher-quality synthetic data across all capabilities.

Additionally, Meta employed various data processing techniques to filter this synthetic data down to the highest-quality examples, which makes it possible to scale the volume of fine-tuning data across capabilities.

Researchers also carefully balanced the data to produce a model that is high quality across all capabilities. For example, the model maintains its quality on short-context benchmarks even as its context length is extended to 128K.

Furthermore, Meta announced a broader Llama system. Beyond the Llama models themselves, this system coordinates multiple components, including calls to external tools, to help developers build custom products that are more capable than the base models alone.

The Llama system will include a series of new components, among them new open-source safety tools such as Llama Guard 3 (a multilingual safety model) and Prompt Guard (a prompt injection filter). To connect these disparate components, Meta also released a request for comments on the Llama Stack API, a standard interface intended to make it easier for third-party projects to use Llama models.
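How such components might be coordinated can be sketched as a simple pipeline: screen the input, call the model, then screen the output. The sketch below does not use the actual Llama Guard 3, Prompt Guard, or Llama Stack interfaces. Every function name and the keyword checks are hypothetical placeholders for real classifiers and a real model.

```python
from typing import Callable

def guarded_chat(user_message: str,
                 is_prompt_injection: Callable[[str], bool],
                 generate: Callable[[str], str],
                 is_unsafe: Callable[[str], bool],
                 refusal: str = "Sorry, I can't help with that.") -> str:
    """Run a model call between an input filter and an output filter."""
    # Step 1: screen the incoming message (the role a prompt injection filter plays).
    if is_prompt_injection(user_message):
        return refusal
    # Step 2: only screened input reaches the model.
    reply = generate(user_message)
    # Step 3: screen the model's reply (the role a safety model plays).
    if is_unsafe(reply):
        return refusal
    return reply

# Toy stand-ins (hypothetical): real systems would call trained classifiers.
def toy_injection_filter(text: str) -> bool:
    return "ignore previous instructions" in text.lower()

def toy_model(text: str) -> str:
    return f"Echo: {text}"

def toy_safety_filter(text: str) -> bool:
    return "forbidden" in text.lower()

print(guarded_chat("Hello there", toy_injection_filter, toy_model, toy_safety_filter))
```

Keeping each filter behind a plain callable means a component can be swapped without changing the surrounding code, which is the kind of interchangeability a standard interface aims to provide.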

For ordinary developers, using models at the 405B scale remains a challenge, requiring significant computational resources and expertise.

With the Llama system, Meta's view is that generative AI development is about more than prompting a model. Everyone should be able to use the 405B model for a wider range of tasks, including real-time and batch inference, supervised fine-tuning, evaluating models for specific applications, continual pre-training, retrieval-augmented generation (RAG), function calling, and synthetic data generation.

This is the largest model Meta has launched to date. More device-friendly sizes, more modalities, and agent-level updates are expected in the future.

03. 405B Large Model Dramatically Improves Meta AI, Quest Smart Voice Assistant Upgraded

Many of Meta's products, such as WhatsApp and the Meta AI chatbot, have now started using Llama 3.1 405B.

Meta AI now supports seven new languages, and Meta has launched a batch of new Meta AI creative tools, mainly focused on visual generation, mathematics, and coding.

Starting with visual generation, Meta AI introduced the "Imagine Me" image generation feature. Users type "Imagine me" in a Meta AI chat followed by a prompt, such as "Imagine me as a member of royalty" or "Imagine me in a surrealist painting," to generate images that they can share with friends and family.