Alleged Llama 3.1 Leak: 405 Billion Parameter Open-Source Model Surpassing GPT-4 Emerges

The advantage of keeping technology proprietary is shrinking. As open-source alternatives mature and spread, closed systems once treated as competitive barriers are coming under pressure, and moat strategies built on closed technology are becoming harder to sustain. Companies will need to rethink how they stay competitive in an open ecosystem.

Llama 3.1 has reportedly leaked, including benchmark results for the 8B, 70B, and 405B parameter models. Even the 70B version reportedly outperforms GPT-4o on several benchmarks; if the numbers hold up, it would mark the first time an open-source model has surpassed closed-source models such as GPT-4o and Claude 3.5 Sonnet across multiple benchmarks.

Key details from the leaked model card:

  • Trained on 15T+ tokens of publicly available data up to December 2023
  • Fine-tuning data includes public instruction datasets and 15 million synthetic samples
  • Supports English, French, German, Hindi, Italian, Portuguese, Spanish and Thai

The models reportedly have a 128K-token context length and use grouped-query attention (GQA) for improved inference scalability.
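GQA matters at long context lengths because it shares each key/value head across a group of query heads, shrinking the KV cache that dominates inference memory. Below is a minimal PyTorch sketch of the mechanism; the head counts are illustrative and are not the leaked Llama 3.1 configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, seq, n_q_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    n_q_heads, n_kv_heads = q.shape[2], k.shape[2]
    group = n_q_heads // n_kv_heads
    # Each KV head serves `group` query heads, so the KV cache is
    # n_kv_heads / n_q_heads the size of standard multi-head attention.
    k = k.repeat_interleave(group, dim=2)
    v = v.repeat_interleave(group, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (batch, heads, seq, dim)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2)  # back to (batch, seq, heads, dim)

batch, seq, head_dim = 1, 16, 64
q = torch.randn(batch, seq, 8, head_dim)  # 8 query heads (illustrative)
k = torch.randn(batch, seq, 2, head_dim)  # 2 shared KV heads (illustrative)
v = torch.randn(batch, seq, 2, head_dim)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 8, 64])
```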

Intended uses include multilingual commercial applications and research. The instruction-tuned models are optimized for assistant-style chat, while the pre-trained models can be adapted to a range of natural language generation tasks.
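If the weights ship through the usual channels, assistant-style usage would look roughly like the Hugging Face transformers sketch below. The model identifier is a placeholder; the leak does not confirm final repository names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # hypothetical repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Résume l'architecture de ce modèle."},
]
# apply_chat_template wraps the conversation in the model's chat markup.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```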

Training infrastructure:

  • Custom training library and Meta's GPU clusters
  • 39.3M GPU hours on H100-80GB hardware
  • Estimated 11,390 tons CO2e location-based emissions (0 tons market-based, due to Meta's renewable energy use); a back-of-envelope check of these figures follows
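The two reported numbers can be sanity-checked against each other. Assuming each H100 draws its full 700 W TDP (an assumption layered on the excerpt above, not a figure from it), the GPU-hour total implies roughly 27.5 GWh of compute energy, and the reported emissions then imply a grid carbon intensity near typical location-based averages:

```python
gpu_hours = 39.3e6   # reported H100-80GB GPU hours
tdp_kw = 0.700       # assumed 700 W draw per GPU (H100 SXM TDP)

energy_kwh = gpu_hours * tdp_kw   # ~27.5 GWh at full TDP
reported_tons = 11_390            # reported location-based CO2e

implied_intensity = reported_tons * 1000 / energy_kwh  # kg CO2e per kWh
print(f"energy: {energy_kwh / 1e6:.1f} GWh, "
      f"implied grid intensity: {implied_intensity:.2f} kg CO2e/kWh")
# -> ~0.41 kg CO2e/kWh, a plausible location-based grid average
```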

Benchmark scores are reported across a range of tasks, with the Llama 3.1 models reportedly outperforming many open- and closed-source chat models.

Safety considerations:

  • Multi-pronged data collection approach combining human-generated and synthetic data
  • LLM-based classifiers for quality control
  • Focus on reducing false refusals (declining benign requests) and improving refusal tone
  • Adversarial prompts incorporated into safety data
  • Intended for deployment as part of a larger AI system with additional safeguards

Developers should implement system-level safety measures when building agentic systems, especially when using new capabilities such as the longer context window, multilingual support, and third-party tool integrations.
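As a concrete illustration of that guidance, here is a minimal sketch of the input/output-screening pattern in Python. Both `generate` and `moderate` are placeholders standing in for an LLM call and a safety classifier (a Llama Guard-style model, for instance); this shows the pattern, not any specific Meta API.

```python
from typing import Callable

def guarded_chat(
    generate: Callable[[str], str],   # the underlying LLM call (placeholder)
    moderate: Callable[[str], bool],  # safety classifier: True means "safe"
    user_input: str,
    fallback: str = "Sorry, I can't help with that.",
) -> str:
    # Input check: screen adversarial or disallowed prompts before generation.
    if not moderate(user_input):
        return fallback
    response = generate(user_input)
    # Output check: the model itself is only one layer of the larger system.
    return response if moderate(response) else fallback
```

The key design choice is symmetry: both the prompt and the completion pass through the same safeguard, so a jailbreak that slips past the input filter can still be caught on the way out.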

[Links to referenced papers and sources have been omitted]