NVIDIA partners with Mistral AI to unveil a powerful new model: a 12-billion-parameter small model makes a stunning debut, outperforms Llama 3 8B, and runs on a single RTX 4090 GPU

Mistral AI has launched a new artificial intelligence model, Mistral NeMo 12B, which outperforms comparable models in its class.

Mistral AI and NVIDIA have jointly released Mistral NeMo, a new 12B parameter small language model that outperforms Gemma 2 9B and Llama 3 8B in several benchmarks.

Key features of Mistral NeMo:

  • 12 billion parameters
  • 128K context window (a loading sketch follows this list)
  • Trained on NVIDIA DGX Cloud AI platform
  • Optimized with NVIDIA TensorRT-LLM and NeMo framework
  • Released under Apache 2.0 license
  • Uses FP8 data format for efficient inference
  • Designed for enterprise use cases
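
The sketch below illustrates what the Apache 2.0 release and the 128K context window mean in practice: loading the checkpoint with Hugging Face transformers and generating from a long prompt. This is a minimal sketch, not the official workflow; the repository id is an assumption, and the FP8 inference path mentioned above runs through NVIDIA's TensorRT-LLM/NIM stack rather than this code.

    # Minimal sketch: loading Mistral NeMo with Hugging Face transformers.
    # The repository id is an assumption; adjust it to the actual release name.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed repo id

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # the single-GPU claim relies on FP8/quantized inference, not bf16
        device_map="auto",
    )

    # The 128K context window allows long documents to be passed in one prompt.
    prompt = "Summarize the following report:\n..."  # placeholder for a long input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))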

Performance:

  • Exceeds Gemma 2 9B and Llama 3 8B on benchmarks for multi-turn conversation, math, common-sense reasoning, world knowledge, and coding
  • Slightly behind Gemma 2 9B on the MMLU benchmark

Key capabilities:

  • Multilingual support for 11 languages
  • New Tekken tokenizer based on Tiktoken, more efficient than SentencePiece (see the comparison sketch after this list)
  • Advanced instruction tuning for better instruction following, reasoning, and code generation
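
As a quick, informal check of the tokenizer claim, the snippet below counts the tokens each tokenizer produces for the same code sample. Both repository ids are assumptions, and the efficiency claim itself comes from Mistral's announcement, not from this snippet.

    # Informal sketch: comparing token counts from the Tekken tokenizer and
    # Mistral 7B's SentencePiece tokenizer. Both repository ids are assumptions.
    from transformers import AutoTokenizer

    NEMO_ID = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed: ships the Tekken tokenizer
    MISTRAL_7B_ID = "mistralai/Mistral-7B-v0.1"       # assumed: SentencePiece-based tokenizer

    tekken = AutoTokenizer.from_pretrained(NEMO_ID)
    sentencepiece = AutoTokenizer.from_pretrained(MISTRAL_7B_ID)

    sample = "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)"

    print("Tekken tokens:       ", len(tekken.encode(sample)))
    print("SentencePiece tokens:", len(sentencepiece.encode(sample)))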

Deployment:

  • Can run on a single NVIDIA L40S, GeForce RTX 4090 or RTX 4500 GPU
  • Compatible with existing systems that use Mistral 7B (a drop-in serving sketch follows this list)
  • Easily deployable in minutes on various platforms
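
As a sketch of the drop-in claim, an existing Mistral 7B deployment built on a serving engine such as vLLM would, in principle, only need its model id changed. vLLM and both model ids below are assumptions, not the official deployment path; NVIDIA positions NIM microservices with TensorRT-LLM as the production (FP8) serving route.

    # Minimal sketch, assuming vLLM as the serving engine (not the official deployment path).
    from vllm import LLM, SamplingParams

    # MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"  # existing deployment (assumed id)
    MODEL_ID = "mistralai/Mistral-Nemo-Instruct-2407"  # drop-in replacement (assumed id)

    llm = LLM(model=MODEL_ID, max_model_len=16384)  # cap the context to fit a single-GPU setup
    params = SamplingParams(temperature=0.3, max_tokens=200)

    outputs = llm.generate(["Draft a short release note for our Q3 product update."], params)
    print(outputs[0].outputs[0].text)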

The collaboration leverages Mistral AI's expertise in training data and NVIDIA's optimized hardware/software ecosystem. Mistral NeMo aims to provide enterprises with a powerful yet practical AI solution that can be readily integrated into commercial applications.
