Mistral AI and NVIDIA have jointly released Mistral NeMo, a new 12B-parameter small language model that outperforms Gemma 2 9B and Llama 3 8B on several benchmarks.
Key features of Mistral NeMo:
- 12 billion parameters
- 128K context window
- Trained on NVIDIA DGX Cloud AI platform
- Optimized with NVIDIA TensorRT-LLM and NeMo framework
- Released under the Apache 2.0 license, so the weights are openly available (see the loading sketch after this list)
- Uses FP8 data format for efficient inference
- Designed for enterprise use cases
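Because the weights are released under Apache 2.0, the model can be pulled from a model hub and loaded with standard tooling. Below is a minimal sketch using Hugging Face transformers; the checkpoint ID mistralai/Mistral-Nemo-Instruct-2407 is an assumption to verify on the hub, and a GPU with enough memory is required:

```python
# Sketch: load Mistral NeMo with Hugging Face transformers.
# The checkpoint ID below is an assumption; verify it on the hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~24 GB of weights at bf16 for 12B params
    device_map="auto",           # requires the accelerate package
)

messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```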
Performance:
- Outperforms Gemma 2 9B and Llama 3 8B on benchmarks covering multi-turn conversation, math, common-sense reasoning, world knowledge, and coding
- Trails Gemma 2 9B slightly on the MMLU benchmark
Key capabilities:
- Multilingual support for 11 languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi
- New Tekken tokenizer, based on Tiktoken, which compresses natural-language text and source code more efficiently than the SentencePiece tokenizers used in earlier Mistral models (see the comparison sketch after this list)
- Advanced instruction tuning for better instruction following, reasoning, and code generation
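One way to see the tokenizer-efficiency claim concretely is to tokenize the same inputs with Mistral NeMo's Tekken tokenizer and the SentencePiece-based tokenizer shipped with an earlier Mistral model, then compare token counts. A sketch under the assumption that both tokenizers are available under the checkpoint IDs shown:

```python
# Sketch: compare token counts between Tekken (Mistral NeMo) and the
# SentencePiece tokenizer of Mistral 7B. Checkpoint IDs are assumptions;
# verify them on the Hugging Face hub before running.
from transformers import AutoTokenizer

TEXTS = {
    "english": "Large language models compress text into discrete tokens.",
    "code": "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
}

tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
sp = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

for name, text in TEXTS.items():
    # Skip special tokens so the comparison reflects vocabulary efficiency only.
    n_tekken = len(tekken.encode(text, add_special_tokens=False))
    n_sp = len(sp.encode(text, add_special_tokens=False))
    print(f"{name}: tekken={n_tekken} sentencepiece={n_sp}")
```

Fewer tokens for the same input means the 128K context window stretches further in practice.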
Deployment:
- Can run on a single NVIDIA L40S, GeForce RTX 4090 or RTX 4500 GPU
- Drop-in replacement for existing systems built on Mistral 7B, since it uses a standard architecture (see the serving sketch after this list)
- Packaged as an NVIDIA NIM inference microservice, deployable on various platforms in minutes
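Because common serving stacks for Mistral 7B (for example vLLM, or the NIM microservice mentioned above) expose an OpenAI-compatible endpoint, swapping in Mistral NeMo can be as small a change as updating the model name in the request. A sketch assuming a locally hosted endpoint at http://localhost:8000/v1; the base URL, API key, and model name are placeholders to adjust for your deployment:

```python
# Sketch: query Mistral NeMo through an OpenAI-compatible endpoint such as
# one served by vLLM or an NVIDIA NIM container. Base URL, API key, and
# model name are assumptions for a local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # same call shape as Mistral 7B
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```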
The collaboration pairs Mistral AI's expertise in training data with NVIDIA's optimized hardware and software ecosystem. Mistral NeMo aims to give enterprises a powerful yet practical AI solution that can be readily integrated into commercial applications.