Mistral AI and NVIDIA have jointly released Mistral NeMo, a new 12B-parameter small language model that outperforms Gemma 2 9B and Llama 3 8B on several benchmarks.
Key features of Mistral NeMo:
- 12 billion parameters
- 128K context window
- Trained on NVIDIA DGX Cloud AI platform
- Optimized with NVIDIA TensorRT-LLM and NeMo framework
- Released under the Apache 2.0 license, so the weights are openly available (see the loading sketch after this list)
- Uses FP8 data format for efficient inference
- Designed for enterprise use cases
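Because the weights are released under Apache 2.0, the model can be pulled from a model hub and loaded with standard tooling. Below is a minimal sketch using Hugging Face transformers; the checkpoint ID mistralai/Mistral-Nemo-Instruct-2407 is an assumption to verify on the hub, and a GPU with enough memory is required:

```python
# Sketch: load Mistral NeMo with Hugging Face transformers.
# The checkpoint ID below is an assumption; verify it on the hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~24 GB of weights at bf16 for 12B params
    device_map="auto",           # requires the accelerate package
)

messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```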
Performance:
- Outperforms Gemma 2 9B and Llama 3 8B on benchmarks covering multi-turn conversation, math, common-sense reasoning, world knowledge, and coding
- Trails Gemma 2 9B slightly on the MMLU benchmark
Key capabilities:
- Multilingual support for 11 languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi
- New Tekken tokenizer, based on Tiktoken, which compresses natural-language text and source code more efficiently than the SentencePiece tokenizers used in earlier Mistral models (see the comparison sketch after this list)
- Advanced instruction tuning for better instruction following, reasoning, and code generation
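One way to see the tokenizer-efficiency claim concretely is to tokenize the same inputs with Mistral NeMo's Tekken tokenizer and the SentencePiece-based tokenizer shipped with an earlier Mistral model, then compare token counts. A sketch under the assumption that both tokenizers are available under the checkpoint IDs shown:

```python
# Sketch: compare token counts between Tekken (Mistral NeMo) and the
# SentencePiece tokenizer of Mistral 7B. Checkpoint IDs are assumptions;
# verify them on the Hugging Face hub before running.
from transformers import AutoTokenizer

TEXTS = {
    "english": "Large language models compress text into discrete tokens.",
    "code": "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
}

tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
sp = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

for name, text in TEXTS.items():
    # Skip special tokens so the comparison reflects vocabulary efficiency only.
    n_tekken = len(tekken.encode(text, add_special_tokens=False))
    n_sp = len(sp.encode(text, add_special_tokens=False))
    print(f"{name}: tekken={n_tekken} sentencepiece={n_sp}")
```

Fewer tokens for the same input means the 128K context window stretches further in practice.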
Deployment:
- Can run on a single NVIDIA L40S, GeForce RTX 4090 or RTX 4500 GPU
- Drop-in replacement for existing systems built on Mistral 7B, since it uses a standard architecture (see the serving sketch after this list)
- Packaged as an NVIDIA NIM inference microservice, deployable on various platforms in minutes
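Because common serving stacks for Mistral 7B (for example vLLM, or the NIM microservice mentioned above) expose an OpenAI-compatible endpoint, swapping in Mistral NeMo can be as small a change as updating the model name in the request. A sketch assuming a locally hosted endpoint at http://localhost:8000/v1; the base URL, API key, and model name are placeholders to adjust for your deployment:

```python
# Sketch: query Mistral NeMo through an OpenAI-compatible endpoint such as
# one served by vLLM or an NVIDIA NIM container. Base URL, API key, and
# model name are assumptions for a local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # same call shape as Mistral 7B
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```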
The collaboration pairs Mistral AI's expertise in training data with NVIDIA's optimized hardware and software ecosystem. Mistral NeMo aims to give enterprises a powerful yet practical AI solution that can be readily integrated into commercial applications.