The small model era has arrived, with major players like OpenAI, Mistral AI, HuggingFace, and now Apple releasing compact language models.
Apple has entered the small model arena with its new DCLM (DataComp for Language Models) series, which includes 7-billion and 1.4-billion parameter versions. The 7B model outperforms Mistral-7B and approaches the capabilities of Llama 3 and Gemma.
According to Apple ML researcher Vaishaal Shankar, DCLM is the best performing "truly open source" model to date, with weights, training code, and an open dataset all publicly available. This fully open approach has garnered praise from the AI community.
The DCLM-7B model uses a decoder-only Transformer architecture and was trained on 2.5T tokens filtered from a roughly 4T-token dataset, with a context length of 2,048 tokens. Evaluations show it outperforming other open-data models of similar size across multiple benchmarks.
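Since the weights are publicly released, the model can in principle be loaded like any other decoder-only checkpoint. The sketch below assumes a Hugging Face repository named "apple/DCLM-7B" with Transformers-compatible model code; the repo id, the need for trust_remote_code, and the generation settings are assumptions, not details confirmed by Apple's release notes.

```python
# Minimal sketch: load a DCLM-style decoder-only model for inference.
# Assumes the checkpoint is published on the Hugging Face Hub as "apple/DCLM-7B"
# and ships custom modeling code (hence trust_remote_code=True).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "apple/DCLM-7B"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision so a 7B model fits on a single GPU
    device_map="auto",
    trust_remote_code=True,       # custom model code may be required
)

prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```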
While DCLM-7B's performance is comparable to open-weight models such as Mistral-7B and Gemma, it still lags behind some models trained on closed data, such as Phi-3. The researchers did, however, report further gains after training on additional tokens and extending the context length.
The 1.4B version of DCLM shows particularly strong results for its size, outperforming models such as SmolLM, Qwen2-1.5B, and Phi-1.5 on some metrics.
The DCLM models are built on the DataComp for Language Models (DCLM) benchmark, which focuses on curating high-quality training data rather than simply scaling up model size. This aligns with the growing emphasis many tech giants now place on training data over model architecture.
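To make the data-curation idea concrete, the sketch below shows one common pattern for this kind of pipeline: score raw documents with a pretrained quality classifier and keep only the highest-scoring ones. The classifier file, label name, and threshold are hypothetical placeholders; the exact filtering rules DCLM uses are described in the DataComp-LM paper, not reproduced here.

```python
# Illustrative sketch of classifier-based data curation in the spirit of DataComp-LM:
# keep a document only if a quality classifier scores it highly.
# CLASSIFIER_PATH, KEEP_LABEL, and THRESHOLD are hypothetical values for illustration.
import fasttext

CLASSIFIER_PATH = "quality_classifier.bin"  # hypothetical pretrained fastText model
KEEP_LABEL = "__label__high_quality"        # hypothetical positive label
THRESHOLD = 0.9                             # hypothetical score cutoff

def filter_documents(documents):
    """Return only the documents the classifier scores as high quality."""
    clf = fasttext.load_model(CLASSIFIER_PATH)
    kept = []
    for doc in documents:
        text = doc.replace("\n", " ")          # fastText expects single-line input
        labels, probs = clf.predict(text, k=1)
        if labels[0] == KEEP_LABEL and probs[0] >= THRESHOLD:
            kept.append(doc)
    return kept

if __name__ == "__main__":
    raw = [
        "A clear explanation of gradient descent and why the learning rate matters.",
        "BUY NOW!!! click here !!! limited offer !!!",
    ]
    print(len(filter_documents(raw)), "documents kept")
```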
While large language models continue to advance, there is increasing interest in smaller, more efficient models from major AI labs. Apple's entry into this space with fully open source models could help accelerate progress in compact yet capable language models.