AI Model New Trends: Balancing Miniaturization and High Performance

Large language models may be powerful, but smaller models offer better value for money.

It's Not That Large Models Are Unaffordable, But Small Models Offer Better Value

In the vast realm of AI, small models have always had their own legend.

Looking abroad, last year's much-hyped Mistral 7B was hailed as the "best 7B model" on release, beating the 13B-parameter Llama 2 across multiple benchmarks and surpassing the 34B Llama in reasoning, math, and code generation.

This year, Microsoft likewise open-sourced its strongest small-parameter model to date, phi-3-mini. Despite having only 3.8B parameters, it scores far above models of a similar size on benchmarks and rivals much larger models such as GPT-3.5 and Claude 3 Sonnet.

Looking at home, ModelBest (面壁智能) introduced the edge-side language model MiniCPM in early February: with only 2B parameters it squeezes stronger performance out of a smaller size, outperforms the popular French model Mistral-7B, and has been dubbed the "little steel cannon".

More recently, MiniCPM-Llama3-V 2.5, with only 8B parameters, surpassed larger models such as GPT-4V and Gemini Pro in overall multimodal performance and OCR capability, so convincingly that a Stanford University AI team was caught plagiarizing it.

Then, last week, OpenAI dropped a late-night bombshell: GPT-4o mini, billed as the "most powerful and cost-effective small-parameter model", pulling everyone's attention back to small models.

Ever since OpenAI pulled the whole world into the generative-AI imagination, the industry at home and abroad has raced from long context to parameter counts, agents, and now price wars, always revolving around one logic: move toward commercialization in order to stay at the table.

So the most eye-catching storyline in the public discussion is that OpenAI, by cutting prices, seems to be wading into the price war as well.

Many people may not have a clear sense of GPT-4o mini's pricing. GPT-4o mini costs 15 cents per million input tokens and 60 cents per million output tokens, more than 60% cheaper than GPT-3.5 Turbo.

In other words, generating a 2,500-page book with GPT-4o mini would cost only about 60 cents.
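As a quick sanity check on that figure, here is a back-of-the-envelope calculation; the words-per-page and words-per-token ratios are rough assumptions of ours, not OpenAI's numbers.

```python
# Rough check of the "2,500-page book for about 60 cents" figure.
# Assumptions (illustrative): ~300 words per printed page, ~0.75 words per token.
price_per_million_output_tokens = 0.60            # USD, GPT-4o mini output rate
pages = 2_500
tokens = pages * 300 / 0.75                       # ~1,000,000 tokens
cost = tokens / 1_000_000 * price_per_million_output_tokens
print(f"~{tokens:,.0f} tokens -> ${cost:.2f}")    # ~1,000,000 tokens -> $0.60
```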

OpenAI CEO Sam Altman couldn't help but marvel on X that, compared with GPT-4o mini, the strongest model of two years ago not only lagged far behind in capability but also cost 100 times as much to use.

As the price war for large models intensifies, some efficient and economical open-source small models are also more likely to attract market attention. After all, it's not that large models are unaffordable, but small models offer better value.

On the one hand, with GPUs being snapped up worldwide and even running out of stock, open-source small models, with their lower training and deployment costs, are well placed to gradually gain the upper hand.

For example, MiniCPM from ModelBest can deliver a cliff-like drop in inference cost thanks to its small parameter count: it can even run inference on a CPU, needs only a single machine for continued parameter training and a single consumer GPU for parameter-efficient fine-tuning, and still has room for further cost reductions.
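To make "CPU inference" concrete, here is a minimal sketch using Hugging Face transformers; the checkpoint name is our assumption of a published MiniCPM 2B variant, and any comparably small model would work the same way.

```python
# Minimal sketch: running a ~2B-parameter model entirely on CPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM-2B-sft-bf16"  # assumed checkpoint name; swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,   # plain fp32 on CPU; no GPU required
    trust_remote_code=True,
)

prompt = "Summarize the key points of this clause: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```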

If you are an experienced developer, you can even build your own small model and train a vertical model for, say, the legal domain, with inference costs that may be only one-thousandth of fine-tuning and serving a large model.
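As a hedged sketch of what that looks like in practice, the snippet below applies parameter-efficient fine-tuning (LoRA via the peft library) to a small base model for a legal-domain assistant; the base-model id and target module names are placeholders for a Llama-style checkpoint, not a specific recommendation.

```python
# Sketch: LoRA fine-tuning of a small base model for a legal-domain assistant.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-org/small-2b-base")  # placeholder id
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections, Llama-style naming
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # typically well under 1% of the base weights
# ...then train only the adapter weights with your usual training loop or Trainer.
```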

The deployment of such edge-side "small models" has already let some vendors glimpse profitability. ModelBest, for example, helped the Shenzhen Intermediate People's Court launch and operate an AI-assisted trial system, proving the technology's value to the market.

Of course, more precisely, the change we are beginning to see is not a shift from large models to small models, but a shift from a single class of model to a portfolio of models, with the right model chosen according to the organization's specific needs, the complexity of the task, and the resources available.

On the other hand, small models are easier to deploy and integrate in mobile devices, embedded systems, or low-power environments.

Small models have relatively small parameter counts and need less compute and memory than large models, so they run more smoothly on resource-constrained edge devices. Moreover, edge devices typically impose strict limits on power consumption and heat, and purpose-built small models can adapt to those constraints far better.
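To see why parameter count matters so much on-device, here is a rough estimate of the weight footprint alone (ignoring the KV cache and activations) at common precisions; the model sizes are illustrative.

```python
# Rough weight-memory estimate for a model at different precisions.
def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for name, params in [("2B edge model", 2.0), ("8B model", 8.0), ("70B model", 70.0)]:
    for bits, label in ((16, "fp16"), (8, "int8"), (4, "int4")):
        print(f"{name:>13} @ {label}: {weight_footprint_gb(params, bits):6.1f} GB")
```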

Honor CEO Zhao Ming has said that, because of on-device AI compute limits, edge-side models will likely sit between 1B and 10B parameters, whereas cloud-hosted large models can reach tens or hundreds of billions of parameters or even more; that is the gap between the two.

A phone has to work within a very confined space, right? It has to support a 7B model on a limited battery, with limited heat dissipation and limited storage. With that many constraints, it is bound to be the hardest environment of all.

We have also previously profiled the behind-the-scenes workhorse of Apple Intelligence: a fine-tuned ~3B on-device model dedicated to tasks such as summarization and text polishing which, with the help of task-specific adapters, outperforms Gemma-7B and is well suited to running on phones.
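Purely as an illustration of the adapter idea (this is not Apple's actual stack), the sketch below keeps one small base model in memory and swaps lightweight LoRA adapters per task using the peft library; the model id is a placeholder.

```python
# Sketch: one small base model, multiple task-specific LoRA adapters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-org/small-3b-base")  # placeholder id
cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                 task_type="CAUSAL_LM")

model = get_peft_model(base, cfg, adapter_name="summarize")  # adapter for summarization
model.add_adapter("polish", cfg)                             # second adapter, same base

model.set_adapter("summarize")   # route a summarization request
# ... model.generate(...) ...
model.set_adapter("polish")      # switch to text polishing without reloading the base
```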

No wonder former OpenAI standout Andrej Karpathy recently predicted that the competition over model size will "involute in reverse": not ever bigger, but a race to see who can be smaller and more nimble.

How Can Small Models Win with Small Size

Andrej Karpathy's prediction is not unfounded.

In this data-centric era, models are rapidly getting larger and more complex, and much of the capacity of super-large models trained on massive data (such as GPT-4) goes into memorizing vast numbers of irrelevant details, essentially rote-learning the material.

Fine-tuned models, however, can "win with small size" on specific tasks, with usability comparable to many "super-large models".

Hugging Face CEO Clem Delangue has also suggested that up to 99% of use cases can be solved by using small models and predicted that 2024 will be the year of small language models.

Before exploring the reasons, some background is in order.

In 2020, OpenAI set out a now-famous finding in a paper: the scaling law, which says that model performance improves predictably as model scale (parameters, data, and compute) increases. With the arrival of models like GPT-4, the advantages of scaling have become increasingly apparent.
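For concreteness, that paper (Kaplan et al., 2020, "Scaling Laws for Neural Language Models") fits test loss as a power law in parameter count when data and compute are not the bottleneck; a commonly quoted form is below, with the constants given only approximately.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
```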

Researchers and engineers in AI came to believe that increasing a model's parameter count would further boost its learning and generalization abilities, and so we have watched model scales leap from tens of billions of parameters to hundreds of billions, and even push toward the trillion-parameter mark.

In the world of AI, the scale of a model is not the only measure of its intelligence.

On the contrary, a well-designed small model, through optimized algorithms, improved data quality, and advanced compression techniques, can often demonstrate performance comparable to or even better than large models on specific tasks.

This strategy of winning with small size is becoming a new trend in the AI field. Among them, improving data quality is one of the ways small models win with small size.

Satish Jayanthi, CTO and co-founder of Coalesce, once described the effect of data on models like this:

If LLMs had existed in the 17th century and we had asked ChatGPT whether the Earth was round or flat, it would have answered that the Earth was flat, because the data we provided led it to believe that was a fact. The data we give LLMs and how we train them directly shape their output.

To produce high-quality results, large language models need to be trained on high-quality, targeted data for specific topics and domains. Just as students need quality textbooks to learn, LLMs also need quality data sources.