Musk's xAI launches new model: Significant progress but not yet leading

Grok's image generation feature has no restrictions, allowing users to freely create images of political figures, while ChatGPT refuses to generate such content.

"Grok's progress is rocket-like." Musk excitedly announced the arrival of Grok-2 on X.

On August 14 local time, xAI released beta versions of two AI models, Grok-2 and Grok-2 mini. Grok-2 is the company's most capable language model in terms of reasoning, while the lightweight Grok-2 mini is its "sibling," designed to deliver strong capabilities with fewer parameters.

xAI stated in a blog post that the early preview version of Grok-2 has made significant progress compared to Grok-1.5, featuring cutting-edge capabilities in chatting, coding, and reasoning.

The company claims that an early version of Grok-2, tested under the name "sus-column-r," outperformed Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4-Turbo on the LMSYS leaderboard. The LMSYS leaderboard ranks large language models with the Elo rating system, based on anonymous, randomly paired one-on-one "battles" between models.
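For readers unfamiliar with Elo ratings, the sketch below shows the classic update rule applied to a single "battle" between two models. It is only an illustration: the K-factor of 32, the 400-point scale, the starting rating of 1000, and the names `model_x` and `model_y` are assumptions made for this example, and the leaderboard's actual scoring procedure may differ.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k (assumed 32 here) controls how far one result moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical models, both starting at an assumed rating of 1000.
    ratings = {"model_x": 1000.0, "model_y": 1000.0}
    # Each battle: (first model, second model, score of the first model).
    battles = [("model_x", "model_y", 1.0), ("model_x", "model_y", 0.5)]
    for a, b, score_a in battles:
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)
    print(ratings)
```

A win over an equally rated opponent moves each side by half the K-factor, while a win over a much weaker opponent moves the ratings very little, which is why a ranking only stabilizes after many votes.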

The sus-column-r model (an early version of Grok-2) is now public. With more than 12,000 community votes, it ranked 3rd on the overall leaderboard, on par with GPT-4o. It ranked 2nd in coding, 4th in hard prompts, and 2nd in mathematics.

xAI also evaluated the new models through an AI tutor system that tests interactions with Grok. This evaluation focused on the models' abilities in two key areas: following instructions and providing accurate, truthful information. Grok-2 showed significant improvements in reasoning about retrieved content and in tool use, such as correctly identifying missing information, reasoning through sequences of events, and discarding irrelevant posts.

Additionally, xAI evaluated the Grok-2 model through a series of academic benchmarks, including reasoning, reading comprehension, mathematics, science, and coding. The company stated, "Performance in areas such as graduate-level scientific knowledge, common sense, and math competition problems is comparable to other cutting-edge models."

Musk is deeply integrating xAI with X, the social media platform he acquired: Grok-2 and Grok-2 mini will power X's enhanced search, deeper understanding of posts, and improved reply features, even though xAI's use of X user data for training has previously drawn opposition.

A major highlight of this update is that Grok-2 can generate images on X using the recently popular Flux.1 model, although the feature is currently limited to X's Premium and Premium+ users.

Because Grok's image generation feature has no restrictions, many users have used it to create images of political figures. For example, one user used Grok-2 to generate an image of George Washington, the first U.S. president, which Musk himself reposted. OpenAI's ChatGPT, by contrast, refuses to generate such images in order to avoid political risks.

It's worth noting that Grok-2 and Grok-2 mini are still in the testing phase. The company expects to make both models available to developers through its enterprise API later this month. The upcoming API is built on a new custom technology stack that allows multi-region inference deployment for low-latency access worldwide. It also provides enhanced security features such as mandatory multi-factor authentication, along with traffic statistics and advanced billing analytics.

After parting ways with OpenAI, Musk predicted that artificial general intelligence would be achieved by 2029. xAI, the company he founded, ultimately aims to make AI products that are accessible and useful to consumers, businesses, and everyone else. It hopes to use AI to help people solve complex scientific and mathematical problems and to "understand" the universe.

xAI is also picking up its pace. The company held its first funding round in January 2024, raising $135 million. In May, it completed a $6 billion Series B round, with its valuation soaring from $18 billion to $25 billion, making it another AI unicorn in the United States.

In July, Musk stated that the xAI team had begun training on the "Memphis Supercluster." This cluster consists of 100,000 liquid-cooled H100 GPUs, aiming to train "the world's most powerful AI by every metric" before December this year.

His ambition doesn't stop there. He has revealed that xAI plans to build a supercomputer, a "super factory of computing power," that is expected to be four times the scale of the most powerful competing systems on the market.

Although xAI is a "latecomer" to large models, Musk believes it can bring new breakthroughs and innovations to artificial intelligence. He also emphasizes that competition helps drive progress across the entire industry and prevents the AI field from being dominated by a single company.

However, judging from the two newly released models, xAI has not yet shown innovation that surpasses the rest of the industry and is still playing the role of a follower. For Grok-2 to break through in the competition with OpenAI, Google, and other tech companies, xAI needs to deliver more powerful products.