AI Giants' Battle: Can GPT-5 Surpass Sora and Reshape the Industry Landscape?

The boom in AI short dramas reveals where multimodal large models are heading. The phenomenon reflects the potential of artificial intelligence in creative fields and demonstrates the ability of multimodal models to integrate text, images, and video. As the technology advances, multimodal large models are reshaping how content is created and consumed, bringing new opportunities and challenges to the digital entertainment industry.

Xiaomi Chairman and CEO Lei Jun recently stated: "Short dramas seem to have opened up a new world, with faster pace, more excitement, and better viewing experience than web novels."

While short dramas are becoming wildly popular, some creators have also discovered the value of AI in this process. Since its release on July 13, China's first AIGC original fantasy micro-short drama "Mirror of Mountains and Seas" has quickly gone viral on major video platforms, with over 10 million views on Kuaishou. Through clever use of AI technology, mythical figures and strange creatures described in the "Classic of Mountains and Seas" have been transformed from text into vivid images on screen. With its realistic and smooth performance, it has successfully broken people's stereotypical impressions of AI video production effects.

In addition, "Sanxingdui: Future Revelation", produced by Bona Film Group's AIGMS Production Center, also drew a strong response upon release. Jiang Defu, CEO of Bona Film Group, said that Bona applied industrialized film processes to produce this short drama with AI, aiming to leverage its mature film experience to raise the technical level of AI short dramas and to tell Chinese stories well through the AI short drama track.

It can be said that the "breakout" of AI short dramas has benefited from the right time, the right place, and the right people. From production tools to platforms to audiences, a complete ecosystem has provided fertile soil for its development.

The success of these works is not just a technological breakthrough but also a microcosm of the application of multimodal large models in artistic creation. It demonstrates not only AI's visual and auditory processing capabilities but also its capacity, through deep learning and natural language processing, to interpret cultural elements and express them in new ways.

Lowered Expectations: What Can OpenAI Use to Save Itself?

Amidst this thriving scene, one can't help but recall the former "concept god" - Sora.

As OpenAI's brand-new generative video model, it caused an unprecedented sensation when first released. When OpenAI officially unveiled Sora in February, the global internet and social media were instantly stunned by its powerful features, seemingly recreating the glorious moment of GPT-3.5's release.

As soon as Sora was released, it quickly became the focus of the tech world with its three core advantages. First, its ability to generate videos up to 60 seconds long, breaking through the 4-second coherence bottleneck of previous AI video generation models, amazed the industry and the public. Second, Sora not only supports multi-angle shots but can also achieve smooth one-take shooting, generating images that convincingly render the light and shadow relationships, physical occlusion, and collision effects in the scene, making the video content more vivid and realistic.

At that time, Sora was even seen by OpenAI as a "world simulator", not just a video generation model, but an intelligent tool that could understand and simulate the physical laws of the real world.

In the early stages of release, people marveled at the technological innovation and convenience brought by Sora. Many professionals predicted that Sora would revolutionize the field of video production, completely changing traditional video production methods.

However, as of today, Sora is still being prepared for its official launch. This preparation includes adversarial testing, in which red teams composed of experts from various fields rigorously test the model to identify and mitigate potential risks such as misinformation, hateful content, and bias.

Meanwhile, OpenAI has also allowed visual artists, designers, and filmmakers early access to Sora to gather feedback and improve the model, especially for the needs of creative professionals. To enhance transparency and safety, OpenAI is developing tools to detect misleading content generated by Sora and plans to include C2PA metadata in the model. Additionally, the company is collaborating with policymakers, educators, and artists globally to understand their concerns and identify positive use cases for Sora. These activities have led to the delayed release of Sora.

As time has passed, the practical application of Sora has not progressed as rapidly as expected. Although OpenAI has made significant technological breakthroughs, it has so far failed to turn this technology into a usable product and bring it to market.

For the vast majority of users, this contrast undoubtedly leads to disappointment and anxiety. On one hand, there is the "lofty ideal" that Sora can quickly change the landscape of video production, lowering the barriers to creation and allowing more people to easily produce high-quality video content; on the other hand, there is the "harsh reality" of Sora's slow path to release.

Sora's predicament reflects not just delays or shortcomings in technological implementation, but more deeply the common challenges faced by AI technology in the process of commercial application. From algorithm optimization to data processing, from cultivating user habits to improving market acceptance, each step requires meticulous polishing and time to settle. And in this fast-paced era, the mismatch between users' desire for instant gratification and the maturity curve of AI technology often leads to a huge gap between expectations and reality.

Easy to Conquer, Hard to Defend: GPT-5 from Technology Worship to Trust Crisis

Apart from Sora's closed-door training, the sudden release of GPT-4o mini has further fueled public opinion, with some netizens jokingly saying, "GPT-3.5 is out of a job, will GPT-5 be far behind? Altman: It will be!" Although the release of GPT-5 seems like a mirage, most people still firmly believe in OpenAI's technological prowess.

However, competition and changes in the AI field are also intensifying daily. Not only are more and more companies and research institutions joining the R&D and application of AI technology, but numerous AI products in vertical fields are constantly emerging, winning user favor with more precise positioning and more personalized services.

In comparison, OpenAI's attractiveness in the industry seems to have diminished, and its "domination of the field" is becoming increasingly difficult to maintain.

A similar case came when OpenAI officially stopped providing API services to China and other regions on the 9th of this month: what was expected to look like a new technological monopoly instead caused little stir in China.

Faced with OpenAI's "cut-off", domestic companies responded quite proactively this time. As soon as the news broke, AI companies such as Zhipu AI, Baidu, Alibaba, and Tencent launched "relocation plans" for API services, using price reductions and simplified processes to absorb customers who had previously used OpenAI's API.

We need not ask why OpenAI chose to abandon the Chinese market, but the performance of domestic large model vendors is sufficient to prove that, from the perspective of market environment and large model deployment conditions, domestic large models can indeed be users' preferred choice.

In the so-called "year of large models", the discussion centered on model scale and model capabilities, but the rapid technological progress of just one year has already pushed companies to think about how to deploy and commercialize their models. The recent burst of products such as Kuaishou's Keling and SenseTime's Vimi is a microcosm of this shift toward deployment. Continuous innovation has become the cornerstone of enterprise survival and development.

Large Model Home believes that for OpenAI, continuous innovation means constantly exploring new areas of artificial intelligence, pushing the boundaries of technology, and creating products that can truly solve real-world problems. The launch of GPT-5 should not simply be an upgrade of the previous generation product, but a qualitative leap, in order to maintain OpenAI's leadership position in the field of artificial intelligence.

Afterword: Can Multimodality Become a New Opportunity to Overtake on the Curve?

The explosion of AI short dramas is undoubtedly a noteworthy phenomenon, but it's just the tip of the iceberg in the development of the domestic multimodal field. This phenomenon is far from an isolated display of technological progress, but a comprehensive manifestation of deep integration of technological innovation with local culture, precise capture of market demand, and collaborative development of the entire industry chain.

If we zoom out from the specific phenomenon of AI short dramas, this deep integration of technological innovation with local culture, market demand, and industrial ecology is precisely China's key advantage in the field of multimodal artificial intelligence. Whether it's precise diagnosis in the healthcare field, intelligent transformation in the education industry, or the rapid development of intelligent manufacturing and Industry 4.0, multimodal artificial intelligence is creating new