Nvidia NIM Upgrade: Both a Blessing and a Challenge
Nvidia announced that Nvidia NIM has been further optimized, standardizing and simplifying the complex deployment of AI models. NIM is a key component in Nvidia's AI strategy, and Jensen Huang has repeatedly praised the innovation it brings, calling it "AI-in-a-Box, essentially artificial intelligence in a box."
This upgrade undoubtedly consolidates Nvidia's leadership in the AI field and has become an important part of its technological moat.
CUDA has long been considered a key factor in Nvidia's rise to leadership in the GPU field. With the support of CUDA, GPUs evolved from dedicated graphics processors into general-purpose parallel computing devices, making modern AI development possible. However, although Nvidia's software ecosystem is very rich, these scattered systems remain too complex and difficult to master for traditional industries that lack basic AI development capabilities.
To address this, at the GTC conference in March this year Nvidia introduced NIM (Nvidia Inference Microservices), cloud-native microservices that integrate the software it has developed over the past few years to simplify and accelerate the deployment of AI applications. NIM packages models as optimized "containers" that can be deployed in the cloud, in data centers, or on workstations, allowing developers to build generative AI applications such as copilots and chatbots in minutes.
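To make this concrete: NIM microservices expose an OpenAI-compatible API, so calling a deployed model can look roughly like the minimal Python sketch below. The hosted endpoint URL, the model name, and the NVIDIA_API_KEY environment variable are assumptions for illustration; a self-hosted NIM container would be called the same way at its own URL.

```python
import os

from openai import OpenAI  # pip install openai

# NIM endpoints speak the OpenAI-compatible chat completions protocol,
# so the standard OpenAI client can talk to them. The base_url and
# model name below are assumptions based on Nvidia's hosted API catalog.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[
        {"role": "user", "content": "Explain Nvidia NIM in one sentence."},
    ],
    temperature=0.2,
    max_tokens=128,
)
print(response.choices[0].message.content)
```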
The NIM ecosystem Nvidia has built now provides a series of pre-trained AI models. Nvidia announced that it helps developers accelerate application development and deployment across multiple fields, focusing on domain-specific AI models in areas such as understanding, digital humans, 3D development, robotics, and digital biology:
In understanding, NIM can use Llama 3.1 and NeMo Retriever to enhance text-processing capabilities (see the sketch after this list); in digital humans, models such as Parakeet ASR and FastPitch HiFiGAN support high-fidelity speech synthesis and automatic speech recognition, providing powerful tools for building virtual assistants and digital humans;
In 3D development, models such as USD Code and USD Search simplify the creation and manipulation of 3D scenes, helping developers build digital twins and virtual worlds more efficiently;
In embodied robotics, Nvidia introduced the MimicGen and Robocasa models, accelerating robotics R&D and applications by generating synthetic motion data and simulation environments. MimicGen NIM generates synthetic motion data from teleoperation data recorded by spatial computing devices such as Apple Vision Pro, while Robocasa NIM generates robot tasks and simulation-ready environments in OpenUSD (a universal framework for development and collaboration in 3D worlds);
In digital biology, models such as DiffDock and ESMFold provide advanced solutions for drug discovery and protein-folding prediction, promoting advances in biomedical research.
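To make the "understanding" direction above concrete, the sketch below embeds a query with a NeMo Retriever embedding model so it can be matched against an indexed document store, the first step of a retrieval-augmented pipeline. This is a minimal sketch: the endpoint, the nv-embedqa-e5-v5 model name, and the input_type field are assumptions, not details given in the announcement.

```python
import os

from openai import OpenAI  # pip install openai

# NeMo Retriever embedding NIMs are also served behind an
# OpenAI-compatible API; endpoint and model name are assumptions.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

result = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",
    input=["How do I deploy a NIM container on a workstation?"],
    # Retrieval models embed queries and passages differently,
    # so the input type is passed through to the service.
    extra_body={"input_type": "query", "truncate": "END"},
)

vector = result.data[0].embedding
print(f"query embedded into {len(vector)} dimensions")
```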
In addition, Nvidia announced that Hugging Face's inference-as-a-service offering is also powered by Nvidia NIM and runs in the cloud.
By integrating these versatile models, Nvidia's ecosystem not only improves the efficiency of AI development but also provides innovative tools and solutions. However, while the many upgrades to Nvidia NIM are indeed a "blessing" for the industry, from another perspective they also pose challenges for programmers.
By providing pre-trained AI models and standardized APIs, Nvidia NIM greatly simplifies AI model development and deployment, which is indeed a blessing for developers. But does it also mean that job opportunities for ordinary programmers may shrink further in the future? After all, when NIM has already completed these tasks in advance, companies can do the same work with fewer technical staff, and ordinary programmers may no longer be needed for complex model training and tuning.
Teaching AI to Think in 3D, Building Virtual Physical Worlds
Nvidia also demonstrated the application of generative AI on the OpenUSD and Omniverse platforms at the SIGGRAPH conference.
Nvidia announced that it has built the world's first generative AI models capable of understanding language, geometry, materials, physics, and space on top of OpenUSD (Universal Scene Description), and has packaged them as Nvidia NIM microservices. Three of these NIMs are currently available for preview in the Nvidia API catalog: USD Code, which answers knowledge questions about OpenUSD and generates OpenUSD Python code; USD Search, which lets developers search vast OpenUSD 3D and image databases using natural-language or image input; and USD Validate, which checks uploaded files for compatibility with OpenUSD release versions and generates fully RTX-rendered, path-traced images using the Omniverse cloud API.
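For a sense of what "generating OpenUSD Python code" means, here is a hand-written minimal sketch of the kind of code USD Code is meant to produce, using the open-source usd-core package (this example is illustrative and was not generated by the service):

```python
from pxr import Usd, UsdGeom  # pip install usd-core

# Create a new USD stage: the container for a 3D scene description.
stage = Usd.Stage.CreateNew("hello_world.usda")

# Define a transform prim as the scene root, with a cube beneath it.
world = UsdGeom.Xform.Define(stage, "/World")
cube = UsdGeom.Cube.Define(stage, "/World/Cube")
cube.GetSizeAttr().Set(2.0)

# Lift the cube one unit above the ground plane.
UsdGeom.XformCommonAPI(cube.GetPrim()).SetTranslate((0.0, 1.0, 0.0))

stage.SetDefaultPrim(world.GetPrim())
stage.GetRootLayer().Save()
```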
Nvidia stated that as NIM microservices enhance OpenUSD and make it more accessible, industries of all kinds will be able to build physics-based virtual worlds and digital twins. Through the new OpenUSD-based generative AI and the Nvidia accelerated development frameworks built on the Nvidia Omniverse platform, more industries can now develop applications for visualizing industrial design and engineering projects, and for simulating environments to build the next wave of physical AI and robots. In addition, new USD connectors link robotics and industrial simulation data formats and developer tools, enabling users to stream large-scale, fully Nvidia RTX ray-traced datasets to Apple Vision Pro.
In short, exposing USD through Nvidia NIM and using large models to better understand the physical world and build virtual worlds produces very valuable digital assets. For example, when Notre-Dame de Paris suffered a severe fire in 2019, large parts of the cathedral were destroyed. Fortunately, Ubisoft's game designers had visited the building countless times to study its structure and had completed a digital restoration of Notre-Dame, recreating all of its details in the AAA game "Assassin's Creed Unity," which in turn greatly helped the real-world restoration. That replica took designers and historians two years to build; with this technology, we can create such digital copies far faster in the future, using AI to understand and replicate the physical world in finer detail.
Another example: designers can build basic 3D scenes in Omniverse and use them to condition generative AI, achieving a controllable, collaborative content-creation process. WPP and The Coca-Cola Company, for instance, were the first to adopt this workflow to scale their global advertising campaigns.
Nvidia also announced several upcoming NIM microservices, including USD Layout, USD Smart Material, and fVDB Mesh Generation, to further enhance developers' capabilities and efficiency on the OpenUSD platform.
This time, Nvidia Research brought more than 20 papers to the conference, sharing innovations in synthetic data generators and inverse rendering tools; two of them won the Best Technical Paper Award. This year's research shows that AI improves simulation by enhancing image quality and unlocking new 3D representation methods, while improved synthetic data generators and richer content in turn enhance AI's capabilities. Together, these studies showcase Nvidia's latest advances in AI and simulation.
Nvidia stated that designers and artists now have new and improved ways to increase productivity using generative AI trained on licensed data. For example, Shutterstock (a US stock-image provider) launched the commercial beta of its generative 3D service, which lets creators quickly prototype 3D assets and generate 360° HDRi backgrounds to light scenes using only text or image prompts. Getty Images (a US image-licensing company) accelerated its generative AI service, doubling image-generation speed and improving output quality. These services are built on Nvidia Edify, a multimodal generative AI architecture whose new models double speed, improve image quality and prompt adherence, and let users control camera settings such as depth of field and focal length. Users can generate four images in about six seconds and upscale them to 4K resolution.
Conclusion
Wherever Jensen Huang appears, he is always in his leather jacket, describing to the world the exciting future that AI will bring.
We have watched Nvidia grow step by step from a gaming GPU giant into an AI chip leader, and now into a company with a full-stack layout across AI hardware and software. Nvidia's ambition is unmistakable, and it iterates rapidly at the forefront of each AI technology wave.
From programmable-shading GPUs and CUDA accelerated computing, to the introduction of Nvidia Omniverse and generative AI NIM microservices, to advancing 3D modeling, robot simulation, and digital twin technologies, each step signals the arrival of a new round of AI industry innovation.
However, large companies have more resources, including funds, technology, and manpower, and can adopt and implement advanced technologies such as Nvidia NIM faster, while small and medium-sized enterprises, with limited resources, may struggle to keep up with the pace of technological development. Added to existing gaps in talent and technical capability, will this lead to greater technological inequality in the future?
The AI people envision is one that frees human hands from labor and brings a world of higher productivity. But when productivity and the means of production are controlled by a few, will that trigger a deeper crisis? These are questions we all need to think about.