Coatue, one of the world's top hedge funds, recently released a major report on "embodied intelligence" titled "The Path to General-Purpose Robots".
Coatue believes that AI robots are a disruptive force with the potential to become one of the biggest technological waves in human history, and that they deserve close attention.
The report has many highlights: it analyzes in detail the challenges AI robots face today, offers a reasoned outlook for the industry, and gives professional opinions from an investment perspective. Whether you are a tech investor, an AI practitioner, or simply interested in robots, it is worth reading.
Below I'll walk you through this report. The link to the report is at the end of the article, and interested readers are welcome to read the original.
(1) Lofty ideals, harsh reality
The robotics industry may be one of the industries with the biggest gap between demos and reality.
In 1961, the first industrial robot went into service at GM, on an automotive production line.
After more than 50 years of development, robots have become more diverse in form and are used in a wider range of scenarios, including floor-cleaning robots, quadruped robots, and humanoid robots.
Historically, robot penetration has actually grown linearly.
Taking industrial robots as an example, the number of robots per 10,000 manufacturing employees rose from 53 in 2013 to 151 in 2022, a compound annual growth rate (CAGR) of about 12%.
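The growth rate quoted above can be checked with a quick calculation. This is a minimal sketch that assumes the figure is a standard compound annual growth rate over the nine years from 2013 to 2022; the function name `cagr` and the variable names are my own, not from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.12 means 12%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: robots per 10,000 manufacturing employees.
density_2013 = 53
density_2022 = 151

# Assumption: nine compounding periods between 2013 and 2022.
rate = cagr(density_2013, density_2022, years=2022 - 2013)
print(f"CAGR 2013-2022: {rate:.1%}")
```

Under these assumptions the result comes out a little above 12%, which matches the rounded figure in the text.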
Although the robotics industry as a whole has developed steadily, individual companies have not performed well.
Robotics companies generally struggle to commercialize, and they also face huge upfront capital expenditures. As a result, many robotics companies went bankrupt in 2022-2023.
(2) Spatial intelligence makes general-purpose robots possible
The previous generation of robots focused on single tasks: floor-cleaning robots only cleaned, agricultural drones only irrigated farmland, industrial robots only did mechanical welding, and so on.
However, with the emergence of general-purpose AI, the next generation of robots is expected to become "general-purpose robots", capable of handling a variety of tasks and environments.
Just as large language models make language reasoning a reality, large spatial models are expected to break the fourth wall, allowing AI to truly understand the physical world and interact with it.
(3) The core challenge facing robots: lack of training data
Tasks that are simple for humans may not be easy for robots.
Coatue gave three specific examples: dexterity, spatial perception, and balance recovery.
To overcome these problems, robots need to be trained on massive amounts of data to become smarter.
However, robotics is a very new field and has accumulated very little training data.
Comparing the largest datasets across modalities: text has about 15T tokens, images have 6B image-text pairs, and video has 2.6B audio-visual features.
The robot modality, by contrast, has only 2.4 million data segments, far fewer than any of the other modalities.
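To get a feel for the size of this gap, the figures above can be put side by side. This is a rough sketch only: the units differ across modalities (tokens, image-text pairs, audio-visual features, robot data segments), so the ratios are an order-of-magnitude illustration of my own, not a comparison made in the report.

```python
import math

# Largest dataset per modality, as quoted in the text.
# Assumption: treating one token, one pair, one feature and one robot
# segment as comparable "items", which is a simplification.
dataset_sizes = {
    "text (tokens)": 15e12,
    "image (image-text pairs)": 6e9,
    "video (audio-visual features)": 2.6e9,
    "robot (data segments)": 2.4e6,
}

robot_size = dataset_sizes["robot (data segments)"]

# How many times larger each dataset is than the robot dataset.
gaps = {name: size / robot_size for name, size in dataset_sizes.items()}

for name, gap in gaps.items():
    magnitude = math.log10(gap)
    print(f"{name:32s} {gap:>12,.0f}x robot data (about 10^{magnitude:.1f})")
```

Even with the caveat about units, the robot dataset is roughly three orders of magnitude smaller than the image and video datasets, and more than six orders of magnitude smaller than the text dataset.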
(4) Four ways to collect robot training data
Since data is the core bottleneck for robot development, which methods can accumulate robot training data quickly?
In recent years a steady stream of research has appeared in this area, gradually coalescing into four approaches.
Robot data collection method 1: Teleoperation
As the name suggests, human operators use handheld controllers to remotely drive a robot through the desired motions, and the resulting recordings become training data.
Robot data collection method 2: AR
In a study called "Explainable Human-Robot Training and Cooperation with Augmented Reality", researchers used AR (augmented reality) to make human-robot interaction more explainable, accumulating data in the process.
Robot data collection method 3: Simulation
Large robot training datasets are generated in simulation, using massive amounts of computing power.
Simulation may be the path most likely to achieve large-scale data generation at present, but it requires huge computing power behind it.
Jim Fan's team at Nvidia is currently taking this technical path.
Robot data collection method 4: Video learning
Using multimodal large models, robots learn human actions directly from videos, thereby accumulating training data.
(5) The golden cross of robot cost and human wages
As GPU costs have fallen, the cost of training large models has dropped significantly.
Over the past year, the rental price of an A100 GPU on the Azure cloud platform has fallen from $6/hour to $1.5/hour, a decrease of 75%.
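The size of this price drop is easy to verify, and it helps to see what it means for a training budget. The sketch below is illustrative only: the 1,000 GPU-hour job is a hypothetical workload chosen for round numbers, and the function and variable names are my own.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Relative price decrease as a fraction (0.75 means 75%)."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price


# Hourly A100 rental prices quoted in the text (USD per GPU-hour).
old_price = 6.0
new_price = 1.5

drop = percent_drop(old_price, new_price)
print(f"Price decrease: {drop:.0%}")

# Hypothetical workload, not from the report: a 1,000 GPU-hour training job.
gpu_hours = 1_000
old_cost = old_price * gpu_hours
new_cost = new_price * gpu_hours
print(f"Cost of {gpu_hours:,} GPU-hours: ${old_cost:,.0f} before, ${new_cost:,.0f} after")
```

Put differently, the same budget now buys four times as many GPU-hours as it did a year earlier.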