AI Ping Pong Robot Defeats Humans, Reaches Intermediate Player Level

DeepMind's New Artificial Intelligence Creation

Exceeding expectations, this robot has already reached an intermediate level.

After seeing its performance, netizens responded: Can we buy it? I want one.

Able to handle unexpected situations with ease

Table tennis is a sport that demands strong all-around abilities in physical fitness, strategy, and technique; humans often need years of training to master it.

Therefore, unlike pure strategy games like chess or Go, table tennis has become an important benchmark for testing robots' comprehensive abilities, including high-speed movement, precise real-time control, strategic decision-making, and overall system design.

For example, when balls land in different spots, the robot needs to reposition itself quickly; when a ball is clearly heading out of bounds, it should choose not to return it.

The team recruited 29 table tennis players of different skill levels to compete against it: beginner, intermediate, advanced, and advanced+.

Humans and the robot played 3 games, following standard table tennis rules. (However, since the robot cannot serve, the humans served throughout.)

Table tennis robots have been studied before. What sets this Google robot apart is that it can play full competitive matches against humans it has never seen before.

It can quickly adapt to various playing styles of humans.

For example, watch this player: at the start of the match, the robot was clearly still adapting, and the human beat it by a wide margin, 9-2.

But by the very next game, the robot had clearly gotten familiar with the opponent's style, staying close on the score as the two sides traded points back and forth.

In the end, across all opponents, the robot won every match against beginners and 55% of its matches against intermediate players.

Although the robot cannot yet beat advanced players, feedback from the human players shows that everyone was eager to play with it.

How did the robot master table tennis?

Before introducing the method, let's take a look at the hardware configuration of the table tennis robot.

The main body is a 6-degree-of-freedom ABB IRB 1100 robotic arm from the Swiss company ABB, mounted on two Festo linear gantries that let it move in a plane: the horizontal gantry is 4 meters long, and the vertical one is 2 meters long.

The arm carries a paddle with a 3D-printed handle, covered with short-pips rubber.

How did this little thing learn to play table tennis?

In summary, it used a hybrid training method combining reinforcement learning and imitation learning.

The team designed a hierarchical, modular policy architecture: the agent consists of a library of low-level skill controllers (LLCs) and a high-level controller (HLC).

The LLCs are specialized policies, each trained to perform a specific table tennis skill, such as forehand hits, backhand hits, or serves. Each LLC uses a CNN architecture and is trained in simulation with an evolution strategies algorithm.
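To make the training recipe concrete, here is a minimal evolution strategies loop in Python. It is only a sketch: `rollout_return` stands in for running a policy in the simulator, the toy objective replaces the real episode reward, and the actual system trains CNN policies with a more sophisticated ES variant.

```python
import numpy as np

def rollout_return(theta: np.ndarray) -> float:
    """Placeholder for evaluating policy parameters in the simulator."""
    return float(-np.sum((theta - 1.0) ** 2))  # toy objective standing in for episode return

def es_step(theta: np.ndarray, pop_size: int = 64, sigma: float = 0.05, lr: float = 0.02) -> np.ndarray:
    # Sample antithetic perturbations of the parameters.
    eps = np.random.randn(pop_size // 2, theta.size)
    eps = np.concatenate([eps, -eps])
    returns = np.array([rollout_return(theta + sigma * e) for e in eps])
    # Rank-normalize returns so the gradient estimate is scale-free.
    ranks = returns.argsort().argsort() / (len(returns) - 1) - 0.5
    grad = (ranks[:, None] * eps).mean(axis=0) / sigma
    return theta + lr * grad

theta = np.zeros(8)        # toy "policy parameters"
for _ in range(200):
    theta = es_step(theta)
print(theta)               # should drift toward the toy optimum at all-ones
```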

Training used ball-state data collected from the real world to keep the simulated environment consistent with the real one.

The HLC is responsible for selecting the most appropriate LLC for each incoming ball.

It contains multiple components: a style policy that chooses forehand or backhand; a spin classifier that identifies the type of spin on the incoming ball; LLC skill descriptors that describe each LLC's capabilities; and a set of heuristic strategies that shortlist candidate LLCs for the current situation.

The HLC also uses online learning of LLC preferences to adapt to opponent characteristics and bridge the sim-to-real gap.
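Putting the pieces together, the HLC's per-ball decision can be pictured as a filter-then-sample routine. Everything below is illustrative: `SkillDescriptor`, `choose_llc`, and the heuristic filter are hypothetical stand-ins for the paper's components, not its actual code.

```python
import random
from dataclasses import dataclass

@dataclass
class SkillDescriptor:
    name: str                # e.g. "forehand_topspin"
    style: str               # "forehand" or "backhand"
    spin_types: frozenset    # spins this skill handles well
    preference: float = 1.0  # online-learned weight, updated between points

def choose_llc(skills, incoming_spin, chosen_style):
    # The style policy has already picked forehand or backhand; filter on it,
    # then shortlist skills whose descriptors match the spin classifier's estimate.
    shortlist = [s for s in skills
                 if s.style == chosen_style and incoming_spin in s.spin_types]
    if not shortlist:  # fall back to a style-only match
        shortlist = [s for s in skills if s.style == chosen_style]
    # Sample in proportion to learned preferences so online adaptation can
    # shift which skills get used against a given opponent.
    weights = [s.preference for s in shortlist]
    return random.choices(shortlist, weights=weights, k=1)[0]

skills = [
    SkillDescriptor("forehand_topspin", "forehand", frozenset({"topspin"})),
    SkillDescriptor("forehand_push", "forehand", frozenset({"backspin"})),
    SkillDescriptor("backhand_block", "backhand", frozenset({"topspin", "none"})),
]
print(choose_llc(skills, incoming_spin="topspin", chosen_style="forehand").name)
```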

Specifically, the team first collected a small amount of human match data to set the initial task conditions, trained an agent in simulation with reinforcement learning, and then deployed the policy zero-shot to the real world.

They used the MuJoCo physics engine to accurately simulate ball and robot dynamics, including air resistance and the Magnus effect, and handled topspin "correction" by switching paddle parameters in simulation to reproduce the real-world effects of topspin and backspin.
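For intuition about what the simulator has to capture, here is a back-of-the-envelope model of the forces on the ball. The coefficients are rough textbook values for a standard 40 mm ball, not the paper's tuned simulation parameters.

```python
import numpy as np

RHO = 1.2          # air density, kg/m^3
R = 0.02           # ball radius, m
A = np.pi * R**2   # cross-sectional area, m^2
M = 0.0027         # ball mass, kg
CD, CM = 0.4, 1.0  # drag and Magnus coefficients (approximate)

def ball_acceleration(v: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Acceleration from gravity, air drag, and the Magnus force,
    given velocity v (m/s) and spin omega (rad/s)."""
    speed = np.linalg.norm(v)
    gravity = np.array([0.0, 0.0, -9.81])
    drag = -0.5 * RHO * CD * A * speed * v / M
    # The Magnus force is perpendicular to both the spin axis and the velocity.
    magnus = 0.5 * RHO * CM * A * R * np.cross(omega, v) / M
    return gravity + drag + magnus

# Topspin (omega along +y for a ball moving along +x) curves the ball downward.
print(ball_acceleration(np.array([5.0, 0.0, 1.0]), np.array([0.0, 150.0, 0.0])))
```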

As the agent keeps playing against humans, new training task conditions are generated, and the train-then-deploy cycle can be repeated.

As the robot's skills gradually improve, the matches become increasingly complex while still being grounded in real-world task conditions. From the collected data, the robot can also discover its own weaknesses and then make up for them through further training in simulation.

Through this method, the robot's skills improve automatically and iteratively in a cycle that combines simulation and reality.
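The whole cycle can be summarized in a short skeleton. Every function below is a stub with a hypothetical name; it only shows the shape of the loop: deploy zero-shot, gather new task conditions from real play, retrain in simulation, repeat.

```python
import random

def train_in_simulation(policy, task_conditions):
    """Stand-in for RL plus imitation learning in the MuJoCo simulator."""
    return (policy or 0.0) + 0.1 * len(task_conditions)  # toy "skill" score

def collect_real_matches(policy, task_conditions):
    """Stand-in for zero-shot deployment: play humans, log new ball states."""
    return [f"condition_{random.randint(0, 999)}" for _ in range(3)]

task_conditions = ["seed_human_data"]           # small initial human dataset
policy = train_in_simulation(None, task_conditions)
for cycle in range(5):
    new_data = collect_real_matches(policy, task_conditions)  # real-world play
    task_conditions.extend(new_data)            # grow the task set with failures
    policy = train_in_simulation(policy, task_conditions)     # patch weaknesses
    print(f"cycle {cycle}: {len(task_conditions)} task conditions")
```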

In addition, the robot can track an opponent's behavior and playing style, such as which part of the table they tend to return the ball to, and adapt accordingly.

This allows it to try different techniques, monitor its success rate, and adjust its strategy in real time.
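One simple way to picture this real-time adjustment is a bandit-style update of the LLC preference weights. The exponential moving average below is an illustrative stand-in, assuming per-point win/loss feedback; it is not the paper's exact update rule.

```python
# Hypothetical skill names; weights feed the HLC's preference-weighted sampling.
preferences = {"forehand_topspin": 1.0, "forehand_push": 1.0, "backhand_block": 1.0}

def update_preference(skill_name: str, won_point: bool, step: float = 0.2) -> None:
    # Nudge the skill's weight toward its observed success rate
    # against the current opponent.
    outcome = 1.0 if won_point else 0.0
    preferences[skill_name] = (1 - step) * preferences[skill_name] + step * outcome

# After losing a point played with a forehand push, its weight drops,
# so that skill gets sampled less often against this opponent.
update_preference("forehand_push", won_point=False)
print(preferences)
```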

In matches against humans, the team also found one of the robot's weaknesses: it is not good at handling backspin balls.

According to the estimation of ball spin,