Llama 3 Performance Greatly Improved: Agent Q Emerges, Challenging OpenAI's New Project

AI startup MultiOn launches Agent Q, an advanced AI agent.


Agent Q is described as a self-supervised agent framework capable of reasoning and searching. It can engage in self-play and reinforcement learning through real tasks on the internet, allowing for self-correction and autonomous improvement.
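The explore → critique → improve cycle described here can be sketched very loosely as a toy loop. Everything below is illustrative: the classes, the one-step "web task", and the preference-based update are stand-ins for intuition, not MultiOn's implementation.

```python
import random

class ToyEnv:
    """A one-step stand-in for a web task: action 1 succeeds, action 0 fails."""
    def rollout(self, policy):
        action = policy.act()
        reward = 1.0 if action == 1 else 0.0
        return (action, reward)

class ToyCritic:
    """Ranks trajectory pairs by reward (stand-in for AI self-criticism)."""
    def rank(self, trajectories):
        pairs = []
        for a, b in zip(trajectories[::2], trajectories[1::2]):
            if a[1] != b[1]:
                chosen, rejected = (a, b) if a[1] > b[1] else (b, a)
                pairs.append((chosen[0], rejected[0]))
        return pairs

class ToyPolicy:
    """Bernoulli policy over {0, 1}; 'fine-tuning' nudges p toward chosen actions."""
    def __init__(self, p=0.2):
        self.p = p
    def act(self):
        return 1 if random.random() < self.p else 0
    def finetune(self, pairs, lr=0.2):
        for chosen, _ in pairs:
            target = 1.0 if chosen == 1 else 0.0
            self.p += lr * (target - self.p)
        return self

def self_improvement_loop(policy, env, critic, n_iterations=5):
    for _ in range(n_iterations):
        # 1. Explore: roll out the current policy on real tasks.
        trajectories = [env.rollout(policy) for _ in range(32)]
        # 2. Critique: turn outcomes into (preferred, rejected) pairs.
        pairs = critic.rank(trajectories)
        # 3. Improve: update the policy offline on the ranked pairs.
        policy = policy.finetune(pairs)
    return policy

random.seed(0)
policy = self_improvement_loop(ToyPolicy(), ToyEnv(), ToyCritic())
```

After a few iterations the toy policy's success probability climbs from 0.2 toward 1.0, mirroring the self-correction the framework claims, though the real system operates over multi-step browser trajectories rather than a single coin flip.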

The CEO of MultiOn, Div Garg, frequently uses a strawberry emoji when mentioning Agent Q on Twitter, fueling speculation about connections to OpenAI's Q* project.

Agent Q has its own Twitter account that posts unusual and human-like content. The account's background image and profile information make numerous references to strawberries, even using a photo of strawberries from Sam Altman's garden.

Interestingly, the account is followed by several tech leaders and influencers, including Y Combinator CEO Garry Tan, Quora CEO Adam D'Angelo, New York Times columnist Kevin Roose, Wharton AI professor Ethan Mollick, and multiple OpenAI employees. Sam Altman has also recently interacted with the account.

According to Div Garg, Agent Q has planning, reasoning, and self-repair capabilities. The company claims to have improved Llama 3's zero-shot performance by 340% with just one day of training, achieving a 95.4% success rate on real-world booking tasks.

The official demo video shows Agent Q performing tasks like booking restaurants, meetings, and flights, involving multi-step planning, reasoning, decision-making, and interaction with various applications.

While MultiOn has published a research paper, Agent Q is not yet available for public testing. Users can join a waitlist to apply for beta access.

Agent Q combines guided Monte Carlo Tree Search (MCTS), AI self-reflection, iterative fine-tuning, and Direct Preference Optimization (DPO) to improve generalization in multi-step reasoning tasks. Key components include:

  1. MCTS-based guided search to autonomously generate diverse data
  2. AI self-criticism for step-level feedback
  3. DPO for off-policy training on aggregated datasets
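Step 3's DPO objective can be written out for a single preference pair. This is a minimal sketch of the standard DPO loss, assuming summed sequence log-probabilities under the policy and a frozen reference model; the function name and toy values are illustrative, not taken from the paper's code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) trajectory pair.

    Each argument is the summed token log-probability of the full action
    sequence under the trained policy or the frozen reference model.
    """
    # Implicit reward margins: how much more the policy favors each
    # branch than the reference model does.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Logistic loss on the scaled margin gap: minimized as the policy
    # widens the gap toward the preferred branch.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid)

# Branch pairs ranked by MCTS visit counts and AI critic feedback would
# feed in here; these toy numbers just show the loss shrinking once the
# policy prefers the chosen branch.
loss_neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)  # no preference learned yet
loss_better = dpo_loss(-3.0, -7.0, -5.0, -5.0)   # policy favors chosen branch
```

The off-policy aspect is that these pairs come from previously aggregated search trees, so the policy can be updated without fresh rollouts on every step.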

Evaluation experiments show significant improvements over baseline methods on both simulated and real-world tasks. On the OpenTable booking task, Agent Q improved Llama 3's zero-shot success rate from 18.6% to 95.4%.