AI-Driven Tech Giants: Efficiency Soars, Human-Machine Collaboration Reshapes Organizational Structures

01 Key Points

Here are the key findings Alex reported after organizing AI agents into structures modeled on companies such as Apple, Microsoft, and Google:

Companies with multiple "competing" teams (i.e. competing to produce the best final product), like Microsoft and Apple, outperformed centralized hierarchical structures.
Systems with single points of failure (like one leader making important decisions), such as Google, Amazon and Oracle, performed poorly.
The organizational structures of large tech companies had a moderate but noticeable impact on problem-solving abilities.

02 AI Agents and Tech Giant Organizations

Previous attempts to improve performance on benchmarks like SWE-bench by simply increasing the number of AI agents did not achieve significant results.

This suggests that adding more agents alone cannot solve the problem.

So what other methods can make AI agents better at software engineering?

Three weeks ago, Alex happened to see an article by James Huckle about "Conway's Law": the idea that software and product architecture is destined to reflect the structure of the organization that created it.

James shared an illustration caricaturing the organizational structures of Amazon, Google, Facebook, Microsoft, Apple and Oracle, and proposed an idea:

Just like humans in large tech companies, multi-agent communication structures may shape problem-solving approaches.

Inspired by this, Alex decided to test James' hypothesis on SWE-bench instances.

03 Experimental Setup

Alex organized AI agents into six different company structures and evaluated each one on the 13-instance "mini" subset of SWE-bench-lite.

He designed each multi-agent structure around a few core observations about the corresponding company:

Amazon

A binary tree with a "manager" at the top.

To replicate this structure, Alex used a large number of agents performing code repository searches, and a single agent executing final code repository updates.
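The wiring described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` class, `build_binary_tree`, `run_tree`, and the `fake_search` / `fake_update` stand-ins are hypothetical names, and real agents would call a language model rather than return canned strings. Alex's actual implementation is not shown in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One agent in the tree (hypothetical structure, not Alex's code)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def build_binary_tree(depth: int, name: str = "manager") -> Node:
    """Build a full binary tree; depth 0 is a single node with no children."""
    if depth == 0:
        return Node(name)
    kids = [build_binary_tree(depth - 1, f"{name}.{i}") for i in (0, 1)]
    return Node(name, kids)


def run_tree(node: Node, issue: str, search: Callable[[str, str], str]) -> List[str]:
    """Leaves search the repository; inner nodes only forward findings upward."""
    if not node.children:
        return [search(node.name, issue)]
    findings: List[str] = []
    for child in node.children:
        findings.extend(run_tree(child, issue, search))
    return findings


def fake_search(agent: str, issue: str) -> str:
    # Stand-in for an LLM-backed repository search (assumption).
    return f"{agent} found context for '{issue}'"


def fake_update(findings: List[str]) -> str:
    # Stand-in for the single agent that writes the final repository update.
    return f"patch based on {len(findings)} findings"


root = build_binary_tree(depth=2)          # 4 leaf searchers under one manager
findings = run_tree(root, "fix bug", fake_search)
patch = fake_update(findings)              # only one agent produces the patch
print(patch)
```

Note that every finding funnels into one final call, which is the single point of failure mentioned in the key points.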

Google

A tree structure similar to Amazon, but with more connections between middle layers.

Alex replicated this by aggregating all agent results within a single layer and passing them to agents in the next layer.
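A minimal sketch of this layer-by-layer aggregation is shown here, assuming each agent is a plain function from context text to result text. The names `run_layers` and `make_agent` are hypothetical, and the toy agents only report how many inputs they saw so the data flow is visible.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # hypothetical: context in, result out


def run_layers(layers: List[List[Agent]], issue: str) -> str:
    """Run each layer on the combined output of the previous layer."""
    context = issue
    for layer in layers:
        results = [agent(context) for agent in layer]
        # Aggregate the whole layer before handing off, so every agent in
        # the next layer sees every result from this one.
        context = "\n".join(results)
    return context


def make_agent(name: str) -> Agent:
    # Toy agent: reports how many input lines it received.
    return lambda context: f"{name}({len(context.splitlines())} inputs)"


layers = [
    [make_agent("search_a"), make_agent("search_b"), make_agent("search_c")],
    [make_agent("plan_a"), make_agent("plan_b")],
    [make_agent("writer")],
]
final = run_layers(layers, "fix bug")
print(final)
```

Compared with a strict tree, each agent here receives the output of every agent in the layer above it, not just its own children.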

Meta (Facebook)

No hierarchy; instead, a mesh organization with many connections between agents.

Alex modified the original agent design by increasing the possibility of transitions between different agents.
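One way to picture the change is as a transition table between agents. The sketch below assumes "increasing the possibility of transitions" means allowing more hand-offs between agents; the article does not say exactly how this was done, and the agent names and helper functions are hypothetical.

```python
import random
from typing import Dict, List

AGENTS = ["searcher", "planner", "coder", "reviewer"]  # hypothetical roles


def chain_transitions(agents: List[str]) -> Dict[str, List[str]]:
    """Baseline: each agent may only hand off to the next one in line."""
    return {a: agents[i + 1 : i + 2] for i, a in enumerate(agents)}


def mesh_transitions(agents: List[str]) -> Dict[str, List[str]]:
    """Mesh: each agent may hand off to any other agent."""
    return {a: [b for b in agents if b != a] for a in agents}


def walk(transitions: Dict[str, List[str]], start: str,
         max_steps: int, rng: random.Random) -> List[str]:
    """Follow hand-offs until the step limit or a dead end is reached."""
    path = [start]
    while len(path) < max_steps and transitions[path[-1]]:
        path.append(rng.choice(transitions[path[-1]]))
    return path


chain = chain_transitions(AGENTS)
mesh = mesh_transitions(AGENTS)
chain_path = walk(chain, "searcher", 6, random.Random(0))
mesh_path = walk(mesh, "searcher", 6, random.Random(0))
print(chain_path)
print(mesh_path)
```

In a real system the next agent would be chosen by the model rather than at random; the random walk only stands in for that decision.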

Microsoft

Emphasizes competing teams, each with its own hierarchy.

Alex adapted the Amazon structure (with fewer agents) and used a vector-similarity voting method to select the "best" solution from three separate runs, each with a slightly adjusted hierarchy.
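The article does not describe the voting method in detail. One plausible reading is consensus voting: embed each candidate patch as a vector and pick the one most similar, on average, to the others. The sketch below makes that assumption and uses a toy bag-of-words embedding; `embed`, `cosine`, `vote`, and the sample patches are all hypothetical, and a real system would likely use a learned embedding model.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: whitespace token counts (assumption, for illustration)."""
    return Counter(text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def vote(patches: List[str]) -> int:
    """Return the index of the patch most similar, on average, to the rest."""
    vectors = [embed(p) for p in patches]

    def score(i: int) -> float:
        others = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
        return sum(others) / len(others) if others else 0.0

    return max(range(len(patches)), key=score)


# Three candidate patches, one per run (made-up examples).
patches = [
    "return x + 1",
    "return x + 1 if x else 0",
    "return x + 2",
]
winner = vote(patches)
print(patches[winner])
```

The design choice here is that agreement between independent runs is treated as a proxy for correctness, which needs no test execution at selection time.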

Apple

Many small competing teams, each with its own minimal structure.

Alex used the same "best solution" approach as for Microsoft, but with more runs and no agent hierarchies (each run used different transitions).

Oracle

Two distinct teams: a larger "legal" binary tree and a smaller engineering tree.

Alex interpreted the legal team as agents searching the code repository and retrieving key context, while the engineering team consisted of agents actually writing code.

The structures of both teams are similar to Amazon, with a single agent at the top coordinating information transfer between "legal" and "engineering".
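The two-team flow can be sketched as below. For brevity each team is flattened to a list instead of a tree, and the team sizes, `legal_agent`, `engineer`, and `coordinate` are hypothetical names chosen for illustration, not details from Alex's setup.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # hypothetical: text in, text out


def legal_agent(name: str) -> Agent:
    # "Legal" agents search the repository and return key context.
    return lambda issue: f"{name}: context for '{issue}'"


def engineer(name: str) -> Agent:
    # Engineers write code; this toy one reports how much context it received.
    return lambda brief: f"{name}: patch from {len(brief.splitlines()) - 1} context lines"


def coordinate(issue: str, legal: List[Agent], engineering: List[Agent]) -> List[str]:
    """Single coordinator: the only link between the two teams."""
    context = [agent(issue) for agent in legal]       # step 1: gather context
    brief = "\n".join([issue] + context)              # step 2: hand it over
    return [agent(brief) for agent in engineering]    # step 3: write code


legal_team = [legal_agent(f"legal_{i}") for i in range(4)]    # larger team
engineering_team = [engineer(f"eng_{i}") for i in range(2)]   # smaller team
drafts = coordinate("fix bug", legal_team, engineering_team)
print(drafts)
```

Because all information passes through `coordinate`, the two teams never talk to each other directly.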

04 Evaluation Results

Each organization's set of generated patches was scored with the standard SWE-bench evaluation.

The results are as follows:

Organizational Chart Performance Analysis

Here are some of the author's observations on how different company structures affect performance:

Competitive teams increase chances of success.

The two best performers (Microsoft and Apple) both had multiple teams competing to solve problems, while other companies seemed to have only one huge team generating a single patch.
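A back-of-the-envelope calculation shows why competing teams can help. Assume, purely for illustration, that each team solves an instance independently with probability p and that the selection step always picks a correct patch when one exists. Neither assumption comes from the experiment, and the value of p below is hypothetical, not a measured result.

```python
def best_of_k(p: float, k: int) -> float:
    """Chance that at least one of k independent teams succeeds."""
    return 1.0 - (1.0 - p) ** k


p = 0.3  # hypothetical per-team success rate, not taken from the results
for k in (1, 3, 5):
    print(k, round(best_of_k(p, k), 3))
```

In practice the gain would be smaller: runs that share a model tend to fail on the same instances, and similarity voting can pick a wrong patch.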