LLM inference: Output format significantly impacts performance, especially JSON

Strict format restrictions may weaken reasoning abilities.

Research has found that format constraints reduce the reasoning ability of large language models (LLMs), with JSON being the most damaging. The main findings include:

  1. The stricter the format constraint, the worse the model's reasoning. JSON schema performs worst, followed by format-restricted instructions (FRI), then natural-language-to-format conversion, and finally unconstrained natural language prompts.

  2. Different models have different format preferences: GPT prefers YAML, Claude prefers XML, and Gemini/Gemma prefer JSON.

  3. Reasons why format constraints reduce reasoning ability:

    • Limits the ability to generate intermediate reasoning steps
    • Forces formats incompatible with the model's natural generation method
    • Format errors may cause correct reasoning to be judged as incorrect

  4. Solutions:

    • The best approach is "natural language to format conversion": answer in natural language first, then convert to the target format
    • Pay attention to the order of keys in structured output
    • Reduce parsing errors through corrective prompts

  5. A balance is needed between easily parsable formats and preserving reasoning ability.

  6. LLMs as answer parsers are better at understanding answer meanings and context than regular expressions.
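The "natural language first, then convert" workaround from point 4 can be sketched as a two-step pipeline. In the sketch below, `call_llm` is a hypothetical stand-in for any chat-completion API (here it just echoes canned responses so the code runs); the prompt wording is an illustrative assumption, not the paper's exact prompt. Note the `"reasoning"` key is requested before `"answer"`, reflecting the key-ordering advice from point 4.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion API call.
    # It returns canned responses so this sketch is runnable offline.
    if "Convert" in prompt:
        return json.dumps({"reasoning": "5 workers * 3 widgets = 15", "answer": 15})
    return "Each of the 5 workers makes 3 widgets, so 5 * 3 = 15 widgets in total."

def solve_then_format(question: str) -> dict:
    # Step 1: let the model reason freely in natural language,
    # with no format constraint weakening the reasoning.
    free_text = call_llm(f"Answer step by step in plain prose:\n{question}")
    # Step 2: a separate call only converts the finished answer to JSON.
    # Asking for "reasoning" before "answer" preserves reasoning-first
    # order even inside the structured output.
    conversion_prompt = (
        "Convert the following answer to JSON with keys "
        '"reasoning" (string) then "answer" (number):\n' + free_text
    )
    return json.loads(call_llm(conversion_prompt))

result = solve_then_format("5 workers each make 3 widgets. How many widgets?")
print(result)
```

The key design choice is that the format constraint only ever applies to a conversion task, never to the reasoning task itself.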

The research suggests that when applying LLMs, a trade-off between format constraints and reasoning ability is necessary to achieve optimal performance.
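Point 6 can be illustrated with a small contrast: a regex extractor only matches answers in one fixed surface form, while an LLM-based parser can understand paraphrased answers. `llm_parse` below is a hypothetical stub (a tiny lookup table stands in for the model) that shows the intended contract; only the regex behavior is real.

```python
import re

def regex_parse(response: str):
    # A typical rigid extractor: only matches the literal pattern
    # "answer is <digits>", anywhere in the response.
    m = re.search(r"answer is (\d+)", response, re.IGNORECASE)
    return int(m.group(1)) if m else None

def llm_parse(response: str):
    # Hypothetical stand-in for an LLM answer parser that understands
    # meaning and context; a small lookup keeps the sketch runnable.
    paraphrases = {"forty-two": 42}
    for phrase, value in paraphrases.items():
        if phrase in response.lower():
            return value
    return regex_parse(response)

print(regex_parse("The answer is 42"))              # fixed pattern: extracted
print(regex_parse("So we end up with forty-two."))  # paraphrase: regex misses it
print(llm_parse("So we end up with forty-two."))    # meaning-aware parsing recovers it
```

This is why the paper favors LLM parsers over regular expressions for scoring: the regex judges the surface string, not the answer's meaning.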

Paper link