Research has found that format constraints reduce the reasoning ability of large language models (LLMs), especially under JSON format. The main findings include:
- The stricter the format constraint, the worse the model's reasoning: JSON schema performs worst, followed by format-restricted instructions (FRI), then natural-language-to-format conversion, and finally plain natural language prompts.
- Different models have different format preferences: GPT prefers YAML, Claude prefers XML, and Gemini/Gemma prefer JSON.
- Reasons why format constraints reduce reasoning ability:
  - They limit the model's ability to generate intermediate reasoning steps
  - They force output formats incompatible with the model's natural generation patterns
  - Formatting errors can cause correct reasoning to be judged as incorrect
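The last point is easy to reproduce with a strict parser: a model may produce a correct answer but wrap it in a markdown code fence, which a naive `json.loads` call rejects. A minimal sketch (the fenced response string is an invented example):

```python
import json

# A hypothetical model response: the answer is correct, but the JSON
# is wrapped in a markdown code fence.
raw_output = """```json
{"answer": 42}
```"""

def strict_parse(text: str):
    # A naive evaluator: any parse failure counts the answer as wrong.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

def lenient_parse(text: str):
    # Strip common markdown fences before parsing.
    cleaned = text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1]    # drop the opening ```json line
        cleaned = cleaned.rsplit("```", 1)[0]  # drop the closing fence
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None

print(strict_parse(raw_output))   # None -> correct answer judged incorrect
print(lenient_parse(raw_output))  # {'answer': 42}
```

The point is that the evaluation can fail on presentation alone: the reasoning inside the fence was sound, yet the strict pipeline scores it as wrong.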
- Solutions:
  - The best approach is natural-language-to-format conversion: answer in natural language first, then convert the answer to the target format
  - Pay attention to the order of keys in structured output
  - Reduce parsing errors through corrective prompts
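The solutions above can be combined into one pipeline: let the model reason freely in natural language, then convert its answer to JSON, re-prompting with the parse error if conversion fails. A sketch assuming a hypothetical `call_llm(prompt)` function, stubbed out here with a fixed response:

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical LLM call; replace with your provider's API.
    # Stubbed to return a fixed response for illustration.
    return '{"reasoning": "3 apples + 4 apples = 7", "answer": 7}'

def answer_then_format(question: str, max_retries: int = 2) -> dict:
    # Step 1: reason in unconstrained natural language.
    free_answer = call_llm(f"Answer step by step: {question}")

    # Step 2: convert to JSON. Note the key order: "reasoning" comes
    # before "answer", so the autoregressive model still generates its
    # intermediate steps before committing to a final value.
    convert_prompt = (
        "Convert this answer to JSON with keys "
        f'"reasoning" and "answer":\n{free_answer}'
    )
    for _ in range(max_retries + 1):
        formatted = call_llm(convert_prompt)
        try:
            return json.loads(formatted)
        except json.JSONDecodeError as err:
            # Corrective re-prompt: feed the parse error back to the model.
            convert_prompt = (
                f"Your previous output was not valid JSON ({err}). "
                f"Output only valid JSON:\n{formatted}"
            )
    raise ValueError("could not obtain valid JSON")

result = answer_then_format("Alice has 3 apples and Bob has 4. Total?")
print(result["answer"])  # 7
```

The retry loop is bounded so a persistently malformed model cannot hang the pipeline; in practice the first corrective prompt usually suffices.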
- A balance is needed between easily parsable formats and preserving reasoning ability.
- LLMs used as answer parsers understand answer meaning and context better than regular expressions do.
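The limitation of regex parsing is easy to demonstrate: a pattern that extracts numbers misses answers spelled out in words and can pick the wrong number out of context, whereas an LLM parser can interpret meaning. A sketch (the two response strings are invented examples; the LLM alternative is only described in a comment):

```python
import re

def regex_parse(text: str):
    # Common heuristic: take the last number appearing in the response.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None

# Case 1: the answer is spelled out -> the regex finds nothing.
print(regex_parse("After adding them up, the answer is twelve."))  # None

# Case 2: the last number in the text is not the answer.
print(regex_parse("The answer is 12, as shown in step 3."))  # '3'

# An LLM-based parser would instead be prompted with the response plus
# a question like "What is the final answer?", letting it use meaning
# and context rather than surface patterns (provider call omitted here).
```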
The research suggests that, when applying LLMs, practitioners must trade format constraints off against reasoning ability to achieve optimal performance.