AI Model Completely Collapses After 9 Generations of Training on Its Own Output: Oxford and Cambridge Research Featured on Nature Cover

Synthetic training data may carry limitations similar to the lack of genetic diversity caused by inbreeding.

Research has found that using AI-generated data to train AI models may lead to a phenomenon called "model collapse". The main conclusions are as follows:

  1. If a large amount of AI-generated content is used in training data, the model will develop irreversible defects, and low-probability events in the original content distribution will disappear.

  2. This effect is called "model collapse", similar to inbreeding producing low-quality offspring.

  3. Researchers trained an initial model using Wikipedia articles, then trained multiple generations of models using text generated by the previous generation model.

  4. Results showed that output quality degraded rapidly as the generations iterated:

    • Generation 0 already contained occasional factual errors and stray symbols
    • By generation 5, the output had degraded into largely meaningless text
    • By generation 9, it was complete gibberish, dominated by repetitive, irrelevant content

  5. This indicates that training models on AI-generated data leads to degradation over successive generations and eventual collapse.

  6. To avoid this situation, more high-quality human-generated data needs to be used for training.

  7. As AI-generated content floods the internet, genuine human data will become harder to obtain and more valuable.
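
The recursive loop described in points 3 and 4 can be sketched at its statistical core. This is not the paper's actual setup (which fine-tuned real language models on Wikipedia-style text); it is a minimal toy in which a "model" memorises its finite training set and "generates" by sampling from it with replacement. Anything a generation fails to sample, typically the low-probability items, is lost to every later generation, much like genetic diversity lost to drift under inbreeding:

```python
import random

def train_and_generate(training_data, n_out):
    """Toy 'model': memorise the training set, then emit n_out samples
    drawn from it with replacement."""
    return [random.choice(training_data) for _ in range(n_out)]

random.seed(42)
n = 200
data = list(range(n))          # generation 0: fully diverse "human" data
history = [len(set(data))]     # distinct values surviving per generation

for gen in range(1, 31):
    # Each generation is trained only on the previous generation's output.
    data = train_and_generate(data, n)
    history.append(len(set(data)))

print("distinct values at generations 0, 10, 20, 30:",
      [history[g] for g in (0, 10, 20, 30)])
```

Because each generation can only reproduce values present in its training data, the count of distinct values never increases, and in practice it shrinks every generation; this is the same one-way loss of the distribution's tails that the paper identifies, just stripped down to sampling.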

In conclusion, this research warns of the risks of overusing AI-generated data in model training and underscores the importance of high-quality human data.