The Claude team recently sparked controversy with its large-scale scraping of a company's website content. The specifics are as follows:
- Accessed the company's servers 1 million times in 24 hours, scraping website content without payment
- Ignored the website's "no scraping" notice, forcibly occupying server resources
- The affected company tried to fend off the crawler but failed; its content was still captured
The company's head expressed dissatisfaction on social media:
Hey Anthropic, I know you're thirsty for data. Claude is really smart! But you know what? This is not cool.
Many netizens were angered, with some arguing that "steal" describes Anthropic's behavior better than "unpaid."
Event details:
- The affected company is iFixit, a US website providing electronic product repair guides
- Claude's crawler, ClaudeBot, sent thousands of requests per minute over a span of several hours
- About 1 million visits in a single day, downloading 10 TB of files, with the May total reaching 73 TB
- iFixit's website states that copying content for AI training without permission is prohibited
- iFixit's CEO said ClaudeBot scraped all of the site's data without permission and overloaded its servers
- iFixit has modified its robots.txt file to block Anthropic crawlers
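For context, here is a minimal sketch of how a well-behaved crawler would honor such a block, using Python's standard urllib.robotparser; the robots.txt directives, user-agent names, and URL below are illustrative assumptions, not iFixit's actual file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules of the kind a site could use to block
# Anthropic's crawler while leaving other bots unaffected.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler calls can_fetch() before every request and skips
# any URL that the rules disallow for its user agent.
for agent in ("ClaudeBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://www.ifixit.com/Guide")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the sketch prints "ClaudeBot: blocked" and "Googlebot: allowed", which is the behavior a crawler that respects robots.txt would follow.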
Anthropic responded that it has deactivated the old crawler, but did not address whether ClaudeBot complies with robots.txt.
This is not the first time AI companies have scraped website content on a large scale:
- In April, the Linux Mint forum crashed due to ClaudeBot scraping
- Some suggested planting traceable canary strings in site content to detect data theft (see the sketch after this list)
- iFixit found that its content had been scraped not only by Claude but also by OpenAI
- Multiple AI companies have been accused of ignoring robots.txt settings and scraping content anyway
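As a rough illustration of the "traceable information" idea mentioned above, the following Python sketch derives a unique canary string per page; the function name, key handling, and page paths are hypothetical, not a description of any site's actual setup:

```python
import hashlib
import secrets

def make_canary(page_id: str, secret_key: str) -> str:
    """Derive a unique, reproducible marker string for one page.

    The marker looks like a harmless identifier but appears nowhere
    else on the web, so finding it in a model's output or a training
    dump is evidence that this specific page was scraped.
    """
    digest = hashlib.sha256(f"{secret_key}:{page_id}".encode()).hexdigest()[:16]
    return f"ref-{digest}"

# Example: tag each page with its own canary token.
SECRET_KEY = secrets.token_hex(16)   # kept private by the site operator
pages = ["guide/battery-swap", "guide/screen-repair"]
for page in pages:
    token = make_canary(page, SECRET_KEY)
    print(f"{page}: embed '{token}' in the page footer or image alt text")
```

Because each token is derived from the page path, a marker that later surfaces in model output points back to the exact page that was scraped.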
Some people have called for creators to move content behind paywalls to prevent unrestricted scraping. Whether this approach will be effective remains to be seen.