This instance receives 500+ IPs with differing user agents, all connecting at once but staying within rate limits by spreading the requests across bots.
The only way I know it’s a scraper is if they do something dumb like using “google.com” as the referrer for every request or by eyeballing the logs and noticing multiple entries from the same /12.
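For illustration, that same-/12 pattern can be surfaced by a script instead of eyeballing logs. A minimal sketch, assuming the request IPs have already been parsed out of the access log into a list (the function name and sample addresses are made up):

```python
from collections import Counter
from ipaddress import ip_network

def count_by_prefix(ips, prefixlen=12):
    """Count requests per IPv4 /prefixlen network.

    A distributed scraper that rotates addresses within one
    allocation shows up as an outlier count for a single network.
    """
    counts = Counter()
    for ip in ips:
        # strict=False masks the host bits so "10.1.2.3/12"
        # normalizes to the containing network 10.0.0.0/12
        counts[ip_network(f"{ip}/{prefixlen}", strict=False)] += 1
    return counts

# Hypothetical log sample: three addresses share one /12
ips = ["10.1.2.3", "10.5.9.1", "10.14.0.7", "192.168.1.1"]
for net, n in count_by_prefix(ips).most_common():
    print(net, n)
```

Sorting by count makes the shared /12 jump out even when every individual address stays under the rate limit.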
How is blocking scrapers easy?
Exactly this: you can only stop scrapers that play by the rules.
Each one of those books powering GPT already had legal protection on it.