An investigation by Amazon Web Services (AWS) has found that Perplexity, a company that uses AWS servers to train its AI models, employs web scraping techniques to collect data from certain websites. Web scraping, or data scraping, uses software to fetch the HTML of web pages and automatically extract, filter, and store information from it.
Developer Robb Knight and Wired uncovered evidence that Perplexity violated the Robots Exclusion Protocol by scraping data from specific websites without permission. Under that protocol, website owners place a robots.txt file on their domain to specify which pages robots and crawlers should not access.
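To illustrate how a well-behaved crawler honors the Robots Exclusion Protocol, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content and the paths shown are hypothetical examples, not taken from any site mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish:
# everything under /private/ is off-limits to all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks before fetching each URL.
print(parser.can_fetch("MyBot", "https://example.com/private/data"))  # False
print(parser.can_fetch("MyBot", "https://example.com/public/page"))   # True
```

In practice a crawler would load the file from `https://<domain>/robots.txt` (for example via `parser.set_url(...)` and `parser.read()`) and skip any URL for which `can_fetch` returns `False`; a scraper that ignores this check is what the reporting describes.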
AWS's policies strictly prohibit customers from engaging in illegal activity and require them to comply with all applicable laws. Perplexity claims to adhere to the Robots Exclusion Protocol and states that its services do not violate AWS's terms of service, except in rare cases where its bot ignores robots.txt in order to retrieve specific URLs.
However, Wired's reporting suggests that Perplexity's chatbot sometimes ignores robots.txt to collect information without authorization, raising questions about potential violations of AWS's terms of service and the legality of Perplexity's data collection methods.