AWS Discovers Perplexity’s Web Scraping Violation: Is This Legal?

By Samantha Jones

Jul 1, 2024
Amazon Web Services is looking into whether Perplexity employs ‘web scraping’ to train its AI

An investigation by Amazon Web Services (AWS) has discovered that Perplexity, a company that uses AWS servers to train its AI models, is using web scraping techniques to collect data from certain websites. Web scraping, or data scraping, involves using software to retrieve the HTML code of web pages and then automatically filter and store the information it contains.
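To make the extract, filter, and store steps concrete, here is a minimal sketch in Python. It is an illustration only, not Perplexity's method: the sample HTML, the `LinkAndTitleParser` class, and the `scrape` function are hypothetical, and the page is a hard-coded string rather than something downloaded from a real site.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><head><title>Example News</title></head>
<body>
  <h1>Headline</h1>
  <a href="/story-1">First story</a>
  <a href="/story-2">Second story</a>
</body></html>
"""


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every link target found in the HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep only the text that appears inside <title>...</title>
        if self._in_title:
            self.title += data


def scrape(html):
    """Filter the raw HTML down to a small record that could be stored."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "links": parser.links}


record = scrape(SAMPLE_HTML)
print(record)
```

The sketch keeps only the title and link targets and discards the rest of the markup, which is the "filter" step; writing `record` to a file or database would be the "store" step.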

Developer Robb Knight and Wired uncovered evidence that Perplexity violated the Robots Exclusion Protocol by scraping data from specific websites without permission. Under the Robots Exclusion Protocol, website owners place a robots.txt file on their domain to specify which pages robots and crawlers should not access. Compliance is voluntary: the file states the owner's wishes, and it is up to each crawler to honor them.
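As a rough illustration of how a well-behaved crawler might consult robots.txt before fetching a page, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` and `OtherBot` user agents, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and are not taken from any site or crawler named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ExampleBot is barred from /private/,
# while every other crawler may access the whole site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("ExampleBot", "https://example.com/private/page"),
    ("ExampleBot", "https://example.com/public/page"),
    ("OtherBot", "https://example.com/private/page"),
]:
    print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in this check technically stops a crawler from requesting a disallowed page; the check only has an effect if the crawler runs it and respects the answer.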

AWS has strict policies prohibiting customers from engaging in illegal activities and requires them to comply with all applicable laws. Perplexity claims to adhere to the Robots Exclusion Protocol and states that its services do not violate the AWS terms of service, apart from rare cases in which its bot ignores robots.txt to retrieve specific URLs.

However, investigations by Wired suggest that Perplexity’s chatbot sometimes ignores robots.txt to collect information without authorization, raising concerns about potential violations of the AWS terms of service and about the legality of Perplexity’s data collection methods.
