Web Scraping of Public Pages Deemed Legal in the United States

Disclaimer: This post is intended for informational purposes only and does not constitute legal advice. For specific legal concerns, it is advisable to consult a qualified attorney.

Understanding the Legality of Web Scraping in the United States

Web scraping benefits many individuals and companies, yet its legal status in the US has long been ambiguous. No statute explicitly addresses whether web scraping is legal, so the clearest guidance comes from significant court cases and the precedents they establish. Understanding those decisions gives individuals and organizations a framework for scraping ethically and within the bounds of the law.

1. Computer Fraud and Abuse Act (CFAA)

The CFAA is usually the first and most central claim in legal disputes over web scraping. Enacted in 1986 to combat hacking, it centers on the concepts of “authorized” versus “unauthorized” access to protected computers or computer systems. Under the CFAA, it is a crime to intentionally access a computer and obtain information from it without authorization or by “exceeding authorized access.” A significant problem is that the statute itself does not clearly define these terms, so interpretation depends largely on case law, particularly the Supreme Court’s decision in Van Buren v. United States, which is discussed later in this article.

2. California Penal Code Section 502

When applicable, California Penal Code Section 502 is frequently cited alongside the CFAA due to its similar implications. This statute outlines the offenses and penalties associated with unauthorized access to computers, computer systems, or networks. Beyond merely accessing data without authorization, it encompasses actions such as altering data, assisting others in gaining unauthorized access, or causing damage to computer systems or networks. The comprehensive nature of this code makes it a critical consideration in web scraping cases within California.

3. Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM Act of 2003)

While not a web scraping law, the CAN-SPAM Act of 2003 is relevant to one of the cases discussed in this article. The Act regulates unsolicited commercial email, and a central question under it is whether a message is materially false or misleading: emails with deceptive header information or subject lines violate it. Although it primarily targets spam, it can come into play when access to a platform is used to send promotional messages, as in Facebook v. Power Ventures, discussed later in this post.

Court Cases Defining the Legality of Scraping Public Pages

1. Craigslist Inc. v. 3taps Inc. et al (2013)

In Craigslist v. 3taps, the court accepted that 3taps was initially free to scrape Craigslist because the site was publicly accessible, but held that this authorization ended once Craigslist explicitly revoked it with a cease-and-desist letter.

The crux of this case was 3taps’ motion to dismiss Craigslist’s claims under the CFAA and California Penal Code Section 502. 3taps scraped data from Craigslist in order to market a Craigslist API. To halt this activity, Craigslist sent a cease-and-desist letter and blocked 3taps’ IP addresses. The letter stated that 3taps was no longer authorized to access Craigslist’s site and services “for any reason.” Despite this, 3taps continued scraping by using new IP addresses and rotating proxies.

Craigslist then sued 3taps, citing multiple claims, including violations of the CFAA. 3taps countered by filing a motion to dismiss, arguing that Craigslist, being a public website, could not revoke access authorization. However, Craigslist contended that the cease-and-desist letter was a clear revocation of authorization and that 3taps’ continued access constituted unauthorized access under the CFAA.

After reviewing the statute’s wording and the precedent set by LVRC Holdings LLC v. Brekka, the court sided with Craigslist and denied the motion to dismiss. It explained that “authorization” under the CFAA depends on actions taken by the entity that controls the computer system. Because Craigslist had sent the cease-and-desist letter and blocked 3taps’ IP addresses, it had effectively revoked authorization, and 3taps’ continued access was without authorization under the CFAA.

2. Facebook, Inc. v. Power Ventures, Inc. (2016)

Power Ventures initially had users’ consent to access their Facebook data. Once Facebook sent a cease-and-desist letter, however, Power Ventures was no longer authorized to access that data, which makes this case important for understanding the boundaries of authorization.

Facebook sued Power Ventures for accessing user data and sending promotional messages through Facebook’s platform. Power let Facebook users promote Power through Facebook features, which generated emails or internal Facebook messages depending on each user’s settings. Despite receiving a cease-and-desist letter and facing an IP block, Power continued its promotional activities, claiming that user authorization was sufficient.

Facebook accused Power Ventures of violating the CAN-SPAM Act, CFAA, and California Penal Code section 502. The district court initially ruled in Facebook’s favor on all counts, but the appeals court later reversed some decisions and upheld others.

Regarding the CAN-SPAM Act, the appeals court determined that the promotional messages sent by Power were not materially false or misleading. Since users consented to share the promotion, and Power was correctly identified in the messages, the court reversed the district court’s decision on this count.

For the CFAA, the court noted that Facebook incurred over $5,000 in employee time addressing the issue, giving them grounds for a private right of action. Initially, Power had user consent to access Facebook data, but once the cease-and-desist letter was issued, any further access was unauthorized. The appeals court upheld the district court’s decision that Power Ventures violated the CFAA.

Finally, under California Penal Code section 502, the court found that Power Ventures knowingly accessed and used Facebook data without permission after authorization was revoked, affirming the district court’s decision on this count as well.

3. hiQ Labs, Inc. v. LinkedIn Corporation (2022)

The hiQ Labs v. LinkedIn case is the pivotal example for the proposition that scraping publicly accessible data does not violate the CFAA.

LinkedIn sent hiQ Labs a cease-and-desist letter for scraping public profiles and blocked hiQ’s IP addresses. In response, hiQ sought a preliminary injunction, arguing that the IP block significantly harmed its business. The case eventually reached the U.S. Supreme Court, which sent it back to the Ninth Circuit for reconsideration in light of Van Buren v. United States. The Ninth Circuit reaffirmed its original decision and left hiQ’s preliminary injunction in place. The court reasoned that hiQ’s scraping likely did not violate the CFAA, because the scraped data was publicly accessible and LinkedIn did not own the information.

This decision emphasized that, under the CFAA, authorization is not required to access publicly available data. The principle is supported by the Supreme Court’s decision in Van Buren v. United States, which read the “exceeds authorized access” clause as applying only to information that is off-limits to the user, not to publicly accessible data.

4. Meta Platforms, Inc. v. Bright Data Ltd. (2024)

The Meta Platforms v. Bright Data case is a recent example in which logged-off scraping of publicly accessible data was found not to breach a site’s terms of service.

Meta sued Bright Data for scraping data from Facebook and Instagram, arguing that the scraping breached their terms of service. Bright Data collected data through web scraping and sold the resulting datasets, and it also operated proxies for its customers. Meta contended that Bright Data’s activities violated Facebook’s and Instagram’s policies and that Bright Data kept scraping despite demands to stop.

A key issue was the definition of “user” and “use.” Meta argued that anyone accessing its sites was bound by their terms, while Bright Data claimed that only account holders were users bound by the terms. Notably, Bright Data had terminated all of its Facebook and Instagram accounts before the lawsuit and asserted that all of its scraping was done while logged off, accessing only public data.

Bright Data argued that logged-off scraping could not be bound by terms and conditions that were presented only during account sign-up. The court agreed, noting that visitors to the public site would neither encounter nor agree to these terms. Meta’s claims that Bright Data circumvented access restrictions, for example by solving CAPTCHAs, were rejected because the scraping did not involve accessing password-protected content, paralleling the hiQ Labs v. LinkedIn decision.

After reviewing the terms, the extrinsic evidence, and the nature of logged-off scraping, the court ruled in favor of Bright Data, granting its motion for summary judgment on the breach of contract claim and denying Meta’s motion for partial summary judgment. Meta eventually dropped the tortious interference claim.

Legality of Web Scraping at SerpApi

At SerpApi, we focus exclusively on scraping publicly available data from search engines. Our API calls mimic real-time searches without the need for any login or authorization. This practice aligns with legal standards, as demonstrated in landmark cases like hiQ Labs v. LinkedIn, confirming that SerpApi’s web scraping activities are entirely lawful.
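To make “without the need for any login or authorization” concrete, here is a minimal Python sketch of what a logged-off request looks like. It is an illustration under stated assumptions, not a description of SerpApi’s implementation: the function names build_logged_off_request and carries_credentials, the example URL, and the User-Agent string are all hypothetical. The request is only constructed and inspected, never sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Headers that would carry a login session or other credentials.
CREDENTIAL_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def build_logged_off_request(url: str, user_agent: str = "example-bot/0.1") -> Request:
    """Build a GET request for a public page with no session or credentials.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    if parts.username or parts.password:
        raise ValueError("URL must not embed credentials")
    # Only a User-Agent header is set: no cookies, no Authorization header.
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


def carries_credentials(request: Request) -> bool:
    """Return True if the request has any credential-bearing header."""
    return any(name.lower() in CREDENTIAL_HEADERS for name, _ in request.header_items())


# Assumed example URL; the request is built but not sent.
request = build_logged_off_request("https://example.com/search?q=coffee")
print(request.get_method(), request.full_url)
print("carries credentials:", carries_credentials(request))
```

The check is deliberately narrow. It only confirms that the request itself is anonymous, which is the property the logged-off scraping discussed in the cases above turns on.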

We are dedicated to upholding legal and ethical standards in web scraping, ensuring our clients can use our services with confidence. To further this commitment, we offer the Legal US Shield with all production plans and higher tiers. Although web scraping is generally legal in the United States, this shield provides an extra layer of protection by covering any legal questions related specifically to the scraping and parsing of data, though not its subsequent use. This additional security helps alleviate any concerns our customers might have regarding the legality of web scraping.

Sources

California Penal Code section 502. (n.d.-b). Retrieved from https://www.calpers.ca.gov/docs/ca-penal-code-502.pdf

Computer fraud and abuse act (CFAA). NACDL. (n.d.). Retrieved from https://www.nacdl.org/Landing/ComputerFraudandAbuseAct

Craigslist, Inc. v. 3Taps, Inc. et al. (United States District Court for the Northern District of California August 16, 2013). Retrieved April 18, 2024, from https://law.justia.com/cases/federal/district-courts/california/candce/3:2012cv03816/257395/101/

Craigslist, Inc v. 3Taps, Inc et al., no. 3:2012CV03816 – document 101 (N.D. Cal. 2013). Justia Law. (n.d.). Retrieved from https://law.justia.com/cases/federal/district-courts/california/candce/3:2012cv03816/257395/101/

Dilmegani, C. (2024, January 5). Is web scraping legal? Ethical web scraping guide in 2024. AIMultiple. Retrieved from https://research.aimultiple.com/web-scraping-ethics/

Facebook, Inc. v Power Ventures, Inc., DBA. (n.d.-b). Retrieved from https://cdn.ca9.uscourts.gov/datastore/opinions/2016/07/12/13-17102.pdf

Facebook, Inc. v Power Ventures, Inc, DBA and Steven Suraj Vachani (United States Court of Appeals for the Ninth Circuit July 12, 2016). Retrieved April 18, 2024, from https://cdn.ca9.uscourts.gov/datastore/opinions/2016/07/12/13-17102.pdf

HiQ Labs, Inc. v. LinkedIn Corp. (n.d.-b). Retrieved from https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/17-16783.pdf

HiQ Labs, Inc. v. LinkedIn Corporation (United States Court of Appeals for the Ninth Circuit April 18, 2022). Retrieved April 18, 2024, from https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/17-16783.pdf

Legal Information Institute. (n.d.). 18 U.S. Code § 1030 – fraud and related activity in connection with computers. Legal Information Institute. Retrieved from https://www.law.cornell.edu/uscode/text/18/1030

Lim, J. (2024, March 4). This is why Meta lost the scraping legal battle to Bright Data. Proxycurl Blog | Read our stories on data, scraping, APIs. Retrieved from https://nubela.co/blog/meta-lost-the-scraping-legal-battle-to-bright-data/

Meta Platforms, Inc. v. Bright Data, Ltd. (United States District Court, Northern District of California January 23, 2024). Retrieved April 18, 2024, from https://casetext.com/case/meta-platforms-inc-v-bright-data-ltd-6

Meta Platforms, Inc. v. Bright Data Ltd., 23-CV-00077-EMC | casetext search + citator. Casetext.com. (n.d.). Retrieved from https://casetext.com/case/meta-platforms-inc-v-bright-data-ltd-6

19-783 Van Buren v. United States (06/03/2021). Retrieved from https://www.supremecourt.gov/opinions/20pdf/19-783_k53l.pdf

Public law 108–187 108th Congress an act – govinfo.gov. (n.d.). Retrieved from https://www.govinfo.gov/content/pkg/PLAW-108publ187/pdf/PLAW-108publ187.pdf

Quinnemanuel. (2023, April 28). The legal landscape of web scraping. Quinn Emanuel Trial Lawyers – Quinn Emanuel Urquhart & Sullivan, LLP. Retrieved from https://www.quinnemanuel.com/the-firm/publications/the-legal-landscape-of-web-scraping/

Urban, O. (2024, March 7). Is web scraping legal?. Apify Blog. Retrieved from https://blog.apify.com/is-web-scraping-legal/#what-is-personal-data-information-anyway

Van Buren v United States (Supreme Court of the United States June 3, 2021).

Whittaker, Z. (2022, April 18). Web scraping is legal, US appeals court reaffirms. TechCrunch. Retrieved from https://techcrunch.com/2022/04/18/web-scraping-legal-court/