Plan On Collecting Web Data? Do Some Homework First

By Denas Grybauskas
Head of Legal | Oct 28, 2022 02:15 am PST

Q&A with Denas Grybauskas, Head of Legal at Oxylabs

Vast amounts of data are being collected every minute with the help of web scraping. From product prices on different online shops to SEO keyword rankings – public web data allows businesses to make both tactical and strategic decisions, which helps them outperform their competitors. However, collecting such data comes with responsibility. Besides being a sensitive issue in itself, data gathering lacks comprehensive legal regulation and a clear industry consensus on what is acceptable and what is not.

Denas Grybauskas, the Head of Legal at Oxylabs, a leading provider of public web data gathering solutions, explains why it’s always useful to consult a legal professional before starting your scraping activities.

How does the web scraping industry look today?

Web scraping has developed from niche to almost conventional. In some industries, such as eCommerce, over 80% of companies use web scraping in their daily operations. From the world’s largest corporations to startups, the demand for web scraping services is enormous and keeps on growing as more businesses discover the potential of this technology.

Growing demand, in turn, fosters innovation in the field. As new use cases emerge, opportunities for improvement are unlocked. Web scraping tools today are more efficient, straightforward, and reliable than they were 5 years ago, with increased use of artificial intelligence and machine learning technologies.

What are the primary legal challenges regarding web scraping and data gathering?

The main challenge lies in answering a simple question – can I scrape this particular data? Web scraping is relatively new and thus shares a problem with other new technologies – regulation is developing far more slowly than the technology itself.

In practice, a company that collects public web data must consider several legal aspects and check if it complies with them. From the United States Computer Fraud and Abuse Act (CFAA) to the European General Data Protection Regulation (GDPR) and other privacy laws, along with regulations that differ from region to region, there’s a long list of laws that might become relevant in specific circumstances.

Not only is that list open-ended, but many other laws could come into play depending on the particular data gathering situation. Regulation explicitly dedicated to web scraping, or a clear industry consensus with established best practices, would ease this headache.

Another legal challenge comes with the growing pressure on Big Tech from governments worldwide. There will likely be a push for new regulations, especially around personal data, its acquisition, and aggregation. The data gathering industry should not turn a blind eye to these processes. In light of government pressure, some Big Tech companies might already be restricting access to public web data, which could affect many businesses.

Have there been any recent developments in web scraping’s legal landscape?

Back in spring, headlines asking whether “web scraping is officially legal” ran through some major tech media outlets. It was a reaction to the U.S. Ninth Circuit Court of Appeals ruling in the legal battle between LinkedIn and HiQ Labs. HiQ Labs used public data from LinkedIn user profiles to get information on employee attrition. LinkedIn raised many claims regarding this scraping activity, but its main argument was that scraping such data amounts to hacking.

Once again, the Court of Appeals concluded that scraping publicly available data does not breach the Computer Fraud and Abuse Act (CFAA), as LinkedIn had tried to prove. Some greeted the ruling as if it had officially “legalized web scraping.”

While it’s a great decision for the scraping industry, it just reaffirmed what the majority of those in the tech industry probably already knew: the scraping of public data and hacking shouldn’t be treated as the same. These actions are completely different and should have entirely different legal implications.

What advice or tips would you give to companies that want to start collecting web data?

I would advise consulting a legal professional first. It’s better to double-check the scope of the data being gathered and the legalities around it than to regret the consequences.

The first evaluation is needed when defining which data to collect. Try to determine what kind of data you are actually planning to scrape. Is there a risk that personal data could be collected? If so, can you minimize and anonymize that data? Secondly, is any of the data copyrighted? If so, is it possible to avoid collecting it?

Another set of questions concerns the sources from which you plan to collect data. What kind of websites are they, and what do their Terms and Conditions say? Do you need to log in to the website to access the necessary data?

Finally, ask yourself if there are upcoming regulations and court decisions you should be aware of. Always consider the region and how regulations differ in the U.S., Europe, Asia, and elsewhere.

These might seem like difficult questions to answer, which is why it’s always beneficial to have a legal professional nearby. Companies should make sure that their scraping processes are in line with the latest case law and relevant regulations.
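As one possible illustration of the "minimize and anonymize" question above, the sketch below keeps only the fields a use case needs and replaces an identifier with a salted hash. The field names, the allowlist, and the `minimize_record` helper are hypothetical and not taken from any Oxylabs tooling. Salted hashing is generally regarded as pseudonymization rather than full anonymization, so whether it is sufficient remains a question for a legal professional.

```python
import hashlib

# Hypothetical allowlist: only the fields this use case actually needs.
ALLOWED_FIELDS = {"product_name", "price", "currency"}
# Hypothetical identifiers that are kept only in hashed form.
PSEUDONYMIZE_FIELDS = {"seller_id"}


def minimize_record(record, salt):
    """Return a copy of `record` with unneeded fields dropped.

    Fields in ALLOWED_FIELDS are kept unchanged, fields in
    PSEUDONYMIZE_FIELDS are replaced by a salted SHA-256 digest,
    and every other field (for example an email address) is discarded.
    """
    cleaned = {}
    for key, value in record.items():
        if key in ALLOWED_FIELDS:
            cleaned[key] = value
        elif key in PSEUDONYMIZE_FIELDS:
            payload = (salt + str(value)).encode("utf-8")
            cleaned[key] = hashlib.sha256(payload).hexdigest()
    return cleaned
```

Applying such a filter at the moment of collection, rather than afterwards, means the discarded fields are never stored in the first place.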

Oxylabs often uses the term “ethical web scraping”. What is the distinction between legal and ethical?

Not everything that is legal can be considered ethical. Data can be a very sensitive issue. Therefore, the industry should go beyond what’s legal and have clear ethical principles for scraping operations.

With some exceptions, scraping only publicly available information is one of the fundamental rules of ethical web scraping. Companies should ensure that the data is requested at a fair rate and that it doesn’t compromise the web server. They should also study the website’s Terms of Service and decide if they can be accepted. Finally, they should use ethically procured proxies.
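Requesting data "at a fair rate" can be approximated in code by enforcing a minimum pause between consecutive requests to the same server. The sketch below is a minimal example under that assumption; the `PoliteThrottle` class and the interval value are hypothetical, and what counts as a fair rate depends on the website in question.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to one host."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the logic can be tested without waiting.
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Pause if the previous request was too recent; return seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            remaining = self.min_interval_s - elapsed
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the moment the next request is allowed to go out.
        self._last_request = self._clock()
        return slept
```

A scraper would call `throttle.wait()` immediately before each request, for example with `throttle = PoliteThrottle(2.0)` to leave at least two seconds between requests.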

How do you see the future of web scraping regulation?

I can’t forecast specialised web scraping regulation emerging anytime soon. However, the proactivity of the industry in terms of ethics and standards will be essential.

For instance, Oxylabs and four other key market players – Zyte, Smartproxy, Coresignal, and Sprious – have recently launched the Ethical Web Data Collection Initiative (EWDCI). The co-founders aim to promote the industry’s best practices with the highest ethical standards in mind.

The doors are now open for all companies that, in one way or another, rely on web scraping technology to join the EWDCI’s ranks and safeguard the industry from within. Web scraping is rocket science to politicians. Hence, if we want more clarity on regulating our industry, we should help the government create it.