Information Security Buzz

Lack Of Web Scraping Regulations Hurts Progress

By Denas Grybauskas | June 23, 2022 | Updated: December 16, 2022 | 7 Mins Read

Web scraping remains something of an enigma in the public eye. That’s not entirely surprising: every innovative solution takes time to be understood and accepted. Generally, widespread acceptance closely follows industry-related legislation or regulation.

However, web scraping is immensely useful for the public good, and anything that slows its adoption among the wider public causes harm in the long term. In this article, I’ll argue that dedicated regulation, or at least industry self-regulation, would benefit the world at large.

Someone’s money versus everyone’s money

In any free-market economy, private capital spending is fairly unrestricted. Public money, on the other hand, follows numerous rules and regulations. While these rules may differ across countries, governments, and political and economic ideologies, public spending is project-based.

Usually, project-based work means someone has to convince someone else to provide the required sum of money. Public spending therefore rests on two factors: persuasion and law. Clearly, or rather hopefully, no government entity would put its signature on a project that breaks the law.

Persuasion, in turn, is needed because those who sign off on public spending are people. Their perceptions, ideas, hopes, and dreams all influence the decision-making process. Presented with a new and unknown process, people are wary, especially if there’s no legislation surrounding it.

Private capital, by contrast, has few masters and wardens. It can be spent according to the desires of the person controlling it, and there is often no need to argue why it should be spent one way or another. However, business-minded individuals will often reinvest the money in innovation, leading to the creation of new industries over time.

These differences in spending mean that private capital creates industries, some of which bring innovation. New industries exist in limbo for a while but are eventually regulated. The associated innovations then gain legitimacy in the public consciousness, and public spending can begin.

Of course, this is slightly reductionist, as web scraping is already being used in science and for the public good (e.g., to catch “copied and pasted” laws pushed by lobbyists). When COVID-19 first hit the world, we supported CoronaMapper, a project powered by web scraping.

As another exception, we are now helping the Lithuanian government, pro bono, to find illicit visual content on the internet through web scraping. Other examples range from economists acquiring additional data sources for research to the discovery of inefficiencies in property tax allocation.

However, all of these examples are exceptions rather than the rule. If web scraping were recognized as legitimate by the world at large, the possibilities outlined here would be just the tip of the iceberg.

Role of sectoral laws

Even when industries exist in a limbo state, courts still have to resolve disputes between participants. Although no laws directly regulate web scraping, courts follow precedent and interpret existing legislation that might be indirectly related to the case at hand.

Web scraping is going through such a stage. Many court cases have allowed our industry to interpret what could be considered a good way of scraping. Most of the interpretation in case law is performed through sectoral laws, usually those that define or regulate the use of data in general (e.g., GDPR, CCPA, and others).

When sectoral laws are in use, you can arrive at a lot of… unusual conclusions. For example, most social media websites argue that the personal data stored on them belongs to the people and that the company only protects user privacy. Yet Facebook recently banned NYU researchers who were collecting data on the platform. They used a browser extension that collected data only from those who had installed it. They also obtained the explicit permission of every extension user, meaning that users agreed that the researchers would collect their personal data through the extension.

Such action essentially contradicts the argument raised by many social media giants. If the data belongs to the users, they can grant consent for it to be collected. If, as in Facebook’s case, they cannot grant consent, the social media platform has to make the case that the data is Facebook’s private asset rather than one that belongs to people. The two positions are hard to reconcile.

Facebook can strong-arm researchers through its Terms of Service in this manner only because industry regulation is lacking. These gaps can be used to devise Terms of Service in a self-interested manner without breaking the law, even when web scraping or other means of automated data collection would be used for the public good.

Over time, web scraping industry players have developed a common understanding that scraping publicly accessible data is OK. Data accessible only after a log-in, on the other hand, is off limits. However, a case can also be made that the widespread public adoption of some social media platforms, such as Facebook, should be factored into these considerations.

According to Statista, there are 2.85 billion monthly active users on Facebook. With such a large share of the global population participating in the platform, it may be argued that the data is, de facto, no longer non-public. Such data, it could be argued, should not fall under the hegemony of one company.

Self-regulation until regulation

Some time ago, the UK Joint Industry Committee for Web Standards (JICWEBS) started using the Trustworthy Accountability Group’s (TAG) Certified Against Fraud program, which is intended to reduce ad fraud. However, JICWEBS was late to the party. Many digital advertising companies had already been using the TAG program independently. Independent use of the program, which is essentially self-regulation, has had tremendous results.

Back when GDPR and CCPA were being rolled out, industry leaders reached out to policymakers, not to lobby for softer laws (although I’m sure that happened as well), but to bring greater clarity to certain definitions.

There are many other industries where self-regulation has led to impressive results (e.g., the American Bar Association). Clearly, business can work hand in hand with governments and lawmakers. Most importantly, self-regulation can provide a foundation for government regulation.

Web scraping is the perfect candidate for self-regulation. As it’s a highly technical and difficult topic, legislation will take some time to catch up. It will take us some time to explain and educate the world at large about the entire process from the ground up. Until then, it’s hard to expect that any proper legislation can be enacted.

Self-regulation is not only about reaching the stage where legislation happens. It also reduces the influence of bad actors on the market and improves the overall perception of the industry. Finally, it’s a lot easier to spread awareness about something if an association for it exists.

Conclusion

Web scraping can benefit the world at large. However, it will struggle to do so if it remains an enigma that exists in some legal limbo. While it’s easy to just sit back and wait until someone else does the work for us, we shouldn’t aim for easy. Web scraping can self-regulate so that the world may reap its benefits.

Denas Grybauskas, Head of Legal at Oxylabs.io

The opinions expressed in this post belong to the individual contributors and do not necessarily reflect the views of Information Security Buzz.

Information Security Buzz and all its contents are copyright © 2014-2025. All rights reserved. All third-party trademarks are recognized.
