
Do you plan to collect web data? Do some homework first

Q&A with Denas Grybauskas, Legal Manager at Oxylabs

Large amounts of data are collected every minute using web scraping. From product prices on different online stores to SEO keyword rankings, public web data enables businesses to make both tactical and strategic decisions, helping them outperform their competitors. However, with the collection of such data comes a responsibility. Besides being a sensitive issue in itself, data collection lacks comprehensive legal regulation or a clear industry consensus on what is acceptable and what is not.

Denas Grybauskas, Legal Manager at Oxylabs, a major provider of public web data collection solutions, explains why it is always useful to consult a legal professional before embarking on your scraping activities.

What does the web scraping industry look like today?

Web scraping has gone from a niche practice to an almost mainstream one. In some industries, like e-commerce, over 80% of businesses use web scraping in their day-to-day operations. From the world’s largest companies to startups, the demand for web scraping services is huge and growing as more companies discover the potential of this technology.

The growing demand, in turn, drives innovation in the field. As new use cases emerge, opportunities for improvement arise. Web scraping tools are more efficient, simpler and more reliable today than they were five years ago, with increased use of artificial intelligence and machine learning technologies.

What are the main legal challenges regarding web scraping and data collection?

The main challenge is to answer a simple question: can I scrape this particular data? Web scraping is relatively new and therefore shares a problem with other new technologies: regulation develops much more slowly than the technology itself.

In practice, a company that collects public web data must take several legal aspects into account and check whether it complies with them. From the US Computer Fraud and Abuse Act (CFAA) to the European General Data Protection Regulation (GDPR) and other privacy rules, in addition to regulations that differ from region to region, there is a long list of laws that might become relevant in specific circumstances.

The list does not end there: many other laws could come into play depending on the particular data collection situation. Regulation explicitly dedicated to web scraping, or a clear industry consensus and established best practices, would alleviate this headache.

Another legal challenge comes from the growing pressure on Big Tech from governments around the world. There will likely be pressure for new regulations, particularly regarding personal data, its acquisition and aggregation. The data collection industry should not turn a blind eye to these processes. In light of government pressure, some big tech companies may already be restricting access to public web data, which could affect many businesses.

Have there been any recent developments in the web scraping legal landscape?

In the spring, headlines about whether “web scraping is officially legal” appeared in some major tech outlets. It was a reaction to the ruling of the US Court of Appeals for the Ninth Circuit in the legal battle between LinkedIn and HiQ Labs. HiQ Labs used public data from LinkedIn user profiles to gain insights into employee attrition. LinkedIn raised numerous objections to this scraping activity, but its main argument was that scraping this data amounted to hacking.

Once again, the Court of Appeals found that the scraping of publicly available data did not violate the Computer Fraud and Abuse Act (CFAA), as LinkedIn had attempted to prove. Some treated the ruling as if it “officially legalized web scraping.”

While this is a great development for the scraping industry, it just reaffirms what most players in the tech industry probably already knew: public data scraping and hacking shouldn’t be treated the same way. These actions are completely different and should have entirely different legal implications.

What advice or tips would you give to companies that want to start collecting web data?

I would advise consulting a legal professional first. It is better to verify the scope of the data you collect and its legal status up front than to regret the consequences later.

The first assessment is necessary when defining the data to be collected. Try to determine what type of data you actually plan to scrape. Is there a risk that personal data will be collected? If so, can you minimize and anonymize this data? Second, is any of the data protected by copyright? If so, is it possible to avoid collecting it?
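As an illustration of the minimization question above, here is a minimal Python sketch, not a compliance recipe. The field names, the allow-list and the key are hypothetical. Note that keyed hashing is pseudonymization rather than full anonymization, so the output may still count as personal data under rules such as the GDPR.

```python
import hashlib
import hmac

# Hypothetical allow-list: only fields the project actually needs are kept.
ALLOWED_FIELDS = {"product", "price", "reviewer_id"}
# Hypothetical set of kept fields that identify a person and get pseudonymized.
PSEUDONYMIZED_FIELDS = {"reviewer_id"}


def minimize(record, secret_key):
    """Drop fields outside the allow-list and replace identifiers with a keyed hash."""
    cleaned = {}
    for field, value in record.items():
        if field not in ALLOWED_FIELDS:
            continue  # data minimization: never store what is not needed
        if field in PSEUDONYMIZED_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            value = digest.hexdigest()[:16]  # shortened only for readability
        cleaned[field] = value
    return cleaned


# Usage with made-up data: the email is dropped, the reviewer ID is replaced.
raw = {"product": "Kettle", "price": 24.99,
       "reviewer_id": "user-1842", "email": "jane@example.com"}
print(minimize(raw, secret_key=b"replace-with-a-real-secret"))
```

Using a keyed hash rather than a plain one means the same person maps to the same token across records, while someone without the key cannot simply hash guessed identifiers to reverse it.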

Another set of questions relates to the sources from which you plan to collect data. What kind of websites are they, and what do their terms and conditions say? Do you need to log in to the website to access the necessary data?

Finally, ask yourself if there are any upcoming regulations and court rulings that you should be aware of. Always consider the region and how the regulations differ in the US, Europe, Asia, etc.

These questions may seem difficult to answer, which is why it is always beneficial to have a legal professional on hand. Companies should ensure that their scraping processes comply with the latest case law and applicable regulations.

Oxylabs often uses the term “ethical web scraping”. What is the distinction between legal and ethical?

Not everything that is legal can be considered ethical. Data can be a very sensitive issue. Therefore, the industry should go beyond what is legal and have clear ethical principles for scraping operations.

With few exceptions, scraping only publicly available information is one of the fundamental rules of ethical web scraping. Businesses need to ensure that data is requested at a fair rate and does not compromise the web server. They should also study the terms of use of the website and decide if they can be accepted. Finally, they should use ethically obtained proxies.
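To make "a fair rate" concrete, the following is a rough Python sketch under stated assumptions. It reads a site's robots.txt rules with the standard library's `urllib.robotparser` and spaces out requests with a small throttle. The `PoliteThrottle` class, the `example-bot` user agent, the one-second fallback delay and the sample robots.txt are all hypothetical. Honoring robots.txt is a courtesy convention and does not replace reading the site's terms of use.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to one site."""

    def __init__(self, min_delay_s, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Sleep until min_delay_s has passed since the last request; return seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            remaining = self.min_delay_s - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_request = self._clock()
        return slept


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, inlined so the sketch needs no network access.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rules = load_rules(SAMPLE_ROBOTS)
# Use the site's requested crawl delay if it states one, else an assumed 1 second.
throttle = PoliteThrottle(min_delay_s=rules.crawl_delay("example-bot") or 1.0)

for url in ["https://example.com/products/1", "https://example.com/private/x"]:
    if rules.can_fetch("example-bot", url):
        throttle.wait()  # call this right before each real HTTP request
        print("would fetch", url)
    else:
        print("skipping (disallowed by robots.txt)", url)
```

The clock and sleep functions are injectable so the pacing logic can be checked without actually waiting. In real use the defaults apply and the HTTP request would go where the print statement is.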

How do you see the future of web scraping regulation?

I do not foresee specialized web scraping regulation emerging any time soon. However, proactive industry work on ethics and standards will be essential.

For example, Oxylabs and four other key players in the market (Zyte, Smartproxy, Coresignal and Sprious) recently launched the Ethical Web Data Collection Initiative (EWDCI). The co-founders aim to promote industry best practices with the highest ethical standards in mind.

The doors are now open to all companies that, in one way or another, rely on web scraping technology to join the Ethical Web Data Collection Initiative and protect the industry from the inside. To politicians, web scraping is rocket science. Therefore, if we want more clarity on the regulation of our industry, we should help the government create it.