The World Wide Web is the greatest tool for sharing information humankind has ever created. Unfortunately, lies and fake news spread over the Internet just as well. It is so easy for disinformation to proliferate online that it takes the combined effort of governmental organizations and NGOs worldwide to keep it at bay. One technology, web scraping, is especially pivotal in helping them ensure that knowledge, not falsehood, dominates the Internet.
The perils of disinformation
The Internet’s power to quickly disseminate false or incomplete information has virtually limitless potential to harm individuals and societies. In times of crisis, such as the COVID-19 pandemic, it divides people, induces panic, and hampers an effective collective response. Disinformation also threatens freedom of thought and the right to democratic participation by diminishing our capacity to make free and informed decisions.
Thus, it is no wonder that more than 85% of people worry about online disinformation and believe it has already significantly impacted politics in their country. On the one hand, such awareness of the problem is a good sign, as people who are alert to it are less likely to fall for fake news. On the other hand, it can breed a general distrust of online information, which diminishes the value of the Internet as a source of knowledge.
All this makes disinformation one of the most pressing problems for liberal societies. One could ask where so much of it comes from to do such significant damage. However, the opposite question is more apt.
Why isn’t there more of it?
Spreading online misinformation can benefit various bad actors. Predatory states use it to destabilize foreign countries. Unscrupulous politicians use it to gain support and win elections. Finally, many people will share whatever attracts social media engagement without considering whether it is accurate or what damage false information can do.
It is also very easy to create and spread fake news. The democratization of knowledge sharing that the Internet brings means that any of its 5.44 billion users can release any piece of information, true or false, into the world within seconds. Of course, in reality, most of them do not contribute significantly to online disinformation. Those who do, however, use bots to accelerate the creation and dissemination of fake news beyond human capabilities. AI-based technology only makes this easier and adds high-quality visual and audio deepfakes to the problem.
Given all this, one can be amazed that this flood of disinformation has not completely drowned the truth yet. We still go online to find out the facts and check if something we heard is true. And we can still find reliable sources and get at least some of the answers we seek. How come the Internet has not become a repository of only lies and fakes?
Systemic fight for the truth
Decisive efforts to combat online disinformation are what keep the truth alive. It starts with users being vigilant and fact-checking what they read online before forming an opinion or sharing it. However, the sheer volume of disinformation limits what private users can do on their own. Researchers have even found that online searches meant to fact-check a claim can increase belief in misinformation: this happens when the search surfaces low-quality sources that corroborate the same falsehood.
Thus, the only way to combat disinformation is by systematically finding, removing, or flagging it. NGOs, journalists, and government agencies work tirelessly for this purpose. However, since fake news and misinformation creators utilize automated solutions, proper technology is also necessary to effectively fight it. Enter web scraping.
What is web scraping, and how does it help?
Web scraping is the automated collection of publicly available web data using specialized software tools known as web scrapers. It is used for a variety of purposes that require large-scale data gathering or monitoring across the Internet. The ability of web scrapers to automatically crawl thousands of websites and identify specific categories of information makes them invaluable to organizations working to curtail online misinformation.
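For illustration only, here is a minimal sketch in Python of what such a scraper might look like. The URL, the requests and BeautifulSoup libraries, and the CSS selector are assumptions chosen for the example, not the tooling of any particular organization.

# Minimal sketch of a web scraper: fetch a page and extract article headlines and links.
# The URL and the CSS selector below are illustrative assumptions, not a real target.
import requests
from bs4 import BeautifulSoup


def scrape_headlines(url: str) -> list[dict]:
    """Fetch a page and return the headlines and links it contains."""
    response = requests.get(url, headers={"User-Agent": "research-bot/1.0"}, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    articles = []
    for link in soup.select("article a"):  # selector is an assumption about page structure
        title = link.get_text(strip=True)
        href = link.get("href")
        if title and href:
            articles.append({"title": title, "url": href})
    return articles


if __name__ == "__main__":
    for item in scrape_headlines("https://example.com/news"):
        print(item["title"], "->", item["url"])

A real deployment would add politeness controls such as rate limiting and robots.txt handling, plus storage for the collected records so they can be analyzed at scale.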
Organizations like Bellingcat, Debunk, and Confirmado use web scraping tools to find disinformation and report on it. These tools also help monitor extremist groups and uncover highly organized networks of malicious actors spreading propaganda for particular purposes. This enables researchers to better understand disinformation: who is behind it, what methods they use, and what motivates them. For example, automated scraping that efficiently finds articles dispersed all over the Internet can reveal that “a journalist” is, in fact, a bot that generated and published thousands of articles within a year to spread Russian propaganda on the war in Ukraine.
The additional information uncovered with scraping tools can help inform effective policy guidelines for countering disinformation by filling our knowledge gaps with real-time data. Meanwhile, authorized agencies can use these tools to identify misinformation or potentially harmful content and flag or remove it.
Scraping is also where AI can be used for good: to fight disinformation instead of creating it. AI can help manage the extensive proxy (IP address) infrastructure necessary to access the many websites where fake news spreads. Furthermore, AI tools can improve the accuracy of automatically identifying instances of disinformation.
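As a rough illustration of the proxy rotation such infrastructure relies on, the sketch below cycles scraper requests through a small pool of placeholder proxy addresses. The pool, the fetch_via_proxy helper, and the fallback logic are assumptions made for the example, not a description of any real provider's setup.

# Minimal sketch of rotating scraper traffic through a proxy pool.
# The proxy addresses are placeholders; a real setup would pull them from a managed provider.
import itertools
import requests

PROXY_POOL = [
    "http://proxy1.example.net:8080",
    "http://proxy2.example.net:8080",
    "http://proxy3.example.net:8080",
]
_proxy_cycle = itertools.cycle(PROXY_POOL)


def fetch_via_proxy(url: str) -> str:
    """Fetch a URL through the next proxy in the pool, falling back on failure."""
    for _ in range(len(PROXY_POOL)):
        proxy = next(_proxy_cycle)
        try:
            response = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            response.raise_for_status()
            return response.text
        except requests.RequestException:
            continue  # this proxy failed or was blocked; try the next one in the rotation
    raise RuntimeError(f"All proxies failed for {url}")

In practice, large-scale monitoring replaces this static list with a managed pool, and this is where AI can help by scoring proxies and scheduling requests so scrapers keep reaching the sites they need to watch.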
What if there was no web scraping?
Take web scraping out of the equation, and the overwhelming wave of online disinformation washes away any attempt to hold it back. In a world without web scraping, those fighting disinformation would have to manually check thousands of domains, countless web pages, and comment sections.
With new pieces of information uploaded every second, manual efforts could not make a meaningful difference. Journalists and researchers would always lag far behind the automated creation of disinformation if there were no way to track it automatically as well. Luckily, web scraping allows them and responsible agencies to keep pace with producers of disinformation and protect the value of the Internet as a source of true knowledge.
The opinions expressed in this post belong to the individual contributors and do not necessarily reflect the views of Information Security Buzz.