Bringing Visibility To The Issue Of Web Data Bloat

Francesco Giarletta, CEO of Avanite, examines the potential security issues that simple web browsing data can cause and how web data bloat can be reduced

Whenever a user visits a website, data is created, downloaded, and stored. Some of this data is useful; it enables us to get the rich browsing experience we expect from the sites we visit. But as websites and web applications become increasingly connected and complex, the amount of browsing data also grows, quickly reaching a point where computer performance is impacted. What’s more, the data in browsing databases can include highly sensitive information, putting organisations’ security at risk.

Of course, the issue of web data is as old as the internet itself, so it would be perfectly reasonable to question why it is now a problem. So how, and why, has the web browsing data problem evolved into a potential risk not only to IT performance but also to organisations’ privacy?

Inconvenience to impacting performance

While web data has always been a potential inconvenience to users and organisations, it became a bigger problem when Microsoft introduced Windows 8 and Internet Explorer 10. At this point the computing behemoth introduced the WebCacheV01.dat database, a central store for all web data, such as cookies and browsing history. In addition to acting as a central data repository for Microsoft browsers, it also collects data from any application that uses the WinInet subsystem for internet communications, such as Windows Store apps and Windows Explorer.

The database starts at 32MB and grows with new data as users work on the system. Files of multiple gigabytes are not uncommon, and storing and loading all that data can directly affect the user experience, causing problems such as prolonged log-on times, while also consuming a growing amount of IT infrastructure resource, with the associated costs.
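As a rough illustration of how an administrator might check this growth on a single profile, the sketch below reports the size of a WebCache database file. The folder layout under the local application data directory (`Microsoft\Windows\WebCache`) is an assumption that is not stated in this article, and the helper names are hypothetical. The file is usually held open while the user is logged on, so this only reads its size rather than its contents.

```python
import os
from pathlib import Path
from typing import Optional


def webcache_path(local_appdata: str) -> Path:
    """Build the assumed location of WebCacheV01.dat under a profile's
    local application data folder (assumption, not from the article)."""
    return Path(local_appdata) / "Microsoft" / "Windows" / "WebCache" / "WebCacheV01.dat"


def size_mb(path: Path) -> Optional[float]:
    """Return the file size in MB (1 MB = 1024 * 1024 bytes), or None if
    the file is missing or cannot be inspected."""
    try:
        return path.stat().st_size / (1024 * 1024)
    except OSError:
        return None


if __name__ == "__main__":
    local_appdata = os.environ.get("LOCALAPPDATA")
    if local_appdata is None:
        print("LOCALAPPDATA is not set; this check is intended for Windows profiles.")
    else:
        mb = size_mb(webcache_path(local_appdata))
        print("WebCache not found" if mb is None else f"WebCache size: {mb:.1f} MB")
```

Comparing the reported figure with the 32MB starting size gives a quick sense of how far a given profile has grown.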

Where data grows, security threats follow

But the problem doesn’t stop there, as the data collected goes far beyond a record of websites visited. It can also include information such as usernames, passwords, and account numbers. Furthermore, many websites prompt users to remember login credentials; when a user opts in, an authentication cookie is stored on the machine. This makes it possible for hackers to steal or copy these authentication cookies and user logins, enabling them to gain access to sites, such as the CRM or any other cloud-based software-as-a-service application, as a verified user without being prompted to provide any credentials.

You can’t walk out

And it isn’t just the Microsoft browsers that create these issues. While browsers such as Google Chrome and Mozilla Firefox do not use WinInet, they use their own code and databases that perform many of the same actions. And as websites and web applications become more complicated, compatibility issues also start to occur, forcing businesses to install multiple browsers for their users and exacerbating the data bloat and performance problem further.

The final complication to this issue is that much of this web data, along with other personal settings such as email signatures, is usually stored in user profiles. This in turn creates further data bloat and performance issues, particularly when users log in to a roaming or virtual environment.

Addressing the web data challenge

So, with the challenges associated with web data bloat unlikely to go away anytime soon, what can organisations do to address the problem? Let’s take a look at the pros and cons of the standard options that organisations can implement.

Option one – keep all web data

Continuing to keep all web data will ensure that users have a rich browsing experience, ensuring their expectations of websites and web apps are met. However, not only will steps have to be taken to secure that data, the administrator will also be faced with a requirement for increased back-end storage.

For instance, in a business of 1,000 users with an average web data size of 250MB per user, an extra 250GB of storage will be required. And if all users log on at 9am, this 250GB needs transferring to the users, which places strain on the network. All of this affects the user through reduced performance. Login times are massively affected (customers have reported that up to 90% of the time taken to log on is down to web data), browser launch times increase, and so does the time taken to render web pages, which in some extreme cases will time out, impacting productivity.
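The arithmetic behind this example can be sketched as follows. The storage figure comes straight from the numbers above; the transfer time is only an idealised illustration that assumes a single shared 1 Gbps link with no protocol overhead, a figure that does not come from the article.

```python
def storage_estimate_gb(users: int, avg_mb_per_user: float) -> float:
    """Total web data across all profiles, in decimal gigabytes (1 GB = 1000 MB)."""
    return users * avg_mb_per_user / 1000


def transfer_minutes(total_gb: float, link_gbps: float) -> float:
    """Idealised time to move total_gb over one shared link of link_gbps.

    Ignores protocol overhead, contention and disk speed, so real log-on
    storms will take longer than this lower bound.
    """
    total_gigabits = total_gb * 8
    return total_gigabits / link_gbps / 60


total = storage_estimate_gb(users=1000, avg_mb_per_user=250)
print(total)                                   # 250.0 (GB)
print(round(transfer_minutes(total, 1.0), 1))  # 33.3 (minutes on the assumed 1 Gbps link)
```

Even under these generous assumptions, moving every profile at the same time occupies the link for around half an hour, which is why simultaneous 9am log-ons feel so slow.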

Option two – delete the data each time

By deleting all web data after each session organisations will remove the impact on logon, storage, and network while addressing concerns around data security. However, this effectively means that for each logon the user will be starting from scratch as far as internet applications are concerned.

Useful browser items such as history and cookies are not immune and are also lost, including the authentication cookies used to recognise users on sites such as AWS and Office 365. So while the issues of bloat and privacy are removed, problems around user experience and productivity arise in their place.

Option three – disallow third-party cookies

A further option is to use the ‘disallow third-party cookies’ setting that mainstream browsers include. This will stop the majority of the “undesired” cookies being stored, but might also stop certain website features from operating correctly. As such, enabling this setting would address all of the issues (performance, security, user experience) only partially.

Option four – create whitelists and blacklists

Finally, organisations could choose to whitelist and blacklist certain types of web data. By identifying business applications and websites and “whitelisting” them, while blacklisting problem websites and applications, they can ensure that only data that is relevant to business-as-usual activity is retained. However, much like disallowing third-party cookies, this only partially addresses the issues rather than providing a solution to them.
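A minimal sketch of how such lists might be applied to cookie hosts is shown below. The function names, the precedence rule (blacklist wins over whitelist) and the default of deleting anything unlisted are all illustrative assumptions, not a description of any particular product.

```python
from typing import Iterable


def domain_matches(host: str, pattern: str) -> bool:
    """True if host equals pattern or is a subdomain of it.

    Leading dots (common in cookie stores) and letter case are ignored.
    """
    host = host.lstrip(".").lower()
    pattern = pattern.lstrip(".").lower()
    return host == pattern or host.endswith("." + pattern)


def classify(host: str, whitelist: Iterable[str], blacklist: Iterable[str],
             default: str = "delete") -> str:
    """Return 'keep' or 'delete' for a cookie host.

    Assumed policy: the blacklist is checked first, so a blocked subdomain
    of an otherwise trusted site is still removed.
    """
    if any(domain_matches(host, p) for p in blacklist):
        return "delete"
    if any(domain_matches(host, p) for p in whitelist):
        return "keep"
    return default


# Hypothetical lists for illustration only.
WHITELIST = ["crm.example.com", "office.example.com"]
BLACKLIST = ["ads.crm.example.com"]

for h in [".crm.example.com", "ads.crm.example.com", "tracker.example.net"]:
    print(h, "->", classify(h, WHITELIST, BLACKLIST))
```

Matching on whole domain labels rather than on raw substrings avoids accidentally keeping a host such as `notcrm.example.com` just because its name ends with a trusted one.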

A new approach to web data bloat

It’s clear that the standard options available all come with a catch: either a rich browsing experience with web bloat and possible privacy issues, or a lean data footprint with quick loading times that’s hampered by browsing and compatibility issues. As such, what is needed is a solution that completely manages all aspects of web data, offering organisations the opportunity to strike the balance that best suits their specific requirements.

To achieve this, IT teams need a solution that is able to analyse the data generated by simple web browsing and web-based applications. By being able to see what data is present on PCs and servers, they can begin to understand which category, desirable or undesirable, that web data falls into, while also assessing the savings in disk space and storage that can be realised by addressing the issue.
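To give a sense of what this kind of analysis could look like for one browser store, the sketch below counts cookies per host in a SQLite cookie database. It assumes the schema used by Firefox's `cookies.sqlite` (a `moz_cookies` table with a `host` column), which is not described in this article. The table and column are parameters so the same function could be pointed at another store, for example Chrome's `Cookies` database, which is assumed here to use a `cookies` table with a `host_key` column. It is safest to run it against a copy of the file, since browsers typically lock their databases while running.

```python
import sqlite3
from collections import Counter


def cookies_per_host(db_path: str, table: str = "moz_cookies",
                     host_column: str = "host") -> Counter:
    """Count stored cookies per host in a SQLite cookie database.

    The default table and column names are assumptions based on Firefox's
    cookie store. They are interpolated into the query, so only pass
    trusted constants, never user input.
    """
    query = f"SELECT {host_column}, COUNT(*) FROM {table} GROUP BY {host_column}"
    conn = sqlite3.connect(db_path)
    try:
        return Counter(dict(conn.execute(query).fetchall()))
    finally:
        conn.close()
```

Calling `cookies_per_host(path).most_common(20)` on a copied database lists the hosts holding the most cookies, which is a reasonable starting point for deciding what counts as desirable or undesirable data.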

With this visibility of the web browser data that resides in the network and the issues it is creating, it then becomes possible to remove unnecessary data. This will not only reduce the size of users’ web browser databases, but will also provide administrators with full control over users’ browsing data to ensure that only required information is kept. In our experience this can reduce the size of WebCache files by 80 to 90%, and cut the number of cookies in a typical WebCache from 5,000 or more to a few hundred.

As cloud and web applications become increasingly prevalent in the IT environment, it’s critical that organisations adapt their practices to ensure that this new plethora of data doesn’t impact performance, security or the user experience. And this can only be achieved once they have not only full visibility of web data but also the capability to manage and delete it effectively.

ISBuzz Team

The opinions expressed in this post belong to the individual contributors and do not necessarily reflect the views of Information Security Buzz.
