David Gibson, vice president, Varonis, takes a look at how we can organise, analyse and utilise the contents of the biggest wave of human creativity that has ever engulfed the world.
If you think about it, as knowledge workers, we create every day. No matter what job we do in an office, we have software to capture our ideas and make them permanent: Word documents for notes and reports, Excel spreadsheets for formulas and models, and PowerPoint for big visions and roadmaps. The corporate file system has become both an enormous library, holding more words than are in all the “stacks” of any pre-digital college library, and a village commons, where corporate citizens can meet and exchange ideas.
With the Internet era now in its second decade, files, not paper documents, have truly become the vehicle for thoughts. Unlike legacy libraries, much of an organisation’s file system contents are spontaneously generated by its patrons. We use files as self-published books for our words, ideas, and raw data to share with co-workers. These data “books” are at the heart of enterprise digital collaboration.
Human-generated data is immediate and democratised: employees are free to publish their information in various formats, and it becomes available to others in real time. With so many sources, storage platforms, and formats, data complexity quickly grows out of control. IT administrators are tasked with making this valuable information accessible at all times, making sure it’s protected, stored in the right places, and then deleted when it’s no longer relevant.
Human-generated data is far more difficult to organise and categorise than legacy print, and it exceeds the complexity of two-dimensional structured data in databases by orders of magnitude. Organising the chaos of unstructured data is far too difficult an assignment to be accomplished without significant help.
Just as libraries needed indexes to map out human knowledge and make it accessible, file systems also demand an organisational schema. A Dewey decimal-like system, of course, won’t work. How could it? Ad hoc, human-generated information doesn’t fall into a fixed set of static categories. Fortunately, the clues for a more practical organisation can be found alongside the data, in the metadata.
Metadata describing formats and sizes, who accessed the files, when they were accessed, what actions were taken (read, modify, copy), and who owns the files and folders is a powerful classifier. Why? If we consider the user community in a company as a social network, file access is similar to a “like” or a “follow”. Those who access the same files and folders are likely to have common work interests and content requirements and, with high probability, even belong to the same group or department. And just as social networking sites are able to map the interests and preferences of their users and groups and, crucially, leverage those connections to enhance the overall user experience, it’s also possible to use metadata to create a relationship map between users, groups, and data within an organisation.
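To make the “like” or “follow” analogy concrete, here is a minimal sketch of how shared file access could be turned into a relationship map. It is an illustration only, not a description of how any product works: the access records, the user and folder names, the choice of Jaccard similarity, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical access-log records as (user, folder) pairs.
# All names here are invented for illustration.
ACCESS_EVENTS = [
    ("alice", "/finance/budgets"),
    ("alice", "/finance/forecasts"),
    ("bob", "/finance/budgets"),
    ("bob", "/finance/forecasts"),
    ("carol", "/engineering/specs"),
    ("dave", "/engineering/specs"),
    ("dave", "/finance/budgets"),
]


def folders_by_user(events):
    """Collect the set of folders each user has touched."""
    touched = defaultdict(set)
    for user, folder in events:
        touched[user].add(folder)
    return dict(touched)


def jaccard(a, b):
    """Overlap of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_users(events, threshold=0.5):
    """Return user pairs whose folder overlap meets the threshold.

    The threshold is an arbitrary assumption, not a recommended value.
    """
    touched = folders_by_user(events)
    pairs = {}
    for u, v in combinations(sorted(touched), 2):
        score = jaccard(touched[u], touched[v])
        if score >= threshold:
            pairs[(u, v)] = score
    return pairs


if __name__ == "__main__":
    for (u, v), score in related_users(ACCESS_EVENTS).items():
        print(f"{u} and {v} overlap: {score:.2f}")
```

In this toy data, users who work in the same folders surface as related pairs, which is the same signal a social network draws from shared “likes”. A real deployment would also weigh the type of action, its recency, and group membership.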
There is a big data technology specifically designed to mine and analyse massive amounts of file system and email metadata and draw key insights about company data usage, security, and more. At Varonis, our unique metadata framework harnesses unstructured, human-generated data, extracting the metadata related to content sensitivity, activity, and accessibility from emails, intranets, and file systems. In most organisations this metadata either doesn’t exist, or if it does, it’s in a format that is unusable for analytics.
The Varonis framework was engineered to efficiently handle the knotty computing problems involved in processing enormous amounts of metadata in real time without burdening production systems. The complexity of human-generated data can be a little surprising. As an example, a single terabyte of unstructured data usually contains 50,000 folders, 2,500 of which are uniquely permissioned. Each of those 2,500 folders is usually accessible by 4 groups, and each of those groups usually contains 10 (or even more) members. Between the folders, users, and groups, we can easily exceed 100,000 functional relationships that we need to track, and that’s before we start looking at who is accessing the data and what it contains. Typical organisations are now dealing with hundreds or thousands of terabytes of information.
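The arithmetic behind that figure can be sketched as follows. This is one plausible reading of the numbers quoted above, not a stated method: it assumes that a “functional relationship” means either a folder-to-group grant or a user-to-folder access path, and that the counts scale linearly with the number of terabytes.

```python
# Typical per-terabyte figures quoted in the text.
UNIQUELY_PERMISSIONED_FOLDERS = 2_500
GROUPS_PER_FOLDER = 4
MEMBERS_PER_GROUP = 10


def relationship_counts(terabytes=1):
    """Estimate relationships to track for a given data volume.

    Linear scaling with terabytes is an assumption made for illustration.
    """
    folders = UNIQUELY_PERMISSIONED_FOLDERS * terabytes
    # Each uniquely permissioned folder grants access to several groups.
    folder_group_grants = folders * GROUPS_PER_FOLDER
    # Each grant gives every member of that group a path to the folder.
    user_folder_paths = folder_group_grants * MEMBERS_PER_GROUP
    return {
        "folder_group_grants": folder_group_grants,
        "user_folder_paths": user_folder_paths,
        "total": folder_group_grants + user_folder_paths,
    }


if __name__ == "__main__":
    for tb in (1, 100, 1000):
        print(tb, "TB:", relationship_counts(tb))
```

Under these assumptions a single terabyte yields 10,000 folder-to-group grants and 100,000 user-to-folder paths, which is how the total passes the 100,000 mark before any access activity or content is considered.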
At Varonis, we help our customers solve urgent problems. We’ve become their central command centre for data governance and security – making sure that data is only accessible to the right people at all times, use is monitored, and abuse is flagged. We also enable secure collaboration through a secure private cloud – giving users access to corporate data from any device, with easy-to-use file sync and share capabilities. Lastly, we provide intelligent and automated data retention and migration. Using our powerful metadata, we can determine which data the business should keep, delete, archive, or move and then automatically execute the appropriate action without the risk of downtime or permissions errors.
But this is usually just the beginning of our customers’ journey. Today, most organisations don’t have any insight into who is accessing files, mailboxes, and intranets. Imagine if you couldn’t see who was accessing your bank account. What would be the first thing you’d do if someone gave you the ability to see every financial transaction? You’d immediately make sure your assets were protected and only accessible by the right people. Next, you’d start to analyse access, looking for patterns of misuse. Once you were confident in the security of your account, you’d leverage your newfound intelligence to optimise your finances.
Companies that haven’t mined their human-generated data are in a similar position. Once these organisations have visibility and control over their data, they can manage and protect it effectively, begin to extract more value from it, and ultimately make better IT and business decisions.
Varonis is the foremost innovator and solution provider of comprehensive, actionable data governance solutions for unstructured and semi-structured data, with over 4,500 installations spanning leading firms in financial services, government, healthcare, energy, media, education, manufacturing and technology worldwide. Based on patented technology, Varonis’ solutions give organisations total visibility and control over their data, ensuring that only the right users have access to the right data at all times.