In my mind, Dark Data is a subset of Big Data: enormous, but without the formal boundaries that database schemas define. In other words, it’s the human-generated content in documents, presentations, spreadsheets, notes, and other readable formats that makes up the bits and bytes of a corporate file system.
Corporate Dark Data comes about as a natural by-product of employees creating content to communicate ideas. Every document is, after all, just a thought that’s been converted to bits. However, we’ve grown accustomed to treating our file system as an enormous storage medium: files are continuously added, and hardly ever deleted.
Of course, ultimately there are real costs in terms of buying additional network-attached storage and paying admins to manage and protect all this data. And then the costs associated with a breach can be very high. There are also hidden costs when you can’t find the information you need because it’s hiding somewhere in the sprawl or has been inadvertently deleted. But as users, we don’t pay much attention until we run out of disk space, hit our quota, or can’t find something we desperately need.
Dark Value: Infonomics
Looking on the other side of the balance sheet, analysts such as Gartner’s Doug Laney see Dark Data as a new kind of asset class worthy of being put on the corporate books. Laney has even proposed various methods to value corporate information. You can choose between fuzzier non-financial valuations, such as IVI (intrinsic value of information), and more bottom-line techniques, such as MVI (market value of information), which is based on how much someone is willing to pay. And Gartner has a whole theory around this topic, which it calls Infonomics.
Back to the nitty gritty of file system economics. One use case I’ve heard analysts discuss comes out of the insurance world.
Suppose an important customer has made a complaint about a pending claim. No doubt much of the information about the claim has been broken down into searchable database records. But not everything.
Think of all the communication between the company and the customers: Word docs and PDF files, notes from claim inspectors, and other file content associated with their interactions, along with any internal emails. To get a better sense of how the company responded or failed to respond, it would make sense to search for this customer-related information in the file system, using appropriate keywords associated with name, account number, email addresses, etc.
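To make that search concrete, here is a minimal sketch in Python. It assumes the content has already been reduced to plain text (Word and PDF files would need a text-extraction step first), and the function name, the extension list, and the example keywords are all hypothetical, chosen only for illustration.

```python
import os


def find_customer_files(root, keywords, extensions=(".txt", ".md", ".csv")):
    """Return {path: [matched keywords]} for plain-text files under root.

    Matching is a case-insensitive substring test, which is an
    assumption of this sketch and not a full-text search engine.
    """
    needles = [k.lower() for k in keywords]
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # skip formats this sketch cannot read as text
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read().lower()
            except OSError:
                continue  # unreadable file: skip it and keep scanning
            matched = [k for k in needles if k in text]
            if matched:
                hits[path] = matched
    return hits
```

A call might look like find_customer_files("/shares/claims", ["Jane Doe", "ACCT-1001"]), where both the share path and the identifiers are made up.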
And if the company wanted even more context, it would also search for customers with a similar complaint and then correlate all the results. The combined results may point beyond a one-off problem to a root cause stemming from, say, a workflow glitch or perhaps even an individual agent mishandling a specific kind of issue.
This is, of course, quite valuable information, which may not show up through conventional methods involving CRM or other corporate IT systems.
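One simple way to picture the correlation step is a frequency count: once each complaint found in the file system has been tagged with attributes such as the handling agent or the workflow step, any value that keeps recurring is a candidate root cause. The sketch below assumes that tagging has already happened; the field names and the min_count threshold are hypothetical.

```python
from collections import Counter


def recurring_factors(complaints, field, min_count=2):
    """Return (value, count) pairs for values of `field` that recur.

    `complaints` is a list of dicts, one per complaint. Only values
    seen in at least `min_count` complaints are returned, most
    frequent first.
    """
    counts = Counter(c[field] for c in complaints if c.get(field))
    return [(value, n) for value, n in counts.most_common() if n >= min_count]
```

Calling recurring_factors(complaints, "agent") and then recurring_factors(complaints, "workflow_step") on the same list gives a rough first look at whether the problem follows a person or a process.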
As I talked about in a previous post, to speed up the search for customer information, it makes great sense to use metadata-based classification methods. Let’s take the insurance claim example: you’d want your file system searches to be restricted to folders belonging to certain internal groups and active within specific time periods (e.g., the property insurance department in July).
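As a rough illustration of that kind of metadata filter, the Python sketch below narrows a scan to a given set of folders and a modification-time window before any file content is read. Using last-modified time as the measure of "active" is an assumption of this sketch, and the function name, folder path, and year in the usage note are hypothetical.

```python
import os
from datetime import datetime


def files_modified_between(folders, start, end):
    """Yield paths under `folders` whose last-modified time is in [start, end).

    `start` and `end` are datetime objects. Only file metadata is
    consulted, so no file is opened.
    """
    start_ts, end_ts = start.timestamp(), end.timestamp()
    for folder in folders:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    mtime = os.stat(path).st_mtime
                except OSError:
                    continue  # file vanished or is inaccessible
                if start_ts <= mtime < end_ts:
                    yield path
```

For the example above, the call could be files_modified_between(["/shares/property-insurance"], datetime(2024, 7, 1), datetime(2024, 8, 1)), and the resulting paths would then be handed to the keyword search.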
Outside of strictly dollars-and-cents issues, there’s also daily operational work that can be handled best on the dark side. A relevant use case in this area typically involves internal compliance or governance. Perhaps a financial company is trading or carrying out other transactions in a specific security, or maybe there’s a request for information from the corporate counsel (for example, e-discovery).
The operational issue is to find all the relevant information in the file system and then freeze or quarantine the files, putting highly restricted permissions on the contents. In other words, you wouldn’t want an employee accidentally acting on or changing information that’s currently the basis for a larger strategic initiative.
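The freeze step can be sketched with nothing more than file modes. The Python example below assumes a POSIX file system and simply drops each file to owner-read-only, remembering the previous mode so the hold can be lifted later. A real deployment would more likely rely on ACLs or a dedicated governance tool, and the function names here are hypothetical.

```python
import os
import stat


def freeze_files(paths):
    """Make each file owner-read-only and return its previous mode.

    The returned dict maps path -> old permission bits so that
    unfreeze_files() can restore them once the hold ends.
    """
    previous = {}
    for path in paths:
        previous[path] = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, stat.S_IRUSR)  # 0o400: owner may read, nobody may write
    return previous


def unfreeze_files(previous):
    """Restore the permission bits recorded by freeze_files()."""
    for path, mode in previous.items():
        os.chmod(path, mode)
```

Note that the file owner and administrators can still change the mode back, so this guards against accidental edits, not deliberate ones.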
Dark New World
The key takeaway for IT is to start looking outside the well-defined world of databases and enterprise systems (CRM, ERP). The raw data that’s created every day by employees as they use the file system has non-zero value. Taken together, this dark data mass both has significant intrinsic value and is a good source of operational intelligence.
By Andy Green | Varonis