With big data projects on the rise, many organisations are turning to Apache Hadoop, which enables them to run large-scale analytics and processing cost-effectively. This popularity, however, has put Hadoop’s security capabilities under scrutiny, and questions have been raised about whether it is ready for secure, production environments.
Hadoop’s versatility puts security under scrutiny
We have seen Hadoop evolve from its experimental beginnings to being deployed in many different applications at enterprises in a wide range of industries. Some of the most popular uses include: data warehouse optimisation, fraud and anomaly detection, recommendation engines and clickstream analysis.
Hadoop’s proliferation means its security capabilities have come under scrutiny, but much of that scrutiny rests on an unfortunate mischaracterisation. The question is not whether Hadoop is ready for secure environments: it already runs in some of the most security-conscious organisations in the world, including financial services, healthcare and government. The real issue is identifying the right approach for a specific environment.
Native security capabilities need to be deployed correctly
Hadoop includes native security capabilities. Authentication, typically provided through Kerberos integration, verifies users’ identities before they can touch secure data. Authorisation, or access control, grants and denies permissions for accessing specific data. Auditing, such as analysing user behaviour and demonstrating compliance with business requirements, can be done in a variety of ways.
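As a concrete illustration of the first of these, switching a cluster from the default "simple" authentication to Kerberos is done in Hadoop's standard core-site.xml (a minimal fragment; distributions differ in where this file lives and what else must be configured, such as keytabs and principals):

```xml
<!-- core-site.xml: require Kerberos authentication and enable
     service-level authorisation checks -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```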
Encryption is also supported, though it is often misunderstood because it is sometimes misused as a substitute for access controls. Its proper role is to protect data sent over the network (data in motion) and data at rest, which remains safe even if physical storage devices are stolen.
Additionally, sensitive elements within files can be obscured through masking or field-level encryption. This renders the data non-sensitive while retaining its analytical value. This type of protection is offered by a variety of third-party vendors for Hadoop.
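One common technique behind such products, sketched below with Python's standard library (a simplified illustration, not any particular vendor's implementation; the key name and token length are assumptions), is deterministic masking: hashing a sensitive field with a secret key so records can still be joined and counted without exposing the raw value.

```python
import hashlib
import hmac

# Hypothetical key for illustration; in practice this would live in a key store.
SECRET_KEY = b"rotate-me-outside-the-code"

def mask_field(value: str) -> str:
    """Deterministically obscure a sensitive value.

    The same input always yields the same token, so the masked column
    retains its analytical value (joins, group-bys, distinct counts)
    while the original value cannot be read back.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Two records for the same customer still match after masking,
# while different customers stay distinct:
a = mask_field("123-45-6789")
b = mask_field("123-45-6789")
c = mask_field("987-65-4321")
print(a == b, a == c)  # True False
```

Because the mapping is keyed and one-way, analysts can work on the masked column directly; only a holder of the key could even attempt to correlate tokens back to inputs.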
Advanced security added to suit business needs
Organisations should take steps to protect themselves while reaping the benefits of Hadoop.
For example, in some deployment models, a Hadoop cluster can be secured with firewalls and other network protection schemes to only allow trusted users to access it. This is the most basic type of implementation and does not depend on specific security capabilities in Hadoop. As an extension to this, a model can prohibit direct logins to the cluster servers while users are given data access via edge nodes combined with basic Hadoop security controls.
In a more sophisticated approach, native Hadoop security controls open the cluster to a broader user base while restricting data access to authorised users. In even more advanced environments, Hadoop security capabilities are fully deployed in conjunction with monitoring and analytics tools on Hadoop clusters to detect and prevent intrusion and other rogue activities.
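The monitoring idea can be sketched as follows (a toy illustration with an assumed event format and threshold; real deployments would feed the cluster's audit logs into a dedicated analytics tool): flag any user whose access volume deviates sharply from the norm.

```python
from collections import Counter

def flag_unusual_users(access_log, threshold=2.0):
    """Flag users whose access count exceeds `threshold` times the mean.

    `access_log` is an iterable of (user, resource) events, a crude
    stand-in for the audit trail a Hadoop cluster would emit.
    """
    counts = Counter(user for user, _ in access_log)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return [user for user, n in counts.items() if n > threshold * mean]

events = ([("alice", "/data/sales")] * 3
          + [("bob", "/data/hr")] * 2
          + [("mallory", "/data/pii")] * 40)
print(flag_unusual_users(events))  # ['mallory']
```

A fixed multiple of the mean is deliberately simplistic; production systems would use per-user baselines and look at what was accessed, not just how often.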
Hadoop’s use in many companies with sensitive data validates its legitimacy as a secure environment. As with any new technology deployment, teams considering Hadoop should first document what is most important, then look for specific features that support those priorities. They should talk to Hadoop vendors and third-party security providers, and conduct a rollout that largely mirrors the requirements already enforced in their other enterprise systems.
Lack of standards means making it your own
There are several options for handling access control in Hadoop. However, since no universal standard exists, security professionals have to do a bit more investigation to determine the right option for their business. Some technologies take a build-as-you-go, follow-the-data style of approach, while others are more data-centric.
The lack of standards should not deter organisations from adopting Hadoop. The range of possible approaches means different levels of people and process can be applied, no different from other enterprise systems. With a vast number of production environments and sensitive datasets already running on Hadoop, security capabilities are only set to improve as adoption continues to grow.

About Dale Kim

Dale Kim is the Director of Industry Solutions at MapR. His background includes a variety of technical and management roles at information technology companies. While his experience includes work with relational databases, much of his career pertains to non-relational data in the areas of search, content management, and NoSQL, and includes senior roles in technical marketing, sales engineering, and support engineering. Dale holds an MBA from Santa Clara University, and a BA in Computer Science from the University of California, Berkeley.