Optimizing the Use of Hadoop with a Process-Oriented Sandboxing Approach

The rapid evolution of technologies like Hadoop has fueled the Big Data hype we’ve been witnessing for years now. As these technologies mature, the Big Data dream is increasingly within reach of any organization with extensive data resources. And once something valuable is perceived as being within reach, business leaders begin to demand that it be implemented, and that it deliver value, across a wide variety of applications.

Companies with self-service access to their Hadoop infrastructure saw an almost 50% increase in achieving tangible value from their Hadoop infrastructure (2015)
— AtScale (1)

The demand for Hadoop solutions has been boosted further by the technology’s maturity. Whereas the adaptability of Hadoop was once associated with loose, relatively ungoverned systems, governance and security on the platform have come a long way. In addition, the early-adopter organizations that initiated Proofs of Concept have realized the value of the technology and are beginning to apply it across their enterprises. The technology itself, of course, is intrinsically scalable: processing capability is increased simply by adding memory, processing power, and storage, all of which are increasingly affordable. Indeed, Hadoop’s ability to provide massive parallelization at a fraction of the cost of a traditional RDBMS makes it the natural option for any organization that is cost sensitive (and I’ve yet to meet a business leader who isn’t open to that!).

So, the message from the Business is “use Hadoop.” Data and information stakeholders then ask “give us a slice of Hadoop.” Someone needs to give them their Hadoop, without overloading the system, and always with an eye on security and effective usage. Individuals and departments are constantly asking, and administrators must continuously establish user accounts, groups, group memberships, folders, tool access, Hive DB access, HBase namespace access, security policies, resource management mappings, etc. And while you want to grant them the access they need to do their work, you must avoid the pitfalls of poorly managed systems that could give individuals more access than they need, or could tie up the system with malformed queries.

However… into 2016 fewer than 50% of companies had self-service access after more than one year of Hadoop adoption
— AtScale (2)

It stands to reason that, at the same time as we employ a technology that makes the most effective use of resources yet imaginable, we should not abandon a system administrator to manage Hadoop sandboxing on an ad hoc basis. A process-oriented approach allows for better monitoring of resources and requests. Furthermore, with proper resource planning and due diligence around the approach, individuals can be allocated to the appropriate queues, so that mission-critical tasks avoid bottlenecks created by low-priority tasks. Process-oriented approaches also allow for greater auditability: logs keep track of exactly which objects have been created, what memberships have been assigned, and what accesses have been granted.
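
To make the idea of queue allocation with an audit trail concrete, here is a minimal Python sketch. It is illustrative only and is not the Adastra implementation: the queue names, the priority labels, and the `SandboxRequest`, `AuditLog`, and `assign_queue` names are all hypothetical, and nothing here talks to a real cluster.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from request priority to a resource queue name.
QUEUE_BY_PRIORITY = {
    "critical": "production",
    "standard": "analytics",
    "exploratory": "adhoc",
}


@dataclass
class SandboxRequest:
    user: str
    department: str
    priority: str  # one of the keys in QUEUE_BY_PRIORITY


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, action: str, target: str) -> None:
        # Store a UTC timestamp with every action so the trail can be audited.
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append({"time": stamp, "action": action, "target": target})


def assign_queue(request: SandboxRequest, log: AuditLog) -> str:
    """Pick a queue from the request priority and log the decision."""
    if request.priority not in QUEUE_BY_PRIORITY:
        raise ValueError(f"unknown priority: {request.priority!r}")
    queue = QUEUE_BY_PRIORITY[request.priority]
    log.record("assign_queue", f"{request.user} -> {queue}")
    return queue


log = AuditLog()
queue = assign_queue(SandboxRequest("jdoe", "marketing", "exploratory"), log)
print(queue)
print(log.entries[0]["action"], log.entries[0]["target"])
```

Because every decision passes through one function that writes to the log, the record of who was placed where is produced as a side effect of the process rather than reconstructed afterwards.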

More effective use of Hadoop resources allows the organization to focus on what is done with those resources. Analytics teams have the means to “play” with the data in a secure environment and prove the value of their solutions before deploying them to the organization. Administrators can monitor activity to increase accountability. Well-managed sandboxes are a great accelerator in terms of time-to-value for analytics teams: they provide the necessary space, data, and tools so that teams can effectively “hit the ground running”, obtaining access with minimal delays and applying their time, energy, and talents to providing value.

Adastra’s Automated Sandbox Provisioning Solution includes:

  • Pre-built solution architecture/design
  • Customizable technology stack and integration
  • Development/configuration accelerators
  • Implementation plan
  • Deployment support for operations team

Adastra-powered Hadoop sandboxing enables:

  • Automated provisioning of:
    • User accounts within Hadoop and associated tool stack
    • Hadoop tool resource definitions
    • Security policies to grant access control
    • Delegation of resources controlled through queues and quotas
  • Execution logs and resource summary for auditing and usage administration
  • Auto-generated business-friendly confirmation emails and resourcing summaries
  • Archiving existing sandbox data resources
  • Ability to roll back provisioned sandboxes by removing access control policies and provisioned resources
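
The rollback capability in the list above can be pictured as pairing every provisioning step with the action that reverses it. The sketch below is one simple way to do that in Python, under stated assumptions: the `provision` function, the step names, and the in-memory `state` set are hypothetical stand-ins, and a real deployment would call the cluster's own administration tools in place of the lambdas.

```python
from typing import Callable, List, Tuple

# Each step pairs a "do" action with the "undo" action that reverses it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def provision(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        name, do, _undo = step
        try:
            do()
        except Exception as exc:
            log.append(f"FAILED {name}: {exc}")
            for prev_name, _do, undo in reversed(done):
                undo()
                log.append(f"ROLLED BACK {prev_name}")
            return False
        done.append(step)
        log.append(f"OK {name}")
    return True


# Hypothetical in-memory stand-in for cluster state.
state = set()


def fail_policy() -> None:
    # Simulates a step that cannot complete.
    raise RuntimeError("policy service unavailable")


steps = [
    ("create user", lambda: state.add("user:jdoe"),
     lambda: state.discard("user:jdoe")),
    ("create folder", lambda: state.add("dir:/sandbox/jdoe"),
     lambda: state.discard("dir:/sandbox/jdoe")),
    ("grant policy", fail_policy, lambda: None),
]

log: List[str] = []
ok = provision(steps, log)
print(ok, sorted(state))
for line in log:
    print(line)
```

In this run the third step fails, so the two earlier steps are undone in reverse order and the execution log records both the failure and each rollback.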

With the Adastra solution…

Once a request for a sandbox is approved by your system manager, the solution’s automated process assigns the new user a sandbox with all the tools and access necessary for them to get to work. Pre-defined queues and quotas make sure that system resources are being used optimally and are not bogged down by unnecessary or unproductive instances. The solution makes sure that sandboxes for critical and priority operations and projects take precedence over more exploratory, experimental, or pet projects.
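
As a rough illustration of how pre-defined queues and quotas might gate and order requests, consider the Python sketch below. The queue names, the quota figures, and the `can_allocate` and `order_requests` helpers are assumptions made for the example; they do not describe the actual configuration of the solution or of any Hadoop scheduler.

```python
# Hypothetical per-queue storage quotas in gigabytes.
QUOTA_GB = {"production": 500, "analytics": 200, "adhoc": 50}

# Lower rank means the queue is served first.
QUEUE_RANK = {"production": 0, "analytics": 1, "adhoc": 2}


def can_allocate(queue: str, used_gb: dict, requested_gb: int) -> bool:
    """Return True if the request fits within the queue's remaining quota."""
    remaining = QUOTA_GB[queue] - used_gb.get(queue, 0)
    return requested_gb <= remaining


def order_requests(requests: list) -> list:
    """Sort pending requests so higher-priority queues are served first."""
    return sorted(requests, key=lambda r: QUEUE_RANK[r["queue"]])


used = {"adhoc": 40}
print(can_allocate("adhoc", used, 20))
print(can_allocate("adhoc", used, 10))

pending = [
    {"user": "explorer", "queue": "adhoc"},
    {"user": "billing", "queue": "production"},
]
print([r["user"] for r in order_requests(pending)])
```

Keeping quotas and queue ranks in plain data tables like these means the rules can be reviewed and changed without touching the provisioning logic itself.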

Summaries of activity are readily available to administrators, and users receive fully automated confirmation and activity summary notifications. To help ensure the process is functioning properly, detailed logs are kept for auditing purposes. Existing sandbox data resources are archived frequently to protect data from shutdowns and to ensure that nothing of value is discarded unintentionally. The solution also features full rollback functionality.

Benefits of Adastra’s Sandbox Provisioning Solution:

  • Frees up the valuable time of system administrators
  • Mitigates the risk of errors and their related costs
  • Ensures consistent naming conventions and syntax across projects and departments
  • Handles large volumes of requests quickly and efficiently
  • Enables effective oversight, auditing, and usage management

Accelerate the Big Data Analytics capabilities of your team: combine their business knowledge and insight with the power of the latest Big Data technologies, and unlock the value of your data without exposing yourself to risk.

Contact us today.


John Yawney Analytics Practice Lead


SOURCES:
1. 2015 Hadoop Maturity Survey Results, AtScale, pp. 16 & 28. http://info.atscale.com/2015-hadoop-maturity-survey-results-report
2. 2016 Hadoop Maturity Survey Results, AtScale, p. 17. http://info.atscale.com/2016-hadoop-maturity-survey-results-report