Becoming Sustainably Data Driven: Step 4

Robust, Agile Advanced Analytics that Make a Difference


This post is the fourth and final installment of our "Becoming Sustainably Data Driven" series.


In response to a rapidly changing technical landscape and the need to remain relevant, organizations are becoming more dynamic and innovative. Strategically minded companies understand the revenue-generating power of an enhanced customer experience informed by meaningful insights from their data assets. When customers feel that an organization is responsive to their needs and flexible enough to deliver, they are more likely to remain customers. Agility is equally important internally: energized and fully engaged data science teams are more likely to create robust solutions that deliver quick and continued insights, which in turn help ensure continued commitment from decision makers.

The traditional “waterfall” approach depends on complete, concrete plans that minimize the risk of obstacles and delays. Success requires well-understood, precisely stated requirements, a well-defined approach, and accurate estimates. As any project manager knows, the perfect waterfall project remains an unachievable ideal.

Data science has brought a new age of innovation and dynamism to information management projects. In contrast to the traditional approach, the typical data science process is highly iterative and agile. Data science teams are given the flexibility to achieve their goals, modify approaches to fit boundaries, and manage constraints as they arise. By acknowledging and accepting uncertainty, and by capitalizing on clever ideation, data science teams deliver optimal value across a wide range of situations.

Data science succeeds because of its approach. Research scientists spend years developing futuristic products that may or may not succeed, and most organizations do not have the flexibility or capital to invest in things that may not work. Data scientists, on the other hand, stay connected with the business. Fueled by their creative aptitude and driven to produce tangible business value, a successful data science team shows incremental, usable results at each iteration of its development lifecycle, ensuring the continued support of the organization [4].

By engaging frequently with relevant business stakeholders, data scientists get immediate feedback on whether their approach is consistent with the business goal. This feedback allows them to understand what changes are needed to improve the strategic business outcome, and to determine early on whether to adjust or even change course [3]. To foster this all-important feedback, an ideal agile data science project follows these steps:

1) Present the business problem that needs to be solved.
2) Brainstorm and choose the set of hypotheses to explore, and strategize for the next iteration of the project lifecycle.
3) Begin development toward solutions that confirm or reject the hypotheses.
4) Present incremental developments to relevant stakeholders, accepting feedback and adapting the project trajectory according to the suggestions and limitations uncovered.
5) Repeat steps 2-4, potentially expanding to incorporate additional data sources, advancements in the analytical models, and added functionality, until the problem from step 1 has either been solved or been deemed unsolvable.
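
To make the loop concrete, here is a minimal sketch of steps 2-5 in Python. It assumes the team supplies its own develop and review functions; these are hypothetical placeholders for the actual modelling work and stakeholder review, not part of any library:

```python
# A minimal sketch of the iterative loop above (steps 2-5).
# `develop` and `review` are hypothetical callables supplied by the team.

def agile_project(hypotheses, develop, review):
    """Iterate on a backlog of hypotheses until one solves the problem.

    develop(hypothesis) -> evidence            (step 3: build a candidate solution)
    review(evidence)    -> (solved, new_ideas) (step 4: stakeholder feedback)
    """
    backlog = list(hypotheses)                 # step 2: chosen hypotheses
    while backlog:
        hypothesis = backlog.pop(0)
        evidence = develop(hypothesis)         # step 3: development
        solved, new_ideas = review(evidence)   # step 4: present, get feedback
        if solved:
            return evidence                    # the step 1 problem is answered
        backlog.extend(new_ideas)              # adapt course and continue
    return None                                # deemed unsolvable, for now
```

The essential property is that every pass through the loop ends with a stakeholder review, so the project can change direction after each iteration rather than after months of work.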

These general steps are applicable to most data science projects, from simple model development to larger strategic initiatives. Figure 1 illustrates a methodology, popularized by the Spotify team, that succinctly summarizes the intentions behind a continuous, value-generating data science process.

Figure 1: How to build a minimum viable product [5]

The value of agility extends beyond development methodology. Advances in big data technology have made scalable analytics readily available, allowing data science teams to experiment and iterate much faster. This lets organizations capture insights on a much larger scale and at a finer granularity than ever before. For example, rather than having to make general conclusions about an individual based on the population segment they fall into, scalable analytics platforms now allow organizations to capture individuals’ patterns and make effective, targeted offers.

Technical breakthroughs such as Hadoop and Spark have made it possible to scale analytics up and out across the organization. Rather than having to develop low-level code within parallel computing environments requiring countless edit-compile-run-debug cycles [2], these technologies put parallel computation within reach of employees throughout the business. With integrations for R and Python (the two fundamental tools in the data science toolbox), Spark is now even more accessible to data scientists who may not have the technical depth, or the preference, to program in a lower-level language.
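
As an illustration, here is a minimal PySpark sketch of a distributed aggregation: the data scientist writes ordinary Python, and Spark handles the parallelism. The file path and column names are assumptions for the example, not a prescribed schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark distributes the computation across the cluster; the Python code
# never touches threads, compilation, or low-level parallel primitives.
spark = SparkSession.builder.appName("customer-patterns").getOrCreate()

# Illustrative input: per-transaction records with customer_id and amount.
transactions = spark.read.parquet("transactions.parquet")

# Aggregate each customer's spending pattern in parallel.
spend_by_customer = (
    transactions
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_spend"),
         F.count("*").alias("n_purchases"))
)
spend_by_customer.show(10)
```

The same few lines run unchanged on a laptop or on a cluster of hundreds of nodes, which is exactly what makes rapid, scalable experimentation possible.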

Robust, diverse technologies are arming data science teams and enabling agility and success, from tools that automate general operational tasks to those that incorporate cognitive intelligence into enhanced offerings. With an ever-growing list of tools, organizations face strategic questions such as whether to buy or build, and how to implement, consume, and expand upon technology choices in the long-term interest of the organization.

Robust methodologies require strategic thinking about which approaches to implement, as well as how to implement them. Are you applying the correct data transformations and ensuring that metrics are standardized for consistent comparison? A robust workflow that performs all necessary checks and steps yields far more extensible solutions. Furthermore, adopting standard development principles, such as writing short functions, commenting thoroughly, and limiting hard coding, allows data scientists to develop models and scripts that are more flexible and shareable. Most tools now follow robust design principles as well, handling an array of data types and sources; this flexibility is what allows big data tools to execute models across schema-on-read and unstructured data sources.
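
As a small example of these principles, the helper below standardizes a metric column so values are comparable across sources. It is short, takes the column name as a parameter instead of hard coding it, and validates its input rather than assuming it is clean. The function is illustrative, not part of any particular toolkit.

```python
import pandas as pd

def standardize(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the z-scored values of `column` for consistent comparison.

    The column name is a parameter rather than a hard-coded constant,
    and the checks below fail fast on unusable input.
    """
    if column not in df.columns:
        raise KeyError(f"column {column!r} not found")
    values = pd.to_numeric(df[column], errors="coerce")  # tolerate messy types
    std = values.std()
    if std == 0 or pd.isna(std):
        raise ValueError(f"column {column!r} has no variance to standardize")
    return (values - values.mean()) / std
```

Because nothing in the function is tied to one dataset, the same helper can be tested once and shared across projects.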

Whether your organization is just beginning its data science journey or already has an experienced analytics team, Adastra’s data science practice offers the assistance and support to enhance and accelerate your efforts. Feel free to reach out to our data science practice lead to start a conversation.


John Yawney, Analytics Practice Lead


References

[1] Akred, John. "Successful Data Teams Are Agile and Cross-Functional." Silicon Valley Data Science, 22 Sept. 2017, www.svds.com/tbt-successful-data-teams-are-agile-and-cross-functional/. Accessed 3 Oct. 2017.
[2] Janssens, Jeroen. Data Science at the Command Line. Sebastopol, CA: O'Reilly, 2015.
[3] Overton, Jerry. Going Pro in Data Science. Sebastopol, CA: O'Reilly, 2016.
[4] Patil, DJ. Building Data Science Teams: The Skills, Tools, and Perspectives Behind Great Data Science Groups. Sebastopol, CA: O'Reilly, 2011.
[5] Shikhanov, Kirill. "MVP." Dribbble, 5 Oct. 2014, https://dribbble.com/shots/1753131-MVP. Accessed 3 Oct. 2017.