I am here with Rahim Hajee, Chief Technology Strategy Officer at Adastra Corp., where his primary focus is helping organizations build and integrate new technologies into their Data and Analytics roadmaps.
In a previous conversation, Rahim and I focused on helping organizations get more value out of the customer journey and how it relates to the enterprise data model. Today’s conversation will focus on Data Mesh and Data Fabric.
What are the drivers that require organizations to continuously evolve their approach to enterprise data management?
Organizations are driven by continuous innovation and improvement. This requires them to adopt new technologies to better deliver insights for decision making, achieve efficiency and scale, and meet the needs of their customers. The journey to innovation is paved with an acceleration in data volume, velocity, and veracity, as well as growing complexity and higher demand. Often there is a disconnect between current enterprise capabilities and processes and the technology necessary to manage this change in data and deliver on the promise of value creation through data. More sophisticated use cases, with the underlying requirement that data be identifiable, clean, accurate, and accessible, are where the need for evolution arises.
In the early 2000s, the concept of Data Governance was emerging, and wasn’t that supposed to be the Holy Grail of data management? Is there still a role for Data Governance in today’s organization?
In short, yes, there continues to be a role for Data Governance, probably more now than ever. Some organizations that initially implemented top-down approaches have done very well; others were overwhelmed. Some organizations tried bottom-up approaches, which resulted in too much focus at the granular level, never delivered on the promise of scaling across the organization, and were ultimately abandoned.
As the autocratic approach of mandating structure, processes, documentation, technology, and so on made a lot of organizations shy away, core foundational elements such as metadata management and data quality management were also treated as an afterthought.
With the shift towards self-service models, what used to be garbage-in, garbage-out is now garbage-in, garbage everywhere. The impetus for change, however, resides in organizations pushing forward data democratization programs and empowering everyone in the organization to have access to the right information, to build the right analytics, to make the right business decisions. Now more sophisticated architectures, more sources, and more consumers of data all scream the need to adopt Data Governance so that the organization has trust in the shared data. Paramount principles such as data literacy, security, architecture, delivery, analytics, AI, and automation have become more important than ever.
This is why Adastra developed the concept of “Just Enough Governance”, a use-case-driven approach that ensures foundational elements, such as stewardship, standards, and core data management methodologies, are in place as organizations deliver on those use cases.
This is a great time to ask: Data Mesh, Data Fabric, can you help us understand these two concepts?
Yes. In a nutshell, these two concepts address the same underlying challenges of scalable data management: identification, access, and security of data, all with minimum impact and change to the organization. It’s about the data lifecycle and the data supply chain, also known as DataOps, as the new term goes. Maybe some data architects will disagree with me, but both concepts aim to treat data as a product: a product you “manage” in a Data Mesh versus a product you “shop for” in a Data Fabric.
The Data Mesh itself is an architectural framework that treats data domains as products, allowing for efficient data access, delivery, and problem solving for analytical challenges, while minimizing IT bottlenecks. Teams of domain experts maintain quality and make the data available to the organization through a decentralized process. They manage their own environments, eliminating the limitations around scale and agility. In a Data Mesh environment, the owners are responsible for the quality and delivery of the data. There are four pillars of a Data Mesh: data as a product, data domain ownership and architecture, federated Data Governance, and self-service infrastructure as a platform.
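The "data as a product" pillar can be made concrete with a small sketch. The following Python snippet is purely illustrative; the class, field names, and validation logic are my own assumptions, not part of any Data Mesh standard. It shows how a domain team might publish a data product with an explicit owner and a schema contract that downstream consumers can validate records against.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Illustrative data product descriptor: a domain team owns it,
    and the schema acts as the published contract for consumers."""
    name: str                  # product name, e.g. "customer_orders"
    domain: str                # owning business domain
    owner: str                 # accountable domain team or steward
    schema: dict               # column name -> Python type (the contract)
    quality_checks: list = field(default_factory=list)

    def validate(self, record: dict) -> bool:
        """Return True if a record matches the published schema contract."""
        return set(record) == set(self.schema) and all(
            isinstance(record[col], typ) for col, typ in self.schema.items()
        )

# A hypothetical product published by the sales domain.
orders = DataProduct(
    name="customer_orders",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": int, "customer_id": int, "amount": float},
)

print(orders.validate({"order_id": 1, "customer_id": 42, "amount": 19.99}))  # True
print(orders.validate({"order_id": "1", "customer_id": 42}))                 # False
```

In a real mesh, the contract would live in a platform-managed registry rather than application code, but the principle is the same: the domain team, not central IT, owns the definition and its quality guarantees.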
Now, the Data Fabric is a concept supported by technology that takes advantage of analytics, automation, and dynamic integration, all driven by metadata. The key is to discover, understand, and provision the data through ML and automation. The Data Fabric focuses on a set of connected data services that allow data consumers and developers to find, understand, and access data across the organizational estate.
But it doesn’t stop there: the concept extends to active metadata, inferring new metadata from existing metadata by discovering relationships in the data and how it is used, along with automated profiling, classification, and cleansing. And it keeps going: when the data is needed, because you now have both source and target metadata, the fabric can build dynamic data integration pipelines and delivery mechanisms, providing the most relevant and accurate data.
The key pillar here is a very strong Data Catalogue, in addition to: a connected knowledge graph, the ability to capture active metadata and infer relationships, enrichment of metadata based on AI/ML discovery, automation of data integration design, and support for multiple delivery styles through automated orchestration. If we push the concept further, the ML models can now start to predict which data will be most needed and have it ready before it is needed.
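To make "inferring relationships from metadata" less abstract, here is a toy sketch with invented dataset and column names. A real Data Fabric applies ML over usage, lineage, and profiling signals; this toy merely matches column names and types across catalogue entries to propose candidate join keys, which is the simplest form of that idea.

```python
# Toy catalogue: dataset name -> {column name: type}. All names invented.
catalogue = {
    "crm.customers": {"customer_id": "int", "email": "str", "region": "str"},
    "sales.orders":  {"order_id": "int", "customer_id": "int", "amount": "float"},
    "web.sessions":  {"session_id": "int", "customer_id": "int", "page": "str"},
}

def infer_join_keys(catalogue):
    """Propose candidate relationships: columns that share a name and
    type across two datasets are flagged as likely join keys. This is a
    crude stand-in for the ML-driven discovery a real fabric performs."""
    links = []
    datasets = sorted(catalogue)
    for i, left in enumerate(datasets):
        for right in datasets[i + 1:]:
            for col, typ in catalogue[left].items():
                if catalogue[right].get(col) == typ:
                    links.append((left, right, col))
    return links

for left, right, col in infer_join_keys(catalogue):
    print(f"{left} <-> {right} via {col}")
# Each pair of datasets is linked through the shared customer_id column.
```

Once such relationships are captured as active metadata, the fabric can use the source and target schemas to generate integration pipelines automatically, as described above.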
Data Fabric fundamentally is about eliminating human effort through technology, while the Data Mesh is about smarter and more efficient use of human effort through organizational change.
Both concepts are very well suited to Cloud environments.
To summarize then: the Data Mesh is a decentralized architecture model driven by federated governance of data domains, while the Data Fabric leverages technology (AI/ML) to discover data and deliver it through the data catalogue, making it available to the organization?
I can see that there are definitely benefits to both Data Mesh and Data Fabric. Would you implement both, or do you implement one versus the other?
Right now, it’s almost like a battle for supremacy: Data Mesh versus Data Fabric, right? It just takes me back to “is it HD DVD or Blu-ray?”
But I say they're complementary, not in every aspect, but you could take the best of both. You can definitely leverage the architectural foundation of treating data like a product and layer the concepts of Data Fabric on top to be able to search, navigate, and pick what you need. That is what my crystal ball says, but I think we will see more Data Mesh deployments in the near term. For the Data Fabric, adoption will be realized when there are more advancements and software vendors offer full-fledged AI and ML capabilities that let it truly shine.
Do these concepts apply to data environments of all sizes, or are they more geared towards bigger, more complex environments, like enterprise clients?
Should everyone jump on this right away? No. Perhaps the organizational maturity is not there yet, but the questions you want to ask yourself are:
- Do you want to have data delivered in an accessible, more sustainable manner to drive insights?
- Are you currently facing challenges with multiple source systems, high demands from business users and challenges with accurate data?
Then yes, these concepts should be considered. Organizations are joining the data democratization momentum, so you should start investigating. All I would say is don’t wait too long to look at these advantages: if you are an early adopter, you get to influence the future of the tools and concepts, whereas if you stay back waiting for lessons learned, the adoption curve becomes too steep and you lose your competitive advantage. There is really nothing to lose, because any step you take to make your data more accessible, trustworthy, and understood will eventually be leveraged down the line.
I really like how you positioned that. I mean, I have read quite a bit of literature that says if you are small don’t touch it, but what you bring to the table is very mature and I agree, everyone should consider it if they are driving towards data democratization and creating value for the enterprise with data. Can you maybe help us understand what the journey would look like for an organization that is looking to implement either?
The first step is a holistic current-state assessment, followed by articulating what the target state looks like, and then a gap analysis of people, processes, and technologies against business use cases to define your journey. When you implement a data strategy roadmap, you need to consider not only which technologies you are aligning to, but also the strategic business alignment needed to effectively implement your data management program, including how your teams are organized to support it. Understand your challenges and pain points with your data needs: Are your use cases being realized? Are you getting the right value out of your data? Are you making effective decisions? How are the teams operating today? Can we support our needs for the next 5 to 10 years in an agile and scalable manner?
I think for both Data Mesh and Data Fabric, you really want to think about your technology alignment and what current processes you have in place. From a governance perspective, you want to consider whether a centralized or decentralized approach will suit your organization. You want to make sure your Data Catalogue is well established, socialized, and maintained, and that you understand your connectivity, integration, and orchestration patterns. Lastly, it is all about enabling AI and ML to complement active metadata. But as with all roadmap implementations, it’s about taking a crawl, walk, run approach.
What does an organization gain from implementing a Data Mesh or a Data Fabric?
The organization truly gains agility through sharing and promoting data across the organization, in addition to discoverability, scalability, quality, security, accessibility, speed to delivery, and data re-usability. Rather than digging through multiple systems to fish out what you need, you can quickly access it and spend more time analyzing and making decisions. In essence, you spend less time understanding and preparing the data, and more time using the data, all while reducing the workload on IT teams. It’s all about creating value for the organization.
Thank you for your time, Rahim. This was most enlightening. If our readers are interested in understanding more about this subject or if they have questions about how to get started, they can reach out to us through our website or directly at: email@example.com
We will contact you as soon as possible.