Key Data Mesh Benefits
March 4, 2020
The data landscape of 2020 looks very different from the one of 30 years ago: the approach now emerging is decentralized and very different from what we see in almost any company today. Zhamak Dehghani calls it “Data Mesh”, and it has recently become a buzzword and a trending topic at many conferences. If you feel the pain of your company's current data architecture, you may want to move toward it. This article explains what Data Mesh is and why it is gaining momentum.
What is Data Mesh?
As data becomes ever more ubiquitous, traditional architectures of data warehouses and data lakes become overwhelmed and are unable to scale efficiently. A distributed data mesh approach can overcome these inherent inefficiencies by embracing domain-oriented data ownership.
“I suggest that the next enterprise data platform architecture is in the convergence of Distributed Domain Driven Architecture, Self-serve Platform Design, and Product Thinking with Data” says Zhamak Dehghani.
The main shift is to treat domain data products as a first-class concern, and data lake tooling and pipelines as a second-class concern, that is, an implementation detail. This inverts the current mental model: instead of a centralized data lake, we get an ecosystem of data products that play nicely together, a data mesh.
Data mesh is an architectural paradigm that unlocks analytical data at scale. It rapidly opens up access to an ever-growing number of distributed domain data sets for a proliferation of consumption scenarios across the organization, such as machine learning, analytics, and data-intensive applications. Data mesh addresses the common failure modes of the traditional centralized data lake or data platform architecture by shifting away from the centralized paradigm of the lake and of its predecessor, the data warehouse. Instead, it draws from modern distributed architecture: it treats domains as the first-class concern, applies platform thinking to create a self-serve data infrastructure, treats data as a product, and implements open standardization to enable an ecosystem of interoperable distributed data products.
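To make “data as a product” and open standardization a bit more concrete, here is a minimal Python sketch. The `DataProduct` descriptor and `DataCatalog` class are hypothetical names for illustration, not a real library API. Each domain team registers its product with an owner and a published schema, and consumers discover products through a self-serve catalog instead of asking a central team.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor for one domain-owned data product."""
    name: str
    domain: str              # owning business domain, e.g. "orders"
    owner: str               # accountable domain team, not a central data team
    schema: dict[str, str]   # column name -> type: the published contract
    version: str = "1.0"


class DataCatalog:
    """Minimal self-serve catalog that lets consumers discover products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # A deliberately tiny shared standard (an assumption for this sketch):
        # every product must name an owner and publish a non-empty schema.
        if not product.owner or not product.schema:
            raise ValueError(f"{product.name}: owner and schema are required")
        self._products[product.name] = product

    def find_by_domain(self, domain: str) -> list[DataProduct]:
        return [p for p in self._products.values() if p.domain == domain]

    def find_by_column(self, column: str) -> list[DataProduct]:
        # Shared column names are what make products joinable across domains.
        return [p for p in self._products.values() if column in p.schema]


catalog = DataCatalog()
catalog.register(DataProduct("orders_daily", "orders", "checkout-team",
                             {"order_id": "int", "customer_id": "int", "total": "float"}))
catalog.register(DataProduct("customer_profiles", "customers", "crm-team",
                             {"customer_id": "int", "country": "str"}))
print([p.name for p in catalog.find_by_column("customer_id")])
# ['orders_daily', 'customer_profiles']
```

The single rule enforced here (owner plus schema) is only a placeholder; an organization's actual interoperability standards would likely also cover versioning, quality guarantees, and access policies.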
Data mesh creates a layer of connectivity that abstracts away the complexities of connecting to, managing, and supporting access to data. At its core, it stitches together data held across multiple data silos: its premise is to connect distributed data across different locations and organizations.
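The connectivity layer can be illustrated with a small Python sketch. It assumes a hypothetical `MeshGateway` class and uses two in-memory SQLite databases as stand-ins for independent domain stores. The consumer only names a data product; the gateway routes the query to the store that owns it, and results are combined at query time without consolidating anything into a single repository.

```python
import sqlite3


# Each domain keeps its data in its own store; in-memory SQLite databases
# stand in for those independent stores in this sketch.
def make_orders_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0)])
    return conn


def make_customers_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "DE"), (11, "FR")])
    return conn


class MeshGateway:
    """Hypothetical connectivity layer: routes each query to the store that
    owns the product; no data is copied into a central repository."""

    def __init__(self) -> None:
        self._sources: dict[str, sqlite3.Connection] = {}

    def attach(self, product: str, conn: sqlite3.Connection) -> None:
        self._sources[product] = conn

    def query(self, product: str, sql: str, params: tuple = ()) -> list[tuple]:
        # Consumers name the product; the gateway knows where it lives.
        return self._sources[product].execute(sql, params).fetchall()


mesh = MeshGateway()
mesh.attach("orders", make_orders_store())
mesh.attach("customers", make_customers_store())

# Revenue per country, combined at query time from two separate stores.
countries = dict(mesh.query("customers", "SELECT customer_id, country FROM customers"))
revenue: dict[str, float] = {}
for customer_id, total in mesh.query(
        "orders", "SELECT customer_id, total FROM orders ORDER BY order_id"):
    country = countries[customer_id]
    revenue[country] = revenue.get(country, 0.0) + total
print(revenue)  # {'DE': 40.0, 'FR': 40.0}
```

A production data mesh would add authentication, access policies, and a query engine that can push joins down to the sources; the dictionary of connections only illustrates the location transparency.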
Many enterprises are investing in their next-generation data lake, hoping to democratize data at scale, provide business insights, and ultimately make automated intelligent decisions. However, data platforms based on the data lake architecture have common failure modes that leave these promises unfulfilled at scale. Addressing them requires shifting away from the centralized paradigm of the lake, or of its predecessor, the data warehouse, toward a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
DDD, microservices & DevOps changed the way we develop software over the last decade. Data in analytics departments, however, has not caught up. To speed up data-driven decision making in a company with a modern development approach, analytics & software teams need to change:
- Software teams must consider data a product they serve to everybody else, including analytics teams.
- Analytics teams must build on that: stop hoarding data and instead pull it in on demand.
- Analytics teams must also start to consider their data lakes/data warehouses as data products.
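The three changes listed above can be sketched in Python with made-up data. The hypothetical `orders_output_port` function plays the role of a software team serving its data as a product, and `daily_revenue_product` is an analytics team pulling that data on demand and serving the result as a data product of its own. Both names are illustrative, not part of any framework.

```python
from datetime import date


# Hypothetical output port of the "orders" domain team: it serves its data
# as a product with a stable, documented shape instead of handing out DB dumps.
def orders_output_port(since: date) -> list[dict]:
    orders = [
        {"order_id": 1, "day": date(2020, 3, 1), "total": 25.0},
        {"order_id": 2, "day": date(2020, 3, 2), "total": 40.0},
        {"order_id": 3, "day": date(2020, 3, 2), "total": 15.0},
    ]
    return [o for o in orders if o["day"] >= since]


# The analytics team pulls only what it needs, when it needs it, instead of
# keeping its own long-lived copy of the orders data...
def daily_revenue_product(since: date) -> dict[date, float]:
    revenue: dict[date, float] = {}
    for order in orders_output_port(since):
        revenue[order["day"]] = revenue.get(order["day"], 0.0) + order["total"]
    # ...and serves the aggregate as a data product in its own right.
    return revenue


print(daily_revenue_product(date(2020, 3, 2)))
# {datetime.date(2020, 3, 2): 55.0}
```

In practice the output port would more likely be an API, an event stream, or a table with a published contract; a plain function just keeps the example self-contained.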
A data mesh ensures that data is highly available, easily discoverable, secure, and interoperable with the applications that need access to it.
Data meshes are used in a variety of circumstances:
- Connecting cloud applications to sensitive data that lives in a customer’s on-premise or cloud environment
- Creating virtual data catalogs from a variety of data sources that can’t be centralized
- Creating virtual data warehouses or data lakes for analytics and machine learning training without consolidating data into a single repository
- Giving application developers and DevOps teams ways to query data from a variety of data stores without having to think about ‘how’ they are accessing that data
When should you consider moving to a data mesh?
First of all, if you’re happy with your structure and with the way your company uses data to make decisions, then don’t. But if you feel any of the following pains, a data mesh may be the solution:
- If you combine domain complexity with microservices/domain-driven design, you probably feel that things are too complex for a central team to serve all of that data properly.
- Importing data into the data warehouse is so costly that you are dismissing data sources that would be valuable to individual users. Those sources should be served individually and are perfect candidates for a “carve-out as data mesh node”.
- You haven’t closed the loop of data -> information -> insight -> decision -> action back to data.
- Data speed in the Continuous Intelligence Cycle is measured in weeks & months, not days or hours.
- You’re already moving the transformation of data as close to the data users as possible.
Becoming a data-driven organization remains one of the top strategic goals of many companies. Data Mesh helps companies become intelligently empowered: providing the best customer experience based on data and hyper-personalization; reducing operational costs and time through data-driven optimizations; and giving employees superpowers with trend analysis and business intelligence.