
Over the past couple of decades, enterprises have pushed decision-making lower in the organization, in the belief that the best decision-makers should be those closest to customers and products. At the same time, the digital transformation wave has convinced nearly everybody that decisions should be data-driven.
Unfortunately, in most companies, decision support information remains locked up in something called a data warehouse, which is essentially a giant centralized database tended by teams of data scientists and engineers.
The process of getting useful information into and out of that central resource can be painfully long, as data engineers validate, cleanse, and whip records into shape. By the time you get an answer to your question, that question may not even be meaningful anymore. And the people charged with tending that data and getting you answers are detached from your business unit, making them poorly qualified to understand the context of your request.
In short, data warehouses have become a massive bottleneck at many large organizations. We need to streamline how we manage decision-support data.
Breaking down the warehouse
The hottest concept right now is a “data mesh,” in which responsibility for maintaining analytical data moves from a centralized team of wizards to the people on the front lines who create and best understand it. Their job is to gather relevant information, ensure that it’s of high quality, and promote it to others in the organization who may find it useful.
In this scenario, the central warehouse doesn’t go away, but simply becomes another consumer of information on the mesh. The centralized organization is broken up, and many of the data engineers, the people who refine data prior to loading it into the warehouse, are dispersed to the business units to work closely with the data owners.
A core construct of the data mesh is what Zhamak Dehghani, a consultant at technology consultancy Thoughtworks who first proposed the mesh concept two years ago, dubbed “product thinking,” or the idea that data gets as much care and attention as any product the company sells. That means ensuring that information is relevant, current, well-organized, and exposed in a way that lets others in the company find it. Incentives should be put in place to encourage business stakeholders to take care of their data, up to and including charging for access.
In Dehghani’s scenario, the organization adopts a master catalog of data that people can browse and search to find what they need in the same way that smartphone users satisfy their own needs in an app store. Instead of waiting weeks for requests to work their way to the front of the queue, people would grab data whenever they need it.
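To make the app-store analogy concrete, here is a minimal sketch of such a catalog in Python. It is an illustration only, not a description of Dehghani's design or of any product: the `DataProduct` and `Catalog` names, their fields, and the sample entries are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataProduct:
    """One data set published to the mesh by the team that owns it (hypothetical)."""
    name: str
    owner: str                  # business unit responsible for quality
    description: str
    tags: Tuple[str, ...] = ()  # free-form keywords to aid discovery


class Catalog:
    """Hypothetical in-memory catalog; a real one would sit on a metadata store."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Owners register their own products; no central team in the loop.
        self._products[product.name] = product

    def search(self, term: str) -> List[DataProduct]:
        # Case-insensitive substring match on name, description, or tags.
        needle = term.lower()
        return [
            p for p in self._products.values()
            if needle in p.name.lower()
            or needle in p.description.lower()
            or any(needle in t.lower() for t in p.tags)
        ]


# Made-up sample entries, for illustration only.
catalog = Catalog()
catalog.publish(DataProduct("store_sales_daily", "Retail",
                            "Daily sales by store", ("sales", "retail")))
catalog.publish(DataProduct("web_sessions", "Marketing",
                            "Clickstream sessions from the website", ("web",)))

for product in catalog.search("sales"):
    print(product.name, "owned by", product.owner)
```

The point of the sketch is the division of labor: each business unit calls `publish` for its own data, and anyone in the company can call `search` without filing a request.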
In tune with business
This may sound like a radical idea, but numerous technology and business trends are solidly behind it. One is that software-as-a-service has legitimized self-service. People now freely choose the business applications they prefer to use, so why not do the same with data?
On the technology front, recent innovations have made the mesh concept more practical. High-speed, open-source query engines, such as Presto and Apache Spark, can operate on data distributed across multiple locations without requiring that it be copied and loaded into a central resource. Business intelligence vendors, such as Microsoft (Power BI) and Salesforce (Tableau), are integrating their user-friendly front-end tools to hide the underlying complexity.
Businesses are also increasingly keeping data in the cloud on secure, low-cost storage services, like Amazon S3, as a kind of data lake. That means it’s in one logical place where it’s easier to index and protect.
Finally, the data catalogs that act as a master record of all the information an organization possesses use machine learning to recognize and automatically categorize records with little or no human involvement. For example, the Lumada Data Catalog from Hitachi Vantara uses machine learning to create a constantly updated virtual view of information stored in databases and other structured data stores within an organization. That means business users can find, organize, and classify data without involving the IT department.
The data mesh model is already well beyond the concept phase. In a recent YouTube presentation, senior leaders at JPMorgan Chase spoke about the substantial progress the big financial services firm has made in its year-long data mesh initiative. Gartner estimates that between 400 and 500 large organizations already have initiatives underway.
In a business environment that has little tolerance for delay, the mesh concept increasingly makes sense. And given Gartner's estimate that a data mesh can reduce integration overhead by 30% and cut maintenance costs by 70%, there’s also a compelling cost case to be made.