The Ovum Decision Matrix is one of Ovum's signature pieces of research, offering in-depth quantitative analysis comparing enterprise software vendors of a specific category on a broad array of features. In June 2018, Ovum released a Decision Matrix report on the self-service data prep industry, documenting the key players and trends in this space. Data often requires extensive blending and standardization before it is fed into analytics and visualization tools, and self-service data prep products were developed to meet this need for business users. Here, we highlight the key findings from our self-service data prep research.
Some of the findings of Ovum's 2018 report are intuitive, as the self-service data prep market is reaching a stage of relative maturity. In some respects, however, particularly in niche functionality, it is still evolving rapidly. The key findings are summarized below.
Self-service data prep is increasingly subject to "platformization": data prep functionality is being added as a feature of information management and analytics platforms rather than offered as a purely standalone tool. This makes side-by-side comparison of self-service data prep vendors increasingly difficult, because the functionality is often embedded in platforms with different inherent capabilities.
Leaders in the Ovum Decision Matrix represent a variety of approaches – both standalone data prep and data prep functionality embedded within a broader platform. The right choice will depend on business requirements and existing IT infrastructure.
Self-service data prep vendors are quite closely matched in terms of core data blending and preparation capabilities; differentiators include machine learning-guided functionality, connectivity to analytics tools, and governance features. As is the case with self-service analytics, vendors are working to expand the audience of users to less-technical personas.
Information governance functionality is increasing in importance for self-service data prep environments, with many products embedding data catalogs; the enterprise demands the ability to audit and see lineage for transformations. Finding relevant data is the first step in preparing data, and catalogs facilitate this process.
Because some self-service data prep products are natively embedded in analytics platforms, products vary notably in their ability to connect to multiple competing BI and visualization tools. Analytics platforms with embedded data prep functionality typically offer limited connectivity to, and integration with, competing analytics tools.
Self-service data prep has traditionally served as a feeder to self-service analytics tools. However, forward-leaning organizations are increasingly feeding prepped data into machine learning models. The only follower in the Ovum Decision Matrix on self-service data prep supports this use case, embedding data prep functionality in a data science platform.
Enterprise cloud-first and multi-cloud strategies are putting pressure on data prep vendors to offer connectivity to various cloud data repositories and software-as-a-service data sources; most data prep vendors in the Ovum Decision Matrix can deploy on all three major cloud providers. This helps the enterprise avoid cloud vendor lock-in, a growing concern.
Native integration with execution environments such as MapReduce, Spark, and Hive gives the enterprise flexibility in data processing; a data prep tool's ability to infer the best-fit execution environment gives the enterprise an edge in working with large data sets. When a data prep product provides a choice of processing environment, it allows the enterprise to leverage existing IT infrastructure and investments.
Ovum Decision Matrix: Selecting a Self-Service Data Prep Solution, 2018–19, INT002-000120 (June 2018)
Beyond Self-Serve: Expanding the End-User Audience of Data Prep, IT0014-003213 (January 2017)
Paige Bartley, Senior Analyst, Data and Enterprise Intelligence