As secondary data volumes continue to grow exponentially, backing up, managing, and gaining insights from that data has never been more important for enterprises than it is today, especially when you consider disaster recovery, compliance, security, and using that data to drive better customer experiences. But as secondary data volumes grow, so does another problem: mass data fragmentation, where data is so siloed (on-prem, in the cloud, and at the edge) that it’s nearly impossible to manage or derive insights from over the long term.
Mohit Aron, CEO of Cohesity (formerly co-founder and CTO of Nutanix), saw this mass data fragmentation problem back in 2013 and created a true hyperconverged platform to address it directly, one where customers can get rid of these data silos and easily manage backups, archives, analytics, testing and development, file shares, and object stores from a single pane of glass. This hyperconverged solution, available today, has changed the game in data management for scores of customers globally who are ditching legacy solutions in droves and embracing this more modern approach.
Now, however, multiple vendors are desperately trying to solve this data fragmentation problem too, by cobbling together what they claim is hyperconverged secondary storage. Of course, we’re flattered that they are finally embracing Mohit’s vision of hyperconverged secondary storage, but what they’ve come up with is really a mishmash of legacy products with a “semi-functional” management interface that is closer to “convergence” (circa 2009) than anything else.
Mass data fragmentation didn’t begin yesterday; it has grown into an increasingly critical challenge over many years. Such a complex problem requires an authentic platform built from the ground up not only to bring these data silos together, but also to make backup data and applications more productive, all from a single pane of glass. As you look to solve your mass data fragmentation challenges, we thought it would be helpful to provide a list of the top 10 questions you should ask vendors as you evaluate their claims of a hyperconverged secondary storage solution for your data.
- Resiliency – Start at the infrastructure layer, which continues to be ignored for secondary workloads. Resiliency is one of the most important considerations for organizations that run 24 x 7 operations. A backup that fails because of a failed node or backup/master server could mean a missed SLA, which sharply increases risk for the business. Other workloads like test/dev, analytics, file/object, etc. are even more sensitive to failure. Start by asking: is the solution resilient and highly available? Will any failure in the cluster/solution result in a failed backup or a failed secondary workload?
- Deduplication – As you consolidate multiple (and different types of) secondary workloads on the platform, ask whether the solution employs global, variable-block (sliding-window) deduplication to maximize space efficiency across the many different workload types categorized as secondary. Or is it limited to a fixed-block implementation, or to deduplicating only within a specific type of content?
- Indexing – For visibility, compliance, and ease of restore, does the solution natively (without requiring any external software) perform global indexing across use cases, such that it makes files and content inside files searchable across workloads, clusters and locations through a single UI?
- Snapshots and clones – Not all snapshots and clones are created equal. Does the solution have native data protection that works across heterogeneous systems? Does it have the necessary technology to perform unlimited snapshots? Are these snapshots fully hydrated or are they chain-based? Can these snapshots be used to instantly restore/clone many workloads or volumes of data simultaneously, or is the solution limited to performing these operations at a small scale?
- Scale – Does the vendor limit the scale discussion to how many nodes the solution can scale to? Or do its snapshots, restores, deduplication, indexing, etc. scale without limits or silos, just as the number of nodes does?
- Management – Manageability is a key challenge not only for fragmented solutions but also for hyperconverged solutions that have multiple components converged into a single GUI, which may end up making the GUI incomplete or unusable. Moreover, with the increasing use of automation in modern IT, every system needs to be programmable. Does the vendor employ an API-driven architecture and have a global, modern, and intuitive management interface that doesn’t sacrifice functionality for simplicity? Does the management layer use modern machine learning algorithms to eliminate manual management work?
- Apps close to data – Many secondary use cases require copies of data to be provisioned for use. This can result in unnecessary data movement as well as wasted time and resources in creating the copies, exacerbating the mass data fragmentation problem. Does the solution allow bringing apps close to data by providing authorized access to clones of data in place?
- Cloud-native & support for modern workloads – Cloud is part of every CIO’s IT strategy, but it is also one of the greatest contributors to mass data fragmentation. Does the solution natively integrate with the leading cloud providers and provide a single platform that spans on-premises and public cloud? Does it support backing up modern applications like NoSQL databases, distributed file systems, and SaaS?
- Eliminating silos or creating silos? – Does the solution deliver the core functionalities (specified above) required for eliminating mass data fragmentation? Does it natively support modern applications like NoSQL databases, distributed file systems and cloud native applications? Or does it rely on other vendors?
- Modern world-class support – While this is independent of the type of solution a vendor provides, customers should definitely check whether the vendor offers a scalable support model driven by machine learning and has a high NPS.
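To make the deduplication question above more concrete, here is a minimal Python sketch of why variable-block (sliding-window) chunking tends to hold up better than fixed-block chunking when data shifts by even a single byte. It is an illustration only, not any vendor’s implementation: the function names, the window size, the hash parameters, and the 256-byte block size are all assumptions chosen for the demo.

```python
import hashlib
import random

# All constants below are illustrative assumptions, not values from any product.
WINDOW = 16                          # bytes covered by the sliding window
BASE = 257                           # multiplier for the polynomial rolling hash
MOD = 1 << 32                        # keep the hash in 32 bits
MASK = 0xFF                          # cut when the low 8 bits are zero (~256-byte chunks)
BASE_POW = pow(BASE, WINDOW, MOD)    # used to drop the byte leaving the window


def fixed_blocks(data, size=256):
    """Split data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_blocks(data):
    """Split data where a rolling hash over the last WINDOW bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Remove the contribution of the byte that just left the window.
            h = (h - data[i - WINDOW] * BASE_POW) % MOD
        # Only consider a cut once the window is full.
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old_chunks, new_chunks):
    """Fraction of unique new chunks whose fingerprint already exists in the old set."""
    old = {hashlib.sha256(c).hexdigest() for c in old_chunks}
    new = {hashlib.sha256(c).hexdigest() for c in new_chunks}
    return len(old & new) / len(new)


# Simulate a backup, then the same data with one byte inserted at the front.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(16384))
edited = b"!" + original

fixed_shared = shared_fraction(fixed_blocks(original), fixed_blocks(edited))
variable_shared = shared_fraction(variable_blocks(original), variable_blocks(edited))

print(f"fixed-block chunks already stored:    {fixed_shared:.0%}")
print(f"variable-block chunks already stored: {variable_shared:.0%}")
```

In this sketch the one-byte insertion shifts every fixed-size block, so none of them match what was stored before. The content-defined boundaries realign right after the change, so most variable-size chunks deduplicate. Real products typically add minimum and maximum chunk sizes and stronger rolling hashes, so treat this as a way to frame the question rather than a benchmark.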
As you embark on your journey to get data proliferation under control, the above questions will help you find the right solution to solve mass data fragmentation in your data center.