In our daily lives, we are surrounded by unstructured data. From textual data (emails, presentations, text messages, online financial transactions) to non-textual media like JPEGs, audio, and video files, all of it contributes to the uncontrolled growth of unstructured data. Multiple reports have confirmed that we will continue to witness explosive growth in unstructured data:
- IDG: Unstructured data is growing at the rate of 62% per year
- Gartner: Data volume is set to grow 800% over the next five years and 80% of it will reside as unstructured data
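To see how quickly those rates compound, a few lines of Python can project the IDG figure over Gartner's five-year window. The 62% annual rate and the five-year horizon come from the reports above; the rest is simple arithmetic:

```python
# Compound the IDG annual growth rate over Gartner's five-year horizon.
annual_growth = 0.62      # IDG: unstructured data grows 62% per year
years = 5                 # Gartner's projection window

multiplier = (1 + annual_growth) ** years
print(f"1 TB of unstructured data today becomes ~{multiplier:.1f} TB in {years} years")
```

At 62% per year, data volume grows more than elevenfold in five years, which is why capacity planning done on a linear assumption falls behind so quickly.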
Organizations of all sizes are generating mounting volumes of unstructured data and are required to protect and archive it. This is especially true for regulated industries like healthcare and law enforcement, which are required to keep data for five to seven years. In some cases this data takes the form of large media files, which are much larger than textual data. As recently reported by Time.com, “there are nearly 18,000 state and local police departments in the United States, and almost one-third of them are now putting body-cameras on their officers”. It is safe to expect that this number will only increase, putting a huge burden on their respective IT infrastructure teams to store all that data efficiently.
To put things in perspective, the Oakland Police Department (OPD) in California currently has 600 officers with body cameras. The OPD is required to retain the video collected for five years, and longer if a video becomes evidence in a court case.
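As a back-of-the-envelope illustration of that burden, here is a rough sizing sketch. Only the officer count (600) and the five-year retention come from the figures above; the hours of footage per day and the bitrate are illustrative assumptions, not OPD data:

```python
# Rough body-camera storage sizing for a department like the OPD.
# Officer count (600) and 5-year retention are from the figures above;
# hours recorded per day and GB per hour are assumed for illustration.
officers = 600
hours_per_officer_per_day = 4        # assumption: recording time per shift
gb_per_hour = 1.0                    # assumption: typical 720p H.264 bitrate
retention_years = 5

daily_tb = officers * hours_per_officer_per_day * gb_per_hour / 1000
total_pb = daily_tb * 365 * retention_years / 1000
print(f"~{daily_tb:.1f} TB/day, ~{total_pb:.2f} PB over {retention_years} years")
```

Even under these modest assumptions, a single mid-sized department accumulates petabytes of video before the first file ages out of its retention window.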
Left unchecked, unstructured data can lead to multiple issues: strained storage capacity, difficulty searching for and retrieving relevant data when needed, and rising costs. Baltimore city officials, for example, estimated video storage costs at as much as $2.6 million annually.
Historically, organizations used traditional NAS to store their unstructured data, which was expensive and did not scale out. With advancements in technology, users started to deploy proprietary scale-out solutions like EMC Isilon, which can scale to 144 nodes but is cost-prohibitive for many.
Cohesity is a hyperconverged secondary storage solution that is helping organizations consolidate their unstructured data on one platform that is easy to use and manage. With built-in features like global inline dedupe, customers can optimize their storage capacity. Traditional solutions allow users to dedupe only within the same volume or chassis, whereas with Cohesity, users can dedupe across the entire cluster. Customers can quickly locate and restore unstructured files with Cohesity’s global, Google-like wildcard search.
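The difference in dedupe scope can be shown with a toy sketch. This is a conceptual illustration of content-hash dedupe, not Cohesity's actual implementation: a chunk duplicated across volumes survives per-volume dedupe but is collapsed by global dedupe.

```python
import hashlib

def stored_chunks(volumes, global_dedupe):
    """Count chunks actually stored under per-volume vs global dedupe."""
    if global_dedupe:
        # One hash set spans every volume, so cross-volume duplicates collapse.
        return len({hashlib.sha256(c).hexdigest()
                    for vol in volumes for c in vol})
    # Per-volume dedupe: duplicates are removed only within each volume.
    return sum(len({hashlib.sha256(c).hexdigest() for c in vol})
               for vol in volumes)

# Two volumes that share a common chunk "A".
vols = [[b"A", b"B", b"A"], [b"A", b"C"]]
print(stored_chunks(vols, global_dedupe=False))  # 4 chunks stored (A,B + A,C)
print(stored_chunks(vols, global_dedupe=True))   # 3 chunks stored (A,B,C)
```

The wider the dedupe domain, the more cross-volume duplicates collapse, which is why global dedupe yields better capacity savings than per-volume or per-chassis schemes.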
Each Cohesity hyperconverged node is optimally configured (compute, capacity, and network connectivity) to ingest large volumes of unstructured data at scale. Cohesity also natively integrates with all leading cloud providers, helping users plan their capacity requirements optimally. With CloudArchive, customers can replace tapes and archive their large files in the cloud for long-term retention. Policy-based tiering with CloudTier helps customers move cold data to the cloud and leverage cloud economics.
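Conceptually, policy-based tiering boils down to classifying data against an access-age policy. The sketch below is a generic illustration, not the CloudTier API; the 90-day threshold, the file names, and their access times are all assumed:

```python
from datetime import datetime, timedelta

def tier_files(files, cold_after_days=90, now=None):
    """Split files into a hot (local) set and a cold (cloud-tier) set,
    based on last-access age against the policy threshold."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=cold_after_days)
    hot = [f for f, last_access in files.items() if last_access >= cutoff]
    cold = [f for f, last_access in files.items() if last_access < cutoff]
    return hot, cold

# Illustrative data: one recently touched file, one untouched for a year+.
now = datetime(2018, 1, 1)
files = {"report.docx": datetime(2017, 12, 20),
         "bodycam_2016.mp4": datetime(2016, 6, 1)}
hot, cold = tier_files(files, cold_after_days=90, now=now)
print(cold)  # ['bodycam_2016.mp4'] -> candidate for the cloud tier
```

The policy itself (the threshold and the tier destination) is the knob administrators turn; the classification step shown here is what runs on every evaluation cycle.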
The recently announced Cohesity C3000 is the industry’s highest-density scale-out hyperconverged node. With 183TB per industry-standard server node, combined with the limitless scale-out capability of SpanFS, it is an ideal solution for managing unstructured data. Click here to get more details on the C3000 high-density appliance.
The University of California at Santa Barbara (UCSB) recently deployed Cohesity to protect its data center apps and archive its growing unstructured data, including the space-consuming video files from its police department’s body and dash cameras. The data was critical, mandatory to protect, and needed to be available upon request. Click here to see how Cohesity helped the IT infrastructure team at UCSB both simplify their data protection AND manage their growing unstructured data in one cost-effective solution.
For more information, please refer to the solution brief.