Earlier this year, we introduced the industry’s first generative AI-powered conversational search assistant, Cohesity Gaia. It brings the power of generative AI to enterprise data, dramatically improving the speed and quality of insights available for a variety of use cases.
We’ve been working hard, learning from customers, and launching new capabilities that improve the user experience. We’ve also been learning from our collaboration with NVIDIA. In June, we wrote about our integration of NVIDIA NIM, a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inference across clouds, data centers, and workstations. Now, we’re excited to share some of the early results of our implementation with NVIDIA NIM microservices, particularly how their use has improved the user experience.
NVIDIA NIM unlocks the power of NVIDIA accelerated computing
Cohesity already uses popular components of the NVIDIA stack: NVIDIA Triton Inference Server, NVIDIA A100 Tensor Core GPUs, and CUDA Python, to name a few.
Watch the video below to see how to deploy generative AI in production with NVIDIA NIM.
NVIDIA NIM microservices are the “secret sauce” we’ve used to harness the full capabilities of NVIDIA technologies. NIM simplifies the deployment of AI models by offering them as performance-optimized, containerized microservices.
NIM microservices are part of the NVIDIA AI Enterprise software platform and are designed to support developers like Cohesity in deploying AI models quickly and efficiently across various environments, from local workstations to large-scale cloud infrastructures.
Our most extensive NIM implementation in Cohesity Gaia is with the NeMo Retriever text reranking NIM microservice. We selected this particular microservice because it’s designed to enhance answer accuracy.
How the NVIDIA NeMo Retriever reranking NIM improves accuracy for Cohesity Gaia
Cohesity Gaia has two important components: search/retrieval and generation. The NVIDIA NeMo Retriever text reranking NIM microservice increases the recall performance of the retrieval component.
It works by reordering retrieved documents based on their relevance to a user query, assigning a “relevance score” to each document-query pair. This process is particularly important in the Cohesity Gaia pipeline, where multiple sources of data are combined.
Figure 1: The text reranking NIM-powered service improves the accuracy of the Cohesity Gaia retrieval-augmented generation process.
In practice, a basic retrieval system might fetch a set of documents based on a query. The text reranking NIM microservice then refines this list, scoring and reordering the documents so that only the most relevant ones are sent to the subsequent generation step. This improves recall and reduces model confusion by minimizing irrelevant context.
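To make the rerank-and-truncate step concrete, here is a minimal Python sketch. The scoring function below is a deliberately simple lexical-overlap stand-in for the relevance score that the reranking model would assign to each document-query pair; the function names and scoring scheme are illustrative only, not Cohesity’s or NVIDIA’s implementation.

```python
from typing import List, Tuple

def relevance_score(query: str, passage: str) -> float:
    # Stand-in scorer: fraction of query terms that appear in the passage.
    # In a real pipeline this score would come from the reranking model.
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0

def rerank(query: str, passages: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    # Score every (query, passage) pair, sort by descending relevance,
    # and keep only the top_k passages for the generation step.
    scored = [(p, relevance_score(query, p)) for p in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The key design point is the truncation: by passing only the top-scored passages to the generation model, the pipeline keeps irrelevant context out of the prompt.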
For Cohesity Gaia customers, the reranking process improves the recall of relevant results. Consider a legal eDiscovery use case: when a user requests a summary of artifacts about a legal case, it’s important for Cohesity Gaia to draw on the richest possible set of documents pertaining to the user’s query. Important details may be omitted from the summary if relevant documents are excluded from Cohesity Gaia’s response. We measure this dimension of Gaia’s performance by tracking the number of errors observed for a given query, i.e., how many relevant documents were overlooked when responding to the user’s query.
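The error metric described above can be expressed as a small calculation. This sketch is illustrative; the function names are our own and not part of the Gaia product.

```python
def overlooked_count(retrieved_ids: set, relevant_ids: set) -> int:
    # Per-query "error" count: relevant documents missing
    # from the retrieved set.
    return len(relevant_ids - retrieved_ids)

def recall(retrieved_ids: set, relevant_ids: set) -> float:
    # Fraction of relevant documents that were actually retrieved.
    if not relevant_ids:
        return 1.0
    return len(retrieved_ids & relevant_ids) / len(relevant_ids)
```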
Thanks to the text reranking NIM microservice, Gaia’s errors were reduced by 13%.
During this implementation, NVIDIA’s support was invaluable. NVIDIA product managers and engineers provided SDKs and data sets, along with hands-on guidance and feedback on our implementation. Building cutting-edge AI solutions isn’t easy, and NVIDIA’s support dramatically simplified our work.
What’s next for Cohesity
AI is a major investment area for Cohesity. We’re excited to continue our adoption of NVIDIA NeMo and NIM microservices in the months and years ahead. In the coming months, we’re considering how to use NIM to help Gaia analyze “data modalities,” the different types of data that can be collected and analyzed, including structured, semi-structured, and unstructured data. We also want to support multiple languages, so that a user could submit a query in French and receive results drawn from documents in English, German, French, and other languages.
Join us at AWS re:Invent to learn how AWS, Cohesity, and NVIDIA combined generative AI with a modern data platform to deliver instant, high-quality insights from secondary data, helping businesses address critical questions faster.
In the meantime, if you’re an ISV building RAG applications, we recommend this blog from NVIDIA to help get you started.