In today’s fast-paced digital landscape, adopting innovative AI tools like Retrieval-Augmented Generation (RAG) systems, and the agents built on top of them, is becoming increasingly important. Pure LLMs on their own, without access to your company’s data, deliver little value in an enterprise context.
These systems offer a blend of retrieval-based search and generative AI models. However, the data governance issues that come with RAG, combined with the dilemma of choosing between on-premises and cloud variants, lead to significant decisions and implementation work.
This blog post aims to guide you through the nuances of integrating RAG systems effectively and safely into your organization.
Operating RAG systems raises significant governance issues
RAG systems are excellent assets for applications requiring comprehensive answers. However, their implementation varies significantly depending on whether they are hosted on-premises or in the cloud, and on the data security practices and processes you already have in place.
A crucial aspect of data security in these systems is the implementation of role-based access control (RBAC). RBAC adds a layer of security by ensuring that access to sensitive data is limited to authorized roles within the organization, thereby reducing the risk of data breaches. Each role is assigned specific rights, and users can only see the information their role requires, preventing unauthorized access to data.
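To make this concrete, here is a minimal sketch of what role-based filtering can look like at retrieval time. All names and data are illustrative; the point is that the authorization check wraps the search, so text a user is not entitled to never reaches the LLM.

```python
# Minimal sketch of role-based filtering at retrieval time. Each document
# chunk carries an access label, and results are filtered against the
# caller's roles before any text is passed to the LLM.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    allowed_roles: set[str]

# Illustrative in-memory "index"; a real system would use a vector database.
INDEX = [
    Chunk("Q3 revenue forecast ...", {"finance", "executive"}),
    Chunk("VPN setup guide ...", {"employee"}),
]

def retrieve(query: str, user_roles: set[str]) -> list[str]:
    # A real system would run a vector similarity search first; here we
    # only show the authorization filter that must wrap that search.
    return [c.text for c in INDEX if c.allowed_roles & user_roles]

print(retrieve("revenue", {"employee"}))  # only the VPN guide; no finance data
```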
Yet RBAC is only one of the functions you need a sustainable design for in order to build a secure and usable RAG and agent solution. In summary, the technical side of RBAC may seem “trivial,” but combining RBAC and RAG, with all the data they touch, with your company’s sociological side (people, processes, and tool integrations) is the heart of the problem.
On-premises vs. cloud deployment options
An on-premises RAG system is hosted on a company’s own servers and managed by in-house IT staff. This variant offers better control over your data, which is crucial for sensitive information. However, it may require substantial capital expenditure and technical expertise for setup and maintenance, and hardware constraints can limit scalability.
As an example, you can build such a system using an open-source vector database like Weaviate and an LLM execution environment like vLLM or Ollama to run your local models. Then, you can integrate the RAG system with an identity and access management platform like Keycloak to future-proof your RBAC solution.
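As a rough illustration, a minimal on-prem retrieval-and-generation step with these tools could look like the sketch below. It assumes a local Weaviate instance with a “Document” collection (created with a vectorizer module so semantic search works) and Ollama serving a llama3 model; adjust collection names, properties, and models to your own environment.

```python
# Minimal on-prem RAG sketch: retrieve context from a local Weaviate
# instance, then generate a grounded answer with a local Ollama model.
# Assumes a "Document" collection with a "content" property exists and
# that `ollama pull llama3` has been run; names are illustrative.
import weaviate
import ollama

def answer(question: str) -> str:
    client = weaviate.connect_to_local()  # default: http://localhost:8080
    try:
        docs = client.collections.get("Document")
        # Semantic search for the chunks most similar to the question
        results = docs.query.near_text(query=question, limit=3)
        context = "\n\n".join(o.properties["content"] for o in results.objects)
    finally:
        client.close()

    # Ground the local LLM in the retrieved context
    response = ollama.chat(
        model="llama3",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response["message"]["content"]

print(answer("What is our vacation policy?"))
```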
Cloud-based RAG systems like Azure OpenAI Service or Amazon Q are hosted on the service provider’s servers. They offer scalability, pay-as-you-go cost models, and ease of setup. However, depending on the nature of the work and the cloud provider’s policies, organizations might face data security and compliance concerns.
More importantly, you need to consider whether you want to codify your company’s culture into a platform that is not in your hands. Cloud-based RAGs can also accumulate significant costs over time.
Steps to adopt RAG and agent systems
Evaluate
Evaluate your needs and determine areas where RAG systems could benefit your business. Assess whether your organization’s data sensitivity and scalability requirements align better with an on-premises or cloud variant.
Data embeddings
You need to convert your data into vector embeddings so that it is accessible to LLMs. This is foundational for your RAG system to be valuable for your development value stream use cases.
Depending on whether you are on-prem or in the cloud, you need to select the right methods, tools, and libraries so that chunking and indexing perform well when the system is in use, as the sketch below illustrates.
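For example, a naive chunk-and-embed step might look like the following sketch, using the open-source sentence-transformers library; in a cloud deployment you would swap in your provider’s embedding API. The chunk size, overlap, and model name are illustrative, not recommendations.

```python
# Minimal chunk-and-embed sketch using sentence-transformers.
# Assumption: an open-source embedding model run locally is acceptable;
# the file name, chunk size, and model are placeholders.
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size character chunking with overlap; production systems
    # usually split on semantic boundaries (headings, paragraphs) instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local model
chunks = chunk(open("handbook.txt").read())
embeddings = model.encode(chunks)  # one vector per chunk, ready to index
print(embeddings.shape)
```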
Expertise and training
Ensure your team has the necessary skills to manage your chosen RAG system. This might involve training existing employees or hiring new ones. Start small: even in local environments, use your own machines and experiment with data embedding methods.
Choose the right models for your LLM system
Consider each model's specific functionality (code, user guides, specifications, etc.) and align it with your organizational needs. In the near future, you may well have tens or hundreds of these models running within your organization, so plan for routing between them early (see the sketch below).
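One simple way to prepare for that is an explicit task-to-model routing table. The sketch below is purely illustrative; the task labels and model names are placeholders, not recommendations.

```python
# Hypothetical task-to-model routing table; the model names and task
# labels are illustrative placeholders.
TASK_MODELS = {
    "code": "codellama:13b",           # code generation and review
    "user_guides": "llama3:8b",        # end-user documentation Q&A
    "specifications": "mixtral:8x7b",  # long, structured technical specs
}

def pick_model(task: str) -> str:
    # Fall back to a general-purpose model for unknown task types
    return TASK_MODELS.get(task, "llama3:8b")

print(pick_model("code"))  # -> codellama:13b
```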
Integration
Integrate the RAG system into your existing infrastructure and RBAC solution. This step will vary significantly depending on whether you adopt an on-premises or cloud-based system.
Independent of the deployment strategy, you must enable your current DevOps teams to operate these systems, and you need to build the lifecycle management processes that surround them. Whether in the cloud or on-prem, manage them with Infrastructure-as-Code (IaC) tooling integrated into your CI/CD pipelines.
Validating
The key here is using existing or GenAI-enhanced DevOps processes as a safety net. Conduct extensive testing to ensure the system functions correctly, and adjust it based on outcomes from your development value stream.
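One lightweight way to build that safety net is a golden dataset of questions with expected facts, run as ordinary tests in CI. The sketch below is runnable with pytest and assumes a hypothetical my_rag module wrapping your pipeline; the questions and expected phrases are placeholders for your own data.

```python
# Regression-style checks for RAG answers, runnable with pytest.
# `my_rag.answer()` is a hypothetical function wrapping your retrieval
# and generation pipeline; the golden cases below are illustrative.
GOLDEN_CASES = [
    ("How many vacation days do employees get?", "25 days"),
    ("Who approves production deployments?", "release manager"),
]

def test_rag_answers_contain_expected_facts():
    from my_rag import answer  # hypothetical module wrapping your pipeline
    for question, expected in GOLDEN_CASES:
        result = answer(question)
        # Substring check is a crude but cheap first safety net;
        # semantic similarity scoring can replace it later.
        assert expected.lower() in result.lower(), (
            f"Expected {expected!r} in answer to {question!r}, got {result!r}"
        )
```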
Utilizing the sociotechnical system as a guide is good practice. You must continuously validate the output of your RAG-empowered organization at all levels.
Training organization
Educate your organization’s members on how to interact with and leverage the RAG system. Again, create awareness of what these RAG systems can and cannot do, and be transparent about how they work, so that they do not appear to users as complete black boxes.
Understanding how and why prompting (user input) affects the end result is key to successfully using AI-assisted tools. Many applications provide prompt templates in the background, so users don’t need to understand everything (for the most part). But if things go wrong, you’ll need people who know what they’re doing.
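As a small illustration of such a background template, the sketch below shows how a hidden system instruction shapes what the model is allowed to do with the user’s input. The wording of the template is an example, not a recommendation.

```python
# Illustration of a hidden prompt template: the same user input produces
# different behavior depending on the instructions wrapped around it.
TEMPLATE = (
    "You are a support assistant. Answer using only the provided context. "
    "If the context does not contain the answer, say so explicitly.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def build_prompt(context: str, question: str) -> str:
    # The user only ever types the question; the template does the rest.
    return TEMPLATE.format(context=context, question=question)

print(build_prompt("Our VPN requires MFA.", "How do I log in to the VPN?"))
```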
Monitor and improve
Monitor the system’s performance, prompts, and responses regularly and make the improvements needed to optimize its benefits. Remember to iterate on system prompts, and do not give up because of early poor results.
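At minimum, that means recording prompts and responses so they can be reviewed later. The sketch below logs interactions to a local JSONL file as an illustration; in practice you would ship these records to your observability stack instead.

```python
# Minimal sketch of prompt/response logging for later review.
# The file-based sink is a placeholder for a real observability pipeline.
import json
import time

def log_interaction(prompt: str, response: str,
                    path: str = "rag_log.jsonl") -> None:
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "response": response,
    }
    # Append one JSON object per line so the log is easy to stream and grep
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_interaction("How do I log in to the VPN?", "Use your MFA token ...")
```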
Whether you opt for an on-premises or cloud-based RAG and agent system, the key is understanding your organizational needs and preparing for the integration. With mindful planning and continuous improvement, your organization can harness the full potential of LLM systems, enhancing development, operational efficiency, and decision-making processes.
Published: Jun 14, 2024