Overcoming Challenges in RAG Systems: Lessons from Real-World Implementations

Understanding the RAG Pipeline
The RAG pipeline consists of two main stages:
1. Indexing Process
- Objective: Convert documents into a searchable format by chunking and embedding them.
- How It Works: Documents are split into smaller chunks, which are transformed into numerical embeddings using a chosen embedding model. These embeddings and their corresponding chunks are stored in a vector database.
- Key Decisions:
- Chunking Strategy: Small chunks may miss context, while large ones risk including irrelevant information.
- Embedding Selection: The chosen embedding model impacts retrieval accuracy and must suit the application domain.
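To make the indexing stage concrete, here is a minimal sketch. The `embed()` helper is a toy stand-in (a normalised bag-of-bytes vector), not a real embedding model, and a plain Python list stands in for the vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: normalised bag-of-bytes counts.
    In a real system this would call your chosen embedding model."""
    vec = np.zeros(256)
    for b in text.encode("utf-8"):
        vec[b] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(document: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap; see FP6 below for a semantic variant."""
    pieces, start = [], 0
    while start < len(document):
        pieces.append(document[start:start + size])
        start += size - overlap
    return pieces

def index_documents(documents: list[str]) -> list[tuple[str, np.ndarray]]:
    """Store (chunk, embedding) pairs; a real system would use a vector database."""
    return [(piece, embed(piece)) for doc in documents for piece in chunk(doc)]
```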
2. Query Process
- Objective: Retrieve relevant chunks for user queries and generate accurate answers.
- How It Works:
- A user query is converted into an embedding.
- Similar documents are retrieved from the database.
- Retrieved chunks are re-ranked to prioritize relevance.
- Top-ranked chunks are consolidated and passed to the LLM for answer generation.
- Challenges Addressed:
- Token Limitations: The LLM's context window is limited, so the retrieved content must be trimmed to prioritize essential information.
- Output Format: Generated answers must follow required formats like tables or lists.
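Continuing the sketch, the query stage embeds the query, ranks stored chunks by cosine similarity, and consolidates the winners under a token budget before generation. Here the similarity order itself serves as the ranking; a dedicated re-ranker would slot between retrieval and consolidation. `llm()` is a hypothetical stand-in for your model-provider call:

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's client here."""
    raise NotImplementedError

def retrieve(store: list, query: str, top_k: int = 10) -> list[str]:
    """Return the top_k chunks most similar to the query (dot product on
    normalised vectors, i.e. cosine similarity)."""
    q = embed(query)
    ranked = sorted(store, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def answer(store: list, query: str, token_budget: int = 2000) -> str:
    """Consolidate retrieved chunks under the token budget, then generate."""
    context, used = [], 0
    for piece in retrieve(store, query):
        cost = len(piece) // 4            # rough chars-per-token estimate
        if used + cost > token_budget:    # respect the token limitation
            break
        context.append(piece)
        used += cost
    prompt = ("Answer using only this context:\n"
              + "\n---\n".join(context)
              + f"\n\nQuestion: {query}")
    return llm(prompt)
```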
By structuring the pipeline this way, RAG systems achieve seamless retrieval and generation, adapting to diverse applications.
The Core Failure Points and Our Solutions
1. Missing Content (FP1)
- What It Is: When a question cannot be answered from the available documents, the system should respond with something like "Sorry, I don't know." In practice, though, questions only tangentially related to the content can prompt the system to generate a plausible but incorrect answer instead.
- Our Solution: We created an index (like a table of contents with summaries) and used a cost-effective model such as Gemini Flash to verify whether a query matched the indexed content. Only if it did would we proceed with the RAG pipeline, as sketched below.
- Impact: Reduced false responses and increased user trust.
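A sketch of that gate, reusing the hypothetical `llm()` helper from above (we used a cheap model such as Gemini Flash for this check; the one-sentence summaries and the YES/NO protocol are illustrative assumptions):

```python
def build_content_index(documents: dict[str, str]) -> str:
    """Build a table-of-contents style index: one short summary per document."""
    lines = []
    for name, text in documents.items():
        summary = llm(f"Summarise this document in one sentence:\n{text[:2000]}")
        lines.append(f"- {name}: {summary}")
    return "\n".join(lines)

def is_answerable(content_index: str, query: str) -> bool:
    """Ask a cheap model whether the indexed content covers the query."""
    verdict = llm(
        "Here is an index of the available documents:\n"
        f"{content_index}\n\n"
        "Can the following question be answered from these documents? "
        f"Reply YES or NO.\n{query}"
    )
    return verdict.strip().upper().startswith("YES")

# Gate the pipeline: only run retrieval when the index covers the query;
# otherwise return "Sorry, I don't know." instead of risking a wrong answer.
```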
2. Missed Top Ranked Documents (FP2)
- What It Is: Relevant documents exist but are not ranked high enough to make it into the top-k results, due to suboptimal ranking or embeddings.
- Our Solution: We enriched the index with extracted metadata (such as related queries) and integrated it into retrieval, for example by running a semantic search over the metadata queries to improve ranking (a sketch follows below). We also experimented with different embedding models to find the best fit for our document set.
- Impact: Improved retrieval precision with key documents consistently ranked higher.
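One way to sketch the metadata idea, reusing `embed` and `llm` from the earlier sketches: each chunk is indexed under its own embedding plus embeddings of LLM-generated related queries, and ranking takes the best match across all of them. The three-question prompt is an assumption for illustration:

```python
def index_with_related_queries(chunks: list[str]) -> list[dict]:
    """Index each chunk under its own embedding plus embeddings of
    LLM-generated related queries, so user queries can match either."""
    entries = []
    for piece in chunks:
        related = llm(
            f"List three questions this text answers, one per line:\n{piece}"
        ).splitlines()
        vectors = [embed(piece)] + [embed(q) for q in related if q.strip()]
        entries.append({"chunk": piece, "vectors": vectors})
    return entries

def metadata_score(entry: dict, query: str) -> float:
    """Rank a chunk by its best match across chunk text and metadata queries."""
    q = embed(query)
    return max(float(q @ v) for v in entry["vectors"])
```

Taking the maximum over the chunk's own vector and its metadata vectors lets a question-shaped user query match a question-shaped metadata entry even when it matches the raw chunk text poorly.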
3. Not Extracted (FP4)
- What It Is: The LLM fails to extract the correct answer from the retrieved context because of noise or contradictory information.
- Our Solution: We utilized an LLM in the re-ranking and consolidation stages to filter out irrelevant documents, ensuring only the most pertinent content was included in the final context.
- Impact: Filtering irrelevant documents reduced noise, resulting in more precise answers and improved overall system reliability.
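A minimal sketch of the LLM-based filtering just described, again with the hypothetical `llm()` helper; the RELEVANT/IRRELEVANT protocol is our assumption:

```python
def llm_filter(chunks: list[str], query: str) -> list[str]:
    """Use the LLM as a relevance judge and drop chunks it marks irrelevant,
    so only pertinent content reaches the final context."""
    kept = []
    for piece in chunks:
        verdict = llm(
            f"Question: {query}\n\nPassage:\n{piece}\n\n"
            "Is this passage useful for answering the question? "
            "Reply RELEVANT or IRRELEVANT."
        )
        if verdict.strip().upper().startswith("RELEVANT"):
            kept.append(piece)
    return kept
```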
4. Incorrect Specificity (FP6)
- What It Is: Responses are too general or overly specific, missing user expectations.
- Our Solution: We adopted semantic chunking, ensuring that each chunk is semantically complete and contains the necessary information about a single topic (a sketch follows below).
- Impact: Delivered responses more aligned with user needs, boosting satisfaction.
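A greedy sketch of semantic chunking using the toy `embed` from above: consecutive paragraphs are merged while their embeddings stay close, and a new chunk starts at a topic shift. The 0.6 threshold is illustrative and should be tuned on your own corpus:

```python
def semantic_chunks(document: str, threshold: float = 0.6) -> list[str]:
    """Merge consecutive paragraphs while they remain semantically close;
    start a new chunk when the next paragraph drifts off-topic."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    chunks, current = [], paragraphs[0]
    for para in paragraphs[1:]:
        if float(embed(current) @ embed(para)) >= threshold:
            current += "\n\n" + para   # same topic: extend the current chunk
        else:
            chunks.append(current)     # topic shift: close the chunk
            current = para
    chunks.append(current)
    return chunks
```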
5. Incomplete Answers (FP7)
- What It Is: Incomplete answers are not incorrect, but they miss some information even though it was present in the context and available for extraction. For example: "What are the key points covered in documents A, B, and C?"
- Our Solution: We employed multi-query retrieval to gather the relevant documents, and included the rewritten versions of the query in the prompt so the LLM could address every aspect of the question (sketched below).
- Impact: Improved the completeness of answers, especially for multi-faceted queries.
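A sketch of multi-query retrieval building on the earlier `retrieve` and `llm` helpers; the three-rewrite count and the prompt wording are assumptions for illustration:

```python
def multi_query_answer(store: list, query: str) -> str:
    """Rewrite the query into sub-queries, retrieve for each, and pass both
    the rewrites and the pooled context to the LLM so every facet is covered."""
    rewrites = [q for q in llm(
        f"Rewrite this question as three narrower questions, one per line:\n{query}"
    ).splitlines() if q.strip()]
    pooled: list[str] = []
    for sub in [query] + rewrites:
        for piece in retrieve(store, sub, top_k=5):
            if piece not in pooled:        # de-duplicate across sub-queries
                pooled.append(piece)
    prompt = (
        "Answer the main question and make sure each sub-question is covered.\n"
        f"Main question: {query}\n"
        "Sub-questions:\n" + "\n".join(rewrites) + "\n\n"
        "Context:\n" + "\n---\n".join(pooled)
    )
    return llm(prompt)
```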
By methodically addressing these challenges, we enhanced our RAG systems and gained insights that can guide others in the field. Engineering a robust RAG system is a journey of learning, adapting, and continuous improvement.