Yuri Vorontsov: RAGs, LLM Papers & AI Search Renaissance
A perspective on the world of AI Search through the eyes of the founder of Yandex Vertical Search.
DM: @ragexpert
Yuri Vorontsov: RAGs, LLM Papers & AI Search Renaissance pinned «Will the Larger Context Window Kill RAG? 640 KB ought to be enough for anybody. Bill Gates, 1981 There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days. Eric Schmidt…»
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

A great study: you can watch the metrics degrade as the context size grows.

Based on open datasets, the team built a tool for testing how different LLMs solve various tasks.

Performance degradation with longer inputs is category-dependent. Most frontier models largely retain performance on recall and RAG with longer inputs; however, even the best models significantly degrade with more contexts on tasks like re-ranking and generation with citations.
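The core of a benchmark like this is just a sweep over context lengths per task category. A minimal sketch (not the HELMET codebase; `score_fn` is a stand-in for a real model evaluation call):

```python
from typing import Callable, Dict, List

def sweep_context_lengths(
    score_fn: Callable[[str, int], float],
    categories: List[str],
    lengths: List[int],
) -> Dict[str, Dict[int, float]]:
    """Score each task category at each input length."""
    return {cat: {n: score_fn(cat, n) for n in lengths} for cat in categories}

def degradation(scores: Dict[int, float]) -> float:
    """Relative score drop from the shortest to the longest context."""
    lengths = sorted(scores)
    base, final = scores[lengths[0]], scores[lengths[-1]]
    return (base - final) / base if base else 0.0
```

Plugging in per-category scorers makes the paper's headline finding visible as a number: recall-style tasks show a small `degradation`, re-ranking and citation tasks a large one.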


https://arxiv.org/pdf/2410.02694

@advancedrag #digest
πŸ‘2
Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM

This work presents an interpretable decision-making framework for autonomous vehicles that integrates traffic regulations, norms, and safety guidelines comprehensively and enables seamless adaptation to different regions. While traditional rule-based methods struggle to incorporate the full scope of traffic rules, we develop a Traffic Regulation Retrieval (TRR) Agent based on Retrieval-Augmented Generation (RAG) to automatically retrieve relevant traffic rules and guidelines from extensive regulation documents and relevant records based on the ego vehicle’s situation. Given the semantic complexity of the retrieved rules, we also design a reasoning module powered by a Large Language Model (LLM) to interpret these rules, differentiate between mandatory rules and safety guidelines, and assess actions on legal compliance and safety.
Additionally, the reasoning is designed to be interpretable, enhancing both transparency and reliability. The framework demonstrates robust performance on both hypothesized and real-world cases across diverse scenarios, along with the ability to adapt to different regions with ease.


A very unexpected, yet obvious application of RAG for edge cases. It’s a great idea to use LLM to explain the reasons behind certain non-obvious actions, especially when it comes to various legal nuances.
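The pipeline boils down to two steps: retrieve candidate rules for the ego vehicle's situation, then let the reasoning module label each as mandatory or advisory. A hypothetical sketch of that shape (the rules, region tags, and the keyword-based classifier are invented; the paper uses dense retrieval and an LLM):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    text: str
    region: str

# Toy regulation store; a real TRR agent indexes full regulation documents.
RULES = [
    Rule("Stop fully at a stop sign before proceeding.", "US"),
    Rule("Keep a two-second following distance in dry conditions.", "US"),
    Rule("Give way to traffic from the right at unmarked junctions.", "EU"),
]

def retrieve_rules(situation: str, region: str, k: int = 2) -> list:
    """Naive lexical retrieval; a real system would use embeddings."""
    words = set(situation.lower().split())
    scored = [
        (len(words & set(r.text.lower().split())), r)
        for r in RULES
        if r.region == region
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:k] if score > 0]

def classify(rule: Rule) -> str:
    """Stub for the LLM reasoning module: mandatory rule vs. guideline."""
    return "mandatory" if "stop" in rule.text.lower() else "guideline"
```

Swapping the `region` filter is what makes the region-adaptation claim cheap: the reasoning stays the same, only the retrieved rule set changes.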

https://arxiv.org/pdf/2410.04759

@advancedrag #digest
πŸ‘2
GARLIC: LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph for Long Document QA

In the past, Retrieval-Augmented Generation (RAG) methods split text into chunks to enable language models to handle long documents. Recent tree-based RAG methods are able to retrieve detailed information while preserving global context. However, with the advent of more powerful LLMs, such as Llama 3.1, which offer better comprehension and support for longer inputs, we found that even recent tree-based RAG methods perform worse than directly feeding the entire document into Llama 3.1, although RAG methods still hold an advantage in reducing computational costs. In this paper, we propose a new retrieval method, called LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph (GARLIC), which outperforms previous state-of-the-art baselines, including Llama 3.1, while retaining the computational efficiency of RAG methods. Our method introduces several improvements:
(1) Rather than using a tree structure, we construct a Hierarchical Weighted Directed Acyclic Graph with many-to-many summarization, where the graph edges are derived from attention mechanisms, and each node focuses on a single event or very few events.
(2) We introduce a novel retrieval method that leverages the attention weights of LLMs rather than dense embedding similarity. Our method allows for searching the graph along multiple paths and can terminate at any depth.
(3) We use the LLM to control the retrieval process, enabling it to dynamically adjust the amount and depth of information retrieved for different queries.
Experimental results show that our method outperforms previous state-of-the-art baselines, including Llama 3.1, on two single-document and two multi-document QA datasets, while maintaining similar computational complexity to traditional RAG methods.


An interesting approach not only to organizing data as a graph but also to the retrieval step itself, which is implemented as a graph traversal: attention weights choose the next node, and an LLM decides how deep to go.

This approach might be suitable for analyzing legal documents. It needs to be tested on some dataset.
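The traversal itself is simple to picture: descend the weighted summary DAG along the strongest edges until a stop decision fires. An illustrative sketch (not the authors' code; the stop callback plays the LLM's role, and the weights stand in for attention scores):

```python
from typing import Callable, Dict, List, Tuple

# node -> [(child, edge weight)]; weights stand in for attention scores
Graph = Dict[str, List[Tuple[str, float]]]

def guided_traversal(
    graph: Graph,
    root: str,
    should_stop: Callable[[str], bool],
) -> List[str]:
    """Greedy descent by edge weight; the callback ends the search."""
    path, node = [root], root
    while not should_stop(node) and graph.get(node):
        # follow the highest-weight outgoing edge
        node = max(graph[node], key=lambda edge: edge[1])[0]
        path.append(node)
    return path
```

GARLIC additionally searches multiple paths and can terminate at any depth; this single-path version just shows why retrieval cost stays far below feeding the whole document to the model.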

https://arxiv.org/pdf/2410.04790

@advancedrag #digest
πŸ‘2
LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies

Generating Natural Language Explanations (NLEs) for model predictions on medical images, particularly those depicting thoracic pathologies, remains a critical and challenging task. Existing methodologies often struggle due to general models’ insufficient domain-specific medical knowledge and privacy concerns associated with retrieval-based augmentation techniques. To address these issues, we propose a novel Vision-Language framework augmented with a Knowledge Graph (KG)-based datastore, which enhances the model’s understanding by incorporating additional domain-specific medical knowledge essential for generating accurate and informative NLEs. Our framework employs a KG-based retrieval mechanism that not only improves the precision of the generated explanations but also preserves data privacy by avoiding direct data retrieval. The KG datastore is designed as a plug-and-play module, allowing for seamless integration with various model architectures. We introduce and evaluate three distinct frameworks within this paradigm: KG-LLaVA, which integrates the pre-trained LLaVA model with KG-RAG; Med-XPT, a custom framework combining MedCLIP, a transformer-based projector, and GPT-2; and Bio-LLaVA, which adapts LLaVA by incorporating the Bio-ViT-L vision model. These frameworks are validated on the MIMIC-NLE dataset, where they achieve state-of-the-art results, underscoring the effectiveness of KG augmentation in generating high-quality NLEs for thoracic pathologies.


First, a RAG for self-driving cars; next, a RAG for analyzing medical images.
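The privacy trick here is that the datastore holds curated triples about findings, not patient records. A toy sketch of that lookup (the triples and finding names below are invented for illustration):

```python
# Toy KG datastore: finding -> list of (subject, predicate, object) triples.
# Real systems would build this from a curated medical knowledge graph.
KG = {
    "pleural effusion": [
        ("pleural effusion", "appears_as", "blunting of the costophrenic angle"),
        ("pleural effusion", "located_in", "pleural space"),
    ],
    "cardiomegaly": [
        ("cardiomegaly", "appears_as", "enlarged cardiac silhouette"),
    ],
}

def kg_context(finding: str) -> str:
    """Serialize the triples for one finding into prompt-ready text."""
    triples = KG.get(finding.lower(), [])
    return "; ".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in triples)
```

The serialized string gets appended to the vision-language model's prompt, so the explanation is grounded in domain facts rather than in retrieved patient data.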

https://arxiv.org/pdf/2410.04749

@advancedrag #digest
🔥2
Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation

On one hand, this work demonstrates how a question can be refined and the correct answer found in a more controlled manner, using a function that depends on multiple parameters. This makes it possible to manage multiple iterations effectively.

On the other hand, it raises the same issues as other approaches where the results are re-validated by a model trained on publicly available data, which can be time-consuming and expensive.
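The loop structure is the interesting part: draft, verify against the evidence, and revise the query until a verifier score clears a threshold or the budget runs out. A minimal sketch of that control flow (every callable is a stand-in for a model call; the threshold and budget are illustrative):

```python
from typing import Callable, Tuple

def verified_answer(
    query: str,
    retrieve: Callable[[str], str],
    answer: Callable[[str, str], str],
    verify: Callable[[str, str], float],
    revise: Callable[[str, str], str],
    threshold: float = 0.8,
    max_iters: int = 3,
) -> Tuple[str, float]:
    """Retrieve, answer, verify; rethink the query while the score is low."""
    best, best_score = "", 0.0
    for _ in range(max_iters):
        evidence = retrieve(query)
        draft = answer(query, evidence)
        score = verify(draft, evidence)
        if score > best_score:
            best, best_score = draft, score
        if score >= threshold:
            break
        query = revise(query, draft)  # the "rethinking" step
    return best, best_score
```

The cost concern from the post is visible right in the signature: every iteration is one retrieval plus two model calls, so `max_iters` multiplies latency and spend.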

https://arxiv.org/pdf/2410.05801

@advancedrag #digest
🔥3
Is Semantic Chunking Worth the Computational Cost?

There is no "silver bullet" β€” no perfect chunking strategy. The document chunking format should be chosen based on the specific retrieval task. It's great that this research has emerged!

TL;DR

This article examines the balance between semantic chunking and fixed-size chunking in Retrieval-Augmented Generation (RAG) systems. While semantic chunking aims to improve document retrieval and answer generation by dividing documents into semantically coherent segments, the authors question if the additional computational costs are justified compared to the simpler fixed-size chunking method.

The study conducted various experiments on document retrieval, evidence retrieval, and answer generation tasks. Results show that while semantic chunking offers some benefits in synthetic datasets with high topic diversity, it doesn't consistently outperform fixed-size chunking in real-world scenarios. Fixed-size chunking is often more computationally efficient and performs better when documents don't exhibit significant topic diversity.

The authors conclude that semantic chunking's performance improvements are context-dependent and may not justify the increased computational overhead. They call for further exploration of more efficient chunking strategies for practical applications.
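The two strategies under comparison are easy to put side by side. A sketch of both (the "semantic" splitter here breaks on drops in lexical overlap between adjacent sentences, a cheap stand-in for the embedding similarity the paper uses):

```python
from typing import List

def fixed_chunks(words: List[str], size: int = 50) -> List[List[str]]:
    """Fixed-size chunking: simple sliding windows of `size` tokens."""
    return [words[i : i + size] for i in range(0, len(words), size)]

def semantic_chunks(sentences: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Start a new chunk when adjacent sentences barely overlap."""
    def sim(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)

    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if sim(prev, cur) < threshold:
            chunks.append([cur])   # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)
    return chunks
```

Even this toy version makes the paper's point tangible: `semantic_chunks` does O(n) similarity computations per document (an embedding call each, in practice), while `fixed_chunks` does none, so the quality gain has to pay for that overhead.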

https://arxiv.org/pdf/2410.13070

@advancedrag #digest
πŸ‘3
ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation

This method could be useful in complex tasks such as building comprehensive summaries (e.g., wikis) or handling intricate queries in business intelligence. The approach emphasizes depth and accuracy but could be slow, raising questions about how to optimize the process.

TL;DR

The ConTReGen paper introduces a novel method for open-domain long-form text generation. The problem it tackles involves generating responses to complex queries by breaking them down into smaller, manageable sub-tasks. The authors propose a tree-structured retrieval model where queries are decomposed into sub-queries in a top-down manner and then synthesized in a bottom-up process. This hierarchical method ensures that every facet of the query is explored in depth.

Key points include:
1. Tree-structured Approach: Instead of a linear sequence of sub-questions, the system uses a tree structure where each task can have further sub-tasks, ensuring a thorough exploration of all aspects.

2. Recursive Decomposition: The model recursively breaks down the original query and explores sub-facets until a stopping condition is reached, either a predefined depth or the determination that all questions have been answered.

3. Bottom-up Synthesis: After exploring the tree, results are synthesized bottom-up, where each node's results are combined to form a comprehensive answer to the original query.

The paper's results show potential, but the authors have not yet detailed how the model handles recursion depth or the decision to stop exploration. The idea is promising for handling large, complex problems, but there are questions about performance and potential optimizations.
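The top-down/bottom-up pattern from points 1–3 fits in a few lines of recursion. A sketch under stated assumptions: `decompose` and `answer` stand in for LLM calls, and `max_depth` plays the role of the stopping condition the authors leave underspecified:

```python
from typing import Callable, List

def contregen(
    query: str,
    decompose: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
    max_depth: int = 2,
) -> str:
    """Top-down decomposition into sub-queries, bottom-up synthesis."""
    if max_depth == 0:
        return answer(query, [])  # leaf: answer directly
    sub_answers = [
        contregen(sub, decompose, answer, max_depth - 1)
        for sub in decompose(query)
    ]
    return answer(query, sub_answers)  # combine children bottom-up
```

The latency worry from the post is structural: the number of model calls grows with the branching factor raised to `max_depth`, which is why the stopping rule matters so much.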

https://arxiv.org/pdf/2410.15511

@advancedrag #digest
πŸ‘1
Oldies but goldies:

Retrieval-augmented generation (RAG), as a cutting-edge technological paradigm, aims to address challenges faced by large language models (LLMs), such as:
β€” data freshness (He et al., 2022),
β€” hallucinations (Benedict et al., 2023; Chen et al., 2023b; Zuccon et al., 2023; Liang et al., 2024), and
β€” the lack of domain-specific knowledge (Li et al., 2023; Shen et al., 2023).


https://arxiv.org/pdf/2410.12788

@advancedrag #quote
πŸ‘1
Recently, we conducted tests with one of our clients on state-of-the-art machine translation systems, including Yandex and DeepL, as well as simpler models like GPT-4o-mini, GPT-3.5, Claude Haiku, and some open-weight models (e.g., tilmash). Our aim was to reduce translation costs without sacrificing the content of the original text. In our experiment, the quality of translation from Russian to English using GPT-4o-mini was excellent.

Today, I came across a study that applied RAG in machine translation to handle domain-specific terms. I tried examples from the study in ChatGPT, and translations into Russian were flawless. It’s possible that any issues observed in the study are specific to translation into Chinese.

This research prompted me to think about how we manage internal jargon β€” corporate slang β€” where an LLM might not be familiar with certain in-house terms. The approach in this study might be applicable not only to translation tasks but also to searching within corporate data.
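For corporate jargon, the simplest version of this idea is glossary-augmented prompting: match in-house terms in the source text and inject their definitions into the translation prompt. A sketch with invented glossary entries (a real system would retrieve them from a termbase, and fuzzier matching would be needed):

```python
# Invented in-house glossary; term -> plain-language definition.
GLOSSARY = {
    "greenlight": "approve for release (internal usage)",
    "red build": "a CI pipeline failure",
}

def build_prompt(text: str, target_lang: str) -> str:
    """Prepend definitions of any glossary terms found in the source text."""
    hits = {t: d for t, d in GLOSSARY.items() if t in text.lower()}
    prompt = f"Translate into {target_lang}:\n{text}"
    if hits:
        block = "\n".join(f"- {t}: {d}" for t, d in hits.items())
        prompt = f"Use this glossary for in-house terms:\n{block}\n\n{prompt}"
    return prompt
```

Filtering to matched terms only is the guard against the information-overload hallucinations the abstract warns about: the model never sees glossary entries irrelevant to the current sentence.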

Abstract
Large language models (LLMs) have shown great promise in machine translation, but they still struggle with contextually dependent terms, such as new or domain-specific words. This leads to inconsistencies and errors that are difficult to address. Existing solutions often depend on manual identification of such terms, which is impractical given the complexity and evolving nature of language. While Retrieval-Augmented-Generation (RAG) could provide some assistance, its application to translation is limited by issues such as hallucinations from information overload.

https://arxiv.org/pdf/2410.21067

@advancedrag #digest
πŸ‘1
Wrote a blog about RAG in 2025.

Anticipated trends and advancements in RAG for 2025:
https://www.quepasa.ai/post/anticipated-trends-and-advancements-in-rag-for-2025

@advancedrag #blog
πŸ‘2πŸ”₯1
Forwarded from Yuri Vorontsov
What are the pros and cons of using rerankers?
πŸ‘4
Forwarded from QuePasaRAGpapersBot
Based on the sources provided, here are some pros and cons of using rerankers:

Pros:
1. Improved performance: Rerankers can enhance the quality of retrieved documents, potentially leading to more accurate and contextually rich outputs in retrieval-augmented generation tasks [🌐ARAGOG: Advanced RAG Output Grading].

2. Better understanding of relevance: Some rerankers, like TART, can incorporate user intent and instructions into the reranking process, going beyond just the query [🌐AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline].

3. Versatility: There are various types of rerankers (e.g., LM-based, embedding-based, log prob-based) that can be used for different purposes and scenarios [🌐AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline].

4. Potential for state-of-the-art performance: Fine-tuned rerankers, like Cohere reranker, can achieve state-of-the-art performance on certain benchmarks [🌐RAR-b: Reasoning as Retrieval Benchmark].

Cons:
1. Computational cost: Reranking can be computationally expensive, especially when dealing with a large number of documents [🌐Efficient Title Reranker for Fast and Improved Knowledge-Intense NLP].

2. Efficiency concerns: Some rerankers, particularly those based on large language models, can be slow and resource-intensive during inference [🌐RAR-b: Reasoning as Retrieval Benchmark].

3. Limited generalization: Rerankers may not always generalize well to out-of-domain datasets, sometimes being outperformed by simpler methods like BM25 [🌐Assessing generalization capability of text ranking models in Polish].

4. Potential for redundancy: Without careful implementation, rerankers might not effectively mitigate the issue of redundancy in retrieved documents [🌐ARAGOG: Advanced RAG Output Grading].

5. Model size dependency: The generalization capability of text rerankers may increase with model size, meaning that only large rerankers with billions of parameters might achieve good results in zero-shot settings [🌐Assessing generalization capability of text ranking models in Polish].

6. Gap with dense retrievers: Recent advancements in dense retrievers have narrowed the performance gap with rerankers, potentially making the additional reranking step less impactful in some cases [🌐Assessing generalization capability of text ranking models in Polish].
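For context on where a reranker sits in the pipeline: first-stage retrieval returns candidates cheaply, then a slower scorer (a cross-encoder in practice; a lexical-overlap stub below) reorders the top of the list. A minimal sketch:

```python
from typing import List

def rerank(query: str, candidates: List[str], k: int = 3) -> List[str]:
    """Reorder candidates by a query-document score; keep the top k."""
    def score(doc: str) -> float:
        # Stand-in scorer: fraction of query terms present in the document.
        # A real reranker would run a cross-encoder over (query, doc) pairs.
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

    return sorted(candidates, key=score, reverse=True)[:k]
```

The cost trade-off from the cons list is exactly this: `score` runs once per candidate at query time, which is why reranking is usually applied only to a small top-k slice of first-stage results.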
⚑2πŸ‘2
Forwarded from QuePasaRAGpapersBot
It's important to note that the effectiveness of rerankers can vary depending on the specific task, dataset, and implementation. Careful consideration of these factors is necessary when deciding whether to incorporate a reranking step in a retrieval or RAG system [🌐Assessing generalization capability of text ranking models in Polish].
πŸ‘3⚑2
Yeah... that's right...

I've built a bot that searches through all RAG papers. @QuePasaRAGpapersBot
I think my work here might be done.

@advancedrag #bot
πŸ”₯3πŸ‘2
Wrote a blog β€” Why the Heck Do I Need RAG When I’ve Got ChatGPT?
https://www.quepasa.ai/post/why-the-heck-do-i-need-rag-when-ive-got-chatgpt

@advancedrag #blog
❀1πŸ‘1