Reading with Intent
- URL: http://arxiv.org/abs/2408.11189v1
- Date: Tue, 20 Aug 2024 20:47:27 GMT
- Title: Reading with Intent
- Authors: Benjamin Reichman, Kartik Talamadupula, Toshish Jawale, Larry Heck
- Abstract summary: RAG systems that rely on the open internet as their knowledge source have to contend with the complexities of human-generated content.
We introduce a prompting system designed to enhance the model's ability to interpret and generate responses in the presence of sarcasm.
- Score: 7.623508712778745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval augmented generation (RAG) systems augment the knowledge of language models by integrating external information sources such as Wikipedia, internal documents, scientific papers, or the open internet. RAG systems that rely on the open internet as their knowledge source have to contend with the complexities of human-generated content. Human communication extends much deeper than just the words rendered as text. Intent, tonality, and connotation can all change the meaning of what is being conveyed. Recent real-world deployments of RAG systems have shown some difficulty in understanding these nuances of human communication. One significant challenge for these systems lies in processing sarcasm. Though the Large Language Models (LLMs) that make up the backbone of these RAG systems are able to detect sarcasm, they currently do not always use these detections for the subsequent processing of text. To address these issues, in this paper, we synthetically generate sarcastic passages from the Natural Questions Wikipedia retrieval corpus. We then test the impact of these passages on the performance of both the retriever and reader portions of the RAG pipeline. We introduce a prompting system designed to enhance the model's ability to interpret and generate responses in the presence of sarcasm, thus improving overall system performance. Finally, we conduct ablation studies to validate the effectiveness of our approach, demonstrating improvements in handling sarcastic content within RAG systems.
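The prompting idea described in the abstract can be sketched as a simple prompt builder that warns the reader LLM about possible sarcasm in retrieved passages. This is an illustrative assumption, not the paper's exact prompt wording; the function name and instruction text are hypothetical.

```python
# Hypothetical sketch: prepend an intent-aware instruction so a reader LLM
# accounts for sarcasm in retrieved passages before answering.
# The prompt wording below is an assumption, not the paper's actual prompt.

def reading_with_intent_prompt(question: str, passages: list[str]) -> str:
    """Build a reader prompt that flags possible sarcasm in the passages."""
    header = (
        "Some of the passages below may be sarcastic. "
        "Before answering, consider the intended (not literal) "
        "meaning of each passage.\n"
    )
    body = "\n".join(
        f"Passage {i + 1}: {p}" for i, p in enumerate(passages)
    )
    return f"{header}\n{body}\n\nQuestion: {question}\nAnswer:"

prompt = reading_with_intent_prompt(
    "Who wrote Hamlet?",
    ["Oh sure, because Shakespeare TOTALLY didn't write Hamlet.",
     "Hamlet is a tragedy written by William Shakespeare."],
)
```

In a real pipeline this string would be passed to the reader model in place of a plain question-plus-context prompt.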
Related papers
- Reading with Intent -- Neutralizing Intent [1.7614751781649955]
In most benchmarks, Wikipedia-like texts are written in a neutral and factual tone.
When RAG systems retrieve internet-based content, they encounter text with diverse tones and linguistic styles.
The Reading with Intent task addresses this issue by evaluating how varying tones in context passages affect model performance.
arXiv Detail & Related papers (2025-01-07T02:33:25Z)
- GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems.
The prevailing methodology is to construct a bi-encoder and compute the semantic similarity.
We propose a new method called GeAR (Generation Augmented Retrieval) that incorporates well-designed fusion and decoding modules.
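The bi-encoder retrieval setup mentioned above can be sketched minimally: queries and documents are embedded independently, then ranked by cosine similarity. The toy bag-of-words "encoder" here is a stand-in assumption for a learned neural encoder.

```python
# Minimal bi-encoder retrieval sketch. A real bi-encoder embeds text with a
# neural model; word-count vectors stand in for that here.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in encoder: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Rank documents by semantic similarity to the query embedding.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

best = retrieve("capital of france",
                ["paris is the capital of france", "berlin is in germany"])
```

The key property shared with production bi-encoders is that document embeddings can be precomputed once, so query-time retrieval reduces to a similarity search.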
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration [0.6372911857214884]
We develop a web application called "Find your Figure" that facilitates the identification and annotation of German rhetorical figures.
In addition, we improve the user experience with Retrieval Augmented Generation (RAG).
arXiv Detail & Related papers (2024-12-18T12:45:55Z) - Semantic Tokens in Retrieval Augmented Generation [0.0]
I propose a novel Comparative RAG system that introduces an evaluator module to bridge the gap between probabilistic RAG systems and deterministically verifiable responses.
This framework paves the way for more reliable and scalable question-answering applications in domains requiring high precision and verifiability.
arXiv Detail & Related papers (2024-12-03T16:52:06Z) - Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation [65.23793829741014]
Embodied-RAG is a framework that enhances the foundation model of an embodied agent with a non-parametric memory system.
At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail.
We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 200 explanation and navigation queries.
arXiv Detail & Related papers (2024-09-26T21:44:11Z) - A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning [13.112610550392537]
Retrieval-augmented generation (RAG) is a framework enabling large language models to enhance their accuracy and reduce hallucinations by integrating external knowledge bases.
We introduce a hybrid RAG system enhanced through a comprehensive suite of optimizations that significantly improve retrieval quality, augment reasoning capabilities, and refine numerical ability.
arXiv Detail & Related papers (2024-08-09T15:53:55Z) - Seven Failure Points When Engineering a Retrieval Augmented Generation
System [1.8776685617612472]
RAG systems aim to reduce the problem of hallucinated responses from large language models.
RAG systems suffer from limitations inherent to information retrieval systems.
We present an experience report on the failure points of RAG systems from three case studies.
arXiv Detail & Related papers (2024-01-11T12:04:11Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - A Vector Quantized Approach for Text to Speech Synthesis on Real-World
Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The proposed Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that it outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z) - How to Describe Images in a More Funny Way? Towards a Modular Approach
to Cross-Modal Sarcasm Generation [62.89586083449108]
We study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image.
CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities.
We propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation.
arXiv Detail & Related papers (2022-11-20T14:38:24Z) - A Survey on Automated Sarcasm Detection on Twitter [0.0]
Short text messages are increasingly used for communication, especially over social media platforms such as Twitter.
Unidentified sarcasm in these messages can invert the meaning of a statement, leading to confusion and communication failures.
This paper covers a variety of current methods used for sarcasm detection, including detection by context, posting history and machine learning models.
arXiv Detail & Related papers (2022-02-05T08:38:38Z) - Addressing Issues of Cross-Linguality in Open-Retrieval Question
Answering Systems For Emergent Domains [67.99403521976058]
We demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19.
Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable.
We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting.
arXiv Detail & Related papers (2022-01-26T19:27:32Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - Sarcasm Detection using Context Separators in Online Discourse [3.655021726150369]
Sarcasm is an intricate form of speech, where meaning is conveyed implicitly.
In this work, we use RoBERTa-large to detect sarcasm in two datasets.
We also assert the importance of context in improving the performance of contextual word embedding models.
arXiv Detail & Related papers (2020-06-01T10:52:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.