LLMs may Dominate Information Access: Neural Retrievers are Biased
Towards LLM-Generated Texts
- URL: http://arxiv.org/abs/2310.20501v2
- Date: Sun, 14 Jan 2024 14:41:06 GMT
- Title: LLMs may Dominate Information Access: Neural Retrievers are Biased
Towards LLM-Generated Texts
- Authors: Sunhao Dai, Yuqi Zhou, Liang Pang, Weihao Liu, Xiaolin Hu, Yong Liu,
Xiao Zhang, Gang Wang and Jun Xu
- Abstract summary: Large language models (LLMs) have revolutionized the paradigm of information retrieval (IR) applications.
Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher.
To mitigate the source bias, we also propose a plug-and-play debiased constraint for the optimization objective.
- Score: 36.73455759259717
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, the emergence of large language models (LLMs) has revolutionized
the paradigm of information retrieval (IR) applications, especially in web
search. With their remarkable ability to generate human-like text, LLMs have
produced an enormous amount of text on the Internet. As a result, IR systems in
the LLM era face a new challenge: indexed documents are no longer only written
by humans but are also automatically generated by LLMs. How these
LLM-generated documents influence the IR systems is a pressing and still
unexplored question. In this work, we conduct a quantitative evaluation of
different IR models in scenarios where both human-written and LLM-generated
texts are involved. Surprisingly, our findings indicate that neural retrieval
models tend to rank LLM-generated documents higher. We refer to this category
of biases in neural retrieval models towards the LLM-generated text as the
\textbf{source bias}. Moreover, we discover that this bias is not confined to
the first-stage neural retrievers, but extends to the second-stage neural
re-rankers. Then, we provide an in-depth analysis from the perspective of text
compression and observe that neural models can better understand the semantic
information of LLM-generated text, which is further substantiated by our
theoretical analysis. To mitigate the source bias, we also propose a
plug-and-play debiased constraint for the optimization objective, and
experimental results show its effectiveness. Finally, we discuss the potential
severe concerns stemming from the observed source bias and hope our findings
can serve as a critical wake-up call to the IR community and beyond. To
facilitate future exploration of IR in the LLM era, the two newly constructed
benchmarks and the accompanying code will be made available at
\url{https://github.com/KID-22/LLM4IR-Bias}.
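The abstract does not spell out the form of the proposed plug-and-play debiased constraint. As a purely illustrative sketch (not the authors' formulation), the PyTorch snippet below shows one way a debiasing penalty could be attached to a standard contrastive retrieval loss: a hinge term that discourages the retriever from scoring an LLM-generated copy of a relevant document above its human-written counterpart, weighted by a coefficient alpha. The function name, tensor shapes, and the hinge form of the penalty are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def debiased_ranking_loss(scores_human, scores_llm, scores_neg, alpha=0.1):
    """Hypothetical sketch of a debiased retrieval objective.

    scores_human: retriever scores for human-written positives, shape (B,)
    scores_llm:   retriever scores for LLM-generated positives, shape (B,)
    scores_neg:   retriever scores for sampled negatives, shape (B, K)
    alpha:        weight of the debiasing penalty (assumed hyperparameter)
    """

    def rank_loss(pos, neg):
        # Standard softmax cross-entropy over (positive, negatives),
        # with the positive placed at index 0.
        logits = torch.cat([pos.unsqueeze(1), neg], dim=1)  # (B, 1 + K)
        labels = torch.zeros(pos.size(0), dtype=torch.long, device=pos.device)
        return F.cross_entropy(logits, labels)

    # Ranking term applied to both sources of positives.
    l_rank = 0.5 * (rank_loss(scores_human, scores_neg)
                    + rank_loss(scores_llm, scores_neg))

    # Debiasing term: hinge on the margin by which the LLM-generated copy
    # outscores the human-written one (zero when the human copy wins).
    l_debias = F.relu(scores_llm - scores_human).mean()

    return l_rank + alpha * l_debias
```

Because the penalty is just an additive term on the loss, it can in principle be dropped into the training loop of any scoring-based retriever or re-ranker without changing the model architecture, which is the spirit of a "plug-and-play" constraint.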
Related papers
- LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs [63.580867975515474]
We present the first systematic investigation comparing the long-context performance of diffusion LLMs and traditional auto-regressive LLMs.
We propose LongLLaDA, a training-free method that integrates LLaDA with the NTK-based RoPE extrapolation.
arXiv Detail & Related papers (2025-06-17T11:45:37Z)
- Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation [44.58099275559231]
Large language models (LLMs) are increasingly integral to information retrieval (IR), powering ranking, evaluation, and AI-assisted content creation.
This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges.
arXiv Detail & Related papers (2025-03-24T19:24:40Z)
- Potential and Perils of Large Language Models as Judges of Unstructured Textual Data [0.631976908971572]
This research investigates the effectiveness of LLM-as-judge models in evaluating the thematic alignment of summaries generated by other LLMs.
Our findings reveal that while LLM-as-judge models offer a scalable solution comparable to human raters, humans may still excel at detecting subtle, context-specific nuances.
arXiv Detail & Related papers (2025-01-14T14:49:14Z)
- Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z)
- Robustness of LLMs to Perturbations in Text [2.0670689746336]
Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data?
This work tackles this critical question by investigating LLMs' resilience against morphological variations in text.
Our findings show that, contrary to popular belief, generative LLMs are quite robust to noisy perturbations in text.
arXiv Detail & Related papers (2024-07-12T04:50:17Z)
- ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Aligned large language models (LLMs) generate texts that humans tend to prefer.
In this paper, we identify the common characteristics shared by these models.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z)
- Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration [60.535793237063885]
The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet.
The impact of this surge in AIGC on Information Retrieval systems remains an open question.
We introduce Cocktail, a benchmark tailored for evaluating IR models in this mixed-sourced data landscape.
arXiv Detail & Related papers (2024-05-26T12:30:20Z)
- Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations.
One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation.
Recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models.
arXiv Detail & Related papers (2024-04-25T13:10:48Z)
- A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions [39.36381851190369]
There is a pressing need to develop detectors that can reliably identify LLM-generated text.
This is crucial to mitigate potential misuse of LLMs and safeguard realms like artistic expression and social networks from harmful influence of LLM-generated content.
Detection techniques have witnessed notable advancements recently, propelled by innovations in watermarking techniques, statistics-based detectors, neural-based detectors, and human-assisted methods.
arXiv Detail & Related papers (2023-10-23T09:01:13Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)