Towards Complex-query Referring Image Segmentation: A Novel Benchmark
- URL: http://arxiv.org/abs/2309.17205v1
- Date: Fri, 29 Sep 2023 12:58:13 GMT
- Title: Towards Complex-query Referring Image Segmentation: A Novel Benchmark
- Authors: Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger
Zimmermann
- Abstract summary: We propose a new RIS benchmark with complex queries, namely textbfRIS-CQ.
The RIS-CQ dataset is of high quality and large scale, which challenges the existing RIS with enriched, specific and informative queries.
We present a nichetargeting method to better task the RIS-CQ, called dual-modality graph alignment model (textbftextscDuMoGa)
- Score: 42.263084522244796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Referring Image Segmentation (RIS) has been extensively studied over the
past decade, leading to the development of advanced algorithms. However, there
has been a lack of research investigating how existing algorithms should be
benchmarked with complex language queries, which include more informative
descriptions of surrounding objects and backgrounds (e.g., "the black car."
vs. "the black car is parking on the road and beside the
bus."). Given the significant improvement in the semantic understanding
capability of large pre-trained models, it is crucial to take a step further in
RIS by incorporating complex language that resembles real-world applications.
To close this gap, building upon the existing RefCOCO and Visual Genome
datasets, we propose a new RIS benchmark with complex queries, namely
RIS-CQ. The RIS-CQ dataset is of high quality and large scale; it
challenges existing RIS methods with enriched, specific, and informative queries,
and enables a more realistic scenario of RIS research. In addition, we present a
niche-targeting method to better tackle RIS-CQ, a dual-modality graph
alignment model (DuMoGa), which outperforms a series of RIS methods.
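To make concrete what a complex query adds over an ordinary one, here is a minimal, hedged sketch of how a RIS-CQ-style sample could be represented and scored with the standard mask-IoU metric. The field names, image size, and dummy prediction are illustrative assumptions, not the official RIS-CQ schema or the DuMoGa model.

```python
# Illustrative sketch only: a hypothetical complex-query RIS sample and the
# standard mask-IoU metric used to score a predicted segmentation mask.
# Field names and sizes are assumptions, not the official RIS-CQ format.
import numpy as np

sample = {
    "image_id": 42,  # hypothetical identifier
    "simple_query": "the black car.",
    "complex_query": "the black car is parking on the road and beside the bus.",
    "gt_mask": np.zeros((480, 640), dtype=bool),  # ground-truth binary mask
}
sample["gt_mask"][200:320, 100:300] = True  # toy region for the referred car

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

# A dummy prediction standing in for the output of some RIS model.
pred_mask = np.zeros_like(sample["gt_mask"])
pred_mask[210:330, 120:310] = True

print(f"IoU for complex query: {mask_iou(pred_mask, sample['gt_mask']):.3f}")
```

The scoring side is unchanged by the benchmark; what RIS-CQ varies is the query side, which carries much richer contextual description for the model to ground.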
Related papers
- TrustRAG: An Information Assistant with Retrieval Augmented Generation [73.84864898280719]
TrustRAG is a novel framework that enhances RAG from three perspectives: indexing, retrieval, and generation.
We open-source the TrustRAG framework and provide a demonstration studio designed for excerpt-based question answering tasks.
arXiv Detail & Related papers (2025-02-19T13:45:27Z)
- GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems.
The prevailing methodology is to construct a bi-encoder and compute the semantic similarity.
We propose a new method called GeAR (Generation Augmented Retrieval) that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
- Context Awareness Gate For Retrieval Augmented Generation [2.749898166276854]
Retrieval Augmented Generation (RAG) has emerged as a widely adopted approach to mitigate the limitations of large language models (LLMs) in answering domain-specific questions.
Previous research has predominantly focused on improving the accuracy and quality of retrieved data chunks to enhance the overall performance of the generation pipeline.
We investigate the impact of retrieving irrelevant information in open-domain question answering, highlighting its significant detrimental effect on the quality of LLM outputs.
arXiv Detail & Related papers (2024-11-25T06:48:38Z)
- RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs [12.846097618151951]
We develop RiTeK, a dataset for complex reasoning by LLMs over textual knowledge graphs, with broad coverage of topological structures.
We synthesize realistic user queries that integrate diverse topological structures, annotated information, and complex textual descriptions.
We introduce an enhanced Monte Carlo Tree Search (MCTS) method, which automatically extracts relational path information from textual graphs for specific queries.
arXiv Detail & Related papers (2024-10-17T19:33:37Z)
- LightRAG: Simple and Fast Retrieval-Augmented Generation [12.86888202297654]
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources.
Existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness.
We propose LightRAG, which incorporates graph structures into text indexing and retrieval processes.
arXiv Detail & Related papers (2024-10-08T08:00:12Z)
- Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context [4.1229332722825]
This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement.
We conduct experiments on various Large Language Models (LLMs) with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions.
Our methodology GraphContextGen consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability to a larger number of use cases.
arXiv Detail & Related papers (2024-01-23T11:25:34Z)
- Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component of many downstream tasks, such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses an entity/event linking model and a query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z)
- QontSum: On Contrasting Salient Content for Query-focused Summarization [22.738731393540633]
Query-focused summarization (QFS) is a challenging task in natural language processing that generates summaries to address specific queries.
This paper highlights the role of QFS in Grounded Answer Generation (GAR).
We propose QontSum, a novel approach for QFS that leverages contrastive learning to help the model attend to the most relevant regions of the input document.
arXiv Detail & Related papers (2023-07-14T19:25:35Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Towards Robust Referring Image Segmentation [80.53860642199412]
Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs object masks based on text descriptions.
We propose a new formulation of RIS, named Robust Referring Image Segmentation (R-RIS).
We create three R-RIS datasets by augmenting existing RIS datasets with negative sentences (a toy sketch of this augmentation follows the list).
We propose a new transformer-based model, called RefSegformer, with a token-based vision and language fusion module.
arXiv Detail & Related papers (2022-09-20T08:48:26Z)
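The R-RIS entry above builds its datasets by pairing images with negative sentences that refer to nothing in the image, so the correct output is an empty mask. Below is a minimal sketch of that augmentation idea under assumed field names (image, sentence, mask); it is not the paper's actual pipeline, which would also need to verify that a sampled sentence truly has no referent in the target image.

```python
# Toy sketch of negative-sentence augmentation in the spirit of R-RIS:
# attach a referring expression taken from a *different* image together with
# an all-empty target mask, so a model must learn to predict "no referred
# object". Data layout ("image", "sentence", "mask") is an assumption.
import random
import numpy as np

def augment_with_negatives(dataset, neg_ratio=0.5, seed=0):
    """dataset: list of dicts with keys 'image', 'sentence', 'mask' (H x W bool)."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for _ in range(int(len(dataset) * neg_ratio)):
        target, other = rng.sample(dataset, 2)
        # A real pipeline would check that other['sentence'] has no referent
        # in target['image']; this sketch skips that verification step.
        augmented.append({
            "image": target["image"],
            "sentence": other["sentence"],          # describes an absent object
            "mask": np.zeros_like(target["mask"]),  # empty ground-truth mask
            "is_negative": True,
        })
    return augmented
```

Only the data side is sketched here; model-side changes such as RefSegformer's vision-language fusion module are out of scope for this illustration.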