ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
- URL: http://arxiv.org/abs/2506.21233v2
- Date: Fri, 27 Jun 2025 12:06:27 GMT
- Title: ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
- Authors: Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma,
- Abstract summary: Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning.<n>We introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval.<n>Our method outperforms all existing training-free OVS approaches, highlighting the importance of data-centric design for advancing OVS without training.
- Score: 18.666044903856363
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning. Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS. However, their performance is limited by the capability of reliant models or the suboptimal quality of reference sets. In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS. With this observation, we introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval to unveil the essential effect of data. Remarkably, extensive evaluations on ten benchmark datasets demonstrate that our method outperforms all existing training-free OVS approaches, highlighting the importance of data-centric design for advancing OVS without training. Our code is available at https://github.com/xiweix/ReME .
Related papers
- The LLM Data Auditor: A Metric-oriented Survey on Quality and Trustworthiness in Evaluating Synthetic Data [25.926467401802046]
Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities.<n>We propose a framework for evaluating synthetic data from two dimensions: quality and trustworthiness.
arXiv Detail & Related papers (2026-01-25T06:40:25Z) - OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value [74.80873109856563]
OpenDataArena (ODA) is a holistic and open platform designed to benchmark the intrinsic value of post-training data.<n>ODA establishes a comprehensive ecosystem comprising four key pillars: (i) a unified training-evaluation pipeline that ensures fair, open comparisons across diverse models; (ii) a multi-dimensional scoring framework that profiles data quality along tens of distinct axes; and (iii) an interactive data lineage explorer to visualize dataset genealogy and dissect component sources.
arXiv Detail & Related papers (2025-12-16T03:33:24Z) - Vision Large Language Models Are Good Noise Handlers in Engagement Analysis [54.397912827957164]
We propose a framework leveraging Vision Large Language Models (VLMs) to refine annotations and guide the training process.<n>Our framework uses a questionnaire to extract behavioral cues and split data into high- and low-reliability subsets.<n>We demonstrate that classical computer vision models trained on refined high-reliability subsets and enhanced with our curriculum strategy show improvements.
arXiv Detail & Related papers (2025-11-18T18:50:26Z) - SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z) - Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs [1.6332728502735252]
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly deployed in industry applications.<n>Their reliability remains hampered by challenges in detecting hallucinations.<n>This paper addresses the bottleneck of data annotation by investigating the feasibility of reducing training data requirements.
arXiv Detail & Related papers (2025-05-29T09:50:56Z) - Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization [45.48642232138223]
In-context learning (ICL) enables Large Vision-Language Models (LVLMs) to adapt to new tasks without parameter updates.<n>We propose Key-based Coreset Optimization (KeCO), a novel framework that leverages untapped data to construct a compact and informative coreset.<n>KeCO effectively enhances ICL performance for image classification task, achieving an average improvement of more than 20%.
arXiv Detail & Related papers (2025-04-19T06:26:23Z) - Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation [37.77454972709646]
We introduce Sharpness-Aware Trajectory Matching (SATM), which enhances the generalization capability of learned synthetic datasets.<n>Our approach is mathematically well-supported and straightforward to implement along with controllable computational overhead.
arXiv Detail & Related papers (2025-02-03T22:30:06Z) - Few-Shot Optimized Framework for Hallucination Detection in Resource-Limited NLP Systems [1.0124625066746595]
We introduce DeepSeek Few-shot optimization to enhance weak label generation through iterative prompt engineering.<n>We achieve high-quality annotations that considerably enhanced the performance of downstream models.<n>We further fine-tuned the Mistral-7B-Instruct-v0.3 model on these optimized annotations, enabling it to accurately detect hallucinations in resource-limited settings.
arXiv Detail & Related papers (2025-01-28T01:26:22Z) - HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction [24.46493675079128]
OCR-dependent methods rely on offline OCR engines, while OCR-free methods might produce outputs that lack interpretability or contain hallucinated content.
We propose HIP, which models entities as HIerarchical Points to better conform to the hierarchical nature of the end-to-end VIE task.
Specifically, such hierarchical points can be flexibly encoded and subsequently decoded into desired text transcripts, centers of various regions, and categories of entities.
arXiv Detail & Related papers (2024-11-02T05:00:13Z) - OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation [26.018655577919617]
Open-Vocabulary (OVS) aims to segment classes that are not present in the training dataset.<n>We propose ConOVS, a novel continual learning method based on a Mixture-of-Experts framework.<n>We show that ConOVS consistently outperforms existing methods across pre-training, incremental, and zero-shot test datasets.
arXiv Detail & Related papers (2024-10-15T12:11:41Z) - A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.<n>Data selection has shown promise in identifying the most representative samples from the entire dataset.<n>We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from difference sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text
Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.