Low to High Dimensional Modality Hallucination using Aggregated Fields of View
- URL: http://arxiv.org/abs/2007.06166v1
- Date: Mon, 13 Jul 2020 03:13:48 GMT
- Title: Low to High Dimensional Modality Hallucination using Aggregated Fields of View
- Authors: Kausic Gunasekar, Qiang Qiu and Yezhou Yang
- Abstract summary: We argue that modality hallucination is one effective way to ensure consistent modality availability.
We present a novel hallucination architecture that aggregates information from multiple fields of view of the local neighborhood.
We also conduct extensive classification and segmentation experiments on UWRGBD and NYUD datasets and demonstrate that hallucination allays the negative effects of the modality loss.
- Score: 48.32515709424962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world robotics systems deal with data from a multitude of modalities,
especially for tasks such as navigation and recognition. The performance of
those systems can drastically degrade when one or more modalities become
inaccessible, due to factors such as sensors' malfunctions or adverse
environments. Here, we argue that modality hallucination is one effective way to
ensure consistent modality availability and thereby reduce unfavorable
consequences. While hallucinating data from a modality with richer information,
e.g., RGB to depth, has been researched extensively, we investigate the more
challenging low-to-high modality hallucination with interesting use cases in
robotics and autonomous systems. We present a novel hallucination architecture
that aggregates information from multiple fields of view of the local
neighborhood to recover the lost information from the extant modality. The
process is implemented by capturing a non-linear mapping between the data
modalities and the learned mapping is used to aid the extant modality to
mitigate the risk posed to the system in adverse scenarios that involve
modality loss. We also conduct extensive classification and segmentation
experiments on UWRGBD and NYUD datasets and demonstrate that hallucination
allays the negative effects of the modality loss. Implementation and models:
https://github.com/kausic94/Hallucination
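As a rough illustration of the core idea, a minimal sketch follows. It is not the authors' released implementation (linked above); the module names, layer sizes, and the use of dilated convolutions as a stand-in for "multiple fields of view of the local neighborhood" are all illustrative assumptions. The network takes the extant low-dimensional modality (here a single-channel depth map), encodes the local neighborhood at several receptive-field scales, aggregates those features, and regresses the hallucinated higher-dimensional modality (here RGB).

```python
# Hypothetical sketch of low-to-high modality hallucination with
# aggregated fields of view. Not the paper's architecture; every
# name and hyperparameter here is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FoVBranch(nn.Module):
    """Encodes the input at one field of view (one receptive-field scale)."""

    def __init__(self, in_ch: int, feat_ch: int, dilation: int):
        super().__init__()
        # Dilated 3x3 convolutions widen the neighborhood seen per pixel
        # while keeping the spatial resolution unchanged.
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class AggregatedFoVHallucinator(nn.Module):
    """Aggregates multi-FoV features and regresses the missing modality."""

    def __init__(self, in_ch=1, out_ch=3, feat_ch=32, dilations=(1, 2, 4)):
        super().__init__()
        # One branch per field of view of the local neighborhood.
        self.branches = nn.ModuleList(
            FoVBranch(in_ch, feat_ch, d) for d in dilations
        )
        # Fuse the aggregated features and decode the hallucinated modality,
        # i.e. learn a non-linear mapping from depth to RGB in this example.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch * len(dilations), feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, out_ch, 1),
        )

    def forward(self, depth):
        feats = torch.cat([b(depth) for b in self.branches], dim=1)
        return self.decoder(feats)


if __name__ == "__main__":
    model = AggregatedFoVHallucinator()
    depth = torch.randn(2, 1, 128, 128)   # extant low-dimensional modality
    rgb_hat = model(depth)                # hallucinated richer modality
    # At training time the hallucinated output would be regressed against
    # the real modality (e.g. with an L1 loss), so it can stand in for the
    # missing sensor in downstream classification or segmentation.
    rgb_target = torch.randn(2, 3, 128, 128)
    loss = F.l1_loss(rgb_hat, rgb_target)
    print(rgb_hat.shape, float(loss))
```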
Related papers
- CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs [74.36850397755572]
CATCH addresses issues related to visual defects that cause diminished fine-grained feature perception and cumulative hallucinations in open-ended scenarios.
It is applicable to various visual question-answering tasks without requiring any specific data or prior knowledge, and generalizes robustly to new tasks without additional training.
arXiv Detail & Related papers (2024-11-19T18:27:31Z)
- Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models [13.48296910438554]
We introduce Reefknot, a comprehensive benchmark targeting relation hallucinations, comprising over 20,000 real-world samples.
We provide a systematic definition of relation hallucinations, integrating perceptive and cognitive perspectives, and construct a relation-based corpus using the Visual Genome scene graph dataset.
We propose a novel confidence-based mitigation strategy, which reduces the hallucination rate by an average of 9.75% across three datasets, including Reefknot.
arXiv Detail & Related papers (2024-08-18T10:07:02Z)
- Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback.
We generate a small-scale hallucination annotation dataset using proprietary models.
Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z)
- Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original ~30% drop to only ~10% when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates hallucinations by a relative 44.6% and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
- HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues [19.800985243540797]
Issues such as occlusion, intra-class variance, and lighting can arise when training neural networks using unimodal RS visual input.
We propose a novel few-shot generative framework, Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), to meta-train cross-modal features from limited unimodal data.
arXiv Detail & Related papers (2023-09-23T20:05:00Z)
- Cortex Inspired Learning to Recover Damaged Signal Modality with ReD-SOM Model [0.0]
Recent progress in AI and cognitive sciences opens up new challenges that were previously inaccessible to study.
One such task is recovering lost data of one modality using data from another.
We propose a way to simulate such an effect and use it to reconstruct lost data modalities by combining Variational Auto-Encoders, Self-Organizing Maps, and Hebb connections.
arXiv Detail & Related papers (2023-07-27T09:44:12Z)
- DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness [78.98998551326812]
We argue that traditional methods have rarely made use of both the time-series dynamics of the data and the relatedness of features from different sensors.
We propose a model, termed DynImp, to handle missingness at different time points using nearest neighbors along the feature axis.
We show that the method can exploit the multi-modality features from related sensors and also learn from history time-series dynamics to reconstruct the data under extreme missingness.
arXiv Detail & Related papers (2022-09-26T21:59:14Z)
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality.
To this end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders.
arXiv Detail & Related papers (2020-12-28T02:37:03Z)