CLEAR: Character Unlearning in Textual and Visual Modalities
- URL: http://arxiv.org/abs/2410.18057v1
- Date: Wed, 23 Oct 2024 17:30:50 GMT
- Title: CLEAR: Character Unlearning in Textual and Visual Modalities
- Authors: Alexey Dontsov, Dmitrii Korzh, Alexey Zhavoronkin, Boris Mikheev, Denis Bobkov, Aibek Alanov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina
- Abstract summary: We introduce CLEAR, a benchmark designed to evaluate multimodal unlearning (MMU) methods.
CLEAR contains 200 fictitious individuals and 3,700 images linked with corresponding question-answer pairs.
We assess 10 MU methods, adapting them for MMU, and highlight new challenges specific to multimodal forgetting.
- Score: 7.618793381903125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Unlearning (MU) is critical for enhancing privacy and security in deep learning models, particularly in large multimodal language models (MLLMs), by removing specific private or hazardous information. While MU has made significant progress in textual and visual modalities, multimodal unlearning (MMU) remains significantly underexplored, partially due to the absence of a suitable open-source benchmark. To address this, we introduce CLEAR, a new benchmark designed to evaluate MMU methods. CLEAR contains 200 fictitious individuals and 3,700 images linked with corresponding question-answer pairs, enabling a thorough evaluation across modalities. We assess 10 MU methods, adapting them for MMU, and highlight new challenges specific to multimodal forgetting. We also demonstrate that simple $\ell_1$ regularization on LoRA weights significantly mitigates catastrophic forgetting, preserving model performance on retained data. The dataset is available at https://huggingface.co/datasets/therem/CLEAR
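The abstract's note on $\ell_1$ regularization of LoRA weights can be made concrete with a rough sketch. The code below is an illustration rather than the authors' implementation: it adds an L1 penalty over LoRA adapter parameters to a simple gradient-ascent forgetting objective, and the function and hyperparameter names (`unlearning_step`, `lora_lambda`) as well as the PEFT-style "lora_" parameter naming are assumptions.
```python
import torch

def unlearning_step(model, forget_batch, optimizer, lora_lambda=1e-4):
    """One hypothetical unlearning step: gradient ascent on the forget set
    plus an L1 penalty restricted to LoRA adapter weights."""
    optimizer.zero_grad()
    outputs = model(**forget_batch)   # HF-style model that returns .loss
    forget_loss = -outputs.loss       # ascend on the forget-set loss

    # Sparsity-inducing L1 penalty over LoRA parameters only
    # (PEFT-style modules typically contain "lora_" in their names).
    l1_penalty = sum(
        p.abs().sum()
        for name, p in model.named_parameters()
        if "lora_" in name and p.requires_grad
    )

    loss = forget_loss + lora_lambda * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```
The intent of such a penalty is to keep the LoRA update sparse, so unlearning perturbs fewer directions in weight space and performance on retained data is less disturbed.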
Related papers
- Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities.
Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality.
We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM).
arXiv Detail & Related papers (2025-03-24T08:46:52Z)
- PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning [42.00851701431368]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs.
A critical challenge remains: the issue of missing modalities during incremental learning phases.
We propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios.
arXiv Detail & Related papers (2025-01-16T08:04:04Z)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench [17.73279547506514]
We introduce Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning.
MLLMU-Bench consists of 500 fictitious profiles and 153 profiles of public celebrities, each featuring over 14 customized question-answer pairs, evaluated from both multimodal (image+text) and unimodal (text) perspectives.
Surprisingly, our experiments show that unimodal unlearning algorithms excel in generation and cloze tasks, while multimodal unlearning approaches perform better in classification tasks with multimodal inputs.
arXiv Detail & Related papers (2024-10-29T15:07:23Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs).
MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts.
It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z)
- Recent Advances of Multimodal Continual Learning: A Comprehensive Survey [64.82070119713207]
We present the first comprehensive survey on multimodal continual learning methods.
We categorize existing MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based.
We discuss several promising future directions for investigation and development.
arXiv Detail & Related papers (2024-10-07T13:10:40Z)
- Deep Multimodal Learning with Missing Modality: A Survey [12.873458712005037]
Multimodal learning techniques designed to handle missing modalities can mitigate the performance degradation caused by incomplete inputs.
This survey reviews recent progress in Multimodal Learning with Missing Modality (MLMM).
arXiv Detail & Related papers (2024-09-12T08:15:39Z)
- MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training [9.023648972811458]
RagVL is a novel framework with knowledge-enhanced reranking and noise-injected training.
We instruction-tune the MLLM with a simple yet effective instruction template to induce its ranking ability.
For generation, we inject visual noise during training at the data and token levels to enhance the generator's robustness.
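As a purely hypothetical illustration of the token-level part of this idea, the snippet below adds Gaussian noise to visual token embeddings during training; RagVL's actual data- and token-level noise injection may differ, and the function name and default noise scale are assumptions.
```python
import torch

def inject_visual_noise(visual_tokens: torch.Tensor,
                        noise_std: float = 0.1,
                        training: bool = True) -> torch.Tensor:
    """Add Gaussian noise to visual token embeddings of shape
    (batch, num_tokens, dim); applied only in training mode."""
    if not training:
        return visual_tokens
    return visual_tokens + noise_std * torch.randn_like(visual_tokens)
```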
arXiv Detail & Related papers (2024-07-31T08:43:17Z)
- MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning [14.755831733659699]
We develop MU-Bench, the first comprehensive benchmark for Machine Unlearning (MU).
MU-Bench unifies the sets of deleted samples and trained models, and provides broad coverage of tasks and data modalities.
We analyze several under-investigated aspects of unlearning, including scalability, the impacts of parameter-efficient fine-tuning and curriculum learning, and susceptibility to dataset biases.
arXiv Detail & Related papers (2024-06-21T00:13:17Z)
- Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models [13.08771725554285]
We propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning on a single associated image for a few steps.
Experimental results on MMUBench show that SIU completely surpasses the performance of existing methods.
arXiv Detail & Related papers (2024-05-21T06:27:12Z)
- Are We on the Right Way for Evaluating Large Vision-Language Models? [92.5761176224556]
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities.
We identify two primary issues: Visual content is unnecessary for many samples and intentional data leakage exists.
We present MMStar, an elite vision-indispensable multi-modal benchmark comprising 1,500 samples meticulously selected by humans.
arXiv Detail & Related papers (2024-03-29T17:59:34Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks.
This paper presents the first comprehensive MLLM evaluation benchmark, MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition [73.80088682784587]
"Multimodal Generalization" (MMG) aims to study how systems can generalize when data from certain modalities is limited or even completely missing.
MMG consists of two novel scenarios, designed to support security and efficiency considerations in real-world applications.
The paper proposes a new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal loss for better few-shot performance (a minimal modality-dropout sketch follows below).
arXiv Detail & Related papers (2023-05-12T03:05:40Z)
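A minimal sketch of the modality-dropout idea mentioned above, assuming a simple concatenation-based fusion module; the class name, projection layer, and drop probability are illustrative, not the MMG-Ego4D reference code.
```python
import torch
import torch.nn as nn

class ModalityDropoutFusion(nn.Module):
    """Concatenation fusion that randomly zeroes out whole modalities
    during training so the model learns to cope with missing inputs."""

    def __init__(self, dim: int, num_modalities: int, p_drop: float = 0.3):
        super().__init__()
        self.p_drop = p_drop
        self.proj = nn.Linear(dim * num_modalities, dim)

    def forward(self, modality_feats):
        # modality_feats: list of (batch, dim) tensors, one per modality.
        if self.training:
            modality_feats = [
                torch.zeros_like(f) if torch.rand(1).item() < self.p_drop else f
                for f in modality_feats
            ]
        return self.proj(torch.cat(modality_feats, dim=-1))
```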