IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization
- URL: http://arxiv.org/abs/2406.10580v1
- Date: Sat, 15 Jun 2024 09:44:54 GMT
- Title: IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization
- Authors: Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, Jizhe Zhou,
- Abstract summary: IMDL-BenCo is the first comprehensive IMDL benchmark and modular.
It decomposes the IMDL framework into standardized, reusable components and revises the model construction pipeline.
It includes 8 state-of-the-art IMDL models (1 of which are reproduced from scratch), 2 sets of standard training and evaluation protocols, 15 GPU-accelerated evaluation metrics, and 3 kinds of robustness evaluation.
- Score: 58.32394109377374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A comprehensive benchmark is yet to be established in the Image Manipulation Detection \& Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments and faithful comparisons among IMDL models challenging. To address these challenges, we introduce IMDL-BenCo, the first comprehensive IMDL benchmark and modular codebase. IMDL-BenCo:~\textbf{i)} decomposes the IMDL framework into standardized, reusable components and revises the model construction pipeline, improving coding efficiency and customization flexibility;~\textbf{ii)} fully implements or incorporates training code for state-of-the-art models to establish a comprehensive IMDL benchmark; and~\textbf{iii)} conducts deep analysis based on the established benchmark and codebase, offering new insights into IMDL model architecture, dataset characteristics, and evaluation standards. Specifically, IMDL-BenCo includes common processing algorithms, 8 state-of-the-art IMDL models (1 of which are reproduced from scratch), 2 sets of standard training and evaluation protocols, 15 GPU-accelerated evaluation metrics, and 3 kinds of robustness evaluation. This benchmark and codebase represent a significant leap forward in calibrating the current progress in the IMDL field and inspiring future breakthroughs. Code is available at: https://github.com/scu-zjz/IMDLBenCo
Related papers
- SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation [5.880496520248658]
SIMCOPILOT is a benchmark that simulates the role of large language models (LLMs) as interactive, "copilot"-style coding assistants.<n>The benchmark comprises dedicated sub-benchmarks for Java (SIMCOPILOTJ) and Python.
arXiv Detail & Related papers (2025-05-21T04:59:44Z) - CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation [19.071855537400463]
Large language models (LLMs) play a crucial role in software engineering, excelling in tasks like code generation and maintenance.
CoCo-Bench is designed to evaluate LLMs across four critical dimensions: code understanding, code generation, code modification, and code review.
arXiv Detail & Related papers (2025-04-29T11:57:23Z) - CodeVisionary: An Agent-based Framework for Evaluating Large Language Models in Code Generation [8.795746370609855]
Large language models (LLMs) have demonstrated strong capabilities in code generation.
Existing evaluation approaches fall into three categories, including human-centered, metric-based, and LLM-based.
We propose CodeVisionary, the first LLM-based agent framework for evaluating LLMs in code generation.
arXiv Detail & Related papers (2025-04-18T05:26:32Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - CoLLM: A Large Language Model for Composed Image Retrieval [76.29725148964368]
Composed Image Retrieval (CIR) is a complex task that aims to retrieve images based on a multimodal query.
We present CoLLM, a one-stop framework that generates triplets on-the-fly from image-caption pairs.
We leverage Large Language Models (LLMs) to generate joint embeddings of reference images and modification texts.
arXiv Detail & Related papers (2025-03-25T17:59:50Z) - Correctness Assessment of Code Generated by Large Language Models Using Internal Representations [4.32362000083889]
We introduce OPENIA, a novel framework to assess the correctness of code generated by Large Language Models (LLMs)
Our empirical analysis reveals that these internal representations encode latent information, which strongly correlates with the correctness of the generated code.
OPENIA consistently outperforms baseline models, achieving higher accuracy, precision, recall, and F1-Scores with up to a 2X improvement in standalone code generation.
arXiv Detail & Related papers (2025-01-22T15:04:13Z) - CODES: Benchmarking Coupled ODE Surrogates [0.0]
CODES is a benchmark for comprehensive evaluation of surrogate architectures for coupled ODE systems.
It emphasizes usability through features such as integrated parallel training, a web-based configuration generator, and pre-implemented baseline models and datasets.
arXiv Detail & Related papers (2024-10-28T10:12:06Z) - MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs)
MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts.
It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z) - LLM-CI: Assessing Contextual Integrity Norms in Language Models [1.1715858161748576]
Large language models (LLMs) may inadvertently encode societal preferences and norms.
This is especially challenging due to prompt sensitivity$-$small variations in prompts yield different responses.
We present LLM-CI, the first open-sourced framework to assess encoded norms.
arXiv Detail & Related papers (2024-09-05T17:50:31Z) - LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models [71.8065384742686]
LMMS-EVAL is a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 models.
LMMS-EVAL LITE is a pruned evaluation toolkit that emphasizes both coverage and efficiency.
Multimodal LIVEBENCH utilizes continuously updating news and online forums to assess models' generalization abilities in the wild.
arXiv Detail & Related papers (2024-07-17T17:51:53Z) - GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization [21.846935203845728]
Local manipulation pipeline is designed, incorporating the powerful SAM, ChatGPT and generative models.
The GIM dataset has the following advantages: 1) Large scale, including over one million pairs of AI-manipulated images and real images.
We propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, Frequency-Spatial Block (FSB), and a Multi-window Anomalous Modelling (MWAM) Module.
arXiv Detail & Related papers (2024-06-24T11:10:41Z) - An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models [25.28234439927537]
MMDetection3D-lidarseg is a comprehensive toolbox for efficient training and evaluation of state-of-the-art LiDAR segmentation models.
We support a wide range of segmentation models and integrate advanced data augmentation techniques to enhance robustness and efficiency.
By fostering a unified framework, MMDetection3D-lidarseg streamlines development and benchmarking, setting new standards for research and application.
arXiv Detail & Related papers (2024-05-23T17:59:57Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score.
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Quantitatively Assessing the Benefits of Model-driven Development in
Agent-based Modeling and Simulation [80.49040344355431]
This paper compares the use of MDD and ABMS platforms in terms of effort and developer mistakes.
The obtained results show that MDD4ABMS requires less effort to develop simulations with similar (sometimes better) design quality than NetLogo.
arXiv Detail & Related papers (2020-06-15T23:29:04Z) - MLModelScope: A Distributed Platform for Model Evaluation and
Benchmarking at Scale [32.62513495487506]
Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.
The complicated procedures for evaluating innovations, along with the lack of standard and efficient ways of specifying and provisioning ML/DL evaluation, is a major "pain point" for the community.
This paper proposes MLModelScope, an open-source, framework/ hardware agnostic, and customizable design that enables repeatable, fair, and scalable model evaluation and benchmarking.
arXiv Detail & Related papers (2020-02-19T17:13:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.