Using Set Covering to Generate Databases for Holistic Steganalysis
- URL: http://arxiv.org/abs/2211.03447v2
- Date: Thu, 28 Dec 2023 08:15:05 GMT
- Title: Using Set Covering to Generate Databases for Holistic Steganalysis
- Authors: Rony Abecidan (CRIStAL, CNRS), Vincent Itier (CRIStAL, IMT Nord
Europe, CNRS), Jérémie Boulanger (CRIStAL, CNRS), Patrick Bas (CRIStAL,
CNRS), Tomáš Pevný (CTU)
- Abstract summary: We explore a grid of processing pipelines to study the origins of Cover Source Mismatch (CSM).
A set-covering greedy algorithm is used to select representative pipelines minimizing the maximum regret between the representative and the pipelines within the set.
Our analysis also shows that parameters such as denoising, sharpening, and downsampling are very important for fostering diversity.
- Score: 2.089615335919449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Within an operational framework, covers used by a steganographer are likely
to come from different sensors and different processing pipelines than the ones
used by researchers for training their steganalysis models. Thus, a performance
gap is unavoidable when it comes to out-of-distribution covers, an extremely
frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid
of processing pipelines to study the origins of CSM, to better understand it,
and to better tackle it. A set-covering greedy algorithm is used to select
representative pipelines minimizing the maximum regret between the
representative and the pipelines within the set. Our main contribution is a
methodology for generating relevant bases able to tackle operational CSM.
Experimental validation highlights that, for a given number of training
samples, our set covering selection is a better strategy than selecting random
pipelines or using all the available pipelines. Our analysis also shows that
parameters such as denoising, sharpening, and downsampling are very important
for fostering diversity. Finally, different benchmarks for classical and wild
databases show the good generalization property of the extracted databases.
Additional resources are available at
github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.
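The greedy set-covering selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed regret matrix `regret`, where `regret[i][j]` is the regret incurred when a detector trained on pipeline `i` is tested on pipeline `j`, and the function name `select_representatives` is hypothetical.

```python
def select_representatives(regret, k):
    """Greedily pick k representative pipelines that minimize the
    maximum regret over all pipelines in the grid, assuming each
    pipeline is covered by its best representative in the chosen set.

    regret -- square matrix: regret[i][j] = regret of testing on
              pipeline j a detector trained on pipeline i (illustrative)
    k      -- number of representatives to select
    """
    n = len(regret)
    chosen = []
    for _ in range(k):
        best, best_score = None, float("inf")
        for cand in range(n):
            if cand in chosen:
                continue
            trial = chosen + [cand]
            # Worst-case regret when every pipeline j is assigned to
            # the representative in `trial` that covers it best.
            score = max(min(regret[r][j] for r in trial) for j in range(n))
            if score < best_score:
                best, best_score = cand, score
        chosen.append(best)  # keep the candidate lowering the worst case most
    return chosen
```

Each round adds the pipeline that most reduces the worst-case regret across the grid, which is the standard greedy heuristic for set-covering-style objectives.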
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [100.14685774661959]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z)
- Leveraging Data Geometry to Mitigate CSM in Steganalysis [1.130790932059036]
In operational scenarios, steganographers use sets of covers from various sensors and processing pipelines that differ significantly from those used by researchers to train steganalysis models.
This leads to an inevitable performance gap when dealing with out-of-distribution covers, commonly referred to as Cover Source Mismatch (CSM).
In this study, we consider the scenario where test images are processed using the same pipeline. Our objective is to identify a training dataset that allows for maximum generalization to our target.
arXiv Detail & Related papers (2023-10-06T09:08:25Z)
- Towards Personalized Preprocessing Pipeline Search [52.59156206880384]
ClusterP3S is a novel framework for Personalized Preprocessing Pipeline Search via Clustering.
We propose a hierarchical search strategy to jointly learn the clusters and search for the optimal pipelines.
Experiments on benchmark classification datasets demonstrate the effectiveness of enabling feature-wise preprocessing pipeline search.
arXiv Detail & Related papers (2023-02-28T05:45:05Z)
- Experiments on Generalizability of BERTopic on Multi-Domain Short Text [2.352645870795664]
We explore how the state-of-the-art BERTopic algorithm performs on short multi-domain text.
We analyze the performance of the HDBSCAN clustering algorithm utilized by BERTopic.
When we replace HDBSCAN with k-Means, we achieve similar performance, but without outliers.
arXiv Detail & Related papers (2022-12-16T13:07:39Z)
- Reinforced Approximate Exploratory Data Analysis [7.974685452145769]
We are the first to consider the impact of sampling in interactive data exploration settings, where it introduces approximation errors.
We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact.
arXiv Detail & Related papers (2022-12-12T20:20:22Z)
- GFlowCausal: Generative Flow Networks for Causal Discovery [27.51595081346858]
We propose a novel approach to learning a Directed Acyclic Graph (DAG) from observational data called GFlowCausal.
GFlowCausal aims to learn the best policy to generate high-reward DAGs by sequential actions with probabilities proportional to predefined rewards.
We conduct extensive experiments on both synthetic and real datasets, and results show that the proposed approach is superior and also performs well in a large-scale setting.
arXiv Detail & Related papers (2022-10-15T04:07:39Z)
- Optimal Decision Making in High-Throughput Virtual Screening Pipelines [12.366455276434513]
We propose two optimization frameworks, applicable to most (if not all) screening campaigns involving experimental and/or computational evaluations.
In particular, we consider the optimal computational campaign for the long non-coding RNA (lncRNA) classification as a practical example.
The simulation results demonstrate that the proposed frameworks significantly reduce the effective selection cost per potential candidate.
arXiv Detail & Related papers (2021-09-23T22:58:14Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
- Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.