Using Set Covering to Generate Databases for Holistic Steganalysis
- URL: http://arxiv.org/abs/2211.03447v2
- Date: Thu, 28 Dec 2023 08:15:05 GMT
- Title: Using Set Covering to Generate Databases for Holistic Steganalysis
- Authors: Rony Abecidan (CRIStAL, CNRS), Vincent Itier (CRIStAL, IMT Nord
Europe, CNRS), Jérémie Boulanger (CRIStAL, CNRS), Patrick Bas (CRIStAL,
CNRS), Tomáš Pevný (CTU)
- Abstract summary: We explore a grid of processing pipelines to study the origins of Cover Source Mismatch (CSM).
A set-covering greedy algorithm is used to select representative pipelines minimizing the maximum regret between the representative and the pipelines within the set.
Our analysis also shows that parameters such as denoising, sharpening, and downsampling are very important for fostering diversity.
- Score: 2.089615335919449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Within an operational framework, covers used by a steganographer are likely
to come from different sensors and different processing pipelines than the ones
used by researchers for training their steganalysis models. Thus, a performance
gap is unavoidable when it comes to out-of-distribution covers, an extremely
frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid
of processing pipelines to study the origins of CSM, to better understand it,
and to better tackle it. A set-covering greedy algorithm is used to select
representative pipelines minimizing the maximum regret between the
representative and the pipelines within the set. Our main contribution is a
methodology for generating relevant bases able to tackle operational CSM.
Experimental validation highlights that, for a given number of training
samples, our set covering selection is a better strategy than selecting random
pipelines or using all the available pipelines. Our analysis also shows that
parameters such as denoising, sharpening, and downsampling are very important
for fostering diversity. Finally, different benchmarks for classical and wild
databases show the good generalization property of the extracted databases.
Additional resources are available at
github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.
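The greedy set-covering selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed regret matrix `regret`, where `regret[i][j]` is the regret incurred when a detector trained on pipeline `i` is tested on pipeline `j`, and the function name `select_representatives` is hypothetical.

```python
def select_representatives(regret, k):
    """Greedily pick k representative pipelines that minimize the
    maximum regret over all pipelines in the grid, assuming each
    pipeline is covered by its best representative in the chosen set.

    regret -- square matrix: regret[i][j] = regret of testing on
              pipeline j a detector trained on pipeline i (illustrative)
    k      -- number of representatives to select
    """
    n = len(regret)
    chosen = []
    for _ in range(k):
        best, best_score = None, float("inf")
        for cand in range(n):
            if cand in chosen:
                continue
            trial = chosen + [cand]
            # Worst-case regret when every pipeline j is assigned to
            # the representative in `trial` that covers it best.
            score = max(min(regret[r][j] for r in trial) for j in range(n))
            if score < best_score:
                best, best_score = cand, score
        chosen.append(best)  # keep the candidate lowering the worst case most
    return chosen
```

Each round adds the pipeline that most reduces the worst-case regret across the grid, which is the standard greedy heuristic for set-covering-style objectives.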
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [100.14685774661959]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z)
- Leveraging Data Geometry to Mitigate CSM in Steganalysis [1.130790932059036]
In operational scenarios, steganographers use sets of covers from various sensors and processing pipelines that differ significantly from those used by researchers to train steganalysis models.
This leads to an inevitable performance gap when dealing with out-of-distribution covers, commonly referred to as Cover Source Mismatch (CSM).
In this study, we consider the scenario where test images are processed using the same pipeline. Our objective is to identify a training dataset that allows for maximum generalization to our target.
arXiv Detail & Related papers (2023-10-06T09:08:25Z)
- Towards Personalized Preprocessing Pipeline Search [52.59156206880384]
ClusterP3S is a novel framework for Personalized Preprocessing Pipeline Search via Clustering.
We propose a hierarchical search strategy to jointly learn the clusters and search for the optimal pipelines.
Experiments on benchmark classification datasets demonstrate the effectiveness of enabling feature-wise preprocessing pipeline search.
arXiv Detail & Related papers (2023-02-28T05:45:05Z)
- Experiments on Generalizability of BERTopic on Multi-Domain Short Text [2.352645870795664]
We explore how the state-of-the-art BERTopic algorithm performs on short multi-domain text.
We analyze the performance of the HDBSCAN clustering algorithm utilized by BERTopic.
When we replace HDBSCAN with k-Means, we achieve similar performance, but without outliers.
arXiv Detail & Related papers (2022-12-16T13:07:39Z)
- Reinforced Approximate Exploratory Data Analysis [7.974685452145769]
We are the first to consider the impact of sampling in interactive data exploration settings, where it introduces approximation errors.
We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact.
arXiv Detail & Related papers (2022-12-12T20:20:22Z)
- GFlowCausal: Generative Flow Networks for Causal Discovery [27.51595081346858]
We propose a novel approach to learning a Directed Acyclic Graph (DAG) from observational data called GFlowCausal.
GFlowCausal aims to learn the best policy to generate high-reward DAGs by sequential actions with probabilities proportional to predefined rewards.
We conduct extensive experiments on both synthetic and real datasets, and results show that the proposed approach is superior and also performs well in a large-scale setting.
arXiv Detail & Related papers (2022-10-15T04:07:39Z)
- Optimal Decision Making in High-Throughput Virtual Screening Pipelines [12.366455276434513]
We propose two optimization frameworks, applicable to most (if not all) screening campaigns involving experimental and/or computational evaluations.
In particular, we consider the optimal computational campaign for the long non-coding RNA (lncRNA) classification as a practical example.
The simulation results demonstrate that the proposed frameworks significantly reduce the effective selection cost per potential candidate.
arXiv Detail & Related papers (2021-09-23T22:58:14Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
- Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.