Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
- URL: http://arxiv.org/abs/2407.21611v2
- Date: Mon, 19 Aug 2024 16:09:14 GMT
- Title: Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
- Authors: Jiafeng Zhong, Bin Li, Jiangyan Yi,
- Abstract summary: We propose a novel method called Boundary-aware Attention Mechanism (BAM).
BAM consists of two core modules: Boundary Enhancement and Boundary Frame-wise Attention.
Experimental results on the PartialSpoof database demonstrate that our proposed method achieves the best performance.
- Score: 17.468808107791265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of partially spoofed audio localization aims to accurately determine audio authenticity at a frame level. Although some works have achieved encouraging results, utilizing boundary information within a single model remains an unexplored research topic. In this work, we propose a novel method called Boundary-aware Attention Mechanism (BAM). Specifically, it consists of two core modules: Boundary Enhancement and Boundary Frame-wise Attention. The former assembles the intra-frame and inter-frame information to extract discriminative boundary features that are subsequently used for boundary position detection and authenticity decision, while the latter leverages boundary prediction results to explicitly control the feature interaction between frames, which achieves effective discrimination between real and fake frames. Experimental results on the PartialSpoof database demonstrate that our proposed method achieves the best performance. The code is available at https://github.com/media-sec-lab/BAM.
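The abstract says that boundary predictions are used to control feature interaction between frames, but it does not say how. One possible reading is sketched below, assuming the control takes the form of a hard attention mask that lets a frame attend only to frames in its own segment. The function names `segment_mask` and `masked_frame_attention`, the hard 0/1 boundary labels, and the plain dot-product attention are all hypothetical choices made for illustration; they are not taken from the paper or its repository.

```python
import numpy as np

def segment_mask(boundary: np.ndarray) -> np.ndarray:
    """Build a (T, T) boolean mask from per-frame boundary flags.

    Assumption: boundary[t] == 1 means frame t starts a new segment.
    Frames i and j may interact only if no boundary separates them.
    """
    seg_ids = np.cumsum(boundary)  # frames in the same segment share an id
    return seg_ids[:, None] == seg_ids[None, :]

def masked_frame_attention(features: np.ndarray, boundary: np.ndarray):
    """Dot-product self-attention restricted to frames of the same segment."""
    _, dim = features.shape
    scores = features @ features.T / np.sqrt(dim)
    mask = segment_mask(boundary)
    # Blocked pairs get -inf so they receive zero weight after the softmax.
    scores = np.where(mask, scores, -np.inf)
    # Each row keeps its diagonal entry, so the row maximum is always finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features, weights

# Toy input: 6 frames with 4-dimensional features, a boundary at frame 3.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
boundary = np.array([0, 0, 0, 1, 0, 0])
out, w = masked_frame_attention(feats, boundary)
```

In this toy setup, frames 0 to 2 and frames 3 to 5 form two segments, so the attention weights between the two groups are exactly zero. A soft variant that scales scores by predicted boundary probabilities would be an equally plausible reading of the abstract.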
Related papers
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
- Boundary Discretization and Reliable Classification Network for Temporal Action Detection [39.17204328036531]
Temporal action detection aims to recognize the action category and determine each action instance's starting and ending time in untrimmed videos.
Mixed methods have achieved remarkable performance by seamlessly merging anchor-based and anchor-free approaches.
We propose a novel Boundary Discretization and Reliable Classification Network (BDRC-Net) that addresses the issues above by introducing boundary discretization and reliable classification modules.
arXiv Detail & Related papers (2023-10-10T08:14:24Z)
- Local Compressed Video Stream Learning for Generic Event Boundary Detection [25.37983456118522]
Event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
Existing methods typically require video frames to be decoded before feeding into the network.
We propose a novel event boundary detection method that is fully end-to-end and leverages rich information in the compressed domain.
arXiv Detail & Related papers (2023-09-27T06:49:40Z)
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection [4.055489363682199]
We propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL).
Our approach involves two novel parts: embedding similarity module and temporal convolution operation.
Our method outperforms baseline models on the ASVspoof 2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario.
arXiv Detail & Related papers (2023-09-06T14:29:29Z)
- Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding [64.99924160432144]
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
We propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames.
arXiv Detail & Related papers (2023-01-02T03:38:22Z)
- Synthesize Boundaries: A Boundary-aware Self-consistent Framework for Weakly Supervised Salient Object Detection [8.951168425295378]
We propose to learn precise boundaries from our designed synthetic images and labels.
The synthetic image creates boundary information by inserting synthetic concave regions that simulate the real concave regions of salient objects.
We also propose a novel self-consistent framework that consists of a global integral branch (GIB) and a boundary-aware branch (BAB) to train a saliency detector.
arXiv Detail & Related papers (2022-12-04T08:22:45Z)
- Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection [48.33132632418303]
Generic Boundary Detection (GBD) aims at locating general boundaries that divide videos into semantically coherent and taxonomy-free units.
Previous research handles these different-level generic boundaries separately, with specific designs of complicated deep networks ranging from simple CNNs to LSTMs.
We present Temporal Perceiver, a general architecture with Transformers, offering a unified solution to the detection of arbitrary generic boundaries.
arXiv Detail & Related papers (2022-03-01T09:31:30Z)
- Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between each image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)