SCENE: Semantic-aware Codec Enhancement with Neural Embeddings
- URL: http://arxiv.org/abs/2601.22189v1
- Date: Thu, 29 Jan 2026 05:41:28 GMT
- Title: SCENE: Semantic-aware Codec Enhancement with Neural Embeddings
- Authors: Han-Yu Lin, Li-Wei Chen, Hung-Shin Lee
- Abstract summary: We propose a semantic-aware pre-processing framework that enhances perceptual fidelity. Our method integrates semantic embeddings from a vision-language model into an efficient convolutional architecture. Our results show that semantic-aware pre-processing is an effective approach for enhancing compressed video streams.
- Score: 9.6489283335586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compression artifacts from standard video codecs often degrade perceptual quality. We propose a lightweight, semantic-aware pre-processing framework that enhances perceptual fidelity by selectively addressing these distortions. Our method integrates semantic embeddings from a vision-language model into an efficient convolutional architecture, prioritizing the preservation of perceptually significant structures. The model is trained end-to-end with a differentiable codec proxy, enabling it to mitigate artifacts from various standard codecs without modifying the existing video pipeline. During inference, the codec proxy is discarded, and SCENE operates as a standalone pre-processor, enabling real-time performance. Experiments on high-resolution benchmarks show improved performance over baselines in both objective (MS-SSIM) and perceptual (VMAF) metrics, with notable gains in preserving detailed textures within salient regions. Our results show that semantic-guided, codec-aware pre-processing is an effective approach for enhancing compressed video streams.
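The end-to-end training through a differentiable codec proxy can be pictured with a minimal sketch. The paper publishes no code; the straight-through estimator (STE) below is one common way to make a hard quantization step trainable, and every name and number in it is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a differentiable codec proxy via the straight-through
# estimator (STE). The forward pass applies hard quantization, as a real
# codec would; the backward pass treats quantization as identity so the
# gradient can reach the pre-processor. All names are illustrative.

def proxy_quantize(x, step=0.5):
    """Forward: hard quantization at the given step size."""
    return round(x / step) * step

def proxy_backward(upstream_grad):
    """Backward: STE passes the gradient through unchanged."""
    return upstream_grad

def train_step(pixel, target, weight, lr=0.1):
    """One toy gradient step through the proxy for a scalar 'model'."""
    enhanced = weight * pixel            # stand-in for the pre-processor
    decoded = proxy_quantize(enhanced)   # proxy replaces the real codec
    loss_grad = 2 * (decoded - target)   # d/d(decoded) of squared error
    grad_w = proxy_backward(loss_grad) * pixel  # STE skips quantization
    return weight - lr * grad_w
```

At inference the proxy branch is simply dropped and only the learned pre-processor runs, matching the standalone deployment the abstract describes.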
Related papers
- SCALED : Surrogate-gradient for Codec-Aware Learning of Downsampling in ABR Streaming [9.436544348188598]
Over-the-Top (OTT) delivery now predominantly relies on Adaptive Bitrate (ABR) streaming. Deep learning has spurred interest in jointly optimizing the ABR pipeline using learned resampling methods. We introduce a novel framework that enables end-to-end training with real, non-differentiable codecs.
arXiv Detail & Related papers (2026-01-30T10:38:35Z) - Content Adaptive based Motion Alignment Framework for Learned Video Compression [72.13599533975413]
This paper proposes a content-adaptive motion alignment framework. First, we introduce a two-stage flow-guided deformable warping mechanism that refines motion compensation with coarse-to-fine offset prediction and mask modulation. Second, we propose a multi-reference quality-aware strategy that adjusts distortion weights based on reference quality and applies it to hierarchical training to reduce error propagation. Third, we integrate a training-free module that downsamples frames by motion magnitude and resolution to obtain smooth motion estimation.
arXiv Detail & Related papers (2025-12-15T02:51:47Z) - Plug-and-Play Versatile Compressed Video Enhancement [57.62582951699999]
Video compression effectively reduces file sizes, making real-time cloud computing feasible. However, it comes at the cost of visual quality and challenges the robustness of downstream vision models. We present a versatile, compression-aware enhancement framework that adaptively enhances videos under different compression settings.
arXiv Detail & Related papers (2025-04-21T18:39:31Z) - Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. We propose a Compression Distortion Embedding (CDRE) framework, which extracts a machine-perception-related distortion representation and embeds it into downstream models. Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of execution time and number of parameters.
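As a rough illustration of the pattern, a distortion representation can be derived from the gap between a pristine and a compressed signal (available at training time) and concatenated onto a downstream model's input. This is a hedged sketch of the general idea only, not the CDRE implementation; every function name here is an assumption.

```python
# Toy sketch of embedding a compression-distortion cue into a downstream
# model's input: compute a distortion descriptor from the original vs.
# compressed gap, then concatenate it with the task features. The names
# and the residual-based descriptor are illustrative assumptions.

def distortion_descriptor(original, compressed):
    """Per-element residual as a crude distortion representation."""
    return [o - c for o, c in zip(original, compressed)]

def embed_into_features(task_features, descriptor):
    """Concatenate the distortion cue onto the downstream features."""
    return task_features + descriptor
```

In a deployed system the descriptor would have to come from the compressed stream itself rather than a pristine reference; the residual form above is only the simplest stand-in for training-time supervision.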
arXiv Detail & Related papers (2025-03-27T13:01:53Z) - High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) introduces superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
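The multiscale transform at the heart of hierarchical predictive coding can be pictured as a simple resolution pyramid. The 1-D average-pooling sketch below is an illustration of that decomposition only, not DHVC 2.0's actual transform; all names are assumptions.

```python
# Toy multiscale decomposition: repeatedly average-pool a 1-D "frame"
# by a factor of two, yielding the coarse-to-fine representations that
# hierarchical predictive coding predicts across. Illustrative only.

def downsample(signal):
    """Average-pool neighbouring pairs (factor-2 downsampling)."""
    return [(signal[i] + signal[i + 1]) / 2
            for i in range(0, len(signal) - 1, 2)]

def pyramid(signal, levels=3):
    """Return the signal at progressively coarser scales."""
    scales = [list(signal)]
    for _ in range(levels - 1):
        scales.append(downsample(scales[-1]))
    return scales
```

Because coarser levels can be decoded first, a hierarchy like this is what makes progressive, packet-loss-tolerant decoding natural in such designs.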
arXiv Detail & Related papers (2024-10-03T15:40:58Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.