Fugu-MT Paper Translation (Abstract): Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing

Paper overview: Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing

arxiv url: http://arxiv.org/abs/2604.24947v1
Date: Mon, 27 Apr 2026 19:42:49 GMT
Status: Translation complete
Last updated in system: 2026-04-29 16:49:17.574563
Title: Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing
Authors: Cheng-Han Lee, Maniratnam Mandal, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
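Abstract summary: We introduce the LIVE-YouTube Video Cropping (LIVE-YT VC) database, featuring 1800 videos annotated by 90 human subjects. Using videos sourced from the YouTube-UGC and LSVQ databases, this new resource is the largest publicly available subjective video portrait region cropping database. We also introduce a post-processed version of the database, called LIVE-YT VC++, in which a novel intra-frame temporal filter is used to smooth the subjective annotations within each video.
Reference score (independently computed attention score): 32.890796703337095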
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the rise of mobile video consumption on diverse handheld display resolutions and orientation modes, altering videos to fit different aspect ratios poses challenges. Static cropping and border padding often compromise visual quality, while warping may distort a video's intended meaning. Here we advocate for a more effective approach: cropping significant regions within video frames in a temporal manner, while minimizing distortion and preserving essential content. One barrier to solving this problem is the lack of a sufficiently large-scale database devoted to informing these tasks. Towards filling this gap, we introduce the LIVE-YouTube Video Cropping (LIVE-YT VC) database, featuring 1800 videos annotated by 90 human subjects. Using videos sourced from the YouTube-UGC and LSVQ databases, this new resource is the largest publicly available subjective video portrait region cropping database. We also introduce a post-processed version of the database, called LIVE-YT VC++, whereby a novel intra-frame temporal filter was deployed to smooth subjective annotations within each video. We demonstrate the usefulness of this new data resource using the SmartVidCrop algorithm and state-of-the-art video grounding models, in hopes of establishing our subjective dataset as a benchmark for future research. Our contributions offer a resource for advancing video aspect ratio transformation models towards ensuring that reshaped mobile-friendly video content retains its quality and meaning. Since our labels bear resemblances to video saliency annotations, we also conducted an additional analysis to explore the similarity between our labels and video saliency predictions. Finally, we repurposed state-of-the-art video grounding models for aspect ratio change tasks, and fine-tuned them on our dataset. As a service to the research community, we plan to open source the project.
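The abstract does not specify how the temporal filter behind LIVE-YT VC++ works, so the following is only a generic illustration of the idea of smoothing per-frame crop annotations and turning them into portrait windows. It assumes each frame is annotated with a horizontal crop-center position in pixels, substitutes a plain centered moving average for the paper's filter, and assumes a full-height 9:16 target window. The function names `smooth_centers` and `portrait_crop` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def smooth_centers(centers: List[float], window: int = 5) -> List[float]:
    """Centered moving average; the window shrinks at the clip boundaries.

    Assumption: this stands in for the paper's temporal filter, whose
    exact form is not given in the abstract.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(centers)):
        lo = max(0, i - half)
        hi = min(len(centers), i + half + 1)
        smoothed.append(sum(centers[lo:hi]) / (hi - lo))
    return smoothed


def portrait_crop(center_x: float, frame_w: int, frame_h: int,
                  aspect_w: int = 9, aspect_h: int = 16) -> Tuple[int, int]:
    """Return (left, right) pixel columns of a full-height portrait crop."""
    # Width of a full-height window with the requested aspect ratio,
    # never wider than the frame itself.
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)
    left = int(center_x - crop_w / 2)
    # Clamp so the window stays inside the landscape frame.
    left = max(0, min(left, frame_w - crop_w))
    return left, left + crop_w


if __name__ == "__main__":
    # Hypothetical annotations with a one-frame jump at index 2.
    raw = [100.0, 100.0, 400.0, 100.0, 100.0]
    for cx in smooth_centers(raw, window=3):
        print(portrait_crop(cx, frame_w=1920, frame_h=1080))
```

Averaging over neighboring frames spreads the one-frame jump across its neighbors, which reduces visible jitter in the resulting crop window at the cost of some lag when the subject genuinely moves quickly.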
