Fugu-MT Paper Translation (Summary): Local Information Matters: A Rethink of Crowd Counting

Paper summary: Local Information Matters: A Rethink of Crowd Counting

arxiv url: http://arxiv.org/abs/2508.16970v1
Date: Sat, 23 Aug 2025 09:45:19 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.274569
Title: Local Information Matters: A Rethink of Crowd Counting
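Title (reference translation): Local Information: Rethinking How Crowds Are Counted
Authors: Tianhang Pan, Xiuyi Jia
Abstract summary: The motivation of this paper is to rethink an essential characteristic of crowd counting: individuals occupy only a very small portion of the image. From this, the authors propose a new model design principle for crowd counting that emphasizes the model's local modeling capability. Following this principle, they design a crowd counting model named Local Information Matters Model (LIMM).
Reference score (independently computed attention score): 16.700460568894012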
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The motivation of this paper originates from rethinking an essential characteristic of crowd counting: individuals (heads of humans) in the crowd counting task typically occupy a very small portion of the image. This characteristic has never been the focus of existing works: they typically use the same backbone as other visual tasks and pursue a large receptive field. This drives us to propose a new model design principle of crowd counting: emphasizing local modeling capability of the model. We follow the principle and design a crowd counting model named Local Information Matters Model (LIMM). The main innovation lies in two strategies: a window partitioning design that applies grid windows to the model input, and a window-wise contrastive learning design to enhance the model's ability to distinguish between local density levels. Moreover, a global attention module is applied to the end of the model to handle the occasionally occurring large-sized individuals. Extensive experiments on multiple public datasets illustrate that the proposed model shows a significant improvement in local modeling capability (8.7% in MAE on the JHU-Crowd++ high-density subset for example), without compromising its ability to count large-sized ones, which achieves state-of-the-art performance. Code is available at: https://github.com/tianhangpan/LIMM.
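Abstract (reference translation): The motivation of this paper is to rethink an essential characteristic of crowd counting: individuals (human heads) in the crowd counting task occupy only a very small portion of the image. Existing works have not focused on this characteristic; they typically use the same backbone as other vision tasks and pursue a large receptive field. This leads us to propose a new model design principle for crowd counting that emphasizes the model's local modeling capability. Following this principle, we design a crowd counting model named Local Information Matters Model (LIMM). The main innovations are a window partitioning design that applies grid windows to the model input, and a window-wise contrastive learning design that strengthens the model's ability to distinguish between local density levels. In addition, a global attention module is applied at the end of the model to handle the occasionally occurring large-sized individuals. Extensive experiments on multiple public datasets show that the proposed model significantly improves local modeling capability (for example, 8.7% in MAE on the JHU-Crowd++ high-density subset) without compromising its ability to count large-sized individuals, achieving state-of-the-art performance. Code is available at https://github.com/tianhangpan/LIMM.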
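
The window partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the LIMM implementation (see the authors' repository for that); it only shows, under stated assumptions, how a 2D map can be split into non-overlapping grid windows and how per-window counts add up to the global count. The names `partition_windows` and `window_counts`, the toy density map, and the requirement that the map size be divisible by the window size are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def partition_windows(image: Grid, window: int) -> List[Grid]:
    """Split a 2D grid into non-overlapping window x window tiles (row-major order).

    Assumption for this sketch: height and width are divisible by `window`.
    """
    height, width = len(image), len(image[0])
    if height % window or width % window:
        raise ValueError("image size must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Take `window` rows, then slice `window` columns out of each row.
            tiles.append([row[left:left + window] for row in image[top:top + window]])
    return tiles


def window_counts(density: Grid, window: int) -> List[float]:
    """Sum each tile of a density map, giving one local count per window."""
    return [sum(map(sum, tile)) for tile in partition_windows(density, window)]


# Toy 4x4 density map (hypothetical data): each cell holds the number of heads there.
density = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 1, 0],
]
local = window_counts(density, window=2)
print(local)       # one count per 2x2 window, in row-major order
print(sum(local))  # the local counts sum to the count over the whole map
```

Because the windows do not overlap and together cover the whole map, the local counts always sum to the total. Processing each window on its own is one plausible way for a model to concentrate on small, local structures before any global aggregation.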

Related papers