Fugu-MT paper translation (abstract): CoarseSoundNet: Building a reliable model for ecological soundscape analysis

Paper abstract: CoarseSoundNet: Building a reliable model for ecological soundscape analysis

arxiv url: http://arxiv.org/abs/2605.21143v1
Date: Wed, 20 May 2026 13:18:11 GMT
Status: translation complete
Last updated in system: 2026-05-21 19:19:56.690141
Title: CoarseSoundNet: Building a reliable model for ecological soundscape analysis
Authors: Alexander Gebhard, Andreas Triantafyllopoulos, Dominik Arend, Sandra Müller, Svenja Schmidt, Michael Scherer-Lorenzen, Björn W. Schuller
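Abstract summary: A soundscape consists of three types of sound: biophony (animal sounds), geophony (natural abiotic sounds), and anthropophony (human sounds).
Reference score (independently computed attention measure): 73.44688723989053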
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: A soundscape is composed of three types of sound: biophony (sounds made by animals), geophony (natural abiotic sounds) and anthropophony (sounds made by humans). A key research question in the field of soundscape ecology is how these components interact with each other, specifically how biophony responds to geophony and anthropophony. Nevertheless, as of today, there are not many analytical instruments that enable the distinct quantification of these elements. Recent machine learning (ML) approaches aim to support automated analysis but often rely on task-specific or clean data, limiting generalisation to noisy passive acoustic monitoring (PAM) recordings. This study presents a clear and reproducible structure to build ML models for coarse soundscape classification and introduces CoarseSoundNet, a deep learning model trained to distinguish biophony, geophony, and anthropophony under realistic PAM conditions. We systematically investigate model architectures, the influence of an additional training class, data composition, and evaluation strategies. Our findings suggest that model performance improves with additional PAM data, especially when similar to the target domain, and by introducing an explicit silence class during training. Class-specific decision thresholds and duration-based constraints further enhance performance, particularly for anthropophony and geophony. Error analyses exhibit challenges for anthropophony due to masking effects and confusions for silence and insect sounds for geophony and biophony. Finally, we conduct an ecological case study which shows that pre-filtering recordings with CoarseSoundNet yields acoustic index trends comparable to ground-truth filtering, supporting its use as an effective preprocessing tool for ecoacoustic analyses.
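The abstract mentions class-specific decision thresholds and duration-based constraints as post-processing steps but gives no details. The following is a minimal sketch of how such a step could look, not the authors' implementation. It assumes frame-level, multi-label probabilities for the three classes; the threshold values, the minimum run lengths, and the names `THRESHOLDS`, `MIN_FRAMES`, `drop_short_runs`, and `postprocess` are all hypothetical.

```python
import numpy as np

CLASSES = ("biophony", "geophony", "anthropophony")

# Hypothetical values for illustration only; not taken from the paper.
THRESHOLDS = {"biophony": 0.5, "geophony": 0.6, "anthropophony": 0.7}
MIN_FRAMES = {"biophony": 1, "geophony": 3, "anthropophony": 2}


def drop_short_runs(active: np.ndarray, min_frames: int) -> np.ndarray:
    """Clear runs of consecutive True values shorter than min_frames."""
    out = active.copy()
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], active, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # exclusive end index of each run
    for start, end in zip(starts, ends):
        if end - start < min_frames:
            out[start:end] = False
    return out


def postprocess(probs: np.ndarray) -> dict:
    """Apply per-class thresholds, then the per-class duration constraint.

    probs: array of shape (n_frames, 3), columns ordered as in CLASSES.
    Returns a dict mapping class name to a boolean array of length n_frames.
    """
    decisions = {}
    for col, name in enumerate(CLASSES):
        active = probs[:, col] >= THRESHOLDS[name]
        decisions[name] = drop_short_runs(active, MIN_FRAMES[name])
    return decisions
```

With these assumed settings, a geophony detection lasting only two frames would be discarded, while a biophony detection of a single frame would be kept. In practice the thresholds and durations would be tuned per class on validation data.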

Related papers