Fugu-MT 論文翻訳(概要): Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling

論文の概要: Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling

arxiv url: http://arxiv.org/abs/2206.13085v1
Date: Mon, 27 Jun 2022 07:10:22 GMT
ステータス: 翻訳完了
システム内更新日: 2022-06-29 02:12:55.726056
Title: Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling
Title（参考訳）: 音響モデルファクトリー:生成音声モデリングのための統合システムアーキテクチャ
Authors: Lonce Wyse, Purnima Kamath, Chitralekha Gupta
Abstract要約: 2つの異なるニューラルネットワークアーキテクチャを中心に構築されたデータ駆動型音響モデル設計のための新しいシステムを提案する。本システムの目的は、(a)モデルが合成できるべき音の範囲と、(b)その音の空間をナビゲートするためのパラメトリック制御の仕様を与えられた、インタラクティブに制御可能な音モデルを生成することである。
参考スコア（独自算出の注目度）: 4.193940401637568
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a new system for data-driven audio sound model design built around two different neural network architectures, a Generative Adversarial Network(GAN) and a Recurrent Neural Network (RNN), that takes advantage of the unique characteristics of each to achieve the system objectives that neither is capable of addressing alone. The objective of the system is to generate interactively controllable sound models given (a) a range of sounds the model should be able to synthesize, and (b) a specification of the parametric controls for navigating that space of sounds. The range of sounds is defined by a dataset provided by the designer, while the means of navigation is defined by a combination of data labels and the selection of a sub-manifold from the latent space learned by the GAN. Our proposed system takes advantage of the rich latent space of a GAN that consists of sounds that fill out the spaces ''between" real data-like sounds. This augmented data from the GAN is then used to train an RNN for its ability to respond immediately and continuously to parameter changes and to generate audio over unlimited periods of time. Furthermore, we develop a self-organizing map technique for ``smoothing" the latent space of GAN that results in perceptually smooth interpolation between audio timbres. We validate this process through user studies. The system contributes advances to the state of the art for generative sound model design that include system configuration and components for improving interpolation and the expansion of audio modeling capabilities beyond musical pitch and percussive instrument sounds into the more complex space of audio textures.
Abstract（参考訳）: 本稿では,GAN(Generative Adversarial Network)とRNN(Recurrent Neural Network)という,2つの異なるニューラルネットワークアーキテクチャを中心に構築されたデータ駆動型音声モデル設計システムを紹介する。システムの目的は,対話的に制御可能な音モデルを生成することである (a)モデルが合成できるべき音の範囲、及び (b)その音の空間をナビゲートするためのパラメトリック制御の仕様音の範囲は設計者が提供するデータセットによって定義され、ナビゲーションの手段はganによって学習された潜在空間からデータラベルとサブマニフォールドの選択の組み合わせによって定義される。 Our proposed system takes advantage of the rich latent space of a GAN that consists of sounds that fill out the spaces ''between" real data-like sounds. This augmented data from the GAN is then used to train an RNN for its ability to respond immediately and continuously to parameter changes and to generate audio over unlimited periods of time. Furthermore, we develop a self-organizing map technique for ``smoothing" the latent space of GAN that results in perceptually smooth interpolation between audio timbres. このプロセスはユーザスタディを通じて検証する。このシステムは、補間を改善するためのシステム構成とコンポーネントを含む生成音響モデル設計のための技術の発展に寄与し、音のピッチや打楽器の音以外の音響モデリング能力をより複雑な音質空間に拡張する。

関連論文リスト

DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis [4.834986020597738]
変形性周期ネットワークに基づくGAN(DPN-GAN)を提案する。 DPN-GANは、カーネルベースの周期的ReLUアクティベーション機能を導入し、オーディオ生成の周期的バイアスを誘導する。 DPN-GAN小パラメータ (38.67Mパラメータ) とDPN-GAN大パラメータ (124Mパラメータ) の2種類のモデルを訓練した。
論文参考訳（メタデータ） (2025-05-14T02:52:16Z)
Designing Neural Synthesizers for Low-Latency Interaction [8.27756937768806]
対話型ニューラルオーディオ合成(NAS)モデルで典型的に見られる遅延源とジッタについて検討する。次に、この解析を畳み込み変分オートエンコーダであるRAVEを用いて音色伝達のタスクに適用する。これは、私たちがBRAVEと呼ぶ低レイテンシで、ピッチと大音量の再現性が向上したモデルで終わる。
論文参考訳（メタデータ） (2025-03-14T16:30:31Z)
ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model [2.2927722373373247]
ImmerseDiffusionは音の空間的・時間的・環境的条件を条件とした3次元没入型音像を生成する。
論文参考訳（メタデータ） (2024-10-19T02:28:53Z)
AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
ビュー音響合成は、音源が3Dシーンで出力するモノのオーディオを考慮し、任意の視点でオーディオを描画することを目的としている。既存の手法では、音声合成の条件として視覚的手がかりを利用するため、NeRFベースの暗黙モデルが提案されている。本研究では,シーン環境全体を特徴付ける新しいオーディオ・ビジュアル・ガウス・スプレイティング(AV-GS)モデルを提案する。 AV-GSが実世界のRWASやシミュレーションベースのSoundSpacesデータセットの既存の代替品よりも優れていることを検証する。
論文参考訳（メタデータ） (2024-06-13T08:34:12Z)
DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoogはモジュラーシンセサイザーで、一般に商用機器で見られるモジュールの集合を包含する。差別化が可能であるため、ニューラルネットワークとの統合が可能になり、自動サウンドマッチングが可能になる。我々はDiffMoogとエンドツーエンドのサウンドマッチングフレームワークを組み合わせたオープンソースのプラットフォームを紹介した。
論文参考訳（メタデータ） (2024-01-23T08:59:21Z)
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model [35.171785986428425]
雑音環境下での音声・視覚音声分離を行うために,AVLIT(Audio-Visual Lightweight ITerative Model)を提案する。我々のアーキテクチャは、オーディオブランチとビデオブランチで構成されており、各モードの重みを共有する反復的なA-FRCNNブロックがある。実験は、様々な音声のみのベースラインと音声視覚のベースラインに対して、両方の設定において、我々のモデルが優れていることを示す。
論文参考訳（メタデータ） (2023-05-31T20:09:50Z)
A General Framework for Learning Procedural Audio Models of Environmental Sounds [7.478290484139404]
本稿では,手続き型自動エンコーダ(ProVE)フレームワークについて,手続き型オーディオPAモデルを学習するための一般的なアプローチとして紹介する。本稿では, ProVE モデルが従来の PA モデルと敵対的アプローチの両方を音響忠実度で上回ることを示す。
論文参考訳（メタデータ） (2023-03-04T12:12:26Z)
Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
本稿では,完全自動エンドツーエンド音声検出手法を提案する。まず、wav2vec事前学習モデルを用いて、音声の高レベル表現を得る。ネットワーク構造には, Light-DARTS という異種アーキテクチャサーチ (DARTS) の修正版を用いる。
論文参考訳（メタデータ） (2022-08-20T06:46:55Z)
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis [129.86743102915986]
我々は、音声を共通部分へ分解することで、異なる視点から合成プロセスを定式化する。拡散モデルを備えた新しい2段階フレームワークであるBinauralGradを提案する。実験結果から,BinauralGradは対象評価指標と対象評価指標の両方において,既存のベースラインよりも高い性能を示した。
論文参考訳（メタデータ） (2022-05-30T02:09:26Z)
A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
視覚情報を利用した小型音声覚醒単語スポッティング(WWS)システムの設計について検討する。繰り返し微調整方式(LTH-IF)で抽選券仮説を通したニューラルネットワークプルーニング戦略を導入する。提案システムでは,ノイズ条件の異なる単一モード(オーディオのみ,ビデオのみ)システムに対して,大幅な性能向上を実現している。
論文参考訳（メタデータ） (2022-02-17T08:26:25Z)
MTCRNN: A multi-scale RNN for directed audio texture synthesis [0.0]
本稿では,異なる抽象レベルで訓練された繰り返しニューラルネットワークと,ユーザ指向の合成を可能にする条件付け戦略を組み合わせたテクスチャのモデリング手法を提案する。モデルの性能を様々なデータセットで実証し、その性能を様々なメトリクスで検証し、潜在的なアプリケーションについて議論する。
論文参考訳（メタデータ） (2020-11-25T09:13:53Z)
Deep generative models for musical audio synthesis [0.0]
音響モデリングは、パラメトリック制御の下で音を生成するアルゴリズムを開発するプロセスである。音声合成のための最近の生成的深層学習システムは、任意の音空間を横切ることができるモデルを学習することができる。本稿では,音響モデリングの実践を変える深層学習の展開を概観する。
論文参考訳（メタデータ） (2020-06-10T04:02:42Z)
VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
本稿では,条件付き変分オートエンコーダ(CVAE)を用いた変分パラメトリックシンセサイザVaPar Synthを提案する。提案するモデルの性能は,ピッチを柔軟に制御した楽器音の再構成と生成によって実証する。
論文参考訳（メタデータ） (2020-03-30T16:05:47Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。