Fugu-MT: Translations of arXiv Papers

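This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.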

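The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.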

The papers listed below have a publication date of 2022-02-17.

Title	Authors	Abstract	Publication date / Translation date
# Completing bases in four dimensions ( http://arxiv.org/abs/2010.09506v4 ) License: see linked page	Hans Havlicek and Karl Svozil	Criteria and constructive methods for the completion of an incomplete basis of, or context in, a four-dimensional Hilbert space by (in)decomposable vectors are given.	Translated: 2023-04-28 21:53:35 Published: 2022-02-17
# Heat current and entropy production rate in local non-Markovian quantum dynamics of global Markovian evolution ( http://arxiv.org/abs/2102.06694v2 ) License: see linked page	Ahana Ghoshal and Ujjwal Sen	We examine the elements of the balance equation of entropy in open quantum evolutions and their response as we go from a Markovian to a non-Markovian situation. In particular, we look at the heat current and entropy production rate in the non-Markovian reduced evolution, as well as a Markovian limit of the same, experienced by one of two interacting systems immersed in a Markovian bath. The analysis naturally leads us to define a heat current deficit and an entropy production rate deficit, which are differences between the global and local versions of the corresponding quantities. The investigation leads, in certain cases, to a complementarity of the time-integrated heat current deficit and the relative entropy of entanglement between the two systems.	Translated: 2023-04-11 07:58:10 Published: 2022-02-17
# Playing quantum nonlocal games with six noisy qubits on the cloud ( http://arxiv.org/abs/2105.05266v3 ) License: see linked page	Meron Sheffer, Daniel Azses, Emanuele G. Dalla Torre	Nonlocal games are extensions of Bell inequalities, aimed at demonstrating quantum advantage. These games are well suited for noisy quantum computers because they only require the preparation of a shallow circuit, followed by the measurement of non-commuting observable. Here, we consider the minimal implementation of the nonlocal game proposed in Science 362, 308 (2018). We test this game by preparing a 6-qubit cluster state using quantum computers on the cloud by IBM, Ionq, and Honeywell. Our approach includes several levels of optimization, such as circuit identities and error mitigation and allows us to cross the classical threshold and demonstrate quantum advantage in one quantum computer. We conclude by introducing a different inequality that allows us to observe quantum advantage in less accurate quantum computers, at the expense of probing a larger number of circuits.	Translated: 2023-03-31 20:47:33 Published: 2022-02-17
# First Responders Got Wings: UAVs to the Rescue of Localization Operations in Beyond 5G Systems ( http://arxiv.org/abs/2109.03180v3 ) License: see linked page	Antonio Albanese, Vincenzo Sciancalepore, Xavier Costa-Pérez	Natural and human-made disasters have dramatically increased during the last decades. Given the strong relationship between first responders localization time and the final number of deaths, the modernization of search-and-rescue operations has become imperative. In this context, Unmanned Aerial Vehicles (UAVs)-based solutions are the most promising candidates to take up on the localization challenge by leveraging on emerging technologies such as: Artificial Intelligence (AI), Reconfigurable Intelligent Surfaces (RIS) and Orthogonal Time Frequency Space (OTFS) modulations. In this paper, we capitalize on such recently available techniques by shedding light on the main challenges and future opportunities to boost the localization performance of state-of-the-art techniques to give birth to unprecedentedly effective missing victims localization solutions.	Translated: 2023-03-15 22:45:41 Published: 2022-02-17
# Variational Quantum-Based Simulation of Waveguide Modes ( http://arxiv.org/abs/2109.12279v2 ) License: see linked page	Wei-Bin Ewe, Dax Enshan Koh, Siong Thye Goh, Hong-Son Chu, Ching Eng Png	Variational quantum algorithms are one of the most promising methods that can be implemented on noisy intermediate-scale quantum (NISQ) machines to achieve a quantum advantage over classical computers. This article describes the use of a variational quantum algorithm in conjunction with the finite difference method for the calculation of propagation modes of an electromagnetic wave in a hollow metallic waveguide. The two-dimensional (2D) waveguide problem, described by the Helmholtz equation, is approximated by a system of linear equations, whose solutions are expressed in terms of simple quantum expectation values that can be evaluated efficiently on quantum hardware. Numerical examples are presented to validate the proposed method for solving 2D waveguide problems.	Translated: 2023-03-13 19:06:34 Published: 2022-02-17
# Interferometric control of nanorotor alignment ( http://arxiv.org/abs/2110.01301v2 ) License: see linked page	Birthe Schrinski, Benjamin A. Stickler, Klaus Hornberger	The intrinsically non-linear rotation dynamics of rigid bodies offer unprecedented ways to exploit their quantum motion. In this Letter we devise a rotational analog of Mach-Zehnder interferometry, which allows steering symmetric rotors from fully aligned to completely antialigned. The scheme uses a superposition of four distinct orientations, emerging at the eighth of the quantum revival time, whose interference can be controlled by a weak laser pulse. We develop a semiclassical model of the effect and demonstrate that it persists even in presence of imperfections and decoherence.	Translated: 2023-03-12 14:18:58 Published: 2022-02-17
# Transport and spectral properties of the XX$+$XXZ diode and stability to dephasing ( http://arxiv.org/abs/2110.15564v3 ) License: see linked page	Kang Hao Lee, Vinitha Balachandran, Chu Guo and Dario Poletti	We study the transport and spectral property of a segmented diode formed by an XX $+$ XXZ spin chain. This system has been shown to become an ideal rectifier for spin current for large enough anisotropy. Here we show numerical evidence that the system in reverse bias has three different transport regimes depending on the value of the anisotropy: ballistic, diffusive and insulating. In forward bias we encounter two regimes, ballistic and diffusive. The system in forward and reverse bias shows significantly different spectral properties, with distribution of rapidities converging towards different functions. In the presence of dephasing the system becomes diffusive, rectification is significantly reduced, the relaxation gap increases and the spectral properties in forward and reverse bias tend to converge. For large dephasing the relaxation gap decreases again as a result of Quantum Zeno physics.	Translated: 2023-03-09 22:57:19 Published: 2022-02-17
# Comparing Two-Qubit and Multi-Qubit Gates within the Toric Code ( http://arxiv.org/abs/2111.04047v2 ) License: see linked page	David Schwerdt, Yotam Shapira, Tom Manovitz, and Roee Ozeri	In some quantum computing (QC) architectures, entanglement of an arbitrary number of qubits can be generated in a single operation. This property has many potential applications, and may specifically be useful for quantum error correction (QEC). Stabilizer measurements can then be implemented using a single multi-qubit gate instead of several two-qubit gates, thus reducing circuit depth. In this study, the toric code is used as a benchmark to compare the performance of two-qubit and five-qubit gates within parity-check circuits. We consider trapped ion qubits that are controlled via Raman transitions, where the primary source of error is assumed to be spontaneous photon scattering. We show that a five-qubit Mølmer-Sørensen gate offers an approximately 40% improvement over two-qubit gates in terms of the fault tolerance threshold. This result indicates an advantage of using multi-qubit gates in the context of QEC.	Translated: 2023-03-08 22:33:06 Published: 2022-02-17
# Dimerization induced mobility edges and multiple reentrant localization transitions in non-Hermitian quasicrystals ( http://arxiv.org/abs/2111.08427v3 ) License: see linked page	Wenqian Han and Longwen Zhou	Non-Hermitian effects could create rich dynamical and topological phase structures. In this work, we show that the collaboration between lattice dimerization and non-Hermiticity could generally bring about mobility edges and multiple localization transitions in one-dimensional quasicrystals. Non-Hermitian extensions of the Aubry-André-Harper (AAH) model with staggered onsite potential and dimerized hopping amplitudes are introduced to demonstrate our results. Reentrant localization transitions due to the interplay between quasiperiodic gain/loss and lattice dimerization are found. Quantized winding numbers are further adopted as topological invariants to characterize transitions among phases with distinct spectrum and transport nature. Our study thus enriches the family of non-Hermitian quasicrystals by incorporating effects of lattice dimerization, and offering a convenient way to modulate localization transitions and mobility edges in non-Hermitian systems.	Translated: 2023-03-08 00:00:59 Published: 2022-02-17
# Shape-Dependent Multi-Weight Magnetic Artificial Synapses for Neuromorphic Computing ( http://arxiv.org/abs/2111.11516v2 ) License: see linked page	Thomas Leonard, Samuel Liu, Mahshid Alamdar, Can Cui, Otitoaleke G. Akinola, Lin Xue, T. Patrick Xiao, Joseph S. Friedman, Matthew J. Marinella, Christopher H. Bennett and Jean Anne C. Incorvia	In neuromorphic computing, artificial synapses provide a multi-weight conductance state that is set based on inputs from neurons, analogous to the brain. Additional properties of the synapse beyond multiple weights can be needed, and can depend on the application, requiring the need for generating different synapse behaviors from the same materials. Here, we measure artificial synapses based on magnetic materials that use a magnetic tunnel junction and a magnetic domain wall. By fabricating lithographic notches in a domain wall track underneath a single magnetic tunnel junction, we achieve 4-5 stable resistance states that can be repeatably controlled electrically using spin orbit torque. We analyze the effect of geometry on the synapse behavior, showing that a trapezoidal device has asymmetric weight updates with high controllability, while a straight device has higher stochasticity, but with stable resistance levels. The device data is input into neuromorphic computing simulators to show the usefulness of application-specific synaptic functions. Implementing an artificial neural network applied on streamed Fashion-MNIST data, we show that the trapezoidal magnetic synapse can be used as a metaplastic function for efficient online learning. Implementing a convolutional neural network for CIFAR-100 image recognition, we show that the straight magnetic synapse achieves near-ideal inference accuracy, due to the stability of its resistance levels. This work shows multi-weight magnetic synapses are a feasible technology for neuromorphic computing and provides design guidelines for emerging artificial synapse technologies.	Translated: 2023-03-07 04:14:44 Published: 2022-02-17
# Peaceful coexistence of thermal equilibrium and the emergence of time ( http://arxiv.org/abs/2112.04057v3 ) License: see linked page	Tommaso Favalli and Augusto Smerzi	We consider a quantum Universe composed by a small system S and a large environment. It has been demonstrated that, for the vast majority of randomly chosen wave-functions of the Universe satisfying a total energy constraint, the reduced density matrix of the system S is given by the canonical statistical distribution. Here, through the Page and Wootters mechanism, we show that time and non-equilibrium dynamics can emerge as a consequence of the entanglement between the system and the environment present in the (randomly chosen) global wave-function of the Universe. The clock is provided by the environment, which ticks the temporal evolution of S. The paradox of the peaceful coexistence of statistical equilibrium and non-equilibrium dynamics is solved by identifying the trace over the environment degrees of freedom with the temporal trace over the entire history of the system S.	Translated: 2023-03-05 03:19:55 Published: 2022-02-17
# Advancing Deep Residual Learning by Solving the Crux of Degradation in Spiking Neural Networks ( http://arxiv.org/abs/2201.07209v2 ) License: see linked page	Yifan Hu, Yujie Wu, Lei Deng, Guoqi Li	Despite the rapid progress of neuromorphic computing, the inadequate depth and the resulting insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but rarely did previous work assess their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. This negligence leads to impeded information flow and the accompanying degradation problem. In this paper, we identify the crux and then propose a novel residual block for SNNs, which is able to significantly extend the depth of directly trained SNNs, e.g., up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any slight degradation problem. We validate the effectiveness of our methods on both frame-based and neuromorphic datasets, and our SRM-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, the first time in the domain of directly trained SNNs. The great energy efficiency is estimated and the resulting networks need on average only one spike per neuron for classifying an input sample. We believe our powerful and scalable modeling will provide a strong support for further exploration of SNNs.	Translated: 2023-03-05 00:41:21 Published: 2022-02-17
# Probing the generalized uncertainty principle through quantum noises in optomechanical systems ( http://arxiv.org/abs/2112.13682v2 ) License: see linked page	Soham Sen, Sukanta Bhattacharyya, Sunandan Gangopadhyay	In this work we have considered a simple mechanical oscillator interacting with a single mode optical field inside a cavity in the generalized uncertainty principle framework (GUP). Our aim is to calculate the modified noise spectrum and observe the effects of the GUP. The commutation relation that we have considered has an extra linear order momentum term along with a quadratic order term. Confronting our theoretical results with the observational results, we observe that we get a much tighter bound on the GUP parameters from the noise spectrum using the values of the system parameters from different experiments.	Translated: 2023-03-03 17:30:20 Published: 2022-02-17
# Hot Brownian motion of optically levitated nanodiamonds ( http://arxiv.org/abs/2201.00170v2 ) License: see linked page	François Rivière, Timothée de Guillebon, Damien Raynal, Martin Schmidt, Jean-Sébastien Lauret, Jean-François Roch, Loïc Rondin	The Brownian motion of a particle hotter than its environment is an iconic out-of-equilibrium system. Its study provides valuable insights into nanoscale thermal effects. Notably, it supplies an excellent diagnosis of thermal effects in optically levitated particles, a promising platform for force sensing and quantum physics tests. Thus, understanding the relevant parameters in this effect is critical. In this context, we test the role of particles' shape and material, using optically levitated nanodiamonds hosting NV centers to measure the particles' internal temperature and center-of-mass dynamics. We present a model to assess the nanodiamond internal temperature from its dynamics, adaptable to other particles. We also demonstrate that other mechanisms affect the nanodiamond dynamics and its stability in the trap. Finally, our work, by showing levitating nanodiamonds as an excellent tool for studying nano-thermal effects, opens prospects for increasing the trapping stability of optically levitated particles.	Translated: 2023-03-02 17:24:17 Published: 2022-02-17
# Raman Imaging of Atoms Inside a High-bandwidth Cavity ( http://arxiv.org/abs/2202.05369v2 ) License: see linked page	Eduardo Uruñuela, Maximilian Ammenwerth, Pooja Malik, Lukas Ahlheit, Hannes Pfeifer, Wolfgang Alt, and Dieter Meschede	High-bandwidth, fiber-based optical cavities are a promising building block for future quantum networks. They are used to resonantly couple stationary qubits such as single or multiple atoms with photons routing quantum information into a fiber network at high rates. In high-bandwidth cavities, standard fluorescence imaging on the atom-cavity resonance line for controlling atom positions is impaired since the Purcell effect strongly suppresses all-directional fluorescence. Here, we restore imaging of $^{87}$Rb atoms strongly coupled to such a fiber Fabry-Pérot cavity by detecting the repumper fluorescence which is generated by continuous and three-dimensional Raman sideband cooling. We have carried out a detailed spectroscopic investigation of the repumper-induced differential light shifts affecting the Raman resonance, dependent on intensity and detuning. Our analysis identifies a compromise regime between imaging signal-to-noise ratio and survival rate, where physical insight into the role of dipole-force fluctuations in the heating dynamics of trapped atoms is gained.	Translated: 2023-02-26 04:31:00 Published: 2022-02-17
# Decoherence and Quantum Error Correction for Quantum Computing and Communications ( http://arxiv.org/abs/2202.08600v1 ) License: see linked page	Josu Etxezarreta Martinez	Quantum technologies have shown immeasurable potential to effectively solve several information processing tasks such as prime number factorization, unstructured database search or complex macromolecule simulation. As a result of such capability to solve certain problems that are not classically tractable, quantum machines have the potential revolutionize the modern world via applications such as drug design, process optimization, unbreakable communications or machine learning. However, quantum information is prone to suffer from errors caused by the so-called decoherence, which describes the loss in coherence of quantum states associated to their interactions with the surrounding environment. This decoherence phenomenon is present in every quantum information task, be it transmission, processing or even storage of quantum information. Consequently, the protection of quantum information via quantum error correction codes (QECC) is of paramount importance to construct fully operational quantum computers. Understanding environmental decoherence processes and the way they are modeled is fundamental in order to construct effective error correction methods capable of protecting quantum information. In this thesis, the nature of decoherence is studied and mathematically modelled; and QECCs are designed and optimized so that they exhibit better error correction capabilities.	Translated: 2023-02-25 12:59:46 Published: 2022-02-17
# Error Correction for Reliable Quantum Computing ( http://arxiv.org/abs/2202.08599v1 ) License: see linked page	Patricio Fuentes	Quantum computers herald the arrival of a new era in which previously intractable computational problems will be solved efficiently. However, quantum technology is held down by decoherence, a phenomenon that is omnipresent in the quantum paradigm and that renders quantum information useless when left unchecked. The science of quantum error correction, a discipline that seeks to combine and protect quantum information from the effects of decoherence using structures known as codes, has arisen to meet this challenge. Stabilizer codes, a particular subclass of quantum codes, have enabled fast progress in the field of quantum error correction by allowing parallels to be drawn with the widely studied field of classical error correction. This has resulted in the construction of the quantum counterparts of well-known capacity-approaching classical codes like sparse codes and quantum turbo codes. However, quantum codes obtained in this manner do not entirely evoke the stupendous error correcting abilities of their classical counterparts. This occurs because classical strategies ignore important differences between the quantum and classical paradigms, an issue that needs to be addressed if quantum error correction is to succeed in its battle with decoherence. In this dissertation we study a phenomenon exclusive to the quantum paradigm, known as degeneracy, and its effects on the performance of sparse quantum codes. Furthermore, we also analyze and present methods to improve the performance of a specific family of sparse quantum codes in various different scenarios.	Translated: 2023-02-25 12:59:25 Published: 2022-02-17
# Properties and challenges of hot-phonon physics in metals: MgB$_2$ and other compounds ( http://arxiv.org/abs/2202.08597v1 ) License: see linked page	Emmanuele Cappelluti, Fabio Caruso, Dino Novko	The ultrafast dynamics of electrons and collective modes in systems out of equilibrium is crucially governed by the energy transfer from electronic degrees of freedom, where the energy of the pump source is usually absorbed, to lattice degrees of freedom. In conventional metals such process leads to an overall heating of the lattice, usually described by an effective lattice temperature $T_{\rm ph}$, until final equilibrium with all the degrees of freedom is reached. In specific materials, however, few lattice modes provide a preferential channel for the energy transfer, leading to a non-thermal distribution of vibrations and to the onset of hot phonons, i.e., lattice modes with a much higher population than the other modes. Hot phonons are usually encountered in semiconductors or semimetal compounds, like graphene, where the preferential channel towards hot modes is dictated by the reduced electronic phase space. Following a different path, the possibility of obtaining hot-phonon physics also in metals has been however also recently prompted in literature, as a result of a strong anisotropy of the electron-phonon (el-ph) coupling. In the present paper, taking MgB$_2$ as a representative example, we review the physical conditions that allow a hot-phonon scenario in metals with anisotropic el-ph coupling, and we discuss the observable fingerprints of hot phonons. Novel perspectives towards the prediction and experimental observation of hot phonons in other metallic compounds are also discussed.	Translated: 2023-02-25 12:59:02 Published: 2022-02-17
# Improving the probabilistic quantum teleportation efficiency of arbitrary superposed coherent state using multipartite even and odd j-spin coherent states as resource ( http://arxiv.org/abs/2202.08591v1 ) License: see linked page	Meryem El Kirdi, Abdallah Slaoui, Hanane El Hadfi and Mohammed Daoud	Quantum teleportation is one of the most important techniques for quantum information secure transmission. Using preshared entanglement, quantum teleportation is designed as a basic key in many quantum information tasks and features prominently in quantum technologies, especially in quantum communication. In this work, we provide a new probabilistic teleportation protocol scheme for arbitrary superposed coherent states by employing the multipartite even and odd $j$-spin coherent states as the entangled resource connecting Alice (sender) and Bob (receiver). Here, Alice possesses both even and odd spin coherent states and makes repeated GHZ states measurements (GHZSMs) on the pair of spins, consisting of (1) the unknown spin state and (2) one of the two coherent spin states, taken alternately, until reaching a quantum teleportation with maximal average fidelity. We provide the relationship between the entanglement amount of the shared state, quantified by the concurrence, with the teleportation fidelity and the success probability of the teleported target state up to the $n^{\rm th}$ repeated attempt. In this scheme, we show that the perfect quantum teleportation can be done even with a non-maximally entangled state. Furthermore, this repeated GHZSMs attempt process significantly increases both the average fidelity of the teleported state and the probability of a successful run of the probabilistic protocol. Also on our results, we show that the j-spin number, the target state parameter and the overlap between coherent states provide important additional control parameters that can be adjusted to maximize the teleportation efficiency.	Translated: 2023-02-25 12:58:37 Published: 2022-02-17
# 準周期駆動1次元乱れ系の局在と非局在化特性 Localization and delocalization properties in quasi-periodically driven one-dimensional disordered system ( http://arxiv.org/abs/2202.08582v1 ) ライセンス: Link先を確認	Hiroaki S. Yamada and Kensuke S. Ikeda	(参考訳) $M$色の準周期調和振動により摂動した時間連続1次元アンダーソンモデルにおける量子拡散の局在と非局在を系統的に検討する。その一部は予備的なレター [PRE {\bf 103}, L040202 (2021)] で報告済みである。モデルの局在・非局在特性について, 乱れの強さ$W$, 摂動強度$\epsilon$, 空間次元と類似の役割を果たす色数$M$の3つのパラメータに関して詳細に検討した。特に,局在・非局在転移 (LDT) の存在とその臨界特性に注目している。 $M\geq 3$ では LDT が存在し、臨界強度$\epsilon$ を超えると通常の拡散が回復し、拡散ダイナミクスの特徴は、$M$ が大きくなくても、確率的に摂動されたアンダーソンモデルに対して予測される拡散過程を模倣する。これらの結果は、時間離散量子マップ、すなわちアンダーソン写像と標準写像の結果と比較される。さらに,静的な乱れ部分を持たない極限モデルと比較し,非局在化ダイナミクスの特徴について考察した。 Localization and delocalization of quantum diffusion in the time-continuous one-dimensional Anderson model perturbed by quasi-periodic harmonic oscillations of $M$ colors are investigated systematically; part of this work was reported in a preliminary letter [PRE {\bf 103}, L040202 (2021)]. We investigate in detail the localization-delocalization characteristics of the model with respect to three parameters: the disorder strength $W$, the perturbation strength $\epsilon$ and the number of colors $M$, which plays a role similar to that of the spatial dimension. In particular, attention is focused on the presence of a localization-delocalization transition (LDT) and its critical properties. For $M\geq 3$ the LDT exists and a normal diffusion is recovered above a critical strength $\epsilon$, and the characteristics of diffusion dynamics mimic the diffusion process predicted for the stochastically perturbed Anderson model even though $M$ is not large. These results are compared with the results of time-discrete quantum maps, i.e., Anderson map and the standard map. Further, the features of delocalized dynamics are discussed in comparison with a limit model which has no static disordered part.	翻訳日:2023-02-25 12:58:02 公開日:2022-02-17
# 量子論における部分的無知通信タスク Partial ignorance communication tasks in quantum theory ( http://arxiv.org/abs/2202.08581v1 ) ライセンス: Link先を確認	Oskari Kerppo	(参考訳) 本稿では,成功度基準が測定と準備で最大化される前に,準備と測定の双方が第三者から入力を受け取る部分的無知のコミュニケーションの一般化を提案する。 sdps、通信行列のための超弱モノトン、量子状態のフレーム理論など、成功指標の境界を得るために様々な方法が用いられている。新しい一般化された準備・測定設定における最も単純なシナリオは、単に部分的無知通信タスクと呼ばれ、ビットとキューディットに対して徹底的に分析される。最後に、新しい一般化された設定により、準備と測定に操作等価性を導入することができ、通信タスクの1つで量子論の文脈上の優位性を分析し観察することができる。 We introduce a generalization of communication of partial ignorance where both parties of a prepare-and-measure setup receive inputs from a third party before a success metric is maximized over the measurements and preparations. Various methods are used to obtain bounds on the success metrics, including SDPs, ultraweak monotones for communication matrices and frame theory for quantum states. Simplest scenarios in the new generalized prepare-and-measure setting, simply called partial ignorance communication tasks, are analysed exhaustively for bits and qudits. Finally, the new generalized setting allows the introduction of operational equivalences to the preparations and measurements, allowing us to analyse and observe a contextual advantage for quantum theory in one of the communication tasks.	翻訳日:2023-02-25 12:57:39 公開日:2022-02-17
# コールド分子またはリドバーグ原子の合成格子中の量子膜相 Quantum Membrane Phases in Synthetic Lattices of Cold Molecules or Rydberg Atoms ( http://arxiv.org/abs/2202.08540v1 ) ライセンス: Link先を確認	Chunhan Feng, Hannah Manetsch, Valery G. Rousseau, Kaden R. A. Hazzard and Richard Scalettar	(参考訳) 確率的グリーン関数量子モンテカルロ法を用いて、双極子相互作用を持つ超低温分子またはリドバーグ原子の性質を、半合成的な3次元構成（1つの合成次元と、2次元実空間光学格子または周期マイクロトラップアレイ）で計算する。熱力学的量と適切な相関関数の計算と、それらの有限サイズスケーリングによって、分子やリドバーグ原子の内部回転状態や電子状態からなる合成次元に2次元の「シート」が形成される低温相への2次転移が存在することを示す。実空間と合成空間の両方で隣接する原子や分子の間に作用する相互作用 $V$ の異なる値に対するシミュレーションにより、相図を計算することができる。十分に大きな$V$で有限温度転移が存在し、また量子相転移、すなわちそれ以下では転移温度が消滅する臨界値$V_c$が存在することがわかった。 We calculate properties of dipolar interacting ultracold molecules or Rydberg atoms in a semi-synthetic three-dimensional configuration -- one synthetic dimension plus a two-dimensional real space optical lattice or periodic microtrap array -- using the stochastic Green function Quantum Monte Carlo method. Through a calculation of thermodynamic quantities and appropriate correlation functions, along with their finite size scalings, we show that there is a second order transition to a low temperature phase in which two-dimensional `sheets' form in the synthetic dimension of internal rotational or electronic states of the molecules or Rydberg atoms, respectively. Simulations for different values of the interaction $V$, which acts between atoms or molecules that are adjacent both in real and synthetic space, allow us to compute a phase diagram. We find a finite-temperature transition at sufficiently large $V$, as well as a quantum phase transition -- a critical value $V_c$ below which the transition temperature vanishes.	翻訳日:2023-02-25 12:57:26 公開日:2022-02-17
# チュートリアル:マクロなQEDと真空力 Tutorial: Macroscopic QED and vacuum forces ( http://arxiv.org/abs/2202.08762v1 ) ライセンス: Link先を確認	S. A. R. Horsley	(参考訳) このチュートリアルでは、分散した散逸物質と相互作用する電磁場を表すハミルトニアンが見つかる、マクロなqedの理論を紹介している。 1次元理論をモチベーションとして用いて、より面倒な3次元理論を構築する。そして、ドップラー効果と電気および磁気応答の混合により物質反応が変化する移動体へのこの理論の拡張を考えると、量子電磁力の理論を無料で得ることが示されている。我々は、スライド板間の量子摩擦力に対するペンドリー式を再現するためにマクロQEDを適用して仕上げる。 This tutorial introduces the theory of macroscopic QED, where a Hamiltonian is found that represents the electromagnetic field interacting with a dispersive, dissipative material. Using a one dimensional theory as motivation, we build up the more cumbersome three dimensional theory. Then considering the extension of this theory to moving materials, where the material response changes due to both the Doppler effect and the mixing of electric and magnetic responses, it is shown that one gets the theory of quantum electromagnetic forces for free. We finish by applying macroscopic QED to reproduce Pendry's expression for the quantum friction force between sliding plates.	翻訳日:2023-02-25 12:50:16 公開日:2022-02-17
# 断熱熱機の動力に関する幾何学的境界 Geometric Bounds on the Power of Adiabatic Thermal Machines ( http://arxiv.org/abs/2202.08759v1 ) ライセンス: Link先を確認	Joshua Eglinton and Kay Brandner	(参考訳) 温度差の小さい2つの熱浴間での低速駆動型メソとマイクロスケールの冷凍機とヒートエンジンの性能解析を行った。一般的なスケーリング引数を用いて,浴槽間の熱リークが完全に抑制された場合に限って,カルノット限界に任意に近づくことができることを示す。その出力は、カルノー極限で二次的にゼロに崩壊する普遍幾何学的境界に従属する。この境界は、駆動プロトコルが適切に最適化され、浴槽間の温度差が駆動周波数でゼロとなる場合、準静的限界において漸近的に飽和する。これらの結果は、明確に定義された断熱応答状態と一般化されたオンサーガー対称性を持つ任意の熱力学的一貫した力学に対して一般的な条件で成り立つ。実例では, 冷却装置として動作するクビット冷凍機とコヒーレントチャージポンプのモデルについて検討する。 We analyze the performance of slowly driven meso- and micro-scale refrigerators and heat engines that operate between two thermal baths with small temperature difference. Using a general scaling argument, we show that such devices can work arbitrarily close to their Carnot limit only if heat-leaks between the baths are fully suppressed. Their power output is then subject to a universal geometric bound that decays quadratically to zero at the Carnot limit. This bound can be asymptotically saturated in the quasi-static limit if the driving protocols are suitably optimized and the temperature difference between the baths goes to zero with the driving frequency. These results hold under generic conditions for any thermodynamically consistent dynamics admitting a well-defined adiabatic-response regime and a generalized Onsager symmetry. For illustration, we work out models of a qubit-refrigerator and a coherent charge pump operating as a cooling device.	翻訳日:2023-02-25 12:50:06 公開日:2022-02-17
# 平衡・高非線形ブール関数の進化構成 Evolving Constructions for Balanced, Highly Nonlinear Boolean Functions ( http://arxiv.org/abs/2202.08743v1 ) ライセンス: Link先を確認	Claude Carlet, Marko Djurasevic, Domagoj Jakobovic, Luca Mariot, Stjepan Picek	(参考訳) バランスの取れた高非線形ブール関数を見つけることは、一般にどの非線形値に到達できるかがわからないという難しい問題である。同時に、進化的計算は特定のブール関数インスタンスの進化に成功しているが、より大きなブール関数サイズに対して容易にスケールできない。実際、より小さなブール関数の進化はほぼ自明であるが、より大きなサイズはますます難しくなり、進化的アルゴリズムは亜最適に機能する。本研究では,遺伝的プログラミング (gp) が高非線形性を持つブール関数のバランスを保った構成を進化させるかどうかを問う。特に興味深いのは、そのような構成はごくわずかしか知られていないことである。以上の結果から,GP はよく一般化される構造,すなわち,複数のテストサイズに必要な関数を見つけることができることがわかった。さらに、GPは異なる構文表現の下で多くの等価な構成を進化させることを示す。興味深いことに、GPによって発見された最も単純な解は、よく知られた間接和構成の特別な場合である。 Finding balanced, highly nonlinear Boolean functions is a difficult problem where it is not known what nonlinearity values are possible to be reached in general. At the same time, evolutionary computation is successfully used to evolve specific Boolean function instances, but the approach cannot easily scale for larger Boolean function sizes. Indeed, while evolving smaller Boolean functions is almost trivial, larger sizes become increasingly difficult, and evolutionary algorithms perform suboptimally. In this work, we ask whether genetic programming (GP) can evolve constructions resulting in balanced Boolean functions with high nonlinearity. This question is especially interesting as there are only a few known such constructions. Our results show that GP can find constructions that generalize well, i.e., result in the required functions for multiple tested sizes. Further, we show that GP evolves many equivalent constructions under different syntactic representations. Interestingly, the simplest solution found by GP is a particular case of the well-known indirect sum construction.	翻訳日:2023-02-25 12:49:44 公開日:2022-02-17
# 文脈の微分幾何学 Differential Geometry of Contextuality ( http://arxiv.org/abs/2202.08719v1 ) ライセンス: Link先を確認	Sidiney B. Montanhano	(参考訳) 文脈性は、トポロジカルな現象として長い間関連してきた。この研究では、そのような関係は一般化された文脈性というより一般的な枠組みで明らかにされる。主アイデアは、状態、効果、変換を接空間に存在するベクトルとして、非文脈条件を離散閉経路として、ヌル垂直位相を意味する。同様の解釈が2つある。平坦な空間が課される幾何学的あるいは現実的な視点は、文脈の振る舞いが、電磁的テンソルに類似した確率関数の曲率(非自明なホロノミー)と等価になることを意味する; 評価関数の修正として、文脈性と干渉、非可換性、符号付き測度を接続するのに使うことができる。評価関数を保存しなければならない位相的あるいは反現実的視点は、文脈的振る舞いを位相的障害(非自明なモノドロミー)として解釈できることを意味し、文脈性と非埋め込み可能性、一般化されたボロビエフの定理をつなぐのに使うことができる。両方のビューは文脈的分数と関連付けられ、オンティックモデルの乱れは非自明な遷移写像として表現できる。 Contextuality has been related for a long time as a topological phenomenon. In this work, such a relationship is exposed in the more general framework of generalized contextuality. The main idea is to identify states, effects, and transformations as vectors living in a tangent space, and the non-contextual conditions as discrete closed paths implying null vertical phases. Two equivalent interpretations hold. The geometrical or realistic view, where flat space is imposed, implies that the contextual behavior becomes equivalent to the curvature (non-trivial holonomy) of the probabilistic functions, in analogy with the electromagnetic tensor; as a modification of the valuation function, it can be used to connect contextuality with interference, non-commutativity, and signed measures. The topological or anti-realistic view, where the valuation functions must be preserved, implies that the contextual behavior can be translated as topological failures (non-trivial monodromy); it can be used to connect contextuality with non-embeddability and a generalized Voroby'ev theorem. Both views can be related to contextual fraction, and the disturbance in ontic models can be presented as non-trivial transition maps.	翻訳日:2023-02-25 12:49:28 公開日:2022-02-17
# 圧縮二次光学を用いたゼプトニュートン力センシング Zeptonewton force sensing with squeezed quadratic optomechanics ( http://arxiv.org/abs/2202.08690v1 ) ライセンス: Link先を確認	Sheng-Dian Zhang, Jie Wang, Ya-Feng Jiao, Huilai Zhang, Ying Li, Yun-Lan Zuo, Şahin K. Özdemir, Cheng-Wei Qiu, Franco Nori, Hui Jing	(参考訳) キャビティ・オプトメカニカル(COM)センサは, 超弱力測定や暗黒物質探索のための強力なツールとして, これまで主に線形COM結合を用いて実装されてきた。ここでは、双安定性がなく機械的エネルギーの正確な測定を可能にする二次COMシステムを用いた量子力センシングを検討する。このシステムは、実験的に実現可能なパラメータで、従来のどの線形COMセンサよりも$7$桁高い力感度を達成するよう最適化できることがわかった。さらに二次COMシステムとスクイーズ媒質を統合することで、さらに$3$桁の向上が得られ、標準量子限界をはるかに下回ってゼプトニュートンレベルに達する。これにより、量子非線形COMセンサーを基礎物理学実験や極端な感度を必要とする広範囲のアプリケーションで製造および使用する新たな展望が開かれる。 Cavity optomechanical (COM) sensors, as powerful tools for measuring ultraweak forces or searching for dark matter, have been implemented to date mainly using linear COM couplings. Here, quantum force sensing is explored by using a quadratic COM system which is free of bistability and allows accurate measurement of mechanical energy. We find that this system can be optimized to achieve a force sensitivity $7$ orders of magnitude higher than any conventional linear COM sensor, with experimentally accessible parameters. Further integrating a quadratic COM system with a squeezing medium can lead to another $3$ orders enhancement, well below the standard quantum limit and reaching the zeptonewton level. This opens new prospects of making and using quantum nonlinear COM sensors in fundamental physics experiments and in a wide range of applications requiring extreme sensitivity.	翻訳日:2023-02-25 12:48:33 公開日:2022-02-17
# ボソニック弦理論における熱場二重状態の回路複雑性 Circuit Complexity for Thermofield Double States in Bosonic String Theory ( http://arxiv.org/abs/2202.08663v1 ) ライセンス: Link先を確認	Arshid Shabir, Sanjib Dey, Salman Sajad Wani, Suhail Lone, Seemin Rubab, Mir Faizal	(参考訳) 本稿では、まず光円錐ゲージにおけるボソニック弦理論の熱場二重状態を構築する。次に、コヒーレント-熱的弦状態を取得し、弦理論の回路複雑性を計算する。これは共分散行列法を用いて回路複雑性を計算する。このアプローチでは、水平弦発生器によって最適な測地線を生成し、群多様体における最小測地線の長さを用いて回路複雑性を得る。 In this paper, we first construct thermofield double states for bosonic string theory in the light-cone gauge. We then obtain a coherent-thermal string state and use it to calculate the circuit complexity in string theory. This is done using the covariance matrix approach to calculate the circuit complexity. In this approach, we will generate the optimal geodesics by a horizontal string generator and, then, obtain the circuit complexity using the length of the minimal geodesics in the group manifold.	翻訳日:2023-02-25 12:48:18 公開日:2022-02-17
# エミュレートされた乱流による絡み合った光子の形成 Shaping entangled photons through emulated turbulent atmosphere ( http://arxiv.org/abs/2202.08650v1 ) ライセンス: Link先を確認	Ronen Shekel, Ohad Lib, Yaron Bromberg, Alon Sardas	(参考訳) 大気乱流による散乱は、長い自由空間光リンク、特に絡み合った光子のリンクを作成する際の大きな課題の1つである。古典的な補償法は、本質的に低信号対雑音比と絡み合いの脆弱さのため、絡み合う光子には適用が難しい。我々は近ごろ、自発パラメトリックダウン変換を励起する明るいレーザービームを用いて、絡み合った光子間の空間的相関を制御し、散乱を補償できることを示した。本研究では,大気乱流をエミュレートして散乱する絡み合った光子間の相関関係のスクランブル補正にポンプシェーピング法を適用した。空間光変調器とコルモゴロフの乱流モデルを用いて,ラボ内の大気乱流をエミュレートし,ポンプ最適化による光子絡み込み信号の15倍の精度で増幅する。本研究では, 静的および動的エミュレート雰囲気の両方に対してこれを示し, 高次モードの散乱の補償も示す。この結果は、量子鍵分布などのアプリケーションで用いられる絡み合った光子による自由空間量子リンクを実現するための扉を開くことができる。 Scattering by atmospheric turbulence is one of the main challenges in creating long free-space optical links, and specifically links of entangled photons. Classical compensation methods are hard to apply to entangled photons, due to inherently low signal to noise ratios and the fragility of entanglement. We have recently shown that we can use the bright laser beam that pumps spontaneous parametric down conversion to control the spatial correlations between entangled photons for compensating their scattering. In this work, we apply the pump-shaping technique to compensate for scrambling of correlations between entangled photons that scatter by emulated atmospheric turbulence. We use a spatial light modulator and Kolmogorov's turbulence model to emulate atmospheric turbulence in the lab, and enhance the entangled photons' signal by a factor of fifteen using pump optimization. We show this for both static and dynamic emulated atmosphere, and demonstrate also the compensation of the scattering of a higher-order mode. Our results can open the door towards realizing free-space quantum links with entangled photons, used in applications such as quantum key distribution.	翻訳日:2023-02-25 12:48:11 公開日:2022-02-17
# 新しいHDコンピューティング代数:秩序情報を表すスパースバンドルを生成する状態の非連想的重ね合わせ A novel HD Computing Algebra: Non-associative superposition of states creating sparse bundles representing order information ( http://arxiv.org/abs/2202.08633v1 ) ライセンス: Link先を確認	Stefan Reimann	(参考訳) 計算システムへの情報流入は、情報項目のシーケンスによって行われる。認知コンピューティング、すなわち、そのシーケンスに沿って変換を実行するには、アイテム情報だけでなく、シーケンシャルな情報も表現する必要がある。最も基本的な操作としては、バンドル、すなわちアイテムの追加、'メモリ状態'、すなわち情報の取得が可能なバンドルなどがある。通常のベクトル付加のような結合演算が連想的であれば、追加の代数構造を含まないシーケンシャル情報は表現できない。神経活動の確率的総和にインスパイアされた単純な確率的バイナリバンドルルールにより、結果として生じる記憶状態は、非連想的である限り、アイテム情報とシーケンシャル情報の両方を表現することができる。任意の数のアイテムを束ねて生じるメモリ状態は不均一であり、和の活性化閾値によって制御される疎さの度合いを有する。提案するバンドル操作は,情報の連続的な流入をナビゲートするために使用できるアイテムのドメインだけでなく,テンポラリにもフィルタを構築することができる。 Information inflow into a computational system is by a sequence of information items. Cognitive computing, i.e. performing transformations along that sequence, requires to represent item information as well as sequential information. Among the most elementary operations is bundling, i.e. adding items, leading to 'memory states', i.e. bundles, from which information can be retrieved. If the bundling operation used is associative, e.g. ordinary vector-addition, sequential information can not be represented without imposing additional algebraic structure. A simple stochastic binary bundling rule inspired by the stochastic summation of neuronal activities allows the resulting memory state to represent both, item information as well as sequential information as long as it is non-associative. The memory state resulting from bundling together an arbitrary number of items is non-homogeneous and has a degree of sparseness, which is controlled by the activation threshold in summation. The bundling operation proposed allows to build a filter in the temporal as well as in the items' domain, which can be used to navigate the continuous inflow of information.	翻訳日:2023-02-25 12:47:49 公開日:2022-02-17
# 連続対称性の自発的破断によるスケーラブルなスピンスクイーズ Scalable spin squeezing from spontaneous breaking of a continuous symmetry ( http://arxiv.org/abs/2202.08607v1 ) ライセンス: Link先を確認	Tommaso Comparin, Fabio Mezzacapo, Martin Robert-de-Saint-Vincent, Tommaso Roscilde	(参考訳) 自発的対称性破れ(SSB)は、熱力学的極限において、秩序パラメータに結合した場が断熱的にオフにされた後も、その秩序パラメータの有限の平均値を保持するハミルトニアン平衡状態の性質である。連続対称性を持つ量子スピンモデルの場合、この断熱過程は対称性の生成子、すなわち対称軸に沿った集団スピン成分のゆらぎの抑制も伴うことを示す。$S=1/2$ スピンまたは量子ビットの系では、一方向のゆらぎの抑制と横磁化の持続との組み合わせがスピンスクイーズをもたらす。これは、絡み合いの検出と計測への応用の両面で強く求められている量子状態の性質である。 U(1)(またはSU(2))対称性を自発的に破るXXZモデルの場合に注目し、断熱的に準備された状態はほぼ最小のスピン不確定性を持つこと、これらの状態で達成できる最小位相不確定性はスピン数$N$に対して$N^{-3/4}$でスケールすること、そしてこのスケーリングは$N$に線形にスケールする断熱準備時間の後に達成されることを示す。我々の発見は、例えば光学格子時計を含む様々な量子多体デバイスにおける強いスピンスクイーズ状態の断熱的準備への扉を開く。 Spontaneous symmetry breaking (SSB) is a property of Hamiltonian equilibrium states which, in the thermodynamic limit, retain a finite average value of an order parameter even after a field coupled to it is adiabatically turned off. In the case of quantum spin models with continuous symmetry, we show that this adiabatic process is also accompanied by the suppression of the fluctuations of the symmetry generator -- namely, the collective spin component along an axis of symmetry. In systems of $S=1/2$ spins or qubits, the combination of the suppression of fluctuations along one direction and of the persistence of transverse magnetization leads to spin squeezing -- a much sought-after property of quantum states, both for the purpose of entanglement detection as well as for metrological uses. 
Focusing on the case of XXZ models spontaneously breaking a U(1) (or even SU(2)) symmetry, we show that the adiabatically prepared states have nearly minimal spin uncertainty; that the minimum phase uncertainty that one can achieve with these states scales as $N^{-3/4}$ with the number of spins $N$; and that this scaling is attained after an adiabatic preparation time scaling linearly with $N$. Our findings open the door to the adiabatic preparation of strongly spin-squeezed states in a large variety of quantum many-body devices including e.g. optical lattice clocks.	翻訳日:2023-02-25 12:47:29 公開日:2022-02-17
# ベンチマーク最適化問題を用いた量子アニールとハイブリッドソルバの実験解析 Experimental analysis of quantum annealers and hybrid solvers using benchmark optimization problems ( http://arxiv.org/abs/2202.08939v1 ) ライセンス: Link先を確認	Evangelos Stogiannos and Christos Papalitsas and Theodore Andronikos	(参考訳) 本稿では、D-Waveの量子システムにおけるハミルトンサイクル問題(HCP)とトラベリングセールスマン問題(TSP)について検討する。まず、ほとんどのライブラリが隣接行列でベンチマークインスタンスを提示するという事実に動機付けられて、量子プラットフォームにおけるベンチマークインスタンスのシームレスかつ自動的な統合を可能にする、HCPおよびTSPハミルトニアンの新しい行列定式化を開発した。大規模な実験の結果、いくつかの興味深い結論が得られた。 D-Wave の {\tt Advantage\_system4.1} は、量子ビット利用と解の品質の両方において、 {\tt Advantage\_system1.1} よりも効率的である。また、D-WaveのHybridソルバは、$120$ノード程度の任意に大きな問題に対しても、QUBO制約に違反することなく常に有効な解を与えることを実験的に確立する。 TSPインスタンスを解くとき、量子アニーラーによって生成される解はしばしば、グラフのトポロジーに違反しているという意味で無効である。この問題に対処するために、TSPハミルトニアンの係数に対する \emph{min-max normalization} の使用を提唱する。最後に, HCP と TSP のハミルトニアンを表現するのに必要な制約の正確な数を数学的に解析する。この分析は、不完全グラフのインスタンスの実行が、ほとんどの場合、完全グラフのインスタンスよりも多くの量子ビットを必要とする理由を定量的に説明している。不完全グラフは完全グラフよりも多くの二次制約を必要とすることが判明し、これは一連の実験で裏付けられている。 This paper studies the Hamiltonian Cycle Problem (HCP) and the Traveling Salesman Problem (TSP) on D-Wave's quantum systems. Initially, motivated by the fact that most libraries present their benchmark instances in terms of adjacency matrices, we develop a novel matrix formulation for the HCP and TSP Hamiltonians, which enables the seamless and automatic integration of benchmark instances in quantum platforms. Our extensive experimental tests have led us to some interesting conclusions. D-Wave's {\tt Advantage\_system4.1} is more efficient than {\tt Advantage\_system1.1} both in terms of qubit utilization and quality of solutions. We also experimentally establish that D-Wave's Hybrid solvers always provide a valid solution to a problem, without violating the QUBO constraints, even for arbitrarily big problems, of the order of $120$ nodes. When solving TSP instances, the solutions produced by the quantum annealer are often invalid, in the sense that they violate the topology of the graph. 
To address this issue, we advocate the use of \emph{min-max normalization} for the coefficients of the TSP Hamiltonian. Finally, we present a thorough mathematical analysis of the precise number of constraints required to express the HCP and TSP Hamiltonians. This analysis explains quantitatively why, almost always, running incomplete graph instances requires more qubits than complete instances. It turns out that incomplete graphs require more quadratic constraints than complete graphs, a fact that has been corroborated by a series of experiments.	翻訳日:2023-02-25 12:41:01 公開日:2022-02-17
# ニューロモルフィックアーキテクチャにおけるスパイクニューラルネットワークの実装 Implementing Spiking Neural Networks on Neuromorphic Architectures: A Review ( http://arxiv.org/abs/2202.08897v1 ) ライセンス: Link先を確認	Phu Khanh Huynh, M. Lakshmi Varshika, Ankita Paul, Murat Isik, Adarsha Balaji, Anup Das	(参考訳) 近年,産学ともにスパイキングニューラルネットワーク(snn)を用いて設計された機械学習アプリケーションを実行するために,複数の異なるニューロモルフィックシステムを提案している。設計と技術面での複雑さが増大する中で、このようなシステムを機械学習アプリケーションを受け入れ実行するためのプログラミングはますます困難になりつつある。さらに、ニューロモルフィックシステムはリアルタイムのパフォーマンスを保証し、低エネルギーを消費し、論理やメモリ障害に対する耐性を提供する必要がある。そのため、現在および新興のニューロモルフィックシステム上で機械学習アプリケーションを実装でき、同時にパフォーマンス、エネルギー、信頼性に対処できるシステムソフトウェアフレームワークが明らかに必要である。本稿では,プラットフォームベース設計とハードウェア・ソフトウェア共同設計の両面で提案されているフレームワークの概要を紹介する。我々は,ニューロモルフィックコンピューティングのシステムソフトウェア技術分野における将来が持つ課題と機会を強調する。 Recently, both industry and academia have proposed several different neuromorphic systems to execute machine learning applications that are designed using Spiking Neural Networks (SNNs). With the growing complexity on design and technology fronts, programming such systems to admit and execute a machine learning application is becoming increasingly challenging. Additionally, neuromorphic systems are required to guarantee real-time performance, consume lower energy, and provide tolerance to logic and memory failures. Consequently, there is a clear need for system software frameworks that can implement machine learning applications on current and emerging neuromorphic systems, and simultaneously address performance, energy, and reliability. Here, we provide a comprehensive overview of such frameworks proposed for both, platform-based design and hardware-software co-design. We highlight challenges and opportunities that the future holds in the area of system software technology for neuromorphic computing.	翻訳日:2023-02-25 12:40:11 公開日:2022-02-17
# 量子ハミルトン平均場モデルにおける三重臨界点 Tricritical point in the quantum Hamiltonian mean-field model ( http://arxiv.org/abs/2202.08855v1 ) ライセンス: Link先を確認	Harald Schmid, Johannes Dieplinger, Andrea Solfanelli, Sauro Succi, and Stefano Ruffo	(参考訳) 実験プラットフォームにおける工学的長距離相互作用は、近年、様々な量子システムにおいて大きな成功を収めている。この進展に触発されて、古典ハミルトン平均場モデルのフェルミオン粒子への一般化を提案する。温度・ホッピング関数としての強磁性相互作用の正準アンサンブルにおける模型の位相図と熱力学的性質について検討した。ゼロ温度では、小さな電荷のゆらぎは、ゼロ温度で秩序から乱れた位相への1次量子相遷移を通じて多体系を駆動する。高温では、揺動誘起相転移は最初は第1次であり、三臨界点でのみ第2次に遷移する。本研究は, 長距離結合を持つ量子系において, 直接実験的妥当性を持つ三重臨界性の興味深い例を示す。解析は厳密な対角化と平均場理論によって行われる。 Engineering long-range interactions in experimental platforms has been achieved with great success in a large variety of quantum systems in recent years. Inspired by this progress, we propose a generalization of the classical Hamiltonian mean-field model to fermionic particles. We study the phase diagram and thermodynamic properties of the model in the canonical ensemble for ferromagnetic interactions as a function of temperature and hopping. At zero temperature, small charge fluctuations drive the many-body system through a first order quantum phase transition from an ordered to a disordered phase at zero temperature. At higher temperatures, the fluctuation-induced phase transition remains first order initially and switches to second order only at a tricritical point. Our results offer an intriguing example of tricriticality in a quantum system with long-range couplings, which bears direct experimental relevance. The analysis is performed by exact diagonalization and mean-field theory.	翻訳日:2023-02-25 12:39:38 公開日:2022-02-17
# 量子$\mathbb{Z}_2$格子ゲージ理論を用いたハミルトニアンサイクル問題の解法 Solving Hamiltonian Cycle Problem using Quantum $\mathbb{Z}_2$ Lattice Gauge Theory ( http://arxiv.org/abs/2202.08817v1 ) ライセンス: Link先を確認	Xiaopeng Cui, Yu Shi	(参考訳) グラフ理論におけるハミルトンサイクル(HC)問題は、よく知られたNP完全問題である。我々は、グラフを双対とする格子上で定義される $\mathbb{Z}_2$ 格子ゲージ理論 (LGT) の観点からのアプローチを示す。結合パラメータ $g$ が臨界値 $g_c$ よりも小さい場合、基底状態は、同じ単一スピン状態にあるスピンの閉じたストリングを持つ全ての配置の重ね合わせであり、時間計算量 $O\left(\frac{1}{g_c^2} \sqrt{\frac{1}{\varepsilon} N_e^{3/2}\left(N_v^3 + \frac{N_e}{g_c}\right)}\right)$ の断熱量子アルゴリズムによって得られる。ここで $N_v$ と $N_e$ はそれぞれグラフの頂点と辺の数である。続いてこれらの閉じたストリングの中からHCを探索することで、HC問題が解かれる。小さなグラフのいくつかのランダムなサンプルについて、$g_c$ の平均値の $\sqrt{N_{hc}}$ への依存性（$N_{hc}$ はHCの数）と、$\frac{1}{g_c}$ の平均値の $N_e$ への依存性がいずれも線形であることを示す。したがって、いくつかのグラフではHC問題が多項式時間で解ける可能性が示唆される。また、$g_c$を用いて$N_{hc}$を推定しうる量子アルゴリズムについても論じる。 The Hamiltonian cycle (HC) problem in graph theory is a well-known NP-complete problem. We present an approach in terms of $\mathbb{Z}_2$ lattice gauge theory (LGT) defined on the lattice with the graph as its dual. When the coupling parameter $g$ is less than the critical value $g_c$, the ground state is a superposition of all configurations with closed strings of spins in a same single-spin state, which can be obtained by using an adiabatic quantum algorithm with time complexity $O\left(\frac{1}{g_c^2} \sqrt{\frac{1}{\varepsilon} N_e^{3/2}\left(N_v^3 + \frac{N_e}{g_c}\right)}\right)$, where $N_v$ and $N_e$ are the numbers of vertices and edges of the graph respectively. A subsequent search for a HC among those closed-strings solves the HC problem. For some random samples of small graphs, we demonstrate that the dependence of the average value of $g_c$ on $\sqrt{N_{hc}}$, $N_{hc}$ being the number of HCs, and that of the average value of $\frac{1}{g_c}$ on $N_e$ are both linear. It is thus suggested that for some graphs, the HC problem may be solved in polynomial time. A possible quantum algorithm using $g_c$ to infer $N_{hc}$ is also discussed.	翻訳日:2023-02-25 12:39:03 公開日:2022-02-17
# 学校地理教育のイノベーション技術としての遠隔学習 Distance learning as innovation technology of school geographical education ( http://arxiv.org/abs/2202.08697v1 ) ライセンス: Link先を確認	Myroslav Syvyi, Ordenbek Mazbayev, Olga Varakuta, Natalia Panteleeva and Olga Bondarenko	(参考訳) 本論文は,中等教育における地理分野の学習と教育の過程における革新的技術の利用の必要性を述べる。教育的革新としての遠隔学習、その理論的側面、教育プロセスへの導入方法に特に注意が払われている。新ウクライナ学校における遠隔学習の意義が実証された。その利点と欠点が明らかになる。欧州の要求に応じて地理的能力開発に寄与するいくつかの遠隔学習の例が提供される。この記事は特に、Massive Open Online Courses、モダンウェブサイト、個々の教師の仮想ポータル、LearningApps.orgポータル、Moodleに焦点を当てている。 The article substantiates the necessity of using innovative technologies in the process of studying and teaching geographical disciplines at secondary schools. Particular attention is paid to distance learning as a pedagogical innovation, its theoretical aspects and the ways of its introduction into the educational process. The relevance of using distance learning at the New Ukrainian School is proved. Its advantages and disadvantages are revealed. The examples of some forms of distance learning that will contribute to geographical competence development according to European requirements are provided. The article particularly focuses on the Massive Open Online Courses, modern websites, virtual portals of individual teachers, LearningApps.org portal, and Moodle.	翻訳日:2023-02-19 15:01:47 公開日:2022-02-17
# 農業における知識共有のための情報通信技術イニシアティブ Information and communication technology initiatives for knowledge sharing in agriculture ( http://arxiv.org/abs/2202.08649v1 ) ライセンス: Link先を確認	Siddhartha Paul Tiwari	(参考訳) 農業における知識共有のための情報通信技術(ICT)の利用状況と動向について調査した。アジア諸国の中で、インドは日本、韓国、台湾からなる先進ユーザーカテゴリーに次ぐカテゴリーに分類される。一方では利益追求と事業拡大、他方では地域サービスと農村福祉が、インドの農業におけるICTベースモデルの目的となっている。農業向けのICTの取り組みは、様々な機関、すなわち民間セクター、公共セクター、セルフヘルプグループ、NGOによるものであり、複合的な取り組みも含まれている。 eラーニングは (i) キャンパス内すなわち「対面」モードと (ii)「遠隔」モードの両方でますます利用されている。その利用は、「鉄の三角形」の3つの腕、すなわち (i) 品質、(ii) アクセス、(iii) コストの相互抑制による束縛から、利害関係者を徐々に解放しつつある。モビリティの低い社会グループは、この教育様式からより多くの恩恵を受ける見込みである。これはまた、ジェンダーの主流化をもたらす強力なツールの1つかもしれない。 eラーニングは「ICT支援学習」と呼ぶべきハイブリッドシステムとして、既存の組織・教育構造に統合されつつある。接続性, コンテンツ開発, インフラ開発, 教員開発, 継続的なニーズ評価, ノードの連携とコンソーシアムの形成等が, 支援と開発が必要な分野として特定された。 A survey on status and trends of information and communication technologies (ICT) use for knowledge sharing in agriculture was attempted. Among Asian countries, India falls in the category immediately after the advanced-user category comprising Japan, South Korea and Taiwan. Both profit-motive and business augmentation on one hand and community services and rural welfare on the other have been the objectives of ICT-based models in agriculture in India. The ICT endeavours for agriculture belong to a wide array of agencies, viz private sector, public sector, self-help groups and NGOs, and also include combined endeavours. e-Learning is being increasingly resorted to in both (i) the in-campus or 'presence' mode, and (ii) the 'distance' mode. Its use is gradually freeing the stakeholders from the stranglehold of the inter-deterrence of the 3 arms of the 'Iron Triangle', viz (i) quality, (ii) access, and (iii) cost. The social groups having less mobility are poised to benefit more from this mode of education. This could also be one of the potent tools to bring about gender mainstreaming. 
e-Learning is being integrated into the existing organizational and educational structure as a hybrid system that can be called 'ICT-supported learning'. Connectivity, content development, infrastructure development, faculty development, need assessment on a continuum, linking the nodes and formation of consortia etc. are the areas identified that need to be supported and developed.	翻訳日:2023-02-19 15:01:36 公開日:2022-02-17
# SNPSFuzzer: スナップショットを使用したステートフルネットワークプロトコルのための高速Greybox Fuzzer SNPSFuzzer: A Fast Greybox Fuzzer for Stateful Network Protocols using Snapshots ( http://arxiv.org/abs/2202.03643v2 ) ライセンス: Link先を確認	Junqiang Li, Senyi Li, Gang Sun, Ting Chen, and Hongfang Yu	(参考訳) グレイボックスファジングはステートレスプログラムで広く使われており、大きな成功を収めている。しかしながら、ほとんどの最先端のグレーボックスファザは、インタラクションの詳細を記憶し保存できるステートフルネットワークプロトコルプログラムをファザリングするプロセスにおいて、遅い速度と浅い状態の深さのカバレッジの問題を一般的に抱えている。ネットワークプロトコルプログラム用の既存のグレーボックスファッジャは、まず入力メッセージの明確に定義されたプレフィックスシーケンスを送信し、次に変更されたメッセージを送り、ステートフルなネットワークプロトコルのターゲット状態をテストする。上記のプロセスは、高い時間的コストを引き起こす。本稿では、スナップショットを用いたステートフルネットワークプロトコルのための高速グレーボックスファザであるSNPSFuzzerを提案する。 SNPSFuzzerは、ネットワークプロトコルプログラムが特定の状態にあるときにコンテキスト情報をダンプし、状態がファジットされる必要があるときにそれを復元する。さらに,より深いネットワークプロトコル状態を探索するために,メッセージ連鎖解析アルゴリズムを設計する。 SNPSFuzzerは最先端のネットワークプロトコルであるGragbox fuzzer AFLNETと比較して112.0%-168.9%高速化し、24時間以内にパスカバレッジを21.4%-27.5%向上した。さらに、snpsfuzzerは、プログラムtinydtlsで以前に報告されていない脆弱性を公開する。 Greybox fuzzing has been widely used in stateless programs and has achieved great success. However, most state-of-the-art greybox fuzzers generally have the problems of slow speed and shallow state depth coverage in the process of fuzzing stateful network protocol programs which are able to remember and store details of the interactions. The existing greybox fuzzers for network protocol programs send a series of well-defined prefix sequences of input messages first and then send mutated messages to test the target state of a stateful network protocol. The process mentioned above causes a high time cost. In this paper, we propose SNPSFuzzer, a fast greybox fuzzer for stateful network protocol using snapshots. SNPSFuzzer dumps the context information when the network protocol program is under a specific state and restores it when the state needs to be fuzzed. Furthermore, we design a message chain analysis algorithm to explore more and deeper network protocol states. 
Our evaluation shows that, compared with the state-of-the-art network protocol greybox fuzzer AFLNET, SNPSFuzzer increases the speed of network protocol fuzzing by 112.0%-168.9% and improves path coverage by 21.4%-27.5% within 24 hours. Moreover, SNPSFuzzer exposes a previously unreported vulnerability in program Tinydtls.	翻訳日:2023-02-19 14:45:08 公開日:2022-02-17
# 西イランの方言層--階層的ディリクレ過程の言語関係へのアプローチ Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships ( http://arxiv.org/abs/2001.05297v4 ) ライセンス: Link先を確認	Chundra Aroor Cathcart	(参考訳) 本稿は、西イラン語の歴史音韻学における、複雑で未解決の一連の問題に対処する。西イランの言語(ペルシア語、クルド語、バラチ語、その他の言語)は、非ラウトゲシュリッヒ的な振る舞いの度合いが高い。しかし、西イラン方言学の文献では、作業プロセスに対する過度に単純化された見解が普及しており、専門家は、特定の非ペルシア語における期待される結果からの逸脱は、ペルシア語の年代学的段階からの語彙的借用によるものであると仮定している。この定性的なアプローチは、データの分布に関する明示的な確率論的推論の欠如から生じる問題的な結論をもたらすことが示されている: ペルシア語は唯一のドナー言語ではないかもしれない; さらに、語彙レベルで借りることが必ずしも不規則性をもたらすメカニズムであるとは限らない。多くの場合、西イランの言語が異なる条件条件下で異なる反射を示す可能性は未検討のままである。我々は、これらの問題を克服し、西イランの音響変化のパターンにおける不規則性の異なる決定要因を分解するために設計された新しいベイズ的アプローチを採用する。提案手法により,西イラン語方言学における特定の音変化の弁証的関連に関する多くの顕著な疑問を予備的に解決することができる。この種の作業の今後の方向性について概説する。 This paper addresses a series of complex and unresolved issues in the historical phonology of West Iranian languages. The West Iranian languages (Persian, Kurdish, Balochi, and other languages) display a high degree of non-Lautgesetzlich behavior. Most of this irregularity is undoubtedly due to language contact; we argue, however, that an oversimplified view of the processes at work has prevailed in the literature on West Iranian dialectology, with specialists assuming that deviations from an expected outcome in a given non-Persian language are due to lexical borrowing from some chronological stage of Persian. It is demonstrated that this qualitative approach yields at times problematic conclusions stemming from the lack of explicit probabilistic inferences regarding the distribution of the data: Persian may not be the sole donor language; additionally, borrowing at the lexical level is not always the mechanism that introduces irregularity. In many cases, the possibility that West Iranian languages show different reflexes in different conditioning environments remains under-explored. 
We employ a novel Bayesian approach designed to overcome these problems and tease apart the different determinants of irregularity in patterns of West Iranian sound change. Our methodology allows us to provisionally resolve a number of outstanding questions in the literature on West Iranian dialectology concerning the dialectal affiliation of certain sound changes. We outline future directions for work of this sort.	翻訳日:2023-01-11 23:06:43 公開日:2022-02-17
# 有糸分裂計数のための深層特徴融合 Deep Feature Fusion for Mitosis Counting ( http://arxiv.org/abs/2002.03781v3 ) ライセンス: Link先を確認	Robin Elizabeth Yancey	(参考訳) 米国に住む女性はそれぞれ、浸潤性乳癌を発症する確率は8分の1である。有糸分裂細胞数は、乳がんの攻撃性または品位を評価する最も一般的な検査の1つである。病理組織像は高分解能顕微鏡を用いて細胞数を計測し,病理組織学的に検討する必要がある。残念ながら、これは再現性に乏しい、特に非専門家にとって、徹底的な作業だ。深層学習ネットワークは、これらの関心領域を自動的にローカライズできる医療アプリケーションに適用されている。しかし、これらのリージョンベースのネットワークは、フルイメージCNNが生成するセグメンテーション機能を活用できないため、検出の唯一の方法としてしばしば使用される。そこで提案手法では,rgb画像特徴を持つunetで生成されたセグメンテーション特徴を活用しつつ,オブジェクト検出に高速なrcnnを活用し,mitos-atypia 2014 mitosis counting challengeデータセット上で0.508のf-scoreを実現する。 Each woman living in the United States has about a 1 in 8 chance of developing invasive breast cancer. The mitotic cell count is one of the most common tests to assess the aggressiveness or grade of breast cancer. In this prognosis, histopathology images must be examined by a pathologist using high-resolution microscopes to count the cells. Unfortunately, this can be an exhaustive task with poor reproducibility, especially for non-experts. Deep learning networks have recently been adapted to medical applications which are able to automatically localize these regions of interest. However, these region-based networks lack the ability to take advantage of the segmentation features produced by a full image CNN which are often used as a sole method of detection. Therefore, the proposed method leverages Faster RCNN for object detection while fusing segmentation features generated by a UNet with RGB image features to achieve an F-score of 0.508 on the MITOS-ATYPIA 2014 mitosis counting challenge dataset, outperforming state-of-the-art methods.	翻訳日:2023-01-05 00:37:27 公開日:2022-02-17
# un-mix: 教師なし視覚表現学習のための画像混合再考 Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning ( http://arxiv.org/abs/2003.05438v5 ) ライセンス: Link先を確認	Zhiqiang Shen and Zechun Liu and Zhuang Liu and Marios Savvides and Trevor Darrell and Eric Xing	(参考訳) 最近の教師なし学習のアプローチでは、同じイメージから2つの"ビュー"を比較して表現を学習する。 2つのビューを区別させることは、教師なしのメソッドが意味のある情報を学習できることを保証するためのコアである。しかし、このようなフレームワークは、2つのビューを生成するのに使用される拡張が不十分な場合、オーバーフィッティングに脆弱な場合があり、トレーニングデータに過度な問題が発生する。この欠点は、微妙な分散ときめ細かい情報を学ぶことを妨げる。そこで本研究では,ラベル空間上の距離概念を教師なし学習に含め,入力データ空間を混合することにより,正対と負対のソフトな類似度をモデルに認識させ,入力空間と損失空間を協調的に扱うことを目的とする。その概念的単純さにもかかわらず、この解 -- 教師なし画像混合(Un-Mix)により、変換された入力と対応する新しいラベル空間からより微妙でより堅牢で一般化された表現を学習できることを示す。 CIFAR-10、CIFAR-100、STL-10、Tiny ImageNet、および一般的な教師なし手法SimCLR、BYOL、MoCo V1&V2、SwaVなどを用いて、広範囲にわたる実験を行った。提案する画像混合とラベル割当戦略は,ベース手法の全く同じハイパーパラメータとトレーニング手順に従って,1～3%の一貫した改善が得られる。コードはhttps://github.com/szq0214/un-mixで公開されている。 The recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations. Making the two views distinctive is a core to guarantee that unsupervised methods can learn meaningful information. However, such frameworks are sometimes fragile on overfitting if the augmentations used for generating two views are not strong enough, causing the over-confident issue on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, in this work we aim to involve the distance concept on label space in the unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs through mixing the input data space, to further work collaboratively for the input and loss spaces. 
Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet with popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, SwAV, etc. Our proposed image mixture and label assignment strategy can obtain consistent improvement by 1~3% following exactly the same hyperparameters and training procedures of the base methods. Code is publicly available at https://github.com/szq0214/Un-Mix.	翻訳日:2022-12-24 14:33:02 公開日:2022-02-17
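The Un-Mix abstract above describes mixing the input data space and letting the model see a soft degree of similarity between pairs. The following is a minimal sketch of one possible reading of that idea, not the authors' implementation: each sample is blended with the sample at the mirrored batch position, and the soft target records how much of each original sample the mixture contains. The function names `mix_batch` and `soft_targets`, the mirrored pairing, and the fixed mixing coefficient `lam` are all assumptions made for illustration.

```python
def mix_batch(batch, lam):
    """Blend each sample with the sample at the mirrored batch position.

    Hypothetical helper: `batch` is a list of equal-length feature lists and
    `lam` in [0, 1] is the share kept from the sample's own position.
    """
    reversed_batch = batch[::-1]
    return [[lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
            for x, y in zip(batch, reversed_batch)]


def soft_targets(batch_size, lam):
    """Soft similarity of mixed sample i to every original sample j."""
    targets = [[0.0] * batch_size for _ in range(batch_size)]
    for i in range(batch_size):
        targets[i][i] += lam                         # share of its own sample
        targets[i][batch_size - 1 - i] += 1.0 - lam  # share of the mirrored one
    return targets


batch = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
mixed = mix_batch(batch, lam=0.75)
targets = soft_targets(len(batch), lam=0.75)
```

Each row of `targets` sums to 1, so it can stand in for the hard positive/negative labels that a contrastive loss would otherwise use.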
# 最も近い隣のディリクレ混合物 Nearest Neighbor Dirichlet Mixtures ( http://arxiv.org/abs/2003.07953v3 ) ライセンス: Link先を確認	Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson	(参考訳) ベイズの密度推定法には、未知の密度を核の混合として特徴づける豊富な文献がある。このような手法は、様々な密度に適応しながら、推定において不確実な定量化を提供するという点で有利である。しかし、頻繁な局所適応型カーネル法と比較して、ベイズ的アプローチはマルコフ連鎖モンテカルロアルゴリズムに依存して実装するのが遅く不安定である。計算上の欠点を伴わずにベイズアプローチの強みのほとんどを維持するため,近接-ディリクレ混合系のクラスを提案する。このアプローチは、データを標準アルゴリズムに基づいて近隣にグループ化することから始まります。各近傍では、密度は、未知のパラメータを持つガウスのようなベイズパラメトリックモデルによって特徴づけられる。これらの局所カーネルの重みの前にディリクレを割り当てると、重みとカーネルパラメータの擬似ポストリプタが得られる。単純で恥ずかしい並列なモンテカルロアルゴリズムは、未知の密度の擬似後続体からサンプリングするために提案される。望ましい漸近的性質を示し,シミュレーション研究で評価し,分類の文脈における動機付けデータセットに適用する。 There is a rich literature on Bayesian methods for density estimation, which characterize the unknown density as a mixture of kernels. Such methods have advantages in terms of providing uncertainty quantification in estimation, while being adaptive to a rich variety of densities. However, relative to frequentist locally adaptive kernel methods, Bayesian approaches can be slow and unstable to implement in relying on Markov chain Monte Carlo algorithms. To maintain most of the strengths of Bayesian approaches without the computational disadvantages, we propose a class of nearest neighbor-Dirichlet mixtures. The approach starts by grouping the data into neighborhoods based on standard algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels, we obtain a pseudo-posterior for the weights and kernel parameters. A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density. Desirable asymptotic properties are shown, and the methods are evaluated in simulation studies and applied to a motivating data set in the context of classification.	翻訳日:2022-12-22 21:57:53 公開日:2022-02-17
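The nearest neighbor-Dirichlet abstract above outlines three steps: group the data into neighborhoods, fit a local Gaussian in each, and put Dirichlet weights on the local kernels. The sketch below follows those steps for one-dimensional data under strong simplifying assumptions: it plugs in each neighborhood's sample mean and variance instead of sampling kernel parameters from a pseudo-posterior, and it uses a symmetric Dirichlet with an assumed concentration `alpha`. All function names are hypothetical.

```python
import math
import random


def knn_neighborhoods(data, k):
    """For each point, return its k nearest points (the point itself included)."""
    return [sorted(data, key=lambda y: abs(y - x))[:k] for x in data]


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def sample_dirichlet(alphas, rng):
    """Draw Dirichlet weights by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_density(data, k, alpha, rng):
    """Return one random draw of the density estimate as a callable."""
    kernels = []
    for hood in knn_neighborhoods(data, k):
        mean = sum(hood) / k
        # Plug-in variance with a small floor (a simplification, see lead-in).
        var = max(sum((y - mean) ** 2 for y in hood) / (k - 1), 1e-6)
        kernels.append((mean, var))
    weights = sample_dirichlet([alpha] * len(data), rng)
    return lambda x: sum(w * gaussian_pdf(x, m, v)
                         for w, (m, v) in zip(weights, kernels))


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]
density = sample_density(data, k=8, alpha=1.0, rng=rng)
```

Because every draw only needs fresh Dirichlet weights, repeated calls are independent of each other, which is what makes this kind of sampler easy to run in parallel.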
# neural loop combiner: ループの互換性を評価するニューラルネットワークモデル Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops ( http://arxiv.org/abs/2008.02011v2 ) ライセンス: Link先を確認	Bo-Yu Chen, Jordan B. L. Smith, Yi-Hsuan Yang	(参考訳) ループを使用する音楽プロデューサーは数千のループライブラリにアクセスできますが、互換性のある曲を見つけるのは時間を要するプロセスです。 AutoMashUpperのような互換性を推定する最先端システムはほとんどルールベースであり、機械学習によって改善される可能性がある。モデルをトレーニングするには、真理互換値の大きいループのセットが必要です。このようなデータセットは存在せず、既存の音楽からループを抽出して互換ループの正の例を得、負の例を選択するための様々な戦略を提案し比較する。このデータを用いて、我々はループの互換性を推定するための2種類のモデルアーキテクチャを調査する。1つは、シームズネットワークに基づくもので、もう1つは純粋な畳み込みニューラルネットワーク(CNN)である。我々は,各モデルが提案する組み合わせの質を評価するユーザスタディを行い,CNNがシームズネットワークを上回っていることを確認した。どちらのモデルベースアプローチもルールベースのアプローチよりも優れています。モデルとデータセットを構築するためのコードをオープンソース化しました。 Music producers who use loops may have access to thousands in loop libraries, but finding ones that are compatible is a time-consuming process; we hope to reduce this burden with automation. State-of-the-art systems for estimating compatibility, such as AutoMashUpper, are mostly rule-based and could be improved on with machine learning. To train a model, we need a large set of loops with ground truth compatibility values. No such dataset exists, so we extract loops from existing music to obtain positive examples of compatible loops, and propose and compare various strategies for choosing negative examples. For reproducibility, we curate data from the Free Music Archive. Using this data, we investigate two types of model architectures for estimating the compatibility of loops: one based on a Siamese network, and the other a pure convolutional neural network (CNN). We conducted a user study in which participants rated the quality of the combinations suggested by each model, and found the CNN to outperform the Siamese network. Both model-based approaches outperformed the rule-based one. We have open-sourced the code for building the models and the dataset.	翻訳日:2022-11-02 19:06:59 公開日:2022-02-17
# 超次元計算の理論的展望 A Theoretical Perspective on Hyperdimensional Computing ( http://arxiv.org/abs/2010.07426v3 ) ライセンス: Link先を確認	Anthony Thomas, Sanjoy Dasgupta, Tajana Rosing	(参考訳) 超次元(hyperdimensional, hd)コンピューティングは、高次元、低精度、分散したデータの表現を得るための、神経にインスパイアされた一連の方法である。これらの表現は、様々な情報処理タスクに影響を及ぼす、単純で神経学的に妥当なアルゴリズムと組み合わせることができる。 HDコンピューティングは最近、学習問題を解決するためのエネルギー効率、低レイテンシ、ノイズローバストツールとして、コンピュータハードウェアコミュニティから大きな関心を集めている。本稿では,HDコンピューティングの理論的基礎を統一的に扱うとともに,学習における表現の適合性に焦点をあてる。 Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this review, we present a unified treatment of the theoretical foundations of HD computing with a focus on the suitability of representations for learning.	翻訳日:2022-10-07 14:05:38 公開日:2022-02-17
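The HD computing abstract above speaks of high-dimensional, low-precision, distributed representations combined with simple algorithms. A minimal sketch of the two operations most commonly used in this setting, binding and bundling, is given below for random bipolar vectors. The dimensionality, the choice of element-wise multiplication for binding, and majority vote for bundling are assumptions made for illustration and are not taken from the paper.

```python
import random

DIM = 10_000  # dimensionality of the hypervectors (an assumed value)


def random_hv(rng, dim=DIM):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Bundle hypervectors by element-wise majority vote (sign of the sum)."""
    sums = [sum(column) for column in zip(*vectors)]
    return [1 if s >= 0 else -1 for s in sums]


def similarity(a, b):
    """Normalized dot product; close to 0 for unrelated random hypervectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
color, shape = random_hv(rng), random_hv(rng)   # role vectors
red, square = random_hv(rng), random_hv(rng)    # filler vectors
# A third, unrelated item keeps the vote count odd so no ties occur.
record = bundle([bind(color, red), bind(shape, square), random_hv(rng)])

# Multiplying by the role vector again yields a noisy copy of its filler.
recovered = bind(record, color)
```

Here `similarity(recovered, red)` is clearly positive while `similarity(recovered, square)` stays near zero, which is the property that lets a nearest-neighbor lookup decode the stored filler.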
# 自動運転のためのディープサロゲートQラーニング Deep Surrogate Q-Learning for Autonomous Driving ( http://arxiv.org/abs/2010.11278v2 ) ライセンス: Link先を確認	Maria Kalweit, Gabriel Kalweit, Moritz Werling, Joschka Boedecker	(参考訳) 実システムへの適用における深層強化学習システムの課題は,環境変化への適応性とw.r.t.計算資源とデータの有効性である。自動運転の学習車線変更行動の適用においては、エージェントは周囲のさまざまな車両を扱う必要がある。さらに、テストドライバは実世界で任意の数のレーン変更を実行できないため、必要なトランジションの数がボトルネックとなる。政治外の環境では、他人の行動を観察することで、タスクの解決に関する追加情報を得ることができる。古典的なRL設定では、この知識は使われていないが、エージェントの値関数をより効率的に学習するために、他のドライバを代理として使用する。本稿では、上記の問題に対処し、必要な運転時間を劇的に短縮するSurrogate Q-learningを提案する。さらに,q関数の置換同変ディープニューラルネットワークアーキテクチャに基づく効率的な実装を提案し,センサ範囲の可変車両の動作値の推定を行う。オープントラヒックシミュレータsumoでは,このアーキテクチャにより,シーン中心体験リプレイと呼ばれる新たなリプレイサンプリング手法が実現され,サロゲートq学習とシーン中心体験リプレイのパフォーマンス評価が可能となった。さらに,本手法は実高Dデータセット上のポリシーを学習することで,実世界のRLシステムの適用性を向上させる。 Challenging problems of deep reinforcement learning systems with regard to the application on real systems are their adaptivity to changing environments and their efficiency w.r.t. computational resources and data. In the application of learning lane-change behavior for autonomous driving, agents have to deal with a varying number of surrounding vehicles. Furthermore, the number of required transitions imposes a bottleneck, since test drivers cannot perform an arbitrary amount of lane changes in the real world. In the off-policy setting, additional information on solving the task can be gained by observing actions from others. While in the classical RL setup this knowledge remains unused, we use other drivers as surrogates to learn the agent's value function more efficiently. We propose Surrogate Q-learning that deals with the aforementioned problems and reduces the required driving time drastically. We further propose an efficient implementation based on a permutation-equivariant deep neural network architecture of the Q-function to estimate action-values for a variable number of vehicles in sensor range. 
We show that the architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay and evaluate the performance of Surrogate Q-learning and Scene-centric Experience Replay in the open traffic simulator SUMO. Additionally, we show that our methods enhance real-world applicability of RL systems by learning policies on the real highD dataset.	翻訳日:2022-10-05 00:53:26 公開日:2022-02-17
# ハリケーン予報:新しいマルチモーダル機械学習フレームワーク Hurricane Forecasting: A Novel Multimodal Machine Learning Framework ( http://arxiv.org/abs/2011.06125v3 ) ライセンス: Link先を確認	Léonard Boussioux, Cynthia Zeng, Théo Guénais, Dimitris Bertsimas	(参考訳) 本稿では,熱帯性サイクロン強度とトラック予測のための機械学習(ML)フレームワークについて述べる。我々のマルチモーダルフレームワークであるHurricastは、深層学習エンコーダデコーダアーキテクチャで特徴を抽出し、勾配木で予測することで、時空間データと統計データを効率的に組み合わせている。我々は2016-2019年、北大西洋と東太平洋の流域で24時間リードタイムトラックと強度予測を行い、計算中に現在の運用予測モデルに匹敵する平均誤差とスキルを達成できたことを示す。さらに、Hurricastを運用予測コンセンサスモデルに組み込むことは、National Hurricane Centerの公式予測よりも改善され、既存のアプローチと相補的な特性が強調される。まとめると、我々の研究は、異なるデータソースを組み合わせるために機械学習技術を利用することで、熱帯サイクロン予測の新しい機会がもたらされることを示した。 This paper describes a novel machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple ML techniques and utilizing diverse data sources. Our multimodal framework, called Hurricast, efficiently combines spatial-temporal data with statistical data by extracting features with deep-learning encoder-decoder architectures and predicting with gradient-boosted trees. We evaluate our models in the North Atlantic and Eastern Pacific basins on 2016-2019 for 24-hour lead time track and intensity forecasts and show they achieve comparable mean average error and skill to current operational forecast models while computing in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model could improve over the National Hurricane Center's official forecast, thus highlighting the complementary properties with existing approaches. In summary, our work demonstrates that utilizing machine learning techniques to combine different data sources can lead to new opportunities in tropical cyclone forecasting.	翻訳日:2022-09-26 23:40:59 公開日:2022-02-17
# (参考訳) モバイルクラウドセンシングにおける偽タスク防止のためのdeepnnを用いた協調的自己組織化マップ Collaborative Self Organizing Map with DeepNNs for Fake Task Prevention in Mobile Crowdsensing ( http://arxiv.org/abs/2203.12434v1 ) ライセンス: CC BY 4.0	Murat Simsek, Burak Kantarci, Azzedine Boukerche	(参考訳) モバイルクラウドセンシング(MCS)は、さまざまなサービスプロバイダがデータを収集し、処理し、分析する方法を変革したセンシングパラダイムである。 mcsは、最先端技術のための様々なアプリケーションやサービスをサポートするために、ユーザのモバイルデバイスを通じてデータがセンシングされ共有される、新しいプロセスを提供する。しかし、データ中毒、詰まったタスクアタック、偽のセンシングタスクといった様々な脅威は、mcsシステム、特にそのセンシングと計算能力のパフォーマンスに悪影響を及ぼす。フェイクセンシングタスクの提出は、正当なタスクとモバイルデバイスリソースの完成を目標としているため、mcsプラットフォームリソースも排除している。本研究では、教師なしでトレーニングされたニューラルネットワークであるSelf Organizing Feature Map(SOFM)を用いて、データセット内の正当データを事前クラスタリングすることにより、新しいデータセットにおいて正当/偽タスク比率が低い不均衡なデータにより、偽タスクをより効果的に検出することができる。クラスタ化された正規タスクが元のデータセットから分離された後、残りのデータセットを使用して、最終的なパフォーマンス目標に到達するためのDeep Neural Network(DeepNN)をトレーニングする。提案手法の性能向上のために,DeepNNの正の予測出力に,事前クラスタ化された正規タスクを付加し,事前クラスタ化されたDeepNN(PrecDeepNN)と呼ぶ。その結果、DeepNNから得られた正当性と偽のタスクを、選択した特徴セットで識別する初期平均精度が、提案した機械学習技術から得られる平均精度0.9812まで向上できることが証明された。 Mobile Crowdsensing (MCS) is a sensing paradigm that has transformed the way that various service providers collect, process, and analyze data. MCS offers novel processes where data is sensed and shared through mobile devices of the users to support various applications and services for cutting-edge technologies. However, various threats, such as data poisoning, clogging task attacks and fake sensing tasks adversely affect the performance of MCS systems, especially their sensing, and computational capacities. Since fake sensing task submissions aim at the successful completion of the legitimate tasks and mobile device resources, they also drain MCS platform resources. 
In this work, Self Organizing Feature Map (SOFM), an artificial neural network that is trained in an unsupervised manner, is utilized to pre-cluster the legitimate data in the dataset, thus fake tasks can be detected more effectively through less imbalanced data where legitimate/fake tasks ratio is lower in the new dataset. After pre-clustered legitimate tasks are separated from the original dataset, the remaining dataset is used to train a Deep Neural Network (DeepNN) to reach the ultimate performance goal. Pre-clustered legitimate tasks are appended to the positive prediction outputs of DeepNN to boost the performance of the proposed technique, which we refer to as pre-clustered DeepNN (PrecDeepNN). The results prove that the initial average accuracy to discriminate the legitimate and fake tasks obtained from DeepNN with the selected set of features can be improved up to an average accuracy of 0.9812 obtained from the proposed machine learning technique.	翻訳日:2022-03-27 14:02:42 公開日:2022-02-17
# (参考訳) GEMA: 自己組織化マップのためのオープンソースのPythonライブラリ GEMA: An open-source Python library for self-organizing-maps ( http://arxiv.org/abs/2203.13190v1 ) ライセンス: CC BY 4.0	Alvaro J. Garcia-Tejedor, Alberto Nogales	(参考訳) 組織はデータ分析の重要性とそのメリットを認識した。この機械学習アルゴリズムと組み合わせることで、問題をより容易に解決することが可能になり、これらのプロセスの時間が短縮される。ニューラルネットワークは、最近非常に良い結果を得た機械学習技術である。本稿では、自己組織化マップと呼ばれるニューラルネットワークモデルを扱うために開発された、GEMAと呼ばれるオープンソースのPythonライブラリについて述べる。 GEMAはGitHubのGNU General Public License(https://github.com/ufvceiec/GEMA)の下で無料で利用できる。ライブラリは特定のユースケースで評価され、正確な結果が得られる。 Organizations have realized the importance of data analysis and its benefits. This, in combination with Machine Learning algorithms, has made it possible to solve problems more easily, making these processes less time-consuming. Neural networks are the Machine Learning technique that has recently been obtaining very good results. This paper describes an open-source Python library called GEMA developed to work with a type of neural network model called Self-Organizing-Maps. GEMA is freely available under GNU General Public License at GitHub (https://github.com/ufvceiec/GEMA). The library has been evaluated in a particular use case, obtaining accurate results.	翻訳日:2022-03-27 13:51:51 公開日:2022-02-17
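The two preceding entries both rely on self-organizing maps. For readers unfamiliar with the model, the following is a minimal, generic sketch of SOM training in plain Python. It does not use or imitate the GEMA API; the function names, the linear decay schedules, and the Gaussian neighborhood are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weights are closest to x."""
    return min(weights,
               key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols map on `data` (a list of equal-length lists)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per map node, initialized at random in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


rng = random.Random(1)
cluster_a = [[0.1 + rng.uniform(-0.05, 0.05), 0.1 + rng.uniform(-0.05, 0.05)]
             for _ in range(50)]
cluster_b = [[0.9 + rng.uniform(-0.05, 0.05), 0.9 + rng.uniform(-0.05, 0.05)]
             for _ in range(50)]
som = train_som(cluster_a + cluster_b, rows=3, cols=3)
```

After training, points from the two synthetic clusters map to different nodes, which is the pre-clustering behavior that the crowdsensing entry exploits before handing the remaining data to a classifier.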
# fexgan-meta : メタヒトによる表情生成 FExGAN-Meta: Facial Expression Generation with Meta Humans ( http://arxiv.org/abs/2203.05975v1 ) ライセンス: Link先を確認	J. Rafid Siddiqui	(参考訳) 人間の表情の微妙さと、人間の表情が表現する強度の度合いの変動は、表情のイメージを頑健に分類し、生成することを困難にしている。高品質なデータの欠如は、ディープラーニングモデルのパフォーマンスを阻害する可能性がある。本稿では,メタヒトの表情にロバストに作用するメタヒト(fexgan-meta)の表情生成法を提案する。スタジオ環境に配置した10人のメタヒューマンが提示した表情の大規模なデータセットを作成し,FExGAN-Metaを画像上で評価した。以上の結果から,FExGAN-MetaはMeta-Humansの画像と複雑な表情を強く生成し,分類する。 The subtleness of human facial expressions and a large degree of variation in the level of intensity to which a human expresses them is what makes it challenging to robustly classify and generate images of facial expressions. Lack of good quality data can hinder the performance of a deep learning model. In this article, we have proposed a Facial Expression Generation method for Meta-Humans (FExGAN-Meta) that works robustly with the images of Meta-Humans. We have prepared a large dataset of facial expressions exhibited by ten Meta-Humans when placed in a studio environment and then we have evaluated FExGAN-Meta on the collected images. The results show that FExGAN-Meta robustly generates and classifies the images of Meta-Humans for the simple as well as the complex facial expressions.	翻訳日:2022-03-20 23:08:17 公開日:2022-02-17
# (参考訳) 不均一コンピューティングにおけるテキスト分類のための量子時間畳み込み学習 When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing ( http://arxiv.org/abs/2203.03550v1 ) ライセンス: CC BY-SA 4.0	Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen	(参考訳) 量子コンピューティングの急速な発展は、よりリッチな特徴表現やモデルパラメータのよりセキュアな保護など、量子アドバンテージの多くの特徴を示している。本研究は,変分量子回路に基づく垂直連合学習アーキテクチャを提案し,テキスト分類のための量子化事前学習BERTモデルの競争性能を実証する。特に,提案するハイブリッド古典量子モデルは,BERTデコーダのいくつかの層を置き換える新しいランダム量子時間畳み込み(QTC)学習フレームワークで構成されている。目的分類実験により,提案したBERT-QTCモデルにより,SnipsおよびATIS音声言語データセットの競合実験結果が得られた。特にbert-qtcは、2つのテキスト分類データセットにおける既存の量子回路ベースの言語モデルのパフォーマンスを1.57%向上させた。さらにBERT-QTCは、既存の商用アクセス可能な量子計算ハードウェアとCPUベースのインターフェースの両方にデプロイ可能で、データの分離を保証することができる。 The rapid development of quantum computing has demonstrated many unique characteristics of quantum advantages, such as richer feature representation and more secured protection on model parameters. This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. In particular, our proposed hybrid classical-quantum model consists of a novel random quantum temporal convolution (QTC) learning framework replacing some layers in the BERT-based decoder. Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets. Particularly, the BERT-QTC boosts the performance of the existing quantum circuit-based language model in two text classification datasets by 1.57% and 1.52% relative improvements. Furthermore, BERT-QTC can be feasibly deployed on both existing commercial-accessible quantum computation hardware and CPU-based interface for ensuring data isolation.	翻訳日:2022-03-13 17:10:33 公開日:2022-02-17
# youtubeで子どものコンテンツを不注意で不安全に書き起こしする「bitch」 'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube ( http://arxiv.org/abs/2203.04837v1 ) ライセンス: Link先を確認	Krithika Ramesh, Ashiqur R. KhudaBukhsh, Sumeet Kumar	(参考訳) ここ数年、youtube kidsは子供向けエンタテインメントにおけるテレビの競争の激しい選択肢の1つとして登場してきた。その結果、youtube kidsのコンテンツは、子供の安全を確保するためにさらなるレベルの精査を受けるべきである。子どもに悪質なコンテンツや不適切なコンテンツを検出する研究が勢いを増しているが、aiアプリケーションが子どもに不適切なコンテンツを導入する可能性について調査する現在の研究は、ほとんど、あるいは全く存在しない。本稿では,よく知られた自動音声認識(asr)システムが,youtubeキッズビデオの書き起こしをしながら,子供にとって不適切なテキストコンテンツを生成することを発見した。我々はこの現象を『不適切なコンテンツ幻覚』と呼ぶ。以上の結果から,これらの幻覚は時折生じない可能性が示唆され,asr系は高い信頼感を持つことが多い。我々は,既存の最先端asrシステムが子どもに不適切なコンテンツを提示するための,初歩的な音声データセットをリリースする。さらに,これらのエラーのいくつかを言語モデルを用いて修正できることを実証する。 Over the last few years, YouTube Kids has emerged as one of the highly competitive alternatives to television for children's entertainment. Consequently, YouTube Kids' content should receive an additional level of scrutiny to ensure children's safety. While research on detecting offensive or inappropriate content for kids is gaining momentum, little or no current work exists that investigates to what extent AI applications can (accidentally) introduce content that is inappropriate for kids. In this paper, we present a novel (and troubling) finding that well-known automatic speech recognition (ASR) systems may produce text content highly inappropriate for kids while transcribing YouTube Kids' videos. We dub this phenomenon 'inappropriate content hallucination'. Our analyses suggest that such hallucinations are far from occasional, and the ASR systems often produce them with high confidence. We release a first-of-its-kind data set of audios for which the existing state-of-the-art ASR systems hallucinate inappropriate content for kids. In addition, we demonstrate that some of these errors can be fixed using language models.	翻訳日:2022-03-13 14:01:18 公開日:2022-02-17
# (参考訳) 連続時間イベント系列の効率的な検索のための時間点過程の学習 Learning Temporal Point Processes for Efficient Retrieval of Continuous Time Event Sequences ( http://arxiv.org/abs/2202.11485v1 ) ライセンス: CC BY 4.0	Vinayak Gupta and Srikanta Bedathur and Abir De	(参考訳) MTPPを用いた予測モデリングの最近の進歩は、連続時間イベントシーケンス(CTES)を含む実世界のいくつかの応用の正確な評価を可能にしている。しかし、これらの配列の検索問題は文献にはほとんど見当たらない。そこで本研究では,あるクエリシーケンスに対して,関連する一連の連続時間イベントシーケンスの検索とランク付けを学習するNEUROSEQRETを提案する。より具体的には、NEUROSEQRETはまず、クエリシーケンスにトレーニング可能なアンウォープ関数を適用し、特に関連するクエリ-コーパスペアが個々の属性を持つ場合、コーパスシーケンスに匹敵する。次に、未処理のクエリシーケンスとコーパスシーケンスをMTPP誘導神経関連モデルにフィードする。精度と効率のトレードオフを提供する関係モデルの2つの変種を開発する。また、局所性に敏感なハッシュに適合し、与えられたクエリシーケンスに対してトップK結果を返す際の大幅な高速化につながるバイナリシーケンスの埋め込みを、関連スコアから学習するための最適化フレームワークを提案する。いくつかのデータセットを用いた実験では、NEUROSEQRETの精度がいくつかのベースラインを超えて向上し、ハッシュ機構の有効性が示された。 Recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving continuous-time event sequences (CTESs). However, the retrieval problem of such sequences remains largely unaddressed in literature. To tackle this, we propose NEUROSEQRET which learns to retrieve and rank a relevant set of continuous-time event sequences for a given query sequence, from a large corpus of sequences. More specifically, NEUROSEQRET first applies a trainable unwarping function on the query sequence, which makes it comparable with corpus sequences, especially when a relevant query-corpus pair has individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP guided neural relevance models. We develop two variants of the relevance model which offer a tradeoff between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores, suitable for the locality-sensitive hashing leading to a significant speedup in returning top-K results for a given query sequence. 
Our experiments with several datasets show the significant accuracy boost of NEUROSEQRET beyond several baselines, as well as the efficacy of our hashing mechanism.	翻訳日:2022-02-27 19:23:20 公開日:2022-02-17
# (参考訳) 単一画像超解像法:調査 Single Image Super-Resolution Methods: A Survey ( http://arxiv.org/abs/2202.11763v1 ) ライセンス: CC BY 4.0	Bahattin Can Maral	(参考訳) 同一場面の1つ以上の低解像度観測から高解像度画像を得る過程であるスーパーレゾリューション(sr)は、信号処理分野と画像処理分野の両方において、過去数十年で非常に一般的な研究テーマとなっている。近年の畳み込みニューラルネットワークの発展により、SRアルゴリズムの人気は急上昇し、参入障壁は大幅に低下した。近年、この人気はビデオ処理領域に広がり、リアルタイムに動作するSRモデルの開発期間にまで及んでいる。本稿では,単一画像処理を専門とするSRモデルの比較を行い,それらが長年にわたって様々な目的や形状にどのように取り組んできたのかを考察する。 Super-resolution (SR), the process of obtaining high-resolution images from one or more low-resolution observations of the same scene, has been a very popular topic of research in the last few decades in both signal processing and image processing areas. Due to the recent developments in Convolutional Neural Networks, the popularity of SR algorithms has skyrocketed as the barrier of entry has been lowered significantly. Recently, this popularity has spread into video processing areas to the lengths of developing SR models that work in real-time. In this paper, we compare different SR models that specialize in single image processing and will take a glance at how they evolved to take on many different objectives and shapes over the years.	翻訳日:2022-02-27 19:04:20 公開日:2022-02-17
# (参考訳) 二部グラフによる作業類似性 Occupation similarity through bipartite graphs ( http://arxiv.org/abs/2202.11064v1 ) ライセンス: CC BY 4.0	Pavle Boškoski and Matija Perne and Tjaša Redek and Biljana Mileva Boshkoska	(参考訳) 職業間の類似性は、キャリア決定を行う上で重要な情報である。しかし、単一で統一された職業類似性尺度の概念は、資産というよりはむしろ制限である。この研究の目的は、複数の説明可能な職業類似性尺度を評価し、占領間関係に関する異なる洞察を提供することである。このような測度は二部グラフの枠組みを用いて導出される。彼らの生存率は、2012年から2021年の間にスロベニアで発生した45万人以上のジョブトランジションによって評価される。結果は、いくつかの類似性尺度が妥当であり、異なる実現可能なキャリアパスを示すという仮説を支持する。データセットの完全な実装と一部は、https://repo.ijs.si/pboskoski/bipartite_job_similarity_codeで入手できる。 Similarity between occupations is a crucial piece of information when making career decisions. However, the notion of a single and unified occupation similarity measure is more of a limitation than an asset. The goal of the study is to assess multiple explainable occupation similarity measures that can provide different insights into inter-occupation relations. Several such measures are derived using the framework of bipartite graphs. Their viability is assessed on more than 450,000 job transitions occurring in Slovenia in the period between 2012 and 2021. The results support the hypothesis that several similarity measures are plausible and that they present different feasible career paths. The complete implementation and part of the datasets are available at https://repo.ijs.si/pboskoski/bipartite_job_similarity_code.	翻訳日:2022-02-27 18:49:04 公開日:2022-02-17
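The occupation-similarity abstract above derives its measures from a bipartite graph but does not name them. As a hedged illustration of the general idea only, the sketch below builds a worker-occupation bipartite graph from hypothetical edges and scores each pair of occupations by the Jaccard overlap of the workers who have held both. The choice of Jaccard, the function name, and the toy data are assumptions and are not taken from the paper or its repository.

```python
from collections import defaultdict
from itertools import combinations


def occupation_similarity(edges):
    """Jaccard similarity between occupations linked through shared workers.

    `edges` is an iterable of (worker_id, occupation) pairs, i.e. the edges
    of a bipartite worker-occupation graph.
    """
    workers_of = defaultdict(set)
    for worker, occupation in edges:
        workers_of[occupation].add(worker)
    sims = {}
    for a, b in combinations(sorted(workers_of), 2):
        shared = workers_of[a] & workers_of[b]
        union = workers_of[a] | workers_of[b]
        sims[(a, b)] = len(shared) / len(union)
    return sims


# Toy data, invented for this example.
edges = [
    ("w1", "welder"), ("w1", "machinist"),
    ("w2", "welder"), ("w2", "machinist"),
    ("w3", "welder"), ("w3", "nurse"),
    ("w4", "nurse"), ("w4", "midwife"),
]
sims = occupation_similarity(edges)
```

Swapping the Jaccard ratio for another overlap statistic changes the ranking of occupation pairs, which matches the abstract's point that several plausible measures can coexist.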
# (参考訳) deepsketch - 拡散デルタ圧縮のための新しい機械学習に基づく参照探索手法 DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression ( http://arxiv.org/abs/2202.10584v1 ) ライセンス: CC BY 4.0	Jisung Park, Jeoggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu	(参考訳) データセンターの管理コストを最小限に抑える効果的なソリューションとして,ストレージシステムのデータ削減がますます重要になっている。データリダクション効率を最大化するため、既存の後拡散デルタ圧縮技術では、従来のデータ重複やロスレス圧縮とともにデルタ圧縮を行う。残念なことに、類似したデータブロックを識別する際の精度が限られているため、既存の手法は最適値よりも大幅に低いデータ引き込み比を実現している。本稿では,差分圧縮の参照探索における高い精度を達成するために,学習からハッシュへの手法を活用し,データリダクション効率の向上を図る,ディバイス後のデルタ圧縮のための新しい参照探索手法であるdeepsketchを提案する。 deepsketchはディープニューラルネットワークを使用して、データブロックのスケッチ、すなわち他のブロックとの類似性を保存するブロックの近似データシグネチャを生成する。実世界の11のワークロードを用いた評価から,deepsketchは最先端のデルタ圧縮技術に対して,データ還元率を最大33%(平均21%)向上させることがわかった。 Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center. To maximize data-reduction efficiency, existing post-deduplication delta-compression techniques perform delta compression along with traditional data deduplication and lossless compression. Unfortunately, we observe that existing techniques achieve significantly lower data-reduction ratios than the optimal due to their limited accuracy in identifying similar data blocks. In this paper, we propose DeepSketch, a new reference search technique for post-deduplication delta compression that leverages the learning-to-hash method to achieve higher accuracy in reference search for delta compression, thereby improving data-reduction efficiency. DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block that can preserve similarity with other blocks. Our evaluation using eleven real-world workloads shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique.	翻訳日:2022-02-27 18:35:53 公開日:2022-02-17
# (参考訳) 概念の認識と音楽のテーマの認識。量子意味解析 Recognizing Concepts and Recognizing Musical Themes. A Quantum Semantic Analysis ( http://arxiv.org/abs/2202.10941v1 ) ライセンス: CC BY 4.0	Maria Luisa Dalla Chiara, Roberto Giuntini, Eleonora Negri, Giuseppe Sergioli	(参考訳) 過去の経験に基づいて、抽象概念や音楽テーマはどのように認識されているか? この問題に関して、人間と人工知能の異なる行動を比較することは興味深い。一般に、ある概念(例えば、表)を既知の例のセットから抽象化する人間の心は、テーブル・ゲシュタルト(table-gestalt)を生成する。同様の状況は、音楽的なテーマの場合には生じる。人間の心にとって非常に自然なゲシュタルティックなパターンの構築は、知的機械に教えられるのだろうか? この問題は、パターン認識と機械学習への量子アプローチの枠組みにおいて、うまく議論することができる。基本的な考え方は、古典的なデータセットを量子データセットに置き換えることであり、オブジェクトまたは音楽のテーマは、量子世界の特徴となる不確実性と曖昧さを含む量子情報の断片として形式的に表現することができる。この枠組みでは、ジェスタルトの直感的な概念は、与えられた量子データセットの正中心の数学的概念によってシミュレートすることができる。したがって、「新しい物体や新しい音楽の主題を以前の経験に基づいてどのように分類できるか」という重要な問題は、いくつかの特別な量子類似性関係の観点から扱うことができる。認識手順は人間と人工知能では異なるが、どちらの場合においても「問題に直面する」方法が一般的である。 How are abstract concepts and musical themes recognized on the basis of some previous experience? It is interesting to compare the different behaviors of human and of artificial intelligences with respect to this problem. Generally, a human mind that abstracts a concept (say, table) from a given set of known examples creates a table-Gestalt: a kind of vague and out of focus image that does not fully correspond to a particular table with well determined features. A similar situation arises in the case of musical themes. Can the construction of a gestaltic pattern, which is so natural for human minds, be taught to an intelligent machine? This problem can be successfully discussed in the framework of a quantum approach to pattern recognition and to machine learning. The basic idea is replacing classical data sets with quantum data sets, where either objects or musical themes can be formally represented as pieces of quantum information, involving the uncertainties and the ambiguities that characterize the quantum world. In this framework, the intuitive concept of Gestalt can be simulated by the mathematical concept of positive centroid of a given quantum data set. 
Accordingly, the crucial problem "how can we classify a new object or a new musical theme (we have listened to) on the basis of a previous experience?" can be dealt with in terms of some special quantum similarity-relations. Although recognition procedures are different for human and for artificial intelligences, there is a common method of "facing the problems" that seems to work in both cases.	翻訳日:2022-02-27 18:06:47 公開日:2022-02-17
# 変動型神経時相点過程 Variational Neural Temporal Point Process ( http://arxiv.org/abs/2202.10585v1 ) ライセンス: Link先を確認	Deokjun Eom, Sehyun Lee, Jaesik Choi	(参考訳) 時間的ポイントプロセスは、イベントのシーケンスの履歴が与えられたとき、どのイベントが発生するかを予測する確率的プロセスである。日常生活における発生ダイナミクスの様々な例があり、時間的ダイナミクスを訓練し、2つの異なる予測問題、時間とタイプ予測を解決することが重要である。特に、ディープニューラルネットワークベースのモデルは、ホークス過程やポアソン過程のような統計モデルよりも優れている。しかし、既存の多くのアプローチは、さまざまなイベントタイプを学習し予測するのではなく、特定のイベントに適合する。そのため、このような手法は事象間の変化した関係に対処できず、時間点過程の強度関数を予測できなかった。本稿では,これらの問題を解決するために,変動型ニューラルテンポラリポイントプロセス(vntpp)を提案する。本稿では,推論と生成ネットワークを導入し,ディープニューラルネットワークの確率的性質に対処するために潜在変数の分布を訓練する。インテンシティ関数は潜在変数の分布を用いて計算され、イベントタイプやイベントの到着時刻をより正確に予測できる。モデルが様々なイベントタイプの表現を一般化できることを実証的に実証する。さらに,我々のモデルは,合成および実世界のデータセット上で,他のディープニューラルネットワークベースモデルや統計処理よりも優れていることを示す。 A temporal point process is a stochastic process that predicts which type of events is likely to happen and when the event will occur given a history of a sequence of events. There are various examples of occurrence dynamics in the daily life, and it is important to train the temporal dynamics and solve two different prediction problems, time and type predictions. Especially, deep neural network based models have outperformed the statistical models, such as Hawkes processes and Poisson processes. However, many existing approaches overfit to specific events, instead of learning and predicting various event types. Therefore, such approaches could not cope with the modified relationships between events and fail to predict the intensity functions of temporal point processes very well. In this paper, to solve these problems, we propose a variational neural temporal point process (VNTPP). We introduce the inference and the generative networks, and train a distribution of latent variable to deal with stochastic property on deep neural network. The intensity functions are computed using the distribution of latent variable so that we can predict event types and the arrival times of the events more accurately. 
We empirically demonstrate that our model can generalize the representations of various event types. Moreover, we show quantitatively and qualitatively that our model outperforms other deep neural network based models and statistical processes on synthetic and real-world datasets.	翻訳日:2022-02-27 17:46:08 公開日:2022-02-17
# 知識インフォームド分子学習:パラダイム伝達に関する調査 Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer ( http://arxiv.org/abs/2202.10587v1 ) ライセンス: Link先を確認	Yin Fang, Qiang Zhang, Zhuo Chen, Xiaohui Fan and Huajun Chen	(参考訳) 機械学習、特に深層学習は、生化学領域における分子研究を大きく進展させた。一般に、ほとんどの分子タスクのモデリングはいくつかのパラダイムに収束している。例えば、私たちは通常、分子特性予測の課題を解決するために予測パラダイムを採用します。純粋データ駆動モデルの生成と解釈性を改善するため、研究者はこれらのモデルに生化学的ドメイン知識を組み込んで分子研究を行った。この知識の組み入れによりパラダイムトランスファーの傾向が高まり、ある分子学習タスクを別のタスクとして再構成することで解決している。本稿では,パラダイム伝達の観点からの知識インフォームド分子学習に関する文献レビューを行い,パラダイムの分類,方法論のレビュー,ドメイン知識の貢献度の分析を行う。さらに,その傾向を要約し,分子学習の今後の方向性を指摘する。 Machine learning, especially deep learning, has greatly advanced molecular studies in the biochemical domain. Typically, the modeling of most molecular tasks has converged on a few paradigms. For example, we usually adopt the prediction paradigm to solve tasks of molecular property prediction. To improve the generation and interpretability of purely data-driven models, researchers have incorporated biochemical domain knowledge into these models for molecular studies. This knowledge incorporation has led to a rising trend of paradigm transfer, that is, solving one molecular learning task by reformulating it as another one. In this paper, we present a literature review of knowledge-informed molecular learning from the perspective of paradigm transfer, where we categorize the paradigms, review their methods and analyze how domain knowledge contributes. Furthermore, we summarize the trends and point out interesting future directions for molecular learning.	翻訳日:2022-02-27 17:45:20 公開日:2022-02-17
# ptychographyにおける深部反復位相検索 Deep Iterative Phase Retrieval for Ptychography ( http://arxiv.org/abs/2202.10573v1 ) ライセンス: Link先を確認	Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann	(参考訳) 回折イメージングの分野における最も顕著な課題の1つは位相検索(pr)問題である: 回折パターンから物体を再構築するためには、逆フーリエ変換を計算しなければならない。これは全複素値回折データ、すなわち等級と位相を考えると可能である。しかし、回折イメージングでは、一般的に、位相を見積もる必要がある間に直接等級だけを測定できる。本研究では,複数重なり合った回折画像から物体を再構成する回折イメージングのサブフィールドであるptychographyについて考察する。本稿では,既存の反復位相探索アルゴリズムをニューラルネットワークで拡張し,各繰り返しの結果を精査する手法を提案する。この目的のために、最近提案されたアーキテクチャを音声処理分野から適応し拡張する。評価結果から,提案手法は反復数とアルゴリズム実行時間の両方の観点から,収束率の向上を図っている。 One of the most prominent challenges in the field of diffractive imaging is the phase retrieval (PR) problem: In order to reconstruct an object from its diffraction pattern, the inverse Fourier transform must be computed. This is only possible given the full complex-valued diffraction data, i.e. magnitude and phase. However, in diffractive imaging, generally only magnitudes can be directly measured while the phase needs to be estimated. In this work we specifically consider ptychography, a sub-field of diffractive imaging, where objects are reconstructed from multiple overlapping diffraction images. We propose an augmentation of existing iterative phase retrieval algorithms with a neural network designed for refining the result of each iteration. For this purpose we adapt and extend a recently proposed architecture from the speech processing field. Evaluation results show the proposed approach delivers improved convergence rates in terms of both iteration count and algorithm runtime.	翻訳日:2022-02-27 17:39:12 公開日:2022-02-17
# ヒルベルト空間におけるニューラルネットワークによる流れ前進の価格オプション Pricing options on flow forwards by neural networks in Hilbert space ( http://arxiv.org/abs/2202.11606v1 ) ライセンス: Link先を確認	Fred Espen Benth, Nils Detering, Luca Galimberti	(参考訳) 本稿では,無限次元ニューラルネットワークを応用したフローフォワードの価格設定手法を提案する。我々は, 正実数直線上の実数値関数のヒルベルト空間において, 項構造ダイナミクスの状態空間であるヒルベルト空間において, 価格問題を最適化問題として再キャストする。この最適化問題は、状態空間上の連続関数を近似するために設計された新しいフィードフォワードニューラルネットワークアーキテクチャを容易にすることで解決される。提案したニューラルネットはヒルベルト空間に基づいて構築される。本研究は, 用語構造曲線のサンプリングを訓練した古典的ニューラルネットよりも優れた数値効率を示す, 広範なケーススタディを提供する。 We propose a new methodology for pricing options on flow forwards by applying infinite-dimensional neural networks. We recast the pricing problem as an optimization problem in a Hilbert space of real-valued function on the positive real line, which is the state space for the term structure dynamics. This optimization problem is solved by facilitating a novel feedforward neural network architecture designed for approximating continuous functions on the state space. The proposed neural net is built upon the basis of the Hilbert space. We provide an extensive case study that shows excellent numerical efficiency, with superior performance over that of a classical neural net trained on sampling the term structure curves.	翻訳日:2022-02-27 17:39:00 公開日:2022-02-17
# 深層学習に基づく降水予測と推定のための効果的なトレーニング戦略 Effective Training Strategies for Deep-learning-based Precipitation Nowcasting and Estimation ( http://arxiv.org/abs/2202.10555v1 ) ライセンス: Link先を確認	Jihoon Ko, Kyuhan Lee, Hyunjin Hwang, Seok-Geun Oh, Seok-Woo Son, Kijung Shin	(参考訳) 深層学習は降水流にうまく適用されている。本研究では,事前学習方式と,ディープラーニングに基づく nowcasting 改善のための新しい損失関数を提案する。まず、広く使われているディープラーニングモデルであるU-Netを、降水量予測とレーダ画像からの降水量推定の2つの問題に適用する。前者を3つの降水間隔を持つ分類問題、後者を回帰問題として定式化する。そこで本研究では, 地中降雨を必要とせず, 近い将来, レーダー画像予測モデルを事前学習することを提案し, また, クラス不均衡問題を解決するために, 微調整のための新しい損失関数の利用を提案する。韓国から7年にわたって収集したレーダー画像と降水データセットを用いて,本手法の有効性を実証した。その結果,前訓練計画と新しい損失関数により,5時間リードタイムで最大95.7%,43.6%の降雨量(少なくとも10mm/h)のナッシングの臨界成功率(csi)が向上した。また,従来の降水量に比べて降水量推定誤差が最大で10.7%減少し,降水量は1mm/hrから10mm/hrに減少した。最後に, 異なる分解能に対するアプローチの感度について報告し, 豪雨の4例について詳細な解析を行った。 Deep learning has been successfully applied to precipitation nowcasting. In this work, we propose a pre-training scheme and a new loss function for improving deep-learning-based nowcasting. First, we adapt U-Net, a widely-used deep-learning model, for the two problems of interest here: precipitation nowcasting and precipitation estimation from radar images. We formulate the former as a classification problem with three precipitation intervals and the latter as a regression problem. For these tasks, we propose to pre-train the model to predict radar images in the near future without requiring ground-truth precipitation, and we also propose the use of a new loss function for fine-tuning to mitigate the class imbalance problem. We demonstrate the effectiveness of our approach using radar images and precipitation datasets collected from South Korea over seven years. It is highlighted that our pre-training scheme and new loss function improve the critical success index (CSI) of nowcasting of heavy rainfall (at least 10 mm/hr) by up to 95.7% and 43.6%, respectively, at a 5-hr lead time. 
We also demonstrate that our approach reduces the precipitation estimation error by up to 10.7%, compared to the conventional approach, for light rainfall (between 1 and 10 mm/hr). Lastly, we report the sensitivity of our approach to different resolutions and a detailed analysis of four cases of heavy rainfall.	翻訳日:2022-02-27 17:01:33 公開日:2022-02-17
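The critical success index (CSI) reported in the abstract above is the standard forecast-verification score hits / (hits + misses + false alarms). The snippet below is a minimal sketch of that formula for a heavy-rainfall threshold. The function name, the sample values, and the per-sample list layout are illustrative assumptions, not taken from the paper.

```python
def critical_success_index(observed, predicted, threshold=10.0):
    """CSI = hits / (hits + misses + false_alarms) for events >= threshold (mm/hr)."""
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs >= threshold
        pred_event = pred >= threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1
        # Correct negatives are deliberately ignored by CSI.
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


# Made-up rainfall rates in mm/hr, for illustration only.
observed = [12.0, 0.0, 15.0, 3.0, 11.0]
predicted = [11.0, 12.0, 2.0, 1.0, 10.0]
csi = critical_success_index(observed, predicted)
```

Because correct negatives do not enter the score, CSI stays informative when heavy-rain events are rare, which is the class-imbalance setting the abstract mentions.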
# 正確なバイアスフリー学習のための勾配に基づくアクティベーション Gradient Based Activations for Accurate Bias-Free Learning ( http://arxiv.org/abs/2202.10943v1 ) ライセンス: Link先を確認	Vinod K Kurmi, Rishabh Sharma, Yash Vardhan Sharma, Vinay P. Namboodiri	(参考訳) 機械学習モデルのバイアス軽減は必須だが、それでも難しい。いくつかのアプローチが提案されているが、バイアスを緩和するための1つの視点は、逆学習である。判別器は、問題となる性別、年齢、人種などのバイアス特性を識別するために用いられる。この判別器は、バイアス特性を識別できないように逆向きに使用される。このようなモデルの主な欠点は、識別器が偏見の識別に敏感であると判断する特徴が分類と相関できるため、精度のトレードオフを直接導入することである。この仕事で私たちはその問題を解決する。このバイアス・精度のトレードオフを改善するためにバイアス付き判別器が実際に使用できることを示す。具体的には、判別器の勾配を用いた特徴マスキングアプローチを用いてこれを実現する。バイアス差別に好まれる特徴が強調され、分類中に偏りのない特徴が強化されることを保証する。この単純なアプローチはバイアスを低減し、精度を大幅に向上するために有効であることを示す。提案モデルを標準ベンチマークで評価する。我々は,不偏性を維持したり改善したりしながら,敵の手法の精度を向上し,また近年の手法よりも優れています。 Bias mitigation in machine learning models is imperative, yet challenging. While several approaches have been proposed, one view towards mitigating bias is through adversarial learning. A discriminator is used to identify the bias attributes such as gender, age or race in question. This discriminator is used adversarially to ensure that it cannot distinguish the bias attributes. The main drawback in such a model is that it directly introduces a trade-off with accuracy as the features that the discriminator deems to be sensitive for discrimination of bias could be correlated with classification. In this work we solve the problem. We show that a biased discriminator can actually be used to improve this bias-accuracy tradeoff. Specifically, this is achieved by using a feature masking approach using the discriminator's gradients. We ensure that the features favoured for the bias discrimination are de-emphasized and the unbiased features are enhanced during classification. We show that this simple approach works well to reduce bias as well as improve accuracy significantly. We evaluate the proposed model on standard benchmarks. 
We improve the accuracy of the adversarial methods while maintaining or even improving the unbiasedness and also outperform several other recent methods.	翻訳日:2022-02-27 17:01:08 公開日:2022-02-17
# 健康データベースにおける予測モデルに対する不足値のベンチマーク手法 Benchmarking missing-values approaches for predictive models on health databases ( http://arxiv.org/abs/2202.10580v1 ) ライセンス: Link先を確認	Alexandre Perez-Lebel (MNI, MILA, PARIETAL), Ga\"el Varoquaux (MNI, MILA, PARIETAL), Marine Le Morvan (PARIETAL), Julie Josse (CRISAM, IDESP), Jean-Baptiste Poline (MNI)	(参考訳) 背景: データベースが大きくなるにつれて、コレクションを完全にコントロールすることが難しくなり、しばしば欠落した値(不完全な観察)を伴います。これらの大きなデータベースは、例えば予測やバイオマーカーの抽出など、機械学習モデルをトレーニングするのに適しています。このような予測的アプローチは、生成的ではなく差別的モデリングを使用することで、新たな欠落値戦略への扉を開くことができる。しかし、不足値を扱う戦略に関する既存の実証的な評価は、推論統計に焦点を当てている。本稿では,4つの電子健康記録データセット,1つの人口脳イメージングデータセット,1つの健康調査,および2つの集中治療データセットを対象とする,予測モデルにおける不足値戦略の体系的ベンチマークを行う。グラデーションブースト木を用いて,学習前に不足値に対するネイティブサポートと,単純かつ最先端のインプテーションを比較した。予測精度と計算時間について検討する。インプテーション後の予測では、どの値をインプットしたかを表す指標を追加することが重要であり、データが無作為ではないことを示唆する。不足値の計算は単純な戦略に比べて予測を改善できるが、大規模データではより長い計算時間を必要とする。価値の欠落をモデル化する学習ツリー - 頑丈で、高速で、優れた予測モデリングに、組み込まれた属性リードが欠落している。結論: 教師付き機械学習における欠落値のネイティブサポートは、計算コストをはるかに少なくして、最先端の命令よりも優れた予測を行う。インプテーションを使用する場合には、どの値がインプテーションされたかを表すインジケータ列を追加することが重要である。 BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative -- rather than generative -- modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS: Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain imaging one, a health survey and two intensive care ones. 
Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values, using the missing-incorporated-in-attribute strategy, leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS: Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.	翻訳日:2022-02-27 17:00:24 公開日:2022-02-17
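The recommendation above to add indicator columns when imputing can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes missing entries are encoded as `None` and uses mean imputation as the simple strategy. The function name and the data are hypothetical and do not come from the benchmark.

```python
def impute_with_indicator(rows):
    """Mean-impute each column and append one 0/1 'was missing' flag per column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 for a column that is entirely missing.
        means.append(sum(seen) / len(seen) if seen else 0.0)
    out = []
    for r in rows:
        values = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1 if r[j] is None else 0 for j in range(n_cols)]
        out.append(values + flags)
    return out


table = [[1.0, None], [3.0, 4.0]]
completed = impute_with_indicator(table)
```

The flags let a downstream model treat "this value was imputed" as a feature in its own right, which matters when missingness itself carries information.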
# minerl diamond 2021コンペティション:概要,結果,教訓 MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned ( http://arxiv.org/abs/2202.10583v1 ) ライセンス: Link先を確認	Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, Fran\c{c}ois Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman	(参考訳) 強化学習コンペティションは、特定の問題に対する解決策を開発するための適切なスコープと支援を提供することによって、分野を前進させる。より広範に適用可能な手法の開発を促進するためには,一般技術の使用,サンプル効率のよい手法の使用,その結果の再現性等が求められる。研究コミュニティにとって有益ではあるが、これらの制限はコストがかかる。参入障壁が高すぎると、多くの潜在的な参加者が解体される。このことを念頭に置いて、我々はミネル・アーナダイアモンド・コンペティションの第3版、ミネル・ダイアモンド2021を主催し、新参者の参加を促進するためのいかなる解決策も許した。このトラックとより広範なチュートリアルとサポートにより、投稿数が増加した。この容易な軌道の参加者はダイヤモンドを得ることができ、硬い軌道の参加者は同じ作業で一般化可能な解を進めた。 Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restrictions come at a cost -- increased difficulty. If the barrier for entry is too high, many potential participants are demoralized. With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers. With this track and more extensive tutorials and support, we saw an increased number of submissions. The participants of this easier track were able to obtain a diamond, and the participants of the harder track progressed the generalizable solutions in the same task.	翻訳日:2022-02-27 16:59:55 公開日:2022-02-17
# (参考訳) 人間-AIコパイロット最適化による安全運転政策の効率的な学習 Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization ( http://arxiv.org/abs/2202.10341v1 ) ライセンス: CC BY 4.0	Quanyi Li, Zhenghao Peng, Bolei Zhou	(参考訳) 人間の介入は、人間知識を強化学習のトレーニングループに注入する効果的な方法であり、迅速な学習とトレーニング安全性の確保をもたらす。人間の介入の予算が非常に限られているため、人間のエキスパートがトレーニングで学習エージェントと対話する時期と方法を設計することは依然として困難である。本研究では,Human-AI Copilot Optimization (HACO)と呼ばれる新しいループ学習手法を開発した。訓練の安全性を確保しつつ、危険な環境におけるエージェントの十分な探索を可能にするために、人間の専門家は制御を引き継ぎ、おそらく危険な状況や簡単な行動を避ける方法を示すことができる。提案したHACOは、試行錯誤と人間の部分的なデモンストレーションの両方から、高性能エージェントの訓練に有効に活用する。 HACOは、部分的な人間のデモンストレーションからプロキシ状態-アクション値を抽出し、エージェントを最適化してプロキシ値を改善し、一方で人間の介入を減らす。実験の結果,HACOは安全運転ベンチマークにおいて試料効率がかなり高いことがわかった。 HACOは、少数の人的介入予算で未確認の交通シナリオを運転するエージェントを訓練し、高い安全性と一般化性を実現し、強化学習と模倣学習ベースラインの両方を大きなマージンで上回る。コードとデモビデオはhttps://decisionforce.github.io/HACO/。 Human intervention is an effective way to inject human knowledge into the training loop of reinforcement learning, which can bring fast learning and ensured training safety. Given the very limited budget of human intervention, it remains challenging to design when and how the human expert interacts with the learning agent during training. In this work, we develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO). To allow the agent sufficient exploration in risky environments while ensuring training safety, the human expert can take over the control and demonstrate how to avoid probably dangerous situations or trivial behaviors. The proposed HACO then effectively utilizes the data both from the trial-and-error exploration and the human's partial demonstration to train a high-performing agent. HACO extracts proxy state-action values from partial human demonstration and optimizes the agent to improve the proxy values while reducing the human interventions. The experiments show that HACO achieves a substantially high sample efficiency in the safe driving benchmark. 
HACO can train agents to drive in unseen traffic scenarios with only a small human-intervention budget and achieve high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines by a large margin. Code and demo videos are available at: https://decisionforce.github.io/HACO/.	翻訳日:2022-02-26 22:40:07 公開日:2022-02-17
# (参考訳) 因果正規化フローを用いた世界南部地域におけるIMFプログラムの子ども貧困への影響の実態分析 Counterfactual Analysis of the Impact of the IMF Program on Child Poverty in the Global-South Region using Causal-Graphical Normalizing Flows ( http://arxiv.org/abs/2202.09391v1 ) ライセンス: CC BY 4.0	Sourabh Balgi, Jose M. Pe\~na, Adel Daoud	(参考訳) この研究は、因果推論とディープラーニングモデルの特定の分岐の適用例を示す: \emph{causal-Graphical Normalizing Flows (c-GNFs)}。近年の研究では、正規化フローは特定の特性を持ち、因果解析や反事実解析に特に適していることが示された。しかし、c-GNFsはシミュレーションされたデータセットでのみテストされ、c-GNFsの大規模実世界のデータへの適用は評価されていない。本研究は,c-gnfsを用いた国際通貨基金(imf)プログラムが子どもの貧困に与える影響を反事実的に分析するものである。この分析は18歳未満の1,941,734人の子どもが、世界南部から67カ国に居住する567,344家族の世話をする大規模な実世界の観測データに基づいている。 IMFの主な目的は、経済の安定を達成するための政府を支援することであるが、我々の結果は、IMFプログラムが子どもの貧困を正の副作用として約1.2$\pm$0.24(`0' は貧困に等しい。このように、本稿は、c-GNFが、AIにおける深層学習と因果推論を社会改善にどのように活用するかを示す。学習アルゴリズムが、人口レベル(ACE)、サブ人口レベル(CACE)、個人レベル(ICE)の対実的推論を通じて、未解決の社会的影響に対する大きなポテンシャルにどのように対処できるかを示す。 ACE や CACE を ICE ではなくモデル化している多くの作品とは対照的に、c-GNF は \emph{`The First Law of Causal Inference'} を用いてパーソナライズを可能にする。 This work demonstrates the application of a particular branch of causal inference and deep learning models: \emph{causal-Graphical Normalizing Flows (c-GNFs)}. In a recent contribution, scholars showed that normalizing flows carry certain properties, making them particularly suitable for causal and counterfactual analysis. However, c-GNFs have only been tested in a simulated data setting and no contribution to date have evaluated the application of c-GNFs on large-scale real-world data. Focusing on the \emph{AI for social good}, our study provides a counterfactual analysis of the impact of the International Monetary Fund (IMF) program on child poverty using c-GNFs. The analysis relies on a large-scale real-world observational data: 1,941,734 children under the age of 18, cared for by 567,344 families residing in the 67 countries from the Global-South. 
While the primary objective of the IMF is to support governments in achieving economic stability, our results find that an IMF program reduces child poverty as a positive side-effect by about 1.2$\pm$0.24 degree (`0' equals no poverty and `7' is maximum poverty). Thus, our article shows how c-GNFs further the use of deep learning and causal inference in AI for social good. It shows how learning algorithms can be used for addressing the untapped potential for a significant social impact through counterfactual inference at population level (ACE), sub-population level (CACE), and individual level (ICE). In contrast to most works that model ACE or CACE but not ICE, c-GNFs enable personalization using \emph{`The First Law of Causal Inference'}.	翻訳日:2022-02-26 22:19:47 公開日:2022-02-17
# 災害後の探索・救助作業における強化学習に基づくUAV基地局軌道最適化 UAV Base Station Trajectory Optimization Based on Reinforcement Learning in Post-disaster Search and Rescue Operations ( http://arxiv.org/abs/2202.10338v1 ) ライセンス: Link先を確認	Shiye Zhao, Kaoru Ota, Mianxiong Dong	(参考訳) 災害により、地上基地局(TBS)の一部が機能しなくなり、一部のユーザー機器(UE)はサービスを受けられなくなる。無人航空機(UAV)を航空基地局として配置することは、UEを迅速にカバーする方法である。しかし、既存の方法はUAVのカバレッジのみを対象としており、すべてのTBSが機能しなくなった災害後の領域におけるUAVの展開に重点を置いている。利用可能なTBSとUAVの組み合わせに関する研究は限られている。本稿では,利用可能なTBSと協調する航空基地局としてUAVを配備する手法を提案し,強化学習によってカバー範囲を改善する。さらに,実験では,まずBIRCH(balanced iterative reducing and clustering using hierarchies)を用いてUEをクラスタリングし,最後にQラーニングによって基地局のUEに対するカバレッジを向上させる。 After a disaster, some terrestrial base stations (TBSs) may fail, leaving some user equipment (UE) unserved. Deploying unmanned aerial vehicles (UAVs) as aerial base stations is a way to cover UEs quickly. However, existing methods consider UAV coverage alone: they focus on deploying UAVs in post-disaster areas where no TBS works any longer. There is limited research on combining the still-available TBSs with UAVs. We propose a method that deploys UAVs as aerial base stations cooperating with the available TBSs and improves coverage through reinforcement learning. In the experiments, we first cluster UEs with balanced iterative reducing and clustering using hierarchies (BIRCH) and then use Q-learning to achieve better base-station coverage of UEs.	翻訳日:2022-02-23 09:53:51 公開日:2022-02-17
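The abstract above names Q-learning as the method used to improve coverage. The snippet below is a minimal sketch of the standard tabular Q-learning update only. The grid-cell states, the action set, the reward value, and the names `ACTIONS` and `q_update` are hypothetical choices made for illustration and are not the paper's formulation.

```python
from collections import defaultdict

ACTIONS = ("north", "south", "east", "west", "hover")  # hypothetical UAV moves


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state][a] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# States are hypothetical grid cells; the reward stands in for newly covered UEs.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
new_value = q_update(q_table, state=(0, 0), action="east", reward=3.0, next_state=(1, 0))
```

In a setting like the one described, the reward could plausibly count UEs covered after the move, but that choice is an assumption here.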
# (参考訳) 多パターン乗客予測のためのヒューマンモビリティの探索 : グラフ学習フレームワーク Exploring Human Mobility for Multi-Pattern Passenger Prediction: A Graph Learning Framework ( http://arxiv.org/abs/2202.10339v1 ) ライセンス: CC BY 4.0	Xiangjie Kong, Kailai Wang, Mingliang Hou, Feng Xia, Gour Karmakar, Jianxin Li	(参考訳) 交通流予測は、インテリジェント交通システムにおいて不可欠な部分であり、様々な交通関連アプリケーションに基礎を置いている。バスは、固定された路線とスケジュールを持つ都市住民にとって必須の移動手段であり、定期運行が遅れる。しかし、この固定移動モードでは、人間の移動パターン、特にバス乗客間の複雑な関係が深く隠されている。交通流の予測には多くのモデルが存在するが、この点に関して人間の移動パターンは十分に研究されていない。この研究のギャップを減らし、この固定走行行動から人間の移動性知識を学習するために、グラフ畳み込みネットワーク(GCN)に基づく多パターンの乗客フロー予測フレームワークMPGCNを提案する。まず,バス記録データに基づいて乗客間の関係をモデル化する新しい共有ストップネットワークを構築する。そこで我々はGCNを用いて,有用なトポロジ情報から特徴を抽出し,バス乗客に隠された移動パターンを認識するディープクラスタリング手法を提案する。さらに, 時空間情報を完全に活用するために, 様々な移動パターンに基づき, gcn2flow を提案する。我々の知る限り、この論文は、グラフ学習からバスの乗客フローを予測するためのマルチパターンアプローチを採用した最初の試みである。経路最適化のためのケーススタディを設計する。実世界のバスデータセットに対する大規模な実験は、MPGCNが乗客フロー予測と経路最適化に潜在的に有効であることを示した。 Traffic flow prediction is an integral part of an intelligent transportation system and thus fundamental for various traffic-related applications. Buses are an indispensable way of moving for urban residents with fixed routes and schedules, which leads to latent travel regularity. However, human mobility patterns, specifically the complex relationships between bus passengers, are deeply hidden in this fixed mobility mode. Although many models exist to predict traffic flow, human mobility patterns have not been well explored in this regard. To reduce this research gap and learn human mobility knowledge from this fixed travel behaviors, we propose a multi-pattern passenger flow prediction framework, MPGCN, based on Graph Convolutional Network (GCN). Firstly, we construct a novel sharing-stop network to model relationships between passengers based on bus record data. Then, we employ GCN to extract features from the graph by learning useful topology information and introduce a deep clustering method to recognize mobility patterns hidden in bus passengers. 
Furthermore, to fully utilize spatio-temporal information, we propose GCN2Flow to predict passenger flow based on various mobility patterns. To the best of our knowledge, this paper is the first work to adopt a multi-pattern approach to predicting bus passenger flow with graph learning. We design a case study for optimizing routes. Extensive experiments on a real-world bus dataset demonstrate that MPGCN has potential efficacy in passenger flow prediction and route optimization.	翻訳日:2022-02-22 22:11:19 公開日:2022-02-17
# VRL3: ビジュアルディープ強化学習のためのデータ駆動フレームワーク VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning ( http://arxiv.org/abs/2202.10324v1 ) ライセンス: Link先を確認	Che Wang, Xufang Luo, Keith Ross, Dongsheng Li	(参考訳) 高度に課題の多いdrl(visual deep reinforcement learning)タスクを解決するための,シンプルかつ強力なデータ駆動フレームワークを提案する。我々は、データ駆動アプローチをとる際の多くの大きな障害を分析し、データ駆動型ビジュアルDRLに関する一連の設計原則、トレーニング戦略、重要な洞察を提示します。我々のフレームワークには3つのステージがある: ステージ1では非RLデータセット(例: ImageNet)を使ってタスクに依存しない視覚表現を学習し、ステージ2ではオフラインのRLデータ(例: 限られた数の専門家によるデモンストレーション)を使ってタスクに依存しない表現をより強力なタスク固有の表現に変換する。 sparse reward と real visual input を用いた極めて困難なハンド操作タスクのセットでは,従来の sota 法よりも 370%-1200% 高速に学習し,データ駆動型深層強化学習の可能性を完全に実証するエンコーダを用いた。 We propose a simple but powerful data-driven framework for solving highly challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, training strategies, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of highly challenging hand manipulation tasks with sparse reward and realistic visual inputs, our framework learns 370%-1200% faster than the previous SOTA method while using an encoder that is 50 times smaller, fully demonstrating the potential of data-driven deep reinforcement learning.	翻訳日:2022-02-22 15:56:57 公開日:2022-02-17
# (参考訳) ComParE COVID-19チャレンジの概要 A Summary of the ComParE COVID-19 Challenges ( http://arxiv.org/abs/2202.08981v1 ) ライセンス: CC BY 4.0	Harry Coppock, Alican Akman, Christian Bergler, Maurice Gerczuk, Chlo\"e Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Jing Han, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Panagiotis Tzirakis, Anton Batliner, Cecilia Mascolo, Bj\"orn W. Schuller	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは、人道と経済に大きな被害をもたらした。さまざまな分野の科学者チームが、政府やコミュニティが病気と戦うのを助ける方法を模索している。研究されている機械学習分野からの1つの道は、感染した人の呼吸音から新型コロナウイルスを検出するデジタル質量テストの展望である。我々は,InterSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS)の結果の概要を述べる。 The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present a summary of the results from the INTERSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS).	翻訳日:2022-02-22 00:52:02 公開日:2022-02-17
# (参考訳) RemixIT:ブートストラップリミックスによる音声強調モデルの連続的自己学習 RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing ( http://arxiv.org/abs/2202.08862v1 ) ライセンス: CC BY 4.0	Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar	(参考訳) RemixITは、単一の独立したドメイン内音声やノイズ波形を必要とせずに、音声強調を訓練するためのシンプルで効果的な自己教師手法である。提案手法は, 従来手法の制約を克服し, クリーンなドメイン内ターゲット信号に依存し, 列車とテストサンプル間のドメインミスマッチに敏感に対処する。 RemixITは、ドメイン外のデータに基づく事前訓練された教師モデルが、ドメイン内の混合に対して推定された擬似ターゲット信号を推測する継続的自己学習方式に基づいている。そして、推定されたクリーン信号とノイズ信号を置換してリミックスすることで、学生ネットワークのトレーニングに使用されるブートストラップ付き混合とそれに対応する擬似ターゲットを新たに生成する。教師は、最新の学生モデルの更新されたパラメータを使って、定期的に見積もりを洗練する。複数の音声強調データセットとタスクにおける実験結果は,従来の手法よりも優れた手法を示すだけでなく,任意の分離モデルと組み合わせて,任意の半教師なし・教師なしのドメイン適応タスクに適用できることを示した。実験的なエビデンスと組み合わせた分析は, 学生モデルが高度に劣化した疑似標的を観察しながら, 良好な性能を保ち続ける自己学習方式の内部機能に光を当てる。 We present RemixIT, a simple yet effective self-supervised method for training speech enhancement without needing a single isolated in-domain speech or noise waveform. Our approach overcomes limitations of previous methods which make them dependent on clean in-domain target signals and thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets which are used to train the student network. Conversely, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also showcase that RemixIT can be combined with any separation model as well as be applied towards any semi-supervised and unsupervised domain adaptation task. 
Our analysis, paired with empirical evidence, sheds light on the inner workings of our self-training scheme, wherein the student model keeps obtaining better performance while observing severely degraded pseudo-targets.	翻訳日:2022-02-22 00:36:28 公開日:2022-02-17
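The remixing step described in the RemixIT abstract above (permute the teacher's estimates, then add them back together) can be sketched without any audio library. This is a minimal illustration under stated assumptions: waveforms are equal-length lists of floats, only the noise estimates are shuffled across the batch, and `bootstrapped_remix` is a hypothetical name. The teacher and student networks themselves are omitted.

```python
import random


def bootstrapped_remix(speech_estimates, noise_estimates, rng):
    """Pair each speech estimate with a shuffled noise estimate and sum them.

    Returns (mixtures, speech_targets, noise_targets). The mixtures would be
    the student's inputs and the two target lists its pseudo-targets.
    """
    order = list(range(len(noise_estimates)))
    rng.shuffle(order)
    shuffled_noise = [noise_estimates[i] for i in order]
    mixtures = [
        [s + n for s, n in zip(speech, noise)]
        for speech, noise in zip(speech_estimates, shuffled_noise)
    ]
    return mixtures, speech_estimates, shuffled_noise


# Stand-ins for a teacher model's outputs on a batch of three in-domain mixtures.
teacher_speech = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
teacher_noise = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mixtures, speech_targets, noise_targets = bootstrapped_remix(
    teacher_speech, teacher_noise, random.Random(0)
)
```

Passing an explicit `random.Random` instance keeps the shuffle reproducible, which helps when comparing training runs.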
# (参考訳) グラフ機械学習のためのグラフデータ拡張:調査 Graph Data Augmentation for Graph Machine Learning: A Survey ( http://arxiv.org/abs/2202.08871v1 ) ライセンス: CC BY 4.0	Tong Zhao, Gang Liu, Stephan G\"unnemann, Meng Jiang	(参考訳) データ拡張は、追加のトレーニングデータを作成し、モデルの一般化を改善する能力から、グラフ機械学習への関心が高まっている。グラフデータの複雑で非ユークリッド的な構造によって引き起こされる課題により、従来の拡張操作を他の種類のデータに直接類似化することが制限されるため、この領域は依然として過小評価されている。本稿では,文献を構造化的に要約したグラフデータ拡張の包括的かつ体系的な調査を行う。まず,変更や生成したグラフデータのコンポーネントに基づいて,グラフデータの拡張操作を分類する。次に,グラフデータ拡張の最近の進歩を紹介し,その学習目標と方法論によって分離する。現在未解決の課題と今後の研究の方向性を概説する。全体として,グラフデータ拡張における既存文献の展望を明らかにし,この分野における追加作業を動機付けることを目的としている。私たちはgithubリポジトリ(https://github.com/zhao-tong/graph-data-augmentation-papers)と読み込みリストを提供し、継続的に更新します。 Data augmentation has recently seen increased interest in graph machine learning given its ability of creating extra training data and improving model generalization. Despite this recent upsurge, this area is still relatively underexplored, due to the challenges brought by complex, non-Euclidean structure of graph data, which limits the direct analogizing of traditional augmentation operations on other types of data. In this paper, we present a comprehensive and systematic survey of graph data augmentation that summarizes the literature in a structured manner. We first categorize graph data augmentation operations based on the components of graph data they modify or create. Next, we introduce recent advances in graph data augmentation, separating by their learning objectives and methodologies. We conclude by outlining currently unsolved challenges as well as directions for future research. Overall, this paper aims to clarify the landscape of existing literature in graph data augmentation and motivate additional work in this area. We provide a GitHub repository (https://github.com/zhao-tong/graph-data-augmentation-papers) with a reading list that will be continuously updated.	翻訳日:2022-02-21 23:59:15 公開日:2022-02-17
# (参考訳) 単調変分不等式を用いたニューラルネットワークの訓練 Training neural networks using monotone variational inequality ( http://arxiv.org/abs/2202.08876v1 ) ライセンス: CC BY 4.0	Chen Xu, Xiuyuan Cheng, Yao Xie	(参考訳) ニューラルネットワークの実証的な成功にもかかわらず、トレーニング手順の理論的理解は限定的であり、特に最適化問題の非凸性により性能が保証される。最近の研究(Juditsky & Nemirovsky, 2019)に触発されて、従来の損失関数最小化アプローチではなく、ネットワークパラメータのトレーニングを凸構造を持つ別の問題に還元し、モノトーン変分不等式(MVI)を解決する。 MVIの解は計算効率のよい手順で発見でき、さらに重要なことは、一層線形ニューラルネットワークの理論的設定の下でのモデル回復精度と予測精度に関する$\ell_2$および$\ell_{\infty}$バウンドの性能保証につながる。さらに,マルチ層ニューラルネットワークのトレーニングにおけるMVIの利用について検討し,SVI(textit{stochastic variational inequality})と呼ばれる実用的アルゴリズムを提案し,完全に接続されたニューラルネットワークとグラフニューラルネットワーク(GNN)のトレーニングへの適用性を示した(SVIは完全に汎用的で,他のタイプのニューラルネットワークのトレーニングに使用できる)。各種性能指標に関する実ネットワークデータ予測タスクにおいて,確率勾配降下(SGD)と比較して,SVIの競争力や性能が向上することを示した。 Despite the vast empirical success of neural networks, theoretical understanding of the training procedures remains limited, especially in providing performance guarantees of testing performance due to the non-convex nature of the optimization problem. Inspired by a recent work of (Juditsky & Nemirovsky, 2019), instead of using the traditional loss function minimization approach, we reduce the training of the network parameters to another problem with convex structure -- to solve a monotone variational inequality (MVI). The solution to MVI can be found by computationally efficient procedures, and importantly, this leads to performance guarantee of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery accuracy and prediction accuracy under the theoretical setting of training one-layer linear neural network. In addition, we study the use of MVI for training multi-layer neural networks and propose a practical algorithm called \textit{stochastic variational inequality} (SVI), and demonstrate its applicability in training fully-connected neural networks and graph neural networks (GNN) (SVI is completely general and can be used to train other types of neural networks). 
We demonstrate the competitive or better performance of SVI compared to the stochastic gradient descent (SGD) on both synthetic and real network data prediction tasks regarding various performance metrics.	翻訳日:2022-02-21 23:43:02 公開日:2022-02-17
# (参考訳) 部分音声タグによるSinhalaニューラルマシン翻訳の改良 Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag ( http://arxiv.org/abs/2202.08882v1 ) ライセンス: CC BY 4.0	Ravinga Perera, Thilakshi Fonseka, Rashmini Naranpanawa, Uthayasanker Thayasivam	(参考訳) ニューラルマシン翻訳(nmt)の性能は、利用可能な並列コーパスのサイズに大きく依存する。このため、低リソース言語対は高リソース言語対と比較して低翻訳性能を示す。形態学的に豊かな言語に対してnmtを行うと翻訳品質はさらに低下する。ウェブには大量の情報が含まれているが、スリランカのほとんどの人々は英語を正しく読み書きできない。そのため、地域住民間で情報を共有するために、英語コンテンツを現地語に翻訳する大きな要件が存在する。シンハラ語はスリランカで主要な言語であり、シンハラ語に英語を翻訳できるnmtシステムを構築するのは、リソースの制約の少ない2つの言語間の構文の相違のため困難である。そこで本研究では,音声の一部(POS)タグをトランスフォーマーの入力埋め込みと位置エンコーディングに組み込むことにより,Sinhalaニューラルマシン翻訳モデルに対するベースライン英語の性能をさらに向上させる方法について検討する。 The performance of Neural Machine Translation (NMT) depends significantly on the size of the available parallel corpus. Due to this fact, low resource language pairs demonstrate low translation performance compared to high resource language pairs. The translation quality further degrades when NMT is performed for morphologically rich languages. Even though the web contains a large amount of information, most people in Sri Lanka are unable to read and understand English properly. Therefore, there is a huge requirement of translating English content to local languages to share information among locals. Sinhala language is the primary language in Sri Lanka and building an NMT system that can produce quality English to Sinhala translations is difficult due to the syntactic divergence between these two languages under low resource constraints. Thus, in this research, we explore effective methods of incorporating Part of Speech (POS) tags to the Transformer input embedding and positional encoding to further enhance the performance of the baseline English to Sinhala neural machine translation model.	翻訳日:2022-02-21 23:41:03 公開日:2022-02-17
# (参考訳) 低リソース音声認識のためのカリキュラム最適化 Curriculum optimization for low-resource speech recognition ( http://arxiv.org/abs/2202.08883v1 ) ライセンス: CC BY-SA 4.0	Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers	(参考訳) 現代のエンドツーエンド音声認識モデルは、音声信号をテキストに書き起こすという驚くべき結果を示している。しかし、従来のデータ供給パイプラインは、依然として困難な課題である低リソース音声認識には最適でない可能性がある。本稿では,トレーニング中のモデルの進捗状況と,トレーニングの難易度に関する事前知識の両方に基づいて,トレーニング事例の順序を最適化する自動カリキュラム学習手法を提案する。様々な騒音条件において生音声のスコアリング機能として使用できる圧縮比と呼ばれる新しい難易度尺度を提案する。提案手法は音声認識単語の誤り率をベースラインシステムと比較して最大33%向上させる。 Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text. However, conventional data feeding pipelines may be sub-optimal for low-resource speech recognition, which still remains a challenging task. We propose an automated curriculum learning approach to optimize the sequence of training examples based on both the progress of the model while training and prior knowledge about the difficulty of the training examples. We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions. The proposed method improves speech recognition Word Error Rate performance by up to 33% relative over the baseline system.	翻訳日:2022-02-21 23:32:26 公開日:2022-02-17
# (参考訳) 衛星画像を用いた深層移動学習による開発途上国の大気質評価の改善 Deep Transfer Learning on Satellite Imagery Improves Air Quality Estimates in Developing Nations ( http://arxiv.org/abs/2202.08890v1 ) ライセンス: CC BY 4.0	Nishant Yadav, Meytar Sorek-Hamer, Michael Von Pohle, Ata Akbari Asanjan, Adwait Sahasrabhojanee, Esra Suel, Raphael Arku, Violet Lingenfelter, Michael Brauer, Majid Ezzati, Nikunj Oza, Auroop R. Ganguly	(参考訳) 都市大気汚染は低所得国や中所得国(LMIC)の公衆衛生問題である。しかし、LMICには十分な空気品質(AQ)の監視インフラがない。 LMICの都市ではAQを正確に推定できないため、緊急の準備やリスク軽減を妨げている。衛星画像をAQにマッピングするディープラーニングベースのモデルは、適切な地上データを持つ高所得国(HIC)向けに構築することができる。ここでは,HIC都市で学習した時空間パターンに基づいて,衛星画像の深層移動学習をAQに適用したスケーラブルなアプローチにより,LMIC都市の有意義な推定と洞察を抽出できることを実証する。このアプローチはアフリカ、ガーナのAccraで実証されており、米国2都市、特にロサンゼルスとニューヨークからAQパターンが学習されている。 Urban air pollution is a public health challenge in low- and middle-income countries (LMICs). However, LMICs lack adequate air quality (AQ) monitoring infrastructure. A persistent challenge has been our inability to estimate AQ accurately in LMIC cities, which hinders emergency preparedness and risk mitigation. Deep learning-based models that map satellite imagery to AQ can be built for high-income countries (HICs) with adequate ground data. Here we demonstrate that a scalable approach that adapts deep transfer learning on satellite imagery for AQ can extract meaningful estimates and insights in LMIC cities based on spatiotemporal patterns learned in HIC cities. The approach is demonstrated for Accra in Ghana, Africa, with AQ patterns learned from two US cities, specifically Los Angeles and New York.	翻訳日:2022-02-21 23:22:16 公開日:2022-02-17
# (参考訳) 対話型AI設計がユーザ行動に及ぼす影響:Fact-checking COVID-19の実態調査 The Effects of Interactive AI Design on User Behavior: An Eye-tracking Study of Fact-checking COVID-19 Claims ( http://arxiv.org/abs/2202.08901v1 ) ライセンス: CC BY 4.0	Li Shi, Nilavra Bhattacharya, Anubrata Das, Matthew Lease, Jacek Gwidzka	(参考訳) 我々は,AIを活用したファクトチェックシステムの相互作用が,生活時間や注意,精神的リソースといったユーザインタラクションにどのように影響するかを,実験室で調査した。参加者は、対話型および非対話型のaiファクトチェックシステムを使用して、covid-19関連クレームの正しさを評価した。 NASA-TLX を用いた Web ページインタラクション,アイトラッキングデータ,メンタルワークロードの収集を行った。 aiシステムの予測パラメータを対話的に操作する余裕があることは、ユーザの居住時間やaoisのアイフィックスに影響を与えているが、メンタルなワークロードには影響しなかった。インタラクティブシステムでは、参加者は主張の正しさを評価し、次にニュースを読む。この有望な結果は、ai駆動システムにおける相互活動のポジティブな役割を示している。 We conducted a lab-based eye-tracking study to investigate how the interactivity of an AI-powered fact-checking system affects user interactions, such as dwell time, attention, and mental resources involved in using the system. A within-subject experiment was conducted, where participants used an interactive and a non-interactive version of a mock AI fact-checking system and rated their perceived correctness of COVID-19 related claims. We collected web-page interactions, eye-tracking data, and mental workload using NASA-TLX. We found that the presence of the affordance of interactively manipulating the AI system's prediction parameters affected users' dwell times, and eye-fixations on AOIs, but not mental workload. In the interactive system, participants spent the most time evaluating claims' correctness, followed by reading news. This promising result shows a positive role of interactivity in a mixed-initiative AI-powered system.	翻訳日:2022-02-21 23:11:34 公開日:2022-02-17
# (参考訳) スタック一般化を用いたバイナリ分類のための多様な学習者の組み合わせ Combining Varied Learners for Binary Classification using Stacked Generalization ( http://arxiv.org/abs/2202.08910v1 ) ライセンス: CC BY 4.0	Sruthi Nair, Abhishek Gupta, Raunak Joshi, Vidya Chitre	(参考訳) 機械学習は、いくつかの面や他の面よりも優れた学習アルゴリズムを持っているが、すべてのアルゴリズムが抱える一般的なエラーは、非常に高次元の特徴セットを持つトレーニングデータである。これは通常、アルゴリズムが性能を損なう一般化エラーになってしまう。これは、Stacked Generalizationと呼ばれるStackingとして知られるEnsemble Learningメソッドを使って解決できる。本稿では,高次元多嚢胞性卵巣症候群データセット上で,スタック一般化を用いた2値分類を行い,モデルが一般化し,指標が大幅に向上する点を証明する。様々な指標が本論文で示されており、受信機動作特性曲線で見いだされた微妙な遷移が誤りであることが証明されている。 The Machine Learning has various learning algorithms that are better in some or the other aspect when compared with each other but a common error that all algorithms will suffer from is training data with very high dimensional feature set. This usually ends up algorithms into generalization error that deplete the performance. This can be solved using an Ensemble Learning method known as Stacking commonly termed as Stacked Generalization. In this paper we perform binary classification using Stacked Generalization on high dimensional Polycystic Ovary Syndrome dataset and prove the point that model becomes generalized and metrics improve significantly. The various metrics are given in this paper that also point out a subtle transgression found with Receiver Operating Characteristic Curve that was proved to be incorrect.	翻訳日:2022-02-21 23:03:30 公開日:2022-02-17
# (参考訳) 知識グラフ関係における微粒化セマンティクスの発見 Discovering Fine-Grained Semantics in Knowledge Graph Relations ( http://arxiv.org/abs/2202.08917v1 ) ライセンス: CC BY 4.0	Nitisha Jain and Ralf Krestel	(参考訳) マルチリレーショナルデータの理解と分析に関しては,関係のセマンティクスが不可欠である。複数のセマンティクスを表す異なる種類のエンティティ間の多相関係は、知識グラフで表される現実世界のリレーショナルデータセットで一般的である。エンティティタイプ分類、質問応答、知識グラフ補完などの多くのユースケースでは、これらの関係の正しい意味解釈が必要である。本研究は,抽象的関係に関連する異なる意味を探索し,細粒度な意味を持つ多くの部分関係を導出するための戦略を提案する。これを実現するために、関係に関連するエンティティの型を活用し、エンティティと関係のベクトル表現をクラスタ化する。提案手法は,多元関係に対する最良部分関係を自動で発見し,その意味的解釈を経験的評価により決定することができる。 When it comes to comprehending and analyzing multi-relational data, the semantics of relations are crucial. Polysemous relations between different types of entities, that represent multiple semantics, are common in real-world relational datasets represented by knowledge graphs. For numerous use cases, such as entity type classification, question answering and knowledge graph completion, the correct semantic interpretation of these relations is necessary. In this work, we provide a strategy for discovering the different semantics associated with abstract relations and deriving many sub-relations with fine-grained meaning. To do this, we leverage the types of the entities associated with the relations and cluster the vector representations of entities and relations. The suggested method is able to automatically discover the best number of sub-relations for a polysemous relation and determine their semantic interpretation, according to our empirical evaluation.	翻訳日:2022-02-21 22:56:43 公開日:2022-02-17
# (参考訳) FLAME: マルチデバイス環境におけるフェデレーション学習 FLAME: Federated Learning Across Multi-device Environments ( http://arxiv.org/abs/2202.08922v1 ) ライセンス: CC BY 4.0	Hyunsung Cho, Akhil Mathur, Fahim Kawsar	(参考訳) federated learning(fl)は、個人データをユーザデバイスにプライベートに保持しながら、マシンラーニングモデルの分散トレーニングを可能にする。ヒューマンアクティビティ認識などのモバイルセンシング分野におけるFLの応用が増加しているのを目にするが、FLはマルチデバイス環境(MDE)の文脈では研究されておらず、各ユーザが複数のデータ生成デバイスを所有している。モバイルやウェアラブルデバイスの普及に伴い、MDEはユビコンプ設定で人気が高まり、FLの研究が必要とされるようになった。 MDEにおけるFLは、クライアント間の非IID性が高く、ユーザとデバイスの両方の不均一性が複雑である。さらに、MDEにおけるFLクライアントにおけるシステムリソースの効率的な利用の確保は、依然として重要な課題である。本稿では,ユーザ中心のFL学習手法であるFLAMEを提案し,MDEにおける統計的・システム的不均一性に対処し,デバイス間での推論性能の整合性を実現する。 FLAMEの特徴 (i)同一ユーザからのデバイス間の時間アライメントを利用したユーザ中心FLトレーニング (ii)精度及び効率性を考慮した装置の選択 (iii)デバイスへのパーソナライズモデル。また,実測エネルギードレインとネットワーク帯域幅プロファイルを用いたFL評価実験を行い,既存のHARデータセットをフェデレートした設定に拡張する新しいクラスベースデータ分割方式を提案する。その結果,FLAMEはF-1スコアが4.8～33.8%,エネルギー効率が1.02～2.86倍,収束速度が最大2.02倍に向上し,FLワークロードの公平分布による目標精度が向上した。 Federated Learning (FL) enables distributed training of machine learning models while keeping personal data on user devices private. While we witness increasing applications of FL in the area of mobile sensing, such as human-activity recognition, FL has not been studied in the context of a multi-device environment (MDE), wherein each user owns multiple data-producing devices. With the proliferation of mobile and wearable devices, MDEs are increasingly becoming popular in ubicomp settings, therefore necessitating the study of FL in them. FL in MDEs is characterized by high non-IID-ness across clients, complicated by the presence of both user and device heterogeneities. Further, ensuring efficient utilization of system resources on FL clients in a MDE remains an important challenge. In this paper, we propose FLAME, a user-centered FL training approach to counter statistical and system heterogeneity in MDEs, and bring consistency in inference performance across devices. 
FLAME features (i) user-centered FL training utilizing the time alignment across devices from the same user; (ii) accuracy- and efficiency-aware device selection; and (iii) model personalization to devices. We also present an FL evaluation testbed with realistic energy drain and network bandwidth profiles, and a novel class-based data partitioning scheme to extend existing HAR datasets to a federated setup. Our experiment results on three multi-device HAR datasets show that FLAME outperforms various baselines by 4.8-33.8% higher F-1 score, 1.02-2.86x greater energy efficiency, and up to 2.02x speedup in convergence to target accuracy through fair distribution of the FL workload.	翻訳日:2022-02-21 22:41:20 公開日:2022-02-17
# (参考訳) 最適パス森林における不均衡データセットの扱い Handling Imbalanced Datasets Through Optimum-Path Forest ( http://arxiv.org/abs/2202.08934v1 ) ライセンス: CC BY 4.0	Leandro Aparecido Passos, Danilo S. Jodas, Luiz C. F. Ribeiro, Marco Akio, Andre Nunes de Souza, João Paulo Papa	(参考訳) 過去10年間で、機械学習ベースのアプローチは、時には人間よりも幅広い複雑なタスクを実行できるようになり、わずかな時間を要するようになった。このような進歩は、利用可能なデータ量が指数関数的に増加し、それらから信頼できる現実世界情報を抽出できるためである。しかし、ある現象は他の現象よりも可能性が高いため、これらのデータは一般に不均衡である。このような振る舞いは、より頻繁なデータに偏っているため、機械学習モデルのパフォーマンスにかなりの影響を与えます。大量の機械学習手法にもかかわらず、グラフベースのアプローチは、多くのアプリケーション、すなわち最適なパスフォレスト(opf)のパフォーマンスが優れたため、かなりの注目を集めている。本稿では,不均衡問題に対処するための3つのopfベースの戦略を提案する。$\text{o}^2$pf と opf-us はそれぞれオーバーサンプリングとアンダーサンプリングのための新しいアプローチであり,両方のアプローチを組み合わせたハイブリッド戦略である。本稿では,上記の戦略に関する変種についても紹介する。パブリックデータセットとプライベートデータセットにおける最先端技術との比較により,提案手法の堅牢性が確認された。 In the last decade, machine learning-based approaches became capable of performing a wide range of complex tasks sometimes better than humans, demanding a fraction of the time. Such an advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information from them. However, such data is generally imbalanced since some phenomena are more likely than others. Such a behavior yields considerable influence on the machine learning model's performance since it becomes biased on the more frequent data it receives. Despite the considerable amount of machine learning methods, a graph-based approach has attracted considerable notoriety due to the outstanding performance over many applications, i.e., the Optimum-Path Forest (OPF). In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the $\text{O}^2$PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants concerning the strategies mentioned above. 
Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches.	翻訳日:2022-02-21 22:12:22 公開日:2022-02-17
# スタイルベース生成対向ネットワークを用いた先行画像に基づく医用画像再構成 Prior image-based medical image reconstruction using a style-based generative adversarial network ( http://arxiv.org/abs/2202.08936v1 ) ライセンス: Link先を確認	Varun A. Kelkar and Mark A. Anastasio	(参考訳) 医用画像システムには画像形成のための計算再構成手順が必要である。記録された測定値が不完全である場合には、被写体の有用な推定値を取得するためには、被写体の性質に関する事前知識を利用する必要がある。画像逆問題に対する条件付けを改善するために,物体の先行や制約をよりよく表現するために,ディープラーニングアプローチを積極的に研究している。本研究は,検索対象の先行画像の形で追加情報が得られる場合に,画像再構成問題を制約するために,スタイルベースの生成逆ネットワーク(StyleGAN)を使用することを提案する。磁気共鳴イメージング(MRI)で使用されるコントラストのような、意味のある画像属性や「スタイル」に対して歪んだスタイルGANの中間潜時空間に最適化問題を定式化する。追従画像と先行画像との相違は、非絡み空間において測定され、非絡み空間の特定のスタイルに対する制約の形で逆問題を調整するために使用される。 MRイメージングにインスパイアされた、構造的に類似しているが、異なるコントラスト機構に属する、スタイル化された数値研究が設計されている。提案手法は,従来の手法に比べ,従来手法よりも優れていることを示す数値的研究を行った。 Computed medical imaging systems require a computational reconstruction procedure for image formation. In order to recover a useful estimate of the object to-be-imaged when the recorded measurements are incomplete, prior knowledge about the nature of object must be utilized. In order to improve the conditioning of an ill-posed imaging inverse problem, deep learning approaches are being actively investigated for better representing object priors and constraints. This work proposes to use a style-based generative adversarial network (StyleGAN) to constrain an image reconstruction problem in the case where additional information in the form of a prior image of the sought-after object is available. An optimization problem is formulated in the intermediate latent-space of a StyleGAN, that is disentangled with respect to meaningful image attributes or "styles", such as the contrast used in magnetic resonance imaging (MRI). Discrepancy between the sought-after and prior images is measured in the disentangled latent-space, and is used to regularize the inverse problem in the form of constraints on specific styles of the disentangled latent-space. 
A stylized numerical study inspired by MR imaging is designed, where the sought-after and the prior image are structurally similar, but belong to different contrast mechanisms. The presented numerical studies demonstrate the superiority of the proposed approach as compared to classical approaches in the form of traditional metrics.	翻訳日:2022-02-21 14:57:22 公開日:2022-02-17
# 量子データによる多体局在へのスケーラブルなアプローチ Scalable approach to many-body localization via quantum data ( http://arxiv.org/abs/2202.08853v1 ) ライセンス: Link先を確認	Alexander Gresch, Lennart Bittel and Martin Kliesch	(参考訳) 量子データによって、計算が難しい問題に対する実用的な解決策が実現できることに関心があります。量子多体物理学の非常に難しい現象は、多体局在(MBL)の出現である。これまでのところ、それは包括的な分析を逃れている。特に、数値的研究はヒルベルト空間次元の指数的成長によって挑戦される。これらの研究の多くはシステムのハミルトニアンの正確な対角化に依存しているため、小さなシステムサイズのみがアクセス可能である。本研究では,学習データから計算コストのかかるステップを回避できる,高度に柔軟なニューラルネットワークに基づく学習手法を提案する。このようにして、隣接するギャップ比やエントロピー量などのMBLの共通指標を効率的に推定することができる。私たちの推定器は、さまざまなシステムサイズのデータを一度にトレーニングすることで、より小さなものから大きなものへと推定することが可能になります。さらに, 伝達学習を用いて, 二次元特徴ベクトルは, 様々なエネルギー密度で複数の異なる指標を一度に得るのに十分であることを示す。我々は、このアプローチを大規模量子実験に適用し、量子多体物理学への新たな洞察を提供することを望んでいる。 We are interested in how quantum data can allow for practical solutions to otherwise difficult computational problems. A notoriously difficult phenomenon from quantum many-body physics is the emergence of many-body localization (MBL). So far, it has evaded a comprehensive analysis. In particular, numerical studies are challenged by the exponential growth of the Hilbert space dimension. As many of these studies rely on exact diagonalization of the system's Hamiltonian, only small system sizes are accessible. In this work, we propose a highly flexible neural network based learning approach that, once given training data, circumvents any computationally expensive step. In this way, we can efficiently estimate common indicators of MBL such as the adjacent gap ratio or entropic quantities. Our estimator can be trained on data from various system sizes at once which grants the ability to extrapolate from smaller to larger ones. Moreover, using transfer learning we show that already a two-dimensional feature vector is sufficient to obtain several different indicators at various energy densities at once. We hope that our approach can be applied to large-scale quantum experiments to provide new insights into quantum many-body physics.	翻訳日:2022-02-21 14:57:00 公開日:2022-02-17
# 音声混合における単語埋め込みによる自動等化 Word Embeddings for Automatic Equalization in Audio Mixing ( http://arxiv.org/abs/2202.08898v1 ) ライセンス: Link先を確認	Satvik Venkatesh, David Moffat, Eduardo Reck Miranda	(参考訳) 近年,音声混合プロセスを自動化するために機械学習が広く採用されている。ゲイン調整、ステレオパニング、等化、残響といった様々な音響効果に自動ミキシングシステムが適用されている。これらのシステムはビジュアルインターフェースを通じて制御でき、オーディオ例、ノブ、セマンティックディスクリプタを提供する。セマンティック記述子やテキスト情報を使用してシステムを制御することは、アーティストが創造的な目標を伝える効果的な方法である。さらに、アーティストはミキシングシステムやミキシングエンジニアでは理解できないような非技術的な言葉を使うこともある。本稿では,意味記述子を表現するために単語埋め込みを利用する新しいアイデアについて検討する。単語埋め込みは一般的に、大量のテキストのコーパス上にニューラルネットワークをトレーニングすることで得られる。これらの埋め込みは、単語からEQ設定への変換を生成するニューラルネットワークの入力層として機能する。この技術を使用して、機械学習モデルは、これまで見たことのないセマンティックディスクリプタのEQ設定を生成することもできる。我々はこのアイデアの実現可能性を示す実験を行う。さらに,人間のeq設定とニューラルネットワークの予測を比較し,予測の質を評価する。その結果、埋め込み層により、ニューラルネットワークは意味記述子を理解できることがわかった。埋め込み層を持つモデルは、埋め込み層を持たないモデルよりも優れているが、人間のラベルほど良いものではない。 In recent years, machine learning has been widely adopted to automate the audio mixing process. Automatic mixing systems have been applied to various audio effects such as gain-adjustment, stereo panning, equalization, and reverberation. These systems can be controlled through visual interfaces, providing audio examples, using knobs, and semantic descriptors. Using semantic descriptors or textual information to control these systems is an effective way for artists to communicate their creative goals. Furthermore, sometimes artists use non-technical words that may not be understood by the mixing system, or even a mixing engineer. In this paper, we explore the novel idea of using word embeddings to represent semantic descriptors. Word embeddings are generally obtained by training neural networks on large corpora of written text. These embeddings serve as the input layer of the neural network to create a translation from words to EQ settings. Using this technique, the machine learning model can also generate EQ settings for semantic descriptors that it has not seen before. We perform experiments to demonstrate the feasibility of this idea. 
In addition, we compare the EQ settings of humans with the predictions of the neural network to evaluate the quality of predictions. The results showed that the embedding layer enables the neural network to understand semantic descriptors. We observed that the models with embedding layers perform better than those without embedding layers, but not as well as human labels.	翻訳日:2022-02-21 14:54:29 公開日:2022-02-17
# 付加目的量を用いた評価最適化のための分散アルゴリズム A Distributed Algorithm for Measure-valued Optimization with Additive Objective ( http://arxiv.org/abs/2202.08930v1 ) ライセンス: Link先を確認	Iman Nodozi, Abhishek Halder	(参考訳) 本稿では,付加目的の測度値最適化問題を解くための分散非パラメトリックアルゴリズムを提案する。このような問題は、非正規化された平均場ニューラルネットワーク学習とワッサーシュタイン勾配流からランゲヴィンをサンプリングするなど、確率的学習と制御のいくつかの文脈で発生する。提案アルゴリズムは,乗算器の2層交互方向法(ADMM)を含む。外側層ADMMはユークリッドコンセンサスADMMをワッサーシュタインコンセンサスADMMに一般化し、エントロピー正規化バージョンSinkhornコンセンサスADMMに一般化する。内層ADMMは標準ユークリッドADMMの特定の例であることが判明した。全体アルゴリズムは、確率測度の多様体内の勾配流れに対する作用素分割を実現する。 We propose a distributed nonparametric algorithm for solving measure-valued optimization problems with additive objectives. Such problems arise in several contexts in stochastic learning and control including Langevin sampling from an unnormalized prior, mean field neural network learning and Wasserstein gradient flows. The proposed algorithm comprises a two-layer alternating direction method of multipliers (ADMM). The outer-layer ADMM generalizes the Euclidean consensus ADMM to the Wasserstein consensus ADMM, and to its entropy-regularized version Sinkhorn consensus ADMM. The inner-layer ADMM turns out to be a specific instance of the standard Euclidean ADMM. The overall algorithm realizes operator splitting for gradient flows in the manifold of probability measures.	翻訳日:2022-02-21 14:54:10 公開日:2022-02-17
# 複数入力関数を考慮した部分微分演算子モデリングのための拡張DeepONet Enhanced DeepONet for Modeling Partial Differential Operators Considering Multiple Input Functions ( http://arxiv.org/abs/2202.08942v1 ) ライセンス: Link先を確認	Lesley Tan and Liang Chen	(参考訳) 機械学習、特にディープラーニングは、様々な認知アプリケーションにおけるブレークスルーのパフォーマンスのために注目されている。近年、ニューラルネットワーク(NN)は偏微分方程式をモデル化するために集中的に研究されており、非線形関数の普遍近似器と見なすことができる。偏微分方程式(pde)に対する一般的な非線形連続作用素をモデル化するために、ディープ・ネットワーク・オペレータ(deeponet)アーキテクチャが提案されている。しかし、既存のdeeponetは1つの入力関数しか受け入れず、アプリケーションを制限することができる。本研究では,2つ以上の入力関数を受け入れるように拡張するDeepONetアーキテクチャについて検討する。本稿では,2つの入力関数を2つのブランチDNNサブネットワークで表現し,内積を介して出力トランクネットワークに接続して,ニューラルネットワーク全体の出力を生成する,拡張DeepONetまたはEDeepONetハイレベルニューラルネットワーク構造を提案する。提案するEDeepONet構造は,複数の入力関数を扱うために容易に拡張できる。 2つの偏微分方程式の例をモデル化した結果,提案する拡張deeponetは,完全連結ニューラルネットワークよりも約7x-17xあるいは約1桁精度が高く,トレーニングとテストの両方において単純な拡張deeponetよりも約2x-3倍精度が高いことがわかった。 Machine learning, especially deep learning is gaining much attention due to the breakthrough performance in various cognitive applications. Recently, neural networks (NN) have been intensively explored to model partial differential equations as NN can be viewed as universal approximators for nonlinear functions. A deep network operator (DeepONet) architecture was proposed to model the general non-linear continuous operators for partial differential equations (PDE) due to its better generalization capabilities than existing mainstream deep neural network architectures. However, existing DeepONet can only accept one input function, which limits its application. In this work, we explore the DeepONet architecture to extend it to accept two or more input functions. We propose a new Enhanced DeepONet or EDeepONet high-level neural network structure, in which two input functions are represented by two branch DNN sub-networks, which are then connected with output trunk network via inner product to generate the output of the whole neural network. The proposed EDeepONet structure can be easily extended to deal with multiple input functions. 
Our numerical results on modeling two partial differential equation examples show that the proposed enhanced DeepONet is about 7X-17X, or about one order of magnitude, more accurate than the fully connected neural network and about 2X-3X more accurate than a simple extended DeepONet in both training and testing.	翻訳日:2022-02-21 14:53:56 公開日:2022-02-17
# 約低ランクアイシングモデルサンプリング:MCMCは変分法を満たす Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods ( http://arxiv.org/abs/2202.08907v1 ) ライセンス: Link先を確認	Frederic Koehler and Holden Lee and Andrej Risteski	(参考訳) 我々は、一般的な相互作用行列 $j$ を持つ超キューブ上のイジングモデルを検討し、もし$o(1)$ の固有値以外が長さ 1 の間隔にある場合、多項式時間サンプリングアルゴリズムを与える。これは以前は、all固有値が長さ1の間隔に収まるグラウバー力学で知られていたが、一方のアウトリアーはグラウバー力学をひどく混ぜ合わせることができる。この結果から,低次元文脈のホップフィールドネットワークやベイズクラスタリングモデルなどの低次アイジングモデルに対する最初の多項式時間サンプリングアルゴリズムが提案され,拡張器グラフ上の不整合場を持つ反強磁性/強磁性アイジングモデルに対する多項式時間サンプリング方式が大幅に改善された。また、変分法および統計物理学における素平均場近似に基づいて、従来の近似アルゴリズムの結果を改善した。我々のアプローチは、MCMCと変分推論の世界からの新たなアイデアの融合に基づいている。アルゴリズムの一部として,分布の指数関数的再重み付けから負の定値二次形式でサンプル化できる新しい非凸変分問題を定義し,確率的勾配降下を用いてこの手順を効果的に行う方法を示す。この上に、大きな正の固有値によって生じる障害を克服し、それをSGDベースのサンプリング器と組み合わせて全問題を解決する、新しい模擬テンパリングチェーン(Hubbard-Stratonovich変換から生じる拡張状態空間)を構築する。 We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when all eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. 
As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard-Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.	翻訳日:2022-02-21 14:36:09 公開日:2022-02-17
# 心拍数推定のための機械学習モデルと顔領域ビデオ:特許,データセット,文献のレビュー Machine learning models and facial regions videos for estimating heart rate: a review on Patents, Datasets and Literature ( http://arxiv.org/abs/2202.08913v1 ) ライセンス: Link先を確認	Tiago Palma Pagano, Lucas Lemos Ortega, Victor Rocha Santos, Yasmin da Silva Bonfim, José Vinícius Dantas Paranhos, Paulo Henrique Miranda Sá, Lian Filipe Santana Nascimento, Ingrid Winkler, Erick Giovani Sperandio Nascimento	(参考訳) 心拍数の推定は、様々な状況のユーザを監視する上で重要である。顔画像に基づく推定は、非侵襲的な方法で心臓情報を監視することができ、デバイスがシンプルであるため、ユーザーの顔を撮影するカメラのみを必要とするため、ますます研究されている。ユーザーの顔のこれらのビデオから、機械学習は心拍数を推定することができる。本研究では、顔ビデオから心拍数を推定するために機械学習モデルを使用することの利点と課題について、特許、データセット、記事レビューを通して検討する。我々はderwent innovation, ieee xplore, scopus, web of science knowledge basesを検索し,心拍数,photoplethysmography,心電図データに関する7つの特許出願,11のデータセット,20の論文を特定した。特許に関しては,著者らによって述べられているように,心拍推定に関する発明の利点に留意する。データセットの面では、そのほとんどは学術的な目的と、心拍推定以外の対象をカバーできるさまざまなサインとアノテーションがあることがわかりました。論文の観点では,心拍数測定のための関心領域の抽出や,小さな運動抽出にビデオ倍率を用いる手法や,観察された個人の心拍数,信号抽出に最適な領域,処理方法などを抽出したevm-cnnやvgg-16などのモデルを発見した。 Estimating heart rate is important for monitoring users in various situations. Estimates based on facial videos are increasingly being researched because it makes it possible to monitor cardiac information in a non-invasive way and because the devices are simpler, requiring only cameras that capture the user's face. From these videos of the user's face, machine learning is able to estimate heart rate. This study investigates the benefits and challenges of using machine learning models to estimate heart rate from facial videos, through patents, datasets, and articles review. We searched Derwent Innovation, IEEE Xplore, Scopus, and Web of Science knowledge bases and identified 7 patent filings, 11 datasets, and 20 articles on heart rate, photoplethysmography, or electrocardiogram data. In terms of patents, we note the advantages of inventions related to heart rate estimation, as described by the authors. 
In terms of datasets, we discovered that most of them are for academic purposes and with different signs and annotations that allow coverage for subjects other than heartbeat estimation. In terms of articles, we discovered techniques, such as extracting regions of interest for heart rate reading and using Video Magnification for small motion extraction, and models such as EVM-CNN and VGG-16, that extract the observed individual's heart rate, the best regions of interest for signal extraction and ways to process them.	翻訳日:2022-02-21 14:33:45 公開日:2022-02-17
# 反復的な信念の変化計算的な変化 Iterated Belief Change, Computationally ( http://arxiv.org/abs/2202.08856v1 ) ライセンス: Link先を確認	Kai Sauerwald and Christoph Beierle	(参考訳) 反復的信念変化(英: iterated belief change)とは、信念のダイナミクスに関する原則を研究する研究領域である。本稿では,反復的信念変化がいかに計算に結びついているかを示す。特に,反復的信念修正は,ダルウィッチ・ピアール法のような広く受け入れられた原則の下でも,チューリング完全であることを示す。 Iterated Belief Change is the research area that investigates principles for the dynamics of beliefs over (possibly unlimited) many subsequent belief changes. In this paper, we demonstrate how iterated belief change is connected to computation. In particular, we show that iterative belief revision is Turing complete, even under the condition that broadly accepted principles like the Darwiche-Pearl postulates for iterated revision hold.	翻訳日:2022-02-21 14:31:07 公開日:2022-02-17
# Developing Imperceptible Adversarial Patches to Camouflage Military Assets From Computer Vision Enabled Technologies ( http://arxiv.org/abs/2202.08892v1 ) License: see linked page	Christopher Wise, Jo Plested	Convolutional neural networks (CNNs) have demonstrated rapid progress and a high level of success in object detection. However, recent evidence has highlighted their vulnerability to adversarial attacks. These attacks are calculated image perturbations or adversarial patches that result in object misclassification or detection suppression. Traditional camouflage methods are impractical when applied to disguise aircraft and other large mobile assets from autonomous detection in intelligence, surveillance and reconnaissance technologies and fifth generation missiles. In this paper we present a unique method that produces imperceptible patches capable of camouflaging large military assets from computer vision-enabled technologies. We developed these patches by maximising object detection loss whilst limiting the patch's colour perceptibility. This work also aims to further the understanding of adversarial examples and their effects on object detection algorithms.	Translated: 2022-02-21 14:29:31 Published: 2022-02-17
# On Guiding Visual Attention with Language Specification ( http://arxiv.org/abs/2202.08926v1 ) License: see linked page	Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, and Anna Rohrbach	While real world challenges typically define visual categories with language words or phrases, most visual classification methods define categories with numerical indices. However, the language specification of the classes provides an especially useful prior for biased and noisy datasets, where it can help disambiguate what features are task-relevant. Recently, large-scale multimodal models have been shown to recognize a wide variety of high-level concepts from a language specification even without additional image training data, but they are often unable to distinguish classes for more fine-grained tasks. CNNs, in contrast, can extract subtle image features that are required for fine-grained discrimination, but will overfit to any bias or noise in datasets. Our insight is to use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors. To do this, we ground task-relevant words or phrases with attention maps from a pretrained large-scale model. We then use this grounding to supervise a classifier's spatial attention away from distracting context. We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data, including about 3-15% worst-group accuracy improvements and 41-45% relative improvements on fairness metrics.	Translated: 2022-02-21 14:29:16 Published: 2022-02-17
# Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network ( http://arxiv.org/abs/2202.08867v1 ) License: see linked page	Yun Da Tsai, Shou De Lin	This work addresses the efficiency concern on inferring a nonlinear contextual bandit when the number of arms $n$ is very large. We propose a neural bandit model with an end-to-end training process to efficiently perform bandit algorithms such as Thompson Sampling and UCB during inference. We advance state-of-the-art time complexity to $O(\log n)$ with approximate Bayesian inference, neural random feature mapping, approximate global maxima and approximate nearest neighbor search. We further propose a generative adversarial network to shift the bottleneck of maximizing the objective for selecting optimal arms from inference time to training time, enjoying significant speedup with the additional advantage of enabling batch and parallel processing. The generative model can infer an approximate argmax of the posterior sampling in logarithmic time complexity with the help of approximate nearest neighbor search. Extensive experiments on classification and recommendation tasks demonstrate an order-of-magnitude improvement in inference time with no significant degradation in performance.	Translated: 2022-02-21 14:25:58 Published: 2022-02-17
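To make the inference bottleneck in the entry above concrete, the following is a minimal sketch of one Thompson Sampling step with an exhaustive argmax over all arms. It is not the authors' neural model: the linear payoff, the independent Gaussian posterior, the dimensions, and every name in the code are assumptions chosen for illustration. The point is only that scoring each of the $n$ arms costs time linear in $n$, which is the step the abstract proposes to replace with approximate nearest neighbor search.

```python
import random

def thompson_select(arm_features, post_mean, post_std, rng):
    """One Thompson Sampling step for a linear payoff model (illustrative only)."""
    # Draw one parameter vector from an assumed independent Gaussian posterior.
    theta = [rng.gauss(m, s) for m, s in zip(post_mean, post_std)]
    # Score every arm with an inner product: this loop is linear in the number of arms.
    scores = [sum(x * t for x, t in zip(arm, theta)) for arm in arm_features]
    # Exhaustive argmax over all arms.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, theta, scores

rng = random.Random(0)
n_arms, dim = 1000, 8  # hypothetical problem size
arms = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_arms)]
chosen, theta, scores = thompson_select(arms, [0.0] * dim, [1.0] * dim, rng)
```

Because the selected arm maximizes an inner product with the sampled vector, the argmax can in principle be posed as a maximum inner product search, which is where an approximate nearest neighbor index would plug in.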
# Continuous-Time vs. Discrete-Time Vision-based SLAM: A Comparative Study ( http://arxiv.org/abs/2202.08894v1 ) License: see linked page	Giovanni Cioffi, Titus Cieslewski, Davide Scaramuzza	Robotic practitioners generally approach the vision-based SLAM problem through discrete-time formulations. This has the advantage of a consolidated theory and very good understanding of success and failure cases. However, discrete-time SLAM needs tailored algorithms and simplifying assumptions when high-rate and/or asynchronous measurements, coming from different sensors, are present in the estimation process. Conversely, continuous-time SLAM, often overlooked by practitioners, does not suffer from these limitations. Indeed, it allows integrating new sensor data asynchronously without adding a new optimization variable for each new measurement. In this way, the integration of asynchronous or continuous high-rate streams of sensor data does not require tailored and highly-engineered algorithms, enabling the fusion of multiple sensor modalities in an intuitive fashion. On the downside, continuous time introduces a prior that could worsen the trajectory estimates in some unfavorable situations. In this work, we aim at systematically comparing the advantages and limitations of the two formulations in vision-based SLAM. To do so, we perform an extensive experimental analysis, varying robot type, speed of motion, and sensor modalities. Our experimental analysis suggests that, independently of the trajectory type, continuous-time SLAM is superior to its discrete counterpart whenever the sensors are not time-synchronized. In the context of this work, we developed and open-sourced a modular and efficient software architecture containing state-of-the-art algorithms to solve the SLAM problem in discrete and continuous time.	Translated: 2022-02-21 14:19:11 Published: 2022-02-17
# When, Why, and Which Pretrained GANs Are Useful? ( http://arxiv.org/abs/2202.08937v1 ) License: see linked page	Timofey Grigoryev, Andrey Voynov, Artem Babenko	The literature has proposed several methods to finetune pretrained GANs on new datasets, which typically results in higher performance compared to training from scratch, especially in the limited-data regime. However, despite the apparent empirical benefits of GAN pretraining, its inner mechanisms were not analyzed in-depth, and understanding of its role is not entirely clear. Moreover, the essential practical details, e.g., selecting a proper pretrained GAN checkpoint, currently do not have rigorous grounding and are typically determined by trial and error. This work aims to dissect the process of GAN finetuning. First, we show that initializing the GAN training process by a pretrained checkpoint primarily affects the model's coverage rather than the fidelity of individual samples. Second, we explicitly describe how pretrained generators and discriminators contribute to the finetuning process and explain the previous evidence on the importance of pretraining both of them. Finally, as an immediate practical benefit of our analysis, we describe a simple recipe to choose an appropriate GAN checkpoint that is the most suitable for finetuning to a particular target task. Importantly, for most of the target tasks, an ImageNet-pretrained GAN, despite having poor visual quality, appears to be an excellent starting point for finetuning, resembling the typical pretraining scenario of discriminative computer vision models.	Translated: 2022-02-21 13:15:48 Published: 2022-02-17
# SGPT: GPT Sentence Embeddings for Semantic Search ( http://arxiv.org/abs/2202.08904v1 ) License: see linked page	Niklas Muennighoff	GPT transformers are the largest language models available, yet semantic search is dominated by BERT transformers. We present SGPT-BE and SGPT-CE for applying GPT models as Bi-Encoders or Cross-Encoders to symmetric or asymmetric search. SGPT-BE produces semantically meaningful sentence embeddings by contrastive fine-tuning of only bias tensors and a novel pooling method. A 5.8 billion parameter SGPT-BE outperforms the best available sentence embeddings by 6%, setting a new state-of-the-art on BEIR. It outperforms the concurrently proposed OpenAI Embeddings of the 175B Davinci endpoint, which fine-tunes 250,000 times more parameters. SGPT-CE uses log probabilities from GPT models without any fine-tuning. A 6.1 billion parameter SGPT-CE sets an unsupervised state-of-the-art on BEIR. It beats the supervised state-of-the-art on 7 datasets, but significantly loses on other datasets. We show how this can be alleviated by adapting the prompt. SGPT-BE and SGPT-CE performance scales with model size. Yet, increased latency, storage and compute costs should be considered. Code, models and result files are freely available at https://github.com/Muennighoff/sgpt.	Translated: 2022-02-21 13:03:19 Published: 2022-02-17
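The entry above says that SGPT-CE ranks with log probabilities from a language model and needs no fine-tuning. One plausible reading is query-likelihood ranking: score each document by the log probability of the query tokens given that document. The sketch below illustrates only that scoring pattern. A smoothed unigram model stands in for GPT, and the function names, the `vocab_size` parameter, and the toy documents are all assumptions rather than anything taken from the SGPT code.

```python
import math
from collections import Counter

def query_log_prob(query, document, vocab_size=10_000):
    """Sum of log P(token | document) under an add-one-smoothed unigram model.

    The unigram model is a stand-in for a real language model; vocab_size is
    a hypothetical smoothing parameter.
    """
    doc_tokens = document.lower().split()
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in query.lower().split()
    )

def rank(query, documents):
    """Return document indices ordered from highest to lowest query log probability."""
    scores = [query_log_prob(query, doc) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

docs = [
    "bias tensors are fine-tuned with a contrastive objective",
    "sentence embeddings support semantic search over documents",
    "storage and compute costs grow with model size",
]
order = rank("semantic search embeddings", docs)
```

With these toy inputs the second document ranks first, since it is the only one containing the query tokens. A real cross-encoder would replace `query_log_prob` with token log probabilities read from the language model, which is also why its latency grows with model size, as the abstract cautions.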
# BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs ( http://arxiv.org/abs/2202.08884v1 ) License: see linked page	Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato	While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.	Translated: 2022-02-21 13:02:45 Published: 2022-02-17
# Designing Effective Sparse Expert Models ( http://arxiv.org/abs/2202.08906v1 ) License: see linked page	Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus	Scale has opened new frontiers in natural language processing, but at a high cost. In response, Mixture-of-Experts (MoE) and Switch Transformers have been proposed as an energy-efficient path to even larger and more capable language models. But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning. Our work focuses on these issues and acts as a design guide. We conclude by scaling a sparse model to 269B parameters, with a computational cost comparable to a 32B dense encoder-decoder Transformer (Stable and Transferable Mixture-of-Experts or ST-MoE-32B). For the first time, a sparse model achieves state-of-the-art performance in transfer learning, across a diverse set of tasks including reasoning (SuperGLUE, ARC Easy, ARC Challenge), summarization (XSum, CNN-DM), closed book question answering (WebQA, Natural Questions), and adversarially constructed tasks (Winogrande, ANLI R3).	Translated: 2022-02-21 12:48:40 Published: 2022-02-17
# Improving Intrinsic Exploration with Language Abstractions ( http://arxiv.org/abs/2202.08938v1 ) License: see linked page	Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette	Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.	Translated: 2022-02-21 12:42:46 Published: 2022-02-17
# Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications ( http://arxiv.org/abs/2202.08916v1 ) License: see linked page	Kexin Ding, Mu Zhou, Zichen Wang, Qiao Liu, Corey W. Arnold, Shaoting Zhang, Dimitri N. Metaxas	Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These capabilities of GCNs have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent developments in GCNs in the context of medical image analysis, including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present technical advancements in GCNs and emerging medical applications, and we identify common challenges in the use of image-based GCNs and their extensions in model interpretation and large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research.	Translated: 2022-02-21 12:41:56 Published: 2022-02-17
# (Reference translation) Robust Reinforcement Learning via Genetic Curriculum ( http://arxiv.org/abs/2202.08393v1 ) License: CC BY 4.0	Yeeho Song, Jeff Schneider	Achieving robust performance is crucial when applying deep reinforcement learning (RL) in safety-critical systems. Some of the state-of-the-art approaches try to address the problem with adversarial agents, but these agents often require expert supervision to fine-tune and prevent the adversary from becoming too challenging for the trainee agent. While other approaches involve automatically adjusting environment setups during training, they have been limited to simple environments where low-dimensional encodings can be used. Inspired by these approaches, we propose genetic curriculum, an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum to help the agent learn to solve the scenarios and acquire more robust behaviors. As a non-parametric optimizer, our approach uses a raw, non-fixed encoding of scenarios, reducing the need for expert supervision and allowing our algorithm to adapt to the changing performance of the agent. Our empirical studies show improvement in robustness over the existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail without sacrificing cumulative reward. We include an ablation study and share insights on why our algorithm outperforms prior approaches.	Translated: 2022-02-19 05:06:18 Published: 2022-02-17
# (Reference translation) Augment with Care: Contrastive Learning for the Boolean Satisfiability Problem ( http://arxiv.org/abs/2202.08396v1 ) License: CC BY 4.0	Haonan Duan, Pashootan Vaezipoor, Max B. Paulus, Yangjun Ruan and Chris J. Maddison	Supervised learning can improve the design of state-of-the-art solvers for combinatorial problems, but labelling large numbers of combinatorial instances is often impractical due to exponential worst-case complexity. Inspired by the recent success of contrastive pre-training for images, we conduct a scientific study of the effect of augmentation design on contrastive pre-training for the Boolean satisfiability problem. While typical graph contrastive pre-training uses label-agnostic augmentations, our key insight is that many combinatorial problems have well-studied invariances, which allow for the design of label-preserving augmentations. We find that label-preserving augmentations are critical for the success of contrastive pre-training. We show that our representations are able to achieve comparable test accuracy to fully-supervised learning while using only 1% of the labels. We also demonstrate that our representations are more transferable to larger problems from unseen domains.	Translated: 2022-02-19 04:51:40 Published: 2022-02-17
# (Reference translation) Federated Stochastic Gradient Descent Begets Self-Induced Momentum ( http://arxiv.org/abs/2202.08402v1 ) License: CC BY 4.0	Howard H. Yang, Zuozhu Liu, Yaru Fu, Tony Q. S. Quek, H. Vincent Poor	Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm and also forge a link between staleness analysis and federated computing systems, which can be useful for system designers.	Translated: 2022-02-19 04:29:33 Published: 2022-02-17
# (Reference translation) AutoScore-Ordinal: An interpretable machine learning framework for generating scoring models for ordinal outcomes ( http://arxiv.org/abs/2202.08407v1 ) License: CC BY 4.0	Seyed Ehsan Saffari, Yilin Ning, Xie Feng, Bibhas Chakraborty, Victor Volovici, Roger Vaughan, Marcus Eng Hock Ong, Nan Liu	Background: Risk prediction models are useful tools in clinical decision-making that help with risk stratification and resource allocation and may lead to better health care for patients. AutoScore is a machine learning-based automatic clinical score generator for binary outcomes. This study aims to expand the AutoScore framework to provide a tool for interpretable risk prediction for ordinal outcomes. Methods: The AutoScore-Ordinal framework is built using the same 6 modules as the original AutoScore algorithm: variable ranking, variable transformation, score derivation (from proportional odds models), model selection, score fine-tuning, and model evaluation. To illustrate the AutoScore-Ordinal performance, the method was applied to electronic health records data from the emergency department at Singapore General Hospital from 2008 to 2017. The model was trained on 70% of the data, validated on 10% and tested on the remaining 20%. Results: This study included 445,989 inpatient cases, where the distribution of the ordinal outcome was 80.7% alive without 30-day readmission, 12.5% alive with 30-day readmission, and 6.8% died inpatient or by day 30 post discharge. Two point-based risk prediction models were developed using two sets of 8 predictor variables identified by the flexible variable selection procedure. The two models indicated reasonably good performance measured by mean area under the receiver operating characteristic curve (0.785 and 0.793) and generalized c-index (0.737 and 0.760), which were comparable to alternative models. Conclusion: AutoScore-Ordinal provides an automated and easy-to-use framework for development and validation of risk prediction models for ordinal outcomes, which can systematically identify potential predictors from high-dimensional data.	Translated: 2022-02-19 04:19:57 Published: 2022-02-17
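The entry above derives its scores from proportional odds models. As a rough sketch of how a point score could map to probabilities over three ordered outcomes, the code below uses the standard proportional odds form $P(Y \le j \mid s) = \sigma(\theta_j - \beta s)$. The thresholds are chosen only so that a score of 0 reproduces the reported cohort distribution (80.7%, 12.5%, 6.8%). That calibration, the slope value, and the example score of 40 are assumptions for illustration, not fitted values from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def category_probs(score, thresholds, slope):
    """Proportional odds model: P(Y <= j) = sigmoid(theta_j - slope * score)."""
    # Cumulative probabilities for each threshold, plus 1.0 for the last category.
    cumulative = [sigmoid(t - slope * score) for t in thresholds] + [1.0]
    # Category probabilities are differences of consecutive cumulative probabilities.
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs

# Hypothetical thresholds: calibrated so that score 0 matches the cohort distribution.
thresholds = [logit(0.807), logit(0.807 + 0.125)]
baseline = category_probs(0.0, thresholds, slope=0.05)
high_risk = category_probs(40.0, thresholds, slope=0.05)
```

A single slope shared across all thresholds is what makes the odds proportional: raising the score shifts every cumulative probability in the same direction, so probability mass moves toward the worse outcome categories.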
# (Reference translation) Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs ( http://arxiv.org/abs/2202.08408v1 ) License: CC BY 4.0	Ming Jin, Yu Zheng, Yuan-Fang Li, Siheng Chen, Bin Yang, Shirui Pan	Multivariate time series forecasting has long received significant attention in real-world applications, such as energy consumption and traffic prediction. While recent methods demonstrate good forecasting abilities, they suffer from three fundamental limitations. (i) Discrete neural architectures: Interlacing individually parameterized spatial and temporal blocks to encode rich underlying patterns leads to discontinuous latent state trajectories and higher forecasting numerical errors. (ii) High complexity: Discrete approaches complicate models with dedicated designs and redundant parameters, leading to higher computational and memory overheads. (iii) Reliance on graph priors: Relying on predefined static graph structures limits their effectiveness and practicability in real-world applications. In this paper, we address all the above limitations by proposing a continuous model to forecast Multivariate Time series with dynamic Graph neural Ordinary Differential Equations (MTGODE). Specifically, we first abstract multivariate time series into dynamic graphs with time-evolving node features and unknown graph structures. Then, we design and solve a neural ODE to complement missing graph topologies and unify both spatial and temporal message passing, allowing deeper graph propagation and fine-grained temporal information aggregation to characterize stable and precise latent spatial-temporal dynamics. Our experiments demonstrate the superiority of MTGODE from various perspectives on five time series benchmark datasets.	Translated: 2022-02-19 04:05:40 Published: 2022-02-17
# (Reference translation) Entropic Associative Memory for Manuscript Symbols ( http://arxiv.org/abs/2202.08413v1 ) License: CC BY 4.0	Rafael Morales and Noé Hernández and Ricardo Cruz and Victor D. Cruz and Luis A. Pineda	Manuscript symbols can be stored, recognized and retrieved from an entropic digital memory that is associative and distributed yet declarative; memory retrieval is a constructive operation, memory cues to objects not contained in the memory are rejected directly without search, and memory operations can be performed through parallel computations. Manuscript symbols, both letters and numerals, are represented in Associative Memory Registers that have an associated entropy. The memory recognition operation obeys an entropy trade-off between precision and recall, and the entropy level affects the quality of the objects recovered through the memory retrieval operation. The present proposal is contrasted in several dimensions with neural network models of associative memory. We discuss the operational characteristics of the entropic associative memory for retrieving objects with both complete and incomplete information, such as severe occlusions. The experiments reported in this paper add evidence on the potential of this framework for developing practical applications and computational models of natural memory.	Translated: 2022-02-19 03:38:00 Published: 2022-02-17
# (参考訳) FPIC:光PCB保証のための新しいセマンティックデータセット FPIC: A Novel Semantic Dataset for Optical PCB Assurance ( http://arxiv.org/abs/2202.08414v1 ) ライセンス: CC BY 4.0	Nathan Jessurun, Olivia P. Dizon-Paradis, Jacob Harrison, Shajib Ghosh, Mark M. Tehranipoor, Damon L. Woodard, Navid Asadizanjani	(参考訳) 印刷基板(PCB)の海外へのアウトソーシングは、ハードウェア保証能力の向上を必要とした。この目的のために、過去にデジタルカメラを用いて取得したPCB画像の様々な側面を探求する自動光学検査(AOI)技術が提案されている。本研究では、最先端のAOI手法を概観し、機械学習(ML)ソリューションに対する強固で急激な傾向を観察した。これらは、公開されているPCBデータ空間に欠けている、大量のラベル付き真実データを必要とする。本稿では,FICS PCBイメージコレクション(FPIC)データセットを提案する。さらに、この研究は、ハードウェアセキュリティ能力の潜在的な増加と、データ収集中に強調された方法論的区別をカバーしている。 The continued outsourcing of printed circuit board (PCB) fabrication to overseas venues necessitates increased hardware assurance capabilities. Toward this end, several automated optical inspection (AOI) techniques have been proposed in the past exploring various aspects of PCB images acquired using digital cameras. In this work, we review state-of-the-art AOI techniques and observe a strong, rapid trend toward machine learning (ML) solutions. These require significant amounts of labeled ground truth data, which is lacking in the publicly available PCB data space. We propose the FICS PCB Image Collection (FPIC) dataset to address this bottleneck in available large-volume, diverse, semantic annotations. Additionally, this work covers the potential increase in hardware security capabilities and observed methodological distinctions highlighted during data collection.	翻訳日:2022-02-19 03:17:18 公開日:2022-02-17
# (参考訳) 制御可能な調和性と多声性を有するコードコンディショニングメロディ合唱 Chord-Conditioned Melody Choralization with Controllable Harmonicity and Polyphonicity ( http://arxiv.org/abs/2202.08423v1 ) ライセンス: CC BY 4.0	Shangda Wu, Xiaobing Li, Maosong Sun	(参考訳) メロディ合唱(メロディの合唱)、すなわちユーザ・ギヴン・メロディに基づく4パートの合唱は、長い間J.S.バッハ合唱と密接に関連していた。従来のニューラルネットワークベースのシステムは、コード進行を条件としたchorale生成にほとんど焦点を合わせず、いずれも制御可能なメロディ合唱を実現していなかった。ニューラルネットワークがバッハの合唱曲から対位法の一般的な原理を学べるように、コードコンディショニングのためのコードシンボルを符号化した音楽表現を最初に設計する。次に,コード進行に条件付きメロディのための4パート合唱を生成可能なメロディ合唱システムであるDeepChoirを提案する。さらに、密度サンプリングの改善により、ユーザはDeepChoirが生成するコラールの調和度やポリフォニック度を制御できる。実験結果から,高調波とポリフォニック性に対するDeepChoirのデータ表現の有効性と制御性を明らかにした。 DeepChoirのコードと生成されたサンプル(合唱曲、民謡、交響曲)、そして現在使用しているデータセットは https://github.com/sander-wood/deepchoir で入手できる。 Melody choralization, i.e. generating a four-part chorale based on a user-given melody, has long been closely associated with J.S. Bach chorales. Previous neural network-based systems rarely focus on chorale generation conditioned on a chord progression, and none of them has realised controllable melody choralization. To enable neural networks to learn the general principles of counterpoint from Bach's chorales, we first design a music representation that encodes chord symbols for chord conditioning. We then propose DeepChoir, a melody choralization system, which can generate a four-part chorale for a given melody conditioned on a chord progression. Furthermore, with the improved density sampling, a user can control the extent of harmonicity and polyphonicity for the chorale generated by DeepChoir. Experimental results reveal the effectiveness of our data representation and the controllability of DeepChoir over harmonicity and polyphonicity. The code and generated samples (chorales, folk songs and a symphony) of DeepChoir, and the dataset we use, are now available at https://github.com/sander-wood/deepchoir.	翻訳日:2022-02-19 02:59:47 公開日:2022-02-17
# (参考訳) オンライン線形回帰としての合成制御 Synthetic Control As Online Linear Regression ( http://arxiv.org/abs/2202.08426v1 ) ライセンス: CC BY 4.0	Jiafeng Chen	(参考訳) 本稿では,合成制御とオンライン学習の単純な関係について述べる。具体的には、FTL(Follow-The-Leader)の例として合成制御を認識する。オンライン凸最適化における標準結果から, 対向的な結果が選択された場合でも, 処理単位に対する対実結果の合成制御予測は, 制御単位の結果のオラクル重み付き平均とほぼ同等に実行されることが示唆された。差分データに対する合成制御は、オラクル重み付き差分差分とほぼ同等に動作する。この観察は、比較ケーススタディにおける合成制御推定器の使用をさらに支援していると論じる。 This paper notes a simple connection between synthetic control and online learning. Specifically, we recognize synthetic control as an instance of Follow-The-Leader (FTL). Standard results in online convex optimization then imply that, even when outcomes are chosen by an adversary, synthetic control predictions of counterfactual outcomes for the treated unit perform almost as well as an oracle weighted average of control units' outcomes. Synthetic control on differenced data performs almost as well as oracle weighted difference-in-differences. We argue that this observation further supports the use of synthetic control estimators in comparative case studies.	翻訳日:2022-02-19 02:44:39 公開日:2022-02-17
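The Follow-The-Leader reading of synthetic control described in the abstract above can be illustrated with a minimal sketch. It assumes squared-error loss and weights constrained to the probability simplex, solved by projected gradient descent; the function names (`project_to_simplex`, `ftl_weights`, `synthetic_control_predictions`) and the solver choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ftl_weights(controls, treated, steps=500):
    """Follow-The-Leader step: simplex weights minimizing the cumulative
    squared error over all periods seen so far.

    controls: (t, n) outcomes of n control units; treated: (t,) outcomes.
    """
    n = controls.shape[1]
    w = np.full(n, 1.0 / n)                   # uniform start
    if controls.shape[0] == 0:
        return w                              # nothing observed yet
    lipschitz = np.linalg.norm(controls, 2) ** 2
    if lipschitz == 0.0:
        return w
    for _ in range(steps):
        grad = controls.T @ (controls @ w - treated)
        w = project_to_simplex(w - grad / lipschitz)
    return w

def synthetic_control_predictions(controls, treated):
    """Predict the treated unit in period t from weights fit on periods < t."""
    preds = np.empty(len(treated))
    for t in range(len(treated)):
        w = ftl_weights(controls[:t], treated[:t])
        preds[t] = controls[t] @ w
    return preds
```

In this sketch each prediction uses only earlier periods, which is what makes the procedure an online learner; the regret comparison against an oracle weighted average mentioned in the abstract is not reproduced here.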
# (参考訳) AKB-48: 実世界のArticulated Object Knowledge Base AKB-48: A Real-World Articulated Object Knowledge Base ( http://arxiv.org/abs/2202.08432v1 ) ライセンス: CC BY 4.0	Liu Liu, Wenqiang Xu, Haoyuan Fu, Sucheng Qian, Yang Han, Cewu Lu	(参考訳) 人間の生活は明瞭な物体で占められている。表現された物体の包括的理解、すなわち外観、構造、物理学的性質、意味論は、多くの研究コミュニティに利益をもたらすだろう。現在の調音オブジェクト理解ソリューションは、通常、物理特性のないCADモデルによる合成オブジェクトデータセットに基づいており、視覚およびロボット工学のタスクにおけるシミュレーションから実世界の応用への満足のいく一般化を防ぐ。このギャップを埋めるために、48のカテゴリからなる実世界3次元関節オブジェクトモデル2,037からなる大規模関節オブジェクト知識ベースであるakb-48を提案する。各オブジェクトは知識グラフArtiKGによって記述される。 akb-48を構築するために,高速な調音知識モデリング(farm)パイプラインを提案する。このパイプラインは10～15分で調音オブジェクトのarikgを満たし,実世界でのオブジェクトモデリングのコストを大幅に削減する。提案するAKBNetは,C-VAM(Calegory-level Visual Articulation Manipulation)タスクのための新しい積分パイプラインであり,ポーズ推定,オブジェクト再構成,操作という3つのサブタスクをベンチマークする。データセット、コード、モデルはhttps://liuliu66.github.io/articulationobjects/で公開されている。 Human life is populated with articulated objects. A comprehensive understanding of articulated objects, namely appearance, structure, physics properties, and semantics, will benefit many research communities. Current articulated object understanding solutions are usually based on synthetic object datasets with CAD models that lack physics properties, which prevents satisfactory generalization from simulation to real-world applications in visual and robotics tasks. To bridge the gap, we present AKB-48: a large-scale Articulated object Knowledge Base which consists of 2,037 real-world 3D articulated object models of 48 categories. Each object is described by a knowledge graph ArtiKG. To build the AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can fulfill the ArtiKG for an articulated object within 10-15 minutes, and largely reduce the cost for object modeling in the real world. Using our dataset, we propose AKBNet, a novel integral pipeline for Category-level Visual Articulation Manipulation (C-VAM) task, in which we benchmark three sub-tasks, namely pose estimation, object reconstruction and manipulation.
Dataset, codes, and models will be publicly available at https://liuliu66.github.io/articulationobjects/.	翻訳日:2022-02-19 02:23:34 公開日:2022-02-17
# (参考訳) 説明可能な強化学習に関する調査 A Survey of Explainable Reinforcement Learning ( http://arxiv.org/abs/2202.08434v1 ) ライセンス: CC BY 4.0	Stephanie Milani and Nicholay Topin and Manuela Veloso and Fei Fang	(参考訳) 説明可能な強化学習(XRL)は説明可能な機械学習の新たなサブフィールドであり,近年注目されている。 xrlの目標は、学習エージェントの逐次意思決定設定における意思決定過程を明らかにすることである。本稿では,RL設定を優先するXRL文献を整理するための新しい分類法を提案する。私たちはこの分類法に従ってテクニックを概説する。将来の仕事のロードマップをモチベーションにし、概説するために使用する文献のギャップを指摘します。 Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine learning that has attracted considerable attention in recent years. The goal of XRL is to elucidate the decision-making process of learning agents in sequential decision-making settings. In this survey, we propose a novel taxonomy for organizing the XRL literature that prioritizes the RL setting. We overview techniques according to this taxonomy. We point out gaps in the literature, which we use to motivate and outline a roadmap for future work.	翻訳日:2022-02-19 02:10:26 公開日:2022-02-17
# (参考訳) 前立腺癌のスライド画像全体を調べる病理医の視覚的注意分析 Visual attention analysis of pathologists examining whole slide images of Prostate cancer ( http://arxiv.org/abs/2202.08437v1 ) ライセンス: CC BY 4.0	Souradeep Chakraborty, Ke Ma, Rajarsi Gupta, Beatrice Knudsen, Gregory J. Zelinsky, Joel H. Saltz, Dimitris Samaras	(参考訳) 本研究は,前立腺癌組織の全スライディング画像(WSI)をデジタル顕微鏡を用いて検討する。我々の知る限りでは、病理学者が前立腺がんのWSIをどのようにナビゲートし、診断に関する情報を蓄積するかを報告するのは初めてです。本研究は,GU専門医5名と一般病理医8名からなる13名の病理医からスライドナビゲーションデータ(ビューポート位置,拡大レベル,時間)を収集し,視覚的注意熱マップとスキャンパスを生成した。各病理医は、GUの病理専門医が選択したTCGA PRADデータセットから5つのWSIを検査した。 wsi検査後の病理医群における視覚注意の分布について検討・解析した。 WSIにおける病理医の注意とがんの証拠との関係を定量化するために, 生殖器専門医から腫瘍のアノテーションを得た。これらのアノテーションを用いて視覚注意の分布と腫瘍領域との重なりを計算し,強い相関関係を同定した。この分析により,未知のWSIに対する視覚的注意を予測するために,ディープラーニングモデルを訓練した。本モデルによって予測された注意熱マップは, 様々な空間的, 時間的評価指標を用いて17wsisの検査群において, 基底真理注意熱マップや腫瘍注釈と非常によく相関することがわかった。 We study the attention of pathologists as they examine whole-slide images (WSIs) of prostate cancer tissue using a digital microscope. To the best of our knowledge, our study is the first to report in detail how pathologists navigate WSIs of prostate cancer as they accumulate information for their diagnoses. We collected slide navigation data (i.e., viewport location, magnification level, and time) from 13 pathologists in 2 groups (5 genitourinary (GU) specialists and 8 general pathologists) and generated visual attention heatmaps and scanpaths. Each pathologist examined five WSIs from the TCGA PRAD dataset, which were selected by a GU pathology specialist. We examined and analyzed the distributions of visual attention for each group of pathologists after each WSI was examined. To quantify the relationship between a pathologist's attention and evidence for cancer in the WSI, we obtained tumor annotations from a genitourinary specialist. We used these annotations to compute the overlap between the distribution of visual attention and annotated tumor region to identify strong correlations. 
Motivated by this analysis, we trained a deep learning model to predict visual attention on unseen WSIs. We find that the attention heatmaps predicted by our model correlate quite well with the ground truth attention heatmap and tumor annotations on a test set of 17 WSIs by using various spatial and temporal evaluation metrics.	翻訳日:2022-02-19 01:56:58 公開日:2022-02-17
# (参考訳) 深層強化学習に基づく適応と一般化に関する研究 A Survey on Deep Reinforcement Learning-based Approaches for Adaptation and Generalization ( http://arxiv.org/abs/2202.08444v1 ) ライセンス: CC BY 4.0	Pamul Yadav, Ashutosh Mishra, Junyong Lee, Shiho Kim	(参考訳) Deep Reinforcement Learning (DRL)は、現実世界の環境で複雑な問題を効率的に解けるインテリジェントエージェントを作ることを目的としている。通常、適応と一般化という2つの学習目標が、DRLアルゴリズムの性能を異なるタスクや領域で基礎付けるために使われる。本稿では,DRLに基づく適応と一般化に向けた最近の研究動向について述べる。まず、これらの目標をタスクとドメインのコンテキストで定式化する。次に,これらの手法による最近の研究成果を概観し,DRLアルゴリズムの適応性と一般化性を向上し,現実世界の幅広い問題に適用できる可能性について論じる。 Deep Reinforcement Learning (DRL) aims to create intelligent agents that can learn to solve complex problems efficiently in a real-world environment. Typically, two learning goals, adaptation and generalization, are used to baseline the performance of DRL algorithms on different tasks and domains. This paper presents a survey on the recent developments in DRL-based approaches for adaptation and generalization. We begin by formulating these goals in the context of task and domain. Then we review the recent works under those approaches and discuss future research directions through which DRL algorithms' adaptability and generalizability can be enhanced, potentially making them applicable to a broad range of real-world problems.	翻訳日:2022-02-19 01:48:45 公開日:2022-02-17
# (参考訳) Design-Bench: データ駆動のオフラインモデルベース最適化のためのベンチマーク Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization ( http://arxiv.org/abs/2202.08450v1 ) ライセンス: CC BY 4.0	Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine	(参考訳) black-box model-based optimization(mbo)問題は、未知の目的関数を最大化する設計入力を見つけることを目的としており、タンパク質の設計、dna配列、航空機、ロボットなど、幅広い領域においてユビキタスである。モデルに基づく最適化問題を解決するには、通常、設計提案において未知の目的関数を積極的に問い合わせる必要があり、つまり、候補分子、航空機、ロボットを物理的に構築し、それをテストし、結果を格納する。このプロセスは高価で時間がかかり、代わりに、既に持っているデータのみを使用して最適な設計のために最適化することを好むかもしれません。この設定はオフラインmboと呼ばれ、より一般的に研究されているオンライン技術とは異なるアルゴリズム上の課題をもたらす。近年の多くの研究は、高容量ディープニューラルネットワークを用いた高次元最適化問題に対するオフラインMBOの成功を示している。しかし、この新興分野における標準ベンチマークの欠如は、追跡を困難にしている。そこで本稿では,評価プロトコルを統一したオフラインmboベンチマークであるdesign-benchと,最近の手法のリファレンス実装を提案する。私たちのベンチマークには、生物学、材料科学、ロボット工学における現実世界の最適化問題から派生した、多様な現実的なタスクが含まれています。ベンチマークおよびリファレンス実装はgithub.com/rail-berkeley/design-benchおよびgithub.com/rail-berkeley/design-baselinesでリリースしています。 Black-box model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function, are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots. Solving model-based optimization problems typically requires actively querying the unknown objective function on design proposals, which means physically building the candidate molecule, aircraft, or robot, testing it, and storing the result. This process can be expensive and time consuming, and one might instead prefer to optimize for the best design using only the data one already has. This setting -- called offline MBO -- poses substantial and different algorithmic challenges than more commonly studied online techniques. A number of recent works have demonstrated success with offline MBO for high-dimensional optimization problems using high-capacity deep neural networks. However, the lack of standardized benchmarks in this emerging field is making progress difficult to track. 
To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics that present distinct challenges for offline MBO. Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines.	翻訳日:2022-02-19 01:26:54 公開日:2022-02-17
# (参考訳) 科学的成功の遺伝子 The Gene of Scientific Success ( http://arxiv.org/abs/2202.08461v1 ) ライセンス: CC BY 4.0	Xiangjie Kong, Jun Zhang, Da Zhang, Yi Bu, Ying Ding, Feng Xia	(参考訳) 本稿では,科学的影響を改善するための因果要因の同定と評価方法について詳述する。現在、科学的影響の分析は、資金提供申請、メンター推薦、潜在的な協力者発見など様々な学術活動に有用である。ハイインパクトの学者は勤勉な仕事の奨励として賞を受ける機会が多すぎることが広く認められている。そのため、学者は科学的な業績を上げ、学問生活における科学的影響を改善することに多大な努力を捧げている。しかし,研究者の学業成功を左右する要因は何か。この問いへの答えは、研究者がより効率的に研究を行うのに役立つ。そこで本研究では,研究者の学術的成功に不可欠な要因を提示し,分析する。まず,記事中心因子,著者中心因子,会場中心因子,施設中心因子,時間的要因を含む5つの主要な要因を提案する。次に,最近の機械学習アルゴリズムとジャックナイフ法を適用し,各因果因子の重要性を評価する。その結果,著者中心および記事中心の要因は,コンピュータ科学分野における研究者の今後の成功に最も寄与することが示された。さらに、同じ機関や大学内の研究者のh-インデックスが互いに非常に近いという興味深い現象が発見された。 This paper elaborates how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities, including funding applications, mentor recommendation, and the discovery of potential collaborators. It is universally acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard work. Therefore, scholars devote great effort to making scientific achievements and improving scientific impact during their academic life. However, what are the determining factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. Under this consideration, our paper presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors including article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and the jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors are the most relevant to scholars' future success in the computer science area.
Additionally, we discover an interesting phenomenon: the h-indices of scholars within the same institution or university are very close to each other.	翻訳日:2022-02-19 00:56:07 公開日:2022-02-17
# (参考訳) フルスパンログ線形モデルと高速学習アルゴリズム Full-Span Log-Linear Model and Fast Learning Algorithm ( http://arxiv.org/abs/2202.08472v1 ) ライセンス: CC BY-SA 4.0	Kazuya Takabatake, Shotaro Akaho	(参考訳) 本論文で導入されたフルスパン対数線形(fsll)モデルは、ターゲットシステム内の全ての変数の数が$n$であるボルツマンマシンと見なされる。 X = (X_0, ..., X_{n-1})$ を$\|X\|=\|X_0\|...\|X_{n-1}\|$異なる値を取る有限離散確率変数とする。 FSLLモデルは$\|X\|-1$パラメータを持ち、任意の正の分布を$X$で表すことができる。 FSLLモデルは「高階」ボルツマンマシンであるが、指数族において重要な役割を果たすモデル分布の双対パラメータを$O(\|X\|\log\|X\|)$ timeで計算することができる。さらに、FSLLモデルの双対パラメータの特性を用いて、効率的な学習アルゴリズムを構築することができる。 FSLLモデルは、最大$\|X\|\approx2^{25}$までの小さな確率モデルに制限されているが、この問題領域では、FSLLモデルはハイパーパラメータチューニングなしで、トレーニングデータの基礎となる様々な真の分布に柔軟に適合する。実験の結果、FSLLはラップトップPCで1分以内に$\|X\|=2^{20}$のような6つのトレーニングデータセットを学習した。 The full-span log-linear (FSLL) model introduced in this paper is considered an $n$-th order Boltzmann machine, where $n$ is the number of all variables in the target system. Let $X=(X_0,...,X_{n-1})$ be finite discrete random variables that can take $\|X\|=\|X_0\|...\|X_{n-1}\|$ different values. The FSLL model has $\|X\|-1$ parameters and can represent arbitrary positive distributions of $X$. The FSLL model is a "highest-order" Boltzmann machine; nevertheless, we can compute the dual parameters of the model distribution, which play important roles in exponential families, in $O(\|X\|\log\|X\|)$ time. Furthermore, using properties of the dual parameters of the FSLL model, we can construct an efficient learning algorithm. The FSLL model is limited to small probabilistic models up to $\|X\|\approx2^{25}$; however, in this problem domain, the FSLL model flexibly fits various true distributions underlying the training data without any hyperparameter tuning. The experiments showed that the FSLL model successfully learned six training datasets with $\|X\|=2^{20}$ within one minute on a laptop PC.	翻訳日:2022-02-19 00:40:36 公開日:2022-02-17
# (参考訳) パラフレーズ生成の評価基準の再検討 Revisiting the Evaluation Metrics of Paraphrase Generation ( http://arxiv.org/abs/2202.08479v1 ) ライセンス: CC BY 4.0	Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi	(参考訳) パラフレーズ生成は近年大きな進歩を遂げた重要なNLPタスクである。しかし、重要な問題の一つが「パラフレーズの品質をどのように評価するか?」である。ほとんどの既存のパラフレーズ生成モデルは、ニューラルネットワーク翻訳(NMT)から参照ベースのメトリクス(BLEUなど)を使用して、生成されたパラフレーズを評価する。このようなメトリクスの信頼性はほとんど評価されておらず、標準参照が存在する場合にのみ妥当である。そこで本稿では,まず「既存のメトリクスはパラフレーズ生成に信頼性があるか?」という問いに答える。パラフレーズ生成における従来の知恵に反する2つの結論を提示する。(1)システムレベルとセグメントレベルのパラフレーズ評価において、既存のメトリクスは人間のアノテーションと不一致である。 2) 基準のないメトリクスは基準ベースのメトリクスよりも優れており、パラフレーズの品質を評価するのに標準参照は不要であることを示している。このような経験的発見は、信頼性の高い自動評価指標の欠如を露呈する。そこで本稿では,生成したパラフレーズの品質を反映した参照フリーメトリックであるBBScoreを提案する。 BBScoreはS3CスコアとSelfBLEUの2つのサブメトリックから構成されており、これは意味的保存と多様性の2つの基準に対応する。 2つのサブメトリックを接続することで、BBScoreは既存のパラフレーズ評価指標を大幅に上回る。 Paraphrase generation is an important NLP task that has achieved significant progress recently. However, one crucial question is overlooked: `how to evaluate the quality of paraphrase?'. Most existing paraphrase generation models use reference-based metrics (e.g., BLEU) from neural machine translation (NMT) to evaluate their generated paraphrase. The reliability of such metrics has hardly been evaluated, and they are only plausible when there exists a standard reference. Therefore, this paper first answers one fundamental question, `Are existing metrics reliable for paraphrase generation?'. We present two conclusions that contradict conventional wisdom in paraphrase generation: (1) existing metrics poorly align with human annotation in system-level and segment-level paraphrase evaluation; (2) reference-free metrics outperform reference-based metrics, indicating that the standard references are unnecessary to evaluate the paraphrase's quality. Such empirical findings expose a lack of reliable automatic evaluation metrics. Therefore, this paper proposes BBScore, a reference-free metric that can reflect the generated paraphrase's quality.
BBScore consists of two sub-metrics: S3C score and SelfBLEU, which correspond to two criteria for paraphrase evaluation: semantic preservation and diversity. By connecting two sub-metrics, BBScore significantly outperforms existing paraphrase evaluation metrics.	翻訳日:2022-02-19 00:21:20 公開日:2022-02-17
# (参考訳) 自己教師付きノード表現学習のための構造的および意味的コントラスト学習 Structural and Semantic Contrastive Learning for Self-supervised Node Representation Learning ( http://arxiv.org/abs/2202.08480v1 ) ライセンス: CC BY 4.0	Kaize Ding, Yancheng Wang, Yingzhen Yang and Huan Liu	(参考訳) グラフコントラスト学習(GCL)は最近、自己教師型で一般化可能、転送可能、堅牢なノード表現の学習に多くの研究関心を集めている。一般に、gclのコントラスト学習プロセスは、グラフニューラルネットワーク(gnn)バックボーンによって学習された表現の上に行われ、ノードのコンテキスト情報をローカルな近傍に基づいて変換、伝播する。しかし、既存のGCLの取り組みは、アーキテクチャのエンコーディング、拡張、および対照的な目的の両方の観点から厳しい制限があり、異なるデータセットで使用するのに一般的に非効率で非効率である。この作業では、既存の教師なしのGCLを超越し、シンプルだが効果的なフレームワークであるS$^3$-CLを提案し、それらの制限に対処する。具体的には、構造的および意味的対比学習によって、単純なニューラルネットワークでさえ、価値のある構造的および意味的パターンを保存する表現的ノード表現を学習することができる。実験により, S$^3$-CLで学習したノード表現は, 最先端のGCL法と比較して, 異なる下流タスクにおいて優れた性能を示すことが示された。 Graph Contrastive Learning (GCL) recently has drawn much research interest for learning generalizable, transferable, and robust node representations in a self-supervised fashion. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, existing GCL efforts have severe limitations in terms of both encoding architecture, augmentation, and contrastive objective, making them commonly inefficient and ineffective to use in different datasets. In this work, we go beyond the existing unsupervised GCL counterparts and address their limitations by proposing a simple yet effective framework S$^3$-CL. Specifically, by virtue of the proposed structural and semantic contrastive learning, even a simple neural network is able to learn expressive node representations that preserve valuable structural and semantic patterns. Our experiments demonstrate that the node representations learned by S$^3$-CL achieve superior performance on different downstream tasks compared to the state-of-the-art GCL methods.	翻訳日:2022-02-19 00:08:54 公開日:2022-02-17
# (参考訳) 連続物理学のための連続モデル学習 Learning continuous models for continuous physics ( http://arxiv.org/abs/2202.08494v1 ) ライセンス: CC BY 4.0	Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson, Michael W. Mahoney	(参考訳) 時間とともに継続的に進化する力学系は、科学と工学を通して普遍的である。機械学習(ML)は、そのようなシステムのダイナミクスをモデル化し予測するためのデータ駆動型アプローチを提供する。このアプローチの中核的な問題は、MLモデルは典型的には離散データに基づいて訓練され、基礎となる連続性の性質を意識していないML方法論を使用する。その結果、これらのMLモデルは、多くの科学的・工学的な応用に限られている。この課題に対処するため,数値解析理論に基づく収束試験を開発した。このテストは、モデルがシステムの基盤となる連続ダイナミクスを正確に近似する関数を学習したかどうかを検証する。このテストに失敗するモデルは、関連するダイナミクスを捉えることができず、多くの科学的予測タスクに対して限られたユーティリティで表現するが、このテストに合格するモデルは、より優れた補間と、より優れた補間の両方を複数の方法で実現できる。本研究は,従来のMLトレーニング/テスト手法と一体化して,科学・工学分野におけるモデルの検証を行う方法である。 Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties, which results in models that often do not capture the underlying continuous dynamics of a system of interest. As a result, these ML models are of limited use for many scientific and engineering applications. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates a system's underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks, while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.	翻訳日:2022-02-18 23:50:34 公開日:2022-02-17
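The abstract above does not spell out its convergence test, so the following is only a generic numerical-analysis sketch of the underlying idea: if a vector field (here a hand-written stand-in for a learned model) represents continuous dynamics, integrating it with a finer time step should shrink the error at the integrator's expected rate. The function names and the choice of forward Euler are assumptions made for illustration.

```python
import math

def euler_integrate(f, x0, t_end, h):
    """Integrate dx/dt = f(x) from t = 0 to t_end with forward Euler."""
    steps = round(t_end / h)
    x = x0
    for _ in range(steps):
        x = x + h * f(x)
    return x

def observed_order(f, x0, t_end, exact, h):
    """Estimate the convergence order from the errors at step h and h / 2."""
    err_coarse = abs(euler_integrate(f, x0, t_end, h) - exact)
    err_fine = abs(euler_integrate(f, x0, t_end, h / 2) - exact)
    return math.log2(err_coarse / err_fine)

def decay(x):
    """Stand-in for a learned vector field: exponential decay, dx/dt = -x."""
    return -x

# Forward Euler is first order, so halving h should roughly halve the error.
order = observed_order(decay, x0=1.0, t_end=1.0, exact=math.exp(-1.0), h=0.01)
```

An observed order far from the integrator's nominal order would suggest, under these assumptions, that the function being integrated does not behave like a smooth continuous-time vector field.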
# (参考訳) ニューラルネットワークプルーニングにおける反復的微調整に基づく小型音声・視覚後発単語スポッティングシステムの設計に関する研究 A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning ( http://arxiv.org/abs/2202.08509v1 ) ライセンス: CC BY 4.0	Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee	(参考訳) 音声のみに基づくウェイクワードスポッティング(WWS)は,信号伝送における環境干渉によりノイズの多い環境下では困難である。本稿では,視覚情報を利用した小型オーディオ・ビジュアルWWSシステムの設計について検討する。具体的には,視覚情報を利用するために,まず検出された唇を固定サイズのベクターにmobilenetと符号化し,音響的特徴と結合し,wwsのフュージョンネットワークを合成する。しかし、ニューラルネットワークに基づくオーディオ視覚モデルは、大きなフットプリントと高い計算複雑性を必要とする。アプリケーション要件を満たすために,ロッタリーチケット仮説(lth-if)によるニューラルネットワークのプルーニング戦略を,単モードモデルとマルチモーダルモデルに対して,反復的微調整方式(lth-if)で導入する。ホームテレビシーンにおける視聴覚wwのための社内コーパスでテストした結果,提案する視聴覚システムは,単一モード(オーディオのみまたはビデオのみ)システムに対して,異なる雑音環境下で大きな性能向上を達成している。さらに、LTH-IFプルーニングは、WWS性能を低下させることなく、ネットワークパラメータと計算を大幅に削減し、テレビの起動シナリオに潜在的な製品ソリューションをもたらす。 Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission. In this paper, we investigate the design of a compact audio-visual WWS system that utilizes visual information to alleviate the degradation. Specifically, in order to use visual information, we first encode the detected lips to fixed-size vectors with MobileNet and concatenate them with acoustic features followed by the fusion network for WWS. However, the audio-visual model based on neural networks requires a large footprint and a high computational complexity. To meet the application requirements, we introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF), to the single-modal and multi-modal models, respectively. Tested on our in-house corpus for audio-visual WWS in a home TV scene, the proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
Moreover, LTH-IF pruning can largely reduce the network parameters and computations with no degradation of WWS performance, leading to a potential product solution for the TV wake-up scenario.	翻訳日:2022-02-18 23:33:24 公開日:2022-02-17
# (参考訳) 対面分類としての視覚的地上真理構築 Visual Ground Truth Construction as Faceted Classification ( http://arxiv.org/abs/2202.08512v1 ) ライセンス: CC BY 4.0	Fausto Giunchiglia, Mayukh Bagchi, Xiaolei Diao	(参考訳) 機械学習とコンピュータビジョンにおける最近の研究は、主要なオブジェクト認識ベンチマークデータセットの開発において体系的な設計欠陥の証拠を提供している。例えば ImageNet では,いくつかのカテゴリのイメージに対して,表現対象とアノテートに使用するラベルとの間には矛盾がある。この問題の結果は、特に多くの機械学習アプリケーション、特にこれらのデータセットに基づいてトレーニングされたDeep Neural Networksに基づくアプリケーションを考えると、大きなものだ。本稿では,これらの基礎的真理ベンチマークデータセットの構築の基礎を提供する知識表現(kr)方法論の欠如が問題点であることを示す。そこで本研究では,3つの主要なステップで記述された解を提案する。 (i) テレオースマン論の哲学理論に基づく4つの順序付け段階における物体認識過程の分解 (ii)このような階層化に基づき、その視覚特性に応じて分類階層内で物体を整理するための新しい4段階の方法論を提案している。 (iii)顔分類パラダイムに従ってこのような分類を行う。アプローチの重要な新規性は、視覚的種分化を利用した視覚的特性から分類階層を構築し、言語的に基礎付けられた性質からではないという事実にある。提案手法は、音楽実験のImageNet階層に関する一連の実験によって検証される。 Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets. One such example is ImageNet, wherein, for several categories of images, there are incongruences between the objects they represent and the labels used to annotate them. The consequences of this problem are major, in particular considering the large number of machine learning applications, not least those based on Deep Neural Networks, that have been trained on these datasets. In this paper we posit the problem to be the lack of a knowledge representation (KR) methodology providing the foundations for the construction of these ground truth benchmark datasets. Accordingly, we propose a solution articulated in three main steps: (i) deconstructing the object recognition process in four ordered stages grounded in the philosophical theory of teleosemantics; (ii) based on such stratification, proposing a novel four-phased methodology for organizing objects in classification hierarchies according to their visual properties; and (iii) performing such classification according to the faceted classification paradigm. 
The key novelty of our approach lies in the fact that we construct the classification hierarchies from visual properties exploiting visual genus-differentiae, and not from linguistically grounded properties. The proposed approach is validated by a set of experiments on the ImageNet hierarchy of musical instruments.	翻訳日:2022-02-18 23:21:37 公開日:2022-02-17
# (参考訳) 画像変換を用いた自己教師型表現学習に関する調査 Survey on Self-supervised Representation Learning Using Image Transformations ( http://arxiv.org/abs/2202.08514v1 ) ライセンス: CC BY 4.0	Muhammad Ali, Sayed Hashim	(参考訳) 深層ニューラルネットワークは大量のトレーニングデータを必要とするが、現実世界ではトレーニング目的のデータが少ない。これらの問題を解決するために、自己教師付き学習法(SSL)が用いられる。 ssl using geometric transformations (gt) は教師なし表現学習で使われる単純かつ強力な技術である。複数の調査論文がssl技術をレビューしているが、幾何学的変換を使うものだけに焦点を当てたものはない。さらに、これらの手法は、レビューされた論文では詳しくは触れられていない。この研究を提示する動機は、幾何学的変換が教師なし表現学習において強力な監督信号であることが示されていることです。また、多くの作品が大成功を収めたが、あまり注目されなかった。幾何変換を用いたSSLアプローチの簡潔な調査を行う。我々は、予測と自動エンコード変換に基づく画像変換を含む6つの代表的なモデルを要約する。私たちは、彼らのアーキテクチャと学習方法論をレビューします。また、cifar-10およびimagenetデータセットのオブジェクト認識タスクにおけるこれらのモデルの性能を比較する。分析の結果,AETv2はほとんどの環境で最高の性能を示した。機能分離によるローテーションも、いくつかの設定でうまく機能した。そして、観察結果から洞察を得る。最後に、結果と洞察の要約とともに、対処すべきオープンな問題を強調し、様々な今後の方向性を示す。 Deep neural networks need huge amounts of training data, while in the real world there is a scarcity of data available for training purposes. To resolve these issues, self-supervised learning (SSL) methods are used. SSL using geometric transformations (GT) is a simple yet powerful technique used in unsupervised representation learning. Although multiple survey papers have reviewed SSL techniques, none focuses only on those that use geometric transformations. Furthermore, such methods have not been covered in depth in papers where they are reviewed. Our motivation to present this work is that geometric transformations have been shown to be powerful supervisory signals in unsupervised representation learning. Moreover, many such works have found tremendous success, but have not gained much attention. We present a concise survey of SSL approaches that use geometric transformations. We shortlist six representative models that use image transformations including those based on predicting and autoencoding transformations. We review their architecture as well as learning methodologies.
We also compare the performance of these models in the object recognition task on CIFAR-10 and ImageNet datasets. Our analysis indicates that AETv2 performs best in most settings. Rotation with feature decoupling also performed well in some settings. We then derive insights from the observed results. Finally, we conclude with a summary of the results and insights as well as highlighting open problems to be addressed and indicating various future directions.	翻訳日:2022-02-18 23:00:48 公開日:2022-02-17
# (参考訳) DeepHybrid: 物体分類のための自動車レーダスペクトルと反射の深層学習 DeepHybrid: Deep Learning on Automotive Radar Spectra and Reflections for Object Classification ( http://arxiv.org/abs/2202.08519v1 ) ライセンス: CC BY-SA 4.0	Adriana-Eliza Cozma, Lisa Morgan, Martin Stolz, David Stoeckel, Kilian Rambach	(参考訳) 自動運転車は、オブジェクトと交通参加者を正確に検出し、分類する必要がある。自動車用レーダーセンサーを用いた信頼性の高い物体分類は困難であることが判明した。本稿では,従来のレーダー信号処理とディープラーニングアルゴリズムを組み合わせた手法を提案する。レーダ反射レベルの範囲方位情報は、レンジドップラースペクトルから関心のあるスパース領域を抽出するために使用される。これはニューラルネットワーク(NN)への入力として使用され、静止オブジェクトと移動オブジェクトの異なるタイプを分類する。本研究では,レーダースペクトルと反射特性の両方を入力として受信するハイブリッドモデル(deephybrid)を提案する。実験の結果,スペクトルのみを用いたモデルと比較して分類性能が向上した。さらに、資源効率が高く高性能なNNを見つけるために、ニューラルネットワーク探索(NAS)アルゴリズムを適用した。 NASは、精度を保ちながら手作業で設計したNNよりも、ほぼ1桁小さいNNが得られる。提案手法は,自動緊急ブレーキや衝突回避システムの改善などに用いることができる。 Automated vehicles need to detect and classify objects and traffic participants accurately. Reliable object classification using automotive radar sensors has proved to be challenging. We propose a method that combines classical radar signal processing and Deep Learning algorithms. The range-azimuth information on the radar reflection level is used to extract a sparse region of interest from the range-Doppler spectrum. This is used as input to a neural network (NN) that classifies different types of stationary and moving objects. We present a hybrid model (DeepHybrid) that receives both radar spectra and reflection attributes as inputs, e.g. radar cross-section. Experiments show that this improves the classification performance compared to models using only spectra. Moreover, a neural architecture search (NAS) algorithm is applied to find a resource-efficient and high-performing NN. NAS yields an almost one order of magnitude smaller NN than the manually-designed one while preserving the accuracy. The proposed method can be used for example to improve automatic emergency braking or collision avoidance systems.	翻訳日:2022-02-18 22:55:24 公開日:2022-02-17
# (参考訳) 自己指導・対人訓練を用いたエンドツーエンド音楽リマスターシステム End-to-end Music Remastering System Using Self-supervised and Adversarial Training ( http://arxiv.org/abs/2202.08520v1 ) ライセンス: CC BY 4.0	Junghyun Koo, Seungryeol Paik, Kyogu Lee	(参考訳) マスタリングは音楽制作において不可欠なステップだが、経験豊富なオーディオエンジニアの手に渡り、曲のトーン、スペース、ボリュームを調整しなければならない課題でもある。リマスターは同じ技術的プロセスに従っており、そのコンテキストは当時の曲をマスターすることにある。これらのタスクは入力障壁が高いため、入力音声のマスタリングスタイルをターゲットに変換するエンドツーエンドの音楽リマスターシステムを提案することにより、障壁を低くすることを目指している。システムは自己指導的な方法で訓練され、解放されたポップソングがトレーニングに使用された。また,事前学習したエンコーダと投影判別器を適用して,参照のマスタリングスタイルを反映した現実的な音声を生成するモデルも期待した。その結果を定量的指標と主観的聞き取りテストを用いて検証し,モデルが目標と類似したマスタリングスタイルのサンプルを生成したことを示す。 Mastering is an essential step in music production, but it is also a challenging task that has to go through the hands of experienced audio engineers, where they adjust tone, space, and volume of a song. Remastering follows the same technical process, in which the context lies in mastering a song for the times. As these tasks have high entry barriers, we aim to lower the barriers by proposing an end-to-end music remastering system that transforms the mastering style of input audio to that of the target. The system is trained in a self-supervised manner, in which released pop songs were used for training. We also anticipated the model to generate realistic audio reflecting the reference's mastering style by applying a pre-trained encoder and a projection discriminator. We validate our results with quantitative metrics and a subjective listening test and show that the model generated samples of mastering style similar to the target.	翻訳日:2022-02-18 22:44:27 公開日:2022-02-17
# (参考訳) 確率的ブロックモデルにおける不均衡コミュニティの回復と欠陥オラクルによるクラスタリングへの応用 Recovering Unbalanced Communities in the Stochastic Block Model With Application to Clustering with a Faulty Oracle ( http://arxiv.org/abs/2202.08522v1 ) ライセンス: CC BY 4.0	Chandra Sekhar Mukherjee, Pan Peng and Jiapeng Zhang	(参考訳) 確率ブロックモデル(SBM)は,ネットワークにおけるグラフクラスタリングやコミュニティ検出の基本的なモデルである。過去10年間で大きな注目を集めており、バランスの取れた場合、すなわち全てのクラスターが大きい場合については、十分に研究されている。しかし、不均衡なコミュニティを持つSBM(おそらく実際にはより関連性が高い)の理解は依然として極めて限られている。本稿では,様々な大きさのコミュニティを持つSBMにおいてコミュニティを復元するための,SVDに基づく簡単なアルゴリズムを提案する。 KS-threshold予想の下では、我々のアルゴリズムのパラメータ間のトレードオフは、幅広い領域において多対数因子を除いてほぼ最適である。副産物として,欠陥のあるオラクルを用いたクラスタリング問題に対して,クエリ複雑性を改善した時間効率の高いアルゴリズムを得る。これは複数の先行研究(Mazumdar and Saha [NIPS 2017], Larsen, Mitzenmacher and Tsourakakis [WWW 2020], Peng and Zhang [COLT 2021])を改善するものである。 KS-threshold予想の下では、アルゴリズムのクエリ複雑性は多対数因子を除いてほぼ最適である。 The stochastic block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. It has received great attention in the last decade and the balanced case, i.e., assuming all clusters have large size, has been well studied. However, our understanding of SBM with unbalanced communities (arguably, more relevant in practice) is still very limited. In this paper, we provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes. Under the KS-threshold conjecture, the tradeoff between the parameters in our algorithm is nearly optimal up to polylogarithmic factors for a wide range of regimes. As a byproduct, we obtain a time-efficient algorithm with improved query complexity for a clustering problem with a faulty oracle, which improves upon a number of previous works (Mazumdar and Saha [NIPS 2017], Larsen, Mitzenmacher and Tsourakakis [WWW 2020], Peng and Zhang [COLT 2021]). Under the KS-threshold conjecture, the query complexity of our algorithm is nearly optimal up to polylogarithmic factors.	翻訳日:2022-02-18 22:34:33 公開日:2022-02-17
# (参考訳) オープンソースの風および風力発電データセットの収集と分類 A Collection and Categorization of Open-Source Wind and Wind Power Datasets ( http://arxiv.org/abs/2202.08524v1 ) ライセンス: CC BY 4.0	Nina Effenberger and Nicole Ludwig	(参考訳) 風力発電やその他の再生可能エネルギー源は、今日の電力網のエネルギー供給において、より重要な役割を担っている。そのため、電力グリッドのバランスをとるには再生可能エネルギー源の予測が不可欠である。新しい予測方法に注目する一方で、メソッドを他のユースケースやデータと比較し、再現し、転送する方法にはほとんど注意を払わない。この欠如の理由のひとつは、現在使用されている多くのデータセットが非開示であり、研究の再現性を不可能にしているため、オープンソースデータセットの可用性が限られていることだ。このオープンソースのデータセットの利用不可能性は、風力予測のような商業的に興味深い分野で特に一般的である。しかし,本論文では,既存のオープンソースの風力データセットの最新の概観と,風力予測に使用できるさまざまなデータセット群への分類を提供することにより,研究者が利用可能なデータセット上での手法を比較することを可能にする。風力予測タスクに十分なデータセットが公開されていることを示し、研究者が適切なオープンソースデータセットを選択してそれらのメソッドを比較することができるように、異なるデータグループ特性について議論する。 Wind power and other forms of renewable energy sources play an ever more important role in the energy supply of today's power grids. Forecasting renewable energy sources has therefore become essential in balancing the power grid. While a lot of focus is placed on new forecasting methods, little attention is given to how to compare, reproduce and transfer the methods to other use cases and data. One reason for this lack of attention is the limited availability of open-source datasets, as many currently used datasets are non-disclosed and make reproducibility of research impossible. This unavailability of open-source datasets is especially prevalent in commercially interesting fields such as wind power forecasting. However, with this paper we want to enable researchers to compare their methods on publicly available datasets by providing the, to our knowledge, largest up-to-date overview of existing open-source wind power datasets, and a categorization into different groups of datasets that can be used for wind power forecasting. We show that there are publicly available datasets sufficient for wind power forecasting tasks and discuss the different data groups' properties to enable researchers to choose appropriate open-source datasets and compare their methods on them.	翻訳日:2022-02-18 22:12:52 公開日:2022-02-17
# (参考訳) ベイジアンニューラルモデリングによるクローズドモデル敵対的事例の緩和とエンドツーエンド音声認識の強化 Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition ( http://arxiv.org/abs/2202.08532v1 ) ライセンス: CC BY 4.0	Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko	(参考訳) 本研究では,敵対的雑音のある音声に対して,エンドツーエンド自動音声認識(ASR)のシステムロバスト性を高めることを目的とする。厳密で経験的な"閉じたモデルの敵対的ロバスト性"設定(例えば、オンデバイスやクラウドアプリケーション)に焦点を当てる。敵対的ノイズは、ターゲットとするASRモデルの勾配情報に直接アクセスすることなく、閉モデル最適化(例えば、進化的およびゼロ次推定)によってのみ生成される。本稿では,高度なベイズニューラルネットワーク(BNN)に基づく敵対的検出器を提案する。これは,適応的な敵対的摂動に対する潜在分布をダイバージェンス測定によりモデル化できる。さらに, RNN Transducer, Conformer, wav2vec-2.0 ベースの ASR システムの配置シナリオを, 提案する敵対的検出システムを用いてシミュレートする。提案したBNNベースの検出システムを利用することで,敵対的音声例に対する現行のモデル強化手法と比較して,検出率を+2.77から+5.42%(相対+3.03から+6.26%)改善し,LibriSpeechデータセット上での単語誤り率を5.02から7.47%削減する。 In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. We focus on a rigorous and empirical "closed-model adversarial robustness" setting (e.g., on-device or cloud applications). The adversarial noise is only generated by closed-model optimization (e.g., evolutionary and zeroth-order estimation) without accessing gradient information of a targeted ASR model directly. We propose an advanced Bayesian neural network (BNN) based adversarial detector, which could model latent distributions against adaptive adversarial perturbation with divergence measurement. We further simulate deployment scenarios of RNN Transducer, Conformer, and wav2vec-2.0 based ASR systems with the proposed adversarial detection system. Leveraging the proposed BNN based detection system, we improve detection rate by +2.77 to +5.42% (relative +3.03 to +6.26%) and reduce the word error rate by 5.02 to 7.47% on LibriSpeech datasets compared to the current model enhancement methods against the adversarial speech examples.	翻訳日:2022-02-18 21:48:45 公開日:2022-02-17
# (参考訳) AISHELL-NER:中国語音声からの固有表現認識 AISHELL-NER: Named Entity Recognition from Chinese Speech ( http://arxiv.org/abs/2202.08533v1 ) ライセンス: CC BY 4.0	Boli Chen, Guangwei Xu, Xiaobin Wang, Pengjun Xie, Meishan Zhang, Fei Huang	(参考訳) 音声からの固有表現認識(NER)は音声信号から意味情報を抽出することを目的とした音声言語理解(SLU)タスクの一つである。音声からのNERは通常、(1)音声を自動音声認識(ASR)システムで処理し、(2)NERタグ付け器をASR出力に適用する2段階のパイプラインによって行われる。最近の研究は、英語とフランス語の音声からのNERに対するEnd-to-End(E2E)アプローチの能力を示しており、これは本質的にエンティティ認識型ASRである。しかし、中国語には多くの同音異義語や多音字があるため、中国語音声からのNERは実質的により難しい課題である。本稿では,中国語音声からのNERのための新しいデータセットAISHELL-NERを提案する。いくつかの最先端手法の性能を調べるために,広範囲な実験を行った。その結果、エンティティ認識型ASRと事前学習済みNERタグ付け器を組み合わせることで性能が向上し、この手法は現在のSLUパイプラインに容易に適用できることが示されている。データセットはgithub.com/Alibaba-NLP/AISHELL-NERで公開されている。 Named Entity Recognition (NER) from speech is among Spoken Language Understanding (SLU) tasks, aiming to extract semantic information from the speech signal. NER from speech is usually made through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR outputs. Recent works have shown the capability of the End-to-End (E2E) approach for NER from English and French speech, which is essentially entity-aware ASR. However, due to the many homophones and polyphones that exist in Chinese, NER from Chinese speech is effectively a more challenging task. In this paper, we introduce a new dataset AISHELL-NER for NER from Chinese speech. Extensive experiments are conducted to explore the performance of several state-of-the-art methods. The results demonstrate that the performance could be improved by combining entity-aware ASR and pretrained NER tagger, which can be easily applied to the modern SLU pipeline. The dataset is publicly available at github.com/Alibaba-NLP/AISHELL-NER.	翻訳日:2022-02-18 21:33:50 公開日:2022-02-17
# (参考訳) ANNに新しいニューロンをいつ、どこで、どのように追加するか When, where, and how to add new neurons to ANNs ( http://arxiv.org/abs/2202.08539v1 ) ライセンス: CC BY 4.0	Kaitlin Maile, Emmanuel Rachelson, Hervé Luga, Dennis G. Wilson	(参考訳) ANNにおける神経新生は、プルーニングのような他の構造学習の形態と比べても、研究が進んでいない難しい問題である。トリガーと初期化に分解することで、学習プロセス中にニューロンをいつ、どこで、どのように追加するかという、神経新生のさまざまな側面を研究するフレームワークを導入する。神経新生戦略のNeural Orthogonality(NORTH)スイートを提示する。これは、活性化や重みの直交性に基づく層ごとのトリガと初期化を組み合わせて、効率の良い大きさに収束する高性能なネットワークを動的に成長させる。 MLPを用いて、我々の貢献を最近の他の神経新生研究と比較評価する。 Neurogenesis in ANNs is an understudied and difficult problem, even compared to other forms of structural learning like pruning. By decomposing it into triggers and initializations, we introduce a framework for studying the various facets of neurogenesis: when, where, and how to add neurons during the learning process. We present the Neural Orthogonality (NORTH) suite of neurogenesis strategies, combining layer-wise triggers and initializations based on the orthogonality of activations or weights to dynamically grow performant networks that converge to an efficient size. We evaluate our contributions against other recent neurogenesis works with MLPs.	翻訳日:2022-02-18 21:23:14 公開日:2022-02-17
# (参考訳) 医用画像における深層学習の概要 An overview of deep learning in medical imaging ( http://arxiv.org/abs/2202.08546v1 ) ライセンス: CC BY 4.0	Imran Ul Haq	(参考訳) 機械学習(ML)は、最近の10年間で大きな注目を集めてきた。この成功は2012年、MLモデルがコンピュータビジョンに関する世界で最も有名なコンペであるImageNet Classificationで驚くべき勝利を収めた時に始まった。このモデルは、Deep Learning(DL)と呼ばれる畳み込みニューラルネットワーク(CNN)の一種である。それ以来、研究者は最も急速に発展しているDLの研究領域に積極的に参加し始めた。近年、DLシステムは、人間の言語処理からビデオ分析まで幅広い分野にまたがる最先端のMLシステムであり、学術の世界や企業でよく使われている。最近の進歩は医療分野に大きな改善をもたらす可能性がある。データ処理や画像解析の改良された革新的な手法は、診断技術や医療サービスを大幅に改善することができる。医用画像に用いられるDLの分野における現在の発展と関連する問題点について概観した。レビューの主な目的は次の4つである。 (i)異なるDLモデルについて議論することで、DLの簡単な導入を提供する。 (ii)医用画像解析におけるDLの使用状況(分類・検出・セグメンテーション・位置合わせ)を検討する。 (iii)医用画像におけるDLの7つの主な応用分野を検討する。 (iv)無償で利用可能なDLコード、公開データセット(表7)、医用画像コンペティションの情報源(表8)などの有用な情報資源へのリンクを提供することにより、臨床画像におけるDLの研究に貢献したい人に出発点を与える。最後に、医学分野におけるDLの継続的な困難、教訓、今後の展望を概説して調査を締めくくる。 Machine learning (ML) has seen enormous consideration during the most recent decade. This success started in 2012 when an ML model accomplished a remarkable triumph in the ImageNet Classification, the world's most famous competition for computer vision. This model was a kind of convolutional neural system (CNN) called deep learning (DL). Since then, researchers have started to participate efficiently in DL's fastest developing area of research. These days, DL systems are cutting-edge ML systems spanning a broad range of disciplines, from human language processing to video analysis, and commonly used in the scholarly world and enterprise sector. Recent advances can bring tremendous improvement to the medical field. Improved and innovative methods for data processing and image analysis can significantly improve diagnostic technologies and medical services. A quick review of current developments with relevant problems in the field of DL used for medical imaging has been provided. 
The review has four primary purposes: (i) provide a brief introduction to DL by discussing different DL models; (ii) review the use of DL for medical image analysis (classification, detection, segmentation, and registration); (iii) review seven main application fields of DL in medical imaging; and (iv) give a starting point to those keen on contributing to research on DL in clinical imaging by providing links to useful resources, such as freely available DL code, public datasets (Table 7), and medical imaging competition sources (Table 8). The survey ends by outlining ongoing difficulties, lessons learned, and the future of DL in the field of medical science.	翻訳日:2022-02-18 21:10:09 公開日:2022-02-17
# (参考訳) oracleによる最悪の敵に対するオンライン学習の効率化 Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries ( http://arxiv.org/abs/2202.08549v1 ) ライセンス: CC BY 4.0	Nika Haghtalab, Yanjun Han, Abhishek Shetty, Kunhe Yang	(参考訳) 本稿では,オンライン学習の最悪のケース分析を超越した,オラクル効率のアルゴリズムについて検討する。私たちは2つの設定に集中します。まず,[rst11,hrs12]の平滑化解析設定は,一様密度の1/\sigma$倍の上限値を持つ分布からサンプルを生成することに制約される。第二に、$K$-hintトランスダクティブ学習の設定では、学習者が真のインスタンスを含むことが保証される時間毎に$K$ヒントにアクセスできるようになる。私たちは、クラスのvc次元のみに依存する設定と、敵の力をキャプチャする$\sigma$と$k$の両方に対して、最初のoracle効率の高いアルゴリズムを提供します。特に、これらの設定に対してそれぞれ$ O ( \sqrt{T (d / \sigma )^{1/2} } ) $ と $ O ( \sqrt{T d K } )$ のオラクル効率の後悔境界を達成する。このスムーズな分析設定のために,本研究は,スムーズな相手を用いたオンライン学習のための最初のオラクル効率アルゴリズムを提供する[HRS21]。これは[HK16]が確立したオフライン学習と, オンライン学習と最悪の相手との計算的分離とは対照的である。私たちのアルゴリズムは、小さなドメインで最悪の場合のバウンダリも改善しています。特に、$O ( \sqrt{T(d \vert{\mathcal{X}}\vert ) ^{1/2} })$を後悔したオラクル効率のアルゴリズムを与え、これは [DS16] で束縛された以前の$O ( \sqrt{T\vert{\mathcal{X} } \vert })$の洗練である。 In this paper, we study oracle-efficient algorithms for beyond worst-case analysis of online learning. We focus on two settings. First, the smoothed analysis setting of [RST11, HRS12] where an adversary is constrained to generating samples from distributions whose density is upper bounded by $1/\sigma$ times the uniform density. Second, the setting of $K$-hint transductive learning, where the learner is given access to $K$ hints per time step that are guaranteed to include the true instance. We give the first known oracle-efficient algorithms for both settings that depend only on the VC dimension of the class and parameters $\sigma$ and $K$ that capture the power of the adversary. In particular, we achieve oracle-efficient regret bounds of $ O ( \sqrt{T (d / \sigma )^{1/2} } ) $ and $ O ( \sqrt{T d K } )$ respectively for these setting. For the smoothed analysis setting, our results give the first oracle-efficient algorithm for online learning with smoothed adversaries [HRS21]. 
This contrasts with the computational separation between online learning with worst-case adversaries and offline learning established by [HK16]. Our algorithms also imply improved bounds for the worst-case setting with small domains. In particular, we give an oracle-efficient algorithm with regret of $O ( \sqrt{T(d \vert{\mathcal{X}}\vert ) ^{1/2} })$, which is a refinement of the earlier $O ( \sqrt{T\vert{\mathcal{X} } \vert })$ bound by [DS16].	翻訳日:2022-02-18 20:35:21 公開日:2022-02-17
# (参考訳) 非同期学習のための遅延適応ステップサイズ Delay-adaptive step-sizes for asynchronous learning ( http://arxiv.org/abs/2202.08550v1 ) ライセンス: CC BY 4.0	Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian and Mikael Johansson	(参考訳) スケーラブルな機械学習システムでは、モデルトレーニングは、厳密な同期なしに実行される複数のノードに並列化されることが多い。関連する非同期アルゴリズムのほとんどの分析結果は、学習率を決定するためにシステム内の情報遅延の上限を使用する。このような境界は事前に取得することが難しいだけでなく、不必要に収束が遅くなる。本稿では,システムにおける実際の時間変化の遅延に依存する学習率を利用することが可能であることを示す。遅延適応型非同期反復に対する一般的な収束結果を開発し,近位漸進勾配降下法とブロック座標降下法に特化する。これらの方法のそれぞれについて,遅延をオンラインで測定し,遅延適応型ステップサイズポリシを提示し,その理論上および実用上の優位性を実証する。 In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.	翻訳日:2022-02-18 20:34:01 公開日:2022-02-17
# (参考訳) 普遍的対向摂動による深部ニューラルネットワークのグローバルなフィンガープリント Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations ( http://arxiv.org/abs/2202.08602v1 ) ライセンス: CC BY 4.0	Zirui Peng and Shaofeng Li and Guoxing Chen and Cheng Zhang and Haojin Zhu and Minhui Xue	(参考訳) 本稿では,被疑モデルがモデル抽出攻撃によって被害者モデルから盗まれたものかどうかをサービス提供者が検証できる,新規かつ実用的なメカニズムを提案する。我々の重要な洞察は、DNNモデルの決定境界のプロファイルは、そのUniversal Adversarial Perturbations (UAPs) によって一意に特徴付けられることである。 UAPは低次元のサブスペースに属し、海賊モデルのサブスペースは非海賊モデルのものよりも被害者モデルのサブスペースとより整合している。そこで本研究では, DNNモデルに対するUAPフィンガープリント手法を提案し, 指紋を入力とし類似度スコアを出力するエンコーダをcontrastive learning(対照学習)を用いて訓練する。広範囲にわたる研究により、我々のフレームワークは、被疑モデルの指紋わずか20個で、99.99%を超える信頼度でモデルIP侵害を検出できることが示された。異なるモデルアーキテクチャにまたがる優れた一般化性を持ち、盗まれたモデルへの事後的な修正に対しても堅牢である。 In this paper, we propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs belong to a low-dimensional subspace and piracy models' subspaces are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP fingerprinting method for DNN models and train an encoder via contrastive learning that takes fingerprints as input and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence $> 99.99\%$ within only $20$ fingerprints of the suspect model. It has good generalizability across different model architectures and is robust against post-modifications on stolen models.	翻訳日:2022-02-18 20:13:22 公開日:2022-02-17
# (参考訳) (メタ)ソルバアプローチの評価について On the evaluation of (meta-)solver approaches ( http://arxiv.org/abs/2202.08613v1 ) ライセンス: CC BY 4.0	Roberto Amadini, Maurizio Gabbrielli, Tong Liu, Jacopo Mauro	(参考訳) メタソルバアプローチは、よりよいソルバを構築するために、多数の個別のソルバを利用する。メタソルバの性能を評価するには、個々のソルバに典型的なメトリクス(例えば、ランタイムやソリューションの品質)を採用するか、より具体的な評価指標(例えば、メタソルバが仮想的な最高のパフォーマンスにどの程度近いかを測定することで)を採用する。本稿では,最近発表されたいくつかの成果をもとに,その強みと弱みを明らかにしつつ,(メタ)ソルバを評価するためのさまざまなパフォーマンス指標の概要を示す。 Meta-solver approaches exploit a number of individual solvers to potentially build a better solver. To assess the performance of meta-solvers, one can simply adopt the metrics typically used for individual solvers (e.g., runtime or solution quality), or employ more specific evaluation metrics (e.g., by measuring how close the meta-solver gets to its virtual best performance). In this paper, based on some recently published works, we provide an overview of different performance metrics for evaluating (meta-)solvers, underlining their strengths and weaknesses.	翻訳日:2022-02-18 19:54:40 公開日:2022-02-17
# (参考訳) リアルタイム動的放射場レンダリングのためのFourier PlenOctrees Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-time ( http://arxiv.org/abs/2202.08614v1 ) ライセンス: CC BY 4.0	Liao Wang, Jiakai Zhang, Xinhang Liu, Fuqiang Zhao, Yanshun Zhang, Yingliang Zhang, Minye Wu, Lan Xu and Jingyi Yu	(参考訳) Neural Radiance Field (NeRF)のような暗黙の神経表現は主に、マルチビュー設定下でキャプチャされた静的オブジェクトのモデリングに焦点を当てており、そこではPlenOctreeのようなスマートなデータ構造でリアルタイムなレンダリングを実現できる。本稿では,新しいFourier PlenOctree(FPO)技術を提案し,free-view video(FVV)設定下で撮影した動的シーンの効率的なニューラルモデリングとリアルタイムレンダリングを実現する。我々のFPOにおける鍵となるアイデアは、一般化されたNeRF、PlenOctree表現、体積融合、フーリエ変換の新たな組み合わせである。 FPO構築を加速するために, 一般化可能なNeRF技術を用いて空間的ブレンドにより木を生成する新しい粗から密への(coarse-to-fine)融合スキームを提案する。動的シーンに取り組むために、暗黙のネットワークを調整し、時間変化する密度と色属性のフーリエ係数をモデル化する。最後に、FPOを構築し、動的シーケンスの和集合PlenOctree構造の葉の上で直接フーリエ係数を訓練する。結果として得られるFPOは動的オブジェクトの処理にコンパクトなメモリオーバーヘッドを実現し,効率的な微調整をサポートすることを示す。大規模な実験により,提案手法は元のNeRFの3000倍高速であり,SOTAに対して一桁以上の加速を実現しつつ,未知の動的シーンの自由視点レンダリングにおいて高い視覚的品質を保つことがわかった。 Implicit neural representations such as Neural Radiance Field (NeRF) have focused mainly on modeling static objects captured under multi-view settings where real-time rendering can be achieved with smart data structures, e.g., PlenOctree. In this paper, we present a novel Fourier PlenOctree (FPO) technique to tackle efficient neural modeling and real-time rendering of dynamic scenes captured under the free-view video (FVV) setting. The key idea in our FPO is a novel combination of generalized NeRF, PlenOctree representation, volumetric fusion and Fourier transform. To accelerate FPO construction, we present a novel coarse-to-fine fusion scheme that leverages the generalizable NeRF technique to generate the tree via spatial blending. To tackle dynamic scenes, we tailor the implicit network to model the Fourier coefficients of time-varying density and color attributes. Finally, we construct the FPO and train the Fourier coefficients directly on the leaves of a union PlenOctree structure of the dynamic sequence. 
We show that the resulting FPO enables compact memory overhead to handle dynamic objects and supports efficient fine-tuning. Extensive experiments show that the proposed method is 3000 times faster than the original NeRF and achieves over an order of magnitude acceleration over SOTA while preserving high visual quality for the free-viewpoint rendering of unseen dynamic scenes.	翻訳日:2022-02-18 19:43:19 公開日:2022-02-17
# (参考訳) オブジェクトカウントのためのドメインランダム化 Domain Randomization for Object Counting ( http://arxiv.org/abs/2202.08670v1 ) ライセンス: CC0 1.0	Enric Moreu, Kevin McGuinness, Diego Ortego, Noel E. O'Connor	(参考訳) 近年,ゲームエンジンに基づく合成データセットの利用により,コンピュータビジョンにおけるタスクの性能向上が図られている。しかし、これらのデータセットは通常、車両や人々を含む都市シーンなど、コンピュータゲームで描かれた特定のドメインにのみ適している。本稿では,高額な3Dアーティストチームによって手作業で作成される写真リアルな技法を必要とせずに,任意の領域のオブジェクトカウントのための合成データセットを生成する手法を提案する。本稿では,高速かつ安価に生成できる合成データセットに基づくオブジェクトカウントのためのドメインランダム化手法を提案する。我々は、故意にフォトリアリズムを避け、データセットの可変性を劇的に増加させ、ランダムなテクスチャと3D変換を持つ画像を生成することで、一般化を改善する。実験により,本手法は,人,車両,ペンギン,果物といった複数のドメインにわたる様々な実世界のオブジェクトカウントデータセットで良好な性能を示す。ソースコードは https://github.com/enric1994/dr4oc で公開されている。 Recently, the use of synthetic datasets based on game engines has been shown to improve the performance of several tasks in computer vision. However, these datasets are typically only appropriate for the specific domains depicted in computer games, such as urban scenes involving vehicles and people. In this paper, we present an approach to generate synthetic datasets for object counting for any domain without the need for photo-realistic techniques manually generated by expensive teams of 3D artists. We introduce a domain randomization approach for object counting based on synthetic datasets that are quick and inexpensive to generate. We deliberately avoid photorealism and drastically increase the variability of the dataset, producing images with random textures and 3D transformations, which improves generalization. Experiments show that our method facilitates good performance on various real-world object counting datasets for multiple domains: people, vehicles, penguins, and fruit. The source code is available at: https://github.com/enric1994/dr4oc	翻訳日:2022-02-18 19:27:22 公開日:2022-02-17
# (参考訳) 信頼性の測定か、観相学の自動化か? Safra, Chevallier, Grèzes, and Baumard (2020) へのコメント Measuring Trustworthiness or Automating Physiognomy? A Comment on Safra, Chevallier, Grèzes, and Baumard (2020) ( http://arxiv.org/abs/2202.08674v1 ) ライセンス: CC BY 4.0	Rory W Spanton and Olivia Guest	(参考訳) 個人間の信頼 - 他の個人に対する信頼と脆弱性の共有表示 - は、人間の社会の発展に重要な役割を果たすと見なすことができる。 Safra、Chevallier、Grèzes、Baumard (2020)は、顔の特徴に基づいて歴史的肖像画の信頼性評価を生成するように機械学習(ML)アルゴリズムを訓練することで、対人信頼の歴史的進歩を研究した。彼らは1500年から2000年までの肖像画の信頼性評価が時間とともに増加したと報告し、これが社会進歩のいくつかの指標と一致する対人信頼のより広範な増加を証明していると主張した。我々は、これらの主張がいくつかの方法論的・分析的問題によって交絡していると論じ、Safraらのアルゴリズムと観相学という疑似科学との間の憂慮すべき類似点を強調する。これらの問題の含意と、現実世界で起こりうる帰結について、さらに詳細に論じる。 Interpersonal trust - a shared display of confidence and vulnerability toward other individuals - can be seen as instrumental in the development of human societies. Safra, Chevallier, Grèzes, and Baumard (2020) studied the historical progression of interpersonal trust by training a machine learning (ML) algorithm to generate trustworthiness ratings of historical portraits, based on facial features. They reported that trustworthiness ratings of portraits dated between 1500 and 2000 CE increased with time, claiming that this evidenced a broader increase in interpersonal trust coinciding with several metrics of societal progress. We argue that these claims are confounded by several methodological and analytical issues and highlight troubling parallels between Safra et al.'s algorithm and the pseudoscience of physiognomy. We discuss the implications and potential real-world consequences of these issues in further detail.	翻訳日:2022-02-18 19:18:56 公開日:2022-02-17
# (参考訳) Winograd Convolution: フォールトトレランスの観点から Winograd Convolution: A Perspective from Fault Tolerance ( http://arxiv.org/abs/2202.08675v1 ) ライセンス: CC BY 4.0	Xinghua Xue, Haitong Huang, Cheng Liu, Ying Wang, Tao Luo, Lei Zhang	(参考訳) Winograd畳み込みは、線形変換によってニューラルネットワーク(NN)の乗算を加算に置き換えることで、計算オーバーヘッドを削減するために提案された。計算効率に加えて,NNの耐障害性向上に大きな可能性があることを観察し,その耐障害性を初めて総合的に評価した。次に, Winograd畳み込みの耐障害性を,耐障害性NN処理または省エネNN処理に活用する方法を検討した。実験の結果, 耐障害性を考慮しない場合と比較して, 精度を損なうことなく, 耐障害設計のオーバーヘッドを27.49%, あるいはエネルギー消費を7.19%削減できることがわかった。 Winograd convolution was originally proposed to reduce the computing overhead by converting multiplication in neural network (NN) with addition via linear transformation. Other than the computing efficiency, we observe its great potential in improving NN fault tolerance and evaluate its fault tolerance comprehensively for the first time. Then, we explore the use of fault tolerance of Winograd convolution for either fault-tolerant or energy-efficient NN processing. According to our experiments, Winograd convolution can be utilized to reduce fault-tolerant design overhead by 27.49% or energy consumption by 7.19% without any accuracy loss compared to a design that is unaware of this fault tolerance.	翻訳日:2022-02-18 19:14:27 公開日:2022-02-17
# (参考訳) 教師なしポリプセグメンテーションのための合成データ Synthetic data for unsupervised polyp segmentation ( http://arxiv.org/abs/2202.08680v1 ) ライセンス: CC0 1.0	Enric Moreu, Kevin McGuinness, Noel E. O'Connor	(参考訳) 深層学習は医療画像の解析において優れた性能を示した。しかし、データセットはプライバシの問題、標準化の問題、アノテーションの欠如のために取得することが難しい。本稿では,3次元技術と生成対向ネットワークを組み合わせたリアルな合成画像を作成することで,これらの課題に対処する。パイプラインでは医療専門家のアノテーションを一切使用しない。完全に教師なしの本手法は,5つの実ポリープセグメンテーションデータセットに対して有望な結果を得る。この研究の一環として、我々はSynth-Colonをリリースした。Synth-Colonは、20000のリアルな大腸画像と深度と3D幾何学に関する追加情報を含む完全に合成されたデータセットである。 Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due to privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We use zero annotations from medical professionals in our pipeline. Our fully unsupervised method achieves promising results on five real polyp segmentation datasets. As a part of this study we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colon	翻訳日:2022-02-18 19:01:39 公開日:2022-02-17
# (参考訳) 集合オートマトンマッチングに基づく項書き換え Term Rewriting Based On Set Automaton Matching ( http://arxiv.org/abs/2202.08687v1 ) ライセンス: CC BY 4.0	Mark Bouwman, Rick Erkens	(参考訳) これまで我々は,集合オートマトンの概念に基づく効率的なパターンマッチングアルゴリズムを提案してきた。本稿では,効率的な項書き換え手順を実現するために,set automataをどのように活用できるかを検討する。これらの手順はパターンマッチングステップと書き換えステップをインターリーブし、redex発見とサブターム置換をスムーズに統合する。具体的には,左線形項書き換えシステムの最外書き換えのための最適化アルゴリズムを提案し,その正しさを証明し,いくつかの実装実験の結果を示す。 In previous work we have proposed an efficient pattern matching algorithm based on the notion of set automaton. In this article we investigate how set automata can be exploited to implement efficient term rewriting procedures. These procedures interleave pattern matching steps and rewriting steps and thus smoothly integrate redex discovery and subterm replacement. Concretely, we propose an optimised algorithm for outermost rewriting of left-linear term rewriting systems, prove its correctness, and present the results of some implementation experiments.	翻訳日:2022-02-18 18:54:32 公開日:2022-02-17
# (参考訳) OmniSyn:360度ビデオをワイドベースラインパノラマで合成する OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas ( http://arxiv.org/abs/2202.08752v1 ) ライセンス: CC BY 4.0	David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, Ruofei Du	(参考訳) GoogleストリートビューやBingストリートサイドのような没入型マップは、パノラマの膨大なコレクションで現実のビューを提供する。しかし、これらのパノラマは、撮影される経路に沿ってスパース間隔でのみ利用可能であり、ナビゲーション中に視覚的な不連続が生じる。ビュー合成の先行技術は通常、透視画像の集合、ステレオ画像のペア、または単眼画像の上に構築されるが、帯域幅とストレージ使用量の最適化のために商業的プラットフォームで広く採用されているワイドベースラインパノラマはほとんど検討されていない。本稿では,ワイドベースラインパノラマのユニークな特徴を活用し,ワイドベースラインパノラマ間の360°ビュー合成のための新しいパイプラインであるOmniSynを提案する。 OmniSynは球面コストボリュームと単眼スキップ接続を用いて全方位深度マップを予測し、360°画像としてメッシュをレンダリングし、融合ネットワークによって中間ビューを合成する。我々はOmniSynの有効性を,CARLAおよびMatterportデータセットにおける最先端手法との比較,アブレーション研究,ストリートビューでの一般化研究など,総合的な実験結果を通じて実証する。私たちの研究が、この見過ごされてきた現実世界のタスクに関する将来の研究を刺激し、最終的には没入型マップをより滑らかに操作できるようになることを期待している。 Immersive maps such as Google Street View and Bing Streetside provide true-to-life views with a massive collection of panoramas. However, these panoramas are only available at sparse intervals along the path they are taken, resulting in visual discontinuities during navigation. Prior art in view synthesis is usually built upon a set of perspective images, a pair of stereoscopic images, or a monocular image, but barely examines wide-baseline panoramas, which are widely adopted in commercial platforms to optimize bandwidth and storage usage. In this paper, we leverage the unique characteristics of wide-baseline panoramas and present OmniSyn, a novel pipeline for 360° view synthesis between wide-baseline panoramas. OmniSyn predicts omnidirectional depth maps using a spherical cost volume and a monocular skip connection, renders meshes in 360° images, and synthesizes intermediate views with a fusion network. We demonstrate the effectiveness of OmniSyn via comprehensive experimental results including comparison with the state-of-the-art methods on CARLA and Matterport datasets, ablation studies, and generalization studies on street views. 
We envision our work may inspire future research for this unheeded real-world task and eventually produce a smoother experience for navigating immersive maps.	翻訳日:2022-02-18 18:25:30 公開日:2022-02-17
# (参考訳) ツールキットのように見る: ツールキットがai倫理の仕事をどう考えるか Seeing Like a Toolkit: How Toolkits Envision the Work of AI Ethics ( http://arxiv.org/abs/2202.08792v1 ) ライセンス: CC BY 4.0	Richmond Y. Wong, Michael A. Madaio, Nick Merrill	(参考訳) 倫理的AI開発を支援するために多くのツールキットが開発されている。しかしながら、ツールキットは、他のツールと同様に、何をすべきか、どのように行うべきかという前提を設計にエンコードします。本稿では,27のAI倫理ツールキットの質的分析を行い,倫理的作業がどのように想像されるか,どのように支援されるのかを批判的に検証する。具体的には,倫理的問題,倫理的作業を行うべき者,倫理的対処に関わる作業慣行をどのように想定するか,などについて論じる。 AI倫理ツールキットは、AI倫理の社会的側面に固執することや、実践中のAI倫理作業の組織的および政治的意味に抗うことなく、より広範な利害関係者の関与を求めるにもかかわらず、AI倫理の作業が個々の技術実践者にとって技術的作業になるように、主に枠組みを定めている。すべてのツールキットのうち、想定された倫理の作業と、その作業にツールキットが提供するサポートのミスマッチを識別します。倫理的な作業を行う上で,組織的な力のダイナミクスをナビゲートする方法に関するガイダンスの欠如を特定します。我々はこれらの欠落を利用して、AI倫理ツールキットの研究者やデザイナーの今後の業績をグラフ化しています。 Numerous toolkits have been developed to support ethical AI development. However, toolkits, like all tools, encode assumptions in their design about what work should be done and how. In this paper, we conduct a qualitative analysis of 27 AI ethics toolkits to critically examine how the work of ethics is imagined and how it is supported by these toolkits. Specifically, we examine the discourses toolkits rely on when talking about ethical issues, who they imagine should do the work of ethics, and how they envision the work practices involved in addressing ethics. We find that AI ethics toolkits largely frame the work of AI ethics to be technical work for individual technical practitioners, despite calls for engaging broader sets of stakeholders in grappling with social aspects of AI ethics, and without contending with the organizational and political implications of AI ethics work in practice. Among all toolkits, we identify a mismatch between the imagined work of ethics and the support the toolkits provide for doing that work. We identify a lack of guidance around how to navigate organizational power dynamics as they relate to performing ethical work. 
We use these omissions to chart future work for researchers and designers of AI ethics toolkits.	翻訳日:2022-02-18 18:09:05 公開日:2022-02-17
# (参考訳) 物理化学的性質の物理およびデータ駆動予測手法のハイブリダイゼーション Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties ( http://arxiv.org/abs/2202.08804v1 ) ライセンス: CC BY 4.0	Fabian Jirasek, Robert Bamler, and Stephan Mandt	(参考訳) 本稿では,物理化学的特性の予測のための物理・データ駆動手法のハイブリッド化手法を提案する。このアプローチは、物理手法の予測を事前のモデルに 'distills' し、ベイズ推定を用いた疎い実験データと組み合わせる。本研究では,データ駆動型および物理ベースラインと比較して,無限希釈時の活動係数の予測に新たなアプローチを適用し,機械学習文献からアンサンブル法を確立した。 We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach `distills' the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obtain significant improvements compared to the data-driven and physical baselines and established ensemble methods from the machine learning literature.	翻訳日:2022-02-18 17:43:21 公開日:2022-02-17
# (参考訳) 人間とアルゴリズムの協調:相補性と不公平を回避する Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness ( http://arxiv.org/abs/2202.08821v1 ) ライセンス: CC BY 4.0	Kate Donahue, Alexandra Chouldechova, Krishnaram Kenthapadi	(参考訳) 機械学習の研究の多くは予測精度に焦点を当てている。タスクが与えられたら、精度を最大化する機械学習モデル(またはアルゴリズム)を作成する。しかし、多くの環境では、システムの最終的な予測や決定は、アルゴリズムのアウトプットと自身の個人的な専門知識を使って複合的な予測を生成する人間の管理下にある。このような協調システムの最終的な目標は「相補性」(complementarity) であり、すなわち、人間やアルゴリズム単独よりも損失の少ないもの(ほぼ同値)を生み出すことである。しかし, 慎重に設計したシステムにおいても, 相補的な性能は明らかである。私たちの仕事は3つの重要な貢献をします。まず,簡単な人間-アルゴリズム系をモデル化するための理論的枠組みを提供し,複数の事前解析をその内部で表現できることを実証する。次に、このモデルを用いて相補性が不可能な条件を証明し、相補性が達成可能な構成例を示す。最後に,本研究の意義について,特に分類器の公平性について論じる。まとめると、これらの結果は人間のアルゴリズムシステムの複合性能に影響を及ぼす重要な要因の理解を深め、アルゴリズムツールが協調環境のためにどのように最適に設計できるかを洞察する。 Much of machine learning research focuses on predictive accuracy: given a task, create a machine learning model (or algorithm) that maximizes accuracy. In many settings, however, the final prediction or decision of a system is under the control of a human, who uses an algorithm's output along with their own personal expertise in order to produce a combined prediction. One ultimate goal of such collaborative systems is "complementarity": that is, to produce lower loss (equivalently, greater payoff or utility) than either the human or algorithm alone. However, experimental results have shown that even in carefully-designed systems, complementary performance can be elusive. Our work provides three key contributions. First, we provide a theoretical framework for modeling simple human-algorithm systems and demonstrate that multiple prior analyses can be expressed within it. Next, we use this model to prove conditions where complementarity is impossible, and give constructive examples of where complementarity is achievable. Finally, we discuss the implications of our findings, especially with respect to the fairness of a classifier. 
In sum, these results deepen our understanding of key factors influencing the combined performance of human-algorithm systems, giving insight into how algorithmic tools can best be designed for collaborative environments.	翻訳日:2022-02-18 17:42:23 公開日:2022-02-17
# (参考訳) クロスマーケットレコメンデーションのための多段階アンサンブルモデル Multi-stage Ensemble Model for Cross-market Recommendation ( http://arxiv.org/abs/2202.08824v1 ) ライセンス: CC BY 4.0	Cesare Bernardis	(参考訳) 本稿では,WSDM カップ 2022 における PolimiRank チームによるクロスマーケットレコメンデーションのソリューションについて述べる。競争の目的は、異なる市場から抽出された情報を効果的に活用し、2つのターゲット市場における推薦のランキング精度を向上させることである。我々のモデルは、異なる市場に属するデータの組み合わせに基づく多段階的なアプローチで構成されている。最初の段階では、最先端のレコメンデータを使用して、以下の2段階にまとめられたユーザとアイテムのペアのスコアを予測し、単純な線形結合とより強力なグラディエントブースティング決定木技術を用いる。我々のチームはファイナル・リーダーボードで4位にランクインした。 This paper describes the solution of our team PolimiRank for the WSDM Cup 2022 on cross-market recommendation. The goal of the competition is to effectively exploit the information extracted from different markets to improve the ranking accuracy of recommendations on two target markets. Our model is a multi-stage approach based on the combination of data belonging to different markets. In the first stage, state-of-the-art recommenders are used to predict scores for user-item pairs, which are ensembled in the following two stages, employing a simple linear combination and more powerful Gradient Boosting Decision Tree techniques. Our team ranked 4th on the final leaderboard.	翻訳日:2022-02-18 17:11:39 公開日:2022-02-17
# (参考訳) LAMP: 言語モデルでグラディエントからテキストを抽出する LAMP: Extracting Text from Gradients with Language Model Priors ( http://arxiv.org/abs/2202.08827v1 ) ライセンス: CC BY 4.0	Dimitar I. Dimitrov, Mislav Balunović, Nikola Jovanović, Martin Vechev	(参考訳) 最近の研究は、センシティブなユーザデータを勾配更新から再構築できることを示し、フェデレートされた学習における重要なプライバシーの約束を破っている。成功は主に画像データで示されたが、これらの手法はテキストなどの他の領域に直接転送するわけではない。本研究では,テキストデータに合わせた新しい攻撃手法であるLAMPを提案する。我々の重要な洞察は、テキストの以前の確率を補助言語モデルでモデル化し、検索をより自然なテキストへと導くことである。具体的には、LAMPは補助言語モデルによって提供されるレコンストラクション損失と以前のテキスト確率の両方を最小化する離散テキスト変換手順を導入する。この手順は、再建された埋め込みの長さを規則化する再構成損失の連続的な最適化と交換される。我々の実験では、LAMPは以前の作業よりもかなり正確に元のテキストを再構築することを示した。さらに,テキストモデルでは,バッチサイズが1より大きい場合から,まず入力を復元する。これらの結果から,テキストデータ上で動作しているモデルの勾配更新は,従来考えられていたよりも情報漏えいが大きいことが示唆された。 Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients. Our key insight is to model the prior probability of the text with an auxiliary language model, utilizing it to guide the search towards more natural text. Concretely, LAMP introduces a discrete text transformation procedure that minimizes both the reconstruction loss and the prior text probability, as provided by the auxiliary language model. The procedure is alternated with a continuous optimization of the reconstruction loss, which also regularizes the length of the reconstructed embeddings. Our experiments demonstrate that LAMP reconstructs the original text significantly more precisely than prior work: we recover 5x more bigrams and $23\%$ longer subsequences on average. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. 
These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.	翻訳日:2022-02-18 17:02:17 公開日:2022-02-17
# 水中画像強調のためのウェーブレット型デュアルストリームネットワーク A Wavelet-based Dual-stream Network for Underwater Image Enhancement ( http://arxiv.org/abs/2202.08758v1 ) ライセンス: Link先を確認	Ziyin Ma and Changjae Oh	(参考訳) 水中画像のカラーキャストやぼやけた細部に対処するウェーブレットベースのデュアルストリームネットワークを提案する。入力画像を離散ウェーブレット変換を用いて複数の周波数帯域に分解することで、これらのアーティファクトを別々に処理し、ダウンサンプリングされた構造画像と詳細画像を生成する。これらのサブバンドイメージは、マルチカラースペースフュージョンネットワークとディテールエンハンスメントネットワークという2つのサブネットワークを組み込んだデュアルストリームネットワークへの入力として使用されます。多色空間融合ネットワークは、分解した構造画像を入力として、入力の多様な色空間からの特徴表現を用いて色補正出力を推定する。ディテールエンハンスメントネットワークは、高周波サブバンドからの画像の詳細を改善することにより、元の水中画像のぼやけに対処する。提案手法を実環境および合成水中データセットの両方で検証し,計算複雑性の低い色補正およびぼかし除去におけるモデルの有効性を示した。 We present a wavelet-based dual-stream network that addresses color cast and blurry details in underwater images. We handle these artifacts separately by decomposing an input image into multiple frequency bands using discrete wavelet transform, which generates the downsampled structure image and detail images. These sub-band images are used as input to our dual-stream network that incorporates two sub-networks: the multi-color space fusion network and the detail enhancement network. The multi-color space fusion network takes the decomposed structure image as input and estimates the color corrected output by employing the feature representations from diverse color spaces of the input. The detail enhancement network addresses the blurriness of the original underwater image by improving the image details from high-frequency sub-bands. We validate the proposed method on both real-world and synthetic underwater datasets and show the effectiveness of our model in color correction and blur removal with low computational complexity.	翻訳日:2022-02-18 16:43:23 公開日:2022-02-17
# 推移型および線形順序付きデータを用いた問合せ応答 Query Answering with Transitive and Linear-Ordered Data ( http://arxiv.org/abs/2202.08555v1 ) ライセンス: Link先を確認	Antoine Amarilli and Michael Benedikt and Pierre Bourhis and Michael Vanden Boom	(参考訳) 我々は,一組の区別関係に対して追加的な意味的制約を課すフロンティア保護存在規則のような強力な制約言語を含む包括的問題を考える。我々は、関係を推移的に制限し、関係を他の関係の推移的閉包に制限し、関係を線型次数に制限することを検討する。我々は、各ケースにおいて推論を決定可能とし、対応する決定問題の複雑さを分離できるガードネスの自然な変種を与える。最後に,これらの条件のわずかな変化が決定不能につながることを示す。 We consider entailment problems involving powerful constraint languages such as frontier-guarded existential rules in which we impose additional semantic restrictions on a set of distinguished relations. We consider restricting a relation to be transitive, restricting a relation to be the transitive closure of another relation, and restricting a relation to be a linear order. We give some natural variants of guardedness that allow inference to be decidable in each case, and isolate the complexity of the corresponding decision problems. Finally we show that slight changes in these conditions lead to undecidability.	翻訳日:2022-02-18 16:43:06 公開日:2022-02-17
# 大規模実世界グラフにおける最大k-プレックスの一覧 Listing Maximal k-Plexes in Large Real-World Graphs ( http://arxiv.org/abs/2202.08737v1 ) ライセンス: Link先を確認	Zhengren Wang, Yi Zhou, Mingyun Xiao and Bakhadyr Khoussainov	(参考訳) 大きなグラフで高密度なサブグラフをリストすることは、コミュニティ検出のような様々なネットワーク分析アプリケーションにおいて重要なタスクである。最も密度の高いモデルであるクリークは広く研究されている。しかし、実際には、データノイズなど、様々な理由でコミュニティがクリークとして形成されることは滅多にない。したがって、$k$-plex、すなわち各頂点が最大$k$頂点を除いて全ての頂点に隣接するグラフが、リラックスしたcliqueバージョンとして導入される。コヒーシブなコミュニティをよりよくシミュレートするために、小さな$k$を持つ連結な$k$-plexesが重視されることが多い。本稿では,全ての最大$k$-plexes と所定のサイズの最大$k$-plexes をリストアップする研究を継続する。最初のコントリビューションはアルゴリズム \emph{ListPlex} で、各定数 $k$ に対して、全ての最大$k$-plexes を $O^*(\gamma^D)$ time でリストアップします。ここで$\gamma$ は$k$ に関連する値ですが、2 より厳密に小さい値で、$D$ は実世界のグラフの頂点数 $n$ よりもはるかに小さいグラフの縮退数です。 2^n$の自明なバウンドと比較すると、改善は重要であり、我々のバウンドはすべての既知の結果より優れている。実際には、構造ベースのプルールール、キャッシュ効率のよいデータ構造、並列技術など、所定のサイズの$k$プレックスの一覧化を高速化するために、いくつかの手法を用いる。これら全ては、非常に実用的なアルゴリズムをもたらす。実証的な結果は、我々のアプローチが最先端のソリューションを最大で桁違いに上回っていることを示している。 Listing dense subgraphs in large graphs is a key task in a variety of network analysis applications such as community detection. The clique, as the densest model, has been widely investigated. However, in practice, communities rarely form cliques for various reasons, e.g., data noise. Therefore, the $k$-plex, a graph in which each vertex is adjacent to all but at most $k$ vertices, is introduced as a relaxed version of the clique. Often, to better simulate cohesive communities, an emphasis is placed on connected $k$-plexes with small $k$. In this paper, we continue the research line of listing all maximal $k$-plexes and maximal $k$-plexes of prescribed size. Our first contribution is the algorithm \emph{ListPlex}, which lists all maximal $k$-plexes in $O^*(\gamma^D)$ time for each constant $k$, where $\gamma$ is a value related to $k$ but strictly smaller than 2, and $D$ is the degeneracy of the graph, which is far less than the vertex number $n$ in real-world graphs. 
Compared to the trivial bound of $2^n$, the improvement is significant, and our bound is better than all previously known results. In practice, we further use several techniques to accelerate listing $k$-plexes of a given size, such as structure-based pruning rules, cache-efficient data structures, and parallel techniques. All these together result in a very practical algorithm. Empirical results show that our approach outperforms the state-of-the-art solutions by up to orders of magnitude.	翻訳日:2022-02-18 16:42:56 公開日:2022-02-17
# 有限混合モデルによる最大近似推定のための精製収束率 Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models ( http://arxiv.org/abs/2202.08786v1 ) ライセンス: Link先を確認	Tudor Manole, Nhat Ho	(参考訳) 有限混合モデル下での最大極大推定(MLE)の収束率を再検討する。これらのモデルにおけるパラメータ推定の解析において、ワッサースタイン距離は標準損失関数となり、ラベルの切り換えを回避でき、結合した混合成分の挙動を消滅重みで正確に特徴付けることができるようになった。しかし、ワッサーシュタイン計量は、残りの適合した混合成分の中で最悪の場合の収束率のみを捉えることができる。対数類似関数をペナル化して混合重みの消滅を阻止すると、より強い損失関数を導出し、ワッサーシュタイン距離のこの欠点を解決することができる。これらの新しい損失関数は, 適合混合成分の収束率の不均一性を正確に把握し, 各種混合モデルにおける既存点方向および一様収束率の研削に用いた。特に、これらの結果は、ペナルティ化されたmleの構成要素のサブセットが、通常、過去の作業で予想されていたよりもかなり速く収束することを示している。さらに、これらの結論のいくつかが従来のMLEにまで拡張されていることを示す。我々の理論的知見は、これらの収束率を改善するためのシミュレーション研究によって裏付けられている。 We revisit convergence rates for maximum likelihood estimation (MLE) under finite mixture models. The Wasserstein distance has become a standard loss function for the analysis of parameter estimation in these models, due in part to its ability to circumvent label switching and to accurately characterize the behaviour of fitted mixture components with vanishing weights. However, the Wasserstein metric is only able to capture the worst-case convergence rate among the remaining fitted mixture components. We demonstrate that when the log-likelihood function is penalized to discourage vanishing mixing weights, stronger loss functions can be derived to resolve this shortcoming of the Wasserstein distance. These new loss functions accurately capture the heterogeneity in convergence rates of fitted mixture components, and we use them to sharpen existing pointwise and uniform convergence rates in various classes of mixture models. In particular, these results imply that a subset of the components of the penalized MLE typically converge significantly faster than could have been anticipated from past work. We further show that some of these conclusions extend to the traditional MLE. Our theoretical findings are supported by a simulation study to illustrate these improved convergence rates.	
翻訳日:2022-02-18 16:42:23 公開日:2022-02-17
# グラフマスク付きオートエンコーダ Graph Masked Autoencoder ( http://arxiv.org/abs/2202.08391v1 ) ライセンス: Link先を確認	Hongxu Chen, Sixiao Zhang, Guandong Xu	(参考訳) トランスフォーマーはグラフ表現の学習において最先端のパフォーマンスを達成している。しかし、深いトランスフォーマーをスクラッチからトレーニングすることは困難であり、メモリ消費が大きいため、現実世界のシナリオにトランスフォーマーを適用する際の課題が残っている。この2つの課題に対処するために,我々は,バニラグラフ変換器をエンコーダおよびデコーダとして使用する,グラフ表現を学習するための自己教師型モデルであるGraph Masked Autoencoders (GMAE)を提案する。 GMAEは部分的にマスキングされたグラフを入力として、マスキングされたノードの特徴を再構築する。我々は、非対称エンコーダ-デコーダ設計を採用し、エンコーダは深いグラフトランスフォーマ、デコーダは浅いグラフトランスフォーマである。マスキング機構と非対称設計によりGMAEは従来のトランスフォーマーに比べてメモリ効率の良いモデルとなった。 GMAEを用いて事前学習したグラフトランスフォーマーは,スクラッチからのトレーニングに比べ,微調整後の性能が向上することを示した。また,従来の自己教師付きグラフ表現モデルとして機能し,SVMモデルをダウンストリームグラフ分類器として使用する場合,GMAEは7つのベンチマークデータセットのうち5つで最先端のパフォーマンスを実現する。 Transformers have achieved state-of-the-art performance in learning graph representations. However, challenges remain when applying transformers to real-world scenarios, because deep transformers are hard to train from scratch and their memory consumption is large. To address these two challenges, we propose Graph Masked Autoencoders (GMAE), a self-supervised model for learning graph representations, where vanilla graph transformers are used as the encoder and the decoder. GMAE takes partially masked graphs as input, and reconstructs the features of the masked nodes. We adopt an asymmetric encoder-decoder design, where the encoder is a deep graph transformer and the decoder is a shallow graph transformer. The masking mechanism and the asymmetric design make GMAE a memory-efficient model compared with conventional transformers. We show that, compared with training from scratch, the graph transformer pre-trained using GMAE can achieve much better performance after fine-tuning. We also show that, when serving as a conventional self-supervised graph representation model and using an SVM model as the downstream graph classifier, GMAE achieves state-of-the-art performance on 5 of the 7 benchmark datasets.	翻訳日:2022-02-18 16:40:08 公開日:2022-02-17
# Point-of-Interest Recommender システムによるレーティングと関連性の改善 Improving Rating and Relevance with Point-of-Interest Recommender System ( http://arxiv.org/abs/2202.08751v1 ) ライセンス: Link先を確認	Syed Raza Bashir, Vojislav Misic	(参考訳) 位置情報ベースのソーシャルネットワークでは,関心点(POI)の推薦が不可欠である。これにより、ユーザや場所の情報共有が容易になる。近年,質問項目関連性を表す大量の学習データを必要とする大規模検索システムとしてPOIを推奨する傾向にある。しかし,検索システムにおけるユーザフィードバックの収集は高価である。既存のPOIレコメンデータシステムは、ユーザとアイテム(ロケーション)のインタラクションのみに基づいてレコメンデーションを行います。しかし、考慮すべきフィードバックの源はたくさんあります。例えば、ユーザがPOIを訪れたとき、POIとは何かなどです。 POIレコメンデータの開発には,これらすべての種類のフィードバックを統合することが不可欠です。本稿では,ユーザ情報とアイテム情報と補助情報を用いて検索システムにおけるレコメンデーションモデリングを改善することを提案する。我々は,協調情報とコンテンツ情報の両方が存在する場合のクエリ-アイテム関係をモデル化するディープニューラルネットワークアーキテクチャを開発した。また、ユーザからのフィードバックデータからコンテキスト情報を含めることで、クエリやアイテムの学習表現の質を向上させる。これらの学習表現を大規模データセットに適用することで、大幅な改善がもたらされた。 The recommendation of points of interest (POIs) is essential in location-based social networks. It makes it easier for users and locations to share information. Recently, researchers have tended to recommend POIs by treating the task as a large-scale retrieval problem that requires a large amount of training data representing query-item relevance. However, gathering user feedback in retrieval systems is an expensive task. Existing POI recommender systems make recommendations based solely on user-item (location) interactions. However, there are numerous other sources of feedback to consider, for example, when the user visits a POI and what the POI is about. Integrating all these different types of feedback is essential when developing a POI recommender. In this paper, we propose using user and item information and auxiliary information to improve the recommendation modelling in a retrieval system. We develop a deep neural network architecture to model query-item relevance in the presence of both collaborative and content information. We also improve the quality of the learned representations of queries and items by including the contextual information from the user feedback data. 
The application of these learned representations to a large-scale dataset resulted in significant improvements.	翻訳日:2022-02-18 16:39:48 公開日:2022-02-17
# この通知を送りましょうか。将来をモデル化したプッシュ通知決定の最適化 Should I send this notification? Optimizing push notifications decision making by modeling the future ( http://arxiv.org/abs/2202.08812v1 ) ライセンス: Link先を確認	Conor O'Brien, Huasen Wu, Shaodan Zhai, Dalin Guo, Wenzhe Shi, Jonathan J Hunt	(参考訳) 最も推奨されるシステムは、ユーザの即時応答に基づいて最適化されるミオピックである。これは、長期的なユーザ満足度の作成など、真の目標と誤解する可能性がある。この作業では,特に推奨システム決定の長期的な影響が強いモバイルプッシュ通知に重点を置いています。例えば、過剰な通知や無関係な通知を送ると、ユーザーに迷惑をかけ、通知を無効にすることがある。しかし、将来マイナス効果が発生するため、筋電図システムは常に通知を送信することを選択する。これは典型的にはヒューリスティックを用いて緩和される。しかし、ヒューリスティックスは推論や改善が困難であり、システムが変更されるたびに修正が必要であり、亜最適かもしれない。これらの欠点に対処するため、長期的価値(LTV)を直接最適化するレコメンデーターシステムに大きな関心がある。本稿では,モデルベース強化学習(RL)を用いたLTVの最大化手法について述べる。我々は,通知がユーザの将来の行動に与える影響をモデル化する。推薦システムにおけるLTVの最大化にRLを適用した以前の作業の多くはセッションベースの最適化に重点を置いていたが、この作業における通知決定の時間的地平は数日にわたって続いている。我々は、大手ソーシャルネットワーク上でのA/Bテストでこのアプローチをテストする。プッシュ通知に関する決定を最適化することで,既存のヒューリスティックなシステムと同じレベルのユーザエンゲージメントをプラットフォーム上で生成しながら,通知の送信を減らし,ベースラインシステムよりも高いオープンレートを得ることができることを示す。 Most recommender systems are myopic, that is they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long term user satisfaction. In this work we focus on mobile push notifications, where the long term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may annoy a user and cause them to disable notifications. However, a myopic system will always choose to send a notification since negative effects occur in the future. This is typically mitigated using heuristics. However, heuristics can be hard to reason about or improve, require retuning each time the system is changed, and may be suboptimal. To counter these drawbacks, there is significant interest in recommender systems that optimize directly for long-term value (LTV). Here, we describe a method for maximising LTV by using model-based reinforcement learning (RL) to make decisions about whether to send push notifications. 
We model the effects of sending a notification on the user's future behavior. Much of the prior work applying RL to maximise LTV in recommender systems has focused on session-based optimization, while the time horizon for notification decision making in this work extends over several days. We test this approach in an A/B test on a major social network. We show that by optimizing decisions about push notifications we are able to send fewer notifications and obtain a higher open rate than the baseline system, while generating the same level of user engagement on the platform as the existing heuristic-based system.	翻訳日:2022-02-18 16:39:34 公開日:2022-02-17
# 反事実推論と事実推論に基づくグラフニューラルネットワーク説明の学習と評価 Learning and Evaluating Graph Neural Network Explanations based on Counterfactual and Factual Reasoning ( http://arxiv.org/abs/2202.08816v1 ) ライセンス: Link先を確認	Juntao Tan, Shijie Geng, Zuohui Fu, Yingqiang Ge, Shuyuan Xu, Yunqi Li, Yongfeng Zhang	(参考訳) 構造化データは、ソーシャルメディアのソーシャルネットワーク、学術ウェブサイトの引用ネットワーク、オンラインフォーラムのスレッドデータなど、Webアプリケーションによく存在する。複雑なトポロジーのため、そのようなデータ内のリッチな情報を処理し利用することは困難である。グラフニューラルネットワーク(GNN)は、構造データに対する学習表現に大きな利点を示している。しかし、ディープラーニングモデルの非透明性は、GNNによる予測を説明・解釈するのは簡単ではない。一方、GNNの説明を評価することは大きな課題であり、多くの場合、真理的な説明は利用できない。本稿では、因果推論理論に基づくCF^2推論の考察を行い、説明可能なGNNにおける学習と評価の両問題を解く。本稿では,2つの因果的視点から最適化問題を定式化するモデルに依存しないフレームワークを提案する。これにより、CF^2 は以前の説明可能な GNN と区別される。この研究のもうひとつの貢献は、GNN説明の評価である。根拠を必要とせず, 生成した説明を定量的に評価するために, 説明の必要性と十分性を評価するために, 反事実的, 事実的推論に基づく指標を設計する。 CF^2は, 実世界のデータセットにおける従来の最先端の手法よりも, より優れた説明を生成する。さらに, 統計的解析により, 実測値と実測値との相関関係を正当化する。 Structural data is common in Web applications, such as social networks in social media, citation networks in academic websites, and thread data in online forums. Due to the complex topology, it is difficult to process and make use of the rich information within such data. Graph Neural Networks (GNNs) have shown great advantages in learning representations for structural data. However, the non-transparency of deep learning models makes it non-trivial to explain and interpret the predictions made by GNNs. Meanwhile, it is also a big challenge to evaluate GNN explanations, since in many cases the ground-truth explanations are unavailable. In this paper, we draw on insights from Counterfactual and Factual (CF^2) reasoning in causal inference theory to solve both the learning and evaluation problems in explainable GNNs. For generating explanations, we propose a model-agnostic framework by formulating an optimization problem based on both of the two causal perspectives. This distinguishes CF^2 from previous explainable GNNs that only consider one of them. 
Another contribution of the work is the evaluation of GNN explanations. To quantitatively evaluate the generated explanations without requiring ground truth, we design metrics based on Counterfactual and Factual reasoning to evaluate the necessity and sufficiency of the explanations. Experiments show that, whether or not ground-truth explanations are available, CF^2 generates better explanations than previous state-of-the-art methods on real-world datasets. Moreover, the statistical analysis justifies the correlation between the performance on ground-truth evaluation and our proposed metrics.	翻訳日:2022-02-18 16:39:09 公開日:2022-02-17
# 局所的非パラメトリック信頼区間とシーケンス Locally private nonparametric confidence intervals and sequences ( http://arxiv.org/abs/2202.08728v1 ) ライセンス: Link先を確認	Ian Waudby-Smith, Zhiwei Steven Wu, Aaditya Ramdas	(参考訳) この研究は、局所微分プライバシー(ldp)の制約下で人口パラメータの非パラメトリック、非漸近的統計推論を行う手法を導出する。 z_1, \dots, x_n)$に民営化される平均$\mu^\star$(z_1, \dots, z_n)$の観測により、民営化データへのアクセスが与えられた場合にのみ$\mu^\star \in \mathbb r$に対して、信頼区間(ci)と時間一様信頼シーケンス(cs)を導入する。我々は、ワーナーの有名な「ランダム化応答」機構を非パラメトリックかつ逐次的に一般化し、任意の有界な確率変数に対するldpを満たし、その結果の民営化された観測へのアクセスを与えられた方法でcisとcssを提供する。我々は、これらのCSを拡張して、時間変化のある(非定常的な)手段を捕捉し、これらの手法がオンラインA/Bテストのプライベートな実施にどのように使用できるかを説明する。 This work derives methods for performing nonparametric, nonasymptotic statistical inference for population parameters under the constraint of local differential privacy (LDP). Given observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we introduce confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star \in \mathbb R$ when only given access to the privatized data. We introduce a nonparametric and sequentially interactive generalization of Warner's famous "randomized response" mechanism, satisfying LDP for arbitrary bounded random variables, and then provide CIs and CSs for their means given access to the resulting privatized observations. We extend these CSs to capture time-varying (non-stationary) means, and conclude by illustrating how these methods can be used to conduct private online A/B tests.	翻訳日:2022-02-18 16:36:46 公開日:2022-02-17
# 無線フェデレーション学習における航空モデル集約の効率化のための時間関連スパシフィケーション Time-Correlated Sparsification for Efficient Over-the-Air Model Aggregation in Wireless Federated Learning ( http://arxiv.org/abs/2202.08420v1 ) ライセンス: Link先を確認	Yuxuan Sun, Sheng Zhou, Zhisheng Niu, Deniz Gündüz	(参考訳) Federated Edge Learning(FEEL)は、エッジインテリジェンスアプリケーションを駆動するための有望な分散機械学習(ML)フレームワークである。しかし、無線の動的な環境とエッジデバイスのリソース制限により、通信は大きなボトルネックとなる。本研究では,通信効率の高い FEEL のためのハイブリッドアグリゲーション (TCS-H) を用いた時間相関スペーシングを提案する。モデルパラメータ間の時間的相関を利用して、デバイス間で同一のグローバルスペーシフィケーションマスクを構築し、より効率的なモデルアグリゲーションを実現する。各デバイスはさらに局所スパースベクトルを構築し、それぞれが直交する多重アクセスを持つデジタル通信によって集約される重要なパラメータを探索する。 TCS-Hの装置スケジューリングと電力割当アルゴリズムを更に設計する。実験結果から,TCS-Hは通信資源が限られており,直交モデルアグリゲーションによる従来のTop-Kスペーシフィケーションに比べて高い精度が得られることがわかった。 Federated edge learning (FEEL) is a promising distributed machine learning (ML) framework to drive edge intelligence applications. However, due to the dynamic wireless environments and the resource limitations of edge devices, communication becomes a major bottleneck. In this work, we propose time-correlated sparsification with hybrid aggregation (TCS-H) for communication-efficient FEEL, which jointly exploits the power of model compression and over-the-air computation. By exploiting the temporal correlations among model parameters, we construct a global sparsification mask, which is identical across devices, and thus enables efficient model aggregation over-the-air. Each device further constructs a local sparse vector to explore its own important parameters, which are aggregated via digital communication with orthogonal multiple access. We further design device scheduling and power allocation algorithms for TCS-H. Experimental results show that, under limited communication resources, TCS-H can achieve significantly higher accuracy compared to the conventional top-K sparsification with orthogonal model aggregation, with both i.i.d. and non-i.i.d. data distributions.	翻訳日:2022-02-18 16:36:24 公開日:2022-02-17
# ADD 2022:初のオーディオ深層合成検出チャレンジ ADD 2022: the First Audio Deep Synthesis Detection Challenge ( http://arxiv.org/abs/2202.08433v1 ) ライセンス: Link先を確認	Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li	(参考訳) オーディオディープフェイク検出は、ASVspoof 2021に含まれる新たなトピックである。しかし、最近の共有タスクは多くの現実と挑戦的なシナリオをカバーしていない。最初のオーディオディープ合成検出チャレンジ(ADD)は、ギャップを埋めるために動機付けられた。 ADD 2022には、低品質の偽オーディオ検出(LF)、部分的に偽オーディオ検出(PF)、オーディオ偽ゲーム(FG)の3つのトラックが含まれている。 LFトラックは、さまざまな現実世界のノイズで、ボナ・フェイドと完全に偽の発話を扱うことに焦点を当てている。 PFトラックは、部分的に偽のオーディオと本物を区別することを目的としている。 FGトラックは、オーディオ生成タスクとオーディオ偽検出タスクの2つのタスクを含むライバルゲームである。本稿では,データセット,評価指標,プロトコルについて述べる。また,近年のオーディオディープフェイク検出タスクの進歩を反映した大きな発見を報告する。 Audio deepfake detection is an emerging topic, which was included in ASVspoof 2021. However, recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated by the need to fill this gap. ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on dealing with bona fide and fully fake utterances with various real-world noises. The PF track aims to distinguish partially fake audio from real audio. The FG track is a rivalry game, which includes two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect the recent advances in audio deepfake detection tasks.	翻訳日:2022-02-18 16:36:03 公開日:2022-02-17
# MLP-ASR:音声認識のためのシーケンス長非依存型オールMLPアーキテクチャ MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition ( http://arxiv.org/abs/2202.08456v1 ) ライセンス: Link先を確認	Jin Sakuma, Tatsuya Komatsu, and Robin Scheibler	(参考訳) 可変長入力に適した多層パーセプトロン(MLP)ベースのアーキテクチャを提案する。画像分類のために最近提案されたMLPベースのアーキテクチャは、固定サイズの入力にのみ使用できる。しかし、例えば音響信号など、多くの種類のデータの長さは自然に変化する。任意の長さのシーケンスで使用するために,MLPベースのアーキテクチャを拡張する3つの手法を提案する。 1つはフーリエ領域で適用される円形の畳み込み、もう1つは奥行きの畳み込みを適用し、最後はシフト演算に依存する。提案手法をLibrispeechとTedlium2コーパスを用いて自動音声認識タスクで評価する。提案されている最も優れたMLPベースのアーキテクチャは、自己注意ベースのアーキテクチャの86.4%のサイズで、WERをLibrispeech dev-clean/dev-otherで1.0 / 0.9%、test-clean/test-otherで0.9 / 0.5%、Tedlium2 dev/testセットで0.8 / 1.1%改善する。 We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length. The first one uses a circular convolution applied in the Fourier domain, the second applies a depthwise convolution, and the third relies on a shift operation. We evaluate the proposed architectures on an automatic speech recognition task with the Librispeech and Tedlium2 corpora. The best proposed MLP-based architecture improves WER by 1.0 / 0.9% on the Librispeech dev-clean/dev-other sets, 0.9 / 0.5% on the Librispeech test-clean/test-other sets, and 0.8 / 1.1% on the Tedlium2 dev/test sets, while using 86.4% of the size of the self-attention-based architecture.	翻訳日:2022-02-18 16:35:50 公開日:2022-02-17
# トレーニングのボトルネックはどこにあるのか? ディープラーニング前処理パイプラインにおける隠れトレードオフ Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines ( http://arxiv.org/abs/2202.08679v1 ) ライセンス: Link先を確認	Alexander Isenko, Ruben Mayer, Jeffrey Jedele, Hans-Arno Jacobsen	(参考訳) ディープラーニングにおける前処理パイプラインは、トレーニングプロセスを忙しくするための十分なデータスループットの提供を目的としている。ハードウェアの革新(高速GPU、TPU、インターコネクションなど)や高度な並列化技術によって、トレーニングプロセスのスループットが向上するにつれ、リソース利用の最大化はますます困難になりつつある。同時に、ますます複雑なモデルをトレーニングするために必要なトレーニングデータも増えています。この開発の結果、エンドツーエンドのディープラーニングパイプラインでは、データ前処理とプロビジョニングが深刻なボトルネックになっている。本稿では,4つの異なる機械学習領域からのデータ前処理パイプラインを詳細に分析する。エンドツーエンドのディープラーニングパイプラインのためのデータセットを効率的に準備し、スループット、前処理時間、ストレージ消費を最適化するために個々のトレードオフを抽出する新しい視点を導入する。さらに、スループットを最大化する適切な前処理戦略を自動的に決定できるオープンソースのプロファイリングライブラリを提供する。実世界のユースケースに生成した洞察を適用することで、パイプラインを機能的に同一に保ちながら、未調整のシステムに比べてスループットが3倍から13倍に向上する。これらの結果は,データパイプラインチューニングの膨大な可能性を示している。 Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed in order to train increasingly complex models is growing. As a consequence of this development, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines. In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. 
By applying our generated insights to real-world use-cases, we obtain an increased throughput of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.	翻訳日:2022-02-18 16:35:30 公開日:2022-02-17
# トランスフォーマーを用いた確率力学の学習と創発行動予測 Learning stochastic dynamics and predicting emergent behavior using transformers ( http://arxiv.org/abs/2202.08708v1 ) ライセンス: Link先を確認	Corneel Casert, Isaac Tamblyn and Stephen Whitelam	(参考訳) 言語処理用に設計されたニューラルネットワークは,システムの単一の動的軌道を観測することで確率システムの動的規則を学習でき,訓練中に観察されない条件下での創発的挙動を正確に予測できることを示す。連続時間モンテカルロ動力学による活性物質の格子モデルについて検討し,その定常状態が小さな分散クラスターからなる密度でシミュレーションした。我々はモデルの1つの軌道上でトランスフォーマーと呼ばれるニューラルネットワークを訓練する。トランスフォーマーは多数の非局所的な動的規則を表現できる能力を有しており、このモデルの力学が少数のプロセスから構成されていることを学習する。訓練されたトランスフォーマーの前方伝播軌道は、訓練中に現れなかった密度で運動性誘起相分離を示し、非平衡相転移の存在を予測する。トランスフォーマーは、速度の明示的な列挙や構成空間の粗粒化を伴わずに観察から動的規則を学習する柔軟性を持つため、この手順は、大規模で複雑な力学発生器を持つものを含む、幅広い物理システムに適用することができる。 We show that a neural network originally designed for language processing can learn the dynamical rules of a stochastic system by observation of a single dynamical trajectory of the system, and can accurately predict its emergent behavior under conditions not observed during training. We consider a lattice model of active matter undergoing continuous-time Monte Carlo dynamics, simulated at a density at which its steady state comprises small, dispersed clusters. We train a neural network called a transformer on a single trajectory of the model. The transformer, which we show has the capacity to represent dynamical rules that are numerous and nonlocal, learns that the dynamics of this model consists of a small number of processes. Forward-propagated trajectories of the trained transformer, at densities not encountered during training, exhibit motility-induced phase separation and so predict the existence of a nonequilibrium phase transition. Transformers have the flexibility to learn dynamical rules from observation without explicit enumeration of rates or coarse-graining of configuration space, and so the procedure used here can be applied to a wide range of physical systems, including those with large and complex dynamical generators.	翻訳日:2022-02-18 16:35:11 公開日:2022-02-17
# グラフ上のハミルトン・ヤコビ方程式と半教師付き学習とデータ深度への応用 Hamilton-Jacobi equations on graphs with applications to semi-supervised learning and data depth ( http://arxiv.org/abs/2202.08789v1 ) ライセンス: Link先を確認	Jeff Calder, Mahmood Ettehad	(参考訳) 最短経路グラフ距離は、データ多様体上の測地線距離を近似できるため、データサイエンスや機械学習で広く使われている。しかし、最短経路距離は、ノイズまたは敵対的な摂動によって、グラフ内の破損したエッジの追加に非常に敏感である。本稿では, $p$-eikonal方程式と呼ぶグラフ上のハミルトン・ヤコビ方程式の族を研究する。 $p=1$の$p$-eikonal方程式はグラフ上の証明可能な堅牢な距離型関数であり、$p\to \infty$の極限は最短経路距離を回復する。 $p$-eikonal方程式は最短経路グラフ距離とは一致しないが、ランダムな幾何学グラフ上の$p$-eikonal方程式の連続極限は連続体における測地密度重み付き距離を回復することを示している。我々は,データ深度と半教師付き学習に対する$p$-eikonal方程式の適用を検討し,連続極限を用いて両アプリケーションに漸近的整合性を示す。最後に,MNIST,FashionMNIST,CIFAR-10などの実画像データセットに対して,データ深度と半教師付き学習による実験結果を示し,$p$-eikonal方程式が最短経路距離よりも大幅に優れた結果をもたらすことを示す。 Shortest path graph distances are widely used in data science and machine learning, since they can approximate the underlying geodesic distance on the data manifold. However, the shortest path distance is highly sensitive to the addition of corrupted edges in the graph, either through noise or an adversarial perturbation. In this paper we study a family of Hamilton-Jacobi equations on graphs that we call the $p$-eikonal equation. We show that the $p$-eikonal equation with $p=1$ is a provably robust distance-type function on a graph, and the $p\to \infty$ limit recovers shortest path distances. While the $p$-eikonal equation does not correspond to a shortest-path graph distance, we nonetheless show that the continuum limit of the $p$-eikonal equation on a random geometric graph recovers a geodesic density weighted distance in the continuum. We consider applications of the $p$-eikonal equation to data depth and semi-supervised learning, and use the continuum limit to prove asymptotic consistency results for both applications. 
Finally, we show the results of experiments with data depth and semi-supervised learning on real image datasets, including MNIST, FashionMNIST and CIFAR-10, which show that the $p$-eikonal equation offers significantly better results compared to shortest path distances.	翻訳日:2022-02-18 16:34:52 公開日:2022-02-17
# (参考訳) グラフニューラルネットワークが生成するグラフ関数の厳密なクラス The Exact Class of Graph Functions Generated by Graph Neural Networks ( http://arxiv.org/abs/2202.08833v1 ) ライセンス: CC BY 4.0	Mohammad Fereydounian, Hamed Hassani, Javid Dadashkarimi, Amin Karbasi	(参考訳) エッジウェイトとノード特徴の任意のセットで定義されたグラフ関数が与えられたとき、グラフ関数と出力が同一であるグラフニューラルネットワーク(gnn)が存在するだろうか? 本稿では,この疑問に完全に答え,GNNで表現可能なグラフ問題のクラスを特徴付ける。エッジ重みとノード特徴の置換の観点から代数的条件を同定し、グラフ問題がgnnの到達範囲内にあることを証明した。さらに、この条件を2次的に多くの制約をチェックすることで効率よく検証できることを示す。 GNNの表現力に関する洗練された特徴付けは、GNNとWeisfeiler-Lehmanグラフの同値性を示す理論結果と直交する。例えば、我々の特徴は、min-cut値、max-flow値、max-clique sizeなどの多くの自然グラフ問題をGNNで表現できることを示唆している。対照的に、驚くべきことに、GNNがすべてのノード間の最も短いパスの長さを正しく見つけることができない非常に単純なグラフが存在する。最短経路を見つけることは動的プログラミング(dp)における最も古典的な問題の1つである。このように、前述の否定例は、(概念的には)非常に類似した反復的手順に従っているにもかかわらず、DPとGNNの相違を浮き彫りにしている。最後に,実験シミュレーションによる理論結果を支持する。 Given a graph function, defined on an arbitrary set of edge weights and node features, does there exist a Graph Neural Network (GNN) whose output is identical to the graph function? In this paper, we fully answer this question and characterize the class of graph problems that can be represented by GNNs. We identify an algebraic condition, in terms of the permutation of edge weights and node features, which proves to be necessary and sufficient for a graph problem to lie within the reach of GNNs. Moreover, we show that this condition can be efficiently verified by checking quadratically many constraints. Note that our refined characterization on the expressive power of GNNs are orthogonal to those theoretical results showing equivalence between GNNs and Weisfeiler-Lehman graph isomorphism heuristic. For instance, our characterization implies that many natural graph problems, such as min-cut value, max-flow value, and max-clique size, can be represented by a GNN. In contrast, and rather surprisingly, there exist very simple graphs for which no GNN can correctly find the length of the shortest paths between all nodes. 
Note that finding shortest paths is one of the most classical problems in Dynamic Programming (DP). Thus, the aforementioned negative example highlights the misalignment between DP and GNN, even though (conceptually) they follow very similar iterative procedures. Finally, we support our theoretical results by experimental simulations.	翻訳日:2022-02-18 16:32:55 公開日:2022-02-17
# グラフから見たBERTの過平滑化再考 Revisiting Over-smoothing in BERT from the Perspective of Graph ( http://arxiv.org/abs/2202.08625v1 ) ライセンス: Link先を確認	Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M.S. Lee, James T. Kwok	(参考訳) 近年,トランスフォーマーモデルにおける過平滑化現象が視覚と言語の両方で観測されている。しかし、この現象の主な原因を深く掘り下げて調査した既存研究はない。そこで本研究では,この問題が最初に発見・検討されたグラフの観点から,過平滑化問題を解析する試みを行う。直感的には、自己注意行列は対応するグラフの正規化隣接行列と見なすことができる。上述の接続に基づいて理論的解析を行い、トランスフォーマーモデルにおける過平滑化問題において、層正規化が重要な役割を果たすことを確認する。具体的には、層正規化の標準偏差が十分大きい場合、トランスフォーマースタックの出力は特定の低ランク部分空間に収束し、過平滑化が生じる。過平滑化問題を軽減するために,異なる層からの表現を適応的に組み合わせ,出力をより多様にする階層的融合戦略を検討する。各種データセットにおける広範な実験結果から, 融合手法の効果を明らかにした。 Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both vision and language fields. However, no existing work has delved deeper to further investigate the main cause of this phenomenon. In this work, we make the attempt to analyze the over-smoothing problem from the perspective of graphs, where such a problem was first discovered and explored. Intuitively, the self-attention matrix can be seen as a normalized adjacency matrix of a corresponding graph. Based on the above connection, we provide some theoretical analysis and find that layer normalization plays a key role in the over-smoothing issue of Transformer-based models. Specifically, if the standard deviation of layer normalization is sufficiently large, the output of Transformer stacks will converge to a specific low-rank subspace and result in over-smoothing. To alleviate the over-smoothing problem, we consider hierarchical fusion strategies, which combine the representations from different layers adaptively to make the output more diverse. Extensive experimental results on various data sets illustrate the effect of our fusion method.	翻訳日:2022-02-18 15:53:52 公開日:2022-02-17
# 確率時系列予測のためのアンサンブル共形化分位点回帰 Ensemble Conformalized Quantile Regression for Probabilistic Time Series Forecasting ( http://arxiv.org/abs/2202.08756v1 ) ライセンス: Link先を確認	Vilde Jensen, Filippo Maria Bianchi, Stian Norman Anfinsen	(参考訳) 本稿では,アンサンブル共形化分位点回帰(EnCQR)と呼ばれる新しい確率予測手法を提案する。 EnCQRは、分布によらず、近似的に周辺妥当な予測区間(PI)を構築し、非定常および不均一分散の時系列データに適しており、長いデータシーケンスでトレーニングされたディープラーニングアーキテクチャを含むあらゆる予測モデルに適用することができる。 EnCQRはブートストラップアンサンブル推定器を利用して、データの交換可能性の要件を取り除くことで、時系列に共形予測器を使用できるようにする。アンサンブル学習器は、分位点回帰を実行する汎用機械学習アルゴリズムとして実装され、PIの長さがデータの局所的変動に適応できるようにする。実験では,不均一分散性の程度が異なる時系列を予測した。その結果、EnCQRは分位点回帰や共形予測のみに基づくモデルよりも優れており、より鋭く、より有益で、有効なPIを提供する。 This paper presents a novel probabilistic forecasting method called ensemble conformalized quantile regression (EnCQR). EnCQR constructs distribution-free and approximately marginally valid prediction intervals (PIs), is suitable for nonstationary and heteroscedastic time series data, and can be applied on top of any forecasting model, including deep learning architectures that are trained on long data sequences. EnCQR exploits a bootstrap ensemble estimator, which enables the use of conformal predictors for time series by removing the requirement of data exchangeability. The ensemble learners are implemented as generic machine learning algorithms performing quantile regression, which allows the length of the PIs to adapt to local variability in the data. In the experiments, we predict time series characterized by different amounts of heteroscedasticity. The results demonstrate that EnCQR outperforms models based only on quantile regression or conformal prediction, and it provides sharper, more informative, and valid PIs.	翻訳日:2022-02-18 15:53:35 公開日:2022-02-17
# 未知の切断点を持つ高次元データのモデリング:融合ペナル化ロジスティック閾値回帰 Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression ( http://arxiv.org/abs/2202.08441v1 ) ライセンス: Link先を確認	Yinan Lin, Wen Zhou, Zhi Geng, Gexin Xiao, and Jianxin Yin	(参考訳) 従来のロジスティック回帰モデルでは、リンク関数は説明変数に関して線形かつ連続であると仮定されることが多い。ここでは,すべての連続的な特徴が順序レベルに離散化され,さらにバイナリ応答を決定するしきい値モデルを考える。閾値点と回帰係数はともに未知であり、推定される。高次元データに対して,融合ラッソ(fused lasso)ペナルティを用いて全変動を制御し,変数選択の手法として係数を0に縮小する,融合ペナルティ付きロジスティックしきい値回帰(FILTER)モデルを提案する。未知しきい値の推定に関する軽度な条件下で、係数推定のための非漸近誤差限界とモデル選択整合性を確立する。また, エラー伝播の注意深い評価により, CARTなどの木に基づく手法がしきい値推定条件を満たすことを示した。このFILTERモデルは, 身体検査データを用いた糖尿病などの慢性疾患の早期発見と予測に好適であることがわかった。また,提案手法の有限サンプル挙動を広範なモンテカルロ研究により検討・比較し,理論的な発見を裏付けた。 In traditional logistic regression models, the link function is often assumed to be linear and continuous in predictors. Here, we consider a threshold model in which all continuous features are discretized into ordinal levels, which further determine the binary responses. Both the threshold points and regression coefficients are unknown and to be estimated. For high-dimensional data, we propose a fusion penalized logistic threshold regression (FILTER) model, where a fused lasso penalty is employed to control the total variation and shrink the coefficients to zero as a method of variable selection. Under mild conditions on the estimate of unknown threshold points, we establish a non-asymptotic error bound for coefficient estimation and model selection consistency. With a careful characterization of the error propagation, we also show that tree-based methods, such as CART, fulfill the threshold estimation conditions. We find that the FILTER model is well suited to the problem of early detection and prediction of chronic diseases such as diabetes, using physical examination data. The finite-sample behavior of our proposed method is also explored and compared in extensive Monte Carlo studies, which support our theoretical findings.	翻訳日:2022-02-18 15:53:20 公開日:2022-02-17
# transcg: 透明な物体深度の完成と把握のための大規模実世界データセット TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping ( http://arxiv.org/abs/2202.08471v1 ) ライセンス: Link先を確認	Hongjie Fang, Hao-Shu Fang, Sheng Xu and Cewu Lu	(参考訳) 透明なオブジェクトは私たちの日常生活で一般的であり、自動生産ラインで頻繁に扱われます。視覚に基づくロボットによる物体の把握と操作は、自動化に有用だろう。しかし、現在の把持アルゴリズムの大部分は深度画像に大きく依存しているため失敗するが、通常の深度センサは通常、光の反射と屈折のために透明な物体の正確な深さ情報を生成することができない。そこで本稿では,130シーンの57,715 rgb-d画像を含む透明オブジェクト深度補完のための大規模実世界データセットをコントリビュートすることで,この問題に対処した。私たちのデータセットは、最初の大規模な実世界のデータセットであり、最も包括的なアノテーションを提供します。クロスドメイン実験は、我々のデータセットが非常に一般化できることを示している。さらに、RGB画像と不正確な深度マップを入力とし、精細化された深度マップを出力するエンドツーエンドの深度補完ネットワークを提案する。実験では,従来の手法よりも優れた有効性,効率性,頑健性を示し,限られたハードウェア資源で高分解能画像を処理できることを示した。実ロボット実験では,新しい物体の把握にロバストに応用できることを示した。完全なデータセットとメソッドはwww.graspnet.net/transcgで公開されている。 Transparent objects are common in our daily life and frequently handled in the automated production line. Robust vision-based robotic grasping and manipulation for these objects would be beneficial for automation. However, the majority of current grasping algorithms would fail in this case since they heavily rely on the depth image, while ordinary depth sensors usually fail to produce accurate depth information for transparent objects owing to the reflection and refraction of light. In this work, we address this issue by contributing a large-scale real-world dataset for transparent object depth completion, which contains 57,715 RGB-D images from 130 different scenes. Our dataset is the first large-scale real-world dataset and provides the most comprehensive annotation. Cross-domain experiments show that our dataset has a great generalization ability. Moreover, we propose an end-to-end depth completion network, which takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map. 
Experiments demonstrate the superior efficacy, efficiency, and robustness of our method over previous work, and show that it can process high-resolution images under limited hardware resources. A real-robot experiment shows that our method can also be applied robustly to novel object grasping. The full dataset and our method are publicly available at www.graspnet.net/transcg.	翻訳日:2022-02-18 15:51:03 公開日:2022-02-17
# 嫌な男:Facebookの挑戦のレンズを通して、嫌なミームを自動的に検出する Feels Bad Man: Dissecting Automated Hateful Meme Detection Through the Lens of Facebook's Challenge ( http://arxiv.org/abs/2202.08492v1 ) ライセンス: Link先を確認	Catherine Jennifer, Fatemeh Tahmasbi, Jeremy Blackburn, Gianluca Stringhini, Savvas Zannettou, and Emiliano De Cristofaro	(参考訳) インターネットミームはコミュニケーションの主流となっているが、同時に過激主義を提唱し、軽蔑的信念を育むためにも使われるようになっている。いずれにせよ、ミームの知覚的側面がこの現象を引き起こすのかについては、よく分かっていない。本研究では,現在最先端のマルチモーダル機械学習モデルのヘイトフルミーム検出に対する有効性,特にプラットフォーム間の一般化性について評価する。 4chan's "politically incorrect" board (/pol/)とfacebook's hateful memes challenge datasetの12,140と10,567の2つのベンチマークデータセットを使用して、競争のトップレベルの機械学習モデルをトレーニングし、バイラルな憎しみのあるミームと良性なミームを区別する最も顕著な特徴を発見しました。分類性能におけるマルチモーダルの重要性,主流のソーシャルプラットフォームにおけるWebコミュニティの影響力,その逆の3つの実験を行い,モデルの4chanミームにおける学習伝達性について検討した。実験の結果,ミームのイメージ特性はテキストの内容よりも豊富な情報を提供することがわかった。ミームにおけるヘイトスピーチのオンライン検出のために開発された現在のシステムは、その視覚要素にさらなる集中を要し、文化的意味論の解釈を改善し、マルチモーダルモデルではミームにおけるヘイトスピーチの複雑さを十分に把握できず、ソーシャルメディアプラットフォーム全体に一般化できないことを示唆している。 Internet memes have become a dominant method of communication; at the same time, however, they are also increasingly being used to advocate extremism and foster derogatory beliefs. Nonetheless, we do not have a firm understanding as to which perceptual aspects of memes cause this phenomenon. In this work, we assess the efficacy of current state-of-the-art multimodal machine learning models toward hateful meme detection, and in particular with respect to their generalizability across platforms. We use two benchmark datasets comprising 12,140 and 10,567 images from 4chan's "Politically Incorrect" board (/pol/) and Facebook's Hateful Memes Challenge dataset to train the competition's top-ranking machine learning models for the discovery of the most prominent features that distinguish viral hateful memes from benign ones. 
We conduct three experiments to determine the importance of multimodality on classification performance, the influential capacity of fringe Web communities on mainstream social platforms and vice versa, and the models' learning transferability on 4chan memes. Our experiments show that memes' image characteristics provide a greater wealth of information than their textual content. We also find that current systems developed for online detection of hate speech in memes necessitate further concentration on the memes' visual elements to improve their interpretation of underlying cultural connotations, implying that multimodal models fail to adequately grasp the intricacies of hate speech in memes and to generalize across social media platforms.	翻訳日:2022-02-18 15:50:43 公開日:2022-02-17
# EBHI:画像分類のための新しい内視鏡生検組織学的H&E画像データセット EBHI:A New Enteroscope Biopsy Histopathological H&E Image Dataset for Image Classification Evaluation ( http://arxiv.org/abs/2202.08552v1 ) ライセンス: Link先を確認	Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Yong Zhang, Haoyuan Chen, Wanli Liu, Yudong Yao, Hongzan Sun, Ning Xu, Xinyu Huang and Marcin Grzegorze	(参考訳) 背景と目的: 大腸癌は世界で3番目に多いがんであり、がん患者の約10%を占めている。この疾患の早期発見は大腸癌患者の治療に重要である。病理組織検査は大腸癌検診の金本位制である。しかし,現在の大腸癌,特に内視鏡生検の病理組織像データセットの欠如は,コンピュータ支援診断技術の正確な評価を妨げている。方法: 新たに公開された腸鏡生検組織病理組織学的h&e画像データセット (ebhi) を本論文で発表する。 EBHIデータセットの有効性を実証するために,200倍の倍率の画像を用いて,複数の機械学習,畳み込みニューラルネットワーク,新しいトランスフォーマーベース分類器を用いて実験と評価を行った。結果:実験結果から,深層学習法はEBHIデータセットで良好に動作することが示された。従来の機械学習手法は最大精度76.02%、ディープラーニング手法は最大精度95.37%である。結語: EBHIは4倍, 5種類の腫瘍分化期像, 5532枚の画像を含む, 初めて公開された大腸病理組織内視鏡生検データセットである。 EBHIは大腸癌の自動診断のための新しい分類アルゴリズムを研究者に提供し、臨床現場で医師や患者に役立てることができると考えている。 Background and purpose: Colorectal cancer has become the third most common cancer worldwide, accounting for approximately 10% of cancer patients. Early detection of the disease is important for the treatment of colorectal cancer patients. Histopathological examination is the gold standard for screening colorectal cancer. However, the current lack of histopathological image datasets of colorectal cancer, especially enteroscope biopsies, hinders the accurate evaluation of computer-aided diagnosis techniques. Methods: A new publicly available Enteroscope Biopsy Histopathological H&E Image Dataset (EBHI) is published in this paper. To demonstrate the effectiveness of the EBHI dataset, we have utilized several machine learning, convolutional neural networks and novel transformer-based classifiers for experimentation and evaluation, using an image with a magnification of 200x. Results: Experimental results show that the deep learning method performs well on the EBHI dataset. Traditional machine learning methods achieve maximum accuracy of 76.02% and deep learning method achieves a maximum accuracy of 95.37%. 
Conclusion: To the best of our knowledge, EBHI is the first publicly available colorectal histopathology enteroscope biopsy dataset with four magnifications and five types of images of tumor differentiation stages, totaling 5532 images. We believe that EBHI could attract researchers to explore new classification algorithms for the automated diagnosis of colorectal cancer, which could help physicians and patients in clinical settings.	翻訳日:2022-02-18 15:50:12 公開日:2022-02-17
# 解剖学的パラメータ化統計形状モデル:統計学習による形態計測 Anatomically Parameterized Statistical Shape Model: Explaining Morphometry through Statistical Learning ( http://arxiv.org/abs/2202.08580v1 ) ライセンス: Link先を確認	Arnaud Boutillon, Asma Salhi, Valérie Burdin, Bhushan Borotikar	(参考訳) 統計的形状モデル(SSM)は、臨床実践において重要なステップである解剖学的構造の形態学的解析を行うための一般的なツールである。しかし、SSMによる形状表現は形状係数に基づいており、臨床関連性の解剖学的尺度との明確な一対一の関係は欠如している。形状係数は解剖学的測度の組み合わせを埋め込んでいるが、それらの関係を見つけるための形式化されたアプローチは、文献ではまだ解明されていない。これにより、SSMの使用は臨床実践における主観評価に制限される。形態計測解析から得られた解剖学的パラメータによって制御される新しいSSMを提案する。解剖学的パラメータ化SSM(ANAT-SSM)は,形状係数と選択された解剖学的パラメータの線形マッピングを学習することに基づく。このマッピングは、標準SSMによって生成された合成集団から学習される。マッピングの擬似逆行列を決定することで、ANAT-SSMを構築することができます。さらに, 独立な形状変化パターンを得るために, 解剖学的パラメータ化に直交性制約を課す。本研究は, 臨床解剖学的パラメータを用いて, 大腿骨と肩甲骨形状の2つの骨格データベースを用いて評価した。合成生成した形状の解剖学的指標は現実的な統計値を示した。学習した行列は得られた統計的関係とよく一致し,2つのSSMは未知の形状の解剖学的パラメータの予測において中程度から優れた性能を得た。本研究は、解剖学的パラメータ化SSMの作成に解剖学的表現を用いており、その結果、標準SSMの限られた臨床解釈性は排除される。提案モデルでは, 集団間の骨形態計測の差異を解析し, 患者固有の手術前計画や手術中評価に組み込むことができる。 Statistical shape models (SSMs) are a popular tool to conduct morphological analysis of anatomical structures, which is a crucial step in clinical practice. However, shape representations through SSMs are based on shape coefficients and lack an explicit one-to-one relationship with anatomical measures of clinical relevance. While a shape coefficient embeds a combination of anatomical measures, a formalized approach to find the relationship between them remains elusive in the literature. This limits the use of SSMs to subjective evaluations in clinical practice. We propose a novel SSM controlled by anatomical parameters derived from morphometric analysis. The proposed anatomically parameterized SSM (ANAT-SSM) is based on learning a linear mapping between shape coefficients and selected anatomical parameters. This mapping is learned from a synthetic population generated by the standard SSM. Determining the pseudo-inverse of the mapping allows us to build the ANAT-SSM. 
We further impose orthogonality constraints to the anatomical parameterization to obtain independent shape variation patterns. The proposed contribution was evaluated on two skeletal databases of femoral and scapular bone shapes using clinically relevant anatomical parameters. Anatomical measures of the synthetically generated shapes exhibited realistic statistics. The learned matrices corroborated well with the obtained statistical relationship, while the two SSMs achieved moderate to excellent performance in predicting anatomical parameters on unseen shapes. This study demonstrates the use of anatomical representation for creating anatomically parameterized SSM and as a result, removes the limited clinical interpretability of standard SSMs. The proposed models could help analyze differences in relevant bone morphometry between populations, and be integrated in patient-specific pre-surgery planning or in-surgery assessment.	翻訳日:2022-02-18 15:49:49 公開日:2022-02-17
# 弱教師付き効率的なUNetとモルフォロジー後処理に基づくエンドツーエンドニューロンインスタンス分割 End-to-end Neuron Instance Segmentation based on Weakly Supervised Efficient UNet and Morphological Post-processing ( http://arxiv.org/abs/2202.08682v1 ) ライセンス: Link先を確認	Huaqian Wu, Nicolas Souedet, Caroline Jan, Cédric Clouchoux, Thierry Delzescaux	(参考訳) 近年の研究では、医学画像解析、特に多くの生物学的研究の基本的なステップである細胞インスタンスセグメンテーションにおいて、深層学習の優位性が示されている。しかし、ニューラルネットワークの優れたパフォーマンスには、偏りのない大規模なアノテーション付きデータセットでのトレーニングが必要であり、その準備には労力と専門知識を要する。本稿では,NeuN染色神経細胞の組織像における検出と分画を,ポイントアノテーションのみで自動的に行うエンドツーエンドの弱教師付きフレームワークを提案する。私たちは最先端のネットワークであるEfficientNetをU-Netのようなアーキテクチャに統合します。検証結果は,近年の手法と比較して,モデルの優位性を示している。さらに,複数の後処理スキームを調査し,究極のエロージョンと動的再構成を用いて確率マップをセグメント化されたインスタンスに変換する手法を提案した。このアプローチは設定が容易で、他の古典的な後処理技術を上回る性能を示す。 Recent studies have demonstrated the superiority of deep learning in medical image analysis, especially in cell instance segmentation, a fundamental step for many biological studies. However, the good performance of neural networks requires training on large, unbiased, annotated datasets, whose preparation is labor-intensive and demands expertise. In this paper, we present an end-to-end weakly-supervised framework to automatically detect and segment NeuN stained neuronal cells on histological images using only point annotations. We integrate the state-of-the-art network, EfficientNet, into our U-Net-like architecture. Validation results show the superiority of our model compared to other recent methods. In addition, we investigated multiple post-processing schemes and proposed an original strategy to convert the probability map into segmented instances using ultimate erosion and dynamic reconstruction. This approach is easy to configure and outperforms other classical post-processing techniques.	翻訳日:2022-02-18 15:49:21 公開日:2022-02-17
# オプティカルフローにより駆動されるレベルセットに基づく粒子フィルタ:X線CT時系列からの塩境界追跡への応用 Level set based particle filter driven by optical flow: an application to track the salt boundary from X-ray CT time-series ( http://arxiv.org/abs/2202.08717v1 ) ライセンス: Link先を確認	Karim Makki and Jean François Lecomte and Lukas Fuchs and Sylvie Schueller and Etienne Mémin	(参考訳) 画像に基づく計算流体力学は、様々な物理現象の知識と理解を活用する上で、長い間重要な役割を担ってきた。特に確率論的計算法は、純粋にランダムな乱流運動におけるシステムの複雑な力学をモデル化する方法を開いた。構造地質学の分野では、塩と周囲の岩石の双方における変形と応力状態のより深い理解が、あらゆる地下の長期エネルギー貯蔵システムの特徴付けにおいて大きな関心事である。本研究の目的は,重力および差分荷重によって引き起こされる塩構造の進化を示すX線CT(Computed Tomography)画像時系列から,並列化された確率的フィルタリング手法を用いて,塩境界の時間的な非線形変形を決定することである。この研究は、モデルの不確実性を考慮した物理モデリングと高度な確率画像処理手法を統合するための第一歩である。 Image-based computational fluid dynamics has long played an important role in leveraging knowledge and understanding of several physical phenomena. In particular, probabilistic computational methods have opened the way to modelling the complex dynamics of systems in purely random turbulent motion. In the field of structural geology, a better understanding of the deformation and stress state both within the salt and the surrounding rocks is of great interest to characterize all kinds of subsurface long-term energy-storage systems. The objective of this research is to determine the non-linear deformation of the salt boundary over time using a parallelized, stochastic filtering approach from X-ray computed tomography (CT) image time series depicting the evolution of salt structures triggered by gravity and under differential loading. This work represents a first step towards bringing together physical modeling and advanced stochastic image processing methods where model uncertainty is taken into account.	翻訳日:2022-02-18 15:49:06 公開日:2022-02-17
# (参考訳) 画像品質評価のための深い知覚指標に関する研究 A study of deep perceptual metrics for image quality assessment ( http://arxiv.org/abs/2202.08692v1 ) ライセンス: CC BY 4.0	Rémi Kazmierczak, Gianni Franchi, Nacim Belkhir, Antoine Manzanera, David Filliat	(参考訳) 画像間の類似度を定量化する指標はいくつか存在するが、高度に歪んだ画像の類似度を測定することは非効率である。本研究では,画像品質評価(IQA)タスクに取り組むために,深層ニューラルネットワークに基づく知覚指標を実証的に検討する。ネットワークのアーキテクチャやトレーニング手順など、さまざまなハイパーパラメータに従って、深い知覚指標を調査します。最後に,様々な解像度で知覚情報を集約し,画像変形の異なるIQAタスクにおける標準知覚指標を上回るマルチレゾリューション知覚指標(MR-Perceptual)を提案する。私たちのコードはhttps://github.com/ENSTA-U2IS/MR_perceptualで利用可能です。 Several metrics exist to quantify the similarity between images, but they are inefficient when it comes to measuring the similarity of highly distorted images. In this work, we propose to empirically investigate perceptual metrics based on deep neural networks for tackling the Image Quality Assessment (IQA) task. We study deep perceptual metrics according to different hyperparameters like the network's architecture or training procedure. Finally, we propose our multi-resolution perceptual metric (MR-Perceptual), that allows us to aggregate perceptual information at different resolutions and outperforms standard perceptual metrics on IQA tasks with varying image deformations. Our code is available at https://github.com/ENSTA-U2IS/MR_perceptual	翻訳日:2022-02-18 15:46:50 公開日:2022-02-17
# 時系列予測のための多目的モデル選択 Multi-Objective Model Selection for Time Series Forecasting ( http://arxiv.org/abs/2202.08485v1 ) ライセンス: Link先を確認	Oliver Borchert, David Salinas, Valentin Flunkert, Tim Januschowski, Stephan Günnemann	(参考訳) 時系列予測の研究は、精度を向上させる方法の開発に重点を置いている。しかし、トレーニング時間やレイテンシなどの他の基準は、多くの現実世界のアプリケーションで重要である。そこで本研究では,精度が多くの基準のうちの1つにすぎない場合に,数多くの予測手法の中から与えられたデータセットに適した予測モデルをどのように選択するかという問題に取り組む。これに対する私たちの貢献は2つある。まず,44個の不均質な公開データセットを用いた7つの古典的および6つのディープラーニング予測手法を評価する,包括的なベンチマークを提案する。ベンチマークコードは、すべてのメソッドの評価と予測とともに、オープンソースである。これらの評価により、従来のモデルよりも優れたディープラーニングモデルに必要なデータ量などのオープンな質問に答えることができる。第2に、ベンチマーク評価を利用して、精度やレイテンシなど、複数の目的を考慮した良いデフォルトを学習します。予測モデルからパフォーマンスメトリクスへのマッピングを学習することにより、私たちのメソッドPARETOSELECTがParetoフロントから正確にモデルを選択できることを示します。我々の知る限り、PARETOSELECTは、多目的設定でデフォルトモデルを学習する最初の方法を構成する。 Research on time series forecasting has predominantly focused on developing methods that improve accuracy. However, other criteria such as training time or latency are critical in many real-world applications. We therefore address the question of how to choose an appropriate forecasting model for a given dataset among the plethora of available forecasting methods when accuracy is only one of many criteria. For this, our contributions are two-fold. First, we present a comprehensive benchmark, evaluating 7 classical and 6 deep learning forecasting methods on 44 heterogeneous, publicly available datasets. The benchmark code is open-sourced along with evaluations and forecasts for all methods. These evaluations enable us to answer open questions such as the amount of data required for deep learning models to outperform classical ones. Second, we leverage the benchmark evaluations to learn good defaults that consider multiple objectives such as accuracy and latency. 
By learning a mapping from forecasting models to performance metrics, we show that our method PARETOSELECT is able to accurately select models from the Pareto front -- alleviating the need to train or evaluate many forecasting models for model selection. To the best of our knowledge, PARETOSELECT constitutes the first method to learn default models in a multi-objective setting.	翻訳日:2022-02-18 15:34:29 公開日:2022-02-17
# SAITS: 自己注意に基づく時系列補完 SAITS: Self-Attention-based Imputation for Time Series ( http://arxiv.org/abs/2202.08516v1 ) ライセンス: Link先を確認	Wenjie Du, David Côté, Yan Liu	(参考訳) 時系列におけるデータの欠損は、特に現実世界のアプリケーションにおいて、パターン認識の方法に障害をもたらす広範囲にわたる問題である。一般的な解決策は補完(imputation)であり、どの値を埋めるべきかを決めることが基本的な課題である。本稿では,多変量時系列における欠損値補完のための自己注意機構に基づく新しい手法SAITSを提案する。 SAITSは共同最適化アプローチによって訓練され、2つの対角マスク自己注意(DMSA)ブロックの重み付け組み合わせから欠損値を学ぶ。 DMSAは、時間ステップ間の時間依存性と特徴相関の両方を明示的に捉え、補完精度とトレーニング速度を改善する。一方、重み付け合成設計では、注意マップと欠損情報に基づいて、2つのDMSAブロックから学習した表現に重みを動的に割り当てることができる。広範な実験により、SAITSが時系列補完タスクにおいて最先端の手法を効率的に上回ること、および実世界の不完全な時系列データに対するパターン認識モデルの学習性能を向上させる可能性があることを示す。 Missing data in time series is a pervasive problem that puts obstacles in the way of pattern recognition, especially in real-world applications. A popular solution is imputation, where the fundamental challenge is to determine what values should be filled in. This paper proposes SAITS, a novel method based on the self-attention mechanism for missing value imputation in multivariate time series. Trained by a joint-optimization approach, SAITS learns missing values from a weighted combination of two diagonally-masked self-attention (DMSA) blocks. DMSA explicitly captures both the temporal dependencies and feature correlations between time steps, which improves imputation accuracy and training speed. Meanwhile, the weighted-combination design enables SAITS to dynamically assign weights to the learned representations from two DMSA blocks according to the attention map and the missingness information. Extensive experiments demonstrate that SAITS outperforms the state-of-the-art methods on the time-series imputation task efficiently and reveal SAITS' potential to improve the learning performance of pattern recognition models on incomplete time-series data from the real world.	翻訳日:2022-02-18 15:34:12 公開日:2022-02-17
# CoFED:コトレーニングによるクロスサイロ不均一なマルチタスク学習 CoFED: Cross-silo Heterogeneous Federated Multi-task Learning via Co-training ( http://arxiv.org/abs/2202.08603v1 ) ライセンス: Link先を確認	Xingjian Cao, Zonghang Li, Hongfang Yu, Gang Sun	(参考訳) Federated Learning(FL)は、参加者がプライベートデータを交換することなく、高品質なモデルを協調的にトレーニングできる機械学習技術である。クロスサイロFL設定の参加者は、異なるタスクニーズを持つ独立した組織であり、データプライバシだけでなく、知的財産の観点から独自のモデルを独立してトレーニングすることにも関心を持っている。既存のFLスキームの多くは上記のシナリオに対応できない。本稿では,コトレーニングのようにラベルなしデータへの擬似ラベル付けに基づく通信効率の高いFL方式であるCoFEDを提案する。我々の知る限り、これは異種タスク、異種モデル、異種訓練アルゴリズムを同時に扱う最初のFLスキームである。実験結果から,CoFEDはより低い通信コストでより良い性能を達成することが示された。特に非IID設定や不均質モデルでは,提案手法により35%性能が向上した。 Federated Learning (FL) is a machine learning technique that enables participants to train high-quality models collaboratively without exchanging their private data. Participants in cross-silo FL settings are independent organizations with different task needs, and they are concerned not only with data privacy, but also with independently training their own models for intellectual property reasons. Most existing FL schemes cannot handle the above scenarios. In this paper, we propose a communication-efficient FL scheme, CoFED, based on pseudo-labeling unlabeled data, as in co-training. To the best of our knowledge, it is the first FL scheme compatible with heterogeneous tasks, heterogeneous models, and heterogeneous training algorithms simultaneously. Experimental results show that CoFED achieves better performance with a lower communication cost. Especially for the non-IID settings and heterogeneous models, the proposed method improves the performance by 35%.	翻訳日:2022-02-18 15:33:52 公開日:2022-02-17
# (参考訳) 大量内視鏡画像を用いた大腸内視鏡ポリープ検出 Colonoscopy polyp detection with massive endoscopic images ( http://arxiv.org/abs/2202.08730v1 ) ライセンス: CC BY 4.0	Jialin Yu, Huogen Wang, Ming Chen	(参考訳) 我々は,検出速度において自明なコストで異なるデータセットで検証された平均精度を向上し,既存の終端ポリプ検出モデルを改善した。大腸内視鏡検査におけるポリープ検出に関するこれまでの研究は、医師の検査オーバーヘッドを軽減するための効率的なエンドツーエンドソリューションを提供した。しかし、後の実験で、このフレームワークはポリプ捕獲の状態が変化する以前ほど堅牢ではないことが分かりました。本研究では,ポリープ検出作業において,精度の低下の原因となる主な問題を特定するため,データセットに関するいくつかの研究を行った。私たちは、アンカーボックス形状を改善するために最適化されたアンカー生成手法を使い、小さなオブジェクト検出に必要であると信じているため、より多くのボックスが検出に使われました。代替のバックボーンは、密集したアンカーボックス回帰によって引き起こされる重い時間コストを補償するために使用される。アテンションゲートモジュールを使用することで,リアルタイム検出速度を維持しつつ,最先端ポリープ検出性能を実現することができる。 We improved an existing end-to-end polyp detection model, achieving better average precision, validated on different data sets, at a trivial cost in detection speed. Previous work on detecting polyps in colonoscopy \cite{Chen2018} provided an efficient end-to-end solution to reduce doctors' examination overhead. However, our later experiments found that this framework becomes less robust as the conditions under which polyps are captured vary. In this work, we conducted several studies on the data sets, identifying the main issues that cause low precision in the task of polyp detection. We used an optimized anchor generation method to obtain better anchor box shapes, and we used more boxes for detection, as we believe this is necessary for small-object detection. An alternative backbone is used to compensate for the heavy time cost introduced by dense anchor box regression. With the attention gate module, our model achieves state-of-the-art polyp detection performance while still maintaining real-time detection speed.	翻訳日:2022-02-18 15:31:01 公開日:2022-02-17
# テンポラルシーンセグメンテーションのためのシフトメモリネットワーク Shift-Memory Network for Temporal Scene Segmentation ( http://arxiv.org/abs/2202.08399v1 ) ライセンス: Link先を確認	Guo Cheng, Jiang Yu Zheng	(参考訳) 意味セグメンテーションは空間レイアウトの理解において非常に正確である。動的シーンに基づくリアルタイムタスクでは,時間領域における意味セグメンテーションを拡張し,動きによる空間的精度を向上させる。ストリーミング入力上のシフトモードネットワークを用いて、ゼロレイテンシ出力を保証する。シフトネットワーク下でのデータの重なりについて,ネットワーク層間の一定周期における反復計算を同定する。この冗長性を避けるために、シフトメモリネットワーク(smn)を符号化復号ベースラインから導出し、精度を損なうことなくネットワーク値を再利用する。 SMNはパッチモードで訓練され、SMNのネットワークパラメータを抽出し、高速なメモリで推論を行う。 1dスキャン入力と2dビデオから動的シーンを分割する。 SMNの実験はシフトモードとして同等の精度を達成するが、高速な推論速度とメモリの縮小を実現している。これにより、エッジデバイス上のリアルタイムアプリケーションにおけるセマンティックセグメンテーションが容易になる。 Semantic segmentation has achieved great accuracy in understanding spatial layout. For real-time tasks based on dynamic scenes, we extend semantic segmentation in temporal domain to enhance the spatial accuracy with motion. We utilize a shift-mode network over streaming input to ensure zero-latency output. For the data overlap under shifting network, this paper identifies repeated computation in fixed periods across network layers. To avoid this redundancy, we derive a Shift-Memory Network (SMN) from encoding-decoding baseline to reuse the network values without accuracy loss. Trained in patch-mode, the SMN extracts the network parameters for SMN to perform inference promptly in compact memory. We segment dynamic scenes from 1D scanning input and 2D video. The experiments of SMN achieve equivalent accuracy as shift-mode but in faster inference speeds and much smaller memory. This will facilitate semantic segmentation in real-time application on edge devices.	翻訳日:2022-02-18 15:20:25 公開日:2022-02-17
# PENCIL: ノイズラベルによるディープラーニング PENCIL: Deep Learning with Noisy Labels ( http://arxiv.org/abs/2202.08436v1 ) ライセンス: Link先を確認	Kun Yi, Guo-Hua Wang, Jianxin Wu	(参考訳) ディープラーニングは様々なコンピュータビジョンタスクにおいて優れたパフォーマンスを実現しているが、クリーンなラベルで多くのトレーニング例を必要とする。ノイズの多いラベルでデータセットを収集するのは簡単だが、そのようなノイズによりネットワークは過度に適合し、精度は劇的に低下する。この問題に対処するために,ネットワークパラメータとラベル推定をラベル分布として更新するPENCILというエンドツーエンドフレームワークを提案する。 PENCILはバックボーンネットワーク構造とは独立しており、補助的なクリーンデータセットやノイズに関する事前情報を必要としないため、既存の手法よりも汎用的で堅牢であり、適用が容易である。 PENCILは、パフォーマンス向上のために繰り返し使用することもできる。 PENCILは、ノイズタイプやノイズ率の異なる合成および実世界のデータセットにおいて、従来の最先端の手法よりも大きなマージンで優れている。また,PENCILはバックボーンネットワークに単純なアテンション構造を加えることで,マルチラベル分類タスクにも有効である。実験によると、PENCILはクリーンなデータセットにも堅牢である。 Deep learning has achieved excellent performance in various computer vision tasks, but requires a lot of training examples with clean labels. It is easy to collect a dataset with noisy labels, but such noise makes networks overfit seriously and accuracies drop dramatically. To address this problem, we propose an end-to-end framework called PENCIL, which can update both network parameters and label estimations as label distributions. PENCIL is independent of the backbone network structure and does not need an auxiliary clean dataset or prior information about noise, thus it is more general and robust than existing methods and is easy to apply. PENCIL can even be used repeatedly to obtain better performance. PENCIL outperforms previous state-of-the-art methods by large margins on both synthetic and real-world datasets with different noise types and noise rates. And PENCIL is also effective in multi-label classification tasks through adding a simple attention structure on backbone networks. Experiments show that PENCIL is robust on clean datasets, too.	翻訳日:2022-02-18 15:20:11 公開日:2022-02-17
# V2X-Sim:自律運転のための仮想協調知覚データセット V2X-Sim: A Virtual Collaborative Perception Dataset for Autonomous Driving ( http://arxiv.org/abs/2202.08449v1 ) ライセンス: Link先を確認	Yiming Li, Ziyan An, Zixun Wang, Yiqi Zhong, Siheng Chen, Chen Feng	(参考訳) V2X(V2X)は、車両と周囲のあらゆる物体との協調を表すもので、自動運転システムの認識を根本的に改善することができる。個人の知覚が急速に進歩するにつれて、公共のV2Xデータセットが不足しているため、協調的な知覚はほとんど進歩していない。本稿では,自動運転における初の大規模共同認識データセットであるv2x-simデータセットを提案する。 v2x-simは 1)道路側インフラと交差点における複数車両の協調的認識を実現するための同期記録 2)マルチモダリティ知覚を容易にするマルチモダリティセンサストリーム 3) 検出,追跡,セグメンテーションなど,さまざまな下流タスクをサポートするための,多種多様な注釈付き地上真実。我々はマルチエージェントマルチモダリティマルチタスク知覚の研究を刺激し、仮想データセットは現実的なデータセットが広く利用可能になる前に協調的な知覚の開発を促進することを約束している。 Vehicle-to-everything (V2X), which denotes the collaboration between a vehicle and any entity in its surrounding, can fundamentally improve the perception in self-driving systems. As the individual perception rapidly advances, collaborative perception has made little progress due to the shortage of public V2X datasets. In this work, we present the V2X-Sim dataset, the first public large-scale collaborative perception dataset in autonomous driving. V2X-Sim provides: 1) well-synchronized recordings from roadside infrastructure and multiple vehicles at the intersection to enable collaborative perception, 2) multi-modality sensor streams to facilitate multi-modality perception, 3) diverse well-annotated ground truth to support various downstream tasks including detection, tracking, and segmentation. We seek to inspire research on multi-agent multi-modality multi-task perception, and our virtual dataset is promising to promote the development of collaborative perception before realistic datasets become widely available.	翻訳日:2022-02-18 15:19:54 公開日:2022-02-17
# ハードウェア保証のためのコンピュータビジョンを用いたPCB成分検出 PCB Component Detection using Computer Vision for Hardware Assurance ( http://arxiv.org/abs/2202.08452v1 ) ライセンス: Link先を確認	Wenwei Zhao, Suprith Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan, Navid Asadizanjani	(参考訳) 光領域におけるプリント回路基板(PCB)の保証は重要な研究分野である。画像処理やコンピュータビジョン(CV)、機械学習(ML)といった既存のPCB保証手法は数多く存在するが、PCB分野は複雑で進化が進んでいるため、新たな技術が求められている。既存のMLベースの手法は従来のCV法よりも優れているが、多くのデータを必要とし、説明可能性も低く、新しい技術が出現しても適応が難しい。これらの課題を克服するために、CVメソッドはMLメソッドとのタンデムで使用できる。特に、色、形状、テクスチャの特徴を抽出するような人間の解釈可能なCVアルゴリズムは、PCB保証説明可能性を高める。これにより、事前知識を組み込むことで、トレーニング可能なMLパラメータの数を効果的に減らし、MLモデルのトレーニングや再トレーニングにおいて高い精度を達成するために必要なデータの量を実現できる。そこで本研究では, セマンティックデータを用いたPCBコンポーネント検出作業において, コンピュータビジョンに基づく様々な特徴の利点と限界について検討する。本研究は,PCB成分検出において,色特徴が有望な性能を示すことを示した。本研究の目的は,ハードウェア保証,コンピュータビジョン,機械学習コミュニティ間のコラボレーションを促進することである。 Printed Circuit Board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and increasingly evolving so new techniques are required to overcome the emerging problems. Existing ML-based methods outperform traditional CV methods, however they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms such as those that extract color, shape, and texture features increase PCB assurance explainability. This allows for incorporation of prior knowledge, which effectively reduce the number of trainable ML parameters and thus, the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection using semantic data. 
Results of this study indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities.	翻訳日:2022-02-18 15:19:36 公開日:2022-02-17
# TraSeTR:ロボット手術におけるインスタンスレベルの機器分割のためのコントラストクエリ付きトラック・ツー・セグメンテーション・トランス TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery ( http://arxiv.org/abs/2202.08453v1 ) ライセンス: Link先を確認	Zixu Zhao, Yueming Jin, Pheng-Ann Heng	(参考訳) 手術器具のセグメンテーション - 一般にピクセル分類タスク - は、ロボット支援手術(ras)における認知知能の促進に不可欠である。しかし,従来手法では楽器の種類や事例の識別に苦慮していた。上記の問題に対処するため,我々は,セグメント毎に予測を行うマスク分類パラダイムを検討する。そこで本研究では,手術器具のセグメンテーションを支援するために,追跡手がかりを巧みに活用する新しいトラックツーセグメンテーショントランスフォーマであるtrasetrを提案する。 TraSeTRは、クエリの埋め込みをデコードすることで、インスツルメンタタイプ、ロケーション、アイデンティティとインスタンスレベルの予測、すなわちクラス-ボックス-マスクペアのセットを併用する。具体的には、過去の時間的知識をエンコードした先行クエリを導入し、アイデンティティマッチングを通じて現在のインスタンスに追跡信号を転送する。対照的なクエリ学習戦略は、クエリ特徴空間を再形成するためにさらに適用され、大きな時間的変動に起因する追跡困難を大幅に軽減する。本手法の有効性は,EndoVis ChallengesのRASベンチマークと1つの白内障手術データセットCaDISを含む3つの公開データセットに対して,最先端の計器型セグメンテーション結果を用いて実証した。 Surgical instrument segmentation -- in general a pixel classification task -- is fundamentally crucial for promoting cognitive intelligence in robot-assisted surgery (RAS). However, previous methods are struggling with discriminating instrument types and instances. To address the above issues, we explore a mask classification paradigm that produces per-segment predictions. We propose TraSeTR, a novel Track-to-Segment Transformer that wisely exploits tracking cues to assist surgical instrument segmentation. TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions i.e., a set of class-bbox-mask pairs, by decoding query embeddings. Specifically, we introduce the prior query that encoded with previous temporal knowledge, to transfer tracking signals to current instances via identity matching. A contrastive query learning strategy is further applied to reshape the query feature space, which greatly alleviates the tracking difficulty caused by large temporal variations. 
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets, including two RAS benchmarks from EndoVis Challenges and one cataract surgery dataset CaDIS.	翻訳日:2022-02-18 15:19:16 公開日:2022-02-17
# Mirror-Yolo: 注意に基づくミラーのインスタンスセグメンテーションと検出モデル Mirror-Yolo: An attention-based instance segmentation and detection model for mirrors ( http://arxiv.org/abs/2202.08498v1 ) ライセンス: Link先を確認	Fengze Li, Jieming Ma, Zhongbei Tian, Ji Ge, Hai-Ning Liang, Yungang Zhang and Tianxi Wen	(参考訳) 鏡はコンピュータビジョンモデルの性能を劣化させるが、画像中の鏡を正確に検出することは依然として困難である。 YOLOv4は物体検出精度と速度の両方で驚くべき結果を達成するが、ミラーの検出には失敗することが多い。本稿では,ミラー検出を主目的とした新しいミラー検出手法"Mirror-YOLO"を提案する。 YOLOv4に基づく提案モデルでは,より優れた特徴獲得のための注意機構と,特徴マップ融合のためのハイパーカラム・ステップアプローチが組み込まれている。 Mirror-YOLO は実例分割のための正確な有界多角形も生成できる。提案モデルの有効性を実験により実証し,既存のミラー検出法と比較して,ミラー画像データセットにおけるMirror-YOLOの検出精度が向上することを示した。 Mirrors can degrade the performance of computer vision models; however, accurately detecting mirrors in images remains challenging. YOLOv4 achieves phenomenal results in both object detection accuracy and speed; nevertheless, the model often fails to detect mirrors. In this paper, a novel mirror detection method, 'Mirror-YOLO', is proposed, which mainly targets mirror detection. Based on YOLOv4, the proposed model embeds an attention mechanism for better feature acquisition and a hypercolumn-stairstep approach for feature map fusion. Mirror-YOLO can also produce accurate bounding polygons for instance segmentation. The effectiveness of our proposed model is demonstrated by our experiments: compared to existing mirror detection methods, the proposed Mirror-YOLO achieves better detection accuracy on the mirror image dataset.	翻訳日:2022-02-18 15:18:52 公開日:2022-02-17
# CLS:セミスーパービジョンラーニングのためのクロスラベル・スーパービジョン CLS: Cross Labeling Supervision for Semi-Supervised Learning ( http://arxiv.org/abs/2202.08502v1 ) ライセンス: Link先を確認	Yao Yao, Junyi Shen, Jin Xu, Bin Zhong, Li Xiao	(参考訳) ディープニューラルネットワークの成功は、大規模ラベル付きデータセットによるところが大きいことが知られている。しかし、ほとんどの実用的なアプリケーションで十分な高品質なラベル付きデータを集めるのに非常に時間がかかり、手間がかかる。半教師付き学習(SSL)はラベル付きデータとラベルなしデータの両方を同時に活用することによりラベル付けコストを削減する効果的なソリューションを提供する。本稿では,典型的な擬似ラベル処理を一般化するフレームワークであるCross Labeling Supervision (CLS)を紹介する。弱いサンプルから擬似ラベルを生成し、同じ入力サンプルの強い強化について予測を教えるfixmatchに基づいて、clsは疑似ラベルと補完ラベルの両方を作成し、正の学習と負の学習の両方をサポートすることができる。自己ラベルの確認バイアスを緩和し、偽ラベルに対する耐性を高めるために、同じ構造を持つ2つの異なる初期化ネットワークを同時に訓練する。各ネットワークは、他のネットワークからの高信頼ラベルを追加の監視信号として利用する。ラベル生成段階では、その予測信頼度に応じて適応的なサンプル重みを人工ラベルに割り当てる。サンプルウェイトは、生成されたラベルの品質を定量化し、ネットワークトレーニングにおける不正確なラベルの破壊を低減する。半教師付き分類タスクの実験結果から,CIFAR-10データセットとCIFAR-100データセットにおいて,我々のフレームワークが既存のアプローチよりも優れていることが示された。 It is well known that the success of deep neural networks is greatly attributed to large-scale labeled datasets. However, it can be extremely time-consuming and laborious to collect sufficient high-quality labeled data in most practical applications. Semi-supervised learning (SSL) provides an effective solution to reduce the cost of labeling by simultaneously leveraging both labeled and unlabeled data. In this work, we present Cross Labeling Supervision (CLS), a framework that generalizes the typical pseudo-labeling process. Based on FixMatch, where a pseudo label is generated from a weakly-augmented sample to teach the prediction on a strong augmentation of the same input sample, CLS allows the creation of both pseudo and complementary labels to support both positive and negative learning. To mitigate the confirmation bias of self-labeling and boost the tolerance to false labels, two different initialized networks with the same structure are trained simultaneously. Each network utilizes high-confidence labels from the other network as additional supervision signals. 
During the label generation phase, adaptive sample weights are assigned to artificial labels according to their prediction confidence. The sample weight plays two roles: quantifying the quality of the generated labels and reducing the disruption that inaccurate labels cause to network training. Experimental results on the semi-supervised classification task show that our framework outperforms existing approaches by large margins on the CIFAR-10 and CIFAR-100 datasets.	翻訳日:2022-02-18 15:18:38 公開日:2022-02-17
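The labeling idea in the CLS abstract (pseudo labels for positive learning, complementary labels for negative learning, both weighted by confidence) can be sketched roughly as follows. Everything specific here is an assumption made for illustration and is not taken from the paper: the 0.95 threshold, using the arg-min class as the complementary label, the exact weight formulas, and the function names `make_labels` and `cross_label_loss`. In the two-network setting the abstract describes, `p_weak` would come from one network and `p_strong` from the other.

```python
import numpy as np


def make_labels(p_weak, threshold=0.95):
    """Derive weighted pseudo and complementary labels from weak-view predictions.

    p_weak: (N, C) class probabilities predicted on weakly augmented samples.
    """
    pseudo = p_weak.argmax(axis=1)                 # most likely class
    conf = p_weak.max(axis=1)
    pos_weight = np.where(conf >= threshold, conf, 0.0)  # drop unconfident labels
    comp = p_weak.argmin(axis=1)                   # class the sample most likely is NOT
    neg_weight = 1.0 - p_weak.min(axis=1)          # confidence in that exclusion
    return pseudo, pos_weight, comp, neg_weight


def cross_label_loss(p_strong, pseudo, pos_weight, comp, neg_weight, eps=1e-12):
    """Weighted positive + negative learning loss on strong-view predictions."""
    idx = np.arange(len(p_strong))
    pos = -pos_weight * np.log(p_strong[idx, pseudo] + eps)        # push towards pseudo label
    neg = -neg_weight * np.log(1.0 - p_strong[idx, comp] + eps)    # push away from comp label
    return float((pos + neg).mean())


p_weak = np.array([[0.97, 0.02, 0.01],
                   [0.50, 0.30, 0.20]])
p_strong = np.array([[0.90, 0.05, 0.05],
                     [0.40, 0.40, 0.20]])
pseudo, pos_w, comp, neg_w = make_labels(p_weak)
loss = cross_label_loss(p_strong, pseudo, pos_w, comp, neg_w)
```

In this toy input the second sample is too uncertain to receive a positive label, but it still contributes a negative-learning term, which is one way a framework could extract signal from low-confidence predictions.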
# TAFNet: RGB-T クラウドカウントのための3ストリーム適応型フュージョンネットワーク TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting ( http://arxiv.org/abs/2202.08517v1 ) ライセンス: Link先を確認	Haihan Tang, Yi Wang, Lap-Pui Chau	(参考訳) 本稿では,クラウドカウントにrgbと熱画像の組み合わせを用いた3ストリーム適応型核融合ネットワークtafnetを提案する。具体的には、TAFNetは1つのメインストリームと2つの補助ストリームに分けられる。メインストリームの入力を構成するために,RGBと熱画像のペアを組み合わせる。 2つの補助ストリームはそれぞれrgbイメージとサーマルイメージを利用してモダリティ特有の特徴を抽出する。さらに、モーダリティ固有の特徴を主ストリームに適応的に融合させる情報改善モジュール(IIM)を提案する。 RGBT-CCデータセットを用いた実験結果から,本手法は平均誤差および根平均二乗誤差に対して,最先端手法と比較して20%以上改善されていることがわかった。ソースコードはhttps://github.com/TANGHAIHAN/TAFNetで公開されている。 In this paper, we propose a three-stream adaptive fusion network named TAFNet, which uses paired RGB and thermal images for crowd counting. Specifically, TAFNet is divided into one main stream and two auxiliary streams. We combine a pair of RGB and thermal images to constitute the input of main stream. Two auxiliary streams respectively exploit RGB image and thermal image to extract modality-specific features. Besides, we propose an Information Improvement Module (IIM) to fuse the modality-specific features into the main stream adaptively. Experiment results on RGBT-CC dataset show that our method achieves more than 20% improvement on mean average error and root mean squared error compared with state-of-the-art method. The source code will be publicly available at https://github.com/TANGHAIHAN/TAFNet.	翻訳日:2022-02-18 15:18:14 公開日:2022-02-17
# コンテンツとスタイル分離による水中画像強調のためのドメイン適応 Domain Adaptation for Underwater Image Enhancement via Content and Style Separation ( http://arxiv.org/abs/2202.08537v1 ) ライセンス: Link先を確認	Yu-Wei Chen, Soo-Chang Pei	(参考訳) 水中画像は、光吸収、屈折、散乱によるカラーキャスト、低コントラスト、ハジー効果に悩まされ、物体検出や物体追跡などの高レベルな用途が劣化した。近年の学習に基づく手法は水中画像強調における驚くべき性能を示しているが、これらの作品の多くは合成ペアデータを用いて教師あり学習を行い、実世界データに対するドメインギャップを無視している。本稿では,水中画像強調のためのコンテンツとスタイル分離によるドメイン適応フレームワークを提案し,画像がコンテンツとスタイル潜伏とに絡み合うことができることを仮定し,潜伏空間における関連するスタイルのサブドメインにイメージをクラスタリングし,水中潜伏とクリーンな画像間のマッピングを構築することを目的とする。合成と実世界のデータの遅延を最小限に抑えることを目的とした,水中画像強調のための領域適応の先行研究とは違って,異なるサブドメインからスタイル潜在度を区別することを目的とする。実世界のペアデータの欠如を解決するため,実画像から画像への変換に合成を活用し,教師付き学習のための擬似実水中画像ペアを得る。本モデルでは,潜時操作により異なる拡張レベルを調整できるユーザインタラクションインタフェースを提供する。実世界の様々な水中ベンチマーク実験により,提案フレームワークは水中画像強調のための領域適応を行い,様々な水中画像強調アルゴリズムの量と品質に優れることを示した。モデルとソースコードはhttps://github.com/fordevoted/UIESSで入手できる。 Underwater images suffer from color cast, low contrast, and haziness due to light absorption, refraction, and scattering, which degrade high-level applications, e.g., object detection and object tracking. Recent learning-based methods demonstrate astonishing performance on underwater image enhancement; however, most of these works use synthetic paired data for supervised learning and ignore the domain gap to real-world data. In this paper, we propose a domain adaptation framework for underwater image enhancement via content and style separation. We assume that an image can be disentangled into content and style latents and that images can be clustered into sub-domains of associated style in latent space; the goal is to build a mapping between the underwater style latent and the clean one. Unlike prior work on domain adaptation for underwater image enhancement, which aims to minimize the latent discrepancy between synthetic and real-world data, we aim to distinguish style latents from different sub-domains. 
To address the lack of paired real-world data, we leverage synthetic-to-real image-to-image translation to obtain pseudo-real underwater image pairs for supervised learning; enhancement is then achieved by feeding the content latent and a clean style latent into the generator. Our model provides a user interaction interface for adjusting the enhancement level through latent manipulation. Experiments on various public real-world underwater benchmarks demonstrate that the proposed framework can perform domain adaptation for underwater image enhancement and outperforms various state-of-the-art underwater image enhancement algorithms both quantitatively and qualitatively. The model and source code are available at https://github.com/fordevoted/UIESS	翻訳日:2022-02-18 15:18:02 公開日:2022-02-17
# 奥行きを優先した3次元室内シーン合成 3D-Aware Indoor Scene Synthesis with Depth Priors ( http://arxiv.org/abs/2202.08553v1 ) ライセンス: Link先を確認	Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-Yan Yeung, Qifeng Chen	(参考訳) 近年,2次元データから3次元画像合成を学習するGAN(Generative Adversarial Networks)が進歩しているが,室内レイアウトや内部オブジェクトの多様さにより,既存の手法では屋内シーンのモデル化に失敗している。室内シーンは内在的な構造が共有されていないため, 2次元画像のみを用いた場合, モデルに十分な3次元形状を導くことはできない。本研究では,このギャップを3次元の先行モデルとして深度を導入することで埋める。他の3Dデータフォーマットと比較して、深度は畳み込みベースの生成メカニズムに適合し、実際はより容易にアクセスできる。具体的には、一方の経路が他方の経路に中間的な特徴を注入する深度生成を、外観レンダリングの条件として行うデュアルパス生成器を提案する。このような設計により、明快な幾何学情報による3D認識合成が容易になる。一方、実際のv.s.フェイクドメインを区別し、与えられた入力から深さを予測するために、切り替え可能な判別器を導入する。このようにして、判別器は空間配置を考慮に入れ、ジェネレータに適切な深度条件を学ぶよう助言することができる。大規模な実験結果から,本手法は室内のシーンを極めて優れた品質と3D整合性で合成することができることが示唆された。 Despite the recent advancement of Generative Adversarial Networks (GANs) in learning 3D-aware image synthesis from 2D data, existing methods fail to model indoor scenes due to the large diversity of room layouts and the objects inside. We argue that indoor scenes do not have a shared intrinsic structure, and hence only using 2D images cannot adequately guide the model with the 3D geometry. In this work, we fill in this gap by introducing depth as a 3D prior. Compared with other 3D data formats, depth better fits the convolution-based generation mechanism and is more easily accessible in practice. Specifically, we propose a dual-path generator, where one path is responsible for depth generation, whose intermediate features are injected into the other path as the condition for appearance rendering. Such a design eases the 3D-aware synthesis with explicit geometry information. Meanwhile, we introduce a switchable discriminator both to differentiate real v.s. fake domains and to predict the depth from a given input. In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition. 
Extensive experimental results suggest that our approach is capable of synthesizing indoor scenes with impressively good quality and 3D consistency, significantly outperforming state-of-the-art alternatives.	翻訳日:2022-02-18 15:17:10 公開日:2022-02-17
# フィードバックネットワークを用いた構造化特徴マップ上のポイントクラウド補完 Point cloud completion on structured feature map with feedback network ( http://arxiv.org/abs/2202.08583v1 ) ライセンス: Link先を確認	Zejia Su, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu	(参考訳) 本稿では,特徴学習の観点から,ポイントクラウド完成の課題に挑戦する。基本となる構造と表面の詳細を部分的な入力から回復するために、基本的なコンポーネントは、大域構造と局所幾何学的詳細の両方をキャプチャできる優れた特徴表現です。この目的に向けて,我々はまず,局所領域から複数の潜在パターンを学習することにより,ポイントワイドな特徴を2次元構造的特徴マップに適応的に集約する機能構造化モジュールFSNetを提案する。次に、FSNetをポイントクラウド補完のための粗大なパイプラインに統合します。具体的には、2D畳み込みニューラルネットワークを用いて、FSNetから粗い完全点クラウドに特徴マップをデコードする。次に、部分入力と粗い中間出力から高密度点雲を生成するために、点雲アップサンプリングネットワークを用いる。局所構造を効率的に活用し, 点分布の均一性を高めるために, 生成した濃密点雲の詳細を段階的に洗練できる自己補正機構を備えた点アップサンプリングモジュールifnetを提案する。本研究では,ShapeNet,MVPおよびKITTIデータセットの定性的および定量的な実験を行い,本手法が最先端のクラウド補完手法より優れていることを示す。 In this paper, we tackle the challenging problem of point cloud completion from the perspective of feature learning. Our key observation is that to recover the underlying structures as well as surface details given a partial input, a fundamental component is a good feature representation that can capture both global structure and local geometric details. Towards this end, we first propose FSNet, a feature structuring module that can adaptively aggregate point-wise features into a 2D structured feature map by learning multiple latent patterns from local regions. We then integrate FSNet into a coarse-to-fine pipeline for point cloud completion. Specifically, a 2D convolutional neural network is adopted to decode feature maps from FSNet into a coarse and complete point cloud. Next, a point cloud upsampling network is used to generate dense point cloud from the partial input and the coarse intermediate output. To efficiently exploit the local structures and enhance the point distribution uniformity, we propose IFNet, a point upsampling module with self-correction mechanism that can progressively refine details of the generated dense point cloud. 
We conduct both qualitative and quantitative experiments on ShapeNet, MVP, and KITTI datasets, which demonstrate that our method outperforms state-of-the-art point cloud completion approaches.	翻訳日:2022-02-18 15:16:48 公開日:2022-02-17
# 解釈可能なピラミッドネットワークによる単一uhd画像デハジング Single UHD Image Dehazing via Interpretable Pyramid Network ( http://arxiv.org/abs/2202.08589v1 ) ライセンス: Link先を確認	Boxue Xiao, Zhuoran Zheng, Xiang Chen, Chen Lv, Yunliang Zhuang, Tao Wang	(参考訳) 現在、ほとんどのシングルイメージデハージングモデルは、単一のGPUシェーダを持つ超高解像度(UHD)イメージをリアルタイムで実行することはできない。この問題を解決するために,テイラーの定理をラプラスのピラミッドパターンで無限近似する原理を導入し,4Kハジー画像をリアルタイムで処理できるモデルを構築する。ピラミッドネットワークの N 分岐ネットワークはテイラーの定理における N の制約項に対応する。低次多項式は、画像の低周波情報(色、照明など)を再構成する。高次多項式は、画像の高周波情報(例えばテクスチャ)を抑圧する。さらに,ピラミッドモデルの各分岐ネットワークに作用するタッカー再構成に基づく正規化項を提案する。さらに、特徴空間における異常信号の生成を制限する。広範な実験結果から,hazeを用いた4kイメージを単一のgpu (80fps) 上でリアルタイムに動作させるだけでなく,並列性のない解釈性も実現できた。 2つのベンチマーク(o/i-haze)と更新された4kidデータセットで最先端(sota)性能を実現し,その後の最適化手法に対する信頼性の高い基盤を提供する。 Currently, most single image dehazing models cannot run an ultra-high-resolution (UHD) image with a single GPU shader in real-time. To address the problem, we introduce the principle of infinite approximation of Taylor's theorem with the Laplace pyramid pattern to build a model which is capable of handling 4K hazy images in real-time. The N branch networks of the pyramid network correspond to the N constraint terms in Taylor's theorem. Low-order polynomials reconstruct the low-frequency information of the image (e.g. color, illumination). High-order polynomials regress the high-frequency information of the image (e.g. texture). In addition, we propose a Tucker reconstruction-based regularization term that acts on each branch network of the pyramid model. It further constrains the generation of anomalous signals in the feature space. Extensive experimental results demonstrate that our approach can not only run 4K images with haze in real-time on a single GPU (80FPS) but also has unparalleled interpretability. The developed method achieves state-of-the-art (SOTA) performance on two benchmarks (O/I-HAZE) and our updated 4KID dataset while providing the reliable groundwork for subsequent optimization schemes.	翻訳日:2022-02-18 15:16:29 公開日:2022-02-17
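The low-frequency versus high-frequency split that the dehazing abstract associates with a Laplacian pyramid can be illustrated with a small NumPy sketch. This is a simplified variant that assumes 2x2 average pooling for downsampling and nearest-neighbour repetition for upsampling (a classical Laplacian pyramid uses Gaussian filtering); it shows only the decomposition, not the paper's network, and all function names are illustrative.

```python
import numpy as np


def down(x):
    """Halve the resolution with 2x2 average pooling (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def up(x):
    """Double the resolution by repeating each pixel."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def build_pyramid(img, levels):
    """Return high-frequency residual bands (fine to coarse) and a low-frequency base."""
    bands, cur = [], img
    for _ in range(levels):
        low = down(cur)
        bands.append(cur - up(low))   # detail lost by downsampling (texture, edges)
        cur = low                     # smooth content (color, illumination)
    return bands, cur


def reconstruct(bands, base):
    """Invert build_pyramid by adding the residual bands back, coarse to fine."""
    cur = base
    for band in reversed(bands):
        cur = up(cur) + band
    return cur


rng = np.random.default_rng(0)
img = rng.random((8, 8))
bands, base = build_pyramid(img, levels=2)
restored = reconstruct(bands, base)
```

Since each band stores exactly what the downsampling step discarded, the reconstruction is lossless, so a model can process the coarse base and the detail bands in separate branches and still recombine them into a full-resolution image.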
# 少数ショット学習のための意味的比例パッチミックス Semantically Proportional Patchmix for Few-Shot Learning ( http://arxiv.org/abs/2202.08647v1 ) ライセンス: Link先を確認	Jingquan Wang, Jing Xu, Yu Pan, Zenglin Xu	(参考訳) 少数ショット学習は、限られた数のラベル付きデータで未発見のクラスを分類することを目的としている。近年の研究では、単純な転写学習戦略によるトレーニングモデルが、数ショットの分類において競合的な結果が得られることが示されている。トレーニングデータの識別には優れていますが、これらのモデルは、おそらく評価上の特徴表現が不十分なため、見当たらないデータに対してうまく一般化していません。そこで本研究では,訓練画像間でパッチをカット・ペーストし,パッチの意味情報に接する基底真理ラベルを混合する,意味的に比例するパッチミックス(seppmix)を提案する。このように,重度のラベルノイズを発生させることなく,局所的ドロップアウト効果によりモデルの一般化能力を向上させることができる。データのより堅牢な表現を学習するために、混合画像上で回転変換を行い、規則ベースの正規化器として回転を予測する。提案手法の有効性を実証し,提案手法の有効性を検証した。 Few-shot learning aims to classify unseen classes with only a limited number of labeled data. Recent works have demonstrated that training models with a simple transfer learning strategy can achieve competitive results in few-shot classification. Although excelling at distinguishing training data, these models are not well generalized to unseen data, probably due to insufficient feature representations on evaluation. To tackle this issue, we propose Semantically Proportional Patchmix (SePPMix), in which patches are cut and pasted among training images and the ground truth labels are mixed proportionally to the semantic information of the patches. In this way, we can improve the generalization ability of the model by regional dropout effect without introducing severe label noise. To learn more robust representations of data, we further take rotate transformation on the mixed images and predict rotations as a rule-based regularizer. Extensive experiments on prevalent few-shot benchmarks have shown the effectiveness of our proposed method.	翻訳日:2022-02-18 15:16:10 公開日:2022-02-17
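To make the patch-mixing idea more concrete, here is a minimal NumPy sketch in which patches from one image are pasted into another and the label is mixed in proportion to the semantic mass each image contributes. The per-cell semantic score maps `sem_a` and `sem_b` are assumed inputs (for example, pooled activation maps); how SePPMix actually obtains them, the grid size, and the random cell selection are not stated in the abstract and are hypothetical here.

```python
import numpy as np


def patchmix(img_a, img_b, sem_a, sem_b, y_a, y_b, grid=4, n_swap=4, seed=0):
    """Paste n_swap grid cells of img_b into img_a and mix labels semantically.

    img_*: (H, W, C) images with H and W divisible by grid.
    sem_*: (grid, grid) positive semantic scores per cell (assumed given).
    y_*:   one-hot label vectors.
    """
    rng = np.random.default_rng(seed)
    h, w = img_a.shape[0] // grid, img_a.shape[1] // grid
    sem_a = sem_a / sem_a.sum()            # normalise each map to total mass 1
    sem_b = sem_b / sem_b.sum()
    mixed = img_a.copy()
    from_b = np.zeros((grid, grid), dtype=bool)
    for cell in rng.choice(grid * grid, size=n_swap, replace=False):
        r, c = divmod(int(cell), grid)
        mixed[r * h:(r + 1) * h, c * w:(c + 1) * w] = \
            img_b[r * h:(r + 1) * h, c * w:(c + 1) * w]
        from_b[r, c] = True
    mass_a = sem_a[~from_b].sum()          # semantic content of A that survives
    mass_b = sem_b[from_b].sum()           # semantic content of B that was pasted
    lam = mass_a / (mass_a + mass_b)
    return mixed, lam * y_a + (1.0 - lam) * y_b, lam


img_a = np.zeros((8, 8, 3))
img_b = np.ones((8, 8, 3))
uniform = np.ones((4, 4))
mixed, y_mix, lam = patchmix(img_a, img_b, uniform, uniform,
                             np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

With uniform semantic maps this reduces to area-proportional mixing (here 12 of 16 cells stay from the first image); with non-uniform maps, pasting a background patch changes the label less than pasting a patch that covers the object, which is the kind of label noise reduction the abstract points to.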
# 画像デブラリング学習のためのリアルなボケ合成 Realistic Blur Synthesis for Learning Image Deblurring ( http://arxiv.org/abs/2202.08771v1 ) ライセンス: Link先を確認	Jaesung Rim, Geonung Kim, Jungeon Kim, Junyong Lee, Seungyong Lee, Sunghyun Cho	(参考訳) 学習に基づくデブロワーリングの訓練には、大量のぼやけた画像と鋭い画像のペアが必要である。残念ながら、既存の合成データセットは十分に現実的ではなく、既存の現実世界のぼやけたデータセットは、シーンやカメラの設定に制限がある。結果として、トレーニングされたデブロアリングモデルは、実際のぼやけた画像を扱う一般化能力の欠如に悩まされている。本稿では,実画像と合成画像の相違を生ずる様々な要因を分析し,よりリアルなぼかしを合成できる新しいぼかし合成パイプラインを提案する。また,実際のぼやけた画像とシャープな画像のシーケンスを含む,新しいデータセットrsblurを提案する。 rsblurデータセットは、実際のぼかしと合成ぼかしの違いを詳細に分析するために、合成ぼかし画像を生成するのに使うことができる。ボケ合成パイプラインとrsblurデータセットを用いて,ボケ合成における異なる因子の影響を明らかにする。また,本手法により,実際のぼやけた画像の劣化性能を向上できることを示す。 Training learning-based deblurring methods demands a significant amount of blurred and sharp image pairs. Unfortunately, existing synthetic datasets are not realistic enough, and existing real-world blur datasets provide limited diversity of scenes and camera settings. As a result, deblurring models trained on them still suffer from the lack of generalization ability for handling real blurred images. In this paper, we analyze various factors that introduce differences between real and synthetic blurred images, and present a novel blur synthesis pipeline that can synthesize more realistic blur. We also present RSBlur, a novel dataset that contains real blurred images and the corresponding sequences of sharp images. The RSBlur dataset can be used for generating synthetic blurred images to enable detailed analysis on the differences between real and synthetic blur. With our blur synthesis pipeline and RSBlur dataset, we reveal the effects of different factors in the blur synthesis. We also show that our synthesis method can improve the deblurring performance on real blurred images.	翻訳日:2022-02-18 15:15:53 公開日:2022-02-17
# 一般化可能な情報理論因果表現 Generalizable Information Theoretic Causal Representation ( http://arxiv.org/abs/2202.08388v1 ) ライセンス: Link先を確認	Mengyue Yang, Xinyu Cai, Furui Liu, Xu Chen, Zhitang Chen, Jianye Hao, Jun Wang	(参考訳) 表現学習は、画像分類やレコメンダシステムなど、多くの実世界のシナリオにおいて、複数のダウンストリームタスクに対するモデルのパフォーマンスを向上させることができる。既存の学習アプローチは、特徴と下流タスク(ラベル)の間の相関(あるいはそのプロキシ)を確立することに依存しており、通常はラベルの原因、効果、刺激的な相関変数を含む表現をもたらす。非因果部分の不安定性のため、その一般化性は低下する可能性がある。本稿では,観測データから因果表現を学習するために,仮説的因果グラフに基づいて相互情報測度で学習手順を規則化することを提案する。この最適化は、因果性に着想を得た学習がサンプルの複雑さを減らし、一般化能力を向上させるという理論的保証を導出する反事実損失を含む。広範な実験により,提案手法で学習した因果表現に基づくモデルが,敵対的攻撃と分布シフト下で頑健であることが判明した。 It is evident that representation learning can improve a model's performance over multiple downstream tasks in many real-world scenarios, such as image classification and recommender systems. Existing learning approaches rely on establishing the correlation (or its proxy) between features and the downstream task (labels), which typically results in a representation containing the causes, the effects, and spuriously correlated variables of the label. Its generalizability may deteriorate because of the instability of the non-causal parts. In this paper, we propose to learn causal representations from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph. The optimization involves a counterfactual loss, based on which we deduce a theoretical guarantee that causality-inspired learning has reduced sample complexity and better generalization ability. Extensive experiments show that the models trained on causal representations learned by our approach are robust under adversarial attacks and distribution shift.	翻訳日:2022-02-18 15:13:39 公開日:2022-02-17
# SWIM: メモリ内ニューラルネットワークアクセラレータのための選択的書き込み検証 SWIM: Selective Write-Verify for Computing-in-Memory Neural Accelerators ( http://arxiv.org/abs/2202.08395v1 ) ライセンス: Link先を確認	Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi	(参考訳) 非揮発性新興メモリに基づくコンピューティング・イン・メモリアーキテクチャは、その高エネルギー効率によりディープニューラルネットワーク(DNN)加速に大きな可能性を示している。しかし、これらの新興デバイスはマッピングプロセス、すなわちデバイスへのプログラミング重み付けの間に大きなバリエーションに悩まされ、もし未解決のままにしていれば、大幅な精度低下を引き起こす可能性がある。ウェイトマッピングの非理想性は、反復的プログラミングと書き込み検証スキーム、すなわち、必要に応じてコンダクタンスを読み書きすることで補うことができる。既存のすべての作品において、そのような実践はマッピングされているDNNのすべての重量に適用され、広範なプログラミング時間を必要とする。本研究は,DNNの精度を維持するために,書き込み検証のためのウェイトの一部だけを選択する必要があることを示し,大幅な高速化を実現する。さらに、書込み検証が必要なウェイトを効率的に選択するために、フォワードとバックプロパゲーションの1パスしか必要としない第2のデリバティブベース手法SWIMを導入する。異なるデータセットに対する様々なDNNアーキテクチャの実験結果から、SWIMは従来の完全な書き込み検証に比べて最大10倍のプログラムスピードアップを実現でき、精度は同等である。 Computing-in-Memory architectures based on non-volatile emerging memories have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, these emerging devices can suffer from significant variations during the mapping process (i.e., programming weights to the devices), and if left unaddressed, these variations can cause significant accuracy degradation. The non-ideality of weight mapping can be compensated by iterative programming with a write-verify scheme, i.e., reading the conductance and rewriting if necessary. In all existing works, such a practice is applied to every single weight of a DNN as it is being mapped, which requires extensive programming time. In this work, we show that it is only necessary to select a small portion of the weights for write-verify to maintain the DNN accuracy, thus achieving significant speedup. We further introduce a second-derivative-based technique, SWIM, which only requires a single pass of forward and backpropagation, to efficiently select the weights that need write-verify. 
Experimental results on various DNN architectures for different datasets show that SWIM can achieve up to 10x programming speedup compared with conventional full-blown write-verify while attaining a comparable accuracy.	翻訳日:2022-02-18 15:13:22 公開日:2022-02-17
# 検索型強化学習 Retrieval-Augmented Reinforcement Learning ( http://arxiv.org/abs/2202.08417v1 ) ライセンス: Link先を確認	Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Ksenia Konyushkova, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell	(参考訳) ほとんどの深層強化学習(RL)アルゴリズムは、経験をパラメトリックな行動ポリシーや値関数に抽出する。効果的であるが、このアプローチにはいくつかの欠点がある:(1)計算コストが高い、(2)パラメトリックモデルに経験を統合するために多くの更新を必要とする、(3)完全に統合されていない経験はエージェントの振る舞いに適切に影響しない、(4)行動はモデルの能力によって制限される。本稿では,過去の経験のデータセットを最適な行動にマップするために,ネットワークを訓練する代替パラダイムを検討する。具体的には、経験のデータセットに直接アクセス可能な検索プロセス(ニューラルネットワークとしてパラメータ化)でRLエージェントを増強する。このデータセットは、エージェントの過去の経験、専門家によるデモンストレーション、その他の関連するソースから得られる。検索プロセスは、現在の文脈で有用なデータセットから情報を取得するように訓練され、エージェントがその目標を迅速かつ効率的に達成するのに役立つ。オフラインDQNエージェントとオンラインR2D2エージェントの2つの異なるRLエージェントに統合する。オフラインマルチタスク問題では,検索拡張DQNエージェントはタスク干渉を回避し,ベースラインDQNエージェントよりも高速に学習することを示す。 Atariでは,検索強化R2D2がベースラインR2D2エージェントよりもかなり高速に学習し,より高いスコアを得ることを示す。提案手法の成分の寄与度を測定するため,広範なアブレーションを行った。 Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. 
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. We run extensive ablations to measure the contributions of the components of our proposed method.	翻訳日:2022-02-18 15:13:02 公開日:2022-02-17
# (参考訳) 事前学習言語モデルを用いた知識集約型NLPの検討 A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models ( http://arxiv.org/abs/2202.08772v1 ) ライセンス: CC BY 4.0	Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao	(参考訳) 事前訓練された言語モデルによってもたらされるモデルキャパシティの増大に伴い、百科事典や常識知識の柔軟な利用を含む高度な機能を備えたより知識のある自然言語処理(NLP)モデルの必要性が高まっている。しかし、単に事前訓練された言語モデルでは、そのような知識集約型NLPタスクのみを扱う能力が欠けている。この課題に対処するため、外部知識ソースを付加した事前学習型言語モデルが多数提案され、迅速な開発が行われている。本稿では,知識源,知識集約型nlpタスク,知識融合手法の3つの重要な要素を解剖し,事前学習した言語モデルに基づく知識強化モデル(plmkes)の現状を概説する。最後に,3つの要素に関する議論に基づくPLMKEの課題について述べるとともに,NLP実践者にさらなる研究の道筋を与えようとしている。 With the increasing of model capacity brought by pre-trained language models, there emerges boosting needs for more knowledgeable natural language processing (NLP) models with advanced functionalities including providing and making flexible use of encyclopedic and commonsense knowledge. The mere pre-trained language models, however, lack the capacity of handling such knowledge-intensive NLP tasks alone. To address this challenge, large numbers of pre-trained language models augmented with external knowledge sources are proposed and in rapid development. In this paper, we aim to summarize the current progress of pre-trained language model-based knowledge-enhanced models (PLMKEs) by dissecting their three vital elements: knowledge sources, knowledge-intensive NLP tasks, and knowledge fusion methods. Finally, we present the challenges of PLMKEs based on the discussion regarding the three elements and attempt to provide NLP practitioners with potential directions for further research.	翻訳日:2022-02-18 15:12:09 公開日:2022-02-17
# (参考訳) ロバスト行列回復のための下位勾配法のグローバル収束:小さな初期化、雑音測定、過度パラメータ化 Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization ( http://arxiv.org/abs/2202.08788v1 ) ライセンス: CC BY 4.0	Jianhao Ma and Salar Fattahi	(参考訳) 本研究では,低階行列回復の自然な非凸に対するサブ段階法(SubGM)の性能と,低階行列回復を$\ell_1$-lossで定式化する手法について検討する。真の解のランクが未知であり、代わりに過大評価されるシナリオを研究する。ランクの過度な推定は、必要以上の自由度を持つ過度なパラメータ化されたモデルをもたらす。このような過度パラメータ化はアルゴリズムの性能に過度に適合するか、悪影響を及ぼす可能性がある。初期化が小さい単純なsubgmは、測定値の過パラメータ化とノイズの両方に無依存であることが証明される。特に, 極小初期化は, SubGMの性能に及ぼす過パラメータ化の影響を無効化し, 収束率を指数的に向上させることを示した。さらに, 外部雑音モデルとガウス雑音モデルの両方の下でのSubGMの挙動を解析するための最初の統一フレームワークを提案し, 任意に大きく, 任意の密度の雑音値の下でもSubGMが真の解に収束することを示した。我々の結果の核となるのは、Sign-RIPと呼ばれる制限された等距離特性の頑健な変種であり、これは理想的な期待損失から$\ell_1$-lossの偏差を制御している。以上の結果の副産物として,ガウス計測によるロバストな低ランク行列復元のサブクラスを考察し,subgmのグローバル収束を保証するために必要なサンプル数が過パラメータのランクとは無関係であることを示す。 In this work, we study the performance of sub-gradient method (SubGM) on a natural nonconvex and nonsmooth formulation of low-rank matrix recovery with $\ell_1$-loss, where the goal is to recover a low-rank matrix from a limited number of measurements, a subset of which may be grossly corrupted with noise. We study a scenario where the rank of the true solution is unknown and over-estimated instead. The over-estimation of the rank gives rise to an over-parameterized model in which there are more degrees of freedom than needed. Such over-parameterization may lead to overfitting, or adversely affect the performance of the algorithm. We prove that a simple SubGM with small initialization is agnostic to both over-parameterization and noise in the measurements. In particular, we show that small initialization nullifies the effect of over-parameterization on the performance of SubGM, leading to an exponential improvement in its convergence rate. 
Moreover, we provide the first unifying framework for analyzing the behavior of SubGM under both outlier and Gaussian noise models, showing that SubGM converges to the true solution, even under arbitrarily large and arbitrarily dense noise values, and--perhaps surprisingly--even if the globally optimal solutions do not correspond to the ground truth. At the core of our results is a robust variant of restricted isometry property, called Sign-RIP, which controls the deviation of the sub-differential of the $\ell_1$-loss from that of an ideal, expected loss. As a byproduct of our results, we consider a subclass of robust low-rank matrix recovery with Gaussian measurements, and show that the number of required samples to guarantee the global convergence of SubGM is independent of the over-parameterized rank.	翻訳日:2022-02-18 14:59:20 公開日:2022-02-17
# 行動の多重性を考慮したコントラスト的メタラーニング Contrastive Meta Learning with Behavior Multiplicity for Recommendation ( http://arxiv.org/abs/2202.08523v1 ) ライセンス: Link先を確認	Wei Wei and Chao Huang and Lianghao Xia and Yong Xu and Jiashu Zhao and Dawei Yin	(参考訳) 優れたレコメンデーションフレームワークは、ユーザーが興味のあるアイテムを識別するのに役立つだけでなく、さまざまなオンラインプラットフォーム(eコマースやソーシャルメディアなど)の収益にも寄与する。従来のレコメンデーションモデルは、通常、ユーザとアイテムの間には単一のタイプのインタラクションしか存在せず、ページビュー、お気に入り追加、購入のようなマルチタイプのユーザー行動データから複数のユーザ-アイテム関係をモデル化できないと仮定する。最近の研究では、さまざまなタイプの振る舞いにまたがる依存関係を捉えることが提案されているが、重要な2つの課題が探究されていない。 i) 標的行動(購入等)の下で、疎い監視信号に対処すること。 ii)カスタマイズされた依存関係モデリングによるパーソナライズされたマルチビヘイビアパターンのキャプチャ。上記の課題に取り組むため,我々は新しいモデルであるコントラストメタラーニング(CML)を考案し,異なるユーザに対して専用のクロスタイプ行動依存性を維持する。特に,構築されたコントラスト損失を通じて,異なる種類の行動にまたがる移動可能知識を蒸留する多目的コントラスト学習フレームワークを提案する。さらに,多様なマルチビヘイビアパターンを捉えるために,異なるユーザに対してカスタマイズされた振る舞いの不均一性を符号化するコントラストメタネットワークを設計する。 3つの実世界のデータセットに関する広範囲な実験は、この手法が様々な最先端の推奨手法を一貫して上回っていることを示している。さらに, コントラスト的メタ学習パラダイムは, 行動の多重性をレコメンデーションで捉えるための大きな可能性を示唆する。私たちはモデル実装をhttps://github.com/weiwei1206/CML.gitでリリースします。 A well-informed recommendation framework could not only help users identify the items they are interested in, but also benefit the revenue of various online platforms (e.g., e-commerce, social media). Traditional recommendation models usually assume that only a single type of interaction exists between user and item, and fail to model the multiplex user-item relationships from multi-typed user behavior data, such as page view, add-to-favourite and purchase. While some recent studies propose to capture the dependencies across different types of behaviors, two important challenges have been less explored: i) Dealing with the sparse supervision signal under target behaviors (e.g., purchase). ii) Capturing the personalized multi-behavior patterns with customized dependency modeling. To tackle the above challenges, we devise a new model, Contrastive Meta Learning (CML), to maintain dedicated cross-type behavior dependency for different users. In particular, we propose a multi-behavior contrastive learning framework to distill transferable knowledge across different types of behaviors via the constructed contrastive loss. In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users. Extensive experiments on three real-world datasets indicate that our method consistently outperforms various state-of-the-art recommendation methods. Our empirical studies further suggest that the contrastive meta learning paradigm offers great potential for capturing the behavior multiplicity in recommendation. We release our model implementation at: https://github.com/weiwei1206/CML.git.	翻訳日:2022-02-18 14:56:08 公開日:2022-02-17
# 終わりは意味を正当化するのか? フェアネスを考慮した機械学習のモラル正当性について Does the End Justify the Means? On the Moral Justification of Fairness-Aware Machine Learning ( http://arxiv.org/abs/2202.08536v1 ) ライセンス: Link先を確認	Hilde Weerts, Lambèr Royakkers, Mykola Pechenizkiy	(参考訳) フェアネス認識機械学習(fair-ml)アルゴリズムは豊富であるが、これらのアルゴリズムがどのようにフェアネスメトリクスを強制するかの道徳的正当性はほとんど未解明である。本研究の目的は,fair-mlアルゴリズムの道徳的意味を引き出すことである。この目的のために、まずアルゴリズムが最適化する公平度指標の道徳的正当性を考察する。我々は、フェアネスのメトリクスを正当化できる3つの命題に到達するために、以前の作業の拡張を示す。これまでの作業とは違って,予測結果の結果が公平さを判断する上で重要であることを強調する。我々は、fair-mlアルゴリズムの道徳的意味を識別するために、拡張された枠組みと経験的倫理から引き出す。我々は、アルゴリズムに固有の2つの最適化戦略に焦点を当てる:グループ固有の決定しきい値とランダム化された決定しきい値。我々は、アルゴリズムの正当化は、アルゴリズムが適用される(社会的)コンテキストについての仮定によって、たとえ関連するフェアネス計量が同じであっても、異なる可能性があると主張する。最後に,fair-mlアルゴリズムのより完全な評価に向けた今後の研究の道筋を,直接最適化の目的を超えてスケッチする。 Despite an abundance of fairness-aware machine learning (fair-ml) algorithms, the moral justification of how these algorithms enforce fairness metrics is largely unexplored. The goal of this paper is to elicit the moral implications of a fair-ml algorithm. To this end, we first consider the moral justification of the fairness metrics for which the algorithm optimizes. We present an extension of previous work to arrive at three propositions that can justify the fairness metrics. Different from previous work, our extension highlights that the consequences of predicted outcomes are important for judging fairness. We draw from the extended framework and empirical ethics to identify moral implications of the fair-ml algorithm. We focus on the two optimization strategies inherent to the algorithm: group-specific decision thresholds and randomized decision thresholds. We argue that the justification of the algorithm can differ depending on one's assumptions about the (social) context in which the algorithm is applied - even if the associated fairness metric is the same. Finally, we sketch paths for future work towards a more complete evaluation of fair-ml algorithms, beyond their direct optimization objectives.	翻訳日:2022-02-18 14:55:41 公開日:2022-02-17
# バナッハ空間におけるロバストSVM最適化 Robust SVM Optimization in Banach spaces ( http://arxiv.org/abs/2202.08567v1 ) ライセンス: Link先を確認	Mohammed Sbihi and Nicolas Couellan	(参考訳) バナッハ空間における二項分類の問題に不確実性が存在する場合に対処する。古典的支援ベクトルマシン理論から得られる多くの結果は、バナッハ空間においてその頑健な結果に適切に一般化できることを示す。これらはRepresenter Theorem、関連する最適化問題に対する強い双対性、幾何学的解釈を含む。さらに, 2つの閉凸集合において, 基底空間が反射的かつ滑らかなときに最も近い点を求めるより一般的な問題に対して, ナッシュ均衡問題を定式化してゲーム理論解釈を提案する。 We address the issue of binary classification in Banach spaces in presence of uncertainty. We show that a number of results from classical support vector machines theory can be appropriately generalised to their robust counterpart in Banach spaces. These include the Representer Theorem, strong duality for the associated Optimization problem as well as their geometric interpretation. Furthermore, we propose a game theoretic interpretation by expressing a Nash equilibrium problem formulation for the more general problem of finding the closest points in two closed convex sets when the underlying space is reflexive and smooth.	翻訳日:2022-02-18 14:55:23 公開日:2022-02-17
# 統合階段特性:二層ニューラルネットワークにおけるスパース関数のSGD学習に必要なほぼ十分条件 The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks ( http://arxiv.org/abs/2202.08658v1 ) ライセンス: Link先を確認	Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz	(参考訳) 現在、ニューラルネットワークが2つの極端パラメータ化のためにSGDで学習できる機能、すなわち線形状態のニューラルネットワークと、構造的な制約のないニューラルネットワークを特徴付ける方法が知られている。しかし、関心の主パラメトリゼーション(非線形だが正規のネットワーク)については、大きな発展にもかかわらず、厳密な特徴がまだ得られていない。我々は,sgdにより訓練された深部2ニューラルネットワークを平均場法で検討することで,この方向の一歩を踏み出す。我々は、潜在する低次元部分空間(つまり、少数の座標)に依存する二進入力上の函数を考える。この体制は、ニューラルネットワークが高次元データセットに日常的に取り組み、次元性の呪いに苦しむことなく潜伏する低次元構造に適応する方法がよく理解されていないため、関心がある。したがって、SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$。私たちの主な結果は階層的特性である"merged-staircase property"を特徴付けており、この設定で学習するには必要であり、ほぼ十分である。この関数のクラスでは、任意の特徴写像(例えば、ntk)上の線形メソッドは効率的に学習できない。鍵となるツールは、低次元の潜在空間上で定義される関数に適用される新しい「次元自由」動力学近似結果、多項式の恒等性テストに基づく大域収束の証明、非直交関数に対する線形法に対する下界の改善である。 It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. 
Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new "dimension-free" dynamics approximation result that applies to functions defined on a latent space of low-dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.	翻訳日:2022-02-18 14:53:20 公開日:2022-02-17
# 経験的リスク最小化の普遍性 Universality of empirical risk minimization ( http://arxiv.org/abs/2202.08832v1 ) ライセンス: Link先を確認	Andrea Montanari and Basil Saeed	(参考訳) i.i.d. サンプル $\{{\boldsymbol x}_i,y_i\}_{i\le n}$ からの教師あり学習を考える。ここで、${\boldsymbol x}_i \in\mathbb{R}^p$ は特徴ベクトルであり、${y} \in \mathbb{R}$ はラベルである。我々は,$\mathsf{k} = O(1)$ 個のベクトル ${\boldsymbol \theta}_1, \dots, {\boldsymbol \theta}_{\mathsf k} \in \mathbb{R}^p$ でパラメータ化される関数のクラスに対する経験的リスク最小化について検討し,トレーニング誤差とテスト誤差の両方について普遍性を証明する。すなわち、比例漸近 $n,p\to\infty$, $n/p = \Theta(1)$ の下で、トレーニング誤差はその共分散構造を通してのみランダム特徴分布に依存することが証明される。さらに,近似的な経験的リスク最小化解に対する最小テスト誤差が類似の普遍性特性を持つことを証明する。特に、これらの量の漸近は、特徴ベクトル ${\boldsymbol x}_i$ を同じ共分散を持つガウスベクトル ${\boldsymbol g}_i$ に置き換えたより単純なモデルの下で、主要次数まで計算できる。初期の普遍性の結果は、強い凸学習手順や独立エントリを持つベクトル${\boldsymbol x}_i$に限られていた。私たちの結果はこれらの仮定を必要としない。我々の仮定は、ランダム化された特徴化写像によって生成される特徴ベクトル ${\boldsymbol x}_i$ を含むのに十分一般的である。特に、特定のランダムな特徴モデル(ランダムな重み付き一層ニューラルネットワークの出力を計算する)とニューラルタンジェントモデル(二層ネットワークの1次テイラー近似)の仮定を明示的に検証する。 Consider supervised learning from i.i.d. samples $\{{\boldsymbol x}_i,y_i\}_{i\le n}$ where ${\boldsymbol x}_i \in\mathbb{R}^p$ are feature vectors and ${y} \in \mathbb{R}$ are labels. We study empirical risk minimization over a class of functions that are parameterized by $\mathsf{k} = O(1)$ vectors ${\boldsymbol \theta}_1, \dots, {\boldsymbol \theta}_{\mathsf k} \in \mathbb{R}^p$, and prove universality results both for the training and test error. Namely, under the proportional asymptotics $n,p\to\infty$, with $n/p = \Theta(1)$, we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors ${\boldsymbol x}_i$ are replaced by Gaussian vectors ${\boldsymbol g}_i$ with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors ${\boldsymbol x}_i$ with independent entries. Our results do not make any of these assumptions. Our assumptions are general enough to include feature vectors ${\boldsymbol x}_i$ that are produced by randomized featurization maps. In particular we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).	翻訳日:2022-02-18 14:52:53 公開日:2022-02-17
# カーネル法による情報理論 Information Theory with Kernel Methods ( http://arxiv.org/abs/2202.08545v1 ) ライセンス: Link先を確認	Francis Bach (SIERRA)	(参考訳) 生成カーネルヒルベルト空間からの共分散演算子による確率分布の解析について考察する。これらの作用素のフォン・ノイマンエントロピーと相対エントロピーは、シャノンエントロピーと相対エントロピーの通常の概念と密接に関連しており、それらの性質の多くを共有している。確率分布の様々なオーラクルから効率的な推定アルゴリズムが組み合わさっている。また、積空間を考察し、テンソル積核に対して、相互情報と合同エントロピーの概念を定義できることを示した。最終的に、これらの新しい相対エントロピーの概念が、変分推論手法における凸最適化と併用し、新しい確率的推論手法のファミリーを提供する、ログ分割関数上の新しい上界につながることを示す。 We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels, we can define notions of mutual information and joint entropies, which can then characterize independence perfectly, but only partially conditional independence. We finally show how these new notions of relative entropy lead to new upper-bounds on log partition functions, that can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods.	翻訳日:2022-02-18 14:51:46 公開日:2022-02-17
# cosformer:softmaxの注目を再考する cosFormer: Rethinking Softmax in Attention ( http://arxiv.org/abs/2202.08791v1 ) ライセンス: Link先を確認	Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong	(参考訳) Transformerは自然言語処理、コンピュータビジョン、オーディオ処理で大きな成功を収めている。コアコンポーネントの1つとして、ソフトマックスアテンションは長距離依存を捉えるのに役立つが、2次空間とシーケンス長の時間的複雑さのためにスケールアップを禁止している。カーネル法はソフトマックス演算子を近似することで複雑さを減らすためによく用いられる。それにもかかわらず、近似誤差のため、その性能は異なるタスク/コーパスで異なり、バニラソフトマックスの注意と比べ、重要な性能低下に苦しむ。本稿では,因果的注意とクロス注意の両方において,バニラ変圧器に匹敵する精度を達成できる,cosFormerと呼ばれる線形変圧器を提案する。 cosFormerはsoftmax attentionの2つの重要な特性に基づいている。 i) 注意行列の非負性、 ii) 注意行列の分布を集中させる非線形再重み付けスキーム。線型代用として、cosFormerは線型作用素とコサインに基づく距離再重み付け機構でこれらの特性を満たす。言語モデルとテキスト理解タスクに関する広範な実験により,本手法の有効性が示された。さらに,本手法を長いシーケンスで検討し,Long-Range Arenaベンチマークで最先端の性能を実現する。ソースコードはhttps://github.com/OpenNLPLab/cosFormerで入手できる。 Transformer has shown great successes in natural language processing, computer vision, and audio processing. As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity with respect to the sequence length. Kernel methods are often adopted to reduce the complexity by approximating the softmax operator. Nevertheless, due to the approximation errors, their performances vary in different tasks/corpora and suffer crucial performance drops when compared with the vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that can achieve comparable or better accuracy to the vanilla transformer in both causal and cross attentions. cosFormer is based on two key properties of softmax attention: (i) non-negativeness of the attention matrix; (ii) a non-linear re-weighting scheme that can concentrate the distribution of the attention matrix. As its linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.	翻訳日:2022-02-18 14:51:31 公開日:2022-02-17
# (参考訳) ニューラルネットワークの一般サイクル学習 General Cyclical Training of Neural Networks ( http://arxiv.org/abs/2202.08835v1 ) ライセンス: CC BY 4.0	Leslie N. Smith	(参考訳) 本稿では,機械学習における「一般循環型トレーニング」の原則について述べる。ニューラルネットワークのトレーニングには,アルゴリズムによる例(ハイパーパラメータとロス関数),データに基づく例,モデルに基づく例など,いくつかのマニフェストを提案する。具体的には, 循環量減少, 循環的バッチサイズ, 循環的焦点損失, 循環的ソフトマックス温度, 循環的データ増大, 循環的勾配クリッピング, 循環的半教師付き学習といった新しい手法を紹介する。さらに, 実験モデルの試験精度向上には, 周期的重量減衰, 周期的軟度温度, 循環的勾配クリッピング(この原理の3つの例)が有用であることを示した。さらに, 一般循環学習の観点から, モデルに基づく例(事前学習や知識蒸留など)を考察し, 典型的な学習手法の変更を推奨する。本稿では、一般循環学習の概念を定義し、この概念をニューラルネットワークのトレーニングに適用できるいくつかの具体的な方法について論じる。再現性の精神では、我々の実験で使われたコードは https://github.com/lnsmith54/CFL で入手できる。 This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and the "hard training" happens during the middle epochs. We propose several manifestations for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) are beneficial in the test accuracy performance of a trained model. Furthermore, we discuss model-based examples (such as pretraining and knowledge distillation) from the perspective of general cyclical training and recommend some changes to the typical training methodology. In summary, this paper defines the general cyclical training concept and discusses several specific ways in which this concept can be applied to training neural networks. In the spirit of reproducibility, the code used in our experiments is available at https://github.com/lnsmith54/CFL.	翻訳日:2022-02-18 14:50:07 公開日:2022-02-17
# バックプロパゲーションのない勾配 Gradients without Backpropagation ( http://arxiv.org/abs/2202.08587v1 ) ライセンス: Link先を確認	Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr	(参考訳) 最適化のために目的関数の勾配を計算するためにバックプロパゲーションを使うことは、機械学習のメインスタンスである。バックプロパゲーション(backpropagation)またはリバースモード微分(reverse-mode differentiation)は、フォワードモードを含む自動微分アルゴリズムの一般ファミリーにおける特別なケースである。本稿では,フォワードモードを通じて正確に効率的に計算できる方向微分のみに基づいて勾配を計算する手法を提案する。この定式化をフォワード勾配と呼び、関数の1回のフォワードランで評価できる勾配の偏りのない推定と呼び、勾配降下におけるバックプロパゲーションの必要性を完全に排除する。我々は,様々な問題において前方勾配降下を示し,計算量を大幅に削減し,場合によっては最大2倍の速さでトレーニングできることを示した。 Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.	翻訳日:2022-02-18 14:34:42 公開日:2022-02-17
# セマンティックセグメンテーションにおける未知の検出と学習 Detecting and Learning the Unknown in Semantic Segmentation ( http://arxiv.org/abs/2202.08700v1 ) ライセンス: Link先を確認	Robin Chan, Svenja Uhlemeyer, Matthias Rottmann and Hanno Gottschalk	(参考訳) セマンティックセグメンテーションは自動運転における認識にとって重要な要素である。ディープニューラルネットワーク(DNN)はこのタスクに一般的に使われ、通常、閉じた操作領域に現れるオブジェクトクラスの閉じたセットでトレーニングされる。しかしこれは、DNNがデプロイされる自動運転におけるオープンワールドの仮定とは対照的である。したがって、DNNは、これまで遭遇したことのないデータ(異常とも呼ばれる)に直面している必要がある。本稿では,まず,情報理論的な観点からの異常について概観する。次に,セマンティックセグメンテーションにおける意味不明物体の検出に関する研究について述べる。我々は,異常物体に対する高いエントロピー応答の訓練が,我々の理論的知見に合致する他の手法よりも優れていることを実証する。さらに,モデルのセマンティクスのセットに含まれる異常タイプを選択するために,異常の発生頻度を評価する手法について検討する。これらの異常は教師なしの方法で学習できることを示し、ディープラーニングに基づくオンラインアプリケーションに特に適している。 Semantic segmentation is a crucial component for perception in automated driving. Deep neural networks (DNNs) are commonly used for this task and they are usually trained on a closed set of object classes appearing in a closed operational domain. However, this is in contrast to the open world assumption in automated driving that DNNs are deployed to. Therefore, DNNs necessarily face data that they have never encountered previously, also known as anomalies, which are extremely safety-critical to properly cope with. In this work, we first give an overview about anomalies from an information-theoretic perspective. Next, we review research in detecting semantically unknown objects in semantic segmentation. We demonstrate that training for high entropy responses on anomalous objects outperforms other recent methods, which is in line with our theoretical findings. Moreover, we examine a method to assess the occurrence frequency of anomalies in order to select anomaly types to include into a model's set of semantic categories. We demonstrate that these anomalies can then be learned in an unsupervised fashion, which is particularly suitable in online applications based on deep learning.	翻訳日:2022-02-18 14:33:55 公開日:2022-02-17
# (参考訳) Data-SUITE:In-distribution incongruous例のデータ中心同定 Data-SUITE: Data-centric identification of in-distribution incongruous examples ( http://arxiv.org/abs/2202.08836v1 ) ライセンス: CC BY 4.0	Nabeel Seedat, Jonathan Crabbe, Mihaela van der Schaar	(参考訳) データ品質の体系的定量化は一貫したモデル性能にとって重要である。以前の研究は、アウトオブディストリビューションデータに重点を置いてきた。代わりに、特徴空間の不均一性から生じる可能性のある不連続領域(ID)データを特徴付けるという、未検討かつ等しく重要な問題に取り組む。そこで本研究では,データ中心のフレームワークであるData-SUITEによるパラダイムシフトを提案する。 Data-SUITEは、コプラモデリング、表現学習、コンフォメーション予測を利用して、一連のトレーニングインスタンスに基づいて特徴量信頼区間推定器を構築する。これらの推定器は、トレーニングセットに関するテストインスタンスの一致を評価するために、(1)トレーニングインスタンスでトレーニングされたモデルによってどのテストインスタンスが確実に予測されるかという、実用的な2つの質問に答えるために使用できる。そして、(2)データオーナーがデータの制限を理解したり、将来のデータ収集を導くために、特徴空間の不一致領域を識別できますか? 我々は、Data-SUITEの性能とカバレッジ保証を実証的に検証し、クロスサイト医療データ、偏りのあるデータ、コンセプトドリフトデータ、そして、下流モデルが信頼できる(そのモデルに依存しない)ID領域を最もよく識別することを示す。さらに、これらの特定されたリージョンがデータセットに対する洞察を提供し、その制限を強調する方法について説明する。 Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric framework to identify these regions, independent of a task-specific model. DATA-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data's limitations or guide future data collection? 
We empirically validate Data-SUITE's performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.	翻訳日:2022-02-18 14:32:17 公開日:2022-02-17
# 文法に基づく基底辞書学習 Grammar-Based Grounded Lexicon Learning ( http://arxiv.org/abs/2202.08806v1 ) ライセンス: Link先を確認	Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum	(参考訳) 本稿では,文法に基づく接地辞書学習(G2L2)について述べる。 G2L2の中核には、各単語を構文型のタプルとニューロシンボリックセマンティックプログラムにマッピングする辞書エントリのコレクションがある。例えば、shiny という単語は形容詞の構文型を持ち、そのニューロシンボリックな意味プログラムは記号形式 λx. filter(x, SHINY) を持つ。ここで SHINY という概念はニューラルネットワークの埋め込みと関連付けられており、光沢のあるオブジェクトを分類するために使用される。入力文が与えられた後、G2L2はまず各トークンに関連する辞書エントリを検索する。次に、構文に基づいた語彙的意味を合成することにより、実行可能な神経シンボリックプログラムとして文の意味を導出する。回収された意味プログラムは、接地入力で実行することができる。指数関数的に成長する合成空間における学習を容易にするために,学習時間を削減するために,導出上の局所辺縁化を行う合同解析および期待実行アルゴリズムを提案する。視覚的推論と言語駆動ナビゲーションの2つの領域でG2L2を評価する。その結果、G2L2は少量のデータから新しい単語合成に一般化できることがわかった。 We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has a syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form λx. filter(x, SHINY), where the concept SHINY is associated with a neural network embedding, which will be used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs. To facilitate learning in an exponentially-growing compositional space, we introduce a joint parsing and expected execution algorithm, which does local marginalization over derivations to reduce the training time. We evaluate G2L2 on two domains: visual reasoning and language-driven navigation. Results show that G2L2 can generalize from small amounts of data to novel compositions of words.	翻訳日:2022-02-18 14:30:43 公開日:2022-02-17
# Dynamic Object Comprehension: 人工的な視覚知覚を評価するフレームワーク Dynamic Object Comprehension: A Framework For Evaluating Artificial Visual Perception ( http://arxiv.org/abs/2202.08490v1 ) ライセンス: Link先を確認	Scott Y.L. Chin, Bradley R. Quinton	(参考訳) AugmentedとMixed Realityは、おそらくモバイルインターネットの後継として浮上している。しかし、多くの技術的課題が残っている。これらのシステムの重要な要件の1つは、物理的な世界と仮想世界の間の連続性を作り出す能力であり、ユーザの視覚知覚が主要なインターフェイス媒体である。この連続性を構築するには、物理的な世界を視覚的に理解する必要がある。コンピュータビジョンや画像分類やオブジェクト検出などのai技術は近年大きく進歩しているが、これらの領域での成功は、これらの重要なmrやarアプリケーションに必要な視覚認識にはまだ繋がっていない。重要な問題は、これらのアプリケーションに現在の評価基準が不十分であることだ。この新興分野の進歩を動機づけ、評価するには、新しいメトリクスが必要である。本稿では,現在の評価基準の限界を概説し,新しい基準を提案する。 Augmented and Mixed Reality are emerging as likely successors to the mobile internet. However, many technical challenges remain. One of the key requirements of these systems is the ability to create a continuity between physical and virtual worlds, with the user's visual perception as the primary interface medium. Building this continuity requires the system to develop a visual understanding of the physical world. While there has been significant recent progress in computer vision and AI techniques such as image classification and object detection, success in these areas has not yet led to the visual perception required for these critical MR and AR applications. A significant issue is that current evaluation criteria are insufficient for these applications. To motivate and evaluate progress in this emerging area, there is a need for new metrics. In this paper we outline limitations of current evaluation criteria and propose new criteria.	翻訳日:2022-02-18 14:29:39 公開日:2022-02-17
# Point Cloud Generation with Continuous Conditioning ( http://arxiv.org/abs/2202.08526v1 ) License: check the link	Larissa T. Triess and Andre Bühler and David Peter and Fabian B. Flohr and J. Marius Zöllner	Generative models can be used to synthesize 3D objects of high quality and diversity. However, there is typically no control over the properties of the generated object. This paper proposes a novel generative adversarial network (GAN) setup that generates 3D point cloud shapes conditioned on a continuous parameter. In an exemplary application, we use this to guide the generative process to create a 3D object with a custom-fit shape. We formulate this generation process in a multi-task setting by using the concept of auxiliary classifier GANs. Further, we propose to sample the generator label input for training from a kernel density estimation (KDE) of the dataset. Our ablations show that this leads to a significant performance increase in regions with few samples. Extensive quantitative and qualitative experiments show that we gain explicit control over the object dimensions while maintaining good generation quality and diversity.	Translated: 2022-02-18 14:29:26 Published: 2022-02-17
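The idea of sampling generator labels from a kernel density estimate can be illustrated with a short Python sketch. It assumes a one-dimensional Gaussian KDE with a fixed bandwidth; the `heights` data, the bandwidth value, and the function name are hypothetical and not taken from the paper.

```python
import random


def sample_from_gaussian_kde(data, bandwidth, n_samples, rng):
    """Draw samples from a Gaussian KDE fitted to 1-D `data`.

    Sampling from a Gaussian KDE is equivalent to picking a data point
    uniformly at random and adding zero-mean Gaussian noise whose standard
    deviation equals the bandwidth.
    """
    return [rng.gauss(rng.choice(data), bandwidth) for _ in range(n_samples)]


rng = random.Random(0)                     # seeded for reproducibility
heights = [1.2, 1.3, 1.25, 1.8, 2.4]       # hypothetical object dimensions
labels = sample_from_gaussian_kde(heights, bandwidth=0.05, n_samples=1000, rng=rng)
print(len(labels), min(labels), max(labels))
```

Compared with reusing the training labels directly, such samples also cover values between and slightly around the observed ones, which is one plausible reason this kind of sampling could help in sparsely populated regions.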
# Adiabatic Quantum Computing for Multi Object Tracking ( http://arxiv.org/abs/2202.08837v1 ) License: check the link	Jan-Nico Zaech, Alexander Liniger, Martin Danelljan, Dengxin Dai, Luc Van Gool	Multi-Object Tracking (MOT) is most often approached in the tracking-by-detection paradigm, where object detections are associated through time. The association step naturally leads to discrete optimization problems. As these optimization problems are often NP-hard, they can only be solved exactly for small instances on current hardware. Adiabatic quantum computing (AQC) offers a solution for this, as it has the potential to provide a considerable speedup on a range of NP-hard optimization problems in the near future. However, current MOT formulations are unsuitable for quantum computing due to their scaling properties. In this work, we therefore propose the first MOT formulation designed to be solved with AQC. We employ an Ising model that represents the quantum mechanical system implemented on the AQC. We show that our approach is competitive compared with state-of-the-art optimization-based approaches, even when using off-the-shelf integer programming solvers. Finally, we demonstrate that our MOT problem is already solvable on the current generation of real quantum computers for small examples, and analyze the properties of the measured solutions.	Translated: 2022-02-18 14:29:13 Published: 2022-02-17
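To make the Ising model mentioned in this abstract more concrete, the following Python sketch evaluates a generic Ising energy and finds its ground state by exhaustive search. It is not the paper's MOT formulation: the couplings `J`, the fields `h`, and the sign convention are illustrative assumptions, and an adiabatic quantum computer would take the place of the brute-force loop.

```python
from itertools import product


def ising_energy(spins, J, h):
    """E(s) = sum_{(i,j)} J[i,j] * s_i * s_j + sum_i h[i] * s_i, with s_i in {-1, +1}."""
    energy = sum(h[i] * s for i, s in enumerate(spins))
    energy += sum(w * spins[i] * spins[j] for (i, j), w in J.items())
    return energy


def brute_force_ground_state(n, J, h):
    """Enumerate all 2**n spin configurations; feasible only for tiny n."""
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, J, h))


# Toy instance: under this sign convention a positive coupling penalizes
# equal neighbouring spins, and the field on spin 0 favours s_0 = -1.
J = {(0, 1): 1.0, (1, 2): 1.0}
h = [0.5, 0.0, 0.0]
ground = brute_force_ground_state(3, J, h)
print(ground, ising_energy(ground, J, h))
```

The exhaustive search grows as 2**n, which is why only small instances can be solved exactly this way.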
# A hybrid 2-stage vision transformer for AI-assisted 5 class pathologic diagnosis of gastric endoscopic biopsies ( http://arxiv.org/abs/2202.08510v1 ) License: check the link	Yujin Oh, Go Eun Bae, Kyung-Hee Kim, Min-Kyung Yeo, Jong Chul Ye	Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing the GC-associated mortality rate. Although artificial intelligence (AI) has shown great promise in assisting pathologists to screen digitalized whole slide images, automatic classification systems for guiding proper GC treatment based on clinical guidelines are still lacking. Here, we propose an AI system classifying 5 classes of GC histology, which can be perfectly matched to general treatment guidance. The AI system, mimicking the way pathologists understand slides through a multi-scale self-attention mechanism using a 2-stage Vision Transformer, demonstrates clinical capability by achieving diagnostic sensitivity of above 85% for both internal and external cohort analysis. Furthermore, AI-assisted pathologists showed diagnostic sensitivity significantly improved by 10%, with 18% of screening time saved, compared to human pathologists. Our AI system has great potential for providing presumptive pathologic opinion for deciding proper treatment for early GC patients.	Translated: 2022-02-18 14:28:56 Published: 2022-02-17
# Limitations of Neural Collapse for Understanding Generalization in Deep Learning ( http://arxiv.org/abs/2202.08384v1 ) License: check the link	Like Hui, Mikhail Belkin, Preetum Nakkiran	The recent work of Papyan, Han, & Donoho (2020) presented an intriguing "Neural Collapse" phenomenon, showing a structural property of interpolating classifiers in the late stage of training. This opened a rich area of exploration studying this phenomenon. Our motivation is to study the upper limits of this research program: How far will understanding Neural Collapse take us in understanding deep learning? First, we investigate its role in generalization. We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property). We find that while Neural Collapse often occurs on the train set, it does not occur on the test set. We thus conclude that Neural Collapse is primarily an optimization phenomenon, with as-yet-unclear connections to generalization. Second, we investigate the role of Neural Collapse in feature learning. We show simple, realistic experiments where training longer leads to worse last-layer features, as measured by transfer-performance on a downstream task. This suggests that neural collapse is not always desirable for representation learning, as previously claimed. Finally, we give preliminary evidence of a "cascading collapse" phenomenon, wherein some form of Neural Collapse occurs not only for the last layer, but in earlier layers as well. We hope our work encourages the community to continue the rich line of Neural Collapse research, while also considering its inherent limitations.	Translated: 2022-02-18 14:28:37 Published: 2022-02-17
# Mining On Alzheimer's Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing ( http://arxiv.org/abs/2202.08712v1 ) License: check the link	Yi Nian, Xinyue Hu, Rui Zhang, Jingna Feng, Jingcheng Du, Fang Li, Yong Chen and Cui Tao	To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug repurposing. Our objective is to construct a knowledge graph from literature to study relations between Alzheimer's disease (AD) and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among the three knowledge graph completion models, TransE outperformed the other two (MR = 13.45, Hits@1 = 0.306). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most highly ranked candidates predicted by our model, which indicates that our approach can inform reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses.	Translated: 2022-02-18 14:27:49 Published: 2022-02-17
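The two metrics quoted in this abstract, mean rank (MR) and Hits@1, can be computed as in the Python sketch below. It assumes the standard TransE scoring idea (a relation acts as a translation, so `h + r` should be close to `t`) with a Euclidean distance; the two-dimensional embeddings and the candidate names are made up for illustration and have nothing to do with the paper's data.

```python
import math


def transe_distance(h, r, t):
    """TransE distance ||h + r - t||_2; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_of_true_tail(h, r, true_tail, candidates):
    """1-based rank of the true tail among all candidate tails."""
    d_true = transe_distance(h, r, candidates[true_tail])
    better = sum(
        1 for name, vec in candidates.items()
        if name != true_tail and transe_distance(h, r, vec) < d_true
    )
    return 1 + better


def mean_rank(ranks):
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k):
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical embeddings: one head, one relation, three candidate tails.
head, relation = [0.0, 0.0], [1.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 0.0]}
ranks = [rank_of_true_tail(head, relation, name, candidates) for name in ("a", "b", "c")]
print(ranks, mean_rank(ranks), hits_at_k(ranks, 1))
```

A lower MR and a higher Hits@1 both indicate that true triples are ranked closer to the top.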
# Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video ( http://arxiv.org/abs/2202.08418v1 ) License: check the link	Jinseok Bae, Hojun Jang, Cheol-Hui Min, Hyungun Choi, Young Min Kim	We present Neural Marionette, an unsupervised approach that discovers the skeletal structure from a dynamic sequence and learns to generate diverse motions that are consistent with the observed motion dynamics. Given a video stream of point cloud observations of an articulated body under arbitrary motion, our approach discovers the unknown low-dimensional skeletal relationship that can effectively represent the movement. Then the discovered structure is utilized to encode the motion priors of dynamic sequences in a latent structure, which can be decoded to the relative joint rotations to represent the full skeletal motion. Our approach works without any prior knowledge of the underlying motion or skeletal structure, and we demonstrate that the discovered structure is even comparable to the hand-labeled ground truth skeleton in representing a 4D sequence of motion. The skeletal structure embeds the general semantics of possible motion space that can generate motions for diverse scenarios. We verify that the learned motion prior is generalizable to multi-modal sequence generation, interpolation of two poses, and motion retargeting to a different skeletal structure.	Translated: 2022-02-18 14:27:24 Published: 2022-02-17
# CSCNet: Contextual Semantic Consistency Network for Trajectory Prediction in Crowded Spaces ( http://arxiv.org/abs/2202.08506v1 ) License: check the link	Beihao Xia, Conghao Wong, Qinmu Peng, Wei Yuan, and Xinge You	Trajectory prediction aims to predict the movement trend of agents such as pedestrians, bikers, and vehicles. It helps to analyze and understand human activities in crowded spaces and is widely applied in many areas such as surveillance video analysis and autonomous driving systems. Thanks to the success of deep learning, trajectory prediction has made significant progress. The current methods are dedicated to studying the agents' future trajectories under the social interaction and the scenes' physical constraints. Moreover, how to deal with these factors still catches researchers' attention. However, they ignore the Semantic Shift Phenomenon when modeling these interactions in various prediction scenes. There exist several kinds of semantic deviations within or between social and physical interactions, which we call the "Gap". In this paper, we propose a Contextual Semantic Consistency Network (CSCNet) to predict agents' future activities with powerful and efficient context constraints. We utilize a well-designed context-aware transfer to obtain the intermediate representations from the scene images and trajectories. Then we eliminate the differences between social and physical interactions by aligning activity semantics and scene semantics to cross the Gap. Experiments demonstrate that CSCNet performs better than most of the current methods quantitatively and qualitatively.	Translated: 2022-02-18 14:27:06 Published: 2022-02-17
# CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-based Autonomous Urban Driving ( http://arxiv.org/abs/2202.08557v1 ) License: check the link	Yinuo Zhao, Kun Wu, Zhiyuan Xu, Zhengping Che, Qi Lu, Jian Tang, Chi Harold Liu	Vision-based autonomous urban driving in dense traffic is quite challenging due to the complicated urban environment and the dynamics of the driving behaviors. Widely-applied methods either heavily rely on hand-crafted rules or learn from limited human experience, which makes them hard to generalize to rare but critical scenarios. In this paper, we present a novel CAscade Deep REinforcement learning framework, CADRE, to achieve model-free vision-based autonomous urban driving. In CADRE, to derive representative latent features from raw observations, we first train offline a Co-attention Perception Module (CoPM) that leverages the co-attention mechanism to learn the inter-relationships between the visual and control information from a pre-collected driving dataset. Cascaded by the frozen CoPM, we then present an efficient distributed proximal policy optimization framework to learn the driving policy online under the guidance of particularly designed reward functions. We perform a comprehensive empirical study with the CARLA NoCrash benchmark as well as specific obstacle avoidance scenarios in autonomous urban driving tasks. The experimental results justify the effectiveness of CADRE and its superiority over the state-of-the-art by a wide margin.	Translated: 2022-02-18 14:26:41 Published: 2022-02-17
# Two-Stage Architectural Fine-Tuning with Neural Architecture Search using Early-Stopping in Image Classification ( http://arxiv.org/abs/2202.08604v1 ) License: check the link	Youngkee Kim, Won Joon Yun, Youn Kyu Lee, Joongheon Kim	Deep neural networks (NN) perform well in various tasks (e.g., computer vision) because of the convolutional neural networks (CNN). However, the difficulty of gathering quality data in the industry field hinders the practical use of NN. To cope with this issue, the concept of transfer learning (TL) has emerged, which leverages the fine-tuning of NNs trained on large-scale datasets in data-scarce situations. Therefore, this paper suggests a two-stage architectural fine-tuning method for image classification, inspired by the concept of neural architecture search (NAS). One of the main ideas of our proposed method is a mutation with base architectures, which reduces the search cost by using given architectural information. Moreover, early stopping is also considered, which directly reduces NAS costs. Experimental results verify that our proposed method reduces computational and searching costs by up to 28.2% and 22.3%, respectively, compared to existing methods.	Translated: 2022-02-18 14:26:20 Published: 2022-02-17
# End-to-End Training of Both Translation Models in the Back-Translation Framework ( http://arxiv.org/abs/2202.08465v1 ) License: check the link	DongNyeong Heo and Heeyoul Choi	Semi-supervised learning algorithms in neural machine translation (NMT) have significantly improved translation quality compared to the supervised learning algorithms by using additional monolingual corpora. Among them, back-translation is a theoretically well-structured and cutting-edge method. Given two pre-trained NMT models between source and target languages, one translates a monolingual sentence as a latent sentence, and the other reconstructs the monolingual input sentence given the latent sentence. Therefore, previous works tried to apply the variational auto-encoder's (VAE) training framework to the back-translation framework. However, the discrete property of the latent sentence made it impossible to use backpropagation in the framework. This paper proposes a categorical reparameterization trick that generates a differentiable sentence, with which we practically implement the VAE's training framework for the back-translation and train it by end-to-end backpropagation. In addition, we propose several regularization techniques that are especially advantageous to this framework. In our experiments, we demonstrate that our method makes backpropagation available through the latent sentences and improves the BLEU scores on the datasets of the WMT18 translation task.	Translated: 2022-02-18 14:24:22 Published: 2022-02-17
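One widely known categorical reparameterization is the Gumbel-softmax relaxation, sketched below in plain Python. The abstract does not say which form the authors use, so this is an illustration of the general idea under that assumption rather than the paper's method; the logits, the temperature, and the function name are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature).

    The output is a probability vector instead of a hard token index, so in an
    autodiff framework gradients could flow through it to the logits.
    """
    eps = 1e-12
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), eps), 1.0 - eps)   # keep u strictly inside (0, 1)
        gumbel = -math.log(-math.log(u))             # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)                                # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
token_logits = [2.0, 0.5, -1.0]   # hypothetical scores over a 3-word vocabulary
soft_token = gumbel_softmax(token_logits, temperature=0.5, rng=rng)
print(soft_token)
```

Lower temperatures push the output closer to a one-hot vector, while higher temperatures make it smoother.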
# Transformer for Graphs: An Overview from Architecture Perspective ( http://arxiv.org/abs/2202.08455v1 ) License: check the link	Erxue Min, Runfa Chen, Yatao Bian, Tingyang Xu, Kangfei Zhao, Wenbing Huang, Peilin Zhao, Junzhou Huang, Sophia Ananiadou, Yu Rong	Recently, the Transformer model, which has achieved great success in many artificial intelligence fields, has demonstrated great potential in modeling graph-structured data. To date, a great variety of Transformers have been proposed to adapt to graph-structured data. However, a comprehensive literature review and systematic evaluation of these Transformer variants for graphs are still unavailable. It is imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks. In this survey, we provide a comprehensive review of various Graph Transformer models from the architectural design perspective. We first disassemble the existing models and identify three typical ways to incorporate the graph information into the vanilla Transformer: 1) GNNs as Auxiliary Modules, 2) Improved Positional Embedding from Graphs, and 3) Improved Attention Matrix from Graphs. Furthermore, we implement the representative components in three groups and conduct a comprehensive comparison on various kinds of famous graph data benchmarks to investigate the real performance gain of each component. Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.	Translated: 2022-02-18 14:23:41 Published: 2022-02-17
# GRAPHSHAP: Motif-based Explanations for Black-box Graph Classifiers ( http://arxiv.org/abs/2202.08815v1 ) License: check the link	Alan Perotti, Paolo Bajardi, Francesco Bonchi, and André Panisson	Most methods for explaining black-box classifiers (e.g., on tabular data, images, or time series) rely on measuring the impact that the removal/perturbation of features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond essentially to the adjacency information describing the graph structure (i.e., the edges), this matching between feature space and explanation language might not be appropriate. In this regard, we argue that (i) a good explanation method for graph classification should be fully agnostic with respect to the internal representation used by the black-box; and (ii) a good explanation language for graph classification tasks should be represented by higher-order structures, such as motifs. The need to decouple the feature space (edges) from the explanation space (motifs) is thus a major challenge towards developing actionable explanations for graph classification tasks. In this paper we introduce GRAPHSHAP, a Shapley-based approach able to provide motif-based explanations for black-box graph classifiers, assuming no knowledge whatsoever about the model or its training data: the only requirement is that the black-box can be queried at will. Furthermore, we introduce additional auxiliary components such as a synthetic graph dataset generator, algorithms for subgraph mining and ranking, a custom graph convolutional layer, and a kernel to approximate the explanation scores while maintaining linear time complexity. Finally, we test GRAPHSHAP on a real-world brain-network dataset consisting of patients affected by Autism Spectrum Disorder and a control group. Our experiments highlight how the classification provided by a black-box model can be effectively explained by a few connectomics patterns.	Translated: 2022-02-18 14:23:28 Published: 2022-02-17
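The Shapley idea behind motif-based explanations can be sketched with an exact computation over a handful of motifs. This is not the GRAPHSHAP algorithm (the abstract mentions a kernel that approximates the scores in linear time, whereas the code below enumerates all orderings); the motif names and the `toy_black_box` scoring function are invented for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    contributions = {p: 0.0 for p in players}
    n_orderings = 0
    for order in permutations(players):
        n_orderings += 1
        coalition = frozenset()
        for player in order:
            extended = coalition | {player}
            contributions[player] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / n_orderings for p, total in contributions.items()}


def toy_black_box(motifs):
    """Hypothetical classifier score as a function of the motifs left in the graph."""
    score = 0.0
    if "triangle" in motifs:
        score += 0.6
    if "star" in motifs:
        score += 0.2
    if "triangle" in motifs and "chain" in motifs:
        score += 0.2          # interaction: the chain only matters next to a triangle
    return score


motif_scores = shapley_values(["triangle", "star", "chain"], toy_black_box)
print(motif_scores)
```

The black box is only ever queried, never inspected, which matches the query-only requirement stated in the abstract; the interaction term is split evenly between the two motifs involved.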
# Efficient and Reliable Probabilistic Interactive Learning with Structured Outputs ( http://arxiv.org/abs/2202.08566v1 ) License: check the link	Stefano Teso, Antonio Vergari	In this position paper, we study interactive learning for structured output spaces, with a focus on active learning, in which labels are unknown and must be acquired, and on skeptical learning, in which the labels are noisy and may need relabeling. These scenarios require expressive models that guarantee reliable and efficient computation of probabilistic quantities to measure uncertainty. We identify conditions under which a class of probabilistic models (which we denote CRISPs) meets all of these requirements, thus delivering tractable computation of the above quantities while preserving expressiveness. Building on prior work on tractable probabilistic circuits, we illustrate how CRISPs enable robust and efficient active and skeptical learning in large structured output spaces.	Translated: 2022-02-18 14:23:00 Published: 2022-02-17
# An Equivalence Between Data Poisoning and Byzantine Gradient Attacks ( http://arxiv.org/abs/2202.08578v1 ) License: check the link	Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud	To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience to data poisoning as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.	Translated: 2022-02-18 14:22:47 Published: 2022-02-17
# Optimal sizing of a holdout set for safe predictive model updating ( http://arxiv.org/abs/2202.06374v2 ) License: CC BY-SA 4.0	Sami Haidar-Wehbe, Samuel R Emerson, Louis J M Aslett, James Liley	Risk models in medical statistics and healthcare machine learning are increasingly used to guide clinical or other interventions. Should a model be updated after a guided intervention, it may lead to its own failure at making accurate predictions. The use of a "holdout set" (a subset of the population that does not receive interventions guided by the model) has been proposed to prevent this. Since patients in the holdout set do not benefit from risk predictions, the chosen size must trade off maximising model performance whilst minimising the number of held out patients. By defining a general loss function, we prove the existence and uniqueness of an optimal holdout set size, and introduce parametric and semi-parametric algorithms for its estimation. We demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented solution to the model update problem.	Translated: 2022-02-18 12:50:35 Published: 2022-02-17
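The trade-off described in this abstract can be illustrated with a small grid search in Python. The loss used here is an assumption made for the sketch, not a formula quoted from the abstract: each held-out patient incurs a fixed cost `k1`, and each remaining patient incurs a cost `k2(n)` that shrinks as the model is trained on a larger holdout set of size `n`. The population size and the power-law learning curve are made up.

```python
def total_cost(n, population, k1, k2):
    """Assumed loss: k1 per held-out patient plus k2(n) per patient guided by the model."""
    return k1 * n + k2(n) * (population - n)


def optimal_holdout_size(population, k1, k2):
    """Grid search over every admissible holdout size 1 .. population - 1."""
    return min(range(1, population), key=lambda n: total_cost(n, population, k1, k2))


def learning_curve_cost(n):
    """Hypothetical power-law learning curve: model error falls as n grows."""
    return 0.5 + 20.0 * n ** -0.5


N = 10_000      # hypothetical population size
K1 = 1.0        # hypothetical cost per patient without model guidance
n_star = optimal_holdout_size(N, K1, learning_curve_cost)
print(n_star, total_cost(n_star, N, K1, learning_curve_cost))
```

With these made-up numbers the minimum lies strictly inside the admissible range: a very small holdout set gives a poor model for everyone else, while a very large one withholds the model from most patients.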
# A Survey of Semen Quality Evaluation in Microscopic Videos Using Computer Assisted Sperm Analysis ( http://arxiv.org/abs/2202.07820v2 ) License: CC BY 4.0	Wenwei Zhao, Pingli Ma, Chen Li, Xiaoning Bu, Shuojia Zou, Tao Jiang, Marcin Grzegorzek	Computer Assisted Sperm Analysis (CASA) plays a crucial role in male reproductive health diagnosis and infertility treatment. With the development of the computer industry in recent years, a great number of accurate algorithms have been proposed. With the assistance of those novel algorithms, it is possible for CASA to achieve faster and higher-quality results. Since image processing, including pre-processing, feature extraction, target detection and tracking, is the technical basis of CASA, these methods are important technical steps in dealing with CASA. The various works related to Computer Assisted Sperm Analysis methods in the last 30 years (since 1988) are comprehensively introduced and analysed in this survey. To facilitate understanding, the methods involved are analysed in the sequence of general steps in sperm analysis. In other words, the methods related to sperm detection (localization) are first analysed, and then the methods of sperm tracking are analysed. Besides this, we analyse the present situation of CASA and discuss its future prospects. Based on our work, we explain the feasibility of applying the methods mentioned in this review to sperm microscopic videos. Moreover, existing challenges of object detection and tracking in microscope videos could potentially be solved with inspiration from this survey.	Translated: 2022-02-18 12:34:19 Published: 2022-02-17
# (参考訳) 深層学習を用いた乳房密度推定のマルチ再構成 A multi-reconstruction study of breast density estimation using Deep Learning ( http://arxiv.org/abs/2202.08238v2 ) ライセンス: CC BY 4.0	Vikash Gupta, Mutlu Demirer, Robert W. Maxwell, Richard D. White, Barbaros Selnur Erdal	(参考訳) 乳房密度の推定は、乳がんの素因を持つ個人を認識する上で重要な課題の1つである。マンモグラムの脂肪組織背景のコントラストが低く変動もあるため、しばしば困難である。多くの場合、乳房密度は、放射線科医がBreast Imaging and Reporting Data Systems(BI-RADS)で定められた4つの密度カテゴリのうちの1つを割り当てることで、手動で推定される。乳房密度分類パイプラインの自動化に向けた取り組みが進められている。乳房密度推定はスクリーニング検査で行われる重要な課題の1つである。高濃度乳房は乳がんになりやすい。マンモグラムの脂肪組織背景のコントラストが低く変動もあるため、密度推定は困難である。従来のマンモグラムは、トモシンセシスやその他の低放射線量の変種(例えばHologicのIntelligent 2DとC-View)に置き換えられつつある。低線量の要件のため、Intelligent 2DビューとC-Viewを優先するスクリーニングセンターが増えている。乳房密度推定のためのディープラーニング研究は、ニューラルネットワークの訓練に単一のモダリティのみを使用している。しかし、そうするとデータセット内の画像数が制限される。本稿では、すべてのモダリティで一度に訓練したニューラルネットワークが、いずれか単一のモダリティで訓練したニューラルネットワークよりも優れた性能を示すことを示す。受信者動作特性(ROC)曲線下面積を用いてこれらの結果を議論する。 Breast density estimation is one of the key tasks in recognizing individuals predisposed to breast cancer. It is often challenging because of low contrast and fluctuations in mammograms' fatty tissue background. Most of the time, the breast density is estimated manually where a radiologist assigns one of the four density categories decided by the Breast Imaging and Reporting Data Systems (BI-RADS). There have been efforts in the direction of automating a breast density classification pipeline. Breast density estimation is one of the key tasks performed during a screening exam. Dense breasts are more susceptible to breast cancer. The density estimation is challenging because of low contrast and fluctuations in mammograms' fatty tissue background. Traditional mammograms are being replaced by tomosynthesis and its other low radiation dose variants (for example Hologic's Intelligent 2D and C-View). Because of the low-dose requirement, increasingly more screening centers are favoring the Intelligent 2D view and C-View. Deep-learning studies for breast density estimation use only a single modality for training a neural network. However, doing so restricts the number of images in the dataset. In this paper, we show that a neural network trained on all the modalities at once performs better than a neural network trained on any single modality. We discuss these results using the area under the receiver operating characteristic curves.	翻訳日:2022-02-18 12:33:12 公開日:2022-02-17
# 自然運動を超えて:ビデオフレーム補間の不連続を探る Beyond Natural Motion: Exploring Discontinuity for Video Frame Interpolation ( http://arxiv.org/abs/2202.07291v2 ) ライセンス: Link先を確認	Sangjin Lee, Hyeongmin Lee, Chajin Shin, Hanbin Son, Sangyoun Lee	(参考訳) ビデオ補間は、2つの連続するフレームが与えられたときに中間フレームを合成するタスクである。以前の研究の多くは、適切なフレームワープ操作と、ワープされたフレームに対する改良モジュールに焦点を当てていた。これらの研究は、連続的な動きしか持たない自然ビデオで行われている。しかし、多くの実用的なビデオには、チャットウィンドウ、ウォーターマーク、GUI要素、字幕など、多くの不連続な動きが含まれている。これらの問題に対処するために、2つの連続するフレーム間の遷移の概念を拡張する3つの手法を提案する。第一に、連続的な動きの領域と不連続な動きの領域を分離できる新しいアーキテクチャである。また、figure-text mixing(FTM)と呼ばれる新しいデータ拡張戦略を提案し、モデルがより一般的なシナリオを学習できるようにする。最後に、このデータ拡張を用いて不連続な動きの領域を教師する損失関数を提案する。モバイルゲームやチャットのビデオからなる特別なデータセットを収集した。本手法が、この特別なデータセット上のビデオの補間品質を大幅に改善することを示す。さらに、本モデルは、DAVISやUCF101のような連続的な動きのみを含む自然ビデオのデータセットにおいても最先端の手法を上回る。 Video interpolation is the task that synthesizes the intermediate frame given two consecutive frames. Most of the previous studies have focused on appropriate frame warping operations and refinement modules for the warped frames. These studies have been conducted on natural videos having only continuous motions. However, many practical videos contain a lot of discontinuous motions, such as chat windows, watermarks, GUI elements, or subtitles. We propose three techniques to expand the concept of transition between two consecutive frames to address these issues. First is a new architecture that can separate continuous and discontinuous motion areas. We also propose a novel data augmentation strategy called figure-text mixing (FTM) to make our model learn more general scenarios. Finally, we propose loss functions to give supervisions of the discontinuous motion areas with the data augmentation. We collected a special dataset consisting of some mobile games and chatting videos. We show that our method significantly improves the interpolation qualities of the videos on the special dataset. Moreover, our model outperforms the state-of-the-art methods for natural video datasets containing only continuous motions, such as DAVIS and UCF101.	翻訳日:2022-02-18 12:27:06 公開日:2022-02-17
# ユーザ指向ロバスト強化学習 User-Oriented Robust Reinforcement Learning ( http://arxiv.org/abs/2202.07301v2 ) ライセンス: Link先を確認	Haoyi You, Beichen Yu, Haiming Jin, Zhaoxing Yang, Jiahui Sun, Xinbing Wang	(参考訳) 近年、様々な環境にわたるポリシーのロバスト性の向上が強化学習(RL)コミュニティで注目を集めている。既存のロバストなRL手法は主に、最悪の環境下でのポリシーの性能を最適化することで、max-minロバスト性を達成することを目的としている。しかし、実際には、RLポリシーを使用するユーザは、環境ごとの性能に対して異なる選好を持つ可能性がある。上述のmax-minロバスト性は、しばしばユーザの選好を満たすには保守的すぎる。そこで本稿では、ロバストなRLのポリシー学習にユーザの選好を組み込み、新しいユーザ指向ロバストRL(UOR-RL)フレームワークを提案する。具体的には、RLのための新しいユーザ指向ロバストネス(UOR)指標を定義する。この指標は、ユーザの選好に応じて環境に異なる重みを割り当て、max-minロバストネス指標を一般化する。UOR指標を最適化するために、環境分布が事前に既知であるシナリオと未知であるシナリオのそれぞれに対して、2つの異なるUOR-RL訓練アルゴリズムを開発した。理論的には、我々のUOR-RL訓練アルゴリズムが、環境分布に関する知識が不正確な場合や全くない場合でも、ほぼ最適なポリシーに収束することを証明する。さらに、4つのMuJoCoタスクで広範な実験評価を行った。実験結果から、UOR-RLは平均および最悪ケースの性能指標の下では最先端のベースラインと同等であり、さらに重要なことに、UOR指標の下では新たな最先端の性能を確立することが示された。 Recently, improving the robustness of policies across different environments attracts increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve the max-min robustness by optimizing the policy's performance in the worst-case environment. However, in practice, a user that uses an RL policy may have different preferences over its performance across environments. Clearly, the aforementioned max-min robustness is oftentimes too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL, and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two different UOR-RL training algorithms for the scenarios with or without a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or completely no knowledge about the environment distribution. Furthermore, we carry out extensive experimental evaluations in 4 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average and worst-case performance metrics, and more importantly establishes new state-of-the-art performance under the UOR metric.	翻訳日:2022-02-18 12:26:50 公開日:2022-02-17
# ノード分類において、スペクトルグラフニューラルネットワークはいつ失敗するのか? When Does A Spectral Graph Neural Network Fail in Node Classification? ( http://arxiv.org/abs/2202.07902v2 ) ライセンス: Link先を確認	Zhixian Chen, Tengfei Ma and Yang Wang	(参考訳) 様々なグラフフィルタを持つスペクトルグラフニューラルネットワーク(GNN)は、グラフ学習問題における有望な性能のため、広く肯定されている。しかし、GNNは必ずしもうまく機能していないことが知られている。グラフフィルタはモデル説明の理論的基礎を提供するが、スペクトルGNNがいつ失敗するかは不明である。本稿では,ノード分類問題に着目し,その予測誤差を調査し,スペクトルGNNの性能に関する理論的解析を行う。本研究では,グラフ構造,ノードラベル,グラフフィルタの複雑な関係を包括的に理解する手法を提案する。ラベル差に対する応答効率の低いグラフフィルタは失敗しがちであることを示す。 GNNの性能を向上させるため,データ駆動型フィルタバンクを用いた理論解析から,フィルタ設計のためのより優れた手法を提案し,経験的検証のためのシンプルなモデルを提案する。実験結果は理論結果と一貫性を示し,戦略を支持する。 Spectral Graph Neural Networks (GNNs) with various graph filters have received extensive affirmation due to their promising performance in graph learning problems. However, it is known that GNNs do not always perform well. Although graph filters provide theoretical foundations for model explanations, it is unclear when a spectral GNN will fail. In this paper, focusing on node classification problems, we conduct a theoretical analysis of spectral GNNs performance by investigating their prediction error. With the aid of graph indicators including homophily degree and response efficiency we proposed, we establish a comprehensive understanding of complex relationships between graph structure, node labels, and graph filters. We indicate that graph filters with low response efficiency on label difference are prone to fail. To enhance GNNs performance, we provide a provably better strategy for filter design from our theoretical analysis - using data-driven filter banks, and propose simple models for empirical validation. Experimental results show consistency with our theoretical results and support our strategy.	翻訳日:2022-02-18 12:26:26 公開日:2022-02-17
# 会話レベル特性の学習による会話音声認識 Conversational Speech Recognition By Learning Conversation-level Characteristics ( http://arxiv.org/abs/2202.07855v2 ) ライセンス: Link先を確認	Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma	(参考訳) 会話型自動音声認識(ASR)は、複数の話者を含む会話音声を認識するタスクである。文レベルのASRとは異なり、会話型ASRは、役割の選好や話題の一貫性といった会話特有の特性を自然に活用できる。本稿では、広く使われているエンドツーエンドのニューラルフレームワークのもとで、会話レベルの特性を明示的に学習する会話型ASRモデルを提案する。提案モデルの要点は2つある。第一に、Conformerベースのエンコーダ・デコーダASRバックボーンに潜在変分モジュール(LVM)を付加し、役割の選好と話題の一貫性を学習する。第二に、トピックモデルを採用し、デコーダの出力を予測されたトピックに含まれる単語へとバイアスする。2つのマンダリン会話型ASRタスクでの実験により、提案モデルが最大12%の相対的な文字誤り率(CER)削減を達成することを示した。 Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantages from specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is specifically adopted to bias the outputs of the decoder to words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a maximum 12% relative character error rate (CER) reduction.	翻訳日:2022-02-18 12:26:12 公開日:2022-02-17

Title

Authors

Abstract

論文公表日・翻訳日

# 4次元の基底を補完する

Completing bases in four dimensions ( http://arxiv.org/abs/2010.09506v4 )

ライセンス: Link先を確認

Hans Havlicek and Karl Svozil

(参考訳) (非)分解可能ベクトルによって4次元ヒルベルト空間の不完全な基底、あるいは文脈を完備化するための基準と構成的方法が与えられる。

Criteria and constructive methods for the completion of an incomplete basis of, or context in, a four-dimensional Hilbert space by (in)decomposable vectors are given.

翻訳日:2023-04-28 21:53:35 公開日:2022-02-17

# 大域マルコフ進化の局所非マルコフ量子力学における熱電流とエントロピー生成速度

Heat current and entropy production rate in local non-Markovian quantum dynamics of global Markovian evolution ( http://arxiv.org/abs/2102.06694v2 )

ライセンス: Link先を確認

Ahana Ghoshal and Ujjwal Sen

(参考訳) 開放量子系の時間発展におけるエントロピーのバランス方程式の各要素と、マルコフ的状況から非マルコフ的状況へ移るときのそれらの応答について検討する。特に、マルコフ的な浴に浸された2つの相互作用する系のうち一方が経験する、非マルコフ的な縮約時間発展およびそのマルコフ極限における熱流とエントロピー生成率に着目する。この解析から、対応する量のグローバル版とローカル版の差として、熱流欠損とエントロピー生成率欠損が自然に定義される。ある場合には、時間積分した熱流欠損と、2つの系の間のエンタングルメントの相対エントロピーとの間に相補性が導かれる。

We examine the elements of the balance equation of entropy in open quantum evolutions and their response as we go from a Markovian to a non-Markovian situation. In particular, we look at the heat current and entropy production rate in the non-Markovian reduced evolution, as well as a Markovian limit of the same, experienced by one of two interacting systems immersed in a Markovian bath. The analysis naturally leads us to define a heat current deficit and an entropy production rate deficit, which are differences between the global and local versions of the corresponding quantities. The investigation leads, in certain cases, to a complementarity of the time-integrated heat current deficit and the relative entropy of entanglement between the two systems.

翻訳日:2023-04-11 07:58:10 公開日:2022-02-17

# クラウド上の6つのノイズ量子ビットを持つ量子非局所ゲーム

Playing quantum nonlocal games with six noisy qubits on the cloud ( http://arxiv.org/abs/2105.05266v3 )

ライセンス: Link先を確認

Meron Sheffer, Daniel Azses, Emanuele G. Dalla Torre

(参考訳) 非局所ゲームはベルの不等式の拡張であり、量子優位性を示すことを目的としている。これらのゲームは、浅い回路の準備とそれに続く非可換な可観測量の測定だけを必要とするため、ノイズの多い量子コンピュータに適している。ここでは、Science 362, 308 (2018) で提案された非局所ゲームの最小実装について考察する。IBM、IonQ、Honeywellのクラウド上の量子コンピュータを用いて6量子ビットのクラスタ状態を準備することで、このゲームをテストする。我々のアプローチには、回路恒等式やエラー緩和など複数レベルの最適化が含まれており、これにより1台の量子コンピュータで古典的しきい値を超え、量子優位性を示すことができた。最後に、より多くの回路を調べることを代償として、より精度の低い量子コンピュータでも量子優位性を観測できる別の不等式を導入する。

Nonlocal games are extensions of Bell inequalities, aimed at demonstrating quantum advantage. These games are well suited for noisy quantum computers because they only require the preparation of a shallow circuit, followed by the measurement of non-commuting observables. Here, we consider the minimal implementation of the nonlocal game proposed in Science 362, 308 (2018). We test this game by preparing a 6-qubit cluster state using quantum computers on the cloud by IBM, IonQ, and Honeywell. Our approach includes several levels of optimization, such as circuit identities and error mitigation, and allows us to cross the classical threshold and demonstrate quantum advantage in one quantum computer. We conclude by introducing a different inequality that allows us to observe quantum advantage in less accurate quantum computers, at the expense of probing a larger number of circuits.

翻訳日:2023-03-31 20:47:33 公開日:2022-02-17

# ファーストレスポンダーが翼を得た: Beyond 5Gシステムにおけるローカライズ作業の救助に向かうUAV

First Responders Got Wings: UAVs to the Rescue of Localization Operations in Beyond 5G Systems ( http://arxiv.org/abs/2109.03180v3 )

ライセンス: Link先を確認

Antonio Albanese, Vincenzo Sciancalepore, Xavier Costa-Pérez

(参考訳) 自然災害や人造災害は過去数十年で劇的に増加している。第一応答者のローカライゼーション時間と最終死亡数との強い関係を考えると,検索・救助業務の近代化は不可欠である。この文脈では、無人航空機(UAV)ベースのソリューションは、人工知能(AI)、再構成可能なインテリジェントサーフェス(RIS)、直交時間周波数空間(OTFS)といった新しい技術を活用することで、ローカライゼーションの課題に取り組む最も有望な候補である。本稿では,前例のない効果的な被害者のローカライズソリューションを生み出すために,最先端技術のローカライズ性能を高めるための,主な課題と今後の機会を浮き彫りにすることで,最近利用可能な手法を生かした。

Natural and human-made disasters have dramatically increased during the last decades. Given the strong relationship between first responders localization time and the final number of deaths, the modernization of search-and-rescue operations has become imperative. In this context, Unmanned Aerial Vehicles (UAVs)-based solutions are the most promising candidates to take up on the localization challenge by leveraging on emerging technologies such as: Artificial Intelligence (AI), Reconfigurable Intelligent Surfaces (RIS) and Orthogonal Time Frequency Space (OTFS) modulations. In this paper, we capitalize on such recently available techniques by shedding light on the main challenges and future opportunities to boost the localization performance of state-of-the-art techniques to give birth to unprecedentedly effective missing victims localization solutions.

翻訳日:2023-03-15 22:45:41 公開日:2022-02-17

# 変分量子アルゴリズムに基づく導波路モードのシミュレーション

Variational Quantum-Based Simulation of Waveguide Modes ( http://arxiv.org/abs/2109.12279v2 )

ライセンス: Link先を確認

Wei-Bin Ewe, Dax Enshan Koh, Siong Thye Goh, Hong-Son Chu, Ching Eng Png

(参考訳) 変分量子アルゴリズムは、古典的コンピュータに対する量子優位性を達成するために、ノイズの多い中間スケール量子(NISQ)マシン上で実装できる最も有望な方法の1つである。本稿では、中空金属導波路における電磁波の伝搬モードの計算における有限差分法と併用した変分量子アルゴリズムの使用について述べる。ヘルムホルツ方程式によって記述された二次元(2次元)導波路問題は線形方程式の系によって近似され、その解は量子ハードウェア上で効率的に評価できる単純な量子期待値で表される。 2次元導波路問題を解決するための提案手法を検証するために, 数値実験を行った。

Variational quantum algorithms are one of the most promising methods that can be implemented on noisy intermediate-scale quantum (NISQ) machines to achieve a quantum advantage over classical computers. This article describes the use of a variational quantum algorithm in conjunction with the finite difference method for the calculation of propagation modes of an electromagnetic wave in a hollow metallic waveguide. The two-dimensional (2D) waveguide problem, described by the Helmholtz equation, is approximated by a system of linear equations, whose solutions are expressed in terms of simple quantum expectation values that can be evaluated efficiently on quantum hardware. Numerical examples are presented to validate the proposed method for solving 2D waveguide problems.

翻訳日:2023-03-13 19:06:34 公開日:2022-02-17
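
The abstract above says the 2D Helmholtz waveguide problem is discretized with the finite difference method before the variational quantum step. The abstract gives no discretization details, so the following is only a minimal classical sketch of that first step, under loudly stated assumptions: a rectangular hollow waveguide of size `a` by `b`, a uniform grid spacing `h`, and TM modes whose field vanishes on the walls. It does not implement the variational quantum solver, and all function names are illustrative, not taken from the paper. It checks that a sampled sine mode satisfies the discrete equation $-\nabla_h^2 u = k_c^2 u$ and that the discrete cutoff wavenumber is close to the textbook value $(p\pi/a)^2 + (q\pi/b)^2$.

```python
import math


def apply_neg_laplacian(u, h):
    """Apply the 5-point finite-difference approximation of -∇² to grid u.

    u holds interior grid values only; the field is taken to be zero on the
    waveguide walls (Dirichlet condition, assumed here for TM modes).
    """
    n, m = len(u), len(u[0])

    def at(i, j):
        # Points outside the interior grid lie on the wall, where the field is 0.
        return u[i][j] if 0 <= i < n and 0 <= j < m else 0.0

    return [[(4.0 * u[i][j] - at(i - 1, j) - at(i + 1, j)
              - at(i, j - 1) - at(i, j + 1)) / h ** 2
             for j in range(m)] for i in range(n)]


def mode_on_grid(n, m, p, q):
    """Sample the (p, q) sine mode on the n x m interior grid points."""
    return [[math.sin(p * math.pi * (i + 1) / (n + 1))
             * math.sin(q * math.pi * (j + 1) / (m + 1))
             for j in range(m)] for i in range(n)]


def discrete_kc2(n, m, p, q, h):
    """Eigenvalue of the discrete operator for the (p, q) sine mode."""
    return (4.0 / h ** 2) * (math.sin(p * math.pi / (2 * (n + 1))) ** 2
                             + math.sin(q * math.pi / (2 * (m + 1))) ** 2)


def analytic_kc2(a, b, p, q):
    """Continuum cutoff wavenumber squared of a rectangular waveguide."""
    return (p * math.pi / a) ** 2 + (q * math.pi / b) ** 2


# Hypothetical dimensions chosen only for illustration.
a, b, h = 2.0, 1.0, 0.05
n, m = round(a / h) - 1, round(b / h) - 1  # number of interior grid points

u = mode_on_grid(n, m, 1, 1)
lam = discrete_kc2(n, m, 1, 1, h)
Au = apply_neg_laplacian(u, h)

# Largest entry of A u - lam * u; it should vanish up to rounding error.
residual = max(abs(Au[i][j] - lam * u[i][j])
               for i in range(n) for j in range(m))
relative_error = abs(lam - analytic_kc2(a, b, 1, 1)) / analytic_kc2(a, b, 1, 1)
print(residual, relative_error)
```

Refining the grid (smaller `h`) brings the discrete value closer to the continuum one, at the cost of a larger linear system, which is the regime where the abstract proposes quantum hardware for evaluating the solution.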

# ナノロータアライメントの干渉制御

Interferometric control of nanorotor alignment ( http://arxiv.org/abs/2110.01301v2 )

ライセンス: Link先を確認

Birthe Schrinski, Benjamin A. Stickler, Klaus Hornberger

(参考訳) 剛体の固有に非線形な回転ダイナミクスは、量子運動を利用する前例のない方法を提供する。このレターではマッハ・ツェンダー干渉計の回転アナログを考案し、これは対称回転子を完全に整列して完全に反整列することを可能にする。このスキームは4つの異なる配向の重ね合わせを使い、量子回復時間の8分の1で出現し、干渉は弱いレーザーパルスによって制御される。この効果の半古典的モデルを構築し,不完全性やデコヒーレンスの存在下においても持続することを示す。

The intrinsically non-linear rotation dynamics of rigid bodies offer unprecedented ways to exploit their quantum motion. In this Letter we devise a rotational analog of Mach-Zehnder interferometry, which allows steering symmetric rotors from fully aligned to completely antialigned. The scheme uses a superposition of four distinct orientations, emerging at the eighth of the quantum revival time, whose interference can be controlled by a weak laser pulse. We develop a semiclassical model of the effect and demonstrate that it persists even in presence of imperfections and decoherence.

翻訳日:2023-03-12 14:18:58 公開日:2022-02-17

# XX$+$XXZダイオードの輸送特性・スペクトル特性とデフェージングに対する安定性

Transport and spectral properties of the XX$+$XXZ diode and stability to dephasing ( http://arxiv.org/abs/2110.15564v3 )

ライセンス: Link先を確認

Kang Hao Lee, Vinitha Balachandran, Chu Guo and Dario Poletti

(参考訳) XX$+$XXZスピン鎖で形成されるセグメント化ダイオードの輸送特性とスペクトル特性を調べる。この系は、十分に大きな異方性に対して、スピン流の理想的な整流器になることが示されている。ここでは、逆バイアスの系が、異方性の値に応じて弾道的、拡散的、絶縁的という3つの異なる輸送領域を持つことを数値的に示す。順バイアスでは、弾道的と拡散的の2つの領域が現れる。順バイアスと逆バイアスの系は大きく異なるスペクトル特性を示し、ラピディティの分布は異なる関数へ収束する。デフェージングが存在すると、系は拡散的になり、整流は大幅に減少し、緩和ギャップは増加し、順バイアスと逆バイアスのスペクトル特性は収束する傾向を示す。デフェージングが大きい場合、量子ゼノン物理の結果として緩和ギャップは再び減少する。

We study the transport and spectral property of a segmented diode formed by an XX $+$ XXZ spin chain. This system has been shown to become an ideal rectifier for spin current for large enough anisotropy. Here we show numerical evidence that the system in reverse bias has three different transport regimes depending on the value of the anisotropy: ballistic, diffusive and insulating. In forward bias we encounter two regimes, ballistic and diffusive. The system in forward and reverse bias shows significantly different spectral properties, with distribution of rapidities converging towards different functions. In the presence of dephasing the system becomes diffusive, rectification is significantly reduced, the relaxation gap increases and the spectral properties in forward and reverse bias tend to converge. For large dephasing the relaxation gap decreases again as a result of Quantum Zeno physics.

翻訳日:2023-03-09 22:57:19 公開日:2022-02-17

# トーリック符号における2Qubitゲートと複数Qubitゲートの比較

Comparing Two-Qubit and Multi-Qubit Gates within the Toric Code ( http://arxiv.org/abs/2111.04047v2 )

ライセンス: Link先を確認

David Schwerdt, Yotam Shapira, Tom Manovitz, and Roee Ozeri

(参考訳) 一部の量子コンピューティング(QC)アーキテクチャでは、任意の数の量子ビットのエンタングルメントを単一の操作で生成できる。この性質には多くの潜在的な応用があり、特に量子誤り訂正(QEC)に有用でありうる。その場合、スタビライザー測定は、複数の2量子ビットゲートの代わりに単一の多量子ビットゲートを用いて実装でき、回路深さを削減できる。本研究では、トーリック符号をベンチマークとして用い、パリティチェック回路における2量子ビットゲートと5量子ビットゲートの性能を比較する。ラマン遷移によって制御される捕捉イオン量子ビットを考え、主な誤差源は自発的な光子散乱であると仮定する。5量子ビットのMølmer-Sørensenゲートは、フォールトトレランスしきい値の点で、2量子ビットゲートに対して約40%の改善をもたらすことを示す。この結果は、QECの文脈で多量子ビットゲートを用いる利点を示している。

In some quantum computing (QC) architectures, entanglement of an arbitrary number of qubits can be generated in a single operation. This property has many potential applications, and may specifically be useful for quantum error correction (QEC). Stabilizer measurements can then be implemented using a single multi-qubit gate instead of several two-qubit gates, thus reducing circuit depth. In this study, the toric code is used as a benchmark to compare the performance of two-qubit and five-qubit gates within parity-check circuits. We consider trapped ion qubits that are controlled via Raman transitions, where the primary source of error is assumed to be spontaneous photon scattering. We show that a five-qubit Mølmer-Sørensen gate offers an approximately $40\%$ improvement over two-qubit gates in terms of the fault tolerance threshold. This result indicates an advantage of using multi-qubit gates in the context of QEC.

翻訳日:2023-03-08 22:33:06 公開日:2022-02-17

# 非エルミート準結晶における二量化誘起の移動度端と多重リエントラント局在転移

Dimerization induced mobility edges and multiple reentrant localization transitions in non-Hermitian quasicrystals ( http://arxiv.org/abs/2111.08427v3 )

ライセンス: Link先を確認

Wenqian Han and Longwen Zhou

(参考訳) 非エルミート効果は豊かな動的および位相的な相構造を生み出しうる。本研究では、格子の二量化と非エルミート性の協調によって、一次元準結晶に移動度端と複数の局在転移が一般的に生じうることを示す。結果を示すために、スタッガードなオンサイトポテンシャルと二量化したホッピング振幅を持つAubry-André-Harper (AAH) モデルの非エルミート拡張を導入する。準周期的な利得・損失と格子の二量化との相互作用によるリエントラントな局在転移を見いだす。さらに、量子化された巻き付き数を位相不変量として用い、異なるスペクトルと輸送の性質を持つ相の間の転移を特徴づける。このように本研究は、格子の二量化の効果を取り入れることで非エルミート準結晶の族を豊かにし、非エルミート系における局在転移と移動度端を調節する便利な方法を提供する。

Non-Hermitian effects could create rich dynamical and topological phase structures. In this work, we show that the collaboration between lattice dimerization and non-Hermiticity could generally bring about mobility edges and multiple localization transitions in one-dimensional quasicrystals. Non-Hermitian extensions of the Aubry-André-Harper (AAH) model with staggered onsite potential and dimerized hopping amplitudes are introduced to demonstrate our results. Reentrant localization transitions due to the interplay between quasiperiodic gain/loss and lattice dimerization are found. Quantized winding numbers are further adopted as topological invariants to characterize transitions among phases with distinct spectrum and transport nature. Our study thus enriches the family of non-Hermitian quasicrystals by incorporating effects of lattice dimerization, and offering a convenient way to modulate localization transitions and mobility edges in non-Hermitian systems.

翻訳日:2023-03-08 00:00:59 公開日:2022-02-17

# 形状依存型多重磁性人工シナプスによるニューロモルフィックコンピューティング

Shape-Dependent Multi-Weight Magnetic Artificial Synapses for Neuromorphic Computing ( http://arxiv.org/abs/2111.11516v2 )

ライセンス: Link先を確認

Thomas Leonard, Samuel Liu, Mahshid Alamdar, Can Cui, Otitoaleke G. Akinola, Lin Xue, T. Patrick Xiao, Joseph S. Friedman, Matthew J. Marinella, Christopher H. Bennett and Jean Anne C. Incorvia

(参考訳) ニューロモルフィックコンピューティングでは、人工シナプスは、脳と同様に、ニューロンからの入力に基づいて設定される多値のコンダクタンス状態を提供する。複数の重みを持つこと以外のシナプス特性も必要になることがあり、それは用途に依存しうるため、同じ材料から異なるシナプス挙動を生み出す必要がある。本稿では、磁気トンネル接合と磁壁を用いた、磁性材料に基づく人工シナプスを測定する。単一の磁気トンネル接合の下にある磁壁トラックにリソグラフィでノッチを作製することで、スピン軌道トルクを用いて電気的に繰り返し制御できる4〜5個の安定な抵抗状態が得られる。形状がシナプスの挙動に及ぼす影響を解析し、台形のデバイスは制御性の高い非対称な重み更新を示し、直線状のデバイスは確率性が高いものの抵抗レベルが安定であることを示す。デバイスデータをニューロモルフィックコンピューティングのシミュレータに入力し、用途に特化したシナプス機能の有用性を示す。ストリーム化されたFashion-MNISTデータに適用した人工ニューラルネットワークを実装し、台形の磁気シナプスが効率的なオンライン学習のためのメタ可塑性機能として利用できることを示す。CIFAR-100画像認識のための畳み込みニューラルネットワークを実装し、直線状の磁気シナプスが、抵抗レベルの安定性により、ほぼ理想的な推論精度を達成することを示す。この研究は、多値の重みを持つ磁気シナプスがニューロモルフィックコンピューティングにとって実現可能な技術であることを示し、新興の人工シナプス技術に対する設計指針を提供する。

In neuromorphic computing, artificial synapses provide a multi-weight conductance state that is set based on inputs from neurons, analogous to the brain. Additional properties of the synapse beyond multiple weights can be needed, and can depend on the application, requiring the need for generating different synapse behaviors from the same materials. Here, we measure artificial synapses based on magnetic materials that use a magnetic tunnel junction and a magnetic domain wall. By fabricating lithographic notches in a domain wall track underneath a single magnetic tunnel junction, we achieve 4-5 stable resistance states that can be repeatably controlled electrically using spin orbit torque. We analyze the effect of geometry on the synapse behavior, showing that a trapezoidal device has asymmetric weight updates with high controllability, while a straight device has higher stochasticity, but with stable resistance levels. The device data is input into neuromorphic computing simulators to show the usefulness of application-specific synaptic functions. Implementing an artificial neural network applied on streamed Fashion-MNIST data, we show that the trapezoidal magnetic synapse can be used as a metaplastic function for efficient online learning. Implementing a convolutional neural network for CIFAR-100 image recognition, we show that the straight magnetic synapse achieves near-ideal inference accuracy, due to the stability of its resistance levels. This work shows multi-weight magnetic synapses are a feasible technology for neuromorphic computing and provides design guidelines for emerging artificial synapse technologies.

翻訳日:2023-03-07 04:14:44 公開日:2022-02-17

# 熱平衡の平和的共存と時間の出現

Peaceful coexistence of thermal equilibrium and the emergence of time ( http://arxiv.org/abs/2112.04057v3 )

ライセンス: Link先を確認

Tommaso Favalli and Augusto Smerzi

(参考訳) 我々は、小さな系Sと大きな環境からなる量子宇宙を考える。全エネルギーの制約を満たす宇宙の波動関数をランダムに選ぶと、その大多数において、系Sの縮約密度行列が正準統計分布で与えられることが示されている。ここでは、PageとWoottersの機構を通じて、時間と非平衡ダイナミクスが、宇宙の(ランダムに選ばれた)大域的な波動関数に存在する系と環境のエンタングルメントの結果として創発しうることを示す。時計は環境によって与えられ、Sの時間発展を刻む。統計平衡と非平衡ダイナミクスの平和的共存というパラドックスは、環境の自由度に関するトレースを、系Sの全履歴にわたる時間的なトレースと同一視することで解決される。

We consider a quantum Universe composed by a small system S and a large environment. It has been demonstrated that, for the vast majority of randomly chosen wave-functions of the Universe satisfying a total energy constraint, the reduced density matrix of the system S is given by the canonical statistical distribution. Here, through the Page and Wootters mechanism, we show that time and non-equilibrium dynamics can emerge as a consequence of the entanglement between the system and the environment present in the (randomly chosen) global wave-function of the Universe. The clock is provided by the environment, which ticks the temporal evolution of S. The paradox of the peaceful coexistence of statistical equilibrium and non-equilibrium dynamics is solved by identifying the trace over the environment degrees of freedom with the temporal trace over the entire history of the system S.

翻訳日:2023-03-05 03:19:55 公開日:2022-02-17

# スパイクニューラルネットワークにおける劣化問題の核心を解決することによる深層残差学習の促進

Advancing Deep Residual Learning by Solving the Crux of Degradation in Spiking Neural Networks ( http://arxiv.org/abs/2201.07209v2 )

ライセンス: Link先を確認

Yifan Hu, Yujie Wu, Lei Deng, Guoqi Li

(参考訳) ニューロモルフィックコンピューティングの急速な進歩にもかかわらず、スパイクニューラルネットワーク(SNN)の不十分な深さと、その結果としての表現力の不足は、実用上の適用範囲を厳しく制限している。残差学習とショートカットはディープニューラルネットワークを訓練するための重要なアプローチであることが示されてきたが、スパイクベースの通信と時空間ダイナミクスという特性への適用可能性を評価した先行研究はほとんどなかった。この見落としが情報の流れを妨げ、それに伴う劣化問題を引き起こす。本論文では、その核心を特定したうえで、SNNのための新たな残差ブロックを提案する。これにより、直接訓練されたSNNの深さを、例えばCIFAR-10では最大482層、ImageNetでは104層まで、わずかな劣化問題も観察することなく大幅に拡張できる。フレームベースとニューロモルフィックの両方のデータセットで手法の有効性を検証し、SRM-ResNet104は、直接訓練されたSNNの領域で初めて、ImageNetで76.02%の精度という優れた結果を達成した。高いエネルギー効率が見積もられており、得られたネットワークは入力サンプルの分類にニューロンあたり平均1スパイクしか必要としない。我々の強力でスケーラブルなモデリングは、SNNのさらなる探究を強力に支援すると信じている。

Despite the rapid progress of neuromorphic computing, the inadequate depth and the resulting insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but rarely did previous work assess their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. This negligence leads to impeded information flow and the accompanying degradation problem. In this paper, we identify the crux and then propose a novel residual block for SNNs, which is able to significantly extend the depth of directly trained SNNs, e.g., up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any slight degradation problem. We validate the effectiveness of our methods on both frame-based and neuromorphic datasets, and our SRM-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, the first time in the domain of directly trained SNNs. The great energy efficiency is estimated and the resulting networks need on average only one spike per neuron for classifying an input sample. We believe our powerful and scalable modeling will provide a strong support for further exploration of SNNs.

翻訳日:2023-03-05 00:41:21 公開日:2022-02-17

# 光力学系における量子ノイズによる一般化不確実性原理の探究

Probing the generalized uncertainty principle through quantum noises in optomechanical systems ( http://arxiv.org/abs/2112.13682v2 )

ライセンス: Link先を確認

Soham Sen, Sukanta Bhattacharyya, Sunandan Gangopadhyay

(参考訳) 本研究では、一般化不確定性原理(GUP)の枠組みにおいて、キャビティ内の単一モードの光学場と相互作用する単純な機械振動子を考察した。本研究の目的は、修正された雑音スペクトルを計算し、GUPの効果を観測することである。我々が考察した交換関係は、運動量の2次の項に加えて、運動量の1次の項を持つ。理論結果と観測結果とを照らし合わせることで、異なる実験から得られた系のパラメータの値を用いると、雑音スペクトルからGUPパラメータに対してはるかに厳しい制限が得られることがわかった。

In this work we have considered a simple mechanical oscillator interacting with a single mode optical field inside a cavity in the generalized uncertainty principle framework (GUP). Our aim is to calculate the modified noise spectrum and observe the effects of the GUP. The commutation relation that we have considered has an extra linear order momentum term along with a quadratic order term. Confronting our theoretical results with the observational results, we observe that we get a much tighter bound on the GUP parameters from the noise spectrum using the values of the system parameters from different experiments.

翻訳日:2023-03-03 17:30:20 公開日:2022-02-17

# 光学浮揚ナノダイヤモンドのホットブラウン運動

Hot Brownian motion of optically levitated nanodiamonds ( http://arxiv.org/abs/2201.00170v2 )

ライセンス: Link先を確認

François Rivière, Timothée de Guillebon, Damien Raynal, Martin Schmidt, Jean-Sébastien Lauret, Jean-François Roch, Loïc Rondin

(参考訳) 環境よりも高温の粒子のブラウン運動は、象徴的な非平衡系である。その研究はナノスケールの熱効果に関する貴重な洞察を与える。特に、力センシングや量子物理学の検証に有望なプラットフォームである光学浮揚粒子における熱効果の優れた診断手段となる。したがって、この効果に関わるパラメータを理解することは重要である。本研究では、NV中心を含む光学浮揚ナノダイヤモンドを用いて粒子の内部温度と重心運動のダイナミクスを測定し、粒子の形状と材料の役割を検証する。ナノダイヤモンドの内部温度をそのダイナミクスから評価する、他の粒子にも適用可能なモデルを提示する。また、他の機構もトラップ内のナノダイヤモンドのダイナミクスと安定性に影響を及ぼすことを示す。最後に、浮揚ナノダイヤモンドがナノスケールの熱効果を研究するための優れたツールであることを示すことで、本研究は光学浮揚粒子の捕捉安定性を高めるための展望を開く。

The Brownian motion of a particle hotter than its environment is an iconic out-of-equilibrium system. Its study provides valuable insights into nanoscale thermal effects. Notably, it supplies an excellent diagnosis of thermal effects in optically levitated particles, a promising platform for force sensing and quantum physics tests. Thus, understanding the relevant parameters in this effect is critical. In this context, we test the role of particles' shape and material, using optically levitated nanodiamonds hosting NV centers to measure the particles' internal temperature and center-of-mass dynamics. We present a model to assess the nanodiamond internal temperature from its dynamics, adaptable to other particles. We also demonstrate that other mechanisms affect the nanodiamond dynamics and its stability in the trap. Finally, our work, by showing levitating nanodiamonds as an excellent tool for studying nano-thermal effects, opens prospects for increasing the trapping stability of optically levitated particles.

翻訳日:2023-03-02 17:24:17 公開日:2022-02-17

# 高帯域幅キャビティ内の原子のラマンイメージング

Raman Imaging of Atoms Inside a High-bandwidth Cavity ( http://arxiv.org/abs/2202.05369v2 )

ライセンス: Link先を確認

Eduardo Uruñuela, Maximilian Ammenwerth, Pooja Malik, Lukas Ahlheit, Hannes Pfeifer, Wolfgang Alt, and Dieter Meschede

(参考訳) 高帯域幅のファイバーベース光キャビティは、将来の量子ネットワークにとって有望な構成要素である。これらは、単一または複数の原子のような定常量子ビットを、量子情報を高いレートでファイバーネットワークへ運ぶ光子と共鳴的に結合するために用いられる。高帯域幅のキャビティでは、パーセル効果が全方向への蛍光を強く抑制するため、原子の位置制御に用いられる、原子-キャビティ共鳴線上での標準的な蛍光イメージングが妨げられる。ここでは、連続的な3次元ラマンサイドバンド冷却によって生じるリポンパー蛍光を検出することで、このようなファイバーファブリ・ペローキャビティに強く結合した$^{87}$Rb原子のイメージングを回復する。ラマン共鳴に影響を及ぼすリポンパー誘起の差分光シフトについて、強度と離調への依存性を詳細に分光学的に調べた。我々の解析により、イメージングの信号対雑音比と生存率との間の妥協領域が特定され、そこでは捕捉原子の加熱ダイナミクスにおける双極子力ゆらぎの役割に関する物理的洞察が得られる。

High-bandwidth, fiber-based optical cavities are a promising building block for future quantum networks. They are used to resonantly couple stationary qubits such as single or multiple atoms with photons routing quantum information into a fiber network at high rates. In high-bandwidth cavities, standard fluorescence imaging on the atom-cavity resonance line for controlling atom positions is impaired since the Purcell effect strongly suppresses all-directional fluorescence. Here, we restore imaging of $^{87}$Rb atoms strongly coupled to such a fiber Fabry-Pérot cavity by detecting the repumper fluorescence which is generated by continuous and three-dimensional Raman sideband cooling. We have carried out a detailed spectroscopic investigation of the repumper-induced differential light shifts affecting the Raman resonance, dependent on intensity and detuning. Our analysis identifies a compromise regime between imaging signal-to-noise ratio and survival rate, where physical insight into the role of dipole-force fluctuations in the heating dynamics of trapped atoms is gained.

翻訳日:2023-02-26 04:31:00 公開日:2022-02-17

# 量子コンピューティングと通信におけるデコヒーレンスと量子エラー補正

Decoherence and Quantum Error Correction for Quantum Computing and Communications ( http://arxiv.org/abs/2202.08600v1 )

ライセンス: Link先を確認

Josu Etxezarreta Martinez

(参考訳) 量子技術は、素数分解、非構造化データベース探索、複雑なマクロ分子シミュレーションなど、いくつかの情報処理タスクを効果的に解決する、計り知れない可能性を示している。古典的な扱いにくい問題を解く能力の結果として、量子マシンは薬物設計、プロセスの最適化、破壊不能なコミュニケーション、機械学習といったアプリケーションを通じて現代社会に革命をもたらす可能性がある。しかし、量子情報は、周囲の環境との相互作用に関連する量子状態のコヒーレンスを失うことを表す、いわゆるデコヒーレンス(decoherence)による誤りに悩まされる傾向がある。このデコヒーレンス現象は、量子情報の伝達、処理、あるいは保存など、すべての量子情報タスクに存在します。したがって、量子誤り訂正符号(QECC)による量子情報の保護は、完全に動作する量子コンピュータを構築する上で最重要となる。量子情報の保護が可能な効果的な誤り訂正法を構築するためには,環境非一貫性のプロセスとそのモデル化方法を理解することが基本である。この論文では、デコヒーレンスの性質を研究・数学的にモデル化し、QECCはより優れた誤り訂正能力を示すように設計・最適化されている。

Quantum technologies have shown immeasurable potential to effectively solve several information processing tasks such as prime number factorization, unstructured database search or complex macromolecule simulation. As a result of such capability to solve certain problems that are not classically tractable, quantum machines have the potential to revolutionize the modern world via applications such as drug design, process optimization, unbreakable communications or machine learning. However, quantum information is prone to suffer from errors caused by the so-called decoherence, which describes the loss in coherence of quantum states associated with their interactions with the surrounding environment. This decoherence phenomenon is present in every quantum information task, be it transmission, processing or even storage of quantum information. Consequently, the protection of quantum information via quantum error correction codes (QECC) is of paramount importance to construct fully operational quantum computers. Understanding environmental decoherence processes and the way they are modeled is fundamental in order to construct effective error correction methods capable of protecting quantum information. In this thesis, the nature of decoherence is studied and mathematically modelled; and QECCs are designed and optimized so that they exhibit better error correction capabilities.

翻訳日:2023-02-25 12:59:46 公開日:2022-02-17

# 信頼性量子コンピューティングにおける誤差補正

Error Correction for Reliable Quantum Computing ( http://arxiv.org/abs/2202.08599v1 )

ライセンス: Link先を確認

Patricio Fuentes

(参考訳) 量子コンピュータは、これまで難解だった計算問題を効率的に解決する新しい時代の到来を告げる。しかし、量子技術はデコヒーレンス (decoherence) によって抑えられ、これは量子パラダイムにおいて一様であり、未確認のままで量子情報が役に立たない現象である。量子誤り訂正の科学は、符号として知られる構造を用いてデコヒーレンスの効果から量子情報を組み合わせ、保護しようとする分野であり、この課題を満たすために生まれた。量子符号の特定のサブクラスである安定化符号は、古典的誤り訂正の分野を用いて並列を描画することで、量子誤り訂正の分野の高速進行を可能にした。この結果、スパース符号や量子ターボ符号のような、よく知られたキャパシティに適合する古典符号の量子対数が構築された。しかし、この方法で得られた量子符号は、古典的な誤り訂正能力を完全に引き起こすわけではない。これは、古典的戦略が量子パラダイムと古典的パラダイムの間の重要な違いを無視しているためであり、量子的誤り訂正がデコヒーレンスとの戦いに成功するには対処しなければならない問題である。この論文では、縮退(degeneracy)として知られる量子パラダイム専用の現象とそのスパース量子符号の性能への影響について研究する。さらに,様々なシナリオにおいて,スパース量子コードの特定のファミリーの性能を向上させる手法を解析・提示する。

Quantum computers herald the arrival of a new era in which previously intractable computational problems will be solved efficiently. However, quantum technology is held down by decoherence, a phenomenon that is omnipresent in the quantum paradigm and that renders quantum information useless when left unchecked. The science of quantum error correction, a discipline that seeks to combine and protect quantum information from the effects of decoherence using structures known as codes, has arisen to meet this challenge. Stabilizer codes, a particular subclass of quantum codes, have enabled fast progress in the field of quantum error correction by allowing parallels to be drawn with the widely studied field of classical error correction. This has resulted in the construction of the quantum counterparts of well-known capacity-approaching classical codes like sparse codes and quantum turbo codes. However, quantum codes obtained in this manner do not entirely evoke the stupendous error correcting abilities of their classical counterparts. This occurs because classical strategies ignore important differences between the quantum and classical paradigms, an issue that needs to be addressed if quantum error correction is to succeed in its battle with decoherence. In this dissertation we study a phenomenon exclusive to the quantum paradigm, known as degeneracy, and its effects on the performance of sparse quantum codes. Furthermore, we also analyze and present methods to improve the performance of a specific family of sparse quantum codes in various different scenarios.

翻訳日:2023-02-25 12:59:25 公開日:2022-02-17

# 金属中のホットフォノン物理学の性質と課題:MgB$_2$およびその他の化合物

Properties and challenges of hot-phonon physics in metals: MgB$_2$ and other compounds ( http://arxiv.org/abs/2202.08597v1 )

ライセンス: Link先を確認

Emmanuele Cappelluti, Fabio Caruso, Dino Novko

(参考訳) 平衡外の系における電子と集合モードの超高速ダイナミクスは、ポンプ源のエネルギーが通常吸収される電子自由度から格子の自由度へのエネルギー移動によって決定的に制御される。従来の金属では、この過程は格子全体の加熱につながり、通常、すべての自由度との最終的な平衡に達するまで、有効な格子温度 $T_{\rm ph}$ で記述される。しかし、特定の材料では、少数の格子モードがエネルギー移動に優先的なチャネルを提供し、非熱的な振動分布と「ホットフォノン」(hot phonons)、すなわち他のモードよりもはるかに占有数の多い格子モードの発生に繋がる。ホットフォノンは通常、半導体やグラフェンのような半金属化合物で見られ、そこではホットモードへの優先チャネルは電子位相空間の縮小によって決まる。一方、これとは異なる経路として、電子-フォノン (el-ph) 結合の強い異方性の結果、金属でもホットフォノン物理が得られる可能性が近年文献で提起されている。本稿では、MgB$_2$ を代表例として、異方性 el-ph 結合を持つ金属でホットフォノンのシナリオを可能にする物理条件を概説し、ホットフォノンの観測可能な指紋について議論する。他の金属化合物におけるホットフォノンの予測と実験的観測に向けた新しい展望についても論じる。

The ultrafast dynamics of electrons and collective modes in systems out of equilibrium is crucially governed by the energy transfer from electronic degrees of freedom, where the energy of the pump source is usually absorbed, to lattice degrees of freedom. In conventional metals such a process leads to an overall heating of the lattice, usually described by an effective lattice temperature $T_{\rm ph}$, until final equilibrium with all the degrees of freedom is reached. In specific materials, however, a few lattice modes provide a preferential channel for the energy transfer, leading to a non-thermal distribution of vibrations and to the onset of hot phonons, i.e., lattice modes with a much higher population than the other modes. Hot phonons are usually encountered in semiconductors or semimetal compounds, like graphene, where the preferential channel towards hot modes is dictated by the reduced electronic phase space. Following a different path, the possibility of obtaining hot-phonon physics in metals as well has recently been raised in the literature, as a result of a strong anisotropy of the electron-phonon (el-ph) coupling. In the present paper, taking MgB$_2$ as a representative example, we review the physical conditions that allow a hot-phonon scenario in metals with anisotropic el-ph coupling, and we discuss the observable fingerprints of hot phonons. Novel perspectives towards the prediction and experimental observation of hot phonons in other metallic compounds are also discussed.

翻訳日:2023-02-25 12:59:02 公開日:2022-02-17

# 多成分偶数と奇数jスピンコヒーレント状態を用いた任意の重ね合わせコヒーレント状態の確率的量子テレポーテーション効率の向上

Improving the probabilistic quantum teleportation efficiency of arbitrary superposed coherent state using multipartite even and odd j-spin coherent states as resource ( http://arxiv.org/abs/2202.08591v1 )

ライセンス: Link先を確認

Meryem El Kirdi, Abdallah Slaoui, Hanane El Hadfi and Mohammed Daoud

(参考訳) 量子テレポーテーションは、量子情報セキュア伝送において最も重要な技術の一つである。事前共有されたエンタングルメントを用いる量子テレポーテーションは、多くの量子情報タスクの基本的な鍵として設計され、量子技術、特に量子通信において顕著に機能する。本研究では、Alice (送信者) と Bob (受信者) を接続する絡み合った資源として、多成分偶数と奇数の $j$-スピンコヒーレント状態を用いて、任意の重ね合わせコヒーレント状態に対する新しい確率的テレポーテーション方式を提案する。ここで、Alice は偶数と奇数の両方のスピンコヒーレント状態を持ち、(1) 未知のスピン状態と (2) 2つのコヒーレントスピン状態のうち交互に選ばれる1つからなるスピン対に対して、最大平均忠実度の量子テレポーテーションに到達するまで GHZ 状態測定 (GHZSM) を繰り返し行う。コンカレンスによって定量化された共有状態のエンタングルメント量と、$n^{\rm th}$ 回目の繰り返し試行までのテレポーテーション忠実度およびテレポートされた対象状態の成功確率との関係を与える。本方式では、非最大絡み合い状態でも完全な量子テレポーテーションが可能であることを示す。さらに、この GHZSM の繰り返し試行プロセスは、テレポートされた状態の平均忠実度と、確率的プロトコルの実行が成功する確率の両方を著しく増大させる。また、$j$-スピン数、目標状態パラメータ、コヒーレント状態間の重なりが、テレポーテーション効率を最大化するために調整可能な重要な追加の制御パラメータとなることを示す。

Quantum teleportation is one of the most important techniques for the secure transmission of quantum information. Using preshared entanglement, quantum teleportation is designed as a basic key in many quantum information tasks and features prominently in quantum technologies, especially in quantum communication. In this work, we provide a new probabilistic teleportation protocol for arbitrary superposed coherent states by employing the multipartite even and odd $j$-spin coherent states as the entangled resource connecting Alice (sender) and Bob (receiver). Here, Alice possesses both even and odd spin coherent states and makes repeated GHZ-state measurements (GHZSMs) on the pair of spins, consisting of (1) the unknown spin state and (2) one of the two coherent spin states, taken alternately, until reaching a quantum teleportation with maximal average fidelity. We provide the relationship between the entanglement amount of the shared state, quantified by the concurrence, and the teleportation fidelity and the success probability of the teleported target state up to the $n^{\rm th}$ repeated attempt. In this scheme, we show that perfect quantum teleportation can be achieved even with a non-maximally entangled state. Furthermore, this process of repeated GHZSM attempts significantly increases both the average fidelity of the teleported state and the probability of a successful run of the probabilistic protocol. Our results also show that the $j$-spin number, the target state parameter and the overlap between coherent states provide important additional control parameters that can be adjusted to maximize the teleportation efficiency.

翻訳日:2023-02-25 12:58:37 公開日:2022-02-17

# 準周期駆動1次元乱れ系の局在と非局在化特性

Localization and delocalization properties in quasi-periodically driven one-dimensional disordered system ( http://arxiv.org/abs/2202.08582v1 )

ライセンス: Link先を確認

Hiroaki S. Yamada and Kensuke S. Ikeda

(参考訳) $M$ 色の準周期調和振動により摂動された時間連続1次元アンダーソンモデルにおける量子拡散の局在と非局在を系統的に検討する。その一部は予備的なレター [PRE 103, L040202 (2021)] で報告した。モデルの局在・非局在特性について、乱れの強さ $W$、摂動強度 $\epsilon$、空間次元と類似の役割を果たす色数 $M$ の3つのパラメータに関して詳細に検討する。特に、局在-非局在転移 (LDT) の存在とその臨界特性に注目する。$M\geq 3$ では LDT が存在し、臨界強度 $\epsilon$ より上で通常の拡散が回復する。また、$M$ が大きくなくても、拡散ダイナミクスの特徴は確率的に摂動されたアンダーソンモデルに対して予測される拡散過程を模倣する。これらの結果は、時間離散量子マップ、すなわちアンダーソン写像と標準写像の結果と比較される。さらに、静的な乱れの部分を持たない極限モデルと比較して、非局在化ダイナミクスの特徴を議論する。

Localization and delocalization of quantum diffusion in the time-continuous one-dimensional Anderson model perturbed by quasi-periodic harmonic oscillations of $M$ colors are investigated systematically; part of this work was reported in a preliminary letter [PRE 103, L040202 (2021)]. We investigate in detail the localization-delocalization characteristics of the model with respect to three parameters: the disorder strength $W$, the perturbation strength $\epsilon$ and the number of colors $M$, which plays a role similar to that of the spatial dimension. In particular, attention is focused on the presence of the localization-delocalization transition (LDT) and its critical properties. For $M\geq 3$ the LDT exists and normal diffusion is recovered above a critical strength $\epsilon$, and the characteristics of the diffusion dynamics mimic the diffusion process predicted for the stochastically perturbed Anderson model even though $M$ is not large. These results are compared with the results of time-discrete quantum maps, i.e., the Anderson map and the standard map. Further, the features of the delocalized dynamics are discussed in comparison with a limit model which has no static disordered part.

翻訳日:2023-02-25 12:58:02 公開日:2022-02-17

# 量子論における部分的無知通信タスク

Partial ignorance communication tasks in quantum theory ( http://arxiv.org/abs/2202.08581v1 )

ライセンス: Link先を確認

Oskari Kerppo

(参考訳) 本稿では,成功度基準が測定と準備で最大化される前に,準備と測定の双方が第三者から入力を受け取る部分的無知のコミュニケーションの一般化を提案する。 sdps、通信行列のための超弱モノトン、量子状態のフレーム理論など、成功指標の境界を得るために様々な方法が用いられている。新しい一般化された準備・測定設定における最も単純なシナリオは、単に部分的無知通信タスクと呼ばれ、ビットとキューディットに対して徹底的に分析される。最後に、新しい一般化された設定により、準備と測定に操作等価性を導入することができ、通信タスクの1つで量子論の文脈上の優位性を分析し観察することができる。

We introduce a generalization of communication of partial ignorance where both parties of a prepare-and-measure setup receive inputs from a third party before a success metric is maximized over the measurements and preparations. Various methods are used to obtain bounds on the success metrics, including SDPs, ultraweak monotones for communication matrices and frame theory for quantum states. Simplest scenarios in the new generalized prepare-and-measure setting, simply called partial ignorance communication tasks, are analysed exhaustively for bits and qudits. Finally, the new generalized setting allows the introduction of operational equivalences to the preparations and measurements, allowing us to analyse and observe a contextual advantage for quantum theory in one of the communication tasks.

翻訳日:2023-02-25 12:57:39 公開日:2022-02-17

# コールド分子またはリドバーグ原子の合成格子中の量子膜相

Quantum Membrane Phases in Synthetic Lattices of Cold Molecules or Rydberg Atoms ( http://arxiv.org/abs/2202.08540v1 )

ライセンス: Link先を確認

Chunhan Feng, Hannah Manetsch, Valery G. Rousseau, Kaden R. A. Hazzard and Richard Scalettar

(参考訳) 確率的グリーン関数量子モンテカルロ法を用いて、双極子相互作用を持つ超低温分子またはリドバーグ原子の性質を、1つの合成次元と2次元実空間光学格子または周期マイクロトラップアレイからなる半合成的な3次元構成で計算する。熱力学的量と適切な相関関数の計算、およびそれらの有限サイズスケーリングによって、分子の内部回転状態またはリドバーグ原子の電子状態からなる合成次元に2次元の「シート」が形成される低温相への2次転移が存在することを示す。実空間と合成空間の両方で隣接する原子や分子の間に働く相互作用 $V$ の異なる値に対するシミュレーションにより、相図を計算することができる。十分に大きな $V$ では有限温度転移が、また量子相転移、すなわちそれ以下では転移温度が消失する臨界値 $V_c$ が見出される。

We calculate properties of dipolar interacting ultracold molecules or Rydberg atoms in a semi-synthetic three-dimensional configuration -- one synthetic dimension plus a two-dimensional real space optical lattice or periodic microtrap array -- using the stochastic Green function Quantum Monte Carlo method. Through a calculation of thermodynamic quantities and appropriate correlation functions, along with their finite size scalings, we show that there is a second order transition to a low temperature phase in which two-dimensional `sheets' form in the synthetic dimension of internal rotational or electronic states of the molecules or Rydberg atoms, respectively. Simulations for different values of the interaction $V$, which acts between atoms or molecules that are adjacent both in real and synthetic space, allow us to compute a phase diagram. We find a finite-temperature transition at sufficiently large $V$, as well as a quantum phase transition -- a critical value $V_c$ below which the transition temperature vanishes.

翻訳日:2023-02-25 12:57:26 公開日:2022-02-17

# チュートリアル:マクロなQEDと真空力

Tutorial: Macroscopic QED and vacuum forces ( http://arxiv.org/abs/2202.08762v1 )

ライセンス: Link先を確認

S. A. R. Horsley

(参考訳) このチュートリアルでは、分散した散逸物質と相互作用する電磁場を表すハミルトニアンが見つかる、マクロなqedの理論を紹介している。 1次元理論をモチベーションとして用いて、より面倒な3次元理論を構築する。そして、ドップラー効果と電気および磁気応答の混合により物質反応が変化する移動体へのこの理論の拡張を考えると、量子電磁力の理論を無料で得ることが示されている。我々は、スライド板間の量子摩擦力に対するペンドリー式を再現するためにマクロQEDを適用して仕上げる。

This tutorial introduces the theory of macroscopic QED, where a Hamiltonian is found that represents the electromagnetic field interacting with a dispersive, dissipative material. Using a one dimensional theory as motivation, we build up the more cumbersome three dimensional theory. Then considering the extension of this theory to moving materials, where the material response changes due to both the Doppler effect and the mixing of electric and magnetic responses, it is shown that one gets the theory of quantum electromagnetic forces for free. We finish by applying macroscopic QED to reproduce Pendry's expression for the quantum friction force between sliding plates.

翻訳日:2023-02-25 12:50:16 公開日:2022-02-17

# 断熱熱機の動力に関する幾何学的境界

Geometric Bounds on the Power of Adiabatic Thermal Machines ( http://arxiv.org/abs/2202.08759v1 )

ライセンス: Link先を確認

Joshua Eglinton and Kay Brandner

(参考訳) 温度差の小さい2つの熱浴間での低速駆動型メソとマイクロスケールの冷凍機とヒートエンジンの性能解析を行った。一般的なスケーリング引数を用いて,浴槽間の熱リークが完全に抑制された場合に限って,カルノット限界に任意に近づくことができることを示す。その出力は、カルノー極限で二次的にゼロに崩壊する普遍幾何学的境界に従属する。この境界は、駆動プロトコルが適切に最適化され、浴槽間の温度差が駆動周波数でゼロとなる場合、準静的限界において漸近的に飽和する。これらの結果は、明確に定義された断熱応答状態と一般化されたオンサーガー対称性を持つ任意の熱力学的一貫した力学に対して一般的な条件で成り立つ。実例では, 冷却装置として動作するクビット冷凍機とコヒーレントチャージポンプのモデルについて検討する。

We analyze the performance of slowly driven meso- and micro-scale refrigerators and heat engines that operate between two thermal baths with small temperature difference. Using a general scaling argument, we show that such devices can work arbitrarily close to their Carnot limit only if heat-leaks between the baths are fully suppressed. Their power output is then subject to a universal geometric bound that decays quadratically to zero at the Carnot limit. This bound can be asymptotically saturated in the quasi-static limit if the driving protocols are suitably optimized and the temperature difference between the baths goes to zero with the driving frequency. These results hold under generic conditions for any thermodynamically consistent dynamics admitting a well-defined adiabatic-response regime and a generalized Onsager symmetry. For illustration, we work out models of a qubit-refrigerator and a coherent charge pump operating as a cooling device.

翻訳日:2023-02-25 12:50:06 公開日:2022-02-17

# 平衡・高非線形ブール関数の進化構成

Evolving Constructions for Balanced, Highly Nonlinear Boolean Functions ( http://arxiv.org/abs/2202.08743v1 )

ライセンス: Link先を確認

Claude Carlet, Marko Djurasevic, Domagoj Jakobovic, Luca Mariot, Stjepan Picek

(参考訳) バランスの取れた高非線形ブール関数を見つけることは、一般にどの非線形値に到達できるかがわからないという難しい問題である。同時に、進化的計算は特定のブール関数インスタンスの進化に成功しているが、より大きなブール関数サイズに対して容易にスケールできない。実際、より小さなブール関数の進化はほぼ自明であるが、より大きなサイズはますます難しくなり、進化的アルゴリズムは亜最適に機能する。本研究では,遺伝的プログラミング (gp) が高非線形性を持つブール関数のバランスを保った構成を進化させるかどうかを問う。特に興味深いのは、そのような構成はごくわずかしか知られていないことである。以上の結果から,GP はよく一般化される構造,すなわち,複数のテストサイズに必要な関数を見つけることができることがわかった。さらに、GPは異なる構文表現の下で多くの等価な構成を進化させることを示す。興味深いことに、GPによって発見された最も単純な解は、よく知られた間接和構成の特別な場合である。

Finding balanced, highly nonlinear Boolean functions is a difficult problem where it is not known what nonlinearity values are possible to be reached in general. At the same time, evolutionary computation is successfully used to evolve specific Boolean function instances, but the approach cannot easily scale for larger Boolean function sizes. Indeed, while evolving smaller Boolean functions is almost trivial, larger sizes become increasingly difficult, and evolutionary algorithms perform suboptimally. In this work, we ask whether genetic programming (GP) can evolve constructions resulting in balanced Boolean functions with high nonlinearity. This question is especially interesting as there are only a few known such constructions. Our results show that GP can find constructions that generalize well, i.e., result in the required functions for multiple tested sizes. Further, we show that GP evolves many equivalent constructions under different syntactic representations. Interestingly, the simplest solution found by GP is a particular case of the well-known indirect sum construction.
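
The two properties named in this abstract can be checked directly from a truth table. The following is a minimal sketch, not the authors' code: the function names (`walsh_hadamard_spectrum`, `is_balanced`, `nonlinearity`, `truth_table`) are hypothetical, and it relies on the standard definitions that a function is balanced when it outputs 0 and 1 equally often, and that its nonlinearity equals $2^{n-1} - \frac{1}{2}\max_a |W_f(a)|$, where $W_f$ is the Walsh-Hadamard spectrum.

```python
def walsh_hadamard_spectrum(truth_table):
    """Return the Walsh-Hadamard spectrum of a Boolean function.

    truth_table[x] is f(x) in {0, 1}; its length must be a power of two.
    """
    size = len(truth_table)
    if size < 2 or size & (size - 1):
        raise ValueError("truth table length must be a power of two >= 2")
    # Work on the sign function (-1)^f(x).
    spectrum = [1 - 2 * bit for bit in truth_table]
    half = 1
    while half < size:  # in-place butterfly passes
        for start in range(0, size, 2 * half):
            for i in range(start, start + half):
                a, b = spectrum[i], spectrum[i + half]
                spectrum[i], spectrum[i + half] = a + b, a - b
        half *= 2
    return spectrum


def is_balanced(truth_table):
    """True when the function outputs 0 and 1 equally often."""
    return 2 * sum(truth_table) == len(truth_table)


def nonlinearity(truth_table):
    """Hamming distance from the function to the nearest affine function."""
    peak = max(abs(w) for w in walsh_hadamard_spectrum(truth_table))
    return (len(truth_table) - peak) // 2


def truth_table(func, n_inputs):
    """Tabulate func over all n_inputs-bit inputs (bit i of x is input i)."""
    return [func([(x >> i) & 1 for i in range(n_inputs)])
            for x in range(2 ** n_inputs)]


# Illustrative inputs: a linear function and a 4-variable bent function.
linear = truth_table(lambda v: v[0] ^ v[1], 2)
bent = truth_table(lambda v: (v[0] & v[1]) ^ (v[2] & v[3]), 4)
print(is_balanced(linear), nonlinearity(linear))  # True 0
print(is_balanced(bent), nonlinearity(bent))      # False 6
```

The two examples show why the search is hard: the linear function is balanced but has zero nonlinearity, while the bent function reaches the maximum nonlinearity for four variables but is not balanced.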

翻訳日:2023-02-25 12:49:44 公開日:2022-02-17

# 文脈の微分幾何学

Differential Geometry of Contextuality ( http://arxiv.org/abs/2202.08719v1 )

ライセンス: Link先を確認

Sidiney B. Montanhano

(参考訳) 文脈性は、トポロジカルな現象として長い間関連してきた。この研究では、そのような関係は一般化された文脈性というより一般的な枠組みで明らかにされる。主アイデアは、状態、効果、変換を接空間に存在するベクトルとして、非文脈条件を離散閉経路として、ヌル垂直位相を意味する。同様の解釈が2つある。平坦な空間が課される幾何学的あるいは現実的な視点は、文脈の振る舞いが、電磁的テンソルに類似した確率関数の曲率(非自明なホロノミー)と等価になることを意味する; 評価関数の修正として、文脈性と干渉、非可換性、符号付き測度を接続するのに使うことができる。評価関数を保存しなければならない位相的あるいは反現実的視点は、文脈的振る舞いを位相的障害(非自明なモノドロミー)として解釈できることを意味し、文脈性と非埋め込み可能性、一般化されたボロビエフの定理をつなぐのに使うことができる。両方のビューは文脈的分数と関連付けられ、オンティックモデルの乱れは非自明な遷移写像として表現できる。

Contextuality has been related for a long time as a topological phenomenon. In this work, such a relationship is exposed in the more general framework of generalized contextuality. The main idea is to identify states, effects, and transformations as vectors living in a tangent space, and the non-contextual conditions as discrete closed paths implying null vertical phases. Two equivalent interpretations hold. The geometrical or realistic view, where flat space is imposed, implies that the contextual behavior becomes equivalent to the curvature (non-trivial holonomy) of the probabilistic functions, in analogy with the electromagnetic tensor; as a modification of the valuation function, it can be used to connect contextuality with interference, non-commutativity, and signed measures. The topological or anti-realistic view, where the valuation functions must be preserved, implies that the contextual behavior can be translated as topological failures (non-trivial monodromy); it can be used to connect contextuality with non-embeddability and a generalized Voroby'ev theorem. Both views can be related to contextual fraction, and the disturbance in ontic models can be presented as non-trivial transition maps.

翻訳日:2023-02-25 12:49:28 公開日:2022-02-17

# 圧縮二次光学を用いたゼプトニュートン力センシング

Zeptonewton force sensing with squeezed quadratic optomechanics ( http://arxiv.org/abs/2202.08690v1 )

ライセンス: Link先を確認

Sheng-Dian Zhang, Jie Wang, Ya-Feng Jiao, Huilai Zhang, Ying Li, Yun-Lan Zuo, Şahin K. Özdemir, Cheng-Wei Qiu, Franco Nori, Hui Jing

(参考訳) キャビティ・オプトメカニカル (COM) センサは、超微弱な力の測定や暗黒物質探索のための強力なツールとして、これまで主に線形 COM 結合を用いて実装されてきた。ここでは、双安定性がなく機械的エネルギーの正確な測定を可能にする二次 COM システムを用いた量子力センシングを検討する。このシステムは、実験的に実現可能なパラメータで、従来のどの線形 COM センサよりも $7$ 桁高い力感度を達成するように最適化できることが分かった。さらに二次 COM システムとスクイーズ媒質を統合することで、さらに $3$ 桁の向上が得られ、標準量子限界をはるかに下回ってゼプトニュートンレベルに達する可能性がある。これにより、量子非線形 COM センサーを基礎物理学実験や極端な感度を必要とする広範囲のアプリケーションで製造および使用する新たな展望が開かれる。

Cavity optomechanical (COM) sensors, as powerful tools for measuring ultraweak forces or searching for dark matter, have been implemented to date mainly using linear COM couplings. Here, quantum force sensing is explored by using a quadratic COM system which is free of bistability and allows accurate measurement of mechanical energy. We find that this system can be optimized to achieve a force sensitivity $7$ orders of magnitude higher than any conventional linear COM sensor, with experimentally accessible parameters. Further integrating a quadratic COM system with a squeezing medium can lead to another $3$ orders enhancement, well below the standard quantum limit and reaching the zeptonewton level. This opens new prospects of making and using quantum nonlinear COM sensors in fundamental physics experiments and in a wide range of applications requiring extreme sensitivity.

翻訳日:2023-02-25 12:48:33 公開日:2022-02-17

# ボソニック弦理論における熱場二重状態の回路複雑性

Circuit Complexity for Thermofield Double States in Bosonic String Theory ( http://arxiv.org/abs/2202.08663v1 )

ライセンス: Link先を確認

Arshid Shabir, Sanjib Dey, Salman Sajad Wani, Suhail Lone, Seemin Rubab, Mir Faizal

(参考訳) 本稿では、まず光円錐ゲージにおけるボソニック弦理論の熱場二重状態を構築する。次に、コヒーレント-熱的弦状態を取得し、弦理論の回路複雑性を計算する。これは共分散行列法を用いて回路複雑性を計算する。このアプローチでは、水平弦発生器によって最適な測地線を生成し、群多様体における最小測地線の長さを用いて回路複雑性を得る。

In this paper, we first construct thermofield double states for bosonic string theory in the light-cone gauge. We then obtain a coherent-thermal string state and use it to calculate the circuit complexity in string theory. This is done using the covariance matrix approach to calculate the circuit complexity. In this approach, we will generate the optimal geodesics by a horizontal string generator and, then, obtain the circuit complexity using the length of the minimal geodesics in the group manifold.

翻訳日:2023-02-25 12:48:18 公開日:2022-02-17

# エミュレートされた乱流による絡み合った光子の形成

Shaping entangled photons through emulated turbulent atmosphere ( http://arxiv.org/abs/2202.08650v1 )

ライセンス: Link先を確認

Ronen Shekel, Ohad Lib, Yaron Bromberg, Alon Sardas

(参考訳) 大気乱流による散乱は、長い自由空間光リンク、特に絡み合った光子のリンクを作成する際の大きな課題の1つである。古典的な補償法は、本質的に低信号対雑音比と絡み合いの脆弱さのため、絡み合う光子には適用が難しい。我々は近ごろ、自発パラメトリックダウン変換を励起する明るいレーザービームを用いて、絡み合った光子間の空間的相関を制御し、散乱を補償できることを示した。本研究では,大気乱流をエミュレートして散乱する絡み合った光子間の相関関係のスクランブル補正にポンプシェーピング法を適用した。空間光変調器とコルモゴロフの乱流モデルを用いて,ラボ内の大気乱流をエミュレートし,ポンプ最適化による光子絡み込み信号の15倍の精度で増幅する。本研究では, 静的および動的エミュレート雰囲気の両方に対してこれを示し, 高次モードの散乱の補償も示す。この結果は、量子鍵分布などのアプリケーションで用いられる絡み合った光子による自由空間量子リンクを実現するための扉を開くことができる。

Scattering by atmospheric turbulence is one of the main challenges in creating long free-space optical links, and specifically links of entangled photons. Classical compensation methods are hard to apply to entangled photons, due to inherently low signal to noise ratios and the fragility of entanglement. We have recently shown that we can use the bright laser beam that pumps spontaneous parametric down conversion to control the spatial correlations between entangled photons for compensating their scattering. In this work, we apply the pump-shaping technique to compensate for scrambling of correlations between entangled photons that scatter by emulated atmospheric turbulence. We use a spatial light modulator and Kolmogorov's turbulence model to emulate atmospheric turbulence in the lab, and enhance the entangled photons' signal by a factor of fifteen using pump optimization. We show this for both static and dynamic emulated atmosphere, and demonstrate also the compensation of the scattering of a higher-order mode. Our results can open the door towards realizing free-space quantum links with entangled photons, used in applications such as quantum key distribution.

翻訳日:2023-02-25 12:48:11 公開日:2022-02-17

# 新しいHDコンピューティング代数:秩序情報を表すスパースバンドルを生成する状態の非連想的重ね合わせ

A novel HD Computing Algebra: Non-associative superposition of states creating sparse bundles representing order information ( http://arxiv.org/abs/2202.08633v1 )

ライセンス: Link先を確認

Stefan Reimann

(参考訳) 計算システムへの情報流入は、情報項目のシーケンスによって行われる。認知コンピューティング、すなわち、そのシーケンスに沿って変換を実行するには、アイテム情報だけでなく、シーケンシャルな情報も表現する必要がある。最も基本的な操作としては、バンドル、すなわちアイテムの追加、'メモリ状態'、すなわち情報の取得が可能なバンドルなどがある。通常のベクトル付加のような結合演算が連想的であれば、追加の代数構造を含まないシーケンシャル情報は表現できない。神経活動の確率的総和にインスパイアされた単純な確率的バイナリバンドルルールにより、結果として生じる記憶状態は、非連想的である限り、アイテム情報とシーケンシャル情報の両方を表現することができる。任意の数のアイテムを束ねて生じるメモリ状態は不均一であり、和の活性化閾値によって制御される疎さの度合いを有する。提案するバンドル操作は,情報の連続的な流入をナビゲートするために使用できるアイテムのドメインだけでなく,テンポラリにもフィルタを構築することができる。

Information flows into a computational system as a sequence of information items. Cognitive computing, i.e. performing transformations along that sequence, requires representing item information as well as sequential information. Among the most elementary operations is bundling, i.e. adding items, leading to 'memory states', i.e. bundles, from which information can be retrieved. If the bundling operation used is associative, e.g. ordinary vector addition, sequential information cannot be represented without imposing additional algebraic structure. A simple stochastic binary bundling rule inspired by the stochastic summation of neuronal activities allows the resulting memory state to represent both item information and sequential information, as long as it is non-associative. The memory state resulting from bundling together an arbitrary number of items is non-homogeneous and has a degree of sparseness, which is controlled by the activation threshold in summation. The proposed bundling operation allows building a filter in the temporal as well as in the items' domain, which can be used to navigate the continuous inflow of information.
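
The role of associativity can be illustrated with a small sketch. This is not the stochastic binary rule of the paper, which the abstract does not specify; as a labeled stand-in it uses pairwise averaging, a deterministic operation that is commutative but not associative. The names `add`, `average` and `bundle` are hypothetical.

```python
from functools import reduce


def add(a, b):
    """Ordinary vector addition: associative and commutative."""
    return [x + y for x, y in zip(a, b)]


def average(a, b):
    """Pairwise averaging: commutative but NOT associative (stand-in rule)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def bundle(items, op):
    """Fold the item sequence from left to right with the bundling rule."""
    return reduce(op, items)


# Three one-hot items presented in two different orders.
items = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
reversed_items = items[::-1]

# Vector addition yields the same bundle regardless of order.
print(bundle(items, add), bundle(reversed_items, add))

# Averaging weights later items more heavily, so the bundle encodes order.
print(bundle(items, average), bundle(reversed_items, average))
```

With addition, both orders give the same memory state, so the sequence cannot be recovered. With the non-associative stand-in, the most recent item carries the largest weight, which is one simple way a bundle can retain sequential information.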

翻訳日:2023-02-25 12:47:49 公開日:2022-02-17

# 連続対称性の自発的破断によるスケーラブルなスピンスクイーズ

Scalable spin squeezing from spontaneous breaking of a continuous symmetry ( http://arxiv.org/abs/2202.08607v1 )

ライセンス: Link先を確認

Tommaso Comparin, Fabio Mezzacapo, Martin Robert-de-Saint-Vincent, Tommaso Roscilde

(参考訳) 自発的対称性の破れ (SSB) は、熱力学的極限において、秩序パラメータに結合した場が断熱的にオフにされた後でも秩序パラメータの有限の平均値を保持するという、ハミルトニアンの平衡状態の性質である。連続対称性を持つ量子スピンモデルの場合、この断熱過程は対称性生成子、すなわち対称軸に沿った集団スピン成分のゆらぎの抑制も伴うことを示す。$S=1/2$ スピンまたは量子ビットの系では、一方向のゆらぎの抑制と横磁化の持続が組み合わさることでスピンスクイーズが生じる。これはエンタングルメント検出と計測応用の両面で強く求められる量子状態の性質である。U(1) (あるいは SU(2)) 対称性を自発的に破る XXZ モデルの場合に注目し、断熱的に準備された状態がほぼ最小のスピン不確定性を持つこと、これらの状態で達成できる最小の位相不確定性がスピン数 $N$ に対して $N^{-3/4}$ でスケールすること、そしてこのスケーリングが $N$ に線形にスケールする断熱準備時間の後に達成されることを示す。我々の発見は、例えば光格子時計を含む様々な量子多体デバイスにおける強くスピンスクイーズされた状態の断熱的準備への扉を開く。

Spontaneous symmetry breaking (SSB) is a property of Hamiltonian equilibrium states which, in the thermodynamic limit, retain a finite average value of an order parameter even after a field coupled to it is adiabatically turned off. In the case of quantum spin models with continuous symmetry, we show that this adiabatic process is also accompanied by the suppression of the fluctuations of the symmetry generator -- namely, the collective spin component along an axis of symmetry. In systems of $S=1/2$ spins or qubits, the combination of the suppression of fluctuations along one direction and of the persistence of transverse magnetization leads to spin squeezing -- a much sought-after property of quantum states, both for the purpose of entanglement detection as well as for metrological uses. Focusing on the case of XXZ models spontaneously breaking a U(1) (or even SU(2)) symmetry, we show that the adiabatically prepared states have nearly minimal spin uncertainty; that the minimum phase uncertainty that one can achieve with these states scales as $N^{-3/4}$ with the number of spins $N$; and that this scaling is attained after an adiabatic preparation time scaling linearly with $N$. Our findings open the door to the adiabatic preparation of strongly spin-squeezed states in a large variety of quantum many-body devices including e.g. optical lattice clocks.
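
For orientation, the quoted $N^{-3/4}$ scaling can be placed between the two standard reference scalings of quantum metrology. The block below is a hedged comparison, not a result taken from the paper: the labels are ours, and all prefactors are assumed to be of order one.

```latex
\delta\phi_{\rm SQL} \sim N^{-1/2}, \qquad
\delta\phi_{\rm adiabatic} \sim N^{-3/4}, \qquad
\delta\phi_{\rm Heisenberg} \sim N^{-1},
\qquad
\frac{\delta\phi_{\rm adiabatic}}{\delta\phi_{\rm SQL}} \sim N^{-1/4}.
```

Under these assumptions, $N = 10^4$ spins would give a phase uncertainty roughly ten times smaller than the standard quantum limit, since $(10^4)^{-1/4} = 0.1$.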

翻訳日:2023-02-25 12:47:29 公開日:2022-02-17

# ベンチマーク最適化問題を用いた量子アニールとハイブリッドソルバの実験解析

Experimental analysis of quantum annealers and hybrid solvers using benchmark optimization problems ( http://arxiv.org/abs/2202.08939v1 )

ライセンス: Link先を確認

Evangelos Stogiannos and Christos Papalitsas and Theodore Andronikos

(参考訳) 本稿では、D-Wave の量子システムにおけるハミルトン閉路問題 (HCP) と巡回セールスマン問題 (TSP) について検討する。まず、ほとんどのライブラリがベンチマークインスタンスを隣接行列で提示しているという事実に動機付けられて、量子プラットフォームへのベンチマークインスタンスのシームレスかつ自動的な統合を可能にする、HCP および TSP ハミルトニアンの新しい行列定式化を開発した。大規模な実験の結果、いくつかの興味深い結論が得られた。D-Wave の Advantage_system4.1 は、量子ビット利用と解の品質の両方において、Advantage_system1.1 よりも効率的である。また、D-Wave の Hybrid ソルバが、$120$ ノード程度の任意に大きな問題に対しても、QUBO 制約に違反することなく常に有効な解を与えることを実験的に確立する。TSP インスタンスを解くとき、量子アニーラーによって生成される解は、グラフのトポロジーに違反するという意味でしばしば無効である。この問題に対処するために、TSP ハミルトニアンの係数に対する min-max 正規化の使用を提唱する。最後に、HCP と TSP のハミルトニアンを表現するのに必要な制約の正確な数について、詳細な数学的解析を示す。この解析は、不完全グラフのインスタンスの実行が、ほとんどの場合、完全グラフのインスタンスよりも多くの量子ビットを必要とする理由を定量的に説明する。不完全グラフは完全グラフよりも多くの二次制約を必要とすることが判明し、これは一連の実験で裏付けられている。

This paper studies the Hamiltonian Cycle Problem (HCP) and the Traveling Salesman Problem (TSP) on D-Wave's quantum systems. Initially, motivated by the fact that most libraries present their benchmark instances in terms of adjacency matrices, we develop a novel matrix formulation for the HCP and TSP Hamiltonians, which enables the seamless and automatic integration of benchmark instances in quantum platforms. Our extensive experimental tests have led us to some interesting conclusions. D-Wave's Advantage_system4.1 is more efficient than Advantage_system1.1 both in terms of qubit utilization and quality of solutions. We also establish experimentally that D-Wave's Hybrid solvers always provide a valid solution to a problem, without violating the QUBO constraints, even for arbitrarily big problems, of the order of $120$ nodes. When solving TSP instances, the solutions produced by the quantum annealer are often invalid, in the sense that they violate the topology of the graph. To address this issue, we advocate the use of min-max normalization for the coefficients of the TSP Hamiltonian. Finally, we present a thorough mathematical analysis of the precise number of constraints required to express the HCP and TSP Hamiltonians. This analysis explains quantitatively why, almost always, running incomplete graph instances requires more qubits than complete instances. It turns out that incomplete graphs require more quadratic constraints than complete graphs, a fact that has been corroborated by a series of experiments.
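
Min-max normalization itself is a simple rescaling of a set of values into the interval [0, 1]. The sketch below shows only that generic step, under assumptions: the abstract does not say which coefficients of the TSP Hamiltonian are rescaled or how penalty terms are treated, so the example applies it to a hypothetical dictionary of edge weights, and `min_max_normalize` is an illustrative name rather than a D-Wave API.

```python
def min_max_normalize(weights):
    """Rescale the values of a {key: weight} mapping linearly into [0, 1].

    The smallest weight maps to 0.0 and the largest to 1.0. If all weights
    are equal there is no spread to rescale, so every value maps to 0.0.
    """
    values = list(weights.values())
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return {key: 0.0 for key in weights}
    spread = highest - lowest
    return {key: (value - lowest) / spread for key, value in weights.items()}


# Hypothetical edge weights of a three-node TSP instance.
edge_weights = {(0, 1): 10, (1, 2): 20, (0, 2): 30}
print(min_max_normalize(edge_weights))
# {(0, 1): 0.0, (1, 2): 0.5, (0, 2): 1.0}
```

Rescaling preserves the ordering of the weights, so the optimal tour is unchanged. It only compresses the range of the distance terms, which is presumably what makes them easier to balance against the constraint terms of the Hamiltonian.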

翻訳日:2023-02-25 12:41:01 公開日:2022-02-17
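
The min-max normalization mentioned in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say exactly which coefficients are rescaled or how, so this sketch assumes the normalization is applied to the off-diagonal entries of a TSP distance matrix; the function name `min_max_normalize` and the sample matrix are hypothetical.

```python
def min_max_normalize(weights):
    """Rescale the off-diagonal entries of a square weight matrix to [0, 1].

    Assumption: the diagonal (self-distances) is ignored and left at 0.
    """
    n = len(weights)
    values = [weights[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(values), max(values)
    span = hi - lo
    normalized = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # If all weights are equal, map them to 0.0 to avoid dividing by zero.
                normalized[i][j] = (weights[i][j] - lo) / span if span else 0.0
    return normalized


# Hypothetical 3-city distance matrix.
distances = [
    [0, 12, 30],
    [12, 0, 21],
    [30, 21, 0],
]
scaled = min_max_normalize(distances)
```

After rescaling, the shortest edge maps to 0 and the longest to 1, so the distance terms stay on a bounded scale relative to whatever constraint penalties the Hamiltonian uses.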

# ニューロモルフィックアーキテクチャにおけるスパイクニューラルネットワークの実装

Implementing Spiking Neural Networks on Neuromorphic Architectures: A Review ( http://arxiv.org/abs/2202.08897v1 )

ライセンス: Link先を確認

Phu Khanh Huynh, M. Lakshmi Varshika, Ankita Paul, Murat Isik, Adarsha Balaji, Anup Das

(参考訳) 近年,産学ともにスパイキングニューラルネットワーク(snn)を用いて設計された機械学習アプリケーションを実行するために,複数の異なるニューロモルフィックシステムを提案している。設計と技術面での複雑さが増大する中で、このようなシステムを機械学習アプリケーションを受け入れ実行するためのプログラミングはますます困難になりつつある。さらに、ニューロモルフィックシステムはリアルタイムのパフォーマンスを保証し、低エネルギーを消費し、論理やメモリ障害に対する耐性を提供する必要がある。そのため、現在および新興のニューロモルフィックシステム上で機械学習アプリケーションを実装でき、同時にパフォーマンス、エネルギー、信頼性に対処できるシステムソフトウェアフレームワークが明らかに必要である。本稿では,プラットフォームベース設計とハードウェア・ソフトウェア共同設計の両面で提案されているフレームワークの概要を紹介する。我々は,ニューロモルフィックコンピューティングのシステムソフトウェア技術分野における将来が持つ課題と機会を強調する。

Recently, both industry and academia have proposed several different neuromorphic systems to execute machine learning applications that are designed using Spiking Neural Networks (SNNs). With the growing complexity on design and technology fronts, programming such systems to admit and execute a machine learning application is becoming increasingly challenging. Additionally, neuromorphic systems are required to guarantee real-time performance, consume lower energy, and provide tolerance to logic and memory failures. Consequently, there is a clear need for system software frameworks that can implement machine learning applications on current and emerging neuromorphic systems, and simultaneously address performance, energy, and reliability. Here, we provide a comprehensive overview of such frameworks proposed for both, platform-based design and hardware-software co-design. We highlight challenges and opportunities that the future holds in the area of system software technology for neuromorphic computing.

翻訳日:2023-02-25 12:40:11 公開日:2022-02-17

# 量子ハミルトン平均場モデルにおける三重臨界点

Tricritical point in the quantum Hamiltonian mean-field model ( http://arxiv.org/abs/2202.08855v1 )

ライセンス: Link先を確認

Harald Schmid, Johannes Dieplinger, Andrea Solfanelli, Sauro Succi, and Stefano Ruffo

(参考訳) 実験プラットフォームにおける工学的長距離相互作用は、近年、様々な量子システムにおいて大きな成功を収めている。この進展に触発されて、古典ハミルトン平均場モデルのフェルミオン粒子への一般化を提案する。温度・ホッピング関数としての強磁性相互作用の正準アンサンブルにおける模型の位相図と熱力学的性質について検討した。ゼロ温度では、小さな電荷のゆらぎは、ゼロ温度で秩序から乱れた位相への1次量子相遷移を通じて多体系を駆動する。高温では、揺動誘起相転移は最初は第1次であり、三臨界点でのみ第2次に遷移する。本研究は, 長距離結合を持つ量子系において, 直接実験的妥当性を持つ三重臨界性の興味深い例を示す。解析は厳密な対角化と平均場理論によって行われる。

Engineering long-range interactions in experimental platforms has been achieved with great success in a large variety of quantum systems in recent years. Inspired by this progress, we propose a generalization of the classical Hamiltonian mean-field model to fermionic particles. We study the phase diagram and thermodynamic properties of the model in the canonical ensemble for ferromagnetic interactions as a function of temperature and hopping. At zero temperature, small charge fluctuations drive the many-body system through a first order quantum phase transition from an ordered to a disordered phase at zero temperature. At higher temperatures, the fluctuation-induced phase transition remains first order initially and switches to second order only at a tricritical point. Our results offer an intriguing example of tricriticality in a quantum system with long-range couplings, which bears direct experimental relevance. The analysis is performed by exact diagonalization and mean-field theory.

翻訳日:2023-02-25 12:39:38 公開日:2022-02-17

# 量子$\mathbb{Z}_2$格子ゲージ理論を用いたハミルトニアンサイクル問題の解法

Solving Hamiltonian Cycle Problem using Quantum $\mathbb{Z}_2$ Lattice Gauge Theory ( http://arxiv.org/abs/2202.08817v1 )

ライセンス: Link先を確認

Xiaopeng Cui, Yu Shi

(参考訳) グラフ理論におけるハミルトンサイクル(HC)問題は、よく知られたNP完全問題である。我々は、グラフを双対とする格子上で定義される $\mathbb{Z}_2$ 格子ゲージ理論 (LGT) という観点からのアプローチを示す。結合パラメータ $g$ が臨界値 $g_c$ よりも小さい場合、基底状態は、同じシングルスピン状態のスピンの閉文字列を持つ全ての構成の重ね合わせであり、時間複雑性 $O(\frac{1}{g_c^2} \sqrt{ \frac{1}{\varepsilon} N_e^{3/2}(N_v^3 + \frac{N_e}{g_c})})$ の断熱量子アルゴリズムによって得られる。ここで $N_v$ と $N_e$ はそれぞれグラフの頂点と辺の数である。その後の閉文字列間のHCの探索は、HC問題を解く。小さなグラフのランダムな例では、$g_c$ の平均値が $\sqrt{N_{hc}}$ ($N_{hc}$ は HC の数) に対して、また $\frac{1}{g_c}$ の平均値が $N_e$ に対して、いずれも線形に依存することが示されている。したがって、いくつかのグラフではHC問題は多項式時間で解ける可能性が示唆される。また、$g_c$を用いて$N_{hc}$を推論できる量子アルゴリズムについても論じる。

The Hamiltonian cycle (HC) problem in graph theory is a well-known NP-complete problem. We present an approach in terms of $\mathbb{Z}_2$ lattice gauge theory (LGT) defined on the lattice with the graph as its dual. When the coupling parameter $g$ is less than the critical value $g_c$, the ground state is a superposition of all configurations with closed strings of spins in the same single-spin state, which can be obtained by using an adiabatic quantum algorithm with time complexity $O(\frac{1}{g_c^2} \sqrt{ \frac{1}{\varepsilon} N_e^{3/2}(N_v^3 + \frac{N_e}{g_c})})$, where $N_v$ and $N_e$ are the numbers of vertices and edges of the graph respectively. A subsequent search for an HC among those closed strings solves the HC problem. For some random samples of small graphs, we demonstrate that the dependence of the average value of $g_c$ on $\sqrt{N_{hc}}$, $N_{hc}$ being the number of HCs, and that of the average value of $\frac{1}{g_c}$ on $N_e$ are both linear. It is thus suggested that for some graphs, the HC problem may be solved in polynomial time. A possible quantum algorithm using $g_c$ to infer $N_{hc}$ is also discussed.

翻訳日:2023-02-25 12:39:03 公開日:2022-02-17
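
To make the quantity $N_{hc}$ in the abstract above concrete, here is a minimal classical brute-force sketch in Python that counts the Hamiltonian cycles of a small undirected graph. It is not the lattice-gauge-theory algorithm of the paper; the function name `count_hamiltonian_cycles` and the sample graphs are hypothetical, and the factorial running time limits it to very small graphs.

```python
from itertools import permutations


def count_hamiltonian_cycles(n_vertices, edges):
    """Count undirected Hamiltonian cycles by exhaustive search.

    Vertices are assumed to be labelled 0 .. n_vertices - 1.
    """
    if n_vertices < 3:
        return 0
    adjacent = {frozenset(edge) for edge in edges}
    count = 0
    # Fix vertex 0 as the start so rotations of a cycle are not counted twice.
    for rest in permutations(range(1, n_vertices)):
        tour = (0,) + rest + (0,)
        if all(frozenset((a, b)) in adjacent for a, b in zip(tour, tour[1:])):
            count += 1
    # Each cycle was found once per traversal direction.
    return count // 2


complete_k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
path = [(0, 1), (1, 2), (2, 3)]

n_hc_k4 = count_hamiltonian_cycles(4, complete_k4)
n_hc_square = count_hamiltonian_cycles(4, square)
n_hc_path = count_hamiltonian_cycles(4, path)
```

The complete graph on four vertices has three Hamiltonian cycles, the square has one, and the path has none.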

# 学校地理教育のイノベーション技術としての遠隔学習

Distance learning as innovation technology of school geographical education ( http://arxiv.org/abs/2202.08697v1 )

ライセンス: Link先を確認

Myroslav Syvyi, Ordenbek Mazbayev, Olga Varakuta, Natalia Panteleeva and Olga Bondarenko

(参考訳) 本論文は,中等教育における地理分野の学習と教育の過程における革新的技術の利用の必要性を述べる。教育的革新としての遠隔学習、その理論的側面、教育プロセスへの導入方法に特に注意が払われている。新ウクライナ学校における遠隔学習の意義が実証された。その利点と欠点が明らかになる。欧州の要求に応じて地理的能力開発に寄与するいくつかの遠隔学習の例が提供される。この記事は特に、Massive Open Online Courses、モダンウェブサイト、個々の教師の仮想ポータル、LearningApps.orgポータル、Moodleに焦点を当てている。

The article substantiates the necessity of using innovative technologies in the process of studying and teaching geographical disciplines at secondary schools. Particular attention is paid to distance learning as a pedagogical innovation, its theoretical aspects and the ways of its introduction into the educational process. The relevance of using distance learning at the New Ukrainian School is proved. Its advantages and disadvantages are revealed. The examples of some forms of distance learning that will contribute to geographical competence development according to European requirements are provided. The article particularly focuses on the Massive Open Online Courses, modern websites, virtual portals of individual teachers, LearningApps.org portal, and Moodle.

翻訳日:2023-02-19 15:01:47 公開日:2022-02-17

# 農業における知識共有のための情報通信技術イニシアティブ

Information and communication technology initiatives for knowledge sharing in agriculture ( http://arxiv.org/abs/2202.08649v1 )

ライセンス: Link先を確認

Siddhartha Paul Tiwari

(参考訳) 農業における知識共有のための情報通信技術(ICT)の利用状況と動向について調査した。アジア諸国では、日本、韓国、台湾を含む先進ユーザーカテゴリーに次いでインドは第2のカテゴリーに分類される。一方の利益モチベーションとビジネス強化、一方の地域サービスと他方の農村福祉は、インドの農業におけるICTベースモデルの目的である。 ICTによる農業への取り組みは、様々な機関、ヴィズ民間セクター、公共セクター、セルフヘルプグループ、NGOに属しており、複合的な取り組みも含まれている。 eラーニングはますます両分野に傾きつつある (i)キャンパス内又は「プレゼンス」モードで、及び (ii)「距離」モード。その使用は「鉄の三角形」の3つの腕の抑止範囲から徐々に利害関係者を緩和している (i)品質、 (ii)アクセス、及び (iii)コスト。モビリティの低い社会グループは、この教育様式の恩恵を受けることができる。これはまた、ジェンダーの主流化をもたらす強力なツールの1つかもしれない。 eラーニングは「ICT支援学習」と呼ばれるハイブリッドシステムとして、既存の組織・教育構造に統合されている。接続性, コンテンツ開発, インフラ開発, 教員開発, 継続性の評価, ノード3とコンソーシアムの形成等は, 支援, 開発が必要な分野である。

A survey of the status and trends of information and communication technology (ICT) use for knowledge sharing in agriculture was attempted. Among Asian countries, India comes under the second category, next after the advanced user category comprising Japan, South Korea and Taiwan. Both profit motive and business augmentation on the one hand, and community services and rural welfare on the other, have been the objectives of ICT-based models in agriculture in India. The ICT endeavours for agriculture belong to a wide array of agencies, viz. private sector, public sector, self-help groups and NGOs, and also include combined endeavours. e-Learning is being increasingly resorted to in both (i) on-campus or 'presence' mode, and (ii) 'distance' mode. Its use is gradually easing stakeholders out of the stranglehold of the inter-deterrence of the 3 arms of the 'Iron Triangle', viz. (i) quality, (ii) access, and (iii) cost. The social groups having less mobility are poised to benefit more from this mode of education. This could also be one of the potent tools to bring about gender mainstreaming. e-Learning is being integrated into the existing organizational and educational structure as a hybrid system that can be called 'ICT-supported learning'. Connectivity, content development, infrastructure development, faculty development, need assessment on a continuum, linking the nodes, formation of consortia, etc. are the areas identified that need to be supported and developed.

翻訳日:2023-02-19 15:01:36 公開日:2022-02-17

# SNPSFuzzer: スナップショットを使用したステートフルネットワークプロトコルのための高速Greybox Fuzzer

SNPSFuzzer: A Fast Greybox Fuzzer for Stateful Network Protocols using Snapshots ( http://arxiv.org/abs/2202.03643v2 )

ライセンス: Link先を確認

Junqiang Li, Senyi Li, Gang Sun, Ting Chen, and Hongfang Yu

(参考訳) グレイボックスファジングはステートレスプログラムで広く使われており、大きな成功を収めている。しかしながら、ほとんどの最先端のグレーボックスファザは、インタラクションの詳細を記憶し保存できるステートフルネットワークプロトコルプログラムをファザリングするプロセスにおいて、遅い速度と浅い状態の深さのカバレッジの問題を一般的に抱えている。ネットワークプロトコルプログラム用の既存のグレーボックスファッジャは、まず入力メッセージの明確に定義されたプレフィックスシーケンスを送信し、次に変更されたメッセージを送り、ステートフルなネットワークプロトコルのターゲット状態をテストする。上記のプロセスは、高い時間的コストを引き起こす。本稿では、スナップショットを用いたステートフルネットワークプロトコルのための高速グレーボックスファザであるSNPSFuzzerを提案する。 SNPSFuzzerは、ネットワークプロトコルプログラムが特定の状態にあるときにコンテキスト情報をダンプし、状態がファジットされる必要があるときにそれを復元する。さらに,より深いネットワークプロトコル状態を探索するために,メッセージ連鎖解析アルゴリズムを設計する。 SNPSFuzzerは最先端のネットワークプロトコルであるGragbox fuzzer AFLNETと比較して112.0%-168.9%高速化し、24時間以内にパスカバレッジを21.4%-27.5%向上した。さらに、snpsfuzzerは、プログラムtinydtlsで以前に報告されていない脆弱性を公開する。

Greybox fuzzing has been widely used in stateless programs and has achieved great success. However, most state-of-the-art greybox fuzzers suffer from slow speed and shallow state-depth coverage when fuzzing stateful network protocol programs, which are able to remember and store details of the interactions. Existing greybox fuzzers for network protocol programs first send a well-defined prefix sequence of input messages and then send mutated messages to test the target state of a stateful network protocol. This process incurs a high time cost. In this paper, we propose SNPSFuzzer, a fast greybox fuzzer for stateful network protocols using snapshots. SNPSFuzzer dumps the context information when the network protocol program is in a specific state and restores it when the state needs to be fuzzed. Furthermore, we design a message chain analysis algorithm to explore more and deeper network protocol states. Our evaluation shows that, compared with the state-of-the-art network protocol greybox fuzzer AFLNET, SNPSFuzzer increases the speed of network protocol fuzzing by 112.0%-168.9% and improves path coverage by 21.4%-27.5% within 24 hours. Moreover, SNPSFuzzer exposes a previously unreported vulnerability in the program Tinydtls.

翻訳日:2023-02-19 14:45:08 公開日:2022-02-17

# 西イランの方言層--階層的ディリクレ過程の言語関係へのアプローチ

Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships ( http://arxiv.org/abs/2001.05297v4 )

ライセンス: Link先を確認

Chundra Aroor Cathcart

(参考訳) 本稿は、西イラン語の歴史音韻学における、複雑で未解決の一連の問題に対処する。西イランの言語(ペルシア語、クルド語、バラチ語、その他の言語)は、非ラウトゲシュリッヒ的な振る舞いの度合いが高い。しかし、西イラン方言学の文献では、作業プロセスに対する過度に単純化された見解が普及しており、専門家は、特定の非ペルシア語における期待される結果からの逸脱は、ペルシア語の年代学的段階からの語彙的借用によるものであると仮定している。この定性的なアプローチは、データの分布に関する明示的な確率論的推論の欠如から生じる問題的な結論をもたらすことが示されている: ペルシア語は唯一のドナー言語ではないかもしれない; さらに、語彙レベルで借りることが必ずしも不規則性をもたらすメカニズムであるとは限らない。多くの場合、西イランの言語が異なる条件条件下で異なる反射を示す可能性は未検討のままである。我々は、これらの問題を克服し、西イランの音響変化のパターンにおける不規則性の異なる決定要因を分解するために設計された新しいベイズ的アプローチを採用する。提案手法により,西イラン語方言学における特定の音変化の弁証的関連に関する多くの顕著な疑問を予備的に解決することができる。この種の作業の今後の方向性について概説する。

This paper addresses a series of complex and unresolved issues in the historical phonology of West Iranian languages. The West Iranian languages (Persian, Kurdish, Balochi, and other languages) display a high degree of non-Lautgesetzlich behavior. Most of this irregularity is undoubtedly due to language contact; we argue, however, that an oversimplified view of the processes at work has prevailed in the literature on West Iranian dialectology, with specialists assuming that deviations from an expected outcome in a given non-Persian language are due to lexical borrowing from some chronological stage of Persian. It is demonstrated that this qualitative approach yields at times problematic conclusions stemming from the lack of explicit probabilistic inferences regarding the distribution of the data: Persian may not be the sole donor language; additionally, borrowing at the lexical level is not always the mechanism that introduces irregularity. In many cases, the possibility that West Iranian languages show different reflexes in different conditioning environments remains under-explored. We employ a novel Bayesian approach designed to overcome these problems and tease apart the different determinants of irregularity in patterns of West Iranian sound change. Our methodology allows us to provisionally resolve a number of outstanding questions in the literature on West Iranian dialectology concerning the dialectal affiliation of certain sound changes. We outline future directions for work of this sort.

翻訳日:2023-01-11 23:06:43 公開日:2022-02-17

# 有糸分裂カウントのための深層特徴融合

Deep Feature Fusion for Mitosis Counting ( http://arxiv.org/abs/2002.03781v3 )

ライセンス: Link先を確認

Robin Elizabeth Yancey

(参考訳) 米国に住む女性はそれぞれ、浸潤性乳癌を発症する確率は8分の1である。有糸分裂細胞数は、乳がんの攻撃性または品位を評価する最も一般的な検査の1つである。病理組織像は高分解能顕微鏡を用いて細胞数を計測し,病理組織学的に検討する必要がある。残念ながら、これは再現性に乏しい、特に非専門家にとって、徹底的な作業だ。深層学習ネットワークは、これらの関心領域を自動的にローカライズできる医療アプリケーションに適用されている。しかし、これらのリージョンベースのネットワークは、フルイメージCNNが生成するセグメンテーション機能を活用できないため、検出の唯一の方法としてしばしば使用される。そこで提案手法では,rgb画像特徴を持つunetで生成されたセグメンテーション特徴を活用しつつ,オブジェクト検出に高速なrcnnを活用し,mitos-atypia 2014 mitosis counting challengeデータセット上で0.508のf-scoreを実現する。

Each woman living in the United States has about a 1 in 8 chance of developing invasive breast cancer. The mitotic cell count is one of the most common tests to assess the aggressiveness or grade of breast cancer. In this prognosis, histopathology images must be examined by a pathologist using high-resolution microscopes to count the cells. Unfortunately, this can be an exhausting task with poor reproducibility, especially for non-experts. Deep learning networks that can automatically localize these regions of interest have recently been adapted to medical applications. However, these region-based networks lack the ability to take advantage of the segmentation features produced by a full image CNN, which are often used as a sole method of detection. Therefore, the proposed method leverages Faster RCNN for object detection while fusing segmentation features generated by a UNet with RGB image features to achieve an F-score of 0.508 on the MITOS-ATYPIA 2014 mitosis counting challenge dataset, outperforming state-of-the-art methods.

翻訳日:2023-01-05 00:37:27 公開日:2022-02-17

# un-mix: 教師なし視覚表現学習のための画像混合再考

Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning ( http://arxiv.org/abs/2003.05438v5 )

ライセンス: Link先を確認

Zhiqiang Shen and Zechun Liu and Zhuang Liu and Marios Savvides and Trevor Darrell and Eric Xing

(参考訳) 最近の教師なし学習のアプローチでは、同じイメージから2つの"ビュー"を比較して表現を学習する。 2つのビューを区別させることは、教師なしのメソッドが意味のある情報を学習できることを保証するためのコアである。しかし、このようなフレームワークは、2つのビューを生成するのに使用される拡張が不十分な場合、オーバーフィッティングに脆弱な場合があり、トレーニングデータに過度な問題が発生する。この欠点は、微妙な分散ときめ細かい情報を学ぶことを妨げる。そこで本研究では,ラベル空間上の距離概念を教師なし学習に含め,入力データ空間を混合することにより,正対と負対のソフトな類似度をモデルに認識させ,入力空間と損失空間を協調的に扱うことを目的とする。その概念的単純さにもかかわらず、この解 -- 教師なし画像混合(Un-Mix)により、変換された入力と対応する新しいラベル空間からより微妙でより堅牢で一般化された表現を学習できることを示す。 CIFAR-10、CIFAR-100、STL-10、Tiny ImageNet、および一般的な教師なし手法SimCLR、BYOL、MoCo V1&V2、SwaVなどを用いて、広範囲にわたる実験を行った。提案する画像混合とラベル割当戦略は,ベース手法の全く同じハイパーパラメータとトレーニング手順に従って,1～3%の一貫した改善が得られる。コードはhttps://github.com/szq0214/un-mixで公開されている。

Recent unsupervised learning approaches use a siamese-like framework to compare two "views" of the same image for learning representations. Making the two views distinctive is core to guaranteeing that unsupervised methods can learn meaningful information. However, such frameworks are sometimes prone to overfitting if the augmentations used for generating the two views are not strong enough, causing over-confidence on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, we introduce the concept of distance in label space into unsupervised learning and make the model aware of the soft degree of similarity between positive or negative pairs by mixing the input data space, so that the input and loss spaces work collaboratively. Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet with popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, SwAV, etc. Our proposed image mixture and label assignment strategy can obtain consistent improvement by 1~3% following exactly the same hyperparameters and training procedures of the base methods. Code is publicly available at https://github.com/szq0214/Un-Mix.

翻訳日:2022-12-24 14:33:02 公開日:2022-02-17

# 最も近い隣のディリクレ混合物

Nearest Neighbor Dirichlet Mixtures ( http://arxiv.org/abs/2003.07953v3 )

ライセンス: Link先を確認

Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson

(参考訳) ベイズの密度推定法には、未知の密度を核の混合として特徴づける豊富な文献がある。このような手法は、様々な密度に適応しながら、推定において不確実な定量化を提供するという点で有利である。しかし、頻繁な局所適応型カーネル法と比較して、ベイズ的アプローチはマルコフ連鎖モンテカルロアルゴリズムに依存して実装するのが遅く不安定である。計算上の欠点を伴わずにベイズアプローチの強みのほとんどを維持するため,近接-ディリクレ混合系のクラスを提案する。このアプローチは、データを標準アルゴリズムに基づいて近隣にグループ化することから始まります。各近傍では、密度は、未知のパラメータを持つガウスのようなベイズパラメトリックモデルによって特徴づけられる。これらの局所カーネルの重みの前にディリクレを割り当てると、重みとカーネルパラメータの擬似ポストリプタが得られる。単純で恥ずかしい並列なモンテカルロアルゴリズムは、未知の密度の擬似後続体からサンプリングするために提案される。望ましい漸近的性質を示し,シミュレーション研究で評価し,分類の文脈における動機付けデータセットに適用する。

There is a rich literature on Bayesian methods for density estimation, which characterize the unknown density as a mixture of kernels. Such methods have advantages in terms of providing uncertainty quantification in estimation, while being adaptive to a rich variety of densities. However, relative to frequentist locally adaptive kernel methods, Bayesian approaches can be slow and unstable to implement in relying on Markov chain Monte Carlo algorithms. To maintain most of the strengths of Bayesian approaches without the computational disadvantages, we propose a class of nearest neighbor-Dirichlet mixtures. The approach starts by grouping the data into neighborhoods based on standard algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels, we obtain a pseudo-posterior for the weights and kernel parameters. A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density. Desirable asymptotic properties are shown, and the methods are evaluated in simulation studies and applied to a motivating data set in the context of classification.

翻訳日:2022-12-22 21:57:53 公開日:2022-02-17
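
The neighborhood-then-kernel construction described in the abstract above can be sketched in a few lines of Python. This is a heavily simplified, one-dimensional illustration under stated assumptions, not the authors' method: each neighborhood's Gaussian uses plug-in mean and variance estimates instead of a Bayesian pseudo-posterior, and the kernel weights are drawn from a symmetric Dirichlet with a hypothetical concentration `alpha`. All function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def nn_mixture_density_draw(data, k, rng, alpha=1.0):
    """Return one random density built from k-nearest-neighbour Gaussian kernels."""
    kernels = []
    for x in data:
        # Neighbourhood of x: its k closest points (x itself included).
        neighbours = sorted(data, key=lambda y: abs(y - x))[:k]
        mean = sum(neighbours) / k
        # Plug-in variance with a small floor; a simplification of the Bayesian model.
        var = sum((y - mean) ** 2 for y in neighbours) / k + 1e-6
        kernels.append((mean, var))
    # Dirichlet(alpha, ..., alpha) weights via normalised Gamma variates.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in kernels]
    total = sum(gammas)
    weights = [g / total for g in gammas]

    def density(t):
        return sum(w * gaussian_pdf(t, m, v) for w, (m, v) in zip(weights, kernels))

    return density


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
# Each draw is independent of the others, so this loop could run in parallel.
draws = [nn_mixture_density_draw(data, k=20, rng=rng) for _ in range(20)]


def estimate(t):
    """Average of the sampled densities at point t."""
    return sum(d(t) for d in draws) / len(draws)
```

Averaging the draws gives a point estimate of the density, and the spread across draws gives a rough sense of uncertainty at each point.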

# neural loop combiner: ループの互換性を評価するニューラルネットワークモデル

Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops ( http://arxiv.org/abs/2008.02011v2 )

ライセンス: Link先を確認

Bo-Yu Chen, Jordan B. L. Smith, Yi-Hsuan Yang

(参考訳) ループを使用する音楽プロデューサーは数千のループライブラリにアクセスできますが、互換性のある曲を見つけるのは時間を要するプロセスです。 AutoMashUpperのような互換性を推定する最先端システムはほとんどルールベースであり、機械学習によって改善される可能性がある。モデルをトレーニングするには、真理互換値の大きいループのセットが必要です。このようなデータセットは存在せず、既存の音楽からループを抽出して互換ループの正の例を得、負の例を選択するための様々な戦略を提案し比較する。このデータを用いて、我々はループの互換性を推定するための2種類のモデルアーキテクチャを調査する。1つは、シームズネットワークに基づくもので、もう1つは純粋な畳み込みニューラルネットワーク(CNN)である。我々は,各モデルが提案する組み合わせの質を評価するユーザスタディを行い,CNNがシームズネットワークを上回っていることを確認した。どちらのモデルベースアプローチもルールベースのアプローチよりも優れています。モデルとデータセットを構築するためのコードをオープンソース化しました。

Music producers who use loops may have access to thousands in loop libraries, but finding ones that are compatible is a time-consuming process; we hope to reduce this burden with automation. State-of-the-art systems for estimating compatibility, such as AutoMashUpper, are mostly rule-based and could be improved on with machine learning. To train a model, we need a large set of loops with ground truth compatibility values. No such dataset exists, so we extract loops from existing music to obtain positive examples of compatible loops, and propose and compare various strategies for choosing negative examples. For reproducibility, we curate data from the Free Music Archive. Using this data, we investigate two types of model architectures for estimating the compatibility of loops: one based on a Siamese network, and the other a pure convolutional neural network (CNN). We conducted a user study in which participants rated the quality of the combinations suggested by each model, and found the CNN to outperform the Siamese network. Both model-based approaches outperformed the rule-based one. We have open-sourced the code for building the models and the dataset.

翻訳日:2022-11-02 19:06:59 公開日:2022-02-17

# 超次元計算の理論的展望

A Theoretical Perspective on Hyperdimensional Computing ( http://arxiv.org/abs/2010.07426v3 )

ライセンス: Link先を確認

Anthony Thomas, Sanjoy Dasgupta, Tajana Rosing

(参考訳) 超次元(hyperdimensional, hd)コンピューティングは、高次元、低精度、分散したデータの表現を得るための、神経にインスパイアされた一連の方法である。これらの表現は、様々な情報処理タスクに影響を及ぼす、単純で神経学的に妥当なアルゴリズムと組み合わせることができる。 HDコンピューティングは最近、学習問題を解決するためのエネルギー効率、低レイテンシ、ノイズローバストツールとして、コンピュータハードウェアコミュニティから大きな関心を集めている。本稿では,HDコンピューティングの理論的基礎を統一的に扱うとともに,学習における表現の適合性に焦点をあてる。

Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this review, we present a unified treatment of the theoretical foundations of HD computing with a focus on the suitability of representations for learning.

翻訳日:2022-10-07 14:05:38 公開日:2022-02-17
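
As a small illustration of the kind of representation the abstract above refers to, the following Python sketch uses random bipolar (+1/-1) hypervectors with the binding and bundling operations that are common in the HD computing literature. The abstract itself does not name these operations, so the choice of bipolar vectors, elementwise-product binding, majority-vote bundling, and the dimension `DIM` are assumptions made for this example.

```python
import random

DIM = 10_000  # hypothetical dimensionality; HD methods typically use thousands


def random_hypervector(rng):
    """Draw a random bipolar vector; two such vectors are nearly orthogonal."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Elementwise product: associates two vectors and is its own inverse."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Elementwise majority vote (ties resolved to +1): superimposes vectors."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalised dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / DIM


rng = random.Random(0)
a, b, c, unrelated = (random_hypervector(rng) for _ in range(4))

memory = bundle([a, b, c])
sim_member = similarity(memory, a)             # clearly positive
sim_unrelated = similarity(memory, unrelated)  # close to zero
recovered = bind(bind(a, b), b)                # binding with b twice returns a
```

A bundled vector stays measurably similar to each of its members while remaining nearly orthogonal to unrelated vectors, which is what allows simple similarity-based lookup on these representations.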

# 自動運転のためのディープサロゲートQラーニング

Deep Surrogate Q-Learning for Autonomous Driving ( http://arxiv.org/abs/2010.11278v2 )

ライセンス: Link先を確認

Maria Kalweit, Gabriel Kalweit, Moritz Werling, Joschka Boedecker

(参考訳) 実システムへの適用における深層強化学習システムの課題は,環境変化への適応性とw.r.t.計算資源とデータの有効性である。自動運転の学習車線変更行動の適用においては、エージェントは周囲のさまざまな車両を扱う必要がある。さらに、テストドライバは実世界で任意の数のレーン変更を実行できないため、必要なトランジションの数がボトルネックとなる。政治外の環境では、他人の行動を観察することで、タスクの解決に関する追加情報を得ることができる。古典的なRL設定では、この知識は使われていないが、エージェントの値関数をより効率的に学習するために、他のドライバを代理として使用する。本稿では、上記の問題に対処し、必要な運転時間を劇的に短縮するSurrogate Q-learningを提案する。さらに,q関数の置換同変ディープニューラルネットワークアーキテクチャに基づく効率的な実装を提案し,センサ範囲の可変車両の動作値の推定を行う。オープントラヒックシミュレータsumoでは,このアーキテクチャにより,シーン中心体験リプレイと呼ばれる新たなリプレイサンプリング手法が実現され,サロゲートq学習とシーン中心体験リプレイのパフォーマンス評価が可能となった。さらに,本手法は実高Dデータセット上のポリシーを学習することで,実世界のRLシステムの適用性を向上させる。

Challenging problems of deep reinforcement learning systems with regard to the application on real systems are their adaptivity to changing environments and their efficiency w.r.t. computational resources and data. In the application of learning lane-change behavior for autonomous driving, agents have to deal with a varying number of surrounding vehicles. Furthermore, the number of required transitions imposes a bottleneck, since test drivers cannot perform an arbitrary amount of lane changes in the real world. In the off-policy setting, additional information on solving the task can be gained by observing actions from others. While in the classical RL setup this knowledge remains unused, we use other drivers as surrogates to learn the agent's value function more efficiently. We propose Surrogate Q-learning that deals with the aforementioned problems and reduces the required driving time drastically. We further propose an efficient implementation based on a permutation-equivariant deep neural network architecture of the Q-function to estimate action-values for a variable number of vehicles in sensor range. We show that the architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay and evaluate the performance of Surrogate Q-learning and Scene-centric Experience Replay in the open traffic simulator SUMO. Additionally, we show that our methods enhance real-world applicability of RL systems by learning policies on the real highD dataset.

翻訳日:2022-10-05 00:53:26 公開日:2022-02-17

# ハリケーン予報:新しいマルチモーダル機械学習フレームワーク

Hurricane Forecasting: A Novel Multimodal Machine Learning Framework ( http://arxiv.org/abs/2011.06125v3 )

ライセンス: Link先を確認

L\'eonard Boussioux, Cynthia Zeng, Th\'eo Gu\'enais, Dimitris Bertsimas

(参考訳) 本稿では,熱帯性サイクロン強度とトラック予測のための機械学習(ML)フレームワークについて述べる。我々のマルチモーダルフレームワークであるHurricastは、深層学習エンコーダデコーダアーキテクチャで特徴を抽出し、勾配木で予測することで、時空間データと統計データを効率的に組み合わせている。我々は2016-2019年、北大西洋と東太平洋の流域で24時間リードタイムトラックと強度予測を行い、計算中に現在の運用予測モデルに匹敵する平均誤差とスキルを達成できたことを示す。さらに、Hurricastを運用予測コンセンサスモデルに組み込むことは、National Hurricane Centerの公式予測よりも改善され、既存のアプローチと相補的な特性が強調される。まとめると、我々の研究は、異なるデータソースを組み合わせるために機械学習技術を利用することで、熱帯サイクロン予測の新しい機会がもたらされることを示した。

This paper describes a novel machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple ML techniques and utilizing diverse data sources. Our multimodal framework, called Hurricast, efficiently combines spatial-temporal data with statistical data by extracting features with deep-learning encoder-decoder architectures and predicting with gradient-boosted trees. We evaluate our models in the North Atlantic and Eastern Pacific basins over 2016-2019 for 24-hour lead time track and intensity forecasts and show they achieve comparable mean average error and skill to current operational forecast models while computing in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model could improve over the National Hurricane Center's official forecast, thus highlighting the complementary properties with existing approaches. In summary, our work demonstrates that utilizing machine learning techniques to combine different data sources can lead to new opportunities in tropical cyclone forecasting.

翻訳日:2022-09-26 23:40:59 公開日:2022-02-17

# (参考訳) モバイルクラウドセンシングにおける偽タスク防止のためのdeepnnを用いた協調的自己組織化マップ

Collaborative Self Organizing Map with DeepNNs for Fake Task Prevention in Mobile Crowdsensing ( http://arxiv.org/abs/2203.12434v1 )

ライセンス: CC BY 4.0

Murat Simsek, Burak Kantarci, Azzedine Boukerche

(参考訳) モバイルクラウドセンシング(MCS)は、さまざまなサービスプロバイダがデータを収集し、処理し、分析する方法を変革したセンシングパラダイムである。 mcsは、最先端技術のための様々なアプリケーションやサービスをサポートするために、ユーザのモバイルデバイスを通じてデータがセンシングされ共有される、新しいプロセスを提供する。しかし、データ中毒、詰まったタスクアタック、偽のセンシングタスクといった様々な脅威は、mcsシステム、特にそのセンシングと計算能力のパフォーマンスに悪影響を及ぼす。フェイクセンシングタスクの提出は、正当なタスクとモバイルデバイスリソースの完成を目標としているため、mcsプラットフォームリソースも排除している。本研究では、教師なしでトレーニングされたニューラルネットワークであるSelf Organizing Feature Map(SOFM)を用いて、データセット内の正当データを事前クラスタリングすることにより、新しいデータセットにおいて正当/偽タスク比率が低い不均衡なデータにより、偽タスクをより効果的に検出することができる。クラスタ化された正規タスクが元のデータセットから分離された後、残りのデータセットを使用して、最終的なパフォーマンス目標に到達するためのDeep Neural Network(DeepNN)をトレーニングする。提案手法の性能向上のために,DeepNNの正の予測出力に,事前クラスタ化された正規タスクを付加し,事前クラスタ化されたDeepNN(PrecDeepNN)と呼ぶ。その結果、DeepNNから得られた正当性と偽のタスクを、選択した特徴セットで識別する初期平均精度が、提案した機械学習技術から得られる平均精度0.9812まで向上できることが証明された。

Mobile Crowdsensing (MCS) is a sensing paradigm that has transformed the way that various service providers collect, process, and analyze data. MCS offers novel processes where data is sensed and shared through mobile devices of the users to support various applications and services for cutting-edge technologies. However, various threats, such as data poisoning, clogging task attacks and fake sensing tasks adversely affect the performance of MCS systems, especially their sensing, and computational capacities. Since fake sensing task submissions aim at the successful completion of the legitimate tasks and mobile device resources, they also drain MCS platform resources. In this work, Self Organizing Feature Map (SOFM), an artificial neural network that is trained in an unsupervised manner, is utilized to pre-cluster the legitimate data in the dataset, thus fake tasks can be detected more effectively through less imbalanced data where legitimate/fake tasks ratio is lower in the new dataset. After pre-clustered legitimate tasks are separated from the original dataset, the remaining dataset is used to train a Deep Neural Network (DeepNN) to reach the ultimate performance goal. Pre-clustered legitimate tasks are appended to the positive prediction outputs of DeepNN to boost the performance of the proposed technique, which we refer to as pre-clustered DeepNN (PrecDeepNN). The results prove that the initial average accuracy to discriminate the legitimate and fake tasks obtained from DeepNN with the selected set of features can be improved up to an average accuracy of 0.9812 obtained from the proposed machine learning technique.

翻訳日:2022-03-27 14:02:42 公開日:2022-02-17

# (参考訳) GEMA: 自己組織化マップのためのオープンソースのPythonライブラリ

GEMA: An open-source Python library for self-organizing-maps ( http://arxiv.org/abs/2203.13190v1 )

ライセンス: CC BY 4.0

Alvaro J. Garcia-Tejedor, Alberto Nogales

(参考訳) 組織はデータ分析の重要性とそのメリットを認識した。この機械学習アルゴリズムと組み合わせることで、問題をより容易に解決することが可能になり、これらのプロセスの時間が短縮される。ニューラルネットワークは、最近非常に良い結果を得た機械学習技術である。本稿では、自己組織化マップと呼ばれるニューラルネットワークモデルを扱うために開発された、GEMAと呼ばれるオープンソースのPythonライブラリについて述べる。 GEMAはGitHubのGNU General Public License(https://github.com/ufvceiec/GEMA)の下で無料で利用できる。ライブラリは特定のユースケースで評価され、正確な結果が得られる。

Organizations have realized the importance of data analysis and its benefits. Combined with Machine Learning algorithms, this has made it possible to solve problems more easily, making these processes less time-consuming. Neural networks are a Machine Learning technique that has recently been obtaining very good results. This paper describes an open-source Python library called GEMA developed to work with a type of neural network model called Self-Organizing-Maps. GEMA is freely available under the GNU General Public License at GitHub (https://github.com/ufvceiec/GEMA). The library has been evaluated in a particular use case, obtaining accurate results.

翻訳日:2022-03-27 13:51:51 公開日:2022-02-17
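
For readers unfamiliar with the model the abstract above refers to, here is a minimal self-organizing map written in plain Python. It does not use or reproduce GEMA's API, which is not described in the abstract; the function names, the one-dimensional chain of units, and the linear decay schedules for the learning rate and neighbourhood width are all assumptions made for this sketch.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)),
    )


def train_som(data, n_units, epochs, rng, lr0=0.5):
    """Train a 1-D chain of units with a Gaussian neighbourhood function."""
    dim = len(data[0])
    sigma0 = n_units / 2.0
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            decay = 1.0 - step / total_steps
            lr = lr0 * decay
            sigma = max(sigma0 * decay, 0.5)
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Units near the winner on the chain are pulled more strongly.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[u][d] += lr * h * (x[d] - weights[u][d])
            step += 1
    return weights


rng = random.Random(0)
# Two hypothetical, well-separated clusters in the unit square.
cluster_a = [[0.1 + rng.uniform(-0.05, 0.05), 0.1 + rng.uniform(-0.05, 0.05)] for _ in range(30)]
cluster_b = [[0.9 + rng.uniform(-0.05, 0.05), 0.9 + rng.uniform(-0.05, 0.05)] for _ in range(30)]
weights = train_som(cluster_a + cluster_b, n_units=4, epochs=20, rng=rng)

unit_a = best_matching_unit(weights, [0.1, 0.1])
unit_b = best_matching_unit(weights, [0.9, 0.9])
```

After training, points from the two clusters are expected to map to different units on the chain, which is the clustering behaviour that makes these maps useful for exploratory data analysis.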

# fexgan-meta : メタヒトによる表情生成

FExGAN-Meta: Facial Expression Generation with Meta Humans ( http://arxiv.org/abs/2203.05975v1 )

ライセンス: Link先を確認

J. Rafid Siddiqui

(参考訳) 人間の表情の微妙さと、人間の表情が表現する強度の度合いの変動は、表情のイメージを頑健に分類し、生成することを困難にしている。高品質なデータの欠如は、ディープラーニングモデルのパフォーマンスを阻害する可能性がある。本稿では,メタヒトの表情にロバストに作用するメタヒト(fexgan-meta)の表情生成法を提案する。スタジオ環境に配置した10人のメタヒューマンが提示した表情の大規模なデータセットを作成し,FExGAN-Metaを画像上で評価した。以上の結果から,FExGAN-MetaはMeta-Humansの画像と複雑な表情を強く生成し,分類する。

The subtleness of human facial expressions and a large degree of variation in the level of intensity to which a human expresses them is what makes it challenging to robustly classify and generate images of facial expressions. Lack of good quality data can hinder the performance of a deep learning model. In this article, we have proposed a Facial Expression Generation method for Meta-Humans (FExGAN-Meta) that works robustly with the images of Meta-Humans. We have prepared a large dataset of facial expressions exhibited by ten Meta-Humans when placed in a studio environment and then we have evaluated FExGAN-Meta on the collected images. The results show that FExGAN-Meta robustly generates and classifies the images of Meta-Humans for the simple as well as the complex facial expressions.

翻訳日:2022-03-20 23:08:17 公開日:2022-02-17

# (参考訳) 不均一コンピューティングにおけるテキスト分類のための量子時間畳み込み学習

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing ( http://arxiv.org/abs/2203.03550v1 )

ライセンス: CC BY-SA 4.0

Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen

(参考訳) 量子コンピューティングの急速な発展は、よりリッチな特徴表現やモデルパラメータのよりセキュアな保護など、量子アドバンテージの多くの特徴を示している。本研究は,変分量子回路に基づく垂直連合学習アーキテクチャを提案し,テキスト分類のための量子強化事前学習BERTモデルの競争性能を実証する。特に,提案するハイブリッド古典量子モデルは,BERTベースのデコーダのいくつかの層を置き換える新しいランダム量子時間畳み込み(QTC)学習フレームワークで構成されている。意図分類実験により,提案したBERT-QTCモデルは,SnipsおよびATIS音声言語データセットにおいて競争力のある実験結果を達成した。特にBERT-QTCは、2つのテキスト分類データセットにおける既存の量子回路ベースの言語モデルの性能を、相対的にそれぞれ1.57%および1.52%向上させた。さらにBERT-QTCは、既存の商用アクセス可能な量子計算ハードウェアとCPUベースのインターフェースの両方にデプロイ可能で、データの分離を保証することができる。

The rapid development of quantum computing has demonstrated many unique characteristics of quantum advantages, such as richer feature representation and more secured protection on model parameters. This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. In particular, our proposed hybrid classical-quantum model consists of a novel random quantum temporal convolution (QTC) learning framework replacing some layers in the BERT-based decoder. Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets. Particularly, the BERT-QTC boosts the performance of the existing quantum circuit-based language model in two text classification datasets by 1.57% and 1.52% relative improvements. Furthermore, BERT-QTC can be feasibly deployed on both existing commercial-accessible quantum computation hardware and CPU-based interface for ensuring data isolation.

翻訳日:2022-03-13 17:10:33 公開日:2022-02-17

# 「Beach」から「Bitch」へ: YouTubeにおける子ども向けコンテンツの意図しない不適切な書き起こし

'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube ( http://arxiv.org/abs/2203.04837v1 )

ライセンス: Link先を確認

Krithika Ramesh, Ashiqur R. KhudaBukhsh, Sumeet Kumar

(参考訳) ここ数年、YouTube Kidsは子供向けエンタテインメントにおいてテレビに対する競争力の高い代替手段の1つとして登場してきた。その結果、YouTube Kidsのコンテンツは、子供の安全を確保するためにさらなるレベルの精査を受けるべきである。子どもにとって攻撃的または不適切なコンテンツを検出する研究が勢いを増しているが、AIアプリケーションが子どもに不適切なコンテンツを(意図せず)どの程度導入しうるかを調査する研究は、ほとんど、あるいは全く存在しない。本稿では,よく知られた自動音声認識(ASR)システムが,YouTube Kidsのビデオを書き起こす際に,子供にとって非常に不適切なテキストコンテンツを生成しうるという,新しい(そして憂慮すべき)発見を示す。我々はこの現象を『不適切なコンテンツ幻覚』と呼ぶ。分析の結果,このような幻覚は決して稀ではなく,ASRシステムはしばしば高い信頼度でそれらを生成することが示唆された。我々は,既存の最先端ASRシステムが子どもに不適切なコンテンツを幻覚する音声からなる,この種のものとしては初のデータセットを公開する。さらに,これらのエラーのいくつかは言語モデルを用いて修正できることを実証する。

Over the last few years, YouTube Kids has emerged as one of the highly competitive alternatives to television for children's entertainment. Consequently, YouTube Kids' content should receive an additional level of scrutiny to ensure children's safety. While research on detecting offensive or inappropriate content for kids is gaining momentum, little or no current work exists that investigates to what extent AI applications can (accidentally) introduce content that is inappropriate for kids. In this paper, we present a novel (and troubling) finding that well-known automatic speech recognition (ASR) systems may produce text content highly inappropriate for kids while transcribing YouTube Kids' videos. We dub this phenomenon "inappropriate content hallucination". Our analyses suggest that such hallucinations are far from occasional, and the ASR systems often produce them with high confidence. We release a first-of-its-kind data set of audios for which the existing state-of-the-art ASR systems hallucinate inappropriate content for kids. In addition, we demonstrate that some of these errors can be fixed using language models.

翻訳日:2022-03-13 14:01:18 公開日:2022-02-17

# (参考訳) 連続時間イベント系列の効率的な検索のための時間点過程の学習

Learning Temporal Point Processes for Efficient Retrieval of Continuous Time Event Sequences ( http://arxiv.org/abs/2202.11485v1 )

ライセンス: CC BY 4.0

Vinayak Gupta and Srikanta Bedathur and Abir De

(参考訳) MTPPを用いた予測モデリングの最近の進歩は、連続時間イベントシーケンス(CTES)を含む実世界のいくつかの応用の正確な評価を可能にしている。しかし、これらの配列の検索問題は文献にはほとんど見当たらない。そこで本研究では,あるクエリシーケンスに対して,関連する一連の連続時間イベントシーケンスの検索とランク付けを学習するNEUROSEQRETを提案する。より具体的には、NEUROSEQRETはまず、クエリシーケンスにトレーニング可能なアンウォープ関数を適用し、特に関連するクエリ-コーパスペアが個々の属性を持つ場合、コーパスシーケンスに匹敵する。次に、未処理のクエリシーケンスとコーパスシーケンスをMTPP誘導神経関連モデルにフィードする。精度と効率のトレードオフを提供する関係モデルの2つの変種を開発する。また、局所性に敏感なハッシュに適合し、与えられたクエリシーケンスに対してトップK結果を返す際の大幅な高速化につながるバイナリシーケンスの埋め込みを、関連スコアから学習するための最適化フレームワークを提案する。いくつかのデータセットを用いた実験では、NEUROSEQRETの精度がいくつかのベースラインを超えて向上し、ハッシュ機構の有効性が示された。

Recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving continuous-time event sequences (CTESs). However, the retrieval problem of such sequences remains largely unaddressed in literature. To tackle this, we propose NEUROSEQRET which learns to retrieve and rank a relevant set of continuous-time event sequences for a given query sequence, from a large corpus of sequences. More specifically, NEUROSEQRET first applies a trainable unwarping function on the query sequence, which makes it comparable with corpus sequences, especially when a relevant query-corpus pair has individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP guided neural relevance models. We develop two variants of the relevance model which offer a tradeoff between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores, suitable for the locality-sensitive hashing leading to a significant speedup in returning top-K results for a given query sequence. Our experiments with several datasets show the significant accuracy boost of NEUROSEQRET beyond several baselines, as well as the efficacy of our hashing mechanism.

翻訳日:2022-02-27 19:23:20 公開日:2022-02-17

# (参考訳) 単一画像超解像法:調査

Single Image Super-Resolution Methods: A Survey ( http://arxiv.org/abs/2202.11763v1 )

ライセンス: CC BY 4.0

Bahattin Can Maral

(参考訳) 同一場面の1つ以上の低解像度観測から高解像度画像を得る過程であるスーパーレゾリューション(sr)は、信号処理分野と画像処理分野の両方において、過去数十年で非常に一般的な研究テーマとなっている。近年の畳み込みニューラルネットワークの発展により、SRアルゴリズムの人気は急上昇し、参入障壁は大幅に低下した。近年、この人気はビデオ処理領域に広がり、リアルタイムに動作するSRモデルの開発期間にまで及んでいる。本稿では,単一画像処理を専門とするSRモデルの比較を行い,それらが長年にわたって様々な目的や形状にどのように取り組んできたのかを考察する。

Super-resolution (SR), the process of obtaining high-resolution images from one or more low-resolution observations of the same scene, has been a very popular topic of research in the last few decades in both signal processing and image processing areas. Due to the recent developments in Convolutional Neural Networks, the popularity of SR algorithms has skyrocketed as the barrier of entry has been lowered significantly. Recently, this popularity has spread into video processing areas to the lengths of developing SR models that work in real-time. In this paper, we compare different SR models that specialize in single image processing and will take a glance at how they evolved to take on many different objectives and shapes over the years.

翻訳日:2022-02-27 19:04:20 公開日:2022-02-17

# (参考訳) 二部グラフによる作業類似性

Occupation similarity through bipartite graphs ( http://arxiv.org/abs/2202.11064v1 )

ライセンス: CC BY 4.0

Pavle Boškoski and Matija Perne and Tjaša Redek and Biljana Mileva Boshkoska

(参考訳) 職業間の類似性は、キャリア決定を行う上で重要な情報である。しかし、単一で統一された職業類似性尺度の概念は、資産というよりはむしろ制限である。この研究の目的は、複数の説明可能な職業類似性尺度を評価し、職業間関係に関する異なる洞察を提供することである。このような尺度は二部グラフの枠組みを用いて導出される。その妥当性は、2012年から2021年の間にスロベニアで発生した45万件以上の転職によって評価される。結果は、いくつかの類似性尺度が妥当であり、異なる実現可能なキャリアパスを示すという仮説を支持する。完全な実装とデータセットの一部は、https://repo.ijs.si/pboskoski/bipartite_job_similarity_code で入手できる。

Similarity between occupations is a crucial piece of information when making career decisions. However, the notion of a single and unified occupation similarity measure is more of a limitation than an asset. The goal of the study is to assess multiple explainable occupation similarity measures that can provide different insights into inter-occupation relations. Several such measures are derived using the framework of bipartite graphs. Their viability is assessed on more than 450,000 job transitions occurring in Slovenia in the period between 2012 and 2021. The results support the hypothesis that several similarity measures are plausible and that they present different feasible career paths. The complete implementation and part of the datasets are available at https://repo.ijs.si/pboskoski/bipartite_job_similarity_code.

翻訳日:2022-02-27 18:49:04 公開日:2022-02-17

# (参考訳) DeepSketch: 重複排除後デルタ圧縮のための新しい機械学習に基づく参照探索手法

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression ( http://arxiv.org/abs/2202.10584v1 )

ライセンス: CC BY 4.0

Jisung Park, Jeoggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu

(参考訳) データセンターの管理コストを最小限に抑える効果的なソリューションとして,ストレージシステムのデータ削減がますます重要になっている。データ削減効率を最大化するため、既存の重複排除後デルタ圧縮技術では、従来のデータ重複排除やロスレス圧縮とともにデルタ圧縮を行う。残念なことに、類似したデータブロックを識別する際の精度が限られているため、既存の手法は最適値よりも大幅に低いデータ削減率しか達成できていない。本稿では,学習からハッシュへの(learning-to-hash)手法を活用してデルタ圧縮の参照探索における精度を高め,データ削減効率の向上を図る,重複排除後デルタ圧縮のための新しい参照探索手法であるDeepSketchを提案する。DeepSketchはディープニューラルネットワークを使用して、データブロックのスケッチ、すなわち他のブロックとの類似性を保存するブロックの近似データシグネチャを生成する。実世界の11のワークロードを用いた評価から,DeepSketchは最先端の重複排除後デルタ圧縮技術に対して,データ削減率を最大33%(平均21%)向上させることがわかった。

Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center. To maximize data-reduction efficiency, existing post-deduplication delta-compression techniques perform delta compression along with traditional data deduplication and lossless compression. Unfortunately, we observe that existing techniques achieve significantly lower data-reduction ratios than the optimal due to their limited accuracy in identifying similar data blocks. In this paper, we propose DeepSketch, a new reference search technique for post-deduplication delta compression that leverages the learning-to-hash method to achieve higher accuracy in reference search for delta compression, thereby improving data-reduction efficiency. DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block that can preserve similarity with other blocks. Our evaluation using eleven real-world workloads shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique.

翻訳日:2022-02-27 18:35:53 公開日:2022-02-17

# (参考訳) 概念の認識と音楽のテーマの認識。量子意味解析

Recognizing Concepts and Recognizing Musical Themes. A Quantum Semantic Analysis ( http://arxiv.org/abs/2202.10941v1 )

ライセンス: CC BY 4.0

Maria Luisa Dalla Chiara, Roberto Giuntini, Eleonora Negri, Giuseppe Sergioli

(参考訳) 過去の経験に基づいて、抽象概念や音楽テーマはどのように認識されているか? この問題に関して、人間と人工知能の異なる行動を比較することは興味深い。一般に、ある概念(例えば、表)を既知の例のセットから抽象化する人間の心は、テーブル・ゲシュタルト(table-gestalt)を生成する。同様の状況は、音楽的なテーマの場合には生じる。人間の心にとって非常に自然なゲシュタルティックなパターンの構築は、知的機械に教えられるのだろうか? この問題は、パターン認識と機械学習への量子アプローチの枠組みにおいて、うまく議論することができる。基本的な考え方は、古典的なデータセットを量子データセットに置き換えることであり、オブジェクトまたは音楽のテーマは、量子世界の特徴となる不確実性と曖昧さを含む量子情報の断片として形式的に表現することができる。この枠組みでは、ジェスタルトの直感的な概念は、与えられた量子データセットの正中心の数学的概念によってシミュレートすることができる。したがって、「新しい物体や新しい音楽の主題を以前の経験に基づいてどのように分類できるか」という重要な問題は、いくつかの特別な量子類似性関係の観点から扱うことができる。認識手順は人間と人工知能では異なるが、どちらの場合においても「問題に直面する」方法が一般的である。

How are abstract concepts and musical themes recognized on the basis of some previous experience? It is interesting to compare the different behaviors of human and of artificial intelligences with respect to this problem. Generally, a human mind that abstracts a concept (say, table) from a given set of known examples creates a table-Gestalt: a kind of vague and out of focus image that does not fully correspond to a particular table with well determined features. A similar situation arises in the case of musical themes. Can the construction of a gestaltic pattern, which is so natural for human minds, be taught to an intelligent machine? This problem can be successfully discussed in the framework of a quantum approach to pattern recognition and to machine learning. The basic idea is replacing classical data sets with quantum data sets, where either objects or musical themes can be formally represented as pieces of quantum information, involving the uncertainties and the ambiguities that characterize the quantum world. In this framework, the intuitive concept of Gestalt can be simulated by the mathematical concept of positive centroid of a given quantum data set. Accordingly, the crucial problem "how can we classify a new object or a new musical theme (we have listened to) on the basis of a previous experience?" can be dealt with in terms of some special quantum similarity-relations. Although recognition procedures are different for human and for artificial intelligences, there is a common method of "facing the problems" that seems to work in both cases.

翻訳日:2022-02-27 18:06:47 公開日:2022-02-17

# 変動型神経時相点過程

Variational Neural Temporal Point Process ( http://arxiv.org/abs/2202.10585v1 )

ライセンス: Link先を確認

Deokjun Eom, Sehyun Lee, Jaesik Choi

(参考訳) 時間的ポイントプロセスは、イベントのシーケンスの履歴が与えられたとき、どのイベントが発生するかを予測する確率的プロセスである。日常生活における発生ダイナミクスの様々な例があり、時間的ダイナミクスを訓練し、2つの異なる予測問題、時間とタイプ予測を解決することが重要である。特に、ディープニューラルネットワークベースのモデルは、ホークス過程やポアソン過程のような統計モデルよりも優れている。しかし、既存の多くのアプローチは、さまざまなイベントタイプを学習し予測するのではなく、特定のイベントに適合する。そのため、このような手法は事象間の変化した関係に対処できず、時間点過程の強度関数を予測できなかった。本稿では,これらの問題を解決するために,変動型ニューラルテンポラリポイントプロセス(vntpp)を提案する。本稿では,推論と生成ネットワークを導入し,ディープニューラルネットワークの確率的性質に対処するために潜在変数の分布を訓練する。インテンシティ関数は潜在変数の分布を用いて計算され、イベントタイプやイベントの到着時刻をより正確に予測できる。モデルが様々なイベントタイプの表現を一般化できることを実証的に実証する。さらに,我々のモデルは,合成および実世界のデータセット上で,他のディープニューラルネットワークベースモデルや統計処理よりも優れていることを示す。

A temporal point process is a stochastic process that predicts which type of events is likely to happen and when the event will occur given a history of a sequence of events. There are various examples of occurrence dynamics in the daily life, and it is important to train the temporal dynamics and solve two different prediction problems, time and type predictions. Especially, deep neural network based models have outperformed the statistical models, such as Hawkes processes and Poisson processes. However, many existing approaches overfit to specific events, instead of learning and predicting various event types. Therefore, such approaches could not cope with the modified relationships between events and fail to predict the intensity functions of temporal point processes very well. In this paper, to solve these problems, we propose a variational neural temporal point process (VNTPP). We introduce the inference and the generative networks, and train a distribution of latent variable to deal with stochastic property on deep neural network. The intensity functions are computed using the distribution of latent variable so that we can predict event types and the arrival times of the events more accurately. We empirically demonstrate that our model can generalize the representations of various event types. Moreover, we show quantitatively and qualitatively that our model outperforms other deep neural network based models and statistical processes on synthetic and real-world datasets.

翻訳日:2022-02-27 17:46:08 公開日:2022-02-17
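
The abstract above names Hawkes and Poisson processes as the statistical baselines. As a hedged illustration of what an intensity function is, the sketch below evaluates the standard exponential-kernel Hawkes intensity, lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)). It is not the VNTPP model; the function name and the default values of `mu`, `alpha` and `beta` are assumptions for illustration.

```python
import math


def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process with exponential kernel.

    mu    : baseline rate (with alpha = 0 this reduces to a homogeneous Poisson process)
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    Only events strictly before t contribute.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )


# Illustrative usage: the intensity is elevated right after an event
# and relaxes back toward the baseline as time passes.
events = [0.0, 0.4, 0.5]
for t in (0.6, 1.0, 3.0):
    print(t, hawkes_intensity(t, events))
```

Neural point-process models replace this fixed parametric form with a learned function of the event history, which is the setting the abstract addresses.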

# 知識インフォームド分子学習:パラダイム伝達に関する調査

Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer ( http://arxiv.org/abs/2202.10587v1 )

ライセンス: Link先を確認

Yin Fang, Qiang Zhang, Zhuo Chen, Xiaohui Fan and Huajun Chen

(参考訳) 機械学習、特に深層学習は、生化学領域において非常に進歩した分子研究である。ほとんどの場合、ほとんどの分子タスクのモデリングはいくつかのパラダイムに収束している。例えば、私たちは通常、分子特性予測の課題を解決するために予測パラダイムを採用します。純粋データ駆動モデルの生成と解釈性を改善するため、研究者はこれらのモデルに生化学的ドメイン知識を組み込んで分子研究を行った。この知識の組み入れによりパラダイムトランスファーの傾向が高まり、ある分子学習タスクを別の分子として再構成することで解決している。本稿では,パラダイム伝達の観点からの知識インフォームド分子学習に関する文献レビューを行い,パラダイムの分類,方法論のレビュー,ドメイン知識の貢献度の分析を行う。さらに,その傾向を要約し,分子学習の今後の方向性を指摘する。

Machine learning, especially deep learning, has greatly advanced molecular studies in the biochemical domain. Typically, modeling for most molecular tasks has converged to several paradigms. For example, we usually adopt the prediction paradigm to solve tasks of molecular property prediction. To improve the generation and interpretability of purely data-driven models, researchers have incorporated biochemical domain knowledge into these models for molecular studies. This knowledge incorporation has led to a rising trend of paradigm transfer, which is solving one molecular learning task by reformulating it as another one. In this paper, we present a literature review of knowledge-informed molecular learning from the perspective of paradigm transfer, where we categorize the paradigms, review their methods and analyze how domain knowledge contributes. Furthermore, we summarize the trends and point out interesting future directions for molecular learning.

翻訳日:2022-02-27 17:45:20 公開日:2022-02-17

# ptychographyにおける深部反復位相検索

Deep Iterative Phase Retrieval for Ptychography ( http://arxiv.org/abs/2202.10573v1 )

ライセンス: Link先を確認

Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann

(参考訳) 回折イメージングの分野における最も顕著な課題の1つは位相検索(pr)問題である: 回折パターンから物体を再構築するためには、逆フーリエ変換を計算しなければならない。これは全複素値回折データ、すなわち等級と位相を考えると可能である。しかし、回折イメージングでは、一般的に、位相を見積もる必要がある間に直接等級だけを測定できる。本研究では,複数重なり合った回折画像から物体を再構成する回折イメージングのサブフィールドであるptychographyについて考察する。本稿では,既存の反復位相探索アルゴリズムをニューラルネットワークで拡張し,各繰り返しの結果を精査する手法を提案する。この目的のために、最近提案されたアーキテクチャを音声処理分野から適応し拡張する。評価結果から,提案手法は反復数とアルゴリズム実行時間の両方の観点から,収束率の向上を図っている。

One of the most prominent challenges in the field of diffractive imaging is the phase retrieval (PR) problem: In order to reconstruct an object from its diffraction pattern, the inverse Fourier transform must be computed. This is only possible given the full complex-valued diffraction data, i.e. magnitude and phase. However, in diffractive imaging, generally only magnitudes can be directly measured while the phase needs to be estimated. In this work we specifically consider ptychography, a sub-field of diffractive imaging, where objects are reconstructed from multiple overlapping diffraction images. We propose an augmentation of existing iterative phase retrieval algorithms with a neural network designed for refining the result of each iteration. For this purpose we adapt and extend a recently proposed architecture from the speech processing field. Evaluation results show the proposed approach delivers improved convergence rates in terms of both iteration count and algorithm runtime.

翻訳日:2022-02-27 17:39:12 公開日:2022-02-17

# ヒルベルト空間におけるニューラルネットワークによるフローフォワードのオプション価格付け

Pricing options on flow forwards by neural networks in Hilbert space ( http://arxiv.org/abs/2202.11606v1 )

ライセンス: Link先を確認

Fred Espen Benth, Nils Detering, Luca Galimberti

(参考訳) 本稿では,無限次元ニューラルネットワークを応用したフローフォワードの価格設定手法を提案する。我々は, 正実数直線上の実数値関数のヒルベルト空間において, 項構造ダイナミクスの状態空間であるヒルベルト空間において, 価格問題を最適化問題として再キャストする。この最適化問題は、状態空間上の連続関数を近似するために設計された新しいフィードフォワードニューラルネットワークアーキテクチャを容易にすることで解決される。提案したニューラルネットはヒルベルト空間に基づいて構築される。本研究は, 用語構造曲線のサンプリングを訓練した古典的ニューラルネットよりも優れた数値効率を示す, 広範なケーススタディを提供する。

We propose a new methodology for pricing options on flow forwards by applying infinite-dimensional neural networks. We recast the pricing problem as an optimization problem in a Hilbert space of real-valued function on the positive real line, which is the state space for the term structure dynamics. This optimization problem is solved by facilitating a novel feedforward neural network architecture designed for approximating continuous functions on the state space. The proposed neural net is built upon the basis of the Hilbert space. We provide an extensive case study that shows excellent numerical efficiency, with superior performance over that of a classical neural net trained on sampling the term structure curves.

翻訳日:2022-02-27 17:39:00 公開日:2022-02-17

# 深層学習に基づく降水予測と推定のための効果的なトレーニング戦略

Effective Training Strategies for Deep-learning-based Precipitation Nowcasting and Estimation ( http://arxiv.org/abs/2202.10555v1 )

ライセンス: Link先を確認

Jihoon Ko, Kyuhan Lee, Hyunjin Hwang, Seok-Geun Oh, Seok-Woo Son, Kijung Shin

(参考訳) 深層学習は降水ナウキャスティングにうまく適用されている。本研究では,深層学習に基づくナウキャスティングを改善するための事前学習方式と新しい損失関数を提案する。まず、広く使われている深層学習モデルであるU-Netを、降水ナウキャスティングとレーダ画像からの降水量推定という2つの問題に適用する。前者を3つの降水区間を持つ分類問題、後者を回帰問題として定式化する。これらのタスクに対して, 正解の降水量を必要とせずに近い将来のレーダー画像を予測するようモデルを事前学習することを提案し, また, クラス不均衡問題を緩和するために, 微調整のための新しい損失関数の利用を提案する。韓国で7年間にわたって収集したレーダー画像と降水データセットを用いて,本手法の有効性を実証した。その結果,事前学習方式と新しい損失関数により,5時間のリードタイムにおける豪雨(10mm/hr以上)のナウキャスティングの臨界成功指数(CSI)が,それぞれ最大95.7%,43.6%向上した。また,弱い降雨(1mm/hrから10mm/hr)について,従来の手法に比べて降水量推定誤差が最大で10.7%減少することを示した。最後に, 異なる解像度に対する本手法の感度と, 豪雨の4事例に関する詳細な解析について報告する。

Deep learning has been successfully applied to precipitation nowcasting. In this work, we propose a pre-training scheme and a new loss function for improving deep-learning-based nowcasting. First, we adapt U-Net, a widely-used deep-learning model, for the two problems of interest here: precipitation nowcasting and precipitation estimation from radar images. We formulate the former as a classification problem with three precipitation intervals and the latter as a regression problem. For these tasks, we propose to pre-train the model to predict radar images in the near future without requiring ground-truth precipitation, and we also propose the use of a new loss function for fine-tuning to mitigate the class imbalance problem. We demonstrate the effectiveness of our approach using radar images and precipitation datasets collected from South Korea over seven years. It is highlighted that our pre-training scheme and new loss function improve the critical success index (CSI) of nowcasting of heavy rainfall (at least 10 mm/hr) by up to 95.7% and 43.6%, respectively, at a 5-hr lead time. We also demonstrate that our approach reduces the precipitation estimation error by up to 10.7%, compared to the conventional approach, for light rainfall (between 1 and 10 mm/hr). Lastly, we report the sensitivity of our approach to different resolutions and a detailed analysis of four cases of heavy rainfall.

翻訳日:2022-02-27 17:01:33 公開日:2022-02-17
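
The critical success index (CSI) reported in the abstract above is conventionally defined as hits / (hits + misses + false alarms) for a thresholded event. The sketch below shows that calculation for the heavy-rainfall threshold of 10 mm/hr mentioned in the abstract. The function name and the sample values are made up for illustration, and this is not the authors' evaluation code.

```python
def critical_success_index(predicted, observed, threshold=10.0):
    """CSI for the event 'rain rate >= threshold' (mm/hr).

    hits         : event both predicted and observed
    misses       : event observed but not predicted
    false alarms : event predicted but not observed
    Correct negatives are ignored, which is what makes CSI
    informative for rare events such as heavy rainfall.
    """
    hits = misses = false_alarms = 0
    for p, o in zip(predicted, observed):
        p_event = p >= threshold
        o_event = o >= threshold
        if p_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif p_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # With no predicted or observed events the score is undefined.
    return hits / denom if denom else float("nan")


# Made-up rain rates: 2 hits, 1 miss, 1 false alarm -> CSI = 2 / 4
pred = [12.0, 3.0, 15.0, 0.0, 11.0]
obs = [10.0, 12.0, 2.0, 0.0, 20.0]
print(critical_success_index(pred, obs))  # 0.5
```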

# 正確なバイアスフリー学習のための勾配に基づくアクティベーション

Gradient Based Activations for Accurate Bias-Free Learning ( http://arxiv.org/abs/2202.10943v1 )

ライセンス: Link先を確認

Vinod K Kurmi, Rishabh Sharma, Yash Vardhan Sharma, Vinay P. Namboodiri

(参考訳) 機械学習モデルのバイアス軽減は必須だが、それでも難しい。いくつかのアプローチが提案されているが、バイアスを緩和するための1つの視点は、逆学習である。判別器は、問題となる性別、年齢、人種などのバイアス特性を識別するために用いられる。この判別器は、バイアス特性を識別できないように逆向きに使用される。このようなモデルの主な欠点は、識別器が偏見の識別に敏感であると判断する特徴が分類と相関できるため、精度のトレードオフを直接導入することである。この仕事で私たちはその問題を解決する。このバイアス・精度のトレードオフを改善するためにバイアス付き判別器が実際に使用できることを示す。具体的には、判別器の勾配を用いた特徴マスキングアプローチを用いてこれを実現する。バイアス差別に好まれる特徴が強調され、分類中に偏りのない特徴が強化されることを保証する。この単純なアプローチはバイアスを低減し、精度を大幅に向上するために有効であることを示す。提案モデルを標準ベンチマークで評価する。我々は,不偏性を維持したり改善したりしながら,敵の手法の精度を向上し,また近年の手法よりも優れています。

Bias mitigation in machine learning models is imperative, yet challenging. While several approaches have been proposed, one view towards mitigating bias is through adversarial learning. A discriminator is used to identify the bias attributes in question, such as gender, age or race. This discriminator is used adversarially to ensure that it cannot distinguish the bias attributes. The main drawback of such a model is that it directly introduces a trade-off with accuracy, as the features that the discriminator deems to be sensitive for discrimination of bias could be correlated with classification. In this work we address this problem. We show that a biased discriminator can actually be used to improve this bias-accuracy tradeoff. Specifically, this is achieved by a feature masking approach that uses the discriminator's gradients. We ensure that the features favoured for the bias discrimination are de-emphasized and the unbiased features are enhanced during classification. We show that this simple approach works well to reduce bias as well as improve accuracy significantly. We evaluate the proposed model on standard benchmarks. We improve the accuracy of the adversarial methods while maintaining or even improving the unbiasedness, and also outperform several other recent methods.

翻訳日:2022-02-27 17:01:08 公開日:2022-02-17

# 健康データベースにおける予測モデルに対する不足値のベンチマーク手法

Benchmarking missing-values approaches for predictive models on health databases ( http://arxiv.org/abs/2202.10580v1 )

ライセンス: Link先を確認

Alexandre Perez-Lebel (MNI, MILA, PARIETAL), Gaël Varoquaux (MNI, MILA, PARIETAL), Marine Le Morvan (PARIETAL), Julie Josse (CRISAM, IDESP), Jean-Baptiste Poline (MNI)

(参考訳) 背景: データベースが大きくなるにつれて、その収集を完全にコントロールすることが難しくなり、しばしば欠損値(不完全な観測)を伴う。これらの大きなデータベースは、例えば予測や生物医学分野でのバイオマーカーの抽出など、機械学習モデルをトレーニングするのに適している。このような予測的アプローチは、生成的モデリングではなく識別的モデリングを使用できるため、新たな欠損値戦略への扉を開く。しかし、欠損値を扱う戦略に関する既存の実証的な評価は、推論統計に焦点を当てている。結果: 本稿では,4つの電子健康記録データセット,1つの集団脳イメージングデータセット,1つの健康調査,および2つの集中治療データセットという大規模な健康データベースに焦点を当て,予測モデルにおける欠損値戦略の体系的ベンチマークを行う。勾配ブースティング木を用いて,欠損値のネイティブサポートと,学習前に行う単純な補完および最先端の補完とを比較する。予測精度と計算時間について検討する。補完後の予測では、どの値が補完されたかを表す指標を追加することが重要であり、これはデータがランダムでない欠損であることを示唆する。精巧な欠損値補完は単純な戦略に比べて予測を改善できるが、大規模データではより長い計算時間を必要とする。欠損値をモデル化する木の学習(missing incorporated attributeを用いる)は、頑健で高速かつ高性能な予測モデリングにつながる。結論: 教師付き機械学習における欠損値のネイティブサポートは、最先端の補完よりもはるかに少ない計算コストで、より良い予測を行う。補完を使用する場合には、どの値が補完されたかを表す指標列を追加することが重要である。

BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative -- rather than generative -- modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS: Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain imaging one, a health survey and two intensive care ones. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values -- with missing incorporated attribute -- leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS: Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.

翻訳日:2022-02-27 17:00:24 公開日:2022-02-17
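
The recommendation in the abstract above, to add indicator columns marking which values were imputed, can be sketched as follows. This is a minimal pure-Python illustration using column-mean imputation. The function name is hypothetical, `None` is assumed to mark a missing entry, and it is not the benchmark's code.

```python
def mean_impute_with_indicator(rows):
    """Impute missing entries (None) with the column mean and append 0/1 indicator columns.

    For an input with d columns the output has 2 * d columns:
    the d imputed features followed by d flags (1 = value was missing).
    A column with no observed value at all is filled with 0.0 (an arbitrary choice).
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    out = []
    for r in rows:
        values = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1 if r[j] is None else 0 for j in range(n_cols)]
        out.append(values + flags)
    return out


table = [
    [1.0, None],
    [3.0, 4.0],
    [None, 8.0],
]
for row in mean_impute_with_indicator(table):
    print(row)
```

The indicator columns let a downstream model learn from the missingness pattern itself, which matters when values are missing not at random, as the abstract suggests.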

# MineRL Diamond 2021コンペティション:概要,結果,教訓

MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned ( http://arxiv.org/abs/2202.10583v1 )

ライセンス: Link先を確認

Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman

(参考訳) 強化学習コンペティションは、特定の問題に対する解決策を開発するための適切なスコープと支援を提供することによって、分野を前進させる。より広範に適用可能な手法の開発を促進するためには,主催者は一般的な技術の使用,サンプル効率のよい手法の使用,結果の再現性を課す必要がある。研究コミュニティにとって有益ではあるが、これらの制限は難易度の上昇という代償を伴う。参入障壁が高すぎると、多くの潜在的な参加者が意欲を失う。このことを念頭に置いて、我々はMineRL ObtainDiamondコンペティションの第3版であるMineRL Diamond 2021を開催し、新規参加者の参加を促すために、あらゆる解法を許可する別トラックを設けた。このトラックと、より充実したチュートリアルおよびサポートにより、投稿数が増加した。この容易なトラックの参加者はダイヤモンドを獲得することができ、困難なトラックの参加者は同じタスクにおいて一般化可能な解法を前進させた。

Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restrictions come at a cost -- increased difficulty. If the barrier for entry is too high, many potential participants are demoralized. With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers. With this track and more extensive tutorials and support, we saw an increased number of submissions. The participants of this easier track were able to obtain a diamond, and the participants of the harder track progressed the generalizable solutions in the same task.

翻訳日:2022-02-27 16:59:55 公開日:2022-02-17

# (参考訳) 人間-AIコパイロット最適化による安全運転政策の効率的な学習

Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization ( http://arxiv.org/abs/2202.10341v1 )

ライセンス: CC BY 4.0

Quanyi Li, Zhenghao Peng, Bolei Zhou

(参考訳) 人間の介入は、人間の知識を強化学習のトレーニングループに注入する効果的な方法であり、迅速な学習とトレーニング安全性の確保をもたらす。人間の介入の予算が非常に限られているため、人間のエキスパートがトレーニングで学習エージェントと対話する時期と方法を設計することは依然として困難である。本研究では,Human-AI Copilot Optimization (HACO)と呼ばれる新しいヒューマン・イン・ザ・ループ学習手法を開発した。訓練の安全性を確保しつつ、危険な環境におけるエージェントの十分な探索を可能にするために、人間の専門家は制御を引き継ぎ、危険と思われる状況や自明な行動を避ける方法を示すことができる。提案したHACOは、試行錯誤による探索と人間の部分的なデモンストレーションの両方から得られるデータを、高性能エージェントの訓練に有効に活用する。HACOは、部分的な人間のデモンストレーションからプロキシ状態-アクション値を抽出し、プロキシ値を改善しつつ人間の介入を減らすようにエージェントを最適化する。実験の結果,HACOは安全運転ベンチマークにおいてサンプル効率がかなり高いことがわかった。HACOは、わずかな人的介入予算で未知の交通シナリオを運転するエージェントを訓練し、高い安全性と一般化性を実現し、強化学習と模倣学習の両方のベースラインを大きなマージンで上回る。コードとデモビデオは https://decisionforce.github.io/HACO/ で入手できる。

Human intervention is an effective way to inject human knowledge into the training loop of reinforcement learning, which can bring fast learning and ensured training safety. Given the very limited budget of human intervention, it remains challenging to design when and how the human expert interacts with the learning agent during training. In this work, we develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO). To allow the agent sufficient exploration in risky environments while ensuring training safety, the human expert can take over control and demonstrate how to avoid probably dangerous situations or trivial behaviors. The proposed HACO then effectively utilizes the data from both the trial-and-error exploration and the human's partial demonstration to train a high-performing agent. HACO extracts proxy state-action values from the partial human demonstration and optimizes the agent to improve the proxy values while reducing human interventions. The experiments show that HACO achieves substantially high sample efficiency in the safe driving benchmark. HACO can train agents to drive in unseen traffic scenarios with a small budget of human interventions and achieve high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines by a large margin. Code and demo videos are available at: https://decisionforce.github.io/HACO/.

翻訳日:2022-02-26 22:40:07 公開日:2022-02-17

# (参考訳) 因果グラフィカル正規化フローを用いたグローバル・サウス地域におけるIMFプログラムの子どもの貧困への影響の反事実分析

Counterfactual Analysis of the Impact of the IMF Program on Child Poverty in the Global-South Region using Causal-Graphical Normalizing Flows ( http://arxiv.org/abs/2202.09391v1 )

ライセンス: CC BY 4.0

Sourabh Balgi, Jose M. Peña, Adel Daoud

(参考訳) この研究は、因果推論とディープラーニングモデルの特定の分野である causal-Graphical Normalizing Flows (c-GNFs) の適用例を示す。近年の研究では、正規化フローは特定の特性を持ち、因果解析や反事実解析に特に適していることが示された。しかし、c-GNFsはシミュレーションされたデータでのみテストされており、大規模な実世界のデータへの適用はこれまで評価されていない。本研究は,社会貢献のためのAI(AI for social good)に焦点を当て,c-GNFsを用いて国際通貨基金(IMF)プログラムが子どもの貧困に与える影響を反事実的に分析する。この分析は、グローバル・サウスの67カ国に居住する567,344家族に養育されている18歳未満の子ども1,941,734人という、大規模な実世界の観察データに基づいている。IMFの主な目的は政府の経済安定の達成を支援することであるが、我々の結果は、IMFプログラムが正の副作用として子どもの貧困を約1.2 ± 0.24度減少させることを示している('0' は貧困なし、'7' は最大の貧困を表す)。このように、本稿は、c-GNFsが社会貢献のためのAIにおける深層学習と因果推論の活用をどのように前進させるかを示す。また、集団レベル(ACE)、部分集団レベル(CACE)、個人レベル(ICE)の反事実推論を通じて、学習アルゴリズムが大きな社会的影響をもたらす未開拓の可能性にどのように対処できるかを示す。ACEやCACEをモデル化するがICEは扱わない多くの研究とは対照的に、c-GNFsは「因果推論の第一法則(The First Law of Causal Inference)」を用いてパーソナライズを可能にする。

This work demonstrates the application of a particular branch of causal inference and deep learning models: causal-Graphical Normalizing Flows (c-GNFs). In a recent contribution, scholars showed that normalizing flows carry certain properties, making them particularly suitable for causal and counterfactual analysis. However, c-GNFs have only been tested in a simulated data setting, and no contribution to date has evaluated the application of c-GNFs on large-scale real-world data. Focusing on AI for social good, our study provides a counterfactual analysis of the impact of the International Monetary Fund (IMF) program on child poverty using c-GNFs. The analysis relies on large-scale real-world observational data: 1,941,734 children under the age of 18, cared for by 567,344 families residing in 67 countries from the Global South. While the primary objective of the IMF is to support governments in achieving economic stability, our results find that an IMF program reduces child poverty as a positive side-effect by about 1.2 ± 0.24 degrees ('0' equals no poverty and '7' is maximum poverty). Thus, our article shows how c-GNFs further the use of deep learning and causal inference in AI for social good. It shows how learning algorithms can be used to address the untapped potential for significant social impact through counterfactual inference at the population level (ACE), sub-population level (CACE), and individual level (ICE). In contrast to most works that model ACE or CACE but not ICE, c-GNFs enable personalization using 'The First Law of Causal Inference'.

翻訳日:2022-02-26 22:19:47 公開日:2022-02-17

# 災害後の探索・救助作業における強化学習に基づくUAV基地局軌道最適化

UAV Base Station Trajectory Optimization Based on Reinforcement Learning in Post-disaster Search and Rescue Operations ( http://arxiv.org/abs/2202.10338v1 )

ライセンス: Link先を確認

Shiye Zhao, Kaoru Ota, Mianxiong Dong

(参考訳) 災害のため、地上基地局(TBS)は部分的にクラッシュした。一部のユーザー機器(UE)は保存されていない。無人航空機(UAV)を航空基地局として配置することは、UEを迅速にカバーする方法である。しかし、既存の方法はUAVのカバレッジのみを指す。これらのシナリオでは、すべてのTBSがもはや機能しないディスカスター後の領域におけるUAVの展開に重点を置いている。 TBSとUAVの組み合わせに関する限られた研究がある。本稿では,航空基地局として利用可能なTBSと協調してUAVを配備する手法を提案する。強化学習によってカバー範囲を改善しますさらに,実験では,まず階層構造(BIRCH)を用いて反復還元とクラスタリングのバランスをとるUEをクラスタリングした。最後に、Qラーニングを通じて基地局のUEに対するより良いカバレッジを達成する。

After a disaster, terrestrial base stations (TBSs) may partly fail, leaving some user equipment (UEs) unserved. Deploying unmanned aerial vehicles (UAVs) as aerial base stations is a way to cover UEs quickly. However, existing methods consider UAV coverage only: they focus on deploying UAVs in post-disaster areas where no TBS works any longer. There is limited research on combining the remaining available TBSs with UAVs. We propose a method to deploy UAVs as aerial base stations that cooperate with the available TBSs, and we improve the coverage through reinforcement learning. In the experiments, we first cluster UEs with balanced iterative reducing and clustering using hierarchies (BIRCH). Finally, we achieve better base-station coverage of UEs through Q-learning.

翻訳日:2022-02-23 09:53:51 公開日:2022-02-17
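
The abstract above names Q-learning but does not spell out the update rule. The following is a minimal tabular sketch of a single Q-learning step, not the authors' implementation. The grid-cell state, the movement action set, the reward value, and the names `q_update`, `ACTIONS`, and `q_table` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical setup: the UAV sits on a grid cell and picks a movement action;
# the reward could be, for example, the number of newly covered UEs.
ACTIONS = ("north", "south", "east", "west", "hover")
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(
    q_table, state=(0, 0), action="east", reward=5.0,
    next_state=(1, 0), actions=ACTIONS,
)
```

In a full training loop this step would be repeated over many episodes, typically with an epsilon-greedy action choice; the clustering stage (BIRCH in the abstract) would decide which UE groups the states and rewards refer to.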

# (参考訳) 多パターン乗客予測のためのヒューマンモビリティの探索 : グラフ学習フレームワーク

Exploring Human Mobility for Multi-Pattern Passenger Prediction: A Graph Learning Framework ( http://arxiv.org/abs/2202.10339v1 )

ライセンス: CC BY 4.0

Xiangjie Kong, Kailai Wang, Mingliang Hou, Feng Xia, Gour Karmakar, Jianxin Li

(参考訳) 交通流予測は、インテリジェント交通システムにおいて不可欠な部分であり、様々な交通関連アプリケーションに基礎を置いている。バスは、固定された路線とスケジュールを持つ都市住民にとって必須の移動手段であり、定期運行が遅れる。しかし、この固定移動モードでは、人間の移動パターン、特にバス乗客間の複雑な関係が深く隠されている。交通流の予測には多くのモデルが存在するが、この点に関して人間の移動パターンは十分に研究されていない。この研究のギャップを減らし、この固定走行行動から人間の移動性知識を学習するために、グラフ畳み込みネットワーク(GCN)に基づく多パターンの乗客フロー予測フレームワークMPGCNを提案する。まず,バス記録データに基づいて乗客間の関係をモデル化する新しい共有ストップネットワークを構築する。そこで我々はGCNを用いて,有用なトポロジ情報から特徴を抽出し,バス乗客に隠された移動パターンを認識するディープクラスタリング手法を提案する。さらに, 時空間情報を完全に活用するために, 様々な移動パターンに基づき, gcn2flow を提案する。我々の知る限り、この論文は、グラフ学習からバスの乗客フローを予測するためのマルチパターンアプローチを採用した最初の試みである。経路最適化のためのケーススタディを設計する。実世界のバスデータセットに対する大規模な実験は、MPGCNが乗客フロー予測と経路最適化に潜在的に有効であることを示した。

Traffic flow prediction is an integral part of an intelligent transportation system and thus fundamental for various traffic-related applications. Buses are an indispensable means of travel for urban residents, with fixed routes and schedules, which leads to latent travel regularity. However, human mobility patterns, specifically the complex relationships between bus passengers, are deeply hidden in this fixed mobility mode. Although many models exist to predict traffic flow, human mobility patterns have not been well explored in this regard. To reduce this research gap and learn human mobility knowledge from this fixed travel behavior, we propose a multi-pattern passenger flow prediction framework, MPGCN, based on Graph Convolutional Network (GCN). Firstly, we construct a novel sharing-stop network to model relationships between passengers based on bus record data. Then, we employ GCN to extract features from the graph by learning useful topology information and introduce a deep clustering method to recognize mobility patterns hidden in bus passengers. Furthermore, to fully utilize spatio-temporal information, we propose GCN2Flow to predict passenger flow based on various mobility patterns. To the best of our knowledge, this paper is the first work to adopt a multi-pattern approach to predict bus passenger flow from graph learning. We design a case study for optimizing routes. Extensive experiments upon a real-world bus dataset demonstrate that MPGCN has potential efficacy in passenger flow prediction and route optimization.

翻訳日:2022-02-22 22:11:19 公開日:2022-02-17

# VRL3: ビジュアルディープ強化学習のためのデータ駆動フレームワーク

VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning ( http://arxiv.org/abs/2202.10324v1 )

ライセンス: Link先を確認

Che Wang, Xufang Luo, Keith Ross, Dongsheng Li

(参考訳) 高度に課題の多いdrl(visual deep reinforcement learning)タスクを解決するための,シンプルかつ強力なデータ駆動フレームワークを提案する。我々は、データ駆動アプローチをとる際の多くの大きな障害を分析し、データ駆動型ビジュアルDRLに関する一連の設計原則、トレーニング戦略、重要な洞察を提示します。我々のフレームワークには3つのステージがある: ステージ1では非RLデータセット(例: ImageNet)を使ってタスクに依存しない視覚表現を学習し、ステージ2ではオフラインのRLデータ(例: 限られた数の専門家によるデモンストレーション)を使ってタスクに依存しない表現をより強力なタスク固有の表現に変換する。 sparse reward と real visual input を用いた極めて困難なハンド操作タスクのセットでは,従来の sota 法よりも 370%-1200% 高速に学習し,データ駆動型深層強化学習の可能性を完全に実証するエンコーダを用いた。

We propose a simple but powerful data-driven framework for solving highly challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, training strategies, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of highly challenging hand manipulation tasks with sparse reward and realistic visual inputs, our framework learns 370%-1200% faster than the previous SOTA method while using an encoder that is 50 times smaller, fully demonstrating the potential of data-driven deep reinforcement learning.

翻訳日:2022-02-22 15:56:57 公開日:2022-02-17

# (参考訳) ComParE COVID-19チャレンジの概要

A Summary of the ComParE COVID-19 Challenges ( http://arxiv.org/abs/2202.08981v1 )

ライセンス: CC BY 4.0

Harry Coppock, Alican Akman, Christian Bergler, Maurice Gerczuk, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Jing Han, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Panagiotis Tzirakis, Anton Batliner, Cecilia Mascolo, Björn W. Schuller

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは、人道と経済に大きな被害をもたらした。さまざまな分野の科学者チームが、政府やコミュニティが病気と戦うのを助ける方法を模索している。研究されている機械学習分野からの1つの道は、感染した人の呼吸音から新型コロナウイルスを検出するデジタル質量テストの展望である。我々は,InterSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS)の結果の概要を述べる。

The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present a summary of the results from the INTERSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS).

翻訳日:2022-02-22 00:52:02 公開日:2022-02-17

# (参考訳) RemixIT:ブートストラップリミックスによる音声強調モデルの連続的自己学習

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing ( http://arxiv.org/abs/2202.08862v1 )

ライセンス: CC BY 4.0

Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

(参考訳) RemixITは、単一の独立したドメイン内音声やノイズ波形を必要とせずに、音声強調を訓練するためのシンプルで効果的な自己教師手法である。提案手法は, 従来手法の制約を克服し, クリーンなドメイン内ターゲット信号に依存し, 列車とテストサンプル間のドメインミスマッチに敏感に対処する。 RemixITは、ドメイン外のデータに基づく事前訓練された教師モデルが、ドメイン内の混合に対して推定された擬似ターゲット信号を推測する継続的自己学習方式に基づいている。そして、推定されたクリーン信号とノイズ信号を置換してリミックスすることで、学生ネットワークのトレーニングに使用されるブートストラップ付き混合とそれに対応する擬似ターゲットを新たに生成する。教師は、最新の学生モデルの更新されたパラメータを使って、定期的に見積もりを洗練する。複数の音声強調データセットとタスクにおける実験結果は,従来の手法よりも優れた手法を示すだけでなく,任意の分離モデルと組み合わせて,任意の半教師なし・教師なしのドメイン適応タスクに適用できることを示した。実験的なエビデンスと組み合わせた分析は, 学生モデルが高度に劣化した疑似標的を観察しながら, 良好な性能を保ち続ける自己学習方式の内部機能に光を当てる。

We present RemixIT, a simple yet effective self-supervised method for training speech enhancement without the need for a single isolated in-domain speech or noise waveform. Our approach overcomes limitations of previous methods, which are dependent on clean in-domain target signals and thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets which are used to train the student network. Vice versa, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also showcase that RemixIT can be combined with any separation model as well as be applied towards any semi-supervised and unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inner workings of our self-training scheme, wherein the student model keeps obtaining better performance while observing severely degraded pseudo-targets.

翻訳日:2022-02-22 00:36:28 公開日:2022-02-17
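
The remixing step described in the RemixIT abstract above can be sketched in a few lines. This is an illustrative sketch only: signals are plain Python lists rather than tensors, the teacher outputs are hard-coded stand-ins, and the names `bootstrapped_remix`, `teacher_speech`, and `teacher_noise` are hypothetical.

```python
import random


def bootstrapped_remix(speech_estimates, noise_estimates, rng):
    """Pair each estimated speech signal with a noise estimate taken from a
    random permutation of the batch, and sum them into a new mixture.

    Returns the new mixtures and the permutation that was applied to the noise.
    """
    order = list(range(len(noise_estimates)))
    rng.shuffle(order)
    mixtures = []
    for i, j in enumerate(order):
        speech, noise = speech_estimates[i], noise_estimates[j]
        mixtures.append([s + n for s, n in zip(speech, noise)])
    return mixtures, order


rng = random.Random(0)
# Stand-ins for the teacher's estimates on a batch of two in-domain mixtures.
teacher_speech = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
teacher_noise = [[0.01, 0.02, 0.03], [0.05, 0.05, 0.05]]
mixtures, order = bootstrapped_remix(teacher_speech, teacher_noise, rng)
# The student would be trained on `mixtures`, with teacher_speech[i] and
# teacher_noise[order[i]] serving as the pseudo-targets for mixture i.
```

Because the noise is permuted within the batch, the student sees mixtures the teacher never processed, while still having a matching pair of pseudo-targets for each one.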

# (参考訳) グラフ機械学習のためのグラフデータ拡張:調査

Graph Data Augmentation for Graph Machine Learning: A Survey ( http://arxiv.org/abs/2202.08871v1 )

ライセンス: CC BY 4.0

Tong Zhao, Gang Liu, Stephan Günnemann, Meng Jiang

(参考訳) データ拡張は、追加のトレーニングデータを作成し、モデルの一般化を改善する能力から、グラフ機械学習への関心が高まっている。グラフデータの複雑で非ユークリッド的な構造によって引き起こされる課題により、従来の拡張操作を他の種類のデータに直接類似化することが制限されるため、この領域は依然として過小評価されている。本稿では,文献を構造化的に要約したグラフデータ拡張の包括的かつ体系的な調査を行う。まず,変更や生成したグラフデータのコンポーネントに基づいて,グラフデータの拡張操作を分類する。次に,グラフデータ拡張の最近の進歩を紹介し,その学習目標と方法論によって分離する。現在未解決の課題と今後の研究の方向性を概説する。全体として,グラフデータ拡張における既存文献の展望を明らかにし,この分野における追加作業を動機付けることを目的としている。私たちはgithubリポジトリ(https://github.com/zhao-tong/graph-data-augmentation-papers)と読み込みリストを提供し、継続的に更新します。

Data augmentation has recently seen increased interest in graph machine learning given its ability of creating extra training data and improving model generalization. Despite this recent upsurge, this area is still relatively underexplored, due to the challenges brought by complex, non-Euclidean structure of graph data, which limits the direct analogizing of traditional augmentation operations on other types of data. In this paper, we present a comprehensive and systematic survey of graph data augmentation that summarizes the literature in a structured manner. We first categorize graph data augmentation operations based on the components of graph data they modify or create. Next, we introduce recent advances in graph data augmentation, separating by their learning objectives and methodologies. We conclude by outlining currently unsolved challenges as well as directions for future research. Overall, this paper aims to clarify the landscape of existing literature in graph data augmentation and motivate additional work in this area. We provide a GitHub repository (https://github.com/zhao-tong/graph-data-augmentation-papers) with a reading list that will be continuously updated.

翻訳日:2022-02-21 23:59:15 公開日:2022-02-17

# (参考訳) 単調変分不等式を用いたニューラルネットワークの訓練

Training neural networks using monotone variational inequality ( http://arxiv.org/abs/2202.08876v1 )

ライセンス: CC BY 4.0

Chen Xu, Xiuyuan Cheng, Yao Xie

(参考訳) ニューラルネットワークの実証的な成功にもかかわらず、トレーニング手順の理論的理解は限定的であり、特に最適化問題の非凸性により性能が保証される。最近の研究(Juditsky & Nemirovsky, 2019)に触発されて、従来の損失関数最小化アプローチではなく、ネットワークパラメータのトレーニングを凸構造を持つ別の問題に還元し、モノトーン変分不等式(MVI)を解決する。 MVIの解は計算効率のよい手順で発見でき、さらに重要なことは、一層線形ニューラルネットワークの理論的設定の下でのモデル回復精度と予測精度に関する$\ell_2$および$\ell_{\infty}$バウンドの性能保証につながる。さらに,マルチ層ニューラルネットワークのトレーニングにおけるMVIの利用について検討し,SVI(textit{stochastic variational inequality})と呼ばれる実用的アルゴリズムを提案し,完全に接続されたニューラルネットワークとグラフニューラルネットワーク(GNN)のトレーニングへの適用性を示した(SVIは完全に汎用的で,他のタイプのニューラルネットワークのトレーニングに使用できる)。各種性能指標に関する実ネットワークデータ予測タスクにおいて,確率勾配降下(SGD)と比較して,SVIの競争力や性能が向上することを示した。

Despite the vast empirical success of neural networks, theoretical understanding of the training procedures remains limited, especially in providing guarantees on testing performance, due to the non-convex nature of the optimization problem. Inspired by recent work (Juditsky & Nemirovsky, 2019), instead of using the traditional loss function minimization approach, we reduce the training of the network parameters to another problem with convex structure -- to solve a monotone variational inequality (MVI). The solution to MVI can be found by computationally efficient procedures, and importantly, this leads to performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery accuracy and prediction accuracy under the theoretical setting of training a one-layer linear neural network. In addition, we study the use of MVI for training multi-layer neural networks and propose a practical algorithm called stochastic variational inequality (SVI), and demonstrate its applicability in training fully-connected neural networks and graph neural networks (GNN) (SVI is completely general and can be used to train other types of neural networks). We demonstrate the competitive or better performance of SVI compared to stochastic gradient descent (SGD) on both synthetic and real network data prediction tasks regarding various performance metrics.

翻訳日:2022-02-21 23:43:02 公開日:2022-02-17
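
To make the one-layer setting in the abstract above more concrete, here is a rough sketch of the variational-inequality view for a linear model. It assumes the operator takes the form F(θ) = (1/n) Σ xᵢ (φ(θ·xᵢ) − yᵢ) with a monotone activation φ, and solves F(θ) = 0 by a simple fixed-step iteration. That operator form, the solver, and the names `vi_operator` and `solve` are assumptions made for illustration; this is not the paper's SVI algorithm.

```python
def vi_operator(theta, xs, ys, phi=lambda z: z):
    """Empirical operator F(theta) = mean over samples of x * (phi(theta . x) - y)."""
    n = len(xs)
    out = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = phi(sum(t * xi for t, xi in zip(theta, x))) - y
        for k, xi in enumerate(x):
            out[k] += xi * residual / n
    return out


def solve(xs, ys, step=0.1, iters=200):
    """Fixed-step iteration theta <- theta - step * F(theta), starting from zero."""
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        f = vi_operator(theta, xs, ys)
        theta = [t - step * fk for t, fk in zip(theta, f)]
    return theta


# Toy data generated by y = 2 * x, so the recovered weight should approach 2.
xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]
theta_hat = solve(xs, ys)
```

With the identity activation used here, F coincides with the gradient of a least-squares loss; the two views would differ once φ is a nonlinear monotone function, which is where the variational-inequality formulation is meant to help.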

# (参考訳) 部分音声タグによるSinhalaニューラルマシン翻訳の改良

Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag ( http://arxiv.org/abs/2202.08882v1 )

ライセンス: CC BY 4.0

Ravinga Perera, Thilakshi Fonseka, Rashmini Naranpanawa, Uthayasanker Thayasivam

(参考訳) ニューラルマシン翻訳(nmt)の性能は、利用可能な並列コーパスのサイズに大きく依存する。このため、低リソース言語対は高リソース言語対と比較して低翻訳性能を示す。形態学的に豊かな言語に対してnmtを行うと翻訳品質はさらに低下する。ウェブには大量の情報が含まれているが、スリランカのほとんどの人々は英語を正しく読み書きできない。そのため、地域住民間で情報を共有するために、英語コンテンツを現地語に翻訳する大きな要件が存在する。シンハラ語はスリランカで主要な言語であり、シンハラ語に英語を翻訳できるnmtシステムを構築するのは、リソースの制約の少ない2つの言語間の構文の相違のため困難である。そこで本研究では,音声の一部(POS)タグをトランスフォーマーの入力埋め込みと位置エンコーディングに組み込むことにより,Sinhalaニューラルマシン翻訳モデルに対するベースライン英語の性能をさらに向上させる方法について検討する。

The performance of Neural Machine Translation (NMT) depends significantly on the size of the available parallel corpus. Due to this fact, low resource language pairs demonstrate low translation performance compared to high resource language pairs. The translation quality further degrades when NMT is performed for morphologically rich languages. Even though the web contains a large amount of information, most people in Sri Lanka are unable to read and understand English properly. Therefore, there is a huge requirement of translating English content to local languages to share information among locals. Sinhala language is the primary language in Sri Lanka and building an NMT system that can produce quality English to Sinhala translations is difficult due to the syntactic divergence between these two languages under low resource constraints. Thus, in this research, we explore effective methods of incorporating Part of Speech (POS) tags to the Transformer input embedding and positional encoding to further enhance the performance of the baseline English to Sinhala neural machine translation model.

翻訳日:2022-02-21 23:41:03 公開日:2022-02-17

# (参考訳) 低リソース音声認識のためのカリキュラム最適化

Curriculum optimization for low-resource speech recognition ( http://arxiv.org/abs/2202.08883v1 )

ライセンス: CC BY-SA 4.0

Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers

(参考訳) 現代のエンドツーエンド音声認識モデルは、音声信号をテキストに書き起こすという驚くべき結果を示している。しかし、従来のデータ供給パイプラインは低リソース音声認識に最適であり、依然として困難な課題である。本稿では,トレーニング中のモデルの進捗状況と,トレーニングの難易度に関する事前知識の両方に基づいて,トレーニング事例の順序を最適化する自動カリキュラム学習手法を提案する。様々な騒音条件において生音声のスコアリング機能として使用できる圧縮比と呼ばれる新しい難易度尺度を提案する。提案手法は音声認識単語の誤り率をベースラインシステムと比較して最大33%向上させる。

Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text. However, conventional data feeding pipelines may be sub-optimal for low-resource speech recognition, which still remains a challenging task. We propose an automated curriculum learning approach to optimize the sequence of training examples based on both the progress of the model while training and prior knowledge about the difficulty of the training examples. We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions. The proposed method improves speech recognition Word Error Rate performance by up to 33% relative over the baseline system.

翻訳日:2022-02-21 23:32:26 公開日:2022-02-17
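
The abstract above does not define the compression ratio measure, so the following sketch rests on one plausible reading: the ratio of compressed size to raw size of the audio bytes, with `zlib` standing in for whatever codec the authors used. The easy-first ordering and the names `compression_ratio`, `clean_like`, `noisy_like`, and `curriculum` are likewise assumptions made for illustration.

```python
import random
import zlib


def compression_ratio(raw_audio: bytes) -> float:
    """Compressed size divided by raw size; values near 1.0 mean the
    signal is hard to compress (for example, dominated by noise)."""
    return len(zlib.compress(raw_audio)) / len(raw_audio)


rng = random.Random(0)
# Two toy 8-bit "recordings": a constant signal and uniformly random noise.
clean_like = bytes([128] * 4000)
noisy_like = bytes(rng.randrange(256) for _ in range(4000))

# Order the training examples from most to least compressible.
utterances = {"noisy": noisy_like, "clean": clean_like}
curriculum = sorted(utterances, key=lambda name: compression_ratio(utterances[name]))
```

A score like this needs no transcripts or model in the loop, which is what makes it usable as prior knowledge about example difficulty before training starts.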

# (参考訳) 衛星画像を用いた深層移動学習による開発途上国の大気質評価の改善

Deep Transfer Learning on Satellite Imagery Improves Air Quality Estimates in Developing Nations ( http://arxiv.org/abs/2202.08890v1 )

ライセンス: CC BY 4.0

Nishant Yadav, Meytar Sorek-Hamer, Michael Von Pohle, Ata Akbari Asanjan, Adwait Sahasrabhojanee, Esra Suel, Raphael Arku, Violet Lingenfelter, Michael Brauer, Majid Ezzati, Nikunj Oza, Auroop R. Ganguly

(参考訳) 都市大気汚染は低所得国や中所得国(LMIC)の公衆衛生問題である。しかし、LMICには十分な空気品質(AQ)の監視インフラがない。 LMICの都市ではAQを正確に推定できないため、緊急の準備やリスク軽減を妨げている。衛星画像をAQにマッピングするディープラーニングベースのモデルは、適切な地上データを持つ高所得国(HIC)向けに構築することができる。ここでは,HIC都市で学習した時空間パターンに基づいて,衛星画像の深層移動学習をAQに適用したスケーラブルなアプローチにより,LMIC都市の有意義な推定と洞察を抽出できることを実証する。このアプローチはアフリカ、ガーナのAccraで実証されており、米国2都市、特にロサンゼルスとニューヨークからAQパターンが学習されている。

Urban air pollution is a public health challenge in low- and middle-income countries (LMICs). However, LMICs lack adequate air quality (AQ) monitoring infrastructure. A persistent challenge has been our inability to estimate AQ accurately in LMIC cities, which hinders emergency preparedness and risk mitigation. Deep learning-based models that map satellite imagery to AQ can be built for high-income countries (HICs) with adequate ground data. Here we demonstrate that a scalable approach that adapts deep transfer learning on satellite imagery for AQ can extract meaningful estimates and insights in LMIC cities based on spatiotemporal patterns learned in HIC cities. The approach is demonstrated for Accra in Ghana, Africa, with AQ patterns learned from two US cities, specifically Los Angeles and New York.

翻訳日:2022-02-21 23:22:16 公開日:2022-02-17

# (参考訳) 対話型AI設計がユーザ行動に及ぼす影響:Fact-checking COVID-19の実態調査

The Effects of Interactive AI Design on User Behavior: An Eye-tracking Study of Fact-checking COVID-19 Claims ( http://arxiv.org/abs/2202.08901v1 )

ライセンス: CC BY 4.0

Li Shi, Nilavra Bhattacharya, Anubrata Das, Matthew Lease, Jacek Gwidzka

(参考訳) 我々は,AIを活用したファクトチェックシステムの相互作用が,生活時間や注意,精神的リソースといったユーザインタラクションにどのように影響するかを,実験室で調査した。参加者は、対話型および非対話型のaiファクトチェックシステムを使用して、covid-19関連クレームの正しさを評価した。 NASA-TLX を用いた Web ページインタラクション,アイトラッキングデータ,メンタルワークロードの収集を行った。 aiシステムの予測パラメータを対話的に操作する余裕があることは、ユーザの居住時間やaoisのアイフィックスに影響を与えているが、メンタルなワークロードには影響しなかった。インタラクティブシステムでは、参加者は主張の正しさを評価し、次にニュースを読む。この有望な結果は、ai駆動システムにおける相互活動のポジティブな役割を示している。

We conducted a lab-based eye-tracking study to investigate how the interactivity of an AI-powered fact-checking system affects user interactions, such as dwell time, attention, and mental resources involved in using the system. A within-subject experiment was conducted, where participants used an interactive and a non-interactive version of a mock AI fact-checking system and rated their perceived correctness of COVID-19 related claims. We collected web-page interactions, eye-tracking data, and mental workload using NASA-TLX. We found that the presence of the affordance of interactively manipulating the AI system's prediction parameters affected users' dwell times, and eye-fixations on AOIs, but not mental workload. In the interactive system, participants spent the most time evaluating claims' correctness, followed by reading news. This promising result shows a positive role of interactivity in a mixed-initiative AI-powered system.

翻訳日:2022-02-21 23:11:34 公開日:2022-02-17

# (参考訳) スタック一般化を用いたバイナリ分類のための多様な学習者の組み合わせ

Combining Varied Learners for Binary Classification using Stacked Generalization ( http://arxiv.org/abs/2202.08910v1 )

ライセンス: CC BY 4.0

Sruthi Nair, Abhishek Gupta, Raunak Joshi, Vidya Chitre

(参考訳) 機械学習は、いくつかの面や他の面よりも優れた学習アルゴリズムを持っているが、すべてのアルゴリズムが抱える一般的なエラーは、非常に高次元の特徴セットを持つトレーニングデータである。これは通常、アルゴリズムが性能を損なう一般化エラーになってしまう。これは、Stacked Generalizationと呼ばれるStackingとして知られるEnsemble Learningメソッドを使って解決できる。本稿では,高次元多嚢胞性卵巣症候群データセット上で,スタック一般化を用いた2値分類を行い,モデルが一般化し,指標が大幅に向上する点を証明する。様々な指標が本論文で示されており、受信機動作特性曲線で見いだされた微妙な遷移が誤りであることが証明されている。

Machine learning offers various learning algorithms, each better than the others in some respect, but a common problem for all of them is training data with a very high-dimensional feature set. This usually leads to generalization error that degrades performance. This can be addressed with an ensemble learning method known as stacking, more formally termed stacked generalization. In this paper, we perform binary classification using stacked generalization on a high-dimensional Polycystic Ovary Syndrome dataset and show that the model generalizes and the metrics improve significantly. The paper reports various metrics and also points out a subtle transgression found with the Receiver Operating Characteristic curve that was proved to be incorrect.

翻訳日:2022-02-21 23:03:30 公開日:2022-02-17

# (参考訳) 知識グラフ関係における微粒化セマンティクスの発見

Discovering Fine-Grained Semantics in Knowledge Graph Relations ( http://arxiv.org/abs/2202.08917v1 )

ライセンス: CC BY 4.0

Nitisha Jain and Ralf Krestel

(参考訳) マルチリレーショナルデータの理解と分析に関しては,関係のセマンティクスが不可欠である。複数のセマンティクスを表す異なる種類のエンティティ間の多相関係は、知識グラフで表される現実世界のリレーショナルデータセットで一般的である。エンティティタイプ分類、質問応答、知識グラフ補完などの多くのユースケースでは、これらの関係の正しい意味解釈が必要である。本研究は,抽象的関係に関連する異なる意味を探索し,細粒度な意味を持つ多くの部分関係を導出するための戦略を提案する。これを実現するために、関係に関連するエンティティの型を活用し、エンティティと関係のベクトル表現をクラスタ化する。提案手法は,多元関係に対する最良部分関係を自動で発見し,その意味的解釈を経験的評価により決定することができる。

When it comes to comprehending and analyzing multi-relational data, the semantics of relations are crucial. Polysemous relations between different types of entities, that represent multiple semantics, are common in real-world relational datasets represented by knowledge graphs. For numerous use cases, such as entity type classification, question answering and knowledge graph completion, the correct semantic interpretation of these relations is necessary. In this work, we provide a strategy for discovering the different semantics associated with abstract relations and deriving many sub-relations with fine-grained meaning. To do this, we leverage the types of the entities associated with the relations and cluster the vector representations of entities and relations. The suggested method is able to automatically discover the best number of sub-relations for a polysemous relation and determine their semantic interpretation, according to our empirical evaluation.

翻訳日:2022-02-21 22:56:43 公開日:2022-02-17

# (参考訳) FLAME: マルチデバイス環境におけるフェデレーション学習

FLAME: Federated Learning Across Multi-device Environments ( http://arxiv.org/abs/2202.08922v1 )

ライセンス: CC BY 4.0

Hyunsung Cho, Akhil Mathur, Fahim Kawsar

(参考訳) federated learning(fl)は、個人データをユーザデバイスにプライベートに保持しながら、マシンラーニングモデルの分散トレーニングを可能にする。ヒューマンアクティビティ認識などのモバイルセンシング分野におけるFLの応用が増加しているのを目にするが、FLはマルチデバイス環境(MDE)の文脈では研究されておらず、各ユーザが複数のデータ生成デバイスを所有している。モバイルやウェアラブルデバイスの普及に伴い、MIDはユビコン設定で人気が高まり、FLの研究が必要とされるようになった。 MDEにおけるFLは、クライアント間の非IID性が高く、ユーザとデバイスの両方の不均一性が複雑である。さらに、MDEにおけるFLクライアントにおけるシステムリソースの効率的な利用の確保は、依然として重要な課題である。本稿では,ユーザ中心のFLAME学習手法であるFLAMEを提案し,MDEにおける統計的・システム的不均一性に対処し,デバイス間での推論性能の整合性を実現する。 FLAMEの特徴 (i)同一ユーザからのデバイス間の時間アライメントを利用したユーザ中心FLトレーニング二精度及び効率性を考慮した装置の選択 (iii)デバイスへのパーソナライズモデル。また,実測エネルギードレインとネットワーク帯域幅プロファイルを用いたFL評価実験を行い,既存のHARデータセットをフェデレートした設定に拡張する新しいクラスベースデータ分割方式を提案する。その結果,FLAMEはF-1スコアが4.8～33.8%,エネルギー効率が1.02～2.86倍,収束速度が最大2.2倍に向上し,FLワークロードの公平分布による目標精度が向上した。

Federated Learning (FL) enables distributed training of machine learning models while keeping personal data on user devices private. While we witness increasing applications of FL in the area of mobile sensing, such as human-activity recognition, FL has not been studied in the context of a multi-device environment (MDE), wherein each user owns multiple data-producing devices. With the proliferation of mobile and wearable devices, MDEs are becoming increasingly popular in ubicomp settings, therefore necessitating the study of FL in them. FL in MDEs is characterized by high non-IID-ness across clients, complicated by the presence of both user and device heterogeneities. Further, ensuring efficient utilization of system resources on FL clients in an MDE remains an important challenge. In this paper, we propose FLAME, a user-centered FL training approach to counter statistical and system heterogeneity in MDEs, and bring consistency in inference performance across devices. FLAME features (i) user-centered FL training utilizing the time alignment across devices from the same user; (ii) accuracy- and efficiency-aware device selection; and (iii) model personalization to devices. We also present an FL evaluation testbed with realistic energy drain and network bandwidth profiles, and a novel class-based data partitioning scheme to extend existing HAR datasets to a federated setup. Our experiment results on three multi-device HAR datasets show that FLAME outperforms various baselines by 4.8-33.8% higher F-1 score, 1.02-2.86x greater energy efficiency, and up to 2.02x speedup in convergence to target accuracy through fair distribution of the FL workload.

翻訳日:2022-02-21 22:41:20 公開日:2022-02-17

# (参考訳) 最適パス森林における不均衡データセットの扱い

Handling Imbalanced Datasets Through Optimum-Path Forest ( http://arxiv.org/abs/2202.08934v1 )

ライセンス: CC BY 4.0

Leandro Aparecido Passos, Danilo S. Jodas, Luiz C. F. Ribeiro, Marco Akio, Andre Nunes de Souza, João Paulo Papa

(参考訳) 過去10年間で、機械学習ベースのアプローチは、時には人間よりも幅広い複雑なタスクを実行できるようになり、わずかな時間を要するようになった。このような進歩は、利用可能なデータ量が指数関数的に増加し、それらから信頼できる現実世界情報を抽出できるためである。しかし、ある現象は他の現象よりも可能性が高いため、これらのデータは一般に不均衡である。このような振る舞いは、より頻繁なデータに偏っているため、機械学習モデルのパフォーマンスにかなりの影響を与えます。大量の機械学習手法にもかかわらず、グラフベースのアプローチは、多くのアプリケーション、すなわち最適なパスフォレスト(opf)のパフォーマンスが優れたため、かなりの注目を集めている。本稿では,不均衡問題に対処するための3つのopfベースの戦略を提案する。$\text{o}^2$pf と opf-us はそれぞれオーバーサンプリングとアンダーサンプリングのための新しいアプローチであり,両方のアプローチを組み合わせたハイブリッド戦略である。本稿では,上記の戦略に関する変種についても紹介する。パブリックデータセットとプライベートデータセットにおける最先端技術との比較により,提案手法の堅牢性が確認された。

In the last decade, machine learning-based approaches became capable of performing a wide range of complex tasks, sometimes better than humans and in a fraction of the time. Such an advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information from them. However, such data is generally imbalanced since some phenomena are more likely than others. Such behavior has considerable influence on the machine learning model's performance since the model becomes biased toward the more frequent data it receives. Despite the considerable number of machine learning methods, a graph-based approach, the Optimum-Path Forest (OPF), has attracted considerable attention due to its outstanding performance over many applications. In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the O²PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants concerning the strategies mentioned above. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches.

翻訳日:2022-02-21 22:12:22 公開日:2022-02-17
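
For readers unfamiliar with the oversampling and undersampling that the abstract above builds on, here is a baseline sketch using plain random resampling. It is deliberately not OPF-based: the paper's O²PF and OPF-US choose samples using the Optimum-Path Forest, whereas this sketch picks them uniformly at random. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, rng):
    """Duplicate randomly chosen samples of each smaller class until
    every class is as large as the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def random_undersample(samples, labels, rng):
    """Keep a random subset of each class, sized to match the smallest class."""
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


rng = random.Random(0)
X = [[0.0], [0.1], [0.2], [0.3], [5.0]]  # four majority samples, one minority
y = [0, 0, 0, 0, 1]
over_X, over_y = random_oversample(X, y, rng)
under_X, under_y = random_undersample(X, y, rng)
```

A hybrid strategy, as mentioned in the abstract, would combine the two: shrink the majority class somewhat and grow the minority class somewhat, rather than doing only one of them.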

# スタイルベース生成対向ネットワークを用いた先行画像に基づく医用画像再構成

Prior image-based medical image reconstruction using a style-based generative adversarial network ( http://arxiv.org/abs/2202.08936v1 )

ライセンス: Link先を確認

Varun A. Kelkar and Mark A. Anastasio

(参考訳) 医用画像システムには画像形成のための計算再構成手順が必要である。記録された測定値が不完全である場合には、被写体の有用な推定値を取得するためには、被写体の性質に関する事前知識を利用する必要がある。画像逆問題に対する条件付けを改善するために,物体の先行や制約をよりよく表現するために,ディープラーニングアプローチを積極的に研究している。本研究は,検索対象の先行画像の形で追加情報が得られる場合に,画像再構成問題を制約するために,スタイルベースの生成逆ネットワーク(StyleGAN)を使用することを提案する。磁気共鳴イメージング(MRI)で使用されるコントラストのような、意味のある画像属性や「スタイル」に対して歪んだスタイルGANの中間潜時空間に最適化問題を定式化する。追従画像と先行画像との相違は、非絡み空間において測定され、非絡み空間の特定のスタイルに対する制約の形で逆問題を調整するために使用される。 MRイメージングにインスパイアされた、構造的に類似しているが、異なるコントラスト機構に属する、スタイル化された数値研究が設計されている。提案手法は,従来の手法に比べ,従来手法よりも優れていることを示す数値的研究を行った。

Computed medical imaging systems require a computational reconstruction procedure for image formation. In order to recover a useful estimate of the to-be-imaged object when the recorded measurements are incomplete, prior knowledge about the nature of the object must be utilized. In order to improve the conditioning of an ill-posed imaging inverse problem, deep learning approaches are being actively investigated for better representing object priors and constraints. This work proposes to use a style-based generative adversarial network (StyleGAN) to constrain an image reconstruction problem in the case where additional information in the form of a prior image of the sought-after object is available. An optimization problem is formulated in the intermediate latent space of a StyleGAN, which is disentangled with respect to meaningful image attributes or "styles", such as the contrast used in magnetic resonance imaging (MRI). Discrepancy between the sought-after and prior images is measured in the disentangled latent space, and is used to regularize the inverse problem in the form of constraints on specific styles of the disentangled latent space. A stylized numerical study inspired by MR imaging is designed, where the sought-after and the prior image are structurally similar, but belong to different contrast mechanisms. The presented numerical studies demonstrate the superiority of the proposed approach as compared to classical approaches in the form of traditional metrics.

翻訳日:2022-02-21 14:57:22 公開日:2022-02-17

# 量子データによる多体局在へのスケーラブルなアプローチ

Scalable approach to many-body localization via quantum data ( http://arxiv.org/abs/2202.08853v1 )

ライセンス: Link先を確認

Alexander Gresch, Lennart Bittel and Martin Kliesch

(参考訳) 量子データによって、計算が難しい問題に対する実用的な解決策が実現できることに関心があります。量子多体物理学の非常に難しい現象は、多体局在(MBL)の出現である。これまでのところ、isは包括的な分析を避けている。特に、数値的研究はヒルベルト空間次元の指数的成長によって挑戦される。これらの研究の多くはシステムのハミルトニアンの正確な対角化に依存しているため、小さなシステムサイズのみがアクセス可能である。本研究では,学習データから計算コストのかかるステップを回避できる,高度に柔軟なニューラルネットワークに基づく学習手法を提案する。このようにして、隣接するギャップ比やエントロピー量などのMBLの共通指標を効率的に推定することができる。私たちの推定器は、さまざまなシステムサイズのデータを一度にトレーニングすることで、より小さなものから大きなものへと推定することが可能になります。さらに, 伝達学習を用いて, 二次元特徴ベクトルは, 様々なエネルギー密度で複数の異なる指標を一度に得るのに十分であることを示す。我々は、このアプローチを大規模量子実験に適用し、量子多体物理学への新たな洞察を提供することを望んでいる。

We are interested in how quantum data can allow for practical solutions to otherwise difficult computational problems. A notoriously difficult phenomenon from quantum many-body physics is the emergence of many-body localization (MBL). So far, it has evaded a comprehensive analysis. In particular, numerical studies are challenged by the exponential growth of the Hilbert space dimension. As many of these studies rely on exact diagonalization of the system's Hamiltonian, only small system sizes are accessible. In this work, we propose a highly flexible neural-network-based learning approach that, once given training data, circumvents any computationally expensive step. In this way, we can efficiently estimate common indicators of MBL such as the adjacent gap ratio or entropic quantities. Our estimator can be trained on data from various system sizes at once, which grants the ability to extrapolate from smaller to larger ones. Moreover, using transfer learning we show that a two-dimensional feature vector already suffices to obtain several different indicators at various energy densities at once. We hope that our approach can be applied to large-scale quantum experiments to provide new insights into quantum many-body physics.

翻訳日:2022-02-21 14:57:00 公開日:2022-02-17
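
The adjacent gap ratio named in the abstract above is a standard spectral indicator: for sorted energy levels with consecutive spacings d_n, it is r_n = min(d_n, d_{n+1}) / max(d_n, d_{n+1}). The following is a minimal sketch of that textbook definition only, not the authors' neural-network estimator; the function names are hypothetical and the input is assumed to be a plain list of eigenvalues obtained elsewhere (for example, from exact diagonalization).

```python
def adjacent_gap_ratios(energies):
    """Return r_n = min(d_n, d_{n+1}) / max(d_n, d_{n+1}) for consecutive level spacings."""
    levels = sorted(energies)
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    ratios = []
    for d1, d2 in zip(gaps, gaps[1:]):
        larger = max(d1, d2)
        # Skip pairs where both spacings are zero: the ratio is undefined there.
        if larger > 0:
            ratios.append(min(d1, d2) / larger)
    return ratios


def mean_gap_ratio(energies):
    """Average gap ratio; needs at least three non-degenerate levels."""
    ratios = adjacent_gap_ratios(energies)
    return sum(ratios) / len(ratios)
```

For reference, the mean ratio is about 0.39 for Poisson level statistics (typical of a localized phase) and about 0.53 for Gaussian orthogonal ensemble statistics (typical of an ergodic phase), which is why it serves as an MBL indicator.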

# 音声混合における単語埋め込みによる自動等化

Word Embeddings for Automatic Equalization in Audio Mixing ( http://arxiv.org/abs/2202.08898v1 )

ライセンス: Link先を確認

Satvik Venkatesh, David Moffat, Eduardo Reck Miranda

(参考訳) 近年,音声混合プロセスを自動化するために機械学習が広く採用されている。ゲイン調整、ステレオパニング、等化、残響といった様々な音響効果に自動ミキシングシステムが適用されている。これらのシステムはビジュアルインターフェースを通じて制御でき、オーディオ例、ノブ、セマンティックディスクリプタを提供する。セマンティック記述子やテキスト情報を使用してシステムを制御することは、アーティストが創造的な目標を伝える効果的な方法である。さらに、アーティストはミキシングシステムやミキシングエンジニアでは理解できないような非技術的な言葉を使うこともある。本稿では,意味記述子を表現するために単語埋め込みを利用する新しいアイデアについて検討する。単語埋め込みは一般的に、大量のテキストのコーパス上にニューラルネットワークをトレーニングすることで得られる。これらの埋め込みは、単語からEQ設定への変換を生成するニューラルネットワークの入力層として機能する。この技術を使用して、機械学習モデルは、これまで見たことのないセマンティックディスクリプタのEQ設定を生成することもできる。我々はこのアイデアの実現可能性を示す実験を行う。さらに,人間のeq設定とニューラルネットワークの予測を比較し,予測の質を評価する。その結果、埋め込み層により、ニューラルネットワークは意味記述子を理解できることがわかった。埋め込み層を持つモデルは、埋め込み層を持たないモデルよりも優れているが、人間のラベルほど良いものではない。

In recent years, machine learning has been widely adopted to automate the audio mixing process. Automatic mixing systems have been applied to various audio effects such as gain adjustment, stereo panning, equalization, and reverberation. These systems can be controlled through visual interfaces, audio examples, knobs, and semantic descriptors. Using semantic descriptors or textual information to control these systems is an effective way for artists to communicate their creative goals. Furthermore, artists sometimes use non-technical words that may not be understood by the mixing system, or even by a mixing engineer. In this paper, we explore the novel idea of using word embeddings to represent semantic descriptors. Word embeddings are generally obtained by training neural networks on large corpora of written text. These embeddings serve as the input layer of the neural network to create a translation from words to EQ settings. Using this technique, the machine learning model can also generate EQ settings for semantic descriptors that it has not seen before. We perform experiments to demonstrate the feasibility of this idea. In addition, we compare the EQ settings of humans with the predictions of the neural network to evaluate the quality of predictions. The results showed that the embedding layer enables the neural network to understand semantic descriptors. We observed that the models with embedding layers perform better than those without embedding layers, but not as well as human labels.

翻訳日:2022-02-21 14:54:29 公開日:2022-02-17

# 付加目的量を用いた評価最適化のための分散アルゴリズム

A Distributed Algorithm for Measure-valued Optimization with Additive Objective ( http://arxiv.org/abs/2202.08930v1 )

ライセンス: Link先を確認

Iman Nodozi, Abhishek Halder

(参考訳) 本稿では,付加目的の測度値最適化問題を解くための分散非パラメトリックアルゴリズムを提案する。このような問題は、非正規化された平均場ニューラルネットワーク学習とワッサーシュタイン勾配流からランゲヴィンをサンプリングするなど、確率的学習と制御のいくつかの文脈で発生する。提案アルゴリズムは,乗算器の2層交互方向法(ADMM)を含む。外側層ADMMはユークリッドコンセンサスADMMをワッサーシュタインコンセンサスADMMに一般化し、エントロピー正規化バージョンSinkhornコンセンサスADMMに一般化する。内層ADMMは標準ユークリッドADMMの特定の例であることが判明した。全体アルゴリズムは、確率測度の多様体内の勾配流れに対する作用素分割を実現する。

We propose a distributed nonparametric algorithm for solving measure-valued optimization problems with additive objectives. Such problems arise in several contexts in stochastic learning and control including Langevin sampling from an unnormalized prior, mean field neural network learning and Wasserstein gradient flows. The proposed algorithm comprises a two-layer alternating direction method of multipliers (ADMM). The outer-layer ADMM generalizes the Euclidean consensus ADMM to the Wasserstein consensus ADMM, and to its entropy-regularized version Sinkhorn consensus ADMM. The inner-layer ADMM turns out to be a specific instance of the standard Euclidean ADMM. The overall algorithm realizes operator splitting for gradient flows in the manifold of probability measures.

翻訳日:2022-02-21 14:54:10 公開日:2022-02-17
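
As background for the abstract above, the Euclidean consensus ADMM that the outer layer is said to generalize can be sketched on a toy problem. This is not the paper's Wasserstein or Sinkhorn variant: the example below assumes scalar variables and hypothetical quadratic local objectives f_i(x) = (x - a_i)^2 / 2, for which the consensus solution is simply the mean of the a_i.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for min_x sum_i 0.5 * (x - targets[i])**2."""
    n = len(targets)
    u = [0.0] * n   # scaled dual variables, one per agent
    z = 0.0         # shared consensus variable
    for _ in range(iterations):
        # Local step: each agent minimizes f_i(x_i) + (rho / 2) * (x_i - z + u_i)**2.
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(targets, u)]
        # Consensus step: average the local estimates plus scaled duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z
```

Each agent only touches its own `targets[i]`, which is what makes the scheme distributed; only the averaging step needs communication.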

# 複数入力関数を考慮した部分微分演算子モデリングのための拡張DeepONet

Enhanced DeepONet for Modeling Partial Differential Operators Considering Multiple Input Functions ( http://arxiv.org/abs/2202.08942v1 )

ライセンス: Link先を確認

Lesley Tan and Liang Chen

(参考訳) 機械学習、特にディープラーニングは、様々な認知アプリケーションにおけるブレークスルーのパフォーマンスのために注目されている。近年、ニューラルネットワーク(NN)は偏微分方程式をモデル化するために集中的に研究されており、非線形関数の普遍近似器と見なすことができる。偏微分方程式(pde)に対する一般的な非線形連続作用素をモデル化するために、ディープ・ネットワーク・オペレータ(deeponet)アーキテクチャが提案されている。しかし、既存のdeeponetは1つの入力関数しか受け入れず、アプリケーションを制限することができる。本研究では,2つ以上の入力関数を受け入れるように拡張するDeepONetアーキテクチャについて検討する。本稿では,2つの入力関数を2つのブランチDNNサブネットワークで表現し,内積を介して出力トランクネットワークに接続して,ニューラルネットワーク全体の出力を生成する,拡張DeepONetまたはEDeepONetハイレベルニューラルネットワーク構造を提案する。提案するEDeepONet構造は,複数の入力関数を扱うために容易に拡張できる。 2つの偏微分方程式の例をモデル化した結果,提案する拡張deeponetは,完全連結ニューラルネットワークよりも約7x-17xあるいは約1桁精度が高く,トレーニングとテストの両方において単純な拡張deeponetよりも約2x-3倍精度が高いことがわかった。

Machine learning, especially deep learning, is gaining much attention due to its breakthrough performance in various cognitive applications. Recently, neural networks (NNs) have been intensively explored to model partial differential equations, as NNs can be viewed as universal approximators for nonlinear functions. A deep network operator (DeepONet) architecture was proposed to model general nonlinear continuous operators for partial differential equations (PDEs) due to its better generalization capabilities than existing mainstream deep neural network architectures. However, the existing DeepONet can only accept one input function, which limits its application. In this work, we explore the DeepONet architecture and extend it to accept two or more input functions. We propose a new Enhanced DeepONet, or EDeepONet, high-level neural network structure, in which two input functions are represented by two branch DNN sub-networks, which are then connected with the output trunk network via an inner product to generate the output of the whole neural network. The proposed EDeepONet structure can be easily extended to deal with multiple input functions. Our numerical results on modeling two partial differential equation examples show that the proposed enhanced DeepONet is about 7X-17X, or about one order of magnitude, more accurate than the fully connected neural network and is about 2X-3X more accurate than a simple extended DeepONet for both training and test.

翻訳日:2022-02-21 14:53:56 公開日:2022-02-17

# 約低ランクアイシングモデルサンプリング:MCMCは変分法を満たす

Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods ( http://arxiv.org/abs/2202.08907v1 )

ライセンス: Link先を確認

Frederic Koehler and Holden Lee and Andrej Risteski

(参考訳) 我々は、一般的な相互作用行列 $j$ を持つ超キューブ上のイジングモデルを検討し、もし$o(1)$ の固有値以外が長さ 1 の間隔にある場合、多項式時間サンプリングアルゴリズムを与える。これは以前は、*all*固有値が長さ1の間隔に収まるグラウバー力学で知られていたが、一方のアウトリアーはグラウバー力学をひどく混ぜ合わせることができる。この結果から,低次元文脈のホップフィールドネットワークやベイズクラスタリングモデルなどの低次アイジングモデルに対する最初の多項式時間サンプリングアルゴリズムが提案され,拡張器グラフ上の不整合場を持つ反強磁性/強磁性アイジングモデルに対する多項式時間サンプリング方式が大幅に改善された。また、変分法および統計物理学における素平均場近似に基づいて、従来の近似アルゴリズムの結果を改善した。我々のアプローチは、MCMCと変分推論の世界からの新たなアイデアの融合に基づいている。アルゴリズムの一部として,分布の指数関数的再重み付けから負の定値二次形式でサンプル化できる新しい非凸変分問題を定義し,確率的勾配降下を用いてこの手順を効果的に行う方法を示す。この上に、大きな正の固有値によって生じる障害を克服し、それをSGDベースのサンプリング器と組み合わせて全問題を解決する、新しい模擬テンパリングチェーン(Hubbard-Stratonovich変換から生じる拡張状態空間)を構築する。

We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when *all* eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard-Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.

翻訳日:2022-02-21 14:36:09 公開日:2022-02-17
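
The Glauber dynamics mentioned in the abstract above is the standard single-site Markov chain for Ising models, and is the baseline whose slow mixing the paper addresses. The following is a minimal sketch of that baseline only, not the authors' algorithm. It assumes the convention p(x) ∝ exp(½ xᵀJx + hᵀx) with symmetric J on spins x ∈ {-1, +1}ⁿ; the abstract does not fix a normalization, so this convention and the function names are assumptions.

```python
import math
import random


def glauber_step(x, J, h, rng):
    """Resample one uniformly chosen spin from its conditional distribution (in place)."""
    n = len(x)
    i = rng.randrange(n)
    # Local field on spin i from the external field and all other spins.
    field = h[i] + sum(J[i][j] * x[j] for j in range(n) if j != i)
    # P(x_i = +1 | rest) = sigmoid(2 * field) under the assumed convention.
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    x[i] = 1 if rng.random() < p_plus else -1
    return x


def run_glauber(J, h, steps, seed=0):
    """Run the chain from a uniformly random start and return the final configuration."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(len(h))]
    for _ in range(steps):
        glauber_step(x, J, h, rng)
    return x
```

Very large negative local fields would overflow `math.exp` in this simple form; a production sampler would use a numerically stable sigmoid.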

# 心拍数推定のための機械学習モデルと顔領域ビデオ:特許,データセット,文献のレビュー

Machine learning models and facial regions videos for estimating heart rate: a review on Patents, Datasets and Literature ( http://arxiv.org/abs/2202.08913v1 )

ライセンス: Link先を確認

Tiago Palma Pagano, Lucas Lemos Ortega, Victor Rocha Santos, Yasmin da Silva Bonfim, José Vinícius Dantas Paranhos, Paulo Henrique Miranda Sá, Lian Filipe Santana Nascimento, Ingrid Winkler, Erick Giovani Sperandio Nascimento

(参考訳) 心拍数の推定は、様々な状況のユーザを監視する上で重要である。顔画像に基づく推定は、非侵襲的な方法で心臓情報を監視することができ、デバイスがシンプルであるため、ユーザーの顔を撮影するカメラのみを必要とするため、ますます研究されている。ユーザーの顔のこれらのビデオから、機械学習は心拍数を推定することができる。本研究では、顔ビデオから心拍数を推定するために機械学習モデルを使用することの利点と課題について、特許、データセット、記事レビューを通して検討する。我々はderwent innovation, ieee xplore, scopus, web of science knowledge basesを検索し,7つの特許出願,11のデータセット,20の心拍数,photoplethysmography,心電図データを特定した。特許に関しては,著者らによって述べられているように,心拍推定に関する発明の利点に留意する。データセットの面では、そのほとんどは学術的な目的と、心拍推定以外の対象をカバーできるさまざまなサインとアノテーションがあることがわかりました。論文の観点では,心拍数測定のための関心領域の抽出や,小さな運動抽出にビデオ倍率を用いる手法や,観察された個人の心拍数,信号抽出に最適な領域,処理方法などを抽出したevm-cnnやvgg-16などのモデルを発見した。

Estimating heart rate is important for monitoring users in various situations. Estimates based on facial videos are increasingly being researched because they make it possible to monitor cardiac information in a non-invasive way and because the devices are simpler, requiring only cameras that capture the user's face. From these videos of the user's face, machine learning is able to estimate heart rate. This study investigates the benefits and challenges of using machine learning models to estimate heart rate from facial videos, through a review of patents, datasets, and articles. We searched the Derwent Innovation, IEEE Xplore, Scopus, and Web of Science knowledge bases and identified 7 patent filings, 11 datasets, and 20 articles on heart rate, photoplethysmography, or electrocardiogram data. In terms of patents, we note the advantages of inventions related to heart rate estimation, as described by the authors. In terms of datasets, we discovered that most of them are for academic purposes and have different signals and annotations that allow coverage of subjects other than heartbeat estimation. In terms of articles, we discovered techniques, such as extracting regions of interest for heart rate reading and using Video Magnification for small motion extraction, and models, such as EVM-CNN and VGG-16, that extract the observed individual's heart rate, as well as the best regions of interest for signal extraction and ways to process them.

翻訳日:2022-02-21 14:33:45 公開日:2022-02-17

# 反復的な信念の変化計算的な変化

Iterated Belief Change, Computationally ( http://arxiv.org/abs/2202.08856v1 )

ライセンス: Link先を確認

Kai Sauerwald and Christoph Beierle

(参考訳) 反復的信念変化(英: iterated belief change)とは、信念のダイナミクスに関する原則を研究する研究領域である。本稿では,反復的信念変化がいかに計算に結びついているかを示す。特に,反復的信念修正は,ダルウィッチ・ピアール法のような広く受け入れられた原則の下でも,チューリング完全であることを示す。

Iterated belief change is the research area that investigates principles for the dynamics of beliefs over (possibly unboundedly) many successive belief changes. In this paper, we demonstrate how iterated belief change is connected to computation. In particular, we show that iterated belief revision is Turing complete, even under the condition that broadly accepted principles like the Darwiche-Pearl postulates for iterated revision hold.

翻訳日:2022-02-21 14:31:07 公開日:2022-02-17

# コンピュータビジョンによるカモフラージュ軍用アセットに対する非知覚的敵パッチの開発

Developing Imperceptible Adversarial Patches to Camouflage Military Assets From Computer Vision Enabled Technologies ( http://arxiv.org/abs/2202.08892v1 )

ライセンス: Link先を確認

Christopher Wise, Jo Plested

(参考訳) 畳み込みニューラルネットワーク(CNN)は、オブジェクト検出の急速な進歩と高いレベルの成功を示している。しかし、近年の証拠は敵の攻撃に対する脆弱性を浮き彫りにした。これらの攻撃は、対象の誤分類や検出の抑制をもたらす画像の摂動や敵のパッチが計算される。伝統的なカモフラージュ手法は、情報、監視および偵察技術および第5世代のミサイルにおける自律的な検出から航空機や他の大きな移動資産を偽装する場合、実用的でない。本稿では,コンピュータビジョン対応技術から大規模軍事資産をカモフラームできる非受容性パッチを作製するユニークな手法を提案する。対象検出損失を最大化しつつ,パッチの色知覚性を制限することにより,これらのパッチを開発した。本研究は,対象検出アルゴリズムに対する敵例とその影響の理解を深めることを目的とする。

Convolutional neural networks (CNNs) have demonstrated rapid progress and a high level of success in object detection. However, recent evidence has highlighted their vulnerability to adversarial attacks. These attacks are calculated image perturbations or adversarial patches that result in object misclassification or detection suppression. Traditional camouflage methods are impractical when applied to disguise aircraft and other large mobile assets from autonomous detection in intelligence, surveillance and reconnaissance technologies and fifth generation missiles. In this paper we present a unique method that produces imperceptible patches capable of camouflaging large military assets from computer vision-enabled technologies. We developed these patches by maximising object detection loss whilst limiting the patch's colour perceptibility. This work also aims to further the understanding of adversarial examples and their effects on object detection algorithms.

翻訳日:2022-02-21 14:29:31 公開日:2022-02-17

# 言語仕様による視覚的注意の誘導について

On Guiding Visual Attention with Language Specification ( http://arxiv.org/abs/2202.08926v1 )

ライセンス: Link先を確認

Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, and Anna Rohrbach

(参考訳) 現実の課題は通常、言語用語やフレーズで視覚的カテゴリーを定義するが、ほとんどの視覚的分類法は数値的な指標でカテゴリーを定義する。しかし、クラスの言語仕様は偏りのあるデータセットやうるさいデータセットに特に便利であり、どの機能がタスクに関係するかの曖昧さを解消するのに役立ちます。近年の大規模マルチモーダルモデルは、画像訓練データを追加しても言語仕様から多種多様な高レベル概念を認識することが示されているが、より細かいタスクではクラスを区別できないことが多い。対照的にCNNは、きめ細かい識別に必要な微妙な画像の特徴を抽出できるが、データセットのバイアスやノイズに過度に適合する。私たちの洞察は、気を散らすのではなく、タスク関連機能に分類証拠を限定するためのアドバイスとして、ハイレベルな言語仕様を使用することです。そこで我々は,事前訓練された大規模モデルからタスク関連語やフレーズに注意マップを付ける。次に、このグラウンドリングを用いて分類器の空間的注意を注意散逸から遠ざける。この方法で空間的注意を監督することで、偏りやノイズのあるデータを含む分類タスクのパフォーマンスが向上し、約3～15%の最悪グループ精度が向上し、41～45%の相対的改善が得られている。

While real world challenges typically define visual categories with language words or phrases, most visual classification methods define categories with numerical indices. However, the language specification of the classes provides an especially useful prior for biased and noisy datasets, where it can help disambiguate what features are task-relevant. Recently, large-scale multimodal models have been shown to recognize a wide variety of high-level concepts from a language specification even without additional image training data, but they are often unable to distinguish classes for more fine-grained tasks. CNNs, in contrast, can extract subtle image features that are required for fine-grained discrimination, but will overfit to any bias or noise in datasets. Our insight is to use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors. To do this, we ground task-relevant words or phrases with attention maps from a pretrained large-scale model. We then use this grounding to supervise a classifier's spatial attention away from distracting context. We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data, including about 3-15% worst-group accuracy improvements and 41-45% relative improvements on fairness metrics.

翻訳日:2022-02-21 14:29:16 公開日:2022-02-17

# Generative Adversarial Networkに基づく非線形文脈帯域の高速オンライン推論

Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network ( http://arxiv.org/abs/2202.08867v1 )

ライセンス: Link先を確認

Yun Da Tsai, Shou De Lin

(参考訳) この研究は、腕数$n$が非常に大きいとき、非線形文脈バンドイットを推測する効率上の懸念に対処する。本稿では,トンプソンサンプリングや UCB などの帯域幅アルゴリズムを効率よく実行するための,エンドツーエンドのトレーニングプロセスを備えたニューラルバンディットモデルを提案する。我々は現在最先端の時間複雑性を$O(\log n)$に推し進め、ベイズ近似、ニューラルランダム特徴マッピング、近似大域最大化、近似近接探索を行う。さらに,予測時間からトレーニング時間までの最適なアーム選択の目標を最大化し,バッチ処理と並列処理のメリットを付加した大幅な高速化を享受する,生成的対向ネットワークを提案する。生成モデルでは, 近似近傍探索の助けを借りて, 対数時間における後方サンプリングの近似argmaxを推定できる。分類とレコメンデーションタスクに関する広範囲な実験は、推論時間における桁違いな改善を示し、性能に有意な低下はない。

This work addresses the efficiency concern of inference in a nonlinear contextual bandit when the number of arms $n$ is very large. We propose a neural bandit model with an end-to-end training process to efficiently perform bandit algorithms such as Thompson Sampling and UCB during inference. We advance the state-of-the-art time complexity to $O(\log n)$ with approximate Bayesian inference, neural random feature mapping, approximate global maxima, and approximate nearest neighbor search. We further propose a generative adversarial network to shift the bottleneck of maximizing the objective for selecting optimal arms from inference time to training time, enjoying significant speedup with the additional advantage of enabling batch and parallel processing. The generative model can infer an approximate argmax of the posterior sampling in logarithmic time complexity with the help of approximate nearest neighbor search. Extensive experiments on classification and recommendation tasks demonstrate an order-of-magnitude improvement in inference time with no significant degradation in performance.

翻訳日:2022-02-21 14:25:58 公開日:2022-02-17
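
To make the inference bottleneck described in the abstract above concrete, here is a minimal Thompson Sampling sketch. It is a deliberately simplified, non-contextual Beta-Bernoulli bandit rather than the paper's neural contextual model, and all names are hypothetical. The point of interest is the arm-selection line: a naive argmax over posterior samples costs O(n) per round in the number of arms, which is the step the paper aims to reduce to O(log n).

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one sample per arm from its Beta(successes + 1, failures + 1) posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # Naive arm selection: a linear scan over all arms, O(n) per round.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulated environment: Bernoulli reward with the arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

With many arms, both the per-arm sampling and the linear scan dominate inference time, which motivates replacing them with an approximate search structure.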

# 連続時間対離散時間視ベースSLAM:比較研究

Continuous-Time vs. Discrete-Time Vision-based SLAM: A Comparative Study ( http://arxiv.org/abs/2202.08894v1 )

ライセンス: Link先を確認

Giovanni Cioffi, Titus Cieslewski, Davide Scaramuzza

(参考訳) ロボット実践者は一般に離散時間定式化を通じて視覚に基づくSLAM問題にアプローチする。これは統合理論の利点であり、成功事例と失敗事例を非常によく理解している。しかし、離散時間SLAMは、異なるセンサから来る高速度および/または非同期測定が推定プロセスに存在する場合、アルゴリズムを調整し、仮定を単純化する必要がある。逆に、実践者がしばしば見落としている継続的SLAMは、これらの制限に苦しめられません。実際、新しい測定値に新しい最適化変数を追加することなく、新しいセンサーデータを非同期に統合することができる。このように、非同期または連続的なセンサーデータの高速ストリームの統合は、高度に設計されたアルゴリズムを必要としないため、直感的な方法で複数のセンサーモダリティの融合を可能にする。マイナス面として、連続時間は、いくつかの好ましくない状況における軌道推定を悪化させる可能性のある事前を導入する。本研究では,視力に基づくSLAMにおける2つの定式化の利点と限界を体系的に比較することを目的とする。そこで我々は,ロボットの種類,動作速度,センサのモーダル性など,幅広い実験分析を行った。実験結果から, 軌道型とは独立に, 連続時間スラムは, センサが時間同期しない場合には, 個別のスラムよりも優れていることが示唆された。この作業の文脈で,slam問題を離散的かつ連続的に解決するための最先端アルゴリズムを含む,モジュール化された効率的なソフトウェアアーキテクチャを開発し,オープンソースとした。

Robotic practitioners generally approach the vision-based SLAM problem through discrete-time formulations. This has the advantage of a consolidated theory and a very good understanding of success and failure cases. However, discrete-time SLAM needs tailored algorithms and simplifying assumptions when high-rate and/or asynchronous measurements, coming from different sensors, are present in the estimation process. Conversely, continuous-time SLAM, often overlooked by practitioners, does not suffer from these limitations. Indeed, it allows integrating new sensor data asynchronously without adding a new optimization variable for each new measurement. In this way, the integration of asynchronous or continuous high-rate streams of sensor data does not require tailored and highly engineered algorithms, enabling the fusion of multiple sensor modalities in an intuitive fashion. On the downside, continuous time introduces a prior that could worsen the trajectory estimates in some unfavorable situations. In this work, we aim at systematically comparing the advantages and limitations of the two formulations in vision-based SLAM. To do so, we perform an extensive experimental analysis, varying robot type, speed of motion, and sensor modalities. Our experimental analysis suggests that, independently of the trajectory type, continuous-time SLAM is superior to its discrete counterpart whenever the sensors are not time-synchronized. In the context of this work, we developed and open-sourced a modular and efficient software architecture containing state-of-the-art algorithms to solve the SLAM problem in discrete and continuous time.

翻訳日:2022-02-21 14:19:11 公開日:2022-02-17

# トレーニング済みのGANはいつ、なぜ、どれが役に立つのか?

When, Why, and Which Pretrained GANs Are Useful? ( http://arxiv.org/abs/2202.08937v1 )

ライセンス: Link先を確認

Timofey Grigoryev, Andrey Voynov, Artem Babenko

(参考訳) 論文は、新しいデータセットで事前訓練されたganを微調整するいくつかの方法を提案しており、これは通常、スクラッチから、特に限られたデータレジームで、トレーニングよりも高いパフォーマンスをもたらす。しかし、GANプレトレーニングの明らかな経験的利点にもかかわらず、その内部メカニズムは詳細に分析されず、その役割の理解は明らかになっていない。さらに、例えば、適切な事前訓練されたGANチェックポイントを選択するなど、基本的な実践的詳細は、厳密な根拠を持たず、典型的には試行錯誤によって決定される。この研究は、ガン微調整の過程を解明することを目的としている。まず,事前学習したチェックポイントによるGANトレーニングプロセスの初期化が,個々のサンプルの忠実度よりもモデルのカバレッジに影響を与えることを示す。第2に,事前学習された生成器と判別器が微調整プロセスにどのように寄与するかを明示的に記述し,両者の事前学習の重要性に関するこれまでの証拠を説明する。最後に,本解析の直接的な実用的利点として,特定の対象タスクの微調整に最も適したganチェックポイントを選択するための簡単なレシピについて述べる。重要なことは、ほとんどのタスクにおいて、Imagenetで事前訓練されたGANは、視覚的品質が劣っているにもかかわらず、識別型コンピュータビジョンモデルの典型的な事前訓練シナリオと同様、微調整の出発点として優れたものと思われる。

The literature has proposed several methods to finetune pretrained GANs on new datasets, which typically results in higher performance compared to training from scratch, especially in the limited-data regime. However, despite the apparent empirical benefits of GAN pretraining, its inner mechanisms have not been analyzed in depth, and its role is not entirely understood. Moreover, essential practical details, e.g., selecting a proper pretrained GAN checkpoint, currently do not have rigorous grounding and are typically determined by trial and error. This work aims to dissect the process of GAN finetuning. First, we show that initializing the GAN training process from a pretrained checkpoint primarily affects the model's coverage rather than the fidelity of individual samples. Second, we explicitly describe how pretrained generators and discriminators contribute to the finetuning process and explain the previous evidence on the importance of pretraining both of them. Finally, as an immediate practical benefit of our analysis, we describe a simple recipe to choose the GAN checkpoint that is most suitable for finetuning to a particular target task. Importantly, for most of the target tasks, an ImageNet-pretrained GAN, despite having poor visual quality, appears to be an excellent starting point for finetuning, resembling the typical pretraining scenario of discriminative computer vision models.

翻訳日:2022-02-21 13:15:48 公開日:2022-02-17

# SGPT:意味検索のためのGPT文埋め込み

SGPT: GPT Sentence Embeddings for Semantic Search ( http://arxiv.org/abs/2202.08904v1 )

ライセンス: Link先を確認

Niklas Muennighoff

(参考訳) GPT変換器は利用可能な最大の言語モデルであるが、セマンティック検索はBERT変換器が支配している。 SGPT-BE と SGPT-CE を用いて,GPT モデルをバイエンコーダやクロスエンコーダとして対称探索や非対称探索に適用する。 SGPT-BEは、バイアステンソルのみの対照的な微調整と新しいプーリング手法により、意味的に意味のある文埋め込みを生成する。 58億のパラメータSGPT-BEは、最高の文埋め込みを6%上回り、BEIRに新しい最先端を設定する。同時に提案された175B DavinciエンドポイントのOpenAI Embeddingよりも優れており、後者は25万倍多くのパラメータを微調整している。 SGPT-CEは微調整なしでGPTモデルのログ確率を使用する。 61億のパラメータSGPT-CEは、BEIR上で教師なしの最先端を設定する。 7つのデータセットの教師付き最先端を破るが、他のデータセットでは著しく失われる。プロンプトに適応することで、どのように緩和できるかを示す。 SGPT-BEとSGPT-CEはモデルサイズでスケールする。しかし、レイテンシ、ストレージ、計算コストの増加を考慮すべきである。コード、モデル、結果ファイルはhttps://github.com/Muennighoff/sgpt から無料で入手できる。

GPT transformers are the largest language models available, yet semantic search is dominated by BERT transformers. We present SGPT-BE and SGPT-CE for applying GPT models as Bi-Encoders or Cross-Encoders to symmetric or asymmetric search. SGPT-BE produces semantically meaningful sentence embeddings by contrastive fine-tuning of only bias tensors and a novel pooling method. A 5.8 billion parameter SGPT-BE outperforms the best available sentence embeddings by 6%, setting a new state-of-the-art on BEIR. It outperforms the concurrently proposed OpenAI Embeddings of the 175B Davinci endpoint, which fine-tunes 250,000 times more parameters. SGPT-CE uses log probabilities from GPT models without any fine-tuning. A 6.1 billion parameter SGPT-CE sets an unsupervised state-of-the-art on BEIR. It beats the supervised state-of-the-art on 7 datasets, but loses significantly on other datasets. We show how this can be alleviated by adapting the prompt. SGPT-BE and SGPT-CE performance scales with model size. Yet, increased latency, storage and compute costs should be considered. Code, models and result files are freely available at https://github.com/Muennighoff/sgpt.

翻訳日:2022-02-21 13:03:19 公開日:2022-02-17
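
In a bi-encoder setup such as the one described in the abstract above, queries and documents are embedded independently and then compared with a similarity function. The following sketch shows only that retrieval step, using cosine similarity on hypothetical 3-dimensional embeddings; it does not reproduce SGPT's model, pooling method, or API, and the vectors and document ids are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_embedding, doc_embeddings):
    """Return document ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_embedding, emb), doc_id)
              for doc_id, emb in doc_embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Hypothetical embeddings; a real system would obtain these from the encoder.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.1, 0.0]
ranking = rank_documents(query, docs)
```

Because document embeddings do not depend on the query, they can be computed once and cached, which is the main latency advantage of a bi-encoder over a cross-encoder.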

# BADDr:POMDP用ベイズ適応型ディープドロップアウトRL

BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs ( http://arxiv.org/abs/2202.08884v1 )

ライセンス: Link先を確認

Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

(参考訳) 強化学習(RL)はスケーラビリティに大きな進歩を遂げているが、探索と部分観測可能性はまだ研究トピックとして活発である。対照的に、ベイジアンRL(BRL)は、州の推定と探索・探索のトレードオフの両方に対して原則的な答えを提供するが、スケールに苦慮している。この課題に対処するため、様々な前提を持つBRLフレームワークが提案され、様々な成功を収めている。この研究は、部分的に可観測性の下でのBRLの表現に依存しない定式化を示し、1つの理論的な傘の下で以前のモデルを統一する。また,その実用性を示すために,ドロップアウトネットワークに基づく新しい導出手法Bayes-Adaptive Deep Dropout rl (BADDr)を提案する。このパラメータ化の下では、以前の仕事とは対照的に、状態とダイナミクスに対する信念は、よりスケーラブルな推論問題である。我々はモンテカルロ木探索による行動選択を行い、我々の手法がより大きい領域を解きながら、小さな領域における最先端のBRL法と競合することを示す。

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.

翻訳日:2022-02-21 13:02:45 公開日:2022-02-17

# 効果的なスパースエキスパートモデルの設計

Designing Effective Sparse Expert Models ( http://arxiv.org/abs/2202.08906v1 )

ライセンス: Link先を確認

Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

(参考訳) スケールは自然言語処理の新たなフロンティアを切り開いたが、コストは高い。これに対し、Mixture-of-Experts (MoE) とSwitch Transformersは、より大きくより有能な言語モデルへのエネルギー効率の良い経路として提案されている。しかし、さまざまな自然言語タスクの最先端化は、微調整中にトレーニングの不安定さと不確実な品質によって妨げられている。私たちの仕事はこれらの問題に焦点を当て、デザインガイドとして機能します。計算コストは32B高密度エンコーダデコーダ変換器(Stable and Transferable Mixture-of-Experts, ST-MoE-32B)に匹敵する。スパースモデルは、推論(SuperGLUE, ARC Easy, ARC Challenge)、要約(XSum, CNN-DM)、クローズドブック質問応答(WebQA, Natural Questions)、反対に構築されたタスク(Winogrande, ANLI R3)など、さまざまなタスクの集合において、トランスファーラーニングにおける最先端のパフォーマンスを初めて達成する。

Scale has opened new frontiers in natural language processing -- but at a high cost. In response, Mixture-of-Experts (MoE) and Switch Transformers have been proposed as an energy efficient path to even larger and more capable language models. But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning. Our work focuses on these issues and acts as a design guide. We conclude by scaling a sparse model to 269B parameters, with a computational cost comparable to a 32B dense encoder-decoder Transformer (Stable and Transferable Mixture-of-Experts or ST-MoE-32B). For the first time, a sparse model achieves state-of-the-art performance in transfer learning, across a diverse set of tasks including reasoning (SuperGLUE, ARC Easy, ARC Challenge), summarization (XSum, CNN-DM), closed book question answering (WebQA, Natural Questions), and adversarially constructed tasks (Winogrande, ANLI R3).

翻訳日:2022-02-21 12:48:40 公開日:2022-02-17

# 言語抽象化による本質的探索の改善

Improving Intrinsic Exploration with Language Abstractions ( http://arxiv.org/abs/2202.08938v1 )

ライセンス: Link先を確認

Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

(参考訳) 強化学習(RL)エージェントは、報酬が不足している場合、特に訓練が困難である。共通の解決策の1つは、エージェントが環境を探索することを奨励するために内在的な報酬を使用することである。しかし、近年の内在的な探索手法では、低レベルの探索に報いるが、より抽象的なスキルを必要とする領域にはスケールしない状態に基づく新しい手法が用いられることが多い。代わりに、環境における関連する抽象化を強調するための一般的な媒体として自然言語を探索する。 amigo (campero et al., 2021) や noveld (zhang et al., 2021) といった競合型内在的探索ベースラインを直接拡張(および比較)することで、言語が既存の探索方法よりも改善できるかどうかを評価する。これらの言語ベースの変種は、MiniGridとMiniHack環境スイートの13の課題に対して、言語以外の形式を45～85%上回っている。

Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.

翻訳日:2022-02-21 12:42:46 公開日:2022-02-17

# マルチモダリティ・メディカルイメージングのためのグラフ畳み込みネットワーク:方法,アーキテクチャ,臨床応用

Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications ( http://arxiv.org/abs/2202.08916v1 )

ライセンス: Link先を確認

Kexin Ding, Mu Zhou, Zichen Wang, Qiao Liu, Corey W. Arnold, Shaoting Zhang, Dimitri N. Metaxas

(参考訳) 画像に基づく特徴づけと疾患の理解は、生物学的スケールにわたる形態的、空間的、および位相的情報の統合分析を含む。グラフ畳み込みネットワーク(gcns)の開発は、gcnsが機能集約、インタラクション、推論を驚くほどの柔軟性と効率で実行できるため、グラフ駆動アーキテクチャを通じてこの情報複雑性に対処する機会を生み出した。これらのGCNは、定量的疾患の理解、モニタリング、診断を改善することを目的として、医療画像解析における新たな研究の波を生み出している。しかし、多モードな医療画像のための重要な画像と画像の変換を設計し、モデル解釈と臨床診断支援の強化に関する洞察を得る上で、大きな課題が残っている。本稿では,放射線学や病理組織学からのイメージングデータを含む医用画像解析における最近のGCNの発展について述べる。本稿では,医療画像解析におけるグラフネットワークアーキテクチャの急速な普及と臨床における疾患診断と患者の予後の改善について考察する。分野横断的な研究を促進するために,我々は,画像ベースのgcnとそのモデル解釈における拡張における共通の課題,医療画像研究と関連するグラフ駆動医学研究のスコープを変えることを約束する大規模ベンチマークを提示する。

Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCN capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain in designing the important image-to-graph transformation for multi-modality medical imaging and in gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCN developments in the context of medical image analysis, including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCN technical advancements and emerging medical applications, and we identify common challenges in the use of image-based GCNs and their extensions in model interpretation, as well as large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research.

翻訳日:2022-02-21 12:41:56 公開日:2022-02-17

# (参考訳) 遺伝的カリキュラムによるロバスト強化学習

Robust Reinforcement Learning via Genetic Curriculum ( http://arxiv.org/abs/2202.08393v1 )

ライセンス: CC BY 4.0

Yeeho Song, Jeff Schneider

(参考訳) 安全クリティカルシステムに深部強化学習(RL)を適用する場合、堅牢な性能を達成することが重要である。芸術的アプローチのいくつかは、敵エージェントの問題に対処しようとするが、これらのエージェントは、しばしば専門家の監督を必要とし、敵エージェントが訓練者エージェントにとって難しすぎることを防ぐ。他のアプローチではトレーニング中に環境設定を自動的に調整するが、低次元エンコーディングが使用可能な単純な環境に限定されている。これらのアプローチに触発されて,エージェントが現在失敗しているシナリオを自動的に識別し,関連するカリキュラムを生成して,エージェントがシナリオを解決し,より堅牢な行動を得るための遺伝的カリキュラムを提案する。非パラメトリックオプティマイザとして、シナリオの生の非固定エンコーディングを使用し、専門家の監督の必要性を低減し、アルゴリズムがエージェントのパフォーマンスの変化に適応できるようにします。実験の結果,既存のアルゴリズムに対するロバスト性が向上し,積算報酬を犠牲にすることなく,エージェントの2～8倍の失敗率を低下させるトレーニングカリキュラムが得られた。我々はアブレーション研究を行い、アルゴリズムがなぜ以前のアプローチを上回っているかについての知見を共有する。

Achieving robust performance is crucial when applying deep reinforcement learning (RL) in safety-critical systems. Some state-of-the-art approaches try to address the problem with adversarial agents, but these agents often require expert supervision to fine-tune and to prevent the adversary from becoming too challenging for the trainee agent. While other approaches involve automatically adjusting environment setups during training, they have been limited to simple environments where low-dimensional encodings can be used. Inspired by these approaches, we propose genetic curriculum, an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum to help the agent learn to solve the scenarios and acquire more robust behaviors. As a non-parametric optimizer, our approach uses a raw, non-fixed encoding of scenarios, reducing the need for expert supervision and allowing our algorithm to adapt to the changing performance of the agent. Our empirical studies show improved robustness over existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail without sacrificing cumulative reward. We include an ablation study and share insights on why our algorithm outperforms prior approaches.

翻訳日:2022-02-19 05:06:18 公開日:2022-02-17
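
The abstract describes finding scenarios where the agent fails and breeding new scenarios from them with a raw, non-fixed encoding. The following is only a toy sketch of one such genetic step under stated assumptions, not the authors' algorithm: a scenario is a variable-length list of obstacle difficulties, `agent_fails` is a hypothetical stand-in for an actual rollout, and all function names are invented for illustration.

```python
import random


def agent_fails(scenario, skill):
    # Hypothetical stand-in for a rollout: fail if any obstacle exceeds the agent's skill.
    return max(scenario) > skill


def crossover(parent_a, parent_b, rng):
    # Splice a prefix of one parent onto a suffix of the other.
    # Lengths may differ, so the encoding is not fixed-size.
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    child = parent_a[:cut_a] + parent_b[cut_b:]
    return child or [parent_a[0]]  # never return an empty scenario


def mutate(scenario, rng, scale=0.1):
    # Perturb each difficulty slightly, keeping it non-negative.
    return [max(0.0, d + rng.uniform(-scale, scale)) for d in scenario]


def next_curriculum(population, skill, rng, size=8):
    # Select scenarios the agent currently fails; fall back to all scenarios if none fail.
    failing = [s for s in population if agent_fails(s, skill)]
    parents = failing or population
    return [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
        for _ in range(size)
    ]
```

In a real training loop the agent would be trained on the returned scenarios and `skill` would be replaced by actual evaluation of the current policy, so the curriculum follows the agent's changing performance.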

# (参考訳) Augment with Care: ブール満足度問題に対するコントラスト学習

Augment with Care: Contrastive Learning for the Boolean Satisfiability Problem ( http://arxiv.org/abs/2202.08396v1 )

ライセンス: CC BY 4.0

Haonan Duan, Pashootan Vaezipoor, Max B. Paulus, Yangjun Ruan and Chris J. Maddison

(参考訳) 教師付き学習はコンビネート問題に対する最先端の解法の設計を改善することができるが、膨大な数のコンビネートインスタンスをラベル付けすることは、指数関数的な最悪のケースの複雑さのために、しばしば実用的ではない。画像のコントラストプリトレーニングが最近成功したことに触発されて,ブール満足性問題に対するコントラストプレトレーニングに対する拡張設計の影響を科学的に研究した。典型的なグラフコントラスト事前学習はラベルに依存しない拡張を用いるが、我々の重要な洞察は、多くの組合せ問題にはよく研究された不変性があり、ラベル保存拡張の設計を可能にすることである。ラベル保存強化は対照的な事前学習の成功に不可欠である。我々の表現は、ラベルの1%しか使用せず、完全教師付き学習に匹敵するテスト精度を達成できることを示す。また、我々の表現は、目に見えない領域からより大きな問題に転送可能であることも示している。

Supervised learning can improve the design of state-of-the-art solvers for combinatorial problems, but labelling large numbers of combinatorial instances is often impractical due to exponential worst-case complexity. Inspired by the recent success of contrastive pre-training for images, we conduct a scientific study of the effect of augmentation design on contrastive pre-training for the Boolean satisfiability problem. While typical graph contrastive pre-training uses label-agnostic augmentations, our key insight is that many combinatorial problems have well-studied invariances, which allow for the design of label-preserving augmentations. We find that label-preserving augmentations are critical for the success of contrastive pre-training. We show that our representations are able to achieve comparable test accuracy to fully-supervised learning while using only 1% of the labels. We also demonstrate that our representations are more transferable to larger problems from unseen domains.

翻訳日:2022-02-19 04:51:40 公開日:2022-02-17
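
To make the idea of a label-preserving augmentation concrete, here is a minimal sketch that is not taken from the paper. It applies two well-known satisfiability-preserving transformations (flipping the polarity of a variable, and renaming variables by a permutation) to a CNF formula given as lists of signed integers, and uses a brute-force check to confirm that the SAT/UNSAT label does not change. The function names are hypothetical, and the paper's actual augmentations may differ.

```python
import random
from itertools import product


def is_satisfiable(cnf, num_vars):
    # Brute-force check; only feasible for tiny formulas.
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in cnf):
            return True
    return False


def flip_polarity(cnf, var):
    # Negate every occurrence of one variable; satisfiability is unchanged.
    return [[-lit if abs(lit) == var else lit for lit in clause] for clause in cnf]


def permute_variables(cnf, num_vars, rng):
    # Rename variables by a random permutation; satisfiability is unchanged.
    perm = list(range(1, num_vars + 1))
    rng.shuffle(perm)
    return [[(1 if lit > 0 else -1) * perm[abs(lit) - 1] for lit in clause] for clause in cnf]


SAT_EXAMPLE = [[1, -2], [2, 3], [-1, -3]]
UNSAT_EXAMPLE = [[1], [-1]]
```

A label-agnostic augmentation, such as deleting a random clause, carries no such guarantee: removing `[-1]` from `UNSAT_EXAMPLE` turns an unsatisfiable formula into a satisfiable one.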

# (参考訳) フェデレート確率勾配降下は自己誘導運動量を得る

Federated Stochastic Gradient Descent Begets Self-Induced Momentum ( http://arxiv.org/abs/2202.08402v1 )

ライセンス: CC BY 4.0

Howard H. Yang, Zuozhu Liu, Yaru Fu, Tony Q. S. Quek, H. Vincent Poor

(参考訳) フェデレーション学習(federated learning, ffl)は,サーバとクライアントのホストが,プライバシに敏感なデータを直接公開することなく,クライアントのデータや計算リソースを活用した統計モデルを協調的にトレーニングする,モバイルエッジシステムに適用可能な,新たなマシンラーニング手法である。このような条件下での確率勾配降下 (SGD) の実行は, 大域的な集約プロセスに運動量的な項を加えるとみなすことができる。そこで本研究では,パラメータの安定度と通信資源の影響を考慮し,フェデレーション学習システムの収束率をさらに解析する。これらの結果はフェデレーションsgdアルゴリズムの理解を前進させ、またシステム設計者にとって有用なステイネス解析とフェデレーション計算システムとのリンクを分岐させる。

Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm, and also forge a link between staleness analysis and federated computing systems, which can be useful for systems designers.

翻訳日:2022-02-19 04:29:33 公開日:2022-02-17

# (参考訳) AutoScore-Ordinal: 順序付け結果のスコアリングモデルを生成するための解釈可能な機械学習フレームワーク

AutoScore-Ordinal: An interpretable machine learning framework for generating scoring models for ordinal outcomes ( http://arxiv.org/abs/2202.08407v1 )

ライセンス: CC BY 4.0

Seyed Ehsan Saffari, Yilin Ning, Xie Feng, Bibhas Chakraborty, Victor Volovici, Roger Vaughan, Marcus Eng Hock Ong, Nan Liu

(参考訳) 背景:リスク予測モデルは、リスク階層化とリソース割り当てに役立つ臨床意思決定の有用なツールであり、患者の健康管理に繋がる可能性がある。 AutoScoreは、機械学習に基づくバイナリ結果のための自動臨床スコア生成装置である。本研究では,autoscoreフレームワークを拡張して,順序的結果に対するリスク予測を解釈可能にすることを目的とした。メソッド: AutoScore-Ordinalフレームワークは、変数ランキング、変数変換、スコア導出(比例奇数モデルからの)、モデル選択、スコア微調整、モデル評価を含む、オリジナルのAutoScoreアルゴリズムの6つのモジュールを使用して生成される。 2008年から2017年にかけてシンガポール総合病院の救急部門から電子カルテデータを用いてオートスコア・オルディナルのパフォーマンスを解析した。モデルはデータの70%でトレーニングされ、10%で検証され、残りの20%でテストされた。結果: 本研究は, 患者445,989例を対象とし, 平均結果の分布が80.7%, 30日可読率12.5%, 30日可読率12.5%, 退院後6.8%であった。フレキシブル変数選択手順によって同定された8変数の2セットを用いて,2つのポイントベースリスク予測モデルを開発した。 2つのモデルは、レシーバーの動作特性曲線 (0.785 と 0.793) の下の平均領域で測定された適度な性能を示し、代替モデルに匹敵する一般化された c-index (0.737 と 0.760) を示した。結論: autoscore-ordinalは、リスク予測モデルの開発と検証のための自動化および使いやすいフレームワークを提供し、高次元データから潜在的な予測者を体系的に識別する。

Background: Risk prediction models are useful tools in clinical decision-making which help with risk stratification and resource allocation and may lead to better health care for patients. AutoScore is a machine learning-based automatic clinical score generator for binary outcomes. This study aims to expand the AutoScore framework to provide a tool for interpretable risk prediction for ordinal outcomes. Methods: The AutoScore-Ordinal framework is generated using the same 6 modules of the original AutoScore algorithm, including variable ranking, variable transformation, score derivation (from proportional odds models), model selection, score fine-tuning, and model evaluation. To illustrate the AutoScore-Ordinal performance, the method was applied to electronic health records data from the emergency department at Singapore General Hospital from 2008 to 2017. The model was trained on 70% of the data, validated on 10% and tested on the remaining 20%. Results: This study included 445,989 inpatient cases, where the distribution of the ordinal outcome was 80.7% alive without 30-day readmission, 12.5% alive with 30-day readmission, and 6.8% died inpatient or by day 30 post discharge. Two point-based risk prediction models were developed using two sets of 8 predictor variables identified by the flexible variable selection procedure. The two models indicated reasonably good performance measured by mean area under the receiver operating characteristic curve (0.785 and 0.793) and generalized c-index (0.737 and 0.760), which were comparable to alternative models. Conclusion: AutoScore-Ordinal provides an automated and easy-to-use framework for development and validation of risk prediction models for ordinal outcomes, which can systematically identify potential predictors from high-dimensional data.

翻訳日:2022-02-19 04:19:57 公開日:2022-02-17
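
The 70% / 10% / 20% train, validation, and test partition mentioned in the abstract can be sketched as follows. This is a generic random split written for illustration; the function name, the seed, and the use of a simple unstratified shuffle are assumptions, and the abstract does not say how the study assigned cases to each set.

```python
import random


def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=0):
    # Shuffle record indices reproducibly, then cut into three consecutive parts.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, so no record is dropped
    return train, val, test
```

Taking the test set as the remainder guarantees the three parts are disjoint and cover every record even when `n` is not divisible evenly.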

# (参考訳) 動的グラフニューラルネットワークによる多変量時系列予測

Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs ( http://arxiv.org/abs/2202.08408v1 )

ライセンス: CC BY 4.0

Ming Jin, Yu Zheng, Yuan-Fang Li, Siheng Chen, Bin Yang, Shirui Pan

(参考訳) 多変量時系列予測は、エネルギー消費や交通予測といった実世界の応用において長年大きな注目を集めてきた。最近の手法は優れた予測能力を示しているが、3つの基本的な限界に苦しむ。 (i)離散型ニューラルネットワークアーキテクチャ: 個別にパラメータ化された空間的および時間的ブロックをエンコードすることで、不連続な潜在状態の軌跡を導き、数値的誤差を高い予測に導く。 (ii) 高複雑性: 離散的アプローチは、専用の設計と冗長なパラメータを持つモデルを複雑にし、高い計算とメモリオーバーヘッドをもたらす。 3) グラフ事前の信頼性: 事前定義された静的グラフ構造に基づくと、実世界のアプリケーションにおけるその有効性と実践性が制限される。本稿では,動的グラフニューラル常微分方程式(mtgode)を用いた多変量時系列予測のための連続モデルを提案することにより,上記の制約をすべて解決する。具体的には、まず多変量時系列を時間発展ノード特徴と未知グラフ構造を持つ動的グラフに抽象化する。次に,不足するグラフトポロジーを補完し,空間的および時間的メッセージパッシングを統一するニューラルodeを設計,解決し,より深いグラフ伝搬と細粒度の時間情報集約を可能にし,安定かつ精密な潜在空間-時間的ダイナミクスを特徴付ける。本実験は, MTGODEの5つの時系列ベンチマーク・データセットにおける優位性を示すものである。

Multivariate time series forecasting has long received significant attention in real-world applications, such as energy consumption and traffic prediction. While recent methods demonstrate good forecasting abilities, they suffer from three fundamental limitations. (i) Discrete neural architectures: Interlacing individually parameterized spatial and temporal blocks to encode rich underlying patterns leads to discontinuous latent state trajectories and higher numerical forecasting errors. (ii) High complexity: Discrete approaches complicate models with dedicated designs and redundant parameters, leading to higher computational and memory overheads. (iii) Reliance on graph priors: Relying on predefined static graph structures limits their effectiveness and practicability in real-world applications. In this paper, we address all the above limitations by proposing a continuous model to forecast Multivariate Time series with dynamic Graph neural Ordinary Differential Equations (MTGODE). Specifically, we first abstract multivariate time series into dynamic graphs with time-evolving node features and unknown graph structures. Then, we design and solve a neural ODE to complement missing graph topologies and unify both spatial and temporal message passing, allowing deeper graph propagation and fine-grained temporal information aggregation to characterize stable and precise latent spatial-temporal dynamics. Our experiments demonstrate the superiority of MTGODE from various perspectives on five time series benchmark datasets.

翻訳日:2022-02-19 04:05:40 公開日:2022-02-17

# (参考訳) 写本記号のエントロピー連想記憶

Entropic Associative Memory for Manuscript Symbols ( http://arxiv.org/abs/2202.08413v1 )

ライセンス: CC BY 4.0

Rafael Morales and Noé Hernández and Ricardo Cruz and Victor D. Cruz and Luis A. Pineda

(参考訳) メモリ検索は構成的な操作であり、メモリに含まれないオブジェクトへのメモリキューは検索なしで直接拒否され、並列計算によってメモリ操作を行うことができる。文字と数字の両方の写本記号は、関連するエントロピーを持つ連想記憶レジスタで表される。メモリ認識操作は、精度とリコールの間のエントロピートレードオフに従い、エントロピーレベルは、メモリ検索操作によって回収されたオブジェクトの品質に影響を及ぼす。本提案は,連想記憶のニューラルネットワークモデルと数次元的に対比される。重度咬合などの完全情報と不完全情報の両方を有する物体を検索するためのエントロピー連想記憶の動作特性について検討する。本稿で報告した実験は、自然記憶の実用的応用と計算モデルを開発するためのこの枠組みの可能性を示すものである。

Manuscript symbols can be stored, recognized and retrieved from an entropic digital memory that is associative and distributed yet declarative; memory retrieval is a constructive operation, memory cues to objects not contained in the memory are rejected directly without search, and memory operations can be performed through parallel computations. Manuscript symbols, both letters and numerals, are represented in Associative Memory Registers that have an associated entropy. The memory recognition operation obeys an entropy trade-off between precision and recall, and the entropy level affects the quality of the objects recovered through the memory retrieval operation. The present proposal is contrasted in several dimensions with neural network models of associative memory. We discuss the operational characteristics of the entropic associative memory for retrieving objects with both complete and incomplete information, such as severe occlusions. The experiments reported in this paper add evidence of the potential of this framework for developing practical applications and computational models of natural memory.

翻訳日:2022-02-19 03:38:00 公開日:2022-02-17

# (参考訳) FPIC:光PCB保証のための新しいセマンティックデータセット

FPIC: A Novel Semantic Dataset for Optical PCB Assurance ( http://arxiv.org/abs/2202.08414v1 )

ライセンス: CC BY 4.0

Nathan Jessurun, Olivia P. Dizon-Paradis, Jacob Harrison, Shajib Ghosh, Mark M. Tehranipoor, Damon L. Woodard, Navid Asadizanjani

(参考訳) 印刷基板(PCB)の海外へのアウトソーシングは、ハードウェア保証能力の向上を必要とした。この目的のために、過去にデジタルカメラを用いて取得したPCB画像の様々な側面を探求する自動光学検査(AOI)技術が提案されている。本研究では、最先端のAOI手法を概観し、機械学習(ML)ソリューションに対する強固で急激な傾向を観察した。これらは、公開されているPCBデータ空間に欠けている、大量のラベル付き真実データを必要とする。本稿では,FICS PBCイメージコレクション(FPIC)データセットを提案する。さらに、この研究は、ハードウェアセキュリティ能力の潜在的な増加と、データ収集中に強調された方法論的区別をカバーしている。

The continued outsourcing of printed circuit board (PCB) fabrication to overseas venues necessitates increased hardware assurance capabilities. Toward this end, several automated optical inspection (AOI) techniques have been proposed in the past exploring various aspects of PCB images acquired using digital cameras. In this work, we review state-of-the-art AOI techniques and observe a strong, rapid trend toward machine learning (ML) solutions. These require significant amounts of labeled ground truth data, which is lacking in the publicly available PCB data space. We propose the FICS PCB Image Collection (FPIC) dataset to address this bottleneck in available large-volume, diverse, semantic annotations. Additionally, this work covers the potential increase in hardware security capabilities and observed methodological distinctions highlighted during data collection.

翻訳日:2022-02-19 03:17:18 公開日:2022-02-17

# (参考訳) 制御可能な調和性と多声性を有するコードコンディショニングメロディ合唱

Chord-Conditioned Melody Choralization with Controllable Harmonicity and Polyphonicity ( http://arxiv.org/abs/2202.08423v1 )

ライセンス: CC BY 4.0

Shangda Wu, Xiaobing Li, Maosong Sun

(参考訳) メロディ合唱(メロディの合唱)、すなわちユーザ・ギヴン・メロディに基づく4パートの合唱は、長い間J.S.バッハ合唱と密接に関連していた。従来のニューラルネットワークベースのシステムは、コード進行を条件としたchorale生成にほとんど焦点を合わせず、いずれも制御可能なメロディ合唱を実現していなかった。ニューラルネットワークがバッハの合唱曲から対位法の一般的な原理を学べるように、コードコンディショニングのためのコードシンボルを符号化した音楽表現を最初に設計する。次に,コード進行に条件付きメロディのための4パート合唱を生成可能なメロディ合唱システムであるDeepChoirを提案する。さらに、密度サンプリングの改善により、ユーザはDeepChoirが生成するコラールの調和度やポリフォニック度を制御できる。実験結果から,高調波とポリフォニック性に対するDeepChoirのデータ表現の有効性と制御性を明らかにした。 DeepChoirのコードと生成されたサンプル(合唱曲、民謡、交響曲)、そして現在使用しているデータセットは https://github.com/sander-wood/deepchoir で入手できる。

Melody choralization, i.e. generating a four-part chorale based on a user-given melody, has long been closely associated with J.S. Bach chorales. Previous neural network-based systems rarely focus on chorale generation conditioned on a chord progression, and none of them realised controllable melody choralization. To enable neural networks to learn the general principles of counterpoint from Bach's chorales, we first design a music representation that encodes chord symbols for chord conditioning. We then propose DeepChoir, a melody choralization system, which can generate a four-part chorale for a given melody conditioned on a chord progression. Furthermore, with the improved density sampling, a user can control the extent of harmonicity and polyphonicity for the chorale generated by DeepChoir. Experimental results reveal the effectiveness of our data representation and the controllability of DeepChoir over harmonicity and polyphonicity. The code and generated samples (chorales, folk songs and a symphony) of DeepChoir, and the dataset we use, are now available at https://github.com/sander-wood/deepchoir.

翻訳日:2022-02-19 02:59:47 公開日:2022-02-17

# (参考訳) オンライン線形回帰としての合成制御

Synthetic Control As Online Linear Regression ( http://arxiv.org/abs/2202.08426v1 )

ライセンス: CC BY 4.0

Jiafeng Chen

(参考訳) 本稿では,合成制御とオンライン学習の単純な関係について述べる。具体的には、FTL(Follow-The-Leader)の例として合成制御を認識する。オンライン凸最適化における標準結果から, 対向的な結果が選択された場合でも, 処理単位に対する対実結果の合成制御予測は, 制御単位の結果のオラクル重み付き平均とほぼ同等に実行されることが示唆された。差分データに対する合成制御は、オラクル重み付き差分差分とほぼ同等に動作する。この観察は、比較ケーススタディにおける合成制御推定器の使用をさらに支援していると論じる。

This paper notes a simple connection between synthetic control and online learning. Specifically, we recognize synthetic control as an instance of Follow-The-Leader (FTL). Standard results in online convex optimization then imply that, even when outcomes are chosen by an adversary, synthetic control predictions of counterfactual outcomes for the treated unit perform almost as well as an oracle weighted average of control units' outcomes. Synthetic control on differenced data performs almost as well as oracle weighted difference-in-differences. We argue that this observation further supports the use of synthetic control estimators in comparative case studies.

翻訳日:2022-02-19 02:44:39 公開日:2022-02-17
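
The connection to Follow-The-Leader can be sketched in code. The example below is an illustrative reading of the abstract, not the paper's implementation: in each period it refits simplex-constrained weights on all earlier periods (the "leader" so far) and uses them to predict the treated unit's outcome for the current period. The solver (projected gradient descent), the iteration count, and all function names are assumptions made for this sketch.

```python
def project_to_simplex(v):
    # Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based method).
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def fit_weights(controls, treated, n_units, iterations=2000):
    # Minimise sum_t (treated[t] - w . controls[t])^2 over the simplex.
    w = [1.0 / n_units] * n_units
    scale = sum(x * x for row in controls for x in row)
    if scale == 0:
        return w  # no history yet: fall back to uniform weights
    step = 1.0 / (2.0 * scale)  # conservative step from a bound on the curvature
    for _ in range(iterations):
        grad = [0.0] * n_units
        for row, y in zip(controls, treated):
            residual = sum(wi * xi for wi, xi in zip(w, row)) - y
            for i, xi in enumerate(row):
                grad[i] += 2.0 * residual * xi
        w = project_to_simplex([wi - step * gi for wi, gi in zip(w, grad)])
    return w


def follow_the_leader(controls, treated):
    # Predict each period using weights fit only on the periods before it.
    n_units = len(controls[0])
    predictions = []
    for t in range(len(treated)):
        w = fit_weights(controls[:t], treated[:t], n_units)
        predictions.append(sum(wi * xi for wi, xi in zip(w, controls[t])))
    return predictions
```

Here `controls[t]` holds the control units' outcomes in period `t` and `treated[t]` the treated unit's outcome. The regret guarantee discussed in the abstract compares the cumulative squared error of these one-step-ahead predictions with that of the best fixed weights chosen in hindsight.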

# (参考訳) AKB-48: 実世界のArticulated Object Knowledge Base

AKB-48: A Real-World Articulated Object Knowledge Base ( http://arxiv.org/abs/2202.08432v1 )

ライセンス: CC BY 4.0

Liu Liu, Wenqiang Xu, Haoyuan Fu, Sucheng Qian, Yang Han, Cewu Lu

(参考訳) 人間の生活は明瞭な物体で占められている。表現された物体の包括的理解、すなわち外観、構造、物理学的性質、意味論は、多くの研究コミュニティに利益をもたらすだろう。現在の調音オブジェクト理解ソリューションは、通常、物理特性のないCADモデルによる合成オブジェクトデータセットに基づいており、視覚およびロボット工学のタスクにおけるシミュレーションから実世界の応用への満足のいく一般化を防ぐ。このギャップを埋めるために、48のカテゴリからなる実世界3次元関節オブジェクトモデル2,037からなる大規模関節オブジェクト知識ベースであるakb-48を提案する。各オブジェクトは知識グラフArtiKGによって記述される。 akb-48を構築するために,高速な調音知識モデリング(farm)パイプラインを提案する。このパイプラインは10～15分で調音オブジェクトのarikgを満たし,実世界でのオブジェクトモデリングのコストを大幅に削減する。提案するAKBNetは,C-VAM(Calegory-level Visual Articulation Manipulation)タスクのための新しい積分パイプラインであり,ポーズ推定,オブジェクト再構成,操作という3つのサブタスクをベンチマークする。データセット、コード、モデルはhttps://liuliu66.github.io/articulationobjects/で公開されている。

Human life is populated with articulated objects. A comprehensive understanding of articulated objects, namely appearance, structure, physics property, and semantics, will benefit many research communities. Current articulated object understanding solutions are usually based on synthetic object datasets with CAD models without physics properties, which prevents satisfactory generalization from simulation to real-world applications in visual and robotics tasks. To bridge the gap, we present AKB-48: a large-scale Articulated object Knowledge Base which consists of 2,037 real-world 3D articulated object models of 48 categories. Each object is described by a knowledge graph ArtiKG. To build the AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can fulfill the ArtiKG for an articulated object within 10-15 minutes, and largely reduces the cost for object modeling in the real world. Using our dataset, we propose AKBNet, a novel integral pipeline for Category-level Visual Articulation Manipulation (C-VAM) task, in which we benchmark three sub-tasks, namely pose estimation, object reconstruction and manipulation. Dataset, codes, and models will be publicly available at https://liuliu66.github.io/articulationobjects/.

翻訳日:2022-02-19 02:23:34 公開日:2022-02-17

# (参考訳) 説明可能な強化学習に関する調査

A Survey of Explainable Reinforcement Learning ( http://arxiv.org/abs/2202.08434v1 )

ライセンス: CC BY 4.0

Stephanie Milani and Nicholay Topin and Manuela Veloso and Fei Fang

(参考訳) 説明可能な強化学習(XRL)は説明可能な機械学習の新たなサブフィールドであり,近年注目されている。 xrlの目標は、学習エージェントの逐次意思決定設定における意思決定過程を明らかにすることである。本稿では,RL設定を優先するXRL文献を整理するための新しい分類法を提案する。私たちはこの分類法に従ってテクニックを概説する。将来の仕事のロードマップをモチベーションにし、概説するために使用する文献のギャップを指摘します。

Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine learning that has attracted considerable attention in recent years. The goal of XRL is to elucidate the decision-making process of learning agents in sequential decision-making settings. In this survey, we propose a novel taxonomy for organizing the XRL literature that prioritizes the RL setting. We overview techniques according to this taxonomy. We point out gaps in the literature, which we use to motivate and outline a roadmap for future work.

翻訳日:2022-02-19 02:10:26 公開日:2022-02-17

# (参考訳) 前立腺癌のスライド画像全体を調べる病理医の視覚的注意分析

Visual attention analysis of pathologists examining whole slide images of Prostate cancer ( http://arxiv.org/abs/2202.08437v1 )

ライセンス: CC BY 4.0

Souradeep Chakraborty, Ke Ma, Rajarsi Gupta, Beatrice Knudsen, Gregory J. Zelinsky, Joel H. Saltz, Dimitris Samaras

(参考訳) 本研究は,前立腺癌組織の全スライディング画像(WSI)をデジタル顕微鏡を用いて検討する。我々の知る限りでは、病理学者が前立腺がんのWSIをどのようにナビゲートし、診断に関する情報を蓄積するかを報告するのは初めてです。本研究は,GU専門医5名と一般病理医8名からなる13名の病理医からスライドナビゲーションデータ(ビューポート位置,拡大レベル,時間)を収集し,視覚的注意熱マップとスキャンパスを生成した。各病理医は、GUの病理専門医が選択したTCGA PRADデータセットから5つのWSIを検査した。 wsi検査後の病理医群における視覚注意の分布について検討・解析した。 WSIにおける病理医の注意とがんの証拠との関係を定量化するために, 生殖器専門医から腫瘍のアノテーションを得た。これらのアノテーションを用いて視覚注意の分布と腫瘍領域との重なりを計算し,強い相関関係を同定した。この分析により,未知のWSIに対する視覚的注意を予測するために,ディープラーニングモデルを訓練した。本モデルによって予測された注意熱マップは, 様々な空間的, 時間的評価指標を用いて17wsisの検査群において, 基底真理注意熱マップや腫瘍注釈と非常によく相関することがわかった。

We study the attention of pathologists as they examine whole-slide images (WSIs) of prostate cancer tissue using a digital microscope. To the best of our knowledge, our study is the first to report in detail how pathologists navigate WSIs of prostate cancer as they accumulate information for their diagnoses. We collected slide navigation data (i.e., viewport location, magnification level, and time) from 13 pathologists in 2 groups (5 genitourinary (GU) specialists and 8 general pathologists) and generated visual attention heatmaps and scanpaths. Each pathologist examined five WSIs from the TCGA PRAD dataset, which were selected by a GU pathology specialist. We examined and analyzed the distributions of visual attention for each group of pathologists after each WSI was examined. To quantify the relationship between a pathologist's attention and evidence for cancer in the WSI, we obtained tumor annotations from a genitourinary specialist. We used these annotations to compute the overlap between the distribution of visual attention and annotated tumor region to identify strong correlations. Motivated by this analysis, we trained a deep learning model to predict visual attention on unseen WSIs. We find that the attention heatmaps predicted by our model correlate quite well with the ground truth attention heatmap and tumor annotations on a test set of 17 WSIs by using various spatial and temporal evaluation metrics.

翻訳日:2022-02-19 01:56:58 公開日:2022-02-17

# (参考訳) 深層強化学習に基づく適応と一般化に関する研究

A Survey on Deep Reinforcement Learning-based Approaches for Adaptation and Generalization ( http://arxiv.org/abs/2202.08444v1 )

ライセンス: CC BY 4.0

Pamul Yadav, Ashutosh Mishra, Junyong Lee, Shiho Kim

(参考訳) Deep Reinforcement Learning (DRL)は、現実世界の環境で複雑な問題を効率的に解けるインテリジェントエージェントを作ることを目的としている。通常、適応と一般化という2つの学習目標が、DRLアルゴリズムの性能を異なるタスクや領域で基礎付けるために使われる。本稿では,DRLに基づく適応と一般化に向けた最近の研究動向について述べる。まず、これらの目標をタスクとドメインのコンテキストで定式化する。次に,これらの手法による最近の研究成果を概観し,DRLアルゴリズムの適応性と一般化性を向上し,現実世界の幅広い問題に適用できる可能性について論じる。

Deep Reinforcement Learning (DRL) aims to create intelligent agents that can learn to solve complex problems efficiently in a real-world environment. Typically, two learning goals, adaptation and generalization, are used for baselining a DRL algorithm's performance on different tasks and domains. This paper presents a survey on the recent developments in DRL-based approaches for adaptation and generalization. We begin by formulating these goals in the context of task and domain. Then we review the recent works under those approaches and discuss future research directions through which DRL algorithms' adaptability and generalizability can be enhanced, potentially making them applicable to a broad range of real-world problems.

翻訳日:2022-02-19 01:48:45 公開日:2022-02-17

# (参考訳) Design-Bench: データ駆動のオフラインモデルベース最適化のためのベンチマーク

Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization ( http://arxiv.org/abs/2202.08450v1 )

ライセンス: CC BY 4.0

Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine

(参考訳) black-box model-based optimization(mbo)問題は、未知の目的関数を最大化する設計入力を見つけることを目的としており、タンパク質の設計、dna配列、航空機、ロボットなど、幅広い領域においてユビキタスである。モデルに基づく最適化問題を解決するには、通常、設計提案において未知の目的関数を積極的に問い合わせる必要があり、つまり、候補分子、航空機、ロボットを物理的に構築し、それをテストし、結果を格納する。このプロセスは高価で時間がかかり、代わりに、既に持っているデータのみを使用して最適な設計のために最適化することを好むかもしれません。この設定はオフラインmboと呼ばれ、より一般的に研究されているオンライン技術とは異なるアルゴリズム上の課題をもたらす。近年の多くの研究は、高容量ディープニューラルネットワークを用いた高次元最適化問題に対するオフラインMBOの成功を示している。しかし、この新興分野における標準ベンチマークの欠如は、追跡を困難にしている。そこで本稿では,評価プロトコルを統一したオフラインmboベンチマークであるdesign-benchと,最近の手法のリファレンス実装を提案する。私たちのベンチマークには、生物学、材料科学、ロボット工学における現実世界の最適化問題から派生した、多様な現実的なタスクが含まれています。ベンチマークおよびリファレンス実装はgithub.com/rail-berkeley/design-benchおよびgithub.com/rail-berkeley/design-baselinesでリリースしています。

Black-box model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function, are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots. Solving model-based optimization problems typically requires actively querying the unknown objective function on design proposals, which means physically building the candidate molecule, aircraft, or robot, testing it, and storing the result. This process can be expensive and time-consuming, and one might instead prefer to optimize for the best design using only the data one already has. This setting, called offline MBO, poses substantial algorithmic challenges that differ from those of more commonly studied online techniques. A number of recent works have demonstrated success with offline MBO for high-dimensional optimization problems using high-capacity deep neural networks. However, the lack of standardized benchmarks in this emerging field is making progress difficult to track. To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics that present distinct challenges for offline MBO. Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines.

翻訳日:2022-02-19 01:26:54 公開日:2022-02-17

# (参考訳) 科学的成功の遺伝子

The Gene of Scientific Success ( http://arxiv.org/abs/2202.08461v1 )

ライセンス: CC BY 4.0

Xiangjie Kong, Jun Zhang, Da Zhang, Yi Bu, Ying Ding, Feng Xia

(参考訳) 本稿では,科学的影響を改善するための因果要因の同定と評価方法について詳述する。現在、科学的影響の分析は、資金提供申請、メンター推薦、潜在的な協力者発見など様々な学術活動に有用である。ハイインパクトの学者は勤勉な仕事の奨励として賞を受ける機会が多すぎることが広く認められている。そのため、学者は科学的な業績を上げ、学問生活における科学的影響を改善することに多大な努力を捧げている。しかし,研究者の学業成功を左右する要因は何か。この問いへの答えは、研究者がより効率的に研究を行うのに役立つ。そこで本研究では,研究者の学術的成功に不可欠な要因を提示し,分析する。まず,記事中心因子,著者中心因子,会場中心因子,施設中心因子,時間的要因を含む5つの主要な要因を提案する。次に,最近の機械学習アルゴリズムとジャックナイフ法を適用し,各因果因子の重要性を評価する。その結果,著者中心および記事中心の要因は,コンピュータ科学分野における研究者の今後の成功に最も寄与することが示された。さらに、同じ機関や大学内の研究者のh-インデックスが互いに非常に近いという興味深い現象が発見された。

This paper elaborates how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities, including funding applications, mentor recommendation, and discovering potential collaborators. It is widely acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard work. Therefore, scholars devote great effort to making scientific achievements and improving scientific impact during their academic life. However, what are the determining factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. With this in mind, our paper presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors: article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and the jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors have the highest relevance to scholars' future success in the computer science area. Additionally, we discover an interesting phenomenon: the h-indices of scholars within the same institution or university are actually very close to each other.

翻訳日:2022-02-19 00:56:07 公開日:2022-02-17
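
Since the abstract's final observation is stated in terms of the h-index, a short reminder of how that quantity is computed may help. The sketch below uses the standard definition (the largest h such that a scholar has at least h papers with at least h citations each); the function name and the sample citation counts are made up for illustration and do not come from the paper's data.

```python
def h_index(citations):
    # Rank papers by citation count, highest first.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h
```

For example, citation counts of 10, 8, 5, 4, and 3 give an h-index of 4, because four papers have at least four citations each but there are not five papers with at least five.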

# (参考訳) フルスパンログ線形モデルと高速学習アルゴリズム

Full-Span Log-Linear Model and Fast Learning Algorithm ( http://arxiv.org/abs/2202.08472v1 )

ライセンス: CC BY-SA 4.0

Kazuya Takabatake, Shotaro Akaho

(参考訳) 本論文で導入されたフルスパン対数線形(fsll)モデルは、ターゲットシステム内の全ての変数の数が$n$であるボルツマンマシンと見なされる。 X = (X_0, ..., X_{n-1})$ を$|X|=|X_0|...|X_{n-1}|$異なる値を取る有限離散確率変数とする。 FSLLモデルは$|X|-1$パラメータを持ち、任意の正の分布を$X$で表すことができる。 FSLLモデルは「高階」ボルツマンマシンであるが、指数族において重要な役割を果たすモデル分布の双対パラメータを$O(|X|\log|X|)$ timeで計算することができる。さらに、FSLLモデルの双対パラメータの特性を用いて、効率的な学習アルゴリズムを構築することができる。 FSLLモデルは、最大$|X|\approx2^{25}$までの小さな確率モデルに制限されているが、この問題領域では、FSLLモデルはハイパーパラメータチューニングなしで、トレーニングデータの基礎となる様々な真の分布に柔軟に適合する。実験の結果、FSLLはラップトップPCで1分以内に$|X|=2^{20}$のような6つのトレーニングデータセットを学習した。

The full-span log-linear (FSLL) model introduced in this paper is considered an $n$-th order Boltzmann machine, where $n$ is the number of all variables in the target system. Let $X=(X_0,...,X_{n-1})$ be finite discrete random variables that can take $|X|=|X_0|...|X_{n-1}|$ different values. The FSLL model has $|X|-1$ parameters and can represent arbitrary positive distributions of $X$. The FSLL model is a "highest-order" Boltzmann machine; nevertheless, we can compute the dual parameters of the model distribution, which play important roles in exponential families, in $O(|X|\log|X|)$ time. Furthermore, using properties of the dual parameters of the FSLL model, we can construct an efficient learning algorithm. The FSLL model is limited to small probabilistic models up to $|X|\approx2^{25}$; however, in this problem domain, the FSLL model flexibly fits various true distributions underlying the training data without any hyperparameter tuning. The experiments showed that the FSLL model successfully learned six training datasets such that $|X|=2^{20}$ within one minute on a laptop PC.

翻訳日:2022-02-19 00:40:36 公開日:2022-02-17
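
One standard way an $O(|X|\log|X|)$ cost arises is a transform with Kronecker-product structure. As an illustration only, and assuming binary variables with $\pm 1$ parity basis functions (the abstract does not state the basis the paper uses, so this may not match its construction), the expectations of all $2^n$ parity functions under a distribution over $\{0,1\}^n$ can be computed with a fast Walsh-Hadamard transform in $n \cdot 2^n = |X|\log_2|X|$ additions:

```python
def fwht(values):
    # Fast Walsh-Hadamard transform; len(values) must be a power of two.
    a = list(values)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y  # butterfly step
        h *= 2
    return a


def parity_expectations(probabilities):
    # Entry S of the result is E[(-1)^(number of set bits of S that are also set in x)],
    # where probabilities[x] is the probability of the bit pattern x.
    return fwht(probabilities)
```

Entry 0 of the result is always the total probability mass, so it equals 1 for a normalised distribution. That leaves $|X|-1$ informative values, the same count as the model's parameters.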

# (参考訳) パラフレーズ生成の評価基準の再検討

Revisiting the Evaluation Metrics of Paraphrase Generation ( http://arxiv.org/abs/2202.08479v1 )

ライセンス: CC BY 4.0

Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi

(参考訳) パラフレーズ生成は近年大きな進歩を遂げた重要なNLPタスクである。しかし、重要な問題の一つが「パラフレーズの品質をどのように評価するか?」である。ほとんどの既存のパラフレーズ生成モデルは、ニューラルネットワーク翻訳(NMT)から参照ベースのメトリクス(BLEUなど)を使用して、生成されたパラフレーズを評価する。このようなメトリクスの信頼性はほとんど評価されておらず、標準参照が存在する場合にのみ妥当である。そこで本稿では,まず「既存のメトリクスはパラフレーズ生成に信頼性があるか?」という問いに答える。パラフレーズ生成における従来の知恵に反する2つの結論を提示する。(1)システムレベルとセグメントレベルのパラフレーズ評価において、既存のメトリクスは人間のアノテーションと不一致である。 2) 基準のないメトリクスは基準ベースのメトリクスよりも優れており、パラフレーズの品質を評価するのに標準参照は不要であることを示している。このような経験的発見は、信頼性の高い自動評価指標の欠如を露呈する。そこで本稿では,生成したパラフレーズの品質を反映した参照フリーメトリックであるBBScoreを提案する。 BBScoreはS3CスコアとSelfBLEUの2つのサブメトリックから構成されており、これは意味的保存と多様性の2つの基準に対応する。 2つのサブメトリックを接続することで、BBScoreは既存のパラフレーズ評価指標を大幅に上回る。

Paraphrase generation is an important NLP task that has achieved significant progress recently. However, one crucial problem is overlooked: "How should the quality of a paraphrase be evaluated?" Most existing paraphrase generation models use reference-based metrics (e.g., BLEU) from neural machine translation (NMT) to evaluate their generated paraphrases. The reliability of such metrics has rarely been evaluated, and they are only plausible when a standard reference exists. Therefore, this paper first answers one fundamental question: "Are existing metrics reliable for paraphrase generation?" We present two conclusions that contradict conventional wisdom in paraphrase generation: (1) existing metrics poorly align with human annotation in system-level and segment-level paraphrase evaluation; (2) reference-free metrics outperform reference-based metrics, indicating that standard references are unnecessary for evaluating a paraphrase's quality. Such empirical findings expose a lack of reliable automatic evaluation metrics. Therefore, this paper proposes BBScore, a reference-free metric that can reflect the generated paraphrase's quality. BBScore consists of two sub-metrics, S3C score and SelfBLEU, which correspond to two criteria for paraphrase evaluation: semantic preservation and diversity. By connecting the two sub-metrics, BBScore significantly outperforms existing paraphrase evaluation metrics.

翻訳日:2022-02-19 00:21:20 公開日:2022-02-17

# (参考訳) 自己教師付きノード表現学習のための構造的および意味的コントラスト学習

Structural and Semantic Contrastive Learning for Self-supervised Node Representation Learning ( http://arxiv.org/abs/2202.08480v1 )

ライセンス: CC BY 4.0

Kaize Ding, Yancheng Wang, Yingzhen Yang and Huan Liu

(参考訳) グラフコントラスト学習(GCL)は最近、自己教師型で一般化可能、転送可能、堅牢なノード表現の学習に多くの研究関心を集めている。一般に、gclのコントラスト学習プロセスは、グラフニューラルネットワーク(gnn)バックボーンによって学習された表現の上に行われ、ノードのコンテキスト情報をローカルな近傍に基づいて変換、伝播する。しかし、既存のGCLの取り組みは、アーキテクチャのエンコーディング、拡張、および対照的な目的の両方の観点から厳しい制限があり、異なるデータセットで使用するのに一般的に非効率で非効率である。この作業では、既存の教師なしのGCLを超越し、シンプルだが効果的なフレームワークであるS$^3$-CLを提案し、それらの制限に対処する。具体的には、構造的および意味的対比学習によって、単純なニューラルネットワークでさえ、価値のある構造的および意味的パターンを保存する表現的ノード表現を学習することができる。実験により, S$^3$-CLで学習したノード表現は, 最先端のGCL法と比較して, 異なる下流タスクにおいて優れた性能を示すことが示された。

Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable, transferable, and robust node representations in a self-supervised fashion. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, existing GCL efforts have severe limitations in terms of encoding architecture, augmentation, and contrastive objective, making them commonly inefficient and ineffective across different datasets. In this work, we go beyond the existing unsupervised GCL counterparts and address their limitations by proposing a simple yet effective framework S$^3$-CL. Specifically, by virtue of the proposed structural and semantic contrastive learning, even a simple neural network is able to learn expressive node representations that preserve valuable structural and semantic patterns. Our experiments demonstrate that the node representations learned by S$^3$-CL achieve superior performance on different downstream tasks compared to the state-of-the-art GCL methods.

翻訳日:2022-02-19 00:08:54 公開日:2022-02-17

# (参考訳) 連続物理学のための連続モデル学習

Learning continuous models for continuous physics ( http://arxiv.org/abs/2202.08494v1 )

ライセンス: CC BY 4.0

Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson, Michael W. Mahoney

(参考訳) 時間とともに連続的に発展する力学系は、科学と工学のあらゆる分野に遍在する。機械学習(ML)は、そのようなシステムのダイナミクスをモデル化し予測するためのデータ駆動型アプローチを提供する。このアプローチの中核的な問題は、MLモデルが典型的には離散データで訓練され、基礎となる連続性を考慮しないML手法を用いるため、対象システムの連続的なダイナミクスを捉えられないことが多い点にある。その結果、これらのMLモデルは、多くの科学的・工学的応用において有用性が限られる。この課題に対処するため、数値解析理論に基づく収束テストを開発した。このテストは、モデルがシステムの基礎となる連続ダイナミクスを正確に近似する関数を学習したかどうかを検証する。このテストに失敗するモデルは関連するダイナミクスを捉えられず、多くの科学的予測タスクで有用性が限られる。一方、このテストに合格するモデルは、より優れた補間とより優れた補外の両方を複数の方法で実現できる。本研究の結果は、原理に基づく数値解析手法を既存のMLトレーニング/テスト手法と組み合わせて、科学・工学応用向けのモデルを検証する方法を示している。

Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties, which results in models that often do not capture the underlying continuous dynamics of a system of interest. As a result, these ML models are of limited use for many scientific and engineering applications. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates a system's underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks, while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
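The abstract does not spell out the convergence test, so the following is only a sketch of the classical step-halving check from numerical analysis that such a test could build on. It assumes the learned model is a vector field `f` plugged into a forward Euler integrator; here a known decay law stands in for the learned model so that the exact solution is available. All names are illustrative.

```python
import math


def euler_solve(f, y0, t_end, h):
    """Integrate dy/dt = f(y) from t=0 to t_end with forward Euler and step h."""
    steps = round(t_end / h)
    y = y0
    for _ in range(steps):
        y = y + h * f(y)
    return y


def observed_order(f, y0, t_end, exact, h):
    """Estimate the convergence order from the errors at step sizes h and h/2."""
    err_coarse = abs(euler_solve(f, y0, t_end, h) - exact)
    err_fine = abs(euler_solve(f, y0, t_end, h / 2) - exact)
    return math.log2(err_coarse / err_fine)


def decay(y):
    """Stand-in for a learned vector field: dy/dt = -y."""
    return -y


order = observed_order(decay, 1.0, 1.0, math.exp(-1.0), h=0.01)
print(f"observed order: {order:.3f}")
```

Forward Euler is a first-order method, so halving the step should roughly halve the error and the observed order should be close to 1. A model whose error does not shrink at the expected rate as the step is refined has probably fitted the discrete samples rather than the continuous dynamics.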

翻訳日:2022-02-18 23:50:34 公開日:2022-02-17

# (参考訳) ニューラルネットワークプルーニングにおける反復的微調整に基づく小型オーディオ・ビジュアル・ウェイクワードスポッティングシステムの設計に関する研究

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning ( http://arxiv.org/abs/2202.08509v1 )

ライセンス: CC BY 4.0

Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

(参考訳) 音声のみに基づくウェイクワードスポッティング(WWS)は,信号伝送における環境干渉のため,ノイズの多い条件下では困難である。本稿では,視覚情報を利用して性能劣化を軽減する小型オーディオ・ビジュアルWWSシステムの設計について検討する。具体的には,視覚情報を利用するために,まず検出された唇をMobileNetで固定サイズのベクトルに符号化し,音響特徴と連結した上で,WWSのための融合ネットワークに入力する。しかし、ニューラルネットワークに基づくオーディオ・ビジュアルモデルは、大きなフットプリントと高い計算複雑性を必要とする。アプリケーション要件を満たすために,宝くじ仮説に基づくニューラルネットワークのプルーニング戦略を反復的微調整方式(LTH-IF)で,単一モーダルモデルとマルチモーダルモデルのそれぞれに導入する。家庭用テレビのシーンにおけるオーディオ・ビジュアルWWSのための社内コーパスでテストした結果,提案するオーディオ・ビジュアルシステムは,単一モダリティ(オーディオのみまたはビデオのみ)のシステムに対して,異なる雑音条件下で大きな性能向上を達成している。さらに、LTH-IFプルーニングは、WWS性能を低下させることなく、ネットワークパラメータと計算量を大幅に削減し、テレビの起動シナリオに対する潜在的な製品ソリューションをもたらす。

Audio-only wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission. In this paper, we investigate the design of a compact audio-visual WWS system that uses visual information to alleviate the degradation. Specifically, in order to use visual information, we first encode the detected lips into fixed-size vectors with MobileNet and concatenate them with acoustic features, followed by the fusion network for WWS. However, the audio-visual model based on neural networks requires a large footprint and a high computational complexity. To meet the application requirements, we introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF), to the single-modal and multi-modal models, respectively. Tested on our in-house corpus for audio-visual WWS in a home TV scene, the proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions. Moreover, LTH-IF pruning can largely reduce the network parameters and computations with no degradation of WWS performance, leading to a potential product solution for the TV wake-up scenario.
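The abstract gives no details of LTH-IF beyond "pruning in an iterative fine-tuning manner", so the following is a generic sketch of iterative magnitude pruning, not the authors' procedure. It assumes that each round removes a fixed fraction of the smallest remaining weights and then fine-tunes; `fine_tune` is a hypothetical placeholder that only re-applies the mask.

```python
def prune_round(weights, mask, fraction):
    """Deactivate the smallest-magnitude `fraction` of the still-active weights."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(active) * fraction)
    # Sort the active indices by weight magnitude, smallest first.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = False
    return new_mask


def fine_tune(weights, mask):
    """Hypothetical placeholder for retraining; here it only applies the mask."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.08]
mask = [True] * len(weights)
for _ in range(2):  # two pruning rounds, each removing half of what is left
    mask = prune_round(weights, mask, fraction=0.5)
    weights = fine_tune(weights, mask)

print(weights)
```

Pruning in several small rounds, with training between rounds, lets the surviving weights adapt before the next cut; in a real system `fine_tune` would run gradient updates on the masked network.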

翻訳日:2022-02-18 23:33:24 公開日:2022-02-17

# (参考訳) ファセット分類としての視覚的グラウンドトゥルース構築

Visual Ground Truth Construction as Faceted Classification ( http://arxiv.org/abs/2202.08512v1 )

ライセンス: CC BY 4.0

Fausto Giunchiglia, Mayukh Bagchi, Xiaolei Diao

(参考訳) 機械学習とコンピュータビジョンにおける最近の研究は、主要なオブジェクト認識ベンチマークデータセットの開発において体系的な設計欠陥の証拠を提供している。例えば ImageNet では,いくつかのカテゴリのイメージに対して,表現対象とアノテートに使用するラベルとの間には矛盾がある。この問題の結果は、特に多くの機械学習アプリケーション、特にこれらのデータセットに基づいてトレーニングされたDeep Neural Networksに基づくアプリケーションを考えると、大きなものだ。本稿では,これらの基礎的真理ベンチマークデータセットの構築の基礎を提供する知識表現(kr)方法論の欠如が問題点であることを示す。そこで本研究では,3つの主要なステップで記述された解を提案する。 (i) テレオースマン論の哲学理論に基づく4つの順序付け段階における物体認識過程の分解 (ii)このような階層化に基づき、その視覚特性に応じて分類階層内で物体を整理するための新しい4段階の方法論を提案している。 (iii)顔分類パラダイムに従ってこのような分類を行う。アプローチの重要な新規性は、視覚的種分化を利用した視覚的特性から分類階層を構築し、言語的に基礎付けられた性質からではないという事実にある。提案手法は、音楽実験のImageNet階層に関する一連の実験によって検証される。

Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets. One such example is ImageNet, wherein, for several categories of images, there are incongruences between the objects they represent and the labels used to annotate them. The consequences of this problem are major, in particular considering the large number of machine learning applications, not least those based on Deep Neural Networks, that have been trained on these datasets. In this paper we posit the problem to be the lack of a knowledge representation (KR) methodology providing the foundations for the construction of these ground truth benchmark datasets. Accordingly, we propose a solution articulated in three main steps: (i) deconstructing the object recognition process in four ordered stages grounded in the philosophical theory of teleosemantics; (ii) based on such stratification, proposing a novel four-phased methodology for organizing objects in classification hierarchies according to their visual properties; and (iii) performing such classification according to the faceted classification paradigm. The key novelty of our approach lies in the fact that we construct the classification hierarchies from visual properties exploiting visual genus-differentiae, and not from linguistically grounded properties. The proposed approach is validated by a set of experiments on the ImageNet hierarchy of musical experiments.

翻訳日:2022-02-18 23:21:37 公開日:2022-02-17

# (参考訳) 画像変換を用いた自己教師型表現学習に関する調査

Survey on Self-supervised Representation Learning Using Image Transformations ( http://arxiv.org/abs/2202.08514v1 )

ライセンス: CC BY 4.0

Muhammad Ali, Sayed Hashim

(参考訳) 深層ニューラルネットワークは大量のトレーニングデータを必要とするが、現実世界ではトレーニング目的のデータが少ない。これらの問題を解決するために、自己教師付き学習法(SSL)が用いられる。 ssl using geometric transformations (gt) は教師なし表現学習で使われる単純かつ強力な技術である。複数の調査論文がssl技術をレビューしているが、幾何学的変換を使うものだけに焦点を当てたものはない。さらに、これらの手法は、レビューされた論文では詳しくは触れられていない。この研究を提示する動機は、幾何学的変換が教師なし表現学習において強力な監督信号であることが示されていることです。また、多くの作品が大成功を収めたが、あまり注目されなかった。幾何変換を用いたSSLアプローチの簡潔な調査を行う。我々は、予測と自動エンコード変換に基づく画像変換を含む6つの代表的なモデルを要約する。私たちは、彼らのアーキテクチャと学習方法論をレビューします。また、cifar-10およびimagenetデータセットのオブジェクト認識タスクにおけるこれらのモデルの性能を比較する。分析の結果,AETv2はほとんどの環境で最高の性能を示した。機能分離によるローテーションも、いくつかの設定でうまく機能した。そして、観察結果から洞察を得る。最後に、結果と洞察の要約とともに、対処すべきオープンな問題を強調し、様々な今後の方向性を示す。

Deep neural networks need huge amounts of training data, while in the real world data available for training is scarce. To resolve this issue, self-supervised learning (SSL) methods are used. SSL using geometric transformations (GT) is a simple yet powerful technique used in unsupervised representation learning. Although multiple survey papers have reviewed SSL techniques, none focuses only on those that use geometric transformations. Furthermore, such methods have not been covered in depth in the papers where they are reviewed. Our motivation for presenting this work is that geometric transformations have been shown to be powerful supervisory signals in unsupervised representation learning. Moreover, many such works have found tremendous success but have not gained much attention. We present a concise survey of SSL approaches that use geometric transformations. We shortlist six representative models that use image transformations, including those based on predicting and autoencoding transformations. We review their architectures as well as their learning methodologies. We also compare the performance of these models in the object recognition task on the CIFAR-10 and ImageNet datasets. Our analysis indicates that AETv2 performs best in most settings. Rotation with feature decoupling also performed well in some settings. We then derive insights from the observed results. Finally, we conclude with a summary of the results and insights, highlight open problems to be addressed, and indicate various future directions.

翻訳日:2022-02-18 23:00:48 公開日:2022-02-17

# (参考訳) DeepHybrid: 物体分類のための自動車レーダスペクトルと反射の深層学習

DeepHybrid: Deep Learning on Automotive Radar Spectra and Reflections for Object Classification ( http://arxiv.org/abs/2202.08519v1 )

ライセンス: CC BY-SA 4.0

Adriana-Eliza Cozma, Lisa Morgan, Martin Stolz, David Stoeckel, Kilian Rambach

(参考訳) 自動運転車は、オブジェクトと交通参加者を正確に検出し、分類する必要がある。自動車用レーダーセンサーを用いた信頼性の高い物体分類は困難であることが判明した。本稿では,従来のレーダー信号処理とディープラーニングアルゴリズムを組み合わせた手法を提案する。レーダ反射レベルの範囲方位情報は、レンジドップラースペクトルから関心のあるスパース領域を抽出するために使用される。これはニューラルネットワーク(NN)への入力として使用され、静止オブジェクトと移動オブジェクトの異なるタイプを分類する。本研究では,レーダースペクトルと反射特性の両方を入力として受信するハイブリッドモデル(deephybrid)を提案する。実験の結果,スペクトルのみを用いたモデルと比較して分類性能が向上した。さらに、資源効率が高く高性能なNNを見つけるために、ニューラルネットワーク探索(NAS)アルゴリズムを適用した。 NASは、精度を保ちながら手作業で設計したNNよりも、ほぼ1桁小さいNNが得られる。提案手法は,自動緊急ブレーキや衝突回避システムの改善などに用いることができる。

Automated vehicles need to detect and classify objects and traffic participants accurately. Reliable object classification using automotive radar sensors has proved to be challenging. We propose a method that combines classical radar signal processing and Deep Learning algorithms. The range-azimuth information on the radar reflection level is used to extract a sparse region of interest from the range-Doppler spectrum. This is used as input to a neural network (NN) that classifies different types of stationary and moving objects. We present a hybrid model (DeepHybrid) that receives both radar spectra and reflection attributes as inputs, e.g. radar cross-section. Experiments show that this improves the classification performance compared to models using only spectra. Moreover, a neural architecture search (NAS) algorithm is applied to find a resource-efficient and high-performing NN. NAS yields an almost one order of magnitude smaller NN than the manually-designed one while preserving the accuracy. The proposed method can be used for example to improve automatic emergency braking or collision avoidance systems.

翻訳日:2022-02-18 22:55:24 公開日:2022-02-17

# (参考訳) 自己教師あり学習と敵対的訓練を用いたエンドツーエンド音楽リマスターシステム

End-to-end Music Remastering System Using Self-supervised and Adversarial Training ( http://arxiv.org/abs/2202.08520v1 )

ライセンス: CC BY 4.0

Junghyun Koo, Seungryeol Paik, Kyogu Lee

(参考訳) マスタリングは音楽制作において不可欠なステップだが、経験豊富なオーディオエンジニアの手に渡り、曲のトーン、スペース、ボリュームを調整しなければならない課題でもある。リマスターは同じ技術的プロセスに従っており、そのコンテキストは当時の曲をマスターすることにある。これらのタスクは入力障壁が高いため、入力音声のマスタリングスタイルをターゲットに変換するエンドツーエンドの音楽リマスターシステムを提案することにより、障壁を低くすることを目指している。システムは自己指導的な方法で訓練され、解放されたポップソングがトレーニングに使用された。また,事前学習したエンコーダと投影判別器を適用して,参照のマスタリングスタイルを反映した現実的な音声を生成するモデルも期待した。その結果を定量的指標と主観的聞き取りテストを用いて検証し,モデルが目標と類似したマスタリングスタイルのサンプルを生成したことを示す。

Mastering is an essential step in music production, but it is also a challenging task that has to go through the hands of experienced audio engineers, where they adjust tone, space, and volume of a song. Remastering follows the same technical process, in which the context lies in mastering a song for the times. As these tasks have high entry barriers, we aim to lower the barriers by proposing an end-to-end music remastering system that transforms the mastering style of input audio to that of the target. The system is trained in a self-supervised manner, in which released pop songs were used for training. We also anticipated the model to generate realistic audio reflecting the reference's mastering style by applying a pre-trained encoder and a projection discriminator. We validate our results with quantitative metrics and a subjective listening test and show that the model generated samples of mastering style similar to the target.

翻訳日:2022-02-18 22:44:27 公開日:2022-02-17

# (参考訳) 確率的ブロックモデルにおける不均衡コミュニティの回復と欠陥オラクルによるクラスタリングへの応用

Recovering Unbalanced Communities in the Stochastic Block Model With Application to Clustering with a Faulty Oracle ( http://arxiv.org/abs/2202.08522v1 )

ライセンス: CC BY 4.0

Chandra Sekhar Mukherjee, Pan Peng and Jiapeng Zhang

(参考訳) 確率ブロックモデル(SBM)は,ネットワークにおけるグラフクラスタリングやコミュニティ検出を研究するための基本的なモデルである。過去10年間で大きな注目を集めており、バランスの取れた場合、すなわち全てのクラスターのサイズが大きいと仮定する場合は、十分に研究されている。しかし、不均衡なコミュニティを持つSBM(おそらく実際にはより関連性が高い)の理解は依然として極めて限られている。本稿では,様々な大きさのコミュニティを持つSBMにおいてコミュニティを復元するための,SVDに基づく簡単なアルゴリズムを提案する。 KS-threshold予想の下では、我々のアルゴリズムのパラメータ間のトレードオフは、幅広いレジームに対して多対数因子までほぼ最適である。副産物として,欠陥オラクルを用いたクラスタリング問題に対して,クエリ複雑性が改善された時間効率の高いアルゴリズムを得る。これは複数の先行研究(Mazumdar and Saha [NIPS 2017], Larsen, Mitzenmacher and Tsourakakis [WWW 2020], Peng and Zhang [COLT 2021])を改善するものである。 KS-threshold予想の下では、アルゴリズムのクエリ複雑性は多対数因子までほぼ最適である。

The stochastic block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. It has received great attention in the last decade and the balanced case, i.e., assuming all clusters have large size, has been well studied. However, our understanding of SBM with unbalanced communities (arguably, more relevant in practice) is still very limited. In this paper, we provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes. Under the KS-threshold conjecture, the tradeoff between the parameters in our algorithm is nearly optimal up to polylogarithmic factors for a wide range of regimes. As a byproduct, we obtain a time-efficient algorithm with improved query complexity for a clustering problem with a faulty oracle, which improves upon a number of previous works (Mazumdar and Saha [NIPS 2017], Larsen, Mitzenmacher and Tsourakakis [WWW 2020], Peng and Zhang [COLT 2021]). Under the KS-threshold conjecture, the query complexity of our algorithm is nearly optimal up to polylogarithmic factors.

翻訳日:2022-02-18 22:34:33 公開日:2022-02-17

# (参考訳) オープンソースの風および風力発電データセットの収集と分類

A Collection and Categorization of Open-Source Wind and Wind Power Datasets ( http://arxiv.org/abs/2202.08524v1 )

ライセンス: CC BY 4.0

Nina Effenberger and Nicole Ludwig

(参考訳) 風力発電やその他の再生可能エネルギー源は、今日の電力網のエネルギー供給において、より重要な役割を担っている。そのため、電力グリッドのバランスをとるには再生可能エネルギー源の予測が不可欠である。新しい予測方法に注目する一方で、メソッドを他のユースケースやデータと比較し、再現し、転送する方法にはほとんど注意を払わない。この欠如の理由のひとつは、現在使用されている多くのデータセットが非開示であり、研究の再現性を不可能にしているため、オープンソースデータセットの可用性が限られていることだ。このオープンソースのデータセットの利用不可能性は、風力予測のような商業的に興味深い分野で特に一般的である。しかし,本論文では,既存のオープンソースの風力データセットの最新の概観と,風力予測に使用できるさまざまなデータセット群への分類を提供することにより,研究者が利用可能なデータセット上での手法を比較することを可能にする。風力予測タスクに十分なデータセットが公開されていることを示し、研究者が適切なオープンソースデータセットを選択してそれらのメソッドを比較することができるように、異なるデータグループ特性について議論する。

Wind power and other forms of renewable energy sources play an ever more important role in the energy supply of today's power grids. Forecasting renewable energy sources has therefore become essential in balancing the power grid. While a lot of focus is placed on new forecasting methods, little attention is given to how to compare, reproduce and transfer the methods to other use cases and data. One reason for this lack of attention is the limited availability of open-source datasets, as many currently used datasets are non-disclosed and make reproducibility of research impossible. This unavailability of open-source datasets is especially prevalent in commercially interesting fields such as wind power forecasting. However, with this paper we want to enable researchers to compare their methods on publicly available datasets by providing the, to our knowledge, largest up-to-date overview of existing open-source wind power datasets, and a categorization into different groups of datasets that can be used for wind power forecasting. We show that there are publicly available datasets sufficient for wind power forecasting tasks and discuss the different data groups' properties to enable researchers to choose appropriate open-source datasets and compare their methods on them.

翻訳日:2022-02-18 22:12:52 公開日:2022-02-17

# (参考訳) エンドツーエンド音声認識の強化に向けたベイズニューラルモデリングによるクローズドモデル敵対的サンプルの緩和

Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition ( http://arxiv.org/abs/2202.08532v1 )

ライセンス: CC BY 4.0

Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko

(参考訳) 本研究では,敵対的雑音のある音声に対して,エンドツーエンド自動音声認識(ASR)のシステムロバスト性を高めることを目的とする。厳密で経験的な「クローズドモデルの敵対的ロバスト性」設定(例えば、オンデバイスやクラウドアプリケーション)に焦点を当てる。敵対的ノイズは、ターゲットとするASRモデルの勾配情報に直接アクセスすることなく、クローズドモデル最適化(例えば、進化的推定やゼロ次推定)によってのみ生成される。本稿では,高度なベイズニューラルネットワーク(BNN)に基づく敵対的検出器を提案する。この検出器は、ダイバージェンス測定により、適応的な敵対的摂動に対する潜在分布をモデル化できる。さらに, RNN Transducer, Conformer, wav2vec-2.0 ベースの ASR システムの配備シナリオを, 提案する敵対的検出システムを用いてシミュレートする。提案したBNNベースの検出システムを利用することで,敵対的音声サンプルに対する現在のモデル強化手法と比較して,検出率を+2.77から+5.42%(相対+3.03から+6.26%)改善し,LibriSpeechデータセット上での単語誤り率を5.02から7.47%削減する。

In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. We focus on a rigorous and empirical "closed-model adversarial robustness" setting (e.g., on-device or cloud applications). The adversarial noise is only generated by closed-model optimization (e.g., evolutionary and zeroth-order estimation) without accessing gradient information of a targeted ASR model directly. We propose an advanced Bayesian neural network (BNN) based adversarial detector, which could model latent distributions against adaptive adversarial perturbation with divergence measurement. We further simulate deployment scenarios of RNN Transducer, Conformer, and wav2vec-2.0 based ASR systems with the proposed adversarial detection system. Leveraging the proposed BNN based detection system, we improve detection rate by +2.77 to +5.42% (relative +3.03 to +6.26%) and reduce the word error rate by 5.02 to 7.47% on LibriSpeech datasets compared to the current model enhancement methods against the adversarial speech examples.

翻訳日:2022-02-18 21:48:45 公開日:2022-02-17

# (参考訳) AISHELL-NER:中国語音声からのエンティティ認識

AISHELL-NER: Named Entity Recognition from Chinese Speech ( http://arxiv.org/abs/2202.08533v1 )

ライセンス: CC BY 4.0

Boli Chen, Guangwei Xu, Xiaobin Wang, Pengjun Xie, Meishan Zhang, Fei Huang

(参考訳) 音声からの固有表現認識(NER)は、音声信号から意味情報を抽出することを目的とした音声言語理解(SLU)タスクの一つである。音声からのNERは通常、(1)音声を自動音声認識(ASR)システムで処理し、(2)NERタガーをASR出力に適用する2段階のパイプラインによって行われる。最近の研究は、英語とフランス語の音声からのNERに対するEnd-to-End(E2E)アプローチ(本質的にはエンティティ認識型ASR)の能力を示している。しかし、中国語には多くの同音異義語や多音字があるため、中国語音声からのNERは事実上より難しい課題である。本稿では,中国語音声からのNERのための新しいデータセットAISHELL-NERを提案する。いくつかの最先端手法の性能を調べるために,広範囲な実験を行った。その結果、エンティティ認識型ASRと事前学習済みNERタガーを組み合わせることで性能が向上し、現在のSLUパイプラインに容易に適用できることが示された。データセットはgithub.com/Alibaba-NLP/AISHELL-NERで公開されている。

Named Entity Recognition (NER) from speech is among Spoken Language Understanding (SLU) tasks, aiming to extract semantic information from the speech signal. NER from speech is usually done through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR outputs. Recent works have shown the capability of the End-to-End (E2E) approach for NER from English and French speech, which is essentially entity-aware ASR. However, due to the many homophones and polyphones that exist in Chinese, NER from Chinese speech is effectively a more challenging task. In this paper, we introduce a new dataset AISHELL-NER for NER from Chinese speech. Extensive experiments are conducted to explore the performance of several state-of-the-art methods. The results demonstrate that the performance could be improved by combining entity-aware ASR and pretrained NER tagger, which can be easily applied to the modern SLU pipeline. The dataset is publicly available at github.com/Alibaba-NLP/AISHELL-NER.

翻訳日:2022-02-18 21:33:50 公開日:2022-02-17

# (参考訳) ANNに新しいニューロンをいつ、どこで、どのように追加するか

When, where, and how to add new neurons to ANNs ( http://arxiv.org/abs/2202.08539v1 )

ライセンス: CC BY 4.0

Kaitlin Maile, Emmanuel Rachelson, Herv\'e Luga, Dennis G. Wilson

(参考訳) annの神経新生は未熟で難しい問題であり、刈り取りのような構造学習の他の形態と比較しても難しい。トリガーと初期化に分解することで、学習プロセス中にニューロンをいつ、どこで、どのように追加するかという、神経発生のさまざまな側面を研究するフレームワークを導入します。神経新生戦略のニューラルオルソゴン性(NORTH*)スイートを,活性化や重みの直交性に基づく階層的トリガと初期化を組み合わせて,効率の良い大きさに収束する性能的ネットワークを動的に成長させる。 MLPを用いた他の神経新生研究に対する我々の貢献を評価する。

Neurogenesis in ANNs is an understudied and difficult problem, even compared to other forms of structural learning like pruning. By decomposing it into triggers and initializations, we introduce a framework for studying the various facets of neurogenesis: when, where, and how to add neurons during the learning process. We present the Neural Orthogonality (NORTH*) suite of neurogenesis strategies, combining layer-wise triggers and initializations based on the orthogonality of activations or weights to dynamically grow performant networks that converge to an efficient size. We evaluate our contributions against other recent neurogenesis works with MLPs.

翻訳日:2022-02-18 21:23:14 公開日:2022-02-17

# (参考訳) 医用画像における深層学習の概要

An overview of deep learning in medical imaging ( http://arxiv.org/abs/2202.08546v1 )

ライセンス: CC BY 4.0

Imran Ul Haq

(参考訳) 機械学習(ML)は、この10年間で大きな注目を集めてきた。この成功は2012年、MLモデルがコンピュータビジョンに関する世界で最も有名なコンペであるImageNet Classificationで驚くべき勝利を収めた時に始まった。このモデルは、Deep Learning(DL)と呼ばれる畳み込みニューラルネットワーク(CNN)の一種である。それ以来、研究者はMLの中で最も急速に発展している研究領域であるDLに積極的に参加し始めた。近年、DLシステムは、自然言語処理からビデオ分析まで幅広い分野にまたがる最先端のMLシステムであり、学術界や企業で広く使われている。最近の進歩は医療分野に大きな改善をもたらす可能性がある。データ処理や画像解析の改良された革新的な手法は、診断技術や医療サービスを大幅に改善しうる。本稿では、医用画像に用いられるDLの分野における現在の発展と関連する問題点を簡潔に概観する。レビューの主な目的は次の4つである。 (i)異なるDLモデルについて議論することで、DLの簡単な導入を提供する。 (ii)医用画像解析(分類・検出・分割・位置合わせ)におけるDLの使用状況をレビューする。 (iii)医用画像におけるDLの7つの主な応用分野をレビューする。 (iv)無償で利用可能なDLコード、公開データセット(表7)、医用画像コンペティションの情報源(表8)などの有用なリソースへのリンクを提供することにより、臨床画像におけるDLの研究に貢献したい人に出発点を与える。最後に、医学分野におけるDLの継続的な課題、得られた教訓、今後の展望を概説して調査を締めくくる。

Machine learning (ML) has received enormous attention over the past decade. This success started in 2012, when an ML model achieved a remarkable victory in the ImageNet classification challenge, the world's best-known computer vision competition. The model was a convolutional neural network (CNN), a type of deep learning (DL) model. Since then, researchers have been actively contributing to DL, the fastest-growing area of ML research. Today, DL systems are state-of-the-art ML systems spanning a broad range of disciplines, from natural language processing to video analysis, and are widely used in academia and industry. Recent advances can bring tremendous improvement to the medical field: improved and innovative methods for data processing and image analysis can significantly improve diagnostic technologies and medical services. This paper gives a brief review of current developments and open problems in DL for medical imaging. The review has four main purposes: (i) to give a brief introduction to DL by discussing different DL models; (ii) to review the use of DL in medical image analysis (classification, detection, segmentation, and registration); (iii) to review seven main application areas of DL in medical imaging; and (iv) to give a starting point to those who want to contribute to research on DL in clinical imaging by providing links to useful resources, such as freely available DL code, public datasets (Table 7), and medical imaging competition sources (Table 8). The survey ends by outlining open challenges, lessons learned, and the future of DL in medical science.

翻訳日:2022-02-18 21:10:09 公開日:2022-02-17

# (参考訳) 最悪ケースを超えた敵対者に対するオラクル効率的なオンライン学習

Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries ( http://arxiv.org/abs/2202.08549v1 )

ライセンス: CC BY 4.0

Nika Haghtalab, Yanjun Han, Abhishek Shetty, Kunhe Yang

(参考訳) 本稿では,オンライン学習の最悪のケース分析を超越した,オラクル効率のアルゴリズムについて検討する。私たちは2つの設定に集中します。まず,[rst11,hrs12]の平滑化解析設定は,一様密度の1/\sigma$倍の上限値を持つ分布からサンプルを生成することに制約される。第二に、$K$-hintトランスダクティブ学習の設定では、学習者が真のインスタンスを含むことが保証される時間毎に$K$ヒントにアクセスできるようになる。私たちは、クラスのvc次元のみに依存する設定と、敵の力をキャプチャする$\sigma$と$k$の両方に対して、最初のoracle効率の高いアルゴリズムを提供します。特に、これらの設定に対してそれぞれ$ O ( \sqrt{T (d / \sigma )^{1/2} } ) $ と $ O ( \sqrt{T d K } )$ のオラクル効率の後悔境界を達成する。このスムーズな分析設定のために,本研究は,スムーズな相手を用いたオンライン学習のための最初のオラクル効率アルゴリズムを提供する[HRS21]。これは[HK16]が確立したオフライン学習と, オンライン学習と最悪の相手との計算的分離とは対照的である。私たちのアルゴリズムは、小さなドメインで最悪の場合のバウンダリも改善しています。特に、$O ( \sqrt{T(d \vert{\mathcal{X}}\vert ) ^{1/2} })$を後悔したオラクル効率のアルゴリズムを与え、これは [DS16] で束縛された以前の$O ( \sqrt{T\vert{\mathcal{X} } \vert })$の洗練である。

In this paper, we study oracle-efficient algorithms for beyond worst-case analysis of online learning. We focus on two settings. First, the smoothed analysis setting of [RST11, HRS12], where an adversary is constrained to generating samples from distributions whose density is upper bounded by $1/\sigma$ times the uniform density. Second, the setting of $K$-hint transductive learning, where the learner is given access to $K$ hints per time step that are guaranteed to include the true instance. We give the first known oracle-efficient algorithms for both settings that depend only on the VC dimension of the class and parameters $\sigma$ and $K$ that capture the power of the adversary. In particular, we achieve oracle-efficient regret bounds of $ O ( \sqrt{T (d / \sigma )^{1/2} } ) $ and $ O ( \sqrt{T d K } )$ respectively for these settings. For the smoothed analysis setting, our results give the first oracle-efficient algorithm for online learning with smoothed adversaries [HRS21]. This contrasts with the computational separation between online learning with worst-case adversaries and offline learning established by [HK16]. Our algorithms also imply improved bounds for the worst-case setting with small domains. In particular, we give an oracle-efficient algorithm with regret of $O ( \sqrt{T(d \vert{\mathcal{X}}\vert ) ^{1/2} })$, which is a refinement of the earlier $O ( \sqrt{T\vert{\mathcal{X} } \vert })$ bound by [DS16].
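To get a feel for how the stated rates compare, the sketch below simply evaluates the expressions inside the $O(\cdot)$ for some made-up values of $T$, $d$, $\sigma$, $K$ and $\vert\mathcal{X}\vert$. Constants and logarithmic factors hidden by the $O$ notation are ignored, so the numbers are only indicative; the function names are illustrative.

```python
import math


def smoothed_bound(T, d, sigma):
    """sqrt(T * (d / sigma)^(1/2)), the smoothed-analysis rate."""
    return math.sqrt(T * math.sqrt(d / sigma))


def hint_bound(T, d, K):
    """sqrt(T * d * K), the K-hint transductive rate."""
    return math.sqrt(T * d * K)


def small_domain_bound(T, d, domain_size):
    """sqrt(T * (d * |X|)^(1/2)), the refined worst-case rate."""
    return math.sqrt(T * math.sqrt(d * domain_size))


def earlier_bound(T, domain_size):
    """sqrt(T * |X|), the earlier rate attributed to [DS16]."""
    return math.sqrt(T * domain_size)


T, d, X = 10_000, 4, 100                   # made-up horizon, VC dimension, domain size
print(smoothed_bound(T, d, sigma=0.01))    # about 447.2
print(hint_bound(T, d, K=4))               # 400.0
print(small_domain_bound(T, d, X))         # about 447.2
print(earlier_bound(T, X))                 # 1000.0
```

Since $\sqrt{d\vert\mathcal{X}\vert} < \vert\mathcal{X}\vert$ exactly when $d < \vert\mathcal{X}\vert$, the refined expression is the smaller of the two whenever the VC dimension is below the domain size.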

翻訳日:2022-02-18 20:35:21 公開日:2022-02-17

# (参考訳) 非同期学習のための遅延適応ステップサイズ

Delay-adaptive step-sizes for asynchronous learning ( http://arxiv.org/abs/2202.08550v1 )

ライセンス: CC BY 4.0

Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian and Mikael Johansson

(参考訳) スケーラブルな機械学習システムでは、モデルトレーニングは、厳密な同期なしに実行される複数のノードに並列化されることが多い。関連する非同期アルゴリズムのほとんどの分析結果は、学習率を決定するためにシステム内の情報遅延の上限を使用する。このような境界は事前に取得することが難しいだけでなく、不必要に収束が遅くなる。本稿では,システムにおける実際の時間変化の遅延に依存する学習率を利用することが可能であることを示す。遅延適応型非同期反復に対する一般的な収束結果を開発し,近位漸進勾配降下法とブロック座標降下法に特化する。これらの方法のそれぞれについて,遅延をオンラインで測定し,遅延適応型ステップサイズポリシを提示し,その理論上および実用上の優位性を実証する。

In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.
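The idea of tying the learning rate to the measured delay can be sketched on a toy problem. The step-size rule below, `base_step / (1 + tau)`, is an illustrative assumption and not one of the paper's policies; the objective $f(x) = x^2/2$ and the delay sequence are made up.

```python
def run_async_gd(delays, base_step=0.2, x0=1.0):
    """Minimize f(x) = x^2 / 2 using gradients evaluated at stale iterates."""
    history = [x0]                          # history[k] is the iterate x_k
    for k, tau in enumerate(delays):
        stale_index = max(0, k - tau)       # the worker read x_{k - tau}
        stale_grad = history[stale_index]   # the gradient of x^2 / 2 is x
        step = base_step / (1 + tau)        # smaller step for a staler gradient
        history.append(history[k] - step * stale_grad)
    return history[-1]


delays = [0, 1, 2, 0, 1] * 200              # measured delays, cycling through 0..2
x_final = run_async_gd(delays)
print(abs(x_final))
```

A rule based on a worst-case bound would use the smallest step at every iteration; the adaptive rule takes the full `base_step` whenever the gradient happens to be fresh, which is the kind of gain the abstract points to.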

翻訳日:2022-02-18 20:34:01 公開日:2022-02-17

# (参考訳) 普遍的対向摂動による深部ニューラルネットワークのグローバルなフィンガープリント

Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations ( http://arxiv.org/abs/2202.08602v1 )

ライセンス: CC BY 4.0

Zirui Peng and Shaofeng Li and Guoxing Chen and Cheng Zhang and Haojin Zhu and Minhui Xue

(参考訳) 本稿では,容疑者モデルがモデル抽出攻撃によって被害者モデルから盗まれたものかどうかをサービス提供者が検証できる,新規かつ実用的なメカニズムを提案する。我々の重要な洞察は、DNNモデルの決定境界のプロファイルが、その \textit{Universal Adversarial Perturbations (UAPs)} によって一意に特徴付けられることである。 UAPは低次元の部分空間に属し、海賊版モデルの部分空間は非海賊版モデルよりも被害者モデルの部分空間とより整合している。そこで本研究では, DNNモデルに対するUAPフィンガープリント手法を提案し, 指紋を入力とし, 類似度スコアを出力するエンコーダを \textit{contrastive learning} を用いて訓練する。広範囲にわたる実験により、我々のフレームワークは、容疑者モデルのわずか $20$ 個の指紋で、信頼度 $> 99.99 \%$ でモデルIP侵害を検出できることが示された。異なるモデルアーキテクチャにまたがる優れた一般化性を持ち、盗まれたモデルへの事後修正に対して堅牢である。

In this paper, we propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its \textit{Universal Adversarial Perturbations (UAPs)}. UAPs belong to a low-dimensional subspace, and piracy models' subspaces are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP fingerprinting method for DNN models and train an encoder via \textit{contrastive learning} that takes fingerprints as inputs and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence $> 99.99 \%$ within only $20$ fingerprints of the suspect model. It has good generalizability across different model architectures and is robust against post-modifications on stolen models.

翻訳日:2022-02-18 20:13:22 公開日:2022-02-17

# (参考訳) (メタ-)ソルバアプローチの評価について

On the evaluation of (meta-)solver approaches ( http://arxiv.org/abs/2202.08613v1 )

ライセンス: CC BY 4.0

Roberto Amadini, Maurizio Gabbrielli, Tong Liu, Jacopo Mauro

(参考訳) メタソルバアプローチは、よりよいソルバを構築するために、多数の個別のソルバを利用する。メタソルバの性能を評価するには、個々のソルバ(例えば、ランタイムやソリューションの品質)に典型的なメトリクスを採用するか、より具体的な評価指標(例えば、メタソルバが仮想的な最高のパフォーマンスにどの程度近いかを測定することで)を採用する。本稿では,最近発表されたいくつかの成果をもとに,その強みと弱みを基礎として,(メタ)ソルバを評価するためのさまざまなパフォーマンス指標の概要を示す。

Meta-solver approaches exploit a number of individual solvers to potentially build a better solver. To assess the performance of meta-solvers, one can simply adopt the metrics typically used for individual solvers (e.g., runtime or solution quality), or employ more specific evaluation metrics (e.g., by measuring how close the meta-solver gets to its virtual best performance). In this paper, based on some recently published works, we provide an overview of different performance metrics for evaluating (meta-)solvers, underlining their strengths and weaknesses.

翻訳日:2022-02-18 19:54:40 公開日:2022-02-17
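
The idea in the abstract above of measuring how close a meta-solver gets to its virtual best performance can be illustrated with a small sketch. The "closed gap" formula, the single-best-solver (SBS) baseline, and all runtimes below are assumptions chosen for illustration; the abstract itself does not name a specific metric or give any data.

```python
def closed_gap(meta_score, sbs_score, vbs_score):
    """Fraction of the SBS-to-VBS gap closed by the meta-solver.

    Scores are costs (lower is better), e.g. total runtime.
    1.0 means the meta-solver matches the virtual best solver (VBS);
    0.0 means it is no better than the single best solver (SBS).
    """
    gap = sbs_score - vbs_score
    if gap == 0:
        raise ValueError("SBS and VBS coincide; the closed gap is undefined")
    return (sbs_score - meta_score) / gap


# Hypothetical runtimes in seconds: one row per instance, one column per solver.
runtimes = [
    [10.0, 40.0],
    [50.0, 5.0],
    [20.0, 30.0],
]
# Hypothetical runtime obtained by the meta-solver on each instance.
meta_runtimes = [10.0, 5.0, 30.0]

n_solvers = len(runtimes[0])
# Virtual best solver: always picks the fastest solver for each instance.
vbs = sum(min(row) for row in runtimes)
# Single best solver: the one solver with the lowest total runtime.
sbs = min(sum(row[j] for row in runtimes) for j in range(n_solvers))
meta = sum(meta_runtimes)

score = closed_gap(meta, sbs, vbs)
print(f"VBS={vbs}, SBS={sbs}, meta={meta}, closed gap={score:.2f}")
```

A ratio of this kind is undefined when the SBS already equals the VBS, which is one reason a single number rarely tells the whole story for a meta-solver.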

# (参考訳) 動的放射場レンダリングのためのFourier PlenOctrees

Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-time ( http://arxiv.org/abs/2202.08614v1 )

ライセンス: CC BY 4.0

Liao Wang, Jiakai Zhang, Xinhang Liu, Fuqiang Zhao, Yanshun Zhang, Yingliang Zhang, Minye Wu, Lan Xu and Jingyi Yu

(参考訳) Neural Radiance Field (NeRF)のような暗黙の神経表現は主に、PlenOctreeのようなスマートなデータ構造でリアルタイムなレンダリングを実現するマルチビュー設定下でキャプチャされた静的オブジェクトのモデリングに焦点を当てている。本稿では,FVV(Fourier PlenOctree)技術を用いて,FVV(Fourier PlenOctree)設定下で撮影した動的シーンの効率的なニューラルモデリングとリアルタイムレンダリングを実現する。我々のFPOにおける鍵となるアイデアは、一般化されたNeRF、PlenOctree表現、体積融合、フーリエ変換の新たな組み合わせである。 fpo構築を加速するために, 一般化したnerf技術を用いて空間的ブレンドにより木を生成できる新しい粗粒間融合スキームを提案する。動的シーンに取り組むために、暗黙のネットワークを調整し、時間軸密度と色属性のフーリエ係数をモデル化する。最後に、FPOを構築し、動的列の合同PlenOctree構造の葉に直接フーリエ係数を訓練する。結果,FPOは動的オブジェクトの処理にコンパクトなメモリオーバーロードを実現し,高速な微調整をサポートすることを示す。大規模な実験により,提案手法は元のNeRFの3000倍の速度でSOTAよりも桁違いの加速を実現し,非表示ダイナミックシーンの自由視点レンダリングに高い視覚的品質を保っていることがわかった。

Implicit neural representations such as Neural Radiance Field (NeRF) have focused mainly on modeling static objects captured under multi-view settings where real-time rendering can be achieved with smart data structures, e.g., PlenOctree. In this paper, we present a novel Fourier PlenOctree (FPO) technique to tackle efficient neural modeling and real-time rendering of dynamic scenes captured under the free-view video (FVV) setting. The key idea in our FPO is a novel combination of generalized NeRF, PlenOctree representation, volumetric fusion and Fourier transform. To accelerate FPO construction, we present a novel coarse-to-fine fusion scheme that leverages the generalizable NeRF technique to generate the tree via spatial blending. To tackle dynamic scenes, we tailor the implicit network to model the Fourier coefficients of time-varying density and color attributes. Finally, we construct the FPO and train the Fourier coefficients directly on the leaves of a union PlenOctree structure of the dynamic sequence. We show that the resulting FPO enables compact memory overhead to handle dynamic objects and supports efficient fine-tuning. Extensive experiments show that the proposed method is 3000 times faster than the original NeRF and achieves over an order of magnitude acceleration over SOTA while preserving high visual quality for the free-viewpoint rendering of unseen dynamic scenes.

翻訳日:2022-02-18 19:43:19 公開日:2022-02-17

# (参考訳) オブジェクトカウントのためのドメインランダム化

Domain Randomization for Object Counting ( http://arxiv.org/abs/2202.08670v1 )

ライセンス: CC0 1.0

Enric Moreu, Kevin McGuinness, Diego Ortego, Noel E. O'Connor

(参考訳) 近年,ゲームエンジンに基づく合成データセットの利用により,コンピュータビジョンにおけるタスクの性能向上が図られている。しかし、これらのデータセットは通常、車両や人々を含む都市シーンなど、コンピュータゲームで描かれた特定のドメインにのみ適している。本稿では,高額な3Dアーティストチームによって手作業で作成される写真リアルな技法を必要とせずに,任意の領域のオブジェクトカウントのための合成データセットを生成する手法を提案する。本稿では,高速かつ安価に生成できる合成データセットに基づくオブジェクトカウントのためのドメインランダム化手法を提案する。我々は、故意にフォトリアリズムを避け、データセットの可変性を劇的に増加させ、ランダムなテクスチャと3d変換を持つ画像を生成し、一般化を改善する。実験により,本手法は,人,車,ペンギン,果物など,複数のドメインを対象とした実単語オブジェクトカウントデータセットの性能向上を図っている。ソースコードはhttps://github.com/enric1994/dr4oc

Recently, the use of synthetic datasets based on game engines has been shown to improve the performance of several tasks in computer vision. However, these datasets are typically only appropriate for the specific domains depicted in computer games, such as urban scenes involving vehicles and people. In this paper, we present an approach to generate synthetic datasets for object counting for any domain without the need for photo-realistic techniques manually generated by expensive teams of 3D artists. We introduce a domain randomization approach for object counting based on synthetic datasets that are quick and inexpensive to generate. We deliberately avoid photorealism and drastically increase the variability of the dataset, producing images with random textures and 3D transformations, which improves generalization. Experiments show that our method facilitates good performance on various real-world object counting datasets for multiple domains: people, vehicles, penguins, and fruit. The source code is available at: https://github.com/enric1994/dr4oc

翻訳日:2022-02-18 19:27:22 公開日:2022-02-17

# (参考訳) 信頼度測定か、生理学の自動化か? Safra, Chevallier, Gr\`ezes, and Baumard (2020) へのコメント

Measuring Trustworthiness or Automating Physiognomy? A Comment on Safra, Chevallier, Gr\`ezes, and Baumard (2020) ( http://arxiv.org/abs/2202.08674v1 )

ライセンス: CC BY 4.0

Rory W Spanton and Olivia Guest

(参考訳) 個人間の信頼 - 他の個人に対する信頼と脆弱性の共有表示 - は、人間の社会の発展に有効であると見なすことができる。 Safra、Chevallier、Gr\ezes、Baumard (2020)は、顔の特徴に基づいて、歴史的肖像画の信頼性評価を生成するために機械学習(ML)アルゴリズムを訓練することで、対人信頼の歴史的進歩を研究した。彼らは1500年から2000年代にかけての肖像画の信頼度評価が時間とともに増加し、これが社会進歩の指標と一致する対人信頼のより広範な増加を証明していると主張した。これらの主張はいくつかの方法論的・分析的問題と相まって成り立っており、サフラらのアルゴリズムと生理学の疑似科学の類似点を強調する。本論では,これらの問題の現実的な影響と可能性について,さらに詳細に論じる。

Interpersonal trust - a shared display of confidence and vulnerability toward other individuals - can be seen as instrumental in the development of human societies. Safra, Chevallier, Gr\`ezes, and Baumard (2020) studied the historical progression of interpersonal trust by training a machine learning (ML) algorithm to generate trustworthiness ratings of historical portraits, based on facial features. They reported that trustworthiness ratings of portraits dated between 1500--2000CE increased with time, claiming that this evidenced a broader increase in interpersonal trust coinciding with several metrics of societal progress. We argue that these claims are confounded by several methodological and analytical issues and highlight troubling parallels between Safra et al.'s algorithm and the pseudoscience of physiognomy. We discuss the implications and potential real-world consequences of these issues in further detail.

翻訳日:2022-02-18 19:18:56 公開日:2022-02-17

# (参考訳) Winograd Convolution: フォールトトレランスの観点から

Winograd Convolution: A Perspective from Fault Tolerance ( http://arxiv.org/abs/2202.08675v1 )

ライセンス: CC BY 4.0

Xinghua Xue, Haitong Huang, Cheng Liu, Ying Wang, Tao Luo, Lei Zhang

(参考訳) Winograd Convolutionは、ニューラルネットワーク(NN)の乗算を線形変換によって加算することで、計算オーバーヘッドを削減するために提案された。計算効率以外では,NNの耐障害性向上に大きな可能性を示し,その耐障害性を総合的に評価した。次に, 耐故障性, 省エネ性NN処理における耐故障性の検討を行った。以上の結果から, 耐故障設計のオーバーヘッドを27.49 %, エネルギー消費を7.19 %削減できることがわかった。

Winograd convolution was originally proposed to reduce computing overhead by replacing multiplications in neural networks (NNs) with additions via linear transformations. Beyond computing efficiency, we observe its great potential for improving NN fault tolerance and evaluate its fault tolerance comprehensively for the first time. We then explore the use of the fault tolerance of Winograd convolution for either fault-tolerant or energy-efficient NN processing. According to our experiments, Winograd convolution can reduce fault-tolerant design overhead by 27.49\% or energy consumption by 7.19\% without any accuracy loss, compared to a design that is unaware of this fault tolerance.

翻訳日:2022-02-18 19:14:27 公開日:2022-02-17
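
To make the multiplication-for-addition trade mentioned in the abstract above concrete, here is a minimal sketch of the standard one-dimensional Winograd F(2,3) scheme, which produces two outputs of a 3-tap filter with four multiplications instead of six. It is a textbook illustration only, not the authors' implementation, and it says nothing about fault tolerance; the function names are hypothetical.

```python
def direct_conv_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-element tile d (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3) (4 multiplications between data and filter)."""
    # Filter transform: depends only on g, so it can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # Element-wise products: the only data-by-filter multiplications.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


tile = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, 0.0, -1.0]
print(direct_conv_f23(tile, taps))
print(winograd_f23(tile, taps))
```

Both calls should print the same pair of values; the saving comes from moving work out of the multiplications and into cheap linear transforms of the input and the filter.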

# (参考訳) 教師なしポリプセグメンテーションのための合成データ

Synthetic data for unsupervised polyp segmentation ( http://arxiv.org/abs/2202.08680v1 )

ライセンス: CC0 1.0

Enric Moreu, Kevin McGuinness, Noel E. O'Connor

(参考訳) 深層学習は医療画像の解析において優れた性能を示した。しかし、データセットはプライバシの問題、標準化の問題、アノテーションの欠如のために取得することが難しい。本稿では,3次元技術と生成対向ネットワークを組み合わせたリアルな合成画像を作成することで,これらの課題に対処する。パイプラインでは医療専門家のアノテーションをゼロにしています。本手法は,5つの実ポリープセグメンテーションデータセットに対して有望な結果を得る。この研究の一環として、我々はSynth-Colonをリリースした。Synth-Colonは、20000のリアルな大腸画像と深度と3D幾何学に関する追加情報を含む完全に合成されたデータセットである。

Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due to privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We use zero annotations from medical professionals in our pipeline. Our fully unsupervised method achieves promising results on five real polyp segmentation datasets. As part of this study, we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colon

翻訳日:2022-02-18 19:01:39 公開日:2022-02-17

# (参考訳) 集合オートマトンマッチングに基づく項書き換え

Term Rewriting Based On Set Automaton Matching ( http://arxiv.org/abs/2202.08687v1 )

ライセンス: CC BY 4.0

Mark Bouwman, Rick Erkens

(参考訳) これまで我々は,集合オートマトンの概念に基づく効率的なパターンマッチングアルゴリズムを提案してきた。本稿では,効率的な項書き換え手順を実現するために,set automataをどのように活用できるかを検討する。これらの手順はパターンマッチングステップと書き換えステップをインターリーブし、redex発見とサブターム置換をスムーズに統合する。具体的には,左線形項書き換えシステムの最外書き換えのための最適化アルゴリズムを提案し,その正しさを証明し,いくつかの実装実験の結果を示す。

In previous work we have proposed an efficient pattern matching algorithm based on the notion of set automaton. In this article we investigate how set automata can be exploited to implement efficient term rewriting procedures. These procedures interleave pattern matching steps and rewriting steps and thus smoothly integrate redex discovery and subterm replacement. Concretely, we propose an optimised algorithm for outermost rewriting of left-linear term rewriting systems, prove its correctness, and present the results of some implementation experiments.

翻訳日:2022-02-18 18:54:32 公開日:2022-02-17

# (参考訳) OmniSyn:360度ビデオをワイドベースラインパノラマで合成する

OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas ( http://arxiv.org/abs/2202.08752v1 )

ライセンス: CC BY 4.0

David Li, Yinda Zhang, Christian H\"ane, Danhang Tang, Amitabh Varshney, Ruofei Du

(参考訳) GoogleストリートビューやBingストリートサイドのような没入型マップは、パノラマの膨大なコレクションで現実のビューを提供する。しかし、これらのパノラマは、撮影される経路に沿ってスパース間隔でのみ利用可能であり、ナビゲーション中に視覚的な不連続が生じる。視線合成の先行技術は通常、視線画像、立体画像、または単眼画像のセットの上に構築されるが、広帯域パノラマは、帯域幅とストレージ使用量の最適化のために商業的プラットフォームで広く採用されている。本稿では,ワイドベースラインパノラマのユニークな特徴と,ワイドベースラインパノラマ間の360{\deg}ビュー合成のための新しいパイプラインであるOmniSynについて述べる。 omnisynは球面コストボリュームと単眼スキップ接続を用いて全方位深度マップを予測し、360{\deg}画像にメッシュを描画し、中間ビューと融合ネットワークを合成する。我々はomnisynの有効性を,carlaおよびmatterportデータセットの最先端手法との比較,アブレーション研究,ストリートビューの一般化など,総合的な実験結果を通じて実証する。私たちの研究は、この未完成の現実世界のタスクの将来の研究を刺激し、最終的には没入型マップをスムースに操作できることを期待しています。

Immersive maps such as Google Street View and Bing Streetside provide true-to-life views with a massive collection of panoramas. However, these panoramas are only available at sparse intervals along the path they are taken, resulting in visual discontinuities during navigation. Prior art in view synthesis is usually built upon a set of perspective images, a pair of stereoscopic images, or a monocular image, but barely examines wide-baseline panoramas, which are widely adopted in commercial platforms to optimize bandwidth and storage usage. In this paper, we leverage the unique characteristics of wide-baseline panoramas and present OmniSyn, a novel pipeline for 360{\deg} view synthesis between wide-baseline panoramas. OmniSyn predicts omnidirectional depth maps using a spherical cost volume and a monocular skip connection, renders meshes in 360{\deg} images, and synthesizes intermediate views with a fusion network. We demonstrate the effectiveness of OmniSyn via comprehensive experimental results including comparison with the state-of-the-art methods on CARLA and Matterport datasets, ablation studies, and generalization studies on street views. We envision our work may inspire future research for this unheeded real-world task and eventually produce a smoother experience for navigating immersive maps.

翻訳日:2022-02-18 18:25:30 公開日:2022-02-17

# (参考訳) ツールキットのように見る: ツールキットがai倫理の仕事をどう考えるか

Seeing Like a Toolkit: How Toolkits Envision the Work of AI Ethics ( http://arxiv.org/abs/2202.08792v1 )

ライセンス: CC BY 4.0

Richmond Y. Wong, Michael A. Madaio, Nick Merrill

(参考訳) 倫理的AI開発を支援するために多くのツールキットが開発されている。しかしながら、ツールキットは、他のツールと同様に、何をすべきか、どのように行うべきかという前提を設計にエンコードします。本稿では,27のAI倫理ツールキットの質的分析を行い,倫理的作業がどのように想像されるか,どのように支援されるのかを批判的に検証する。具体的には,倫理的問題,倫理的作業を行うべき者,倫理的対処に関わる作業慣行をどのように想定するか,などについて論じる。 AI倫理ツールキットは、AI倫理の社会的側面に固執することや、実践中のAI倫理作業の組織的および政治的意味に抗うことなく、より広範な利害関係者の関与を求めるにもかかわらず、AI倫理の作業が個々の技術実践者にとって技術的作業になるように、主に枠組みを定めている。すべてのツールキットのうち、想定された倫理の作業と、その作業にツールキットが提供するサポートのミスマッチを識別します。倫理的な作業を行う上で,組織的な力のダイナミクスをナビゲートする方法に関するガイダンスの欠如を特定します。我々はこれらの欠落を利用して、AI倫理ツールキットの研究者やデザイナーの今後の業績をグラフ化しています。

Numerous toolkits have been developed to support ethical AI development. However, toolkits, like all tools, encode assumptions in their design about what work should be done and how. In this paper, we conduct a qualitative analysis of 27 AI ethics toolkits to critically examine how the work of ethics is imagined and how it is supported by these toolkits. Specifically, we examine the discourses toolkits rely on when talking about ethical issues, who they imagine should do the work of ethics, and how they envision the work practices involved in addressing ethics. We find that AI ethics toolkits largely frame the work of AI ethics to be technical work for individual technical practitioners, despite calls for engaging broader sets of stakeholders in grappling with social aspects of AI ethics, and without contending with the organizational and political implications of AI ethics work in practice. Among all toolkits, we identify a mismatch between the imagined work of ethics and the support the toolkits provide for doing that work. We identify a lack of guidance around how to navigate organizational power dynamics as they relate to performing ethical work. We use these omissions to chart future work for researchers and designers of AI ethics toolkits.

翻訳日:2022-02-18 18:09:05 公開日:2022-02-17

# (参考訳) 物理化学的性質の物理およびデータ駆動予測手法のハイブリダイゼーション

Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties ( http://arxiv.org/abs/2202.08804v1 )

ライセンス: CC BY 4.0

Fabian Jirasek, Robert Bamler, and Stephan Mandt

(参考訳) 本稿では,物理化学的特性の予測のための物理・データ駆動手法のハイブリッド化手法を提案する。このアプローチは、物理手法の予測を事前のモデルに 'distills' し、ベイズ推定を用いた疎い実験データと組み合わせる。本研究では,データ駆動型および物理ベースラインと比較して,無限希釈時の活動係数の予測に新たなアプローチを適用し,機械学習文献からアンサンブル法を確立した。

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach `distills' the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obtain significant improvements compared to the data-driven and physical baselines and established ensemble methods from the machine learning literature.

翻訳日:2022-02-18 17:43:21 公開日:2022-02-17
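
As a toy illustration of the general idea in the abstract above, where a physical method's prediction acts as a prior that sparse experimental data then update through Bayesian inference, the following sketch uses a one-dimensional conjugate Gaussian model. The model, the function name, and all numbers are assumptions for illustration; the abstract does not specify the authors' actual probabilistic model.

```python
def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior mean and variance of a scalar quantity.

    Prior: N(prior_mean, prior_var), e.g. centred on a physical model's prediction.
    Likelihood: each observation is the true value plus N(0, noise_var) noise.
    """
    precision = 1.0 / prior_var + len(observations) / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical numbers: the physical method predicts 2.0, one measurement gives 3.0.
mean, var = gaussian_posterior(prior_mean=2.0, prior_var=1.0,
                               observations=[3.0], noise_var=1.0)
print(mean, var)

# With no measurements the posterior falls back to the physical prediction.
print(gaussian_posterior(2.0, 1.0, [], 1.0))
```

With equal prior and noise variances, a single measurement pulls the estimate halfway between the physical prediction and the data point, and each further measurement shrinks the posterior variance.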

# (参考訳) 人間とアルゴリズムの協調:相補性と不公平を回避する

Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness ( http://arxiv.org/abs/2202.08821v1 )

ライセンス: CC BY 4.0

Kate Donahue, Alexandra Chouldechova, Krishnaram Kenthapadi

(参考訳) 機械学習の研究の多くは予測精度に焦点を当てている。タスクが与えられたら、精度を最大化する機械学習モデル(またはアルゴリズム)を作成する。しかし、多くの環境では、システムの最終的な予測や決定は、アルゴリズムのアウトプットと自身の個人的な専門知識を使って複合的な予測を生成する人間の管理下にある。このような協調システムの最終的な目標は「相補性」(complementarity) であり、すなわち、人間やアルゴリズム単独よりも損失の少ないもの(ほぼ同値)を生み出すことである。しかし, 慎重に設計したシステムにおいても, 相補的な性能は明らかである。私たちの仕事は3つの重要な貢献をします。まず,簡単な人間-アルゴリズム系をモデル化するための理論的枠組みを提供し,複数の事前解析をその内部で表現できることを実証する。次に、このモデルを用いて相補性が不可能な条件を証明し、相補性が達成可能な構成例を示す。最後に,本研究の意義について,特に分類器の公平性について論じる。まとめると、これらの結果は人間のアルゴリズムシステムの複合性能に影響を及ぼす重要な要因の理解を深め、アルゴリズムツールが協調環境のためにどのように最適に設計できるかを洞察する。

Much of machine learning research focuses on predictive accuracy: given a task, create a machine learning model (or algorithm) that maximizes accuracy. In many settings, however, the final prediction or decision of a system is under the control of a human, who uses an algorithm's output along with their own personal expertise in order to produce a combined prediction. One ultimate goal of such collaborative systems is "complementarity": that is, to produce lower loss (equivalently, greater payoff or utility) than either the human or algorithm alone. However, experimental results have shown that even in carefully-designed systems, complementary performance can be elusive. Our work provides three key contributions. First, we provide a theoretical framework for modeling simple human-algorithm systems and demonstrate that multiple prior analyses can be expressed within it. Next, we use this model to prove conditions where complementarity is impossible, and give constructive examples of where complementarity is achievable. Finally, we discuss the implications of our findings, especially with respect to the fairness of a classifier. In sum, these results deepen our understanding of key factors influencing the combined performance of human-algorithm systems, giving insight into how algorithmic tools can best be designed for collaborative environments.

翻訳日:2022-02-18 17:42:23 公開日:2022-02-17

# (参考訳) クロスマーケットレコメンデーションのための多段階アンサンブルモデル

Multi-stage Ensemble Model for Cross-market Recommendation ( http://arxiv.org/abs/2202.08824v1 )

ライセンス: CC BY 4.0

Cesare Bernardis

(参考訳) 本稿では,WSDM カップ 2022 における PolimiRank チームによるクロスマーケットレコメンデーションのソリューションについて述べる。競争の目的は、異なる市場から抽出された情報を効果的に活用し、2つのターゲット市場における推薦のランキング精度を向上させることである。我々のモデルは、異なる市場に属するデータの組み合わせに基づく多段階的なアプローチで構成されている。最初の段階では、最先端のレコメンデータを使用して、以下の2段階にまとめられたユーザとイタムのカップルのスコアを予測し、単純な線形結合とより強力なグラディエントブースティング決定木技術を用いる。我々のチームはファイナル・リーダーボードで4位にランクインした。

This paper describes the solution of our team PolimiRank for the WSDM Cup 2022 on cross-market recommendation. The goal of the competition is to effectively exploit the information extracted from different markets to improve the ranking accuracy of recommendations on two target markets. Our model consists of a multi-stage approach based on the combination of data belonging to different markets. In the first stage, state-of-the-art recommenders are used to predict scores for user-item pairs, which are ensembled in the following two stages, employing a simple linear combination and more powerful Gradient Boosting Decision Tree techniques. Our team ranked 4th in the final leaderboard.

翻訳日:2022-02-18 17:11:39 公開日:2022-02-17

# (参考訳) LAMP: 言語モデルでグラディエントからテキストを抽出する

LAMP: Extracting Text from Gradients with Language Model Priors ( http://arxiv.org/abs/2202.08827v1 )

ライセンス: CC BY 4.0

Dimitar I. Dimitrov, Mislav Balunovi\'c, Nikola Jovanovi\'c, Martin Vechev

(参考訳) 最近の研究は、センシティブなユーザデータを勾配更新から再構築できることを示し、フェデレートされた学習における重要なプライバシーの約束を破っている。成功は主に画像データで示されたが、これらの手法はテキストなどの他の領域に直接転送するわけではない。本研究では,テキストデータに合わせた新しい攻撃手法であるlampを提案する。我々の重要な洞察は、テキストの以前の確率を補助言語モデルでモデル化し、検索をより自然なテキストへと導くことである。具体的には、lampは補助言語モデルによって提供されるレコンストラクション損失と以前のテキスト確率の両方を最小化する離散テキスト変換手順を導入する。この手順は、再建された埋め込みの長さを規則化する再構成損失の連続的な最適化と交換される。我々の実験では、LAMPは以前の作業よりもかなり正確に元のテキストを再構築することを示した。さらに,テキストモデルでは,バッチサイズが1より大きい場合から,まず入力を復元する。これらの結果から,テキストデータ上で動作しているモデルの勾配更新は,従来考えられていたよりも情報漏えいが大きいことが示唆された。

Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients. Our key insight is to model the prior probability of the text with an auxiliary language model, utilizing it to guide the search towards more natural text. Concretely, LAMP introduces a discrete text transformation procedure that minimizes both the reconstruction loss and the prior text probability, as provided by the auxiliary language model. The procedure is alternated with a continuous optimization of the reconstruction loss, which also regularizes the length of the reconstructed embeddings. Our experiments demonstrate that LAMP reconstructs the original text significantly more precisely than prior work: we recover 5x more bigrams and $23\%$ longer subsequences on average. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.

翻訳日:2022-02-18 17:02:17 公開日:2022-02-17

# 水中画像強調のためのウェーブレット型デュアルストリームネットワーク

A Wavelet-based Dual-stream Network for Underwater Image Enhancement ( http://arxiv.org/abs/2202.08758v1 )

ライセンス: Link先を確認

Ziyin Ma and Changjae Oh

(参考訳) 水中画像のカラーキャストやぼやけた細部に対処するウェーブレットベースのデュアルストリームネットワークを提案する。入力画像を離散ウェーブレット変換を用いて複数の周波数帯域に分解することで、これらのアーティファクトを別々に処理し、ダウンサンプリングされた構造画像と詳細画像を生成する。これらのサブバンドイメージは、マルチカラースペースフュージョンネットワークとディテールエンハンスメントネットワークという2つのサブネットワークを組み込んだデュアルストリームネットワークへの入力として使用されます。多色空間融合ネットワークは、分解した構造画像を入力として、入力の多様な色空間からの特徴表現を用いて色補正出力を推定する。ディテールエンハンスメントネットワークは、高周波サブバンドからの画像の詳細を改善することにより、元の水中画像のぼやけに対処する。提案手法を実環境および合成水中データセットの両方で検証し,計算複雑性の低い色補正およびぼかし除去におけるモデルの有効性を示した。

We present a wavelet-based dual-stream network that addresses color cast and blurry details in underwater images. We handle these artifacts separately by decomposing an input image into multiple frequency bands using discrete wavelet transform, which generates the downsampled structure image and detail images. These sub-band images are used as input to our dual-stream network that incorporates two sub-networks: the multi-color space fusion network and the detail enhancement network. The multi-color space fusion network takes the decomposed structure image as input and estimates the color corrected output by employing the feature representations from diverse color spaces of the input. The detail enhancement network addresses the blurriness of the original underwater image by improving the image details from high-frequency sub-bands. We validate the proposed method on both real-world and synthetic underwater datasets and show the effectiveness of our model in color correction and blur removal with low computational complexity.

翻訳日:2022-02-18 16:43:23 公開日:2022-02-17
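
The decomposition step described in the abstract above, which splits an image into a downsampled structure image and detail images with a discrete wavelet transform, can be sketched with a one-level 2D Haar transform. The choice of the Haar wavelet, the function name, and the toy image are assumptions for illustration; the abstract does not say which wavelet the authors use.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of a grayscale image given as a list of rows.

    Height and width must be even. Returns four half-size sub-bands:
    structure (low-pass average), col_diff (left minus right columns),
    row_diff (top minus bottom rows), and diag (diagonal detail).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    structure, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        s_row, c_row, r_row, d_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            s_row.append((a + b + c + d) / 2)  # orthonormal Haar scaling
            c_row.append((a - b + c - d) / 2)
            r_row.append((a + b - c - d) / 2)
            d_row.append((a - b - c + d) / 2)
        structure.append(s_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return structure, col_diff, row_diff, diag


# Hypothetical 2x2 image.
bands = haar_dwt2([[1, 2], [3, 4]])
for name, band in zip(("structure", "col_diff", "row_diff", "diag"), bands):
    print(name, band)
```

In a dual-stream design like the one described, the low-pass band would feed the colour-correction branch and the three detail bands the detail-enhancement branch; how the sub-bands are actually routed in the paper is not stated in the abstract.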

# 推移型および線形順序付きデータを用いた問合せ応答

Query Answering with Transitive and Linear-Ordered Data ( http://arxiv.org/abs/2202.08555v1 )

ライセンス: Link先を確認

Antoine Amarilli and Michael Benedikt and Pierre Bourhis and Michael Vanden Boom

(参考訳) 我々は,一組の区別関係に対して追加的な意味的制約を課すフロンティア保護存在規則のような強力な制約言語を含む包括的問題を考える。我々は、関係を推移的に制限し、関係を他の関係の推移的閉包に制限し、関係を線型次数に制限することを検討する。我々は、各ケースにおいて推論を決定可能とし、対応する決定問題の複雑さを分離できるガードネスの自然な変種を与える。最後に,これらの条件のわずかな変化が決定不能につながることを示す。

We consider entailment problems involving powerful constraint languages such as frontier-guarded existential rules in which we impose additional semantic restrictions on a set of distinguished relations. We consider restricting a relation to be transitive, restricting a relation to be the transitive closure of another relation, and restricting a relation to be a linear order. We give some natural variants of guardedness that allow inference to be decidable in each case, and isolate the complexity of the corresponding decision problems. Finally we show that slight changes in these conditions lead to undecidability.

翻訳日:2022-02-18 16:43:06 公開日:2022-02-17

# 大規模実世界グラフにおける最大k-プレックスの一覧

Listing Maximal k-Plexes in Large Real-World Graphs ( http://arxiv.org/abs/2202.08737v1 )

ライセンス: Link先を確認

Zhengren Wang, Yi Zhou, Mingyun Xiao and Bakhadyr Khoussainov

(参考訳) 大きなグラフで高密度なサブグラフをリストすることは、コミュニティ検出のような様々なネットワーク分析アプリケーションにおいて重要なタスクである。最も密度の高いモデルであるクライクは広く研究されている。しかし、実際には、データノイズなど、様々な理由でコミュニティが斜めに形成されることは滅多にない。したがって、k$-plex、-graphは、最大$k$頂点を除いて全ての頂点に隣接し、リラックスしたcliqueバージョンとして導入される。コヒーシブなコミュニティをよりよくシミュレートするために、接続された$k$-plexesに$k$を小さな$k$で強調することが多い。本稿では,任意のサイズの最大$k$-plexes と最大$k$-plexes をリストアップする研究を継続する。最初のコントリビューションはアルゴリズム \emph{listplex} で、各定数 $k$ に対して、最大$k$-plexes をリストアップします。 $o^*(\gamma^d)$ time ここで$\gamma$ は$k$ に関連する値ですが、2 より厳密に小さい値で、$d$ は実数グラフの頂点数 $n$ よりもはるかに小さいグラフの縮約です。 2^n$の自明なバウンドと比較すると、改善は重要であり、我々のバウンドはすべての既知の結果より優れている。実際には、構造ベースのプルールール、キャッシュ効率のよいデータ構造、並列技術など、所定のサイズの$k$プレックスの一覧化を高速化するために、いくつかの手法を用いる。これら全ては、非常に実用的なアルゴリズムをもたらす。実証的な結果は、我々のアプローチが最先端のソリューションを最大で桁違いに上回っていることを示している。

Listing dense subgraphs in large graphs is a key task in a variety of network analysis applications such as community detection. Clique, as the densest model, has been widely investigated. However, in practice, communities rarely form as cliques for various reasons, e.g., data noise. Therefore, the $k$-plex, a graph in which each vertex is adjacent to all but at most $k$ vertices, is introduced as a relaxed version of clique. Often, to better simulate cohesive communities, an emphasis is placed on connected $k$-plexes with small $k$. In this paper, we continue the research line of listing all maximal $k$-plexes and maximal $k$-plexes of prescribed size. Our first contribution is the algorithm \emph{ListPlex}, which lists all maximal $k$-plexes in $O^*(\gamma^D)$ time for each constant $k$, where $\gamma$ is a value related to $k$ but strictly smaller than 2, and $D$ is the degeneracy of the graph, which is far less than the vertex number $n$ in real-world graphs. Compared to the trivial bound of $2^n$, the improvement is significant, and our bound is better than all previously known results. In practice, we further use several techniques to accelerate listing $k$-plexes of a given size, such as structure-based pruning rules, cache-efficient data structures, and parallel techniques. All these together result in a very practical algorithm. Empirical results show that our approach outperforms the state-of-the-art solutions by up to orders of magnitude.

翻訳日:2022-02-18 16:42:56 公開日:2022-02-17
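
The $k$-plex definition in the abstract above can be checked directly. The sketch below tests whether a vertex set is a $k$-plex and whether it is maximal. It is a brute-force illustration of the definition, not the ListPlex algorithm; it follows the common convention that a vertex counts itself among the at most $k$ vertices it is not adjacent to, and the function names and example graph are hypothetical.

```python
def is_k_plex(adj, vertices, k):
    """True if every vertex in `vertices` has at least |S| - k neighbours inside S.

    `adj` maps each vertex to the set of its neighbours (no self-loops).
    With this convention a 1-plex is exactly a clique.
    """
    s = set(vertices)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def is_maximal_k_plex(adj, vertices, k):
    """True if `vertices` is a k-plex and no single outside vertex can be added.

    Checking single-vertex extensions suffices because every subset of a
    k-plex is again a k-plex.
    """
    s = set(vertices)
    if not is_k_plex(adj, s, k):
        return False
    return not any(is_k_plex(adj, s | {u}, k) for u in adj if u not in s)


# Hypothetical example: a 4-cycle 0-1-2-3-0.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_k_plex(cycle, {0, 1, 2, 3}, 1))          # not a clique
print(is_k_plex(cycle, {0, 1, 2, 3}, 2))          # each vertex misses itself and one other
print(is_maximal_k_plex(cycle, {0, 1, 2}, 2))     # can be extended by vertex 3
print(is_maximal_k_plex(cycle, {0, 1, 2, 3}, 2))  # nothing left to add
```

Enumerating all maximal $k$-plexes by testing every vertex subset this way takes time on the order of the trivial $2^n$ bound mentioned in the abstract, which is what dedicated listing algorithms aim to beat.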

# 有限混合モデルによる最大近似推定のための精製収束率

Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models ( http://arxiv.org/abs/2202.08786v1 )

ライセンス: Link先を確認

Tudor Manole, Nhat Ho

(参考訳) 有限混合モデル下での最大極大推定(MLE)の収束率を再検討する。これらのモデルにおけるパラメータ推定の解析において、ワッサースタイン距離は標準損失関数となり、ラベルの切り換えを回避でき、結合した混合成分の挙動を消滅重みで正確に特徴付けることができるようになった。しかし、ワッサーシュタイン計量は、残りの適合した混合成分の中で最悪の場合の収束率のみを捉えることができる。対数類似関数をペナル化して混合重みの消滅を阻止すると、より強い損失関数を導出し、ワッサーシュタイン距離のこの欠点を解決することができる。これらの新しい損失関数は, 適合混合成分の収束率の不均一性を正確に把握し, 各種混合モデルにおける既存点方向および一様収束率の研削に用いた。特に、これらの結果は、ペナルティ化されたmleの構成要素のサブセットが、通常、過去の作業で予想されていたよりもかなり速く収束することを示している。さらに、これらの結論のいくつかが従来のMLEにまで拡張されていることを示す。我々の理論的知見は、これらの収束率を改善するためのシミュレーション研究によって裏付けられている。

We revisit convergence rates for maximum likelihood estimation (MLE) under finite mixture models. The Wasserstein distance has become a standard loss function for the analysis of parameter estimation in these models, due in part to its ability to circumvent label switching and to accurately characterize the behaviour of fitted mixture components with vanishing weights. However, the Wasserstein metric is only able to capture the worst-case convergence rate among the remaining fitted mixture components. We demonstrate that when the log-likelihood function is penalized to discourage vanishing mixing weights, stronger loss functions can be derived to resolve this shortcoming of the Wasserstein distance. These new loss functions accurately capture the heterogeneity in convergence rates of fitted mixture components, and we use them to sharpen existing pointwise and uniform convergence rates in various classes of mixture models. In particular, these results imply that a subset of the components of the penalized MLE typically converge significantly faster than could have been anticipated from past work. We further show that some of these conclusions extend to the traditional MLE. Our theoretical findings are supported by a simulation study to illustrate these improved convergence rates.

翻訳日:2022-02-18 16:42:23 公開日:2022-02-17

# グラフマスク付きオートエンコーダ

Graph Masked Autoencoder ( http://arxiv.org/abs/2202.08391v1 )

ライセンス: Link先を確認

Hongxu Chen, Sixiao Zhang, Guandong Xu

(参考訳) トランスフォーマーはグラフ表現の学習において最先端のパフォーマンスを達成している。しかし、深いトランスフォーマーをスクラッチからトレーニングすることは困難であり、メモリ消費が大きいため、現実世界のシナリオにトランスフォーマーを適用する際の課題が残っている。この2つの課題に対処するために,我々は,バニラグラフ変換器をエンコーダおよびデコーダとして使用する,グラフ表現を学習するための自己教師型モデルであるGraph Masked Autoencoders (GMAE)を提案する。 GMAEは部分的にマスキングされたグラフを入力として、マスキングされたノードの特徴を再構築する。我々は、非対称エンコーダ-デコーダ設計を採用し、エンコーダは深いグラフトランス、デコーダは浅いグラフトランスフォーマである。マスキング機構と非対称設計によりgmaeは従来のトランスフォーマーに比べてメモリ効率の良いモデルとなった。 GMAEを用いて事前学習したグラフトランスフォーマーは,スクラッチからのトレーニングに比べ,微調整後の性能が向上することを示した。また,従来の自己教師付きグラフ表現モデルとして機能し,svmモデルをダウンストリームグラフ分類器として使用する場合,gmaeは7つのベンチマークデータセットのうち5つで最先端のパフォーマンスを実現する。

Transformers have achieved state-of-the-art performance in learning graph representations. However, there are still some challenges when applying transformers to real-world scenarios, because deep transformers are hard to train from scratch and their memory consumption is large. To address these two challenges, we propose Graph Masked Autoencoders (GMAE), a self-supervised model for learning graph representations, where vanilla graph transformers are used as the encoder and the decoder. GMAE takes partially masked graphs as input, and reconstructs the features of the masked nodes. We adopt an asymmetric encoder-decoder design, where the encoder is a deep graph transformer and the decoder is a shallow graph transformer. The masking mechanism and the asymmetric design make GMAE a memory-efficient model compared with conventional transformers. We show that, compared with training from scratch, the graph transformer pre-trained using GMAE can achieve much better performance after fine-tuning. We also show that, when serving as a conventional self-supervised graph representation model and using an SVM model as the downstream graph classifier, GMAE achieves state-of-the-art performance on 5 of the 7 benchmark datasets.

翻訳日:2022-02-18 16:40:08 公開日:2022-02-17

# Point-of-Interest Recommender システムによるレーティングと関連性の改善

Improving Rating and Relevance with Point-of-Interest Recommender System ( http://arxiv.org/abs/2202.08751v1 )

ライセンス: Link先を確認

Syed Raza Bashir, Vojislav Misic

(参考訳) 位置情報ベースのソーシャルネットワークでは,関心点(POI)の推薦が不可欠である。これにより、ユーザや場所の情報共有が容易になる。近年,質問項目関連性を表す大量の学習データを必要とする大規模検索システムとしてPOIを推奨する傾向にある。しかし,検索システムにおけるユーザフィードバックの収集は高価である。既存のPOIレコメンデータシステムは、ユーザとアイテム(ロケーション)のインタラクションのみに基づいてレコメンデーションを行います。しかし、考慮すべきフィードバックの源はたくさんあります。例えば、ユーザがPOIを訪れたとき、POIとは何かなどです。 POIレコメンデータの開発には,これらすべての種類のフィードバックを統合することが不可欠です。本稿では,ユーザ情報とアイテム情報と補助情報を用いて検索システムにおけるレコメンデーションモデリングを改善することを提案する。我々は,協調情報とコンテンツ情報の両方が存在する場合のクエリ-イテム関係をモデル化するディープニューラルネットワークアーキテクチャを開発した。また、ユーザからのフィードバックデータからコンテキスト情報を含めることで、クエリやアイテムの学習表現の質を向上させる。これらの学習表現を大規模データセットに適用することで、大幅な改善がもたらされた。

The recommendation of points of interest (POIs) is essential in location-based social networks. It makes it easier for users and locations to share information. Recently, researchers have tended to recommend POIs by treating them as large-scale retrieval systems that require a large amount of training data representing query-item relevance. However, gathering user feedback in retrieval systems is an expensive task. Existing POI recommender systems make recommendations based solely on user and item (location) interactions. However, there are numerous sources of feedback to consider, for example, when the user visits a POI and what the POI is about. Integrating all these different types of feedback is essential when developing a POI recommender. In this paper, we propose using user and item information and auxiliary information to improve the recommendation modelling in a retrieval system. We develop a deep neural network architecture to model query-item relevance in the presence of both collaborative and content information. We also improve the quality of the learned representations of queries and items by including the contextual information from the user feedback data. The application of these learned representations to a large-scale dataset resulted in significant improvements.

翻訳日:2022-02-18 16:39:48 公開日:2022-02-17

# この通知を送りましょうか。将来をモデル化したプッシュ通知決定の最適化

Should I send this notification? Optimizing push notifications decision making by modeling the future ( http://arxiv.org/abs/2202.08812v1 )

ライセンス: Link先を確認

Conor O'Brien, Huasen Wu, Shaodan Zhai, Dalin Guo, Wenzhe Shi, Jonathan J Hunt

(参考訳) 最も推奨されるシステムは、ユーザの即時応答に基づいて最適化されるミオピックである。これは、長期的なユーザ満足度の作成など、真の目標と誤解する可能性がある。この作業では,特に推奨システム決定の長期的な影響が強いモバイルプッシュ通知に重点を置いています。例えば、過剰な通知や無関係な通知を送ると、ユーザーに迷惑をかけ、通知を無効にすることがある。しかし、将来マイナス効果が発生するため、筋電図システムは常に通知を送信することを選択する。これは典型的にはヒューリスティックを用いて緩和される。しかし、ヒューリスティックスは推論や改善が困難であり、システムが変更されるたびに修正が必要であり、亜最適かもしれない。これらの欠点に対処するため、長期的価値(LTV)を直接最適化するレコメンデーターシステムに大きな関心がある。本稿では,モデルベース強化学習(RL)を用いたLTVの最大化手法について述べる。我々は,通知がユーザの将来の行動に与える影響をモデル化する。推薦システムにおけるLTVの最大化にRLを適用した以前の作業の多くはセッションベースの最適化に重点を置いていたが、この作業における通知決定の時間的地平は数日にわたって続いている。我々は、大手ソーシャルネットワーク上でのA/Bテストでこのアプローチをテストする。プッシュ通知に関する決定を最適化することで,既存のヒューリスティックなシステムと同じレベルのユーザエンゲージメントをプラットフォーム上で生成しながら,通知の送信を減らし,ベースラインシステムよりも高いオープンレートを得ることができることを示す。

Most recommender systems are myopic, that is, they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long-term user satisfaction. In this work we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may annoy a user and cause them to disable notifications. However, a myopic system will always choose to send a notification, since the negative effects occur in the future. This is typically mitigated using heuristics. However, heuristics can be hard to reason about or improve, require retuning each time the system is changed, and may be suboptimal. To counter these drawbacks, there is significant interest in recommender systems that optimize directly for long-term value (LTV). Here, we describe a method for maximising LTV by using model-based reinforcement learning (RL) to make decisions about whether to send push notifications. We model the effects of sending a notification on the user's future behavior. Much of the prior work applying RL to maximise LTV in recommender systems has focused on session-based optimization, while the time horizon for notification decision making in this work extends over several days. We test this approach in an A/B test on a major social network. We show that by optimizing decisions about push notifications we are able to send fewer notifications and obtain a higher open rate than the baseline system, while generating the same level of user engagement on the platform as the existing heuristic-based system.

翻訳日:2022-02-18 16:39:34 公開日:2022-02-17

# 反事実推論と事実推論に基づくグラフニューラルネットワーク説明の学習と評価

Learning and Evaluating Graph Neural Network Explanations based on Counterfactual and Factual Reasoning ( http://arxiv.org/abs/2202.08816v1 )

ライセンス: Link先を確認

Juntao Tan, Shijie Geng, Zuohui Fu, Yingqiang Ge, Shuyuan Xu, Yunqi Li, Yongfeng Zhang

(参考訳) 構造化データは、ソーシャルメディアのソーシャルネットワーク、学術ウェブサイトの引用ネットワーク、オンラインフォーラムのスレッドデータなど、Webアプリケーションによく存在する。複雑なトポロジーのため、そのようなデータ内のリッチな情報を処理し利用することは困難である。グラフニューラルネットワーク(GNN)は、構造データに対する学習表現に大きな利点を示している。しかし、ディープラーニングモデルの非透明性は、GNNによる予測を説明・解釈するのは簡単ではない。一方、GNNの説明を評価することは大きな課題であり、多くの場合、真理的な説明は利用できない。本稿では、因果推論理論に基づくCF^2推論の考察を行い、説明可能なGNNにおける学習と評価の両問題を解く。本稿では,2つのカジュアルな視点から最適化問題を定式化するモデルに依存しないフレームワークを提案する。これにより、cf^2 は以前の説明可能な gnn と区別される。この研究のもうひとつの貢献は、GNN説明の評価である。根拠を必要とせず, 生成した説明を定量的に評価するために, 説明の必要性と十分性を評価するために, 反事実的, 事実的推論に基づく指標を設計する。 CF^2は, 実世界のデータセットにおける従来の最先端の手法よりも, より優れた説明を生成する。さらに, 統計的解析により, 実測値と実測値との相関関係を正当化する。

Structural data is common in Web applications, such as social networks in social media, citation networks in academic websites, and thread data in online forums. Due to the complex topology, it is difficult to process and make use of the rich information within such data. Graph Neural Networks (GNNs) have shown great advantages in learning representations for structural data. However, the non-transparency of deep learning models makes it non-trivial to explain and interpret the predictions made by GNNs. Meanwhile, it is also a big challenge to evaluate GNN explanations, since in many cases the ground-truth explanations are unavailable. In this paper, we draw on insights from Counterfactual and Factual (CF^2) reasoning in causal inference theory to solve both the learning and evaluation problems in explainable GNNs. For generating explanations, we propose a model-agnostic framework by formulating an optimization problem based on both of the two causal perspectives. This distinguishes CF^2 from previous explainable GNNs that only consider one of them. Another contribution of the work is the evaluation of GNN explanations. For quantitatively evaluating the generated explanations without requiring ground truth, we design metrics based on Counterfactual and Factual reasoning to evaluate the necessity and sufficiency of the explanations. Experiments show that, whether or not ground-truth explanations are available, CF^2 generates better explanations than previous state-of-the-art methods on real-world datasets. Moreover, the statistical analysis justifies the correlation between the performance on ground-truth evaluation and our proposed metrics.

翻訳日:2022-02-18 16:39:09 公開日:2022-02-17

# 局所的非パラメトリック信頼区間とシーケンス

Locally private nonparametric confidence intervals and sequences ( http://arxiv.org/abs/2202.08728v1 )

ライセンス: Link先を確認

Ian Waudby-Smith, Zhiwei Steven Wu, Aaditya Ramdas

(参考訳) この研究は、局所微分プライバシー(LDP)の制約下で母集団パラメータの非パラメトリック、非漸近的統計推論を行う手法を導出する。平均$\mu^\star$を持つ観測値$(X_1, \dots, X_n)$が$(Z_1, \dots, Z_n)$に民営化されるとき、民営化データへのアクセスのみが与えられた場合の$\mu^\star \in \mathbb R$に対する信頼区間(CI)と時間一様信頼シーケンス(CS)を導入する。我々は、ワーナーの有名な「ランダム化応答」機構を非パラメトリックかつ逐次対話的に一般化し、任意の有界な確率変数に対してLDPを満たすようにしたうえで、その結果得られる民営化された観測へのアクセスが与えられた場合の平均に対するCIとCSを提供する。我々は、これらのCSを拡張して時間変化する(非定常的な)平均を捉え、最後にこれらの手法がオンラインA/Bテストのプライベートな実施にどのように使用できるかを説明する。

This work derives methods for performing nonparametric, nonasymptotic statistical inference for population parameters under the constraint of local differential privacy (LDP). Given observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we introduce confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star \in \mathbb R$ when only given access to the privatized data. We introduce a nonparametric and sequentially interactive generalization of Warner's famous "randomized response" mechanism, satisfying LDP for arbitrary bounded random variables, and then provide CIs and CSs for their means given access to the resulting privatized observations. We extend these CSs to capture time-varying (non-stationary) means, and conclude by illustrating how these methods can be used to conduct private online A/B tests.

翻訳日:2022-02-18 16:36:46 公開日:2022-02-17
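
The abstract above builds on Warner's randomized response. As a rough illustration only, the following sketch shows the classical binary mechanism and a simple fixed-sample confidence interval for the mean computed from the privatized bits. It is not the paper's nonparametric, sequentially interactive mechanism or its confidence sequences: the function names, the use of Hoeffding's inequality, and the restriction to 0/1 data are all assumptions made for this example.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Classical binary randomized response (illustrative, not the paper's mechanism).

    Reports the true bit with probability p = e^eps / (1 + e^eps) and the
    flipped bit otherwise, which satisfies eps-local differential privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def private_mean_ci(private_bits, epsilon, alpha=0.05):
    """Debiased estimate of the true mean plus a Hoeffding-style interval.

    Returns (estimate, lower, upper). The interval is a simple fixed-n bound
    chosen for this sketch; it is not the construction from the paper.
    """
    n = len(private_bits)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    z_bar = sum(private_bits) / n
    # E[Z] = (2p - 1) * mu + (1 - p), so invert this affine map to recover mu.
    estimate = (z_bar - (1.0 - p)) / (2.0 * p - 1.0)
    # Hoeffding bound on |z_bar - E[Z]| for Z in [0, 1], rescaled by 1 / (2p - 1).
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * n)) / (2.0 * p - 1.0)
    return estimate, max(0.0, estimate - half_width), min(1.0, estimate + half_width)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 6000 + [0] * 14000  # true mean is 0.3
    private_bits = [randomized_response(b, 1.0, rng) for b in true_bits]
    print(private_mean_ci(private_bits, epsilon=1.0))
```

Smaller values of `epsilon` push the reporting probability toward 1/2, which widens the interval by the factor 1 / (2p - 1); this is the price of stronger privacy in this simple setting.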

# 無線フェデレーション学習における航空モデル集約の効率化のための時間関連スパシフィケーション

Time-Correlated Sparsification for Efficient Over-the-Air Model Aggregation in Wireless Federated Learning ( http://arxiv.org/abs/2202.08420v1 )

ライセンス: Link先を確認

Yuxuan Sun, Sheng Zhou, Zhisheng Niu, Deniz G\"und\"uz

(参考訳) Federated Edge Learning(FEEL)は、エッジインテリジェンスアプリケーションを駆動するための有望な分散機械学習(ML)フレームワークである。しかし、無線の動的な環境とエッジデバイスのリソース制限により、通信は大きなボトルネックとなる。本研究では,通信効率の高い FEEL のためのハイブリッドアグリゲーション (TCS-H) を用いた時間相関スペーシングを提案する。モデルパラメータ間の時間的相関を利用して、デバイス間で同一のグローバルスペーシフィケーションマスクを構築し、より効率的なモデルアグリゲーションを実現する。各デバイスはさらに局所スパースベクトルを構築し、それぞれが直交する多重アクセスを持つデジタル通信によって集約される重要なパラメータを探索する。 tcs-hの装置スケジューリングと電力割当アルゴリズムを更に設計する。実験結果から,TCS-Hは通信資源が限られており,直交モデルアグリゲーションによる従来のTop-Kスペーシフィケーションに比べて高い精度が得られることがわかった。

Federated edge learning (FEEL) is a promising distributed machine learning (ML) framework to drive edge intelligence applications. However, due to the dynamic wireless environments and the resource limitations of edge devices, communication becomes a major bottleneck. In this work, we propose time-correlated sparsification with hybrid aggregation (TCS-H) for communication-efficient FEEL, which exploits jointly the power of model compression and over-the-air computation. By exploiting the temporal correlations among model parameters, we construct a global sparsification mask, which is identical across devices, and thus enables efficient model aggregation over-the-air. Each device further constructs a local sparse vector to explore its own important parameters, which are aggregated via digital communication with orthogonal multiple access. We further design device scheduling and power allocation algorithms for TCS-H. Experiment results show that, under limited communication resources, TCS-H can achieve significantly higher accuracy compared to the conventional top-K sparsification with orthogonal model aggregation, with both i.i.d. and non-i.i.d. data distributions.

翻訳日:2022-02-18 16:36:24 公開日:2022-02-17
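
To make the idea of a shared sparsification mask in the abstract above concrete, here is a minimal sketch. It assumes the mask is taken from the top-k entries of the previous aggregated update, which is only one plausible way to exploit temporal correlation; the abstract does not specify the construction, and the names `top_k_mask`, `apply_mask`, and `aggregate` are hypothetical. The local sparse vectors, the wireless channel, and power allocation are not modelled.

```python
def top_k_mask(vector, k):
    """Return a 0/1 mask marking the k largest-magnitude entries of `vector`."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:max(k, 0)])
    return [1 if i in keep else 0 for i in range(len(vector))]


def apply_mask(vector, mask):
    """Zero out every entry of `vector` whose mask value is 0."""
    return [v if m else 0.0 for v, m in zip(vector, mask)]


def aggregate(sparse_updates):
    """Average updates entry by entry; with a shared mask the supports align."""
    n = len(sparse_updates)
    return [sum(column) / n for column in zip(*sparse_updates)]


if __name__ == "__main__":
    # Assumption for illustration: the mask comes from the previous global update.
    previous_global = [0.9, -0.1, 0.05, -0.7]
    mask = top_k_mask(previous_global, k=2)
    device_updates = [[1.0, 0.2, -0.1, -0.5], [0.8, -0.3, 0.4, -0.9]]
    masked = [apply_mask(u, mask) for u in device_updates]
    print(mask, aggregate(masked))
```

Because every device keeps the same coordinates, the masked vectors can be summed position by position, which is the property that makes superposition-based (over-the-air) aggregation possible. With per-device top-k masks the supports would generally differ.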

# ADD 2022:初のオーディオ深層合成検出チャレンジ

ADD 2022: the First Audio Deep Synthesis Detection Challenge ( http://arxiv.org/abs/2202.08433v1 )

ライセンス: Link先を確認

Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li

(参考訳) オーディオディープフェイク検出は、ASVspoof 2021に含まれる新たなトピックである。しかし、最近の共有タスクは多くの現実と挑戦的なシナリオをカバーしていない。最初のオーディオディープ合成検出チャレンジ(ADD)は、ギャップを埋めるために動機付けられた。 ADD 2022には、低品質の偽オーディオ検出(LF)、部分的に偽オーディオ検出(PF)、オーディオ偽ゲーム(FG)の3つのトラックが含まれている。 LFトラックは、さまざまな現実世界のノイズで、ボナ・フェイドと完全に偽の発話を扱うことに焦点を当てている。 PFトラックは、部分的に偽のオーディオと本物を区別することを目的としている。 FGトラックは、オーディオ生成タスクとオーディオ偽検出タスクの2つのタスクを含むライバルゲームである。本稿では,データセット,評価指標,プロトコルについて述べる。また,近年のオーディオディープフェイク検出タスクの進歩を反映した大きな発見を報告する。

Audio deepfake detection is an emerging topic that was included in ASVspoof 2021. However, recent shared tasks have not covered many challenging real-life scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated by the need to fill this gap. ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on dealing with bona fide and fully fake utterances with various real-world noises. The PF track aims to distinguish partially fake audio from real audio. The FG track is a rivalry game, which includes two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect the recent advances in audio deepfake detection tasks.

翻訳日:2022-02-18 16:36:03 公開日:2022-02-17

# MLP-ASR:音声認識のためのシーケンス長非依存型オールMLPアーキテクチャ

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition ( http://arxiv.org/abs/2202.08456v1 )

ライセンス: Link先を確認

Jin Sakuma, Tatsuya Komatsu, and Robin Scheibler

(参考訳) 可変長入力に適した多層パーセプトロン(mlp)ベースのアーキテクチャを提案する。画像分類のために最近提案されたMLPベースのアーキテクチャは、固定サイズの入力にのみ使用できる。しかし、例えば音響信号など、多くの種類のデータの長さは自然に変化する。任意の長さのシーケンスで使用するために,MLPベースのアーキテクチャを拡張する3つの手法を提案する。 1つはフーリエ領域で適用される円形の畳み込み、もう1つは奥行きの畳み込みを適用し、最後はシフト演算に依存する。提案手法をLibrispeechとTedlium2コーパスを用いて自動音声認識タスクで評価する。提案されている最も優れたmlpベースのアーキテクチャは wer を 1.0 / 0.9%、librispeech dev-clean/dev-other で0.9 / 0.5%、test-clean/test-other セットで 0.8 / 1.1%、 tedlium2 dev/test セットで86.4%改善する。

We propose multi-layer perceptron (MLP)-based architectures suitable for variable-length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length. The first uses a circular convolution applied in the Fourier domain, the second applies a depthwise convolution, and the third relies on a shift operation. We evaluate the proposed architectures on an automatic speech recognition task with the Librispeech and Tedlium2 corpora. The best proposed MLP-based architecture improves WER by 1.0 / 0.9% and 0.9 / 0.5% on the Librispeech dev-clean/dev-other and test-clean/test-other sets, and by 0.8 / 1.1% on the Tedlium2 dev/test sets, using 86.4% of the size of a self-attention-based architecture.

翻訳日:2022-02-18 16:35:50 公開日:2022-02-17

# トレーニングのボトルネックはどこにあるのか? ディープラーニング前処理パイプラインにおける隠れトレードオフ

Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines ( http://arxiv.org/abs/2202.08679v1 )

ライセンス: Link先を確認

Alexander Isenko, Ruben Mayer, Jeffrey Jedele, Hans-Arno Jacobsen

(参考訳) ディープラーニングにおける前処理パイプラインは、トレーニングプロセスを忙しくするための十分なデータスループットの提供を目的としている。ハードウェアの革新(高速GPU、TPU、インターコネクションなど)や高度な並列化技術によって、トレーニングプロセスのスループットが向上するにつれ、リソース利用の最大化はますます困難になりつつある。同時に、ますます複雑なモデルをトレーニングするために必要なトレーニングデータも増えています。この開発の結果、エンドツーエンドのディープラーニングパイプラインでは、データ前処理とプロビジョニングが深刻なボトルネックになっている。本稿では,4つの異なる機械学習領域からのデータ前処理パイプラインを詳細に分析する。エンドツーエンドのディープラーニングパイプラインのためのデータセットを効率的に準備し、スループット、前処理時間、ストレージ消費を最適化するために個々のトレードオフを抽出する新しい視点を導入する。さらに、スループットを最大化する適切な前処理戦略を自動的に決定できるオープンソースのプロファイリングライブラリを提供する。実世界のユースケースに生成した洞察を適用することで、パイプラインを機能的に同一に保ちながら、未調整のシステムに比べてスループットが3倍から13倍に向上する。これらの結果は,データパイプラインチューニングの膨大な可能性を示している。

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed in order to train increasingly complex models is growing. As a consequence of this development, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines. In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying our generated insights to real-world use-cases, we obtain an increased throughput of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.

翻訳日:2022-02-18 16:35:30 公開日:2022-02-17

# 変圧器を用いた確率力学の学習と創発行動予測

Learning stochastic dynamics and predicting emergent behavior using transformers ( http://arxiv.org/abs/2202.08708v1 )

ライセンス: Link先を確認

Corneel Casert, Isaac Tamblyn and Stephen Whitelam

(参考訳) 言語処理用に設計されたニューラルネットワークは,システムの単一の動的軌道を観測することで確率システムの動的規則を学習でき,訓練中に観察されない条件下での創発的挙動を正確に予測できる。連続時間モンテカルロ動力学による活性物質の格子モデルについて検討し,その定常状態が小さな分散クラスターからなる密度でシミュレーションした。我々はモデルの1つの軌道上でトランスフォーマーと呼ばれるニューラルネットワークを訓練する。変圧器は多数の非局所的な動的規則を表現できる能力を有しており、このモデルの力学が少数のプロセスから構成されていることが分かる。訓練された変圧器の前方伝播軌道は、訓練中に見当たらない密度で運動性誘起相分離を示し、非平衡相転移の存在を予測する。トランスフォーマは、速度の明示的な列挙や構成空間の粗粒化を伴わずに観察から動的規則を学習する柔軟性を持つため、この手順は、大規模で複雑な力学発生器を持つものを含む、幅広い物理システムに適用することができる。

We show that a neural network originally designed for language processing can learn the dynamical rules of a stochastic system by observation of a single dynamical trajectory of the system, and can accurately predict its emergent behavior under conditions not observed during training. We consider a lattice model of active matter undergoing continuous-time Monte Carlo dynamics, simulated at a density at which its steady state comprises small, dispersed clusters. We train a neural network called a transformer on a single trajectory of the model. The transformer, which we show has the capacity to represent dynamical rules that are numerous and nonlocal, learns that the dynamics of this model consists of a small number of processes. Forward-propagated trajectories of the trained transformer, at densities not encountered during training, exhibit motility-induced phase separation and so predict the existence of a nonequilibrium phase transition. Transformers have the flexibility to learn dynamical rules from observation without explicit enumeration of rates or coarse-graining of configuration space, and so the procedure used here can be applied to a wide range of physical systems, including those with large and complex dynamical generators.

翻訳日:2022-02-18 16:35:11 公開日:2022-02-17

# グラフ上のハミルトン・ヤコビ方程式と半教師付き学習とデータ深度への応用

Hamilton-Jacobi equations on graphs with applications to semi-supervised learning and data depth ( http://arxiv.org/abs/2202.08789v1 )

ライセンス: Link先を確認

Jeff Calder, Mahmood Ettehad

(参考訳) 最も短いパスグラフ距離は、データ多様体上の測地線距離を近似できるため、データサイエンスや機械学習で広く使われている。しかし、最短経路距離は、ノイズまたは逆方向の摂動によって、グラフ内の破損したエッジの追加に非常に敏感である。本稿では, グラフ上のハミルトン・ヤコビ方程式の族を, $p$-eikonal equation と呼ぶ。 p=1$の$p$-ekonal方程式はグラフ上の証明可能な堅牢な距離型関数であり、$p\to \infty$制限は最短経路距離を回復する。 p$-eikonal方程式は最短経路グラフ距離とは一致しないが、ランダムな幾何学グラフ上の$p$-eikonal方程式の連続限界は連続体における測地密度重み付き距離を回復することを示している。我々は,データ深度と半教師付き学習に対する$p$-ekonal方程式の適用を検討し,連続極限を用いて両アプリケーションに漸近的整合性を示す。最後に,MNIST,FashionMNIST,CIFAR-10などの実画像データセットに対して,データ深度と半教師付き学習による実験結果を示す。

Shortest path graph distances are widely used in data science and machine learning, since they can approximate the underlying geodesic distance on the data manifold. However, the shortest path distance is highly sensitive to the addition of corrupted edges in the graph, either through noise or an adversarial perturbation. In this paper we study a family of Hamilton-Jacobi equations on graphs that we call the $p$-eikonal equation. We show that the $p$-eikonal equation with $p=1$ is a provably robust distance-type function on a graph, and the $p\to \infty$ limit recovers shortest path distances. While the $p$-eikonal equation does not correspond to a shortest-path graph distance, we nonetheless show that the continuum limit of the $p$-eikonal equation on a random geometric graph recovers a geodesic density weighted distance in the continuum. We consider applications of the $p$-eikonal equation to data depth and semi-supervised learning, and use the continuum limit to prove asymptotic consistency results for both applications. Finally, we show the results of experiments with data depth and semi-supervised learning on real image datasets, including MNIST, FashionMNIST and CIFAR-10, which show that the $p$-eikonal equation offers significantly better results compared to shortest path distances.

翻訳日:2022-02-18 16:34:52 公開日:2022-02-17
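
The sensitivity of shortest-path distances to corrupted edges, which motivates the abstract above, can be seen on a toy graph. The sketch below is a plain Dijkstra implementation on a five-node path graph. It does not implement the $p$-eikonal equation. The graph, the edge weights, and the function name are assumptions chosen for illustration.

```python
import heapq


def shortest_path_distances(num_nodes, edges, source):
    """Dijkstra's algorithm on an undirected graph given as (u, v, weight) triples."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    dist = [float("inf")] * num_nodes
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter route to u was already found
        for v, w in adjacency[u]:
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


if __name__ == "__main__":
    # A path graph 0 - 1 - 2 - 3 - 4 with unit edge weights.
    clean_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
    # One corrupted short edge connecting the two ends of the path.
    corrupted_edges = clean_edges + [(0, 4, 0.1)]
    print(shortest_path_distances(5, clean_edges, source=0))
    print(shortest_path_distances(5, corrupted_edges, source=0))
```

A single spurious edge changes the distance from node 0 to node 4, and also shortens the distance to node 3, because every shortest path is free to route through the shortcut. This is the failure mode that the abstract says the $p=1$ case is provably robust against.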

# (参考訳) グラフニューラルネットワークが生成するグラフ関数の厳密なクラス

The Exact Class of Graph Functions Generated by Graph Neural Networks ( http://arxiv.org/abs/2202.08833v1 )

ライセンス: CC BY 4.0

Mohammad Fereydounian, Hamed Hassani, Javid Dadashkarimi, Amin Karbasi

(参考訳) エッジウェイトとノード特徴の任意のセットで定義されたグラフ関数が与えられたとき、グラフ関数と出力が同一であるグラフニューラルネットワーク(gnn)が存在するだろうか? 本稿では,この疑問に完全に答え,GNNで表現可能なグラフ問題のクラスを特徴付ける。エッジ重みとノード特徴の置換の観点から代数的条件を同定し、グラフ問題がgnnの到達範囲内にあることを証明した。さらに、この条件を2次的に多くの制約をチェックすることで効率よく検証できることを示す。 GNNの表現力に関する洗練された特徴付けは、GNNとWeisfeiler-Lehmanグラフの同値性を示す理論結果と直交する。例えば、我々の特徴は、min-cut値、max-flow値、max-clique sizeなどの多くの自然グラフ問題をGNNで表現できることを示唆している。対照的に、驚くべきことに、GNNがすべてのノード間の最も短いパスの長さを正しく見つけることができない非常に単純なグラフが存在する。最短経路を見つけることは動的プログラミング(dp)における最も古典的な問題の1つである。このように、前述の否定例は、(概念的には)非常に類似した反復的手順に従っているにもかかわらず、DPとGNNの相違を浮き彫りにしている。最後に,実験シミュレーションによる理論結果を支持する。

Given a graph function, defined on an arbitrary set of edge weights and node features, does there exist a Graph Neural Network (GNN) whose output is identical to the graph function? In this paper, we fully answer this question and characterize the class of graph problems that can be represented by GNNs. We identify an algebraic condition, in terms of the permutation of edge weights and node features, which proves to be necessary and sufficient for a graph problem to lie within the reach of GNNs. Moreover, we show that this condition can be efficiently verified by checking quadratically many constraints. Note that our refined characterization of the expressive power of GNNs is orthogonal to those theoretical results showing equivalence between GNNs and the Weisfeiler-Lehman graph isomorphism heuristic. For instance, our characterization implies that many natural graph problems, such as min-cut value, max-flow value, and max-clique size, can be represented by a GNN. In contrast, and rather surprisingly, there exist very simple graphs for which no GNN can correctly find the length of the shortest paths between all nodes. Note that finding shortest paths is one of the most classical problems in Dynamic Programming (DP). Thus, the aforementioned negative example highlights the misalignment between DP and GNNs, even though (conceptually) they follow very similar iterative procedures. Finally, we support our theoretical results by experimental simulations.

翻訳日:2022-02-18 16:32:55 公開日:2022-02-17

# グラフから見たBERTの過平滑化再考

Revisiting Over-smoothing in BERT from the Perspective of Graph ( http://arxiv.org/abs/2202.08625v1 )

ライセンス: Link先を確認

Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M.S. Lee, James T. Kwok

(参考訳) 近年,トランスフォーマーモデルにおける過度に平滑化現象が視覚と言語の両方で観測されている。しかし、この現象の主な原因をさらに調査するために、既存の研究が深く掘り下げられていない。そこで本研究では,このような問題を最初に発見・検討したグラフの観点から,過剰スモーシング問題を解析する試みを行う。直感的には、自己着行列は対応するグラフの正規化隣接行列と見なすことができる。上述の接続に基づいて理論的解析を行い、トランスフォーマーモデルにおける過度な平滑化問題において、層正規化が重要な役割を果たすことを確認する。具体的には、層正規化の標準偏差が十分大きい場合、トランスフォーマースタックの出力は特定の低ランク部分空間に収束し、オーバースムーズとなる。オーバースムーシング問題を軽減するために,異なる層からの表現を適応的に組み合わせ,出力をより多様にする階層的融合戦略を検討する。各種データセットにおける広範な実験結果から, 核融合法の効果を明らかにした。

Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both the vision and language fields. However, no existing work has delved deeper to investigate the main cause of this phenomenon. In this work, we attempt to analyze the over-smoothing problem from the perspective of graphs, where this problem was first discovered and explored. Intuitively, the self-attention matrix can be seen as a normalized adjacency matrix of a corresponding graph. Based on this connection, we provide some theoretical analysis and find that layer normalization plays a key role in the over-smoothing issue of Transformer-based models. Specifically, if the standard deviation of layer normalization is sufficiently large, the output of Transformer stacks will converge to a specific low-rank subspace and result in over-smoothing. To alleviate the over-smoothing problem, we consider hierarchical fusion strategies, which combine the representations from different layers adaptively to make the output more diverse. Extensive experimental results on various data sets illustrate the effect of our fusion method.

翻訳日:2022-02-18 15:53:52 公開日:2022-02-17
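
The graph view in the abstract above treats a softmax attention matrix as a row-normalized adjacency matrix. The following toy sketch shows why repeatedly applying such a matrix makes token representations collapse toward each other. It is a deliberately stripped-down assumption: one fixed attention matrix reused at every layer, with no residual connections, value projections, or layer normalization, so it does not reproduce the paper's analysis of layer normalization. All names are hypothetical.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax; each output row is positive and sums to 1."""
    rows = []
    for row in scores:
        row_max = max(row)
        exps = [math.exp(s - row_max) for s in row]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def propagate(attn, features):
    """One 'layer': every token becomes an attention-weighted average of all tokens."""
    width = len(features[0])
    return [
        [sum(a * token[j] for a, token in zip(attn_row, features)) for j in range(width)]
        for attn_row in attn
    ]


def token_spread(features):
    """Largest per-dimension gap between tokens; 0 means all tokens are identical."""
    return max(max(column) - min(column) for column in zip(*features))


def spreads_over_layers(attn, features, num_layers):
    """Track the token spread after each repeated application of `attn`."""
    spreads = [token_spread(features)]
    for _ in range(num_layers):
        features = propagate(attn, features)
        spreads.append(token_spread(features))
    return spreads


if __name__ == "__main__":
    attn = softmax_rows([[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 2.0]])
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    for layer, spread in enumerate(spreads_over_layers(attn, tokens, 20)):
        print(layer, spread)
```

Each layer replaces a token with a convex combination of the previous tokens, so the spread can never grow. With strictly positive attention weights it shrinks toward zero, which is the same averaging effect that causes over-smoothing in graph neural networks.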

# 確率時系列予測のためのアンサンブル等角化分位回帰

Ensemble Conformalized Quantile Regression for Probabilistic Time Series Forecasting ( http://arxiv.org/abs/2202.08756v1 )

ライセンス: Link先を確認

Vilde Jensen, Filippo Maria Bianchi, Stian Norman Anfinsen

(参考訳) 本稿では,アンサンブル共形量子回帰(EnCQR)と呼ばれる新しい確率予測手法を提案する。 EnCQRは、分布のないほぼ妥当な予測間隔(PI)を構築し、非定常およびヘテロセダスティック時系列データに適しており、長いデータシーケンスでトレーニングされたディープラーニングアーキテクチャを含むあらゆる予測モデルに適用することができる。 EnCQRはブートストラップアンサンブル推定器を利用して、データ交換性の必要性を取り除くことで、時系列に共形予測器を使用できる。アンサンブル学習者は、定量回帰を実行する汎用機械学習アルゴリズムとして実装され、PIの長さがデータの局所的変動に適応できるようにする。実験では,異なるヘテロシドキシーによって特徴付けられる時系列を予測した。その結果、encqrは量的回帰や共形予測のみに基づくモデルよりも優れており、より鋭く、より有益で、有効なpiを提供する。

This paper presents a novel probabilistic forecasting method called ensemble conformalized quantile regression (EnCQR). EnCQR constructs distribution-free and approximately marginally valid prediction intervals (PIs), is suitable for nonstationary and heteroscedastic time series data, and can be applied on top of any forecasting model, including deep learning architectures that are trained on long data sequences. EnCQR exploits a bootstrap ensemble estimator, which enables the use of conformal predictors for time series by removing the requirement of data exchangeability. The ensemble learners are implemented as generic machine learning algorithms performing quantile regression, which allow the length of the PIs to adapt to local variability in the data. In the experiments, we predict time series characterized by a different amount of heteroscedasticity. The results demonstrate that EnCQR outperforms models based only on quantile regression or conformal prediction, and it provides sharper, more informative, and valid PIs.

翻訳日:2022-02-18 15:53:35 公開日:2022-02-17
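
As background for the abstract above, the sketch below shows the calibration step of standard split conformalized quantile regression (CQR), which EnCQR builds on. This is not EnCQR itself: there is no bootstrap ensemble or out-of-bag aggregation here, and the quantile predictions are assumed to come from some already-fitted model. The function names are hypothetical.

```python
import math


def cqr_correction(lower_preds, upper_preds, targets, alpha):
    """Split-CQR calibration on held-out data.

    The conformity score max(lower - y, y - upper) is positive when y falls
    outside the predicted interval and negative when it falls inside. The
    correction is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lower_preds, upper_preds, targets)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[rank - 1]


def conformalize(lower, upper, correction):
    """Widen (or shrink, if the correction is negative) a predicted interval."""
    return lower - correction, upper + correction


if __name__ == "__main__":
    # Ten calibration points whose raw quantile interval is always [0, 1].
    lower_preds = [0.0] * 10
    upper_preds = [1.0] * 10
    targets = [0.5] * 8 + [1.25, 1.5]
    q = cqr_correction(lower_preds, upper_preds, targets, alpha=0.25)
    print(q, conformalize(0.0, 1.0, q))
```

The coverage guarantee of this split procedure relies on exchangeability between calibration and test points, which time series generally violate. According to the abstract, EnCQR removes that requirement by using a bootstrap ensemble estimator.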

# 未知の切断点を持つ高次元データのモデリング:融合ペナル化ロジスティック閾値回帰

Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression ( http://arxiv.org/abs/2202.08441v1 )

ライセンス: Link先を確認

Yinan Lin, Wen Zhou, Zhi Geng, Gexin Xiao, and Jianxin Yin

(参考訳) 従来のロジスティック回帰モデルでは、リンク関数は線形で連続であると見なされることが多い。ここでは,すべての連続的な特徴が順序レベルに離散化され,さらにバイナリ応答を決定するしきい値モデルを考える。閾値点と回帰係数はともに未知であり、推定される。高次元データに対して,可変選択法として積分ラッソペナルティを適用し,係数を0に縮小する,融合ペナルティ付きロジスティックしきい値回帰(フィルタ)モデルを提案する。未知しきい値の推定における軽度条件下では、係数推定のための非漸近誤差とモデル選択整合性を確立する。また, エラー伝播の注意深い評価により, CARTなどの木に基づく手法がしきい値推定条件を満たすことを示した。このフィルタモデルは, 糖尿病などの慢性疾患の早期発見と予測に, 理学検査データを用いて好適であることがわかった。また,提案手法の有限サンプル挙動についても検討し,理論的な発見を支援するモンテカルロ研究と比較した。

In traditional logistic regression models, the link function is often assumed to be linear and continuous in the predictors. Here, we consider a threshold model in which all continuous features are discretized into ordinal levels, which further determine the binary responses. Both the threshold points and the regression coefficients are unknown and must be estimated. For high-dimensional data, we propose a fusion penalized logistic threshold regression (FILTER) model, where a fused lasso penalty is employed to control the total variation and shrink the coefficients to zero as a method of variable selection. Under mild conditions on the estimate of the unknown threshold points, we establish a non-asymptotic error bound for coefficient estimation and the model selection consistency. With a careful characterization of the error propagation, we also show that tree-based methods, such as CART, fulfill the threshold estimation conditions. We find that the FILTER model is well suited to the problem of early detection and prediction of chronic diseases such as diabetes, using physical examination data. The finite-sample behavior of our proposed method is also explored and compared in extensive Monte Carlo studies, which support our theoretical findings.

翻訳日:2022-02-18 15:53:20 公開日:2022-02-17

# transcg: 透明な物体深度の完成と把握のための大規模実世界データセット

TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping ( http://arxiv.org/abs/2202.08471v1 )

ライセンス: Link先を確認

Hongjie Fang, Hao-Shu Fang, Sheng Xu and Cewu Lu

(参考訳) 透明なオブジェクトは私たちの日常生活で一般的であり、自動生産ラインで頻繁に扱われます。視覚に基づくロボットによる物体の把握と操作は、自動化に有用だろう。しかし、現在の把持アルゴリズムの大部分は深度画像に大きく依存しているため失敗するが、通常の深度センサは通常、光の反射と屈折のために透明な物体の正確な深さ情報を生成することができない。そこで本稿では,130シーンの57,715 rgb-d画像を含む透明オブジェクト深度補完のための大規模実世界データセットをコントリビュートすることで,この問題に対処した。私たちのデータセットは、最初の大規模な実世界のデータセットであり、最も包括的なアノテーションを提供します。クロスドメイン実験は、我々のデータセットが非常に一般化できることを示している。さらに、RGB画像と不正確な深度マップを入力とし、精細化された深度マップを出力するエンドツーエンドの深度補完ネットワークを提案する。実験では,従来の手法よりも優れた有効性,効率性,頑健性を示し,限られたハードウェア資源で高分解能画像を処理できることを示した。実ロボット実験では,新しい物体の把握にロバストに応用できることを示した。完全なデータセットとメソッドはwww.graspnet.net/transcgで公開されている。

Transparent objects are common in our daily life and frequently handled in the automated production line. Robust vision-based robotic grasping and manipulation for these objects would be beneficial for automation. However, the majority of current grasping algorithms would fail in this case since they heavily rely on the depth image, while ordinary depth sensors usually fail to produce accurate depth information for transparent objects owing to the reflection and refraction of light. In this work, we address this issue by contributing a large-scale real-world dataset for transparent object depth completion, which contains 57,715 RGB-D images from 130 different scenes. Our dataset is the first large-scale real-world dataset and provides the most comprehensive annotation. Cross-domain experiments show that our dataset has a great generalization ability. Moreover, we propose an end-to-end depth completion network, which takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map. Experiments demonstrate superior efficacy, efficiency and robustness of our method over previous works, and it is able to process images of high resolutions under limited hardware resources. Real robot experiment shows that our method can also be applied to novel object grasping robustly. The full dataset and our method are publicly available at www.graspnet.net/transcg.

翻訳日:2022-02-18 15:51:03 公開日:2022-02-17

# 嫌な男:Facebookの挑戦のレンズを通して、嫌なミームを自動的に検出する

Feels Bad Man: Dissecting Automated Hateful Meme Detection Through the Lens of Facebook's Challenge ( http://arxiv.org/abs/2202.08492v1 )

ライセンス: Link先を確認

Catherine Jennifer, Fatemeh Tahmasbi, Jeremy Blackburn, Gianluca Stringhini, Savvas Zannettou, and Emiliano De Cristofaro

(参考訳) インターネットミームはコミュニケーションの主流となっているが、同時に過激主義を提唱し、軽蔑的信念を育むためにも使われるようになっている。いずれにせよ、ミームの知覚的側面がこの現象を引き起こすのかについては、よく分かっていない。本研究では,現在最先端のマルチモーダル機械学習モデルのヘイトフルミーム検出に対する有効性,特にプラットフォーム間の一般化性について評価する。 4chan's "politically incorrect" board (/pol/)とfacebook's hateful memes challenge datasetの12,140と10,567の2つのベンチマークデータセットを使用して、競争のトップレベルの機械学習モデルをトレーニングし、バイラルな憎しみのあるミームと良性なミームを区別する最も顕著な特徴を発見しました。分類性能におけるマルチモーダルの重要性,主流のソーシャルプラットフォームにおけるWebコミュニティの影響力,その逆の3つの実験を行い,モデルの4chanミームにおける学習伝達性について検討した。実験の結果,ミームのイメージ特性はテキストの内容よりも豊富な情報を提供することがわかった。ミームにおけるヘイトスピーチのオンライン検出のために開発された現在のシステムは、その視覚要素にさらなる集中を要し、文化的意味論の解釈を改善し、マルチモーダルモデルではミームにおけるヘイトスピーチの複雑さを十分に把握できず、ソーシャルメディアプラットフォーム全体に一般化できないことを示唆している。

Internet memes have become a dominant method of communication; at the same time, however, they are also increasingly being used to advocate extremism and foster derogatory beliefs. Nonetheless, we do not have a firm understanding as to which perceptual aspects of memes cause this phenomenon. In this work, we assess the efficacy of current state-of-the-art multimodal machine learning models toward hateful meme detection, and in particular with respect to their generalizability across platforms. We use two benchmark datasets comprising 12,140 and 10,567 images from 4chan's "Politically Incorrect" board (/pol/) and Facebook's Hateful Memes Challenge dataset to train the competition's top-ranking machine learning models for the discovery of the most prominent features that distinguish viral hateful memes from benign ones. We conduct three experiments to determine the importance of multimodality on classification performance, the influential capacity of fringe Web communities on mainstream social platforms and vice versa, and the models' learning transferability on 4chan memes. Our experiments show that memes' image characteristics provide a greater wealth of information than their textual content. We also find that current systems developed for online detection of hate speech in memes necessitate further concentration on their visual elements to improve their interpretation of underlying cultural connotations, implying that multimodal models fail to adequately grasp the intricacies of hate speech in memes and generalize across social media platforms.

翻訳日:2022-02-18 15:50:43 公開日:2022-02-17

# EBHI:画像分類のための新しい内視鏡生検組織学的H&E画像データセット

EBHI:A New Enteroscope Biopsy Histopathological H&E Image Dataset for Image Classification Evaluation ( http://arxiv.org/abs/2202.08552v1 )

ライセンス: Link先を確認

Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Yong Zhang, Haoyuan Chen, Wanli Liu, Yudong Yao, Hongzan Sun, Ning Xu, Xinyu Huang and Marcin Grzegorze

(参考訳) 背景と目的: 大腸癌は世界で3番目に多いがんであり、がん患者の約10%を占めている。この疾患の早期発見は大腸癌患者の治療に重要である。病理組織検査は大腸癌検診の金本位制である。しかし,現在の大腸癌,特に内視鏡生検の病理組織像データセットの欠如は,コンピュータ支援診断技術の正確な評価を妨げている。方法: 新たに公開された腸鏡生検組織病理組織学的h&e画像データセット (ebhi) を本論文で発表する。 EBHIデータセットの有効性を実証するために,200倍の倍率の画像を用いて,複数の機械学習,畳み込みニューラルネットワーク,新しいトランスフォーマーベース分類器を用いて実験と評価を行った。結果:実験結果から,深層学習法はEBHIデータセットで良好に動作することが示された。従来の機械学習手法は最大精度76.02%、ディープラーニング手法は最大精度95.37%である。結語: EBHIは4倍, 5種類の腫瘍分化期像, 5532枚の画像を含む, 初めて公開された大腸病理組織内視鏡生検データセットである。 EBHIは大腸癌の自動診断のための新しい分類アルゴリズムを研究者に提供し、臨床現場で医師や患者に役立てることができると考えている。

Background and purpose: Colorectal cancer has become the third most common cancer worldwide, accounting for approximately 10% of cancer patients. Early detection of the disease is important for the treatment of colorectal cancer patients. Histopathological examination is the gold standard for screening colorectal cancer. However, the current lack of histopathological image datasets of colorectal cancer, especially enteroscope biopsies, hinders the accurate evaluation of computer-aided diagnosis techniques. Methods: A new publicly available Enteroscope Biopsy Histopathological H&E Image Dataset (EBHI) is published in this paper. To demonstrate the effectiveness of the EBHI dataset, we have utilized several machine learning, convolutional neural network and novel transformer-based classifiers for experimentation and evaluation, using images at a magnification of 200x. Results: Experimental results show that deep learning methods perform well on the EBHI dataset. Traditional machine learning methods achieve a maximum accuracy of 76.02% and deep learning methods achieve a maximum accuracy of 95.37%. Conclusion: To the best of our knowledge, EBHI is the first publicly available colorectal histopathology enteroscope biopsy dataset with four magnifications and five types of images of tumor differentiation stages, totaling 5532 images. We believe that EBHI could attract researchers to explore new classification algorithms for the automated diagnosis of colorectal cancer, which could help physicians and patients in clinical settings.

翻訳日:2022-02-18 15:50:12 公開日:2022-02-17

# 解剖学的パラメータ化統計形状モデル:統計学習による形態計測

Anatomically Parameterized Statistical Shape Model: Explaining Morphometry through Statistical Learning ( http://arxiv.org/abs/2202.08580v1 )

ライセンス: Link先を確認

Arnaud Boutillon, Asma Salhi, Valérie Burdin, Bhushan Borotikar

(参考訳) 統計的形状モデル(SSM)は、臨床実践において重要なステップである解剖学的構造の形態学的解析を行うための一般的なツールである。しかし、SSMによる形状表現は形状係数に基づいており、臨床関連性の解剖学的尺度との明確な一対一の関係は欠如している。形状係数は解剖学的測度の組み合わせを埋め込んでいるが、それらの関係を見つけるための形式化されたアプローチは、文献ではまだ解明されていない。これにより、SSMの使用は臨床実践における主観評価に制限される。形態計測解析から得られた解剖学的パラメータによって制御される新しいssmを提案する。解剖学的パラメータ化SSM(ANAT-SSM)は,形状係数と選択された解剖学的パラメータの線形マッピングを学習することに基づく。このマッピングは、標準SSMによって生成された合成集団から学習される。マッピングの擬似逆数を決定することで、ANAT-SSMを構築することができます。さらに, 独立な形状変化パターンを得るために, 解剖学的パラメータ化に直交性制約を課す。本研究は, 臨床解剖学的パラメータを用いて, 大腿骨骨と肩甲骨形状の2つの骨格データベースを用いて評価した。合成生成した形状の解剖学的指標は現実的な統計値を示した。学習した行列は得られた統計的関係とよく一致し,2つのssmは見当たらない形状の解剖学的パラメータの予測において中程度から優れた性能を得た。本研究は、解剖学的パラメータ化SSMの作成に解剖学的表現を用いており、その結果、標準SSMの限られた臨床解釈性は排除される。提案モデルでは, 患者の骨形態計測の差異を解析し, 患者固有の手術前計画や手術中評価に組み込むことができる。

Statistical shape models (SSMs) are a popular tool for morphological analysis of anatomical structures, which is a crucial step in clinical practice. However, shape representations through SSMs are based on shape coefficients and lack an explicit one-to-one relationship with anatomical measures of clinical relevance. While a shape coefficient embeds a combination of anatomical measures, a formalized approach to find the relationship between them remains elusive in the literature. This limits the use of SSMs to subjective evaluations in clinical practice. We propose a novel SSM controlled by anatomical parameters derived from morphometric analysis. The proposed anatomically parameterized SSM (ANAT-SSM) is based on learning a linear mapping between shape coefficients and selected anatomical parameters. This mapping is learned from a synthetic population generated by the standard SSM. Determining the pseudo-inverse of the mapping allows us to build the ANAT-SSM. We further impose orthogonality constraints on the anatomical parameterization to obtain independent shape variation patterns. The proposed contribution was evaluated on two skeletal databases of femoral and scapular bone shapes using clinically relevant anatomical parameters. Anatomical measures of the synthetically generated shapes exhibited realistic statistics. The learned matrices agreed well with the obtained statistical relationships, while the two SSMs achieved moderate to excellent performance in predicting anatomical parameters on unseen shapes. This study demonstrates the use of anatomical representation for creating an anatomically parameterized SSM and, as a result, removes the limited clinical interpretability of standard SSMs. The proposed models could help analyze differences in relevant bone morphometry between populations, and be integrated in patient-specific pre-surgery planning or in-surgery assessment.
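The linear-mapping and pseudo-inverse steps described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the matrix `A_TRUE`, the synthetic population `B`, and the helper functions are all hypothetical, the data are noise-free, and the orthogonality constraints mentioned in the abstract are left out.

```python
# Sketch: learn a linear map A from shape coefficients b to anatomical
# parameters a (a = A b), then drive the shape model through pinv(A).
# All numbers below are invented for illustration.

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    yt = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in yt] for row in x]

def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical relation: 2 anatomical parameters from 3 shape coefficients.
A_TRUE = [[2.0, 0.0, 1.0],
          [0.0, 1.0, -1.0]]

# Synthetic population: each column of B is one sampled shape-coefficient vector.
B = transpose([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]])
Y = matmul(A_TRUE, B)  # anatomical measures of the synthetic shapes

# Least-squares fit of the mapping: A = Y B^T (B B^T)^-1
Bt = transpose(B)
A = matmul(matmul(Y, Bt), inverse(matmul(B, Bt)))

# Right pseudo-inverse (A has full row rank): A^+ = A^T (A A^T)^-1
At = transpose(A)
A_pinv = matmul(At, inverse(matmul(A, At)))

# Requested anatomical parameters -> minimum-norm shape coefficients.
target = [[3.0], [1.0]]
b = matmul(A_pinv, target)
recovered = matmul(A, b)  # should reproduce `target`
```

With more shape coefficients than anatomical parameters, many coefficient vectors produce the same measurements; the pseudo-inverse picks the one with the smallest norm, which is one reasonable default for a sketch like this.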

翻訳日:2022-02-18 15:49:49 公開日:2022-02-17

# 弱教師付き効率的なUNetとモルフォロジー後処理に基づくエンドツーエンドニューロンインスタンス分割

End-to-end Neuron Instance Segmentation based on Weakly Supervised Efficient UNet and Morphological Post-processing ( http://arxiv.org/abs/2202.08682v1 )

ライセンス: Link先を確認

Huaqian Wu, Nicolas Souedet, Caroline Jan, Cédric Clouchoux, Thierry Delzescaux

(参考訳) 近年の研究では、医学画像解析における深層学習の優位性、特に細胞インスタンスセグメンテーションにおいて、多くの生物学的研究の基本的なステップが示されている。しかし、ニューラルネットワークの優れたパフォーマンスには、大きな偏りのないデータセットとアノテーションのトレーニングが必要である。本稿では,NuN染色神経細胞の組織像における検出と分画を,ポイントアノテーションのみで行うエンド・ツー・エンドで制御するフレームワークを提案する。私たちは最先端のネットワークであるEfficientNetをU-Netのようなアーキテクチャに統合します。検証結果は,近年の手法と比較して,モデルの優位性を示している。さらに,複数の後処理スキームを調査し,究極のエロージョンと動的再構成を用いて確率マップをセグメント化されたインスタンスに変換する手法を提案した。このアプローチは、他の古典的な後処理技術の設定と性能に優れています。

Recent studies have demonstrated the superiority of deep learning in medical image analysis, especially in cell instance segmentation, a fundamental step for many biological studies. However, the good performance of neural networks requires training on large, unbiased, annotated datasets, and producing such annotations is labor-intensive and expertise-demanding. In this paper, we present an end-to-end weakly-supervised framework to automatically detect and segment NeuN-stained neuronal cells in histological images using only point annotations. We integrate the state-of-the-art network, EfficientNet, into our U-Net-like architecture. Validation results show the superiority of our model compared to other recent methods. In addition, we investigated multiple post-processing schemes and proposed an original strategy to convert the probability map into segmented instances using ultimate erosion and dynamic reconstruction. This approach is easy to configure and outperforms other classical post-processing techniques.

翻訳日:2022-02-18 15:49:21 公開日:2022-02-17

# オプティカルフローにより駆動されるレベルセットに基づく粒子フィルタ:x線ct時系列からの塩境界追跡への応用

Level set based particle filter driven by optical flow: an application to track the salt boundary from X-ray CT time-series ( http://arxiv.org/abs/2202.08717v1 )

ライセンス: Link先を確認

Karim Makki and Jean François Lecomte and Lukas Fuchs and Sylvie Schueller and Etienne Mémin

(参考訳) 画像に基づく計算流体力学は、様々な物理現象の知識と理解を活用する上で、長い間重要な役割を担ってきた。特に確率論的計算法は、純粋にランダムな乱流運動におけるシステムの複雑な力学をモデル化する方法を開いた。構造地質学の分野では、塩と周囲の岩石の双方における変形と応力状態のより深い理解が、あらゆる地下の長期エネルギー貯蔵システムの特徴付けに非常に興味がある。本研究の目的は,X線CT(Computed Tomography, CT)画像時系列から, 重力および差分荷重下での塩構造の進化を示す並列的, 確率的フィルタリング手法を用いて, 時間とともに塩境界の非線形変形を決定することである。この研究は、モデルの不確実性を考慮した物理モデリングと高度な確率画像処理手法を統合するための第一歩である。

Image-based computational fluid dynamics has long played an important role in leveraging knowledge and understanding of several physical phenomena. In particular, probabilistic computational methods have opened the way to modelling the complex dynamics of systems in purely random turbulent motion. In the field of structural geology, a better understanding of the deformation and stress state both within the salt and the surrounding rocks is of great interest to characterize all kinds of subsurface long-term energy-storage systems. The objective of this research is to determine the non-linear deformation of the salt boundary over time using a parallelized, stochastic filtering approach from X-ray computed tomography (CT) image time series depicting the evolution of salt structures triggered by gravity and under differential loading. This work represents a first step towards bringing together physical modeling and advanced stochastic image processing methods where model uncertainty is taken into account.

翻訳日:2022-02-18 15:49:06 公開日:2022-02-17

# (参考訳) 画像品質評価のための深い知覚指標に関する研究

A study of deep perceptual metrics for image quality assessment ( http://arxiv.org/abs/2202.08692v1 )

ライセンス: CC BY 4.0

Rémi Kazmierczak, Gianni Franchi, Nacim Belkhir, Antoine Manzanera, David Filliat

(参考訳) 画像間の類似度を定量化する指標はいくつか存在するが、高度に歪んだ画像の類似度を測定することは非効率である。本研究では,画像品質評価(iqa)タスクに取り組むために,深層ニューラルネットワークに基づく知覚指標を実証的に検討する。ネットワークのアーキテクチャやトレーニング手順など、さまざまなハイパーパラメータに従って、深い知覚指標を調査します。最後に,様々な解像度で知覚情報を集約し,画像変形の異なる iqa タスクにおける標準知覚指標を上回るマルチレゾリューション知覚指標(mr-perceptual)を提案する。私たちのコードはhttps://github.com/ENSTA-U2IS/MR_perceptualで利用可能です。

Several metrics exist to quantify the similarity between images, but they are inefficient when it comes to measuring the similarity of highly distorted images. In this work, we propose to empirically investigate perceptual metrics based on deep neural networks for tackling the Image Quality Assessment (IQA) task. We study deep perceptual metrics according to different hyperparameters, such as the network's architecture or training procedure. Finally, we propose a multi-resolution perceptual metric (MR-Perceptual), which allows us to aggregate perceptual information at different resolutions and outperforms standard perceptual metrics on IQA tasks with varying image deformations. Our code is available at https://github.com/ENSTA-U2IS/MR_perceptual

翻訳日:2022-02-18 15:46:50 公開日:2022-02-17

# 時系列予測のための多目的モデル選択

Multi-Objective Model Selection for Time Series Forecasting ( http://arxiv.org/abs/2202.08485v1 )

ライセンス: Link先を確認

Oliver Borchert, David Salinas, Valentin Flunkert, Tim Januschowski, Stephan Günnemann

(参考訳) 時系列予測の研究は、精度を向上させる方法の開発に重点を置いている。しかし、トレーニング時間やレイテンシなどの他の基準は、多くの現実世界のアプリケーションで重要である。そこで本研究では,与えられたデータセットの適切な予測モデルを選択する方法について,精度が多くの基準のうちの1つにすぎない場合,多くの予測手法の中から解決する。これに対する私たちの貢献は2倍です。まず,44個の不均質な公開データセットを用いた7つの古典的および6つのディープラーニング予測手法を評価する,包括的なベンチマークを提案する。ベンチマークコードは、すべてのメソッドの評価と予測とともに、オープンソースである。これらの評価により、従来のモデルよりも優れたディープラーニングモデルに必要なデータ量などのオープンな質問に答えることができる。第2に、ベンチマーク評価を利用して、精度やレイテンシなど、複数の目的を考慮した良いデフォルトを学習します。予測モデルからパフォーマンスメトリクスへのマッピングを学習することにより、私たちのメソッドPARETOSELECTがParetoフロントから正確にモデルを選択できることを示します。我々の知る限り、PARETOSELECTは、マルチオブジェクト設定でデフォルトモデルを学習する最初の方法を構成する。

Research on time series forecasting has predominantly focused on developing methods that improve accuracy. However, other criteria such as training time or latency are critical in many real-world applications. We therefore address the question of how to choose an appropriate forecasting model for a given dataset among the plethora of available forecasting methods when accuracy is only one of many criteria. For this, our contributions are two-fold. First, we present a comprehensive benchmark, evaluating 7 classical and 6 deep learning forecasting methods on 44 heterogeneous, publicly available datasets. The benchmark code is open-sourced along with evaluations and forecasts for all methods. These evaluations enable us to answer open questions such as the amount of data required for deep learning models to outperform classical ones. Second, we leverage the benchmark evaluations to learn good defaults that consider multiple objectives such as accuracy and latency. By learning a mapping from forecasting models to performance metrics, we show that our method PARETOSELECT is able to accurately select models from the Pareto front -- alleviating the need to train or evaluate many forecasting models for model selection. To the best of our knowledge, PARETOSELECT constitutes the first method to learn default models in a multi-objective setting.
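To make the notion of a Pareto front over accuracy and latency concrete, here is a minimal sketch. It only shows how non-dominated models are identified once their metrics are known; it does not reproduce PARETOSELECT, which according to the abstract predicts those metrics instead of measuring them. The model names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (lower is better for both objectives)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of all models that no other model dominates, sorted alphabetically."""
    return sorted(
        name
        for name, score in scores.items()
        if not any(
            dominates(other, score)
            for other_name, other in scores.items()
            if other_name != name
        )
    )

# Hypothetical (forecast error, latency in ms) pairs, not benchmark results.
candidates = {
    "model_a": (0.30, 1.0),
    "model_b": (0.22, 5.0),
    "model_c": (0.24, 40.0),    # dominated by model_b
    "model_d": (0.18, 60.0),
    "model_e": (0.19, 120.0),   # dominated by model_d
}
front = pareto_front(candidates)
```

Every model on the front represents a different trade-off; choosing among them still requires a preference between accuracy and latency.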

翻訳日:2022-02-18 15:34:29 公開日:2022-02-17

# SAITS: 自己注意に基づく時系列計算

SAITS: Self-Attention-based Imputation for Time Series ( http://arxiv.org/abs/2202.08516v1 )

ライセンス: Link先を確認

Wenjie Du, David Côté, Yan Liu

(参考訳) 時系列におけるデータの欠如は、特に現実世界のアプリケーションにおいて、パターン認識の方法に障害をもたらす広範囲にわたる問題である。一般的な解決策はインプテーションであり、どの値を埋めるべきかを決めることが基本的な課題である。本稿では,多変量時系列における値計算の欠落に対する自己注意機構に基づくSAITSを提案する。 SAITSは共同最適化アプローチによって訓練され、2つの対角行列自己注意ブロック(DMSA)の重み付け組み合わせから欠落値を学ぶ。 dmsaは、時間ステップ間の時間依存性と特徴相関の両方を明示的に捉え、インプテーション精度とトレーニング速度を改善する。一方、重み付け合成設計では、注意マップと不足情報に基づいて、2つのDMSAブロックから学習した表現に重みを動的に割り当てることができる。実世界の不完全な時系列データを用いたパターン認識モデルの学習性能を向上させるために,saitsは時系列インプテーションタスクにおいて最先端の手法を効果的に超えていることを示す広範な実験を行った。

Missing data in time series is a pervasive problem that puts obstacles in the way of pattern recognition, especially in real-world applications. A popular solution is imputation, where the fundamental challenge is to determine what values should be filled in. This paper proposes SAITS, a novel method based on the self-attention mechanism for missing value imputation in multivariate time series. Trained by a joint-optimization approach, SAITS learns missing values from a weighted combination of two diagonally-masked self-attention (DMSA) blocks. DMSA explicitly captures both the temporal dependencies and feature correlations between time steps, which improves imputation accuracy and training speed. Meanwhile, the weighted-combination design enables SAITS to dynamically assign weights to the learned representations from two DMSA blocks according to the attention map and the missingness information. Extensive experiments demonstrate that SAITS outperforms the state-of-the-art methods on the time-series imputation task efficiently and reveal SAITS' potential to improve the learning performance of pattern recognition models on incomplete time-series data from the real world.
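The idea of diagonal masking in self-attention can be sketched as follows, assuming the usual formulation in which the masked entries are set to negative infinity before the softmax so that a time step cannot attend to itself. The score matrix and values are invented, and the sketch omits the learned projections and the weighted combination of the two DMSA blocks.

```python
import math
from typing import List

def diagonally_masked_attention(scores: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax over raw attention scores with the diagonal masked
    out, so step i receives zero weight on itself. Requires at least 2 steps."""
    n = len(scores)
    weights = []
    for i in range(n):
        row = [-math.inf if i == j else scores[i][j] for j in range(n)]
        peak = max(row)                       # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def attend(weights: List[List[float]], values: List[List[float]]) -> List[List[float]]:
    """Attention output: weighted sum of value vectors for each time step."""
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]

# Hypothetical scores with a dominant diagonal, which the mask removes.
scores = [[5.0, 1.0, 0.0],
          [0.0, 5.0, 1.0],
          [1.0, 0.0, 5.0]]
values = [[1.0], [2.0], [3.0]]
w = diagonally_masked_attention(scores)
out = attend(w, values)
```

Because each step is reconstructed only from the other steps, the estimate of a value cannot simply copy its own input, which is what makes the mechanism usable for imputation.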

翻訳日:2022-02-18 15:34:12 公開日:2022-02-17

# CoFED:コトレーニングによるクロスサイロ不均一なマルチタスク学習

CoFED: Cross-silo Heterogeneous Federated Multi-task Learning via Co-training ( http://arxiv.org/abs/2202.08603v1 )

ライセンス: Link先を確認

Xingjian Cao, Zonghang Li, Hongfang Yu, Gang Sun

(参考訳) Federated Learning(FL)は、参加者がプライベートデータを交換することなく、高品質なモデルを協調的にトレーニングできる機械学習技術である。クロスサイロfl設定の参加者は、異なるタスクニーズを持つ独立した組織であり、データプライバシだけでなく、知的財産による独自のモデルのトレーニングにも関係しています。既存のFLスキームの多くは上記のシナリオでは不可能である。本稿では,コトレーニングのようなラベルなしの擬似ラベルデータに基づく通信効率の高いFL方式であるCoFEDを提案する。我々の知る限り、これは異種タスク、異種モデル、異種訓練アルゴリズムを同時に扱う最初のFLスキームである。実験結果から,CoFEDは通信コストの低減を図った。特に非iid設定や不均質モデルでは,提案手法により35%性能が向上した。

Federated Learning (FL) is a machine learning technique that enables participants to train high-quality models collaboratively without exchanging their private data. Participants in cross-silo FL settings are independent organizations with different task needs, and they are concerned not only with data privacy but also with independently training their own unique models for intellectual property reasons. Most existing FL schemes are incapable of handling these scenarios. In this paper, we propose a communication-efficient FL scheme, CoFED, based on pseudo-labeling unlabeled data, as in co-training. To the best of our knowledge, it is the first FL scheme compatible with heterogeneous tasks, heterogeneous models, and heterogeneous training algorithms simultaneously. Experimental results show that CoFED achieves better performance with a lower communication cost. Especially for non-IID settings and heterogeneous models, the proposed method improves the performance by 35%.
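One plausible reading of "pseudo-labeling unlabeled data" in a federated setting is that participants exchange only their predictions on a shared unlabeled pool and keep the labels on which enough of them agree. The sketch below illustrates that reading; it is an assumption and not the CoFED procedure itself. The function name, the `min_agreement` threshold, and the single shared label space are all hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, List, Optional

def consensus_pseudo_labels(
    predictions: Dict[str, List[str]],
    min_agreement: float = 0.6,
) -> List[Optional[str]]:
    """For each unlabeled sample, keep the majority label if the share of
    participants voting for it reaches `min_agreement`, else None."""
    participants = list(predictions)
    n_samples = len(predictions[participants[0]])
    labels: List[Optional[str]] = []
    for i in range(n_samples):
        votes = Counter(predictions[p][i] for p in participants)
        label, count = votes.most_common(1)[0]
        labels.append(label if count / len(participants) >= min_agreement else None)
    return labels

# Hypothetical predictions of three organizations on four shared unlabeled samples.
preds = {
    "org_a": ["cat", "dog", "cat", "bird"],
    "org_b": ["cat", "dog", "dog", "cat"],
    "org_c": ["cat", "cat", "bird", "dog"],
}
pseudo = consensus_pseudo_labels(preds)
```

In such a scheme only label vectors cross organizational boundaries, so each participant can keep its own architecture and training algorithm, and the communication cost scales with the number of unlabeled samples rather than with model size.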

翻訳日:2022-02-18 15:33:52 公開日:2022-02-17

# (参考訳) 大量内視鏡画像を用いた大腸内視鏡ポリープ検出

Colonoscopy polyp detection with massive endoscopic images ( http://arxiv.org/abs/2202.08730v1 )

ライセンス: CC BY 4.0

Jialin Yu, Huogen Wang, Ming Chen

(参考訳) 我々は,検出速度において自明なコストで異なるデータセットで検証された平均精度を向上し,既存の終端ポリプ検出モデルを改善した。大腸内視鏡検査におけるポリープ検出に関するこれまでの研究は、医師の検査オーバーヘッドを軽減するための効率的なエンドツーエンドソリューションを提供した。しかし、後の実験で、このフレームワークはポリプ捕獲の状態が変化する以前ほど堅牢ではないことが分かりました。本研究では,ポリープ検出作業において,精度の低下の原因となる主な問題を特定するため,データセットに関するいくつかの研究を行った。私たちは、アンカーボックス形状を改善するために最適化されたアンカー生成手法を使い、小さなオブジェクト検出に必要であると信じているため、より多くのボックスが検出に使われました。代替のバックボーンは、密集したアンカーボックス回帰によって引き起こされる重い時間コストを補償するために使用される。アテンションゲートモジュールを使用することで,リアルタイム検出速度を維持しつつ,最先端ポリープ検出性能を実現することができる。

We improve an existing end-to-end polyp detection model, achieving better average precision, validated on different datasets, at a trivial cost in detection speed. Previous work on detecting polyps in colonoscopy \cite{Chen2018} provided an efficient end-to-end solution to alleviate doctors' examination overhead. However, our later experiments found that this framework is less robust when the conditions of polyp capture vary. In this work, we conducted several studies on the datasets, identifying the main issues that cause a low precision rate in the polyp detection task. We used an optimized anchor generation method to obtain better anchor box shapes, and more boxes are used for detection, as we believe this is necessary for small object detection. An alternative backbone is used to compensate for the heavy time cost introduced by dense anchor box regression. With the use of the attention gate module, our model achieves state-of-the-art polyp detection performance while still maintaining real-time detection speed.

翻訳日:2022-02-18 15:31:01 公開日:2022-02-17

# テンポラルシーンセグメンテーションのためのシフトメモリネットワーク

Shift-Memory Network for Temporal Scene Segmentation ( http://arxiv.org/abs/2202.08399v1 )

ライセンス: Link先を確認

Guo Cheng, Jiang Yu Zheng

(参考訳) 意味セグメンテーションは空間レイアウトの理解において非常に正確である。動的シーンに基づくリアルタイムタスクでは,時間領域における意味セグメンテーションを拡張し,動きによる空間的精度を向上させる。ストリーミング入力上のシフトモードネットワークを用いて、ゼロレイテンシ出力を保証する。シフトネットワーク下でのデータの重なりについて,ネットワーク層間の一定周期における反復計算を同定する。この冗長性を避けるために、シフトメモリネットワーク(smn)を符号化復号ベースラインから導出し、精度を損なうことなくネットワーク値を再利用する。 SMNはパッチモードで訓練され、SMNのネットワークパラメータを抽出し、高速なメモリで推論を行う。 1dスキャン入力と2dビデオから動的シーンを分割する。 SMNの実験はシフトモードとして同等の精度を達成するが、高速な推論速度とメモリの縮小を実現している。これにより、エッジデバイス上のリアルタイムアプリケーションにおけるセマンティックセグメンテーションが容易になる。

Semantic segmentation has achieved great accuracy in understanding spatial layout. For real-time tasks based on dynamic scenes, we extend semantic segmentation to the temporal domain to enhance spatial accuracy with motion. We utilize a shift-mode network over streaming input to ensure zero-latency output. For the data overlap under the shifting network, this paper identifies repeated computation in fixed periods across network layers. To avoid this redundancy, we derive a Shift-Memory Network (SMN) from an encoding-decoding baseline to reuse the network values without accuracy loss. The network is trained in patch mode, and its parameters are then extracted for the SMN to perform inference promptly in compact memory. We segment dynamic scenes from 1D scanning input and 2D video. In experiments, the SMN achieves accuracy equivalent to shift mode with faster inference and much smaller memory. This will facilitate semantic segmentation in real-time applications on edge devices.
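The redundancy that a shift-memory design removes can be shown with a toy stream. In the sketch below, `feature` is a hypothetical stand-in for an expensive per-frame layer computation and the "network output" is just a sum over a sliding window; the real SMN operates on convolutional layers, so this only illustrates the caching principle.

```python
from collections import deque
from typing import Deque, Iterable, List

def feature(x: float) -> float:
    """Stand-in for an expensive per-frame computation (hypothetical)."""
    return x * x + 1.0

def shift_mode(stream: List[float], window: int) -> List[float]:
    """Naive shift mode: recompute every feature in the window at each step."""
    outputs = []
    for t in range(window - 1, len(stream)):
        outputs.append(sum(feature(x) for x in stream[t - window + 1 : t + 1]))
    return outputs

def shift_memory(stream: Iterable[float], window: int) -> List[float]:
    """Shift memory: keep earlier features in a fixed-size buffer and
    compute only the feature of the newly arrived frame."""
    memory: Deque[float] = deque(maxlen=window)
    outputs = []
    for x in stream:
        memory.append(feature(x))  # oldest entry is dropped automatically
        if len(memory) == window:
            outputs.append(sum(memory))
    return outputs

stream = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
naive = shift_mode(stream, window=3)
cached = shift_memory(stream, window=3)
```

Both functions return the same outputs, but the naive version evaluates `feature` once per frame per window position, while the cached version evaluates it once per frame, with memory bounded by the window size.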

翻訳日:2022-02-18 15:20:25 公開日:2022-02-17

# PENCIL: ノイズラベルによるディープラーニング

PENCIL: Deep Learning with Noisy Labels ( http://arxiv.org/abs/2202.08436v1 )

ライセンス: Link先を確認

Kun Yi, Guo-Hua Wang, Jianxin Wu

(参考訳) ディープラーニングは様々なコンピュータビジョンタスクにおいて優れたパフォーマンスを実現しているが、クリーンなラベルで多くのトレーニング例を必要とする。ノイズの多いラベルでデータセットを収集するのは簡単だが、そのようなノイズによりネットワークは過度に適合し、精度は劇的に低下する。この問題に対処するために,ネットワークパラメータとラベル推定をラベル分布として更新するPENCILというエンドツーエンドフレームワークを提案する。 PENCILはバックボーンネットワーク構造とは独立しており、補助的なクリーンデータセットやノイズに関する事前情報を必要としないため、既存の手法よりも汎用的で堅牢であり、適用が容易である。 PENCILは、パフォーマンス向上のために繰り返し使用することもできる。 PENCILは、ノイズタイプやノイズ率の異なる合成および実世界のデータセットにおいて、従来の最先端の手法よりも大きなマージンで優れている。また,PENCILはバックボーンネットワークに単純なアテンション構造を加えることで,マルチラベル分類タスクにも有効である。実験によると、PENCILはクリーンなデータセットにも堅牢である。

Deep learning has achieved excellent performance in various computer vision tasks, but requires a large number of training examples with clean labels. It is easy to collect a dataset with noisy labels, but such noise makes networks overfit seriously and causes accuracy to drop dramatically. To address this problem, we propose an end-to-end framework called PENCIL, which can update both network parameters and label estimations as label distributions. PENCIL is independent of the backbone network structure and does not need an auxiliary clean dataset or prior information about noise; thus, it is more general and robust than existing methods and is easy to apply. PENCIL can even be used repeatedly to obtain better performance. PENCIL outperforms previous state-of-the-art methods by large margins on both synthetic and real-world datasets with different noise types and noise rates. PENCIL is also effective in multi-label classification tasks when a simple attention structure is added to the backbone network. Experiments show that PENCIL is robust on clean datasets, too.

翻訳日:2022-02-18 15:20:11 公開日:2022-02-17

# V2X-Sim:自律運転のための仮想協調知覚データセット

V2X-Sim: A Virtual Collaborative Perception Dataset for Autonomous Driving ( http://arxiv.org/abs/2202.08449v1 )

ライセンス: Link先を確認

Yiming Li, Ziyan An, Zixun Wang, Yiqi Zhong, Siheng Chen, Chen Feng

(参考訳) V2X(V2X)は、車両と周囲のあらゆる物体との協調を表すもので、自動運転システムの認識を根本的に改善することができる。個人の知覚が急速に進歩するにつれて、公共のV2Xデータセットが不足しているため、協調的な知覚はほとんど進歩していない。本稿では,自動運転における初の大規模共同認識データセットであるv2x-simデータセットを提案する。 v2x-simは 1)道路側インフラと交差点における複数車両の協調的認識を実現するための同期記録 2)マルチモダリティ知覚を容易にするマルチモダリティセンサストリーム 3) 検出,追跡,セグメンテーションなど,さまざまな下流タスクをサポートするための,多種多様な注釈付き地上真実。我々はマルチエージェントマルチモダリティマルチタスク知覚の研究を刺激し、仮想データセットは現実的なデータセットが広く利用可能になる前に協調的な知覚の開発を促進することを約束している。

Vehicle-to-everything (V2X), which denotes the collaboration between a vehicle and any entity in its surroundings, can fundamentally improve perception in self-driving systems. While individual perception is advancing rapidly, collaborative perception has made little progress due to the shortage of public V2X datasets. In this work, we present the V2X-Sim dataset, the first public large-scale collaborative perception dataset in autonomous driving. V2X-Sim provides: 1) well-synchronized recordings from roadside infrastructure and multiple vehicles at the intersection to enable collaborative perception, 2) multi-modality sensor streams to facilitate multi-modality perception, 3) diverse well-annotated ground truth to support various downstream tasks including detection, tracking, and segmentation. We seek to inspire research on multi-agent, multi-modality, multi-task perception, and our virtual dataset promises to promote the development of collaborative perception before realistic datasets become widely available.

翻訳日:2022-02-18 15:19:54 公開日:2022-02-17

# ハードウェア保証のためのコンピュータビジョンを用いたPCB成分検出

PCB Component Detection using Computer Vision for Hardware Assurance ( http://arxiv.org/abs/2202.08452v1 )

ライセンス: Link先を確認

Wenwei Zhao, Suprith Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan, Navid Asadizanjani

(参考訳) 光領域におけるプリント回路基板(PCB)の保証は重要な研究分野である。画像処理やコンピュータビジョン(CV)、機械学習(ML)といった既存のPCB保証手法は数多く存在するが、PCB分野は複雑で進化が進んでいるため、新たな技術が求められている。既存のMLベースの手法は従来のCV法よりも優れているが、多くのデータを必要とし、説明可能性も低く、新しい技術が出現しても適応が難しい。これらの課題を克服するために、CVメソッドはMLメソッドとのタンデムで使用できる。特に、色、形状、テクスチャの特徴を抽出するような人間の解釈可能なCVアルゴリズムは、PCB保証説明可能性を高める。これにより、事前知識を組み込むことで、トレーニング可能なMLパラメータの数を効果的に減らし、MLモデルのトレーニングや再トレーニングにおいて高い精度を達成するために必要なデータの量を実現できる。そこで本研究では, セマンティックデータを用いたPCBコンポーネント検出作業において, コンピュータビジョンに基づく様々な特徴の利点と限界について検討する。本研究は,PCB成分検出において,色特徴が有望な性能を示すことを示した。本研究の目的は,ハードウェア保証,コンピュータビジョン,機械学習コミュニティ間のコラボレーションを促進することである。

Printed Circuit Board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and continually evolving, so new techniques are required to overcome emerging problems. Existing ML-based methods outperform traditional CV methods; however, they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms, such as those that extract color, shape, and texture features, increase PCB assurance explainability. This allows for the incorporation of prior knowledge, which effectively reduces the number of trainable ML parameters and thus the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection using semantic data. Results of this study indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities.
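As an illustration of a human-interpretable color feature of the kind the abstract mentions, the following sketch computes a coarse joint RGB histogram for a list of pixels. The abstract does not say which color features the study uses, so this particular feature, the bin count, and the sample pixels are assumptions chosen for simplicity.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

def color_histogram(pixels: List[Pixel], bins_per_channel: int = 2) -> List[float]:
    """Normalized joint RGB histogram with `bins_per_channel` bins per channel."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    width = 256 // bins_per_channel
    last = bins_per_channel - 1
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so that bin counts that do not divide 256 stay in range.
        rb, gb, bb = (min(c // width, last) for c in (r, g, b))
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    return [count / len(pixels) for count in hist]

# Hypothetical image patch: two reddish pixels, one dark pixel, one greenish pixel.
patch = [(200, 30, 30), (210, 20, 40), (10, 10, 10), (20, 200, 30)]
hist = color_histogram(patch)
```

Each bin has a direct reading (for example, index 4 here collects pixels with high red and low green and blue), which is what makes such a feature easy to inspect compared with learned embeddings.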

翻訳日:2022-02-18 15:19:36 公開日:2022-02-17

# TraSeTR:ロボット手術におけるインスタンスレベルの機器分割のためのコントラストクエリ付きトラック・ツー・セグメンテーション・トランス

TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery ( http://arxiv.org/abs/2202.08453v1 )

ライセンス: Link先を確認

Zixu Zhao, Yueming Jin, Pheng-Ann Heng

(参考訳) 手術器具のセグメンテーション - 一般にピクセル分類タスク - は、ロボット支援手術(ras)における認知知能の促進に不可欠である。しかし,従来手法では楽器の種類や事例の識別に苦慮していた。上記の問題に対処するため,我々は,セグメント毎に予測を行うマスク分類パラダイムを検討する。そこで本研究では,手術器具のセグメンテーションを支援するために,追跡手がかりを巧みに活用する新しいトラックツーセグメンテーショントランスフォーマであるtrasetrを提案する。 TraSeTRは、クエリの埋め込みをデコードすることで、インスツルメンタタイプ、ロケーション、アイデンティティとインスタンスレベルの予測、すなわちクラス-ボックス-マスクペアのセットを併用する。具体的には、過去の時間的知識をエンコードした先行クエリを導入し、アイデンティティマッチングを通じて現在のインスタンスに追跡信号を転送する。対照的なクエリ学習戦略は、クエリ特徴空間を再形成するためにさらに適用され、大きな時間的変動に起因する追跡困難を大幅に軽減する。本手法の有効性は,EndoVis ChallengesのRASベンチマークと1つの白内障手術データセットCaDISを含む3つの公開データセットに対して,最先端の計器型セグメンテーション結果を用いて実証した。

Surgical instrument segmentation -- in general a pixel classification task -- is fundamentally crucial for promoting cognitive intelligence in robot-assisted surgery (RAS). However, previous methods struggle to discriminate instrument types and instances. To address these issues, we explore a mask classification paradigm that produces per-segment predictions. We propose TraSeTR, a novel Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation. TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions, i.e., a set of class-bbox-mask pairs, by decoding query embeddings. Specifically, we introduce a prior query, encoded with previous temporal knowledge, to transfer tracking signals to current instances via identity matching. A contrastive query learning strategy is further applied to reshape the query feature space, which greatly alleviates the tracking difficulty caused by large temporal variations. The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets, including two RAS benchmarks from EndoVis Challenges and one cataract surgery dataset, CaDIS.

翻訳日:2022-02-18 15:19:16 公開日:2022-02-17

# Mirror-Yolo: 注意に基づくミラーのインスタンスセグメンテーションと検出モデル

Mirror-Yolo: An attention-based instance segmentation and detection model for mirrors ( http://arxiv.org/abs/2202.08498v1 )

ライセンス: Link先を確認

Fengze Li, Jieming Ma, Zhongbei Tian, Ji Ge, Hai-Ning Liang, Yungang Zhang and Tianxi Wen

(参考訳) 鏡はコンピュータビジョンモデルの性能を劣化させるが、画像中の鏡を正確に検出することは依然として困難である。 yolov4は物体検出精度と速度の両方で驚くべき結果を達成するが、ミラーの検出には失敗することが多い。本稿では,ミラー検出を主目的とした新しいミラー検出手法"Mirror-YOLO"を提案する。 YOLOv4に基づく提案モデルでは,より優れた特徴獲得のための注意機構と,特徴マップ融合のためのハイパーカラム・ステップアプローチが組み込まれている。 Mirror-YOLO は実例分割のための正確な有界多角形も生成できる。提案モデルの有効性を実験により実証し,既存のミラー検出法と比較して,ミラー画像データセットにおけるミラーyoloの検出精度が向上することを示した。

Mirrors can degrade the performance of computer vision models; however, accurately detecting mirrors in images remains challenging. YOLOv4 achieves phenomenal results in both object detection accuracy and speed; nevertheless, the model often fails to detect mirrors. In this paper, a novel method, `Mirror-YOLO', is proposed, which mainly targets mirror detection. Based on YOLOv4, the proposed model embeds an attention mechanism for better feature acquisition and a hypercolumn-stairstep approach for feature map fusion. Mirror-YOLO can also produce accurate bounding polygons for instance segmentation. The effectiveness of the proposed model is demonstrated by our experiments: compared to existing mirror detection methods, Mirror-YOLO achieves better detection accuracy on the mirror image dataset.

翻訳日:2022-02-18 15:18:52 公開日:2022-02-17

# CLS:セミスーパービジョンラーニングのためのクロスラベル・スーパービジョン

CLS: Cross Labeling Supervision for Semi-Supervised Learning ( http://arxiv.org/abs/2202.08502v1 )

ライセンス: Link先を確認

Yao Yao, Junyi Shen, Jin Xu, Bin Zhong, Li Xiao

(参考訳) ディープニューラルネットワークの成功は、大規模ラベル付きデータセットによるところが大きいことが知られている。しかし、ほとんどの実用的なアプリケーションで十分な高品質なラベル付きデータを集めるのに非常に時間がかかり、手間がかかる。半教師付き学習(SSL)はラベル付きデータとラベルなしデータの両方を同時に活用することによりラベル付けコストを削減する効果的なソリューションを提供する。本稿では,典型的な擬似ラベル処理を一般化するフレームワークであるCross Labeling Supervision (CLS)を紹介する。弱いサンプルから擬似ラベルを生成し、同じ入力サンプルの強い強化について予測を教えるfixmatchに基づいて、clsは疑似ラベルと補完ラベルの両方を作成し、正の学習と負の学習の両方をサポートすることができる。自己ラベルの確認バイアスを緩和し、偽ラベルに対する耐性を高めるために、同じ構造を持つ2つの異なる初期化ネットワークを同時に訓練する。各ネットワークは、他のネットワークからの高信頼ラベルを追加の監視信号として利用する。ラベル生成段階では、その予測信頼度に応じて適応的なサンプル重みを人工ラベルに割り当てる。サンプルウェイトは、生成されたラベルの品質を定量化し、ネットワークトレーニングにおける不正確なラベルの破壊を低減する。半教師付き分類タスクの実験結果から,CIFAR-10データセットとCIFAR-100データセットにおいて,我々のフレームワークが既存のアプローチよりも優れていることが示された。

It is well known that the success of deep neural networks is greatly attributed to large-scale labeled datasets. However, it can be extremely time-consuming and laborious to collect sufficient high-quality labeled data in most practical applications. Semi-supervised learning (SSL) provides an effective solution to reduce the cost of labeling by simultaneously leveraging both labeled and unlabeled data. In this work, we present Cross Labeling Supervision (CLS), a framework that generalizes the typical pseudo-labeling process. Based on FixMatch, where a pseudo label is generated from a weakly-augmented sample to teach the prediction on a strong augmentation of the same input sample, CLS allows the creation of both pseudo and complementary labels to support both positive and negative learning. To mitigate the confirmation bias of self-labeling and boost the tolerance to false labels, two differently initialized networks with the same structure are trained simultaneously. Each network utilizes high-confidence labels from the other network as additional supervision signals. During the label generation phase, adaptive sample weights are assigned to artificial labels according to their prediction confidence. The sample weight plays two roles: quantifying the quality of the generated labels and reducing the disruption that inaccurate labels cause to network training. Experimental results on the semi-supervised classification task show that our framework outperforms existing approaches by large margins on the CIFAR-10 and CIFAR-100 datasets.
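The label-generation step can be sketched as a simple thresholding rule: a confident top class becomes a pseudo label (positive learning), classes with very low probability become complementary labels (negative learning), and the confidence doubles as a sample weight. The thresholds, the function name, and the exact weighting rule below are assumptions for illustration; the abstract does not specify them.

```python
from typing import Dict, List

def make_labels(
    probs: List[float],
    pos_threshold: float = 0.9,
    neg_threshold: float = 0.05,
) -> Dict[str, object]:
    """Derive a pseudo label, complementary labels, and a sample weight
    from one network's class probabilities for a weakly augmented sample."""
    confidence = max(probs)
    top_class = probs.index(confidence)
    return {
        # Positive learning: "this sample is class k" (only if confident enough).
        "pseudo_label": top_class if confidence >= pos_threshold else None,
        # Negative learning: "this sample is NOT any of these classes".
        "complementary_labels": [k for k, p in enumerate(probs) if p <= neg_threshold],
        # Adaptive weight: more confident labels count more in the loss.
        "weight": confidence,
    }

# Cross supervision: labels derived from network A supervise network B.
probs_from_a_confident = [0.92, 0.05, 0.03]
probs_from_a_uncertain = [0.50, 0.48, 0.02]
labels_for_b_1 = make_labels(probs_from_a_confident)
labels_for_b_2 = make_labels(probs_from_a_uncertain)
```

Note that the uncertain sample still contributes a complementary label even though it yields no pseudo label, which is how negative learning extracts signal from predictions that positive learning would discard.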

翻訳日:2022-02-18 15:18:38 公開日:2022-02-17

# TAFNet: RGB-T クラウドカウントのための3ストリーム適応型フュージョンネットワーク

TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting ( http://arxiv.org/abs/2202.08517v1 )

ライセンス: Link先を確認

Haihan Tang, Yi Wang, Lap-Pui Chau

(参考訳) 本稿では,クラウドカウントにrgbと熱画像の組み合わせを用いた3ストリーム適応型核融合ネットワークtafnetを提案する。具体的には、TAFNetは1つのメインストリームと2つの補助ストリームに分けられる。メインストリームの入力を構成するために,RGBと熱画像のペアを組み合わせる。 2つの補助ストリームはそれぞれrgbイメージとサーマルイメージを利用してモダリティ特有の特徴を抽出する。さらに、モーダリティ固有の特徴を主ストリームに適応的に融合させる情報改善モジュール(IIM)を提案する。 RGBT-CCデータセットを用いた実験結果から,本手法は平均誤差および根平均二乗誤差に対して,最先端手法と比較して20%以上改善されていることがわかった。ソースコードはhttps://github.com/TANGHAIHAN/TAFNetで公開されている。

In this paper, we propose a three-stream adaptive fusion network named TAFNet, which uses paired RGB and thermal images for crowd counting. Specifically, TAFNet is divided into one main stream and two auxiliary streams. We combine a pair of RGB and thermal images to constitute the input of the main stream. The two auxiliary streams exploit the RGB image and the thermal image, respectively, to extract modality-specific features. In addition, we propose an Information Improvement Module (IIM) to fuse the modality-specific features into the main stream adaptively. Experimental results on the RGBT-CC dataset show that our method achieves more than 20% improvement on mean average error and root mean squared error compared with the state-of-the-art method. The source code will be publicly available at https://github.com/TANGHAIHAN/TAFNet.
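For reference, the two error measures named in the abstract can be computed as below, assuming that "mean average error" refers to the mean absolute error commonly reported for crowd counting. The predicted and true counts are invented and unrelated to the RGBT-CC results.

```python
import math
from typing import Sequence

def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def root_mean_squared_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Square root of the average squared count error; penalizes large misses more."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Hypothetical per-image crowd counts.
predicted_counts = [10.0, 20.0, 33.0]
true_counts = [12.0, 20.0, 30.0]
mae = mean_absolute_error(predicted_counts, true_counts)
rmse = root_mean_squared_error(predicted_counts, true_counts)
```

Reporting both is useful because the RMSE is never smaller than the MAE, and a large gap between them indicates that a few images carry most of the error.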

翻訳日:2022-02-18 15:18:14 公開日:2022-02-17

# コンテンツとスタイル分離による水中画像強調のためのドメイン適応

Domain Adaptation for Underwater Image Enhancement via Content and Style Separation ( http://arxiv.org/abs/2202.08537v1 )

ライセンス: Link先を確認

Yu-Wei Chen, Soo-Chang Pei

(参考訳) 水中画像は、光吸収、屈折、散乱によるカラーキャスト、低コントラスト、ハジー効果に悩まされ、物体検出や物体追跡などの高レベルな用途が劣化した。近年の学習に基づく手法は水中画像強調における驚くべき性能を示しているが、これらの作品の多くは合成ペアデータを用いて教師あり学習を行い、実世界データに対するドメインギャップを無視している。本稿では,水中画像強調のためのコンテンツとスタイル分離によるドメイン適応フレームワークを提案し,画像がコンテンツとスタイル潜伏とに絡み合うことができることを仮定し,潜伏空間における関連するスタイルのサブドメインにイメージをクラスタリングし,水中潜伏とクリーンな画像間のマッピングを構築することを目的とする。合成と実世界のデータの遅延を最小限に抑えることを目的とした,水中画像強調のための領域適応の先行研究とは違って,異なるサブドメインからスタイル潜在度を区別することを目的とする。実世界のペアデータの欠如を解決するため,実画像から画像への変換に合成を活用し,教師付き学習のための擬似実水中画像ペアを得る。本モデルでは,潜時操作により異なる拡張レベルを調整できるユーザインタラクションインタフェースを提供する。実世界の様々な水中ベンチマーク実験により,提案フレームワークは水中画像強調のための領域適応を行い,様々な水中画像強調アルゴリズムの量と品質に優れることを示した。モデルとソースコードはhttps://github.com/fordevoted/UIESSで入手できる。

Underwater images suffer from color cast, low contrast, and haze due to light absorption, refraction, and scattering, which degrades high-level applications such as object detection and object tracking. Recent learning-based methods demonstrate astonishing performance on underwater image enhancement; however, most of these works use synthetic paired data for supervised learning and ignore the domain gap to real-world data. In this paper, we propose a domain adaptation framework for underwater image enhancement via content and style separation. We assume that an image can be disentangled into content and style latents and that images can be clustered into the sub-domain of the associated style in latent space; the goal is to build a mapping between the underwater style latent and the clean one. Unlike prior work on domain adaptation for underwater image enhancement, which aims to minimize the latent discrepancy between synthetic and real-world data, we aim to distinguish style latents from different sub-domains. To address the lack of paired real-world data, we leverage synthetic-to-real image-to-image translation to obtain pseudo-real underwater image pairs for supervised learning, and enhancement is achieved by feeding the content latent and the clean style latent into the generator. Our model provides a user interaction interface to adjust the enhancement level by latent manipulation. Experiments on various public real-world underwater benchmarks demonstrate that the proposed framework is capable of performing domain adaptation for underwater image enhancement and outperforms various state-of-the-art underwater image enhancement algorithms both quantitatively and qualitatively. The model and source code are available at https://github.com/fordevoted/UIESS

翻訳日:2022-02-18 15:18:02 公開日:2022-02-17

# 奥行き事前情報を用いた3D認識屋内シーン合成

3D-Aware Indoor Scene Synthesis with Depth Priors ( http://arxiv.org/abs/2202.08553v1 )

ライセンス: Link先を確認

Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-Yan Yeung, Qifeng Chen

(参考訳) 2次元データから3D認識画像合成を学習するGAN(Generative Adversarial Networks)は近年進歩しているが、部屋のレイアウトや室内の物体が非常に多様であるため、既存の手法は屋内シーンのモデル化に失敗している。屋内シーンには共有された内在的構造がないため、2次元画像だけでは3次元形状についてモデルを十分に導くことができないと我々は主張する。本研究では、深度を3次元の事前情報として導入することで、このギャップを埋める。他の3Dデータ形式と比べて、深度は畳み込みベースの生成機構によく適合し、実用上も入手しやすい。具体的には、デュアルパス生成器を提案する。一方の経路が深度生成を担い、その中間特徴が外観レンダリングの条件としてもう一方の経路に注入される。このような設計により、明示的な幾何情報を用いて3D認識合成が容易になる。同時に、本物と偽物のドメインを区別し、かつ与えられた入力から深度を予測する切り替え可能な判別器を導入する。これにより、判別器は空間配置を考慮に入れ、生成器が適切な深度条件を学習するよう導くことができる。広範な実験結果から、本手法は屋内シーンを非常に優れた品質と3D整合性で合成でき、最先端の代替手法を大きく上回ることが示唆された。

Despite the recent advancement of Generative Adversarial Networks (GANs) in learning 3D-aware image synthesis from 2D data, existing methods fail to model indoor scenes due to the large diversity of room layouts and the objects inside. We argue that indoor scenes do not have a shared intrinsic structure, and hence using only 2D images cannot adequately guide the model with the 3D geometry. In this work, we fill this gap by introducing depth as a 3D prior. Compared with other 3D data formats, depth better fits the convolution-based generation mechanism and is more easily accessible in practice. Specifically, we propose a dual-path generator, where one path is responsible for depth generation, whose intermediate features are injected into the other path as the condition for appearance rendering. Such a design eases the 3D-aware synthesis with explicit geometry information. Meanwhile, we introduce a switchable discriminator both to differentiate real vs. fake domains and to predict the depth from a given input. In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition. Extensive experimental results suggest that our approach is capable of synthesizing indoor scenes with impressively good quality and 3D consistency, significantly outperforming state-of-the-art alternatives.

翻訳日:2022-02-18 15:17:10 公開日:2022-02-17

# フィードバックネットワークを用いた構造化特徴マップ上のポイントクラウド補完

Point cloud completion on structured feature map with feedback network ( http://arxiv.org/abs/2202.08583v1 )

ライセンス: Link先を確認

Zejia Su, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

(参考訳) 本稿では、特徴学習の観点から、ポイントクラウド補完という難しい問題に取り組む。我々の重要な観察は、部分的な入力から基礎となる構造と表面の詳細を復元するには、大域構造と局所的な幾何学的詳細の両方を捉えられる優れた特徴表現が基本的な構成要素になるということである。この目的のために、まず、局所領域から複数の潜在パターンを学習することにより、点ごとの特徴を2次元の構造化特徴マップに適応的に集約する特徴構造化モジュールFSNetを提案する。次に、FSNetをポイントクラウド補完のための粗から密への(coarse-to-fine)パイプラインに統合する。具体的には、2次元畳み込みニューラルネットワークを用いて、FSNetの特徴マップを粗いが完全なポイントクラウドにデコードする。次に、ポイントクラウドアップサンプリングネットワークを用いて、部分入力と粗い中間出力から高密度なポイントクラウドを生成する。局所構造を効率的に活用し、点分布の均一性を高めるために、生成した高密度ポイントクラウドの詳細を段階的に洗練できる自己補正機構を備えた点アップサンプリングモジュールIFNetを提案する。ShapeNet、MVP、KITTIデータセットで定性的および定量的な実験を行い、本手法が最先端のポイントクラウド補完手法を上回ることを示す。

In this paper, we tackle the challenging problem of point cloud completion from the perspective of feature learning. Our key observation is that to recover the underlying structures as well as surface details given a partial input, a fundamental component is a good feature representation that can capture both global structure and local geometric details. Towards this end, we first propose FSNet, a feature structuring module that can adaptively aggregate point-wise features into a 2D structured feature map by learning multiple latent patterns from local regions. We then integrate FSNet into a coarse-to-fine pipeline for point cloud completion. Specifically, a 2D convolutional neural network is adopted to decode feature maps from FSNet into a coarse and complete point cloud. Next, a point cloud upsampling network is used to generate a dense point cloud from the partial input and the coarse intermediate output. To efficiently exploit the local structures and enhance the point distribution uniformity, we propose IFNet, a point upsampling module with a self-correction mechanism that can progressively refine details of the generated dense point cloud. We conduct both qualitative and quantitative experiments on the ShapeNet, MVP, and KITTI datasets, which demonstrate that our method outperforms state-of-the-art point cloud completion approaches.

翻訳日:2022-02-18 15:16:48 公開日:2022-02-17

# 解釈可能なピラミッドネットワークによる単一UHD画像デヘイジング

Single UHD Image Dehazing via Interpretable Pyramid Network ( http://arxiv.org/abs/2202.08589v1 )

ライセンス: Link先を確認

Boxue Xiao, Zhuoran Zheng, Xiang Chen, Chen Lv, Yunliang Zhuang, Tao Wang

(参考訳) 現在、ほとんどの単一画像デヘイジングモデルは、単一のGPUシェーダで超高解像度(UHD)画像をリアルタイムに処理できない。この問題に対処するため、テイラーの定理の無限近似の原理をラプラシアンピラミッドのパターンと組み合わせて導入し、4Kのかすみ画像をリアルタイムで処理できるモデルを構築する。ピラミッドネットワークのN個の分岐ネットワークは、テイラーの定理におけるN個の制約項に対応する。低次多項式は、画像の低周波情報(色、照明など)を再構成する。高次多項式は、画像の高周波情報(テクスチャなど)を回帰する。さらに、ピラミッドモデルの各分岐ネットワークに作用する、Tucker再構成に基づく正則化項を提案する。これは特徴空間における異常信号の生成をさらに抑制する。広範な実験結果から、本手法はかすみのある4K画像を単一のGPU上でリアルタイム(80FPS)に処理できるだけでなく、他に類を見ない解釈可能性を持つことが示された。開発した手法は、2つのベンチマーク(O/I-HAZE)と我々が更新した4KIDデータセットで最先端(SOTA)の性能を達成し、今後の最適化手法のための信頼できる基盤を提供する。

Currently, most single-image dehazing models cannot process an ultra-high-resolution (UHD) image in real time with a single GPU shader. To address this problem, we introduce the principle of infinite approximation of Taylor's theorem with the Laplacian pyramid pattern to build a model capable of handling 4K hazy images in real time. The N branch networks of the pyramid network correspond to the N constraint terms in Taylor's theorem. Low-order polynomials reconstruct the low-frequency information of the image (e.g., color, illumination). High-order polynomials regress the high-frequency information of the image (e.g., texture). In addition, we propose a Tucker reconstruction-based regularization term that acts on each branch network of the pyramid model. It further constrains the generation of anomalous signals in the feature space. Extensive experimental results demonstrate that our approach can not only process hazy 4K images in real time on a single GPU (80 FPS) but also has unparalleled interpretability. The developed method achieves state-of-the-art (SOTA) performance on two benchmarks (O/I-HAZE) and our updated 4KID dataset while providing a reliable groundwork for subsequent optimization schemes.
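
The low-frequency / high-frequency split that the abstract attributes to the pyramid branches can be illustrated with a plain Laplacian pyramid. The sketch below is not the authors' network: it works on a 1-D signal, and the function names, the pair-averaging downsampler, and the sample-repeating upsampler are all assumptions chosen for brevity. It only shows that a coarse base plus per-level residuals reconstructs the input exactly.

```python
def downsample(signal):
    """Halve the length by averaging neighbouring pairs (assumed low-pass step)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating every sample."""
    return [value for value in signal for _ in range(2)]


def build_laplacian_pyramid(signal, levels):
    """Return (residuals, coarse): high-frequency bands plus the low-frequency base.

    The signal length must be divisible by 2 ** levels.
    """
    residuals = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        # The residual keeps what the coarser level cannot represent.
        residuals.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarse):
    """Invert the decomposition: upsample, then add back each residual band."""
    current = list(coarse)
    for band in reversed(residuals):
        current = [p + r for p, r in zip(upsample(current), band)]
    return current


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
residuals, coarse = build_laplacian_pyramid(signal, levels=2)
restored = reconstruct(residuals, coarse)
```

In this toy setting `coarse` plays the role of the low-frequency content (color, illumination) and `residuals` the role of the high-frequency content (texture); how the paper maps these to Taylor terms is not reproduced here.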

翻訳日:2022-02-18 15:16:29 公開日:2022-02-17

# 少数ショット学習のための意味的比例パッチミックス

Semantically Proportional Patchmix for Few-Shot Learning ( http://arxiv.org/abs/2202.08647v1 )

ライセンス: Link先を確認

Jingquan Wang, Jing Xu, Yu Pan, Zenglin Xu

(参考訳) 少数ショット学習は、限られた数のラベル付きデータだけで未知のクラスを分類することを目的とする。近年の研究では、単純な転移学習戦略で訓練したモデルが、少数ショット分類において競争力のある結果を達成できることが示されている。これらのモデルは訓練データの識別には優れているが、おそらく評価時の特徴表現が不十分であるため、未知のデータにはうまく汎化しない。この問題に対処するため、訓練画像間でパッチを切り貼りし、パッチの意味情報に比例して正解ラベルを混合する Semantically Proportional Patchmix(SePPMix)を提案する。これにより、深刻なラベルノイズを生じさせることなく、領域ドロップアウト効果によってモデルの汎化能力を向上できる。より頑健なデータ表現を学習するため、さらに混合画像に回転変換を施し、ルールベースの正則化として回転を予測する。広く使われている少数ショットベンチマークでの広範な実験により、提案手法の有効性が示された。

Few-shot learning aims to classify unseen classes with only a limited number of labeled data. Recent works have demonstrated that training models with a simple transfer learning strategy can achieve competitive results in few-shot classification. Although they excel at distinguishing training data, these models do not generalize well to unseen data, probably due to insufficient feature representations at evaluation time. To tackle this issue, we propose Semantically Proportional Patchmix (SePPMix), in which patches are cut and pasted among training images and the ground truth labels are mixed proportionally to the semantic information of the patches. In this way, we can improve the generalization ability of the model through a regional dropout effect without introducing severe label noise. To learn more robust representations of data, we further apply rotation transformations to the mixed images and predict rotations as a rule-based regularizer. Extensive experiments on prevalent few-shot benchmarks have shown the effectiveness of our proposed method.
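
One plausible reading of "labels are mixed proportionally to the semantic information of the patches" is sketched below. The abstract does not say how semantic information is measured, so the per-patch scores here are a hypothetical input (for example, pooled activation-map values), and the function name `semantic_label_weights` is invented for illustration. The point of the sketch is the contrast with area-based mixing: a single pasted patch can carry most of an image's semantic mass.

```python
def semantic_label_weights(scores_a, scores_b, from_b):
    """Mix two labels in proportion to the semantic mass each image contributes.

    scores_a, scores_b: non-negative per-patch semantic scores (hypothetical),
        one entry per grid patch, not all zero.
    from_b: booleans, True where the mixed image takes its patch from image B.
    Returns (weight_a, weight_b), which sum to 1. The retained semantic mass
    must be non-zero for at least one image.
    """
    # Share of image A's semantic mass that survives in the mixed image.
    kept_a = sum(s for s, pasted in zip(scores_a, from_b) if not pasted) / sum(scores_a)
    # Share of image B's semantic mass that was pasted in.
    kept_b = sum(s for s, pasted in zip(scores_b, from_b) if pasted) / sum(scores_b)
    weight_a = kept_a / (kept_a + kept_b)
    return weight_a, 1.0 - weight_a


# Four patches; only the first one is pasted from image B, but it holds
# most of B's semantic mass.
weight_a, weight_b = semantic_label_weights(
    [1, 1, 4, 4], [7, 1, 1, 1], [True, False, False, False]
)
```

With uniform scores the weights reduce to the area proportions (0.75 and 0.25 in this layout); with the scores above, image B receives a noticeably larger label weight than its 25% area share.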

翻訳日:2022-02-18 15:16:10 公開日:2022-02-17

# 画像デブラリング学習のためのリアルなボケ合成

Realistic Blur Synthesis for Learning Image Deblurring ( http://arxiv.org/abs/2202.08771v1 )

ライセンス: Link先を確認

Jaesung Rim, Geonung Kim, Jungeon Kim, Junyong Lee, Seungyong Lee, Sunghyun Cho

(参考訳) 学習に基づくデブラー手法の訓練には、大量のぼけ画像と鮮明な画像のペアが必要である。残念ながら、既存の合成データセットは十分に現実的ではなく、既存の実世界のぼけデータセットは、シーンやカメラ設定の多様性が限られている。その結果、それらで訓練されたデブラーモデルは、実際のぼけ画像を扱う汎化能力の不足に依然として悩まされている。本稿では、実際のぼけ画像と合成ぼけ画像の間に差異を生じさせる様々な要因を分析し、より現実的なぼけを合成できる新しいぼけ合成パイプラインを提案する。また、実際のぼけ画像とそれに対応する鮮明な画像のシーケンスを含む新しいデータセットRSBlurを提案する。RSBlurデータセットは合成ぼけ画像の生成に利用でき、実際のぼけと合成ぼけの違いを詳細に分析できる。ぼけ合成パイプラインとRSBlurデータセットを用いて、ぼけ合成における様々な要因の影響を明らかにする。また、本合成手法により、実際のぼけ画像に対するデブラー性能を向上できることを示す。

Training learning-based deblurring methods demands a significant amount of blurred and sharp image pairs. Unfortunately, existing synthetic datasets are not realistic enough, and existing real-world blur datasets provide limited diversity of scenes and camera settings. As a result, deblurring models trained on them still suffer from the lack of generalization ability for handling real blurred images. In this paper, we analyze various factors that introduce differences between real and synthetic blurred images, and present a novel blur synthesis pipeline that can synthesize more realistic blur. We also present RSBlur, a novel dataset that contains real blurred images and the corresponding sequences of sharp images. The RSBlur dataset can be used for generating synthetic blurred images to enable detailed analysis on the differences between real and synthetic blur. With our blur synthesis pipeline and RSBlur dataset, we reveal the effects of different factors in the blur synthesis. We also show that our synthesis method can improve the deblurring performance on real blurred images.

翻訳日:2022-02-18 15:15:53 公開日:2022-02-17

# 一般化可能な情報理論因果表現

Generalizable Information Theoretic Causal Representation ( http://arxiv.org/abs/2202.08388v1 )

ライセンス: Link先を確認

Mengyue Yang, Xinyu Cai, Furui Liu, Xu Chen, Zhitang Chen, Jianye Hao, Jun Wang

(参考訳) 表現学習が、画像分類やレコメンダシステムなど多くの実世界のシナリオにおいて、複数の下流タスクに対するモデルの性能を向上させられることは明らかである。既存の学習アプローチは、特徴と下流タスク(ラベル)の間の相関(あるいはその代理)を確立することに依存しており、通常、ラベルの原因、結果、および擬似相関のある変数を含む表現が得られる。非因果部分が不安定であるため、その汎化性能は低下する可能性がある。本稿では、仮説的な因果グラフに従い、相互情報量の尺度で学習手順を正則化することにより、観測データから因果表現を学習することを提案する。この最適化には反事実損失が含まれ、これに基づいて、因果性に着想を得た学習がサンプル複雑度を減らし汎化能力を高めるという理論的保証を導く。広範な実験により、本手法で学習した因果表現上で訓練したモデルが、敵対的攻撃と分布シフトの下で頑健であることが示された。

It is evident that representation learning can improve a model's performance over multiple downstream tasks in many real-world scenarios, such as image classification and recommender systems. Existing learning approaches rely on establishing the correlation (or its proxy) between features and the downstream task (labels), which typically results in a representation containing cause, effect, and spuriously correlated variables of the label. Its generalizability may deteriorate because of the instability of the non-causal parts. In this paper, we propose to learn causal representations from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph. The optimization involves a counterfactual loss, based on which we deduce a theoretical guarantee that the causality-inspired learning has reduced sample complexity and better generalization ability. Extensive experiments show that the models trained on causal representations learned by our approach are robust under adversarial attacks and distribution shift.

翻訳日:2022-02-18 15:13:39 公開日:2022-02-17

# SWIM: メモリ内計算ニューラルアクセラレータのための選択的書き込み検証

SWIM: Selective Write-Verify for Computing-in-Memory Neural Accelerators ( http://arxiv.org/abs/2202.08395v1 )

ライセンス: Link先を確認

Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

(参考訳) 不揮発性の新興メモリに基づくコンピューティング・イン・メモリ・アーキテクチャは、高いエネルギー効率により、ディープニューラルネットワーク(DNN)の高速化に大きな可能性を示している。しかし、これらの新興デバイスは、マッピング過程(すなわち、重みをデバイスにプログラムする過程)で大きなばらつきを生じることがあり、対処しないままにすると大幅な精度低下を引き起こしうる。重みマッピングの非理想性は、書き込み検証方式を用いた反復的なプログラミング、すなわちコンダクタンスを読み出して必要に応じて書き直すことで補償できる。既存のすべての研究では、この操作がマッピング対象のDNNのすべての重みに適用されており、膨大なプログラミング時間を要する。本研究では、DNNの精度を維持するには、重みのごく一部だけを書き込み検証の対象に選べば十分であり、それにより大幅な高速化が達成できることを示す。さらに、書き込み検証が必要な重みを効率的に選択するために、順伝播と逆伝播を1回ずつ実行するだけで済む、二階微分に基づく手法SWIMを導入する。異なるデータセットに対する様々なDNNアーキテクチャでの実験結果から、SWIMは従来の全面的な書き込み検証と比べて、同等の精度を保ちながら最大10倍のプログラミング高速化を達成できることが示された。

Computing-in-Memory architectures based on non-volatile emerging memories have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, these emerging devices can suffer from significant variations during the mapping process (i.e., programming weights to the devices), and if left unaddressed, can cause significant accuracy degradation. The non-ideality of weight mapping can be compensated by iterative programming with a write-verify scheme, i.e., reading the conductance and rewriting if necessary. In all existing works, such a practice is applied to every single weight of a DNN as it is being mapped, which requires extensive programming time. In this work, we show that it is only necessary to select a small portion of the weights for write-verify to maintain the DNN accuracy, thus achieving significant speedup. We further introduce a second-derivative-based technique, SWIM, which only requires a single pass of forward and backpropagation, to efficiently select the weights that need write-verify. Experimental results on various DNN architectures for different datasets show that SWIM can achieve up to 10x programming speedup compared with conventional full-blown write-verify while attaining a comparable accuracy.
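
The selection step can be pictured as a top-k choice over per-weight sensitivity scores. The sketch below assumes those scores are already available (the abstract says they come from second derivatives obtained in one forward and backward pass, but gives no formula, so the input list and the name `select_for_write_verify` are hypothetical). It only illustrates spending a fixed write-verify budget on the most sensitive weights.

```python
def select_for_write_verify(sensitivities, fraction):
    """Return the indices of the weights that receive write-verify.

    sensitivities: one non-negative score per weight; assumed here to be a
        second-derivative estimate of the loss with respect to that weight.
    fraction: share of all weights to write-verify, between 0 and 1.
    """
    budget = int(fraction * len(sensitivities))
    # Rank weight indices from most to least sensitive.
    ranked = sorted(
        range(len(sensitivities)), key=lambda i: sensitivities[i], reverse=True
    )
    return sorted(ranked[:budget])


scores = [0.02, 1.5, 0.3, 0.01, 2.2, 0.05, 0.4, 0.0]
selected = select_for_write_verify(scores, fraction=0.25)
```

All remaining weights would be programmed once without verification, which is where the programming-time saving in this simplified picture comes from.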

翻訳日:2022-02-18 15:13:22 公開日:2022-02-17

# 検索拡張強化学習

Retrieval-Augmented Reinforcement Learning ( http://arxiv.org/abs/2202.08417v1 )

ライセンス: Link先を確認

Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Ksenia Konyushkova, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell

(参考訳) ほとんどの深層強化学習(RL)アルゴリズムは、経験をパラメトリックな行動ポリシーや値関数に抽出する。効果的であるが、このアプローチにはいくつかの欠点がある:(1)計算コストが高い、(2)パラメトリックモデルに経験を統合するために多くの更新を必要とする、(3)完全に統合されていない経験はエージェントの振る舞いに適切に影響しない、(4)行動はモデルの能力によって制限される。本稿では,過去の経験のデータセットを最適な行動にマップするために,ネットワークを訓練する代替パラダイムを検討する。具体的には、経験のデータセットに直接アクセス可能な検索プロセス(ニューラルネットワークとしてパラメータ化)でRLエージェントを増強する。このデータセットは、エージェントの過去の経験、専門家によるデモンストレーション、その他の関連するソースから得られる。検索プロセスは、現在の文脈で有用なデータセットから情報を取得するように訓練され、エージェントがその目標を迅速かつ効率的に達成するのに役立つ。オフラインDQNエージェントとオンラインR2D2エージェントの2つの異なるRLエージェントに統合する。オフラインマルチタスク問題では,検索拡張DQNエージェントはタスク干渉を回避し,ベースラインDQNエージェントよりも高速に学習することを示す。 Atariでは,検索強化R2D2がベースラインR2D2エージェントよりもかなり高速に学習し,より高いスコアを得ることを示す。提案手法の成分の寄与度を測定するため,広範なアブレーションを行った。

Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. We run extensive ablations to measure the contributions of the components of our proposed method.

翻訳日:2022-02-18 15:13:02 公開日:2022-02-17

# (参考訳) 事前学習言語モデルを用いた知識集約型NLPの検討

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models ( http://arxiv.org/abs/2202.08772v1 )

ライセンス: CC BY 4.0

Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao

(参考訳) 事前学習言語モデルによるモデル容量の増大に伴い、百科事典的知識や常識的知識の提供と柔軟な活用といった高度な機能を備えた、より知識豊富な自然言語処理(NLP)モデルへの需要が高まっている。しかし、事前学習言語モデルだけでは、こうした知識集約型NLPタスクを扱う能力が不足している。この課題に対処するため、外部知識源で拡張した事前学習言語モデルが数多く提案され、急速に発展している。本稿では、知識源、知識集約型NLPタスク、知識融合手法という3つの重要な要素に分解して、事前学習言語モデルに基づく知識強化モデル(PLMKE)の現状をまとめることを目指す。最後に、この3つの要素に関する議論に基づいてPLMKEの課題を示し、NLPの実践者に今後の研究の方向性を提供することを試みる。

With the increase in model capacity brought by pre-trained language models, there is a growing need for more knowledgeable natural language processing (NLP) models with advanced functionalities, including providing and making flexible use of encyclopedic and commonsense knowledge. Pre-trained language models alone, however, lack the capacity to handle such knowledge-intensive NLP tasks. To address this challenge, a large number of pre-trained language models augmented with external knowledge sources have been proposed and are developing rapidly. In this paper, we aim to summarize the current progress of pre-trained language model-based knowledge-enhanced models (PLMKEs) by dissecting their three vital elements: knowledge sources, knowledge-intensive NLP tasks, and knowledge fusion methods. Finally, we present the challenges of PLMKEs based on the discussion regarding the three elements and attempt to provide NLP practitioners with potential directions for further research.

翻訳日:2022-02-18 15:12:09 公開日:2022-02-17

# (参考訳) ロバスト行列回復のための劣勾配法のグローバル収束:小さな初期化、雑音のある測定、過剰パラメータ化

Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization ( http://arxiv.org/abs/2202.08788v1 )

ライセンス: CC BY 4.0

Jianhao Ma and Salar Fattahi

(参考訳) 本研究では、$\ell_1$損失を用いた低ランク行列回復の自然な非凸かつ非平滑な定式化に対する劣勾配法(SubGM)の性能を調べる。ここでの目標は、限られた数の測定値から低ランク行列を復元することであり、測定値の一部はノイズによって大きく破損している可能性がある。真の解のランクが未知で、代わりに過大に見積もられるシナリオを扱う。ランクの過大な見積もりは、必要以上の自由度を持つ過剰パラメータ化されたモデルをもたらす。このような過剰パラメータ化は、過学習を招いたり、アルゴリズムの性能に悪影響を及ぼしたりする可能性がある。我々は、小さな初期化を用いた単純なSubGMが、過剰パラメータ化と測定ノイズの両方の影響を受けないことを証明する。特に、小さな初期化はSubGMの性能に対する過剰パラメータ化の影響を打ち消し、収束率を指数的に改善することを示す。さらに、外れ値ノイズモデルとガウスノイズモデルの両方の下でSubGMの挙動を解析するための初めての統一的な枠組みを提供し、任意に大きく任意に密なノイズ値の下でも、そして、おそらく驚くべきことに、大域的最適解が真の解に対応しない場合でさえ、SubGMが真の解に収束することを示す。我々の結果の核心にあるのは、Sign-RIPと呼ばれる制限等長性のロバストな変種であり、これは$\ell_1$損失の劣微分と理想的な期待損失の劣微分とのずれを制御する。結果の副産物として、ガウス測定によるロバストな低ランク行列回復のサブクラスを考察し、SubGMの大域的収束を保証するのに必要なサンプル数が過剰パラメータ化されたランクに依存しないことを示す。

In this work, we study the performance of the sub-gradient method (SubGM) on a natural nonconvex and nonsmooth formulation of low-rank matrix recovery with $\ell_1$-loss, where the goal is to recover a low-rank matrix from a limited number of measurements, a subset of which may be grossly corrupted with noise. We study a scenario where the rank of the true solution is unknown and over-estimated instead. The over-estimation of the rank gives rise to an over-parameterized model in which there are more degrees of freedom than needed. Such over-parameterization may lead to overfitting, or adversely affect the performance of the algorithm. We prove that a simple SubGM with small initialization is agnostic to both over-parameterization and noise in the measurements. In particular, we show that small initialization nullifies the effect of over-parameterization on the performance of SubGM, leading to an exponential improvement in its convergence rate. Moreover, we provide the first unifying framework for analyzing the behavior of SubGM under both outlier and Gaussian noise models, showing that SubGM converges to the true solution, even under arbitrarily large and arbitrarily dense noise values, and, perhaps surprisingly, even if the globally optimal solutions do not correspond to the ground truth. At the core of our results is a robust variant of the restricted isometry property, called Sign-RIP, which controls the deviation of the sub-differential of the $\ell_1$-loss from that of an ideal, expected loss. As a byproduct of our results, we consider a subclass of robust low-rank matrix recovery with Gaussian measurements, and show that the number of required samples to guarantee the global convergence of SubGM is independent of the over-parameterized rank.
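
For concreteness, the setting can be written out as follows. This is a sketch under assumptions that the abstract does not state: a symmetric factorization $X = UU^\top$ with $U \in \mathbb{R}^{d \times r'}$ and over-estimated rank $r' \ge r$, measurement matrices $A_i$ with observations $y_i$, a step size $\eta_t$, and an initial point $U_0$ scaled by a small $\alpha > 0$. All of these symbols are introduced here for illustration only.

$$
f(U) = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \langle A_i, U U^\top \rangle \right|,
\qquad
D_t = -\frac{1}{m} \sum_{i=1}^{m} \operatorname{sign}\!\left( y_i - \langle A_i, U_t U_t^\top \rangle \right) \left( A_i + A_i^\top \right) U_t ,
\qquad
U_{t+1} = U_t - \eta_t D_t .
$$

Here $D_t$ is one element of the sub-differential of $f$ at $U_t$ (with $\operatorname{sign}(0)$ taken as any value in $[-1, 1]$), obtained from $\nabla_U \langle A_i, UU^\top \rangle = (A_i + A_i^\top) U$. "Small initialization" then means starting from $U_0 = \alpha B$ for some fixed matrix $B$ and small $\alpha$.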

翻訳日:2022-02-18 14:59:20 公開日:2022-02-17

# 行動の多重性を考慮したコントラスト的メタラーニング

Contrastive Meta Learning with Behavior Multiplicity for Recommendation ( http://arxiv.org/abs/2202.08523v1 )

ライセンス: Link先を確認

Wei Wei and Chao Huang and Lianghao Xia and Yong Xu and Jiashu Zhao and Dawei Yin

(参考訳) 優れたレコメンデーションフレームワークは、ユーザが興味のあるアイテムを見つけるのを助けるだけでなく、様々なオンラインプラットフォーム(eコマースやソーシャルメディアなど)の収益にも貢献する。従来のレコメンデーションモデルは通常、ユーザとアイテムの間に単一の種類のインタラクションしか存在しないと仮定しており、ページ閲覧、お気に入り追加、購入といった複数種類のユーザ行動データから多重的なユーザ・アイテム関係をモデル化できない。最近のいくつかの研究は異なる種類の行動にまたがる依存関係を捉えることを提案しているが、2つの重要な課題はあまり探究されていない。i) 目標行動(購入など)の下での疎な教師信号への対処。ii) カスタマイズされた依存関係モデリングによる、パーソナライズされたマルチ行動パターンの把握。これらの課題に取り組むため、異なるユーザごとに専用の行動種別間依存関係を維持する新しいモデル、Contrastive Meta Learning(CML)を考案する。特に、構築したコントラスト損失を通じて、異なる種類の行動にまたがる転移可能な知識を蒸留するマルチ行動コントラスト学習フレームワークを提案する。さらに、多様なマルチ行動パターンを捉えるために、ユーザごとにカスタマイズされた行動の異質性を符号化するコントラストメタネットワークを設計する。3つの実世界データセットでの広範な実験は、本手法が様々な最先端のレコメンデーション手法を一貫して上回ることを示している。さらに、実証研究は、コントラストメタ学習のパラダイムが、レコメンデーションにおける行動の多重性を捉える上で大きな可能性を持つことを示唆している。モデル実装は https://github.com/weiwei1206/CML.git で公開している。

A well-informed recommendation framework could not only help users identify items of interest, but also benefit the revenue of various online platforms (e.g., e-commerce, social media). Traditional recommendation models usually assume that only a single type of interaction exists between user and item, and fail to model the multiplex user-item relationships from multi-typed user behavior data, such as page view, add-to-favourite and purchase. While some recent studies propose to capture the dependencies across different types of behaviors, two important challenges have been less explored: i) Dealing with the sparse supervision signal under target behaviors (e.g., purchase). ii) Capturing the personalized multi-behavior patterns with customized dependency modeling. To tackle the above challenges, we devise a new model, Contrastive Meta Learning (CML), to maintain dedicated cross-type behavior dependency for different users. In particular, we propose a multi-behavior contrastive learning framework to distill transferable knowledge across different types of behaviors via the constructed contrastive loss. In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users. Extensive experiments on three real-world datasets indicate that our method consistently outperforms various state-of-the-art recommendation methods. Our empirical studies further suggest that the contrastive meta learning paradigm offers great potential for capturing the behavior multiplicity in recommendation. We release our model implementation at: https://github.com/weiwei1206/CML.git.

翻訳日:2022-02-18 14:56:08 公開日:2022-02-17

# 目的は手段を正当化するか? 公平性を考慮した機械学習の道徳的正当化について

Does the End Justify the Means? On the Moral Justification of Fairness-Aware Machine Learning ( http://arxiv.org/abs/2202.08536v1 )

ライセンス: Link先を確認

Hilde Weerts, Lambèr Royakkers, Mykola Pechenizkiy

(参考訳) フェアネス認識機械学習(fair-ml)アルゴリズムは豊富であるが、これらのアルゴリズムがどのようにフェアネスメトリクスを強制するかの道徳的正当性はほとんど未解明である。本研究の目的は,fair-mlアルゴリズムの道徳的意味を引き出すことである。この目的のために、まずアルゴリズムが最適化する公平度指標の道徳的正当性を考察する。我々は、フェアネスのメトリクスを正当化できる3つの命題に到達するために、以前の作業の拡張を示す。これまでの作業とは違って,予測結果の結果が公平さを判断する上で重要であることを強調する。我々は、fair-mlアルゴリズムの道徳的意味を識別するために、拡張された枠組みと経験的倫理から引き出す。我々は、アルゴリズムに固有の2つの最適化戦略に焦点を当てる:グループ固有の決定しきい値とランダム化された決定しきい値。我々は、アルゴリズムの正当化は、アルゴリズムが適用される(社会的)コンテキストについての仮定によって、たとえ関連するフェアネス計量が同じであっても、異なる可能性があると主張する。最後に,fair-mlアルゴリズムのより完全な評価に向けた今後の研究の道筋を,直接最適化の目的を超えてスケッチする。

Despite an abundance of fairness-aware machine learning (fair-ml) algorithms, the moral justification of how these algorithms enforce fairness metrics is largely unexplored. The goal of this paper is to elicit the moral implications of a fair-ml algorithm. To this end, we first consider the moral justification of the fairness metrics for which the algorithm optimizes. We present an extension of previous work to arrive at three propositions that can justify the fairness metrics. Different from previous work, our extension highlights that the consequences of predicted outcomes are important for judging fairness. We draw from the extended framework and empirical ethics to identify moral implications of the fair-ml algorithm. We focus on the two optimization strategies inherent to the algorithm: group-specific decision thresholds and randomized decision thresholds. We argue that the justification of the algorithm can differ depending on one's assumptions about the (social) context in which the algorithm is applied - even if the associated fairness metric is the same. Finally, we sketch paths for future work towards a more complete evaluation of fair-ml algorithms, beyond their direct optimization objectives.

翻訳日:2022-02-18 14:55:41 公開日:2022-02-17

# バナッハ空間におけるロバストSVM最適化

Robust SVM Optimization in Banach spaces ( http://arxiv.org/abs/2202.08567v1 )

ライセンス: Link先を確認

Mohammed Sbihi and Nicolas Couellan

(参考訳) 不確実性が存在する場合のバナッハ空間における二値分類の問題を扱う。古典的なサポートベクトルマシン理論の多くの結果が、バナッハ空間におけるそのロバスト版へ適切に一般化できることを示す。これには、表現定理(Representer Theorem)、関連する最適化問題の強双対性、およびそれらの幾何学的解釈が含まれる。さらに、基礎となる空間が回帰的かつ滑らかであるとき、2つの閉凸集合における最近接点を求めるより一般的な問題をナッシュ均衡問題として定式化することで、ゲーム理論的解釈を提案する。

We address the issue of binary classification in Banach spaces in the presence of uncertainty. We show that a number of results from classical support vector machine theory can be appropriately generalised to their robust counterpart in Banach spaces. These include the Representer Theorem, strong duality for the associated optimization problem, as well as their geometric interpretation. Furthermore, we propose a game theoretic interpretation by expressing a Nash equilibrium problem formulation for the more general problem of finding the closest points in two closed convex sets when the underlying space is reflexive and smooth.

翻訳日:2022-02-18 14:55:23 公開日:2022-02-17

# 統合階段特性:二層ニューラルネットワークにおけるスパース関数のSGD学習に必要なほぼ十分条件

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks ( http://arxiv.org/abs/2202.08658v1 )

ライセンス: Link先を確認

Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

(参考訳) 現在、ニューラルネットワークがSGDで学習できる関数を特徴づける方法は、2つの極端なパラメータ化、すなわち線形領域のニューラルネットワークと、構造的制約のないニューラルネットワークについて知られている。しかし、関心の中心となるパラメータ化(非線形だが正則なネットワーク)については、大きな進展にもかかわらず、厳密な特徴づけはまだ得られていない。我々は、平均場領域でSGDにより訓練される深さ2のニューラルネットワークを考えることで、この方向に一歩を踏み出す。潜在的な低次元部分空間(すなわち少数の座標)に依存する、二値入力上の関数を考える。ニューラルネットワークが次元の呪いに苦しむことなく、どのように高次元データセットに日常的に取り組み、潜在的な低次元構造に適応しているのかはよく理解されていないため、この領域は興味深い。そこで、大きな周囲次元$d$において、$O(d)$のサンプル複雑度でのSGD学習可能性を調べる。主な結果は、階層的な性質である「merged-staircase property」を特徴づけるものであり、この性質はこの設定での学習に必要かつほぼ十分である。さらに、非線形な訓練が必要であることを示す。すなわち、この関数クラスに対して、任意の特徴写像(例えばNTK)上の線形手法は効率的に学習できない。鍵となる道具は、低次元の潜在空間上で定義された関数に適用される新しい「次元に依存しない」ダイナミクス近似の結果、多項式恒等性判定に基づく大域収束の証明、およびほぼ直交ではない関数に対する線形手法の下界の改善である。

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new "dimension-free" dynamics approximation result that applies to functions defined on a latent space of low-dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.
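
The abstract names the merged-staircase property without defining it. The sketch below assumes the following definition, recalled from the paper body rather than taken from this page: the supports of the non-zero monomials of the function can be ordered so that each support adds at most one coordinate not already present in the union of the earlier ones. Treat both that definition and the function name as assumptions. Under it, a greedy check suffices, because absorbing any admissible support never makes a later support harder to add.

```python
def has_merged_staircase_property(monomials):
    """Check whether the monomial supports admit a 'one new coordinate at a time' ordering.

    monomials: iterable of coordinate-index collections, one per non-zero monomial,
        e.g. [{1}, {1, 2}, {1, 2, 3}] for x1 + x1*x2 + x1*x2*x3.
    """
    remaining = [frozenset(m) for m in monomials]
    seen = set()
    while remaining:
        for support in remaining:
            # Admissible if it introduces at most one coordinate not seen so far.
            if len(support - seen) <= 1:
                seen |= support
                remaining.remove(support)
                break
        else:
            # No remaining support can be added without two or more new coordinates.
            return False
    return True


staircase = has_merged_staircase_property([{1}, {1, 2}, {1, 2, 3}])
isolated_parity = has_merged_staircase_property([{1, 2, 3}])
```

Under the assumed definition, the first function (a staircase of monomials) has the property, while an isolated degree-3 parity does not.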

翻訳日:2022-02-18 14:53:20 公開日:2022-02-17

# 経験的リスク最小化の普遍性

Universality of empirical risk minimization ( http://arxiv.org/abs/2202.08832v1 )

ライセンス: Link先を確認

Andrea Montanari and Basil Saeed

(参考訳) i.i.d.サンプル $\{{\boldsymbol x}_i,y_i\}_{i\le n}$ からの教師あり学習を考える。ここで ${\boldsymbol x}_i \in\mathbb{R}^p$ は特徴ベクトル、$y_i \in \mathbb{R}$ はラベルである。$\mathsf{k} = O(1)$ 個のベクトル ${\boldsymbol \theta}_1, \dots, {\boldsymbol \theta}_{\mathsf k} \in \mathbb{R}^p$ でパラメータ化される関数クラス上の経験的リスク最小化を調べ、訓練誤差とテスト誤差の両方について普遍性の結果を証明する。すなわち、$n/p = \Theta(1)$ を保つ比例的漸近 $n,p\to\infty$ の下で、訓練誤差はランダムな特徴の分布にその共分散構造を通じてのみ依存することを証明する。さらに、経験的リスクをほぼ最小化する解の上での最小テスト誤差も同様の普遍性を持つことを証明する。特に、これらの量の漸近挙動は、特徴ベクトル ${\boldsymbol x}_i$ を同じ共分散を持つガウスベクトル ${\boldsymbol g}_i$ に置き換えたより単純なモデルの下で(主要次数まで)計算できる。これまでの普遍性の結果は、強凸な学習手順、または独立な成分を持つ特徴ベクトル ${\boldsymbol x}_i$ に限られていた。我々の結果はこれらの仮定のいずれも置かない。我々の仮定は、ランダム化された特徴化写像によって生成される特徴ベクトル ${\boldsymbol x}_i$ を含むのに十分一般的である。特に、ある種のランダム特徴モデル(ランダムな重みを持つ1層ニューラルネットワークの出力を計算するもの)とニューラルタンジェントモデル(2層ネットワークの1次テイラー近似)について、仮定を明示的に検証する。

Consider supervised learning from i.i.d. samples $\{{\boldsymbol x}_i,y_i\}_{i\le n}$ where ${\boldsymbol x}_i \in\mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels. We study empirical risk minimization over a class of functions that are parameterized by $\mathsf{k} = O(1)$ vectors ${\boldsymbol \theta}_1, \dots, {\boldsymbol \theta}_{\mathsf k} \in \mathbb{R}^p$, and prove universality results both for the training and test error. Namely, under the proportional asymptotics $n,p\to\infty$, with $n/p = \Theta(1)$, we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed (to leading order) under a simpler model in which the feature vectors ${\boldsymbol x}_i$ are replaced by Gaussian vectors ${\boldsymbol g}_i$ with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors ${\boldsymbol x}_i$ with independent entries. Our results do not make any of these assumptions. Our assumptions are general enough to include feature vectors ${\boldsymbol x}_i$ that are produced by randomized featurization maps. In particular we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).

翻訳日:2022-02-18 14:52:53 公開日:2022-02-17

# カーネル法による情報理論

Information Theory with Kernel Methods ( http://arxiv.org/abs/2202.08545v1 )

ライセンス: Link先を確認

Francis Bach (SIERRA)

(参考訳) 再生核ヒルベルト空間から得られる共分散作用素を通じた確率分布の解析を考える。これらの作用素のフォン・ノイマンエントロピーと相対エントロピーが、通常のシャノンエントロピーと相対エントロピーの概念と密接に関連し、その性質の多くを共有することを示す。これらには、確率分布に対する様々なオラクルに基づく効率的な推定アルゴリズムが伴う。また、積空間を考察し、テンソル積カーネルに対して相互情報量と結合エントロピーの概念を定義できることを示す。これらは独立性を完全に特徴づけられるが、条件付き独立性は部分的にしか特徴づけられない。最後に、これらの新しい相対エントロピーの概念が、対数分配関数の新しい上界を導くことを示す。この上界は変分推論手法の中で凸最適化と組み合わせて利用でき、新しい確率的推論手法の族を与える。

We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels, we can define notions of mutual information and joint entropies, which can then characterize independence perfectly, but only partially conditional independence. We finally show how these new notions of relative entropy lead to new upper-bounds on log partition functions, that can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods.

翻訳日:2022-02-18 14:51:46 公開日:2022-02-17
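
As a reading aid, the quantities named in the abstract above can be written out as follows. This is only a sketch using the standard definitions of a covariance operator and of von Neumann entropy; the symbols $\varphi$ (feature map), $\Sigma_p$, and the normalization $k(x,x) = 1$ (which gives $\operatorname{tr} \Sigma_p = 1$) are assumptions made here and are not stated in the abstract.

```latex
\Sigma_p = \int_{\mathcal{X}} \varphi(x) \otimes \varphi(x) \, dp(x), \qquad
H(\Sigma_p) = -\operatorname{tr}\left[ \Sigma_p \log \Sigma_p \right], \qquad
D(\Sigma_p \,\|\, \Sigma_q) = \operatorname{tr}\left[ \Sigma_p \left( \log \Sigma_p - \log \Sigma_q \right) \right].
```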

# cosformer:softmaxの注目を再考する

cosFormer: Rethinking Softmax in Attention ( http://arxiv.org/abs/2202.08791v1 )

ライセンス: Link先を確認

Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

(参考訳) Transformerは自然言語処理、コンピュータビジョン、オーディオ処理で大きな成功を収めている。コアコンポーネントの1つであるソフトマックスアテンションは長距離依存を捉えるのに役立つが、シーケンス長に対して空間・時間計算量が2次であるためスケールアップが難しい。カーネル法はソフトマックス演算子を近似することで計算量を減らすためによく用いられる。しかし、近似誤差のため、その性能はタスクやコーパスによって異なり、バニラソフトマックスアテンションと比べて大幅な性能低下に苦しむ。本稿では、因果（causal）アテンションとクロスアテンションの両方において、バニラTransformerと同等以上の精度を達成できる、cosFormerと呼ばれる線形Transformerを提案する。cosFormerはソフトマックスアテンションの2つの重要な特性に基づいている。(i) アテンション行列の非負性、(ii) アテンション行列の分布を集中させる非線形再重み付けスキームである。その線形な代替として、cosFormerは線形演算子とコサインに基づく距離再重み付け機構でこれらの特性を満たす。言語モデリングとテキスト理解タスクに関する広範な実験により、本手法の有効性が示された。さらに、本手法を長いシーケンスで検証し、Long-Range Arenaベンチマークで最先端の性能を達成する。ソースコードは https://github.com/OpenNLPLab/cosFormer で入手できる。

Transformers have shown great success in natural language processing, computer vision, and audio processing. As one of their core components, softmax attention helps to capture long-range dependencies, yet it prohibits scale-up because its space and time complexity are quadratic in the sequence length. Kernel methods are often adopted to reduce the complexity by approximating the softmax operator. Nevertheless, due to approximation errors, their performance varies across tasks and corpora and suffers crucial drops compared with vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that can achieve comparable or better accuracy than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: (i) non-negativeness of the attention matrix; (ii) a non-linear re-weighting scheme that can concentrate the distribution of the attention matrix. As its linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.

翻訳日:2022-02-18 14:51:31 公開日:2022-02-17
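
The two properties named in the cosFormer abstract (a non-negative attention matrix and a cosine-based distance re-weighting computed by a linear operator) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the authors' repository: ReLU is used as the non-negative feature map, the weight is taken to be `cos(pi/2 * (i - j) / n)` with `n` the sequence length, and the function names are hypothetical. The sketch covers only the non-causal (cross) case. It shows that the quadratic form and a linear-time form based on the identity cos(a - b) = cos a cos b + sin a sin b give the same output.

```python
import math

def relu(x):
    return x if x > 0.0 else 0.0

def cos_attention_quadratic(Q, K, V):
    """Reference O(n^2) form: w_ij = relu(q_i).relu(k_j) * cos(pi/2 * (i - j) / n)."""
    n, d_v = len(Q), len(V[0])
    out = []
    for i in range(n):
        num, den = [0.0] * d_v, 0.0
        for j in range(n):
            s = sum(relu(a) * relu(b) for a, b in zip(Q[i], K[j]))
            w = s * math.cos(math.pi / 2 * (i - j) / n)  # non-negative weight
            den += w
            for c in range(d_v):
                num[c] += w * V[j][c]
        out.append([x / (den + 1e-9) for x in num])
    return out

def cos_attention_linear(Q, K, V):
    """Same output in O(n) passes, using cos(a - b) = cos a cos b + sin a sin b."""
    n, d, d_v = len(Q), len(Q[0]), len(V[0])
    ang = [math.pi * i / (2 * n) for i in range(n)]
    parts = (("cos", math.cos), ("sin", math.sin))
    # Key/value summaries, accumulated once over the sequence.
    kv = {name: [[0.0] * d_v for _ in range(d)] for name, _ in parts}
    ks = {name: [0.0] * d for name, _ in parts}
    for j in range(n):
        for name, fn in parts:
            t = fn(ang[j])
            for a in range(d):
                ka = relu(K[j][a]) * t
                ks[name][a] += ka
                for c in range(d_v):
                    kv[name][a][c] += ka * V[j][c]
    out = []
    for i in range(n):
        num, den = [0.0] * d_v, 0.0
        for name, fn in parts:
            t = fn(ang[i])
            for a in range(d):
                qa = relu(Q[i][a]) * t
                den += qa * ks[name][a]
                for c in range(d_v):
                    num[c] += qa * kv[name][a][c]
        out.append([x / (den + 1e-9) for x in num])
    return out
```

The linear form never builds the n x n attention matrix; a causal variant would presumably replace the full sums with running prefix sums, which is not shown here.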

# (参考訳) ニューラルネットワークの一般サイクル学習

General Cyclical Training of Neural Networks ( http://arxiv.org/abs/2202.08835v1 )

ライセンス: CC BY 4.0

Leslie N. Smith

(参考訳) 本稿では、機械学習における「一般循環型トレーニング」の原則について述べる。これは、訓練を「簡単な訓練」で開始・終了し、「難しい訓練」を中間のエポックで行うというものである。ニューラルネットワークの訓練について、アルゴリズムに基づく例（ハイパーパラメータと損失関数による）、データに基づく例、モデルに基づく例など、いくつかの具体化を提案する。具体的には、循環的重み減衰、循環的バッチサイズ、循環的焦点損失、循環的ソフトマックス温度、循環的データ拡張、循環的勾配クリッピング、循環的半教師付き学習といった新しい手法を紹介する。さらに、循環的重み減衰、循環的ソフトマックス温度、循環的勾配クリッピング（この原理の3つの例）が、訓練済みモデルのテスト精度の向上に有用であることを示す。加えて、一般循環型トレーニングの観点からモデルに基づく例（事前学習や知識蒸留など）を考察し、典型的な訓練手法に対するいくつかの変更を推奨する。まとめると、本稿は一般循環型トレーニングの概念を定義し、この概念をニューラルネットワークの訓練に適用できるいくつかの具体的な方法について論じる。再現性の精神に基づき、実験で使用したコードは https://github.com/lnsmith54/CFL で入手できる。

This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and the "hard training" happens during the middle epochs. We propose several manifestations for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) are beneficial in the test accuracy performance of a trained model. Furthermore, we discuss model-based examples (such as pretraining and knowledge distillation) from the perspective of general cyclical training and recommend some changes to the typical training methodology. In summary, this paper defines the general cyclical training concept and discusses several specific ways in which this concept can be applied to training neural networks. In the spirit of reproducibility, the code used in our experiments is available at https://github.com/lnsmith54/CFL.

翻訳日:2022-02-18 14:50:07 公開日:2022-02-17
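
The "easy at the start and end, hard in the middle" principle described above can be sketched as a schedule for any scalar hyper-parameter (weight decay, softmax temperature, clipping threshold). The triangular shape, the function name `cyclical_value`, and its parameters are assumptions made for illustration; the abstract does not specify the schedule shape or which end of a hyper-parameter's range counts as "easy".

```python
def cyclical_value(epoch, total_epochs, easy, hard):
    """Triangular schedule: `easy` at the first and last epoch, `hard` at the midpoint."""
    if total_epochs < 2:
        return easy
    half = (total_epochs - 1) / 2.0
    frac = 1.0 - abs(epoch - half) / half  # 0 at both ends, 1 at the midpoint
    return easy + frac * (hard - easy)
```

A training loop would call this once per epoch, for example `wd = cyclical_value(epoch, total_epochs, easy=wd_easy, hard=wd_hard)`, where `wd_easy` and `wd_hard` are values the user has to choose.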

# バックプロパゲーションのない勾配

Gradients without Backpropagation ( http://arxiv.org/abs/2202.08587v1 )

ライセンス: Link先を確認

Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

(参考訳) 最適化のために目的関数の勾配を計算するためにバックプロパゲーションを使うことは、機械学習のメインスタンスである。バックプロパゲーション(backpropagation)またはリバースモード微分(reverse-mode differentiation)は、フォワードモードを含む自動微分アルゴリズムの一般ファミリーにおける特別なケースである。本稿では,フォワードモードを通じて正確に効率的に計算できる方向微分のみに基づいて勾配を計算する手法を提案する。この定式化をフォワード勾配と呼び、関数の1回のフォワードランで評価できる勾配の偏りのない推定と呼び、勾配降下におけるバックプロパゲーションの必要性を完全に排除する。我々は,様々な問題において前方勾配降下を示し,計算量を大幅に削減し,場合によっては最大2倍の速さでトレーニングできることを示した。

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.

翻訳日:2022-02-18 14:34:42 公開日:2022-02-17
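
The forward gradient described above can be sketched in a few lines: draw a random direction v, obtain the directional derivative of f along v in one forward pass, and scale v by it. The sketch below uses a minimal dual-number class as the forward-mode engine; the class `Dual`, the function names, the standard normal choice for v, and the toy objective are all assumptions made for illustration, not the authors' implementation.

```python
import random

class Dual:
    """Number carrying a value and a tangent (directional derivative)."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied to the tangent part.
        return Dual(self.val * other.val, self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def forward_gradient(f, theta, rng):
    """One-sample estimate g = (grad f . v) v with v drawn from N(0, I)."""
    v = [rng.gauss(0.0, 1.0) for _ in theta]
    out = f([Dual(t, vi) for t, vi in zip(theta, v)])  # single forward run
    return [out.tan * vi for vi in v]

def toy_objective(x):
    """f(x) = x0^2 + 3*x0*x1, whose gradient is (2*x0 + 3*x1, 3*x0)."""
    return x[0] * x[0] + 3 * x[0] * x[1]
```

A single estimate is noisy; averaging many estimates at the same point approaches the true gradient, which is what "unbiased" means here. In a descent loop each step would use one fresh estimate in place of the backpropagated gradient.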

# セマンティックセグメンテーションにおける未知の検出と学習

Detecting and Learning the Unknown in Semantic Segmentation ( http://arxiv.org/abs/2202.08700v1 )

ライセンス: Link先を確認

Robin Chan, Svenja Uhlemeyer, Matthias Rottmann and Hanno Gottschalk

(参考訳) セマンティックセグメンテーションは自動運転における認識にとって重要な要素である。ディープニューラルネットワーク(DNN)はこのタスクに一般的に使われ、通常、閉じた操作領域に現れるオブジェクトクラスの閉じたセットでトレーニングされる。しかしこれは、DNNがデプロイされる自動運転におけるオープンワールドの仮定とは対照的である。したがって、DNNは、これまで遭遇したことのないデータ(異常とも呼ばれる)に直面している必要がある。本稿では,まず,情報理論的な観点からの異常について概観する。次に,セマンティックセグメンテーションにおける意味不明物体の検出に関する研究について述べる。我々は,異常物体に対する高いエントロピー応答の訓練が,我々の理論的知見に合致する他の手法よりも優れていることを実証する。さらに,モデルのセマンティクスのセットに含まれる異常タイプを選択するために,異常の発生頻度を評価する手法について検討する。これらの異常は教師なしの方法で学習できることを示し、ディープラーニングに基づくオンラインアプリケーションに特に適している。

Semantic segmentation is a crucial component for perception in automated driving. Deep neural networks (DNNs) are commonly used for this task and they are usually trained on a closed set of object classes appearing in a closed operational domain. However, this is in contrast to the open-world setting of automated driving in which DNNs are deployed. Therefore, DNNs necessarily face data that they have never encountered previously, also known as anomalies, and coping with these properly is extremely safety-critical. In this work, we first give an overview of anomalies from an information-theoretic perspective. Next, we review research in detecting semantically unknown objects in semantic segmentation. We demonstrate that training for high entropy responses on anomalous objects outperforms other recent methods, which is in line with our theoretical findings. Moreover, we examine a method to assess the occurrence frequency of anomalies in order to select anomaly types to include in a model's set of semantic categories. We demonstrate that these anomalies can then be learned in an unsupervised fashion, which is particularly suitable in online applications based on deep learning.

翻訳日:2022-02-18 14:33:55 公開日:2022-02-17
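
The idea of using "high entropy responses" as an anomaly signal can be made concrete with a per-pixel entropy score over the softmax output. This is a generic sketch under stated assumptions: the function names and the threshold value 0.7 are hypothetical, and the training procedure that encourages high entropy on anomalous objects is not shown.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(logits):
    """Shannon entropy of the softmax output, scaled to [0, 1] by log(num_classes)."""
    p = softmax(logits)
    h = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return h / math.log(len(p))

def anomaly_mask(pixel_logits, threshold=0.7):
    """Flag pixels whose normalized entropy exceeds a hypothetical threshold."""
    return [[normalized_entropy(l) > threshold for l in row] for row in pixel_logits]
```

A pixel with a flat class distribution scores close to 1 and is flagged, while a confidently classified pixel scores close to 0.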

# (参考訳) Data-SUITE:In-distribution incongruous例のデータ中心同定

Data-SUITE: Data-centric identification of in-distribution incongruous examples ( http://arxiv.org/abs/2202.08836v1 )

ライセンス: CC BY 4.0

Nabeel Seedat, Jonathan Crabbe, Mihaela van der Schaar

(参考訳) データ品質の体系的定量化は一貫したモデル性能にとって重要である。以前の研究は、アウトオブディストリビューションデータに重点を置いてきた。代わりに、特徴空間の不均一性から生じる可能性のある不連続領域(ID)データを特徴付けるという、未検討かつ等しく重要な問題に取り組む。そこで本研究では,データ中心のフレームワークであるData-SUITEによるパラダイムシフトを提案する。 Data-SUITEは、コプラモデリング、表現学習、コンフォメーション予測を利用して、一連のトレーニングインスタンスに基づいて特徴量信頼区間推定器を構築する。これらの推定器は、トレーニングセットに関するテストインスタンスの一致を評価するために、(1)トレーニングインスタンスでトレーニングされたモデルによってどのテストインスタンスが確実に予測されるかという、実用的な2つの質問に答えるために使用できる。そして、(2)データオーナーがデータの制限を理解したり、将来のデータ収集を導くために、特徴空間の不一致領域を識別できますか? 我々は、Data-SUITEの性能とカバレッジ保証を実証的に検証し、クロスサイト医療データ、偏りのあるデータ、コンセプトドリフトデータ、そして、下流モデルが信頼できる(そのモデルに依存しない)ID領域を最もよく識別することを示す。さらに、これらの特定されたリージョンがデータセットに対する洞察を提供し、その制限を強調する方法について説明する。

Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric framework to identify these regions, independent of a task-specific model. Data-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data's limitations or guide future data collection? We empirically validate Data-SUITE's performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.

翻訳日:2022-02-18 14:32:17 公開日:2022-02-17
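
Of the three ingredients named above, conformal prediction is the one that turns a point estimate of a feature into an interval with a coverage guarantee. The sketch below shows generic split conformal prediction for a single feature; it is not the Data-SUITE pipeline (no copula modeling or representation learning), and the function names and the absolute-residual score are assumptions made for illustration.

```python
import math

def conformal_halfwidth(calib_true, calib_pred, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] covers about (1 - alpha) of new points."""
    scores = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]

def is_incongruous(value, pred, halfwidth):
    """A feature value falling outside its interval is flagged as incongruous."""
    return abs(value - pred) > halfwidth
```

The calibration residuals come from held-out instances, and the guarantee relies on those instances being exchangeable with the test instance.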

# 文法に基づく基底辞書学習

Grammar-Based Grounded Lexicon Learning ( http://arxiv.org/abs/2202.08806v1 )

ライセンス: Link先を確認

Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum

(参考訳) 本稿では、画像とテキストの対などの接地データから、構成的で接地された言語の意味表現を学習する語彙主義的アプローチである、文法に基づく接地辞書学習（G2L2）を提案する。G2L2の中核には、各単語を構文型とニューロシンボリックな意味プログラムの組にマッピングする辞書エントリの集合がある。例えば、shiny という単語は形容詞の構文型を持ち、そのニューロシンボリックな意味プログラムは λx. filter(x, SHINY) という記号形式を持つ。ここで概念 SHINY はニューラルネットワークの埋め込みと関連付けられており、光沢のあるオブジェクトを分類するために使用される。入力文が与えられると、G2L2はまず各トークンに関連する辞書エントリを検索する。次に、構文に基づいて語彙的意味を合成することにより、実行可能なニューロシンボリックプログラムとして文の意味を導出する。得られた意味プログラムは、接地入力に対して実行できる。指数関数的に増大する構成空間における学習を容易にするために、導出に対する局所的な周辺化によって学習時間を削減する、構文解析と期待実行の統合アルゴリズムを導入する。視覚的推論と言語駆動ナビゲーションの2つの領域でG2L2を評価する。その結果、G2L2は少量のデータから新しい単語の組み合わせに一般化できることがわかった。

We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has a syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form λx. filter(x, SHINY), where the concept SHINY is associated with a neural network embedding, which will be used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs. To facilitate learning in an exponentially-growing compositional space, we introduce a joint parsing and expected execution algorithm, which does local marginalization over derivations to reduce the training time. We evaluate G2L2 on two domains: visual reasoning and language-driven navigation. Results show that G2L2 can generalize from small amounts of data to novel compositions of words.

翻訳日:2022-02-18 14:30:43 公開日:2022-02-17

# Dynamic Object Comprehension: 人工的な視覚知覚を評価するフレームワーク

Dynamic Object Comprehension: A Framework For Evaluating Artificial Visual Perception ( http://arxiv.org/abs/2202.08490v1 )

ライセンス: Link先を確認

Scott Y.L. Chin, Bradley R. Quinton

(参考訳) AugmentedとMixed Realityは、おそらくモバイルインターネットの後継として浮上している。しかし、多くの技術的課題が残っている。これらのシステムの重要な要件の1つは、物理的な世界と仮想世界の間の連続性を作り出す能力であり、ユーザの視覚知覚が主要なインターフェイス媒体である。この連続性を構築するには、物理的な世界を視覚的に理解する必要がある。コンピュータビジョンや画像分類やオブジェクト検出などのai技術は近年大きく進歩しているが、これらの領域での成功は、これらの重要なmrやarアプリケーションに必要な視覚認識にはまだ繋がっていない。重要な問題は、これらのアプリケーションに現在の評価基準が不十分であることだ。この新興分野の進歩を動機づけ、評価するには、新しいメトリクスが必要である。本稿では,現在の評価基準の限界を概説し,新しい基準を提案する。

Augmented and Mixed Reality are emerging as likely successors to the mobile internet. However, many technical challenges remain. One of the key requirements of these systems is the ability to create a continuity between physical and virtual worlds, with the user's visual perception as the primary interface medium. Building this continuity requires the system to develop a visual understanding of the physical world. While there has been significant recent progress in computer vision and AI techniques such as image classification and object detection, success in these areas has not yet led to the visual perception required for these critical MR and AR applications. A significant issue is that current evaluation criteria are insufficient for these applications. To motivate and evaluate progress in this emerging area, there is a need for new metrics. In this paper we outline limitations of current evaluation criteria and propose new criteria.

翻訳日:2022-02-18 14:29:39 公開日:2022-02-17

# 連続コンディショニングによる点雲生成

Point Cloud Generation with Continuous Conditioning ( http://arxiv.org/abs/2202.08526v1 )

ライセンス: Link先を確認

Larissa T. Triess and Andre Bühler and David Peter and Fabian B. Flohr and J. Marius Zöllner

(参考訳) 生成モデルは高品質で多様な3Dオブジェクトを合成するのに使うことができる。しかし、通常は生成されるオブジェクトの特性を制御できない。本稿では、連続パラメータを条件として3次元点群の形状を生成する新たなGAN（generative adversarial network）構成を提案する。応用例として、これを用いて生成プロセスを誘導し、カスタムフィットな形状の3Dオブジェクトを作成する。補助分類器GANの概念を用いて、マルチタスク設定でこの生成プロセスを定式化する。さらに、データセットのカーネル密度推定（KDE）から訓練用のジェネレータのラベル入力をサンプリングすることを提案する。アブレーションの結果、これによりサンプルの少ない領域で性能が大幅に向上することが示された。広範な定量的・定性的実験により、優れた生成品質と多様性を維持しながら、オブジェクトの寸法を明示的に制御できることが示されている。

Generative models can be used to synthesize 3D objects of high quality and diversity. However, there is typically no control over the properties of the generated object. This paper proposes a novel generative adversarial network (GAN) setup that generates 3D point cloud shapes conditioned on a continuous parameter. In an exemplary application, we use this to guide the generative process to create a 3D object with a custom-fit shape. We formulate this generation process in a multi-task setting by using the concept of auxiliary classifier GANs. Further, we propose to sample the generator label input for training from a kernel density estimation (KDE) of the dataset. Our ablations show that this leads to a significant performance increase in regions with few samples. Extensive quantitative and qualitative experiments show that we gain explicit control over the object dimensions while maintaining good generation quality and diversity.

翻訳日:2022-02-18 14:29:26 公開日:2022-02-17
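
Sampling the generator's continuous label from a kernel density estimate, as mentioned above, can be sketched very compactly for a one-dimensional label with a Gaussian kernel: pick one observed label at random and perturb it by the bandwidth. The function name, the Gaussian kernel, and the fixed bandwidth are assumptions made for illustration; the abstract does not state which kernel or bandwidth rule is used.

```python
import random

def sample_from_kde(values, bandwidth, rng):
    """Draw one sample from a Gaussian KDE: pick a data point, then jitter it."""
    center = rng.choice(values)
    return rng.gauss(center, bandwidth)
```

Compared with re-using the observed labels directly, the jitter also yields label values lying between the observed ones, while the samples still follow the overall density of the dataset.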

# マルチオブジェクト追跡のための断熱量子コンピューティング

Adiabatic Quantum Computing for Multi Object Tracking ( http://arxiv.org/abs/2202.08837v1 )

ライセンス: Link先を確認

Jan-Nico Zaech, Alexander Liniger, Martin Danelljan, Dengxin Dai, Luc Van Gool

(参考訳) マルチオブジェクトトラッキング(mot)は、オブジェクト検出が時間を通じて関連付けられるトラッキング・バイ・検出パラダイムにおいて、最も頻繁にアプローチされる。関連性は自然に離散最適化問題につながる。これらの最適化問題はNPハードであるため、現在のハードウェア上の小さなインスタンスに対してのみ解ける。 AQC(Adiabatic quantum computing)は、近い将来、NP-hard最適化問題にかなりのスピードアップをもたらす可能性があるため、この問題に対する解決策を提供する。しかし、現在のMOTの定式化は、スケーリング特性のために量子コンピューティングには適さない。そこで本研究では,AQCで解くためのMOTの定式化を提案する。我々は、AQC上に実装された量子力学系を表すIsingモデルを用いる。本手法は,既成整数計画法を用いても,最先端の最適化手法と競合することを示す。最後に、MOT問題はすでに実量子コンピュータの現世代で小さな例で解決可能であることを実証し、測定された解の性質を解析する。

Multi-Object Tracking (MOT) is most often approached in the tracking-by-detection paradigm, where object detections are associated through time. The association step naturally leads to discrete optimization problems. As these optimization problems are often NP-hard, they can only be solved exactly for small instances on current hardware. Adiabatic quantum computing (AQC) offers a solution for this, as it has the potential to provide a considerable speedup on a range of NP-hard optimization problems in the near future. However, current MOT formulations are unsuitable for quantum computing due to their scaling properties. In this work, we therefore propose the first MOT formulation designed to be solved with AQC. We employ an Ising model that represents the quantum mechanical system implemented on the AQC. We show that our approach is competitive compared with state-of-the-art optimization-based approaches, even when using off-the-shelf integer programming solvers. Finally, we demonstrate that our MOT problem is already solvable on the current generation of real quantum computers for small examples, and analyze the properties of the measured solutions.

翻訳日:2022-02-18 14:29:13 公開日:2022-02-17
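
To show what casting an association step as an Ising-type problem can look like, the sketch below encodes a tiny detection-to-track assignment as a quadratic unconstrained binary optimization (QUBO), which maps to an Ising model through the substitution x = (1 + s) / 2. This is a textbook-style encoding written for illustration, not the formulation proposed in the paper; the cost matrix, the penalty weight, and the function names are hypothetical, and brute-force enumeration stands in for the quantum annealer.

```python
from itertools import product

def qubo_energy(x, cost, penalty):
    """Energy of binary vector x, where x[d * n + t] = 1 assigns detection d to track t."""
    n = len(cost)
    energy = sum(cost[d][t] * x[d * n + t] for d in range(n) for t in range(n))
    for d in range(n):  # each detection should be used exactly once
        energy += penalty * (sum(x[d * n + t] for t in range(n)) - 1) ** 2
    for t in range(n):  # each track should receive exactly one detection
        energy += penalty * (sum(x[d * n + t] for d in range(n)) - 1) ** 2
    return energy

def solve_by_enumeration(cost, penalty=10.0):
    """Exhaustively search all 2^(n*n) states; feasible only for very small n."""
    n = len(cost)
    best = min(product((0, 1), repeat=n * n),
               key=lambda x: qubo_energy(x, cost, penalty))
    return [t for d in range(n) for t in range(n) if best[d * n + t]]
```

The number of binary variables grows as n squared and the state space as 2 to that power, which illustrates why exact solutions are limited to small instances and why the scaling of the formulation matters.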

# AIを用いた胃内視鏡生検5クラス診断のためのハイブリッド2段階視覚変換器

A hybrid 2-stage vision transformer for AI-assisted 5 class pathologic diagnosis of gastric endoscopic biopsies ( http://arxiv.org/abs/2202.08510v1 )

ライセンス: Link先を確認

Yujin Oh, Go Eun Bae, Kyung-Hee Kim, Min-Kyung Yeo, Jong Chul Ye

(参考訳) 胃内視鏡検査は早期に適切な胃癌(GC)治療を判定し,GC関連死亡率を低下させる有効な方法である。 ai(artificial intelligence)は、病理医が全スライド画像のデジタル化を支援するという大きな約束をもたらしたが、臨床ガイドラインに基づいた適切なgc処理を導くための自動分類システムは、いまだに不足している。本稿では,gc組織学の5つのクラスを分類するaiシステムを提案する。 2段階の視覚トランスフォーマーを用いたマルチスケールな自己照査機構を通じて、病理医がスライドを理解する方法を模倣したaiシステムは、内外コホート分析において85%以上の診断感度を達成し、臨床能力を示す。さらに、AI支援の病理医は、ヒトの病理医と比較して18%のスクリーニング時間で診断感度を10%改善した。当科のAIシステムは,早期GC患者に対する適切な治療法を決定する上で,先進的な病理所見を提供する大きな可能性を秘めている。

Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing the GC-associated mortality rate. Although artificial intelligence (AI) has shown great promise in assisting pathologists to screen digitized whole slide images, automatic classification systems for guiding proper GC treatment based on clinical guidelines are still lacking. Here, we propose an AI system classifying 5 classes of GC histology, which can be perfectly matched to general treatment guidance. The AI system, mimicking the way pathologists understand slides through a multi-scale self-attention mechanism using a 2-stage Vision Transformer, demonstrates clinical capability by achieving diagnostic sensitivity above 85% in both internal and external cohort analyses. Furthermore, AI-assisted pathologists showed diagnostic sensitivity significantly improved by 10%, with 18% of screening time saved, compared to human pathologists. Our AI system has great potential for providing presumptive pathologic opinions for deciding proper treatment for early GC patients.

翻訳日:2022-02-18 14:28:56 公開日:2022-02-17

# 深層学習における一般化理解のための神経崩壊の限界

Limitations of Neural Collapse for Understanding Generalization in Deep Learning ( http://arxiv.org/abs/2202.08384v1 )

ライセンス: Link先を確認

Like Hui, Mikhail Belkin, Preetum Nakkiran

(参考訳) papyan, han, & donoho (2020) の最近の研究は興味深い「神経崩壊」現象を示し、訓練の後期における補間分類器の構造的特性を示した。この研究は、この現象の研究の豊富な領域を開拓した。私たちのモチベーションは、この研究プログラムの上限を研究することにあります。まず,一般化におけるその役割について検討する。我々はニューラル・コラプス予想を2つの別々の予想に洗練する: 列車集合上の崩壊(最適化特性)と試験分布上の崩壊(一般化特性)である。ニューラル・コラプスは列車のセットで発生することが多いが、テストセットでは発生しない。したがって、神経崩壊は主として最適化現象であり、一般化と無明なつながりを持つと結論づける。次に,機能学習における神経崩壊の役割について検討する。ダウンストリームタスクの転送性能によって測定されるように、トレーニングがより長くなるような、シンプルで現実的な実験を行う。これは、前述したように、神経崩壊が表現学習に必ずしも望ましいわけではないことを示唆している。最後に、「カスケード崩壊」現象の予備的証拠として、最後の層だけでなく、初期の層にも何らかの形態のニューラル崩壊が起こる。私たちの研究は、Neural Collapse研究の豊富なラインを継続し、その固有の制限を考慮しながら、コミュニティを奨励することを願っています。

The recent work of Papyan, Han, & Donoho (2020) presented an intriguing "Neural Collapse" phenomenon, showing a structural property of interpolating classifiers in the late stage of training. This opened a rich area of exploration studying this phenomenon. Our motivation is to study the upper limits of this research program: How far will understanding Neural Collapse take us in understanding deep learning? First, we investigate its role in generalization. We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property). We find that while Neural Collapse often occurs on the train set, it does not occur on the test set. We thus conclude that Neural Collapse is primarily an optimization phenomenon, with as-yet-unclear connections to generalization. Second, we investigate the role of Neural Collapse in feature learning. We show simple, realistic experiments where training longer leads to worse last-layer features, as measured by transfer-performance on a downstream task. This suggests that neural collapse is not always desirable for representation learning, as previously claimed. Finally, we give preliminary evidence of a "cascading collapse" phenomenon, wherein some form of Neural Collapse occurs not only for the last layer, but in earlier layers as well. We hope our work encourages the community to continue the rich line of Neural Collapse research, while also considering its inherent limitations.

翻訳日:2022-02-18 14:28:37 公開日:2022-02-17

# アルツハイマー病関連知識グラフのマイニング : 薬物補充のためのad関連意味三重項の同定

Mining On Alzheimer's Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing ( http://arxiv.org/abs/2202.08712v1 )

ライセンス: Link先を確認

Yi Nian, Xinyue Hu, Rui Zhang, Jingna Feng, Jingcheng Du, Fang Li, Yong Chen and Cui Tao

(参考訳) 現在、ほとんどの神経変性疾患に対して効果的な治療法はない。知識グラフは異種データの包括的および意味的表現を提供し、薬物再精製を含む多くの生体医学的応用でうまく活用されている。本研究の目的は,アルツハイマー病 (AD) と薬剤, 薬物, 栄養補助薬の関係を文献から研究し, 神経変性の進行を予防または遅らせる機会を明らかにすることである。バイオメディカルアノテーションを収集し,SemMedDBを介してSemRepを用いてそれらの関係を抽出した。我々は、データ前処理中にBERTベースの分類器とルールベースの手法の両方を用いて、ほとんどのAD関連セマンティックトリプルを保存しながらノイズを排除した。 1,672,110個のフィルター付きトリプルは知識グラフ補完アルゴリズム(TransE、DistMult、ComplEx)を用いてAD治療や予防に役立つ候補を予測するために使用された。 3つの知識グラフ補完モデルの中で、TransEは他の2つよりも優れていた(MR = 13.45, Hits@1 = 0.306)。予測結果のさらなる評価に時間スライシング手法を活用した。我々のモデルによって予測される最も高いランクの候補に対する支持的な証拠は、我々のアプローチが信頼できる新しい知識を知らせることができることを示している。グラフマイニングモデルは,adと他のエンティティ(サプリメント,化学物質,薬物)との間の信頼性の高い新たな関係を予測できることを示す。構築された知識グラフは、データ駆動の知識発見と新しい仮説の生成を促進することができる。

To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug repurposing. Our objective is to construct a knowledge graph from literature to study relations between Alzheimer's disease (AD) and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among the three knowledge graph completion models, TransE outperformed the other two (MR = 13.45, Hits@1 = 0.306). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most highly ranked candidates predicted by our model, which indicates that our approach can yield reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses.

翻訳日:2022-02-18 14:27:49 公開日:2022-02-17
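
The abstract above reports TransE results as mean rank (MR) and Hits@1. The sketch below shows how a TransE-style score and these two metrics are commonly computed; the L1 distance, the function names, and the toy embeddings used in any call are assumptions made for illustration and do not come from the paper's trained model.

```python
def transe_score(h, r, t):
    """TransE plausibility: negative L1 distance between h + r and t (higher is better)."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def rank_of_true_tail(h, r, true_tail, entity_emb):
    """1-based rank of the true tail entity among all candidate entities."""
    scores = {name: transe_score(h, r, emb) for name, emb in entity_emb.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_tail) + 1

def mean_rank(ranks):
    """Average rank of the true entities; lower is better."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)
```

Read this way, Hits@1 = 0.306 would mean that the true entity was ranked first for about 31% of the evaluated triples.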

# Neural Marionette: ボリュームビデオからの運動骨格と潜在ダイナミクスの教師なし学習

Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video ( http://arxiv.org/abs/2202.08418v1 )

ライセンス: Link先を確認

Jinseok Bae, Hojun Jang, Cheol-Hui Min, Hyungun Choi, Young Min Kim

(参考訳) 神経マリオネット(neural marionette)は、動的シーケンスから骨格構造を発見し、観察された動きのダイナミクスと一致する多様な動きを生成するための教師なしアプローチである。任意の運動下での関節物体の点雲観察のビデオストリームを考えると、運動を効果的に表現できる未知の低次元の骨格関係を発見できる。次に、検出された構造を用いて、相対的な関節回転に復号して全骨格運動を表す潜在構造における動的シーケンスの運動前兆を符号化する。提案手法は, 基礎となる運動や骨格構造についての事前の知識なく動作し, 得られた構造が, 4次元の運動列を表す場合に, ハンドラベルの接地真実骨格と同等であることを示す。骨格構造は、様々なシナリオの運動を生成することができる運動空間の一般的な意味を埋め込む。学習前の動作が多モードシーケンス生成、2つのポーズの補間、異なる骨格構造への動き再ターゲットに一般化可能であることを検証する。

We present Neural Marionette, an unsupervised approach that discovers the skeletal structure from a dynamic sequence and learns to generate diverse motions that are consistent with the observed motion dynamics. Given a video stream of point cloud observation of an articulated body under arbitrary motion, our approach discovers the unknown low-dimensional skeletal relationship that can effectively represent the movement. Then the discovered structure is utilized to encode the motion priors of dynamic sequences in a latent structure, which can be decoded to the relative joint rotations to represent the full skeletal motion. Our approach works without any prior knowledge of the underlying motion or skeletal structure, and we demonstrate that the discovered structure is even comparable to the hand-labeled ground truth skeleton in representing a 4D sequence of motion. The skeletal structure embeds the general semantics of possible motion space that can generate motions for diverse scenarios. We verify that the learned motion prior is generalizable to the multi-modal sequence generation, interpolation of two poses, and motion retargeting to a different skeletal structure.

翻訳日:2022-02-18 14:27:24 公開日:2022-02-17

# CSCNet:クラウド空間における軌道予測のための文脈意味一貫性ネットワーク

CSCNet: Contextual Semantic Consistency Network for Trajectory Prediction in Crowded Spaces ( http://arxiv.org/abs/2202.08506v1 )

ライセンス: Link先を確認

Beihao Xia, Conghao Wong, Qinmu Peng, Wei Yuan, and Xinge You

(参考訳) 軌道予測は、歩行者、自転車、車両などのエージェントの動きの傾向を予測することを目的としている。混雑した空間における人間の活動の分析と理解に役立ち、監視ビデオ分析や自動運転システムなど、多くの分野に広く適用されている。ディープラーニングの成功のおかげで、軌道予測は大幅に進歩した。現在の手法は、社会的相互作用とシーンの物理的制約の下でエージェントの将来の軌跡を研究することに注力しており、これらの要因をどう扱うかは今なお研究者の関心を集めている。しかし、これらの手法は、様々な予測シーンでこれらの相互作用をモデル化する際に、Semantic Shift Phenomenon（意味的シフト現象）を無視している。社会的相互作用と物理的相互作用の内部または相互の間にはいくつかの種類の意味的偏差があり、これを「Gap」と呼ぶ。本稿では、強力かつ効率的なコンテキスト制約によってエージェントの将来の活動を予測する Contextual Semantic Consistency Network（CSCNet）を提案する。シーン画像と軌跡から中間表現を得るために、よく設計されたコンテキスト認識転送を利用する。そして、活動の意味とシーンの意味を整合させてGapを越えることにより、社会的相互作用と物理的相互作用の差異を解消する。実験により、CSCNetは現在のほとんどの手法よりも定量的にも定性的にも優れた性能を示した。

Trajectory prediction aims to predict the movement trend of agents such as pedestrians, bikers, and vehicles. It is helpful for analyzing and understanding human activities in crowded spaces and is widely applied in many areas such as surveillance video analysis and autonomous driving systems. Thanks to the success of deep learning, trajectory prediction has made significant progress. Current methods are dedicated to studying the agents' future trajectories under social interaction and the physical constraints of the scene, and how to deal with these factors still attracts researchers' attention. However, they ignore the Semantic Shift Phenomenon when modeling these interactions in various prediction scenes. There exist several kinds of semantic deviations within or between social and physical interactions, which we call the "Gap". In this paper, we propose a Contextual Semantic Consistency Network (CSCNet) to predict agents' future activities with powerful and efficient context constraints. We utilize a well-designed context-aware transfer to obtain the intermediate representations from the scene images and trajectories. Then we eliminate the differences between social and physical interactions by aligning activity semantics and scene semantics to cross the Gap. Experiments demonstrate that CSCNet performs better than most of the current methods quantitatively and qualitatively.

翻訳日:2022-02-18 14:27:06 公開日:2022-02-17

# CADRE:視覚に基づく自律型都市走行のためのカスケード深部強化学習フレームワーク

CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-based Autonomous Urban Driving ( http://arxiv.org/abs/2202.08557v1 )

ライセンス: Link先を確認

Yinuo Zhao, Kun Wu, Zhiyuan Xu, Zhengping Che, Qi Lu, Jian Tang, Chi Harold Liu

(参考訳) 複雑な都市環境と運転行動のダイナミクスのため、高密度交通における視覚に基づく自律走行は極めて困難である。広く応用された手法は、手作りのルールに大きく依存するか、限られた人間の経験から学習する。本稿では,モデルフリービジョンに基づく自律運転を実現するために,新しいカスケード深層強化学習フレームワークcadreを提案する。 cadreでは、生の観察から代表的潜在性特徴を導出するため、まずコアテンション機構を利用したコアテンション知覚モジュール(copm)をオフラインで訓練し、事前収集した駆動データセットから視覚情報と制御情報との相互関係を学習する。凍結したCoPMを事例として、特に設計された報酬関数の指導の下で、運転ポリシーをオンライン学習するための効率的な分散近位ポリシー最適化フレームワークを提案する。我々は、CARLA NoCrashベンチマークと、自律都市運転タスクにおける特定の障害物回避シナリオを用いて、総合的な実証的研究を行う。実験結果はCADREの有効性と最先端技術に対する優位性を広いマージンで良好に証明した。

Vision-based autonomous urban driving in dense traffic is quite challenging due to the complicated urban environment and the dynamics of the driving behaviors. Widely-applied methods either heavily rely on hand-crafted rules or learn from limited human experience, which makes them hard to generalize to rare but critical scenarios. In this paper, we present a novel CAscade Deep REinforcement learning framework, CADRE, to achieve model-free vision-based autonomous urban driving. In CADRE, to derive representative latent features from raw observations, we first train offline a Co-attention Perception Module (CoPM) that leverages the co-attention mechanism to learn the inter-relationships between the visual and control information from a pre-collected driving dataset. Cascaded with the frozen CoPM, we then present an efficient distributed proximal policy optimization framework to learn the driving policy online under the guidance of specially designed reward functions. We perform a comprehensive empirical study with the CARLA NoCrash benchmark as well as specific obstacle avoidance scenarios in autonomous urban driving tasks. The experimental results demonstrate the effectiveness of CADRE and its superiority over the state-of-the-art by a wide margin.

翻訳日:2022-02-18 14:26:41 公開日:2022-02-17

# 画像分類における早期停止を用いたニューラルアーキテクチャ探索による2段階アーキテクチャの微調整

Two-Stage Architectural Fine-Tuning with Neural Architecture Search using Early-Stopping in Image Classification ( http://arxiv.org/abs/2202.08604v1 )

ライセンス: Link先を確認

Youngkee Kim, Won Joon Yun, Youn Kyu Lee, Joongheon Kim

Deep neural networks (NNs) perform well in various tasks (e.g., computer vision) thanks to convolutional neural networks (CNNs). However, the difficulty of gathering quality data in industry hinders the practical use of NNs. To cope with this issue, the concept of transfer learning (TL) has emerged, which leverages the fine-tuning of NNs trained on large-scale datasets in data-scarce situations. This paper therefore proposes a two-stage architectural fine-tuning method for image classification, inspired by the concept of neural architecture search (NAS). One of the main ideas of the proposed method is mutation of base architectures, which reduces the search cost by using given architectural information. Moreover, early stopping is also applied, which directly reduces NAS costs. Experimental results verify that the proposed method reduces computational and search costs by up to 28.2% and 22.3%, respectively, compared to existing methods.

Translated: 2022-02-18 14:26:20 Published: 2022-02-17
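
The abstract above mentions early stopping as a way to cut search cost. The following is a generic patience-based sketch of that idea, assuming each candidate is judged by a sequence of validation losses; the function name, `patience`, and `min_delta` are illustrative and are not taken from the paper.

```python
def train_with_early_stopping(val_losses, patience=2, min_delta=0.0):
    """Stop once the validation loss has not improved for `patience` epochs.

    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        epochs_run = epoch + 1
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # Skip the remaining epochs to save compute.
    return best_epoch, best_loss, epochs_run


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.40]
print(train_with_early_stopping(losses, patience=2))  # (2, 0.65, 5)
```

In this toy run the later loss of 0.40 is never reached. Early stopping trades a possible late improvement for a shorter run, so the choice of `patience` matters.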

# End-to-End Training of Both Translation Models in the Back-Translation Framework ( http://arxiv.org/abs/2202.08465v1 )

License: see the linked page

DongNyeong Heo and Heeyoul Choi

Semi-supervised learning algorithms in neural machine translation (NMT) have significantly improved translation quality compared to supervised learning algorithms by using additional monolingual corpora. Among them, back-translation is a theoretically well-structured and cutting-edge method. Given two pre-trained NMT models between source and target languages, one translates a monolingual sentence into a latent sentence, and the other reconstructs the monolingual input sentence given the latent sentence. Previous works therefore tried to apply the variational auto-encoder (VAE) training framework to the back-translation framework. However, the discrete nature of the latent sentence made it impossible to use backpropagation in the framework. This paper proposes a categorical reparameterization trick that generates a differentiable sentence, with which we practically implement the VAE training framework for back-translation and train it by end-to-end backpropagation. In addition, we propose several regularization techniques that are especially advantageous to this framework. In our experiments, we demonstrate that our method makes backpropagation available through the latent sentences and improves the BLEU scores on the datasets of the WMT18 translation task.

Translated: 2022-02-18 14:24:22 Published: 2022-02-17
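
The abstract above does not say which categorical reparameterization it uses. As an illustration of the general idea only, the sketch below implements the well-known Gumbel-softmax relaxation for a single token position; treating it as a stand-in for the paper's trick is an assumption, and the function and parameter names are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed (continuous) one-hot sample over the vocabulary.

    The sample is a deterministic function of `logits` and independent
    Gumbel noise, so in an autodiff framework gradients could flow to `logits`.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u kept away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


probs = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
print(probs)  # a probability vector; lower temperatures push it toward one-hot
```

This pure-Python version shows only the sampling step. An end-to-end setup would compute the same expression with a tensor library so that the reconstruction loss can be backpropagated through the sampled sentence.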

# Transformer for Graphs: An Overview from Architecture Perspective ( http://arxiv.org/abs/2202.08455v1 )

License: see the linked page

Erxue Min, Runfa Chen, Yatao Bian, Tingyang Xu, Kangfei Zhao, Wenbing Huang, Peilin Zhao, Junzhou Huang, Sophia Ananiadou, Yu Rong

Recently, the Transformer model, which has achieved great success in many artificial intelligence fields, has demonstrated great potential in modeling graph-structured data. To date, a great variety of Transformer variants have been proposed to adapt to graph-structured data. However, a comprehensive literature review and systematic evaluation of these Transformer variants for graphs are still unavailable. It is imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks. In this survey, we provide a comprehensive review of various Graph Transformer models from the architectural design perspective. We first disassemble the existing models and identify three typical ways to incorporate graph information into the vanilla Transformer: 1) GNNs as Auxiliary Modules, 2) Improved Positional Embedding from Graphs, and 3) Improved Attention Matrix from Graphs. Furthermore, we implement the representative components in the three groups and conduct a comprehensive comparison on various well-known graph data benchmarks to investigate the real performance gain of each component. Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.

Translated: 2022-02-18 14:23:41 Published: 2022-02-17

# GRAPHSHAP: Motif-based Explanations for Black-box Graph Classifiers ( http://arxiv.org/abs/2202.08815v1 )

License: see the linked page

Alan Perotti, Paolo Bajardi, Francesco Bonchi, and André Panisson

Most methods for explaining black-box classifiers (e.g., on tabular data, images, or time series) rely on measuring the impact that the removal/perturbation of features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond essentially to the adjacency information describing the graph structure (i.e., the edges), this matching between feature space and explanation language might not be appropriate. In this regard, we argue that (i) a good explanation method for graph classification should be fully agnostic with respect to the internal representation used by the black-box; and (ii) a good explanation language for graph classification tasks should be represented by higher-order structures, such as motifs. The need to decouple the feature space (edges) from the explanation space (motifs) is thus a major challenge towards developing actionable explanations for graph classification tasks. In this paper we introduce GRAPHSHAP, a Shapley-based approach able to provide motif-based explanations for black-box graph classifiers, assuming no knowledge whatsoever about the model or its training data: the only requirement is that the black-box can be queried at will. Furthermore, we introduce additional auxiliary components such as a synthetic graph dataset generator, algorithms for subgraph mining and ranking, a custom graph convolutional layer, and a kernel to approximate the explanation scores while maintaining linear time complexity. Finally, we test GRAPHSHAP on a real-world brain-network dataset consisting of patients affected by Autism Spectrum Disorder and a control group. Our experiments highlight how the classification provided by a black-box model can be effectively explained by a few connectomics patterns.

Translated: 2022-02-18 14:23:28 Published: 2022-02-17
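
To make the Shapley idea in the abstract above concrete, here is a small sketch that computes exact Shapley values for a handful of motifs against a toy scoring function. It is not GRAPHSHAP itself: the abstract describes a kernel-based approximation, whereas this brute-force version enumerates every coalition. The motif names and `toy_black_box` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a score."""
    n = len(players)
    phi = {}
    for player in players:
        others = [q for q in players if q != player]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain `player`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of `player` to this coalition.
                total += weight * (value(coalition | {player}) - value(coalition))
        phi[player] = total
    return phi


def toy_black_box(motifs_present):
    """Hypothetical classifier score queried with a set of retained motifs."""
    score = 0.0
    if "triangle" in motifs_present:
        score += 0.5
    if "star" in motifs_present and "chain" in motifs_present:
        score += 0.3  # interaction: only counts when both motifs are present
    return score


print(shapley_values(["triangle", "star", "chain"], toy_black_box))
```

The interaction bonus is split evenly between `star` and `chain`, and the three values sum to the score of the full motif set minus the score of the empty set. Exact enumeration grows exponentially with the number of motifs, which is why an approximation is needed in practice.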

# Efficient and Reliable Probabilistic Interactive Learning with Structured Outputs ( http://arxiv.org/abs/2202.08566v1 )

License: see the linked page

Stefano Teso, Antonio Vergari

In this position paper, we study interactive learning for structured output spaces, with a focus on active learning, in which labels are unknown and must be acquired, and on skeptical learning, in which the labels are noisy and may need relabeling. These scenarios require expressive models that guarantee reliable and efficient computation of probabilistic quantities to measure uncertainty. We identify conditions under which a class of probabilistic models -- which we denote CRISPs -- meets all of these requirements, thus delivering tractable computation of the above quantities while preserving expressiveness. Building on prior work on tractable probabilistic circuits, we illustrate how CRISPs enable robust and efficient active and skeptical learning in large structured output spaces.

Translated: 2022-02-18 14:23:00 Published: 2022-02-17

# An Equivalence Between Data Poisoning and Byzantine Gradient Attacks ( http://arxiv.org/abs/2202.08578v1 )

License: see the linked page

Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud

To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience to data poisoning as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.

Translated: 2022-02-18 14:22:47 Published: 2022-02-17

# Optimal sizing of a holdout set for safe predictive model updating ( http://arxiv.org/abs/2202.06374v2 )

License: CC BY-SA 4.0

Sami Haidar-Wehbe, Samuel R Emerson, Louis J M Aslett, James Liley

Risk models in medical statistics and healthcare machine learning are increasingly used to guide clinical or other interventions. Should a model be updated after a guided intervention, it may lead to its own failure at making accurate predictions. The use of a 'holdout set' -- a subset of the population that does not receive interventions guided by the model -- has been proposed to prevent this. Since patients in the holdout set do not benefit from risk predictions, the chosen size must trade off maximising model performance whilst minimising the number of held out patients. By defining a general loss function, we prove the existence and uniqueness of an optimal holdout set size, and introduce parametric and semi-parametric algorithms for its estimation. We demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented solution to the model update problem.

Translated: 2022-02-18 12:50:35 Published: 2022-02-17

# A Survey of Semen Quality Evaluation in Microscopic Videos Using Computer Assisted Sperm Analysis ( http://arxiv.org/abs/2202.07820v2 )

License: CC BY 4.0

Wenwei Zhao, Pingli Ma, Chen Li, Xiaoning Bu, Shuojia Zou, Tao Jiang, Marcin Grzegorzek

Computer Assisted Sperm Analysis (CASA) plays a crucial role in male reproductive health diagnosis and infertility treatment. With the development of the computer industry in recent years, a great number of accurate algorithms have been proposed. With the assistance of these novel algorithms, CASA can achieve faster and higher-quality results. Since image processing, including pre-processing, feature extraction, target detection and tracking, is the technical basis of CASA, these methods are important technical steps in CASA. The various works related to Computer Assisted Sperm Analysis methods in the last 30 years (since 1988) are comprehensively introduced and analysed in this survey. To facilitate understanding, the methods involved are analysed in the sequence of general steps in sperm analysis. In other words, the methods related to sperm detection (localization) are analysed first, and then the methods of sperm tracking are analysed. Besides this, we analyse the present situation of CASA and its future prospects. Based on our work, we explain the feasibility of applying the methods mentioned in this review to sperm microscopic videos. Moreover, existing challenges of object detection and tracking in microscope videos may be solved with inspiration from this survey.

Translated: 2022-02-18 12:34:19 Published: 2022-02-17

# A multi-reconstruction study of breast density estimation using Deep Learning ( http://arxiv.org/abs/2202.08238v2 )

License: CC BY 4.0

Vikash Gupta, Mutlu Demirer, Robert W. Maxwell, Richard D. White, Barbaros Selnur Erdal

Breast density estimation is one of the key tasks in recognizing individuals predisposed to breast cancer, and it is one of the key tasks performed during a screening exam. Dense breasts are more susceptible to breast cancer. Density estimation is often challenging because of low contrast and fluctuations in mammograms' fatty tissue background. Most of the time, breast density is estimated manually: a radiologist assigns one of the four density categories defined by the Breast Imaging Reporting and Data System (BI-RADS). There have been efforts towards automating a breast density classification pipeline. Traditional mammograms are being replaced by tomosynthesis and its other low radiation dose variants (for example, Hologic's Intelligent 2D and C-View). Because of the low-dose requirement, more and more screening centers are favoring the Intelligent 2D view and C-View. Deep-learning studies for breast density estimation use only a single modality for training a neural network. However, doing so restricts the number of images in the dataset. In this paper, we show that a neural network trained on all the modalities at once performs better than a neural network trained on any single modality. We discuss these results using the area under the receiver operating characteristic curves.

Translated: 2022-02-18 12:33:12 Published: 2022-02-17
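
The abstract above reports results as the area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, here is a minimal binary AUC sketch using its pairwise-ranking interpretation. The four-category BI-RADS setting would need a one-vs-rest or similar extension, which is not shown, and the function name is an illustrative choice.

```python
def roc_auc(labels, scores):
    """Binary AUC: probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop is quadratic in the number of examples. That is fine for illustration, though a rank-based formulation is the usual choice for large evaluation sets.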

# Beyond Natural Motion: Exploring Discontinuity for Video Frame Interpolation ( http://arxiv.org/abs/2202.07291v2 )

License: see the linked page

Sangjin Lee, Hyeongmin Lee, Chajin Shin, Hanbin Son, Sangyoun Lee

Video interpolation is the task of synthesizing the intermediate frame given two consecutive frames. Most previous studies have focused on appropriate frame warping operations and refinement modules for the warped frames. These studies have been conducted on natural videos containing only continuous motions. However, many practical videos contain a lot of discontinuous motions, such as chat windows, watermarks, GUI elements, or subtitles. To address these issues, we propose three techniques that expand the concept of transition between two consecutive frames. The first is a new architecture that can separate continuous and discontinuous motion areas. We also propose a novel data augmentation strategy called figure-text mixing (FTM) to make our model learn more general scenarios. Finally, we propose loss functions that supervise the discontinuous motion areas together with the data augmentation. We collected a special dataset consisting of mobile game and chat videos. We show that our method significantly improves the interpolation quality of the videos in the special dataset. Moreover, our model outperforms the state-of-the-art methods on natural video datasets containing only continuous motions, such as DAVIS and UCF101.

Translated: 2022-02-18 12:27:06 Published: 2022-02-17

# User-Oriented Robust Reinforcement Learning ( http://arxiv.org/abs/2202.07301v2 )

License: see the linked page

Haoyi You, Beichen Yu, Haiming Jin, Zhaoxing Yang, Jiahui Sun, Xinbing Wang

Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. In practice, however, a user of an RL policy may have different preferences over its performance across environments. The aforementioned max-min robustness is often too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL, and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two different UOR-RL training algorithms for the scenarios with or without an a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate knowledge, or no knowledge at all, about the environment distribution. Furthermore, we carry out extensive experimental evaluations in 4 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average and worst-case performance metrics and, more importantly, establishes new state-of-the-art performance under the UOR metric.

Translated: 2022-02-18 12:26:50 Published: 2022-02-17
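
The abstract above describes a metric that weights environments by user preference and contains max-min robustness as a special case. The sketch below is one simple reading of that description and should not be taken as the paper's exact UOR definition: per-environment returns are sorted from worst to best and combined with user-chosen rank weights. The function name and the weighting scheme are assumptions.

```python
def rank_weighted_robustness(returns, weights):
    """Weighted average of per-environment returns, ordered worst to best.

    weights[0] applies to the worst environment, weights[-1] to the best.
    """
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    ordered = sorted(returns)  # worst-performing environment first
    return sum(w * r for w, r in zip(weights, ordered)) / total_weight


env_returns = [10.0, 2.0, 6.0]
print(rank_weighted_robustness(env_returns, [1, 0, 0]))  # 2.0, the max-min case
print(rank_weighted_robustness(env_returns, [1, 1, 1]))  # 6.0, the plain average
print(rank_weighted_robustness(env_returns, [3, 2, 1]))  # a preference in between
```

Putting all weight on the first rank recovers the worst-case criterion and uniform weights recover the average, so a user can choose any degree of conservatism between the two.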

# When Does A Spectral Graph Neural Network Fail in Node Classification? ( http://arxiv.org/abs/2202.07902v2 )

License: see the linked page

Zhixian Chen, Tengfei Ma and Yang Wang

Spectral Graph Neural Networks (GNNs) with various graph filters have received wide recognition due to their promising performance in graph learning problems. However, it is known that GNNs do not always perform well. Although graph filters provide theoretical foundations for model explanations, it is unclear when a spectral GNN will fail. In this paper, focusing on node classification problems, we conduct a theoretical analysis of spectral GNNs' performance by investigating their prediction error. With the aid of the graph indicators we propose, including homophily degree and response efficiency, we establish a comprehensive understanding of the complex relationships between graph structure, node labels, and graph filters. We show that graph filters with low response efficiency on label difference are prone to fail. To enhance GNNs' performance, we derive from our theoretical analysis a provably better strategy for filter design: using data-driven filter banks. We also propose simple models for empirical validation. Experimental results are consistent with our theoretical results and support our strategy.

Translated: 2022-02-18 12:26:26 Published: 2022-02-17

# Conversational Speech Recognition By Learning Conversation-level Characteristics ( http://arxiv.org/abs/2202.07855v2 )

License: see the linked page

Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech that includes multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is specifically adopted to bias the outputs of the decoder towards words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves up to a 12% relative character error rate (CER) reduction.

Translated: 2022-02-18 12:26:12 Published: 2022-02-17

PDF registration status (publication date: 20220217)