Fugu-MT: arXiv Paper Translations

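This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.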

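The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).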

These are the papers with a publication date of 20210709.

Title	Authors	Abstract	Publication date / Translation date
# ファンデルワールスヘテロ界面におけるマグノン-励起子近接結合 Magnon-exciton proximity coupling at a van der Waals heterointerface ( http://arxiv.org/abs/2006.14257v2 ) ライセンス: Link先を確認	Arnaud Gloppe and Masaru Onga and Ryusuke Hisatomi and Atac Imamoglu and Yasunobu Nakamura and Yoshihiro Iwasa and Koji Usami	(参考訳) スピンとフォトニックシステムは、現代の情報デバイスや新しい量子技術の中心にある。半導体の電子正孔対(励起子)と磁性結晶の集団スピン励起(マグノン)の相互作用は、これらの異種系を橋渡しし、新しい相互接続デバイスにおける個々の資産を活用する。本稿では,磁性薄膜と原子層厚の半導体との界面におけるマグノン-励起子結合について報告する。我々のアプローチは、イットリウム鉄ガーネット (YIG) フィルムに宿る長寿命マグノンを遷移金属ジカルコゲナイド (MoSe$_2$) のフレーク中の強く束縛された励起子に結合させる。マグノンは界面交換相互作用によって支配される動的谷ゼーマン効果を励起子に誘導する。この初期のハイブリッドシステムは、マイクロ波と光領域間の情報伝達の新しい機会を示唆している。 Spin and photonic systems are at the heart of modern information devices and emerging quantum technologies. An interplay between electron-hole pairs (excitons) in semiconductors and collective spin excitations (magnons) in magnetic crystals would bridge these heterogeneous systems, leveraging their individual assets in novel interconnected devices. Here, we report the magnon-exciton coupling at the interface between a magnetic thin film and an atomically-thin semiconductor. Our approach allies the long-lived magnons hosted in a film of yttrium iron garnet (YIG) to strongly-bound excitons in a flake of a transition metal dichalcogenide, MoSe$_2$. The magnons induce on the excitons a dynamical valley Zeeman effect ruled by interfacial exchange interactions. This nascent class of hybrid system suggests new opportunities for information transduction between microwave and optical regions.	翻訳日:2023-05-12 20:03:37 公開日:2021-07-09
# 連続作業証明の圧縮Oracle技術とポスト量子セキュリティについて On the Compressed-Oracle Technique, and Post-Quantum Security of Proofs of Sequential Work ( http://arxiv.org/abs/2010.11658v4 ) ライセンス: Link先を確認	Kai-Min Chung, Serge Fehr, Yu-Hsuan Huang, Tai-Ning Liao	(参考訳) 我々は、zhandryが量子ランダムオラクルモデル(qrom)で量子アルゴリズムを分析するために導入した圧縮オラクル技術について再検討する。まず、並列クエリQROMに容易に拡張できる手法の簡潔な説明を行い、各クエリラウンドにおいて、考慮されたアルゴリズムが複数のクエリを並列にQROMに生成する。このQROMの変形により、よりきめ細かいクエリ・複雑度解析が可能になる。我々の主な技術的貢献は、クエリ複雑性の結果を証明するために圧縮されたオラクル技術を使用する(並列クエリの一般化)フレームワークである。我々のフレームワークが組み込まれているため、任意の場合に、純粋に古典的な推論によって量子的クエリの複雑さを下げることが可能である。それよりも、典型的には古典的境界をもたらす重要な古典的観測は、対応する量子境界を結論付けるのに十分である。我々はこれをいくつかの例で示し、既知の結果(並列Groverの最適性など)を復元すると同時に、新しい結果(並列BHT衝突探索の最適性など)を得る。私たちの主なターゲットは、$q$以下の並列クエリ、すなわち、$x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$。上記のハッシュ連鎖を見つける問題は、シーケンシャルな作業の証明の文脈において重要な問題である。実際、我々の技術の具体的な暗号的応用として、CohenとPietrzakが提唱した"Simple Proofs of Sequential Work"が量子攻撃に対して安全であることを示す。このような分析は、単に新しいバウンドをプラグインすることではなく、プロトコル全体を量子攻撃の光で分析する必要がある。私たちのフレームワークのおかげで、これは純粋に古典的な推論で実現できます。 We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows for a more fine-grained query-complexity analysis. Our main technical contribution is a framework that simplifies the use of (the parallel-query generalization of) the compressed oracle technique for proving query complexity results. With our framework in place, whenever applicable, it is possible to prove quantum query complexity lower bounds by means of purely classical reasoning. More than that, for typical examples the crucial classical observations that give rise to the classical bounds are sufficient to conclude the corresponding quantum bounds. 
We demonstrate this on a few examples, recovering known results (like the optimality of parallel Grover), but also obtaining new results (like the optimality of parallel BHT collision search). Our main target is the hardness of finding a $q$-chain with fewer than $q$ parallel queries, i.e., a sequence $x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$. The above problem of finding a hash chain is of fundamental importance in the context of proofs of sequential work. Indeed, as a concrete cryptographic application of our techniques, we prove that the "Simple Proofs of Sequential Work" proposed by Cohen and Pietrzak remains secure against quantum attacks. Such an analysis is not simply a matter of plugging in our new bound; the entire protocol needs to be analyzed in the light of a quantum attack. Thanks to our framework, this can now be done with purely classical reasoning.	翻訳日:2023-04-28 01:08:00 公開日:2021-07-09
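The $q$-chain defined in the abstract above, $x_i = H(x_{i-1})$ for $1 \leq i \leq q$, can be illustrated with a minimal classical sketch. It assumes SHA-256 as a stand-in for the random oracle $H$; the function names `hash_chain` and `is_q_chain` are hypothetical and not taken from the paper. It shows only the sequential structure of the problem and says nothing about the quantum lower bound itself.

```python
import hashlib


def hash_chain(x0: bytes, q: int) -> list[bytes]:
    """Return [x_0, x_1, ..., x_q] with x_i = H(x_{i-1}).

    SHA-256 plays the role of H here; this is an assumption made
    purely for illustration.
    """
    chain = [x0]
    for _ in range(q):
        # Each link needs the previous one, so the q evaluations
        # cannot be reordered or run side by side in this naive loop.
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def is_q_chain(chain: list[bytes]) -> bool:
    """Check that every element is the hash of its predecessor."""
    return all(
        hashlib.sha256(prev).digest() == cur
        for prev, cur in zip(chain, chain[1:])
    )
```

Verifying a chain costs as many hash evaluations as building it, but the verification calls are independent of each other, whereas the naive construction is inherently sequential.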
# 多成分散乱マトリックスの調査:ユニタリティと対称性 Surveying the Multicomponent Scattering Matrix: Unitarity and Symmetries ( http://arxiv.org/abs/2010.15926v2 ) ライセンス: Link先を確認	L. Diago-Cisneros, J. J. Flores-Godoy and G. Fern\'andez-Anaya	(参考訳) 成分が混合および同期的に伝播するスピン電荷キャリアの多成分多バンドフラックスは、非ゼロの入射振幅を持つが、任意の基底集合に対して散乱行列上の標準ユニタリティ条件に従わない。そのような場合には、ユニタリティー保存のための量子輸送問題の基本となるロバストな理論手順を導出し、これを \emph{structured unitarity condition} と名付けた。我々のアプローチは、包絡関数近似(EFA)内の $(N \times N)$ 相互作用成分($N \geq 2$)を扱い、しかしながら($N = 1$)散乱行列の標準ユニタリ特性を回復する。むしろ基底集合および/または出力散乱係数に対する任意の条件は、構成とスピノリアル空間の両方において \emph{eigen}-函数が正規直交化されている場合、もはや必要ではない。エルミート・ハミルトニアン(Hermitian Hamiltonian)によってEFA内で記述された様々な種類のマルチバンド・マルチコンポーネント物理系に対して、このモデルが有効であると期待する。我々は、状態ベクトル伝達行列の相互作用を、その条件数の大きい値とともに、散乱実験におけるトンネルチャネルの閾値をより正確に定義するための新しい補完的ツールとして予測する。 Multicomponent-multiband fluxes of spin-charge carriers, whose components propagate mixed and synchronously, with \emph{a priori} nonzero incoming amplitudes, do not obey the standard unitarity condition on the scattering matrix for an arbitrary basis set. For such cases, we have derived a robust theoretical procedure, which is fundamental in quantum-transport problems for unitarity preservation and which we have named the \emph{structured unitarity condition}. Our approach deals with $(N \times N)$ interacting components (for $N \geq 2$), within the envelope function approximation (EFA), and yet the standard unitary properties of the ($N = 1$) scattering matrix are recovered. Rather arbitrary conditions on the basis set and/or on the output scattering coefficients are no longer required, if the \emph{eigen}-functions are orthonormalized in both the configuration and the spinorial spaces. We expect the present model to be workable, for different kinds of multiband-multicomponent physical systems described by Hermitian Hamiltonians within the EFA, with small transformations if any. 
We foretell the interplay for the state-vector transfer matrix, together with the large values of its condition number, as novel complementary tools for a more accurate definition of the threshold for tunnelling channels in a scattering experiment.	翻訳日:2023-04-27 00:29:36 公開日:2021-07-09
# 空洞を介する相関トンネルによる自己組織型トポロジカル絶縁体 Self-organized topological insulator due to cavity-mediated correlated tunneling ( http://arxiv.org/abs/2011.01687v3 ) ライセンス: Link先を確認	Titas Chanda, Rebecca Kraus, Giovanna Morigi, Jakub Zakrzewski	(参考訳) トポロジカル材料は量子技術に潜在的な応用がある。トポロジカル絶縁体や超伝導体などの非相互作用型トポロジカル材料は基本対称性クラスによって分類される。一方、相互作用がトポロジカルな性質にどのように影響するかは部分的にしか理解されていない。本稿では,単粒子力学と大域的相互作用の量子干渉からトポロジーが出現するモデルについて述べる。このシステムは、1次元格子内の大域的相関ホッピングを介して相互作用するソフトコアボソンによって構成される。量子干渉の開始は格子並進対称性の自発的な破れにつながり、対応する相は有名なSu-Schrieffer-Heegerモデルの非自明な状態に似ている。フェルミオンのパイエルス不安定性と同様に、出現する量子相はトポロジカル絶縁体であり、半分の充填で見られる。量子干渉から派生したこのトポロジカル相は「正確な」密度行列繰り込み群計算で見られ、平均場アプローチでは完全に欠落している。これらのダイナミクスはキャビティ量子電磁力学の設定のような既存の実験プラットフォームで実現可能であり、共振器から放出される光でトポロジカルな特徴が明らかにできると主張している。 Topological materials have potential applications for quantum technologies. Non-interacting topological materials, such as topological insulators and superconductors, are classified by means of fundamental symmetry classes. It is instead only partially understood how interactions affect topological properties. Here, we discuss a model where topology emerges from the quantum interference between single-particle dynamics and global interactions. The system is composed of soft-core bosons that interact via global correlated hopping in a one-dimensional lattice. The onset of quantum interference leads to spontaneous breaking of the lattice translational symmetry; the corresponding phase resembles nontrivial states of the celebrated Su-Schrieffer-Heeger model. Like the fermionic Peierls instability, the emerging quantum phase is a topological insulator and is found at half filling. Originating from quantum interference, this topological phase is found in "exact" density-matrix renormalization group calculations and is entirely absent in the mean-field approach. 
We argue that these dynamics can be realized in existing experimental platforms, such as cavity quantum electrodynamics setups, where the topological features can be revealed in the light emitted by the resonator.	翻訳日:2023-04-25 11:48:16 公開日:2021-07-09
# 高エネルギー物理における弱値増幅:B中間子崩壊におけるCP対称性の破れの高精度測定の事例研究 Weak Value Amplification in High Energy Physics: A Case Study for Precision Measurement of CP Violation in B Meson Decays ( http://arxiv.org/abs/2011.07560v4 ) ライセンス: Link先を確認	Satoshi Higashino, Yuichiro Mori, Yosuke Takubo, Takeo Higuchi, Akimasa Ishikawa, Izumi Tsutsui	(参考訳) 1988年にAharonovらによって提唱された弱値増幅法は、精密測定のために物理学の様々な分野に適用され、物理過程における最終状態を積極的に特定する「ポストセレクション」の自由を利用して実現されている。本稿では,高エネルギー粒子物理学における弱値増幅の手法,特にB中間子崩壊におけるCP対称性の破れのパラメータの測定において,崩壊モードの有効寿命がポストセレクションにより統計的に長引くことが期待される場合,その実現可能性について初めて報告する。解析の結果,SuperKEKBコライダーでのBelle II実験では有効寿命が最大2.6倍に長くなる可能性があり,CP対称性の破れのパラメータの測定精度も向上することが示唆された。 The technique of weak value amplification, proposed by Aharonov et al. in 1988, has been applied to various fields of physics for the purpose of precision measurement, which is made possible by exploiting the freedom of `postselection' specifying actively the final state in the physical process. Here we report for the first time the feasibility of utilizing the technique of weak value amplification in high energy particle physics, especially in measuring the CP-violating parameters in B meson decays, where the effective lifetime of the decay mode is expected to be prolonged statistically due to the postselection. Our analysis shows that, when adopted in the Belle II experiment at the SuperKEKB collider, the effective lifetime may be prolonged up to 2.6 times, and that the measurement precision of the CP-violating parameters will also be improved by its effect.	翻訳日:2023-04-24 01:40:35 公開日:2021-07-09
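The amplification mechanism that the abstract above relies on can be sketched with the textbook weak value $A_w = \langle f|A|i\rangle / \langle f|i\rangle$, which grows without bound as the postselected state $|f\rangle$ approaches orthogonality with the preselected state $|i\rangle$. The sketch below is a generic two-level illustration with $\sigma_z$ as the observable. The states, the angle `phi`, and the function names are assumptions chosen for illustration, and nothing here reproduces the B meson analysis of the paper.

```python
import math


def weak_value(observable, pre, post):
    """Weak value <post|A|pre> / <post|pre> for finite-dimensional states."""
    dim = len(pre)
    numerator = sum(
        post[j].conjugate() * observable[j][k] * pre[k]
        for j in range(dim)
        for k in range(dim)
    )
    overlap = sum(post[j].conjugate() * pre[j] for j in range(dim))
    return numerator / overlap


# Pauli-z observable; its eigenvalues are +1 and -1.
SIGMA_Z = [[1, 0], [0, -1]]


def sigma_z_weak_value(phi):
    """Weak value of sigma_z for |i> = (|0> + |1>)/sqrt(2) and
    |f> = cos(phi)|0> - sin(phi)|1> (hypothetical illustrative states).

    The overlap <f|i> vanishes at phi = pi/4, so the weak value
    leaves the eigenvalue range [-1, 1] as phi approaches pi/4.
    """
    pre = [complex(1 / math.sqrt(2)), complex(1 / math.sqrt(2))]
    post = [complex(math.cos(phi)), complex(-math.sin(phi))]
    return weak_value(SIGMA_Z, pre, post)
```

For these states the weak value reduces to $(\cos\phi + \sin\phi)/(\cos\phi - \sin\phi)$, which makes the trade-off explicit: a larger amplification comes with a smaller postselection probability $|\langle f|i\rangle|^2$.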
# モデルレチナールの定常光異性化における極度パラメトリック感度 Extreme Parametric Sensitivity in the Steady-State Photoisomerization of Model Retinal ( http://arxiv.org/abs/2011.14342v3 ) ライセンス: Link先を確認	Chern Chuang and Paul Brumer	(参考訳) ロドプシン中のレチナール発色団の光異性化反応を熱浴に結合した2状態2モードモデルを用いて計算した。定常状態(10ps以上)での反応量子収率は、それらの過渡値とはかなり異なることが分かり、これらの系における過渡と定常状態のダイナミクスの間に弱い相関が示唆された。さらに, 定常量子収率は系パラメータの微妙な変化に対して高い感度を示したが, 過渡力学はほとんど影響を受けなかった。このような感度と非断熱振電系の標準的な準位間隔統計との相関は、量子カオスに起源がある可能性を示唆している。本現象の実験的観察の可能性とその凝縮相光化学および生物光センシングにおける意義について考察した。 The photoisomerization reaction of the retinal chromophore in rhodopsin was computationally studied using a two-state two-mode model coupled to thermal baths. Reaction quantum yields at the steady state (10 ps and beyond) were found to be considerably different from their transient values, suggesting a weak correlation between transient and steady-state dynamics in these systems. Significantly, the steady-state quantum yield was highly sensitive to minute changes in system parameters, while transient dynamics was nearly unaffected. Correlation of such sensitivity with standard level spacing statistics of the nonadiabatic vibronic system suggests a possible origin in quantum chaos. The feasibility of experimental observation of this phenomenon and its implications in condensed-phase photochemistry and biological light sensing are discussed.	翻訳日:2023-04-22 16:42:21 公開日:2021-07-09
# 線形ポールトラップにおけるラジアル2次元イオン結晶 Radial two-dimensional ion crystals in a linear Paul trap ( http://arxiv.org/abs/2012.12766v5 ) ライセンス: Link先を確認	Marissa D'Onofrio, Yuanheng Xie, A.J. Rasmusson, Evangeline Wolanski, Jiafeng Cui, and Philip Richerme	(参考訳) 線形ポールトラップの"radial-2D"相における二次元(2次元)クーロン結晶を実験的に研究した。この相はラジアル面に完全に整列した2次元イオン格子によって同定され、軸方向とラジアル方向のトラップ電位の比を大きくすることで形成される。最大19個の$^{171}$Yb$^+$イオンの配列を用いて、固有のマイクロモーションによって駆動される時間依存性イオン位置にもかかわらず、そのような結晶の構造相境界と振動モード周波数が擬ポテンシャル近似によって適切に記述されていることを示す。さらに,マイクロモーションによるradial-2D結晶の加熱がラジアル面に制限されていることを観察した。最後に、ほとんどのイオントラップ量子シミュレーションで使用される横運動モードが、この幾何学において分離されたまま低温に保たれていることを検証した。本研究では,ラジアル2次元イオン結晶を量子シミュレーションと計算における様々な理論的提案を実現するための強固な実験プラットフォームとして確立する。 We experimentally study two-dimensional (2D) Coulomb crystals in the "radial-2D" phase of a linear Paul trap. This phase is identified by a 2D ion lattice aligned entirely with the radial plane and is created by imposing a large ratio of axial to radial trapping potentials. Using arrays of up to 19 $^{171}$Yb$^+$ ions, we demonstrate that the structural phase boundaries and vibrational mode frequencies of such crystals are well-described by the pseudopotential approximation, despite the time-dependent ion positions driven by intrinsic micromotion. We further observe that micromotion-induced heating of the radial-2D crystal is confined to the radial plane. Finally, we verify that the transverse motional modes, which are used in most ion-trap quantum simulation schemes, remain decoupled and cold in this geometry. Our results establish radial-2D ion crystals as a robust experimental platform for realizing a variety of theoretical proposals in quantum simulation and computation.	翻訳日:2023-04-19 19:36:34 公開日:2021-07-09
# 閉・開放型量子電池におけるエネルギー貯蔵とコヒーレンス Energy storage and coherence in closed and open quantum batteries ( http://arxiv.org/abs/2012.15026v4 ) ライセンス: Link先を確認	Francesco Caravelli, Bin Yan, Luis Pedro Garcia-Pintos, Alioscia Hamma	(参考訳) 閉鎖型および開放型量子電池におけるコヒーレンスの役割について検討する。我々は、コヒーレンスの観点から、クローズドとオープンの両方の量子バッテリによって実行される仕事やエネルギーの上限を求める。具体的には、電池の進化をエンコードするユニタリ作用素のスペクトル基底における密度行列のヒルベルト・シュミットコヒーレンスによってエネルギー貯蔵が束縛されることを示す。また、電池のハミルトニアンコヒーレンスの観点から、可換作用素の評価により類似した境界が得られることを示す。これらの境界を閉系の場合の4状態量子系と異方性XYイジングモデル、開系の場合のスピン-ボソンモデルに適用する。 We study the role of coherence in closed and open quantum batteries. We obtain upper bounds to the work performed or energy exchanged by both closed and open quantum batteries in terms of coherence. Specifically, we show that the energy storage can be bounded by the Hilbert-Schmidt coherence of the density matrix in the spectral basis of the unitary operator that encodes the evolution of the battery. We also show that an analogous bound can be obtained in terms of the battery's Hamiltonian coherence in the basis of the unitary operator by evaluating their commutator. We apply these bounds to a 4-state quantum system and the anisotropic XY Ising model in the closed system case, and the Spin-Boson model in the open case.	翻訳日:2023-04-18 08:07:08 公開日:2021-07-09
# リアリスティック量子ビットにおける加速断熱ゲートの解析設計:超電導回路の一般理論と応用 Analytic Design of Accelerated Adiabatic Gates in Realistic Qubits: General Theory and Applications to Superconducting Circuits ( http://arxiv.org/abs/2102.02370v2 ) ライセンス: Link先を確認	F. Setiawan, Peter Groszkowski, Hugo Ribeiro, and Aashish A. Clerk	(参考訳) 断熱性へのショートカットは、断熱量子プロトコルを高速化するための一般的な方法であり、量子情報処理に多くの潜在的な応用がある。残念ながら、複雑な相互作用と数個を超える準位を持つシステムに対して、解析的にショートカットを構築することは難しい作業である。これは通常、理想化されたハミルトニアン(例えば、エネルギー準位の限られた部分集合のみが保持され、回転波近似(RWA)が用いられる)を仮定することで克服される。ここでは、これらの制限を超えることができる解析的アプローチを開発する。本手法は一般的であり,非断熱誤差と非RWA誤差の両方を補正するパルス形状を解析的に導出する。また,本手法は従来の非断熱プロトコルよりも少ない駆動力を必要とするパルスが得られることを示す。我々は,高忠実な単一量子ビット"三脚"ゲートを現実的な超伝導フラクソニウム量子ビットで解析的に設計する方法を詳細に示す。 Shortcuts to adiabaticity is a general method for speeding up adiabatic quantum protocols, and has many potential applications in quantum information processing. Unfortunately, analytically constructing shortcuts to adiabaticity for systems having complex interactions and more than a few levels is a challenging task. This is usually overcome by assuming an idealized Hamiltonian [e.g., only a limited subset of energy levels are retained, and the rotating-wave approximation (RWA) is made]. Here we develop an \emph{analytic} approach that allows one to go beyond these limitations. Our method is general and results in analytically derived pulse shapes that correct both nonadiabatic errors and non-RWA errors. We also show that our approach can yield pulses requiring a smaller driving power than conventional nonadiabatic protocols. We show in detail how our ideas can be used to analytically design high-fidelity single-qubit "tripod" gates in a realistic superconducting fluxonium qubit.	翻訳日:2023-04-12 20:11:09 公開日:2021-07-09
# 10kg物体の運動基底状態への接近 Approaching the motional ground state of a 10 kg object ( http://arxiv.org/abs/2102.12665v2 ) ライセンス: Link先を確認	Chris Whittle, Evan D. Hall, Sheila Dwyer, Nergis Mavalvala, Vivishek Sudhir, R. Abbott, A. Ananyeva, C. Austin, L. Barsotti, J. Betzwieser, C. D. Blair, A. F. Brooks, D. D. Brown, A. Buikema, C. Cahillane, J. C. Driggers, A. Effler, A. Fernandez-Galiana, P. Fritschel, V. V. Frolov, T. Hardwick, M. Kasprzack, K. Kawabe, N. Kijbunchoo, J. S. Kissel, G. L. Mansell, F. Matichard, L. McCuller, T. McRae, A. Mullavey, A. Pele, R. M. S. Schofield, D. Sigg, M. Tse, G. Vajente, D. C. Vander-Hyde, Hang Yu, Haocun Yu, C. Adams, R. X. Adhikari, S. Appert, K. Arai, J. S. Areeda, Y. Asali, S. M. Aston, A. M. Baer, M. Ball, S. W. Ballmer, S. Banagiri, D. Barker, J. Bartlett, B. K. Berger, D. Bhattacharjee, G. Billingsley, S. Biscans, R. M. Blair, N. Bode, P. Booker, R. Bork, A. Bramley, K. C. Cannon, X. Chen, A. A. Ciobanu, F. Clara, C. M. Compton, S. J. Cooper, K. R. Corley, S. T. Countryman, P. B. Covas, D. C. Coyne, L. E. H. Datrier, D. Davis, C. Di Fronzo, K. L. Dooley, P. Dupej, T. Etzel, M. Evans, T. M. Evans, J. Feicht, P. Fulda, M. Fyffe, J. A. Giaime, K. D. Giardina, P. Godwin, E. Goetz, S. Gras, C. Gray, R. Gray, A. C. Green, E. K. Gustafson, R. Gustafson, J. Hanks, J. Hanson, R. K. Hasskew, M. C. Heintze, A. F. Helmling-Cornell, N. A. Holland, J. D. Jones, S. Kandhasamy, S. Karki, P. J. King, Rahul Kumar, M. Landry, B. B. Lane, B. Lantz, M. Laxen, Y. K. Lecoeuche, J. Leviton, J. Liu, M. Lormand, A. P. Lundgren, R. Macas, M. MacInnis, D. M. Macleod, S. M\'arka, Z. M\'arka, D. V. Martynov, K. Mason, T. J. Massinger, R. McCarthy, D. E. McClelland, S. McCormick, J. McIver, G. Mendell, K. Merfeld, E. L. Merilh, F. Meylahn, T. Mistry, R. Mittleman, G. Moreno, C. M. Mow-Lowry, S. Mozzon, T. J. N. Nelson, P. Nguyen, L. K. Nuttall, J. Oberling, Richard J. Oram, C. Osthelder, D. J. Ottaway, H. Overmier, J. R. Palamos, W. Parker, E. Payne, R. 
Penhorwood, C. J. Perez, M. Pirello, H. Radkins, K. E. Ramirez, J. W. Richardson, K. Riles, N. A. Robertson, J. G. Rollins, C. L. Romel, J. H. Romie, M. P. Ross, K. Ryan, T. Sadecki, E. J. Sanchez, L. E. Sanchez, T. R. Saravanan, R. L. Savage, D. Schaetzl, R. Schnabel, E. Schwartz, D. Sellers, T. Shaffer, B. J. J. Slagmolen, J. R. Smith, S. Soni, B. Sorazu, A. P. Spencer, K. A. Strain, L. Sun, M. J. Szczepa\'nczyk, M. Thomas, P. Thomas, K. A. Thorne, K. Toland, C. I. Torrie, G. Traylor, A. L. Urban, G. Valdes, P. J. Veitch, K. Venkateswara, G. Venugopalan, A. D. Viets, T. Vo, C. Vorvick, M. Wade, R. L. Ward, J. Warner, B. Weaver, R. Weiss, B. Willke, C. C. Wipf, L. Xiao, H. Yamamoto, L. Zhang, M. E. Zucker, J. Zweizig	(参考訳) 機械的物体(人間サイズの物体でさえ)の運動は、量子力学の規則によって制御されるべきである。熱環境は、物体の動きの量子的シグネチャを隠蔽する。実際、熱環境は大規模スケールでの量子力学の修正提案の効果を隠蔽している。 10kgの機械振動子の質量中心運動を、平均フォノン占有量10.8の状態で作成する。室温から77 nkまでの温度の低下は、フィードバックによる量子バックアクションの11桁の抑制と、その運動基底状態に近い物体の質量の13桁の桁の上昇とが共通している。これは、巨大な量子系の重力を観測する可能性を示している。 The motion of a mechanical object -- even a human-sized object -- should be governed by the rules of quantum mechanics. Coaxing them into a quantum state is, however, difficult: the thermal environment masks any quantum signature of the object's motion. Indeed, the thermal environment also masks effects of proposed modifications of quantum mechanics at large mass scales. We prepare the center-of-mass motion of a 10 kg mechanical oscillator in a state with an average phonon occupation of 10.8. The reduction in temperature, from room temperature to 77 nK, is commensurate with an 11 orders-of-magnitude suppression of quantum back-action by feedback -- and a 13 orders-of-magnitude increase in the mass of an object prepared close to its motional ground state. This begets the possibility of probing gravity on massive quantum systems.	翻訳日:2023-04-09 23:01:18 公開日:2021-07-09
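The two numbers quoted in the abstract above, a mean phonon occupation of 10.8 and an effective temperature of 77 nK, can be related with a short consistency check. The sketch assumes a single mode in a thermal state, so that Bose-Einstein statistics $\bar n = 1/(e^{\hbar\omega/k_B T} - 1)$ apply. The oscillator frequency is not stated in the abstract, so the code only infers the frequency implied by that assumption. The function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def mean_occupation(temperature_k, frequency_hz):
    """Bose-Einstein mean occupation of one mode at a given temperature."""
    x = HBAR * 2.0 * math.pi * frequency_hz / (K_B * temperature_k)
    return 1.0 / math.expm1(x)


def implied_frequency_hz(temperature_k, occupation):
    """Invert the Bose-Einstein formula for the mode frequency.

    Assumes the quoted temperature and occupation describe the same
    single thermal mode; this is an illustrative assumption only.
    """
    omega = (K_B * temperature_k / HBAR) * math.log(1.0 + 1.0 / occupation)
    return omega / (2.0 * math.pi)
```

Calling `implied_frequency_hz(77e-9, 10.8)` gives a frequency on the order of a hundred hertz, which is the scale at which the two quoted figures are mutually consistent under this assumption.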
# 合成次元における高次量子例外点の生成 Generating high-order quantum exceptional points in synthetic dimensions ( http://arxiv.org/abs/2102.13646v2 ) ライセンス: Link先を確認	Ievgen I. Arkhipov, Fabrizio Minganti, Adam Miranowicz, Franco Nori	(参考訳) 近年,散逸系における高次例外点(EP)の構成法の提案と開発が盛んに行われている。これらのEPは、キラル輸送や感度の向上など、多くの興味深い特性を持つことができる。高次EPを持つ非エルミート型ハミルトニアン(NHH)を実現する以前の提案は、主に結合モードの空間ネットワークを直接構築するか、例えば空間格子を時間や光子数空間にマッピングする合成次元の活用に基づいている。どちらの手法も、古典的またはポストセレクトされた量子場を記述する有効NHHの構築に依存しており、量子ジャンプの影響を無視し、したがって、量子ジャンプの確率が励起数や散逸率によって増加するとき、{\it 量子領域}におけるスケーラビリティの問題に悩まされる。ここでは、二次リウビリアン超作用素の完全な量子力学を考慮し、系作用素モーメントの進化行列から導かれる高次量子EPを持つNHHをシンプルかつ効果的に設計する手法を提案する。すなわち、例えば二次二モード系のシステム作用素の高次モーメントを量子化することにより、結果として得られる進化行列は、例えば結合共振器の空間格子を記述する代替のNHHとして解釈でき、そこでは、空間サイトは場モーメントの合成空間における高次フィールドモーメントとして表現される。例として、非コヒーレントモードカップリングを持つ {\it bimodal} キャビティを記述する$U(1)$対称な二次リウビリアンを考える。これは反$\cal PT$対称性も持つことができ、そのフィールドモーメントダイナミクスは高次EPを持つ結合共振器の空間 {\it network} を制御するNHHにマッピングできる。 Recently, there has been intense research in proposing and developing various methods for constructing high-order exceptional points (EPs) in dissipative systems. These EPs can possess a number of intriguing properties related to, e.g., chiral transport and enhanced sensitivity. Previous proposals to realize non-Hermitian Hamiltonians (NHHs) with high-order EPs have been mainly based on either direct construction of spatial networks of coupled modes or utilization of synthetic dimensions, e.g., of mapping spatial lattices to time or photon-number space. Both methods rely on the construction of effective NHHs describing classical or postselected quantum fields, which neglect the effects of quantum jumps, and which, thus, suffer from a scalability problem in the {\it quantum regime}, when the probability of quantum jumps increases with the number of excitations and dissipation rate. 
Here, by considering the full quantum dynamics of a quadratic Liouvillian superoperator, we introduce a simple and effective method for engineering NHHs with high-order quantum EPs, derived from evolution matrices of system operators moments. That is, by quantizing higher-order moments of system operators, e.g., of a quadratic two-mode system, the resulting evolution matrices can be interpreted as alternative NHHs describing, e.g., a spatial lattice of coupled resonators, where spatial sites are represented by high-order field moments in the synthetic space of field moments. As an example, we consider a $U(1)$-symmetric quadratic Liouvillian describing a {\it bimodal} cavity with incoherent mode coupling, which can also possess anti-$\cal PT$-symmetry, whose field moment dynamics can be mapped to an NHH governing a spatial {\it network} of coupled resonators with high-order EPs.	翻訳日:2023-04-09 20:22:12 公開日:2021-07-09
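What an exceptional point of a non-Hermitian Hamiltonian looks like can be seen in the simplest textbook case, a two-mode gain-loss dimer $H = \begin{pmatrix} i\gamma & g \\ g & -i\gamma \end{pmatrix}$ whose eigenvalues $\pm\sqrt{g^2 - \gamma^2}$ coalesce at $g = \gamma$. The sketch below is only a second-order illustration under that assumed model. It is not the moment-based construction of high-order EPs described in the abstract above, and the names `eigenvalues_2x2` and `dimer_eigenvalues` are hypothetical.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of the matrix [[a, b], [c, d]] from its trace and determinant."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def dimer_eigenvalues(g, gamma):
    """Eigenvalues of H = [[i*gamma, g], [g, -i*gamma]].

    g is the coupling and gamma the balanced gain/loss rate
    (an illustrative model, not taken from the paper).
    """
    return eigenvalues_2x2(1j * gamma, g, g, -1j * gamma)
```

For `g > gamma` the two eigenvalues are real and distinct, for `g < gamma` they are purely imaginary, and at `g == gamma` they merge into a single value, which is the exceptional point.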
# 原子と分子による2光子の絡み合った吸収:量子光学チュートリアル Entangled Two-Photon Absorption by Atoms and Molecules: A Quantum Optics Tutorial ( http://arxiv.org/abs/2103.02551v3 ) ライセンス: Link先を確認	Michael G. Raymer, Tiemo Landes and Andrew H. Marcus	(参考訳) 分子と時間周波数エンタングル光子対(EPP)との2光子吸収(TPA)やその他の非線形相互作用は、様々な興味深い効果を示すと予測されている。そのため、実用的な量子増強分子分光法での可能性は、綿密な検査を必要とする。本稿では, 分子による1光子および2光子吸収の詳細な理論的研究をチュートリアル形式で行い, 光の量子的性質の扱いについて述べる。基本量子光学理論を概観し、分子光学応答の密度行列(リウヴィル)による導出を概観し、光の量子状態をどのように取り扱いに組み込むかを強調した。例示として、自然パラメトリックダウン変換によって生成される光子対のTPAを詳細に扱い、量子光のTPAが古典光とどのように異なるかを強調している。特に, 絡み合った状態を用いて, どの程度のTPA率向上が達成できるかという問題を論じる。この論文は、既知の理論手法と結果のレビュー、およびいくつかの拡張、特に遠方共振中間状態のみを介して発生するTPAプロセスと、位相緩和過程によって非共振中間状態を含むTPAプロセスの比較を含む。また, 絡み合ったTPAの実験的研究に直面する主な課題についても概説した。 Two-photon absorption (TPA) and other nonlinear interactions of molecules with time-frequency-entangled photon pairs (EPP) have been predicted to display a variety of fascinating effects. Therefore, their potential use in practical quantum-enhanced molecular spectroscopy requires close examination. This paper presents in tutorial style a detailed theoretical study of one- and two-photon absorption by molecules, focusing on how to treat the quantum nature of light. We review some basic quantum optics theory, then we review the density-matrix (Liouville) derivation of molecular optical response, emphasizing how to incorporate quantum states of light into the treatment. For illustration we treat in detail the TPA of photon pairs created by spontaneous parametric down conversion, with an emphasis on how quantum light TPA differs from that with classical light. In particular, we treat the question of how much enhancement of the TPA rate can be achieved using entangled states. The paper includes a review of known theoretical methods and results, as well as some extensions, especially the comparison of TPA processes that occur via far-off-resonant intermediate states only and those that involve an off-resonant intermediate state by virtue of dephasing processes. 
A brief discussion of the main challenges facing experimental studies of entangled TPA is also given.	翻訳日:2023-04-09 07:51:45 公開日:2021-07-09
# トモグラフィーデータへの量子ノイズモデルの適用 Fitting quantum noise models to tomography data ( http://arxiv.org/abs/2103.17243v2 ) ライセンス: Link先を確認	Emilio Onorati, Tamara Kohler, and Toby Cubitt	(参考訳) ノイズの存在は、現在、大規模な量子計算を達成するための主要な障害の1つである。量子ハードウェアにおけるノイズプロセスの特徴付けと理解の戦略は、特に完全なエラー修正とフォールトトレランスのオーバーヘッドが現在のハードウェアの範囲を超えているため、それを緩和する重要な部分である。非マルコフ効果は特に好ましくない種類のノイズであり、標準技術を用いて解析することは困難であり、誤り訂正を用いて制御することが困難である。本研究では,マルコフマスター方程式の厳密な数学的理論に基づいて,未知雑音過程の解析・評価を行う効率的なアルゴリズムを開発した。時間に依存しないマルコフ力学(あるいはほぼマルコフ力学)の場合、このアルゴリズムは最も適したリンドブラジアン、すなわち、与えられた精度内でトモグラフィデータを最も近似するメモリレス量子チャネルの生成子を出力する。非マルコフ力学の場合、このアルゴリズムは等方性雑音付加の観点で非マルコフ性についての定量的かつ操作上有意義な尺度を返す。我々は全てのアルゴリズムのpython実装を提供し、cirqプラットフォームを用いて生成された合成雑音トモグラフィデータの1ビットおよび2量子ビットのサンプルでこれらをベンチマークします。数値計算の結果から,本アルゴリズムは,計測力学に対する最適リンドブラジアンの完全な記述と,解析計算に適合する非マルコフ性を正確に計算することに成功した。 The presence of noise is currently one of the main obstacles to achieving large-scale quantum computation. Strategies to characterise and understand noise processes in quantum hardware are a critical part of mitigating it, especially as the overhead of full error correction and fault-tolerance is beyond the reach of current hardware. Non-Markovian effects are a particularly unfavorable type of noise, being both harder to analyse using standard techniques and more difficult to control using error correction. In this work we develop a set of efficient algorithms, based on the rigorous mathematical theory of Markovian master equations, to analyse and evaluate unknown noise processes. In the case of time-independent Markovian (or nearly Markovian) dynamics, our algorithm outputs the best-fit Lindbladian, i.e., the generator of a memoryless quantum channel which best approximates the tomographic data to within the given precision. In the case of non-Markovian dynamics, our algorithm returns a quantitative and operationally meaningful measure of non-Markovianity in terms of isotropic noise addition. 
We provide a Python implementation of all our algorithms, and benchmark these on a range of 1- and 2-qubit examples of synthesised noisy tomography data, generated using the Cirq platform. The numerical results show that our algorithms succeed both in extracting a full description of the best-fit Lindbladian to the measured dynamics, and in computing accurate values of non-Markovianity that match analytical calculations.	翻訳日:2023-04-06 00:48:27 公開日:2021-07-09
# $T\bar T$-変形フェルミオン理論の再検討 $T\bar T$-deformed Fermionic Theories Revisited ( http://arxiv.org/abs/2104.09529v2 ) ライセンス: Link先を確認	Kyung-Sun Lee, Piljin Yi and Junggi Yoon	(参考訳) 我々は、量子化に向けてフェルミオンを持つ$d=2$理論の$T\bar T$変形を再考する。簡単な例として、変形したディラックブラケットをMajorana doubletで計算し、既知の固有値の流れを摂動的に確認する。我々は主に、ワールドシート計量を積分消去することで弦のような理論から再構成できる$T\bar T$理論を考える。 NSRのようなフェルミオンやGSのようなフェルミオンを加えると、これがどのように働くかを簡単に説明した後、ネーターエネルギー運動量に基づいて、後者から$\mathcal{N}=(1,1)$理論の既知の非超対称性の$T\bar T$変形を得る。このワールドシートの再構成は、後者が実際には$d=3$ GS-likeモデルの超対称部分セクターであることを意味し、隠れたスーパーチャージの存在を示唆する。我々はこれを明示的に構成する。これにより、明白に超対称な$T\bar T$のような異なる$T\bar T$変形や、より一般的には対称エネルギー運動量を経由する変形について問うことになる。フェルミオンを持つ理論では、そのような選択は、潜在的にユニタリティの問題を伴う自由度を2倍にすることが多い。余剰セクターが「小さな変形」極限で発散するギャップを持ち、赤外で分離することを示すが、これらをどのような意味で変形と考えられるかは定かではない。 We revisit $T\bar T$ deformations of $d=2$ theories with fermions with a view toward the quantization. As a simple illustration, we compute the deformed Dirac bracket for a Majorana doublet and confirm the known eigenvalue flows perturbatively. We mostly consider those $T\bar T$ theories that can be reconstructed from string-like theories upon integrating out the worldsheet metric. After a quick overview of how this works when we add NSR-like or GS-like fermions, we obtain a known non-supersymmetric $T\bar T$ deformation of a $\mathcal{N}=(1,1)$ theory from the latter, based on the Noether energy-momentum. This worldsheet reconstruction implies that the latter is actually a supersymmetric subsector of a $d=3$ GS-like model, implying hidden supercharges, which we do construct explicitly. This brings us to ask about different $T\bar T$ deformations, such as manifestly supersymmetric $T\bar T$ and also more generally via the symmetric energy-momentum. We show that, for theories with fermions, such choices often lead us to doubling of degrees of freedom, with potential unitarity issues. 
We show that the extra sector develops a divergent gap in the "small deformation" limit and decouples in the infrared, although it remains uncertain in what sense these can be considered a deformation.	翻訳日:2023-04-03 04:43:16 公開日:2021-07-09
# 動的バックアクションマグノメカニクス Dynamical Backaction Magnomechanics ( http://arxiv.org/abs/2104.11218v2 ) ライセンス: Link先を確認	C.A. Potts, E. Varga, V.A.S.V. Bittencourt, S. Viola Kusminskiy and J.P. Davis	(参考訳) 光機械系の放射圧による動的バックアクションは、機械振動を操作するための汎用的な道具であることが証明されている。特に、動的バックアクションは、メカニカル共振器の基底状態への冷却、フォノン発振の駆動、絡み合った状態の生成、光バネ効果の観測に繋がった。ある磁性材料では、機械的振動は磁歪相互作用を介して磁気励起(マグノン)と相互作用し、類似のマグノン誘起動的バックアクションを引き起こす。本稿では,マグノン誘起動的バックアクションが球状磁気試料の機械的振動に与える影響を直接観察する。さらに,近年の多くの理論的提案において,動的バックアクション効果が重要な役割を担っている。 Dynamical backaction resulting from radiation pressure forces in optomechanical systems has proven to be a versatile tool for manipulating mechanical vibrations. Notably, dynamical backaction has resulted in the cooling of a mechanical resonator to its ground-state, driving phonon lasing, the generation of entangled states, and observation of the optical-spring effect. In certain magnetic materials, mechanical vibrations can interact with magnetic excitations (magnons) via the magnetostrictive interaction, resulting in an analogous magnon-induced dynamical backaction. In this article, we directly observe the impact of magnon-induced dynamical backaction on a spherical magnetic sample's mechanical vibrations. Moreover, dynamical backaction effects play a crucial role in many recent theoretical proposals; thus, our work provides the foundation for future experimental work pursuing many of these theoretical proposals.	翻訳日:2023-04-02 20:09:39 公開日:2021-07-09
# エンタングル量子ウンルーオットーエンジンはより効率的である Entangled quantum Unruh Otto engine is more efficient ( http://arxiv.org/abs/2105.11709v2 ) ライセンス: Link先を確認	Gaurang Ramakant Kane, Bibhas Ranjan Majhi	(参考訳) 2量子ビットの絡み合った状態とそれらの複合励起状態(あるいは基底状態)との間の相対論的量子オットーサイクルを提案し、その効率が通常の1量子ビットの量子オットーエンジンよりも高くなりうることを示す。高温および低温の熱浴は、背景場と個々の量子ビット間の相互作用とともに、これらの量子ビットに均一な加速度を与えることによって構成される。量子ビットのフレームの1つから測定された効率は、状態のエネルギーギャップだけでなく、それらの間の相対加速度にも依存する。観測者の量子ビットの加速度が他方の量子ビットより小さい場合、サイクルは単一量子ビット量子オットーエンジンよりも効率的である。さらに、そのようなサイクルを構築するための完全なプロトコルを与える。 We propose a relativistic quantum Otto cycle between an entangled state of two qubits and their composite excited (or ground) state whose efficiency can be greater than the usual single qubit quantum Otto engine. The hot and cold reservoirs are constructed by providing uniform accelerations to these qubits along with the interaction between the background field and individual qubits. The efficiency, as measured from one of the qubits' frame, not only depends on the energy gap of the states but also on the relative acceleration between them. For lower acceleration of our observer's qubit compared to the other one, the cycle is more efficient than the single qubit quantum Otto engine. Furthermore, a complete protocol to construct such a cycle is provided.	翻訳日:2023-03-29 21:09:07 公開日:2021-07-09
# ESR: 人工知能研究の倫理と社会観 ESR: Ethics and Society Review of Artificial Intelligence Research ( http://arxiv.org/abs/2106.11521v2 ) ライセンス: Link先を確認	Michael S. Bernstein, Margaret Levi, David Magnus, Betsy Rajala, Debra Satz, Charla Waeiss	(参考訳) 人工知能(AI)の研究は、その現実的および潜在的な社会への影響について定期的に批判されており、我々はこの批判とそれが反映する責任に対する十分な制度的な反応を欠いている。 AI研究は、人間の社会に害を与えるのではなく、人に対する害を評価するように設計された、制度審査委員会(IRB)のような既存のフィードバックメカニズムの見地から外れることが多い。そこで我々は,AI研究の否定的倫理的側面と社会的側面を緩和するためのフィードバックパネルであるEthics and Society Review Board (ESR)を開発した。研究者は、この提案のためにesrプロセスが完了するまで、私たちの大学で主要なai資金プログラムから助成金を受けられません。本稿では、41の提案で最初の1年間に設計し、実行してきたESRについて述べる。我々はこれらの提案に関するESRの総合的なフィードバックを分析し、このパネルがマイノリティグループに対する害の問題を最もよく特定していること、研究計画における多様な利害関係者の関与、二重利用、データの表現について調べる。 esrと対話した研究者の調査では、58%が研究プロジェクトの設計に影響を与えていると感じており、100%は将来のプロジェクトをesrに提出し続け、倫理や社会問題を通じて推論のための足場を探していた。 Artificial intelligence (AI) research is routinely criticized for its real and potential impacts on society, and we lack adequate institutional responses to this criticism and to the responsibility that it reflects. AI research often falls outside the purview of existing feedback mechanisms such as the Institutional Review Board (IRB), which are designed to evaluate harms to human subjects rather than harms to human society. In response, we have developed the Ethics and Society Review board (ESR), a feedback panel that works with researchers to mitigate negative ethical and societal aspects of AI research. The ESR's main insight is to serve as a requirement for funding: researchers cannot receive grant funding from a major AI funding program at our university until the researchers complete the ESR process for the proposal. In this article, we describe the ESR as we have designed and run it over its first year across 41 proposals. We analyze aggregate ESR feedback on these proposals, finding that the panel most commonly identifies issues of harms to minority groups, inclusion of diverse stakeholders in the research plan, dual use, and representation in data. 
Surveys and interviews of researchers who interacted with the ESR found that 58% felt that it had influenced the design of their research project, 100% are willing to continue submitting future projects to the ESR, and that they sought additional scaffolding for reasoning through ethics and society issues.	翻訳日:2023-03-25 21:11:27 公開日:2021-07-09
# レーン変化のアトラス:顧客艦隊の測定データを用いた位置依存型レーン変化行動の検討 The Atlas of Lane Changes: Investigating Location-dependent Lane Change Behaviors Using Measurement Data from a Customer Fleet ( http://arxiv.org/abs/2107.04029v2 ) ライセンス: Link先を確認	Florian Wirthmüller, Jochen Hipp, Christian Reichenbächer and Manfred Reichert	(参考訳) 周辺交通参加者の行動予測は、運転支援システムや自動運転システムにとって重要かつ困難な課題である。今日のアプローチは、主に交通状況の動的側面をモデル化し、これに基づいて交通参加者の行動を予測することに焦点を当てている。本稿では、位置特異的なa-プリオリレーン変化確率を計算することにより、この共通プラクティスを拡大する第一歩を踏み出す。人間の運転行動は、それぞれの場所によって全く同じ交通状況で異なるかもしれない。例えば、運転手は自問自答する:私はすぐにトラックを目の前で通り過ぎるべきか、あるいは、わずか数キロ先にあるルートの曲がりくねった部分に到達するまで待つべきなのか? このような情報は単独で行動予測を許すには程遠いが、今日のアプローチがそのような位置固有のa-priori確率を予測に組み込むことで大いに有益であることは明らかである。例えば、高速道路のインターチェンジは車線変更を行うドライバーのモチベーションを高めがちであるが、カーブは車線変更削減効果を持っているようである。それにもかかわらず、すべての検討された地域条件の調査は、様々な効果の重畳が、いくつかの場所で予期せぬ確率をもたらすことを示している。そこで我々は,車載予測システムを支援するために,顧客艦隊データに基づく車線変更確率マップを動的に構築,維持することを提案する。信頼できる車線変更確率を導出するためには、広い顧客層が成功の鍵となる。 The prediction of surrounding traffic participants behavior is a crucial and challenging task for driver assistance and autonomous driving systems. Today's approaches mainly focus on modeling dynamic aspects of the traffic situation and try to predict traffic participants behavior based on this. In this article we take a first step towards extending this common practice by calculating location-specific a-priori lane change probabilities. The idea behind this is straight forward: The driving behavior of humans may vary in exactly the same traffic situation depending on the respective location. E.g. drivers may ask themselves: Should I pass the truck in front of me immediately or should I wait until reaching the less curvy part of my route lying only a few kilometers ahead? Although, such information is far away from allowing behavior prediction on its own, it is obvious that today's approaches will greatly benefit when incorporating such location-specific a-priori probabilities into their predictions. 
For example, our investigations show that highway interchanges tend to enhance driver's motivation to perform lane changes, whereas curves seem to have lane change-dampening effects. Nevertheless, the investigation of all considered local conditions shows that superposition of various effects can lead to unexpected probabilities at some locations. We thus suggest dynamically constructing and maintaining a lane change probability map based on customer fleet data in order to support onboard prediction systems with additional information. For deriving reliable lane change probabilities a broad customer fleet is the key to success.	翻訳日:2023-03-25 18:12:51 公開日:2021-07-09
# ダイヤモンド核スピンジャイロスコープの実証 Demonstration of diamond nuclear spin gyroscope ( http://arxiv.org/abs/2107.04257v1 ) ライセンス: Link先を確認	Andrey Jarmola, Sean Lourette, Victor M. Acosta, A. Glen Birdwell, Peter Blümler, Dmitry Budker, Tony Ivanov, Vladimir S. Malinovsky	(参考訳) 我々は,ダイヤモンド中の窒素空孔(NV)色中心に固有な原子核スピンである$^{14}$Nに基づく回転センサの動作を実証する。このセンサーは、核の光偏光と読み出しと、無線周波数の2量子パルスプロトコルを使用し、$^{14}$N原子核スピンの歳差運動をモニターする。この測定プロトコルは、$^{14}$N四極子分割における温度変化に対する感度を抑え、NV電子スピン遷移に共鳴するマイクロ波パルスを必要としない。この装置は回転プラットフォーム上でテストされ、感度は4.7$^{\circ}/\sqrt{\rm{s}}$ (13 mHz/$\sqrt{\rm{Hz}}$)、バイアス安定性は0.4$^{\circ}$/s (1.1 mHz)であった。 We demonstrate operation of a rotation sensor based on the $^{14}$N nuclear spins intrinsic to nitrogen-vacancy (NV) color centers in diamond. The sensor employs optical polarization and readout of the nuclei and a radio-frequency double-quantum pulse protocol that monitors $^{14}$N nuclear spin precession. This measurement protocol suppresses the sensitivity to temperature variations in the $^{14}$N quadrupole splitting, and it does not require microwave pulses resonant with the NV electron spin transitions. The device was tested on a rotation platform and demonstrated a sensitivity of 4.7 $^{\circ}/\sqrt{\rm{s}}$ (13 mHz/$\sqrt{\rm{Hz}}$), with bias stability of 0.4 $^{\circ}$/s (1.1 mHz).	翻訳日:2023-03-23 00:08:24 公開日:2021-07-09
# インターフェロメトリ質量分析 Interferometric mass spectrometry ( http://arxiv.org/abs/2107.04256v1 ) ライセンス: Link先を確認	Radu Ionicioiu	(参考訳) 加速器質量分析法(accelerator mass spectrometry, AMS)は、地質学、分子生物学、考古学など、様々な応用分野において広く用いられている技術である。非常に正確ではあるが、AMSはタンデム加速器とバルク磁石を必要とし、大きな実験室に閉じ込める。本稿では、量子干渉を用いた新しい質量分離法であるインターフェロメトリ質量分析法(IMS)を提案する。 IMSは試料の波状特性を採用しており、試料が粒子状であるAMSと相補的である。この相補性は2つの重要な結果をもたらす。 (i) 絶対質量$m$に従ってIMS分離を行うが、AMSのように質量/電荷比$m/q$には対応しない。 (ii) IMSでは, 試料は低速度状態にあるが, AMSで使用される高速度状態とは対照的である。 IMSの潜在的な応用は、モバイルアプリケーションのためのコンパクトなデバイス、加速段階で壊れる感受性分子、イオン化が難しい中性試料である。 Accelerator mass spectrometry (AMS) is a widely-used technique with multiple applications, including geology, molecular biology and archeology. Although extremely precise, AMS requires tandem accelerators and bulky magnets which confines it to large laboratories. Here we propose interferometric mass spectrometry (IMS), a novel method of mass separation which uses quantum interference. IMS employs the wave-like properties of the samples, and as such is complementary to AMS, in which samples are particle-like. This complementarity has two significant consequences: (i) in IMS separation is performed according to the absolute mass $m$, and not to the mass-to-charge ratio $m/q$, as in AMS; (ii) in IMS the samples are in the low-velocity regime, in contrast to the high-velocity regime used in AMS. Potential applications of IMS are compact devices for mobile applications, sensitive molecules that break at the acceleration stage and neutral samples which are difficult to ionise.	翻訳日:2023-03-23 00:08:08 公開日:2021-07-09
# 回路qedにおける単一光子状態キュービットとコヒーレント状態キュービット間の量子絡み合い状態の伝達 Transferring quantum entangled states between multiple single-photon-state qubits and coherent-state qubits in circuit QED ( http://arxiv.org/abs/2107.04203v1 ) ライセンス: Link先を確認	Qi-Ping Su, Hanyu Zhang, Chui-Ping Yang	(参考訳) 超伝導フラックス量子ビットに結合した2n個のマイクロ波キャビティを用いて、n個の単光子状態(SPS)量子ビットをn個のコヒーレント状態(CS)量子ビットに最大あるいは部分的に絡み合った状態に転送する方法を提案する。ここでのSPS量子ビットの2つの論理状態は空洞の真空状態と単光子状態で表され、CS量子ビットの2つの論理状態は空洞の2つのコヒーレント状態で符号化される。カプラとして1つの超伝導クトリットのみを使用するため、回路アーキテクチャは大幅に単純化される。状態転送の動作時間はキュービット数の増加とともに増加しない。系の散逸が無視できる場合、量子状態は測定が不要であるため決定論的に転送することができる。さらに、カプラキュトリットの高エネルギー中間レベルは全操作中に励起されず、キュトリットからの脱コヒーレンスを大幅に抑制する。具体例として、2つのSPS量子ビットのベル状態の2つのCS量子ビットへの高忠実転送が、現在の回路QED技術で実現可能であることを示す。最後に、散逸が無視できるとき、n個のCS量子ビットの絡み合った状態は、逆演算を行うことでn個のSPS量子ビットに戻すことができることに注意する必要がある。この提案は非常に一般的であり、自然な原子または人工原子を使用して2nマイクロ波または光学共振器を結合することにより、同じタスクを達成するために拡張することができる。 We present a way to transfer maximally- or partially-entangled states of n single-photon-state (SPS) qubits onto n coherent-state (CS) qubits, by employing 2n microwave cavities coupled to a superconducting flux qutrit. The two logic states of a SPS qubit here are represented by the vacuum state and the single-photon state of a cavity, while the two logic states of a CS qubit are encoded with two coherent states of a cavity. Because of using only one superconducting qutrit as the coupler, the circuit architecture is significantly simplified. The operation time for the state transfer does not increase with the increasing of the number of qubits. When the dissipation of the system is negligible, the quantum state can be transferred in a deterministic way since no measurement is required. Furthermore, the higher-energy intermediate level of the coupler qutrit is not excited during the entire operation and thus decoherence from the qutrit is greatly suppressed. 
As a specific example, we numerically demonstrate that the high-fidelity transfer of a Bell state of two SPS qubits onto two CS qubits is achievable within the present-day circuit QED technology. Finally, it is worthy to note that when the dissipation is negligible, entangled states of n CS qubits can be transferred back onto n SPS qubits by performing reverse operations. This proposal is quite general and can be extended to accomplish the same task, by employing a natural or artificial atom to couple 2n microwave or optical cavities.	翻訳日:2023-03-23 00:07:35 公開日:2021-07-09
# ポラリトンシミュレーションによる実空間における超高速コヒーレンス非局在化 Ultrafast Coherence Delocalization in Real Space Simulated by Polaritons ( http://arxiv.org/abs/2107.04162v1 ) ライセンス: Link先を確認	Bo Xiang, Zimo Yang, Yi-Zhuang You, Wei Xiong	(参考訳) 超高速2次元赤外高スペクトルイメージングにより, 時間, 周波数, 空間領域における結合キャビティ分子偏光子プラットフォーム上のコヒーレンス非局在化を検討した。周波数および実空間において一方向コヒーレンス非局在化(一つの空洞から別の空洞へのコヒーレンス移動で調製したコヒーレンス)が観察された。この方向性は、lindblad dynamicsによって記述された高エネルギーモードから低エネルギーモードへの非局在光子の散逸によって実現された。さらなる実験により、コヒーレンスがキャビティ間(異なるキャビティからのポラリトン間の重ね合わせ)で直接合成されたとき、エネルギー的に近傍のポラリトンのみが長距離環境変動を生き残ったコヒーレンスを形成することができた。リンドブラッド力学とともに、この結果はコヒーレンスが1段階の機構を通じて非局在化され、光子が1つの空洞から別の空洞へ移動し、自然と人工の量子系におけるコヒーレンス進化に光を遮蔽することを示した。この研究は光子と分子モードを組み合わせてコヒーレンス力学をシミュレートする方法も示した。 We investigated coherence delocalization on a coupled-cavity molecular polariton platform in time, frequency, and spatial domains, enabled by ultrafast two-dimensional infrared hyperspectral imaging. Unidirectional coherence delocalization (coherence prepared in one cavity transfer to another cavity) was observed in frequency and real spaces. This directionality was enabled by dissipation of delocalized photon from high-energy to low-energy modes, described by Lindblad dynamics. Further experiments showed that when coherences were directly prepared across cavities (superpositions between polaritons from different cavities), only energetically nearby polaritons could form coherences that survived the long-range environmental fluctuation. Together with the Lindblad dynamics, this result implied that coherences delocalized through a one-step mechanism where photons transferred from one cavity to another, shedding lights to coherence evolution in natural and artificial quantum systems. This work also demonstrated a way of combining photon and molecular modes to simulate coherence dynamics.	翻訳日:2023-03-23 00:07:06 公開日:2021-07-09
# 集団結合レジームにおける振動ポラリトン化学の理論 Theory of Vibrational Polariton Chemistry in the Collective Coupling Regime ( http://arxiv.org/abs/2107.04156v1 ) ライセンス: Link先を確認	Arkajit Mandal, Xinyang Li, Pengfei Huo	(参考訳) 分子振動を光学キャビティと結合させることで化学反応速度定数を著しく抑制し,集合結合効果と速度定数のキャビティ周波数変化の両方を示すことを理論的に証明した。反応座標が溶媒分子に強く結合すると、動的カウジング効果により反応速度定数が低下する。また, 溶媒をキャビティに結合させることにより, この動的カウジング効果をさらに高め, 化学速度のさらなる抑制が期待できることを示した。この効果はキャビティ損失を考慮するとさらに増幅される。 We theoretically demonstrate that chemical reaction rate constant can be significantly suppressed by coupling molecular vibrations with an optical cavity, exhibiting both the collective coupling effect and the cavity-frequency modification of the rate constant. When a reaction coordinate is strongly coupled to the solvent molecules, the reaction rate constant is reduced due to the dynamical caging effect. We demonstrate that collectively coupling the solvent to the cavity can further enhance this dynamical caging effect, leading to additional suppression of the chemical kinetics. This effect is further amplified when cavity loss is considered.	翻訳日:2023-03-23 00:06:44 公開日:2021-07-09
# ランダウアー境界を超える実験的非平衡メモリ消去 Experimental nonequilibrium memory erasure beyond Landauer's bound ( http://arxiv.org/abs/2107.04429v1 ) ライセンス: Link先を確認	Mario A. Ciampini, Tobias Wenzl, Michael Konopik, Gregor Thalhammer, Markus Aspelmeyer, Eric Lutz, Nikolai Kiesel	(参考訳) デジタル情報のクリーンな世界は、ノイズの多い物理デバイスに基づいている。ランダウアーの原理は、論理的に不可逆な変換のエネルギー消費と熱生成の限界を低く設定することで、情報処理と基礎となる熱力学の深い関係を提供する。ランダウアーの元々の定式化は平衡を仮定するが、実際の装置はしばしば平衡から遠く離れている。メモリ状態の非平衡特性により、消費電力の低減と負の熱発生を伴う全消去が可能となることを実験的に示す。最適化された消去プロトコルを2状態メモリに実装する。この目的のために, 非線形ポテンシャルランドスケープの動的形状をレヴィトダイナミクスの強力なツールとして, および非平衡過程の研究として導入する。 The clean world of digital information is based on noisy physical devices. Landauer's principle provides a deep connection between information processing and the underlying thermodynamics by setting a lower limit on the energy consumption and heat production of logically irreversible transformations. While Landauer's original formulation assumes equilibrium, real devices often do operate far from equilibrium. We show experimentally that the nonequilibrium character of a memory state enables full erasure with reduced power consumption as well as negative heat production. We implement the optimized erasure protocols in an optomechanical two-state memory. To this end, we introduce dynamical shaping of nonlinear potential landscapes as a powerful tool for levitodynamics as well as the investigation of far-from-equilibrium processes.	翻訳日:2023-03-23 00:02:12 公開日:2021-07-09
# ド・ジッター時空の双曲真空中の球によって誘起されるカシミール密度 Casimir densities induced by a sphere in the hyperbolic vacuum of de Sitter spacetime ( http://arxiv.org/abs/2107.04376v1 ) ライセンス: Link先を確認	A. A. Saharian, T. A. Petrosyan	(参考訳) モードの完全集合とアダマール関数は (D+1)-次元ド・ジッター時空における球面内および外部のスカラー場に対して負の定数曲率空間で分離される。体は球面上のロビン境界条件に従うと仮定する。球によって誘導されるアダマール関数の寄与を明示的に分離し, 双極子およびエネルギー-モーメントテンソルの真空期待値(VEVs)を双曲真空に対して検討した。平坦な時空極限では、後者はミルネ宇宙の共形真空に還元され、最大対称の束-デイヴィス真空状態とは異なる。真空エネルギー運動量テンソルは、放射方向のエネルギーフラックスを記述する非零オフ対角成分を有する。後者は純粋に球面誘起効果であり、境界自由幾何には存在しない。ロビン境界条件の定数と放射座標によっては、エネルギーフラックスは球面からまたは球面へ向けることができる。宇宙膨張の初期段階では、時空曲率が球によって誘導されるVEVに与える影響は弱く、対応する膨張の先頭の項は、ミルヌ宇宙の球のそれと一致する。重力場の影響は、膨張の後期において不可欠である。磁場質量と曲率結合パラメータに依存すると、時間座標の関数としての球誘起VEVの崩壊は単調または減衰振動である。球面から遠く離れたところでは、測地線距離の関数としての球面誘起VEVの降下は、質量場と質量場の両方に対して指数関数的である。 Complete set of modes and the Hadamard function are constructed for a scalar field inside and outside a sphere in (D+1)-dimensional de Sitter spacetime foliated by negative constant curvature spaces. We assume that the field obeys Robin boundary condition on the sphere. The contributions in the Hadamard function induced by the sphere are explicitly separated and the vacuum expectation values (VEVs) of the field squared and energy-momentum tensor are investigated for the hyperbolic vacuum. In the flat spacetime limit the latter is reduced to the conformal vacuum in the Milne universe and is different from the maximally symmetric Bunch-Davies vacuum state. The vacuum energy-momentum tensor has a nonzero off-diagonal component that describes the energy flux in the radial direction. The latter is a purely sphere-induced effect and is absent in the boundary-free geometry. Depending on the constant in Robin boundary condition and also on the radial coordinate, the energy flux can be directed either from the sphere or towards the sphere. 
At early stages of the cosmological expansion the effects of the spacetime curvature on the sphere-induced VEVs are weak and the leading terms in the corresponding expansions coincide with those for a sphere in the Milne universe. The influence of the gravitational field is essential at late stages of the expansion. Depending on the field mass and the curvature coupling parameter, the decay of the sphere-induced VEVs, as functions of the time coordinate, is monotonic or damping oscillatory. At large distances from the sphere the fall-off of the sphere-induced VEVs, as functions of the geodesic distance, is exponential for both massless and massive fields.	翻訳日:2023-03-23 00:01:36 公開日:2021-07-09
# 分離可能な数値範囲による自信エンタングルメント検出 Confident entanglement detection via separable numerical range ( http://arxiv.org/abs/2107.04365v1 ) ライセンス: Link先を確認	Timo Simnacher, Jakub Czartowski, Konrad Szymański and Karol Życzkowski	(参考訳) 我々は、複数の測定値のジョイント(分離可能な)数値範囲、すなわち、与えられた観測値に対して(分離可能な)量子状態でアクセス可能な期待値の領域について検討する。これは効率の良い絡み合い検出を可能にするだけでなく、量子状態の集合の幾何学にも光を当てる。より正確には、実験において、得られたデータに対する信頼領域と分離可能な数値範囲が解離した場合、絡み合いを確実に検出する。概して、このような実験の成功は、分離可能な数値範囲が測定された観測値の標準数値範囲と比較されるほど小さい可能性が高い。これら2つの体積の比を用いてこの関係を定量化し、任意の粒子数、局所次元および測定数に対して解析的境界を与えることなく、任意に小さくすることはできないことを示す。さらに, 2つの局所トレースのない2量子ビット生成可観測器の分離可能領域と標準数値範囲の体積を明示的に計算する。さらに、一般的な観測可能量と極端なインスタンスに対する典型的な体積比を考察する。 We investigate the joint (separable) numerical range of multiple measurements, i.e., the regions of expectation values accessible with (separable) quantum states for given observables. This not only enables efficient entanglement detection, but also sheds light on the geometry of the set of quantum states. More precisely, in an experiment, if the confidence region for the obtained data and the separable numerical range are disjoint, entanglement is reliably detected. Generically, the success of such an experiment is more likely the smaller the separable numerical range is compared to the standard numerical range of the observables measured. We quantify this relation using the ratio between these two volumes and show that it cannot be arbitrarily small, giving analytical bounds for any number of particles, local dimensions as well as number of measurements. Moreover, we explicitly compute the volume of separable and standard numerical range for two locally traceless two-qubit product observables, which are of particular interest as they are easier to measure in practice. Furthermore, we consider typical volume ratios for generic observables and extreme instances.	翻訳日:2023-03-23 00:01:07 公開日:2021-07-09
# 無反射ポテンシャルに対するグリーン関数と超対称パートナーを見つけるためのパワーローポテンシャルへの境界状態の追加 Green's functions for reflectionless potentials and addition of boundstates to powerlaw potentials to find Supersymmetric partners ( http://arxiv.org/abs/2107.04332v1 ) ライセンス: Link先を確認	C.V.Sukumar	(参考訳) グリーンの無反射ポテンシャル関数は構成され、解析される。電力法ポテンシャルのグリーン関数,超対称性パートナー,固有値の和規則について検討した。付加境界状態が$e=0$である法ポテンシャルを動力とするsusyパートナーポテンシャルが構成される。 Green's functions for reflectionless potentials are constructed and analyzed. Green's functions for power law potentials, their Super Symmetric partners and sum rules for eigenvalues are examined. The SUSY partner potentials to power law potentials which have an additional bound state at $E=0$ are constructed.	翻訳日:2023-03-23 00:00:48 公開日:2021-07-09
# プラトン絡み合い Platonic Entanglement ( http://arxiv.org/abs/2107.04329v1 ) ライセンス: Link先を確認	José I. Latorre and Germán Sierra	(参考訳) 本稿では, Ancillary Absolute Maximally Entangled (AME) 状態に基づくテンソルネットワークを用いて, プラトニックソリッドの位相上で定義される強絡状態の構成について述べる。ドデカヘドロン上のAME(5,2)に基づく量子状態の例を用いて、このアイデアを説明する。このような状態のエントロピーを多くの異なる分割で解析し、それらが整数上で発生し、ほぼ極大であるのを観測する。また,すべてのプラトニックソリッドは,面数,頂点数,辺数が常に素数プラス1であるため,リード・ソロモン符号に基づくAME状態の構成を受け入れる。 We present a construction of highly entangled states defined on the topology of a platonic solid using tensor networks based on ancillary Absolute Maximally Entangled (AME) states. We illustrate the idea using the example of a quantum state based on AME(5,2) over a dodecahedron. We analyze the entropy of such states on many different partitions, and observe that they come on integer numbers and are almost maximal. We also observe that all platonic solids accept the construction of AME states based on Reed-Solomon codes since their number of facets, vertices and edges are always a prime number plus one.	翻訳日:2023-03-23 00:00:42 公開日:2021-07-09
# 量子回路冷凍における電荷動態:熱化とマイクロ波利得 Charge dynamics in quantum-circuit refrigeration: thermalization and microwave gain ( http://arxiv.org/abs/2107.04278v1 ) ライセンス: Link先を確認	Hao Hsu, Matti Silveri, Vasilii Sevriuk, Mikko Möttönen, Gianluigi Catelani	(参考訳) 通常の金属-絶縁体-超導体接合による光子補助トンネルの研究は、量子電気回路の散逸をその場で制御するための便利なツールを提供する可能性を示した。しかし、そのような量子回路冷凍機(QCR)に関する現在の文献では、トンネル過程の電荷ダイナミクスやオープン量子系の位相コヒーレンスについて詳細な記述は示されていない。ここでは、量子電気と電荷の自由度の両方を記述するマスター方程式を導出し、低温と低電荷エネルギーの典型的な実験パラメータが電荷と量子力学の時間スケールの分離をもたらすことを発見する。したがって、電荷分布を平均化することにより、異なる電荷状態のマイナーな効果を考慮に入れることができる。また、交流電圧をトンネル接合に印加することにより、駆動振幅を変化させて超伝導量子ビットの減衰率を4桁以上制御可能とし、40 nsでの量子ビット励起の桁数低下と$10^{-4}$以下の残留リセットを求める。さらに、通常の島では、超伝導ギャップ、すなわち量子ドットに比べて電荷エネルギーと単一粒子レベルが大きな場合を考える。このような点QCRから生じる減衰速度は量子ビットリセットにおいて低いように見えるが、結合マイクロ波共振器に効果的な負減衰(利得)を与えることができる。そのようなミリケルビンマイクロ波源のファノ係数はユニティよりも小さくなり、後者の値は最大到達可能電力に近い値に達する。 Previous studies of photon-assisted tunneling through normal-metal-insulator-superconductor junctions have exhibited potential for providing a convenient tool to control the dissipation of quantum-electric circuits in-situ. However, the current literature on such a quantum-circuit refrigerator (QCR) does not present a detailed description for the charge dynamics of the tunneling processes or the phase coherence of the open quantum system. Here we derive a master equation describing both quantum-electric and charge degrees of freedom, and discover that typical experimental parameters of low temperature and yet lower charging energy yield a separation of time scales for the charge and quantum dynamics. Consequently, the minor effect of the different charge states can be taken into account by averaging over the charge distribution. We also consider applying an ac voltage to the tunnel junction, which enables control of the decay rate of a superconducting qubit over four orders of magnitude by changing the drive amplitude; we find an order-of-magnitude drop in the qubit excitation in 40 ns and a residual reset infidelity below $10^{-4}$. 
Furthermore, for the normal island we consider the case of charging energy and single-particle level spacing large compared to the superconducting gap, i.e., a quantum dot. Although the decay rates arising from such a dot QCR appear low for use in qubit reset, the device can provide effective negative damping (gain) to the coupled microwave resonator. The Fano factor of such a millikelvin microwave source may be smaller than unity, with the latter value being reached close to the maximum attainable power.	翻訳日:2023-03-22 23:59:48 公開日:2021-07-09
# 重力sagによる殻状凝縮:接触と双極子相互作用 Shell-shaped condensates with gravitational sag: contact and dipolar interactions ( http://arxiv.org/abs/2107.04577v1 ) ライセンス: Link先を確認	Maria Arazo, Ricardo Mayol, Montserrat Guilleumas	(参考訳) 小重力下での気泡トラップポテンシャルにおけるボース・アインシュタイン凝縮について検討する。特に,薄い殻に注目し,接触と双極子相互作用の凝縮の研究を行う。まず,双極子相互作用の異方性の影響を解析し,双極子と重力の偏光軸がわずかに不一致の場合,すでに重力の欠如に現れている。そこで, 微小重力場において, 重力方向の瞬時に傾いたり, 重力強度の急激な変化によって引き起こされた, 薄い貝殻状凝縮体の小さな振動のダイナミクスについて検討した。このシステムは、宇宙実験室で重力センサーを実現するための予備段階となるかもしれない。 We investigate Bose-Einstein condensates in bubble trap potentials in the presence of a small gravity. In particular, we focus on thin shells and study both contact and dipolar interacting condensates. We first analyze the effects of the anisotropic nature of the dipolar interactions, which already appear in the absence of gravity and are enhanced when the polarization axis of the dipoles and the gravity are slightly misaligned. Then, in the small gravity context, we investigate the dynamics of small oscillations of these thin, shell-shaped condensates triggered either by an instantaneous tilting of the gravity direction or by a sudden change of the gravity strength. This system could be a preliminary stage for realizing a gravity sensor in space laboratories.	翻訳日:2023-03-22 23:53:34 公開日:2021-07-09
# KscAイオンチャネルの量子モデルにおける輸送しきい値 Transport threshold in a quantum model for the KscA ion channel ( http://arxiv.org/abs/2107.04573v1 ) ライセンス: Link先を確認	N. De March, S. D. Prado and L. G. Brunnet	(参考訳) K$^{+}$チャネルにおける高いスループット率の背後にあるメカニズムは、まだ未解決の問題である。最近のシミュレーションにより、k$^{+}$チャネルコアを通るカリウムの通過(いわゆる選択性フィルター(sf))は、クーロン反発の強さがイオン伝導を凍結するモデルに対して無水であることが示されている。量子コヒーレントホッピングはイオン伝導の仲介に関係があることが示唆されている。量子的アプローチと経路に沿った解離したイオンの仮説の中で、ソース内の多数の粒子から始まり、ドレインで収集されるサイトの線形連鎖によってモデル化されたSFをどう通過するかを確認する。その結果、SF占有率の平均は3イオンであり、これは最近の古典的モデルシミュレーションと一致していることがわかった。 The mechanism behind the high throughput rate in K$^{+}$ channels is still an open problem. Recent simulations have shown that the passage of potassium through the K$^{+}$ channel core, the so-called selectivity filter (SF), is water-free against models where the strength of Coulomb repulsion freezes ions conduction. It has been suggested that quantum coherent hopping might be relevant in mediating ion conduction. Within the quantum approach and the hypothesis of desolvated ions along the pathway, we start with a number of particles in a source to see how they go across the SF modeled by a linear chain of sites to be collected in a drain. As a main result we show that there is a threshold SF occupancy is three ions on average, which is in agreement with recent classical model simulations.	翻訳日:2023-03-22 23:53:20 公開日:2021-07-09
# SherLOCKed:サイバーセキュリティ教育のための探偵テーマのシリアスゲーム SherLOCKED: A Detective-themed Serious Game for Cyber Security Education ( http://arxiv.org/abs/2107.04506v1 ) ライセンス: Link先を確認	Alice Jaffray and Conor Finn and Jason R.C. Nurse	(参考訳) ゲーミフィケーションとシリアスゲームは、多くの分野、特に教育を支援するために徐々に使われている。このようなゲームは、生徒にコンテンツを与える新しい方法を提供し、より伝統的な学習アプローチを補完する。この記事は、2Dトップダウンパズルアドベンチャーのスタイルで作られた新しい真剣なゲームであるSherLOCKEDを提案する。このゲームは、学部のサイバーセキュリティコースの文脈にあり、学生の基本的なセキュリティ概念(CIAのトライアド、セキュリティの脅威と攻撃、リスク管理など)に関する知識を統合するために使用される。 SherLOCKEDは、既存のシリアスゲームのレビューと共通のゲーミフィケーション原則の研究に基づいて構築された。その後、学部で実施され、112名の学生で評価された。このゲームは、学生が講義中に導入したコンテンツへのさらなるエンゲージメントを可能にする、効果的で魅力的で楽しいソリューションであることがわかった。この研究は、サイバーセキュリティに関する学習を支援するシリアスゲームの使用に新たな証拠を与えている。 Gamification and Serious Games are progressively being used over a host of fields, particularly to support education. Such games provide a new way to engage students with content and can complement more traditional approaches to learning. This article proposes SherLOCKED, a new serious game created in the style of a 2D top-down puzzle adventure. The game is situated in the context of an undergraduate cyber security course, and is used to consolidate students' knowledge of foundational security concepts (e.g. the CIA triad, security threats and attacks and risk management). SherLOCKED was built based on a review of existing serious games and a study of common gamification principles. It was subsequently implemented within an undergraduate course, and evaluated with 112 students. We found the game to be an effective, attractive and fun solution for allowing further engagement with content that students were introduced to during lectures. This research lends additional evidence to the use of serious games in supporting learning about cyber security.	翻訳日:2023-03-22 23:52:19 公開日:2021-07-09
# 臨界パラメトリック量子センシング Critical parametric quantum sensing ( http://arxiv.org/abs/2107.04503v1 ) ライセンス: Link先を確認	R. Di Candia, F. Minganti, K. V. Petrovnin, G. S. Paraoanu and S. Felicetti	(参考訳) 臨界量子システム(Critical quantum systems)は、相転移に近接して発達する拡散感受性のため、量子力学応用の有望な資源である。ここでは、駆動散逸位相遷移中のパラメトリックカー共振器のメトロジーパワーを評価する。周波数推定のための量子フィッシャー情報と周波数識別のためのヘルストロムバウンドを完全に特徴付ける。漸近的な状態を超えて、実験的な到達可能なパラメータでハイゼンベルク精度を達成できることが示される。我々は、非線形共振器の臨界挙動を利用して量子磁気センサの精度と超伝導量子ビット読み出しの忠実性を高めるプロトコルを設計する。 Critical quantum systems are a promising resource for quantum metrology applications, due to the diverging susceptibility developed in proximity of phase transitions. Here, we assess the metrological power of parametric Kerr resonators undergoing driven-dissipative phase transitions. We fully characterize the quantum Fisher information for frequency estimation, and the Helstrom bound for frequency discrimination. By going beyond the asymptotic regime, we show that the Heisenberg precision can be achieved with experimentally reachable parameters. We design protocols that exploit the critical behavior of nonlinear resonators to enhance the precision of quantum magnetometers and the fidelity of superconducting qubit readout.	翻訳日:2023-03-22 23:52:04 公開日:2021-07-09
# 高調波発生によるツイストト秒パルス中のトーラス結び角運動量 Torus Knot Angular Momentum in Twisted Attosecond Pulses from High Harmonic Generation ( http://arxiv.org/abs/2107.04499v1 ) ライセンス: Link先を確認	Björn Minneker, Birger Böning, Anne Weber and Stephan Fritzsche	(参考訳) 双円ねじれラゲール・ガウスビームは、新しい角運動量として、一定のトーラス結び目角運動量(TKAM)を持つ。 TKAM は高調波発生のような非線形原子プロセスで保存され、時間遅延パラメータ $\tau$ と調整パラメータ $\gamma$ で分類することができる。これらのパラメータは、それぞれ投影された軌道角運動量と2つの重ね合わせされたラゲール・ガウシアンビームのエネルギーによって定義される。我々は、駆動ビームと高調波放射から$\tau$と$\gamma$を決定する一貫した幾何学的手法を導出した。この方法は、放射される高調波放射に対する不変パラメータ($\tau$ と $\gamma$)の両方を関連づける。したがって、$\tau$と$\gamma$は2つの異なるトーラス結び目から読み取ることができる。これらの結び目は、それぞれの高調波放射または駆動ビームの電界の時空間的進化から構築することができる。二次元ラゲール・ガウス線を明示的に照射した平面型原子ガスターゲットの分散パラメータの分類を実証する。さらに、$\tau$ と $\gamma$ によって決定される各トーラス結び目は、小さな修正で互いに写像できることを示した。この幾何学的手法は、純粋な形式的導出と比較して、不変パラメータである$\tau$ と $\gamma$ を解釈する異なる方法をもたらす。この研究で示された研究は、前回の発見とよく一致し、双円状のラゲール・ガウスビームによって誘導される高調波発生の文脈におけるTKAMの動的対称性に関する洞察を与える。 Bicircular twisted Laguerre-Gaussian beams possess a definite torus knot angular momentum (TKAM) as a new form of angular momentum. TKAM is conserved in nonlinear atomic processes such as high harmonic generation and can be classified by a time delay parameter $\tau$ and a coordination parameter $\gamma$. These parameters are defined by the respective projected orbital angular momentum and the energy of the two superimposed Laguerre-Gaussian beams. We derive a consistent geometric method to determine $\tau$ and $\gamma$ from the driving beam as well as from the high harmonic radiation. This method relates both invariance parameters ($\tau$ and $\gamma$) to the emitted high harmonic radiation. Therefore, $\tau$ and $\gamma$ can be read off of two different torus knots. These knots can be constructed from the spatio-temporal evolution of the electric field of the respective high harmonic radiation or the driving beam. We demonstrate the classification of the invariance parameters for a planar atomic gas target irradiated by bicircular Laguerre-Gaussian beams explicitly. 
Furthermore, we demonstrate that the respective torus knots determined by $\tau$ and $\gamma$ can be mapped onto each other within minor modifications. This geometric method yields a different way to interpret the invariance parameters $\tau$ and $\gamma$ as well as their underlying relation compared to a purely formal derivation. The investigations presented in this work are in good agreement with previous findings and provide insight into the dynamical symmetry of TKAM in the context of high harmonic generation induced by bicircular twisted Laguerre-Gaussian beams.	翻訳日:2023-03-22 23:51:53 公開日:2021-07-09
# 光キャビティにおける原子配列を用いた多重通信通信量子ネットワーク Multiplexed telecom-band quantum networking with atom arrays in optical cavities ( http://arxiv.org/abs/2107.04477v1 ) ライセンス: Link先を確認	William Huie, Shankar G. Menon, Hannes Bernien, and Jacob P. Covey	(参考訳) 通信帯域演算や大規模量子情報処理と互換性のある物質ベースの量子ビットの量子ネットワークノードの実現は、基本的な量子ネットワークの可能性を制限する優れた課題である。マルチプレクサネットワークアーキテクチャにおいて、中性原子配列とテレコムバンド光子からなる量子プロセッサを相互接続するプラットフォームを提案する。単一原子ではなく大きな原子配列を用いることで、双方向通信の有害な影響を緩和し、2つのノード間の絡み合いを2桁近く改善する。さらに、各ノード内で高忠実度決定性ゲートと読み出しを同時に実行し、量子リピータへのドアを開き、ネットワークの長さと忠実度を高めるプロトコルを浄化する機能を提供する。中間ノードを量子リピータとして使用することにより,実際の仮定に基づいて約1500kmにわたる絡み合い分布の実現可能性を示し,大陸間ネットワークの青写真を提供する。最後に,分散フォールトトレラント量子コンピュータのバックボーンとして機能する,約25個のベルペアを大都市圏に分散できることを実証する。 The realization of a quantum network node of matter-based qubits compatible with telecom-band operation and large-scale quantum information processing is an outstanding challenge that has limited the potential of elementary quantum networks. We propose a platform for interfacing quantum processors comprising neutral atom arrays with telecom-band photons in a multiplexed network architecture. The use of a large atom array instead of a single atom mitigates the deleterious effects of two-way communication and improves the entanglement rate between two nodes by nearly two orders of magnitude. Further, this system simultaneously provides the ability to perform high-fidelity deterministic gates and readout within each node, opening the door to quantum repeater and purification protocols to enhance the length and fidelity of the network, respectively. Using intermediate nodes as quantum repeaters, we demonstrate the feasibility of entanglement distribution over approximately 1500 km based on realistic assumptions, providing a blueprint for a transcontinental network. Finally, we demonstrate that our platform can distribute approximately 25 Bell pairs over metropolitan distances, which could serve as the backbone of a distributed fault-tolerant quantum computer.	
翻訳日:2023-03-22 23:51:02 公開日:2021-07-09
# ブロックチェーンとスマートコントラクトにセマンティック記述が必要な理由 Why blockchain and smart contracts need semantic descriptions ( http://arxiv.org/abs/2107.14101v1 ) ライセンス: Link先を確認	Zoran Škoda	(参考訳) 私たちは、ブロックチェーンやスマートコントラクトの内容や振る舞いの背後にある、その特定のレベルの関連する現実の特徴を記述するレベルの階層が存在する、と論じています。これらの記述の基礎の研究がこれらの記述の形式主義、ツール、標準を発達させ、設定すれば、これらの体系の選択、設計、監査、法的な統制はより情報化され、より容易で、より高いレベルに引き上げられる。 We argue that there is a hierarchy of levels describing to that particular level relevant features of reality behind the content and behavior of blockchain and smart contracts in their realistic deployment. Choice, design, audit and legal control of these systems could be more informed, easier and raised to a higher level, if research on foundations of these descriptions develops and sets the formalisms, tools and standards for such descriptions.	翻訳日:2023-03-22 23:44:16 公開日:2021-07-09
# Smart Band: 緊急管理のための統合デバイス Smart Band: An Integrated Device for Emergency Management ( http://arxiv.org/abs/2107.14100v1 ) ライセンス: Link先を確認	A. Jackulin Mahariba, Shivam Patel	(参考訳) 誘拐や緊急事態の場合には、しばしば助けを求めるために無力化される。そして通常、最初の応答者が到着するまでは遅すぎます。現在、市場に出回っている「安全」デバイスは、あまり実用的でない電源ボタンをダブルタップしたり、そもそも大きな投資をしているスマートウォッチのアプリなど、あまりにも初歩的すぎることが多い。 Smart Bandは、誘拐、野生動物への正面対面、心臓発作などの危険な状況で自分自身を見つける人による物理的なトリガーの必要性を排除し、むしろ心拍を感知することを目的としている。 smart bandはパーソナライズされたウェアラブルデバイスとして設計されており、ユーザの心拍数を機械学習アルゴリズムを使って収集し、トレーニングすることで、イベントが特定された時にアラートシステムが自動的に起動される。したがって、緊急状況を評価する精度が高く、虚偽率を低減することができる。イベントが検出されると、バンドは第1応答者および緊急連絡先にgps座標を中継し、それはネットワークキャリア(simカードモジュール)を介してバンドから直接送信される。基本的には、スマートバンドはgpsトラッカー、心拍センサー、ネットワークモジュール、bluetoothモジュールで構成されており、既存の技術はすべて大量生産されており、最終製品が手頃な価格で大量生産できる程度に大量生産されている。さらなる開発では、スマートバンドは、その目的を自律的に果たすことができる、きめ細かいウェアラブルジュエリーにカスタマイズできる。 In the event of a kidnapping or a medical emergency, a person is often incapacitated to be able to call for help. And it's usually too late before the first responders arrive on-scene. Currently, a vast array of 'safety' devices available in the market are often far too rudimentary such as double tapping the power button that isn't very practical, or an app on a smart watch that is a huge investment in the first place. The Smart Band aims to eliminate the need for a physical trigger by the person who finds himself in dangerous situations like kidnapping, front-facing some wild animal, heart attack, etc., and rather senses the heartbeat. The Smart Band is designed as a personalized wearable device, wherein the user heart beat rate is collected and trained using machine learning algorithm, which triggers the alert system automatically when the event is identified. Hence the accuracy of assessing emergency situation will be high and false rate will be reduced. 
As soon as the event is detected, the band relays GPS coordinates to first responders and emergency contacts, which will be sent via the Network Carrier (SIM card module) directly from the band, not relying on a mobile phone, which is usually out of reach during such emergency situations. In essence, the Smart Band consists of a GPS tracker, a heartbeat sensor, a Network module, and a Bluetooth module, all existing technologies which have been mass produced to an extent that the end product can be made affordable, and in huge quantities as well. On further development the smart band can be customized to a finely wearable jewel which can serve the purpose autonomously.	翻訳日:2023-03-22 23:44:09 公開日:2021-07-09
# 振動ポラリトン化学の理論へのロードマップ A roadmap toward the theory of vibrational polariton chemistry ( http://arxiv.org/abs/2107.09026v1 ) ライセンス: Link先を確認	Derek S Wang and Susanne F Yelin	(参考訳) 振動ポラリトン化学の分野は2016年に、室温での化学反応速度が外部に駆動することなく共鳴調整された赤外線キャビティ内で変化した際に確立された。反応速度がなぜ変化するのかを理解するために世界中の科学者による激しい努力にもかかわらず、理論的な説明は存在しない。この観点からは, まず, 反応物質濃度, キャビティ周波数, 対称性の役割をほのめかした, このセレント実験およびそれに続く関連する実験を概観する。次に, 量子力学修飾遷移速度理論, フォトニック溶媒ケージ効果, 暗黒状態からの散逸の影響, 分子内振動エネルギー再分配による結合強化, 局所分子特性の総合的向上など, 主要な理論の関連性を分析する。最後に、理論と理論家のための新しい経路をテストする実験を提案し、振動ポラリトン化学の理論へのロードマップを構築する。強い結合機構の開始の重要性を理解し,反応経路の変化を捉えるための実験を設計し,さらにキャビティ修飾した分子内振動エネルギー再分配の理論と局所分子特性の総合的強化が次の重要なステップであると考えている。この視点が振動偏光子化学の分野の研究を導くための貴重な資源になることを願っている。 The field of vibrational polariton chemistry was firmly established in 2016 when a chemical reaction rate at room temperature was modified within a resonantly tuned infrared cavity without externally driving the system. Despite intense efforts by scientists around the world to understand why the reaction rate changes, no convincing theoretical explanation exists. In this perspective, first, we briefly review this seminal experiment, as well as relevant experiments that have since followed that have hinted at the roles of reactant concentration, cavity frequency, and symmetry. Then, we analyze the relevance of leading theories, such as quantum electrodynamics-modified transition rate theories, the photonic solvent cage effect, the impact of dissipation from dark states, bond strengthening via intramolecular vibrational energy redistribution, and collectively enhanced local molecular properties. Finally, we construct a roadmap toward the theory of vibrational polariton chemistry by suggesting experiments to test theories and new paths for theorists. 
We believe that understanding the importance of the onset of the strong coupling regime, designing experiments to capture changes in reaction pathways, and further developing the theories of cavity-modified intramolecular vibrational energy redistribution and collectively enhanced local molecular properties are crucial next steps. We hope this perspective will be a valuable resource for guiding research in the field of vibrational polariton chemistry.	翻訳日:2023-03-22 23:43:45 公開日:2021-07-09
# 過去の断片:家庭内暴力の加害者によるピアサポートの算出 Fragments of the Past: Curating Peer Support with Perpetrators of Domestic Violence ( http://arxiv.org/abs/2107.04711v1 ) ライセンス: Link先を確認	Rosanna Bellini, Alexander Wilson, Jan David Smeddinck	(参考訳) デジタルピアサポートネットワークが、自分や他人を傷つける人々にとって行動の変化や幸福な結果にポジティブな影響を与えうるという証拠が増えている。しかし、このようなネットワークの構築と維持には倫理的かつ実践的な課題があり、特に家庭内暴力の加害者が団結する際に独特なリスクを負う。本研究は,6人の支援労働者と18人の加害者とともに,音声メッセージを有形人工物と結び付ける社会材料システムFragments of the Pastの設計と展開について10ヶ月にわたる研究を報告する。暴力から脱落した経験をデジタルで表現したアーティファクトの作り方 - フラッグメント(fragments) - を共有することで、直接対人コミュニケーションに固有のリスクを負うことなく、モチベーションやラプポートのメッセージを伝えることができる。これらの知見は、挑戦的な人口を持つ将来のネットワーク設計の実践的考察の基礎となる。 There is growing evidence that digital peer-support networks can have a positive influence on behaviour change and wellbeing outcomes for people who harm themselves and others. However, making and sustaining such networks are subject to ethical and pragmatic challenges, particularly for perpetrators of domestic violence whom pose unique risks when brought together. In this work we report on a ten-month study where we worked with six support workers and eighteen perpetrators in the design and deployment of Fragments of the Past; a socio-material system that connects audio messages with tangible artefacts. We share how crafting digitally-augmented artefacts - 'fragments' - of experiences of desisting from violence can translate messages for motivation and rapport between peers, without subjecting the process to risks inherent with direct inter-personal communication. These insights provide the basis for practical considerations for future network design with challenging populations.	翻訳日:2023-03-22 23:43:24 公開日:2021-07-09
# Um Metodo para Busca Automatica de Redes Neurais Artificiais Um Metodo para Busca Automatica de Redes Neurais Artificiais ( http://arxiv.org/abs/2107.04702v1 ) ライセンス: Link先を確認	Anderson P. da Silva, Teresa B. Ludermir, Leandro M. Almeida	(参考訳) 本稿では,セル遺伝アルゴリズムを用いたニューラルネットワークの自動検索手法について述べる。一般的な遺伝的アルゴリズムにおけるこの方法の主な違いは、個人に位置情報を提供することができるセルオートマトンを使用することで、検索空間における局所最小化の可能性を減らすことである。この方法は、初期重み付け、伝達関数、アーキテクチャ、学習規則の同時選択のための進化的探索を用いる。実験結果から,本手法は,文献に見られる他の手法と比較して,十分に一般化し,訓練時間も短く,コンパクトで効率的なネットワークを探索できることがわかった。 This paper describes a method that automatically searches Artificial Neural Networks using Cellular Genetic Algorithms. The main difference of this method for a common genetic algorithm is the use of a cellular automaton capable of providing the location for individuals, reducing the possibility of local minima in search space. This method employs an evolutionary search for simultaneous choices of initial weights, transfer functions, architectures and learning rules. Experimental results have shown that the developed method can find compact, efficient networks with a satisfactory generalization power and with shorter training times when compared to other methods found in the literature.	翻訳日:2023-03-22 23:43:08 公開日:2021-07-09
# 量子コンピューティングによるアンテナアレー薄肉化 Antenna Array Thinning Through Quantum Computing ( http://arxiv.org/abs/2107.04684v1 ) ライセンス: Link先を確認	Paolo Rocca, Nicola Anselmi, Giacomo Oliveri, Alessandro Polo and Andrea Massa	(参考訳) 量子フーリエ変換(QFT)によるアンテナアレイの薄膜化を提案する。配列要素の候補位置の格子が与えられた場合、配列要素がどのアンテナ位置を占有するかを問う問題は量子コンピューティング(QC)フレームワークで定式化され、QFTアルゴリズムの適切な実装に基づくアドホック設計手法で対処される。提案手法の特徴と利点を指摘するために, 代表的な数値計算結果を提示し, 考察した。 Thinning antenna arrays through quantum Fourier transform (QFT) is proposed. Given the lattice of the candidate locations for the array elements, the problem of selecting which antenna location has to be either occupied or not by an array element is formulated in the quantum computing (QC) framework and then addressed with an ad-hoc design method based on a suitable implementation of the QFT algorithm. Representative numerical results are presented and discussed to point out the features and the advantages of the proposed QC-based thinning technique.	翻訳日:2023-03-22 23:42:56 公開日:2021-07-09
# 量子純度と生体直交多項式反復のモーメント Moments of quantum purity and biorthogonal polynomial recurrence ( http://arxiv.org/abs/2107.04637v1 ) ライセンス: Link先を確認	Shi-Hao Li and Lu Wei	(参考訳) ビューズ・ハルアンサンブルは密度行列のユニークな尺度であり、量子情報処理における様々な特性を満たす。本研究では,バーレス・ハル・アンサンブル上での絡み合いの統計的挙動を,最も単純なエントロピー(量子純度)によって測定した。この研究の主な成果は、任意のサブシステム次元に対して有効な量子純粋性の正確な第2および第3モーメント表現であり、文学における対応する結果は、等しいサブシステム次元のシナリオに限られる。結果を得るためには,cauchy-laguerre biorthogonal polynomials 上の基底積分の帰結関係を独立に求めた。 The Bures-Hall ensemble is a unique measure of density matrices that satisfies various distinguished properties in quantum information processing. In this work, we study the statistical behavior of entanglement over the Bures-Hall ensemble as measured by the simplest form of an entanglement entropy - the quantum purity. The main results of this work are the exact second and third moment expressions of quantum purity valid for any subsystem dimensions, where the corresponding results in the literature are limited to the scenario of equal subsystem dimensions. In obtaining the results, we have derived recurrence relations of the underlying integrals over the Cauchy-Laguerre biorthogonal polynomials that may be of independent interest.	翻訳日:2023-03-22 23:42:46 公開日:2021-07-09
# Tavis-Cummingsモデルを超えて:原子アンサンブルを用いたQEDの再検討 Beyond the Tavis-Cummings model: revisiting cavity QED with atomic ensembles ( http://arxiv.org/abs/2107.04583v1 ) ライセンス: Link先を確認	Martin Blaha, Aisling Johnson, Arno Rauschenbeutel, J\"urgen Volz	(参考訳) 単一モード電磁場と$N$2レベル原子のアンサンブルの相互作用は、Tavis-Cummingsモデルによって説明される。ここで、集合的に強化された光-物質結合強度は、$g_N = \sqrt{N} \bar{g}_1$, $\bar{g}_1$ で与えられる。以前は、このモデルは多くの空洞実験を記述し分析するために用いられてきた。ここでは,非キャビティモードへの有効散乱速度がキャビティの自由スペクトル範囲と比較して無視できる場合にのみ正当性を示す。実験パラメータに関しては、アンサンブルの光学的深さが低く、いくつかの最先端の実験で破られる条件が必要である。我々は、tavis-cummingsモデルの有効性の定量的条件を与え、全ての連続原子と光子のカスケード相互作用を考慮したより一般的なハミルトニアン記述を導出する。その結果,tavis-cummingsモデルで得られた予測と定量的および定性的に予測が異なっていた。最後に,Tavis-Cummingsモデルの予測から逸脱していることを示す実験データについて述べる。本研究は、量子エミッタの光密度アンサンブルを光共振器に結合した全ての実験に関係している。 The interaction of an ensemble of $N$ two-level atoms with a single mode electromagnetic field is described by the Tavis-Cummings model. There, the collectively enhanced light-matter coupling strength is given by $g_N = \sqrt{N} \bar{g}_1$, where $\bar{g}_1$ is the ensemble-averaged single-atom coupling strength. Formerly, this model has been employed to describe and to analyze numerous cavity-based experiments. Here, we show that this is only justified if the effective scattering rate into non-cavity modes is negligible compared to the cavity's free-spectral range. In terms of experimental parameters, this requires that the optical depth of the ensemble is low, a condition that is violated in several state-of-the-art experiments. We give quantitative conditions for the validity of the Tavis-Cummings model and derive a more general Hamiltonian description that takes into account the cascaded interaction of the photons with all consecutive atoms. We show that the predictions of our model can differ quantitatively and even qualitatively from those obtained with the Tavis-Cummings model. Finally, we present experimental data, for which the deviation from the predictions of the Tavis-Cummings model is apparent. 
Our findings are relevant for all experiments in which optically dense ensembles of quantum emitters are coupled to an optical resonator.	翻訳日:2023-03-22 23:41:50 公開日:2021-07-09
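For orientation, the collective coupling quoted in the abstract above can be written out against the standard Tavis-Cummings Hamiltonian. This is a textbook sketch rather than the authors' generalized model: the symbols $\omega_c$, $\omega_a$, $a$, $\sigma_\pm^{(i)}$ and the per-atom couplings $g_i$ are notation assumed here, and reading $\bar{g}_1$ as a root-mean-square average is likewise an assumption.

```latex
H_{\mathrm{TC}}
  = \hbar\omega_c\, a^\dagger a
  + \frac{\hbar\omega_a}{2}\sum_{i=1}^{N}\sigma_z^{(i)}
  + \hbar\sum_{i=1}^{N} g_i\left(a\,\sigma_+^{(i)} + a^\dagger\sigma_-^{(i)}\right),
\qquad
g_N = \sqrt{\sum_{i=1}^{N} g_i^2} = \sqrt{N}\,\bar{g}_1,
\quad
\bar{g}_1 = \sqrt{\frac{1}{N}\sum_{i=1}^{N} g_i^2}.
```

In this picture all atoms couple to one and the same cavity mode, which is the assumption the abstract says breaks down for optically dense ensembles.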
# 離散時間アルゴリズム理解のための$O(s^r)$-resolution ODE Frameworkとミニマックス問題の線形収束への応用 An $O(s^r)$-Resolution ODE Framework for Understanding Discrete-Time Algorithms and Applications to the Linear Convergence of Minimax Problems ( http://arxiv.org/abs/2001.08826v7 ) ライセンス: Link先を確認	Haihao Lu	(参考訳) 離散時間アルゴリズム(dtas)のダイナミクスを理解するために、通常の微分方程式(odes)を用いた長い歴史がある。意外なことに、まだ基本的な疑問と答えが2つあります。 i)所定のDTAから \emph{suitable} ODE を取得する方法が不明確で、 (ii) DTA の収束と対応する ODE との関係は不明確である。本稿では、上記の2つの疑問に答える汎用DTAの挙動を分析するための新しい機械、$O(s^r)$- resolution ODEフレームワークを提案する。フレームワークには3つのステップがある。 1. 与えられた DTA から適切な ODE を得るには、$s$ が DTA のステップサイズである次数 $r$ でパラメータ化された DTA の $O(s^r)$- resolution ODE の階層を定義する。 DTA からユニークな $O(s^r)$- resolution ODE を構築するための主要なアプローチを提案する。 2) 得られたODEを解析するために,DTAのエネルギー関数に対する$O(s^r)$-linear-convergence条件を提案し,そこで$O(s^r)$- resolution ODEが最適解に線形に収束する。 3) DTA とその対応する ODE の収束特性をブリッジするために、エネルギー関数の固有性を定義し、適切なエネルギー関数に対する$O(s^r)$-解像度 ODE の線型収束が、DTA の線形収束を自動的に保証できることを示す。この機構をよりよく説明するために、制約のないミニマックス問題 $\min_{x\in\RR^n} \max_{y\in \RR^m} L(x,y)$ の解法として、勾配降下昇降法(GDA)、近点法(PPM)、外勾配法(EGM)の3つの古典的アルゴリズムについて検討する。 There has been a long history of using ordinary differential equations (ODEs) to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, there are still two fundamental and unanswered questions: (i) it is unclear how to obtain a \emph{suitable} ODE from a given DTA, and (ii) it is unclear the connection between the convergence of a DTA and its corresponding ODEs. In this paper, we propose a new machinery -- an $O(s^r)$-resolution ODE framework -- for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: 1. To obtain a suitable ODE from a given DTA, we define a hierarchy of $O(s^r)$-resolution ODEs of a DTA parameterized by the degree $r$, where $s$ is the step-size of the DTA. We present a principal approach to construct the unique $O(s^r)$-resolution ODEs from a DTA; 2. 
To analyze the resulting ODE, we propose the $O(s^r)$-linear-convergence condition of a DTA with respect to an energy function, under which the $O(s^r)$-resolution ODE converges linearly to an optimal solution; 3. To bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that the linear convergence of the $O(s^r)$-resolution ODE with respect to a proper energy function can automatically guarantee the linear convergence of the DTA. To better illustrate this machinery, we utilize it to study three classic algorithms -- gradient descent ascent (GDA), proximal point method (PPM) and extra-gradient method (EGM) -- for solving the unconstrained minimax problem $\min_{x\in\RR^n} \max_{y\in \RR^m} L(x,y)$.	翻訳日:2023-01-07 13:32:04 公開日:2021-07-09
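The three algorithms named in the abstract above can be compared on a toy problem. The sketch below is not taken from the paper: it assumes the scalar bilinear objective $L(x,y) = xy$ (saddle point at the origin), a fixed step-size `s`, and hypothetical helper names. On this objective, GDA is known to spiral outward while PPM and EGM contract toward the saddle point.

```python
import math

def gda_step(x, y, s):
    # Gradient descent in x, gradient ascent in y, on L(x, y) = x * y
    # (dL/dx = y, dL/dy = x), both evaluated at the current point.
    return x - s * y, y + s * x

def ppm_step(x, y, s):
    # Proximal point method: implicit update x+ = x - s*y+, y+ = y + s*x+,
    # solved in closed form for the bilinear objective.
    d = 1.0 + s * s
    return (x - s * y) / d, (y + s * x) / d

def egm_step(x, y, s):
    # Extra-gradient method: take a look-ahead GDA step, then update from
    # the original point using the gradient at the look-ahead point.
    xh, yh = x - s * y, y + s * x
    return x - s * yh, y + s * xh

def run(step, s=0.1, iters=200, x0=1.0, y0=1.0):
    # Iterate the chosen update and return the distance to the saddle point.
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, s)
    return math.hypot(x, y)

if __name__ == "__main__":
    for name, step in [("GDA", gda_step), ("PPM", ppm_step), ("EGM", egm_step)]:
        print(name, run(step))
```

Per step, the squared distance to the origin is multiplied by $1+s^2$ for GDA, $1/(1+s^2)$ for PPM and $1-s^2+s^4$ for EGM, which is why only the latter two converge here for $s<1$.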
# 一般バナッハ空間におけるコヒーレントとアルキメデスの選択 Coherent and Archimedean choice in general Banach spaces ( http://arxiv.org/abs/2002.05461v4 ) ライセンス: Link先を確認	Gert de Cooman	(参考訳) 私は、抽象バナッハ空間に生きる選択肢間の二項選択と非二項選択に対するアルキメデス性という新しい概念を、非常に一般的な選択モデルのクラスを通して導入し、研究する。 I introduce and study a new notion of Archimedeanity for binary and non-binary choice between options that live in an abstract Banach space, through a very general class of choice models, called sets of desirable option sets. In order to be able to bring an important diversity of contexts into the fold, amongst which choice between horse lottery options, I pay special attention to the case where these linear spaces don't include all 'constant' options. I consider the frameworks of conservative inference associated with Archimedean (and coherent) choice models, and also pay quite a lot of attention to representation of general (non-binary) choice models in terms of the simpler, binary ones. The representation theorems proved here provide an axiomatic characterisation for, amongst many other choice methods, Levi's E-admissibility and Walley-Sen maximality.	翻訳日:2023-01-01 13:23:06 公開日:2021-07-09
# 高エネルギー物理データを用いた量子インスピレーション機械学習 Quantum-inspired Machine Learning on high-energy physics data ( http://arxiv.org/abs/2004.13747v2 ) ライセンス: Link先を確認	Timo Felser, Marco Trenti, Lorenzo Sestini, Alessio Gianelle, Davide Zuliani, Donatella Lucchesi and Simone Montangero	(参考訳) 量子多体システムのシミュレーション用に設計された数値ツールであるTensor Networksは、機械学習の問題を解決するために最近応用されている。木テンソルネットワークをエクスプロイトし、CERNの大型ハドロン衝突型加速器によって生成されたデータの分析と分類を、高エネルギー物理学において非常に重要かつ挑戦的なビッグデータ問題に適用する。特に, LHCb実験において, いわゆるb-ジェット, 陽子-陽子衝突に由来するb-クォークを効果的に分類する方法, および, 分類結果の解釈方法について述べる。我々は,テンソルネットワークアプローチを利用して重要な特徴を抽出し,学習プロセスで取得した情報に基づいてネットワーク形状を適応する。最後に,木テンソルネットワークを適応させて,学習プロセスを繰り返すことなく,最適な精度や高速な応答を実現する方法を示す。これらの結果は、数十mhz規模のイベントをトリガできる現在のlhcbイベント分類や将来のlhcbイベント分類に必要な重要な要素である、高周波リアルタイムアプリケーションの実装への道を開いた。 Tensor Networks, a numerical tool originally designed for simulating quantum many-body systems, have recently been applied to solve Machine Learning problems. Exploiting a tree tensor network, we apply a quantum-inspired machine learning technique to a very important and challenging big data problem in high energy physics: the analysis and classification of data produced by the Large Hadron Collider at CERN. In particular, we present how to effectively classify so-called b-jets, jets originating from b-quarks from proton-proton collisions in the LHCb experiment, and how to interpret the classification results. We exploit the Tensor Network approach to select important features and adapt the network geometry based on information acquired in the learning process. Finally, we show how to adapt the tree tensor network to achieve optimal precision or fast response in time without the need of repeating the learning process. These results pave the way to the implementation of high-frequency real-time applications, a key ingredient needed among others for current and future LHCb event classification able to trigger events at the tens of MHz scale.	翻訳日:2022-12-08 22:50:32 公開日:2021-07-09
# data-to-textタスクのためのtext-to-text事前トレーニング Text-to-Text Pre-Training for Data-to-Text Tasks ( http://arxiv.org/abs/2005.10433v3 ) ライセンス: Link先を確認	Mihir Kale, Abhinav Rastogi	(参考訳) データ・ツー・テキストタスクの事前トレーニング+微調整戦略について検討する。実験の結果,テキスト・トゥ・テキスト・プレトレーニングをT5形式で行うことで,データ・ツー・テキスト生成に適したパイプライン型ニューラルネットワークモデルと,BERT や GPT-2 といった代替言語モデルに基づく事前トレーニング技術に勝ることを示す。重要な点として、T5事前トレーニングはドメイン外のテストセットを大きく改善することで証明されるように、より良い一般化をもたらす。私たちの研究が、データからテキストへのタスクでより普及するにつれて、将来の研究のベースラインとして役立つことを願っています。 We study the pre-train + fine-tune strategy for data-to-text tasks. Our experiments indicate that text-to-text pre-training in the form of T5, enables simple, end-to-end transformer based models to outperform pipelined neural architectures tailored for data-to-text generation, as well as alternative language model based pre-training techniques such as BERT and GPT-2. Importantly, T5 pre-training leads to better generalization, as evidenced by large improvements on out-of-domain test sets. We hope our work serves as a useful baseline for future research, as transfer learning becomes ever more prevalent for data-to-text tasks.	翻訳日:2022-11-30 23:30:27 公開日:2021-07-09
# Deep RelativeFusion:シングルイメージ相対深度予測を用いた高密度単分子SLAM DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction ( http://arxiv.org/abs/2006.04047v3 ) ライセンス: Link先を確認	Shing Yan Loo, Syamsiah Mashohor, Sai Hong Tang, Hong Zhang	(参考訳) 本稿では,大域的に一貫した3次元構造を復元できる,DeepRelativeFusionと呼ばれる高密度単分子SLAMシステムを提案する。この目的のために,視覚的なslamアルゴリズムを用いて,キーフレームのカメラポーズとセミセンス深度マップを確実に復元し,相対深度予測を用いてセミセンス深度マップを高密度化し,キーフレームポーズグラフを洗練する。半密度深度マップを改善するため, 隣接する画素の画素強度と深度を考慮した構造保存型平均平滑化フィルタである適応フィルタ方式を提案する。高密度化を実現するために,deepfusionが提案するエネルギー最小化フレームワークについて,(1)コスト関数の改善,(2)単像相対深度予測の2つの段階的な改善を提案する。密度化後、キーフレームを2ビュー一貫した半深度と深度マップで更新し、ポーズグラフの最適化を改善し、正確なシーン再構成のためにキーフレームのポーズを洗練するためのフィードバックループを提供する。我々のシステムは最先端の高密度SLAMシステムよりも高い精度で高精度に再現できる。 In this paper, we propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable to recover a globally consistent 3D structure. To this end, we use a visual SLAM algorithm to reliably recover the camera poses and semi-dense depth maps of the keyframes, and then use relative depth prediction to densify the semi-dense depth maps and refine the keyframe pose-graph. To improve the semi-dense depth maps, we propose an adaptive filtering scheme, which is a structure-preserving weighted average smoothing filter that takes into account the pixel intensity and depth of the neighbouring pixels, yielding substantial reconstruction accuracy gain in densification. To perform densification, we introduce two incremental improvements upon the energy minimization framework proposed by DeepFusion: (1) an improved cost function, and (2) the use of single-image relative depth prediction. After densification, we update the keyframes with two-view consistent optimized semi-dense and dense depth maps to improve pose-graph optimization, providing a feedback loop to refine the keyframe poses for accurate scene reconstruction. Our system outperforms the state-of-the-art dense SLAM systems quantitatively in dense reconstruction accuracy by a large margin.	
翻訳日:2022-11-24 08:32:03 公開日:2021-07-09
# 部分観測線形力学系における粒子フィルタリングは計画に有効か? When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems? ( http://arxiv.org/abs/2006.05975v2 ) ライセンス: Link先を確認	Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu	(参考訳) 粒子フィルタリングは確率力学系の潜在状態を推定する一般的な方法であり、その理論的性質は機械学習や統計コミュニティでよく研究されている。多くの制御問題、例えば、部分的に観測された線形力学系(POLDS)では、推論された潜在状態が各ステップの計画にさらに使用される。本稿では,逐次計画のための粒子フィルタリングの効率に関する厳密な研究を開始し,最初の粒子複雑性境界を与える。過去の行動の誤りは未来に影響を与えるかもしれないが、粒子フィルタリングに基づく政策の長期的報酬が正確な推測に基づいてそれに近いように、必要な粒子の数を制限できる。特に、安定系では、多項式的に多くの粒子が十分であることを示す。我々の証明の鍵は、正確な計画と、粒子フィルタリングに基づく近似計画によって生成されるシーケンスに基づく理想的なシーケンスのカップリングである。このテクニックは、他の逐次的な意思決定問題にも有用だと考えています。 Particle filtering is a popular method for inferring latent states in stochastic dynamical systems, whose theoretical properties have been well studied in machine learning and statistics communities. In many control problems, e.g., partially observed linear dynamical systems (POLDS), oftentimes the inferred latent state is further used for planning at each step. This paper initiates a rigorous study on the efficiency of particle filtering for sequential planning, and gives the first particle complexity bounds. Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference. In particular, we show that, in stable systems, polynomially many particles suffice. Key in our proof is a coupling of the ideal sequence based on the exact planning and the sequence generated by approximate planning based on particle filtering. We believe this technique can be useful in other sequential decision-making problems.	翻訳日:2022-11-23 05:16:43 公開日:2021-07-09
# 分布変換と多様体埋め込みのための四分位・四分位埋め込みと埋め込み分布の選択能力 Quantile-Quantile Embedding for Distribution Transformation and Manifold Embedding with Ability to Choose the Embedding Distribution ( http://arxiv.org/abs/2006.11385v2 ) ライセンス: Link先を確認	Benyamin Ghojogh, Fakhri Karray, Mark Crowley	(参考訳) 本稿では, 分布変換および多様体埋め込みのためのquantile-quantile embedded (qqe) という新しい埋め込み手法を提案する。 QQEは、視覚統計的テストから量子量子的プロットの概念を用いており、データの分布を理論上望ましい分布や経験的参照サンプルに変換することができる。さらに、QQEは、データの多様体を低次元の埋め込み空間に埋め込む際に、ユーザーに分布を埋め込む選択を与える。また、PCA、t-SNE、ディープメトリックラーニングなどの他の次元削減手法の埋め込み分布を修正して、データの表現や視覚化に使用することもできる。教師なし型と教師なし型の両方でQQEを提案する。 QQEはまた、分布を正確な参照分布またはその形状に変換することもできる。また,qqeによってクラス識別が向上する場合もある。異なる合成データと画像データセットを用いた実験により,提案手法の有効性を示す。 We propose a new embedding method, named Quantile-Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile-quantile plot from visual statistical tests, can transform the distribution of data to any theoretical desired distribution or empirical reference sample. Moreover, QQE gives the user a choice of embedding distribution in embedding the manifold of data into the low dimensional embedding space. It can also be used for modifying the embedding distribution of other dimensionality reduction methods, such as PCA, t-SNE, and deep metric learning, for better representation or visualization of data. We propose QQE in both unsupervised and supervised forms. QQE can also transform a distribution to either an exact reference distribution or its shape. We show that QQE allows for better discrimination of classes in some cases. Our experiments on different synthetic and image datasets show the effectiveness of the proposed embedding method.	翻訳日:2022-11-19 03:29:58 公開日:2021-07-09
# logit調整によるロングテール学習 Long-tail learning via logit adjustment ( http://arxiv.org/abs/2007.07314v2 ) ライセンス: Link先を確認	Aditya Krishna Menon and Sadeep Jayasumana and Ankit Singh Rawat and Himanshu Jain and Andreas Veit and Sanjiv Kumar	(参考訳) 実世界の分類問題は通常、不均衡またはロングテールのラベル分布を示し、多くのラベルは少数のサンプルに関連付けられる。これはそのようなラベルの一般化に挑戦し、naïve learningを支配的なラベルに偏らせる。本稿では,これらの課題に対処するために,標準ソフトマックスクロスエントロピートレーニングの2つの簡単な修正を提案する。本手法では,ラベル周波数に基づくロジット調整の古典的考え方を再考し,トレーニングモデルにポストホックを適用したり,トレーニング中に損失を強制したりする。このような調整は、レアラベルと支配ラベルのロジットの間に大きな相対的マージンをもたらす。これらの技術は、統計的根拠と経験的パフォーマンスをしっかりと保ちながら、文学における最近のいくつかの提案を統一し、一般化する。 Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these challenges. Our techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training. Such adjustment encourages a large relative margin between logits of rare versus dominant labels. These techniques unify and generalise several recent proposals in the literature, while possessing firmer statistical grounding and empirical performance.	翻訳日:2022-11-10 13:59:13 公開日:2021-07-09
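The post-hoc variant mentioned in the abstract above can be illustrated with a few lines of Python. This is a minimal sketch and not the authors' code: it assumes the adjustment takes the common form of subtracting `tau * log(prior)` from each class logit before the argmax, with priors estimated from training label counts. The function names and the scaling parameter `tau` are hypothetical.

```python
import math

def class_priors(label_counts):
    # Estimate class priors from training-set label counts (assumed nonzero).
    total = sum(label_counts)
    return [c / total for c in label_counts]

def logit_adjusted_predict(logits, priors, tau=1.0):
    # Post-hoc adjustment: subtract tau * log(prior) from each class logit,
    # which boosts rare classes relative to dominant ones, then take argmax.
    adjusted = [z - tau * math.log(p) for z, p in zip(logits, priors)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)

if __name__ == "__main__":
    priors = class_priors([990, 10])   # heavily imbalanced two-class problem
    logits = [2.0, 1.0]                # raw model scores favour the head class
    print(logit_adjusted_predict(logits, priors, tau=0.0))  # no adjustment
    print(logit_adjusted_predict(logits, priors, tau=1.0))  # adjusted
```

Setting `tau=0` recovers the unadjusted prediction, so the parameter controls how strongly the rare class is favoured.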
# 抽象的マルチエージェントインタラクションによる量子セキュア認証と鍵合意に向けて Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction ( http://arxiv.org/abs/2007.09327v2 ) ライセンス: Link先を確認	Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, and Stefano V. Albrecht	(参考訳) 公開鍵暗号に基づく認証と鍵契約の現在の方法は、量子コンピューティングに弱い。本稿では,人工知能研究に基づく新たなアプローチを提案する。この手法では,コミュニケーション関係者を自律エージェントと見なして,個人決定モデルを用いて相互に対話する。インタラクション中のエージェントの観察行動に基づいて認証とキーアグリーメントが決定される。このアプローチの安全性は、限られた観測結果から相互作用するエージェントの決定をモデル化することの難しさに起因している。提案手法に基づくプロトタイプ認証および鍵契約システムであるPyAMIをリリースする。本手法を実証的に検証し、異なるタイプの攻撃を検知し、正当なユーザを認証する。最後に,サーバモデルのトレーニングに強化学習技術を用いることで,クライアントの判断を効果的に探索し,よりサンプリング効率の高い認証を実現する方法を示す。 Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behaviors during the interaction. The security of this approach rests upon the difficulty of modeling the decisions of interacting agents from limited observations, a problem which we conjecture is also hard for quantum computing. We release PyAMI, a prototype authentication and key agreement system based on the proposed method. We empirically validate our method for authenticating legitimate users while detecting different types of adversarial attacks. Finally, we show how reinforcement learning techniques can be used to train server models which effectively probe a client's decisions to achieve more sample-efficient authentication.	翻訳日:2022-11-09 06:12:38 公開日:2021-07-09
# CNNに基づくテキスト分類モデルのためのSHAP値 SHAP values for Explaining CNN-based Text Classification Models ( http://arxiv.org/abs/2008.11825v2 ) ライセンス: Link先を確認	Wei Zhao, Tarun Joshi, Vijayan N. Nair, and Agus Sudjianto	(参考訳) ディープニューラルネットワークは自然言語処理(nlp)モデルでますます使われている。しかし、複雑なアルゴリズムによる結果の解釈と説明の必要性は、銀行などの規制産業において広く採用されていることを制限している。構造化データを用いた機械学習アルゴリズムの解釈可能性に関する最近の研究がある。しかし、語彙の大きさ、高次元の性質、テキストのコヒーレンスと言語構造を考慮する必要があるため、問題がより難しいnlpアプリケーションでは、制限された技術しかありません。本稿では,cnnに基づくテキスト分類モデルの局所的説明可能性のためのshap値を計算する手法を開発した。このアプローチは、機能の重要性を評価するためにグローバルスコアを計算するためにも拡張されている。結果は、Amazon Electronic Reviewのデータの感情分析に基づいて説明される。 Deep neural networks are increasingly used in natural language processing (NLP) models. However, the need to interpret and explain the results from complex algorithms are limiting their widespread adoption in regulated industries such as banking. There has been recent work on interpretability of machine learning algorithms with structured data. But there are only limited techniques for NLP applications where the problem is more challenging due to the size of the vocabulary, high-dimensional nature, and the need to consider textual coherence and language structure. This paper develops a methodology to compute SHAP values for local explainability of CNN-based text classification models. The approach is also extended to compute global scores to assess the importance of features. The results are illustrated on sentiment analysis of Amazon Electronic Review data.	翻訳日:2022-10-24 20:43:05 公開日:2021-07-09
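As background for the entry above, SHAP values are built on Shapley values, which can be computed exactly when the number of features is tiny. The sketch below is illustrative only and is not the methodology of the paper: `value_fn` is a hypothetical stand-in for "model output when only the features in this subset are present", and exact enumeration is exponential in the number of features, which is why practical text models rely on approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    # Exact Shapley values by enumerating every subset of the other features.
    # value_fn maps a frozenset of feature indices to a real-valued payoff.
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

if __name__ == "__main__":
    # Toy additive "model": each present feature contributes a fixed amount.
    contrib = [1.0, 2.0, -0.5]
    def toy_value(s):
        return sum(contrib[j] for j in s)
    print(shapley_values(toy_value, 3))
```

For an additive payoff like the toy one above, each feature's Shapley value equals its own contribution, and the values always sum to the payoff of the full feature set minus that of the empty set.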
# 効率的な衛星画像による雲構造の分類と理解 Classification and understanding of cloud structures via satellite images with EfficientUNet ( http://arxiv.org/abs/2009.12931v4 ) ライセンス: Link先を確認	Tashin Ahmed and Noor Hossain Nuri Sabab	(参考訳) 気候変動は、長年にわたって重要な政治議論と意思決定の最前線であり、共通の関心事であった。浅層雲は地球の気候を理解する上で重要な役割を果たすが、気候モデルで解釈し表現することは困難である。これらの雲構造を分類することで、雲の物理的構造を理解する可能性が高くなり、気候モデルの生成が改善され、気候変動の予測や天気予報がより良くなる。クラウドは多くの形式で編成されるため、従来のルールベースのアルゴリズムを構築してクラウド機能を分離することは困難である。本稿では,コンボリューションニューラルネット(cnn)をエンコーダとして,unetをデコーダとして,細粒度特徴マップの抽出と再構成を行い,分類器として活用し,専門家が雲が将来的な気候をどのように形成するかを理解するのに役立つような,クラウド組織パターンの分類を行った。分類タスクでセグメンテーションモデルを使用することで、UNetと共に優れたエンコーダを使用することで、このデータセットから優れたパフォーマンスを得ることができることを示した。ダイス係数は最終評価基準に使われており、カグル競技においてそれぞれ66.26\%と66.02\%のスコアを得た。 Climate change has been a common interest and the forefront of crucial political discussion and decision-making for many years. Shallow clouds play a significant role in understanding the Earth's climate, but they are challenging to interpret and represent in a climate model. By classifying these cloud structures, there is a better possibility of understanding the physical structures of the clouds, which would improve the climate model generation, resulting in a better prediction of climate change or forecasting weather update. Clouds organise in many forms, which makes it challenging to build traditional rule-based algorithms to separate cloud features. In this paper, classification of cloud organization patterns was performed using a new scaled-up version of Convolutional Neural Network (CNN) named as EfficientNet as the encoder and UNet as decoder where they worked as feature extractor and reconstructor of fine grained feature map and was used as a classifier, which will help experts to understand how clouds will shape the future climate. By using a segmentation model in a classification task, it was shown that with a good encoder alongside UNet, it is possible to obtain good performance from this dataset. 
Dice coefficient has been used for the final evaluation metric, which gave the score of 66.26\% and 66.02\% for public and private (test set) leaderboard on Kaggle competition respectively.	翻訳日:2022-10-14 03:53:57 公開日:2021-07-09
# 確率的強制アンサンブル動的モード分解による近周期系の予測と解析 Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems ( http://arxiv.org/abs/2010.04248v2 ) ライセンス: Link先を確認	Daniel Dylewsky, David Barajas-Solano, Tong Ma, Alexandre M. Tartakovsky, J. Nathan Kutz	(参考訳) 時系列予測はほとんどの科学分野において中心的な課題である。本稿では, 時間遅延座標における動的モード分解(dmd)を用いた強制線形系として, 観測されたダイナミクスをモデル化する新しい負荷予測法を提案する。このアプローチの中心は、グリッドの負荷が、複雑な実世界の多くの観測可能量と同様に、「ほぼ周期的な」特性、すなわち、支配的なピークによって変動する連続フーリエスペクトルを持つという洞察である。提示した予測方法は,この特性を利用して、(i) 固有スペクトルがそれらのピークに写像する決定論的線形モデルへ回帰し、(ii) このシステムを駆動する確率的ガウス過程回帰(GPR)過程を同時に学習する。予測アルゴリズムは, 説明変数を付加せず, 最先端予測手法と比較し, 優れた性能が得られることを示した。さらに、線形固有ダイナミクスの使用は、解釈可能性とパシモニーの観点から、多くの望ましい特性を提供する。電力網からの負荷データを用いたテストケースについて結果を示す。負荷予測は、リアルタイム制御、価格設定、メンテナンス、セキュリティ決定など、電力システム工学における重要な課題である。 Time series forecasting remains a central challenge problem in almost all scientific disciplines. We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system using Dynamic Mode Decomposition (DMD) in time delay coordinates. Central to this approach is the insight that grid load, like many observables on complex real-world systems, has an "almost-periodic" character, i.e., a continuous Fourier spectrum punctuated by dominant peaks, which capture regular (e.g., daily or weekly) recurrences in the dynamics. The forecasting method presented takes advantage of this property by (i) regressing to a deterministic linear model whose eigenspectrum maps onto those peaks, and (ii) simultaneously learning a stochastic Gaussian process regression (GPR) process to actuate this system. Our forecasting algorithm is compared against state-of-the-art forecasting techniques not using additional explanatory variables and is shown to produce superior performance. Moreover, its use of linear intrinsic dynamics offers a number of desirable properties in terms of interpretability and parsimony. 
Results are presented for a test case using load data from an electrical grid. Load forecasting is an essential challenge in power systems engineering, with major implications for real-time control, pricing, maintenance, and security decisions.	翻訳日:2022-10-09 13:10:30 公開日:2021-07-09
# 深部ニューラルネットワークを用いた有限温度コーンシャム密度関数理論の高速化 Accelerating Finite-temperature Kohn-Sham Density Functional Theory with Deep Neural Networks ( http://arxiv.org/abs/2010.04905v2 ) ライセンス: Link先を確認	J. Austin Ellis and Lenz Fiedler and Gabriel A. Popoola and Normand A. Modine and J. Adam Stephens and Aidan P. Thompson and Attila Cangi and Sivasankaran Rajamanickam	(参考訳) 有限電子温度でコーン・シャム密度汎関数理論(DFT)によって生成された総エネルギーを、無視可能な計算コストで化学精度で再現する機械学習(ML)に基づく数値モデリングワークフローを提案する。ディープニューラルネットワークに基づいて、ワークフローは与えられた原子構成に対する状態の局所密度(LDOS)を生成する。 LDOSから、原子のボルン・オッペンハイマーポテンシャルエネルギー表面として機能するDFT全自由エネルギーを含む空間分解、エネルギー分解、統合量を計算することができる。本研究では, 固体および液体金属に対するこのアプローチの有効性を実証し, 固体および液体アルミニウムの独立および統一機械学習モデルとの比較を行った。機械学習の密度汎関数理論の枠組みは、現在のアルゴリズムでは達成不可能な計算規模とコストで、環境条件および極限条件下でのマルチスケール材料モデリングへの道を開く。 We present a numerical modeling workflow based on machine learning (ML) which reproduces the total energies produced by Kohn-Sham density functional theory (DFT) at finite electronic temperature to within chemical accuracy at negligible computational cost. Based on deep neural networks, our workflow yields the local density of states (LDOS) for a given atomic configuration. From the LDOS, spatially-resolved, energy-resolved, and integrated quantities can be calculated, including the DFT total free energy, which serves as the Born-Oppenheimer potential energy surface for the atoms. We demonstrate the efficacy of this approach for both solid and liquid metals and compare results between independent and unified machine-learning models for solid and liquid aluminum. Our machine-learning density functional theory framework opens up the path towards multiscale materials modeling for matter under ambient and extreme conditions at a computational scale and cost that is unattainable with current algorithms.	翻訳日:2022-10-08 23:38:03 公開日:2021-07-09
# 強化学習とグラフニューラルネットワークによるグラフダイナミクスの制御 Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks ( http://arxiv.org/abs/2010.05313v3 ) ライセンス: Link先を確認	Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik	(参考訳) グラフ上で部分的に観察された動的プロセスを限られた数の介入によって制御する問題を考える。この問題は、流行を抑制するためのウイルス検査のスケジュール、製品を宣伝するためのターゲットマーケティング、ソーシャルネットワークに拡散する偽ニュースを検出するために投稿を手作業で検査するといった状況で自然に発生する。この設定を時間グラフプロセス上の逐次決定問題として定式化する。指数的状態空間、組合せ作用空間、部分可観測性に直面して、時間グラフ上の動的過程を制御する新しい可観測スキームを設計する。我々は、流行拡大を抑制するためにどのノードをテストするべきかを優先順位付けし、グラフの最大化に影響を与えるという2つの一般的な問題に対して、このアプローチをうまく適用しました。 We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions. This problem naturally arises in contexts such as scheduling virus tests to curb an epidemic; targeted marketing in order to promote a product; and manually inspecting posts to detect fake news spreading on social networks. We formulate this setup as a sequential decision problem over a temporal graph process. In face of an exponential state space, combinatorial action space and partial observability, we design a novel tractable scheme to control dynamical processes on temporal graphs. We successfully apply our approach to two popular problems that fall into our framework: prioritizing which nodes should be tested in order to curb the spread of an epidemic, and influence maximization on a graph.	翻訳日:2022-10-08 13:59:57 公開日:2021-07-09
# 講演から説明責任行動へ:ディープニューラルネットワークとトピックモデリングによる政策立案者の公開討論の監視 From Talk to Action with Accountability: Monitoring the Public Discussion of Policy Makers with Deep Neural Networks and Topic Modelling ( http://arxiv.org/abs/2010.08346v3 ) ライセンス: Link先を確認	Vili H\"at\"onen and Fiona Melzer	(参考訳) 気候変動の研究は、人間の活動が気候を変え、現在気候危機に向かっているという意見の一致をもたらした。気候変動の緩和に関する公的な議論や研究活動は増加しているが、潜在的な解決策は議論されるだけでなく、効果的に展開する必要がある。不正管理や政策立案者が説明責任を負うのを防ぐため、透明性と政府プロセスに関する情報の程度が重要であることが示されている。しかし、現在、気候変動に関する議論や情報源の多さから、公共社会や市民社会が政治家の責任を負うための概要を維持することはますます困難になっている。そこで本研究では,複数の公開情報源の発言と修辞を,容易に理解可能なトピック要約へと処理するマルチソーストピックアグリゲーションシステム(mustas)を提案する。 MuSTASは、様々なドキュメントからトピックをモデル化するために、新しいマルチソースハイブリッド遅延ディリクレアロケーションを使用する。この話題の消化は、政治家が気候変動と気候変動の政策について話す場所、方法、時期を評価する上で、一般市民や市民社会に役立ち、気候変動を緩和し、その欠如を和らげるために政治家に責任を負わせることができる。 Decades of research on climate have provided a consensus that human activity has changed the climate and we are currently heading into a climate crisis. While public discussion and research efforts on climate change mitigation have increased, potential solutions need to not only be discussed but also effectively deployed. For preventing mismanagement and holding policy makers accountable, transparency and degree of information about government processes have been shown to be crucial. However, currently the quantity of information about climate change discussions and the range of sources make it increasingly difficult for the public and civil society to maintain an overview to hold politicians accountable. In response, we propose a multi-source topic aggregation system (MuSTAS) which processes policy makers speech and rhetoric from several publicly available sources into an easily digestible topic summary. MuSTAS uses novel multi-source hybrid latent Dirichlet allocation to model topics from a variety of documents. 
This topic digest will serve the general public and civil society in assessing where, how, and when politicians talk about climate and climate policies, enabling them to hold politicians accountable for their actions to mitigate climate change and lack thereof.	翻訳日:2022-10-06 20:13:08 公開日:2021-07-09
# 情報クエリのための概要指向質問生成 Summary-Oriented Question Generation for Informational Queries ( http://arxiv.org/abs/2010.09692v2 ) ライセンス: Link先を確認	Xusen Yin, Li Zhou, Kevin Small, Jonathan May	(参考訳) ユーザは、質問応答(QA)システムに対して、単純なファクトイドの質問を頻繁に求め、より複雑な質問をサポートする無数の最近の研究の影響を減らします。自動生成された質問(SQ)をユーザに提供することで、QAシステム機能のユーザ理解が向上し、より効果的な使用が容易になる。主文書の話題に焦点をあて,可変長文で回答可能な自己説明的な質問を適切な形で作成することを目指している。 NQ(Natural Questions)データセットに基づいてトレーニングしたBERTベースのPointer-Generator Networkを用いて,これらの要件を満たす。 NQデータセット(20.1BLEU-4)上でのSQ生成のSOTA性能を示す。我々はさらに,本モデルを外部のニュース記事に適用し,ゴールド質問の欠如によるQAシステムによる評価を行い,我々のモデルがニュース記事に対してより良いSQを生成することを示す。 Users frequently ask simple factoid questions for question answering (QA) systems, attenuating the impact of myriad recent works that support more complex questions. Prompting users with automatically generated suggested questions (SQs) can improve user understanding of QA system capabilities and thus facilitate more effective use. We aim to produce self-explanatory questions that focus on main document topics and are answerable with variable length passages as appropriate. We satisfy these requirements by using a BERT-based Pointer-Generator Network trained on the Natural Questions (NQ) dataset. Our model shows SOTA performance of SQ generation on the NQ dataset (20.1 BLEU-4). We further apply our model on out-of-domain news articles, evaluating with a QA system due to the lack of gold questions and demonstrate that our model produces better SQs for news articles -- with further confirmation via a human evaluation.	翻訳日:2022-10-05 21:48:31 公開日:2021-07-09
# 複合文の意味的類似性評価における単語埋め込みの比較分析 Comparative analysis of word embeddings in assessing semantic similarity of complex sentences ( http://arxiv.org/abs/2010.12637v3 ) ライセンス: Link先を確認	Dhivya Chandrasekaran and Vijay Mago	(参考訳) セマンティックテキストの類似性は自然言語処理分野におけるオープンな研究課題の1つである。この分野で大規模な研究が行われ、STSデータセットやSICKデータセットのような既存のベンチマークデータセットにおける最近のトランスフォーマーベースモデルによってほぼ完全な結果が得られている。本稿では,これらのデータセットの文について検討し,文の複雑さに関する各種単語埋め込みの感度を解析する。 15人のアノテータが提供した50の文対と関連する意味的類似度値からなる複雑な文データセットを構築した。既存のベンチマークデータセットと提案データセットにおける文の複雑さの増加を強調するために、可読性分析が行われる。さらに,既存のベンチマークデータセットと提案データセットを用いて,単語埋め込みと言語モデルの性能の比較分析を行った。その結果, 文の複雑さの増加は, 組込みモデルの性能に有意な影響を与え, Pearson と Spearman の相関は10～20%減少した。 Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field and near-perfect results are achieved by recent transformer-based models in existing benchmark datasets like the STS dataset and the SICK dataset. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings with respect to the complexity of the sentences. We build a complex sentences dataset comprising of 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. Readability analysis is performed to highlight the increase in complexity of the sentences in the existing benchmark datasets and those in the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings and language models on the existing benchmark datasets and the proposed dataset. The results show the increase in complexity of the sentences has a significant impact on the performance of the embedding models resulting in a 10-20% decrease in Pearson's and Spearman's correlation.	翻訳日:2022-10-03 23:26:50 公開日:2021-07-09
# (un)ソーシャルメディアによるCOVID-19の流行 (Un)Masked COVID-19 Trends from Social Media ( http://arxiv.org/abs/2011.00052v3 ) ライセンス: Link先を確認	Asmit Kumar Singh, Paras Mehan, Divyanshu Sharma, Rohan Pandey, Tavpritesh Sethi, Ponnurangam Kumaraguru	(参考訳) マスクを着用することは、新型コロナウイルス(covid-19)に対する有効な保護方法であり、世界中で経済や社会的影響を引き起こしている。世界中の政府はマスクの使用を義務付けており、肯定的な反応も否定的な反応も受けている。オンラインソーシャルメディアは、マスクの使用を研究し、基礎となるマスク着用パターンを分析するエキサイティングなプラットフォームを提供する。本稿では,米国6都市を対象に,204万件のソーシャルメディア画像を分析した。新型コロナウイルスの感染者が増加し、特に各州が厳格な規制を課した際、画像に被るマスクの増加が見られる。また,家庭内滞在法が施行されたため,グループ写真投稿の減少も見いだされた。さらに、Black Lives Matterの抗議行動におけるマスクのコンプライアンスを分析し、グループ写真の40%がマスクを着用し、そのうち45%が80%以上のフィットスコアのマスクを着用していた。今回我々は,マスク検出とマスク適合分析のための2つの新しいデータセットであるvariety masks(vama-c)とvariety masks- segmentation(vama-s)を導入した。分析のために、マスク検出装置(マスク付き顔とマスクなし顔の分類)とマスク適合分析装置(マスク適合スコアを計算するセグメンテーションベースモデル)の2つのフレームワークを構築した。フェイスマスク検出器は98%の分類精度を達成し、マスクフィットアナライザのセマンティクスセグメンテーションモデルは98%の交点点(iou)を達成した。このような枠組みは、パンデミック時のソーシャルメディアプラットフォームを用いた公衆衛生戦略の有効性を評価するのに利用できると結論づける。 Wearing masks is a useful protection method against COVID-19, which has caused widespread economic and social impact worldwide. Across the globe, governments have put mandates for the use of face masks, which have received both positive and negative reaction. Online social media provides an exciting platform to study the use of masks and analyze underlying mask-wearing patterns. In this article, we analyze 2.04 million social media images for six US cities. An increase in masks worn in images is seen as the COVID-19 cases rose, particularly when their respective states imposed strict regulations. We also found a decrease in the posting of group pictures as stay-at-home laws were put into place. Furthermore, mask compliance in the Black Lives Matter protest was analyzed, eliciting that 40% of the people in group photos wore masks, and 45% of them wore the masks with a fit score of greater than 80%. 
We introduce two new datasets, VAriety MAsks - Classification (VAMA-C) and VAriety MAsks - Segmentation (VAMA-S), for mask detection and mask fit analysis tasks, respectively. For the analysis, we create two frameworks, face mask detector (for classifying masked and unmasked faces) and mask fit analyzer (a semantic segmentation based model to calculate a mask-fit score). The face mask detector achieved a classification accuracy of 98%, and the semantic segmentation model for the mask fit analyzer achieved an Intersection Over Union (IOU) score of 98%. We conclude that such a framework can be used to evaluate the effectiveness of such public health strategies using social media platforms in times of pandemic.	翻訳日:2022-10-01 17:21:30 公開日:2021-07-09
# 胸部X線における胸部疾患の分類と局在の事前知識としての放射線学 Using Radiomics as Prior Knowledge for Thorax Disease Classification and Localization in Chest X-rays ( http://arxiv.org/abs/2011.12506v3 ) ライセンス: Link先を確認	Yan Han, Chongyan Chen, Liyan Tang, Mingquan Lin, Ajay Jaiswal, Song Wang, Ahmed Tewfik, George Shih, Ying Ding, Yifan Peng	(参考訳) 胸部X線は非侵襲性から最も一般的な診断の1つである。胸部X線画像の数は急上昇したが、胸部X線を読むのは放射線技師が手動で行い、火傷や遅延が発生する。医学画像から多くの定量的特徴を抽出できる放射線学のサブフィールドとして伝統的にラジオミクスは、深層学習時代以前の医療画像診断を容易にする可能性を示している。本稿では,放射能特性を利用して異常分類性能を向上させるためのエンドツーエンドフレームワークであるChexRadiNetを開発する。具体的には、chexradinetはまず、胸部x線を分類し異常領域を強調するために、軽量だが効率的なトリプレット・アテンション機構を適用した。次に、生成されたクラスアクティベーションマップを使用して放射能特徴を抽出し、より堅牢な画像特徴を学習するためのモデルをさらにガイドする。何度も繰り返し、放射能的特徴の助けを借りて、我々のフレームワークはより正確な画像領域に収束できる。我々は、NIH ChestX-ray、CheXpert、MIMIC-CXRの3つの公開データセットを用いてChexRadiNetフレームワークを評価する。その結果,chexradinetは疾患検出(aucでは0.843)と局在(t(iou) = 0.1)の両方において最先端を上回っていることがわかった。我々は,この手法が,放射線学の世界をより高度に理解した自動システムの開発を促進することを期待して,このコードをhttps://github.com/bionlplab/lung_disease_detection_amia2021で公開する。 Chest X-ray becomes one of the most common medical diagnoses due to its noninvasiveness. The number of chest X-ray images has skyrocketed, but reading chest X-rays still have been manually performed by radiologists, which creates huge burnouts and delays. Traditionally, radiomics, as a subfield of radiology that can extract a large number of quantitative features from medical images, demonstrates its potential to facilitate medical imaging diagnosis before the deep learning era. In this paper, we develop an end-to-end framework, ChexRadiNet, that can utilize the radiomics features to improve the abnormality classification performance. Specifically, ChexRadiNet first applies a light-weight but efficient triplet-attention mechanism to classify the chest X-rays and highlight the abnormal regions. Then it uses the generated class activation map to extract radiomic features, which further guides our model to learn more robust image features. 
After a number of iterations and with the help of radiomic features, our framework can converge to more accurate image regions. We evaluate the ChexRadiNet framework using three public datasets: NIH ChestX-ray, CheXpert, and MIMIC-CXR. We find that ChexRadiNet outperforms the state-of-the-art on both disease detection (0.843 in AUC) and localization (0.679 in T(IoU) = 0.1). We will make the code publicly available at https://github.com/bionlplab/lung_disease_detection_amia2021, with the hope that this method can facilitate the development of automatic systems with a higher-level understanding of the radiological world.	翻訳日:2022-09-21 02:55:42 公開日:2021-07-09
# ABD-Net:3Dポイントクラウド分解のための注意に基づく分解ネットワーク ABD-Net: Attention Based Decomposition Network for 3D Point Cloud Decomposition ( http://arxiv.org/abs/2108.04221v1 ) ライセンス: Link先を確認	Siddharth Katageri, Shashidhar V Kudari, Akshaykumar Gunari, Ramesh Ashok Tabib, Uma Mudenagudi	(参考訳) 本稿では, 点雲を平面, 球面, 円錐, シリンダーといった基本的な幾何学形状に分解するための注意に基づく分解ネットワーク(ABD-Net)を提案する。点雲の原始形状に基づく注意特徴を用いた3次元オブジェクト分類の性能向上を示す。 3Dオブジェクトのシンプルでコンパクトな表現であるポイントクラウドの人気が高まっている。彼らは点集合における不順序性による特徴抽出のための堅牢な方法を要求する。 abd-netでは、提案する局所近接カプセル化器は、入力点集合から各点周辺の空間エンコーディングと共に局所幾何変化をキャプチャする。カプセル化された局所機能は、ポイントクラウドの基本形状を学ぶために、提案する注意機能エンコーダにさらに渡される。注意特徴エンコーダは、全点の近傍間の幾何学的関係をモデル化し、全点クラウド情報をキャプチャする。提案するansiメカニカルコンポーネントとmodelnet40データセットにおけるabd-netの結果を示す。また,モデルNet40ベンチマークデータセット上での3次元オブジェクト分類の性能を向上させることにより,獲得した注目機能に対するABD-Netの有効性を実証し,最先端技術と比較した。 In this paper, we propose Attention Based Decomposition Network (ABD-Net), for point cloud decomposition into basic geometric shapes namely, plane, sphere, cone and cylinder. We show improved performance of 3D object classification using attention features based on primitive shapes in point clouds. Point clouds, being the simple and compact representation of 3D objects have gained increasing popularity. They demand robust methods for feature extraction due to unorderness in point sets. In ABD-Net the proposed Local Proximity Encapsulator captures the local geometric variations along with spatial encoding around each point from the input point sets. The encapsulated local features are further passed to proposed Attention Feature Encoder to learn basic shapes in point cloud. Attention Feature Encoder models geometric relationship between the neighborhoods of all the points resulting in capturing global point cloud information. We demonstrate the results of our proposed ABD-Net on ANSI mechanical component and ModelNet40 datasets. 
We also demonstrate the effectiveness of ABD-Net over the acquired attention features by improving the performance of 3D object classification on ModelNet40 benchmark dataset and compare them with state-of-the-art techniques.	翻訳日:2021-08-15 11:27:18 公開日:2021-07-09
# 歩行からのパーキンソン病の効率的な診断のための線形予測 Linear Prediction Residual for Efficient Diagnosis of Parkinson's Disease from Gait ( http://arxiv.org/abs/2107.12878v1 ) ライセンス: Link先を確認	Shanmukh Alle and U. Deva Priyakumar	(参考訳) パーキンソン病(英: Parkinson's Disease、PD)は、慢性的に進行する神経疾患であり、硬直性、震動、姿勢不安定をもたらす。 PDを診断するための明確な医療検査はなく、診断は主に臨床演習である。ガイドラインはあるものの、約10～30%の患者が誤ってPDと診断されている。したがって、正確な、偏りのない、迅速な診断方法が必要となる。本研究では,歩行からpdを迅速かつ正確に診断する手法であるlpgnetを提案する。 LPGNetはLPR(Linear Prediction Residuals)を使用して歩行記録から識別パターンを抽出し、depth-wise separable畳み込みを備えた1D畳み込みニューラルネットワークを用いて診断を行う。 LPGNetは、最先端手法と比べて21倍の高速化と約99%少ないパラメータで、0.91のAUCを達成している。また,歩行からpd診断の文献で用いられる様々なクロスバリデーション戦略の分析を行い,多くの手法が不必要に大きなモデルと過剰フィッティングによる性能の増大につながる様々な折りたたみ型データ漏洩によって影響を受けることを見出した。この分析により、今後の手法を正しく評価する道のりが明確になる。 Parkinson's Disease (PD) is a chronic and progressive neurological disorder that results in rigidity, tremors and postural instability. There is no definite medical test to diagnose PD and diagnosis is mostly a clinical exercise. Although guidelines exist, about 10-30% of the patients are wrongly diagnosed with PD. Hence, there is a need for an accurate, unbiased and fast method for diagnosis. In this study, we propose LPGNet, a fast and accurate method to diagnose PD from gait. LPGNet uses Linear Prediction Residuals (LPR) to extract discriminating patterns from gait recordings and then uses a 1D convolution neural network with depth-wise separable convolutions to perform diagnosis. LPGNet achieves an AUC of 0.91 with a 21 times speedup and about 99% fewer parameters in the model compared to the state of the art. We also undertake an analysis of various cross-validation strategies used in literature in PD diagnosis from gait and find that most methods are affected by some form of data leakage between various folds which leads to unnecessarily large models and inflated performance due to overfitting. The analysis clears the path for future works in correctly evaluating their methods.	翻訳日:2021-08-01 11:01:29 公開日:2021-07-09
# (参考訳) テキストから音声への動的変換器によるフェデレーション学習 Federated Learning with Dynamic Transformer for Text to Speech ( http://arxiv.org/abs/2107.08795v1 ) ライセンス: CC BY 4.0	Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Jie Liu, Chendong Zhao, Jing Xiao	(参考訳) text to speech(tts)はユーザインタラクションにとって重要なタスクだが、ttsモデルトレーニングは高品質なオリジナルデータセットのセットに依存している。プライバシとセキュリティの問題のため、オリジナルのデータセットは通常、直接使用できない。近年,連合学習は,プライバシ保護機構が強化された,一般的な分散機械学習パラダイムを提案する。データ所有者が他の人とコラボレーションするための実用的でセキュアなフレームワークを提供するので、より大きなデータセットでトレーニングされたより良いグローバルモデルを得ることができる。しかし、変圧器モデルの複雑性が高いため、連合学習環境では収束過程が遅く不安定になる。さらに、連合学習で訓練されたトランスフォーマーモデルは、クライアント上での通信コストと計算速度の制限であり、その人気を妨げている。これらの課題に対処するために,フェデレーション動的トランスフォーマを提案する。一方, クライアント数が増加すると, 集中型トランスフォーマー-TTSに近づき, フェデレーショントランスに比べて性能が大幅に向上する。一方、トレーニングフェーズにおけるより高速でより安定した収束を実現し、通信時間を著しく短縮する。 LJSpeechデータセットの実験も、我々の手法の利点を強く証明している。 Text to speech (TTS) is a crucial task for user interaction, but TTS model training relies on a sizable set of high-quality original datasets. Due to privacy and security issues, the original datasets are usually unavailable directly. Recently, federated learning proposes a popular distributed machine learning paradigm with an enhanced privacy protection mechanism. It offers a practical and secure framework for data owners to collaborate with others, thus obtaining a better global model trained on the larger dataset. However, due to the high complexity of transformer models, the convergence process becomes slow and unstable in the federated learning setting. Besides, the transformer model trained in federated learning is costly communication and limited computational speed on clients, impeding its popularity. To deal with these challenges, we propose the federated dynamic transformer. On the one hand, the performance is greatly improved comparing with the federated transformer, approaching centralize-trained Transformer-TTS when increasing clients number. 
On the other hand, it achieves faster and more stable convergence in the training phase and significantly reduces communication time. Experiments on the LJSpeech dataset also strongly prove our method's advantage.	翻訳日:2021-07-25 13:44:36 公開日:2021-07-09
# (参考訳) 旅行セールスマン問題の強化型ハイブリッド遺伝的アルゴリズム Reinforced Hybrid Genetic Algorithm for the Traveling Salesman Problem ( http://arxiv.org/abs/2107.06870v1 ) ライセンス: CC BY 4.0	Jiongzhi Zheng and Menglei Chen and Jialun Zhong and Kun He	(参考訳) 本稿では,NPハードトラベリングセールスマン問題(TSP)に対する強力な強化ハイブリッド遺伝的アルゴリズム(RHGA)を提案する。 RHGAは強化学習技術と有名なエッジアセンブリクロスオーバー遺伝的アルゴリズム(EAX-GA)とLin-Kernighan-Helsgaun(LKH)局所探索ヒューリスティックを組み合わせた。提案したハイブリッド機構の助けを借りて、EAX-GAの遺伝的進化とLKHの局所探索により、互いのパフォーマンスが向上する。また、q学習に基づく強化学習技術は、ハイブリッド遺伝的アルゴリズムをさらに促進する。 138のよく知られたTSPベンチマーク実験の結果,1,000から85,900都市を対象に,提案手法の優れた性能を示した。 We propose a powerful Reinforced Hybrid Genetic Algorithm (RHGA) for the famous NP-hard Traveling Salesman Problem (TSP). RHGA combines reinforcement learning technique with the well-known Edge Assembly Crossover genetic algorithm (EAX-GA) and the Lin-Kernighan-Helsgaun (LKH) local search heuristic. With the help of the proposed hybrid mechanism, the genetic evolution of EAX-GA and the local search of LKH can boost each other's performance. And the reinforcement learning technique based on Q-learning further promotes the hybrid genetic algorithm. Experimental results on 138 well-known and widely used TSP benchmarks, with the number of cities ranging from 1,000 to 85,900, demonstrate the excellent performance of the proposed method.	翻訳日:2021-07-18 13:12:30 公開日:2021-07-09
# GGT:ディープニューラルネットワークの逆サンプル検出のためのグラフガイドテスト GGT: Graph-Guided Testing for Adversarial Sample Detection of Deep Neural Network ( http://arxiv.org/abs/2107.07043v1 ) ライセンス: Link先を確認	Zuohui Chen, Renxuan Wang, Jingyang Xiang, Yue Yu, Xin Xia, Shouling Ji, Qi Xuan, and Xiaoniu Yang	(参考訳) ディープニューラルネットワーク(dnn)は、敵のサンプルに対して脆弱であることが知られており、その検出は、これらのdnnモデルの広範囲な適用に不可欠である。近年、DNNシステムの脆弱性を見つけるために、ソフトウェア工学における多くの深層試験手法が提案され、その1つ、すなわちモデル変異テスト(MMT)は、様々な種類の敵攻撃によって生成された様々な敵のサンプルを正常に検出するために使用された。しかし、MMTの変異モデルは、常に大きな数(例えば100モデル以上)であり、多様性の欠如(例えば、高信頼の敵検体では容易に回避できる)のため、実際の応用では効率が悪く、高信頼の敵検体の検出にも効果が低い。本研究では,これらの課題を克服するために,逆サンプル検出のためのグラフガイドテスト(GGT)を提案する。 GGT はグラフ特性をガイドしたプルーニングモデルを生成し、それぞれのパラメータ数は MMT の変異モデルの約5% にすぎず、グラフガイドモデルの方が多様性が高い。 CIFAR10 と SVHN の実験により、GGT は MMT よりも有効性と効率の両面で優れていることが示された。 Deep Neural Networks (DNN) are known to be vulnerable to adversarial samples, the detection of which is crucial for the wide application of these DNN models. Recently, a number of deep testing methods in software engineering were proposed to find the vulnerability of DNN systems, and one of them, i.e., Model Mutation Testing (MMT), was used to successfully detect various adversarial samples generated by different kinds of adversarial attacks. However, the mutated models in MMT are always huge in number (e.g., over 100 models) and lack diversity (e.g., can be easily circumvented by high-confidence adversarial samples), which makes it less efficient in real applications and less effective in detecting high-confidence adversarial samples. In this study, we propose Graph-Guided Testing (GGT) for adversarial sample detection to overcome these aforementioned challenges. GGT generates pruned models with the guide of graph characteristics, each of them has only about 5% parameters of the mutated model in MMT, and graph guided models have higher diversity. The experiments on CIFAR10 and SVHN validate that GGT performs much better than MMT with respect to both effectiveness and efficiency.	翻訳日:2021-07-18 12:33:39 公開日:2021-07-09
# NVCell:強化学習による高度な技術ノードにおける標準セルレイアウト NVCell: Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning ( http://arxiv.org/abs/2107.07044v1 ) ライセンス: Link先を確認	Haoxing Ren, Matthew Fojtik, Brucek Khailany	(参考訳) 高度な技術ノードにおける高品質な標準セルレイアウト自動化は、複雑な設計規則のため、現在でも業界では難しい。本稿では,高度技術ノード上の産業標準セルライブラリにおいて,一列セルの90%以上を均等あるいは小面積で配置できる,NVCellと呼ばれる標準セルレイアウト自動生成装置を提案する。 NVCellは強化学習(RL)を活用して、ルーティング中の設計規則違反を修正し、効率的な配置を生成する。 High quality standard cell layout automation in advanced technology nodes is still challenging in the industry today because of complex design rules. In this paper we introduce an automatic standard cell layout generator called NVCell that can generate layouts with equal or smaller area for over 90% of single row cells in an industry standard cell library on an advanced technology node. NVCell leverages reinforcement learning (RL) to fix design rule violations during routing and to generate efficient placements.	翻訳日:2021-07-18 12:30:26 公開日:2021-07-09
# fmnet: ノイズマイクロドップラースペクトログラムをクリーンアップする潜在機能マッピングネットワーク FMNet: Latent Feature-wise Mapping Network for Cleaning up Noisy Micro-Doppler Spectrogram ( http://arxiv.org/abs/2107.07312v1 ) ライセンス: Link先を確認	Chong Tang, Wenda Li, Shelly Vishwakarma, Fangzhan Shi, Simon Julier, Kevin Chetty	(参考訳) マイクロドップラーシグネチャには、ターゲットダイナミクスに関するかなりの情報が含まれている。しかし、レーダセンシングシステムはノイズの多い環境に影響を受けやすく、マイクロドップラースペクトログラム上では解釈不能な動きパターンとなる。一方、レーダーリターンは、しばしばマルチパス、乱雑、干渉に悩まされる。これらの問題は、例えば、運動特徴抽出、マイクロドップラーシグネチャを用いたアクティビティ分類(\mu$-DS)などにおいて困難をもたらす。本稿では,同一条件下でのシミュレーション結果とより密接に類似するように,測定されたスペクトログラムを変換する機能マッピングネットワーク(fmnet)を提案する。計測されたスペクトログラムとマッチングされたシミュレーションデータに基づいて,潜在表現/特徴を抽出するエンコーダ,潜在特徴に応じて再構成されたスペクトログラムを出力するデコーダ,計測およびシミュレーションデータの潜在特徴距離を最小化する判別器の3つの部分を含む。 6つの活動データと2つの実験シナリオを用いてfmnetを実演し,最終結果は,強力な拡張パターンを示し,実際の動作情報を最大限に保持できる。一方,シミュレーションデータのみを用いて分類器を訓練し,fmnetでクリーンアップした後,新たに測定したサンプルを予測できる新しいアイデアを提案する。最終分類の結果から、大幅な改善が見られる。 Micro-Doppler signatures contain considerable information about target dynamics. However, the radar sensing systems are easily affected by noisy surroundings, resulting in uninterpretable motion patterns on the micro-Doppler spectrogram. Meanwhile, radar returns often suffer from multipath, clutter and interference. These issues lead to difficulty in, for example motion feature extraction, activity classification using micro Doppler signatures ($\mu$-DS), etc. In this paper, we propose a latent feature-wise mapping strategy, called Feature Mapping Network (FMNet), to transform measured spectrograms so that they more closely resemble the output from a simulation under the same conditions. Based on measured spectrogram and the matched simulated data, our framework contains three parts: an Encoder which is used to extract latent representations/features, a Decoder outputs reconstructed spectrogram according to the latent features, and a Discriminator minimizes the distance of latent features of measured and simulated data. 
We demonstrate the FMNet with six activities data and two experimental scenarios, and final results show strong enhanced patterns and can keep actual motion information to the greatest extent. On the other hand, we also propose a novel idea which trains a classifier with only simulated data and predicts new measured samples after cleaning them up with the FMNet. From final classification results, we can see significant improvements.	翻訳日:2021-07-18 12:27:37 公開日:2021-07-09
# トランスフォーマティブな行動表現学習は、小さなデータセットにおける移動センシングのためのトランスファー学習を可能にする Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets ( http://arxiv.org/abs/2107.06097v1 ) ライセンス: Link先を確認	Mike A. Merrill and Tim Althoff	(参考訳) ディープラーニングは、nlpとコンピュータビジョンの研究と応用に革命をもたらしたが、行動モデリングや行動健康アプリケーションでは、まだそうではない。これは、ドメインのデータセットが小さく、異種データ型を持ち、通常、大量の欠落を示すためである。したがって、既成のディープラーニングモデルは、重要な、しばしば禁止的な適応を必要とする。それゆえ、多くの研究アプリケーションは、いまだにブースティング木モデルと手動でコーディングされた特徴量に依存しており、時には専門家によって手作りされたタスク特有の特徴量を用いている。本稿では,時系列から一般化可能な特徴表現を学習可能なモバイルセンシングデータのためのニューラルアーキテクチャフレームワークを提供し,微調整による小さなデータ領域での転送学習の実現可能性を示す。このアーキテクチャは、CNNとTransformerアーキテクチャの利点を組み合わせることで、(1) 手作りの特徴量を必要とせずに生の分単位のセンサーデータから直接学習し、予測性能を最大0.33 ROC AUC向上させ、(2) 事前学習を用いることで、わずか十数人の参加者のデータでも、より単純なニューラルモデルやブースティング決定木を上回る。 While deep learning has revolutionized research and applications in NLP and computer vision, this has not yet been the case for behavioral modeling and behavioral health applications. This is because the domain's datasets are smaller, have heterogeneous datatypes, and typically exhibit a large degree of missingness. Therefore, off-the-shelf deep learning models require significant, often prohibitive, adaptation. Accordingly, many research applications still rely on manually coded features with boosted tree models, sometimes with task-specific features handcrafted by experts. Here, we address these challenges by providing a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series and demonstrates the feasibility of transfer learning on small data domains through finetuning. This architecture combines benefits from CNN and Transformer architectures to (1) enable better prediction performance by learning directly from raw minute-level sensor data without the need for handcrafted features by up to 0.33 ROC AUC, and (2) use pretraining to outperform simpler neural models and boosted decision trees with data from as few as a dozen participants.	翻訳日:2021-07-14 14:30:49 公開日:2021-07-09
# (参考訳) 低級子宮内膜間質肉腫(lgess)のコンピュータ診断 Computer-Aided Diagnosis of Low Grade Endometrial Stromal Sarcoma (LGESS) ( http://arxiv.org/abs/2107.05426v1 ) ライセンス: CC BY 4.0	Xinxin Yang and Mark Stamp	(参考訳) 低悪性度子宮内膜間質肉腫(LGESS)はまれながんであり、全子宮癌症例の約0.2%を占める。 LGESS患者の約75%は、当初は良性腫瘍の一種である平滑筋腫(線維化物)と誤診されている。本研究では,lgess患者の子宮組織生検像をセグメンテーションと染色正規化アルゴリズムを用いて前処理する。さまざまな古典的な機械学習とディープラーニングモデルを使用して、組織画像を良性または癌性に分類する。従来の手法では,最も高い分類精度が約0.85であり,最高のディープラーニングモデルでは約0.87の精度を実現している。これらの結果から,LGESSの診断に適切な学習アルゴリズムが有用であることが示唆された。 Low grade endometrial stromal sarcoma (LGESS) is rare form of cancer, accounting for about 0.2% of all uterine cancer cases. Approximately 75% of LGESS patients are initially misdiagnosed with leiomyoma, which is a type of benign tumor, also known as fibroids. In this research, uterine tissue biopsy images of potential LGESS patients are preprocessed using segmentation and staining normalization algorithms. A variety of classic machine learning and leading deep learning models are then applied to classify tissue images as either benign or cancerous. For the classic techniques considered, the highest classification accuracy we attain is about 0.85, while our best deep learning model achieves an accuracy of approximately 0.87. These results indicate that properly trained learning algorithms can play a useful role in the diagnosis of LGESS.	翻訳日:2021-07-14 13:43:10 公開日:2021-07-09
# (参考訳) 最適三角法は本当に最適ではない Optimal Triangulation Method is Not Really Optimal ( http://arxiv.org/abs/2107.04618v1 ) ライセンス: CC BY 4.0	Seyed-Mahdi Nasiri, Reshad Hosseini, Hadi Moradi	(参考訳) 三角測量は、複数のカメラ画像上の2D投影から3D点を見つける問題を指す。この問題を解決するには,いわゆる最適三角測量法を用いるのが一般的であり,本論文ではL2法と呼ぶ。しかし、この方法はカメラパラメータの不確かさを仮定しない場合にのみ最適である。合成データと実データとの広範な比較により,L2法はカメラパラメータに不確実性が存在する場合に最適ではないことがわかった。興味深いことに、単純な中点法は他の方法よりも優れている。ハイパフォーマンスとは別に、中点法は複数のカメラ画像に対して単純な閉形式解を持つが、L2法は3枚以上のカメラ画像には適用しにくい。したがって、一般的な手法とは対照的に、単純な中点法は、カメラパラメータに不確かさがあるstructure-from-motionアプリケーションで使われるべきであると論じている。 Triangulation refers to the problem of finding a 3D point from its 2D projections on multiple camera images. For solving this problem, it is common practice to use the so-called optimal triangulation method, which we call the L2 method in this paper. However, the method can be optimal only if we assume no uncertainty in the camera parameters. Through extensive comparison on synthetic and real data, we observed that the L2 method is actually not the best choice when there is uncertainty in the camera parameters. Interestingly, it can be observed that the simple mid-point method outperforms other methods. Apart from its high performance, the mid-point method has a simple closed-form solution for multiple camera images, while the L2 method is hard to use for more than two camera images. Therefore, in contrast to the common practice, we argue that the simple mid-point method should be used in structure-from-motion applications where there is uncertainty in camera parameters.	翻訳日:2021-07-14 13:32:25 公開日:2021-07-09
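The closed-form mid-point solution mentioned in the entry above can be sketched as follows. This is a minimal illustration of one common formulation of the multi-view mid-point method (the point minimizing the summed squared distance to all back-projected rays), not the authors' code. The function names `midpoint_triangulate` and `_solve3`, and the representation of each camera as a ray (center plus direction), are assumptions made for this example.

```python
import math


def _solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; the mid-point is not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def midpoint_triangulate(centers, directions):
    """Return the 3D point with the smallest summed squared distance to all rays.

    Ray i starts at centers[i] and points along directions[i] (any nonzero length).
    The minimizer solves  sum_i (I - u_i u_i^T) x = sum_i (I - u_i u_i^T) c_i,
    where u_i is the normalized direction of ray i.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        norm = math.sqrt(sum(v * v for v in d))
        u = [v / norm for v in d]
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - u[i] * u[j]  # projector entry
                a[i][j] += p
                b[i] += p * c[j]
    return _solve3(a, b)


# Hypothetical example: three camera centers whose rays all pass through `target`.
target = (0.5, 0.5, 2.0)
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
directions = [tuple(t - c for t, c in zip(target, ctr)) for ctr in centers]
point = midpoint_triangulate(centers, directions)  # recovers `target` up to rounding
```

Because the solution is a single 3x3 linear solve, adding more cameras only adds terms to the two sums, which is why this formulation extends to any number of views.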
# (参考訳) ガウス過程トリガーを用いた多様な映像生成 Diverse Video Generation using a Gaussian Process Trigger ( http://arxiv.org/abs/2107.04619v1 ) ライセンス: CC0 1.0	Gaurav Shrivastava and Abhinav Shrivastava	(参考訳) いくつかのコンテキスト(あるいは過去の)フレームが与えられた将来のフレームを生成するのは、難しい作業です。将来的な状態の多様性の観点から、ビデオの時間的コヒーレンスとマルチモダリティをモデル化する必要がある。ビデオ生成に対する現在の変分アプローチは、マルチモーダルな将来の結果よりも疎外する傾向にある。代わりに、将来の成果におけるマルチモダリティを明示的にモデル化し、多様な未来をサンプリングするためにそれを活用することを提案する。我々のアプローチであるDiverse Video Generatorは、ガウス過程(GP)を用いて、過去の状態を学習し、特定のサンプルを与えられた未来の確率分布を維持する。さらに,この分布の変化を時間とともに活用し,現在進行中のシーケンスの終了を推定することで,多様な将来状態のサンプリングを制御する。すなわち、出力関数空間上のGPの分散を利用して、アクションシーケンスの変更をトリガーする。生成したシーケンスの復元品質と多様性の観点から,将来的なフレーム生成の最先端性を実現する。 Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos and multi-modality in terms of diversity in the potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. In addition, we leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of ongoing sequences. That is, we use the variance of GP over the output function space to trigger a change in an action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.	翻訳日:2021-07-14 13:22:07 公開日:2021-07-09
# (参考訳) ハイブリッドディープニューラルネットワークを用いたマルチジオメトリハイパースペクトル画像からのIll-posed Surface Emissivity検索 Ill-posed Surface Emissivity Retrieval from Multi-Geometry Hyperspectral Images using a Hybrid Deep Neural Network ( http://arxiv.org/abs/2107.04631v1 ) ライセンス: CC BY 4.0	Fangcao Xu, Jian Sun, Guido Cervone, Mark Salvador	(参考訳) 大気補正はリモートセンシングの基本的なタスクである。観測対象が大気そのものであるか、大気を通して観測が行われるためである。大気補正誤差は観測のスペクトルシグネチャを著しく変化させ、不正な分類やターゲット検出につながる可能性がある。これは、スペクトル特性の正確な測定が必要な超スペクトルデータを扱う場合にさらに重要である。最先端の物理学に基づく大気補正アプローチでは、センサ特性、収集形状、収集されるシーンの環境特性に関する幅広い事前知識が必要である。これらのアプローチは計算コストが高く、十分な環境情報や収集情報の欠如により不正確になりがちであり、しばしばリアルタイムアプリケーションでは不可能である。本稿では,異なる測地から収集したマルチスキャンハイパースペクトルデータを用いた自動大気補正のための幾何依存型ハイブリッドニューラルネットワークを提案する。提案したネットワークは、追加の気象データなしで大気を特徴づけることができる。温度放射率分離問題の解法としてグリッド探索法を提案する。その結果,提案ネットワークは,29種類の材料に対して0.02未満の平均絶対誤差(MAE)で,大気を正確に特徴付け,目標放射率スペクトルを推定できることがわかった。このソリューションは、リアルタイムアプリケーションに対する目標検出を改善するために、正確な大気補正につながる可能性がある。 Atmospheric correction is a fundamental task in remote sensing because observations are taken either of the atmosphere or looking through the atmosphere. Atmospheric correction errors can significantly alter the spectral signature of the observations, and lead to invalid classifications or target detection. This is even more crucial when working with hyperspectral data, where a precise measurement of spectral properties is required. State-of-the-art physics-based atmospheric correction approaches require extensive prior knowledge about sensor characteristics, collection geometry, and environmental characteristics of the scene being collected. These approaches are computationally expensive, prone to inaccuracy due to lack of sufficient environmental and collection information, and often impossible for real-time applications. In this paper, a geometry-dependent hybrid neural network is proposed for automatic atmospheric correction using multi-scan hyperspectral data collected from different geometries. 
The proposed network can characterize the atmosphere without any additional meteorological data. A grid-search method is also proposed to solve the temperature-emissivity separation problem. Results show that the proposed network can accurately characterize the atmosphere and estimate target emissivity spectra with a mean absolute error (MAE) under 0.02 for 29 different materials. This solution can lead to accurate atmospheric correction to improve target detection for real-time applications.	翻訳日:2021-07-14 13:01:53 公開日:2021-07-09
# (参考訳) 因果効果を用いたアルゴリズム因果効果同定 Algorithmic Causal Effect Identification with causaleffect ( http://arxiv.org/abs/2107.04632v1 ) ライセンス: CC BY 4.0	Mart\'i Pedemonte, Jordi Vitri\`a and \'Alvaro Parafita (Universitat de Barcelona)	(参考訳) 種としての私たちの進化は、原因と影響の関係を理解する際に大きな一歩を踏み出した。これらの関連は、いくつかのイベントには自明だが、複雑なシナリオではない。因果理論と因果推論が形式化され、$do$-operatorとその関連する規則が導入された。このレポートの主な目的は、Pythonのいくつかのアルゴリズムで観測データから条件付きおよび条件なし因果クエリを計算し、実装することである。この目的のために、まず確率とグラフ理論に関する基本的な背景知識を提示し、アルゴリズムの構築に使用される因果論の重要な結果を紹介した。 2006年にshpitserとpearlによって提示された識別アルゴリズムを徹底的に研究し、pythonの実装について説明した。主同定アルゴリズムは、$do$-calculusの規則の繰り返し適用と見なすことができ、最終的に実験的な確率から因果クエリの式を返すか、因果効果を識別できないかのどちらかである。我々は、新しく開発したpythonライブラリを紹介し、いくつかの利用例を示す。 Our evolution as a species made a huge step forward when we understood the relationships between causes and effects. These associations may be trivial for some events, but they are not in complex scenarios. To rigorously prove that some occurrences are caused by others, causal theory and causal inference were formalized, introducing the $do$-operator and its associated rules. The main goal of this report is to review and implement in Python some algorithms to compute conditional and non-conditional causal queries from observational data. To this end, we first present some basic background knowledge on probability and graph theory, before introducing important results on causal theory, used in the construction of the algorithms. We then thoroughly study the identification algorithms presented by Shpitser and Pearl in 2006, explaining our implementation in Python alongside. The main identification algorithm can be seen as a repeated application of the rules of $do$-calculus, and it eventually either returns an expression for the causal query from experimental probabilities or fails to identify the causal effect, in which case the effect is non-identifiable. We introduce our newly developed Python library and give some usage examples.	翻訳日:2021-07-14 13:00:16 公開日:2021-07-09
# (参考訳) 非マルコフ確率的リワード過程からの確率的リワードマシンの学習 Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes ( http://arxiv.org/abs/2107.04633v1 ) ライセンス: CC BY 4.0	Alvaro Velasquez, Andre Beckus, Taylor Dohmen, Ashutosh Trivedi, Noah Topper, George Atia	(参考訳) 典型的な環境での強化学習の成功は、部分的には、エージェントが最適なポリシーを学ぶ報酬信号に関するマルコフの仮定に基づくものである。近年、報酬機械の使用は、非マルコフ報酬の構造化表現を可能にしてこの仮定を緩和している。特に、そのような表現は、基礎となる決定プロセスの状態空間を増大させ、非マルコフ強化学習を容易にするために用いられる。しかし、これらの報酬機械は、確率的報酬信号のセマンティクスを捉えることができない。本稿では,非マルコフ確率的報酬の表現として確率的報酬機械(prm)を導入することで,この方向を前進させる。本稿では,意思決定プロセスからPRMを学習するアルゴリズムと,意思決定方針のPRM表現を学習するアルゴリズムを提案する。 The success of reinforcement learning in typical settings is, in part, predicated on underlying Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process as well as to learn the PRM representation of a given decision-making policy.	翻訳日:2021-07-14 12:59:13 公開日:2021-07-09
# (参考訳) ドメインに依存しないpddl+プランナーでangry birdsをプレイする Playing Angry Birds with a Domain-Independent PDDL+ Planner ( http://arxiv.org/abs/2107.04635v1 ) ライセンス: CC BY-SA 4.0	Wiktor Piotrowski, Roni Stern, Matthew Klenk, Alexandre Perez, Shiwali Mohan, Johan de Kleer, Jacob Le	(参考訳) 本稿では,ドメインに依存しないプランナーを用いて人気のangry birdsゲームを初めてプレイするシステムを提案する。我々のシステムは、混合離散/連続ドメインのための計画言語PDDL+を用いて、Angry Birdsレベルをモデル化する。ドメインに依存しないPDDL+プランナーを使用してプランを生成し、実行する。本稿では,本ドメインのPDDL+モデルについて述べるとともに,問題の複雑性を低減させる重要な設計上の決定事項を特定し,本ドメインのモデル固有の手法と比較する。その結果,本システムの性能はangry birdsの他のドメイン固有システムと同等であり,このベンチマークai課題に対するドメイン独立計画の適用性が示唆された。 This demo paper presents the first system for playing the popular Angry Birds game using a domain-independent planner. Our system models Angry Birds levels using PDDL+, a planning language for mixed discrete/continuous domains. It uses a domain-independent PDDL+ planner to generate plans and executes them. In this demo paper, we present the system's PDDL+ model for this domain, identify key design decisions that reduce the problem complexity, and compare the performance of our system to model-specific methods for this domain. The results show that our system's performance is on par with other domain-specific systems for Angry Birds, suggesting the applicability of domain-independent planning to this benchmark AI challenge.	翻訳日:2021-07-14 12:39:01 公開日:2021-07-09
# (参考訳) 表データにおける反事実生成法に関するフレームワークとベンチマーク A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data ( http://arxiv.org/abs/2107.04680v1 ) ライセンス: CC BY 4.0	Raphael Mazzine and David Martens	(参考訳) 事実的説明は、機械学習の予測を説明する効果的な方法と見なされる。この関心は、そのような説明を生み出すために既に何十ものアルゴリズムが使われている比較的若い文献に反映されている。これらのアルゴリズムは、出力の分類を変えるために機能をどのように変更できるかを見つけることに重点を置いている。しかし、この比較的一般的な目的を異なる方法で達成できるため、これらのアルゴリズムをテストし、ベンチマークする方法論が必要となる。まず、関連する9つの評価指標を用いて、22の表付きデータセットに対する10のアルゴリズム的アプローチに関する大規模なベンチマーク研究を行う。第二に、反事実生成アルゴリズムをテストするための新しいフレームワークの導入です。第三に、反事実的な結果を評価し比較するための客観的指標のセットです。そして最後に、どのアプローチがどのタイプのデータセットで最高のパフォーマンスを得るかを示すベンチマーク結果から洞察を得る。このベンチマーク研究とフレームワークは、実践者がどのテクニックとビルディングブロックが最も適しているかを決定するのに役立ち、研究者が現在および将来のカウンターファクト生成アルゴリズムの設計と評価に役立ちます。以上の結果から,パフォーマンスがデータセット,モデル,スコア,事実点の特異性に大きく依存するため,全体として,反実的説明を生成する最善のアルゴリズムは存在しないことがわかった。 Counterfactual explanations are viewed as an effective way to explain machine learning predictions. This interest is reflected by a relatively young literature with already dozens of algorithms aiming to generate such explanations. These algorithms are focused on finding how features can be modified to change the output classification. However, this rather general objective can be achieved in different ways, which brings about the need for a methodology to test and benchmark these algorithms. The contributions of this work are manifold: First, a large benchmarking study of 10 algorithmic approaches on 22 tabular datasets is performed, using 9 relevant evaluation metrics. Second, the introduction of a novel, first of its kind, framework to test counterfactual generation algorithms. Third, a set of objective metrics to evaluate and compare counterfactual results. And finally, insight from the benchmarking results that indicate which approaches obtain the best performance on what type of dataset. 
This benchmarking study and framework can help practitioners in determining which technique and building blocks most suit their context, and can help researchers in the design and evaluation of current and future counterfactual generation algorithms. Our findings show that, overall, there's no single best algorithm to generate counterfactual explanations as the performance highly depends on properties related to the dataset, model, score and factual point specificities.	翻訳日:2021-07-14 12:35:24 公開日:2021-07-09
# (参考訳) 時空間ロバストエッジネットワーク Scaled-Time-Attention Robust Edge Network ( http://arxiv.org/abs/2107.04688v1 ) ライセンス: CC BY 4.0	Richard Lau, Lihan Yao, Todd Huster, William Johnson, Stephen Arleth, Justin Wong, Devin Ridge, Michael Fletcher, William C. Headley	(参考訳) 本稿では,貯水池ニューラルネットワークの遅延ループバージョンに基づくニューラルネットの新しいファミリーを構築するための体系的なアプローチについて述べる。結果として得られたアーキテクチャは、STARE(Scaled-Time-Attention Robust Edge)ネットワークと呼ばれ、超次元空間と非乗算演算を利用して、浅いレイヤを持ち、トレーニングが簡単で、従来のディープニューラルネットワークよりもIoT(Internet of Things)のようなエッジアプリケーションに適している。 STAREは、注意やコンテキストといった新しいAI概念を取り入れており、時間的特徴抽出と分類に最も適している。 stareは様々なアプリケーションに適用でき、パフォーマンスが向上し、実装の複雑さが低下する。特に,空間的(ビデオフレーム)情報と時間的(軌道)情報の両方を利用して,対向無人航空システム(UAS)検出アプリケーションにおいて,二重ループ構成をドローン対鳥の検出と識別に応用する方法を示した。また、STAREの性能は、RF変調の分類において最先端のディープニューラルネットワークに近づき、マッキーグラスの時系列予測の特別な場合において長短期記憶(LSTM)より優れることを示した。ハードウェア効率を実証するために,STAREアルゴリズムのFPGA実装を開発し,その低消費電力かつ高スループットな演算を実証した。さらに,ASIC実装のためのSTAREアルゴリズムの大規模並列実装を統合するための効率的な構造について述べる。 This paper describes a systematic approach towards building a new family of neural networks based on a delay-loop version of a reservoir neural network. The resulting architecture, called Scaled-Time-Attention Robust Edge (STARE) network, exploits hyper dimensional space and non-multiply-and-add computation to achieve a simpler architecture, which has shallow layers, is simple to train, and is better suited for Edge applications, such as Internet of Things (IoT), over traditional deep neural networks. STARE incorporates new AI concepts such as Attention and Context, and is best suited for temporal feature extraction and classification. We demonstrate that STARE is applicable to a variety of applications with improved performance and lower implementation complexity. In particular, we showed a novel way of applying a dual-loop configuration to detection and identification of drone vs bird in a counter Unmanned Air Systems (UAS) detection application by exploiting both spatial (video frame) and temporal (trajectory) information. 
We also demonstrated that the STARE performance approaches that of a state-of-the-art deep neural network in classifying RF modulations, and outperforms Long Short-Term Memory (LSTM) in a special case of Mackey-Glass time series prediction. To assess hardware efficiency, we designed and developed an FPGA implementation of the STARE algorithm and demonstrated its low-power, high-throughput operation. In addition, we illustrate an efficient structure for integrating a massively parallel implementation of the STARE algorithm for ASIC implementation.	翻訳日:2021-07-14 12:34:22 公開日:2021-07-09
# (参考訳) 生涯教師-学生ネットワーク学習 Lifelong Teacher-Student Network Learning ( http://arxiv.org/abs/2107.04689v1 ) ライセンス: CC BY-SA 4.0	Fei Ye and Adrian G. Bors	(参考訳) 人間の独特の認知能力は、一連の経験から新しい知識とスキルを得る能力から成り立っている。一方、人工知能システムは、過去に学んだデータベースを覚えることなく、与えられた最後のタスクのみを学ぶのに長けている。本稿では,教師-学生ネットワークフレームワークを用いた生涯学習手法を提案する。学生モジュールが与えられた新しいデータベースでトレーニングされている間、教師モジュールは過去に学んだ情報を学生に思い出させる。教師モジュールはGAN(Generative Adversarial Network)によって実装され、以前に学習したデータベースの確率的表現に対応する過去の知識を保存・再生するように訓練されている。一方、学生モジュールは変分オートエンコーダ(VAE)によって実装され、教師モジュールの出力と新たに利用可能なデータベースの両方から潜在変数表現を推論する。さらに、学生モジュールは、異なるドメインにまたがる連続的および離散的なデータ表現の両方をキャプチャするように訓練される。提案した生涯学習フレームワークは、教師付き、半教師付き、教師なしの訓練に適用される。コードは以下で公開されている: \url{https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning} A unique cognitive capability of humans consists in their ability to acquire new knowledge and skills from a sequence of experiences. Meanwhile, artificial intelligence systems are good at learning only the last given task without being able to remember the databases learnt in the past. We propose a novel lifelong learning methodology by employing a Teacher-Student network framework. While the Student module is trained with a new given database, the Teacher module would remind the Student about the information learnt in the past. The Teacher, implemented by a Generative Adversarial Network (GAN), is trained to preserve and replay past knowledge corresponding to the probabilistic representations of previously learnt databases. Meanwhile, the Student module is implemented by a Variational Autoencoder (VAE) which infers its latent variable representation from both the output of the Teacher module and the newly available database. Moreover, the Student module is trained to capture both continuous and discrete underlying data representations across different domains. The proposed lifelong learning framework is applied in supervised, semi-supervised and unsupervised training. 
The code is available at \url{https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning}	翻訳日:2021-07-14 11:39:12 公開日:2021-07-09
# (参考訳) 非Native Spoken Question-Answering の初期調査 An Initial Investigation of Non-Native Spoken Question-Answering ( http://arxiv.org/abs/2107.04691v1 ) ライセンス: CC BY 4.0	Vatsal Raina, Mark J.F. Gales	(参考訳) テキストベースマシン理解(MC)システムには幅広い応用があり、アプローチの開発と評価には標準コーパスが存在する。音声質問応答 (SQA) システムの研究は, はるかに少ない。本論文で検討するSQAタスクは,即応型言語アセスメントテストにおいて,質問に対する受験者の音声応答から回答を抽出することである。例えばオフトピック応答検出ではなく、これらのMCアプローチをこのSQAタスクに適用することで、さらに下流の処理に使用できるはるかに詳細な情報が得られる。重要な課題の1つは、このタスクのためにシステムを訓練するために適切に注釈付けされた音声コーパスがないことである。したがって、非ネイティブ話者によるSQAタスクにおいて、テキストベースのMCで訓練されたシステムを評価する転移学習方式を採用する。ミスマッチは、テキスト文書と音声応答の間、および非ネイティブな話し言葉の文法と書き言葉の文法の間で考慮されなければならない。実用的なSQAでは、ASRシステムを使用し、ASRエラーの影響を調べる必要がある。 SQuAD2.0 で訓練された単純なテキストベースの ELECTRA MC モデルが,SQA に対して良好に転移することを示す。その結果,ASR誤差とSQA評価スコアにはほぼ線形な関係がみられたが,文法的ミスマッチの影響は最小限であった。 Text-based machine comprehension (MC) systems have a wide range of applications, and standard corpora exist for developing and evaluating approaches. There has been far less research on spoken question answering (SQA) systems. The SQA task considered in this paper is to extract the answer from a candidate's spoken response to a question in a prompt-response style language assessment test. Applying these MC approaches to this SQA task rather than, for example, off-topic response detection provides far more detailed information that can be used for further downstream processing. One significant challenge is the lack of appropriately annotated speech corpora to train systems for this task. Hence, a transfer-learning style approach is adopted where a system trained on text-based MC is evaluated on an SQA task with non-native speakers. Mismatches must be considered between text documents and spoken responses, and between non-native spoken grammar and written grammar. In practical SQA, ASR systems are used, necessitating an investigation of the impact of ASR errors. We show that a simple text-based ELECTRA MC model trained on SQuAD2.0 transfers well for SQA. 
It is found that there is an approximately linear relationship between ASR errors and the SQA assessment scores but grammar mismatches have minimal impact.	翻訳日:2021-07-14 10:59:07 公開日:2021-07-09
# (参考訳) 変分オートエンコーダの寿命混合 Lifelong Mixture of Variational Autoencoders ( http://arxiv.org/abs/2107.04694v1 ) ライセンス: CC BY-SA 4.0	Fei Ye and Adrian G. Bors	(参考訳) 本稿では,専門家による終末から終末までの学習の組み合わせを提案する。各専門家は変分オートエンコーダ(VAE)によって実装される。混合システムのエキスパートは、与えられたトレーニングサンプルのログライクな状態において、個々のコンポーネントエビデンスローバウンド(MELBO)の混合物を最大化することによって共同で訓練される。混合における混合係数は、目標表現における各専門家の貢献を制御する。これらは、生涯学習中の非パラメトリック推定によってパラメータが決定されるディリクレ分布からサンプリングされる。モデルは、これらが以前学んだものと似ている場合に、新しいタスクを素早く学習することができる。 VAE(L-MVAE)のLifelong混合は、完全に新しいタスクを学ぶ際に、アーキテクチャを新しいコンポーネントで拡張する。トレーニング後、我々のモデルは、新しいデータサンプルを投入する際に使用する関連する専門家を自動的に決定できる。このメカニズムは、推論中に専門家が1人しか使わないため、メモリ効率と計算コストの両方に効果がある。 L-MVAE推論モデルは、異なるタスクに関連するデータ領域にまたがる結合潜在空間において補間を行うことができ、非絡み合いの学習表現に効率的であることが示されている。 In this paper, we propose an end-to-end lifelong learning mixture of experts. Each expert is implemented by a Variational Autoencoder (VAE). The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds (MELBO) on the log-likelihood of the given training samples. The mixing coefficients in the mixture, control the contributions of each expert in the goal representation. These are sampled from a Dirichlet distribution whose parameters are determined through non-parametric estimation during lifelong learning. The model can learn new tasks fast when these are similar to those previously learnt. The proposed Lifelong mixture of VAE (L-MVAE) expands its architecture with new components when learning a completely new task. After the training, our model can automatically determine the relevant expert to be used when fed with new data samples. This mechanism benefits both the memory efficiency and the required computational cost as only one expert is used during the inference. The L-MVAE inference model is able to perform interpolation in the joint latent space across the data domains associated with different tasks and is shown to be efficient for disentangled learning representation.	
翻訳日:2021-07-14 10:47:32 公開日:2021-07-09
# (参考訳) L2M:最適化駆動第2モーメント推定による後部ラプラス近似 L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation ( http://arxiv.org/abs/2107.04695v1 ) ライセンス: CC BY 4.0	Christian S. Perone, Roberto Pereira Silveira, Thomas Paula	(参考訳) ディープニューラルネットワークの不確かさの定量化は、最近多くの技術を通じて進化している。本研究では,計算的に魅力的な後方近似の古典的アプローチであるLaplace近似を再検討する。しかし、曲率行列を計算する代わりに、いくつかの正規性条件の下では、ラプラス近似が勾配第二モーメントを用いて容易に構成できることを示す。この量はアダムやRMSpropのような多くの指数移動平均変種によって既に推定されているが、伝統的に訓練後に捨てられている。提案手法(l2m)はモデルや最適化の変更を必要とせず、合理的な結果を得るために数行のコードで実装でき、新しいハイパーパラメータを導入することなく、既にオプティマイザによって計算されているもの以外の計算ステップも必要としないことを示す。提案手法は,深部ニューラルネットワークにおける不確実性推定のための最適化器によって既に計算されている量を用いて,新たな研究方向を開拓できることを期待する。 Uncertainty quantification for deep neural networks has recently evolved through many techniques. In this work, we revisit Laplace approximation, a classical approach for posterior approximation that is computationally attractive. However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment. This quantity is already estimated by many exponential moving average variants of Adagrad such as Adam and RMSprop, but is traditionally discarded after training. We show that our method (L2M) does not require changes in models or optimization, can be implemented in a few lines of code to yield reasonable results, and it does not require any extra computational steps besides what is already being computed by optimizers, without introducing any new hyperparameter. We hope our method can open new research directions on using quantities already computed by optimizers for uncertainty estimation in deep neural networks.	翻訳日:2021-07-14 10:19:52 公開日:2021-07-09
# (参考訳) InfoVAEGAN : 情報最大化と最大確率による理解可能表現の学習 InfoVAEGAN : learning joint interpretable representations by information maximization and maximum likelihood ( http://arxiv.org/abs/2107.04705v1 ) ライセンス: CC BY-SA 4.0	Fei Ye and Adrian G. Bors	(参考訳) 非絡み合い(disentangled)で解釈可能な表現の学習は、多様体上の包括的なデータ表現を達成するための重要なステップである。本稿では,変分オートエンコーダ(VAE)の推論能力と生成型逆ネットワーク(GAN)の一般化能力を組み合わせた新しい表現学習アルゴリズムを提案する。提案モデルはInfoVAEGANと呼ばれ,Encoder, Generator, Discriminatorの3つのネットワークで構成されている。 InfoVAEGANは、ジェネレータの分布からサンプリングされた変数に対して2つの異なるデータフリーな対数尤度関数を使用することにより、離散的かつ連続的な解釈可能な表現を教師なしで共同学習することを目的としている。本稿では,推論ネットワークを生成器のトレーニングとは別に最適化する2段階アルゴリズムを提案する。さらに,既存の潜伏変数と生成および推論プロセスによって生成された変数間の相互情報の最大化を通じて,解釈可能な表現の学習を実施する。 Learning disentangled and interpretable representations is an important step towards accomplishing comprehensive data representations on the manifold. In this paper, we propose a novel representation learning algorithm which combines the inference abilities of Variational Autoencoders (VAE) with the generalization capability of Generative Adversarial Networks (GAN). The proposed model, called InfoVAEGAN, consists of three networks: Encoder, Generator, and Discriminator. InfoVAEGAN aims to jointly learn discrete and continuous interpretable representations in an unsupervised manner by using two different data-free log-likelihood functions onto the variables sampled from the generator's distribution. We propose a two-stage algorithm for optimizing the inference network separately from the generator training. Moreover, we enforce the learning of interpretable representations through the maximization of the mutual information between the existing latent variables and those created through generative and inference processes.	翻訳日:2021-07-14 10:10:49 公開日:2021-07-09
# (参考訳) 長寿命双対生成対向ネットワーク Lifelong Twin Generative Adversarial Networks ( http://arxiv.org/abs/2107.04708v1 ) ライセンス: CC BY-SA 4.0	Fei Ye and Adrian G. Bors	(参考訳) 本稿では,ライフロングツイン生成適応ネットワーク (LT-GAN) と呼ばれる連続学習型生成モデルを提案する。 LT-GANは複数のデータベースから一連のタスクを学習し、そのアーキテクチャは3つのコンポーネントで構成されている。 lt-gansが忘れずに新しい概念を学べるようにするため、教師とアシスタントが交互に相互に教え合うように促し、新しいデータベースを学習しながら、生涯にわたって学習する新しい訓練手法、lakd(lifelong adversarial knowledge distillation)を導入する。このトレーニングアプローチは、より知識のあるプレイヤーから、以前与えられたタスクに関する情報が少ない他のプレイヤーに知識を移すことを好む。 In this paper, we propose a new continuously learning generative model, called the Lifelong Twin Generative Adversarial Networks (LT-GANs). LT-GANs learns a sequence of tasks from several databases and its architecture consists of three components: two identical generators, namely the Teacher and Assistant, and one Discriminator. In order to allow for the LT-GANs to learn new concepts without forgetting, we introduce a new lifelong training approach, namely Lifelong Adversarial Knowledge Distillation (LAKD), which encourages the Teacher and Assistant to alternately teach each other, while learning a new database. This training approach favours transferring knowledge from a more knowledgeable player to another player which knows less information about a previously given task.	翻訳日:2021-07-14 10:01:07 公開日:2021-07-09
# (参考訳) 劣化網膜の基底画像におけるランドマーク検出のための階層型ボトルネック注意U-Net U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina ( http://arxiv.org/abs/2107.04721v1 ) ライセンス: CC BY 4.0	Shuyun Tang, Ziming Qi, Jacob Granley and Michael Beyeler	(参考訳) 眼底写真は、臨床における加齢関連黄斑変性症(AMD)、緑内障、糖尿病網膜症(DR)などの網膜変性疾患の存在と重症度を日常的に記録するために使われてきた。しかし、網膜変性に伴う病変、ドルゼン、その他の網膜異常の発生は、自動的ランドマーク検出とセグメンテーションを著しく複雑にする。本稿では,階層的ボトルネックに注目するU-NetバックボーンHBA-U-Netを提案する。このネットワークは、自己注意、チャネルアテンション、および相対的な位置アテンションを組み合わせた、新たなボトルネックアテンションブロックで構成されており、変性網膜における卵胞およびODセグメンテーションに重要な網膜異常を強調している。 hba-u-netは、データセットと眼の状態(adam: euclidean distance (ed) of 25.4 pixels, refuge: 32.5 pixels, idrid: 32.1 pixels), on od segmentation for amd (adam: dice coefficient (dc) of 0.947), on od detection for dr (idrid: ed of 20.5 pixels)の最新の結果を得た。以上の結果から,HBA-U-Netは網膜変性疾患の存在下でのランドマーク検出に適している可能性が示唆された。 Fundus photography has routinely been used to document the presence and severity of retinal degenerative diseases such as age-related macular degeneration (AMD), glaucoma, and diabetic retinopathy (DR) in clinical practice, for which the fovea and optic disc (OD) are important retinal landmarks. However, the occurrence of lesions, drusen, and other retinal abnormalities during retinal degeneration severely complicates automatic landmark detection and segmentation. Here we propose HBA-U-Net: a U-Net backbone enriched with hierarchical bottleneck attention. The network consists of a novel bottleneck attention block that combines and refines self-attention, channel attention, and relative-position attention to highlight retinal abnormalities that may be important for fovea and OD segmentation in the degenerated retina. 
HBA-U-Net achieved state-of-the-art results on fovea detection across datasets and eye conditions (ADAM: Euclidean Distance (ED) of 25.4 pixels, REFUGE: 32.5 pixels, IDRiD: 32.1 pixels), on OD segmentation for AMD (ADAM: Dice Coefficient (DC) of 0.947), and on OD detection for DR (IDRiD: ED of 20.5 pixels). Our results suggest that HBA-U-Net may be well suited for landmark detection in the presence of a variety of retinal degenerative diseases.	翻訳日:2021-07-14 09:51:17 公開日:2021-07-09
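The two metrics quoted in the entry above, Euclidean distance (ED) in pixels for landmark detection and the Dice coefficient (DC) for segmentation, have standard definitions that can be sketched as below. This is only an illustration of those definitions, not the authors' evaluation code. The function names, the nested-list mask format, and the example coordinates are assumptions made for this sketch.

```python
import math


def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists of 0/1."""
    inter = size_pred = size_true = 0
    for row_p, row_t in zip(pred_mask, true_mask):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            size_pred += 1 if p else 0
            size_true += 1 if t else 0
    if size_pred + size_true == 0:
        return 1.0  # both masks empty: treated here as perfect agreement (a convention)
    return 2.0 * inter / (size_pred + size_true)


def euclidean_distance(pred_xy, true_xy):
    """Pixel distance between a predicted and a reference landmark, each given as (x, y)."""
    return math.hypot(pred_xy[0] - true_xy[0], pred_xy[1] - true_xy[1])


# Hypothetical toy inputs, chosen only to exercise the functions.
pred = [[0, 1, 1],
        [0, 1, 0]]
true = [[0, 1, 0],
        [0, 1, 1]]
dc = dice_coefficient(pred, true)                # 2 overlapping pixels, 3 + 3 foreground pixels
ed = euclidean_distance((103, 204), (100, 200))  # offsets of 3 and 4 pixels
```

A lower ED and a higher DC indicate better agreement with the reference annotation, which is the direction in which the figures in the entry should be read.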
# 非可逆目的を持つオーバーパラメータモデルのトレーニング Training Over-parameterized Models with Non-decomposable Objectives ( http://arxiv.org/abs/2107.04641v1 ) ライセンス: Link先を確認	Harikrishna Narasimhan, Aditya Krishna Menon	(参考訳) 多くの現代の機械学習アプリケーションは、最悪のケースエラーを最小限に抑えること、与えられた精度やリコールターゲットを満たすこと、グループフェアネスの制約を強制することなど、複雑で曖昧な設計目標を掲げている。このような分解不能な目的を最適化するための一般的なテクニックは、問題をコストに敏感な一連の学習タスクに還元し、それぞれがサンプル固有のコストでトレーニング損失を再重み付けすることで解決する。ラベルコストを組み込むために損失を再重み付けする標準的なアプローチは、過パラメータモデルのトレーニングで不満足な結果をもたらす可能性がある、と指摘する。そこで本稿では,ロジット調整という古典的な考え方を拡張し,より一般的なコスト行列を扱うための新たなコスト感受性損失を提案する。私たちの損失は校正され、教師モデルからの蒸留ラベルによってさらに改善できます。ベンチマーク画像データセットの実験を通じて、共通の頑健で制約のある最適化目標を持つResNetモデルのトレーニングにおいて、我々のアプローチの有効性を示す。 Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives.	翻訳日:2021-07-13 16:17:35 公開日:2021-07-09
# 直線上の精度:分布外と分布内一般化の強い相関について Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization ( http://arxiv.org/abs/2107.04649v1 ) ライセンス: Link先を確認	John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt	(参考訳) 機械学習システムが信頼できるためには、その性能を無意識で分散しない環境で理解する必要がある。本稿では,様々なモデルに対する分配性能と分配性能が強く相関していることを実証的に示す。具体的には,YCBオブジェクトから合成されたポーズ推定タスク,FMoW-WILDSの衛星画像分類,iWildCam-WILDSの野生生物分類,CIFAR-10とImageNetの変種に対する分布内分布と分布外分布性能の相関性を示す。モデルアーキテクチャ、ハイパーパラメータ、トレーニングセットサイズ、トレーニング期間の間に強い相関関係があり、既存のドメイン適応理論から予想されるよりも正確である。また,CIFAR-10-Cと組織分類データセットCamelyon17-WILDSの合成分布の変化など,相関が弱いケースについても検討した。最後に,分布シフトによるデータ共分散の変化が観測された相関に与える影響を示すガウスデータモデルに基づく候補理論を提案する。 For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.	翻訳日:2021-07-13 16:14:33 公開日:2021-07-09
# 変分オートエンコーダにおけるエンコーダの表現複雑性に及ぼす可逆性の影響 The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders ( http://arxiv.org/abs/2107.04652v1 ) ライセンス: Link先を確認	Divyansh Pareek, Andrej Risteski	(参考訳) 現代のニューラルネットワークに基づく潜在変数生成モデル(変分オートエンコーダなど)のトレーニングと使用には、しばしば、潜在変数の後方分布を近似する推論(エンコード)方向とともに生成方向の訓練を同時に行う必要がある。与えられた生成モデルの後方分布を正確にモデル化するために、推論モデルはどの程度複雑でなければならないのか? 本稿では,エンコーダの必要なサイズに影響を及ぼす生成写像の重要な特性を同定する。生成写像が「強可逆(strongly invertible)」ならば(ある意味では、適切に形式化できる)、推論モデルはそれほど複雑ではない。逆に、エンコーディング方向が指数関数的に大きい(計算複雑性の標準的な仮定の下で)必要となる非可逆生成写像が存在することを証明する。重要なことは、生成モデルは階層的に非可逆である必要はなく、関係する文献の多くが想定し、実際に使用される多くのアーキテクチャ(例えば、)に満足していない。畳み込みとプールベースのネットワーク)。したがって、低次元多様体上にデータを置くと、深層生成モデルの学習が困難であるという経験的知恵を理論的に支持する。 Training and using modern neural-network based latent-variable generative models (like Variational Autoencoders) often require simultaneously training a generative direction along with an inferential(encoding) direction, which approximates the posterior distribution over the latent variables. Thus, the question arises: how complex does the inferential model need to be, in order to be able to accurately model the posterior distribution of a given generative model? In this paper, we identify an important property of the generative map impacting the required size of the encoder. We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex. Conversely, we prove that there exist non-invertible generative maps, for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity). Importantly, we do not require the generative model to be layerwise invertible, which a lot of the related literature assumes and isn't satisfied by many architectures used in practice (e.g. convolution and pooling based networks). 
Thus, we provide theoretical support for the empirical wisdom that learning deep generative models is harder when data lies on a low-dimensional manifold.	翻訳日:2021-07-13 16:14:14 公開日:2021-07-09
# 因果推論における感度解析のためのHölder境界 Hölder Bounds for Sensitivity Analysis in Causal Reasoning ( http://arxiv.org/abs/2107.04661v1 ) ライセンス: Link先を確認	Serge Assaad, Shuxi Zeng, Henry Pfister, Fan Li, Lawrence Carin	(参考訳) 本研究では,未観測の交絡因子Uが存在する場合に,治療Tが結果Yに与える効果の区間推定を検討する。 Hölderの不等式を用いて、未測定の交絡の度合い(すなわち、接続 U->T の強さと U->Y の強さ)に基づいて、交絡バイアス |E[Y|T=t]-E[Y|do(T=t)]| に対する一連の境界を導出する。これらの境界は、U が T から独立であるとき、または T が与えられたもとで U が Y から独立であるとき(すなわち未観測の交絡がないとき)に厳密である。我々は、分布 p(U) と p(U|T=t) の間の全変動距離、および条件付き期待結果 E[Y|U=u,T=t] の平均期待結果 E[Y|T=t] からの最大偏差(U のすべての可能な値にわたる)に依存する、この境界の特別な場合に焦点を当てる。本稿では,治療効果の区間推定を得るためのこの境界のキャリブレーション戦略について検討し,合成および半合成データセットを用いて境界を実験的に検証する。 We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent of T or when U is independent of Y given T (when there is no unobserved confounding). We focus on a special case of this bound depending on the total variation distance between the distributions p(U) and p(U|T=t), as well as the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to get interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.	翻訳日:2021-07-13 16:13:53 公開日:2021-07-09
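The special case described in the abstract above can be checked numerically for a discrete confounder. The sketch below assumes U is the only confounder, so that E[Y|do(T=t)] is the average of E[Y|U=u,T=t] under p(U), and applies Hölder's inequality with exponents (1, ∞). The factor 2 follows from defining total variation as half the L1 distance; the paper's exact constants and notation may differ, and all names and numbers here are illustrative.

```python
def confounding_bias_and_bound(p_u, p_u_given_t, mu):
    """Return (bias, bound) for a discrete confounder U.

    p_u[i]         = p(U=i)
    p_u_given_t[i] = p(U=i | T=t)
    mu[i]          = E[Y | U=i, T=t]
    """
    observational = sum(q * m for q, m in zip(p_u_given_t, mu))  # E[Y | T=t]
    interventional = sum(p * m for p, m in zip(p_u, mu))         # E[Y | do(T=t)]
    bias = abs(observational - interventional)

    # Total variation distance, taken here as half the L1 distance.
    tv = 0.5 * sum(abs(q - p) for p, q in zip(p_u, p_u_given_t))
    # Largest deviation of E[Y | U=u, T=t] from E[Y | T=t].
    max_dev = max(abs(m - observational) for m in mu)
    return bias, 2.0 * tv * max_dev

bias, bound = confounding_bias_and_bound(
    p_u=[0.5, 0.5], p_u_given_t=[0.8, 0.2], mu=[1.0, 3.0]
)
```

In this toy case the bias is 0.6 and the bound is 0.96. When p(U|T=t) equals p(U), both collapse to zero, matching the abstract's statement that the bound is tight when U is independent of T.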
# 人口ベースのセルフチューニングgcnによる自動グラフ学習 Automated Graph Learning via Population Based Self-Tuning GCN ( http://arxiv.org/abs/2107.04713v1 ) ライセンス: Link先を確認	Ronghang Zhu and Zhiqiang Tao and Yaliang Li and Sheng Li	(参考訳) 効率的なグラフ埋め込みを抽出する顕著な能力のため、グラフ畳み込みネットワーク(GCN)とその変種は、ノード分類、リンク予測、グラフ分類といった幅広いタスクにうまく適用されている。従来のGCNモデルはオーバーフィッティングとオーバースムーシングの問題に悩まされており、DropEdgeのような最近の技術はこれらの問題を緩和し、ディープGCNの開発を可能にする。しかし、GCNモデルのトレーニングは、特に深いGCNモデルにおいて、ドロップアウト率や学習重量減少などのハイパーパラメータの選択に敏感であるため、簡単ではない。本稿では,ハイパーパラメータ最適化によりGCNモデルのトレーニングを自動化することを目的とする。具体的には、代替トレーニングアルゴリズムを用いた自己学習型GCNアプローチを提案し、人口ベーストレーニングスキームを取り入れたアプローチをさらに拡張する。 3つのベンチマークデータセットの実験結果から,複数の代表的ベースラインと比較して,多層GCNの最適化におけるアプローチの有効性が示された。 Owing to the remarkable capability of extracting effective graph embeddings, graph convolutional network (GCN) and its variants have been successfully applied to a broad range of tasks, such as node classification, link prediction, and graph classification. Traditional GCN models suffer from the issues of overfitting and oversmoothing, while some recent techniques like DropEdge could alleviate these issues and thus enable the development of deep GCN. However, training GCN models is non-trivial, as it is sensitive to the choice of hyperparameters such as dropout rate and learning weight decay, especially for deep GCN models. In this paper, we aim to automate the training of GCN models through hyperparameter optimization. To be specific, we propose a self-tuning GCN approach with an alternate training algorithm, and further extend our approach by incorporating the population based training scheme. Experimental results on three benchmark datasets demonstrate the effectiveness of our approaches on optimizing multi-layer GCN, compared with several representative baselines.	翻訳日:2021-07-13 16:11:41 公開日:2021-07-09
# 機械学習モデルの性能解析を改善するトポロジカルフレームワーク A Topological-Framework to Improve Analysis of Machine Learning Model Performance ( http://arxiv.org/abs/2107.04714v1 ) ライセンス: Link先を確認	Henry Kvinge, Colby Wight, Sarah Akers, Scott Howland, Woongjo Choi, Xiaolong Ma, Luke Gosink, Elizabeth Jurrus, Keerti Kappagantula, Tegan H. Emerson	(参考訳) 機械学習モデルと評価されたデータセットがサイズと複雑性が増大するにつれて、モデルのパフォーマンスを理解するためにいくつかの要約統計を使用するプラクティスがますます問題になっている。これは、データの特定のサブポピュレーションにおけるモデル失敗を理解することが重要な現実のシナリオにおいて特に当てはまる。本稿では,データセットをモデルが動作する「空間」として扱う機械学習モデルを評価するためのトポロジカルな枠組みを提案する。これにより、グローバルレベル(テストセット全体)とローカルレベル(特定のサブポピュレーション)の両方で、モデルパフォーマンスに関する情報を整理する原則化された方法が提供されます。最後に,様々な部分集団間のモデル性能を保存・分析するための便利な手法である,トポロジカルデータ構造であるpresheavesについて述べる。 As both machine learning models and the datasets on which they are evaluated have grown in size and complexity, the practice of using a few summary statistics to understand model performance has become increasingly problematic. This is particularly true in real-world scenarios where understanding model failure on certain subpopulations of the data is of critical importance. In this paper we propose a topological framework for evaluating machine learning models in which a dataset is treated as a "space" on which a model operates. This provides us with a principled way to organize information about model performance at both the global level (over the entire test set) and also the local level (on specific subpopulations). Finally, we describe a topological data structure, presheaves, which offer a convenient way to store and analyze model performance between different subpopulations.	翻訳日:2021-07-13 16:06:18 公開日:2021-07-09
# ノイズトレーニングによるエッジ用E2E ASRの改善 Noisy Training Improves E2E ASR for the Edge ( http://arxiv.org/abs/2107.04677v1 ) ライセンス: Link先を確認	Dilin Wang, Yuan Shangguan, Haichuan Yang, Pierce Chuang, Jiatong Zhou, Meng Li, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra	(参考訳) 音声認識(ASR)は現代のエッジデバイスでますます普及している。過去の研究では、エッジデバイス上でコンパクトに動作可能な全ニューロン音声認識器(E2E)を開発した。しかしながら、E2E ASRモデルは過度に適合する傾向にあり、見えないテストデータの一般化には困難である。層正規化、ドロップアウト、スペクトルデータ増大、入力の速度歪みなど、ASRモデルのトレーニングを規則化する様々な手法が提案されている。本稿では,e2e asrモデルトレーニングをさらに改善するための,単純かつ効果的なノイズトレーニング戦略を提案する。学習中にパラメータ空間にランダムノイズを導入することにより,より一般化した収束時のスムースモデルを生成することができる。我々は,高密度かつスパースなEmformerモデルの改良と,一貫したWER削減の観測に雑音学習を適用した。具体的には、90%の間隔でEmformerをトレーニングする場合、それぞれ12%と14%のWER改善をLibriSpeech Test-otherとTest-cleanデータセットで達成します。 Automatic speech recognition (ASR) has become increasingly ubiquitous on modern edge devices. Past work developed streaming End-to-End (E2E) all-neural speech recognizers that can run compactly on edge devices. However, E2E ASR models are prone to overfitting and have difficulties in generalizing to unseen testing data. Various techniques have been proposed to regularize the training of ASR models, including layer normalization, dropout, spectrum data augmentation and speed distortions in the inputs. In this work, we present a simple yet effective noisy training strategy to further improve the E2E ASR model training. By introducing random noise to the parameter space during training, our method can produce smoother models at convergence that generalize better. We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction. Specifically, when training Emformers with 90% sparsity, we achieve 12% and 14% WER improvements on the LibriSpeech Test-other and Test-clean data set, respectively.	翻訳日:2021-07-13 16:04:55 公開日:2021-07-09
# 都市3次元モデリングのための累積評価 Cumulative Assessment for Urban 3D Modeling ( http://arxiv.org/abs/2107.04622v1 ) ライセンス: Link先を確認	Shea Hagstrom, Hee Won Pak, Stephanie Ku, Sean Wang, Gregory Hager, Myron Brown	(参考訳) 衛星画像からの都市の3Dモデリングには、都市の特徴を描き出すための正確なセマンティックセグメンテーション、表面高さの3D再構成のためのマルチビューステレオ、正確な表面傾斜を持つコンパクトモデルを作成するための3Dモデルフィッティングが必要である。本稿では,これらの各コンポーネントからの誤差寄与を簡潔に捉える累積評価指標を提案する。我々は,挑戦的な公開データセットを提供するとともに,2つのオープンソースプロジェクトを拡張してエンド・ツー・エンドの3Dモデリングベースラインソリューションを提供し,パブリックなリーダボードによるさらなる研究と評価を促進することで,このアプローチを実証する。 Urban 3D modeling from satellite images requires accurate semantic segmentation to delineate urban features, multiple view stereo for 3D reconstruction of surface heights, and 3D model fitting to produce compact models with accurate surface slopes. In this work, we present a cumulative assessment metric that succinctly captures error contributions from each of these components. We demonstrate our approach by providing challenging public datasets and extending two open source projects to provide an end-to-end 3D modeling baseline solution to stimulate further research and evaluation with a public leaderboard.	翻訳日:2021-07-13 16:01:44 公開日:2021-07-09
# 腹腔鏡画像の深度推定のための自己教師あり敵対的生成ネットワーク Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images ( http://arxiv.org/abs/2107.04644v1 ) ライセンス: Link先を確認	Baoru Huang, Jianqing Zheng, Anh Nguyen, David Tuch, Kunal Vyas, Stamatia Giannarou, Daniel S. Elson	(参考訳) 手術シーンの密な深度推定と3次元再構成は,コンピュータ支援手術における重要なステップである。近年の研究では、畳み込みニューラルネットワークによってステレオ画像ペアから深度を推定できることが示されている。しかし、最近の深度推定モデルの多くは、ピクセル単位の正解データ(ground truth)を持つデータセットで訓練されている。このようなデータは腹腔鏡画像では特に稀であり、実際の外科的応用に教師あり深度推定を適用することは困難である。この制限を克服するために,敵対的生成ネットワークに基づく新しい自己教師あり深度推定手法であるSADepthを提案する。これは、エンコーダ・デコーダ型の生成器と、トレーニング中に幾何学的制約を組み込むための識別器で構成される。生成器からのマルチスケール出力は、測光再投影損失に起因する局所最小値の問題を解くのに役立ち、敵対的学習はフレームワークの生成品質を改善する。 2つの公開データセットに対する大規模な実験により、SADepthは最新の最先端の教師なし手法を大きなマージンで上回り、腹腔鏡画像における教師ありと教師なしの深度推定のギャップを縮めることが示された。 Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer assisted surgery. Recent work has shown that depth estimation from a stereo images pair could be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to solve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.	翻訳日:2021-07-13 16:01:33 公開日:2021-07-09
# DDCNet: ディエンス予測のための深層拡張畳み込みニューラルネットワーク DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction ( http://arxiv.org/abs/2107.04715v1 ) ライセンス: Link先を確認	Ali Salehi, Madhusudhanan Balasubramanian	(参考訳) 光フローや不均一性推定などの複雑なピクセルマッチング問題は、コンピュータビジョンにおいて最も難しい課題である。近年,これらの問題に対する深層学習手法が成功している。十分に大きな有効受容場(ERF)とネットワーク内の空間的特徴の高分解能な分解能は、高分解能な密度推定を提供することに不可欠である。本稿では,高い空間的特徴分解能を維持しつつ,より広い受容領域を提供できるネットワークアーキテクチャを設計するためのシステム的アプローチを提案する。より大きなRFを実現するために,拡張畳み込み層を利用した。より深い層での拡散率を積極的に増加させることで、トレーニング可能なパラメータの数が著しく少ない十分に大きなRFを達成できた。ネットワーク設計戦略の第一指標として,光フロー推定問題を用いた。ベンチマークの結果(sintel, kitti, middlebury)は、私たちのコンパクトネットワークが軽量ネットワークのクラスで同等のパフォーマンスを達成できることを示しています。 Dense pixel matching problems such as optical flow and disparity estimation are among the most challenging tasks in computer vision. Recently, several deep learning methods designed for these problems have been successful. A sufficiently larger effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates. In this work, we present a systemic approach to design network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution. To achieve a larger ERF, we utilized dilated convolutional layers. By aggressively increasing dilation rates in the deeper layers, we were able to achieve a sufficiently larger ERF with a significantly fewer number of trainable parameters. We used optical flow estimation problem as the primary benchmark to illustrate our network design strategy. The benchmark results (Sintel, KITTI, and Middlebury) indicate that our compact networks can achieve comparable performance in the class of lightweight networks.	翻訳日:2021-07-13 16:01:16 公開日:2021-07-09
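The DDCNet abstract above argues that raising dilation rates in deeper layers enlarges the receptive field without adding trainable parameters. A minimal sketch of that arithmetic for a stack of stride-1 convolutions is given below. The layer configurations are hypothetical and are not DDCNet's actual architecture, and the result is the theoretical receptive field, which is an upper limit on the effective receptive field (ERF) the abstract refers to.

```python
def receptive_field(layers):
    """Theoretical receptive field (in pixels, along one axis) of a stack
    of stride-1 convolutions given as (kernel_size, dilation) pairs."""
    rf = 1
    for kernel_size, dilation in layers:
        # Each layer widens the field by (k - 1) * d input pixels.
        rf += (kernel_size - 1) * dilation
    return rf

# Hypothetical 4-layer stacks; both use 3-tap kernels, so they have the
# same number of weights per layer.
plain = [(3, 1), (3, 1), (3, 1), (3, 1)]
dilated = [(3, 1), (3, 2), (3, 4), (3, 8)]

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 31
```

Doubling the dilation at each layer makes the field grow geometrically with depth, whereas the undilated stack grows only linearly.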
# SITHCon: 時間次元における入力スケーリングの変動に頑健なニューラルネットワーク SITHCon: A neural network robust to variations in input scaling on the time dimension ( http://arxiv.org/abs/2107.04616v1 ) ライセンス: Link先を確認	Brandon G. Jacques, Zoran Tiganj, Aakash Sarkar, Marc W. Howard, Per B. Sederberg	(参考訳) 機械学習では、畳み込みニューラルネットワーク(CNN)はコンピュータビジョンと時間とともに拡張されたパターンの認識の両方に非常に影響を与えた。コンピュータビジョンにおいて、柔軟性の一部は、変換不変性を達成するために畳み込み上の最大プール演算を使用することによって生じる。哺乳類の脳では、時間の神経表現は時間基底関数のセットを使用する。批判的に、これらの基底関数は、基底集合が対数時間上で均等に分布するように幾何級数に配置されているように見える。本稿では,対数的に分散した時間メモリを用いたSITHCon(Scale-Invariant Temporal History Convolution Network)を提案する。対数分布した時間記憶上の最大プールは、時間のスケール不変性をもたらす。 SITHConの性能を時間的畳み込みネットワーク(TCN)と比較し、両ネットワークが単変量および多変量時系列$f(t)$の分類と回帰問題を学習できるが、入力$f(at)$の再スケールに再学習することなく一般化できる特性を持つのはSITHConのみであることを示す。この性質は神経科学や心理学の知見に触発され、トレーニングの高速化や一般化性の向上など、大幅に異なる能力を持つ大規模ネットワークに繋がる可能性がある。 In machine learning, convolutional neural networks (CNNs) have been extremely influential in both computer vision and in recognizing patterns extended over time. In computer vision, part of the flexibility arises from the use of max-pooling operations over the convolutions to attain translation invariance. In the mammalian brain, neural representations of time use a set of temporal basis functions. Critically, these basis functions appear to be arranged in a geometric series such that the basis set is evenly distributed over logarithmic time. This paper introduces a Scale-Invariant Temporal History Convolution network (SITHCon) that uses a logarithmically-distributed temporal memory. A max-pool over a logarithmically-distributed temporal memory results in scale-invariance in time. We compare performance of SITHCon to a Temporal Convolution Network (TCN) and demonstrate that, although both networks can learn classification and regression problems on both univariate and multivariate time series $f(t)$, only SITHCon has the property that it generalizes without retraining to rescaled versions of the input $f(at)$. 
This property, inspired by findings from neuroscience and psychology, could lead to large-scale networks with dramatically different capabilities, including faster training and greater generalizability, even with significantly fewer free parameters.	翻訳日:2021-07-13 15:55:31 公開日:2021-07-09
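The SITHCon abstract above rests on one observation: on a logarithmically spaced time axis, rescaling the input f(t) to f(at) becomes a shift, and a max-pool is unaffected by shifts. The toy sketch below illustrates only that observation. It does not reproduce the network's actual temporal memory or convolution layers, and the function names, the spacing ratio, and the test signal are assumptions made for illustration.

```python
import math

def log_spaced_memory(f, tau0=1.0, ratio=2.0, n=12):
    """Sample f at geometrically spaced times tau0 * ratio**i."""
    return [f(tau0 * ratio ** i) for i in range(n)]

def bump(t, center=8.0):
    """Toy signal: a bump centred at t = center on a log-time axis."""
    return math.exp(-math.log(t / center) ** 2)

original = log_spaced_memory(bump)
# Rescaled input f(a t) with a = 4 = ratio**2.
rescaled = log_spaced_memory(lambda t: bump(4.0 * t))

# Rescaling by ratio**2 moves the pattern two slots along the log-time axis.
shift_only = rescaled[:-2] == original[2:]
# A max-pool over the memory therefore sees the same value.
same_max = max(original) == max(rescaled)
```

The invariance holds only while the shifted pattern stays inside the sampled range, so the span of the log-spaced axis limits the range of scales that can be tolerated.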
# UAVと畳み込みネットワークの協調群を用いた効率的なリアルタイム画像認識 Efficient Real-Time Image Recognition Using Collaborative Swarm of UAVs and Convolutional Networks ( http://arxiv.org/abs/2107.04648v1 ) ライセンス: Link先を確認	Marwan Dhuheir, Emna Baccour, Aiman Erbad, Sinan Sabeeh, Mounir Hamdi	(参考訳) 無人航空機(uavs)は最近、異なる部門で使用され、困難で危険な地域で使用される能力に優れたため、大きな注目を集めている。さらに、コンピュータビジョンと人工知能の進歩により、森林火災の検出や国境監視といった様々な用途やソリューションにおけるUAVの使用が増加した。しかし、uavsでディープニューラルネットワーク(dnn)を使用することで、より深いネットワークや複雑なモデルを処理することの難しさが生まれ、オンボード計算が制限される。そこで本研究では,画像の分類と意思決定遅延の最小化を目的とした,リソース制約のあるUAV群に推論要求を分散する戦略を提案する。画像取得と最終決定の待ち時間を最小限に抑える最適化問題としてモデルを定式化する。定式化最適化解はnpハード問題である。したがって、オンラインリソース割り当てには不十分である。そこで我々は,オンラインヒューリスティックソリューションであるdistinferenceを導入して,利用可能なuavの中で最良なレイテンシを与えるレイヤ配置戦略を提案する。提案されたアプローチは、異なる低遅延アプリケーションや、レイヤのパイプライン(例えば、vgg)に編成されたすべてのcnnタイプ、あるいは残差ブロック(例えば、resnet)に基づいて使用するのに十分なほど一般的である。 Unmanned Aerial Vehicles (UAVs) have recently attracted significant attention due to their outstanding ability to be used in different sectors and serve in difficult and dangerous areas. Moreover, the advancements in computer vision and artificial intelligence have increased the use of UAVs in various applications and solutions, such as forest fires detection and borders monitoring. However, using deep neural networks (DNNs) with UAVs introduces several challenges of processing deeper networks and complex models, which restricts their on-board computation. In this work, we present a strategy aiming at distributing inference requests to a swarm of resource-constrained UAVs that classifies captured images on-board and finds the minimum decision-making latency. We formulate the model as an optimization problem that minimizes the latency between acquiring images and making the final decisions. The formulated optimization solution is an NP-hard problem. Hence it is not adequate for online resource allocation. 
Therefore, we introduce an online heuristic solution, namely DistInference, to find the layers placement strategy that gives the best latency among the available UAVs. The proposed approach is general enough to be used for different low decision-latency applications as well as for all CNN types organized into the pipeline of layers (e.g., VGG) or based on residual blocks (e.g., ResNet).	翻訳日:2021-07-13 15:48:53 公開日:2021-07-09
# とは何か? アルゴリズムフェアネスにおける形式的および実体的平等 Impossibility of What? Formal and Substantive Equality in Algorithmic Fairness ( http://arxiv.org/abs/2107.04642v1 ) ライセンス: Link先を確認	Ben Green	(参考訳) 社会的・経済的不平等の複合的危機に直面した多くの人々は、社会的公正を達成するためにアルゴリズム的意思決定に目を向けた。これらの取り組みが強化されるにつれて、"algorithmic fairness"という急成長する分野における推論は、実践においての公正さの出現をますます形作る。本稿では, アルゴリズム的公平性が, 社会的平等性を高めるための適切な概念的, 実践的なツールを提供するかどうかを問う。アルゴリズムの公正性に対する支配的な「形式的」アプローチは、その分析の狭い枠組みが改革に対する制限的なアプローチを生成するため、平等を追求する枠組みとして不適切である、と私は論じる。これらの欠点を踏まえて、社会階層に反するアルゴリズム的公正に対する「実質的」アプローチを提案し、不平等に対処する方法をより広範囲に分析する。この静的アプローチは、抑圧と戦うアルゴリズムの役割についてより実りある理論化を可能にする。形式的および実体的アルゴリズム的公正の区別は、各アプローチの「公正の実施可能性」(アルゴリズム的公正の数学的定義の不適合性)に対する応答によって例示される。形式的なアプローチでは、平等を高める努力に対する厳しい制限として「公正の不可能性」を受け入れる必要があるが、従属的なアプローチは、この虚偽のジレンマに従わず、社会的抑圧の状態を改善できるような改革を提案することによって、「公平の不可能性」から逃れることができる。 In the face of compounding crises of social and economic inequality, many have turned to algorithmic decision-making to achieve greater fairness in society. As these efforts intensify, reasoning within the burgeoning field of "algorithmic fairness" increasingly shapes how fairness manifests in practice. This paper interrogates whether algorithmic fairness provides the appropriate conceptual and practical tools for enhancing social equality. I argue that the dominant, "formal" approach to algorithmic fairness is ill-equipped as a framework for pursuing equality, as its narrow frame of analysis generates restrictive approaches to reform. In light of these shortcomings, I propose an alternative: a "substantive" approach to algorithmic fairness that centers opposition to social hierarchies and provides a more expansive analysis of how to address inequality. This substantive approach enables more fruitful theorizing about the role of algorithms in combatting oppression. 
The distinction between formal and substantive algorithmic fairness is exemplified by each approach's responses to the "impossibility of fairness" (an incompatibility between mathematical definitions of algorithmic fairness). While the formal approach requires us to accept the "impossibility of fairness" as a harsh limit on efforts to enhance equality, the substantive approach allows us to escape the "impossibility of fairness" by suggesting reforms that are not subject to this false dilemma and that are better equipped to ameliorate conditions of social oppression.	翻訳日:2021-07-13 15:42:53 公開日:2021-07-09
# モデル削減のためのガウス過程部分空間回帰 Gaussian Process Subspace Regression for Model Reduction ( http://arxiv.org/abs/2107.04668v1 ) ライセンス: Link先を確認	Ruda Zhang and Simon Mak and David Dunson	(参考訳) 部分空間値関数はパラメトリック・リダクション・オーダー・モデリング(PROM)を含む幅広い問題で発生する。 PROM では、各パラメータ点は、大きな系行列のペトロフ・ガレルキン射影に使用される部分空間に関連付けることができる。このような関数を近似する以前の取り組みは、不正確で遅い多様体上の補間を用いる。そこで我々は, ガウス過程部分空間回帰(gps)モデルという, 部分空間予測のためのベイズ非パラメトリックモデルを提案する。ユークリッド空間上の多変量ガウス分布(英語版)(multivariate gaussian distributions on the euclidean space)では、固定次元部分空間の集合であるグラスマン多様体上の合同確率モデル(英語版)(joint probability model)を誘導する。 GPSは単純な相関構造とモデル選択の原則的アプローチを採用している。その予測分布は解析形式を認め、パラメータ空間上の効率的な部分空間予測を可能にする。 PROMの場合、GPSは新しいパラメータポイントで確率的予測を提供し、局所的な縮小モデルの精度を保ち、計算の複雑さはシステム次元に依存しないため、オンライン計算に適している。本手法を部分空間補間と比較する4つの数値例と,局所還元モデルを補間する2つの方法を提案する。全体として、GPSは部分空間補間よりもデータ効率が良く、計算効率も高い。 Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM). In PROM, each parameter point can be associated with a subspace, which is used for Petrov-Galerkin projections of large system matrices. Previous efforts to approximate such functions use interpolations on manifolds, which can be inaccurate and slow. To tackle this, we propose a novel Bayesian nonparametric model for subspace prediction: the Gaussian Process Subspace regression (GPS) model. This method is extrinsic and intrinsic at the same time: with multivariate Gaussian distributions on the Euclidean space, it induces a joint probability model on the Grassmann manifold, the set of fixed-dimensional subspaces. The GPS adopts a simple yet general correlation structure, and a principled approach for model selection. Its predictive distribution admits an analytical form, which allows for efficient subspace prediction over the parameter space. 
For PROM, the GPS provides a probabilistic prediction at a new parameter point that retains the accuracy of local reduced models, at a computational complexity that does not depend on system dimension, and thus is suitable for online computation. We give four numerical examples to compare our method to subspace interpolation, as well as two methods that interpolate local reduced models. Overall, GPS is the most data efficient, more computationally efficient than subspace interpolation, and gives smooth predictions with uncertainty quantification.	翻訳日:2021-07-13 15:37:14 公開日:2021-07-09
# (参考訳) ScenicとVerifAIによる並列・多目的ファルシフィケーション Parallel and Multi-Objective Falsification with Scenic and VerifAI ( http://arxiv.org/abs/2107.04164v1 ) ライセンス: CC BY 4.0	Kesav Viswanadha, Edward Kim, Francis Indaheng, Daniel J. Fremont, Sanjit A. Seshia	(参考訳) ファルシフィケーション(Falsification)は、自律システムのシミュレーションベースの検証のための重要なツールとして登場した。本稿では,並列性を活用し,多目的仕様までファルシフィケーションを拡張することで,サンプリングベースファルシフィケーション法のスケーラビリティを向上する,Scenicシナリオ仕様言語とVerifAIツールキットの拡張について述べる。まず,Scenic のシミュレーションおよびサンプリング機能と VerifAI のファルシフィケーション機能の両方に接続された並列化フレームワークを提案し,シミュレーションベースのテストに内在する実行時間のボトルネックを軽減する。次に,VerifAIのファルシフィケーションアルゴリズムを拡張して,サンプリング中の多目的最適化を支援する。ここではルールブックの概念を用いて,反例探索プロセスの誘導に使用できる複数のメトリクスに対する優先順序を指定する。最後に、これらの拡張の利点を、Scenic言語で書かれた包括的なベンチマークセットで評価する。 Falsification has emerged as an important tool for simulation-based verification of autonomous systems. In this paper, we present extensions to the Scenic scenario specification language and VerifAI toolkit that improve the scalability of sampling-based falsification methods by using parallelism and extend falsification to multi-objective specifications. We first present a parallelized framework that is interfaced with both the simulation and sampling capabilities of Scenic and the falsification capabilities of VerifAI, reducing the execution time bottleneck inherently present in simulation-based testing. We then present an extension of VerifAI's falsification algorithms to support multi-objective optimization during sampling, using the concept of rulebooks to specify a preference ordering over multiple metrics that can be used to guide the counterexample search process. Lastly, we evaluate the benefits of these extensions with a comprehensive set of benchmarks written in the Scenic language.	翻訳日:2021-07-13 02:27:45 公開日:2021-07-09
# (参考訳) 動作単位と表現認識のためのマルチモーダル・マルチタスク学習法 A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition ( http://arxiv.org/abs/2107.04187v1 ) ライセンス: CC BY-SA 4.0	Yue Jin, Tianqing Zheng, Chao Gao, Guoqiang Xu	(参考訳) 人間の感情分析は、人間とコンピュータの相互作用システムにとって不可欠である。ほとんどのメソッドは、Wildの設定に実用的でない制限されたシナリオで開発されます。 ABAW (Affective Behavior Analysis in-the-wild) 2021 コンテストは、この進行中の問題に対するベンチマークを提供する。本稿では,視覚情報と音声情報の両方を用いたマルチモーダル・マルチタスク学習手法を提案する。 auアノテーションと式アノテーションの両方を使用してモデルをトレーニングし、ビデオフレーム間の関連をさらに抽出するためにシーケンスモデルを適用します。検証セット上でauスコア0.712、式スコア0.477を達成する。これらの結果は, モデル性能向上における我々のアプローチの有効性を示す。 Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method by using both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.	翻訳日:2021-07-13 02:16:08 公開日:2021-07-09
# (参考訳) 室内局所化のための非IIDデータを用いた個人化フェデレーション学習 Personalized Federated Learning over non-IID Data for Indoor Localization ( http://arxiv.org/abs/2107.04189v1 ) ライセンス: CC BY 4.0	Peng Wu, Tales Imbiriba, Junha Park, Sunwoo Kim, Pau Closas	(参考訳) データ駆動方式によるオブジェクトの局在化と追跡は,無線チャネル伝搬モデルの物理特性を特徴付ける複雑さから,一般的な話題である。これらのモデリングアプローチでは、ユーザのプライバシが維持されると同時に、モデルを正確にトレーニングするためにデータを収集する必要がある。これらの目標を協調的に達成するための魅力的なスキームは、連合学習(federated learning:fl)と呼ばれる。 FLスキームの課題は、異なる領域を不均一に探索することに起因する非独立で同一の(非IID)データの存在である。本稿では,近年のflスキームを用いて,ベイズ則によって最適に融合されるパーソナライズされたモデルの集合を学習し,屋内ローカライゼーションの文脈において適切であることを示す。 Localization and tracking of objects using data-driven methods is a popular topic due to the complexity in characterizing the physics of wireless channel propagation models. In these modeling approaches, data needs to be gathered to accurately train models, at the same time that user's privacy is maintained. An appealing scheme to cooperatively achieve these goals is known as Federated Learning (FL). A challenge in FL schemes is the presence of non-independent and identically distributed (non-IID) data, caused by unevenly exploration of different areas. In this paper, we consider the use of recent FL schemes to train a set of personalized models that are then optimally fused through Bayesian rules, which makes it appropriate in the context of indoor localization.	翻訳日:2021-07-13 02:10:53 公開日:2021-07-09
# (参考訳) テンソル処理ユニット上の畳み込みネットワークの構造化モデルプルーニング Structured Model Pruning of Convolutional Networks on Tensor Processing Units ( http://arxiv.org/abs/2107.04191v1 ) ライセンス: CC BY 4.0	Kongtao Chen, Ken Franko, Ruoxin Sang	(参考訳) 畳み込みニューラルネットワークの展開は、高い計算能力とストレージ要件によってしばしば妨げられる。構造化モデルプルーニングは、これらの要求を緩和するための有望なアプローチである。例えば、VGG-16モデルを用いて、テンソル処理ユニット(TPU)上の様々な構造化モデルプルーニング手法とデータセット(CIFAR-10およびImageNet)の精度-効率トレードオフを測定する。モデルの実際の性能を測定するため、TensorFlow2のための構造化モデルプルーニングライブラリを開発し、(マスク層を追加する代わりに)モデルを修正する。特に小さなデータセット(例えばcifar-10)では、構造化モデルプルーニングがモデルメモリ使用量とtpusの速度を大幅に改善できることを示した。 The deployment of convolutional neural networks is often hindered by high computational and storage requirements. Structured model pruning is a promising approach to alleviate these requirements. Using the VGG-16 model as an example, we measure the accuracy-efficiency trade-off for various structured model pruning methods and datasets (CIFAR-10 and ImageNet) on Tensor Processing Units (TPUs). To measure the actual performance of models, we develop a structured model pruning library for TensorFlow2 to modify models in place (instead of adding mask layers). We show that structured model pruning can significantly improve model memory usage and speed on TPUs without losing accuracy, especially for small datasets (e.g., CIFAR-10).	翻訳日:2021-07-13 02:01:30 公開日:2021-07-09
# (参考訳) 修正マルチタスク学習手法を用いた不完全ラベルによる感情認識 Emotion Recognition with Incomplete Labels Using Modified Multi-task Learning Technique ( http://arxiv.org/abs/2107.04192v1 ) ライセンス: CC BY 4.0	Phan Tran Dac Thinh, Hoang Manh Hung, Hyung-Jeong Yang, Soo-Hyung Kim, and Guee-Sang Lee	(参考訳) 人間の顔から7つの基本的な感情や行動単位などの感情情報を予測するタスクは、大量の注釈付きデータセットのアクセシビリティーと可用性により、徐々に興味深いものになりつつある。本研究では、afwild2データセットから7つの基本的な感情と12のアクションユニットを関連付ける手法を提案する。 ResNet50のアーキテクチャに基づく手法は、2つのタスクの不完全なラベルに対するマルチタスク学習技術を含む。 2つの相関したタスクの知識を組み合わせることで、両方のパフォーマンスは1種類のラベルのみを使用するモデルと比較して大きなマージンで改善される。 The task of predicting affective information in the wild such as seven basic emotions or action units from human faces has gradually become more interesting due to the accessibility and availability of massive annotated datasets. In this study, we propose a method that utilizes the association between seven basic emotions and twelve action units from the AffWild2 dataset. The method based on the architecture of ResNet50 involves the multi-task learning technique for the incomplete labels of the two tasks. By combining the knowledge for two correlated tasks, both performances are improved by a large margin compared to those with the model employing only one kind of label.	翻訳日:2021-07-13 01:54:19 公開日:2021-07-09
# (参考訳) 構造制約を考慮した確率的軌道予測 Probabilistic Trajectory Prediction with Structural Constraints ( http://arxiv.org/abs/2107.04193v1 ) ライセンス: CC BY 4.0	Weiming Zhi, Lionel Ott, Fabio Ramos	(参考訳) 本研究は,環境中の動的物体の運動軌跡を予測する問題に対処する。最近の動きパターン予測の進歩は、しばしば観測された軌道から動きパターンを外挿する機械学習技術に依存しており、既知の規則を直接組み込むメカニズムはない。本稿では,確率的学習と制約付き軌道最適化を組み合わせた新しい枠組みを提案する。我々のフレームワークの学習コンポーネントは、過去の観測座標に条件付けられた将来の運動軌跡の分布を提供する。この分布は、軌道分布の確率制約を強制する制約付き最適化問題の先行として用いられる。この結果、事前によく似た制約に従順な軌道分布が得られる。特に,外挿された将来の軌道分布が環境構造に適合するように,衝突の制約に焦点をあてる。実世界とシミュレーションされたデータセットを実証的に実証し,運動データに対する複雑な確率的運動軌跡を学習する上で,より堅牢で高品質な軌道分布を生成するために,制約を直接実施する。 This work addresses the problem of predicting the motion trajectories of dynamic objects in the environment. Recent advances in predicting motion patterns often rely on machine learning techniques to extrapolate motion patterns from observed trajectories, with no mechanism to directly incorporate known rules. We propose a novel framework, which combines probabilistic learning and constrained trajectory optimisation. The learning component of our framework provides a distribution over future motion trajectories conditioned on observed past coordinates. This distribution is then used as a prior to a constrained optimisation problem which enforces chance constraints on the trajectory distribution. This results in constraint-compliant trajectory distributions which closely resemble the prior. In particular, we focus our investigation on collision constraints, such that extrapolated future trajectory distributions conform to the environment structure. We empirically demonstrate on real-world and simulated datasets the ability of our framework to learn complex probabilistic motion trajectories for motion data, while directly enforcing constraints to improve generalisability, producing more robust and higher quality trajectory distributions.	翻訳日:2021-07-13 01:49:26 公開日:2021-07-09
# (参考訳) 直感的ユーザ入力からの深層画像合成 : レビューと展望 Deep Image Synthesis from Intuitive User Input: A Review and Perspectives ( http://arxiv.org/abs/2107.04240v1 ) ライセンス: CC BY 4.0	Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang	(参考訳) コンピュータグラフィックス、アート、デザインの多くの応用において、ユーザはテキスト、スケッチ、ストローク、グラフ、レイアウトといった直感的な非画像入力を提供し、入力内容に準拠したフォトリアリスティックな画像を自動的に生成するコンピュータシステムを持つことが望ましい。このような自動画像コンテンツ生成を可能にする古典的な研究は、画像検索と合成の枠組みを踏襲しているが、GAN(generative adversarial network)、VAE(variantal autoencoder)、フローベース手法などの深層生成モデルの進歩により、より強力で汎用的な画像生成タスクが実現されている。本稿では,直感的なユーザ入力による画像合成,入力の汎用性の向上,画像生成手法,ベンチマークデータセット,評価指標について述べる。このことは、入力表現と対話性、主要画像生成パラダイム間のクロスポーリング、および生成方法の評価と比較に関する新しい視点を動機付けている。 In many applications of computer graphics, art and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically generate photo-realistic images that adhere to the input content. While classic works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation tasks. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross pollination between major image generation paradigms, and evaluation and comparison of generation methods.	翻訳日:2021-07-13 01:35:53 公開日:2021-07-09
# (参考訳) WinoCNN:FPGA上での効率的な畳み込みニューラルネットワーク高速化のためのカーネル共有Winograd Systolic Array WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs ( http://arxiv.org/abs/2107.04244v1 ) ライセンス: CC0 1.0	Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen	(参考訳) Winogradのアルゴリズムとsystolic arrayアーキテクチャの組み合わせにより、FPGAプラットフォーム上での畳み込みニューラルネットワーク(CNN)の高速化において、DSP効率を改善する能力が実証された。しかし、FPGAベースのWinograd処理要素で任意のコンボリューションカーネルサイズを扱い、効率的なデータアクセスをサポートすることは未定である。本研究では,WinoPEを最適化し,同じ計算資源で複数のカーネルサイズを自然にサポートし,高い実行時 DSP 効率を維持できる,最適化されたWinograd 処理素子を提案する。提案したWinoPEを用いて,WinoCNNと呼ばれる高効率なシリアルアレイ加速器を構築する。また,データアクセスを最適化する専用メモリサブシステムを提案する。アクセラレーションアーキテクチャに基づいて,リソース制約の異なる最適なアクセラレーション構成を探索するために,正確なリソースとパフォーマンスのモデリングを構築する。提案するアクセラレータを複数のFPGA上で実装し、スループットとDSP効率の両方で最先端の設計を上回ります。 Xilinx ZCU102 FPGA で DSP の効率を 1.33 GOPS/DSP まで向上し,スループットを 3.1 TOPS まで向上させる。これらはそれぞれ、前述した最高の解よりも29.1\%と20.0\%良い。 The combination of Winograd's algorithm and systolic array architecture has demonstrated the capability of improving DSP efficiency in accelerating convolutional neural networks (CNNs) on FPGA platforms. However, handling arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and supporting efficient data access remain underexplored. In this work, we are the first to propose an optimized Winograd processing element (WinoPE), which can naturally support multiple convolution kernel sizes with the same amount of computing resources and maintains high runtime DSP efficiency. Using the proposed WinoPE, we construct a highly efficient systolic array accelerator, termed WinoCNN. We also propose a dedicated memory subsystem to optimize the data access. Based on the accelerator architecture, we build accurate resource and performance modeling to explore optimal accelerator configurations under different resource constraints. 
We implement the proposed accelerator on multiple FPGAs, where it outperforms state-of-the-art designs in both throughput and DSP efficiency. On the Xilinx ZCU102 FPGA, our implementation achieves a DSP efficiency of up to 1.33 GOPS/DSP and a throughput of up to 3.1 TOPS, which are 29.1% and 20.0% better, respectively, than the best previously reported solutions.	翻訳日:2021-07-13 01:00:40 公開日:2021-07-09
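As a rough illustration of the Winograd minimal filtering idea that the WinoCNN abstract above relies on, the following Python sketch computes two outputs of a 3-tap 1D filter with four multiplications instead of six. It is the textbook F(2,3) transform, not the paper's WinoPE design, and the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
def winograd_f23(d, g):
    """Textbook Winograd F(2,3): two outputs of a 3-tap filter from four inputs."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side transform (can be precomputed once per kernel).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # The only four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output-side transform uses additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference sliding-window computation using six multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Both functions should agree up to floating-point rounding for any four-element input tile and three-element kernel; a hardware design would typically extend the same idea to 2D tiles.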
# (参考訳) ハイブリッド自動微分を用いた微分プライベート機械学習における感度解析 Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation ( http://arxiv.org/abs/2107.04265v1 ) ライセンス: CC BY 4.0	Alexander Ziller, Dmitrii Usynin, Moritz Knolle, Kritika Prakash, Andrew Trask, Rickmer Braren, Marcus Makowski, Daniel Rueckert, Georgios Kaissis	(参考訳) 近年,機械学習(ML)などのデータ駆動タスクに展開可能な,差分プライバシー(DP)などの形式的なプライバシ保護手法が出現している。個人のプライバシ損失の原則分析に必要なクローズドフォーム推論と大規模mlの調整には、自動感度分析のための新しいツールの導入と、計算フローを通じて個人のデータとその特徴を追跡することが必要である。そこで,本研究では,逆モードadの効率と計算グラフ内の任意の量に対してクローズドフォーム式を得る能力を組み合わせた,新しい \textit{hybrid} automatic differentiation (ad) システムを提案する。これにより、ニューラルネットワークをプライベートデータ上でトレーニングするなど、任意の微分可能な関数合成の感度をモデル化できる。統計的データベースクエリの個々のDP保証を分析することで、我々のアプローチを実証する。さらに,本手法のdpニューラルネットワークのトレーニングへの応用について検討した。当社のアプローチは,データ処理設定におけるプライバシ損失の原則的推論を可能にし,さらに自動感度分析とプライバシー予算システムの開発を可能にする。 In recent years, formal methods of privacy protection such as differential privacy (DP), capable of deployment to data-driven tasks such as machine learning (ML), have emerged. Reconciling large-scale ML with the closed-form reasoning required for the principled analysis of individual privacy loss requires the introduction of new tools for automatic sensitivity analysis and for tracking an individual's data and their features through the flow of computation. For this purpose, we introduce a novel \textit{hybrid} automatic differentiation (AD) system which combines the efficiency of reverse-mode AD with an ability to obtain a closed-form expression for any given quantity in the computational graph. This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data. We demonstrate our approach by analysing the individual DP guarantees of statistical database queries. Moreover, we investigate the application of our technique to the training of DP neural networks. 
Our approach can enable principled reasoning about privacy loss in data processing settings, and can further the development of automatic sensitivity analysis and privacy budgeting systems.	翻訳日:2021-07-13 00:28:31 公開日:2021-07-09
# (参考訳) fedadapt: フェデレーション学習におけるiotデバイスの適応オフロード FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning ( http://arxiv.org/abs/2107.04271v1 ) ライセンス: CC BY 4.0	Di Wu and Rehmat Ullah and Paul Harvey and Peter Kilpatrick and Ivor Spence and Blesson Varghese	(参考訳) Internet-of-Thingsデバイスにフェデレートラーニング(FL)を適用するには、生成する大量のデータと、データプライバシに関する懸念が不可欠である。しかし、FLを効率的にするためには、3つの課題がある: (i) 限られた計算能力を持つデバイス上で実行し、 (ii) デバイスの計算的不均一性に起因するストラグラーを考慮し、 (iii) ネットワーク帯域幅の変化に適応する。本稿では、上記の課題を軽減するための適応型オフロードFLフレームワークであるFedAdaptを提案する。 FedAdaptは、ディープニューラルネットワーク(DNN)をサーバにオフロードすることで、計算制約のあるデバイスのローカルトレーニングを加速する。さらに、FedAdaptは強化学習に基づく最適化とクラスタリングを採用し、各デバイスにDNNのどの層をオフロードすべきかを適応的に識別し、計算の不均一性やネットワーク帯域幅の変化といった課題に取り組む。 5つのIoTデバイスからなる実験室ベースのテストベッドで実験を行った。デバイスからサーバにDNNをオフロードすることで、FedAdaptは従来のFLに比べて、一般的なIoTデバイスのトレーニング時間を半減する。極端なストラグラーのトレーニング時間と全体のトレーニング時間は最大57%削減できる。さらに、ネットワーク帯域幅の変更により、FedAdaptは、従来のFLと比較してトレーニング時間を最大40%短縮する。 FedAdaptはhttps://github.com/qub-blesson/FedAdaptからダウンロードできる。 Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns of data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execute on devices with limited computational capabilities, (ii) account for stragglers due to computational heterogeneity of devices, and (iii) adapt to the changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement learning-based optimization and clustering to adaptively identify which layers of the DNN should be offloaded for each individual device on to a server to tackle the challenges of computational heterogeneity and changing network bandwidth. 
Experimental studies are carried out on a lab-based testbed comprising five IoT devices. By offloading a DNN from the device to the server, FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, under changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% compared to classic FL, without sacrificing accuracy. FedAdapt can be downloaded from https://github.com/qub-blesson/FedAdapt.	翻訳日:2021-07-13 00:17:38 公開日:2021-07-09
# (参考訳) LIFE: 3D OCT-A 容器セグメンテーションのための汎用オートディクティックパイプライン LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation ( http://arxiv.org/abs/2107.04282v1 ) ライセンス: CC BY 4.0	Dewei Hu, Can Cui, Hao Li, Kathleen E. Larson, Yuankai K. Tao and Ipek Oguz	(参考訳) 光コヒーレンス断層撮影(OCT)は、眼科領域で広く用いられている非侵襲的イメージング技術である。 OCTアンギオグラフィー(OCT-A)に拡張し,コントラストが改善した網膜血管を呈する。近年の深層学習アルゴリズムは血管セグメンテーションに有望な結果をもたらすが,手動による注記データがないため,3次元網膜血管セグメンテーションは困難である。本研究では,局所強度融合(LIF)と呼ばれる自己合成モーメントによってのみ教師される学習に基づく手法を提案する。 LIFは、入力OCT-Aから直接計算される毛細血管拡張ボリュームである。次に、局所強度融合エンコーダ(LIFE)を構築し、与えられたOCT-A体積とそのLIFを共有潜在空間にマップする。 LIFEの潜在空間は入力データと同じ次元を持ち、両方のモダリティに共通する特徴を含む。この潜伏空間をバイナライズすることにより、体積容器セグメンテーションが得られる。本手法はヒト卵胞 OCT-A と 3 個のゼブラフィッシュ OCT-A を手動ラベルで評価した。人間のデータでは0.7736、ゼブラフィッシュデータでは 0.8594 +/-0.0275、教師なしのアルゴリズムより劇的な改善である。 Optical coherence tomography (OCT) is a non-invasive imaging technique widely used for ophthalmology. It can be extended to OCT angiography (OCT-A), which reveals the retinal vasculature with improved contrast. Recent deep learning algorithms produced promising vascular segmentation results; however, 3D retinal vessel segmentation remains difficult due to the lack of manually annotated training data. We propose a learning-based method that is only supervised by a self-synthesized modality named local intensity fusion (LIF). LIF is a capillary-enhanced volume computed directly from the input OCT-A. We then construct the local intensity fusion encoder (LIFE) to map a given OCT-A volume and its LIF counterpart to a shared latent space. The latent space of LIFE has the same dimensions as the input data and it contains features common to both modalities. By binarizing this latent space, we obtain a volumetric vessel segmentation. Our method is evaluated in a human fovea OCT-A and three zebrafish OCT-A volumes with manual labels. It yields a Dice score of 0.7736 on human data and 0.8594 +/- 0.0275 on zebrafish data, a dramatic improvement over existing unsupervised algorithms.	
翻訳日:2021-07-12 23:55:21 公開日:2021-07-09
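The Dice score reported in the LIFE abstract above measures the overlap between a predicted and a reference segmentation. A minimal sketch, assuming both masks are given as flat sequences of 0/1 voxel labels (the function name `dice_score` and the empty-mask convention are assumptions made here, not taken from the paper):

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A & B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A value of 1.0 means the two masks coincide, and 0.0 means they share no foreground voxel.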
# (参考訳) Pseudo-Multimodal Fusion Network を用いた網膜OCT Retinal OCT Denoising with Pseudo-Multimodal Fusion Network ( http://arxiv.org/abs/2107.04288v1 ) ライセンス: CC BY 4.0	Dewei Hu, Joseph D. Malone, Yigit Atay, Yuankai K. Tao and Ipek Oguz	(参考訳) 光コヒーレンストモグラフィー(OCT)は、網膜の一般的なイメージング技術である。しかし、血管や組織層を含む重要な解剖学的構造の可視性を低下させることができる乗法的なスペックルノイズの影響を受けている。連続したBスキャンフレームの平均化はSNR(Signal-to-noise-ratio)を大幅に改善するが、これはより長い取得時間を必要とするため、運動アーティファクトの導入や患者への不快感を引き起こす可能性がある。本研究では,単フレーム雑音b-scan情報と擬似モダリティ情報を利用する学習ベース手法を提案する。擬似モダリティは、ノイズの多いBスキャンではほとんど認識できないが、小さな容器のような細かな特徴を過度に滑らかにできる層に対して優れたSNRを提供する。融合ネットワークを利用することで、各モダリティから望ましい特徴を組み合わせることができ、その寄与の重みを調整できる。強度基準および構造指標を用いて評価した結果,本手法はスペックルノイズを効果的に抑制し,網膜層間のコントラストを増強し,全体の構造と小血管を保存できることがわかった。本手法は, 単一モードネットワークと比較して0.559 +\- 0.033から0.576 +\- 0.031までの低雑音Bスキャンと構造的類似性を改善する。 Optical coherence tomography (OCT) is a prevalent imaging technique for retina. However, it is affected by multiplicative speckle noise that can degrade the visibility of essential anatomical structures, including blood vessels and tissue layers. Although averaging repeated B-scan frames can significantly improve the signal-to-noise-ratio (SNR), this requires longer acquisition time, which can introduce motion artifacts and cause discomfort to patients. In this study, we propose a learning-based method that exploits information from the single-frame noisy B-scan and a pseudo-modality that is created with the aid of the self-fusion method. The pseudo-modality provides good SNR for layers that are barely perceptible in the noisy B-scan but can over-smooth fine features such as small vessels. By using a fusion network, desired features from each modality can be combined, and the weight of their contribution is adjustable. Evaluated by intensity-based and structural metrics, the result shows that our method can effectively suppress the speckle noise and enhance the contrast between retina layers while the overall structure and small blood vessels are preserved. 
Compared to the single-modality network, our method improves the structural similarity with the low-noise B-scan from 0.559 $\pm$ 0.033 to 0.576 $\pm$ 0.031.	翻訳日:2021-07-12 23:45:20 公開日:2021-07-09
# (参考訳) ポイントワイズ解析における最遠点サンプリング Beyond Farthest Point Sampling in Point-Wise Analysis ( http://arxiv.org/abs/2107.04291v1 ) ライセンス: CC BY 4.0	Yiqun Lin, Lichang Chen, Haibin Huang, Chongyang Ma, Xiaoguang Han and Shuguang Cui	(参考訳) サンプリング、グルーピング、アグリゲーションはポイントクラウドのマルチスケール分析において3つの重要なコンポーネントである。本稿では,ポイントワイズ分析タスクのための新しいデータ駆動型サンプル学習戦略を提案する。広く使われているサンプリング手法であるfarthest point sampling (fps) とは異なり,サンプリングと下流アプリケーションを同時に学習することを提案する。我々の重要な洞察は、FPSのような一様サンプリング手法が必ずしも異なるタスクに対して最適であるとは限らないことである。最後に,タスク関連真実情報によって教師されるサンプリング点変位を学習し,その基礎となる課題と協調して学習できる新しいサンプル学習手法を提案する。さらに,本手法を意味的部分分割,ポイントクラウド補完,キーポイント検出など,様々な点解析アーキテクチャで実証する。実験の結果, 従来のベースライン法に比べて, サンプルとタスクの同時学習が著しく改善した。 Sampling, grouping, and aggregation are three important components in the multi-scale analysis of point clouds. In this paper, we present a novel data-driven sampler learning strategy for point-wise analysis tasks. Unlike the widely used sampling technique, Farthest Point Sampling (FPS), we propose to learn sampling and downstream applications jointly. Our key insight is that uniform sampling methods like FPS are not always optimal for different tasks: sampling more points around boundary areas can make the point-wise classification easier for segmentation. Towards the end, we propose a novel sampler learning strategy that learns sampling point displacement supervised by task-related ground truth information and can be trained jointly with the underlying tasks. We further demonstrate our methods in various point-wise analysis architectures, including semantic part segmentation, point cloud completion, and keypoint detection. Our experiments show that jointly learning of the sampler and task brings remarkable improvement over previous baseline methods.	翻訳日:2021-07-12 23:37:47 公開日:2021-07-09
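For context, the Farthest Point Sampling (FPS) baseline that the abstract above contrasts with can be sketched in a few lines of Python. This is the usual greedy formulation, not the learned sampler proposed in the paper; the function name and the `start` parameter are hypothetical.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] is the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen
```

Because each step only maximizes geometric spread, the result is roughly uniform coverage of the cloud, which is the behaviour the abstract argues is not always optimal for a given downstream task.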
# (参考訳) 校正された予測不確実性のためのランジュバン動力学を用いたニューラルネットワークの差分プライベートトレーニング Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty ( http://arxiv.org/abs/2107.04296v1 ) ライセンス: CC BY 4.0	Moritz Knolle, Alexander Ziller, Dmitrii Usynin, Rickmer Braren, Marcus R. Makowski, Daniel Rueckert, Georgios Kaissis	(参考訳) 差分プライベートな確率的勾配降下法(DP-SGD)は、校正が不十分で過信な深層学習モデルをもたらす可能性があることを示す。これは、医療診断などの安全クリティカルなアプリケーションにとって深刻な問題である。我々は、ディープニューラルネットワークのトレーニングのためのスケーラブルなベイズ推論手法である確率的勾配ランジュバン動力学とDP-SGDとの類似性を強調・活用し、元のDP-SGDアルゴリズムにわずかな調整を加えるだけで、差分プライベートなベイズニューラルネットワークを訓練する。我々のアプローチはDP-SGDよりもかなり信頼性の高い不確実性推定を提供し、これは期待校正誤差の低減(MNISTで $\sim{5}$ 倍、小児肺炎データセットで $\sim{2}$ 倍)によって実証的に示された。 We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models. This represents a serious issue for safety-critical applications, e.g. in medical diagnosis. We highlight and exploit parallels between stochastic gradient Langevin dynamics, a scalable Bayesian inference technique for training deep neural networks, and DP-SGD, in order to train differentially private, Bayesian neural networks with minor adjustments to the original (DP-SGD) algorithm. Our approach provides considerably more reliable uncertainty estimates than DP-SGD, as demonstrated empirically by a reduction in expected calibration error (MNIST $\sim{5}$-fold, Pediatric Pneumonia Dataset $\sim{2}$-fold).	翻訳日:2021-07-12 23:17:44 公開日:2021-07-09
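The expected calibration error (ECE) cited in the abstract above can be estimated by binning predictions by confidence and comparing each bin's accuracy with its mean confidence. The sketch below assumes equal-width bins and per-sample top-label confidences; the function name and the default of 10 bins are assumptions made for illustration, and the paper's exact binning may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

An overconfident model, in the sense used by the abstract, is one whose mean confidence exceeds its accuracy in the populated bins, which drives this value up.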
# (参考訳) 重力波サーロゲートモデリングのためのオートエンコーダ駆動スパイラル表現学習 Autoencoder-driven Spiral Representation Learning for Gravitational Wave Surrogate Modelling ( http://arxiv.org/abs/2107.04312v1 ) ライセンス: CC BY 4.0	Paraskevi Nousi, Styliani-Christina Fragkouli, Nikolaos Passalis, Panagiotis Iosif, Theocharis Apostolatos, George Pappas, Nikolaos Stergioulas, Anastasios Tefas	(参考訳) 近年, 人工ニューラルネットワークは重力波天文学の分野で勢いを増している。例えば, 二元ブラックホールの吸入と融合のための計算コストの高い波形モデルの代理モデリングなどである。サーロゲートモデリングは、トレーニングサンプル外の任意の波形に対するサーロゲートモデルの係数を補間する最終段階において、重力波とニューラルネットワークの高速かつ正確な近似が得られる。オートエンコーダを用いた経験的補間係数における基底構造の存在について検討する。係数空間が2次元のみに圧縮されると、スパイラル構造が現れ、スパイラル角は質量比と線形に関係していることを示す。この発見に基づいて、ニューラルネットワークの第1層として使用される学習可能なパラメータを持つスパイラルモジュールを設計し、入力空間を係数にマッピングする方法を学習する。スパイラルモジュールは複数のニューラルネットワークアーキテクチャ上で評価され、ベースラインモデルよりも高い速度精度のトレードオフを達成する。デスクトップgpu上で1ms以下で1回のフォワードパスで数百万の入力パラメータを評価できるサーロゲートモデルと、対応する生成された波形と接地波形とのミスマッチが比較基準法より優れていることを示す。我々は、ブラックホール双対を回転させる場合の類似構造とそれに対応する計算ゲインの存在を予想する。 Recently, artificial neural networks have been gaining momentum in the field of gravitational wave astronomy, for example in surrogate modelling of computationally expensive waveform models for binary black hole inspiral and merger. Surrogate modelling yields fast and accurate approximations of gravitational waves and neural networks have been used in the final step of interpolating the coefficients of the surrogate model for arbitrary waveforms outside the training sample. We investigate the existence of underlying structures in the empirical interpolation coefficients using autoencoders. We demonstrate that when the coefficient space is compressed to only two dimensions, a spiral structure appears, wherein the spiral angle is linearly related to the mass ratio. Based on this finding, we design a spiral module with learnable parameters, that is used as the first layer in a neural network, which learns to map the input space to the coefficients. 
The spiral module is evaluated on multiple neural network architectures and consistently achieves a better speed-accuracy trade-off than baseline models. A thorough experimental study is conducted, and the final result is a surrogate model that can evaluate millions of input parameters in a single forward pass in under 1 ms on a desktop GPU, while the mismatch between the generated waveforms and the corresponding ground-truth waveforms is better than that of the compared baseline methods. We anticipate that analogous underlying structures, and corresponding computational gains, also exist in the case of spinning black hole binaries.	翻訳日:2021-07-12 23:07:53 公開日:2021-07-09
# (参考訳) 収穫者・リモートセンシング・環境データを用いたノルウェー産スズ林の腐朽量予測 Prediction of butt rot volume in Norway spruce forest stands using harvester, remotely sensed and environmental data ( http://arxiv.org/abs/2107.04316v1 ) ライセンス: CC BY 4.0	Janne R\"aty, Johannes Breidenbach, Marius Hauglin, Rasmus Astrup	(参考訳) ノルウェー・スプルース(picea abies [l.] karst.)に関連するバットロート(br)損傷北半球の木材生産でかなりの経済的損失を計上しています br損傷に関する情報は森林管理の最適意思決定には不可欠であるが、br損傷の地図は森林情報システムでは一般的に欠落している。ノルウェーのスタンドレベルにおいて, BRにより損傷を受けた木材の体積を, 186,026茎(クラーカット), リモートセンシング, 環境データ(例)を用いて予測した。気候と地形の特徴) 本研究では,(1)収穫後に利用可能な予測変数(理論ケース)と(2)収穫前に利用可能な予測変数(マッピングケース)の2種類の予測変数を持つランダムフォレスト(RF)モデルを用いた。森林特性は, リモートセンシングによる高さ, 収穫木材体積, 乳房高さの2次平均直径など, 森林の成熟度を特徴付けることが, 最も重要な予測変数であることがわかった。大気レーザースキャンデータとセンチネル-2画像から得られたリモートセンシング予測変数は,環境変数よりも重要であった。 11.4 $m^3ha^{-1}$(pseudo $R^2$: 0.66)のRMSEが得られたが、このマッピングの場合、擬似的なR^2$が0.60となった。林冠の空間的に異なるk-meansクラスターをクロスバリデーション単位とした場合, RMSE値と擬似$R^2$は, それぞれ15.6 $m^3ha^{-1}$と0.37であった。このことは, BR損傷のマッピングにおいて良好な誤差率を得る上で, 空間閉点のBR状態に関する知識が重要であることを示している。 Butt rot (BR) damages associated with Norway spruce (Picea abies [L.] Karst.) account for considerable economic losses in timber production across the northern hemisphere. While information on BR damages is critical for optimal decision-making in forest management, the maps of BR damages are typically lacking in forest information systems. We predicted timber volume damaged by BR at the stand-level in Norway using harvester information of 186,026 stems (clear-cuts), remotely sensed, and environmental data (e.g. climate and terrain characteristics). We utilized random forest (RF) models with two sets of predictor variables: (1) predictor variables available after harvest (theoretical case) and (2) predictor variables available prior to harvest (mapping case). We found that forest attributes characterizing the maturity of forest, such as remote sensing-based height, harvested timber volume and quadratic mean diameter at breast height, were among the most important predictor variables. 
Remotely sensed predictor variables obtained from airborne laser scanning data and Sentinel-2 imagery were more important than the environmental variables. The theoretical case with a leave-stand-out cross-validation achieved an RMSE of 11.4 $m^3ha^{-1}$ (pseudo $R^2$: 0.66) whereas the mapping case resulted in a pseudo $R^2$ of 0.60. When the spatially distinct k-means clusters of harvested forest stands were used as units in the cross-validation, the RMSE value and pseudo $R^2$ associated with the mapping case were 15.6 $m^3ha^{-1}$ and 0.37, respectively. This indicates that the knowledge about the BR status of spatially close stands is of high importance for obtaining satisfactory error rates in the mapping of BR damages.	翻訳日:2021-07-12 22:20:54 公開日:2021-07-09
# (参考訳) IDRLnet: 物理情報ニューラルネットワークライブラリ IDRLnet: A Physics-Informed Neural Network Library ( http://arxiv.org/abs/2107.04320v1 ) ライセンス: CC BY 4.0	Wei Peng, Jun Zhang, Weien Zhou, Xiaoyu Zhao, Wen Yao, Xiaoqian Chen	(参考訳) 物理情報ニューラルネットワーク(Physics Informed Neural Network, PINN)は、偏微分方程式(PDE)によってモデル化された順問題および逆問題の両方を解くための科学計算フレームワークである。本稿では,PINNによる問題のモデル化と求解を体系的に行うためのPythonツールボックスであるIDRLnetを紹介する。 IDRLnetは、幅広いPINNアルゴリズムとアプリケーションのためのフレームワークを構築している。幾何学的オブジェクト、データソース、人工ニューラルネットワーク、損失メトリクス、オプティマイザをPython内に組み込む構造化された方法を提供する。さらに、ノイズを含む逆問題、変分最小化、積分微分方程式を解く機能を提供する。新しいPINNの変種は容易にフレームワークに統合できる。ソースコード、チュートリアル、ドキュメントは \url{https://github.com/idrl-lab/idrlnet} で入手できる。 Physics Informed Neural Network (PINN) is a scientific computing framework used to solve both forward and inverse problems modeled by Partial Differential Equations (PDEs). This paper introduces IDRLnet, a Python toolbox for modeling and solving problems through PINN systematically. IDRLnet constructs the framework for a wide range of PINN algorithms and applications. It provides a structured way to incorporate geometric objects, data sources, artificial neural networks, loss metrics, and optimizers within Python. Furthermore, it provides functionality to solve noisy inverse problems, variational minimization, and integral differential equations. New PINN variants can be integrated into the framework easily. Source code, tutorials, and documentation are available at \url{https://github.com/idrl-lab/idrlnet}.	翻訳日:2021-07-12 22:09:27 公開日:2021-07-09
# (参考訳) Mutually-Aware Sub-Graphs Differentiable Architecture Search Mutually-aware Sub-Graphs Differentiable Architecture Search ( http://arxiv.org/abs/2107.04324v1 ) ライセンス: CC BY 4.0	Haoxian Tan, Sheng Guo, Yujie Zhong, Weilin Huang	(参考訳) 差別化可能なアーキテクチャ検索は、そのシンプルさと効率性のため、nasの分野では、マルチパスアルゴリズムとシングルパスメソッドの2つのパラダイムが支配されている。マルチパスフレームワーク(例) DARTS)は直感的だが、メモリ使用量とトレーニングの崩壊に悩まされている。シングルパス法(GDASやProxylessNASなど)はメモリ問題を緩和し、検索と評価のギャップを縮めるが性能を犠牲にする。本稿では,これら2つのパラダイムを相互に認識するサブグラフ微分可能アーキテクチャ探索 (msg-das) と呼ぶ,概念的に単純かつ効率的な橋渡し手法を提案する。フレームワークのコアはGumbel-TopKサンプルであり、複数の相互排他的なシングルパスサブグラフを生成する。複数のサブグラフ設定によるスキップ接続の問題を軽減するため,最適化を安定化するためのDropblock-Identityモジュールを提案する。利用可能なモデル(スーパーネットとサブグラフ)を最大限に活用するために、トレーニングを改善するためのメモリ効率の高いスーパーネット誘導蒸留を導入する。提案するフレームワークは、フレキシブルメモリ使用量と検索品質のバランスをとる。本研究では,imagenet と cifar10 における提案手法の有効性を実証する。 Differentiable architecture search is prevalent in the field of NAS because of its simplicity and efficiency, where two paradigms, multi-path algorithms and single-path methods, are dominated. Multi-path framework (e.g. DARTS) is intuitive but suffers from memory usage and training collapse. Single-path methods (e.g.GDAS and ProxylessNAS) mitigate the memory issue and shrink the gap between searching and evaluation but sacrifice the performance. In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS). The core of our framework is a differentiable Gumbel-TopK sampler that produces multiple mutually exclusive single-path sub-graphs. To alleviate the severer skip-connect issue brought by multiple sub-graphs setting, we propose a Dropblock-Identity module to stabilize the optimization. To make best use of the available models (super-net and sub-graphs), we introduce a memory-efficient super-net guidance distillation to improve training. The proposed framework strikes a balance between flexible memory usage and searching quality. 
We demonstrate the effectiveness of our method on ImageNet and CIFAR10, where the searched models show performance comparable to the most recent approaches.	翻訳日:2021-07-12 22:06:13 公開日:2021-07-09
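The Gumbel-TopK idea named in the MSG-DAS abstract above can be illustrated with a small sampling sketch: adding independent Gumbel(0, 1) noise to each logit and keeping the k largest perturbed values draws k distinct indices without replacement. This shows only the sampling step; the differentiable relaxation used for architecture search is not reproduced here, and the function name and `rng` parameter are hypothetical.

```python
import math
import random


def gumbel_top_k(logits, k, rng=random):
    """Sample k distinct indices by perturbing logits with Gumbel(0, 1) noise."""
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    perturbed = []
    for i, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() never returns 1.0
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, i))
    # Keep the indices of the k largest perturbed logits.
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]
```

Since the returned indices are distinct, each one could select a different candidate operation, which is in the spirit of the mutually exclusive sub-graphs described in the abstract.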
# (参考訳) Attend2Pack: 注意機構を用いた深層強化学習によるビンパッキング Attend2Pack: Bin Packing through Deep Reinforcement Learning with Attention ( http://arxiv.org/abs/2107.04333v1 ) ライセンス: CC0 1.0	Jingwei Zhang, Bin Zi, Xiaoyu Ge	(参考訳) 本稿では,学習の観点からビンパッキング問題(BPP)に取り組むことを目的とする。自己注意に基づく符号化と深層強化学習アルゴリズムに基づいて,本課題に対する新たなエンドツーエンド学習モデルを提案する。組合せ的な行動空間を分解し、また、オンポリシー学習を高速化する一般的な手法である優先オーバーサンプリングと呼ばれる新しい訓練手法を利用することで、様々な実験設定において最先端の性能を実現する。さらに,提案手法attend2packはオフラインBPPを対象としているが,本手法を厳密なオンラインBPP設定に絞り込んだ場合でも最先端の性能を達成できる。一連のアブレーション研究と、これまでの様々な研究との比較により、この研究分野への有効なベースラインアプローチを提供したい。 This paper seeks to tackle the bin packing problem (BPP) from a learning perspective. Building on self-attention-based encoding and deep reinforcement learning algorithms, we propose a new end-to-end learning model for this task of interest. By decomposing the combinatorial action space, as well as utilizing a new training technique denoted as prioritized oversampling, which is a general scheme to speed up on-policy learning, we achieve state-of-the-art performance in a range of experimental settings. Moreover, although the proposed approach attend2pack targets offline-BPP, we strip our method down to the strict online-BPP setting where it is also able to achieve state-of-the-art performance. With a set of ablation studies as well as comparisons against a range of previous works, we hope to offer a valid baseline approach to this field of study.	翻訳日:2021-07-12 21:52:43 公開日:2021-07-09
# (参考訳) 変数変換公式の一般化と残差フローへの応用 Generalization of the Change of Variables Formula with Applications to Residual Flows ( http://arxiv.org/abs/2107.04346v1 ) ライセンス: CC BY 4.0	Niklas Koenen, Marvin N. Wright, Peter Maa{\ss} and Jens Behrmann	(参考訳) 正規化フローは変数変換公式 (CVF) を利用して柔軟な密度モデルを定義する。しかし、CVFにおける滑らかな変換(微分同相)の要求は、これらのモデルの構築において大きな課題となる。フローの設計空間を拡大するために、ルベーグ測度ゼロの集合上ではこれらの要求を満たさなくてもよい一般化変換として $\mathcal{L}$-diffeomorphisms を導入する。この緩和により、例えばReLUのような非滑らかな活性化関数が使用可能になる。最後に、得られた結果を平面フロー、放射状フロー、縮小残差フローに適用する。 Normalizing flows leverage the Change of Variables Formula (CVF) to define flexible density models. Yet, the requirement of smooth transformations (diffeomorphisms) in the CVF poses a significant challenge in the construction of these models. To enlarge the design space of flows, we introduce $\mathcal{L}$-diffeomorphisms as generalized transformations which may violate these requirements on zero Lebesgue-measure sets. This relaxation allows e.g. the use of non-smooth activation functions such as ReLU. Finally, we apply the obtained results to planar, radial, and contractive residual flows.	翻訳日:2021-07-12 21:24:41 publication日:2021-07-09
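For reference, the standard change of variables formula that the abstract above builds on can be written as follows. The symbols are notation chosen here, not taken from the paper: $f$ is a diffeomorphism mapping data $x$ to a latent variable $z = f(x)$, $p_Z$ is the latent density, $p_X$ the induced data density, and $J_f$ the Jacobian of $f$.

```latex
p_X(x) = p_Z\bigl(f(x)\bigr)\,\left|\det J_f(x)\right|,
\qquad
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log\left|\det J_f(x)\right|
```

The generalization described in the abstract concerns the case where $f$ fails to be smooth on a set of Lebesgue measure zero, as happens for example with ReLU activations.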
# (参考訳) 科学知識の形式化と可視化のためのオントロジー An ontology for the formalization and visualization of scientific knowledge ( http://arxiv.org/abs/2107.04347v1 ) ライセンス: CC BY 4.0	Vincenzo Daponte and Gilles Falquet	(参考訳) ここで提示される科学知識オブジェクトのオントロジーの構築は、科学的知識の可視化を指向したアプローチの開発の一部である。科学的知識の組織化の概念(理論、法、経験、証明など)によって動機づけられている。既存のオントロジーに現れるが、どれもこの話題に重点を置いておらず、シンプルで簡単に使える組織を提示する。オントロジソース(特定の分野の知識オブジェクトのオントロジー、語彙的および高レベルのオブジェクトのオントロジー)、専門知識ベース、科学者へのインタビューから構築された最初のバージョンを提示する。我々は、このオントロジーを、使用したいくつかのソースと整合させ、それらに関して一貫性を検証できるようにしました。オントロジーの検証は、我々が物理学の分野から始めた様々な情報源からの知識を形式化するためにそれを使うことである。 The construction of an ontology of scientific knowledge objects, presented here, is part of the development of an approach oriented towards the visualization of scientific knowledge. It is motivated by the fact that the concepts of organization of scientific knowledge (theorem, law, experience, proof, etc.) appear in existing ontologies but that none of them is centered on this topic and presents a simple and easily usable organization. We present the first version built from ontological sources (ontologies of knowledge objects of certain fields, lexical and higher level ones), specialized knowledge bases and interviews with scientists. We have aligned this ontology with some of the sources used, which has allowed us to verify its consistency with respect to them. The validation of the ontology consists in using it to formalize knowledge from various sources, which we have begun to do in the field of physics.	翻訳日:2021-07-12 21:07:53 公開日:2021-07-09
# (参考訳) 文書レイアウト生成のためのグラフベース深層生成モデル Graph-based Deep Generative Modelling for Document Layout Generation ( http://arxiv.org/abs/2107.04357v1 ) ライセンス: CC BY-SA 4.0	Sanket Biswas, Pau Riba, Josep Llad\'os, and Umapada Pal	(参考訳) ディープラーニングアプローチの主要な前提条件の1つは、大規模トレーニングデータの可用性である。実世界のシナリオでスキャンされた文書画像を扱う場合、その内容の主情報はレイアウト自体に格納される。本研究では,グラフニューラルネットワーク(GNN)を用いて,文書解釈システム,特にデジタルメールルームアプリケーションにおいて,文書解釈システムの学習に使用可能な,高度に可変かつ信頼性の高い文書レイアウトを持つ合成データを生成する。また、ドキュメントレイアウト生成タスクを管理文書画像、この場合請求書で実験する最初のグラフベースのアプローチでもある。 One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, specially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices.	翻訳日:2021-07-12 21:01:15 公開日:2021-07-09
# (参考訳) 図形言語検出のためのロバストディープアンサンブル分類器 A Robust Deep Ensemble Classifier for Figurative Language Detection ( http://arxiv.org/abs/2107.04372v1 ) ライセンス: CC BY 4.0	Rolandos Alexandros Potamias and Georgios Siolas and Andreas - Georgios Stafylopatis	(参考訳) 表現型言語(FL)の認識と分類は、比喩的内容のフレーズに含まれる矛盾した意味から、自然言語処理(NLP)の幅広い分野における知覚分析のオープンな問題である。本論文では,高度なDeep Learning (DL) 技術に対処する,皮肉,皮肉,メタファの3つの相互関連FL認識タスクについて述べる。まず,各入力をDLモデルに最適化するために,効率的なデータ表現形式に向けたデータ前提フレームワークを提案する。さらに、各ソーシャルメディアテキスト参照に反映される構文的、表現的、感情的、テンポ的コンテンツを特徴付けるために、特殊特徴を抽出する。これらの機能は、ソーシャルネットワークユーザの書き込み方法の側面をキャプチャすることを目的としている。最後に、異なるDL技術の組み合わせに基づく、堅牢なDeep Ensemble Soft Classifier (DESC) に機能を供給する。 3つの異なるベンチマークデータセット(そのうちの1つは様々なFL形式を含む)を用いて、DECモデルはFL認識の困難な分野において、関連する方法論や最先端技術と比較するにふさわしい非常に優れた性能を達成すると結論付けた。 Recognition and classification of Figurative Language (FL) is an open problem of Sentiment Analysis in the broader field of Natural Language Processing (NLP) due to the contradictory meaning contained in phrases with metaphorical content. The problem itself contains three interrelated FL recognition tasks: sarcasm, irony and metaphor which, in the present paper, are dealt with advanced Deep Learning (DL) techniques. First, we introduce a data prepossessing framework towards efficient data representation formats so that to optimize the respective inputs to the DL models. In addition, special features are extracted in order to characterize the syntactic, expressive, emotional and temper content reflected in the respective social media text references. These features aim to capture aspects of the social network user's writing method. Finally, features are fed to a robust, Deep Ensemble Soft Classifier (DESC) which is based on the combination of different DL techniques. 
Using three different benchmark datasets (one of them containing various FL forms) we conclude that the DESC model achieves a very good performance, worthy of comparison with relevant methodologies and state-of-the-art technologies in the challenging field of FL recognition.	翻訳日:2021-07-12 20:51:15 公開日:2021-07-09
# (参考訳) ドメイン固有ALBERTを用いたバイオメディカル自然言語処理タスクのベンチマーク Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT ( http://arxiv.org/abs/2107.04374v1 ) ライセンス: CC BY 4.0	Usman Naseem, Adam G. Dunn, Matloob Khushi, Jinman Kim	(参考訳) バイオメディカルテキストデータの入手と自然言語処理(NLP)の進歩により、バイオメディカルNLPの新たな応用が可能となった。ドメイン固有コーパスを用いて訓練または微調整された言語モデルは、一般的なモデルより優れているが、バイオメディカルNLPにおける作業は、コーパスとタスクの点で制限されている。本稿では,生物医学的(pubmed centralとpubmed central)と臨床(mimic-iii)コーポラを訓練し,20個のベンチマークデータセットにまたがる6つの異なるタスクを微調整した,ライト双方向エンコーダ表現のドメイン固有適応であるbioalbertを提案する。実験の結果、BioALBERTは、名前付きエンティティ認識(+11.09% BLURBスコアの改善)、関係抽出(+0.80% BLURBスコア)、文類似性(+1.05% BLURBスコア)、文書分類(+0.62% F1スコア)、質問応答(+2.83% BLURBスコア)において、技術の現状よりも優れていた。 20のベンチマークデータセットのうち17で、新しい最先端技術を表している。バイオALBERTモデルとデータを利用可能にすることで、バイオメディカルNLPコミュニティがトレーニングの計算コストを回避し、幅広いバイオメディカルNLPタスクにわたる今後の取り組みのための新たなベースラインを確立することを目的とする。 The availability of biomedical text data and advances in natural language processing (NLP) have made new applications in biomedical NLP possible. Language models trained or fine tuned using domain specific corpora can outperform general models, but work to date in biomedical NLP has been limited in terms of corpora and tasks. We present BioALBERT, a domain-specific adaptation of A Lite Bidirectional Encoder Representations from Transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine tuned for 6 different tasks across 20 benchmark datasets. Experiments show that BioALBERT outperforms the state of the art on named entity recognition (+11.09% BLURB score improvement), relation extraction (+0.80% BLURB score), sentence similarity (+1.05% BLURB score), document classification (+0.62% F1-score), and question answering (+2.83% BLURB score). It represents a new state of the art in 17 out of 20 benchmark datasets. 
By making BioALBERT models and data available, our aim is to help the biomedical NLP community avoid computational costs of training and establish a new set of baselines for future efforts across a broad range of biomedical NLP tasks.	翻訳日:2021-07-12 20:41:02 公開日:2021-07-09
# (参考訳) アンサンブル分類におけるスペシャリストの成績 Specialists Outperform Generalists in Ensemble Classification ( http://arxiv.org/abs/2107.04381v1 ) ライセンス: CC BY 4.0	Sascha Meyen, Frieder Göppert, Helen Alber, Ulrike von Luxburg, Volker H. Franz	(参考訳) 精度が知られている個々の分類器の集合を考える。テストポイントを受信すると、各分類器は、この特定のテストポイントに対する予測ラベルとその予測に対する信頼度を出力する。本稿では,アンサンブルの精度を判定できるかどうかという問題に対処する。驚いたことに、この設定において、統計学的に最適な方法で分類器が組み合わされたとしても、その結果のアンサンブル分類器の精度は個々の分類器の精度から計算することはできない。アンサンブル精度について, 上下境界を厳密に証明した。我々は、上と下の境界に達する個々の分類器を明示的に構築する。 1) アンサンブル法を用いて, 個々の(独立な)分類器をスクラッチから構築する選択肢があれば, 一般論者ではなく, 専門的分類器を目標とすべきである。 2) 所望のアンサンブル精度を達成するために,少なくとも何個の分類器が必要かを決定するために,我々の境界を用いることができる。最後に、真のラベルと個々の分類器の出力間の相互情報を考慮して境界を改善する。 Consider an ensemble of $k$ individual classifiers whose accuracies are known. Upon receiving a test point, each of the classifiers outputs a predicted label and a confidence in its prediction for this particular test point. In this paper, we address the question of whether we can determine the accuracy of the ensemble. Surprisingly, even when classifiers are combined in the statistically optimal way in this setting, the accuracy of the resulting ensemble classifier cannot be computed from the accuracies of the individual classifiers, as would be the case in the standard setting of confidence weighted majority voting. We prove tight upper and lower bounds on the ensemble accuracy. We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists. Our theoretical results have very practical consequences: (1) If we use ensemble methods and have the choice to construct our individual (independent) classifiers from scratch, then we should aim for specialist classifiers rather than generalists. (2) Our bounds can be used to determine how many classifiers are at least required to achieve a desired ensemble accuracy. Finally, we improve our bounds by considering the mutual information between the true label and the individual classifier's output.	
翻訳日:2021-07-12 20:28:35 公開日:2021-07-09
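The abstract above refers to the "standard setting of confidence weighted majority voting". As a rough illustration only, the following minimal sketch assumes binary labels (0/1), independent classifiers, and confidences strictly between 0 and 1 that are read as each classifier's probability of being correct. The function name `confidence_weighted_vote` and the log-odds weighting are assumptions made for this example and are not taken from the paper.

```python
import math


def confidence_weighted_vote(predictions, confidences):
    """Combine binary votes, weighting each by the log-odds of its confidence.

    predictions: list of 0/1 labels, one per classifier (hypothetical input format).
    confidences: list of floats strictly between 0 and 1, one per classifier.
    """
    score = 0.0
    for label, confidence in zip(predictions, confidences):
        # A confident classifier contributes a larger weight to the vote.
        weight = math.log(confidence / (1.0 - confidence))
        score += weight if label == 1 else -weight
    # A non-negative total favours label 1, a negative total favours label 0.
    return 1 if score >= 0 else 0


# One very confident classifier can outvote two weakly confident ones.
print(confidence_weighted_vote([1, 0, 0], [0.95, 0.6, 0.6]))
```

With equal confidences the rule reduces to plain majority voting, which is why the confidence values matter only when they differ across classifiers.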
# (参考訳) hoechstは必要なすべてだ:深層学習によるリンパ球分類 Hoechst Is All You Need: Lymphocyte Classification with Deep Learning ( http://arxiv.org/abs/2107.04388v1 ) ライセンス: CC BY 4.0	Jessica Cooper, In Hwa Um, Ognjen Arandjelović and David J Harrison	(参考訳) 多発性免疫蛍光および免疫組織化学は、がん病理学者が細胞表面に発現するいくつかのタンパク質を同定し、細胞分類、腫瘍の微小環境の理解、より正確な診断、予後、個々の患者の免疫状態に基づく調整された免疫療法を可能にすることで、患者に利益をもたらす。しかし、それらは高価で時間を要するプロセスであり、専門家による複雑な染色とイメージング技術を必要とする。ホーフスト染色はより安価で実行が容易であるが、免疫蛍光法で標的とするタンパク質よりもdnaに結合するので一般的には用いられず、dna形態のみに基づいてこれらのタンパク質を発現する細胞を区別することは従来考えられていなかった。本研究では,3つのタンパク質(tリンパ球マーカーcd3,cd8,bリンパ球マーカーcd20)を90%以上の精度で発現する細胞を,ホーチスト33342染色組織のみから同定するために,深い畳み込みニューラルネットワークを訓練することを提案する。本モデルでは, 免疫細胞浸潤の評価などの重要な予後指標において, リンパ球サブタイプを正確に識別し, コストのかかる多重蛍光を必要とせず, 患者の予後を予測し, 改善することのできる, これらのタンパク質の発現に関連する既知形態的特徴を学習する。 Multiplex immunofluorescence and immunohistochemistry benefit patients by allowing cancer pathologists to identify several proteins expressed on the surface of cells, enabling cell classification, better understanding of the tumour micro-environment, more accurate diagnoses, prognoses, and tailored immunotherapy based on the immune status of individual patients. However, they are expensive and time consuming processes which require complex staining and imaging techniques by expert technicians. Hoechst staining is much cheaper and easier to perform, but is not typically used in this case as it binds to DNA rather than to the proteins targeted by immunofluorescent techniques, and it was not previously thought possible to differentiate cells expressing these proteins based only on DNA morphology. In this work we show otherwise, training a deep convolutional neural network to identify cells expressing three proteins (T lymphocyte markers CD3 and CD8, and the B lymphocyte marker CD20) with greater than 90% precision and recall, from Hoechst 33342 stained tissue only. 
Our model learns previously unknown morphological features associated with expression of these proteins which can be used to accurately differentiate lymphocyte subtypes for use in key prognostic metrics such as assessment of immune cell infiltration, and thereby predict and improve patient outcomes without the need for costly multiplex immunofluorescence.	翻訳日:2021-07-12 19:54:24 公開日:2021-07-09
# (参考訳) 集合加算問題に対する文脈的・非文脈的選好ランキングの比較 A Comparison of Contextual and Non-Contextual Preference Ranking for Set Addition Problems ( http://arxiv.org/abs/2107.04438v1 ) ライセンス: CC BY 4.0	Timo Bertram, Johannes Fürnkranz, Martin Müller	(参考訳) 本稿では,要素の集合への付加性を評価する問題について検討する。この問題は、一般的な場合では、選択間の無条件な選好に還元できないため、難しい。したがって、決定の文脈に基づいて好みをモデル化する。本課題では,追加後の2つの集合を比較するツインネットワークと,各候補の既存集合への寄与をモデル化するトリプレットネットワークという,2つの異なるシムセネットワークアーキテクチャを議論し比較する。収集可能なカードゲームMagic: The Gathering(マジック:ザ・ギャザリング)におけるデッキビルディングの人間のカード嗜好を学習する。本稿では,2つのネットワークよりも3重項アプローチの方がよい結果が得られることを示す。 In this paper, we study the problem of evaluating the addition of elements to a set. This problem is difficult, because it can, in the general case, not be reduced to unconditional preferences between the choices. Therefore, we model preferences based on the context of the decision. We discuss and compare two different Siamese network architectures for this task: a twin network that compares the two sets resulting after the addition, and a triplet network that models the contribution of each candidate to the existing set. We evaluate the two settings on a real-world task: learning human card preferences for deck building in the collectible card game Magic: The Gathering. We show that the triplet approach achieves a better result than the twin network and that both outperform previous results on this task.	翻訳日:2021-07-12 19:38:48 公開日:2021-07-09
# (参考訳) ディープニューラルネットワークにおける凝集層分布の理解 Understanding the Distributions of Aggregation Layers in Deep Neural Networks ( http://arxiv.org/abs/2107.04458v1 ) ライセンス: CC BY 4.0	Eng-Jon Ong, Sameed Husain, Miroslaw Bober	(参考訳) 集約のプロセスは、ほとんどすべてのディープネットモデルにおいてユビキタスである。深い特徴をよりコンパクトな表現にまとめる重要なメカニズムとして機能し、深い網に過度に収まることへの堅牢性を高め、空間的不変性を提供する。特に、DNNの出力層へのグローバルアグリゲーション層の近接は、集約された特徴がディープネットの性能に直接的な影響を与えることを意味する。この関係をよりよく理解するには、情報理論の手法を用いる。しかし、これは凝集層の活性化の分布に関する知識を必要とする。そこで本研究では,深い特徴集約に関わるレイヤの出力値の確率分布を解析的にモデル化する,新しい数学的定式化を提案する。重要な結果として、DNNにおける出力ノードのKL分割を解析的に予測する能力がある。また,様々な分類タスクやデータセットにわたる経験的観測に対する理論的予測を実験的に検証した。 The process of aggregation is ubiquitous in almost all deep net models. It functions as an important mechanism for consolidating deep features into a more compact representation, whilst increasing robustness to overfitting and providing spatial invariance in deep nets. In particular, the proximity of global aggregation layers to the output layers of DNNs means that aggregated features have a direct influence on the performance of a deep net. A better understanding of this relationship can be obtained using information theoretic methods. However, this requires the knowledge of the distributions of the activations of aggregation layers. To achieve this, we propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation. An important outcome is our ability to analytically predict the KL-divergence of output nodes in a DNN. We also experimentally verify our theoretical predictions against empirical observations across a range of different classification tasks and datasets.	翻訳日:2021-07-12 19:28:53 公開日:2021-07-09
# (参考訳) 脳波に基づく睡眠段階分類のための自己訓練による対向領域適応 Adversarial Domain Adaptation with Self-Training for EEG-based Sleep Stage Classification ( http://arxiv.org/abs/2107.04470v1 ) ライセンス: CC BY 4.0	Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li, and Cuntai Guan	(参考訳) 睡眠ステージングは睡眠障害の診断と治療において非常に重要である。近年,自動睡眠ステージングのためのデータ駆動型ディープラーニングモデルが提案されている。それらは主に、トレーニングとテストのデータを、実際のシナリオでは保持できないような同じ分布から引き出すという仮定に依存している。ドメインシフト問題に対処するために、Unsupervised Domain Adaption (UDA) が最近開発された。しかし、これまでの睡眠ステージングに適用されるUDAメソッドには2つの大きな制限がある。まず、それらはドメインアライメントの完全な共有モデルに依存しており、機能抽出中にドメイン固有の情報を失う可能性がある。第2に、ターゲットドメインのクラス情報を考慮せずに、ソースとターゲットの分布をグローバルに調整するだけで、モデルの分類性能を阻害する。本研究では,未ラベル対象領域におけるドメインシフト問題に対処するための新しい逆学習フレームワークを提案する。まず、ソースドメインとターゲットドメインのドメイン固有の特徴を保存するために、非共有アテンション機構を開発する。第2に、ターゲットドメインの擬似ラベルを用いて、ソースおよびターゲットドメインの詳細なクラス分布を調整するための自己学習戦略を設計する。また,擬似ラベルのロバスト性と品質を高めるために,2つの識別分類器を提案する。 6つのクロスドメインシナリオの実験結果から、睡眠ステージングのためのフレームワークの有効性と最先端UDA法に対する利点が検証された。 Sleep staging is of great importance in the diagnosis and treatment of sleep disorders. Recently, numerous data driven deep learning models have been proposed for automatic sleep staging. They mainly rely on the assumption that training and testing data are drawn from the same distribution which may not hold in real-world scenarios. Unsupervised domain adaption (UDA) has been recently developed to handle this domain shift problem. However, previous UDA methods applied to sleep staging have two main limitations. First, they rely on a totally shared model for the domain alignment, which may lose the domain-specific information during feature extraction. Second, they only align the source and target distributions globally without considering the class information in the target domain, which hinders the classification performance of the model. In this work, we propose a novel adversarial learning framework to tackle the domain shift problem in the unlabeled target domain. 
First, we develop unshared attention mechanisms to preserve the domain-specific features in the source and target domains. Second, we design a self-training strategy to align the fine-grained class distributions for the source and target domains via target domain pseudo labels. We also propose dual distinct classifiers to increase the robustness and quality of the pseudo labels. The experimental results on six cross-domain scenarios validate the efficacy of our proposed framework for sleep staging and its advantage over state-of-the-art UDA methods.	翻訳日:2021-07-12 19:06:10 公開日:2021-07-09
# (参考訳) 電力ネットワーク同定のためのベイズ誤差イン変数モデル Bayesian Error-in-Variables Models for the Identification of Power Networks ( http://arxiv.org/abs/2107.04480v1 ) ライセンス: CC BY 4.0	Jean-Sébastien Brouillon, Emanuele Fabbiani, Pulkit Nahata, Florian Dörfler, Giancarlo Ferrari-Trecate	(参考訳) 断続的な再生可能発電、特に分布レベルでの統合が増加すると、グリッドの知識、特に電気ネットワークのトポロジーとラインパラメータをキャプチャするアドミタンス行列に基づく高度な計画および最適化手法が必要となる。しかし、アドミタンス行列の信頼できる推定は、時間的に変化する格子に対して欠落するか、あるいはすぐに時代遅れになるかもしれない。本研究では,マイクロPMUから収集した電圧と電流を利用したデータ駆動型識別手法を提案する。より正確には、我々はまず最大帰納的アプローチを示し、次にベイズ的枠組みに向かい、最大後続推定の原理を推定する。既存のコントリビューションとは対照的に,本手法では,電圧と電流データの両方のノイズを測定するだけでなく,疎度パターンやノウハウラインパラメータなどの事前情報を活用できる。ベンチマークケースで行ったシミュレーションでは, 他アルゴリズムと比較して, 精度が大幅に向上することが示された。 The increasing integration of intermittent renewable generation, especially at the distribution level, necessitates advanced planning and optimisation methodologies contingent on the knowledge of the grid, specifically the admittance matrix capturing the topology and line parameters of an electric network. However, a reliable estimate of the admittance matrix may either be missing or quickly become obsolete for temporally varying grids. In this work, we propose a data-driven identification method utilising voltage and current measurements collected from micro-PMUs. More precisely, we first present a maximum likelihood approach and then move towards a Bayesian framework, leveraging the principles of maximum a posteriori estimation. In contrast with most existing contributions, our approach not only factors in measurement noise on both voltage and current data, but is also capable of exploiting available a priori information such as sparsity patterns and known line parameters. Simulations conducted on benchmark cases demonstrate that, compared to other algorithms, our method can achieve significantly greater accuracy.	翻訳日:2021-07-12 18:51:09 公開日:2021-07-09
# (参考訳) 敗血症治療戦略の不確実性を考慮したオフライン強化学習 Offline reinforcement learning with uncertainty for treatment strategies in sepsis ( http://arxiv.org/abs/2107.04491v1 ) ライセンス: CC BY 4.0	Ran Liu (1 and 2), Joseph L. Greenstein (1 and 2), James C. Fackler (3), Jules Bergmann (3), Melania M. Bembea (3 and 4), Raimond L. Winslow (1 and 2) ((1) Institute for Computational Medicine, the Johns Hopkins University, (2) Department of Biomedical Engineering, the Johns Hopkins University School of Medicine and Whiting School of Engineering, (3) Department of Anesthesiology and Critical Care Medicine, the Johns Hopkins University, (4) Department of Pediatrics, the Johns Hopkins University School of Medicine)	(参考訳) 敗血症と敗血症性ショックに対するガイドラインに基づく治療は、病態を十分に理解していない生命を脅かす臓器機能障害の異なる範囲であるため困難である。敗血症の早期介入は患者の予後に不可欠であるが、これらの介入は副作用があり、しばしば過剰投与される。すべての患者には単一の行動が適さないため、より個人化が必要である。本稿では,データから敗血症治療の最適勧告を抽出し,信頼度を推定し,トレーニングデータで頻繁に観察される治療オプションを同定する,強化学習の新たな応用を提案する。単一の推奨ではなく,いくつかの治療法を提示できる。学習方針を考察し, 死亡率と治療のレベルが重なり合うことから, 強化学習は積極的な介入に偏っていることを見出した。このバイアスをサブスペース学習を用いて軽減し、医療アプリケーション全体でより正確な学習方針をもたらす方法を開発します。 Guideline-based treatment for sepsis and septic shock is difficult because sepsis is a disparate range of life-threatening organ dysfunctions whose pathophysiology is not fully understood. Early intervention in sepsis is crucial for patient outcome, yet those interventions have adverse effects and are frequently overadministered. Greater personalization is necessary, as no single action is suitable for all patients. We present a novel application of reinforcement learning in which we identify optimal recommendations for sepsis treatment from data, estimate their confidence level, and identify treatment options infrequently observed in training data. Rather than a single recommendation, our method can present several treatment options. 
We examine learned policies and discover that reinforcement learning is biased against aggressive intervention due to the confounding relationship between mortality and level of treatment received. We mitigate this bias using subspace learning, and develop methodology that can yield more accurate learning policies across healthcare applications.	翻訳日:2021-07-12 18:21:56 公開日:2021-07-09
# (参考訳) 勾配に基づく深部物体検出器の不確かさの定量化 Gradient-Based Quantification of Epistemic Uncertainty for Deep Object Detectors ( http://arxiv.org/abs/2107.04517v1 ) ライセンス: CC BY 4.0	Tobias Riedlinger, Matthias Rottmann, Marius Schubert, Hanno Gottschalk	(参考訳) 信頼性の高いてんかん不確実性評価は, 深部物体検出装置のバックエンド応用に欠かせない要素である。現代のネットワークアーキテクチャは、予測能力に制限のある、キャリブレーションの低い信頼性を与える傾向がある。本稿では,新しい勾配に基づく不確実性メトリクスを導入し,異なるオブジェクト検出アーキテクチャについて検討する。 MS COCO, PASCAL VOC, KITTIデータセットを用いた実験では, ネットワーク信頼度と比較して, 正/偽の正の正の正の判別と交叉の予測が有意に向上した。また、モンテカルロのドロップアウト不確実性指標に対する改善や、さまざまな不確実性指標のソースを集約することで、さらに大幅な改善が見られ、その結果の不確実性モデルは、すべてのインスタンスにおいて十分に校正された信頼を生み出す。さらに,不確実性定量化モデルを物体検出パイプラインに実装し,通常のスコアスレッシャードに基づく決定規則を置き換え,偽予測と真偽を識別する。実験では,平均的な精度で検出性能を大幅に向上させることができた。計算複雑性に関しては,浮動小数点演算における計算勾配の不確実性の測定値がモンテカルロ・ドロップアウトの値と類似していることが分かる。 Reliable epistemic uncertainty estimation is an essential component for backend applications of deep object detectors in safety-critical environments. Modern network architectures tend to give poorly calibrated confidences with limited predictive power. Here, we introduce novel gradient-based uncertainty metrics and investigate them for different object detection architectures. Experiments on the MS COCO, PASCAL VOC and the KITTI dataset show significant improvements in true positive / false positive discrimination and prediction of intersection over union as compared to network confidence. We also find improvement over Monte-Carlo dropout uncertainty metrics and further significant boosts by aggregating different sources of uncertainty metrics. The resulting uncertainty models generate well-calibrated confidences in all instances. Furthermore, we implement our uncertainty quantification models into object detection pipelines as a means to discern true against false predictions, replacing the ordinary score-threshold-based decision rule. In our experiments, we achieve a significant boost in detection performance in terms of mean average precision. 
With respect to computational complexity, we find that computing gradient uncertainty metrics results in floating point operation counts similar to those of Monte-Carlo dropout.	翻訳日:2021-07-12 18:06:44 公開日:2021-07-09
# (参考訳) 非凹帯域最適化のための最適勾配アルゴリズム Optimal Gradient-based Algorithms for Non-concave Bandit Optimization ( http://arxiv.org/abs/2107.04518v1 ) ライセンス: CC0 1.0	Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang	(参考訳) 線形あるいは凹面報酬のバンドイット問題は広く研究されているが、非凹面報酬のバンドイットの研究は比較的少ない。本研究は、低ランク一般化線形バンディット問題や多項式活性化バンディット問題を持つ2層ニューラルネットワークなど、未知の報酬関数が凹凸でないバンディット問題の大きなファミリーを考察する。低ランク一般化線形バンドイット問題に対しては、[LMT21, JWWN19] における両方の予想を反論するミニマックス最適化アルゴリズムを提供する。我々のアルゴリズムは、非常に一般化されたゼロ階最適化パラダイムに基づいており、(次元において)いくつかの構造化多項式設定において最適な速度が得られる。さらに、生成モデル設定におけるRLにおけるアルゴリズムの適用性を実証し、従来の手法よりもサンプルの複雑さが向上した。最後に、標準楽観的アルゴリズム(例:ucb)が次元因子によって最適化されることを示す。雑音のない報酬を持つニューラルネット設定(多項式アクティベーション関数付き)では、本質的な代数次元に等しいサンプリング複雑性を持つバンディットアルゴリズムを提供する。また、楽観的なアプローチはサンプルの複雑さが悪く、外部次元の多項式(多項式次数において指数関数的に悪い)があることを示した。 Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality and attains optimal rates in several structured polynomial settings (in the dimension). We further demonstrate the applicability of our algorithms in RL in the generative model setting, resulting in improved sample complexity over prior approaches. Finally, we show that the standard optimistic algorithms (e.g., UCB) are sub-optimal by dimension factors. In the neural net setting (with polynomial activation functions) with noiseless reward, we provide a bandit algorithm with sample complexity equal to the intrinsic algebraic dimension. 
Again, we show that optimistic approaches have worse sample complexity, polynomial in the extrinsic dimension (which could be exponentially worse in the polynomial degree).	翻訳日:2021-07-12 17:37:40 公開日:2021-07-09
# (参考訳) ラベル分布シフトに対するオンライン適応 Online Adaptation to Label Distribution Shift ( http://arxiv.org/abs/2107.04520v1 ) ライセンス: CC BY 4.0	Ruihan Wu, Chuan Guo, Yi Su, Kilian Q. Weinberger	(参考訳) 機械学習モデルは、現実世界にデプロイすると、しばしば分散シフトに遭遇する。本稿では,テストタイムラベルの分布が継続的に変化しているオンライン環境でのラベルの分布変化への適応に着目し,真のラベルを観察することなく動的に適応する必要がある。そこで,本研究では,従来のオンライン学習へのオンラインラベルシフト適応の低減を図り,真のラベルの欠如が期待されるテスト損失の推定を妨げないことを示す。そこで本研究では,従来の学習手法であるフォロー・ザ・リーダー (ftl) やオンライン勾配降下 (ogd) に触発された適応アルゴリズムを提案する。我々はシミュレーションと実世界のラベルの分布変化の両方でこの発見を実証し、ogdが様々な挑戦的なラベルシフトシナリオに対して特に効果的で頑健であることを実証した。 Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true label. Leveraging a novel analysis, we show that the lack of true label does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.	翻訳日:2021-07-12 17:36:30 公開日:2021-07-09
# (参考訳) エントロピー、情報、および確率の更新 Entropy, Information, and the Updating of Probabilities ( http://arxiv.org/abs/2107.04529v1 ) ライセンス: CC BY 4.0	Ariel Caticha	(参考訳) 本稿では,推論の一般的な枠組みとして,最大エントロピー法に対する特定のアプローチを概説する。議論は導出における実用的要素を強調している。情報の概念は、理想的に有理なエージェントのベイズ的信念との関係の観点から定義される。先行確率分布から後続確率分布への更新方法は、固有誘導過程を通じて設計する。対数的相対エントロピーは、(a)が普遍的な適用性、(b)先行情報の価値を認識する、(c)科学における独立の概念が果たす特権的役割を認識する、ユニークなツールとして選択される。結果として生じるフレームワーク -- MEメソッド -- は、任意の事前および任意の制約を処理できる。これは特殊ケースとしてMaxEntとBayesの法則を含み、したがってエントロピー法とベイズ法を単一の一般推論スキームに統一する。 ME法は1つの後部の単なる選択を超えるが、他の分布がどれだけ小さいかという問題にも対処し、揺らぎの理論と大きな偏差の直接的な橋渡しとなる。 This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes the pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to a posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as the unique tool for updating that (a) is of universal applicability; (b) that recognizes the value of prior information; and (c) that recognizes the privileged role played by the notion of independence in science. The resulting framework -- the ME method -- can handle arbitrary priors and arbitrary constraints. It includes MaxEnt and Bayes' rule as special cases and, therefore, it unifies entropic and Bayesian methods into a single general inference scheme. The ME method goes beyond the mere selection of a single posterior, but also addresses the question of how much less probable other distributions might be, which provides a direct bridge to the theories of fluctuations and large deviations.	翻訳日:2021-07-12 16:58:38 公開日:2021-07-09
# (参考訳) バイオメディカルイメージセグメンテーションのためのモダリティ特異的U-Net変異体:調査 Modality specific U-Net variants for biomedical image segmentation: A survey ( http://arxiv.org/abs/2107.04537v1 ) ライセンス: CC BY 4.0	Narinder Singh Punn, Sonali Agarwal	(参考訳) 深層畳み込みニューラルネットワーク、残留ニューラルネットワーク、敵ネットワークなどのディープラーニングアプローチの進展に伴い、U-Netアーキテクチャは、標的領域またはサブリージョンの識別と検出における自動化に対処するために、バイオメディカルイメージセグメンテーションにおいて最も広く利用されている。最近の研究では、u-netベースのアプローチは、脳腫瘍、肺癌、アルツハイマー病、乳がんなどの疾患の早期診断および治療のためのコンピュータ支援診断システムの開発に、様々な応用において最先端のパフォーマンスを示す。本稿では,U-Netフレームワークを説明することによって,これらのアプローチの成功を示すとともに,磁気共鳴画像,X線,コンピュータ断層撮影/コンピュータ軸断層撮影,超音波,ポジトロン放射断層撮影など,異なる医用画像のU-Net変種を包括的に分析する。さらに、新型コロナウイルス(COVID-19)としても知られる重症急性呼吸器症候群ウイルス2(SARS-CoV-2)へのU-Netベースのフレームワークの貢献についても強調する。 With the advent of advancements in deep learning approaches, such as deep convolution neural network, residual neural network, adversarial network; U-Net architectures are most widely utilized in biomedical image segmentation to address the automation in identification and detection of the target regions or sub-regions. In recent studies, U-Net based approaches have illustrated state-of-the-art performance in different applications for the development of computer-aided diagnosis systems for early diagnosis and treatment of diseases such as brain tumor, lung cancer, alzheimer, breast cancer, etc. This article contributes to present the success of these approaches by describing the U-Net framework, followed by the comprehensive analysis of the U-Net variants for different medical imaging or modalities such as magnetic resonance imaging, X-ray, computerized tomography/computerized axial tomography, ultrasound, positron emission tomography, etc. Besides, this article also highlights the contribution of U-Net based frameworks in the on-going pandemic, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) also known as COVID-19.	翻訳日:2021-07-12 16:35:42 公開日:2021-07-09
# (参考訳) ディープニューラルネットワークは列名からデータ相関を予測できるか? Can Deep Neural Networks Predict Data Correlations from Column Names? ( http://arxiv.org/abs/2107.04553v1 ) ライセンス: CC BY 4.0	Immanuel Trummer	(参考訳) 人間の場合、コラム名からデータ相関を予測することがしばしば可能である。我々は、ディープニューラルネットワークが同じことを学べるかどうかを調べる実験を行う。もしそうなら、例えば、nlp分析をスキーマ要素に使用するチューニングツールが、相関検出への取り組みを優先する可能性を開くだろう。約4,000データセットから抽出した約12万列の相関関係を解析した。カラム名のみに基づいて相関を予測しようとする。予測には,最近提案されたTransformerアーキテクチャに基づく事前学習言語モデルを利用する。異なるタイプの相関、複数の予測方法、および様々な予測シナリオを検討する。カラム名の長さやトレーニングデータの量などの要因が予測精度に与える影響について検討した。全体として、ディープニューラルネットワークは、多くのシナリオにおいて比較的高い精度で相関を予測できる(例えば、長いカラム名に対して95%の精度で)。 For humans, it is often possible to predict data correlations from column names. We conduct experiments to find out whether deep neural networks can learn to do the same. If so, e.g., it would open up the possibility of tuning tools that use NLP analysis on schema elements to prioritize their efforts for correlation detection. We analyze correlations for around 120,000 column pairs, taken from around 4,000 data sets. We try to predict correlations, based on column names alone. For predictions, we exploit pre-trained language models, based on the recently proposed Transformer architecture. We consider different types of correlations, multiple prediction methods, and various prediction scenarios. We study the impact of factors such as column name length or the amount of training data on prediction accuracy. Altogether, we find that deep neural networks can predict correlations with a relatively high accuracy in many scenarios (e.g., with an accuracy of 95% for long column names).	翻訳日:2021-07-12 15:39:48 公開日:2021-07-09
# (参考訳) ベイズ語の学習規則 The Bayesian Learning Rule ( http://arxiv.org/abs/2107.04562v1 ) ライセンス: CC BY 4.0	Mohammad Emtiyaz Khan and Håvard Rue	(参考訳) 多くの機械学習アルゴリズムがベイズ学習則と呼ばれる単一のアルゴリズムの特定の例であることを示す。この規則はベイズ原理から派生したもので、最適化、ディープラーニング、グラフィカルモデルといった分野から幅広いアルゴリズムを導出する。これにはリッジ回帰、ニュートン法、カルマンフィルタのような古典的なアルゴリズムや、確率勾配降下、rmsprop、ドロップアウトといった現代のディープラーニングアルゴリズムが含まれる。このようなアルゴリズムを導出する鍵となるアイデアは、自然勾配を用いて推定された候補分布を用いて後部を近似することである。異なる候補分布は異なるアルゴリズムとなり、さらに自然勾配への近似はそれらのアルゴリズムの変種を引き起こす。私たちの仕事は、既存のアルゴリズムを統一、一般化、改善するだけでなく、新しいアルゴリズムの設計にも役立ちます。 We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.	翻訳日:2021-07-12 15:23:03 公開日:2021-07-09
# (参考訳) ランダムウォークと再起動によるユニバーサル多層ネットワーク探索 Universal Multilayer Network Exploration by Random Walk with Restart ( http://arxiv.org/abs/2107.04565v1 ) ライセンス: CC BY 4.0	Anthony Baptista, Aitor Gonzalez, Anaïs Baudot	(参考訳) ここ数年、データの量と種類は劇的に増加している。これらのデータはしばしばネットワークとして表現され、ネットワーク理論から生じるアプローチで探索される。近年では、より複雑でリッチなネットワークフレームワークを活用するためのネットワーク探索手法が拡張されている。例えば、ランダムウォークは多層ネットワークを探索するために拡張されている。しかし、現在のランダムウォークアプローチは、処理可能なネットワーク層の組合せと不均一性に制限がある。多層ネットワークの多様性と複雑さの増大に対応するために,新しい解析的および数値的ランダムウォーク法が必要である。そこで本稿では,Random Walk with Restart(RWR)を最適化したマルチレイヤネットワーク上で実現するPythonパッケージであるMultiXrankを提案する。このパッケージはRWRの普遍的な数学的定式化によって支えられている。我々は,MultiXrankを相互検証とリンク予測で評価し,マルチレイヤネットワークデータの追加や削除が予測性能に与える影響を評価するためのプロトコルを導入した。さらに,入力パラメータに対するマルチックスランクの感度をパラメータ空間の詳細な探索により測定した。最後に,ヒト遺伝病の文脈において,非教師付きノード優先化と教師付き分類の異なるユースケースを用いたマルチックスランクの汎用性を示す。 The amount and variety of data has been increasing drastically for several years. These data are often represented as networks, which are then explored with approaches arising from network theory. Recent years have witnessed the extension of network exploration methods to leverage more complex and richer network frameworks. Random walks, for instance, have been extended to explore multilayer networks. However, current random walk approaches are limited in the combination and heterogeneity of network layers they can handle. New analytical and numerical random walk methods are needed to cope with the increasing diversity and complexity of multilayer networks. We propose here MultiXrank, a Python package that enables Random Walk with Restart (RWR) on any kind of multilayer network with an optimized implementation. This package is supported by a universal mathematical formulation of the RWR. We evaluated MultiXrank with leave-one-out cross-validation and link prediction, and introduced protocols to measure the impact of the addition or removal of multilayer network data on prediction performances. 
We further measured the sensitivity of MultiXrank to input parameters by in-depth exploration of the parameter space. Finally, we illustrate the versatility of MultiXrank with different use-cases of unsupervised node prioritization and supervised classification in the context of human genetic diseases.	翻訳日:2021-07-12 15:22:07 公開日:2021-07-09
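The entry above is built around Random Walk with Restart (RWR). The sketch below illustrates the basic RWR iteration on a single-layer network only. It is not the MultiXrank API and it does not cover the multilayer formulation. The function name, the dense list-of-lists adjacency format, and the default restart probability of 0.7 are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.7,
                             tol=1e-12, max_iter=10000):
    """Iterate p <- (1 - r) * W p + r * p0 on a single-layer graph.

    adjacency: square list of lists of non-negative edge weights (hypothetical format).
    seeds: indices of the seed nodes the walker restarts from.
    """
    n = len(adjacency)
    # Column sums are used to build a column-stochastic transition matrix W.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(total == 0 for total in col_sums):
        raise ValueError("every node needs at least one edge")

    # Restart distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for seed in seeds:
        p0[seed] = 1.0 / len(seeds)

    p = list(p0)
    for _ in range(max_iter):
        p_next = [
            restart_prob * p0[i]
            + (1.0 - restart_prob)
            * sum(adjacency[i][j] / col_sums[j] * p[j] for j in range(n))
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


# Path graph 0 - 1 - 2 with node 0 as the only seed.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = random_walk_with_restart(path, seeds=[0])
print(scores)
```

The resulting scores sum to one and decrease with distance from the seed, which is the property that makes RWR usable for node prioritization.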
# (参考訳) マルチモーダル融合を用いた仮想現実環境における心電図からの多レベル応力評価 Multi-level Stress Assessment from ECG in a Virtual Reality Environment using Multimodal Fusion ( http://arxiv.org/abs/2107.04566v1 ) ライセンス: CC BY 4.0	Zeeshan Ahmad, Suha Rabbani, Muhammad Rehman Zafar, Syem Ishaque, Sridhar Krishnan, Naimul Khan	(参考訳) ECGは、非侵襲的な性質のため、深刻な仮想現実(VR)アプリケーションにおけるストレスを評価する魅力的な選択肢である。しかし、既存の機械学習(ML)モデルは性能が良くない。さらに、既存の研究は二分ストレスアセスメントしか行わず、より活発なバイオフィードバックベースのアプリケーションを開発するためには、マルチレベルアセスメントが必要である。既存の研究は単一の経験(例えば)を注釈し分類している。 vrビデオの視聴)を単一のストレスレベルにすることで、リアルタイムのゲーム内ストレス評価を活用できる動的エクスペリエンスの設計を再び防ぐことができる。本稿では、3つのストレスレベルを評価するvrストレスアセスメントに関する新しい研究について報告する。 ECGデータは、VRジェットコースターを経験している9人のユーザーから収集された。その後、VR体験を手作業で10秒単位で3つのストレスレベルにラベル付けした。次に,1秒間の窓から応力予測を行うことができるspectrogramと1d ecgを用いた,新しいマルチモーダル深層融合モデルを提案する。実験の結果,提案モデルは従来のHRVベースMLモデル(精度9%向上)とベースラインディープラーニングモデル(2.5%向上)より優れていた。また、ベンチマークWESADデータセットの結果を報告し、モデルの優位性を示す。 ECG is an attractive option to assess stress in serious Virtual Reality (VR) applications due to its non-invasive nature. However, the existing Machine Learning (ML) models perform poorly. Moreover, existing studies only perform a binary stress assessment, while to develop a more engaging biofeedback-based application, multi-level assessment is necessary. Existing studies annotate and classify a single experience (e.g. watching a VR video) to a single stress level, which again prevents design of dynamic experiences where real-time in-game stress assessment can be utilized. In this paper, we report our findings on a new study on VR stress assessment, where three stress levels are assessed. ECG data was collected from 9 users experiencing a VR roller coaster. The VR experience was then manually labeled in 10-seconds segments to three stress levels by three raters. We then propose a novel multimodal deep fusion model utilizing spectrogram and 1D ECG that can provide a stress prediction from just a 1-second window. 
Experimental results demonstrate that the proposed model outperforms the classical HRV-based ML models (9% increase in accuracy) and baseline deep learning models (2.5% increase in accuracy). We also report results on the benchmark WESAD dataset to show the supremacy of the model.	翻訳日:2021-07-12 15:05:41 公開日:2021-07-09
# (参考訳) anceR:サンプルワイドボリューム最大化による異方性認証 ANCER: Anisotropic Certification via Sample-wise Volume Maximization ( http://arxiv.org/abs/2107.04570v1 ) ライセンス: CC BY 4.0	Francisco Eiras, Motasem Alfarra, M. Pawan Kumar, Philip H. S. Torr, Puneet K. Dokania, Bernard Ghanem, Adel Bibi	(参考訳) ランダム化平滑化は最近、大規模なディープニューラルネットワーク分類器の認証を可能にする効果的なツールとして登場した。ランダム化平滑化に関するすべての先行技術は、等方性$\ell_p$認証にフォーカスしており、これは$\ell_p$-norm半径を介して等方性メソッド間で容易に比較可能な証明書を発行する利点がある。しかし、等方的認証は、入力から最悪の場合の敵への認証を制限しているため、他の「閉じた」、潜在的に大きく、一定の予測の安全な領域を推論することはできない。この問題を緩和するため、(i)理論上は、簡単な解析に従って、等方性ランダム化平滑化 $\ell_1$ と $\ell_2$ の証明書を一般化した異方性証明に拡張する。さらに、(ii)認証領域のボリュームを通して各証明書を定量化することにより、上位集合領域を認証した場合、証明書が他よりも優れている一般証明書の比較を可能にする評価指標を提案する。本稿では,ボリューム最大化によるテストセットサンプルの異方性証明書を取得するための実用的なフレームワークであるanceRを紹介する。実験結果から,ancer は cifar-10 と imagenet の両方で複数の radii において最先端の $\ell_1$ と $\ell_2$ の認証精度を達成し,ボリュームの面ではかなり大きな領域を証明し,等方性解析から遠ざかる利点を浮き彫りにした。私たちの実験で使用されたコードはhttps://github.com/motasemalfarra/ancerで利用可能です。 Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, \ie it cannot reason about other "close", potentially large, constant prediction safe regions. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates - a certificate is superior to another if it certifies a superset region - with the quantification of each certificate through the volume of the certified region. 
We introduce ANCER, a practical framework for obtaining anisotropic certificates for a given test set sample via volume maximization. Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on both CIFAR-10 and ImageNet at multiple radii, while certifying substantially larger regions in terms of volume, thus highlighting the benefits of moving away from isotropic analysis. Code used in our experiments is available at https://github.com/MotasemAlfarra/ANCER.	翻訳日:2021-07-12 14:46:25 公開日:2021-07-09
# Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression ( http://arxiv.org/abs/2107.04497v1 ) ライセンス: Link先を確認	Vincent Mai, Waleed Khamies, Liam Paull	(参考訳) ヘテロセダスティック回帰(Heteroscedastic regression)は、各ラベルが異なる分布からノイズを受ける教師あり学習のタスクである。このノイズはラベル付けプロセスによって引き起こされ、i.i.dに反する学習アルゴリズムの性能に悪影響を及ぼす。仮定だしかし、多くの状況において、ラベル付けプロセスはラベルごとにそのような分布のばらつきを推定することができ、この影響を軽減するために追加情報として使用できる。ニューラルネットワークのパラメータ最適化にガウス・マルコフの定理に基づく逆分散重み付き平均二乗誤差を適用する。近地真理サンプルに頑健な損失関数であるバッチ逆分散を導入し,効果的な学習率の制御を可能にする。実験の結果,L2損失,逆分散重み付け,フィルタベースラインに比べて,BIVは2つのノイズデータセット上でのネットワーク性能を著しく向上することがわかった。 Heteroscedastic regression is the task of supervised learning where each label is subject to noise from a different distribution. This noise can be caused by the labelling process, and impacts negatively the performance of the learning algorithm as it violates the i.i.d. assumptions. In many situations however, the labelling process is able to estimate the variance of such distribution for each label, which can be used as an additional information to mitigate this impact. We adapt an inverse-variance weighted mean square error, based on the Gauss-Markov theorem, for parameter optimization on neural networks. We introduce Batch Inverse-Variance, a loss function which is robust to near-ground truth samples, and allows to control the effective learning rate. Our experimental results show that BIV improves significantly the performance of the networks on two noisy datasets, compared to L2 loss, inverse-variance weighting, as well as a filtering-based baseline.	翻訳日:2021-07-12 14:01:22 公開日:2021-07-09
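The inverse-variance weighting described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `biv_loss`, the constant `eps`, and the choice to normalize the weights within the batch are assumptions made for the example. Each squared error is weighted by 1 / (variance + eps), so noisy labels contribute less, and `eps` keeps labels with near-zero noise from dominating the batch.

```python
def biv_loss(predictions, labels, variances, eps=0.1):
    """Inverse-variance weighted squared error, normalized over the batch.

    eps is an assumed hyperparameter that bounds the weight of labels
    whose noise variance is close to zero.
    """
    weights = [1.0 / (v + eps) for v in variances]
    total = sum(weights)
    weighted_errors = sum(
        w * (p - y) ** 2 for w, p, y in zip(weights, predictions, labels)
    )
    return weighted_errors / total


# The second label has the larger error but is much noisier, so it counts less
# than it would in a plain mean squared error.
loss = biv_loss([1.0, 1.0], [0.0, 3.0], variances=[0.0, 9.9], eps=0.1)
plain_mse = ((1.0 - 0.0) ** 2 + (1.0 - 3.0) ** 2) / 2
```

Raising `eps` moves the loss toward a plain mean squared error, while lowering it lets the least noisy labels dominate each batch.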
# ドメイン適応のためのドロップアウト判別器の探索 Exploring Dropout Discriminator for Domain Adaptation ( http://arxiv.org/abs/2107.04231v1 ) ライセンス: Link先を確認	Vinod K Kurmi and Venkatesh K Subramanian and Vinay P. Namboodiri	(参考訳) 新しいドメインへの分類器の適応は、機械学習における難しい問題の1つである。これは多くの深層学習と非深層学習に基づく手法を用いて解決されている。提案手法のうち,多くの深層学習問題とドメイン適応を両立させるために,逆学習の手法が広く適用されている。これらの方法は、ソースとターゲットの分布が近いことを保証する判別器に基づいている。しかし, 一つの判別器で得られる点推定を用いるのではなく, 判別器のアンサンブルに基づく分布を用いてこのギャップを橋渡しすることは有用であると考えられる。これは複数の分類器や従来のアンサンブル方式で実現できる。対照的に,モンテカルロのドロップアウトに基づくアンサンブル判別器は,分布に基づく判別器を得るのに十分である可能性が示唆された。具体的には,サンプルベース分布のばらつきを徐々に増加させ,それに対応する逆勾配を用いて特徴表現の調整を行うカリキュラムベースのドロップアウト判別器を提案する。判別器のアンサンブルは、モデルがデータ分布を効率的に学習するのに役立つ。さらに、機能抽出子をトレーニングするための勾配推定も改善されている。詳細な結果と徹底的なアブレーション解析により,本モデルが最先端の結果を上回っていることが示された。 Adaptation of a classifier to new domains is one of the challenging problems in machine learning. This has been addressed using many deep and non-deep learning based methods. Among the methodologies used, that of adversarial learning is widely applied to solve many deep learning problems along with domain adaptation. These methods are based on a discriminator that ensures source and target distributions are close. However, here we suggest that rather than using a point estimate obtaining by a single discriminator, it would be useful if a distribution based on ensembles of discriminators could be used to bridge this gap. This could be achieved using multiple classifiers or using traditional ensemble methods. In contrast, we suggest that a Monte Carlo dropout based ensemble discriminator could suffice to obtain the distribution based discriminator. Specifically, we propose a curriculum based dropout discriminator that gradually increases the variance of the sample based distribution and the corresponding reverse gradients are used to align the source and target feature representations. An ensemble of discriminators helps the model to learn the data distribution efficiently. 
It also provides better gradient estimates to train the feature extractor. The detailed results and thorough ablation analysis show that our model outperforms state-of-the-art results.	翻訳日:2021-07-12 14:00:38 公開日:2021-07-09
# Heterogeneous Attention を用いた Levi Graph AMR Parser Levi Graph AMR Parser using Heterogeneous Attention ( http://arxiv.org/abs/2107.04152v1 ) ライセンス: Link先を確認	Han He, Jinho D. Choi	(参考訳) バイファインデコーダと組み合わせて、トランスフォーマーはテキストからグラフへの変換に効果的に適応し、AMR解析における最先端のパフォーマンスを実現している。しかし、多くの先行研究はビスフィンデコーダをアークまたはラベルの予測に頼っているが、デコーダで使われているほとんどの特徴は変圧器で既に学習されている。本稿では,異種データ(トークン,概念,ラベル)を変換器への入力として組み合わせて注意を学習し,AMRグラフのすべての要素(概念,弧,ラベル)を予測するために,変換器からの注意行列のみを用いる,新しいAMR解析手法を提案する。我々のモデルでは、従来の最先端グラフパーサよりもパラメータが大幅に少ないが、AMR 2.0と3.0では類似またはより良い精度を示している。 Coupled with biaffine decoders, transformers have been effectively adapted to text-to-graph transduction and achieved state-of-the-art performance on AMR parsing. Many prior works, however, rely on the biaffine decoder for either or both arc and label predictions although most features used by the decoder may be learned by the transformer already. This paper presents a novel approach to AMR parsing by combining heterogeneous data (tokens, concepts, labels) as one input to a transformer to learn attention, and use only attention matrices from the transformer to predict all elements in AMR graphs (concepts, arcs, labels). Although our models use significantly fewer parameters than the previous state-of-the-art graph parser, they show similar or better accuracy on AMR 2.0 and 3.0.	翻訳日:2021-07-12 14:00:21 公開日:2021-07-09
# 深部ニューラルネットワークのための活性化勾配 Activated Gradients for Deep Neural Networks ( http://arxiv.org/abs/2107.04228v1 ) ライセンス: Link先を確認	Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, and Mingsheng Shang	(参考訳) ディープニューラルネットワークは、悪条件の問題、消失/爆発勾配問題、サドルポイント問題などにより、パフォーマンスの低下やトレーニングの失敗に苦しむことが多い。本稿では,勾配に勾配活性化関数(GAF)を作用させることにより,これらの課題に対処する新しい手法を提案する。直感的には、GAFは小さな勾配を拡大し、大きな勾配を制限する。理論的には、この論文は、GAFが満たすべき条件を与え、この条件に基づいて、GAFが上記の問題を緩和することを証明している。さらに, 本論文は, SGD のGAF との収束速度がGAF を含まない場合よりも速いことを証明する。さらに、CIFAR、ImageNet、PASCAL視覚オブジェクトクラスに関する実験により、GAFの有効性が確認された。また,実験結果から,提案手法が様々なディープニューラルネットワークに採用され,性能が向上することが示唆された。ソースコードはhttps://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networksで公開されている。 Deep neural networks often suffer from poor performance or even training failure due to the ill-conditioned problem, the vanishing/exploding gradient problem, and the saddle point problem. In this paper, a novel method by acting the gradient activation function (GAF) on the gradient is proposed to handle these challenges. Intuitively, the GAF enlarges the tiny gradients and restricts the large gradient. Theoretically, this paper gives conditions that the GAF needs to meet, and on this basis, proves that the GAF alleviates the problems mentioned above. In addition, this paper proves that the convergence rate of SGD with the GAF is faster than that without the GAF under some assumptions. Furthermore, experiments on CIFAR, ImageNet, and PASCAL visual object classes confirm the GAF's effectiveness. The experimental results also demonstrate that the proposed method is able to be adopted in various deep neural networks to improve their performance. The source code is publicly available at https://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networks.	翻訳日:2021-07-12 14:00:04 公開日:2021-07-09
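To make the intuition concrete, here is a minimal sketch of a gradient activation function used inside plain gradient descent. The particular form `alpha * tanh(beta * g)` and the values of `alpha` and `beta` are illustrative assumptions, not necessarily what the paper uses. Any function of this shape amplifies gradients near zero (slope `alpha * beta` > 1) and saturates for large ones (magnitude at most `alpha`).

```python
import math


def gaf(grad, alpha=2.0, beta=5.0):
    """Hypothetical gradient activation: enlarge tiny gradients, cap large ones."""
    return alpha * math.tanh(beta * grad)


def sgd_step(params, grads, lr=0.01):
    """One gradient-descent step that uses activated gradients."""
    return [p - lr * gaf(g) for p, g in zip(params, grads)]


tiny = gaf(1e-3)     # roughly alpha * beta * 1e-3, i.e. enlarged
large = gaf(100.0)   # bounded by alpha, i.e. restricted
updated = sgd_step([1.0], [100.0])
```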
# 時間的アクション検出にはRGBストリームで十分 RGB Stream Is Enough for Temporal Action Detection ( http://arxiv.org/abs/2107.04362v1 ) ライセンス: Link先を確認	Chenhao Wang, Hongxiang Cai, Yuxin Zou, Yichao Xiong	(参考訳) 現在最先端の時間的動作検出器は、RGBフレームとオプティカルフローを含む2ストリーム入力に基づいている。RGBフレームとオプティカルフローの組み合わせは性能を著しく向上させるが、オプティカルフローは手作業で設計された表現であり、重い計算を必要とするだけでなく、2ストリーム手法がフローと共同でエンドツーエンドに学習されないことが多いという点で方法論的にも不満足である。本稿では,高精度な時間的動作検出にオプティカルフローは不要であり,画像レベルのデータ拡張(ILDA)がオプティカルフロー除去時の性能劣化を回避する鍵であると主張する。ILDAの有効性を評価するため,DaoTADという単一のRGBストリームをベースとした簡易かつ効率的な一段階の時間的動作検出器を設計した。以上の結果から,ILDAで学習したDaoTADは既存の最先端の2ストリーム検出器と同等の精度を保ちつつ,従来の手法の推論速度を大きなマージンで上回り,GeForce GTX 1080 Tiでは6668 fpsの速度を達成した。コードは https://github.com/Media-Smart/vedatad で入手できる。 State-of-the-art temporal action detectors to date are based on two-stream input including RGB frames and optical flow. Although combining RGB frames and optical flow boosts performance significantly, optical flow is a hand-designed representation which not only requires heavy computation, but also makes it methodologically unsatisfactory that two-stream methods are often not learned end-to-end jointly with the flow. In this paper, we argue that optical flow is dispensable in high-accuracy temporal action detection and that image-level data augmentation (ILDA) is the key solution to avoid performance degradation when optical flow is removed. To evaluate the effectiveness of ILDA, we design a simple yet efficient one-stage temporal action detector based on a single RGB stream, named DaoTAD. Our results show that when trained with ILDA, DaoTAD has comparable accuracy to all existing state-of-the-art two-stream detectors while surpassing the inference speed of previous methods by a large margin, reaching an astounding 6668 fps on a GeForce GTX 1080 Ti. Code is available at https://github.com/Media-Smart/vedatad.	翻訳日:2021-07-12 13:59:47 公開日:2021-07-09
# 深部畳み込みニューラルネットワーク圧縮のための結合行列分解 Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression ( http://arxiv.org/abs/2107.04386v1 ) ライセンス: Link先を確認	Shaowu Chen, Jihao Zhou, Weize Sun, Lei Huang	(参考訳) 多数のパラメータを持つディープ畳み込みニューラルネットワーク(CNN)は膨大な計算資源を必要とし、リソース制約されたアプライアンスへのCNNの適用を制限する。そのため,近年,分解に基づく手法がcnnの圧縮に利用されている。しかし、圧縮係数と性能は負の相関関係にあるため、最先端の作業は厳しい性能劣化に悩まされるか、圧縮係数が限られている。これらの課題を克服するため,CNNを圧縮し,結合行列分解による性能劣化を軽減することを提案する。このアイデアは、CNNには多くの繰り返しモジュールがあり、同じ構造を持つ重みを同じ部分空間に投影することで、ネットワークをさらに圧縮し、加速することができるという事実にインスパイアされている。特に, 3つの合同行列分解スキームを開発し, 特異値分解に基づく最適化手法を提案する。 3つの挑戦的なコンパクトcnnと3つのベンチマークデータセットで広範な実験を行い、提案アルゴリズムの優れた性能を実証した。その結果,本手法はresnet-34のサイズを22倍圧縮し,精度を低下させることができた。 Deep convolutional neural networks (CNNs) with a large number of parameters requires huge computational resources, which has limited the application of CNNs on resources constrained appliances. Decomposition-based methods, therefore, have been utilized to compress CNNs in recent years. However, since the compression factor and performance are negatively correlated, the state-of-the-art works either suffer from severe performance degradation or have limited low compression factors. To overcome these problems, unlike previous works compressing layers separately, we propose to compress CNNs and alleviate performance degradation via joint matrix decomposition. The idea is inspired by the fact that there are lots of repeated modules in CNNs, and by projecting weights with the same structures into the same subspace, networks can be further compressed and even accelerated. In particular, three joint matrix decomposition schemes are developed, and the corresponding optimization approaches based on Singular Values Decomposition are proposed. Extensive experiments are conducted across three challenging compact CNNs and 3 benchmark data sets to demonstrate the superior performance of our proposed algorithms. 
As a result, our methods can compress the size of ResNet-34 by 22x with less accuracy degradation than several state-of-the-art methods.	翻訳日:2021-07-12 13:59:27 公開日:2021-07-09
# 不確実性推定を用いたモデル・モデレータ協調の測定と改善 Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation ( http://arxiv.org/abs/2107.04212v1 ) ライセンス: Link先を確認	Ian D. Kivlichan, Zi Lin, Jeremiah Liu, Lucy Vasserman	(参考訳) コンテンツモデレーションは、人間と機械学習モデルのコラボレーションによって実行されることが多い。しかし,モデレータとモデルを組み合わせたシステムの性能を最大化するために,協調的なプロセスの設計方法がよく理解されていない。本研究は,協調的プロセスにモデル不確実性を取り込むアプローチに着目し,この問題を厳密に研究する。まず,人間のモデレーター上での容量制約下での協調システムの性能を記述するための原則付きメトリクスを導入し,組み合わせたシステムがいかに人的決定を効果的に活用するかを定量化する。これらの指標を用いて, 異なる協調的レビュー戦略の下で, 最先端の不確実性モデルの性能評価を行う。不確実性に基づく戦略は、毒性スコアに基づく広く使用されている戦略を一貫して上回っており、レビュー戦略の選択はシステム全体のパフォーマンスを劇的に変化させる。本研究は,コンテンツモデレーションのための効果的なモデレータモデルシステムを理解・開発するための厳密なメトリクスの重要性と,この領域における不確実性推定の有用性を示す。 Content moderation is often performed by a collaboration between humans and machine learning models. However, it is not well understood how to design the collaborative process so as to maximize the combined moderator-model system performance. This work presents a rigorous study of this problem, focusing on an approach that incorporates model uncertainty into the collaborative process. First, we introduce principled metrics to describe the performance of the collaborative system under capacity constraints on the human moderator, quantifying how efficiently the combined system utilizes human decisions. Using these metrics, we conduct a large benchmark study evaluating the performance of state-of-the-art uncertainty models under different collaborative review strategies. We find that an uncertainty-based strategy consistently outperforms the widely used strategy based on toxicity scores, and moreover that the choice of review strategy drastically changes the overall system performance. Our results demonstrate the importance of rigorous metrics for understanding and developing effective moderator-model systems for content moderation, as well as the utility of uncertainty estimation in this domain.	翻訳日:2021-07-12 13:59:08 公開日:2021-07-09
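The difference between the two review strategies compared in the abstract above can be illustrated with a small sketch. It assumes a fixed review capacity and uses `p * (1 - p)` as a stand-in uncertainty measure. The function name, the measure, and the scores are assumptions for illustration only; the paper evaluates dedicated uncertainty models.

```python
def select_for_review(scores, capacity, strategy="uncertainty"):
    """Return the indices of the examples sent to human moderators."""
    if strategy == "uncertainty":
        # p * (1 - p) peaks at p = 0.5, where the model is least sure.
        def priority(i):
            return scores[i] * (1.0 - scores[i])
    else:
        # Toxicity-based strategy: review the highest-scoring examples first.
        def priority(i):
            return scores[i]
    ranked = sorted(range(len(scores)), key=priority, reverse=True)
    return sorted(ranked[:capacity])


scores = [0.02, 0.48, 0.97, 0.55, 0.10]  # hypothetical model toxicity scores
by_uncertainty = select_for_review(scores, capacity=2)
by_toxicity = select_for_review(scores, capacity=2, strategy="toxicity")
```

With the same capacity, the uncertainty strategy spends human effort on borderline cases, whereas the toxicity strategy spends it on cases the model is already confident about.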
# 低リソースニューラルマシン翻訳に関する調査研究 A Survey on Low-Resource Neural Machine Translation ( http://arxiv.org/abs/2107.04239v1 ) ライセンス: Link先を確認	Rui Wang and Xu Tan and Renqian Luo and Tao Qin and Tie-Yan Liu	(参考訳) ニューラルアプローチは機械翻訳における最先端の精度を達成したが、大規模並列データ収集のコストが高い。したがって、非常に限られた並列データ、すなわち低リソース設定を持つニューラルマシン翻訳(nmt)について多くの研究が行われている。本稿では,低リソースNMTに関する調査を,(1)ソースおよび/またはターゲット言語の単言語データの活用,(2)補助言語からのデータの活用,(3)マルチモーダルデータの活用の3つのカテゴリに分類する。私たちの調査は、研究者がこの分野をより深く理解し、より良いアルゴリズムを設計するように促し、業界関係者がアプリケーションに適したアルゴリズムを選択するのに役立つことを期待しています。 Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large scale parallel data. Thus, a lot of research has been conducted for neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey for low-resource NMT and classify related works into three categories according to the auxiliary data they used: (1) exploiting monolingual data of source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey can help researchers to better understand this field and inspire them to design better algorithms, and help industry practitioners to choose appropriate algorithms for their applications.	翻訳日:2021-07-12 13:58:49 公開日:2021-07-09
# UniRE: エンティティ関係抽出のための統一ラベル空間 UniRE: A Unified Label Space for Entity Relation Extraction ( http://arxiv.org/abs/2107.04292v1 ) ライセンス: Link先を確認	Yijun Wang, Changzhi Sun, Yuanbin Wu, Hao Zhou, Lei Li, and Junchi Yan	(参考訳) 多くのジョイントエンティティ関係抽出モデルは、2つのサブタスク(エンティティ検出と関係分類)に対して2つの分離されたラベル空間を設定する。この設定は、エンティティとリレーション間の情報相互作用を妨げる可能性がある。本研究では,2つのサブタスクのラベル空間における異なる処理の除去を提案する。我々のモデルの入力は、文から全ての単語対を含むテーブルである。実体と関係は表の中の正方形と矩形で表される。 2つのサブタスクの学習を統一する,各セルラベルの予測に統一型分類器を適用した。テストでは、テーブルから正方形と矩形を見つけるために有効な(より速い)近似デコーダを提案する。 3つのベンチマーク (ACE04, ACE05, SciERC) 実験の結果, パラメータの半数しか使用せず, 最適抽出器との競合精度が向上し, 高速であることがわかった。 Many joint entity relation extraction models setup two separated label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks' label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell's label, which unifies the learning of two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster.	翻訳日:2021-07-12 13:58:35 公開日:2021-07-09
# タスク指向NLG出力のローカライズに機械翻訳を用いる Using Machine Translation to Localize Task Oriented NLG Output ( http://arxiv.org/abs/2107.04512v1 ) ライセンス: Link先を確認	Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano	(参考訳) Google Assistant、Siri、Alexaといったタスク指向自然言語アプリケーションの課題のひとつは、出力を多くの言語にローカライズすることだ。本稿では、英語の出力に機械翻訳を適用してこれを行う。機械翻訳を使うことは非常にスケーラブルで、あらゆる英語の出力で動作し、動的テキストを処理できる。要求される品質バーは完璧に近く、文章の範囲は非常に狭く、機械翻訳訓練データとは大きく異なることが多い。この要求の組み合わせは、機械翻訳のためのドメイン適応の分野では新しくなっている。既存のアイデアに基づいて、ドメイン内翻訳の微調整、Webからの文の追加、セマンティックアノテーションの追加、自動エラー検出など、必要な品質バーに到達することができます。論文は, 大規模翻訳モデルを実現するための蒸留モデルとともに, 我々のアプローチと結果を共有する。 One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The required quality bar is close to perfection, the range of sentences is extremely narrow, and the sentences are often very different than the ones in the machine translation training data. This combination of requirements is novel in the field of domain adaptation for machine translation. We are able to reach the required quality bar by building on existing ideas and adding new ones: finetuning on in-domain translations, adding sentences from the Web, adding semantic annotations, and using automatic error detection. The paper shares our approach and results, together with a distillation model to serve the translation models at scale.	翻訳日:2021-07-12 13:58:21 公開日:2021-07-09
# 代理説明を理解する: 複雑性と忠実さとカバレッジの相互作用 Understanding surrogate explanations: the interplay between complexity, fidelity and coverage ( http://arxiv.org/abs/2107.04309v1 ) ライセンス: Link先を確認	Rafael Poyiadzi, Xavier Renard, Thibault Laugel, Raul Santos-Rodriguez, Marcin Detyniecki	(参考訳) 本稿では,サロゲート説明の背後にある基本成分を分析し,その内部動作の理解を深める。我々は、グローバルサロゲートを考慮し、サロゲートの複雑さとブラックボックスがモデル化される忠実さの間のトレードオフを記述して、展示を開始する。グローバルからローカルへの移行 - カバー範囲の削減 - により、サロゲートの忠実度-複雑度のパレートフロンティアにおいて、より好ましい条件が実現できることが示される。複雑度,忠実度,カバレッジの相互作用を議論し,ユーザニーズの違いが制約やペナルティである問題定式化にどのようにつながるかを検討する。また,局所的な代用的解釈可能性の手順をインタラクティブにし,より良い説明につながることを示す実験を行った。 This paper analyses the fundamental ingredients behind surrogate explanations to provide a better understanding of their inner workings. We start our exposition by considering global surrogates, describing the trade-off between complexity of the surrogate and fidelity to the black-box being modelled. We show that transitioning from global to local - reducing coverage - allows for more favourable conditions on the Pareto frontier of fidelity-complexity of a surrogate. We discuss the interplay between complexity, fidelity and coverage, and consider how different user needs can lead to problem formulations where these are either constraints or penalties. We also present experiments that demonstrate how the local surrogate interpretability procedure can be made interactive and lead to better explanations.	翻訳日:2021-07-12 13:57:36 公開日:2021-07-09
# 深層学習におけるフィッシャー情報の分散について On the Variance of the Fisher Information for Deep Learning ( http://arxiv.org/abs/2107.04205v1 ) ライセンス: Link先を確認	Alexander Soen and Ke Sun	(参考訳) フィッシャー情報行列 (FIM) はディープラーニングの領域に応用されている。これは損失ランドスケープ、パラメータの分散、二階最適化、ディープラーニング理論と密接に関連している。正確なFIMは閉じた形で得られないか、計算コストが高すぎるかのいずれかである。実際には、ほぼ常に経験的なサンプルに基づいて推定される。FIMの2つの等価表現に基づく2つの推定量について検討する。これらはどちらも不偏であり、根底にある「真の」FIMに関して一致性を持つ。その推定品質は、閉じた形で与えられる分散によって特徴づけられる。それらの分散の上界を与え、ディープニューラルネットワークのパラメトリック構造が分散にどのように影響するかを分析する。本稿では,この分散尺度の意味と深層学習の文脈における上界について考察する。 The Fisher information matrix (FIM) has been applied to the realm of deep learning. It is closely related to the loss landscape, the variance of the parameters, second order optimization, and deep learning theory. The exact FIM is either unavailable in closed form or too expensive to compute. In practice, it is almost always estimated based on empirical samples. We investigate two such estimators based on two equivalent representations of the FIM. They are both unbiased and consistent with respect to the underlying "true" FIM. Their estimation quality is characterized by their variance given in closed form. We bound their variances and analyze how the parametric structure of a deep neural network can impact the variance. We discuss the meaning of this variance measure and our bounds in the context of deep learning.	翻訳日:2021-07-12 13:56:38 公開日:2021-07-09
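As an illustration of the two equivalent representations, consider a toy model that is not taken from the paper: a Gaussian with unknown mean `mu` and unit variance, whose Fisher information with respect to `mu` is exactly 1. One estimator averages the squared score and the other averages the negative second derivative of the log-likelihood. Both are unbiased, but their variances differ. The function names, seed, and sample size are assumptions made for this sketch.

```python
import random


def fim_from_scores(samples, mu):
    """Mean squared score; for this model the score is d/dmu log p(x) = x - mu."""
    return sum((x - mu) ** 2 for x in samples) / len(samples)


def fim_from_hessian(samples, mu):
    """Mean negative Hessian; for this model d2/dmu2 log p(x) = -1 for every x."""
    return sum(1.0 for _ in samples) / len(samples)


rng = random.Random(0)
mu = 0.5
samples = [rng.gauss(mu, 1.0) for _ in range(10_000)]

est_score = fim_from_scores(samples, mu)       # fluctuates around 1
est_hessian = fim_from_hessian(samples, mu)    # exactly 1 for this model
```

In this toy case the Hessian-based estimator has zero variance while the score-based one does not. For models where the Hessian depends on the sample, both estimators fluctuate, which is why their variances are worth comparing.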
# 多頭部ニューラルアンサンブル探索 Multi-headed Neural Ensemble Search ( http://arxiv.org/abs/2107.04369v1 ) ライセンス: Link先を確認	Ashwin Raaghav Narayanan, Arber Zela, Tonmoy Saikia, Thomas Brox, Frank Hutter	(参考訳) 異なる種(ディープ・アンサンブルとしても知られる)で訓練されたCNNモデルのアンサンブルは、CNNの単一コピーよりも優れたパフォーマンスを達成することが知られている。 Neural Ensemble Search (NES)は、アーキテクチャの多様性を追加することでパフォーマンスをさらに向上させることができる。しかし、nesの範囲は限られた計算資源で制限されている。本研究では,マルチヘッドアンサンブルにnesを拡張し,複数の予測ヘッドに共有バックボーンを付加した。 Deep Ensemblesとは異なり、これらのマルチヘッドアンサンブルはエンドツーエンドで訓練できるため、ワンショットNASメソッドを利用してアンサンブルの目的を最適化することができる。実験により,マルチヘッド型アンサンブル検索は,他のアンサンブル検索手法と比較して3倍高速に動作し,予測性能と不確かさの両面で高い性能を示した。 Ensembles of CNN models trained with different seeds (also known as Deep Ensembles) are known to achieve superior performance over a single copy of the CNN. Neural Ensemble Search (NES) can further boost performance by adding architectural diversity. However, the scope of NES remains prohibitive under limited computational resources. In this work, we extend NES to multi-headed ensembles, which consist of a shared backbone attached to multiple prediction heads. Unlike Deep Ensembles, these multi-headed ensembles can be trained end to end, which enables us to leverage one-shot NAS methods to optimize an ensemble objective. With extensive empirical evaluations, we demonstrate that multi-headed ensemble search finds robust ensembles 3 times faster, while having comparable performance to other ensemble search methods, in both predictive performance and uncertainty calibration.	翻訳日:2021-07-12 13:56:29 公開日:2021-07-09
# 拡張GANフレームワークを用いたWhite-Box Cartoonization White-Box Cartoonization Using An Extended GAN Framework ( http://arxiv.org/abs/2107.04551v1 ) ライセンス: Link先を確認	Amey Thakur, Hasan Rizvi, Mega Satish	(参考訳) 本研究では,既存のGANフレームワークを拡張し,実世界の写真やビデオから高品質なマンガ画像や映像を生成するホワイトボックス制御可能な画像の漫画化を開発するための,敵対的なプロセスによる生成モデルを推定するための新しいフレームワークを提案する。本システムの学習目的は, 表面表現, 構造表現, テクスチャ表現の3つの異なる表現に基づいている。表面表現は、画像の滑らかな表面を指す。構造表現はスパースカラーブロックと関連し、ジェネリックコンテンツを圧縮する。テクスチャ表現は、漫画画像のテクスチャ、曲線、特徴を示す。 Generative Adversarial Network (GAN)フレームワークは、画像を異なる表現に分解し、そこから学習して漫画画像を生成する。この分解により、フレームワークはより制御可能でフレキシブルになり、ユーザーは必要な出力に基づいて変更できる。このアプローチは、画像の明快さ、色、テクスチャ、形状を維持できるが、漫画のイメージの特徴は示さないという点で、過去のシステムを克服する。 In the present study, we propose to implement a new framework for estimating generative models via an adversarial process to extend an existing GAN framework and develop a white-box controllable image cartoonization, which can generate high-quality cartooned images/videos from real-world photos and videos. The learning purposes of our system are based on three distinct representations: surface representation, structure representation, and texture representation. The surface representation refers to the smooth surface of the images. The structure representation relates to the sparse colour blocks and compresses generic content. The texture representation shows the texture, curves, and features in cartoon images. Generative Adversarial Network (GAN) framework decomposes the images into different representations and learns from them to generate cartoon images. This decomposition makes the framework more controllable and flexible which allows users to make changes based on the required output. This approach overcomes any previous system in terms of maintaining clarity, colours, textures, shapes of images yet showing the characteristics of cartoon images.	翻訳日:2021-07-12 13:55:57 公開日:2021-07-09
# 複数の視覚領域のセマンティックセグメンテーション Semantic Segmentation on Multiple Visual Domains ( http://arxiv.org/abs/2107.04326v1 ) ライセンス: Link先を確認	Floris Naber	(参考訳) セマンティクスのセグメンテーションモデルは、トレーニング対象のドメインでのみうまく動作し、トレーニング用のデータセットは不足しており、必要なピクセルレベルのアノテーションはコストがかかるため、ラベルスペースが小さいことが多い。したがって、複数の既存ドメインでのトレーニングモデルは出力ラベル空間を増大させることが望まれる。現在の研究では、マルチドメイントレーニングを使用してデータセット間の精度を改善する可能性があるが、手動ラベリングなしで3つの異なる非重複ドメインのデータセットにはまだ拡張されていない。本稿では,データセットの全クラスにまたがるラベル空間を作成することで,都市景観,SUIM,SUN RGB-Dのデータセットに対して,この手法を提案する。重複クラスはマージされ、クラスを分離して離散的な粒度が解決される。その結果、ハードウェアの性能が等しければ、マルチドメインモデルの精度は全てのベースラインモデルよりも高いことが示され、リソースに制限がないため、モデルは共通でないドメインからでも追加データから恩恵を受けることが示された。 Semantic segmentation models only perform well on the domain they are trained on and datasets for training are scarce and often have a small label-spaces, because the pixel level annotations required are expensive to make. Thus training models on multiple existing domains is desired to increase the output label-space. Current research shows that there is potential to improve accuracy across datasets by using multi-domain training, but this has not yet been successfully extended to datasets of three different non-overlapping domains without manual labelling. In this paper a method for this is proposed for the datasets Cityscapes, SUIM and SUN RGB-D, by creating a label-space that spans all classes of the datasets. Duplicate classes are merged and discrepant granularity is solved by keeping classes separate. Results show that accuracy of the multi-domain model has higher accuracy than all baseline models together, if hardware performance is equalized, as resources are not limitless, showing that models benefit from additional data even from domains that have nothing in common.	翻訳日:2021-07-12 13:55:41 公開日:2021-07-09
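The construction of a label space spanning several datasets can be sketched as below. The class names are illustrative placeholders rather than the real class lists of Cityscapes, SUIM, or SUN RGB-D, and treating two classes as duplicates when their names match is an assumption made for this example.

```python
def build_unified_label_space(datasets):
    """Return the merged class list and, per dataset, a map to unified indices."""
    unified = []
    remap = {}
    for name, classes in datasets.items():
        remap[name] = []
        for cls in classes:
            if cls not in unified:      # duplicate classes are merged
                unified.append(cls)
            remap[name].append(unified.index(cls))
    return unified, remap


# Hypothetical, shortened class lists for illustration only.
datasets = {
    "Cityscapes": ["road", "person", "car"],
    "SUIM": ["fish", "person", "reef"],
    "SUN RGB-D": ["chair", "table", "person"],
}
unified, remap = build_unified_label_space(datasets)
```

Each dataset's ground-truth masks can then be re-indexed through `remap` so that a single output layer covers all classes.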
# モバイルアプリケーションのためのマルチモーダルアイコンアノテーション Multimodal Icon Annotation For Mobile Applications ( http://arxiv.org/abs/2107.04452v1 ) ライセンス: Link先を確認	Xiaoxue Zang, Ying Xu, Jindong Chen	(参考訳) 画面上の意味のあるUI要素のローカライズと分類を含むユーザインターフェース(UI)のアノテーションは、スクリーンリーダーやデバイスの音声制御といった多くのモバイルアプリケーションにとって重要なステップである。メニュー、検索、矢印といったオブジェクトアイコンを後方にアノテートすることは、画面上の明示的なラベルの欠如、画像との類似性、そしてそれらの多様な形状のため、特に困難である。既存の研究では、ビュー階層またはピクセルベースメソッドを使用してタスクに取り組む。モバイルプラットフォームのビュー階層機能は不完全あるいは不正確なことが多いため、Pixelベースのアプローチの方が一般的だが、リソースIDやコンテンツ記述などのビュー階層に命令情報を残している。本稿では,画素とビュー階層機能の両方の利点と,最先端のオブジェクト検出技術を活用する,新しいディープラーニングに基づくマルチモーダルアプローチを提案する。 ricoは72kのuiスクリーンショットからなる大規模なモバイルデザインデータセットで,29個のアイコンを手作業でアノテートすることにより,高品質のuiデータセットを作成する。実験の結果,マルチモーダルアプローチの有効性が示された。我々のモデルは、広く使われているオブジェクト分類ベースラインだけでなく、ピクセルベースのオブジェクト検出モデルよりも優れている。当社の研究は、ビュー階層とピクセル機能を組み合わせてui要素をアノテートする方法に光を当てています。 Annotating user interfaces (UIs) that involves localization and classification of meaningful UI elements on a screen is a critical step for many mobile applications such as screen readers and voice control of devices. Annotating object icons, such as menu, search, and arrow backward, is especially challenging due to the lack of explicit labels on screens, their similarity to pictures, and their diverse shapes. Existing studies either use view hierarchy or pixel based methods to tackle the task. Pixel based approaches are more popular as view hierarchy features on mobile platforms are often incomplete or inaccurate, however it leaves out instructional information in the view hierarchy such as resource-ids or content descriptions. We propose a novel deep learning based multi-modal approach that combines the benefits of both pixel and view hierarchy features as well as leverages the state-of-the-art object detection techniques. In order to demonstrate the utility provided, we create a high quality UI dataset by manually annotating the most commonly used 29 icons in Rico, a large scale mobile design dataset consisting of 72k UI screenshots. 
The experimental results indicate the effectiveness of our multi-modal approach. Our model outperforms not only a widely used object classification baseline but also pixel-based object detection models. Our study sheds light on how to combine view hierarchy with pixel features for annotating UI elements.	翻訳日:2021-07-12 13:55:24 公開日:2021-07-09
# 説明可能性の方法を選ぶには? XAIの実践的実践に向けて How to choose an Explainability Method? Towards a Methodical Implementation of XAI in Practice ( http://arxiv.org/abs/2107.04427v1 ) ライセンス: Link先を確認	Tom Vermeire and Thibault Laugel and Xavier Renard and David Martens and Marcin Detyniecki	(参考訳) 説明責任は、規制イニシアチブと公共の意識の変化によって、自動意思決定を利用する組織にとって重要な要件になりつつある。この説明可能性を提供するための様々なアルゴリズム的手法がこの分野で導入されているが、機械学習コミュニティの既存の文献は、人間とコンピュータのインタフェースコミュニティでより研究されている利害関係者にはほとんど注意を払っていない。したがって、この説明可能性を望むか、提供する必要がある組織は、ユースケースに適した方法の選択に直面します。本稿では,利害関係者のニーズと説明方法のギャップを埋めるための方法論の必要性を論じる。我々は、ステークホルダーに説明責任を提供するプロセスにおいて、データサイエンティストを支援するために、この方法論を作成するための継続的な作業を示す。特に、私たちのコントリビューションには、XAIメソッドとユーザ要求(Appendixで書かれた)を特徴付ける文書が含まれています。 Explainability is becoming an important requirement for organizations that make use of automated decision-making due to regulatory initiatives and a shift in public awareness. Various and significantly different algorithmic methods to provide this explainability have been introduced in the field, but the existing literature in the machine learning community has paid little attention to the stakeholder whose needs are rather studied in the human-computer interface community. Therefore, organizations that want or need to provide this explainability are confronted with the selection of an appropriate method for their use case. In this paper, we argue there is a need for a methodology to bridge the gap between stakeholder needs and explanation methods. We present our ongoing work on creating this methodology to help data scientists in the process of providing explainability to stakeholders. In particular, our contributions include documents used to characterize XAI methods and user requirements (shown in Appendix), which our methodology builds upon.	翻訳日:2021-07-12 13:55:05 公開日:2021-07-09
# 光干渉計のビーム発散制御と連続動作空間との整合 Aligning an optical interferometer with beam divergence control and continuous action space ( http://arxiv.org/abs/2107.04457v1 ) ライセンス: Link先を確認	Stepan Makarenko, Dmitry Sorokin, Alexander Ulanov, A. I. Lvovsky	(参考訳) 強化学習は実世界の問題アプリケーションへの道を見つけ、シミュレーションされた環境から物理的な環境へ移行している。本研究では,片腕に共焦点望遠鏡を装着し,対応するビームの直径と発散を制御する光学マッハツェンダー干渉計の視覚に基づくアライメントを実装した。指数的スケーリングによって、2桁以上の範囲のアクションを処理することができます。我々のエージェントはドメインランダム化を模擬環境でのみ訓練する。実験的評価では、エージェントは既存のソリューションと人間の専門家とを著しく上回る。 Reinforcement learning is finding its way to real-world problem application, transferring from simulated environments to physical setups. In this work, we implement vision-based alignment of an optical Mach-Zehnder interferometer with a confocal telescope in one arm, which controls the diameter and divergence of the corresponding beam. We use a continuous action space; exponential scaling enables us to handle actions within a range of over two orders of magnitude. Our agent trains only in a simulated environment with domain randomizations. In an experimental evaluation, the agent significantly outperforms an existing solution and a human expert.	翻訳日:2021-07-12 13:54:49 公開日:2021-07-09
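One way to read the exponential scaling of actions is sketched below. The mapping, the bounds `step_min` and `step_max`, and the function name are assumptions for illustration; the abstract does not give the exact form. A normalized action in [-1, 1] is mapped to a signed step whose magnitude grows exponentially, so a single continuous output covers steps spanning several orders of magnitude.

```python
def scale_action(a, step_min=1e-3, step_max=1.0):
    """Map a normalized action a in [-1, 1] to a signed step size.

    |a| = 0 gives step_min and |a| = 1 gives step_max, with exponential
    growth in between (three orders of magnitude with these defaults).
    Note that this mapping has no exact zero step.
    """
    a = max(-1.0, min(1.0, a))          # clip to the valid range
    magnitude = step_min * (step_max / step_min) ** abs(a)
    sign = 1.0 if a >= 0 else -1.0
    return sign * magnitude


steps = [scale_action(a) for a in (-1.0, -0.5, 0.0, 0.5, 1.0)]
```

Equal increments of the normalized action multiply the step size by a constant factor, which lets the same policy output make both coarse and fine adjustments.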
# 逆混合密度ネットワーク:衝突データから安全に運転する学習 Adversarial Mixture Density Networks: Learning to Drive Safely from Collision Data ( http://arxiv.org/abs/2107.04485v1 ) ライセンス: Link先を確認	Sampo Kuutti, Saber Fallah, Richard Bowden	(参考訳) 模倣学習は、予め記録されたデータに基づいて自律運転の制御方針を学ぶために広く使われている。しかしながら、模倣学習に基づくポリシーは、トレーニング分布外の状態に遭遇する際のエラーを複雑化する可能性があることが示されている。さらに、これらのエージェントは衝突を起こそうとする敵の道路利用者によって容易に利用できることが示されている。これらの欠点を克服するために、異なるデータセットから2つの分布を学習するAdversarial Mixture Density Networks (AMDN)を導入する。 1つ目は、自然主義的な人間の運転のデータセットから学んだ安全な行動の分布である。 2つ目は、衝突のデータセットから学んだ、衝突につながる可能性のある安全でない行動を表す分布である。トレーニング中、これらの2つの分布を利用して、2つの分布の類似性に基づいたさらなる損失を与える。衝突データセットのトレーニング時に、安全行動分布と非安全行動分布との類似性に基づいて安全行動分布を解析することにより、より堅牢で安全な制御ポリシーを得る。提案するamdnアプローチをユースケースに追従した車両で実証し,自然主義的および敵対的テスト環境下で評価する。その単純さにもかかわらず、amdnは純粋な模倣学習や標準混合密度ネットワークアプローチと比較して、学習した制御ポリシーの安全性に大きなメリットがあることを示している。 Imitation learning has been widely used to learn control policies for autonomous driving based on pre-recorded data. However, imitation learning based policies have been shown to be susceptible to compounding errors when encountering states outside of the training distribution. Further, these agents have been demonstrated to be easily exploitable by adversarial road users aiming to create collisions. To overcome these shortcomings, we introduce Adversarial Mixture Density Networks (AMDN), which learns two distributions from separate datasets. The first is a distribution of safe actions learned from a dataset of naturalistic human driving. The second is a distribution representing unsafe actions likely to lead to collision, learned from a dataset of collisions. During training, we leverage these two distributions to provide an additional loss based on the similarity of the two distributions. By penalising the safe action distribution based on its similarity to the unsafe action distribution when training on the collision dataset, a more robust and safe control policy is obtained. 
We demonstrate the proposed AMDN approach in a vehicle-following use case and evaluate it under naturalistic and adversarial testing environments. We show that, despite its simplicity, AMDN provides significant safety benefits for the learned control policy compared to pure imitation learning or standard mixture density network approaches.	翻訳日:2021-07-12 13:54:39 公開日:2021-07-09
# 教師-学生組立における継続的な学習 : 課題類似性の影響 Continual Learning in the Teacher-Student Setup: Impact of Task Similarity ( http://arxiv.org/abs/2107.04384v1 ) ライセンス: Link先を確認	Sebastian Lee and Sebastian Goldt and Andrew Saxe	(参考訳) 連続学習-シーケンスで多くのタスクを学習する能力は、人工知能システムにとって重要である。しかし、ディープネットワークの標準的なトレーニング方法は、新しいタスクの学習が以前のタスクの知識を消去する壊滅的な忘れに苦しむことが多い。大惨事は問題を忘れるが、タスク間の干渉の理論的理由は不明である。そこで本研究では,教師の学習環境において継続学習を学習することで,理論と実践のギャップを狭めようとする。教師-学生構成における2層ネットワークに関する過去の分析作業を複数の教師に拡張する。各教師が異なるタスクを表現するために,教師間の関係が,タスク切替時の生徒が提示する忘れや転校の量にどのように影響するかを検討する。最近の研究によると、タスクが類似した機能に依存する場合、中間タスクの類似性が最大の忘れ物となる。しかし、機能的類似性はタスクが関係する1つの方法である。教師と学生のアプローチは、読み出し(隠れる重み)と特徴(隠れる重み)のレベルでタスクの類似性を分離することを可能にします。両者の類似性、初期転送/フォーゲッティング率、最大転送/フォーゲティング、長期転送/フォーゲティングの複雑な相互作用を見出す。これらの結果は、壊滅的な忘れに寄与する様々な要因を照らすのに役立つ。 Continual learning-the ability to learn many tasks in sequence-is critical for artificial learning systems. Yet standard training methods for deep networks often suffer from catastrophic forgetting, where learning new tasks erases knowledge of earlier tasks. While catastrophic forgetting labels the problem, the theoretical reasons for interference between tasks remain unclear. Here, we attempt to narrow this gap between theory and practice by studying continual learning in the teacher-student setup. We extend previous analytical work on two-layer networks in the teacher-student setup to multiple teachers. Using each teacher to represent a different task, we investigate how the relationship between teachers affects the amount of forgetting and transfer exhibited by the student when the task switches. In line with recent work, we find that when tasks depend on similar features, intermediate task similarity leads to greatest forgetting. However, feature similarity is only one way in which tasks may be related. The teacher-student approach allows us to disentangle task similarity at the level of readouts (hidden-to-output weights) and features (input-to-hidden weights). 
We find a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting. Together, these results help illuminate the diverse factors contributing to catastrophic forgetting.	翻訳日:2021-07-12 13:53:52 公開日:2021-07-09
# 学級得点に基づく逆例検出のための学習 Learning to Detect Adversarial Examples Based on Class Scores ( http://arxiv.org/abs/2107.04435v1 ) ライセンス: Link先を確認	Tobias Uelwer, Felix Michels, Oliver De Candido	(参考訳) ディープニューラルネットワーク(DNN)に対する敵攻撃の脅威が増加する中、効率的な検出方法の研究はこれまで以上に重要である。本研究では,すでに訓練済みの分類モデルのクラススコアに基づいて,敵の攻撃検出を詳細に検討する。我々は,クラススコアでサポートベクターマシン(svm)を訓練し,逆例を検出することを提案する。本手法は,様々な攻撃によって発生する逆例を検出でき,多くの深層分類モデルに容易に適用できる。提案手法は,実装が容易でありながら,既存の手法と比較して検出率の向上を図っている。異なる深層分類モデルに対する広範な実証分析を行い、様々な最先端の敵攻撃について検討する。さらに,本手法は敵の攻撃の組み合わせを検出するのに優れていることを確かめた。本研究は, 訓練済みの分類モデルのクラススコアを用いて, 様々な敵攻撃を検出する可能性を示唆する。 Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks, and can be easily adopted to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of adversarial attacks. This work indicates the potential of detecting various adversarial attacks simply by using the class scores of an already trained classification model.	翻訳日:2021-07-12 13:52:57 公開日:2021-07-09
# ViTGAN:視覚変換器を用いたGANの訓練 ViTGAN: Training GANs with Vision Transformers ( http://arxiv.org/abs/2107.04589v1 ) ライセンス: Link先を確認	Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu	(参考訳) 近年、視覚変換器(ViT)は、視覚固有の誘導バイアスを少なくしながら、画像認識に競争力を発揮している。本稿では,このような観察を画像生成に拡張できるかどうかを検討する。この目的のために、我々はViTアーキテクチャをGAN(Generative Adversarial Network)に統合する。我々は,GANの既存の正規化手法が自己注意機構との相性に乏しく,訓練中に深刻な不安定性を引き起こすことを観察する。この問題を解決するために,我々は,新しい正規化手法を導入し,GANをViTでトレーニングする。 CIFAR-10、CelebA、LSUNの寝室データセット上で、我々のアプローチであるViTGANは最先端のCNNベースのStyleGAN2に匹敵する性能を実現している。 Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring fewer vision-specific inductive biases. In this paper, we investigate whether this observation can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). We observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce novel regularization techniques for training GANs with ViTs. Empirically, our approach, named ViTGAN, achieves comparable performance to state-of-the-art CNN-based StyleGAN2 on CIFAR-10, CelebA, and LSUN bedroom datasets.	翻訳日:2021-07-12 13:52:44 公開日:2021-07-09
# ARC: 自動運転車のための敵対的ロバスト制御ポリシー ARC: Adversarially Robust Control Policies for Autonomous Vehicles ( http://arxiv.org/abs/2107.04487v1 ) ライセンス: Link先を確認	Sampo Kuutti, Saber Fallah, Richard Bowden	(参考訳) ディープニューラルネットワークは、さまざまなタスクの制御ポリシを学習する能力を示している。しかしながら、これらのニューラルネットワークベースのポリシーは、敵エージェントによる搾取に影響を受けやすいことが示されている。したがって、敵に対して堅牢な制御ポリシーを学ぶための技術を開発する必要がある。本稿では, 敵対的ロバスト制御(ARC)を導入し, 主人公ポリシーと敵対者ポリシーを同じ損失でエンドツーエンドに訓練する。主人公の目的は、敵が最小化しようとしている間、この損失を最大化することである。提案したARCトレーニングを高速道路走行シナリオで実演し、敵が先頭車両を制御している間に主人公が追従車両を制御する。敵のアンサンブルに対して主人公を訓練することにより、様々な敵の戦略に一般化する、はるかに堅牢な制御ポリシーを学ぶ。このアプローチは、当初の方針と比較して、新しい敵に対する衝突の回数を最大90.25%減少させることが示されている。また, 補助蒸留損失を利用することにより, 微調整制御方針は, 元のトレーニング分布をまたいだ性能低下を示さないことを示した。 Deep neural networks have demonstrated their capability to learn control policies for a variety of tasks. However, these neural network-based policies have been shown to be susceptible to exploitation by adversarial agents. Therefore, there is a need to develop techniques to learn control policies that are robust against adversaries. We introduce Adversarially Robust Control (ARC), which trains the protagonist policy and the adversarial policy end-to-end on the same loss. The aim of the protagonist is to maximise this loss, whilst the adversary is attempting to minimise it. We demonstrate the proposed ARC training in a highway driving scenario, where the protagonist controls the follower vehicle whilst the adversary controls the lead vehicle. By training the protagonist against an ensemble of adversaries, it learns a significantly more robust control policy, which generalises to a variety of adversarial strategies. The approach is shown to reduce the number of collisions against new adversaries by up to 90.25%, compared to the original policy. Moreover, by utilising an auxiliary distillation loss, we show that the fine-tuned control policy shows no drop in performance across its original training distribution.	翻訳日:2021-07-12 13:52:19 公開日:2021-07-09
# EasyCom:ノイズの多い環境で簡単にコミュニケーションできるアルゴリズムをサポートする拡張現実データセット EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments ( http://arxiv.org/abs/2107.04174v1 ) ライセンス: Link先を確認	Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra	(参考訳) プラットフォームとしての拡張現実(AR)は、カクテルパーティー効果の低減を促進する可能性がある。将来のarヘッドセットは、さまざまな種類のセンサーからの情報を活用する可能性がある。ビームフォーミングや音声強調などのタスクにおける信号処理と機械学習アルゴリズムの訓練と試験には、高品質な代表データが必要である。著者の知る限り、出版時点では、ノイズの多い環境での動的動きと会話を伴う、エゴセントリックなマルチチャンネルオーディオとビデオの同期を含む利用可能なデータセットは存在しない。本研究では,ARメガネ装着者の会話改善のためのアルゴリズムのトレーニングやテストに有用な5時間以上のマルチモーダルデータを含むデータセットを記述,評価,リリースする。ベースライン法に対して,音声の可聴性,品質,信号対雑音比の改善結果を提供し,全試験指標で改善を示す。私たちがリリースするデータセットには、ARグラスのエゴセントリックなマルチチャネルマイクロフォンアレイオーディオ、広視野RGBビデオ、音声ソースポーズ、ヘッドセットマイクロフォンオーディオ、注釈付き音声アクティビティ、音声書き起こし、ヘッドバウンディングボックス、スピーチのターゲット、ソース識別ラベルが含まれています。我々は、カクテルパーティー問題に対するマルチモーダルARソリューションの研究を促進するために、このデータセットを作成し、リリースしています。 Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high quality representative data. To the best of the author's knowledge, as of publication there are no available datasets that contain synchronized egocentric multi-channel audio and video with dynamic movement and conversations in a noisy environment. In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We provide speech intelligibility, quality and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics. 
The dataset we are releasing contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head bounding boxes, target of speech and source identification labels. We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.	翻訳日:2021-07-12 13:51:58 公開日:2021-07-09
# 質問応答システムにおける回答検証のための共同モデル Joint Models for Answer Verification in Question Answering Systems ( http://arxiv.org/abs/2107.04217v1 ) ライセンス: Link先を確認	Zeyu Zhang, Thuy Vu, and Alessandro Moschitti	(参考訳) 本稿では,検索に基づく質問回答システム(QA)のコアコンポーネントである,回答文選択(AS2)モジュールによって提供される上位$k$の中から,正しい回答文を選択するためのジョイントモデルについて検討する。本研究は,回答集合を効果的に活用するための重要なステップが,回答の対の間の相互関連情報をモデル化することであることを示す。この目的のために三方向多重分類器を構築し,解答が他の解答を支持するか,反証するか,あるいは中立かを決定する。より具体的には、私たちのニューラルネットワークアーキテクチャは、最先端のAS2モデルとマルチクラス化器、およびすべてのコンポーネントを接続するジョイント層を統合しています。私たちは、WikiQA、TREC-QA、実世界のデータセットでモデルをテストしました。その結果,本モデルはAS2において新たな最高性能を達成した。 This paper studies joint models for selecting correct answer sentences among the top $k$ provided by answer sentence selection (AS2) modules, which are core components of retrieval-based Question Answering (QA) systems. Our work shows that a critical step in effectively exploiting an answer set is modeling the interrelated information between pairs of answers. For this purpose, we build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one. More specifically, our neural architecture integrates a state-of-the-art AS2 model with the multi-classifier, and a joint layer connecting all components. We tested our models on WikiQA, TREC-QA, and a real-world dataset. The results show that our models obtain the new state of the art in AS2.	翻訳日:2021-07-12 13:51:36 公開日:2021-07-09
# 自動可読性評価のための相関グラフを用いたシンタクティックセンス埋め込みの学習 Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment ( http://arxiv.org/abs/2107.04268v1 ) ライセンス: Link先を確認	Xinying Qiu, Yuan Chen, Hanwu Chen, Jian-Yun Nie, Yuming Shen, Dawei Lu	(参考訳) 自動可読性評価のためのディープラーニングモデルは、一般的に、タスクの機械学習モデルで伝統的に使用される言語的特徴を捨てる。本稿では,言語的特徴に基づく構文的密埋め込みを学習することにより,言語的特徴をニューラルネットワークモデルに組み込むことを提案する。特徴間の関係に対処するため,特徴間の相関グラフを作成し,類似した特徴が類似の埋め込みによって表現されるように,それらの埋め込みを学習する。提案手法は, BERTのみのモデルを補完し, 自動可読性評価のための性能を著しく向上させることができることを示す。 Deep learning models for automatic readability assessment generally discard linguistic features traditionally used in machine learning models for the task. We propose to incorporate linguistic features into neural network models by learning syntactic dense embeddings based on linguistic features. To cope with the relationships between the features, we form a correlation graph among features and use it to learn their embeddings so that similar features will be represented by similar embeddings. Experiments with six data sets of two proficiency levels demonstrate that our proposed methodology can complement BERT-only model to achieve significantly better performances for automatic readability assessment.	翻訳日:2021-07-12 13:51:20 公開日:2021-07-09
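The correlation graph among linguistic features described in the abstract above can be sketched as follows. The abstract does not say which correlation measure or cutoff is used, so Pearson correlation, the `threshold` value, and the function names are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant feature carries no correlation signal
    return cov / (norm_x * norm_y)

def correlation_graph(features, threshold=0.7):
    """Return edges {(u, v): r} for feature pairs with |r| >= threshold.

    `features` maps a feature name to its values over the documents.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    names = sorted(features)
    edges = {}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(features[u], features[v])
            if abs(r) >= threshold:
                edges[(u, v)] = r
    return edges
```

A graph of this kind could then be passed to a graph-based embedding learner so that strongly correlated features end up with similar embeddings, which is the property the abstract asks for.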
# 持ち上げ動作モデルの安全学習 Safe Learning of Lifted Action Models ( http://arxiv.org/abs/2107.04169v1 ) ライセンス: Link先を確認	Brendan Juba, Hai S. Le, Roni Stern	(参考訳) ドメインモデルの作成は、古典的でドメインに依存しない計画であっても、非常に難しい知識エンジニアリングタスクです。この問題を解決する自然なアプローチは、観察からドメインモデルを学ぶことである。しかし、モデル学習アプローチは、しばしば安全保証を提供しない: 学習モデルは、アクションが適用されないときに、アクションが適用可能であると仮定し、アクションの効果を誤ってキャプチャする可能性がある。これは実行時に失敗する計画を生成する可能性がある。一部のドメインでは、失敗のコストや失敗後のオンライン再計画のできないため、このような失敗は許されない。このような環境では、他のエージェントや人間によって収集された観察に基づいて、すべての学習をオフラインで行う必要がある。この学習を通じて、そのタスクは成功を保証された計画を生成することです。これをモデルフリー計画問題と呼ぶ。先行研究は、古典計画におけるモデルフリー計画問題の解法を提案した。しかし、藩の学習に限定されていたため、規模は拡大できなかった。我々は、この先行研究を一般化し、リフトドドメインに対する最初の安全なモデルフリープランニングアルゴリズムを提案する。我々は,このアプローチの正確性を証明し,確率の高い将来の問題を解くのに必要な軌道数が,ドメインモデルのポテンシャルサイズにおいて線形であることを示す統計解析を提供する。また,12のICCドメインに対して,少なくとも2つの軌道で実動作モデルを学習可能であることを示す実験を行った。 Creating a domain model, even for classical, domain-independent planning, is a notoriously hard knowledge-engineering task. A natural approach to solve this problem is to learn a domain model from observations. However, model learning approaches frequently do not provide safety guarantees: the learned model may assume actions are applicable when they are not, and may incorrectly capture actions' effects. This may result in generating plans that will fail when executed. In some domains such failures are not acceptable, due to the cost of failure or inability to replan online after failure. In such settings, all learning must be done offline, based on some observations collected, e.g., by some other agents or a human. Through this learning, the task is to generate a plan that is guaranteed to be successful. This is called the model-free planning problem. Prior work proposed an algorithm for solving the model-free planning problem in classical planning. However, they were limited to learning grounded domains, and thus they could not scale. We generalize this prior work and propose the first safe model-free planning algorithm for lifted domains. 
We prove the correctness of our approach, and provide a statistical analysis showing that the number of trajectories needed to solve future problems with high probability is linear in the potential size of the domain model. We also present experiments on twelve IPC domains showing that our approach is able to learn the real action model in all cases with at most two trajectories.	翻訳日:2021-07-12 13:51:10 公開日:2021-07-09
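The idea of learning an action model that is safe to plan with can be illustrated by a minimal sketch. It is a simplified, grounded version (the paper itself targets lifted domains), and the function name `learn_safe_model` and the data layout are assumptions for illustration rather than the authors' algorithm. The conservative rule used here: every fact that held in all observed pre-states is kept as a precondition, and effects are the observed differences between pre-state and post-state.

```python
def learn_safe_model(transitions):
    """Learn a conservative action model from observed transitions.

    `transitions` is an iterable of (pre_state, action, post_state),
    where states are sets of facts (strings) and action is a name.
    """
    model = {}
    for pre, action, post in transitions:
        pre, post = frozenset(pre), frozenset(post)
        entry = model.get(action)
        if entry is None:
            model[action] = {
                "pre": set(pre),          # start with everything observed
                "add": set(post - pre),   # facts that became true
                "del": set(pre - post),   # facts that became false
            }
        else:
            entry["pre"] &= pre           # keep only facts seen every time
            entry["add"] |= post - pre
            entry["del"] |= pre - post
    return model
```

Under this rule the learned preconditions can be stricter than the real ones, so the planner may miss some valid plans, but a plan it does produce never applies an action in a state less constrained than those already observed.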
# オープンワールドの新規性の存在下における計画・実行・監視の統合:オープンワールドモノポリーソルバーを事例として Integrating Planning, Execution and Monitoring in the presence of Open World Novelties: Case Study of an Open World Monopoly Solver ( http://arxiv.org/abs/2107.04303v1 ) ライセンス: Link先を確認	Sriram Gopalakrishnan, Utkarsh Soni, Tung Thai, Panagiotis Lymperopoulos, Matthias Scheutz, Subbarao Kambhampati	(参考訳) モノポリーのゲームは、最後まで支払能力を保つプレイヤーとなること以外の固定的な目標がなく、プロパティの集合の独占やそれらの発展といった有用なサブゴールが存在する、敵対的マルチエージェントドメインである。サイコロの出目、カードの引き、対戦相手の戦略からも多くのランダム性がある。この予測不可能性は、ゲームプレイ中に未知のノベルティを追加すると悪化する。これらの課題を考えると、モノポリーはDARPA-SAILONプログラムで選ばれたテストベッドの1つであり、このプログラムは新規性を検出して適応できるエージェントを作ることを目的としている。ゲームの複雑さに対処するため,我々は,完全な計画を避け,ゲームの進行に合わせて方針をオンラインで適応させるエージェントを開発した。 SAILONプログラムの最近の独立評価では,ほとんどの指標において,我々のエージェントが最も優れたエージェントであった。ここでは、我々のアプローチと結果を示す。 The game of Monopoly is an adversarial multi-agent domain where there is no fixed goal other than to be the last player solvent. There are useful subgoals, like monopolizing sets of properties and developing them. There is also a lot of randomness from dice rolls, card draws, and adversaries' strategies. This unpredictability is made worse when unknown novelties are added during gameplay. Given these challenges, Monopoly was one of the test beds chosen for the DARPA-SAILON program, which aims to create agents that can detect and accommodate novelties. To handle the game complexities, we developed an agent that eschews complete plans and adapts its policy online as the game evolves. In the most recent independent evaluation in the SAILON program, our agent was the best performing agent on most measures. We herein present our approach and results.	翻訳日:2021-07-12 13:50:51 公開日:2021-07-09
# 鉄道トポロジーオントロジー:鉄道インフラ基盤オントロジー Rail Topology Ontology: A Rail Infrastructure Base Ontology ( http://arxiv.org/abs/2107.04378v1 ) ライセンス: Link先を確認	Stefan Bischof, Gottfried Schenner	(参考訳) 鉄道インフラのエンジニアリングプロジェクトは通常、計画され構築されたインフラとその基盤となるトポロジを一貫したビューを必要とする多くのサブシステムを含む。一貫性はXMLベースのデータフォーマットとUMLベースのオブジェクト指向モデルを使ってツール間でデータを交換し、検証することで保証される。共通のトポロジーモデルによるこれらのデータ表現のより緊密なアラインメントは、鉄道インフラエンジニアリングツールの開発労力を減少させる可能性がある。一般的な意味モデルは、鉄道知識グラフの導入の成功の前提条件でもある。レールトポモデル標準に基づき、鉄道インフラのコア特性を標準に準拠した形で表現するためのモデルとしてレールトポロジオントロジーを開発した。本稿では, オントロジーとその開発手法について述べるとともに, 鉄道工学系などのデータの統合性について, 知識グラフで考察する。レールトポロジーオントロジーにより、ソフトウェアエンジニアと知識科学者は、切断されたデータソースを統合するための鉄道トポロジーを表す標準オントロジーを持っている。私たちはレールトポロジオントロジーをレールナレッジグラフとして使用し、既存のデータ交換標準から派生したレールインフラストラクチャオントロジーによって拡張することを計画しています。 Engineering projects for railway infrastructure typically involve many subsystems which need consistent views of the planned and built infrastructure and its underlying topology. Consistency is typically ensured by exchanging and verifying data between tools using XML-based data formats and UML-based object-oriented models. A tighter alignment of these data representations via a common topology model could decrease the development effort of railway infrastructure engineering tools. A common semantic model is also a prerequisite for the successful adoption of railway knowledge graphs. Based on the RailTopoModel standard, we developed the Rail Topology Ontology as a model to represent core features of railway infrastructures in a standard-compliant manner. This paper describes the ontology and its development method, and discusses its suitability for integrating data of railway engineering systems and other sources in a knowledge graph. With the Rail Topology Ontology, software engineers and knowledge scientists have a standard-based ontology for representing railway topologies to integrate disconnected data sources. 
We use the Rail Topology Ontology for our rail knowledge graph and plan to extend it by rail infrastructure ontologies derived from existing data exchange standards, since many such standards use the same base model as the presented ontology, viz., RailTopoModel.	翻訳日:2021-07-12 13:50:35 公開日:2021-07-09
# Unity Perception:コンピュータビジョンのための合成データ生成 Unity Perception: Generate Synthetic Data for Computer Vision ( http://arxiv.org/abs/2107.04259v1 ) ライセンス: Link先を確認	Steve Borkman, Adam Crespi, Saurav Dhakad, Sujoy Ganguly, Jonathan Hogins, You-Cyuan Jhang, Mohsen Kamalzadeh, Bowen Li, Steven Leal, Pete Parisi, Cesar Romero, Wesley Smith, Alex Thaman, Samuel Warren, Nupur Yadav	(参考訳) 本稿では,コンピュータビジョンタスクのための合成データセット生成プロセスを簡素化し,高速化することを目的としたunity perception packageを紹介する。このオープンソースのパッケージはunityエディタとエンジンコンポーネントを拡張し、いくつかの一般的なコンピュータビジョンタスクの注釈付き例を生成する。さらに、ユーザが生成したデータセットにバリエーションを導入するために、ランダム化されたシミュレーションパラメータを迅速に構築、設定できる拡張可能なランダム化フレームワークも提供する。提案するツールの概要と動作方法,および2次元オブジェクト検出モデルをトレーニングすることにより生成した合成データセットの価値を実証する。主に合成データでトレーニングされたモデルは、実際のデータのみを使用してトレーニングされたモデルよりも優れている。 We introduce the Unity Perception package which aims to simplify and accelerate the process of generating synthetic datasets for computer vision tasks by offering an easy-to-use and highly customizable toolset. This open-source package extends the Unity Editor and engine components to generate perfectly annotated examples for several common computer vision tasks. Additionally, it offers an extensible Randomization framework that lets the user quickly construct and configure randomized simulation parameters in order to introduce variation into the generated datasets. We provide an overview of the provided tools and how they work, and demonstrate the value of the generated synthetic datasets by training a 2D object detection model. The model trained with mostly synthetic data outperforms the model trained using only real data.	翻訳日:2021-07-12 13:49:50 公開日:2021-07-09
# Wavelet Transform-assisted Adaptive Generative Modeling for Colorization Wavelet Transform-assisted Adaptive Generative Modeling for Colorization ( http://arxiv.org/abs/2107.04261v1 ) ライセンス: Link先を確認	Jin Li, Wanyun Li, Zichen Xu, Yuhao Wang, Qiegen Liu	(参考訳) 教師なしのディープラーニングは、最近高品質なサンプルを生成するという約束を実証した。画像の着色タスクを促進する可能性は非常に高いが、機械学習における多様体仮説により性能は限られている。本研究では,ウェーブレット領域におけるスコアベース生成モデルを利用した新しい手法を提案する。ウェーブレット変換によるマルチスケール・マルチチャネル表現を利用して,重畳されたウェーブレット係数成分から先行成分を学習し,粗い周波数スペクトルと詳細周波数スペクトルを併用して画像特性を学習する。さらに、逆最適化のない高フレキシブルな生成モデルは、ウェーブレット領域における二重整合項、すなわちデータ一貫性と構造整合性の下で、より優れた色付けタスクを実行することができる。具体的には、トレーニングフェーズにおいて、ウェーブレット係数からなるマルチチャネルテンソルのセットを入力として、デノイジングスコアマッチングによってネットワークをトレーニングする。テストフェーズでは、サンプルはデータ整合性と構造整合性を伴うアニールランジュバンダイナミクスを介して反復的に生成される。実験により, 提案モデルが着色品質, 特に着色性, 多様性に顕著な改善が認められた。 Unsupervised deep learning has recently demonstrated the promise to produce high-quality samples. While it has tremendous potential to promote the image colorization task, the performance is limited owing to the manifold hypothesis in machine learning. This study presents a novel scheme that exploits a score-based generative model in the wavelet domain to address the issue. By taking advantage of the multi-scale and multi-channel representation via wavelet transform, the proposed model learns the priors from stacked wavelet coefficient components, and thus learns the image characteristics under coarse and detail frequency spectrums jointly and effectively. Moreover, such a highly flexible generative model without adversarial optimization can execute colorization tasks better under dual consistency terms in the wavelet domain, namely data-consistency and structure-consistency. Specifically, in the training phase, a set of multi-channel tensors consisting of wavelet coefficients are used as the input to train the network by denoising score matching. In the test phase, samples are iteratively generated via annealed Langevin dynamics with data and structure consistencies. 
Experiments demonstrated remarkable improvements of the proposed model on colorization quality, particularly on colorization robustness and diversity.	翻訳日:2021-07-12 13:49:38 公開日:2021-07-09
# 一般医用画像セグメンテーションの堅牢化に向けて Towards Robust General Medical Image Segmentation ( http://arxiv.org/abs/2107.04263v1 ) ライセンス: Link先を確認	Laura Daza, Juan C. Pérez, Pablo Arbeláez	(参考訳) 深層学習システムの信頼性は,その精度に依存するだけでなく,入力データに対する敵対的摂動に対する頑健性にも依存する。自然画像領域における敵対的ノイズの存在下でのディープニューラルネットワークの性能向上のために,いくつかの攻撃と防御が提案されている。しかしながら、ボリュームデータに対するコンピュータ支援診断のロバスト性は、特定のタスクと限られた攻撃でのみ研究されている。一般医用画像分割システムの堅牢性を評価するための新しい枠組みを提案する。我々の貢献は2つある。(i)最近のAutoAttack自然画像分類フレームワークをボリュームデータセグメンテーションの領域に拡張することにより,Medical Segmentation Decathlon(MSD)の文脈における堅牢性を評価するための新しいベンチマークを提案し,(ii)RObust Generic Medical Image segmentation(ROG)のための新しい格子アーキテクチャを提案する。以上の結果から,ROGはMSDの様々なタスクにまたがる一般化が可能であり,高度な敵攻撃下での最先端技術を上回ることが示唆された。 The reliability of Deep Learning systems depends not only on their accuracy but also on their robustness against adversarial perturbations to the input data. Several attacks and defenses have been proposed to improve the performance of Deep Neural Networks under the presence of adversarial noise in the natural image domain. However, robustness in computer-aided diagnosis for volumetric data has only been explored for specific tasks and with limited attacks. We propose a new framework to assess the robustness of general medical image segmentation systems. Our contributions are two-fold: (i) we propose a new benchmark to evaluate robustness in the context of the Medical Segmentation Decathlon (MSD) by extending the recent AutoAttack natural image classification framework to the domain of volumetric data segmentation, and (ii) we present a novel lattice architecture for RObust Generic medical image segmentation (ROG). Our results show that ROG is capable of generalizing across different tasks of the MSD and largely surpasses the state-of-the-art under sophisticated adversarial attacks.	翻訳日:2021-07-12 13:49:19 公開日:2021-07-09
# 先導型マルチビュー3次元頭部再構成 Prior-Guided Multi-View 3D Head Reconstruction ( http://arxiv.org/abs/2107.04277v1 ) ライセンス: Link先を確認	Xueying Wang, Yudong Guo, Zhongqi Yang and Juyong Zhang	(参考訳) 顔と髪の領域を含む3Dヘッドモデルの復元は、コンピュータビジョンとグラフィックスにおいて依然として難しい問題である。本稿では,複数視点のポートレート画像を入力として,この問題を考える。従来のマルチビューステレオ法は、最適化戦略またはディープラーニング技術に基づいており、不明瞭な頭部構造や毛髪領域における不正確な再構成といった低周波の幾何学的構造に苦しむ。この問題に対処するために,先導型暗黙的ニューラルネットワークを提案する。具体的には,頭部形状を学習可能な符号付き距離場(SDF)でモデル化し,顔の事前知識,頭部意味的セグメンテーション情報,2Dヘアオリエンテーションマップなどを含む,暗黙の微分可能レンダラーを用いて最適化する。これらの先行技術を利用することで、復元精度とロバスト性が向上し、高品質な3Dヘッドモデルが実現される。広範なアブレーション研究と最新手法との比較により,本手法が先行手法の指導により高精度な3次元頭部形状を生成できることが証明された。 Recovering a 3D head model including the complete face and hair regions is still a challenging problem in computer vision and graphics. In this paper, we consider this problem with a few multi-view portrait images as input. Previous multi-view stereo methods, either based on the optimization strategies or deep learning techniques, suffer from low-frequency geometric structures such as unclear head structures and inaccurate reconstruction in hair regions. To tackle this problem, we propose a prior-guided implicit neural rendering network. Specifically, we model the head geometry with a learnable signed distance field (SDF) and optimize it via an implicit differentiable renderer with the guidance of some human head priors, including the facial prior knowledge, head semantic segmentation information and 2D hair orientation maps. The utilization of these priors can improve the reconstruction accuracy and robustness, leading to a high-quality integrated 3D head model. Extensive ablation studies and comparisons with state-of-the-art methods demonstrate that our method could produce high-fidelity 3D head geometries with the guidance of these priors.	翻訳日:2021-07-12 13:49:00 公開日:2021-07-09
# ビデオオブジェクトセグメンテーションのための高速画素マッチング Fast Pixel-Matching for Video Object Segmentation ( http://arxiv.org/abs/2107.04279v1 ) ライセンス: Link先を確認	Siyue Yu, Jimin Xiao, BingFeng Zhang, Eng Gee Lim	(参考訳) 第1フレームのアノテーションによる前景オブジェクトのセグメント化を目的としたビデオオブジェクトセグメンテーションが注目されている。多くの最先端のアプローチは、オンラインモデル更新やマスクプロパゲーション技術に頼ることで、優れたパフォーマンスを実現している。しかし、ほとんどのオンラインモデルは推論中のモデル微調整のために高い計算コストを必要とする。ほとんどのマスクプロパゲーションベースのモデルは高速だが、オブジェクトの外観の変化に適応できないため比較的性能が低い。本稿では,速度と性能のバランスを良くするために,新しいモデルを設計することを目的としている。マスクプロパゲーションと非局所的手法に基づいて、参照フレームとターゲットフレームの画素をマッチングすることにより、前景オブジェクトを直接ローカライズするNPMCA-netモデルを提案する。最初のフレームと前のフレームの両方の情報をもたらすので、我々のネットワークは大きなオブジェクトの外観変化に対して堅牢であり、オクルージョンに適応できる。実験の結果,本手法は新たな最先端性能と高速性(DAVIS-2016では86.5% IoU,DAVIS-2017では72.2% IoU,フレーム当たり0.11秒の速度)を同時に達成できることがわかった。ソースコードは https://github.com/siyueyu/NPMCA-net で入手できる。 Video object segmentation, aiming to segment the foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches have achieved great performance by relying on online model updating or mask-propagation techniques. However, most online models require high computational cost due to model fine-tuning during inference. Most mask-propagation based models are faster but with relatively low performance due to failure to adapt to object appearance variation. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask-propagation and non-local technique by matching pixels in reference and target frames. Since we bring in information of both first and previous frames, our network is robust to large object appearance variation, and can better adapt to occlusions. 
Extensive experiments show that our approach achieves new state-of-the-art performance and fast inference at the same time (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame) under same-level comparison. Source code is available at https://github.com/siyueyu/NPMCA-net.	翻訳日:2021-07-12 13:48:40 公開日:2021-07-09
# JPGNet:イメージインペイントのための共同予測フィルタと生成ネットワーク JPGNet: Joint Predictive Filtering and Generative Network for Image Inpainting ( http://arxiv.org/abs/2107.04281v1 ) ライセンス: Link先を確認	Xiaoguang Li and Qing Guo and Felix Juefei-Xu and Hongkai Yu and Yang Liu and Song wang	(参考訳) 画像インペインティングは、画像の自然性を強調する共通生成タスクとは異なる、欠落した領域を復元し、元の完全画像と同一の回復結果を得ることを目的としている。それにもかかわらず、既存の作品では、通常は純粋な生成問題と見なされ、それに対処するために最先端の生成技術を使用している。生成ネットワークは、主要な欠落した部分を現実的な内容で埋めるが、通常は局所構造を歪ませる。本稿では,画像インペインティングを,予測フィルタリングと深層生成という2つの問題の混合として定式化する。予測フィルタリングは、ローカルな構造の保存とアーティファクトの除去に優れているが、大きな欠落した領域を完遂するには不足している。ディープジェネレーティブネットワークは、シーン全体の理解に基づいて多数の欠落画素を満たすことができるが、元のピクセルと同じ詳細を復元することはほとんどない。それぞれの利点を利用するために,予測フィルタリング・不確実性ネットワーク(PFUNet),深層生成ネットワーク,不確実性認識融合ネットワーク(UAFNet)の3分野を含む共同予測フィルタリング・生成ネットワーク(JPGNet)を提案する。 PFUNetは、入力画像に応じてフィルタリングベースの塗布のための画素単位のカーネルを適応的に予測し、不確実性マップを出力する。このマップは、各ピクセルをフィルタリングと生成ネットワークのどちらで処理すべきかを示し、フィルタリングと生成結果の間のスマートな組み合わせのためにさらにUAFNetに供給される。画像インペイント問題に対する新しいフレームワークとしての本手法は,既存の生成ベース手法の恩恵となり得る。我々は,Dunhuang,Places2,CelebAの3つの公開データセットに対して本手法の有効性を検証し,この手法がわずかな追加時間コストで3つの最先端生成手法(StructFlow,EdgeConnect,RFRNet)を大幅に改善できることを示す。 Image inpainting aims to restore the missing regions and make the recovery results identical to the originally complete image, which is different from the common generative task emphasizing the naturalness of generated images. Nevertheless, existing works usually regard it as a pure generation problem and employ cutting-edge generative techniques to address it. The generative networks fill the main missing parts with realistic contents but usually distort the local structures. In this paper, we formulate image inpainting as a mix of two problems, i.e., predictive filtering and deep generation. Predictive filtering is good at preserving local structures and removing artifacts but falls short in completing large missing regions. 
The deep generative network can fill the numerous missing pixels based on the understanding of the whole scene but hardly restores the details identical to the original ones. To make use of their respective advantages, we propose the joint predictive filtering and generative network (JPGNet) that contains three branches: predictive filtering & uncertainty network (PFUNet), deep generative network, and uncertainty-aware fusion network (UAFNet). The PFUNet can adaptively predict pixel-wise kernels for filtering-based inpainting according to the input image and output an uncertainty map. This map indicates whether each pixel should be processed by the filtering or the generative network, and it is further fed to the UAFNet for a smart combination of the filtering and generative results. Note that our method, as a novel framework for the image inpainting problem, can benefit any existing generation-based method. We validate our method on three public datasets, i.e., Dunhuang, Places2, and CelebA, and demonstrate that our method can enhance three state-of-the-art generative methods (i.e., StructFlow, EdgeConnect, and RFRNet) significantly at only a slight extra time cost.	翻訳日:2021-07-12 13:48:14 公開日:2021-07-09
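The role of the uncertainty map in combining the filtering result with the generative result can be illustrated by a minimal sketch. In the paper the fusion is performed by a learned network (UAFNet); the fixed per-pixel convex combination below, the function name `fuse`, and the convention that a high uncertainty value favors the generative output are all assumptions made for illustration.

```python
def fuse(filtered, generated, uncertainty):
    """Blend two same-sized 2D images (lists of rows) pixel by pixel.

    Where the (assumed) uncertainty of the filtering branch is near 0,
    the filtered pixel is kept; where it is near 1, the generated pixel
    is used instead.
    """
    fused = []
    for f_row, g_row, u_row in zip(filtered, generated, uncertainty):
        row = []
        for f, g, u in zip(f_row, g_row, u_row):
            u = max(0.0, min(1.0, u))  # clamp the weight to [0, 1]
            row.append((1.0 - u) * f + u * g)
        fused.append(row)
    return fused
```

A learned fusion network can go beyond this fixed rule, for example by using neighboring pixels when deciding how to mix the two results.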
# 野生のミーム: 有害ミームチャレンジデータセットの一般化可能性を評価する Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset ( http://arxiv.org/abs/2107.04313v1 ) ライセンス: Link先を確認	Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano	(参考訳) ヘイトフルミームは、メッセージがテキストとビジュアルの両方から派生しているため、現在の機械学習システムにとってユニークな課題となる。この効果のためにFacebookは、事前抽出されたテキストキャプションを備えたミームのデータセットであるHateful Memes Challengeをリリースしたが、これらの合成例が「野生のミーム」に一般化されるかどうかは不明である。本稿では、facebookデータセットで事前トレーニングされたモデル上でのサンプル外のパフォーマンスを評価するため、pinterestから嫌悪感と不快感のないミームを収集する。 1)キャプションをocrで抽出し、ノイズを注入し、マルチモーダルモデルの性能を低下させ、2)ミームは会話のスクリーンショットやプレーンな背景のテキストを含む「伝統的なミーム」よりも多様である。そこで本論文は,現在行われているヘイトフルミーム検出のベンチマークと,その実世界ヘイト検出への適用性について検討する。 Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text- and visual-modalities. To this effect, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to `memes in the wild'. In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset. We find that memes in the wild differ in two key aspects: 1) Captions must be extracted via OCR, injecting noise and diminishing performance of multimodal models, and 2) Memes are more diverse than `traditional memes', including screenshots of conversations or text on a plain background. This paper thus serves as a reality check for the current benchmark of hateful meme detection and its applicability for detecting real world hate.	翻訳日:2021-07-12 13:47:38 公開日:2021-07-09
# 共同適応注意とグラフ関係を用いた行動単位検出 Action Unit Detection with Joint Adaptive Attention and Graph Relation ( http://arxiv.org/abs/2107.04389v1 ) ライセンス: Link先を確認	Chenggong Zhang and Juan Song and Qingyang Zhang and Weilong Dong and Ruomeng Ding and Zhilei Liu	(参考訳) 本稿では,顔行動単位(AU)検出へのアプローチについて述べる。本研究では,ABAW(Field Affective Behavior Analysis)2021コンペティションに応募する。提案手法は,事前学習したJAAモデルを特徴抽出器として使用し,マルチスケール特徴に基づいてグローバル特徴,顔アライメント特徴,AU局所特徴を抽出する。我々は、AUの局所的な特徴をグラフ畳み込みの入力として、AU間の相関をさらに考慮し、最終的に融合した特徴を用いてAUを分類する。検出性能は0.5×精度+0.5×F1で評価した。我々のモデルはAff-Wild2データベース上で0.674を達成した。 This paper describes an approach to facial action unit (AU) detection. In this work, we present our submission to the Field Affective Behavior Analysis (ABAW) 2021 competition. The proposed method uses the pre-trained JAA model as the feature extractor, and extracts global features, face alignment features and AU local features on the basis of multi-scale features. We take the AU local features as the input of the graph convolution to further consider the correlations between AUs, and finally use the fused features to classify AUs. Detection performance was evaluated by 0.5 × accuracy + 0.5 × F1. Our model achieves 0.674 on the challenging Aff-Wild2 database.	翻訳日:2021-07-12 13:47:20 公開日:2021-07-09
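The evaluation score quoted in this abstract, 0.5 × accuracy + 0.5 × F1, can be sketched for a single binary AU as follows. This is a minimal illustration only: the function name `au_score` and the toy labels are assumptions, and the abstract does not say how the competition averages the score over AUs.

```python
def au_score(y_true, y_pred):
    """Return 0.5 * accuracy + 0.5 * F1 for binary labels (1 = AU active).

    Hypothetical helper for illustration; not the competition's official scorer.
    """
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 is defined as 0 when nothing is positive
    return 0.5 * accuracy + 0.5 * f1


# Toy labels for one AU over six frames (made-up data).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = au_score(y_true, y_pred)
print(round(score, 4))
```

Averaging accuracy with F1 keeps a classifier from scoring well simply by predicting the majority (inactive) class on imbalanced AU labels.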
# 形態構造抽出のためのマルチモーダルアソシエーションに基づくグループ化 Multi-Modal Association based Grouping for Form Structure Extraction ( http://arxiv.org/abs/2107.04396v1 ) ライセンス: Link先を確認	Milan Aggarwal, Mausoom Sarkar, Hiresh Gupta, Balaji Krishnamurthy	(参考訳) 文書構造抽出は数十年にわたって広く研究されてきた。この方向の最近の研究は深層学習に基づくもので、主にセマンティックセグメンテーションによる完全な畳み込みNNを用いた構造抽出に焦点を当てている。本稿では,形式構造抽出のための新しいマルチモーダルアプローチを提案する。テキストランやウィジェットなどの単純な要素が与えられた場合,フォーム情報収集に不可欠なテキストブロック,テキストフィールド,選択フィールド,選択グループなどの高次構造を抽出する。これを実現するために,各低レベル要素(参照)に近接する候補要素を同定し,局所的な画像パッチを得る。我々は、BiLSTMを通して候補のテキストおよび空間表現を逐次処理し、文脈認識表現を取得し、それをCNNで処理した画像パッチ特徴と融合する。その後、シーケンシャルデコーダはこの融合特徴ベクトルを用いて参照と候補の関連型を予測する。これらの予測関連性を利用して、連結成分分析によりより大きな構造を決定する。実験の結果, 本手法は, それらの構造に対して, 90.29%, 73.80%, 83.12%, 52.72%のリコールを達成し, 意味的セグメンテーションベースラインを著しく上回った。本手法の有効性をアブレーションにより示し,個別のモダリティを用いて比較した。また、新しいリッチな人間アノテーション付きフォームデータセットも紹介します。 Document structure extraction has been a widely researched area for decades. Recent work in this direction has been deep learning-based, mostly focusing on extracting structure using fully convolution NN through semantic segmentation. In this work, we present a novel multi-modal approach for form structure extraction. Given simple elements such as textruns and widgets, we extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups, which are essential for information collection in forms. To achieve this, we obtain a local image patch around each low-level element (reference) by identifying candidate elements closest to it. We process textual and spatial representation of candidates sequentially through a BiLSTM to obtain context-aware representations and fuse them with image patch features obtained by processing it through a CNN. Subsequently, the sequential decoder takes this fused feature vector to predict the association type between reference and candidates. These predicted associations are utilized to determine larger structures through connected components analysis. 
Experimental results show the effectiveness of our approach achieving a recall of 90.29%, 73.80%, 83.12%, and 52.72% for the above structures, respectively, outperforming semantic segmentation baselines significantly. We show the efficacy of our method through ablations, comparing it against using individual modalities. We also introduce our new rich human-annotated Forms Dataset.	翻訳日:2021-07-12 13:47:08 公開日:2021-07-09
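The final step described in this abstract, turning predicted pairwise associations into larger structures through connected components analysis, can be sketched with a small union-find. The element indices and the `associations` list below are made-up inputs; in the paper they would come from the sequential decoder, which is not reproduced here.

```python
def group_elements(num_elements, associations):
    """Group element indices into connected components.

    `associations` is a list of (i, j) pairs predicted to belong together.
    Hypothetical helper for illustration only.
    """
    parent = list(range(num_elements))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in associations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for i in range(num_elements):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Six low-level elements; pairs (0,1), (1,2) and (4,5) were predicted as associated.
pairs = [(0, 1), (1, 2), (4, 5)]
structures = group_elements(6, pairs)
print(structures)
```

Each returned list is one higher-order structure; an element with no predicted association stays in a group of its own.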
# 解釈可能な構成畳み込みニューラルネットワーク Interpretable Compositional Convolutional Neural Networks ( http://arxiv.org/abs/2107.04474v1 ) ライセンス: Link先を確認	Wen Shen, Zhihua Wei, Shikun Huang, Binbin Zhang, Jiaqi Fan, Ping Zhao, Quanshi Zhang	(参考訳) 意味論的解釈可能性の定義は、説明可能なAIにおける中核的な課題を示す。本稿では,中間畳み込み層における有意な視覚パターンを符号化するフィルタを学習するために,従来の畳み込みニューラルネットワーク(CNN)を解釈可能な合成CNNに変換する手法を提案する。合成cnnでは、各フィルタは、明確な意味を持つ特定の合成対象部分または画像領域を一貫して表現する。合成cnnは、分類のための画像ラベルから、監督のための部分や領域の注釈なしで学習する。我々の手法は様々な種類のCNNに適用できる。実験により本手法の有効性が示された。 The reasonable definition of semantic interpretability presents the core challenge in explainable AI. This paper proposes a method to modify a traditional convolutional neural network (CNN) into an interpretable compositional CNN, in order to learn filters that encode meaningful visual patterns in intermediate convolutional layers. In a compositional CNN, each filter is supposed to consistently represent a specific compositional object part or image region with a clear meaning. The compositional CNN learns from image labels for classification without any annotations of parts or regions for supervision. Our method can be broadly applied to different types of CNNs. Experiments have demonstrated the effectiveness of our method.	翻訳日:2021-07-12 13:46:44 公開日:2021-07-09
# mutualeyecontact:アイコンタクトに焦点を当てた会話分析ツール MutualEyeContact: A conversation analysis tool with focus on eye contact ( http://arxiv.org/abs/2107.04476v1 ) ライセンス: Link先を確認	Alexander Sch\"afer, Tomoko Isomura, Gerd Reis, Katsumi Watanabe, Didier Stricker	(参考訳) 個人間の目の接触は、人間の行動を理解する上で特に重要である。社会的相互作用におけるアイコンタクトの重要性をさらに調査するため,携帯型アイトラッキング技術は自然選択であると考えられる。しかし、利用可能なデータの分析は非常に複雑になる可能性がある。科学者は素早く正確に計算されるデータが必要です。さらに、関連するデータを自動的に分離して保存しなければなりません。本研究では,これらの課題に優れた相互接触ツールを提案し,相互接触の重要性を科学者に理解させる。最先端のアイトラッキングと機械学習に基づく顔認識を組み合わせることで,ソーシャルインタラクションセッションの分析と可視化を行うツールを提供する。この研究はコンピュータ科学者と認知科学者の共同研究である。社会科学と行動科学の分野とコンピュータビジョンとディープラーニングを組み合わせる。 Eye contact between individuals is particularly important for understanding human behaviour. To further investigate the importance of eye contact in social interactions, portable eye tracking technology seems to be a natural choice. However, the analysis of available data can become quite complex. Scientists need data that is calculated quickly and accurately. Additionally, the relevant data must be automatically separated to save time. In this work, we propose a tool called MutualEyeContact which excels in those tasks and can help scientists to understand the importance of (mutual) eye contact in social interactions. We combine state-of-the-art eye tracking with face recognition based on machine learning and provide a tool for analysis and visualization of social interaction sessions. This work is a joint collaboration of computer scientists and cognitive scientists. It combines the fields of social and behavioural science with computer vision and deep learning.	翻訳日:2021-07-12 13:46:34 公開日:2021-07-09
# スタイルGAN潜時空間の意味的および幾何学的展開 Semantic and Geometric Unfolding of StyleGAN Latent Space ( http://arxiv.org/abs/2107.04481v1 ) ライセンス: Link先を確認	Mustafa Shukor, Xu Yao, Bharath Bhushan Damodaran, Pierre Hellier	(参考訳) Generative adversarial networks (GANs) は、自然画像に対応する潜在コードを反転させ操作することで画像編集に驚くほど効率的であることが証明されている。この性質は、潜在空間のもつれのない(disentangled)性質から生じる。本稿では, そのような潜在空間について, (a) ユークリッド距離が画像の知覚距離と異なること, (b) アンタングル化が最適ではなく, 線形モデルを用いた顔属性分離が限界のある仮説であること, という2つの幾何学的制約を同定する。そこで本研究では,これらの制約を解消するために,正規化フローを用いてプロキシ潜在表現を学習する新しい手法を提案し,これが顔画像編集のためのより効率的な空間につながることを示す。 Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to a natural image. This property emerges from the disentangled nature of the latent space. In this paper, we identify two geometric limitations of such a latent space: (a) Euclidean distances differ from image perceptual distances, and (b) disentanglement is not optimal, and facial attribute separation using a linear model is a limiting hypothesis. We thus propose a new method to learn a proxy latent representation using normalizing flows to remedy these limitations, and show that this leads to a more efficient space for face image editing.	翻訳日:2021-07-12 13:46:23 公開日:2021-07-09
# 弱教師付き領域適応によるカスケード検出タスクの学習 Learning Cascaded Detection Tasks with Weakly-Supervised Domain Adaptation ( http://arxiv.org/abs/2107.04523v1 ) ライセンス: Link先を確認	Niklas Hanselmann, Nick Schneider, Benedikt Ortelt and Andreas Geiger	(参考訳) 自動運転の課題に対処するために、ディープラーニングは、3d検出やインスタンスセグメンテーションなど、ますます複雑なタスクに取り組む上で重要であることが証明されている。画像に基づく検出タスクに対する最先端のアプローチは、カスケードな方法で操作することで、この複雑さに対処している。仮面は推測される。これらの手法はうまく機能するが、様々なタスクに対する正確で安価なアノテーションが欠如していることは依然として大きな課題である。合成データは有望な解であるが、ドメイン適応研究の努力にもかかわらず、合成データと実際のデータのギャップは未解決の問題である。本研究では,逐次的検出タスクの構造を生かした弱教師付き領域適応設定を提案する。特に、2dバウンディングボックスを両方のドメインの弱いラベルとして活用しながら、ソースドメインのみから属性を推測し、ドメインシフトを説明することを学びます。さらに,教師なし設定では利用できない基底クラス情報を用いて,クラス毎の機能アライメントを通じて,ドメイン不変機能を奨励する。実験の結果,提案手法は完全教師付き設定と競合する一方で,教師なし適応手法よりも大きなマージンで優れていた。 In order to handle the challenges of autonomous driving, deep learning has proven to be crucial in tackling increasingly complex tasks, such as 3D detection or instance segmentation. State-of-the-art approaches for image-based detection tasks tackle this complexity by operating in a cascaded fashion: they first extract a 2D bounding box based on which additional attributes, e.g. instance masks, are inferred. While these methods perform well, a key challenge remains the lack of accurate and cheap annotations for the growing variety of tasks. Synthetic data presents a promising solution but, despite the effort in domain adaptation research, the gap between synthetic and real data remains an open problem. In this work, we propose a weakly supervised domain adaptation setting which exploits the structure of cascaded detection tasks. In particular, we learn to infer the attributes solely from the source domain while leveraging 2D bounding boxes as weak labels in both domains to explain the domain shift. We further encourage domain-invariant features through class-wise feature alignment using ground-truth class information, which is not available in the unsupervised setting. 
As our experiments demonstrate, the approach is competitive with fully supervised settings while outperforming unsupervised adaptation approaches by a large margin.	翻訳日:2021-07-12 13:46:11 公開日:2021-07-09
# スライディングウィンドウ最適化による連続時間におけるイベントベース特徴追跡 Event-Based Feature Tracking in Continuous Time with Sliding Window Optimization ( http://arxiv.org/abs/2107.04536v1 ) ライセンス: Link先を確認	Jason Chui, Simon Klenk, Daniel Cremers	(参考訳) イベントカメラにおける連続時間特徴追跡のための新しい手法を提案する。この目的のために,画像平面上の投影によって最大にシャープなイベントパッチ画像が得られるように,推定軌道に沿ったイベントを時空に調整して特徴を追跡する。軌道は$n^{th}$次 B-スプラインによってパラメータ化され、これは$(n-2)^{th}$微分まで連続である。従来の作業とは対照的に,スライディングウィンドウ方式で曲線パラメータを最適化する。パブリックデータセットでは,提案したスライドウインドウB-スプライン最適化が,従来よりも長い,より正確な特徴トラックにつながることを実験的に確認した。 We propose a novel method for continuous-time feature tracking in event cameras. To this end, we track features by aligning events along an estimated trajectory in space-time such that the projection on the image plane results in maximally sharp event patch images. The trajectory is parameterized by $n^{th}$ order B-splines, which are continuous up to $(n-2)^{th}$ derivative. In contrast to previous work, we optimize the curve parameters in a sliding window fashion. On a public dataset we experimentally confirm that the proposed sliding-window B-spline optimization leads to longer and more accurate feature tracks than in previous work.	翻訳日:2021-07-12 13:45:51 公開日:2021-07-09
# MRIと超音波ボリューム登録のためのクロスモーダルアテンション Cross-modal Attention for MRI and Ultrasound Volume Registration ( http://arxiv.org/abs/2107.04548v1 ) ライセンス: Link先を確認	Xinrui Song, Hengtao Guo, Xuanang Xu, Hanqing Chao, Sheng Xu, Baris Turkbey, Bradford J. Wood, Ge Wang, Pingkun Yan	(参考訳) 前立腺癌生検は経直腸超音波(TRUS)とMR画像の正確な融合の恩恵を受ける。過去数年間、畳み込みニューラルネットワーク(CNN)は、画像登録に不可欠な画像特徴を抽出する上で強力であることが証明されてきた。しかし、挑戦的な応用やコンピュータビジョンの最近の進歩は、CNNが特徴間の空間的対応を理解する能力にかなり制限があることを示唆しており、これは自己注意機構が得意とするタスクである。本稿では,クロスモーダル画像登録に特化した自己注意機構を開発することを目的とする。提案するクロスモーダルアテンションブロックは,一方のボリュームの各特徴を、対応するボリュームのすべての特徴に効果的にマッピングする。実験の結果,クロスモーダルアテンションブロックを組み込んだCNNが,その10倍の規模を持つ先進的なCNNを上回ることがわかった。ネットワークの解釈性を改善するために可視化技術も取り入れた。私たちの作業のソースコードはhttps://github.com/DIAL-RPI/Attention-Reg で公開されています。 Prostate cancer biopsy benefits from accurate fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images. In the past few years, convolutional neural networks (CNNs) have proven powerful in extracting image features crucial for image registration. However, challenging applications and recent advances in computer vision suggest that CNNs are quite limited in their ability to understand spatial correspondence between features, a task in which the self-attention mechanism excels. This paper aims to develop a self-attention mechanism specifically for cross-modal image registration. Our proposed cross-modal attention block effectively maps each of the features in one volume to all features in the corresponding volume. Our experimental results demonstrate that a CNN designed with the cross-modal attention block embedded outperforms an advanced CNN 10 times its size. We also incorporated visualization techniques to improve the interpretability of our network. The source code of our work is available at https://github.com/DIAL-RPI/Attention-Reg .	翻訳日:2021-07-12 13:45:42 公開日:2021-07-09
# ResNet-18を用いた7つの基本表現認識 Seven Basic Expression Recognition Using ResNet-18 ( http://arxiv.org/abs/2107.04569v1 ) ライセンス: Link先を確認	Satnam Singh, Doris Schicker	(参考訳) 本稿では, fer+データセット上で事前学習したResNet-18アーキテクチャを用いて, 感情行動分析(ABAW)の問題に対処し, 中立性, 怒り, 嫌悪感, 恐怖, 幸福, 悲しみ, 驚きの7つの基本表現の分類を行う。第2回ワークショップと第2回感情行動分析コンテスト(ABAW2)では、約2.8Mフレームの564ビデオからなるデータベースと、これら7つの基本表現のラベルが提供される。我々は、過剰表現されたクラスをアンダーサンプリングし、過表現されたクラスをクラスワイドと共にオーバーサンプリングすることで、クラス不均衡に対処するためにデータセットを再サンプリングした。オーバーフィッティングを避けるためにデータ表示を行い、l2正規化を使った。我々の分類器は、abaw2スコア0.4に達し、競争相手が提供したベースライン結果を超える。 We propose to use a ResNet-18 architecture that was pre-trained on the FER+ dataset for tackling the problem of affective behavior analysis in-the-wild (ABAW) for classification of the seven basic expressions, namely, neutral, anger, disgust, fear, happiness, sadness and surprise. As part of the second workshop and competition on affective behavior analysis in-the-wild (ABAW2), a database consisting of 564 videos with around 2.8M frames is provided along with labels for these seven basic expressions. We resampled the dataset to counter class-imbalances by under-sampling the over-represented classes and over-sampling the under-represented classes along with class-wise weights. To avoid overfitting we performed data-augmentation and used L2 regularisation. Our classifier reaches an ABAW2 score of 0.4 and therefore exceeds the baseline results provided by the hosts of the competition.	翻訳日:2021-07-12 13:45:28 公開日:2021-07-09
# 堅牢な機能獲得に向けて Towards Robust Active Feature Acquisition ( http://arxiv.org/abs/2107.04163v1 ) ライセンス: Link先を確認	Yang Li, Siyuan Shan, Qin Liu, Junier B. Oliva	(参考訳) 真にインテリジェントなシステムは、不完全で不確実なデータで重要な決定をすると予想されている。予測を改善するために機能が順次取得されるアクティブ機能獲得(afa)は、この目標に向けての一歩です。しかしながら、現在のAFAモデルは、すべて小さな機能セットを扱い、大きな機能領域へのスケーリングが困難です。さらに、信頼できる予測が可能な有効なドメインについて無知であるため、アウト・オブ・ディストリビューション(OOD)の入力に弱い可能性がある。これらの欠陥を解消し、AFAモデルを実用化に近づけるために、我々は現在のAFAアプローチを進めるためのいくつかの手法を提案する。本フレームワークは階層的な取得ポリシを用いて,多数の機能を容易に扱えるとともに,OOD検出器の助けを借りてOOD入力に対してより堅牢である。大規模な実験は、強いベースラインに対する我々のフレームワークの有効性を示す。 Truly intelligent systems are expected to make critical decisions with incomplete and uncertain data. Active feature acquisition (AFA), where features are sequentially acquired to improve the prediction, is a step towards this goal. However, current AFA models all deal with a small set of candidate features and have difficulty scaling to a large feature space. Moreover, they are ignorant about the valid domains where they can predict confidently, thus they can be vulnerable to out-of-distribution (OOD) inputs. In order to remedy these deficiencies and bring AFA models closer to practical use, we propose several techniques to advance the current AFA approaches. Our framework can easily handle a large number of features using a hierarchical acquisition policy and is more robust to OOD inputs with the help of an OOD detector for partially observed data. Extensive experiments demonstrate the efficacy of our framework over strong baselines.	翻訳日:2021-07-12 13:44:59 公開日:2021-07-09
# 系統的欠落値を含むデータからの貪欲構造学習 Greedy structure learning from data that contains systematic missing values ( http://arxiv.org/abs/2107.04184v1 ) ライセンス: Link先を確認	Yang Liu and Anthony C. Constantinou	(参考訳) 欠落した値を含むデータから学ぶことは、多くの領域で共通の課題である。欠落データを扱うベイズネットワーク構造学習アルゴリズムは比較的少なく、扱うものも、期待値最大化(EM)アルゴリズムのように、欠落データがランダムに欠落していると仮定する標準的なアプローチに依存する傾向がある。欠落したデータはしばしば体系的であるため、ランダムに欠落しない値を含むデータセットを効果的に扱えるより実用的な方法が必要である。体系的な欠落データを扱うアプローチの欠如は、欠落がランダムでない実世界の問題へのBN構造学習法の適用を妨げる。本稿では,ペアワイズ削除と逆確率重み付けを活用し,観測データを最大に活用し,欠落値による潜在的なバイアスを最小化する,グリーディ探索構造学習の3つの変種について述べる。最初の2つの変種は、最も性能の高い3番目の変種のサブバージョンと見なすことができるが、学習精度の連続的な改善を示す上で重要である。実験により, 提案手法は, 学習精度と効率の両面において, 一般用および最先端の構造化EMアルゴリズムよりも優れており, ランダムにデータが欠落している場合や, ランダムではない場合にも優れることがわかった。 Learning from data that contain missing values is a common problem in many domains. Relatively few Bayesian Network structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two of the variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right for illustrating the successive improvements in learning accuracy. 
The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm, both in terms of learning accuracy and efficiency, as well as both when data are missing at random and not at random.	翻訳日:2021-07-12 13:44:46 公開日:2021-07-09
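To make the role of inverse probability weighting concrete, the sketch below applies it to the simplest possible statistic, a mean, rather than to a structure-learning score. The records, the group variable, and the observation probabilities in `p_observed` are all made up for illustration; in practice those probabilities would have to be estimated, and the paper's actual scoring procedure is not given in the abstract.

```python
def complete_case_mean(records):
    """Mean over observed values only (ignores why values are missing)."""
    observed = [r["value"] for r in records if r["value"] is not None]
    return sum(observed) / len(observed)


def ipw_mean(records, p_observed):
    """Mean where each observed value is weighted by 1 / P(observed | group)."""
    numerator = 0.0
    denominator = 0.0
    for r in records:
        if r["value"] is None:
            continue  # missing rows contribute only through the weights of similar rows
        weight = 1.0 / p_observed[r["group"]]
        numerator += weight * r["value"]
        denominator += weight
    return numerator / denominator


# Group A is always observed; group B loses half of its values (systematic missingness).
records = (
    [{"group": "A", "value": 0.0}] * 4
    + [{"group": "B", "value": 10.0}] * 2
    + [{"group": "B", "value": None}] * 2
)
p_observed = {"A": 1.0, "B": 0.5}  # assumed known here

naive = complete_case_mean(records)
weighted = ipw_mean(records, p_observed)
print(naive, weighted)
```

With all eight values present the mean would be 5.0. The complete-case estimate is pulled towards group A, while the weighted estimate recovers 5.0 because each surviving group B value stands in for two.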
# rex: スケジュールの改善による予算トレーニングの再検討 REX: Revisiting Budgeted Training with an Improved Schedule ( http://arxiv.org/abs/2107.04197v1 ) ライセンス: Link先を確認	John Chen, Cameron Wolfe, Anastasios Kyrillidis	(参考訳) ディープラーニングの実践者は、しばしば計算と金銭の予算を運用する。したがって、いかなる予算でもうまく機能する最適化アルゴリズムを設計することは重要である。線形学習率のスケジュールは、低予算体制の他のほとんどのスケジュールよりも優れているため、最良の予算対応スケジュールと考えられている。一方、例えば \texttt{30-60-90} ステップスケジュールのような学習率スケジュールは、モデルが多くのエポックに対してトレーニングできる場合に高いパフォーマンスを達成することが知られている。しかし、予算が大きくなるか小さいかは事前に分かっていないことが多いため、学習率スケジュールの最適な選択はケース・バイ・ケース・バイ・ケースで行われる。本稿では、学習率スケジュール選択問題を、プロファイルの選択(すなわち、学習率スケジュールをモデル化する連続関数)と、サンプリングレートの選択(つまり、このプロファイルから学習率が更新/サンプリングされる頻度)の組合せとして構成する。 sgdとadamオプティマイザの両方を用いて7つの異なる実験環境で評価した,reflection exponential (rex) scheduleと呼ばれる新しいプロファイルとサンプリングレートの組み合わせを提案する。 REXは低予算体制において線形スケジュールを上回り、高予算体制と低予算体制の両方において最先端の学習率スケジュール(線形、ステップ、指数関数、コサイン、高原でのステップ崩壊、OneCycle)のパフォーマンスを一致または超過する。さらに、REXは計算、ストレージ、ハイパーパラメータを追加する必要はない。 Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules -- such as the \texttt{30-60-90} step schedule -- are known to achieve high performance when the model can be trained for many epochs. Yet, it is often not known a priori whether one's budget will be large or small; thus, the optimal choice of learning rate schedule is made on a case-by-case basis. In this paper, we frame the learning rate schedule selection problem as a combination of $i)$ selecting a profile (i.e., the continuous function that models the learning rate schedule), and $ii)$ choosing a sampling rate (i.e., how frequently the learning rate is updated/sampled from this profile). 
We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule, which we evaluate across seven different experimental settings with both SGD and Adam optimizers. REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules (linear, step, exponential, cosine, step decay on plateau, and OneCycle) in both high and low budget regimes. Furthermore, REX requires no added computation, storage, or hyperparameters.	翻訳日:2021-07-12 13:44:25 公開日:2021-07-09
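The abstract's framing of a schedule as a profile plus a sampling rate can be sketched as below. Only the linear and step profiles named in the abstract are shown; the REX profile itself is not defined in the abstract and is deliberately left out. Reading "30-60-90" as decay points at 30%, 60% and 90% of training, the decay factor of 0.1, and all function names are assumptions made for illustration.

```python
def linear_profile(progress, base_lr):
    """Learning rate decays linearly from base_lr to 0 as progress goes 0 -> 1."""
    return base_lr * (1.0 - progress)


def step_profile(progress, base_lr, milestones=(0.3, 0.6, 0.9), factor=0.1):
    """Learning rate is multiplied by `factor` each time a milestone is passed."""
    passed = sum(1 for m in milestones if progress >= m)
    return base_lr * factor ** passed


def sampled_schedule(profile, base_lr, total_steps, update_every):
    """Sample `profile` every `update_every` steps and hold the value in between."""
    learning_rates = []
    current = profile(0.0, base_lr)
    for step in range(total_steps):
        if step % update_every == 0:
            current = profile(step / total_steps, base_lr)
        learning_rates.append(current)
    return learning_rates


# Same profile, two sampling rates: every step versus every 5 steps.
per_step = sampled_schedule(linear_profile, 0.1, total_steps=10, update_every=1)
per_block = sampled_schedule(linear_profile, 0.1, total_steps=10, update_every=5)
print(per_step)
print(per_block)
```

Separating the two choices means a new profile can be swapped in without touching the sampling logic, and a coarse sampling rate turns any smooth profile into a staircase.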
# 早期終了MDPの解決による安全な探索 Safe Exploration by Solving Early Terminated MDP ( http://arxiv.org/abs/2107.04200v1 ) ライセンス: Link先を確認	Hao Sun, Ziping Xu, Meng Fang, Zhenghao Peng, Jiadong Guo, Bo Dai, Bolei Zhou	(参考訳) 強化学習(RL)の現実的な応用には,安全な探索が不可欠である。従来の研究では、安全な探索問題を制約付きマルコフ決定プロセス(CMDP)とみなしており、方策は制約の下で最適化されている。しかし、潜在的な危険に遭遇すると、人間はすぐに立ち止まり、危険の中で安全に行動することを学ぶことは滅多にない。人間の学習を動機として,早期終了MDP(ET-MDP)の枠組みの下で安全なRL問題に対処する新たなアプローチを導入する。まず,ET-MDP を,対応するCMDP と同じ最適値関数を持つ制約のない MDP として定義する。そこで, 文脈モデルに基づくオフポリシーアルゴリズムを提案し, ET-MDPを解くことにより, 対応するCMDPをより良い漸近性能と学習効率で解く。 CMDPタスクの実験では、CMDPを直接解く従来の方法よりも大幅に改善されている。 Safe exploration is crucial for the real-world application of reinforcement learning (RL). Previous works consider the safe exploration problem as a Constrained Markov Decision Process (CMDP), where policies are optimized under constraints. However, when encountering potential danger, humans tend to stop immediately and rarely learn to behave safely while in danger. Motivated by human learning, we introduce a new approach to address safe RL problems under the framework of Early Terminated MDP (ET-MDP). We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP. An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better asymptotic performance and improved learning efficiency. Experiments on various CMDP tasks show a substantial improvement over previous methods that directly solve CMDP.	翻訳日:2021-07-12 13:43:57 公開日:2021-07-09
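One way to picture the early-termination idea is an environment wrapper that ends the episode as soon as the accumulated constraint cost exceeds a budget, so the agent never continues acting after a violation. The sketch below is an interpretation under stated assumptions: the toy `LineWorld` environment, the `EarlyTermination` class, and the `(obs, reward, cost, done)` step signature are all hypothetical and are not taken from the paper, whose precise ET-MDP definition is not in the abstract.

```python
class LineWorld:
    """Toy environment: the agent walks on integers; positions below 0 are unsafe."""

    def __init__(self, start=2, goal=5):
        self.start = start
        self.goal = goal
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        """`action` is -1 or +1. Returns (observation, reward, cost, done)."""
        self.pos += action
        reward = 1.0 if self.pos == self.goal else 0.0
        cost = 1.0 if self.pos < 0 else 0.0  # constraint cost for unsafe states
        done = self.pos == self.goal
        return self.pos, reward, cost, done


class EarlyTermination:
    """Wrap a cost-emitting environment; end the episode once the budget is exceeded."""

    def __init__(self, env, cost_budget=0.0):
        self.env = env
        self.cost_budget = cost_budget
        self.total_cost = 0.0

    def reset(self):
        self.total_cost = 0.0
        return self.env.reset()

    def step(self, action):
        """Returns (observation, reward, done, violated)."""
        obs, reward, cost, done = self.env.step(action)
        self.total_cost += cost
        violated = self.total_cost > self.cost_budget
        return obs, reward, done or violated, violated


env = EarlyTermination(LineWorld(start=1))
env.reset()
trace = [env.step(-1) for _ in range(2)]  # walk left twice, into the unsafe region
print(trace)
```

The wrapped environment exposes no separate constraint to the learner: a violation simply cuts off all future reward, which is what lets an unconstrained RL algorithm be applied to it.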
# 局所適応型不均一フェデレーション学習によるリソグラフィホットスポット検出 Lithography Hotspot Detection via Heterogeneous Federated Learning with Local Adaptation ( http://arxiv.org/abs/2107.04367v1 ) ライセンス: Link先を確認	Xuezhong Lin, Jingyu Pan, Jinming Xu, Yiran Chen and Cheng Zhuo	(参考訳) 技術的スケーリングが物理的限界に近づいている中、リソグラフィホットスポット検出は製造性の設計において重要なタスクとなっている。パターンマッチングや機械学習をホットスポット検出に配置することで、かなりのシミュレーション時間を節約できるが、そのような手法は通常、モデルを構築するための非自明な品質データを必要とする。また、デザインハウスは、このようなデータを他の住宅と直接共有して統一モデルを構築することを望まないため、データ不足によりユニークなデザインパターンを持つデザインハウスには効果がない。一方、各デザインハウスにおけるデータ均質性により、局所的に訓練されたモデルは容易に過剰に適合し、一般化能力と堅牢性を失う。本稿では,上記の問題に対処可能な,リソグラフィホットスポット検出のための異種フェデレーション学習フレームワークを提案する。一方、このフレームワークは、ローカルデータをプライベートに保ちながら、異質な知識共有を通じて、より堅牢なグローバルサブモデルを構築することができる。一方、グローバルなサブモデルとローカルなサブモデルを組み合わせることで、ローカルなデータの均一性を改善することができる。提案手法は,非独立かつ同一分散(非iid)データとヘテロジニアス通信の課題を克服し,様々なシナリオにおいて良好な収束率を確保しつつ,他の最先端手法と比較して非常に高い性能を実現することができることを示す。 As technology scaling is approaching the physical limit, lithography hotspot detection has become an essential task in design for manufacturability. While the deployment of pattern matching or machine learning in hotspot detection can help save significant simulation time, such methods typically demand for non-trivial quality data to build the model, which most design houses are short of. Moreover, the design houses are also unwilling to directly share such data with the other houses to build a unified model, which can be ineffective for the design house with unique design patterns due to data insufficiency. On the other hand, with data homogeneity in each design house, the locally trained models can be easily over-fitted, losing generalization ability and robustness. In this paper, we propose a heterogeneous federated learning framework for lithography hotspot detection that can address the aforementioned issues. On one hand, the framework can build a more robust centralized global sub-model through heterogeneous knowledge sharing while keeping local data private. 
On the other hand, the global sub-model can be combined with a local sub-model to better adapt to local data heterogeneity. The experimental results show that the proposed framework can overcome the challenge of non-independent and identically distributed (non-IID) data and heterogeneous communication to achieve very high performance in comparison to other state-of-the-art methods while guaranteeing a good convergence rate in various scenarios.	翻訳日:2021-07-12 13:43:43 公開日:2021-07-09
# 制約付き最適化としてのモデル圧縮とニューラルネットへの応用パート5:圧縮の組み合わせ Model compression as constrained optimization, with application to neural nets. Part V: combining compressions ( http://arxiv.org/abs/2107.04380v1 ) ライセンス: Link先を確認	Miguel \'A. Carreira-Perpi\~n\'an, Yerlan Idelbayev	(参考訳) モデル圧縮は一般に量子化、低ランク近似、プルーニングを用いて行われ、近年様々なアルゴリズムが研究されている。基本的な質問の1つは、どのタイプの圧縮が特定のモデルに対してうまく働くかということです。あるいは、もっと良い:適切な方法で圧縮を組み合わせることで改善できるのか? これを損失を最適化する問題として一般に定式化するが、重みを個別に圧縮した部分の加法結合に制限し、対応する部分のパラメータを学習するアルゴリズムを与える。ディープニューラルネットを用いた実験では,1)誤り圧縮空間において,異なる圧縮型が相補的な効果をもたせ,2)最適な組み合わせがニューラルネットワークのタイプに依存することを示す,はるかに優れたモデルを見出すことができる。例えば、数個の浮動小数点重みを追加してエラーを発生させることなく、ResNetとAlexNetを1ビット1重で圧縮できます。しかし、低ランクと浮動小数点重みを組み合わせることで、VGGネットをより圧縮することができる。 Model compression is generally performed by using quantization, low-rank approximation or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what types of compression work better for a given model? Or even better: can we improve by combining compressions in a suitable way? We formulate this generally as a problem of optimizing the loss but where the weights are constrained to equal an additive combination of separately compressed parts; and we give an algorithm to learn the corresponding parts' parameters. Experimentally with deep neural nets, we observe that 1) we can find significantly better models in the error-compression space, indicating that different compression types have complementary benefits, and 2) the best type of combination depends exquisitely on the type of neural net. For example, we can compress ResNets and AlexNet using only 1 bit per weight without error degradation at the cost of adding a few floating point weights. However, VGG nets can be better compressed by combining low-rank with a few floating point weights.	翻訳日:2021-07-12 13:43:19 公開日:2021-07-09
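The additive combination mentioned in this abstract, 1 bit per weight plus a few floating point weights, can be illustrated on a fixed weight vector by alternating between the two parts. This sketch covers only the approximation of given weights; it is not the paper's training algorithm, and the function names and toy weights are assumptions.

```python
def sign(x):
    return 1.0 if x >= 0 else -1.0


def squared_error(weights, approx):
    return sum((w - a) ** 2 for w, a in zip(weights, approx))


def binarize(target):
    """Best least-squares 1-bit approximation scale * sign(t); scale = mean |t|."""
    scale = sum(abs(t) for t in target) / len(target)
    return [scale * sign(t) for t in target]


def top_k_sparse(residual, k):
    """Keep the k largest-magnitude residual entries as floats, zero the rest."""
    order = sorted(range(len(residual)), key=lambda i: abs(residual[i]), reverse=True)
    keep = set(order[:k])
    return [r if i in keep else 0.0 for i, r in enumerate(residual)]


def binary_plus_sparse(weights, k, iterations=10):
    """Alternately fit the binary part and the sparse correction to the weights."""
    sparse = [0.0] * len(weights)
    binary = binarize(weights)
    for _ in range(iterations):
        binary = binarize([w - s for w, s in zip(weights, sparse)])
        sparse = top_k_sparse([w - b for w, b in zip(weights, binary)], k)
    return binary, sparse


# Toy weights: mostly magnitude ~1, plus two large outliers.
weights = [0.9, -1.1, 1.0, -0.95, 6.0, 1.05, -1.0, -7.5]
binary_only = binarize(weights)
binary, sparse = binary_plus_sparse(weights, k=2)
combined = [b + s for b, s in zip(binary, sparse)]

err_binary = squared_error(weights, binary_only)
err_combined = squared_error(weights, combined)
print(err_binary, err_combined)
```

Each alternating step solves its own least-squares subproblem exactly, so the error never increases. Here the two float corrections absorb the outliers, which otherwise inflate the shared binary scale and hurt every other weight.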
# Form2Seq : 高次構造抽出のためのフレームワーク Form2Seq : A Framework for Higher-Order Form Structure Extraction ( http://arxiv.org/abs/2107.04419v1 ) ライセンス: Link先を確認	Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy	(参考訳) 文書構造抽出は数十年にわたって広く研究されてきた分野であり、近年では完全畳み込みネットワークを用いた文書画像のセマンティックセグメンテーションタスクとして行われている。このような手法は画像分解能によって制限されるが、一般的に形に現れる濃密な領域の構造を曖昧にしないためである。そこで本稿では,テキストを用いた構造抽出のための新しいシーケンシャル・ツー・シークエンス(seq2seq)フレームワークであるform2seqを提案する。 1) 低レベルの構成要素(TextBlockと空の充填可能なウィジェット)をフィールドキャプションやリストアイテムなど10種類に分類し,2) 低レベルの要素をテキストフィールド, ChoiceFields, ChoiceGroupsなどの高次の構成要素に分類し,フォームの情報収集機構として利用する。これを実現するため、構成要素を自然読み順に線形に配置し、その空間表現とテキスト表現をseq2seqフレームワークに供給し、最終タスクに応じて各要素の予測を順次出力する。タスクをグループ化するためにseq2seqを修正し、2つのタスクのエンドツーエンドトレーニングを分離したトレーニングと比較することで得られた改善について検討する。実験の結果, 分類タスクにおいて90%の精度を達成するテキストベースアプローチの有効性を示し, 上記のグループでは75.82, 86.01, 61.63のf1がセグメンテーションベースラインを上回った。さらに,ICDAR 2013 データセット上でのテーブル構造認識の結果の状況を示す。 Document structure extraction has been a widely researched area for decades with recent works performing it as a semantic segmentation task over document images using fully-convolution networks. Such methods are limited by image resolution due to which they fail to disambiguate structures in dense regions which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages relative spatial arrangement of structures. We discuss two tasks; 1) Classification of low-level constituent elements (TextBlock and empty fillable Widget) into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanism in forms. 
To achieve this, we arrange the constituent elements linearly in natural reading order and feed their spatial and textual representations to the Seq2Seq framework, which sequentially outputs a prediction for each element depending on the final task. We modify Seq2Seq for the grouping task and discuss improvements obtained through cascaded end-to-end training of the two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach, which achieves an accuracy of 90% on the classification task and F1 scores of 75.82, 86.01, and 61.63 on the groups discussed above, respectively, outperforming segmentation baselines. Further, we show that our framework achieves state-of-the-art results for table structure recognition on the ICDAR 2013 dataset.	翻訳日:2021-07-12 13:42:41 公開日:2021-07-09
# 非漸近解析による歪みリスク尺度の類似度に基づく政策勾配法 Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis ( http://arxiv.org/abs/2107.04422v1 ) ライセンス: Link先を確認	Nithia Vijayan and Prashanth L. A	(参考訳) 本稿では,リスクに敏感な強化学習(rl)環境での制御問題を解決するためのポリシ勾配アルゴリズムを提案する。本アルゴリズムの目的は,マルコフ決定過程(MDP)における累積報酬の歪みリスク尺度(DRM)を最大化することである。我々は、drmの目的に対応するポリシー勾配定理の変種を導出する。この定理とLRに基づく勾配推定手法を併用して,オン・ポリティクスとオフ・ポリティクスのRL設定の両方においてDRMを最適化するポリシー勾配アルゴリズムを提案する。我々は、drm目標の近似定常点へのアルゴリズムの収束を確立する非漸近境界を導出する。 We propose policy-gradient algorithms for solving the problem of control in a risk-sensitive reinforcement learning (RL) context. The objective of our algorithm is to maximize the distorted risk measure (DRM) of the cumulative reward in an episodic Markov decision process (MDP). We derive a variant of the policy gradient theorem that caters to the DRM objective. Using this theorem in conjunction with a likelihood ratio (LR) based gradient estimation scheme, we propose policy gradient algorithms for optimizing DRM in both on-policy and off-policy RL settings. We derive non-asymptotic bounds that establish the convergence of our algorithms to an approximate stationary point of the DRM objective.	翻訳日:2021-07-12 13:42:10 公開日:2021-07-09
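For readers unfamiliar with distorted risk measures, the sketch below estimates one from a sample of episode returns by reweighting the sorted returns with a distortion function g, where g(0) = 0 and g(1) = 1. It uses the standard discrete form, sum over i of x_(i) · [g((n − i + 1)/n) − g((n − i)/n)] for returns sorted in ascending order. It is offered as an illustration of the objective only: the two distortion functions and the sample are assumptions, and the paper's gradient estimator is not reproduced.

```python
def empirical_drm(samples, distortion):
    """Distorted expectation of a sample under distortion g with g(0)=0, g(1)=1."""
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # (n - i + 1) / n is the empirical probability of a return >= x.
        upper = distortion((n - i + 1) / n)
        lower = distortion((n - i) / n)
        total += x * (upper - lower)
    return total


def identity(p):
    """No distortion: the DRM reduces to the ordinary mean."""
    return p


def upper_tail(p, alpha=0.5):
    """Puts all weight on the top alpha fraction of returns."""
    return min(p / alpha, 1.0)


returns = [1.0, 2.0, 3.0, 4.0]  # made-up cumulative rewards from four episodes
mean_value = empirical_drm(returns, identity)
tail_value = empirical_drm(returns, upper_tail)
print(mean_value, tail_value)
```

Changing g changes which part of the return distribution the objective emphasises, and the sorted-sample weights are what a sample-based gradient scheme would need to differentiate through.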
# 交通シナリオにおける行動計画のための対話型ガイダンスの学習 Learning Interaction-aware Guidance Policies for Motion Planning in Dense Traffic Scenarios ( http://arxiv.org/abs/2107.04538v1 ) ライセンス: Link先を確認	Bruno Brito, Achin Agarwal and Javier Alonso-Mora	(参考訳) 密集した交通シナリオにおける自律ナビゲーションは、他のドライバーの意図が直接観察不可能であり、AVは幅広い運転行動を扱う必要があるため、自動運転車(AV)にとって依然として困難である。密集した交通を操るために、avは彼らの行動が他人(相互作用モデル)にどう影響するかを判断し、この推論を利用して密集した交通を安全にナビゲートする必要がある。本稿では,高密度交通シナリオにおける対話型動き計画のための新しい枠組みを提案する。人間の運転行動と相互作用時の速度変化との関係について検討する。そこで我々は,制約満足度による安全性と運動性の実現性を保証する最適化型プランナーに,他車両の協調性に関するグローバルガイダンスを提供するインタラクション対応政策であるDeep Reinforcement Learning (RL)を通じて学習することを提案する。学習されたポリシーは、ローカル最適化ベースのプランナーを推論し、対話的な振る舞いで誘導し、他の車両が収まらない場合に安全を維持しながら、高密度トラフィックに積極的にマージする。我々は,高度にインタラクティブなシミュレーション環境(ハイウェイマージとアンプロテクト左旋回)において,学習ベースと最適化ベースの2つのベースラインアプローチに対して定性的かつ定量的な結果を示す。本手法は,学習ベースと最適化ベースの両方において,衝突数を大幅に削減し,成功率を増加させることを示す。 Autonomous navigation in dense traffic scenarios remains challenging for autonomous vehicles (AVs) because the intentions of other drivers are not directly observable and AVs have to deal with a wide range of driving behaviors. To maneuver through dense traffic, AVs must be able to reason how their actions affect others (interaction model) and exploit this reasoning to navigate through dense traffic safely. This paper presents a novel framework for interaction-aware motion planning in dense traffic scenarios. We explore the connection between human driving behavior and their velocity changes when interacting. Hence, we propose to learn, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance about the cooperativeness of other vehicles to an optimization-based planner ensuring safety and kinematic feasibility through constraint satisfaction. The learned policy can reason and guide the local optimization-based planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield. 
We present qualitative and quantitative results in highly interactive simulation environments (highway merging and unprotected left turns) against two baseline approaches, a learning-based and an optimization-based method. The presented results demonstrate that our method significantly reduces the number of collisions and increases the success rate with respect to both learning-based and optimization-based baselines.	翻訳日:2021-07-12 13:41:58 公開日:2021-07-09
# 半教師型顔行動分析のためのマルチタスク平均教師 A Multi-task Mean Teacher for Semi-supervised Facial Affective Behavior Analysis ( http://arxiv.org/abs/2107.04225v1 ) ライセンス: Link先を確認	Lingfeng Wang, Shisen Wang	(参考訳) Affective Behavior Analysisは、人間とコンピュータの相互作用において重要な部分である。 tsav[9]のような既存の感情的行動分析手法は、不完全なラベル付きデータセットの課題に苦しむ。そこで本論文では,ラベルの欠落から学習し,複数の関連課題を同時に学習するための,半教師付き感情行動分析のためのマルチタスク平均教師モデルを提案する。具体的には、TSAVをベースラインモデルとして利用し、3つのタスクを同時に認識する。我々は,より優れた意味情報を提供するために,マスクのレンダリング前処理法を変更した。その後、平均教師を用いてTSAVモデルを半教師付きモデルに拡張し、ラベルなしデータから恩恵を受けることができた。評価実験の結果,提案手法はTSAVモデルよりも優れた性能を達成し,提案手法が適応的行動解析性能を向上させるために,新たなラベル付きデータを効果的に学習できることが確認された。 Affective Behavior Analysis is an important part in human-computer interaction. Existing successful affective behavior analysis method such as TSAV[9] suffer from challenge of incomplete labeled datasets. To boost its performance, this paper presents a multi-task mean teacher model for semi-supervised Affective Behavior Analysis to learn from missing labels and exploring the learning of multiple correlated task simultaneously. To be specific, we first utilize TSAV as baseline model to simultaneously recognize the three tasks. We have modified the preprocessing method of rendering mask to provide better semantics information. After that, we extended TSAV model to semi-supervised model using mean teacher, which allow it to be benefited from unlabeled data. Experimental results on validation datasets show that our method achieves better performance than TSAV model, which verifies that the proposed network can effectively learn additional unlabeled data to boost the affective behavior analysis performance.	翻訳日:2021-07-12 13:41:22 公開日:2021-07-09
# UrbanScene3D: 大規模都市景観データセットとシミュレータ UrbanScene3D: A Large Scale Urban Scene Dataset and Simulator ( http://arxiv.org/abs/2107.04286v1 ) ライセンス: Link先を確認	Yilin Liu and Fuyou Xue and Hui Huang	(参考訳) 異なる方法で環境を知覚する能力は、ロボット研究に不可欠である。これには2dデータソースと3dデータソースの両方の分析が含まれる。本研究では,Unreal Engine 4 と AirSim をベースとした手頃なシミュレータを応用した大規模都市景観データセットを提案する。従来の2d情報や人工3dcadモデルに基づく作品とは異なり、urbanscene3dは小型の人工物モデルと航空画像で再構成された詳細な実世界のモデルの両方を含んでいる。各建物はシーンモデル全体から手動で抽出され、ユニークなラベルが割り当てられ、インスタンスセグメンテーションマップが作成されます。 UrbanScene3Dのインスタンスセグメンテーションラベルが付いた3Dの地平線テクスチャモデルでは、ユーザは、インスタンスセグメンテーションマップ、任意の解像度での深度マップ、可視および見えない場所での3Dポイントクラウド/メッシュなど、すべての種類のデータを取得することができる。さらに、airsimの助けを借りて、ユーザーはロボット(カー/ドロネス)をシミュレートして、提案された都市環境で様々な自律的なタスクをテストできる。詳細とアプリケーションの詳細については、私たちの論文とWebサイト(https://vcc.tech/UrbanScene3D/)を参照してください。 The ability to perceive the environments in different ways is essential to robotic research. This involves the analysis of both 2D and 3D data sources. We present a large scale urban scene dataset associated with a handy simulator based on Unreal Engine 4 and AirSim, which consists of both man-made and real-world reconstruction scenes in different scales, referred to as UrbanScene3D. Unlike previous works that purely based on 2D information or man-made 3D CAD models, UrbanScene3D contains both compact man-made models and detailed real-world models reconstructed by aerial images. Each building has been manually extracted from the entire scene model and then has been assigned with a unique label, forming an instance segmentation map. The provided 3D ground-truth textured models with instance segmentation labels in UrbanScene3D allow users to obtain all kinds of data they would like to have: instance segmentation map, depth map in arbitrary resolution, 3D point cloud/mesh in both visible and invisible places, etc. In addition, with the help of AirSim, users can also simulate the robots (cars/drones) to test a variety of autonomous tasks in the proposed city environment. 
Please refer to our paper and website (https://vcc.tech/UrbanScene3D/) for further details and applications.	翻訳日:2021-07-12 13:41:07 公開日:2021-07-09
# 経時的差分法によるDigital Subtraction Angiography Videoからの肝細胞癌の分節化 Hepatocellular Carcinoma Segmentation from Digital Subtraction Angiography Videos using Learnable Temporal Difference ( http://arxiv.org/abs/2107.04306v1 ) ライセンス: Link先を確認	Wenting Jiang, Yicheng Jiang, Lu Zhang, Changmiao Wang, Xiaoguang Han, Shuixing Zhang, Xiang Wan, Shuguang Cui	(参考訳) DSA(Digital Subtraction Angiography)ビデオにおける肝細胞癌 (HCC) の自動分画は, HCCの効率的な診断と臨床における腫瘍の正確な評価に役立つ。 dsaビデオからのhccセグメンテーションに関する研究はほとんどない。撮影における運動アーティファクト、腫瘍領域の曖昧な境界、および他の解剖組織へのイメージングにおける高い類似性により、非常に困難である。本稿では、DSAビデオにおけるHCCセグメンテーションの問題を提起し、独自のDSAデータセットを構築する。また,asegmentation sub-network,temporal difference learning (tdl) モジュール,および liver region segmentation (lrs) sub-network など,dsa-ltdnet と呼ばれる新しいセグメンテーションネットワークも提案する。 DSA-LTDNetは、DSAビデオから潜時動作情報を積極的に学習し、セグメンテーション性能を高めるのに好ましい。実験の結果、DSA-LTDNetは、U-Netベースラインと比較して、DICEスコアを4%近く増加させることがわかった。 Automatic segmentation of hepatocellular carcinoma (HCC) in Digital Subtraction Angiography (DSA) videos can assist radiologists in efficient diagnosis of HCC and accurate evaluation of tumors in clinical practice. Few studies have investigated HCC segmentation from DSA videos. It shows great challenging due to motion artifacts in filming, ambiguous boundaries of tumor regions and high similarity in imaging to other anatomical tissues. In this paper, we raise the problem of HCC segmentation in DSA videos, and build our own DSA dataset. We also propose a novel segmentation network called DSA-LTDNet, including a segmentation sub-network, a temporal difference learning (TDL) module and a liver region segmentation (LRS) sub-network for providing additional guidance. DSA-LTDNet is preferable for learning the latent motion information from DSA videos proactively and boosting segmentation performance. All of experiments are conducted on our self-collected dataset. Experimental results show that DSA-LTDNet increases the DICE score by nearly 4% compared to the U-Net baseline.	翻訳日:2021-07-12 13:40:42 公開日:2021-07-09
# 信頼度に基づく複数物体追跡のためのスコア改善 Score refinement for confidence-based 3D multi-object tracking ( http://arxiv.org/abs/2107.04327v1 ) ライセンス: Link先を確認	Nuri Benbarka, Jona Schr\"oder, Andreas Zell	(参考訳) マルチオブジェクトトラッキングは、意思決定に有用な情報を提供するため、自律ナビゲーションにおいて重要なコンポーネントである。多くの研究者は、フレームごとの3D検出をフィルタリングすることで、3D多目的追跡タスクに取り組みましたが、その焦点は主に有用な特徴や適切なマッチングメトリクスを見つけることでした。我々の研究は追跡システムの無視された部分に焦点を当てている:スコアの洗練とトラックレットの終了。トラックレットスコアに応じてトラックレットを終了させながら、時間的一貫性に応じてスコアを操作することにより、追跡結果が向上することを示す。我々は、一致トラックレットのスコアをスコア更新機能で増加させ、一致しないトラックレットのスコアを減少させることによりこれを行う。数に基づく手法と比較して,様々な検出器とフィルタリングアルゴリズムを異なるデータセットで利用する場合,amotaとmotaスコアが一貫して向上する。 AMOTAのスコアは1.83と2.96まで改善された。また, 本手法を後期輸液センシング法として使用し, 投票に基づくアンサンブル法よりも有意な性能を示した。 AMOTAスコア67.6のnuScenesテスト評価を達成し、これは他の最先端のトラッカーと同等である。コードは: \url{https://github.com/cogsys-tuebingen/CBMOT}で公開されている。 Multi-object tracking is a critical component in autonomous navigation, as it provides valuable information for decision-making. Many researchers tackled the 3D multi-object tracking task by filtering out the frame-by-frame 3D detections; however, their focus was mainly on finding useful features or proper matching metrics. Our work focuses on a neglected part of the tracking system: score refinement and tracklet termination. We show that manipulating the scores depending on time consistency while terminating the tracklets depending on the tracklet score improves tracking results. We do this by increasing the matched tracklets' score with score update functions and decreasing the unmatched tracklets' score. Compared to count-based methods, our method consistently produces better AMOTA and MOTA scores when utilizing various detectors and filtering algorithms on different datasets. The improvements in AMOTA score went up to 1.83 and 2.96 in MOTA. We also used our method as a late-fusion ensembling method, and it performed better than voting-based ensemble methods by a solid margin. 
It achieved an AMOTA score of 67.6 on nuScenes test evaluation, which is comparable to other state-of-the-art trackers. Code is publicly available at: https://github.com/cogsys-tuebingen/CBMOT.	翻訳日:2021-07-12 13:40:19 公開日:2021-07-09
# StyleCariGAN: StyleGAN特徴マップ変調による画像生成 StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation ( http://arxiv.org/abs/2107.04331v1 ) ライセンス: Link先を確認	Wonjong Jang, Gwangjin Ju, Yucheol Jung, Jiaolong Yang, Xin Tong, Seungyong Lee	(参考訳) StyleGAN を用いた形状とスタイルの操作に基づく似顔絵生成フレームワークを提案する。 stylecariganと呼ばれるこのフレームワークは、入力写真からリアルで詳細な似顔絵を自動的に作成し、形状の誇張度とカラースタイライゼーションタイプを任意に制御する。提案手法の鍵となる要素は,StyleGANの粗い層特徴写像を変調して,好適な図形強調を生成する形状強調ブロックである。まず、写真用StyleGANの微細な層を、画像生成の訓練を受けたStyleGANの対応する層に置き換えることで、写真から画像への変換が可能なStyleGANを構築した。入力写真が与えられた場合、層状混合モデルは似顔絵の詳細なカラースタイリングを生成するが、形状の誇張はない。次に、層混合モデルの粗い層に形状誇張ブロックを付加し、入力の特徴的外観を保ちながら形状誇張を作成するようにブロックを訓練する。実験結果から,我々のStyleCariGANは,現在の最先端手法と比較して,現実的で詳細な似顔絵を生成することがわかった。 StyleCariGANは、表情制御など、他のStyleGANベースの画像操作もサポートしています。 We present a caricature generation framework based on shape and style manipulation using StyleGAN. Our framework, dubbed StyleCariGAN, automatically creates a realistic and detailed caricature from an input photo with optional controls on shape exaggeration degree and color stylization type. The key component of our method is shape exaggeration blocks that are used for modulating coarse layer feature maps of StyleGAN to produce desirable caricature shape exaggerations. We first build a layer-mixed StyleGAN for photo-to-caricature style conversion by swapping fine layers of the StyleGAN for photos to the corresponding layers of the StyleGAN trained to generate caricatures. Given an input photo, the layer-mixed model produces detailed color stylization for a caricature but without shape exaggerations. We then append shape exaggeration blocks to the coarse layers of the layer-mixed model and train the blocks to create shape exaggerations while preserving the characteristic appearances of the input. Experimental results show that our StyleCariGAN generates realistic and detailed caricatures compared to the current state-of-the-art methods. 
We demonstrate StyleCariGAN also supports other StyleGAN-based image manipulations, such as facial expression control.	翻訳日:2021-07-12 13:39:58 公開日:2021-07-09
# 深部不連続保存画像登録ネットワーク A Deep Discontinuity-Preserving Image Registration Network ( http://arxiv.org/abs/2107.04440v1 ) ライセンス: Link先を確認	Xiang Chen, Nishant Ravikumar, Yan Xia, Alejandro F Frangi	(参考訳) 画像登録は、ペアまたは画像のグループ間の空間対応を確立することを目的としており、医療画像計算とコンピュータ支援介入の基盤となっている。現在、ほとんどのディープラーニングベースの登録法は、所望の変形場は世界規模で滑らかで連続的であり、実際のシナリオ、特に医用画像の登録において必ずしも有効ではないと仮定している。心臓画像と腹部画像)。このようなグローバル制約は、不連続な組織界面におけるアーティファクトやエラーの増加につながる可能性がある。そこで本研究では,より優れた登録性能と現実的な変形場を得るため,ddir(deep discontinuity-preserving image registration network)を提案する。本手法は,UK Biobank Imaging Study (UKBB) の心臓磁気共鳴(MR)画像の登録実験において,最先端のアプローチよりも,登録精度を大幅に向上し,より現実的な変形を予測する。 Image registration aims to establish spatial correspondence across pairs, or groups of images, and is a cornerstone of medical image computing and computer-assisted-interventions. Currently, most deep learning-based registration methods assume that the desired deformation fields are globally smooth and continuous, which is not always valid for real-world scenarios, especially in medical image registration (e.g. cardiac imaging and abdominal imaging). Such a global constraint can lead to artefacts and increased errors at discontinuous tissue interfaces. To tackle this issue, we propose a weakly-supervised Deep Discontinuity-preserving Image Registration network (DDIR), to obtain better registration performance and realistic deformation fields. We demonstrate that our method achieves significant improvements in registration accuracy and predicts more realistic deformations, in registration experiments on cardiac magnetic resonance (MR) images from UK Biobank Imaging Study (UKBB), than state-of-the-art approaches.	翻訳日:2021-07-12 13:39:37 公開日:2021-07-09
# 視覚領域の変遷に伴うオープンワールド認識の課題について On the Challenges of Open World Recognition under Shifting Visual Domains ( http://arxiv.org/abs/2107.04461v1 ) ライセンス: Link先を確認	Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo	(参考訳) 野生で動作しているロボット視覚システムは、異なる環境条件下で、未知の環境を含む様々な意味概念に直面しながら、制約のないシナリオで行動しなければならない。この目的のために、近年の研究では、視覚的オブジェクト認識手法をi)見知らぬ概念を検出し、i)新しいセマンティッククラスの画像が到着するにつれて、その知識を時間とともに拡張しようと試みている。 Open World Recognition (OWR)と呼ばれるこの設定は、初期トレーニングセットに存在するセマンティック制限を破ることのできるシステムを開発することを目的としている。しかしながら、このトレーニングセットは、実世界の高変動を必ずしも反映しない特定の取得条件に対するバイアスのため、システム自体の意味的限界だけでなく、環境的な制約も課している。このトレーニングとテスト分布の相違をドメインシフトと呼ぶ。本研究では、OWRアルゴリズムがドメインシフトの下で有効であるかどうかを調査し、OWRアルゴリズムの性能をドメインシフトなしで正確に評価するための最初のベンチマーク設定を示す。次に、このベンチマークを用いて様々なシナリオの分析を行い、既存のOWRアルゴリズムが、列車とテストの分布が異なる場合、いかに深刻な性能劣化を経験しているかを示す。解析の結果,この劣化はOWRと領域一般化手法の結合によってわずかに緩和されることが示され,既存のアルゴリズムのプラグアンドプレイだけでは未知の領域における新しいカテゴリや未知のカテゴリを認識するには不十分であることが示唆された。本研究は,ロボット視覚システムの構築において,これらの課題に対して,極めて現実的な条件下で確実に機能するために必要なオープンな課題と今後の研究方向性を,明らかに示している。 https://github.com/DarioFontanel/OWR-VisualDomainsで利用可能なコード Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions while facing a variety of semantic concepts, including unknown ones. To this end, recent works tried to empower visual object recognition methods with the capability to i) detect unseen concepts and ii) extended their knowledge over time, as images of new semantic classes arrive. This setting, called Open World Recognition (OWR), has the goal to produce systems capable of breaking the semantic limits present in the initial training set. However, this training set imposes to the system not only its own semantic limits, but also environmental ones, due to its bias toward certain acquisition conditions that do not necessarily reflect the high variability of the real-world. This discrepancy between training and test distribution is called domain-shift. 
This work investigates whether OWR algorithms are effective under domain-shift, presenting the first benchmark setup for assessing fairly the performances of OWR algorithms, with and without domain-shift. We then use this benchmark to conduct analyses in various scenarios, showing how existing OWR algorithms indeed suffer a severe performance degradation when train and test distributions differ. Our analysis shows that this degradation is only slightly mitigated by coupling OWR with domain generalization techniques, indicating that the mere plug-and-play of existing algorithms is not enough to recognize new and unknown categories in unseen domains. Our results clearly point toward open issues and future research directions, that need to be investigated for building robot visual systems able to function reliably under these challenging yet very real conditions. Code available at https://github.com/DarioFontanel/OWR-VisualDomains	翻訳日:2021-07-12 13:39:19 公開日:2021-07-09
# モデル予測制御のための構造化ハマースタイン・ウィーナーモデル学習 Structured Hammerstein-Wiener Model Learning for Model Predictive Control ( http://arxiv.org/abs/2107.04247v1 ) ライセンス: Link先を確認	Ryuta Moriyasu, Taro Ikeda, Sho Kawaguchi, Kenji Kashima	(参考訳) 本稿では,機械学習によって構築されたモデルを用いて最適制御の信頼性を向上させることを目的とする。このようなモデルに基づく最適制御問題は一般に非凸であり、オンラインでは解決が難しい。本稿では,Hammerstein-Wienerモデルと入力凸ニューラルネットワークを組み合わせたモデルを提案する。提案モデルの重要な特徴は, 最適制御問題の発生は, 柔軟モデリング能力を維持しつつ, 対流性と部分線形性を効果的に活用できる点である。本手法の実用性について,エンジンエアパスシステムのモデル化と制御への応用を通して検討した。 This paper aims to improve the reliability of optimal control using models constructed by machine learning methods. Optimal control problems based on such models are generally non-convex and difficult to solve online. In this paper, we propose a model that combines the Hammerstein-Wiener model with input convex neural networks, which have recently been proposed in the field of machine learning. An important feature of the proposed model is that resulting optimal control problems are effectively solvable exploiting their convexity and partial linearity while retaining flexible modeling ability. The practical usefulness of the method is examined through its application to the modeling and control of an engine airpath system.	翻訳日:2021-07-12 13:38:49 公開日:2021-07-09
# HMMとCTCに基づくフルコンテキストASRモデルの格子フリー強化MMIトレーニングについて On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models ( http://arxiv.org/abs/2107.04154v1 ) ライセンス: Link先を確認	Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer	(参考訳) ハイブリッド自動音声認識(ASR)モデルは通常、CTCまたはLF-MMI基準で順次訓練される。しかし、それらは非常に異なる正統性を持ち、通常は異なるフレームワークで実装される。本稿では,モデリング単位とラベルトポロジの概念を分離し,適切な数値/デノミネータグラフを構築することにより,ハイブリッド音響モデリング(AM)のための一般化された枠組みを確立する。本フレームワークでは,HMM/CTCトポロジを持つワードピース/モノチャー/ビチャー/チェノン単位に対して,LF-MMIは限定コンテキストモデルとフルコンテキストモデルの両方に適用可能な,強力なトレーニング基準であることを示す。本フレームワークでは,チェノン(ch)/ワードピース(wp)-CTC-bMMI,ワードピース(wp)-HMM-bMMIの3つの新しいトレーニング手法を提案する。異なるトレーニングスキームの利点をLibrispeech上で総合的に評価し,wp-CTC-bMMIとch-CTC-bMMIを実世界の2つのタスクで評価し,その効果を示した。さらに、バイチャーHMM-MMIモデルが従来の非ニューラルGMM-HMMよりも優れたアライメントモデルとして機能することを示す。 Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, they have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts of modeling units and label topologies and building proper numerator/denominator graphs accordingly, we establish a generalized framework for hybrid acoustic modeling (AM). In this framework, we show that LF-MMI is a powerful training criterion applicable to both limited-context and full-context models, for wordpiece/mono-char/bi-char/chenone units, with both HMM/CTC topologies. From this framework, we propose three novel training schemes: chenone(ch)/wordpiece(wp)-CTC-bMMI, and wordpiece(wp)-HMM-bMMI with different advantages in training performance, decoding efficiency and decoding time-stamp accuracy. The advantages of different training schemes are evaluated comprehensively on Librispeech, and wp-CTC-bMMI and ch-CTC-bMMI are evaluated on two real world ASR tasks to show their effectiveness. 
Besides, we also show bi-char(bc) HMM-MMI models can serve as better alignment models than traditional non-neural GMM-HMMs.	翻訳日:2021-07-12 13:38:22 公開日:2021-07-09
# Bib2Auth: 文献データを用いた著者曖昧化のためのディープラーニングアプローチ Bib2Auth: Deep Learning Approach for Author Disambiguation using Bibliographic Data ( http://arxiv.org/abs/2107.04382v1 ) ライセンス: Link先を確認	Zeyd Boukhers, Nagaraj Bahubali, Abinaya Thulsi Chandrasekaran, Adarsh Anand, Soniya Manchenahalli Gnanendra Prasadand, Sriram Aralappa	(参考訳) 著者名の曖昧さは、名前の同義語や同義語のため、デジタル図書館において重要な問題である。本稿では,著者の共著者パターンと研究領域に依存して,著者名を現実世界の実体に結びつける手法を提案する。本モデルでは,著者と著者の共著者との関係を捉え,対象著者の出版物のタイトルと出典によって表される研究領域を把握し,著者を特定する。これらの属性は、意味的および象徴的な表現によって符号化される。この目的のために、Bib2AuthはDBLPリポジトリから約22Kの書誌記録を使用し、それぞれの共著者でトレーニングされている。広範な実験により、同じ名前を共有する著者を区別し、異なる名前の作者を識別するアプローチの能力が証明された。 Bib2Authは比較的大きなデータセットで優れたパフォーマンスを示しており、書誌インデックスに直接組み込むことができる。 Author name ambiguity remains a critical open problem in digital libraries due to synonymy and homonymy of names. In this paper, we propose a novel approach to link author names to their real-world entities by relying on their co-authorship pattern and area of research. Our supervised deep learning model identifies an author by capturing his/her relationship with his/her co-authors and area of research, which is represented by the titles and sources of the target author's publications. These attributes are encoded by their semantic and symbolic representations. To this end, Bib2Auth uses ~ 22K bibliographic records from the DBLP repository and is trained with each pair of co-authors. The extensive experiments have proved the capability of the approach to distinguish between authors sharing the same name and recognize authors with different name variations. Bib2Auth has shown good performance on a relatively large dataset, which qualifies it to be directly integrated into bibliographic indices.	翻訳日:2021-07-12 13:37:56 公開日:2021-07-09
# 下流フェアネスのための多精度プロキシ Multiaccurate Proxies for Downstream Fairness ( http://arxiv.org/abs/2107.04423v1 ) ライセンス: Link先を確認	Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, Aaron Roth, and Saeed Sharifi-Malvajerdi	(参考訳) 私たちは、センシティブな機能がトレーニング時に利用できない場合に、人口統計学的公正条件に従わなければならないモデルをトレーニングする問題を調査します。私たちはフェアネスパイプラインの観点を採用しており、センシティブな機能にアクセス可能な"上流"学習者は、他の属性からこれらの機能のプロキシモデルを学びます。プロキシの目標は、一般的な"ダウンストリーム"学習者 -- 予測タスクを最小限の仮定で -- が、プロキシを使用して、真に敏感な機能に対して公平なモデルをトレーニングできるようにすることです。我々は,この目的のために,下流モデルクラスに対する多精度制約に従うことを示し,サンプルおよびoracleの効率的なアルゴリズムと,そのようなプロキシを学ぶための一般化境界を提供する。一般に、多重精度は分類の正確さよりもずっと容易に満足でき、感度の高い特徴が予測しにくい場合でも満足できる。 We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time -- in other words, how can we train a model to be fair by race when we don't have data about race? We adopt a fairness pipeline perspective, in which an "upstream" learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes. The goal of the proxy is to allow a general "downstream" learner -- with minimal assumptions on their prediction task -- to be able to use the proxy to train a model that is fair with respect to the true sensitive features. We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose, and provide sample- and oracle efficient-algorithms and generalization bounds for learning such proxies. In general, multiaccuracy can be much easier to satisfy than classification accuracy, and can be satisfied even when the sensitive features are hard to predict.	翻訳日:2021-07-12 13:37:41 公開日:2021-07-09
# 再記述モデルマイニング Redescription Model Mining ( http://arxiv.org/abs/2107.04462v1 ) ライセンス: Link先を確認	Felix I. Stamm, Martin Becker, Markus Strohmaier, Florian Lemmerich	(参考訳) 本稿では,属性のサブセットのみを共有し,共通インスタンスを持たない2つのデータセットにまたがる解釈可能なパターンを識別する,新しい手法であるRedescription Model Miningを紹介する。特に、再記述モデルマイニング(redescription model mining)は、予測可能なデータサブセットのペア(データセット毎にひとつ)を見つけることを目的としている。これを実現するために、以前は2つの研究領域、例外モデルマイニングと再定義マイニングを組み合わせた。この新しい問題設定のために, 有望なパターンの選択, 効率的なアルゴリズムの提案, 合成データおよび実世界データの可能性を示すための興味深い尺度を開発した。未知のパターンは、データセットにまたがって現れる共通の基礎的な現象をヒントにすることができ、同じデータセットに現れない属性間の(組み合わせ)関連を発見できる。 This paper introduces Redescription Model Mining, a novel approach to identify interpretable patterns across two datasets that share only a subset of attributes and have no common instances. In particular, Redescription Model Mining aims to find pairs of describable data subsets -- one for each dataset -- that induce similar exceptional models with respect to a prespecified model class. To achieve this, we combine two previously separate research areas: Exceptional Model Mining and Redescription Mining. For this new problem setting, we develop interestingness measures to select promising patterns, propose efficient algorithms, and demonstrate their potential on synthetic and real-world data. Uncovered patterns can hint at common underlying phenomena that manifest themselves across datasets, enabling the discovery of possible associations between (combinations of) attributes that do not appear in the same dataset.	翻訳日:2021-07-12 13:37:26 公開日:2021-07-09
# ディープラーニングモバイルトラフィック分類におけるクラスインクリメンタル学習の初見 A First Look at Class Incremental Learning in Deep Learning Mobile Traffic Classification ( http://arxiv.org/abs/2107.04464v1 ) ライセンス: Link先を確認	Giampaolo Bovenzi, Lixuan Yang, Alessandro Finamore, Giuseppe Aceto, Domenico Ciuonzo, Antonio Pescap\`e, Dario Rossi	(参考訳) 近年のDeep Learning(DL)の普及により、トラフィック分類への関心が再燃し、インターネットアプリケーションのトラフィックを特定するためのDLベースの分類器の正確性を示す研究がいくつか行われた。ハードウェアアクセラレータ(GPU、TPU)の助けを借りても、DLモデルのトレーニングは高価であり、インターネットトラフィックの進化する性質、特にモバイルトラフィックに適合するために必要な頻繁なモデル更新を運用する能力を制限する。この問題点に対処するため、本研究では、モデルにフルリトレーニングなしで新しいクラスを追加するためのインクリメンタルラーニング(il)技術を検討し、モデルの更新サイクルをスピードアップします。 iCarlはアートILメソッドのステートであり、MIRAGE-2019は40のAndroidアプリからのトラフィックを持つパブリックデータセットであり、「トラフィック分類に漸進的な学習がある場合」を理解することを目的としている。 iCarl内部を分離することにより、設計を改善する方法について議論し、iCarl+という改訂版に寄与する。当社の分析結果から、il技術はdlベースの自動トラヒック分析システムに向けたロードマップにおいて有望な研究領域である事が分かりました。 The recent popularity growth of Deep Learning (DL) re-ignited the interest towards traffic classification, with several studies demonstrating the accuracy of DL-based classifiers to identify Internet applications' traffic. Even with the aid of hardware accelerators (GPUs, TPUs), DL model training remains expensive, and limits the ability to operate frequent model updates necessary to fit to the ever evolving nature of Internet traffic, and mobile traffic in particular. To address this pain point, in this work we explore Incremental Learning (IL) techniques to add new classes to models without a full retraining, hence speeding up model's updates cycle. We consider iCarl, a state of the art IL method, and MIRAGE-2019, a public dataset with traffic from 40 Android apps, aiming to understand "if there is a case for incremental learning in traffic classification". By dissecting iCarl internals, we discuss ways to improve its design, contributing a revised version, namely iCarl+. 
Despite our analysis reveals their infancy, IL techniques are a promising research area on the roadmap towards automated DL-based traffic analysis systems.	翻訳日:2021-07-12 13:37:11 公開日:2021-07-09
# BayesSimIG:IsaacGymを用いた適応的ドメインランダム化のためのスケーラブルパラメータ推論 BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym ( http://arxiv.org/abs/2107.04527v1 ) ライセンス: Link先を確認	Rika Antonova, Fabio Ramos, Rafael Possas, Dieter Fox	(参考訳) BayesSimは、シミュレーションパラメータの確率自由推論に基づく強化学習における領域ランダム化の統計手法である。本稿では、最近リリースされたNVIDIA IsaacGymと統合されたBayesSimの実装を提供するライブラリであるBayesSimIGの概要を紹介する。この組み合わせにより、エンドツーエンドgpuアクセラレーションによる大規模パラメータ推論が可能になる。推論とシミュレーションの両方にgpuのスピードアップがあり、100以上のシミュレーションパラメータを持つ複雑なロボットタスクに対して、10k以上の並列シミュレーション環境の実行をサポートする。 BayesSimIGは、高次元の後方のスライスを簡単に視覚化するTensorBoardとの統合を提供する。このライブラリはモジュール的な方法で構築され、並列IsaacGym環境から軌跡を収集・処理する新しい方法で研究実験を支援する。 BayesSim is a statistical technique for domain randomization in reinforcement learning based on likelihood-free inference of simulation parameters. This paper outlines BayesSimIG: a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym. This combination allows large-scale parameter inference with end-to-end GPU acceleration. Both inference and simulation get GPU speedup, with support for running more than 10K parallel simulation environments for complex robotics tasks that can have more than 100 simulation parameters to estimate. BayesSimIG provides an integration with TensorBoard to easily visualize slices of high-dimensional posteriors. The library is built in a modular way to support research experiments with novel ways to collect and process the trajectories from the parallel IsaacGym environments.	翻訳日:2021-07-12 13:36:51 公開日:2021-07-09
# ロボット学習のためのタスク推論を支援する行動自己組織化 Behavior Self-Organization Supports Task Inference for Continual Robot Learning ( http://arxiv.org/abs/2107.04533v1 ) ライセンス: Link先を確認	Muhammad Burhan Hafez, Stefan Wermter	(参考訳) ロボット学習の最近の進歩により、ロボットは事前定義されたタスクを習得する能力がますます向上している。一方、人間として、私たちは生涯にわたって増え続けるタスクを学習する能力を持っています。連続的なロボット学習は、ロボットにこの能力を与えることを目標とする、新たな研究方向である。時間とともに新しいタスクを学ぶために、ロボットはまず手元のタスクを推測する必要がある。しかし,タスク推論はマルチタスク学習文学においてほとんど注目されていない。本稿では,ロボット制御タスクの連続学習のための新しい手法を提案する。提案手法は,段階的な自己組織的行動による行動埋め込みの教師なし学習を行う。タスク推論は、タスクよりもパフォーマンスを最適化するために強化学習で訓練されたマルチタスクポリシーへの入力として、環境状態とともに使用される実証行動に最も近い振る舞いを埋め込むことによって行われる。従来の手法とは異なり,本手法ではタスク分布の仮定は行わず,タスクを推論するタスク探索は不要である。並列かつ逐次的に提示されたタスクを用いた実験において,本手法は一般化性能と収束速度,特に連続学習環境において,他のマルチタスク学習手法よりも優れていることを示す。 Recent advances in robot learning have enabled robots to become increasingly better at mastering a predefined set of tasks. On the other hand, as humans, we have the ability to learn a growing set of tasks over our lifetime. Continual robot learning is an emerging research direction with the goal of endowing robots with this ability. In order to learn new tasks over time, the robot first needs to infer the task at hand. Task inference, however, has received little attention in the multi-task learning literature. In this paper, we propose a novel approach to continual learning of robotic control tasks. Our approach performs unsupervised learning of behavior embeddings by incrementally self-organizing demonstrated behaviors. Task inference is made by finding the nearest behavior embedding to a demonstrated behavior, which is used together with the environment state as input to a multi-task policy trained with reinforcement learning to optimize performance over tasks. Unlike previous approaches, our approach makes no assumptions about task distribution and requires no task exploration to infer tasks. 
We evaluate our approach in experiments with concurrently and sequentially presented tasks and show that it outperforms other multi-task learning approaches in terms of generalization performance and convergence speed, particularly in the continual learning setting.	翻訳日:2021-07-12 13:36:38 公開日:2021-07-09
# 流体シミュレーションの低次モデリングと効率的な時間進化のための深層学習 Deep Learning for Reduced Order Modelling and Efficient Temporal Evolution of Fluid Simulations ( http://arxiv.org/abs/2107.04556v1 ) ライセンス: Link先を確認	Pranshu Pant, Ruchit Doshi, Pranav Bahl, Amir Barati Farimani	(参考訳) Reduced Order Modelling (ROM) は、高次力学系の低次で計算コストの低い表現を作成するために広く用いられている。これらの表現を用いて、romはより少ないパラメータを使いながら効率的にフローフィールドをモデル化することができる。従来のROMは高階多様体を低次元空間に直線的に射影することでこれを達成し、プロパー直交分解(POD)のような次元還元手法を用いる。本研究では,非線形射影によって順序状態が減少するニューラルネットワークを構築するための,新しい深層学習フレームワークdl-rom(deep learning- reduced order modelling)を開発した。次に,3次元オートエンコーダと3次元U-Netアーキテクチャを用いて,学習した縮小状態を用いてシミュレーションの時間ステップを効率的に予測する。我々のモデルDL-ROMは、学習したROMから高精度な再構成を生成でき、学習した縮小状態を時間的にトラバースすることで、将来の時間ステップを効率的に予測することができる。これらはすべて、地上の真実を監督したり、高価なNavier-Stokes(NS)方程式を反復的に解決する必要なく達成される。提案手法の有効性と性能を検証するため,計算機流体力学(CFD)データセットを再構成性能と計算ランタイムメトリクスを用いて評価した。 DL-ROMは、許容誤差閾値を維持しながら、反復解法の計算ランタイムを2桁近く削減することができる。 Reduced Order Modelling (ROM) has been widely used to create lower order, computationally inexpensive representations of higher-order dynamical systems. Using these representations, ROMs can efficiently model flow fields while using significantly lesser parameters. Conventional ROMs accomplish this by linearly projecting higher-order manifolds to lower-dimensional space using dimensionality reduction techniques such as Proper Orthogonal Decomposition (POD). In this work, we develop a novel deep learning framework DL-ROM (Deep Learning - Reduced Order Modelling) to create a neural network capable of non-linear projections to reduced order states. We then use the learned reduced state to efficiently predict future time steps of the simulation using 3D Autoencoder and 3D U-Net based architectures. Our model DL-ROM is able to create highly accurate reconstructions from the learned ROM and is thus able to efficiently predict future time steps by temporally traversing in the learned reduced state. 
All of this is achieved without ground truth supervision or needing to iteratively solve the expensive Navier-Stokes (NS) equations, thereby resulting in massive computational savings. To test the effectiveness and performance of our approach, we evaluate our implementation on five different Computational Fluid Dynamics (CFD) datasets using reconstruction performance and computational runtime metrics. DL-ROM can reduce the computational runtimes of iterative solvers by nearly two orders of magnitude while maintaining an acceptable error threshold.	翻訳日:2021-07-12 13:36:20 公開日:2021-07-09
# 良性および悪性眼腫瘍進展推定のための深層学習モデル Deep Learning models for benign and malign Ocular Tumor Growth Estimation ( http://arxiv.org/abs/2107.04220v1 ) ライセンス: Link先を確認	Mayank Goswami	(参考訳) 医療画像データの比較的豊富な可用性は、ニューラルネットワークベースの画像処理手法の開発とテストにおいて重要なサポートを提供している。臨床医は、医療画像データに適した画像処理アルゴリズムを選択する際にしばしば問題に直面する。ここでは、適切なモデルを選択するための戦略を示す。トレーニングデータセットは、100日以上経過した50マウス目の光コヒーレンストモグラフィ(oct)および血管造影(oct−a)画像を含む。このデータには、治療を受けていないマウスの目の画像が含まれている。正常網膜層を有する腫瘍領域の自動(a)分化と3次元眼腫瘍体積のセグメンテーションの4種類のディープラーニング変異体を試験した。深層学習モデルの被曝感度解析は,8つの性能指標を用いて,精度,信頼性,再現性,速度を計測する訓練・試験画像の数に対して行われる。 U-net with UVgg16 is best for malign tumor data set with treatment (have certain variation) and U-net with Inception backbone for benign tumor data (with minor variation)。損失値と根平均二乗誤差(R.M.S.E.) それぞれ最も敏感なパフォーマンス指標と最も敏感なパフォーマンス指標が見られます指標による)性能は、多くのトレーニング画像に関して指数関数的に改善されている。セグメンテッドオクタアンギオグラフィーデータから,血管新生が腫瘍体積を増加させることが示唆された。画像解析により,photodynamic imaging-assisted tumor treatment protocolが積極的に増殖する腫瘍を嚢胞に変化させていることが明らかとなった。画像の数や特徴の種類に応じて、医療専門家が特定のモデルを選択するのに役立つ経験的表現を得る。生体画像解析に特定の深層学習モデルを採用する前に,提案課題を標準的実践として採用することを推奨する。 Relatively abundant availability of medical imaging data has provided significant support in the development and testing of Neural Network based image processing methods. Clinicians often face issues in selecting suitable image processing algorithm for medical imaging data. A strategy for the selection of a proper model is presented here. The training data set comprises optical coherence tomography (OCT) and angiography (OCT-A) images of 50 mice eyes with more than 100 days follow-up. The data contains images from treated and untreated mouse eyes. Four deep learning variants are tested for automatic (a) differentiation of tumor region with healthy retinal layer and (b) segmentation of 3D ocular tumor volumes. Exhaustive sensitivity analysis of deep learning models is performed with respect to the number of training and testing images using eight performance indices to study accuracy, reliability/reproducibility, and speed. 
U-net with UVgg16 performs best on the malign tumor data set with treatment (which has considerable variation), and U-net with an Inception backbone performs best on the benign tumor data (which has minor variation). Loss value and root mean square error (R.M.S.E.) are found to be the most and least sensitive performance indices, respectively. Performance, as measured by the indices, is found to improve exponentially with the number of training images. The segmented OCT-Angiography data shows that neovascularization drives the tumor volume. Image analysis shows that the photodynamic imaging-assisted tumor treatment protocol transforms an aggressively growing tumor into a cyst. An empirical expression is obtained to help medical professionals choose a particular model given the number of images and the types of characteristics. We recommend that the presented exercise be adopted as standard practice before employing a particular deep learning model for biomedical image analysis.	翻訳日:2021-07-12 13:35:39 公開日:2021-07-09
# VMAFとVMAF NEGのハック:異なる前処理に対するメトリクス脆弱性 Hacking VMAF and VMAF NEG: metrics vulnerability to different preprocessing ( http://arxiv.org/abs/2107.04510v1 ) ライセンス: Link先を確認	Maksim Siniukov, Anastasia Antsiferova, Dmitriy Kulikov, Dmitriy Vatolin	(参考訳) ビデオ品質測定は、ビデオ処理アプリケーションの開発において重要な役割を果たす。本稿では,ビデオプリプロセッシングにより,VMAFとそのチューニング耐性バージョンVMAF NEGが人工的に向上可能であることを示す。我々は,vmafを最大218.8%増加させる処理アルゴリズムのパラメータをチューニングするパイプラインを提案する。前処理したビデオの主観的な比較では、ほとんどの方法では、視覚的品質は低下するか、変わらないままである。また,vmaf negスコアは,前処理法によって最大23.6%向上できることを示した。 Video quality measurement plays a critical role in the development of video processing applications. In this paper, we show how popular quality metrics VMAF and its tuning-resistant version VMAF NEG can be artificially increased by video preprocessing. We propose a pipeline for tuning parameters of processing algorithms that allows increasing VMAF by up to 218.8%. A subjective comparison of preprocessed videos showed that with the majority of methods visual quality drops down or stays unchanged. We show that VMAF NEG scores can also be increased by some preprocessing methods by up to 23.6%.	翻訳日:2021-07-12 13:35:12 公開日:2021-07-09
# 補間を伴うブロック交代ブレグマンメジャー化最小化 Block Alternating Bregman Majorization Minimization with Extrapolation ( http://arxiv.org/abs/2107.04395v1 ) ライセンス: Link先を確認	Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis, Masoud Ahookhosh, Panagiotis Patrinos	(参考訳) 本稿では,ブロック相対滑らか関数と固有かつ下半連続ブロック分離関数の和を目的とする非滑らかな非凸最適化問題のクラスを考える。ブロックのクラスに対するブロック近位勾配 (BPG) 法の解析は、ブロック相対滑らか関数のクラスを扱うブレグマンBPG法にうまく拡張されているが、加速されたブレグマンBPG法は不足しており、設計が困難である。本研究では,Nesterov型加速法と最大化最小化法から着想を得たBregman Majorization-Minimization framework with Extrapolation (BMME)を提案する。軽微な仮定の下でBMMEの1次定常点への後続収束を証明し、その大域収束を強い条件下で研究する。直交非負行列分解問題に対するBMMEの有効性について述べる。 In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions have been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.	翻訳日:2021-07-12 13:35:03 公開日:2021-07-09
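For readers unfamiliar with the terminology in the abstract above, the Bregman distance and a generic single-block Bregman proximal gradient step can be written as below. This is the standard textbook form and not the BMME update itself: it omits the extrapolation and the block structure. The symbols $\varphi$ (kernel function), $f$ (relatively smooth part), $g$ (nonsmooth part) and $L$ (relative smoothness constant) are notation assumed here, not taken from the paper.

```latex
% Bregman distance generated by a differentiable convex kernel \varphi
D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle

% If L\varphi - f is convex (f is L-smooth relative to \varphi),
% then f is majorized around any point y by
f(x) \le f(y) + \langle \nabla f(y),\, x - y \rangle + L\, D_\varphi(x, y)

% Generic Bregman proximal gradient step for minimizing f + g:
% minimize the majorizer plus the nonsmooth term
x^{k+1} \in \operatorname*{arg\,min}_{x} \Big\{ g(x) + \langle \nabla f(x^k),\, x - x^k \rangle + L\, D_\varphi(x, x^k) \Big\}
```

With $\varphi(x) = \tfrac{1}{2}\|x\|^2$ the Bregman distance reduces to $\tfrac{1}{2}\|x - y\|^2$ and the step becomes the usual proximal gradient step for $L$-smooth functions.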
# 多経路畳み込みニューラルネットワークによる持続的肺音検出における特徴抽出の効率化 Multi-path Convolutional Neural Networks Efficiently Improve Feature Extraction in Continuous Adventitious Lung Sound Detection ( http://arxiv.org/abs/2107.04226v1 ) ライセンス: Link先を確認	Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Chun-Chieh Chen, Yuan-Ren Cheng, Feipei Lai	(参考訳) 我々は以前, 大きな肺音データベースhf_lung_v2 (lung_v2) を構築した。我々は,Lung_V2に基づいて,吸入,吸入,連続的冒険音(CAS),不連続的冒険音を検出するために,畳み込み二方向ゲートリカレントユニット(CNN-BiGRU)ネットワークを訓練した。しかし,CAS検出性能は多種多様であり,その1つが高度に多様化したCASパターンである。元々のcnn-bigruモデルがcasパターンをより効果的に学習し、計算負荷を過大にしないようにするため、cnn層のネットワークアーキテクチャの最小限の変更を含む3つの戦略について検討した。(1)cnn層を残留ブロックを用いてより深く、(2)cnnカーネルの数を増やしてcnn層を少し大きくし、(3)入力を複数のパスに分離する(モデルはマルチパスcnn-bigruで示される)。 CASセグメントとイベント検出の性能を評価した。その結果,提案したアーキテクチャ修正モデルでCAS検出の改善が認められた。 CASイベント検出のためのF1スコアは0.445から0.491-0.530に増加した。しかし,マルチパスcnn-bigruモデルは,9つの評価指標において,優勝タイトル数 (5) において他のモデルよりも優れていた。さらに、マルチパスCNN-BiGRUモデルでは、元のCNN-BiGRUモデルと比べて余分な計算負荷(0.97倍の推論時間)は生じなかった。結論として、マルチパスCNN層は、特徴抽出の有効性を効率よく改善し、その結果、CAS検出が向上する。 We previously established a large lung sound database, HF_Lung_V2 (Lung_V2). We trained convolutional-bidirectional gated recurrent unit (CNN-BiGRU) networks for detecting inhalation, exhalation, continuous adventitious sound (CAS) and discontinuous adventitious sound at the recording level on the basis of Lung_V2. However, the performance of CAS detection was poor due to many reasons, one of which is the highly diversified CAS patterns. To make the original CNN-BiGRU model learn the CAS patterns more effectively and not cause too much computing burden, three strategies involving minimal modifications of the network architecture of the CNN layers were investigated: (1) making the CNN layers a bit deeper by using the residual blocks, (2) making the CNN layers a bit wider by increasing the number of CNN kernels, and (3) separating the feature input into multiple paths (the model was denoted by Multi-path CNN-BiGRU). The performance of CAS segment and event detection were evaluated. 
Results showed that all the proposed architecture-modified models improved CAS detection. The F1 score for CAS event detection of the proposed models increased from 0.445 to 0.491-0.530, which was deemed significant. However, the Multi-path CNN-BiGRU model outperformed the other models in terms of the number of winning titles (five) out of nine evaluation metrics in total. In addition, the Multi-path CNN-BiGRU model did not cause extra computing burden (0.97-fold inference time) compared to the original CNN-BiGRU model. In conclusion, the Multi-path CNN layers can efficiently improve the effectiveness of feature extraction and subsequently result in better CAS detection.	翻訳日:2021-07-12 13:34:33 公開日:2021-07-09
# 混合訓練とドメイン適応を用いた肺・気管音の呼吸位相の改善と持続的予防音検出 Improved Breath Phase and Continuous Adventitious Sound Detection in Lung and Tracheal Sound Using Mixed Set Training and Domain Adaptation ( http://arxiv.org/abs/2107.04229v1 ) ライセンス: Link先を確認	Fu-Shun Hsu, Shang-Ran Huang, Chang-Fu Su, Chien-Wen Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Chun-Yu Wu, Chung-Wei Chen, Yen-Chun Lai, Tang-Wei Cheng, Nian-Jhen Lin, Wan-Ling Tsai, Ching-Shiang Lu, Chuan Chen, Feipei Lai	(参考訳) 従来, 肺音データベースHF_Lung_V2を構築し, 吸入, 吸入, 持続的興奮音 (CAS) , 不連続的不定音検出能力を有する畳み込み双方向ゲート再帰器 (CNN-BiGRU) モデルを提案した。本研究では, 気管音響データベースHF_Tracheal_V1を構築し, 15秒間気管音響記録の11107, 23087 吸入ラベル, 16728 吸入ラベル, 6874 CASラベルを含む。 HF_Tracheal_V1の気管音とHF_Lung_V2の肺音を組み合わせるか単独でCNN-BiGRUモデルを訓練し,気管音響解析を行った。その結果,(1)完全訓練(スクラッチからトレーニング)を用いて気管音を用いて肺音モデルを訓練し,(2)気管音のみを用いて気管音モデルを訓練すること,(2)気管音と気管音の両方を含む混合セットを用いてモデルを訓練すること,(3)気管音データと予め訓練された肺音モデルを微調整した領域適応を用いること,の2つを比較した。その結果, 気管音響解析では, 肺音のみを訓練したモデルが不十分であった。しかし、混合セットトレーニングとドメイン適応は、肺音における呼気およびCAS検出の性能を改善し、気管音における吸気、呼気、CAS検出を正の制御(肺音でのみ訓練された肺モデルとその逆)と比較して改善することができる。特に2羽の鳥を1羽の石で殺す場合、混合セットトレーニングに由来するモデルが一般的である。 Previously, we established a lung sound database, HF_Lung_V2 and proposed convolutional bidirectional gated recurrent unit (CNN-BiGRU) models with adequate ability for inhalation, exhalation, continuous adventitious sound (CAS), and discontinuous adventitious sound detection in the lung sound. In this study, we proceeded to build a tracheal sound database, HF_Tracheal_V1, containing 11107 of 15-second tracheal sound recordings, 23087 inhalation labels, 16728 exhalation labels, and 6874 CAS labels. The tracheal sound in HF_Tracheal_V1 and the lung sound in HF_Lung_V2 were either combined or used alone to train the CNN-BiGRU models for respective lung and tracheal sound analysis. 
Different training strategies were investigated and compared: (1) using full training (training from scratch) to train the lung sound models using lung sound alone and train the tracheal sound models using tracheal sound alone, (2) using a mixed set that contains both the lung and tracheal sound to train the models, and (3) using domain adaptation that fine-tuned the pre-trained lung sound models with the tracheal sound data and vice versa. Results showed that the models trained only by lung sound performed poorly in the tracheal sound analysis and vice versa. However, the mixed set training and domain adaptation can improve the performance of exhalation and CAS detection in the lung sound, and inhalation, exhalation, and CAS detection in the tracheal sound compared to positive controls (lung models trained only by lung sound and vice versa). In particular, a model derived from the mixed set training prevails when the aim is to kill two birds with one stone.	翻訳日:2021-07-12 13:34:05 公開日:2021-07-09
# ポリフォニック録音におけるブラインド音源分離のためのポリシー勾配によるディープニューラルネットワークの訓練 Training a Deep Neural Network via Policy Gradients for Blind Source Separation in Polyphonic Music Recordings ( http://arxiv.org/abs/2107.04235v1 ) ライセンス: Link先を確認	S\"oren Schulze, Johannes Leuschner, Emily J. King	(参考訳) 音響信号における楽器の音の盲点分離法を提案する。パラメトリックモデルを用いて個々の音色を記述し、調波の相対振幅を捉えるために辞書を訓練する。モデルパラメータは、ディープニューラルネットワークの一種であるu-netを介して予測される。ネットワークは、モデル予測と個々のSTFT時間フレームの差に基づいて、地上の真理情報なしで訓練される。モデルパラメータのいくつかは有用なバックプロパゲーション勾配を与えないため、それらを確率的にモデル化し、代わりにポリシー勾配を用いる。辞書に基づく表現における不正確性を考慮した位相情報を提供するため,ネットワークに直接予測を行い,各楽器の音声信号の合成を行う。ニューラルネットワークの柔軟性のため、不調和性をシームレスに組み込むことができ、入力スペクトルの前処理は不要である。提案手法は,学習のための十分なデータと,楽器のスペクトル特性が辞書に近似されるほど十分に安定していることから,音響的および合成的に様々な音声サンプルに対する干渉が少なく,高品質な分離結果が得られる。 We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual STFT time frames. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. 
Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.	翻訳日:2021-07-12 13:33:30 公開日:2021-07-09
# ReLUアクティベーションを用いたニューラルネットワークのトレーニングにおける勾配流の収束解析 Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation ( http://arxiv.org/abs/2107.04479v1 ) ライセンス: Link先を確認	Arnulf Jentzen and Adrian Riekert	(参考訳) 勾配降下(GD)型最適化スキームは、ニューラルネットワーク(ANN)を修正線形単位(ReLU)アクティベーションで訓練する標準的な方法である。このようなスキームは、ReLU アクティベーションを持つ ANN のトレーニングに関連する勾配流(GF)の離散化と、ReLU アクティベーションを持つ ANN のトレーニングにおける GD 型最適化スキームの数学的収束解析におけるほとんどの重要な困難が、対応する GF 微分方程式の力学に既に存在していると考えられる。この研究は、ReLUアクティベーションと3つの層(入力層1つ、隠蔽層1つ、出力層1つ)を持つANNのトレーニングにおいて、そのようなGF微分方程式を分析する上で重要な課題である。特に、本論文では、対象関数が多次元かつ連続な場合と、入力データの確率分布がルベーグ測度に対して絶対連続である場合において、すべての有界GF軌道のリスクが臨界点のリスクに収束することを証明する。さらに,本論文では, 1次元アフィン線形対象関数の場合と, 入力データの確率分布が標準均一分布と一致する場合において, 初期リスクが十分に小さい場合には, 有界GF軌道のリスクが0に収束することを示す。最後に、隠れた層(1次元の隠蔽層)に1つのニューロンしか存在しない特別な状況において、初期リスクが十分に小さい場合、すべての(必ずしも有界ではない)GF軌道のリスクがゼロに収束することを証明することによって、アフィン線形対象関数に対する上記の名前付き結果を強化する。 Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in the training of ANNs with ReLU activation seem to be already present in the dynamics of the corresponding GF differential equations. It is the key subject of this work to analyze such GF differential equations in the training of ANNs with ReLU activation and three layers (one input layer, one hidden layer, and one output layer). In particular, in this article we prove in the case where the target function is possibly multi-dimensional and continuous and in the case where the probability distribution of the input data is absolutely continuous with respect to the Lebesgue measure that the risk of every bounded GF trajectory converges to the risk of a critical point. 
In addition, in this article we show in the case of a 1-dimensional affine linear target function and in the case where the probability distribution of the input data coincides with the standard uniform distribution that the risk of every bounded GF trajectory converges to zero if the initial risk is sufficiently small. Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer) we strengthen the above-named result for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.	翻訳日:2021-07-12 13:33:12 公開日:2021-07-09
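The relation between gradient flow and gradient descent described in the abstract above can be made concrete with a short sketch. The notation is assumed for illustration and is not taken from the paper: $H$ is the number of hidden neurons, $\theta = (w, b, v, c)$ collects the parameters, $f$ is the target function, $\mu$ is the input distribution, $\gamma$ is a step size, and $\mathcal{G}$ stands for a suitable generalized gradient of the risk (the ReLU is not differentiable at zero, so how $\mathcal{G}$ is defined there is part of the analysis and is left open here). A scalar network output is also assumed.

```latex
% Three-layer (one hidden layer) ReLU network with scalar output
\mathcal{N}_\theta(x) = c + \sum_{j=1}^{H} v_j \max\{ \langle w_j, x \rangle + b_j,\, 0 \}

% Risk: mean squared error against the target f under the input distribution \mu
\mathcal{L}(\theta) = \int \big| \mathcal{N}_\theta(x) - f(x) \big|^2 \, \mu(\mathrm{d}x)

% Gradient flow (GF): continuous-time dynamics
\Theta'(t) = -\mathcal{G}(\Theta(t)), \qquad t \ge 0

% Gradient descent (GD): explicit Euler discretization of the GF with step size \gamma
\Theta_{n+1} = \Theta_n - \gamma\, \mathcal{G}(\Theta_n), \qquad n = 0, 1, 2, \dots
```

In this reading, the convergence statements in the abstract concern the behavior of $\mathcal{L}(\Theta(t))$ as $t \to \infty$.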
# コミュニティ進化予測のためのグループノード注意 Group-Node Attention for Community Evolution Prediction ( http://arxiv.org/abs/2107.04522v1 ) ライセンス: Link先を確認	Matt Revelle, Carlotta Domeniconi, Ben Gelman	(参考訳) ソーシャルネットワークのコミュニティは、人々がネットワークに入り、去るにつれて進化し、活動行動は変化する。時間とともにコミュニティの構造変化を予測するタスクは、コミュニティ進化予測として知られている。この領域における既存の作業は、実際の予測を行うために従来の分類手法を使用しながら、イベントを定義するフレームワークの開発に焦点を当ててきた。本稿では,構造的および時間的情報からコミュニティ進化イベントを予測するための新しいグラフニューラルネットワークを提案する。モデル(GNAN)は、可変サイズの入力と、メンバーおよび隣接ノードの特徴に基づくグループの学習表現を可能にするグループノードアテンションコンポーネントを含む。標準ベースライン法との比較評価を行い,本モデルがベースラインよりも優れていることを示す。さらに,ネットワークの傾向がモデル性能に及ぼす影響を示す。 Communities in social networks evolve over time as people enter and leave the network and their activity behaviors shift. The task of predicting structural changes in communities over time is known as community evolution prediction. Existing work in this area has focused on the development of frameworks for defining events while using traditional classification methods to perform the actual prediction. We present a novel graph neural network for predicting community evolution events from structural and temporal information. The model (GNAN) includes a group-node attention component which enables support for variable-sized inputs and learned representation of groups based on member and neighbor node features. A comparative evaluation with standard baseline methods is performed and we demonstrate that our model outperforms the baselines. Additionally, we show the effects of network trends on model performance.	翻訳日:2021-07-12 13:32:39 公開日:2021-07-09
# 平均場ゲームのためのディープラーニングと平均場制御とファイナンスへの応用 Deep Learning for Mean Field Games and Mean Field Control with Applications to Finance ( http://arxiv.org/abs/2107.04568v1 ) ライセンス: Link先を確認	Ren\'e Carmona and Mathieu Lauri\`ere	(参考訳) 金融市場やより一般的にマクロ経済モデルでは、全てのエージェントの集合行動から生じる価格などの変数を介して相互作用する多数の個人が関与する。平均場ゲームは、プレイヤーの数が無限である極限におけるそのような問題に対するナッシュ均衡を研究するために導入された。この理論は分析ツールと確率ツールの両方を使用して過去10年間に広く開発され、経済学から群集運動まで幅広い応用が発見されている。最近では、機械学習とのインタラクションが関心を集めている。この側面は、複雑な構造、高次元、または共通のランダム性源を持つ非常に大きなゲームを解くことに特に関係している。本章では,平均フィールドゲームとディープラーニングの相互作用に関する文献を,3種類の手法に焦点をあてて検討する。金融アプリケーションに特に重点を置いている。 Financial markets and more generally macro-economic models involve a large number of individuals interacting through variables such as prices resulting from the aggregate behavior of all the agents. Mean field games have been introduced to study Nash equilibria for such problems in the limit when the number of players is infinite. The theory has been extensively developed in the past decade, using both analytical and probabilistic tools, and a wide range of applications have been discovered, from economics to crowd motion. More recently the interaction with machine learning has attracted a growing interest. This aspect is particularly relevant to solve very large games with complex structures, in high dimension or with common sources of randomness. In this chapter, we review the literature on the interplay between mean field games and deep learning, with a focus on three families of methods. A special emphasis is given to financial applications.	翻訳日:2021-07-12 13:32:27 公開日:2021-07-09
# (参考訳) 二階情報の効率的な行列フリー近似と刈り取りと最適化への応用 Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization ( http://arxiv.org/abs/2107.03356v3 ) ライセンス: CC BY 4.0	Elias Frantar, Eldar Kurtic, Dan Alistarh	(参考訳) 損失関数の局所曲率情報を効率的に近似することは、ディープニューラルネットワークの最適化と圧縮の鍵となるツールである。しかし、既存の2次情報を近似する手法の多くは計算コストやストレージコストが高く、実用性を制限できる。本研究では,経験的フィッシャー行列によるヘッシアンの古典的な近似のように,ヘッシアンをランク1の行列の和として近似できる場合の逆ヘッシアンベクトル積(ihvps)を推定するための行列フリーな線形時間アプローチについて検討する。 M-FACと呼ばれるフレームワークの一部として、2つの新しいアルゴリズムを提案する: 最初のアルゴリズムはネットワーク圧縮に最適化され、逆 Hessian の任意の要素に対して$O(dm^2)$プリ計算、$O(dm)$計算、$O(dm)$クエリコスト$O(m)$で階数1の行列の和として与えられる場合、次元$d$で IHVPを計算できる。第2のアルゴリズムは最適化設定を目標とし,最適化ステップのスライディングウィンドウ上で推定される逆ヘシアンと,事前条件付きSGDに必要な勾配方向との間の積の計算を行う。 IHVPの計算に$O(dm + m^2)$と$O(dm + m^3)$を、スライディングウィンドウから勾配を追加したり取り除いたりするためのアルゴリズムを与える。これら2つのアルゴリズムは、既存の二階法に比べて計算オーバーヘッドの少ないネットワークプルーニングと最適化に最先端の結果をもたらす。実装は[10]と[18]で利用可能です。 Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and can compute the IHVP for dimension $d$, if the Hessian is given as a sum of $m$ rank-one matrices, using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP, and query cost $O(m)$ for any single element of the inverse Hessian. 
The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [10] and [18].	翻訳日:2021-07-12 11:20:14 公開日:2021-07-09
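To illustrate how an inverse-Hessian vector product can be computed without ever forming a $d \times d$ matrix when the Hessian is a sum of rank-one terms, here is a minimal sketch based on repeated Sherman-Morrison updates. It is not the M-FAC implementation. It assumes a damped empirical-Fisher-style matrix $H = \lambda I + \frac{1}{m}\sum_i g_i g_i^\top$, and the function names (`precompute`, `ihvp`, `hvp`) and the damping parameter `damp` are hypothetical. Under these assumptions the precomputation costs $O(dm^2)$ and each product costs $O(dm)$, which matches the complexities quoted in the abstract above.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def precompute(grads, damp):
    """Build the vectors needed for matrix-free products with H^{-1}.

    Assumed model (an illustration, not the paper's code):
        H = damp * I + (1/m) * sum_i g_i g_i^T
    For each gradient g_i we store v_i = H_{i-1}^{-1} g_i and
    q_i = m + g_i^T v_i, where H_{i-1} contains the first i-1 rank-one terms.
    """
    m = len(grads)
    vs, qs = [], []
    for g in grads:
        # Start from H_0^{-1} g = g / damp, then apply earlier corrections.
        v = [gi / damp for gi in g]
        for vj, qj in zip(vs, qs):
            c = dot(vj, g) / qj
            v = [vi - c * vji for vi, vji in zip(v, vj)]
        vs.append(v)
        qs.append(m + dot(g, v))
    return vs, qs


def ihvp(x, vs, qs, damp):
    """Return H^{-1} x using only O(d * m) work and no d x d matrix."""
    out = [xi / damp for xi in x]
    for v, q in zip(vs, qs):
        c = dot(v, x) / q
        out = [oi - c * vi for oi, vi in zip(out, v)]
    return out


def hvp(y, grads, damp):
    """Return H y for the same assumed H (used here only to check ihvp)."""
    m = len(grads)
    out = [damp * yi for yi in y]
    for g in grads:
        c = dot(g, y) / m
        out = [oi + c * gi for oi, gi in zip(out, g)]
    return out
```

A quick sanity check is to apply `hvp` to the output of `ihvp` and confirm that the original vector is recovered up to floating-point error.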
# (参考訳) 3次元胸部CT画像によるCovid-19検出のためのハイブリッドディープラーニングフレームワーク A hybrid deep learning framework for Covid-19 detection via 3D Chest CT Images ( http://arxiv.org/abs/2107.03904v2 ) ライセンス: CC BY 4.0	Shuang Liang	(参考訳) 本稿では,畳み込みニューラルネットワークとトランスフォーマーを組み合わせた3次元胸部CT画像によるCOVID-19検出のためのハイブリッドディープラーニングフレームワークCTNetを提案する。これは、CTスキャンから十分な特徴を抽出するためにSEが注目するCNN特徴抽出モジュールと、3D CTスキャンの識別特徴をモデル化するトランスフォーマーモデルで構成されている。従来の研究と比較すると、CTNetは、データ再サンプリング戦略を備えた3D CTスキャンによる新型コロナウイルスの診断を効果的かつ効率的に行う方法を提供している。大規模かつパブリックなベンチマークによる高度な結果、COV19-CT-DBデータベースは、提案されたCTNetによって達成された。 In this paper, we present a hybrid deep learning framework named CTNet which combines a convolutional neural network and a transformer for the detection of COVID-19 via 3D chest CT images. It consists of a CNN feature extractor module with SE attention to extract sufficient features from CT scans, together with a transformer model to model the discriminative features of the 3D CT scans. Compared to previous works, CTNet provides an effective and efficient method to perform COVID-19 diagnosis via 3D CT scans with a data resampling strategy. On a large public benchmark, the COV19-CT-DB database, the proposed CTNet achieved better results than the state-of-the-art baseline approach proposed together with the dataset.	翻訳日:2021-07-12 10:46:22 公開日:2021-07-09
# 特徴解釈と時空間解析を用いた機械学習に基づく沿岸水質予測 Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis ( http://arxiv.org/abs/2107.03230v2 ) ライセンス: Link先を確認	Luka Grb\v{c}i\'c, Sini\v{s}a Dru\v{z}eta, Goran Mau\v{s}a, Tomislav Lipi\'c, Darija Vuki\'c Lu\v{s}i\'c, Marta Alvir, Ivana Lu\v{c}in, Ante Sikirica, Davor Davidovi\'c, Vanja Trava\v{s}, Daniela Kalafatovi\'c, Kristina Pikelj, Hana Fajkovi\'c, Toni Holjevi\'c and Lado Kranj\v{c}evi\'c	(参考訳) 沿岸水質管理は公衆衛生上の問題であり、沿岸水質の悪化は人の健康に危険である病原体を収容することができる。観光志向の国は、夏季の観光名所で沿岸水の状態を積極的に監視する必要がある。本研究では,クロアチアのリイェカ市にある15か所の公衆ビーチを対象に,escherichia\ coli$とenterococciの定期的モニタリングデータを用いて,環境パラメータに基づいてレベルを予測する機械学習モデルを構築し,環境ストレスとの関連性について検討した。勾配ブースティング (catboost, xgboost) , ランダム林, サポートベクター回帰, 人工ニューラルネットを全てのサンプリングサイトから測定し, 環境特性に基づくe.\ coli$およびenterococci値の予測に用いた。機械学習モデルの10倍クロスバリデーション解析による安定性と一般化性の評価は,xgboost,ランダムフォレスト,サポートベクター回帰,ニューラルネットワークなど他の評価mlアルゴリズムと比較して,それぞれ0.71,0.68のr$^2$値で最高性能を示した。また、SHapley Additive exPlanations技術を用いて、最も予測力のある特徴を特定し、解釈する。その結果, 塩分濃度はE.\ Coli$ と enterococci の両方を推定する上で最も重要な特徴であることがわかった。最後に, 沿岸水質の低い地点において, 両方のMLモデルの空間的および時間的精度について検討した。スペースは$e。 Coli$およびEnterococciモデルは0.85および0.83の強いR$^2$値、時間モデルは0.74および0.67のR$^2$値を得た。また, 沿岸水質の高い地点では, 適度なR$^2$値0.44および0.46を達成した。 Coastal water quality management is a public health concern, as poor coastal water quality can harbor pathogens that are dangerous to human health. Tourism-oriented countries need to actively monitor the condition of coastal water at tourist popular sites during the summer season. In this study, routine monitoring data of $Escherichia\ Coli$ and enterococci across 15 public beaches in the city of Rijeka, Croatia, were used to build machine learning models for predicting their levels based on environmental parameters as well as to investigate their relationships with environmental stressors. 
Gradient Boosting (Catboost, Xgboost), Random Forests, Support Vector Regression and Artificial Neural Networks were trained with measurements from all sampling sites and used to predict $E.\ Coli$ and enterococci values based on environmental features. The evaluation of the stability and generalizability of the machine learning models with 10-fold cross validation showed that the Catboost algorithm performed best, with R$^2$ values of 0.71 and 0.68 for predicting $E.\ Coli$ and enterococci, respectively, compared to the other evaluated ML algorithms (Xgboost, Random Forests, Support Vector Regression and Artificial Neural Networks). We also use the SHapley Additive exPlanations technique to identify and interpret which features have the most predictive power. The results show that the salinity measured at the site is the most important feature for forecasting both $E.\ Coli$ and enterococci levels. Finally, the spatial and temporal accuracy of both ML models was examined at sites with the lowest coastal water quality. The spatial $E.\ Coli$ and enterococci models achieved strong R$^2$ values of 0.85 and 0.83, while the temporal models achieved R$^2$ values of 0.74 and 0.67. The temporal model also achieved moderate R$^2$ values of 0.44 and 0.46 at a site with high coastal water quality.	翻訳日:2021-07-12 10:39:04 公開日:2021-07-09
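The R$^2$ values and the 10-fold cross validation mentioned in the abstract above can be illustrated with a small, dependency-free sketch. This is not the authors' evaluation code. The helper names `r2_score` and `k_fold_indices` are hypothetical, and the folds here are contiguous and unshuffled, which is an assumption made only to keep the example deterministic.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot would be zero).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test
```

In a typical workflow a model would be fitted on each training split, `r2_score` would be computed on the matching test split, and the k scores would then be averaged. A perfect prediction gives 1.0, and always predicting the mean of the true values gives 0.0.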
# 部分的スーパービジョンのためのラベルセット損失関数:胎児脳MRI解析への応用 Label-set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation ( http://arxiv.org/abs/2107.03846v2 ) ライセンス: Link先を確認	Lucas Fidon, Michael Aertsen, Doaa Emam, Nada Mufti, Fr\'ed\'eric Guffens, Thomas Deprest, Philippe Demaerel, Anna L. David, Andrew Melbourne, S\'ebastien Ourselin, Jan Deprest, Tom Vercauteren	(参考訳) ディープニューラルネットワークは自動セグメンテーションの精度を高めているが、その精度は多数の完全セグメンテーションされた画像の可用性に依存する。部分的に注釈付きデータセットをうまく活用するためには、興味のある領域がセグメンテーションされている画像を使ってディープニューラルネットワークを訓練する方法が必要である。本稿では,部分分割画像を扱うことができる損失関数であるラベルセット損失関数の最初の公理的定義を提案する。完全分割画像に対する古典的損失関数を適切なラベルセット損失関数に変換する方法は1つと1つしかないことを証明した。我々の理論は、特に欠落ラベルしか持たない部分的な監督に適したディース損失のラベルセット一般化であるリーフ・ディース損失を定義できる。葉分裂損失を用いて,胎児脳3次元mri分割のための部分教師あり学習における新しい状態を設定した。白質、心室、小脳、室外csf、皮質灰白質、深灰白質、脳幹、コーパスカルーサムを解剖学的に正常な胎児の胎児脳3dmriまたは開放性スピナビフィダに基づいて分節することができる深層ニューラルネットワークを実現する。提案するラベルセット損失関数の実装は、https://github.com/lucasfidon/label-set-loss-functionsで利用可能です。 Deep neural networks have increased the accuracy of automatic segmentation, however, their accuracy depends on the availability of a large number of fully segmented images. Methods to train deep neural networks using images for which some, but not all, regions of interest are segmented are necessary to make better use of partially annotated datasets. In this paper, we propose the first axiomatic definition of label-set loss functions that are the loss functions that can handle partially segmented images. We prove that there is one and only one method to convert a classical loss function for fully segmented images into a proper label-set loss function. Our theory also allows us to define the leaf-Dice loss, a label-set generalization of the Dice loss particularly suited for partial supervision with only missing labels. Using the leaf-Dice loss, we set a new state of the art in partially supervised learning for fetal brain 3D MRI segmentation. 
We achieve a deep neural network able to segment white matter, ventricles, cerebellum, extra-ventricular CSF, cortical gray matter, deep gray matter, brainstem, and corpus callosum based on fetal brain 3D MRI of anatomically normal fetuses or with open spina bifida. Our implementation of the proposed label-set loss functions is available at https://github.com/LucasFidon/label-set-loss-functions	翻訳日:2021-07-12 10:38:20 公開日:2021-07-09
# SCSS-Net:3次元屋内シーンのためのスーパーポイント制約付き半教師付きセグメンテーションネットワーク SCSS-Net: Superpoint Constrained Semi-supervised Segmentation Network for 3D Indoor Scenes ( http://arxiv.org/abs/2107.03601v2 ) ライセンス: Link先を確認	Shuang Deng, Qiulei Dong, and Bo Liu	(参考訳) 3Dポイントクラウドセマンティックセグメンテーションのための既存のディープニューラルネットワーク(DNN)の多くは、大量のラベル付きトレーニングデータを必要とする。しかし、複雑なシーンにポイントレベルのラベルを手動で割り当てるのには時間がかかる。ラベルのない点雲はセンサや再構成から容易に得ることができるが,SCSS-Netと呼ばれる3次元点雲のための超点制約付き半教師付きセグメンテーションネットワークを提案する。具体的には,ラベルのない点雲から予測された擬似ラベルを自己学習に利用し,幾何ベースおよび色ベースの領域拡大アルゴリズムによって生成されたスーパーポイントを組み合わせて,疑似ラベルを低信頼で修正・削除する。さらに,特徴を幾何学や色彩のエッジポイントから制約するエッジ予測モジュールを提案する。各スーパーポイントの特徴を円滑にするために、スーパーポイント特徴集合モジュールとスーパーポイント特徴整合損失関数を導入する。 2つの公開屋内データセットにおける広範囲な実験結果から,最先端のクラウドセグメンテーションネットワークや,ラベル付きシーンの少ない半教師付きセグメンテーション手法よりも優れた性能が得られることが示された。 Many existing deep neural networks (DNNs) for 3D point cloud semantic segmentation require a large amount of fully labeled training data. However, manually assigning point-level labels in complex scenes is time-consuming. Since unlabeled point clouds can be easily obtained from sensors or reconstruction, we propose a superpoint constrained semi-supervised segmentation network for 3D point clouds, named SCSS-Net. Specifically, we use the pseudo labels predicted from unlabeled point clouds for self-training, and the superpoints produced by geometry-based and color-based Region Growing algorithms are combined to modify and delete pseudo labels with low confidence. Additionally, we propose an edge prediction module to constrain the features from edge points of geometry and color. A superpoint feature aggregation module and superpoint feature consistency loss functions are introduced to smooth the point features in each superpoint. Extensive experimental results on two 3D public indoor datasets demonstrate that our method can achieve better performance than some state-of-the-art point cloud segmentation networks and some popular semi-supervised segmentation methods with few labeled scenes.	翻訳日:2021-07-12 10:37:27 公開日:2021-07-09
# deep metric learning を用いた悪性リンパ腫の弱アノテート大きな病理組織像に対するケースベース類似画像検索 Case-based similar image retrieval for weakly annotated large histopathological images of malignant lymphoma using deep metric learning ( http://arxiv.org/abs/2107.03602v2 ) ライセンス: Link先を確認	Noriaki Hashimoto, Yusuke Takagi, Hiroki Masuda, Hiroaki Miyoshi, Kei Kohno, Miharu Nagaishi, Kensaku Sato, Mai Takeuchi, Takuya Furuta, Keisuke Kawamoto, Kyohei Yamada, Mayuko Moritsubo, Kanako Inoue, Yasumasa Shimasaki, Yusuke Ogura, Teppei Imamoto, Tatsuzo Mishina, Koichi Ohshima, Hidekata Hontani, Ichiro Takeuchi	(参考訳) そこで本研究では,ヘマトキシリンとエオシン(H&E)による悪性リンパ腫の組織像を検索する新しい症例ベース類似画像検索法を提案する。全身のスライド画像(WSI)を入力クエリとして使用する場合,腫瘍細胞などの病理学的に重要な領域のイメージパッチに着目して,同様の症例を検索できることが望ましい。この問題に対処するために,注意に基づく複数インスタンス学習を採用し,症例間の類似性を計算する際に腫瘍特異的領域に着目した。さらに,免疫組織化学的(ihc)染色パターンを,異種悪性リンパ腫の適切な類似性を定義するための教師付き情報として組み込むために,対比的距離測定を行った。 249例の悪性リンパ腫に対する実験において,本手法はsir法よりも高い評価基準を示した。また, 病理医による主観的評価により, 悪性リンパ腫に対するh&e染色組織像の類似性を表すために, ihc染色パターンを用いた類似度測定が適切であった。 In the present study, we propose a novel case-based similar image retrieval (SIR) method for hematoxylin and eosin (H&E)-stained histopathological images of malignant lymphoma. When a whole slide image (WSI) is used as an input query, it is desirable to be able to retrieve similar cases by focusing on image patches in pathologically important regions such as tumor cells. To address this problem, we employ attention-based multiple instance learning, which enables us to focus on tumor-specific regions when the similarity between cases is computed. Moreover, we employ contrastive distance metric learning to incorporate immunohistochemical (IHC) staining patterns as useful supervised information for defining appropriate similarity between heterogeneous malignant lymphoma cases. In the experiment with 249 malignant lymphoma patients, we confirmed that the proposed method exhibited higher evaluation measures than the baseline case-based SIR methods. 
Furthermore, the subjective evaluation by pathologists revealed that our similarity measure using IHC staining patterns is appropriate for representing the similarity of H&E-stained tissue images for malignant lymphoma.	翻訳日:2021-07-12 10:37:06 公開日:2021-07-09
# マルチタスク感情分析のための特徴ピラミッドネットワーク Feature Pyramid Network for Multi-task Affective Analysis ( http://arxiv.org/abs/2107.03670v2 ) ライセンス: Link先を確認	Ruian He, Zhen Xing, Weimin Tan, Bo Yan	(参考訳) Affective Analysisは単一のタスクではなく、valence-arousal値、式クラス、アクションユニットを同時に予測することができる。これまでの研究では、これら3つの顔属性の絡み合いや階層関係を無視して、全体的タスクとして捉えられなかった。マルチタスク影響分析のための特徴ピラミッドネットワークという新しいモデルを提案する。階層的特徴を抽出して3つのラベルを予測し,事前学習されたシングルタスクモデルから学習するための教師学生訓練戦略を適用する。実験の結果,提案モデルが他のモデルより優れており,本論文はABAW(Affective Behavior Analysis in-wild)の第2ワークショップおよびコンペティションに提出されている。コードとモデルは、https://github.com/ryanhe312/ABAW2-FPNMAAで研究目的で利用可能である。 Affective Analysis is not a single task, and the valence-arousal value, expression class and action unit can be predicted at the same time. Previous research either failed to treat them as a single overall task or ignored the entanglement and hierarchical relation of these three facial attributes. We propose a novel model named feature pyramid network for multi-task affective analysis. The hierarchical features are extracted to predict the three labels, and we apply a teacher-student training strategy to learn from pretrained single-task models. Extensive experimental results demonstrate that the proposed model outperforms other models. This is a submission to The 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). The code and model are available for research purposes at https://github.com/ryanhe312/ABAW2-FPNMAA.	翻訳日:2021-07-12 10:36:47 公開日:2021-07-09
# 深部神経回路を用いた耳部CT画像における顎骨内解剖のアトラスによる分類 Atlas-Based Segmentation of Intracochlear Anatomy in Metal Artifact Affected CT Images of the Ear with Co-trained Deep Neural Networks ( http://arxiv.org/abs/2107.03987v2 ) ライセンス: Link先を確認	Jianing Wang, Dingjie Su, Yubo Fan, Srijata Chakravorti, Jack H. Noble, and Benoit M. Dawant	(参考訳) 本稿では,アトラス内のメッシュ間のポイント・ツー・ポイント対応を保った人工内耳インプラント(ci)受像者の術後ct画像中の人工内耳解剖(ica)をアトラスベースで分割する手法を提案する。インプラントが生成する強いアーティファクトにより困難であるこの問題を解決するために, 対向方向に高密度変形場(ddfs)を発生させる2対の共学習深層ネットワークを用いた。 1つのネットワークは、アトラス画像をポストCT画像に登録し、もう1つのネットワークは、ポストCT画像をアトラス画像に登録する。ネットワークは、voxel-wiseラベル、画像内容、fiducial registration error、およびcycle-consistency制約に基づく損失関数を用いてトレーニングされる。その後、トレーニングされた登録ネットワークによって生成された対応するDFFを用いて、アトラス画像中のICAの予め定義されたセグメンテーションメッシュをポストCT画像に転送することにより、ポストCT画像中のICAのセグメンテーションを得る。本モデルでは,金属工芸品によって隠蔽されているにもかかわらず,ICAの基盤となる幾何学的特徴を学習することができる。この手法は,まず条件付き生成逆数ネットワークを用いてPost-CT画像からアーティファクトのない画像を合成し,その後,活性形状モデルを用いてICAを合成画像に分割する手法である。提案手法は,エンドユーザの受け入れに重要なSOTAに必要な時間の一部を要している。 We propose an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, we use a pair of co-trained deep networks that generate dense deformation fields (DDFs) in opposite directions. One network is tasked with registering an atlas image to the Post-CT images and the other network is tasked with registering the Post-CT images to the atlas image. The networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and cycle-consistency constraint. 
The segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks. Our model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts. We show that our end-to-end network produces results that are comparable to the current state of the art (SOTA), which relies on a two-step approach that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images. Our method requires a fraction of the time needed by the SOTA, which is important for end-user acceptance.	翻訳日:2021-07-12 10:36:36 公開日:2021-07-09

Title

Authors

Abstract

論文公表日・翻訳日

# ファンデルワールスヘテロ界面におけるマグノン-励起子近接結合

Magnon-exciton proximity coupling at a van der Waals heterointerface ( http://arxiv.org/abs/2006.14257v2 )

ライセンス: Link先を確認

Arnaud Gloppe and Masaru Onga and Ryusuke Hisatomi and Atac Imamoglu and Yasunobu Nakamura and Yoshihiro Iwasa and Koji Usami

(参考訳) スピンとフォトニックシステムは、現代の情報デバイスや新しい量子技術の中心にある。半導体の電子ホール対(励起子)と磁性結晶の集団スピン励起(マグノン)の相互作用は、これらの異種系を橋渡しし、新しい相互接続デバイスにおける個々の資産を活用する。本稿では,磁性薄膜と原子薄膜半導体との界面におけるマグノン-エキシトン結合について報告する。我々のアプローチは、イットリウム鉄ガーネット (YIG) フィルムに宿る長寿命マグノンを遷移金属ジアルコゲナイド (MoSe$_2$) のフレーク中の強結合励起子に結合させる。マグノンは界面交換相互作用によって支配される動的谷ゼーマン効果を励起子に誘導する。この初期のハイブリッドシステムは、マイクロ波と光領域間の情報伝達の新しい機会を示唆している。

Spin and photonic systems are at the heart of modern information devices and emerging quantum technologies. An interplay between electron-hole pairs (excitons) in semiconductors and collective spin excitations (magnons) in magnetic crystals would bridge these heterogeneous systems, leveraging their individual assets in novel interconnected devices. Here, we report the magnon-exciton coupling at the interface between a magnetic thin film and an atomically-thin semiconductor. Our approach allies the long-lived magnons hosted in a film of yttrium iron garnet (YIG) to strongly-bound excitons in a flake of a transition metal dichalcogenide, MoSe$_2$. The magnons induce on the excitons a dynamical valley Zeeman effect ruled by interfacial exchange interactions. This nascent class of hybrid system suggests new opportunities for information transduction between microwave and optical regions.

翻訳日:2023-05-12 20:03:37 公開日:2021-07-09

# 連続作業証明の圧縮Oracle技術とポスト量子セキュリティについて

On the Compressed-Oracle Technique, and Post-Quantum Security of Proofs of Sequential Work ( http://arxiv.org/abs/2010.11658v4 )

ライセンス: Link先を確認

Kai-Min Chung, Serge Fehr, Yu-Hsuan Huang, Tai-Ning Liao

(参考訳) 我々は、zhandryが量子ランダムオラクルモデル(qrom)で量子アルゴリズムを分析するために導入した圧縮オラクル技術について再検討する。まず、並列クエリQROMに容易に拡張できる手法の簡潔な説明を行い、各クエリラウンドにおいて、考慮されたアルゴリズムが複数のクエリを並列にQROMに生成する。このQROMの変形により、よりきめ細かいクエリ・複雑度解析が可能になる。我々の主な技術的貢献は、クエリ複雑性の結果を証明するために圧縮されたオラクル技術を使用する(並列クエリの一般化)フレームワークである。我々のフレームワークが組み込まれているため、任意の場合に、純粋に古典的な推論によって量子的クエリの複雑さを下げることが可能である。それよりも、典型的には古典的境界をもたらす重要な古典的観測は、対応する量子境界を結論付けるのに十分である。我々はこれをいくつかの例で示し、既知の結果(並列Groverの最適性など)を復元すると同時に、新しい結果(並列BHT衝突探索の最適性など)を得る。私たちの主なターゲットは、$q$以下の並列クエリ、すなわち、$x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$。上記のハッシュ連鎖を見つける問題は、シーケンシャルな作業の証明の文脈において重要な問題である。実際、我々の技術の具体的な暗号的応用として、CohenとPietrzakが提唱した"Simple Proofs of Sequential Work"が量子攻撃に対して安全であることを示す。このような分析は、単に新しいバウンドをプラグインすることではなく、プロトコル全体を量子攻撃の光で分析する必要がある。私たちのフレームワークのおかげで、これは純粋に古典的な推論で実現できます。

We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows for a more fine-grained query-complexity analysis. Our main technical contribution is a framework that simplifies the use of (the parallel-query generalization of) the compressed oracle technique for proving query complexity results. With our framework in place, whenever applicable, it is possible to prove quantum query complexity lower bounds by means of purely classical reasoning. More than that, for typical examples the crucial classical observations that give rise to the classical bounds are sufficient to conclude the corresponding quantum bounds. We demonstrate this on a few examples, recovering known results (like the optimality of parallel Grover), but also obtaining new results (like the optimality of parallel BHT collision search). Our main target is the hardness of finding a $q$-chain with fewer than $q$ parallel queries, i.e., a sequence $x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$. The above problem of finding a hash chain is of fundamental importance in the context of proofs of sequential work. Indeed, as a concrete cryptographic application of our techniques, we prove that the "Simple Proofs of Sequential Work" proposed by Cohen and Pietrzak remains secure against quantum attacks. Such an analysis is not simply a matter of plugging in our new bound; the entire protocol needs to be analyzed in the light of a quantum attack. Thanks to our framework, this can now be done with purely classical reasoning.

翻訳日:2023-04-28 01:08:00 公開日:2021-07-09
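
The $q$-chain defined in the abstract above, $x_i = H(x_{i-1})$ for $1 \leq i \leq q$, can be illustrated with a small classical sketch. This is only an illustration of the definition, not the paper's construction or proof: SHA-256 is used here as an assumed stand-in for the random oracle $H$, and the function names `hash_chain` and `is_q_chain` are hypothetical.

```python
import hashlib


def hash_chain(x0, q):
    """Return [x_0, x_1, ..., x_q] with x_i = H(x_{i-1}).

    H is SHA-256 here, an assumption standing in for the random oracle.
    The loop is inherently sequential: x_i cannot be computed before x_{i-1}.
    """
    chain = [x0]
    for _ in range(q):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def is_q_chain(chain):
    """Check that every element is the hash of its predecessor."""
    return all(
        hashlib.sha256(prev).digest() == cur
        for prev, cur in zip(chain, chain[1:])
    )


chain = hash_chain(b"seed", 5)
print(len(chain))         # q + 1 elements
print(is_q_chain(chain))  # True for an honestly computed chain
```

Computing the chain honestly takes $q$ sequential evaluations of $H$; the hardness statement in the abstract concerns whether a quantum algorithm can do the same with fewer than $q$ rounds of parallel queries.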

# 多成分散乱マトリックスの調査:ユニタリティと対称性

Surveying the Multicomponent Scattering Matrix: Unitarity and Symmetries ( http://arxiv.org/abs/2010.15926v2 )

ライセンス: Link先を確認

L. Diago-Cisneros, J. J. Flores-Godoy and G. Fernández-Anaya

(参考訳) 成分が混合および同期的に伝播するspm電荷キャリアの多成分多バンドフラックスは、非ゼロの入射振幅を持つが、任意の基底集合に対して散乱行列上の標準ユニタリティ条件に従わない。そのような場合には、ユニタリティー保存のための量子輸送問題の基本となるロバストな理論手順を導出し、その名称は \emph{structured unitarity condition} に因む。我々のアプローチは、包絡関数近似(EFA)内の相互作用成分$(N \times N)$(N \geq 2$)を扱い、しかしながら、$N = 1$)散乱行列の標準ユニタリ特性を回復する。むしろ基底集合および/または出力散乱係数に対する任意の条件は、構成とスピノリアル空間の両方において \emph{eigen}-函数が正規化されている場合、もはや必要ではない。エルミート・ハミルトニアン(Hermitian Hamiltonian)によってEFA内で記述された様々な種類のマルチバンド・マルチコンポーネント物理系に対して、このモデルが有効であると期待する。我々は、状態ベクトル伝達行列の相互作用を、その条件数の大きい値とともに予測し、散乱実験におけるトンネルチャネルの閾値をより正確に定義するための新しい補完的ツールである。

Multicomponent-multiband fluxes of spin-charge carriers, whose components propagate mixed and synchronously, with \emph{a priori} nonzero incoming amplitudes, do not obey the standard unitarity condition on the scattering matrix for an arbitrary basis set. For such cases, we have derived a robust theoretical procedure, which is fundamental in quantum-transport problems for unitarity preservation and which we have named the \emph{structured unitarity condition}. Our approach deals with $(N \times N)$ interacting components (for $N \geq 2$), within the envelope function approximation (EFA), and yet the standard unitary properties of the ($N = 1$) scattering matrix are recovered. Rather arbitrary conditions on the basis set and/or on the output scattering coefficients are no longer required if the \emph{eigen}-functions are orthonormalized in both the configuration and the spinorial spaces. We expect the present model to be workable for different kinds of multiband-multicomponent physical systems described by Hermitian Hamiltonians within the EFA, with small transformations if any. We foretell the interplay for the state-vector transfer matrix, together with the large values of its condition number, as novel complementary tools for a more accurate definition of the threshold for tunnelling channels in a scattering experiment.

翻訳日:2023-04-27 00:29:36 公開日:2021-07-09

# 空洞を介する相関トンネルによる自己組織型トポロジカル絶縁体

Self-organized topological insulator due to cavity-mediated correlated tunneling ( http://arxiv.org/abs/2011.01687v3 )

ライセンス: Link先を確認

Titas Chanda, Rebecca Kraus, Giovanna Morigi, Jakub Zakrzewski

(参考訳) トポロジカル材料は量子技術に潜在的な応用がある。位相絶縁体や超伝導体などの非相互作用型位相材料は基本対称性クラスによって分類される。その代わりに、相互作用がトポロジカルな性質にどのように影響するかを部分的に理解しているだけである。本稿では,単粒子力学と大域的相互作用の量子干渉からトポロジーが出現するモデルについて述べる。このシステムは、1次元格子内の大域的相関ホッピングを介して相互作用するソフトコアボソンによって構成される。量子干渉の開始は格子変換対称性の自発的な破れにつながり、対応する位相は有名なSu-Schriefer-Heegerモデルの非自明な状態に似ている。フェルミオンピエルズ不安定性と同様に、出現する量子相はトポロジカル絶縁体であり、半分の充填で見られる。量子干渉から派生したこの位相位相は「正確な」密度行列再正規化群計算で見られ、平均場アプローチでは完全に欠落している。これらのダイナミクスはキャビティ量子電磁力学の設定のような既存の実験プラットフォームで実現可能であり、共振器から放出される光で位相的特徴が明らかにできると主張している。

Topological materials have potential applications for quantum technologies. Non-interacting topological materials, such as topological insulators and superconductors, are classified by means of fundamental symmetry classes. It is instead only partially understood how interactions affect topological properties. Here, we discuss a model where topology emerges from the quantum interference between single-particle dynamics and global interactions. The system is composed of soft-core bosons that interact via global correlated hopping in a one-dimensional lattice. The onset of quantum interference leads to spontaneous breaking of the lattice translational symmetry; the corresponding phase resembles nontrivial states of the celebrated Su-Schrieffer-Heeger model. Like the fermionic Peierls instability, the emerging quantum phase is a topological insulator and is found at half filling. Originating from quantum interference, this topological phase is found in "exact" density-matrix renormalization group calculations and is entirely absent in the mean-field approach. We argue that these dynamics can be realized in existing experimental platforms, such as cavity quantum electrodynamics setups, where the topological features can be revealed in the light emitted by the resonator.

翻訳日:2023-04-25 11:48:16 公開日:2021-07-09

# 高エネルギー物理における弱値増幅:B中間子崩壊におけるCP対称性の破れの精密測定の事例研究

Weak Value Amplification in High Energy Physics: A Case Study for Precision Measurement of CP Violation in B Meson Decays ( http://arxiv.org/abs/2011.07560v4 )

ライセンス: Link先を確認

Satoshi Higashino, Yuichiro Mori, Yosuke Takubo, Takeo Higuchi, Akimasa Ishikawa, Izumi Tsutsui

(参考訳) 1988年にAharonovらによって提唱された弱値増幅法は、精密測定のために物理学の様々な分野に適用され、物理過程における最終状態を積極的に特定する「ポストセレクション」の自由を利用して実現されている。本稿では,高エネルギー粒子物理学における弱値増幅の手法,特にb中間子崩壊におけるcp違反パラメータの測定において,減衰モードの有効寿命がポスト選択により統計的に長引くことが期待される場合,その実現可能性について述べる。解析の結果,SuperKEKBコライダーでのベルII実験では有効寿命が2.6倍に長くなる可能性があり,CP違反パラメータの測定精度も向上することが示唆された。

The technique of weak value amplification, proposed by Aharonov et al. in 1988, has been applied in various fields of physics for the purpose of precision measurement, which is made possible by exploiting the freedom of 'postselection', i.e., actively specifying the final state in the physical process. Here we report for the first time the feasibility of utilizing the technique of weak value amplification in high energy particle physics, especially in measuring the CP-violating parameters in B meson decays, where the effective lifetime of the decay mode is expected to be prolonged statistically due to the postselection. Our analysis shows that, when adopted in the Belle II experiment at the SuperKEKB collider, the effective lifetime may be prolonged up to 2.6 times, and that the measurement precision of the CP-violating parameters will also be improved by its effect.

翻訳日:2023-04-24 01:40:35 公開日:2021-07-09
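
As background for the entry above, the textbook weak value of an observable $A$ for a preselected state $|i\rangle$ and a postselected state $|f\rangle$ is $A_w = \langle f|A|i\rangle / \langle f|i\rangle$. The sketch below evaluates this for a generic two-level system; it is not the B-meson analysis of the paper, and the states, the choice of $\sigma_z$, and the name `weak_value` are assumptions made purely for illustration.

```python
import math


def weak_value(pre, post, op):
    """Weak value A_w = <post|op|pre> / <post|pre> for a two-level system.

    pre, post: length-2 sequences of amplitudes; op: 2x2 nested list.
    """
    op_pre = [sum(op[r][c] * pre[c] for c in range(2)) for r in range(2)]
    num = sum(complex(post[r]).conjugate() * op_pre[r] for r in range(2))
    den = sum(complex(post[r]).conjugate() * pre[r] for r in range(2))
    return num / den


SIGMA_Z = [[1, 0], [0, -1]]  # eigenvalues are +1 and -1

eps = 0.1  # small angle: postselected state is almost orthogonal to the preselected one
pre = [math.cos(math.pi / 4), math.sin(math.pi / 4)]
post = [math.cos(-math.pi / 4 + eps), math.sin(-math.pi / 4 + eps)]

a_w = weak_value(pre, post, SIGMA_Z)
print(a_w.real)  # equals cot(eps), roughly 10, far outside the eigenvalue range [-1, 1]
```

The amplification comes at the price of a small postselection probability, $|\langle f|i\rangle|^2 = \sin^2(\epsilon)$ in this toy case, which is why the statistical trade-off has to be analyzed for each experiment.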

# モデルレチナールの定常光異性化における極度パラメトリック感度

Extreme Parametric Sensitivity in the Steady-State Photoisomerization of Model Retinal ( http://arxiv.org/abs/2011.14342v3 )

ライセンス: Link先を確認

Chern Chuang and Paul Brumer

(参考訳) ロドプシン中の網膜色素の光異性化反応を熱浴に結合した2状態二モードモデルを用いて計算した。定常状態(10ps以上)での反応量子収率は、それらの過渡値とはかなり異なることが分かり、これらの系における過渡と定常状態のダイナミクスの間に弱い相関が示唆された。さらに, 定常量子収率は系パラメータの微妙な変化に対して高い感度を示したが, 過渡力学はほとんど影響を受けなかった。このような感度とノルナジアバティック・ビブロン系の標準レベル間隔統計との相関は、量子カオスの起源を示唆している。本現象の実験的観察の可能性とその凝縮相光化学および生物光センシングにおける意義について考察した。

The photoisomerization reaction of the retinal chromophore in rhodopsin was computationally studied using a two-state two-mode model coupled to thermal baths. Reaction quantum yields at the steady state (10 ps and beyond) were found to be considerably different than their transient values, suggesting a weak correlation between transient and steady-state dynamics in these systems. Significantly, the steady-state quantum yield was highly sensitive to minute changes in system parameters, while transient dynamics was nearly unaffected. Correlation of such sensitivity with standard level spacing statistics of the nonadiabatic vibronic system suggests a possible origin in quantum chaos. The feasibility of experimental observation of this phenomenon and its implications in condensed-phase photochemistry and biological light sensing are discussed.

翻訳日:2023-04-22 16:42:21 公開日:2021-07-09

# 線形ポールトラップにおける半径2次元イオン結晶

Radial two-dimensional ion crystals in a linear Paul trap ( http://arxiv.org/abs/2012.12766v5 )

ライセンス: Link先を確認

Marissa D'Onofrio, Yuanheng Xie, A.J. Rasmusson, Evangeline Wolanski, Jiafeng Cui, and Philip Richerme

(参考訳) 線形ポールトラップの"radial-2d"相における二次元(2次元)クーロン結晶を実験的に研究した。この相はラジアル面に完全に整列した2次元イオン格子によって同定され、軸方向とラジアル方向のトラップ電位の比が大きいことで形成される。 19$^{171}$Yb$^+$イオンの配列を用いて、本態マイクロモーションによって駆動される時間依存性イオン位置にもかかわらず、そのような結晶の構造相境界と振動モード周波数が擬ポテンシャル近似によって適切に記述されていることを示す。さらに,微動による放射状2D結晶の加熱が放射面に制限されていることを観察した。最後に、ほとんどのイオントラップ量子シミュレーションで使用される横運動モードが、この幾何学において分離され冷えていることを検証した。本研究では,ラジアル2次元イオン結晶を量子シミュレーションと計算における様々な理論的提案を実現するための強固な実験プラットフォームとして確立する。

We experimentally study two-dimensional (2D) Coulomb crystals in the "radial-2D" phase of a linear Paul trap. This phase is identified by a 2D ion lattice aligned entirely with the radial plane and is created by imposing a large ratio of axial to radial trapping potentials. Using arrays of up to 19 $^{171}$Yb$^+$ ions, we demonstrate that the structural phase boundaries and vibrational mode frequencies of such crystals are well-described by the pseudopotential approximation, despite the time-dependent ion positions driven by intrinsic micromotion. We further observe that micromotion-induced heating of the radial-2D crystal is confined to the radial plane. Finally, we verify that the transverse motional modes, which are used in most ion-trap quantum simulation schemes, remain decoupled and cold in this geometry. Our results establish radial-2D ion crystals as a robust experimental platform for realizing a variety of theoretical proposals in quantum simulation and computation.

翻訳日:2023-04-19 19:36:34 公開日:2021-07-09

# 閉・開放型量子電池におけるエネルギー貯蔵とコヒーレンス

Energy storage and coherence in closed and open quantum batteries ( http://arxiv.org/abs/2012.15026v4 )

ライセンス: Link先を確認

Francesco Caravelli, Bin Yan, Luis Pedro Garcia-Pintos, Alioscia Hamma

(参考訳) 閉鎖型および開放型量子電池におけるコヒーレンスの役割について検討する。我々は、コヒーレンスの観点から、クローズドとオープンの両方の量子バッテリによって実行される仕事やエネルギーの上限を求める。具体的には、電池の進化をエンコードするユニタリ作用素のスペクトル基底における密度行列のヒルベルト・シュミットコヒーレンスによってエネルギー貯蔵が束縛されることを示す。また、電池のハミルトニアンコヒーレンスの観点から、可換作用素の評価により類似した境界が得られることを示す。これらの境界を閉系の場合の4状態量子系と異方性XYイジングモデル、開系の場合のスピン-ボソンモデルに適用する。

We study the role of coherence in closed and open quantum batteries. We obtain upper bounds to the work performed or energy exchanged by both closed and open quantum batteries in terms of coherence. Specifically, we show that the energy storage can be bounded by the Hilbert-Schmidt coherence of the density matrix in the spectral basis of the unitary operator that encodes the evolution of the battery. We also show that an analogous bound can be obtained in terms of the battery's Hamiltonian coherence in the basis of the unitary operator by evaluating their commutator. We apply these bounds to a 4-state quantum system and the anisotropic XY Ising model in the closed system case, and the Spin-Boson model in the open case.

翻訳日:2023-04-18 08:07:08 公開日:2021-07-09

# リアリスティック量子ビットにおける加速断熱ゲートの解析設計:超電導回路の一般理論と応用

Analytic Design of Accelerated Adiabatic Gates in Realistic Qubits: General Theory and Applications to Superconducting Circuits ( http://arxiv.org/abs/2102.02370v2 )

ライセンス: Link先を確認

F. Setiawan, Peter Groszkowski, Hugo Ribeiro, and Aashish A. Clerk

(参考訳) adiabaticityへのショートカットは、adiabatic quantum protocolを高速化するための一般的な方法であり、量子情報処理に多くの潜在的な応用がある。残念ながら、複雑な相互作用といくつかのレベルを持つシステムに対して、分析的にショートカットを構築することは難しい作業である。これは通常、理想化されたハミルトニアン(例えば、エネルギーレベルの限られた部分集合のみが保持され、回転波近似(RWA)が作られる)を仮定することで克服される。ここでは、これらの制限を超えることができる$analytic$アプローチを開発します。本手法は一般的であり,非rwa誤差と非rwa誤差の両方を補正するパルス形状を解析的に導出する。また,本手法は従来の非断熱プロトコルよりも少ない駆動力を必要とするパルスが得られることを示す。我々は,高忠実な単一量子ビット"三脚"ゲートを現実的な超伝導フラックスニウム量子ビットで解析的に設計する方法を詳細に示す。

Shortcuts to adiabaticity is a general method for speeding up adiabatic quantum protocols, and has many potential applications in quantum information processing. Unfortunately, analytically constructing shortcuts to adiabaticity for systems having complex interactions and more than a few levels is a challenging task. This is usually overcome by assuming an idealized Hamiltonian [e.g., only a limited subset of energy levels are retained, and the rotating-wave approximation (RWA) is made]. Here we develop an $analytic$ approach that allows one to go beyond these limitations. Our method is general and results in analytically derived pulse shapes that correct both nonadiabatic errors as well as non-RWA errors. We also show that our approach can yield pulses requiring a smaller driving power than conventional nonadiabatic protocols. We show in detail how our ideas can be used to analytically design high-fidelity single-qubit "tripod" gates in a realistic superconducting fluxonium qubit.

翻訳日:2023-04-12 20:11:09 公開日:2021-07-09

# 10kg物体の運動基底状態への接近

Approaching the motional ground state of a 10 kg object ( http://arxiv.org/abs/2102.12665v2 )

ライセンス: Link先を確認

Chris Whittle, Evan D. Hall, Sheila Dwyer, Nergis Mavalvala, Vivishek Sudhir, R. Abbott, A. Ananyeva, C. Austin, L. Barsotti, J. Betzwieser, C. D. Blair, A. F. Brooks, D. D. Brown, A. Buikema, C. Cahillane, J. C. Driggers, A. Effler, A. Fernandez-Galiana, P. Fritschel, V. V. Frolov, T. Hardwick, M. Kasprzack, K. Kawabe, N. Kijbunchoo, J. S. Kissel, G. L. Mansell, F. Matichard, L. McCuller, T. McRae, A. Mullavey, A. Pele, R. M. S. Schofield, D. Sigg, M. Tse, G. Vajente, D. C. Vander-Hyde, Hang Yu, Haocun Yu, C. Adams, R. X. Adhikari, S. Appert, K. Arai, J. S. Areeda, Y. Asali, S. M. Aston, A. M. Baer, M. Ball, S. W. Ballmer, S. Banagiri, D. Barker, J. Bartlett, B. K. Berger, D. Bhattacharjee, G. Billingsley, S. Biscans, R. M. Blair, N. Bode, P. Booker, R. Bork, A. Bramley, K. C. Cannon, X. Chen, A. A. Ciobanu, F. Clara, C. M. Compton, S. J. Cooper, K. R. Corley, S. T. Countryman, P. B. Covas, D. C. Coyne, L. E. H. Datrier, D. Davis, C. Di Fronzo, K. L. Dooley, P. Dupej, T. Etzel, M. Evans, T. M. Evans, J. Feicht, P. Fulda, M. Fyffe, J. A. Giaime, K. D. Giardina, P. Godwin, E. Goetz, S. Gras, C. Gray, R. Gray, A. C. Green, E. K. Gustafson, R. Gustafson, J. Hanks, J. Hanson, R. K. Hasskew, M. C. Heintze, A. F. Helmling-Cornell, N. A. Holland, J. D. Jones, S. Kandhasamy, S. Karki, P. J. King, Rahul Kumar, M. Landry, B. B. Lane, B. Lantz, M. Laxen, Y. K. Lecoeuche, J. Leviton, J. Liu, M. Lormand, A. P. Lundgren, R. Macas, M. MacInnis, D. M. Macleod, S. Márka, Z. Márka, D. V. Martynov, K. Mason, T. J. Massinger, R. McCarthy, D. E. McClelland, S. McCormick, J. McIver, G. Mendell, K. Merfeld, E. L. Merilh, F. Meylahn, T. Mistry, R. Mittleman, G. Moreno, C. M. Mow-Lowry, S. Mozzon, T. J. N. Nelson, P. Nguyen, L. K. Nuttall, J. Oberling, Richard J. Oram, C. Osthelder, D. J. Ottaway, H. Overmier, J. R. Palamos, W. Parker, E. Payne, R. Penhorwood, C. J. Perez, M. Pirello, H. Radkins, K. E. Ramirez, J. W. Richardson, K. Riles, N. A. Robertson, J. G. Rollins, C. L. Romel, J. H. Romie, M. P. Ross, K. Ryan, T. Sadecki, E. J. Sanchez, L. E. Sanchez, T. R. Saravanan, R. L. Savage, D. Schaetzl, R. Schnabel, E. Schwartz, D. Sellers, T. Shaffer, B. J. J. Slagmolen, J. R. Smith, S. Soni, B. Sorazu, A. P. Spencer, K. A. Strain, L. Sun, M. J. Szczepańczyk, M. Thomas, P. Thomas, K. A. Thorne, K. Toland, C. I. Torrie, G. Traylor, A. L. Urban, G. Valdes, P. J. Veitch, K. Venkateswara, G. Venugopalan, A. D. Viets, T. Vo, C. Vorvick, M. Wade, R. L. Ward, J. Warner, B. Weaver, R. Weiss, B. Willke, C. C. Wipf, L. Xiao, H. Yamamoto, L. Zhang, M. E. Zucker, J. Zweizig

(参考訳) 機械的物体(人間サイズの物体でさえ)の運動は、量子力学の規則によって制御されるべきである。熱環境は、物体の動きの量子的シグネチャを隠蔽する。実際、熱環境は大規模スケールでの量子力学の修正提案の効果を隠蔽している。 10kgの機械振動子の質量中心運動を、平均フォノン占有量10.8の状態で作成する。室温から77 nkまでの温度の低下は、フィードバックによる量子バックアクションの11桁の抑制と、その運動基底状態に近い物体の質量の13桁の桁の上昇とが共通している。これは、巨大な量子系の重力を観測する可能性を示している。

The motion of a mechanical object -- even a human-sized object -- should be governed by the rules of quantum mechanics. Coaxing them into a quantum state is, however, difficult: the thermal environment masks any quantum signature of the object's motion. Indeed, the thermal environment also masks effects of proposed modifications of quantum mechanics at large mass scales. We prepare the center-of-mass motion of a 10 kg mechanical oscillator in a state with an average phonon occupation of 10.8. The reduction in temperature, from room temperature to 77 nK, is commensurate with an 11 orders-of-magnitude suppression of quantum back-action by feedback -- and a 13 orders-of-magnitude increase in the mass of an object prepared close to its motional ground state. This begets the possibility of probing gravity on massive quantum systems.

翻訳日:2023-04-09 23:01:18 公開日:2021-07-09
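
The two numbers quoted in the abstract above (an average phonon occupation of 10.8 at an effective temperature of 77 nK) can be related through the Bose-Einstein occupation $\bar n = 1/(e^{\hbar\omega/k_B T} - 1)$. The sketch below assumes that plain relation; the abstract itself does not state the oscillator frequency, so the value computed here is only what the two quoted numbers would imply under this assumption, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def mean_occupation(freq_hz, temp_k):
    """Bose-Einstein mean phonon number of a mode at frequency freq_hz."""
    x = HBAR * 2.0 * math.pi * freq_hz / (K_B * temp_k)
    return 1.0 / math.expm1(x)


def frequency_for_occupation(n_bar, temp_k):
    """Invert the Bose-Einstein relation: mode frequency in Hz for a given n_bar."""
    return K_B * temp_k * math.log1p(1.0 / n_bar) / (HBAR * 2.0 * math.pi)


f_implied = frequency_for_occupation(10.8, 77e-9)
print(f_implied)                         # on the order of 10^2 Hz
print(mean_occupation(f_implied, 300.0)) # occupation of the same mode at room temperature
```

Under this assumption the implied mode frequency is of order a hundred hertz, and the same mode at room temperature would hold on the order of $10^{10}$ phonons, which gives a sense of the many orders of magnitude of cooling described in the abstract.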

# 合成次元における高次量子特異点の生成

Generating high-order quantum exceptional points in synthetic dimensions ( http://arxiv.org/abs/2102.13646v2 )

ライセンス: Link先を確認

Ievgen I. Arkhipov, Fabrizio Minganti, Adam Miranowicz, Franco Nori

(参考訳) 近年,散逸系における高次例外点(eps)の構成法の提案と開発が盛んに行われている。これらのepは、キラル輸送や感度の向上など、多くの興味深い特性を持つことができる。高次epを持つ非エルミート型ハミルトニアン(nhhs)を実現する以前の提案は、主に結合モードの空間ネットワークを直接構築するか、例えば空間格子を時間や光子数空間にマッピングする合成次元の活用に基づいている。どちらの手法も、古典的またはポストセレクトされた量子場を記述する効果的なNHHの構築に依存しており、量子ジャンプの影響を無視し、したがって、量子ジャンプの確率が励起数や散逸率によって増加するとき、 {\it量子状態におけるスケーラビリティの問題に悩まされる。ここでは、二次リウビリア超作用素の完全な量子力学を考慮し、系作用素モーメントの進化行列から導かれる高次量子EPを用いたNHHsをシンプルかつ効果的に設計する手法を提案する。すなわち、二次二モード系のシステム作用素の高次モーメントを量子化することにより、結果として得られる進化行列は、例えば結合共振器の空間格子を記述する代替の nhhs として解釈でき、そこでは、空間サイトは場モーメントの合成空間における高次フィールドモーメントとして表現される。例えば、u(1) 対称二次二次リウビリアン(英語版)($u(1)$-symmetric quadratic liouvillian)は、非コヒーレントモードカップリングを持つ {\it bimodal} キャビティを記述するが、これは反$\cal pt$-symmetry(英語版)も持つことができ、そのフィールドモーメントダイナミクスは高次epを持つ結合共振器の空間 {\it network} を制御する nhh にマッピングできる。

Recently, there has been intense research in proposing and developing various methods for constructing high-order exceptional points (EPs) in dissipative systems. These EPs can possess a number of intriguing properties related to, e.g., chiral transport and enhanced sensitivity. Previous proposals to realize non-Hermitian Hamiltonians (NHHs) with high-order EPs have been mainly based on either direct construction of spatial networks of coupled modes or utilization of synthetic dimensions, e.g., of mapping spatial lattices to time or photon-number space. Both methods rely on the construction of effective NHHs describing classical or postselected quantum fields, which neglect the effects of quantum jumps, and which, thus, suffer from a scalability problem in the \emph{quantum regime}, when the probability of quantum jumps increases with the number of excitations and dissipation rate. Here, by considering the full quantum dynamics of a quadratic Liouvillian superoperator, we introduce a simple and effective method for engineering NHHs with high-order quantum EPs, derived from evolution matrices of system operator moments. That is, by quantizing higher-order moments of system operators, e.g., of a quadratic two-mode system, the resulting evolution matrices can be interpreted as alternative NHHs describing, e.g., a spatial lattice of coupled resonators, where spatial sites are represented by high-order field moments in the synthetic space of field moments. As an example, we consider a $U(1)$-symmetric quadratic Liouvillian describing a \emph{bimodal} cavity with incoherent mode coupling, which can also possess anti-$\mathcal{PT}$-symmetry, whose field moment dynamics can be mapped to an NHH governing a spatial \emph{network} of coupled resonators with high-order EPs.

翻訳日:2023-04-09 20:22:12 公開日:2021-07-09

# 原子と分子による2光子の絡み合った吸収:量子光学チュートリアル

Entangled Two-Photon Absorption by Atoms and Molecules: A Quantum Optics Tutorial ( http://arxiv.org/abs/2103.02551v3 )

ライセンス: Link先を確認

Michael G. Raymer, Tiemo Landes and Andrew H. Marcus

(参考訳) 2光子吸収(tpa)や他の分子と時間周波数エンタングル光子対(epp)との非線形相互作用は、様々な興味深い効果を示すと予測されている。そのため、実用的な量子化分子分光法での可能性は、綿密な検査を必要とする。本稿では, 分子による1光および2光子吸収の詳細な理論的研究を行い, 光の量子的性質の扱いについて述べる。基本量子光学理論を概観し、分子光学応答の密度行列(リウヴィル)の導出を概観し、光の量子状態をどのように治療に組み込むかを強調した。挿絵では、自然パラメトリックダウン変換によって生成される光子対のTPAを詳細に扱い、量子光のTPAが古典光とどのように異なるかを強調している。特に, 絡み合った状態を用いて, どの程度のTPA率向上が達成できるかという問題を論じる。この論文は、既知の理論手法と結果のレビュー、およびいくつかの拡張、特に遠方共振中間状態のみを介して発生するTPAプロセスと非共振中間状態を含むTPAプロセスの比較を含む。また, 絡み合ったTPAの実験的研究に直面する主な課題についても概説した。

Two-photon absorption (TPA) and other nonlinear interactions of molecules with time-frequency-entangled photon pairs (EPP) have been predicted to display a variety of fascinating effects. Therefore, their potential use in practical quantum-enhanced molecular spectroscopy requires close examination. This paper presents in tutorial style a detailed theoretical study of one- and two-photon absorption by molecules, focusing on how to treat the quantum nature of light. We review some basic quantum optics theory, then we review the density-matrix (Liouville) derivation of molecular optical response, emphasizing how to incorporate quantum states of light into the treatment. For illustration we treat in detail the TPA of photon pairs created by spontaneous parametric down conversion, with an emphasis on how quantum light TPA differs from that with classical light. In particular, we treat the question of how much enhancement of the TPA rate can be achieved using entangled states. The paper includes a review of known theoretical methods and results, as well as some extensions, especially the comparison of TPA processes that occur via far-off-resonant intermediate states only and those that involve off-resonant intermediate states by virtue of dephasing processes. A brief discussion of the main challenges facing experimental studies of entangled TPA is also given.

翻訳日:2023-04-09 07:51:45 公開日:2021-07-09

# トモグラフィーデータへの量子ノイズモデルの適用

Fitting quantum noise models to tomography data ( http://arxiv.org/abs/2103.17243v2 )

ライセンス: Link先を確認

Emilio Onorati, Tamara Kohler, and Toby Cubitt

(参考訳) ノイズの存在は、現在、大規模な量子計算を達成するための主要な障害の1つである。量子ハードウェアにおけるノイズプロセスの特徴付けと理解の戦略は、特に完全なエラー修正とフォールトトレランスのオーバーヘッドが現在のハードウェアの範囲を超えているため、それを緩和する重要な部分である。非マルコフ効果は特に好ましくない種類のノイズであり、標準技術を用いて解析することは困難であり、誤り訂正を用いて制御することが困難である。本研究では,マルコフマスター方程式の厳密な数学的理論に基づいて,未知雑音過程の解析・評価を行う効率的なアルゴリズムを開発した。時間に依存しないマルコフ力学(あるいはほぼマルコフ力学)の場合、このアルゴリズムは最も適したリンドブラジアン、すなわち、与えられた精度内でトモグラフィデータを最も近似するメモリレス量子チャネルの生成子を出力する。非マルコフ力学の場合、このアルゴリズムは等方性雑音付加の観点で非マルコフ性についての定量的かつ操作上有意義な尺度を返す。我々は全てのアルゴリズムのpython実装を提供し、cirqプラットフォームを用いて生成された合成雑音トモグラフィデータの1ビットおよび2量子ビットのサンプルでこれらをベンチマークします。数値計算の結果から,本アルゴリズムは,計測力学に対する最適リンドブラジアンの完全な記述と,解析計算に適合する非マルコフ性を正確に計算することに成功した。

The presence of noise is currently one of the main obstacles to achieving large-scale quantum computation. Strategies to characterise and understand noise processes in quantum hardware are a critical part of mitigating it, especially as the overhead of full error correction and fault-tolerance is beyond the reach of current hardware. Non-Markovian effects are a particularly unfavorable type of noise, being both harder to analyse using standard techniques and more difficult to control using error correction. In this work we develop a set of efficient algorithms, based on the rigorous mathematical theory of Markovian master equations, to analyse and evaluate unknown noise processes. In the case of time-independent Markovian (or nearly Markovian) dynamics, our algorithm outputs the best-fit Lindbladian, i.e., the generator of a memoryless quantum channel which best approximates the tomographic data to within the given precision. In the case of non-Markovian dynamics, our algorithm returns a quantitative and operationally meaningful measure of non-Markovianity in terms of isotropic noise addition. We provide a Python implementation of all our algorithms, and benchmark these on a range of 1- and 2-qubit examples of synthesised noisy tomography data, generated using the Cirq platform. The numerical results show that our algorithms succeed both in extracting a full description of the best-fit Lindbladian to the measured dynamics, and in computing accurate values of non-Markovianity that match analytical calculations.

翻訳日:2023-04-06 00:48:27 公開日:2021-07-09

# $T\bar T$変形フェルミオン理論の再検討

$T\bar T$-deformed Fermionic Theories Revisited ( http://arxiv.org/abs/2104.09529v2 )

ライセンス: Link先を確認

Kyung-Sun Lee, Piljin Yi and Junggi Yoon

(参考訳) 我々は、量子化に向けてフェルミオンを持つ$d=2$理論の$T\bar T$変形を再考する。簡単な図解として、変形したディラックブラケットをMajorana doubletで計算し、既知の固有値の流れを摂動的に確認する。我々は、ワールドシート計量を統合する際に弦のような理論から再構成できるこれらの$t\bar t$理論をほとんど考慮している。 NSRのようなフェルミオンやGSのようなフェルミオンを加えると、これがどのように働くかを簡単に説明した後、ネーターエネルギー運動量に基づいて、後者から$\cN=(1,1)$理論の既知の非超対称性の$T\bar T$変形を得る。この世界表の再構成は、後者が実際には$d=3$ GS-likeモデルの超対称部分集合であり、隠されたスーパーチャージを暗示していることを意味する。これにより、超対称な$T\bar T$のような異なる$T\bar T$の変形や、より一般的には対称エネルギーモメンタムを経由する。フェルミオンを持つ理論では、そのような選択は、潜在的にユニタリティの問題を伴う自由度を2倍にすることが多い。余剰セクターが「小さな変形」限界の分岐ギャップを発達し、赤外における疎結合が生じることを示すが、どのような意味での変形と考えられるかは定かではない。

We revisit $T\bar T$ deformations of $d=2$ theories with fermions with a view toward quantization. As a simple illustration, we compute the deformed Dirac bracket for a Majorana doublet and confirm the known eigenvalue flows perturbatively. We mostly consider those $T\bar T$ theories that can be reconstructed from string-like theories upon integrating out the worldsheet metric. After a quick overview of how this works when we add NSR-like or GS-like fermions, we obtain a known non-supersymmetric $T\bar T$ deformation of an $\mathcal{N}=(1,1)$ theory from the latter, based on the Noether energy-momentum. This worldsheet reconstruction implies that the latter is actually a supersymmetric subsector of a $d=3$ GS-like model, implying hidden supercharges, which we do construct explicitly. This brings us to ask about different $T\bar T$ deformations, such as manifestly supersymmetric $T\bar T$ and also more generally via the symmetric energy-momentum. We show that, for theories with fermions, such choices often lead us to doubling of degrees of freedom, with potential unitarity issues. We show that the extra sector develops a divergent gap in the "small deformation" limit and decouples in the infrared, although it remains uncertain in what sense these can be considered a deformation.

翻訳日:2023-04-03 04:43:16 公開日:2021-07-09

# 動的バックアクションマグノメカニクス

Dynamical Backaction Magnomechanics ( http://arxiv.org/abs/2104.11218v2 )

ライセンス: Link先を確認

C.A. Potts, E. Varga, V.A.S.V. Bittencourt, S. Viola Kusminskiy and J.P. Davis

(参考訳) 光機械系の放射圧による動的バックアクションは、機械振動を操作するための汎用的な道具であることが証明されている。特に、動的バックアクションは、メカニカル共振器の基底状態への冷却、フォノン発振の駆動、絡み合った状態の生成、光バネ効果の観測に繋がった。ある磁性材料では、機械的振動は磁歪相互作用を介して磁気励起(マグノン)と相互作用し、類似のマグノン誘起動的バックアクションを引き起こす。本稿では,マグノン誘起動的バックアクションが球状磁気試料の機械的振動に与える影響を直接観察する。さらに,近年の多くの理論的提案において,動的バックアクション効果が重要な役割を担っている。したがって本研究は、これらの理論的提案の多くを追求する今後の実験研究の基盤を提供する。

Dynamical backaction resulting from radiation pressure forces in optomechanical systems has proven to be a versatile tool for manipulating mechanical vibrations. Notably, dynamical backaction has resulted in the cooling of a mechanical resonator to its ground-state, driving phonon lasing, the generation of entangled states, and observation of the optical-spring effect. In certain magnetic materials, mechanical vibrations can interact with magnetic excitations (magnons) via the magnetostrictive interaction, resulting in an analogous magnon-induced dynamical backaction. In this article, we directly observe the impact of magnon-induced dynamical backaction on a spherical magnetic sample's mechanical vibrations. Moreover, dynamical backaction effects play a crucial role in many recent theoretical proposals; thus, our work provides the foundation for future experimental work pursuing many of these theoretical proposals.

翻訳日:2023-04-02 20:09:39 公開日:2021-07-09

# エンタングル量子アンルーオットーエンジンはより効率的である

Entangled quantum Unruh Otto engine is more efficient ( http://arxiv.org/abs/2105.11709v2 )

ライセンス: Link先を確認

Gaurang Ramakant Kane, Bibhas Ranjan Majhi

(参考訳) 2量子ビットの絡み合った状態と、通常の1量子ビットの量子オットーエンジンよりも効率が良い複合励起状態(あるいは基底状態)との間の相対論的量子オットーサイクルを提案する。熱水貯留層は、背景場と個々の量子ビット間の相互作用とともに、これらの量子ビットに均一な加速度を提供することによって構成される。量子ビットのフレームの1つから測定された効率は、状態のエネルギーギャップだけでなく、それらの間の相対加速度にも依存する。観測者のキュービットの加速度を他のキュービットと比較すると、サイクルは単一キュービット量子オットーエンジンよりも効率的である。さらに、そのようなサイクルを構築するための完全なプロトコルが提供される。

We propose a relativistic quantum Otto cycle between an entangled state of two qubits and their composite excited (or ground) state, whose efficiency can be greater than that of the usual single-qubit quantum Otto engine. The hot and cold reservoirs are constructed by providing uniform accelerations to these qubits along with the interaction between the background field and individual qubits. The efficiency, as measured from the frame of one of the qubits, depends not only on the energy gap of the states but also on the relative acceleration between them. For lower acceleration of our observer's qubit compared to the other one, the cycle is more efficient than the single-qubit quantum Otto engine. Furthermore, a complete protocol to construct such a cycle is provided.

翻訳日:2023-03-29 21:09:07 公開日:2021-07-09

# ESR: 人工知能研究の倫理と社会観

ESR: Ethics and Society Review of Artificial Intelligence Research ( http://arxiv.org/abs/2106.11521v2 )

ライセンス: Link先を確認

Michael S. Bernstein, Margaret Levi, David Magnus, Betsy Rajala, Debra Satz, Charla Waeiss

(参考訳) 人工知能(AI)の研究は、その現実的および潜在的な社会への影響について定期的に批判されており、我々はこの批判とそれが反映する責任に対する十分な制度的な反応を欠いている。AI研究は、人間社会への害ではなく被験者への害を評価するように設計された、制度審査委員会(IRB)のような既存のフィードバックメカニズムの対象範囲から外れることが多い。そこで我々は,AI研究の否定的倫理的側面と社会的側面を緩和するためのフィードバックパネルであるEthics and Society Review Board (ESR)を開発した。研究者は、この提案のためにESRプロセスが完了するまで、私たちの大学で主要なAI資金プログラムから助成金を受けられません。本稿では、41の提案で最初の1年間に設計し、実行してきたESRについて述べる。我々はこれらの提案に関するESRの総合的なフィードバックを分析し、このパネルがマイノリティグループに対する害の問題を最もよく特定していること、研究計画における多様な利害関係者の関与、二重利用、データの表現について調べる。ESRと対話した研究者の調査では、58%が研究プロジェクトの設計に影響を与えていると感じており、100%は将来のプロジェクトをESRに提出し続け、倫理や社会問題を通じて推論のための足場を探していた。

Artificial intelligence (AI) research is routinely criticized for its real and potential impacts on society, and we lack adequate institutional responses to this criticism and to the responsibility that it reflects. AI research often falls outside the purview of existing feedback mechanisms such as the Institutional Review Board (IRB), which are designed to evaluate harms to human subjects rather than harms to human society. In response, we have developed the Ethics and Society Review board (ESR), a feedback panel that works with researchers to mitigate negative ethical and societal aspects of AI research. The ESR's main insight is to serve as a requirement for funding: researchers cannot receive grant funding from a major AI funding program at our university until the researchers complete the ESR process for the proposal. In this article, we describe the ESR as we have designed and run it over its first year across 41 proposals. We analyze aggregate ESR feedback on these proposals, finding that the panel most commonly identifies issues of harms to minority groups, inclusion of diverse stakeholders in the research plan, dual use, and representation in data. Surveys and interviews of researchers who interacted with the ESR found that 58% felt that it had influenced the design of their research project, 100% are willing to continue submitting future projects to the ESR, and that they sought additional scaffolding for reasoning through ethics and society issues.

翻訳日:2023-03-25 21:11:27 公開日:2021-07-09

# レーン変化のアトラス:顧客艦隊の測定データを用いた位置依存型レーン変化行動の検討

The Atlas of Lane Changes: Investigating Location-dependent Lane Change Behaviors Using Measurement Data from a Customer Fleet ( http://arxiv.org/abs/2107.04029v2 )

ライセンス: Link先を確認

Florian Wirthmüller, Jochen Hipp, Christian Reichenbächer and Manfred Reichert

(参考訳) 周辺交通参加者の行動予測は、運転支援システムや自動運転システムにとって重要かつ困難な課題である。今日のアプローチは、主に交通状況の動的側面をモデル化し、これに基づいて交通参加者の行動を予測することに焦点を当てている。本稿では、位置特異的なa-プリオリレーン変化確率を計算することにより、この共通プラクティスを拡大する第一歩を踏み出す。人間の運転行動は、それぞれの場所によって全く同じ交通状況で異なるかもしれない。例えば、運転手は自問自答する:私はすぐにトラックを目の前で通り過ぎるべきか、あるいは、わずか数キロ先にあるルートの曲がりくねった部分に到達するまで待つべきなのか? このような情報は単独で行動予測を許すには程遠いが、今日のアプローチがそのような位置固有のa-priori確率を予測に組み込むことで大いに有益であることは明らかである。例えば、高速道路のインターチェンジは車線変更を行うドライバーのモチベーションを高めがちであるが、カーブは車線変更削減効果を持っているようである。それにもかかわらず、すべての検討された地域条件の調査は、様々な効果の重畳が、いくつかの場所で予期せぬ確率をもたらすことを示している。そこで我々は,車載予測システムを支援するために,顧客艦隊データに基づく車線変更確率マップを動的に構築,維持することを提案する。信頼できる車線変更確率を導出するためには、広い顧客層が成功の鍵となる。

The prediction of surrounding traffic participants' behavior is a crucial and challenging task for driver assistance and autonomous driving systems. Today's approaches mainly focus on modeling dynamic aspects of the traffic situation and try to predict traffic participants' behavior based on this. In this article we take a first step towards extending this common practice by calculating location-specific a-priori lane change probabilities. The idea behind this is straightforward: the driving behavior of humans may vary in exactly the same traffic situation depending on the respective location. For example, drivers may ask themselves: Should I pass the truck in front of me immediately or should I wait until reaching the less curvy part of my route lying only a few kilometers ahead? Although such information is far from allowing behavior prediction on its own, it is obvious that today's approaches will greatly benefit from incorporating such location-specific a-priori probabilities into their predictions. For example, our investigations show that highway interchanges tend to enhance drivers' motivation to perform lane changes, whereas curves seem to have lane change-dampening effects. Nevertheless, the investigation of all considered local conditions shows that superposition of various effects can lead to unexpected probabilities at some locations. We thus suggest dynamically constructing and maintaining a lane change probability map based on customer fleet data in order to support onboard prediction systems with additional information. For deriving reliable lane change probabilities, a broad customer fleet is the key to success.

翻訳日:2023-03-25 18:12:51 公開日:2021-07-09

# ダイヤモンド核スピンジャイロスコープの実証

Demonstration of diamond nuclear spin gyroscope ( http://arxiv.org/abs/2107.04257v1 )

ライセンス: Link先を確認

Andrey Jarmola, Sean Lourette, Victor M. Acosta, A. Glen Birdwell, Peter Blümler, Dmitry Budker, Tony Ivanov, Vladimir S. Malinovsky

(参考訳) 我々は,ダイヤモンド中の窒素空孔(NV)色中心に固有な$^{14}$N核スピンに基づく回転センサの動作を実証する。このセンサーは、核の光偏極と読み出し、および$^{14}$N核スピンの歳差運動をモニターする無線周波数の二量子パルスプロトコルを使用する。この測定プロトコルは、$^{14}$N四極子分裂における温度変化に対する感度を抑え、NV電子スピン遷移に共鳴するマイクロ波パルスを必要としない。この装置は回転プラットフォーム上でテストされ、感度は4.7$^{\circ}/\sqrt{\rm{s}}$ (13 mHz/$\sqrt{\rm{Hz}}$)、バイアス安定性は0.4$^{\circ}$/s (1.1 mHz)であった。

We demonstrate operation of a rotation sensor based on the $^{14}$N nuclear spins intrinsic to nitrogen-vacancy (NV) color centers in diamond. The sensor employs optical polarization and readout of the nuclei and a radio-frequency double-quantum pulse protocol that monitors $^{14}$N nuclear spin precession. This measurement protocol suppresses the sensitivity to temperature variations in the $^{14}$N quadrupole splitting, and it does not require microwave pulses resonant with the NV electron spin transitions. The device was tested on a rotation platform and demonstrated a sensitivity of 4.7 $^{\circ}/\sqrt{\rm{s}}$ (13 mHz/$\sqrt{\rm{Hz}}$), with bias stability of 0.4 $^{\circ}$/s (1.1 mHz).

翻訳日:2023-03-23 00:08:24 公開日:2021-07-09
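The two unit pairs quoted in the gyroscope abstract above can be cross-checked with a short calculation. This is only a sketch under one assumption that the abstract does not state explicitly: that "Hz" denotes rotation rate in full turns per second, so that 360°/s corresponds to 1 Hz. The names `DEGREES_PER_TURN` and `deg_to_turns` are illustrative and do not come from the paper.

```python
# Unit check for the sensitivity and bias-stability figures in the abstract.
# Assumption: "Hz" means full turns per second, i.e. 360 deg/s == 1 Hz.

DEGREES_PER_TURN = 360.0


def deg_to_turns(value_deg: float) -> float:
    """Convert a quantity expressed in degrees into full turns."""
    return value_deg / DEGREES_PER_TURN


# Sensitivity: deg/sqrt(s) is the same unit as (deg/s)/sqrt(Hz),
# so dividing by 360 gives Hz/sqrt(Hz).
sensitivity_mhz = deg_to_turns(4.7) * 1e3  # mHz/sqrt(Hz)

# Bias stability: deg/s divided by 360 gives Hz.
bias_mhz = deg_to_turns(0.4) * 1e3  # mHz

print(f"sensitivity    ~ {sensitivity_mhz:.1f} mHz/sqrt(Hz)")
print(f"bias stability ~ {bias_mhz:.1f} mHz")
```

Under that assumption the converted values round to the 13 mHz/$\sqrt{\rm{Hz}}$ and 1.1 mHz given in parentheses in the abstract.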

# インターフェロメトリ質量分析

Interferometric mass spectrometry ( http://arxiv.org/abs/2107.04256v1 )

ライセンス: Link先を確認

Radu Ionicioiu

(参考訳) 加速器質量分析法(accelerator mass spectrometry, AMS)は、地質学、分子生物学、考古学など、様々な応用分野において広く用いられている技術である。非常に正確ではあるが、AMSはタンデム加速器と大型の磁石を必要とするため、大きな実験室に限定される。本稿では、量子干渉を用いた新しい質量分離法であるインターフェロメトリ質量分析法(IMS)を提案する。IMSは試料の波状特性を採用しており、試料が粒子状であるAMSと相補的である。この相補性は2つの重要な結果をもたらす。(i) IMSでは分離はAMSのような質量電荷比$m/q$ではなく絶対質量$m$に従って行われる。(ii) IMSでは試料は低速度領域にあり、AMSで用いられる高速度領域とは対照的である。IMSの潜在的な応用は、モバイルアプリケーションのためのコンパクトなデバイス、加速段階で壊れる感受性分子、イオン化が難しい中性試料である。

Accelerator mass spectrometry (AMS) is a widely-used technique with multiple applications, including geology, molecular biology and archeology. Although extremely precise, AMS requires tandem accelerators and bulky magnets which confines it to large laboratories. Here we propose interferometric mass spectrometry (IMS), a novel method of mass separation which uses quantum interference. IMS employs the wave-like properties of the samples, and as such is complementary to AMS, in which samples are particle-like. This complementarity has two significant consequences: (i) in IMS separation is performed according to the absolute mass $m$, and not to the mass-to-charge ratio $m/q$, as in AMS; (ii) in IMS the samples are in the low-velocity regime, in contrast to the high-velocity regime used in AMS. Potential applications of IMS are compact devices for mobile applications, sensitive molecules that break at the acceleration stage and neutral samples which are difficult to ionise.

翻訳日:2023-03-23 00:08:08 公開日:2021-07-09

# 回路qedにおける単一光子状態キュービットとコヒーレント状態キュービット間の量子絡み合い状態の伝達

Transferring quantum entangled states between multiple single-photon-state qubits and coherent-state qubits in circuit QED ( http://arxiv.org/abs/2107.04203v1 )

ライセンス: Link先を確認

Qi-Ping Su, Hanyu Zhang, Chui-Ping Yang

(参考訳) 超伝導フラックス量子ビットに結合した2n個のマイクロ波キャビティを用いて、n個の単光子状態(SPS)量子ビットをn個のコヒーレント状態(CS)量子ビットに最大あるいは部分的に絡み合った状態に転送する方法を提案する。ここでのSPS量子ビットの2つの論理状態は空洞の真空状態と単光子状態で表され、CS量子ビットの2つの論理状態は空洞の2つのコヒーレント状態で符号化される。カプラとして1つの超伝導クトリットのみを使用するため、回路アーキテクチャは大幅に単純化される。状態転送の動作時間はキュービット数の増加とともに増加しない。系の散逸が無視できる場合、量子状態は測定が不要であるため決定論的に転送することができる。さらに、カプラキュトリットの高エネルギー中間レベルは全操作中に励起されず、キュトリットからの脱コヒーレンスを大幅に抑制する。具体例として、2つのSPS量子ビットのベル状態の2つのCS量子ビットへの高忠実転送が、現在の回路QED技術で実現可能であることを示す。最後に、散逸が無視できるとき、n個のCS量子ビットの絡み合った状態は、逆演算を行うことでn個のSPS量子ビットに戻すことができることに注意する必要がある。この提案は非常に一般的であり、自然な原子または人工原子を使用して2nマイクロ波または光学共振器を結合することにより、同じタスクを達成するために拡張することができる。

We present a way to transfer maximally or partially entangled states of n single-photon-state (SPS) qubits onto n coherent-state (CS) qubits, by employing 2n microwave cavities coupled to a superconducting flux qutrit. The two logic states of an SPS qubit here are represented by the vacuum state and the single-photon state of a cavity, while the two logic states of a CS qubit are encoded with two coherent states of a cavity. Because only one superconducting qutrit is used as the coupler, the circuit architecture is significantly simplified. The operation time for the state transfer does not increase with the number of qubits. When the dissipation of the system is negligible, the quantum state can be transferred in a deterministic way since no measurement is required. Furthermore, the higher-energy intermediate level of the coupler qutrit is not excited during the entire operation and thus decoherence from the qutrit is greatly suppressed. As a specific example, we numerically demonstrate that the high-fidelity transfer of a Bell state of two SPS qubits onto two CS qubits is achievable within present-day circuit QED technology. Finally, it is worth noting that when the dissipation is negligible, entangled states of n CS qubits can be transferred back onto n SPS qubits by performing reverse operations. This proposal is quite general and can be extended to accomplish the same task by employing a natural or artificial atom to couple 2n microwave or optical cavities.

翻訳日:2023-03-23 00:07:35 公開日:2021-07-09

# ポラリトンシミュレーションによる実空間における超高速コヒーレンス非局在化

Ultrafast Coherence Delocalization in Real Space Simulated by Polaritons ( http://arxiv.org/abs/2107.04162v1 )

ライセンス: Link先を確認

Bo Xiang, Zimo Yang, Yi-Zhuang You, Wei Xiong

(参考訳) 超高速2次元赤外高スペクトルイメージングにより, 時間, 周波数, 空間領域における結合キャビティ分子ポラリトンプラットフォーム上のコヒーレンス非局在化を検討した。周波数および実空間において一方向コヒーレンス非局在化(一つの空洞で調製したコヒーレンスが別の空洞へ移動すること)が観察された。この方向性は、Lindblad力学によって記述された高エネルギーモードから低エネルギーモードへの非局在光子の散逸によって実現された。さらなる実験により、コヒーレンスがキャビティ間(異なるキャビティからのポラリトン間の重ね合わせ)で直接調製されたとき、エネルギー的に近傍のポラリトンのみが長距離環境変動を生き残ったコヒーレンスを形成することができた。Lindblad力学とともに、この結果はコヒーレンスが1段階の機構を通じて非局在化され、光子が1つの空洞から別の空洞へ移動することを示唆し、自然と人工の量子系におけるコヒーレンス進化に光を当てる。この研究は光子と分子モードを組み合わせてコヒーレンス力学をシミュレートする方法も示した。

We investigated coherence delocalization on a coupled-cavity molecular polariton platform in time, frequency, and spatial domains, enabled by ultrafast two-dimensional infrared hyperspectral imaging. Unidirectional coherence delocalization (coherence prepared in one cavity transfers to another cavity) was observed in frequency and real spaces. This directionality was enabled by dissipation of delocalized photons from high-energy to low-energy modes, described by Lindblad dynamics. Further experiments showed that when coherences were directly prepared across cavities (superpositions between polaritons from different cavities), only energetically nearby polaritons could form coherences that survived the long-range environmental fluctuation. Together with the Lindblad dynamics, this result implied that coherences delocalized through a one-step mechanism where photons transferred from one cavity to another, shedding light on coherence evolution in natural and artificial quantum systems. This work also demonstrated a way of combining photon and molecular modes to simulate coherence dynamics.

翻訳日:2023-03-23 00:07:06 公開日:2021-07-09

# 集団結合レジームにおける振動ポラリトン化学の理論

Theory of Vibrational Polariton Chemistry in the Collective Coupling Regime ( http://arxiv.org/abs/2107.04156v1 )

ライセンス: Link先を確認

Arkajit Mandal, Xinyang Li, Pengfei Huo

(参考訳) 分子振動を光学キャビティと結合させることで化学反応速度定数を著しく抑制し,集合結合効果と速度定数のキャビティ周波数変化の両方を示すことを理論的に証明した。反応座標が溶媒分子に強く結合すると、動的カウジング効果により反応速度定数が低下する。また, 溶媒をキャビティに結合させることにより, この動的カウジング効果をさらに高め, 化学速度のさらなる抑制が期待できることを示した。この効果はキャビティ損失を考慮するとさらに増幅される。

We theoretically demonstrate that the chemical reaction rate constant can be significantly suppressed by coupling molecular vibrations with an optical cavity, exhibiting both the collective coupling effect and the cavity-frequency modification of the rate constant. When a reaction coordinate is strongly coupled to the solvent molecules, the reaction rate constant is reduced due to the dynamical caging effect. We demonstrate that collectively coupling the solvent to the cavity can further enhance this dynamical caging effect, leading to additional suppression of the chemical kinetics. This effect is further amplified when cavity loss is considered.

翻訳日:2023-03-23 00:06:44 公開日:2021-07-09

# ランダウアー境界を超える実験的非平衡メモリ消去

Experimental nonequilibrium memory erasure beyond Landauer's bound ( http://arxiv.org/abs/2107.04429v1 )

ライセンス: Link先を確認

Mario A. Ciampini, Tobias Wenzl, Michael Konopik, Gregor Thalhammer, Markus Aspelmeyer, Eric Lutz, Nikolai Kiesel

(参考訳) デジタル情報のクリーンな世界は、ノイズの多い物理デバイスに基づいている。ランダウアーの原理は、論理的に不可逆な変換のエネルギー消費と熱生成の限界を低く設定することで、情報処理と基礎となる熱力学の深い関係を提供する。ランダウアーの元々の定式化は平衡を仮定するが、実際の装置はしばしば平衡から遠く離れている。メモリ状態の非平衡特性により、消費電力の低減と負の熱発生を伴う全消去が可能となることを実験的に示す。最適化された消去プロトコルを2状態メモリに実装する。この目的のために, 非線形ポテンシャルランドスケープの動的形状をレヴィトダイナミクスの強力なツールとして, および非平衡過程の研究として導入する。

The clean world of digital information is based on noisy physical devices. Landauer's principle provides a deep connection between information processing and the underlying thermodynamics by setting a lower limit on the energy consumption and heat production of logically irreversible transformations. While Landauer's original formulation assumes equilibrium, real devices often do operate far from equilibrium. We show experimentally that the nonequilibrium character of a memory state enables full erasure with reduced power consumption as well as negative heat production. We implement the optimized erasure protocols in an optomechanical two-state memory. To this end, we introduce dynamical shaping of nonlinear potential landscapes as a powerful tool for levitodynamics as well as the investigation of far-from-equilibrium processes.

翻訳日:2023-03-23 00:02:12 公開日:2021-07-09
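For orientation, the equilibrium form of Landauer's bound that the abstract above refers to is $k_B T \ln 2$ of heat per erased bit. The sketch below evaluates only that textbook equilibrium limit. It does not model the nonequilibrium protocol of the paper, and the temperature of 300 K is an illustrative assumption, not a value taken from the experiment. The function name `landauer_bound_joules` is hypothetical.

```python
import math

# Boltzmann constant in J/K (exact value fixed by the 2019 SI redefinition).
K_B = 1.380649e-23


def landauer_bound_joules(temperature_kelvin: float, bits: float = 1.0) -> float:
    """Equilibrium Landauer limit: minimum heat dissipated when erasing `bits` bits."""
    return bits * K_B * temperature_kelvin * math.log(2)


# Assumed temperature for illustration only; not taken from the paper.
room_temperature = 300.0
bound = landauer_bound_joules(room_temperature)
print(f"k_B T ln 2 at {room_temperature:.0f} K = {bound:.3e} J per bit")
```

The bound scales linearly with both temperature and the number of erased bits, which is why it is usually quoted per bit in units of $k_B T$.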

# ド・ジッター時空の双曲真空中の球によって誘起されるカシミール密度

Casimir densities induced by a sphere in the hyperbolic vacuum of de Sitter spacetime ( http://arxiv.org/abs/2107.04376v1 )

ライセンス: Link先を確認

A. A. Saharian, T. A. Petrosyan

(参考訳) モードの完全集合とアダマール関数は (D+1)-次元ド・ジッター時空における球面内および外部のスカラー場に対して負の定数曲率空間で分離される。体は球面上のロビン境界条件に従うと仮定する。球によって誘導されるアダマール関数の寄与を明示的に分離し, 双極子およびエネルギー-モーメントテンソルの真空期待値(VEVs)を双曲真空に対して検討した。平坦な時空極限では、後者はミルネ宇宙の共形真空に還元され、最大対称の束-デイヴィス真空状態とは異なる。真空エネルギー運動量テンソルは、放射方向のエネルギーフラックスを記述する非零オフ対角成分を有する。後者は純粋に球面誘起効果であり、境界自由幾何には存在しない。ロビン境界条件の定数と放射座標によっては、エネルギーフラックスは球面からまたは球面へ向けることができる。宇宙膨張の初期段階では、時空曲率が球によって誘導されるVEVに与える影響は弱く、対応する膨張の先頭の項は、ミルヌ宇宙の球のそれと一致する。重力場の影響は、膨張の後期において不可欠である。磁場質量と曲率結合パラメータに依存すると、時間座標の関数としての球誘起VEVの崩壊は単調または減衰振動である。球面から遠く離れたところでは、測地線距離の関数としての球面誘起VEVの降下は、質量場と質量場の両方に対して指数関数的である。

A complete set of modes and the Hadamard function are constructed for a scalar field inside and outside a sphere in (D+1)-dimensional de Sitter spacetime foliated by negative constant curvature spaces. We assume that the field obeys a Robin boundary condition on the sphere. The contributions in the Hadamard function induced by the sphere are explicitly separated and the vacuum expectation values (VEVs) of the field squared and energy-momentum tensor are investigated for the hyperbolic vacuum. In the flat spacetime limit the latter is reduced to the conformal vacuum in the Milne universe and is different from the maximally symmetric Bunch-Davies vacuum state. The vacuum energy-momentum tensor has a nonzero off-diagonal component that describes the energy flux in the radial direction. The latter is a purely sphere-induced effect and is absent in the boundary-free geometry. Depending on the constant in the Robin boundary condition and also on the radial coordinate, the energy flux can be directed either from the sphere or towards the sphere. At early stages of the cosmological expansion the effects of the spacetime curvature on the sphere-induced VEVs are weak and the leading terms in the corresponding expansions coincide with those for a sphere in the Milne universe. The influence of the gravitational field is essential at late stages of the expansion. Depending on the field mass and the curvature coupling parameter, the decay of the sphere-induced VEVs, as functions of the time coordinate, is monotonic or damped oscillatory. At large distances from the sphere the fall-off of the sphere-induced VEVs, as functions of the geodesic distance, is exponential for both massless and massive fields.

翻訳日:2023-03-23 00:01:36 公開日:2021-07-09

# 分離可能な数値範囲による自信エンタングルメント検出

Confident entanglement detection via separable numerical range ( http://arxiv.org/abs/2107.04365v1 )

ライセンス: Link先を確認

Timo Simnacher, Jakub Czartowski, Konrad Szymański and Karol Życzkowski

(参考訳) 我々は、複数の測定値のジョイント(分離可能な)数値範囲、すなわち、与えられた観測値に対して(分離可能な)量子状態でアクセス可能な期待値の領域について検討する。これは効率の良い絡み合い検出を可能にするだけでなく、量子状態の集合の幾何学にも光を当てる。より正確には、実験において、得られたデータに対する信頼領域と分離可能な数値範囲が解離した場合、絡み合いを確実に検出する。概して、このような実験の成功は、分離可能な数値範囲が測定された観測値の標準数値範囲と比較されるほど小さい可能性が高い。これら2つの体積の比を用いてこの関係を定量化し、任意の粒子数、局所次元および測定数に対して解析的境界を与えることなく、任意に小さくすることはできないことを示す。さらに, 2つの局所トレースのない2量子ビット生成可観測器の分離可能領域と標準数値範囲の体積を明示的に計算する。さらに、一般的な観測可能量と極端なインスタンスに対する典型的な体積比を考察する。

We investigate the joint (separable) numerical range of multiple measurements, i.e., the regions of expectation values accessible with (separable) quantum states for given observables. This not only enables efficient entanglement detection, but also sheds light on the geometry of the set of quantum states. More precisely, in an experiment, if the confidence region for the obtained data and the separable numerical range are disjoint, entanglement is reliably detected. Generically, the success of such an experiment is more likely the smaller the separable numerical range is compared to the standard numerical range of the observables measured. We quantify this relation using the ratio between these two volumes and show that it cannot be arbitrarily small, giving analytical bounds for any number of particles, local dimensions as well as number of measurements. Moreover, we explicitly compute the volume of separable and standard numerical range for two locally traceless two-qubit product observables, which are of particular interest as they are easier to measure in practice. Furthermore, we consider typical volume ratios for generic observables and extreme instances.

翻訳日:2023-03-23 00:01:07 公開日:2021-07-09

# 無反射ポテンシャルに対するグリーン関数と超対称パートナーを見つけるためのパワーローポテンシャルへの境界状態の追加

Green's functions for reflectionless potentials and addition of boundstates to powerlaw potentials to find Supersymmetric partners ( http://arxiv.org/abs/2107.04332v1 )

ライセンス: Link先を確認

C.V.Sukumar

(参考訳) 無反射ポテンシャルに対するグリーン関数を構成し、解析する。べき乗則ポテンシャルのグリーン関数、その超対称パートナー、および固有値の和則について検討する。$E=0$に付加的な束縛状態をもつ、べき乗則ポテンシャルのSUSYパートナーポテンシャルを構成する。

Green's functions for reflectionless potentials are constructed and analyzed. Green's functions for power-law potentials, their supersymmetric (SUSY) partners and sum rules for eigenvalues are examined. The SUSY partner potentials to power-law potentials which have an additional bound state at $E=0$ are constructed.

翻訳日:2023-03-23 00:00:48 公開日:2021-07-09

# プラトン絡み合い

Platonic Entanglement ( http://arxiv.org/abs/2107.04329v1 )

ライセンス: Link先を確認

José I. Latorre and Germán Sierra

(参考訳) 本稿では, Acillary Absolute Maximally Entangled (AME) 状態に基づくテンソルネットワークを用いて, プラトニックソリッドの位相上で定義される強絡状態の構成について述べる。ドデカヘドロン上のAME(5,2)に基づく量子状態の例を用いて、このアイデアを説明する。このような状態のエントロピーを多くの異なる分割で解析し、それらが整数上で発生し、ほぼ極大であるのを観測する。また,すべてのプラトニックソリッドは,面数,頂点数,辺数が常に素数プラス1であるため,リード・ソロモン符号に基づくAME状態の構成を受け入れる。

We present a construction of highly entangled states defined on the topology of a Platonic solid using tensor networks based on ancillary Absolute Maximally Entangled (AME) states. We illustrate the idea using the example of a quantum state based on AME(5,2) over a dodecahedron. We analyze the entropy of such states on many different partitions, and observe that the entropies take integer values and are almost maximal. We also observe that all Platonic solids admit the construction of AME states based on Reed-Solomon codes, since their numbers of faces, vertices and edges are each a prime number plus one.

翻訳日:2023-03-23 00:00:42 公開日:2021-07-09
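The arithmetic observation at the end of the abstract above, that the face, vertex and edge counts of every Platonic solid are a prime plus one, is easy to verify directly. The following sketch checks only that counting claim. It does not construct Reed-Solomon codes or AME states, and the names `PLATONIC_SOLIDS`, `is_prime` and `all_counts_prime_plus_one` are illustrative.

```python
# (vertices, edges, faces) for the five Platonic solids.
PLATONIC_SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for these small numbers."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def all_counts_prime_plus_one(counts) -> bool:
    """True if every count c in `counts` satisfies: c - 1 is prime."""
    return all(is_prime(c - 1) for c in counts)


for name, (vertices, edges, faces) in PLATONIC_SOLIDS.items():
    # Sanity check on the table itself: Euler's formula V - E + F = 2.
    assert vertices - edges + faces == 2
    print(name, (vertices, edges, faces),
          all_counts_prime_plus_one((vertices, edges, faces)))
```

Each solid passes: for example, the dodecahedron gives 20 - 1 = 19, 30 - 1 = 29 and 12 - 1 = 11, all prime.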

# 量子回路冷凍における電荷動態:熱化とマイクロ波利得

Charge dynamics in quantum-circuit refrigeration: thermalization and microwave gain ( http://arxiv.org/abs/2107.04278v1 )

ライセンス: Link先を確認

Hao Hsu, Matti Silveri, Vasilii Sevriuk, Mikko Möttönen, Gianluigi Catelani

(参考訳) 通常の金属-絶縁体-超導体接合による光子補助トンネルの研究は、量子電気回路の散逸をその場で制御するための便利なツールを提供する可能性を示した。しかし、そのような量子回路冷凍機(QCR)に関する現在の文献では、トンネル過程の電荷ダイナミクスやオープン量子系の位相コヒーレンスについて詳細な記述は示されていない。ここでは、量子電気と電荷の自由度の両方を記述するマスター方程式を導出し、低温と低電荷エネルギーの典型的な実験パラメータが電荷と量子力学の時間スケールの分離をもたらすことを発見する。したがって、電荷分布を平均化することにより、異なる電荷状態のマイナーな効果を考慮に入れることができる。また、交流電圧をトンネル接合に印加することにより、駆動振幅を変化させて超伝導量子ビットの減衰率を4桁以上制御可能とし、40 nsでの量子ビット励起の桁数低下と10^{-4}$以下の残留リセットを求める。さらに、通常の島では、超伝導ギャップ、すなわち量子ドットに比べて電荷エネルギーと単一粒子レベルが大きな場合を考える。このような点QCRから生じる減衰速度は量子ビットリセットにおいて低いように見えるが、結合マイクロ波共振器に効果的な負減衰(利得)を与えることができる。そのようなミリケルビンマイクロ波源のファノ係数はユニティよりも小さくなり、後者の値は最大到達可能電力に近い値に達する。

Previous studies of photon-assisted tunneling through normal-metal-insulator-superconductor junctions have exhibited potential for providing a convenient tool to control the dissipation of quantum-electric circuits in-situ. However, the current literature on such a quantum-circuit refrigerator (QCR) does not present a detailed description for the charge dynamics of the tunneling processes or the phase coherence of the open quantum system. Here we derive a master equation describing both quantum-electric and charge degrees of freedom, and discover that typical experimental parameters of low temperature and yet lower charging energy yield a separation of time scales for the charge and quantum dynamics. Consequently, the minor effect of the different charge states can be taken into account by averaging over the charge distribution. We also consider applying an ac voltage to the tunnel junction, which enables control of the decay rate of a superconducting qubit over four orders of magnitude by changing the drive amplitude; we find an order-of-magnitude drop in the qubit excitation in 40 ns and a residual reset infidelity below $10^{-4}$. Furthermore, for the normal island we consider the case of charging energy and single-particle level spacing large compared to the superconducting gap, i.e., a quantum dot. Although the decay rates arising from such a dot QCR appear low for use in qubit reset, the device can provide effective negative damping (gain) to the coupled microwave resonator. The Fano factor of such a millikelvin microwave source may be smaller than unity, with the latter value being reached close to the maximum attainable power.

翻訳日:2023-03-22 23:59:48 公開日:2021-07-09

# 重力sagによる殻状凝縮:接触と双極子相互作用

Shell-shaped condensates with gravitational sag: contact and dipolar interactions ( http://arxiv.org/abs/2107.04577v1 )

ライセンス: Link先を確認

Maria Arazo, Ricardo Mayol, Montserrat Guilleumas

(参考訳) 小重力下での気泡トラップポテンシャルにおけるボース・アインシュタイン凝縮について検討する。特に,薄い殻に注目し,接触と双極子相互作用の凝縮の研究を行う。まず,双極子相互作用の異方性の影響を解析し,双極子と重力の偏光軸がわずかに不一致の場合,すでに重力の欠如に現れている。そこで, 微小重力場において, 重力方向の瞬時に傾いたり, 重力強度の急激な変化によって引き起こされた, 薄い貝殻状凝縮体の小さな振動のダイナミクスについて検討した。このシステムは、宇宙実験室で重力センサーを実現するための予備段階となるかもしれない。

We investigate Bose-Einstein condensates in bubble trap potentials in the presence of a small gravity. In particular, we focus on thin shells and study both contact and dipolar interacting condensates. We first analyze the effects of the anisotropic nature of the dipolar interactions, which already appear in the absence of gravity and are enhanced when the polarization axis of the dipoles and the gravity are slightly misaligned. Then, in the small gravity context, we investigate the dynamics of small oscillations of these thin, shell-shaped condensates triggered either by an instantaneous tilting of the gravity direction or by a sudden change of the gravity strength. This system could be a preliminary stage for realizing a gravity sensor in space laboratories.

翻訳日:2023-03-22 23:53:34 公開日:2021-07-09

# KscAイオンチャネルの量子モデルにおける輸送しきい値

Transport threshold in a quantum model for the KscA ion channel ( http://arxiv.org/abs/2107.04573v1 )

ライセンス: Link先を確認

N. De March, S. D. Prado and L. G. Brunnet

(参考訳) K$^{+}$チャネルにおける高いスループット率の背後にあるメカニズムは、まだ未解決の問題である。最近のシミュレーションにより、K$^{+}$チャネルコアを通るカリウムの通過(いわゆる選択性フィルター(SF))は、クーロン反発の強さがイオン伝導を凍結するモデルに対して無水であることが示されている。量子コヒーレントホッピングはイオン伝導の仲介に関係があることが示唆されている。量子的アプローチと経路に沿った解離したイオンの仮説の中で、ソース内の多数の粒子から始まり、ドレインで収集されるサイトの線形連鎖によってモデル化されたSFをどう通過するかを確認する。その結果、SF占有率の平均は3イオンであり、これは最近の古典的モデルシミュレーションと一致していることがわかった。

The mechanism behind the high throughput rate in K$^{+}$ channels is still an open problem. Recent simulations have shown that the passage of potassium through the K$^{+}$ channel core, the so-called selectivity filter (SF), is water-free, in contrast to models where the strength of Coulomb repulsion freezes ion conduction. It has been suggested that quantum coherent hopping might be relevant in mediating ion conduction. Within the quantum approach and the hypothesis of desolvated ions along the pathway, we start with a number of particles in a source to see how they go across the SF, modeled by a linear chain of sites, to be collected in a drain. As a main result we show that the threshold SF occupancy is three ions on average, which is in agreement with recent classical model simulations.

翻訳日:2023-03-22 23:53:20 公開日:2021-07-09

# SherLOCKed:サイバーセキュリティ教育のための探偵テーマのシリアスゲーム

SherLOCKED: A Detective-themed Serious Game for Cyber Security Education ( http://arxiv.org/abs/2107.04506v1 )

ライセンス: Link先を確認

Alice Jaffray and Conor Finn and Jason R.C. Nurse

(参考訳) ゲーミフィケーションとシリアスゲームは、多くの分野、特に教育を支援するために徐々に使われている。このようなゲームは、生徒にコンテンツを与える新しい方法を提供し、より伝統的な学習アプローチを補完する。この記事は、2Dトップダウンパズルアドベンチャーのスタイルで作られた新しいシリアスゲームであるSherLOCKEDを提案する。このゲームは、学部のサイバーセキュリティコースの文脈にあり、学生の基本的なセキュリティ概念(CIAトライアド、セキュリティの脅威と攻撃、リスク管理など)に関する知識を統合するために使用される。SherLOCKEDは、既存のシリアスゲームのレビューと共通のゲーミフィケーション原則の研究に基づいて構築された。その後、学部で実施され、112名の学生で評価された。このゲームは、学生が講義中に導入したコンテンツへのさらなるエンゲージメントを可能にする、効果的で魅力的で楽しいソリューションであることがわかった。この研究は、サイバーセキュリティに関する学習を支援するシリアスゲームの使用に新たな証拠を与えている。

Gamification and Serious Games are progressively being used over a host of fields, particularly to support education. Such games provide a new way to engage students with content and can complement more traditional approaches to learning. This article proposes SherLOCKED, a new serious game created in the style of a 2D top-down puzzle adventure. The game is situated in the context of an undergraduate cyber security course, and is used to consolidate students' knowledge of foundational security concepts (e.g. the CIA triad, security threats and attacks and risk management). SherLOCKED was built based on a review of existing serious games and a study of common gamification principles. It was subsequently implemented within an undergraduate course, and evaluated with 112 students. We found the game to be an effective, attractive and fun solution for allowing further engagement with content that students were introduced to during lectures. This research lends additional evidence to the use of serious games in supporting learning about cyber security.

翻訳日:2023-03-22 23:52:19 公開日:2021-07-09

# 臨界パラメトリック量子センシング

Critical parametric quantum sensing ( http://arxiv.org/abs/2107.04503v1 )

ライセンス: Link先を確認

R. Di Candia, F. Minganti, K. V. Petrovnin, G. S. Paraoanu and S. Felicetti

(参考訳) 臨界量子システム(Critical quantum systems)は、相転移に近接して発達する拡散感受性のため、量子力学応用の有望な資源である。ここでは、駆動散逸位相遷移中のパラメトリックカー共振器のメトロジーパワーを評価する。周波数推定のための量子フィッシャー情報と周波数識別のためのヘルストロムバウンドを完全に特徴付ける。漸近的な状態を超えて、実験的な到達可能なパラメータでハイゼンベルク精度を達成できることが示される。我々は、非線形共振器の臨界挙動を利用して量子磁気センサの精度と超伝導量子ビット読み出しの忠実性を高めるプロトコルを設計する。

Critical quantum systems are a promising resource for quantum metrology applications, due to the diverging susceptibility developed in proximity of phase transitions. Here, we assess the metrological power of parametric Kerr resonators undergoing driven-dissipative phase transitions. We fully characterize the quantum Fisher information for frequency estimation, and the Helstrom bound for frequency discrimination. By going beyond the asymptotic regime, we show that the Heisenberg precision can be achieved with experimentally reachable parameters. We design protocols that exploit the critical behavior of nonlinear resonators to enhance the precision of quantum magnetometers and the fidelity of superconducting qubit readout.

翻訳日:2023-03-22 23:52:04 公開日:2021-07-09

# 高調波発生によるツイストト秒パルス中のトーラス結び角運動量

Torus Knot Angular Momentum in Twisted Attosecond Pulses from High Harmonic Generation ( http://arxiv.org/abs/2107.04499v1 )

ライセンス: Link先を確認

Björn Minneker, Birger Böning, Anne Weber and Stephan Fritzsche

(参考訳) 双円ねじれラゲール・ガウスビームは、新しい角運動量として、一定のトーラス結び目角運動量(TKAM)を持つ。 tkam は高調波発生のような非線形原子プロセスで保存され、時間遅延パラメータ $\tau$ と調整パラメータ $\gamma$ で分類することができる。これらのパラメータは、それぞれ投影された軌道角運動量と2つの重ね合わせされたラゲール・ガウシアンビームのエネルギーによって定義される。我々は、駆動ビームと高調波放射から$\tau$と$\gamma$を決定する一貫した幾何学的手法を導出した。この方法は、放射される高調波放射に対する不変パラメータ($\tau$ と $\gamma$)の両方を関連づける。したがって、$\tau$と$\gamma$は2つの異なるトーラス結び目から読み取ることができる。これらの結び目は、それぞれの高調波放射または駆動ビームの電界の時空間的進化から構築することができる。二次元ラゲール・ガウス線を明示的に照射した平面型原子ガスターゲットの分散パラメータの分類を実証する。さらに、$\tau$ と $\gamma$ によって決定される各トーラス結び目は、小さな修正で互いに写像できることを示した。この幾何学的手法は、純粋な形式的導出と比較して、不変パラメータである$\tau$ と $\gamma$ を解釈する異なる方法をもたらす。この研究で示された研究は、前回の発見とよく一致し、双円状のラゲール・ガウスビームによって誘導される高調波発生の文脈におけるTKAMの動的対称性に関する洞察を与える。

Bicircular twisted Laguerre-Gaussian beams possess a definite torus knot angular momentum (TKAM) as a new form of angular momentum. TKAM is conserved in nonlinear atomic processes such as high harmonic generation and can be classified by a time delay parameter $\tau$ and a coordination parameter $\gamma$. These parameters are defined by the respective projected orbital angular momentum and the energy of the two superimposed Laguerre-Gaussian beams. We derive a consistent geometric method to determine $\tau$ and $\gamma$ from the driving beam as well as from the high harmonic radiation. This method relates both invariance parameters ($\tau$ and $\gamma$) to the emitted high harmonic radiation. Therefore, $\tau$ and $\gamma$ can be read off of two different torus knots. These knots can be constructed from the spatio-temporal evolution of the electric field of the respective high harmonic radiation or the driving beam. We demonstrate the classification of the invariance parameters for a planar atomic gas target irradiated by bicircular Laguerre-Gaussian beams explicitly. Furthermore, we demonstrate that the respective torus knots determined by $\tau$ and $\gamma$ can be mapped onto each other within minor modifications. This geometric method yields a different way to interpret the invariance parameters $\tau$ and $\gamma$ as well as their underlying relation compared to a purely formal derivation. The investigations presented in this work are in good agreement with previous findings and provide insight into the dynamical symmetry of TKAM in the context of high harmonic generation induced by bicircular twisted Laguerre-Gaussian beams.

翻訳日:2023-03-22 23:51:53 公開日:2021-07-09

# 光キャビティにおける原子配列を用いた多重通信通信量子ネットワーク

Multiplexed telecom-band quantum networking with atom arrays in optical cavities ( http://arxiv.org/abs/2107.04477v1 )

ライセンス: Link先を確認

William Huie, Shankar G. Menon, Hannes Bernien, and Jacob P. Covey

(参考訳) 通信帯域演算や大規模量子情報処理と互換性のある物質ベースの量子ビットの量子ネットワークノードの実現は、基本的な量子ネットワークの可能性を制限する優れた課題である。マルチプレクサネットワークアーキテクチャにおいて、中性原子配列とテレコムバンド光子からなる量子プロセッサを相互接続するプラットフォームを提案する。単一原子ではなく大きな原子配列を用いることで、双方向通信の有害な影響を緩和し、2つのノード間の絡み合いを2桁近く改善する。さらに、各ノード内で高忠実度決定性ゲートと読み出しを同時に実行し、量子リピータへのドアを開き、ネットワークの長さと忠実度を高めるプロトコルを浄化する機能を提供する。中間ノードを量子リピータとして使用することにより,実際の仮定に基づいて約1500kmにわたる絡み合い分布の実現可能性を示し,大陸間ネットワークの青写真を提供する。最後に,分散フォールトトレラント量子コンピュータのバックボーンとして機能する,約25個のベルペアを大都市圏に分散できることを実証する。

The realization of a quantum network node of matter-based qubits compatible with telecom-band operation and large-scale quantum information processing is an outstanding challenge that has limited the potential of elementary quantum networks. We propose a platform for interfacing quantum processors comprising neutral atom arrays with telecom-band photons in a multiplexed network architecture. The use of a large atom array instead of a single atom mitigates the deleterious effects of two-way communication and improves the entanglement rate between two nodes by nearly two orders of magnitude. Further, this system simultaneously provides the ability to perform high-fidelity deterministic gates and readout within each node, opening the door to quantum repeater and purification protocols to enhance the length and fidelity of the network, respectively. Using intermediate nodes as quantum repeaters, we demonstrate the feasibility of entanglement distribution over approximately 1500 km based on realistic assumptions, providing a blueprint for a transcontinental network. Finally, we demonstrate that our platform can distribute approximately 25 Bell pairs over metropolitan distances, which could serve as the backbone of a distributed fault-tolerant quantum computer.

翻訳日:2023-03-22 23:51:02 公開日:2021-07-09

# ブロックチェーンとスマートコントラクトにセマンティック記述が必要な理由

Why blockchain and smart contracts need semantic descriptions ( http://arxiv.org/abs/2107.14101v1 )

ライセンス: Link先を確認

Zoran Škoda

(参考訳) 私たちは、ブロックチェーンやスマートコントラクトの内容や振る舞いの背後にある、その特定のレベルの関連する現実の特徴を記述するレベルの階層が存在する、と論じています。これらの記述の基礎の研究がこれらの記述の形式主義、ツール、標準を発達させ、設定すれば、これらの体系の選択、設計、監査、法的な統制はより情報化され、より容易で、より高いレベルに引き上げられる。

We argue that there is a hierarchy of levels, each describing the features of reality relevant to that level, behind the content and behavior of blockchains and smart contracts in their realistic deployment. The choice, design, audit and legal control of these systems could be better informed, easier and raised to a higher level if research on the foundations of these descriptions develops and sets the formalisms, tools and standards for them.

翻訳日:2023-03-22 23:44:16 公開日:2021-07-09

# Smart Band: 緊急管理のための統合デバイス

Smart Band: An Integrated Device for Emergency Management ( http://arxiv.org/abs/2107.14100v1 )

ライセンス: Link先を確認

A. Jackulin Mahariba, Shivam Patel

(参考訳) 誘拐や緊急事態の場合には、しばしば助けを求めるために無力化される。そして通常、最初の応答者が到着するまでは遅すぎます。現在、市場に出回っている「安全」デバイスは、あまり実用的でない電源ボタンをダブルタップしたり、そもそも大きな投資をしているスマートウォッチのアプリなど、あまりにも初歩的すぎることが多い。 Smart Bandは、誘拐、野生動物への正面対面、心臓発作などの危険な状況で自分自身を見つける人による物理的なトリガーの必要性を排除し、むしろ心拍を感知することを目的としている。 smart bandはパーソナライズされたウェアラブルデバイスとして設計されており、ユーザの心拍数を機械学習アルゴリズムを使って収集し、トレーニングすることで、イベントが特定された時にアラートシステムが自動的に起動される。したがって、緊急状況を評価する精度が高く、虚偽率を低減することができる。イベントが検出されると、バンドは第1応答者および緊急連絡先にgps座標を中継し、それはネットワークキャリア(simカードモジュール)を介してバンドから直接送信される。基本的には、スマートバンドはgpsトラッカー、心拍センサー、ネットワークモジュール、bluetoothモジュールで構成されており、既存の技術はすべて大量生産されており、最終製品が手頃な価格で大量生産できる程度に大量生産されている。さらなる開発では、スマートバンドは、その目的を自律的に果たすことができる、きめ細かいウェアラブルジュエリーにカスタマイズできる。

In the event of a kidnapping or a medical emergency, a person is often too incapacitated to call for help, and it is usually too late by the time first responders arrive on scene. Many of the 'safety' devices currently on the market are too rudimentary: double-tapping a power button is not very practical, and an app on a smart watch requires a large investment in the first place. The Smart Band aims to eliminate the need for a physical trigger by a person who is in a dangerous situation such as a kidnapping, an encounter with a wild animal or a heart attack, and instead senses the heartbeat. The Smart Band is designed as a personalized wearable device: the user's heart rate is collected and used to train a machine learning algorithm, which triggers the alert system automatically when an event is identified. This is expected to make the assessment of emergency situations more accurate and to reduce the false-alarm rate. As soon as an event is detected, the band relays GPS coordinates to first responders and emergency contacts. These are sent via the network carrier (SIM card module) directly from the band, without relying on a mobile phone, which is usually out of reach in such emergencies. In essence, the Smart Band consists of a GPS tracker, a heartbeat sensor, a network module and a Bluetooth module, all existing technologies that are mass-produced to an extent that the end product can be made affordably and in large quantities. With further development, the Smart Band could be customized into wearable jewellery that serves the same purpose autonomously.

翻訳日:2023-03-22 23:44:09 公開日:2021-07-09

# 振動ポラリトン化学の理論へのロードマップ

A roadmap toward the theory of vibrational polariton chemistry ( http://arxiv.org/abs/2107.09026v1 )

ライセンス: Link先を確認

Derek S Wang and Susanne F Yelin

(参考訳) 振動ポラリトン化学の分野は2016年に、室温での化学反応速度が外部に駆動することなく共鳴調整された赤外線キャビティ内で変化した際に確立された。反応速度がなぜ変化するのかを理解するために世界中の科学者による激しい努力にもかかわらず、理論的な説明は存在しない。この観点からは, まず, 反応物質濃度, キャビティ周波数, 対称性の役割をほのめかした, このセレント実験およびそれに続く関連する実験を概観する。次に, 量子力学修飾遷移速度理論, フォトニック溶媒ケージ効果, 暗黒状態からの散逸の影響, 分子内振動エネルギー再分配による結合強化, 局所分子特性の総合的向上など, 主要な理論の関連性を分析する。最後に、理論と理論家のための新しい経路をテストする実験を提案し、振動ポラリトン化学の理論へのロードマップを構築する。強い結合機構の開始の重要性を理解し,反応経路の変化を捉えるための実験を設計し,さらにキャビティ修飾した分子内振動エネルギー再分配の理論と局所分子特性の総合的強化が次の重要なステップであると考えている。この視点が振動偏光子化学の分野の研究を導くための貴重な資源になることを願っている。

The field of vibrational polariton chemistry was firmly established in 2016 when a chemical reaction rate at room temperature was modified within a resonantly tuned infrared cavity without externally driving the system. Despite intense efforts by scientists around the world to understand why the reaction rate changes, no convincing theoretical explanation exists. In this perspective, first, we briefly review this seminal experiment, as well as relevant experiments that have since followed that have hinted at the roles of reactant concentration, cavity frequency, and symmetry. Then, we analyze the relevance of leading theories, such as quantum electrodynamics-modified transition rate theories, the photonic solvent cage effect, the impact of dissipation from dark states, bond strengthening via intramolecular vibrational energy redistribution, and collectively enhanced local molecular properties. Finally, we construct a roadmap toward the theory of vibrational polariton chemistry by suggesting experiments to test theories and new paths for theorists. We believe that understanding the importance of the onset of the strong coupling regime, designing experiments to capture changes in reaction pathways, and further developing the theories of cavity-modified intramolecular vibrational energy redistribution and collectively enhanced local molecular properties are crucial next steps. We hope this perspective will be a valuable resource for guiding research in the field of vibrational polariton chemistry.

翻訳日:2023-03-22 23:43:45 公開日:2021-07-09

# 過去の断片:家庭内暴力の加害者によるピアサポートの算出

Fragments of the Past: Curating Peer Support with Perpetrators of Domestic Violence ( http://arxiv.org/abs/2107.04711v1 )

ライセンス: Link先を確認

Rosanna Bellini, Alexander Wilson, Jan David Smeddinck

(参考訳) デジタルピアサポートネットワークが、自分や他人を傷つける人々にとって行動の変化や幸福な結果にポジティブな影響を与えうるという証拠が増えている。しかし、このようなネットワークの構築と維持には倫理的かつ実践的な課題があり、特に家庭内暴力の加害者が団結する際に独特なリスクを負う。本研究は,6人の支援労働者と18人の加害者とともに,音声メッセージを有形人工物と結び付ける社会材料システムFragments of the Pastの設計と展開について10ヶ月にわたる研究を報告する。暴力から脱落した経験をデジタルで表現したアーティファクトの作り方 - フラッグメント(fragments) - を共有することで、直接対人コミュニケーションに固有のリスクを負うことなく、モチベーションやラプポートのメッセージを伝えることができる。これらの知見は、挑戦的な人口を持つ将来のネットワーク設計の実践的考察の基礎となる。

There is growing evidence that digital peer-support networks can have a positive influence on behaviour change and wellbeing outcomes for people who harm themselves and others. However, making and sustaining such networks are subject to ethical and pragmatic challenges, particularly for perpetrators of domestic violence, who pose unique risks when brought together. In this work we report on a ten-month study in which we worked with six support workers and eighteen perpetrators in the design and deployment of Fragments of the Past, a socio-material system that connects audio messages with tangible artefacts. We share how crafting digitally-augmented artefacts - 'fragments' - of experiences of desisting from violence can translate messages for motivation and rapport between peers, without subjecting the process to risks inherent in direct inter-personal communication. These insights provide the basis for practical considerations for future network design with challenging populations.

翻訳日:2023-03-22 23:43:24 公開日:2021-07-09

# A Method for Automatic Search of Artificial Neural Networks

Um Metodo para Busca Automatica de Redes Neurais Artificiais ( http://arxiv.org/abs/2107.04702v1 )

ライセンス: Link先を確認

Anderson P. da Silva, Teresa B. Ludermir, Leandro M. Almeida

(参考訳) 本稿では,セル遺伝アルゴリズムを用いたニューラルネットワークの自動検索手法について述べる。一般的な遺伝的アルゴリズムにおけるこの方法の主な違いは、個人に位置情報を提供することができるセルオートマトンを使用することで、検索空間における局所最小化の可能性を減らすことである。この方法は、初期重み付け、伝達関数、アーキテクチャ、学習規則の同時選択のための進化的探索を用いる。実験結果から,本手法は,文献に見られる他の手法と比較して,十分に一般化し,訓練時間も短く,コンパクトで効率的なネットワークを探索できることがわかった。

This paper describes a method that automatically searches for Artificial Neural Networks using Cellular Genetic Algorithms. The main difference between this method and a common genetic algorithm is the use of a cellular automaton capable of providing the location for individuals, reducing the possibility of local minima in the search space. This method employs an evolutionary search for simultaneous choices of initial weights, transfer functions, architectures and learning rules. Experimental results have shown that the developed method can find compact, efficient networks with a satisfactory generalization power and with shorter training times when compared to other methods found in the literature.

翻訳日:2023-03-22 23:43:08 公開日:2021-07-09

# 量子コンピューティングによるアンテナアレー薄肉化

Antenna Array Thinning Through Quantum Computing ( http://arxiv.org/abs/2107.04684v1 )

ライセンス: Link先を確認

Paolo Rocca, Nicola Anselmi, Giacomo Oliveri, Alessandro Polo and Andrea Massa

(参考訳) 量子フーリエ変換(QFT)によるアンテナアレイの薄膜化を提案する。配列要素の候補位置の格子が与えられた場合、配列要素がどのアンテナ位置を占有するかを問う問題は量子コンピューティング(QC)フレームワークで定式化され、QFTアルゴリズムの適切な実装に基づくアドホック設計手法で対処される。提案手法の特徴と利点を指摘するために, 代表的な数値計算結果を提示し, 考察した。

Thinning antenna arrays through quantum Fourier transform (QFT) is proposed. Given the lattice of the candidate locations for the array elements, the problem of selecting which antenna location has to be either occupied or not by an array element is formulated in the quantum computing (QC) framework and then addressed with an ad-hoc design method based on a suitable implementation of the QFT algorithm. Representative numerical results are presented and discussed to point out the features and the advantages of the proposed QC-based thinning technique.

翻訳日:2023-03-22 23:42:56 公開日:2021-07-09

# 量子純度と生体直交多項式反復のモーメント

Moments of quantum purity and biorthogonal polynomial recurrence ( http://arxiv.org/abs/2107.04637v1 )

ライセンス: Link先を確認

Shi-Hao Li and Lu Wei

(参考訳) ビューズ・ハルアンサンブルは密度行列のユニークな尺度であり、量子情報処理における様々な特性を満たす。本研究では,バーレス・ハル・アンサンブル上での絡み合いの統計的挙動を,最も単純なエントロピー(量子純度)によって測定した。この研究の主な成果は、任意のサブシステム次元に対して有効な量子純粋性の正確な第2および第3モーメント表現であり、文学における対応する結果は、等しいサブシステム次元のシナリオに限られる。結果を得るためには,cauchy-laguerre biorthogonal polynomials 上の基底積分の帰結関係を独立に求めた。

The Bures-Hall ensemble is a unique measure of density matrices that satisfies various distinguished properties in quantum information processing. In this work, we study the statistical behavior of entanglement over the Bures-Hall ensemble as measured by the simplest form of an entanglement entropy - the quantum purity. The main results of this work are the exact second and third moment expressions of quantum purity valid for any subsystem dimensions, where the corresponding results in the literature are limited to the scenario of equal subsystem dimensions. In obtaining the results, we have derived recurrence relations of the underlying integrals over the Cauchy-Laguerre biorthogonal polynomials that may be of independent interest.

翻訳日:2023-03-22 23:42:46 公開日:2021-07-09

# Tavis-Cummingsモデルを超えて:原子アンサンブルを用いたQEDの再検討

Beyond the Tavis-Cummings model: revisiting cavity QED with atomic ensembles ( http://arxiv.org/abs/2107.04583v1 )

ライセンス: Link先を確認

Martin Blaha, Aisling Johnson, Arno Rauschenbeutel, Jürgen Volz

(参考訳) 単一モード電磁場と$N$2レベル原子のアンサンブルの相互作用は、Tavis-Cummingsモデルによって説明される。ここで、集合的に強化された光-物質結合強度は、$g_N = \sqrt{N} \bar{g}_1$, $\bar{g}_1$ で与えられる。以前は、このモデルは多くの空洞実験を記述し分析するために用いられてきた。ここでは,非キャビティモードへの有効散乱速度がキャビティの自由スペクトル範囲と比較して無視できる場合にのみ正当性を示す。実験パラメータに関しては、アンサンブルの光学的深さが低く、いくつかの最先端の実験で破られる条件が必要である。我々は、tavis-cummingsモデルの有効性の定量的条件を与え、全ての連続原子と光子のカスケード相互作用を考慮したより一般的なハミルトニアン記述を導出する。その結果,tavis-cummingsモデルで得られた予測と定量的および定性的に予測が異なっていた。最後に,Tavis-Cummingsモデルの予測から逸脱していることを示す実験データについて述べる。本研究は、量子エミッタの光密度アンサンブルを光共振器に結合した全ての実験に関係している。

The interaction of an ensemble of $N$ two-level atoms with a single mode electromagnetic field is described by the Tavis-Cummings model. There, the collectively enhanced light-matter coupling strength is given by $g_N = \sqrt{N} \bar{g}_1$, where $\bar{g}_1$ is the ensemble-averaged single-atom coupling strength. Formerly, this model has been employed to describe and to analyze numerous cavity-based experiments. Here, we show that this is only justified if the effective scattering rate into non-cavity modes is negligible compared to the cavity's free-spectral range. In terms of experimental parameters, this requires that the optical depth of the ensemble is low, a condition that is violated in several state-of-the-art experiments. We give quantitative conditions for the validity of the Tavis-Cummings model and derive a more general Hamiltonian description that takes into account the cascaded interaction of the photons with all consecutive atoms. We show that the predictions of our model can differ quantitatively and even qualitatively from those obtained with the Tavis-Cummings model. Finally, we present experimental data, for which the deviation from the predictions of the Tavis-Cummings model is apparent. Our findings are relevant for all experiments in which optically dense ensembles of quantum emitters are coupled to an optical resonator.

翻訳日:2023-03-22 23:41:50 公開日:2021-07-09
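
As a quick numerical reading of the scaling $g_N = \sqrt{N} \bar{g}_1$ quoted in the abstract above, the following minimal Python sketch tabulates the collective coupling for a few ensemble sizes. The function name `collective_coupling` and the numerical values are illustrative assumptions, not taken from the paper, and the sketch covers only the Tavis-Cummings prediction, not the more general cascaded description that the authors derive.

```python
import math

def collective_coupling(n_atoms, g1_bar):
    """Tavis-Cummings collective coupling g_N = sqrt(N) * g1_bar."""
    if n_atoms < 0:
        raise ValueError("n_atoms must be non-negative")
    return math.sqrt(n_atoms) * g1_bar

if __name__ == "__main__":
    g1_bar = 0.5  # hypothetical single-atom coupling, arbitrary frequency units
    for n in (1, 100, 10_000):
        print(n, collective_coupling(n, g1_bar))
```

In this model, going from a single atom to 10,000 atoms raises the coupling strength by a factor of 100.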

# 離散時間アルゴリズム理解のための$O(s^r)$-resolution ODE Frameworkとミニマックス問題の線形収束への応用

An $O(s^r)$-Resolution ODE Framework for Understanding Discrete-Time Algorithms and Applications to the Linear Convergence of Minimax Problems ( http://arxiv.org/abs/2001.08826v7 )

ライセンス: Link先を確認

Haihao Lu

(参考訳) 離散時間アルゴリズム(DTAs)のダイナミクスを理解するために、通常の微分方程式(ODEs)を用いた長い歴史がある。意外なことに、未解決の基本的な疑問が2つある。 (i) 所定のDTAから適切な ODE を取得する方法が不明確で、 (ii) DTA の収束と対応する ODE との関係は不明確である。本稿では、上記の2つの疑問に(部分的に)答える汎用DTAの挙動を分析するための新しい機械、$O(s^r)$-resolution ODEフレームワークを提案する。フレームワークには3つのステップがある。 1. 与えられた DTA から適切な ODE を得るには、$s$ が DTA のステップサイズである次数 $r$ でパラメータ化された DTA の $O(s^r)$-resolution ODE の階層を定義する。 DTA からユニークな $O(s^r)$-resolution ODE を構築するための主要なアプローチを提案する。 2. 得られたODEを解析するために,DTAのエネルギー関数に対する$O(s^r)$-linear-convergence条件を提案し,そこで$O(s^r)$-resolution ODEが最適解に線形に収束する。 3. DTA とその対応する ODE の収束特性をブリッジするために、エネルギー関数の固有性を定義し、適切なエネルギー関数に対する$O(s^r)$-resolution ODE の線型収束が、DTA の線形収束を自動的に保証できることを示す。この機構をよりよく説明するために、制約のないミニマックス問題 $\min_{x\in\mathbb{R}^n} \max_{y\in \mathbb{R}^m} L(x,y)$ の解法として、勾配降下昇降法(GDA)、近点法(PPM)、外勾配法(EGM)の3つの古典的アルゴリズムについて検討する。

There has been a long history of using ordinary differential equations (ODEs) to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, there are still two fundamental and unanswered questions: (i) it is unclear how to obtain a suitable ODE from a given DTA, and (ii) the connection between the convergence of a DTA and that of its corresponding ODEs is unclear. In this paper, we propose a new machinery -- an $O(s^r)$-resolution ODE framework -- for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: 1. To obtain a suitable ODE from a given DTA, we define a hierarchy of $O(s^r)$-resolution ODEs of a DTA parameterized by the degree $r$, where $s$ is the step-size of the DTA. We present a principal approach to construct the unique $O(s^r)$-resolution ODEs from a DTA; 2. To analyze the resulting ODE, we propose the $O(s^r)$-linear-convergence condition of a DTA with respect to an energy function, under which the $O(s^r)$-resolution ODE converges linearly to an optimal solution; 3. To bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that the linear convergence of the $O(s^r)$-resolution ODE with respect to a proper energy function can automatically guarantee the linear convergence of the DTA. To better illustrate this machinery, we utilize it to study three classic algorithms -- gradient descent ascent (GDA), proximal point method (PPM) and extra-gradient method (EGM) -- for solving the unconstrained minimax problem $\min_{x\in\mathbb{R}^n} \max_{y\in \mathbb{R}^m} L(x,y)$.

翻訳日:2023-01-07 13:32:04 公開日:2021-07-09
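
The three discrete-time algorithms named in the abstract above (GDA, PPM and EGM) can be compared on a toy problem. The sketch below is a minimal illustration, assuming the scalar bilinear objective $L(x,y) = xy$, a step size $s = 0.1$ and a starting point $(1, 1)$; these choices and all function names are assumptions made for illustration. It shows only the update rules themselves, not the $O(s^r)$-resolution ODE construction of the paper.

```python
import math

# Toy objective L(x, y) = x * y, so dL/dx = y and dL/dy = x.
# The unique saddle point is the origin.

def gda_step(x, y, s):
    """Gradient descent ascent: simultaneous explicit gradient steps."""
    return x - s * y, y + s * x

def ppm_step(x, y, s):
    """Proximal point method: implicit step, solved in closed form for L = x*y."""
    denom = 1.0 + s * s
    return (x - s * y) / denom, (y + s * x) / denom

def egm_step(x, y, s):
    """Extra-gradient method: look-ahead step, then update with its gradient."""
    x_half, y_half = x - s * y, y + s * x
    return x - s * y_half, y + s * x_half

def distance_after(step, n_steps=200, s=0.1, start=(1.0, 1.0)):
    """Distance to the saddle point (0, 0) after n_steps iterations."""
    x, y = start
    for _ in range(n_steps):
        x, y = step(x, y, s)
    return math.hypot(x, y)

if __name__ == "__main__":
    for name, step in (("GDA", gda_step), ("PPM", ppm_step), ("EGM", egm_step)):
        print(name, distance_after(step))
```

On this toy problem each GDA step multiplies the squared distance to the origin by $1 + s^2$, so the iterates drift away from the saddle point, while a PPM step multiplies it by $1/(1 + s^2)$ and an EGM step by $1 - s^2 + s^4$, so both contract for $0 < s < 1$.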

# 一般バナッハ空間におけるコヒーレントとアルキメデスの選択

Coherent and Archimedean choice in general Banach spaces ( http://arxiv.org/abs/2002.05461v4 )

ライセンス: Link先を確認

Gert de Cooman

(参考訳) 私は、抽象バナッハ空間に生きる選択肢間の二項選択と非二項選択に対するアルキメデス性という新しい概念を、非常に一般的な選択モデルのクラスを通して導入し、研究する。 In order to be able to bring an important diversity of contexts into the fold, amongst which choice between horse lottery options, I pay special attention to the case where these linear spaces don't include all 'constant' options. I consider the frameworks of conservative inference associated with Archimedean (and coherent) choice models, and also pay quite a lot of attention to representation of general (non-binary) choice models in terms of the simpler, binary ones. The representation theorems proved here provide an axiomatic characterisation for, amongst many other choice methods, Levi's E-admissibility and Walley-Sen maximality.

I introduce and study a new notion of Archimedeanity for binary and non-binary choice between options that live in an abstract Banach space, through a very general class of choice models, called sets of desirable option sets. In order to be able to bring an important diversity of contexts into the fold, amongst which choice between horse lottery options, I pay special attention to the case where these linear spaces don't include all 'constant' options. I consider the frameworks of conservative inference associated with Archimedean (and coherent) choice models, and also pay quite a lot of attention to representation of general (non-binary) choice models in terms of the simpler, binary ones. The representation theorems proved here provide an axiomatic characterisation for, amongst many other choice methods, Levi's E-admissibility and Walley-Sen maximality.

翻訳日:2023-01-01 13:23:06 公開日:2021-07-09

# 高エネルギー物理データを用いた量子インスピレーション機械学習

Quantum-inspired Machine Learning on high-energy physics data ( http://arxiv.org/abs/2004.13747v2 )

ライセンス: Link先を確認

Timo Felser, Marco Trenti, Lorenzo Sestini, Alessio Gianelle, Davide Zuliani, Donatella Lucchesi and Simone Montangero

(参考訳) 量子多体システムのシミュレーション用に設計された数値ツールであるTensor Networksは、機械学習の問題を解決するために最近応用されている。木テンソルネットワークをエクスプロイトし、CERNの大型ハドロン衝突型加速器によって生成されたデータの分析と分類を、高エネルギー物理学において非常に重要かつ挑戦的なビッグデータ問題に適用する。特に, LHCb実験において, いわゆるb-ジェット, 陽子-陽子衝突に由来するb-クォークを効果的に分類する方法, および, 分類結果の解釈方法について述べる。我々は,テンソルネットワークアプローチを利用して重要な特徴を抽出し,学習プロセスで取得した情報に基づいてネットワーク形状を適応する。最後に,木テンソルネットワークを適応させて,学習プロセスを繰り返すことなく,最適な精度や高速な応答を実現する方法を示す。これらの結果は、数十mhz規模のイベントをトリガできる現在のlhcbイベント分類や将来のlhcbイベント分類に必要な重要な要素である、高周波リアルタイムアプリケーションの実装への道を開いた。

Tensor Networks, a numerical tool originally designed for simulating quantum many-body systems, have recently been applied to solve Machine Learning problems. Exploiting a tree tensor network, we apply a quantum-inspired machine learning technique to a very important and challenging big data problem in high energy physics: the analysis and classification of data produced by the Large Hadron Collider at CERN. In particular, we present how to effectively classify so-called b-jets, jets originating from b-quarks from proton-proton collisions in the LHCb experiment, and how to interpret the classification results. We exploit the Tensor Network approach to select important features and adapt the network geometry based on information acquired in the learning process. Finally, we show how to adapt the tree tensor network to achieve optimal precision or fast response in time without the need of repeating the learning process. These results pave the way to the implementation of high-frequency real-time applications, a key ingredient needed among others for current and future LHCb event classification able to trigger events at the tens of MHz scale.

翻訳日:2022-12-08 22:50:32 公開日:2021-07-09

# data-to-textタスクのためのtext-to-text事前トレーニング

Text-to-Text Pre-Training for Data-to-Text Tasks ( http://arxiv.org/abs/2005.10433v3 )

ライセンス: Link先を確認

Mihir Kale, Abhinav Rastogi

(参考訳) データ・ツー・テキストタスクの事前トレーニング+微調整戦略について検討する。実験の結果,テキスト・トゥ・テキスト・プレトレーニングをT5形式で行うことで,データ・ツー・テキスト生成に適したパイプライン型ニューラルネットワークモデルと,BERT や GPT-2 といった代替言語モデルに基づく事前トレーニング技術に勝ることを示す。重要な点として、T5事前トレーニングはドメイン外のテストセットを大きく改善することで証明されるように、より良い一般化をもたらす。私たちの研究が、データからテキストへのタスクでより普及するにつれて、将来の研究のベースラインとして役立つことを願っています。

We study the pre-train + fine-tune strategy for data-to-text tasks. Our experiments indicate that text-to-text pre-training in the form of T5, enables simple, end-to-end transformer based models to outperform pipelined neural architectures tailored for data-to-text generation, as well as alternative language model based pre-training techniques such as BERT and GPT-2. Importantly, T5 pre-training leads to better generalization, as evidenced by large improvements on out-of-domain test sets. We hope our work serves as a useful baseline for future research, as transfer learning becomes ever more prevalent for data-to-text tasks.

翻訳日:2022-11-30 23:30:27 公開日:2021-07-09

# Deep RelativeFusion:シングルイメージ相対深度予測を用いた高密度単分子SLAM

DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction ( http://arxiv.org/abs/2006.04047v3 )

ライセンス: Link先を確認

Shing Yan Loo, Syamsiah Mashohor, Sai Hong Tang, Hong Zhang

(参考訳) 本稿では,大域的に一貫した3次元構造を復元できる,DeepRelativeFusionと呼ばれる高密度単分子SLAMシステムを提案する。この目的のために,視覚的なslamアルゴリズムを用いて,キーフレームのカメラポーズとセミセンス深度マップを確実に復元し,相対深度予測を用いてセミセンス深度マップを高密度化し,キーフレームポーズグラフを洗練する。半密度深度マップを改善するため, 隣接する画素の画素強度と深度を考慮した構造保存型平均平滑化フィルタである適応フィルタ方式を提案する。高密度化を実現するために,deepfusionが提案するエネルギー最小化フレームワークについて,(1)コスト関数の改善,(2)単像相対深度予測の2つの段階的な改善を提案する。密度化後、キーフレームを2ビュー一貫した半深度と深度マップで更新し、ポーズグラフの最適化を改善し、正確なシーン再構成のためにキーフレームのポーズを洗練するためのフィードバックループを提供する。我々のシステムは最先端の高密度SLAMシステムよりも高い精度で高精度に再現できる。

In this paper, we propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable of recovering a globally consistent 3D structure. To this end, we use a visual SLAM algorithm to reliably recover the camera poses and semi-dense depth maps of the keyframes, and then use relative depth prediction to densify the semi-dense depth maps and refine the keyframe pose-graph. To improve the semi-dense depth maps, we propose an adaptive filtering scheme, which is a structure-preserving weighted average smoothing filter that takes into account the pixel intensity and depth of the neighbouring pixels, yielding substantial reconstruction accuracy gain in densification. To perform densification, we introduce two incremental improvements upon the energy minimization framework proposed by DeepFusion: (1) an improved cost function, and (2) the use of single-image relative depth prediction. After densification, we update the keyframes with two-view consistent optimized semi-dense and dense depth maps to improve pose-graph optimization, providing a feedback loop to refine the keyframe poses for accurate scene reconstruction. Our system outperforms the state-of-the-art dense SLAM systems quantitatively in dense reconstruction accuracy by a large margin.

翻訳日:2022-11-24 08:32:03 公開日:2021-07-09

# 部分観測線形力学系における粒子フィルタリングは計画に有効か?

When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems? ( http://arxiv.org/abs/2006.05975v2 )

ライセンス: Link先を確認

Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

(参考訳) 粒子フィルタリングは確率力学系の潜在状態を推定する一般的な方法であり、その理論的性質は機械学習や統計コミュニティでよく研究されている。多くの制御問題、例えば、部分的に観測された線形力学系(POLDS)では、推論された潜在状態が各ステップの計画にさらに使用される。本稿では,逐次計画のための粒子フィルタリングの効率に関する厳密な研究を開始し,最初の粒子複雑性境界を与える。過去の行動の誤りは未来に影響を与えるかもしれないが、粒子フィルタリングに基づく政策の長期的報酬が正確な推測に基づいてそれに近いように、必要な粒子の数を制限できる。特に、安定系では、多項式的に多くの粒子が十分であることを示す。我々の証明の鍵は、正確な計画と、粒子フィルタリングに基づく近似計画によって生成されるシーケンスに基づく理想的なシーケンスのカップリングである。このテクニックは、他の逐次的な意思決定問題にも有用だと考えています。

Particle filtering is a popular method for inferring latent states in stochastic dynamical systems, whose theoretical properties have been well studied in machine learning and statistics communities. In many control problems, e.g., partially observed linear dynamical systems (POLDS), oftentimes the inferred latent state is further used for planning at each step. This paper initiates a rigorous study on the efficiency of particle filtering for sequential planning, and gives the first particle complexity bounds. Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference. In particular, we show that, in stable systems, polynomially many particles suffice. Key in our proof is a coupling of the ideal sequence based on the exact planning and the sequence generated by approximate planning based on particle filtering. We believe this technique can be useful in other sequential decision-making problems.

翻訳日:2022-11-23 05:16:43 公開日:2021-07-09
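
The setting in the abstract above, where a latent state inferred by particle filtering is used for planning at each step, can be illustrated with a small simulation. The sketch below is a minimal example under stated assumptions: a hypothetical scalar, stable linear system with Gaussian noise, a bootstrap particle filter, and a fixed linear feedback rule standing in for the planner. The constants, the function name `run_particle_filter_control` and the feedback gain are illustrative and do not come from the paper.

```python
import math
import random

def run_particle_filter_control(steps=50, n_particles=200, seed=0):
    """Bootstrap particle filter driving a linear feedback controller.

    Hypothetical scalar system (all constants are illustrative):
        x[t+1] = a*x[t] + b*u[t] + process noise
        y[t]   = x[t] + observation noise
    """
    rng = random.Random(seed)
    a, b, k = 0.9, 1.0, 0.5      # stable dynamics; control is u = -k * x_hat
    proc_std, obs_std = 0.1, 0.5
    x = 1.0                      # true latent state, hidden from the controller
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    states, estimates = [], []
    for _ in range(steps):
        y = x + rng.gauss(0.0, obs_std)  # noisy observation of the state
        # Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:                 # guard against numerical underflow
            weights, total = [1.0] * n_particles, float(n_particles)
        x_hat = sum(w * p for w, p in zip(weights, particles)) / total
        states.append(x)
        estimates.append(x_hat)
        u = -k * x_hat                   # act on the inferred latent state
        # Resample in proportion to the weights, then propagate the particles.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [a * p + b * u + rng.gauss(0.0, proc_std) for p in particles]
        x = a * x + b * u + rng.gauss(0.0, proc_std)
    return states, estimates

if __name__ == "__main__":
    states, estimates = run_particle_filter_control()
    errors = [abs(s - e) for s, e in zip(states, estimates)]
    print("mean absolute estimation error:", sum(errors) / len(errors))
```

Varying `n_particles` in such a simulation gives an informal feel for the question the paper studies rigorously, namely how many particles are needed for the resulting policy to stay close to one based on exact inference.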

# 分布変換と多様体埋め込みのための四分位・四分位埋め込みと埋め込み分布の選択能力

Quantile-Quantile Embedding for Distribution Transformation and Manifold Embedding with Ability to Choose the Embedding Distribution ( http://arxiv.org/abs/2006.11385v2 )

ライセンス: Link先を確認

Benyamin Ghojogh, Fakhri Karray, Mark Crowley

(参考訳) 本稿では, 分布変換および多様体埋め込みのためのquantile-quantile embedded (qqe) という新しい埋め込み手法を提案する。 QQEは、視覚統計的テストから量子量子的プロットの概念を用いており、データの分布を理論上望ましい分布や経験的参照サンプルに変換することができる。さらに、QQEは、データの多様体を低次元の埋め込み空間に埋め込む際に、ユーザーに分布を埋め込む選択を与える。また、PCA、t-SNE、ディープメトリックラーニングなどの他の次元削減手法の埋め込み分布を修正して、データの表現や視覚化に使用することもできる。教師なし型と教師なし型の両方でQQEを提案する。 QQEはまた、分布を正確な参照分布またはその形状に変換することもできる。また,qqeによってクラス識別が向上する場合もある。異なる合成データと画像データセットを用いた実験により,提案手法の有効性を示す。

We propose a new embedding method, named Quantile-Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile-quantile plot from visual statistical tests, can transform the distribution of data to any theoretical desired distribution or empirical reference sample. Moreover, QQE gives the user a choice of embedding distribution in embedding the manifold of data into the low dimensional embedding space. It can also be used for modifying the embedding distribution of other dimensionality reduction methods, such as PCA, t-SNE, and deep metric learning, for better representation or visualization of data. We propose QQE in both unsupervised and supervised forms. QQE can also transform a distribution to either an exact reference distribution or its shape. We show that QQE allows for better discrimination of classes in some cases. Our experiments on different synthetic and image datasets show the effectiveness of the proposed embedding method.

翻訳日:2022-11-19 03:29:58 公開日:2021-07-09
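
The quantile-quantile idea behind the abstract above can be shown in one dimension: each data point is moved to the value that sits at the same empirical quantile in a reference sample. The sketch below is a simplified univariate illustration and is not the QQE algorithm of the paper, which handles multivariate data and manifold embedding; the function name `quantile_match` and the mid-point quantile convention are assumptions.

```python
def quantile_match(data, reference):
    """Map each value in `data` to the reference value at the same empirical quantile.

    Uses mid-point quantile levels (rank + 0.5) / n and linear interpolation
    between neighbouring order statistics of the reference sample.
    """
    if not data or not reference:
        raise ValueError("data and reference must be non-empty")
    ref_sorted = sorted(reference)
    n, m = len(data), len(ref_sorted)
    order = sorted(range(n), key=lambda i: data[i])  # indices of data by rank
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Fractional index into the sorted reference for this quantile level.
        pos = (rank + 0.5) * m / n - 0.5
        pos = min(max(pos, 0.0), m - 1.0)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out[i] = ref_sorted[lo] * (1.0 - frac) + ref_sorted[hi] * frac
    return out

if __name__ == "__main__":
    print(quantile_match([3.0, 1.0, 2.0], [10.0, 30.0, 20.0]))
```

The transformed values keep the rank order of the original data while taking on the spread of the reference sample, which is the property a quantile-quantile plot checks visually.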

# logit調整によるロングテール学習

Long-tail learning via logit adjustment ( http://arxiv.org/abs/2007.07314v2 )

ライセンス: Link先を確認

Aditya Krishna Menon and Sadeep Jayasumana and Ankit Singh Rawat and Himanshu Jain and Andreas Veit and Sanjiv Kumar

(参考訳) 実世界の分類問題は通常、不均衡またはロングテールのラベル分布を示し、多くのラベルは少数のサンプルに関連付けられる。これはそのようなラベルの一般化に挑戦し、naïve learningを支配的なラベルに偏らせる。本稿では,これらの課題に対処するために,標準ソフトマックスクロスエントロピートレーニングの2つの簡単な修正を提案する。本手法では,ラベル周波数に基づくロジット調整の古典的考え方を再考し,トレーニングモデルにポストホックを適用したり,トレーニング中に損失を強制したりする。このような調整は、レアラベルと支配ラベルのロジットの間に大きな相対的マージンをもたらす。これらの技術は、統計的根拠と経験的パフォーマンスをしっかりと保ちながら、文学における最近のいくつかの提案を統一し、一般化する。

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these challenges. Our techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training. Such adjustment encourages a large relative margin between logits of rare versus dominant labels. These techniques unify and generalise several recent proposals in the literature, while possessing firmer statistical grounding and empirical performance.

翻訳日:2022-11-10 13:59:13 公開日:2021-07-09
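
The two variants named in the abstract above (post-hoc adjustment and adjustment inside the loss) can be sketched as follows. This is a hedged illustration, assuming the adjustment takes the common form of offsetting each logit by `tau * log(prior)`. The scaling parameter `tau`, the helper names, and the toy label counts are assumptions made for this example and are not taken from the listing.

```python
import math


def class_priors(label_counts):
    """Empirical label frequencies estimated from training counts."""
    total = sum(label_counts)
    return [count / total for count in label_counts]


def posthoc_adjust(logits, priors, tau=1.0):
    """Post-hoc variant: subtract tau * log(prior) from a trained model's logits."""
    return [z - tau * math.log(p) for z, p in zip(logits, priors)]


def logit_adjusted_cross_entropy(logits, target, priors, tau=1.0):
    """Loss variant: add tau * log(prior) to the logits before softmax cross-entropy."""
    shifted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    peak = max(shifted)  # subtract the maximum for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in shifted))
    return log_norm - shifted[target]


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


priors = class_priors([900, 90, 10])  # toy long-tailed label counts
logits = [2.0, 1.5, 1.0]
plain_prediction = argmax(logits)
adjusted_prediction = argmax(posthoc_adjust(logits, priors))
rare_loss = logit_adjusted_cross_entropy(logits, 2, priors)
plain_loss = logit_adjusted_cross_entropy(logits, 2, priors, tau=0.0)
```

With `tau=0.0` the loss reduces to ordinary softmax cross-entropy, so the two losses can be compared directly: on this toy input the adjusted loss penalises a weak margin on the rare label more heavily.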

# 抽象的マルチエージェントインタラクションによる量子セキュア認証と鍵合意に向けて

Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction ( http://arxiv.org/abs/2007.09327v2 )

ライセンス: Link先を確認

Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, and Stefano V. Albrecht

(参考訳) 公開鍵暗号に基づく認証と鍵合意の現在の方法は、量子コンピューティングに対して脆弱である。本稿では,人工知能研究に基づく新たなアプローチを提案する。この手法では,通信の当事者を自律エージェントと見なし,各エージェントが非公開の意思決定モデルを用いて繰り返し相互作用する。認証と鍵合意は、相互作用中に観察されたエージェントの行動に基づいて決定される。このアプローチの安全性は、限られた観測から相互作用するエージェントの決定をモデル化することの難しさに基づいており、我々はこの問題が量子コンピューティングにとっても困難であると予想している。提案手法に基づくプロトタイプの認証および鍵合意システムであるPyAMIをリリースする。本手法が正当なユーザを認証しつつ、異なるタイプの敵対的攻撃を検知できることを実証的に検証する。最後に,サーバモデルのトレーニングに強化学習技術を用いることで,クライアントの判断を効果的に探索し,よりサンプル効率の高い認証を実現する方法を示す。

Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behaviors during the interaction. The security of this approach rests upon the difficulty of modeling the decisions of interacting agents from limited observations, a problem which we conjecture is also hard for quantum computing. We release PyAMI, a prototype authentication and key agreement system based on the proposed method. We empirically validate our method for authenticating legitimate users while detecting different types of adversarial attacks. Finally, we show how reinforcement learning techniques can be used to train server models which effectively probe a client's decisions to achieve more sample-efficient authentication.

翻訳日:2022-11-09 06:12:38 公開日:2021-07-09

# CNNに基づくテキスト分類モデルのためのSHAP値

SHAP values for Explaining CNN-based Text Classification Models ( http://arxiv.org/abs/2008.11825v2 )

ライセンス: Link先を確認

Wei Zhao, Tarun Joshi, Vijayan N. Nair, and Agus Sudjianto

(参考訳) ディープニューラルネットワークは自然言語処理(nlp)モデルでますます使われている。しかし、複雑なアルゴリズムによる結果の解釈と説明の必要性は、銀行などの規制産業において広く採用されていることを制限している。構造化データを用いた機械学習アルゴリズムの解釈可能性に関する最近の研究がある。しかし、語彙の大きさ、高次元の性質、テキストのコヒーレンスと言語構造を考慮する必要があるため、問題がより難しいnlpアプリケーションでは、制限された技術しかありません。本稿では,cnnに基づくテキスト分類モデルの局所的説明可能性のためのshap値を計算する手法を開発した。このアプローチは、機能の重要性を評価するためにグローバルスコアを計算するためにも拡張されている。結果は、Amazon Electronic Reviewのデータの感情分析に基づいて説明される。

Deep neural networks are increasingly used in natural language processing (NLP) models. However, the need to interpret and explain the results from complex algorithms is limiting their widespread adoption in regulated industries such as banking. There has been recent work on interpretability of machine learning algorithms with structured data. But there are only limited techniques for NLP applications where the problem is more challenging due to the size of the vocabulary, high-dimensional nature, and the need to consider textual coherence and language structure. This paper develops a methodology to compute SHAP values for local explainability of CNN-based text classification models. The approach is also extended to compute global scores to assess the importance of features. The results are illustrated on sentiment analysis of Amazon Electronic Review data.

翻訳日:2022-10-24 20:43:05 公開日:2021-07-09

# 効率的な衛星画像による雲構造の分類と理解

Classification and understanding of cloud structures via satellite images with EfficientUNet ( http://arxiv.org/abs/2009.12931v4 )

ライセンス: Link先を確認

Tashin Ahmed and Noor Hossain Nuri Sabab

(参考訳) 気候変動は、長年にわたって重要な政治議論と意思決定の最前線であり、共通の関心事であった。浅層雲は地球の気候を理解する上で重要な役割を果たすが、気候モデルで解釈し表現することは困難である。これらの雲構造を分類することで、雲の物理的構造を理解する可能性が高くなり、気候モデルの生成が改善され、気候変動の予測や天気予報がより良くなる。雲は多様な形態で組織化されるため、従来のルールベースのアルゴリズムを構築して雲の特徴を分離することは困難である。本稿では,EfficientNetと呼ばれるスケールアップ版の畳み込みニューラルネットワーク(CNN)をエンコーダとして,UNetをデコーダとして用い,細粒度特徴マップの抽出と再構成を行って分類器として活用し,雲の組織化パターンの分類を行った。これは、雲が将来の気候をどのように形成するかを専門家が理解する助けとなる。分類タスクでセグメンテーションモデルを使用することで、UNetと共に優れたエンコーダを使用すれば、このデータセットから優れた性能を得られることを示した。最終的な評価指標にはDice係数を用い、Kaggleコンペティションの公開および非公開(テストセット)リーダーボードでそれぞれ66.26%と66.02%のスコアを得た。

Climate change has been a common interest and the forefront of crucial political discussion and decision-making for many years. Shallow clouds play a significant role in understanding the Earth's climate, but they are challenging to interpret and represent in a climate model. By classifying these cloud structures, there is a better possibility of understanding the physical structures of the clouds, which would improve the climate model generation, resulting in a better prediction of climate change or forecasting weather update. Clouds organise in many forms, which makes it challenging to build traditional rule-based algorithms to separate cloud features. In this paper, classification of cloud organization patterns was performed using a new scaled-up version of Convolutional Neural Network (CNN) named as EfficientNet as the encoder and UNet as decoder where they worked as feature extractor and reconstructor of fine grained feature map and was used as a classifier, which will help experts to understand how clouds will shape the future climate. By using a segmentation model in a classification task, it was shown that with a good encoder alongside UNet, it is possible to obtain good performance from this dataset. Dice coefficient has been used for the final evaluation metric, which gave the score of 66.26% and 66.02% for public and private (test set) leaderboard on Kaggle competition respectively.

翻訳日:2022-10-14 03:53:57 公開日:2021-07-09
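
The Dice coefficient used as the evaluation metric above has a standard definition, 2|P ∩ T| / (|P| + |T|), for a predicted mask P and a target mask T. The sketch below computes it for binary masks stored as nested lists. Returning 1.0 when both masks are empty is a convention assumed for this example and may differ from the competition's scoring code.

```python
def dice_coefficient(predicted, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    intersection = 0
    total = 0
    for predicted_row, target_row in zip(predicted, target):
        for p, t in zip(predicted_row, target_row):
            intersection += p * t  # 1 only where both masks are set
            total += p + t         # accumulates |P| + |T|
    if total == 0:
        return 1.0  # both masks empty: treated as a perfect match (assumed convention)
    return 2.0 * intersection / total


predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
target_mask = [[1, 0, 0],
               [0, 1, 1]]
score = dice_coefficient(predicted_mask, target_mask)
```

Here two pixels overlap and each mask has three pixels set, so the score is 2·2 / (3 + 3) = 2/3.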

# 確率的強制アンサンブル動的モード分解による近周期系の予測と解析

Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems ( http://arxiv.org/abs/2010.04248v2 )

ライセンス: Link先を確認

Daniel Dylewsky, David Barajas-Solano, Tong Ma, Alexandre M. Tartakovsky, J. Nathan Kutz

(参考訳) 時系列予測はほとんどの科学分野において中心的な課題である。本稿では, 時間遅延座標における動的モード分解(DMD)を用いた強制線形系として, 観測されたダイナミクスをモデル化する新しい負荷予測法を提案する。このアプローチの中心は、グリッドの負荷が、複雑な実世界システムの多くの観測量と同様に、「ほぼ周期的な」特性、すなわち、力学における規則的な(例えば日次や週次の)繰り返しを捉える支配的なピークを伴う連続フーリエスペクトルを持つという洞察である。提示した予測方法は、(i)固有スペクトルがそれらのピークに対応する決定論的線形モデルへの回帰と、(ii)このシステムを駆動する確率的なガウス過程回帰(GPR)過程の同時学習によって、この特性を利用する。予測アルゴリズムは, 追加の説明変数を用いない最先端の予測手法と比較し, 優れた性能が得られることを示した。さらに、線形固有ダイナミクスの使用は、解釈可能性と簡潔性(parsimony)の観点から、多くの望ましい特性を提供する。電力網からの負荷データを用いたテストケースについて結果を示す。負荷予測は電力システム工学における重要な課題であり、リアルタイム制御、価格設定、メンテナンス、セキュリティに関する決定に大きな影響を与える。

Time series forecasting remains a central challenge problem in almost all scientific disciplines. We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system using Dynamic Mode Decomposition (DMD) in time delay coordinates. Central to this approach is the insight that grid load, like many observables on complex real-world systems, has an "almost-periodic" character, i.e., a continuous Fourier spectrum punctuated by dominant peaks, which capture regular (e.g., daily or weekly) recurrences in the dynamics. The forecasting method presented takes advantage of this property by (i) regressing to a deterministic linear model whose eigenspectrum maps onto those peaks, and (ii) simultaneously learning a stochastic Gaussian process regression (GPR) process to actuate this system. Our forecasting algorithm is compared against state-of-the-art forecasting techniques not using additional explanatory variables and is shown to produce superior performance. Moreover, its use of linear intrinsic dynamics offers a number of desirable properties in terms of interpretability and parsimony. Results are presented for a test case using load data from an electrical grid. Load forecasting is an essential challenge in power systems engineering, with major implications for real-time control, pricing, maintenance, and security decisions.

翻訳日:2022-10-09 13:10:30 公開日:2021-07-09
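
Step (i) of the abstract above, regressing to a linear model in time-delay coordinates whose eigenvalues encode the dominant period, can be illustrated on a noise-free toy signal. This sketch is deliberately reduced: it uses two delays, a single sinusoid with an assumed 24-sample period, and a plain least-squares fit. It leaves out the Gaussian-process forcing and the ensemble described in the abstract, and all names are illustrative.

```python
import cmath
import math


def delay_embed(series, delays=2):
    """Stack consecutive samples into time-delay state vectors."""
    return [series[t:t + delays] for t in range(len(series) - delays + 1)]


def fit_linear_operator_2d(states):
    """Least-squares A with states[t + 1] ≈ A · states[t], via A = (Y Xᵀ)(X Xᵀ)⁻¹."""
    X, Y = states[:-1], states[1:]
    G = [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]
    C = [[sum(y[i] * x[j] for x, y in zip(X, Y)) for j in range(2)] for i in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    G_inv = [[G[1][1] / det, -G[0][1] / det],
             [-G[1][0] / det, G[0][0] / det]]
    return [[sum(C[i][k] * G_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues_2x2(A):
    """Closed-form eigenvalues of a 2x2 matrix."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def step(A, state):
    """Advance a delay-coordinate state by one sample."""
    return [A[0][0] * state[0] + A[0][1] * state[1],
            A[1][0] * state[0] + A[1][1] * state[1]]


omega = 2.0 * math.pi / 24.0  # toy "daily" cycle sampled hourly (assumption)
load = [math.cos(omega * t) for t in range(200)]
states = delay_embed(load)
A = fit_linear_operator_2d(states)
eigenvalue = eigenvalues_2x2(A)[0]
recovered_period = 2.0 * math.pi / abs(cmath.phase(eigenvalue))
next_value = step(A, states[-1])[1]  # one-step-ahead forecast for t = 200
```

The eigenvalues of the fitted operator lie on the unit circle, and their phase recovers the period of the signal, which is the sense in which the eigenspectrum of the linear model maps onto a peak of the Fourier spectrum.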

# 深部ニューラルネットワークを用いた有限温度コーンシャム密度関数理論の高速化

Accelerating Finite-temperature Kohn-Sham Density Functional Theory with Deep Neural Networks ( http://arxiv.org/abs/2010.04905v2 )

ライセンス: Link先を確認

J. Austin Ellis and Lenz Fiedler and Gabriel A. Popoola and Normand A. Modine and J. Adam Stephens and Aidan P. Thompson and Attila Cangi and Sivasankaran Rajamanickam

(参考訳) 有限電子温度でコーン・シャム密度汎関数理論(DFT)によって生成された総エネルギーを、無視可能な計算コストで化学精度で再現する機械学習(ML)に基づく数値モデリングワークフローを提案する。ディープニューラルネットワークに基づいて、ワークフローは与えられた原子構成に対する状態の局所密度(LDOS)を生成する。 LDOSから、原子のボルン・オッペンハイマーポテンシャルエネルギー表面として機能するDFT全自由エネルギーを含む空間分解、エネルギー分解、統合量を計算することができる。本研究では, 固体および液体金属に対するこのアプローチの有効性を実証し, 固体および液体アルミニウムの独立および統一機械学習モデルとの比較を行った。機械学習の密度汎関数理論の枠組みは、現在のアルゴリズムでは達成不可能な計算規模とコストで、環境条件および極限条件下でのマルチスケール材料モデリングへの道を開く。

We present a numerical modeling workflow based on machine learning (ML) which reproduces the total energies produced by Kohn-Sham density functional theory (DFT) at finite electronic temperature to within chemical accuracy at negligible computational cost. Based on deep neural networks, our workflow yields the local density of states (LDOS) for a given atomic configuration. From the LDOS, spatially-resolved, energy-resolved, and integrated quantities can be calculated, including the DFT total free energy, which serves as the Born-Oppenheimer potential energy surface for the atoms. We demonstrate the efficacy of this approach for both solid and liquid metals and compare results between independent and unified machine-learning models for solid and liquid aluminum. Our machine-learning density functional theory framework opens up the path towards multiscale materials modeling for matter under ambient and extreme conditions at a computational scale and cost that is unattainable with current algorithms.

翻訳日:2022-10-08 23:38:03 公開日:2021-07-09

# 強化学習とグラフニューラルネットワークによるグラフダイナミクスの制御

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks ( http://arxiv.org/abs/2010.05313v3 )

ライセンス: Link先を確認

Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

(参考訳) グラフ上で部分的に観察された動的プロセスを限られた数の介入によって制御する問題を考える。この問題は、流行を抑制するためのウイルス検査のスケジュール、製品を宣伝するためのターゲットマーケティング、ソーシャルネットワークに拡散する偽ニュースを検出するために投稿を手作業で検査するといった状況で自然に発生する。この設定を時間グラフプロセス上の逐次決定問題として定式化する。指数的状態空間、組合せ的行動空間、部分可観測性に直面して、時間グラフ上の動的過程を制御する扱いやすい(tractable)新しいスキームを設計する。我々は、流行拡大を抑制するためにどのノードを検査すべきかの優先順位付けと、グラフ上の影響最大化という、この枠組みに含まれる2つの一般的な問題に対して、このアプローチをうまく適用した。

We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions. This problem naturally arises in contexts such as scheduling virus tests to curb an epidemic; targeted marketing in order to promote a product; and manually inspecting posts to detect fake news spreading on social networks. We formulate this setup as a sequential decision problem over a temporal graph process. In face of an exponential state space, combinatorial action space and partial observability, we design a novel tractable scheme to control dynamical processes on temporal graphs. We successfully apply our approach to two popular problems that fall into our framework: prioritizing which nodes should be tested in order to curb the spread of an epidemic, and influence maximization on a graph.

翻訳日:2022-10-08 13:59:57 公開日:2021-07-09

# 講演から説明責任行動へ:ディープニューラルネットワークとトピックモデリングによる政策立案者の公開討論の監視

From Talk to Action with Accountability: Monitoring the Public Discussion of Policy Makers with Deep Neural Networks and Topic Modelling ( http://arxiv.org/abs/2010.08346v3 )

ライセンス: Link先を確認

Vili Hätönen and Fiona Melzer

(参考訳) 気候変動の研究は、人間の活動が気候を変え、現在気候危機に向かっているという意見の一致をもたらした。気候変動の緩和に関する公的な議論や研究活動は増加しているが、潜在的な解決策は議論されるだけでなく、効果的に展開する必要がある。不正管理や政策立案者が説明責任を負うのを防ぐため、透明性と政府プロセスに関する情報の程度が重要であることが示されている。しかし、現在、気候変動に関する議論や情報源の多さから、公共社会や市民社会が政治家の責任を負うための概要を維持することはますます困難になっている。そこで本研究では,複数の公開情報源の発言と修辞を,容易に理解可能なトピック要約へと処理するマルチソーストピックアグリゲーションシステム(mustas)を提案する。 MuSTASは、様々なドキュメントからトピックをモデル化するために、新しいマルチソースハイブリッド遅延ディリクレアロケーションを使用する。この話題の消化は、政治家が気候変動と気候変動の政策について話す場所、方法、時期を評価する上で、一般市民や市民社会に役立ち、気候変動を緩和し、その欠如を和らげるために政治家に責任を負わせることができる。

Decades of research on climate have provided a consensus that human activity has changed the climate and we are currently heading into a climate crisis. While public discussion and research efforts on climate change mitigation have increased, potential solutions need to not only be discussed but also effectively deployed. For preventing mismanagement and holding policy makers accountable, transparency and degree of information about government processes have been shown to be crucial. However, currently the quantity of information about climate change discussions and the range of sources make it increasingly difficult for the public and civil society to maintain an overview to hold politicians accountable. In response, we propose a multi-source topic aggregation system (MuSTAS) which processes policy makers' speech and rhetoric from several publicly available sources into an easily digestible topic summary. MuSTAS uses novel multi-source hybrid latent Dirichlet allocation to model topics from a variety of documents. This topic digest will serve the general public and civil society in assessing where, how, and when politicians talk about climate and climate policies, enabling them to hold politicians accountable for their actions to mitigate climate change and lack thereof.

翻訳日:2022-10-06 20:13:08 公開日:2021-07-09

# 情報クエリのための概要指向質問生成

Summary-Oriented Question Generation for Informational Queries ( http://arxiv.org/abs/2010.09692v2 )

ライセンス: Link先を確認

Xusen Yin, Li Zhou, Kevin Small, Jonathan May

(参考訳) ユーザは、質問応答(QA)システムに対して、単純なファクトイドの質問を頻繁に求め、より複雑な質問をサポートする無数の最近の研究の影響を減らします。自動生成された質問(SQ)をユーザに提供することで、QAシステム機能のユーザ理解が向上し、より効果的な使用が容易になる。主文書の話題に焦点をあて,可変長文で回答可能な自己説明的な質問を適切な形で作成することを目指している。 NQ(Natural Questions)データセットに基づいてトレーニングしたBERTベースのPointer-Generator Networkを用いて,これらの要件を満たす。 NQデータセット(20.1BLEU-4)上でのSQ生成のSOTA性能を示す。我々はさらに,本モデルを外部のニュース記事に適用し,ゴールド質問の欠如によるQAシステムによる評価を行い,我々のモデルがニュース記事に対してより良いSQを生成することを示す。

Users frequently ask simple factoid questions for question answering (QA) systems, attenuating the impact of myriad recent works that support more complex questions. Prompting users with automatically generated suggested questions (SQs) can improve user understanding of QA system capabilities and thus facilitate more effective use. We aim to produce self-explanatory questions that focus on main document topics and are answerable with variable length passages as appropriate. We satisfy these requirements by using a BERT-based Pointer-Generator Network trained on the Natural Questions (NQ) dataset. Our model shows SOTA performance of SQ generation on the NQ dataset (20.1 BLEU-4). We further apply our model on out-of-domain news articles, evaluating with a QA system due to the lack of gold questions and demonstrate that our model produces better SQs for news articles -- with further confirmation via a human evaluation.

翻訳日:2022-10-05 21:48:31 公開日:2021-07-09

# 複合文の意味的類似性評価における単語埋め込みの比較分析

Comparative analysis of word embeddings in assessing semantic similarity of complex sentences ( http://arxiv.org/abs/2010.12637v3 )

ライセンス: Link先を確認

Dhivya Chandrasekaran and Vijay Mago

(参考訳) セマンティックテキストの類似性は自然言語処理分野におけるオープンな研究課題の1つである。この分野で大規模な研究が行われ、STSデータセットやSICKデータセットのような既存のベンチマークデータセットにおける最近のトランスフォーマーベースモデルによってほぼ完全な結果が得られている。本稿では,これらのデータセットの文について検討し,文の複雑さに関する各種単語埋め込みの感度を解析する。 15人のアノテータが提供した50の文対と関連する意味的類似度値からなる複雑な文データセットを構築した。既存のベンチマークデータセットと提案データセットにおける文の複雑さの増加を強調するために、可読性分析が行われる。さらに,既存のベンチマークデータセットと提案データセットを用いて,単語埋め込みと言語モデルの性能の比較分析を行った。その結果, 文の複雑さの増加は, 組込みモデルの性能に有意な影響を与え, Pearson と Spearman の相関は10～20%減少した。

Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field and near-perfect results are achieved by recent transformer-based models in existing benchmark datasets like the STS dataset and the SICK dataset. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings with respect to the complexity of the sentences. We build a complex sentences dataset comprising 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. Readability analysis is performed to highlight the increase in complexity of the sentences in the existing benchmark datasets and those in the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings and language models on the existing benchmark datasets and the proposed dataset. The results show the increase in complexity of the sentences has a significant impact on the performance of the embedding models resulting in a 10-20% decrease in Pearson's and Spearman's correlation.

翻訳日:2022-10-03 23:26:50 公開日:2021-07-09

# (un)ソーシャルメディアによるCOVID-19の流行

(Un)Masked COVID-19 Trends from Social Media ( http://arxiv.org/abs/2011.00052v3 )

ライセンス: Link先を確認

Asmit Kumar Singh, Paras Mehan, Divyanshu Sharma, Rohan Pandey, Tavpritesh Sethi, Ponnurangam Kumaraguru

(参考訳) マスクの着用は、世界中で経済的・社会的に広範な影響を及ぼしている新型コロナウイルス感染症(COVID-19)に対する有効な防護手段である。世界中の政府はマスクの使用を義務付けており、肯定的な反応も否定的な反応も受けている。オンラインソーシャルメディアは、マスクの使用を研究し、基礎となるマスク着用パターンを分析するエキサイティングなプラットフォームを提供する。本稿では,米国6都市を対象に,204万件のソーシャルメディア画像を分析した。新型コロナウイルスの感染者が増加するにつれ、特に各州が厳格な規制を課した際に、画像中のマスク着用の増加が見られた。また,外出制限令が施行されると,グループ写真の投稿が減少することも見いだした。さらに、Black Lives Matterの抗議行動におけるマスクのコンプライアンスを分析したところ、グループ写真に写る人々の40%がマスクを着用し、そのうち45%が80%を超えるフィットスコアでマスクを着用していた。我々は,マスク検出とマスク適合分析のための2つの新しいデータセットであるVAriety MAsks - Classification(VAMA-C)とVAriety MAsks - Segmentation(VAMA-S)を導入した。分析のために、マスク検出器(マスクあり・なしの顔の分類)とマスク適合分析器(マスク適合スコアを計算するセグメンテーションベースモデル)の2つのフレームワークを構築した。マスク検出器は98%の分類精度を達成し、マスク適合分析器のセマンティックセグメンテーションモデルは98%のIntersection over Union(IoU)スコアを達成した。このような枠組みは、パンデミック時にソーシャルメディアプラットフォームを用いてこうした公衆衛生戦略の有効性を評価するのに利用できると結論づける。

Wearing masks is a useful protection method against COVID-19, which has caused widespread economic and social impact worldwide. Across the globe, governments have put mandates for the use of face masks, which have received both positive and negative reaction. Online social media provides an exciting platform to study the use of masks and analyze underlying mask-wearing patterns. In this article, we analyze 2.04 million social media images for six US cities. An increase in masks worn in images is seen as the COVID-19 cases rose, particularly when their respective states imposed strict regulations. We also found a decrease in the posting of group pictures as stay-at-home laws were put into place. Furthermore, mask compliance in the Black Lives Matter protest was analyzed, eliciting that 40% of the people in group photos wore masks, and 45% of them wore the masks with a fit score of greater than 80%. We introduce two new datasets, VAriety MAsks - Classification (VAMA-C) and VAriety MAsks - Segmentation (VAMA-S), for mask detection and mask fit analysis tasks, respectively. For the analysis, we create two frameworks, face mask detector (for classifying masked and unmasked faces) and mask fit analyzer (a semantic segmentation based model to calculate a mask-fit score). The face mask detector achieved a classification accuracy of 98%, and the semantic segmentation model for the mask fit analyzer achieved an Intersection Over Union (IOU) score of 98%. We conclude that such a framework can be used to evaluate the effectiveness of such public health strategies using social media platforms in times of pandemic.

翻訳日:2022-10-01 17:21:30 公開日:2021-07-09

# 胸部X線における胸部疾患の分類と局在の事前知識としての放射線学

Using Radiomics as Prior Knowledge for Thorax Disease Classification and Localization in Chest X-rays ( http://arxiv.org/abs/2011.12506v3 )

ライセンス: Link先を確認

Yan Han, Chongyan Chen, Liyan Tang, Mingquan Lin, Ajay Jaiswal, Song Wang, Ahmed Tewfik, George Shih, Ying Ding, Yifan Peng

(参考訳) 胸部X線は非侵襲性から最も一般的な診断手段の1つとなっている。胸部X線画像の数は急増したが、その読影は依然として放射線科医が手作業で行っており、大きな燃え尽き(バーンアウト)や遅延を生んでいる。医用画像から多くの定量的特徴を抽出できる放射線医学の一分野であるラジオミクスは、深層学習時代以前から医用画像診断を支援する可能性を示してきた。本稿では,ラジオミクス特徴を利用して異常分類性能を向上させるためのエンドツーエンドフレームワークであるChexRadiNetを開発する。具体的には、ChexRadiNetはまず、胸部X線を分類し異常領域を強調するために、軽量だが効率的なトリプレット・アテンション機構を適用する。次に、生成されたクラスアクティベーションマップを使用してラジオミクス特徴を抽出し、より堅牢な画像特徴を学習するようモデルをさらに誘導する。何度か反復した後、ラジオミクス特徴の助けを借りて、我々のフレームワークはより正確な画像領域に収束できる。我々は、NIH ChestX-ray、CheXpert、MIMIC-CXRの3つの公開データセットを用いてChexRadiNetフレームワークを評価する。その結果,ChexRadiNetは疾患検出(AUCで0.843)と局在化(T(IoU) = 0.1で0.679)の両方において最先端を上回っていることがわかった。我々は,この手法が,放射線学の世界をより高度に理解した自動システムの開発を促進することを期待して,コードをhttps://github.com/bionlplab/lung_disease_detection_amia2021で公開する予定である。

Chest X-ray has become one of the most common medical diagnostic tools due to its noninvasiveness. The number of chest X-ray images has skyrocketed, but chest X-rays are still read manually by radiologists, which causes considerable burnout and delays. Traditionally, radiomics, as a subfield of radiology that can extract a large number of quantitative features from medical images, demonstrated its potential to facilitate medical imaging diagnosis before the deep learning era. In this paper, we develop an end-to-end framework, ChexRadiNet, that can utilize the radiomics features to improve the abnormality classification performance. Specifically, ChexRadiNet first applies a light-weight but efficient triplet-attention mechanism to classify the chest X-rays and highlight the abnormal regions. Then it uses the generated class activation map to extract radiomic features, which further guides our model to learn more robust image features. After a number of iterations and with the help of radiomic features, our framework can converge to more accurate image regions. We evaluate the ChexRadiNet framework using three public datasets: NIH ChestX-ray, CheXpert, and MIMIC-CXR. We find that ChexRadiNet outperforms the state-of-the-art on both disease detection (0.843 in AUC) and localization (0.679 in T(IoU) = 0.1). We will make the code publicly available at https://github.com/bionlplab/lung_disease_detection_amia2021, with the hope that this method can facilitate the development of automatic systems with a higher-level understanding of the radiological world.

翻訳日:2022-09-21 02:55:42 公開日:2021-07-09

# ABD-Net:3Dポイントクラウド分解のための注意に基づく分解ネットワーク

ABD-Net: Attention Based Decomposition Network for 3D Point Cloud Decomposition ( http://arxiv.org/abs/2108.04221v1 )

ライセンス: Link先を確認

Siddharth Katageri, Shashidhar V Kudari, Akshaykumar Gunari, Ramesh Ashok Tabib, Uma Mudenagudi

(参考訳) 本稿では, 点雲を平面, 球面, 円錐, シリンダーといった基本的な幾何学形状に分解するための注意に基づく分解ネットワーク(ABD-Net)を提案する。点雲の原始形状に基づく注意特徴を用いた3次元オブジェクト分類の性能向上を示す。 3Dオブジェクトのシンプルでコンパクトな表現であるポイントクラウドの人気が高まっている。彼らは点集合における不順序性による特徴抽出のための堅牢な方法を要求する。 abd-netでは、提案する局所近接カプセル化器は、入力点集合から各点周辺の空間エンコーディングと共に局所幾何変化をキャプチャする。カプセル化された局所機能は、ポイントクラウドの基本形状を学ぶために、提案する注意機能エンコーダにさらに渡される。注意特徴エンコーダは、全点の近傍間の幾何学的関係をモデル化し、全点クラウド情報をキャプチャする。提案するansiメカニカルコンポーネントとmodelnet40データセットにおけるabd-netの結果を示す。また,モデルNet40ベンチマークデータセット上での3次元オブジェクト分類の性能を向上させることにより,獲得した注目機能に対するABD-Netの有効性を実証し,最先端技術と比較した。

In this paper, we propose Attention Based Decomposition Network (ABD-Net), for point cloud decomposition into basic geometric shapes namely, plane, sphere, cone and cylinder. We show improved performance of 3D object classification using attention features based on primitive shapes in point clouds. Point clouds, being the simple and compact representation of 3D objects have gained increasing popularity. They demand robust methods for feature extraction due to unorderness in point sets. In ABD-Net the proposed Local Proximity Encapsulator captures the local geometric variations along with spatial encoding around each point from the input point sets. The encapsulated local features are further passed to proposed Attention Feature Encoder to learn basic shapes in point cloud. Attention Feature Encoder models geometric relationship between the neighborhoods of all the points resulting in capturing global point cloud information. We demonstrate the results of our proposed ABD-Net on ANSI mechanical component and ModelNet40 datasets. We also demonstrate the effectiveness of ABD-Net over the acquired attention features by improving the performance of 3D object classification on ModelNet40 benchmark dataset and compare them with state-of-the-art techniques.

翻訳日:2021-08-15 11:27:18 公開日:2021-07-09

# 歩行からのパーキンソン病の効率的な診断のための線形予測残差

Linear Prediction Residual for Efficient Diagnosis of Parkinson's Disease from Gait ( http://arxiv.org/abs/2107.12878v1 )

ライセンス: Link先を確認

Shanmukh Alle and U. Deva Priyakumar

(参考訳) パーキンソン病(英: Parkinson's Disease、PD)は、慢性的に進行する神経疾患であり、硬直性、震え、姿勢不安定をもたらす。 PDを診断するための明確な医療検査はなく、診断は主に臨床的な判断に依存している。ガイドラインはあるものの、約10～30%の患者が誤ってPDと診断されている。したがって、正確で、偏りのない、迅速な診断方法が必要となる。本研究では,歩行からPDを迅速かつ正確に診断する手法であるLPGNetを提案する。 LPGNetは線形予測残差(LPR: Linear Prediction Residuals)を使用して歩行記録から識別パターンを抽出し、深さ方向分離可能な畳み込み(depth-wise separable convolution)を用いた1D畳み込みニューラルネットワークで診断を行う。 LPGNetは0.91のAUCを達成し、最先端技術と比べて21倍の高速化とモデルパラメータ数の約99%削減を実現している。また,歩行からのPD診断の文献で用いられる様々なクロスバリデーション戦略の分析を行い,多くの手法がフォールド間の何らかのデータ漏洩の影響を受けており、それが不必要に大きなモデルと、過剰適合による性能の過大評価につながっていることを見出した。この分析により、今後の研究が手法を正しく評価するための道筋が明確になる。

Parkinson's Disease (PD) is a chronic and progressive neurological disorder that results in rigidity, tremors and postural instability. There is no definite medical test to diagnose PD and diagnosis is mostly a clinical exercise. Although guidelines exist, about 10-30% of the patients are wrongly diagnosed with PD. Hence, there is a need for an accurate, unbiased and fast method for diagnosis. In this study, we propose LPGNet, a fast and accurate method to diagnose PD from gait. LPGNet uses Linear Prediction Residuals (LPR) to extract discriminating patterns from gait recordings and then uses a 1D convolution neural network with depth-wise separable convolutions to perform diagnosis. LPGNet achieves an AUC of 0.91 with a 21 times speedup and about 99% fewer parameters in the model compared to the state of the art. We also undertake an analysis of various cross-validation strategies used in literature in PD diagnosis from gait and find that most methods are affected by some form of data leakage between various folds which leads to unnecessarily large models and inflated performance due to overfitting. The analysis clears the path for future works in correctly evaluating their methods.

翻訳日:2021-08-01 11:01:29 公開日:2021-07-09
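
A linear prediction residual, the quantity LPGNet is said to extract from gait recordings, is the part of a signal that a short autoregressive predictor cannot explain. The sketch below shows only the generic technique (autocorrelation method with the Levinson-Durbin recursion) on a synthetic periodic signal. The predictor order, the toy signal, and the injected disturbance are assumptions made for this example and say nothing about the authors' actual preprocessing.

```python
import math


def autocorrelation(signal, max_lag):
    """Autocorrelation sums r[0..max_lag] (autocorrelation method)."""
    n = len(signal)
    return [sum(signal[t] * signal[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] such that x[t] ≈ sum_k a[k] * x[t - k]."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        reflection = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = reflection
        for j in range(1, i):
            updated[j] = a[j] - reflection * a[i - j]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:]


def prediction_residual(signal, coefficients):
    """Residual e[t] = x[t] - prediction from the previous len(coefficients) samples."""
    order = len(coefficients)
    residual = []
    for t in range(order, len(signal)):
        predicted = sum(coefficients[k] * signal[t - 1 - k] for k in range(order))
        residual.append(signal[t] - predicted)
    return residual


ORDER = 2
omega = 2.0 * math.pi / 20.0
regular = [math.cos(omega * t) for t in range(400)]  # perfectly regular "stride"
disturbed = regular[:]
disturbed[200] += 1.0  # hypothetical irregularity at sample 200

clean_residual = prediction_residual(
    regular, levinson_durbin(autocorrelation(regular, ORDER), ORDER))
residual = prediction_residual(
    disturbed, levinson_durbin(autocorrelation(disturbed, ORDER), ORDER))

energy_ratio = sum(e * e for e in clean_residual) / sum(x * x for x in regular)
# residual[i] corresponds to sample i + ORDER of the input signal.
peak_time = ORDER + max(range(len(residual)), key=lambda i: abs(residual[i]))
```

The regular signal is almost fully predictable, so its residual carries very little energy, while the injected irregularity produces a large residual within a couple of samples of where it occurs. This is the intuition for why residuals can expose discriminating deviations from a regular pattern.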

# (参考訳) テキストから音声への動的変換器によるフェデレーション学習

Federated Learning with Dynamic Transformer for Text to Speech ( http://arxiv.org/abs/2107.08795v1 )

ライセンス: CC BY 4.0

Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Jie Liu, Chendong Zhao, Jing Xiao

(参考訳) text to speech(tts)はユーザインタラクションにとって重要なタスクだが、ttsモデルトレーニングは高品質なオリジナルデータセットのセットに依存している。プライバシとセキュリティの問題のため、オリジナルのデータセットは通常、直接使用できない。近年,連合学習は,プライバシ保護機構が強化された,一般的な分散機械学習パラダイムを提案する。データ所有者が他の人とコラボレーションするための実用的でセキュアなフレームワークを提供するので、より大きなデータセットでトレーニングされたより良いグローバルモデルを得ることができる。しかし、変圧器モデルの複雑性が高いため、連合学習環境では収束過程が遅く不安定になる。さらに、連合学習で訓練されたトランスフォーマーモデルは、クライアント上での通信コストと計算速度の制限であり、その人気を妨げている。これらの課題に対処するために,フェデレーション動的トランスフォーマを提案する。一方, クライアント数が増加すると, 集中型トランスフォーマー-TTSに近づき, フェデレーショントランスに比べて性能が大幅に向上する。一方、トレーニングフェーズにおけるより高速でより安定した収束を実現し、通信時間を著しく短縮する。 LJSpeechデータセットの実験も、我々の手法の利点を強く証明している。

Text to speech (TTS) is a crucial task for user interaction, but TTS model training relies on a sizable set of high-quality original datasets. Due to privacy and security issues, the original datasets are usually unavailable directly. Recently, federated learning has emerged as a popular distributed machine learning paradigm with an enhanced privacy protection mechanism. It offers a practical and secure framework for data owners to collaborate with others, thus obtaining a better global model trained on the larger dataset. However, due to the high complexity of transformer models, the convergence process becomes slow and unstable in the federated learning setting. Besides, a transformer model trained with federated learning incurs high communication cost and is limited by the computational speed of clients, which impedes its adoption. To deal with these challenges, we propose the federated dynamic transformer. On the one hand, the performance is greatly improved compared with the federated transformer, approaching the centrally trained Transformer-TTS as the number of clients increases. On the other hand, it achieves faster and more stable convergence in the training phase and significantly reduces communication time. Experiments on the LJSpeech dataset also strongly prove our method's advantage.

翻訳日:2021-07-25 13:44:36 公開日:2021-07-09

# (参考訳) 旅行セールスマン問題の強化型ハイブリッド遺伝的アルゴリズム

Reinforced Hybrid Genetic Algorithm for the Traveling Salesman Problem ( http://arxiv.org/abs/2107.06870v1 )

ライセンス: CC BY 4.0

Jiongzhi Zheng and Menglei Chen and Jialun Zhong and Kun He

(参考訳) 本稿では,NPハードトラベリングセールスマン問題(TSP)に対する強力な強化ハイブリッド遺伝的アルゴリズム(RHGA)を提案する。 RHGAは強化学習技術と有名なエッジアセンブリクロスオーバー遺伝的アルゴリズム(EAX-GA)とLin-Kernighan-Helsgaun(LKH)局所探索ヒューリスティックを組み合わせた。提案したハイブリッド機構の助けを借りて、EAX-GAの遺伝的進化とLKHの局所探索により、互いのパフォーマンスが向上する。また、q学習に基づく強化学習技術は、ハイブリッド遺伝的アルゴリズムをさらに促進する。 138のよく知られたTSPベンチマーク実験の結果,1,000から85,900都市を対象に,提案手法の優れた性能を示した。

We propose a powerful Reinforced Hybrid Genetic Algorithm (RHGA) for the famous NP-hard Traveling Salesman Problem (TSP). RHGA combines reinforcement learning technique with the well-known Edge Assembly Crossover genetic algorithm (EAX-GA) and the Lin-Kernighan-Helsgaun (LKH) local search heuristic. With the help of the proposed hybrid mechanism, the genetic evolution of EAX-GA and the local search of LKH can boost each other's performance. And the reinforcement learning technique based on Q-learning further promotes the hybrid genetic algorithm. Experimental results on 138 well-known and widely used TSP benchmarks, with the number of cities ranging from 1,000 to 85,900, demonstrate the excellent performance of the proposed method.

翻訳日:2021-07-18 13:12:30 公開日:2021-07-09

# GGT:ディープニューラルネットワークの逆サンプル検出のためのグラフガイドテスト

GGT: Graph-Guided Testing for Adversarial Sample Detection of Deep Neural Network ( http://arxiv.org/abs/2107.07043v1 )

ライセンス: Link先を確認

Zuohui Chen, Renxuan Wang, Jingyang Xiang, Yue Yu, Xin Xia, Shouling Ji, Qi Xuan, and Xiaoniu Yang

(参考訳) ディープニューラルネットワーク(DNN)は、敵対的サンプルに対して脆弱であることが知られており、その検出は、これらのDNNモデルの広範囲な適用に不可欠である。近年、DNNシステムの脆弱性を見つけるために、ソフトウェア工学における多くの深層テスト手法が提案され、その1つであるモデル変異テスト(MMT)は、様々な種類の敵対的攻撃によって生成された様々な敵対的サンプルの検出に用いられ、成功を収めた。しかし、MMTの変異モデルは常に数が多く(例えば100モデル以上)、多様性に欠ける(例えば、高信頼の敵対的サンプルによって容易に回避されうる)ため、実際の応用では効率が悪く、高信頼の敵対的サンプルの検出にも効果が低い。本研究では,これらの課題を克服するために,敵対的サンプル検出のためのグラフガイドテスト(GGT)を提案する。 GGTはグラフ特性をガイドとしてプルーニングしたモデルを生成し、各モデルのパラメータ数はMMTの変異モデルの約5%にすぎず、グラフガイドモデルの方が多様性が高い。 CIFAR10とSVHNの実験により、GGTは有効性と効率の両面でMMTよりもはるかに優れていることが示された。

Deep Neural Networks (DNN) are known to be vulnerable to adversarial samples, the detection of which is crucial for the wide application of these DNN models. Recently, a number of deep testing methods in software engineering were proposed to find the vulnerability of DNN systems, and one of them, i.e., Model Mutation Testing (MMT), was used to successfully detect various adversarial samples generated by different kinds of adversarial attacks. However, the mutated models in MMT are always huge in number (e.g., over 100 models) and lack diversity (e.g., can be easily circumvented by high-confidence adversarial samples), which makes it less efficient in real applications and less effective in detecting high-confidence adversarial samples. In this study, we propose Graph-Guided Testing (GGT) for adversarial sample detection to overcome these aforementioned challenges. GGT generates pruned models with the guide of graph characteristics, each of them has only about 5% parameters of the mutated model in MMT, and graph guided models have higher diversity. The experiments on CIFAR10 and SVHN validate that GGT performs much better than MMT with respect to both effectiveness and efficiency.

翻訳日:2021-07-18 12:33:39 公開日:2021-07-09

# NVCell:強化学習による高度な技術ノードにおける標準セルレイアウト

NVCell: Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning ( http://arxiv.org/abs/2107.07044v1 )

ライセンス: Link先を確認

Haoxing Ren, Matthew Fojtik, Brucek Khailany

(参考訳) 高度な技術ノードにおける高品質な標準セルレイアウト自動化は、複雑な設計規則のため、現在でも業界では難しい。本稿では,高度技術ノード上の産業標準セルライブラリにおいて,一列セルの90%以上を均等あるいは小面積で配置できる,NVCellと呼ばれる標準セルレイアウト自動生成装置を提案する。 NVCellは強化学習(RL)を活用して、ルーティング中の設計規則違反を修正し、効率的な配置を生成する。

High quality standard cell layout automation in advanced technology nodes is still challenging in the industry today because of complex design rules. In this paper we introduce an automatic standard cell layout generator called NVCell that can generate layouts with equal or smaller area for over 90% of single row cells in an industry standard cell library on an advanced technology node. NVCell leverages reinforcement learning (RL) to fix design rule violations during routing and to generate efficient placements.

翻訳日:2021-07-18 12:30:26 公開日:2021-07-09

# fmnet: ノイズマイクロドップラースペクトログラムをクリーンアップする潜在機能マッピングネットワーク

FMNet: Latent Feature-wise Mapping Network for Cleaning up Noisy Micro-Doppler Spectrogram ( http://arxiv.org/abs/2107.07312v1 )

ライセンス: Link先を確認

Chong Tang, Wenda Li, Shelly Vishwakarma, Fangzhan Shi, Simon Julier, Kevin Chetty

(参考訳) マイクロドップラーシグネチャには、ターゲットダイナミクスに関するかなりの情報が含まれている。しかし、レーダセンシングシステムはノイズの多い環境に影響を受けやすく、マイクロドップラースペクトログラム上では解釈不能な動きパターンとなる。一方、レーダーリターンは、しばしばマルチパス、乱雑、干渉に悩まされる。これらの問題は、例えば、運動特徴抽出、マイクロドップラーシグネチャを用いたアクティビティ分類(\mu$-DS)などにおいて困難をもたらす。本稿では,同一条件下でのシミュレーション結果とより密接に類似するように,測定されたスペクトログラムを変換する機能マッピングネットワーク(fmnet)を提案する。計測されたスペクトログラムとマッチングされたシミュレーションデータに基づいて,潜在表現/特徴を抽出するエンコーダ,潜在特徴に応じて再構成されたスペクトログラムを出力するデコーダ,計測およびシミュレーションデータの潜在特徴距離を最小化する判別器の3つの部分を含む。 6つの活動データと2つの実験シナリオを用いてfmnetを実演し,最終結果は,強力な拡張パターンを示し,実際の動作情報を最大限に保持できる。一方,シミュレーションデータのみを用いて分類器を訓練し,fmnetでクリーンアップした後,新たに測定したサンプルを予測できる新しいアイデアを提案する。最終分類の結果から、大幅な改善が見られる。

Micro-Doppler signatures contain considerable information about target dynamics. However, radar sensing systems are easily affected by noisy surroundings, resulting in uninterpretable motion patterns on the micro-Doppler spectrogram. Meanwhile, radar returns often suffer from multipath, clutter and interference. These issues lead to difficulty in, for example, motion feature extraction and activity classification using micro-Doppler signatures ($\mu$-DS). In this paper, we propose a latent feature-wise mapping strategy, called Feature Mapping Network (FMNet), to transform measured spectrograms so that they more closely resemble the output from a simulation under the same conditions. Based on the measured spectrogram and the matched simulated data, our framework contains three parts: an Encoder which extracts latent representations/features, a Decoder which outputs a reconstructed spectrogram according to the latent features, and a Discriminator which minimizes the distance between the latent features of measured and simulated data. We demonstrate FMNet with data from six activities and two experimental scenarios, and the final results show strongly enhanced patterns while keeping actual motion information to the greatest extent. In addition, we propose a novel idea which trains a classifier with only simulated data and predicts new measured samples after cleaning them up with FMNet. The final classification results show significant improvements.

翻訳日:2021-07-18 12:27:37 公開日:2021-07-09

# Transformerベースの行動表現学習は、小規模データセットにおけるモバイルセンシングのための転移学習を可能にする

Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets ( http://arxiv.org/abs/2107.06097v1 )

ライセンス: Link先を確認

Mike A. Merrill and Tim Althoff

(参考訳) ディープラーニングはNLPとコンピュータビジョンの研究と応用に革命をもたらしたが、行動モデリングや行動健康アプリケーションでは、まだそうなっていない。これは、この領域のデータセットが小さく、異種のデータ型を持ち、通常は大量の欠損を含むためである。したがって、既成のディープラーニングモデルには大幅な、しばしば現実的でないほどの適応が必要となる。そのため、多くの研究アプリケーションは依然として手作業で設計した特徴量とブースティング木モデルに依存しており、専門家が手作りしたタスク固有の特徴量を用いることもある。本稿では、時系列から一般化可能な特徴表現を学習できるモバイルセンシングデータ向けのニューラルアーキテクチャフレームワークを提供し、ファインチューニングによる小規模データ領域での転移学習の実現可能性を示す。このアーキテクチャはCNNとTransformerの利点を組み合わせることで、(1) 手作りの特徴量を必要とせず、分単位の生センサデータから直接学習することで予測性能を最大0.33 ROC AUC向上させ、(2) 事前学習を用いることで、わずか十数名の参加者のデータでも、より単純なニューラルモデルやブースティング決定木を上回る。

While deep learning has revolutionized research and applications in NLP and computer vision, this has not yet been the case for behavioral modeling and behavioral health applications. This is because the domain's datasets are smaller, have heterogeneous datatypes, and typically exhibit a large degree of missingness. Therefore, off-the-shelf deep learning models require significant, often prohibitive, adaptation. Accordingly, many research applications still rely on manually coded features with boosted tree models, sometimes with task-specific features handcrafted by experts. Here, we address these challenges by providing a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series and demonstrates the feasibility of transfer learning on small data domains through finetuning. This architecture combines benefits from CNN and Transformer architectures to (1) improve prediction performance by up to 0.33 ROC AUC by learning directly from raw minute-level sensor data, without the need for handcrafted features, and (2) use pretraining to outperform simpler neural models and boosted decision trees with data from as few as a dozen participants.

翻訳日:2021-07-14 14:30:49 公開日:2021-07-09

# (参考訳) 低悪性度子宮内膜間質肉腫(LGESS)のコンピュータ支援診断

Computer-Aided Diagnosis of Low Grade Endometrial Stromal Sarcoma (LGESS) ( http://arxiv.org/abs/2107.05426v1 )

ライセンス: CC BY 4.0

Xinxin Yang and Mark Stamp

(参考訳) 低悪性度子宮内膜間質肉腫(LGESS)はまれながんであり、全子宮癌症例の約0.2%を占める。 LGESS患者の約75%は、当初は良性腫瘍の一種である平滑筋腫(線維化物)と誤診されている。本研究では,lgess患者の子宮組織生検像をセグメンテーションと染色正規化アルゴリズムを用いて前処理する。さまざまな古典的な機械学習とディープラーニングモデルを使用して、組織画像を良性または癌性に分類する。従来の手法では,最も高い分類精度が約0.85であり,最高のディープラーニングモデルでは約0.87の精度を実現している。これらの結果から,LGESSの診断に適切な学習アルゴリズムが有用であることが示唆された。

Low grade endometrial stromal sarcoma (LGESS) is a rare form of cancer, accounting for about 0.2% of all uterine cancer cases. Approximately 75% of LGESS patients are initially misdiagnosed with leiomyoma, which is a type of benign tumor, also known as fibroids. In this research, uterine tissue biopsy images of potential LGESS patients are preprocessed using segmentation and staining normalization algorithms. A variety of classic machine learning and leading deep learning models are then applied to classify tissue images as either benign or cancerous. For the classic techniques considered, the highest classification accuracy we attain is about 0.85, while our best deep learning model achieves an accuracy of approximately 0.87. These results indicate that properly trained learning algorithms can play a useful role in the diagnosis of LGESS.

翻訳日:2021-07-14 13:43:10 公開日:2021-07-09

# (参考訳) 最適三角測量法は実際には最適ではない

Optimal Triangulation Method is Not Really Optimal ( http://arxiv.org/abs/2107.04618v1 )

ライセンス: CC BY 4.0

Seyed-Mahdi Nasiri, Reshad Hosseini, Hadi Moradi

(参考訳) 三角測量は、複数のカメラ画像上の2D投影から3D点を求める問題を指す。この問題を解くには、いわゆる最適三角測量法を用いるのが一般的であり、本論文ではこれをL2法と呼ぶ。しかし、この方法が最適となるのは、カメラパラメータに不確かさがないと仮定した場合に限られる。合成データと実データでの広範な比較により、カメラパラメータに不確かさがある場合、L2法は最良の選択ではないことがわかった。興味深いことに、単純な中点法は他の手法よりも優れている。高い性能に加えて、中点法は複数のカメラ画像に対して単純な閉形式解を持つのに対し、L2法は3枚以上のカメラ画像には適用しにくい。したがって、一般的な慣行とは対照的に、カメラパラメータに不確かさがあるstructure-from-motionの応用では単純な中点法を用いるべきであると主張する。

Triangulation refers to the problem of finding a 3D point from its 2D projections on multiple camera images. To solve this problem, it is common practice to use the so-called optimal triangulation method, which we call the L2 method in this paper. However, the method is optimal only if we assume no uncertainty in the camera parameters. Through extensive comparison on synthetic and real data, we observed that the L2 method is actually not the best choice when there is uncertainty in the camera parameters. Interestingly, the simple mid-point method outperforms the other methods. Apart from its high performance, the mid-point method has a simple closed-form solution for multiple camera images, while the L2 method is hard to use for more than two camera images. Therefore, in contrast to common practice, we argue that the simple mid-point method should be used in structure-from-motion applications where there is uncertainty in the camera parameters.
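The closed-form multi-camera solution mentioned in this abstract can be illustrated with a small sketch. It assumes each camera is reduced to a centre and a viewing-ray direction, and it uses the standard least-squares "point closest to all rays" formulation; the function name `midpoint_triangulate` and the toy cameras are hypothetical and are not taken from the paper.

```python
import numpy as np

def midpoint_triangulate(centers, directions):
    """Return the 3D point minimizing the summed squared distance to all rays."""
    centers = np.asarray(centers, dtype=float)
    directions = np.asarray(directions, dtype=float)
    # Normalize ray directions so that I - d d^T is an orthogonal projector.
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, directions):
        P = np.eye(3) - np.outer(d, d)  # projects onto the plane orthogonal to the ray
        A += P
        b += P @ c
    # Normal equations of the least-squares problem; solvable unless all rays are parallel.
    return np.linalg.solve(A, b)

# Toy setup (hypothetical): three cameras looking at a known point, no noise.
point = np.array([1.0, 2.0, 5.0])
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
directions = point - centers
estimate = midpoint_triangulate(centers, directions)
```

With noise-free rays the estimate coincides with the true point; with perturbed centres or directions it returns the least-squares compromise, and the same code works for any number of cameras.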

翻訳日:2021-07-14 13:32:25 公開日:2021-07-09

# (参考訳) ガウス過程トリガーを用いた多様な映像生成

Diverse Video Generation using a Gaussian Process Trigger ( http://arxiv.org/abs/2107.04619v1 )

ライセンス: CC0 1.0

Gaurav Shrivastava and Abhinav Shrivastava

(参考訳) いくつかのコンテキスト(あるいは過去の)フレームが与えられた将来のフレームを生成するのは、難しい作業です。将来的な状態の多様性の観点から、ビデオの時間的コヒーレンスとマルチモダリティをモデル化する必要がある。ビデオ生成に対する現在の変分アプローチは、マルチモーダルな将来の結果よりも疎外する傾向にある。代わりに、将来の成果におけるマルチモダリティを明示的にモデル化し、多様な未来をサンプリングするためにそれを活用することを提案する。我々のアプローチであるDiverse Video Generatorは、ガウス過程(GP)を用いて、過去の状態を学習し、特定のサンプルを与えられた未来の確率分布を維持する。さらに,この分布の変化を時間とともに活用し,現在進行中のシーケンスの終了を推定することで,多様な将来状態のサンプリングを制御する。すなわち、出力関数空間上のGPの分散を利用して、アクションシーケンスの変更をトリガーする。生成したシーケンスの復元品質と多様性の観点から,将来的なフレーム生成の最先端性を実現する。

Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos and multi-modality in terms of diversity in the potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. In addition, we leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of ongoing sequences. That is, we use the variance of GP over the output function space to trigger a change in an action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.
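The idea of using GP variance as a trigger can be sketched as follows. This is only an illustration under stated assumptions: a one-dimensional input, a unit-variance RBF kernel, and a hand-picked `threshold`; the helper names are hypothetical, and the paper's actual model operates on learned video representations rather than scalars.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Unit-variance RBF kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predictive_variance(x_train, x_query, noise=1e-4, length_scale=1.0):
    """Posterior variance of a zero-mean GP at the query inputs."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query, length_scale)
    solved = np.linalg.solve(K, K_s)
    # Prior variance is 1.0 for this kernel; subtract the explained part.
    return 1.0 - np.sum(K_s * solved, axis=0)

x_train = np.array([0.0, 0.5, 1.0, 1.5])   # inputs already observed
x_query = np.array([1.0, 2.0, 4.0])        # inputs to predict at
variance = gp_predictive_variance(x_train, x_query)

threshold = 0.5                  # hypothetical trigger level
trigger = variance > threshold   # True where the GP is uncertain enough to switch
```

The variance is close to zero at an observed input and approaches the prior variance far from the data, so thresholding it gives a simple signal for when to sample a different future.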

翻訳日:2021-07-14 13:22:07 公開日:2021-07-09

# (参考訳) ハイブリッドディープニューラルネットワークを用いたマルチジオメトリハイパースペクトル画像からのIll-posed Surface Emissivity検索

Ill-posed Surface Emissivity Retrieval from Multi-Geometry Hyperspectral Images using a Hybrid Deep Neural Network ( http://arxiv.org/abs/2107.04631v1 )

ライセンス: CC BY 4.0

Fangcao Xu, Jian Sun, Guido Cervone, Mark Salvador

(参考訳) 大気補正はリモートセンシングの基本的なタスクであり、観測は大気のどちらかで行われるか、大気を通して観測される。大気補正誤差は観測のスペクトルシグネチャを著しく変化させ、不正な分類やターゲット検出につながる可能性がある。これは、スペクトル特性の正確な測定が必要な超スペクトルデータを扱う場合にさらに重要である。最先端の物理学に基づく大気補正アプローチでは、センサ特性、収集形状、収集されるシーンの環境特性に関する幅広い事前知識が必要である。これらのアプローチは計算コストが高く、十分な環境情報や収集情報の欠如により不正確になりがちであり、しばしばリアルタイムアプリケーションでは不可能である。本稿では,異なる測地から収集したマルチスキャンハイパースペクトルデータを用いた自動大気補正のための幾何依存型ハイブリッドニューラルネットワークを提案する。提案したネットワークは、追加の気象データなしで大気を特徴づけることができる。温度放射率分離問題の解法としてグリッド探索法を提案する。その結果,提案ネットワークは,29種類の材料に対して0.02未満の絶対誤差(mae)で,大気を正確に特徴付け,目標放射率スペクトルを推定できることがわかった。このソリューションは、リアルタイムアプリケーションに対する目標検出を改善するために、正確な大気補正につながる可能性がある。

Atmospheric correction is a fundamental task in remote sensing because observations are taken either of the atmosphere or looking through the atmosphere. Atmospheric correction errors can significantly alter the spectral signature of the observations, and lead to invalid classifications or target detection. This is even more crucial when working with hyperspectral data, where a precise measurement of spectral properties is required. State-of-the-art physics-based atmospheric correction approaches require extensive prior knowledge about sensor characteristics, collection geometry, and environmental characteristics of the scene being collected. These approaches are computationally expensive, prone to inaccuracy due to lack of sufficient environmental and collection information, and often impossible for real-time applications. In this paper, a geometry-dependent hybrid neural network is proposed for automatic atmospheric correction using multi-scan hyperspectral data collected from different geometries. The proposed network can characterize the atmosphere without any additional meteorological data. A grid-search method is also proposed to solve the temperature emissivity separation problem. Results show that the proposed network has the capacity to accurately characterize the atmosphere and estimate target emissivity spectra with a Mean Absolute Error (MAE) under 0.02 for 29 different materials. This solution can lead to accurate atmospheric correction to improve target detection for real time applications.

翻訳日:2021-07-14 13:01:53 公開日:2021-07-09

# (参考訳) 因果効果を用いたアルゴリズム因果効果同定

Algorithmic Causal Effect Identification with causaleffect ( http://arxiv.org/abs/2107.04632v1 )

ライセンス: CC BY 4.0

Martí Pedemonte, Jordi Vitrià and Álvaro Parafita (Universitat de Barcelona)

(参考訳) 種としての私たちの進化は、原因と影響の関係を理解する際に大きな一歩を踏み出した。これらの関連は、いくつかのイベントには自明だが、複雑なシナリオではない。因果理論と因果推論が形式化され、$do$-operatorとその関連する規則が導入された。このレポートの主な目的は、Pythonのいくつかのアルゴリズムで観測データから条件付きおよび条件なし因果クエリを計算し、実装することである。この目的のために、まず確率とグラフ理論に関する基本的な背景知識を提示し、アルゴリズムの構築に使用される因果論の重要な結果を紹介した。 2006年にshpitserとpearlによって提示された識別アルゴリズムを徹底的に研究し、pythonの実装について説明した。主同定アルゴリズムは、$do$-calculusの規則の繰り返し適用と見なすことができ、最終的に実験的な確率から因果クエリの式を返すか、因果効果を識別できないかのどちらかである。我々は、新しく開発したpythonライブラリを紹介し、いくつかの利用例を示す。

Our evolution as a species made a huge step forward when we understood the relationships between causes and effects. These associations may be trivial for some events, but they are not in complex scenarios. To rigorously prove that some occurrences are caused by others, causal theory and causal inference were formalized, introducing the $do$-operator and its associated rules. The main goal of this report is to review and implement in Python some algorithms to compute conditional and non-conditional causal queries from observational data. To this end, we first present some basic background knowledge on probability and graph theory, before introducing important results on causal theory, used in the construction of the algorithms. We then thoroughly study the identification algorithms presented by Shpitser and Pearl in 2006, explaining our implementation in Python alongside. The main identification algorithm can be seen as a repeated application of the rules of $do$-calculus, and it eventually either returns an expression for the causal query from experimental probabilities or fails to identify the causal effect, in which case the effect is non-identifiable. We introduce our newly developed Python library and give some usage examples.

翻訳日:2021-07-14 13:00:16 公開日:2021-07-09

# (参考訳) 非マルコフ確率的リワード過程からの確率的リワードマシンの学習

Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes ( http://arxiv.org/abs/2107.04633v1 )

ライセンス: CC BY 4.0

Alvaro Velasquez, Andre Beckus, Taylor Dohmen, Ashutosh Trivedi, Noah Topper, George Atia

(参考訳) 典型的な環境での強化学習の成功は、部分的には、エージェントが最適なポリシーを学ぶ報酬信号に関するマルコフの仮定に基づくものである。近年、報酬機械の使用は、非マルコフ報酬の構造化表現を可能にしてこの仮定を緩和している。特に、そのような表現は、基礎となる決定プロセスの状態空間を増大させ、非マルコフ強化学習を容易にするために用いられる。しかし、これらの報酬機械は、確率的報酬信号のセマンティクスを捉えることができない。本稿では,非マルコフ確率的報酬の表現として確率的報酬機械(prm)を導入することで,この方向を前進させる。本稿では,意思決定プロセスからPRMを学習するアルゴリズムと,意思決定方針のPRM表現を学習するアルゴリズムを提案する。

The success of reinforcement learning in typical settings is, in part, predicated on underlying Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process as well as to learn the PRM representation of a given decision-making policy.

翻訳日:2021-07-14 12:59:13 公開日:2021-07-09

# (参考訳) ドメインに依存しないpddl+プランナーでangry birdsをプレイする

Playing Angry Birds with a Domain-Independent PDDL+ Planner ( http://arxiv.org/abs/2107.04635v1 )

ライセンス: CC BY-SA 4.0

Wiktor Piotrowski, Roni Stern, Matthew Klenk, Alexandre Perez, Shiwali Mohan, Johan de Kleer, Jacob Le

(参考訳) 本稿では,ドメインに依存しないプランナーを用いて人気のangry birdsゲームを初めてプレイするシステムを提案する。我々のシステムは、混合離散/連続ドメインのための計画言語PDDL+を用いて、Angry Birdsレベルをモデル化する。ドメインに依存しないPDDL+プランナーを使用してプランを生成し、実行する。本稿では,本ドメインのPDDL+モデルについて述べるとともに,問題の複雑性を低減させる重要な設計上の決定事項を特定し,本ドメインのモデル固有の手法と比較する。その結果,本システムの性能はangry birdsの他のドメイン固有システムと同等であり,このベンチマークai課題に対するドメイン独立計画の適用性が示唆された。

This demo paper presents the first system for playing the popular Angry Birds game using a domain-independent planner. Our system models Angry Birds levels using PDDL+, a planning language for mixed discrete/continuous domains. It uses a domain-independent PDDL+ planner to generate plans and executes them. In this demo paper, we present the system's PDDL+ model for this domain, identify key design decisions that reduce the problem complexity, and compare the performance of our system to model-specific methods for this domain. The results show that our system's performance is on par with other domain-specific systems for Angry Birds, suggesting the applicability of domain-independent planning to this benchmark AI challenge.

翻訳日:2021-07-14 12:39:01 公開日:2021-07-09

# (参考訳) 表データにおける反事実生成法に関するフレームワークとベンチマーク

A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data ( http://arxiv.org/abs/2107.04680v1 )

ライセンス: CC BY 4.0

Raphael Mazzine and David Martens

(参考訳) 事実的説明は、機械学習の予測を説明する効果的な方法と見なされる。この関心は、そのような説明を生み出すために既に何十ものアルゴリズムが使われている比較的若い文献に反映されている。これらのアルゴリズムは、出力の分類を変えるために機能をどのように変更できるかを見つけることに重点を置いている。しかし、この比較的一般的な目的を異なる方法で達成できるため、これらのアルゴリズムをテストし、ベンチマークする方法論が必要となる。まず、関連する9つの評価指標を用いて、22の表付きデータセットに対する10のアルゴリズム的アプローチに関する大規模なベンチマーク研究を行う。第二に、反事実生成アルゴリズムをテストするための新しいフレームワークの導入です。第三に、反事実的な結果を評価し比較するための客観的指標のセットです。そして最後に、どのアプローチがどのタイプのデータセットで最高のパフォーマンスを得るかを示すベンチマーク結果から洞察を得る。このベンチマーク研究とフレームワークは、実践者がどのテクニックとビルディングブロックが最も適しているかを決定するのに役立ち、研究者が現在および将来のカウンターファクト生成アルゴリズムの設計と評価に役立ちます。以上の結果から,パフォーマンスがデータセット,モデル,スコア,事実点の特異性に大きく依存するため,全体として,反実的説明を生成する最善のアルゴリズムは存在しないことがわかった。

Counterfactual explanations are viewed as an effective way to explain machine learning predictions. This interest is reflected by a relatively young literature with already dozens of algorithms aiming to generate such explanations. These algorithms are focused on finding how features can be modified to change the output classification. However, this rather general objective can be achieved in different ways, which brings about the need for a methodology to test and benchmark these algorithms. The contributions of this work are manifold: First, a large benchmarking study of 10 algorithmic approaches on 22 tabular datasets is performed, using 9 relevant evaluation metrics. Second, the introduction of a novel, first of its kind, framework to test counterfactual generation algorithms. Third, a set of objective metrics to evaluate and compare counterfactual results. And finally, insight from the benchmarking results that indicate which approaches obtain the best performance on what type of dataset. This benchmarking study and framework can help practitioners in determining which technique and building blocks most suit their context, and can help researchers in the design and evaluation of current and future counterfactual generation algorithms. Our findings show that, overall, there's no single best algorithm to generate counterfactual explanations as the performance highly depends on properties related to the dataset, model, score and factual point specificities.

翻訳日:2021-07-14 12:35:24 公開日:2021-07-09

# (参考訳) 時空間ロバストエッジネットワーク

Scaled-Time-Attention Robust Edge Network ( http://arxiv.org/abs/2107.04688v1 )

ライセンス: CC BY 4.0

Richard Lau, Lihan Yao, Todd Huster, William Johnson, Stephen Arleth, Justin Wong, Devin Ridge, Michael Fletcher, William C. Headley

(参考訳) 本稿では,貯水池ニューラルネットワークの遅延ループバージョンに基づくニューラルネットの新しいファミリーを構築するための体系的なアプローチについて述べる。結果として得られたアーキテクチャは、STARE(Scaled-Time-Attention Robust Edge)ネットワークと呼ばれ、超次元空間と非乗算演算を利用して、浅いレイヤを持ち、トレーニングが簡単で、従来のディープニューラルネットワークよりもIoT(Internet of Things)のようなエッジアプリケーションに適している。 STAREは、注意やコンテキストといった新しいAI概念を取り入れており、時間的特徴抽出と分類に最も適している。 stareは様々なアプリケーションに適用でき、パフォーマンスが向上し、実装の複雑さが低下する。特に,空間的(ビデオフレーム)情報と時間的(軌道)情報の両方を利用して,対向無人航空システム(UAS)検出アプリケーションにおいて,二重ループ構成をドローン対鳥の検出と識別に応用する方法を示した。また、STAREの性能は、RF変調の分類において最先端のディープニューラルネットワークに近づき、マッキーグラスの時系列予測の特別な場合において長短期記憶(LSTM)より優れることを示した。ハードウェア効率を実証するために,STAREアルゴリズムのFPGA実装を開発し,その低消費電力かつ高スループットな演算を実証した。さらに,ASIC実装のためのSTAREアルゴリズムの大規模並列実装を統合するための効率的な構造について述べる。

This paper describes a systematic approach towards building a new family of neural networks based on a delay-loop version of a reservoir neural network. The resulting architecture, called Scaled-Time-Attention Robust Edge (STARE) network, exploits hyper dimensional space and non-multiply-and-add computation to achieve a simpler architecture, which has shallow layers, is simple to train, and is better suited for Edge applications, such as Internet of Things (IoT), over traditional deep neural networks. STARE incorporates new AI concepts such as Attention and Context, and is best suited for temporal feature extraction and classification. We demonstrate that STARE is applicable to a variety of applications with improved performance and lower implementation complexity. In particular, we showed a novel way of applying a dual-loop configuration to detection and identification of drone vs bird in a counter Unmanned Air Systems (UAS) detection application by exploiting both spatial (video frame) and temporal (trajectory) information. We also demonstrated that the STARE performance approaches that of a State-of-the-Art deep neural network in classifying RF modulations, and outperforms Long Short-term Memory (LSTM) in a special case of Mackey Glass time series prediction. To demonstrate hardware efficiency, we designed and developed an FPGA implementation of the STARE algorithm to demonstrate its low-power and high-throughput operations. In addition, we illustrate an efficient structure for integrating a massively parallel implementation of the STARE algorithm for ASIC implementation.

翻訳日:2021-07-14 12:34:22 公開日:2021-07-09

# (参考訳) 生涯教師-学生ネットワーク学習

Lifelong Teacher-Student Network Learning ( http://arxiv.org/abs/2107.04689v1 )

ライセンス: CC BY-SA 4.0

Fei Ye and Adrian G. Bors

(参考訳) 人間の独特の認知能力は、一連の経験から新しい知識とスキルを得る能力から成り立っている。一方、人工知能システムは、過去に学んだデータベースを覚えることなく、与えられた最後のタスクのみを学ぶのに長けている。本稿では,教師-学生ネットワークフレームワークを用いた生涯学習手法を提案する。学生モジュールが与えられた新しいデータベースでトレーニングされている間、教師モジュールは過去に学んだ情報を学生に思い出させる。 The TeacherはGAN(Generative Adversarial Network)によって実装され、学習前のデータベースの確率的表現に対応する過去の知識を保存・再生するように訓練されている。一方、学生モジュールは変分オートエンコーダ(VAE)によって実装され、教師モジュールの出力と新たに利用可能なデータベースの両方から潜在変数表現を推論する。さらに、学生モジュールは、異なるドメインにまたがる連続的および離散的なデータ表現の両方をキャプチャするように訓練される。提案した生涯学習フレームワークは、教師付き、半教師付き、教師なしの訓練に適用される。コードは次のURLで公開されている: https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning

A unique cognitive capability of humans consists in their ability to acquire new knowledge and skills from a sequence of experiences. Meanwhile, artificial intelligence systems are good at learning only the last given task without being able to remember the databases learnt in the past. We propose a novel lifelong learning methodology by employing a Teacher-Student network framework. While the Student module is trained with a new given database, the Teacher module reminds the Student about the information learnt in the past. The Teacher, implemented by a Generative Adversarial Network (GAN), is trained to preserve and replay past knowledge corresponding to the probabilistic representations of previously learnt databases. Meanwhile, the Student module is implemented by a Variational Autoencoder (VAE) which infers its latent variable representation from both the output of the Teacher module and the newly available database. Moreover, the Student module is trained to capture both continuous and discrete underlying data representations across different domains. The proposed lifelong learning framework is applied in supervised, semi-supervised and unsupervised training. The code is available at: https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning

翻訳日:2021-07-14 11:39:12 公開日:2021-07-09

# (参考訳) 非Native Spoken Question-Answering の初期調査

An Initial Investigation of Non-Native Spoken Question-Answering ( http://arxiv.org/abs/2107.04691v1 )

ライセンス: CC BY 4.0

Vatsal Raina, Mark J.F. Gales

(参考訳) テキストベースの機械読解(MC)システムには幅広い応用があり、アプローチの開発と評価のための標準コーパスが存在する。一方、音声質問応答(SQA)システムの研究ははるかに少ない。本論文で検討するSQAタスクは、プロンプト応答形式の言語評価テストにおいて、質問に対する受験者の音声応答から回答を抽出することである。オフトピック応答検出などではなく、これらのMCアプローチをこのSQAタスクに適用することで、下流処理に利用できるはるかに詳細な情報が得られる。重要な課題の1つは、このタスク向けにシステムを訓練するための、適切に注釈付けされた音声コーパスがないことである。そこで、テキストベースのMCで訓練したシステムを非ネイティブ話者のSQAタスクで評価する、転移学習型のアプローチを採用する。その際、テキスト文書と音声応答の間、および非ネイティブの話し言葉の文法と書き言葉の文法の間のミスマッチを考慮する必要がある。実用的なSQAではASRシステムが使われるため、ASR誤りの影響も調べる必要がある。SQuAD2.0で訓練した単純なテキストベースのELECTRA MCモデルが、SQAにうまく転移することを示す。ASR誤りとSQA評価スコアの間にはほぼ線形の関係がみられた一方、文法のミスマッチの影響は最小限であった。

Text-based machine comprehension (MC) systems have a wide range of applications, and standard corpora exist for developing and evaluating approaches. There has been far less research on spoken question answering (SQA) systems. The SQA task considered in this paper is to extract the answer from a candidate's spoken response to a question in a prompt-response style language assessment test. Applying these MC approaches to this SQA task rather than, for example, off-topic response detection provides far more detailed information that can be used for further downstream processing. One significant challenge is the lack of appropriately annotated speech corpora to train systems for this task. Hence, a transfer-learning style approach is adopted where a system trained on text-based MC is evaluated on an SQA task with non-native speakers. Mismatches must be considered between text documents and spoken responses, and between non-native spoken grammar and written grammar. In practical SQA, ASR systems are used, necessitating an investigation of the impact of ASR errors. We show that a simple text-based ELECTRA MC model trained on SQuAD2.0 transfers well for SQA. It is found that there is an approximately linear relationship between ASR errors and the SQA assessment scores but grammar mismatches have minimal impact.

翻訳日:2021-07-14 10:59:07 公開日:2021-07-09

# (参考訳) 変分オートエンコーダの寿命混合

Lifelong Mixture of Variational Autoencoders ( http://arxiv.org/abs/2107.04694v1 )

ライセンス: CC BY-SA 4.0

Fei Ye and Adrian G. Bors

(参考訳) 本稿では,専門家による終末から終末までの学習の組み合わせを提案する。各専門家は変分オートエンコーダ(VAE)によって実装される。混合システムのエキスパートは、与えられたトレーニングサンプルのログライクな状態において、個々のコンポーネントエビデンスローバウンド(MELBO)の混合物を最大化することによって共同で訓練される。混合における混合係数は、目標表現における各専門家の貢献を制御する。これらは、生涯学習中の非パラメトリック推定によってパラメータが決定されるディリクレ分布からサンプリングされる。モデルは、これらが以前学んだものと似ている場合に、新しいタスクを素早く学習することができる。 VAE(L-MVAE)のLifelong混合は、完全に新しいタスクを学ぶ際に、アーキテクチャを新しいコンポーネントで拡張する。トレーニング後、我々のモデルは、新しいデータサンプルを投入する際に使用する関連する専門家を自動的に決定できる。このメカニズムは、推論中に専門家が1人しか使わないため、メモリ効率と計算コストの両方に効果がある。 L-MVAE推論モデルは、異なるタスクに関連するデータ領域にまたがる結合潜在空間において補間を行うことができ、非絡み合いの学習表現に効率的であることが示されている。

In this paper, we propose an end-to-end lifelong learning mixture of experts. Each expert is implemented by a Variational Autoencoder (VAE). The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds (MELBO) on the log-likelihood of the given training samples. The mixing coefficients in the mixture control the contributions of each expert in the goal representation. These are sampled from a Dirichlet distribution whose parameters are determined through non-parametric estimation during lifelong learning. The model can learn new tasks fast when these are similar to those previously learnt. The proposed Lifelong mixture of VAE (L-MVAE) expands its architecture with new components when learning a completely new task. After the training, our model can automatically determine the relevant expert to be used when fed with new data samples. This mechanism benefits both the memory efficiency and the required computational cost as only one expert is used during the inference. The L-MVAE inference model is able to perform interpolation in the joint latent space across the data domains associated with different tasks and is shown to be efficient for disentangled representation learning.
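The roles of the mixing coefficients and of expert selection can be made concrete with a toy calculation. Everything numeric here is hypothetical: the per-expert ELBO values and the Dirichlet concentration `alpha` are made up, and picking the expert with the highest ELBO at inference is an assumption, since the abstract does not say how the relevant expert is chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-expert ELBO values for one batch (higher is better).
elbos = np.array([-95.2, -120.7, -101.3])

# Hypothetical Dirichlet concentration; the paper estimates its parameters
# non-parametrically during lifelong learning.
alpha = np.array([2.0, 1.0, 1.0])

# Sample mixing coefficients; they are non-negative and sum to one.
weights = rng.dirichlet(alpha)

# Mixture objective: a convex combination of the individual ELBOs.
melbo = float(weights @ elbos)

# Assumed selection rule at inference: use only the best-scoring expert.
selected_expert = int(np.argmax(elbos))
```

Because the weights form a convex combination, the mixture objective always lies between the worst and best individual ELBO, and using a single selected expert at inference keeps the cost at one forward pass.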

翻訳日:2021-07-14 10:47:32 公開日:2021-07-09

# (参考訳) L2M:最適化駆動第2モーメント推定による後部ラプラス近似

L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation ( http://arxiv.org/abs/2107.04695v1 )

ライセンス: CC BY 4.0

Christian S. Perone, Roberto Pereira Silveira, Thomas Paula

(参考訳) ディープニューラルネットワークの不確かさの定量化は、最近多くの技術を通じて進化している。本研究では,計算的に魅力的な後方近似の古典的アプローチであるLaplace近似を再検討する。しかし、曲率行列を計算する代わりに、いくつかの正規性条件の下では、ラプラス近似が勾配第二モーメントを用いて容易に構成できることを示す。この量はアダムやRMSpropのような多くの指数移動平均変種によって既に推定されているが、伝統的に訓練後に捨てられている。提案手法(l2m)はモデルや最適化の変更を必要とせず、合理的な結果を得るために数行のコードで実装でき、新しいハイパーパラメータを導入することなく、既にオプティマイザによって計算されているもの以外の計算ステップも必要としないことを示す。提案手法は,深部ニューラルネットワークにおける不確実性推定のための最適化器によって既に計算されている量を用いて,新たな研究方向を開拓できることを期待する。

Uncertainty quantification for deep neural networks has recently evolved through many techniques. In this work, we revisit Laplace approximation, a classical approach for posterior approximation that is computationally attractive. However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment. This quantity is already estimated by many exponential moving average variants of Adagrad such as Adam and RMSprop, but is traditionally discarded after training. We show that our method (L2M) does not require changes in models or optimization, can be implemented in a few lines of code to yield reasonable results, and it does not require any extra computational steps besides what is already being computed by optimizers, without introducing any new hyperparameter. We hope our method can open new research directions on using quantities already computed by optimizers for uncertainty estimation in deep neural networks.
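As a rough illustration of reusing an optimizer's second-moment estimate, the sketch below trains a toy logistic regression with a hand-written Adam loop and then builds a diagonal Gaussian around the final weights. The data, the scaling of the second moment by the dataset size `n`, and the `prior_precision` term are all assumptions made for this example; the abstract does not give the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (hypothetical, for illustration only).
n, dim = 200, 3
X = rng.normal(size=(n, dim))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

def minibatch_grad(w, idx):
    """Mean gradient of the logistic negative log-likelihood on a minibatch."""
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return X[idx].T @ (p - y[idx]) / len(idx)

# Plain Adam loop; v is the exponential moving average of squared gradients.
w, m, v = np.zeros(dim), np.zeros(dim), np.zeros(dim)
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8
for t in range(1, 2001):
    idx = rng.choice(n, size=32, replace=False)
    g = minibatch_grad(w, idx)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

# Assumption: treat n * v_hat as a diagonal curvature estimate and add a
# Gaussian prior precision; both choices are illustrative, not from the paper.
prior_precision = 1.0
posterior_var = 1.0 / (n * v_hat + prior_precision)

# Draw weight samples from the diagonal Gaussian centred on the trained weights.
samples = w + np.sqrt(posterior_var) * rng.normal(size=(100, dim))
```

The only extra work after training is one element-wise division, which matches the abstract's point that the quantity is already available in the optimizer state.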

翻訳日:2021-07-14 10:19:52 公開日:2021-07-09

# (参考訳) infovaegan : 情報最大化と最大確率による理解可能表現の学習

InfoVAEGAN : learning joint interpretable representations by information maximization and maximum likelihood ( http://arxiv.org/abs/2107.04705v1 )

ライセンス: CC BY-SA 4.0

Fei Ye and Adrian G. Bors

(参考訳) 乱れと解釈可能な表現の学習は、多様体上の包括的なデータ表現を達成するための重要なステップである。本稿では,可変オートエンコーダ(vae)の推論能力と生成型逆ネットワーク(gan)の一般化能力を組み合わせた新しい表現学習アルゴリズムを提案する。提案モデルはInfoVAEGANと呼ばれ,Encoder, Generator, Discriminatorの3つのネットワークで構成されている。 InfoVAEGANは、2つの異なるデータフリーログライクな関数をジェネレータの分布からサンプリングされた変数に使用することにより、離散的かつ連続的な解釈可能な表現を教師なしで共同学習することを目的としている。本稿では,生成ネットワークを生成器のトレーニングとは別に最適化する2段階アルゴリズムを提案する。さらに,既存の潜伏変数と生成および推論プロセスによって生成された変数間の相互情報の最大化を通じて,解釈可能な表現の学習を実施する。

Learning disentangled and interpretable representations is an important step towards accomplishing comprehensive data representations on the manifold. In this paper, we propose a novel representation learning algorithm which combines the inference abilities of Variational Autoencoders (VAE) with the generalization capability of Generative Adversarial Networks (GAN). The proposed model, called InfoVAEGAN, consists of three networks: Encoder, Generator and Discriminator. InfoVAEGAN aims to jointly learn discrete and continuous interpretable representations in an unsupervised manner by using two different data-free log-likelihood functions onto the variables sampled from the generator's distribution. We propose a two-stage algorithm for optimizing the inference network separately from the generator training. Moreover, we enforce the learning of interpretable representations through the maximization of the mutual information between the existing latent variables and those created through generative and inference processes.

翻訳日:2021-07-14 10:10:49 公開日:2021-07-09

# (参考訳) 長寿命双対生成対向ネットワーク

Lifelong Twin Generative Adversarial Networks ( http://arxiv.org/abs/2107.04708v1 )

ライセンス: CC BY-SA 4.0

Fei Ye and Adrian G. Bors

(参考訳) 本稿では,ライフロングツイン生成適応ネットワーク (LT-GAN) と呼ばれる連続学習型生成モデルを提案する。 LT-GANは複数のデータベースから一連のタスクを学習し、そのアーキテクチャは3つのコンポーネントで構成されている。 lt-gansが忘れずに新しい概念を学べるようにするため、教師とアシスタントが交互に相互に教え合うように促し、新しいデータベースを学習しながら、生涯にわたって学習する新しい訓練手法、lakd(lifelong adversarial knowledge distillation)を導入する。このトレーニングアプローチは、より知識のあるプレイヤーから、以前与えられたタスクに関する情報が少ない他のプレイヤーに知識を移すことを好む。

In this paper, we propose a new continuously learning generative model, called the Lifelong Twin Generative Adversarial Networks (LT-GANs). LT-GANs learns a sequence of tasks from several databases and its architecture consists of three components: two identical generators, namely the Teacher and Assistant, and one Discriminator. In order to allow for the LT-GANs to learn new concepts without forgetting, we introduce a new lifelong training approach, namely Lifelong Adversarial Knowledge Distillation (LAKD), which encourages the Teacher and Assistant to alternately teach each other, while learning a new database. This training approach favours transferring knowledge from a more knowledgeable player to another player which knows less information about a previously given task.

翻訳日:2021-07-14 10:01:07 公開日:2021-07-09

# (参考訳) 劣化網膜の基底画像におけるランドマーク検出のための階層型ボトルネック注意U-Net

U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina ( http://arxiv.org/abs/2107.04721v1 )

ライセンス: CC BY 4.0

Shuyun Tang, Ziming Qi, Jacob Granley and Michael Beyeler

(参考訳) 眼底写真は、臨床における加齢関連黄斑変性症(AMD)、緑内障、糖尿病網膜症(DR)などの網膜変性疾患の存在と重症度を日常的に記録するために使われてきた。しかし、網膜変性に伴う病変、ドルゼン、その他の網膜異常の発生は、自動的ランドマーク検出とセグメンテーションを著しく複雑にする。本稿では,階層的ボトルネックに注目するU-NetバックボーンHBA-U-Netを提案する。このネットワークは、自己注意、チャネルアテンション、および相対的な位置アテンションを組み合わせた、新たなボトルネックアテンションブロックで構成されており、変性網膜における卵胞およびODセグメンテーションに重要な網膜異常を強調している。 hba-u-netは、データセットと眼の状態(adam: euclidean distance (ed) of 25.4 pixels, refuge: 32.5 pixels, idrid: 32.1 pixels), on od segmentation for amd (adam: dice coefficient (dc) of 0.947), on od detection for dr (idrid: ed of 20.5 pixels)の最新の結果を得た。以上の結果から,HBA-U-Netは網膜変性疾患の存在下でのランドマーク検出に適している可能性が示唆された。

Fundus photography has routinely been used to document the presence and severity of retinal degenerative diseases such as age-related macular degeneration (AMD), glaucoma, and diabetic retinopathy (DR) in clinical practice, for which the fovea and optic disc (OD) are important retinal landmarks. However, the occurrence of lesions, drusen, and other retinal abnormalities during retinal degeneration severely complicates automatic landmark detection and segmentation. Here we propose HBA-U-Net: a U-Net backbone enriched with hierarchical bottleneck attention. The network consists of a novel bottleneck attention block that combines and refines self-attention, channel attention, and relative-position attention to highlight retinal abnormalities that may be important for fovea and OD segmentation in the degenerated retina. HBA-U-Net achieved state-of-the-art results on fovea detection across datasets and eye conditions (ADAM: Euclidean Distance (ED) of 25.4 pixels, REFUGE: 32.5 pixels, IDRiD: 32.1 pixels), on OD segmentation for AMD (ADAM: Dice Coefficient (DC) of 0.947), and on OD detection for DR (IDRiD: ED of 20.5 pixels). Our results suggest that HBA-U-Net may be well suited for landmark detection in the presence of a variety of retinal degenerative diseases.
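
The two evaluation metrics quoted in this abstract, the Dice Coefficient for segmentation and the Euclidean Distance for landmark detection, can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the function names and the empty-mask convention are assumptions.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice coefficient between two binary masks (1.0 means perfect overlap)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        # Both masks are empty; returning 1.0 is a convention assumed here.
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / total


def euclidean_distance(pred_xy, true_xy):
    """Distance in pixels between a predicted and a reference landmark."""
    diff = np.asarray(pred_xy, dtype=float) - np.asarray(true_xy, dtype=float)
    return float(np.linalg.norm(diff))
```

A higher Dice value and a lower Euclidean distance both indicate better agreement with the reference annotation.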

翻訳日:2021-07-14 09:51:17 公開日:2021-07-09

# 非可逆目的を持つオーバーパラメータモデルのトレーニング

Training Over-parameterized Models with Non-decomposable Objectives ( http://arxiv.org/abs/2107.04641v1 )

ライセンス: Link先を確認

Harikrishna Narasimhan, Aditya Krishna Menon

(参考訳) 多くの現代の機械学習アプリケーションは、最悪のケースエラーを最小限に抑えること、与えられた精度やリコールターゲットを満たすこと、グループフェアネスの制約を強制することなど、複雑で曖昧な設計目標を掲げている。このような分解不能な目的を最適化するための一般的なテクニックは、問題をコストに敏感な一連の学習タスクに還元し、それぞれがサンプル固有のコストでトレーニング損失を再重み付けすることで解決する。ラベルコストを組み込むために損失を再重み付けする標準的なアプローチは、過パラメータモデルのトレーニングで不満足な結果をもたらす可能性がある、と指摘する。そこで本稿では,ロジット調整という古典的な考え方を拡張し,より一般的なコスト行列を扱うための新たなコスト感受性損失を提案する。私たちの損失は校正され、教師モデルからの蒸留ラベルによってさらに改善できます。ベンチマーク画像データセットの実験を通じて、共通の頑健で制約のある最適化目標を持つResNetモデルのトレーニングにおいて、我々のアプローチの有効性を示す。

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives.
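
The "classical idea of logit adjustment" that the abstract builds on can be sketched as a cross-entropy loss computed on logits shifted by the log class priors. This is a minimal sketch of that classical form only; the cost-matrix losses proposed in the paper are not reproduced here, and the function name and the `tau` parameter are assumptions.

```python
import numpy as np


def logit_adjusted_loss(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) for each class."""
    logits = np.asarray(logits, dtype=float)
    adjusted = logits + tau * np.log(np.asarray(class_priors, dtype=float))
    adjusted = adjusted - adjusted.max()  # subtract the max for numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum())
    return float(-log_probs[label])
```

With uniform priors the shift is the same for every class and the loss reduces to ordinary cross-entropy; with skewed priors, errors on rare classes are penalized more heavily.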

翻訳日:2021-07-13 16:17:35 公開日:2021-07-09

# 直線上の精度:分布外と分布内一般化の強い相関について

Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization ( http://arxiv.org/abs/2107.04649v1 )

ライセンス: Link先を確認

John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt

(参考訳) 機械学習システムが信頼できるためには、その性能を無意識で分散しない環境で理解する必要がある。本稿では,様々なモデルに対する分配性能と分配性能が強く相関していることを実証的に示す。具体的には,YCBオブジェクトから合成されたポーズ推定タスク,FMoW-WILDSの衛星画像分類,iWildCam-WILDSの野生生物分類,CIFAR-10とImageNetの変種に対する分布内分布と分布外分布性能の相関性を示す。モデルアーキテクチャ、ハイパーパラメータ、トレーニングセットサイズ、トレーニング期間の間に強い相関関係があり、既存のドメイン適応理論から予想されるよりも正確である。また,CIFAR-10-Cと組織分類データセットCamelyon17-WILDSの合成分布の変化など,相関が弱いケースについても検討した。最後に,分布シフトによるデータ共分散の変化が観測された相関に与える影響を示すガウスデータモデルに基づく候補理論を提案する。

For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
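
One simple way to quantify such a relationship is to fit a line to pairs of in-distribution and out-of-distribution accuracies and report the Pearson correlation. The sketch below assumes one accuracy pair per model; the function name is hypothetical, and any axis transformation the paper may apply before fitting is not reproduced.

```python
import numpy as np


def fit_accuracy_line(id_acc, ood_acc):
    """Least-squares line ood = slope * id + intercept, plus Pearson r."""
    id_acc = np.asarray(id_acc, dtype=float)
    ood_acc = np.asarray(ood_acc, dtype=float)
    slope, intercept = np.polyfit(id_acc, ood_acc, 1)
    r = np.corrcoef(id_acc, ood_acc)[0, 1]
    return float(slope), float(intercept), float(r)
```

A correlation close to 1 across many models means that in-distribution accuracy alone predicts out-of-distribution accuracy well for that shift.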

翻訳日:2021-07-13 16:14:33 公開日:2021-07-09

# 変分オートエンコーダにおけるエンコーダの表現複雑性に及ぼす可逆性の影響

The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders ( http://arxiv.org/abs/2107.04652v1 )

ライセンス: Link先を確認

Divyansh Pareek, Andrej Risteski

(参考訳) 現代のニューラルネットワークに基づく潜在変数生成モデル(変分オートエンコーダなど)のトレーニングと使用には、しばしば、潜在変数の後方分布を近似する推論(エンコード)方向とともに生成方向の訓練を同時に行う必要がある。与えられた生成モデルの後方分布を正確にモデル化するために、推論モデルはどの程度複雑でなければならないのか? 本稿では,エンコーダの必要なサイズに影響を及ぼす生成写像の重要な特性を同定する。生成写像が「強可逆(strongly invertible)」ならば(ある意味では、適切に形式化できる)、推論モデルはそれほど複雑ではない。逆に、エンコーディング方向が指数関数的に大きい(計算複雑性の標準的な仮定の下で)必要となる非可逆生成写像が存在することを証明する。重要なことは、生成モデルは階層的に非可逆である必要はなく、関係する文献の多くが想定し、実際に使用される多くのアーキテクチャ(例えば、)に満足していない。畳み込みとプールベースのネットワーク)。したがって、低次元多様体上にデータを置くと、深層生成モデルの学習が困難であるという経験的知恵を理論的に支持する。

Training and using modern neural-network based latent-variable generative models (like Variational Autoencoders) often require simultaneously training a generative direction along with an inferential (encoding) direction, which approximates the posterior distribution over the latent variables. Thus, the question arises: how complex does the inferential model need to be, in order to be able to accurately model the posterior distribution of a given generative model? In this paper, we identify an important property of the generative map impacting the required size of the encoder. We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex. Conversely, we prove that there exist non-invertible generative maps, for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity). Importantly, we do not require the generative model to be layerwise invertible, an assumption that much of the related literature makes but that many architectures used in practice (e.g., convolution and pooling based networks) do not satisfy. Thus, we provide theoretical support for the empirical wisdom that learning deep generative models is harder when data lies on a low-dimensional manifold.

翻訳日:2021-07-13 16:14:14 公開日:2021-07-09

# 因果推論における感度解析のためのHölder bounds

Hölder Bounds for Sensitivity Analysis in Causal Reasoning ( http://arxiv.org/abs/2107.04661v1 )

ライセンス: Link先を確認

Serge Assaad, Shuxi Zeng, Henry Pfister, Fan Li, Lawrence Carin

(参考訳) 本研究では、未観測の交絡因子 U が存在する場合に、治療 T が結果 Y に与える影響の区間推定を検討する。 Hölder の不等式を用いて、未測定の交絡の度合い（すなわち、接続 U->T の強さと U->Y の強さ）に基づき、交絡バイアス |E[Y|T=t]-E[Y|do(T=t)]| に対する一連の境界を導出する。これらの境界は、U が T から独立であるとき、または T を所与として U が Y から独立であるとき（すなわち、未観測の交絡がないとき）に厳密である。我々は、分布 p(U) と p(U|T=t) の間の全変動距離、および条件付き期待結果 E[Y|U=u,T=t] の平均期待結果 E[Y|T=t] からの最大偏差（U のすべての可能な値にわたる）に依存する、この境界の特別な場合に焦点を当てる。治療効果の区間推定を得るためのこの境界のキャリブレーション戦略について議論し、合成および半合成データセットを用いて境界を実験的に検証する。

We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent of T or when U is independent of Y given T (when there is no unobserved confounding). We focus on a special case of this bound depending on the total variation distance between the distributions p(U) and p(U|T=t), as well as the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to get interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.
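
The special case described above can be reconstructed with a short derivation. The sketch below assumes a discrete U that is the only confounder, so that E[Y|do(T=t)] is obtained by averaging over p(U); the shorthand $m_t(u) = \mathbb{E}[Y \mid U=u, T=t]$ and the convention that total variation is half the $L_1$ distance are assumptions introduced here, and the constants in the paper's own statement may be arranged differently.

```latex
\begin{align*}
\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid \mathrm{do}(T=t)]
  &= \sum_u \bigl(p(u \mid t) - p(u)\bigr)\, m_t(u) \\
  &= \sum_u \bigl(p(u \mid t) - p(u)\bigr)\,\bigl(m_t(u) - \mathbb{E}[Y \mid T=t]\bigr), \\
\bigl|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid \mathrm{do}(T=t)]\bigr|
  &\le \bigl\|p(\cdot \mid t) - p(\cdot)\bigr\|_1 \,\max_u \bigl|m_t(u) - \mathbb{E}[Y \mid T=t]\bigr| \\
  &= 2\,\mathrm{TV}\bigl(p(U),\, p(U \mid T=t)\bigr)\,\max_u \bigl|m_t(u) - \mathbb{E}[Y \mid T=t]\bigr|.
\end{align*}
```

The second line uses the fact that $\sum_u \bigl(p(u \mid t) - p(u)\bigr) = 0$, so any constant can be subtracted from $m_t(u)$; the inequality is Hölder's inequality with exponents $(1, \infty)$. Both factors vanish in the cases the abstract calls tight: the first when U is independent of T, the second when U is independent of Y given T.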

翻訳日:2021-07-13 16:13:53 公開日:2021-07-09

# 人口ベースのセルフチューニングgcnによる自動グラフ学習

Automated Graph Learning via Population Based Self-Tuning GCN ( http://arxiv.org/abs/2107.04713v1 )

ライセンス: Link先を確認

Ronghang Zhu and Zhiqiang Tao and Yaliang Li and Sheng Li

(参考訳) 効率的なグラフ埋め込みを抽出する顕著な能力のため、グラフ畳み込みネットワーク(GCN)とその変種は、ノード分類、リンク予測、グラフ分類といった幅広いタスクにうまく適用されている。従来のGCNモデルはオーバーフィッティングとオーバースムーシングの問題に悩まされており、DropEdgeのような最近の技術はこれらの問題を緩和し、ディープGCNの開発を可能にする。しかし、GCNモデルのトレーニングは、特に深いGCNモデルにおいて、ドロップアウト率や学習重量減少などのハイパーパラメータの選択に敏感であるため、簡単ではない。本稿では,ハイパーパラメータ最適化によりGCNモデルのトレーニングを自動化することを目的とする。具体的には、代替トレーニングアルゴリズムを用いた自己学習型GCNアプローチを提案し、人口ベーストレーニングスキームを取り入れたアプローチをさらに拡張する。 3つのベンチマークデータセットの実験結果から,複数の代表的ベースラインと比較して,多層GCNの最適化におけるアプローチの有効性が示された。

Owing to the remarkable capability of extracting effective graph embeddings, graph convolutional network (GCN) and its variants have been successfully applied to a broad range of tasks, such as node classification, link prediction, and graph classification. Traditional GCN models suffer from the issues of overfitting and oversmoothing, while some recent techniques like DropEdge could alleviate these issues and thus enable the development of deep GCN. However, training GCN models is non-trivial, as it is sensitive to the choice of hyperparameters such as dropout rate and learning weight decay, especially for deep GCN models. In this paper, we aim to automate the training of GCN models through hyperparameter optimization. To be specific, we propose a self-tuning GCN approach with an alternate training algorithm, and further extend our approach by incorporating the population based training scheme. Experimental results on three benchmark datasets demonstrate the effectiveness of our approaches on optimizing multi-layer GCN, compared with several representative baselines.

翻訳日:2021-07-13 16:11:41 公開日:2021-07-09

# 機械学習モデルの性能解析を改善するトポロジカルフレームワーク

A Topological-Framework to Improve Analysis of Machine Learning Model Performance ( http://arxiv.org/abs/2107.04714v1 )

ライセンス: Link先を確認

Henry Kvinge, Colby Wight, Sarah Akers, Scott Howland, Woongjo Choi, Xiaolong Ma, Luke Gosink, Elizabeth Jurrus, Keerti Kappagantula, Tegan H. Emerson

(参考訳) 機械学習モデルと評価されたデータセットがサイズと複雑性が増大するにつれて、モデルのパフォーマンスを理解するためにいくつかの要約統計を使用するプラクティスがますます問題になっている。これは、データの特定のサブポピュレーションにおけるモデル失敗を理解することが重要な現実のシナリオにおいて特に当てはまる。本稿では,データセットをモデルが動作する「空間」として扱う機械学習モデルを評価するためのトポロジカルな枠組みを提案する。これにより、グローバルレベル(テストセット全体)とローカルレベル(特定のサブポピュレーション)の両方で、モデルパフォーマンスに関する情報を整理する原則化された方法が提供されます。最後に,様々な部分集団間のモデル性能を保存・分析するための便利な手法である,トポロジカルデータ構造であるpresheavesについて述べる。

As both machine learning models and the datasets on which they are evaluated have grown in size and complexity, the practice of using a few summary statistics to understand model performance has become increasingly problematic. This is particularly true in real-world scenarios where understanding model failure on certain subpopulations of the data is of critical importance. In this paper we propose a topological framework for evaluating machine learning models in which a dataset is treated as a "space" on which a model operates. This provides us with a principled way to organize information about model performance at both the global level (over the entire test set) and also the local level (on specific subpopulations). Finally, we describe a topological data structure, presheaves, which offer a convenient way to store and analyze model performance between different subpopulations.

翻訳日:2021-07-13 16:06:18 公開日:2021-07-09

# ノイズトレーニングによるエッジ用E2E ASRの改善

Noisy Training Improves E2E ASR for the Edge ( http://arxiv.org/abs/2107.04677v1 )

ライセンス: Link先を確認

Dilin Wang, Yuan Shangguan, Haichuan Yang, Pierce Chuang, Jiatong Zhou, Meng Li, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra

(参考訳) 音声認識(ASR)は現代のエッジデバイスでますます普及している。過去の研究では、エッジデバイス上でコンパクトに動作可能な全ニューロン音声認識器(E2E)を開発した。しかしながら、E2E ASRモデルは過度に適合する傾向にあり、見えないテストデータの一般化には困難である。層正規化、ドロップアウト、スペクトルデータ増大、入力の速度歪みなど、ASRモデルのトレーニングを規則化する様々な手法が提案されている。本稿では,e2e asrモデルトレーニングをさらに改善するための,単純かつ効果的なノイズトレーニング戦略を提案する。学習中にパラメータ空間にランダムノイズを導入することにより,より一般化した収束時のスムースモデルを生成することができる。我々は,高密度かつスパースなEmformerモデルの改良と,一貫したWER削減の観測に雑音学習を適用した。具体的には、90%の間隔でEmformerをトレーニングする場合、それぞれ12%と14%のWER改善をLibriSpeech Test-otherとTest-cleanデータセットで達成します。

Automatic speech recognition (ASR) has become increasingly ubiquitous on modern edge devices. Past work developed streaming End-to-End (E2E) all-neural speech recognizers that can run compactly on edge devices. However, E2E ASR models are prone to overfitting and have difficulties in generalizing to unseen testing data. Various techniques have been proposed to regularize the training of ASR models, including layer normalization, dropout, spectrum data augmentation and speed distortions in the inputs. In this work, we present a simple yet effective noisy training strategy to further improve the E2E ASR model training. By introducing random noise to the parameter space during training, our method can produce smoother models at convergence that generalize better. We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction. Specifically, when training Emformers with 90% sparsity, we achieve 12% and 14% WER improvements on the LibriSpeech Test-other and Test-clean data set, respectively.
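
The idea of introducing random noise into the parameter space during training can be illustrated on a toy least-squares problem. This is a sketch under stated assumptions, not the authors' recipe: the Gaussian noise, the choice to evaluate the gradient at perturbed weights while updating the clean weights, and all names and hyperparameters are hypothetical.

```python
import numpy as np


def noisy_sgd_step(w, x, y, lr, noise_std, rng):
    """One gradient step for least squares, with the gradient evaluated at
    weights perturbed by Gaussian noise; the update is applied to the clean weights."""
    w_noisy = w + rng.normal(0.0, noise_std, size=w.shape)
    residual = x @ w_noisy - y
    grad = 2.0 * x.T @ residual / len(y)
    return w - lr * grad


# Toy regression problem with known weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w

w = np.zeros(3)
for _ in range(500):
    w = noisy_sgd_step(w, x, y, lr=0.05, noise_std=0.01, rng=rng)
```

Because the loss is evaluated at perturbed weights, solutions that are sensitive to small parameter changes are penalized, which is one common intuition for why such noise can favour smoother models.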

翻訳日:2021-07-13 16:04:55 公開日:2021-07-09

# 都市3次元モデリングのための累積評価

Cumulative Assessment for Urban 3D Modeling ( http://arxiv.org/abs/2107.04622v1 )

ライセンス: Link先を確認

Shea Hagstrom, Hee Won Pak, Stephanie Ku, Sean Wang, Gregory Hager, Myron Brown

(参考訳) 衛星画像からの都市の3dモデリングには、都市の特徴を表現するための正確なセマンティックセグメンテーション、表面高さの3d再構成のためのマルチビューステレオ、3正確な表面傾斜を持つコンパクトモデルを作成するための3dモデルフィッティングが必要である。本稿では,各コンポーネントからの誤差貢献を簡潔に捉えた累積評価指標を提案する。我々は,2つのオープンソースプロジェクトを拡張してエンド・ツー・エンドの3dモデリングベースラインソリューションを提供し,パブリックなリーダボードによるさらなる研究と評価を促進することで,このアプローチを実証する。

Urban 3D modeling from satellite images requires accurate semantic segmentation to delineate urban features, multiple view stereo for 3D reconstruction of surface heights, and 3D model fitting to produce compact models with accurate surface slopes. In this work, we present a cumulative assessment metric that succinctly captures error contributions from each of these components. We demonstrate our approach by providing challenging public datasets and extending two open source projects to provide an end-to-end 3D modeling baseline solution to stimulate further research and evaluation with a public leaderboard.

翻訳日:2021-07-13 16:01:44 公開日:2021-07-09

# 腹腔鏡画像の深度推定のための自己監督型生成逆数ネットワーク

Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images ( http://arxiv.org/abs/2107.04644v1 )

ライセンス: Link先を確認

Baoru Huang, Jianqing Zheng, Anh Nguyen, David Tuch, Kunal Vyas, Stamatia Giannarou, Daniel S. Elson

(参考訳) 手術シーンの深度推定と3次元再構成は,コンピュータ支援手術における重要なステップである。近年の研究では、畳み込みニューラルネットワークによってステレオ画像ペアから深度を推定できることが示されている。しかし、最近の深度推定モデルは、ピクセル単位の基底真理を持つデータセットで訓練された。このようなデータは腹腔鏡画像では特に稀であり、実際の外科的応用に教師付き深度推定を適用することは困難である。この制限を克服するために,生成逆ネットワークに基づく自己教師型深度推定手法であるSADepthを提案する。エンコーダデコーダジェネレータと、トレーニング中に幾何学的制約を組み込む識別器で構成される。生成装置からのマルチスケール出力は、光度再投射損失による局所的なミニマを解くのに役立ち、対向学習はフレームワーク生成品質を改善する。 2つの公開データセットに対する大規模な実験により、SADepthは最新の最先端の教師なし手法を大きなマージンで上回り、腹腔鏡画像における教師なしと教師なしの深さ推定のギャップを減らしている。

Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer assisted surgery. Recent work has shown that depth estimation from a stereo image pair can be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to solve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.

翻訳日:2021-07-13 16:01:33 公開日:2021-07-09

# DDCNet: ディエンス予測のための深層拡張畳み込みニューラルネットワーク

DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction ( http://arxiv.org/abs/2107.04715v1 )

ライセンス: Link先を確認

Ali Salehi, Madhusudhanan Balasubramanian

(参考訳) 光フローや不均一性推定などの複雑なピクセルマッチング問題は、コンピュータビジョンにおいて最も難しい課題である。近年,これらの問題に対する深層学習手法が成功している。十分に大きな有効受容場(ERF)とネットワーク内の空間的特徴の高分解能な分解能は、高分解能な密度推定を提供することに不可欠である。本稿では,高い空間的特徴分解能を維持しつつ,より広い受容領域を提供できるネットワークアーキテクチャを設計するためのシステム的アプローチを提案する。より大きなRFを実現するために,拡張畳み込み層を利用した。より深い層での拡散率を積極的に増加させることで、トレーニング可能なパラメータの数が著しく少ない十分に大きなRFを達成できた。ネットワーク設計戦略の第一指標として,光フロー推定問題を用いた。ベンチマークの結果(sintel, kitti, middlebury)は、私たちのコンパクトネットワークが軽量ネットワークのクラスで同等のパフォーマンスを達成できることを示しています。

Dense pixel matching problems such as optical flow and disparity estimation are among the most challenging tasks in computer vision. Recently, several deep learning methods designed for these problems have been successful. A sufficiently large effective receptive field (ERF) and a high resolution of spatial features within a network are essential for providing higher-resolution dense estimates. In this work, we present a systematic approach to designing network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution. To achieve a larger ERF, we utilized dilated convolutional layers. By aggressively increasing dilation rates in the deeper layers, we were able to achieve a sufficiently large ERF with significantly fewer trainable parameters. We used the optical flow estimation problem as the primary benchmark to illustrate our network design strategy. The benchmark results (Sintel, KITTI, and Middlebury) indicate that our compact networks can achieve comparable performance in the class of lightweight networks.
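
The effect of dilation on the receptive field can be checked with a few lines of arithmetic. The sketch below computes the theoretical receptive field of a stack of stride-1 convolutions, which is an upper bound on the effective receptive field discussed in the abstract; the function name is an assumption, and no layer configuration from the paper is implied.

```python
def receptive_field(kernel_sizes, dilations):
    """Theoretical receptive field (in pixels, along one axis) of stacked
    stride-1 convolutions: each layer adds (kernel - 1) * dilation."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

For example, three 3x3 layers without dilation cover 7 pixels, while the same three layers with dilation rates 1, 2, and 4 cover 15 pixels with an identical parameter count.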

翻訳日:2021-07-13 16:01:16 公開日:2021-07-09

# SITHCon: 時間次元における入力スケーリングの変動に頑健なニューラルネットワーク

SITHCon: A neural network robust to variations in input scaling on the time dimension ( http://arxiv.org/abs/2107.04616v1 )

ライセンス: Link先を確認

Brandon G. Jacques, Zoran Tiganj, Aakash Sarkar, Marc W. Howard, Per B. Sederberg

(参考訳) 機械学習では、畳み込みニューラルネットワーク(CNN)はコンピュータビジョンと時間とともに拡張されたパターンの認識の両方に非常に影響を与えた。コンピュータビジョンにおいて、柔軟性の一部は、変換不変性を達成するために畳み込み上の最大プール演算を使用することによって生じる。哺乳類の脳では、時間の神経表現は時間基底関数のセットを使用する。批判的に、これらの基底関数は、基底集合が対数時間上で均等に分布するように幾何級数に配置されているように見える。本稿では,対数的に分散した時間メモリを用いたSITHCon(Scale-Invariant Temporal History Convolution Network)を提案する。対数分布した時間記憶上の最大プールは、時間のスケール不変性をもたらす。 SITHConの性能を時間的畳み込みネットワーク(TCN)と比較し、両ネットワークが単変量および多変量時系列$f(t)$の分類と回帰問題を学習できるが、入力$f(at)$の再スケールに再学習することなく一般化できる特性を持つのはSITHConのみであることを示す。この性質は神経科学や心理学の知見に触発され、トレーニングの高速化や一般化性の向上など、大幅に異なる能力を持つ大規模ネットワークに繋がる可能性がある。

In machine learning, convolutional neural networks (CNNs) have been extremely influential in both computer vision and in recognizing patterns extended over time. In computer vision, part of the flexibility arises from the use of max-pooling operations over the convolutions to attain translation invariance. In the mammalian brain, neural representations of time use a set of temporal basis functions. Critically, these basis functions appear to be arranged in a geometric series such that the basis set is evenly distributed over logarithmic time. This paper introduces a Scale-Invariant Temporal History Convolution network (SITHCon) that uses a logarithmically-distributed temporal memory. A max-pool over a logarithmically-distributed temporal memory results in scale-invariance in time. We compare performance of SITHCon to a Temporal Convolution Network (TCN) and demonstrate that, although both networks can learn classification and regression problems on both univariate and multivariate time series $f(t)$, only SITHCon has the property that it generalizes without retraining to rescaled versions of the input $f(at)$. This property, inspired by findings from neuroscience and psychology, could lead to large-scale networks with dramatically different capabilities, including faster training and greater generalizability, even with significantly fewer free parameters.
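
Why a max-pool over a logarithmically distributed memory yields scale invariance can be seen in a toy setting. In the sketch below, time constants form a geometric series, and a unit's response depends only on the ratio between its time constant and the input's time scale; rescaling the input then shifts the response pattern along the unit axis without changing its maximum. This is an illustration of the principle only, not the SITHCon architecture; the bump-shaped response and all names are assumptions.

```python
import numpy as np


def log_spaced_taus(tau_min, ratio, n):
    """Time constants in a geometric series: evenly spaced on a log axis."""
    return tau_min * ratio ** np.arange(n)


def response(taus, input_scale):
    """Toy response that depends only on tau / input_scale and peaks at 1."""
    return np.exp(-np.log(taus / input_scale) ** 2)


taus = log_spaced_taus(0.1, 2.0, 12)
r_slow = response(taus, 1.6)   # input with time scale 1.6
r_fast = response(taus, 6.4)   # same input stretched by a factor of 4

shift = int(np.argmax(r_fast) - np.argmax(r_slow))   # pattern moves by log2(4) units
same_peak = bool(np.isclose(r_slow.max(), r_fast.max()))
```

Stretching the input by a factor of 4 moves the response by two units on this axis (since the ratio between neighbouring time constants is 2), so a max-pool over the units returns the same value in both cases, up to edge effects at the ends of the memory.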

翻訳日:2021-07-13 15:55:31 公開日:2021-07-09

# UAVと畳み込みネットワークの協調群を用いた効率的なリアルタイム画像認識

Efficient Real-Time Image Recognition Using Collaborative Swarm of UAVs and Convolutional Networks ( http://arxiv.org/abs/2107.04648v1 )

ライセンス: Link先を確認

Marwan Dhuheir, Emna Baccour, Aiman Erbad, Sinan Sabeeh, Mounir Hamdi

(参考訳) 無人航空機(uavs)は最近、異なる部門で使用され、困難で危険な地域で使用される能力に優れたため、大きな注目を集めている。さらに、コンピュータビジョンと人工知能の進歩により、森林火災の検出や国境監視といった様々な用途やソリューションにおけるUAVの使用が増加した。しかし、uavsでディープニューラルネットワーク(dnn)を使用することで、より深いネットワークや複雑なモデルを処理することの難しさが生まれ、オンボード計算が制限される。そこで本研究では,画像の分類と意思決定遅延の最小化を目的とした,リソース制約のあるUAV群に推論要求を分散する戦略を提案する。画像取得と最終決定の待ち時間を最小限に抑える最適化問題としてモデルを定式化する。定式化最適化解はnpハード問題である。したがって、オンラインリソース割り当てには不十分である。そこで我々は,オンラインヒューリスティックソリューションであるdistinferenceを導入して,利用可能なuavの中で最良なレイテンシを与えるレイヤ配置戦略を提案する。提案されたアプローチは、異なる低遅延アプリケーションや、レイヤのパイプライン(例えば、vgg)に編成されたすべてのcnnタイプ、あるいは残差ブロック(例えば、resnet)に基づいて使用するのに十分なほど一般的である。

Unmanned Aerial Vehicles (UAVs) have recently attracted significant attention due to their outstanding ability to be used in different sectors and serve in difficult and dangerous areas. Moreover, the advancements in computer vision and artificial intelligence have increased the use of UAVs in various applications and solutions, such as forest fire detection and border monitoring. However, using deep neural networks (DNNs) with UAVs introduces several challenges in processing deeper networks and complex models, which restricts their on-board computation. In this work, we present a strategy that distributes inference requests across a swarm of resource-constrained UAVs, which classify captured images on-board, and that seeks the minimum decision-making latency. We formulate the model as an optimization problem that minimizes the latency between acquiring images and making the final decisions. The formulated optimization problem is NP-hard and is therefore not adequate for online resource allocation. We thus introduce an online heuristic solution, namely DistInference, to find the layer placement strategy that gives the best latency among the available UAVs. The proposed approach is general enough to be used for different low decision-latency applications as well as for all CNN types organized into a pipeline of layers (e.g., VGG) or based on residual blocks (e.g., ResNet).

翻訳日:2021-07-13 15:48:53 公開日:2021-07-09

# 何の不可能性か? アルゴリズム的公正性における形式的平等と実質的平等

Impossibility of What? Formal and Substantive Equality in Algorithmic Fairness ( http://arxiv.org/abs/2107.04642v1 )

ライセンス: Link先を確認

Ben Green

(参考訳) 社会的・経済的不平等の複合的危機に直面した多くの人々は、社会的公正を達成するためにアルゴリズム的意思決定に目を向けた。これらの取り組みが強化されるにつれて、"algorithmic fairness"という急成長する分野における推論は、実践においての公正さの出現をますます形作る。本稿では, アルゴリズム的公平性が, 社会的平等性を高めるための適切な概念的, 実践的なツールを提供するかどうかを問う。アルゴリズムの公正性に対する支配的な「形式的」アプローチは、その分析の狭い枠組みが改革に対する制限的なアプローチを生成するため、平等を追求する枠組みとして不適切である、と私は論じる。これらの欠点を踏まえて、社会階層に反するアルゴリズム的公正に対する「実質的」アプローチを提案し、不平等に対処する方法をより広範囲に分析する。この静的アプローチは、抑圧と戦うアルゴリズムの役割についてより実りある理論化を可能にする。形式的および実体的アルゴリズム的公正の区別は、各アプローチの「公正の実施可能性」(アルゴリズム的公正の数学的定義の不適合性)に対する応答によって例示される。形式的なアプローチでは、平等を高める努力に対する厳しい制限として「公正の不可能性」を受け入れる必要があるが、従属的なアプローチは、この虚偽のジレンマに従わず、社会的抑圧の状態を改善できるような改革を提案することによって、「公平の不可能性」から逃れることができる。

In the face of compounding crises of social and economic inequality, many have turned to algorithmic decision-making to achieve greater fairness in society. As these efforts intensify, reasoning within the burgeoning field of "algorithmic fairness" increasingly shapes how fairness manifests in practice. This paper interrogates whether algorithmic fairness provides the appropriate conceptual and practical tools for enhancing social equality. I argue that the dominant, "formal" approach to algorithmic fairness is ill-equipped as a framework for pursuing equality, as its narrow frame of analysis generates restrictive approaches to reform. In light of these shortcomings, I propose an alternative: a "substantive" approach to algorithmic fairness that centers opposition to social hierarchies and provides a more expansive analysis of how to address inequality. This substantive approach enables more fruitful theorizing about the role of algorithms in combatting oppression. The distinction between formal and substantive algorithmic fairness is exemplified by each approach's responses to the "impossibility of fairness" (an incompatibility between mathematical definitions of algorithmic fairness). While the formal approach requires us to accept the "impossibility of fairness" as a harsh limit on efforts to enhance equality, the substantive approach allows us to escape the "impossibility of fairness" by suggesting reforms that are not subject to this false dilemma and that are better equipped to ameliorate conditions of social oppression.

翻訳日:2021-07-13 15:42:53 公開日:2021-07-09

# モデル削減のためのガウス過程部分空間回帰

Gaussian Process Subspace Regression for Model Reduction ( http://arxiv.org/abs/2107.04668v1 )

ライセンス: Link先を確認

Ruda Zhang and Simon Mak and David Dunson

(参考訳) 部分空間値関数はパラメトリック・リダクション・オーダー・モデリング(PROM)を含む幅広い問題で発生する。 PROM では、各パラメータ点は、大きな系行列のペトロフ・ガレルキン射影に使用される部分空間に関連付けることができる。このような関数を近似する以前の取り組みは、不正確で遅い多様体上の補間を用いる。そこで我々は, ガウス過程部分空間回帰(gps)モデルという, 部分空間予測のためのベイズ非パラメトリックモデルを提案する。ユークリッド空間上の多変量ガウス分布(英語版)(multivariate gaussian distributions on the euclidean space)では、固定次元部分空間の集合であるグラスマン多様体上の合同確率モデル(英語版)(joint probability model)を誘導する。 GPSは単純な相関構造とモデル選択の原則的アプローチを採用している。その予測分布は解析形式を認め、パラメータ空間上の効率的な部分空間予測を可能にする。 PROMの場合、GPSは新しいパラメータポイントで確率的予測を提供し、局所的な縮小モデルの精度を保ち、計算の複雑さはシステム次元に依存しないため、オンライン計算に適している。本手法を部分空間補間と比較する4つの数値例と,局所還元モデルを補間する2つの方法を提案する。全体として、GPSは部分空間補間よりもデータ効率が良く、計算効率も高い。

Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM). In PROM, each parameter point can be associated with a subspace, which is used for Petrov-Galerkin projections of large system matrices. Previous efforts to approximate such functions use interpolations on manifolds, which can be inaccurate and slow. To tackle this, we propose a novel Bayesian nonparametric model for subspace prediction: the Gaussian Process Subspace regression (GPS) model. This method is extrinsic and intrinsic at the same time: with multivariate Gaussian distributions on the Euclidean space, it induces a joint probability model on the Grassmann manifold, the set of fixed-dimensional subspaces. The GPS adopts a simple yet general correlation structure, and a principled approach for model selection. Its predictive distribution admits an analytical form, which allows for efficient subspace prediction over the parameter space. For PROM, the GPS provides a probabilistic prediction at a new parameter point that retains the accuracy of local reduced models, at a computational complexity that does not depend on system dimension, and thus is suitable for online computation. We give four numerical examples to compare our method to subspace interpolation, as well as two methods that interpolate local reduced models. Overall, GPS is the most data efficient, more computationally efficient than subspace interpolation, and gives smooth predictions with uncertainty quantification.
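
Since the model's predictions live on the Grassmann manifold, a natural way to compare a predicted subspace with a reference one is through principal angles. The sketch below uses the standard construction (orthonormalize both bases, then take the singular values of their cross-product); it illustrates how subspaces are compared, not the GPS model itself, and the function names are assumptions.

```python
import numpy as np


def principal_angles(a, b):
    """Principal angles (radians) between the column spaces of a and b."""
    qa, _ = np.linalg.qr(a)  # orthonormal basis for span(a)
    qb, _ = np.linalg.qr(b)  # orthonormal basis for span(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def grassmann_distance(a, b):
    """Geodesic distance on the Grassmann manifold: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(a, b)))
```

The distance depends only on the subspaces and not on the bases chosen to represent them, which is the property any subspace-valued prediction needs to respect.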

翻訳日:2021-07-13 15:37:14 公開日:2021-07-09

# (参考訳) シナリオとVerifAIによる並列・多目的ファルシフィケーション

Parallel and Multi-Objective Falsification with Scenic and VerifAI ( http://arxiv.org/abs/2107.04164v1 )

ライセンス: CC BY 4.0

Kesav Viswanadha, Edward Kim, Francis Indaheng, Daniel J. Fremont, Sanjit A. Seshia

(参考訳) Falsificationは、自律システムのシミュレーションベースの検証のための重要なツールとして登場した。本稿では,並列性を活用し,多目的仕様までファルシフィケーションを拡張することで,サンプリングベースファルシフィケーション法のスケーラビリティを向上するシナリオ仕様言語とVerifAIツールキットの拡張について述べる。まず,Scanic のシミュレーションとサンプリング機能と VerifAI のファルシフィケーション機能の両方にインターフェースされた並列化フレームワークを提案する。次に,本アルゴリズムを拡張して,サンプリング中の多目的最適化を支援する。ルールブックの概念を用いて,逆例探索プロセスの導出に使用できる複数のメトリクスに対する優先順序を指定する。最後に、これらの拡張の利点を、シークエンス言語で書かれた包括的なベンチマークセットで評価する。

Falsification has emerged as an important tool for simulation-based verification of autonomous systems. In this paper, we present extensions to the Scenic scenario specification language and VerifAI toolkit that improve the scalability of sampling-based falsification methods by using parallelism and extend falsification to multi-objective specifications. We first present a parallelized framework that is interfaced with both the simulation and sampling capabilities of Scenic and the falsification capabilities of VerifAI, reducing the execution time bottleneck inherently present in simulation-based testing. We then present an extension of VerifAI's falsification algorithms to support multi-objective optimization during sampling, using the concept of rulebooks to specify a preference ordering over multiple metrics that can be used to guide the counterexample search process. Lastly, we evaluate the benefits of these extensions with a comprehensive set of benchmarks written in the Scenic language.

翻訳日:2021-07-13 02:27:45 公開日:2021-07-09

# (参考訳) 動作単位と表現認識のためのマルチモーダル・マルチタスク学習法

A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition ( http://arxiv.org/abs/2107.04187v1 )

ライセンス: CC BY-SA 4.0

Yue Jin, Tianqing Zheng, Chao Gao, Guoqiang Xu

(参考訳) 人間の感情分析は、人間とコンピュータの相互作用システムにとって不可欠である。ほとんどのメソッドは、Wildの設定に実用的でない制限されたシナリオで開発されます。 ABAW (Affective Behavior Analysis in-the-wild) 2021 コンテストは、この進行中の問題に対するベンチマークを提供する。本稿では,視覚情報と音声情報の両方を用いたマルチモーダル・マルチタスク学習手法を提案する。 auアノテーションと式アノテーションの両方を使用してモデルをトレーニングし、ビデオフレーム間の関連をさらに抽出するためにシーケンスモデルを適用します。検証セット上でauスコア0.712、式スコア0.477を達成する。これらの結果は, モデル性能向上における我々のアプローチの有効性を示す。

Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method by using both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.

翻訳日:2021-07-13 02:16:08 公開日:2021-07-09

# (参考訳) 屋内位置推定のための非IIDデータを用いた個人化フェデレーション学習

Personalized Federated Learning over non-IID Data for Indoor Localization ( http://arxiv.org/abs/2107.04189v1 )

ライセンス: CC BY 4.0

Peng Wu, Tales Imbiriba, Junha Park, Sunwoo Kim, Pau Closas

(参考訳) データ駆動方式によるオブジェクトの位置推定と追跡は,無線チャネル伝搬モデルの物理を特徴付けることの複雑さから,一般的な話題である。これらのモデリングアプローチでは、ユーザのプライバシを維持しつつ、モデルを正確にトレーニングするためにデータを収集する必要がある。これらの目標を協調的に達成するための魅力的なスキームは、連合学習(Federated Learning: FL)として知られている。 FLスキームの課題は、異なる領域の不均一な探索に起因する非独立同分布(non-IID)データの存在である。本稿では,近年のFLスキームを用いて,ベイズ則によって最適に融合されるパーソナライズされたモデルの集合を学習することを検討する。これにより,屋内位置推定の文脈に適したものとなる。

Localization and tracking of objects using data-driven methods is a popular topic due to the complexity in characterizing the physics of wireless channel propagation models. In these modeling approaches, data needs to be gathered to accurately train models while users' privacy is maintained. An appealing scheme to cooperatively achieve these goals is known as Federated Learning (FL). A challenge in FL schemes is the presence of non-independent and identically distributed (non-IID) data, caused by uneven exploration of different areas. In this paper, we consider the use of recent FL schemes to train a set of personalized models that are then optimally fused through Bayesian rules, which makes it appropriate in the context of indoor localization.

翻訳日:2021-07-13 02:10:53 公開日:2021-07-09

# (参考訳) テンソル処理ユニット上の畳み込みネットワークの構造化モデルプルーニング

Structured Model Pruning of Convolutional Networks on Tensor Processing Units ( http://arxiv.org/abs/2107.04191v1 )

ライセンス: CC BY 4.0

Kongtao Chen, Ken Franko, Ruoxin Sang

(参考訳) 畳み込みニューラルネットワークの展開は、高い計算能力とストレージ要件によってしばしば妨げられる。構造化モデルプルーニングは、これらの要求を緩和するための有望なアプローチである。VGG-16モデルを例として、テンソル処理ユニット(TPU)上の様々な構造化モデルプルーニング手法とデータセット(CIFAR-10およびImageNet)の精度-効率トレードオフを測定する。モデルの実際の性能を測定するため、TensorFlow2のための構造化モデルプルーニングライブラリを開発し、(マスク層を追加する代わりに)モデルをその場で修正する。特に小さなデータセット(例えばCIFAR-10)では、構造化モデルプルーニングが精度を損なうことなくモデルのメモリ使用量とTPU上での速度を大幅に改善できることを示す。

The deployment of convolutional neural networks is often hindered by high computational and storage requirements. Structured model pruning is a promising approach to alleviate these requirements. Using the VGG-16 model as an example, we measure the accuracy-efficiency trade-off for various structured model pruning methods and datasets (CIFAR-10 and ImageNet) on Tensor Processing Units (TPUs). To measure the actual performance of models, we develop a structured model pruning library for TensorFlow2 to modify models in place (instead of adding mask layers). We show that structured model pruning can significantly improve model memory usage and speed on TPUs without losing accuracy, especially for small datasets (e.g., CIFAR-10).

翻訳日:2021-07-13 02:01:30 公開日:2021-07-09

# (参考訳) 修正マルチタスク学習手法を用いた不完全ラベルによる感情認識

Emotion Recognition with Incomplete Labels Using Modified Multi-task Learning Technique ( http://arxiv.org/abs/2107.04192v1 )

ライセンス: CC BY 4.0

Phan Tran Dac Thinh, Hoang Manh Hung, Hyung-Jeong Yang, Soo-Hyung Kim, and Guee-Sang Lee

(参考訳) 人間の顔から7つの基本的な感情やアクションユニットなどの感情情報をin-the-wild(実環境)で予測するタスクは、大量の注釈付きデータセットが利用しやすくなったことにより、徐々に関心を集めている。本研究では、AffWild2データセットにおける7つの基本的な感情と12のアクションユニットの関連性を利用する手法を提案する。 ResNet50のアーキテクチャに基づく本手法は、2つのタスクの不完全なラベルに対するマルチタスク学習技術を含む。 2つの相関したタスクの知識を組み合わせることで、両方のパフォーマンスは1種類のラベルのみを使用するモデルと比較して大きなマージンで改善される。

The task of predicting affective information in the wild such as seven basic emotions or action units from human faces has gradually become more interesting due to the accessibility and availability of massive annotated datasets. In this study, we propose a method that utilizes the association between seven basic emotions and twelve action units from the AffWild2 dataset. The method based on the architecture of ResNet50 involves the multi-task learning technique for the incomplete labels of the two tasks. By combining the knowledge for two correlated tasks, both performances are improved by a large margin compared to those with the model employing only one kind of label.

翻訳日:2021-07-13 01:54:19 公開日:2021-07-09

# (参考訳) 構造制約を考慮した確率的軌道予測

Probabilistic Trajectory Prediction with Structural Constraints ( http://arxiv.org/abs/2107.04193v1 )

ライセンス: CC BY 4.0

Weiming Zhi, Lionel Ott, Fabio Ramos

(参考訳) 本研究は,環境中の動的物体の運動軌跡を予測する問題に対処する。最近の動きパターン予測の進歩は、しばしば観測された軌道から動きパターンを外挿する機械学習技術に依存しており、既知の規則を直接組み込むメカニズムはない。本稿では,確率的学習と制約付き軌道最適化を組み合わせた新しい枠組みを提案する。我々のフレームワークの学習コンポーネントは、過去の観測座標に条件付けられた将来の運動軌跡の分布を提供する。この分布は、軌道分布に機会制約(chance constraints)を課す制約付き最適化問題の事前分布として用いられる。この結果、事前分布によく似た、制約に従う軌道分布が得られる。特に,外挿された将来の軌道分布が環境構造に適合するように,衝突の制約に焦点をあてる。実世界とシミュレーションのデータセットにおいて,本フレームワークが運動データに対する複雑な確率的運動軌跡を学習できること,および制約を直接課すことで一般化性が向上し,より堅牢で高品質な軌道分布が得られることを実証的に示す。

This work addresses the problem of predicting the motion trajectories of dynamic objects in the environment. Recent advances in predicting motion patterns often rely on machine learning techniques to extrapolate motion patterns from observed trajectories, with no mechanism to directly incorporate known rules. We propose a novel framework, which combines probabilistic learning and constrained trajectory optimisation. The learning component of our framework provides a distribution over future motion trajectories conditioned on observed past coordinates. This distribution is then used as a prior to a constrained optimisation problem which enforces chance constraints on the trajectory distribution. This results in constraint-compliant trajectory distributions which closely resemble the prior. In particular, we focus our investigation on collision constraints, such that extrapolated future trajectory distributions conform to the environment structure. We empirically demonstrate on real-world and simulated datasets the ability of our framework to learn complex probabilistic motion trajectories for motion data, while directly enforcing constraints to improve generalisability, producing more robust and higher quality trajectory distributions.

翻訳日:2021-07-13 01:49:26 公開日:2021-07-09

# (参考訳) 直感的ユーザ入力からの深層画像合成 : レビューと展望

Deep Image Synthesis from Intuitive User Input: A Review and Perspectives ( http://arxiv.org/abs/2107.04240v1 )

ライセンス: CC BY 4.0

Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang

(参考訳) コンピュータグラフィックス、アート、デザインの多くの応用において、ユーザがテキスト、スケッチ、ストローク、グラフ、レイアウトといった直感的な非画像入力を提供し、コンピュータシステムが入力内容に準拠したフォトリアリスティックな画像を自動的に生成することが望ましい。このような自動画像コンテンツ生成を可能にする古典的な研究は、画像検索と合成の枠組みを踏襲しているが、GAN(generative adversarial network)、VAE(variational autoencoder)、フローベース手法などの深層生成モデルの進歩により、より強力で汎用的な画像生成タスクが実現されている。本稿では,直感的なユーザ入力による画像合成に関する最近の研究をレビューし,入力の汎用性,画像生成手法,ベンチマークデータセット,評価指標の進歩を取り上げる。このことは、入力表現と対話性、主要な画像生成パラダイム間の相互交流、および生成手法の評価と比較に関する新しい視点を動機付けている。

In many applications of computer graphics, art and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically generate photo-realistic images that adhere to the input content. While classic works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation tasks. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross pollination between major image generation paradigms, and evaluation and comparison of generation methods.

翻訳日:2021-07-13 01:35:53 公開日:2021-07-09

# (参考訳) WinoCNN:FPGA上での効率的な畳み込みニューラルネットワーク高速化のためのカーネル共有Winograd Systolic Array

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs ( http://arxiv.org/abs/2107.04244v1 )

ライセンス: CC0 1.0

Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen

(参考訳) Winogradのアルゴリズムとシストリックアレイアーキテクチャの組み合わせにより、FPGAプラットフォーム上での畳み込みニューラルネットワーク(CNN)の高速化において、DSP効率を改善する能力が実証された。しかし、FPGAベースのWinograd処理要素で任意の畳み込みカーネルサイズを扱い、効率的なデータアクセスをサポートすることは十分に研究されていない。本研究では,同じ計算資源で複数の畳み込みカーネルサイズを自然にサポートし,高い実行時DSP効率を維持できる,最適化されたWinograd処理要素(WinoPE)を初めて提案する。提案したWinoPEを用いて,WinoCNNと呼ばれる高効率なシストリックアレイアクセラレータを構築する。また,データアクセスを最適化する専用メモリサブシステムを提案する。アクセラレータアーキテクチャに基づいて,リソース制約の異なる最適なアクセラレータ構成を探索するために,正確なリソースとパフォーマンスのモデリングを構築する。提案するアクセラレータを複数のFPGA上で実装し、スループットとDSP効率の両方で最先端の設計を上回る。 Xilinx ZCU102 FPGA で最大 1.33 GOPS/DSP の DSP 効率と最大 3.1 TOPS のスループットを達成する。これらはそれぞれ、これまでに報告された最良の設計よりも29.1%と20.0%優れている。

The combination of Winograd's algorithm and systolic array architecture has demonstrated the capability of improving DSP efficiency in accelerating convolutional neural networks (CNNs) on FPGA platforms. However, handling arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and supporting efficient data access remain underexplored. In this work, we are the first to propose an optimized Winograd processing element (WinoPE), which can naturally support multiple convolution kernel sizes with the same amount of computing resources and maintains high runtime DSP efficiency. Using the proposed WinoPE, we construct a highly efficient systolic array accelerator, termed WinoCNN. We also propose a dedicated memory subsystem to optimize the data access. Based on the accelerator architecture, we build accurate resource and performance modeling to explore optimal accelerator configurations under different resource constraints. We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency. Our implementation achieves DSP efficiency up to 1.33 GOPS/DSP and throughput up to 3.1 TOPS with the Xilinx ZCU102 FPGA. These are 29.1% and 20.0% better than the best solutions reported previously, respectively.
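
As background for the abstract above, here is a minimal sketch of the textbook one-dimensional Winograd minimal filtering algorithm F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is not the WinoPE design from the paper; the function names `winograd_f23` and `direct_f23` are hypothetical and chosen only for illustration.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-sample tile, using 4 multiplications.

    Textbook F(2,3) only; this is an illustrative assumption, not the paper's WinoPE.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per kernel).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform followed by the four element-wise products.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same two outputs computed directly with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Comparing `winograd_f23` against `direct_f23` on the same tile is a quick way to check the transforms; the saving in multiplications is what makes the approach attractive when multipliers (DSP slices) are the scarce resource.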

翻訳日:2021-07-13 01:00:40 公開日:2021-07-09

# (参考訳) ハイブリッド自動微分を用いた微分プライベート機械学習における感度解析

Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation ( http://arxiv.org/abs/2107.04265v1 )

ライセンス: CC BY 4.0

Alexander Ziller, Dmitrii Usynin, Moritz Knolle, Kritika Prakash, Andrew Trask, Rickmer Braren, Marcus Makowski, Daniel Rueckert, Georgios Kaissis

(参考訳) 近年,機械学習(ML)などのデータ駆動タスクに展開可能な,差分プライバシー(DP)などの形式的なプライバシ保護手法が出現している。大規模MLと、個人のプライバシ損失の原則的な分析に必要なクローズドフォーム推論とを両立させるには、自動感度分析のための新しいツール、および計算の流れを通じて個人のデータとその特徴を追跡するための新しいツールの導入が必要である。そこで,本研究では,逆モードADの効率と、計算グラフ内の任意の量に対してクローズドフォーム式を得る能力を組み合わせた,新しいハイブリッド自動微分(AD)システムを提案する。これにより、プライベートデータ上でのニューラルネットワークのトレーニングなど、任意の微分可能な関数合成の感度をモデル化できる。統計的データベースクエリの個々のDP保証を分析することで、我々のアプローチを実証する。さらに,本手法のDPニューラルネットワークのトレーニングへの応用について検討する。当社のアプローチは,データ処理設定におけるプライバシ損失の原則的推論を可能にし,自動感度分析とプライバシー予算システムの開発を促進する。

In recent years, formal methods of privacy protection such as differential privacy (DP), capable of deployment to data-driven tasks such as machine learning (ML), have emerged. Reconciling large-scale ML with the closed-form reasoning required for the principled analysis of individual privacy loss requires the introduction of new tools for automatic sensitivity analysis and for tracking an individual's data and their features through the flow of computation. For this purpose, we introduce a novel hybrid automatic differentiation (AD) system which combines the efficiency of reverse-mode AD with an ability to obtain a closed-form expression for any given quantity in the computational graph. This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data. We demonstrate our approach by analysing the individual DP guarantees of statistical database queries. Moreover, we investigate the application of our technique to the training of DP neural networks. Our approach can enable the principled reasoning about privacy loss in the setting of data processing, and further the development of automatic sensitivity analysis and privacy budgeting systems.

翻訳日:2021-07-13 00:28:31 公開日:2021-07-09

# (参考訳) fedadapt: フェデレーション学習におけるiotデバイスの適応オフロード

FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning ( http://arxiv.org/abs/2107.04271v1 )

ライセンス: CC BY 4.0

Di Wu and Rehmat Ullah and Paul Harvey and Peter Kilpatrick and Ivor Spence and Blesson Varghese

(参考訳) Internet-of-Thingsデバイスにフェデレートラーニング(FL)を適用することは、デバイスが生成する大量のデータと、データプライバシに関する懸念の高まりから必要とされている。しかし、FLを効率的にするためには、3つの課題に対処する必要がある: (i) 限られた計算能力を持つデバイス上で実行し、 (ii) デバイスの計算的不均一性に起因するストラグラーを考慮し、 (iii) ネットワーク帯域幅の変化に適応する。本稿では、上記の課題を軽減するための適応型オフロードFLフレームワークであるFedAdaptを提案する。 FedAdaptは、ディープニューラルネットワーク(DNN)の層をサーバにオフロードすることで、計算制約のあるデバイスのローカルトレーニングを加速する。さらに、FedAdaptは強化学習に基づく最適化とクラスタリングを採用し、各デバイスについてDNNのどの層をサーバにオフロードすべきかを適応的に識別し、計算の不均一性やネットワーク帯域幅の変化といった課題に取り組む。 5つのIoTデバイスからなる実験室ベースのテストベッドで実験を行った。デバイスからサーバにDNNをオフロードすることで、FedAdaptは従来のFLに比べて、一般的なIoTデバイスのトレーニング時間を半分以上短縮する。極端なストラグラーのトレーニング時間と全体のトレーニング時間は最大57%削減できる。さらに、ネットワーク帯域幅が変化する場合でも、FedAdaptは精度を犠牲にすることなく、従来のFLと比較してトレーニング時間を最大40%短縮する。 FedAdaptはhttps://github.com/qub-blesson/FedAdaptからダウンロードできる。

Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns of data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execute on devices with limited computational capabilities, (ii) account for stragglers due to computational heterogeneity of devices, and (iii) adapt to the changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement learning-based optimization and clustering to adaptively identify which layers of the DNN should be offloaded for each individual device on to a server to tackle the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed comprising five IoT devices. By offloading a DNN from the device to the server FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% when compared to classic FL, without sacrificing accuracy. FedAdapt can be downloaded from https://github.com/qub-blesson/FedAdapt.

翻訳日:2021-07-13 00:17:38 公開日:2021-07-09

# (参考訳) LIFE: 3D OCT-A 血管セグメンテーションのための汎用的な独学型パイプライン

LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation ( http://arxiv.org/abs/2107.04282v1 )

ライセンス: CC BY 4.0

Dewei Hu, Can Cui, Hao Li, Kathleen E. Larson, Yuankai K. Tao and Ipek Oguz

(参考訳) 光コヒーレンス断層撮影(OCT)は、眼科領域で広く用いられている非侵襲的イメージング技術である。 OCTはOCTアンギオグラフィー(OCT-A)に拡張でき,網膜血管をより高いコントラストで描出する。近年の深層学習アルゴリズムは血管セグメンテーションに有望な結果をもたらすが,手動で注釈付けされた訓練データがないため,3次元網膜血管セグメンテーションは依然として困難である。本研究では,局所強度融合(LIF)と呼ばれる自己合成モダリティのみを教師とする学習ベースの手法を提案する。 LIFは、入力OCT-Aから直接計算される毛細血管強調ボリュームである。次に、局所強度融合エンコーダ(LIFE)を構築し、与えられたOCT-Aボリュームとそれに対応するLIFを共有潜在空間にマップする。 LIFEの潜在空間は入力データと同じ次元を持ち、両方のモダリティに共通する特徴を含む。この潜在空間を二値化することにより、ボリュームの血管セグメンテーションが得られる。本手法は、手動ラベル付きのヒト中心窩OCT-Aボリューム1個とゼブラフィッシュOCT-Aボリューム3個で評価した。Diceスコアはヒトのデータで0.7736、ゼブラフィッシュのデータで0.8594 +/- 0.0275であり、既存の教師なしアルゴリズムに対する劇的な改善である。

Optical coherence tomography (OCT) is a non-invasive imaging technique widely used for ophthalmology. It can be extended to OCT angiography (OCT-A), which reveals the retinal vasculature with improved contrast. Recent deep learning algorithms produced promising vascular segmentation results; however, 3D retinal vessel segmentation remains difficult due to the lack of manually annotated training data. We propose a learning-based method that is only supervised by a self-synthesized modality named local intensity fusion (LIF). LIF is a capillary-enhanced volume computed directly from the input OCT-A. We then construct the local intensity fusion encoder (LIFE) to map a given OCT-A volume and its LIF counterpart to a shared latent space. The latent space of LIFE has the same dimensions as the input data and it contains features common to both modalities. By binarizing this latent space, we obtain a volumetric vessel segmentation. Our method is evaluated in a human fovea OCT-A and three zebrafish OCT-A volumes with manual labels. It yields a Dice score of 0.7736 on human data and 0.8594 +/- 0.0275 on zebrafish data, a dramatic improvement over existing unsupervised algorithms.
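
The abstract above reports Dice scores on a binarized latent volume. A minimal sketch of how such a score can be computed on flattened binary masks follows; the helper names `binarize` and `dice_score` and the threshold of 0.5 are assumptions for illustration, not details taken from the paper.

```python
def binarize(volume, threshold=0.5):
    """Turn a flattened volume of real values into a 0/1 mask (threshold is an assumption)."""
    return [1 if v >= threshold else 0 for v in volume]


def dice_score(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1.0 means the predicted and manual masks coincide exactly, and 0.0 means they share no foreground voxel.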

翻訳日:2021-07-12 23:55:21 公開日:2021-07-09

# (参考訳) Pseudo-Multimodal Fusion Network を用いた網膜OCTのノイズ除去

Retinal OCT Denoising with Pseudo-Multimodal Fusion Network ( http://arxiv.org/abs/2107.04288v1 )

ライセンス: CC BY 4.0

Dewei Hu, Joseph D. Malone, Yigit Atay, Yuankai K. Tao and Ipek Oguz

(参考訳) 光コヒーレンストモグラフィー(OCT)は、網膜の一般的なイメージング技術である。しかし、血管や組織層を含む重要な解剖学的構造の可視性を低下させうる乗法的なスペックルノイズの影響を受ける。繰り返し取得したBスキャンフレームの平均化はSNR(Signal-to-noise-ratio)を大幅に改善するが、より長い取得時間を必要とするため、モーションアーティファクトの導入や患者への不快感を引き起こす可能性がある。本研究では,単一フレームのノイズの多いBスキャンの情報と、self-fusion法を用いて作成した擬似モダリティの情報を利用する学習ベースの手法を提案する。擬似モダリティは、ノイズの多いBスキャンではほとんど認識できない層に対して良好なSNRを提供するが、小血管のような細かな特徴を過度に平滑化することがある。融合ネットワークを利用することで、各モダリティから望ましい特徴を組み合わせることができ、その寄与の重みを調整できる。強度ベースの指標および構造的指標を用いて評価した結果,本手法はスペックルノイズを効果的に抑制し,網膜層間のコントラストを増強しつつ,全体の構造と小血管を保存できることがわかった。本手法は, 単一モダリティのネットワークと比較して,低ノイズBスキャンとの構造的類似性を0.559 +/- 0.033から0.576 +/- 0.031に改善する。

Optical coherence tomography (OCT) is a prevalent imaging technique for the retina. However, it is affected by multiplicative speckle noise that can degrade the visibility of essential anatomical structures, including blood vessels and tissue layers. Although averaging repeated B-scan frames can significantly improve the signal-to-noise-ratio (SNR), this requires longer acquisition time, which can introduce motion artifacts and cause discomfort to patients. In this study, we propose a learning-based method that exploits information from the single-frame noisy B-scan and a pseudo-modality that is created with the aid of the self-fusion method. The pseudo-modality provides good SNR for layers that are barely perceptible in the noisy B-scan but can over-smooth fine features such as small vessels. By using a fusion network, desired features from each modality can be combined, and the weight of their contribution is adjustable. Evaluated by intensity-based and structural metrics, the result shows that our method can effectively suppress the speckle noise and enhance the contrast between retina layers while the overall structure and small blood vessels are preserved. Compared to the single modality network, our method improves the structural similarity with low noise B-scan from 0.559 +/- 0.033 to 0.576 +/- 0.031.

翻訳日:2021-07-12 23:45:20 公開日:2021-07-09

# (参考訳) ポイントワイズ解析における最遠点サンプリングを超えて

Beyond Farthest Point Sampling in Point-Wise Analysis ( http://arxiv.org/abs/2107.04291v1 )

ライセンス: CC BY 4.0

Yiqun Lin, Lichang Chen, Haibin Huang, Chongyang Ma, Xiaoguang Han and Shuguang Cui

(参考訳) サンプリング、グルーピング、アグリゲーションはポイントクラウドのマルチスケール分析において3つの重要なコンポーネントである。本稿では,ポイントワイズ分析タスクのための新しいデータ駆動型サンプラー学習戦略を提案する。広く使われているサンプリング手法であるFarthest Point Sampling (FPS) とは異なり,サンプリングと下流アプリケーションを同時に学習することを提案する。我々の重要な洞察は、FPSのような一様サンプリング手法が必ずしも異なるタスクに対して最適であるとは限らないことである。例えば,境界領域の周辺でより多くの点をサンプリングすれば,セグメンテーションにおける点ごとの分類が容易になる。この目的のために,タスク関連の正解情報を教師としてサンプリング点の変位を学習し,基礎となるタスクと同時に訓練できる新しいサンプラー学習戦略を提案する。さらに,本手法を意味的部分分割,ポイントクラウド補完,キーポイント検出など,様々なポイントワイズ解析アーキテクチャで実証する。実験の結果, サンプラーとタスクの同時学習は,従来のベースライン法に比べて著しい改善をもたらすことがわかった。

Sampling, grouping, and aggregation are three important components in the multi-scale analysis of point clouds. In this paper, we present a novel data-driven sampler learning strategy for point-wise analysis tasks. Unlike the widely used sampling technique, Farthest Point Sampling (FPS), we propose to learn sampling and downstream applications jointly. Our key insight is that uniform sampling methods like FPS are not always optimal for different tasks: sampling more points around boundary areas can make the point-wise classification easier for segmentation. To this end, we propose a novel sampler learning strategy that learns sampling point displacement supervised by task-related ground truth information and can be trained jointly with the underlying tasks. We further demonstrate our methods in various point-wise analysis architectures, including semantic part segmentation, point cloud completion, and keypoint detection. Our experiments show that joint learning of the sampler and task brings remarkable improvement over previous baseline methods.
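
For readers unfamiliar with the baseline named in the abstract above, here is a minimal sketch of plain Farthest Point Sampling (FPS), the method the paper proposes to go beyond. It is not the learned sampler from the paper; the function name `farthest_point_sampling` and the choice of starting index are assumptions for illustration.

```python
def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from everything picked so far.

    `points` is a list of equal-length coordinate tuples; returns k indices.
    The starting index is an arbitrary assumption (implementations often pick it at random).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [start]
    # Squared distance from each point to its nearest selected point.
    min_d = [sq_dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_d.__getitem__)
        selected.append(nxt)
        min_d = [min(m, sq_dist(p, points[nxt])) for m, p in zip(min_d, points)]
    return selected
```

Because each step depends only on geometry, the selection spreads roughly uniformly over the shape regardless of the downstream task, which is the limitation the abstract points to.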

翻訳日:2021-07-12 23:37:47 公開日:2021-07-09

# (参考訳) 較正された予測不確実性のためのランジュバン動力学を用いたニューラルネットワークの差分プライベートトレーニング

Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty ( http://arxiv.org/abs/2107.04296v1 )

ライセンス: CC BY 4.0

Moritz Knolle, Alexander Ziller, Dmitrii Usynin, Rickmer Braren, Marcus R. Makowski, Daniel Rueckert, Georgios Kaissis

(参考訳) 差分プライベートな確率的勾配降下法(DP-SGD)は、較正が不十分で過信した深層学習モデルをもたらす可能性があることを示す。これは、例えば医学診断のような安全クリティカルなアプリケーションにとって深刻な問題である。我々は,ディープニューラルネットワークのトレーニングのためのスケーラブルなベイズ推論手法である確率勾配ランジュバン動力学とDP-SGDの類似性を強調・活用し,元の(DP-SGD)アルゴリズムへのわずかな調整で差分プライベートなベイズニューラルネットワークをトレーニングする。我々のアプローチはDP-SGDよりもかなり信頼性の高い不確実性推定を提供し、これは期待較正誤差の低減(MNISTで $\sim{5}$ 倍、小児肺炎データセットで $\sim{2}$ 倍)によって実証的に示された。

We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models. This represents a serious issue for safety-critical applications, e.g. in medical diagnosis. We highlight and exploit parallels between stochastic gradient Langevin dynamics, a scalable Bayesian inference technique for training deep neural networks, and DP-SGD, in order to train differentially private, Bayesian neural networks with minor adjustments to the original (DP-SGD) algorithm. Our approach provides considerably more reliable uncertainty estimates than DP-SGD, as demonstrated empirically by a reduction in expected calibration error (MNIST $\sim{5}$-fold, Pediatric Pneumonia Dataset $\sim{2}$-fold).
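
The abstract above measures calibration with the expected calibration error (ECE). A minimal sketch of the commonly used binned estimator is given below; the equal-width binning, the default of 10 bins, and the function name `expected_calibration_error` are assumptions for illustration and may differ from the paper's exact evaluation setup.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|.

    `confidences` are predicted probabilities in [0, 1] for the chosen class;
    `correct` holds 1 for a correct prediction and 0 otherwise.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

An overconfident model has mean confidence above accuracy in most bins, which raises this value; a well-calibrated model drives it towards zero.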

翻訳日:2021-07-12 23:17:44 公開日:2021-07-09

# (参考訳) 重力波サーロゲートモデリングのためのオートエンコーダ駆動スパイラル表現学習

Autoencoder-driven Spiral Representation Learning for Gravitational Wave Surrogate Modelling ( http://arxiv.org/abs/2107.04312v1 )

ライセンス: CC BY 4.0

Paraskevi Nousi, Styliani-Christina Fragkouli, Nikolaos Passalis, Panagiotis Iosif, Theocharis Apostolatos, George Pappas, Nikolaos Stergioulas, Anastasios Tefas

(参考訳) 近年, 人工ニューラルネットワークは重力波天文学の分野で勢いを増している。例えば, 連星ブラックホールのインスパイラルと合体に対する計算コストの高い波形モデルのサーロゲートモデリングなどである。サーロゲートモデリングは重力波の高速かつ正確な近似をもたらし、ニューラルネットワークは、トレーニングサンプル外の任意の波形に対するサーロゲートモデルの係数を補間する最終段階で使用されてきた。オートエンコーダを用いて、経験的補間係数における基礎構造の存在について検討する。係数空間が2次元のみに圧縮されると、スパイラル構造が現れ、スパイラル角は質量比と線形に関係していることを示す。この発見に基づいて、学習可能なパラメータを持つスパイラルモジュールを設計し、入力空間を係数にマッピングすることを学習するニューラルネットワークの第1層として使用する。スパイラルモジュールは複数のニューラルネットワークアーキテクチャ上で評価され、一貫してベースラインモデルよりも優れた速度と精度のトレードオフを達成する。徹底的な実験的研究を行い、最終的な結果として、デスクトップGPU上で1ms以下の1回のフォワードパスで数百万の入力パラメータを評価できるサーロゲートモデルを得た。生成された波形と正解波形とのミスマッチも、比較したベースライン手法より優れている。我々は、スピンを持つ連星ブラックホールの場合にも、類似の基礎構造とそれに対応する計算上の利得が存在すると予想する。

Recently, artificial neural networks have been gaining momentum in the field of gravitational wave astronomy, for example in surrogate modelling of computationally expensive waveform models for binary black hole inspiral and merger. Surrogate modelling yields fast and accurate approximations of gravitational waves and neural networks have been used in the final step of interpolating the coefficients of the surrogate model for arbitrary waveforms outside the training sample. We investigate the existence of underlying structures in the empirical interpolation coefficients using autoencoders. We demonstrate that when the coefficient space is compressed to only two dimensions, a spiral structure appears, wherein the spiral angle is linearly related to the mass ratio. Based on this finding, we design a spiral module with learnable parameters, that is used as the first layer in a neural network, which learns to map the input space to the coefficients. The spiral module is evaluated on multiple neural network architectures and consistently achieves better speed-accuracy trade-off than baseline models. A thorough experimental study is conducted and the final result is a surrogate model which can evaluate millions of input parameters in a single forward pass in under 1ms on a desktop GPU, while the mismatch between the corresponding generated waveforms and the ground-truth waveforms is better than the compared baseline methods. We anticipate the existence of analogous underlying structures and corresponding computational gains also in the case of spinning black hole binaries.

翻訳日:2021-07-12 23:07:53 公開日:2021-07-09

# (参考訳) ハーベスタ・リモートセンシング・環境データを用いたノルウェートウヒ林分における根株腐朽体積の予測

Prediction of butt rot volume in Norway spruce forest stands using harvester, remotely sensed and environmental data ( http://arxiv.org/abs/2107.04316v1 )

ライセンス: CC BY 4.0

Janne R\"aty, Johannes Breidenbach, Marius Hauglin, Rasmus Astrup

(参考訳) ノルウェートウヒ(Picea abies [L.] Karst.)に関連する根株腐朽(butt rot: BR)被害は、北半球全体の木材生産においてかなりの経済的損失をもたらしている。BR被害に関する情報は森林管理の最適な意思決定に不可欠であるが、BR被害の地図は森林情報システムでは一般的に欠落している。ノルウェーにおいて, BRにより損傷を受けた木材の体積を林分レベルで,186,026本の幹(皆伐)のハーベスタ情報, リモートセンシングデータ, 環境データ(気候や地形の特徴など)を用いて予測した。本研究では,(1)収穫後に利用可能な予測変数(理論ケース)と(2)収穫前に利用可能な予測変数(マッピングケース)の2種類の予測変数セットを持つランダムフォレスト(RF)モデルを用いた。リモートセンシングによる樹高, 収穫木材体積, 胸高における二乗平均直径など, 森林の成熟度を特徴付ける森林属性が, 最も重要な予測変数に含まれることがわかった。航空機レーザースキャンデータとSentinel-2画像から得られたリモートセンシング予測変数は,環境変数よりも重要であった。理論ケースでは,林分単位のleave-stand-out交差検証で 11.4 $m^3ha^{-1}$ のRMSE(擬似 $R^2$: 0.66)が得られたのに対し、マッピングケースでは擬似 $R^2$ は0.60となった。収穫された林分の空間的に異なるk-meansクラスタを交差検証の単位とした場合, マッピングケースのRMSE値と擬似 $R^2$ は, それぞれ15.6 $m^3ha^{-1}$ と0.37であった。このことは, BR被害のマッピングにおいて満足のいく誤差率を得る上で, 空間的に近い林分のBR状態に関する知識が非常に重要であることを示している。

Butt rot (BR) damages associated with Norway spruce (Picea abies [L.] Karst.) account for considerable economic losses in timber production across the northern hemisphere. While information on BR damages is critical for optimal decision-making in forest management, the maps of BR damages are typically lacking in forest information systems. We predicted timber volume damaged by BR at the stand-level in Norway using harvester information of 186,026 stems (clear-cuts), remotely sensed, and environmental data (e.g. climate and terrain characteristics). We utilized random forest (RF) models with two sets of predictor variables: (1) predictor variables available after harvest (theoretical case) and (2) predictor variables available prior to harvest (mapping case). We found that forest attributes characterizing the maturity of forest, such as remote sensing-based height, harvested timber volume and quadratic mean diameter at breast height, were among the most important predictor variables. Remotely sensed predictor variables obtained from airborne laser scanning data and Sentinel-2 imagery were more important than the environmental variables. The theoretical case with a leave-stand-out cross-validation achieved an RMSE of 11.4 $m^3ha^{-1}$ (pseudo $R^2$: 0.66) whereas the mapping case resulted in a pseudo $R^2$ of 0.60. When the spatially distinct k-means clusters of harvested forest stands were used as units in the cross-validation, the RMSE value and pseudo $R^2$ associated with the mapping case were 15.6 $m^3ha^{-1}$ and 0.37, respectively. This indicates that the knowledge about the BR status of spatially close stands is of high importance for obtaining satisfactory error rates in the mapping of BR damages.
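
The abstract above summarizes model quality with RMSE and a pseudo $R^2$. A minimal sketch of both metrics follows, assuming the pseudo $R^2$ is defined as one minus the ratio of the residual sum of squares to the total sum of squares; the abstract does not state the definition, so this formula and the function names are assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def pseudo_r2(observed, predicted):
    """Assumed definition: 1 - SSE / SST, with SST taken around the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values are constant; pseudo R^2 is undefined")
    return 1.0 - sse / sst
```

In a cross-validation setting such as the one described, both metrics would be computed on the held-out stands (or held-out spatial clusters) only, so that spatially close training stands cannot leak information into the score.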

翻訳日:2021-07-12 22:20:54 公開日:2021-07-09

# (参考訳) IDRLnet: 物理情報に基づくニューラルネットワークライブラリ

IDRLnet: A Physics-Informed Neural Network Library ( http://arxiv.org/abs/2107.04320v1 )

ライセンス: CC BY 4.0

Wei Peng, Jun Zhang, Weien Zhou, Xiaoyu Zhao, Wen Yao, Xiaoqian Chen

(参考訳) 物理情報ニューラルネットワーク(Physics Informed Neural Network: PINN)は、偏微分方程式(PDE)によってモデル化された順問題および逆問題の両方を解くための科学計算フレームワークである。本稿では,PINNによる問題のモデル化と解法を体系的に行うためのPythonツールボックスであるIDRLnetを紹介する。 IDRLnetは、幅広いPINNアルゴリズムとアプリケーションのためのフレームワークを構築している。幾何学的オブジェクト、データソース、人工ニューラルネットワーク、損失メトリクス、オプティマイザをPython内に組み込む構造化された方法を提供する。さらに、ノイズを含む逆問題、変分最小化、積分微分方程式を解く機能を提供する。新しいPINNの派生手法は容易にフレームワークに統合できる。ソースコード、チュートリアル、ドキュメントは https://github.com/idrl-lab/idrlnet で入手できる。

Physics Informed Neural Network (PINN) is a scientific computing framework used to solve both forward and inverse problems modeled by Partial Differential Equations (PDEs). This paper introduces IDRLnet, a Python toolbox for modeling and solving problems through PINN systematically. IDRLnet constructs the framework for a wide range of PINN algorithms and applications. It provides a structured way to incorporate geometric objects, data sources, artificial neural networks, loss metrics, and optimizers within Python. Furthermore, it provides functionality to solve noisy inverse problems, variational minimization, and integral differential equations. New PINN variants can be integrated into the framework easily. Source code, tutorials, and documentation are available at https://github.com/idrl-lab/idrlnet.

翻訳日:2021-07-12 22:09:27 公開日:2021-07-09

# (参考訳) 相互認識サブグラフによる微分可能アーキテクチャ探索

Mutually-aware Sub-Graphs Differentiable Architecture Search ( http://arxiv.org/abs/2107.04324v1 )

ライセンス: CC BY 4.0

Haoxian Tan, Sheng Guo, Yujie Zhong, Weilin Huang

(参考訳) 微分可能アーキテクチャ探索は、そのシンプルさと効率性のためにNASの分野で広く用いられており、マルチパスアルゴリズムとシングルパス手法の2つのパラダイムが主流である。マルチパスフレームワーク(例: DARTS)は直感的だが、メモリ使用量とトレーニングの崩壊に悩まされている。シングルパス法(GDASやProxylessNASなど)はメモリ問題を緩和し、探索と評価のギャップを縮めるが性能を犠牲にする。本稿では,これら2つのパラダイムを橋渡しする,概念的に単純かつ効率的な手法を提案し,これをMutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS) と呼ぶ。フレームワークのコアは、複数の相互排他的なシングルパスサブグラフを生成する微分可能なGumbel-TopKサンプラーである。複数サブグラフの設定によってより深刻になるスキップ接続の問題を軽減するため,最適化を安定化するDropblock-Identityモジュールを提案する。利用可能なモデル(スーパーネットとサブグラフ)を最大限に活用するために、トレーニングを改善するメモリ効率の高いスーパーネット誘導蒸留を導入する。提案するフレームワークは、柔軟なメモリ使用量と探索品質のバランスをとる。ImageNetとCIFAR10において提案手法の有効性を実証し,探索されたモデルは最新の手法に匹敵する性能を示す。

Differentiable architecture search is prevalent in the field of NAS because of its simplicity and efficiency, where two paradigms, multi-path algorithms and single-path methods, dominate. Multi-path frameworks (e.g. DARTS) are intuitive but suffer from high memory usage and training collapse. Single-path methods (e.g. GDAS and ProxylessNAS) mitigate the memory issue and shrink the gap between searching and evaluation but sacrifice performance. In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS). The core of our framework is a differentiable Gumbel-TopK sampler that produces multiple mutually exclusive single-path sub-graphs. To alleviate the more severe skip-connect issue brought by the multiple sub-graphs setting, we propose a Dropblock-Identity module to stabilize the optimization. To make best use of the available models (super-net and sub-graphs), we introduce a memory-efficient super-net guidance distillation to improve training. The proposed framework strikes a balance between flexible memory usage and searching quality. We demonstrate the effectiveness of our methods on ImageNet and CIFAR10, where the searched models show a comparable performance as the most recent approaches.
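
The sampler named in the abstract above builds on the Gumbel top-k trick: perturb each log-weight with independent Gumbel noise and keep the k largest, which draws k distinct items without replacement. The sketch below shows only this basic, non-differentiable trick; the differentiable relaxation used by MSG-DAS is not reproduced here, and the function name `gumbel_top_k` is an assumption for illustration.

```python
import math
import random


def gumbel_top_k(logits, k, rng=None):
    """Sample k distinct indices by adding Gumbel(0, 1) noise to logits and taking the top k.

    Plain (non-differentiable) version for illustration only.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of candidates")
    rng = rng or random.Random()
    perturbed = []
    for index, logit in enumerate(logits):
        # Guard against u == 0.0, for which log(u) is undefined.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]
```

Since the k indices are distinct by construction, each one can select a different candidate operation, which matches the idea of mutually exclusive single-path sub-graphs described in the abstract.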

翻訳日:2021-07-12 22:06:13 公開日:2021-07-09

# (参考訳) Attend2Pack: 注意機構を用いた深層強化学習によるビンパッキング

Attend2Pack: Bin Packing through Deep Reinforcement Learning with Attention ( http://arxiv.org/abs/2107.04333v1 )

ライセンス: CC0 1.0

Jingwei Zhang, Bin Zi, Xiaoyu Ge

(参考訳) 本稿では,学習の観点からビンパッキング問題(BPP)に取り組むことを目的とする。自己注意に基づく符号化と深層強化学習アルゴリズムに基づいて,本課題に対する新たなエンドツーエンド学習モデルを提案する。組合せ的な行動空間を分解し、また、オンポリシー学習を高速化する一般的な手法である優先オーバーサンプリング(prioritized oversampling)と呼ばれる新しい訓練手法を利用することで、様々な実験設定において最先端のパフォーマンスを実現する。さらに,提案手法attend2packはオフラインBPPを対象としているが,本手法を厳密なオンラインBPPの設定に絞り込んだ場合にも最先端の性能を達成できる。一連のアブレーション研究と、広範な先行研究との比較により、この研究分野への有効なベースラインアプローチを提供したい。

This paper seeks to tackle the bin packing problem (BPP) through a learning perspective. Building on self-attention-based encoding and deep reinforcement learning algorithms, we propose a new end-to-end learning model for this task of interest. By decomposing the combinatorial action space, as well as utilizing a new training technique denoted as prioritized oversampling, which is a general scheme to speed up on-policy learning, we achieve state-of-the-art performance in a range of experimental settings. Moreover, although the proposed approach attend2pack targets offline-BPP, we strip our method down to the strict online-BPP setting where it is also able to achieve state-of-the-art performance. With a set of ablation studies as well as comparisons against a range of previous works, we hope to offer a valid baseline approach to this field of study.

翻訳日:2021-07-12 21:52:43 公開日:2021-07-09

# (参考訳) 変数変換公式の一般化と残差フローへの応用

Generalization of the Change of Variables Formula with Applications to Residual Flows ( http://arxiv.org/abs/2107.04346v1 )

ライセンス: CC BY 4.0

Niklas Koenen, Marvin N. Wright, Peter Maaß and Jens Behrmann

(参考訳) 正規化フローは変数変換公式 (CVF) を利用して柔軟な密度モデルを定義する。しかし、CVFにおける滑らかな変換(微分同相写像)の要求は、これらのモデルの構築において大きな課題となる。フローの設計空間を拡大するために、ルベーグ測度ゼロの集合上ではこれらの要求を満たさなくてもよい一般化変換として $\mathcal{L}$-diffeomorphisms を導入する。この緩和により、例えば ReLU のような非滑らかな活性化関数の使用が可能になる。最後に,得られた結果を平面フロー,放射状フロー,収縮的残差フローに適用する。

Normalizing flows leverage the Change of Variables Formula (CVF) to define flexible density models. Yet, the requirement of smooth transformations (diffeomorphisms) in the CVF poses a significant challenge in the construction of these models. To enlarge the design space of flows, we introduce $\mathcal{L}$-diffeomorphisms as generalized transformations which may violate these requirements on zero Lebesgue-measure sets. This relaxation allows e.g. the use of non-smooth activation functions such as ReLU. Finally, we apply the obtained results to planar, radial, and contractive residual flows.
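
As a small numerical check of the idea that the change of variables formula can tolerate non-smoothness on a null set: the sketch below pushes a standard normal base density through a leaky-ReLU map, which is invertible and piecewise linear but not differentiable at 0, and confirms that the resulting density still integrates to one. The choice of leaky ReLU, the slope 0.5, and all function names are assumptions made for illustration; this is not the paper's $\mathcal{L}$-diffeomorphism construction.

```python
import math

SLOPE = 0.5  # illustrative slope of the leaky ReLU on the negative half-line


def leaky_relu(x):
    """Invertible, piecewise-linear map with a kink at x = 0."""
    return x if x >= 0 else SLOPE * x


def abs_derivative(x):
    """|f'(x)|, defined everywhere except on the null set {0}."""
    return 1.0 if x >= 0 else SLOPE


def standard_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def density_x(x):
    """Change of variables: p_X(x) = p_Z(f(x)) * |f'(x)| almost everywhere."""
    return standard_normal_pdf(leaky_relu(x)) * abs_derivative(x)


def midpoint_integral(func, lower, upper, steps=100_000):
    """Midpoint rule; the midpoints never land on the kink at 0."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


total_mass = midpoint_integral(density_x, -40.0, 40.0)
```

The single point where the derivative is undefined contributes nothing to the integral, which is the intuition behind allowing violations on sets of Lebesgue measure zero.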

翻訳日:2021-07-12 21:24:41 公開日:2021-07-09

# (参考訳) 科学知識の形式化と可視化のためのオントロジー

An ontology for the formalization and visualization of scientific knowledge ( http://arxiv.org/abs/2107.04347v1 )

ライセンス: CC BY 4.0

Vincenzo Daponte and Gilles Falquet

(参考訳) ここで提示される科学知識オブジェクトのオントロジーの構築は、科学的知識の可視化を指向したアプローチの開発の一部である。その動機は、科学的知識の組織化に関する概念(定理、法則、経験、証明など)が既存のオントロジーに現れてはいるものの、この話題を中心に据え、シンプルで使いやすい構成を提示しているものが一つもないという事実にある。オントロジソース(特定の分野の知識オブジェクトのオントロジー、語彙的および上位レベルのオントロジー)、専門知識ベース、科学者へのインタビューから構築された最初のバージョンを提示する。我々は、このオントロジーを、使用したいくつかのソースと整合させ、それらに関して一貫性を検証できるようにした。オントロジーの検証は、様々な情報源からの知識を形式化するためにそれを用いることで行い、物理学の分野から着手している。

The construction of an ontology of scientific knowledge objects, presented here, is part of the development of an approach oriented towards the visualization of scientific knowledge. It is motivated by the fact that the concepts of organization of scientific knowledge (theorem, law, experience, proof, etc.) appear in existing ontologies but that none of them is centered on this topic and presents a simple and easily usable organization. We present the first version built from ontological sources (ontologies of knowledge objects of certain fields, lexical and higher level ones), specialized knowledge bases and interviews with scientists. We have aligned this ontology with some of the sources used, which has allowed us to verify its consistency with respect to them. The validation of the ontology consists in using it to formalize knowledge from various sources, which we have begun to do in the field of physics.

翻訳日:2021-07-12 21:07:53 公開日:2021-07-09

# (参考訳) 文書レイアウト生成のためのグラフベース深層生成モデル

Graph-based Deep Generative Modelling for Document Layout Generation ( http://arxiv.org/abs/2107.04357v1 )

ライセンス: CC BY-SA 4.0

Sanket Biswas, Pau Riba, Josep Lladós, and Umapada Pal

(参考訳) ディープラーニングアプローチの主要な前提条件の1つは、大規模トレーニングデータの可用性である。実世界のシナリオでスキャンされた文書画像を扱う場合、その内容の主情報はレイアウト自体に格納される。本研究では,グラフニューラルネットワーク(GNN)を用いて,文書解釈システム,特にデジタルメールルームアプリケーションにおいて,文書解釈システムの学習に使用可能な,高度に可変かつ信頼性の高い文書レイアウトを持つ合成データを生成する。また、ドキュメントレイアウト生成タスクを管理文書画像、この場合請求書で実験する最初のグラフベースのアプローチでもある。

One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, especially in digital mailroom applications. It is also the first graph-based approach for the document layout generation task experimented on administrative document images, in this case, invoices.

翻訳日:2021-07-12 21:01:15 公開日:2021-07-09

# (参考訳) 比喩的言語検出のためのロバストなディープアンサンブル分類器

A Robust Deep Ensemble Classifier for Figurative Language Detection ( http://arxiv.org/abs/2107.04372v1 )

ライセンス: CC BY 4.0

Rolandos Alexandros Potamias and Georgios Siolas and Andreas-Georgios Stafylopatis

(参考訳) 比喩的言語 (FL) の認識と分類は、比喩的内容を持つフレーズに含まれる矛盾した意味のために、自然言語処理 (NLP) という広い分野における感情分析の未解決問題である。この問題自体は皮肉 (sarcasm)、アイロニー (irony)、メタファーという3つの相互に関連するFL認識タスクを含んでおり、本論文ではこれらを高度なディープラーニング (DL) 技術で扱う。まず,DLモデルへの各入力を最適化するために,効率的なデータ表現形式に向けたデータ前処理フレームワークを提案する。さらに、各ソーシャルメディアテキストに反映される構文的、表現的、感情的、気質的な内容を特徴付けるために、特殊な特徴を抽出する。これらの特徴は、ソーシャルネットワークユーザの書き方の側面を捉えることを目的としている。最後に、これらの特徴を、異なるDL技術の組み合わせに基づく堅牢な Deep Ensemble Soft Classifier (DESC) に入力する。3つの異なるベンチマークデータセット(そのうちの1つは様々なFL形式を含む)を用いて、DESCモデルはFL認識という困難な分野において、関連する方法論や最先端技術との比較に値する非常に優れた性能を達成すると結論付けた。

Recognition and classification of Figurative Language (FL) is an open problem of Sentiment Analysis in the broader field of Natural Language Processing (NLP) due to the contradictory meaning contained in phrases with metaphorical content. The problem itself contains three interrelated FL recognition tasks: sarcasm, irony and metaphor, which, in the present paper, are dealt with using advanced Deep Learning (DL) techniques. First, we introduce a data preprocessing framework towards efficient data representation formats so as to optimize the respective inputs to the DL models. In addition, special features are extracted in order to characterize the syntactic, expressive, emotional and temper content reflected in the respective social media text references. These features aim to capture aspects of the social network user's writing method. Finally, features are fed to a robust, Deep Ensemble Soft Classifier (DESC) which is based on the combination of different DL techniques. Using three different benchmark datasets (one of them containing various FL forms) we conclude that the DESC model achieves a very good performance, worthy of comparison with relevant methodologies and state-of-the-art technologies in the challenging field of FL recognition.

翻訳日:2021-07-12 20:51:15 公開日:2021-07-09

# (参考訳) ドメイン固有ALBERTを用いたバイオメディカル自然言語処理タスクのベンチマーク

Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT ( http://arxiv.org/abs/2107.04374v1 )

ライセンス: CC BY 4.0

Usman Naseem, Adam G. Dunn, Matloob Khushi, Jinman Kim

(参考訳) バイオメディカルテキストデータの入手と自然言語処理(NLP)の進歩により、バイオメディカルNLPの新たな応用が可能となった。ドメイン固有コーパスを用いて訓練または微調整された言語モデルは、一般的なモデルより優れた性能を示しうるが、バイオメディカルNLPにおけるこれまでの研究は、コーパスとタスクの点で限定的であった。本稿では,A Lite Bidirectional Encoder Representations from Transformers (ALBERT) のドメイン固有適応であり、生物医学(PubMedとPubMed Central)および臨床(MIMIC-III)コーパスで訓練し,20個のベンチマークデータセットにまたがる6つの異なるタスクで微調整した BioALBERT を提案する。実験の結果、BioALBERTは、固有表現認識(+11.09% BLURBスコアの改善)、関係抽出(+0.80% BLURBスコア)、文類似性(+1.05% BLURBスコア)、文書分類(+0.62% F1スコア)、質問応答(+2.83% BLURBスコア)において、最先端技術よりも優れていた。20のベンチマークデータセットのうち17で、新たな最先端を達成している。BioALBERTのモデルとデータを利用可能にすることで、バイオメディカルNLPコミュニティがトレーニングの計算コストを回避できるようにし、幅広いバイオメディカルNLPタスクにわたる今後の取り組みのための新たなベースラインを確立することを目的とする。

The availability of biomedical text data and advances in natural language processing (NLP) have made new applications in biomedical NLP possible. Language models trained or fine tuned using domain specific corpora can outperform general models, but work to date in biomedical NLP has been limited in terms of corpora and tasks. We present BioALBERT, a domain-specific adaptation of A Lite Bidirectional Encoder Representations from Transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine tuned for 6 different tasks across 20 benchmark datasets. Experiments show that BioALBERT outperforms the state of the art on named entity recognition (+11.09% BLURB score improvement), relation extraction (+0.80% BLURB score), sentence similarity (+1.05% BLURB score), document classification (+0.62% F1-score), and question answering (+2.83% BLURB score). It represents a new state of the art in 17 out of 20 benchmark datasets. By making BioALBERT models and data available, our aim is to help the biomedical NLP community avoid computational costs of training and establish a new set of baselines for future efforts across a broad range of biomedical NLP tasks.

翻訳日:2021-07-12 20:41:02 公開日:2021-07-09

# (参考訳) アンサンブル分類ではスペシャリストがジェネラリストを上回る

Specialists Outperform Generalists in Ensemble Classification ( http://arxiv.org/abs/2107.04381v1 )

ライセンス: CC BY 4.0

Sascha Meyen, Frieder Göppert, Helen Alber, Ulrike von Luxburg, Volker H. Franz

(参考訳) 精度が既知である $k$ 個の個別分類器からなるアンサンブルを考える。テストポイントを受信すると、各分類器は、予測ラベルと、この特定のテストポイントに対する予測の信頼度を出力する。本稿では,アンサンブルの精度を決定できるかどうかという問題に取り組む。驚くべきことに、この設定では、分類器が統計的に最適な方法で組み合わされたとしても、結果として得られるアンサンブル分類器の精度は、信頼度重み付き多数決という標準的な設定の場合とは異なり、個々の分類器の精度からは計算できない。我々はアンサンブル精度に対するタイトな上界と下界を証明する。また、上界と下界を達成する個々の分類器、すなわちスペシャリストとジェネラリストを明示的に構成する。我々の理論的結果は非常に実用的な帰結を持つ。 (1) アンサンブル法を用い、個々の(独立な)分類器をゼロから構築する選択肢があるならば、ジェネラリストではなくスペシャリストの分類器を目指すべきである。 (2) 所望のアンサンブル精度を達成するために少なくとも何個の分類器が必要かを決定するために,我々の境界を用いることができる。最後に、真のラベルと個々の分類器の出力との間の相互情報量を考慮して境界を改善する。

Consider an ensemble of $k$ individual classifiers whose accuracies are known. Upon receiving a test point, each of the classifiers outputs a predicted label and a confidence in its prediction for this particular test point. In this paper, we address the question of whether we can determine the accuracy of the ensemble. Surprisingly, even when classifiers are combined in the statistically optimal way in this setting, the accuracy of the resulting ensemble classifier cannot be computed from the accuracies of the individual classifiers, as would be the case in the standard setting of confidence weighted majority voting. We prove tight upper and lower bounds on the ensemble accuracy. We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists. Our theoretical results have very practical consequences: (1) If we use ensemble methods and have the choice to construct our individual (independent) classifiers from scratch, then we should aim for specialist classifiers rather than generalists. (2) Our bounds can be used to determine how many classifiers are at least required to achieve a desired ensemble accuracy. Finally, we improve our bounds by considering the mutual information between the true label and the individual classifier's output.
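
For orientation, the "standard setting of confidence weighted majority voting" mentioned in the abstract can be sketched for binary labels as follows. The sketch assumes independent classifiers, equal class priors, and that the only thing known about each classifier is its accuracy; each vote is then weighted by the log-odds of that accuracy, which is the classical optimal weighting under those assumptions. The function name and the example numbers are illustrative, and this is not the per-test-point confidence setting the paper analyses.

```python
import math


def weighted_majority_vote(votes, accuracies):
    """Combine binary votes (+1 / -1) using log-odds weights."""
    score = 0.0
    for vote, accuracy in zip(votes, accuracies):
        # Accuracy 0.5 gives weight 0; more accurate classifiers get larger weight.
        score += vote * math.log(accuracy / (1.0 - accuracy))
    return 1 if score >= 0 else -1


# One accurate classifier outvotes two weak ones that disagree with it.
decision = weighted_majority_vote([+1, -1, -1], [0.9, 0.6, 0.6])
```

With equal accuracies the rule reduces to a plain majority vote, so the same three votes would then be decided by the two dissenting classifiers.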

翻訳日:2021-07-12 20:28:35 公開日:2021-07-09

# (参考訳) Hoechst Is All You Need: 深層学習によるリンパ球分類

Hoechst Is All You Need: Lymphocyte Classification with Deep Learning ( http://arxiv.org/abs/2107.04388v1 )

ライセンス: CC BY 4.0

Jessica Cooper, In Hwa Um, Ognjen Arandjelović and David J Harrison

(参考訳) 多重免疫蛍光法および免疫組織化学は、がん病理学者が細胞表面に発現する複数のタンパク質を同定できるようにし、細胞分類、腫瘍微小環境のより良い理解、より正確な診断と予後予測、個々の患者の免疫状態に基づく個別化免疫療法を可能にすることで、患者に利益をもたらす。しかし、これらは高価で時間のかかるプロセスであり、熟練した技術者による複雑な染色およびイメージング技術を必要とする。ヘキスト染色ははるかに安価で実施も容易であるが、免疫蛍光法が標的とするタンパク質ではなくDNAに結合するため、この用途には通常用いられず、DNAの形態のみに基づいてこれらのタンパク質を発現する細胞を区別することは、これまで可能とは考えられていなかった。本研究ではそうではないことを示し,3つのタンパク質(Tリンパ球マーカーのCD3とCD8,およびBリンパ球マーカーのCD20)を発現する細胞を,ヘキスト33342で染色した組織のみから90%を超える適合率と再現率で同定するように,深層畳み込みニューラルネットワークを訓練する。本モデルは、これらのタンパク質の発現に関連する、これまで知られていなかった形態的特徴を学習する。この特徴は、免疫細胞浸潤の評価などの重要な予後指標で用いるリンパ球サブタイプの正確な識別に利用でき、それによりコストのかかる多重免疫蛍光法を必要とせずに患者の転帰を予測し改善できる。

Multiplex immunofluorescence and immunohistochemistry benefit patients by allowing cancer pathologists to identify several proteins expressed on the surface of cells, enabling cell classification, better understanding of the tumour micro-environment, more accurate diagnoses, prognoses, and tailored immunotherapy based on the immune status of individual patients. However, they are expensive and time consuming processes which require complex staining and imaging techniques by expert technicians. Hoechst staining is much cheaper and easier to perform, but is not typically used in this case as it binds to DNA rather than to the proteins targeted by immunofluorescent techniques, and it was not previously thought possible to differentiate cells expressing these proteins based only on DNA morphology. In this work we show otherwise, training a deep convolutional neural network to identify cells expressing three proteins (T lymphocyte markers CD3 and CD8, and the B lymphocyte marker CD20) with greater than 90% precision and recall, from Hoechst 33342 stained tissue only. Our model learns previously unknown morphological features associated with expression of these proteins which can be used to accurately differentiate lymphocyte subtypes for use in key prognostic metrics such as assessment of immune cell infiltration, and thereby predict and improve patient outcomes without the need for costly multiplex immunofluorescence.

翻訳日:2021-07-12 19:54:24 公開日:2021-07-09

# (参考訳) 集合への要素追加問題に対する文脈的・非文脈的選好ランキングの比較

A Comparison of Contextual and Non-Contextual Preference Ranking for Set Addition Problems ( http://arxiv.org/abs/2107.04438v1 )

ライセンス: CC BY 4.0

Timo Bertram, Johannes Fürnkranz, Martin Müller

(参考訳) 本稿では,集合への要素の追加を評価する問題について検討する。この問題は、一般には選択肢間の無条件の選好に還元できないため、難しい。したがって、決定の文脈に基づいて選好をモデル化する。本課題に対して,追加後に得られる2つの集合を比較するツインネットワークと,各候補の既存集合への寄与をモデル化するトリプレットネットワークという,2つの異なるシャム (Siamese) ネットワークアーキテクチャを議論し比較する。この2つの設定を、収集型カードゲーム Magic: The Gathering(マジック:ザ・ギャザリング)のデッキ構築における人間のカード選好の学習という実世界のタスクで評価する。トリプレットアプローチがツインネットワークよりも良い結果を達成し、どちらもこのタスクにおける従来の結果を上回ることを示す。

In this paper, we study the problem of evaluating the addition of elements to a set. This problem is difficult, because it can, in the general case, not be reduced to unconditional preferences between the choices. Therefore, we model preferences based on the context of the decision. We discuss and compare two different Siamese network architectures for this task: a twin network that compares the two sets resulting after the addition, and a triplet network that models the contribution of each candidate to the existing set. We evaluate the two settings on a real-world task: learning human card preferences for deck building in the collectible card game Magic: The Gathering. We show that the triplet approach achieves a better result than the twin network and that both outperform previous results on this task.

翻訳日:2021-07-12 19:38:48 公開日:2021-07-09

# (参考訳) ディープニューラルネットワークにおける集約層の分布の理解

Understanding the Distributions of Aggregation Layers in Deep Neural Networks ( http://arxiv.org/abs/2107.04458v1 )

ライセンス: CC BY 4.0

Eng-Jon Ong, Sameed Husain, Miroslaw Bober

(参考訳) 集約のプロセスは、ほとんどすべてのディープネットモデルにおいてユビキタスである。これは、深い特徴をよりコンパクトな表現にまとめる重要なメカニズムとして機能すると同時に、ディープネットの過学習に対する頑健性を高め、空間的不変性を提供する。特に、グローバル集約層がDNNの出力層の近くに位置することは、集約された特徴がディープネットの性能に直接的な影響を与えることを意味する。この関係は、情報理論的手法を用いることでよりよく理解できる。しかし、それには集約層の活性化の分布に関する知識が必要である。そこで本研究では,深い特徴の集約に関わる層の出力値の確率分布を解析的にモデル化する,新しい数学的定式化を提案する。重要な成果として、DNNにおける出力ノードのKLダイバージェンスを解析的に予測できるようになった。また,様々な分類タスクやデータセットにわたる経験的観測に対して、理論的予測を実験的に検証した。

The process of aggregation is ubiquitous in almost all deep net models. It functions as an important mechanism for consolidating deep features into a more compact representation, whilst increasing robustness to overfitting and providing spatial invariance in deep nets. In particular, the proximity of global aggregation layers to the output layers of DNNs means that aggregated features have a direct influence on the performance of a deep net. A better understanding of this relationship can be obtained using information theoretic methods. However, this requires the knowledge of the distributions of the activations of aggregation layers. To achieve this, we propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation. An important outcome is our ability to analytically predict the KL-divergence of output nodes in a DNN. We also experimentally verify our theoretical predictions against empirical observations across a range of different classification tasks and datasets.

翻訳日:2021-07-12 19:28:53 公開日:2021-07-09

# (参考訳) 脳波に基づく睡眠段階分類のための自己訓練を用いた敵対的ドメイン適応

Adversarial Domain Adaptation with Self-Training for EEG-based Sleep Stage Classification ( http://arxiv.org/abs/2107.04470v1 )

ライセンス: CC BY 4.0

Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li, and Cuntai Guan

(参考訳) 睡眠ステージングは睡眠障害の診断と治療において非常に重要である。近年,自動睡眠ステージングのためのデータ駆動型ディープラーニングモデルが提案されている。それらは主に、トレーニングとテストのデータを、実際のシナリオでは保持できないような同じ分布から引き出すという仮定に依存している。ドメインシフト問題に対処するために、Unsupervised Domain Adaption (UDA) が最近開発された。しかし、これまでの睡眠ステージングに適用されるUDAメソッドには2つの大きな制限がある。まず、それらはドメインアライメントの完全な共有モデルに依存しており、機能抽出中にドメイン固有の情報を失う可能性がある。第2に、ターゲットドメインのクラス情報を考慮せずに、ソースとターゲットの分布をグローバルに調整するだけで、モデルの分類性能を阻害する。本研究では,未ラベル対象領域におけるドメインシフト問題に対処するための新しい逆学習フレームワークを提案する。まず、ソースドメインとターゲットドメインのドメイン固有の特徴を保存するために、非共有アテンション機構を開発する。第2に、ターゲットドメインの擬似ラベルを用いて、ソースおよびターゲットドメインの詳細なクラス分布を調整するための自己学習戦略を設計する。また,擬似ラベルのロバスト性と品質を高めるために,2つの識別分類器を提案する。 6つのクロスドメインシナリオの実験結果から、睡眠ステージングのためのフレームワークの有効性と最先端UDA法に対する利点が検証された。

Sleep staging is of great importance in the diagnosis and treatment of sleep disorders. Recently, numerous data driven deep learning models have been proposed for automatic sleep staging. They mainly rely on the assumption that training and testing data are drawn from the same distribution which may not hold in real-world scenarios. Unsupervised domain adaptation (UDA) has been recently developed to handle this domain shift problem. However, previous UDA methods applied for sleep staging have two main limitations. First, they rely on a totally shared model for the domain alignment, which may lose the domain-specific information during feature extraction. Second, they only align the source and target distributions globally without considering the class information in the target domain, which hinders the classification performance of the model. In this work, we propose a novel adversarial learning framework to tackle the domain shift problem in the unlabeled target domain. First, we develop unshared attention mechanisms to preserve the domain-specific features in the source and target domains. Second, we design a self-training strategy to align the fine-grained class distributions for the source and target domains via target domain pseudo labels. We also propose dual distinct classifiers to increase the robustness and quality of the pseudo labels. The experimental results on six cross-domain scenarios validate the efficacy of our proposed framework for sleep staging and its advantage over state-of-the-art UDA methods.

翻訳日:2021-07-12 19:06:10 公開日:2021-07-09

# (参考訳) 電力ネットワーク同定のためのベイズ誤差イン変数モデル

Bayesian Error-in-Variables Models for the Identification of Power Networks ( http://arxiv.org/abs/2107.04480v1 )

ライセンス: CC BY 4.0

Jean-Sébastien Brouillon, Emanuele Fabbiani, Pulkit Nahata, Florian Dörfler, Giancarlo Ferrari-Trecate

(参考訳) 断続的な再生可能発電の統合が、特に配電レベルで進むにつれて、系統に関する知識、具体的には電力ネットワークのトポロジーと線路パラメータを表すアドミタンス行列に基づく高度な計画および最適化手法が必要となる。しかし、時間的に変化する系統では、アドミタンス行列の信頼できる推定値が存在しないか、あるいはすぐに時代遅れになる可能性がある。本研究では,マイクロPMUから収集した電圧と電流の測定値を利用するデータ駆動型の同定手法を提案する。より正確には、まず最尤法によるアプローチを示し、次に最大事後確率 (MAP) 推定の原理を活用してベイズ的枠組みへと進む。既存の多くの研究とは対照的に,本手法は,電圧と電流の両方のデータにおける測定ノイズを考慮するだけでなく,スパース性パターンや既知の線路パラメータなどの利用可能な事前情報も活用できる。ベンチマークケースで行ったシミュレーションでは, 他のアルゴリズムと比較して, 精度を大幅に向上できることが示された。

The increasing integration of intermittent renewable generation, especially at the distribution level, necessitates advanced planning and optimisation methodologies contingent on the knowledge of the grid, specifically the admittance matrix capturing the topology and line parameters of an electric network. However, a reliable estimate of the admittance matrix may either be missing or quickly become obsolete for temporally varying grids. In this work, we propose a data-driven identification method utilising voltage and current measurements collected from micro-PMUs. More precisely, we first present a maximum likelihood approach and then move towards a Bayesian framework, leveraging the principles of maximum a posteriori estimation. In contrast with most existing contributions, our approach not only factors in measurement noise on both voltage and current data, but is also capable of exploiting available a priori information such as sparsity patterns and known line parameters. Simulations conducted on benchmark cases demonstrate that, compared to other algorithms, our method can achieve significantly greater accuracy.

翻訳日:2021-07-12 18:51:09 公開日:2021-07-09

# (参考訳) 敗血症治療戦略の不確実性を考慮したオフライン強化学習

Offline reinforcement learning with uncertainty for treatment strategies in sepsis ( http://arxiv.org/abs/2107.04491v1 )

ライセンス: CC BY 4.0

Ran Liu (1 and 2), Joseph L. Greenstein (1 and 2), James C. Fackler (3), Jules Bergmann (3), Melania M. Bembea (3 and 4), Raimond L. Winslow (1 and 2) ((1) Institute for Computational Medicine, the Johns Hopkins University, (2) Department of Biomedical Engineering, the Johns Hopkins University School of Medicine and Whiting School of Engineering, (3) Department of Anesthesiology and Critical Care Medicine, the Johns Hopkins University, (4) Department of Pediatrics, the Johns Hopkins University School of Medicine)

(参考訳) 敗血症と敗血症性ショックに対するガイドラインに基づく治療は、敗血症が病態生理の十分に解明されていない、生命を脅かす多様な臓器機能障害の総称であるため困難である。敗血症における早期介入は患者の予後に不可欠であるが、これらの介入には副作用があり、しばしば過剰に実施される。すべての患者に適した単一の行動は存在しないため、より高度な個別化が必要である。本稿では,データから敗血症治療の最適な推奨を特定し,その信頼度を推定し,トレーニングデータでまれにしか観察されない治療選択肢を特定する,強化学習の新たな応用を提案する。本手法は単一の推奨ではなく,複数の治療選択肢を提示できる。学習された方策を調べたところ, 死亡率と受けた治療の水準との間の交絡関係のために, 強化学習は積極的な介入を避ける方向に偏っていることが分かった。我々は部分空間学習を用いてこのバイアスを軽減し、医療アプリケーション全般でより正確な学習方策をもたらしうる方法論を開発する。

Guideline-based treatment for sepsis and septic shock is difficult because sepsis is a disparate range of life-threatening organ dysfunctions whose pathophysiology is not fully understood. Early intervention in sepsis is crucial for patient outcome, yet those interventions have adverse effects and are frequently overadministered. Greater personalization is necessary, as no single action is suitable for all patients. We present a novel application of reinforcement learning in which we identify optimal recommendations for sepsis treatment from data, estimate their confidence level, and identify treatment options infrequently observed in training data. Rather than a single recommendation, our method can present several treatment options. We examine learned policies and discover that reinforcement learning is biased against aggressive intervention due to the confounding relationship between mortality and level of treatment received. We mitigate this bias using subspace learning, and develop methodology that can yield more accurate learning policies across healthcare applications.

翻訳日:2021-07-12 18:21:56 公開日:2021-07-09

# (参考訳) 深層物体検出器における認識論的不確実性の勾配に基づく定量化

Gradient-Based Quantification of Epistemic Uncertainty for Deep Object Detectors ( http://arxiv.org/abs/2107.04517v1 )

ライセンス: CC BY 4.0

Tobias Riedlinger, Matthias Rottmann, Marius Schubert, Hanno Gottschalk

(参考訳) 信頼性の高い認識論的不確実性の推定は, 安全性が重要な環境における深層物体検出器のバックエンド応用に欠かせない要素である。現代のネットワークアーキテクチャは、予測力が限られ、較正の不十分な信頼度を与える傾向がある。本稿では,新しい勾配に基づく不確実性指標を導入し,異なる物体検出アーキテクチャについて検討する。 MS COCO, PASCAL VOC, KITTIデータセットを用いた実験では, ネットワークの信頼度と比較して, 真陽性/偽陽性の判別およびIoU (intersection over union) の予測が有意に向上した。また、モンテカルロドロップアウトの不確実性指標に対する改善や、異なる不確実性指標のソースを集約することによるさらなる大幅な向上も見られた。得られた不確実性モデルは、すべての場合において十分に較正された信頼度を生成する。さらに,不確実性定量化モデルを物体検出パイプラインに実装し,通常のスコア閾値に基づく決定規則に代えて、正しい予測と誤った予測を識別する手段として用いる。実験では,mean average precision の点で検出性能を大幅に向上させることができた。計算複雑性に関しては,勾配不確実性指標の計算に要する浮動小数点演算数がモンテカルロドロップアウトと同程度であることが分かった。

Reliable epistemic uncertainty estimation is an essential component for backend applications of deep object detectors in safety-critical environments. Modern network architectures tend to give poorly calibrated confidences with limited predictive power. Here, we introduce novel gradient-based uncertainty metrics and investigate them for different object detection architectures. Experiments on the MS COCO, PASCAL VOC and the KITTI dataset show significant improvements in true positive / false positive discrimination and prediction of intersection over union as compared to network confidence. We also find improvement over Monte-Carlo dropout uncertainty metrics and further significant boosts by aggregating different sources of uncertainty metrics. The resulting uncertainty models generate well-calibrated confidences in all instances. Furthermore, we implement our uncertainty quantification models into object detection pipelines as a means to discern true against false predictions, replacing the ordinary score-threshold-based decision rule. In our experiments, we achieve a significant boost in detection performance in terms of mean average precision. With respect to computational complexity, we find that computing gradient uncertainty metrics results in floating point operation counts similar to those of Monte-Carlo dropout.

翻訳日:2021-07-12 18:06:44 公開日:2021-07-09

# (参考訳) 非凹バンディット最適化のための最適な勾配ベースアルゴリズム

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization ( http://arxiv.org/abs/2107.04518v1 )

ライセンス: CC0 1.0

Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

(参考訳) 線形あるいは凹の報酬を持つバンディット問題は広く研究されているが、非凹の報酬を持つバンディットの研究は比較的少ない。本研究は、低ランク一般化線形バンディット問題や、多項式活性化を持つ2層ニューラルネットワークのバンディット問題など、未知の報酬関数が非凹であるバンディット問題の大きなファミリーを考察する。低ランク一般化線形バンディット問題に対しては、次元に関してミニマックス最適なアルゴリズムを提供し、[LMT21, JWWN19] における両方の予想を反証する。我々のアルゴリズムは、非常に一般的に適用できる統一的なゼロ次最適化パラダイムに基づいており、いくつかの構造化多項式設定において(次元に関して)最適なレートを達成する。さらに、生成モデル設定のRLにおける我々のアルゴリズムの適用可能性を示し、従来の手法よりもサンプル複雑性を改善する。最後に、標準的な楽観的アルゴリズム(例: UCB)が次元因子の分だけ準最適であることを示す。雑音のない報酬を持つニューラルネット設定(多項式活性化関数付き)では、内在的な代数次元に等しいサンプル複雑性を持つバンディットアルゴリズムを提供する。ここでも、楽観的アプローチはサンプル複雑性がより悪く、外在次元の多項式となる(多項式次数に関して指数関数的に悪くなりうる)ことを示す。

Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality and attains optimal rates in several structured polynomial settings (in the dimension). We further demonstrate the applicability of our algorithms in RL in the generative model setting, resulting in improved sample complexity over prior approaches. Finally, we show that the standard optimistic algorithms (e.g., UCB) are sub-optimal by dimension factors. In the neural net setting (with polynomial activation functions) with noiseless reward, we provide a bandit algorithm with sample complexity equal to the intrinsic algebraic dimension. Again, we show that optimistic approaches have worse sample complexity, polynomial in the extrinsic dimension (which could be exponentially worse in the polynomial degree).

翻訳日:2021-07-12 17:37:40 公開日:2021-07-09

# (参考訳) ラベル分布シフトに対するオンライン適応

Online Adaptation to Label Distribution Shift ( http://arxiv.org/abs/2107.04520v1 )

ライセンス: CC BY 4.0

Ruihan Wu, Chuan Guo, Yi Su, Kilian Q. Weinberger

(参考訳) 機械学習モデルは、現実世界にデプロイされると、しばしば分布シフトに遭遇する。本稿では,テスト時のラベル分布が継続的に変化し、モデルが真のラベルを観察することなく動的に適応しなければならないオンライン設定における、ラベル分布シフトへの適応に焦点を当てる。新しい解析を活用して、真のラベルの欠如が期待テスト損失の推定を妨げないことを示し、これによりオンラインラベルシフト適応を従来のオンライン学習に帰着させることができる。この観察に基づき,Follow The Leader (FTL) やオンライン勾配降下 (OGD) といった古典的なオンライン学習手法に着想を得た適応アルゴリズムを提案し、そのリグレット境界を導出する。シミュレーションと実世界の両方のラベル分布シフトの下でこの知見を実証的に検証し、OGDが様々な困難なラベルシフトシナリオに対して特に効果的かつ頑健であることを示す。

Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true label. Leveraging a novel analysis, we show that the lack of true label does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.
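
To make the adaptation target concrete: under label shift, where only the class prior changes, a fixed classifier's predicted probabilities can be corrected by re-weighting each class with the ratio of the new prior to the training prior and renormalising. This is the standard Bayes-rule correction, sketched here with the new prior assumed known. The function name `reweight_for_label_shift` and the numbers are illustrative, and the abstract does not spell out how the paper's FTL- or OGD-style algorithms estimate or track the changing prior, so that part is deliberately left out.

```python
def reweight_for_label_shift(probs, train_prior, test_prior):
    """Adjust class probabilities predicted under train_prior to a new class prior."""
    # Bayes' rule with an unchanged p(x | y): p_test(y | x) is proportional to
    # p_train(y | x) * test_prior[y] / train_prior[y].
    unnormalised = [
        p * q_test / q_train
        for p, q_train, q_test in zip(probs, train_prior, test_prior)
    ]
    total = sum(unnormalised)
    return [value / total for value in unnormalised]


# An undecided prediction from a classifier trained on balanced classes,
# re-weighted for a test stream where class 0 is assumed to make up 90% of labels.
adjusted = reweight_for_label_shift([0.5, 0.5], [0.5, 0.5], [0.9, 0.1])
```

In an online setting the assumed `test_prior` would be replaced by a running estimate that is updated after each batch of unlabeled test points.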

翻訳日:2021-07-12 17:36:30 公開日:2021-07-09

# (参考訳) エントロピー、情報、および確率の更新

Entropy, Information, and the Updating of Probabilities ( http://arxiv.org/abs/2107.04529v1 )

ライセンス: CC BY 4.0

Ariel Caticha

(参考訳) 本稿は,推論の一般的な枠組みとしての最大エントロピー法に対する特定のアプローチのレビューである。議論では導出における実用的な要素を強調する。情報の認識論的な概念は、理想的に合理的なエージェントのベイズ的信念との関係によって定義される。事前確率分布から事後確率分布への更新方法は、消去的帰納法の過程を通じて設計される。対数的相対エントロピーは、(a) 普遍的に適用可能であり、(b) 事前情報の価値を認め、(c) 科学において独立性の概念が果たす特権的役割を認める、唯一の更新ツールとして選び出される。結果として得られる枠組みである ME 法は、任意の事前分布と任意の制約を扱うことができる。これは MaxEnt とベイズ則を特殊ケースとして含み、したがってエントロピー的手法とベイズ的手法を単一の一般的な推論スキームに統一する。ME 法は単一の事後分布の選択にとどまらず、他の分布の確率がどの程度低いかという問題にも取り組み、これが揺らぎの理論および大偏差理論への直接的な橋渡しとなる。

This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes the pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to a posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as the unique tool for updating that (a) is of universal applicability; (b) that recognizes the value of prior information; and (c) that recognizes the privileged role played by the notion of independence in science. The resulting framework -- the ME method -- can handle arbitrary priors and arbitrary constraints. It includes MaxEnt and Bayes' rule as special cases and, therefore, it unifies entropic and Bayesian methods into a single general inference scheme. The ME method goes beyond the mere selection of a single posterior, but also addresses the question of how much less probable other distributions might be, which provides a direct bridge to the theories of fluctuations and large deviations.
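
As a worked illustration of the updating scheme described above, written in commonly used notation rather than notation taken from the paper: the symbols $q$ (prior), $p$ (candidate posterior), $f$ (constrained quantity), $F$ (its prescribed expectation), $\lambda$ (Lagrange multiplier) and $Z$ (normalisation) are assumptions of this sketch.

```latex
% Logarithmic relative entropy of p relative to the prior q
S[p, q] = -\int p(x) \, \log\frac{p(x)}{q(x)} \, dx

% Maximising S[p, q] subject to normalisation and \int p(x) f(x) \, dx = F
% gives an exponential tilt of the prior:
p(x) = \frac{q(x) \, e^{\lambda f(x)}}{Z(\lambda)}, \qquad
Z(\lambda) = \int q(x) \, e^{\lambda f(x)} \, dx, \qquad
\frac{\partial \log Z}{\partial \lambda} = F
```

With a uniform prior $q$ this reduces to the familiar MaxEnt exponential-family form, which is consistent with the abstract's statement that MaxEnt is a special case.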

翻訳日:2021-07-12 16:58:38 公開日:2021-07-09

# (参考訳) バイオメディカルイメージセグメンテーションのためのモダリティ特異的U-Net変異体:調査

Modality specific U-Net variants for biomedical image segmentation: A survey ( http://arxiv.org/abs/2107.04537v1 )

ライセンス: CC BY 4.0

Narinder Singh Punn, Sonali Agarwal

(参考訳) 深層畳み込みニューラルネットワーク、残留ニューラルネットワーク、敵ネットワークなどのディープラーニングアプローチの進展に伴い、U-Netアーキテクチャは、標的領域またはサブリージョンの識別と検出における自動化に対処するために、バイオメディカルイメージセグメンテーションにおいて最も広く利用されている。最近の研究では、u-netベースのアプローチは、脳腫瘍、肺癌、アルツハイマー病、乳がんなどの疾患の早期診断および治療のためのコンピュータ支援診断システムの開発に、様々な応用において最先端のパフォーマンスを示す。本稿では,U-Netフレームワークを説明することによって,これらのアプローチの成功を示すとともに,磁気共鳴画像,X線,コンピュータ断層撮影/コンピュータ軸断層撮影,超音波,ポジトロン放射断層撮影など,異なる医用画像のU-Net変種を包括的に分析する。さらに、新型コロナウイルス(COVID-19)としても知られる重症急性呼吸器症候群ウイルス2(SARS-CoV-2)へのU-Netベースのフレームワークの貢献についても強調する。

With advances in deep learning approaches such as deep convolutional neural networks, residual neural networks and adversarial networks, U-Net architectures are most widely utilized in biomedical image segmentation to automate the identification and detection of target regions or sub-regions. In recent studies, U-Net based approaches have shown state-of-the-art performance in different applications for the development of computer-aided diagnosis systems for early diagnosis and treatment of diseases such as brain tumor, lung cancer, Alzheimer's disease, breast cancer, etc. This article presents the success of these approaches by describing the U-Net framework, followed by a comprehensive analysis of the U-Net variants for different medical imaging modalities such as magnetic resonance imaging, X-ray, computerized tomography/computerized axial tomography, ultrasound, positron emission tomography, etc. In addition, this article highlights the contribution of U-Net based frameworks in the ongoing pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also known as COVID-19.

翻訳日:2021-07-12 16:35:42 公開日:2021-07-09

# (参考訳) ディープニューラルネットワークは列名からデータ相関を予測できるか?

Can Deep Neural Networks Predict Data Correlations from Column Names? ( http://arxiv.org/abs/2107.04553v1 )

ライセンス: CC BY 4.0

Immanuel Trummer

(参考訳) 人間の場合、列名からデータ相関を予測できることが多い。我々は、ディープニューラルネットワークが同じことを学習できるかどうかを調べる実験を行う。もし可能であれば、例えば、スキーマ要素のNLP解析を用いて相関検出の作業に優先順位を付けるチューニングツールの可能性が開ける。約4,000のデータセットから抽出した約12万の列ペアについて相関を解析した。列名のみに基づいて相関の予測を試みる。予測には,最近提案されたTransformerアーキテクチャに基づく事前学習言語モデルを利用する。異なるタイプの相関、複数の予測方法、および様々な予測シナリオを検討する。列名の長さやトレーニングデータの量などの要因が予測精度に与える影響についても調べる。全体として、ディープニューラルネットワークは、多くのシナリオにおいて比較的高い精度で相関を予測できることが分かった(例えば、長い列名に対して95%の精度)。

For humans, it is often possible to predict data correlations from column names. We conduct experiments to find out whether deep neural networks can learn to do the same. If so, e.g., it would open up the possibility of tuning tools that use NLP analysis on schema elements to prioritize their efforts for correlation detection. We analyze correlations for around 120,000 column pairs, taken from around 4,000 data sets. We try to predict correlations, based on column names alone. For predictions, we exploit pre-trained language models, based on the recently proposed Transformer architecture. We consider different types of correlations, multiple prediction methods, and various prediction scenarios. We study the impact of factors such as column name length or the amount of training data on prediction accuracy. Altogether, we find that deep neural networks can predict correlations with a relatively high accuracy in many scenarios (e.g., with an accuracy of 95% for long column names).

翻訳日:2021-07-12 15:39:48 公開日:2021-07-09

# (参考訳) ベイズ語の学習規則

The Bayesian Learning Rule ( http://arxiv.org/abs/2107.04562v1 )

ライセンス: CC BY 4.0

Mohammad Emtiyaz Khan and H{\aa}vard Rue

(参考訳) 多くの機械学習アルゴリズムがベイズ学習則と呼ばれる単一のアルゴリズムの特定の例であることを示す。この規則はベイズ原理から派生したもので、最適化、ディープラーニング、グラフィカルモデルといった分野から幅広いアルゴリズムを導出する。これにはリッジ回帰、ニュートン法、カルマンフィルタのような古典的なアルゴリズムや、確率勾配降下、rmsprop、ドロップアウトといった現代のディープラーニングアルゴリズムが含まれる。このようなアルゴリズムを導出する鍵となるアイデアは、自然勾配を用いて推定された候補分布を用いて後部を近似することである。異なる候補分布は異なるアルゴリズムとなり、さらに自然勾配への近似はそれらのアルゴリズムの変種を引き起こす。私たちの仕事は、既存のアルゴリズムを統一、一般化、改善するだけでなく、新しいアルゴリズムの設計にも役立ちます。

We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.

翻訳日:2021-07-12 15:23:03 公開日:2021-07-09

# (参考訳) ランダムウォークと再起動によるユニバーサル多層ネットワーク探索

Universal Multilayer Network Exploration by Random Walk with Restart ( http://arxiv.org/abs/2107.04565v1 )

ライセンス: CC BY 4.0

Anthony Baptista, Aitor Gonzalez, Ana\"is Baudot

(参考訳) ここ数年、データの量と種類は劇的に増加している。これらのデータはしばしばネットワークとして表現され、ネットワーク理論から生じるアプローチで探索される。近年では、より複雑でリッチなネットワークフレームワークを活用するためのネットワーク探索手法が拡張されている。例えば、ランダムウォークは多層ネットワークを探索するために拡張されている。しかし、現在のランダムウォークアプローチは、処理可能なネットワーク層の組合せと不均一性に制限がある。多層ネットワークの多様性と複雑さの増大に対応するために,新しい解析的および数値的ランダムウォーク法が必要である。そこで本稿では,Random Walk with Restart(RWR)を最適化したマルチレイヤネットワーク上で実現するPythonパッケージであるMultiXrankを提案する。このパッケージはRWRの普遍的な数学的定式化によって支えられている。我々は,MultiXrankを相互検証とリンク予測で評価し,マルチレイヤネットワークデータの追加や削除が予測性能に与える影響を評価するためのプロトコルを導入した。さらに,入力パラメータに対するマルチックスランクの感度をパラメータ空間の詳細な探索により測定した。最後に,ヒト遺伝病の文脈において,非教師付きノード優先化と教師付き分類の異なるユースケースを用いたマルチックスランクの汎用性を示す。

The amount and variety of data have been increasing drastically for several years. These data are often represented as networks, which are then explored with approaches arising from network theory. Recent years have witnessed the extension of network exploration methods to leverage more complex and richer network frameworks. Random walks, for instance, have been extended to explore multilayer networks. However, current random walk approaches are limited in the combination and heterogeneity of network layers they can handle. New analytical and numerical random walk methods are needed to cope with the increasing diversity and complexity of multilayer networks. We propose here MultiXrank, a Python package that enables Random Walk with Restart (RWR) on any kind of multilayer network with an optimized implementation. This package is supported by a universal mathematical formulation of the RWR. We evaluated MultiXrank with leave-one-out cross-validation and link prediction, and introduced protocols to measure the impact of the addition or removal of multilayer network data on prediction performance. We further measured the sensitivity of MultiXrank to input parameters by in-depth exploration of the parameter space. Finally, we illustrate the versatility of MultiXrank with different use cases of unsupervised node prioritization and supervised classification in the context of human genetic diseases.

翻訳日:2021-07-12 15:22:07 公開日:2021-07-09
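
The Random Walk with Restart mentioned in the abstract above can be illustrated on a tiny single-layer graph. This is only a minimal sketch under stated assumptions: it does not use or reproduce the MultiXrank API, the function name `random_walk_with_restart`, the restart probability of 0.7, and the toy graph are all hypothetical, and multilayer handling is left out entirely.

```python
def random_walk_with_restart(adjacency, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Power iteration for RWR on an unweighted graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Every node is assumed to have at least one neighbour.
    """
    nodes = sorted(adjacency)
    # Restart distribution: all probability mass on the seed node.
    p0 = {n: 1.0 if n == seed else 0.0 for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy graph, for illustration only.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
scores = random_walk_with_restart(graph, seed="a")
```

The resulting scores form a probability distribution over nodes and can be read as a proximity ranking with respect to the seed, which is the idea behind node prioritization.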

# (参考訳) マルチモーダル融合を用いた仮想現実環境における心電図からの多レベル応力評価

Multi-level Stress Assessment from ECG in a Virtual Reality Environment using Multimodal Fusion ( http://arxiv.org/abs/2107.04566v1 )

ライセンス: CC BY 4.0

Zeeshan Ahmad, Suha Rabbani, Muhammad Rehman Zafar, Syem Ishaque, Sridhar Krishnan, Naimul Khan

(参考訳) ECGは、非侵襲的な性質のため、シリアスな仮想現実(VR)アプリケーションにおけるストレスを評価する魅力的な選択肢である。しかし、既存の機械学習(ML)モデルは性能が良くない。さらに、既存の研究は二値のストレス評価しか行っておらず、より魅力的なバイオフィードバックベースのアプリケーションを開発するためには、マルチレベルの評価が必要である。既存の研究は単一の体験(例えばVRビデオの視聴)を単一のストレスレベルに注釈付けして分類しており、これもまた、リアルタイムのゲーム内ストレス評価を活用する動的な体験の設計を妨げている。本稿では、3つのストレスレベルを評価するVRストレス評価に関する新しい研究について報告する。ECGデータは、VRジェットコースターを体験している9人のユーザーから収集された。その後、3人の評価者がVR体験を10秒単位で3つのストレスレベルに手作業でラベル付けした。次に、わずか1秒のウィンドウからストレス予測を行うことができる、スペクトログラムと1次元ECGを用いた新しいマルチモーダル深層融合モデルを提案する。実験の結果、提案モデルは従来のHRVベースMLモデル(精度9%向上)とベースラインの深層学習モデル(精度2.5%向上)より優れていた。また、ベンチマークであるWESADデータセットでの結果を報告し、モデルの優位性を示す。

ECG is an attractive option to assess stress in serious Virtual Reality (VR) applications due to its non-invasive nature. However, the existing Machine Learning (ML) models perform poorly. Moreover, existing studies only perform a binary stress assessment, whereas multi-level assessment is necessary to develop more engaging biofeedback-based applications. Existing studies annotate and classify a single experience (e.g., watching a VR video) to a single stress level, which again prevents the design of dynamic experiences where real-time in-game stress assessment can be utilized. In this paper, we report our findings from a new study on VR stress assessment, where three stress levels are assessed. ECG data were collected from 9 users experiencing a VR roller coaster. The VR experience was then manually labeled in 10-second segments into three stress levels by three raters. We then propose a novel multimodal deep fusion model utilizing spectrogram and 1D ECG that can provide a stress prediction from just a 1-second window. Experimental results demonstrate that the proposed model outperforms the classical HRV-based ML models (9% increase in accuracy) and baseline deep learning models (2.5% increase in accuracy). We also report results on the benchmark WESAD dataset to show the superiority of the model.

翻訳日:2021-07-12 15:05:41 公開日:2021-07-09

# (参考訳) anceR:サンプルワイドボリューム最大化による異方性認証

ANCER: Anisotropic Certification via Sample-wise Volume Maximization ( http://arxiv.org/abs/2107.04570v1 )

ライセンス: CC BY 4.0

Francisco Eiras, Motasem Alfarra, M. Pawan Kumar, Philip H. S. Torr, Puneet K. Dokania, Bernard Ghanem, Adel Bibi

(参考訳) ランダム化平滑化は最近、大規模なディープニューラルネットワーク分類器の認証を可能にする効果的なツールとして登場した。ランダム化平滑化に関するすべての先行技術は、等方性$\ell_p$認証にフォーカスしており、これは$\ell_p$-norm半径を介して等方性メソッド間で容易に比較可能な証明書を発行する利点がある。しかし、等方的認証は、入力から最悪の場合の敵への認証を制限しているため、他の「閉じた」、潜在的に大きく、一定の予測の安全な領域を推論することはできない。この問題を緩和するため、(i)理論上は、簡単な解析に従って、等方性ランダム化平滑化 $\ell_1$ と $\ell_2$ の証明書を一般化した異方性証明に拡張する。さらに、(ii)認証領域のボリュームを通して各証明書を定量化することにより、上位集合領域を認証した場合、証明書が他よりも優れている一般証明書の比較を可能にする評価指標を提案する。本稿では,ボリューム最大化によるテストセットサンプルの異方性証明書を取得するための実用的なフレームワークであるanceRを紹介する。実験結果から,ancer は cifar-10 と imagenet の両方で複数の radii において最先端の $\ell_1$ と $\ell_2$ の認証精度を達成し,ボリュームの面ではかなり大きな領域を証明し,等方性解析から遠ざかる利点を浮き彫りにした。私たちの実験で使用されたコードはhttps://github.com/motasemalfarra/ancerで利用可能です。

Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, i.e., it cannot reason about other "close", potentially large, safe regions of constant prediction. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates - a certificate is superior to another if it certifies a superset region - with the quantification of each certificate through the volume of the certified region. We introduce ANCER, a practical framework for obtaining anisotropic certificates for a given test set sample via volume maximization. Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on both CIFAR-10 and ImageNet at multiple radii, while certifying substantially larger regions in terms of volume, thus highlighting the benefits of moving away from isotropic analysis. Code used in our experiments is available at https://github.com/MotasemAlfarra/ANCER.

翻訳日:2021-07-12 14:46:25 公開日:2021-07-09

# Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression

Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression ( http://arxiv.org/abs/2107.04497v1 )

ライセンス: Link先を確認

Vincent Mai, Waleed Khamies, Liam Paull

(参考訳) ヘテロセダスティック回帰(Heteroscedastic regression)は、各ラベルが異なる分布からノイズを受ける教師あり学習のタスクである。このノイズはラベル付けプロセスによって引き起こされることがあり、i.i.d.の仮定に反するため、学習アルゴリズムの性能に悪影響を及ぼす。しかし、多くの状況において、ラベル付けプロセスはラベルごとにそのような分布の分散を推定することができ、この影響を軽減するための追加情報として使用できる。ニューラルネットワークのパラメータ最適化に、ガウス・マルコフの定理に基づく逆分散重み付き平均二乗誤差を適用する。真値に近いサンプルに対して頑健な損失関数であるバッチ逆分散(BIV)を導入し、実効的な学習率の制御を可能にする。実験の結果、L2損失、逆分散重み付け、フィルタリングに基づくベースラインに比べて、BIVは2つのノイズのあるデータセット上でネットワークの性能を著しく向上させることがわかった。

Heteroscedastic regression is the task of supervised learning where each label is subject to noise from a different distribution. This noise can be caused by the labelling process, and negatively impacts the performance of the learning algorithm as it violates the i.i.d. assumptions. In many situations, however, the labelling process is able to estimate the variance of such a distribution for each label, which can be used as additional information to mitigate this impact. We adapt an inverse-variance weighted mean square error, based on the Gauss-Markov theorem, for parameter optimization on neural networks. We introduce Batch Inverse-Variance (BIV), a loss function which is robust to near-ground truth samples and allows control of the effective learning rate. Our experimental results show that BIV significantly improves the performance of the networks on two noisy datasets, compared to L2 loss, inverse-variance weighting, and a filtering-based baseline.

翻訳日:2021-07-12 14:01:22 公開日:2021-07-09
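
The inverse-variance weighting idea described in the abstract above can be sketched as a batch-normalized weighted mean squared error. The abstract does not state the exact formula, so this is one plausible form under assumptions: the function name, the `epsilon` term added to each variance, and its default value are hypothetical choices made here for illustration.

```python
def inverse_variance_weighted_mse(predictions, labels, variances, epsilon=0.1):
    """Weighted MSE where each sample is weighted by 1 / (variance + epsilon).

    The weights are normalized by their sum over the batch, so the loss scale
    does not depend on the absolute size of the variances. The epsilon term
    (an assumption of this sketch) keeps labels with near-zero variance from
    dominating the batch.
    """
    weights = [1.0 / (v + epsilon) for v in variances]
    total = sum(weights)
    weighted = sum(w * (p - y) ** 2 for w, p, y in zip(weights, predictions, labels))
    return weighted / total


# With equal variances the loss reduces to the ordinary mean squared error.
uniform = inverse_variance_weighted_mse([1.0, 2.0], [0.0, 0.0], [1.0, 1.0])

# A label with a large noise variance contributes very little to the loss.
noisy = inverse_variance_weighted_mse([0.0, 0.0], [0.0, 10.0], [0.0, 100.0])
```

In the second call the plain mean squared error would be 50, while the weighted loss stays small because the unreliable label is down-weighted.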

# ドメイン適応のためのドロップアウト判別器の探索

Exploring Dropout Discriminator for Domain Adaptation ( http://arxiv.org/abs/2107.04231v1 )

ライセンス: Link先を確認

Vinod K Kurmi and Venkatesh K Subramanian and Vinay P. Namboodiri

(参考訳) 新しいドメインへの分類器の適応は、機械学習における難しい問題の1つである。これは多くの深層学習と非深層学習に基づく手法を用いて解決されている。提案手法のうち,多くの深層学習問題とドメイン適応を両立させるために,逆学習の手法が広く適用されている。これらの方法は、ソースとターゲットの分布が近いことを保証する判別器に基づいている。しかし, 一つの判別器で得られる点推定を用いるのではなく, 判別器のアンサンブルに基づく分布を用いてこのギャップを橋渡しすることは有用であると考えられる。これは複数の分類器や従来のアンサンブル方式で実現できる。対照的に,モンテカルロのドロップアウトに基づくアンサンブル判別器は,分布に基づく判別器を得るのに十分である可能性が示唆された。具体的には,サンプルベース分布のばらつきを徐々に増加させ,それに対応する逆勾配を用いて特徴表現の調整を行うカリキュラムベースのドロップアウト判別器を提案する。判別器のアンサンブルは、モデルがデータ分布を効率的に学習するのに役立つ。さらに、機能抽出子をトレーニングするための勾配推定も改善されている。詳細な結果と徹底的なアブレーション解析により,本モデルが最先端の結果を上回っていることが示された。

Adaptation of a classifier to new domains is one of the challenging problems in machine learning. This has been addressed using many deep and non-deep learning based methods. Among the methodologies used, adversarial learning is widely applied to solve many deep learning problems, including domain adaptation. These methods are based on a discriminator that ensures source and target distributions are close. However, here we suggest that rather than using a point estimate obtained by a single discriminator, it would be useful if a distribution based on ensembles of discriminators could be used to bridge this gap. This could be achieved using multiple classifiers or using traditional ensemble methods. In contrast, we suggest that a Monte Carlo dropout based ensemble discriminator could suffice to obtain the distribution-based discriminator. Specifically, we propose a curriculum-based dropout discriminator that gradually increases the variance of the sample-based distribution, and the corresponding reverse gradients are used to align the source and target feature representations. An ensemble of discriminators helps the model to learn the data distribution efficiently. It also provides better gradient estimates to train the feature extractor. The detailed results and thorough ablation analysis show that our model outperforms state-of-the-art results.

翻訳日:2021-07-12 14:00:38 公開日:2021-07-09

# Heterogeneous Attention を用いた Levi Graph AMR Parser

Levi Graph AMR Parser using Heterogeneous Attention ( http://arxiv.org/abs/2107.04152v1 )

ライセンス: Link先を確認

Han He, Jinho D. Choi

(参考訳) バイファインデコーダと組み合わせて、トランスフォーマーはテキストからグラフへの変換に効果的に適応し、AMR解析における最先端のパフォーマンスを実現している。しかし、多くの先行研究はビスフィンデコーダをアークまたはラベルの予測に頼っているが、デコーダで使われているほとんどの特徴は変圧器で既に学習されている。本稿では,異種データ(トークン,概念,ラベル)を変換器への入力として組み合わせて注意を学習し,AMRグラフのすべての要素(概念,弧,ラベル)を予測するために,変換器からの注意行列のみを用いる,新しいAMR解析手法を提案する。我々のモデルでは、従来の最先端グラフパーサよりもパラメータが大幅に少ないが、AMR 2.0と3.0では類似またはより良い精度を示している。

Coupled with biaffine decoders, transformers have been effectively adapted to text-to-graph transduction and achieved state-of-the-art performance on AMR parsing. Many prior works, however, rely on the biaffine decoder for either or both arc and label predictions, although most features used by the decoder may be learned by the transformer already. This paper presents a novel approach to AMR parsing that combines heterogeneous data (tokens, concepts, labels) as one input to a transformer to learn attention, and uses only attention matrices from the transformer to predict all elements in AMR graphs (concepts, arcs, labels). Although our models use significantly fewer parameters than the previous state-of-the-art graph parser, they show similar or better accuracy on AMR 2.0 and 3.0.

翻訳日:2021-07-12 14:00:21 公開日:2021-07-09

# 深部ニューラルネットワークのための活性化勾配

Activated Gradients for Deep Neural Networks ( http://arxiv.org/abs/2107.04228v1 )

ライセンス: Link先を確認

Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, and Mingsheng Shang

(参考訳) ディープニューラルネットワークは、悪条件の問題、消失/爆発勾配問題、サドルポイント問題などにより、パフォーマンスの低下やトレーニングの失敗に苦しむことが多い。本稿では,勾配に勾配活性化関数(GAF)を作用させることにより,これらの課題に対処する新しい手法を提案する。直感的には、GAFは小さな勾配を拡大し、大きな勾配を制限する。理論的には、この論文は、GAFが満たすべき条件を与え、この条件に基づいて、GAFが上記の問題を緩和することを証明している。さらに, 本論文は, SGD のGAF との収束速度がGAF を含まない場合よりも速いことを証明する。さらに、CIFAR、ImageNet、PASCAL視覚オブジェクトクラスに関する実験により、GAFの有効性が確認された。また,実験結果から,提案手法が様々なディープニューラルネットワークに採用され,性能が向上することが示唆された。ソースコードはhttps://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networksで公開されている。

Deep neural networks often suffer from poor performance or even training failure due to the ill-conditioned problem, the vanishing/exploding gradient problem, and the saddle point problem. In this paper, a novel method that applies a gradient activation function (GAF) to the gradient is proposed to handle these challenges. Intuitively, the GAF enlarges tiny gradients and restricts large gradients. Theoretically, this paper gives conditions that the GAF needs to meet and, on this basis, proves that the GAF alleviates the problems mentioned above. In addition, this paper proves that the convergence rate of SGD with the GAF is faster than that without the GAF under some assumptions. Furthermore, experiments on CIFAR, ImageNet, and PASCAL visual object classes confirm the GAF's effectiveness. The experimental results also demonstrate that the proposed method can be adopted in various deep neural networks to improve their performance. The source code is publicly available at https://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networks.

翻訳日:2021-07-12 14:00:04 公開日:2021-07-09
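
The intuition stated in the abstract above, that a gradient activation function enlarges tiny gradients and restricts large ones, can be sketched with a scaled hyperbolic tangent. This is an illustrative assumption, not the function or the conditions from the paper: the choice of `tanh`, the parameter names `alpha` and `beta`, their default values, and the helper `sgd_step` are all hypothetical.

```python
import math


def gradient_activation(g, alpha=1.0, beta=5.0):
    """Hypothetical GAF: alpha * tanh(beta * g).

    Near zero the slope is alpha * beta (> 1 here), so tiny gradients grow;
    the output magnitude never exceeds alpha, so large gradients are bounded.
    The sign of the gradient is preserved.
    """
    return alpha * math.tanh(beta * g)


def sgd_step(weights, grads, lr=0.1):
    """One SGD update that passes each gradient through the GAF first."""
    return [w - lr * gradient_activation(g) for w, g in zip(weights, grads)]


small = gradient_activation(0.01)    # enlarged relative to 0.01
large = gradient_activation(10.0)    # bounded by alpha
updated = sgd_step([1.0, 1.0], [10.0, -10.0])
```

Whether this particular function satisfies the conditions derived in the paper is not checked here; the sketch only shows the enlarge-and-restrict behaviour.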

# RGBストリームで時間的アクション検出が可能に

RGB Stream Is Enough for Temporal Action Detection ( http://arxiv.org/abs/2107.04362v1 )

ライセンス: Link先を確認

Chenhao Wang, Hongxiang Cai, Yuxin Zou, Yichao Xiong

(参考訳) 現在最先端の時間的動作検出器は、RGBフレームと光学フローを含む2ストリーム入力に基づいている。RGBフレームと光学フローの組み合わせは性能を著しく向上させるが、光学フローは手作業で設計された表現であり、重い計算を必要とするだけでなく、2ストリーム手法がフローと共同でエンドツーエンドに学習されないことが多いという点で方法論的にも不満足である。本稿では、高精度な時間的動作検出には光学フローは不要であり、画像レベルのデータ拡張(ILDA)が光学フローを除去した際の性能劣化を回避する鍵となる解であると主張する。ILDAの有効性を評価するため、DaoTADという単一のRGBストリームをベースとした簡易かつ効率的な一段階の時間的動作検出器を設計した。以上の結果から、ILDAで訓練したDaoTADは既存の最先端2ストリーム検出器と同等の精度を持ちつつ、従来手法の推論速度を大きなマージンで上回り、GeForce GTX 1080 Tiで6668fpsの推論速度を達成した。コードは https://github.com/Media-Smart/vedatad で入手できる。

State-of-the-art temporal action detectors to date are based on two-stream input including RGB frames and optical flow. Although combining RGB frames and optical flow boosts performance significantly, optical flow is a hand-designed representation which not only requires heavy computation, but is also methodologically unsatisfactory in that two-stream methods are often not learned end-to-end jointly with the flow. In this paper, we argue that optical flow is dispensable in high-accuracy temporal action detection and that image level data augmentation (ILDA) is the key solution to avoid performance degradation when optical flow is removed. To evaluate the effectiveness of ILDA, we design a simple yet efficient one-stage temporal action detector based on a single RGB stream, named DaoTAD. Our results show that when trained with ILDA, DaoTAD has comparable accuracy with all existing state-of-the-art two-stream detectors while surpassing the inference speed of previous methods by a large margin, reaching 6668 fps on a GeForce GTX 1080 Ti. Code is available at https://github.com/Media-Smart/vedatad.

翻訳日:2021-07-12 13:59:47 公開日:2021-07-09

# 深部畳み込みニューラルネットワーク圧縮のための結合行列分解

Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression ( http://arxiv.org/abs/2107.04386v1 )

ライセンス: Link先を確認

Shaowu Chen, Jihao Zhou, Weize Sun, Lei Huang

(参考訳) 多数のパラメータを持つディープ畳み込みニューラルネットワーク(CNN)は膨大な計算資源を必要とし、リソースに制約のある機器へのCNNの適用を制限している。そのため、近年、分解に基づく手法がCNNの圧縮に利用されている。しかし、圧縮係数と性能は負の相関関係にあるため、最先端の研究は深刻な性能劣化に悩まされるか、低い圧縮係数にとどまっている。これらの課題を克服するため、層を個別に圧縮する従来研究とは異なり、結合行列分解によってCNNを圧縮し、性能劣化を軽減することを提案する。このアイデアは、CNNには多くの繰り返しモジュールがあり、同じ構造を持つ重みを同じ部分空間に投影することで、ネットワークをさらに圧縮し、加速することもできるという事実に着想を得ている。特に、3つの結合行列分解スキームを開発し、特異値分解に基づく対応する最適化手法を提案する。3つの挑戦的なコンパクトCNNと3つのベンチマークデータセットで広範な実験を行い、提案アルゴリズムの優れた性能を実証した。その結果、本手法は、いくつかの最先端手法と比べて精度低下を小さく抑えつつ、ResNet-34のサイズを22倍に圧縮できる。

Deep convolutional neural networks (CNNs) with a large number of parameters require huge computational resources, which has limited the application of CNNs on resource-constrained appliances. Decomposition-based methods, therefore, have been utilized to compress CNNs in recent years. However, since the compression factor and performance are negatively correlated, state-of-the-art works either suffer from severe performance degradation or achieve only low compression factors. To overcome these problems, unlike previous works that compress layers separately, we propose to compress CNNs and alleviate performance degradation via joint matrix decomposition. The idea is inspired by the fact that there are lots of repeated modules in CNNs, and by projecting weights with the same structures into the same subspace, networks can be further compressed and even accelerated. In particular, three joint matrix decomposition schemes are developed, and the corresponding optimization approaches based on Singular Value Decomposition are proposed. Extensive experiments are conducted across three challenging compact CNNs and three benchmark data sets to demonstrate the superior performance of our proposed algorithms. As a result, our methods can compress the size of ResNet-34 by 22x with less accuracy degradation than several state-of-the-art methods.

翻訳日:2021-07-12 13:59:27 公開日:2021-07-09

# 不確実性推定を用いたモデル・モデレータ協調の測定と改善

Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation ( http://arxiv.org/abs/2107.04212v1 )

ライセンス: Link先を確認

Ian D. Kivlichan, Zi Lin, Jeremiah Liu, Lucy Vasserman

(参考訳) コンテンツモデレーションは、人間と機械学習モデルのコラボレーションによって実行されることが多い。しかし,モデレータとモデルを組み合わせたシステムの性能を最大化するために,協調的なプロセスの設計方法がよく理解されていない。本研究は,協調的プロセスにモデル不確実性を取り込むアプローチに着目し,この問題を厳密に研究する。まず,人間のモデレーター上での容量制約下での協調システムの性能を記述するための原則付きメトリクスを導入し,組み合わせたシステムがいかに人的決定を効果的に活用するかを定量化する。これらの指標を用いて, 異なる協調的レビュー戦略の下で, 最先端の不確実性モデルの性能評価を行う。不確実性に基づく戦略は、毒性スコアに基づく広く使用されている戦略を一貫して上回っており、レビュー戦略の選択はシステム全体のパフォーマンスを劇的に変化させる。本研究は,コンテンツモデレーションのための効果的なモデレータモデルシステムを理解・開発するための厳密なメトリクスの重要性と,この領域における不確実性推定の有用性を示す。

Content moderation is often performed by a collaboration between humans and machine learning models. However, it is not well understood how to design the collaborative process so as to maximize the combined moderator-model system performance. This work presents a rigorous study of this problem, focusing on an approach that incorporates model uncertainty into the collaborative process. First, we introduce principled metrics to describe the performance of the collaborative system under capacity constraints on the human moderator, quantifying how efficiently the combined system utilizes human decisions. Using these metrics, we conduct a large benchmark study evaluating the performance of state-of-the-art uncertainty models under different collaborative review strategies. We find that an uncertainty-based strategy consistently outperforms the widely used strategy based on toxicity scores, and moreover that the choice of review strategy drastically changes the overall system performance. Our results demonstrate the importance of rigorous metrics for understanding and developing effective moderator-model systems for content moderation, as well as the utility of uncertainty estimation in this domain.

翻訳日:2021-07-12 13:59:08 公開日:2021-07-09
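
The contrast between an uncertainty-based review strategy and a toxicity-score-based one, as described in the abstract above, can be sketched as two ways of choosing which items go to a human moderator under a fixed capacity. This is a simplified illustration under assumptions: the function name, the use of `p * (1 - p)` as the uncertainty measure, and the example scores are hypothetical and are not taken from the paper.

```python
def select_for_review(scores, capacity, strategy="uncertainty"):
    """Return the indices of the items sent to the human moderator.

    scores: model-predicted toxicity probabilities in [0, 1].
    capacity: how many items the moderator can review.
    """
    if strategy == "uncertainty":
        # Assumed uncertainty measure: largest when the score is near 0.5.
        def key(i):
            return scores[i] * (1.0 - scores[i])
    elif strategy == "toxicity":
        # Baseline: review the items the model considers most toxic.
        def key(i):
            return scores[i]
    else:
        raise ValueError("unknown strategy: " + strategy)
    ranked = sorted(range(len(scores)), key=key, reverse=True)
    return sorted(ranked[:capacity])


# Hypothetical model scores for five comments.
toxicity_scores = [0.02, 0.48, 0.97, 0.55, 0.90]
by_uncertainty = select_for_review(toxicity_scores, capacity=2)
by_toxicity = select_for_review(toxicity_scores, capacity=2, strategy="toxicity")
```

With these scores the uncertainty strategy routes the two borderline comments to the moderator, while the toxicity strategy spends the same capacity on comments the model is already confident about.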

# 低リソースニューラルマシン翻訳に関する調査研究

A Survey on Low-Resource Neural Machine Translation ( http://arxiv.org/abs/2107.04239v1 )

ライセンス: Link先を確認

Rui Wang and Xu Tan and Renqian Luo and Tao Qin and Tie-Yan Liu

(参考訳) ニューラルアプローチは機械翻訳における最先端の精度を達成したが、大規模並列データ収集のコストが高い。したがって、非常に限られた並列データ、すなわち低リソース設定を持つニューラルマシン翻訳(nmt)について多くの研究が行われている。本稿では,低リソースNMTに関する調査を,(1)ソースおよび/またはターゲット言語の単言語データの活用,(2)補助言語からのデータの活用,(3)マルチモーダルデータの活用の3つのカテゴリに分類する。私たちの調査は、研究者がこの分野をより深く理解し、より良いアルゴリズムを設計するように促し、業界関係者がアプリケーションに適したアルゴリズムを選択するのに役立つことを期待しています。

Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large-scale parallel data. Thus, a lot of research has been conducted on neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey of low-resource NMT and classify related works into three categories according to the auxiliary data they use: (1) exploiting monolingual data of source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey can help researchers to better understand this field and inspire them to design better algorithms, and help industry practitioners to choose appropriate algorithms for their applications.

翻訳日:2021-07-12 13:58:49 公開日:2021-07-09

# UniRE: エンティティ関係抽出のための統一ラベル空間

UniRE: A Unified Label Space for Entity Relation Extraction ( http://arxiv.org/abs/2107.04292v1 )

ライセンス: Link先を確認

Yijun Wang, Changzhi Sun, Yuanbin Wu, Hao Zhou, Lei Li, and Junchi Yan

(参考訳) 多くのジョイントエンティティ関係抽出モデルは、2つのサブタスク(エンティティ検出と関係分類)に対して2つの分離されたラベル空間を設定する。この設定は、エンティティとリレーション間の情報相互作用を妨げる可能性がある。本研究では,2つのサブタスクのラベル空間における異なる処理の除去を提案する。我々のモデルの入力は、文から全ての単語対を含むテーブルである。実体と関係は表の中の正方形と矩形で表される。 2つのサブタスクの学習を統一する,各セルラベルの予測に統一型分類器を適用した。テストでは、テーブルから正方形と矩形を見つけるために有効な(より速い)近似デコーダを提案する。 3つのベンチマーク (ACE04, ACE05, SciERC) 実験の結果, パラメータの半数しか使用せず, 最適抽出器との競合精度が向上し, 高速であることがわかった。

Many joint entity relation extraction models set up two separate label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment of the two sub-tasks' label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell's label, which unifies the learning of the two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster.

翻訳日:2021-07-12 13:58:35 公開日:2021-07-09

# タスク指向NLG出力のローカライズに機械翻訳を用いる

Using Machine Translation to Localize Task Oriented NLG Output ( http://arxiv.org/abs/2107.04512v1 )

ライセンス: Link先を確認

Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano

(参考訳) Google Assistant、Siri、Alexaといったタスク指向自然言語アプリケーションの課題のひとつは、出力を多くの言語にローカライズすることだ。本稿では、英語の出力に機械翻訳を適用してこれを行う。機械翻訳を使うことは非常にスケーラブルで、あらゆる英語の出力で動作し、動的テキストを処理できる。要求される品質バーは完璧に近く、文章の範囲は非常に狭く、機械翻訳訓練データとは大きく異なることが多い。この要求の組み合わせは、機械翻訳のためのドメイン適応の分野では新しくなっている。既存のアイデアに基づいて、ドメイン内翻訳の微調整、Webからの文の追加、セマンティックアノテーションの追加、自動エラー検出など、必要な品質バーに到達することができます。論文は, 大規模翻訳モデルを実現するための蒸留モデルとともに, 我々のアプローチと結果を共有する。

One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The required quality bar is close to perfection, the range of sentences is extremely narrow, and the sentences are often very different than the ones in the machine translation training data. This combination of requirements is novel in the field of domain adaptation for machine translation. We are able to reach the required quality bar by building on existing ideas and adding new ones: finetuning on in-domain translations, adding sentences from the Web, adding semantic annotations, and using automatic error detection. The paper shares our approach and results, together with a distillation model to serve the translation models at scale.

翻訳日:2021-07-12 13:58:21 公開日:2021-07-09

# 代理説明を理解する: 複雑性と忠実さとカバレッジの相互作用

Understanding surrogate explanations: the interplay between complexity, fidelity and coverage ( http://arxiv.org/abs/2107.04309v1 )

ライセンス: Link先を確認

Rafael Poyiadzi, Xavier Renard, Thibault Laugel, Raul Santos-Rodriguez, Marcin Detyniecki

(参考訳) 本稿では,サロゲート説明の背後にある基本成分を分析し,その内部動作の理解を深める。我々は、グローバルサロゲートを考慮し、サロゲートの複雑さとブラックボックスがモデル化される忠実さの間のトレードオフを記述して、展示を開始する。グローバルからローカルへの移行 - カバー範囲の削減 - により、サロゲートの忠実度-複雑度のパレートフロンティアにおいて、より好ましい条件が実現できることが示される。複雑度,忠実度,カバレッジの相互作用を議論し,ユーザニーズの違いが制約やペナルティである問題定式化にどのようにつながるかを検討する。また,局所的な代用的解釈可能性の手順をインタラクティブにし,より良い説明につながることを示す実験を行った。

This paper analyses the fundamental ingredients behind surrogate explanations to provide a better understanding of their inner workings. We start our exposition by considering global surrogates, describing the trade-off between complexity of the surrogate and fidelity to the black-box being modelled. We show that transitioning from global to local - reducing coverage - allows for more favourable conditions on the Pareto frontier of fidelity-complexity of a surrogate. We discuss the interplay between complexity, fidelity and coverage, and consider how different user needs can lead to problem formulations where these are either constraints or penalties. We also present experiments that demonstrate how the local surrogate interpretability procedure can be made interactive and lead to better explanations.

翻訳日:2021-07-12 13:57:36 公開日:2021-07-09

# 深層学習における漁業情報のばらつきについて

On the Variance of the Fisher Information for Deep Learning ( http://arxiv.org/abs/2107.04205v1 )

ライセンス: Link先を確認

Alexander Soen and Ke Sun

(参考訳) Fisher InformationMatrix (FIM) はディープラーニングの領域に応用されている。これは損失の風景、パラメータの分散、二階最適化、ディープラーニング理論と密接に関連している。正確なFIMはクローズドな形で利用できないか、計算に高すぎるかのいずれかである。実際には、ほぼ常に経験的なサンプルに基づいて推定される。 FIMの2つの等価表現に基づく2つの推定器について検討する。これらはどちらも非バイアスであり、根底にある「真の」FIMに関して一貫性がある。その推定品質は、閉じた形で与えられる分散によって特徴づけられる。それらの分散を束縛し、ディープニューラルネットワークのパラメトリック構造が分散にどのように影響するかを分析する。本稿では,この分散尺度の意味と深層学習の文脈における境界について考察する。

The Fisher information matrix (FIM) has been applied to the realm of deep learning. It is closely related to the loss landscape, the variance of the parameters, second order optimization, and deep learning theory. The exact FIM is either unavailable in closed form or too expensive to compute. In practice, it is almost always estimated based on empirical samples. We investigate two such estimators based on two equivalent representations of the FIM. They are both unbiased and consistent with respect to the underlying "true" FIM. Their estimation quality is characterized by their variance given in closed form. We bound their variances and analyze how the parametric structure of a deep neural network can impact the variance. We discuss the meaning of this variance measure and our bounds in the context of deep learning.

翻訳日:2021-07-12 13:56:38 公開日:2021-07-09
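
The two sample-based estimators of the Fisher information discussed in the abstract above can be illustrated on a one-parameter toy model. This sketch assumes the two equivalent representations are the expected squared score and the negative expected Hessian of the log-likelihood, and it uses a zero-mean Gaussian with unknown variance rather than a neural network; the function name and the sample size are hypothetical.

```python
import random


def fisher_estimates(variance, n_samples, rng):
    """Estimate the Fisher information of N(0, variance) w.r.t. the variance.

    Returns (squared-score estimate, negative-Hessian estimate, exact value).
    The exact Fisher information is 1 / (2 * variance**2).
    """
    exact = 1.0 / (2.0 * variance ** 2)
    std = variance ** 0.5
    sq_score_sum = 0.0
    neg_hessian_sum = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, std)  # sample drawn from the model itself
        # First derivative of log p(x) with respect to the variance.
        score = -1.0 / (2.0 * variance) + x * x / (2.0 * variance ** 2)
        # Negative second derivative of log p(x) with respect to the variance.
        neg_hessian = x * x / variance ** 3 - 1.0 / (2.0 * variance ** 2)
        sq_score_sum += score * score
        neg_hessian_sum += neg_hessian
    return sq_score_sum / n_samples, neg_hessian_sum / n_samples, exact


est_score, est_hessian, exact = fisher_estimates(1.0, 20000, random.Random(0))
```

Both estimates are unbiased for the same quantity, but their per-sample variances differ (3.5 versus 2 for unit variance in this toy model), which is the kind of difference the paper studies for deep networks.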

# 多頭部ニューラルアンサンブル探索

Multi-headed Neural Ensemble Search ( http://arxiv.org/abs/2107.04369v1 )

ライセンス: Link先を確認

Ashwin Raaghav Narayanan, Arber Zela, Tonmoy Saikia, Thomas Brox, Frank Hutter

(参考訳) 異なる種(ディープ・アンサンブルとしても知られる)で訓練されたCNNモデルのアンサンブルは、CNNの単一コピーよりも優れたパフォーマンスを達成することが知られている。 Neural Ensemble Search (NES)は、アーキテクチャの多様性を追加することでパフォーマンスをさらに向上させることができる。しかし、nesの範囲は限られた計算資源で制限されている。本研究では,マルチヘッドアンサンブルにnesを拡張し,複数の予測ヘッドに共有バックボーンを付加した。 Deep Ensemblesとは異なり、これらのマルチヘッドアンサンブルはエンドツーエンドで訓練できるため、ワンショットNASメソッドを利用してアンサンブルの目的を最適化することができる。実験により,マルチヘッド型アンサンブル検索は,他のアンサンブル検索手法と比較して3倍高速に動作し,予測性能と不確かさの両面で高い性能を示した。

Ensembles of CNN models trained with different seeds (also known as Deep Ensembles) are known to achieve superior performance over a single copy of the CNN. Neural Ensemble Search (NES) can further boost performance by adding architectural diversity. However, the scope of NES remains prohibitive under limited computational resources. In this work, we extend NES to multi-headed ensembles, which consist of a shared backbone attached to multiple prediction heads. Unlike Deep Ensembles, these multi-headed ensembles can be trained end to end, which enables us to leverage one-shot NAS methods to optimize an ensemble objective. With extensive empirical evaluations, we demonstrate that multi-headed ensemble search finds robust ensembles 3 times faster, while having comparable performance to other ensemble search methods, in both predictive performance and uncertainty calibration.

翻訳日:2021-07-12 13:56:29 公開日:2021-07-09

# 拡張GANフレームワークを用いたWhite-Box Cartoonization

White-Box Cartoonization Using An Extended GAN Framework ( http://arxiv.org/abs/2107.04551v1 )

ライセンス: Link先を確認

Amey Thakur, Hasan Rizvi, Mega Satish

(参考訳) 本研究では,既存のGANフレームワークを拡張し,実世界の写真やビデオから高品質なマンガ画像や映像を生成するホワイトボックス制御可能な画像の漫画化を開発するための,敵対的なプロセスによる生成モデルを推定するための新しいフレームワークを提案する。本システムの学習目的は, 表面表現, 構造表現, テクスチャ表現の3つの異なる表現に基づいている。表面表現は、画像の滑らかな表面を指す。構造表現はスパースカラーブロックと関連し、ジェネリックコンテンツを圧縮する。テクスチャ表現は、漫画画像のテクスチャ、曲線、特徴を示す。 Generative Adversarial Network (GAN)フレームワークは、画像を異なる表現に分解し、そこから学習して漫画画像を生成する。この分解により、フレームワークはより制御可能でフレキシブルになり、ユーザーは必要な出力に基づいて変更できる。このアプローチは、画像の明快さ、色、テクスチャ、形状を維持できるが、漫画のイメージの特徴は示さないという点で、過去のシステムを克服する。

In the present study, we propose a new framework for estimating generative models via an adversarial process. It extends an existing GAN framework to develop a white-box, controllable image cartoonization that can generate high-quality cartoonized images and videos from real-world photos and videos. The learning objectives of our system are based on three distinct representations: surface representation, structure representation, and texture representation. The surface representation refers to the smooth surface of the images. The structure representation relates to the sparse colour blocks and compresses generic content. The texture representation shows the texture, curves, and features in cartoon images. The Generative Adversarial Network (GAN) framework decomposes the images into different representations and learns from them to generate cartoon images. This decomposition makes the framework more controllable and flexible, which allows users to make changes based on the required output. This approach outperforms previous systems in maintaining the clarity, colours, textures, and shapes of images while still showing the characteristics of cartoon images.

翻訳日:2021-07-12 13:55:57 公開日:2021-07-09

# 複数の視覚領域のセマンティックセグメンテーション

Semantic Segmentation on Multiple Visual Domains ( http://arxiv.org/abs/2107.04326v1 )

ライセンス: Link先を確認

Floris Naber

(参考訳) セマンティクスのセグメンテーションモデルは、トレーニング対象のドメインでのみうまく動作し、トレーニング用のデータセットは不足しており、必要なピクセルレベルのアノテーションはコストがかかるため、ラベルスペースが小さいことが多い。したがって、複数の既存ドメインでのトレーニングモデルは出力ラベル空間を増大させることが望まれる。現在の研究では、マルチドメイントレーニングを使用してデータセット間の精度を改善する可能性があるが、手動ラベリングなしで3つの異なる非重複ドメインのデータセットにはまだ拡張されていない。本稿では,データセットの全クラスにまたがるラベル空間を作成することで,都市景観,SUIM,SUN RGB-Dのデータセットに対して,この手法を提案する。重複クラスはマージされ、クラスを分離して離散的な粒度が解決される。その結果、ハードウェアの性能が等しければ、マルチドメインモデルの精度は全てのベースラインモデルよりも高いことが示され、リソースに制限がないため、モデルは共通でないドメインからでも追加データから恩恵を受けることが示された。

Semantic segmentation models only perform well on the domain they are trained on, and datasets for training are scarce and often have small label-spaces because the required pixel-level annotations are expensive to make. Thus, training models on multiple existing domains is desirable to increase the output label-space. Current research shows that there is potential to improve accuracy across datasets by using multi-domain training, but this has not yet been successfully extended to datasets of three different non-overlapping domains without manual labelling. In this paper, a method for this is proposed for the datasets Cityscapes, SUIM and SUN RGB-D, by creating a label-space that spans all classes of the datasets. Duplicate classes are merged and discrepant granularity is solved by keeping classes separate. Results show that the multi-domain model has higher accuracy than all baseline models together, if hardware performance is equalized, as resources are not limitless, showing that models benefit from additional data even from domains that have nothing in common.
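The label-space construction described in this abstract (merge duplicate classes, keep all other classes separate) can be illustrated with a minimal sketch. It assumes that two classes count as duplicates exactly when their names match after lowercasing; the abstract does not state the matching rule, and the class lists below are shortened, hypothetical stand-ins rather than the real Cityscapes, SUIM, or SUN RGB-D label sets.

```python
from typing import Dict, List, Tuple


def build_unified_label_space(
    datasets: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, Dict[int, int]]]:
    """Merge per-dataset class lists into one label space.

    Classes that share a name are treated as duplicates and merged.
    Every other class keeps its own entry, so classes of differing
    granularity stay separate instead of being collapsed.
    """
    unified: List[str] = []                 # unified class names, in first-seen order
    index_of: Dict[str, int] = {}           # class name -> unified index
    mapping: Dict[str, Dict[int, int]] = {}  # dataset -> {local id -> unified id}
    for dataset_name, class_names in datasets.items():
        mapping[dataset_name] = {}
        for local_id, class_name in enumerate(class_names):
            key = class_name.strip().lower()
            if key not in index_of:
                index_of[key] = len(unified)
                unified.append(key)
            mapping[dataset_name][local_id] = index_of[key]
    return unified, mapping


# Hypothetical, heavily shortened class lists for illustration only.
example_datasets = {
    "cityscapes": ["road", "person", "car"],
    "suim": ["fish", "person", "plant"],
    "sun_rgbd": ["wall", "floor", "person", "plant"],
}
unified_labels, label_mapping = build_unified_label_space(example_datasets)
```

The per-dataset mapping is what a training loop would use to remap each annotation mask into the shared label space before computing a loss over all classes.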

翻訳日:2021-07-12 13:55:41 公開日:2021-07-09

# モバイルアプリケーションのためのマルチモーダルアイコンアノテーション

Multimodal Icon Annotation For Mobile Applications ( http://arxiv.org/abs/2107.04452v1 )

ライセンス: Link先を確認

Xiaoxue Zang, Ying Xu, Jindong Chen

(参考訳) 画面上の意味のあるUI要素のローカライズと分類を含むユーザインターフェース(UI)のアノテーションは、スクリーンリーダーやデバイスの音声制御といった多くのモバイルアプリケーションにとって重要なステップである。メニュー、検索、矢印といったオブジェクトアイコンを後方にアノテートすることは、画面上の明示的なラベルの欠如、画像との類似性、そしてそれらの多様な形状のため、特に困難である。既存の研究では、ビュー階層またはピクセルベースメソッドを使用してタスクに取り組む。モバイルプラットフォームのビュー階層機能は不完全あるいは不正確なことが多いため、Pixelベースのアプローチの方が一般的だが、リソースIDやコンテンツ記述などのビュー階層に命令情報を残している。本稿では,画素とビュー階層機能の両方の利点と,最先端のオブジェクト検出技術を活用する,新しいディープラーニングに基づくマルチモーダルアプローチを提案する。 ricoは72kのuiスクリーンショットからなる大規模なモバイルデザインデータセットで,29個のアイコンを手作業でアノテートすることにより,高品質のuiデータセットを作成する。実験の結果,マルチモーダルアプローチの有効性が示された。我々のモデルは、広く使われているオブジェクト分類ベースラインだけでなく、ピクセルベースのオブジェクト検出モデルよりも優れている。当社の研究は、ビュー階層とピクセル機能を組み合わせてui要素をアノテートする方法に光を当てています。

Annotating user interfaces (UIs), which involves localization and classification of meaningful UI elements on a screen, is a critical step for many mobile applications such as screen readers and voice control of devices. Annotating object icons, such as menu, search, and arrow backward, is especially challenging due to the lack of explicit labels on screens, their similarity to pictures, and their diverse shapes. Existing studies use either view hierarchy or pixel-based methods to tackle the task. Pixel-based approaches are more popular because view hierarchy features on mobile platforms are often incomplete or inaccurate; however, they leave out instructional information in the view hierarchy such as resource-ids or content descriptions. We propose a novel deep-learning-based multi-modal approach that combines the benefits of both pixel and view hierarchy features and leverages state-of-the-art object detection techniques. In order to demonstrate the utility provided, we create a high quality UI dataset by manually annotating the most commonly used 29 icons in Rico, a large scale mobile design dataset consisting of 72k UI screenshots. The experimental results indicate the effectiveness of our multi-modal approach. Our model outperforms not only a widely used object classification baseline but also pixel-based object detection models. Our study sheds light on how to combine view hierarchy with pixel features for annotating UI elements.

翻訳日:2021-07-12 13:55:24 公開日:2021-07-09

# 説明可能性の方法を選ぶには? XAIの実践的実践に向けて

How to choose an Explainability Method? Towards a Methodical Implementation of XAI in Practice ( http://arxiv.org/abs/2107.04427v1 )

ライセンス: Link先を確認

Tom Vermeire and Thibault Laugel and Xavier Renard and David Martens and Marcin Detyniecki

(参考訳) 説明責任は、規制イニシアチブと公共の意識の変化によって、自動意思決定を利用する組織にとって重要な要件になりつつある。この説明可能性を提供するための様々なアルゴリズム的手法がこの分野で導入されているが、機械学習コミュニティの既存の文献は、人間とコンピュータのインタフェースコミュニティでより研究されている利害関係者にはほとんど注意を払っていない。したがって、この説明可能性を望むか、提供する必要がある組織は、ユースケースに適した方法の選択に直面します。本稿では,利害関係者のニーズと説明方法のギャップを埋めるための方法論の必要性を論じる。我々は、ステークホルダーに説明責任を提供するプロセスにおいて、データサイエンティストを支援するために、この方法論を作成するための継続的な作業を示す。特に、私たちのコントリビューションには、XAIメソッドとユーザ要求(Appendixで書かれた)を特徴付ける文書が含まれています。

Explainability is becoming an important requirement for organizations that make use of automated decision-making due to regulatory initiatives and a shift in public awareness. Various and significantly different algorithmic methods to provide this explainability have been introduced in the field, but the existing literature in the machine learning community has paid little attention to the stakeholder whose needs are rather studied in the human-computer interface community. Therefore, organizations that want or need to provide this explainability are confronted with the selection of an appropriate method for their use case. In this paper, we argue there is a need for a methodology to bridge the gap between stakeholder needs and explanation methods. We present our ongoing work on creating this methodology to help data scientists in the process of providing explainability to stakeholders. In particular, our contributions include documents used to characterize XAI methods and user requirements (shown in Appendix), which our methodology builds upon.

翻訳日:2021-07-12 13:55:05 公開日:2021-07-09

# 光干渉計のビーム発散制御と連続動作空間との整合

Aligning an optical interferometer with beam divergence control and continuous action space ( http://arxiv.org/abs/2107.04457v1 )

ライセンス: Link先を確認

Stepan Makarenko, Dmitry Sorokin, Alexander Ulanov, A. I. Lvovsky

(参考訳) 強化学習は実世界の問題アプリケーションへの道を見つけ、シミュレーションされた環境から物理的な環境へ移行している。本研究では,片腕に共焦点望遠鏡を装着し,対応するビームの直径と発散を制御する光学マッハツェンダー干渉計の視覚に基づくアライメントを実装した。指数的スケーリングによって、2桁以上の範囲のアクションを処理することができます。我々のエージェントはドメインランダム化を模擬環境でのみ訓練する。実験的評価では、エージェントは既存のソリューションと人間の専門家とを著しく上回る。

Reinforcement learning is finding its way to real-world problem application, transferring from simulated environments to physical setups. In this work, we implement vision-based alignment of an optical Mach-Zehnder interferometer with a confocal telescope in one arm, which controls the diameter and divergence of the corresponding beam. We use a continuous action space; exponential scaling enables us to handle actions within a range of over two orders of magnitude. Our agent trains only in a simulated environment with domain randomizations. In an experimental evaluation, the agent significantly outperforms an existing solution and a human expert.
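The exponential scaling of continuous actions mentioned in this abstract can be sketched as follows. This is only an assumed form: the abstract does not give the formula, so the mapping, the function name `scale_action`, and the parameters `min_step` and `max_step` are hypothetical.

```python
import math


def scale_action(raw_action: float, min_step: float, max_step: float) -> float:
    """Map a normalised action in [-1, 1] to a signed step size.

    The magnitude grows exponentially from min_step (|raw_action| = 0)
    to max_step (|raw_action| = 1), so equal changes in the raw action
    correspond to equal ratios of step size. The sign of the raw action
    gives the direction of the step.
    """
    clipped = max(-1.0, min(1.0, raw_action))
    magnitude = min_step * (max_step / min_step) ** abs(clipped)
    return math.copysign(magnitude, clipped)
```

With `min_step = 1.0` and `max_step = 100.0`, the step magnitude spans two orders of magnitude. One consequence of this particular form is that a raw action of zero still yields the smallest step rather than no movement, so a real controller would likely need a separate way to express "do nothing".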

翻訳日:2021-07-12 13:54:49 公開日:2021-07-09

# 逆混合密度ネットワーク:衝突データから安全に運転する学習

Adversarial Mixture Density Networks: Learning to Drive Safely from Collision Data ( http://arxiv.org/abs/2107.04485v1 )

ライセンス: Link先を確認

Sampo Kuutti, Saber Fallah, Richard Bowden

(参考訳) 模倣学習は、予め記録されたデータに基づいて自律運転の制御方針を学ぶために広く使われている。しかしながら、模倣学習に基づくポリシーは、トレーニング分布外の状態に遭遇する際のエラーを複雑化する可能性があることが示されている。さらに、これらのエージェントは衝突を起こそうとする敵の道路利用者によって容易に利用できることが示されている。これらの欠点を克服するために、異なるデータセットから2つの分布を学習するAdversarial Mixture Density Networks (AMDN)を導入する。 1つ目は、自然主義的な人間の運転のデータセットから学んだ安全な行動の分布である。 2つ目は、衝突のデータセットから学んだ、衝突につながる可能性のある安全でない行動を表す分布である。トレーニング中、これらの2つの分布を利用して、2つの分布の類似性に基づいたさらなる損失を与える。衝突データセットのトレーニング時に、安全行動分布と非安全行動分布との類似性に基づいて安全行動分布を解析することにより、より堅牢で安全な制御ポリシーを得る。提案するamdnアプローチをユースケースに追従した車両で実証し,自然主義的および敵対的テスト環境下で評価する。その単純さにもかかわらず、amdnは純粋な模倣学習や標準混合密度ネットワークアプローチと比較して、学習した制御ポリシーの安全性に大きなメリットがあることを示している。

Imitation learning has been widely used to learn control policies for autonomous driving based on pre-recorded data. However, imitation learning based policies have been shown to be susceptible to compounding errors when encountering states outside of the training distribution. Further, these agents have been demonstrated to be easily exploitable by adversarial road users aiming to create collisions. To overcome these shortcomings, we introduce Adversarial Mixture Density Networks (AMDN), which learns two distributions from separate datasets. The first is a distribution of safe actions learned from a dataset of naturalistic human driving. The second is a distribution representing unsafe actions likely to lead to collision, learned from a dataset of collisions. During training, we leverage these two distributions to provide an additional loss based on the similarity of the two distributions. By penalising the safe action distribution based on its similarity to the unsafe action distribution when training on the collision dataset, a more robust and safe control policy is obtained. We demonstrate the proposed AMDN approach in a vehicle following use-case, and evaluate under naturalistic and adversarial testing environments. We show that despite its simplicity, AMDN provides significant benefits for the safety of the learned control policy, when compared to pure imitation learning or standard mixture density network approaches.
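The idea of an additional loss based on the similarity between the safe and unsafe action distributions can be illustrated with a deliberately simplified sketch. It assumes each distribution is a single one-dimensional Gaussian and measures similarity with the Bhattacharyya coefficient; the abstract does not say which similarity measure AMDN uses, and the actual method works with mixture densities, so both choices here are stand-ins.

```python
import math
from typing import Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation)


def gaussian_similarity(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float) -> float:
    """Bhattacharyya coefficient of two 1-D Gaussians, in (0, 1].

    Equals 1 when the two distributions are identical and approaches 0
    as they separate.
    """
    var_sum = sigma_a ** 2 + sigma_b ** 2
    scale = math.sqrt(2.0 * sigma_a * sigma_b / var_sum)
    return scale * math.exp(-((mu_a - mu_b) ** 2) / (4.0 * var_sum))


def total_loss(imitation_loss: float, safe: Gaussian, unsafe: Gaussian, weight: float = 1.0) -> float:
    """Imitation loss plus a penalty that grows as safe and unsafe actions overlap."""
    return imitation_loss + weight * gaussian_similarity(*safe, *unsafe)
```

Minimising the penalty term pushes the safe action distribution away from the unsafe one, which is the qualitative behaviour the abstract describes.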

翻訳日:2021-07-12 13:54:39 公開日:2021-07-09

# 教師-学生組立における継続的な学習 : 課題類似性の影響

Continual Learning in the Teacher-Student Setup: Impact of Task Similarity ( http://arxiv.org/abs/2107.04384v1 )

ライセンス: Link先を確認

Sebastian Lee and Sebastian Goldt and Andrew Saxe

(参考訳) 継続学習、すなわち多くのタスクを順番に学習する能力は、人工学習システムにとって重要である。しかし、ディープネットワークの標準的なトレーニング方法は、新しいタスクの学習が以前のタスクの知識を消去する壊滅的忘却に苦しむことが多い。壊滅的忘却という名称は問題を言い表しているが、タスク間の干渉の理論的理由は不明である。そこで本研究では、教師-学生設定において継続学習を研究することで、理論と実践のギャップを狭めようとする。教師-学生設定における2層ネットワークに関する過去の解析的研究を複数の教師に拡張する。各教師が異なるタスクを表すものとして、教師間の関係が、タスク切替時に学生が示す忘却や転移の量にどのように影響するかを検討する。最近の研究と同様に、タスクが類似した特徴に依存する場合、中程度のタスク類似性が最大の忘却をもたらすことがわかった。しかし、特徴の類似性はタスクが関係する1つの方法にすぎない。教師-学生アプローチは、読み出し(隠れ層から出力への重み)と特徴(入力から隠れ層への重み)のレベルでタスクの類似性を分離することを可能にする。両方の類似性、初期の転移/忘却率、最大の転移/忘却、長期的な転移/忘却の間に複雑な相互作用を見出す。これらの結果は、壊滅的忘却に寄与する様々な要因を明らかにするのに役立つ。

Continual learning-the ability to learn many tasks in sequence-is critical for artificial learning systems. Yet standard training methods for deep networks often suffer from catastrophic forgetting, where learning new tasks erases knowledge of earlier tasks. While catastrophic forgetting labels the problem, the theoretical reasons for interference between tasks remain unclear. Here, we attempt to narrow this gap between theory and practice by studying continual learning in the teacher-student setup. We extend previous analytical work on two-layer networks in the teacher-student setup to multiple teachers. Using each teacher to represent a different task, we investigate how the relationship between teachers affects the amount of forgetting and transfer exhibited by the student when the task switches. In line with recent work, we find that when tasks depend on similar features, intermediate task similarity leads to greatest forgetting. However, feature similarity is only one way in which tasks may be related. The teacher-student approach allows us to disentangle task similarity at the level of readouts (hidden-to-output weights) and features (input-to-hidden weights). We find a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting. Together, these results help illuminate the diverse factors contributing to catastrophic forgetting.

翻訳日:2021-07-12 13:53:52 公開日:2021-07-09

# 学級得点に基づく逆例検出のための学習

Learning to Detect Adversarial Examples Based on Class Scores ( http://arxiv.org/abs/2107.04435v1 )

ライセンス: Link先を確認

Tobias Uelwer, Felix Michels, Oliver De Candido

(参考訳) ディープニューラルネットワーク(DNN)に対する敵攻撃の脅威が増加する中、効率的な検出方法の研究はこれまで以上に重要である。本研究では,すでに訓練済みの分類モデルのクラススコアに基づいて,敵の攻撃検出を詳細に検討する。我々は,クラススコアでサポートベクターマシン(svm)を訓練し,逆例を検出することを提案する。本手法は,様々な攻撃によって発生する逆例を検出でき,多くの深層分類モデルに容易に適用できる。提案手法は,実装が容易でありながら,既存の手法と比較して検出率の向上を図っている。異なる深層分類モデルに対する広範な実証分析を行い、様々な最先端の敵攻撃について検討する。さらに,本手法は敵の攻撃の組み合わせを検出するのに優れていることを確かめた。本研究は, 訓練済みの分類モデルのクラススコアを用いて, 様々な敵攻撃を検出する可能性を示唆する。

Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks, and can be easily adapted to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of adversarial attacks. This work indicates the potential of detecting various adversarial attacks simply by using the class scores of an already trained classification model.

翻訳日:2021-07-12 13:52:57 公開日:2021-07-09

# ViTGAN:視覚変換器を用いたガン訓練

ViTGAN: Training GANs with Vision Transformers ( http://arxiv.org/abs/2107.04589v1 )

ライセンス: Link先を確認

Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

(参考訳) 近年、視覚変換器(ViT)は、視覚固有の誘導バイアスを少なくしながら、画像認識に競争力を発揮している。本稿では,このような観察を画像生成に拡張できるかどうかを検討する。この目的のために、我々はViTアーキテクチャをGAN(Generative Adversarial Network)に統合する。我々は,ganの既存の正規化手法が自己着脱に乏しく,訓練中に深刻な不安定性を引き起こすことを観察する。この問題を解決するために,我々は,新しい正規化手法を導入し,GANをViTでトレーニングする。 CIFAR-10、CelebA、LSUNの寝室データセット上で、我々のアプローチであるViTGANは最先端のCNNベースのStyleGAN2に匹敵する性能を実現している。

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such observation can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). We observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce novel regularization techniques for training GANs with ViTs. Empirically, our approach, named ViTGAN, achieves comparable performance to state-of-the-art CNN-based StyleGAN2 on CIFAR-10, CelebA, and LSUN bedroom datasets.

翻訳日:2021-07-12 13:52:44 公開日:2021-07-09

# ARC: 自動運転車の対向的ロバスト制御

ARC: Adversarially Robust Control Policies for Autonomous Vehicles ( http://arxiv.org/abs/2107.04487v1 )

ライセンス: Link先を確認

Sampo Kuutti, Saber Fallah, Richard Bowden

(参考訳) ディープニューラルネットワークは、さまざまなタスクの制御ポリシーを学習する能力を示している。しかしながら、これらのニューラルネットワークベースのポリシーは、敵対的エージェントによる悪用に弱いことが示されている。したがって、敵対者に対して堅牢な制御ポリシーを学ぶための技術を開発する必要がある。本稿では、Adversarially Robust Control (ARC)を導入し、主人公のポリシーと敵対者のポリシーを同じ損失に対してエンドツーエンドで訓練する。主人公の目的はこの損失を最大化することであり、敵対者はそれを最小化しようとする。提案したARCトレーニングを高速道路走行シナリオで実演する。ここでは、主人公が追従車両を制御し、敵対者が先行車両を制御する。敵対者のアンサンブルに対して主人公を訓練することにより、さまざまな敵対的戦略に一般化する、はるかに堅牢な制御ポリシーを学習する。このアプローチは、元のポリシーと比較して、新しい敵対者に対する衝突の回数を最大90.25%減少させることが示されている。また、補助的な蒸留損失を利用することにより、微調整された制御ポリシーは、元のトレーニング分布全体で性能低下を示さないことを示す。

Deep neural networks have demonstrated their capability to learn control policies for a variety of tasks. However, these neural network-based policies have been shown to be susceptible to exploitation by adversarial agents. Therefore, there is a need to develop techniques to learn control policies that are robust against adversaries. We introduce Adversarially Robust Control (ARC), which trains the protagonist policy and the adversarial policy end-to-end on the same loss. The aim of the protagonist is to maximise this loss, whilst the adversary is attempting to minimise it. We demonstrate the proposed ARC training in a highway driving scenario, where the protagonist controls the follower vehicle whilst the adversary controls the lead vehicle. By training the protagonist against an ensemble of adversaries, it learns a significantly more robust control policy, which generalises to a variety of adversarial strategies. The approach is shown to reduce the number of collisions against new adversaries by up to 90.25%, compared to the original policy. Moreover, by utilising an auxiliary distillation loss, we show that the fine-tuned control policy shows no drop in performance across its original training distribution.

翻訳日:2021-07-12 13:52:19 公開日:2021-07-09

# EasyCom:ノイズの多い環境で簡単にコミュニケーションできるアルゴリズムをサポートする拡張現実データセット

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments ( http://arxiv.org/abs/2107.04174v1 )

ライセンス: Link先を確認

Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

(参考訳) プラットフォームとしての拡張現実(AR)は、カクテルパーティー効果の低減を促進する可能性がある。将来のarヘッドセットは、さまざまな種類のセンサーからの情報を活用する可能性がある。ビームフォーミングや音声強調などのタスクにおける信号処理と機械学習アルゴリズムの訓練と試験には、高品質な代表データが必要である。著者の知る限り、出版時点では、ノイズの多い環境での動的動きと会話を伴う、エゴセントリックなマルチチャンネルオーディオとビデオの同期を含む利用可能なデータセットは存在しない。本研究では,ARメガネ装着者の会話改善のためのアルゴリズムのトレーニングやテストに有用な5時間以上のマルチモーダルデータを含むデータセットを記述,評価,リリースする。ベースライン法に対して,音声の可聴性,品質,信号対雑音比の改善結果を提供し,全試験指標で改善を示す。私たちがリリースするデータセットには、ARグラスのエゴセントリックなマルチチャネルマイクロフォンアレイオーディオ、広視野RGBビデオ、音声ソースポーズ、ヘッドセットマイクロフォンオーディオ、注釈付き音声アクティビティ、音声書き起こし、ヘッドバウンディングボックス、スピーチのターゲット、ソース識別ラベルが含まれています。我々は、カクテルパーティー問題に対するマルチモーダルARソリューションの研究を促進するために、このデータセットを作成し、リリースしています。

Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high quality representative data. To the best of the authors' knowledge, as of publication there are no available datasets that contain synchronized egocentric multi-channel audio and video with dynamic movement and conversations in a noisy environment. In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We provide speech intelligibility, quality and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics. The dataset we are releasing contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head bounding boxes, target of speech and source identification labels. We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.

翻訳日:2021-07-12 13:51:58 公開日:2021-07-09

# 質問応答システムにおける回答検証のための共同モデル

Joint Models for Answer Verification in Question Answering Systems ( http://arxiv.org/abs/2107.04217v1 )

ライセンス: Link先を確認

Zeyu Zhang, Thuy Vu, and Alessandro Moschitti

(参考訳) 本稿では,検索に基づく質問回答システム(QA)のコアコンポーネントである,回答文選択(AS2)モジュールによって提供される上位$k$の中から,正しい回答文を選択するためのジョイントモデルについて検討する。本研究は,一対の回答間の相互関連情報をモデル化することに関して,回答集合を効果的に活用するための重要なステップを示す。この目的のために三方向多重分類器を構築し,解答が他の解答を支持するか,反証するか,あるいは中立かを決定する。より具体的には、私たちのニューラルネットワークアーキテクチャは、最先端のAS2モデルとマルチクラス化器、およびすべてのコンポーネントを接続するジョイント層を統合しています。私たちは、WikiQA、TREC-QA、実世界のデータセットでモデルをテストしました。その結果,本モデルではAS2の新たな状態が得られた。

This paper studies joint models for selecting correct answer sentences among the top $k$ provided by answer sentence selection (AS2) modules, which are core components of retrieval-based Question Answering (QA) systems. Our work shows that a critical step to effectively exploit an answer set is modeling the interrelated information between pairs of answers. For this purpose, we build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one. More specifically, our neural architecture integrates a state-of-the-art AS2 model with the multi-classifier, and a joint layer connecting all components. We tested our models on WikiQA, TREC-QA, and a real-world dataset. The results show that our models obtain the new state of the art in AS2.

翻訳日:2021-07-12 13:51:36 公開日:2021-07-09

# 自動可読性評価のための相関グラフを用いたシンタクティックセンス埋め込みの学習

Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment ( http://arxiv.org/abs/2107.04268v1 )

ライセンス: Link先を確認

Xinying Qiu, Yuan Chen, Hanwu Chen, Jian-Yun Nie, Yuming Shen, Dawei Lu

(参考訳) 自動可読性評価のためのディープラーニングモデルは、一般的に、タスクの機械学習モデルで伝統的に使用される言語的特徴を捨てる。本稿では,言語的特徴に基づく構文的密埋め込みを学習することにより,言語的特徴をニューラルネットワークモデルに組み込むことを提案する。特徴間の関係に対処するため,特徴間の相関グラフを作成し,類似した特徴が類似の埋め込みによって表現されるように,それらの埋め込みを学習する。提案手法は, BERTのみのモデルを補完し, 自動可読性評価のための性能を著しく向上させることができることを示す。

Deep learning models for automatic readability assessment generally discard linguistic features traditionally used in machine learning models for the task. We propose to incorporate linguistic features into neural network models by learning syntactic dense embeddings based on linguistic features. To cope with the relationships between the features, we form a correlation graph among features and use it to learn their embeddings so that similar features will be represented by similar embeddings. Experiments with six data sets of two proficiency levels demonstrate that our proposed methodology can complement a BERT-only model to achieve significantly better performance for automatic readability assessment.
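The correlation graph among linguistic features mentioned in this abstract can be sketched in a few lines. The sketch assumes Pearson correlation and a fixed absolute-value threshold for adding an edge; the abstract does not specify either choice, and the feature names and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_correlation_graph(
    features: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return weighted edges between features whose |correlation| >= threshold."""
    names = sorted(features)
    edges: List[Tuple[str, str, float]] = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) >= threshold:
                edges.append((first, second, r))
    return edges


# Hypothetical per-document feature values for four documents.
example_features = {
    "avg_sentence_length": [10.0, 15.0, 20.0, 25.0],
    "avg_parse_tree_depth": [3.0, 4.0, 5.0, 6.0],
    "type_token_ratio": [0.5, 0.7, 0.4, 0.6],
}
edges = build_correlation_graph(example_features, threshold=0.8)
```

An embedding model could then use these edges as the graph over which feature embeddings are learned, so that strongly correlated features end up close together.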

翻訳日:2021-07-12 13:51:20 公開日:2021-07-09

# 持ち上げ動作モデルの安全学習

Safe Learning of Lifted Action Models ( http://arxiv.org/abs/2107.04169v1 )

ライセンス: Link先を確認

Brendan Juba, Hai S. Le, Roni Stern

(参考訳) ドメインモデルの作成は、古典的でドメインに依存しない計画であっても、非常に難しい知識エンジニアリングタスクです。この問題を解決する自然なアプローチは、観察からドメインモデルを学ぶことである。しかし、モデル学習アプローチは、しばしば安全保証を提供しない: 学習モデルは、アクションが適用できないときに適用可能であると仮定したり、アクションの効果を誤って捉えたりする可能性がある。これは実行時に失敗する計画を生成する可能性がある。一部のドメインでは、失敗のコストや失敗後のオンライン再計画ができないことから、このような失敗は許されない。このような環境では、他のエージェントや人間によって収集された観察に基づいて、すべての学習をオフラインで行う必要がある。この学習を通じて、成功が保証された計画を生成することがタスクとなる。これをモデルフリー計画問題と呼ぶ。先行研究は、古典計画におけるモデルフリー計画問題の解法を提案した。しかし、グラウンド化されたドメインの学習に限定されていたため、スケールしなかった。我々は、この先行研究を一般化し、リフトされたドメインに対する最初の安全なモデルフリー計画アルゴリズムを提案する。我々は、このアプローチの正しさを証明し、将来の問題を高い確率で解くのに必要な軌道数が、ドメインモデルの潜在的なサイズに対して線形であることを示す統計解析を提供する。また、12のIPCドメインに対して、すべての場合において高々2つの軌道で実際のアクションモデルを学習可能であることを示す実験を行った。

Creating a domain model, even for classical, domain-independent planning, is a notoriously hard knowledge-engineering task. A natural approach to solve this problem is to learn a domain model from observations. However, model learning approaches frequently do not provide safety guarantees: the learned model may assume actions are applicable when they are not, and may incorrectly capture actions' effects. This may result in generating plans that will fail when executed. In some domains such failures are not acceptable, due to the cost of failure or inability to replan online after failure. In such settings, all learning must be done offline, based on some observations collected, e.g., by some other agents or a human. Through this learning, the task is to generate a plan that is guaranteed to be successful. This is called the model-free planning problem. Prior work proposed an algorithm for solving the model-free planning problem in classical planning. However, they were limited to learning grounded domains, and thus they could not scale. We generalize this prior work and propose the first safe model-free planning algorithm for lifted domains. We prove the correctness of our approach, and provide a statistical analysis showing that the number of trajectories needed to solve future problems with high probability is linear in the potential size of the domain model. We also present experiments on twelve IPC domains showing that our approach is able to learn the real action model in all cases with at most two trajectories.
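The notion of a safe (conservative) action model in this abstract can be illustrated with a simplified propositional sketch. It is not the lifted algorithm the paper proposes. It assumes deterministic STRIPS-style actions with positive preconditions only and fully observed states, and every name in it is hypothetical. The learned preconditions of an action are the facts that held in every state where the action was observed, so the learned model never treats an action as applicable in a state that lacks a true precondition.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

State = FrozenSet[str]
Transition = Tuple[State, str, State]  # (state before, action name, state after)


def learn_safe_model(
    trajectories: Iterable[List[Transition]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Learn conservative preconditions and observed effects per action."""
    preconditions: Dict[str, Set[str]] = {}
    add_effects: Dict[str, Set[str]] = {}
    delete_effects: Dict[str, Set[str]] = {}
    for trajectory in trajectories:
        for before, action, after in trajectory:
            if action not in preconditions:
                # First observation: assume every fact that held is required.
                preconditions[action] = set(before)
                add_effects[action] = set()
                delete_effects[action] = set()
            else:
                # Later observations can only rule facts out as preconditions.
                preconditions[action] &= before
            add_effects[action] |= after - before
            delete_effects[action] |= before - after
    return preconditions, add_effects, delete_effects


# Two hypothetical observations of the same action in a blocks-style domain.
example_trajectories = [
    [(frozenset({"hand_empty", "on_table_a", "clear_a"}), "pick_up_a",
      frozenset({"holding_a"}))],
    [(frozenset({"hand_empty", "on_table_a", "clear_a", "clear_b"}), "pick_up_a",
      frozenset({"holding_a", "clear_b"}))],
]
pre, add, delete = learn_safe_model(example_trajectories)
```

In this example the second trajectory removes `clear_b` from the learned preconditions, because the action was also observed in a state where `clear_b` did not hold. More trajectories can only relax the learned preconditions toward the true ones.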

翻訳日:2021-07-12 13:51:10 公開日:2021-07-09

# オープンワールド新規企業における計画・実行・監視の統合:オープンワールドモノポリーソルバーを事例として

Integrating Planning, Execution and Monitoring in the presence of Open World Novelties: Case Study of an Open World Monopoly Solver ( http://arxiv.org/abs/2107.04303v1 )

ライセンス: Link先を確認

Sriram Gopalakrishnan, Utkarsh Soni, Tung Thai, Panagiotis Lymperopoulos, Matthias Scheutz, Subbarao Kambhampati

(参考訳) ゲーム・オブ・モノポリー(英: game of monopoly)は、最後のプレイヤー溶媒となること以外の固定的な目標がなく、プロパティの集合の独占やそれらの発展といった有用なサブゴールが存在する、敵対的マルチエージェントドメインである。 dice rolls、card-draws、adversariesの戦略からも多くのランダム性がある。この予測不可能性は、ゲームプレイ中に未知のノベルティを追加すると悪化する。これらの課題を考えると、モノポリーはDARPA-SAILONプログラムで選ばれたテストベッドの1つであり、新規性を検出して適応できるエージェントを作ることを目的としている。ゲームの複雑さに対処するため,我々は,ゲームが進化するにつれてオンラインの方針に適応するエージェントを開発した。 SAILONプログラムの最近の独立性評価では,ほとんどの指標において,我々のエージェントが最も優れたエージェントであった。ここでは、我々のアプローチと結果を示す。

The game of Monopoly is an adversarial multi-agent domain where there is no fixed goal other than to be the last player solvent. There are useful subgoals like monopolizing sets of properties, and developing them. There is also a lot of randomness from dice rolls, card-draws, and adversaries' strategies. This unpredictability is made worse when unknown novelties are added during gameplay. Given these challenges, Monopoly was one of the test beds chosen for the DARPA-SAILON program which aims to create agents that can detect and accommodate novelties. To handle the game complexities, we developed an agent that eschews complete plans, and adapts its policy online as the game evolves. In the most recent independent evaluation in the SAILON program, our agent was the best performing agent on most measures. We herein present our approach and results.

翻訳日:2021-07-12 13:50:51 公開日:2021-07-09

# 鉄道トポロジーオントロジー:鉄道インフラ基盤オントロジー

Rail Topology Ontology: A Rail Infrastructure Base Ontology ( http://arxiv.org/abs/2107.04378v1 )

ライセンス: Link先を確認

Stefan Bischof, Gottfried Schenner

(参考訳) 鉄道インフラのエンジニアリングプロジェクトは通常、計画され構築されたインフラとその基盤となるトポロジを一貫したビューを必要とする多くのサブシステムを含む。一貫性はXMLベースのデータフォーマットとUMLベースのオブジェクト指向モデルを使ってツール間でデータを交換し、検証することで保証される。共通のトポロジーモデルによるこれらのデータ表現のより緊密なアラインメントは、鉄道インフラエンジニアリングツールの開発労力を減少させる可能性がある。一般的な意味モデルは、鉄道知識グラフの導入の成功の前提条件でもある。レールトポモデル標準に基づき、鉄道インフラのコア特性を標準に準拠した形で表現するためのモデルとしてレールトポロジオントロジーを開発した。本稿では, オントロジーとその開発手法について述べるとともに, 鉄道工学系などのデータの統合性について, 知識グラフで考察する。レールトポロジーオントロジーにより、ソフトウェアエンジニアと知識科学者は、切断されたデータソースを統合するための鉄道トポロジーを表す標準オントロジーを持っている。私たちはレールトポロジオントロジーをレールナレッジグラフとして使用し、既存のデータ交換標準から派生したレールインフラストラクチャオントロジーによって拡張することを計画しています。

Engineering projects for railway infrastructure typically involve many subsystems which need consistent views of the planned and built infrastructure and its underlying topology. Consistency is typically ensured by exchanging and verifying data between tools using XML-based data formats and UML-based object-oriented models. A tighter alignment of these data representations via a common topology model could decrease the development effort of railway infrastructure engineering tools. A common semantic model is also a prerequisite for the successful adoption of railway knowledge graphs. Based on the RailTopoModel standard, we developed the Rail Topology Ontology as a model to represent core features of railway infrastructures in a standard-compliant manner. This paper describes the ontology and its development method, and discusses its suitability for integrating data of railway engineering systems and other sources in a knowledge graph. With the Rail Topology Ontology, software engineers and knowledge scientists have a standard-based ontology for representing railway topologies to integrate disconnected data sources. We use the Rail Topology Ontology for our rail knowledge graph and plan to extend it by rail infrastructure ontologies derived from existing data exchange standards, since many such standards use the same base model as the presented ontology, viz., RailTopoModel.

翻訳日:2021-07-12 13:50:35 公開日:2021-07-09

# Unity Perception:コンピュータビジョンのための合成データ生成

Unity Perception: Generate Synthetic Data for Computer Vision ( http://arxiv.org/abs/2107.04259v1 )

ライセンス: Link先を確認

Steve Borkman, Adam Crespi, Saurav Dhakad, Sujoy Ganguly, Jonathan Hogins, You-Cyuan Jhang, Mohsen Kamalzadeh, Bowen Li, Steven Leal, Pete Parisi, Cesar Romero, Wesley Smith, Alex Thaman, Samuel Warren, Nupur Yadav

(参考訳) 本稿では,コンピュータビジョンタスクのための合成データセット生成プロセスを簡素化し,高速化することを目的としたunity perception packageを紹介する。このオープンソースのパッケージはunityエディタとエンジンコンポーネントを拡張し、いくつかの一般的なコンピュータビジョンタスクの注釈付き例を生成する。さらに、ユーザが生成したデータセットにバリエーションを導入するために、ランダム化されたシミュレーションパラメータを迅速に構築、設定できる拡張可能なランダム化フレームワークも提供する。提案するツールの概要と動作方法,および2次元オブジェクト検出モデルをトレーニングすることにより生成した合成データセットの価値を実証する。主に合成データでトレーニングされたモデルは、実際のデータのみを使用してトレーニングされたモデルよりも優れている。

We introduce the Unity Perception package which aims to simplify and accelerate the process of generating synthetic datasets for computer vision tasks by offering an easy-to-use and highly customizable toolset. This open-source package extends the Unity Editor and engine components to generate perfectly annotated examples for several common computer vision tasks. Additionally, it offers an extensible Randomization framework that lets the user quickly construct and configure randomized simulation parameters in order to introduce variation into the generated datasets. We provide an overview of the provided tools and how they work, and demonstrate the value of the generated synthetic datasets by training a 2D object detection model. The model trained with mostly synthetic data outperforms the model trained using only real data.

翻訳日:2021-07-12 13:49:50 公開日:2021-07-09

# Wavelet Transform-assisted Adaptive Generative Modeling for Colorization

Wavelet Transform-assisted Adaptive Generative Modeling for Colorization ( http://arxiv.org/abs/2107.04261v1 )

ライセンス: Link先を確認

Jin Li, Wanyun Li, Zichen Xu, Yuhao Wang, Qiegen Liu

(参考訳) 教師なしのディープラーニングは、最近高品質なサンプルを生成するという約束を実証した。画像の着色タスクを促進する可能性は非常に高いが、機械学習における多様体仮説により性能は限られている。本研究では,ウェーブレット領域におけるスコアベース生成モデルを利用した新しい手法を提案する。ウェーブレット変換によるマルチスケール・マルチチャネル表現を利用して,重畳されたウェーブレット係数成分から先行成分を学習し,粗い周波数スペクトルと詳細周波数スペクトルを併用して画像特性を学習する。さらに、逆最適化のない高フレキシブルな生成モデルは、ウェーブレット領域における二重整合項、すなわちデータ一貫性と構造整合性の下で、より優れた色付けタスクを実行することができる。具体的には、トレーニングフェーズにおいて、ウェーブレット係数からなるマルチチャネルテンソルのセットを入力として、スコアマッチングを識別してネットワークをトレーニングする。テストフェーズでは、サンプルはデータと構造からなるアニールランジュバンダイナミクスを介して反復的に生成される。実験により, 提案モデルが着色品質, 特に着色性, 多様性に顕著な改善が認められた。

Unsupervised deep learning has recently demonstrated the promise to produce high-quality samples. While it has tremendous potential to promote the image colorization task, the performance is limited owing to the manifold hypothesis in machine learning. This study presents a novel scheme that exploits the score-based generative model in the wavelet domain to address the issue. By taking advantage of the multi-scale and multi-channel representation via wavelet transform, the proposed model learns the priors from stacked wavelet coefficient components, and thus learns the image characteristics under coarse and detail frequency spectrums jointly and effectively. Moreover, such a highly flexible generative model without adversarial optimization can execute colorization tasks better under dual consistency terms in the wavelet domain, namely data-consistency and structure-consistency. Specifically, in the training phase, a set of multi-channel tensors consisting of wavelet coefficients is used as the input to train the network by denoising score matching. In the test phase, samples are iteratively generated via annealed Langevin dynamics with data and structure consistencies. Experiments demonstrated remarkable improvements of the proposed model on colorization quality, particularly on colorization robustness and diversity.

翻訳日:2021-07-12 13:49:38 公開日:2021-07-09
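
As an illustrative aside to the wavelet colorization abstract above: the "stacked wavelet coefficient components" can be pictured with a one-level 2D Haar transform, which splits each image channel into one coarse and three detail sub-bands that are then stacked as channels. The sketch below is a minimal pure-Python illustration only. The choice of the Haar wavelet, the single decomposition level, and the names `haar_dwt2` and `stack_subbands` are assumptions, since the abstract does not specify them.

```python
def haar_dwt2(img):
    """One-level 2D orthonormal Haar transform (hypothetical helper).

    `img` is a list of equal-length rows with even height and width.
    Returns four sub-bands, each half the input size in both directions:
    one coarse band followed by three detail bands.
    """
    if len(img) % 2 or len(img[0]) % 2:
        raise ValueError("height and width must be even")
    coarse, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, len(img), 2):
        row_c, row_dc, row_dr, row_dd = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            row_c.append((a + b + d + e) / 2.0)   # scaled local sum (coarse band)
            row_dc.append((a - b + d - e) / 2.0)  # left column minus right column
            row_dr.append((a + b - d - e) / 2.0)  # top row minus bottom row
            row_dd.append((a - b - d + e) / 2.0)  # diagonal difference
        coarse.append(row_c)
        d_cols.append(row_dc)
        d_rows.append(row_dr)
        d_diag.append(row_dd)
    return coarse, d_cols, d_rows, d_diag


def stack_subbands(channels):
    """Stack the four sub-bands of every input channel into one channel list."""
    stacked = []
    for channel in channels:
        stacked.extend(haar_dwt2(channel))
    return stacked
```

Under these assumptions, a three-channel image yields a 12-channel stack at half the spatial resolution, which is the kind of multi-channel tensor a score network could be trained on.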

# 一般医用画像セグメンテーションの堅牢化に向けて

Towards Robust General Medical Image Segmentation ( http://arxiv.org/abs/2107.04263v1 )

ライセンス: Link先を確認

Laura Daza, Juan C. P\'erez, Pablo Arbel\'aez

(参考訳) 深層学習システムの信頼性は,その精度に依存するだけでなく,入力データに対する逆摂動に対する頑健性にも依存する。自然画像領域における対向ノイズの存在下でのディープニューラルネットワークの性能向上のために,いくつかの攻撃と防御が提案されている。しかしながら、ボリュームデータに対するコンピュータ支援診断のロバスト性は、特定のタスクと限られた攻撃でのみ研究されている。一般医用画像分割システムの堅牢性を評価するための新しい枠組みを提案する。 i)最近のAutoAttack自然画像分類フレームワークをボリュームデータセグメンテーションの領域に拡張することにより,医療セグメンテーション宣言(MSD)の文脈における堅牢性を評価するための新しいベンチマークを提案し,(ii)RObust Generic Medical Image segmentation(ROG)のための新しい格子アーキテクチャを提案する。以上の結果から,ROGはMSDの様々なタスクにまたがる一般化が可能であり,高度な敵攻撃下での最先端技術を上回ることが示唆された。

The reliability of Deep Learning systems depends not only on their accuracy but also on their robustness against adversarial perturbations to the input data. Several attacks and defenses have been proposed to improve the performance of Deep Neural Networks in the presence of adversarial noise in the natural image domain. However, robustness in computer-aided diagnosis for volumetric data has only been explored for specific tasks and with limited attacks. We propose a new framework to assess the robustness of general medical image segmentation systems. Our contributions are two-fold: (i) we propose a new benchmark to evaluate robustness in the context of the Medical Segmentation Decathlon (MSD) by extending the recent AutoAttack natural image classification framework to the domain of volumetric data segmentation, and (ii) we present a novel lattice architecture for RObust Generic medical image segmentation (ROG). Our results show that ROG is capable of generalizing across different tasks of the MSD and largely surpasses the state-of-the-art under sophisticated adversarial attacks.

翻訳日:2021-07-12 13:49:19 公開日:2021-07-09

# 先導型マルチビュー3次元頭部再構成

Prior-Guided Multi-View 3D Head Reconstruction ( http://arxiv.org/abs/2107.04277v1 )

ライセンス: Link先を確認

Xueying Wang, Yudong Guo, Zhongqi Yang and Juyong Zhang

(参考訳) 顔と髪の領域を含む3Dヘッドモデルの復元は、コンピュータビジョンとグラフィックスにおいて依然として難しい問題である。本稿では,複数視点のポートレート画像を入力として,この問題を考える。従来のマルチビューステレオ法は、最適化戦略またはディープラーニング技術に基づいており、不明瞭な頭部構造や毛髪領域における不正確な再構成といった低周波の幾何学的構造に苦しむ。この問題に対処するために,先導型暗黙的ニューラルネットワークを提案する。具体的には,頭部形状を学習可能な符号付き距離場(SDF)でモデル化し,顔の事前知識,頭部意味的セグメンテーション情報,2Dヘアオリエンテーションマップなどを含む,暗黙の微分可能レンダラーを用いて最適化する。これらの先行技術を利用することで、復元精度とロバスト性が向上し、高品質な3Dヘッドモデルが実現される。広範なアブレーション研究と最新手法との比較により,本手法が先行手法の指導により高精度な3次元頭部形状を生成できることが証明された。

Recovering a 3D head model including the complete face and hair regions is still a challenging problem in computer vision and graphics. In this paper, we consider this problem with a few multi-view portrait images as input. Previous multi-view stereo methods, based on either optimization strategies or deep learning techniques, suffer from low-frequency geometric structures such as unclear head structures and inaccurate reconstruction in hair regions. To tackle this problem, we propose a prior-guided implicit neural rendering network. Specifically, we model the head geometry with a learnable signed distance field (SDF) and optimize it via an implicit differentiable renderer with the guidance of some human head priors, including facial prior knowledge, head semantic segmentation information and 2D hair orientation maps. The use of these priors improves the reconstruction accuracy and robustness, leading to a high-quality integrated 3D head model. Extensive ablation studies and comparisons with state-of-the-art methods demonstrate that our method can produce high-fidelity 3D head geometries with the guidance of these priors.

翻訳日:2021-07-12 13:49:00 公開日:2021-07-09

# ビデオオブジェクトセグメンテーションのための高速画素マッチング

Fast Pixel-Matching for Video Object Segmentation ( http://arxiv.org/abs/2107.04279v1 )

ライセンス: Link先を確認

Siyue Yu, Jimin Xiao, BingFeng Zhang, Eng Gee Lim

(参考訳) 第1フレームのアノテーションによる前景オブジェクトのセグメント化を目的としたビデオオブジェクトセグメンテーションが注目されている。多くの最先端のアプローチは、オンラインモデル更新やマスクプロパゲーション技術に頼ることで、優れたパフォーマンスを実現している。しかし、ほとんどのオンラインモデルは推論中のモデル微調整のために高い計算コストを必要とする。ほとんどのマスクプロパゲーションベースのモデルは高速だが、オブジェクトの外観の変化に適応できないため比較的性能が低い。本稿では,速度と性能のバランスを良くするために,新しいモデルを設計することを目的としている。マスクプロパゲーションと非局所的手法に基づいて、参照フレームとターゲットフレームの画素をマッチングすることにより、前景オブジェクトを直接ローカライズするNPMCA-netモデルを提案する。最初のフレームと前のフレームの両方の情報をもたらすので、我々のネットワークは大きなオブジェクトの外観変化に対して堅牢であり、オクルージョンに適応できる。実験の結果,同水準の比較において,新たな最先端性能と高速性を同時に達成できることがわかった(DAVIS-2016では86.5% IoU,DAVIS-2017では72.2% IoU,フレーム当たり0.11秒の速度)。ソースコードは https://github.com/siyueyu/NPMCA-net で入手できる。

Video object segmentation, which aims to segment the foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches have achieved great performance by relying on online model updating or mask-propagation techniques. However, most online models incur a high computational cost due to model fine-tuning during inference. Most mask-propagation based models are faster but perform relatively poorly because they fail to adapt to object appearance variation. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask-propagation and a non-local technique by matching pixels in reference and target frames. Since we bring in information from both the first and the previous frames, our network is robust to large object appearance variation and can better adapt to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance and fast speed at the same time (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11s per frame) under same-level comparison. Source code is available at https://github.com/siyueyu/NPMCA-net.

翻訳日:2021-07-12 13:48:40 公開日:2021-07-09

# JPGNet:イメージインペイントのための共同予測フィルタと生成ネットワーク

JPGNet: Joint Predictive Filtering and Generative Network for Image Inpainting ( http://arxiv.org/abs/2107.04281v1 )

ライセンス: Link先を確認

Xiaoguang Li and Qing Guo and Felix Juefei-Xu and Hongkai Yu and Yang Liu and Song wang

(参考訳) 画像インペインティングは、画像の自然性を強調する共通生成タスクとは異なる、欠落した領域を復元し、元の完全画像と同一の回復結果を得ることを目的としている。それにもかかわらず、既存の作品では、通常は純粋な生成問題と見なされ、それに対処するために最先端の生成技術を使用している。生成ネットワークは、主要な欠落した部分を現実的な内容で埋めるが、通常は局所構造を歪ませる。本稿では,画像インペインティングを,予測フィルタリングと深層生成という2つの問題の混合として定式化する。予測フィルタリングは、ローカルな構造の保存とアーティファクトの除去に優れているが、大きな欠落した領域を完遂するには不足している。ディープジェネレーティブネットワークは、シーン全体の理解に基づいて多数の欠落画素を満たすことができるが、元のピクセルと同じ詳細を復元することはほとんどない。それぞれの利点を利用するために,予測フィルタリング・不確実性ネットワーク(PFUNet),深層生成ネットワーク(UAFNet),不確実性認識融合ネットワーク(UAFNet)の3分野を含む共同予測フィルタリング・生成ネットワーク(JPGNet)を提案する。 PFUNetは、入力画像に応じてフィルタリングベースの塗布のための画素単位のカーネルを適応的に予測し、不確実性マップを出力する。このマップは、ピクセルはフィルタリングまたは生成ネットワークによって処理されるべきであり、フィルタリングと生成結果の間のスマートな組み合わせのためにさらにuafnetに供給されることを示している。画像インペイント問題に対する新しいフレームワークとしての本手法は,既存の世代ベース手法の恩恵を受けることができる。我々は,Dunhuang,Places2,CelebAの3つの公開データセットに対して本手法の有効性を検証し,この手法が3つの最先端生成手法(StructFlow,EdgeConnect,RFRNet)を大幅に拡張できることを示す。

Image inpainting aims to restore the missing regions and make the recovered result identical to the originally complete image, which differs from the common generative task that emphasizes the naturalness of generated images. Nevertheless, existing works usually regard it as a pure generation problem and employ cutting-edge generative techniques to address it. The generative networks fill the main missing parts with realistic content but usually distort the local structures. In this paper, we formulate image inpainting as a mix of two problems, i.e., predictive filtering and deep generation. Predictive filtering is good at preserving local structures and removing artifacts but falls short in completing large missing regions. The deep generative network can fill numerous missing pixels based on an understanding of the whole scene but hardly restores details identical to the original ones. To make use of their respective advantages, we propose the joint predictive filtering and generative network (JPGNet), which contains three branches: a predictive filtering & uncertainty network (PFUNet), a deep generative network, and an uncertainty-aware fusion network (UAFNet). The PFUNet adaptively predicts pixel-wise kernels for filtering-based inpainting according to the input image and outputs an uncertainty map. This map indicates whether each pixel should be processed by the filtering or the generative network, and is further fed to the UAFNet for a smart combination of the filtering and generative results. Note that our method, as a novel framework for the image inpainting problem, can benefit any existing generation-based method. We validate our method on three public datasets, i.e., Dunhuang, Places2, and CelebA, and demonstrate that it significantly enhances three state-of-the-art generative methods (i.e., StructFlow, EdgeConnect, and RFRNet) at a slight extra time cost.

翻訳日:2021-07-12 13:48:14 公開日:2021-07-09

# 野生のミーム: 有害ミームチャレンジデータセットの一般化可能性を評価する

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset ( http://arxiv.org/abs/2107.04313v1 )

ライセンス: Link先を確認

Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano

(参考訳) ヘイトフルミームは、メッセージがテキストとビジュアルの両方から派生しているため、現在の機械学習システムにとってユニークな課題となる。この効果のためにFacebookは、事前抽出されたテキストキャプションを備えたミームのデータセットであるHateful Memes Challengeをリリースしたが、これらの合成例が「野生のミーム」に一般化されるかどうかは不明である。本稿では、facebookデータセットで事前トレーニングされたモデル上でのサンプル外のパフォーマンスを評価するため、pinterestから嫌悪感と不快感のないミームを収集する。 1)キャプションをocrで抽出し、ノイズを注入し、マルチモーダルモデルの性能を低下させ、2)ミームは会話のスクリーンショットやプレーンな背景のテキストを含む「伝統的なミーム」よりも多様である。そこで本論文は,現在行われているヘイトフルミーム検出のベンチマークと,その実世界ヘイト検出への適用性について検討する。

Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text and visual modalities. To this end, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to 'memes in the wild'. In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate the out-of-sample performance of models pre-trained on the Facebook dataset. We find that memes in the wild differ in two key aspects: 1) captions must be extracted via OCR, which injects noise and diminishes the performance of multimodal models, and 2) memes are more diverse than 'traditional memes', including screenshots of conversations or text on a plain background. This paper thus serves as a reality check for the current benchmark of hateful meme detection and its applicability to detecting real-world hate.

翻訳日:2021-07-12 13:47:38 公開日:2021-07-09

# 共同適応注意とグラフ関係を用いた行動単位検出

Action Unit Detection with Joint Adaptive Attention and Graph Relation ( http://arxiv.org/abs/2107.04389v1 )

ライセンス: Link先を確認

Chenggong Zhang and Juan Song and Qingyang Zhang and Weilong Dong and Ruomeng Ding and Zhilei Liu

(参考訳) 本稿では,顔行動単位(AU)検出へのアプローチについて述べる。本研究では,ABAW(Field Affective Behavior Analysis)2021コンペティションに応募する。提案手法は,事前学習したJAAモデルを特徴抽出器として使用し,マルチスケール特徴に基づいてグローバル特徴,顔アライメント特徴,AU局所特徴を抽出する。我々は、AUの局所的な特徴をグラフ畳み込みの入力として、AU間の相関をさらに考慮し、最終的に融合した特徴を用いてAUを分類する。検出精度は0.5*精度+0.5*F1。 aff-wild2データベース上で0.674。

This paper describes an approach to facial action unit (AU) detection. In this work, we present our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2021 competition. The proposed method uses the pre-trained JAA model as the feature extractor, and extracts global features, face alignment features and AU local features on the basis of multi-scale features. We take the AU local features as the input of a graph convolution to further consider the correlations between AUs, and finally use the fused features to classify AUs. Detection performance is evaluated as 0.5*accuracy + 0.5*F1. Our model achieves 0.674 on the challenging Aff-Wild2 database.

翻訳日:2021-07-12 13:47:20 公開日:2021-07-09
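
The evaluation formula quoted in the abstract above, 0.5*accuracy + 0.5*F1, can be made concrete with a small sketch. The function names below are hypothetical, and the code scores a single binary label sequence. How the competition averages F1 over several action units is not stated in the abstract, so that step is deliberately left out.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def au_score(y_true, y_pred):
    """Combined score from the abstract: 0.5*accuracy + 0.5*F1."""
    return 0.5 * accuracy(y_true, y_pred) + 0.5 * f1_score(y_true, y_pred)
```

For labels `[1, 0, 1, 1]` and predictions `[1, 0, 0, 1]`, accuracy is 0.75 and F1 is 0.8, so the combined score is 0.775.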

# 形態構造抽出のためのマルチモーダルアソシエーションに基づくグループ化

Multi-Modal Association based Grouping for Form Structure Extraction ( http://arxiv.org/abs/2107.04396v1 )

ライセンス: Link先を確認

Milan Aggarwal, Mausoom Sarkar, Hiresh Gupta, Balaji Krishnamurthy

(参考訳) 文書構造抽出は数十年にわたって広く研究されてきた。この方向の最近の研究は深層学習に基づくもので、主にセマンティックセグメンテーションによる完全な畳み込みNNを用いた構造抽出に焦点を当てている。本稿では,形式構造抽出のための新しいマルチモーダルアプローチを提案する。テキストランやウィジェットなどの単純な要素が与えられた場合,フォーム情報収集に不可欠なテキストブロック,テキストフィールド,選択フィールド,選択グループなどの高次構造を抽出する。これを実現するために,各低レベル要素(参照)に近接する候補要素を同定し,局所的な画像パッチを得る。我々は、BiLSTMを通して候補のテキストおよび空間表現を逐次処理し、文脈認識表現を取得し、それをCNNで処理した画像パッチ特徴と融合する。その後、シーケンシャルデコーダはこの融合特徴ベクトルを用いて参照と候補の関連型を予測する。これらの予測関連性を利用して、連結成分分析によりより大きな構造を決定する。実験の結果, 本手法は, それらの構造に対して, 90.29%, 73.80%, 83.12%, 52.72%のリコールを達成し, 意味的セグメンテーションベースラインを著しく上回った。本手法の有効性をアブレーションにより示し,個別のモダリティを用いて比較した。また、新しいリッチな人間アノテーション付きフォームデータセットも紹介します。

Document structure extraction has been a widely researched area for decades. Recent work in this direction has been deep learning-based, mostly focusing on extracting structure with fully convolutional neural networks through semantic segmentation. In this work, we present a novel multi-modal approach for form structure extraction. Given simple elements such as textruns and widgets, we extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups, which are essential for information collection in forms. To achieve this, we obtain a local image patch around each low-level element (reference) by identifying the candidate elements closest to it. We process the textual and spatial representations of the candidates sequentially through a BiLSTM to obtain context-aware representations and fuse them with image patch features obtained by processing the patch through a CNN. Subsequently, a sequential decoder takes this fused feature vector to predict the association type between the reference and the candidates. These predicted associations are used to determine larger structures through connected components analysis. Experimental results show the effectiveness of our approach, which achieves recalls of 90.29%, 73.80%, 83.12%, and 52.72% for the above structures, respectively, significantly outperforming semantic segmentation baselines. We show the efficacy of our method through ablations, comparing it against using individual modalities. We also introduce our new rich human-annotated Forms Dataset.

翻訳日:2021-07-12 13:47:08 公開日:2021-07-09
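
The last step of the form-structure abstract above, turning predicted pairwise associations into larger structures via connected components analysis, can be sketched with a union-find structure. This is a generic illustration rather than the authors' implementation. The function name `group_elements` and the representation of associations as index pairs are assumptions.

```python
def group_elements(num_elements, associations):
    """Group element indices into connected components.

    `associations` is an iterable of (i, j) pairs meaning that elements
    i and j were predicted to belong to the same higher-order structure.
    Returns a sorted list of sorted index groups.
    """
    parent = list(range(num_elements))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in associations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for i in range(num_elements):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With five low-level elements and associations `(0, 1)`, `(1, 2)` and `(3, 4)`, the sketch returns two groups, `[0, 1, 2]` and `[3, 4]`, which would correspond to two candidate higher-order structures.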

# 解釈可能な構成畳み込みニューラルネットワーク

Interpretable Compositional Convolutional Neural Networks ( http://arxiv.org/abs/2107.04474v1 )

ライセンス: Link先を確認

Wen Shen, Zhihua Wei, Shikun Huang, Binbin Zhang, Jiaqi Fan, Ping Zhao, Quanshi Zhang

(参考訳) 意味論的解釈可能性の定義は、説明可能なAIにおける中核的な課題を示す。本稿では,中間畳み込み層における有意な視覚パターンを符号化するフィルタを学習するために,従来の畳み込みニューラルネットワーク(CNN)を解釈可能な合成CNNに変換する手法を提案する。合成cnnでは、各フィルタは、明確な意味を持つ特定の合成対象部分または画像領域を一貫して表現する。合成cnnは、分類のための画像ラベルから、監督のための部分や領域の注釈なしで学習する。我々の手法は様々な種類のCNNに適用できる。実験により本手法の有効性が示された。

A reasonable definition of semantic interpretability presents a core challenge in explainable AI. This paper proposes a method to modify a traditional convolutional neural network (CNN) into an interpretable compositional CNN, in order to learn filters that encode meaningful visual patterns in intermediate convolutional layers. In a compositional CNN, each filter is supposed to consistently represent a specific compositional object part or image region with a clear meaning. The compositional CNN learns from image labels for classification without any annotations of parts or regions for supervision. Our method can be broadly applied to different types of CNNs. Experiments have demonstrated the effectiveness of our method.

翻訳日:2021-07-12 13:46:44 公開日:2021-07-09

# mutualeyecontact:アイコンタクトに焦点を当てた会話分析ツール

MutualEyeContact: A conversation analysis tool with focus on eye contact ( http://arxiv.org/abs/2107.04476v1 )

ライセンス: Link先を確認

Alexander Sch\"afer, Tomoko Isomura, Gerd Reis, Katsumi Watanabe, Didier Stricker

(参考訳) 個人間の目の接触は、人間の行動を理解する上で特に重要である。社会的相互作用におけるアイコンタクトの重要性をさらに調査するため,携帯型アイトラッキング技術は自然選択であると考えられる。しかし、利用可能なデータの分析は非常に複雑になる可能性がある。科学者は素早く正確に計算されるデータが必要です。さらに、関連するデータを自動的に分離して保存しなければなりません。本研究では,これらの課題に優れた相互接触ツールを提案し,相互接触の重要性を科学者に理解させる。最先端のアイトラッキングと機械学習に基づく顔認識を組み合わせることで,ソーシャルインタラクションセッションの分析と可視化を行うツールを提供する。この研究はコンピュータ科学者と認知科学者の共同研究である。社会科学と行動科学の分野とコンピュータビジョンとディープラーニングを組み合わせる。

Eye contact between individuals is particularly important for understanding human behaviour. To further investigate the importance of eye contact in social interactions, portable eye tracking technology seems to be a natural choice. However, the analysis of available data can become quite complex. Scientists need data that is calculated quickly and accurately. Additionally, the relevant data must be automatically separated to save time. In this work, we propose a tool called MutualEyeContact which excels in those tasks and can help scientists to understand the importance of (mutual) eye contact in social interactions. We combine state-of-the-art eye tracking with face recognition based on machine learning and provide a tool for analysis and visualization of social interaction sessions. This work is a joint collaboration of computer scientists and cognitive scientists. It combines the fields of social and behavioural science with computer vision and deep learning.

翻訳日:2021-07-12 13:46:34 公開日:2021-07-09

# スタイルGAN潜時空間の意味的および幾何学的展開

Semantic and Geometric Unfolding of StyleGAN Latent Space ( http://arxiv.org/abs/2107.04481v1 )

ライセンス: Link先を確認

Mustafa Shukor, Xu Yao, Bharath Bhushan Damodaran, Pierre Hellier

(参考訳) generative adversarial networks (gans) は、自然画像に対応する潜在コードを反転させ操作することで画像編集に驚くほど効率的であることが証明されている。この性質は、潜在空間の不連続な性質から生じる。本稿では, 画像知覚距離とユークリッド距離の違いと, (b) アンタングル化が最適ではなく, (b) 線形モデルを用いた顔属性分離が限界仮説である,という2つの幾何学的制約を同定する。そこで本研究では,これらの制約を解消するために,正規化フローを用いてプロキシ潜在表現を学習する新しい手法を提案する。

Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to a natural image. This property emerges from the disentangled nature of the latent space. In this paper, we identify two geometric limitations of such a latent space: (a) Euclidean distances differ from image perceptual distances, and (b) disentanglement is not optimal, and facial attribute separation using a linear model is a limiting hypothesis. We thus propose a new method to learn a proxy latent representation using normalizing flows to remedy these limitations, and show that this leads to a more efficient space for face image editing.

翻訳日:2021-07-12 13:46:23 公開日:2021-07-09

# 弱教師付き領域適応によるカスケード検出タスクの学習

Learning Cascaded Detection Tasks with Weakly-Supervised Domain Adaptation ( http://arxiv.org/abs/2107.04523v1 )

ライセンス: Link先を確認

Niklas Hanselmann, Nick Schneider, Benedikt Ortelt and Andreas Geiger

(参考訳) 自動運転の課題に対処するために、ディープラーニングは、3d検出やインスタンスセグメンテーションなど、ますます複雑なタスクに取り組む上で重要であることが証明されている。画像に基づく検出タスクに対する最先端のアプローチは、カスケードな方法で操作することで、この複雑さに対処している。仮面は推測される。これらの手法はうまく機能するが、様々なタスクに対する正確で安価なアノテーションが欠如していることは依然として大きな課題である。合成データは有望な解であるが、ドメイン適応研究の努力にもかかわらず、合成データと実際のデータのギャップは未解決の問題である。本研究では,逐次的検出タスクの構造を生かした弱教師付き領域適応設定を提案する。特に、2dバウンディングボックスを両方のドメインの弱いラベルとして活用しながら、ソースドメインのみから属性を推測し、ドメインシフトを説明することを学びます。さらに,教師なし設定では利用できない基底クラス情報を用いて,クラス毎の機能アライメントを通じて,ドメイン不変機能を奨励する。実験の結果,提案手法は完全教師付き設定と競合する一方で,教師なし適応手法よりも大きなマージンで優れていた。

In order to handle the challenges of autonomous driving, deep learning has proven to be crucial in tackling increasingly complex tasks, such as 3D detection or instance segmentation. State-of-the-art approaches for image-based detection tasks tackle this complexity by operating in a cascaded fashion: they first extract a 2D bounding box based on which additional attributes, e.g. instance masks, are inferred. While these methods perform well, a key challenge remains the lack of accurate and cheap annotations for the growing variety of tasks. Synthetic data presents a promising solution but, despite the effort in domain adaptation research, the gap between synthetic and real data remains an open problem. In this work, we propose a weakly supervised domain adaptation setting which exploits the structure of cascaded detection tasks. In particular, we learn to infer the attributes solely from the source domain while leveraging 2D bounding boxes as weak labels in both domains to explain the domain shift. We further encourage domain-invariant features through class-wise feature alignment using ground-truth class information, which is not available in the unsupervised setting. As our experiments demonstrate, the approach is competitive with fully supervised settings while outperforming unsupervised adaptation approaches by a large margin.

翻訳日:2021-07-12 13:46:11 公開日:2021-07-09

# スライディングウィンドウ最適化による連続時間におけるイベントベース特徴追跡

Event-Based Feature Tracking in Continuous Time with Sliding Window Optimization ( http://arxiv.org/abs/2107.04536v1 )

ライセンス: Link先を確認

Jason Chui, Simon Klenk, Daniel Cremers

(参考訳) イベントカメラにおける連続時間特徴追跡のための新しい手法を提案する。この目的のために,画像平面上の投影によって最大にシャープなイベントパッチ画像が得られるように,推定軌道に沿ったイベントを時空に調整して特徴を追跡する。軌道は$n^{th}$次 B-スプラインによってパラメータ化され、これは$(n-2)^{th}$微分まで連続である。従来の作業とは対照的に,スライディングウィンドウ方式で曲線パラメータを最適化する。パブリックデータセットでは,提案したスライドウインドウB-スプライン最適化が,従来よりも長い,より正確な特徴トラックにつながることを実験的に確認した。

We propose a novel method for continuous-time feature tracking in event cameras. To this end, we track features by aligning events along an estimated trajectory in space-time such that the projection onto the image plane results in maximally sharp event patch images. The trajectory is parameterized by $n^{th}$ order B-splines, which are continuous up to the $(n-2)^{th}$ derivative. In contrast to previous work, we optimize the curve parameters in a sliding window fashion. On a public dataset, we experimentally confirm that the proposed sliding-window B-spline optimization leads to longer and more accurate feature tracks than in previous work.

翻訳日:2021-07-12 13:45:51 公開日:2021-07-09
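
To make the B-spline parameterization in the abstract above more tangible, here is a minimal sketch that evaluates a uniform cubic B-spline, i.e. order $n = 4$, which is continuous up to its second derivative in line with the $(n-2)^{th}$ rule quoted above. The uniform knot spacing, the choice $n = 4$, and the name `cubic_bspline_point` are assumptions. The abstract does not state which order or knot layout the authors use, and the sliding-window optimization itself is not shown.

```python
def cubic_bspline_point(control_points, t):
    """Evaluate a uniform cubic B-spline trajectory at parameter t.

    `control_points` is a list of (x, y) tuples with at least 4 entries.
    Valid parameters satisfy 0 <= t <= len(control_points) - 3; each unit
    interval of t is one spline segment driven by 4 consecutive control points.
    """
    n_segments = len(control_points) - 3
    if n_segments < 1:
        raise ValueError("need at least 4 control points")
    if not 0.0 <= t <= n_segments:
        raise ValueError("t outside the valid parameter range")
    i = min(int(t), n_segments - 1)  # segment index
    u = t - i                        # local parameter in [0, 1]
    # Uniform cubic B-spline basis functions; they sum to 1 for every u.
    basis = (
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    )
    window = control_points[i:i + 4]
    x = sum(b * p[0] for b, p in zip(basis, window))
    y = sum(b * p[1] for b, p in zip(basis, window))
    return (x, y)
```

Because each point depends on only four neighbouring control points, changing the most recent control points leaves earlier parts of the track untouched, which is one reason a windowed optimization over curve parameters is natural for this representation.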

# MRIと超音波ボリューム登録のためのクロスモーダルアテンション

Cross-modal Attention for MRI and Ultrasound Volume Registration ( http://arxiv.org/abs/2107.04548v1 )

ライセンス: Link先を確認

Xinrui Song, Hengtao Guo, Xuanang Xu, Hanqing Chao, Sheng Xu, Baris Turkbey, Bradford J. Wood, Ge Wang, Pingkun Yan

(参考訳) 前立腺癌生検は経直腸超音波(TRUS)とMR画像の正確な融合の恩恵を受ける。過去数年間、畳み込みニューラルネットワーク(CNN)は、画像登録に不可欠な画像特徴を抽出する上で強力であることが証明されてきた。しかし、挑戦的な応用やコンピュータビジョンの最近の進歩は、CNNが特徴間の空間的対応を理解する能力にかなり制限があることを示唆しており、これは自己注意機構が得意とするタスクである。本稿では,クロスモーダル画像登録に特化した自己注意機構を開発することを目的とする。提案するクロスモーダルアテンションブロックは,一方のボリュームの各特徴を,対応するボリュームのすべての特徴に効果的にマッピングする。実験の結果,クロスモーダルアテンションブロックを組み込んだCNNネットワークは,その10倍の規模を持つ高度なCNNネットワークを上回る性能を発揮することがわかった。ネットワークの解釈性を改善するために可視化技術も取り入れた。私たちの作業のソースコードは https://github.com/DIAL-RPI/Attention-Reg で公開されています。

Prostate cancer biopsy benefits from accurate fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images. In the past few years, convolutional neural networks (CNNs) have proven powerful in extracting image features crucial for image registration. However, challenging applications and recent advances in computer vision suggest that CNNs are quite limited in their ability to understand spatial correspondence between features, a task in which the self-attention mechanism excels. This paper aims to develop a self-attention mechanism specifically for cross-modal image registration. Our proposed cross-modal attention block effectively maps each of the features in one volume to all features in the corresponding volume. Our experimental results demonstrate that a CNN designed with the cross-modal attention block embedded outperforms an advanced CNN 10 times its size. We also incorporated visualization techniques to improve the interpretability of our network. The source code of our work is available at https://github.com/DIAL-RPI/Attention-Reg .

翻訳日:2021-07-12 13:45:42 公開日:2021-07-09

# ResNet-18を用いた7つの基本表現認識

Seven Basic Expression Recognition Using ResNet-18 ( http://arxiv.org/abs/2107.04569v1 )

ライセンス: Link先を確認

Satnam Singh, Doris Schicker

(参考訳) 本稿では, FER+データセット上で事前学習したResNet-18アーキテクチャを用いて, in-the-wild感情行動分析(ABAW)の問題に対処し, 中立, 怒り, 嫌悪, 恐怖, 幸福, 悲しみ, 驚きの7つの基本表情の分類を行う。第2回in-the-wild感情行動分析ワークショップおよびコンペティション(ABAW2)の一環として、約2.8Mフレームの564ビデオからなるデータベースと、これら7つの基本表情のラベルが提供されている。我々は、過剰に表現されたクラスをアンダーサンプリングし、過少に表現されたクラスをオーバーサンプリングするとともに、クラスごとの重みを用いることで、クラス不均衡に対処するためにデータセットを再サンプリングした。オーバーフィッティングを避けるためにデータ拡張を行い、L2正則化を使用した。我々の分類器は、ABAW2スコア0.4に達し、コンペティションの主催者が提供したベースライン結果を上回る。

We propose to use a ResNet-18 architecture that was pre-trained on the FER+ dataset for tackling the problem of affective behavior analysis in-the-wild (ABAW), specifically the classification of the seven basic expressions: neutral, anger, disgust, fear, happiness, sadness and surprise. As part of the second workshop and competition on affective behavior analysis in-the-wild (ABAW2), a database consisting of 564 videos with around 2.8M frames is provided along with labels for these seven basic expressions. We resampled the dataset to counter class imbalance by under-sampling the over-represented classes and over-sampling the under-represented classes, and additionally applied class-wise weights. To avoid overfitting, we performed data augmentation and used L2 regularisation. Our classifier reaches an ABAW2 score of 0.4 and therefore exceeds the baseline results provided by the hosts of the competition.

翻訳日:2021-07-12 13:45:28 公開日:2021-07-09
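
The class-imbalance handling described in the abstract above (under-sampling over-represented classes, over-sampling under-represented ones, plus class-wise weights) can be sketched as follows. The inverse-frequency weighting formula, the common per-class `target` count, and the function names are assumptions chosen for illustration. The abstract does not give the exact resampling ratios or weights used.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def resample(samples, labels, target, seed=0):
    """Return (sample, label) pairs with exactly `target` items per class.

    Classes with more than `target` items are under-sampled without
    replacement; classes with fewer are over-sampled by repeating
    randomly chosen items.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    balanced = []
    for label, items in by_class.items():
        if len(items) >= target:
            chosen = rng.sample(items, target)
        else:
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        balanced.extend((sample, label) for sample in chosen)
    return balanced
```

With six "neutral" and two "fear" labels, the weights come out as roughly 0.67 and 2.0, and `resample(..., target=4)` yields four samples of each class.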

# 堅牢な機能獲得に向けて

Towards Robust Active Feature Acquisition ( http://arxiv.org/abs/2107.04163v1 )

ライセンス: Link先を確認

Yang Li, Siyuan Shan, Qin Liu, Junier B. Oliva

(参考訳) 真にインテリジェントなシステムは、不完全で不確実なデータで重要な決定をすると予想されている。予測を改善するために機能が順次取得されるアクティブ機能獲得(afa)は、この目標に向けての一歩です。しかしながら、現在のAFAモデルは、すべて小さな機能セットを扱い、大きな機能領域へのスケーリングが困難です。さらに、信頼できる予測が可能な有効なドメインについて無知であるため、アウト・オブ・ディストリビューション(OOD)の入力に弱い可能性がある。これらの欠陥を解消し、AFAモデルを実用化に近づけるために、我々は現在のAFAアプローチを進めるためのいくつかの手法を提案する。本フレームワークは階層的な取得ポリシを用いて,多数の機能を容易に扱えるとともに,OOD検出器の助けを借りてOOD入力に対してより堅牢である。大規模な実験は、強いベースラインに対する我々のフレームワークの有効性を示す。

Truly intelligent systems are expected to make critical decisions with incomplete and uncertain data. Active feature acquisition (AFA), where features are sequentially acquired to improve the prediction, is a step towards this goal. However, current AFA models all deal with a small set of candidate features and have difficulty scaling to a large feature space. Moreover, they are unaware of the valid domains in which they can predict confidently, and can therefore be vulnerable to out-of-distribution (OOD) inputs. In order to remedy these deficiencies and bring AFA models closer to practical use, we propose several techniques to advance the current AFA approaches. Our framework can easily handle a large number of features using a hierarchical acquisition policy and is more robust to OOD inputs with the help of an OOD detector for partially observed data. Extensive experiments demonstrate the efficacy of our framework over strong baselines.

翻訳日:2021-07-12 13:44:59 公開日:2021-07-09

# 系統的欠落値を含むデータから学習する欲求構造

Greedy structure learning from data that contains systematic missing values ( http://arxiv.org/abs/2107.04184v1 )

ライセンス: Link先を確認

Yang Liu and Anthony C. Constantinou

(参考訳) 欠落した値を含むデータから学ぶことは、多くの領域で共通の現象である。ベイズネットワーク構造学習アルゴリズムが欠落データを扱うのは、比較的少ないが、欠落データを想定最大化アルゴリズムのようにランダムに欠落していると仮定する標準的なアプローチに依存する傾向がある。欠落したデータはしばしば体系的であるため、ランダムに欠落しない値を含むデータセットを効果的に扱えるより実用的な方法が必要である。体系的な欠落データを扱うアプローチの欠如は、BN構造学習法の欠落がランダムでない実世界の問題への適用を妨げる。本稿では,ペアワイズ削除と逆確率重み付けを活用し,観測データを最大に活用し,欠落値による潜在的なバイアスを最小化する,グリーディ探索構造学習の3つの変種について述べる。最初の2つの変種は3番目の変種と最高の変種をサブバージョンと見なすことができるが、学習精度の連続的な改善を示す上で重要である。実験により, 提案手法は, 学習精度と効率の両面において, 一般用および最先端の構造化EMアルゴリズムよりも優れており, ランダムにデータが欠落している場合や, ランダムではない場合にも優れることがわかった。

Learning from data that contain missing values is a common problem in many domains. Relatively few Bayesian Network (BN) structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm in terms of both learning accuracy and efficiency, both when data are missing at random and when they are not.

翻訳日:2021-07-12 13:44:46 公開日:2021-07-09
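
The two ingredients named in the abstract above, pairwise deletion and inverse probability weighting, can be illustrated in isolation. The sketch below is not the authors' algorithm. The row format (dictionaries with `None` for missing entries), the function names, and the assumption that observation probabilities are already known are all hypothetical simplifications.

```python
def pairwise_deletion(rows, variables):
    """Keep only the rows in which every variable in `variables` is observed.

    Unlike listwise deletion, a row with a missing value in some other
    variable is still used, so each local score sees as much data as possible.
    """
    return [row for row in rows if all(row[v] is not None for v in variables)]


def ipw_mean(values, observe_probs):
    """Mean of observed values, each weighted by 1 / P(value was observed).

    Up-weighting cases that were unlikely to be observed counteracts the
    bias introduced when missingness is systematic.
    """
    weights = [1.0 / p for p in observe_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For example, with three rows where `B` is missing in the first and `C` in the second, a score involving only `A` and `B` still uses two rows, whereas listwise deletion over all three variables would keep just one.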

# rex: スケジュールの改善による予算トレーニングの再検討

REX: Revisiting Budgeted Training with an Improved Schedule ( http://arxiv.org/abs/2107.04197v1 )

ライセンス: Link先を確認

John Chen, Cameron Wolfe, Anastasios Kyrillidis

(参考訳) ディープラーニングの実践者は、しばしば計算と金銭の予算を運用する。したがって、いかなる予算でもうまく機能する最適化アルゴリズムを設計することは重要である。線形学習率のスケジュールは、低予算体制の他のほとんどのスケジュールよりも優れているため、最良の予算対応スケジュールと考えられている。一方、例えば \texttt{30-60-90} ステップスケジュールのような学習率スケジュールは、モデルが多くのエポックに対してトレーニングできる場合に高いパフォーマンスを達成することが知られている。しかし、予算が大きくなるか小さいかは事前に分かっていないことが多いため、学習率スケジュールの最適な選択はケース・バイ・ケース・バイ・ケースで行われる。本稿では、学習率スケジュール選択問題を、プロファイルの選択(すなわち、学習率スケジュールをモデル化する連続関数)と、サンプリングレートの選択(つまり、このプロファイルから学習率が更新/サンプリングされる頻度)の組合せとして構成する。 sgdとadamオプティマイザの両方を用いて7つの異なる実験環境で評価した,reflection exponential (rex) scheduleと呼ばれる新しいプロファイルとサンプリングレートの組み合わせを提案する。 REXは低予算体制において線形スケジュールを上回り、高予算体制と低予算体制の両方において最先端の学習率スケジュール(線形、ステップ、指数関数、コサイン、高原でのステップ崩壊、OneCycle)のパフォーマンスを一致または超過する。さらに、REXは計算、ストレージ、ハイパーパラメータを追加する必要はない。

Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules -- such as the \texttt{30-60-90} step schedule -- are known to achieve high performance when the model can be trained for many epochs. Yet, it is often not known a priori whether one's budget will be large or small; thus, the optimal choice of learning rate schedule is made on a case-by-case basis. In this paper, we frame the learning rate schedule selection problem as a combination of $i)$ selecting a profile (i.e., the continuous function that models the learning rate schedule), and $ii)$ choosing a sampling rate (i.e., how frequently the learning rate is updated/sampled from this profile). We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule, which we evaluate across seven different experimental settings with both SGD and Adam optimizers. REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules (linear, step, exponential, cosine, step decay on plateau, and OneCycle) in both high and low budget regimes. Furthermore, REX requires no added computation, storage, or hyperparameters.

翻訳日:2021-07-12 13:44:25 公開日:2021-07-09
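
The REX abstract above frames a learning rate schedule as a profile (a continuous function of training progress) combined with a sampling rate (how often the learning rate is read from that profile). The sketch below illustrates only that framing, using the linear profile that the abstract mentions as a baseline. The REX profile itself is not reproduced because the abstract does not give its formula, and the names `linear_profile`, `make_schedule` and `sample_every` are assumptions.

```python
def linear_profile(progress):
    """Linear decay profile: 1.0 at the start of training, 0.0 at the end."""
    return 1.0 - progress


def make_schedule(profile, base_lr, total_steps, sample_every=1):
    """Combine a profile with a sampling rate into a step -> lr function.

    The learning rate is only refreshed every `sample_every` steps; between
    refreshes it keeps the value sampled at the last refresh point.
    """
    def lr_at(step):
        sampled_step = (step // sample_every) * sample_every
        return base_lr * profile(sampled_step / total_steps)
    return lr_at
```

Sampling at every step gives the usual smooth linear decay, while a coarse `sample_every` turns the same profile into a staircase, which shows how one profile can produce quite different schedules depending on the sampling rate.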

# 早期終末期MDPの解決による安全な探索

Safe Exploration by Solving Early Terminated MDP ( http://arxiv.org/abs/2107.04200v1 )

ライセンス: Link先を確認

Hao Sun, Ziping Xu, Meng Fang, Zhenghao Peng, Jiadong Guo, Bo Dai, Bolei Zhou

(参考訳) 強化学習(RL)の現実的な応用には,安全な探索が不可欠である。従来の研究では、安全な探索問題を制約付きマルコフ決定プロセス(CMDP)とみなしており、政策は制約の下で最適化されている。しかし、潜在的な危険に遭遇すると、人間はすぐに立ち止まり、危険の中で安全に行動することを学ぶことは滅多にない。人間の学習を動機として,早期終末型MDP(ET-MDP)の枠組みの下で安全なRL問題に対処する新たなアプローチを導入する。まず,ET-MDP を,対応するCMDP と同じ最適値関数を持つ制約のない MDP として定義する。そこで, 文脈モデルに基づく非政治アルゴリズムを提案し, ET-MDPを解くことにより, CMDPの漸近性能を向上し, 学習効率を向上する。 CMDPタスクの実験では、CMDPを直接解く従来の方法よりも大幅に改善されている。

Safe exploration is crucial for the real-world application of reinforcement learning (RL). Previous works consider the safe exploration problem as a Constrained Markov Decision Process (CMDP), where the policies are optimized under constraints. However, when encountering any potential danger, humans tend to stop immediately and rarely learn to behave safely in danger. Motivated by human learning, we introduce a new approach to address safe RL problems under the framework of the Early Terminated MDP (ET-MDP). We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP. An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better asymptotic performance and improved learning efficiency. Experiments on various CMDP tasks show a substantial improvement over previous methods that directly solve the CMDP.

翻訳日:2021-07-12 13:43:57 公開日:2021-07-09

# 局所適応型不均一フェデレーション学習によるリソグラフィホットスポット検出

Lithography Hotspot Detection via Heterogeneous Federated Learning with Local Adaptation ( http://arxiv.org/abs/2107.04367v1 )

ライセンス: Link先を確認

Xuezhong Lin, Jingyu Pan, Jinming Xu, Yiran Chen and Cheng Zhuo

(参考訳) 技術的スケーリングが物理的限界に近づいている中、リソグラフィホットスポット検出は製造性の設計において重要なタスクとなっている。パターンマッチングや機械学習をホットスポット検出に配置することで、かなりのシミュレーション時間を節約できるが、そのような手法は通常、モデルを構築するための非自明な品質データを必要とする。また、デザインハウスは、このようなデータを他の住宅と直接共有して統一モデルを構築することを望まないため、データ不足によりユニークなデザインパターンを持つデザインハウスには効果がない。一方、各デザインハウスにおけるデータ均質性により、局所的に訓練されたモデルは容易に過剰に適合し、一般化能力と堅牢性を失う。本稿では,上記の問題に対処可能な,リソグラフィホットスポット検出のための異種フェデレーション学習フレームワークを提案する。一方、このフレームワークは、ローカルデータをプライベートに保ちながら、異質な知識共有を通じて、より堅牢なグローバルサブモデルを構築することができる。一方、グローバルなサブモデルとローカルなサブモデルを組み合わせることで、ローカルなデータの均一性を改善することができる。提案手法は,非独立かつ同一分散(非iid)データとヘテロジニアス通信の課題を克服し,様々なシナリオにおいて良好な収束率を確保しつつ,他の最先端手法と比較して非常に高い性能を実現することができることを示す。

As technology scaling is approaching the physical limit, lithography hotspot detection has become an essential task in design for manufacturability. While the deployment of pattern matching or machine learning in hotspot detection can help save significant simulation time, such methods typically demand non-trivial quality data to build the model, which most design houses are short of. Moreover, the design houses are also unwilling to directly share such data with the other houses to build a unified model, which can be ineffective for the design house with unique design patterns due to data insufficiency. On the other hand, with data homogeneity in each design house, the locally trained models can be easily over-fitted, losing generalization ability and robustness. In this paper, we propose a heterogeneous federated learning framework for lithography hotspot detection that can address the aforementioned issues. On one hand, the framework can build a more robust centralized global sub-model through heterogeneous knowledge sharing while keeping local data private. On the other hand, the global sub-model can be combined with a local sub-model to better adapt to local data heterogeneity. The experimental results show that the proposed framework can overcome the challenge of non-independent and identically distributed (non-IID) data and heterogeneous communication to achieve very high performance in comparison to other state-of-the-art methods while guaranteeing a good convergence rate in various scenarios.

翻訳日:2021-07-12 13:43:43 公開日:2021-07-09

# 制約付き最適化としてのモデル圧縮とニューラルネットへの応用パート5:圧縮の組み合わせ

Model compression as constrained optimization, with application to neural nets. Part V: combining compressions ( http://arxiv.org/abs/2107.04380v1 )

ライセンス: Link先を確認

Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

(参考訳) モデル圧縮は一般に量子化、低ランク近似、プルーニングを用いて行われ、近年様々なアルゴリズムが研究されている。基本的な質問の1つは、どのタイプの圧縮が特定のモデルに対してうまく働くかということです。あるいは、もっと良い:適切な方法で圧縮を組み合わせることで改善できるのか? これを損失を最適化する問題として一般に定式化するが、重みを個別に圧縮した部分の加法結合に制限し、対応する部分のパラメータを学習するアルゴリズムを与える。ディープニューラルネットを用いた実験では,1)誤り圧縮空間において,異なる圧縮型が相補的な効果をもたせ,2)最適な組み合わせがニューラルネットワークのタイプに依存することを示す,はるかに優れたモデルを見出すことができる。例えば、数個の浮動小数点重みを追加してエラーを発生させることなく、ResNetとAlexNetを1ビット1重で圧縮できます。しかし、低ランクと浮動小数点重みを組み合わせることで、VGGネットをより圧縮することができる。

Model compression is generally performed by using quantization, low-rank approximation or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what types of compression work better for a given model? Or even better: can we improve by combining compressions in a suitable way? We formulate this generally as a problem of optimizing the loss but where the weights are constrained to equal an additive combination of separately compressed parts; and we give an algorithm to learn the corresponding parts' parameters. Experimentally with deep neural nets, we observe that 1) we can find significantly better models in the error-compression space, indicating that different compression types have complementary benefits, and 2) the best type of combination depends exquisitely on the type of neural net. For example, we can compress ResNets and AlexNet using only 1 bit per weight without error degradation at the cost of adding a few floating point weights. However, VGG nets can be better compressed by combining low-rank with a few floating point weights.
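
The idea of an additive combination of compressed parts can be illustrated on a single weight vector. Note the swap: the paper learns the parts by optimizing the training loss under the constraint, whereas this sketch only performs a one-shot greedy decomposition of fixed weights (1-bit part plus a few floating-point corrections). All function names and the example weights are hypothetical.

```python
def binarize(weights):
    """1-bit quantization: each weight becomes +scale or -scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]


def sparse_correction(residual, k):
    """Keep the k largest-magnitude residuals as floats, zero the rest."""
    order = sorted(range(len(residual)), key=lambda i: abs(residual[i]), reverse=True)
    keep = set(order[:k])
    return [r if i in keep else 0.0 for i, r in enumerate(residual)]


def squared_error(a, b):
    """Sum of squared differences between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


weights = [0.9, -1.1, 1.0, -0.95, 4.0, 1.05]
quantized = binarize(weights)
residual = [w - q for w, q in zip(weights, quantized)]
correction = sparse_correction(residual, k=1)
# Additive combination: 1-bit part plus one floating-point correction.
combined = [q + c for q, c in zip(quantized, correction)]
```

Here a single floating-point correction absorbs the outlier weight that 1-bit quantization represents poorly, which is the kind of complementary benefit the abstract describes.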

翻訳日:2021-07-12 13:43:19 公開日:2021-07-09

# Form2Seq : 高次構造抽出のためのフレームワーク

Form2Seq : A Framework for Higher-Order Form Structure Extraction ( http://arxiv.org/abs/2107.04419v1 )

ライセンス: Link先を確認

Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy

(参考訳) 文書構造抽出は数十年にわたって広く研究されてきた分野であり、近年では完全畳み込みネットワークを用いた文書画像のセマンティックセグメンテーションタスクとして行われている。このような手法は画像分解能によって制限されるが、一般的に形に現れる濃密な領域の構造を曖昧にしないためである。そこで本稿では,テキストを用いた構造抽出のための新しいシーケンシャル・ツー・シークエンス(seq2seq)フレームワークであるform2seqを提案する。 1) 低レベルの構成要素(TextBlockと空の充填可能なウィジェット)をフィールドキャプションやリストアイテムなど10種類に分類し,2) 低レベルの要素をテキストフィールド, ChoiceFields, ChoiceGroupsなどの高次の構成要素に分類し,フォームの情報収集機構として利用する。これを実現するため、構成要素を自然読み順に線形に配置し、その空間表現とテキスト表現をseq2seqフレームワークに供給し、最終タスクに応じて各要素の予測を順次出力する。タスクをグループ化するためにseq2seqを修正し、2つのタスクのエンドツーエンドトレーニングを分離したトレーニングと比較することで得られた改善について検討する。実験の結果, 分類タスクにおいて90%の精度を達成するテキストベースアプローチの有効性を示し, 上記のグループでは75.82, 86.01, 61.63のf1がセグメンテーションベースラインを上回った。さらに,ICDAR 2013 データセット上でのテーブル構造認識の結果の状況を示す。

Document structure extraction has been a widely researched area for decades with recent works performing it as a semantic segmentation task over document images using fully convolutional networks. Such methods are limited by image resolution due to which they fail to disambiguate structures in dense regions which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages relative spatial arrangement of structures. We discuss two tasks: 1) Classification of low-level constituent elements (TextBlock and empty fillable Widget) into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanism in forms. To achieve this, we arrange the constituent elements linearly in natural reading order, feed their spatial and textual representations to the Seq2Seq framework, which sequentially outputs a prediction for each element depending on the final task. We modify Seq2Seq for the grouping task and discuss improvements obtained through cascaded end-to-end training of the two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach, achieving an accuracy of 90% on the classification task and an F1 of 75.82, 86.01, 61.63 on the groups discussed above respectively, outperforming segmentation baselines. Further, we show our framework achieves state-of-the-art results for table structure recognition on the ICDAR 2013 dataset.

翻訳日:2021-07-12 13:42:41 公開日:2021-07-09

# 非漸近解析による歪みリスク尺度の類似度に基づく政策勾配法

Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis ( http://arxiv.org/abs/2107.04422v1 )

ライセンス: Link先を確認

Nithia Vijayan and Prashanth L. A

(参考訳) 本稿では,リスクに敏感な強化学習(rl)環境での制御問題を解決するためのポリシ勾配アルゴリズムを提案する。本アルゴリズムの目的は,マルコフ決定過程(MDP)における累積報酬の歪みリスク尺度(DRM)を最大化することである。我々は、drmの目的に対応するポリシー勾配定理の変種を導出する。この定理とLRに基づく勾配推定手法を併用して,オン・ポリティクスとオフ・ポリティクスのRL設定の両方においてDRMを最適化するポリシー勾配アルゴリズムを提案する。我々は、drm目標の近似定常点へのアルゴリズムの収束を確立する非漸近境界を導出する。

We propose policy-gradient algorithms for solving the problem of control in a risk-sensitive reinforcement learning (RL) context. The objective of our algorithm is to maximize the distorted risk measure (DRM) of the cumulative reward in an episodic Markov decision process (MDP). We derive a variant of the policy gradient theorem that caters to the DRM objective. Using this theorem in conjunction with a likelihood ratio (LR) based gradient estimation scheme, we propose policy gradient algorithms for optimizing DRM in both on-policy and off-policy RL settings. We derive non-asymptotic bounds that establish the convergence of our algorithms to an approximate stationary point of the DRM objective.
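
For orientation, a distorted risk measure of a sample of returns can be computed as below. The abstract does not define the DRM, so this assumes the usual definition (a Choquet-type integral of the distorted survival function) evaluated on the empirical distribution; `empirical_drm` and the example distortions are hypothetical, and this is not claimed to be the estimator used in the paper.

```python
def empirical_drm(samples, distortion):
    """Distorted risk measure of the empirical distribution of `samples`.

    `distortion` is a non-decreasing g on [0, 1] with g(0) = 0 and g(1) = 1.
    The i-th smallest sample (1-indexed) gets weight
    g((n - i + 1) / n) - g((n - i) / n).
    """
    ordered = sorted(samples)
    n = len(ordered)
    return sum(
        x * (distortion((n - i + 1) / n) - distortion((n - i) / n))
        for i, x in enumerate(ordered, start=1)
    )


returns = [1.0, 2.0, 3.0, 4.0]
mean_like = empirical_drm(returns, lambda u: u)         # identity distortion
pessimistic = empirical_drm(returns, lambda u: u ** 2)  # down-weights high returns
```

With the identity distortion the measure reduces to the sample mean; a convex distortion shifts weight toward the lowest returns, which is what makes the objective risk-sensitive.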

翻訳日:2021-07-12 13:42:10 公開日:2021-07-09

# 交通シナリオにおける行動計画のための対話型ガイダンスの学習

Learning Interaction-aware Guidance Policies for Motion Planning in Dense Traffic Scenarios ( http://arxiv.org/abs/2107.04538v1 )

ライセンス: Link先を確認

Bruno Brito, Achin Agarwal and Javier Alonso-Mora

(参考訳) 密集した交通シナリオにおける自律ナビゲーションは、他のドライバーの意図が直接観察不可能であり、AVは幅広い運転行動を扱う必要があるため、自動運転車(AV)にとって依然として困難である。密集した交通を操るために、avは彼らの行動が他人(相互作用モデル)にどう影響するかを判断し、この推論を利用して密集した交通を安全にナビゲートする必要がある。本稿では,高密度交通シナリオにおける対話型動き計画のための新しい枠組みを提案する。人間の運転行動と相互作用時の速度変化との関係について検討する。そこで我々は,制約満足度による安全性と運動性の実現性を保証する最適化型プランナーに,他車両の協調性に関するグローバルガイダンスを提供するインタラクション対応政策であるDeep Reinforcement Learning (RL)を通じて学習することを提案する。学習されたポリシーは、ローカル最適化ベースのプランナーを推論し、対話的な振る舞いで誘導し、他の車両が収まらない場合に安全を維持しながら、高密度トラフィックに積極的にマージする。我々は,高度にインタラクティブなシミュレーション環境(ハイウェイマージとアンプロテクト左旋回)において,学習ベースと最適化ベースの2つのベースラインアプローチに対して定性的かつ定量的な結果を示す。本手法は,学習ベースと最適化ベースの両方において,衝突数を大幅に削減し,成功率を増加させることを示す。

Autonomous navigation in dense traffic scenarios remains challenging for autonomous vehicles (AVs) because the intentions of other drivers are not directly observable and AVs have to deal with a wide range of driving behaviors. To maneuver through dense traffic, AVs must be able to reason how their actions affect others (interaction model) and exploit this reasoning to navigate through dense traffic safely. This paper presents a novel framework for interaction-aware motion planning in dense traffic scenarios. We explore the connection between human driving behavior and their velocity changes when interacting. Hence, we propose to learn, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance about the cooperativeness of other vehicles to an optimization-based planner ensuring safety and kinematic feasibility through constraint satisfaction. The learned policy can reason and guide the local optimization-based planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield. We present qualitative and quantitative results in highly interactive simulation environments (highway merging and unprotected left turns) against two baseline approaches, a learning-based and an optimization-based method. The presented results demonstrate that our method significantly reduces the number of collisions and increases the success rate with respect to both learning-based and optimization-based baselines.

翻訳日:2021-07-12 13:41:58 公開日:2021-07-09

# 半教師型顔行動分析のためのマルチタスク平均教師

A Multi-task Mean Teacher for Semi-supervised Facial Affective Behavior Analysis ( http://arxiv.org/abs/2107.04225v1 )

ライセンス: Link先を確認

Lingfeng Wang, Shisen Wang

(参考訳) Affective Behavior Analysisは、人間とコンピュータの相互作用において重要な部分である。 tsav[9]のような既存の感情的行動分析手法は、不完全なラベル付きデータセットの課題に苦しむ。そこで本論文では,ラベルの欠落から学習し,複数の関連課題を同時に学習するための,半教師付き感情行動分析のためのマルチタスク平均教師モデルを提案する。具体的には、TSAVをベースラインモデルとして利用し、3つのタスクを同時に認識する。我々は,より優れた意味情報を提供するために,マスクのレンダリング前処理法を変更した。その後、平均教師を用いてTSAVモデルを半教師付きモデルに拡張し、ラベルなしデータから恩恵を受けることができた。評価実験の結果,提案手法はTSAVモデルよりも優れた性能を達成し,提案手法が適応的行動解析性能を向上させるために,新たなラベル付きデータを効果的に学習できることが確認された。

Affective behavior analysis is an important part of human-computer interaction. Existing successful affective behavior analysis methods such as TSAV[9] suffer from the challenge of incompletely labeled datasets. To boost performance, this paper presents a multi-task mean teacher model for semi-supervised affective behavior analysis that learns from missing labels and explores learning multiple correlated tasks simultaneously. Specifically, we first use TSAV as the baseline model to recognize the three tasks simultaneously. We modify the preprocessing method for rendering masks to provide better semantic information. We then extend the TSAV model to a semi-supervised model using mean teacher, which allows it to benefit from unlabeled data. Experimental results on validation datasets show that our method achieves better performance than the TSAV model, which verifies that the proposed network can effectively learn from additional unlabeled data to boost affective behavior analysis performance.
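
The mean teacher mechanism mentioned above can be sketched in a few lines. This shows only the generic technique (a teacher whose weights are an exponential moving average of the student's, plus a consistency loss on predictions); the flat weight lists, `decay` value, and function names are illustrative assumptions and say nothing about how TSAV itself is structured.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(3):
    # In training, the student would first take a gradient step here.
    teacher = ema_update(teacher, student, decay=0.5)

loss = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

Because the consistency loss needs no labels, it can be evaluated on samples whose annotations are missing, which is how unlabeled data enters training in this family of methods.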

翻訳日:2021-07-12 13:41:22 公開日:2021-07-09

# UrbanScene3D: 大規模都市景観データセットとシミュレータ

UrbanScene3D: A Large Scale Urban Scene Dataset and Simulator ( http://arxiv.org/abs/2107.04286v1 )

ライセンス: Link先を確認

Yilin Liu and Fuyou Xue and Hui Huang

(参考訳) 異なる方法で環境を知覚する能力は、ロボット研究に不可欠である。これには2dデータソースと3dデータソースの両方の分析が含まれる。本研究では,Unreal Engine 4 と AirSim をベースとした手頃なシミュレータを応用した大規模都市景観データセットを提案する。従来の2d情報や人工3dcadモデルに基づく作品とは異なり、urbanscene3dは小型の人工物モデルと航空画像で再構成された詳細な実世界のモデルの両方を含んでいる。各建物はシーンモデル全体から手動で抽出され、ユニークなラベルが割り当てられ、インスタンスセグメンテーションマップが作成されます。 UrbanScene3Dのインスタンスセグメンテーションラベルが付いた3Dの地平線テクスチャモデルでは、ユーザは、インスタンスセグメンテーションマップ、任意の解像度での深度マップ、可視および見えない場所での3Dポイントクラウド/メッシュなど、すべての種類のデータを取得することができる。さらに、airsimの助けを借りて、ユーザーはロボット(カー/ドロネス)をシミュレートして、提案された都市環境で様々な自律的なタスクをテストできる。詳細とアプリケーションの詳細については、私たちの論文とWebサイト(https://vcc.tech/UrbanScene3D/)を参照してください。

The ability to perceive the environments in different ways is essential to robotic research. This involves the analysis of both 2D and 3D data sources. We present a large scale urban scene dataset associated with a handy simulator based on Unreal Engine 4 and AirSim, which consists of both man-made and real-world reconstruction scenes in different scales, referred to as UrbanScene3D. Unlike previous works that are purely based on 2D information or man-made 3D CAD models, UrbanScene3D contains both compact man-made models and detailed real-world models reconstructed from aerial images. Each building has been manually extracted from the entire scene model and then assigned a unique label, forming an instance segmentation map. The provided 3D ground-truth textured models with instance segmentation labels in UrbanScene3D allow users to obtain all kinds of data they would like to have: instance segmentation map, depth map in arbitrary resolution, 3D point cloud/mesh in both visible and invisible places, etc. In addition, with the help of AirSim, users can also simulate the robots (cars/drones) to test a variety of autonomous tasks in the proposed city environment. Please refer to our paper and website (https://vcc.tech/UrbanScene3D/) for further details and applications.

翻訳日:2021-07-12 13:41:07 公開日:2021-07-09

# 経時的差分法によるDigital Subtraction Angiography Videoからの肝細胞癌の分節化

Hepatocellular Carcinoma Segmentation from Digital Subtraction Angiography Videos using Learnable Temporal Difference ( http://arxiv.org/abs/2107.04306v1 )

ライセンス: Link先を確認

Wenting Jiang, Yicheng Jiang, Lu Zhang, Changmiao Wang, Xiaoguang Han, Shuixing Zhang, Xiang Wan, Shuguang Cui

(参考訳) DSA(Digital Subtraction Angiography)ビデオにおける肝細胞癌 (HCC) の自動分画は, HCCの効率的な診断と臨床における腫瘍の正確な評価に役立つ。 dsaビデオからのhccセグメンテーションに関する研究はほとんどない。撮影における運動アーティファクト、腫瘍領域の曖昧な境界、および他の解剖組織へのイメージングにおける高い類似性により、非常に困難である。本稿では、DSAビデオにおけるHCCセグメンテーションの問題を提起し、独自のDSAデータセットを構築する。また,asegmentation sub-network,temporal difference learning (tdl) モジュール,および liver region segmentation (lrs) sub-network など,dsa-ltdnet と呼ばれる新しいセグメンテーションネットワークも提案する。 DSA-LTDNetは、DSAビデオから潜時動作情報を積極的に学習し、セグメンテーション性能を高めるのに好ましい。実験の結果、DSA-LTDNetは、U-Netベースラインと比較して、DICEスコアを4%近く増加させることがわかった。

Automatic segmentation of hepatocellular carcinoma (HCC) in Digital Subtraction Angiography (DSA) videos can assist radiologists in efficient diagnosis of HCC and accurate evaluation of tumors in clinical practice. Few studies have investigated HCC segmentation from DSA videos. The task is very challenging due to motion artifacts in filming, ambiguous boundaries of tumor regions, and high similarity in imaging to other anatomical tissues. In this paper, we raise the problem of HCC segmentation in DSA videos and build our own DSA dataset. We also propose a novel segmentation network called DSA-LTDNet, including a segmentation sub-network, a temporal difference learning (TDL) module, and a liver region segmentation (LRS) sub-network for providing additional guidance. DSA-LTDNet is preferable for learning the latent motion information from DSA videos proactively and boosting segmentation performance. All experiments are conducted on our self-collected dataset. Experimental results show that DSA-LTDNet increases the DICE score by nearly 4% compared to the U-Net baseline.

翻訳日:2021-07-12 13:40:42 公開日:2021-07-09

# 信頼度に基づく複数物体追跡のためのスコア改善

Score refinement for confidence-based 3D multi-object tracking ( http://arxiv.org/abs/2107.04327v1 )

ライセンス: Link先を確認

Nuri Benbarka, Jona Schröder, Andreas Zell

(参考訳) マルチオブジェクトトラッキングは、意思決定に有用な情報を提供するため、自律ナビゲーションにおいて重要なコンポーネントである。多くの研究者は、フレームごとの3D検出をフィルタリングすることで、3D多目的追跡タスクに取り組みましたが、その焦点は主に有用な特徴や適切なマッチングメトリクスを見つけることでした。我々の研究は追跡システムの無視された部分に焦点を当てている:スコアの洗練とトラックレットの終了。トラックレットスコアに応じてトラックレットを終了させながら、時間的一貫性に応じてスコアを操作することにより、追跡結果が向上することを示す。我々は、一致トラックレットのスコアをスコア更新機能で増加させ、一致しないトラックレットのスコアを減少させることによりこれを行う。数に基づく手法と比較して,様々な検出器とフィルタリングアルゴリズムを異なるデータセットで利用する場合,amotaとmotaスコアが一貫して向上する。 AMOTAのスコアは1.83と2.96まで改善された。また, 本手法を後期輸液センシング法として使用し, 投票に基づくアンサンブル法よりも有意な性能を示した。 AMOTAスコア67.6のnuScenesテスト評価を達成し、これは他の最先端のトラッカーと同等である。コードは: \url{https://github.com/cogsys-tuebingen/CBMOT}で公開されている。

Multi-object tracking is a critical component in autonomous navigation, as it provides valuable information for decision-making. Many researchers tackled the 3D multi-object tracking task by filtering out the frame-by-frame 3D detections; however, their focus was mainly on finding useful features or proper matching metrics. Our work focuses on a neglected part of the tracking system: score refinement and tracklet termination. We show that manipulating the scores depending on time consistency while terminating the tracklets depending on the tracklet score improves tracking results. We do this by increasing the matched tracklets' score with score update functions and decreasing the unmatched tracklets' score. Compared to count-based methods, our method consistently produces better AMOTA and MOTA scores when utilizing various detectors and filtering algorithms on different datasets. The improvements in AMOTA score went up to 1.83 and 2.96 in MOTA. We also used our method as a late-fusion ensembling method, and it performed better than voting-based ensemble methods by a solid margin. It achieved an AMOTA score of 67.6 on nuScenes test evaluation, which is comparable to other state-of-the-art trackers. Code is publicly available at: \url{https://github.com/cogsys-tuebingen/CBMOT}.
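
The score refinement and score-based termination described above can be sketched as follows. The abstract does not say which update functions are used, so the probabilistic-OR style increase, the fixed `decay`, and the `threshold` below are assumptions chosen for illustration, and all names are hypothetical.

```python
def refine_matched(track_score, detection_score):
    """Raise the score of a matched tracklet (probabilistic-OR style update)."""
    return 1.0 - (1.0 - track_score) * (1.0 - detection_score)


def refine_unmatched(track_score, decay=0.2):
    """Lower the score of a tracklet that found no detection this frame."""
    return max(0.0, track_score - decay)


def step_tracklet(track_score, detection_score, threshold=0.3):
    """Return (new_score, alive); detection_score is None when unmatched."""
    if detection_score is None:
        new_score = refine_unmatched(track_score)
    else:
        new_score = refine_matched(track_score, detection_score)
    # Terminate on the score itself rather than on a count of missed frames.
    return new_score, new_score >= threshold


score, alive = 0.5, True
history = []
for det in [0.6, None, None, None]:
    score, alive = step_tracklet(score, det)
    history.append((round(score, 2), alive))
```

A confident tracklet survives several missed frames before its score falls below the threshold, whereas a count-based rule would drop every tracklet after the same fixed number of misses regardless of confidence.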

翻訳日:2021-07-12 13:40:19 公開日:2021-07-09

# StyleCariGAN: StyleGAN特徴マップ変調による画像生成

StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation ( http://arxiv.org/abs/2107.04331v1 )

ライセンス: Link先を確認

Wonjong Jang, Gwangjin Ju, Yucheol Jung, Jiaolong Yang, Xin Tong, Seungyong Lee

(参考訳) StyleGAN を用いた形状とスタイルの操作に基づく似顔絵生成フレームワークを提案する。 stylecariganと呼ばれるこのフレームワークは、入力写真からリアルで詳細な似顔絵を自動的に作成し、形状の誇張度とカラースタイライゼーションタイプを任意に制御する。提案手法の鍵となる要素は,StyleGANの粗い層特徴写像を変調して,好適な図形強調を生成する形状強調ブロックである。まず、写真用StyleGANの微細な層を、画像生成の訓練を受けたStyleGANの対応する層に置き換えることで、写真から画像への変換が可能なStyleGANを構築した。入力写真が与えられた場合、層状混合モデルは似顔絵の詳細なカラースタイリングを生成するが、形状の誇張はない。次に、層混合モデルの粗い層に形状誇張ブロックを付加し、入力の特徴的外観を保ちながら形状誇張を作成するようにブロックを訓練する。実験結果から,我々のStyleCariGANは,現在の最先端手法と比較して,現実的で詳細な似顔絵を生成することがわかった。 StyleCariGANは、表情制御など、他のStyleGANベースの画像操作もサポートしています。

We present a caricature generation framework based on shape and style manipulation using StyleGAN. Our framework, dubbed StyleCariGAN, automatically creates a realistic and detailed caricature from an input photo with optional controls on shape exaggeration degree and color stylization type. The key component of our method is shape exaggeration blocks that are used for modulating coarse layer feature maps of StyleGAN to produce desirable caricature shape exaggerations. We first build a layer-mixed StyleGAN for photo-to-caricature style conversion by swapping fine layers of the StyleGAN for photos to the corresponding layers of the StyleGAN trained to generate caricatures. Given an input photo, the layer-mixed model produces detailed color stylization for a caricature but without shape exaggerations. We then append shape exaggeration blocks to the coarse layers of the layer-mixed model and train the blocks to create shape exaggerations while preserving the characteristic appearances of the input. Experimental results show that our StyleCariGAN generates realistic and detailed caricatures compared to the current state-of-the-art methods. We demonstrate StyleCariGAN also supports other StyleGAN-based image manipulations, such as facial expression control.

翻訳日:2021-07-12 13:39:58 公開日:2021-07-09

# 深部不連続保存画像登録ネットワーク

A Deep Discontinuity-Preserving Image Registration Network ( http://arxiv.org/abs/2107.04440v1 )

ライセンス: Link先を確認

Xiang Chen, Nishant Ravikumar, Yan Xia, Alejandro F Frangi

(参考訳) 画像登録は、ペアまたは画像のグループ間の空間対応を確立することを目的としており、医療画像計算とコンピュータ支援介入の基盤となっている。現在、ほとんどのディープラーニングベースの登録法は、所望の変形場は世界規模で滑らかで連続的であり、実際のシナリオ、特に医用画像の登録において必ずしも有効ではないと仮定している。心臓画像と腹部画像)。このようなグローバル制約は、不連続な組織界面におけるアーティファクトやエラーの増加につながる可能性がある。そこで本研究では,より優れた登録性能と現実的な変形場を得るため,ddir(deep discontinuity-preserving image registration network)を提案する。本手法は,UK Biobank Imaging Study (UKBB) の心臓磁気共鳴(MR)画像の登録実験において,最先端のアプローチよりも,登録精度を大幅に向上し,より現実的な変形を予測する。

Image registration aims to establish spatial correspondence across pairs, or groups of images, and is a cornerstone of medical image computing and computer-assisted-interventions. Currently, most deep learning-based registration methods assume that the desired deformation fields are globally smooth and continuous, which is not always valid for real-world scenarios, especially in medical image registration (e.g. cardiac imaging and abdominal imaging). Such a global constraint can lead to artefacts and increased errors at discontinuous tissue interfaces. To tackle this issue, we propose a weakly-supervised Deep Discontinuity-preserving Image Registration network (DDIR), to obtain better registration performance and realistic deformation fields. We demonstrate that our method achieves significant improvements in registration accuracy and predicts more realistic deformations, in registration experiments on cardiac magnetic resonance (MR) images from UK Biobank Imaging Study (UKBB), than state-of-the-art approaches.

翻訳日:2021-07-12 13:39:37 公開日:2021-07-09

# 視覚領域の変遷に伴うオープンワールド認識の課題について

On the Challenges of Open World Recognition under Shifting Visual Domains ( http://arxiv.org/abs/2107.04461v1 )

ライセンス: Link先を確認

Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo

(参考訳) 野生で動作しているロボット視覚システムは、異なる環境条件下で、未知の環境を含む様々な意味概念に直面しながら、制約のないシナリオで行動しなければならない。この目的のために、近年の研究では、視覚的オブジェクト認識手法をi)見知らぬ概念を検出し、i)新しいセマンティッククラスの画像が到着するにつれて、その知識を時間とともに拡張しようと試みている。 Open World Recognition (OWR)と呼ばれるこの設定は、初期トレーニングセットに存在するセマンティック制限を破ることのできるシステムを開発することを目的としている。しかしながら、このトレーニングセットは、実世界の高変動を必ずしも反映しない特定の取得条件に対するバイアスのため、システム自体の意味的限界だけでなく、環境的な制約も課している。このトレーニングとテスト分布の相違をドメインシフトと呼ぶ。本研究では、OWRアルゴリズムがドメインシフトの下で有効であるかどうかを調査し、OWRアルゴリズムの性能をドメインシフトなしで正確に評価するための最初のベンチマーク設定を示す。次に、このベンチマークを用いて様々なシナリオの分析を行い、既存のOWRアルゴリズムが、列車とテストの分布が異なる場合、いかに深刻な性能劣化を経験しているかを示す。解析の結果,この劣化はOWRと領域一般化手法の結合によってわずかに緩和されることが示され,既存のアルゴリズムのプラグアンドプレイだけでは未知の領域における新しいカテゴリや未知のカテゴリを認識するには不十分であることが示唆された。本研究は,ロボット視覚システムの構築において,これらの課題に対して,極めて現実的な条件下で確実に機能するために必要なオープンな課題と今後の研究方向性を,明らかに示している。 https://github.com/DarioFontanel/OWR-VisualDomainsで利用可能なコード

Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions while facing a variety of semantic concepts, including unknown ones. To this end, recent works tried to empower visual object recognition methods with the capability to i) detect unseen concepts and ii) extend their knowledge over time, as images of new semantic classes arrive. This setting, called Open World Recognition (OWR), aims to produce systems capable of breaking the semantic limits present in the initial training set. However, this training set imposes on the system not only its own semantic limits, but also environmental ones, due to its bias toward certain acquisition conditions that do not necessarily reflect the high variability of the real world. This discrepancy between training and test distribution is called domain-shift. This work investigates whether OWR algorithms are effective under domain-shift, presenting the first benchmark setup for fairly assessing the performance of OWR algorithms, with and without domain-shift. We then use this benchmark to conduct analyses in various scenarios, showing how existing OWR algorithms indeed suffer a severe performance degradation when train and test distributions differ. Our analysis shows that this degradation is only slightly mitigated by coupling OWR with domain generalization techniques, indicating that the mere plug-and-play of existing algorithms is not enough to recognize new and unknown categories in unseen domains. Our results clearly point toward open issues and future research directions that need to be investigated for building robot visual systems able to function reliably under these challenging yet very real conditions. Code available at https://github.com/DarioFontanel/OWR-VisualDomains

翻訳日:2021-07-12 13:39:19 公開日:2021-07-09

# モデル予測制御のための構造化ハマースタイン・ウィーナーモデル学習

Structured Hammerstein-Wiener Model Learning for Model Predictive Control ( http://arxiv.org/abs/2107.04247v1 )

ライセンス: Link先を確認

Ryuta Moriyasu, Taro Ikeda, Sho Kawaguchi, Kenji Kashima

(参考訳) 本稿では,機械学習によって構築されたモデルを用いて最適制御の信頼性を向上させることを目的とする。このようなモデルに基づく最適制御問題は一般に非凸であり、オンラインでは解決が難しい。本稿では,Hammerstein-Wienerモデルと入力凸ニューラルネットワークを組み合わせたモデルを提案する。提案モデルの重要な特徴は, 最適制御問題の発生は, 柔軟モデリング能力を維持しつつ, 対流性と部分線形性を効果的に活用できる点である。本手法の実用性について,エンジンエアパスシステムのモデル化と制御への応用を通して検討した。

This paper aims to improve the reliability of optimal control using models constructed by machine learning methods. Optimal control problems based on such models are generally non-convex and difficult to solve online. In this paper, we propose a model that combines the Hammerstein-Wiener model with input convex neural networks, which have recently been proposed in the field of machine learning. An important feature of the proposed model is that resulting optimal control problems are effectively solvable exploiting their convexity and partial linearity while retaining flexible modeling ability. The practical usefulness of the method is examined through its application to the modeling and control of an engine airpath system.

翻訳日:2021-07-12 13:38:49 公開日:2021-07-09

# HMMとCTCに基づくフルコンテキストASRモデルの格子フリー強化MMIトレーニングについて

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models ( http://arxiv.org/abs/2107.04154v1 )

ライセンス: Link先を確認

Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer

(参考訳) ハイブリッド自動音声認識(ASR)モデルは通常、CTCまたはLF-MMI基準で順次訓練される。しかし、それらは非常に異なる正統性を持ち、通常は異なるフレームワークで実装される。本稿では,モデリング単位とラベルトポロジの概念を分離し,適切な数値/デノミネータグラフを構築することにより,ハイブリッド音響モデリング(AM)のための一般化された枠組みを確立する。本フレームワークでは,HMM/CTCトポロジを持つワードピース/モノチャー/ビチャー/チェノン単位に対して,LF-MMIは限定コンテキストモデルとフルコンテキストモデルの両方に適用可能な,強力なトレーニング基準であることを示す。本フレームワークでは,チェノン(ch)/ワードピース(wp)-CTC-bMMI,ワードピース(wp)-HMM-bMMIの3つの新しいトレーニング手法を提案する。異なるトレーニングスキームの利点をLibrispeech上で総合的に評価し,wp-CTC-bMMIとch-CTC-bMMIを実世界の2つのタスクで評価し,その効果を示した。さらに、バイチャーHMM-MMIモデルが従来の非ニューラルGMM-HMMよりも優れたアライメントモデルとして機能することを示す。

Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, they have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts of modeling units and label topologies and building proper numerator/denominator graphs accordingly, we establish a generalized framework for hybrid acoustic modeling (AM). In this framework, we show that LF-MMI is a powerful training criterion applicable to both limited-context and full-context models, for wordpiece/mono-char/bi-char/chenone units, with both HMM/CTC topologies. From this framework, we propose three novel training schemes: chenone(ch)/wordpiece(wp)-CTC-bMMI, and wordpiece(wp)-HMM-bMMI with different advantages in training performance, decoding efficiency and decoding time-stamp accuracy. The advantages of different training schemes are evaluated comprehensively on Librispeech, and wp-CTC-bMMI and ch-CTC-bMMI are evaluated on two real world ASR tasks to show their effectiveness. Besides, we also show bi-char(bc) HMM-MMI models can serve as better alignment models than traditional non-neural GMM-HMMs.

翻訳日:2021-07-12 13:38:22 公開日:2021-07-09

# Bib2Auth: 文献データを用いた著者曖昧化のためのディープラーニングアプローチ

Bib2Auth: Deep Learning Approach for Author Disambiguation using Bibliographic Data ( http://arxiv.org/abs/2107.04382v1 )

ライセンス: Link先を確認

Zeyd Boukhers, Nagaraj Bahubali, Abinaya Thulsi Chandrasekaran, Adarsh Anand, Soniya Manchenahalli Gnanendra Prasadand, Sriram Aralappa

(参考訳) 著者名の曖昧さは、名前の同義語や同義語のため、デジタル図書館において重要な問題である。本稿では,著者の共著者パターンと研究領域に依存して,著者名を現実世界の実体に結びつける手法を提案する。本モデルでは,著者と著者の共著者との関係を捉え,対象著者の出版物のタイトルと出典によって表される研究領域を把握し,著者を特定する。これらの属性は、意味的および象徴的な表現によって符号化される。この目的のために、Bib2AuthはDBLPリポジトリから約22Kの書誌記録を使用し、それぞれの共著者でトレーニングされている。広範な実験により、同じ名前を共有する著者を区別し、異なる名前の作者を識別するアプローチの能力が証明された。 Bib2Authは比較的大きなデータセットで優れたパフォーマンスを示しており、書誌インデックスに直接組み込むことができる。

Author name ambiguity remains a critical open problem in digital libraries due to synonymy and homonymy of names. In this paper, we propose a novel approach to link author names to their real-world entities by relying on their co-authorship pattern and area of research. Our supervised deep learning model identifies an author by capturing his/her relationship with his/her co-authors and area of research, which is represented by the titles and sources of the target author's publications. These attributes are encoded by their semantic and symbolic representations. To this end, Bib2Auth uses ~ 22K bibliographic records from the DBLP repository and is trained with each pair of co-authors. The extensive experiments have proved the capability of the approach to distinguish between authors sharing the same name and recognize authors with different name variations. Bib2Auth has shown good performance on a relatively large dataset, which qualifies it to be directly integrated into bibliographic indices.

翻訳日:2021-07-12 13:37:56 公開日:2021-07-09

# 下流フェアネスのための多精度プロキシ

Multiaccurate Proxies for Downstream Fairness ( http://arxiv.org/abs/2107.04423v1 )

ライセンス: Link先を確認

Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, Aaron Roth, and Saeed Sharifi-Malvajerdi

(参考訳) 私たちは、センシティブな機能がトレーニング時に利用できない場合に、人口統計学的公正条件に従わなければならないモデルをトレーニングする問題を調査します。私たちはフェアネスパイプラインの観点を採用しており、センシティブな機能にアクセス可能な"上流"学習者は、他の属性からこれらの機能のプロキシモデルを学びます。プロキシの目標は、一般的な"ダウンストリーム"学習者 -- 予測タスクを最小限の仮定で -- が、プロキシを使用して、真に敏感な機能に対して公平なモデルをトレーニングできるようにすることです。我々は,この目的のために,下流モデルクラスに対する多精度制約に従うことを示し,サンプルおよびoracleの効率的なアルゴリズムと,そのようなプロキシを学ぶための一般化境界を提供する。一般に、多重精度は分類の正確さよりもずっと容易に満足でき、感度の高い特徴が予測しにくい場合でも満足できる。

We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time -- in other words, how can we train a model to be fair by race when we don't have data about race? We adopt a fairness pipeline perspective, in which an "upstream" learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes. The goal of the proxy is to allow a general "downstream" learner -- with minimal assumptions on their prediction task -- to be able to use the proxy to train a model that is fair with respect to the true sensitive features. We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose, and provide sample- and oracle-efficient algorithms and generalization bounds for learning such proxies. In general, multiaccuracy can be much easier to satisfy than classification accuracy, and can be satisfied even when the sensitive features are hard to predict.
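
The closing claim, that multiaccuracy can hold even when the sensitive feature is hard to predict, can be illustrated with a small check. This assumes the usual notion from the multiaccuracy literature (the proxy's error is uncorrelated with every function in a given class); the paper's exact constraint may be stated differently, and `multiaccuracy_violation`, `tests`, and the toy data are hypothetical, with `tests` standing in for the downstream model class.

```python
def multiaccuracy_violation(features, sensitive, proxy, test_functions):
    """Largest |mean of h(x) * (z - proxy(x))| over the test functions h."""
    n = len(features)
    worst = 0.0
    for h in test_functions:
        gap = sum(h(x) * (z - proxy(x)) for x, z in zip(features, sensitive)) / n
        worst = max(worst, abs(gap))
    return worst


features = [0.0, 1.0, 2.0, 3.0]
sensitive = [0, 1, 0, 1]  # true group membership, seen only upstream


def constant_proxy(x):
    """A proxy that classifies no individual correctly."""
    return 0.5


tests = [lambda x: 1.0, lambda x: 1.0 if x >= 2 else 0.0]
violation = multiaccuracy_violation(features, sensitive, constant_proxy, tests)

# A richer test class can expose the same proxy.
odd_test = [lambda x: 1.0 if x in (1.0, 3.0) else 0.0]
exposed = multiaccuracy_violation(features, sensitive, constant_proxy, odd_test)
```

The constant proxy never predicts an individual's group, yet its errors average out on every subgroup that `tests` can describe, so it passes the check for that class; whether a proxy is adequate therefore depends on how rich the downstream class is.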

翻訳日:2021-07-12 13:37:41 公開日:2021-07-09

# 再記述モデルマイニング

Redescription Model Mining ( http://arxiv.org/abs/2107.04462v1 )

ライセンス: Link先を確認

Felix I. Stamm, Martin Becker, Markus Strohmaier, Florian Lemmerich

(参考訳) 本稿では,属性のサブセットのみを共有し,共通インスタンスを持たない2つのデータセットにまたがる解釈可能なパターンを識別する,新しい手法であるRedescription Model Miningを紹介する。特に、再記述モデルマイニング(redescription model mining)は、予測可能なデータサブセットのペア(データセット毎にひとつ)を見つけることを目的としている。これを実現するために、以前は2つの研究領域、例外モデルマイニングと再定義マイニングを組み合わせた。この新しい問題設定のために, 有望なパターンの選択, 効率的なアルゴリズムの提案, 合成データおよび実世界データの可能性を示すための興味深い尺度を開発した。未知のパターンは、データセットにまたがって現れる共通の基礎的な現象をヒントにすることができ、同じデータセットに現れない属性間の(組み合わせ)関連を発見できる。

This paper introduces Redescription Model Mining, a novel approach to identify interpretable patterns across two datasets that share only a subset of attributes and have no common instances. In particular, Redescription Model Mining aims to find pairs of describable data subsets -- one for each dataset -- that induce similar exceptional models with respect to a prespecified model class. To achieve this, we combine two previously separate research areas: Exceptional Model Mining and Redescription Mining. For this new problem setting, we develop interestingness measures to select promising patterns, propose efficient algorithms, and demonstrate their potential on synthetic and real-world data. Uncovered patterns can hint at common underlying phenomena that manifest themselves across datasets, enabling the discovery of possible associations between (combinations of) attributes that do not appear in the same dataset.

翻訳日:2021-07-12 13:37:26 公開日:2021-07-09

# ディープラーニングモバイルトラフィック分類におけるクラスインクリメンタル学習の初見

A First Look at Class Incremental Learning in Deep Learning Mobile Traffic Classification ( http://arxiv.org/abs/2107.04464v1 )

ライセンス: Link先を確認

Giampaolo Bovenzi, Lixuan Yang, Alessandro Finamore, Giuseppe Aceto, Domenico Ciuonzo, Antonio Pescap\`e, Dario Rossi

(参考訳) 近年のDeep Learning(DL)の普及により、トラフィック分類への関心が再燃し、インターネットアプリケーションのトラフィックを特定するためのDLベースの分類器の正確性を示す研究がいくつか行われた。ハードウェアアクセラレータ(GPU、TPU)の助けを借りても、DLモデルのトレーニングは高価であり、インターネットトラフィックの進化する性質、特にモバイルトラフィックに適合するために必要な頻繁なモデル更新を運用する能力を制限する。この問題点に対処するため、本研究では、モデルにフルリトレーニングなしで新しいクラスを追加するためのインクリメンタルラーニング(il)技術を検討し、モデルの更新サイクルをスピードアップします。 iCarlはアートILメソッドのステートであり、MIRAGE-2019は40のAndroidアプリからのトラフィックを持つパブリックデータセットであり、「トラフィック分類に漸進的な学習がある場合」を理解することを目的としている。 iCarl内部を分離することにより、設計を改善する方法について議論し、iCarl+という改訂版に寄与する。当社の分析結果から、il技術はdlベースの自動トラヒック分析システムに向けたロードマップにおいて有望な研究領域である事が分かりました。

The recent growth in popularity of Deep Learning (DL) has re-ignited interest in traffic classification, with several studies demonstrating the accuracy of DL-based classifiers in identifying the traffic of Internet applications. Even with the aid of hardware accelerators (GPUs, TPUs), DL model training remains expensive, which limits the ability to perform the frequent model updates needed to keep up with the ever-evolving nature of Internet traffic, and mobile traffic in particular. To address this pain point, in this work we explore Incremental Learning (IL) techniques to add new classes to models without a full retraining, hence speeding up the model update cycle. We consider iCarl, a state-of-the-art IL method, and MIRAGE-2019, a public dataset with traffic from 40 Android apps, aiming to understand "if there is a case for incremental learning in traffic classification". By dissecting iCarl internals, we discuss ways to improve its design, contributing a revised version, namely iCarl+. Although our analysis reveals their infancy, IL techniques are a promising research area on the roadmap towards automated DL-based traffic analysis systems.
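To make "adding new classes without a full retraining" concrete, here is a minimal sketch of the nearest-mean-of-exemplars rule that iCaRL uses at classification time. It is only that one ingredient: exemplar selection, the distillation loss and the feature extractor are omitted, features are assumed to be precomputed vectors, and the class name, labels and numbers are hypothetical. The iCarl+ changes described in the abstract are not modelled.

```python
import math


class NearestMeanClassifier:
    """Classify a feature vector by the closest per-class exemplar mean."""

    def __init__(self):
        self.means = {}  # label -> mean feature vector

    def add_class(self, label, exemplars):
        """Register a new class from its exemplar features; old classes are untouched."""
        dim = len(exemplars[0])
        self.means[label] = [
            sum(e[i] for e in exemplars) / len(exemplars) for i in range(dim)
        ]

    def predict(self, feature):
        """Return the label whose exemplar mean is nearest in Euclidean distance."""
        return min(self.means, key=lambda c: math.dist(feature, self.means[c]))


clf = NearestMeanClassifier()
clf.add_class("app_a", [[0.0, 0.0], [0.0, 2.0]])
clf.add_class("app_b", [[4.0, 4.0], [6.0, 4.0]])
before = clf.predict([9.0, 1.0])   # only two classes known so far

# A new app appears: one more exemplar set, no retraining of the others.
clf.add_class("app_c", [[10.0, 0.0], [10.0, 2.0]])
after = clf.predict([9.0, 1.0])
print(before, after)
```

In this toy run the same sample is first assigned to the closest known class and then to the newly added one, while predictions near the old class means are unchanged.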

翻訳日:2021-07-12 13:37:11 公開日:2021-07-09

# BayesSimIG:IsaacGymを用いた適応的ドメインランダム化のためのスケーラブルパラメータ推論

BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym ( http://arxiv.org/abs/2107.04527v1 )

ライセンス: Link先を確認

Rika Antonova, Fabio Ramos, Rafael Possas, Dieter Fox

(参考訳) BayesSimは、シミュレーションパラメータの確率自由推論に基づく強化学習における領域ランダム化の統計手法である。本稿では、最近リリースされたNVIDIA IsaacGymと統合されたBayesSimの実装を提供するライブラリであるBayesSimIGの概要を紹介する。この組み合わせにより、エンドツーエンドgpuアクセラレーションによる大規模パラメータ推論が可能になる。推論とシミュレーションの両方にgpuのスピードアップがあり、100以上のシミュレーションパラメータを持つ複雑なロボットタスクに対して、10k以上の並列シミュレーション環境の実行をサポートする。 BayesSimIGは、高次元の後方のスライスを簡単に視覚化するTensorBoardとの統合を提供する。このライブラリはモジュール的な方法で構築され、並列IsaacGym環境から軌跡を収集・処理する新しい方法で研究実験を支援する。

BayesSim is a statistical technique for domain randomization in reinforcement learning based on likelihood-free inference of simulation parameters. This paper outlines BayesSimIG: a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym. This combination allows large-scale parameter inference with end-to-end GPU acceleration. Both inference and simulation get GPU speedup, with support for running more than 10K parallel simulation environments for complex robotics tasks that can have more than 100 simulation parameters to estimate. BayesSimIG provides an integration with TensorBoard to easily visualize slices of high-dimensional posteriors. The library is built in a modular way to support research experiments with novel ways to collect and process the trajectories from the parallel IsaacGym environments.

翻訳日:2021-07-12 13:36:51 公開日:2021-07-09

# ロボット学習のためのタスク推論を支援する行動自己組織化

Behavior Self-Organization Supports Task Inference for Continual Robot Learning ( http://arxiv.org/abs/2107.04533v1 )

ライセンス: Link先を確認

Muhammad Burhan Hafez, Stefan Wermter

(参考訳) ロボット学習の最近の進歩により、ロボットは事前定義されたタスクを習得する能力がますます向上している。一方、人間として、私たちは生涯にわたって増え続けるタスクを学習する能力を持っています。連続的なロボット学習は、ロボットにこの能力を与えることを目標とする、新たな研究方向である。時間とともに新しいタスクを学ぶために、ロボットはまず手元のタスクを推測する必要がある。しかし,タスク推論はマルチタスク学習文学においてほとんど注目されていない。本稿では,ロボット制御タスクの連続学習のための新しい手法を提案する。提案手法は,段階的な自己組織的行動による行動埋め込みの教師なし学習を行う。タスク推論は、タスクよりもパフォーマンスを最適化するために強化学習で訓練されたマルチタスクポリシーへの入力として、環境状態とともに使用される実証行動に最も近い振る舞いを埋め込むことによって行われる。従来の手法とは異なり,本手法ではタスク分布の仮定は行わず,タスクを推論するタスク探索は不要である。並列かつ逐次的に提示されたタスクを用いた実験において,本手法は一般化性能と収束速度,特に連続学習環境において,他のマルチタスク学習手法よりも優れていることを示す。

Recent advances in robot learning have enabled robots to become increasingly better at mastering a predefined set of tasks. On the other hand, as humans, we have the ability to learn a growing set of tasks over our lifetime. Continual robot learning is an emerging research direction with the goal of endowing robots with this ability. In order to learn new tasks over time, the robot first needs to infer the task at hand. Task inference, however, has received little attention in the multi-task learning literature. In this paper, we propose a novel approach to continual learning of robotic control tasks. Our approach performs unsupervised learning of behavior embeddings by incrementally self-organizing demonstrated behaviors. Task inference is made by finding the nearest behavior embedding to a demonstrated behavior, which is used together with the environment state as input to a multi-task policy trained with reinforcement learning to optimize performance over tasks. Unlike previous approaches, our approach makes no assumptions about task distribution and requires no task exploration to infer tasks. We evaluate our approach in experiments with concurrently and sequentially presented tasks and show that it outperforms other multi-task learning approaches in terms of generalization performance and convergence speed, particularly in the continual learning setting.
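The two steps described above, incrementally organizing demonstrated behaviors and then inferring the task from the nearest behavior embedding, can be sketched as follows. This is a generic threshold-based prototype scheme, not the self-organizing network of the paper; the function names, the distance threshold and the learning rate are assumptions made for illustration.

```python
import math


def organize(prototypes, behavior, threshold=1.0, lr=0.5):
    """Adapt the nearest prototype if it is close enough, otherwise create a new one.

    Returns the index of the prototype that now represents the behavior.
    """
    if prototypes:
        idx = min(range(len(prototypes)),
                  key=lambda i: math.dist(prototypes[i], behavior))
        if math.dist(prototypes[idx], behavior) <= threshold:
            # Move the matched prototype part of the way towards the new sample.
            prototypes[idx] = [p + lr * (b - p)
                               for p, b in zip(prototypes[idx], behavior)]
            return idx
    prototypes.append(list(behavior))
    return len(prototypes) - 1


def policy_input(prototypes, demonstrated, state):
    """Task inference: nearest behavior embedding, concatenated with the state."""
    idx = min(range(len(prototypes)),
              key=lambda i: math.dist(prototypes[i], demonstrated))
    return list(prototypes[idx]) + list(state)


prototypes = []
first = organize(prototypes, [0.0, 0.0])    # creates prototype 0
second = organize(prototypes, [0.4, 0.0])   # close enough: adapts prototype 0
third = organize(prototypes, [5.0, 5.0])    # far away: creates prototype 1
print(first, second, third, prototypes)
print(policy_input(prototypes, [4.5, 5.0], [1.0]))
```

No task labels or task distribution enter the sketch: a new task simply shows up as a behavior that is far from every existing prototype.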

翻訳日:2021-07-12 13:36:38 公開日:2021-07-09

# 流体シミュレーションの低次モデリングと効率的な時間進化のための深層学習

Deep Learning for Reduced Order Modelling and Efficient Temporal Evolution of Fluid Simulations ( http://arxiv.org/abs/2107.04556v1 )

ライセンス: Link先を確認

Pranshu Pant, Ruchit Doshi, Pranav Bahl, Amir Barati Farimani

(参考訳) Reduced Order Modelling (ROM) は、高次力学系の低次で計算コストの低い表現を作成するために広く用いられている。これらの表現を用いて、romはより少ないパラメータを使いながら効率的にフローフィールドをモデル化することができる。従来のROMは高階多様体を低次元空間に直線的に射影することでこれを達成し、プロパー直交分解(POD)のような次元還元手法を用いる。本研究では,非線形射影によって順序状態が減少するニューラルネットワークを構築するための,新しい深層学習フレームワークdl-rom(deep learning- reduced order modelling)を開発した。次に,3次元オートエンコーダと3次元U-Netアーキテクチャを用いて,学習した縮小状態を用いてシミュレーションの時間ステップを効率的に予測する。我々のモデルDL-ROMは、学習したROMから高精度な再構成を生成でき、学習した縮小状態を時間的にトラバースすることで、将来の時間ステップを効率的に予測することができる。これらはすべて、地上の真実を監督したり、高価なNavier-Stokes(NS)方程式を反復的に解決する必要なく達成される。提案手法の有効性と性能を検証するため,計算機流体力学(CFD)データセットを再構成性能と計算ランタイムメトリクスを用いて評価した。 DL-ROMは、許容誤差閾値を維持しながら、反復解法の計算ランタイムを2桁近く削減することができる。

Reduced Order Modelling (ROM) has been widely used to create lower-order, computationally inexpensive representations of higher-order dynamical systems. Using these representations, ROMs can efficiently model flow fields while using significantly fewer parameters. Conventional ROMs accomplish this by linearly projecting higher-order manifolds to lower-dimensional space using dimensionality reduction techniques such as Proper Orthogonal Decomposition (POD). In this work, we develop a novel deep learning framework, DL-ROM (Deep Learning - Reduced Order Modelling), to create a neural network capable of non-linear projections to reduced order states. We then use the learned reduced state to efficiently predict future time steps of the simulation using 3D Autoencoder and 3D U-Net based architectures. Our model DL-ROM is able to create highly accurate reconstructions from the learned ROM and is thus able to efficiently predict future time steps by temporally traversing in the learned reduced state. All of this is achieved without ground truth supervision or needing to iteratively solve the expensive Navier-Stokes (NS) equations, thereby resulting in massive computational savings. To test the effectiveness and performance of our approach, we evaluate our implementation on five different Computational Fluid Dynamics (CFD) datasets using reconstruction performance and computational runtime metrics. DL-ROM can reduce the computational runtimes of iterative solvers by nearly two orders of magnitude while maintaining an acceptable error threshold.
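For contrast with the non-linear projection of DL-ROM, the conventional linear baseline mentioned above (POD) can be sketched in a few lines. The sketch extracts only the leading POD mode by power iteration on the snapshot correlation matrix, skips mean subtraction, and uses tiny hand-made snapshots; the function names and data are hypothetical, and a practical implementation would use an SVD routine from a numerical library instead.

```python
import math


def leading_pod_mode(snapshots, iters=200):
    """Leading POD mode (unit vector) of a list of equally sized snapshot vectors."""
    dim = len(snapshots[0])
    # Correlation matrix C = sum_k s_k s_k^T, built explicitly because dim is tiny here.
    corr = [[sum(s[i] * s[j] for s in snapshots) for j in range(dim)]
            for i in range(dim)]
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def encode(mode, snapshot):
    """Linear projection of a snapshot onto the mode: one reduced coefficient."""
    return sum(m * s for m, s in zip(mode, snapshot))


def decode(mode, coeff):
    """Reconstruction of the full state from the reduced coefficient."""
    return [coeff * m for m in mode]


# Three snapshots of a 3-point "flow field" that all lie on one line.
snapshots = [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [-1.0, -2.0, -2.0]]
mode = leading_pod_mode(snapshots)
reduced = encode(mode, snapshots[1])
print(mode, reduced, decode(mode, reduced))
```

Because these snapshots are exactly rank one, a single mode reconstructs them without error; data lying on a curved manifold would not compress this well under a linear projection, which is the gap a learned non-linear encoder is meant to close.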

翻訳日:2021-07-12 13:36:20 公開日:2021-07-09

# 良性および悪性眼腫瘍進展推定のための深層学習モデル

Deep Learning models for benign and malign Ocular Tumor Growth Estimation ( http://arxiv.org/abs/2107.04220v1 )

ライセンス: Link先を確認

Mayank Goswami

(参考訳) 医療画像データが比較的豊富に利用できることは、ニューラルネットワークベースの画像処理手法の開発とテストにおいて重要な支えとなっている。臨床医は、医療画像データに適した画像処理アルゴリズムを選択する際にしばしば問題に直面する。ここでは、適切なモデルを選択するための戦略を示す。トレーニングデータセットは、100日以上の経過観察を行った50個のマウス眼の光コヒーレンストモグラフィ(OCT)および血管造影(OCT-A)画像からなる。このデータには、治療を受けたマウスと受けていないマウスの眼の画像が含まれている。(a)腫瘍領域と正常網膜層の自動識別、および(b)3次元眼腫瘍体積のセグメンテーションについて、4種類のディープラーニングモデルを試験した。8つの性能指標を用いて、訓練・試験画像の数に対するディープラーニングモデルの網羅的な感度解析を行い、精度、信頼性・再現性、速度を調べた。治療を行った悪性腫瘍データセット(変動が大きい)にはUVgg16を用いたU-netが、良性腫瘍データ(変動が小さい)にはInceptionバックボーンを用いたU-netが最適である。損失値と二乗平均平方根誤差(R.M.S.E.)は、それぞれ最も感度の高い性能指標と最も感度の低い性能指標であることがわかった。(指標で測った)性能は、トレーニング画像の数に対して指数関数的に向上する。セグメンテーションされたOCT-Angiographyデータから、血管新生が腫瘍体積を増加させることが示唆された。画像解析により、光線力学的イメージング支援腫瘍治療プロトコルが、急速に増殖する腫瘍を嚢胞に変化させていることが明らかとなった。画像の数や特徴の種類に応じて、医療専門家が特定のモデルを選択するのに役立つ経験式を得た。生体医用画像解析に特定の深層学習モデルを採用する前に、ここで示した手順を標準的な実践として行うことを推奨する。

The relatively abundant availability of medical imaging data has provided significant support in the development and testing of Neural Network based image processing methods. Clinicians often face issues in selecting a suitable image processing algorithm for medical imaging data. A strategy for the selection of a proper model is presented here. The training data set comprises optical coherence tomography (OCT) and angiography (OCT-A) images of 50 mouse eyes with more than 100 days of follow-up. The data contains images from treated and untreated mouse eyes. Four deep learning variants are tested for automatic (a) differentiation of the tumor region from the healthy retinal layer and (b) segmentation of 3D ocular tumor volumes. An exhaustive sensitivity analysis of the deep learning models is performed with respect to the number of training and testing images, using eight performance indices to study accuracy, reliability/reproducibility, and speed. U-net with UVgg16 is best for the malignant tumor data set with treatment (having considerable variation) and U-net with Inception backbone for the benign tumor data (with minor variation). Loss value and root mean square error (R.M.S.E.) are found to be the most and least sensitive performance indices, respectively. The performance (via indices) is found to improve exponentially with the number of training images. The segmented OCT-Angiography data shows that neovascularization drives the tumor volume. Image analysis shows that the photodynamic imaging-assisted tumor treatment protocol is transforming an aggressively growing tumor into a cyst. An empirical expression is obtained to help medical professionals choose a particular model given the number of images and types of characteristics. We recommend that the presented exercise be taken as standard practice before employing a particular deep learning model for biomedical image analysis.

翻訳日:2021-07-12 13:35:39 公開日:2021-07-09

# VMAFとVMAF NEGのハック:異なる前処理に対するメトリクス脆弱性

Hacking VMAF and VMAF NEG: metrics vulnerability to different preprocessing ( http://arxiv.org/abs/2107.04510v1 )

ライセンス: Link先を確認

Maksim Siniukov, Anastasia Antsiferova, Dmitriy Kulikov, Dmitriy Vatolin

(参考訳) ビデオ品質測定は、ビデオ処理アプリケーションの開発において重要な役割を果たす。本稿では,ビデオプリプロセッシングにより,VMAFとそのチューニング耐性バージョンVMAF NEGが人工的に向上可能であることを示す。我々は,vmafを最大218.8%増加させる処理アルゴリズムのパラメータをチューニングするパイプラインを提案する。前処理したビデオの主観的な比較では、ほとんどの方法では、視覚的品質は低下するか、変わらないままである。また,vmaf negスコアは,前処理法によって最大23.6%向上できることを示した。

Video quality measurement plays a critical role in the development of video processing applications. In this paper, we show how the popular quality metric VMAF and its tuning-resistant version VMAF NEG can be artificially increased by video preprocessing. We propose a pipeline for tuning parameters of processing algorithms that allows increasing VMAF by up to 218.8%. A subjective comparison of preprocessed videos showed that with the majority of methods visual quality drops or stays unchanged. We show that VMAF NEG scores can also be increased by some preprocessing methods by up to 23.6%.

翻訳日:2021-07-12 13:35:12 公開日:2021-07-09

# 補間を伴うブロック交代ブレグマンメジャー化最小化

Block Alternating Bregman Majorization Minimization with Extrapolation ( http://arxiv.org/abs/2107.04395v1 )

ライセンス: Link先を確認

Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis, Masoud Ahookhosh, Panagiotis Patrinos

(参考訳) 本稿では,ブロック相対滑らか関数と固有かつ下半連続ブロック分離関数の和を目的とする非滑らかな非凸最適化問題のクラスを考える。ブロックのクラスに対するブロック近位勾配 (BPG) 法の解析は、ブロック相対滑らか関数のクラスを扱うブレグマンBPG法にうまく拡張されているが、加速されたブレグマンBPG法は不足しており、設計が困難である。本研究では,Nesterov型加速法と最大化最小化法から着想を得たBregman Majorization-Minimization framework with Extrapolation (BMME)を提案する。軽微な仮定の下でBMMEの1次定常点への後続収束を証明し、その大域収束を強い条件下で研究する。直交非負行列分解問題に対するBMMEの有効性について述べる。

In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.

翻訳日:2021-07-12 13:35:03 公開日:2021-07-09

# 多経路畳み込みニューラルネットワークによる持続的肺音検出における特徴抽出の効率化

Multi-path Convolutional Neural Networks Efficiently Improve Feature Extraction in Continuous Adventitious Lung Sound Detection ( http://arxiv.org/abs/2107.04226v1 )

ライセンス: Link先を確認

Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Chun-Chieh Chen, Yuan-Ren Cheng, Feipei Lai

(参考訳) 我々は以前, 大きな肺音データベースhf_lung_v2 (lung_v2) を構築した。我々は,Lung_V2に基づいて,吸入,吸入,連続的冒険音(CAS),不連続的冒険音を検出するために,畳み込み二方向ゲートリカレントユニット(CNN-BiGRU)ネットワークを訓練した。しかし,CAS検出性能は多種多様であり,その1つが高度に多様化したCASパターンである。元々のcnn-bigruモデルがcasパターンをより効果的に学習し、計算負荷を過大にしないようにするため、cnn層のネットワークアーキテクチャの最小限の変更を含む3つの戦略について検討した。(1)cnn層を残留ブロックを用いてより深く、(2)cnnカーネルの数を増やしてcnn層を少し大きくし、(3)入力を複数のパスに分離する(モデルはマルチパスcnn-bigruで示される)。 CASセグメントとイベント検出の性能を評価した。その結果,提案したアーキテクチャ修正モデルでCAS検出の改善が認められた。 CASイベント検出のためのF1スコアは0.445から0.491-0.530に増加した。しかし,マルチパスcnn-bigruモデルは,9つの評価指標において,優勝タイトル数 (5) において他のモデルよりも優れていた。さらに、マルチパスCNN-BiGRUモデルでは、元のCNN-BiGRUモデルと比べて余分な計算負荷(0.97倍の推論時間)は生じなかった。結論として、マルチパスCNN層は、特徴抽出の有効性を効率よく改善し、その結果、CAS検出が向上する。

We previously established a large lung sound database, HF_Lung_V2 (Lung_V2). We trained convolutional-bidirectional gated recurrent unit (CNN-BiGRU) networks for detecting inhalation, exhalation, continuous adventitious sound (CAS) and discontinuous adventitious sound at the recording level on the basis of Lung_V2. However, the performance of CAS detection was poor for many reasons, one of which is the highly diversified CAS patterns. To make the original CNN-BiGRU model learn the CAS patterns more effectively without causing too much computing burden, three strategies involving minimal modifications of the network architecture of the CNN layers were investigated: (1) making the CNN layers a bit deeper by using residual blocks, (2) making the CNN layers a bit wider by increasing the number of CNN kernels, and (3) separating the feature input into multiple paths (the model was denoted by Multi-path CNN-BiGRU). The performance of CAS segment and event detection was evaluated. Results showed that improvement in CAS detection was observed among all the proposed architecture-modified models. The F1 score for CAS event detection of the proposed models increased from 0.445 to 0.491-0.530, which was deemed significant. However, the Multi-path CNN-BiGRU model outperformed the other models in terms of the number of winning titles (five) out of nine evaluation metrics in total. In addition, the Multi-path CNN-BiGRU model did not cause extra computing burden (0.97-fold inference time) compared to the original CNN-BiGRU model. In conclusion, the Multi-path CNN layers can efficiently improve the effectiveness of feature extraction and subsequently result in better CAS detection.

翻訳日:2021-07-12 13:34:33 公開日:2021-07-09

# 混合訓練とドメイン適応を用いた肺・気管音の呼吸位相の改善と持続的予防音検出

Improved Breath Phase and Continuous Adventitious Sound Detection in Lung and Tracheal Sound Using Mixed Set Training and Domain Adaptation ( http://arxiv.org/abs/2107.04229v1 )

ライセンス: Link先を確認

Fu-Shun Hsu, Shang-Ran Huang, Chang-Fu Su, Chien-Wen Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Chun-Yu Wu, Chung-Wei Chen, Yen-Chun Lai, Tang-Wei Cheng, Nian-Jhen Lin, Wan-Ling Tsai, Ching-Shiang Lu, Chuan Chen, Feipei Lai

(参考訳) 従来, 肺音データベースHF_Lung_V2を構築し, 吸入, 吸入, 持続的興奮音 (CAS) , 不連続的不定音検出能力を有する畳み込み双方向ゲート再帰器 (CNN-BiGRU) モデルを提案した。本研究では, 気管音響データベースHF_Tracheal_V1を構築し, 15秒間気管音響記録の11107, 23087 吸入ラベル, 16728 吸入ラベル, 6874 CASラベルを含む。 HF_Tracheal_V1の気管音とHF_Lung_V2の肺音を組み合わせるか単独でCNN-BiGRUモデルを訓練し,気管音響解析を行った。その結果,(1)完全訓練(スクラッチからトレーニング)を用いて気管音を用いて肺音モデルを訓練し,(2)気管音のみを用いて気管音モデルを訓練すること,(2)気管音と気管音の両方を含む混合セットを用いてモデルを訓練すること,(3)気管音データと予め訓練された肺音モデルを微調整した領域適応を用いること,の2つを比較した。その結果, 気管音響解析では, 肺音のみを訓練したモデルが不十分であった。しかし、混合セットトレーニングとドメイン適応は、肺音における呼気およびCAS検出の性能を改善し、気管音における吸気、呼気、CAS検出を正の制御(肺音でのみ訓練された肺モデルとその逆)と比較して改善することができる。特に2羽の鳥を1羽の石で殺す場合、混合セットトレーニングに由来するモデルが一般的である。

Previously, we established a lung sound database, HF_Lung_V2, and proposed convolutional bidirectional gated recurrent unit (CNN-BiGRU) models with adequate ability for inhalation, exhalation, continuous adventitious sound (CAS), and discontinuous adventitious sound detection in the lung sound. In this study, we proceeded to build a tracheal sound database, HF_Tracheal_V1, containing 11107 15-second tracheal sound recordings, 23087 inhalation labels, 16728 exhalation labels, and 6874 CAS labels. The tracheal sound in HF_Tracheal_V1 and the lung sound in HF_Lung_V2 were either combined or used alone to train the CNN-BiGRU models for respective lung and tracheal sound analysis. Different training strategies were investigated and compared: (1) using full training (training from scratch) to train the lung sound models using lung sound alone and train the tracheal sound models using tracheal sound alone, (2) using a mixed set that contains both the lung and tracheal sound to train the models, and (3) using domain adaptation that finetuned the pre-trained lung sound models with the tracheal sound data and vice versa. Results showed that the models trained only by lung sound performed poorly in the tracheal sound analysis and vice versa. However, the mixed set training and domain adaptation can improve the performance of exhalation and CAS detection in the lung sound, and inhalation, exhalation, and CAS detection in the tracheal sound compared to positive controls (lung models trained only by lung sound and vice versa). In particular, a model derived from the mixed set training prevails when the aim is to kill two birds with one stone.

翻訳日:2021-07-12 13:34:05 公開日:2021-07-09

# ポリフォニック録音におけるブラインド音源分離のためのポリシー勾配によるディープニューラルネットワークの訓練

Training a Deep Neural Network via Policy Gradients for Blind Source Separation in Polyphonic Music Recordings ( http://arxiv.org/abs/2107.04235v1 )

ライセンス: Link先を確認

S\"oren Schulze, Johannes Leuschner, Emily J. King

(参考訳) 音響信号における楽器の音の盲点分離法を提案する。パラメトリックモデルを用いて個々の音色を記述し、調波の相対振幅を捉えるために辞書を訓練する。モデルパラメータは、ディープニューラルネットワークの一種であるu-netを介して予測される。ネットワークは、モデル予測と個々のSTFT時間フレームの差に基づいて、地上の真理情報なしで訓練される。モデルパラメータのいくつかは有用なバックプロパゲーション勾配を与えないため、それらを確率的にモデル化し、代わりにポリシー勾配を用いる。辞書に基づく表現における不正確性を考慮した位相情報を提供するため,ネットワークに直接予測を行い,各楽器の音声信号の合成を行う。ニューラルネットワークの柔軟性のため、不調和性をシームレスに組み込むことができ、入力スペクトルの前処理は不要である。提案手法は,学習のための十分なデータと,楽器のスペクトル特性が辞書に近似されるほど十分に安定していることから,音響的および合成的に様々な音声サンプルに対する干渉が少なく,高品質な分離結果が得られる。

We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual STFT time frames. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.

翻訳日:2021-07-12 13:33:30 公開日:2021-07-09

# ReLUアクティベーションを用いたニューラルネットワークのトレーニングにおける勾配流の収束解析

Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation ( http://arxiv.org/abs/2107.04479v1 )

ライセンス: Link先を確認

Arnulf Jentzen and Adrian Riekert

(参考訳) 勾配降下(GD)型最適化スキームは、ニューラルネットワーク(ANN)を修正線形単位(ReLU)アクティベーションで訓練する標準的な方法である。このようなスキームは、ReLU アクティベーションを持つ ANN のトレーニングに関連する勾配流(GF)の離散化と、ReLU アクティベーションを持つ ANN のトレーニングにおける GD 型最適化スキームの数学的収束解析におけるほとんどの重要な困難が、対応する GF 微分方程式の力学に既に存在していると考えられる。この研究は、ReLUアクティベーションと3つの層(入力層1つ、隠蔽層1つ、出力層1つ)を持つANNのトレーニングにおいて、そのようなGF微分方程式を分析する上で重要な課題である。特に、本論文では、対象関数が多次元かつ連続な場合と、入力データの確率分布がルベーグ測度に対して絶対連続である場合において、すべての有界GF軌道のリスクが臨界点のリスクに収束することを証明する。さらに,本論文では, 1次元アフィン線形対象関数の場合と, 入力データの確率分布が標準均一分布と一致する場合において, 初期リスクが十分に小さい場合には, 有界GF軌道のリスクが0に収束することを示す。最後に、隠れた層(1次元の隠蔽層)に1つのニューロンしか存在しない特別な状況において、初期リスクが十分に小さい場合、すべての(必ずしも有界ではない)GF軌道のリスクがゼロに収束することを証明することによって、アフィン線形対象関数に対する上記の名前付き結果を強化する。

Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in the training of ANNs with ReLU activation seem to be already present in the dynamics of the corresponding GF differential equations. It is the key subject of this work to analyze such GF differential equations in the training of ANNs with ReLU activation and three layers (one input layer, one hidden layer, and one output layer). In particular, in this article we prove in the case where the target function is possibly multi-dimensional and continuous and in the case where the probability distribution of the input data is absolutely continuous with respect to the Lebesgue measure that the risk of every bounded GF trajectory converges to the risk of a critical point. In addition, in this article we show in the case of a 1-dimensional affine linear target function and in the case where the probability distribution of the input data coincides with the standard uniform distribution that the risk of every bounded GF trajectory converges to zero if the initial risk is sufficiently small. Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer) we strengthen the above-named result for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.
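The statement that GD schemes are discretizations of gradient flows can be written out in two lines. The notation ($\Theta$ for the network parameters, $\mathcal{L}$ for the risk, $\gamma$ for the step size, $\theta_0$ for the initial value) is chosen here for illustration and is not taken from the paper; since the ReLU risk is not differentiable everywhere, $\nabla \mathcal{L}$ should be read as a suitable generalized gradient.

```latex
% Gradient flow: the parameters follow the direction of steepest descent of the risk
\Theta'(t) = -\nabla \mathcal{L}\bigl(\Theta(t)\bigr), \qquad \Theta(0) = \theta_0 .

% Explicit Euler discretization with step size \gamma > 0: plain gradient descent
\Theta_{n+1} = \Theta_n - \gamma \, \nabla \mathcal{L}(\Theta_n), \qquad n = 0, 1, 2, \dots
```

Letting $\gamma \to 0$ while keeping $t = n\gamma$ fixed formally recovers the flow from the iteration, which is why convergence questions for GD are first studied on the GF differential equation.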

翻訳日:2021-07-12 13:33:12 公開日:2021-07-09

# コミュニティ進化予測のためのグループノード注意

Group-Node Attention for Community Evolution Prediction ( http://arxiv.org/abs/2107.04522v1 )

ライセンス: Link先を確認

Matt Revelle, Carlotta Domeniconi, Ben Gelman

(参考訳) ソーシャルネットワークのコミュニティは、人々がネットワークに入り、去るにつれて進化し、活動行動は変化する。時間とともにコミュニティの構造変化を予測するタスクは、コミュニティ進化予測として知られている。この領域における既存の作業は、実際の予測を行うために従来の分類手法を使用しながら、イベントを定義するフレームワークの開発に焦点を当ててきた。本稿では,構造的および時間的情報からコミュニティ進化イベントを予測するための新しいグラフニューラルネットワークを提案する。モデル(GNAN)は、可変サイズの入力と、メンバーおよび隣接ノードの特徴に基づくグループの学習表現を可能にするグループノードアテンションコンポーネントを含む。標準ベースライン法との比較評価を行い,本モデルがベースラインよりも優れていることを示す。さらに,ネットワークの傾向がモデル性能に及ぼす影響を示す。

Communities in social networks evolve over time as people enter and leave the network and their activity behaviors shift. The task of predicting structural changes in communities over time is known as community evolution prediction. Existing work in this area has focused on the development of frameworks for defining events while using traditional classification methods to perform the actual prediction. We present a novel graph neural network for predicting community evolution events from structural and temporal information. The model (GNAN) includes a group-node attention component which enables support for variable-sized inputs and learned representation of groups based on member and neighbor node features. A comparative evaluation with standard baseline methods is performed and we demonstrate that our model outperforms the baselines. Additionally, we show the effects of network trends on model performance.
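How an attention component gives a fixed-size group representation from a variable number of member nodes can be sketched with plain softmax pooling. This is a generic attention-pooling sketch, not the GNAN architecture; the query vector would normally be learned, and the function name and numbers are hypothetical.

```python
import math


def attention_pool(node_features, query):
    """Softmax-weighted average of node feature vectors.

    The score of a node is its dot product with the query, so the output
    has the feature dimension no matter how many nodes the group contains.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in node_features]
    top = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(node_features[0])
    return [sum(w * feat[i] for w, feat in zip(weights, node_features))
            for i in range(dim)]


small_group = attention_pool([[1.0, 0.0], [3.0, 0.0]], query=[0.0, 1.0])
large_group = attention_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], query=[0.1, 0.2])
print(small_group, len(large_group))
```

In the first call both nodes receive the same score, so the pooled vector is their plain average; the second call shows that a three-node group yields a vector of the same length.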

翻訳日:2021-07-12 13:32:39 公開日:2021-07-09

# 平均場ゲームのためのディープラーニングと平均場制御とファイナンスへの応用

Deep Learning for Mean Field Games and Mean Field Control with Applications to Finance ( http://arxiv.org/abs/2107.04568v1 )

ライセンス: Link先を確認

Ren\'e Carmona and Mathieu Lauri\`ere

(参考訳) 金融市場やより一般的にマクロ経済モデルでは、全てのエージェントの集合行動から生じる価格などの変数を介して相互作用する多数の個人が関与する。平均場ゲームは、プレイヤーの数が無限である極限におけるそのような問題に対するナッシュ均衡を研究するために導入された。この理論は分析ツールと確率ツールの両方を使用して過去10年間に広く開発され、経済学から群集運動まで幅広い応用が発見されている。最近では、機械学習とのインタラクションが関心を集めている。この側面は、複雑な構造、高次元、または共通のランダム性源を持つ非常に大きなゲームを解くことに特に関係している。本章では,平均フィールドゲームとディープラーニングの相互作用に関する文献を,3種類の手法に焦点をあてて検討する。金融アプリケーションに特に重点を置いている。

Financial markets and more generally macro-economic models involve a large number of individuals interacting through variables such as prices resulting from the aggregate behavior of all the agents. Mean field games have been introduced to study Nash equilibria for such problems in the limit when the number of players is infinite. The theory has been extensively developed in the past decade, using both analytical and probabilistic tools, and a wide range of applications have been discovered, from economics to crowd motion. More recently the interaction with machine learning has attracted a growing interest. This aspect is particularly relevant to solve very large games with complex structures, in high dimension or with common sources of randomness. In this chapter, we review the literature on the interplay between mean field games and deep learning, with a focus on three families of methods. A special emphasis is given to financial applications.

翻訳日:2021-07-12 13:32:27 公開日:2021-07-09

# (参考訳) 二階情報の効率的な行列フリー近似と刈り取りと最適化への応用

Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization ( http://arxiv.org/abs/2107.03356v3 )

ライセンス: CC BY 4.0

Elias Frantar, Eldar Kurtic, Dan Alistarh

(参考訳) 損失関数の局所曲率情報を効率的に近似することは、ディープニューラルネットワークの最適化と圧縮の鍵となるツールである。しかし、既存の2次情報を近似する手法の多くは計算コストやストレージコストが高く、実用性を制限できる。本研究では,経験的フィッシャー行列によるヘッシアンの古典的な近似のように,ヘッシアンをランク1の行列の和として近似できる場合の逆ヘッシアンベクトル積(ihvps)を推定するための行列フリーな線形時間アプローチについて検討する。 M-FACと呼ばれるフレームワークの一部として、2つの新しいアルゴリズムを提案する: 最初のアルゴリズムはネットワーク圧縮に最適化され、逆 Hessian の任意の要素に対して$O(dm^2)$プリ計算、$O(dm)$計算、$O(dm)$クエリコスト$O(m)$で階数1の行列の和として与えられる場合、次元$d$で IHVPを計算できる。第2のアルゴリズムは最適化設定を目標とし,最適化ステップのスライディングウィンドウ上で推定される逆ヘシアンと,事前条件付きSGDに必要な勾配方向との間の積の計算を行う。 IHVPの計算に$O(dm + m^2)$と$O(dm + m^3)$を、スライディングウィンドウから勾配を追加したり取り除いたりするためのアルゴリズムを与える。これら2つのアルゴリズムは、既存の二階法に比べて計算オーバーヘッドの少ないネットワークプルーニングと最適化に最先端の結果をもたらす。実装は[10]と[18]で利用可能です。

Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and can compute the IHVP for dimension $d$, if the Hessian is given as a sum of $m$ rank-one matrices, using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP, and query cost $O(m)$ for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [10] and [18].
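The idea of a matrix-free inverse-Hessian vector product for a sum of rank-one matrices can be sketched with the Sherman-Morrison recursion. The sketch assumes a dampened empirical Fisher approximation, $H = \lambda I + \frac{1}{m}\sum_i g_i g_i^\top$, and is not either of the M-FAC algorithms from the paper; it only has the same flavour of cost, roughly $O(dm^2)$ precomputation and $O(dm)$ per product. The class name, the dampening value and the toy gradients are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class RankOneInverse:
    """Apply (damp * I + (1/m) * sum_i g_i g_i^T)^{-1} without forming a d x d matrix."""

    def __init__(self, grads, damp=1e-3):
        self.damp = damp
        m = len(grads)
        self.us = []  # u_i = H_{i-1}^{-1} g_i
        self.cs = []  # c_i = m + g_i^T H_{i-1}^{-1} g_i
        for g in grads:
            u = self._apply(g)          # uses only the terms stored so far
            self.us.append(u)
            self.cs.append(m + dot(g, u))

    def _apply(self, x):
        # H_0^{-1} x = x / damp, then one Sherman-Morrison correction per stored gradient.
        y = [xi / self.damp for xi in x]
        for u, c in zip(self.us, self.cs):
            coef = dot(u, x) / c
            y = [yi - coef * ui for yi, ui in zip(y, u)]
        return y

    def ihvp(self, x):
        """Inverse-Hessian vector product H^{-1} x."""
        return self._apply(x)


grads = [[1.0, 1.0], [1.0, -1.0]]
inverse = RankOneInverse(grads, damp=1.0)
# Here H = I + 0.5 * (g1 g1^T + g2 g2^T) = 2 * I, so H^{-1} x = x / 2.
print(inverse.ihvp([2.0, 4.0]))
```

Only the `m` vectors `u_i` and scalars `c_i` are stored, so memory grows with `d * m` rather than `d * d`, which is what makes this kind of estimate usable at neural-network scale.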

翻訳日:2021-07-12 11:20:14 公開日:2021-07-09

# (参考訳) 3次元胸部CT画像によるCovid-19検出のためのハイブリッドディープラーニングフレームワーク

A hybrid deep learning framework for Covid-19 detection via 3D Chest CT Images ( http://arxiv.org/abs/2107.03904v2 )

ライセンス: CC BY 4.0

Shuang Liang

(参考訳) 本稿では,畳み込みニューラルネットワークとトランスフォーマーを組み合わせた3次元胸部CT画像によるCOVID-19検出のためのハイブリッドディープラーニングフレームワークCTNetを提案する。これは、CTスキャンから十分な特徴を抽出するためにSEが注目するCNN特徴抽出モジュールと、3D CTスキャンの識別特徴をモデル化するトランスフォーマーモデルで構成されている。従来の研究と比較すると、CTNetは、データ再サンプリング戦略を備えた3D CTスキャンによる新型コロナウイルスの診断を効果的かつ効率的に行う方法を提供している。大規模かつパブリックなベンチマークによる高度な結果、COV19-CT-DBデータベースは、提案されたCTNetによって達成された。

In this paper, we present a hybrid deep learning framework named CTNet which combines a convolutional neural network and a transformer for the detection of COVID-19 via 3D chest CT images. It consists of a CNN feature extractor module with SE attention to extract sufficient features from CT scans, together with a transformer model to model the discriminative features of the 3D CT scans. Compared to previous works, CTNet provides an effective and efficient method to perform COVID-19 diagnosis via 3D CT scans with a data resampling strategy. On a large public benchmark, the COV19-CT-DB database, the proposed CTNet achieves better results than the state-of-the-art baseline approach proposed together with the dataset.

Translation date: 2021-07-12 10:46:22 Publication date: 2021-07-09

# Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis ( http://arxiv.org/abs/2107.03230v2 )

License: check the linked page

Luka Grb\v{c}i\'c, Sini\v{s}a Dru\v{z}eta, Goran Mau\v{s}a, Tomislav Lipi\'c, Darija Vuki\'c Lu\v{s}i\'c, Marta Alvir, Ivana Lu\v{c}in, Ante Sikirica, Davor Davidovi\'c, Vanja Trava\v{s}, Daniela Kalafatovi\'c, Kristina Pikelj, Hana Fajkovi\'c, Toni Holjevi\'c and Lado Kranj\v{c}evi\'c

Coastal water quality management is a public health concern, as poor coastal water quality can harbor pathogens that are dangerous to human health. Tourism-oriented countries need to actively monitor the condition of coastal water at popular tourist sites during the summer season. In this study, routine monitoring data of $Escherichia\ coli$ and enterococci across 15 public beaches in the city of Rijeka, Croatia, were used to build machine learning models for predicting their levels based on environmental parameters, as well as to investigate their relationships with environmental stressors. Gradient Boosting (CatBoost, XGBoost), Random Forests, Support Vector Regression, and Artificial Neural Networks were trained with measurements from all sampling sites and used to predict $E.\ coli$ and enterococci values based on environmental features. The evaluation of stability and generalizability with 10-fold cross-validation analysis of the machine learning models showed that the CatBoost algorithm performed best, with $R^2$ values of 0.71 and 0.68 for predicting $E.\ coli$ and enterococci, respectively, compared to the other evaluated ML algorithms: XGBoost, Random Forests, Support Vector Regression, and Artificial Neural Networks. We also use the SHapley Additive exPlanations technique to identify and interpret which features have the most predictive power. The results show that measured site salinity is the most important feature for forecasting both $E.\ coli$ and enterococci levels. Finally, the spatial and temporal accuracy of both ML models was examined at sites with the lowest coastal water quality. The spatial $E.\ coli$ and enterococci models achieved strong $R^2$ values of 0.85 and 0.83, while the temporal models achieved $R^2$ values of 0.74 and 0.67. The temporal model also achieved moderate $R^2$ values of 0.44 and 0.46 at a site with high coastal water quality.
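
The $R^2$ values and the 10-fold cross-validation mentioned above can be made concrete with a minimal, dependency-free sketch. The function names `r2_score` and `k_fold_indices` are illustrative choices made here, not the authors' code, and the folds are contiguous and unshuffled for simplicity.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop
```

A model that always predicts the mean of the targets scores $R^2 = 0$ and a perfect model scores 1, so the reported 0.71 and 0.68 correspond to explaining roughly 70% of the variance on held-out folds.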

Translation date: 2021-07-12 10:39:04 Publication date: 2021-07-09

# Label-set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation ( http://arxiv.org/abs/2107.03846v2 )

License: check the linked page

Lucas Fidon, Michael Aertsen, Doaa Emam, Nada Mufti, Fr\'ed\'eric Guffens, Thomas Deprest, Philippe Demaerel, Anna L. David, Andrew Melbourne, S\'ebastien Ourselin, Jan Deprest, Tom Vercauteren

Deep neural networks have increased the accuracy of automatic segmentation; however, their accuracy depends on the availability of a large number of fully segmented images. Methods to train deep neural networks using images for which some, but not all, regions of interest are segmented are necessary to make better use of partially annotated datasets. In this paper, we propose the first axiomatic definition of label-set loss functions, the loss functions that can handle partially segmented images. We prove that there is one and only one method to convert a classical loss function for fully segmented images into a proper label-set loss function. Our theory also allows us to define the leaf-Dice loss, a label-set generalization of the Dice loss particularly suited for partial supervision with only missing labels. Using the leaf-Dice loss, we set a new state of the art in partially supervised learning for fetal brain 3D MRI segmentation. We obtain a deep neural network able to segment white matter, ventricles, cerebellum, extra-ventricular CSF, cortical gray matter, deep gray matter, brainstem, and corpus callosum based on fetal brain 3D MRI of fetuses that are anatomically normal or have open spina bifida. Our implementation of the proposed label-set loss functions is available at https://github.com/LucasFidon/label-set-loss-functions
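
For orientation, here is a minimal sketch of the classical soft Dice loss that the leaf-Dice loss generalizes. The helper `leaf_targets`, which counts a voxel as positive for a class only when its annotation is exactly that single label, is an assumption made here to illustrate partial annotation; the authoritative leaf-Dice definition is the one in the authors' repository.

```python
def soft_dice_loss(probs, targets, eps=1e-8):
    """Classical soft Dice loss for one class.

    probs and targets are per-voxel values in [0, 1].
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def leaf_targets(label_sets, leaf):
    """Hypothetical helper: 1.0 where a voxel is annotated with exactly {leaf}.

    Voxels annotated with a larger label set (an ambiguous, partial
    annotation) contribute 0.0 to the target of every individual class.
    """
    return [1.0 if labels == {leaf} else 0.0 for labels in label_sets]
```

With `label_sets = [{1}, {1, 2}, {2}]`, only the first voxel is a positive for class 1 under this assumption. The middle voxel, whose annotation does not say whether it is class 1 or class 2, is not counted as a positive for either.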

Translation date: 2021-07-12 10:38:20 Publication date: 2021-07-09

# SCSS-Net: Superpoint Constrained Semi-supervised Segmentation Network for 3D Indoor Scenes ( http://arxiv.org/abs/2107.03601v2 )

License: check the linked page

Shuang Deng, Qiulei Dong, and Bo Liu

Many existing deep neural networks (DNNs) for 3D point cloud semantic segmentation require a large amount of fully labeled training data. However, manually assigning point-level labels in complex scenes is time-consuming. Since unlabeled point clouds can be easily obtained from sensors or reconstruction, we propose a superpoint-constrained semi-supervised segmentation network for 3D point clouds, named SCSS-Net. Specifically, we use the pseudo labels predicted from unlabeled point clouds for self-training, and the superpoints produced by geometry-based and color-based region-growing algorithms are combined to modify and delete pseudo labels with low confidence. Additionally, we propose an edge prediction module to constrain the features from edge points of geometry and color. A superpoint feature aggregation module and superpoint feature consistency loss functions are introduced to smooth the point features in each superpoint. Extensive experimental results on two public 3D indoor datasets demonstrate that our method can achieve better performance than some state-of-the-art point cloud segmentation networks and some popular semi-supervised segmentation methods with few labeled scenes.
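
One plausible reading of "modify and delete pseudo labels with low confidence" is a per-superpoint majority vote, sketched below. The function name `refine_pseudo_labels`, both thresholds, and the voting rule itself are assumptions made for illustration; the abstract does not specify the actual procedure.

```python
from collections import Counter


def refine_pseudo_labels(labels, confidences, superpoint_ids,
                         conf_threshold=0.8, agree_threshold=0.6):
    """Return one refined label per point; None marks a deleted pseudo label.

    Hypothetical rule: within each superpoint, confident points vote. If the
    winning label is backed by at least `agree_threshold` of the superpoint's
    points, every point in the superpoint takes that label (modify);
    otherwise the pseudo labels of that superpoint are dropped (delete).
    """
    votes = {}
    for label, conf, sp in zip(labels, confidences, superpoint_ids):
        if conf >= conf_threshold:
            votes.setdefault(sp, Counter())[label] += 1

    sizes = Counter(superpoint_ids)
    refined = []
    for sp in superpoint_ids:
        counter = votes.get(sp)
        if counter:
            top_label, top_count = counter.most_common(1)[0]
            if top_count / sizes[sp] >= agree_threshold:
                refined.append(top_label)
                continue
        refined.append(None)
    return refined
```

In this sketch a low-confidence point inside a consistently labeled superpoint inherits the majority label, while a superpoint without enough confident agreement contributes no pseudo labels to self-training.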

Translation date: 2021-07-12 10:37:27 Publication date: 2021-07-09

# Case-based similar image retrieval for weakly annotated large histopathological images of malignant lymphoma using deep metric learning ( http://arxiv.org/abs/2107.03602v2 )

License: check the linked page

Noriaki Hashimoto, Yusuke Takagi, Hiroki Masuda, Hiroaki Miyoshi, Kei Kohno, Miharu Nagaishi, Kensaku Sato, Mai Takeuchi, Takuya Furuta, Keisuke Kawamoto, Kyohei Yamada, Mayuko Moritsubo, Kanako Inoue, Yasumasa Shimasaki, Yusuke Ogura, Teppei Imamoto, Tatsuzo Mishina, Koichi Ohshima, Hidekata Hontani, Ichiro Takeuchi

In the present study, we propose a novel case-based similar image retrieval (SIR) method for hematoxylin and eosin (H&E)-stained histopathological images of malignant lymphoma. When a whole slide image (WSI) is used as an input query, it is desirable to be able to retrieve similar cases by focusing on image patches in pathologically important regions such as tumor cells. To address this problem, we employ attention-based multiple instance learning, which enables us to focus on tumor-specific regions when the similarity between cases is computed. Moreover, we employ contrastive distance metric learning to incorporate immunohistochemical (IHC) staining patterns as useful supervised information for defining appropriate similarity between heterogeneous malignant lymphoma cases. In the experiment with 249 malignant lymphoma patients, we confirmed that the proposed method achieved higher evaluation measures than the baseline case-based SIR methods. Furthermore, the subjective evaluation by pathologists revealed that our similarity measure using IHC staining patterns is appropriate for representing the similarity of H&E-stained tissue images for malignant lymphoma.
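
To illustrate how attention-based multiple instance learning can focus on a few informative patches, the sketch below implements a commonly used tanh-attention pooling in plain Python. Whether the paper uses exactly this variant is an assumption, and the parameter names `V` and `w` are illustrative.

```python
import math


def attention_pool(patch_features, V, w):
    """Pool patch features into one case-level feature vector.

    Attention weights: a_k = softmax_k(w . tanh(V h_k)).
    Pooled feature:    z   = sum_k a_k * h_k.
    """
    scores = []
    for h in patch_features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the patches of one slide.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]

    dim = len(patch_features[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, patch_features))
              for j in range(dim)]
    return pooled, weights
```

The pooled vector is dominated by patches with large attention weights, which is the mechanism that lets a case-level similarity emphasize tumor-specific regions instead of averaging over the whole slide.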

Translation date: 2021-07-12 10:37:06 Publication date: 2021-07-09

# Feature Pyramid Network for Multi-task Affective Analysis ( http://arxiv.org/abs/2107.03670v2 )

License: check the linked page

Ruian He, Zhen Xing, Weimin Tan, Bo Yan

Affective analysis is not a single task: the valence-arousal value, expression class, and action unit can be predicted at the same time. Previous research failed to treat them as a whole task or ignored the entanglement and hierarchical relations among these three facial attributes. We propose a novel model named feature pyramid networks for multi-task affect analysis. Hierarchical features are extracted to predict the three labels, and we apply a teacher-student training strategy to learn from pretrained single-task models. Extensive experimental results demonstrate that the proposed model outperforms other models. This is a submission to the 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). The code and model are available for research purposes at https://github.com/ryanhe312/ABAW2-FPNMAA.

Translation date: 2021-07-12 10:36:47 Publication date: 2021-07-09

# Atlas-Based Segmentation of Intracochlear Anatomy in Metal Artifact Affected CT Images of the Ear with Co-trained Deep Neural Networks ( http://arxiv.org/abs/2107.03987v2 )

License: check the linked page

Jianing Wang, Dingjie Su, Yubo Fan, Srijata Chakravorti, Jack H. Noble, and Benoit M. Dawant

We propose an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, we use a pair of co-trained deep networks that generate dense deformation fields (DDFs) in opposite directions. One network is tasked with registering an atlas image to the Post-CT images and the other network is tasked with registering the Post-CT images to the atlas image. The networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and a cycle-consistency constraint. The segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks. Our model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts. We show that our end-to-end network produces results that are comparable to the current state of the art (SOTA), which relies on a two-step approach that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images. Our method requires a fraction of the time needed by the SOTA, which is important for end-user acceptance.

Translation date: 2021-07-12 10:36:36 Publication date: 2021-07-09

PDF registration status (publication date: 20210709)