Fugu-MT: arXiv paper translations

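This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.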
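
The selection rule described above can be sketched as a small function. This is only an illustration of the stated rule (30 pages or fewer, and a CC 0, CC BY, or CC BY-SA license); the `Paper` record, its field names, and `translation_scope` are hypothetical and are not taken from the site's actual code.

```python
from dataclasses import dataclass

# Licenses the site description lists as eligible for full-text translation.
TRANSLATABLE_LICENSES = {"CC 0", "CC BY", "CC BY-SA"}
MAX_PAGES = 30  # "30 pages or fewer", per the site description


@dataclass
class Paper:
    """Hypothetical record; field names are illustrative, not the site's schema."""
    arxiv_id: str
    pages: int
    license: str


def translation_scope(paper: Paper) -> str:
    """Return "full_text" or "metadata_only" following the stated rule."""
    if paper.pages <= MAX_PAGES and paper.license in TRANSLATABLE_LICENSES:
        return "full_text"
    # Non-CC body text or an overly long paper: only the CC 0 metadata is translated.
    return "metadata_only"
```

Under this sketch, a 12-page CC BY paper would be translated in full, while a 45-page CC BY paper or a 12-page paper under a non-CC license would have only its metadata translated.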

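The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).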

This page lists papers whose publication date is 20210601.

Title	Authors	Abstract	Publication date / translation date
# Quantum advantages of communication complexity from Bell nonlocality ( http://arxiv.org/abs/2004.05098v3 ) License: check the linked page	Zhih-Ahn Jia, Lu Wei, Yu-Chun Wu, Guang-Can Guo	Communication games are crucial tools for investigating the limitations of physical theories. The communication complexity (CC) problem is a typical example, in which several distributed parties attempt to jointly calculate a given function with limited classical communication. In this work, we present a method to construct CC problems from Bell tests in a graph-theoretic way. Starting from an experimental compatibility graph and the corresponding Bell test function, a target function which encodes the information of each edge can be constructed; using this target function, we can then construct a CC function for which, by pre-sharing entangled states, the success probability exceeds that of any classical strategy. The non-signaling protocol based on the Popescu-Rohrlich box is also discussed, and the success probability in this case reaches one.	Translated: 2023-05-25 06:14:17 Published: 2021-06-01
# Resonance interaction of two entangled atoms accelerating between two mirrors ( http://arxiv.org/abs/2007.15465v2 ) License: check the linked page	Riddhi Chatterjee, Sunandan Gangopadhyay and A. S. Majumdar	We study the resonance interaction between two entangled identical atoms coupled to a quantized scalar field vacuum, and accelerating between two mirrors. We show how radiative processes of the two-atom entangled state can be manipulated by the atomic configuration undergoing noninertial motion. Incorporating the Heisenberg picture with symmetric operator ordering, the vacuum fluctuation and the self-reaction contributions are distinguished. We evaluate the resonance energy shift and the relaxation rate of energy of the two-atom system from the self-reaction contribution in the Heisenberg equation of motion. We investigate the variation of these two quantities with relevant parameters such as atomic acceleration, interatomic distance and position with respect to the boundaries. We show that both the energy level shift and the relaxation rate can be controlled by tuning the above parameters.	Translated: 2023-05-07 18:31:04 Published: 2021-06-01
# Quantum versus classical transport of energy in coupled two-level systems ( http://arxiv.org/abs/2007.15669v2 ) License: check the linked page	I. Medina, S. V. Moreira, and F. L. Semião	We consider the problem of energy transport in a chain of coupled quantum systems with the goal of shedding light on how nonclassical resources can affect transport. We study the cases for which either coherent or incoherent energy hopping takes place in the chain. Here, incoherent energy hopping is referred to as the "classical" scenario in allusion to its fully diagonal dynamics in the basis formed by the eigenstates of the decoupled sites. We focus on the case of a linear chain of two-level sites and find a hopping rate threshold above which the coherent quantum case is more efficient than the incoherent counterpart. We then link the quantum hopping rate to the coherence global maximum, which allows us to state that there is a coherence threshold above which the quantum scenario is more efficient. Next, we consider the integrated coherence generated by the dynamics and show how it is related to what is known as the invasiveness of a quantum operation. Our results strongly suggest the significant role played by quantum invasiveness as a resource for quantum transport.	Translated: 2023-05-07 18:10:39 Published: 2021-06-01
# Restrictions on shareability of classical correlations for random multipartite quantum states ( http://arxiv.org/abs/2008.09592v2 ) License: check the linked page	Saptarshi Roy, Shiladitya Mal, Aditi Sen De	Unlike quantum correlations, the shareability of classical correlations (CCs) between two parties of a multipartite state is assumed to be free since there exist states for which CCs for each of the reduced states can simultaneously reach their algebraic maximum value. However, when one randomly picks out states from the state space, we find that the probability of obtaining those states possessing the algebraic maximum value is vanishingly small. We explore the possibility of a nontrivial upper bound by Haar uniformly generating random multipartite states and computing the frequency distribution for various CC measures, conventional classical correlators, and two axiomatic measures of classical correlations, namely the classical part of quantum discord and local work of work-deficit. We find that the distributions are typically Gaussian-like and their standard deviations decrease with the increase in number of parties. It also reveals that among the multiqubit random states, most of the reduced density matrices possess a low amount of CCs, which can also be confirmed by the mean of the distributions, thereby showing a kind of restriction on the shareability of classical correlations for random states. Furthermore, we also notice that the maximal value for random states is much lower than the algebraic maxima obtained for a set of states, and the gap between the two increases further for states with a higher number of parties. We report that for a higher number of parties, the classical part of quantum discord and local work can follow a monogamy-based upper bound on shareability while classical correlators have a different upper bound. The trends of shareability for classical correlation measures in random states clearly demarcate between the axiomatic definition of classical correlations and the conventional ones.	Translated: 2023-05-05 07:48:43 Published: 2021-06-01
# Controls of a superconducting quantum parametron under a strong pump field ( http://arxiv.org/abs/2009.05723v2 ) License: check the linked page	Shumpei Masuda, Toyofumi Ishikawa, Yuichiro Matsuzaki and Shiro Kawabata	Pumped at approximately twice the natural frequency, a Josephson parametric oscillator called parametron or Kerr parametric oscillator shows self-oscillation. Quantum annealing and universal quantum computation using self-oscillating parametrons as qubits were proposed. However, controls of parametrons under the pump field are degraded by unwanted rapidly oscillating terms in the Hamiltonian, which we call non-resonant rapidly oscillating terms (NROTs), coming from the violation of the rotating wave approximation. Therefore, the pump field can be an intrinsic origin of the imperfection of controls of parametrons. Here, we theoretically study the influence of the NROTs on the accuracy of controls of a parametron: a cat-state creation and a single-qubit gate. It is shown that there is a trade-off relationship between the suppression of the nonadiabatic transitions and the validity of the rotating wave approximation in a conventional approach. We also show that the tailored time dependence of the detuning of the pump field can suppress both the nonadiabatic transitions and the disturbance of the state of the parametron due to the NROTs.	Translated: 2023-05-02 10:50:33 Published: 2021-06-01
# Remote state preparation of two-component Bose-Einstein condensates ( http://arxiv.org/abs/2009.06923v2 ) License: check the linked page	Manish Chaudhary, Matteo Fadel, Ebubechukwu O. Ilo-Okeke, Alexey N. Pyrkov, Valentin Ivannikov, Tim Byrnes	A protocol for remote state preparation is proposed for spin ensembles, where the aim is to prepare a state with a given set of spin expectation values on a remote spin ensemble using entanglement, local spin rotations, and measurements in the Fock basis. The spin ensembles could be realized by thermal atomic ensembles or spinor Bose-Einstein condensates. The protocol works beyond the Holstein-Primakoff approximation, such that spin expectation values for the full Bloch sphere can be prepared. The main practical obstacle is the preparation of the maximally entangled state between the spin ensembles. To overcome this, we examine using states based on the two-axis two-spin (2A2S) Hamiltonian in place of the maximally entangled state and examine its performance. We find that the version of the protocol with 2A2S squeezing approximates the maximally entangled state well, such that spin averages can be remotely prepared. We evaluate the errors of using 2A2S squeezed states, and find that they decrease with the ensemble size. With post-selection, errors can be systematically decreased further.	Translated: 2023-05-02 04:37:41 Published: 2021-06-01
# Towards the Heisenberg limit in microwave photon detection by a qubit array ( http://arxiv.org/abs/2009.11271v3 ) License: check the linked page	P. Navez, A. G. Balanov, S. E. Savel'ev, A. M. Zagoskin	Using an analytically solvable model, we show that a qubit array-based detector makes it possible to achieve the fundamental Heisenberg limit in detecting single photons. In the case of superconducting qubits, this opens new opportunities for quantum sensing and communications in the important microwave range.	Translated: 2023-05-01 04:46:27 Published: 2021-06-01
# Accurately computing electronic properties of a quantum ring ( http://arxiv.org/abs/2012.00921v2 ) License: check the linked page	C. Neill, T. McCourt, X. Mi, Z. Jiang, M. Y. Niu, W. Mruczkiewicz, I. Aleiner, F. Arute, K. Arya, J. Atalaya, R. Babbush, J. C. Bardin, R. Barends, A. Bengtsson, A. Bourassa, M. Broughton, B. B. Buckley, D. A. Buell, B. Burkett, N. Bushnell, J. Campero, Z. Chen, B. Chiaro, R. Collins, W. Courtney, S. Demura, A. R. Derk, A. Dunsworth, D. Eppens, C. Erickson, E. Farhi, A. G. Fowler, B. Foxen, C. Gidney, M. Giustina, J. A. Gross, M. P. Harrigan, S. D. Harrington, J. Hilton, A. Ho, S. Hong, T. Huang, W. J. Huggins, S. V. Isakov, M. Jacob-Mitos, E. Jeffrey, C. Jones, D. Kafri, K. Kechedzhi, J. Kelly, S. Kim, P. V. Klimov, A. N. Korotkov, F. Kostritsa, D. Landhuis, P. Laptev, E. Lucero, O. Martin, J. R. McClean, M. McEwen, A. Megrant, K. C. Miao, M. Mohseni, J. Mutus, O. Naaman, M. Neeley, M. Newman, T. E. O'Brien, A. Opremcak, E. Ostby, B. Pato, A. Petukhov, C. Quintana, N. Redd, N. C. Rubin, D. Sank, K. J. Satzinger, V. Shvarts, D. Strain, M. Szalay, M. D. Trevithick, B. Villalonga, T. C. White, Z. Yao, P. Yeh, A. Zalcman, H. Neven, S. Boixo, L. B. Ioffe, P. Roushan, Y. Chen, V. Smelyanskiy	A promising approach to study condensed-matter systems is to simulate them on an engineered quantum platform. However, achieving the accuracy needed to outperform classical methods has been an outstanding challenge. Here, using eighteen superconducting qubits, we provide an experimental blueprint for an accurate condensed-matter simulator and demonstrate how to probe fundamental electronic properties. We benchmark the underlying method by reconstructing the single-particle band-structure of a one-dimensional wire. We demonstrate nearly complete mitigation of decoherence and readout errors and arrive at an accuracy in measuring energy eigenvalues of this wire with an error of ~0.01 rad, whereas typical energy scales are of order 1 rad. Insight into this unprecedented algorithm fidelity is gained by highlighting robust properties of a Fourier transform, including the ability to resolve eigenenergies with a statistical uncertainty of 1e-4 rad. Furthermore, we synthesize magnetic flux and disordered local potentials, two key tenets of a condensed-matter system. When sweeping the magnetic flux, we observe avoided level crossings in the spectrum, a detailed fingerprint of the spatial distribution of local disorder. Combining these methods, we reconstruct electronic properties of the eigenstates where we observe persistent currents and a strong suppression of conductance with added disorder. Our work describes an accurate method for quantum simulation and paves the way to study novel quantum materials with superconducting qubits.	Translated: 2023-04-22 08:06:07 Published: 2021-06-01
# Efficient entanglement generation and detection of generalized stabilizer states ( http://arxiv.org/abs/2012.07606v2 ) License: check the linked page	Yihong Zhang, Yifan Tang, You Zhou, Xiongfeng Ma	The generation and verification of large-scale entanglement are essential to the development of quantum technologies. In this paper, we present an efficient scheme to generate genuine multipartite entanglement of a large number of qubits by using the Heisenberg interaction. This method can be conveniently implemented in various physical platforms, including superconducting, trapped-ion, and cold-atom systems. In order to characterize the entanglement of the output quantum state, we generalize the stabilizer formalism and develop an entanglement witness method. In particular, we design a generic searching algorithm to optimize the entanglement witness with a minimal number of measurement settings under a given noise level. From the perspective of practical applications, we numerically study the trade-off between the experiment efficiency and the detection robustness.	Translated: 2023-04-20 21:23:03 Published: 2021-06-01
# Effective quantum dynamics induced by a driven two-level-system bath ( http://arxiv.org/abs/2012.11235v2 ) License: check the linked page	Katja Kustura, Oriol Romero-Isart, Carlos Gonzalez-Ballestero	We derive a Born-Markov master equation describing the dissipation induced by a bath of lossy but coherently driven two-level systems (TLS) coupled to a bosonic system via Jaynes-Cummings interaction. We analytically derive all the master equation rates. We characterize these rates for the particular case of a single-mode system coupled to identical TLS. We study the steady state of the system and its exotic properties stemming from the non-thermal stationary state of the driven TLS bath. These properties include dissipative amplification, bath-induced linear instability, and both coherent and dissipative squeezing. The master equation is valid for arbitrarily strong TLS driving, and it can be generalized to include multi-level systems or other system-bath interaction terms, among others. Our work provides a tool to study and characterize TLS-induced decoherence, a key limiting factor in quantum technological devices based on, for instance, superconducting circuits, magnonic systems, or quantum acoustics.	Translated: 2023-04-20 00:37:37 Published: 2021-06-01
# Quantum Internet- Applications, Functionalities, Enabling Technologies, Challenges, and Research Directions ( http://arxiv.org/abs/2101.04427v2 ) License: check the linked page	Amoldeep Singh, Kapal Dev, Harun Siljak, Hem Dutt Joshi and Maurizio Magarini	The advanced notebooks, mobile phones, and internet applications that we use in today's world are all entrenched in classical communication bits of zeros and ones. The classical internet has laid its foundation originating from the amalgamation of mathematics and Claude Shannon's theory of information. But today's internet technology is a playground for eavesdroppers. This poses a serious challenge to various applications that rely on classical internet technology. This has motivated researchers to switch to new technologies that are fundamentally more secure. Exploring quantum effects, researchers paved the way into quantum networks that provide security, privacy and a range of capabilities such as quantum computation, communication and metrology. The realization of the quantum internet requires quantum communication between various remote nodes through quantum channels guarded by quantum cryptographic protocols. Such networks rely upon quantum bits (qubits) that can simultaneously take the value of zeros and ones. The extraordinary properties of qubits, such as entanglement, teleportation and superposition, give quantum networks an edge over traditional networks in many ways. But at the same time, transmitting qubits over long distances is a formidable task, and extensive research is ongoing on quantum teleportation over such distances, which will become a breakthrough in physically realizing the quantum internet in the near future. In this paper, quantum internet functionalities, technologies, applications and open challenges have been extensively surveyed to help readers gain a basic understanding of the infrastructure required for the development of the global quantum internet.	Translated: 2023-04-17 00:43:19 Published: 2021-06-01
# Interferometric Visibility in Curved Spacetimes ( http://arxiv.org/abs/2101.06320v3 ) License: check the linked page	Marcos L. W. Basso and Jonas Maziero	In [M. Zych et al., Nat. Commun. 2, 505 (2011)], the authors predicted that the interferometric visibility is affected by a gravitational field in a way that cannot be explained without the general relativistic notion of proper time. In this work, we take a different route and start by deriving the same effect using the unitary representation of the local Lorentz transformation in the Newtonian limit. In addition, we show that the effect on the interferometric visibility due to gravity persists in different spacetime geometries. However, the influence is not necessarily due to the notion of proper time. For instance, by constructing an 'astronomical' Mach-Zehnder interferometer in the Schwarzschild spacetime, the influence on the interferometric visibility can be due to another general relativistic effect, the geodetic precession. Besides, by using the unitary representation of the local Lorentz transformation, we show that this behavior of the interferometric visibility is general for an arbitrary spacetime, provided that we restrict the motion of the quanton to a two-dimensional spatial plane.	Translated: 2023-04-15 02:55:33 Published: 2021-06-01
# Defect production due to time-dependent coupling to environment in the Lindblad equation ( http://arxiv.org/abs/2101.11334v2 ) License: check the linked page	Balázs Gulácsi, Balázs Dóra	Recently, defect production was investigated during non-unitary dynamics due to a non-Hermitian Hamiltonian. By ramping up the non-Hermitian coupling linearly in time through an exceptional point, defects are produced in much the same way as approaching a Hermitian critical point. A generalized Kibble-Zurek scaling accounted for the ensuing scaling of the defect density in terms of the speed of the drive and the corresponding critical exponents. Here we extend this setting by adding the recycling term and considering the full Lindbladian time evolution of the problem with quantum jumps. We find that by linearly ramping up the environmental coupling in time, and going beyond the steady-state solution of the Liouvillian, the defect density scales linearly with the speed of the drive for all cases. This scaling is unaffected by the presence of exceptional points of the Liouvillian, which can show up in the transient states. By using a variant of the adiabatic perturbation theory, the scaling of the defect density is determined exactly from a set of algebraic equations. Our study indicates the distinct sensitivity of the Lindbladian time evolution to exceptional points corresponding to steady states and transient states.	Translated: 2023-04-13 20:08:23 Published: 2021-06-01
# Probing Hawking radiation through capacity of entanglement ( http://arxiv.org/abs/2102.02425v3 ) License: check the linked page	Kohki Kawabata, Tatsuma Nishioka, Yoshitaka Okuyama and Kento Watanabe	We consider the capacity of entanglement in models related to gravitational phase transitions. The capacity is labeled by the replica parameter, which plays a similar role to the inverse temperature in thermodynamics. In the end-of-the-world brane model of a radiating black hole, the capacity has a peak around the Page time, indicating the phase transition between replica wormhole geometries of different types of topology. Similarly, in a moving mirror model describing Hawking radiation, the capacity typically shows a discontinuity when the dominant saddle switches between two phases, which can be seen as a formation of island regions. In either case we find the capacity can be an invaluable diagnostic for a black hole evaporation process.	Translated: 2023-04-12 20:12:07 Published: 2021-06-01
# Near-Field Radiative Heat Transfer Eigenmodes ( http://arxiv.org/abs/2102.05769v2 ) License: check the linked page	Stephen Sanders and Lauren Zundel and Wilton J. M. Kort-Kamp and Diego A. R. Dalvit and Alejandro Manjavacas	The near-field electromagnetic interaction between nanoscale objects produces enhanced radiative heat transfer that can greatly surpass the limits established by far-field black-body radiation. Here, we present a theoretical framework to describe the temporal dynamics of the radiative heat transfer in ensembles of nanostructures, which is based on the use of an eigenmode expansion of the equations that govern this process. Using this formalism, we identify the fundamental principles that determine the thermalization of collections of nanostructures, revealing general but often unintuitive dynamics. Our results provide an elegant and precise approach to efficiently analyze the temporal dynamics of the near-field radiative heat transfer in systems containing a large number of nanoparticles.	Translated: 2023-04-12 00:31:12 Published: 2021-06-01
# Towards an accountable Internet of Things: A call for reviewability ( http://arxiv.org/abs/2102.08132v2 ) License: check the linked page	Chris Norval, Jennifer Cobbe, Jatinder Singh	As the IoT becomes increasingly ubiquitous, concerns are being raised about how IoT systems are being built and deployed. Connected devices will generate vast quantities of data, which drive algorithmic systems and result in real-world consequences. Things will go wrong, and when they do, how do we identify what happened, why they happened, and who is responsible? Given the complexity of such systems, where do we even begin? This chapter outlines aspects of accountability as they relate to IoT, in the context of the increasingly interconnected and data-driven nature of such systems. Specifically, we argue the urgent need for mechanisms - legal, technical, and organisational - that facilitate the review of IoT systems. Such mechanisms work to support accountability, by enabling the relevant stakeholders to better understand, assess, interrogate and challenge the connected environments that increasingly pervade our world.	Translated: 2023-04-11 00:22:49 Published: 2021-06-01
# Extended Wigner's friend problem and the internal consistency of standard quantum mechanics ( http://arxiv.org/abs/2102.08709v3 ) License: check the linked page	D. Sokolovski and A. Matzkin	The extended Wigner's friend problem deals with two Observers each measuring a sealed laboratory in which a friend is making a quantum measurement. We investigate this problem by relying on the basic rules of quantum mechanics as exposed by Feynman in the well-known "Feynman Lectures on Physics". Although recent discussions have suggested that the extended Wigner's friend problem cannot consistently be described by quantum theory, we show here that a straightforward application of these standard rules results in a non-ambiguous and consistent account of the measurement outcomes for all agents involved.	Translated: 2023-04-10 23:54:32 Published: 2021-06-01
# On systems of maximal quantum chaos ( http://arxiv.org/abs/2102.11294v3 ) License: check the linked page	Mike Blake and Hong Liu	A remarkable feature of chaos in many-body quantum systems is the existence of a bound on the quantum Lyapunov exponent. An important question is to understand what is special about maximally chaotic systems which saturate this bound. Here we provide further evidence for the 'hydrodynamic' origin of chaos in such systems, and discuss hallmarks of maximally chaotic systems. We first provide evidence that a hydrodynamic effective field theory of chaos we previously proposed should be understood as a theory of maximally chaotic systems. We then emphasize and make explicit a signature of maximal chaos which was only implicit in prior literature, namely the suppression of exponential growth in commutator squares of generic few-body operators. We provide a general argument for this suppression within our chaos effective field theory, and illustrate it using SYK models and holographic systems. We speculate that this suppression indicates that the nature of operator scrambling in maximally chaotic systems is fundamentally different from scrambling in non-maximally chaotic systems. We also discuss the simplest scenario for the existence of a maximally chaotic regime at sufficiently large distances even for non-maximally chaotic systems.	Translated: 2023-04-10 05:31:13 Published: 2021-06-01
# Don't Tell Me The Cybersecurity Moon Is Shining... (Cybersecurity Show And Tell) ( http://arxiv.org/abs/2103.11030v3 ) License: see link	Luca Viganò	"Show, don't tell" has become the literary commandment for any writer. It applies to all forms of fiction, and to non-fiction, including scientific writing, where it lies at the heart of many scientific communication and storytelling approaches. In this paper, I discuss how "show \emph{and} tell" is actually often the best approach when one wants to present, teach or explain complicated ideas such as those underlying notions and results in mathematics and science, and in particular in cybersecurity. I discuss how different kinds of artworks can be used to explain cybersecurity and I illustrate how telling (i.e., explaining notions in a formal, technical way) can be paired with showing through visual storytelling or other forms of storytelling. I also discuss four categories of artworks and the explanations they help provide.	Translated: 2023-04-07 10:46:09 Published: 2021-06-01
# A trustable and interoperable decentralized solution for citizen-centric and cross-border eGovernance: A conceptual approach ( http://arxiv.org/abs/2103.15458v2 ) License: see link	George Domalis, Nikos Karacapilidis, Dimitris Tsakalidis, Anastasios Giannaros	Aiming to support a cross-sector and cross-border eGovernance paradigm for sharing common public services, this paper introduces an AI-enhanced solution that enables beneficiaries to participate in a decentralized network for effective big data exchange and service delivery that promotes the once-only priority and is by design digital, efficient, cost-effective, interoperable and secure. The solution comprises (i) a reliable and efficient decentralized mechanism for data sharing, capable of addressing the complexity of the processes and their high demand for resources; (ii) an ecosystem for delivering mobile services tailored to the needs of stakeholders; (iii) a single sign-on Wallet mechanism to manage the transactions with multiple services; and (iv) an intercommunication layer, responsible for the secure exchange of information between existing eGovernment systems and newly developed ones. An indicative application scenario showcases the potential of our approach.	Translated: 2023-04-06 06:08:05 Published: 2021-06-01
# Quantum Heat Engines with Singular Interactions ( http://arxiv.org/abs/2105.00032v2 ) License: see link	Nathan M Myers, Jacob McCready, Sebastian Deffner	By harnessing quantum phenomena, quantum devices have the potential to outperform their classical counterparts. Previous work has shown that a bosonic working medium can yield better performance than a fermionic medium. We expand upon this work by incorporating a singular interaction that allows the effective symmetry to be tuned between the bosonic and fermionic limits. In this framework, the particles can be treated as anyons subject to Haldane's generalized exclusion statistics. Solving the dynamics analytically using the framework of "statistical anyons", we explore how the interplay between interparticle interactions and wave function symmetry affects engine performance.	Translated: 2023-04-01 23:32:56 Published: 2021-06-01
# Radiation pressure on single atoms: generalization of an exact analytical approach to multilevel atoms ( http://arxiv.org/abs/2105.08554v2 ) License: see link	L. Podlecki, J. Martin, and T. Bastin	In a recent work, we provided a standardized and exact analytical formalism for computing, in the semiclassical regime, the radiation force experienced by a two-level atom interacting with any number of plane waves with arbitrary intensities, frequencies, phases, and propagation directions [J. Opt. Soc. Am. B \textbf{35}, 127-132 (2018)]. Here, we extend this treatment to the multilevel atom case, where degeneracy of the atomic levels is considered and polarization of light comes into play. A matrix formalism is developed for this purpose.	Translated: 2023-03-30 19:59:11 Published: 2021-06-01
# Students Programming Competitions as an Educational Tool and a Motivational Incentive to Students ( http://arxiv.org/abs/2105.15136v2 ) License: see link	Youry Khmelevsky, Ken Chidlow	In this short paper we report on programming competition results achieved by students from the Computer Science Department (COSC) of Okanagan College (OC) and discuss those results from an educational point of view. We found that some freshman and sophomore students in diploma and degree programs are very capable and eager to be involved in applied research projects as early as the second semester, and in local and international programming competitions as well. Our observation is based on the last 2 educational years, beginning in 2015 when we introduced programming competitions to COSC students. Students reported that participation in competitions gives them motivation to learn effectively in their programming courses, inspires them to learn more deeply and thoroughly, and helps them achieve better results in their classes.	Translated: 2023-03-29 06:57:02 Published: 2021-06-01
# Witnessing entanglement via the geometric phase in a impurity-doped Bose-Einstein condensate ( http://arxiv.org/abs/2106.00224v1 ) License: see link	X. Wu, S. P. Jia, C. L. Cai, L. M. Kuang	We propose a theoretical scheme to witness quantum entanglement via the geometric phase in an impurity-doped Bose-Einstein condensate (BEC), which is a micro-macro quantum system consisting of two Rydberg impurity qubits and the BEC. We calculate the geometric phase of the impurity qubits in the presence of initial micro-micro and micro-macro entanglement, respectively. We demonstrate that the geometric phase of the impurity qubits can witness not only inter-qubit micro-micro entanglement, but also qubit-BEC micro-macro entanglement. Our work provides new insight into witnessing micro-micro and micro-macro entanglement in an impurity-doped BEC.	Translated: 2023-03-28 03:58:01 Published: 2021-06-01
# Experimental Realization of Schumacher's Information Geometric Bell Inequality ( http://arxiv.org/abs/2106.00194v1 ) License: see link	Tahereh Rezaei and Shahabeddin M. Aslmarand and Robert Snyder and Behzad Khajavi and Paul M. Alsing and Michael Fanto and Doyeol (David) Ahn and Warner A. Miller	Quantum mechanics can produce correlations that are stronger than classically allowed. This stronger-than-classical correlation is the "fuel" for quantum computing. In 1991 Schumacher put forward a beautiful geometric approach, analogous to the well-known result of Bell, to capture the non-classicality of this correlation for a singlet state. He used a well-established information distance defined on an ensemble of identically prepared states. He calculated that for certain detector settings used to measure the entangled state, the resulting geometry violated a triangle inequality -- a violation that is not possible classically. This provided a novel information-based geometric Bell inequality in terms of a "covariance distance." Here we experimentally reproduce his construction and demonstrate a definitive violation for a Bell state of two photons based on the usual spontaneous parametric down-conversion in a paired BBO crystal. The state we produced had a visibility of $V_{ad}=0.970$. We discuss generalizations to higher dimensional multipartite quantum states.	Translated: 2023-03-28 03:57:28 Published: 2021-06-01
# Combinatorial necessary conditions for regular graphs to induce periodic quantum walks ( http://arxiv.org/abs/2106.00166v1 ) License: see link	Sho Kubota	We derive combinatorial necessary conditions for discrete-time quantum walks defined by regular mixed graphs to be periodic. If the quantum walk is periodic, all the eigenvalues of the time evolution matrices must be algebraic integers. Focusing on this, we explore which ring the coefficients of the characteristic polynomials should belong to. On the other hand, the coefficients of the characteristic polynomials of $\eta$-Hermitian adjacency matrices have combinatorial implications. From these, we can find combinatorial implications in the coefficients of the characteristic polynomials of the time evolution matrices, and thus derive combinatorial necessary conditions for mixed graphs to be periodic. For example, if a $k$-regular mixed graph with $n$ vertices is periodic, then $2n/k$ must be an integer. As an application of this work, we determine periodicity of mixed complete graphs and mixed graphs with a prime number of vertices.	Translated: 2023-03-28 03:56:37 Published: 2021-06-01
# Vacuum-gap transmon qubits realized using flip-chip technology ( http://arxiv.org/abs/2106.00341v1 ) License: see link	Xuegang Li, Yingshan Zhang, Chuhong Yang, Zhiyuan Li, Junhua Wang, Tang Su, Mo Chen, Yongchao Li, Chengyao Li, Zhenyu Mi, Xuehui Liang, Chenlu Wang, Zhen Yang, Yulong Feng, Kehuan Linghu, Huikai Xu, Jiaxiu Han, Weiyang Liu, Peng Zhao, Teng Ma, Ruixia Wang, Jingning Zhang, Yu Song, Pei Liu, Ziting Wang, Zhaohua Yang, Guangming Xue, Yirong Jin, and Haifeng Yu	Significant progress has been made in building large-scale superconducting quantum processors based on flip-chip technology. In this work, we use flip-chip technology to realize a modified transmon qubit, denoted as the "flipmon", whose large shunt capacitor is replaced by a vacuum-gap parallel plate capacitor. To further reduce the qubit footprint, we place one of the qubit pads and a single Josephson junction on the bottom chip and the other pad on the top chip, which is galvanically connected with the single Josephson junction through an indium bump. The electric field participation ratio can reach nearly 53% in air when the vacuum gap is about 5 microns, potentially leading to a lower dielectric loss. The coherence times of the flipmons are measured in the range of 30-60 microseconds, which are comparable with those of traditional transmons with similar fabrication processes. The electric field simulation indicates that the metal-air interface's participation ratio increases significantly and may dominate the qubit's decoherence. This suggests that more careful surface treatment needs to be considered. No evidence shows that the indium bumps inside the flipmons cause significant decoherence. With well-designed geometry and good surface treatment, the coherence of the flipmons can be further improved.	Translated: 2023-03-28 03:49:44 Published: 2021-06-01
# AI-Ethics by Design. Evaluating Public Perception on the Importance of Ethical Design Principles of AI ( http://arxiv.org/abs/2106.00326v1 ) License: see link	Kimon Kieslich, Birte Keller, Christopher Starke	Despite the immense societal importance of ethically designing artificial intelligence (AI), little research on the public perceptions of ethical AI principles exists. This becomes even more striking when considering that ethical AI development aims to be human-centric and of benefit to the whole of society. In this study, we investigate how ethical principles (explainability, fairness, security, accountability, accuracy, privacy, machine autonomy) are weighted in comparison to each other. This is especially important, since simultaneously considering ethical principles is not only costly, but sometimes even impossible, as developers must make specific trade-off decisions. In this paper, we give first answers on the relative importance of ethical principles given a specific use case - the use of AI in tax fraud detection. The results of a large conjoint survey (n=1099) suggest that, by and large, German respondents found the ethical principles equally important. However, subsequent cluster analysis shows that different preference models for ethically designed systems exist among the German population. These clusters substantially differ not only in the preferred attributes, but also in the importance level of the attributes themselves. We further describe how these groups are constituted in terms of sociodemographics as well as opinions on AI. Societal implications as well as design challenges are discussed.	Translated: 2023-03-28 03:49:22 Published: 2021-06-01
# A Way to a Universal VR Accessibility Toolkit ( http://arxiv.org/abs/2106.00321v1 ) License: see link	Felix J. Thiel, Anthony Steed	Virtual Reality (VR) has become more and more popular with falling system prices and a growing number of users. However, the issue of accessibility in VR has hardly been addressed so far and no uniform approach or standard exists at this time. In this position paper, we propose a customisable toolkit implemented at the system level and discuss the potential benefits of this approach and the challenges that will need to be overcome for a successful implementation.	Translated: 2023-03-28 03:49:00 Published: 2021-06-01
# Three Dimensional Orientation of Complex Molecules Excited by Two-Color Femtosecond Pulses ( http://arxiv.org/abs/2106.00299v1 ) License: see link	Long Xu, Ilia Tutunnikov, Yehiam Prior, Ilya Sh. Averbukh	We study the excitation of asymmetric-top (including chiral) molecules by two-color femtosecond laser pulses. In the case of non-chiral asymmetric-top molecules excited by an orthogonally polarized two-color pulse, we demonstrate, classically and quantum mechanically, three-dimensional orientation. For chiral molecules, we show that the orientation induced by a cross-polarized two-color pulse is enantioselective along the laser propagation direction, namely, the two enantiomers are oriented in opposite directions. On the short time scale, the classical and quantum simulations give results that are in excellent agreement, whereas on the longer time scale, the enantioselective orientation exhibits quantum beats. These observations are qualitatively explained by analyzing the interaction potential between the two-color pulse and molecular (hyper-)polarizability. The prospects for utilizing the long-lasting orientation for measuring and using the enantioselective orientation for separating the individual enantiomers are discussed.	Translated: 2023-03-28 03:48:53 Published: 2021-06-01
# "Why wouldn't someone think of democracy as a target?": Security practices & challenges of people involved with U.S. political campaigns ( http://arxiv.org/abs/2106.00236v1 ) License: see link	Sunny Consolvo, Patrick Gage Kelley, Tara Matthews, Kurt Thomas, Lee Dunn, Elie Bursztein	People who are involved with political campaigns face increased digital security threats from well-funded, sophisticated attackers, especially nation-states. Improving political campaign security is a vital part of protecting democracy. To identify campaign security issues, we conducted qualitative research with 28 participants across the U.S. political spectrum to understand the digital security practices, challenges, and perceptions of people involved in campaigns. A main, overarching finding is that a unique combination of threats, constraints, and work culture leads people involved with political campaigns to use technologies from across platforms and domains in ways that leave them--and democracy--vulnerable to security attacks. Sensitive data was kept in a plethora of personal and work accounts, with ad hoc adoption of strong passwords, two-factor authentication, encryption, and access controls. No individual company, committee, organization, campaign, or academic institution can solve the identified problems on its own. To this end, we provide an initial understanding of this complex problem space and recommendations for how a diverse group of experts can begin working together to improve security for political campaigns.	Translated: 2023-03-28 03:48:29 Published: 2021-06-01
# Inverse Anderson transition in photonic cages ( http://arxiv.org/abs/2106.00231v1 ) License: see link	Stefano Longhi	Transport inhibition via Anderson localization is ubiquitous in disordered periodic lattices. However, in crystals displaying only flat bands, disorder can lift macroscopic band flattening, removing geometric localization and enabling transport under certain conditions. Such a striking phenomenon, dubbed the inverse Anderson transition and predicted for three-dimensional flat band systems, has thus far not been directly observed. Here we suggest a simple quasi one-dimensional photonic flat band system, namely an Aharonov-Bohm photonic cage, in which correlated binary disorder induces an inverse Anderson transition and ballistic transport.	Translated: 2023-03-28 03:47:35 Published: 2021-06-01
# Non-Hermitian Maryland Model ( http://arxiv.org/abs/2106.00230v1 ) License: see link	Stefano Longhi	Non-Hermitian systems with aperiodic order display phase transitions that are beyond the paradigm of Hermitian physics. Unfortunately, owing to the incommensurability of the potential, most known non-Hermitian models are not integrable. This motivates the search for exactly solvable models, where localization/delocalization phase transitions, mobility edges in the complex plane and their topological nature can be unraveled. Here we present an exactly solvable model of a quasicrystal, which is a non-perturbative non-Hermitian extension of a famous integrable model of quantum chaos proposed by Grempel {\it et al.} [Phys. Rev. Lett. {\bf 49}, 833 (1982)] and dubbed the Maryland model. Contrary to the Hermitian Maryland model, its non-Hermitian extension shows a richer scenario, with a localization-delocalization phase transition via topological mobility edges in the complex energy plane.	Translated: 2023-03-28 03:47:24 Published: 2021-06-01
# Quantum hypothesis testing for exoplanet detection ( http://arxiv.org/abs/2106.00488v1 ) License: see link	Zixin Huang, Cosmo Lupo	Detecting the faint emission of a secondary source in the proximity of a much brighter source has been the most severe obstacle to using direct imaging in searching for exoplanets. Using quantum state discrimination and quantum imaging techniques, we show that one can significantly reduce the probability of error for detecting the presence of a weak secondary source, even when the two sources have small angular separations. If the weak source has relative intensity $\epsilon \ll 1$ to the bright source, we find that the error exponent can be improved by a factor of $1/\epsilon$. We also find the linear-optical measurements that are optimal in this regime. Our result serves as a complementary method in the toolbox of optical imaging, from astronomy to microscopy.	Translated: 2023-03-28 03:40:18 Published: 2021-06-01
# Optical hyperpolarization of heteronuclear spin singlet order in liquids ( http://arxiv.org/abs/2106.00414v1 ) License: see link	Y. Yang, L. Zhou, and Q. Chen	The nuclear spin singlet order involving coupled pairs of spins-1/2 may be used to store nuclear spin hyperpolarization in a room temperature liquid for a time much longer than the spin-lattice relaxation time $T_1$. Both long-lived homonuclear and heteronuclear spin-singlet order have been observed. Although hyperpolarized singlet order of the same species is accessible, hyperpolarized heteronuclear spin-singlet order has not been presented yet. Here we show that hyperpolarized singlet order is achievable in a sample of $^{13}$C-labeled formic acid solution at room temperature by using optically polarized nitrogen vacancy (NV) center spins in nanodiamonds.	Translated: 2023-03-28 03:39:42 Published: 2021-06-01
# Quantum-Optical Spectrometry in Relativistic Laser-Plasma Interactions Using the High-Harmonic Generation Process: A Proposal ( http://arxiv.org/abs/2106.00372v1 ) License: see link	Theocharis Lamprou, Rodrigo Lopez-Martens, Stefan Haessler, Ioannis Liontos, Subhendu Kahaly, Javier Rivera-Dean, Philipp Stammer, Emilio Pisanty, Marcelo F. Ciappina, Maciej Lewenstein and Paraskevas Tzallas	Quantum-optical spectrometry is a recently developed shot-to-shot photon-correlation-based method, implemented with a quantum spectrometer (QS), that has been used to reveal the quantum optical nature of intense laser-matter interactions and connect the research domains of quantum optics (QO) and strong laser-field physics (SLFP). The method provides the probability of absorbing photons from a driving laser field towards the generation of a strong laser-field interaction product, such as high-order harmonics. In this case, the harmonic spectrum is reflected in the photon number distribution of the infrared (IR) driving field after its interaction with the high harmonic generation medium. The method was implemented in non-relativistic interactions using high harmonics produced by the interaction of strong laser pulses with atoms and semiconductors. Very recently, it was used for the generation of non-classical light states in intense laser-atom interaction, building the basis for studies of quantum electrodynamics in strong laser-field physics and the development of a new class of non-classical light sources for applications in quantum technology. Here, after a brief introduction of the QS method, we will discuss how the QS can be applied in relativistic laser-plasma interactions and become the driving factor for initiating investigations on relativistic quantum electrodynamics.	Translated: 2023-03-28 03:39:18 Published: 2021-06-01
# Predicting COVID-19 Spread from Large-Scale Mobility Data ( http://arxiv.org/abs/2106.00356v1 ) License: see link	Amray Schwabe, Joel Persson and Stefan Feuerriegel	To manage the COVID-19 epidemic effectively, decision-makers in public health need accurate forecasts of case numbers. A potential near real-time predictor of future case numbers is human mobility; however, research on the predictive power of mobility is lacking. To fill this gap, we introduce a novel model for epidemic forecasting based on mobility data, called the mobility marked Hawkes model. The proposed model consists of three components: (1) A Hawkes process captures the transmission dynamics of infectious diseases. (2) A mark modulates the rate of infections, thus accounting for how the reproduction number R varies across space and time. The mark is modeled using a regularized Poisson regression based on mobility covariates. (3) A correction procedure incorporates new cases seeded by people traveling between regions. Our model was evaluated on the COVID-19 epidemic in Switzerland. Specifically, we used mobility data from February through April 2020, amounting to approximately 1.5 billion trips. Trip counts were derived from large-scale telecommunication data, i.e., cell phone pings from the Swisscom network, the largest telecommunication provider in Switzerland. We compared our model against various state-of-the-art baselines in terms of out-of-sample root mean squared error. We found that our model outperformed the baselines by 15.52%. The improvement was consistently achieved across different forecast horizons between 5 and 21 days. In addition, we assessed the predictive power of conventional point of interest data, confirming that telecommunication data is superior. To the best of our knowledge, our work is the first to predict the spread of COVID-19 from telecommunication data. Altogether, our work contributes to previous research by developing a scalable early warning system for decision-makers in public health tasked with controlling the spread of infectious diseases.	Translated: 2023-03-28 03:38:39 Published: 2021-06-01
# Finite-size effects of electron transport in PdCoO$_2$ ( http://arxiv.org/abs/2106.00697v1 ) License: see link	Georgios Varnavides, Yaxian Wang, Philip J.W. Moll, Polina Anikeeva, and Prineha Narang	A wide range of unconventional transport phenomena have recently been observed in single-crystal delafossite metals. Here, we present a theoretical framework to elucidate electron transport using a combination of first-principles calculations and numerical modeling of the anisotropic Boltzmann transport equation. Using PdCoO$_2$ as a model system, we study different microscopic electron and phonon scattering mechanisms and establish the mean free path hierarchy of quasiparticles at different temperatures. We treat the anisotropic Fermi surface explicitly to numerically obtain experimentally accessible transport observables, which bridge between the "diffusive", "ballistic", and "hydrodynamic" transport regime limits. We illustrate that the distinction between the "quasi-ballistic" and "quasi-hydrodynamic" regimes is challenging and often needs to be quantitative in nature. From first-principles calculations, we populate the resulting transport regime plots, and demonstrate how the Fermi surface orientation adds complexity to the observed transport signatures in micro-scale devices. Our work provides key insights into microscopic interaction mechanisms on open hexagonal Fermi surfaces and establishes their connection to the macroscopic electron transport in finite-size channels.	Translated: 2023-03-28 03:30:30 Published: 2021-06-01
# 高速エンタングゲート用量子クロストークキャンセルとマルチビット性能の改善 Quantum crosstalk cancellation for fast entangling gates and improved multi-qubit performance ( http://arxiv.org/abs/2106.00675v1 ) ライセンス: Link先を確認	K. X. Wei, E. Magesan, I. Lauer, S. Srinivasan, D. F. Bogorin, S. Carnevale, G. A. Keefe, Y. Kim, D. Klaus, W. Landers, N. Sundaresan, C. Wang, E. J. Zhang, M. Steffen, O. E. Dial, D. C. McKay, A. Kandala	(参考訳) 超伝導人工原子で作られた量子コンピュータは、すでにその古典的な限界を延ばしている。これらの人工原子の最低エネルギー状態は量子ビット基底として機能するが、より高いレベルは魅力的なゲートスキームのホストと望ましくない相互作用の両方の原因となる。特に、これらの原子を結合して絡み合いを生成すると、より高いレベルは計算レベルのシフトを引き起こし、不要な$zz$量子クロストークにつながる。本稿では,結合量子ビットに対する同時交流スターク効果により,エネルギーレベルを操作し,このクロストークを緩和する新しい手法を提案する。これはqubit-qubit結合とcrosstalkの基本的なデッドロックを破り、90ns cnotのゲートエラーが (0.19 $\pm$ 0.02) $\%$ となり、固定結合の単一接合トランスモンキュービットを持つ新しいczゲートのデモンストレーションとなる。さらに、7キュービットのクロストークキャンセルにより回路性能が大幅に向上し,その拡張性が実証された。この研究は、より高速なゲートと、多ビット回路の忠実度を大幅に改善した超伝導ハードウェアの道を開いた。 Quantum computers built with superconducting artificial atoms already stretch the limits of their classical counterparts. While the lowest energy states of these artificial atoms serve as the qubit basis, the higher levels are responsible for both a host of attractive gate schemes as well as generating undesired interactions. In particular, when coupling these atoms to generate entanglement, the higher levels cause shifts in the computational levels that leads to unwanted $ZZ$ quantum crosstalk. Here, we present a novel technique to manipulate the energy levels and mitigate this crosstalk via a simultaneous AC Stark effect on coupled qubits. This breaks a fundamental deadlock between qubit-qubit coupling and crosstalk, leading to a 90ns CNOT with a gate error of (0.19 $\pm$ 0.02) $\%$ and the demonstration of a novel CZ gate with fixed-coupling single-junction transmon qubits. Furthermore, we show a definitive improvement in circuit performance with crosstalk cancellation over seven qubits, demonstrating the scalability of the technique. 
This work paves the way for superconducting hardware with faster gates and greatly improved multi-qubit circuit fidelities.	翻訳日:2023-03-28 03:30:07 公開日:2021-06-01
# フェルミオンと量子コンピューティングに結合したZ3ゲージ理論 Z3 gauge theory coupled to fermions and quantum computing ( http://arxiv.org/abs/2106.00549v1 ) ライセンス: Link先を確認	Ronak Desai, Yuan Feng, Mohammad Hassan, Abhishek Kodumagulla, Michael McGuigan	(参考訳) 本稿では,IBM QISKitソフトウェアを用いた変分量子固有解法(VQE)アルゴリズムを用いて,量子コンピュータ上のフェルミオンを用いたZ3ゲージ理論について検討する。最大9量子ビットを使用して、基底状態エネルギーの正確な結果を得ることができる。非ゼロ化学ポテンシャルの導入により、量子コンピュータ上の有限密度の状態方程式(EOS)を決定することができる。本稿では,本システムにおける量子アドバンテージの実現可能性について,有限密度シミュレーションとフェルミオン符号問題に関して論じる。 We study the Z3 gauge theory with fermions on the quantum computer using the Variational Quantum Eigensolver (VQE) algorithm with IBM QISKit software. Using up to 9 qubits we are able to obtain accurate results for the ground state energy. Introducing nonzero chemical potential we are able to determine the Equation of State (EOS) for finite density on the quantum computer. We discuss possible realizations of quantum advantage for this system over classical computers with regards to finite density simulations and the fermion sign problem.	翻訳日:2023-03-28 03:28:35 公開日:2021-06-01
# 離散二部分断状態の有効検証と忠実度推定 Efficient verification and fidelity estimation of discrete bipartite squeezed states ( http://arxiv.org/abs/2106.00533v1 ) ライセンス: Link先を確認	Russell P Rundle	(参考訳) 利点を得るため、量子技術は量子力学に特有の現象を利用する。そのような現象の2つがスクイージングとエンタングルメントである。これらの特徴を示す状態を生成した後、局所的な測定による生成の検証は難しいプロセスとなりうる。ここでは、2キューディットの単軸スクイージングハミルトニアンを用いて生成される状態を考えるが、これは絡み合った2キューディットのスクイーズド状態を生成するだけでなく、様々な形の興味深い絡み合い状態をもたらす。局所測定を用いて,これらの状態の忠実度を効率的に検証し,直接推定する方法を示す。 To gain an advantage, quantum technologies utilize phenomena particular to quantum mechanics. Two such phenomena are squeezing and entanglement. Having generated states that exhibit these features, verification of their generation with local measurements can be a difficult process. Here we consider the states that are generated using the two-qudit single-axis squeezing Hamiltonian, that not only produces entangled two-qudit squeezed states but also results in various forms of interesting entangled states. We show how one can use local measurements to both efficiently verify and directly estimate the fidelity of these generated states.	翻訳日:2023-03-28 03:28:26 公開日:2021-06-01
# 平均コンカレンスと絡み合い交換 Average concurrence and entanglement swapping ( http://arxiv.org/abs/2106.00848v1 ) ライセンス: Link先を確認	J\'anos A. Bergou, Dov Fields, Mark Hillery, Siddhartha Santra and Vladimir S. Malinovsky	(参考訳) 量子ネットワークにおけるエンタングルメントスワップにおける平均コンカレンスの役割について検討する。 qubit純状態から始まり、複数のスワップにおける平均収束の伝播を規定する非常に単純な規則が存在する。混合量子ビット状態の例を見て、純粋な状態の関係が混合状態で何が可能なのかの上界を与えるのを見つける。その後、I-concurrenceを利用するquditsに移動します。ここでの状況は qubits ほど単純ではないが、比較的簡単な結果が得られる場合もある。 We study the role of average concurrence in entanglement swapping in quantum networks. We begin with qubit pure states, and there is a very simple rule governing the propagation of average concurrence in multiple swaps. We look at examples of mixed qubit states, and find the relation for pure states gives an upper bound on what is possible with mixed states. We then move on to qudits, where we make use of the I-concurrence. Here the situation is not as simple as for qubits, but in some cases relatively straightforward results can be obtained.	翻訳日:2023-03-28 03:22:32 公開日:2021-06-01
# 最大重みスケジューリングを持つ量子ネットワークの安定性解析 Stability Analysis of a Quantum Network with Max-Weight Scheduling ( http://arxiv.org/abs/2106.00831v1 ) ライセンス: Link先を確認	Thirupathaiah Vasantam, Don Towsley	(参考訳) 本稿では,ネットワークに接続された複数のユーザに対して,絡み合った量子状態を分散する量子ネットワークについて検討する。各ユーザは、リンクを介してネットワークのスイッチに接続される。ネットワークのすべてのリンクは、特定の確率で各タイムスロット内の2部ベル状態の絡み合い状態を生成し、各エンドノードは、リンクによって生成された絡み合いの1キュービットを格納する。ユーザ集合の共有絡み付けを作成するために、リンクレベルの絡み合わせのキュービット上で測定操作を行い、それらの操作は本質的に確率的であり、特定の確率で成功している。リクエストは、異なるユーザーのための共有の絡み合いを求めるシステムに届く。各リクエストは、固定されたリンクセット上のリンクレベル絡みを使って、固定されたユーザーの共有絡みを作成するためのものである。リクエストはFirst-Come-First-Servedサービス規律に従って処理され、保存されていないリクエストはバッファに格納されます。サービス要求が選択されると、関連リンク上のリンクレベルの絡み合いのキュービット上で測定操作を行い、共有絡みを生成する。要求到着率とリンクレベルの絡み合い発生率のセットに対して,要求キューの安定性に必要な条件を求める。各タイムスロットにおいて、スケジューラはネットワークを安定化させるために、異なるユーザのセットに対する絡み合わせ操作をスケジュールする必要がある。次に、最大ウェイトスケジューリングポリシーを提案し、本ポリシーが全到達率のネットワークを安定化させることを示す。また、分析を支援する数値的な結果も提供する。異なるユーザの集合に対してマルチパーティショニングを生成する単一の量子スイッチの解析は、私たちの仕事の特別なケースです。 We study a quantum network that distributes entangled quantum states to multiple sets of users that are connected to the network. Each user is connected to a switch of the network via a link. All the links of the network generate bipartite Bell-state entangled states in each time-slot with certain probabilities, and each end node stores one qubit of the entanglement generated by the link. To create shared entanglements for a set of users, measurement operations are performed on qubits of link-level entanglements on a set of related links, and these operations are probabilistic in nature and are successful with certain probabilities. Requests arrive to the system seeking shared entanglements for different sets of users. Each request is for the creation of shared entanglements for a fixed set of users using link-level entanglements on a fixed set of links. Requests are processed according to First-Come-First-Served service discipline and unserved requests are stored in buffers. 
Once a request is selected for service, measurement operations are performed on qubits of link-level entanglements on related links to create a shared entanglement. For given set of request arrival rates and link-level entanglement generation rates, we obtain necessary conditions for the stability of queues of requests. In each time-slot, the scheduler has to schedule entanglement swapping operations for different sets of users to stabilize the network. Next, we propose a Max-Weight scheduling policy and show that this policy stabilizes the network for all feasible arrival rates. We also provide numerical results to support our analysis. The analysis of a single quantum switch that creates multipartite entanglements for different sets of users is a special case of our work.	翻訳日:2023-03-28 03:21:59 公開日:2021-06-01
# 拡張断熱性を有するカプラアシスト制御相ゲート Coupler-Assisted Controlled-Phase Gate with Enhanced Adiabaticity ( http://arxiv.org/abs/2106.00725v1 ) ライセンス: Link先を確認	Ji Chu and Fei Yan	(参考訳) 高忠実性2量子エンタングゲートは、フォールトトレラント量子コンピュータにとって必須のビルディングブロックである。過去10年間、超伝導量子回路を用いたスケーラブルな2量子ビットゲートの開発に多大な努力が払われてきた。近年,固定周波数量子ビット(Phys. Rev. Lett. 125, 240502; Phys. Rev. Lett. 125, 240503)を用いた可変結合アーキテクチャを用いた簡易な制御相ゲート方式が高い忠実度で実証されている。しかし、根底にあるメカニズムの深い理解はいまだに欠けており、その可能性を完全に活用できない。ここでは、高コントラストZZ相互作用の起源を説明する包括的な理論的研究を紹介する。理解を深めたことにより,多レベルシステムにおいて断熱パルスを形成する汎用的かつ簡便な手法を開発し,設計からゲート性能を最適化する方法を明らかにした。最先端のコヒーレンス特性を考えると、このスキームは、$10^{-5}$に近い2量子ビットゲート誤差率を達成する可能性があり、フォールトトレラント量子計算の進歩を劇的に加速すると期待される。 High-fidelity two-qubit entangling gates are essential building blocks for fault-tolerant quantum computers. Over the past decade, tremendous efforts have been made to develop scalable high-fidelity two-qubit gates with superconducting quantum circuits. Recently, an easy-to-scale controlled-phase gate scheme that utilizes the tunable-coupling architecture with fixed-frequency qubits [Phys. Rev. Lett. 125, 240502; Phys. Rev. Lett. 125, 240503] has been demonstrated with high fidelity and attracted broad interest. However, in-depth understanding of the underlying mechanism is still missing, preventing us from fully exploiting its potential. Here we present a comprehensive theoretical study, explaining the origin of the high-contrast ZZ interaction. Based on improved understanding, we develop a general yet convenient method for shaping an adiabatic pulse in a multilevel system, and identify how to optimize the gate performance from design. Given state-of-the-art coherence properties, we expect the scheme to potentially achieve a two-qubit gate error rate near $10^{-5}$, which would drastically speed up the progress towards fault-tolerant quantum computation.	翻訳日:2023-03-28 03:20:11 公開日:2021-06-01
# ダイヤモンド中のtin空スピン量子ビットの量子制御 Quantum control of the tin-vacancy spin qubit in diamond ( http://arxiv.org/abs/2106.00723v1 ) ライセンス: Link先を確認	Romain Debroux, Cathryn P. Michaels, Carola M. Purser, Noel Wan, Matthew E. Trusheim, Jes\'us Arjona Mart\'inez, Ryan A. Parker, Alexander M. Stramma, Kevin C. Chen, Lorenzo de Santis, Evgeny M. Alexeev, Andrea C. Ferrari, Dirk Englund, Dorian A. Gangloff, Mete Atat\"ure	(参考訳) ダイヤモンドにおけるグループIVカラーセンターは、量子ネットワークデバイスにとって有望なライトマッターインターフェースである。負電荷のスズ空洞中心(SnV)は特に興味深いもので、その大きなスピン軌道結合はフォノンの強調に対する強い保護と、スピン-光子エンタングルメントスキームへの光遷移のロバストな周期性をもたらす。ここでは、SnVスピン量子ビットの多軸コヒーレント制御を、地上と励起状態の間の全光刺激されたラマン駆動により実証する。我々はコヒーレント集団トラップと光駆動型電子スピン共鳴を用いて1.7Kで量子ビットへのコヒーレントアクセスを確認し、スピンラビ振動を$\Omega/2\pi$=3.6(1) MHzで得る。 All-optical Ramsey interferometry は、スピンの減衰時間である$T_2^*$=1.3(3)$\mu$s と2パルスの動的デカップリングが既にスピンコヒーレンス時間を$T_2$=0.33(14) ms に拡張し、変換制限された光子とフォトニックナノ構造への統合により、SnV は量子ネットワークにおける競合スピンフォトニクス構築ブロックとなることを示した。 Group-IV color centers in diamond are a promising light-matter interface for quantum networking devices. The negatively charged tin-vacancy center (SnV) is particularly interesting, as its large spin-orbit coupling offers strong protection against phonon dephasing and robust cyclicity of its optical transitions towards spin-photon entanglement schemes. Here, we demonstrate multi-axis coherent control of the SnV spin qubit via an all-optical stimulated Raman drive between the ground and excited states. We use coherent population trapping and optically driven electronic spin resonance to confirm coherent access to the qubit at 1.7 K, and obtain spin Rabi oscillations at a rate of $\Omega/2\pi$=3.6(1) MHz. All-optical Ramsey interferometry reveals a spin dephasing time of $T_2^*$=1.3(3)$\mu$s and two-pulse dynamical decoupling already extends the spin coherence time to $T_2$=0.33(14) ms. Combined with transform-limited photons and integration into photonic nanostructures, our results make the SnV a competitive spin-photon building block for quantum networks.	翻訳日:2023-03-28 03:19:45 公開日:2021-06-01
# 集約学習:ニューラルネットワーク分類器の学習に対するベクトル量子化アプローチ Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers ( http://arxiv.org/abs/2001.03955v3 ) ライセンス: Link先を確認	Masoumeh Soflaei, Hongyu Guo, Ali Al-Bashabsheh, Yongyi Mao, Richong Zhang	(参考訳) ニューラルネットワーク分類器の学習の問題点を考察する。情報ボトルネック(IB)の原則の下では,この分類問題を「IB学習」と呼ぶ表現学習問題と関連付ける。 IB学習は、実際、量子化問題の特別なクラスと等価であることを示す。速度歪み理論の古典的な結果は、IB学習は「ベクトル量子化」アプローチ、すなわち複数の入力オブジェクトの表現を同時に学習するアプローチの恩恵を受けることができることを示唆する。このようなアプローチは、ニューラルネットワークモデルによる分類のための新しい学習フレームワークである"集約学習(Aggregated Learning)"を生み出した。このフレームワークでは、複数のオブジェクトを単一のニューラルネットワークで共同で分類する。本フレームワークの有効性は,標準画像認識およびテキスト分類タスクに関する広範な実験を通じて検証される。 We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call "IB learning". We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a "vector quantization" approach, namely, simultaneously learning the representations of multiple input objects. Such an approach assisted with some variational techniques, result in a novel learning framework, "Aggregated Learning", for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.	翻訳日:2023-01-12 04:32:10 公開日:2021-06-01
# ニューラルネットワークを用いたベイズ推論 Bayesian Reasoning with Trained Neural Networks ( http://arxiv.org/abs/2001.11031v3 ) ライセンス: Link先を確認	Jakob Knollm\"uller and Torsten En{\ss}lin	(参考訳) 我々は、トレーニングされたニューラルネットワークを用いてベイズ推論を行い、初期スコープ外のタスクを解決する方法を示した。深層生成モデルは事前知識を提供し、分類/回帰ネットワークは制約を課す。手前のタスクはベイズ推論問題として定式化され、変分法やサンプリング法によってほぼ解決した。既にトレーニング済みのネットワーク上に構築されたアプローチと、対応可能な質問は、利用可能なネットワークの数によって超指数的に増加した。最も単純な形で、アプローチは条件付き生成モデルを生み出した。しかし、複数の同時制約は精巧な問題を構成する。我々は、このアプローチを特別に訓練されたジェネレータと比較し、謎を解く方法を示し、最先端アーキテクチャとの互換性を実証した。 We showed how to use trained neural networks to perform Bayesian reasoning in order to solve tasks outside their initial scope. Deep generative models provide prior knowledge, and classification/regression networks impose constraints. The tasks at hand were formulated as Bayesian inference problems, which we approximately solved through variational or sampling techniques. The approach built on top of already trained networks, and the addressable questions grew super-exponentially with the number of available networks. In its simplest form, the approach yielded conditional generative models. However, multiple simultaneous constraints constitute elaborate questions. We compared the approach to specifically trained generators, showed how to solve riddles, and demonstrated its compatibility with state-of-the-art architectures.	翻訳日:2023-01-05 20:36:07 公開日:2021-06-01
# 動的ニューロモルフィックプロセッサによる時空間的特徴のシナプス的統合 Synaptic Integration of Spatiotemporal Features with a Dynamic Neuromorphic Processor ( http://arxiv.org/abs/2002.04924v2 ) ライセンス: Link先を確認	Mattias Nilsson, Foteini Liwicki and Fredrik Sandin	(参考訳) スパイキングニューロンは、シナプス前スパイクパターンの非線形シナプスおよび樹状統合による時空間的特徴検出を行うことができる。非線型デンドライトと関連するニューロモルフィック回路設計のマルチコンパートメントモデルは、そのような動的統合プロセスの忠実な模倣を可能にするが、これらのアプローチは比較的高い計算コストや回路サイズにも関係している。本稿では,dynap-seニューロモルフィックプロセッサにおける,複数の動的シナプスと時空間スパイクパターンの相補的統合について検討する。提案する動的シナプスの興奮-抑制対が組み合わさって複数の入力を統合する方法について検討し、この概念を1つの抑制シナプスと複数の興奮シナプスが組み合わされた場合に一般化する。神経形ニューロン回路の膜電位を測定し,解析することにより,後シナプス電位(EPSP)の遅延を特徴づける。生物学的に関係のあるEPSP遅延は1ニューロンあたり10ミリ秒の変動であり、デバイスミスマッチにより異なるシナプスの組み合わせを選択することにより、提案手法で実現できる。これらの結果に基づき,dynap-seに動的シナプスを有する単一点ニューロンが,特定の時空間構造を有するシナプス前スパイクに対して選択的に応答できることを実証した。 Spiking neurons can perform spatiotemporal feature detection by nonlinear synaptic and dendritic integration of presynaptic spike patterns. Multicompartment models of non-linear dendrites and related neuromorphic circuit designs enable faithful imitation of such dynamic integration processes, but these approaches are also associated with a relatively high computing cost or circuit size. Here, we investigate synaptic integration of spatiotemporal spike patterns with multiple dynamic synapses on point-neurons in the DYNAP-SE neuromorphic processor, which offers a complementary resource-efficient, albeit less flexible, approach to feature detection. We investigate how previously proposed excitatory--inhibitory pairs of dynamic synapses can be combined to integrate multiple inputs, and we generalize that concept to a case in which one inhibitory synapse is combined with multiple excitatory synapses. We characterize the resulting delayed excitatory postsynaptic potentials (EPSPs) by measuring and analyzing the membrane potentials of the neuromorphic neuronal circuits. 
We find that biologically relevant EPSP delays, with variability of order 10 milliseconds per neuron, can be realized in the proposed manner by selecting different synapse combinations, thanks to device mismatch. Based on these results, we demonstrate that a single point-neuron with dynamic synapses in the DYNAP-SE can respond selectively to presynaptic spikes with a particular spatiotemporal structure, which enables, for instance, visual feature tuning of single neurons.	翻訳日:2023-01-01 19:02:48 公開日:2021-06-01
# 公正主成分分析とフィルタ設計 Fair Principal Component Analysis and Filter Design ( http://arxiv.org/abs/2002.06557v2 ) ライセンス: Link先を確認	Gad Zalcberg and Ami Wiesel	(参考訳) 我々は,fair principal component analysis (FPCA) を検討し,複数の対象ベクトルに公平にまたがる低次元部分空間を探索する。 FPCAは、与えられた集合内の最悪の射影目標ノルムの非凹最大化として定義される。この問題は信号処理におけるフィルタ設計や、公平性を次元還元スキームに組み込む際に発生する。 FPCAに対する最先端のアプローチは半正定値緩和(SDR)によるものであり、多項式時間ではあるが計算コストの高い最適化を伴う。スケーラビリティを実現するために,naive sub-gradient descent を用いて FPCA に対処することを提案する。直交目標の場合, 基礎となる最適化の状況を分析する。ランドスケープが良性であること、およびすべての局所ミニマがグローバルに最適であることを証明する。興味深いことに、SDRアプローチは、この単純なケースでは、最適以下のソリューションにつながります。最後に、直交FPCAと正規化タイトフレームの設計の等価性について論じる。 We consider Fair Principal Component Analysis (FPCA) and search for a low dimensional subspace that spans multiple target vectors in a fair manner. FPCA is defined as a non-concave maximization of the worst projected target norm within a given set. The problem arises in filter design in signal processing, and when incorporating fairness into dimensionality reduction schemes. The state of the art approach to FPCA is via semidefinite relaxation and involves a polynomial yet computationally expensive optimization. To allow scalability, we propose to address FPCA using naive sub-gradient descent. We analyze the landscape of the underlying optimization in the case of orthogonal targets. We prove that the landscape is benign and that all local minima are globally optimal. Interestingly, the SDR approach leads to sub-optimal solutions in this simple case. Finally, we discuss the equivalence between orthogonal FPCA and the design of normalized tight frames.	翻訳日:2022-12-31 17:40:11 公開日:2021-06-01
# ソースデータに本当にアクセスする必要があるか? 教師なし領域適応のためのソース仮説伝達 Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation ( http://arxiv.org/abs/2002.08546v6 ) ライセンス: Link先を確認	Jian Liang, Dapeng Hu, and Jiashi Feng	(参考訳) unsupervised domain adaptation(uda)は、ラベル付きソースデータセットから学んだ知識を活用して、新しいラベル付きドメインで同様のタスクを解決することを目的としている。従来のUDAメソッドは、モデルに適応するために学習する際にソースデータにアクセスする必要があり、分散化されたプライベートデータに対してリスクが高く非効率である。この研究は、訓練済みのソースモデルのみが利用できる実践的な環境に取り組み、ソースデータ無しでそのようなモデルを効果的に活用してUDA問題を解決する方法について検討する。本稿では,簡単な汎用的な表現学習フレームワークである \emph{Source HypOthesis Transfer} (SHOT) を提案する。 shotはソースモデルの分類器モジュール(仮説)を凍結し、情報最大化と自己教師付き擬似ラベルの両方を利用してターゲット固有の特徴抽出モジュールを学習し、ターゲットドメインからソース仮説への表現を暗黙的に整列させる。その汎用性を検証するため, 閉集合, 部分集合, 開集合領域適応など, SHOTを多岐にわたる適応例で評価した。実験によると、shotは複数のドメイン適応ベンチマークにおいて最先端の結果をもたらす。 Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain. Prior UDA methods typically require to access the source data when learning to adapt the model, making them risky and inefficient for decentralized private data. This work tackles a practical setting where only a trained source model is available and investigates how we can effectively utilize such a model without source data to solve UDA problems. We propose a simple yet generic representation learning framework, named \emph{Source HypOthesis Transfer} (SHOT). SHOT freezes the classifier module (hypothesis) of the source model and learns the target-specific feature extraction module by exploiting both information maximization and self-supervised pseudo-labeling to implicitly align representations from the target domains to the source hypothesis. To verify its versatility, we evaluate SHOT in a variety of adaptation cases including closed-set, partial-set, and open-set domain adaptation. Experiments indicate that SHOT yields state-of-the-art results among multiple domain adaptation benchmarks.	翻訳日:2022-12-30 07:07:27 公開日:2021-06-01
# 古典的適応フィルタ理論を用いたCNN訓練速度と安定性に及ぼすバッチ正規化の影響の分離 Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory ( http://arxiv.org/abs/2002.10674v2 ) ライセンス: Link先を確認	Elaina Chai, Mert Pilanci, Boris Murmann	(参考訳) バッチ正規化(BatchNorm)は、トレーニング速度と安定性を改善するために、畳み込みニューラルネットワーク(CNN)で一般的に使用される。しかし、なぜこの手法が有効であるかについてのコンセンサスはまだ限られている。本稿では、従来の適応フィルタ領域の概念を用いて、BatchNormの動的および内部動作に関する洞察を提供する。まず、畳み込み重み更新は、畳み込み層のチャネルワイド構造を介してBatchNormによって制御される入力自己相関行列の固有値に、安定性と収束速度が結びついている自然なモードを持つことを示す。さらに,本実験では,速度と安定性の利点が異なる効果を示す。低い学習率では、収束速度を改善する最小固有値のBatchNormの増幅であり、高い学習率では、安定性を保証する最大の固有値の抑制である。最後に、第1のトレーニングステップにおいて、正規化が最も必要となる場合、BatchNormは正規化リースト平均角 (NLMS) と同じ最適化を満足する一方で、その後のステップでこの条件を近似し続けていることを証明した。本稿では,適応フィルタ理論を用いて,現代のニューラルネットワーク構造に関するさらなる知見を得るための基礎研究を行った。 Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the largest eigenvalues that ensures stability. Lastly, we prove that in the first training step, when normalization is needed most, BatchNorm satisfies the same optimization as Normalized Least Mean Square (NLMS), while it continues to approximate this condition in subsequent steps. 
The analyses provided in this paper lay the groundwork for gaining further insight into the operation of modern neural network structures using adaptive filter theory.	翻訳日:2022-12-28 20:34:16 公開日:2021-06-01
# DP-MERF: 実用的プライバシー保護データ生成のためのランダムな特徴付き微分プライベート平均埋め込み DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation ( http://arxiv.org/abs/2002.11603v5 ) ライセンス: Link先を確認	Frederik Harder, Kamil Adamczewski, Mijung Park	(参考訳) 実データと合成データの分布を比較する際に,カーネル平均埋め込みのランダムな特徴表現を用いた差分プライベートなデータ生成パラダイムを提案する。ランダムな特徴表現を2つの重要な利点として活用する。まず、深層生成モデルのトレーニングには最小限のプライバシーコストが必要です。これは、真のデータポイントと合成データポイントのすべてのペアでカーネルマトリックスを計算する必要があるカーネルベースの距離メトリクスとは異なり、データ依存項を合成データのみに依存する用語から切り離すことができるためである。したがって、データ依存項を一度だけ摂動し、ジェネレータのトレーニング中に繰り返し使用する必要がある。第二に、ランダムな特徴が構築によってノルムとなるため、カーネル平均埋め込みの解析感度を得ることができる。これにより、ジェネレータネットワークの未知の感度を扱うために、クリッピングノルムのハイパーパラメータ検索の必要性がなくなる。我々は,不均質な表データや画像データなどのデータセットのラベルと入力特徴を共同で生成するために,ランダムな特徴量(dp-merf)を持つ微分的平均埋め込みアルゴリズムを提案する。このアルゴリズムは、複数のデータセットでテストした場合、既存の方法よりもはるかに優れたプライバシ利用トレードオフを実現する。 We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings when comparing the distribution of true data with that of synthetic data. We exploit the random feature representations for two important benefits. First, we require a minimal privacy cost for training deep generative models. This is because unlike kernel-based distance metrics that require computing the kernel matrix on all pairs of true and synthetic data points, we can detach the data-dependent term from the term solely dependent on synthetic data. Hence, we need to perturb the data-dependent term only once and then use it repeatedly during the generator training. Second, we can obtain an analytic sensitivity of the kernel mean embedding as the random features are norm bounded by construction. This removes the necessity of hyper-parameter search for a clipping norm to handle the unknown sensitivity of a generator network. 
We provide several variants of our algorithm, differentially-private mean embeddings with random features (DP-MERF) to jointly generate labels and input features for datasets such as heterogeneous tabular data and image data. Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.	翻訳日:2022-12-28 14:25:14 公開日:2021-06-01
# 広小密度仮説と探索的拡張学習率スケジュール Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule ( http://arxiv.org/abs/2003.03977v5 ) ライセンス: Link先を確認	Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu	(参考訳) いくつかの論文では、幅の広いミニマは狭いミニマよりも一般化されていると主張している。本稿では,広大極小の一般化特性を共生する詳細な実験を通じて,広大極小の密度が狭小極小の密度よりも低いという新しい仮説の実証的な証拠を提供する。さらに,この仮説に動機づけられ,新しい探索・探索学習率スケジュールを設計する。様々な画像や自然言語データセットにおいて,学習のベースラインを手作業で調整した場合と比較して,探索・探索のスケジュールは最大0.84%高い絶対精度が得られるか,最大57%のトレーニング時間を短縮し,元の報告精度を達成することができることを示した。例えば、ハイパフォーマンスモデルの学習率スケジュールを変更するだけで、IWSLT'14(DE-EN)データセットの最先端(SOTA)精度を実現する。 Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments that not only corroborate the generalization properties of wide minima, we also provide empirical evidence for a new hypothesis that the density of wide minima is likely lower than the density of narrow minima. Further, motivated by this hypothesis, we design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, compared to their original hand-tuned learning rate baselines, we show that our explore-exploit schedule can result in either up to 0.84% higher absolute accuracy using the original training budget or up to 57% reduced training time while achieving the original reported accuracy. For example, we achieve state-of-the-art (SOTA) accuracy for IWSLT'14 (DE-EN) dataset by just modifying the learning rate schedule of a high performing model.	翻訳日:2022-12-25 07:56:55 公開日:2021-06-01
# 潜在画像を用いたオープンドメイン対話生成 Open Domain Dialogue Generation with Latent Images ( http://arxiv.org/abs/2004.01981v2 ) ライセンス: Link先を確認	Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li	(参考訳) オープンドメインと画像との対話について検討する。既存の研究は、画像とテキストの文脈の両方が利用可能であると仮定しているが、自然界における画像地上対話は、テキスト対話よりも入手が困難である。そこで本研究では,対話時の視覚シーン情報を画像で表現可能と仮定し,テキスト対画像生成技術を用いてテキスト対話の潜在画像の復元を試みることにより,画像接地対話とテキスト対話の両方を用いた応答生成モデルを学ぶことを提案する。 2つのタイプの対話の可能性は、条件付き変分オートエンコーディングフレームワークで学習される応答生成器と画像再構成器によって定式化される。画像地上会話とテキストベースの会話の両方において実証的研究を行う。第1シナリオでは、特に低リソース環境下でのイメージ接頭辞対話は、潜在画像とのテキスト対話によって効果的に強化されるが、第2シナリオでは、潜在画像は応答の内容を強化し、同時に文脈に関連づけられる。 We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.	翻訳日:2022-12-16 22:35:28 公開日:2021-06-01
# 過パラメータ領域における補間線形分類器の有限サンプル解析 Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime ( http://arxiv.org/abs/2004.12019v4 ) ライセンス: Link先を確認	Niladri S. Chatterji, Philip M. Long	(参考訳) 2クラス線形分類における最大マージンアルゴリズムの集団リスクの限界を証明した。線形分離可能なトレーニングデータに対して、最大マージンアルゴリズムは、トレーニングエラーが0に駆動されるため、勾配降下を用いたロジスティック損失を伴うトレーニングの限界に相当することが以前の研究で示されている。このアルゴリズムは誤分類ノイズを含むランダムデータに適用される。クリーンデータに対する我々の仮定は、クラス条件分布が標準正規分布である場合を含む。誤分類ノイズは敵によって選択され、破損したラベルのごく一部に制限される。我々の限界は、十分な過パラメータ化によって、ノイズデータに基づいてトレーニングされた最大マージンアルゴリズムが、ほぼ最適な人口リスクを達成できることを示している。 We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification. For linearly separable training data, the maximum margin algorithm has been shown in previous work to be equivalent to a limit of training with logistic loss using gradient descent, as the training error is driven to zero. We analyze this algorithm applied to random data including misclassification noise. Our assumptions on the clean data include the case in which the class-conditional distributions are standard normal distributions. The misclassification noise may be chosen by an adversary, subject to a limit on the fraction of corrupted labels. Our bounds show that, with sufficient over-parameterization, the maximum margin algorithm trained on noisy data can achieve nearly optimal population risk.	翻訳日:2022-12-09 21:35:38 公開日:2021-06-01
# InfoScrub: 目的の難読化による属性プライバシの実現 InfoScrub: Towards Attribute Privacy by Targeted Obfuscation ( http://arxiv.org/abs/2005.10329v2 ) ライセンス: Link先を確認	Hui-Po Wang, Tribhuvanesh Orekondy, Mario Fritz	(参考訳) オンラインで共有された個人の個人写真は、記憶に残る多くの詳細を示す以外に、幅広いプライベート情報を明らかにし、プライバシーリスク(オンラインハラスメント、追跡など)を伴う可能性がある。このようなリスクを軽減するためには、個人が視覚データに漏洩した個人情報を制限する技術を研究することが不可欠である。我々は,画像の忠実さを維持しつつ,対象とするプライバシ属性に対する推論のエントロピーを最大化する,新しい画像難読化フレームワークでこの問題に取り組む。エンコーダ-デコーダ方式のアーキテクチャを基本とし、2つの新規な点でこの問題にアプローチする。 (a)複数の非対ドメインから同時に双方向翻訳を行うための識別器を導入すること (b)属性のターゲットセットに対する不確実性を最大化する画像補間を予測すること。我々のアプローチは、元の入力画像に忠実な難読化画像を生成し、さらに非難読化画像に対して6.2$\times$(最大0.85bit)の不確かさを増加させる。 Personal photos of individuals, when shared online, apart from exhibiting a myriad of memorable details, also reveal a wide range of private information and potentially entail privacy risks (e.g., online harassment, tracking). To mitigate such risks, it is crucial to study techniques that allow individuals to limit the private information leaked in visual data. We tackle this problem in a novel image obfuscation framework: to maximize entropy on inferences over targeted privacy attributes, while retaining image fidelity. We approach the problem based on an encoder-decoder style architecture, with two key novelties: (a) introducing a discriminator to perform bi-directional translation simultaneously from multiple unpaired domains; (b) predicting an image interpolation which maximizes uncertainty over a target set of attributes. We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2$\times$ (or up to 0.85 bits) over the non-obfuscated counterparts.	翻訳日:2022-12-01 05:03:56 公開日:2021-06-01
# 胸部X線画像からのCOVID-19, MERS, SARSの信頼性診断のための深層学習 Deep Learning for Reliable Classification of COVID-19, MERS, and SARS from Chest X-Ray Images ( http://arxiv.org/abs/2005.11524v6 ) ライセンス: Link先を確認	Anas Tahir, Yazan Qiblawey, Amith Khandakar, Tawsifur Rahman, Uzair Khurshid, Farayi Musharavati, M. T. Islam, Serkan Kiranyaz, Muhammad E. H. Chowdhury	(参考訳) 新型コロナウイルス感染症(COVID-19)は、非常に感染力が強く、急速に拡大するコロナウイルス感染症である。 2002年と2011年に流行した重症急性呼吸器症候群(SARS)と中東呼吸器症候群(MERS)、そして現在のCOVID-19のパンデミックは、すべて同じコロナウイルス科に属する。本研究の目的は、深層畳み込みニューラルネットワーク(CNN)を用いて、COVID-19、SARS、MERSの胸部X線(CXR)画像を分類することである。 423のCOVID-19、144のMERS、134のSARS CXR画像からなるQU-COVID-familyと呼ばれるユニークなデータベースが作成された。さらに、CNNセグメンテーションモデル(U-Net)を用いて肺領域を同定し、訓練済みのCNN分類器を用いて、セグメンテーションされた肺画像をCOVID-19、MERS、SARSに分類する堅牢なCOVID-19認識システムを提案した。さらに,Score-CAM可視化法を用いて分類結果の可視化を行い,深層CNNの決定の背後にある理由を理解する。いくつかのディープラーニング分類器が訓練、テストされ、上位4つのアルゴリズムが報告された。オリジナル画像とプリプロセス画像は、ネットワークへの入力として、個別に、またはすべてまとめて使用された。通常のCXR分類とセグメンテーションされたCXR分類の2つの認識方式が検討された。通常のCXRでは、InceptionV3が3チャンネル方式で他のネットワークを上回り、COVID-19、MERS、SARS画像の分類でそれぞれ99.5%、93.1%、97%の感度を達成した。一方、セグメンテーションされたCXRでは、オリジナルのCXRデータセットを用いたInceptionV3が最も優れ、それぞれ96.94%、79.68%、90.26%の感度でCOVID-19、MERS、SARSの画像を分類した。すべてのネットワークは、セグメンテーションされた肺画像で高いCOVID-19検出感度(>96%)を示した。これは、医師にとってしばしば判別が難しいCOVID-19症例が、AIの目には固有の放射線学的特徴を示すことを意味している。 Novel Coronavirus disease (COVID-19) is an extremely contagious and quickly spreading Coronavirus infection. Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), which broke out in 2002 and 2011, and the current COVID-19 pandemic are all from the same family of coronavirus. This work aims to classify COVID-19, SARS, and MERS chest X-ray (CXR) images using deep Convolutional Neural Networks (CNNs). A unique database, called QU-COVID-family, was created, consisting of 423 COVID-19, 144 MERS, and 134 SARS CXR images. 
Besides, a robust COVID-19 recognition system was proposed to identify lung regions using a CNN segmentation model (U-Net), and then classify the segmented lung images as COVID-19, MERS, or SARS using a pre-trained CNN classifier. Furthermore, the Score-CAM visualization method was utilized to visualize classification output and understand the reasoning behind the decision of deep CNNs. Several Deep Learning classifiers were trained and tested; the four best-performing algorithms are reported. Original and preprocessed images were used individually and all together as the input(s) to the networks. Two recognition schemes were considered: plain CXR classification and segmented CXR classification. For plain CXRs, it was observed that InceptionV3 outperforms other networks with a 3-channel scheme and achieves sensitivities of 99.5%, 93.1%, and 97% for classifying COVID-19, MERS, and SARS images, respectively. In contrast, for segmented CXRs, InceptionV3 using the original CXR dataset performed best and achieved sensitivities of 96.94%, 79.68%, and 90.26% for classifying COVID-19, MERS, and SARS images, respectively. All networks showed high COVID-19 detection sensitivity (>96%) with the segmented lung images. This indicates that COVID-19 cases, which are often challenging for medical doctors to identify, carry a unique radiographic signature in the eyes of AI.	翻訳日:2022-11-30 03:38:28 公開日:2021-06-01
# 連続ゲームにおける近似ナッシュ均衡の計算アルゴリズムと連続ブロットゲームへの応用 Algorithm for Computing Approximate Nash Equilibrium in Continuous Games with Application to Continuous Blotto ( http://arxiv.org/abs/2006.07443v5 ) ライセンス: Link先を確認	Sam Ganzfried	(参考訳) 様々な有限ゲームクラスにおけるナッシュ均衡の計算に有効なアルゴリズムが開発されている。しかし、純粋戦略空間が(場合によっては非可算)無限である連続ゲームを解くことは、はるかに難しい。にもかかわらず、多くの実世界のドメインは連続的な行動空間を持つ。例えば、行動が時間、お金、あるいは整数値ではなく実数値として自然にモデル化されるその他の資源の量を指す場合である。連続ゲームにおけるナッシュ均衡戦略を近似する新しいアルゴリズムを提案する。 2人ゼロサムゲームに加えて、このアルゴリズムは多人数ゲームや不完全情報ゲームにも適用できる。 2人のプレイヤーが複数の戦場にリソースを配分する連続的な不完全情報ブロットゲームについて実験を行った。ブロットゲームは国家安全保障のシナリオをモデル化するために頻繁に用いられ、選挙競争やオークション理論にも適用されてきた。実験により,本アルゴリズムがこのゲームにおけるナッシュ均衡戦略の良好な近似を高速に計算できることを示した。 Successful algorithms have been developed for computing Nash equilibrium in a variety of finite game classes. However, solving continuous games -- in which the pure strategy space is (potentially uncountably) infinite -- is far more challenging. Nonetheless, many real-world domains have continuous action spaces, e.g., where actions refer to an amount of time, money, or other resource that is naturally modeled as being real-valued as opposed to integral. We present a new algorithm for approximating Nash equilibrium strategies in continuous games. In addition to two-player zero-sum games, our algorithm also applies to multiplayer games and games with imperfect information. We experiment with our algorithm on a continuous imperfect-information Blotto game, in which two players distribute resources over multiple battlefields. Blotto games have frequently been used to model national security scenarios and have also been applied to electoral competition and auction theory. Experiments show that our algorithm is able to quickly compute close approximations of Nash equilibrium strategies for this game.	翻訳日:2022-11-22 04:46:49 公開日:2021-06-01
# 高速かつイーガーなk-メドイドクラスタリング: PAM, CLARA, CLARANSアルゴリズムのO(k)実行時改善 Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms ( http://arxiv.org/abs/2008.05171v2 ) ライセンス: Link先を確認	Erich Schubert and Peter J. Rousseeuw	(参考訳) 非ユークリッドデータのクラスタリングは困難であり、階層的クラスタリング以外で最もよく使われるアルゴリズムの1つは、単にk-medoidsクラスタリングとも呼ばれるPAM(Partitioning Around Medoids)である。ユークリッド幾何学において、k-平均法で使われる平均はクラスタ中心の良い推定量であるが、任意の非類似度に対しては存在しない。 PAMは代わりにメドイド、すなわちクラスタ内の他のすべてのオブジェクトとの非類似度が最小となるオブジェクトを使用する。この中心性の概念は任意の(非)類似度で使用することができ、したがって多くのドメインやアプリケーションにとって重要である。 PAMの大きな問題は、高い実行時間コストである。アルゴリズムの第2(SWAP)フェーズでO(k)倍の高速化を実現しつつ、元のPAMアルゴリズムと同じ結果が得られるPAMアルゴリズムの修正を提案する。実行するスワップの選択を緩和すれば(同等の品質を維持しながら)、各イテレーションで追加のスワップを積極的に実行することでアルゴリズムをさらに加速できる。 SWAPが大幅に速くなったことで、より高速な初期化戦略を検討できるようになる。なぜなら、 (i)古典的な「BUILD」初期化がボトルネックとなり、 (ii)我々のスワップは、より悪い開始条件を補うのに十分な速さだからである。また,CLARAおよびCLARANSアルゴリズムが提案した修正からどのように恩恵を受けるかも示す。本研究ではアプローチの並列化は扱っていないが,PAM と CLARA をビッグデータ上で使用するための従来のアプローチ(一部はサブルーチンとして PAM を使用しているため,これらの改善の恩恵を直ちに受けられる)と容易に組み合わせることができ,そこでは高い k での性能がますます重要になる。 k=100,200の実データに対する実験では、元のPAM SWAPアルゴリズムと比較してそれぞれ458倍と1191倍のスピードアップを観測し、PAMをより大きなデータセット、特に高いkに適用可能にした。 Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids clustering. In Euclidean geometry the mean (as used in k-means) is a good estimator for the cluster center, but this does not exist for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains and applications. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second ("SWAP") phase of the algorithm, but will still find the same results as the original PAM algorithm. 
If we relax the choice of swaps performed (while retaining comparable quality), we can further accelerate the algorithm by eagerly performing additional swaps in each iteration. With the substantially faster SWAP, we can now explore faster initialization strategies, because (i) the classic ("BUILD") initialization now becomes the bottleneck, and (ii) our swap is fast enough to compensate for worse starting conditions. We also show how the CLARA and CLARANS algorithms benefit from the proposed modifications. While we do not study the parallelization of our approach in this work, it can easily be combined with earlier approaches to use PAM and CLARA on big data (some of which use PAM as a subroutine, hence can immediately benefit from these improvements), where the performance with high k becomes increasingly important. In experiments on real data with k=100,200, we observed speedups of 458x and 1191x, respectively, compared to the original PAM SWAP algorithm, making PAM applicable to larger data sets, and in particular to higher k.	翻訳日:2022-10-31 04:28:32 公開日:2021-06-01
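The SWAP phase described in the k-medoids abstract can be illustrated with a deliberately naive baseline. The sketch below is not the accelerated algorithm proposed in the paper: it re-evaluates the full clustering cost for every candidate (medoid, non-medoid) swap, which is the kind of redundant work an O(k)-fold speedup would target. The function names and the toy one-dimensional data are assumptions made for illustration.

```python
def total_deviation(D, medoids):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in D)


def naive_pam_swap(D, medoids):
    """Greedy SWAP phase: apply the single best swap until none improves.

    D is a precomputed dissimilarity matrix (list of lists); medoids is a
    list of object indices. Every candidate swap is scored from scratch.
    """
    medoids = list(medoids)
    n = len(D)
    best = total_deviation(D, medoids)
    improved = True
    while improved:
        improved = False
        best_swap = None
        for pos in range(len(medoids)):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_deviation(D, trial)
                if cost < best:
                    best, best_swap = cost, (pos, cand)
        if best_swap is not None:
            pos, cand = best_swap
            medoids[pos] = cand
            improved = True
    return sorted(medoids), best


# Toy data: two well-separated groups on a line, with a poor starting guess.
values = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in values] for a in values]
result = naive_pam_swap(D, [0, 1])
print(result)
```

Because only a dissimilarity matrix is required, the same code works for any (dis-)similarity, which is the property the abstract highlights for medoids over means.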
# 政治的主張のマルチホップファクトチェック Multi-Hop Fact Checking of Political Claims ( http://arxiv.org/abs/2009.06401v3 ) ライセンス: Link先を確認	Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein	(参考訳) 近年の研究では、複雑な自然言語推論を研究するためのマルチホップモデルとデータセットが提案されている。マルチホップ推論を必要とする注目すべきタスクの1つはファクトチェックであり、相互に関連する一連の証拠がクレームの最終判断に繋がる。しかし、既存のデータセットはゴールドのエビデンスページのアノテーションを提供していないか、提供している唯一のデータセット(FEVER)は、主に単純な推論でファクトチェックできるクレームで構成され、人工的に構築されている。ここでは、相互接続された証拠チャンクの上に複数のホップを持つ、自然発生クレームのより複雑なクレーム検証を研究する。我々は、 1) クレーム検証のための証拠文の小さな注釈付きデータセット、PolitiHopを構築し、 2) 既存のマルチホップデータセットと比較し、 3) より広範なドメイン内およびドメイン外のリソースからPolitiHopへの知識の転送方法を検討する。このタスクは複雑であり、証拠に対する推論を明示的にモデル化するアーキテクチャとドメイン内の転移学習を組み合わせることで、最高のパフォーマンスが得られることが分かった。 Recent work has proposed multi-hop models and datasets for studying complex natural language reasoning. One notable task requiring multi-hop reasoning is fact checking, where a set of connected evidence pieces leads to the final verdict of a claim. However, existing datasets either do not provide annotations for gold evidence pages, or the only dataset which does (FEVER) mostly consists of claims which can be fact-checked with simple reasoning and is constructed artificially. Here, we study more complex claim verification of naturally occurring claims with multiple hops over interconnected evidence chunks. We: 1) construct a small annotated dataset, PolitiHop, of evidence sentences for claim verification; 2) compare it to existing multi-hop datasets; and 3) study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex and achieve the best performance with an architecture that specifically models reasoning over evidence pieces in combination with in-domain transfer learning.	翻訳日:2022-10-20 02:34:50 公開日:2021-06-01
# 暗黙的グラフニューラルネットワーク Implicit Graph Neural Networks ( http://arxiv.org/abs/2009.06211v3 ) ライセンス: Link先を確認	Fangda Gu, Heng Chang, Wenwu Zhu, Somayeh Sojoudi, Laurent El Ghaoui	(参考訳) グラフニューラルネットワーク(gnns)は、グラフ構造化データから意味のある表現を学ぶディープラーニングモデルとして広く使われている。基礎となるリカレント構造が有限であるため、現在のGNN法は基礎となるグラフの長距離依存を捉えるのに苦労する可能性がある。この難しさを克服するために,我々は,暗黙の「状態」ベクトルを含む固定点平衡方程式の解に基づく,暗黙のグラフニューラルネットワーク(ignn)と呼ばれるグラフ学習フレームワークを提案する。我々はペロン・フロベニウス理論を用いて、枠組みの健全性を保証する十分な条件を導出する。暗黙的な差別化を生かして、フレームワークを訓練するための引き込み可能な勾配降下法を導出する。包括的なタスクの実験は、IGNNが一貫して長距離依存をキャプチャし、最先端のGNNモデルより優れていることを示している。 Graph Neural Networks (GNNs) are widely used deep learning models that learn meaningful representations from graph-structured data. Due to the finite nature of the underlying recurrent structure, current GNN methods may struggle to capture long-range dependencies in underlying graphs. To overcome this difficulty, we propose a graph learning framework, called Implicit Graph Neural Networks (IGNN), where predictions are based on the solution of a fixed-point equilibrium equation involving implicitly defined "state" vectors. We use the Perron-Frobenius theory to derive sufficient conditions that ensure well-posedness of the framework. Leveraging implicit differentiation, we derive a tractable projected gradient descent method to train the framework. Experiments on a comprehensive range of tasks show that IGNNs consistently capture long-range dependencies and outperform the state-of-the-art GNN models.	翻訳日:2022-10-18 11:31:51 公開日:2021-06-01
# 原始平均化によるモーメンタム:非凸最適化のための理論的考察と学習率スケジューリング Momentum via Primal Averaging: Theoretical Insights and Learning Rate Schedules for Non-Convex Optimization ( http://arxiv.org/abs/2010.00406v4 ) ライセンス: Link先を確認	Aaron Defazio	(参考訳) モーメンタム法は、ディープニューラルネットワークのような非凸モデルのトレーニングに機械学習コミュニティ内で広く使われている。経験的には、それらは伝統的な確率的勾配降下(SGD)アプローチを上回る性能を示す。本研究では, 確率的原始平均化(SPA)形式として知られる等価な書き換えを利用して、モーメンタム付きSGD(SGD+M)のリアプノフ解析を行う。この解析は、非凸の場合の以前の理論よりもはるかにタイトであり、これによりSGD+MがSGDをいつ上回りうるのか、どのようなハイパーパラメータスケジュールが機能するのか、そしてそれはなぜなのかについて、正確な洞察を与えることができる。 Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks. Empirically, they outperform traditional stochastic gradient descent (SGD) approaches. In this work we develop a Lyapunov analysis of SGD with momentum (SGD+M), by utilizing an equivalent rewriting of the method known as the stochastic primal averaging (SPA) form. This analysis is much tighter than previous theory in the non-convex case, and due to this we are able to give precise insights into when SGD+M may outperform SGD, and what hyper-parameter schedules will work and why.	翻訳日:2022-10-12 07:43:33 公開日:2021-06-01
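The "equivalent rewriting" mentioned in the momentum abstract can be checked numerically in the simplest setting. The sketch below assumes constant hyper-parameters and a deterministic gradient on a toy quadratic, neither of which is specified in the abstract. Under those assumptions, heavy-ball SGD+M with step size `alpha` and momentum `beta` produces the same iterates as the primal averaging form with `c = 1 - beta` and `eta = alpha / (1 - beta)`. This mapping is derived here for the constant-parameter case; the paper's general statement with time-varying schedules may be phrased differently.

```python
def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * x**2 (illustrative choice)."""
    return x


def sgd_momentum(x0, alpha, beta, steps):
    """Heavy-ball form: m <- beta*m + g ; x <- x - alpha*m."""
    x, m = x0, 0.0
    xs = [x]
    for _ in range(steps):
        m = beta * m + grad(x)
        x = x - alpha * m
        xs.append(x)
    return xs


def primal_averaging(x0, eta, c, steps):
    """Averaging form: z <- z - eta*g ; x <- (1-c)*x + c*z."""
    x, z = x0, x0
    xs = [x]
    for _ in range(steps):
        z = z - eta * grad(x)
        x = (1.0 - c) * x + c * z
        xs.append(x)
    return xs


alpha, beta = 0.1, 0.9
eta, c = alpha / (1.0 - beta), 1.0 - beta  # assumed constant-parameter mapping
a = sgd_momentum(1.0, alpha, beta, 50)
b = primal_averaging(1.0, eta, c, 50)
max_gap = max(abs(u - v) for u, v in zip(a, b))
print(max_gap)
```

The gap between the two trajectories should be at the level of floating-point rounding, which is what makes it possible to analyse one form and draw conclusions about the other.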
# 構造予測のための埋め込みの自動連結 Automated Concatenation of Embeddings for Structured Prediction ( http://arxiv.org/abs/2010.05006v4 ) ライセンス: Link先を確認	Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu	(参考訳) 事前制約付き文脈埋め込みは、構造化予測タスクのための強力な単語表現である。最近の研究により、異なる種類の埋め込みを結合することでより良い単語表現が得られることがわかった。しかし、最善の連結表現を形成する組込みの選択は、通常、タスクや候補組込みのコレクションによって異なり、組込み型がますます増えているため、より難しい問題となっている。本稿では,ニューラルネットワーク探索の最近の進歩に触発された定式化に基づいて,構造化予測タスクに対する埋め込みのより良い結合を見つけるプロセスを自動化するための,埋め込みの自動結合(ACE)を提案する。具体的には、タスクを考慮した個別の埋め込み型の有効性に関する現在の信念に基づいて埋め込みの結合を交互にサンプリングし、報酬に基づいてその信念を更新する。強化学習の戦略に従い、コントローラのパラメータを最適化し、入力としてサンプルされた連結で供給され、タスクデータセットでトレーニングされたタスクモデルの精度に基づいて報酬を算出する。 6つのタスクと21のデータセットに対する実証的な結果から、我々のアプローチは強いベースラインを上回り、すべての評価に微調整された埋め込みによる最先端のパフォーマンスを実現する。 Pretrained contextualized embeddings are powerful word representations for structured prediction tasks. Recent work found that better word representations can be obtained by concatenating different types of embeddings. However, the selection of embeddings to form the best concatenated representation usually varies depending on the task and the collection of candidate embeddings, and the ever-increasing number of embedding types makes it a more difficult problem. In this paper, we propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks, based on a formulation inspired by recent progress on neural architecture search. Specifically, a controller alternately samples a concatenation of embeddings, according to its current belief of the effectiveness of individual embedding types in consideration for a task, and updates the belief based on a reward. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model, which is fed with the sampled concatenation as input and trained on a task dataset. 
Empirical results on 6 tasks and 21 datasets show that our approach outperforms strong baselines and achieves state-of-the-art performance with fine-tuned embeddings in all the evaluations.	翻訳日:2022-10-08 22:19:27 公開日:2021-06-01
# 構成的一般化と自然言語の変動:意味的パーシングアプローチは両立できるか? Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? ( http://arxiv.org/abs/2010.12725v2 ) ライセンス: Link先を確認	Peter Shaw, Ming-Wei Chang, Panupong Pasupat, Kristina Toutanova	(参考訳) sequence-to-sequenceモデルは自然言語の変化を扱うのに優れているが、分散的構成の一般化に苦しむことが示されている。これは、構成バイアスが強い新しい特殊アーキテクチャを動機付けるが、これらのアプローチのほとんどは、自然言語の変化を代表しない合成生成データセットでのみ評価されている。私たちは、自然言語の変化と合成の一般化の両方を扱うセマンティック解析のアプローチを開発できますか? この機能をよりよく評価するために、非合成データセットの新しいトレインとテスト分割を提案する。我々は、強力な既存のアプローチが幅広い評価でうまく機能しないことを実証する。また,高精度文法ベースアプローチと事前学習されたシーケンス・ツー・シーケンスモデルを組み合わせたハイブリッドモデルであるnqg-t5を提案する。これは、非合成データに対するいくつかの構成的一般化課題にまたがる既存のアプローチよりも優れており、標準評価に関する最先端技術と競合している。この問題はまだ解決には程遠いが,本研究は多彩な評価の重要性と,構文解析における合成汎化と自然言語変化の両方を扱うオープンチャレンジを浮き彫りにしている。 Sequence-to-sequence models excel at handling natural language variation, but have been shown to struggle with out-of-distribution compositional generalization. This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation. In this work we ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization? To better assess this capability, we propose new train and test splits of non-synthetic datasets. We demonstrate that strong existing approaches do not perform well across a broad set of evaluations. We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model. It outperforms existing approaches across several compositional generalization challenges on non-synthetic data, while also being competitive with the state-of-the-art on standard evaluations. 
While still far from solving this problem, our study highlights the importance of diverse evaluations and the open challenge of handling both compositional generalization and natural language variation in semantic parsing.	翻訳日:2022-10-03 12:45:18 公開日:2021-06-01
# deep21:21cm前景除去のための深層学習法 deep21: a Deep Learning Method for 21cm Foreground Removal ( http://arxiv.org/abs/2010.15843v2 ) ライセンス: Link先を確認	T. Lucas Makinen, Lachlan Lancaster, Francisco Villaescusa-Navarro, Peter Melchior, Shirley Ho, Laurence Perreault-Levasseur, and David N. Spergel	(参考訳) 我々は、21cmの強度マッピング観測から前景汚染物質を除去することを目指す。 UNetアーキテクチャと3次元畳み込みを持つ深層畳み込みニューラルネットワーク(CNN)をシミュレーション観測で訓練することで、ノイズの存在下でも、宇宙の中性水素(HI)信号の周波数および空間パターンを前景から効果的に分離できることを実証する。クリーニングされたマップは、すべての関連する角スケールと周波数で10%以内の精度で宇宙論的クラスタリング統計を回復する。これは、標準的な主成分分析(PCA)法と比較して、小さな角スケール($\ell > 300$)で予測分散が1桁以上減少し、小さな動径スケール($k_{\parallel} > 0.17\ \rm h\ Mpc^{-1}$)で精度が向上することに相当する。ネットワークの予測に対する事後信頼区間を、UNetのアンサンブルを訓練することにより推定する。提案手法は,シミュレーションの前景モデルが十分現実的である限り,今後の電波観測実験において、導出された要約統計ではなく21cmの強度マップそのものを解析できる可能性を示す。我々は、この分析に使用したコードを Github https://github.com/tlmakinen/deep21 で、実験とUNetモデルのブラウザベースのチュートリアルを http://bit.ly/deep21-colab のColabノートブックで提供する。 We seek to remove foreground contaminants from 21cm intensity mapping observations. We demonstrate that a deep convolutional neural network (CNN) with a UNet architecture and three-dimensional convolutions, trained on simulated observations, can effectively separate frequency and spatial patterns of the cosmic neutral hydrogen (HI) signal from foregrounds in the presence of noise. Cleaned maps recover cosmological clustering statistics within 10% at all relevant angular scales and frequencies. This amounts to a reduction in prediction variance of over an order of magnitude on small angular scales ($\ell > 300$), and improved accuracy for small radial scales ($k_{\parallel} > 0.17\ \rm h\ Mpc^{-1}$) compared to standard Principal Component Analysis (PCA) methods. We estimate posterior confidence intervals for the network's prediction by training an ensemble of UNets. Our approach demonstrates the feasibility of analyzing 21cm intensity maps, as opposed to derived summary statistics, for upcoming radio experiments, as long as the simulated foreground model is sufficiently realistic. 
We provide the code used for this analysis on Github https://github.com/tlmakinen/deep21 as well as a browser-based tutorial for the experiment and UNet model via the accompanying http://bit.ly/deep21-colab Colab notebook.	翻訳日:2022-10-01 23:46:45 公開日:2021-06-01
# データ拡張を用いた低リソース表現型音声合成 Low-resource expressive text-to-speech using data augmentation ( http://arxiv.org/abs/2011.05707v2 ) ライセンス: Link先を確認	Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba	(参考訳) 最近のニューラルtext-to-speech (TTS)システムは非常によく機能するが、通常、目的とする話者が所望の発話スタイルで読み上げたかなりの量の録音を必要とする。本研究では,大量のターゲットデータを録音するというコストのかかる作業を回避し、わずか15分のそのような録音から表現豊かなスタイルの音声を構築するための,新しい3段階の手法を提案する。まず、他の話者による所望の発話スタイルの録音を利用して、音声変換によるデータ拡張を行う。次に、利用可能な録音に加えてその合成データを使って、TTSモデルをトレーニングする。最後に、このモデルを微調整して、さらに品質を高める。評価の結果,提案した変更は,合成音声の知覚的な多くの側面において,非拡張モデルに対して大きな改善をもたらすことが示された。提案手法を2つのスタイル(ニュースキャスターと会話型)、様々な話者、および単一話者モデルとマルチ話者モデルの両方で実証し、我々のアプローチの堅牢性を示す。 While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style. In this work, we present a novel 3-step methodology to circumvent the costly operation of recording large amounts of target data in order to build expressive style voices with as little as 15 minutes of such recordings. First, we augment data via voice conversion by leveraging recordings in the desired speaking style from other speakers. Next, we use that synthetic data on top of the available recordings to train a TTS model. Finally, we fine-tune that model to further increase quality. Our evaluations show that the proposed changes bring significant improvements over non-augmented models across many perceived aspects of synthesised speech. We demonstrate the proposed approach on 2 styles (newscaster and conversational), on various speakers, and on both single and multi-speaker models, illustrating the robustness of our approach.	翻訳日:2022-09-27 00:42:40 公開日:2021-06-01
# 再構成可能なインテリジェントサーフェスによるフェデレーション学習の実現 - 統一的なコミュニケーション・ラーニング設計アプローチ Reconfigurable Intelligent Surface Enabled Federated Learning: A Unified Communication-Learning Design Approach ( http://arxiv.org/abs/2011.10282v4 ) ライセンス: Link先を確認	Hang Liu, Xiaojun Yuan, Ying-Jun Angela Zhang	(参考訳) モバイルエッジネットワークで生成された大量のデータを活用するために、集中型機械学習(ML)の魅力的な代替手段として、フェデレートラーニング(FL)が提案されている。エッジデバイスで共有学習モデルを協調的にトレーニングすることにより、FLは直接データ伝送を避け、集中型MLと比較して通信遅延とプライバシーの問題を克服する。 flモデルアグリゲーションにおける通信効率を向上させるため、無線チャネル固有の重ね合わせ特性を利用して、多数の同時ローカルモデルアップロードをサポートするover-the-air計算が導入された。しかし、エッジデバイス間の通信容量の不均一性により、最弱チャネルのデバイスがモデル集約性能のボトルネックとなるストラグラー問題に悩まされる。この問題はデバイスの選択によってある程度緩和できるが、後者は依然としてデータ搾取とモデル通信のトレードオフに苦しんでいる。本稿では、再構成可能なインテリジェントサーフェス(RIS)技術を活用し、オーバーザエアFLにおけるストラグラー問題を解消する。具体的には,デバイス選択とモデル集約誤差が空中flの収束に与える影響を定量的に特徴付ける学習分析フレームワークを開発した。そして,統合通信学習最適化問題を定式化し,デバイス選択,無線トランスシーバ設計,RIS構成を共同で最適化する。数値実験により、特にエッジデバイス間でチャネル条件が劇的に変化する場合において、提案手法は最先端手法に比べて学習精度が大幅に向上することが示された。 To exploit massive amounts of data generated at mobile edge networks, federated learning (FL) has been proposed as an attractive substitute for centralized machine learning (ML). By collaboratively training a shared learning model at edge devices, FL avoids direct data transmission and thus overcomes high communication latency and privacy issues as compared to centralized ML. To improve the communication efficiency in FL model aggregation, over-the-air computation has been introduced to support a large number of simultaneous local model uploading by exploiting the inherent superposition property of wireless channels. However, due to the heterogeneity of communication capacities among edge devices, over-the-air FL suffers from the straggler issue in which the device with the weakest channel acts as a bottleneck of the model aggregation performance. This issue can be alleviated by device selection to some extent, but the latter still suffers from a tradeoff between data exploitation and model communication. 
In this paper, we leverage the reconfigurable intelligent surface (RIS) technology to relieve the straggler issue in over-the-air FL. Specifically, we develop a learning analysis framework to quantitatively characterize the impact of device selection and model aggregation error on the convergence of over-the-air FL. Then, we formulate a unified communication-learning optimization problem to jointly optimize device selection, over-the-air transceiver design, and RIS configuration. Numerical experiments show that the proposed design achieves substantial learning accuracy improvement compared with the state-of-the-art approaches, especially when channel conditions vary dramatically across edge devices.	翻訳日:2022-09-23 06:54:47 公開日:2021-06-01
# convtransformer:ビデオフレーム合成のための畳み込みトランスフォーマネットワーク ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis ( http://arxiv.org/abs/2011.10185v2 ) ライセンス: Link先を確認	Zhouyong Liu, Shun Luo, Wubin Li, Jingben Lu, Yufan Wu, Shilei Sun, Chunguo Li, Luxi Yang	(参考訳) 深層畳み込みニューラルネットワーク(Deep Convolutional Neural Networks, CNN)は、難しいコンピュータビジョンタスクにおいて優れたパフォーマンスを達成する強力なモデルである。 CNNは、大きなラベル付きトレーニングサンプルが利用可能であればいつでもうまく機能するが、オブジェクトの変形や移動、シーンの照明変更、ビデオシーケンスで動くカメラなどにより、ビデオフレームの合成に悪影響を及ぼす。本稿では、ビデオフレームシーケンス学習とビデオフレーム合成のための、畳み込み変換器(Conv Transformer)と呼ばれる、新規で汎用的なエンドツーエンドアーキテクチャを提案する。 convtransformerの中核となる要素は、ビデオシーケンスの逐次依存性を学習するマルチヘッド畳み込み層(multi-head convolutional self-attention layer)である。 ConvTransformerは、マルチヘッドの畳み込み自己保持層上に構築されたエンコーダを使用して、入力フレーム間のシーケンシャルな依存を符号化し、デコーダはターゲットの合成フレームと入力フレーム間の長期的依存を復号する。ビデオフレーム外挿タスクの実験では、ConvTransformerは高品質でありながら、畳み込みLSTM(ConvLSTM)上に構築された最近のアプローチよりも並列化可能である。我々の知る限りでは、ConvTransformerアーキテクチャが提案され、ビデオフレーム合成に適用されたのはこれが初めてである。 Deep Convolutional Neural Networks (CNNs) are powerful models that have achieved excellent performance on difficult computer vision tasks. Although CNNs perform well whenever large labeled training samples are available, they work badly on video frame synthesis due to objects deforming and moving, scene lighting changes, and cameras moving in video sequence. In this paper, we present a novel and general end-to-end architecture, called convolutional Transformer or ConvTransformer, for video frame sequence learning and video frame synthesis. The core ingredient of ConvTransformer is the proposed attention layer, i.e., multi-head convolutional self-attention layer, that learns the sequential dependence of video sequence. ConvTransformer uses an encoder, built upon multi-head convolutional self-attention layer, to encode the sequential dependence between the input frames, and then a decoder decodes the long-term dependence between the target synthesized frames and the input frames. 
Experiments on the video future frame extrapolation task show ConvTransformer to be superior in quality while being more parallelizable than recent approaches built upon convolutional LSTM (ConvLSTM). To the best of our knowledge, this is the first time that the ConvTransformer architecture has been proposed and applied to video frame synthesis.	翻訳日:2022-09-23 05:58:58 公開日:2021-06-01
# GLGE: 新しい汎用言語生成評価ベンチマーク GLGE: A New General Language Generation Evaluation Benchmark ( http://arxiv.org/abs/2011.11928v3 ) ライセンス: Link先を確認	Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan	(参考訳) GLUEやSuperGLUEのようなマルチタスクベンチマークは、自然言語処理(NLP)における事前学習と転送学習の大きな進歩を導いている。これらのベンチマークは主に自然言語生成(NLG)モデルを考慮せずに、さまざまな自然言語理解(NLU)タスクに焦点を当てている。本稿では,8つの言語生成タスクにわたるNLGモデルの一般化能力を評価するための,新しいマルチタスクベンチマークであるジェネラル言語生成評価(GLGE)を提案する。各タスクに対して,タスク難易度(GLGE-Easy, GLGE-Medium, GLGE-Hard)の3つのサブタスクを引き続き設計する。これにより、モデルパフォーマンスを包括的に比較する24のサブタスクが導入される。 NLGモデルの事前トレーニングと転送学習の研究を促進するため、GLGEを公開し、MASS、BART、ProphetNetなどの強力なベースラインを持つリーダボードを構築する(ソースコードとデータセットは https://github.com/microsoft/glge で公開されている)。 Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering the Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we continue to design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard). This introduces 24 subtasks to comprehensively compare model performance. To encourage research on pretraining and transfer learning on NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet (The source code and dataset are publicly available at https://github.com/microsoft/glge).	翻訳日:2022-09-21 13:00:55 公開日:2021-06-01
# (参考訳) 統合グラディエントを用いたBERTが学習した言語学的受容性 Using Integrated Gradients to explain Linguistic Acceptability learnt by BERT ( http://arxiv.org/abs/2106.07349v1 ) ライセンス: CC BY 4.0	Anmol Nayak, Hari Prasad Timmapathini	(参考訳) BERTは、そのアーキテクチャにおけるマルチヘッド自己認識メカニズムを活用することで、言語理解のブレークスルーとなっている。我々の知る限りでは、この研究は初めてLayer Integrated Gradients Attribution Scores (LIGAS)を活用して、BERTがCorpus of Linguistic Acceptability (CoLA)ベンチマークデータセットで学んだ言語受容性基準を説明する。 BERT has been a breakthrough in language understanding by leveraging the multi-head self-attention mechanism in its architecture. To the best of our knowledge this work is the first to leverage Layer Integrated Gradients Attribution Scores (LIGAS) to explain the Linguistic Acceptability criteria that are learnt by BERT on the Corpus of Linguistic Acceptability (CoLA) benchmark dataset. 
Our experiments on 5 different categories of sentences lead to the following interesting findings: 1) LIGAS for Linguistically Acceptable (LA) sentences are significantly smaller in comparison to Linguistically Unacceptable (LUA) sentences, 2) There are specific subtrees of the Constituency Parse Tree (CPT) for LA and LUA sentences which contribute larger LIGAS, 3) Across the different categories of sentences we observed around 88% to 100% of the Correctly classified sentences had positive LIGAS, indicating a strong positive relationship to the prediction confidence of the model, and 4) Around 57% of the Misclassified sentences had positive LIGAS, which we believe can become correctly classified sentences if the LIGAS are parameterized in the loss function of the model.	翻訳日:2021-06-27 12:55:21 公開日:2021-06-01
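As a rough illustration of the attribution technique named in the BERT abstract, the sketch below implements plain Integrated Gradients on a tiny hand-written function, using a midpoint Riemann sum along the straight path from a baseline to the input. It is not the layer-wise variant applied to BERT in the paper, and the toy model, baseline, and step count are all assumptions chosen so that the result can be checked by hand.

```python
def f(x):
    """Toy differentiable model: f(x) = x0**2 + 3*x1 (illustrative only)."""
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    """Analytic gradient of the toy model."""
    return [2.0 * x[0], 3.0]


def integrated_gradients(x, baseline, steps=100):
    """Approximate IG_i = (x_i - b_i) * integral_0^1 df/dx_i(b + a*(x - b)) da."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        a = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [2.0, -1.0]
baseline = [0.0, 0.0]
attr = integrated_gradients(x, baseline)
print(attr, sum(attr), f(x) - f(baseline))
```

The attributions sum to the difference between the model output at the input and at the baseline (the completeness property), which is why the sign of an attribution score can be related to the model's prediction.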
# (参考訳) 視線追跡タスクのためのデータセット Dataset for eye-tracking tasks ( http://arxiv.org/abs/2106.07554v1 ) ライセンス: CC0 1.0	R. Ildar	(参考訳) 近年、さまざまなディープニューラルネットワークが開発されているが、ディープネットワークは層の数が多いため、トレーニングには長い時間と大規模なデータセットが必要になる。今日では、このようなディープネットワークを必要としない単純なタスクであっても、訓練済みのディープニューラルネットワークをさまざまなタスクに使用することが一般的である。 YoloV3やSSDなどのよく知られたディープネットワークは、様々な物体の追跡と監視を目的としているため、重みのサイズが大きく、特定のタスクに対する全体的な精度は低い。視線追跡タスクでは、与えられた領域内の1つの物体、すなわち虹彩のみを検出すればよい。したがって、このタスク専用のニューラルネットワークを使用するのが合理的である。しかし問題は、モデルをトレーニングするのに適切なデータセットがないことだ。本稿では,視線追跡タスクのための畳み込みニューラルネットワークのカスタムモデルのトレーニングに適したデータセットを提示する。このデータセットを使用することで、各ユーザは独立して、視線追跡タスクのための畳み込みニューラルネットワークモデルを事前トレーニングすることができる。このデータセットには、416×416ピクセルの解像度の注釈付きの目の画像が1万枚含まれている。注釈情報の表は、各画像における目の座標と半径を示している。本稿は、視線追跡装置のためのデータセット作成のガイドとみなすことができる。 In recent years many different deep neural networks have been developed, but due to the large number of layers in deep networks, their training requires a long time and large datasets. Today it is popular to use trained deep neural networks for various tasks, even for simple ones in which such deep networks are not required. Well-known deep networks such as YoloV3, SSD, etc. are intended for tracking and monitoring various objects; therefore, their weights are heavy and the overall accuracy for a specific task is low. Eye-tracking tasks need to detect only one object, an iris in a given area. Therefore, it is logical to use a neural network dedicated to this task. But the problem is the lack of suitable datasets for training the model. In this manuscript, we present a dataset that is suitable for training custom convolutional neural network models for eye-tracking tasks. Using this dataset, each user can independently pre-train convolutional neural network models for eye-tracking tasks. The dataset contains 10,000 annotated eye images at a resolution of 416 by 416 pixels. The table with annotation information gives the coordinates and radius of the eye for each image. 
This manuscript can be considered a guide for the preparation of datasets for eye-tracking devices.	翻訳日:2021-06-27 12:51:04 公開日:2021-06-01
# クリックベイトですか? 機械学習を使って予測しよう Is it a click bait? Let's predict using Machine Learning ( http://arxiv.org/abs/2106.07348v1 ) ライセンス: Link先を確認	Sohom Ghosh	(参考訳) このデジタル化の時代、ニュース読者はオンラインでニュースを読む傾向にある。これは、オンラインメディアが幅広いコンテンツへのアクセスを即座に提供するからである。そのため、今日の出来事を知るために明日の新聞を待つ必要がない。こうした長所に加えて、オンラインニュースにはいくつかの短所もある。その1つが、実際のコンテンツを読むよう促すのではなく、ユーザーの注意を引くことだけを目的とした、ニュース記事に関するソーシャルメディア投稿(ツイート)の存在である。このような投稿をクリックベイトと呼ぶ。このプロジェクトの目的は、ニュース記事に関連するソーシャルメディア投稿(ツイート)がクリックベイトである可能性の高さを予測できるシステムを開発することである。 In this era of digitisation, news readers tend to read news online. This is because online media instantly provides access to a wide variety of content. Thus, people don't have to wait for tomorrow's newspaper to know what's happening today. Along with these virtues, online news has some vices as well. One such vice is the presence of social media posts (tweets) relating to news articles whose sole purpose is to draw the attention of users rather than directing them to read the actual content. Such posts are referred to as clickbaits. The objective of this project is to develop a system capable of predicting how likely social media posts (tweets) relating to news articles are to be clickbait.	翻訳日:2021-06-27 09:02:08 公開日:2021-06-01
# ネステロフ型加速法の高速シンプレクティック積分器 Fast symplectic integrator for Nesterov-type acceleration method ( http://arxiv.org/abs/2106.07620v1 ) ライセンス: Link先を確認	Shin-itiro Goto and Hideitsu Hino	(参考訳) 本論文では,Nesterovの加速勾配法の収束率の改善に関連して現れる非自律常微分方程式(ODE)に対して,シンプレクティック幾何学および接触幾何学に基づく陽的で安定な積分器を提案する。シンプレクティック幾何学はハミルトン力学の記述に適していることが知られており、接触幾何学はシンプレクティック幾何学の奇数次元対応として知られている。さらに、シンプレクティック化と呼ばれる手続きは接触多様体からシンプレクティック多様体を構築する既知の方法であり、接触系からハミルトン系を与える。本論文では、以前に研究された非自律ODEが接触ハミルトン系として記述できることを示す。そして、この非自律ODEを表す非自律接触ハミルトンベクトル場のシンプレクティック化により、新しいシンプレクティック積分器が導出される。提案したシンプレクティック積分器はODE内に隠されたシンプレクティック構造と接触構造を保存するため、ルンゲ・クッタ法よりも安定であると期待される。数値実験により, 期待どおり2次のシンプレクティック積分器は安定であり, 高い収束率が得られることが示された。 In this paper, explicit stable integrators based on symplectic and contact geometries are proposed for a non-autonomous ordinary differential equation (ODE) found in improving the convergence rate of Nesterov's accelerated gradient method. Symplectic geometry is known to be suitable for describing Hamiltonian mechanics, and contact geometry is known as an odd-dimensional counterpart of symplectic geometry. Moreover, a procedure, called symplectization, is a known way to construct a symplectic manifold from a contact manifold, yielding Hamiltonian systems from contact ones. It is found in this paper that a previously investigated non-autonomous ODE can be written as a contact Hamiltonian system. Then, by symplectization of a non-autonomous contact Hamiltonian vector field expressing the non-autonomous ODE, novel symplectic integrators are derived. Because the proposed symplectic integrators preserve hidden symplectic and contact structures in the ODE, they should be more stable than the Runge-Kutta method. Numerical experiments demonstrate that, as expected, the second-order symplectic integrator is stable and high convergence rates are achieved.	翻訳日:2021-06-27 09:01:59 公開日:2021-06-01
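The stability argument in the symplectic-integrator abstract (structure-preserving schemes behave better than generic ones) can be seen on a much simpler system than the one studied in the paper. The sketch below is only an analogy under stated assumptions: it uses an autonomous harmonic oscillator with Hamiltonian H = (q^2 + p^2)/2 and the standard second-order leapfrog (Störmer-Verlet) scheme, not the non-autonomous contact Hamiltonian system or the integrators derived by the authors. Step size and step count are arbitrary illustrative values.

```python
def energy(q, p):
    """Hamiltonian of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def explicit_euler(q, p, h, steps):
    """First-order, non-symplectic baseline: both updates use old values."""
    for _ in range(steps):
        q, p = q + h * p, p - h * q
    return q, p


def leapfrog(q, p, h, steps):
    """Second-order symplectic scheme (kick-drift-kick)."""
    for _ in range(steps):
        p -= 0.5 * h * q  # half kick with force -q
        q += h * p        # full drift
        p -= 0.5 * h * q  # half kick with updated position
    return q, p


h, steps = 0.1, 1000
e_euler = energy(*explicit_euler(1.0, 0.0, h, steps))
e_leap = energy(*leapfrog(1.0, 0.0, h, steps))
print(e_euler, e_leap)
```

With the same step size, the explicit Euler energy grows by orders of magnitude while the leapfrog energy stays close to its initial value of 0.5, which is the qualitative behaviour that motivates structure-preserving discretizations.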
# (参考訳) FiSH: 空間ホットスポット FiSH: Fair Spatial Hotspots ( http://arxiv.org/abs/2106.06049v1 ) ライセンス: CC BY 4.0	Deepak P, Sowmya S Sundaram	(参考訳) 追跡デバイスの普及と空間的位置データの可用性の強化は、空間ホットスポット検出などの計算データ解析タスクを通じて、様々な政策介入に利用することへの関心を深めている。本稿では,空間的ホットスポットの検出における公平性について,我々の最善の知識から初めて考察する。我々は、選択したホットスポットにまたがる集団人口に対する統計的公平性を通じて公正性を確保する必要性を動機付けている。次に,注意と公正のトレードオフスペクトルにおいて,多様なソリューションセットを識別するタスクを特徴付け,ポリシー領域によって正当化されたトレードオフを選択する権限を与える。新たなタスクの定式化として,タスクの関連する側面を評価する必要性を動機とする,公正なホットスポット評価指標のスイートも開発した。本研究は, 単純かつ直接的アプローチによる公平なホットスポットの同定と, 高品質で公平で多様な空間的ホットスポットを効率的に同定するためのコードネーム {\it fish} の考案による計算不可能性を示す。 FiSHは、空間ホットスポットの有効かつ公平な集合を特定するためのヒューリスティックスを用いて、木構造検索空間を横断する。人間の開発領域から得られた実世界のデータセットに対する広範な実証分析を通じて、高速な応答時間で高品質なソリューションが生成されることを示す。 Pervasiveness of tracking devices and enhanced availability of spatially located data has deepened interest in using them for various policy interventions, through computational data analysis tasks such as spatial hot spot detection. In this paper, we consider, for the first time to our best knowledge, fairness in detecting spatial hot spots. We motivate the need for ensuring fairness through statistical parity over the collective population covered across chosen hot spots. We then characterize the task of identifying a diverse set of solutions in the noteworthiness-fairness trade-off spectrum, to empower the user to choose a trade-off justified by the policy domain. Being a novel task formulation, we also develop a suite of evaluation metrics for fair hot spots, motivated by the need to evaluate pertinent aspects of the task. We illustrate the computational infeasibility of identifying fair hot spots using naive and/or direct approaches and devise a method, codenamed {\it FiSH}, for efficiently identifying high-quality, fair and diverse sets of spatial hot spots. FiSH traverses the tree-structured search space using heuristics that guide it towards identifying effective and fair sets of spatial hot spots. 
Through an extensive empirical analysis over a real-world dataset from the domain of human development, we illustrate that FiSH generates high-quality solutions at fast response times.	翻訳日:2021-06-20 21:42:52 公開日:2021-06-01
# N-Gauss活性化関数に基づくニューラルネットワーク構造設計 Neural Network Structure Design based on N-Gauss Activation Function ( http://arxiv.org/abs/2106.07562v1 ) ライセンス: Link先を確認	Xiangri Lu, Hongbin Ma, Jingcheng Zhang	(参考訳) 近年の研究では、畳み込みニューラルネットワークの活性化関数がリプシッツ条件を満たし、それに対応する畳み込みニューラルネットワーク構造をデータセットの規模に応じて構築することができ、データセットをより深く、より正確に、より効果的に訓練できることを示した。本稿では,実験結果を受け入れ,コアブロックN-Gauss,N-Gauss,Swish(Conv1,Conv2,FC1)のニューラルネットワーク構造設計を導入し,それぞれMNIST,CIFAR10,CIFAR100を訓練した。実験により、n-gaussはアクティベーション関数の非線形モデリングの主要な役割を果たすことが示され、ディープ畳み込みニューラルネットワークは階層的非線形マッピング学習能力を持つ。同時に、単純な1次元チャネル小データセット上のN-Gaussのトレーニング能力は、ReLUとSwishの性能と同等である。 Recent work has shown that the activation function of the convolutional neural network can meet the Lipschitz condition, then the corresponding convolutional neural network structure can be constructed according to the scale of the data set, and the data set can be trained more deeply, more accurately and more effectively. In this article, we have accepted the experimental results and introduced the core block N-Gauss, N-Gauss, and Swish (Conv1, Conv2, FC1) neural network structure design to train MNIST, CIFAR10, and CIFAR100 respectively. Experiments show that N-Gauss gives full play to the main role of nonlinear modeling of activation functions, so that deep convolutional neural networks have hierarchical nonlinear mapping learning capabilities. At the same time, the training ability of N-Gauss on simple one-dimensional channel small data sets is equivalent to the performance of ReLU and Swish.	翻訳日:2021-06-20 16:05:00 公開日:2021-06-01
# THG: 双曲幾何を用いたTransformer THG: Transformer with Hyperbolic Geometry ( http://arxiv.org/abs/2106.07350v1 ) ライセンス: Link先を確認	Zhe Liu and Yibin Xu	(参考訳) Transformerモデルアーキテクチャは、さまざまなタスクにわたる有効性から、近年のディープラーニングに欠かせない要素となっている。最近では、オリジナルのTransformerアーキテクチャを改良した「X-former」モデルが数多く提案されている。しかし、これらの変種のほとんどは、自己注意の2次の時間・メモリ計算量、すなわちクエリとキーのドット積に関する変更にとどまっている。さらに、それらはユークリッド空間のみで計算される。本研究では, ユークリッド空間と双曲空間の両方の利点を生かした, 双曲幾何を用いた新しいTransformer(THG)モデルを提案する。 THGは、クエリとキーを得るために入力シーケンスに適用される自己注意の線形変換を、提案する双曲線形変換によって改良する。シーケンスラベリング,機械読解,分類の各タスクに関する広範な実験により,本モデルの有効性と汎用性が示された。また、THGが過学習を緩和できることも示された。 Transformer model architectures have become an indispensable staple in deep learning lately for their effectiveness across a range of tasks. Recently, a surge of "X-former" models have been proposed which improve upon the original Transformer architecture. However, most of these variants make changes only around the quadratic time and memory complexity of self-attention, i.e. the dot product between the query and the key. What's more, they are calculated solely in Euclidean space. In this work, we propose a novel Transformer with Hyperbolic Geometry (THG) model, which takes advantage of both Euclidean space and hyperbolic space. THG makes improvements in the linear transformations of self-attention, which are applied to the input sequence to get the query and the key, with the proposed hyperbolic linear transformation. Extensive experiments on sequence labeling, machine reading comprehension, and classification tasks demonstrate the effectiveness and generalizability of our model. They also demonstrate that THG could alleviate overfitting.	翻訳日:2021-06-20 16:04:44 公開日:2021-06-01
# (参考訳) AI対応型6G O-RANのネットワーク・物理層攻撃と対策 Network and Physical Layer Attacks and countermeasures to AI-Enabled 6G O-RAN ( http://arxiv.org/abs/2106.02494v1 ) ライセンス: CC BY-SA 4.0	Talha F. Rahman, Aly S. Abdalla, Keith Powell, Walaa AlQwider, and Vuk Marojevic	(参考訳) 人工知能(AI)は、細胞ネットワークの展開、構成、管理において、ますます大きな役割を果たすだろう。本稿では,AI駆動型6G無線アクセスネットワーク(RAN)のセキュリティへの影響について検討する。 6G標準化の予定時期はまだ数年先だが、6Gセキュリティに関する事前標準化作業はすでに進行中であり、基礎的および実験的研究の恩恵を受けるだろう。 Open RAN(O-RAN)は、AIコントロールを備えた次世代RANを構築するための、業界主導のオープンアーキテクチャとインターフェースを記述する。このアーキテクチャを考慮すると、データ駆動ネットワークおよび物理層要素に対する重要な脅威、対応する対策、研究の方向性を識別する。 Artificial intelligence (AI) will play an increasing role in cellular network deployment, configuration and management. This paper examines the security implications of AI-driven 6G radio access networks (RANs). While the expected timeline for 6G standardization is still several years out, pre-standardization efforts related to 6G security are already ongoing and will benefit from fundamental and experimental research. The Open RAN (O-RAN) describes an industry-driven open architecture and interfaces for building next generation RANs with AI control. Considering this architecture, we identify the critical threats to data driven network and physical layer elements, the corresponding countermeasures, and the research directions.	翻訳日:2021-06-15 13:46:37 公開日:2021-06-01
# (参考訳) AMV : アルゴリズムメタデータ語彙 AMV : Algorithm Metadata Vocabulary ( http://arxiv.org/abs/2106.03567v1 ) ライセンス: CC BY 4.0	Biswanath Dutta and Jyotima Patel	(参考訳) メタデータ語彙は様々な研究領域で使用される。リソースの詳細な説明を提供する。本研究では,アルゴリズムに関するメタデータ(特にコンピュータによる)を段階的に取得し,保存する語彙であるアルゴリズムメタデータ語彙(AMV)を開発する。研究者が現在直面している問題は、任意の検索エンジンでアルゴリズムを検索する際に、関連性のある結果を得ることができないことだ。 AMVはセマンティックモデルとして表現され、OWLファイルを生成する。これは、アルゴリズムメタデータを知識グラフとして作成、公開したり、SPARQLエンドポイントを通じてメタデータサービスを提供することに関心のある人なら誰でも利用できる。語彙をデザインするために,アルゴリズムユーザと実践者が直面する実際の問題を考えるための,明確に定義された手法を提案する。評価は有望な結果を示す。 Metadata vocabularies are used in various domains of study. It provides an in-depth description of the resources. In this work, we develop Algorithm Metadata Vocabulary (AMV), a vocabulary for capturing and storing the metadata about the algorithms (a procedure or a set of rules that is followed step-by-step to solve a problem, especially by a computer). The snag faced by the researchers in the current time is the failure of getting relevant results when searching for algorithms in any search engine. AMV is represented as a semantic model and produced OWL file, which can be directly used by anyone interested to create and publish algorithm metadata as a knowledge graph, or to provide metadata service through SPARQL endpoint. To design the vocabulary, we propose a well-defined methodology, which considers real issues faced by the algorithm users and the practitioners. The evaluation shows a promising result.	翻訳日:2021-06-15 13:34:38 公開日:2021-06-01
# 1次元畳み込みニューラルネットワークを用いた混合系のラマンスペクトル解析 Raman spectral analysis of mixtures with one-dimensional convolutional neural network ( http://arxiv.org/abs/2106.05316v1 ) ライセンス: Link先を確認	M. Hamed Mozaffari and Li-Lin Tay	(参考訳) 近年,ロバストな1次元畳み込みニューラルネットワーク (1-d cnns) とラマン分光法の組み合わせにより,未知の物質の迅速同定が高精度に行えることが期待されている。この技術を使って、研究者は純粋な化合物を認識し、混合物中の未知の物質と区別することができる。このアプローチの新規性は、トレーニングされたニューラルネットワークがデータの事前処理や後処理なしに自動的に動作することだ。この手法を未知の混合物中の純粋な化合物の分類にまで拡張しようとする研究もある。しかし、1次元cnnの適用は通常、純粋な化合物のバイナリ分類に制限されている。ここでは、多成分混合物中の化学成分のスペクトル認識と定量化における新しいアプローチを紹介する。この目的のために2つの1次元CNNモデルRaMixNet IとIIが開発された。前者は混合物中の成分の迅速な分類、後者はそれらの成分の定量化を目的としている。提案手法では, 混合物中の化合物の数に制限はない。また、ラマンスペクトルにランダムベースラインを追加することにより、データ拡張手法も導入する。実験の結果,RaMixNet I と II の分類精度は未知の試験混合物の分析では100%であり,RMixNet II モデルでは各成分の定量化では88%の回帰精度が得られることがわかった。 Recently, the combination of robust one-dimensional convolutional neural networks (1-D CNNs) and Raman spectroscopy has shown great promise in rapid identification of unknown substances with good accuracy. Using this technique, researchers can recognize a pure compound and distinguish it from unknown substances in a mixture. The novelty of this approach is that the trained neural network operates automatically without any pre- or post-processing of data. Some studies have attempted to extend this technique to the classification of pure compounds in an unknown mixture. However, the application of 1-D CNNs has typically been restricted to binary classifications of pure compounds. Here we will highlight a new approach in spectral recognition and quantification of chemical components in a multicomponent mixture. Two 1-D CNN models, RaMixNet I and II, have been developed for this purpose. The former is for rapid classification of components in a mixture while the latter is for quantitative determination of those constituents. In the proposed method, there is no limit to the number of compounds in a mixture. A data augmentation method is also introduced by adding random baselines to the Raman spectra. 
The experimental results revealed that the classification accuracy of RaMixNet I and II is 100% for analysis of unknown test mixtures; at the same time, the RaMixNet II model may achieve a regression accuracy of 88% for the quantification of each component.	翻訳日:2021-06-13 14:02:43 公開日:2021-06-01
# 歩行者軌道符号化のための非対称Bi-RNN Asymmetrical Bi-RNN for pedestrian trajectory encoding ( http://arxiv.org/abs/2106.04419v1 ) ライセンス: Link先を確認	Raphaël Rozenberg, Joseph Gesnouin and Fabien Moutarde	(参考訳) 歩行者の運動行動は、個々の目標と他のエージェントとの社会的相互作用の組み合わせを含む。本稿では,U-RNNと呼ばれる非対称な双方向リカレントニューラルネットワークアーキテクチャをシーケンスエンコーダとして提案し、様々な予測モデルにおいてLSTMを置き換える妥当性を評価する。 Trajnet++ベンチマークの実験結果によると、U-LSTMの変種は、様々なアプローチと相互作用モジュールにおいて、一般的なLSTMシーケンスエンコーダよりも、利用可能なすべてのメトリック(ADE、FDE、衝突率)についてより良い結果が得られる。 Trajnet++ベンチマークのための非対称Bi-RNNの実装は、github.com/JosephGesnouin/Asymmetrical-Bi-RNNs-to-encode-pedestrian-trajectoriesで利用可能である。 Pedestrian motion behavior involves a combination of individual goals and social interactions with other agents. In this article, we present a non-symmetrical bidirectional recurrent neural network architecture called U-RNN as a sequence encoder and evaluate its relevance to replace LSTMs for various forecasting models. Experimental results on the Trajnet++ benchmark show that the U-LSTM variant can yield better results regarding every available metric (ADE, FDE, Collision rate) than common LSTM sequence encoders for a variety of approaches and interaction modules. Our implementation of the asymmetrical Bi-RNNs for the Trajnet++ benchmark is available at: github.com/JosephGesnouin/Asymmetrical-Bi-RNNs-to-encode-pedestrian-trajectories	翻訳日:2021-06-13 14:02:22 公開日:2021-06-01
# レーダースペクトルを用いた深層学習対象分類の不確かさの検討 Investigation of Uncertainty of Deep Learning-based Object Classification on Radar Spectra ( http://arxiv.org/abs/2106.05870v1 ) ライセンス: Link先を確認	Kanil Patel, William Beluch, Kilian Rambach, Adriana-Eliza Cozma, Michael Pfeiffer and Bin Yang	(参考訳) 近年,深層学習(DL)は自動車用レーダの物体種別分類を改善する手法として関心を集めている。高い精度に加えて,自動運転車の意思決定には予測の信頼性を評価することが重要であるが,DLネットワークの判断は不透明である。近年のDL研究は,予測の不確かさをどのように定量化できるかを検討しており,本稿では,安全な自動車レーダ認識に対するこれらの手法の可能性を評価する。特に,(1)領域シフト,(2)入力信号の破損,(3)未知物体の存在下で,不確実性定量化がレーダ認識をどのように支援できるかを評価する。文献に見られる現象と一致して、深層レーダ分類器は間違った予測であっても過度に自信を持っていることがわかった。モデルが未知の状況に対処できない場合にそれを通知できないため、不確実性下での意思決定に信頼値を使用することに対する懸念が生じる。正確な信頼値は、例えばセンサフュージョンによる複数の情報源の最適な統合を可能にする。本研究では, 最先端のポストホック不確実性校正を適用することにより, 信頼度指標の質を大幅に改善でき, 過信の問題を部分的に解消できることを示す。本研究は、DLネットワークのトレーニングと校正に関するさらなる研究が必要であり、それがレーダセンサを用いた安全な自動車物体分類に大きな可能性をもたらすことを示している。 Deep learning (DL) has recently attracted increasing interest to improve object type classification for automotive radar. In addition to high accuracy, it is crucial for decision making in autonomous vehicles to evaluate the reliability of the predictions; however, decisions of DL networks are non-transparent. Current DL research has investigated how uncertainties of predictions can be quantified, and in this article, we evaluate the potential of these methods for safe, automotive radar perception. In particular we evaluate how uncertainty quantification can support radar perception under (1) domain shift, (2) corruptions of input signals, and (3) in the presence of unknown objects. We find that in agreement with phenomena observed in the literature, deep radar classifiers are overly confident, even in their wrong predictions. This raises concerns about the use of the confidence values for decision making under uncertainty, as the model fails to notify when it cannot handle an unknown situation. Accurate confidence values would allow optimal integration of multiple information sources, e.g. via sensor fusion. We show that by applying state-of-the-art post-hoc uncertainty calibration, the quality of confidence measures can be significantly improved, thereby partially resolving the over-confidence problem. Our investigation shows that further research into training and calibrating DL networks is necessary and offers great potential for safe automotive object classification with radar sensors.	翻訳日:2021-06-13 14:02:07 公開日:2021-06-01
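To make the notion of over-confidence concrete, the sketch below computes the expected calibration error (ECE), a common summary of the gap between a classifier's stated confidence and its observed accuracy. The equal-width binning, the function name, and the toy numbers are assumptions for illustration; the abstract does not say which calibration metric or post-hoc method the authors used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


outcomes = [True] * 6 + [False] * 4  # 60% of the predictions are right

# Toy over-confident model: claims 95% confidence on every prediction.
overconfident = expected_calibration_error([0.95] * 10, outcomes)
# Toy calibrated model: claims 60% confidence on every prediction.
calibrated = expected_calibration_error([0.6] * 10, outcomes)
print(overconfident, calibrated)
```

A post-hoc calibration method aims to move a model from the first situation toward the second without changing which class it predicts.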
# 条件付き生成逆ネットワークを用いたユーザ定義特性を用いたディジタルロック再構成 Digital rock reconstruction with user-defined properties using conditional generative adversarial networks ( http://arxiv.org/abs/2012.07719v2 ) ライセンス: Link先を確認	Qiang Zheng and Dongxiao Zhang	(参考訳) 不確実性は、その固有の不均一性やその場測定の欠如により、地下岩の流動に至らない。マルチスケールで不確実性解析を完了させるには,十分な岩石試料の提供が必須である。デジタルロック技術の出現は、岩を再現する機会を提供するが、高いコストのために大量のサンプルを供給できないため、多様化された数学的手法の開発につながっている。このうち2点統計(TPS)と多点統計(MPS)が一般的に利用されており、それぞれ低次統計情報と高次統計情報を取り入れている。近年,優れた視覚的・地質学的リアリズムを持つ訓練画像の再生が可能なGAN(Generative Adversarial Network)が普及している。しかし、標準のGANはデータからの情報のみを組み込むことができるが、ユーザ定義プロパティのインターフェースは残っていないため、再構成されたサンプルの表現性を制限できる。本研究では,実際のトレーニングデータに類似したサンプルを再現することを目的とした,デジタル岩盤復元のための条件付きganを提案する。実際,提案フレームワークは,岩盤画像からの高次情報を直接GANスキームに組み込むことで,MPSとTPSの目標を同時に実現し,低次情報を条件付きで保存することができる。本研究では, 3つの復元実験を行い, 岩石の種類, 岩石のポロシティ, 相関長が再現された岩石画像に影響を及ぼすことを示す。さらに,既存のGANとは対照的に,複数種類の岩の同時学習が可能であり,計算コストを不可視的に削減することができる。 Uncertainty is ubiquitous with flow in subsurface rocks because of their inherent heterogeneity and lack of in-situ measurements. To complete uncertainty analysis in a multi-scale manner, it is a prerequisite to provide sufficient rock samples. Even though the advent of digital rock technology offers opportunities to reproduce rocks, it still cannot be utilized to provide massive samples due to its high cost, thus leading to the development of diversified mathematical methods. Among them, two-point statistics (TPS) and multi-point statistics (MPS) are commonly utilized, which feature incorporating low-order and high-order statistical information, respectively. Recently, generative adversarial networks (GANs) are becoming increasingly popular since they can reproduce training images with excellent visual and consequent geologic realism. However, standard GANs can only incorporate information from data, while leaving no interface for user-defined properties, and thus may limit the representativeness of reconstructed samples. 
In this study, we propose conditional GANs for digital rock reconstruction, aiming to reproduce samples not only similar to the real training data, but also satisfying user-specified properties. In fact, the proposed framework can realize the targets of MPS and TPS simultaneously by incorporating high-order information directly from rock images with the GANs scheme, while preserving low-order counterparts through conditioning. We conduct three reconstruction experiments, and the results demonstrate that rock type, rock porosity, and correlation length can be successfully conditioned to affect the reconstructed rock images. Furthermore, in contrast to existing GANs, the proposed conditioning enables learning of multiple rock types simultaneously, and thus invisibly saves computational cost.	翻訳日:2021-06-07 09:09:07 公開日:2021-06-01
# 知識の追跡に関する調査 A Survey of Knowledge Tracing ( http://arxiv.org/abs/2105.15106v2 ) ライセンス: Link先を確認	Qi Liu, Shuanghong Shen, Zhenya Huang, Enhong Chen, and Yonghe Zheng	(参考訳) 高品質な教育は、より持続可能な世界を達成するための鍵の1つだ。新型コロナウイルスの感染拡大を受け、オンライン教育が流行し、学生も教師も家庭で学び、教えることができるようになった。一方、オンライン学習プラットフォームを使って大量の学習データを記録し、調査することが可能になり、よりインテリジェントな教育サービスを提供できるようになった。学生の進化する知識状態を監視することを目的とした知識追跡(KT)は、これらのインテリジェントサービスを支援するための基本的で重要な課題である。そのため、この新興地域には研究の注意が払われており、かなりの進歩を遂げている。本研究では,既存の基本ktモデルの新しい分類法を技術的観点から提案し,これらのモデルの包括的概要を体系的に示す。さらに、より完全な学習プロセスを捉えるために、多くのKTモデルの変種が提案されている。次に、学習過程の3つの段階(前、中、後)に関わるこれらの変種を、それぞれレビューする。さらに、異なる教育シナリオにおけるKTの典型的な応用について述べる。最後に、この急成長分野における今後の研究の方向性について述べる。 High-quality education is one of the keys to achieving a more sustainable world. The recent COVID-19 epidemic has triggered the outbreak of online education, which has enabled both students and teachers to learn and teach at home. Meanwhile, it is now possible to record and research a large amount of learning data using online learning platforms in order to offer better intelligent educational services. Knowledge Tracing (KT), which aims to monitor students' evolving knowledge state, is a fundamental and crucial task to support these intelligent services. Therefore, an increasing amount of research attention has been paid to this emerging area and considerable progress has been made. In this survey, we propose a new taxonomy of existing basic KT models from a technical perspective and provide a comprehensive overview of these models in a systematic manner. In addition, many variants of KT models have been proposed to capture more complete learning process. We then review these variants involved in three phases of the learning process: before, during, and after the student learning, respectively. Moreover, we present several typical applications of KT in different educational scenarios. Finally, we provide some potential directions for future research in this fast-growing field.	翻訳日:2021-06-06 11:07:59 公開日:2021-06-01
# (参考訳) 自律走行車の動作計画と制御のための深層強化学習アルゴリズムに関する研究 A Survey of Deep Reinforcement Learning Algorithms for Motion Planning and Control of Autonomous Vehicles ( http://arxiv.org/abs/2105.14218v2 ) ライセンス: CC BY 4.0	Fei Ye, Shen Zhang, Pin Wang, and Ching-Yao Chan	(参考訳) 本研究では,強化学習(rl)を自律走行車の運動計画と制御に適用する研究の最近の文献を体系的に要約する。多くの既存のコントリビューションは、手作りのモジュールで構成され、それぞれが人間の解釈の容易さのために選択された機能を持つパイプラインアプローチに起因している。しかし、このアプローチはシステムレベルの最適化が欠如しているため、最大性能を自動保証しない。そこで、本稿では、エンド・ツー・エンドのアプローチに陥り、パフォーマンスが向上し、システム・スケールが小さくなる傾向を示す。しかし、その性能は専門家のデータ不足や一般化の問題にも悩まされている。最後に、自動運転に深いRLアルゴリズムを適用した残りの課題を要約し、これらの課題に取り組むための今後の研究方向も提示する。 In this survey, we systematically summarize the current literature on studies that apply reinforcement learning (RL) to the motion planning and control of autonomous vehicles. Many existing contributions can be attributed to the pipeline approach, which consists of many hand-crafted modules, each with a functionality selected for the ease of human interpretation. However, this approach does not automatically guarantee maximal performance due to the lack of a system-level optimization. Therefore, this paper also presents a growing trend of work that falls into the end-to-end approach, which typically offers better performance and smaller system scales. However, their performance also suffers from the lack of expert data and generalization issues. Finally, the remaining challenges applying deep RL algorithms on autonomous driving are summarized, and future research directions are also presented to tackle these challenges.	翻訳日:2021-06-05 21:44:01 公開日:2021-06-01
# (参考訳) 複数のトークン化戦略を用いた韓国語・英語機械翻訳 Korean-English Machine Translation with Multiple Tokenization Strategy ( http://arxiv.org/abs/2105.14274v2 ) ライセンス: CC BY 4.0	Dojun Park, Youngjin Jang and Harksoo Kim	(参考訳) 本研究では,機械翻訳モデルの学習結果にトークン化手法がどう影響するかを明らかにする。本研究では, 原言語である韓国語と対象言語である英語のそれぞれに, アルファベットトークン化, 形態素トークン化, BPEトークン化を適用し, Transformerニューラルネットワークを用いて, 9つのモデルをそれぞれ5万エポック学習させて比較実験を行った。実験モデルのBLEUスコアを計測した結果、韓国語にBPEトークン化、英語に形態素トークン化を適用したモデルが35.73を記録し、最高の性能を示した。 This work was conducted to find out how tokenization methods affect the training results of machine translation models. In this work, alphabet tokenization, morpheme tokenization, and BPE tokenization were applied to Korean as the source language and English as the target language respectively, and the comparison experiment was conducted by training each of the 9 models for 50,000 epochs using the Transformer neural network. As a result of measuring the BLEU scores of the experimental models, the model that applied BPE tokenization to Korean and morpheme tokenization to English recorded 35.73, showing the best performance.	翻訳日:2021-06-05 18:08:53 公開日:2021-06-01
# (参考訳) 深部アンサンブルを用いたGreedy Bayesian Posterior Approximation Greedy Bayesian Posterior Approximation with Deep Ensembles ( http://arxiv.org/abs/2105.14275v2 ) ライセンス: CC BY 4.0	Aleksei Tiulpin and Matthew B. Blaschko	(参考訳) 独立に訓練されたニューラルネットワークのアンサンブルは、ディープラーニングにおける予測の不確かさを推定するための最先端のアプローチであり、デルタ関数の混合による後方分布の近似と解釈できる。アンサンブルの訓練は、損失ランドスケープの非凸性と個々のメンバーのランダムな初期化に依存し、その結果の後方近似は制御されない。本稿では,関数空間における実後部とカーネル密度推定器間の$f$-divergenceを最小化する,この制限に対処する新しい原理的手法を提案する。我々は、この目的を組合せの観点から分析し、任意の$f$ に対して混合成分に関して亜モジュラーであることを示す。その後, グリーディアンサンブル構築の問題を考えるとともに, 全目的の限界ゲインから, アンサンブル法の新たな多様性用語を導出する。このアプローチのパフォーマンスは、複数のデータセットでトレーニングされたさまざまなアーキテクチャにおける、コンピュータビジョンの分散ベンチマークで実証されます。本手法のソースコードはhttps://github.com/MIPT-Oulu/greedy_ensembles_trainingで公開されている。 Ensembles of independently trained neural networks are a state-of-the-art approach to estimate predictive uncertainty in Deep Learning, and can be interpreted as an approximation of the posterior distribution via a mixture of delta functions. The training of ensembles relies on non-convexity of the loss landscape and random initialization of their individual members, making the resulting posterior approximation uncontrolled. This paper proposes a novel and principled method to tackle this limitation, minimizing an $f$-divergence between the true posterior and a kernel density estimator in a function space. We analyze this objective from a combinatorial point of view, and show that it is submodular with respect to mixture components for any $f$. Subsequently, we consider the problem of greedy ensemble construction, and from the marginal gain of the total objective, we derive a novel diversity term for ensemble methods. The performance of our approach is demonstrated on computer vision out-of-distribution benchmarks in a range of architectures trained on multiple datasets. The source code of our method is publicly available at https://github.com/MIPT-Oulu/greedy_ensembles_training.	翻訳日:2021-06-05 17:59:57 公開日:2021-06-01
# (参考訳) 文法精度評価(gae) : 機械翻訳モデルの量的固有性評価 Grammar Accuracy Evaluation (GAE): Quantifiable Intrinsic Evaluation of Machine Translation Models ( http://arxiv.org/abs/2105.14277v2 ) ライセンス: CC BY 4.0	Dojun Park, Youngjin Jang and Harksoo Kim	(参考訳) 自然言語生成モデルの性能評価のための人間による本質的評価は、生成文の品質が外部的な評価だけでは完全に表現できないという事実を克服するために行われる。それにもかかわらず、既存の内在的評価は評価者の基準に応じて大きなスコア偏差を有する。本稿では,特定の評価基準を提供するための文法精度評価(GAE)を提案する。 bleuとgaeによる機械翻訳の品質分析の結果、bleuスコアは機械翻訳モデルの絶対的性能を表わさないこと、およびgaeがbleuの欠点を補うことを確認し、代替同義語や文構造の変化を柔軟に評価した。 Intrinsic evaluation by humans for the performance of natural language generation models is conducted to overcome the fact that the quality of generated sentences cannot be fully represented by only extrinsic evaluation. Nevertheless, existing intrinsic evaluations have a large score deviation according to the evaluator's criteria. In this paper, we propose Grammar Accuracy Evaluation (GAE) that can provide specific evaluating criteria. As a result of analyzing the quality of machine translation by BLEU and GAE, it was confirmed that the BLEU score does not represent the absolute performance of machine translation models and that GAE compensates for the shortcomings of BLEU with a flexible evaluation on alternative synonyms and changes in sentence structure.	翻訳日:2021-06-05 17:13:53 公開日:2021-06-01
# (参考訳) 畳み込みニューラルネットワークを用いた境界ボックスアノテーションの自動CT分割 Automatic CT Segmentation from Bounding Box Annotations using Convolutional Neural Networks ( http://arxiv.org/abs/2105.14314v2 ) ライセンス: CC BY 4.0	Yuanpeng Liu, Qinglei Hui, Zhiyi Peng, Shaolin Gong and Dexing Kong	(参考訳) 臨床診断には医用画像の正確なセグメンテーションが重要である。既存の自動セグメンテーション手法は、主に完全に教師ありの学習に基づいており、正確なアノテーションの需要が非常に高く、非常に費用がかかり、時間を要する。この問題に対処するため,我々は,境界ボックスという形で,弱いアノテーションでのみ正確なセグメント化モデルを訓練できる,弱い教師付き学習に基づくctセグメント化手法を提案した。提案手法は,1)k平均クラスタリングによる境界ボックスアノテーションによる擬似マスクの生成,2)分割モデルとして3次元U-Net畳み込みニューラルネットワークを反復的に訓練する。いくつかのデータ前処理手法は性能向上に使用される。この方法は3種類の臓器を含む4つのデータセットで627個のCTボリュームで検証された。肝臓,脾臓,腎分画では95.19%,92.11%,91.45%の精度を示した。実験の結果,本手法は正確で,効率的であり,臨床応用に適していることが示された。 Accurate segmentation for medical images is important for clinical diagnosis. Existing automatic segmentation methods are mainly based on fully supervised learning and have an extremely high demand for precise annotations, which are very costly and time-consuming to obtain. To address this problem, we proposed an automatic CT segmentation method based on weakly supervised learning, by which one could train an accurate segmentation model only with weak annotations in the form of bounding boxes. The proposed method is composed of two steps: 1) generating pseudo masks with bounding box annotations by k-means clustering, and 2) iteratively training a 3D U-Net convolutional neural network as a segmentation model. Some data pre-processing methods are used to improve performance. The method was validated on four datasets containing three types of organs with a total of 627 CT volumes. For liver, spleen and kidney segmentation, it achieved an accuracy of 95.19%, 92.11%, and 91.45%, respectively. Experimental results demonstrate that our method is accurate, efficient, and suitable for clinical use.	翻訳日:2021-06-05 16:02:37 公開日:2021-06-01
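The first step, turning a bounding box into a pseudo mask with k-means, can be sketched on a single 2-D slice as follows. This is a minimal illustration under stated assumptions: clustering on raw intensity only, two clusters, the brighter cluster taken as the organ, and a hypothetical `pseudo_mask_from_box` helper. The abstract does not specify the features, the number of clusters, or how the foreground cluster is chosen in the authors' pipeline.

```python
def two_means_1d(values, n_iter=20):
    """Plain 1-D k-means with k = 2; returns the (low, high) cluster centres."""
    lo, hi = min(values), max(values)
    for _ in range(n_iter):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high_group:  # all values identical: nothing to separate
            break
        lo = sum(low_group) / len(low_group)
        hi = sum(high_group) / len(high_group)
    return lo, hi


def pseudo_mask_from_box(image, box):
    """image: 2-D list of intensities; box: (row0, col0, row1, col1), end-exclusive."""
    r0, c0, r1, c1 = box
    inside = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    lo, hi = two_means_1d(inside)
    mask = [[0] * len(row) for row in image]
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Assumption: pixels nearer the brighter centre belong to the organ.
            if abs(image[r][c] - hi) < abs(image[r][c] - lo):
                mask[r][c] = 1
    return mask


toy_slice = [
    [10, 12, 11, 10, 90],
    [11, 80, 85, 12, 10],
    [10, 82, 88, 11, 10],
    [12, 11, 10, 10, 10],
]
# The box covers the first four columns, so the bright pixel at (0, 4) is excluded.
toy_mask = pseudo_mask_from_box(toy_slice, (0, 0, 4, 4))
for row in toy_mask:
    print(row)
```

Pixels outside the box are never labelled, which is how the weak annotation constrains the pseudo mask. In the two-step method described above, masks of this kind would then serve as training targets for the segmentation network.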
# (参考訳) 単一RGBカメラによるベイズ推定に基づくスペクトル分布の分離推定 Separated-Spectral-Distribution Estimation Based on Bayesian Inference with Single RGB Camera ( http://arxiv.org/abs/2106.01861v1 ) ライセンス: CC BY 4.0	Yuma Kinoshita and Hitoshi Kiya	(参考訳) 本稿では,典型的なRGBカメラで撮影した画像からスペクトル分布を別々に推定する手法を提案する。提案手法では,照明,反射率,カメラ感度のスペクトル分布を別々に推定できるが,近年のハイパースペクトルカメラはシーンからの同時スペクトル分布を捉えることに限られている。さらに、ベイズ推定を用いることで、スペクトル分布と画像ノイズの両方の事前情報を確率分布として考慮することができる。その結果,提案手法はスペクトル分布を統一的に推定することができ,従来のスペクトル分布推定法では不可能な雑音に対する推定の堅牢性を高めることができる。ベイズ推論を用いることで,推定結果の信頼度も得ることができる。実験では,提案手法が従来のrmse法を上回るだけでなく,雑音に対するロバスト性も示している。 In this paper, we propose a novel method for separately estimating spectral distributions from images captured by a typical RGB camera. The proposed method allows us to separately estimate a spectral distribution of illumination, reflectance, or camera sensitivity, while recent hyperspectral cameras are limited to capturing a joint spectral distribution from a scene. In addition, the use of Bayesian inference makes it possible to take into account prior information of both spectral distributions and image noise as probability distributions. As a result, the proposed method can estimate spectral distributions in a unified way, and it can enhance the robustness of the estimation against noise, which conventional spectral-distribution estimation methods cannot. The use of Bayesian inference also enables us to obtain the confidence of estimation results. In an experiment, the proposed method is shown not only to outperform conventional estimation methods in terms of RMSE but also to be robust against noise.	翻訳日:2021-06-05 12:31:39 公開日:2021-06-01
# (参考訳) 肺気腫サブタイプ分類のための深層クラスタリング活性化マップ Deep Clustering Activation Maps for Emphysema Subtyping ( http://arxiv.org/abs/2106.01351v1 ) ライセンス: CC BY 4.0	Weiyi Xie, Colin Jacobs, Bram van Ginneken	(参考訳) 本稿では,CTスキャンから肺気腫のサブタイプを分類するために,セグメンテーションネットワークの高密度な特徴を利用する深層学習クラスタリング手法を提案する。高密度な特徴を利用することで、高密度クラスタリング活性化マップ(dCAM)を介して、クラスタ割り当てに対応する画像領域を高解像度で可視化できる。このアプローチはモデルの解釈可能性を提供する。 COPDGene研究の被験者500名についてクラスタリング結果を評価した。この研究では、放射線科医がCTの視覚的評価に基づいて肺気腫サブタイプを手動でアノテーションしている。教師なしクラスタリングの精度は43%で、41%のベースラインを上回り、45%の教師付き分類に匹敵する結果を得た。また、提案手法はシルエット係数0.54, Davies-Bouldinスコア0.55を達成し, ベースラインよりも優れたクラスタ形成を示した。 We propose a deep learning clustering method that exploits dense features from a segmentation network for emphysema subtyping from computed tomography (CT) scans. Using dense features enables high-resolution visualization of image regions corresponding to the cluster assignment via dense clustering activation maps (dCAMs). This approach provides model interpretability. We evaluated clustering results on 500 subjects from the COPDGene study, where radiologists manually annotated emphysema sub-types according to their visual CT assessment. We achieved a 43% unsupervised clustering accuracy, outperforming our baseline at 41% and yielding results comparable to supervised classification at 45%. The proposed method also offers a better cluster formation than the baseline, achieving 0.54 in silhouette coefficient and 0.55 in Davies-Bouldin score.	翻訳日:2021-06-05 12:20:59 公開日:2021-06-01
# (参考訳) ゲートSPECT MPIにおける機械的非同期から深層学習を用いてCRT応答の新しい予測因子を発見する手法 A method using deep learning to discover new predictors of CRT response from mechanical dyssynchrony on gated SPECT MPI ( http://arxiv.org/abs/2106.01355v1 ) ライセンス: CC BY 4.0	Zhuo He, Xinwei Zhang, Chen Zhao, Zhiyong Qian, Yao Wang, Xiaofeng Hou, Jiangang Zou, Weihua Zhou	(参考訳) 背景。従来の左室機械的非同期(LVMD)パラメータには、それぞれ統計的な限界があることが研究で示されている。本研究の目的は,CRT患者選択を支援するため,深層学習によりゲートSPECT MPIの位相解析から新しいLVMDパラメータを抽出することである。方法。安静時ゲートSPECT MPIを施行した103名の患者を対象とした。CRT応答は、6±1か月の追跡時における左室収縮末期容積(LVESV)の15%以上の減少と定義した。教師なし深層学習手法であるオートエンコーダ(AE)を、生のLV収縮期位相極座標マップを用いて学習させ、AEベースのLVMDパラメータと呼ぶ新しいLVMDパラメータを抽出した。新しいパラメータと従来のLVMDパラメータの関係を説明するために相関解析を用いた。単変量および多変量解析を用いてCRT応答を予測する多変量モデルを構築した。結果。完全なデータは102例で得られ、そのうち44.1%がCRT応答者に分類された。AEベースのLVMDパラメータは、単変量解析(OR 1.24, 95% CI 1.07 - 1.44, P = 0.006)および多変量解析(OR 1.03, 95% CI 1.01 - 1.06, P = 0.006)で有意であった。さらに、LVEFや性別などの有意な臨床的特徴と組み合わせた場合、PSD(AUC 0.72 vs. 0.63, LH 8.06, P = 0.005)およびPBW(AUC 0.72 vs. 0.64, LH 7.87, P = 0.005)に対して付加的な価値を示した。結論。ベースラインのゲートSPECT MPIからオートエンコーダによって抽出された新しいLVMDパラメータは、CRT応答の予測を改善する可能性がある。 Background. Studies have shown that the conventional left ventricular mechanical dyssynchrony (LVMD) parameters have their own statistical limitations. The purpose of this study is to extract new LVMD parameters from the phase analysis of gated SPECT MPI by deep learning to help CRT patient selection. Methods. One hundred and three patients who underwent rest gated SPECT MPI were enrolled in this study. CRT response was defined as a decrease in left ventricular end-systolic volume (LVESV) >= 15% at 6 ± 1 months of follow-up. Autoencoder (AE), an unsupervised deep learning method, was trained on the raw LV systolic phase polar maps to extract new LVMD parameters, called AE-based LVMD parameters. Correlation analysis was used to explain the relationships between the new parameters and conventional LVMD parameters. Univariate and multivariate analyses were used to establish a multivariate model for predicting CRT response. Results. Complete data were obtained in 102 patients, 44.1% of whom were classified as CRT responders. The AE-based LVMD parameter was significant in the univariate (OR 1.24, 95% CI 1.07 - 1.44, P = 0.006) and multivariate analyses (OR 1.03, 95% CI 1.01 - 1.06, P = 0.006). Moreover, it had incremental value over PSD (AUC 0.72 vs. 0.63, LH 8.06, P = 0.005) and PBW (AUC 0.72 vs. 0.64, LH 7.87, P = 0.005), combined with significant clinical characteristics, including LVEF and gender. Conclusions. The new LVMD parameters extracted by autoencoder from the baseline gated SPECT MPI have the potential to improve the prediction of CRT response.	翻訳日:2021-06-05 11:59:52 公開日:2021-06-01
# (参考訳) Diffusion Schr\"odinger Bridgeとスコアベース生成モデルへの応用 Diffusion Schr\"odinger Bridge with Applications to Score-Based Generative Modeling ( http://arxiv.org/abs/2106.01357v1 ) ライセンス: CC0 1.0	Valentin De Bortoli, James Thornton, Jeremy Heng, Arnaud Doucet	(参考訳) ガウス雑音の漸進的適用は、複素データ分布をおよそガウスに変換する。このダイナミックな反転は生成モデルを定義する。前方雑音発生過程が確率微分方程式(SDE)によって与えられる場合、Song et al。 (2021) スコアマッチングを用いて, 関連する逆時間SDEの時間不均一ドリフトを推定する方法を示した。このアプローチの制限は、最終分布がほぼガウス的であるためには、前向きの SDE を十分に長い時間実行しなければならないことである。対照的に、schr\"odinger bridge problem (sb) の解法である。経路空間上のエントロピー規則化された最適輸送問題で、有限時間でデータ分布からサンプルを生成する拡散を生成する。本稿では,SB問題を解くためにIterative Proportional Fitting (IPF) 法のオリジナル近似である Diffusion SB (DSB) を提案し,生成モデル実験とともに理論的解析を行った。最初のDSBイテレーションは、Songらによって提案された方法論を復元する。 (2021年)は、後続のdsbの反復が前方(resp)の最終時間辺とのずれを減少させるため、より短い時間間隔を使用する柔軟性がある。 sde (複数形 sdes または sdes) データ)配信。生成モデリング以外にも、DSBは人気のあるシンクホーンアルゴリズム(Cuturi, 2013)の連続状態空間アナログとして広く応用可能な計算最適輸送ツールを提供している。 Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schr\"odinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. 
(2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).	翻訳日:2021-06-05 11:47:09 公開日:2021-06-01
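The abstract above describes DSB as the continuous state-space analogue of the Sinkhorn algorithm. As a rough illustration of that discrete counterpart only (not of DSB itself), the following sketch runs the iterative proportional fitting updates on a small entropy-regularised transport problem. The function name `sinkhorn`, the cost matrix, the regularisation strength `eps`, and the iteration count are all assumptions chosen for the example.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Entropy-regularised optimal transport between histograms a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    exp(-cost / eps); this is the discrete IPF update.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan matches the source marginal a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan matches the target marginal b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy problem: two source bins, three target bins.
cost = [[0.0, 1.0, 2.0],
        [2.0, 1.0, 0.0]]
a = [0.5, 0.5]
b = [0.2, 0.3, 0.5]
plan = sinkhorn(cost, a, b)
```

Each half-step fits one marginal exactly and slightly disturbs the other, so the two constraints are satisfied only in the limit. This is the discrete counterpart of the abstract's statement that successive DSB iterations reduce the mismatch at the final-time marginals.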
# (参考訳) balanced spiking neural networkを用いたオンライン振動異常検出 Online Detection of Vibration Anomalies Using Balanced Spiking Neural Networks ( http://arxiv.org/abs/2106.00687v1 ) ライセンス: CC BY 4.0	Nik Dennler, Germain Haessig, Matteo Cartiglia, Giacomo Indiveri	(参考訳) 振動パターンは、大規模産業システムの予測メンテナンスタスクに一般的に利用されるランニングマシンの健康状態に関する貴重な情報をもたらす。しかし、この情報を利用する古典的な方法によって必要とされる、サイズ、複雑さ、電力予算のオーバーヘッドは、自動運転車、ドローン、ロボット工学のような小規模のアプリケーションでは、しばしば禁止される。本稿では,幅広いシナリオに適用可能なスパイキングニューラルネットワークを用いた振動解析を行うためのニューロモルフィックアプローチを提案する。本稿では,アナログディジタルニューロモルフィック回路と互換性のあるビルディングブロックを用いて,振動データからシステム異常を検出するスパイクベースのエンドツーエンドパイプラインを提案する。このパイプラインはオンラインの教師なしの方法で動作し、コチェリーモデル、フィードバック適応、バランスの取れたスパイクニューラルネットワークに依存している。提案手法は,2つの公開データセットに対して,最先端の性能あるいは優れた性能を実現する。さらに,非同期なニューロモーフィックプロセッサデバイスに実装された概念実証を行う。この研究は、オンライン振動監視のための自律低消費電力エッジコンピューティングデバイスの設計と実装に向けた重要な一歩である。 Vibration patterns yield valuable information about the health state of a running machine, which is commonly exploited in predictive maintenance tasks for large industrial systems. However, the overhead, in terms of size, complexity and power budget, required by classical methods to exploit this information is often prohibitive for smaller-scale applications such as autonomous cars, drones or robotics. Here we propose a neuromorphic approach to perform vibration analysis using spiking neural networks that can be applied to a wide range of scenarios. We present a spike-based end-to-end pipeline able to detect system anomalies from vibration data, using building blocks that are compatible with analog-digital neuromorphic circuits. This pipeline operates in an online unsupervised fashion, and relies on a cochlea model, on feedback adaptation and on a balanced spiking neural network. We show that the proposed method achieves state-of-the-art performance or better against two publicly available data sets. Further, we demonstrate a working proof-of-concept implemented on an asynchronous neuromorphic processor device. 
This work represents a significant step towards the design and implementation of autonomous low-power edge-computing devices for online vibration monitoring.	翻訳日:2021-06-05 11:45:55 公開日:2021-06-01
# (参考訳) 対称性-via-duality:パラメータ空間相関器からの不変ニューラルネットワーク密度 Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators ( http://arxiv.org/abs/2106.00694v1 ) ライセンス: CC BY 4.0	Anindita Maiti, Keegan Stoner, James Halverson	(参考訳) パラメータ空間と関数空間は、ニューラルネットワークを研究するための2つの異なる双対フレームを提供する。ネットワーク密度の対称性は、密度が未知で同値でない場合でも、ネットワーク相関関数の双対計算によって決定できることを示す。対称性と双対性は、ネットワークパラメータ分布の選択に由来する相関関数の不変性に依存する。ニューラルネットワーク密度の入力および出力対称性が決定され、既知のガウス過程が回復すると無限の幅制限が生じる。このメカニズムは、パラメータが相関している場合や、神経接核の対称性など、トレーニング中の対称性を決定するためにも利用できる。初期化密度における対称性の量は、Fashion-MNISTで訓練されたネットワークの精度に影響を及ぼし、対称性の破れは、それが基底真理の方向にある場合にのみ役立つことを実証する。 Parameter-space and function-space provide two different duality frames in which to study neural networks. We demonstrate that symmetries of network densities may be determined via dual computations of network correlation functions, even when the density is unknown and the network is not equivariant. Symmetry-via-duality relies on invariance properties of the correlation functions, which stem from the choice of network parameter distributions. Input and output symmetries of neural network densities are determined, which recover known Gaussian process results in the infinite width limit. The mechanism may also be utilized to determine symmetries during training, when parameters are correlated, as well as symmetries of the Neural Tangent Kernel. We demonstrate that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.	翻訳日:2021-06-05 11:37:36 公開日:2021-06-01
# (参考訳) グラフニューラルネットワークを用いた協調的モバイルクラウドソーシングのための低複雑性リクルート Low Complexity Recruitment for Collaborative Mobile Crowdsourcing Using Graph Neural Networks ( http://arxiv.org/abs/2106.00717v1 ) ライセンス: CC BY 4.0	Aymen Hamrouni, Hakim Ghazzai, Turki Alelyani, Yehia Massoud	(参考訳) コラボレーティブ・モバイル・クラウドソーシング(CMCS、Collaborative Mobile crowdsourcing)は、例えば地方自治体や個人が、接続された人々の群衆から労働者のチームを雇い、複雑なタスクを実行することを可能にする。本稿では,タスク依頼者がソーシャル・コネクテッドで熟練した労働者のチームを作るための2つの異なるCMCS採用戦略について検討する。つまり,プラットフォームが作業員に関する自身の知識を活用してチームを編成するプラットフォームベースの戦略と,プラットフォームが自身のソーシャル・ネットワーク(SN)隣人について独自の知識を持つ適切なチームを採用するグループリーダーを指定するリーダベースの戦略である。 Integer Linear Program (ILP) として採用を定式化し、4つのファジィ論理に基づく基準(専門知識レベル、社会的関係の強さ、採用コスト、採用者の信頼度レベル)に従ってチームを最適に形成する。 NP硬度に対処するため,グラフニューラルネットワーク(GNN)を利用した新しい低複雑さCMCS採用手法を設計し,特にグラフ埋め込みとクラスタリング技術を用いて,作業者の検索空間を縮小し,その後,適切な作業者を選択するためにメタヒューリスティックな遺伝的アルゴリズムを活用する。実世界のデータセットに適用したシミュレーション結果から,CMCS採用手法の有効性が示された。提案手法は,大規模モバイルクラウドソーシングプラットフォーム上での計算時間削減と運用能力により,ベースライン型IPPと比較して高い性能を達成できることが示唆された。また、リーダーベースの戦略と比較して、プラットフォームベースの戦略はより熟練したチームを採用するが、SN関係は低く、コストも高い。 Collaborative Mobile crowdsourcing (CMCS) allows entities, e.g., local authorities or individuals, to hire a team of workers from the crowd of connected people, to execute complex tasks. In this paper, we investigate two different CMCS recruitment strategies allowing task requesters to form teams of socially connected and skilled workers: i) a platform-based strategy where the platform exploits its own knowledge about the workers to form a team and ii) a leader-based strategy where the platform designates a group leader that recruits its own suitable team given its own knowledge about its Social Network (SN) neighbors. We first formulate the recruitment as an Integer Linear Program (ILP) that optimally forms teams according to four fuzzy-logic-based criteria: level of expertise, social relationship strength, recruitment cost, and recruiter's confidence level. 
To cope with NP-hardness, we design a novel low-complexity CMCS recruitment approach relying on Graph Neural Networks (GNNs), specifically graph embedding and clustering techniques, to shrink the workers' search space and afterwards, exploiting a meta-heuristic genetic algorithm to select appropriate workers. Simulation results applied on a real-world dataset illustrate the performance of both proposed CMCS recruitment approaches. It is shown that our proposed low-complexity GNN-based recruitment algorithm achieves close performances to those of the baseline ILP with significant computational time saving and ability to operate on large-scale mobile crowdsourcing platforms. It is also shown that compared to the leader-based strategy, the platform-based strategy recruits a more skilled team but with lower SN relationships and higher cost.	翻訳日:2021-06-05 11:06:09 公開日:2021-06-01
# (参考訳) fair-net:識別可能なサブ人口間のパフォーマンス格差を軽減するネットワークアーキテクチャ Fair-Net: A Network Architecture For Reducing Performance Disparity Between Identifiable Sub-Populations ( http://arxiv.org/abs/2106.00720v1 ) ライセンス: CC BY 4.0	Arghya Datta, S. Joshua Swamidass	(参考訳) 現実世界のデータセットでは、特定のグループは過度に表現され、他のものよりはるかにまれであり、マシンラーニングの分類器は、過度に表現された人口に先立つことが多い。この問題は、データセットがクラス不均衡であり、マイノリティクラスが多数派クラスよりもはるかに稀な多くのドメインで悪化する。下位表現とクラス不均衡を扱うための有望なアプローチには、クラス不均衡を扱うサブポピュレーション固有の分類器の訓練や、サブポピュレーションの格差を無視し、クラス不均衡を扱うことで高い全体的な精度を達成することを目的としたグローバルな分類器の訓練が含まれる。本研究では,これらの手法が少数サブ人口のクラス不均衡データセットにおいて脆弱であることを示す。 fair-netは分岐型マルチタスクニューラルネットワークアーキテクチャで、クラス不均衡データセットにおける識別可能なサブポピュレーションの分類精度と確率校正を両立させる。 Fair-Netsは、ネットワークの出力層とエラー関数への直接的な拡張であり、より複雑なアーキテクチャに組み込むことができる。 3つの実世界のベンチマークデータセットを用いた実証研究により、Fair-Netは分類と校正性能を改善し、性別と人種のサブ人口間のパフォーマンス格差を大幅に減らした。 In real world datasets, particular groups are under-represented, much rarer than others, and machine learning classifiers will often perform worse on under-represented populations. This problem is aggravated across many domains where datasets are class imbalanced, with a minority class far rarer than the majority class. Naive approaches to handle under-representation and class imbalance include training sub-population specific classifiers that handle class imbalance or training a global classifier that overlooks sub-population disparities and aims to achieve high overall accuracy by handling class imbalance. In this study, we find that these approaches are vulnerable in class imbalanced datasets with minority sub-populations. We introduce Fair-Net, a branched multitask neural network architecture that improves both classification accuracy and probability calibration across identifiable sub-populations in class imbalanced datasets. Fair-Net is a straightforward extension to the output layer and error function of a network, so it can be incorporated in far more complex architectures. 
Empirical studies with three real world benchmark datasets demonstrate that Fair-Net improves classification and calibration performance, substantially reducing performance disparity between gender and racial sub-populations.	翻訳日:2021-06-05 10:33:41 公開日:2021-06-01
# (参考訳) 関数型オブジェクト指向ネットワークから生成するレシピの評価 Evaluating Recipes Generated from Functional Object-Oriented Network ( http://arxiv.org/abs/2106.00728v1 ) ライセンス: CC BY 4.0	Md Sadman Sakib, Hailey Baez, David Paulius, and Yu Sun	(参考訳) 関数型オブジェクト指向ネットワーク(foon)は,シンボリックタスク計画のためのグラフ形式の知識表現として導入された。ロボットは、操作タスクのシーケンシャルプランを得るために、FOONから知識検索プロセスを通じてタスクツリーを得ることができる。獲得したタスクツリーの品質を評価するため,レシピやマニュアルなどの従来のタスク知識と比較した。まずタスクツリーをレシピに自動的に変換し、それをRecipe1M+データセットの人間が作ったレシピと比較します。 Recipe1M+のレシピとFOONタスクツリーのレシピの間には,正確性,完全性,明確性という点で有意な差は認められなかった。 The functional object-oriented network (FOON) has been introduced as a knowledge representation, which takes the form of a graph, for symbolic task planning. To get a sequential plan for a manipulation task, a robot can obtain a task tree through a knowledge retrieval process from the FOON. To evaluate the quality of an acquired task tree, we compare it with a conventional form of task knowledge, such as recipes or manuals. We first automatically convert task trees to recipes, and we then compare them with the human-created recipes in the Recipe1M+ dataset via a survey. Our preliminary study finds no significant difference between the recipes in Recipe1M+ and the recipes generated from FOON task trees in terms of correctness, completeness, and clarity.	翻訳日:2021-06-05 10:21:06 公開日:2021-06-01
# (参考訳) 英語アラビア語機械翻訳における音声と普遍的依存の影響 Part of Speech and Universal Dependency effects on English Arabic Machine Translation ( http://arxiv.org/abs/2106.00745v1 ) ライセンス: CC0 1.0	Omri Abend, Leshem Choshen, Dmitry Nikolaev, Ofek Rafaeli	(参考訳) 本稿では,英語とアラビア語の文法的現象を基礎とした機械翻訳モデルの評価手法について述べる。このような「神経」や「機械学習」は微調整や変化が難しいため、この方法は特に重要である。したがって、それらを容易かつ多様に評価する方法を見つけることは、それらを改善するタスクに大いに役立ちます。 In this research paper, I will elaborate on a method to evaluate machine translation models based on their performance on underlying syntactical phenomena between English and Arabic languages. This method is especially important as such "neural" and "machine learning" are hard to fine-tune and change. Thus, finding a way to evaluate them easily and diversely would greatly help the task of bettering them.	翻訳日:2021-06-05 10:06:06 公開日:2021-06-01
# (参考訳) 無限水平動的計画法におけるオンラインポリシーイテレーション On-Line Policy Iteration for Infinite Horizon Dynamic Programming ( http://arxiv.org/abs/2106.00746v1 ) ライセンス: CC BY 4.0	Dimitri Bertsekas	(参考訳) 本稿では,有限状態無限大地平線ディスカウント動的計画のためのオンラインポリシー反復 (pi) アルゴリズムを提案する。これにより、現在のポリシの継続的な更新/改善が可能になり、結果として、改善されたコントロールを現在のポリシに組み込んだオンラインPIが生成される。このアルゴリズムは、有限個の段階において局所最適ポリシーの一種に収束し、ポリシー改善を単純化したpiおよびマルチエージェントpiの変種の可能性を提案する。さらに、このアルゴリズムはオンラインのリプランニングで使用することができ、また、値とポリシー近似を持つオンラインPIアルゴリズムにも適している。 In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during operation of the system. This allows the continuous updating/improvement of the current policy, thus resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated. The algorithm converges in a finite number of stages to a type of locally optimal policy, and suggests the possibility of variants of PI and multiagent PI where the policy improvement is simplified. Moreover, the algorithm can be used with on-line replanning, and is also well-suited for on-line PI algorithms with value and policy approximations.	翻訳日:2021-06-05 09:57:26 公開日:2021-06-01
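The abstract above describes policy improvement carried out on-line, only at the states the system actually visits. The sketch below illustrates that idea on a tiny deterministic, discounted, cost-minimising problem. It is a simplified reading of the abstract, not the paper's algorithm: re-evaluating the whole policy at every step, the deterministic transitions, and the names `evaluate`, `online_policy_iteration`, `nxt`, and `cost` are all assumptions made for the example.

```python
def evaluate(policy, nxt, cost, gamma, sweeps=500):
    """Approximate the cost-to-go of a fixed policy by repeated sweeps."""
    values = [0.0] * len(policy)
    for _ in range(sweeps):
        values = [cost[s][policy[s]] + gamma * values[nxt[s][policy[s]]]
                  for s in range(len(policy))]
    return values


def online_policy_iteration(nxt, cost, gamma, start, steps):
    """Improve the policy only at the state currently being visited."""
    policy = [0] * len(nxt)
    state = start
    for _ in range(steps):
        values = evaluate(policy, nxt, cost, gamma)
        # One-step lookahead at the current state only; other states keep
        # whatever control the policy already assigns them.
        policy[state] = min(
            range(len(nxt[state])),
            key=lambda u: cost[state][u] + gamma * values[nxt[state][u]],
        )
        # Apply the improved control and move on.
        state = nxt[state][policy[state]]
    return policy


# Hypothetical 3-state chain: control 0 stays put, control 1 moves right.
# State 2 is absorbing and free of cost.
nxt = [[0, 1], [1, 2], [2, 2]]
cost = [[1.0, 0.5], [1.0, 0.5], [0.0, 0.0]]
policy = online_policy_iteration(nxt, cost, gamma=0.9, start=0, steps=5)
```

In this toy run the policy is updated at states 0 and 1 as they are encountered, and states that are never visited would keep their initial control. That is one way to read the abstract's statement that the method converges to a type of locally optimal policy.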
# (参考訳) 重畳有限状態機械の高次微分 Higher-order Derivatives of Weighted Finite-state Machines ( http://arxiv.org/abs/2106.00749v1 ) ライセンス: CC BY 4.0	Ran Zmigrod, Tim Vieira, Ryan Cotterell	(参考訳) 重み付き有限状態機械はNLPシステムの基本的な構成要素である。彼らは、1990年代にノイズの多いチャネルモデルで初期の使用から、現代のニューラルなパラメータ化条件付きランダムフィールドまで、時間の試験を再考した。本研究では,重み付き有限状態機械の正規化定数に関する高次導関数の計算について検討する。文献に記載されていないすべての順序の導関数を評価するための一般アルゴリズムを提案する。 2階微分の場合、我々のスキームは最適な$\mathcal{O}(A^2 N^4)$時間で実行され、$A$はアルファベットサイズ、$N$は状態の数である。我々のアルゴリズムは以前のアルゴリズムよりはるかに高速である。さらに,本手法により,共分散行列や一階期待勾配などの2階期待値を計算するアルゴリズムが大幅に高速化される。 Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time -- from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.	翻訳日:2021-06-05 09:50:45 公開日:2021-06-01
# (参考訳) 老いぼれのGRAPPAは死んだのか? Is good old GRAPPA dead? ( http://arxiv.org/abs/2106.00753v1 ) ライセンス: CC BY 4.0	Zaccharie Ramzi, Alexandre Vignaud, Jean-Luc Starck, Philippe Ciuciu	(参考訳) 我々はMRI再建のための最先端深層学習手法であるXPDNetの性能を,従来のGRAPPAと比較して定性的に解析する。私たちはこれを複数の設定で実行し、特にXPDNetの堅牢性をテストすることで、XPDNetがある程度の一般化が可能であることを示しています。 We perform a qualitative analysis of performance of XPDNet, a state-of-the-art deep learning approach for MRI reconstruction, compared to GRAPPA, a classical approach. We do this in multiple settings, in particular testing the robustness of the XPDNet to unseen settings, and show that the XPDNet can to some degree generalize well.	翻訳日:2021-06-05 09:37:26 公開日:2021-06-01
# (参考訳) 入力表現の復号化によるニューラルネットワークの構成性向上 Improving Compositionality of Neural Networks by Decoding Representations to Inputs ( http://arxiv.org/abs/2106.00769v1 ) ライセンス: CC BY 4.0	Mike Wu, Noah Goodman, Stefano Ermon	(参考訳) 従来のソフトウェアプログラムでは、変数から入力までプログラムロジックをトレースし、ユニットテストとアサーションステートメントを適用して誤った振る舞いをブロックし、プログラムを一緒に構成することで、コードのデバッグがいかに簡単かを考慮します。しかし、プログラムが複雑化するにつれて、コンピュータビジョンや自然言語のようなアプリケーションに従来のソフトウェアを適用するのは難しくなります。ディープラーニングプログラムはこれらのアプリケーションで強いパフォーマンスを示しているが、従来のソフトウェアプログラムの機能の多くを犠牲にしている。本稿では,ニューラルネットワークのアクティベーションを"デコード"に制約するために,生成モデルを共同でトレーニングすることにより,従来型および深層学習プログラムの利点を橋渡しする。そうすることで、実践者はアクティベーションで符号化された情報を探索し追跡し、アクティベーションで符号化された情報にアサーションのような制約を適用し、プラグインとプレイで別々のニューラルネットワークを構成することができる。実験では、分散検出、逆例、キャリブレーション、公平性に対するデコダラブル表現の応用を、標準ニューラルネットワークの精度と一致させながら実証する。 In traditional software programs, we take for granted how easy it is to debug code by tracing program logic from variables back to input, apply unit tests and assertion statements to block erroneous behavior, and compose programs together. But as the programs we write grow more complex, it becomes hard to apply traditional software to applications like computer vision or natural language. Although deep learning programs have demonstrated strong performance on these applications, they sacrifice many of the functionalities of traditional software programs. In this paper, we work towards bridging the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs. Doing so enables practitioners to probe and track information encoded in activation(s), apply assertion-like constraints on what information is encoded in an activation, and compose separate neural networks together in a plug-and-play fashion. In our experiments, we demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness -- while matching standard neural networks in accuracy.	
翻訳日:2021-06-05 09:33:43 公開日:2021-06-01
# (参考訳) フェアネスを考慮した特徴選択のための情報理論 Information Theoretic Measures for Fairness-aware Feature Selection ( http://arxiv.org/abs/2106.00772v1 ) ライセンス: CC BY 4.0	Sajad Khodadadian, Mohamed Nafea, AmirEmad Ghassami, Negar Kiyavash	(参考訳) 機械学習アルゴリズムは、関連する特徴に基づいて個人に関する一連の意思決定にますます使われている。しかし、正確な決定に関係のある特徴は、特定の人種や性別のような非特権集団に対する明示的または暗黙的な差別に繋がる可能性がある。これはトレーニングデータに既存のバイアスがあり、学習アルゴリズムによってしばしば複製されるか、さらに悪化する。これらのバイアスをデータレベルで識別し、測定することは、特徴間の相互依存と決定結果のために難しい問題である。本研究では,特徴の精度と識別的影響に関する情報理論に基づく,公正な特徴選択のためのフレームワークを開発する。特に当社の目標は,この機能が正確性や非差別的判断に与える影響を定量化する,各機能に対する公平性ユーティリティスコアの設計にあります。まず,モデルの精度と識別に異なる特徴のサブセットが与える影響に関する情報理論的な尺度を提案する。その後,shapley値関数を用いて各特徴の限界影響を推定する。我々のフレームワークは、特定の分類器の設計よりもデータの合同統計に依存する。提案する実データおよび合成データに関する枠組みについて検討し,その性能評価を行った。 Machine learning algorithms are increasingly used for consequential decision making regarding individuals based on their relevant features. Features that are relevant for accurate decisions may however lead to either explicit or implicit forms of discrimination against unprivileged groups, such as those of a certain race or gender. This happens due to existing biases in the training data, which are often replicated or even exacerbated by the learning algorithm. Identifying and measuring these biases at the data level is a challenging problem due to the interdependence among the features and the decision outcome. In this work, we develop a framework for fairness-aware feature selection, based on information theoretic measures for the accuracy and discriminatory impacts of features. Specifically, our goal is to design a fairness utility score for each feature which quantifies how this feature influences accurate as well as nondiscriminatory decisions. We first propose information theoretic measures for the impact of different subsets of features on the accuracy and discrimination of the model. Subsequently, we deduce the marginal impact of each feature using the Shapley value function. 
Our framework depends on the joint statistics of the data rather than a particular classifier design. We examine our proposed framework on real and synthetic data to evaluate its performance.	翻訳日:2021-06-05 09:16:15 公開日:2021-06-01
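The abstract above derives a per-feature score by applying the Shapley value to a measure defined on subsets of features. The sketch below computes exact Shapley values for an arbitrary set function. The toy `value` function and the feature names are hypothetical stand-ins and do not reproduce the paper's information-theoretic measures.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding f to the coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def value(subset):
    """Hypothetical subset score: two additive terms plus one interaction."""
    score = 0.0
    if "a" in subset:
        score += 0.3
    if "b" in subset:
        score += 0.1
    if "a" in subset and "c" in subset:
        score += 0.2  # only pays off when a and c appear together
    return score


phi = shapley_values(["a", "b", "c"], value)
```

The interaction term is split equally between `a` and `c`, and the scores sum to the value of the full feature set. Exact enumeration visits every subset, so it is only practical for a small number of features.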
# (参考訳) K$-best非射影依存ツリーの探索について On Finding the $K$-best Non-projective Dependency Trees ( http://arxiv.org/abs/2106.00780v1 ) ライセンス: CC BY 4.0	Ran Zmigrod, Tim Vieira, Ryan Cotterell	(参考訳) 有向グラフにおける最大スパンニングツリーと文の最高の依存ツリーとの接続は、NLPコミュニティによって活用されている。しかし、多くの依存性解析スキームにおいて、このアプローチの重要な詳細は、スパンニングツリーがルートからちょうど1つのエッジを持つ必要があることである。一流の依存性ツリーを見つけるために、この問題を効率的に解決する作業が行われているが、k$-best依存性ツリーを見つけるためにこのソリューションを拡張する研究は行われていない。これはおそらくより重要な拡張であり、デコードされた木の割合が依存性ツリーのルート制約の対象にならないためである。実際、ルート制約違反の率は、$K\!=\!50$でデコードした場合、$K\!=\!1$とは対照的に平均13ドルずつ増加する。本稿では,camerini et al の $k$-best spaning tree アルゴリズムの単純化について述べる。 (1980). 我々の単純化により、元のアルゴリズム上で一定の時間短縮が得られる。さらに、ルート制約を受けるグラフの$k$-best依存性木を復号するアルゴリズムの新たな拡張を提案する。 The connection between the maximum spanning tree in a directed graph and the best dependency tree of a sentence has been exploited by the NLP community. However, for many dependency parsing schemes, an important detail of this approach is that the spanning tree must have exactly one edge emanating from the root. While work has been done to efficiently solve this problem for finding the one-best dependency tree, no research has attempted to extend this solution to finding the $K$-best dependency trees. This is arguably a more important extension as a larger proportion of decoded trees will not be subject to the root constraint of dependency trees. Indeed, we show that the rate of root constraint violations increases by an average of $13$ times when decoding with $K\!=\!50$ as opposed to $K\!=\!1$. In this paper, we provide a simplification of the $K$-best spanning tree algorithm of Camerini et al. (1980). Our simplification allows us to obtain a constant time speed-up over the original algorithm. Furthermore, we present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.	翻訳日:2021-06-05 08:59:43 公開日:2021-06-01
# (参考訳) Cセキュリティ脆弱性検出のためのソースコードの分散表現について On using distributed representations of source code for the detection of C security vulnerabilities ( http://arxiv.org/abs/2106.01367v1 ) ライセンス: CC BY 4.0	David Coimbra, Sofia Reis, Rui Abreu, Corina P\u{a}s\u{a}reanu, Hakan Erdogmus	(参考訳) 本稿では,c ソースコードのセキュリティ脆弱性検出タスクにおいて,コード表現モデル code2vec の評価を行う。我々はオープンソースライブラリのastminerを利用してラベル付きc関数のコーパスの抽象構文木からパスコンテキストを抽出する。 code2vecは、関数を脆弱か非破壊可能かを分類するタスクで、結果のパスコンテキストでトレーニングされる。 CodeXGLUEベンチマークを用いて、このタスクのCode2vecの精度は、事前訓練されたRoBERTaのような単純なトランスフォーマーベースのメソッドに匹敵し、より単純なNLPベースのメソッドよりも優れていることを示す。我々は,より大きなモデルに対して低計算要求を維持しながら,61.43%の精度を実現した。 This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec for this task is comparable to simple transformer-based methods such as pre-trained RoBERTa, and outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.	翻訳日:2021-06-05 08:22:53 公開日:2021-06-01
# 可逆適応正規化による単一領域一般化 Adversarially Adaptive Normalization for Single Domain Generalization ( http://arxiv.org/abs/2106.01899v1 ) ライセンス: Link先を確認	Xinjie Fan, Qifei Wang, Junjie Ke, Feng Yang, Boqing Gong, Mingyuan Zhou	(参考訳) 単一ドメインの一般化は、トレーニング用の1つのドメインデータだけで、見えない多くのドメインでうまく機能するモデルを学ぶことを目的としています。既存の研究は、モデルの一般化能力を改善するために、adversarial domain augmentation (ada)の研究に焦点を当てている。正規化層の統計の領域一般化への影響はいまだ検討されていない。本稿では,従来の研究の欠如を補うために,一般化正規化アプローチ,適応標準化と再スケーリング正規化(ASR-Norm)を提案する。 ASR-Normは、ニューラルネットワークを介して標準化と再スケーリングの統計学を学ぶ。この新しい正規化の形式は、伝統的な正規化の一般的な形式と見なすことができる。 ADAでトレーニングすると、ASR-Normの統計は異なるドメインから来るデータに適応することが学習され、したがって、特にソースドメインと大きな差があるターゲットドメインにおいて、ドメイン間でのモデルの一般化性能が向上する。実験の結果,asr-normは平均1.6%,2.7%,6.3%,cifar-10-c,pacsベンチマークにおいて,最先端adaアプローチに一貫した改善をもたらすことがわかった。一般的なツールとして、ASR-Normによって導入された改善はADAメソッドの選択に依存しない。 Single domain generalization aims to learn a model that performs well on many unseen domains with only one domain data for training. Existing works focus on studying the adversarial domain augmentation (ADA) to improve the model's generalization capability. The impact on domain generalization of the statistics of normalization layers is still underinvestigated. In this paper, we propose a generic normalization approach, adaptive standardization and rescaling normalization (ASR-Norm), to complement the missing part in previous works. ASR-Norm learns both the standardization and rescaling statistics via neural networks. This new form of normalization can be viewed as a generic form of the traditional normalizations. When trained with ADA, the statistics in ASR-Norm are learned to be adaptive to the data coming from different domains, and hence improves the model generalization performance across domains, especially on the target domain with large discrepancy from the source domain. 
The experimental results show that ASR-Norm can bring consistent improvement to the state-of-the-art ADA approaches by 1.6%, 2.7%, and 6.3% on average on the Digits, CIFAR-10-C, and PACS benchmarks, respectively. As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.	翻訳日:2021-06-04 16:07:47 公開日:2021-06-01
# memory wrap: 画像分類モデルへのデータ効率と解釈可能な拡張 Memory Wrap: a Data-Efficient and Interpretable Extension to Image Classification Models ( http://arxiv.org/abs/2106.01440v1 ) ライセンス: Link先を確認	Biagio La Rosa, Roberto Capobianco and Daniele Nardi	(参考訳) ブラックボックスとデータ処理の性質のため、ディープラーニング技術は医療や司法といった重要な分野における現実世界の応用にはまだ広く採用されていない。本稿では,任意の画像分類モデルのプラグアンドプレイ拡張であるMemory Wrapを提案する。メモリラップはデータ効率とモデル解釈性の両方を改善し、過去のトレーニングサンプルのメモリと入力の間にコンテントアテンション機構を採用する。メモリラップは、限られたデータ集合から学習すると標準的な分類器よりも優れており、完全なデータセットから学習すると同等のパフォーマンスに達することを示す。本稿では,その構造と内容認識機構が,標準分類器と比較して解釈可能かを論じる。この目的のために,記憶内容に基づいて実例と実例による説明を構築する手法と,その意思決定プロセスに関する洞察を得るためにそれらを活用する方法を示す。我々は,CIFAR10,SVHN,CINIC10という3つの異なるデータセット上で,複数のアーキテクチャを用いて画像分類タスクをテストする。 Due to their black-box and data-hungry nature, deep learning techniques are not yet widely adopted for real-world applications in critical domains, like healthcare and justice. This paper presents Memory Wrap, a plug-and-play extension to any image classification model. Memory Wrap improves both data-efficiency and model interpretability, adopting a content-attention mechanism between the input and some memories of past training samples. We show that Memory Wrap outperforms standard classifiers when it learns from a limited set of data, and it reaches comparable performance when it learns from the full dataset. We discuss how its structure and content-attention mechanisms make predictions interpretable, compared to standard classifiers. To this end, we both show a method to build explanations by examples and counterfactuals, based on the memory content, and how to exploit them to get insights about its decision process. We test our approach on image classification tasks using several architectures on three different datasets, namely CIFAR10, SVHN, and CINIC10.	翻訳日:2021-06-04 16:05:28 公開日:2021-06-01
# (参考訳) 深部生成モデルのための潜時空間再構成 Latent Space Refinement for Deep Generative Models ( http://arxiv.org/abs/2106.00792v1 ) ライセンス: CC BY 4.0	Ramon Winterhalder, Marco Bellagente, Benjamin Nachman	(参考訳) 深層生成モデルは様々な目的のために科学や産業で広く利用されている。一般的な課題は、データ確率密度の正確な暗黙的あるいは明示的な表現を達成することである。最近の提案では、深層生成モデルの学習密度を向上するために分類器重みを用いた。我々は、このアイデアをあらゆる種類の生成モデルに拡張し、反復生成モデリングによる潜在空間の洗練が位相的障害を回避し、精度を向上させる方法を示す。この方法論は、対象モデルが微分不能で、改良前に限界化されなければならない多くの内部潜在次元を持つ場合にも適用される。本稿では,LaSeR(Latent Space Refinement)プロトコルを実例で紹介し,正規化フローと生成逆数ネットワークの組み合わせに着目した。 Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. This methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions which must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on the combinations of Normalizing Flows and Generative Adversarial Networks.	翻訳日:2021-06-04 12:12:28 公開日:2021-06-01
# (参考訳) 機械学習会議のレビュープロセスにおける倫理的課題 Some Ethical Issues in the Review Process of Machine Learning Conferences ( http://arxiv.org/abs/2106.00810v1 ) ライセンス: CC BY 4.0	Alessio Russo	(参考訳) 最近の機械学習コミュニティの成功により、カンファレンスに提出された論文の数が大幅に増加した。この増加は、これらのカンファレンスが使用している現在のレビュープロセスに影響を及ぼすいくつかの問題をより顕著にした。レビュープロセスには科学研究の性質を損なういくつかの問題があり、これは完全に客観的で、政治的で、偏見がなく、不正行為(盗作、不正行為、不適切な影響、その他の不利益など)がない。本研究では,レビュワーの募集問題,二重盲検過程の侵害,不正行為,数値評価におけるバイアス,付録現象(すなわち,論文の付録部に結果を公開することが一般的になっていること)について検討する。これら各問題に対して、簡単な説明と可能な解決策を提供する。この作業の目標は、これらの問題に対する機械学習コミュニティの意識を高めることにある。 Recent successes in the Machine Learning community have led to a steep increase in the number of papers submitted to conferences. This increase made more prominent some of the issues that affect the current review process used by these conferences. The review process has several issues that may undermine the nature of scientific research, which should be fully objective, apolitical, unbiased and free of misconduct (such as plagiarism, cheating, improper influence, and other improprieties). In this work, we study the problem of reviewers' recruitment, infringements of the double-blind process, fraudulent behaviors, biases in numerical ratings, and the appendix phenomenon (i.e., the fact that it is becoming more common to publish results in the appendix section of a paper). For each of these problems, we provide a short description and possible solutions. The goal of this work is to raise awareness in the Machine Learning community regarding these issues.	翻訳日:2021-06-04 11:56:52 公開日:2021-06-01
# (参考訳) iMetコレクション2020のラベルスペースのクリーン化と構造化 Cleaning and Structuring the Label Space of the iMet Collection 2020 ( http://arxiv.org/abs/2106.00815v1 ) ライセンス: CC BY 4.0	Vivien Nguyen and Sunnie S. Y. Kim	(参考訳) iMet 2020データセットは、細粒度アート属性認識の分野で貴重なリソースだが、その真の可能性には達していないと私たちは信じている。我々は、データセットのユニークな特性を文書化し、多くの属性ラベルが、データセット記述によって示唆されるよりもノイズが多いことを観察する。しばしば、ラベル間の意味的関係(例えば、同一性、相互排除、仮定、不確実性との重なり)も、私たちが利用していないと信じている。我々は,iMet 2020ラベルのクリーニングと構造化のアプローチを提案し,その意義と価値について議論する。さらに,提案手法の利点をいくつかの実験により示す。私たちのコードとクリーニングラベルは、https://github.com/sunniesuhyoung/imet2020cleanedで利用可能です。 The iMet 2020 dataset is a valuable resource in the space of fine-grained art attribution recognition, but we believe it has yet to reach its true potential. We document the unique properties of the dataset and observe that many of the attribute labels are noisy, more than is implied by the dataset description. Oftentimes, there are also semantic relationships between the labels (e.g., identical, mutual exclusion, subsumption, overlap with uncertainty) which we believe are underutilized. We propose an approach to cleaning and structuring the iMet 2020 labels, and discuss the implications and value of doing so. Further, we demonstrate the benefits of our proposed approach through several experiments. Our code and cleaned labels are available at https://github.com/sunniesuhyoung/iMet2020cleaned.	翻訳日:2021-06-04 11:50:32 公開日:2021-06-01
# (参考訳) ConvoSumm: 会話要約ベンチマークとArgument Miningによる抽象要約の改善 ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining ( http://arxiv.org/abs/2106.00829v1 ) ライセンス: CC BY 4.0	Alexander R. Fabbri, Faiaz Rahman, Imad Rizvi, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev	(参考訳) オンライン会話は膨大な情報をさまざまな形式でカバーすることができるが、抽象的なテキスト要約は主にニュース記事のモデリングに重点を置いている。この研究のギャップは、部分的にはオンラインの議論を要約するための標準化されたデータセットの欠如によるものだ。このギャップに対処するため、我々は、ニュースコメント、ディスカッションフォーラム、コミュニティ質問応答フォーラム、電子メールスレッドの4つの新しいデータセットをクラウドソースする、課題視点フレームワークによって動機付けられたアノテーションプロトコルを設計する。我々は、データセットの最先端モデルをベンチマークし、データに関連する特徴を分析する。包括的ベンチマークを作成するために、この領域で強力なベースラインを確立するために、広く使われている会話要約データセット上でこれらのモデルを評価する。さらに,会話に含まれる問題や視点,アサーションを直接モデル化するために,グラフ構築による議論マイニングを取り入れ,ノイズ入力をフィルタリングし,自動評価や人間評価による比較や改善結果を示す。 While online conversations can cover a vast amount of information in many different formats, abstractive text summarization has primarily focused on modeling solely news articles. This research gap is due, in part, to the lack of standardized datasets for summarizing online discussions. To address this gap, we design annotation protocols motivated by an issues--viewpoints--assertions framework to crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data. To create a comprehensive benchmark, we also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. Furthermore, we incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations.	翻訳日:2021-06-04 11:38:20 公開日:2021-06-01
# (参考訳) 価格アルゴリズム保険 Pricing Algorithmic Insurance ( http://arxiv.org/abs/2106.00839v1 ) ライセンス: CC BY 4.0	Dimitris Bertsimas, Agni Orfanoudaki	(参考訳) 機械学習アルゴリズムが企業や組織の意思決定プロセスに統合され始めると、保険製品は所有者をリスクから守るために開発される。本稿では, アルゴリズム保険の概念を導入し, 導出保険契約の価格設定を実現するための定量的枠組みを提案する。本稿では,バイナリ分類モデルのリスク露出と価格を推定する最適化式を提案する。本稿では,モデルの性質,すなわち正確性,解釈性,一般化性が保険契約評価に与える影響について概説する。提案手法の実践的実装を示すために,乳がん検出の文脈における医療的誤りの事例研究を行った。本分析は,モデルパラメータが期待される損失に与える影響を計測し,契約の価格に大きく影響するアルゴリズム性能の側面を特定することに焦点を当てる。 As machine learning algorithms start to get integrated into the decision-making process of companies and organizations, insurance products will be developed to protect their owners from risk. We introduce the concept of algorithmic insurance and present a quantitative framework to enable the pricing of the derived insurance contracts. We propose an optimization formulation to estimate the risk exposure and price for a binary classification model. Our approach outlines how properties of the model, such as accuracy, interpretability and generalizability, can influence the insurance contract evaluation. To showcase a practical implementation of the proposed framework, we present a case study of medical malpractice in the context of breast cancer detection. Our analysis focuses on measuring the effect of the model parameters on the expected financial loss and identifying the aspects of algorithmic performance that predominantly affect the price of the contract.	翻訳日:2021-06-04 11:16:18 公開日:2021-06-01
# (参考訳) 項目応答理論によるテストセットの比較 Comparing Test Sets with Item Response Theory ( http://arxiv.org/abs/2106.00840v1 ) ライセンス: CC BY 4.0	Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman	(参考訳) 近年,自然言語理解タスクにおける微調整モデルの性能を評価するために,多くのNLPデータセットが導入された。しかし、大規模な事前訓練されたモデルによる最近の結果は、これらのデータセットの大部分は飽和しており、さらなる進歩を検出することができないことを示している。強力なモデル間での差別化に依然として有効なデータセットは何か、将来の改善を検出できるデータセットはどのようなものか? これをデータセット全体にわたって一様に測定するために、項目応答理論に基づき、個別のテスト例で18の事前学習トランスフォーマーモデルの予測を用いて29のデータセットを評価する。 Quoref、HellaSwag、MC-TACOは最先端のモデルの区別に最適であるのに対して、SNLI、MNLI、CommitmentBankは現在の強力なモデルに飽和しているようだ。また、QAMRやSQuAD2.0のようなQAデータセットに使用されるスパン選択タスク形式は、強いモデルと弱いモデルとの差別化に有効である。 Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks. Recent results from large pretrained models, though, show that many of these datasets are largely saturated and unlikely to be able to detect further progress. What kind of datasets are still effective at discriminating among strong models, and what kind of datasets should we expect to be able to detect future improvements? To measure this uniformly across datasets, we draw on Item Response Theory and evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models. We also observe span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.	翻訳日:2021-06-04 11:15:24 公開日:2021-06-01
# (参考訳) 多変量環境における非線形関係発見のための事前画像の活用 Leveraging Pre-Images to Discover Nonlinear Relationships in Multivariate Environments ( http://arxiv.org/abs/2106.00842v1 ) ライセンス: CC BY 4.0	M. Ali Vosoughi and Axel Wismuller	(参考訳) 因果発見は、接続された点の集合としてのネットワークの推論を超えて、人工知能を用いた科学的発見において重要な機能を提供する。物理学、生理学、不確定な環境における複数のエージェントによる戦略的決定、気候学、その他多くの領域で発生する問題は、因果関係や推論にルーツを持つ。多くの実世界の時間観測が互いに非線形に関連していることが判明した。観測の回数は数百万ポイントにも達するが、時間サンプルの数は倫理的あるいは実践的な理由から最小限に抑えられ、大規模システムにおける次元の呪いにつながる。本稿では,カーネルの主成分分析と事前イメージを用いて,多変量時系列データの非線形依存関係を求める手法を提案する。本手法は, 観測が時間的に制限され, 非線形関係にある場合に, 最先端の因果発見手法よりも優れることを示す。提案手法を評価するために,様々なトポロジを持つ実世界および合成データセットの広範なシミュレーションを行った。 Causal discovery, beyond the inference of a network as a collection of connected dots, offers a crucial functionality in scientific discovery using artificial intelligence. The questions that arise in multiple domains, such as physics, physiology, the strategic decision in uncertain environments with multiple agents, climatology, among many others, have roots in causality and reasoning. It became apparent that many real-world temporal observations are nonlinearly related to each other. While the number of observations can be as high as millions of points, the number of temporal samples can be minimal due to ethical or practical reasons, leading to the curse-of-dimensionality in large-scale systems. This paper proposes a novel method using kernel principal component analysis and pre-images to obtain nonlinear dependencies of multivariate time-series data. We show that our method outperforms state-of-the-art causal discovery methods when the observations are restricted by time and are nonlinearly related. Extensive simulations on both real-world and synthetic datasets with various topologies are provided to evaluate our proposed methods.	翻訳日:2021-06-04 09:29:26 公開日:2021-06-01
# (参考訳) グラフ強化文書表現を用いたパラメータ効率の良いニューラル質問応答モデル Parameter-Efficient Neural Question Answering Models via Graph-Enriched Document Representations ( http://arxiv.org/abs/2106.00851v1 ) ライセンス: CC BY 4.0	Louis Castricato, Stephen Fitz, Won Young Shin	(参考訳) 現代のNLPシステムの計算フットプリントが増加するにつれて、より効率的なモデルに到達することがますます重要になる。グラフ畳み込み文書表現を用いることで、学習可能なパラメータの観点でリソースの5%未満しか使わずに、SOTAソリューションに匹敵し、場合によってはそれを上回る質問応答システムが得られることを示す。現在、GCNをNLPに適用する際の大きな問題は文書表現である。本稿では,GCNで強化した文書表現が,自明なトポロジを用いてもHotPotQAで見られる結果を大幅に改善することを示す。我々のモデル(gQA)は、現在のSOTAと比較しても優れた性能を示し、前処理をほとんど必要としない。Shao et al. (2020) では、著者らはマルチホップQAの性能向上のためにグラフネットワークは必要ないことを示唆した。本稿では,GCNの素朴な実装が事前訓練された言語モデルに基づくSOTAモデルに匹敵する性能を示すことによって,大規模言語モデルは良好な性能に必須ではないことを示唆する。 As the computational footprint of modern NLP systems grows, it becomes increasingly important to arrive at more efficient models. We show that by employing graph convolutional document representation, we can arrive at a question answering system that performs comparably to, and in some cases exceeds the SOTA solutions, while using less than 5% of their resources in terms of trainable parameters. As it currently stands, a major issue in applying GCNs to NLP is document representation. In this paper, we show that a GCN enriched document representation greatly improves the results seen in HotPotQA, even when using a trivial topology. Our model (gQA) performs admirably when compared to the current SOTA, and requires little to no preprocessing. In Shao et al. (2020), the authors suggest that graph networks are not necessary for good performance in multi-hop QA. In this paper, we suggest that large language models are not necessary for good performance by showing a naïve implementation of a GCN performs comparably to SOTA models based on pretrained language models.	翻訳日:2021-06-04 09:18:43 公開日:2021-06-01
# In-Distribution Counterfactuals を用いた社会適応型特徴重要度記述のための検索手法 Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals ( http://arxiv.org/abs/2106.00786v1 ) ライセンス: Link先を確認	Peter Hase, Harry Xie, Mohit Bansal	(参考訳) 特徴重要度(FI)推定は一般的な説明形式であり、テスト時に特定の入力特徴を除去することによって生じるモデル信頼度の変化を計算することで作成・評価されることが多い。例えば、標準的なSufficiencyメトリックでは、最も重要な上位k個のトークンのみが保持される。本稿では,FIベース説明の未検討の側面をいくつか検討し,この説明形式に対する概念的および経験的改善を示す。まず、説明の作成や評価において入力から特徴を取り除くことがなぜ問題になりうるのかについて、新たな議論を提示する。すなわち、これらの反事実入力がモデルにとって分布外(OOD)であるという事実は、結果として得られる説明が社会的に不整合であることを意味する。問題の核心は、モデルの事前分布とランダムな重み初期化が、意図しない形で説明(および説明メトリクス)に影響を与えることである。この問題を解決するために、モデルトレーニングプロセスへの簡単な変更を提案し、これにより社会的により整合した説明とメトリクスが得られる。第2に,モデル入力から特徴を取り除くための5つのアプローチを比較する。いくつかの手法は他の手法よりも多くのOOD反事実を生成することがわかり,特徴置換関数の選択について推奨を行う。最後に,FI説明を特定するための検索ベース手法を4つ導入し,LIME,Integrated Gradients,ランダム検索などの強力なベースラインと比較する。6つの多様なテキスト分類データセットを用いた実験では、ランダム検索を一貫して上回る手法は、我々が導入する並列局所探索のみであることがわかった。2番目に優れた手法に対する改善は、Sufficiencyで最大5.4ポイント、Comprehensivenessで最大17ポイントである。サポートコードはすべてhttps://github.com/peterbhase/ExplanationSearchで公開されている。 Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways. To resolve this issue, we propose a simple alteration to the model training process, which results in more socially aligned explanations and metrics. Second, we compare among five approaches for removing features from model inputs. We find that some methods produce more OOD counterfactuals than others, and we make recommendations for selecting a feature-replacement function. Finally, we introduce four search-based methods for identifying FI explanations and compare them to strong baselines, including LIME, Integrated Gradients, and random search. On experiments with six diverse text classification datasets, we find that the only method that consistently outperforms random search is a Parallel Local Search that we introduce. Improvements over the second-best method are as large as 5.4 points for Sufficiency and 17 points for Comprehensiveness. All supporting code is publicly available at https://github.com/peterbhase/ExplanationSearch.	翻訳日:2021-06-03 14:52:15 公開日:2021-06-01
# 不変政策学習:因果的視点 Invariant Policy Learning: A Causal Perspective ( http://arxiv.org/abs/2106.00808v1 ) ライセンス: Link先を確認	Sorawit Saengkyongam, Nikolaj Thams, Jonas Peters and Niklas Pfister	(参考訳) 過去10年間で、オンライン広告、レコメンダシステム、動的価格設定などの様々なインタラクティブな学習システムにおいて、文脈的バンディットと強化学習アルゴリズムがうまく使われてきた。しかし、医療などのリスクの高い応用領域では、まだ広く採用されていない。一つの理由は、既存のアプローチが、基盤となるメカニズムは時間や環境が変わっても変化しないという意味で静的であると仮定しているからかもしれない。しかし、多くの現実世界のシステムでは、メカニズムは環境間でシフトし、静的環境の仮定が成り立たなくなる可能性がある。本稿では,オフライン文脈的バンディットの枠組みの下で環境シフト問題に取り組む。我々は,因果関係の観点から環境シフト問題を捉え,基盤メカニズムの変化を許容するマルチ環境文脈的バンディットを提案する。因果関係の文献から不変性の概念を採用し,政策不変性の概念を導入する。政策不変性は未観測の交絡因子が存在する場合にのみ意味を持つことを論じ、その場合、ある仮定の下で最適な不変ポリシーが環境間で一般化することが保証されることを示す。本研究の結果は,環境シフト問題に対する解決策を提供するだけでなく,因果関係,不変性,文脈的バンディットの間の具体的な関係を確立する。 In the past decade, contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over time or over different environments. In many real world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we tackle the problem of environmental shifts under the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved confounders are present and show that, in that case, an optimal invariant policy is guaranteed, under certain assumptions, to generalize across environments. Our results not only provide a solution to the environmental shift problem but also establish concrete connections among causality, invariance and contextual bandits.	翻訳日:2021-06-03 14:49:35 公開日:2021-06-01
# 不確かさ特性曲線:予測間隔の体系的評価 Uncertainty Characteristics Curves: A Systematic Assessment of Prediction Intervals ( http://arxiv.org/abs/2106.00858v1 ) ライセンス: Link先を確認	Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri	(参考訳) モデル不確実性の正確な定量化は、信頼できるAIの基本的な要件として長年認識されてきた。回帰タスクでは、不確実性は通常、特定の操作点に調整された予測間隔を用いて定量化され、異なる研究における評価と比較が困難になる。本研究は,(1)操作特性曲線の概念,(2)単純な参照よりも利得の概念を活用して,予測間隔に対する新たな操作点非依存評価手法を導出する。本稿では, 対応するアルゴリズムを記述し, 理論的解析を行い, 複数のシナリオでその有用性を実証する。提案手法は予測間隔の包括的評価の必要性に対処し,不確実性定量化ツールボックスの付加価値を示すものである。 Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to a specific operating point, making evaluation and comparison across different studies difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a simple reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper describes the corresponding algorithm, provides a theoretical analysis, and demonstrates its utility in multiple scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.	翻訳日:2021-06-03 14:49:16 公開日:2021-06-01
# QLSD:ベイズ連合学習のための量子化Langevin確率力学 QLSD: Quantised Langevin stochastic dynamics for Bayesian federated learning ( http://arxiv.org/abs/2106.00797v1 ) ライセンス: Link先を確認	Maxime Vono, Vincent Plassier, Alain Durmus, Aymeric Dieuleveut, Eric Moulines	(参考訳) 連合学習は、データが分散化され、複数のクライアントにローカルに保存されている場合に、データオーナシップと通信オーバーヘッドという2つの主な制約の下で推論を実行することを目的としている。本稿では,これらの問題をベイズパラダイムのもとで扱う。この目的のために,確率勾配ランジュバンダイナミクスの量子化バージョンを基盤とした,QLSDと呼ぶ新しいマルコフ連鎖モンテカルロアルゴリズムを提案する。ビッグデータ環境での性能向上のために,QLSD*およびQLSD++と呼ぶ,本手法の分散低減版を導入する。我々は,提案アルゴリズムの非漸近収束保証と漸近収束保証の両方を提供し,その利点を複数の連合学習ベンチマークで示す。 Federated learning aims at conducting inference when data are decentralised and locally stored on several clients, under two main constraints: data ownership and communication overhead. In this paper, we address these issues under the Bayesian paradigm. To this end, we propose a novel Markov chain Monte Carlo algorithm coined QLSD built upon quantised versions of stochastic gradient Langevin dynamics. To improve performance in a big data regime, we introduce variance-reduced alternatives of our methodology referred to as QLSD* and QLSD++. We provide both non-asymptotic and asymptotic convergence guarantees for the proposed algorithms and illustrate their benefits on several federated learning benchmarks.	翻訳日:2021-06-03 14:48:37 公開日:2021-06-01
# ポリシーに基づく強化学習のためのエントロピー正規化自由機構 An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning ( http://arxiv.org/abs/2106.00707v1 ) ライセンス: Link先を確認	Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng	(参考訳) ポリシーに基づく強化学習手法は、ポリシー崩壊問題に苦しむ。我々は,ε-greedy機構を用いた価値ベースの強化学習手法が,Closed-form Diversity(閉形式の多様性)、Objective-invariant Exploration(目的不変な探索)、Adaptive Trade-off(適応的トレードオフ)という3つの特性を持ちうること、そしてこれらが価値ベース手法のポリシー崩壊問題の回避に役立つことを見いだした。しかし、3つの特性をすべて達成する、ポリシーベース手法向けの同等の機構は存在しない。本稿では,閉形式の多様性,目的不変な探索,適応的トレードオフを実現する、ポリシーベース手法のために設計されたエントロピー正規化を用いない機構を提案する。実験の結果,本機構はポリシーベース手法において極めてサンプル効率が高く,ポリシーベースのベースラインをArcade Learning Environmentにおける新たな最先端性能へと引き上げることがわかった。 Policy-based reinforcement learning methods suffer from the policy collapse problem. We find value-based reinforcement learning methods with the ε-greedy mechanism are capable of enjoying three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there does not exist a parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism that is designed for policy-based methods, which achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is super sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.	翻訳日:2021-06-03 14:46:15 公開日:2021-06-01
# 難しいNLUデータ収集タスクに有効なクラウドソーシングプロトコルの要素とは What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks? ( http://arxiv.org/abs/2106.00794v1 ) ライセンス: Link先を確認	Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman	(参考訳) クラウドソーシングは、一般的な自然言語理解タスクのためのデータを作成するために広く使われている。これらのデータセットは、モデルの言語理解の測定と改良において重要であるにもかかわらず、データセットの収集に使用されるクラウドソーシング手法にはほとんど注意が払われてこなかった。本稿では,データ品質を向上させる方法として先行研究で提案された介入の有効性を比較する。多肢選択式質問応答をテストベッドとして使用し、4つの異なるデータ収集プロトコルのいずれかにクラウドワーカーを割り当てて質問を書かせるランダム化試験を行う。その結果、作成した例に対する説明をワーカーに書かせることは、NLU例の難易度を高めるための単独の戦略としては効果がないことがわかった。一方、クラウドワーカーを訓練したうえで、データ収集、フィードバックの送付、専門家の判断に基づくワーカーの資格認定という反復的なプロセスを用いることは、難易度の高いデータを収集する有効な手段であることがわかった。しかし、専門家の判断の代わりにクラウドソーシングによる判断を用いてワーカーを認定しフィードバックを送ることは、有効とは言えなかった。専門家評価を伴う反復的プロトコルから得られたデータは、いくつかの尺度でより難易度が高い。特に、このデータのうち全員の判断が一致した部分における人間とモデルの性能差は、平均して、ベースラインプロトコルのデータにおける差の2倍である。 Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used for collecting the datasets. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial by assigning crowdworkers to write questions under one of four different data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data. But using crowdsourced, instead of expert judgments, to qualify workers and send feedback does not prove to be effective. We observe that the data from the iterative protocol with expert assessments is more challenging by several measures. Notably, the human-model gap on the unanimous agreement portion of this data is, on average, twice as large as the gap for the baseline protocol data.	翻訳日:2021-06-03 14:44:41 公開日:2021-06-01
# 協調型非定常多変量ガウス過程モデル Collaborative Nonstationary Multivariate Gaussian Process Model ( http://arxiv.org/abs/2106.00719v1 ) ライセンス: Link先を確認	Rui Meng, Herbie Lee, Kristofer Bouchard	(参考訳) 現在、多出力ガウス過程回帰モデルは、非定常性をモデル化しないか、あるいは厳しい計算負荷とストレージ要求を伴うかのいずれかである。非定常多変量ガウス過程モデル (NMGP) は、入力依存のコリージョナリゼーション線形モデルを持つ非定常共分散関数を用いて、出力の入力依存の相関、スケール、滑らかさを同時にモデル化する。変分スパース近似は、誘導点に依存することでスケーラブルな計算を可能にする。ここでは両者の長所を取り入れ、NMGPの潜在関数に対する誘導変数の枠組みを考えることで、協調的非定常ガウス過程モデル(CNMGP)と呼ぶ新しいモデルを提案する。CNMGPに対して、二重確率的変分推論に適した、計算可能な変分下界を導出する。これらにより、出力が共通の入力集合を共有していないデータを、入力と出力のサイズに依存しない計算複雑性でモデル化することができる。合成データと3つの実データセットで本手法の性能を示し、本モデルが概して最先端手法よりも優れた予測性能を示すとともに、出力間で異なる時間変動相関の推定値を提供することを示す。 Currently, multi-output Gaussian process regression models either do not model nonstationarity or are associated with severe computational burdens and storage demands. Nonstationary multi-variate Gaussian process models (NMGP) use a nonstationary covariance function with an input-dependent linear model of coregionalisation to jointly model input-dependent correlation, scale, and smoothness of outputs. Variational sparse approximation relies on inducing points to enable scalable computations. Here, we take the best of both worlds: considering an inducing variable framework on the underlying latent functions in NMGP, we propose a novel model called the collaborative nonstationary Gaussian process model (CNMGP). For CNMGP, we derive computationally tractable variational bounds amenable to doubly stochastic variational inference. Together, this allows us to model data in which outputs do not share a common input set, with a computational complexity that is independent of the size of the inputs and outputs. We illustrate the performance of our method on synthetic data and three real datasets and show that our model generally provides better predictive performance than the state-of-the-art, and also provides estimates of time-varying correlations that differ across outputs.	翻訳日:2021-06-03 14:43:41 公開日:2021-06-01
# 深層学習コンテストにおける反省会--シンプソンのパラドックスとスケールメトリクスと形状メトリクスの相補的役割 Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics ( http://arxiv.org/abs/2106.00734v1 ) ライセンス: Link先を確認	Charles H. Martin and Michael W. Mahoney	(参考訳) 最先端ニューラルネットワーク(NN)モデルにおける優れた一般化性能の原因をよりよく理解するために,NNの一般化精度を予測するコンテスト向けに公開されたモデルのコーパスを分析する。これらのモデルには幅広い品質のものが含まれ、様々なアーキテクチャと正則化ハイパーパラメータで訓練されている。我々はシンプソンのパラドックスに相当する現象を見いだした。すなわち、(従来の統計的学習理論に由来する)「スケール」指標は全体としては良好に機能するが、正則化ハイパーパラメータを変化させた場合、所定の深さのデータの部分集合では性能が低い。一方、(Heavy-Tailed Self Regularization理論に由来する)「形状」指標は、所定の深さのモデルでハイパーパラメータを変化させた場合にはデータの部分集合で良好に機能するが、深さの異なるモデルを集約すると全体としては性能が低い。この結果は、アーキテクチャとハイパーパラメータの両方が異なる場合にモデルを比較することの微妙さと、NNモデル品質の理解における暗黙的スケールパラメータと暗黙的形状パラメータの相補的役割を浮き彫りにする。また、集約データに単一の指標を適用して因果的洞察を抽出しようとする際には注意が必要であることを示唆し、さらに、最先端のNNモデルの性能を記述するには、一般化理論の上界に基づく画一的な指標を超える必要があることを強調する。これらの結果に基づき、データ非依存とデータ依存の2つの新しい形状指標を示す。これらは、ソルバのハイパーパラメータを変化させた際に、固定されたアーキテクチャ/深さを持つ一連のNNのテスト精度の傾向を予測できる。 To understand better the causes of good generalization performance in state-of-the-art neural network (NN) models, we analyze a corpus of models that was made publicly available for a contest to predict the generalization accuracy of NNs. These models include a wide range of qualities and were trained with a range of architectures and regularization hyperparameters. We identify what amounts to a Simpson's paradox: where "scale" metrics (from traditional statistical learning theory) perform well overall but perform poorly on subpartitions of the data of a given depth, when regularization hyperparameters are varied; and where "shape" metrics (from Heavy-Tailed Self Regularization theory) perform well on subpartitions of the data, when hyperparameters are varied for models of a given depth, but perform poorly overall when models with varying depths are aggregated. Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied, as well as the complementary role of implicit scale versus implicit shape parameters in understanding NN model quality. Our results also suggest caution when one tries to extract causal insight with a single metric applied to aggregate data, and they highlight the need to go beyond one-size-fits-all metrics based on upper bounds from generalization theory to describe the performance of state-of-the-art NN models. Based on these findings, we present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs, of a fixed architecture/depth, when varying solver hyperparameters.	翻訳日:2021-06-03 14:43:19 公開日:2021-06-01
# 時間近傍符号化による時系列の教師なし表現学習 Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding ( http://arxiv.org/abs/2106.00750v1 ) ライセンス: Link先を確認	Sana Tonekaboni, Danny Eytan, Anna Goldenberg	(参考訳) 時系列はしばしば複雑で情報に富んでいるが、わずかにラベル付けされているためモデル化が難しい。本稿では,非定常時系列の一般化表現を学習するための自己教師付きフレームワークを提案する。我々の手法は、TNC(Temporal Neighborhood Coding)と呼ばれ、信号の生成過程の局所的滑らかさを利用して、定常特性のある近傍を定義する。偏りのある対比目的を用いて, 符号化空間において, 近傍からの信号の分布と非隣接信号の分布を区別できることを保証することにより, 時系列表現を学習する。我々のモチベーションは、時系列データのダイナミックな性質をモデル化する能力が、ラベル付けデータが事実上不可能な環境で患者の潜伏状態を特定し、追跡し、予測するのに特に有用である医療分野に起因している。提案手法と最近開発された教師なし表現学習手法を比較し,クラスタリングおよび複数のデータセットの分類タスクにおける優れた性能を示す。 Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.	翻訳日:2021-06-03 14:42:51 公開日:2021-06-01
# 入力凸ニューラルネットワークを用いた確率空間上の関数の最適化 Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks ( http://arxiv.org/abs/2106.00774v1 ) ライセンス: Link先を確認	David Alvarez-Melis, Yair Schiff, Youssef Mroueh	(参考訳) グラディエントフローは、ワッサーシュタイン計量によって与えられる確率の空間を含む一般計量空間における関数を最適化するための強力なツールである。この最適化問題を解決する典型的なアプローチは、最適輸送の動的定式化と有名なjordan-kinderlehrer-otto (jko) スキームとの関係に依存している。しかし、この定式化は凸関数の最適化を伴い、特に高次元では困難である。本研究では,最近導入された入力凸ニューラルネットワーク(ICNN)を用いて,JKOスキームを近似するために凸関数の空間をパラメータ化し,収束保証を享受する尺度よりも関数を設計する手法を提案する。我々は、このJKO-ICNNフレームワークの計算効率の良い実装を導き、その実現可能性と、既知の解を用いた低次元偏微分方程式の近似解の妥当性と妥当性を示す。また、分子発見のための制御生成実験により、JKO-ICNNアプローチを高次元に利用することについても検討する。 Gradient flows are a powerful tool for optimizing functionals in general metric spaces, including the space of probabilities endowed with the Wasserstein metric. A typical approach to solving this optimization problem relies on its connection to the dynamic formulation of optimal transport and the celebrated Jordan-Kinderlehrer-Otto (JKO) scheme. However, this formulation involves optimization over convex functions, which is challenging, especially in high dimensions. In this work, we propose an approach that relies on the recently introduced input-convex neural networks (ICNN) to parameterize the space of convex functions in order to approximate the JKO scheme, as well as in designing functionals over measures that enjoy convergence guarantees. We derive a computationally efficient implementation of this JKO-ICNN framework and use various experiments to demonstrate its feasibility and validity in approximating solutions of low-dimensional partial differential equations with known solutions. We also explore the use of our JKO-ICNN approach in high dimensions with an experiment in controlled generation for molecular discovery.	翻訳日:2021-06-03 14:40:54 公開日:2021-06-01
# 機械学習のための重み付けベクトル:境界検出に適用した数値調和解析 Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection ( http://arxiv.org/abs/2106.00827v1 ) ライセンス: Link先を確認	Eric Bunch, Jeffery Kline, Daniel Dickinson, Suhaas Bhat, Glenn Fung	(参考訳) 代数トポロジーで活発に研究されている計量空間のマグニチュード(metric space magnitude)は、一般の計量空間に存在する異なる点の有効個数を要約するスカラー量である。重み付けベクトル(weighting vector)はこれと密接に関連する概念であり、元の計量空間の基礎となる幾何学の多くを非自明な形で捉える。最近の研究は、計量空間がユークリッドであるとき、重み付けベクトルが境界検出の有効なツールとなることを示した。我々はこの結果を再定式化し、重み付けベクトルをカーネル化されたSVMの解と見なせることを示す。その帰結の一つとして、この新たな知見を外れ値検出タスクに適用し、ベンチマークデータセットにおいて最先端技術と同等かそれを上回る性能を示す。穏やかな仮定の下で、逆行列計算のコストを要する重み付けベクトルを線形時間で効率的に近似できることを示す。また、SVMが定義する最小化問題の解を、最近傍法がどのように近似できるかを示す。 Metric space magnitude, an active field of research in algebraic topology, is a scalar quantity that summarizes the effective number of distinct points that live in a general metric space. The weighting vector is a closely-related concept that captures, in a nontrivial way, much of the underlying geometry of the original metric space. Recent work has demonstrated that when the metric space is Euclidean, the weighting vector serves as an effective tool for boundary detection. We recast this result and show the weighting vector may be viewed as a solution to a kernelized SVM. As one consequence, we apply this new insight to the task of outlier detection, and we demonstrate performance that is competitive with or exceeds performance of state-of-the-art techniques on benchmark data sets. Under mild assumptions, we show the weighting vector, which has computational cost of matrix inversion, can be efficiently approximated in linear time. We show how nearest neighbor methods can approximate solutions to the minimization problems defined by SVMs.	翻訳日:2021-06-03 14:40:36 公開日:2021-06-01
# ニューラル言語モデルにおける意味の暗黙的表現 Implicit Representations of Meaning in Neural Language Models ( http://arxiv.org/abs/2106.00737v1 ) ライセンス: Link先を確認	Belinda Z. Li, Maxwell Nye, Jacob Andreas	(参考訳) ニューラル言語モデルの有効性は、表層的な単語共起統計の正確なモデリングのみに由来するのか、それとも、これらのモデルは自らが記述する世界を表現し、それについて推論しているのだろうか。BARTおよびT5のトランスフォーマー言語モデルにおいて、談話を通じて変化していくエンティティや状況のモデルとして機能する文脈的単語表現を特定する。これらのニューラル表現は、動的意味論の言語学的モデルと機能的な類似性を持つ。すなわち、各エンティティの現在の特性と関係の線形な読み出しをサポートし、操作すると言語生成に予測可能な効果を及ぼす。この結果は、事前学習されたニューラル言語モデルの予測が、少なくとも部分的には、意味の動的表現とエンティティ状態の暗黙的シミュレーションによって支えられていること、そしてこの振る舞いがテキストのみを学習データとして学習できることを示している。コードとデータはhttps://github.com/belindal/state-probesで入手できる。 Does the effectiveness of neural language models derive entirely from accurate modeling of surface word co-occurrence statistics, or do these models represent and reason about the world they describe? In BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity's current properties and relations, and can be manipulated with predictable effects on language generation. Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data. Code and data are available at https://github.com/belindal/state-probes.	翻訳日:2021-06-03 14:37:37 公開日:2021-06-01
# DYPLOC:テキスト生成のための混合言語モデルを用いたコンテンツの動的計画 DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation ( http://arxiv.org/abs/2106.00791v1 ) ライセンス: Link先を確認	Xinyu Hua, Ashwin Sreevatsa, and Lu Wang	(参考訳) 我々は,少なくとも2つの異なる課題に直面する長文意見テキスト生成の課題について検討する。まず、既存のニューラルジェネレーションモデルはコヒーレンスに欠けており、効率的なコンテンツプランニングが必要である。第二に、主観的コンテンツと客観的コンテンツの両方をカバーするようにジェネレータを導くには、多様な種類の情報が必要である。そこで本研究では,混合言語モデルの新たな設計に基づく出力生成をしながら,コンテンツの動的計画を行う生成フレームワークdyplocを提案する。多様なコンテンツで生成を豊かにするために、より大規模な事前学習モデルを用いて関連する概念を予測し、クレームを生成することを提案する。我々は,(1)Reddit ChangeMyViewを用いた引数生成,(2)New York Timesのオピニオンセクションを用いた記事作成という,新たに収集されたデータセットに関する2つの課題を実験した。自動評価は,本モデルが競合比較を著しく上回ることを示す。人間の審査員は、われわれの世代がよりリッチなコンテンツに忠実であることをさらに確認する。 We study the task of long-form opinion text generation, which faces at least two distinct challenges. First, existing neural generation models fall short of coherence, thus requiring efficient content planning. Second, diverse types of information are needed to guide the generator to cover both subjective and objective content. To this end, we propose DYPLOC, a generation framework that conducts dynamic planning of content while generating the output based on a novel design of mixed language models. To enrich the generation with diverse content, we further propose to use large pre-trained models to predict relevant concepts and to generate claims. We experiment with two challenging tasks on newly collected datasets: (1) argument generation with Reddit ChangeMyView, and (2) writing articles using New York Times' Opinion section. Automatic evaluation shows that our model significantly outperforms competitive comparisons. Human judges further confirm that our generations are more coherent with richer content.	翻訳日:2021-06-03 14:37:22 公開日:2021-06-01
# CoRI:オープン情報抽出のためのデータ拡張と集合関係の統合 CoRI: Collective Relation Integration with Data Augmentation for Open Information Extraction ( http://arxiv.org/abs/2106.00793v1 ) ライセンス: Link先を確認	Zhengbao Jiang, Jialong Han, Bunyamin Sisman, Xin Luna Dong	(参考訳) Webから抽出された知識を知識グラフ(KG)に統合することで、質問応答のような作業が容易になる。本研究では,対象kgにおける関係関係に対する主観-関係-対象抽出における自由テキスト関係の整合を目的とした関係統合について検討する。自由テキスト関係が曖昧であるという課題に対処するために、以前の方法は、追加の文脈で隣のエンティティとリレーションを利用する。しかし、予測は独立して行われ、相互に矛盾する可能性がある。本稿では,第1段階が個別に候補予測を行い,第2段階が全ての候補予測にアクセスしてグローバルにコヒーレントな予測を行う2段階集団関係統合(cori)モデルを提案する。さらに、未使用のターゲットKGの一部からデータを付加することで、集合モデルをさらに改善する。 2つのデータセットの実験結果から、CoRIはベースラインを大幅に上回り、AUCは.677から.748に、AUCは.716から.780に改善された。 Integrating extracted knowledge from the Web to knowledge graphs (KGs) can facilitate tasks like question answering. We study relation integration that aims to align free-text relations in subject-relation-object extractions to relations in a target KG. To address the challenge that free-text relations are ambiguous, previous methods exploit neighbor entities and relations for additional context. However, the predictions are made independently, which can be mutually inconsistent. We propose a two-stage Collective Relation Integration (CoRI) model, where the first stage independently makes candidate predictions, and the second stage employs a collective model that accesses all candidate predictions to make globally coherent predictions. We further improve the collective model with augmented data from the portion of the target KG that is otherwise unused. Experiment results on two datasets show that CoRI can significantly outperform the baselines, improving AUC from .677 to .748 and from .716 to .780, respectively.	翻訳日:2021-06-03 14:37:06 公開日:2021-06-01
# 英語以外のクレームマッチングによるグローバルなファクトチェックのスケールアップ Claim Matching Beyond English to Scale Global Fact-Checking ( http://arxiv.org/abs/2106.00853v1 ) ライセンス: Link先を確認	Ashkan Kazemi, Kiran Garimella, Devin Gaffney and Scott A. Hale	(参考訳) 手動の事実チェックは、インターネットのニーズを満たすためにうまくスケールしない。この問題は英語以外の文脈でさらに複雑になる。本稿では,ファクトチェックをスケールする手段として,クレームマッチングについて論じる。我々は、クレームマッチングを、1つのファクトチェックで提供可能なクレームを含むテキストメッセージのペアを特定するタスクとして定義する。我々は、WhatsAppのチップラインと公開グループメッセージのデータセットを、ファクトチェックされたクレームとともに構築し、最初に“claim-like statement”を含むアノテートされ、潜在的に類似したアイテムとマッチし、クレームマッチングのためのアノテートされる。我々のデータセットには、高リソース(英語、ヒンディー語)と低リソース(ベンガル語、マラヤラム語、タミル語)のコンテンツが含まれています。データセット内の低リソース言語と高リソース言語間の品質の不均衡に対処するため、知識の蒸留と高品質な"教師"モデルを使って、独自の組込みモデルをトレーニングします。本稿では,本ソリューションの性能評価を行い,ベースラインと既存の多言語埋め込みモデルであるLASERとLaBSEと比較する。すべての設定において、私たちのパフォーマンスがLASERとLaBSEを超えていることを示します。アノテーション付きデータセット、コードブック、トレーニングされた埋め込みモデルをリリースし、さらなる研究を可能にします。 Manual fact-checking does not scale well to serve the needs of the internet. This issue is further compounded in non-English contexts. In this paper, we discuss claim matching as a possible solution to scale fact-checking. We define claim matching as the task of identifying pairs of textual messages containing claims that can be served with one fact-check. We construct a novel dataset of WhatsApp tipline and public group messages alongside fact-checked claims that are first annotated for containing "claim-like statements" and then matched with potentially similar items and annotated for claim matching. Our dataset contains content in high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages. We train our own embedding model using knowledge distillation and a high-quality "teacher" model in order to address the imbalance in embedding quality between the low- and high-resource languages in our dataset. We provide evaluations on the performance of our solution and compare with baselines and existing state-of-the-art multilingual embedding models, namely LASER and LaBSE. 
We demonstrate that our performance exceeds LASER and LaBSE in all settings. We release our annotated datasets, codebooks, and trained embedding model to allow for further research.	翻訳日:2021-06-03 14:36:48 公開日:2021-06-01
# 小さなトレーニングハイパースペクトルデータを用いた高密度森林における樹木種マッピングのためのマルチタスク完全畳み込みネットワーク Multi-task fully convolutional network for tree species mapping in dense forests using small training hyperspectral data ( http://arxiv.org/abs/2106.00799v1 ) ライセンス: Link先を確認	Laura Elena Cué La Rosa, Camile Sothe, Raul Queiroz Feitosa, Cláudia Maria de Almeida, Marcos Benedito Schimalski, Dario Augusto Borges Oliveira	(参考訳) 本研究は,UAV搭載ハイパースペクトルデータを用い,疎で乏しいポリゴンレベルのアノテーションから密林の樹種マッピングを行うマルチタスク完全畳み込みアーキテクチャを提案する。本モデルは, 非高密度のトレーニングサンプルから高密度な樹木セマンティックラベリング結果を得られるようにする部分損失関数と, 樹冠境界制約を強制しモデル性能を大幅に改善する距離回帰の補完タスクを実装している。我々のマルチタスクアーキテクチャは、両タスクに共通の表現を学習する共有バックボーンネットワークと、2つのタスク固有のデコーダを用いており、ひとつはセマンティックセグメンテーション出力、もう一つは距離マップ回帰のためのものである。補完タスクの導入により, 単一タスクの場合と比べてセマンティックセグメンテーション性能が最大10%向上し, 総合F1スコア87.5%, 総合精度85.9%に達して, 熱帯林の樹種分類において最先端の性能を達成したことを報告する。 This work proposes a multi-task fully convolutional architecture for tree species mapping in dense forests from sparse and scarce polygon-level annotations using hyperspectral UAV-borne data. Our model implements a partial loss function that enables dense tree semantic labeling outcomes from non-dense training samples, and a distance regression complementary task that enforces tree crown boundary constraints and substantially improves the model performance. Our multi-task architecture uses a shared backbone network that learns common representations for both tasks and two task-specific decoders, one for the semantic segmentation output and one for the distance map regression. We report that introducing the complementary task boosts the semantic segmentation performance compared to the single-task counterpart by up to 10%, reaching an overall F1 score of 87.5% and an overall accuracy of 85.9%, achieving state-of-the-art performance for tree species classification in tropical forests.	翻訳日:2021-06-03 14:29:19 公開日:2021-06-01
# nnDetection:医用物体検出のための自己設定手法 nnDetection: A Self-configuring Method for Medical Object Detection ( http://arxiv.org/abs/2106.00817v1 ) ライセンス: Link先を確認	Michael Baumgartner, Paul F. Jaeger, Fabian Isensee, Klaus H. Maier-Hein	(参考訳) 医用画像における物体の位置特定と分類を同時に行うことは、医用物体検出とも呼ばれ、診断上の判断が例えばピクセルではなく物体の評価に依存することが多いため、高い臨床的関連性を有する。このタスクでは、手法の構成という面倒で反復的なプロセスが大きな研究ボトルネックとなる。近年、nnU-Netは画像分割のタスクにおいてこの課題に取り組み、大きな成功を収めている。 nnU-Netのアジェンダに従って,医用物体検出の構成プロセスを体系化し,自動化する。結果として得られる自己設定手法であるnnDetectionは、手動による介入なしに任意の医用検出問題に適応し、最先端と同等かそれ以上の結果を達成する。我々は,ADAM と LUNA16 の2つの公開ベンチマークにおいて nnDetection の有効性を実証し,包括的な手法評価のために,公開データセット上のさらに10の医用物体検出タスクを提案する。コードはhttps://github.com/MIC-DKFZ/nnDetectionにある。 Simultaneous localisation and categorization of objects in medical images, also referred to as medical object detection, is of high clinical relevance because diagnostic decisions often depend on rating of objects rather than e.g. pixels. For this task, the cumbersome and iterative process of method configuration constitutes a major research bottleneck. Recently, nnU-Net has tackled this challenge for the task of image segmentation with great success. Following nnU-Net's agenda, in this work we systematize and automate the configuration process for medical object detection. The resulting self-configuring method, nnDetection, adapts itself without any manual intervention to arbitrary medical detection problems while achieving results on par with or superior to the state-of-the-art. We demonstrate the effectiveness of nnDetection on two public benchmarks, ADAM and LUNA16, and propose 10 further medical object detection tasks on public data sets for comprehensive method evaluation. Code is at https://github.com/MIC-DKFZ/nnDetection .	翻訳日:2021-06-03 14:29:00 公開日:2021-06-01
# 大規模ワッサースタイン勾配流れ Large-Scale Wasserstein Gradient Flows ( http://arxiv.org/abs/2106.00736v1 ) ライセンス: Link先を確認	Petr Mokrov, Alexander Korotin, Lingxiao Li, Aude Genevay, Justin Solomon, Evgeny Burnaev	(参考訳) ワッサーシュタイン勾配流は多くの拡散方程式を理解・解く強力な手段を提供する。具体的には、確率測度の拡散をモデル化するフォッカー・プランク方程式は、ワッサーシュタイン空間におけるエントロピー汎函数の勾配勾配として理解することができる。この同値性はjordan、kinderlehrer、ottoによって導入され、いわゆるjkoスキームに触発され、ワッサーシュタイン空間の勾配流の暗黙の離散化を通じてこれらの拡散過程を近似した。しかし、各JKOステップに関連する最適化問題を解くことは、深刻な計算上の課題をもたらす。機械学習アプリケーションを対象として,Wasserstein勾配流を近似するスケーラブルな手法を提案する。提案手法は, 確率勾配降下により最適化できるJKOステップを識別するために, 入力凸ニューラルネットワーク(ICNN)に依存する。従来の研究と異なり, この手法では領域離散化や粒子シミュレーションは不要である。その結果、拡散の各時間ステップにおける測度からサンプルを採取し、その確率密度を計算することができる。フォッカー・プランク方程式に従う拡散を計算し,非正規化密度サンプリングや非線形フィルタリングに適用することにより,アルゴリズムの性能を実証する。 Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated to each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neural networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. Unlike previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. 
We demonstrate our algorithm's performance by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.	翻訳日:2021-06-03 14:26:11 公開日:2021-06-01
# ICDAR 2021 オンライン署名検証に関するコンペティション ICDAR 2021 Competition on On-Line Signature Verification ( http://arxiv.org/abs/2106.00739v1 ) ライセンス: Link先を確認	Ruben Tolosana, Ruben Vera-Rodriguez, Carlos Gonzalez-Garcia, Julian Fierrez, Santiago Rengifo, Aythami Morales, Javier Ortega-Garcia, Juan Carlos Ruiz-Garcia, Sergio Romero-Tapiador, Jiajia Jiang, Songxuan Lai, Lianwen Jin, Yecheng Zhu, Javier Galbally, Moises Diaz, Miguel Angel Ferrer, Marta Gomez-Barrero, Ilya Hodashinsky, Konstantin Sarin, Artem Slezkin, Marina Bardamova, Mikhail Svetlakov, Mohammad Saleem, Cintia Lia Sz\"ucs, Bence Kovari, Falk Pulsmeyer, Mohamad Wehbi, Dario Zanca, Sumaiya Ahmad, Sarthak Mishra and Suraiya Jabin	(参考訳) 本稿では,オンライン署名検証 (SVC 2021) に関する ICDAR 2021 コンペティションの枠組みと成果について述べる。 SVC 2021の目標は、一般的なシナリオ(オフィス/モバイル)におけるオンライン署名検証システムの限界を評価し、大規模なパブリックデータベースを通じて入力(スタイラス/フィンガー)を書くことである。競技では3つの異なるタスクが考慮され、各タスクにランダムと熟練した偽造が同時に考慮されるように、現実的なシナリオをシミュレートする。 svc 2021で得られた結果は,深層学習手法の可能性が高いことを証明した。特に、SVC 2021の最良のオンライン署名検証システムでは、EERの値は3.33%(Task 1)、 7.41%(Task2)、 6.04%(Task3)である。 SVC 2021は、現在進行中のコンペティションとして確立され、DeepSignDBやSVC2021_EvalDBといった大規模公開データベースと標準実験プロトコルを使用して、オープンな共通プラットフォームにおける最先端技術に対して、システムを簡単にベンチマークすることができる。 This paper describes the experimental framework and results of the ICDAR 2021 Competition on On-Line Signature Verification (SVC 2021). The goal of SVC 2021 is to evaluate the limits of on-line signature verification systems on popular scenarios (office/mobile) and writing inputs (stylus/finger) through large-scale public databases. Three different tasks are considered in the competition, simulating realistic scenarios as both random and skilled forgeries are simultaneously considered on each task. The results obtained in SVC 2021 prove the high potential of deep learning methods. In particular, the best on-line signature verification system of SVC 2021 obtained Equal Error Rate (EER) values of 3.33% (Task 1), 7.41% (Task 2), and 6.04% (Task 3). 
SVC 2021 will be established as an on-going competition, where researchers can easily benchmark their systems against the state of the art in an open common platform using large-scale public databases such as DeepSignDB and SVC2021_EvalDB, and standard experimental protocols.	翻訳日:2021-06-03 14:24:19 公開日:2021-06-01
# 知覚画像の高分解能化のためのフーリエ空間損失 Fourier Space Losses for Efficient Perceptual Image Super-Resolution ( http://arxiv.org/abs/2106.00783v1 ) ライセンス: Link先を確認	Dario Fuoli, Luc Van Gool, and Radu Timofte	(参考訳) 多くの超解像モデル (SR) は高性能に最適化されているため、大きなモデルの複雑さのために効率が良くない。大規模モデルは実世界の応用では実用的ではないことが多いため、より効率的なモデルから高い知覚品質のSRを実現するために、新しい損失関数を研究・提案する。与えられた低複雑性ジェネレータネットワークの代表電力は、パラメータの最適セットに対する強いガイダンスによってのみ活用できる。提案した損失関数の適用のみで,最近導入された効率的なジェネレータアーキテクチャの性能向上が可能であることを示す。特に,フーリエ領域において直接動作する識別器アーキテクチャを設計し,対象のhf分布をよりよく一致させるため,フーリエ空間監督損失を用いて,地上真理画像から欠落した高周波(hf)コンテンツを復元する。フーリエ空間における損失の直接的強調は知覚的画質を著しく向上させると同時に,従来提案されていた損失関数と比較して高い復元品質を維持していることを示す。両方の表現がトレーニング中に相補的な情報を提供するので、空間領域と周波数領域の損失の組み合わせを利用してさらに性能を向上する。それに加えて、訓練されたジェネレータは、最先端の知覚的SR法である RankSRGAN と SRFlow よりも2.4倍、48倍高速である。 Many super-resolution (SR) models are optimized for high performance only and therefore lack efficiency due to large model complexity. As large models are often not practical in real-world applications, we investigate and propose novel loss functions, to enable SR with high perceptual quality from much more efficient models. The representative power for a given low-complexity generator network can only be fully leveraged by strong guidance towards the optimal set of parameters. We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions. In particular, we use a Fourier space supervision loss for improved restoration of missing high-frequency (HF) content from the ground truth image and design a discriminator architecture working directly in the Fourier domain to better match the target HF distribution. We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality, while at the same time retaining high restoration quality in comparison to previously proposed loss functions for this task. 
The performance is further improved by utilizing a combination of spatial and frequency domain losses, as both representations provide complementary information during training. On top of that, the trained generator achieves comparable results with and is 2.4x and 48x faster than state-of-the-art perceptual SR methods RankSRGAN and SRFlow respectively.	翻訳日:2021-06-03 14:23:58 公開日:2021-06-01
# ボクセル化点群幾何の可逆圧縮のための境界体積の精緻化 Refining the bounding volumes for lossless compression of voxelized point clouds geometry ( http://arxiv.org/abs/2106.00828v1 ) ライセンス: Link先を確認	Emre Can Kaya, Sebastian Schwarz, Ioan Tabus	(参考訳) 本稿では, 点群の境界体積のみを再構成することを目的とした最近の非可逆圧縮法を基にした, 点群幾何の新しい可逆圧縮法について述べる。提案手法は, 1つの投影方向に関連する2つの深度マップから幾何を部分的に再構成することから始まる。深度マップから得られた部分再構成は、一方向に沿って断面ごとに走査し、2つの深度マップに含まれない点を符号化することにより、点群の完全な再構成へと補完される。主要な構成要素は、入力データに存在する回転不変性を効率的に利用する新しい算術的3次元コンテキスト符号化手順による、内点(実現可能領域内に位置する点)のリストベースの符号化である。ベンチマークデータセットでは、最先端のボクセルあたりビット数の結果が得られる。 This paper describes a novel lossless compression method for point cloud geometry, building on a recent lossy compression method that aimed at reconstructing only the bounding volume of a point cloud. The proposed scheme starts by partially reconstructing the geometry from the two depthmaps associated with a single projection direction. The partial reconstruction obtained from the depthmaps is completed to a full reconstruction of the point cloud by sweeping section by section along one direction and encoding the points which were not contained in the two depthmaps. The main ingredient is a list-based encoding of the inner points (situated inside the feasible regions) by a novel arithmetic three-dimensional context coding procedure that efficiently utilizes rotational invariances present in the input data. State-of-the-art bits-per-voxel results are obtained on benchmark datasets.	翻訳日:2021-06-03 14:23:36 公開日:2021-06-01
# 物体マニフォールドのニューラルプロセッシングの統計力学 Statistical Mechanics of Neural Processing of Object Manifolds ( http://arxiv.org/abs/2106.00790v1 ) ライセンス: Link先を確認	SueYeon Chung	(参考訳) 不変物体認識は、脳が行う最も基本的な認知タスクの1つである。神経状態空間では、刺激変動を持つ異なる物体は異なる多様体として表現される。この幾何学的な観点では、オブジェクト認識は異なるオブジェクト多様体を線形に分離する問題となる。フィードフォワードの視覚階層では、オブジェクト多様体の表現は層全体に再フォーマットされ、より線形に分離可能であることが示唆されている。したがって、知覚の完全な理論は、可変神経応答から対象多様体を分類する線形読み出しネットワークの能力を特徴付ける必要がある。孤立点の知覚論は、E. Gardnerがこれを統計力学問題として定式化し、レプリカ理論を用いて解析した。本稿では、ガードナーの解析を一般化し、高次元信号の統計的および幾何学的性質を合成する多様体の線形分類の理論を確立する。次に、我々の理論をさらに一般化して、点雲のような一般的な知覚多様体の線形分類を行う。多様体のキャパシティは,有効半径, R_M, 有効次元, D_Mと決定される。最後に、相関多様体、異種多様体ジオメトリ、スパースラベル、非線形分類を含む実データへの応用に関する拡張を示す。次に、オブジェクトベース多様体が標準深層ネットワークでどのように変換されるかを示す。この論文は、対象の神経処理の計算理論の基礎を定め、対象多様体の線形分離性に関する定量的測度を提供する。この理論が、生体および人工ニューラルネットワークにおける感覚表現の処理の基礎となる計算原理に新たな洞察を与えることを期待している。 Invariant object recognition is one of the most fundamental cognitive tasks performed by the brain. In the neural state space, different objects with stimulus variabilities are represented as different manifolds. In this geometrical perspective, object recognition becomes the problem of linearly separating different object manifolds. In feedforward visual hierarchy, it has been suggested that the object manifold representations are reformatted across the layers, to become more linearly separable. Thus, a complete theory of perception requires characterizing the ability of linear readout networks to classify object manifolds from variable neural responses. A theory of the perceptron of isolated points was pioneered by E. Gardner who formulated it as a statistical mechanics problem and analyzed it using replica theory. In this thesis, we generalize Gardner's analysis and establish a theory of linear classification of manifolds synthesizing statistical and geometric properties of high dimensional signals. [..] Next, we generalize our theory further to linear classification of general perceptual manifolds, such as point clouds. 
We identify that the capacity of a manifold is determined by its effective radius, R_M, and effective dimension, D_M. Finally, we show extensions relevant for applications to real data, incorporating correlated manifolds, heterogeneous manifold geometries, sparse labels and nonlinear classifications. Then, we demonstrate how object-based manifolds transform in standard deep networks. This thesis lays the groundwork for a computational theory of neuronal processing of objects, providing quantitative measures for linear separability of object manifolds. We hope this theory will provide new insights into the computational principles underlying processing of sensory representations in biological and artificial neural networks.	翻訳日:2021-06-03 14:22:49 公開日:2021-06-01
# 極端分類におけるラベル木の効率-精度トレードオフ Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification ( http://arxiv.org/abs/2106.00730v1 ) ライセンス: Link先を確認	Tavor Z. Baharav, Daniel L. Jiang, Kedarnath Kolluri, Sujay Sanghavi, Inderjit S. Dhillon	(参考訳) Extreme Multi-label Classification (XMC) は、非常に大きなラベルセットから関連するラベルのサブセットでデータポイントをタグ付けできるモデルを学ぶことを目的としている。パーソナライズされたレコメンデーションや製品広告のような現実世界のeコマースアプリケーションは、XMC問題として定式化することができる。このようなアプリケーションでは、ラベルを木に整理し、ラベル数に対数的なトレーニングと推論時間を可能にするのが一般的なアプローチである。ラベルツリーが利用可能になったらモデルをトレーニングすることはよく研究されていますが、ツリーの構造を設計することは、まだよく理解されていない難しい作業であり、モデルのレイテンシと統計パフォーマンスの両方に劇的に影響を与えます。既存のツリー構築アプローチは、統計的なパフォーマンスにのみ最適化するか、レイテンシーに最適化される。我々は,両者の利益をトレードオフする中間操作点を構築するための効率的な情報理論インスパイアアルゴリズムを提案する。本アルゴリズムは,従来不可能であったこれらの目的間の補間を可能にする。 wiki-500kベンチマークデータセットでは、パラベルと同じ精度を維持しつつ、予測レイテンシのプロキシを最大28%削減できることを示した。電子商取引の顧客ログから得られたいくつかのデータセットでは、修正されたラベルツリーが、同じ精度を維持しながら、この予測レイテンシメトリックを最大20%改善することができます。最後に,デプロイモデルのレイテンシ向上を実現する上での課題について論じる。 Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real world e-commerce applications like personalized recommendations and product advertising can be formulated as XMC problems, where the objective is to predict for a user a small subset of items from a catalog of several million products. For such applications, a common approach is to organize these labels into a tree, enabling training and inference times that are logarithmic in the number of labels. While training a model once a label tree is available is well studied, designing the structure of the tree is a difficult task that is not yet well understood, and can dramatically impact both model latency and statistical performance. Existing approaches to tree construction fall at an extreme point, either optimizing exclusively for statistical performance, or for latency. 
We propose an efficient information theory inspired algorithm to construct intermediary operating points that trade off between the benefits of both. Our algorithm enables interpolation between these objectives, which was not previously possible. We corroborate our theoretical analysis with numerical results, showing that on the Wiki-500K benchmark dataset our method can reduce a proxy for expected latency by up to 28% while maintaining the same accuracy as Parabel. On several datasets derived from e-commerce customer logs, our modified label tree is able to improve this expected latency metric by up to 20% while maintaining the same accuracy. Finally, we discuss challenges in realizing these latency improvements in deployed models.	翻訳日:2021-06-03 14:21:43 公開日:2021-06-01
# マルチドメイン環境におけるc2意思決定を改善するための画像オーディオ符号化 Image-Audio Encoding to Improve C2 Decision-Making in Multi-Domain Environment ( http://arxiv.org/abs/2106.00787v1 ) ライセンス: Link先を確認	Piyush K. Sharma and Adrienne Raglin	(参考訳) 軍は、MDO(Multi- Domain Operation)におけるコミュニケーションと機敏性を改善する方法を調査している。 IoT(Internet of Things)が最近人気になったのは、パブリックドメインと政府ドメインだ。 MDOにおけるその使用は将来の戦場に革命をもたらし、戦略的優位性をもたらす可能性がある。この技術は軍事能力の活用を提供するが、不確実性と関連するリスクが問題となる。重要な疑問は、これらの不確実性に対処する方法だ。近年、あるデータ領域から別のデータ領域へ情報を変換するための情報カモフラージュが提案されている。これは比較的新しいアプローチであるため、このような変革の課題と、関連する不確実性の検出と対処方法、特に未知の未知の意思決定の改善について検討する。 The military is investigating methods to improve communication and agility in its multi-domain operations (MDO). Nascent popularity of Internet of Things (IoT) has gained traction in public and government domains. Its usage in MDO may revolutionize future battlefields and may enable strategic advantage. While this technology offers leverage to military capabilities, it comes with challenges where one is the uncertainty and associated risk. A key question is how can these uncertainties be addressed. Recently published studies proposed information camouflage to transform information from one data domain to another. As this is comparatively a new approach, we investigate challenges of such transformations and how these associated uncertainties can be detected and addressed, specifically unknown-unknowns to improve decision-making.	翻訳日:2021-06-03 14:19:20 公開日:2021-06-01
# 分散型マルチエージェントq-learningによるuav基地局の省エネルギー配置最適化 Energy-aware placement optimization of UAV base stations via decentralized multi-agent Q-learning ( http://arxiv.org/abs/2106.00845v1 ) ライセンス: Link先を確認	Babatunji Omoniwa, Boris Galkin, Ivana Dusparic	(参考訳) 航空基地局(uav-bss)として機能する無人航空機は、ネットワーク需要の増加、既存のインフラの障害点、災害発生時の地上機器への無線接続を提供する。しかし、バッテリー容量の制限を考慮すると、長時間のカバー作業においてUAVのエネルギーを節約することは困難である。強化学習ベース(rl)アプローチは、これまで複数のuavのエネルギー利用を改善するために用いられてきたが、中央のクラウドコントローラは、エンドデバイスの位置に関する完全な知識を持っていると仮定されている。この仮定は、モバイルグラウンドデバイスを用いた動的ネットワーク環境では現実的ではない。この問題に対処するため,各UAV-BSには,地上機器との接続性を最大化し,エネルギー利用の向上を図る自律エージェントが備わっている。実験の結果,UAV-BSの連接する接地装置の数とエネルギー利用の最大化において,提案手法は集中型アプローチよりも有意に優れていた。 Unmanned aerial vehicles serving as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices in events of increased network demand, points-of-failure in existing infrastructure, or disasters. However, it is challenging to conserve the energy of UAVs during prolonged coverage tasks, considering their limited on-board battery capacity. Reinforcement learning-based (RL) approaches have been previously used to improve energy utilization of multiple UAVs, however, a central cloud controller is assumed to have complete knowledge of the end-devices' locations, i.e., the controller periodically scans and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments with mobile ground devices. To address this problem, we propose a decentralized Q-learning approach, where each UAV-BS is equipped with an autonomous agent that maximizes the connectivity to ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms the centralized approaches in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.	翻訳日:2021-06-03 14:19:07 公開日:2021-06-01
# (参考訳) スパースグラフのサンプルFréchet平均はスパースである The Sample Fréchet Mean of Sparse Graphs is Sparse ( http://arxiv.org/abs/2105.14397v2 ) ライセンス: CC BY 4.0	Daniel Ferguson, Francois G. Meyer	(参考訳) グラフからなる大規模なデータセットが利用可能になったことで、「グラフ値確率変数」の統計学習において新しいツールを発明する、かつてない必要性が生じている。グラフのサンプルの「平均」を特徴づけるために、サンプルFréchet平均を計算することができる。サンプル平均はグラフサンプルの解釈可能な要約を与えるべきであるため、サンプルの構造的性質がFréchet平均に伝達されると期待される。本稿では、次の基礎的な問いに取り組む: サンプルFréchet平均は、サンプル中のグラフの構造的性質を継承するのか? 具体的には、以下の結果を証明する: スパースグラフの集合のサンプルFréchet平均はスパースである。グラフハミング距離とスペクトル隣接擬距離のそれぞれに対して、非常に異なる議論を用いてこの結果を証明する。実際には、サンプルFréchet平均のエッジ密度は、サンプル内のグラフのエッジ密度によって抑えられるという、より強い結果を証明する。この結果は、サンプルFréchet平均の推定に用いる方法にかかわらず、スパース性がグラフサンプルからそのサンプルFréchet平均へと伝達されうる遺伝的性質であることを保証する。 The availability of large datasets composed of graphs creates an unprecedented need to invent novel tools in statistical learning for "graph-valued random variables". To characterize the "average" of a sample of graphs, one can compute the sample Fréchet mean. Because the sample mean should provide an interpretable summary of the graph sample, one would expect that the structural properties of the sample be transmitted to the Fréchet mean. In this paper, we address the following foundational question: does the sample Fréchet mean inherit the structural properties of the graphs in the sample? Specifically, we prove the following result: the sample Fréchet mean of a set of sparse graphs is sparse. We prove the result for the graph Hamming distance, and the spectral adjacency pseudometric, using very different arguments. In fact, we prove a stronger result: the edge density of the sample Fréchet mean is bounded by the edge density of the graphs in the sample. This result guarantees that sparsity is a hereditary property, which can be transmitted from a graph sample to its sample Fréchet mean, irrespective of the method used to estimate the sample Fréchet mean.	翻訳日:2021-06-03 13:21:05 公開日:2021-06-01
# 周期gp:ガウス過程バンディットを用いた周期世界学習 Periodic-GP: Learning Periodic World with Gaussian Process Bandits ( http://arxiv.org/abs/2105.14422v2 ) ライセンス: Link先を確認	Hengrui Cai, Zhihao Cen, Ling Leng, Rui Song	(参考訳) 配車におけるドライバーの日々の需要や交通の動的な交通パターンなど、データが季節性を伴う場合に、様々な実世界のアプリケーションで発生する周期的環境における逐次的決定最適化を考える。本研究では,この季節法則を活用し,確率的周期世界を学ぶことに注力する。一般作用空間に対処するために,ガウス過程(GP)に基づくバンドイットを基本モデルとして,その柔軟性と一般性から用い,高信頼度境界に基づく周期的カーネルを用いた周期的GP法を提案する。理論的には、周期的定常モデルにおいて周期的核を明示的に特徴付けることにより、提案手法の新たな後悔のバウンドを与える。実験的に,提案アルゴリズムは,マドリードの交通汚染に対する合成データ実験と実データ応用の両方において,既存の手法を著しく上回っている。 We consider the sequential decision optimization on the periodic environment, that occurs in a wide variety of real-world applications when the data involves seasonality, such as the daily demand of drivers in ride-sharing and dynamic traffic patterns in transportation. In this work, we focus on learning the stochastic periodic world by leveraging this seasonal law. To deal with the general action space, we use the bandit based on Gaussian process (GP) as the base model due to its flexibility and generality, and propose the Periodic-GP method with a temporal periodic kernel based on the upper confidence bound. Theoretically, we provide a new regret bound of the proposed method, by explicitly characterizing the periodic kernel in the periodic stationary model. Empirically, the proposed algorithm significantly outperforms the existing methods in both synthetic data experiments and a real data application on Madrid traffic pollution.	翻訳日:2021-06-03 11:02:46 公開日:2021-06-01
# (参考訳) 自然災害被害評価のためのUAVデータセットにおける注意機構ベースのセマンティックセグメンテーション Attention Based Semantic Segmentation on UAV Dataset for Natural Disaster Damage Assessment ( http://arxiv.org/abs/2105.14540v2 ) ライセンス: CC BY 4.0	Tashnim Chowdhury, Maryam Rahnemoonfar	(参考訳) 気候変動の有害な影響には、世界各地で発生する、より強力で破壊的なハリケーンが含まれる。建物や道路を含む、地域内のさまざまな被災構造物を特定することは、自然災害による被害を最小限に抑えるための救助隊の活動計画に役立つため、極めて重要である。セマンティックセグメンテーションは、画像の異なる部分を特定するのに役立つ。我々は,高解像度UAVデータセット上に,自己注意に基づく新しいセマンティックセグメンテーションモデルを実装し,テストセットで約88%のMean IoUスコアを得る。この結果は、人命を救い経済損失を減らす自然災害被害評価に自己注意型の手法を用いることを後押しするものである。 The detrimental impacts of climate change include stronger and more destructive hurricanes happening all over the world. Identifying the different damaged structures of an area, including buildings and roads, is vital since it helps the rescue team plan their efforts to minimize the damage caused by a natural disaster. Semantic segmentation helps to identify different parts of an image. We implement a novel self-attention based semantic segmentation model on a high resolution UAV dataset and attain a Mean IoU score of around 88% on the test set. The result encourages the use of self-attention schemes in natural disaster damage assessment, which will save human lives and reduce economic losses.	翻訳日:2021-06-03 09:18:59 公開日:2021-06-01
# (参考訳) 変分オートエンコーダ:調和的視点 Variational Autoencoders: A Harmonic Perspective ( http://arxiv.org/abs/2105.14866v2 ) ライセンス: CC BY 4.0	Alexander Camuto, Matthew Willetts	(参考訳) 本研究では,高調波解析の観点から変分オートエンコーダ(VAE)について検討する。 VAEの潜伏空間を様々な測度空間であるガウス空間として見ることにより、VAEのエンコーダ分散がVAEエンコーダとデコーダニューラルネットワークによってパラメータ化された関数の周波数内容を制御することを示す一連の結果を得る。特に、より大きなエンコーダ分散がこれらの関数の高周波含量を減少させることを示す。解析により,この分散の増大がvaeのデコーダネットワークにソフトリプシッツ制約を効果的に生じさせることを示した。さらに、VAEの入力にガウス雑音を加えることで、VAEエンコーダネットワークの周波数内容とリプシッツ定数をより細かく制御できることを示す。理論解析を支援するために、我々は、小さな完全連結ニューラルネットワークとより大きな畳み込みネットワークを用いたVAEの実験を行い、我々の理論が様々なニューラルネットワークアーキテクチャを実証した。 In this work we study Variational Autoencoders (VAEs) from the perspective of harmonic analysis. By viewing a VAE's latent space as a Gaussian Space, a variety of measure space, we derive a series of results that show that the encoder variance of a VAE controls the frequency content of the functions parameterised by the VAE encoder and decoder neural networks. In particular we demonstrate that larger encoder variances reduce the high frequency content of these functions. Our analysis allows us to show that increasing this variance effectively induces a soft Lipschitz constraint on the decoder network of a VAE, which is a core contributor to the adversarial robustness of VAEs. We further demonstrate that adding Gaussian noise to the input of a VAE allows us to more finely control the frequency content and the Lipschitz constant of the VAE encoder networks. To support our theoretical analysis we run experiments with VAEs with small fully-connected neural networks and with larger convolutional networks, demonstrating empirically that our theory holds for a variety of neural network architectures.	翻訳日:2021-06-03 08:44:05 公開日:2021-06-01
# (参考訳) semeval-2021タスク4 : 抽象的意味の理解 SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning ( http://arxiv.org/abs/2105.14879v2 ) ライセンス: CC BY 4.0	Boyuan Zheng, Xiaoyu Yang, Yu-Ping Ruan, Zhenhua Ling, Quan Liu, Si Wei, Xiaodan Zhu	(参考訳) 本稿では, semeval-2021 共通タスク4: read comprehension of abstract meaning (recam) を紹介する。この共有タスクは抽象概念を表現・理解する機械の能力を評価するために設計されている。質問文とそれに対応する質問文が与えられた場合、参加システムは5つの抽象概念候補の中から正しい回答を選択することが期待される。抽象性の2つの典型的な定義、すなわち非受容性と非特異性に基づいて、我々のタスクは参加モデルを評価するための3つのサブタスクを提供する。特に、subtask 1は、システムが物理的世界で直接知覚できない概念をいかにうまくモデル化できるかを評価することを目的としている。 Subtask 2は、パスの文脈から、ハイパーネム階層にある非特異な概念を解釈するモデルの能力に焦点を当てている。 Subtask 3は、2種類の抽象性に対するモデルの一般化可能性に関する洞察を提供することを目的としている。 SemEval-2021 の公式評価期間中に,Subtask 1 に 23 件,Subtask 2 に 28 件を提出した。参加チームはさらに29件をSubtask 3に提出した。 leaderboard and competitionのウェブサイトはhttps://competitions.codalab.org/competitions/26153にある。データとベースラインコードはhttps://github.com/boyuanzheng010/SemEval2021-Reading-Comprehension-of-Abstract-Meaningで入手できる。 This paper introduces the SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM). This shared task is designed to help evaluate the ability of machines in representing and understanding abstract concepts. Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts in a cloze-style machine reading comprehension setup. Based on two typical definitions of abstractness, i.e., the imperceptibility and nonspecificity, our task provides three subtasks to evaluate the participating models. Specifically, Subtask 1 aims to evaluate how well a system can model concepts that cannot be directly perceived in the physical world. Subtask 2 focuses on models' ability in comprehending nonspecific concepts located high in a hypernym hierarchy given the context of a passage. Subtask 3 aims to provide some insights into models' generalizability over the two types of abstractness. 
During the SemEval-2021 official evaluation period, we received 23 submissions to Subtask 1 and 28 to Subtask 2. The participating teams additionally made 29 submissions to Subtask 3. The leaderboard and competition website can be found at https://competitions.codalab.org/competitions/26153. The data and baseline code are available at https://github.com/boyuanzheng010/SemEval2021-Reading-Comprehension-of-Abstract-Meaning.	翻訳日:2021-06-03 08:24:41 公開日:2021-06-01
# (参考訳) スパースなエキスパートモデルとそれ以上を探求する Exploring Sparse Expert Models and Beyond ( http://arxiv.org/abs/2105.15082v2 ) ライセンス: CC BY 4.0	An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang	(参考訳) Mixture-of-Experts (MoE) モデルは、膨大な数のパラメータを持ちながら計算コストを一定に保って有望な結果を得ることができ、モデルスケーリングのトレンドとなっている。それでも、MoE層がパラメータをスパースアクティベーションで活用することで、どのように品質向上をもたらすのかは謎である。本研究では,スパースエキスパートモデルにおけるいくつかの重要な要因について検討する。負荷の不均衡は、最近の研究の視点とは対照的に、モデル品質に影響する重大な問題ではない可能性がある一方、top-$k$ルーティングにおけるスパースに活性化されるエキスパート数$k$とエキスパート容量$C$は、この文脈で大きな違いをもたらす可能性がある。さらに私たちは、エキスパートを異なるプロトタイプに分割し、$k$個のtop-$1$ルーティングを適用する、エキスパートプロトタイピングと呼ばれるシンプルな方法を提案する。この戦略は, 計算コストを一定に保ちながらモデル品質を向上させ, 超大規模モデルでのさらなる探索により, より大きなモデルの訓練においてより有効であることが示唆された。私たちはモデルスケールを1兆以上のパラメータに拡大し、最近のSOTAが2048個のTPUコアを用いるのに対し、わずか480個のNVIDIA V100-32GB GPU上に実装した。提案する巨大モデルは,同規模のベースラインに対して収束の大幅な高速化を実現する。 Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost, and thus it has become a trend in model scaling. Still it is a mystery how MoE layers bring quality gains by leveraging the parameters with sparse activation. In this work, we investigate several key factors in sparse expert models. We observe that load imbalance may not be a significant problem affecting model quality, contrary to the perspectives of recent studies, while the number of sparsely activated experts $k$ and expert capacity $C$ in top-$k$ routing can significantly make a difference in this context. Furthermore, we take a step forward to propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing. This strategy improves the model quality but maintains constant computational costs, and our further exploration on extremely large-scale models reflects that it is more effective in training larger models. 
We push the model scale to over $1$ trillion parameters and implement it on solely $480$ NVIDIA V100-32GB GPUs, in comparison with the recent SOTAs on $2048$ TPU cores. The proposed giant model achieves substantial speedup in convergence over the same-size baseline.	翻訳日:2021-06-03 08:02:58 公開日:2021-06-01
# (参考訳) 単調分類器の説明 Explanations for Monotonic Classifiers ( http://arxiv.org/abs/2106.00154v1 ) ライセンス: CC BY 4.0	Joao Marques-Silva, Thomas Gerspacher, Martin Cooper, Alexey Ignatiev, Nina Narodytska	(参考訳) 多くの分類課題では単調性が要求される。具体的には、もし他のすべてが一定であるならば、増加(resp)する。減少) 1つ以上の特徴の値が減少してはいけない(resp)。増加) 予測の値。単調分類子を学ぶための包括的な取り組みにもかかわらず、単調分類子を説明するための専門的なアプローチは乏しく、分類子特有のものである。本稿では,ブラックボックス型単調分類器の一形式的説明の計算アルゴリズムについて述べる。これらの新しいアルゴリズムは、分類器のランタイム複雑性と特徴数における多項式である。さらに,形式的説明を列挙する実用的なモデル非依存アルゴリズムを提案する。 In many classification tasks there is a requirement of monotonicity. Concretely, if all else remains constant, increasing (resp. decreasing) the value of one or more features must not decrease (resp. increase) the value of the prediction. Despite comprehensive efforts on learning monotonic classifiers, dedicated approaches for explaining monotonic classifiers are scarce and classifier-specific. This paper describes novel algorithms for the computation of one formal explanation of a (black-box) monotonic classifier. These novel algorithms are polynomial in the run time complexity of the classifier and the number of features. Furthermore, the paper presents a practically efficient model-agnostic algorithm for enumerating formal explanations.	翻訳日:2021-06-03 03:26:05 公開日:2021-06-01
# (参考訳) スパース出力を用いた軌道予測の強化:チームスポーツへの応用 Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports ( http://arxiv.org/abs/2106.00173v1 ) ライセンス: CC BY 4.0	Brandon Victor, Aiden Nibali, Zhen He, David L. Carey	(参考訳) チームのダイナミクスを効果的に模倣する洗練された軌道予測モデルは、スポーツコーチ、放送局、観客に多くの潜在的用途がある。しかし、サッカーデータを用いた実験により、予測と真の将来の軌跡の間の平均距離で線形外挿を上回り、プレイヤー軌道予測のためのディープラーニングモデルをトレーニングすることは驚くほど困難であることがわかった。本研究では,スパース軌道の予測と一定加速度による補間により訓練を改善する新しい手法を提案し,実験を行った。この補間は、スパースアウトプットで訓練されていないモデルでも使用することができ、テストされたすべてのモデルのパフォーマンスを一貫して改善することがわかった。さらに,他のプレイヤーの完全な軌跡を条件にすることで,プレイヤーのサブセットに対する予測軌跡の精度が向上し,スパース予測と組み合わせることでさらに改善できることが判明した。また、グラフネットワークとマルチヘッドアテンション(gran-ma)を用いた新しいアーキテクチャを提案する。このアーキテクチャは、データセット上の他のテストされた最先端モデルよりも優れた性能を実現し、スパーストラジェクタとフルトラジェクション条件付き軌道予測の両方に自明に適合する。 Sophisticated trajectory prediction models that effectively mimic team dynamics have many potential uses for sports coaches, broadcasters and spectators. However, through experiments on soccer data we found that it can be surprisingly challenging to train a deep learning model for player trajectory prediction which outperforms linear extrapolation on average distance between predicted and true future trajectories. We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration, which improves performance for several models. This interpolation can also be used on models that aren't trained with sparse outputs, and we find that this consistently improves performance for all tested models. Additionally, we find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players, and that this is further improved when combined with sparse predictions. 
We also propose a novel architecture using graph networks and multi-head attention (GraN-MA) which achieves better performance than other tested state-of-the-art models on our dataset and is trivially adapted for both sparse trajectories and full-trajectory conditioned trajectory prediction.	翻訳日:2021-06-03 03:08:31 公開日:2021-06-01
# (参考訳) 2次元データセットのハイブリッド生成モデル Hybrid Generative Models for Two-Dimensional Datasets ( http://arxiv.org/abs/2106.00203v1 ) ライセンス: CC BY 4.0	Hoda Shajari, Jaemoon Lee, Sanjay Ranka, Anand Rangarajan	(参考訳) 2次元配列に基づくデータセットは、様々な領域にまたがっている。現在の生成モデリングのアプローチは、通常、従来の画像データセットに限定され、ピクセル間の相関を明示的にキャプチャしないピクセルドメインで実行される。さらに、これらのアプローチは、各要素値が連続で固定範囲に制限されない科学や他の応用に拡張されない。本稿では,計算を表現基盤の空間に移動させることにより,二次元データセットを生成する新しい手法を提案し,画像から,科学計算から2つの異なるデータセットにその有用性を示す。提案手法は汎用的で,任意のデータセット,表現ベース,生成モデルに適用可能である。生成モデルと表現ベース空間の様々な組み合わせを総合的に比較する。また,画素空間における画像生成の不足を捉える新しい評価指標を提案する。 Two-dimensional array-based datasets are pervasive in a variety of domains. Current approaches for generative modeling have typically been limited to conventional image datasets and performed in the pixel domain which do not explicitly capture the correlation between pixels. Additionally, these approaches do not extend to scientific and other applications where each element value is continuous and is not limited to a fixed range. In this paper, we propose a novel approach for generating two-dimensional datasets by moving the computations to the space of representation bases and show its usefulness for two different datasets, one from imaging and another from scientific computing. The proposed approach is general and can be applied to any dataset, representation basis, or generative model. We provide a comprehensive performance comparison of various combinations of generative models and representation basis spaces. We also propose a new evaluation metric which captures the deficiency of generating images in pixel space.	翻訳日:2021-06-03 02:50:58 公開日:2021-06-01
# (参考訳) 最大クリーク発見としての不連続名前付きエンティティ認識 Discontinuous Named Entity Recognition as Maximal Clique Discovery ( http://arxiv.org/abs/2106.00218v1 ) ライセンス: CC BY 4.0	Yucheng Wang, Bowen Yu, Hongsong Zhu, Tingwen Liu, Nan Yu and Limin Sun	(参考訳) 名前付きエンティティ認識(NER)は、エンティティの言及が不連続である場合、依然として困難である。既存の方法は、認識プロセスをいくつかの逐次ステップに分割する。これらの方法は、訓練時には正解の中間結果を条件として予測するが、推論時には前のステップのモデル出力に依存するため、露出バイアスが生じる。この問題を解決するために、まず各文のセグメントグラフを構築し、各ノードがセグメント(それ自体で連続エンティティであるもの、または不連続エンティティの一部)を表現し、エッジが同一エンティティに属する2つのノードをリンクする。ノードとエッジはそれぞれ1つのステージでグリッドタグ方式で生成でき、Macという新しいアーキテクチャを使って共同で学習することができる。すると、不連続な NER はグラフ内の最大クリークを発見し、各クリークのスパンを連結する非パラメトリックな過程として再構成することができる。 3つのベンチマーク実験により,本手法は最先端(SOTA)の結果を上回り,F1で最大3.5ポイント向上し,SOTAモデルに対して5倍の高速化を達成した。 Named entity recognition (NER) remains challenging when entity mentions can be discontinuous. Existing methods break the recognition process into several sequential steps. In training, they predict conditioned on the golden intermediate results, while at inference relying on the model output of the previous steps, which introduces exposure bias. To solve this problem, we first construct a segment graph for each sentence, in which each node denotes a segment (a continuous entity on its own, or a part of discontinuous entities), and an edge links two nodes that belong to the same entity. The nodes and edges can be generated respectively in one stage with a grid tagging scheme and learned jointly using a novel architecture named Mac. Then discontinuous NER can be reformulated as a non-parametric process of discovering maximal cliques in the graph and concatenating the spans in each clique. Experiments on three benchmarks show that our method outperforms the state-of-the-art (SOTA) results, with up to 3.5 percentage points improvement on F1, and achieves 5x speedup over the SOTA model.	翻訳日:2021-06-03 02:41:11 公開日:2021-06-01
# (参考訳) VA-GCN:ポイントクラウド上での学習のためのベクトル注意グラフ畳み込みネットワーク VA-GCN: A Vector Attention Graph Convolution Network for learning on Point Clouds ( http://arxiv.org/abs/2106.00227v1 ) ライセンス: CC BY 4.0	Haotian Hu, Fanyi Wang, Huixiao Le	(参考訳) 局所集約演算子の研究の発展により、ポイントクラウド解析モデルにおいて劇的なブレークスルーが行われた。しかし、現在の文献における既存の局所集約演算子は、モデルのパワーを制限する点雲の局所的な情報に十分な重要性を持たない。そこで我々は,K-Nearest Neighbor (KNN) を用いて各入力点の近傍点を抽出し,中心点とその近傍点間のベクトルの標高と方位関係を利用して,エッジ特徴に対する注目重み行列を構築する,効率的なベクトル注意変換モジュール(VAConv)を提案する。その後、VAConvは二重チャネル構造を採用し、重み付けされたエッジ特徴とグローバル特徴を融合させる。 VAConvの効率を検証するために,VAConvsを異なる受容領域に並列に接続し,マルチスケールグラフ畳み込みネットワークVA-GCNを得る。提案したVA-GCNは、ModelNet40、S3DIS、ShapeNetなどの標準ベンチマークで最先端のパフォーマンスを実現する。 3D分類のためのModelNet40データセットでは、VA-GCNはベースラインに比べて2.4%増加した。 Owing to the development of research on local aggregation operators, dramatic breakthrough has been made in point cloud analysis models. However, existing local aggregation operators in the current literature fail to attach decent importance to the local information of the point cloud, which limits the power of the models. To fit this gap, we propose an efficient Vector Attention Convolution module (VAConv), which utilizes K-Nearest Neighbor (KNN) to extract the neighbor points of each input point, and then uses the elevation and azimuth relationship of the vectors between the center point and its neighbors to construct an attention weight matrix for edge features. Afterwards, the VAConv adopts a dual-channel structure to fuse weighted edge features and global features. To verify the efficiency of the VAConv, we connect the VAConvs with different receptive fields in parallel to obtain a Multi-scale graph convolutional network, VA-GCN. The proposed VA-GCN achieves state-of-the-art performance on standard benchmarks including ModelNet40, S3DIS and ShapeNet. Remarkably, on the ModelNet40 dataset for 3D classification, VA-GCN increased by 2.4% compared to the baseline.	翻訳日:2021-06-03 02:26:11 公開日:2021-06-01
# (参考訳) ロバスト画像と画像セット分類のための統計的・空間的疎結合の再検討 Reconciliation of Statistical and Spatial Sparsity For Robust Image and Image-Set Classification ( http://arxiv.org/abs/2106.00256v1 ) ライセンス: CC BY 4.0	Hao Cheng, Kim-Hui Yap, and Bihan Wen	(参考訳) 最近の画像分類アルゴリズムは、大規模データセットから深い特徴を学習することで、従来の特徴ベースアプローチと比較してかなり優れた結果を得た。しかしながら、ノイズ画像や画像集合クエリの分類や、限られたスケールのデータセット上での深層画像分類モデルのトレーニングなど、実際にはさまざまな画像分類の課題がある。汎用的な深い特徴を適用する代わりに、モデルベースのアプローチは、画像と画像セットの分類タスクにおいてより効果的でデータ効率が良い。本研究では,局所パッチ構造とリーマン多様体に写像された大域ガウス分布とを調和させることにより,画像や画像データセットの分類をモデル化する,新たな統計的・空間的スパース表現法である \textit{j3s} を提案する。我々の知る限りでは、グローバル統計と局所パッチ構造をジョイントスパース表現を通じて併用する作業は行われていない。ジョイントスパース性を用いて局所画像表現と大域画像表現を結合することにより,j3sモデルに基づくジョイントスパース符号化問題を解く。学習したJ3Sモデルは、堅牢な画像分類とイメージセット分類に使用される。実験の結果,提案手法はFMD, UIUC, ETH-80, YTCデータベース上での競合手法よりも高い性能を示した。 Recent image classification algorithms, by learning deep features from large-scale datasets, have achieved significantly better results comparing to the classic feature-based approaches. However, there are still various challenges of image classifications in practice, such as classifying noisy image or image-set queries and training deep image classification models over the limited-scale dataset. Instead of applying generic deep features, the model-based approaches can be more effective and data-efficient for robust image and image-set classification tasks, as various image priors are exploited for modeling the inter- and intra-set data variations while preventing over-fitting. In this work, we propose a novel Joint Statistical and Spatial Sparse representation, dubbed \textit{J3S}, to model the image or image-set data for classification, by reconciling both their local patch structures and global Gaussian distribution mapped into Riemannian manifold. To the best of our knowledge, no work to date utilized both global statistics and local patch structures jointly via joint sparse representation. 
We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity. The learned J3S models are used for robust image and image-set classification. Experiments show that the proposed J3S-based image classification scheme outperforms the popular or state-of-the-art competing methods over FMD, UIUC, ETH-80 and YTC databases.	翻訳日:2021-06-03 02:14:06 公開日:2021-06-01
# (参考訳) 分割とルール: 動的プロセスのための繰り返し分割ネットワーク Divide and Rule: Recurrent Partitioned Network for Dynamic Processes ( http://arxiv.org/abs/2106.00258v1 ) ライセンス: CC BY 4.0	Qianyu Feng, Bang Zhang, Yi Yang	(参考訳) 一般に、多くの動的プロセスは相互作用変数(物理システムから社会学的分析まで)に関与している。システム内のコンポーネントの相互作用は、相反する動的な振る舞いを引き起こす可能性がある。多くのアプローチは、プロトゲン運動を捉えるのに有効な内部相互作用を無視した時間配列をモデル化する。異なることに、我々のゴールは、部分全体階層を持つシステムを表現し、システム内変数間のインプリート依存性を発見することであり、これは、Recurrent partItioned Network (REIN) によるサブシステム動作に因果関係を持つ相互作用を推論することである。提案アーキテクチャは, (i) 複数のレベルにおける観測の階層的かつ時間的に一貫した表現を抽出する知覚モジュール, (ii) 各レベルにおけるニューロン間の関係性を決定する導出モジュール, (iii)時間分布推定を条件に未来を予測する統計的モジュールからなる。本モデルは,様々な物理システムを用いた長期予測において,限られた観測と安定なコンポーネント間相互作用の同定に有効であることが実証された。 In general, many dynamic processes are involved with interacting variables, from physical systems to sociological analysis. The interplay of components in the system can give rise to confounding dynamic behavior. Many approaches model temporal sequences holistically ignoring the internal interaction which are impotent in capturing the protogenic actuation. Differently, our goal is to represent a system with a part-whole hierarchy and discover the implied dependencies among intra-system variables: inferring the interactions that possess causal effects on the sub-system behavior with REcurrent partItioned Network (REIN). The proposed architecture consists of (i) a perceptive module that extracts a hierarchical and temporally consistent representation of the observation at multiple levels, (ii) a deductive module for determining the relational connection between neurons at each level, and (iii) a statistical module that can predict the future by conditioning on the temporal distributional estimation. Our model is demonstrated to be effective in identifying the componential interactions with limited observation and stable in long-term future predictions experimented with diverse physical systems.	翻訳日:2021-06-03 01:51:48 公開日:2021-06-01
# (参考訳) 3D WaveUNet: ニューロンセグメンテーションのための3Dウェーブレット統合エンコーダ・デコーダネットワーク 3D WaveUNet: 3D Wavelet Integrated Encoder-Decoder Network for Neuron Segmentation ( http://arxiv.org/abs/2106.00259v1 ) ライセンス: CC BY 4.0	Qiufu Li and Linlin Shen	(参考訳) 3Dニューロンセグメンテーションは、脳回路の探索と脳機能の理解に不可欠なニューロンのデジタル再構成の重要なステップである。しかし、ニューロンの細い線状神経繊維は広い領域に広がり、3Dニューロン画像のセグメンテーションに多大な計算コストをもたらす可能性がある。一方、画像内の強いノイズと断線された神経繊維は、タスクに大きな課題をもたらす。本稿では,3次元ウェーブレットとディープラーニングに基づく3次元ニューロン分割法を提案する。ニューロン画像は、セグメンテーションタスクを単純化するために、まずニューロンキューブに分割される。次に、最初の3Dウェーブレット統合エンコーダ・デコーダネットワークである3D WaveUNetを設計し、キューブ内の神経繊維を分割する。ウェーブレットは、ディープネットワークによるデータノイズの抑制と断線した繊維の接続を支援する。また、3D WaveUNetをトレーニングするために、利用可能な最大の注釈付きニューロン画像データセットであるBigNeuronを用いて、Neuronal Cube Dataset (NeuCuDa) を作成する。最後に、キューブ内で分割された神経線維を組み立てて完全なニューロンを生成し、利用可能な自動追跡アルゴリズムを用いてデジタル再構成する。実験結果から, 本手法はノイズの多いニューロン画像から標的ニューロンを完全に抽出できる可能性が示唆された。統合された3Dウェーブレットは、3Dニューロンセグメンテーションと再構成の性能を効率よく向上させることができる。この作業のコードと事前訓練されたモデルは https://github.com/LiQiufu/3D-WaveUNet で公開予定である。 3D neuron segmentation is a key step for the neuron digital reconstruction, which is essential for exploring brain circuits and understanding brain functions. However, the fine line-shaped nerve fibers of neuron could spread in a large region, which brings great computational cost to the segmentation in 3D neuronal images. Meanwhile, the strong noises and disconnected nerve fibers in the image bring great challenges to the task. In this paper, we propose a 3D wavelet and deep learning based 3D neuron segmentation method. The neuronal image is first partitioned into neuronal cubes to simplify the segmentation task. Then, we design 3D WaveUNet, the first 3D wavelet integrated encoder-decoder network, to segment the nerve fibers in the cubes; the wavelets could assist the deep networks in suppressing data noise and connecting the broken fibers. We also produce a Neuronal Cube Dataset (NeuCuDa) using the biggest available annotated neuronal image dataset, BigNeuron, to train 3D WaveUNet. 
Finally, the nerve fibers segmented in cubes are assembled to generate the complete neuron, which is digitally reconstructed using an available automatic tracing algorithm. The experimental results show that our neuron segmentation method could completely extract the target neuron in noisy neuronal images. The integrated 3D wavelets can efficiently improve the performance of 3D neuron segmentation and reconstruction. The code and pre-trained models for this work will be available at https://github.com/LiQiufu/3D-WaveUNet.	翻訳日:2021-06-03 01:41:06 公開日:2021-06-01
# (参考訳) コード生成のための分岐展開順序の動的選択の探索 Exploring Dynamic Selection of Branch Expansion Orders for Code Generation ( http://arxiv.org/abs/2106.00261v1 ) ライセンス: CC BY 4.0	Hui Jiang, Chulun Zhou, Fandong Meng, Biao Zhang, Jie Zhou, Degen Huang, Qingqiang Wu, Jinsong Su	(参考訳) ソフトウェア開発を促進する大きな可能性のために、コード生成は最近注目を集めています。一般に、支配的なモデルはseq2treeモデルであり、入力された自然言語記述を抽象構文木(ast)のプレオーダートラバーサルに対応するツリー構築アクションのシーケンスに変換する。しかし、そのようなトラバース順序は、すべてのマルチブランチノードを扱うのに適していないかもしれない。本稿では,複数分岐ノードに対する分岐の最適拡張順序を動的に決定できるコンテキストベース分岐セレクタを備えたSeq2Treeモデルを提案する。特に,拡張順序の選択は非微分可能な多段階演算であるため,強化学習によりセレクタを最適化し,異なる拡張順序によって得られるモデル損失の差として報酬関数を定式化する。いくつかの一般的なデータセットに対する実験結果と詳細な分析により,本手法の有効性と汎用性を示した。コードをhttps://github.com/DeepLearnXMU/CG-RLでリリースしました。 Due to the great potential in facilitating software development, code generation has attracted increasing attention recently. Generally, dominant models are Seq2Tree models, which convert the input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an Abstract Syntax Tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector, which is able to dynamically determine optimal expansion orders of branches for multi-branch nodes. Particularly, since the selection of expansion orders is a non-differentiable multi-step operation, we optimize the selector through reinforcement learning, and formulate the reward function as the difference of model losses obtained through different expansion orders. Experimental results and in-depth analysis on several commonly-used datasets demonstrate the effectiveness and generality of our approach. We have released our code at https://github.com/DeepLearnXMU/CG-RL.	翻訳日:2021-06-03 00:56:42 公開日:2021-06-01
# (参考訳) 私がやったの? 強化学習における制御効果を識別する手段としての非難 Did I do that? Blame as a means to identify controlled effects in reinforcement learning ( http://arxiv.org/abs/2106.00266v1 ) ライセンス: CC BY-SA 4.0	Oriol Corcoll, Raul Vicente	(参考訳) 環境の制御可能な側面をモデル化することで、介入の優先順位付けが向上し、強化学習法における一般的な探索戦略となっている。繰り返し最先端の成果が得られたにもかかわらず、このアプローチは報酬ベースのタスクのプロキシとしてのみ研究されており、それ自体ではまだ評価されていない。我々は、アクション予測に依存するソリューションが重要なイベントをモデル化しないことを示す。一方、人間は自分の行動に責任を負い、自分がコントロールしたものを決定する。本稿では, 非難対策に基づく教師なし手法である制御効果ネットワーク(CEN)を提案する。 cenは、アクション予測に基づいて、人気のあるモデルよりも制御された効果を識別できることを示す幅広い環境で評価される。 Modeling controllable aspects of the environment enable better prioritization of interventions and has become a popular exploration strategy in reinforcement learning methods. Despite repeatedly achieving State-of-the-Art results, this approach has only been studied as a proxy to a reward-based task and has not yet been evaluated on its own. We show that solutions relying on action prediction fail to model important events. Humans, on the other hand, assign blame to their actions to decide what they controlled. Here we propose Controlled Effect Network (CEN), an unsupervised method based on counterfactual measures of blame. CEN is evaluated in a wide range of environments showing that it can identify controlled effects better than popular models based on action prediction.	翻訳日:2021-06-03 00:28:55 公開日:2021-06-01
# (参考訳) 自己監督型学習による話者自動検証のための逆防御 Adversarial Defense for Automatic Speaker Verification by Self-Supervised Learning ( http://arxiv.org/abs/2106.00273v1 ) ライセンス: CC BY 4.0	Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee	(参考訳) 以前の研究では、自動話者検証(ASV)が、リプレイ、合成音声、最近出現した敵攻撃などの悪意のある密封攻撃に深刻な脆弱性があることが示されている。再生と合成音声に対するasvの防御に多大な努力が払われているが、敵対的な攻撃に対処するためのアプローチはごくわずかである。 ASVの敵攻撃に取り組むための既存のアプローチは、敵のサンプル生成の知識を必要とするが、敵の攻撃者によって適用される正確な攻撃アルゴリズムを知ることは現実的ではない。この研究は、特定の攻撃アルゴリズムを知らずにASVの敵防衛を行う最初の試みの一つである。自己教師型学習モデル(SSLMs)により、入力中の表面ノイズを緩和し、中断されたものからクリーンなサンプルを再構築する利点を持つが、この研究は、敵の摂動を一種のノイズとみなし、SSLMsによるASVに対する敵の防御を行う。具体的には,1) 対向摂動浄化と2) 対向摂動検出の2つの観点から対向防御を行うことを提案する。実験の結果, 検出モジュールは, 約80%の精度で対向検体を検出することにより, ASVを効果的に遮蔽することがわかった。さらに, ASV の敵防衛性能を評価するための一般的な指標は存在しないため, 浄化法と検出法の両方を考慮した敵防衛評価指標を定式化した。提案した評価フレームワークに基づいて,今後のアプローチのベンチマークを強く推奨する。 Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV against replay and synthetic speech; however, only a few approaches have been explored to deal with adversarial attacks. All the existing approaches to tackle adversarial attacks for ASV require the knowledge for adversarial samples generation, but it is impractical for defenders to know the exact attack algorithms that are applied by the in-the-wild attackers. This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. Inspired by self-supervised learning models (SSLMs) that possess the merits of alleviating the superficial noise in the inputs and reconstructing clean samples from the interrupted ones, this work regards adversarial perturbations as one kind of noise and conducts adversarial defense for ASV by SSLMs. 
Specifically, we propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%. Moreover, since there is no common metric for evaluating the adversarial defense performance for ASV, this work also formalizes evaluation metrics for adversarial defense, taking both purification and detection based approaches into account. We sincerely encourage future works to benchmark their approaches based on the proposed evaluation framework.	翻訳日:2021-06-03 00:11:31 公開日:2021-06-01
# (参考訳) 雑音ラベルに頑健な分類器の解析 Analysis of classifiers robust to noisy labels ( http://arxiv.org/abs/2106.00274v1 ) ライセンス: CC BY 4.0	Alex Díaz and Damian Steele	(参考訳) 我々は,クラス依存ラベリングノイズを克服するための現代的なロバスト分類アルゴリズム(Forward, Importance Re-weighting, T-revision)について検討する。分類器はクラス条件ランダムラベルノイズを含むデータで訓練・評価され、最終的なテストデータはクリーンである。ノイズデータを扱う際の分類器の性能を向上させるために,遷移行列を推定する手法を示す。深層学習を3つのデータセットに適用し,CIFARデータセット上の未知ノイズを用いたエンドツーエンド解析をスクラッチから導出する。分類器の有効性とロバスト性を分析し,top-1精度を基準として各実験の結果を比較対照した。 We explore contemporary robust classification algorithms for overcoming class-dependent labelling noise: Forward, Importance Re-weighting and T-revision. The classifiers are trained and evaluated on class-conditional random label noise data while the final test data is clean. We demonstrate methods for estimating the transition matrix in order to obtain better classifier performance when working with noisy data. We apply deep learning to three data-sets and derive an end-to-end analysis with unknown noise on the CIFAR data-set from scratch. The effectiveness and robustness of the classifiers are analysed, and we compare and contrast the results of each experiment using top-1 accuracy as our criterion.	翻訳日:2021-06-02 23:43:07 公開日:2021-06-01
# (参考訳) AAPM DL-Sparse-View CTチャレンジ提出レポート:幾何が未知のファンビームCTのための反復ネットワークの設計 AAPM DL-Sparse-View CT Challenge Submission Report: Designing an Iterative Network for Fanbeam-CT with Unknown Geometry ( http://arxiv.org/abs/2106.00280v1 ) ライセンス: CC BY 4.0	Martin Genzel, Jan Macdonald, Maximilian März	(参考訳) 本報告は、AAPM DL-Sparse-View CT Challenge(チーム名「robust-and-stable」)への私たちの貢献の短い動機と説明に捧げるものである。課題は、データ駆動再構成技術を用いて、限られたビューのファンビーム測定から乳房モデルファントム画像を復元することである。このチャレンジは、参加者に正解画像のコレクションと、そのノイズのないサブサンプリングされたサイノグラム(および関連する限定ビューのフィルタ補正逆投影画像)が提供されるが、実際のフォワードモデルは提供されないという点で特徴的である。そこで,本手法では,まずファンビーム形状をデータ駆動幾何キャリブレーションステップで推定する。その後の2段階の手順で、ほぼ正確な解の計算を可能にする反復的なエンドツーエンドネットワークを設計する。 This report is dedicated to a short motivation and description of our contribution to the AAPM DL-Sparse-View CT Challenge (team name: "robust-and-stable"). The task is to recover breast model phantom images from limited view fanbeam measurements using data-driven reconstruction techniques. The challenge is distinctive in the sense that participants are provided with a collection of ground truth images and their noiseless, subsampled sinograms (as well as the associated limited view filtered backprojection images), but not with the actual forward model. Therefore, our approach first estimates the fanbeam geometry in a data-driven geometric calibration step. In a subsequent two-step procedure, we design an iterative end-to-end network that enables the computation of near-exact solutions.	翻訳日:2021-06-02 23:34:19 公開日:2021-06-01
# (参考訳) マルチドメイン対話状態追跡のためのスキーマ対応カリキュラム学習のプレビュー,参加,レビュー Preview, Attend and Review: Schema-Aware Curriculum Learning for Multi-Domain Dialog State Tracking ( http://arxiv.org/abs/2106.00291v1 ) ライセンス: CC BY 4.0	Yinpei Dai, Hangyu Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Xiaodan Zhu	(参考訳) 既存のダイアログ状態追跡(DST)モデルは、データセットの豊富な構造情報を無視して、ランダムにダイアログデータをトレーニングする。本稿では,課題指向対話におけるカリキュラム構造とスキーマ構造の両方をよりよく活用するために,カリキュラム学習(CL)を提案する。具体的には,Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog) と呼ばれるモデルに依存しないフレームワークを提案する。このフレームワークは,DSTモデルをスキーマ情報で事前トレーニングするプレビューモジュールと,CLでモデルを最適化するカリキュラムモジュールと,CLトレーニングの強化のために誤予測データを拡張するレビューモジュールから構成される。提案手法は変換器ベースおよびRNNベースDSTモデル(TripPyおよびTRADE)よりもDST性能が向上し,WOZ2.0およびMultiWOZ2.1における新たな最先端結果が得られることを示す。 Existing dialog state tracking (DST) models are trained with dialog data in a random order, neglecting rich structural information in a dataset. In this paper, we propose to use curriculum learning (CL) to better leverage both the curriculum structure and schema structure for task-oriented dialogs. Specifically, we propose a model-agnostic framework called Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog), which consists of a preview module that pre-trains a DST model with schema information, a curriculum module that optimizes the model with CL, and a review module that augments mispredicted data to reinforce the CL training. We show that our proposed approach improves DST performance over both a transformer-based and RNN-based DST model (TripPy and TRADE) and achieves new state-of-the-art results on WOZ2.0 and MultiWOZ2.1.	翻訳日:2021-06-02 23:23:48 公開日:2021-06-01
# (参考訳) 正半定因子化のためのリー・ソンのアルゴリズムの非可換拡張 A Non-commutative Extension of Lee-Seung's Algorithm for Positive Semidefinite Factorizations ( http://arxiv.org/abs/2106.00293v1 ) ライセンス: CC BY 4.0	Yong Sheng Soh, Antonios Varvitsiotis	(参考訳) 非負の成分を持つ行列 $X\in \mathbb{R}_+^{m\times n}$ が与えられたとき、$X$ の正半定値 (PSD) 分解は、すべての $i\in [m],\ j\in [n]$ に対して $X_{ij}= \mathrm{tr}(A_i B_j)$ を満たす $r \times r$ 次元 PSD 行列 $\{A_i\}$ と $\{B_j\}$ の集合である。 PSD分解は、情報理論における量子資源の力と限界だけでなく、半定値プログラムの表現力の理解と基本的に結びついている。 PSD分解タスクは、非負行列因子分解(NMF)問題、すなわちすべての $i\in [m],\ j\in [n]$ に対して $X_{ij}= a_i^\top b_j$ を満たす $r$ 次元非負ベクトルの集まり $\{a_i\}$ と $\{b_j\}$ を求める問題を一般化する。PSD分解の行列を対角行列に選ぶことで後者の問題を回復することができる。行列のNMFを計算するための最も広く使われているアルゴリズムは、Lee and Seungによって開発された乗算更新アルゴリズムであり、更新の非負性は正の対角行列でスケーリングすることで保存される。本稿では,PSD分解の計算のために,行列乗法更新(MMU)アルゴリズムと呼ぶLee-Seungアルゴリズムの非可換拡張について述べる。 MMUアルゴリズムは、適切なPSD行列の行列幾何平均による合同スケーリングによって更新がPSDのままであることを保証し、Lee-Seungアルゴリズムが持つ実装の簡潔さも保持する。また,Majorization-Minimizationフレームワークに基づいて,この更新則の下で2乗損失目的関数が非増加であり,固定点が臨界点に対応することを示す。この分析はリーブの凹性定理(Lieb's Concavity Theorem)に依存する。 PSD分解以外にも、MMUアルゴリズムをプリミティブとしてブロック対角PSD分解とテンソルPSD分解を計算する。実データと合成データの実験により,本手法の有用性を実証する。 Given a matrix $X\in \mathbb{R}_+^{m\times n}$ with nonnegative entries, a Positive Semidefinite (PSD) factorization of $X$ is a collection of $r \times r$-dimensional PSD matrices $\{A_i\}$ and $\{B_j\}$ satisfying $X_{ij}= \mathrm{tr}(A_i B_j)$ for all $i\in [m],\ j\in [n]$. PSD factorizations are fundamentally linked to understanding the expressiveness of semidefinite programs as well as the power and limitations of quantum resources in information theory. The PSD factorization task generalizes the Non-negative Matrix Factorization (NMF) problem where we seek a collection of $r$-dimensional nonnegative vectors $\{a_i\}$ and $\{b_j\}$ satisfying $X_{ij}= a_i^\top b_j$, for all $i\in [m],\ j\in [n]$ -- one can recover the latter problem by choosing matrices in the PSD factorization to be diagonal. 
The most widely used algorithm for computing NMFs of a matrix is the Multiplicative Update algorithm developed by Lee and Seung, in which nonnegativity of the updates is preserved by scaling with positive diagonal matrices. In this paper, we describe a non-commutative extension of Lee-Seung's algorithm, which we call the Matrix Multiplicative Update (MMU) algorithm, for computing PSD factorizations. The MMU algorithm ensures that updates remain PSD by congruence scaling with the matrix geometric mean of appropriate PSD matrices, and it retains the simplicity of implementation that Lee-Seung's algorithm enjoys. Building on the Majorization-Minimization framework, we show that under our update scheme the squared loss objective is non-increasing and fixed points correspond to critical points. The analysis relies on Lieb's Concavity Theorem. Beyond PSD factorizations, we use the MMU algorithm as a primitive to calculate block-diagonal PSD factorizations and tensor PSD factorizations. We demonstrate the utility of our method with experiments on real and synthetic data.	翻訳日:2021-06-02 23:12:40 公開日:2021-06-01
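The classical Lee-Seung multiplicative updates referred to in this abstract can be sketched for the vector (NMF) case, where $X_{ij} \approx a_i^\top b_j$. This is a minimal pure-Python illustration and not the paper's MMU algorithm: the helper names (`matmul`, `lee_seung_step`, `nmf`), the random initialization, and the small `eps` guard in the denominators are assumptions made for this example only.

```python
import random

def matmul(P, Q):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def squared_loss(X, A, B):
    """Sum over i, j of (X_ij - a_i^T b_j)^2, with a_i, b_j the rows of A, B."""
    approx = matmul(A, transpose(B))
    return sum((x - y) ** 2 for rx, ry in zip(X, approx) for x, y in zip(rx, ry))

def lee_seung_step(X, A, B, eps=1e-12):
    """One round of multiplicative updates for X ~ A B^T (eps is an assumed guard)."""
    # A <- A * (X B) / (A B^T B), elementwise; nonnegative factors stay nonnegative.
    num = matmul(X, B)
    den = matmul(A, matmul(transpose(B), B))
    A = [[a * n / (d + eps) for a, n, d in zip(ra, rn, rd)]
         for ra, rn, rd in zip(A, num, den)]
    # B <- B * (X^T A) / (B A^T A), using the freshly updated A.
    num = matmul(transpose(X), A)
    den = matmul(B, matmul(transpose(A), A))
    B = [[b * n / (d + eps) for b, n, d in zip(rb, rn, rd)]
         for rb, rn, rd in zip(B, num, den)]
    return A, B

def nmf(X, r, iters=200, seed=0):
    """Run the updates from a random positive start and record the loss."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    A = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    B = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    losses = [squared_loss(X, A, B)]
    for _ in range(iters):
        A, B = lee_seung_step(X, A, B)
        losses.append(squared_loss(X, A, B))
    return A, B, losses
```

Each update only multiplies an entry by a ratio of nonnegative quantities, which is the "scaling with positive diagonal matrices" the abstract describes; the PSD version replaces this elementwise scaling with congruence scaling, which this sketch does not attempt.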
# (参考訳) 電力請求書の裏側:非インタラクティブ負荷監視のためのデュアルdnnアプローチ More Behind Your Electricity Bill: a Dual-DNN Approach to Non-Intrusive Load Monitoring ( http://arxiv.org/abs/2106.00297v1 ) ライセンス: CC BY 4.0	Yu Zhang, Guoming Tang, Qianyi Huang, Yi Wang, Hong Xu	(参考訳) 非侵入負荷モニタリング(NILM)は、家庭のエネルギー消費を個々の家電の項目別エネルギー利用に分解することを目的とした、よく知られた単一チャネルブラインドソース分離問題である。このように、家庭のエネルギー利用に対する意識を高めることで、かなりの省エネが達成できる。近年の研究では、ディープニューラルネットワーク(DNN)ベースのアプローチがNILMタスクに有望であることが示されている。それでも、彼らは通常、ネットワーク設計におけるアプライアンス操作の固有の特性を無視し、不可解な結果をもたらす可能性がある。そこで我々は,DNNの潜在特徴の学習能力を活かしたデュアルディープニューラルネットワーク(Dual-DNN)を開発した。具体的には,2重DNNの設計において,異なる機器の動作状態のパワーレーティングを測定するサブネットワークと,対象機器の動作状態を特定するサブネットワークを採用する。最終結果は、これら2つのネットワーク出力を乗算し、一方、家電製品の多状態特性を考慮して得られる。家電の動作状態の空間特性を強制するために, 正中フィルタリングとハードゲーティング機構をサブネットワークに適用し, 状態同定を行う。最新のNILM手法と比較して、我々のデュアルDNNアプローチは、2つの公開ベンチマークデータセットで平均21.67%の性能改善を示す。 Non-intrusive load monitoring (NILM) is a well-known single-channel blind source separation problem that aims to decompose the household energy consumption into itemised energy usage of individual appliances. In this way, considerable energy savings could be achieved by enhancing household's awareness of energy usage. Recent investigations have shown that deep neural networks (DNNs) based approaches are promising for the NILM task. Nevertheless, they normally ignore the inherent properties of appliance operations in the network design, potentially leading to implausible results. We are thus motivated to develop the dual Deep Neural Networks (dual-DNN), which aims to i) take advantage of DNNs' learning capability of latent features and ii) empower the DNN architecture with identification ability of universal properties. Specifically in the design of dual-DNN, we adopt one subnetwork to measure power ratings of different appliances' operation states, and the other subnetwork to identify the running states of target appliances. 
The final result is then obtained by multiplying these two network outputs while accounting for the multi-state property of household appliances. To enforce sparsity in the appliances' operating states, we apply median filtering and hard gating mechanisms to the state-identification subnetwork. Compared with state-of-the-art NILM methods, our dual-DNN approach demonstrates a 21.67% performance improvement on average on two public benchmark datasets.	翻訳日:2021-06-02 22:54:16 公開日:2021-06-01
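The combination step described in this abstract (state output times power rating, with median filtering and hard gating on the state branch) can be illustrated with a toy post-processing sketch. It assumes the two subnetworks have already produced per-time-step state probabilities and per-state power ratings; the function names, the argmax form of the gate, and the filter width are hypothetical choices for illustration, not details taken from the paper.

```python
from statistics import median

def median_filter(seq, width=3):
    """Sliding median with edge padding; width is assumed to be odd."""
    half = width // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(seq))]

def hard_gate(probs):
    """Turn one row of state probabilities into a one-hot vector (argmax gate)."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    return [1.0 if i == k else 0.0 for i in range(len(probs))]

def estimate_power(state_probs, ratings, width=3):
    """Combine state probabilities (T x K) with per-state power ratings (K)."""
    # Smooth each state's probability track over time to suppress isolated spikes.
    tracks = [median_filter(track, width) for track in zip(*state_probs)]
    smoothed = [list(row) for row in zip(*tracks)]
    # Gate each time step to a single active state.
    gates = [hard_gate(row) for row in smoothed]
    # Final estimate: state indicator multiplied by that state's power rating.
    return [sum(g * r for g, r in zip(gate, ratings)) for gate in gates]

# Toy two-state appliance (off, on) with an assumed 100 W "on" rating.
probs = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1], [0.9, 0.1]]
print(estimate_power(probs, [0.0, 100.0]))
```

In the toy input the single "on" time step is an isolated spike, so the median filter removes it before gating; a sustained run of "on" probabilities would pass through unchanged.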
# (参考訳) ゼロショット合成性のための独立プロトタイプ伝搬 Independent Prototype Propagation for Zero-Shot Compositionality ( http://arxiv.org/abs/2106.00305v1 ) ライセンス: CC BY 4.0	Frank Ruis, Gertjan Burghours, Doina Bucur	(参考訳) 人間は作曲のゼロショット推論が得意で、シマウマを見たことがない人は、黒と白のストライプの馬のように見えると認識することができる。一方、機械学習システムは通常、トレーニングデータの急激な相関を利用しており、そのような相関は、コンテキスト内でオブジェクトを認識するのに役立つが、一般化を損なう。分類中の文脈的手がかりを活用しつつ,不特定なデータセットを扱うために,新しいプロトタイプ伝搬グラフ法protopropを提案する。まず、条件付き独立なw.r.t.である物体(例えばゼブラ)の原型的表現を学ぶ。彼らの属性ラベル(例:ストライプ)とその逆。次に, 対象分布の依存関係を反映した新規属性・オブジェクトの組み合わせの合成プロトタイプを学習するために, 合成グラフを通して独立プロトタイプを伝搬する。このメソッドはクラス階層グラフや事前学習された単語埋め込みといった外部データに依存しない。 AO-Cleverはクリーンなラベルを持つ合成的でビジュアルなデータセットであり、UT-Zapposはきめ細かい靴型のノイズの多い現実世界のデータセットである。一般化された合成ゼロショット設定において、我々は最先端の結果よりも優れており、この手法のそれぞれの部分の重要性と最終的な結果への寄与を示す。 Humans are good at compositional zero-shot reasoning; someone who has never seen a zebra before could nevertheless recognize one when we tell them it looks like a horse with black and white stripes. Machine learning systems, on the other hand, usually leverage spurious correlations in the training data, and while such correlations can help recognize objects in context, they hurt generalization. To be able to deal with underspecified datasets while still leveraging contextual clues during classification, we propose ProtoProp, a novel prototype propagation graph method. First we learn prototypical representations of objects (e.g., zebra) that are conditionally independent w.r.t. their attribute labels (e.g., stripes) and vice versa. Next we propagate the independent prototypes through a compositional graph, to learn compositional prototypes of novel attribute-object combinations that reflect the dependencies of the target distribution. The method does not rely on any external data, such as class hierarchy graphs or pretrained word embeddings. We evaluate our approach on AO-Clever, a synthetic and strongly visual dataset with clean labels, and UT-Zappos, a noisy real-world dataset of fine-grained shoe types. 
We show that in the generalized compositional zero-shot setting we outperform state-of-the-art results, and through ablations we show the importance of each part of the method and its contribution to the final results.	翻訳日:2021-06-02 22:37:37 公開日:2021-06-01
# (参考訳) 世界ニュースを通して平和を理解する Understanding peacefulness through the world news ( http://arxiv.org/abs/2106.00306v1 ) ライセンス: CC BY 4.0	Vasiliki Voukelatou, Ioanna Miliou, Fosca Giannotti, Luca Pappalardo	(参考訳) 平和性は全ての人類にとって幸福の主要な次元であり、不平等とあらゆる形態の暴力から抜け出す方法である。そのため、近年は研究者や政策立案者の注目を集めている。ここ数年、新しいデジタルデータストリームがこの分野の研究を大きく変えてきた。本研究は,GDELT(Global Data on Events, Location, and Tone)デジタルニュースデータベースから抽出した情報を利用して,GPI(Global Peace Index)を通して平和性を捉えている。予測機械学習モデルを適用することで,gdeltによるニュースメディアの注目度を月単位のgpi測定の指標として利用できることを示す。さらに、shap方法論を使用して、予測を駆動する最も重要な変数を取得します。この分析は各国のプロファイルを強調し、全体的な予測、特にこれらのエラーを駆動するエラーやイベントについての説明を提供する。社会善研究者、政策立案者、平和構築者が活用するデジタルデータは、機械学習と同じくらい強力なデータサイエンスツールによって、社会的利益の最大化と平和へのリスクの最小化に寄与すると考えている。 Peacefulness is a principal dimension of well-being for all humankind and is the way out of inequity and every single form of violence. Thus, its measurement has lately drawn the attention of researchers and policy-makers. During the last years, novel digital data streams have drastically changed the research in this field. In the current study, we exploit information extracted from Global Data on Events, Location, and Tone (GDELT) digital news database, to capture peacefulness through the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level. Additionally, we use the SHAP methodology to obtain the most important variables that drive the predictions. This analysis highlights each country's profile and provides explanations for the predictions overall, and particularly for the errors and the events that drive these errors. We believe that digital data exploited by Social Good researchers, policy-makers, and peace-builders, with data science tools as powerful as machine learning, could contribute to maximize the societal benefits and minimize the risks to peacefulness.	翻訳日:2021-06-02 22:22:23 公開日:2021-06-01
# (参考訳) Reinforce Security: セキュアなWiretapコーディングに向けたモデルフリーアプローチ Reinforce Security: A Model-Free Approach Towards Secure Wiretap Coding ( http://arxiv.org/abs/2106.00343v1 ) ライセンス: CC BY 4.0	Rick Fritschek, Rafael F. Schaefer, Gerhard Wunder	(参考訳) セキュアな符号化関数を近似するためのディープラーニングベースの技術は、無線通信システムの一般的なコーディングとデコードタスクで得られた素晴らしい結果によって、無線通信にかなりの関心を集めている。特に重要なのは、基礎となるチャネルを知らずに機能するモデルフリー技術の開発である。このような手法は,例えば,条件付きチャネル分布の推定とモデル化,報奨関数としての相互情報推定,強化学習などに用いる。本稿では,強化学習のアプローチについて検討し,特にニューラルネットワークを用いたセキュアエンコーディングのモデルフリーアプローチのためのポリシー勾配法について検討する。従来開発された符号化プロセス上のコセット構造を強制する手法は、最近の強化学習手法と組み合わせることができる。この新しい手法は広範囲のシミュレーションにより評価され, 盗聴者の復号性能が一定の誤差レベルに低下することが示されている。 The use of deep learning-based techniques for approximating secure encoding functions has attracted considerable interest in wireless communications due to impressive results obtained for general coding and decoding tasks for wireless communication systems. Of particular importance is the development of model-free techniques that work without knowledge about the underlying channel. Such techniques utilize for example generative adversarial networks to estimate and model the conditional channel distribution, mutual information estimation as a reward function, or reinforcement learning. In this paper, the approach of reinforcement learning is studied and, in particular, the policy gradient method for a model-free approach of neural network-based secure encoding is investigated. Previously developed techniques for enforcing a certain co-set structure on the encoding process can be combined with recent reinforcement learning approaches. This new approach is evaluated by extensive simulations, and it is demonstrated that the resulting decoding performance of an eavesdropper is capped at a certain error level.	翻訳日:2021-06-02 21:53:10 公開日:2021-06-01
# (参考訳) 頂点$p$-center問題の解法のためのグラフ畳み込みネットワークによる実験 Experiments with graph convolutional networks for solving the vertex $p$-center problem ( http://arxiv.org/abs/2106.00357v1 ) ライセンス: CC BY 4.0	Elisabeth Gaar and Markus Sinnl	(参考訳) 過去数年間、グラフ畳み込みネットワーク(gcn)は、グラフ上で定義されたnp-hard combinatorial optimization problem(cops)に取り組むために、機械学習コミュニティで人気のある研究方向となっている。得られた結果は、通常、オペレーションリサーチコミュニティの問題解決アプローチと競合しないが、GCNは、トラベルセールスパーソン問題(TSP)のような古典的なCOPに対する以前の機械学習アプローチと比べて改善されることが多い。本稿では,グラフ上の別の古典的COPである頂点p中心問題(PCP)の解法としてGCNを用いた予備的検討を行う。特に、TSPのエンド・ツー・エンドトレーニングに基づくモデルが、同様の2次元ユークリッドグラフ入力に基づいてTSPの通常使われるバージョンとして定義されたPCPに適応できるかどうかを検討する。しかし、PCP の目的は min-max 構造であり、多くの対称最適解、すなわち、接地トラス解や学習の潜在的な困難をもたらす可能性がある。得られた予備結果は,ネットワークアーキテクチャのアイデアの直接転送があまりうまくいかないことを示している。したがって、我々はPCPがGCNの領域における新しいアイデアや開発のための興味深いベンチマーク問題になり得ると考えている。 In the last few years, graph convolutional networks (GCN) have become a popular research direction in the machine learning community to tackle NP-hard combinatorial optimization problems (COPs) defined on graphs. While the obtained results are usually still not competitive with problem-specific solution approaches from the operations research community, GCNs often lead to improvements compared to previous machine learning approaches for classical COPs such as the traveling salesperson problem (TSP). In this work we present a preliminary study on using GCNs for solving the vertex p-center problem (PCP), which is another classic COP on graphs. In particular, we investigate whether a successful model based on end-to-end training for the TSP can be adapted to a PCP, which is defined on a similar 2D Euclidean graph input as the usually used version of the TSP. However, the objective of the PCP has a min-max structure which could lead to many symmetric optimal, i.e., ground-truth solutions and other potential difficulties for learning. Our obtained preliminary results show that indeed a direct transfer of network architecture ideas does not seem to work too well. 
Thus we think that the PCP could be an interesting benchmark problem for new ideas and developments in the area of GCNs.	翻訳日:2021-06-02 21:41:16 公開日:2021-06-01
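The min-max objective of the vertex $p$-center problem, and the symmetric optima the abstract mentions, can be seen on a tiny instance. The sketch below is a brute-force illustration on 2D Euclidean points, not the GCN approach studied in the paper; the function names and the example coordinates are assumptions made here.

```python
from itertools import combinations
from math import dist

def pcp_objective(points, centers):
    """Largest distance from any point to its nearest chosen center (min-max)."""
    return max(min(dist(p, points[c]) for c in centers) for p in points)

def brute_force_pcp(points, p):
    """Enumerate all center sets of size p; only viable for tiny instances."""
    best = min(combinations(range(len(points)), p),
               key=lambda cs: pcp_objective(points, cs))
    return best, pcp_objective(points, best)

# Two well-separated pairs of points; any one center per pair is optimal.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
best, value = brute_force_pcp(points, 2)
ties = [cs for cs in combinations(range(len(points)), 2)
        if pcp_objective(points, cs) == value]
print(best, value, len(ties))
```

Here four different center sets reach the same optimal value, which is the kind of ambiguity in ground-truth labels that the abstract flags as a potential difficulty for supervised learning.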
# (参考訳) フットボールのボディーオリエンテーションを分類として学ぶ Learning Football Body-Orientation as a Matter of Classification ( http://arxiv.org/abs/2106.00359v1 ) ライセンス: CC BY 4.0	Adri\`a Arbu\'es-Sang\"uesa, Adri\'an Mart\'in, Paulino Granero, Coloma Ballester, Gloria Haro	(参考訳) オリエンテーションはサッカー選手にとって重要なスキルであり、多くのイベント、特にパスを含むイベントにおいて差別化要因となる。しかし、既存の方向推定手法は、コンピュータビジョン技術に基づいているが、改善の余地は多い。我々の知る限り、本論文はビデオ映像から直接向きを推定する最初のディープラーニングモデルを示す。クラスが配向ビンに対応する分類問題としてこの課題にアプローチし、循環損失関数を導入することにより、有名な畳み込みネットワークを改良し、プレーヤの配向データを提供する。このモデルは、現在のフレームの認識方向に対して個別に補償されるウェアラブルEPTSデバイスから得られる地中構造データを用いて訓練される。得られた結果は従来の手法よりも優れており、特に絶対中央値誤差はプレイヤー当たり12度以下である。あらゆる種類のフットボールビデオ映像に潜在的な一般化を示すために、アブレーション研究が行われる。 Orientation is a crucial skill for football players that becomes a differential factor in a large set of events, especially the ones involving passes. However, existing orientation estimation methods, which are based on computer-vision techniques, still have a lot of room for improvement. To the best of our knowledge, this article presents the first deep learning model for estimating orientation directly from video footage. By approaching this challenge as a classification problem where classes correspond to orientation bins, and by introducing a cyclic loss function, a well-known convolutional network is refined to provide player orientation data. The model is trained by using ground-truth orientation data obtained from wearable EPTS devices, which are individually compensated with respect to the perceived orientation in the current frame. The obtained results outperform previous methods; in particular, the absolute median error is less than 12 degrees per player. An ablation study is included in order to show the potential generalization to any kind of football video footage.	翻訳日:2021-06-02 21:33:58 公開日:2021-06-01
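When orientation is discretized into bins, the first and last bins are neighbours on the circle, which is the motivation for a cyclic loss. The abstract does not spell out the loss used in the paper, so the sketch below only illustrates the wrap-around idea with a hypothetical expected circular bin error; both function names and the 12-bin example are assumptions.

```python
def circular_bin_distance(i, j, n_bins):
    """Number of bin steps between i and j along the shorter way round the circle."""
    d = abs(i - j) % n_bins
    return min(d, n_bins - d)

def expected_cyclic_error(probs, target):
    """Expected circular distance to the target bin under predicted probabilities."""
    n = len(probs)
    return sum(p * circular_bin_distance(k, target, n) for k, p in enumerate(probs))

# With 12 bins of 30 degrees, bin 11 is one step from bin 0, not eleven.
print(circular_bin_distance(0, 11, 12))
```

A plain classification loss would treat a prediction in bin 11 as being as wrong as one in bin 6 when the target is bin 0; a distance that wraps around penalizes the former far less.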
# (参考訳) 空間的・時間的制約を伴う大規模・動的・分散連立形成 Large-scale, Dynamic and Distributed Coalition Formation with Spatial and Temporal Constraints ( http://arxiv.org/abs/2106.00379v1 ) ライセンス: CC BY-SA 4.0	Luca Capezzuto, Danesh Tarapore, and Sarvapali D. Ramchurn	(参考訳) 空間的・時間的制約を伴う連立形成問題(CFSTP)は、少数のエージェントが、それぞれ期限と作業量を持つ多数のタスクを実行しなければならないマルチエージェントタスク割り当て問題である。完了したタスクの数を最大化するために、エージェントは連合を形成し、解散し、再編成することで協力する必要がある。 CFSTPの元々の数学的プログラミングの定式化は、長大で問題のあるBig-M法に基づいているため、実装が難しい。本稿では,コンパクトで実装が容易な定式化を提案する。さらに、最先端CFSTPアルゴリズムの分散バージョンであるD-CTSを設計する。ロンドン消防団の公開記録を使って、$347588$件のタスクを含むデータセットと、動的環境における消防士の動員をシミュレートするテストフレームワークを作成する。最大$150$エージェント、$3000$タスクの問題では、最先端の分散アルゴリズムであるDSA-SDPと比較して、D-CTSは$3.79\% \pm [42.22\%, 1.96\%]$多くのタスクを完了し、通信オーバーヘッドと時間複雑性の点で1桁効率的である。 D-CTSは、最初の大規模、動的、分散CFSTPベンチマークを確立する。 The Coalition Formation with Spatial and Temporal constraints Problem (CFSTP) is a multi-agent task allocation problem in which few agents have to perform many tasks, each with its deadline and workload. To maximize the number of completed tasks, the agents need to cooperate by forming, disbanding and reforming coalitions. The original mathematical programming formulation of the CFSTP is difficult to implement, since it is lengthy and based on the problematic Big-M method. In this paper, we propose a compact and easy-to-implement formulation. Moreover, we design D-CTS, a distributed version of the state-of-the-art CFSTP algorithm. Using public London Fire Brigade records, we create a dataset with $347588$ tasks and a test framework that simulates the mobilization of firefighters in dynamic environments. In problems with up to $150$ agents and $3000$ tasks, compared to DSA-SDP, a state-of-the-art distributed algorithm, D-CTS completes $3.79\% \pm [42.22\%, 1.96\%]$ more tasks, and is one order of magnitude more efficient in terms of communication overhead and time complexity. D-CTS sets the first large-scale, dynamic and distributed CFSTP benchmark.	翻訳日:2021-06-02 21:21:45 公開日:2021-06-01
# (参考訳) 決定概念の格子対決定木とランダムフォレスト Decision Concept Lattice vs. Decision Trees and Random Forests ( http://arxiv.org/abs/2106.00387v1 ) ライセンス: CC BY 4.0	Egor Dudyrev, Sergei O. Kuznetsov	(参考訳) 決定木とそのアンサンブルは、教師付き機械学習の非常に人気のあるモデルである。本稿では、多項式時間で構築可能な新しい教師付き機械学習モデルを提案し、分類問題と回帰問題の両方に適用できる決定木、それらのアンサンブル、FCAの考え方を融合する。具体的には,まず決定木に基づく概念格子の一部を構成する多項式時間アルゴリズムを提案する。第2に,最先端モデルに匹敵する予測品質で分類タスクと回帰タスクの両方を解くための概念格子に基づく予測スキームについて述べる。 Decision trees and their ensembles are very popular models of supervised machine learning. In this paper we merge the ideas underlying decision trees, their ensembles and FCA by proposing a new supervised machine learning model which can be constructed in polynomial time and is applicable for both classification and regression problems. Specifically, we first propose a polynomial-time algorithm for constructing a part of the concept lattice that is based on a decision tree. Second, we describe a prediction scheme based on a concept lattice for solving both classification and regression tasks with prediction quality comparable to that of state-of-the-art models.	翻訳日:2021-06-02 21:05:08 公開日:2021-06-01
# (参考訳) 視覚に基づく異常赤血球分類の解析 Analysis of Vision-based Abnormal Red Blood Cell Classification ( http://arxiv.org/abs/2106.00389v1 ) ライセンス: CC BY 4.0	Annika Wong and Nantheera Anantrasirichai and Thanarat H. Chalidabhongse and Duangdao Palasuwan and Attakorn Palasuwan and David Bull	(参考訳) 赤血球(RBC)の異常の同定は、貧血から肝疾患まで幅広い医学的疾患を診断する鍵となる。現在、これは手動で行われ、時間がかかり、主観的なプロセスである。本稿では,機械学習の利点を利用したセル異常検出のキャパシティ向上と標準化を行い,その性能を解析する。従来の機械学習技術であるSVM(Support Vector Machine)、グラフデータのためのディープラーニングアーキテクチャであるTabNet、医用画像セグメンテーション用に設計されたセグメンテーションネットワークであるU-Netの3つの異なる機械学習技術が使用された。重要な問題は、機械学習の有効性に影響を与えるデータセットの高度に不均衡な性質であった。これを解決するために,SMOTE(Synthetic Minority Over-Sampling Technique)とコスト依存学習を用いて,特徴空間におけるマイノリティクラスサンプルの合成を検討した。これら2つの手法を組み合わせて全体の性能を改善する。これらの戦略は少数民族に対する感受性を高めることが判明した。未知の細胞が意味的セグメンテーションに与える影響を実証し、このモデルがラベル付き細胞の学習をこれらの匿名細胞に適用する証拠を示す。これらの結果は,RBC異常検出の自動化に期待できる手法として,古典的モデルと新しいディープラーニングネットワークの両方を示している。 Identification of abnormalities in red blood cells (RBC) is key to diagnosing a range of medical conditions from anaemia to liver disease. Currently this is done manually, a time-consuming and subjective process. This paper presents an automated process utilising the advantages of machine learning to increase capacity and standardisation of cell abnormality detection, and its performance is analysed. Three different machine learning technologies were used: a Support Vector Machine (SVM), a classical machine learning technology; TabNet, a deep learning architecture for tabular data; U-Net, a semantic segmentation network designed for medical image segmentation. A critical issue was the highly imbalanced nature of the dataset which impacts the efficacy of machine learning. To address this, synthesising minority class samples in feature space was investigated via Synthetic Minority Over-sampling Technique (SMOTE) and cost-sensitive learning. A combination of these two methods is investigated to improve the overall performance. These strategies were found to increase sensitivity to minority classes. 
The impact of unknown cells on semantic segmentation is demonstrated, with some evidence of the model applying learning of labelled cells to these anonymous cells. These findings indicate both classical models and new deep learning networks as promising methods in automating RBC abnormality detection.	翻訳日:2021-06-02 20:58:54 公開日:2021-06-01
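The SMOTE step named in this abstract synthesizes minority-class samples by interpolating in feature space between a minority sample and one of its nearest minority neighbours. The sketch below is a minimal pure-Python illustration of that idea only; the function name, the default `k`, and the toy feature vectors are assumptions, and it is not the pipeline used in the paper.

```python
import random
from math import dist

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted((z for z in minority if z is not x),
                            key=lambda z: dist(x, z))[:k]
        z = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to z
        synthetic.append(tuple(xi + u * (zi - xi) for xi, zi in zip(x, z)))
    return synthetic

# Three minority-class feature vectors (toy values).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote(minority, n_new=5))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies, which is why the technique is usually paired with other measures such as the cost-sensitive learning mentioned above.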
# (参考訳) SHUOWEN-JIEZI:中国語言語モデル事前学習のための言語学的知見に基づくトークナイザ SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining ( http://arxiv.org/abs/2106.00400v1 ) ライセンス: CC BY 4.0	Chenglei Si, Zhengyan Zhang, Yingfa Chen, Fanchao Qi, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun	(参考訳) 中国語事前訓練言語モデル(PLM)の従来のトークン化手法では、各文字を分割不可能なトークンとして扱う(Devlin et al., 2019)が、これは中国語の表記体系の特性を無視している。本研究では,PLMの中国語トークン化における3つの要因,すなわち発音,グリフ(形),単語境界の影響を包括的に研究する。これに対応して,1) 発音に基づくトークナイザであるSHUOWEN(Talk Wordの意),2) グリフに基づくトークナイザであるJIEZI(Solve Characterの意),3) 中国語単語分割を用いる単語分割トークナイザの3種類を提案する。検討したトークナイザの有効性を実証的に比較するために,それらを用いてBERTスタイルの言語モデルを事前学習し,様々な下流NLUタスクでモデルを評価する。 SHUOWENとJIEZIは概して従来の単一文字のトークナイザよりも優れている一方、中国語単語分割は前処理のステップとして何の利益も示さない。さらに,提案したSHUOWENおよびJIEZIトークナイザは,ノイズの多いテキストを扱う場合のロバスト性が著しく向上した。コードと事前訓練されたモデルは、言語学的知見に基づく中国語NLPを促進するために公開される。 Conventional tokenization methods for Chinese pretrained language models (PLMs) treat each character as an indivisible token (Devlin et al., 2019), which ignores the characteristics of the Chinese writing system. In this work, we comprehensively study the influences of three main factors on the Chinese tokenization for PLM: pronunciation, glyph (i.e., shape), and word boundary. Correspondingly, we propose three kinds of tokenizers: 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; 2) JIEZI (meaning Solve Character), the glyph-based tokenizers; 3) Word segmented tokenizers, the tokenizers with Chinese word segmentation. To empirically compare the effectiveness of studied tokenizers, we pretrain BERT-style language models with them and evaluate the models on various downstream NLU tasks. We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers, while Chinese word segmentation shows no benefit as a preprocessing step. Moreover, the proposed SHUOWEN and JIEZI tokenizers exhibit significantly better robustness in handling noisy texts. The code and pretrained models will be publicly released to facilitate linguistically informed Chinese NLP.	翻訳日:2021-06-02 20:39:06 公開日:2021-06-01
# (参考訳) KGPool:関係抽出のための動的知識グラフコンテキスト選択 KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction ( http://arxiv.org/abs/2106.00459v1 ) ライセンス: CC BY 4.0	Abhishek Nadgeri, Anson Bastos, Kuldeep Singh, Isaiah Onando Mulang', Johannes Hoffart, Saeedeh Shekarpour, Vijay Saraswat	(参考訳) 本稿では,1つの文から関係抽出(RE)を行い,文と2つの与えられた実体を知識グラフ(KG)の標準事実にマッピングする手法を提案する。特にこの推定されたセンデンシャルRE設定では、単一の文のコンテキストはしばしばスパースである。本稿では,KGPool法を用いて,KGから追加事実を付加してコンテキストを動的に拡張する手法を提案する。これらの事実(エンティティエイリアス、エンティティ記述など)の表現を学習する。知覚的文脈を補う神経的手法を使いますすべての拡張事実を静的に使用する既存の方法とは異なり、KGPoolはこの拡張を文に条件付ける。ニューラルモデルとKG(WikidataとNYT Freebase)を用いてKGPoolの有効性を評価する。標準データセットを用いた実験により,KGPool表現をグラフニューラルネットワークに入力することにより,本手法は最先端手法よりもはるかに精度が高いことがわかった。 We present a novel method for relation extraction (RE) from a single sentence, mapping the sentence and two given entities to a canonical fact in a knowledge graph (KG). Especially in this presumed sentential RE setting, the context of a single sentence is often sparse. This paper introduces the KGPool method to address this sparsity, dynamically expanding the context with additional facts from the KG. It learns the representation of these facts (entity alias, entity descriptions, etc.) using neural methods, supplementing the sentential context. Unlike existing methods that statically use all expanded facts, KGPool conditions this expansion on the sentence. We study the efficacy of KGPool by evaluating it with different neural models and KGs (Wikidata and NYT Freebase). Our experimental evaluation on standard datasets shows that by feeding the KGPool representation into a Graph Neural Network, the overall method is significantly more accurate than state-of-the-art methods.	翻訳日:2021-06-02 20:26:39 公開日:2021-06-01
# (参考訳) 機械学習における公平度指標の動物園 The zoo of Fairness metrics in Machine Learning ( http://arxiv.org/abs/2106.00467v1 ) ライセンス: CC BY 4.0	Alessandro Castelnovo, Riccardo Crupi, Greta Greco, Daniele Regoli	(参考訳) 近年,機械学習(ML)における公平性と自動意思決定の問題が,人工知能を扱う科学コミュニティで注目されている。 MLにおける公平性の定義の多様さが提案され、人口の個人に影響を与える状況において「公正な決定」とは何かという異なる概念が検討されている。これらの概念間の正確な相違、含意、および「直交性」は、まだ文献で完全には分析されていない。本研究では、この定義の動物園から何らかの順序付けを試みる。 In the recent years, the problem of addressing fairness in Machine Learning (ML) and automatic decision-making has attracted a lot of attention in the scientific communities dealing with Artificial Intelligence. A plethora of different definitions of fairness in ML have been proposed, that consider different notions of what is a "fair decision" in situations impacting individuals in the population. The precise differences, implications and "orthogonality" between these notions have not yet been fully analyzed in the literature. In this work, we try to make some order out of this zoo of definitions.	翻訳日:2021-06-02 20:08:26 公開日:2021-06-01
# (参考訳) 微分プライバシーを持つガウス過程 Gaussian Processes with Differential Privacy ( http://arxiv.org/abs/2106.00474v1 ) ライセンス: CC BY 4.0	Antti Honkela	(参考訳) ガウス過程(英: Gaussian process、GP)は、様々な予測タスクに広く使用される非パラメトリックベイズモデルである。ディファレンシャルプライバシ(dp)を通じてgpsに強力なプライバシ保護を追加する以前の作業は、入力ではなく、予測対象(モデル出力)のプライバシのみを保護することに限定されていた。モデル入力と出力の両方に対してDP保護を備えたGPを導入することで、この制限を破る。我々は, sparse gp法を用いて, 既知の誘導点に対するプライベートな変分近似を公表することでこれを実現する。近似共分散は、DPノイズから付加された不確実性を考慮して調整される。この近似は、標準スパースGP技術を用いて任意の予測を計算するために用いられる。本稿では,検証セットのログ類似性に適用したプライベート選択プロトコルを用いたハイパーパラメータ学習手法を提案する。我々の実験は、十分な量のデータがあれば、強力なプライバシー保護下で正確なモデルを生成することができることを示した。 Gaussian processes (GPs) are non-parametric Bayesian models that are widely used for diverse prediction tasks. Previous work in adding strong privacy protection to GPs via differential privacy (DP) has been limited to protecting only the privacy of the prediction targets (model outputs) but not inputs. We break this limitation by introducing GPs with DP protection for both model inputs and outputs. We achieve this by using sparse GP methodology and publishing a private variational approximation on known inducing points. The approximation covariance is adjusted to approximately account for the added uncertainty from DP noise. The approximation can be used to compute arbitrary predictions using standard sparse GP techniques. We propose a method for hyperparameter learning using a private selection protocol applied to validation set log-likelihood. Our experiments demonstrate that given sufficient amount of data, the method can produce accurate models under strong privacy protection.	翻訳日:2021-06-02 19:43:40 公開日:2021-06-01
# (参考訳) 微分プライバシーのシャッフルモデルにおける厳密な会計 Tight Accounting in the Shuffle Model of Differential Privacy ( http://arxiv.org/abs/2106.00477v1 ) ライセンス: CC BY 4.0	Antti Koskela, Mikko A. Heikkil\"a, Antti Honkela	(参考訳) ディファレンシャルプライバシのシャッフルモデル(英: shuffle model of differential privacy)は、ローカルプライバシ機構と信頼できるシャッファを組み合わせた、新しい分散プライバシモデルである。シャッフルによって提供される追加のランダム化は、純粋にローカルなメカニズムと比較してプライバシの境界を改善することが示されている。厳密な境界、特にマルチメッセージプロトコルはシャフラーによってもたらされる複雑さによって複雑になる。最近提案された$(\varepsilon,\delta)$-differential privacy guaranteesの評価のためのフーリエ会計士は、様々な複雑なメカニズムの非適応構成の一般的な方法よりも厳密な境界を与えることが示されている。本稿では,シャッフルモデルにおける複数のユビキタスメカニズムのマルチメッセージバージョンに対して,Fourier Accountantを用いた厳密なプライバシー境界の計算方法を示し,文献における既存のバウンダリのゆるみを示す。 Shuffle model of differential privacy is a novel distributed privacy model based on a combination of local privacy mechanisms and a trusted shuffler. It has been shown that the additional randomisation provided by the shuffler improves privacy bounds compared to the purely local mechanisms. Accounting tight bounds, especially for multi-message protocols, is complicated by the complexity brought by the shuffler. The recently proposed Fourier Accountant for evaluating $(\varepsilon,\delta)$-differential privacy guarantees has been shown to give tighter bounds than commonly used methods for non-adaptive compositions of various complex mechanisms. In this paper we show how to compute tight privacy bounds using the Fourier Accountant for multi-message versions of several ubiquitous mechanisms in the shuffle model and demonstrate looseness of the existing bounds in the literature.	翻訳日:2021-06-02 19:28:28 公開日:2021-06-01
# (参考訳) RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network ( http://arxiv.org/abs/2106.00496v1 ) ライセンス: CC BY 4.0	Lili Zhao, Zezhi Zhu, Xuhu Lin, Xuezhou Guo, Qian Yin, Wenyi Wang, Jianwen Chen	(参考訳) 捕捉されたフレーム間の中間フレームを合成するLiDAR点雲フレーム補間は、多くのアプリケーションにおいて重要な問題となっている。特に点雲の伝送量を減少させるためには、参照フレームに基づいて中間フレームを予測し、高いフレームレートにデータをアップサンプルする。しかし, 点雲の高次元的, スパース的特徴から, ビデオよりもLiDAR点雲の中間フレームの予測が困難である。本稿では,CNNとの中間表現として範囲画像(RI)を利用してフレーム補間処理を行う,新しいLiDAR点雲補間法を提案する。 RIの固有の特性がカラー画像と異なることを考慮し、我々は空間適応的畳み込みを導入して範囲の特徴を適応的に抽出し、光学フローを生成するための高効率フロー推定法を提案する。提案したモデルでは,光学フローに基づいて入力フレームとレンジ特徴をワープし,補間フレームを合成する。 KITTIデータセットの広範な実験により,本手法は最新の映像フレーム補間法よりも優れた知覚品質を有する優れたフレーム補間結果が得られることが示された。提案手法は,任意のLiDAR点群圧縮システムに統合してフレーム間予測を行うことができる。 LiDAR point cloud frame interpolation, which synthesizes the intermediate frame between the captured frames, has emerged as an important issue for many applications. In particular, to reduce the amount of point cloud data transmitted, intermediate frames can be predicted from the reference frames, upsampling the data to a higher frame rate. However, due to high-dimensional and sparse characteristics of point clouds, it is more difficult to predict the intermediate frame for LiDAR point clouds than videos. In this paper, we propose a novel LiDAR point cloud frame interpolation method, which exploits range images (RIs) as an intermediate representation with CNNs to conduct the frame interpolation process. Considering that the inherent characteristics of RIs differ from those of color images, we introduce spatially adaptive convolutions to extract range features adaptively, while a highly efficient flow estimation method is presented to generate optical flows. The proposed model then warps the input frames and range features based on the optical flows to synthesize the interpolated frame. 
Extensive experiments on the KITTI dataset have clearly demonstrated that our method consistently achieves superior frame interpolation results, with better perceptual quality than state-of-the-art video frame interpolation methods. The proposed method could be integrated into any LiDAR point cloud compression system for inter prediction.	翻訳日:2021-06-02 19:05:40 公開日:2021-06-01
# (参考訳) 動的環境とタスクへの適応のための統一認知学習フレームワーク A Unified Cognitive Learning Framework for Adapting to Dynamic Environment and Tasks ( http://arxiv.org/abs/2106.00501v1 ) ライセンス: CC BY 4.0	Qihui Wu, Tianchen Ruan, Fuhui Zhou, Yang Huang, Fan Xu, Shijin Zhao, Ya Liu, and Xuyang Huang	(参考訳) 多くの機械学習フレームワークが提案され、様々な目標を実現するために無線通信に利用されている。しかし、ダイナミックなワイヤレス環境やタスクに適応できず、自己学習ができないことで、幅広い応用と達成可能な性能が制限される。脳認知機構による霊長類行動の柔軟性と適応性に着想を得て、動的無線環境とタスクに対して統合認知学習(CL)フレームワークが提案されている。提案するCLの数学的枠組みを確立した。提案するCLフレームワークには,動的な環境やタスクに適応する能力,自己学習能力,そして,変調認識を例に挙げて「悪いお金を追い出す良い金」の能力の3つの利点があることを示す。提案されているCLフレームワークは、現在の学習フレームワークを強化し、アプリケーションを拡張することができる。 Many machine learning frameworks have been proposed and used in wireless communications for realizing diverse goals. However, their incapability of adapting to the dynamic wireless environment and tasks and of self-learning limit their extensive applications and achievable performance. Inspired by the great flexibility and adaptation of primate behaviors due to the brain cognitive mechanism, a unified cognitive learning (CL) framework is proposed for the dynamic wireless environment and tasks. The mathematical framework for our proposed CL is established. Using the public and authoritative dataset, we demonstrate that our proposed CL framework has three advantages, namely, the capability of adapting to the dynamic environment and tasks, the self-learning capability and the capability of 'good money driving out bad money' by taking modulation recognition as an example. The proposed CL framework can enrich the current learning frameworks and widen the applications.	翻訳日:2021-06-02 18:54:44 公開日:2021-06-01
# (参考訳) マルチラベルリモートセンシング画像検索のためのグラフ理論的深部表現学習法 A Novel Graph-Theoretic Deep Representation Learning Method for Multi-Label Remote Sensing Image Retrieval ( http://arxiv.org/abs/2106.00506v1 ) ライセンス: CC BY 4.0	Gencer Sumbul and Beg\"um Demir	(参考訳) 本稿では,多層リモートセンシング(rs)画像検索問題におけるグラフ理論的深層表現学習手法を提案する。提案手法は,アーカイブ内の各RS画像に関連する複数ラベルの共起関係を抽出し,活用することを目的としている。この目的のために、各トレーニング画像は、まず、局所情報と関連する空間構造の両方を組み合わせた地域ベースの画像表現を提供するグラフ構造で表現される。他のグラフベース手法とは異なり、提案手法は、アーカイブ内の各RS画像のグラフ構造を自動的に予測するディープニューラルネットワークをトレーニングするための新しい学習戦略を含む。この戦略は、領域表現学習損失関数を用いて、そのマルチラベル共起関係に基づいて画像コンテンツを特徴付ける。実験により,RSにおける検索問題に対する提案手法の有効性を,最先端の深層表現学習法と比較した。提案手法のコードはhttps://git.tu-berlin.de/rsim/GT-DRL-CBIR で公開されている。 This paper presents a novel graph-theoretic deep representation learning method in the framework of multi-label remote sensing (RS) image retrieval problems. The proposed method aims to extract and exploit multi-label co-occurrence relationships associated to each RS image in the archive. To this end, each training image is initially represented with a graph structure that provides region-based image representation combining both local information and the related spatial organization. Unlike the other graph-based methods, the proposed method contains a novel learning strategy to train a deep neural network for automatically predicting a graph structure of each RS image in the archive. This strategy employs a region representation learning loss function to characterize the image content based on its multi-label co-occurrence relationship. Experimental results show the effectiveness of the proposed method for retrieval problems in RS compared to state-of-the-art deep representation learning methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/GT-DRL-CBIR .	翻訳日:2021-06-02 18:45:12 公開日:2021-06-01
# (参考訳) レベル適応型クレジット割り当てを用いた協調型マルチエージェント転送学習 Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment ( http://arxiv.org/abs/2106.00517v1 ) ライセンス: CC BY 4.0	Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao	(参考訳) 協調型マルチエージェント強化学習(MARL)への移行学習は近年注目されている。単一エージェントの設定とは対照的に、協調的なMARLでは調整が不可欠である。しかし,既存の転送手法はエージェントポリシーにのみ焦点をあて,協調知識を無視する。本稿では,コーディネーション全体を複数の協調パターンに適切に分解することで,ロバストな協調知識の伝達を実現するアーキテクチャを提案する。我々は、レベル適応型QTransformer(LA-QTransformer)と呼ばれる新しいミキシングネットワークを用いて、クレジット代入を考慮したエージェント調整を実現し、協調知識の伝達に特化した新しいレベル適応型Transformer(LA-Transformer)によって、異なるエージェントに対する適切な調整パターンを実現する。さらに,Population Invariant agent with Transformer (PIT) という新しいエージェントネットワークを用いて,多種多様なシナリオにおけるコーディネーション転送を実現する。 StarCraft IIの大規模なマイクロマネジメント実験により、LA-QTransformerとPITは最先端のベースラインに比べて優れた性能を発揮することが示された。 Extending transfer learning to cooperative multi-agent reinforcement learning (MARL) has recently received much attention. In contrast to the single-agent setting, the coordination indispensable in cooperative MARL constrains each agent's policy. However, existing transfer methods focus exclusively on agent policy and ignore coordination knowledge. We propose a new architecture that realizes robust coordination knowledge transfer through appropriate decomposition of the overall coordination into several coordination patterns. We use a novel mixing network named level-adaptive QTransformer (LA-QTransformer) to realize agent coordination that considers credit assignment, with appropriate coordination patterns for different agents realized by a novel level-adaptive Transformer (LA-Transformer) dedicated to the transfer of coordination knowledge. In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios. 
Extensive experiments in StarCraft II micro-management show that LA-QTransformer together with PIT achieves superior performance compared with state-of-the-art baselines.	翻訳日:2021-06-02 18:38:14 公開日:2021-06-01
# (参考訳) MalPhase:ネットワークフローデータを用いた細粒度マルウェア検出 MalPhase: Fine-Grained Malware Detection Using Network Flow Data ( http://arxiv.org/abs/2106.00541v1 ) ライセンス: CC BY 4.0	Michal Piskozub, Fabio De Gaspari, Frederick Barr-Smith, Luigi V. Mancini, Ivan Martinovic	(参考訳) 経済的インセンティブにより、マルウェアの著者は、機密性の高いデータを盗み、個人や企業に多額の身代金を支払うよう脅迫するために、ますます複雑な新しいマルウェアを常に開発することが奨励される。 2017年、世界のサイバー攻撃の経済的影響は4,450億～6,000億米ドル、世界のGDPの0.8%と推定されている。伝統的に、マルウェアに対する防御に使われるアプローチの1つはネットワークトラフィック分析であり、これは潜在的に悪意のあるソフトウェアの存在を検出するためにネットワークデータに依存する。しかし、ネットワーク速度とトラフィック量の増加に対応するために、ネットワーク分析は通常、集約されたネットワークデータを扱うことに限られる。本稿では,集約フローの限界に対処するシステムであるMalPhaseを提案する。 MalPhaseはマルウェアの検出、タイプ、ファミリー分類のためのマルチフェーズパイプラインを備えている。拡張されたネットワークフロー機能と同時多層アーキテクチャを使用することで、ディープラーニングモデルのパフォーマンス向上が容易になり、悪意のあるフロー(>98% F1)を検出し、それらをそれぞれのマルウェアタイプ(>93% F1)とファミリー(>91% F1)に分類することができる。さらに、ロバストな機能とデノイジングオートエンコーダの使用により、MalPhaseは、さまざまな量の良質なトラフィックが混在するサンプルでうまく機能する。最後に、MalPhaseは、実際のネットワーク環境を反映する良質なフローにインターレースされた場合でも、既知のサンプルに匹敵するパフォーマンスで、目に見えないマルウェアサンプルを検出する。 Economic incentives encourage malware authors to constantly develop new, increasingly complex malware to steal sensitive data or blackmail individuals and companies into paying large ransoms. In 2017, the worldwide economic impact of cyberattacks is estimated to be between 445 and 600 billion USD, or 0.8% of global GDP. Traditionally, one of the approaches used to defend against malware is network traffic analysis, which relies on network data to detect the presence of potentially malicious software. However, to keep up with increasing network speeds and amount of traffic, network analysis is generally limited to work on aggregated network data, which is traditionally challenging and yields mixed results. In this paper we present MalPhase, a system that was designed to cope with the limitations of aggregated flows. MalPhase features a multi-phase pipeline for malware detection, type and family classification. 
The use of an extended set of network flow features and a simultaneous multi-tier architecture facilitates a performance improvement for deep learning models, making them able to detect malicious flows (>98% F1) and categorize them to a respective malware type (>93% F1) and family (>91% F1). Furthermore, the use of robust features and denoising autoencoders allows MalPhase to perform well on samples with varying amounts of benign traffic mixed in. Finally, MalPhase detects unseen malware samples with performance comparable to that of known samples, even when interlaced with benign flows to reflect realistic network environments.	翻訳日:2021-06-02 18:05:05 公開日:2021-06-01
# (参考訳) 関連集合による効率的な説明 Efficient Explanations With Relevant Sets ( http://arxiv.org/abs/2106.00546v1 ) ライセンス: CC BY 4.0	Yacine Izza, Alexey Ignatiev, Nina Narodytska, Martin C. Cooper, Joao Marques-Silva	(参考訳) 最近の研究は、与えられた入力に対する分類器による予測の確率論的説明として$\delta$-relevant inputs (またはset)を提案した。 $\delta$-relevant 集合は、(モデル非依存の)アンカーと(モデルに正確な)PI説明を関連付けるのに役立つので重要である。残念なことに、最小サイズの$\delta$-relevant集合の計算は${NP}^{PP}$に対して完備であり、その計算は実際はほとんど実現不可能である。本稿では,$\delta$-関連集合の実用的限界に取り組むための解について検討する。まず、本論文は代替として部分集合最小の集合の計算を検討する。第2に、決定木などを含む分類器の具体的な族について研究する。これらの場合、本論文は部分集合最小の$\delta$-関連集合の計算がNPに属し、NPオラクルへの多項式回の呼び出しで解けることを示す。実験による評価は,提案手法と,本論文で研究した分類器の具体的な場合のヒューリスティックな説明器との比較を行い,提案手法の有効性を確認した。 Recent work proposed $\delta$-relevant inputs (or sets) as a probabilistic explanation for the predictions made by a classifier on a given input. $\delta$-relevant sets are significant because they serve to relate (model-agnostic) Anchors with (model-accurate) PI-explanations, among other explanation approaches. Unfortunately, the computation of smallest size $\delta$-relevant sets is complete for ${NP}^{PP}$, rendering their computation largely infeasible in practice. This paper investigates solutions for tackling the practical limitations of $\delta$-relevant sets. First, the paper alternatively considers the computation of subset-minimal sets. Second, the paper studies concrete families of classifiers, including decision trees among others. For these cases, the paper shows that the computation of subset-minimal $\delta$-relevant sets is in NP, and can be solved with a polynomial number of calls to an NP oracle. The experimental evaluation compares the proposed approach with heuristic explainers for the concrete case of the classifiers studied in the paper, and confirms the advantage of the proposed solution over the state of the art.	翻訳日:2021-06-02 17:40:56 公開日:2021-06-01
# (参考訳) SHINE: 双レベル最適化と暗黙的モデルのためのフォワードパスからの逆推定の共有 SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models ( http://arxiv.org/abs/2106.00553v1 ) ライセンス: CC BY 4.0	Zaccharie Ramzi, Florian Mannel, Shaojie Bai, Jean-Luc Starck, Philippe Ciuciu, Thomas Moreau	(参考訳) 近年,深層ニューラルネットワークの深度を高める手法として暗黙の深度学習が登場している。トレーニングはメモリ効率が高いが、明示的なトレーニングに比べてトレーニングがかなり遅い。深層平衡モデル(DEQs)では、トレーニングは双レベル問題として行われ、計算の複雑さは巨大なヤコビ行列の反復反転によって部分的に駆動される。本稿では,多くの双レベル問題が抱えるこの計算ボトルネックに対処するための新しい手法を提案する。主な考え方は、フォワードパスから準ニュートン行列を用いて、勾配計算に必要な方向の逆ヤコビ行列を効率的に近似することである。本手法を元来のフォワードアルゴリズムで活用する動機付けとなる定理を提案する。さらに,これらのフォワードアルゴリズムを改良することにより,本手法が漸近的に真の暗黙的勾配を推定する理論的な保証を与える。我々は、超パラメータ最適化からCIFARやImageNetに適用された大規模なマルチスケールDEQまで、様々な環境でこのアプローチを実証的に研究している。これにより、後方通過の計算コストを最大2桁削減できることを示す。これらはすべて、ハイパーパラメータ最適化およびCIFARにおけるオリジナルのモデルの優れたパフォーマンスを維持し、ImageNet上での奨励的かつ競争的な結果を提供する。 In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks. While their training is memory-efficient, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), the training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we further provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach in many settings, ranging from hyperparameter optimization to large Multiscale DEQs applied to CIFAR and ImageNet. 
We show that it reduces the computational cost of the backward pass by up to two orders of magnitude. All this is achieved while retaining the excellent performance of the original models in hyperparameter optimization and on CIFAR, and giving encouraging and competitive results on ImageNet.	翻訳日:2021-06-02 17:26:05 公開日:2021-06-01
# (参考訳) マルチスケール特徴融合によるポーズ推定のためのフルレゾリューションエンコーダ・デコーダネットワーク Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation ( http://arxiv.org/abs/2106.00566v1 ) ライセンス: CC BY 4.0	Jie Ou, Mingjian Chen, Hong Wu	(参考訳) より正確な2次元ポーズ推定を実現するために,エンコーダ・デコーダネットワーク,単純なベースラインネットワーク(SBN)を3つの方法で拡張する。大きな出力ストライドサイズに起因する量子化誤差を低減するため、単純なベースラインネットワークの端に2つのデコーダモジュールを追加して完全な出力解像度を得る。次に、グローバルコンテキストブロック(GCBs)がエンコーダとデコーダモジュールに追加され、グローバルコンテキスト機能によってそれらを強化する。さらに,マルチスケール特徴を融合分散し,ポーズ推定を促進するために,空間対応型マルチスケール特徴収集分散モジュール(SA-MFCD)を提案する。 MS COCOデータセットにおける実験結果から,本ネットワークはSBN上でのポーズ推定の精度を著しく向上し,ResNet34をバックボーンネットワークとして使用するネットワークは,ResNet152でSBNと同等の精度を達成し,大規模バックボーンネットワークで優れた結果を得ることができた。 To achieve more accurate 2D human pose estimation, we extend the successful encoder-decoder network, simple baseline network (SBN), in three ways. To reduce the quantization errors caused by the large output stride size, two more decoder modules are appended to the end of the simple baseline network to get full output resolution. Then, the global context blocks (GCBs) are added to the encoder and decoder modules to enhance them with global context features. Furthermore, we propose a novel spatial-attention-based multi-scale feature collection and distribution module (SA-MFCD) to fuse and distribute multi-scale features to boost the pose estimation. Experimental results on the MS COCO dataset indicate that our network can remarkably improve the accuracy of human pose estimation over SBN; our network using ResNet34 as the backbone network can even achieve the same accuracy as SBN with ResNet152; and our networks can achieve superior results with big backbone networks.	翻訳日:2021-06-02 17:00:57 公開日:2021-06-01
# (参考訳) NewsEmbed: 事前訓練されたドキュメント表現によるニュースのモデリング NewsEmbed: Modeling News through Pre-trained Document Representations ( http://arxiv.org/abs/2106.00590v1 ) ライセンス: CC BY 4.0	Jialu Liu, Tianqi Liu, Cong Yu	(参考訳) 文書レベルでのニュース記事などのテキストリッチな新鮮なコンテンツを効果的にモデル化することは難しい問題である。コンテンツベースモデルが広範囲のアプリケーションに適合するようにするためには、望ましい品質を達成しつつ、人間のラベルの規模を超えて大きなトレーニングデータセットを持つことが重要である。本稿では,この2つの課題に対して,意味的に関係のある新鮮な文書とその話題ラベルを人間の監督をほとんど受けずにマイニングする新しい手法を提案する。一方,マルチタスクモデルであるNewsEmbedを設計し,コントラスト学習とマルチラベル分類を交互に訓練し,ユニバーサル文書エンコーダを導出する。提案手法は,数十億の高品質な有機学習例を提供し,異なる言語のテキストが同じ意味空間にエンコードされるような多言語環境に自然に拡張できることを示す。我々は,複数の自然言語理解タスクを対象としたNewsEmbedの競合性能を実験的に実証した。 Effectively modeling text-rich fresh content such as news articles at document-level is a challenging problem. To ensure a content-based model generalizes well to a broad range of applications, it is critical to have a training dataset that is large beyond the scale of human labels while achieving desired quality. In this work, we address those two challenges by proposing a novel approach to mine semantically-relevant fresh documents, and their topic labels, with little human supervision. Meanwhile, we design a multitask model called NewsEmbed that alternatively trains a contrastive learning with a multi-label classification to derive a universal document encoder. We show that the proposed approach can provide billions of high quality organic training examples and can be naturally extended to multilingual setting where texts in different languages are encoded in the same semantic space. We experimentally demonstrate NewsEmbed's competitive performance across multiple natural language understanding tasks, both supervised and unsupervised.	翻訳日:2021-06-02 16:50:50 公開日:2021-06-01
# (参考訳) Look Wide and Interpret Twice: 対話型インストラクションフォロータスクのパフォーマンス向上 Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks ( http://arxiv.org/abs/2106.00596v1 ) ライセンス: CC BY 4.0	Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani	(参考訳) インボディードAIエージェントを自然言語の指示に従う環境と対話しながら複雑なタスクを実行することに、コミュニティへの関心が高まっている。近年の研究では、タスクのためのよく設計されたデータセットであるALFREDを用いてこの問題に取り組んでいるが、精度は非常に低い。本稿では,従来の手法を大きなマージンで上回る新しい手法を提案する。それはいくつかの新しいアイデアの組み合わせに基づいている。 1つは提供された命令の2段階の解釈である。まず、視覚情報を用いずに命令を選択して解釈し、仮の動作シーケンス予測を行う。そして、その予測を視覚情報等と統合し、アクションとオブジェクトの最終的な予測を生成する。対話するオブジェクトのクラスが第一段階で識別されるので、入力画像から正しいオブジェクトを正確に選択することができる。また,本手法では,環境の複数の自己中心的視点を考察し,現在の指示に基づく階層的注意を応用して本質的な情報を抽出する。これはナビゲーションに対するアクションの正確な予測に寄与する。この手法の予備版がALFRED Challenge 2020で優勝した。現在のバージョンでは、単一のビューで4.45%の成功率を達成しており、複数のビューで8.37%に改善されている。 There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects and interprets an instruction without using visual information, yielding a tentative action sequence prediction. It then integrates the prediction with the visual information etc., yielding the final prediction of an action and an object. As the object's class to interact is identified in the first stage, it can accurately select the correct object from the input image. Moreover, our method considers multiple egocentric views of the environment and extracts essential information by applying hierarchical attention conditioned on the current instruction. This contributes to the accurate prediction of actions for navigation. 
A preliminary version of the method won the ALFRED Challenge 2020. The current version achieves the unseen environment's success rate of 4.45% with a single view, which is further improved to 8.37% with multiple views.	翻訳日:2021-06-02 16:33:11 公開日:2021-06-01
# (参考訳) SpanNer: エンティティの再認識をスパン予測に SpanNer: Named Entity Re-/Recognition as Span Prediction ( http://arxiv.org/abs/2106.00641v1 ) ライセンス: CC BY 4.0	Jinlan Fu, Xuanjing Huang, Pengfei Liu	(参考訳) 近年では、名前付きエンティティ認識(ner)システムのシーケンスラベリングからスパン予測へのパラダイムシフトが見られる。予備的な効果にもかかわらず、スパン予測モデルのアーキテクチャバイアスは完全には理解されていない。本稿では,名前付きエンティティ認識にスパン予測モデルを用いた場合の長所と短所について,シーケンスラベリングフレームワークと比較して検討し,その改善方法について検討する。次に、スパン予測がシステムコンビネータとして機能し、異なるシステムの出力から名前付きエンティティを再認識できることを明らかにする。 3つの言語をカバーする11のデータセット上で154のシステムを実験的に実装し,ベースnerシステムとシステムコンビネータとして機能するスパン予測モデルの有効性を示した。すべてのコードとデータセットを利用可能にする: \url{https://github.com/neulab/spanner} オンラインシステムのデモ: \url{https://spanner.sh} 。私たちのモデルはExplainaBoardプラットフォームにもデプロイされており、ユーザはインタラクティブな方法でトップスコーリングシステムのシステム組み合わせを柔軟に実行することができます。 Recent years have seen the paradigm shift of Named Entity Recognition (NER) systems from sequence labeling to span prediction. Despite its preliminary effectiveness, the span prediction model's architectural bias has not been fully understood. In this paper, we first investigate the strengths and weaknesses when the span prediction model is used for named entity recognition compared with the sequence labeling framework and how to further improve it, which motivates us to make complementary advantages of systems based on different paradigms. We then reveal that span prediction, simultaneously, can serve as a system combiner to re-recognize named entities from different systems' outputs. We experimentally implement 154 systems on 11 datasets, covering three languages, comprehensive results show the effectiveness of span prediction models that both serve as base NER systems and system combiners. We make all code and datasets available: \url{https://github.com/neulab/spanner}, as well as an online system demo: \url{http://spanner.sh}. 
Our model also has been deployed into the ExplainaBoard platform, which allows users to flexibly perform a system combination of top-scoring systems in an interactive way: \url{http://explainaboard.nlpedia.ai/leaderboard/task-ner/}.	翻訳日:2021-06-02 16:13:01 公開日:2021-06-01
# (参考訳) ネットワーク接続性が集団学習に及ぼす影響 The Impact of Network Connectivity on Collective Learning ( http://arxiv.org/abs/2106.00655v1 ) ライセンス: CC BY 4.0	Michael Crosscombe and Jonathan Lawry	(参考訳) 分散自律システムでは、システムの集団行動を管理する個々のエージェント間の相互作用である。これらのローカルレベルの相互作用は、しばしば基盤となるネットワーク構造によって制御される。これらのネットワークは、エージェントが環境から証拠を収集し、システム内の他のエージェントに情報を伝達する必要がある、集合学習や意思決定において特に重要である。集団行動のモデルは、システム内で効果的な情報共有を提供するためにエージェント間の完全な接続の仮定に依存することが多いが、この仮定は不十分である。本稿では,基礎となるネットワークが集団学習の文脈におけるパフォーマンスに与える影響について検討する。シミュレーションにより,接続性やランダム性のレベルが異なる小世界ネットワークを調査し,接続性の低いネットワークと比較して,完全接続ネットワークの方が平均誤差が高いと結論づけた。さらに,高規則性ネットワークはランダム接続のレベルが増大するネットワークよりも優れることを示す。 In decentralised autonomous systems it is the interactions between individual agents which govern the collective behaviours of the system. These local-level interactions are themselves often governed by an underlying network structure. These networks are particularly important for collective learning and decision-making whereby agents must gather evidence from their environment and propagate this information to other agents in the system. Models for collective behaviours may often rely upon the assumption of total connectivity between agents to provide effective information sharing within the system, but this assumption may be ill-advised. In this paper we investigate the impact that the underlying network has on performance in the context of collective learning. Through simulations we study small-world networks with varying levels of connectivity and randomness and conclude that totally-connected networks result in higher average error when compared to networks with less connectivity. Furthermore, we show that networks of high regularity outperform networks with increasing levels of random connectivity.	翻訳日:2021-06-02 16:04:41 公開日:2021-06-01
# (参考訳) Markpainting: 敵の機械学習がInpaintingと出会う Markpainting: Adversarial Machine Learning meets Inpainting ( http://arxiv.org/abs/2106.00660v1 ) ライセンス: CC BY 4.0	David Khachaturov, Ilia Shumailov, Yiren Zhao, Nicolas Papernot, Ross Anderson	(参考訳) インパインティング(Inpainting)は、画像にマスクや欠落したピースを投入するために使用される生成モデルに基づく、学習された補間技法であり、画像編集や修正に広く応用されている。近年、透かしの除去に塗布が使われ始め、懸念が高まった。本稿では,マークペイント技術を用いてその操作方法について検討する。まず、塗装モデルにアクセスした画像所有者が、そのモデルを使って編集しようとすると、任意の可視情報を付加するように画像を強化できることを示す。我々は,複数の異なるモデルを同時にターゲットとすることができる。これは、エディタがそれを削除しようとした場合、ウォーターマークを再構成するように設計できる。第二に、我々のマークペイント技術は異なるアーキテクチャを持つモデルや異なるデータセットで訓練されたモデルに転送可能であることを示し、それを用いて作成された透かしは敵が取り除くのが困難である。マークパインティング(Markpainting)は新規で、着色時に目に見えるアラームとして使用できる。 Inpainting is a learned interpolation technique that is based on generative modeling and used to populate masked or missing pieces in an image; it has wide applications in picture editing and retouching. Recently, inpainting started being used for watermark removal, raising concerns. In this paper we study how to manipulate it using our markpainting technique. First, we show how an image owner with access to an inpainting model can augment their image in such a way that any attempt to edit it using that model will add arbitrary visible information. We find that we can target multiple different models simultaneously with our technique. This can be designed to reconstitute a watermark if the editor had been trying to remove it. Second, we show that our markpainting technique is transferable to models that have different architectures or were trained on different datasets, so watermarks created using it are difficult for adversaries to remove. Markpainting is novel and can be used as a manipulation alarm that becomes visible in the event of inpainting.	翻訳日:2021-06-02 15:53:18 公開日:2021-06-01
# (参考訳) 科学テキスト分類のための視覚レイアウト構造の導入 Incorporating Visual Layout Structures for Scientific Text Classification ( http://arxiv.org/abs/2106.00676v1 ) ライセンス: CC BY 4.0	Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey	(参考訳) 科学論文のコアテキストコンポーネントの分類は、科学文書の自動理解において重要な第一歩である。これまでの研究は、基本的なレイアウト情報、すなわちページ上の各トークンの2D位置を用いて、より正確な分類を行う方法を示してきた。本稿では,VILA(Visual LAyout Structure)の新たな手法として,ページテキストのテキスト行やテキストブロックへのグループ化を言語モデルに導入し,パフォーマンスの向上を図る。モデル入力にレイアウト構造の境界を示す特別なトークンを追加するI-VILAアプローチは、トークン分類タスクにおいて+1~4.5 F1のスコア改善につながることを示す。さらに,これらのレイアウト構造を符号化し,予測精度を損なうことなく最大70%の効率向上を記録できる階層モデルH-VILAを設計した。実験は、新しい評価スイートであるS2-VLUEで行われ、VILAの意識を測定する新しい測定基準と、金のアノテーションで19の科学分野をカバーする新しいデータセットが提供される。トレーニング済みのウェイト、ベンチマークデータセット、ソースコードはhttps://github.com/allenai/VILA で入手できる。 Classifying the core textual components of a scientific paper (title, author, body text, etc.) is a critical first step in automated scientific document understanding. Previous work has shown how using elementary layout information, i.e., each token's 2D position on the page, leads to more accurate classification. We introduce new methods for incorporating VIsual LAyout structures (VILA), e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance. We show that the I-VILA approach, which simply adds special tokens denoting boundaries between layout structures into model inputs, can lead to +1~4.5 F1 Score improvements in token classification tasks. Moreover, we design a hierarchical model H-VILA that encodes these layout structures and record an up to 70% efficiency boost without hurting prediction accuracy. The experiments are conducted on a newly curated evaluation suite, S2-VLUE, with a novel metric measuring VILA awareness and a new dataset covering 19 scientific disciplines with gold annotations. 
Pre-trained weights, benchmark datasets, and source code will be available at https://github.com/allenai/VILA .	翻訳日:2021-06-02 15:36:41 公開日:2021-06-01
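The I-VILA idea summarized in the abstract above (adding special tokens that mark boundaries between layout structures) can be illustrated with a minimal sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `inject_layout_tokens`, the boundary token string `[BLK]`, and the representation of a page as a list of token lists (one per text block) are all hypothetical choices made here.

```python
def inject_layout_tokens(blocks, boundary_token="[BLK]"):
    """Flatten layout blocks into one token sequence with boundary markers.

    `blocks` is a list of token lists, one list per text line or text block.
    The marker string "[BLK]" is a hypothetical placeholder, not a confirmed
    token name from the VILA code base.
    """
    tokens = []
    for index, block in enumerate(blocks):
        if index > 0:
            # Mark the boundary between the previous block and this one.
            tokens.append(boundary_token)
        tokens.extend(block)
    return tokens


page_blocks = [["A", "Title"], ["Jane", "Doe"], ["Abstract", "text"]]
print(inject_layout_tokens(page_blocks))
```

The resulting flat sequence could then be passed to a token classifier in place of the raw token stream, so the only change is on the input side.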
# 校正型クロスモーダル検索のための学習関係アライメント Learning Relation Alignment for Calibrated Cross-modal Retrieval ( http://arxiv.org/abs/2105.13868v2 ) ライセンス: Link先を確認	Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang	(参考訳) 大規模なマルチモーダル事前学習アプローチの成果にもかかわらず、画像テキスト検索のようなクロスモーダル検索は難しい課題である。 2つのモダリティ間の意味的ギャップを埋めるために、これまでの研究では、主に対象レベルでの単語領域のアライメントに注目し、単語間の言語的関係と領域間の視覚的関係のマッチングを欠いている。このような関係一貫性の無視は、画像テキスト対の文脈的表現を損なうとともに、モデル性能と解釈可能性を妨げる。本稿では,まず,言語関係と視覚関係の間の意味的距離を計測し,関係一貫性を定量化する新しい指標であるISD(Intra-modal Self-attention Distance)を提案する。そこで本研究では,ISDを最適化し,モダリティ間アライメントを介して2つのモダリティのモダリティ内自己注意を相互に校正するための正規化トレーニング手法であるIAIS(Inter-modal Alignment on Intra-modal Self-attentions)を提案する。 IAIS正規化器はFlickr30kおよびMS COCOデータセット上での一般的なモデルの性能を大幅に向上させ、我々のアプローチの優位性を示す。 Despite the achievements of large-scale multimodal pre-training approaches, cross-modal retrieval, e.g., image-text retrieval, remains a challenging task. To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, lacking the matching between the linguistic relation among the words and the visual relation among the regions. The neglect of such relation consistency impairs the contextualized representation of image-text pairs and hinders the model performance and the interpretability. In this paper, we first propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations. In response, we present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions from the two modalities mutually via inter-modal alignment. The IAIS regularizer boosts the performance of prevailing models on Flickr30k and MS COCO datasets by a considerable margin, which demonstrates the superiority of our approach.	翻訳日:2021-06-02 14:48:49 公開日:2021-06-01
# 1$\times$N Block Pattern for Network Sparsity 1$\times$N Block Pattern for Network Sparsity ( http://arxiv.org/abs/2105.14713v2 ) ライセンス: Link先を確認	Mingbao Lin, Yuchao Li, Yuxin Zhang, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji	(参考訳) ネットワークのスパース性は、ニューラルネットワークの大幅な規模拡大を克服するための有望な方向として現れるが、一般的なCPU上での大幅なスピードアップを達成するだけでなく、モデル精度の同時維持も未解決のままである。本稿では,この制限を破るために,$1\times N$ブロックスパースパターン(ブロックプルーニング)という新しい概念を提案する。特に、同じ入力チャネルインデックスを持つ連続$N$出力カーネルは、1つのブロックにグループ化され、プルーニングパターンの基本的なプルーニング粒度として機能する。われわれの$1 \times N$スパースパターンは、重要でないとみなされたブロックをプルーニングする。また,最初に出力チャネル次元の重み行列を再構成し,精度向上のためにより影響力のあるブロックを導出し,入力チャネル次元の次層重みに同様の再配置を適用し,正しい畳み込み演算を保証するフィルタ再配置のワークフローを提供する。さらに, 並列化されたブロックワイドベクトル化演算により, $1 \times N$ブロックスパース化後の出力計算を実現し, 一般的な CPU ベースのプラットフォーム上での大幅な高速化を実現した。プルーニングパターンの有効性は,ILSVRC-2012実験により実証された。例えば、50%のスパース率と$N=4$の場合、MobileNet-V2の上位1の精度でフィルタプルーニングよりも約3.0%改善する。一方、重みプルーニングよりもCortex-A7 CPUの56.04msの推論節約が得られる。コードはhttps://github.com/lmbxmu/1xN で入手できる。 Though network sparsity emerges as a promising direction to overcome the drastically increasing size of neural networks, it remains an open problem to concurrently maintain model accuracy as well as achieve significant speedups on general CPUs. In this paper, we propose one novel concept of $1\times N$ block sparsity pattern (block pruning) to break this limitation. In particular, consecutive $N$ output kernels with the same input channel index are grouped into one block, which serves as a basic pruning granularity of our pruning pattern. Our $1 \times N$ sparsity pattern prunes these blocks considered unimportant. We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements, and then applies similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations. 
Moreover, the output computation after our $1 \times N$ block sparsity can be realized via a parallelized block-wise vectorized operation, leading to significant speedups on general CPUs-based platforms. The efficacy of our pruning pattern is proved with experiments on ILSVRC-2012. For example, in the case of 50% sparsity and $N=4$, our pattern obtains about 3.0% improvements over filter pruning in the top-1 accuracy of MobileNet-V2. Meanwhile, it obtains 56.04ms inference savings on Cortex-A7 CPU over weight pruning. Code is available at https://github.com/lmbxmu/1xN.	翻訳日:2021-06-02 14:48:25 公開日:2021-06-01
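The grouping rule described in the abstract above ($N$ consecutive output kernels sharing one input channel index form a block, and unimportant blocks are pruned) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function name `prune_1xn` is hypothetical, each kernel is reduced to a single scalar weight in a 2-D `weights[out][in]` matrix, block importance is assumed to be the sum of absolute values, and the filter rearrangement step is not modeled.

```python
def prune_1xn(weights, n, sparsity):
    """Zero out the least important 1xN blocks of a weight matrix.

    weights[o][i] is the (scalar) weight for output channel o and input
    channel i. A block is N consecutive output channels at one input index.
    Block importance is assumed here to be the sum of absolute values.
    """
    c_out, c_in = len(weights), len(weights[0])
    if c_out % n != 0:
        raise ValueError("number of output channels must be divisible by n")

    # Score every block: (score, output group index, input channel index).
    blocks = []
    for group in range(c_out // n):
        for i in range(c_in):
            score = sum(abs(weights[group * n + k][i]) for k in range(n))
            blocks.append((score, group, i))

    # Prune the lowest-scoring fraction of blocks.
    blocks.sort()
    num_pruned = int(len(blocks) * sparsity)
    pruned = [row[:] for row in weights]
    for _, group, i in blocks[:num_pruned]:
        for k in range(n):
            pruned[group * n + k][i] = 0.0
    return pruned


w = [[0.1, 2.0],
     [0.2, 1.0],
     [3.0, 0.1],
     [1.0, 0.1]]
for row in prune_1xn(w, n=2, sparsity=0.5):
    print(row)
```

Because zeros always appear in runs of $N$ along the output dimension, the surviving weights stay contiguous per block, which is what makes the block-wise vectorized computation mentioned in the abstract plausible on CPUs.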
# 問合せから引用への変換に基づく引用推薦と解釈 Quotation Recommendation and Interpretation Based on Transformation from Queries to Quotations ( http://arxiv.org/abs/2105.14189v2 ) ライセンス: Link先を確認	Lingzhi Wang, Xingshan Zeng, Kam-Fai Wong	(参考訳) 個人が自分を表現するのを助けるために、引用推奨が注目を集めている。それでも、これまでのほとんどの取り組みは、引用とクエリを別々にモデル化することに集中し、引用とクエリの関係を無視する。本研究では,クエリ表現を直接引用表現にマッピングする変換行列を提案する。マッピング関係をよりよく学ぶために、2つの意味空間の距離を最小にするマッピング損失(1つは引用用、もう1つはマッピングクエリ用)を用いる。さらに,問合せ中の単語を用いて引用の擬人的言語を解釈し,問合せの上に引用を意識した注意を施し,指示語を強調する。英語と中国語の2つのデータセットの実験では、我々のモデルは過去の最先端モデルよりも優れていた。 To help individuals express themselves better, quotation recommendation is receiving growing attention. Nevertheless, most prior efforts focus on modeling quotations and queries separately and ignore the relationship between the quotations and the queries. In this work, we introduce a transformation matrix that directly maps the query representations to quotation representations. To better learn the mapping relationship, we employ a mapping loss that minimizes the distance of two semantic spaces (one for quotation and another for mapped-query). Furthermore, we explore using the words in history queries to interpret the figurative language of quotations, where quotation-aware attention is applied on top of history queries to highlight the indicator words. Experiments on two datasets in English and Chinese show that our model outperforms previous state-of-the-art models.	翻訳日:2021-06-02 14:47:57 公開日:2021-06-01
# 変圧器の微調整と組成の相互作用について On the Interplay Between Fine-tuning and Composition in Transformers ( http://arxiv.org/abs/2105.14668v2 ) ライセンス: Link先を確認	Lang Yu and Allyson Ettinger	(参考訳) 事前訓練されたトランスフォーマー言語モデルは、様々なNLPタスクにおいて顕著な性能を示した。しかし、近年の研究では、これらのモデルにおけるフレーズレベルの表現は、語彙内容の強い影響を反映しているが、洗練された合成句情報の証拠がないことが示唆されている。本稿では,語彙的内容を超えた句意味情報を取り込むための文脈的埋め込みの能力に対する微調整の影響について検討する。具体的には,語彙重複度の高い逆パラフレーズ分類タスクと感情分類タスクでモデルを微調整する。微調整後,事前作業後の制御設定におけるフラシアル表現の分析を行う。微調整はこれらの表現において構成性に恩恵をもたらすことがほとんどないが、感情の訓練は特定のモデルに小さな局所的な利益をもたらす。フォローアップ分析では,その課題から構成的利益の欠如を説明できるパラフレーズデータセット内の類似した手がかりを同定し,感情訓練による局所的利益の根底にある潜在的な要因について考察する。 Pre-trained transformer language models have shown remarkable performance on a variety of NLP tasks. However, recent research has suggested that phrase-level representations in these models reflect heavy influences of lexical content, but lack evidence of sophisticated, compositional phrase information. Here we investigate the impact of fine-tuning on the capacity of contextualized embeddings to capture phrase meaning information beyond lexical content. Specifically, we fine-tune models on an adversarial paraphrase classification task with high lexical overlap, and on a sentiment classification task. After fine-tuning, we analyze phrasal representations in controlled settings following prior work. We find that fine-tuning largely fails to benefit compositionality in these representations, though training on sentiment yields a small, localized benefit for certain models. In follow-up analyses, we identify confounding cues in the paraphrase dataset that may explain the lack of composition benefits from that task, and we discuss potential factors underlying the localized benefits from sentiment training.	翻訳日:2021-06-02 14:47:43 公開日:2021-06-01
# 探索と爆発:中国のスペル補正モデルを改善する2つの方法 Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models ( http://arxiv.org/abs/2105.14813v2 ) ライセンス: Link先を確認	Chong Li, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang	(参考訳) ニューラルネットワークを用いたシーケンス・ツー・シーケンス学習は、いくつかの綴り誤りのある文を入力とし、修正された文を出力する中国語綴り修正(CSC)の有効な枠組みであることが実証的に証明されている。しかし、CSCモデルは混乱セットによってカバーされるスペルエラーの修正に失敗し、また目に見えないエラーに遭遇する。本稿では,モデルの弱点を継続的に識別し,より価値のあるトレーニングインスタンスを生成し,そのモデルを強化するためにタスク固有の事前学習戦略を適用する手法を提案する。生成した敵の例をトレーニングセットに徐々に追加する。実験結果から, 事前学習戦略と組み合わさって, 複数のCSCモデルの3つのデータセット間の一般化とロバスト性を改善し, CSCタスクの最先端性能を達成できることが示唆された。 Sequence-to-sequence learning with neural networks has empirically proven to be an effective framework for Chinese Spelling Correction (CSC), which takes a sentence with some spelling errors as input and outputs the corrected one. However, CSC models may fail to correct spelling errors covered by the confusion sets, and also will encounter unseen ones. We propose a method, which continually identifies the weak spots of a model to generate more valuable training instances, and apply a task-specific pre-training strategy to enhance the model. The generated adversarial examples are gradually added to the training set. Experimental results show that such an adversarial training method combined with the pretraining strategy can improve both the generalization and robustness of multiple CSC models across three different datasets, achieving state-of-the-art performance for the CSC task.	翻訳日:2021-06-02 14:47:26 公開日:2021-06-01
# 加算ニューラルネットワーク Adder Neural Networks ( http://arxiv.org/abs/2105.14202v2 ) ライセンス: Link先を確認	Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Chunjing Xu, Tong Zhang	(参考訳) 安価な加算演算と比較すると、乗算演算は計算の複雑さがはるかに高い。ディープニューラルネットワークにおける広く使われている畳み込みは、入力特徴と畳み込みフィルタの類似度を測定するために、正確にクロス相関である。本稿では,深層ニューラルネットワーク,特に畳み込みニューラルネットワーク(CNN)におけるこれらの膨大な乗算を,計算コストを削減するために,より安価な加算を行うための加算器ネットワーク(AdderNets)を提案する。 AdderNetsでは、フィルタと入力機能の間の$\ell_1$-norm距離を出力応答としています。この新たな類似度尺度がニューラルネットワークの最適化に与える影響を網羅的に分析した。より優れたパフォーマンスを実現するため,$\ell_p$-norm を調査し,AdderNets の特別なトレーニング手法を開発した。次に,各ニューロンの勾配の大きさに応じてAdderNetsの学習手順を強化する適応学習速度戦略を提案する。その結果、AdderNetsはImageNetデータセット上でResNet-50を使用して、畳み込み層で乗算を用いずに75.7%のTop-1精度と92.3%のTop-5精度を達成することができる。さらに,ReLUアクティベーション関数を持つ単一の隠蔽層AdderNetと幅境界層AdderNetの両方が普遍関数近似器であることを示すことにより,AdderNetsの理論基盤を構築する。これらの結果は、より複雑な乗算単位を用いて従来のニューラルネットワークのものと一致する。単一の隠れ層を持つAdderNetsの近似限界も提示される。 Compared with cheap addition operation, multiplication operation is of much higher computation complexity. The widely-used convolutions in deep neural networks are exactly cross-correlation to measure the similarity between input feature and convolution filters, which involves massive multiplications between float values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input feature as the output response. The influence of this new similarity measure on the optimization of neural networks has been thoroughly analyzed. To achieve a better performance, we develop a special training approach for AdderNets by investigating the $\ell_p$-norm. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. 
As a result, the proposed AdderNets can achieve 75.7% Top-1 accuracy 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in convolutional layer. Moreover, we develop a theoretical foundation for AdderNets, by showing that both the single hidden layer AdderNet and the width-bounded deep AdderNet with ReLU activation functions are universal function approximators. These results match those of the traditional neural networks using the more complex multiplication units. An approximation bound for AdderNets with a single hidden layer is also presented.	翻訳日:2021-06-02 14:47:11 公開日:2021-06-01
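The $\ell_1$ response described in the AdderNets abstract can be illustrated with a minimal one-dimensional sketch. The function names and the 1-D, stride-1, no-padding setup are assumptions made for illustration, and so is the negation of the distance (chosen here so that a larger response means a closer match); the abstract itself only states that the $\ell_1$-norm distance is the output response.

```python
def adder_response(patch, filt):
    """Negated L1 distance between an input patch and a filter.

    Only subtraction, absolute value and addition are used; there is no
    multiplication between inputs and weights. The negation is an
    illustrative assumption, not something the abstract specifies.
    """
    if len(patch) != len(filt):
        raise ValueError("patch and filter must have the same length")
    return -sum(abs(x - f) for x, f in zip(patch, filt))


def adder_conv1d(signal, filt):
    """Slide the filter over a 1-D signal (stride 1, no padding)."""
    k = len(filt)
    return [adder_response(signal[i:i + k], filt)
            for i in range(len(signal) - k + 1)]


if __name__ == "__main__":
    # The first window equals the filter, so its response is the maximum (zero).
    print(adder_conv1d([1, 2, 3, 2], [1, 2]))
```

A standard convolution would replace `abs(x - f)` with `x * f`; the sketch only shows where the multiplications disappear and says nothing about the training procedure the paper develops.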
# VersatileGait: A Large-Scale Synthetic Gait Dataset Towards in-the-Wild Simulation ( http://arxiv.org/abs/2105.14421v2 ) License: see link	Pengyi Zhang, Huanzhang Dou, Wenhu Zhang, Yuhan Zhao, Songyuan Li, Zequn Qin, Xi Li	Gait recognition has developed rapidly in recent years. However, gait recognition in the wild is not yet well explored. An obvious reason is the lack of diverse training data in terms of intrinsic and extrinsic factors. To remedy this problem, we propose to construct a large-scale gait dataset with the help of controllable computer simulation. In detail, to diversify the intrinsic factors of gait, we generate numerous characters with diverse attributes and give them various walking styles. To diversify the extrinsic factors of gait, we build a complicated scene with a dense camera layout. Finally, we design an automated generation toolkit under Unity3D for simulating the walking scenario and capturing the gait data automatically. As a result, we obtain an in-the-wild gait dataset, called VersatileGait, which has more than one million silhouette sequences of 10,000 subjects with diverse scenarios. VersatileGait possesses several nice properties, including huge dataset size, diverse pedestrian attributes, complicated camera layout, high-quality annotations, a small domain gap with real data, good scalability for new demands, and no privacy issues. Based on VersatileGait, we propose a series of experiments and applications for both research exploration of gait in the wild and practical applications. Our dataset and its corresponding generation toolkit will be publicly available for further studies.	Translated: 2021-06-02 14:46:48 Published: 2021-06-01
# Predicting Driver Intention Using Deep Neural Network ( http://arxiv.org/abs/2105.14790v2 ) License: see link	Mahdi Bonyani, Mina Rahmanian, Simindokht Jahangard	To improve driving safety and avoid car accidents, Advanced Driver Assistance Systems (ADAS) have received significant attention. Recent studies have focused on predicting driver intention as a key part of these systems. In this study, we propose a new framework in which four inputs are used to anticipate driver maneuvers on the Brain4Cars dataset, with maneuver prediction made 5, 4, 3, 2, and 1 seconds before the actual action occurs. We evaluated our framework in three scenarios: using only 1) the inside view, 2) the outside view, and 3) both the inside and outside views. We divided the dataset into training, validation, and test sets, and also used K-fold cross-validation. Compared with state-of-the-art studies, our architecture is faster and achieves higher performance in the second and third scenarios. Accuracy, precision, recall, and F1-score were used as evaluation metrics, yielding 82.41%, 82.28%, 82.42%, and 82.24% for the outside view and 98.90%, 98.96%, 98.90%, and 98.88% for both inside and outside views, respectively.	Translated: 2021-06-02 14:46:28 Published: 2021-06-01
# Urban Traffic Surveillance (UTS): A fully probabilistic 3D tracking approach based on 2D detections ( http://arxiv.org/abs/2105.14993v2 ) License: see link	Henry Bradler, Adrian Kretz and Rudolf Mester	Urban Traffic Surveillance (UTS) is a surveillance system based on a calibrated monocular video camera that detects vehicles in an urban traffic scenario with dense traffic on multiple lanes and vehicles performing sharp turning maneuvers. UTS then tracks the vehicles using a 3D bounding box representation and a physically reasonable 3D motion model based on an unscented Kalman filter. Since UTS recovers position, shape, and motion information in a three-dimensional world coordinate system, it can be employed to recognize diverse traffic violations or to supply intelligent vehicles with valuable traffic information. We build on YOLOv3 as a detector yielding 2D bounding boxes and class labels for each vehicle. A 2D detector renders our system much more independent of different camera perspectives, as a variety of labeled training data is available. This allows for good generalization while also being more hardware efficient. The task of 3D tracking based on 2D detections is supported by integrating class-specific prior knowledge about the vehicle shape. We quantitatively evaluate UTS using self-generated synthetic data and ground truth from the CARLA simulator, because no datasets exist with an urban vehicle surveillance setting and labeled 3D bounding boxes. Additionally, we give a qualitative impression of how UTS performs on real-world data. Our implementation is capable of operating in real time on a reasonably modern workstation. To the best of our knowledge, UTS is to date the only 3D vehicle tracking system in a surveillance scenario (static camera observing moving targets).	Translated: 2021-06-02 14:46:03 Published: 2021-06-01
# Knowledge Transfer for Few-shot Segmentation of Novel White Matter Tracts ( http://arxiv.org/abs/2105.14513v2 ) License: see link	Qi Lu and Chuyang Ye	Convolutional neural networks (CNNs) have achieved state-of-the-art performance for white matter (WM) tract segmentation based on diffusion magnetic resonance imaging (dMRI). These CNNs require a large number of manual delineations of the WM tracts of interest for training, which are generally labor-intensive and costly. The expensive manual delineation can be a particular disadvantage when novel WM tracts, i.e., tracts that have not been included in existing manual delineations, are to be analyzed. To accurately segment novel WM tracts, it is desirable to transfer the knowledge learned about existing WM tracts, so that even with only a few delineations of the novel WM tracts, CNNs can learn adequately for the segmentation. In this paper, we explore the transfer of such knowledge to the segmentation of novel WM tracts in the few-shot setting. Although a classic fine-tuning strategy can be used for this purpose, it completely discards the information in the last task-specific layer for segmenting existing WM tracts. We hypothesize that the weights of this last layer can bear valuable information for segmenting the novel WM tracts, and thus that completely discarding the information is not optimal. In particular, we assume that the novel WM tracts can correlate with existing WM tracts and that the segmentation of novel WM tracts can be predicted from the logits of existing WM tracts. In this way, a better initialization of the last layer than random initialization can be achieved for fine-tuning. Further, we show that a more adaptive use of the knowledge in the last layer for segmenting existing WM tracts can be conveniently achieved by simply inserting a warmup stage before classic fine-tuning. The proposed method was evaluated on a publicly available dMRI dataset, where we demonstrate the benefit of our method for few-shot segmentation of novel WM tracts.	Translated: 2021-06-02 14:45:39 Published: 2021-06-01
# Reinforcement Learning-based Dynamic Service Placement in Vehicular Networks ( http://arxiv.org/abs/2105.15022v2 ) License: see link	Anum Talpur and Mohan Gurusamy	The emergence of technologies such as 5G and mobile edge computing has enabled the provisioning of different types of services, with different resource and service requirements, to the vehicles in a vehicular network. The growing complexity of traffic mobility patterns and of the dynamics in the requests for different types of services has made service placement a challenging task. A typical static placement solution is not effective because it does not consider traffic mobility and service dynamics. In this paper, we propose a reinforcement learning-based dynamic (RL-Dynamic) service placement framework to find the optimal placement of services at the edge servers while considering the vehicles' mobility and the dynamics in the requests for different types of services. We use SUMO and MATLAB to carry out simulation experiments. In our learning framework, for the decision module, we consider two alternative objective functions: minimizing delay and minimizing edge server utilization. We developed an ILP-based problem formulation for the two objective functions. The experimental results show that 1) compared to static service placement, RL-based dynamic service placement achieves fair utilization of edge server resources and low service delay, and 2) compared to delay-optimized placement, server-utilization-optimized placement uses resources more effectively, achieving higher fairness with lower edge-server utilization.	Translated: 2021-06-02 14:45:08 Published: 2021-06-01
# CIDER: Commonsense Inference for Dialogue Explanation and Reasoning ( http://arxiv.org/abs/2106.00510v1 ) License: see link	Deepanway Ghosal and Pengfei Hong and Siqi Shen and Navonil Majumder and Rada Mihalcea and Soujanya Poria	Commonsense inference to understand and explain human language is a fundamental research problem in natural language processing. Explaining human conversations poses a great challenge, as it requires contextual understanding, planning, inference, and several aspects of reasoning, including causal, temporal, and commonsense reasoning. In this work, we introduce CIDER, a manually curated dataset that contains dyadic dialogue explanations in the form of implicit and explicit knowledge triplets inferred using contextual commonsense inference. Extracting such rich explanations from conversations can help improve several downstream applications. The annotated triplets are categorized by the type of commonsense knowledge present (e.g., causal, conditional, temporal). We set up three different tasks conditioned on the annotated dataset: Dialogue-level Natural Language Inference, Span Extraction, and Multi-choice Span Selection. Baseline results obtained with transformer-based models reveal that the tasks are difficult, paving the way for promising future research. The dataset and the baseline implementations are publicly available at https://github.com/declare-lab/CIDER.	Translated: 2021-06-02 14:44:30 Published: 2021-06-01
# Validating GAN-BioBERT: A Methodology For Assessing Reporting Trends In Clinical Trials ( http://arxiv.org/abs/2106.00665v1 ) License: see link	Joshua J Myszewski, Emily Klossowski, Patrick Meyer, Kristin Bevil, Lisa Klesius, Kristopher M Schroeder	In the past decade, there has been much discussion about the issue of biased reporting in clinical research. Despite this attention, few tools have been developed for the systematic assessment of qualitative statements made in clinical research; most studies that assess qualitative statements rely on manual expert raters, which limits their size. Also, previous attempts to develop larger-scale tools, such as those using natural language processing, were limited by both their accuracy and the number of categories used to classify their findings. With these limitations in mind, this study's goal was to develop a classification algorithm that is both suitably accurate and sufficiently fine-grained to be applied on a large scale for assessing the qualitative sentiment expressed in clinical trial abstracts. Additionally, this study seeks to compare the performance of the proposed algorithm, GAN-BioBERT, to previous studies as well as to expert manual rating of clinical trial abstracts. This study develops a three-class sentiment classification algorithm for clinical trial abstracts using a semi-supervised natural language processing model based on the Bidirectional Encoder Representations from Transformers (BERT) model, trained on a series of clinical trial abstracts annotated by a group of experts in academic medicine. Results: The algorithm was found to have a classification accuracy of 91.3%, with a macro F1-score of 0.92, which is a significant improvement in accuracy compared to previous methods and expert ratings, while also making the sentiment classification finer-grained than in previous studies. The proposed algorithm, GAN-BioBERT, is a suitable classification model for the large-scale assessment of qualitative statements in clinical trial literature, providing an accurate, reproducible tool for the large-scale study of clinical publication trends.	Translated: 2021-06-02 14:44:13 Published: 2021-06-01
# Reward is enough for convex MDPs ( http://arxiv.org/abs/2106.00661v1 ) License: see link	Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins and Satinder Singh	Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov Decision Process (MDP) based on the Reinforcement Learning (RL) problem formulation. However, not all goals can be captured in this manner. Specifically, it is easy to see that convex MDPs, in which goals are expressed as convex functions of stationary distributions, cannot in general be formulated in this manner. In this paper, we reformulate the convex MDP problem as a min-max game between the policy and cost (negative reward) players using Fenchel duality and propose a meta-algorithm for solving it. We show that the average of the policies produced by an RL agent that maximizes the non-stationary reward produced by the cost player converges to an optimal solution to the convex MDP. Finally, we show that the meta-algorithm unifies several disparate branches of reinforcement learning algorithms in the literature, such as apprenticeship learning, variational intrinsic control, constrained MDPs, and pure exploration, into a single framework.	Translated: 2021-06-02 14:42:36 Published: 2021-06-01
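The min-max reformulation mentioned in the convex MDP abstract can be written out as a short sketch. The notation is assumed here rather than taken from the abstract: $d_\pi$ is the stationary state-action distribution of policy $\pi$, $\mathcal{K}$ the set of achievable distributions, $f$ the convex objective (taken to be closed and proper), $f^*$ its Fenchel conjugate, and $\Lambda$ the domain of the cost variable $\lambda$. Substituting the identity $f(d) = \max_{\lambda} \, (\lambda \cdot d - f^*(\lambda))$ gives:

```latex
\min_{d_\pi \in \mathcal{K}} f(d_\pi)
  \;=\;
  \min_{d_\pi \in \mathcal{K}} \; \max_{\lambda \in \Lambda}
  \bigl( \lambda \cdot d_\pi - f^*(\lambda) \bigr)
```

For a fixed $\lambda$, the policy player's problem is linear in $d_\pi$, so it looks like an ordinary RL problem with reward $-\lambda$; this is one way to read the abstract's phrase "cost (negative reward) player".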
# Discovering Diverse Nearly Optimal Policies with Successor Features ( http://arxiv.org/abs/2106.00669v1 ) License: see link	Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag and Satinder Singh	Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features while ensuring that they are near-optimal. We formalize the problem as a Constrained Markov Decision Process (CMDP) where the goal is to find policies that maximize diversity, characterized by an intrinsic diversity reward, while remaining near-optimal with respect to the extrinsic reward of the MDP. We also analyze how recently proposed robustness and discrimination rewards perform and find that they are sensitive to the initialization of the procedure and may converge to sub-optimal solutions. To alleviate this, we propose new explicit diversity rewards that aim to minimize the correlation between the Successor Features of the policies in the set. We compare the different diversity mechanisms in the DeepMind Control Suite and find that the type of explicit diversity we propose is important for discovering distinct behavior, such as different locomotion patterns.	Translated: 2021-06-02 14:42:14 Published: 2021-06-01
# Semi-Supervised Domain Generalization with Stochastic StyleMatch ( http://arxiv.org/abs/2106.00592v1 ) License: see link	Kaiyang Zhou, Chen Change Loy, Ziwei Liu	Most existing research on domain generalization assumes that source data gathered from multiple domains are fully annotated. However, in real-world applications, we might have only a few labels available from each source domain due to high annotation cost, along with abundant unlabeled data that are much easier to obtain. In this work, we investigate semi-supervised domain generalization (SSDG), a more realistic and practical setting. Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling, with several new ingredients tailored to solve SSDG. Specifically, 1) to mitigate overfitting to the scarce labeled source data while improving robustness against noisy pseudo labels, we introduce stochastic modeling of the classifier's weights, which are seen as class prototypes, using Gaussian distributions. 2) To enhance generalization under domain shift, we upgrade FixMatch's two-view consistency learning paradigm, based on weak and strong augmentations, to a multi-view version with style augmentation as the third complementary view. To provide a comprehensive study and evaluation, we establish two SSDG benchmarks, which cover a wide range of strong baseline methods developed in relevant areas including domain generalization and semi-supervised learning. Extensive experiments demonstrate that StyleMatch achieves the best out-of-distribution generalization performance in the low-data regime. We hope our approach and benchmarks can pave the way for future research on data-efficient and generalizable learning systems.	Translated: 2021-06-02 14:41:33 Published: 2021-06-01
# You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection ( http://arxiv.org/abs/2106.00666v1 ) License: see link	Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu	Can Transformer perform 2D object-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure? To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the naïve Vision Transformer with the fewest possible modifications and inductive biases. We find that YOLOS pre-trained only on the mid-sized ImageNet-1k dataset can already achieve competitive object detection performance on COCO; e.g., YOLOS-Base, directly adopted from BERT-Base, can achieve 42.0 box AP. We also discuss the impacts as well as the limitations of current pre-training schemes and model scaling strategies for Transformer in vision through object detection. Code and model weights are available at https://github.com/hustvl/YOLOS.	Translated: 2021-06-02 14:41:04 Published: 2021-06-01
# PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World ( http://arxiv.org/abs/2106.00188v1 ) License: see link	Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi	We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language model, giving us a unified model of linguistic form and grounded meaning. PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation or through natural language. Experimental results show that our model effectively learns world dynamics, along with how to communicate them. It correctly forecasts "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger text-to-text approach by over 10%. Likewise, its natural language summaries of physical interactions are judged by humans as more accurate than LM alternatives. We present a comprehensive analysis showing room for future work.	Translated: 2021-06-02 14:40:51 Published: 2021-06-01
# End-to-End Multihop Retrieval for Compositional Question Answering over Long Documents ( http://arxiv.org/abs/2106.00200v1 ) License: see link	Haitian Sun, William W. Cohen, Ruslan Salakhutdinov	Answering complex questions from long documents requires aggregating multiple pieces of evidence and then predicting the answers. In this paper, we propose a multi-hop retrieval method, DocHopper, to answer compositional questions over long documents. At each step, DocHopper retrieves a paragraph or sentence embedding from the document, mixes the retrieved result with the query, and updates the query for the next step. In contrast to many other retrieval-based methods (e.g., RAG or REALM), the query is not augmented with a token sequence: instead, it is augmented by "numerically" combining it with another neural representation. This means that the model is end-to-end differentiable. We demonstrate that utilizing document structure in this way can largely improve question-answering and retrieval performance on long documents. We experimented with DocHopper on three different QA tasks that require reading long documents to answer compositional questions: discourse entailment reasoning, factual QA with table and text, and information-seeking QA from academic papers. DocHopper outperforms all baseline models and achieves state-of-the-art results on all datasets. Additionally, DocHopper is efficient at inference time, being 3 to 10 times faster than the baselines.	Translated: 2021-06-02 14:40:34 Published: 2021-06-01
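The retrieve, mix, and update loop described in the DocHopper abstract can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function names are hypothetical, retrieval is a plain arg-max over inner products, and the "numerical" mixing step is simple vector addition. The abstract does not specify any of these details, and the actual model may differ.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query, embeddings):
    """Index of the embedding with the largest inner product with the query."""
    return max(range(len(embeddings)), key=lambda i: dot(query, embeddings[i]))


def multihop(query, embeddings, hops=2):
    """Run `hops` rounds of retrieve-then-update and return (path, final query)."""
    path = []
    for _ in range(hops):
        idx = retrieve(query, embeddings)
        path.append(idx)
        # Hypothetical mixing rule: add the retrieved vector to the query,
        # so the next hop is conditioned on what was just retrieved.
        query = [q + e for q, e in zip(query, embeddings[idx])]
    return path, query


if __name__ == "__main__":
    # Toy "document" with three sentence embeddings (made-up numbers).
    sentences = [[1, 0, 2], [0, 0, 4], [0, 1, 0]]
    path, final_query = multihop([1, 0, 0], sentences)
    print(path, final_query)
```

Because the query stays a vector throughout and is never re-encoded from tokens, every step in the loop is a numerical operation, which is the property the abstract connects to end-to-end differentiability (the hard arg-max used here for brevity would need a soft relaxation to be trained that way).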
# Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition ( http://arxiv.org/abs/2106.00241v1 ) License: see link	Shining Liang, Ming Gong, Jian Pei, Linjun Shou, Wanli Zuo, Xianglin Zuo, Daxin Jiang	Named entity recognition (NER) is a fundamental component in many applications, such as Web Search and Voice Assistants. Although deep neural networks greatly improve the performance of NER, their need for large amounts of training data means they can hardly scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a rich-resource language to languages with low resources through pre-trained multilingual language models. Instead of using training data in target languages, cross-lingual NER has to rely only on training data in source languages, optionally adding translated training data derived from the source languages. However, the existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages, which is relatively easy to collect in industry applications. To address these opportunities and challenges, in this paper we describe our novel practice at Microsoft of leveraging such large amounts of unlabeled data in target languages in real production settings. To effectively extract weak supervision signals from the unlabeled data, we develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning. The empirical study on three benchmark data sets verifies that our approach establishes new state-of-the-art performance by a clear margin. The NER techniques reported in this paper are now on their way to becoming a fundamental component for Web ranking, Entity Pane, Answers Triggering, and Question Answering in the Microsoft Bing search engine. Moreover, our techniques will also serve as part of the Spoken Language Understanding module for a commercial voice assistant. We plan to open source the code of the prototype framework after deployment.	Translated: 2021-06-02 14:40:15 Published: 2021-06-01
# 強化学習に基づくきめ細かな質問応答システム A Coarse to Fine Question Answering System based on Reinforcement Learning ( http://arxiv.org/abs/2106.00257v1 ) ライセンス: Link先を確認	Yu Wang, Hongxia Jin	(参考訳) 本稿では,適切な行動を選択することで,異なる長さの文書を効率的に処理できる強化学習に基づく粗い質問応答(CFQA)システムを提案する。本システムは,アクタ批判に基づく深層強化学習モデルを用いて,多段階質問応答を実現する。ショートドキュメントとロングドキュメントの両方を主とするデータセットを対象とした従来のQAモデルと比較して、マルチステップからファインモデルへは、ショートドキュメントとロングドキュメントの両方を扱える複数のシステムモジュールからメリットを享受する。これにより、現在の最先端モデルよりも精度が向上し、トレーニング速度も速くなる。我々は、WIKIREADING、WIKIREADING LONG、CNN、SQuADの4つのQAデータセットでモデルをテストし、1.3%-1.7%精度の改善を1.5x-3.4xのトレーニングスピードアップで示す。 In this paper, we present a coarse to fine question answering (CFQA) system based on reinforcement learning which can efficiently process documents with different lengths by choosing appropriate actions. The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models targeting datasets mainly containing either short or long documents, our multi-step coarse to fine model draws on the merits of multiple system modules, which can handle both short and long documents. The system hence obtains much better accuracy and faster training speed compared to the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to the baselines using state-of-the-art models.	翻訳日:2021-06-02 14:39:49 公開日:2021-06-01
# graph isomorphism, covariants, and parser performance"\textit{ because their treebanks leak}"の複製と拡張 Replicating and Extending "\textit{Because Their Treebanks Leak}": Graph Isomorphism, Covariants, and Parser Performance ( http://arxiv.org/abs/2106.00352v1 ) ライセンス: Link先を確認	Mark Anderson and Anders S{\o}gaard and Carlos G\'omez Rodr\'iguez	(参考訳) s{\o}gaard (2020) は、テストデータに含まれる木の割合がトレーニングセット内の木に同型であることを示唆する結果を得た。 NLPの他の統計分析と同様に、結果は線形回帰評価に基づく。しかし,本研究には方法論的な問題があり,信頼性の低いサンプルサイズを用いて実施した。そこで本研究では,文の長さを単位とする複製研究を行い,グラフ同型に関して,文のごく一部しか性能に変化がないことを示す。さらに,共変量を制御する際に,野生におけるパーサ性能とグラフアイソモーフィズムの相関は消失する。しかし、共変を固定した制御実験では、強い相関関係が観察される。このような統計的分析から得られた結論は、より容易に要因を分解することで、制御された実験がそれらを補う必要があることを示唆する。 S{\o}gaard (2020) obtained results suggesting the fraction of trees occurring in the test data isomorphic to trees in the training set accounts for a non-trivial variation in parser performance. Similar to other statistical analyses in NLP, the results were based on evaluating linear regressions. However, the study had methodological issues and was undertaken using a small sample size leading to unreliable results. We present a replication study in which we also bin sentences by length and find that only a small subset of sentences vary in performance with respect to graph isomorphism. Further, the correlation observed between parser performance and graph isomorphism in the wild disappears when controlling for covariants. However, in a controlled experiment, where covariants are kept fixed, we do observe a strong correlation. We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.	翻訳日:2021-06-02 14:39:31 公開日:2021-06-01
# SemEval-2021 Task 6: テキストとマルチモーダルアンサンブルを用いた説得的テキストと画像の検出に向けて Volta at SemEval-2021 Task 6: Towards Detecting Persuasive Texts and Images using Textual and Multimodal Ensemble ( http://arxiv.org/abs/2106.00240v1 ) ライセンス: Link先を確認	Kshitij Gupta, Devansh Gautam, Radhika Mamidi	(参考訳) ミームは、情報をオンラインで拡散するために使われる最も人気のあるコンテンツの1つである。修辞的・心理学的手法によって多くの人々に影響を及ぼすことができる。テキストや画像における説得技術の検出は,これらの説得技術を検出することを目的としている。 A)テキストコンテンツを用いたマルチラベル分類,(B)テキストコンテンツを用いたマルチラベル分類とスパン識別,(C)ビジュアルコンテンツとテキストコンテンツを用いたマルチラベル分類の3つのサブタスクから構成される。本稿では, BERT をベースとしたモデルに対して, 異なるモダリティで伝達学習手法を提案する。また、異なるモードで訓練されたモデルのアンサンブルの有効性についても検討する。 57.0, 48.2, 52.1のF1スコアを対応するサブタスクで達成する。 Memes are one of the most popular types of content used to spread information online. They can influence a large number of people through rhetorical and psychological techniques. The task, Detection of Persuasion Techniques in Texts and Images, is to detect these persuasive techniques in memes. It consists of three subtasks: (A) Multi-label classification using textual content, (B) Multi-label classification and span identification using textual content, and (C) Multi-label classification using visual and textual content. In this paper, we propose a transfer learning approach to fine-tune BERT-based models in different modalities. We also explore the effectiveness of ensembles of models trained in different modalities. We achieve an F1-score of 57.0, 48.2, and 52.1 in the corresponding subtasks.	翻訳日:2021-06-02 14:39:15 公開日:2021-06-01
# 逆VQA:VQAモデルのロバスト性を評価するための新しいベンチマーク Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models ( http://arxiv.org/abs/2106.00245v1 ) ライセンス: Link先を確認	Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu	(参考訳) 大規模な事前トレーニングでは、過去2年間、vqa(visual question answering)タスクのパフォーマンスが大幅に向上している。急速な進展はあったが、これらの最先端(SOTA)のVQAモデルが野生での試験例に遭遇する際に堅牢かどうかは不明である。そこで本研究では,新たな大規模VQAベンチマークであるAdversarial VQAを紹介する。この新しいベンチマークでは,いくつかの興味深い結果が得られた。意外なことに,データセット収集の過程で,非エキスパートアノテータが比較的容易にSOTA VQAモデルを攻撃できることがわかった。 (II)新しいデータセット上で様々なSOTA VQAモデルをテストして、その脆弱性を強調し、大規模な事前学習モデルと敵のトレーニング手法の両方が、標準のVQA v2データセットよりもはるかに低いパフォーマンスしか達成できないことを発見した。 (iii)データ拡張とみなす場合、我々のデータセットは、他の堅牢なVQAベンチマークのパフォーマンス向上に利用できます。 (iv)我々は,データセットの詳細な分析を行い,コミュニティにもたらした課題に関する貴重な洞察を提供する。我々は、Adversarial VQAが、開発したVQAモデルの堅牢性をテストするために将来の作業で使用される貴重なベンチマークとして機能することを願っている。私たちのデータセットはhttps://adversarialvqa.comで公開されています。 github.io/ With large-scale pre-training, the past two years have witnessed significant performance boost on the Visual Question Answering (VQA) task. Though rapid progresses have been made, it remains unclear whether these state-of-the-art (SOTA) VQA models are robust when encountering test examples in the wild. To study this, we introduce Adversarial VQA, a new large-scale VQA benchmark, collected iteratively via an adversarial human-and-model-in-the-loop procedure. Through this new benchmark, we present several interesting findings. (i) Surprisingly, during dataset collection, we find that non-expert annotators can successfully attack SOTA VQA models with relative ease. (ii) We test a variety of SOTA VQA models on our new dataset to highlight their fragility, and find that both large-scale pre-trained models and adversarial training methods can only achieve far lower performance than what they can achieve on the standard VQA v2 dataset. (iii) When considered as data augmentation, our dataset can be used to improve the performance on other robust VQA benchmarks. 
(iv) We present a detailed analysis of the dataset, providing valuable insights on the challenges it brings to the community. We hope Adversarial VQA can serve as a valuable benchmark that will be used by future work to test the robustness of its developed VQA models. Our dataset is publicly available at https://adversarialvqa.github.io/.	翻訳日:2021-06-02 14:39:02 公開日:2021-06-01
# ViTA:オブジェクトタグのアライメントによる視覚言語翻訳 ViTA: Visual-Linguistic Translation by Aligning Object Tags ( http://arxiv.org/abs/2106.00250v1 ) ライセンス: Link先を確認	Kshitij Gupta, Devansh Gautam, Radhika Mamidi	(参考訳) マルチモーダル機械翻訳(mmt)は、翻訳のための視覚情報を含む原文を豊かにする。近年は人気が高まり、同じ方向にいくつかのパイプラインが提案されている。しかし、このタスクは、翻訳システムにおける視覚的モダリティの寄与を説明するための品質データセットを欠いている。本稿では,WAT 2021の多モーダル翻訳タスクを英語からヒンディー語に翻訳するシステムを提案する。我々は、テキストのみの翻訳に、事前訓練された多言語列列列列モデルであるmBARTを用いることを提案する。さらに、画像からオブジェクトタグを抽出し、マルチモーダルタスクの入力を強化することにより、視覚情報をテキスト領域に持ち込む。また,ソーステキストを体系的に劣化させることにより,システムのロバスト性について検討する。最後に、タスクのテストセットとチャレンジセットにおいて、BLEUスコア44.6と51.6を達成する。 Multimodal Machine Translation (MMT) enriches the source text with visual information for translation. It has gained popularity in recent years, and several pipelines have been proposed in the same direction. Yet, the task lacks quality datasets to illustrate the contribution of visual modality in the translation systems. In this paper, we propose our system for the Multimodal Translation Task of WAT 2021 from English to Hindi. We propose to use mBART, a pretrained multilingual sequence-to-sequence model, for the textual-only translations. Further, we bring the visual information to a textual domain by extracting object tags from the image and enhance the input for the multimodal task. We also explore the robustness of our system by systematically degrading the source text. Finally, we achieve a BLEU score of 44.6 and 51.6 on the test set and challenge set of the task.	翻訳日:2021-06-02 14:38:37 公開日:2021-06-01
# TransVOS: トランスフォーマーによるビデオオブジェクトセグメンテーション TransVOS: Video Object Segmentation with Transformers ( http://arxiv.org/abs/2106.00588v1 ) ライセンス: Link先を確認	Jianbiao Mei, Mengmeng Wang, Yeneng Lin, Yong Liu	(参考訳) 近年,STM(Space-Time Memory Network)に基づく手法は,半教師付きビデオオブジェクトセグメンテーション(VOS)において最先端のパフォーマンスを実現している。このタスクにおける重要な問題は、異なるフレームと各フレーム内の依存関係をモデル化する方法である。しかし、これらの手法の多くは空間的関係(各フレームの内側)を無視し、時間的関係(異なるフレーム)を完全に利用しない。本稿では,時間的・空間的関係をフル活用し,モデル化するビジョントランスフォーマを導入する,TransVOSと呼ばれる新しいトランスフォーマベースのフレームワークを提案する。さらに、ほとんどのSTMベースのアプローチでは、2つの異なるエンコーダを使用して、2つの重要な入力、すなわち参照セット(予測マスク付き歴史フレーム)とクエリフレームの特徴を抽出し、モデルのパラメータと複雑さを増大させる。有効性を保ちながら、人気のある2エンコーダパイプラインをスリム化するために、上記の2つの入力を統一的に符号化する単一の2パス特徴抽出器を設計する。大規模な実験は、DAVISとYouTube-VOSデータセットの最先端手法よりもTransVOSの方が優れていることを示している。コードは公開時にリリースされる。 Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS). A critical problem in this task is how to model the dependency both among different frames and inside every frame. However, most of these methods neglect the spatial relationships (inside each frame) and do not make full use of the temporal relationships (among different frames). In this paper, we propose a new transformer-based framework, termed TransVOS, introducing a vision transformer to fully exploit and model both the temporal and spatial relationships. Moreover, most STM-based approaches employ two disparate encoders to extract features of two significant inputs, i.e., reference sets (history frames with predicted masks) and query frame, respectively, increasing the models' parameters and complexity. To slim the popular two-encoder pipeline while keeping the effectiveness, we design a single two-path feature extractor to encode the above two inputs in a unified way. Extensive experiments demonstrate the superiority of our TransVOS over state-of-the-art methods on both DAVIS and YouTube-VOS datasets. Codes will be released when it is published.	
翻訳日:2021-06-02 14:38:26 公開日:2021-06-01
# 大規模バッチ学習のための並行学習 Concurrent Adversarial Learning for Large-Batch Training ( http://arxiv.org/abs/2106.00221v1 ) ライセンス: Link先を確認	Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You	(参考訳) 大規模バッチトレーニングは、多数のGPU/TPUプロセッサでニューラルネットワークをトレーニングする際に一般的に使用されるテクニックとなっている。バッチサイズが大きくなると、確率的最適化器は鋭い局所的な最小値に収束し、テスト性能が低下する。現行の手法では,バッチサイズを増大させるため,バッチサイズが大きくなるにつれてデータ増倍による性能向上が減少し,ある時点からデータ増倍が不十分になることがわかった。本稿では,大規模バッチ学習におけるバッチサイズ向上のための逆学習を提案する。意思決定面の平滑化と平坦な領域への偏りに対する自然な選択であるにもかかわらず、各ステップで少なくとも2つの逐次的な勾配計算が必要となるため、大規模なバッチトレーニングでは、逆学習がうまく適用されていない。そこで本研究では, 逐次的勾配計算を逐次的に切り離し, 定常パラメータを活用し, 同時進行学習 (conadv) 法を提案する。実験の結果,ConAdvは高精度を維持しつつ,ImageNet上でのResNet-50とEfficientNetトレーニングの両方でバッチサイズを向上できることがわかった。具体的には,ImageNet ResNet-50トレーニングにおいて,96Kバッチサイズで75.3\%のTop-1精度を実現し,ConAdvとデータ拡張を組み合わせた場合の精度をさらに76.2\%に向上できることを示す。これはResNet-50トレーニングバッチサイズを96Kにスケールする最初の作業である。 Large-batch training has become a commonly used technique when training neural networks with a large number of GPU/TPU processors. As batch size increases, stochastic optimizers tend to converge to sharp local minima, leading to degraded test performance. Current methods usually use extensive data augmentation to increase the batch size, but we found the performance gain with data augmentation decreases as batch size increases, and data augmentation will become insufficient after certain point. In this paper, we propose to use adversarial learning to increase the batch size in large-batch training. Despite being a natural choice for smoothing the decision surface and biasing towards a flat region, adversarial learning has not been successfully applied in large-batch training since it requires at least two sequential gradient computations at each step, which will at least double the running time compared with vanilla training even with a large number of processors. 
To overcome this issue, we propose a novel Concurrent Adversarial Learning (ConAdv) method that decouples the sequential gradient computations in adversarial learning by utilizing stale parameters. Experimental results demonstrate that ConAdv can successfully increase the batch size on both ResNet-50 and EfficientNet training on ImageNet while maintaining high accuracy. In particular, we show ConAdv alone can achieve 75.3\% top-1 accuracy on ImageNet ResNet-50 training with 96K batch size, and the accuracy can be further improved to 76.2\% when combining ConAdv with data augmentation. This is the first work that successfully scales ResNet-50 training batch size to 96K.	翻訳日:2021-06-02 14:37:26 公開日:2021-06-01
# サブシンボリック推論のための学習表現 Learning Representations for Sub-Symbolic Reasoning ( http://arxiv.org/abs/2106.00393v1 ) ライセンス: Link先を確認	Giuseppe Marra, Michelangelo Diligenti, Francesco Giannini and Marco Maggini	(参考訳) ニューロシンボリックな手法は、神経アーキテクチャ、知識表現、推論を統合する。しかし、彼らは観測の本質的な不確実性に対処し、現実の応用へのスケーリングに苦慮している。本稿では,ディープ・ラーナ・アーキテクチャの潜在空間における関係推論を行う新しいエンド・ツー・エンドモデルであるrelational reasoning networks(r2n)について述べる。エンティティ間の関係を表現できる知識グラフ埋め込みのような平らなアーキテクチャとは異なり、R2Nは基底原子間の高レベルな関係を考慮し、追加の計算構造を定義する。考慮された関係は、論理式によって定義されたもののように明示的に知られているか、基底原子の群間の無拘束相関として定義される。 R2Nは純粋にシンボリックなタスクや、シンボリックと特徴に基づく表現的エンティティの両方で異種問題における学習と推論を統合するための神経-記号的プラットフォームとして適用することができる。提案モデルは,拡張性や表現性に制限された従来のニューロシンボリックな手法のギャップを埋めるものである。提案手法は, 異なる実験環境で最新の結果が得られることを示す。 Neuro-symbolic methods integrate neural architectures, knowledge representation and reasoning. However, they have been struggling at both dealing with the intrinsic uncertainty of the observations and scaling to real world applications. This paper presents Relational Reasoning Networks (R2N), a novel end-to-end model that performs relational reasoning in the latent space of a deep learner architecture, where the representations of constants, ground atoms and their manipulations are learned in an integrated fashion. Unlike flat architectures like Knowledge Graph Embedders, which can only represent relations between entities, R2Ns define an additional computational structure, accounting for higher-level relations among the ground atoms. The considered relations can be explicitly known, like the ones defined by logic formulas, or defined as unconstrained correlations among groups of ground atoms. R2Ns can be applied to purely symbolic tasks or as a neuro-symbolic platform to integrate learning and reasoning in heterogeneous problems with both symbolic and feature-based represented entities. The proposed model bridges the gap between previous neuro-symbolic methods that have been either limited in terms of scalability or expressivity. 
The proposed methodology is shown to achieve state-of-the-art results in different experimental settings.	翻訳日:2021-06-02 14:36:55 公開日:2021-06-01
# OpenBox: 汎用ブラックボックス最適化サービス OpenBox: A Generalized Black-box Optimization Service ( http://arxiv.org/abs/2106.00421v1 ) ライセンス: Link先を確認	Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui	(参考訳) black-box optimization(bbo)は、自動機械学習、エンジニアリング、物理学、実験設計など、幅広い応用がある。しかし、既存のソフトウェアパッケージと互換性のある問題に対して、ユーザがBBOメソッドを適用することは、適用性、性能、効率の点で依然として課題である。本稿では,ユーザビリティを向上したオープンソースの汎用BBOサービスであるOpenBoxを構築する。 OpenBoxを支えるモジュール設計は、他の既存のシステムで共通する基本的なBBOコンポーネントの柔軟な抽象化と最適化を容易にする。 OpenBoxは分散、フォールトトレラント、スケーラブルである。効率を改善するために、OpenBoxはさらに"algorithm agnostic"並列化と転送学習を利用している。実験の結果,既存のシステムと比較してopenboxの有効性と効率が実証された。 Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. However, it remains a challenge for users to apply BBO methods to their problems at hand with existing software packages, in terms of applicability, performance, and efficiency. In this paper, we build OpenBox, an open-source and general-purpose BBO service with improved usability. The modular design behind OpenBox also facilitates flexible abstraction and optimization of basic BBO components that are common in other existing systems. OpenBox is distributed, fault-tolerant, and scalable. To improve efficiency, OpenBox further utilizes "algorithm agnostic" parallelization and transfer learning. Our experimental results demonstrate the effectiveness and efficiency of OpenBox compared to existing systems.	翻訳日:2021-06-02 14:36:35 公開日:2021-06-01
# 動的ニューラルモデルを用いた学生のパフォーマンス予測 Student Performance Prediction Using Dynamic Neural Models ( http://arxiv.org/abs/2106.00524v1 ) ライセンス: Link先を確認	Marina Delianidi, Konstantinos Diamantaras, George Chrysogonidis, Vasileios Nikiforidis	(参考訳) 本研究は,学生の学習・評価過程における過去のインタラクションに基づいて,次の試験問題に対する学生の回答の正当性を予測する問題に対処する。我々は、学生のパフォーマンスを動的問題としてモデル化し、そのソリューションとして、有限メモリ時間遅延ニューラルネットワーク(TDNN)と潜在的無限メモリリカレントニューラルネットワーク(RNN)の2つの主要なクラスを比較した。次の応答は,学生の知識状態の関数であり,それに対して,従来の応答と,それに関連するスキルの関数であるので,2部ネットワークアーキテクチャを提案する。第1部は、動的ニューラルネットワーク(tdnnまたはrnn)を使用して、学生の知識状態をトレースする。第2部は動的部分の上に適用され、学生の知識状態の推定に基づいて学生の反応を予測する分類タスクを完了した多層フィードフォワードネットワークである。入力スキルと以前のレスポンスは、異なる埋め込みを使ってエンコードされる。スキル埋め込みに関しては, (a) ランダムベクトルと (b) スキルのテキスト記述と一致する事前学習ベクトルを用いて, 2つの異なる初期化手法を試した。実験の結果,これまでに使用したすべてのデータセットにおいて,RNNアプローチの性能はTDNNアプローチよりも優れていることがわかった。また、我々のRNNアーキテクチャは、5つのデータセットのうち4つで最先端のモデルよりも優れていることを示す。 tdnnのアプローチは、5つのデータセットのうち4つでアートモデルの状態を上回っていますが、提案されているrnnのアプローチよりは少し悪いです。最後に、我々の期待に反して、事前学習ベクターを用いたスキル埋め込みの初期化は、ランダム初期化に対して事実上優位ではないことが判明した。 We address the problem of predicting the correctness of the student's response on the next exam question based on their previous interactions in the course of their learning and evaluation process. We model the student performance as a dynamic problem and compare the two major classes of dynamic neural architectures for its solution, namely the finite-memory Time Delay Neural Networks (TDNN) and the potentially infinite-memory Recurrent Neural Networks (RNN). Since the next response is a function of the knowledge state of the student and this, in turn, is a function of their previous responses and the skills associated with the previous questions, we propose a two-part network architecture. The first part employs a dynamic neural network (either TDNN or RNN) to trace the student knowledge state. 
The second part is applied on top of the dynamic part; it is a multi-layer feed-forward network that completes the classification task of predicting the student response based on our estimate of the student knowledge state. Both input skills and previous responses are encoded using different embeddings. Regarding the skill embeddings, we tried two different initialization schemes using (a) random vectors and (b) pretrained vectors matching the textual descriptions of the skills. Our experiments show that the RNN approach performs better than the TDNN approach on all datasets that we have used. Also, we show that our RNN architecture outperforms the state-of-the-art models in four out of five datasets. It is worth noting that the TDNN approach also outperforms the state-of-the-art models in four out of five datasets, although it is slightly worse than our proposed RNN approach. Finally, contrary to our expectations, we find that the initialization of skill embeddings using pretrained vectors offers practically no advantage over random initialization.	翻訳日:2021-06-02 14:36:26 公開日:2021-06-01
# Duckworth-Lewis-Stern法と機械学習手法の比較 Duckworth-Lewis-Stern Method Comparison with Machine Learning Approach ( http://arxiv.org/abs/2106.00175v1 ) ライセンス: Link先を確認	Kumail Abbas and Sajjad Haider	(参考訳) 本研究は,ODIクリケットの試合におけるDuckworth-Lewis-Stern (DLS)法の解析を行った。 DLS法の精度を様々な教師付き学習アルゴリズムと比較し,結果予測を行う。クリケットの試合の結果は2回目の間に予測される。また,Duckworth-Lewis (D/L) 式で使用される DLS 資源テーブルを最適化し,予測能力を向上した。最後に、ODIの試合中にどれだけ予測不可能かに応じて異なるクリケット競技国をランク付けする予測不可能指数が開発されている。 This work presents an analysis of the Duckworth-Lewis-Stern (DLS) method for One Day International (ODI) cricket matches. The accuracy of the DLS method is compared against various supervised learning algorithms for result prediction. The result of a cricket match is predicted during the second inning. The paper also optimized DLS resource table which is used in the Duckworth-Lewis (D/L) formula to increase its predictive power. Finally, an Unpredictability Index is developed that ranks different cricket playing nations according to how unpredictable they are while playing an ODI match.	翻訳日:2021-06-02 14:35:30 公開日:2021-06-01
# 価値の欠如を予測できるよいインプットは何でしょう? What's a good imputation to predict with missing values? ( http://arxiv.org/abs/2106.00311v1 ) ライセンス: Link先を確認	Marine Le Morvan (PARIETAL, IJCLab), Julie Josse (CRISAM), Erwan Scornet (CMAP), Ga\"el Varoquaux (PARIETAL)	(参考訳) 値が欠けているデータについてよい予測子を学ぶには? ほとんどの取り組みは、結果を予測するために、完了データにできる限り第一の示唆と第二の学習に焦点を当てています。しかし、この広範な実践には理論的根拠がない。ここでは, ほぼすべてのインプテーション関数に対して, 強力な学習者を持つインプタント・テン・レグレッション手順がベイズ最適であることを示す。この結果は、確率的モデリングにおいて不確定性を使用するために非ランダムな設定を必要とする古典的な統計結果とは対照的である。さらに、完全な条件付きインプテーションは漸近的に良い予測には必要ではないかもしれない。実際、完全にインプットされたデータでは、最高の回帰関数は概して不連続であり、学習は困難である。代わりに、回帰関数を変更しないようにインプテーションを作成することは、単に問題を不連続インプテーションの学習に移す。むしろ、インプテーションと回帰を共同で学ぶのがより簡単であることを示唆する。観測された変数と観測されていない変数をまたいだ条件付きリンクをキャプチャするニューラルネットワークであるNeuMissに適応する手法を提案する。実験により, 有限個の試料を用いた実験において, NeuMiss による連成計算と回帰は, 様々な2段階の手順より優れていることを確認した。 How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. 
We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments with a finite number of samples confirm that joint imputation and regression through NeuMiss outperforms various two-step procedures.	翻訳日:2021-06-02 14:35:21 公開日:2021-06-01
# 変分ベイズにおけるフレキシブル後方の変形モデル Transformation Models for Flexible Posteriors in Variational Bayes ( http://arxiv.org/abs/2106.00528v1 ) ライセンス: Link先を確認	Sefan H\"ortling, Daniel Dold, Oliver D\"urr, Beate Sick	(参考訳) ベイズモデルの主な課題は、モデルパラメータの後方を決定することである。既に1つまたは少数のパラメータしか持たないモデルでは、分析後部は特別な設定でのみ決定できる。ベイズニューラルネットワークでは、変分分布による計算が難しい後部を近似するために、変分推論が広く用いられている。通常、ガウス分布は変分分布 (Gaussian-VI) として用いられ、その柔軟性の制限により近似の質が制限される。一方、変換モデルはどんな分布にも適合するほど柔軟である。ここでは、変換モデルに基づく変分推論(TM-VI)を提案し、一つのパラメータを持つモデルにおける複雑な後部を正確に近似し、ニューラルネットワークのようなマルチパラメータモデルに対して平均場的に機能することを実証する。 The main challenge in Bayesian models is to determine the posterior for the model parameters. Already, in models with only one or few parameters, the analytical posterior can only be determined in special settings. In Bayesian neural networks, variational inference is widely used to approximate difficult-to-compute posteriors by variational distributions. Usually, Gaussians are used as variational distributions (Gaussian-VI) which limits the quality of the approximation due to their limited flexibility. Transformation models on the other hand are flexible enough to fit any distribution. Here we present transformation model-based variational inference (TM-VI) and demonstrate that it allows to accurately approximate complex posteriors in models with one parameter and also works in a mean-field fashion for multi-parameter models like neural networks.	翻訳日:2021-06-02 14:35:01 公開日:2021-06-01
# 実時間および軽量ラインセグメント検出に向けて Towards Real-time and Light-weight Line Segment Detection ( http://arxiv.org/abs/2106.00186v1 ) ライセンス: Link先を確認	Geonmo Gu, Byungsoo Ko, SeoungHyun Go, Sung-Hyun Lee, Jingeun Lee, Minchul Shin	(参考訳) 従来の深層学習に基づく線分検出(LSD)は、ライン予測のための膨大なモデルサイズと高い計算コストに悩まされていた。これにより、計算的に制限された環境でのリアルタイム推論から制約される。本稿では,mobile lsd (m-lsd) という,資源制約環境のリアルタイム・軽量ラインセグメント検出手法を提案する。バックボーンネットワークの最小化と,従来手法におけるライン予測のための典型的なマルチモジュールプロセスの削除により,極めて効率的なLCDアーキテクチャを設計する。このような軽量ネットワークとの競争性能を維持するために,線形セグメント(SoL)のセグメント化と幾何学習方式という,新しいトレーニング手法を提案する。 SoL拡張は、トレーニングプロセス中に補助ラインデータを提供するために使用される複数のサブパートにラインセグメントを分割する。さらに、幾何学習スキームにより、モデルがマッチング損失、接合および線分節、長さおよび次数回帰から追加の幾何学的手がかりを捉えることができる。これまで最高のリアルタイムLSD手法であったTP-LSD-Liteと比較して、我々のモデル(M-LSD-tiny)は、Wireframeおよび YorkUrbanデータセットで評価した場合、モデルサイズ2.5%、GPUでの推論速度130.5%の競合性能を達成する。さらに、当社のモデルは、それぞれAndroidとiPhoneのモバイルデバイス上で56.8 FPSと48.6 FPSで動作する。私たちの知る限りでは、これはモバイルデバイスで利用可能な最初のリアルタイム深層lsdメソッドです。 Previous deep learning-based line segment detection (LSD) suffer from the immense model size and high computational cost for line prediction. This constrains them from real-time inference on computationally restricted environments. In this paper, we propose a real-time and light-weight line segment detector for resource-constrained environments named Mobile LSD (M-LSD). We design an extremely efficient LSD architecture by minimizing the backbone network and removing the typical multi-module process for line prediction in previous methods. To maintain competitive performance with such a light-weight network, we present novel training schemes: Segments of Line segment (SoL) augmentation and geometric learning scheme. SoL augmentation splits a line segment into multiple subparts, which are used to provide auxiliary line data during the training process. Moreover, the geometric learning scheme allows a model to capture additional geometry cues from matching loss, junction and line segmentation, length and degree regression. 
Compared with TP-LSD-Lite, previously the best real-time LSD method, our model (M-LSD-tiny) achieves competitive performance with 2.5% of model size and an increase of 130.5% in inference speed on GPU when evaluated with Wireframe and YorkUrban datasets. Furthermore, our model runs at 56.8 FPS and 48.6 FPS on Android and iPhone mobile devices, respectively. To the best of our knowledge, this is the first real-time deep LSD method available on mobile devices.	翻訳日:2021-06-02 14:34:16 公開日:2021-06-01
# 深層カーネル学習による医用画像解析における予測不確かさの定量化 Quantifying Predictive Uncertainty in Medical Image Analysis with Deep Kernel Learning ( http://arxiv.org/abs/2106.00638v1 ) ライセンス: Link先を確認	Zhiliang Wu, Yinchong Yang, Jindong Gu, Volker Tresp	(参考訳) ディープニューラルネットワークは、医療画像の分析にますます利用されている。しかし、ほとんどの作品はモデルの予測の不確実性を無視している。本稿では、畳み込みニューラルネットワークとスパースガウス過程のパイプラインによる予測の不確実性の推定を可能にする不確実性を考慮した深層カーネル学習モデルを提案する。さらに,提案モデルへの影響を検討するために,様々な事前学習手法を適用した。我々は骨年齢予測と病変局所化にアプローチを適用した。ほとんどの場合、提案したモデルは一般的なアーキテクチャよりも優れた性能を示している。さらに重要なことは、我々のモデルはより正確な予測の信頼性を体系的に高く表現し、より正確な予測の信頼性を低くする。私たちのモデルは、挑戦的で議論を呼ぶテストサンプルを検出するためにも使用できます。モンテカルロ・ドロップアウトのような関連する手法と比較して,本手法は不確かさ情報を純粋に解析的に導出し,計算効率が向上する。 Deep neural networks are increasingly being used for the analysis of medical images. However, most works neglect the uncertainty in the model's prediction. We propose an uncertainty-aware deep kernel learning model which permits the estimation of the uncertainty in the prediction by a pipeline of a Convolutional Neural Network and a sparse Gaussian Process. Furthermore, we adapt different pre-training methods to investigate their impacts on the proposed model. We apply our approach to Bone Age Prediction and Lesion Localization. In most cases, the proposed model shows better performance compared to common architectures. More importantly, our model expresses systematically higher confidence in more accurate predictions and less confidence in less accurate ones. Our model can also be used to detect challenging and controversial test samples. Compared to related methods such as Monte-Carlo Dropout, our approach derives the uncertainty information in a purely analytical fashion and is thus computationally more efficient.	翻訳日:2021-06-02 14:33:53 公開日:2021-06-01
# 説明を信用する、または信頼しない:leafを使って局所線形xai法を評価する To trust or not to trust an explanation: using LEAF to evaluate local linear XAI methods ( http://arxiv.org/abs/2106.00461v1 ) ライセンス: Link先を確認	Elvio G. Amparore and Alan Perotti and Paolo Bajardi	(参考訳) eXplainable Artificial Intelligence (XAI)の主な目的は、ブラックボックス分類器の効果的な説明を提供することである。既存の文献では、説明に有用な多くの望ましい特性を挙げているが、実際に説明を定量的に評価する方法については合意が得られていない。さらに、説明は一般にブラックボックスモデルの検査にのみ使用され、意思決定支援としての説明の積極的な使用は一般的に見過ごされる。 XAIへの多くのアプローチの中で広く採用されているパラダイムは、局所線形説明(Local Linear Explanations)である。これらの手法は不安定な説明、約束された理論特性からの実際の実装のばらつき、間違ったラベルの説明など、多くの欠陥に悩まされている。このことは、XAI分野における局所線形説明のための標準および非バイアス評価手順の必要性を強調している。本稿では,局所線形説明の評価のための,明確であいまいなメトリクス集合を特定する問題に対処する。この集合は、この種類の説明のために具体的に定義された既存のメトリクスと新しいメトリクスの両方を含んでいる。すべてのメトリクスは、LEAFという名前のオープンPythonフレームワークに含まれている。 LEAFの目的は、エンドユーザが標準化され、偏見のない方法で説明を評価するためのリファレンスを提供し、研究者が説明可能な技術の改善に導くことである。 The main objective of eXplainable Artificial Intelligence (XAI) is to provide effective explanations for black-box classifiers. The existing literature lists many desirable properties for explanations to be useful, but there is no consensus on how to quantitatively evaluate explanations in practice. Moreover, explanations are typically used only to inspect black-box models, and the proactive use of explanations as a decision support is generally overlooked. Among the many approaches to XAI, a widely adopted paradigm is Local Linear Explanations - with LIME and SHAP emerging as state-of-the-art methods. We show that these methods are plagued by many defects including unstable explanations, divergence of actual implementations from the promised theoretical properties, and explanations for the wrong label. This highlights the need to have standard and unbiased evaluation procedures for Local Linear Explanations in the XAI field. In this paper we address the problem of identifying a clear and unambiguous set of metrics for the evaluation of Local Linear Explanations. 
This set includes both existing and novel metrics defined specifically for this class of explanations. All metrics have been included in an open Python framework, named LEAF. The purpose of LEAF is to provide a reference for end users to evaluate explanations in a standardised and unbiased way, and to guide researchers towards developing improved explainable techniques.	翻訳日:2021-06-02 14:32:39 公開日:2021-06-01
# 対人模倣学習には何が重要か? What Matters for Adversarial Imitation Learning? ( http://arxiv.org/abs/2106.00672v1 ) ライセンス: Link先を確認	Manu Orsini, Anton Raichuk, L\'eonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz	(参考訳) 逆模倣学習は、継続的制御における模倣の一般的なフレームワークとなっている。長年にわたり、学習ポリシーの性能向上とアルゴリズムのサンプル複雑さを高めるために、そのコンポーネントの様々なバリエーションが提案されてきた。実際には、これらの選択が厳密な実証研究で一緒にテストされることは滅多にない。したがって、高レベルのアルゴリズムオプションや低レベルの実装の詳細について、どの選択肢を議論し、理解することは困難である。この問題に取り組むため,我々は50以上の選択肢を汎用的な敵意模倣学習フレームワークに実装し,人工的および人為的に生成した実演を用いた大規模研究(>500k訓練エージェント)においてその影響を調査した。私たちの発見の多くは一般的なプラクティスを裏付けていますが、いくつかは以前の作業に驚きや矛盾すらあります。特に,人工的な実演は人間のデータにとってよい指標ではないこと,および人工的な実演でのみ模倣アルゴリズムを評価するという非常に一般的な実践が,より現実的な実演でうまく機能しないアルゴリズムにつながる可能性があることを示唆する。 Adversarial imitation learning has become a popular framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand what choices, among the high-level algorithmic options as well as low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. While many of our findings confirm common practices, some of them are surprising or even contradict prior work. In particular, our results suggest that artificial demonstrations are not a good proxy for human data and that the very common practice of evaluating imitation algorithms only with synthetic demonstrations may lead to algorithms which perform poorly in the more realistic scenarios with human demonstrations.	翻訳日:2021-06-02 14:32:17 公開日:2021-06-01
# 深層学習モデルにおける局所的妥当性と識別的信頼区間 Locally Valid and Discriminative Confidence Intervals for Deep Learning Models ( http://arxiv.org/abs/2106.00225v1 ) ライセンス: Link先を確認	Zhen Lin, Shubhendu Trivedi, Jimeng Sun	(参考訳) 重要な現実世界の応用のためのディープラーニングモデルの信頼を構築するための重要な課題は、効率的で理論的に不確実な定量化である。有効な不確実性情報は2つの重要な特性を持つことが期待されている: 有効性(保証範囲)と差別性(予想されるリスクが高い場合にさらに不確実性)である。さらに、ディープラーニング(DL)メソッドと組み合わせると、拡張性が高く、DLモデルの性能に最小限の影響が及ぶ。既存のベイズ法の多くは、頻繁なカバレッジ保証がなく、通常はモデル性能に影響する。利用可能な数少ない頻繁主義的手法は、非現実的仮定による範囲保証を差別的かつ/または違反することはほとんどない。さらに、多くの手法は費用がかかるか、ベースとなるニューラルネットワークに大きな修正が必要となる。近年のコンフォメーション予測の進歩とカーネル回帰の古典的考え方の活用に基づいて,ほぼ任意のDLモデルに対して識別信頼区間(CI)を構築するための簡易かつ効率的かつ軽量な手法である局所妥当性・識別信頼区間(LVD)を提案する。データの分散に関する仮定がなければ、そのようなcisは有限サンプルのローカルカバレッジ保証も提供する(より単純な限界カバレッジに対応する)。多様なデータセットを用いて、LVDは局所的に有効な唯一の方法であるだけでなく、既存の不確実性定量化手法のパフォーマンス(カバレッジ率と予測精度を含む)を上回るか、一致しているかを実証的に検証し、スケーラビリティと柔軟性のさらなる利点を提供する。 Crucial for building trust in deep learning models for critical real-world applications is efficient and theoretically sound uncertainty quantification, a task that continues to be challenging. Useful uncertainty information is expected to have two key properties: It should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high). Moreover, when combined with deep learning (DL) methods, it should be scalable and affect the DL model performance minimally. Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance. The few available frequentist methods are rarely discriminative and/or violate coverage guarantees due to unrealistic assumptions. Moreover, many methods are expensive or require substantial modifications to the base neural network. Building upon recent advances in conformal prediction and leveraging the classical idea of kernel regression, we propose Locally Valid and Discriminative confidence intervals (LVD), a simple, efficient and lightweight method to construct discriminative confidence intervals (CIs) for almost any DL model. 
With no assumptions on the data distribution, such CIs also offer finite-sample local coverage guarantees (in contrast to the simpler marginal coverage). Using a diverse set of datasets, we empirically verify that, besides being the only locally valid method, LVD also exceeds or matches the performance (including coverage rate and prediction accuracy) of existing uncertainty quantification methods, while offering additional benefits in scalability and flexibility.	翻訳日:2021-06-02 14:31:09 公開日:2021-06-01
# 分散ロバストエキスパートの合成による系列領域適応 Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts ( http://arxiv.org/abs/2106.00322v1 ) ライセンス: Link先を確認	Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen	(参考訳) 最小二乗推定器は、いくつかの対象領域のサンプルでトレーニングすると、予測が貧弱になる可能性がある。教師付きドメイン適応は、目標分布に近いソース分布からラベル付きトレーニングサンプルを追加することにより、予測精度を向上させることを目的としている。利用可能なデータに基づいて,モーメント条件に関してロバストな最小二乗推定専門家の家族を合成する新しい戦略を検討する。これらのモーメント条件をkullback-leiblerまたはwasserstein型ダイバージェンスを用いて指定すると、凸最適化を用いてロバスト推定器を効率的に見つけることができる。我々は,提案するロバストな専門家群に対するbernstein online aggregationアルゴリズムを用いて,ターゲットテストサンプルの逐次ストリームの予測を行う。実データに対する数値実験は、ロバストな戦略が経験的最小二乗推定器の非ロバスト補間よりも優れていることを示している。 Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are robust with regard to moment conditions. When these moment conditions are specified using Kullback-Leibler or Wasserstein-type divergences, we can find the robust estimators efficiently using convex optimization. We use the Bernstein online aggregation algorithm on the proposed family of robust experts to generate predictions for the sequential stream of target test samples. Numerical experiments on real data show that the robust strategies may outperform non-robust interpolations of the empirical least squares estimators.	翻訳日:2021-06-02 14:30:48 公開日:2021-06-01
# Post-Contextual-Bandit推論 Post-Contextual-Bandit Inference ( http://arxiv.org/abs/2106.00418v1 ) ライセンス: Link先を確認	Aur\'elien Bibaut and Antoine Chambaz and Maria Dimakopoulou and Nathan Kallus and Mark van der Laan	(参考訳) コンテクストバンディットアルゴリズムは、eコマース、ヘルスケア、ポリシーメーキングにおける非適応的なa/bテストを置き換えるようになってきている。研究の終盤における新規介入の信頼性推論を支援するため, 平均治療効果, サブグループ効果, あるいは新政策の価値について, 有効な信頼区間を構築したい。しかし、文脈的帯域幅アルゴリズムによって収集されたデータの適応性は、これを難しくする: 標準推定器は、もはや漸近的に分布せず、古典的な信頼区間は、正しいカバレッジを提供することができない。これは、安定化推定器を用いて、非コンテキスト設定で対処されているが、この文脈設定は、我々が初めて取り組んだユニークな課題である。本研究では,文脈適応型データ収集において漸近的に正常なポリシー値に対する最初の推定器であるCADR(Contextual Adaptive Doubly Robust)推定器を提案する。 CADRの構築における主な技術的課題は、安定化のための適応的で一貫した条件付き標準偏差推定器を設計することである。 57のOpenMLデータセットを用いた大規模な数値実験により、CADRに基づく信頼区間が一意に正しいカバレッジを提供することが示された。 Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. To support credible inference on novel interventions at the end of the study, nonetheless, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies. The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage. While this has been addressed in non-contextual settings by using stabilized estimators, the contextual setting poses unique challenges that we tackle for the first time in this paper. We propose the Contextual Adaptive Doubly Robust (CADR) estimator, the first estimator for policy value that is asymptotically normal under contextual adaptive data collection. The main technical challenge in constructing CADR is designing adaptive and consistent conditional standard deviation estimators for stabilization. 
Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.	翻訳日:2021-06-02 14:30:32 公開日:2021-06-01
# 有限ベイズニューラルネットワークにおける表現学習の漸近性 Asymptotics of representation learning in finite Bayesian neural networks ( http://arxiv.org/abs/2106.00651v1 ) ライセンス: Link先を確認	Jacob A. Zavatone-Veth and Abdulkadir Canatar and Cengiz Pehlevan	(参考訳) 近年の研究では、有限ベイズニューラルネットワークは、有限ネットワークが内部表現を柔軟に適応できるため、無限の従兄弟より優れていることが示唆されている。しかし、有限ネットワークの学習された隠れ層表現が無限ネットワークの固定表現とどのように異なるかに関する理論的理解は未完のままである。ネットワーク前後の摂動的有限幅補正について検討するが, 学習特徴の漸近性は十分に評価されていない。ここで、線形読み出しと二次コストを持つ任意のベイズネットワークの平均的特徴核に対する主有限幅補正は、概ね普遍的な形式であると主張する。完全連結ネットワークの2つのクラス – 深い線形ネットワークと単一の非線形隠蔽層を持つネットワーク – に対して,これを明示的に説明する。この結果から,データワイドベイズ型ニューラルネットワークの表現学習における特徴を解明する。 Recent works have suggested that finite Bayesian neural networks may outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and quadratic cost have a largely universal form. We illustrate this explicitly for two classes of fully connected networks: deep linear networks and networks with a single nonlinear hidden layer. Our results begin to elucidate which features of data wide Bayesian neural networks learn to represent.	翻訳日:2021-06-02 14:30:13 公開日:2021-06-01
# ヒートマップ回帰と深い畳み込みオドメトリを用いたマルコフ局所化 Markov Localisation using Heatmap Regression and Deep Convolutional Odometry ( http://arxiv.org/abs/2106.00371v1 ) ライセンス: Link先を確認	Oscar Mendez, Simon Hadfield, Richard Bowden	(参考訳) 自動運転車の文脈では、視覚的ローカライゼーションに基づくアプローチとLiDARとの強い競争がある。 LiDARは重要な深度情報を提供するが、解像度が低く高価である。一方、カメラは低コストであり、ディープラーニングの最近の進歩は、高いローカライズ性能を提供できることを意味する。しかし、特に不確実性領域において、学習に基づくアプローチが自信過剰で悪名高い、いくつかの根本的な問題が残っている。マルコフ、あるいはグリッドベースのローカライズは、ローカライズ問題の初期の解決策であったが、計算の複雑さのために好ましくなかった。確率場をグリッド(またはボリューム)として表現することは、精度とメモリサイズの間にトレードオフがあることを意味する。さらに,全容積全体にわたって高価な畳み込みを行う必要がある。全ての可能な位置を同時に維持する利点にもかかわらず、グリッドベースのアプローチはより効率的な粒子フィルタとモンテカルロ局在(MCL)に取って代わられた。しかし、MCLは独自の問題を導入している。粒子除去近年のディープラーニングハードウェアの進歩により、GPUに格納される大きな可能性ボリュームと、GPUによる3D畳み込みを効率的に実行するために必要なハードウェアが実現し、グリッドベースの手法の欠点の多くを排除している。本研究では,最新のディープラーニングハードウェアを活用する新しいCNNベースのローカライゼーション手法を提案する。グリッドベースのマルコフローカライズアプローチをgpu上で直接実装することにより、単一のニューラルネットワーク内でイメージベースのローカライズとオドメトリーに基づくラピッド伝搬を実行できるハイブリッドcnnを作成する。結果として得られたアプローチは、最先端のローカライズシステムと同様に、直接ポーズ回帰法を上回ることができる。 In the context of self-driving vehicles there is strong competition between approaches based on visual localisation and LiDAR. While LiDAR provides important depth information, it is sparse in resolution and expensive. On the other hand, cameras are low-cost and recent developments in deep learning mean they can provide high localisation performance. However, several fundamental problems remain, particularly in the domain of uncertainty, where learning based approaches can be notoriously over-confident. Markov, or grid-based, localisation was an early solution to the localisation problem but fell out of favour due to its computational complexity. Representing the likelihood field as a grid (or volume) means there is a trade off between accuracy and memory size. Furthermore, it is necessary to perform expensive convolutions across the entire likelihood volume. 
Despite the benefit of simultaneously maintaining a likelihood for all possible locations, grid-based approaches were superseded by more efficient particle filters and Monte Carlo Localisation (MCL). However, MCL introduces its own problems, e.g. particle deprivation. Recent advances in deep learning hardware allow large likelihood volumes to be stored directly on the GPU, along with the hardware necessary to efficiently perform GPU-bound 3D convolutions, and this obviates many of the disadvantages of grid-based methods. In this work, we present a novel CNN-based localisation approach that can leverage modern deep learning hardware. By implementing a grid-based Markov localisation approach directly on the GPU, we create a hybrid CNN that can perform image-based localisation and odometry-based likelihood propagation within a single neural network. The resulting approach is capable of outperforming direct pose regression methods as well as state-of-the-art localisation systems.	翻訳日:2021-06-02 14:29:59 公開日:2021-06-01
# COV-ECGNET:深部畳み込みニューラルネットワークを用いたECGトレース画像を用いたCOVID-19検出 COV-ECGNET: COVID-19 detection using ECG trace images with deep convolutional neural network ( http://arxiv.org/abs/2106.00436v1 ) ライセンス: Link先を確認	Tawsifur Rahman, Alex Akinbi, Muhammad E. H. Chowdhury, Tarik A. Rashid, Abdulkadir Şengür, Amith Khandakar, Khandaker Reajul Islam, Aras M. Ismael	(参考訳) 新型コロナウイルスの感染拡大を防ぎ、ロックダウンの規制を緩和し、公衆衛生インフラへの圧力を減らすため、信頼性と迅速な識別が重要になっている。近年,SARS-CoV-2ウイルスを画像やデータを用いて検出する手法や手法が提案されている。しかし、これは心電図(ECG)トレース画像からCOVID-19を検出するために深部畳み込みニューラルネットワーク(CNN)モデルを使用することの可能性を探る最初の研究である。本研究は、深層学習技術を用いて、COVID-19および他の心血管疾患(CVD)を検出した。本研究では, 正常, COVID-19, 心筋梗塞 (MI), 異常心拍 (AHB) , 回復心筋梗塞 (RMI) の5つのカテゴリからなる1937枚のECG画像の公開データセットを使用した。 6種類の深層CNNモデル (ResNet18, ResNet50, ResNet101, InceptionV3, DenseNet201, MobileNetv2) を用いて2クラス分類 (Normal vs COVID-19), 3クラス分類 (Normal, COVID-19, CVDs), そして5クラス分類 (Normal, COVID-19, MI, AHB, RMI) について検討した。 2クラス分類と3クラス分類では、DenseNet201がそれぞれ99.1%、97.36%の精度で他のネットワークを上回り、5クラス分類ではInceptionV3が97.83%の精度で他のネットワークを上回っている。 ScoreCAM視覚化は、ネットワークがトレース画像の関連領域から学習していることを確認する。提案手法は, スマートフォンで撮影可能なECGトレース画像を用いて, 低リソース国で容易に利用できる施設であるため, コンピュータ支援による新型コロナウイルスなどの心疾患の早期診断に有効である。 The reliable and rapid identification of COVID-19 has become crucial to prevent the rapid spread of the disease, ease lockdown restrictions and reduce pressure on public health infrastructures. Recently, several methods and techniques have been proposed to detect the SARS-CoV-2 virus using different images and data. However, this is the first study that will explore the possibility of using deep convolutional neural network (CNN) models to detect COVID-19 from electrocardiogram (ECG) trace images. In this work, COVID-19 and other cardiovascular diseases (CVDs) were detected using deep-learning techniques. A public dataset of ECG images, consisting of 1937 images from five distinct categories (Normal, COVID-19, myocardial infarction (MI), abnormal heartbeat (AHB), and recovered myocardial infarction (RMI)), was used in this study. 
Six different deep CNN models (ResNet18, ResNet50, ResNet101, InceptionV3, DenseNet201, and MobileNetv2) were used to investigate three different classification schemes: two-class classification (Normal vs COVID-19), three-class classification (Normal, COVID-19, and other CVDs), and finally, five-class classification (Normal, COVID-19, MI, AHB, and RMI). For two-class and three-class classification, DenseNet201 outperforms the other networks with accuracies of 99.1% and 97.36%, respectively, while for the five-class classification, InceptionV3 outperforms the others with an accuracy of 97.83%. ScoreCAM visualization confirms that the networks are learning from the relevant area of the trace images. Since the proposed method uses ECG trace images, which can be captured by smartphones and are readily available in low-resource countries, this study will help in faster computer-aided diagnosis of COVID-19 and other cardiac abnormalities.	翻訳日:2021-06-02 14:29:29 公開日:2021-06-01
# ディープニューラルネットワークにおける従来検出不能な障害の露呈 Exposing Previously Undetectable Faults in Deep Neural Networks ( http://arxiv.org/abs/2106.00576v1 ) ライセンス: Link先を確認	Isaac Dunn, Hadrien Pouget, Daniel Kroening and Tom Melham	(参考訳) DNNをテストするための既存の手法は、生の特徴(例えば画像のピクセル値)を、所望のDNN出力が既知であるデータセット例から小さな距離以内に制約することで、オラクル問題を解決している。しかしこれは、これらのアプローチが検出できる障害の種類を制限する。本稿では,他の手法では不可能なDNNの欠陥を見つけることができる新しいDNNテスト手法を提案する。要点は、生成的機械学習を利用することで、高レベルな特徴(画像の場合、オブジェクトの形、位置、テクスチャ、色など)が異なる新しいテスト入力を生成できる、ということである。我々は,本手法が故意に注入された障害や最新のDNNの新しい障害を検知できることを示すとともに,既存の手法ではこれらの障害を見つけることができないことを実証する。 Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods cannot. The crux is that, by leveraging generative machine learning, we can generate fresh test inputs that vary in their high-level features (for images, these include object shape, location, texture, and colour). We demonstrate that our approach is capable of detecting deliberately injected faults as well as new faults in state-of-the-art DNNs, and that in both cases, existing methods are unable to find these faults.	翻訳日:2021-06-02 14:28:51 公開日:2021-06-01
# ここで何ができるか? 視覚能力を利用した新しいスキルの学習 What Can I Do Here? Learning New Skills by Imagining Visual Affordances ( http://arxiv.org/abs/2106.00671v1 ) ライセンス: Link先を確認	Alexander Khazatsky, Ashvin Nair, Daniel Jing, Sergey Levine	(参考訳) 学習スキルを備えた汎用ロボットは、多くの異なる環境で多くのタスクを実行できなければならない。しかし、新しい設定へのゼロショットの一般化が常に可能であるとは限らない。ロボットが新しい環境や物体に遭遇したとき、この変化に対応するために、以前に学んだスキルを微調整する必要があるかもしれない。しかし、重要なことは、これまで学んだ行動やモデルは、この再学習を加速するのに相応しいはずだ。本稿では,可能成果の生成モデルを用いて,ロボットが手頃価格の視覚的表現を学習し,新たな状況において潜在的成果をサンプリングし,さらにその成果を達成するためのポリシーを訓練することを目的とした。実際に、事前データは、ロボットが不慣れな設定に遭遇すると、そのモデルから潜在的な成果をサンプリングし、それらに到達し、そのスキルと結果モデルの両方を更新することができるように、どのような結果が可能かを学習するために使用される。本手法は, VAL (visuomotor affordance Learning) を用いて, 原画像入力で動作する目標条件付きポリシーを訓練し, 提案手法を用いて, 新たなオブジェクトの操作を迅速に学習することができる。我々は,VALが先行データを利用すれば,新たなシーンで5分間のオンライン体験しか持たずに,引き出しのオープニングや把握,オブジェクトの配置といった現実的なタスクを解決できることを示す。 A generalist robot equipped with learned skills must be able to perform many tasks in many different environments. However, zero-shot generalization to new settings is not always possible. When the robot encounters a new environment or object, it may need to finetune some of its previously learned skills to accommodate this change. But crucially, previously learned behaviors and models should still be suitable to accelerate this relearning. In this paper, we aim to study how generative models of possible outcomes can allow a robot to learn visual representations of affordances, so that the robot can sample potentially possible outcomes in new situations, and then further train its policy to achieve those outcomes. In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model, attempt to reach them, and thereby update both its skills and its outcome model. 
This approach, visuomotor affordance learning (VAL), can be used to train goal-conditioned policies that operate on raw image inputs, and can rapidly learn to manipulate new objects via our proposed affordance-directed exploration scheme. We show that VAL can utilize prior data to solve real-world tasks such as drawer opening, grasping, and placing objects in new scenes with only five minutes of online experience in the new scene.	翻訳日:2021-06-02 14:28:38 公開日:2021-06-01
# HERALD:ソーシャル・会話におけるユーザ・ディエンジメントを効果的に検出するアノテーション手法 HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations ( http://arxiv.org/abs/2106.00162v1 ) ライセンス: Link先を確認	Weixin Liang, Kai-Hui Liang, Zhou Yu	(参考訳) オープンドメインダイアログシステムには、人間に魅力的な会話体験を提供することという、ユーザ中心の目標がある。ユーザエンゲージメントはオープンドメインダイアログシステムを評価する上で最も重要な指標の1つであり、ダイアログポリシー学習のためにリアルタイムフィードバックとしても使用できる。ユーザの離脱を検出する既存の作業は、通常、多くのダイアログのサンプルを手作業でラベル付けする必要がある。本稿では,学習データアノテーションプロセスをノイズ除去問題として捉え直す、アノテーション効率のよいフレームワークであるHERALDを提案する。具体的には、手作業によるトレーニングサンプルのラベル付けではなく、トレーニングサンプルの自動ラベル付けヒューリスティックのセットを使っています。次に、Shapleyアルゴリズムを用いて弱いラベル付きデータのノイズを除去する。最後に、ユーザエンゲージメント検出器をトレーニングするために、ノイズ除去したデータを使用します。実験の結果,HERALDはアノテーションの効率を大幅に向上し,2つのダイアログコーパスにおいて86%のユーザ離脱検出精度を達成した。 Open-domain dialog systems have a user-centric goal: to provide humans with an engaging conversation experience. User engagement is one of the most important metrics for evaluating open-domain dialog systems, and could also be used as real-time feedback to benefit dialog policy learning. Existing work on detecting user disengagement typically requires hand-labeling many dialog samples. We propose HERALD, an annotation efficient framework that reframes the training data annotation process as a denoising problem. Specifically, instead of manually labeling training samples, we first use a set of labeling heuristics to automatically label training samples. We then denoise the weakly labeled data using the Shapley algorithm. Finally, we use the denoised data to train a user engagement detector. Our experiments show that HERALD improves annotation efficiency significantly and achieves 86% user disengagement detection accuracy in two dialog corpora.	翻訳日:2021-06-02 14:27:20 公開日:2021-06-01
# ニューラルマシン翻訳における速度品質最適化時のジェンダーバイアス増幅 Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation ( http://arxiv.org/abs/2106.00169v1 ) ライセンス: Link先を確認	Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab	(参考訳) ニューラルネットワーク翻訳(NMT)モデルが速度に最適化され、BLEUを用いたジェネリックテストセットで評価された場合、バイアスは増幅されるか? 本稿では,ゲーディ検索,量子化,平均アテンションネットワーク(AAN)や浅層デコーダモデルなどのトランスフォーマーモデルにおいて,デコーディングの高速化によく用いられるアーキテクチャや手法について検討し,その効果を示す。本研究は, 男女差テストセットであるSimpleGENを構築し, 性別付き名詞句を1つ, 曖昧で, 正解が1つ存在する。速度最適化を適用するとBLEU全体の劣化は最小限に抑えられるが、性別付き名詞翻訳性能ははるかに高速に低下する。 Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as greedy search, quantization, average attention networks (AANs) and shallow decoder models and show their effect on gendered noun translation. We construct a new gender bias test set, SimpleGEN, based on gendered noun phrases in which there is a single, unambiguous, correct answer. While we find minimal overall BLEU degradation as we apply speed optimizations, we observe that gendered noun translation performance degrades at a much faster rate.	翻訳日:2021-06-02 14:27:03 公開日:2021-06-01
# ジェンダーバイアスを隠した中国語の単語埋め込み:中国語の形容詞を例に Gender Bias Hidden Behind Chinese Word Embeddings: The Case of Chinese Adjectives ( http://arxiv.org/abs/2106.00181v1 ) ライセンス: Link先を確認	Meichun Jiao, Ziyang Luo	(参考訳) 単語埋め込みにおけるジェンダーバイアスは、近年徐々に活発な研究分野になりつつある。この分野のほとんどの研究は、対象言語として英語を用いた測定と偏差法を目標としている。本研究は,中国語形容詞における静的単語埋め込みにおける性別バイアスについて考察する。異なるモデルで単語表現を訓練することにより、形容詞のベクトルの背後にある性別バイアスを評価する。生成した結果と人手で採点されたデータセットを比較することで,単語埋め込みに符号化された性別バイアスが人々の態度とどのように異なるかを示す。 Gender bias in word embeddings has gradually become an active research field in recent years. Most studies in this field aim at measurement and debiasing methods with English as the target language. This paper investigates gender bias in static word embeddings from a unique perspective, Chinese adjectives. By training word representations with different models, the gender bias behind the vectors of adjectives is assessed. Through a comparison between the produced results and a human-scored data set, we demonstrate how gender bias encoded in word embeddings differs from people's attitudes.	翻訳日:2021-06-02 14:26:49 公開日:2021-06-01
# 文脈対応ルール注入による形式的スタイル伝達の改善 Improving Formality Style Transfer with Context-Aware Rule Injection ( http://arxiv.org/abs/2106.00210v1 ) ライセンス: Link先を確認	Zonghai Yao and Hong Yu	(参考訳) 大規模正規テキストコーパスで事前学習されたモデルは、主流テキストと言語スタイルが大きく異なるユーザー生成データではうまく機能しないことが多い。ここでは、形式的スタイル転送(FST)の革新的な方法である文脈認識ルール注入(CARI)について述べる。 CARIは、エンドツーエンドのBERTベースのエンコーダとデコーダモデルに複数のルールを注入する。コンテキストに基づいて最適なルールを選択することを学ぶ。内在的評価により,CARIはFSTベンチマークデータセット上での新たな最高性能を達成した。本研究では,複数のツイート感情分析タスクにおいて,CARIが通常の事前学習モデルの性能を大幅に向上できることを示す。 Models pre-trained on large-scale regular text corpora often do not work well for user-generated data where the language styles differ significantly from the mainstream text. Here we present Context-Aware Rule Injection (CARI), an innovative method for formality style transfer (FST). CARI injects multiple rules into an end-to-end BERT-based encoder and decoder model. It learns to select optimal rules based on context. The intrinsic evaluation showed that CARI achieved the new highest performance on the FST benchmark dataset. Our extrinsic evaluation showed that CARI can greatly improve the regular pre-trained models' performance on several tweet sentiment analysis tasks.	翻訳日:2021-06-02 14:26:41 公開日:2021-06-01
# 消費者健康質問要約のための質問認識トランスフォーマーモデル Question-aware Transformer Models for Consumer Health Question Summarization ( http://arxiv.org/abs/2106.00219v1 ) ライセンス: Link先を確認	Shweta Yadav, Deepak Gupta, Asma Ben Abacha and Dina Demner-Fushman	(参考訳) オンラインの健康情報検索は、日々ますます多くの消費者にとって慣例となっているため、効率的で信頼性の高い質問応答システムの必要性が高まっている。これらのシステムの成功率に重要な貢献は、消費者の質問を完全に理解できる能力である。しかし、これらの質問はしばしば必要以上に長く、適切な回答を見つけるのに役に立たない周辺情報に言及する。質問の要約は、答えを見つける前に、長く複雑な消費者の質問を単純化する潜在的な解決策の1つである。本稿では,現実の消費者の健康に関する質問に対する抽象的な要約の課題について考察する。医療エンティティの認識を通じて質問の意味的解釈を活用し,情報的要約の生成を可能にする抽象的質問要約モデルを開発した。そこで我々は,重要な医療エンティティを特定するための複数のClozeタスク(すなわち,与えられた文脈で欠落した単語を埋めるタスク)を提案し,質問の焦点の認識においてモデルがより良いカバレッジを持つようにする。さらに,デコーダの入力に質問型情報を加え,質問型要約を生成する。 MeQSumベンチマークコーパスで評価すると、我々のフレームワークは最先端の手法をROUGE-Lで10.2ポイント上回りました。また,生成した要約の正確性を評価するために手動による評価を行った。 Searching for health information online is becoming customary for more and more consumers every day, which makes the need for efficient and reliable question answering systems more pressing. An important contributor to the success rates of these systems is their ability to fully understand the consumers' questions. However, these questions are frequently longer than needed and mention peripheral information that is not useful in finding relevant answers. Question summarization is one of the potential solutions to simplifying long and complex consumer questions before attempting to find an answer. In this paper, we study the task of abstractive summarization for real-world consumer health questions. We develop an abstractive question summarization model that leverages the semantic interpretation of a question via recognition of medical entities, which enables the generation of informative summaries. Towards this, we propose multiple Cloze tasks (i.e. the task of filling missing words in a given context) to identify the key medical entities that enforce the model to have better coverage in question-focus recognition. 
Additionally, we infuse the decoder inputs with question-type information to generate question-type driven summaries. When evaluated on the MeQSum benchmark corpus, our framework outperformed the state-of-the-art method by 10.2 ROUGE-L points. We also conducted a manual evaluation to assess the correctness of the generated summaries.	翻訳日:2021-06-02 14:26:30 公開日:2021-06-01
# 多単語表現機能によるヘイトスピーチ自動検出の改善 Improving Automatic Hate Speech Detection with Multiword Expression Features ( http://arxiv.org/abs/2106.00237v1 ) ライセンス: Link先を確認	Nicolas Zampieri, Irina Illina and Dominique Fohr	(参考訳) ソーシャルメディアでヘイトスピーチを自動的に検出する作業は、ますます注目を集めている。毎日投稿される大量のコンテンツを考えると、ヘイトスピーチの人間の監視は不可能だ。本研究では,ヘイトスピーチ自動検出(hsd: multiword expressions, mwes)のための新しい単語レベル機能を提案する。 mwes は慣用的意味と構成的意味を持つ単語よりも大きい語彙単位である。我々は、深層ニューラルネットワークベースのHSDフレームワークにMWE機能を統合することを提案する。我々のベースライン HSD システムは Universal Sentence Encoder (USE) に依存している。 MWE機能を組み込むために、3分岐のディープニューラルネットワーク(USE用の1つのブランチ、MWEカテゴリ用の1つ、MWE埋め込みのための1つ)を作成します。我々は、異なるMWEカテゴリと2種類のMWE埋め込み、 word2vec と BERT を用いた2種類のヘイトスピーチツイートコーパスの実験を行った。実験の結果,MWE特徴を持つHSDシステムはマクロF1の点でベースラインシステムよりも有意に優れていた。 The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units greater than a word that have idiomatic and compositional meanings. We propose to integrate MWE features in a deep neural network-based HSD framework. Our baseline HSD system relies on Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.	翻訳日:2021-06-02 14:26:14 公開日:2021-06-01
# SemEval-2021タスク9におけるVolta: TAPASと転移学習を用いた表による文検証とエビデンス発見 Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning ( http://arxiv.org/abs/2106.00248v1 ) ライセンス: Link先を確認	Devansh Gautam, Kshitij Gupta, Manish Shrivastava	(参考訳) 表は、情報を簡潔に提示するために、様々な種類の文書で広く使われている。表を理解することは、言語と表の構造、数値的および論理的推論を理解することを必要とする難しい問題である。本稿では,SemEval-2021: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACTS) のタスク9を解くシステムを提案する。タスクは2つのサブタスクで構成される: (A) テーブルとステートメントが与えられたとき、そのテーブルがステートメントをサポートするかどうかの予測、および (B) テーブル内のどのセルがそのステートメントを支持または否定する証拠を提供するかの予測。TAPAS(BERTのアーキテクチャを拡張して表構造をキャプチャするモデル)は様々なテーブル理解タスクにおいて最先端性能を示しているため,我々は両サブタスクに対してTAPASを微調整する。サブタスクAでは,転移学習と,1つのヘッダ行を持つようにテーブルを標準化することが,TAPASの性能をどのように向上させるかを評価する。サブタスクBでは,異なる微調整戦略がTAPASの性能をどのように向上させるかを評価する。我々のシステムは,サブタスクAの3値分類でF1スコア67.34,サブタスクAの2値分類で72.89,サブタスクBで62.95を達成した。 Tables are widely used in various kinds of documents to present information concisely. Understanding tables is a challenging problem that requires an understanding of language and table structure, along with numerical and logical reasoning. In this paper, we present our systems to solve Task 9 of SemEval-2021: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACTS). The task consists of two subtasks: (A) Given a table and a statement, predicting whether the table supports the statement and (B) Predicting which cells in the table provide evidence for/against the statement. We fine-tune TAPAS (a model which extends BERT's architecture to capture tabular structure) for both the subtasks as it has shown state-of-the-art performance in various table understanding tasks. In subtask A, we evaluate how transfer learning and standardizing tables to have a single header row improves TAPAS' performance. In subtask B, we evaluate how different fine-tuning strategies can improve TAPAS' performance. 
Our systems achieve an F1 score of 67.34 in subtask A three-way classification, 72.89 in subtask A two-way classification, and 62.95 in subtask B.	翻訳日:2021-06-02 14:26:00 公開日:2021-06-01
# LenAtten: テキスト要約に有効な長さ制御ユニット LenAtten: An Effective Length Controlling Unit For Text Summarization ( http://arxiv.org/abs/2106.00316v1 ) ライセンス: Link先を確認	Zhongyi Yu, Zhenghao Wu, Hao Zheng, Zhe XuanYuan, Jefferson Fong, Weifeng Su	(参考訳) 固定長要約は、単語や文字のプリセット数で要約を生成することを目的としている。近年の研究の多くは、長さ情報を単語埋め込みとともに再帰的復号ユニットへの入力として取り込んでおり、長さ制御性と要約品質の妥協を引き起こしている。本稿では,このトレードオフを解消するために,有効な長さ制御ユニットであるLength Attention (LenAtten) を提案する。実験結果から,LenAttenは長さ制御性とROUGEスコアの改善をもたらすだけでなく,高い一般化能力を有することが示された。 CNN/Daily Mailデータセットにおいて,目標長の要約を生成するタスクにおいて,我々のモデルは,最も性能の高い長さ制御可能な要約器よりも長さ制御性が732倍優れている。 Fixed length summarization aims at generating summaries with a preset number of words or characters. Most recent studies incorporate length information with word embeddings as the input to the recurrent decoding unit, causing a compromise between length controllability and summary quality. In this work, we present an effective length controlling unit, Length Attention (LenAtten), to break this trade-off. Experimental results show that LenAtten not only brings improvements in length controllability and ROUGE scores but also has great generalization ability. In the task of generating a summary with the target length, our model is 732 times better than the best-performing length controllable summarizer in length controllability on the CNN/Daily Mail dataset.	翻訳日:2021-06-02 14:25:39 公開日:2021-06-01
# 合理化のための分布マッチング Distribution Matching for Rationalization ( http://arxiv.org/abs/2106.00320v1 ) ライセンス: Link先を確認	Yongfeng Huang, Yujun Chen, Yulun Du, Zhilin Yang	(参考訳) 合理化の課題は、入力テキストの一部を論理として抽出し、テキスト分類タスクにおけるニューラルネットワーク予測を正当化することである。定義上、合理化は予測に使われるキーテキストを表現し、したがって元の入力テキストと類似した分類特徴分布を持つべきである。しかし,従来の手法は,有理とラベル間の相互情報の最大化を主眼とし,有理と入力テキストの関係を無視するものであった。そこで本研究では,特徴空間と出力空間の両方における有理と入力テキストの分布に一致する新しい合理化手法を提案する。実験的に、提案した分布マッチングアプローチは、従来手法を大きなマージンで一貫して上回っている。データとコードは利用可能です。 The task of rationalization aims to extract pieces of input text as rationales to justify neural network predictions on text classification tasks. By definition, rationales represent key text pieces used for prediction and thus should have similar classification feature distribution compared to the original input text. However, previous methods mainly focused on maximizing the mutual information between rationales and labels while neglecting the relationship between rationales and input text. To address this issue, we propose a novel rationalization method that matches the distributions of rationales and input text in both the feature space and output space. Empirically, the proposed distribution matching approach consistently outperforms previous methods by a large margin. Our data and code are available.	翻訳日:2021-06-02 14:25:25 公開日:2021-06-01
# 中国語単語の内部構造に関する詳細な研究 An In-depth Study on Internal Structure of Chinese Words ( http://arxiv.org/abs/2106.00334v1 ) ライセンス: Link先を確認	Chen Gong, Saihao Huang, Houquan Zhou, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan	(参考訳) 英語の文字とは異なり、漢字は豊かで特定の意味を持つ。通常、単語の意味は何らかの形でその構成文字から派生することができる。構文解析に関するいくつかの以前の研究は、文字レベルの情報を活用するために浅い単語内部構造を注釈付けすることを提案した。本研究は,中国語単語の深い内部構造を,構文的関係を識別するための11のラベルを持つ依存木としてモデル化することを提案する。まず,新たにコンパイルされたアノテーションガイドラインに基づいて,中国ペンツリーバンクの30万語以上の多字語からなる単語内部構造木バンク(WIST)を手作業で注釈する。品質を保証するため、各単語は独立して2つの注釈により注釈され、不整合は第3上級注釈者によって処理される。第2に,中国語の単語形成に関する知見を明らかにするために,WISTに関する詳細な,興味深い分析を行った。第3に,新しいタスクとして単語内構造解析を提案し,競合依存構文解析器を用いてベンチマーク実験を行う。最後に,単語内部構造を符号化する2つの簡単な方法を提案する。 Unlike English letters, Chinese characters have rich and specific meanings. Usually, the meaning of a word can be derived from its constituent characters in some way. Several previous works on syntactic parsing propose to annotate shallow word-internal structures for better utilizing character-level information. This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships. First, based on newly compiled annotation guidelines, we manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from Chinese Penn Treebank. To guarantee quality, each word is independently annotated by two annotators and inconsistencies are handled by a third senior annotator. Second, we present detailed and interesting analysis on WIST to reveal insights on Chinese word formation. Third, we propose word-internal structure parsing as a new task, and conduct benchmark experiments using a competitive dependency parser. Finally, we present two simple ways to encode word-internal structures, leading to promising gains on the sentence-level syntactic parsing task.	翻訳日:2021-06-02 14:25:14 公開日:2021-06-01
# 対話指向事前学習 Dialogue-oriented Pre-training ( http://arxiv.org/abs/2106.00420v1 ) ライセンス: Link先を確認	Yi Xu, Hai Zhao	(参考訳) 事前訓練された言語モデル(PrLM)は、様々な対話に関連したタスクを含む幅広い下流タスクの強化に有効であることが示されている。しかし、PrLMは通常、一般的な言語モデル(LM)の訓練目的を用いて一般的なプレーンテキストで訓練されるため、そのようなトレーニング設定の制約により対話固有の特徴を十分に捉えられず、特定の対話タスクとLMタスクのギャップを埋めることが喫緊の課題となっている。対話指向事前学習のための膨大な対話データを収集することは難しいため,本稿では,一般的なプレーンテキスト上で対話の特徴をシミュレートする3つの手法を提案する。提案手法は, 話者認識, 連続性, 一貫性などの対話関連特徴を学習する能力を保ちながら, 特定のタスクに特化しない汎用のPrLMを生成しうる点で, 既存のポストトレーニング手法と異なる。得られたDialog-PrLMは3つの公開マルチターン対話データセットで微調整され、通常のPrLMに対して有意かつ一貫した改善を実現する。 Pre-trained language models (PrLMs) have been shown to be powerful in enhancing a broad range of downstream tasks, including various dialogue-related ones. However, PrLMs are usually trained on general plain text with common language model (LM) training objectives, which cannot sufficiently capture dialogue-exclusive features due to the limitations of such a training setting, so there is an immediate need to fill the gap between a specific dialogue task and the LM task. As it is unlikely that huge amounts of dialogue data can be collected for dialogue-oriented pre-training, in this paper we propose three strategies to simulate conversation features on general plain text. Our proposed method differs from existing post-training methods in that it may yield a general-purpose PrLM that is not tailored to any specific task, while keeping the capability of learning dialogue-related features including speaker awareness, continuity and consistency. The resulting Dialog-PrLM is fine-tuned on three public multi-turn dialogue datasets and helps achieve significant and consistent improvement over the plain PrLMs.	翻訳日:2021-06-02 14:24:56 公開日:2021-06-01
# SemEval-2021 Task 1: Lexical Complexity Prediction SemEval-2021 Task 1: Lexical Complexity Prediction ( http://arxiv.org/abs/2106.00473v1 ) ライセンス: Link先を確認	Matthew Shardlow, Richard Evans, Gustavo Henrique Paetzold, Marcos Zampieri	(参考訳) 本稿では,SemEval-2021 Task 1 - Lexical Complexity Predictionの結果と主な知見を示す。参加者にCompLex Corpus(Shardlow et al 2020)の拡張版を提供した。CompLexは英語のマルチドメインコーパスで、単語と複数語表現(MWE)が5段階のリッカート尺度を用いてその複雑さについて注釈付けされている。SemEval-2021 Task 1には2つのサブタスクがあり、サブタスク1は単語を、サブタスク2はMWEを対象とした。このコンペには合計198チームが参加し、うち54チームがテストデータの公式実行をサブタスク1に、37チームがサブタスク2に提出した。 This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five-point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 teams submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.	翻訳日:2021-06-02 14:24:37 公開日:2021-06-01
# DoT:テーブル付きNLPタスクのための効率的なダブルトランスフォーマー DoT: An efficient Double Transformer for NLP tasks with tables ( http://arxiv.org/abs/2106.00479v1 ) ライセンス: Link先を確認	Syrine Krichene, Thomas Müller and Julian Martin Eisenschlos	(参考訳) 半構造化テーブルを用いた自然言語処理(NLP)タスクにおける最先端の精度を得るためにトランスフォーマーベースのアプローチが成功している。これらのモデルアーキテクチャは一般的に深く、特に長い入力に対してトレーニングや推論が遅くなる。高い精度を維持しつつ効率を向上させるために、問題を2つのサブタスクに分解する新しいアーキテクチャ、DoT(double transformer model)を提案する。すなわち、上位K個のトークンを選択する浅いプルーニングトランスフォーマーと、そのK個のトークンを入力とする深いタスク固有トランスフォーマーである。さらに,タスク固有の注意機構を変更し,プルーニングスコアを組み込む。 2つのトランスフォーマーはタスク固有の損失を最適化することで共同で訓練される。含意(entailment)と質問応答を含む3つのベンチマークで実験を行う。わずかな精度低下と引き換えに、DoTはトレーニング時間と推論時間を少なくとも50%改善することを示す。また,プルーニングトランスフォーマーは関連するトークンを効果的に選択し,エンドツーエンドモデルが低速なベースラインモデルと同等の精度を維持できることを示す。最後に、プルーニングを分析し、そのタスクモデルへの影響について見識を与える。 Transformer-based approaches have been successfully used to obtain state-of-the-art accuracy on natural language processing (NLP) tasks with semi-structured tables. These model architectures are typically deep, resulting in slow training and inference, especially for long inputs. To improve efficiency while maintaining high accuracy, we propose a new architecture, DoT, a double transformer model, that decomposes the problem into two sub-tasks: a shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes as input those K tokens. Additionally, we modify the task-specific attention to incorporate the pruning scores. The two transformers are jointly trained by optimizing the task-specific loss. We run experiments on three benchmarks, including entailment and question-answering. We show that for a small drop in accuracy, DoT improves training and inference time by at least 50%. We also show that the pruning transformer effectively selects relevant tokens, enabling the end-to-end model to maintain similar accuracy as slower baseline models. Finally, we analyse the pruning and give some insight into its impact on the task model.	翻訳日:2021-06-02 14:24:27 公開日:2021-06-01
# 定量的対話コヒーレンス評価に向けて Towards Quantifiable Dialogue Coherence Evaluation ( http://arxiv.org/abs/2106.00507v1 ) ライセンス: Link先を確認	Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang	(参考訳) 自動対話コヒーレンス評価は注目度が高くなり,有望な対話システムの開発に不可欠である。しかし、既存の指標には2つの大きな制限がある: (a) それらは主に単純化された2段階の設定(コヒーレント対非コヒーレント)で訓練されているのに対し、人間は「量子化」と呼ばれる、クアルト型多段階コヒーレンススコアを与える; (b) トレーニング中に人間の指導が欠如しているため、予測されたコヒーレンススコアは実際の人間の評価基準に適合しない。そこで本研究では,実際の評価基準を反映することのできる,定量化可能な対話コヒーレンスメトリックの学習を目的とした新しい枠組みであるquantidceを提案する。具体的には、QuantiDCEには、マルチレベルランキング(MLR)事前トレーニングと知識蒸留(KD)微調整という2つのトレーニング段階が含まれている。 MLR事前学習中に、モデルがコヒーレンスの粗い判断を学習できるようにするために、新しいMLR損失を提案する。そして、KD微調整の間、事前訓練されたモデルはさらに微調整され、人間の注釈付きデータだけで実際の人間の評価基準を学習する。限られた微調整データでも一般化性を提唱するため、事前学習段階で学んだ知識を保持するために、新しいkd正則化を導入する。実験結果から,QuantiDCEによりトレーニングされたモデルは,他の最先端の指標に比べて,人間の判断と強い相関関係を示すことが示された。 Automatic dialogue coherence evaluation has attracted increasing attention and is crucial for developing promising dialogue systems. However, existing metrics have two major limitations: (a) they are mostly trained in a simplified two-level setting (coherent vs. incoherent), while humans give Likert-type multi-level coherence scores, dubbed as "quantifiable"; (b) their predicted coherence scores cannot align with the actual human rating standards due to the absence of human guidance during training. To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards. Specifically, QuantiDCE includes two training stages, Multi-Level Ranking (MLR) pre-training and Knowledge Distillation (KD) fine-tuning. During MLR pre-training, a new MLR loss is proposed for enabling the model to learn the coarse judgement of coherence degrees. Then, during KD fine-tuning, the pretrained model is further finetuned to learn the actual human rating standards with only very few human-annotated data. 
To advocate the generalizability even with limited fine-tuning data, a novel KD regularization is introduced to retain the knowledge learned at the pre-training stage. Experimental results show that the model trained by QuantiDCE presents stronger correlations with human judgements than the other state-of-the-art metrics.	翻訳日:2021-06-02 14:24:12 公開日:2021-06-01
# 学生のパフォーマンス予測のためのグラフベース演習・知識学習ネットワーク Graph-based Exercise- and Knowledge-Aware Learning Network for Student Performance Prediction ( http://arxiv.org/abs/2106.00263v1 ) ライセンス: Link先を確認	Mengfan Liu, Pengyang Shao, Kun Zhang	(参考訳) 知的指導システム(itss)では、生徒のパフォーマンスを予測することは、生徒の知識レベルを学習し、それらに対してパーソナライズされた指導戦略を提供するための基本的なタスクである。研究者はこの課題に多くの努力をしてきた。学習知識の熟練度に応じて生徒のスコアを予測するために教育心理学的手法を利用するか、学生や演習の潜伏要因を表すために協調フィルタリング(CF)モデルをフル活用する。しかし、これらの手法のほとんどは、運動特有の特性(例えば運動材料)を無視したり、学生間の高次相互作用や運動、知識概念を十分に探求することができない。そこで本稿では,学生のスコアを正確に予測するためのグラフベースの知識認識学習ネットワークを提案する。具体的には,エクササイズと知識概念の2つの効果をモデル化するために,学生のエクササイズと知識概念の熟達度をそれぞれ学習する。そして,高次相互作用をモデル化するために,予測プロセスにグラフ畳み込み手法を適用する。 2つの実世界のデータセットに対する大規模な実験により、提案したグラフ-EKLNの有効性が証明された。 Predicting student performance is a fundamental task in Intelligent Tutoring Systems (ITSs), by which we can learn about students' knowledge level and provide personalized teaching strategies for them. Researchers have made plenty of efforts on this task. They either leverage educational psychology methods to predict students' scores according to the learned knowledge proficiency, or make full use of Collaborative Filtering (CF) models to represent latent factors of students and exercises. However, most of these methods either neglect the exercise-specific characteristics (e.g., exercise materials), or cannot fully explore the high-order interactions between students, exercises, as well as knowledge concepts. To this end, we propose a Graph-based Exercise- and Knowledge-Aware Learning Network for accurate student score prediction. Specifically, we learn students' mastery of exercises and knowledge concepts respectively to model the two-fold effects of exercises and knowledge concepts. Then, to model the high-order interactions, we apply graph convolution techniques in the prediction process. Extensive experiments on two real-world datasets prove the effectiveness of our proposed Graph-EKLN.	翻訳日:2021-06-02 14:23:25 公開日:2021-06-01
# 歴史からの探索と未来への理由:時間知識グラフの2段階推論 Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs ( http://arxiv.org/abs/2106.00327v1 ) ライセンス: Link先を確認	Zixuan Li, Xiaolong Jin, Saiping Guan, Wei Li, Jiafeng Guo, Yuanzhuo Wang and Xueqi Cheng	(参考訳) 時間的知識グラフ (TKG) は様々な分野で開発・利用されている。将来の潜在的な事実(イベント)を予測するtkgの推論は、既存のモデルに大きな課題をもたらす。予測タスクに直面するとき、人間は通常、記憶の中の有用な歴史的情報(すなわち手がかり)を検索し、将来を慎重に考える。そこで本研究では,この機構に触発されて,手がかり探索と時間推論の2段階的な予測を行うクラスタを提案する。具体的には、手がかり探索段階において、CluSTeRは強化学習(RL)を介してビーム探索ポリシーを学び、歴史的事実から複数の手がかりを導き出す。時間的推論の段階では、グラフ畳み込みネットワークに基づくシーケンス法を採用し、答えを手がかりから導き出す。 4つのデータセットの実験は、最先端の手法と比較してCluSTeRのかなりの利点を示している。さらに、CluSTeRが発見した手がかりは、結果の解釈可能性をさらに高める。 Temporal Knowledge Graphs (TKGs) have been developed and used in many different areas. Reasoning on TKGs that predicts potential facts (events) in the future brings great challenges to existing models. When facing a prediction task, human beings usually search useful historical information (i.e., clues) in their memories and then reason for future meticulously. Inspired by this mechanism, we propose CluSTeR to predict future facts in a two-stage manner, Clue Searching and Temporal Reasoning, accordingly. Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts. At the temporal reasoning stage, it adopts a graph convolution network based sequence method to deduce answers from clues. Experiments on four datasets demonstrate the substantial advantages of CluSTeR compared with the state-of-the-art methods. Moreover, the clues found by CluSTeR further provide interpretability for the results.	翻訳日:2021-06-02 14:23:07 公開日:2021-06-01
# 典型性を有するファジィDLのKLM特性について On the KLM properties of a fuzzy DL with Typicality ( http://arxiv.org/abs/2106.00390v1 ) ライセンス: Link先を確認	Laura Giordano	(参考訳) 本稿では,典型性を扱うファジィ論理の性質について考察する。近年,ディープニューラルネットワークを条件付き知識ベースとみなして多層パーセプトロンのファジィ多重選好セマンティクスを定義するために,ファジィ論理を典型性演算子で拡張する手法が提案されている。本稿では,その性質について考察する。まず、典型性を持つファジィALCの単調拡張(ALCFT)を考慮し、この論理に対する選好的帰結関係のKLM特性を再定式化する。ほとんどの性質は、再定式化と考慮するファジィ結合関数に応じて満たされる。次に,以前に導入した条件付き知識ベースのコヒーレントモデルの概念を一般化する,重み付き知識ベースの忠実(faithful)モデルの概念を導入することにより,ALCFTを閉包構成で強化し,その性質について検討する。 The paper investigates the properties of a fuzzy logic of typicality. The extension of fuzzy logic with a typicality operator was proposed in recent work to define a fuzzy multipreference semantics for Multilayer Perceptrons, by regarding the deep neural network as a conditional knowledge base. In this paper, we study its properties. First, a monotonic extension of a fuzzy ALC with typicality is considered (called ALCFT) and a reformulation of the KLM properties of a preferential consequence relation for this logic is devised. Most of the properties are satisfied, depending on the reformulation and on the fuzzy combination functions considered. We then strengthen ALCFT with a closure construction by introducing a notion of faithful model of a weighted knowledge base, which generalizes the notion of coherent model of a conditional knowledge base previously introduced, and we study its properties.	翻訳日:2021-06-02 14:22:51 公開日:2021-06-01
# マルコフ報酬過程にインスパイアされた値伝播に基づく時空間補間 Value propagation-based spatio-temporal interpolation inspired by Markov reward processes ( http://arxiv.org/abs/2106.00538v1 ) ライセンス: Link先を確認	Laurens Arp, Mitra Baratchi, Holger Hoos	(参考訳) リモートセンシング、生態学、気象学といった様々な分野の現実世界のアプリケーションで欠落するデータの一般的な問題を考えると、空間的・時空間的データの補間は極めて重要である。既存の空間補間法(特にガウス過程と空間自己回帰モデル)は、(a)局所的または大域的な空間的相互作用のモデル化間のトレードオフ、(b)二つの点の間に可能な経路が1つしかないという仮定、(c)点間の中間位置の同質性の仮定に苦しむ傾向がある。これらの問題に対処するため,空間補間法としてマルコフ報酬プロセス (MRP) に着想を得た値伝搬法を提案し,その2つの変種(SD-MRP)とデータ駆動重み予測 (WP-MRP) の変種(WP-MRP)を提案する。これらの補間変種はどちらも局所的に動作し、再帰を通じてシステム全体の空間的関係を暗黙的に説明する。提案手法は, 補間格子セルの平均絶対誤差とランニング時間と7つの共通基底線の平均誤差を比較して評価した。本分析では,44実験条件を超える2つの合成データと2つの実世界のデータセットについて詳細な実験を行った。実験結果から,実験条件下でのSD-MRPの平均性能は,他のすべての手法に比べて有意に高く,続いてWP-MRPが続いた。合成データから,WP-MRPがSD-MRPよりも十分な情報的特徴を有することを示す。さらに,本手法が基準線に対して有意な優位性を持たない場合においても,対象格子の空間構造を基準線よりもよく保存することがわかった。 Given the common problem of missing data in real-world applications from various fields, such as remote sensing, ecology and meteorology, the interpolation of missing spatial and spatio-temporal data can be of tremendous value. Existing methods for spatial interpolation, most notably Gaussian processes and spatial autoregressive models, tend to suffer from (a) a trade-off between modelling local or global spatial interaction, (b) the assumption there is only one possible path between two points, and (c) the assumption of homogeneity of intermediate locations between points. Addressing these issues, we propose a value propagation method, inspired by Markov reward processes (MRPs), as a spatial interpolation method, and introduce two variants thereof: (i) a static discount (SD-MRP) and (ii) a data-driven weight prediction (WP-MRP) variant. Both these interpolation variants operate locally, while implicitly accounting for global spatial relationships in the entire system through recursion. 
We evaluated our proposed methods by comparing the mean absolute errors and running times of interpolated grid cells to those of 7 common baselines. Our analysis involved detailed experiments on two synthetic and two real-world datasets over 44 total experimental conditions. Experimental results show the competitive advantage of MRP interpolation on real-world data, as the average performance of SD-MRP on real-world data under all experimental conditions was ranked significantly higher than that of all other methods, followed by WP-MRP. On synthetic data, we show that WP-MRP can perform better than SD-MRP given sufficiently informative features. We further found that, even in cases where our methods had no significant advantage over baselines numerically, our methods preserved the spatial structure of the target grid better than the baselines.	翻訳日:2021-06-02 14:22:36 公開日:2021-06-01
# コンピュータビジョンと無人航空機技術の公衆検査における統合的利用:異物デブリ画像収集 Integrative Use of Computer Vision and Unmanned Aircraft Technologies in Public Inspection: Foreign Object Debris Image Collection ( http://arxiv.org/abs/2106.00161v1 ) ライセンス: Link先を確認	Travis J. E. Munyer, Daniel Brinkman, Chenyu Huang, Xin Zhong	(参考訳) 無人航空機システム(UAS)は公共サービス事業者やスマートシティにとって重要な資源となっている。本研究の目的は,コンピュータビジョンとUAS技術を統合して公衆検査を自動化することにある。本研究の最初のケーススタディとして,軽量自動検出の可能性を評価するために,共通異物デブリ(fod)のデータセットを開発した。本稿では,このデータセットの根拠と作成について述べる。我々の研究の今後のイテレーションには、実験的な実装を分析する技術的な詳細が含まれます。地元の空港では、UASとポータブルカメラを使用して、このデータセットの初期バージョンに含まれるデータを収集する。 FODのビデオを収集した後、個々のフレームに分割され、数千の画像として保存された。これらのフレームは、標準のコンピュータビジョンフォーマットに従って注釈付けされ、フォルダ構造に格納される。データセットアノテーションは、将来のアプリケーションに適合するように抽象化できるカスタムツールを使用して検証される。提案したデータの実用性を示す有名なYou Only Look Onceアルゴリズムを用いて,初期検出モデルの作成に成功した。最後に、このデータセットまたは他の公共サービスのための類似のメソッドを利用する可能性のあるいくつかのシナリオが提示される。 Unmanned Aircraft Systems (UAS) have become an important resource for public service providers and smart cities. The purpose of this study is to expand this research area by integrating computer vision and UAS technology to automate public inspection. As an initial case study for this work, a dataset of common foreign object debris (FOD) is developed to assess the potential of light-weight automated detection. This paper presents the rationale and creation of this dataset. Future iterations of our work will include further technical details analyzing experimental implementation. At a local airport, UAS and portable cameras are used to collect the data contained in the initial version of this dataset. After collecting these videos of FOD, they were split into individual frames and stored as several thousand images. These frames are then annotated following standard computer vision format and stored in a folder-structure that reflects our creation method. The dataset annotations are validated using a custom tool that could be abstracted to fit future applications. 
Initial detection models were successfully created using the famous You Only Look Once algorithm, which indicates the practicality of the proposed data. Finally, several potential scenarios that could utilize either this dataset or similar methods for other public service are presented.	翻訳日:2021-06-02 14:21:42 公開日:2021-06-01
# 半教師付き物体検出のための擬似ラベル再考 Rethinking Pseudo Labels for Semi-Supervised Object Detection ( http://arxiv.org/abs/2106.00168v1 ) ライセンス: Link先を確認	Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis	(参考訳) 半教師付き物体検出(SSOD)の最近の進歩は、擬似ラベルを教師信号として生成する、画像分類タスク向けの整合性に基づく擬似ラベル法によって大きく牽引されている。しかしながら、擬似ラベルを使用する場合、局所化精度と増幅されたクラス不均衡には考慮が欠如しており、どちらも検出タスクに不可欠である。本稿では,物体検出に適した確実性を考慮した擬似ラベルを導入し,得られた擬似ラベルの分類と位置推定の品質を効果的に推定する。これは、従来のローカライゼーションを分類タスクとして変換し、改良することで達成される。分類とローカライズ品質スコアに基づいて,擬似ラベルを生成する閾値と各カテゴリの損失関数の重みを動的に調整し,クラス不均衡問題を緩和する。実験の結果,本手法は最先端のSSOD性能をCOCOで1-2% AP,PASCAL VOCで4-6% AP向上させた。限定アノテーション方式では,COCOの1-10%のラベル付きデータのみを用いて,教師付きベースラインを最大10% AP改善する。 Recent advances in semi-supervised object detection (SSOD) are largely driven by consistency-based pseudo-labeling methods for image classification tasks, producing pseudo labels as supervisory signals. However, when using pseudo labels, there is a lack of consideration of localization precision and amplified class imbalance, both of which are critical for detection tasks. In this paper, we introduce certainty-aware pseudo labels tailored for object detection, which can effectively estimate the classification and localization quality of derived pseudo labels. This is achieved by converting conventional localization as a classification task followed by refinement. Conditioned on classification and localization quality scores, we dynamically adjust the thresholds used to generate pseudo labels and reweight loss functions for each category to alleviate the class imbalance problem. Extensive experiments demonstrate that our method improves state-of-the-art SSOD performance by 1-2% and 4-6% AP on COCO and PASCAL VOC, respectively. In the limited-annotation regime, our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.	翻訳日:2021-06-02 14:21:24 公開日:2021-06-01
# 言語駆動イメージスタイル転送 Language-Driven Image Style Transfer ( http://arxiv.org/abs/2106.00178v1 ) ライセンス: Link先を確認	Tsu-Jui Fu, Xin Eric Wang, William Yang Wang	(参考訳) 期待できる結果を得たにもかかわらず、事前にスタイルイメージを作成する必要があるスタイル転送は、創造性とアクセシビリティの欠如をもたらす可能性がある。一方、人間の指示に従うことは、視覚効果アプリケーションの制御性を大幅に向上させる芸術的スタイル転送を行う最も自然な方法である。テキストでガイドされたコンテンツイメージのスタイルを操作するために,言語駆動型画像スタイル転送(LDIST)という新たなタスクを導入する。そこで我々は,スタイル指示から視覚的意味を抽出し,パッチワイドなスタイル判別器でLDISTを実現できるコントラスト言語ビジュアルアーティスト(CLVA)を提案する。判別器は、スタイル画像の言語とパッチの相関や、スタイル命令を共同埋め込むための転送結果について検討する。 CLVAはさらに、コンテントイメージのコントラスト対とスタイル命令を比較して、転送結果間の相互相対性を改善する。同じコンテンツ画像から転送された結果は、一貫したコンテンツ構造を保存できる。さらに、同様のビジュアルセマンティクスを含むスタイル命令からの類似のスタイルパターンも提示する必要がある。実験の結果, CLVA は有効であり, LDIST 上で優れた転送結果が得られることがわかった。 Despite having promising results, style transfer, which requires preparing style images in advance, may result in lack of creativity and accessibility. Following human instruction, on the other hand, is the most natural way to perform artistic style transfer that can significantly improve controllability for visual effect applications. We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text. We propose contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplish LDIST by the patch-wise style discriminator. The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions. CLVA further compares contrastive pairs of content image and style instruction to improve the mutual relativeness between transfer results. The transferred results from the same content image can preserve consistent content structures. Besides, they should present analogous style patterns from style instructions that contain similar visual semantics. 
The experiments show that our CLVA is effective and achieves superb transferred results on LDIST.	翻訳日:2021-06-02 14:21:04 公開日:2021-06-01
# 少数の意味セマンティクスセグメンテーションに対するアンチエイリアシングセマンティクス再構成 Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation ( http://arxiv.org/abs/2106.00184v1 ) ライセンス: Link先を確認	Binghao Liu and Yao Ding and Jianbin Jiao and Xiangyang Ji and Qixiang Ye	(参考訳) 初歩的な意味セグメンテーションの進展を促すために、初歩的な例で新しいクラスを表現するのに十分なトレーニングデータを持つベースクラスで学習した機能を活用した。しかし、この特徴共有機構は必然的に、セマンティック概念の類似した構成を持つ場合、新しいクラス間のセマンティックエイリアスを引き起こす。本稿では,セグメンテーションを意味的再構成問題として再構成し,新しいクラス再構築のためのクラスレベルのセグメンテーション空間にまたがる一連の基底ベクトルに基底クラス特徴を変換する。対照損失を導入することにより,クラス間の意味的エイリアスを最小化しつつ,基底ベクトルの直交性を最大化する。再構成された表現空間内では、クエリ特徴をサポートベクタに投影し、正確なセマンティックアクティベーションを行うことにより、他のクラスからの干渉をさらに抑制する。提案手法はアンチエイリアス・セマンティック・リストラクション (ASR) と呼ばれ, 数発の学習問題に対する体系的かつ解釈可能な解決策を提供する。 PASCAL VOCとMS COCOデータセットの大規模な実験により、ASRは以前の研究と比べて強い結果が得られることが示された。 Encouraging progress in few-shot semantic segmentation has been made by leveraging features learned upon base classes with sufficient training data to represent novel classes with few-shot examples. However, this feature sharing mechanism inevitably causes semantic aliasing between novel classes when they have similar compositions of semantic concepts. In this paper, we reformulate few-shot segmentation as a semantic reconstruction problem, and convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction. By introducing contrastive loss, we maximize the orthogonality of basis vectors while minimizing semantic aliasing between classes. Within the reconstructed representation space, we further suppress interference from other classes by projecting query features to the support vector for precise semantic activation. Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems. Extensive experiments on PASCAL VOC and MS COCO datasets show that ASR achieves strong results compared with the prior works.	翻訳日:2021-06-02 14:20:43 公開日:2021-06-01
# 不均衡半教師学習における再サンプリングの再考 Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning ( http://arxiv.org/abs/2106.00209v1 ) ライセンス: Link先を確認	Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille	(参考訳) Semi-Supervised Learning (SSL)はラベル付きデータが不足している場合にラベル付きデータを利用する強力な能力を示している。しかし、ほとんどのSSLアルゴリズムは、クラス分布がトレーニングセットとテストセットの両方で均衡しているという仮定の下で機能する。本研究では,クラス不均衡データに対するSSLの問題について考察する。特に、表現と分類器の訓練を分離し、分類器を含むネットワーク全体のトレーニングや特徴抽出器のみを微調整する際に異なるデータ再サンプリング手法の効果を体系的に検討する。特にラベルなしデータのマイノリティクラスにおいて、疑似ラベルの精度を高めるため、データ再サンプリングは優れた分類法を学ぶ上で非常に重要であることがわかった。興味深いことに、特徴抽出器をトレーニングする際、むしろ逆にデータ再サンプリングが特徴抽出器のトレーニングを損なう場合、正確な擬似ラベルは役に立たない。この発見は、間違った擬似ラベルがSSLのモデルパフォーマンスを常に損なうという一般的な直観に反している。これらの結果を踏まえて,単一データ再サンプリング戦略の現在のパラダイムを再考し,クラス不均衡データに対するsslの単純かつ高効率なbis戦略を開発することを提案する。 BiSは機能抽出器と分類器をトレーニングするための2つの異なる再サンプリング戦略を実装し、この分離されたトレーニングをエンドツーエンドフレームワークに統合する。 Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce. However, most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets. In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations but has only received limited attention so far. In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques when training the whole network including a classifier as well as fine-tuning the feature extractor only. We find that data re-sampling is of critical importance to learn a good classifier as it increases the accuracy of the pseudo-labels, in particular for the minority classes in the unlabeled data. Interestingly, we find that accurate pseudo-labels do not help when training the feature extractor, rather contrariwise, data re-sampling harms the training of the feature extractor. 
This finding is against the general intuition that wrong pseudo-labels always harm the model performance in SSL. Based on these findings, we suggest to re-think the current paradigm of having a single data re-sampling strategy and develop a simple yet highly effective Bi-Sampling (BiS) strategy for SSL on class-imbalanced data. BiS implements two different re-sampling strategies for training the feature extractor and the classifier and integrates this decoupled training into an end-to-end framework... Code will be released at https://github.com/TACJu/Bi-Sampling.	翻訳日:2021-06-02 14:20:25 公開日:2021-06-01
# EV-VGCNN:イベントベースオブジェクト分類のためのVoxel Graph CNN EV-VGCNN: A Voxel Graph CNN for Event-based Object Classification ( http://arxiv.org/abs/2106.00216v1 ) ライセンス: Link先を確認	Yongjian Deng, Hao Chen, Huiying Chen, Youfu Li	(参考訳) イベントカメラは、スパースな強度変化を報告し、ポータブルデバイス上での視覚知覚と理解のための低消費電力、高ダイナミックレンジ、高応答速度の顕著な利点を持つ。イベントベースの学習手法は、従来型の2D学習アルゴリズムを適用するために、イベントを高密度フレームベースの表現に統合することで、オブジェクト認識において大きな成功を収めている。しかし、これらの手法は、スパースから高密度への変換の際に多くの冗長な情報を導入し、重く大容量のモデルを必要とするため、実際の応用におけるイベントカメラの可能性を制限する。イベントベース分類モデルにおける精度とモデル複雑性のバランスという中核的な問題に対処するため、(1) イベントデータのスパース性をより活用できるグラフ表現を構築し、分類のための軽量なエンドツーエンドグラフニューラルネットワーク(EV-VGCNN)を設計する。(2) 従来の点単位の手法ではなくボクセル単位の頂点を用いて、より多くの点の情報を取り込む。(3) マルチスケール特徴関係層(MFRL)を導入し、各頂点から近傍との距離に応じて意味的手がかりと動きの手がかりを適応的に抽出する。総合的な実験により,本手法は20倍近いパラメータ削減(約0.84Mパラメータ)を達成しつつ,最先端の分類精度を向上することが示された。 Event cameras report sparse intensity changes and hold noticeable advantages of low power consumption, high dynamic range, and high response speed for visual perception and understanding on portable devices. Event-based learning methods have recently achieved massive success on object recognition by integrating events into dense frame-based representations to apply traditional 2D learning algorithms. However, these approaches introduce much redundant information during the sparse-to-dense conversion and necessitate models with heavy-weight and large capacities, limiting the potential of event cameras on real-life applications. 
To address the core problem of balancing accuracy and model complexity for event-based classification models, we (1) construct graph representations for event data to utilize their sparsity nature better and design a lightweight end-to-end graph neural network (EV-VGCNN) for classification; (2) use voxel-wise vertices rather than traditional point-wise methods to incorporate the information from more points; (3) introduce a multi-scale feature relational layer (MFRL) to extract semantic and motion cues from each vertex adaptively concerning its distances to neighbors. Comprehensive experiments show that our approach advances state-of-the-art classification accuracy while achieving nearly 20 times parameter reduction (merely 0.84M parameters).	翻訳日:2021-06-02 14:20:00 公開日:2021-06-01
# 自己学習に基づくトランスダクティブゼロショット学習のためのハードネスサンプリング Hardness Sampling for Self-Training Based Transductive Zero-Shot Learning ( http://arxiv.org/abs/2106.00264v1 ) ライセンス: Link先を確認	Liu Bo, Qiulei Dong, Zhanyi Hu	(参考訳) 既存のZSL作業におけるドメインシフト問題を緩和するトランスダクティブゼロショット学習(T-ZSL)が近年注目を集めている。しかし、T-ZSLのオープンな問題として、未確認クラスのサンプルを効果的にトレーニングに利用する方法が残っている。そこで本研究では,zsl法で見られる不均一な予測現象に基づいて,訓練過程における難易度が異なる非検出クラスサンプルの役割を経験的に解析し,3つの観察結果を得た。そこで本研究では,与えられた非知覚型データセットから多様かつ硬質なサンプルのサブセットを選択するための2つのハードネスサンプリング手法を提案する。第1はモデル予測のクラスレベル周波数に基づいてサンプルを識別し、第2は探索された事前推定アルゴリズムにより推定された近似クラスを介してクラス周波数を正規化することにより前者を強化する。最後に, 任意の誘導型ZSL法をシームレスに組み込むことができ, ハードネスサンプリング手法で選択した未確認のサンプルを反復的に学習できるSTHSという, T-ZSL用自己学習フレームワークを設計した。我々はSTHSフレームワークに2つの典型的なZSL手法を導入し、得られたT-ZSL法が3つの公開ベンチマークで多くの最先端手法より優れていることを示す。また,既存の一般化ZSL (T-GZSL) 法を学習するためには,未確認のデータセットを別々に使用し,GZSL タスクには厳密でない点に留意する。したがって、より厳密なT-GZSLデータ設定を提案し、提案したSTHSフレームワークをT-GZSLに導入することにより、この設定の競争ベースラインを確立する。 Transductive zero-shot learning (T-ZSL) which could alleviate the domain shift problem in existing ZSL works, has received much attention recently. However, an open problem in T-ZSL: how to effectively make use of unseen-class samples for training, still remains. Addressing this problem, we first empirically analyze the roles of unseen-class samples with different degrees of hardness in the training process based on the uneven prediction phenomenon found in many ZSL methods, resulting in three observations. Then, we propose two hardness sampling approaches for selecting a subset of diverse and hard samples from a given unseen-class dataset according to these observations. The first one identifies the samples based on the class-level frequency of the model predictions while the second enhances the former by normalizing the class frequency via an approximate class prior estimated by an explored prior estimation algorithm. 
Finally, we design a new Self-Training framework with Hardness Sampling for T-ZSL, called STHS, where an arbitrary inductive ZSL method could be seamlessly embedded and it is iteratively trained with unseen-class samples selected by the hardness sampling approach. We introduce two typical ZSL methods into the STHS framework and extensive experiments demonstrate that the derived T-ZSL methods outperform many state-of-the-art methods on three public benchmarks. Besides, we note that the unseen-class dataset is separately used for training in some existing transductive generalized ZSL (T-GZSL) methods, which is not strict for a GZSL task. Hence, we suggest a more strict T-GZSL data setting and establish a competitive baseline on this setting by introducing the proposed STHS framework to T-GZSL.	翻訳日:2021-06-02 14:19:41 公開日:2021-06-01
# 深部特徴再構成を用いた半教師付き視差推定 Semi-Supervised Disparity Estimation with Deep Feature Reconstruction ( http://arxiv.org/abs/2106.00318v1 ) ライセンス: Link先を確認	Julia Guerrero-Viu, Sergio Izquierdo, Philipp Schröppel and Thomas Brox	(参考訳) 視差推定におけるディープラーニングの成功にもかかわらず、領域一般化のギャップは依然として問題である。本稿では,ラベル付き合成データの教師付きトレーニングとラベルなし実データに対する自己教師付きトレーニングを共同で行うことで,DispNetを実世界ドメインに適応させる半教師付きパイプラインを提案する。さらに, 広範に使用されている測光損失の限界を考慮し, 深部特徴再構成の影響を, 視差推定のための有望な監視信号として分析する。 Despite the success of deep learning in disparity estimation, the domain generalization gap remains an issue. We propose a semi-supervised pipeline that successfully adapts DispNet to a real-world domain by joint supervised training on labeled synthetic data and self-supervised training on unlabeled real data. Furthermore, accounting for the limitations of the widely-used photometric loss, we analyze the impact of deep feature reconstruction as a promising supervisory signal for disparity estimation.	翻訳日:2021-06-02 14:19:06 公開日:2021-06-01
# 点雲の遠隔登録のための一貫した二流ネットワーク Consistent Two-Flow Network for Tele-Registration of Point Clouds ( http://arxiv.org/abs/2106.00329v1 ) ライセンス: Link先を確認	Zihao Yan, Zimu Yi, Ruizhen Hu, Niloy J. Mitra, Daniel Cohen-Or, Hui Huang	(参考訳) 部分観測の厳密な登録は、様々な応用分野における根本的な問題である。コンピュータグラフィックスでは、走査装置によって生成される2つの部分点雲間の登録に特に注意が払われている。最先端の登録技術は、2つのポイントクラウド間のオーバーラップ領域が小さく、スキャンペア間のオーバーラップがなければ、完全に失敗する。本稿では,この問題を緩和し,任意のポーズで提示された点群間の登録を可能にし,重なりがほとんどあるいは全くない,遠隔登録と呼ばれる設定を学習ベースで行う手法を提案する。本手法は,一群の形状の先行を学習し,部分的な形状を完遂できる新しいニューラルネットワーク設計に基づいている。キーとなるアイデアは、登録と完了タスクを互いに強化する方法で組み合わせることです。特に,登録ネットワークと完了ネットワークを,登録・完了フローと完全登録フローの2つの結合フローを用いて同時に訓練し,両フローが一貫した結果を生み出すように促す。個別のフローと比較すると、この2フロートレーニングは堅牢で信頼性の高い遠隔登録につながり、したがって登録されたスキャンを完了したより良いポイントクラウド予測につながる。また、ニューラルネットワークの各コンポーネントは、完了と登録の両方において最先端の手法よりも優れています。我々はさらに,いくつかのアブレーション研究によりネットワークを解析し,その性能を合成と実世界の両方の多数の部分点雲上で実証した。 Rigid registration of partial observations is a fundamental problem in various applied fields. In computer graphics, special attention has been given to the registration between two partial point clouds generated by scanning devices. State-of-the-art registration techniques still struggle when the overlap region between the two point clouds is small, and completely fail if there is no overlap between the scan pairs. In this paper, we present a learning-based technique that alleviates this problem, and allows registration between point clouds, presented in arbitrary poses, and having little or even no overlap, a setting that has been referred to as tele-registration. Our technique is based on a novel neural network design that learns a prior of a class of shapes and can complete a partial shape. The key idea is combining the registration and completion tasks in a way that reinforces each other. In particular, we simultaneously train the registration network and completion network using two coupled flows, one that register-and-complete, and one that complete-and-register, and encourage the two flows to produce a consistent result. 
We show that, compared with each separate flow, this two-flow training leads to robust and reliable tele-registration, and hence to a better point cloud prediction that completes the registered scans. It is also worth mentioning that each of the components in our neural network outperforms state-of-the-art methods in both completion and registration. We further analyze our network with several ablation studies and demonstrate its performance on a large number of partial point clouds, both synthetic and real-world, that have only small or no overlap.	翻訳日:2021-06-02 14:18:59 公開日:2021-06-01
# Transformer-Encoder Deep Features を用いた効率的なクロスモーダルビジュアルテキスト検索に向けて Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features ( http://arxiv.org/abs/2106.00358v1 ) ライセンス: Link先を確認	Nicola Messina, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet	(参考訳) クロスモーダル検索は、クエリや検索対象を異なるモダリティに関連付けることでユーザエクスペリエンスを向上させるため、現代の検索エンジンにおいて重要な機能である。本稿では,ある文(画像検索)の関連画像や,ある画像(文検索)の関連文を効率的に見つけることを目的とした画像文検索タスクに着目した。コンピュータビジョン文献は、注意と自己注意機構を備えたディープニューラルネットワークを用いた画像文マッチングタスクにおける最良の結果を報告する。データセット全体の逐次スキャンを行い,検索タスクのマッチング性能を評価する。この方法は画像や字幕の数が増えるほどスケールが良くない。本研究では,画像テキストマッチングのための最先端のディープラーニングアーキテクチャから抽出する,スパース化された深層マルチモーダル特徴を生成するための,さまざまな前処理手法について検討する。我々の主な目的は、複雑なマルチモーダル記述の効率的な索引付けのための経路を敷設することである。我々は最近導入されたTERNアーキテクチャを画像文特徴抽出器として利用する。画像全体と文を記述した固定サイズ1024-dベクターと、2つのモーダル(画像領域と文語)の様々な構成要素を記述する可変長1024-dベクターを作成するように設計されている。これらのベクトルはすべて、TERN設計によって同じ共通空間に置かれるように強制される。本実験では,本手法の予備実験を行い,本研究の方向性についてさらなる実験を行うことを提案する。 Cross-modal retrieval is an important functionality in modern search engines, as it improves the user experience by allowing queries and retrieved objects to pertain to different modalities. In this paper, we focus on the image-sentence retrieval task, where the objective is to efficiently find relevant images for a given sentence (image-retrieval) or the relevant sentences for a given image (sentence-retrieval). Computer vision literature reports the best results on the image-sentence matching task using deep neural networks equipped with attention and self-attention mechanisms. They evaluate the matching performance on the retrieval task by performing sequential scans of the whole dataset. This method does not scale well with an increasing number of images or captions. In this work, we explore different preprocessing techniques to produce sparsified deep multi-modal features, extracting them from state-of-the-art deep-learning architectures for image-text matching. 
Our main objective is to lay down the paths for efficient indexing of complex multi-modal descriptions. We use the recently introduced TERN architecture as an image-sentence feature extractor. It is designed for producing fixed-size 1024-d vectors describing whole images and sentences, as well as variable-length sets of 1024-d vectors describing the various building components of the two modalities (image regions and sentence words respectively). All these vectors are enforced by the TERN design to lie in the same common space. Our experiments show interesting preliminary results on the explored methods and suggest further experimentation in this important research direction.	翻訳日:2021-06-02 14:18:35 公開日:2021-06-01
# ネットワーク活性化の自然統計と知識蒸留への示唆 Natural Statistics of Network Activations and Implications for Knowledge Distillation ( http://arxiv.org/abs/2106.00368v1 ) ライセンス: Link先を確認	Michael Rotman and Lior Wolf	(参考訳) 自然画像統計の研究に類似するものとして,様々な層におけるディープニューラルネットワークの活性化の自然統計について検討する。本稿で示すように、これらの統計は画像統計と同様、べき乗則に従っている。また,解析的にも経験的にも,このべき乗則の指数が深さとともに線形に増加することを示した。発見の直接的意味として,我々は知識蒸留(KD)を行う方法を提案する。従来のKD手法では教師ネットワークのロジットを考慮しているが,近年ではアクティベーションマップによる性能向上が図られている。しかし、これは画像の比較に適したメトリクスを使用します。本稿では,中間活性化写像のスペクトル特性に基づく2つの損失項を提案する。提案手法は,複数の画像認識KDベンチマークにおいて最先端の結果を達成する。 In a manner analogous to the study of natural image statistics, we study the natural statistics of the deep neural network activations at various layers. As we show, these statistics, similar to image statistics, follow a power law. We also show, both analytically and empirically, that with depth the exponent of this power law increases at a linear rate. As a direct implication of our discoveries, we present a method for performing Knowledge Distillation (KD). While classical KD methods consider the logits of the teacher network, more recent methods obtain a leap in performance by considering the activation maps. This, however, uses metrics that are suitable for comparing images. We propose to employ two additional loss terms that are based on the spectral properties of the intermediate activation maps. The proposed method obtains state-of-the-art results on multiple image recognition KD benchmarks.	翻訳日:2021-06-02 14:18:11 公開日:2021-06-01
# DLA-Net: 大規模建物ファサードポイントクラウドのセマンティックセグメンテーションのためのデュアルローカルアテンション特徴の学習 DLA-Net: Learning Dual Local Attention Features for Semantic Segmentation of Large-Scale Building Facade Point Clouds ( http://arxiv.org/abs/2106.00376v1 ) ライセンス: Link先を確認	Yanfei Su, Weiquan Liu, Zhimin Yuan, Ming Cheng, Zhihong Zhang, Xuelun Shen, Cheng Wang	(参考訳) 建物ファサードのセマンティックセグメンテーションは、都市建物の再建や損傷評価など、様々な用途において重要である。細粒度のビルディングファサードに関連する3dポイントクラウドデータセットが不足しているため、最初の大規模ビルディングファサードポイントクラウドベンチマークデータセットをセマンティックセグメンテーションのために構築する。既存のセマンティックセグメンテーションの方法は、点雲の局所的な近傍情報を完全にはマイニングできない。本稿では,dlaと呼ばれる2つの局所的注意特徴を学習する学習可能な注意モジュールを提案する。提案したDLAモジュールは、自己注意ブロックと注意プールブロックの2つのブロックから構成されており、どちらも拡張された位置符号化ブロックを埋め込んでいる。 DLAモジュールは、ポイントクラウドセグメンテーションのために、様々なネットワークアーキテクチャに簡単に組み込むことができ、自然界では、DLA-Netと呼ばれるエンコーダデコーダアーキテクチャを持つ新しい3Dセグメンテーションネットワークとなる。構築したファサードデータセットの大規模な実験結果から,提案したDLA-Netは,セマンティックセグメンテーションの最先端手法よりも優れた性能を示すことが示された。 Semantic segmentation of building facade is significant in various applications, such as urban building reconstruction and damage assessment. As there is a lack of 3D point clouds datasets related to the fine-grained building facade, we construct the first large-scale building facade point clouds benchmark dataset for semantic segmentation. The existing methods of semantic segmentation cannot fully mine the local neighborhood information of point clouds. Addressing this problem, we propose a learnable attention module that learns Dual Local Attention features, called DLA in this paper. The proposed DLA module consists of two blocks, including the self-attention block and attentive pooling block, which both embed an enhanced position encoding block. The DLA module could be easily embedded into various network architectures for point cloud segmentation, naturally resulting in a new 3D semantic segmentation network with an encoder-decoder architecture, called DLA-Net in this work. 
Extensive experimental results on our constructed building facade dataset demonstrate that the proposed DLA-Net achieves better performance than the state-of-the-art methods for semantic segmentation.	翻訳日:2021-06-02 14:17:59 公開日:2021-06-01
# パノDR:屋内シーンの球状パノラマが消滅 PanoDR: Spherical Panorama Diminished Reality for Indoor Scenes ( http://arxiv.org/abs/2106.00446v1 ) ライセンス: Link先を確認	V. Gkitsas, V. Sterzentsenko, N. Zioulis, G. Albanis, D. Zarpalas	(参考訳) 屋内スキャンを民主化する商用$360^\circ$カメラの普及により、内部空間の再設計などの新しい応用への関心が高まっている。縮小現実 (DR) はそのような応用要件を満たし、シーン内の既存のオブジェクトを取り除き、実質的には偽りの塗りつぶしタスクに翻訳する。近年のデータ駆動塗布の進歩は、現実的なサンプルの生成において顕著な進歩を見せているが、現実の地図構造による結果の生成には制約がない。屋内(再計画)アプリケーションにおける「現実」を保つためには,シーンの構造保存が重要である。そこで本研究では,室内シーンの構造を最初に予測し,それを用いて同一シーンの空-背景のみ-表現の再構築を導くモデルを提案する。 DR用に修正されたStructured3Dデータセットのバージョンで、他の最先端の手法をトレーニングして比較し、定量的測定と質的結果の両方において優れた結果を示しているが、より興味深いことに、このアプローチはより高速な収束率を示している。コードとモデルはhttps://vcl3d.github.io/PanoDR/で入手できる。 The rising availability of commercial $360^\circ$ cameras that democratize indoor scanning has increased interest in novel applications, such as interior space re-design. Diminished Reality (DR) fulfills the requirement of such applications, to remove existing objects in the scene, essentially translating this to a counterfactual inpainting task. While recent advances in data-driven inpainting have shown significant progress in generating realistic samples, they are not constrained to produce results with reality mapped structures. To preserve the 'reality' in indoor (re-)planning applications, the scene's structure preservation is crucial. To ensure structure-aware counterfactual inpainting, we propose a model that initially predicts the structure of an indoor scene and then uses it to guide the reconstruction of an empty -- background only -- representation of the same scene. We train and compare against other state-of-the-art methods on a version of the Structured3D dataset modified for DR, showing superior results in both quantitative metrics and qualitative results, but more interestingly, our approach exhibits a much faster convergence rate. Code and models are available at https://vcl3d.github.io/PanoDR/ .	翻訳日:2021-06-02 14:17:37 公開日:2021-06-01
# プロトタイプを用いたセマンティックセグメンテーションの異常検出 Detecting Anomalies in Semantic Segmentation with Prototypes ( http://arxiv.org/abs/2106.00472v1 ) ライセンス: Link先を確認	Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo	(参考訳) 従来のセマンティックセグメンテーションメソッドは、テスト時にトレーニングセットに存在するクラスのみを認識することができる。これは重要な制限であり、特にインテリジェントな自律システムに搭載されたセマンティックセグメンテーションアルゴリズムは、現実的な設定でデプロイされる。システムがトレーニング時に何つのクラスを見たかに関わらず、予期せぬ未知のオブジェクトがテスト時に現れることは避けられない。このような異常を特定することの失敗は、現実世界に配備された場合、そのようなセグメンテーションモデルを備えた自律エージェントの不正確で危険な行動を引き起こす可能性がある。異常セグメンテーション技術の現状は生成モデルを用いており、訓練中に見えないパターンを再構築することができない。しかし、これらのモデルのトレーニングは高価であり、生成されたアーティファクトは誤った異常を生み出す可能性がある。本稿では,異なる経路をとり,プロトタイプ学習による異常セグメンテーションに対処することを提案する。我々の直感では、異常画素はモデルで知られている全てのクラスプロトタイプと異なるものである。学習データから,コサイン類似度に基づく分類器を用いて,軽量にクラスプロトタイプを抽出する。また,StreetHazards実験の結果,計算オーバーヘッドの低減にもかかわらず,従来の手法に比べて大きな差があることがわかった。コードはhttps://github.com/DarioFontanel/PAnSで入手できる。 Traditional semantic segmentation methods can recognize at test time only the classes that are present in the training set. This is a significant limitation, especially for semantic segmentation algorithms mounted on intelligent autonomous systems, deployed in realistic settings. Regardless of how many classes the system has seen at training time, it is inevitable that unexpected, unknown objects will appear at test time. The failure in identifying such anomalies may lead to incorrect, even dangerous behaviors of the autonomous agent equipped with such segmentation model when deployed in the real world. Current state of the art of anomaly segmentation uses generative models, exploiting their incapability to reconstruct patterns unseen during training. However, training these models is expensive, and their generated artifacts may create false anomalies. In this paper we take a different route and we propose to address anomaly segmentation through prototype learning. Our intuition is that anomalous pixels are those that are dissimilar to all class prototypes known by the model. 
We extract class prototypes from the training data in a lightweight manner using a cosine similarity-based classifier. Experiments on StreetHazards show that our approach achieves the new state of the art, with a significant margin over previous works, despite the reduced computational overhead. Code is available at https://github.com/DarioFontanel/PAnS.	翻訳日:2021-06-02 14:17:17 公開日:2021-06-01
# 赤外線小ターゲット検出のためのDense Nested Attention Network Dense Nested Attention Network for Infrared Small Target Detection ( http://arxiv.org/abs/2106.00487v1 ) ライセンス: Link先を確認	Boyang Li, Chao Xiao, Longguang Wang, Yingqian Wang, Zaiping Lin, Miao Li, Wei An, Yulan Guo	(参考訳) 単一フレーム赤外線小ターゲット(SIRST)検出は、小さなターゲットを乱雑な背景から分離することを目的としている。ディープラーニングの進歩により、cnnベースの手法は強力なモデリング能力により、汎用オブジェクト検出に有望な結果をもたらした。しかし、既存のCNNベースの手法は、ネットワーク内のプール層が深い層内のターゲットを失う可能性があるため、赤外線小ターゲットに対して直接適用することはできない。この問題に対処するため,本論文では,高密度なネスト型注意ネットワーク(DNANet)を提案する。具体的には,高レベルかつ低レベルの特徴間のプログレッシブな相互作用を実現するために,高密度ネスト型インタラクティブモジュール(DNIM)を設計する。 DNIMにおける繰り返しの相互作用により、深い層内の赤外線小ターゲットを維持することができる。 DNIMに基づいて,多レベル特徴を適応的に拡張するカスケードチャネルと空間アテンションモジュール(CSAM)を提案する。我々のDNANetでは、小さなターゲットのコンテキスト情報をうまく組み込んで、繰り返し融合と拡張によって完全に活用することができる。さらに,赤外線小目標データセット(nudt-sirst)を開発し,総合的な性能評価を行うための評価指標を提案する。公開と自己開発の両方の実験により,本手法の有効性が示された。本手法は他の最先端手法と比較して,検出確率 (Pd), 偽アラーム率 (Fa), 結合の交叉率 (IoU) の点で, 性能が向上する。 Single-frame infrared small target (SIRST) detection aims at separating small targets from clutter backgrounds. With the advances of deep learning, CNN-based methods have yielded promising results in generic object detection due to their powerful modeling capability. However, existing CNN-based methods cannot be directly applied for infrared small targets since pooling layers in their networks could lead to the loss of targets in deep layers. To handle this problem, we propose a dense nested attention network (DNANet) in this paper. Specifically, we design a dense nested interactive module (DNIM) to achieve progressive interaction among high-level and low-level features. With the repeated interaction in DNIM, infrared small targets in deep layers can be maintained. Based on DNIM, we further propose a cascaded channel and spatial attention module (CSAM) to adaptively enhance multi-level features. With our DNANet, contextual information of small targets can be well incorporated and fully exploited by repeated fusion and enhancement. 
Moreover, we develop an infrared small target dataset (namely, NUDT-SIRST) and propose a set of evaluation metrics to conduct comprehensive performance evaluation. Experiments on both public and our self-developed datasets demonstrate the effectiveness of our method. Compared to other state-of-the-art methods, our method achieves better performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).	翻訳日:2021-06-02 14:16:58 公開日:2021-06-01
# 視覚前訓練作業における自己の多様性と不変性を探る Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task ( http://arxiv.org/abs/2106.00537v1 ) ライセンス: Link先を確認	Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian	(参考訳) 近年,自己指導型学習手法は視覚前訓練において顕著な成功を収めている。各画像の異なる拡張ビューをまとめたり、あるいは他の新しいメカニズムを取り入れることで、教師なしの知識を習得し、事前学習モデルの転送性能を大幅に向上させることができる。しかし、これらの作品は表現の崩壊問題を避けることはできない。つまり、それらは限られた領域のみに焦点を当てたり、画像内の全く異なる領域で抽出された特徴がほぼ同じである。一般に、この問題は、事前学習モデルが画像内の複数の粒度の情報を十分に記述できないため、転送性能の上限がさらに制限される。この問題を軽減するため,本稿では,e-diyにおける多様性と不変性を検討するという,単純かつ効果的なメカニズムを紹介する。 E-DIYは、各拡張ビュー内の最も異なる領域を移動させることで、抽出された領域レベルの特徴の多様性を維持できる。同じ画像の異なる拡張ビューから最も類似した領域を抽出することで、E-DIYは領域レベルの機能の堅牢性を確保することができる。上記の多様性と不変性探索機構から、E-DIYは各画像内の多粒度視覚情報を最大限に抽出する。例えば、COCO上の強力なベースラインであるBYOLに比べて2.1%改善され、R50-C4バックボーンと1X学習スケジュールを微調整したMask R-CNNが実現された。 Recently, self-supervised learning methods have achieved remarkable success in visual pre-training task. By simply pulling the different augmented views of each image together or other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models. However, these works still cannot avoid the representation collapse problem, i.e., they only focus on limited regions or the extracted features on totally different regions inside each image are nearly the same. Generally, this problem makes the pre-training models cannot sufficiently describe the multi-grained information inside images, which further limits the upper bound of their transfer performance. To alleviate this issue, this paper introduces a simple but effective mechanism, called Exploring the Diversity and Invariance in Yourself E-DIY. By simply pushing the most different regions inside each augmented view away, E-DIY can preserve the diversity of extracted region-level features. By pulling the most similar regions from different augmented views of the same image together, E-DIY can ensure the robustness of region-level features. 
Benefiting from the above diversity and invariance exploring mechanism, E-DIY maximally extracts the multi-grained visual information inside each image. Extensive experiments on downstream tasks demonstrate the superiority of our proposed approach, e.g., it yields a 2.1% improvement over the strong baseline BYOL on COCO when fine-tuning Mask R-CNN with the R50-C4 backbone and 1X learning schedule.	翻訳日:2021-06-02 14:16:34 公開日:2021-06-01
# 変圧器ネットワークと拡張情報を用いた都市シナリオにおける車両軌跡予測 Predicting Vehicles Trajectories in Urban Scenarios with Transformer Networks and Augmented Information ( http://arxiv.org/abs/2106.00559v1 ) ライセンス: Link先を確認	A. Quintanar, D. Fernández-Llorca, I. Parra, R. Izquierdo, M. A. Sotelo	(参考訳) 道路利用者の行動を理解することは,軌道予測システムの開発に不可欠である。この文脈では、最新の進歩は繰り返しの構造に焦点を合わせ、現場に関わるエージェント間の社会的相互作用を確立している。近年では、変圧器ネットワークに基づく歩行者軌道予測や位置情報を用いた簡易な構造も導入されている。それぞれのエージェントの軌道の個々のモデリングを、複雑な相互作用項を使わずに別々に行うことができる。提案モデルでは, 都市シナリオにおける車両軌道予測問題に最大5秒間, 付加データ(位置と方向)を付加することにより, これらの単純な構造を利用する。さらに、最近のデータセット(inD, rounD, highD, INTERACTION)を使用して、ハイウェイ、交差点、ラウンドアバウトを含むさまざまなタイプのシナリオ間で、クロスパフォーマンス分析を行う。我々のモデルは最先端の成果を達成し、異なるタイプの都市環境に柔軟で適応可能であることを証明している。 Understanding the behavior of road users is of vital importance for the development of trajectory prediction systems. In this context, the latest advances have focused on recurrent structures, establishing the social interaction between the agents involved in the scene. More recently, simpler structures have also been introduced for predicting pedestrian trajectories, based on Transformer Networks, and using positional information. They allow the individual modelling of each agent's trajectory separately without any complex interaction terms. Our model exploits these simple structures by adding augmented data (position and heading), and adapting their use to the problem of vehicle trajectory prediction in urban scenarios with prediction horizons of up to 5 seconds. In addition, a cross-performance analysis is performed between different types of scenarios, including highways, intersections and roundabouts, using recent datasets (inD, rounD, highD and INTERACTION). Our model achieves state-of-the-art results and proves to be flexible and adaptable to different types of urban contexts.	翻訳日:2021-06-02 14:16:10 公開日:2021-06-01
# メタプロトタイプを用いたプレエンハンスフットショットセグメンテーション Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes ( http://arxiv.org/abs/2106.00572v1 ) ライセンス: Link先を確認	Jian-Wei Zhang, Lei Lv, Yawei Luo, Hao-Zhe Feng, Yi Yang, Wei Chen	(参考訳) Few-shot segmentation (FSS)のパフォーマンスは、エピソードトレーニングとクラスワイドプロトタイプの導入によって広範囲に向上している。しかし,FSS問題は,(1)モデルがタスク非関連情報に気を散らすこと,(2)単一プロトタイプの表現能力に制限があること,(3)クラス関連プロトタイプは基本クラスの事前の知識を無視すること,の3つの制約により,依然として困難なままである。これらの制約に対処するために,メタプロトタイプを用いた事前拡張ネットワークを提案する。 pre-enhanced networkは、機能抽出における support and query (pseudo-) ラベルを活用し、モデルが前景オブジェクトのタスク関連の特徴に焦点を合わせ、教師付き知識の欠如により多くのノイズを抑制する。さらに,階層的特徴をエンコードし,クラスに依存しない構造情報を学習するために,複数のメタプロトタイプを導入する。階層的特徴は決定境界を強調表示し,ハードピクセルに着目し,基本クラスから学習した構造情報は新規クラスの事前知識として扱われる。実験の結果, PASCAL-$5^i$およびCOCO-$20^i$では平均IoUスコアが60.79%, 41.16%となり, 5ショット設定では3.49%, 5.64%向上した。さらに,上記2つのベンチマークにおいて,5ショット精度を3.73%,10.32%向上させた。このメソッドのソースコードはhttps://github.com/Jarvis73/PEMPで入手できます。 Few-shot segmentation (FSS) performance has been extensively promoted by introducing episodic training and class-wise prototypes. However, the FSS problem remains challenging due to three limitations: (1) Models are distracted by task-unrelated information; (2) The representation ability of a single prototype is limited; (3) Class-related prototypes ignore the prior knowledge of base classes. We propose the Prior-Enhanced network with Meta-Prototypes to tackle these limitations. The prior-enhanced network leverages the support and query (pseudo-) labels in feature extraction, which guides the model to focus on the task-related features of the foreground objects, and suppress much noise due to the lack of supervised knowledge. Moreover, we introduce multiple meta-prototypes to encode hierarchical features and learn class-agnostic structural information. The hierarchical features help the model highlight the decision boundary and focus on hard pixels, and the structural information learned from base classes is treated as the prior knowledge for novel classes. 
Experiments show that our method achieves mean-IoU scores of 60.79% and 41.16% on PASCAL-$5^i$ and COCO-$20^i$, outperforming the state-of-the-art method by 3.49% and 5.64% in the 5-shot setting. Moreover, compared with the 1-shot results, our method improves 5-shot accuracy by 3.73% and 10.32% on the above two benchmarks. The source code of our method is available at https://github.com/Jarvis73/PEMP.	翻訳日:2021-06-02 14:15:53 公開日:2021-06-01
# 半教師付きセマンティックセグメンテーションのためのロバスト相互学習 Robust Mutual Learning for Semi-supervised Semantic Segmentation ( http://arxiv.org/abs/2106.00609v1 ) ライセンス: Link先を確認	Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Fang Wen	(参考訳) 最近の半教師付き学習(SSL)法は一般に擬似ラベリングに基づいている。 SSL性能は擬似ラベルの品質に大きく影響されているため,疑似監視における雑音を効果的に抑制するための相互学習が提案されている。本研究では,従来のアプローチを2つの側面で改善する頑健な相互学習を提案する。まず、バニラ相互学習者は、モデルが均質な知識を学ぶために収束するかもしれない結合の問題に苦しむ。この問題は,教師同士の直接の交流がないように,教師同士の相互監督を生み出すことによって解決する。また,モデル結合の緩和には強固なデータ拡張,モデルノイズ,異種ネットワークアーキテクチャが不可欠であることを示す。第2に,相互学習はネットワークの擬似ラベル改良能力の活用に失敗していることに気付く。そこで,本研究では,内部知識を活用し,相互指導前の擬似ラベルを明示的に修正する自己認識を導入する。このような自己修正と相互指導によって、学習を通して擬似ラベルの精度が向上する。提案した頑健な相互学習は、低データ状態におけるセマンティックセグメンテーションにおける最先端のパフォーマンスを示す。 Recent semi-supervised learning (SSL) methods are commonly based on pseudo labeling. Since the SSL performance is greatly influenced by the quality of pseudo labels, mutual learning has been proposed to effectively suppress the noises in the pseudo supervision. In this work, we propose robust mutual learning that improves the prior approach in two aspects. First, the vanilla mutual learners suffer from the coupling issue that models may converge to learn homogeneous knowledge. We resolve this issue by introducing mean teachers to generate mutual supervisions so that there is no direct interaction between the two students. We also show that strong data augmentations, model noises and heterogeneous network architectures are essential to alleviate the model coupling. Second, we notice that mutual learning fails to leverage the network's own ability for pseudo label refinement. Therefore, we introduce self-rectification that leverages the internal knowledge and explicitly rectifies the pseudo labels before the mutual teaching. Such self-rectification and mutual teaching collaboratively improve the pseudo label accuracy throughout the learning. The proposed robust mutual learning demonstrates state-of-the-art performance on semantic segmentation in low-data regime.	翻訳日:2021-06-02 14:15:26 公開日:2021-06-01
# 独自の対応をブートストラップする Bootstrap Your Own Correspondences ( http://arxiv.org/abs/2106.00677v1 ) ライセンス: Link先を確認	Mohamed El Banani, Justin Johnson	(参考訳) 幾何学的特徴抽出はポイントクラウド登録パイプラインの重要なコンポーネントである。最近の研究は、より良くよりコンパクトな3d機能を学ぶために教師あり学習をどのように活用できるかを実証している。しかし、これらのアプローチは地道アノテーションに依存しているためスケーラビリティは制限される。本稿では,RGB-Dビデオから視覚的・幾何学的特徴を学習する自己教師型アプローチBYOCを提案する。我々の重要な観察は、ランダムに初期化されたcnnは私たちに良い対応を提供し、視覚と幾何学の両方の特徴の学習をブートストラップできるということです。我々のアプローチは、ポイントクラウド登録からの古典的なアイデアと、より最近の表現学習アプローチを組み合わせたものです。室内シーンデータセットに対するアプローチを評価し,従来型および学習済みのディスクリプタを上回りながら,現在の最先端の教師付きアプローチと競合することを見出した。 Geometric feature extraction is a crucial component of point cloud registration pipelines. Recent work has demonstrated how supervised learning can be leveraged to learn better and more compact 3D features. However, those approaches' reliance on ground-truth annotation limits their scalability. We propose BYOC: a self-supervised approach that learns visual and geometric features from RGB-D video without relying on ground-truth pose or correspondence. Our key observation is that randomly-initialized CNNs readily provide us with good correspondences; allowing us to bootstrap the learning of both visual and geometric features. Our approach combines classic ideas from point cloud registration with more recent representation learning approaches. We evaluate our approach on indoor scene datasets and find that our method outperforms traditional and learned descriptors, while being competitive with current state-of-the-art supervised approaches.	翻訳日:2021-06-02 14:15:09 公開日:2021-06-01
# 3次元物体のデータ駆動シャドウグラフシミュレーション Data-Driven Shadowgraph Simulation of a 3D Object ( http://arxiv.org/abs/2106.00317v1 ) ライセンス: Link先を確認	Anna Willmann, Patrick Stiller, Alexander Debus, Arie Irman, Richard Pausch, Yen-Yu Chang, Michael Bussmann, Nico Hoffmann	(参考訳) 本研究では,プラズマシャドウグラフのためのディープニューラルネットワークに基づく代理モデルを提案する。数値計算法で必要となるすべての先行する電場を計算することなく、所定の時間で電場を近似できる計算コストの低い投射型代理モデルを用いて数値コードを置換する。これは、プロジェクションベースサロゲートモデルにより、与えられた計算領域の任意の点と構成において、3次元偏微分方程式(3次元波動方程式)の解を完全なシミュレーションを実行することなく復元することができることを意味する。このモデルでは、シミュレーションパラメータの狭い範囲におけるデータの補間問題において、再構成の質が良く、大規模な入力データに使用することができる。 In this work we propose a deep neural network based surrogate model for a plasma shadowgraph - a technique for visualization of perturbations in a transparent medium. We are substituting the numerical code by a computationally cheaper projection based surrogate model that is able to approximate the electric fields at a given time without computing all preceding electric fields as required by numerical methods. This means that the projection based surrogate model allows to recover the solution of the governing 3D partial differential equation, 3D wave equation, at any point of a given compute domain and configuration without the need to run a full simulation. This model has shown a good quality of reconstruction in a problem of interpolation of data within a narrow range of simulation parameters and can be used for input data of large size.	翻訳日:2021-06-02 14:14:07 公開日:2021-06-01
# 直交分位回帰による条件範囲の改善 Improving Conditional Coverage via Orthogonal Quantile Regression ( http://arxiv.org/abs/2106.00394v1 ) ライセンス: Link先を確認	Shai Feldman, Stephen Bates, Yaniv Romano	(参考訳) 本研究では,特徴空間の全領域にわたるユーザ指定カバレッジレベルを持つ予測区間を生成する手法を開発した。このタスクの典型的なアプローチは、質的回帰を伴う条件付き四分位数を推定することであり、有限サンプルでは正確ではないものの、大きなサンプル限界のカバレッジが正しいことがよく知られている。従来の量子レグレッションは条件付きカバレッジが低いという実験で明らかになった。これを解決するために,区間の大きさと誤検出の指標との独立性を促進するために,損失関数を変更する。真の条件量子について、これらの2つの量は独立(直交)であるため、修正された損失関数は引き続き有効である。さらに,いくつかの指標で評価されるように,修正損失関数が条件付きカバレッジを改善することを実証的に示す。また,間隔の大きさと誤発見の指標との依存性の強さを調べることで条件付きカバレッジをチェックする2つの新しい指標も導入した。 We develop a method to generate prediction intervals that have a user-specified coverage level across all regions of feature-space, a property called conditional coverage. A typical approach to this task is to estimate the conditional quantiles with quantile regression -- it is well-known that this leads to correct coverage in the large-sample limit, although it may not be accurate in finite samples. We find in experiments that traditional quantile regression can have poor conditional coverage. To remedy this, we modify the loss function to promote independence between the size of the intervals and the indicator of a miscoverage event. For the true conditional quantiles, these two quantities are independent (orthogonal), so the modified loss function continues to be valid. Moreover, we empirically show that the modified loss function leads to improved conditional coverage, as evaluated by several metrics. We also introduce two new metrics that check conditional coverage by looking at the strength of the dependence between the interval size and the indicator of miscoverage.	翻訳日:2021-06-02 14:13:54 公開日:2021-06-01
# 半教師付きモデルは強い教師なしドメイン適応学習者である Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners ( http://arxiv.org/abs/2106.00417v1 ) ライセンス: Link先を確認	Yabin Zhang, Haojian Zhang, Bin Deng, Shuai Li, Kui Jia, Lei Zhang	(参考訳) unsupervised domain adaptation (UDA) と semi-supervised learning (SSL) の2つは、機械学習における高価な手動アノテーションを減らすための典型的な戦略である。ターゲットタスクの効果的なモデルを学ぶために、UDAは利用可能なラベル付きソースデータを使用し、ターゲットドメイン内のラベルなしサンプルと異なる分布を持つ可能性がある。 UDAとSSLは全く異なる戦略のように見えるが、それらはタスクの目的とソリューションに関して密接に関連しており、SSLはUDA問題の特別なケースである。この結果に基づき、SSLメソッドが UDA タスクで機能するかどうかをさらに調査する。 UDAベンチマークに8つの代表的なSSLアルゴリズムを適用することで、SSLメソッドが強力なUDA学習者であることが分かる。特に、最先端のSSLメソッドは、DomainNetの挑戦的なUDAベンチマークにおいて既存のUDAメソッドよりも大幅に優れており、最先端のUDAメソッドはSSL技術によってさらに強化される可能性がある。したがって,今後のUDA研究のベースラインとしてSSLメソッドを採用することを推奨し,UDAとSSLの関係を明らかにすることによって,今後のUDA開発に光を当てることが期待できる。コードは https://github.com/YBZh で入手できる。 Unsupervised domain adaptation (UDA) and semi-supervised learning (SSL) are two typical strategies to reduce expensive manual annotations in machine learning. In order to learn effective models for a target task, UDA utilizes the available labeled source data, which may have different distributions from unlabeled samples in the target domain, while SSL employs few manually annotated target samples. Although UDA and SSL are seemingly very different strategies, we find that they are closely related in terms of task objectives and solutions, and SSL is a special case of UDA problems. Based on this finding, we further investigate whether SSL methods work on UDA tasks. By adapting eight representative SSL algorithms on UDA benchmarks, we show that SSL methods are strong UDA learners. Especially, state-of-the-art SSL methods significantly outperform existing UDA methods on the challenging UDA benchmark of DomainNet, and state-of-the-art UDA methods could be further enhanced with SSL techniques. 
We thus recommend that SSL methods be employed as baselines in future UDA studies and expect that the revealed relationship between UDA and SSL could shed light on future UDA development. Code is available at https://github.com/YBZh.	翻訳日:2021-06-02 14:13:38 公開日:2021-06-01
# 雑音ラベル学習における損失の不確実性を考慮したサンプル選択 Sample Selection with Uncertainty of Losses for Learning with Noisy Labels ( http://arxiv.org/abs/2106.00445v1 ) ライセンス: Link先を確認	Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama	(参考訳) ノイズの多いラベルで学習する際、サンプル選択アプローチは非常に人気があり、小さなロスデータをトレーニング中に正しくラベル付けされているとみなす。しかし、ノイズラベルでトレーニングされたモデルに基づいて、損失はオンザフライで発生し、大容量のデータはおそらく誤りである。 a) ラベルが間違っていて、その損失が他のデータよりも遅くなります。なぜなら、ディープニューラルネットワークが"リーンパターンファースト"であるからです; (b) 不足しているデータのグループに属しており、まだ選択されていないからです。本稿では,損失の点推定ではなく区間推定を用いて損失の不確実性を取り入れ,分布自由濃度の不等式から生じる損失の信頼区間の低境界をサンプル選択に用いる。このようにして、大容量だが少ない選択されたデータも試してみると、試行後の不確実性によって損失が効果的に減少するかどうかを見極めることにより、(a)と(b)を区別できる。結果として、正しくラベル付けされているが、一見すると誤ってラベル付けされているように見える、未表示のデータをより深く探索することができる。実験により,提案手法はベースラインよりも優れ,幅広いラベルノイズタイプに対して頑健であることが示された。 In learning with noisy labels, the sample selection approach is very popular, which regards small-loss data as correctly labeled during training. However, losses are generated on-the-fly based on the model being trained with noisy labels, and thus large-loss data are likely but not certainly to be incorrect. There are actually two possibilities of a large-loss data point: (a) it is mislabeled, and then its loss decreases slower than other data, since deep neural networks "learn patterns first"; (b) it belongs to an underrepresented group of data and has not been selected yet. In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses, where lower bounds of the confidence intervals of losses derived from distribution-free concentration inequalities, but not losses themselves, are used for sample selection. In this way, we also give large-loss but less selected data a try; then, we can better distinguish between the cases (a) and (b) by seeing if the losses effectively decrease with the uncertainty after the try. 
As a result, we can better explore underrepresented data that are correctly labeled but seem to be mislabeled at first glance. Experiments demonstrate that the proposed method is superior to baselines and robust to a broad range of label noise types.	翻訳日:2021-06-02 14:13:19 公開日:2021-06-01
# オープンセット雑音ラベルを用いた学習用インスタンス補正 Instance Correction for Learning with Open-set Noisy Labels ( http://arxiv.org/abs/2106.00455v1 ) ライセンス: Link先を確認	Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama	(参考訳) オープンセットノイズラベルの問題は、トレーニングデータの一部が真のクラスを含まないラベル空間を持っていることを意味する。オープンセットノイズラベルを学習するには、同じラベルスペースを共有するためのトレーニングデータとテストデータが必要であるため、例えば損失補正やラベル修正といった多くのアプローチでは、そのようなオープンセットノイズラベルをうまく処理できない。したがって、最先端の手法では、サンプル選択アプローチを用いてオープンセットノイズラベルを処理し、ネットワークパラメータ更新のためにノイズデータからクリーンなデータを選択する。廃棄されたデータは誤ってラベル付けされ、トレーニングに参加しない。このようなアプローチは一見すると直感的で合理的です。しかし、自然に「そのようなデータはトレーニング中にのみ破棄できるのか」という疑問を提起することができる。本稿では,その答えがノーであることを示す。具体的には、廃棄されたデータのインスタンスは、一般化のための有意義な情報から成り得ると論じる。このため、そのようなデータを捨てるのではなく、破棄されたデータのインスタンスを変更するためにインスタンス修正を使用することで、廃棄されたデータの予測をラベルと一致させる。インスタンス修正は、ターゲットの敵攻撃によって実行される。修正されたデータは、一般化を支援するためにトレーニングに利用されます。分析結果に加えて、我々の主張を正当化するための一連の実証的証拠が提供される。 The problem of open-set noisy labels denotes that part of training data have a different label space that does not contain the true class. Lots of approaches, e.g., loss correction and label correction, cannot handle such open-set noisy labels well, since they need training data and test data to share the same label space, which does not hold for learning with open-set noisy labels. The state-of-the-art methods thus employ the sample selection approach to handle open-set noisy labels, which tries to select clean data from noisy data for network parameters updates. The discarded data are seen to be mislabeled and do not participate in training. Such an approach is intuitive and reasonable at first glance. However, a natural question could be raised "can such data only be discarded during training?". In this paper, we show that the answer is no. Specifically, we discuss that the instances of discarded data could consist of some meaningful information for generalization. 
For this reason, we do not abandon such data, but use instance correction to modify the instances of the discarded data, which makes the predictions for the discarded data consistent with given labels. Instance correction is performed by targeted adversarial attacks. The corrected data are then exploited for training to help generalization. In addition to the analytical results, a series of empirical evidence is provided to justify our claims.	翻訳日:2021-06-02 14:12:55 公開日:2021-06-01
# 決定木を用いた解剖学的対象構造実測値の自動解析 Automated Grading of Anatomical Objective Structured Practical Exams Using Decision Trees ( http://arxiv.org/abs/2106.00502v1 ) ライセンス: Link先を確認	Jason Bernard, Ranil Sonnadara, Anthony N. Saraco, Josh P. Mitchell, Alex B. Bak, Ilana Bayer, Bruce C. Wainman	(参考訳) 客観的構造化実用試験(ospe)は、解剖学的知識を評価するための効果的で堅牢だが資源集約的な手法である。ほとんどのOSPEは短い回答やブランクのスタイルの質問を使っているため、このフォーマットは試験をマークするためにコンテンツに精通した多くの人々を必要としている。しかし、解剖学と生理学のコースのオンライン配信の頻度が高まると、学生は対面学習セッションで受けるOSPEの実践を失う可能性がある。本研究の目的は、知的オンラインOSPE学習システムを構築するための第1ステップとして、OSPE質問のマーク付けにおいて、決定木(DT)の精度をテストすることである。この研究は、McMaster大学健康科学部(HTHSCI 2FF3/2LL3/1D06)の解剖学と生理学のコースから、2020年冬期最終OSPEの結果をデータセットとして使用した。データセットの90%は10倍の検証アルゴリズムで54の質問に対してDTをトレーニングするために使われました。それぞれのDTは、学生が書いた正しい回答に現れるユニークな単語で構成されていた。残りの10%のデータセットは生成されたdtsでマークされた。 DTで示される回答が職員や教員の回答と比較された場合、DTは54の質問に対して平均94.49%の精度を達成した。これは、DTのような機械学習アルゴリズムがOSPEのグレーティングに非常に効果的な選択肢であり、インテリジェントでオンラインのOSPE学習システムの開発に適していることを示唆している。 An Objective Structured Practical Examination (OSPE) is an effective and robust, but resource-intensive, means of evaluating anatomical knowledge. Since most OSPEs employ short answer or fill-in-the-blank style questions, the format requires many people familiar with the content to mark the exams. However, the increasing prevalence of online delivery for anatomy and physiology courses could result in students losing the OSPE practice that they would receive in face-to-face learning sessions. The purpose of this study was to test the accuracy of Decision Trees (DTs) in marking OSPE questions as a potential first step to creating an intelligent, online OSPE tutoring system. The study used the results of the winter 2020 semester final OSPE from McMaster University's anatomy and physiology course in the Faculty of Health Sciences (HTHSCI 2FF3/2LL3/1D06) as the data set. Ninety percent of the data set was used in a 10-fold validation algorithm to train a DT for each of the 54 questions. 
Each DT comprised unique words that appeared in correct, student-written answers. The remaining 10% of the data set was marked by the generated DTs. When the answers marked by the DT were compared to the answers marked by staff and faculty, the DT achieved an average accuracy of 94.49% across all 54 questions. This suggests that machine learning algorithms such as DTs are a highly effective option for OSPE grading and are suitable for the development of an intelligent, online OSPE tutoring system.	翻訳日:2021-06-02 14:12:36 公開日:2021-06-01
# Care Label の概念: 信頼とリソースを意識した機械学習のための認定スイート The Care Label Concept: A Certification Suite for Trustworthy and Resource-Aware Machine Learning ( http://arxiv.org/abs/2106.00512v1 ) ライセンス: Link先を確認	Katharina Morik and Helena Kotthaus and Lukas Heppe and Danny Heinrich and Raphael Fischer and Andreas Pauly and Nico Piatkowski	(参考訳) 機械学習アプリケーションはユビキタスになった。これにより、機械学習を信頼できるものにする努力が増えた。説明可能な公正なAIはすでに成熟しています。彼らは知識のあるユーザとアプリケーションエンジニアに対処する。方法や学習したモデルを理解するのに時間を投資したくない人のために、私たちはケアラベルを提供しています。これは、例えばファクトシートやモデルカードのような記述をエンドユーザーに適した形式に変換する。一方、ケアラベルは、保証が守られているかどうかをテストする認証スイートの結果である。本稿では,認証スイートを用いて2つの実験を行った。ひとつは、マルコフランダムフィールド(MRF)の設定のためのケアラベルを示す。 MRFの基本理論に基づいて、それぞれの選択は、例えば表現性と信頼性のような静的特性の特定の評価につながる。さらに、実装をテストし、動的特性を産出するリソース消費量を計測する。この2段階の手順に続いて、ディープニューラルネットワーク(DNN)モデルを認定する別の実験が行われる。そこで、特定のモデルとデータセットに基づいて、文献から静的な特性を描画する。第2のレベルでは、特定の攻撃に対する堅牢性の測定を提供する実験が生成される。 ResNet-18 と MobileNetV3 が ImageNet に適用した。 Machine learning applications have become ubiquitous. This has led to an increased effort of making machine learning trustworthy. Explainable and fair AI have already matured. They address knowledgeable users and application engineers. For those who do not want to invest time into understanding the method or the learned model, we offer care labels: easy to understand at a glance, allowing for method or model comparisons, and, at the same time, scientifically well-based. On one hand, this transforms descriptions as given by, e.g., Fact Sheets or Model Cards, into a form that is well-suited for end-users. On the other hand, care labels are the result of a certification suite that tests whether stated guarantees hold. In this paper, we present two experiments with our certification suite. One shows the care labels for configurations of Markov random fields (MRFs). Based on the underlying theory of MRFs, each choice leads to its specific rating of static properties like, e.g., expressivity and reliability. 
In addition, the implementation is tested and resource consumption is measured yielding dynamic properties. This two-level procedure is followed by another experiment certifying deep neural network (DNN) models. There, we draw the static properties from the literature on a particular model and data set. At the second level, experiments are generated that deliver measurements of robustness against certain attacks. We illustrate this by ResNet-18 and MobileNetV3 applied to ImageNet.	翻訳日:2021-06-02 14:12:08 公開日:2021-06-01
# IID-GAN : モード崩壊の正規化のためのIIDサンプリング視点 IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse ( http://arxiv.org/abs/2106.00563v1 ) ライセンス: Link先を確認	Liangliang Shi, Yang Li, Junchi Yan	(参考訳) その成功にもかかわらず、GAN(Generative Adversarial Network)は依然としてモード崩壊に悩まされており、ジェネレータは潜在変数をターゲット分布の部分的なモードにしかマッピングできない。本稿では,この問題を,独立かつ同一分布(IID)サンプリング視点で解析し,正規化しようと試み,対象空間(すなわち,対象空間)における生成のためのID特性を保持することを強調する。実際のデータ) モード崩壊を自然に回避できる。これは機械学習の実際のデータに対する基本的なiid仮定に基づいている。しかし、ソースサンプル $\mathbf{z}$ は IID に従うが、ターゲット生成 $G(\mathbf{z})$ は必ずしも IID ではないかもしれない。この観測に基づいて、我々は、生成をIIDに正規化する方法として、生成からの逆ソースと潜在空間における標準ガウス分布との近接性を促進するために、新たな損失を与える。論理は、対象データから戻る逆サンプルも、ソース分散のためのiidであるべきです。合成データと実世界のデータの両方の実験は、我々のモデルの優越性と堅牢性を示している。 Despite its success, generative adversarial networks (GANs) still suffer from mode collapse, namely the generator can only map latent variables to a partial set of modes of the target distribution. In this paper, we analyze and try to regularize this issue with an independent and identically distributed (IID) sampling perspective and emphasize that holding the IID property for generation in target space (i.e. real data) can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples $\mathbf{z}$ obey IID, the target generation $G(\mathbf{z})$ may not necessarily be IID. Based on this observation, we provide a new loss to encourage the closeness between the inverse source from generation, and a standard Gaussian distribution in the latent space, as a way of regularizing the generation to be IID. The logic is that the inverse samples back from target data should also be IID for source distribution. Experiments on both synthetic and real-world data show the superiority and robustness of our model.	翻訳日:2021-06-02 14:11:54 公開日:2021-06-01
# 短距離オフラインRLを用いた勧告システムの長期化 Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL ( http://arxiv.org/abs/2106.00589v1 ) ライセンス: Link先を確認	Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan	(参考訳) セッションベースのレコメンデーションシナリオについて検討し、シーケンシャルなインタラクションの間、ユーザに対してアイテムを推薦し、長期的なユーティリティを改善する。長期的なメトリクスの最適化は、学習信号(推奨が望ましい目標を達成したかどうか)がシステムとの他のユーザインタラクションによって遅延して確立されるため、難しい。クリックのような即時測定可能なプロキシは、長期的な指標とのミスアライメントによる最適以下の推奨につながる可能性がある。多くの研究がセッションベースレコメンデーションにエピソード強化学習(RL)技術を適用しているが、これらの手法はセッション間でのユーザ意図の変動を考慮していない。我々は,セッション間におけるポリシ誘起分布シフトを近似する新しいバッチrlアルゴリズムである short horizon policy improvement (shpi) を開発した。 SHPIの水平超パラメータを変化させることで、RL文献でよく知られた政策改善スキームを復元する。 4つのレコメンデーションタスクの実証結果から、SHPIは行列係数化、オフライン帯域幅、オフラインRLベースラインよりも優れていることが示された。また,重み付き回帰オラクルを用いた安定かつ効率的な実装も提供する。 We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric. Many works have applied episodic reinforcement learning (RL) techniques for session-based recommendation but these methods do not account for policy-induced drift in user intent across sessions. We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions. By varying the horizon hyper-parameter in SHPI, we recover well-known policy improvement schemes in the RL literature. Empirical results on four recommendation tasks show that SHPI can outperform matrix factorization, offline bandits, and offline RL baselines. 
We also provide a stable and computationally efficient implementation using weighted regression oracles.	翻訳日:2021-06-02 14:11:37 公開日:2021-06-01
# 解毒剤データを用いたフェアクラスタリング Fair Clustering Using Antidote Data ( http://arxiv.org/abs/2106.00600v1 ) ライセンス: Link先を確認	Anshuman Chhabra, Adish Singla, Prasant Mohapatra	(参考訳) クラスタリングアルゴリズムは多くの現代のデータサイエンスアプリケーションに広く利用されている。これにより、クラスタリングアルゴリズムの出力を公平にする必要がある。伝統的に、クラスタリングアルゴリズムに対する新しいフェアアルゴリズムの変種は、フェアネスの特定の概念のために開発されている。しかし、アプリケーションコンテキストによっては、フェアネスの定義が異なる場合もあります。その結果、クラスタリングアルゴリズムとフェアネス定義の組み合わせ毎に、新しいアルゴリズムと分析を提案する必要がある。さらに、新しいアルゴリズムは現実世界のシステムにデプロイするために再実装される必要がある。したがって、クラスタリングにおける公正性に対する代替的なアプローチとして、アンチドテデータと呼ばれる少数のデータポイントで元のデータセットを増強する手法を提案する。この新しいデータセット上でクラスタリングが行われると、選択されたクラスタリングアルゴリズムとフェアネス定義に対して出力が公正になる。我々はこれを、任意の中心的クラスタリングアルゴリズムと公平性の概念に対応できる一般的な二段階最適化問題として定式化する。次に、異なる問題設定に対するこの二段階最適化のアプローチを分類する。異なるクラスタリングアルゴリズムと公平性の概念に関する広範囲な実験により、我々のアルゴリズムは、非常に少ない反ドートデータを追加することで、多くの現実世界のデータセットで所望の公平性を達成できることが示された。また,本アルゴリズムは,他の最先端のフェアクラスタリングアルゴリズムと比較して,フェアネスコストと競合クラスタリング性能の低減を実現する。 Clustering algorithms are widely utilized for many modern data science applications. This motivates the need to make outputs of clustering algorithms fair. Traditionally, new fair algorithmic variants to clustering algorithms are developed for specific notions of fairness. However, depending on the application context, different definitions of fairness might need to be employed. As a result, new algorithms and analysis need to be proposed for each combination of clustering algorithm and fairness definition. Additionally, each new algorithm would need to be reimplemented for deployment in a real-world system. Hence, we propose an alternate approach to fairness in clustering where we augment the original dataset with a small number of data points, called antidote data. When clustering is undertaken on this new dataset, the output is fair, for the chosen clustering algorithm and fairness definition. We formulate this as a general bi-level optimization problem which can accommodate any center-based clustering algorithms and fairness notions. 
We then categorize approaches for solving this bi-level optimization for different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets with a very small percentage of antidote data added. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.	翻訳日:2021-06-02 14:11:17 公開日:2021-06-01
# 機能的オブジェクト指向ネットワークを用いたロボットタスク実行へのロードマップ A Road-map to Robot Task Execution with the Functional Object-Oriented Network ( http://arxiv.org/abs/2106.00158v1 ) ライセンス: Link先を確認	David Paulius, Alejandro Agostini, Yu Sun and Dongheui Lee	(参考訳) ロボットのための知識グラフ表現として関数型オブジェクト指向ネットワーク(foon)が導入された。 FOONは、二部グラフの形で、ロボットの環境やタスクに対する理解に関係のある象徴的あるいは高レベルな情報を、人間の行動理解を反映した形で含んでいる。本稿では,foonの今後の開発に向けたロードマップと,そのタスク計画のためのロボットシステムへの応用,および実演からの知識獲得について概説する。本研究では,ロボットと人間の教師が実世界のシナリオにおいて,FOONの既存の知識を協調的に強化し,実証された動作を再現し,与えられた操作問題を解くために必要なスキルをロボットに教えるための,予備的なアイデアを提案する。 Following work on joint object-action representations, the functional object-oriented network (FOON) was introduced as a knowledge graph representation for robots. Taking the form of a bipartite graph, a FOON contains symbolic or high-level information that would be pertinent to a robot's understanding of its environment and tasks in a way that mirrors human understanding of actions. In this work, we outline a road-map for future development of FOON and its application in robotic systems for task planning as well as knowledge acquisition from demonstration. We propose preliminary ideas to show how a FOON can be created in a real-world scenario with a robot and human teacher in a way that can jointly augment existing knowledge in a FOON and teach a robot the skills it needs to replicate the demonstrated actions and solve a given manipulation problem.	翻訳日:2021-06-02 14:10:47 公開日:2021-06-01
# マルチエージェント強化学習のためのshapley counterfactualcredits Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2106.00285v1 ) ライセンス: Link先を確認	Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu and Jun Xiao	(参考訳) 分散実行による集中訓練(CTDE)は、協調的マルチエージェント強化学習(MARL)設定において一般的なパラダイムであり、多くの実アプリケーションで広く利用されている。トレーニングプロセスにおける大きな課題の1つは、グローバルな報酬に応じて各エージェントの貢献を推論することを目的としたクレジット割り当てである。既存のクレジット割当手法は、結合値関数を個々の値関数に分解するか、局所的な観測と行動がグローバルな値関数に与える影響を測定することに焦点を当てている。これらのアプローチは、複数のエージェント間の複雑な相互作用を十分に考慮していないため、クレジットの割り当てが不適当であり、MARL上でのメディカルな結果をもたらす。本稿では,エージェントの連立を考慮に入れた明示的なクレジット割当手法であるshapley counterfactual credit assignmentを提案する。具体的には、shapley値とその望ましい特性は、エージェントの組み合わせを信用するためにdeep marlで活用され、各エージェントの個々のクレジットを見積もる能力を与えてくれます。この能力にもかかわらず、主な技術的困難は、エージェントの数として要因的に成長するShapley Valueの計算複雑性にある。代わりにモンテカルロサンプリングによる近似法を用いて,その有効性を維持しつつ,サンプルの複雑さを低減する。異なるシナリオにわたるStarCraft IIベンチマークにおいて,本手法の評価を行った。本手法は,既存の協調的marlアルゴリズムを著しく上回り,特に困難度の高いタスクにおいて,最先端のマージンを達成する。 Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contributions of each agent according to the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. 
Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combination of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of the Shapley Value, which grows factorially with the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art results, with especially large margins on the more difficult tasks.	翻訳日:2021-06-02 14:10:32 公開日:2021-06-01
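The factorial cost mentioned in the abstract above, and the Monte Carlo workaround, can be illustrated with a small sketch. This is not the authors' implementation: `exact_shapley`, `monte_carlo_shapley`, and the three-agent `toy_value` reward are hypothetical names and data chosen only to show permutation sampling for Shapley values in plain Python.

```python
import random
from itertools import permutations
from math import factorial


def exact_shapley(n_agents, value):
    """Exact Shapley values: average marginal contributions over all n! orderings."""
    phi = [0.0] * n_agents
    for order in permutations(range(n_agents)):
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            phi[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return [p / factorial(n_agents) for p in phi]


def monte_carlo_shapley(n_agents, value, n_samples=2000, seed=0):
    """Approximate Shapley values from n_samples random orderings instead of all n!."""
    rng = random.Random(seed)
    phi = [0.0] * n_agents
    agents = list(range(n_agents))
    for _ in range(n_samples):
        rng.shuffle(agents)  # one uniformly random ordering of the agents
        coalition = frozenset()
        for agent in agents:
            with_agent = coalition | {agent}
            phi[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return [p / n_samples for p in phi]


def toy_value(coalition):
    """Hypothetical team reward: agents 0 and 1 only score together; agent 2 scores alone."""
    reward = 0.0
    if 0 in coalition and 1 in coalition:
        reward += 4.0
    if 2 in coalition:
        reward += 1.0
    return reward


exact = exact_shapley(3, toy_value)
approx = monte_carlo_shapley(3, toy_value)
```

In this toy game the exact credits split the joint reward of agents 0 and 1 evenly, and the sampled estimate should approach the same split as `n_samples` grows. The paper applies the idea inside deep MARL with learned value functions, which this sketch does not attempt.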
# 統一トランスフォーマーによる多言語音声翻訳:huawei noah's ark lab at iwslt 2021 Multilingual Speech Translation with Unified Transformer: Huawei Noah's Ark Lab at IWSLT 2021 ( http://arxiv.org/abs/2106.00197v1 ) ライセンス: Link先を確認	Xingshan Zeng, Liangyou Li and Qun Liu	(参考訳) 本稿では,Huawei Noah の Ark Lab から IWSLT 2021 Multilingual Speech Translation (MultiST) タスクに送信されたシステムについて述べる。マルチストモデルでは,異なるモダリティ(音声とテキスト)と異なるタスク(音声認識,機械翻訳,音声翻訳)からのデータを活用し,モデルの能力を高めるために,統一トランスフォーマアーキテクチャを用いる。具体的には、まず音声とテキストをそれぞれ異なる特徴抽出器に入力し、音響的特徴とテキスト的特徴を抽出する。次に、これらの機能は共有エンコーダ-デコーダアーキテクチャによって処理される。マルチタスク学習,タスクレベルのカリキュラム学習,データ拡張など,パフォーマンス向上にいくつかのトレーニング手法を適用する。最終システムは教師付き言語ペアのバイリンガルベースラインよりもかなり良い結果を得ることができ,ゼロショット言語ペアでは合理的な結果が得られる。 This paper describes the system submitted to the IWSLT 2021 Multilingual Speech Translation (MultiST) task from Huawei Noah's Ark Lab. We use a unified transformer architecture for our MultiST model, so that the data from different modalities (i.e., speech and text) and different tasks (i.e., Speech Recognition, Machine Translation, and Speech Translation) can be exploited to enhance the model's ability. Specifically, speech and text inputs are firstly fed to different feature extractors to extract acoustic and textual features, respectively. Then, these features are processed by a shared encoder--decoder architecture. We apply several training techniques to improve the performance, including multi-task learning, task-level curriculum learning, data augmentation, etc. Our final system achieves significantly better results than bilingual baselines on supervised language pairs and yields reasonable results on zero-shot language pairs.	翻訳日:2021-06-02 14:10:09 公開日:2021-06-01
# Nora: 幸福なコーチ Nora: The Well-Being Coach ( http://arxiv.org/abs/2106.00410v1 ) ライセンス: Link先を確認	Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, Pascale Fung	(参考訳) 現在のパンデミックは、人々が孤立し続け、社会的距離を取ることを強制し、結果として生じる孤独とネガティブな感情に対処するシステムの必要性を生み出している。本稿では,対話システムにおける自然言語理解を活用した仮想コーチングプラットフォームであるNoraを提案し,ユーザインタラクションに基づく他のレコメンデーションを提案する。自給自足や在宅勤務を行う人々を支援することを目的としている。 Noraは、ユーザの感情、感情、ストレスを検出し記録することで、幸福度を計測する。 Noraはまた、健康的な日々のルーチンの開発を支援するために、様々なワークアウト、想想、ヨガのエクササイズをユーザに推奨している。さらに私たちは,nora内のソーシャルコミュニティを提供して,ユーザが自身の経験を他の人たちと共有できるようにしています。 Noraはウェブリンクを通じてどこからでもアクセスでき、英語とマンダリンの両方をサポートしている。 The current pandemic has forced people globally to remain in isolation and practice social distancing, which creates the need for a system to combat the resulting loneliness and negative emotions. In this paper we propose Nora, a virtual coaching platform designed to utilize natural language understanding in its dialogue system and suggest other recommendations based on user interactions. It is intended to provide assistance and companionship to people undergoing self-quarantine or work-from-home routines. Nora helps users gauge their well-being by detecting and recording the user's emotion, sentiment, and stress. Nora also recommends various workout, meditation, or yoga exercises to users in support of developing a healthy daily routine. In addition, we provide a social community inside Nora, where users can connect and share their experiences with others undergoing a similar isolation procedure. Nora can be accessed from anywhere via a web link and has support for both English and Mandarin.	翻訳日:2021-06-02 14:09:53 公開日:2021-06-01
# 不正確なデータのベールによるロジスティック回帰 Logistic Regression Through the Veil of Imprecise Data ( http://arxiv.org/abs/2106.00492v1 ) ライセンス: Link先を確認	Nicholas Gray and Scott Ferson	(参考訳) ロジスティック回帰は、いくつかの予測変数に基づいて結果の確率を評価する重要な統計ツールである。標準的な手法は、正確に知られているデータのみを扱うことができるが、多くのデータセットには、従来の手法が単一ポイントに縮小するか、完全に無視されるかの不確実性がある。本稿では,区間内の値から得られる可能性のあるモデルの集合を用いて,不正確なロジスティック回帰モデルを考えることで,これらの不確実性を含めることができることを示す。これは従来の方法によって取り除かれたてんかんの不確実性を明確に表現する利点がある。 Logistic regression is an important statistical tool for assessing the probability of an outcome based upon some predictive variables. Standard methods can only deal with precisely known data, however many datasets have uncertainties which traditional methods either reduce to a single point or completely disregarded. In this paper we show that it is possible to include these uncertainties by considering an imprecise logistic regression model using the set of possible models that can be obtained from values from within the intervals. This has the advantage of clearly expressing the epistemic uncertainty removed by traditional methods.	翻訳日:2021-06-02 14:09:26 公開日:2021-06-01
# モーメントからのガウス混合学習のためのテンソル分解 Tensor decomposition for learning Gaussian mixtures from moments ( http://arxiv.org/abs/2106.00555v1 ) ライセンス: Link先を確認	Rima Khouja (AROMATH), Pierre-Alexandre Mattei (MAASAI), Bernard Mourrain (AROMATH)	(参考訳) データ処理や機械学習では、データを正確に表現できるモデルを復元し活用することが重要な課題である。データセットからガウス混合モデルを復元する問題を考察する。この問題に対処するための対称テンソル分解法について検討し,データ分布の経験的モーメントからテンソルを構築する。我々は一意な分解を持つ識別可能なテンソルを考えるが、球面のガウス混合から作られるモーメントテンソルは、この性質を持っていることを示している。補間次数が厳密に半分未満の対称テンソルは同定可能であることを証明し、それらの分解を計算するために単純な線形代数演算に基づくアルゴリズムを提案する。図示的な実験は、他の最先端のアプローチと比較して、ガウス混合物を回収するためのテンソル分解法の影響を示している。 In data processing and machine learning, an important challenge is to recover and exploit models that can represent accurately the data. We consider the problem of recovering Gaussian mixture models from datasets. We investigate symmetric tensor decomposition methods for tackling this problem, where the tensor is built from empirical moments of the data distribution. We consider identifiable tensors, which have a unique decomposition, showing that moment tensors built from spherical Gaussian mixtures have this property. We prove that symmetric tensors with interpolation degree strictly less than half their order are identifiable and we present an algorithm, based on simple linear algebra operations, to compute their decomposition. Illustrative experimentations show the impact of the tensor decomposition method for recovering Gaussian mixtures, in comparison with other state-of-the-art approaches.	翻訳日:2021-06-02 14:09:17 公開日:2021-06-01
# omnizart:自動音楽書き起こしのための汎用ツールボックス Omnizart: A General Toolbox for Automatic Music Transcription ( http://arxiv.org/abs/2106.00497v1 ) ライセンス: Link先を確認	Yu-Te Wu, Yin-Jyun Luo, Tsung-Ping Chen, I-Chieh Wei, Jui-Yang Hsu, Yi-Chin Chuang, Li Su	(参考訳) 我々は、自動音楽転写(AMT)の合理化ソリューションを提供する新しいPythonライブラリであるOmnizartを紹介し、リリースする。 Omnizartは、ディープラーニングベースのATTのライフサイクルを構成するモジュールを含み、コンパクトなコマンドラインインタフェースでの使用を容易にするように設計されている。我々の知る限り、Omnizartは最初の転写ツールキットであり、ソロ、楽器アンサンブル、パーカッション楽器、ボーカル、コード認識とビート/ダウンビート追跡のためのモデル、AMTに関連する2つの音楽情報検索(MIR)タスクなど、幅広い種類の楽器をカバーするモデルを提供する。 We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT). Omnizart encompasses modules that construct the life-cycle of deep learning-based AMT, and is designed for ease of use with a compact command-line interface. To the best of our knowledge, Omnizart is the first transcription toolkit which offers models covering a wide class of instruments ranging from solo, instrument ensembles, percussion instruments to vocal, as well as models for chord recognition and beat/downbeat tracking, two music information retrieval (MIR) tasks highly related to AMT.	翻訳日:2021-06-02 14:09:04 公開日:2021-06-01
# 都市森林における炭素隔離の定量化 Quantification of Carbon Sequestration in Urban Forests ( http://arxiv.org/abs/2106.00182v1 ) ライセンス: Link先を確認	Levente Klein, Wang Zhou, Conrad Albrecht	(参考訳) 植物、特に木は大気から二酸化炭素を吸収して炭素を抽出するが、木に蓄えられた炭素の効率的な定量法が欠如しているため、その過程を追跡することは困難である。本稿では,多スペクトル空中画像とLiDARデータを用いて炭素蓄積量の推定を行い,炭素蓄積量化の重要な特性である樹木の被覆率,幾何学的形状,樹木種を同定する手法を提案する。樹木のバイオマスを計算するために,樹種情報とその3次元幾何学形状を遠隔画像から推定できることを実証する。特にニューヨーク市マンハッタンでは、樹木に隔離された炭素の合計を5万2000トンと見積もっています。 Vegetation, trees in particular, sequester carbon by absorbing carbon dioxide from the atmosphere; however, the lack of efficient quantification methods of carbon stored in trees renders it difficult to track the process. Here we present an approach to estimate the carbon storage in trees based on fusing multispectral aerial imagery and LiDAR data to identify tree coverage, geometric shape, and tree species, which are crucial attributes in carbon storage quantification. We demonstrate that tree species information and their three-dimensional geometric shapes can be estimated from remote imagery in order to calculate the tree's biomass. Specifically, for Manhattan, New York City, we estimate a total of $52,000$ tons of carbon sequestered in trees.	翻訳日:2021-06-02 14:08:49 公開日:2021-06-01
# 超音波画像における腕神経叢分割のためのハイブリッドディープニューラルネットワーク Hybrid Deep Neural Network for Brachial Plexus Nerve Segmentation in Ultrasound Images ( http://arxiv.org/abs/2106.00373v1 ) ライセンス: Link先を確認	Juul P.A. van Boxtel, Vincent R.J. Vousten, Josien Pluim, Nastaran Mohammadian Rad	(参考訳) 超音波ガイド下局所麻酔(UGRA)は全身麻酔(GA)を代替し、鎮痛と回復時間を改善する。この方法は鎖骨外科手術後の腕神経叢(BP)に応用できる。しかし,超音波(US)画像からのBPの同定は,専門職でも困難である。この問題を解決するために、BP神経領域の同定とセグメンテーションに畳み込みニューラルネットワーク(CNN)とより高度なディープニューラルネットワーク(DNN)を用いることができる。本稿では,超音波画像中のbp神経領域をセグメント化するための分類モデルとセグメント化モデルを組み合わせたハイブリッドモデルを提案する。 CNNモデルは、BP領域で画像を正確に選択するための分類器として使用される。次に、セグメント化にU-netまたはM-netモデルを用いる。実験の結果,提案手法は単一セグメンテーションモデルに対するセグメンテーション性能を大幅に向上させることが示唆された。 Ultrasound-guided regional anesthesia (UGRA) can replace general anesthesia (GA), improving pain control and recovery time. This method can be applied on the brachial plexus (BP) after clavicular surgeries. However, identification of the BP from ultrasound (US) images is difficult, even for trained professionals. To address this problem, convolutional neural networks (CNNs) and more advanced deep neural networks (DNNs) can be used for identification and segmentation of the BP nerve region. In this paper, we propose a hybrid model consisting of a classification model followed by a segmentation model to segment BP nerve regions in ultrasound images. A CNN model is employed as a classifier to precisely select the images with the BP region. Then, a U-net or M-net model is used for the segmentation. Our experimental results indicate that the proposed hybrid model significantly improves the segmentation performance over a single segmentation model.	翻訳日:2021-06-02 14:08:38 公開日:2021-06-01
# 条件付き生成逆数ネットワークを用いた肝病変合成のためのデカップリング形状と密度 Decoupling Shape and Density for Liver Lesion Synthesis Using Conditional Generative Adversarial Networks ( http://arxiv.org/abs/2106.00629v1 ) ライセンス: Link先を確認	Dario Augusto Borges Oliveira	(参考訳) 病変合成は、トレーニングデータの増強、病変の進展シナリオの描画、専門家の訓練を支援するための効率的な生成モデルの台頭により、多くの注目を集めた。合成データの質と多様性は、モデルをトレーニングするのに使用される注釈付きデータに大きく依存する。これにより、病変分節アルゴリズムに固有のバイアスが加わり、病変の進化シナリオの合成を効率的に制限できる。本稿では,肝病変合成のための形状と密度を分離する手法を提案する。形状と密度を個々に修正して合成制御を示す定性的な結果と,生成器モデルに密度情報を組み込むことが,形状のみを用いた場合に比べて病変分割性能の向上に寄与することを示す定量的な結果を提供する。 Lesion synthesis received much attention with the rise of efficient generative models for augmenting training data, drawing lesion evolution scenarios, or aiding expert training. The quality and diversity of synthesized data are highly dependent on the annotated data used to train the models, which not rarely struggle to derive very different yet realistic samples from the training ones. That adds an inherent bias to lesion segmentation algorithms and limits synthesizing lesion evolution scenarios efficiently. This paper presents a method for decoupling shape and density for liver lesion synthesis, creating a framework that allows straight-forwardly driving the synthesis. We offer qualitative results that show the synthesis control by modifying shape and density individually, and quantitative results that demonstrate that embedding the density information in the generator model helps to increase lesion segmentation performance compared to using the shape solely.	翻訳日:2021-06-02 14:08:24 公開日:2021-06-01
# 畳み込みネットワークを用いたマルチスペクトル画像分類のためのハイパースペクトル帯域選択 Hyperspectral Band Selection for Multispectral Image Classification with Convolutional Networks ( http://arxiv.org/abs/2106.00645v1 ) ライセンス: Link先を確認	Giorgio Morales and John Sheppard and Riley Logan and Joseph Shaw	(参考訳) 近年、ハイパースペクトルイメージング(HSI)はリモートセンシング、農業、バイオメディシンといったアプリケーションにおける信頼性の高いデータ源となっている。しかし、ハイパースペクトル画像は非常にデータ密度が高く、特定のアプリケーションに最も有用な情報を保持しながらスペクトル帯域を減らす方法の恩恵を受けることが多い。画像分類の文脈において、HSIシステムから得られた波長の削減されたセットを選択するための新しいバンド選択法を提案する。提案手法は2つの主要なステップから構成される: 1つは、フィルタに基づくアプローチを用いて、帯域とその近傍のコリニアリティ解析に基づいて、関連するスペクトル帯域を求める。この分析は冗長バンドの除去に役立ち、検索スペースを劇的に削減する。第2のステップは、情報エントロピー値に基づいて縮小集合からバンドを選択するラッパーベースアプローチを適用し、コンパクト畳み込みニューラルネットワーク(cnn)を訓練し、現在の選択性能を評価する。提案手法から得られた分類結果を,2つのハイパースペクトル画像データセット上の他の特徴選択法と比較する。さらに、元のハイパースペクトルデータキューブを使用して、マルチスペクトルイメージにおける実際のフィルタの使用プロセスをシミュレートする。本手法はマルチスペクトルセンサの設計に適した結果が得られることを示す。 In recent years, Hyperspectral Imaging (HSI) has become a powerful source for reliable data in applications such as remote sensing, agriculture, and biomedicine. However, hyperspectral images are highly data-dense and often benefit from methods to reduce the number of spectral bands while retaining the most useful information for a specific application. We propose a novel band selection method to select a reduced set of wavelengths, obtained from an HSI system in the context of image classification. Our approach consists of two main steps: the first utilizes a filter-based approach to find relevant spectral bands based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. The second step applies a wrapper-based approach to select bands from the reduced set based on their information entropy values, and trains a compact Convolutional Neural Network (CNN) to evaluate the performance of the current selection. 
We present classification results obtained from our method and compare them to other feature selection methods on two hyperspectral image datasets. Additionally, we use the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager. We show that our method produces more suitable results for a multispectral sensor design.	翻訳日:2021-06-02 14:08:09 公開日:2021-06-01
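The two-step idea in the abstract above (first drop bands that are collinear with their neighbors, then prefer high-entropy bands) can be sketched as follows. This is only an illustration under stated assumptions: Pearson correlation with the last kept band stands in for the paper's collinearity analysis, the 0.95 threshold is arbitrary, the CNN-based wrapper evaluation is omitted, and `pearson`, `shannon_entropy`, `filter_then_rank`, and the toy `bands` data are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation between two equally long pixel-value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of pixel values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def filter_then_rank(bands, max_corr=0.95):
    """Step 1: skip bands too correlated with the last kept band.
    Step 2: order the surviving band indices by decreasing entropy."""
    kept = [0]
    for i in range(1, len(bands)):
        if abs(pearson(bands[kept[-1]], bands[i])) < max_corr:
            kept.append(i)
    return sorted(kept, key=lambda i: shannon_entropy(bands[i]), reverse=True)


# Toy "image" with four spectral bands of eight pixels each (made-up values).
bands = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 4, 6, 8, 10, 12, 14, 16],  # scaled copy of band 0, so redundant
    [5, 5, 5, 5, 1, 1, 1, 1],
    [3, 1, 4, 1, 5, 9, 2, 6],
]
ranking = filter_then_rank(bands)
```

Here band 1 is removed as redundant, and the remaining bands are ordered by how much information their value histograms carry. In the paper, candidate selections are additionally scored by training a compact CNN, which this sketch leaves out.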
# 体組成解析のための3次元ct画像からの全身骨格筋・脂肪組織・骨切片の自動測定の包括的検証 : 拡張体組成に向けて Comprehensive Validation of Automated Whole Body Skeletal Muscle, Adipose Tissue, and Bone Segmentation from 3D CT images for Body Composition Analysis: Towards Extended Body Composition ( http://arxiv.org/abs/2106.00652v1 ) ライセンス: Link先を確認	Da Ma, Vincent Chow, Karteek Popuri, Mirza Faisal Beg	(参考訳) コンピュータ支援精密医療の最近の進歩は、グループベースの分析に有効な集合パターンを見つけるのに役立つ集団全体モデルから、治療の選択や治療結果の予測に関して患者固有の決定を導くことができる患者固有のモデルへと移行しやすくしている。身体構成は、様々な疾患にとって重要な要因であり、また治療選択や外科的介入に対する患者固有の臨床結果の予測因子として認識されている。 3次元CT画像は、腫瘍学的ワークフローで日常的に取得され、内部解剖の正確なレンダリングを提供するため、骨格筋の量や組織区画の分別を同時に評価することができる。ディープラーニングのような強力な人工知能のツールは、3D画像全体を分割し、すべての内部解剖を正確に測定することを可能にする。これにより、それまで存在した深刻なボトルネック、すなわち3dボリュームイメージを構成する数百の2d軸スライスにスケールすることを禁じられていた手動セグメンテーションの必要性が克服される。今回紹介したような自動化ツールは、3dctやmri画像から全身の計測値を取り出すことができるようになり、個々の組織、臓器容積、形状、機能状態に基づいて様々な疾患のドライバが発見される新しい時代へと繋がる。これらの測定は不可能であったため、フィールドを非常に小さく限られたサブセットに制限した。これらの発見と、高速かつ精度で個々の画像セグメンテーションを行う能力は、がんなどの主要な疾患の発症後の栄養、老化、化学療法、手術、生存に関連する個々の治療計画モデルにこれらの3D尺度を組み込むことにつながる可能性が高い。 The latest advances in computer-assisted precision medicine are making it feasible to move from population-wide models that are useful to discover aggregate patterns that hold for group-based analysis to patient-specific models that can drive patient-specific decisions with regard to treatment choices, and predictions of outcomes of treatment. Body Composition is recognized as an important driver and risk factor for a wide variety of diseases, as well as a predictor of individual patient-specific clinical outcomes to treatment choices or surgical interventions. 3D CT images are routinely acquired in the oncological workflows and deliver accurate rendering of internal anatomy and therefore can be used opportunistically to assess the amount of skeletal muscle and adipose tissue compartments. 
Powerful tools of artificial intelligence such as deep learning are now making it feasible to segment the entire 3D image and generate accurate measurements of all internal anatomy. These will overcome the severe bottleneck that existed previously, namely, the need for manual segmentation, which was prohibitive to scale to the hundreds of 2D axial slices that make up a 3D volumetric image. Automated tools such as those presented here will now enable harvesting whole-body measurements from 3D CT or MRI images, leading to a new era of discovery of the drivers of various diseases based on individual tissue, organ volume, shape, and functional status. These measurements were hitherto unavailable, thereby limiting the field to a very small and limited subset. These discoveries and the potential to perform individual image segmentation with high speed and accuracy are likely to lead to the incorporation of these 3D measures into individual-specific treatment planning models related to nutrition, aging, chemotoxicity, surgery and survival after the onset of a major disease such as cancer.	翻訳日:2021-06-02 14:07:52 公開日:2021-06-01
# 事前学習ネットワークを用いたノイズ画像分類の忠実度推定 Fidelity Estimation Improves Noisy-Image Classification with Pretrained Networks ( http://arxiv.org/abs/2106.00673v1 ) ライセンス: Link先を確認	Xiaoyu Lin, Deblina Bhattacharjee, Majed El Helou and Sabine Süsstrunk	(参考訳) 画像分類はディープラーニングを用いて大幅に改善された。これは主に、大規模なデータセットから豊富な特徴抽出器を学習できる畳み込みニューラルネットワーク(cnns)に起因する。しかし、ほとんどのディープラーニング分類法はクリーンな画像に基づいて訓練されており、復元前処理のステップが適用されたとしても、ノイズ処理では堅牢ではない。新しい手法はこの問題に対処するが、それらは修正された特徴抽出器に依存し、したがって再訓練を必要とする。代わりに,事前学習した分類器に適用可能な手法を提案する。提案手法は,特徴抽出器の内部表現に融合した忠実度マップ推定を活用し,ネットワークの注意を誘導し,ノイズデータに対してより頑健にする。我々は,特に高雑音レベルにおいて,ノイズ画像分類(NIC)の結果を大幅に改善し,完全に再訓練されたアプローチに近づいた。さらに, 概念実証として, オラクルの忠実度マップを用いた場合, ノイズや復元画像の訓練の有無にかかわらず, 完全に再現された手法よりも優れていることを示す。 Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods address this problem, they rely on modified feature extractors and thus necessitate retraining. We instead propose a method that can be applied to a pretrained classifier. Our method exploits a fidelity map estimate that is fused into the internal representations of the feature extractor, thereby guiding the attention of the network and making it more robust to noisy data. We improve the noisy-image classification (NIC) results by significant margins, especially at high noise levels, and come close to the fully retrained approaches. Furthermore, as proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images.	翻訳日:2021-06-02 14:07:21 公開日:2021-06-01
# bures-wasserstein幾何をもつ正定値行列上のリーマン最適化について On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry ( http://arxiv.org/abs/2106.00286v1 ) ライセンス: Link先を確認	Andi Han, Bamdev Mishra, Pratik Jawanpuria, Junbin Gao	(参考訳) 本稿では、対称正定値(spd)行列多様体上のリーマン最適化のための一般的なアフィン不変量(ai)幾何と、bures-wasserstein(bw)幾何を比較分析する。我々の研究は、AIメトリックの二次的依存とは対照的に、BWメトリックがSPD行列に線形依存していることから始まる。我々は、不条件のSPD行列に対するいくつかのリーマン最適化問題に対して、BW計量がより適切で堅牢な選択であることを示す。 BW幾何学は非負の曲率を持ち、非正の曲線を持つAI幾何に対するアルゴリズムの収束率をさらに向上させることを示す。最後に、AI幾何学では測地線凸(geodeic convex)として知られているいくつかの一般的なコスト関数が、BW幾何学では測地線凸(geodeic convex)であることを示す。様々な応用に関する広範な実験が我々の発見を裏付けている。 In this paper, we comparatively analyze the Bures-Wasserstein (BW) geometry with the popular Affine-Invariant (AI) geometry for Riemannian optimization on the symmetric positive definite (SPD) matrix manifold. Our study begins with an observation that the BW metric has a linear dependence on SPD matrices in contrast to the quadratic dependence of the AI metric. We build on this to show that the BW metric is a more suitable and robust choice for several Riemannian optimization problems over ill-conditioned SPD matrices. We show that the BW geometry has a non-negative curvature, which further improves convergence rates of algorithms over the non-positively curved AI geometry. Finally, we verify that several popular cost functions, which are known to be geodesic convex under the AI geometry, are also geodesic convex under the BW geometry. Extensive experiments on various applications support our findings.	翻訳日:2021-06-02 14:06:06 公開日:2021-06-01
# リッジ関数の帯域凸最適化のためのミニマックスレグレット Minimax Regret for Bandit Convex Optimisation of Ridge Functions ( http://arxiv.org/abs/2106.00444v1 ) ライセンス: Link先を確認	Tor Lattimore	(参考訳) 逆向きのバンドイット凸最適化を、f(x) = g(\langle x, \theta\rangle)$ for convex $g : \mathbb r \to \mathbb r$ と $\theta \in \mathbb r^d$ という形式の関数に制限された逆数で解析する。ミニマックスの後悔は最大で$o(d\sqrt{n} \log(\operatorname{diam}\mathcal k))$であり、ここで$n$は相互作用の数、$d$は次元、$\operatorname{diam}(\mathcal k)$は制約集合の直径である。したがって、この函数の類は線形の場合よりも対数的に難しい。 We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f(x) = g(\langle x, \theta\rangle)$ for convex $g : \mathbb R \to \mathbb R$ and $\theta \in \mathbb R^d$. We provide a short information-theoretic proof that the minimax regret is at most $O(d\sqrt{n} \log(\operatorname{diam}\mathcal K))$ where $n$ is the number of interactions, $d$ the dimension and $\operatorname{diam}(\mathcal K)$ is the diameter of the constraint set. Hence, this class of functions is at most logarithmically harder than the linear case.	翻訳日:2021-06-02 14:05:53 公開日:2021-06-01
# 伸縮触覚: 工具とグラフト物体による振動センシング Extended Tactile Perception: Vibration Sensing through Tools and Grasped Objects ( http://arxiv.org/abs/2106.00489v1 ) ライセンス: Link先を確認	Tasbolat Taunyazov, Luar Shui Song, Eugene Lim, Hian Hian See, David Lee, Benjamin C.K. Tee, Harold Soh	(参考訳) 人間は道具やその他の保持物を通して世界を感知する驚くべき能力を示す。例えば、保持された棒の衝突箇所をピンポイントで特定し、硬いプローブを使って異なるテクスチャを区別することができる。本研究では,ロボットが道具を具現化し,標準的な把持物体を用いて知覚を拡張できるような能力を実現する方法について検討する。ロボットの指の動的触覚センサを用いた視覚触覚センシングと機械学習モデルにより,ロボットは剛体物体に沿って伝達される振動として伝達される接触情報を解読できる。本稿では,BioTacマイクロ振動センサと4〜kHzでマルチタキセルセンシングが可能な新しいイベントダイナミックセンサであるNUSkinを用いた広範囲な実験について報告する。保持棒上の微細な局在化は我々のアプローチ(20cmロッド上の誤差が1cm未満)により可能であることを示す。次に, 振動触覚知覚は, 物体ハンドオーバ時の適度な把握安定性予測と, 標準フォークを用いた正確な食品識別につながることを示す。マルチタキセルビブロ触覚を十分に高いサンプリングレート(2kHz以上)で検出すると,様々なタスクやオブジェクトに対して最高の性能が得られることがわかった。両者を組み合わせることで,触覚知覚の拡張にvibro-tactile perceptionを使用するためのエビデンスとガイドラインが提供され,ツールによる能力向上と,人間とロボットの対話性の向上につながると我々は信じている。 Humans display the remarkable ability to sense the world through tools and other held objects. For example, we are able to pinpoint impact locations on a held rod and tell apart different textures using a rigid probe. In this work, we consider how we can enable robots to have a similar capacity, i.e., to embody tools and extend perception using standard grasped objects. We propose that vibro-tactile sensing using dynamic tactile sensors on the robot fingers, along with machine learning models, enables robots to decipher contact information that is transmitted as vibrations along rigid objects. This paper reports on extensive experiments using the BioTac micro-vibration sensor and a new event dynamic sensor, the NUSkin, capable of multi-taxel sensing at 4~kHz. We demonstrate that fine localization on a held rod is possible using our approach (with errors less than 1 cm on a 20 cm rod). Next, we show that vibro-tactile perception can lead to reasonable grasp stability prediction during object handover, and accurate food identification using a standard fork. 
We find that multi-taxel vibro-tactile sensing at a sufficiently high sampling rate (above 2 kHz) led to the best performance across the various tasks and objects. Taken together, our results provide both evidence and guidelines for using vibro-tactile perception to extend tactile perception, which we believe will lead to enhanced competency with tools and better physical human-robot interaction.	翻訳日:2021-06-02 14:05:35 公開日:2021-06-01
# clustrank: クラスタパターンによる散乱プロットのソートのための知覚データに基づく視覚品質尺度 ClustRank: a Visual Quality Measure Trained on Perceptual Data for Sorting Scatterplots by Cluster Patterns ( http://arxiv.org/abs/2106.00599v1 ) ライセンス: Link先を確認	Mostafa Abbas, Ehsan Ullah, Abdelkader Baggag, Halima Bensmail, Michael Sedlmair, Michael Aupetit	(参考訳) ビジュアル品質測定(VQM)は、視覚化のパターンを自動的に検出し定量化することにより、アナリストを支援するように設計されている。そこで本研究では,視認可能なグループ化パターンに従って散布確率をランク付けする,clustrankと呼ばれる新しいデータ駆動手法を提案する。本モデルはまず, ガウス混合モデルのパラメトリック空間に散乱プロットを符号化し, 人間の判断データに基づいて学習した分類器を用いてグループ化パターンの知覚複雑性を推定する。初期混合成分の個数と最終結合群は散乱指数の階数を決定する。 ClustRankは、2ガウスのクラスタパターン上の人間の判断を模倣することで既存のVQM技術を改善し、スパッタプロットで一般的なクラスタパターンをランク付けする際の精度を高める。我々は,大規模な散布株の視覚的解析に専門家が依存する領域であるゲノムワイド・アソシエーション研究において,血縁関係データを分析することで,そのメリットを実証する。 3つのベンチマークデータセットとClustRank VQMを実用的な使用とさらなる改善のために利用しています。 Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new data-driven technique called ClustRank that allows to rank scatterplots according to visible grouping patterns. Our model first encodes scatterplots in the parametric space of a Gaussian Mixture Model, and then uses a classifier trained on human judgment data to estimate the perceptual complexity of grouping patterns. The numbers of initial mixture components and final combined groups determine the rank of the scatterplot. ClustRank improves on existing VQM techniques by mimicking human judgments on two-Gaussian cluster patterns and gives more accuracy when ranking general cluster patterns in scatterplots. We demonstrate its benefit by analyzing kinship data for genome-wide association studies, a domain in which experts rely on the visual analysis of large sets of scatterplots. We make the three benchmark datasets and the ClustRank VQM available for practical use and further improvements.	翻訳日:2021-06-02 14:05:12 公開日:2021-06-01
# 深層学習を用いた走査型心電図追跡による運動時の胎児障害の検出 Detection of preventable fetal distress during labor from scanned cardiotocogram tracings using deep learning ( http://arxiv.org/abs/2106.00628v1 ) ライセンス: Link先を確認	Martin G. Frasch, Shadrian B. Strong, David Nilosek, Joshua Leaverton, Barry S. Schifrin	(参考訳) 労働・配送の分野で広く応用されているにもかかわらず、電子胎児モニタリング(EFM)の価値についてかなりの議論が続いている。 EFMには胎児の心拍数(FHR)パターンの監視と、胎児の行動に関する豊富なデータと、酸素化と灌流の脅威を提供する母体の子宮収縮が含まれる。 fhrパターン情報にタイムリーに応答できない場合、胎児の損傷を普遍的に関連づける副作用。歴史的に、デジタルに保存されたEMMデータは、現代的または歴史的議論と検査のためのラスタライズされたpdf画像としてのみ利用可能である。しかし実際には、体系的にレビューされることはめったにない。本研究は,50年以上にわたって収集したEMFの独自のアーカイブを用いて,早期ないし過去の胎児外傷の訓練および検出のための深層学習フレームワークを提案する。早期の予防的胎児外傷の診断精度は94%であった。この枠組みは、胎児の健康維持のための早期の警告および意思決定支援システムの自動化に適している。最終的には、そのようなシステムは、医師が労働中にタイムリーに反応し、有害な結果を防ぐことができる。副作用が回避できない場合、新生児の早期神経保護治療へのガイダンスを提供することができる。 Despite broad application during labor and delivery, there remains considerable debate about the value of electronic fetal monitoring (EFM). EFM includes the surveillance of the fetal heart rate (FHR) patterns in conjunction with the maternal uterine contractions providing a wealth of data about fetal behavior and the threat of diminished oxygenation and perfusion. Adverse outcomes universally associate a fetal injury with the failure to timely respond to FHR pattern information. Historically, the EFM data, stored digitally, are available only as rasterized pdf images for contemporary or historical discussion and examination. In reality, however, they are rarely reviewed systematically. Using a unique archive of EFM collected over 50 years of practice in conjunction with adverse outcomes, we present a deep learning framework for training and detection of incipient or past fetal injury. We report 94% accuracy in identifying early, preventable fetal injury intrapartum. This framework is suited for automating an early warning and decision support system for maintaining fetal well-being during the stresses of labor. 
Ultimately, such a system could enable a physician to respond in a timely manner during labor and prevent adverse outcomes. When adverse outcomes cannot be avoided, it can provide guidance for the early neuroprotective treatment of the newborn.	翻訳日:2021-06-02 14:04:08 公開日:2021-06-01
# 音像定位のためのデュアル正規化マルチタスキング Dual Normalization Multitasking for Audio-Visual Sounding Object Localization ( http://arxiv.org/abs/2106.00180v1 ) ライセンス: Link先を確認	Tokuhiro Nishikawa, Daiki Shimada, Jerry Jun Yokono	(参考訳) 未訓練映像における視聴覚音源の定位に関するいくつかの研究が報告されているが、その性能を定量的に評価するためのデータセットやメトリクスは提案されていない。音源定位のための基礎的真理を定義することは, 音源の位置は音源の範囲に限らず, 振動が周囲の物体を伝播・伝播させるため, 困難である。そこで本研究では,音の視的位置の曖昧さを低減し,幅広い音源の位置をアノテートする新しい概念であるサウンド・オブジェクトを提案する。定量的評価のためのメトリクスを新たに提案し,AVSOL(Audio-Visual Sounding Object Localization)の問題を定式化する。また、よく知られたAVEデータセットのテストセットを手動でアノテートすることで、評価データセット(AVSOL-Eデータセット)を作成しました。本稿では,この新たなavsol問題に対処するために,オーディオ・ビジュアル対応 (avc) タスクとビデオイベントの分類タスクを1つのオーディオ・ビジュアル類似度マップに集約する,デュアル・ノーマライズ・マルチタスク (dnm) と呼ばれる新しいマルチタスク・トレーニング戦略とアーキテクチャを提案する。 DNMによる両監視を効率的に活用することにより,提案アーキテクチャはベースライン法よりも大幅に優れる。 Although several research works have been reported on audio-visual sound source localization in unconstrained videos, no datasets and metrics have been proposed in the literature to quantitatively evaluate its performance. Defining the ground truth for sound source localization is difficult, because the location where the sound is produced is not limited to the range of the source object, but the vibrations propagate and spread through the surrounding objects. Therefore we propose a new concept, Sounding Object, to reduce the ambiguity of the visual location of sound, making it possible to annotate the location of the wide range of sound sources. With newly proposed metrics for quantitative evaluation, we formulate the problem of Audio-Visual Sounding Object Localization (AVSOL). We also created the evaluation dataset (AVSOL-E dataset) by manually annotating the test set of well-known Audio-Visual Event (AVE) dataset. 
To tackle this new AVSOL problem, we propose a novel multitask training strategy and architecture called Dual Normalization Multitasking (DNM), which aggregates the Audio-Visual Correspondence (AVC) task and the classification task for video events into a single audio-visual similarity map. By efficiently utilizing both supervisions through DNM, our proposed architecture significantly outperforms the baseline methods.	翻訳日:2021-06-02 14:03:51 公開日:2021-06-01
# マルチエージェントマルコフ確率ゲームにおけるグラディエントプレイ:静止点と収束 Gradient Play in Multi-Agent Markov Stochastic Games: Stationary Points and Convergence ( http://arxiv.org/abs/2106.00198v1 ) ライセンス: Link先を確認	Runyu Zhang, Zhaolin Ren, Na Li	(参考訳) エージェント間で共有される現在の状態情報に基づいて決定を独立に行うことにより、各エージェントが自己の合計割引報酬を最大化しようとする確率ゲーム(SGs)としても知られるマルチエージェントタブラマルコフ決定プロセス(MDPs)の勾配プレイアルゴリズムの性能について検討する。ポリシーは、ある状態において特定のアクションを選択する確率によって直接パラメータ化される。 nash平衡(nes)と1次定常ポリシーがこの設定において等価であることを示し、マルコフポテンシャルゲームと呼ばれるマルチエージェントmdpのサブクラスに対して、非漸近的大域収束率解析を$\epsilon$-neに与える。その結果,エージェント数で指数関数的にではなく,$\epsilon$-neに達するイテレーションの数は線形にスケールすることがわかった。局所幾何学や局所安定性も考慮される。マルコフポテンシャルゲームに対しては、厳密な NE が全ポテンシャル関数の局所極大であり、完全混合の NE がサドル点であることを証明する。さらに、より一般的な設定では、厳格なnes周辺の局所収束率も与えます。 We study the performance of the gradient play algorithm for multi-agent tabular Markov decision processes (MDPs), which are also known as stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first order stationary policies are equivalent in this setting, and give a non-asymptotic global convergence rate analysis to an $\epsilon$-NE for a subclass of multi-agent MDPs called Markov potential games, which includes the cooperative setting with identical rewards among agents as an important special case. Our result shows that the number of iterations to reach an $\epsilon$-NE scales linearly, instead of exponentially, with the number of agents. Local geometry and local stability are also considered. For Markov potential games, we prove that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points. We also give a local convergence rate around strict NEs for more general settings.	翻訳日:2021-06-02 14:01:58 公開日:2021-06-01
# ベイズメタラーニングにおける認識不確実性の情報理論解析 Information-Theoretic Analysis of Epistemic Uncertainty in Bayesian Meta-learning ( http://arxiv.org/abs/2106.00252v1 ) ライセンス: Link先を確認	Sharu Theresa Jose, Sangwoo Park, Osvaldo Simeone	(参考訳) 訓練された予測器の全体的な予測の不確実性は、認識論的不確実性とアレエータ的不確実性のために別個の貢献に分解することができる。ベイズ的定式化の下では、十分に特定されたモデルとして、2つの寄与は情報理論量(Xu と Raginsky, 2020)の点で(ログロスに関して)正確に表現できる。本稿では,ベイズメタラーニングにおける情報理論の枠組みにおける認識の不確実性について考察する。一般的な階層的ベイズモデルでは、ハイパーパラメータがモデルパラメータのタスクごとの事前を決定する。最適なメタ学習規則の最小過剰メタリスク(MEMR)によって定量化されるてんかんの不確実性に対して、(ログロスに関して)厳密な特徴と境界(より一般的な損失のために)導出される。この特徴付けは、タスク数とタスク毎のトレーニングデータ量に対する認識の不確かさの依存性に関する洞察をもたらすために利用される。神経相互情報推定器を用いて評価した情報理論的境界と,langevin-stein bayesian meta-learning(ls-bml)と呼ばれる新しい近似完全ベイズ型メタラーニング戦略の性能を比較する実験を行った。 The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic quantities (Xu and Raginsky, 2020). This paper addresses the study of epistemic uncertainty within an information-theoretic framework in the broader setting of Bayesian meta-learning. A general hierarchical Bayesian model is assumed in which hyperparameters determine the per-task priors of the model parameters. Exact characterizations (for the log-loss) and bounds (for more general losses) are derived for the epistemic uncertainty - quantified by the minimum excess meta-risk (MEMR)- of optimal meta-learning rules. This characterization is leveraged to bring insights into the dependence of the epistemic uncertainty on the number of tasks and on the amount of per-task training data. 
Experiments are presented that compare the proposed information-theoretic bounds, evaluated via neural mutual information estimators, with the performance of a novel approximate fully Bayesian meta-learning strategy termed Langevin-Stein Bayesian Meta-Learning (LS-BML).	翻訳日:2021-06-02 14:01:36 公開日:2021-06-01
# 情報リスク最小化による機械学習のための統合PAC-Bayesianフレームワーク A unified PAC-Bayesian framework for machine unlearning via information risk minimization ( http://arxiv.org/abs/2106.00265v1 ) ライセンス: Link先を確認	Sharu Theresa Jose, Osvaldo Simeone	(参考訳) マシンアンラーニング(machine unlearning)とは、トレーニングモデルの要求に対するトレーニングデータのサブセットの影響を、スクラッチから再トレーニングするコストを伴わずに取り除くメカニズムである。本稿では,情報リスク最小化問題(Zhang,2006)として,変分アンラーニング(Nguyen et.al., 2020)とラグランジアン(Golatkar et.al., 2020)の2つの最近の設計原則を回復する,機械学習のための統一的なPAC-Bayesianフレームワークを開発する。したがって、どちらの基準も自由エネルギー計量の形をとる未学習モデルの試験損失に関するPAC-ベイジアン上界と解釈できる。 Machine unlearning refers to mechanisms that can remove the influence of a subset of training data upon request from a trained model without incurring the cost of re-training from scratch. This paper develops a unified PAC-Bayesian framework for machine unlearning that recovers the two recent design principles - variational unlearning (Nguyen et.al., 2020) and forgetting Lagrangian (Golatkar et.al., 2020) - as information risk minimization problems (Zhang,2006). Accordingly, both criteria can be interpreted as PAC-Bayesian upper bounds on the test loss of the unlearned model that take the form of free energy metrics.	翻訳日:2021-06-02 14:01:12 公開日:2021-06-01
# H-FL:フェデレートラーニングのための階層的コミュニケーション効率とプライバシ保護アーキテクチャ H-FL: A Hierarchical Communication-Efficient and Privacy-Protected Architecture for Federated Learning ( http://arxiv.org/abs/2106.00275v1 ) ライセンス: Link先を確認	He Yang	(参考訳) 連合学習(federated learning:fl)の長年の目標は、厳密なプライバシー保証と、比較的高いモデル精度を維持しながら、低い通信オーバーヘッドを必要とする。しかし、すべての目標を同時に達成することは極めて難しい。本稿では,この課題に対処するため,階層型連合学習(H-FL)と呼ばれる新しい枠組みを提案する。トレーニングデータの統計的不均一性によるモデル性能の劣化を考慮し、クライアントを適切に配置し、仲介者を利用してクライアントのローカルトレーニングを再構成する実行時分布再構築戦略を考案する。さらに,H-FLに組み込まれた圧縮補正機構を設計し,モデル性能を犠牲にすることなく通信オーバーヘッドを低減する。さらに,プライバシの保証を提供するために,ローカルトレーニングを実施しながらディファレンシャルプライバシを導入し,モデルの一部のみに適度なノイズを注入する。実験結果から,H-FLフレームワークは実世界の画像認識タスクに対して,異なるデータセット上での最先端性能を実現することがわかった。 The longstanding goals of federated learning (FL) require rigorous privacy guarantees and low communication overhead while holding a relatively high model accuracy. However, simultaneously achieving all the goals is extremely challenging. In this paper, we propose a novel framework called hierarchical federated learning (H-FL) to tackle this challenge. Considering the degradation of the model performance due to the statistic heterogeneity of the training data, we devise a runtime distribution reconstruction strategy, which reallocates the clients appropriately and utilizes mediators to rearrange the local training of the clients. In addition, we design a compression-correction mechanism incorporated into H-FL to reduce the communication overhead while not sacrificing the model performance. To further provide privacy guarantees, we introduce differential privacy while performing local training, which injects moderate amount of noise into only part of the complete model. Experimental results show that our H-FL framework achieves the state-of-art performance on different datasets for the real-world image recognition tasks.	翻訳日:2021-06-02 14:00:56 公開日:2021-06-01
# 可逆サロゲートモデル:可逆ニューラルネットワークによるレーザー-ウェークフィールド加速の合同サロゲートモデルと再構成 Invertible Surrogate Models: Joint surrogate modelling and reconstruction of Laser-Wakefield Acceleration by invertible neural networks ( http://arxiv.org/abs/2106.00432v1 ) ライセンス: Link先を確認	Friedrich Bethke, Richard Pausch, Patrick Stiller, Alexander Debus, Michael Bussmann, Nico Hoffmann	(参考訳) 可逆ニューラルネットワークは、前と逆モードで実行できる、機械学習の有望なニューラルネットワークアーキテクチャにおける最近の技術である。本稿では,レーザープラズマ加速器(iLWFA)に係わる物理の複雑な前方シミュレーションを近似する,可逆サロゲートモデルを導入する。代理モデルの客観的設計は、実験的に得られた診断を再構築するためのあらゆる手段を提供する。我々の逆レーザーウェイクフィールド加速ネットワークの品質は、大規模な数値LWFAシミュレーションで検証される。 Invertible neural networks are a recent technique in machine learning promising neural network architectures that can be run in forward and reverse mode. In this paper, we will be introducing invertible surrogate models that approximate complex forward simulation of the physics involved in laser plasma accelerators: iLWFA. The bijective design of the surrogate model also provides all means for reconstruction of experimentally acquired diagnostics. The quality of our invertible laser wakefield acceleration network will be verified on a large set of numerical LWFA simulations.	翻訳日:2021-06-02 14:00:41 公開日:2021-06-01
# 実世界の画像復元と超解像のための2段階領域適応トレーニング Two-stage domain adapted training for better generalization in real-world image restoration and super-resolution ( http://arxiv.org/abs/2106.00504v1 ) ライセンス: Link先を確認	Cansu Korkmaz, A.Murat Tekalp, Zafer Dogan	(参考訳) 逆問題では、エンドツーエンドのトレーニングされたネットワークがトレーニングセットに見られる劣化モデルに過剰に適合していること、すなわち、それらは他のタイプの劣化にうまく一般化しないことがよく知られている。近年,未知フィルタによりサンプリングされた画像を,ビキュービックにダウンサンプリングされたルックアライクな画像にマッピングする手法が提案されている。本稿では,まず入力された劣化した画像を中間領域にマッピングし,次いでその中間領域から出力画像を生成するための第2のネットワークを訓練することにより,任意の逆問題を定式化できることを示す。さらに、最適な中間領域はタスクによって異なる場合がある。実験の結果, この2段階のドメイン適応トレーニング戦略は, 未知の劣化のクラスに対してより良い結果を得るだけでなく, 他の未知の劣化クラスにも一般化できることがわかった。 It is well-known that in inverse problems, end-to-end trained networks overfit the degradation model seen in the training set, i.e., they do not generalize to other types of degradations well. Recently, an approach to first map images downsampled by unknown filters to bicubicly downsampled look-alike images was proposed to successfully super-resolve such images. In this paper, we show that any inverse problem can be formulated by first mapping the input degraded images to an intermediate domain, and then training a second network to form output images from these intermediate images. Furthermore, the best intermediate domain may vary according to the task. Our experimental results demonstrate that this two-stage domain-adapted training strategy does not only achieve better results on a given class of unknown degradations but can also generalize to other unseen classes of degradations better.	翻訳日:2021-06-02 14:00:07 公開日:2021-06-01
# 限定的な通信と差分プライバシーを持つ無線フェデレーション学習 Wireless Federated Learning with Limited Communication and Differential Privacy ( http://arxiv.org/abs/2106.00564v1 ) ライセンス: Link先を確認	Amir Sonee and Stefano Rini and Yu-Chih Huang	(参考訳) 本稿では,空力計算(AirComp)に基づくフェデレーション学習(FL)モデルにおいて,ローカルデータセットの効率的な通信と差分プライバシー(DP)における次元性低減の役割について検討する。より正確には、ガウスマルチアクセスチャネル(gmac)上のパラメータサーバ(ps)との同時チャネル認識と限定的な通信により、クライアントが機械学習モデルをトレーニングするように促されるfl設定を考える。この設定のために、局所勾配に基づいて与えられた損失関数の最小値をトレーニングするためのフェデレート確率勾配降下(FedSGD)、局所的な更新の次元を小さくするためのジョンソン・リンデンシュトラウス(JL)ランダムプロジェクション、ユーザのプライバシーをさらに支援するための人工ノイズを適用するアルゴリズムを提案する。本手法では,各次元に大きなノイズを注入し,ベクトルの感度を一定に保ちながら,局所DP性能が主に向上していることが示唆された。これは次元減少のない場合と比較して収束速度が遅くなるのに対してである。コンバージェンスが遅いため、プライバシとコンバージェンスの間のトレードオフは高いが、高次元のシステムでは、通信コストをはるかに少なくしてほぼ同じトレードオフが発生することが示されている。 This paper investigates the role of dimensionality reduction in efficient communication and differential privacy (DP) of the local datasets at the remote users for over-the-air computation (AirComp)-based federated learning (FL) model. More precisely, we consider the FL setting in which clients are prompted to train a machine learning model by simultaneous channel-aware and limited communications with a parameter server (PS) over a Gaussian multiple-access channel (GMAC), so that transmissions sum coherently at the PS globally aware of the channel coefficients. For this setting, an algorithm is proposed based on applying federated stochastic gradient descent (FedSGD) for training the minimum of a given loss function based on the local gradients, Johnson-Lindenstrauss (JL) random projection for reducing the dimension of the local updates, and artificial noise to further aid user's privacy. For this scheme, our results show that the local DP performance is mainly improved due to injecting noise of greater variance on each dimension while keeping the sensitivity of the projected vectors unchanged. This is while the convergence rate is slowed down compared to the case without dimensionality reduction. 
As the improved privacy performance outweighs the slower convergence, the trade-off between privacy and convergence is higher, but it is shown to lessen in the high-dimensional regime, yielding almost the same trade-off at much lower communication cost.	翻訳日:2021-06-02 13:59:52 公開日:2021-06-01
# 機械学習に基づく物理イベント生成に関する調査 A survey of machine learning-based physics event generation ( http://arxiv.org/abs/2106.00643v1 ) ライセンス: Link先を確認	Yasir Alanazi, N. Sato, Pawel Ambrozewicz, Astrid N. Hiller Blin, W. Melnitchouk, Marco Battaglieri, Tianbo Liu, Yaohang Li	(参考訳) 高エネルギー核および素粒子物理学における事象生成子は、粒子反応の研究を促進する上で重要な役割を果たす。物理イベントジェネレータの構築における機械学習(ML)の取り組みの現状について調査する。 MLベースのイベントジェネレータで使用されるML生成モデルとその特定の課題について検討し、これらの課題を克服するために、MLモデル設計に物理を組み込む様々なアプローチについて議論する。最後に,ML技術に基づく物理イベント生成のための超解像,忠実度,外挿に関するオープンな質問について検討する。 Event generators in high-energy nuclear and particle physics play an important role in facilitating studies of particle reactions. We survey the state-of-the-art of machine learning (ML) efforts at building physics event generators. We review ML generative models used in ML-based event generators and their specific challenges, and discuss various approaches of incorporating physics into the ML model designs to overcome these challenges. Finally, we explore some open questions related to super-resolution, fidelity, and extrapolation for physics event generation based on ML technology.	翻訳日:2021-06-02 13:58:58 公開日:2021-06-01
# フォグベースのIoTにおける通信性能とエネルギー利用向上のための強化学習手法 A reinforcement learning approach to improve communication performance and energy utilization in fog-based IoT ( http://arxiv.org/abs/2106.00654v1 ) ライセンス: Link先を確認	Babatunji Omoniwa, Maxime Gueriau and Ivana Dusparic	(参考訳) 近年の研究では、利用可能なモバイルフォグデバイス(スマートフォン、ドローン、国内および産業用ロボットなど)をリレーとして、センサーと目的地デバイス間の通信停止を最小限に抑える可能性を実証している。しかし、移動中のリレーは移動時にエネルギーを減らし、遠隔地へ送信する。したがって、中継装置の電力制御機構とインテリジェントモビリティは、通信性能とエネルギー利用の改善に不可欠である。本稿では,各移動式フォグ中継エージェント(MFRA)を,強化学習を用いて通信性能とエネルギー利用を同時に向上させる自律エージェントによって制御する,Qラーニングに基づく分散型アプローチを提案する。それぞれの自律エージェントは、目的地とそのエネルギーレベルからのフィードバックに基づいて、メッセージの送信を継続するか、送信フェーズに受動的になるかを学習する。本手法は集中型アプローチと比較し,MFRAの少ない数で信頼性の高いデータ配信を実現し,全体のエネルギーコストを 56.76\% -- 88.03\% 削減できることを示した。 Recent research has shown the potential of using available mobile fog devices (such as smartphones, drones, domestic and industrial robots) as relays to minimize communication outages between sensors and destination devices, where localized Internet-of-Things services (e.g., manufacturing process control, health and security monitoring) are delivered. However, these mobile relays deplete energy when they move and transmit to distant destinations. As such, power-control mechanisms and intelligent mobility of the relay devices are critical in improving communication performance and energy utilization. In this paper, we propose a Q-learning-based decentralized approach where each mobile fog relay agent (MFRA) is controlled by an autonomous agent which uses reinforcement learning to simultaneously improve communication performance and energy utilization. Each autonomous agent learns based on the feedback from the destination and its own energy levels whether to remain active and forward the message, or become passive for that transmission phase. 
We evaluate the approach by comparing it with a centralized approach and observe that, with fewer MFRAs, our approach ensures reliable delivery of data and reduces overall energy cost by 56.76% to 88.03%.	翻訳日:2021-06-02 13:58:50 公開日:2021-06-01

Title

Authors

Abstract

論文公表日・翻訳日

# ベル非局所性による通信複雑性の量子的利点

Quantum advantages of communication complexity from Bell nonlocality ( http://arxiv.org/abs/2004.05098v3 )

ライセンス: Link先を確認

Zhih-Ahn Jia, Lu Wei, Yu-Chun Wu, Guang-Can Guo

(参考訳) コミュニケーションゲームは物理理論の限界を調査するための重要なツールである。通信複雑性(CC)問題は、いくつかの分散パーティが古典的な通信に制限のある関数を共同で計算しようとする典型的な例である。本研究では,ベルテストによるcc問題をグラフ理論的に構築する手法を提案する。実験的な整合性グラフとそれに対応するベルテスト関数から、各エッジの情報をエンコードするターゲット関数を構築することができ、このターゲット関数を用いて、事前共有された絡み合った状態により、成功確率が任意の古典的戦略に対してそれを超えるCC関数を構築することができる。 Popescu-Rohrlich ボックスに基づく非署名プロトコルについても論じられ、この場合の成功確率は 1 である。

Communication games are crucial tools for investigating the limitations of physical theories. The communication complexity (CC) problem is a typical example, in which several distributed parties attempt to jointly calculate a given function with limited classical communication. In this work, we present a method to construct CC problems from Bell tests in a graph-theoretic way. Starting from an experimental compatibility graph and the corresponding Bell test function, a target function which encodes the information of each edge can be constructed; using this target function, we can construct a CC function for which, by pre-sharing entangled states, the success probability exceeds that of any classical strategy. The non-signaling protocol based on the Popescu-Rohrlich box is also discussed, and the success probability in this case reaches one.

翻訳日:2023-05-25 06:14:17 公開日:2021-06-01

# 2つの鏡間を加速する2つの絡み合った原子の共鳴相互作用

Resonance interaction of two entangled atoms accelerating between two mirrors ( http://arxiv.org/abs/2007.15465v2 )

ライセンス: Link先を確認

Riddhi Chatterjee, Sunandan Gangopadhyay and A. S. Majumdar

(参考訳) 量子化スカラー場真空と結合した2つの絡み合った同一原子間の共鳴相互作用と2つのミラー間の加速について検討した。非慣性運動中の原子配置によって、2つの原子の絡み合った状態の放射過程がどう操作できるかを示す。ハイゼンベルク図を対称作用素順序で組み込むと、真空揺らぎと自己反応の寄与が区別される。ハイゼンベルク運動方程式における自己反応の寄与から、2つの原子系の共鳴エネルギーシフトと緩和速度を評価する。本研究では, 原子加速, 原子間距離, 境界に対する位置などのパラメータによる2つの量の変化について検討する。以上のパラメータをチューニングすることにより,エネルギーレベルシフトと緩和率の両方を制御できることを示す。

We study the resonance interaction between two entangled identical atoms coupled to a quantized scalar field vacuum and accelerating between two mirrors. We show how radiative processes of the two-atom entangled state can be manipulated by the atomic configuration undergoing noninertial motion. Using the Heisenberg picture with symmetric operator ordering, the vacuum fluctuation and the self-reaction contributions are distinguished. We evaluate the resonance energy shift and the energy relaxation rate of the two-atom system from the self-reaction contribution in the Heisenberg equation of motion. We investigate the variation of these two quantities with relevant parameters such as atomic acceleration, interatomic distance, and position with respect to the boundaries. We show that both the energy level shift and the relaxation rate can be controlled by tuning the above parameters.

翻訳日:2023-05-07 18:31:04 公開日:2021-06-01

# 2レベル結合系におけるエネルギーの量子対古典輸送

Quantum versus classical transport of energy in coupled two-level systems ( http://arxiv.org/abs/2007.15669v2 )

ライセンス: Link先を確認

I. Medina, S. V. Moreira, and F. L. Semião

(参考訳) 結合量子系の連鎖におけるエネルギー輸送の問題は、非古典的資源が輸送にどのように影響するかの光を遮蔽することを目的として検討する。チェーン内でコヒーレントまたは非コヒーレントなエネルギーホッピングが行われる場合について検討する。ここでは、非コヒーレントエネルギーホッピングは、非結合部位の固有状態によって形成されるその完全な対角線力学への言及において「古典的」シナリオと呼ばれる。 2-レベルサイトの線形連鎖の場合に注目し、コヒーレントな量子ケースが非コヒーレントなサイトよりも効率的であるホッピングレートのしきい値を見つける。次に、量子ホッピングレートをコヒーレンス大域的最大値にリンクすることで、量子シナリオをより効率的にするコヒーレンスしきい値が存在することを示すことができる。次に、ダイナミクスによって生成される統合的コヒーレンスを考察し、量子演算の侵入性として知られるものとの関連性を示す。本結果は,量子輸送の資源として量子侵襲性が果たす重要な役割を強く示唆する。

We consider the problem of energy transport in a chain of coupled quantum systems with the goal of shedding light on how nonclassical resources can affect transport. We study the cases for which either coherent or incoherent energy hopping takes place in the chain. Here, incoherent energy hopping is referred to as the "classical" scenario in allusion to its fully diagonal dynamics in the basis formed by the eigenstates of the decoupled sites. We focus on the case of a linear chain of two-level sites and find a hopping rate threshold above which the coherent quantum case is more efficient than the incoherent counterpart. We then link the quantum hopping rate to the coherence global maximum, which allows us to state that there is a coherence threshold above which the quantum scenario is more efficient. Next, we consider the integrated coherence generated by the dynamics and show how it is related to what is known as the invasiveness of a quantum operation. Our results strongly suggest the significant role played by quantum invasiveness as a resource for quantum transport.

翻訳日:2023-05-07 18:10:39 公開日:2021-06-01
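
As a rough illustration of the coherent-versus-incoherent comparison in the entry above (arXiv:2007.15669), the following Python sketch propagates a single excitation along a short open chain in two ways: unitary hopping with amplitude `g`, and a classical rate equation with rate `gamma`. The model (no sink, no dephasing), the function names, and the parameters are assumptions made for this sketch; the abstract does not specify the paper's actual model or its efficiency measure.

```python
import numpy as np

def chain_matrix(n_sites, rate):
    """Nearest-neighbour coupling matrix of an open linear chain (hypothetical helper)."""
    m = np.zeros((n_sites, n_sites))
    for j in range(n_sites - 1):
        m[j, j + 1] = m[j + 1, j] = rate
    return m

def coherent_populations(n_sites, g, t):
    """Site populations under unitary hopping, excitation starting on site 0."""
    energies, modes = np.linalg.eigh(chain_matrix(n_sites, g))
    amp0 = modes[0, :]  # overlap of site 0 with each eigenmode
    psi_t = modes @ (np.exp(-1j * energies * t) * amp0)
    return np.abs(psi_t) ** 2

def incoherent_populations(n_sites, gamma, t):
    """Site populations under the classical rate equation dp/dt = W p."""
    w = chain_matrix(n_sites, gamma)
    w -= np.diag(w.sum(axis=0))  # columns sum to zero, so probability is conserved
    rates, modes = np.linalg.eigh(w)  # W is symmetric for uniform rates
    p0 = np.zeros(n_sites)
    p0[0] = 1.0
    return modes @ (np.exp(rates * t) * (modes.T @ p0))
```

Comparing, for example, `coherent_populations(5, 1.0, t)[-1]` with `incoherent_populations(5, 1.0, t)[-1]` over a range of `t` gives a feel for how the two kinds of hopping deliver population to the end of the chain.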

# ランダム多成分量子状態に対する古典相関の共有可能性の制限

Restrictions on shareability of classical correlations for random multipartite quantum states ( http://arxiv.org/abs/2008.09592v2 )

ライセンス: Link先を確認

Saptarshi Roy, Shiladitya Mal, Aditi Sen De

(参考訳) 量子相関とは異なり、多者状態の2者間における古典相関(CC)の共有可能性は自由であると考えられている。これは、各縮約状態のCCが同時に代数的最大値に達する状態が存在するためである。しかし、状態空間からランダムに状態を選ぶと、代数的最大値を持つ状態が得られる確率は無視できるほど小さいことが分かる。本研究では、ランダムな多者状態をハール一様に生成し、さまざまなCC測度、すなわち従来の古典相関子と、古典相関の2つの公理的測度(量子ディスコードの古典的部分とワークデフィシットの局所仕事)について頻度分布を計算することで、非自明な上限の可能性を探る。分布は典型的にはガウス型であり、その標準偏差は参加者数の増加とともに減少する。また、多量子ビットのランダム状態では縮約密度行列の大半がわずかなCCしか持たないことが明らかになり、これは分布の平均からも確認できる。したがって、ランダム状態に対する古典相関の共有可能性には一種の制限が存在する。さらに、ランダム状態における最大値は、特定の状態の集合に対して得られる代数的最大値よりもかなり低く、両者の差は参加者数が多い状態ほど大きくなる。参加者数が多い場合、量子ディスコードの古典的部分と局所仕事はモノガミーに基づく共有可能性の上限に従いうるが、古典相関子は異なる上限を持つ。ランダム状態における古典相関測度の共有可能性の傾向は、古典相関の公理的定義と従来の定義とを明確に区別する。

Unlike quantum correlations, the shareability of classical correlations (CCs) between two parties of a multipartite state is assumed to be free, since there exist states for which the CCs of each of the reduced states can simultaneously reach their algebraic maximum value. However, when one randomly picks states from the state space, we find that the probability of obtaining states possessing the algebraic maximum value is vanishingly small. We explore the possibility of a nontrivial upper bound by Haar-uniformly generating random multipartite states and computing the frequency distribution for various CC measures: conventional classical correlators and two axiomatic measures of classical correlations, namely the classical part of quantum discord and the local work of work-deficit. We find that the distributions are typically Gaussian-like and that their standard deviations decrease as the number of parties increases. The analysis also reveals that, among multiqubit random states, most of the reduced density matrices possess a low amount of CCs, which is also confirmed by the mean of the distributions, thereby showing a kind of restriction on the shareability of classical correlations for random states. Furthermore, we notice that the maximal value for random states is much lower than the algebraic maxima obtained for a set of states, and that the gap between the two increases further for states with a higher number of parties. We report that, for a higher number of parties, the classical part of quantum discord and local work can follow a monogamy-based upper bound on shareability, while classical correlators have a different upper bound. The trends of shareability for classical correlation measures in random states clearly demarcate between the axiomatic definition of classical correlations and the conventional ones.

翻訳日:2023-05-05 07:48:43 公開日:2021-06-01
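
The sampling procedure described in the entry above (arXiv:2008.09592) can be sketched in a few lines of Python: draw Haar-uniform pure states of several qubits, reduce them to a pair of qubits, and collect the distribution of a correlator. The correlator used here, |⟨σz⊗σz⟩| on qubits 0 and 1, is only a stand-in chosen for this sketch and is not claimed to be any of the measures used in the paper; all function names are hypothetical.

```python
import numpy as np

def haar_random_state(n_qubits, rng):
    """Draw a Haar-uniform pure state by normalising a complex Gaussian vector."""
    dim = 2 ** n_qubits
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return psi / np.linalg.norm(psi)

def reduced_two_qubit(psi, n_qubits, i, j):
    """Reduced density matrix of qubits i and j (i < j) of a pure state."""
    t = psi.reshape([2] * n_qubits)
    t = np.moveaxis(t, [i, j], [0, 1]).reshape(4, -1)
    return t @ t.conj().T  # trace out all remaining qubits

def zz_correlator(rho):
    """Expectation value of sigma_z (x) sigma_z for a two-qubit state."""
    zz = np.diag([1.0, -1.0, -1.0, 1.0])
    return float(np.real(np.trace(rho @ zz)))

def sample_correlators(n_qubits, n_samples, seed=0):
    """Sample |<ZZ>| on qubits 0 and 1 over Haar-random states."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_samples):
        psi = haar_random_state(n_qubits, rng)
        rho = reduced_two_qubit(psi, n_qubits, 0, 1)
        values.append(abs(zz_correlator(rho)))
    return np.array(values)
```

Calling `sample_correlators(n, 1000)` for increasing `n` and comparing the mean and standard deviation of the samples shows how the distribution of this stand-in correlator changes with the number of parties.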

# 強いポンプ場下における超伝導量子パラメトロンの制御

Controls of a superconducting quantum parametron under a strong pump field ( http://arxiv.org/abs/2009.05723v2 )

ライセンス: Link先を確認

Shumpei Masuda, Toyofumi Ishikawa, Yuichiro Matsuzaki and Shiro Kawabata

(参考訳) 固有周波数の約2倍の周波数でポンプされたジョセフソンパラメトリック発振器は自励発振を示し、パラメトロンまたはカーパラメトリック発振器と呼ばれる。自励発振するパラメトロンを量子ビットとして用いる量子アニーリングと万能量子計算が提案されている。しかし、ポンプ場下でのパラメトロンの制御は、回転波近似の破れに由来する、ハミルトニアン中の不要な高速振動項によって劣化する。我々はこれを非共鳴高速振動項(NROT)と呼ぶ。したがって、ポンプ場はパラメトロン制御の不完全性の本質的な原因となりうる。ここでは、猫状態の生成と単一量子ビットゲートというパラメトロン制御の精度に対するNROTの影響を理論的に調べる。従来の手法では、非断熱遷移の抑制と回転波近似の妥当性との間にトレードオフの関係があることを示す。また、ポンプ場のデチューニングの時間依存性を適切に設計することで、非断熱遷移と、NROTによるパラメトロン状態の乱れの両方を抑制できることを示す。

Pumped at approximately twice the natural frequency, a Josephson parametric oscillator called parametron or Kerr parametric oscillator shows self-oscillation. Quantum annealing and universal quantum computation using self-oscillating parametrons as qubits were proposed. However, controls of parametrons under the pump field are degraded by unwanted rapidly oscillating terms in the Hamiltonian, which we call non-resonant rapidly oscillating terms (NROTs) coming from the violation of the rotating wave approximation. Therefore, the pump field can be an intrinsic origin of the imperfection of controls of parametrons. Here, we theoretically study the influence of the NROTs on the accuracy of controls of a parametron: a cat-state creation and a single-qubit gate. It is shown that there is a trade-off relationship between the suppression of the nonadiabatic transitions and the validity of the rotating wave approximation in a conventional approach. We also show that the tailored time dependence of the detuning of the pump field can suppress both of the nonadiabatic transitions and the disturbance of the state of the parametron due to the NROTs.

翻訳日:2023-05-02 10:50:33 公開日:2021-06-01

# 2成分ボース-アインシュタイン凝縮体の遠隔状態準備

Remote state preparation of two-component Bose-Einstein condensates ( http://arxiv.org/abs/2009.06923v2 )

ライセンス: Link先を確認

Manish Chaudhary, Matteo Fadel, Ebubechukwu O. Ilo-Okeke, Alexey N. Pyrkov, Valentin Ivannikov, Tim Byrnes

(参考訳) スピンアンサンブルに対する遠隔状態準備プロトコルを提案する。その目的は、エンタングルメント、局所的なスピン回転、フォック基底での測定を用いて、与えられたスピン期待値の組を持つ状態を遠隔のスピンアンサンブル上に準備することである。スピンアンサンブルは、熱的な原子アンサンブルやスピノルボース-アインシュタイン凝縮体によって実現できる。このプロトコルはホルシュタイン-プリマコフ近似を超えて機能し、ブロッホ球全体にわたるスピン期待値を準備できる。実用上の主な障害は、スピンアンサンブル間の最大エンタングル状態の準備である。これを克服するために、最大エンタングル状態の代わりに2軸2スピン(2A2S)ハミルトニアンに基づく状態を用いることを検討し、その性能を調べる。2A2Sスクイージングを用いたプロトコルは最大エンタングル状態をよく近似し、スピン平均を遠隔で準備できることが分かった。2A2Sスクイーズド状態を用いた場合の誤差を評価し、誤差がアンサンブルサイズとともに減少することを確認した。ポストセレクションにより、誤差をさらに系統的に減少させることができる。

A protocol for remote state preparation is proposed for spin ensembles, where the aim is to prepare a state with a given set of spin expectation values on a remote spin ensemble using entanglement, local spin rotations, and measurements in the Fock basis. The spin ensembles could be realized by thermal atomic ensembles or spinor Bose-Einstein condensates. The protocol works beyond the Holstein-Primakoff approximation, such that spin expectation values for the full Bloch sphere can be prepared. The main practical obstacle is the preparation of the maximally entangled state between the spin ensembles. To overcome this, we examine using states based on the two-axis two-spin (2A2S) Hamiltonian in place of the maximally entangled state and examine its performance. We find that the version of the protocol with 2A2S squeezing well-approximates the maximally entangled state, such that spin averages can be remotely prepared. We evaluate the errors of using 2A2S squeezed states, and find that it decreases with the ensemble size. With post-selection, errors can be systematically decreased further.

翻訳日:2023-05-02 04:37:41 公開日:2021-06-01

# 量子ビットアレイによるマイクロ波光子検出におけるハイゼンベルク限界に向けて

Towards the Heisenberg limit in microwave photon detection by a qubit array ( http://arxiv.org/abs/2009.11271v3 )

ライセンス: Link先を確認

P. Navez, A. G. Balanov, S. E. Savel'ev, A. M. Zagoskin

(参考訳) 解析的に可解なモデルを用いて、量子ビットアレイに基づく検出器は、単一光子を検出する際の基本的なハイゼンベルク限界を達成することができることを示す。超伝導量子ビットの場合、これは重要なマイクロ波領域における量子センシングと通信の新しい機会を開く。

Using an analytically solvable model, we show that a qubit array-based detector can achieve the fundamental Heisenberg limit in detecting single photons. In the case of superconducting qubits, this opens new opportunities for quantum sensing and communications in the important microwave range.

翻訳日:2023-05-01 04:46:27 公開日:2021-06-01

# 量子環の電子的性質を正確に計算する

Accurately computing electronic properties of a quantum ring ( http://arxiv.org/abs/2012.00921v2 )

ライセンス: Link先を確認

C. Neill, T. McCourt, X. Mi, Z. Jiang, M. Y. Niu, W. Mruczkiewicz, I. Aleiner, F. Arute, K. Arya, J. Atalaya, R. Babbush, J. C. Bardin, R. Barends, A. Bengtsson, A. Bourassa, M. Broughton, B. B. Buckley, D. A. Buell, B. Burkett, N. Bushnell, J. Campero, Z. Chen, B. Chiaro, R. Collins, W. Courtney, S. Demura, A. R. Derk, A. Dunsworth, D. Eppens, C. Erickson, E. Farhi, A. G. Fowler, B. Foxen, C. Gidney, M. Giustina, J. A. Gross, M. P. Harrigan, S. D. Harrington, J. Hilton, A. Ho, S. Hong, T. Huang, W. J. Huggins, S. V. Isakov, M. Jacob-Mitos, E. Jeffrey, C. Jones, D. Kafri, K. Kechedzhi, J. Kelly, S. Kim, P. V. Klimov, A. N. Korotkov, F. Kostritsa, D. Landhuis, P. Laptev, E. Lucero, O. Martin, J. R. McClean, M. McEwen, A. Megrant, K. C. Miao, M. Mohseni, J. Mutus, O. Naaman, M. Neeley, M. Newman, T. E. O'Brien, A. Opremcak, E. Ostby, B. Pato, A. Petukhov, C. Quintana, N. Redd, N. C. Rubin, D. Sank, K. J. Satzinger, V. Shvarts, D. Strain, M. Szalay, M. D. Trevithick, B. Villalonga, T. C. White, Z. Yao, P. Yeh, A. Zalcman, H. Neven, S. Boixo, L. B. Ioffe, P. Roushan, Y. Chen, V. Smelyanskiy

(参考訳) 凝縮系を研究するための有望なアプローチは、人工的に設計された量子プラットフォーム上でそれらをシミュレートすることである。しかし、古典的手法を上回るのに必要な精度を達成することは、未解決の課題であった。ここでは、18個の超伝導量子ビットを用いて、高精度な凝縮系シミュレータの実験的な青写真を示し、基本的な電子的性質を調べる方法を実証する。1次元ワイヤの単一粒子バンド構造を再構成することで、基礎となる手法をベンチマークする。デコヒーレンスと読み出し誤差をほぼ完全に緩和できることを実証し、典型的なエネルギースケールが1 rad程度であるのに対し、このワイヤのエネルギー固有値を約0.01 radの誤差で測定する精度に到達した。この前例のないアルゴリズム忠実度についての洞察は、1e-4 radの統計的不確かさで固有エネルギーを分解できることを含む、フーリエ変換の頑健な性質に注目することで得られる。さらに、凝縮系の2つの重要な要素である磁束と乱れた局所ポテンシャルを合成する。磁束を掃引すると、スペクトルに準位の擬交差が観測され、これは局所的な乱れの空間分布の詳細な指紋となる。これらの手法を組み合わせて固有状態の電子的性質を再構成し、永久電流と、乱れの付加によるコンダクタンスの強い抑制を観測する。本研究は、量子シミュレーションの高精度な手法を記述し、超伝導量子ビットを用いて新しい量子材料を研究する道を開く。

A promising approach to study condensed-matter systems is to simulate them on an engineered quantum platform. However, achieving the accuracy needed to outperform classical methods has been an outstanding challenge. Here, using eighteen superconducting qubits, we provide an experimental blueprint for an accurate condensed-matter simulator and demonstrate how to probe fundamental electronic properties. We benchmark the underlying method by reconstructing the single-particle band-structure of a one-dimensional wire. We demonstrate nearly complete mitigation of decoherence and readout errors and arrive at an accuracy in measuring energy eigenvalues of this wire with an error of ~0.01 rad, whereas typical energy scales are of order 1 rad. Insight into this unprecedented algorithm fidelity is gained by highlighting robust properties of a Fourier transform, including the ability to resolve eigenenergies with a statistical uncertainty of 1e-4 rad. Furthermore, we synthesize magnetic flux and disordered local potentials, two key tenets of a condensed-matter system. When sweeping the magnetic flux, we observe avoided level crossings in the spectrum, a detailed fingerprint of the spatial distribution of local disorder. Combining these methods, we reconstruct electronic properties of the eigenstates where we observe persistent currents and a strong suppression of conductance with added disorder. Our work describes an accurate method for quantum simulation and paves the way to study novel quantum materials with superconducting qubits.

翻訳日:2023-04-22 08:06:07 公開日:2021-06-01
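
To make the role of magnetic flux and local disorder in the entry above (arXiv:2012.00921) more concrete, here is a minimal classical sketch of the single-particle spectrum of a tight-binding ring. The Hamiltonian form, the uniform distribution of the flux phase over the bonds, and all names and parameter values are assumptions for illustration; this is not the experimental protocol of the paper, which reconstructs the spectrum from measurements on superconducting qubits.

```python
import numpy as np

def ring_hamiltonian(n_sites, hopping, flux, onsite=None):
    """Single-particle Hamiltonian of a ring (n_sites >= 3) threaded by a flux phase.

    The total phase `flux` is spread evenly over the bonds. `onsite` is an
    optional array of local potentials that models disorder.
    """
    h = np.zeros((n_sites, n_sites), dtype=complex)
    phase = np.exp(1j * flux / n_sites)
    for j in range(n_sites):
        k = (j + 1) % n_sites
        h[k, j] += hopping * phase
        h[j, k] += hopping * np.conj(phase)
    if onsite is not None:
        h += np.diag(onsite)
    return h

def ring_spectrum(n_sites, hopping, flux, onsite=None):
    """Eigenvalues of the ring in ascending order."""
    return np.linalg.eigvalsh(ring_hamiltonian(n_sites, hopping, flux, onsite))
```

For a clean ring the eigenvalues follow 2·hopping·cos((2πm − flux)/n_sites), so pairs of levels cross as the flux is swept. Passing a small random `onsite` array lifts these degeneracies, which is the kind of avoided level crossing the abstract refers to.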

# 一般化安定化状態の効率的な絡み合い生成と検出

Efficient entanglement generation and detection of generalized stabilizer states ( http://arxiv.org/abs/2012.07606v2 )

ライセンス: Link先を確認

Yihong Zhang, Yifan Tang, You Zhou, Xiongfeng Ma

(参考訳) 大規模絡み合いの生成と検証は量子技術の発展に不可欠である。本論文では,ハイゼンベルク相互作用を用いて,多数の量子ビットの真の多部絡み合いを生成する効率的なスキームを提案する。この方法は超伝導、閉じ込められたイオン、低温原子系を含む様々な物理プラットフォームで便利に実装できる。出力量子状態の絡み合いを特徴付けるために、安定化器形式を一般化し、絡み合い証人法を開発する。特に,与えられた雑音レベル以下の計測設定を最小にすることで,絡み合い証人を最適化する汎用探索アルゴリズムを設計した。実用化の観点から,実験効率と検出堅牢性とのトレードオフを数値的に検討する。

The generation and verification of large-scale entanglement are essential to the development of quantum technologies. In this paper, we present an efficient scheme to generate genuine multipartite entanglement of a large number of qubits by using the Heisenberg interaction. This method can be conveniently implemented in various physical platforms, including superconducting, trapped-ion, and cold-atom systems. In order to characterize the entanglement of the output quantum state, we generalize the stabilizer formalism and develop an entanglement witness method. In particular, we design a generic searching algorithm to optimize entanglement witness with a minimal number of measurement settings under a given noise level. From the perspective of practical applications, we numerically study the trade-off between the experiment efficiency and the detection robustness.

翻訳日:2023-04-20 21:23:03 公開日:2021-06-01

# 駆動された2準位系浴が誘起する有効量子ダイナミクス

Effective quantum dynamics induced by a driven two-level-system bath ( http://arxiv.org/abs/2012.11235v2 )

ライセンス: Link先を確認

Katja Kustura, Oriol Romero-Isart, Carlos Gonzalez-Ballestero

(参考訳) 損失はあるがコヒーレントに駆動される2準位系(TLS)の浴が、ジェインズ-カミングス相互作用を介してボソン系に結合したときに誘起する散逸を記述する、ボルン-マルコフ型マスター方程式を導出する。マスター方程式のすべてのレートを解析的に導出する。同一のTLSに結合した単一モード系という特定の場合について、これらのレートを特徴づける。系の定常状態と、駆動されたTLS浴の非熱的定常状態に由来するそのエキゾチックな性質を調べる。これらの性質には、散逸的増幅、浴が誘起する線形不安定性、コヒーレントおよび散逸的なスクイージングが含まれる。このマスター方程式は任意に強いTLS駆動に対して有効であり、多準位系や他の系-浴相互作用項などを含むように一般化できる。本研究は、例えば超伝導回路、マグノン系、量子音響に基づく量子技術デバイスにおける主要な制限要因である、TLSが誘起するデコヒーレンスを調べ、特徴づけるためのツールを提供する。

We derive a Born-Markov master equation describing the dissipation induced by a bath of lossy but coherently driven two-level systems (TLS) coupled to a bosonic system via Jaynes-Cummings interaction. We analytically derive all the master equation rates. We characterize these rates for the particular case of a single-mode system coupled to identical TLS. We study the steady state of the system and its exotic properties stemming from the non-thermal stationary state of the driven TLS bath. These properties include dissipative amplification, bath-induced linear instability, and both coherent and dissipative squeezing. The master equation is valid for arbitrarily strong TLS driving, and it can be generalized to include multi-level systems or other system-bath interaction terms, among others. Our work provides a tool to study and characterize TLS-induced decoherence, a key limiting factor in quantum technological devices based on, for instance, superconducting circuits, magnonic systems, or quantum acoustics.

翻訳日:2023-04-20 00:37:37 公開日:2021-06-01

# 量子インターネット: アプリケーション、機能、実現技術、課題、研究の方向性

Quantum Internet- Applications, Functionalities, Enabling Technologies, Challenges, and Research Directions ( http://arxiv.org/abs/2101.04427v2 )

ライセンス: Link先を確認

Amoldeep Singh, Kapal Dev, Harun Siljak, Hem Dutt Joshi and Maurizio Magarini

(参考訳) 私たちが現在使っている高性能なノートパソコン、携帯電話、インターネットアプリケーションは、すべて0と1の古典的な通信ビットに根ざしている。古典的なインターネットは、数学とクロード・シャノンの情報理論の融合を基礎として築かれた。しかし、今日のインターネット技術は盗聴者の遊び場となっている。これは、古典的なインターネット技術に依存するさまざまなアプリケーションに深刻な課題をもたらす。このことが、根本的により安全な新しい技術への移行を研究者に促してきた。研究者は量子効果を探究することで、セキュリティ、プライバシーに加え、量子計算、量子通信、量子計測といった幅広い機能を提供する量子ネットワークへの道を切り開いた。量子インターネットの実現には、量子暗号プロトコルで保護された量子チャネルを介した、さまざまな遠隔ノード間の量子通信が必要である。このようなネットワークは、0と1の値を同時に取りうる量子ビットに依存している。エンタングルメント、テレポーテーション、重ね合わせといった量子ビットの特異な性質により、量子ネットワークは多くの点で従来のネットワークより優位に立つ。しかし同時に、量子ビットを長距離伝送することは非常に困難な課題であり、そのような距離での量子テレポーテーションについて広範な研究が進められている。これは近い将来、量子インターネットを物理的に実現するための突破口となるだろう。本稿では、グローバルな量子インターネットの開発に必要なインフラについて読者が基本的な理解を得られるよう、量子インターネットの機能、技術、アプリケーション、未解決の課題を幅広く調査する。

The advanced notebooks, mobile phones, and internet applications that we use in today's world are all entrenched in classical communication bits of zeros and ones. The classical internet has its foundation in the amalgamation of mathematics and Claude Shannon's theory of information. But today's internet technology is a playground for eavesdroppers. This poses a serious challenge to various applications that rely on classical internet technology. This has motivated researchers to switch to new technologies that are fundamentally more secure. Exploring quantum effects, researchers paved the way to quantum networks that provide security, privacy, and a range of capabilities such as quantum computation, communication, and metrology. The realization of the quantum internet requires quantum communication between various remote nodes through quantum channels guarded by quantum cryptographic protocols. Such networks rely upon quantum bits (qubits) that can simultaneously take the values of zero and one. The extraordinary properties of qubits, such as entanglement, teleportation, and superposition, give quantum networks an edge over traditional networks in many ways. At the same time, transmitting qubits over long distances is a formidable task, and extensive research is under way on quantum teleportation over such distances, which will be a breakthrough in physically realizing the quantum internet in the near future. In this paper, quantum internet functionalities, technologies, applications, and open challenges are extensively surveyed to help readers gain a basic understanding of the infrastructure required for the development of a global quantum internet.

翻訳日:2023-04-17 00:43:19 公開日:2021-06-01

# 曲がった時空における干渉可視度

Interferometric Visibility in Curved Spacetimes ( http://arxiv.org/abs/2101.06320v3 )

ライセンス: Link先を確認

Marcos L. W. Basso and Jonas Maziero

(参考訳) [M. Zych et al., Nat. Commun. 2, 505 (2011)] において著者らは、干渉の可視度が重力場の影響を受け、その影響は固有時という一般相対論的概念なしには説明できないと予測した。本研究では別の道筋をとり、まずニュートン極限における局所ローレンツ変換のユニタリ表現を用いて同じ効果を導出する。さらに、重力による干渉可視度への影響が異なる時空幾何においても持続することを示す。しかし、その影響は必ずしも固有時の概念によるものではない。例えば、シュワルツシルト時空に「天文学的な」マッハ-ツェンダー干渉計を構成すると、干渉可視度への影響は、別の一般相対論的効果である測地歳差に起因しうる。加えて、局所ローレンツ変換のユニタリ表現を用いることで、クォントンの運動を2次元の空間平面に制限するかぎり、干渉可視度のこの振る舞いが任意の時空に対して一般的であることを示す。

In [M. Zych et al., Nat. Commun. 2, 505 (2011)], the authors predicted that the interferometric visibility is affected by a gravitational field in a way that cannot be explained without the general relativistic notion of proper time. In this work, we take a different route and start by deriving the same effect using the unitary representation of the local Lorentz transformation in the Newtonian limit. In addition, we show that the effect on the interferometric visibility due to gravity persists in different spacetime geometries. However, the influence is not necessarily due to the notion of proper time. For instance, by constructing an 'astronomical' Mach-Zehnder interferometer in the Schwarzschild spacetime, the influence on the interferometric visibility can be due to another general relativistic effect, the geodetic precession. Besides, by using the unitary representation of the local Lorentz transformation, we show that this behavior of the interferometric visibility is general for an arbitrary spacetime, provided that we restrict the motion of the quanton to a two-dimensional spatial plane.

翻訳日:2023-04-15 02:55:33 公開日:2021-06-01

# リンドブラッド方程式における時間依存の環境結合による欠陥生成

Defect production due to time-dependent coupling to environment in the Lindblad equation ( http://arxiv.org/abs/2101.11334v2 )

ライセンス: Link先を確認

Balázs Gulácsi, Balázs Dóra

(参考訳) 近年、非エルミートハミルトニアンによる非ユニタリダイナミクスにおける欠陥生成が研究された。非エルミート結合を例外点を通過するように時間について線形に増加させると、エルミートな臨界点に近づく場合とほぼ同じように欠陥が生成される。一般化されたキブル-ズレックスケーリングにより、駆動の速度と対応する臨界指数を用いて、生じる欠陥密度のスケーリングが説明された。ここでは、リサイクリング項を加え、量子ジャンプを含む完全なリンドブラッド時間発展を考えることで、この設定を拡張する。環境との結合を時間について線形に増加させ、リウヴィリアンの定常解を超えて考えると、欠陥密度はすべての場合において駆動の速度に対して線形にスケールすることが分かる。このスケーリングは、過渡状態に現れうるリウヴィリアンの例外点の存在には影響されない。断熱摂動論の一種を用いることで、欠陥密度のスケーリングは一組の代数方程式から厳密に決定される。本研究は、定常状態と過渡状態に対応する例外点に対して、リンドブラッド時間発展が異なる感度を持つことを示している。

Recently, defect production was investigated during non-unitary dynamics due to a non-Hermitian Hamiltonian. By ramping up the non-Hermitian coupling linearly in time through an exceptional point, defects are produced in much the same way as approaching a Hermitian critical point. A generalized Kibble-Zurek scaling accounted for the ensuing scaling of the defect density in terms of the speed of the drive and the corresponding critical exponents. Here we extend this setting by adding the recycling term and considering the full Lindbladian time evolution of the problem with quantum jumps. We find that by linearly ramping up the environmental coupling in time, and going beyond the steady-state solution of the Liouvillian, the defect density scales linearly with the speed of the drive for all cases. This scaling is unaffected by the presence of exceptional points of the Liouvillian, which can show up in the transient states. By using a variant of the adiabatic perturbation theory, the scaling of the defect density is determined exactly from a set of algebraic equations. Our study indicates the distinct sensitivity of the Lindbladian time evolution to exceptional points corresponding to steady states and transient states.

翻訳日:2023-04-13 20:08:23 公開日:2021-06-01

# エンタングルメント容量によるホーキング放射の探査

Probing Hawking radiation through capacity of entanglement ( http://arxiv.org/abs/2102.02425v3 )

ライセンス: Link先を確認

Kohki Kawabata, Tatsuma Nishioka, Yoshitaka Okuyama and Kento Watanabe

(参考訳) 重力的な相転移に関連するモデルにおけるエンタングルメント容量を考察する。この容量はレプリカパラメータによってラベルづけされ、レプリカパラメータは熱力学における逆温度と同様の役割を果たす。放射するブラックホールのエンド・オブ・ザ・ワールド・ブレーン模型では、容量はページ時間付近にピークを持ち、これは異なるトポロジーを持つレプリカワームホール幾何の間の相転移を示す。同様に、ホーキング放射を記述する移動鏡模型では、支配的な鞍点が2つの相の間で切り替わるときに容量は典型的に不連続性を示し、これはアイランド領域の形成とみなすことができる。いずれの場合も、容量はブラックホールの蒸発過程に対する非常に有用な診断手段となりうることが分かる。

We consider the capacity of entanglement in models related with the gravitational phase transitions. The capacity is labeled by the replica parameter which plays a similar role to the inverse temperature in thermodynamics. In the end of the world brane model of a radiating black hole the capacity has a peak around the Page time indicating the phase transition between replica wormhole geometries of different types of topology. Similarly, in a moving mirror model describing Hawking radiation the capacity typically shows a discontinuity when the dominant saddle switches between two phases, which can be seen as a formation of island regions. In either case we find the capacity can be an invaluable diagnostic for a black hole evaporation process.

翻訳日:2023-04-12 20:12:07 公開日:2021-06-01
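
For readers unfamiliar with the quantity in the entry above (arXiv:2102.02425), the sketch below evaluates a capacity of entanglement for a given entanglement spectrum. It assumes the commonly used definition in which the capacity at replica parameter n is n² times the second derivative of log Tr ρⁿ with respect to n, which at n = 1 reduces to the variance of the modular Hamiltonian −log ρ. The function names are hypothetical, and the gravitational models of the paper are not reproduced here.

```python
import numpy as np

def entanglement_entropy(probs):
    """Von Neumann entropy of an entanglement spectrum (eigenvalues of rho_A)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # zero eigenvalues do not contribute
    return float(-np.sum(p * np.log(p)))

def capacity_of_entanglement(probs, n=1.0):
    """Capacity at replica parameter n, assumed to be n^2 d^2/dn^2 log Tr rho^n.

    This equals n^2 times the variance of log(p) under the weights p^n / sum(p^n).
    """
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    weights = p ** n / np.sum(p ** n)
    log_p = np.log(p)
    mean = np.sum(weights * log_p)
    return float(n ** 2 * (np.sum(weights * log_p ** 2) - mean ** 2))
```

A flat spectrum such as `[0.5, 0.5]` has large entropy but zero capacity, whereas a spectrum such as `[0.9, 0.1]` has nonzero capacity, which illustrates why the capacity carries information that the entropy alone does not.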

# 近接場放射伝熱固有モード

Near-Field Radiative Heat Transfer Eigenmodes ( http://arxiv.org/abs/2102.05769v2 )

ライセンス: Link先を確認

Stephen Sanders and Lauren Zundel and Wilton J. M. Kort-Kamp and Diego A. R. Dalvit and Alejandro Manjavacas

(参考訳) ナノスケール物体間の近接場電磁相互作用は、遠距離場黒体放射によって確立された限界を大幅に超える拡張された放射熱伝達をもたらす。本稿では, この過程を支配する方程式の固有モード展開を用いて, ナノ構造の集合における放射熱伝達の時間的ダイナミクスを記述するための理論的枠組みを提案する。このフォーマリズムを用いて、ナノ構造の集合の熱化を決定する基本原理を同定し、一般的ながしばしば直観的ではないダイナミクスを明らかにする。その結果,多数のナノ粒子を含む系における近接場放射熱伝達の時間的ダイナミクスを効率的に解析する,エレガントで正確な手法が得られた。

The near-field electromagnetic interaction between nanoscale objects produces enhanced radiative heat transfer that can greatly surpass the limits established by far-field black-body radiation. Here, we present a theoretical framework to describe the temporal dynamics of the radiative heat transfer in ensembles of nanostructures, which is based on the use of an eigenmode expansion of the equations that govern this process. Using this formalism, we identify the fundamental principles that determine the thermalization of collections of nanostructures, revealing general but often unintuitive dynamics. Our results provide an elegant and precise approach to efficiently analyze the temporal dynamics of the near-field radiative heat transfer in systems containing a large number of nanoparticles.

翻訳日:2023-04-12 00:31:12 公開日:2021-06-01

# 説明責任のあるモノのインターネットを目指して : レビュー可能性への呼びかけ

Towards an accountable Internet of Things: A call for reviewability ( http://arxiv.org/abs/2102.08132v2 )

ライセンス: Link先を確認

Chris Norval, Jennifer Cobbe, Jatinder Singh

(参考訳) IoTがますます普及するにつれて、IoTシステムの構築とデプロイに関する懸念が高まっている。接続されたデバイスは膨大な量のデータを生成し、アルゴリズムシステムを駆動し、実際の結果をもたらす。何が起きたのか、なぜ起きたのか、誰が責任を負っているのかをどうやって特定するのか? このようなシステムの複雑さを考えると、どこから始めるのか? この章では、IoTに関連する説明責任の側面を概説する。具体的には、IoTシステムのレビューを容易にするメカニズム(法的、技術的、組織的)の緊急性の必要性を論じます。このようなメカニズムは、関係する利害関係者がより深く理解し、評価し、尋問し、我々の世界に浸透する接続された環境に挑戦できるようにすることで、説明責任をサポートするために機能する。

As the IoT becomes increasingly ubiquitous, concerns are being raised about how IoT systems are being built and deployed. Connected devices will generate vast quantities of data, which drive algorithmic systems and result in real-world consequences. Things will go wrong, and when they do, how do we identify what happened, why they happened, and who is responsible? Given the complexity of such systems, where do we even begin? This chapter outlines aspects of accountability as they relate to IoT, in the context of the increasingly interconnected and data-driven nature of such systems. Specifically, we argue the urgent need for mechanisms - legal, technical, and organisational - that facilitate the review of IoT systems. Such mechanisms work to support accountability, by enabling the relevant stakeholders to better understand, assess, interrogate and challenge the connected environments that increasingly pervade our world.

翻訳日:2023-04-11 00:22:49 公開日:2021-06-01

# 拡張ウィグナーの友人問題と標準量子力学の内部整合性

Extended Wigner's friend problem and the internal consistency of standard quantum mechanics ( http://arxiv.org/abs/2102.08709v3 )

ライセンス: Link先を確認

D.Sokolovski and A.Matzkin

(参考訳) 拡張されたウィグナーの友人問題は、友人が量子測定を行っている密閉された実験室を、2人の観測者がそれぞれ測定する状況を扱う。本稿では、よく知られた「ファインマン物理学」でファインマンが示した量子力学の基本規則に基づいて、この問題を考察する。近年の議論では、拡張されたウィグナーの友人問題は量子論では一貫して記述できないと示唆されているが、ここでは、これらの標準的な規則を素直に適用することで、関係するすべてのエージェントの測定結果について、曖昧さのない一貫した説明が得られることを示す。

The extended Wigner's friend problem deals with two Observers each measuring a sealed laboratory in which a friend is making a quantum measurement. We investigate this problem by relying on the basic rules of quantum mechanics as exposed by Feynman in the well-known "Feynman Lectures on Physics". Although recent discussions have suggested that the extended Wigner's friend problem cannot consistently be described by quantum theory, we show here that a straightforward application of these standard rules results in a non-ambiguous and consistent account of the measurement outcomes for all agents involved.

翻訳日:2023-04-10 23:54:32 公開日:2021-06-01

# 最大量子カオス系について

On systems of maximal quantum chaos ( http://arxiv.org/abs/2102.11294v3 )

ライセンス: Link先を確認

Mike Blake and Hong Liu

(参考訳) 多体量子系におけるカオスの注目すべき特徴は、量子リアプノフ指数に上限が存在することである。重要な問いは、この上限を飽和する最大カオス系の何が特別なのかを理解することである。ここでは、そのような系におけるカオスの「流体力学的」起源を支持するさらなる証拠を示し、最大カオス系の特徴について議論する。まず、我々が以前に提案したカオスの流体力学的有効場理論が、最大カオス系の理論として理解されるべきであることの証拠を示す。次に、これまでの文献では暗黙的であった最大カオスの特徴、すなわち一般的な少数体演算子の交換子の2乗における指数関数的成長の抑制を強調し、明示する。カオスの有効場理論の枠内でこの抑制に対する一般的な議論を与え、SYK模型とホログラフィック系を用いて例示する。この抑制は、最大カオス系における演算子スクランブリングの性質が、非最大カオス系におけるスクランブリングとは根本的に異なることを示していると推測する。また、非最大カオス系であっても十分大きな距離では最大カオス領域が存在するという、最も単純なシナリオについても議論する。

A remarkable feature of chaos in many-body quantum systems is the existence of a bound on the quantum Lyapunov exponent. An important question is to understand what is special about maximally chaotic systems which saturate this bound. Here we provide further evidence for the 'hydrodynamic' origin of chaos in such systems, and discuss hallmarks of maximally chaotic systems. We first provide evidence that a hydrodynamic effective field theory of chaos we previously proposed should be understood as a theory of maximally chaotic systems. We then emphasize and make explicit a signature of maximal chaos which was only implicit in prior literature, namely the suppression of exponential growth in commutator squares of generic few-body operators. We provide a general argument for this suppression within our chaos effective field theory, and illustrate it using SYK models and holographic systems. We speculate that this suppression indicates that the nature of operator scrambling in maximally chaotic systems is fundamentally different to scrambling in non-maximally chaotic systems. We also discuss a simplest scenario for the existence of a maximally chaotic regime at sufficiently large distances even for non-maximally chaotic systems.

翻訳日:2023-04-10 05:31:13 公開日:2021-06-01
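
The bound mentioned in the entry above (arXiv:2102.11294) is presumably the well-known chaos bound on the growth rate of commutator squares at temperature T. As a reminder, and assuming the usual conventions (W and V are generic few-body operators, and ε is a small prefactor set by the number of degrees of freedom), it can be written as:

```latex
C(t) = -\big\langle [W(t), V(0)]^{2} \big\rangle_{\beta} \sim \varepsilon \, e^{\lambda_L t},
\qquad
\lambda_L \le \frac{2\pi k_B T}{\hbar}
```

A maximally chaotic system is one for which the Lyapunov exponent λ_L saturates this inequality.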

# サイバーセキュリティの月が輝いているとは言わないで下さい(サイバーセキュリティのショーと説明)

Don't Tell Me The Cybersecurity Moon Is Shining... (Cybersecurity Show And Tell) ( http://arxiv.org/abs/2103.11030v3 )

ライセンス: Link先を確認

Luca Viganò

(参考訳) 「語るな、見せろ(Show, don't tell)」は、あらゆる書き手にとっての文学上の戒律となっている。これはあらゆる形式のフィクションに当てはまり、科学的な文章を含むノンフィクションにも当てはまる。科学的な文章では、多くの科学コミュニケーションやストーリーテリングの手法の中心にこの考えがある。本稿では、数学や科学、特にサイバーセキュリティの概念や結果の根底にあるような複雑な考えを提示し、教え、説明したい場合には、実際には「見せて、かつ語る(show and tell)」ことが最善のアプローチである場合が多いことを論じる。さまざまな種類のアートワークをサイバーセキュリティの説明にどのように利用できるかを議論し、語ること(すなわち、概念を形式的かつ技術的に説明すること)を、視覚的なストーリーテリングや他の形式のストーリーテリングによって見せることとどのように組み合わせられるかを例示する。また、アートワークの4つのカテゴリと、それらが与える説明についても論じる。

"Show, don't tell" has become the literary commandment for any writer. It applies to all forms of fiction, and to non-fiction, including scientific writing, where it lies at the heart of many scientific communication and storytelling approaches. In this paper, I discuss how "show *and* tell" is actually often the best approach when one wants to present, teach or explain complicated ideas such as those underlying notions and results in mathematics and science, and in particular in cybersecurity. I discuss how different kinds of artworks can be used to explain cybersecurity and I illustrate how telling (i.e., explaining notions in a formal, technical way) can be paired with showing through visual storytelling or other forms of storytelling. I also discuss four categories of artworks and the explanations they help provide.

翻訳日:2023-04-07 10:46:09 公開日:2021-06-01

# 市民中心・国境を越えたeGovernanceのための信頼と相互運用可能な分散ソリューション--概念的アプローチ

A trustable and interoperable decentralized solution for citizen-centric and cross-border eGovernance: A conceptual approach ( http://arxiv.org/abs/2103.15458v2 )

ライセンス: Link先を確認

George Domalis, Nikos Karacapilidis, Dimitris Tsakalidis, Anastasios Giannaros

(参考訳) 本稿では、共通の公共サービスを共有するための分野横断的かつ国境を越えたeGovernanceパラダイムを支援することを目的として、AIで強化されたソリューションを提案する。このソリューションにより、受益者は、効果的なビッグデータ交換とサービス提供のための分散型ネットワークに参加できる。このネットワークはワンスオンリーの優先を促進し、設計段階からデジタルで、効率的で、費用対効果が高く、相互運用可能で、安全である。本ソリューションは、(i) プロセスの複雑さと資源に対する高い要求に対処できる、信頼性が高く効率的な分散型データ共有メカニズム、(ii) 利害関係者のニーズに合わせたモバイルサービスを提供するためのエコシステム、(iii) 複数のサービスとの取引を管理するためのシングルサインオン・ウォレット機構、(iv) 既存の電子政府システムと新たに開発されたシステムとの間で情報を安全に交換する相互通信層から構成される。例示的な適用シナリオにより、我々のアプローチの可能性を示す。

Aiming to support a cross-sector and cross-border eGovernance paradigm for sharing common public services, this paper introduces an AI-enhanced solution that enables beneficiaries to participate in a decentralized network for effective big data exchange and service delivery that promotes the once-only priority and is by design digital, efficient, cost-effective, interoperable and secure. The solution comprises (i) a reliable and efficient decentralized mechanism for data sharing, capable of addressing the complexity of the processes and their high demand of resources; (ii) an ecosystem for delivering mobile services tailored to the needs of stakeholders; (iii) a single sign-on Wallet mechanism to manage the transactions with multiple services; and (iv) an intercommunication layer, responsible for the secure exchange of information among existing eGovernment systems with newly developed ones. An indicative application scenario showcases the potential of our approach.

翻訳日:2023-04-06 06:08:05 公開日:2021-06-01

# 特異相互作用を持つ量子熱エンジン

Quantum Heat Engines with Singular Interactions ( http://arxiv.org/abs/2105.00032v2 )

ライセンス: Link先を確認

Nathan M Myers, Jacob McCready, Sebastian Deffner

(参考訳) 量子現象を利用することで、量子デバイスは古典的なデバイスを上回る性能を発揮する可能性がある。これまでの研究で、ボソン的な作業媒体はフェルミオン的な媒体よりも優れた性能をもたらしうることが示されている。我々は、有効的な対称性をボソン極限とフェルミオン極限の間で調整できる特異な相互作用を取り入れることで、この研究を拡張する。この枠組みでは、粒子はハルデインの一般化された排他統計に従うエニオンとして扱うことができる。「統計的エニオン」の枠組みを用いてダイナミクスを解析的に解き、粒子間相互作用と波動関数の対称性の相互作用がエンジン性能に及ぼす影響を調べる。

By harnessing quantum phenomena, quantum devices have the potential to outperform their classical counterparts. Previous work has shown that a bosonic working medium can yield better performance than a fermionic medium. We expand upon this work by incorporating a singular interaction that allows the effective symmetry to be tuned between the bosonic and fermionic limits. In this framework, the particles can be treated as anyons subject to Haldane's generalized exclusion statistics. Solving the dynamics analytically using the framework of "statistical anyons" we explore the interplay between interparticle interactions and wave function symmetry on engine performance.

翻訳日:2023-04-01 23:32:56 公開日:2021-06-01

# 単一原子の放射圧:多レベル原子への正確な解析的アプローチの一般化

Radiation pressure on single atoms: generalization of an exact analytical approach to multilevel atoms ( http://arxiv.org/abs/2105.08554v2 )

ライセンス: Link先を確認

L. Podlecki, J. Martin, and T. Bastin

(参考訳) 近年の研究では、任意の強度、周波数、位相、伝播方向を持つ任意の数の平面波と相互作用する2レベル原子が経験する放射力について、半古典的領域で計算するための標準化された厳密な解析的形式を提示した [J. Opt. Soc. Am. B \textbf{35}, 127-132 (2018)]。ここでは、この処理を多レベル原子の場合にまで拡張し、原子準位の縮退を考慮し、光の偏光が関与する場合を扱う。この目的のために行列形式を開発する。

In a recent work, we provided a standardized and exact analytical formalism for computing in the semiclassical regime the radiation force experienced by a two-level atom interacting with any number of plane waves with arbitrary intensities, frequencies, phases, and propagation directions [J. Opt. Soc. Am. B \textbf{35}, 127-132 (2018)]. Here, we extend this treatment to the multilevel atom case, where degeneracy of the atomic levels is considered and polarization of light enters into play. A matrix formalism is developed to this aim.

翻訳日:2023-03-30 19:59:11 公開日:2021-06-01

# 教育ツールとしての競争のプログラミングと学生への動機づけ

Students Programming Competitions as an Educational Tool and a Motivational Incentive to Students ( http://arxiv.org/abs/2105.15136v2 )

ライセンス: Link先を確認

Youry Khmelevsky, Ken Chidlow

(参考訳) 本稿では,オカナガン大学(OC)のコンピュータサイエンス科(COSC)の学生によるプログラミングコンペティションの結果について報告し,その成果を教育的観点から考察する。卒業証書や学位課程の1年生や2年生の学生は、早ければ2学期には応用研究プロジェクトや、地元のプログラミングコンペティションや国際的なプログラミングコンペに参加したいと願っていることがわかりました。私たちの観察は、2015年にCOSC学生にプログラミングコンペティションを導入して以来の2年間の教育に基づいている。学生は、コンテストに参加することで、プログラミングコースで効果的に学び、より深く、より徹底的に学習し、クラスでより良い結果を得るのを助ける動機を与えると報告した。

In this short paper we report on the programming competition results of students from the Computer Science Department (COSC) of Okanagan College (OC) and discuss these results from an educational point of view. We found that some freshman and sophomore students in diploma and degree programs are very capable and eager to be involved in applied research projects as early as the second semester, and in local and international programming competitions as well. Our observation is based on the last two academic years, beginning in 2015 when we introduced programming competitions to COSC students. Students reported that participation in competitions gives them motivation to learn effectively in their programming courses, inspires them to learn more deeply and thoroughly, and helps them achieve better results in their classes.

翻訳日:2023-03-29 06:57:02 公開日:2021-06-01

# 不純物ドープボース・アインシュタイン凝縮体における幾何相による絡み合いの観察

Witnessing entanglement via the geometric phase in a impurity-doped Bose-Einstein condensate ( http://arxiv.org/abs/2106.00224v1 )

ライセンス: Link先を確認

X. Wu, S. P. Jia, C. L. Cai, L. M. Kuang

(参考訳) 本稿では、2つのRydberg不純物量子ビットとBECからなるマイクロマクロ量子系である不純物ドープボース・アインシュタイン凝縮系(BEC)における幾何学的位相による量子絡み合いを目撃する理論的スキームを提案する。初期マイクロマイクロエンタングルメントとマイクロマクロエンタングルメントの存在下で,不純物量子ビットの幾何学的位相を計算する。不純物量子ビットの幾何学的位相は、量子間マイクロ・マイクロ・アンタングルだけでなく、キュービット-BECマイクロ・マクロ・アンタングルも観測可能である。我々の研究は、不純物ドープBECにおけるミクロとマイクロマクロの絡み合いを目撃する新しい洞察を提供する。

We propose a theoretical scheme to witness quantum entanglement via the geometric phase in an impurity-doped Bose-Einstein condensate (BEC), which is a micro-macro quantum system consisting of two Rydberg impurity qubits and the BEC. We calculate the geometric phase of the impurity qubits in the presence of the initial micro-micro and micro-macro entanglement, respectively. It is demonstrated that the geometric phase of the impurity qubits can witness not only inter-qubit micro-micro entanglement, but also qubit-BEC micro-macro entanglement. Our work provides new insight into witnessing micro-micro and micro-macro entanglement in an impurity-doped BEC.

翻訳日:2023-03-28 03:58:01 公開日:2021-06-01

# シューマッハの情報幾何ベル不等式の実験的実現

Experimental Realization of Schumacher's Information Geometric Bell Inequality ( http://arxiv.org/abs/2106.00194v1 )

ライセンス: Link先を確認

Tahereh Rezaei and Shahabeddin M. Aslmarand and Robert Snyder and Behzad Khajavi and Paul M. Alsing and Michael Fanto and Doyeol (David) Ahn and Warner A. Miller

(参考訳) 量子力学は古典的に許されるよりも強い相関を生み出すことができる。この古典より強い相関は量子コンピューティングの「燃料」である。 1991年、シューマッハはベルのよく知られた結果に類似した美しい幾何学的アプローチを提案し、一重項状態に対するこの相関の非古典性を捉えた。彼は同一に準備された状態のアンサンブル上で定義される、確立された情報距離を使用した。彼は、エンタングル状態の測定に用いる特定の検出器設定では、結果として得られる幾何学が三角不等式に違反すること、すなわち古典的には不可能な違反が生じることを計算で示した。これは「共分散距離」の観点での新しい情報に基づく幾何学的ベル不等式を与えた。本稿では,この構成を実験的に再現し,対になったBBO結晶における通常の自発的パラメトリック下方変換に基づく2光子のベル状態に対する決定的な違反を示す。我々が生成した状態の可視度は$V_{ad}=0.970$であった。高次元の多部量子状態への一般化について議論する。

Quantum mechanics can produce correlations that are stronger than classically allowed. This stronger-than-classical correlation is the "fuel" for quantum computing. In 1991 Schumacher put forward a beautiful geometric approach, analogous to the well-known result of Bell, to capture the non-classicality of this correlation for a singlet state. He used a well-established information distance defined on an ensemble of identically prepared states. He calculated that for certain detector settings used to measure the entangled state, the resulting geometry violated a triangle inequality -- a violation that is not possible classically. This provided a novel information-based geometric Bell inequality in terms of a "covariance distance." Here we experimentally reproduce his construction and demonstrate a definitive violation for a Bell state of two photons based on the usual spontaneous parametric down-conversion in a paired BBO crystal. The state we produced had a visibility of $V_{ad}=0.970$. We discuss generalizations to higher dimensional multipartite quantum states.

翻訳日:2023-03-28 03:57:28 公開日:2021-06-01

# 周期的量子ウォークを誘導する正則グラフの組合せ必要条件

Combinatorial necessary conditions for regular graphs to induce periodic quantum walks ( http://arxiv.org/abs/2106.00166v1 )

ライセンス: Link先を確認

Sho Kubota

(参考訳) 正則混合グラフで定義される離散時間量子ウォークが周期的であるための組合せ的必要条件を導出する。量子ウォークが周期的であれば、時間発展行列のすべての固有値は代数的整数でなければならない。この点に着目し,特性多項式の係数がどの環に属するべきかを考察する。一方、$\eta$-エルミート隣接行列の特性多項式の係数は組合せ的な意味を持つ。これらのことから、時間発展行列の特性多項式の係数に組合せ的な意味を見出すことができ、したがって混合グラフが周期的であるための組合せ的必要条件を導出することができる。例えば、$n$頂点の$k$-正則混合グラフが周期的であるなら、$2n/k$ は整数でなければならない。この研究の応用として、混合完全グラフおよび頂点数が素数である混合グラフの周期性を決定する。

We derive combinatorial necessary conditions for discrete-time quantum walks defined by regular mixed graphs to be periodic. If the quantum walk is periodic, all the eigenvalues of the time evolution matrices must be algebraic integers. Focusing on this, we explore which ring the coefficients of the characteristic polynomials should belong to. On the other hand, the coefficients of the characteristic polynomials of $\eta$-Hermitian adjacency matrices have combinatorial implications. From these, we can find combinatorial implications in the coefficients of the characteristic polynomials of the time evolution matrices, and thus derive combinatorial necessary conditions for mixed graphs to be periodic. For example, if a $k$-regular mixed graph with $n$ vertices is periodic, then $2n/k$ must be an integer. As an application of this work, we determine periodicity of mixed complete graphs and mixed graphs with a prime number of vertices.
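
The example condition quoted in the abstract (a periodic $k$-regular mixed graph on $n$ vertices requires $2n/k$ to be an integer) can be turned into a quick screen. The Python sketch below is illustrative only: the function name is hypothetical, and the check is necessary rather than sufficient, so passing it does not establish periodicity.

```python
def passes_periodicity_screen(n: int, k: int) -> bool:
    """Return True if 2n/k is an integer (necessary condition quoted in the abstract).

    A False result rules out periodicity; a True result proves nothing,
    because the condition is necessary but not sufficient.
    """
    if n <= 0 or k <= 0:
        raise ValueError("n and k must be positive integers")
    return (2 * n) % k == 0


if __name__ == "__main__":
    # Screen a few (n, k) pairs; only pairs that pass remain candidates.
    for n, k in [(6, 3), (5, 3), (7, 2)]:
        print(n, k, passes_periodicity_screen(n, k))
```

A pair that fails the screen can be discarded immediately, while a pair that passes still needs the full analysis described in the paper.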

翻訳日:2023-03-28 03:56:37 公開日:2021-06-01

# フリップチップ技術による真空ギャップトランスモン量子ビットの実現

Vacuum-gap transmon qubits realized using flip-chip technology ( http://arxiv.org/abs/2106.00341v1 )

ライセンス: Link先を確認

Xuegang Li, Yingshan Zhang, Chuhong Yang, Zhiyuan Li, Junhua Wang, Tang Su, Mo Chen, Yongchao Li, Chengyao Li, Zhenyu Mi, Xuehui Liang, Chenlu Wang, Zhen Yang, Yulong Feng, Kehuan Linghu, Huikai Xu, Jiaxiu Han, Weiyang Liu, Peng Zhao, Teng Ma, Ruixia Wang, Jingning Zhang, Yu Song, Pei Liu, Ziting Wang, Zhaohua Yang, Guangming Xue, Yirong Jin, and Haifeng Yu

(参考訳) フリップチップ技術に基づく大規模超伝導量子プロセッサの開発では大きな進歩が見られた。本研究では、フリップチップ技術を用いて、大きなシャントコンデンサを真空ギャップ平行平板コンデンサに置き換えた、「フリップモン」と呼ぶ改良型トランスモン量子ビットを実現する。さらに、量子ビットのフットプリントを低減するために、量子ビットパッドの1つと単一のジョセフソン接合を下部チップに配置し、もう1つのパッドを上部チップに配置して、インジウムバンプを介して単一のジョセフソン接合にガルバニック接続する。真空ギャップが約5ミクロンの場合、空気中の電場参加比は約53%に達し、結果として誘電損失が低減する可能性がある。フリップモンのコヒーレンス時間は30～60マイクロ秒の範囲で測定され、同様の製造プロセスによる従来のトランスモンと同等である。電界シミュレーションは、金属-空気界面の参加比が著しく増加し、量子ビットのデコヒーレンスを支配しうることを示している。これはより慎重な表面処理が必要であることを示唆している。フリップモン内部のインジウムバンプが大きなデコヒーレンスを引き起こすという証拠はない。適切に設計された形状と良好な表面処理により、フリップモンのコヒーレンスをさらに改善することができる。

Significant progress has been made in building large-scale superconducting quantum processors based on flip-chip technology. In this work, we use the flip-chip technology to realize a modified transmon qubit, denoted as the "flipmon", whose large shunt capacitor is replaced by a vacuum-gap parallel plate capacitor. To further reduce the qubit footprint, we place one of the qubit pads and a single Josephson junction on the bottom chip and the other pad on the top chip, which is galvanically connected with the single Josephson junction through an indium bump. The electric field participation ratio can reach nearly 53% in air when the vacuum gap is about 5 microns, potentially leading to a lower dielectric loss. The coherence times of the flipmons are measured in the range of 30-60 microseconds, comparable to those of traditional transmons with similar fabrication processes. The electric field simulation indicates that the metal-air interface's participation ratio increases significantly and may dominate the qubit's decoherence. This suggests that more careful surface treatment needs to be considered. No evidence shows that the indium bumps inside the flipmons cause significant decoherence. With well-designed geometry and good surface treatment, the coherence of the flipmons can be further improved.

翻訳日:2023-03-28 03:49:44 公開日:2021-06-01

# デザインによるAI倫理。 AIの倫理的設計原則の重要性に対する公的な認識の評価

AI-Ethics by Design. Evaluating Public Perception on the Importance of Ethical Design Principles of AI ( http://arxiv.org/abs/2106.00326v1 )

ライセンス: Link先を確認

Kimon Kieslich, Birte Keller, Christopher Starke

(参考訳) 人工知能(AI)を倫理的に設計する上での社会的重要性にもかかわらず、倫理的AI原則に対する大衆の認識に関する研究はほとんど存在しない。倫理的AI開発が人間中心で、社会全体に利益をもたらすという目標を持っているとすれば、これはさらに顕著になる。本研究では, 倫理的原則(説明可能性, 公平性, セキュリティ, 説明責任, 正確性, プライバシー, マシン自律性)が相互に重み付けされているかを検討する。これは特に重要であり、倫理的原則を同時に考慮することはコストがかかるだけでなく、開発者が特定のトレードオフの決定を下さなければならないため、時には不可能である。本稿では,税法違反検出におけるAIの利用という,特定のユースケースを考慮に入れた倫理原則の相対的重要性について,最初の回答を与える。大規模なコンジョイント調査 (n=1099) の結果は、ドイツの回答者が概ね、倫理的原則が同様に重要であることを示唆している。しかし、その後のクラスター分析により、倫理的に設計されたシステムに対する異なる選好モデルが存在することが判明した。これらのクラスタは、望ましい属性だけでなく、属性自体の重要性も実質的に異なる。さらに、これらのグループは、社会デマログラフィーやAIに関する意見の観点から構成されているかについても述べる。社会的な意味と設計上の課題について論じる。

Despite the immense societal importance of ethically designing artificial intelligence (AI), little research on the public perceptions of ethical AI principles exists. This becomes even more striking when considering that ethical AI development has the aim to be human-centric and of benefit for the whole society. In this study, we investigate how ethical principles (explainability, fairness, security, accountability, accuracy, privacy, machine autonomy) are weighted in comparison to each other. This is especially important, since simultaneously considering ethical principles is not only costly, but sometimes even impossible, as developers must make specific trade-off decisions. In this paper, we give first answers on the relative importance of ethical principles given a specific use case - the use of AI in tax fraud detection. The results of a large conjoint survey (n=1099) suggest that, by and large, German respondents found the ethical principles equally important. However, subsequent cluster analysis shows that different preference models for ethically designed systems exist among the German population. These clusters substantially differ not only in the preferred attributes, but also in the importance level of the attributes themselves. We further describe how these groups are constituted in terms of sociodemographics as well as opinions on AI. Societal implications as well as design challenges are discussed.

翻訳日:2023-03-28 03:49:22 公開日:2021-06-01

# ユニバーサルVRアクセシビリティツールキットへの道

A Way to a Universal VR Accessibility Toolkit ( http://arxiv.org/abs/2106.00321v1 )

ライセンス: Link先を確認

Felix J. Thiel, Anthony Steed

(参考訳) VR(Virtual Reality)は,システム価格の低下とユーザ数の増加によって,ますます人気が高まっている。しかし、vrのアクセシビリティの問題は今のところほとんど解決されておらず、現時点で統一的なアプローチや標準は存在しない。本稿では,システムレベルで実装されるカスタマイズ可能なツールキットを提案し,このアプローチの潜在的なメリットと,実装を成功させるために克服する必要がある課題について議論する。

Virtual Reality (VR) has become more and more popular with dropping prices for systems and a growing number of users. However, the issue of accessibility in VR has been hardly addressed so far and no uniform approach or standard exists at this time. In this position paper, we propose a customisable toolkit implemented at the system-level and discuss the potential benefits of this approach and challenges that will need to be overcome for a successful implementation.

翻訳日:2023-03-28 03:49:00 公開日:2021-06-01

# 2色フェムト秒パルス励起複合分子の三次元配向

Three Dimensional Orientation of Complex Molecules Excited by Two-Color Femtosecond Pulses ( http://arxiv.org/abs/2106.00299v1 )

ライセンス: Link先を確認

Long Xu, Ilia Tutunnikov, Yehiam Prior, Ilya Sh. Averbukh

(参考訳) 2色フェムト秒レーザーパルスによる不斉トップ分子(キラルを含む)の励起の研究を行った。直交偏光2色パルスで励起される非キラル非対称トップ分子の場合、古典的かつ量子力学的に3次元の向きを示す。キラル分子では、交差偏光二色パルスによって誘導される配向がレーザー伝播方向に沿ってエナンチオ選択的であること、すなわち2つのエナンチオマーが反対方向に配向していることが示される。短い時間スケールでは、古典的および量子シミュレーションは優れた一致の結果を与えるが、長い時間スケールでは、エナンチオ選択的配向は量子ビートを示す。これらの観測は、2色パルスと分子(超)ポーラリザビリティの相互作用電位を解析することによって定性的に説明される。それぞれのエナンチオマーを分離するためのエナンチオ選択的配向の測定および利用に長寿命配向を利用するための展望について述べる。

We study the excitation of asymmetric-top (including chiral) molecules by two-color femtosecond laser pulses. In the cases of non-chiral asymmetric-top molecules excited by an orthogonally polarized two-color pulse, we demonstrate, classically and quantum mechanically, three-dimensional orientation. For chiral molecules, we show that the orientation induced by a cross-polarized two-color pulse is enantioselective along the laser propagation direction, namely, the two enantiomers are oriented in opposite directions. On the short time scale, the classical and quantum simulations give results that are in excellent agreement, whereas on the longer time scale, the enantioselective orientation exhibits quantum beats. These observations are qualitatively explained by analyzing the interaction potential between the two-color pulse and molecular (hyper-)polarizability. The prospects for utilizing the long-lasting orientation for measuring and using the enantioselective orientation for separating the individual enantiomers are discussed.

翻訳日:2023-03-28 03:48:53 公開日:2021-06-01

# 「なぜ民主主義を標的としないのか?--米国の政治運動に関わる人々の安全保障の実践と課題-」

"Why wouldn't someone think of democracy as a target?": Security practices & challenges of people involved with U.S. political campaigns ( http://arxiv.org/abs/2106.00236v1 )

ライセンス: Link先を確認

Sunny Consolvo, Patrick Gage Kelley, Tara Matthews, Kurt Thomas, Lee Dunn, Elie Bursztein

(参考訳) 政治キャンペーンに関わる人々は、資金豊富で高度な攻撃者、特に国家からのデジタルセキュリティの脅威の増大に直面している。政治キャンペーンのセキュリティ向上は民主主義を守る重要な要素である。キャンペーンのセキュリティ問題を特定するために,米国の政治的立場を横断する28人の参加者を対象に,キャンペーンに関わる人々のデジタルセキュリティの実践や課題,認識を理解するための質的研究を行った。主要かつ包括的な発見は、脅威、制約、労働文化のユニークな組み合わせにより、政治キャンペーンに関わる人々が、さまざまなプラットフォームやドメインのテクノロジーを、自身と民主主義をセキュリティ攻撃に対して脆弱にするような方法で利用しているということだ。機密データは多数の個人および業務用アカウントに保存されており、強力なパスワード、二要素認証、暗号化、アクセス制御の採用は場当たり的であった。個々の企業、委員会、組織、キャンペーン、学術機関は、特定された問題を単独で解決することはできない。この目的のために、我々はこの複雑な問題空間の初期的な理解を提供し、多様な専門家グループが政治キャンペーンのセキュリティを改善するために協力し始める方法を提言する。

People who are involved with political campaigns face increased digital security threats from well-funded, sophisticated attackers, especially nation-states. Improving political campaign security is a vital part of protecting democracy. To identify campaign security issues, we conducted qualitative research with 28 participants across the U.S. political spectrum to understand the digital security practices, challenges, and perceptions of people involved in campaigns. A main, overarching finding is that a unique combination of threats, constraints, and work culture leads people involved with political campaigns to use technologies from across platforms and domains in ways that leave them--and democracy--vulnerable to security attacks. Sensitive data was kept in a plethora of personal and work accounts, with ad hoc adoption of strong passwords, two-factor authentication, encryption, and access controls. No individual company, committee, organization, campaign, or academic institution can solve the identified problems on its own. To this end, we provide an initial understanding of this complex problem space and recommendations for how a diverse group of experts can begin working together to improve security for political campaigns.

翻訳日:2023-03-28 03:48:29 公開日:2021-06-01

# フォトニックケージにおける逆アンダーソン遷移

Inverse Anderson transition in photonic cages ( http://arxiv.org/abs/2106.00231v1 )

ライセンス: Link先を確認

Stefano Longhi

(参考訳) アンダーソン局在による輸送阻害は、乱れた周期格子においてユビキタスである。しかし、平らなバンド障害のみを示す結晶では、マクロなバンド平坦化を持ち上げ、幾何学的局在を除去し、特定の条件下での輸送を可能にする。この現象は、逆アンダーソン転移と呼ばれ、3次元平面バンド系に対して予測されるが、今のところ直接観測されていない。ここでは,相関性二元性障害が逆アンダーソン遷移と弾道輸送を誘発する,アハロノフ-ボームフォトニックケージという,単純な準一次元フォトニックフラットバンドシステムを提案する。

Transport inhibition via Anderson localization is ubiquitous in disordered periodic lattices. However, in crystals displaying only flat bands disorder can lift macroscopic band flattening, removing geometric localization and enabling transport in certain conditions. Such a striking phenomenon, dubbed inverse Anderson transition and predicted for three-dimensional flat band systems, has thus far not been directly observed. Here we suggest a simple quasi one-dimensional photonic flat band system, namely an Aharonov-Bohm photonic cage, in which correlated binary disorder induces an inverse Anderson transition and ballistic transport.

翻訳日:2023-03-28 03:47:35 公開日:2021-06-01

# 非エルミートメリーランド模型

Non-Hermitian Maryland Model ( http://arxiv.org/abs/2106.00230v1 )

ライセンス: Link先を確認

Stefano Longhi

(参考訳) 非周期秩序を持つ非エルミート系は、エルミート物理学のパラダイムを超えた相転移を示す。残念なことに、ポテンシャルの非整合性のため、既知の非エルミート模型のほとんどは可積分ではない。このことは、局在/非局在相転移、複素平面における移動度端、およびそれらのトポロジカルな性質を解明できる、厳密に解けるモデルの探索を動機付ける。ここでは、Grempel {\it et al.} [Phys. Rev. Lett. {\bf 49}, 833 (1982)] によって提唱され、メリーランドモデルと呼ばれる有名な量子カオスの可積分モデルの非摂動的な非エルミート拡張である、準結晶の厳密に解けるモデルを示す。エルミートなメリーランドモデルとは対照的に、非エルミート拡張はよりリッチなシナリオを示し、複素エネルギー平面における位相的移動度端を介した局在-非局在相転移を示す。

Non-Hermitian systems with aperiodic order display phase transitions that are beyond the paradigm of Hermitian physics. Unfortunately, owing to the incommensurability of the potential, most known non-Hermitian models are not integrable. This motivates the search for exactly solvable models, where localization/delocalization phase transitions, mobility edges in the complex plane and their topological nature can be unraveled. Here we present an exactly solvable model of a quasicrystal, which is a non-perturbative non-Hermitian extension of a famous integrable model of quantum chaos proposed by Grempel {\it et al.} [Phys. Rev. Lett. {\bf 49}, 833 (1982)] and dubbed the Maryland model. Contrary to the Hermitian Maryland model, its non-Hermitian extension shows a richer scenario, with a localization-delocalization phase transition via topological mobility edges in the complex energy plane.

翻訳日:2023-03-28 03:47:24 公開日:2021-06-01

# 太陽系外惑星検出のための量子仮説検定

Quantum hypothesis testing for exoplanet detection ( http://arxiv.org/abs/2106.00488v1 )

ライセンス: Link先を確認

Zixin Huang, Cosmo Lupo

(参考訳) より明るい光源の近傍で二次光源のかすかな放出を検出することは、太陽系外惑星の探索に直接イメージングを使用する上で最も深刻な障害である。量子状態識別と量子イメージング技術を用いて, 2つの光源の角分離が小さい場合であっても, 弱い二次光源の存在を検出する誤り確率を著しく低減できることを示す。弱い光源が明るい光源に対して$\epsilon \ll 1 $という相対強度を持つ場合、誤り指数は$1/\epsilon$倍に改善される。また、この領域で最適な線形光学測定も見出す。この結果は、天文学から顕微鏡まで、光学イメージングのツールボックスを補完する手法として機能する。

Detecting the faint emission of a secondary source in the proximity of a much brighter source has been the most severe obstacle to using direct imaging in the search for exoplanets. Using quantum state discrimination and quantum imaging techniques, we show that one can significantly reduce the probability of error for detecting the presence of a weak secondary source, even when the two sources have small angular separations. If the weak source has relative intensity $\epsilon \ll 1 $ to the bright source, we find that the error exponent can be improved by a factor of $1/\epsilon$. We also find the linear-optical measurements that are optimal in this regime. Our result serves as a complementary method in the toolbox of optical imaging, from astronomy to microscopy.

翻訳日:2023-03-28 03:40:18 公開日:2021-06-01

# 液体中のヘテロ核スピン一重項秩序の光超偏極

Optical hyperpolarization of heteronuclear spin singlet order in liquids ( http://arxiv.org/abs/2106.00414v1 )

ライセンス: Link先を確認

Y. Yang, L. Zhou, and Q. Chen

(参考訳) スピン1/2の結合対を含む核スピン一重項秩序は、スピン格子緩和時間$T_1$よりもずっと長い時間、室温液体中で核スピン超偏極を保存するために用いることができる。長寿命のホモ核およびヘテロ核スピン一重項秩序は、いずれも観測されている。同一核種の超偏極一重項秩序は利用可能であるが、超偏極したヘテロ核スピン一重項秩序はまだ報告されていない。ここでは、ナノダイヤモンド中の光偏極した窒素空孔(NV)中心スピンを用いることで、室温の$^{13}$C標識ギ酸溶液の試料において超偏極一重項秩序が達成可能であることを示す。

The nuclear spin singlet order involving coupled pairs of spins-1/2 may be used to store nuclear spin hyperpolarization in a room temperature liquid for a time much longer than the spin-lattice relaxation time $T_1$. Both long-lived homonuclear and heteronuclear spin-singlet order have been observed. Although hyperpolarized singlet order of the same species is accessible, hyperpolarized heteronuclear spin-singlet order has not been presented yet. Here we show that hyperpolarized singlet order is achievable in a sample of $^{13}$C-labeled formic acid solution at room temperature by using optically polarized nitrogen vacancy (NV) center spins in nanodiamonds.

翻訳日:2023-03-28 03:39:42 公開日:2021-06-01

# 高調波発生過程を用いた相対論的レーザープラズマ相互作用の量子光学分光法の提案

Quantum-Optical Spectrometry in Relativistic Laser-Plasma Interactions Using the High-Harmonic Generation Process: A Proposal ( http://arxiv.org/abs/2106.00372v1 )

ライセンス: Link先を確認

Theocharis Lamprou, Rodrigo Lopez-Martens, Stefan Haessler, Ioannis Liontos, Subhendu Kahaly, Javier Rivera-Dean, Philipp Stammer, Emilio Pisanty, Marcelo F. Ciappina, Maciej Lewenstein and Paraskevas Tzallas

(参考訳) 量子光学スペクトロメトリ(quantum-optical spectrometry)は、最近開発された光子相関法であり、量子分光計(quantum spectrometer, QS)を用いて、強いレーザー・マッター相互作用の量子光学的性質を明らかにし、量子光学(QO)と強いレーザー-磁場物理学(SLFP)の研究領域を結びつける。この方法は、駆動レーザ場から高次高調波などの強いレーザー場相互作用生成物へ光子を吸収する確率を提供する。この場合、高調波発生媒体との相互作用後の赤外(ir)駆動場の光子数分布に高調波スペクトルが反映される。この方法は、強いレーザーパルスと原子と半導体との相互作用によって生じる高調波を用いた非相対論的相互作用で実装された。高強度レーザー-原子相互作用における非古典的光状態の生成に利用され、強レーザー場物理学における量子電気力学の研究の基礎を構築し、量子技術への応用のための新しいタイプの非古典的光源の開発に用いられた。ここでは、QS法を簡潔に導入した後、相対論的レーザー-プラズマ相互作用においてQSがどのように適用され、相対論的量子電磁力学の研究を開始する原動力となるかについて議論する。

Quantum-optical spectrometry is a recently developed shot-to-shot photon correlation-based method, namely using a quantum spectrometer (QS), that has been used to reveal the quantum optical nature of intense laser-matter interactions and connect the research domains of quantum optics (QO) and strong laser-field physics (SLFP). The method provides the probability of absorbing photons from a driving laser field towards the generation of a strong laser-field interaction product, such as high-order harmonics. In this case, the harmonic spectrum is reflected in the photon number distribution of the infrared (IR) driving field after its interaction with the high harmonic generation medium. The method was implemented in non-relativistic interactions using high harmonics produced by the interaction of strong laser pulses with atoms and semiconductors. Very recently, it was used for the generation of non-classical light states in intense laser-atom interaction, building the basis for studies of quantum electrodynamics in strong laser-field physics and the development of a new class of non-classical light sources for applications in quantum technology. Here, after a brief introduction of the QS method, we will discuss how the QS can be applied in relativistic laser-plasma interactions and become the driving factor for initiating investigations on relativistic quantum electrodynamics.

翻訳日:2023-03-28 03:39:18 公開日:2021-06-01

# 大規模モビリティデータによる新型コロナウイルスの感染拡大予測

Predicting COVID-19 Spread from Large-Scale Mobility Data ( http://arxiv.org/abs/2106.00356v1 )

ライセンス: Link先を確認

Amray Schwabe, Joel Persson and Stefan Feuerriegel

(参考訳) 新型コロナウイルスの感染を効果的に管理するには、公衆衛生の意思決定者はケースナンバーの正確な予測が必要である。将来のケースナンバーのほぼリアルタイム予測には、人間の移動力があるが、移動力の予測力は不足している。このギャップを埋めるために,モビリティマークホークスモデルと呼ばれる,モビリティデータに基づく流行予測の新しいモデルを提案する。提案モデルは3つの構成要素から構成される: 1) ホークスプロセスは感染症の伝染動態を捉える。 2) マークは感染率を調節し, 再生数Rが空間や時間によってどのように変化するかを説明する。このマークはモビリティ共変量に基づく正規化ポアソン回帰を用いてモデル化される。 (3)地域間を旅する人々がシードした新症例を補正する。われわれのモデルはスイスのCOVID-19流行で評価された。具体的には、2020年2月から4月までの移動データを用いて、約15億回の旅行を行った。トリップカウントは、スイス最大の通信事業者であるswisscomネットワークからの大規模な通信データ、すなわち携帯電話のpingに由来する。サンプル外根平均二乗誤差の観点から,本モデルと最先端のベースラインを比較した。私たちのモデルはベースラインを15.52%上回りました。改善は5日から21日の間に異なる予測地平線を越えて一貫して達成された。また,従来の関心点データの予測能力を評価し,通信データが優れていることを確認した。我々の知る限りでは、私たちの研究は、通信データから新型コロナウイルスの拡散を予測する最初のものである。本研究は,感染拡大の抑制に携わる公衆衛生の意思決定者に対して,スケーラブルな早期警戒システムを開発することにより,これまでの研究に寄与する。

To manage the COVID-19 epidemic effectively, decision-makers in public health need accurate forecasts of case numbers. A potential near real-time predictor of future case numbers is human mobility; however, research on the predictive power of mobility is lacking. To fill this gap, we introduce a novel model for epidemic forecasting based on mobility data, called the mobility marked Hawkes model. The proposed model consists of three components: (1) A Hawkes process captures the transmission dynamics of infectious diseases. (2) A mark modulates the rate of infections, thus accounting for how the reproduction number R varies across space and time. The mark is modeled using a regularized Poisson regression based on mobility covariates. (3) A correction procedure incorporates new cases seeded by people traveling between regions. Our model was evaluated on the COVID-19 epidemic in Switzerland. Specifically, we used mobility data from February through April 2020, amounting to approximately 1.5 billion trips. Trip counts were derived from large-scale telecommunication data, i.e., cell phone pings from the Swisscom network, the largest telecommunication provider in Switzerland. We compared our model against various state-of-the-art baselines in terms of out-of-sample root mean squared error. We found that our model outperformed the baselines by 15.52%. The improvement was consistently achieved across different forecast horizons between 5 and 21 days. In addition, we assessed the predictive power of conventional point of interest data, confirming that telecommunication data is superior. To the best of our knowledge, our work is the first to predict the spread of COVID-19 from telecommunication data. Altogether, our work contributes to previous research by developing a scalable early warning system for decision-makers in public health tasked with controlling the spread of infectious diseases.
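
To make component (1) of the abstract concrete, here is a minimal Python sketch of a Hawkes conditional intensity. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are illustrative assumptions, not the paper's specification: the abstract does not state the kernel, and the mobility-based mark and the travel correction are omitted here.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, events: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)).

    Only events strictly before t contribute. The exponential kernel is an
    assumed textbook choice (hypothetical here), not taken from the paper.
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)
    return mu + excitation


if __name__ == "__main__":
    # One past case at time 0 temporarily raises the rate of new cases at time 1.
    print(hawkes_intensity(1.0, [0.0], mu=0.5, alpha=1.0, beta=1.0))
```

Each past event adds a decaying bump to the baseline rate `mu`, which is the self-exciting behavior that makes a Hawkes process a natural fit for transmission dynamics.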

翻訳日:2023-03-28 03:38:39 公開日:2021-06-01

# PdCoO$_2$における電子輸送の有限サイズ効果

Finite-size effects of electron transport in PdCoO$_2$ ( http://arxiv.org/abs/2106.00697v1 )

ライセンス: Link先を確認

Georgios Varnavides, Yaxian Wang, Philip J.W. Moll, Polina Anikeeva, and Prineha Narang

(参考訳) 近年, 単一結晶のデラフォスサイト金属において, 異種の輸送現象が観察されている。本稿では,第一原理計算と異方性ボルツマン輸送方程式の数値モデリングを組み合わせた電子輸送の理論的枠組みを提案する。モデル系としてpdcoo$_2$を用いて、異なる微視的電子およびフォノン散乱機構を研究し、異なる温度で準粒子の平均自由経路階層を確立する。異方性フェルミ表面を明示的に処理し, 拡散性, 弾道性, 流体力学的な輸送状態の限界を橋渡しする実験アクセス性輸送観測器を数値的に得る。我々は,「quasi-ballistic」と「quasi-hydrodynamic」の区別が困難であり,しばしば定量的である必要があることを示す。第一原理計算から, 得られた輸送レジームのプロットを推定し, フェルミ表面配向が微小スケールデバイスで観測される輸送シグネチャの複雑さをいかに高めるかを示す。本研究は,オープンヘキサゴナルフェルミ表面の微視的相互作用機構に関する重要な知見を提供し,有限サイズのチャネルにおける巨視的電子輸送との接続を確立する。

A wide range of unconventional transport phenomena have recently been observed in single-crystal delafossite metals. Here, we present a theoretical framework to elucidate electron transport using a combination of first-principles calculations and numerical modeling of the anisotropic Boltzmann transport equation. Using PdCoO$_2$ as a model system, we study different microscopic electron and phonon scattering mechanisms and establish the mean free path hierarchy of quasiparticles at different temperatures. We treat the anisotropic Fermi surface explicitly to numerically obtain experimentally-accessible transport observables, which bridge between the "diffusive", "ballistic", and "hydrodynamic" transport regime limits. We illustrate that distinction between the "quasi-ballistic", and "quasi-hydrodynamic" regimes is challenging and often needs to be quantitative in nature. From first-principles calculations, we populate the resulting transport regime plots, and demonstrate how the Fermi surface orientation adds complexity to the observed transport signatures in micro-scale devices. Our work provides key insights into microscopic interaction mechanisms on open hexagonal Fermi surfaces and establishes their connection to the macroscopic electron transport in finite-size channels.

翻訳日:2023-03-28 03:30:30 公開日:2021-06-01

# 高速エンタングゲート用量子クロストークキャンセルとマルチビット性能の改善

Quantum crosstalk cancellation for fast entangling gates and improved multi-qubit performance ( http://arxiv.org/abs/2106.00675v1 )

ライセンス: Link先を確認

K. X. Wei, E. Magesan, I. Lauer, S. Srinivasan, D. F. Bogorin, S. Carnevale, G. A. Keefe, Y. Kim, D. Klaus, W. Landers, N. Sundaresan, C. Wang, E. J. Zhang, M. Steffen, O. E. Dial, D. C. McKay, A. Kandala

(参考訳) 超伝導人工原子で作られた量子コンピュータは、すでに古典コンピュータの限界に迫っている。これらの人工原子の最低エネルギー状態は量子ビット基底として機能するが、より高い準位は魅力的なゲート方式を数多く可能にする一方で、望ましくない相互作用の原因にもなる。特に、これらの原子を結合してエンタングルメントを生成すると、より高い準位は計算準位のシフトを引き起こし、不要な$ZZ$量子クロストークにつながる。本稿では,結合量子ビットに対する同時交流シュタルク効果により,エネルギー準位を操作し,このクロストークを緩和する新しい手法を提案する。これは量子ビット間結合とクロストークの間の根本的な行き詰まりを打破し、ゲートエラーが (0.19 $\pm$ 0.02) $\%$ の90 ns CNOTと、固定結合の単一接合トランスモン量子ビットによる新しいCZゲートの実証につながる。さらに、7量子ビットにわたるクロストークキャンセルにより回路性能が明確に向上することを示し,本手法の拡張性を実証する。この研究は、より高速なゲートと、大幅に改善された多量子ビット回路の忠実度を備えた超伝導ハードウェアへの道を開く。

Quantum computers built with superconducting artificial atoms already stretch the limits of their classical counterparts. While the lowest energy states of these artificial atoms serve as the qubit basis, the higher levels both enable a host of attractive gate schemes and generate undesired interactions. In particular, when coupling these atoms to generate entanglement, the higher levels cause shifts in the computational levels that lead to unwanted $ZZ$ quantum crosstalk. Here, we present a novel technique to manipulate the energy levels and mitigate this crosstalk via a simultaneous AC Stark effect on coupled qubits. This breaks a fundamental deadlock between qubit-qubit coupling and crosstalk, leading to a 90 ns CNOT with a gate error of (0.19 $\pm$ 0.02) $\%$ and the demonstration of a novel CZ gate with fixed-coupling single-junction transmon qubits. Furthermore, we show a definitive improvement in circuit performance with crosstalk cancellation over seven qubits, demonstrating the scalability of the technique. This work paves the way for superconducting hardware with faster gates and greatly improved multi-qubit circuit fidelities.

翻訳日:2023-03-28 03:30:07 公開日:2021-06-01

# フェルミオンと量子コンピューティングに結合したZ3ゲージ理論

Z3 gauge theory coupled to fermions and quantum computing ( http://arxiv.org/abs/2106.00549v1 )

ライセンス: Link先を確認

Ronak Desai, Yuan Feng, Mohammad Hassan, Abhishek Kodumagulla, Michael McGuigan

(参考訳) 本稿では,IBM QISKitソフトウェアを用いた変分量子固有解法(VQE)アルゴリズムを用いて,量子コンピュータ上のフェルミオンを用いたZ3ゲージ理論について検討する。最大9量子ビットを使用して、基底状態エネルギーの正確な結果を得ることができる。非ゼロ化学ポテンシャルの導入により、量子コンピュータ上の有限密度の状態方程式(EOS)を決定することができる。本稿では,本システムにおける量子アドバンテージの実現可能性について,有限密度シミュレーションとフェルミオン符号問題に関して論じる。

We study the Z3 gauge theory with fermions on the quantum computer using the Variational Quantum Eigensolver (VQE) algorithm with IBM QISKit software. Using up to 9 qubits we are able to obtain accurate results for the ground state energy. Introducing nonzero chemical potential we are able to determine the Equation of State (EOS) for finite density on the quantum computer. We discuss possible realizations of quantum advantage for this system over classical computers with regards to finite density simulations and the fermion sign problem.

翻訳日:2023-03-28 03:28:35 公開日:2021-06-01

# 離散二部分断状態の有効検証と忠実度推定

Efficient verification and fidelity estimation of discrete bipartite squeezed states ( http://arxiv.org/abs/2106.00533v1 )

ライセンス: Link先を確認

Russell P Rundle

(参考訳) 利点を得るため、量子技術は量子力学に特有の現象を利用する。このような現象は2つある。これらの特徴を示す状態を生成するため、局所的な測定による生成の検証は難しいプロセスである。ここでは、2量子の単軸スクイージングハミルトニアンを用いて生成される状態を考えるが、これは絡み合った2量子の圧縮状態を生成するだけでなく、様々な形の興味深い絡み合い状態をもたらす。実測値を用いて,これらの状態の忠実度を効率的に検証し,直接推定する方法を示す。

To gain an advantage, quantum technologies utilize phenomena particular to quantum mechanics. Two such phenomena are squeezing and entanglement. Having generated states that exhibit these features, verification of their generation with local measurements can be a difficult process. Here we consider the states that are generated using the two-qudit single-axis squeezing Hamiltonian, which not only produces entangled two-qudit squeezed states but also results in various forms of interesting entangled states. We show how one can use local measurements to both efficiently verify and directly estimate the fidelity of these generated states.

翻訳日:2023-03-28 03:28:26 公開日:2021-06-01

# 平均コンカレンスと絡み合い交換

Average concurrence and entanglement swapping ( http://arxiv.org/abs/2106.00848v1 )

ライセンス: Link先を確認

János A. Bergou, Dov Fields, Mark Hillery, Siddhartha Santra and Vladimir S. Malinovsky

(参考訳) 量子ネットワークにおけるエンタングルメントスワップにおける平均コンカレンスの役割について検討する。 qubit純状態から始まり、複数のスワップにおける平均収束の伝播を規定する非常に単純な規則が存在する。混合量子ビット状態の例を見て、純粋な状態の関係が混合状態で何が可能なのかの上界を与えるのを見つける。その後、I-concurrenceを利用するquditsに移動します。ここでの状況は qubits ほど単純ではないが、比較的簡単な結果が得られる場合もある。

We study the role of average concurrence in entanglement swapping in quantum networks. We begin with qubit pure states, and there is a very simple rule governing the propagation of average concurrence in multiple swaps. We look at examples of mixed qubit states, and find the relation for pure states gives an upper bound on what is possible with mixed states. We then move on to qudits, where we make use of the I-concurrence. Here the situation is not as simple as for qubits, but in some cases relatively straightforward results can be obtained.

翻訳日:2023-03-28 03:22:32 公開日:2021-06-01

# 最大重みスケジューリングを持つ量子ネットワークの安定性解析

Stability Analysis of a Quantum Network with Max-Weight Scheduling ( http://arxiv.org/abs/2106.00831v1 )

ライセンス: Link先を確認

Thirupathaiah Vasantam, Don Towsley

(参考訳) 本稿では,ネットワークに接続された複数のユーザに対して,絡み合った量子状態を分散する量子ネットワークについて検討する。各ユーザは、リンクを介してネットワークのスイッチに接続される。ネットワークのすべてのリンクは、特定の確率で各タイムスロット内の2部ベル状態の絡み合い状態を生成し、各エンドノードは、リンクによって生成された絡み合いの1キュービットを格納する。ユーザ集合の共有絡み付けを作成するために、リンクレベルの絡み合わせのキュービット上で測定操作を行い、それらの操作は本質的に確率的であり、特定の確率で成功している。リクエストは、異なるユーザーのための共有の絡み合いを求めるシステムに届く。各リクエストは、固定されたリンクセット上のリンクレベル絡みを使って、固定されたユーザーの共有絡みを作成するためのものである。リクエストはFirst-Come-First-Servedサービス規律に従って処理され、保存されていないリクエストはバッファに格納されます。サービス要求が選択されると、関連リンク上のリンクレベルの絡み合いのキュービット上で測定操作を行い、共有絡みを生成する。要求到着率とリンクレベルの絡み合い発生率のセットに対して,要求キューの安定性に必要な条件を求める。各タイムスロットにおいて、スケジューラはネットワークを安定化させるために、異なるユーザのセットに対する絡み合わせ操作をスケジュールする必要がある。次に、最大ウェイトスケジューリングポリシーを提案し、本ポリシーが全到達率のネットワークを安定化させることを示す。また、分析を支援する数値的な結果も提供する。異なるユーザの集合に対してマルチパーティショニングを生成する単一の量子スイッチの解析は、私たちの仕事の特別なケースです。

We study a quantum network that distributes entangled quantum states to multiple sets of users that are connected to the network. Each user is connected to a switch of the network via a link. All the links of the network generate bipartite Bell-state entangled states in each time-slot with certain probabilities, and each end node stores one qubit of the entanglement generated by the link. To create shared entanglements for a set of users, measurement operations are performed on qubits of link-level entanglements on a set of related links, and these operations are probabilistic in nature and are successful with certain probabilities. Requests arrive at the system seeking shared entanglements for different sets of users. Each request is for the creation of shared entanglements for a fixed set of users using link-level entanglements on a fixed set of links. Requests are processed according to a First-Come-First-Served service discipline and unserved requests are stored in buffers. Once a request is selected for service, measurement operations are performed on qubits of link-level entanglements on related links to create a shared entanglement. For a given set of request arrival rates and link-level entanglement generation rates, we obtain necessary conditions for the stability of queues of requests. In each time-slot, the scheduler has to schedule entanglement swapping operations for different sets of users to stabilize the network. Next, we propose a Max-Weight scheduling policy and show that this policy stabilizes the network for all feasible arrival rates. We also provide numerical results to support our analysis. The analysis of a single quantum switch that creates multipartite entanglements for different sets of users is a special case of our work.
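As a rough illustration of the Max-Weight idea mentioned in the abstract above, the following Python sketch picks, in one time-slot, the set of request types with the largest total weight among those that can be served with disjoint, currently available link-level entanglements. The weight used here (queue length times success probability) and all names such as `max_weight_schedule` and `links_needed` are assumptions made for illustration; the abstract does not specify the paper's exact policy, and the brute-force search is only practical for a handful of request types.

```python
from itertools import combinations


def max_weight_schedule(queues, links_needed, success_prob, available_links):
    """Return (chosen request types, total weight) for one time-slot.

    queues:          dict request type -> number of waiting requests
    links_needed:    dict request type -> set of links the request consumes
    success_prob:    dict request type -> probability the measurements succeed
    available_links: set of links that currently hold an entanglement

    Assumed weight of a request type: queue length * success probability.
    """
    # Only non-empty queues whose links all hold an entanglement can be served.
    candidates = [r for r in queues
                  if queues[r] > 0 and links_needed[r] <= available_links]
    best, best_weight = (), 0.0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            used = set()
            feasible = True
            for r in subset:
                if used & links_needed[r]:  # a link cannot be used twice
                    feasible = False
                    break
                used |= links_needed[r]
            if not feasible:
                continue
            weight = sum(queues[r] * success_prob[r] for r in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return set(best), best_weight
```

With queues `{"AB": 3, "CD": 2, "ABCD": 4}` over links `a` to `d`, the sketch prefers serving `AB` and `CD` together over the single four-user request whenever their combined weight is larger.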

翻訳日:2023-03-28 03:21:59 公開日:2021-06-01

# 拡張断熱性を有するカプラアシスト制御相ゲート

Coupler-Assisted Controlled-Phase Gate with Enhanced Adiabaticity ( http://arxiv.org/abs/2106.00725v1 )

ライセンス: Link先を確認

Ji Chu and Fei Yan

(参考訳) 高忠実性2量子エンタングゲートは、フォールトトレラント量子コンピュータにとって必須のビルディングブロックである。過去10年間、超伝導量子回路を用いたスケーラブルな2量子ビットゲートの開発に多大な努力が払われてきた。近年,固定周波数量子ビット[Phys. Rev. Lett. 125, 240502; Phys. Rev. Lett. 125, 240503]を用いた可変結合アーキテクチャを用いた簡易な制御相ゲート方式が高い忠実度で実証されている。しかし、根底にあるメカニズムの深い理解はいまだに欠けており、その可能性を完全に活用できない。ここでは、高コントラストZZ相互作用の起源を説明する包括的な理論的研究を紹介する。理解を深めたことにより,多レベルシステムにおいて断熱パルスを形成する汎用的かつ簡便な手法を開発し,設計からゲート性能を最適化する方法を明らかにした。最先端のコヒーレンス特性を考えると、このスキームは、フォールトトレラント量子計算の進歩を劇的に加速する$10^{-5}$に近い2ビットゲート誤差率を達成する可能性がある。

High-fidelity two-qubit entangling gates are essential building blocks for fault-tolerant quantum computers. Over the past decade, tremendous efforts have been made to develop scalable high-fidelity two-qubit gates with superconducting quantum circuits. Recently, an easy-to-scale controlled-phase gate scheme that utilizes the tunable-coupling architecture with fixed-frequency qubits [Phys. Rev. Lett. 125, 240502; Phys. Rev. Lett. 125, 240503] has been demonstrated with high fidelity and attracted broad interest. However, in-depth understanding of the underlying mechanism is still missing, preventing us from fully exploiting its potential. Here we present a comprehensive theoretical study, explaining the origin of the high-contrast ZZ interaction. Based on improved understanding, we develop a general yet convenient method for shaping an adiabatic pulse in a multilevel system, and identify how to optimize the gate performance from design. Given state-of-the-art coherence properties, we expect the scheme to potentially achieve a two-qubit gate error rate near $10^{-5}$, which would drastically speed up the progress towards fault-tolerant quantum computation.

翻訳日:2023-03-28 03:20:11 公開日:2021-06-01

# ダイヤモンド中のtin空スピン量子ビットの量子制御

Quantum control of the tin-vacancy spin qubit in diamond ( http://arxiv.org/abs/2106.00723v1 )

ライセンス: Link先を確認

Romain Debroux, Cathryn P. Michaels, Carola M. Purser, Noel Wan, Matthew E. Trusheim, Jesús Arjona Martínez, Ryan A. Parker, Alexander M. Stramma, Kevin C. Chen, Lorenzo de Santis, Evgeny M. Alexeev, Andrea C. Ferrari, Dirk Englund, Dorian A. Gangloff, Mete Atatüre

(参考訳) ダイヤモンドにおけるグループIVカラーセンターは、量子ネットワークデバイスにとって有望なライトマッターインターフェースである。負電荷のスズ空洞中心(snv)は特に興味深いもので、その大きなスピン軌道結合はフォノンの強調に対する強い保護と、スピン-光子エンタングルメントスキームへの光遷移のロバストな周期性をもたらす。ここでは、SnVスピン量子ビットの多軸コヒーレント制御を、地上と励起状態の間の全光刺激されたラマン駆動により実証する。我々はコヒーレント集団トラップと光駆動型電子スピン共鳴を用いて1.7Kで量子ビットへのコヒーレントアクセスを確認し、スピンラビ振動を$\Omega/2\pi$=3.6(1) MHzで得る。 all-optical ramsey interferometry は、スピンの減衰時間である$t_2^*$=1.3(3)$\mu$s と2パルスの動的デカップリングが既にスピンコヒーレンス時間を$t_2$=0.33(14) ms に拡張し、変換制限された光子とフォトニックナノ構造への統合により、snv は量子ネットワークにおける競合スピンフォトニクス構築ブロックとなることを示した。

Group-IV color centers in diamond are a promising light-matter interface for quantum networking devices. The negatively charged tin-vacancy center (SnV) is particularly interesting, as its large spin-orbit coupling offers strong protection against phonon dephasing and robust cyclicity of its optical transitions towards spin-photon entanglement schemes. Here, we demonstrate multi-axis coherent control of the SnV spin qubit via an all-optical stimulated Raman drive between the ground and excited states. We use coherent population trapping and optically driven electronic spin resonance to confirm coherent access to the qubit at 1.7 K, and obtain spin Rabi oscillations at a rate of $\Omega/2\pi$=3.6(1) MHz. All-optical Ramsey interferometry reveals a spin dephasing time of $T_2^*$=1.3(3)$\mu$s and two-pulse dynamical decoupling already extends the spin coherence time to $T_2$=0.33(14) ms. Combined with transform-limited photons and integration into photonic nanostructures, our results make the SnV a competitive spin-photon building block for quantum networks.

翻訳日:2023-03-28 03:19:45 公開日:2021-06-01

# 集約学習:ニューラルネットワーク分類器の学習に対するベクトル量子化アプローチ

Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers ( http://arxiv.org/abs/2001.03955v3 )

ライセンス: Link先を確認

Masoumeh Soflaei, Hongyu Guo, Ali Al-Bashabsheh, Yongyi Mao, Richong Zhang

(参考訳) ニューラルネットワーク分類器の学習の問題点を考察する。情報ボトルネック(IB)の原則の下では,この分類問題を「IB学習」と呼ぶ表現学習問題と関連付ける。 IB学習は、実際、量子化問題の特別なクラスと等価であることを示す。速度歪み理論の古典的な結果は、IB学習は「ベクトル量子化」アプローチ、すなわち複数の入力オブジェクトの表現を同時に学習するアプローチの恩恵を受けることができることを示唆する。このようなアプローチは、ニューラルネットワークモデルによる分類のための新しい学習フレームワークである"集約学習(Aggregated Learning)"を生み出した。このフレームワークでは、複数のオブジェクトを単一のニューラルネットワークで共同で分類する。本フレームワークの有効性は,標準画像認識およびテキスト分類タスクに関する広範な実験を通じて検証される。

We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call "IB learning". We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a "vector quantization" approach, namely, simultaneously learning the representations of multiple input objects. Such an approach, assisted with some variational techniques, results in a novel learning framework, "Aggregated Learning", for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.

翻訳日:2023-01-12 04:32:10 公開日:2021-06-01

# ニューラルネットワークを用いたベイズ推論

Bayesian Reasoning with Trained Neural Networks ( http://arxiv.org/abs/2001.11031v3 )

ライセンス: Link先を確認

Jakob Knollmüller and Torsten Enßlin

(参考訳) 我々は、トレーニングされたニューラルネットワークを用いてベイズ推論を行い、初期スコープ外のタスクを解決する方法を示した。深層生成モデルは事前知識を提供し、分類/回帰ネットワークは制約を課す。手前のタスクはベイズ推論問題として定式化され、変分法やサンプリング法によってほぼ解決した。既にトレーニング済みのネットワーク上に構築されたアプローチと、対応可能な質問は、利用可能なネットワークの数によって超指数的に増加した。最も単純な形で、アプローチは条件付き生成モデルを生み出した。しかし、複数の同時制約は精巧な問題を構成する。我々は、このアプローチを特別に訓練されたジェネレータと比較し、謎を解く方法を示し、最先端アーキテクチャとの互換性を実証した。

We showed how to use trained neural networks to perform Bayesian reasoning in order to solve tasks outside their initial scope. Deep generative models provide prior knowledge, and classification/regression networks impose constraints. The tasks at hand were formulated as Bayesian inference problems, which we approximately solved through variational or sampling techniques. The approach built on top of already trained networks, and the addressable questions grew super-exponentially with the number of available networks. In its simplest form, the approach yielded conditional generative models. However, multiple simultaneous constraints constitute elaborate questions. We compared the approach to specifically trained generators, showed how to solve riddles, and demonstrated its compatibility with state-of-the-art architectures.

翻訳日:2023-01-05 20:36:07 公開日:2021-06-01

# 動的ニューロモルフィックプロセッサによる時空間的特徴のシナプス的統合

Synaptic Integration of Spatiotemporal Features with a Dynamic Neuromorphic Processor ( http://arxiv.org/abs/2002.04924v2 )

ライセンス: Link先を確認

Mattias Nilsson, Foteini Liwicki and Fredrik Sandin

(参考訳) スパイキングニューロンは、シナプス前スパイクパターンの非線形シナプスおよび樹状統合による時空間的特徴検出を行うことができる。非線型デンドライトと関連するニューロモルフィック回路設計のマルチコンパートメントモデルは、そのような動的統合プロセスの忠実な模倣を可能にするが、これらのアプローチは比較的高い計算コストや回路サイズにも関係している。本稿では,dynap-seニューロモルフィックプロセッサにおける,複数の動的シナプスと時空間スパイクパターンの相補的統合について検討する。提案する動的シナプスの興奮-抑制対が組み合わさって複数の入力を統合する方法について検討し、この概念を1つの抑制シナプスと複数の興奮シナプスが組み合わされた場合に一般化する。神経形ニューロン回路の膜電位を測定し,解析することにより,後シナプス電位(EPSP)の遅延を特徴づける。生物学的に関係のあるEPSP遅延は1ニューロンあたり10ミリ秒の変動であり、デバイスミスマッチにより異なるシナプスの組み合わせを選択することにより、提案手法で実現できる。これらの結果に基づき,dynap-seに動的シナプスを有する単一点ニューロンが,特定の時空間構造を有するシナプス前スパイクに対して選択的に応答できることを実証した。

Spiking neurons can perform spatiotemporal feature detection by nonlinear synaptic and dendritic integration of presynaptic spike patterns. Multicompartment models of non-linear dendrites and related neuromorphic circuit designs enable faithful imitation of such dynamic integration processes, but these approaches are also associated with a relatively high computing cost or circuit size. Here, we investigate synaptic integration of spatiotemporal spike patterns with multiple dynamic synapses on point-neurons in the DYNAP-SE neuromorphic processor, which offers a complementary resource-efficient, albeit less flexible, approach to feature detection. We investigate how previously proposed excitatory--inhibitory pairs of dynamic synapses can be combined to integrate multiple inputs, and we generalize that concept to a case in which one inhibitory synapse is combined with multiple excitatory synapses. We characterize the resulting delayed excitatory postsynaptic potentials (EPSPs) by measuring and analyzing the membrane potentials of the neuromorphic neuronal circuits. We find that biologically relevant EPSP delays, with variability of order 10 milliseconds per neuron, can be realized in the proposed manner by selecting different synapse combinations, thanks to device mismatch. Based on these results, we demonstrate that a single point-neuron with dynamic synapses in the DYNAP-SE can respond selectively to presynaptic spikes with a particular spatiotemporal structure, which enables, for instance, visual feature tuning of single neurons.

翻訳日:2023-01-01 19:02:48 公開日:2021-06-01

# 公正主成分分析とフィルタ設計

Fair Principal Component Analysis and Filter Design ( http://arxiv.org/abs/2002.06557v2 )

ライセンス: Link先を確認

Gad Zalcberg and Ami Wiesel

(参考訳) 我々は,fair principal component analysis (fpca) を検討し,複数の対象ベクトルに公平にまたがる低次元部分空間を探索する。 FPCAは、与えられた集合内の最悪の射影目標ノルムの非凸最大化として定義される。この問題は信号処理におけるフィルタ設計や、公平性を次元還元スキームに組み込む際に発生する。 FPCAへの芸術的アプローチの状況は半有限緩和によるものであり、多項式は計算に費用がかかる。スケーラビリティを実現するために,naive sub-gradient descend を用いて fpca に対処することを提案する。直交目標の場合, 基礎となる最適化の状況を分析する。ランドスケープが良性であること、およびすべての局所ミニマがグローバルに最適であることを証明する。興味深いことに、sdrアプローチは、この単純なケースでは、最適以下のソリューションにつながります。最後に、直交FPCAと正規化タイトフレームの設計の等価性について論じる。

We consider Fair Principal Component Analysis (FPCA) and search for a low dimensional subspace that spans multiple target vectors in a fair manner. FPCA is defined as a non-concave maximization of the worst projected target norm within a given set. The problem arises in filter design in signal processing, and when incorporating fairness into dimensionality reduction schemes. The state of the art approach to FPCA is via semidefinite relaxation and involves a polynomial yet computationally expensive optimization. To allow scalability, we propose to address FPCA using naive sub-gradient descent. We analyze the landscape of the underlying optimization in the case of orthogonal targets. We prove that the landscape is benign and that all local minima are globally optimal. Interestingly, the SDR approach leads to sub-optimal solutions in this simple case. Finally, we discuss the equivalence between orthogonal FPCA and the design of normalized tight frames.

翻訳日:2022-12-31 17:40:11 公開日:2021-06-01

# ソースデータに本当にアクセスする必要があるか? 教師なし領域適応のためのソース仮説伝達

Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation ( http://arxiv.org/abs/2002.08546v6 )

ライセンス: Link先を確認

Jian Liang, Dapeng Hu, and Jiashi Feng

(参考訳) unsupervised domain adaptation(uda)は、ラベル付きソースデータセットから学んだ知識を活用して、新しいラベル付きドメインで同様のタスクを解決することを目的としている。従来のUDAメソッドは、モデルに適応するために学習する際にソースデータにアクセスする必要があり、分散化されたプライベートデータに対してリスクが高く非効率である。この研究は、訓練済みのソースモデルのみが利用できる実践的な環境に取り組み、ソースデータ無しでそのようなモデルを効果的に活用してUDA問題を解決する方法について検討する。本稿では,簡単な汎用的な表現学習フレームワークである \emph{Source HypOthesis Transfer} (SHOT) を提案する。 shotはソースモデルの分類器モジュール(仮説)を凍結し、情報最大化と自己教師付き擬似ラベルの両方を利用してターゲット固有の特徴抽出モジュールを学習し、ターゲットドメインからソース仮説への表現を暗黙的に整列させる。その汎用性を検証するため, 閉集合, 部分集合, 開集合領域適応など, SHOTを多岐にわたる適応例で評価した。実験によると、shotは複数のドメイン適応ベンチマークにおいて最先端の結果をもたらす。

Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain. Prior UDA methods typically require access to the source data when learning to adapt the model, making them risky and inefficient for decentralized private data. This work tackles a practical setting where only a trained source model is available and investigates how we can effectively utilize such a model without source data to solve UDA problems. We propose a simple yet generic representation learning framework, named Source HypOthesis Transfer (SHOT). SHOT freezes the classifier module (hypothesis) of the source model and learns the target-specific feature extraction module by exploiting both information maximization and self-supervised pseudo-labeling to implicitly align representations from the target domains to the source hypothesis. To verify its versatility, we evaluate SHOT in a variety of adaptation cases including closed-set, partial-set, and open-set domain adaptation. Experiments indicate that SHOT yields state-of-the-art results among multiple domain adaptation benchmarks.
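The abstract above names information maximization as one of the two training signals. A common way to write such an objective, assumed here rather than taken from the abstract, is to make each prediction confident (low per-sample entropy) while keeping the batch-averaged prediction spread over classes (high marginal entropy). The minimal Python sketch below computes that quantity on a batch of softmax outputs; the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_maximization_loss(batch_probs):
    """Mean per-sample entropy minus entropy of the batch-mean prediction.

    batch_probs: list of per-sample class-probability lists (rows sum to 1).
    Lower is better: confident individual predictions that are, on average,
    spread over the classes.
    """
    n = len(batch_probs)
    mean_entropy = sum(entropy(p) for p in batch_probs) / n
    marginal = [sum(col) / n for col in zip(*batch_probs)]
    return mean_entropy - entropy(marginal)
```

Under this formulation, confident and class-balanced predictions score lower than either uniform predictions or predictions that all collapse onto one class, which is why both terms are kept.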

翻訳日:2022-12-30 07:07:27 公開日:2021-06-01

# 古典的適応フィルタ理論を用いたCNN訓練速度と安定性に及ぼすバッチ正規化の影響の分離

Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory ( http://arxiv.org/abs/2002.10674v2 )

ライセンス: Link先を確認

Elaina Chai, Mert Pilanci, Boris Murmann

(参考訳) バッチ正規化(BatchNorm)は、トレーニング速度と安定性を改善するために、畳み込みニューラルネットワーク(CNN)で一般的に使用される。しかし、なぜこの手法が有効であるかについてのコンセンサスはまだ限られている。本稿では、従来の適応フィルタ領域の概念を用いて、BatchNormの動的および内部動作に関する洞察を提供する。まず、畳み込み重み更新は、畳み込み層のチャネルワイド構造を介してBatchNormによって制御される入力自己相関行列の固有値に、安定性と収束速度が結びついている自然なモードを持つことを示す。さらに,本実験では,速度と安定性の利点が異なる効果を示す。低い学習率では、収束速度を改善する最小固有値のBatchNormの増幅であり、高い学習率では、安定性を保証する最大の固有値の抑制である。最後に、第1のトレーニングステップにおいて、正規化が最も必要となる場合、BatchNormは正規化リースト平均角 (NLMS) と同じ最適化を満足する一方で、その後のステップでこの条件を近似し続けていることを証明した。本稿では,適応フィルタ理論を用いて,現代のニューラルネットワーク構造に関するさらなる知見を得るための基礎研究を行った。

Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the largest eigenvalues that ensures stability. Lastly, we prove that in the first training step, when normalization is needed most, BatchNorm satisfies the same optimization as Normalized Least Mean Square (NLMS), while it continues to approximate this condition in subsequent steps. The analyses provided in this paper lay the groundwork for gaining further insight into the operation of modern neural network structures using adaptive filter theory.
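For readers unfamiliar with the adaptive-filter side of the comparison above, the following Python sketch shows the classical Normalized Least Mean Square (NLMS) update, in which the step is divided by the input power. It illustrates NLMS only and is not the paper's BatchNorm analysis; the names `nlms_step` and `identify_filter`, the step size, and the noise-free test signal are assumptions chosen for this example.

```python
import random


def nlms_step(weights, x, desired, step_size=0.5, eps=1e-8):
    """One NLMS update: w <- w + step_size * error * x / (eps + ||x||^2)."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = desired - prediction
    power = sum(xi * xi for xi in x)  # input power used for normalization
    gain = step_size * error / (eps + power)
    return [w + gain * xi for w, xi in zip(weights, x)], error


def identify_filter(true_weights, num_steps=500, seed=0):
    """Recover an unknown linear filter from noise-free input/output pairs."""
    rng = random.Random(seed)
    weights = [0.0] * len(true_weights)
    for _ in range(num_steps):
        x = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        desired = sum(t * xi for t, xi in zip(true_weights, x))
        weights, _ = nlms_step(weights, x, desired)
    return weights
```

Because of the division by input power, rescaling the input and the target by the same factor leaves the update essentially unchanged, which is the property that makes the comparison with normalization layers natural.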

翻訳日:2022-12-28 20:34:16 公開日:2021-06-01

# DP-MERF: 実用的プライバシー保護データ生成のためのランダムな特徴付き微分プライベート平均埋め込み

DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation ( http://arxiv.org/abs/2002.11603v5 )

ライセンス: Link先を確認

Frederik Harder, Kamil Adamczewski, Mijung Park

(参考訳) 実データと合成データの分布を比較する際に,カーネル平均埋め込みのランダムな特徴表現を用いた差分プライベートなデータ生成パラダイムを提案する。ランダムな特徴表現を2つの重要な利点として活用する。まず、深層生成モデルのトレーニングには最小限のプライバシーコストが必要です。これは、真のデータポイントと合成データポイントのすべてのペアでカーネルマトリックスを計算する必要があるカーネルベースの距離メトリクスとは異なり、データ依存項を合成データのみに依存する用語から切り離すことができるためである。したがって、データ依存項を一度だけ摂動し、ジェネレータのトレーニング中に繰り返し使用する必要がある。第二に、ランダムな特徴が構築によってノルムとなるため、カーネル平均埋め込みの解析感度を得ることができる。これにより、ジェネレータネットワークの未知の感度を扱うために、クリッピングノルムのハイパーパラメータ検索の必要性がなくなる。我々は,不均質な表データや画像データなどのデータセットのラベルと入力特徴を共同で生成するために,ランダムな特徴量(dp-merf)を持つ微分的平均埋め込みアルゴリズムを提案する。このアルゴリズムは、複数のデータセットでテストした場合、既存の方法よりもはるかに優れたプライバシ利用トレードオフを実現する。

We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings when comparing the distribution of true data with that of synthetic data. We exploit the random feature representations for two important benefits. First, we require a minimal privacy cost for training deep generative models. This is because unlike kernel-based distance metrics that require computing the kernel matrix on all pairs of true and synthetic data points, we can detach the data-dependent term from the term solely dependent on synthetic data. Hence, we need to perturb the data-dependent term only once and then use it repeatedly during the generator training. Second, we can obtain an analytic sensitivity of the kernel mean embedding as the random features are norm bounded by construction. This removes the necessity of hyper-parameter search for a clipping norm to handle the unknown sensitivity of a generator network. We provide several variants of our algorithm, differentially-private mean embeddings with random features (DP-MERF) to jointly generate labels and input features for datasets such as heterogeneous tabular data and image data. Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
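The two benefits described above can be sketched in a few lines of Python. The sketch assumes random Fourier features for a Gaussian kernel, scaled so that every feature vector has unit norm, and a replace-one notion of neighbouring datasets, which gives the mean embedding an L2 sensitivity of at most 2/m for m data points. The noise multiplier is left as a free parameter; calibrating it to a target privacy level is not shown. All function names are hypothetical and this is not the authors' implementation.

```python
import math
import random


def draw_frequencies(num_freqs, dim, kernel_width, rng):
    """Sample random frequencies for a Gaussian kernel of the given width."""
    return [[rng.gauss(0.0, 1.0 / kernel_width) for _ in range(dim)]
            for _ in range(num_freqs)]


def random_features(x, freqs):
    """Cos/sin random Fourier features, scaled to have unit L2 norm."""
    scale = 1.0 / math.sqrt(len(freqs))
    proj = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in freqs]
    return ([scale * math.cos(p) for p in proj]
            + [scale * math.sin(p) for p in proj])


def mean_embedding(data, freqs):
    """Average feature vector of a dataset (the data-dependent term)."""
    feats = [random_features(x, freqs) for x in data]
    m = len(feats)
    return [sum(col) / m for col in zip(*feats)]


def privatize(embedding, num_points, noise_multiplier, rng):
    """Add Gaussian noise once, scaled to the assumed 2/m sensitivity."""
    sensitivity = 2.0 / num_points
    std = noise_multiplier * sensitivity
    return [v + rng.gauss(0.0, std) for v in embedding]
```

In this reading, the noisy embedding of the real data is computed a single time and then reused as a fixed target while a generator is trained to match it with the embedding of its own samples.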

翻訳日:2022-12-28 14:25:14 公開日:2021-06-01

# 広小密度仮説と探索的拡張学習率スケジュール

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule ( http://arxiv.org/abs/2003.03977v5 )

ライセンス: Link先を確認

Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

(参考訳) いくつかの論文では、幅の広いミニマは狭いミニマよりも一般化されていると主張している。本稿では,広大極小の一般化特性を共生する詳細な実験を通じて,広大極小の密度が狭小極小の密度よりも低いという新しい仮説の実証的な証拠を提供する。さらに,この仮説に動機づけられ,新しい探索・探索学習率スケジュールを設計する。様々な画像や自然言語データセットにおいて,学習のベースラインを手作業で調整した場合と比較して,探索・探索のスケジュールは最大0.84%高い絶対精度が得られるか,最大57%のトレーニング時間を短縮し,元の報告精度を達成することができることを示した。例えば、ハイパフォーマンスモデルの学習率スケジュールを変更するだけで、IWSLT'14(DE-EN)データセットの最先端(SOTA)精度を実現する。

Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments, we not only corroborate the generalization properties of wide minima but also provide empirical evidence for a new hypothesis that the density of wide minima is likely lower than the density of narrow minima. Further, motivated by this hypothesis, we design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, compared to their original hand-tuned learning rate baselines, we show that our explore-exploit schedule can result in either up to 0.84% higher absolute accuracy using the original training budget or up to 57% reduced training time while achieving the original reported accuracy. For example, we achieve state-of-the-art (SOTA) accuracy for IWSLT'14 (DE-EN) dataset by just modifying the learning rate schedule of a high performing model.
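The abstract does not spell out the shape of the explore-exploit schedule. One plausible reading, assumed here purely for illustration, is a high constant learning rate during an initial explore phase followed by a linear decay towards zero during the exploit phase. The function name and parameters below are hypothetical.

```python
def explore_exploit_lr(step, total_steps, explore_steps, peak_lr):
    """Learning rate at a given step under the assumed two-phase schedule.

    Explore phase (step < explore_steps): constant peak_lr.
    Exploit phase: linear decay that would reach zero at total_steps.
    """
    if not 0 <= explore_steps < total_steps:
        raise ValueError("need 0 <= explore_steps < total_steps")
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    if step < explore_steps:
        return peak_lr
    remaining = total_steps - explore_steps
    return peak_lr * (total_steps - step) / remaining
```

For example, with 100 total steps, 30 explore steps and a peak rate of 0.1, the rate stays at 0.1 through step 30 and is halved by step 65.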

翻訳日:2022-12-25 07:56:55 公開日:2021-06-01

# 潜在画像を用いたオープンドメイン対話生成

Open Domain Dialogue Generation with Latent Images ( http://arxiv.org/abs/2004.01981v2 )

ライセンス: Link先を確認

Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li

(参考訳) オープンドメインと画像との対話について検討する。既存の研究は、画像とテキストの文脈の両方が利用可能であると仮定しているが、自然界における画像地上対話は、テキスト対話よりも入手が困難である。そこで本研究では,対話時の視覚シーン情報を画像で表現可能と仮定し,テキスト対画像生成技術を用いてテキスト対話の潜在画像の復元を試みることにより,画像接地対話とテキスト対話の両方を用いた応答生成モデルを学ぶことを提案する。 2つのタイプの対話の可能性は、条件付き変分オートエンコーディングフレームワークで学習される応答生成器と画像再構成器によって定式化される。画像地上会話とテキストベースの会話の両方において実証的研究を行う。第1シナリオでは、特に低リソース環境下でのイメージ接頭辞対話は、潜在画像とのテキスト対話によって効果的に強化されるが、第2シナリオでは、潜在画像は応答の内容を強化し、同時に文脈に関連づけられる。

We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.

翻訳日:2022-12-16 22:35:28 公開日:2021-06-01

# 過パラメータ領域における補間線形分類器の有限サンプル解析

Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime ( http://arxiv.org/abs/2004.12019v4 )

ライセンス: Link先を確認

Niladri S. Chatterji, Philip M. Long

(参考訳) 2クラス線形分類における最大マージンアルゴリズムの集団リスクの限界を証明した。線形分離可能なトレーニングデータに対して、最大マージンアルゴリズムは、トレーニングエラーが0に駆動されるため、勾配降下を用いたロジスティック損失を伴うトレーニングの限界に相当することが以前の研究で示されている。このアルゴリズムは誤分類ノイズを含むランダムデータに適用される。クリーンデータに対する我々の仮定は、クラス条件分布が標準正規分布である場合を含む。誤分類ノイズは敵によって選択され、破損したラベルのごく一部に制限される。我々の限界は、十分な過パラメータ化によって、ノイズデータに基づいてトレーニングされた最大マージンアルゴリズムが、ほぼ最適な人口リスクを達成できることを示している。

We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification. For linearly separable training data, the maximum margin algorithm has been shown in previous work to be equivalent to a limit of training with logistic loss using gradient descent, as the training error is driven to zero. We analyze this algorithm applied to random data including misclassification noise. Our assumptions on the clean data include the case in which the class-conditional distributions are standard normal distributions. The misclassification noise may be chosen by an adversary, subject to a limit on the fraction of corrupted labels. Our bounds show that, with sufficient over-parameterization, the maximum margin algorithm trained on noisy data can achieve nearly optimal population risk.

翻訳日:2022-12-09 21:35:38 公開日:2021-06-01

# InfoScrub: 目的の難読化による属性プライバシの実現

InfoScrub: Towards Attribute Privacy by Targeted Obfuscation ( http://arxiv.org/abs/2005.10329v2 )

ライセンス: Link先を確認

Hui-Po Wang, Tribhuvanesh Orekondy, Mario Fritz

(参考訳) オンラインで共有された個人の個人写真は、記憶に残る多くの詳細を示す以外に、幅広いプライベート情報を明らかにし、プライバシーリスク(オンラインハラスメント、追跡など)を伴う可能性がある。このようなリスクを軽減するためには、個人が視覚データに漏洩した個人情報を制限する技術を研究することが不可欠である。我々は,画像の忠実さを維持しつつ,対象とするプライバシ属性に対する推論のエントロピーを最大化する,新しい画像難読化フレームワークでこの問題に取り組む。エンコーダ-デコーダ方式のアーキテクチャを基本とし、2つの重要な新規性を持つアプローチをとる。 (a)複数の非対ドメインから同時に双方向翻訳を行うための識別器を導入すること (b)属性のターゲットセットに対する不確実性を最大化する画像補間を予測すること。我々のアプローチは、元の入力画像に忠実な難読化画像を生成し、さらに非難読化画像に対して6.2$\times$(最大0.85bit)の不確かさを増加させる。

Personal photos of individuals, when shared online, apart from exhibiting a myriad of memorable details, also reveal a wide range of private information and potentially entail privacy risks (e.g., online harassment, tracking). To mitigate such risks, it is crucial to study techniques that allow individuals to limit the private information leaked in visual data. We tackle this problem in a novel image obfuscation framework: to maximize entropy on inferences over targeted privacy attributes, while retaining image fidelity. We approach the problem based on an encoder-decoder style architecture, with two key novelties: (a) introducing a discriminator to perform bi-directional translation simultaneously from multiple unpaired domains; (b) predicting an image interpolation which maximizes uncertainty over a target set of attributes. We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2$\times$ (or up to 0.85 bits) over the non-obfuscated counterparts.

翻訳日:2022-12-01 05:03:56 公開日:2021-06-01

# 胸部X線画像からのCOVID-19, MERS, SARSの信頼性診断のための深層学習

Deep Learning for Reliable Classification of COVID-19, MERS, and SARS from Chest X-Ray Images ( http://arxiv.org/abs/2005.11524v6 )

ライセンス: Link先を確認

Anas Tahir, Yazan Qiblawey, Amith Khandakar, Tawsifur Rahman, Uzair Khurshid, Farayi Musharavati, M. T. Islam, Serkan Kiranyaz, Muhammad E. H. Chowdhury

(参考訳) 新規のコロナウイルス病(COVID-19)は、非常に感染性が高く、急速に感染するコロナウイルスである。 2002年と2011年に流行した重症急性呼吸器症候群(sars)と中東呼吸器症候群(mers)、そして現在の新型コロナウイルス(covid-19)のパンデミックは、すべて同じ種類のコロナウイルスである。本研究の目的は、深層畳み込みニューラルネットワーク(CNN)を用いて、COVID-19、SARS、MERS胸部X線(CXR)画像を分類することである。 423のCOVID-19、144のMERS、134のSARS CXR画像からなるQU-COVID- Familyと呼ばれるユニークなデータベースが作成された。さらに、CNNセグメンテーションモデル(U-Net)を用いて肺領域を同定し、訓練済みのCNN分類器を用いて、セグメンテーションされた肺画像をCOVID-19、MERS、SARSに分類する堅牢なCOVID-19認識システムを提案した。さらに,スコアカム可視化法を用いて分類結果の可視化を行い,深層cnnの決定の背後にある理由を理解する。いくつかのディープラーニング分類器が訓練され、テストされ、4つの優れたアルゴリズムが報告された。オリジナル画像とプリプロセス画像は、ネットワークへの入力として、それぞれに同時に使用された。 CXR分類とCXR分類の2つの分類法が検討された。通常のcxrでは、inceptionv3は他のネットワークを3チャンネル方式で上回り、99.5%、93.1%、および97%の感度を達成し、covid-19、mers、sars画像の分類を行った。一方、セグメンテーションされたCXRでは、InceptionV3はオリジナルのCXRデータセットより優れ、それぞれ96.94%、79.68%、90.26%の感度で新型コロナウイルス、MERS、SARSの画像を分類した。すべてのネットワークは、分枝肺画像で高い新型コロナウイルス検出感度(>96%)を示した。これは、医療従事者にとって難しい課題であるAIの目に、新型コロナウイルス(COVID-19)のユニークな症状を示すものだ。

Novel Coronavirus disease (COVID-19) is an extremely contagious and quickly spreading Coronavirus infestation. Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), which broke out in 2002 and 2011, and the current COVID-19 pandemic are all from the same family of coronaviruses. This work aims to classify COVID-19, SARS, and MERS chest X-ray (CXR) images using deep Convolutional Neural Networks (CNNs). A unique database was created, so-called QU-COVID-family, consisting of 423 COVID-19, 144 MERS, and 134 SARS CXR images. Besides, a robust COVID-19 recognition system was proposed to identify lung regions using a CNN segmentation model (U-Net), and then classify the segmented lung images as COVID-19, MERS, or SARS using a pre-trained CNN classifier. Furthermore, the Score-CAM visualization method was utilized to visualize classification output and understand the reasoning behind the decision of deep CNNs. Several Deep Learning classifiers were trained and tested; four outperforming algorithms were reported. Original and preprocessed images were used individually and all together as the input(s) to the networks. Two recognition schemes were considered: plain CXR classification and segmented CXR classification. For plain CXRs, it was observed that InceptionV3 outperforms other networks with a 3-channel scheme and achieves sensitivities of 99.5%, 93.1%, and 97% for classifying COVID-19, MERS, and SARS images, respectively. In contrast, for segmented CXRs, InceptionV3 outperformed using the original CXR dataset and achieved sensitivities of 96.94%, 79.68%, and 90.26% for classifying COVID-19, MERS, and SARS images, respectively. All networks showed high COVID-19 detection sensitivity (>96%) with the segmented lung images. This indicates the unique radiographic signature of COVID-19 cases in the eyes of AI, which is often a challenging task for medical doctors.

翻訳日:2022-11-30 03:38:28 公開日:2021-06-01

# 連続ゲームにおける近似ナッシュ平衡の計算アルゴリズムと連続ブロットーへの応用

Algorithm for Computing Approximate Nash Equilibrium in Continuous Games with Application to Continuous Blotto ( http://arxiv.org/abs/2006.07443v5 )

ライセンス: Link先を確認

Sam Ganzfried

(参考訳) 様々な有限ゲームクラスにおけるナッシュ均衡の計算に有効なアルゴリズムが開発されている。しかし、純粋な戦略空間が(潜在的に数え切れないほど)無限である連続的なゲームを解くことは、はるかに難しい。にもかかわらず、多くの実世界のドメインは連続的な行動空間を持ち、例えば、アクションは時間、お金、あるいは自然に積分とは対照的に実数値としてモデル化される他の資源の量を指す。連続ゲームにおけるナッシュ均衡戦略を近似する新しいアルゴリズムを提案する。 2人プレイのゼロサムゲームに加えて、アルゴリズムは不完全な情報を持つマルチプレイヤーゲームやゲームにも適用される。 2人のプレイヤーが複数の戦場でリソースを分配する連続的不完全情報ブロットゲームについて実験を行った。ブロットゲームは国家の安全シナリオをモデル化するために頻繁に用いられ、選挙競争やオークション理論にも適用されてきた。実験により,本ゲームにおけるnash平衡戦略の近接近似を高速に計算できることを示した。

Successful algorithms have been developed for computing Nash equilibrium in a variety of finite game classes. However, solving continuous games -- in which the pure strategy space is (potentially uncountably) infinite -- is far more challenging. Nonetheless, many real-world domains have continuous action spaces, e.g., where actions refer to an amount of time, money, or other resource that is naturally modeled as being real-valued as opposed to integral. We present a new algorithm for approximating Nash equilibrium strategies in continuous games. In addition to two-player zero-sum games, our algorithm also applies to multiplayer games and games with imperfect information. We experiment with our algorithm on a continuous imperfect-information Blotto game, in which two players distribute resources over multiple battlefields. Blotto games have frequently been used to model national security scenarios and have also been applied to electoral competition and auction theory. Experiments show that our algorithm is able to quickly compute close approximations of Nash equilibrium strategies for this game.

翻訳日:2022-11-22 04:46:49 公開日:2021-06-01

# 高速かつイーガーなk-メドイドクラスタリング: PAM, CLARA, CLARANSアルゴリズムのO(k)実行時改善

Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms ( http://arxiv.org/abs/2008.05171v2 )

ライセンス: Link先を確認

Erich Schubert and Peter J. Rousseeuw

(参考訳) 非ユークリッドデータのクラスタリングは困難であり、階層的クラスタリング以外の最もよく使われるアルゴリズムの1つは、k-medoids clusteringとも呼ばれるPAM(Partitioning Around Medoids)である。ユークリッド幾何学において、k-平均で使われる平均はクラスター中心に良い推定子であるが、任意の相似性には存在しない。 PAMは代わりにメドイドを使用し、クラスタ内の他のすべてのオブジェクトと最小の相同性を持つオブジェクトである。この中心性の概念は任意の(dis-)類似性で使用することができ、したがって多くのドメインやアプリケーションに高い関係がある。 PAMの大きな問題は、高い実行時間コストである。アルゴリズムの第2(SWAP)フェーズでO(k)倍の高速化を実現するPAMアルゴリズムの修正を提案するが、元のPAMアルゴリズムと同じ結果が得られる。実行されたスワップの選択を緩和し(同等の品質を維持しながら)、各イテレーションで熱心にスワップを実行してアルゴリズムをさらに加速することができる。スワップが大幅に速くなれば、より高速な初期化戦略を探せるようになります。 (i)古典的「BUILD」初期化がボトルネックとなり、 (二)当社のスワップは、開始条件の悪化を補うのに十分な速さです。また,claraアルゴリズムとclaransアルゴリズムが提案する修正の利点を示す。本研究におけるアプローチの並列化は研究されていないが,PAM と CLARA をビッグデータ上で使用するための従来のアプローチ(一部ではサブルーチンとして PAM を使用しているため,これらの改善の恩恵を受けられる)と組み合わせれば,高い k での性能がますます重要になる。 k=100,200の実データに対する実験では、元のPAM SWAPアルゴリズムと比較して458倍のスピードアップを観測し、PAMをより大きなデータセット、特に高いkに適用した。

Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids clustering. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but this does not exist for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains and applications. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second ("SWAP") phase of the algorithm, but will still find the same results as the original PAM algorithm. If we relax the choice of swaps performed (while retaining comparable quality), we can further accelerate the algorithm by eagerly performing additional swaps in each iteration. With the substantially faster SWAP, we can now explore faster initialization strategies, because (i) the classic ("BUILD") initialization now becomes the bottleneck, and (ii) our swap is fast enough to compensate for worse starting conditions. We also show how the CLARA and CLARANS algorithms benefit from the proposed modifications. While we do not study the parallelization of our approach in this work, it can easily be combined with earlier approaches to use PAM and CLARA on big data (some of which use PAM as a subroutine, hence can immediately benefit from these improvements), where the performance with high k becomes increasingly important. In experiments on real data with k=100 and k=200, we observed speedups of 458x and 1191x, respectively, compared to the original PAM SWAP algorithm, making PAM applicable to larger data sets, and in particular to higher k.
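
The medoid definition in this abstract can be illustrated with a minimal sketch. It is not the accelerated SWAP procedure the paper proposes, only a brute-force illustration of the medoid and of the k-medoids objective. The function names `medoid` and `total_deviation` and the toy data are assumptions made for this example.

```python
def medoid(items, dissimilarity):
    """Return the item with the smallest summed dissimilarity to all others."""
    best_item, best_cost = None, float("inf")
    for candidate in items:
        cost = sum(dissimilarity(candidate, other) for other in items)
        if cost < best_cost:  # strict '<' keeps the first item on ties
            best_item, best_cost = candidate, cost
    return best_item


def total_deviation(items, medoids, dissimilarity):
    """k-medoids objective: sum of each item's dissimilarity to its nearest medoid."""
    return sum(min(dissimilarity(x, m) for m in medoids) for x in items)


# Toy data (hypothetical): any dissimilarity function works, not only metrics.
points = [1.0, 2.0, 3.0, 10.0]
abs_diff = lambda a, b: abs(a - b)

print(medoid(points, abs_diff))
print(total_deviation(points, [2.0, 10.0], abs_diff))
```

Because only pairwise dissimilarities are used, the same code works for strings, graphs, or any other objects for which a dissimilarity can be computed.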

翻訳日:2022-10-31 04:28:32 公開日:2021-06-01

# 政治的主張のマルチホップファクトチェック

Multi-Hop Fact Checking of Political Claims ( http://arxiv.org/abs/2009.06401v3 )

ライセンス: Link先を確認

Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein

(参考訳) 近年の研究では、複雑な自然言語推論を研究するためのマルチホップモデルとデータセットが提案されている。マルチホップ推論を必要とする注目すべきタスクの1つは事実チェックであり、接続された証拠のセットがクレームの最終判断に繋がる。しかし、既存のデータセットは金のエビデンスページのアノテーションを提供していないか、または(FEVER)唯一のデータセットは、単純な推論で事実チェックでき、人工的に構築されるクレームで構成されている。ここでは、相互接続された証拠チャンクの上に複数のホップを持つ自然発生クレームのより複雑なクレーム検証を行う。私たちは 1) クレーム検証のための証拠文の小さな注釈付きデータセット、PolitiHopを構築する。 2) 既存のマルチホップデータセットと比較し, 3)より広範なドメイン内および外部リソースからPolitiHopへの知識の転送方法を検討する。タスクは複雑で、ドメイン内の転送学習と組み合わせてエビデンスを推論するアーキテクチャで、最高のパフォーマンスを達成することが分かっています。

Recent work has proposed multi-hop models and datasets for studying complex natural language reasoning. One notable task requiring multi-hop reasoning is fact checking, where a set of connected evidence pieces leads to the final verdict of a claim. However, existing datasets either do not provide annotations for gold evidence pages, or the only dataset which does (FEVER) mostly consists of claims which can be fact-checked with simple reasoning and is constructed artificially. Here, we study more complex claim verification of naturally occurring claims with multiple hops over interconnected evidence chunks. We: 1) construct a small annotated dataset, PolitiHop, of evidence sentences for claim verification; 2) compare it to existing multi-hop datasets; and 3) study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex and achieve the best performance with an architecture that specifically models reasoning over evidence pieces in combination with in-domain transfer learning.

翻訳日:2022-10-20 02:34:50 公開日:2021-06-01

# 暗黙的グラフニューラルネットワーク

Implicit Graph Neural Networks ( http://arxiv.org/abs/2009.06211v3 )

ライセンス: Link先を確認

Fangda Gu, Heng Chang, Wenwu Zhu, Somayeh Sojoudi, Laurent El Ghaoui

(参考訳) グラフニューラルネットワーク(gnns)は、グラフ構造化データから意味のある表現を学ぶディープラーニングモデルとして広く使われている。基礎となるリカレント構造が有限であるため、現在のGNN法は基礎となるグラフの長距離依存を捉えるのに苦労する可能性がある。この難しさを克服するために,我々は,暗黙の「状態」ベクトルを含む固定点平衡方程式の解に基づく,暗黙のグラフニューラルネットワーク(ignn)と呼ばれるグラフ学習フレームワークを提案する。我々はペロン・フロベニウス理論を用いて、枠組みの健全性を保証する十分な条件を導出する。暗黙的な差別化を生かして、フレームワークを訓練するための引き込み可能な勾配降下法を導出する。包括的なタスクの実験は、IGNNが一貫して長距離依存をキャプチャし、最先端のGNNモデルより優れていることを示している。

Graph Neural Networks (GNNs) are widely used deep learning models that learn meaningful representations from graph-structured data. Due to the finite nature of the underlying recurrent structure, current GNN methods may struggle to capture long-range dependencies in underlying graphs. To overcome this difficulty, we propose a graph learning framework, called Implicit Graph Neural Networks (IGNN), where predictions are based on the solution of a fixed-point equilibrium equation involving implicitly defined "state" vectors. We use the Perron-Frobenius theory to derive sufficient conditions that ensure well-posedness of the framework. Leveraging implicit differentiation, we derive a tractable projected gradient descent method to train the framework. Experiments on a comprehensive range of tasks show that IGNNs consistently capture long-range dependencies and outperform the state-of-the-art GNN models.
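
The phrase "fixed-point equilibrium equation" can be made concrete with a small sketch. It assumes an equation of the form X = relu(W X A + B), where A is a normalized adjacency matrix and B stands in for the input-dependent bias; this form, the toy matrices, and the helper names are assumptions for illustration and are not taken from the abstract. The weights are chosen small enough that plain iteration is a contraction, which is one simple way to guarantee a unique solution.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def relu_affine(w, x, a, bias):
    """One application of the map X -> relu(W X A + B)."""
    wxa = matmul(matmul(w, x), a)
    return [[max(0.0, wxa[i][j] + bias[i][j]) for j in range(len(bias[0]))]
            for i in range(len(bias))]


def solve_equilibrium(w, a, bias, tol=1e-12, max_iter=1000):
    """Iterate the map from X = 0 until successive iterates stop changing."""
    x = [[0.0] * len(bias[0]) for _ in bias]
    for _ in range(max_iter):
        x_next = relu_affine(w, x, a, bias)
        gap = max(abs(x_next[i][j] - x[i][j])
                  for i in range(len(x)) for j in range(len(x[0])))
        x = x_next
        if gap < tol:
            break
    return x


# Hypothetical toy problem: 2 hidden features, 3 graph nodes.
W = [[0.2, 0.1], [0.0, 0.3]]                                # absolute row sums <= 0.3
A = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]     # doubly stochastic
B = [[1.0, -0.5, 0.2], [0.3, 0.8, -1.0]]

x_star = solve_equilibrium(W, A, B)
check = relu_affine(W, x_star, A, B)
residual = max(abs(check[i][j] - x_star[i][j]) for i in range(2) for j in range(3))
print(residual)
```

Since the equilibrium depends on repeated propagation through A, every node's state can be influenced by arbitrarily distant nodes, which is the intuition behind capturing long-range dependencies.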

翻訳日:2022-10-18 11:31:51 公開日:2021-06-01

# 最小平均化によるモーメントム:非凸最適化のための理論的考察と学習率スケジューリング

Momentum via Primal Averaging: Theoretical Insights and Learning Rate Schedules for Non-Convex Optimization ( http://arxiv.org/abs/2010.00406v4 )

ライセンス: Link先を確認

Aaron Defazio

(参考訳) モーメント法は、ディープニューラルネットワークのような非凸モデルのトレーニングに機械学習コミュニティ内で広く使われている。経験的には、それらは伝統的な確率勾配降下(SGD)アプローチを実行する。本研究では, 運動量を持つSGD(SGD+M)のリアプノフ解析を行い, 確率的原始平均化法(SPA)の等価な書き換えを利用する。この解析は、非凸の場合の以前の理論よりもはるかに厳密であり、SGD+MがSGDをいつ上回るのか、ハイパーパラメータースケジュールがどうなるのか、なぜ動くのかを正確に把握することができる。

Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks. Empirically, they outperform traditional stochastic gradient descent (SGD) approaches. In this work we develop a Lyapunov analysis of SGD with momentum (SGD+M), by utilizing an equivalent rewriting of the method known as the stochastic primal averaging (SPA) form. This analysis is much tighter than previous theory in the non-convex case, and due to this we are able to give precise insights into when SGD+M may outperform SGD, and what hyper-parameter schedules will work and why.
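
The equivalence between SGD+M and the primal averaging form can be checked numerically with a small sketch. It assumes constant hyper-parameters, deterministic gradients, and a toy quadratic objective; under those assumptions the two forms coincide when eta = alpha / (1 - beta) and c = 1 - beta. The function names and the objective are illustrative choices, and the paper's analysis covers the more general stochastic, scheduled case.

```python
def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * x**2 (an assumption for the demo)."""
    return x


def sgd_momentum(x0, alpha, beta, steps):
    """SGD+M: m <- beta * m + g(x), then x <- x - alpha * m."""
    x, m = x0, 0.0
    trajectory = [x]
    for _ in range(steps):
        m = beta * m + grad(x)
        x = x - alpha * m
        trajectory.append(x)
    return trajectory


def primal_averaging(x0, eta, c, steps):
    """Primal averaging form: z <- z - eta * g(x), then x <- (1 - c) * x + c * z."""
    x, z = x0, x0
    trajectory = [x]
    for _ in range(steps):
        z = z - eta * grad(x)
        x = (1.0 - c) * x + c * z
        trajectory.append(x)
    return trajectory


alpha, beta = 0.1, 0.9
momentum_path = sgd_momentum(1.0, alpha, beta, steps=50)
averaging_path = primal_averaging(1.0, eta=alpha / (1.0 - beta), c=1.0 - beta, steps=50)

# The two iterate sequences should agree up to floating-point rounding.
max_gap = max(abs(u - v) for u, v in zip(momentum_path, averaging_path))
print(max_gap)
```

In the averaging form, x is a running average of the plain gradient-descent sequence z, which is what makes the momentum method amenable to a Lyapunov-style analysis.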

翻訳日:2022-10-12 07:43:33 公開日:2021-06-01

# 構造予測のための埋め込みの自動連結

Automated Concatenation of Embeddings for Structured Prediction ( http://arxiv.org/abs/2010.05006v4 )

ライセンス: Link先を確認

Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

(参考訳) 事前制約付き文脈埋め込みは、構造化予測タスクのための強力な単語表現である。最近の研究により、異なる種類の埋め込みを結合することでより良い単語表現が得られることがわかった。しかし、最善の連結表現を形成する組込みの選択は、通常、タスクや候補組込みのコレクションによって異なり、組込み型がますます増えているため、より難しい問題となっている。本稿では,ニューラルネットワーク探索の最近の進歩に触発された定式化に基づいて,構造化予測タスクに対する埋め込みのより良い結合を見つけるプロセスを自動化するための,埋め込みの自動結合(ACE)を提案する。具体的には、タスクを考慮した個別の埋め込み型の有効性に関する現在の信念に基づいて埋め込みの結合を交互にサンプリングし、報酬に基づいてその信念を更新する。強化学習の戦略に従い、コントローラのパラメータを最適化し、入力としてサンプルされた連結で供給され、タスクデータセットでトレーニングされたタスクモデルの精度に基づいて報酬を算出する。 6つのタスクと21のデータセットに対する実証的な結果から、我々のアプローチは強いベースラインを上回り、すべての評価に微調整された埋め込みによる最先端のパフォーマンスを実現する。

Pretrained contextualized embeddings are powerful word representations for structured prediction tasks. Recent work found that better word representations can be obtained by concatenating different types of embeddings. However, the selection of embeddings to form the best concatenated representation usually varies depending on the task and the collection of candidate embeddings, and the ever-increasing number of embedding types makes it a more difficult problem. In this paper, we propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks, based on a formulation inspired by recent progress on neural architecture search. Specifically, a controller alternately samples a concatenation of embeddings, according to its current belief of the effectiveness of individual embedding types in consideration for a task, and updates the belief based on a reward. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model, which is fed with the sampled concatenation as input and trained on a task dataset. Empirical results on 6 tasks and 21 datasets show that our approach outperforms strong baselines and achieves state-of-the-art performance with fine-tuned embeddings in all the evaluations.

翻訳日:2022-10-08 22:19:27 公開日:2021-06-01

# 構成的一般化と自然言語の変動:意味的パーシングアプローチは両立できるか?

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? ( http://arxiv.org/abs/2010.12725v2 )

ライセンス: Link先を確認

Peter Shaw, Ming-Wei Chang, Panupong Pasupat, Kristina Toutanova

(参考訳) sequence-to-sequenceモデルは自然言語の変化を扱うのに優れているが、分散的構成の一般化に苦しむことが示されている。これは、構成バイアスが強い新しい特殊アーキテクチャを動機付けるが、これらのアプローチのほとんどは、自然言語の変化を代表しない合成生成データセットでのみ評価されている。私たちは、自然言語の変化と合成の一般化の両方を扱うセマンティック解析のアプローチを開発できますか? この機能をよりよく評価するために、非合成データセットの新しいトレインとテスト分割を提案する。我々は、強力な既存のアプローチが幅広い評価でうまく機能しないことを実証する。また,高精度文法ベースアプローチと事前学習されたシーケンス・ツー・シーケンスモデルを組み合わせたハイブリッドモデルであるnqg-t5を提案する。これは、非合成データに対するいくつかの構成的一般化課題にまたがる既存のアプローチよりも優れており、標準評価に関する最先端技術と競合している。この問題はまだ解決には程遠いが,本研究は多彩な評価の重要性と,構文解析における合成汎化と自然言語変化の両方を扱うオープンチャレンジを浮き彫りにしている。

Sequence-to-sequence models excel at handling natural language variation, but have been shown to struggle with out-of-distribution compositional generalization. This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation. In this work we ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization? To better assess this capability, we propose new train and test splits of non-synthetic datasets. We demonstrate that strong existing approaches do not perform well across a broad set of evaluations. We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model. It outperforms existing approaches across several compositional generalization challenges on non-synthetic data, while also being competitive with the state-of-the-art on standard evaluations. While still far from solving this problem, our study highlights the importance of diverse evaluations and the open challenge of handling both compositional generalization and natural language variation in semantic parsing.

翻訳日:2022-10-03 12:45:18 公開日:2021-06-01

# deep21:21cm前景除去のための深層学習法

deep21: a Deep Learning Method for 21cm Foreground Removal ( http://arxiv.org/abs/2010.15843v2 )

ライセンス: Link先を確認

T. Lucas Makinen, Lachlan Lancaster, Francisco Villaescusa-Navarro, Peter Melchior, Shirley Ho, Laurence Perreault-Levasseur, and David N. Spergel

(参考訳) 21cmの強度マッピング観測から前景汚染物質を除去する。 UNetアーキテクチャと3次元畳み込みを持つ深層畳み込みニューラルネットワーク(CNN)は、シミュレーション観測に基づいて訓練され、ノイズ発生時に前景から宇宙中性水素(HI)信号の周波数と空間パターンを効果的に分離できることを実証する。クリーニングマップは、すべての関連する角スケールと周波数で10%以内の宇宙的クラスタリング統計を回復する。これは、小さな角スケールでの桁違いの予測ばらつきを減少させる($\ell > 300$)ことと、標準主成分分析(PCA)法と比較して小さな半径スケールでの精度の改善($k_{\parallel} > 0.17\ \rm h\ Mpc^{-1}$)である。ネットワークの予測に対する後続信頼区間をUNetのアンサンブルを訓練することにより推定する。提案手法は,シミュレーション前景モデルが十分現実的である限り,今後の無線実験のために導出された要約統計とは対照的に,21cmの強度マップを解析できることを実証する。我々は、Github https://github.com/tlmakinen/deep21でこの分析に使用されるコードと、 http://bit.ly/deep21-colab Colabノートブックを通じて、実験とUNetモデルのブラウザベースのチュートリアルを提供する。

We seek to remove foreground contaminants from 21cm intensity mapping observations. We demonstrate that a deep convolutional neural network (CNN) with a UNet architecture and three-dimensional convolutions, trained on simulated observations, can effectively separate frequency and spatial patterns of the cosmic neutral hydrogen (HI) signal from foregrounds in the presence of noise. Cleaned maps recover cosmological clustering statistics within 10% at all relevant angular scales and frequencies. This amounts to a reduction in prediction variance of over an order of magnitude on small angular scales ($\ell > 300$), and improved accuracy for small radial scales ($k_{\parallel} > 0.17\ \rm h\ Mpc^{-1}$) compared to standard Principal Component Analysis (PCA) methods. We estimate posterior confidence intervals for the network's prediction by training an ensemble of UNets. Our approach demonstrates the feasibility of analyzing 21cm intensity maps, as opposed to derived summary statistics, for upcoming radio experiments, as long as the simulated foreground model is sufficiently realistic. We provide the code used for this analysis on GitHub https://github.com/tlmakinen/deep21 as well as a browser-based tutorial for the experiment and UNet model via the accompanying http://bit.ly/deep21-colab Colab notebook.

翻訳日:2022-10-01 23:46:45 公開日:2021-06-01

# データ拡張を用いた低リソース表現型音声合成

Low-resource expressive text-to-speech using data augmentation ( http://arxiv.org/abs/2011.05707v2 )

ライセンス: Link先を確認

Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba

(参考訳) 最近のneural text-to-speech (TTS)システムは、非常によく機能するが、通常、目的とする話者からの所望の発話スタイルでのかなりの録音を必要とする。本研究では,このような録音を15分以内で表現型音声を構築するために,大量のターゲットデータを記録するコストのかかる作業を回避するために,新しい3段階の手法を提案する。まず、他の話者から希望する発話スタイルでの録音を利用して、音声変換によるデータ拡張を行う。次に、利用可能な録音の上に合成データを使って、TTSモデルをトレーニングします。最後に、このモデルを微調整して、さらに品質を高めます。評価の結果,提案した変化は,合成音声の多くの側面において,非拡張モデルに対して大きな改善をもたらすことが示された。提案手法は2つのスタイル(ニュースキャスターと会話型)、様々な話者、および単一話者モデルとマルチ話者モデルにおいて、我々のアプローチの堅牢性を示す。

While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style. In this work, we present a novel 3-step methodology to circumvent the costly operation of recording large amounts of target data in order to build expressive style voices with as little as 15 minutes of such recordings. First, we augment data via voice conversion by leveraging recordings in the desired speaking style from other speakers. Next, we use that synthetic data on top of the available recordings to train a TTS model. Finally, we fine-tune that model to further increase quality. Our evaluations show that the proposed changes bring significant improvements over non-augmented models across many perceived aspects of synthesised speech. We demonstrate the proposed approach on 2 styles (newscaster and conversational), on various speakers, and on both single and multi-speaker models, illustrating the robustness of our approach.

翻訳日:2022-09-27 00:42:40 公開日:2021-06-01

# 再構成可能なインテリジェントサーフェスによるフェデレーション学習の実現 - 統一的なコミュニケーション・ラーニング設計アプローチ

Reconfigurable Intelligent Surface Enabled Federated Learning: A Unified Communication-Learning Design Approach ( http://arxiv.org/abs/2011.10282v4 )

ライセンス: Link先を確認

Hang Liu, Xiaojun Yuan, Ying-Jun Angela Zhang

(参考訳) モバイルエッジネットワークで生成された大量のデータを活用するために、集中型機械学習(ML)の魅力的な代替手段として、フェデレートラーニング(FL)が提案されている。エッジデバイスで共有学習モデルを協調的にトレーニングすることにより、FLは直接データ伝送を避け、集中型MLと比較して通信遅延とプライバシーの問題を克服する。 flモデルアグリゲーションにおける通信効率を向上させるため、無線チャネル固有の重ね合わせ特性を利用して、多数の同時ローカルモデルアップロードをサポートするover-the-air計算が導入された。しかし、エッジデバイス間の通信容量の不均一性により、最弱チャネルのデバイスがモデル集約性能のボトルネックとなるストラグラー問題に悩まされる。この問題はデバイスの選択によってある程度緩和できるが、後者は依然としてデータ搾取とモデル通信のトレードオフに苦しんでいる。本稿では、再構成可能なインテリジェントサーフェス(RIS)技術を活用し、オーバーザエアFLにおけるストラグラー問題を解消する。具体的には,デバイス選択とモデル集約誤差が空中flの収束に与える影響を定量的に特徴付ける学習分析フレームワークを開発した。そして,統合通信学習最適化問題を定式化し,デバイス選択,無線トランスシーバ設計,RIS構成を共同で最適化する。数値実験により、特にエッジデバイス間でチャネル条件が劇的に変化する場合において、提案手法は最先端手法に比べて学習精度が大幅に向上することが示された。

To exploit massive amounts of data generated at mobile edge networks, federated learning (FL) has been proposed as an attractive substitute for centralized machine learning (ML). By collaboratively training a shared learning model at edge devices, FL avoids direct data transmission and thus overcomes high communication latency and privacy issues as compared to centralized ML. To improve the communication efficiency in FL model aggregation, over-the-air computation has been introduced to support a large number of simultaneous local model uploading by exploiting the inherent superposition property of wireless channels. However, due to the heterogeneity of communication capacities among edge devices, over-the-air FL suffers from the straggler issue in which the device with the weakest channel acts as a bottleneck of the model aggregation performance. This issue can be alleviated by device selection to some extent, but the latter still suffers from a tradeoff between data exploitation and model communication. In this paper, we leverage the reconfigurable intelligent surface (RIS) technology to relieve the straggler issue in over-the-air FL. Specifically, we develop a learning analysis framework to quantitatively characterize the impact of device selection and model aggregation error on the convergence of over-the-air FL. Then, we formulate a unified communication-learning optimization problem to jointly optimize device selection, over-the-air transceiver design, and RIS configuration. Numerical experiments show that the proposed design achieves substantial learning accuracy improvement compared with the state-of-the-art approaches, especially when channel conditions vary dramatically across edge devices.

翻訳日:2022-09-23 06:54:47 公開日:2021-06-01

# ConvTransformer: ビデオフレーム合成のための畳み込みトランスフォーマネットワーク

ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis ( http://arxiv.org/abs/2011.10185v2 )

ライセンス: Link先を確認

Zhouyong Liu, Shun Luo, Wubin Li, Jingben Lu, Yufan Wu, Shilei Sun, Chunguo Li, Luxi Yang

(参考訳) 深層畳み込みニューラルネットワーク(Deep Convolutional Neural Networks, CNN)は、難しいコンピュータビジョンタスクにおいて優れたパフォーマンスを達成する強力なモデルである。 CNNは、大きなラベル付きトレーニングサンプルが利用可能であればいつでもうまく機能するが、オブジェクトの変形や移動、シーンの照明変更、ビデオシーケンスで動くカメラなどにより、ビデオフレームの合成に悪影響を及ぼす。本稿では、ビデオフレームシーケンス学習とビデオフレーム合成のための、畳み込み変換器(Conv Transformer)と呼ばれる、新規で汎用的なエンドツーエンドアーキテクチャを提案する。 convtransformerの中核となる要素は、ビデオシーケンスの逐次依存性を学習するマルチヘッド畳み込み層(multi-head convolutional self-attention layer)である。 ConvTransformerは、マルチヘッドの畳み込み自己保持層上に構築されたエンコーダを使用して、入力フレーム間のシーケンシャルな依存を符号化し、デコーダはターゲットの合成フレームと入力フレーム間の長期的依存を復号する。ビデオフレーム外挿タスクの実験では、ConvTransformerは高品質でありながら、畳み込みLSTM(ConvLSTM)上に構築された最近のアプローチよりも並列化可能である。我々の知る限りでは、ConvTransformerアーキテクチャが提案され、ビデオフレーム合成に適用されたのはこれが初めてである。

Deep Convolutional Neural Networks (CNNs) are powerful models that have achieved excellent performance on difficult computer vision tasks. Although CNNs perform well whenever large labeled training samples are available, they perform poorly on video frame synthesis due to object deformation and motion, scene lighting changes, and camera motion in video sequences. In this paper, we present a novel and general end-to-end architecture, called convolutional Transformer or ConvTransformer, for video frame sequence learning and video frame synthesis. The core ingredient of ConvTransformer is the proposed attention layer, i.e., the multi-head convolutional self-attention layer, which learns the sequential dependence of a video sequence. ConvTransformer uses an encoder, built upon the multi-head convolutional self-attention layer, to encode the sequential dependence between the input frames, and then a decoder decodes the long-term dependence between the target synthesized frames and the input frames. Experiments on the video future frame extrapolation task show ConvTransformer to be superior in quality while being more parallelizable than recent approaches built upon convolutional LSTM (ConvLSTM). To the best of our knowledge, this is the first time that the ConvTransformer architecture is proposed and applied to video frame synthesis.

翻訳日:2022-09-23 05:58:58 公開日:2021-06-01

# GLGE: 新しい汎用言語生成評価ベンチマーク

GLGE: A New General Language Generation Evaluation Benchmark ( http://arxiv.org/abs/2011.11928v3 )

ライセンス: Link先を確認

Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan

(参考訳) GLUEやSuperGLUEのようなマルチタスクベンチマークは、自然言語処理(NLP)における事前学習と転送学習の大きな進歩を導いている。これらのベンチマークは主に自然言語生成(NLG)モデルを考慮せずに、さまざまな自然言語理解(NLU)タスクに焦点を当てている。本稿では,8つの言語生成タスクにわたるNLGモデルの一般化能力を評価するための,新しいマルチタスクベンチマークであるジェネラル言語生成評価(GLGE)を提案する。各タスクに対して,タスク難易度(GLGE-Easy, GLGE-Medium, GLGE-Hard)の3つのサブタスクを引き続き設計する。これにより、モデルパフォーマンスを包括的に比較する24のサブタスクが導入される。 NLGモデルの事前トレーニングと転送学習の研究を促進するため、GLGEを公開し、MASS、BART、ProphetNetなどの強力なベースラインを持つリーダボードを構築する(ソースコードとデータセットは https://github.com/microsoft/glge で公開されている)。

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering the Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we continue to design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard). This introduces 24 subtasks to comprehensively compare model performance. To encourage research on pretraining and transfer learning on NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet (The source code and dataset are publicly available at https://github.com/microsoft/glge).

翻訳日:2022-09-21 13:00:55 公開日:2021-06-01

# (参考訳) 統合グラディエントを用いたBERTが学習した言語学的受容性

Using Integrated Gradients to explain Linguistic Acceptability learnt by BERT ( http://arxiv.org/abs/2106.07349v1 )

ライセンス: CC BY 4.0

Anmol Nayak, Hari Prasad Timmapathini

(参考訳) BERTは、そのアーキテクチャにおけるマルチヘッド自己認識メカニズムを活用することで、言語理解のブレークスルーとなっている。我々の知る限りでは、この研究は初めてLayer Integrated Gradients Attribution Scores (LIGAS)を活用して、BERTがCorpus of Linguistic Acceptability (CoLA)ベンチマークデータセットで学んだ言語受容性基準を説明する。

BERT has been a breakthrough in language understanding by leveraging the multi-head self-attention mechanism in its architecture. To the best of our knowledge this work is the first to leverage Layer Integrated Gradients Attribution Scores (LIGAS) to explain the Linguistic Acceptability criteria that are learnt by BERT on the Corpus of Linguistic Acceptability (CoLA) benchmark dataset. Our experiments on 5 different categories of sentences lead to the following interesting findings: 1) LIGAS for Linguistically Acceptable (LA) sentences are significantly smaller in comparison to Linguistically Unacceptable (LUA) sentences, 2) There are specific subtrees of the Constituency Parse Tree (CPT) for LA and LUA sentences which contribute larger LIGAS, 3) Across the different categories of sentences we observed around 88% to 100% of the Correctly classified sentences had positive LIGAS, indicating a strong positive relationship to the prediction confidence of the model, and 4) Around 57% of the Misclassified sentences had positive LIGAS, which we believe can become correctly classified sentences if the LIGAS are parameterized in the loss function of the model.
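
For readers unfamiliar with the attribution technique named here, the following sketch shows plain Integrated Gradients on a toy differentiable function. It is not the layer-wise variant applied to BERT in the paper; the function `integrated_gradients`, the toy model `f`, and the all-zero baseline are assumptions made for illustration. The attribution for input i is (x_i - baseline_i) times the average of the partial derivative along the straight path from the baseline to x, approximated by a midpoint Riemann sum.

```python
def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    totals = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps  # interpolation coefficient in (0, 1)
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(len(x)):
            totals[i] += g[i]
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def f(v):
    """Toy model (hypothetical): f(v) = v0 * v1 + v2**2."""
    return v[0] * v[1] + v[2] ** 2


def grad_f(v):
    """Analytic gradient of the toy model."""
    return [v[1], v[0], 2.0 * v[2]]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline).
print(attributions, sum(attributions), f(x) - f(baseline))
```

The completeness property checked in the last line is what makes the sign and size of the summed scores interpretable relative to the model's output.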

翻訳日:2021-06-27 12:55:21 公開日:2021-06-01

# (参考訳) 視線追跡タスクのためのデータセット

Dataset for eye-tracking tasks ( http://arxiv.org/abs/2106.07554v1 )

ライセンス: CC0 1.0

R. Ildar

(参考訳) 近年、さまざまなディープニューラルネットワークが開発されているが、ディープネットワークの層が多すぎるため、トレーニングには長い時間と大量のデータセットが必要になる。今日では、このようなディープネットワークを必要としない単純なタスクでも、さまざまなタスクにトレーニングされたディープニューラルネットワークを使用することが一般的です。 YoloV3やSSDなど、よく知られたディープネットワーク。様々な物体の追跡と監視を目的としているため、重量は重く、特定のタスクの全体的な精度は低い。視線追跡タスクは、特定の領域の虹彩の1つの物体のみを検出する必要がある。したがって、このタスクにニューラルネットワークを使用するのは論理的である。しかし問題は、モデルをトレーニングするのに適切なデータセットがないことだ。本稿では,視覚追跡タスクのための畳み込みニューラルネットワークのカスタムモデルのトレーニングに適したデータセットを提案する。データセットデータを使用することで、各ユーザは独立して、アイトラッキングタスクのための畳み込みニューラルネットワークモデルを事前トレーニングすることができる。このデータセットには416×416ピクセルの拡張に注釈付き1万枚のアイイメージが含まれている。注記情報付きテーブルは、各画像の視線の座標と半径を示している。この写本は、視線追跡装置のためのデータセット作成のためのガイドとみなすことができる

In recent years many different deep neural networks have been developed, but due to the large number of layers in deep networks, their training requires a long time and large datasets. Today it is popular to use trained deep neural networks for various tasks, even for simple ones in which such deep networks are not required. Well-known deep networks such as YoloV3, SSD, etc. are intended for tracking and monitoring various objects; therefore their weights are heavy and the overall accuracy for a specific task is low. Eye-tracking tasks need to detect only one object, an iris in a given area. Therefore, it is logical to use a neural network dedicated to this task. But the problem is the lack of suitable datasets for training the model. In this manuscript, we present a dataset that is suitable for training custom convolutional neural network models for eye-tracking tasks. Using this dataset, each user can independently pre-train convolutional neural network models for eye-tracking tasks. The dataset contains 10,000 annotated eye images at a resolution of 416 by 416 pixels. The table with annotation information gives the coordinates and radius of the eye for each image. This manuscript can be considered a guide for the preparation of datasets for eye-tracking devices.

翻訳日:2021-06-27 12:51:04 公開日:2021-06-01

# クリックベイトですか? 機械学習を使って予測しよう

Is it a click bait? Let's predict using Machine Learning ( http://arxiv.org/abs/2106.07348v1 )

ライセンス: Link先を確認

Sohom Ghosh

(参考訳) このデジタル化の時代、ニュース読者はオンラインでニュースを読む傾向にある。これは、オンラインメディアがすぐに幅広いコンテンツにアクセスできるからである。ですから、今日の状況を知るために明日の新聞を待たなくてもよいのです。これらの美徳に加えて、オンラインニュースにはいくつかの逆もある。ニュース記事に関するソーシャルメディア投稿(つぶやき)の存在は、実際のコンテンツを読むように指示するのではなく、ユーザーの注意を引くことだけを目的としている。このようなポストをクリックベイトと呼ぶ。このプロジェクトの目的は、新しい記事に関連するソーシャルメディア投稿(つぶやき)がクリックベイトである確率を予測できるシステムを開発することである。

In this era of digitisation, news readers tend to read news online. This is because online media instantly provides access to a wide variety of content. Thus, people don't have to wait for tomorrow's newspaper to know what's happening today. Along with these virtues, online news has some vices as well. One such vice is the presence of social media posts (tweets) relating to news articles whose sole purpose is to draw the attention of users rather than directing them to read the actual content. Such posts are referred to as clickbaits. The objective of this project is to develop a system capable of predicting how likely social media posts (tweets) relating to news articles are to be clickbait.

翻訳日:2021-06-27 09:02:08 公開日:2021-06-01

# ネステロフ型加速法の高速シンプレクティック積分器

Fast symplectic integrator for Nesterov-type acceleration method ( http://arxiv.org/abs/2106.07620v1 )

ライセンス: Link先を確認

Shin-itiro Goto and Hideitsu Hino

(参考訳) 本論文では,Nesterovの加速勾配法の収束率の向上に寄与する非自明な常微分方程式(ODE)に対して,シンプレクティックおよび接触幾何学に基づく明示的な安定積分器を提案する。シンプレクティック幾何学はハミルトン力学の記述に適していることが知られており、接触幾何学はシンプレクティック幾何学の奇数次元対応として知られている。さらに、シンプレクタゼーションと呼ばれる手続きは接触多様体からシンプレクティック多様体を構築する既知の方法であり、接触多様体からハミルトン系を生成する。この論文では、以前に研究された非自明なODEは、接触ハミルトン系として記述できる。そして、非自明なODEを表す非自明な接触ハミルトンベクトル場のシンプレクティック化により、新しいシンプレクティック積分器が導出される。提案したシンプレクティック積分器はODE内に隠されたシンプレクティック構造と接触構造を保持するため、ランゲ・クッタ法よりも安定である。数値実験により, 2階シンプレクティック積分器は安定であり, 高収束率が得られることが示された。

In this paper, explicit stable integrators based on symplectic and contact geometries are proposed for a non-autonomous ordinary differential equation (ODE) found in improving the convergence rate of Nesterov's accelerated gradient method. Symplectic geometry is known to be suitable for describing Hamiltonian mechanics, and contact geometry is known as an odd-dimensional counterpart of symplectic geometry. Moreover, a procedure called symplectization is a known way to construct a symplectic manifold from a contact manifold, yielding Hamiltonian systems from contact ones. It is found in this paper that a previously investigated non-autonomous ODE can be written as a contact Hamiltonian system. Then, by symplectization of a non-autonomous contact Hamiltonian vector field expressing the non-autonomous ODE, novel symplectic integrators are derived. Because the proposed symplectic integrators preserve hidden symplectic and contact structures in the ODE, they should be more stable than the Runge-Kutta method. Numerical experiments demonstrate that, as expected, the second-order symplectic integrator is stable and high convergence rates are achieved.
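
Why structure preservation tends to give stability can be seen in a small sketch. It uses the standard second-order leapfrog (Störmer-Verlet) scheme on an autonomous harmonic oscillator with H(q, p) = (q^2 + p^2) / 2, compared against explicit Euler. This is a textbook illustration only and not the integrator derived in the paper, which targets a non-autonomous contact Hamiltonian system; the step size and step count are arbitrary choices.

```python
def energy(q, p):
    """Hamiltonian of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def leapfrog(q, p, h, steps):
    """Second-order symplectic scheme: half kick, full drift, half kick."""
    for _ in range(steps):
        p -= 0.5 * h * q   # dp/dt = -dH/dq = -q
        q += h * p         # dq/dt =  dH/dp =  p
        p -= 0.5 * h * q
    return q, p


def explicit_euler(q, p, h, steps):
    """Non-symplectic baseline: both updates use the old state."""
    for _ in range(steps):
        q, p = q + h * p, p - h * q
    return q, p


h, steps = 0.1, 1000
leapfrog_energy = energy(*leapfrog(1.0, 0.0, h, steps))
euler_energy = energy(*explicit_euler(1.0, 0.0, h, steps))

# Leapfrog keeps the energy near its initial value of 0.5; Euler's energy grows.
print(leapfrog_energy, euler_energy)
```

For explicit Euler the energy is multiplied by (1 + h^2) at every step, so it grows without bound, while the leapfrog energy only oscillates within a band of width O(h^2).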

翻訳日:2021-06-27 09:01:59 公開日:2021-06-01

# (参考訳) FiSH: 空間ホットスポット

FiSH: Fair Spatial Hotspots ( http://arxiv.org/abs/2106.06049v1 )

ライセンス: CC BY 4.0

Deepak P, Sowmya S Sundaram

Pervasiveness of tracking devices and enhanced availability of spatially located data have deepened interest in using them for various policy interventions, through computational data analysis tasks such as spatial hot spot detection. In this paper, we consider, for the first time to our best knowledge, fairness in detecting spatial hot spots. We motivate the need for ensuring fairness through statistical parity over the collective population covered across chosen hot spots. We then characterize the task of identifying a diverse set of solutions in the noteworthiness-fairness trade-off spectrum, to empower the user to choose a trade-off justified by the policy domain. Being a novel task formulation, we also develop a suite of evaluation metrics for fair hot spots, motivated by the need to evaluate pertinent aspects of the task. We illustrate the computational infeasibility of identifying fair hot spots using naive and/or direct approaches and devise a method, codenamed FiSH, for efficiently identifying high-quality, fair and diverse sets of spatial hot spots. FiSH traverses the tree-structured search space using heuristics that guide it towards identifying effective and fair sets of spatial hot spots. Through an extensive empirical analysis over a real-world dataset from the domain of human development, we illustrate that FiSH generates high-quality solutions at fast response times.
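The notion of statistical parity over the population covered by the chosen hot spots can be made concrete with a small sketch. The measure below (the largest difference between a group's share inside the hot spots and its share in the whole population) is one plausible reading, not the metric defined in the paper; circular hot spots, the point format, and all function names are assumptions.

```python
from collections import Counter


def covered(point, hotspots):
    # A point is covered if it lies inside at least one circular hot spot.
    x, y, _ = point
    return any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in hotspots)


def group_shares(points):
    # Fraction of the given points that belongs to each group.
    counts = Counter(group for _, _, group in points)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


def parity_gap(points, hotspots):
    # Largest absolute difference between a group's share among covered
    # points and its share in the overall population (0.0 means parity).
    overall = group_shares(points)
    inside = [p for p in points if covered(p, hotspots)]
    if not inside:
        return 0.0
    cov = group_shares(inside)
    return max(abs(cov.get(g, 0.0) - share) for g, share in overall.items())


# Hypothetical data: (x, y, group) triples and (cx, cy, radius) hot spots.
points = [(0, 0, "A"), (0, 1, "A"), (1, 0, "B"),
          (5, 5, "B"), (5, 6, "A"), (6, 5, "B")]
one_spot = [(0, 0, 1.5)]
two_spots = [(0, 0, 1.5), (5, 5, 1.5)]
print(parity_gap(points, one_spot), parity_gap(points, two_spots))
```

A set of hot spots with a smaller gap covers a population whose group composition is closer to that of the whole dataset.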

Translated: 2021-06-20 21:42:52 Published: 2021-06-01

# Neural Network Structure Design based on N-Gauss Activation Function ( http://arxiv.org/abs/2106.07562v1 )

License: see the linked page

Xiangri Lu, Hongbin Ma, Jingcheng Zhang

Recent work has shown that when the activation function of a convolutional neural network meets the Lipschitz condition, the corresponding network structure can be constructed according to the scale of the data set, and the data set can be trained more deeply, more accurately and more effectively. In this article, we have accepted the experimental results and introduced the core block N-Gauss, N-Gauss, and Swish (Conv1, Conv2, FC1) neural network structure design to train MNIST, CIFAR10, and CIFAR100 respectively. Experiments show that N-Gauss gives full play to the main role of nonlinear modeling of activation functions, so that deep convolutional neural networks have hierarchical nonlinear mapping learning capabilities. At the same time, the training ability of N-Gauss on simple one-dimensional channel small data sets is equivalent to the performance of ReLU and Swish.

Translated: 2021-06-20 16:05:00 Published: 2021-06-01

# THG: Transformer with Hyperbolic Geometry ( http://arxiv.org/abs/2106.07350v1 )

License: see the linked page

Zhe Liu and Yibin Xu

Transformer model architectures have become an indispensable staple in deep learning lately for their effectiveness across a range of tasks. Recently, a surge of "X-former" models have been proposed which improve upon the original Transformer architecture. However, most of these variants make changes only around the quadratic time and memory complexity of self-attention, i.e. the dot product between the query and the key. What's more, they are calculated solely in Euclidean space. In this work, we propose a novel Transformer with Hyperbolic Geometry (THG) model, which takes advantage of both Euclidean space and hyperbolic space. THG makes improvements in the linear transformations of self-attention, which are applied on the input sequence to get the query and the key, with the proposed hyperbolic linear. Extensive experiments on sequence labeling, machine reading comprehension and classification tasks demonstrate the effectiveness and generalizability of our model. They also demonstrate that THG could alleviate overfitting.

Translated: 2021-06-20 16:04:44 Published: 2021-06-01

# Network and Physical Layer Attacks and countermeasures to AI-Enabled 6G O-RAN ( http://arxiv.org/abs/2106.02494v1 )

License: CC BY-SA 4.0

Talha F. Rahman, Aly S. Abdalla, Keith Powell, Walaa AlQwider, and Vuk Marojevic

Artificial intelligence (AI) will play an increasing role in cellular network deployment, configuration and management. This paper examines the security implications of AI-driven 6G radio access networks (RANs). While the expected timeline for 6G standardization is still several years out, pre-standardization efforts related to 6G security are already ongoing and will benefit from fundamental and experimental research. Open RAN (O-RAN) describes an industry-driven open architecture and interfaces for building next-generation RANs with AI control. Considering this architecture, we identify the critical threats to data-driven network and physical layer elements, the corresponding countermeasures, and the research directions.

Translated: 2021-06-15 13:46:37 Published: 2021-06-01

# AMV : Algorithm Metadata Vocabulary ( http://arxiv.org/abs/2106.03567v1 )

License: CC BY 4.0

Biswanath Dutta and Jyotima Patel

Metadata vocabularies are used in various domains of study. They provide an in-depth description of resources. In this work, we develop the Algorithm Metadata Vocabulary (AMV), a vocabulary for capturing and storing metadata about algorithms (a procedure or a set of rules that is followed step-by-step to solve a problem, especially by a computer). A problem researchers currently face is the failure to get relevant results when searching for algorithms in any search engine. AMV is represented as a semantic model, and an OWL file is produced, which can be directly used by anyone interested in creating and publishing algorithm metadata as a knowledge graph, or in providing a metadata service through a SPARQL endpoint. To design the vocabulary, we propose a well-defined methodology, which considers real issues faced by algorithm users and practitioners. The evaluation shows a promising result.

Translated: 2021-06-15 13:34:38 Published: 2021-06-01

# Raman spectral analysis of mixtures with one-dimensional convolutional neural network ( http://arxiv.org/abs/2106.05316v1 )

License: see the linked page

M. Hamed Mozaffari and Li-Lin Tay

Recently, the combination of robust one-dimensional convolutional neural networks (1-D CNNs) and Raman spectroscopy has shown great promise in rapid identification of unknown substances with good accuracy. Using this technique, researchers can recognize a pure compound and distinguish it from unknown substances in a mixture. The novelty of this approach is that the trained neural network operates automatically without any pre- or post-processing of data. Some studies have attempted to extend this technique to the classification of pure compounds in an unknown mixture. However, the application of 1-D CNNs has typically been restricted to binary classifications of pure compounds. Here we highlight a new approach to spectral recognition and quantification of chemical components in a multicomponent mixture. Two 1-D CNN models, RaMixNet I and II, have been developed for this purpose. The former is for rapid classification of components in a mixture while the latter is for quantitative determination of those constituents. In the proposed method, there is no limit to the number of compounds in a mixture. A data augmentation method is also introduced by adding random baselines to the Raman spectra. The experimental results revealed that the classification accuracy of RaMixNet I and II is 100% for analysis of unknown test mixtures; at the same time, the RaMixNet II model may achieve a regression accuracy of 88% for the quantification of each component.
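The augmentation step, adding a random baseline to a Raman spectrum, can be sketched as follows. The abstract does not say how baselines are generated, so the low-order polynomial with non-negative random coefficients used here is an assumption, as are the function name and the parameter defaults.

```python
import random


def add_random_baseline(spectrum, rng, max_coeff=0.2, degree=2):
    # Return a copy of `spectrum` with a smooth random baseline added.
    # The baseline is a polynomial in the normalized channel index x in [0, 1]
    # with coefficients drawn uniformly from [0, max_coeff] (assumed form).
    n = len(spectrum)
    coeffs = [rng.uniform(0.0, max_coeff) for _ in range(degree + 1)]
    augmented = []
    for i, intensity in enumerate(spectrum):
        x = i / (n - 1) if n > 1 else 0.0
        baseline = sum(c * x ** k for k, c in enumerate(coeffs))
        augmented.append(intensity + baseline)
    return augmented


# Hypothetical five-channel spectrum with two peaks.
spectrum = [0.0, 1.0, 0.0, 0.5, 0.0]
rng = random.Random(0)  # seeded so the augmentation is reproducible
augmented = add_random_baseline(spectrum, rng)
print(augmented)
```

Calling the function several times with the same generator yields differently shifted copies of one measured spectrum, which is the usual purpose of this kind of augmentation.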

Translated: 2021-06-13 14:02:43 Published: 2021-06-01

# Asymmetrical Bi-RNN for pedestrian trajectory encoding ( http://arxiv.org/abs/2106.04419v1 )

License: see the linked page

Raphaël Rozenberg, Joseph Gesnouin and Fabien Moutarde

Pedestrian motion behavior involves a combination of individual goals and social interactions with other agents. In this article, we present a non-symmetrical bidirectional recurrent neural network architecture called U-RNN as a sequence encoder and evaluate its suitability as a replacement for LSTMs in various forecasting models. Experimental results on the Trajnet++ benchmark show that the U-LSTM variant can yield better results on every available metric (ADE, FDE, collision rate) than common LSTM sequence encoders for a variety of approaches and interaction modules. Our implementation of the asymmetrical Bi-RNNs for the Trajnet++ benchmark is available at: github.com/JosephGesnouin/Asymmetrical-Bi-RNNs-to-encode-pedestrian-trajectories
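Two of the metrics named above have simple standard definitions: the average displacement error (ADE) is the mean Euclidean distance between predicted and true positions over all predicted time steps, and the final displacement error (FDE) is that distance at the last step. The sketch below computes both for a single trajectory; the function name and input format are assumptions, and it is not taken from the Trajnet++ tooling.

```python
import math


def displacement_errors(predicted, ground_truth):
    # Return (ADE, FDE) for one trajectory given as lists of (x, y) points.
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(predicted, ground_truth)]
    return sum(dists) / len(dists), dists[-1]


# Hypothetical three-step prediction that drifts away from the true path.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
print(ade, fde)
```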

Translated: 2021-06-13 14:02:22 Published: 2021-06-01

# Investigation of Uncertainty of Deep Learning-based Object Classification on Radar Spectra ( http://arxiv.org/abs/2106.05870v1 )

License: see the linked page

Kanil Patel, William Beluch, Kilian Rambach, Adriana-Eliza Cozma, Michael Pfeiffer and Bin Yang

Deep learning (DL) has recently attracted increasing interest to improve object type classification for automotive radar. In addition to high accuracy, it is crucial for decision making in autonomous vehicles to evaluate the reliability of the predictions; however, decisions of DL networks are non-transparent. Current DL research has investigated how uncertainties of predictions can be quantified, and in this article, we evaluate the potential of these methods for safe automotive radar perception. In particular, we evaluate how uncertainty quantification can support radar perception under (1) domain shift, (2) corruptions of input signals, and (3) in the presence of unknown objects. We find that, in agreement with phenomena observed in the literature, deep radar classifiers are overly confident, even in their wrong predictions. This raises concerns about the use of the confidence values for decision making under uncertainty, as the model fails to notify when it cannot handle an unknown situation. Accurate confidence values would allow optimal integration of multiple information sources, e.g. via sensor fusion. We show that by applying state-of-the-art post-hoc uncertainty calibration, the quality of confidence measures can be significantly improved, thereby partially resolving the over-confidence problem. Our investigation shows that further research into training and calibrating DL networks is necessary and offers great potential for safe automotive object classification with radar sensors.
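The abstract does not name the post-hoc calibration method it uses. As an illustration of the general idea, the sketch below applies temperature scaling, one widely used post-hoc technique: the logits of an already trained classifier are divided by a scalar temperature before the softmax. In practice the temperature is fitted on held-out validation data; here it is fixed by hand, and the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    # Softmax over logits divided by a positive temperature.
    # temperature > 1 flattens the distribution (less confident),
    # temperature < 1 sharpens it; the predicted class is unchanged.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.5]             # hypothetical classifier output
uncalibrated = softmax(logits)
calibrated = softmax(logits, temperature=2.0)
print(max(uncalibrated), max(calibrated))
```

Because the scaling is monotone, accuracy is unaffected; only the reported confidence changes.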

Translated: 2021-06-13 14:02:07 Published: 2021-06-01

# Digital rock reconstruction with user-defined properties using conditional generative adversarial networks ( http://arxiv.org/abs/2012.07719v2 )

License: see the linked page

Qiang Zheng and Dongxiao Zhang

Uncertainty is ubiquitous with flow in subsurface rocks because of their inherent heterogeneity and lack of in-situ measurements. To complete uncertainty analysis in a multi-scale manner, it is a prerequisite to provide sufficient rock samples. Even though the advent of digital rock technology offers opportunities to reproduce rocks, it still cannot be utilized to provide massive samples due to its high cost, thus leading to the development of diversified mathematical methods. Among them, two-point statistics (TPS) and multi-point statistics (MPS) are commonly utilized, which feature incorporating low-order and high-order statistical information, respectively. Recently, generative adversarial networks (GANs) are becoming increasingly popular since they can reproduce training images with excellent visual and consequent geologic realism. However, standard GANs can only incorporate information from data, while leaving no interface for user-defined properties, and thus may limit the representativeness of reconstructed samples. In this study, we propose conditional GANs for digital rock reconstruction, aiming to reproduce samples not only similar to the real training data, but also satisfying user-specified properties. In fact, the proposed framework can realize the targets of MPS and TPS simultaneously by incorporating high-order information directly from rock images with the GAN scheme, while preserving low-order counterparts through conditioning. We conduct three reconstruction experiments, and the results demonstrate that rock type, rock porosity, and correlation length can be successfully conditioned to affect the reconstructed rock images. Furthermore, in contrast to existing GANs, the proposed conditioning enables learning of multiple rock types simultaneously, and thus implicitly saves computational cost.
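To make the low-order quantities concrete, the sketch below computes the porosity of a binary rock image and a two-point probability along one axis, that is, the fraction of pixel pairs at a given lag that both fall in pore space. The encoding (1 for pore, 0 for solid), the 2-D input, the non-periodic boundary, and the function names are assumptions for illustration and are not taken from the paper.

```python
def porosity(image):
    # Fraction of pore pixels in a binary image (1 = pore, 0 = solid).
    cells = [v for row in image for v in row]
    return sum(cells) / len(cells)


def two_point_probability(image, lag):
    # Probability that two pixels `lag` apart along a row are both pore.
    # Pairs that would cross the image edge are skipped (non-periodic).
    hits = pairs = 0
    for row in image:
        for x in range(len(row) - lag):
            pairs += 1
            hits += row[x] * row[x + lag]
    return hits / pairs if pairs else 0.0


# Hypothetical striped sample: pore and solid columns alternate.
image = [[1, 0, 1, 0],
         [1, 0, 1, 0]]
print(porosity(image), [two_point_probability(image, k) for k in range(3)])
```

At lag 0 the two-point probability equals the porosity, and the rate at which it decays with increasing lag is one way to read off a correlation length.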

Translated: 2021-06-07 09:09:07 Published: 2021-06-01

# A Survey of Knowledge Tracing ( http://arxiv.org/abs/2105.15106v2 )

License: see the linked page

Qi Liu, Shuanghong Shen, Zhenya Huang, Enhong Chen, and Yonghe Zheng

High-quality education is one of the keys to achieving a more sustainable world. The recent COVID-19 epidemic has triggered the outbreak of online education, which has enabled both students and teachers to learn and teach at home. Meanwhile, it is now possible to record and research a large amount of learning data using online learning platforms in order to offer better intelligent educational services. Knowledge Tracing (KT), which aims to monitor students' evolving knowledge state, is a fundamental and crucial task to support these intelligent services. Therefore, an increasing amount of research attention has been paid to this emerging area and considerable progress has been made. In this survey, we propose a new taxonomy of existing basic KT models from a technical perspective and provide a comprehensive overview of these models in a systematic manner. In addition, many variants of KT models have been proposed to capture a more complete learning process. We then review these variants, grouped by the three phases of the learning process they involve: before, during, and after the student's learning. Moreover, we present several typical applications of KT in different educational scenarios. Finally, we provide some potential directions for future research in this fast-growing field.

Translated: 2021-06-06 11:07:59 Published: 2021-06-01

# A Survey of Deep Reinforcement Learning Algorithms for Motion Planning and Control of Autonomous Vehicles ( http://arxiv.org/abs/2105.14218v2 )

License: CC BY 4.0

Fei Ye, Shen Zhang, Pin Wang, and Ching-Yao Chan

In this survey, we systematically summarize the current literature on studies that apply reinforcement learning (RL) to the motion planning and control of autonomous vehicles. Many existing contributions can be attributed to the pipeline approach, which consists of many hand-crafted modules, each with a functionality selected for the ease of human interpretation. However, this approach does not automatically guarantee maximal performance due to the lack of a system-level optimization. Therefore, this paper also presents a growing trend of work that falls into the end-to-end approach, which typically offers better performance and smaller system scales. However, the performance of end-to-end methods also suffers from the lack of expert data and from generalization issues. Finally, the remaining challenges of applying deep RL algorithms to autonomous driving are summarized, and future research directions are presented to tackle these challenges.

Translated: 2021-06-05 21:44:01 Published: 2021-06-01

# Korean-English Machine Translation with Multiple Tokenization Strategy ( http://arxiv.org/abs/2105.14274v2 )

License: CC BY 4.0

Dojun Park, Youngjin Jang and Harksoo Kim

This work was conducted to find out how tokenization methods affect the training results of machine translation models. In this work, alphabet tokenization, morpheme tokenization, and BPE tokenization were each applied to Korean as the source language and English as the target language, and a comparison experiment was conducted by training each of the 9 resulting models for 50,000 epochs using the Transformer neural network. As a result of measuring the BLEU scores of the experimental models, the model that applied BPE tokenization to Korean and morpheme tokenization to English recorded 35.73, showing the best performance.
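For readers unfamiliar with BPE (byte-pair encoding) tokenization, the sketch below learns merge rules in the usual way: start from characters and repeatedly merge the most frequent adjacent symbol pair. It is a minimal generic version, not the tokenizer used in the experiments; the tie-breaking rule, the toy corpus, and the function name are assumptions.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    # Learn up to `num_merges` BPE merge rules from a list of words.
    vocab = Counter(tuple(w) for w in words)  # word as symbol tuple -> count
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself (assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

On this toy corpus the pairs ("l", "o") and ("o", "w") are equally frequent, so the first merge depends on the tie-breaking rule chosen here.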

Translated: 2021-06-05 18:08:53 Published: 2021-06-01

# Greedy Bayesian Posterior Approximation with Deep Ensembles ( http://arxiv.org/abs/2105.14275v2 )

License: CC BY 4.0

Aleksei Tiulpin and Matthew B. Blaschko

Ensembles of independently trained neural networks are a state-of-the-art approach to estimate predictive uncertainty in Deep Learning, and can be interpreted as an approximation of the posterior distribution via a mixture of delta functions. The training of ensembles relies on non-convexity of the loss landscape and random initialization of their individual members, making the resulting posterior approximation uncontrolled. This paper proposes a novel and principled method to tackle this limitation, minimizing an $f$-divergence between the true posterior and a kernel density estimator in a function space. We analyze this objective from a combinatorial point of view, and show that it is submodular with respect to mixture components for any $f$. Subsequently, we consider the problem of greedy ensemble construction, and from the marginal gain of the total objective, we derive a novel diversity term for ensemble methods. The performance of our approach is demonstrated on computer vision out-of-distribution benchmarks in a range of architectures trained on multiple datasets. The source code of our method is publicly available at https://github.com/MIPT-Oulu/greedy_ensembles_training.
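The greedy construction mentioned above follows a general pattern for set functions: at each step, add the candidate with the largest marginal gain. The sketch below shows that pattern on a toy coverage objective, which is a standard example of a submodular function. It does not implement the paper's $f$-divergence objective or its diversity term; the candidates, the objective, and the function names are assumptions.

```python
def greedy_select(candidates, objective, k):
    # Pick up to k candidates, each time adding the one whose marginal
    # gain objective(chosen + [c]) - objective(chosen) is largest.
    chosen = []
    for _ in range(k):
        base = objective(chosen)
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            gain = objective(chosen + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        chosen.append(best)
    return chosen


# Toy coverage objective: number of distinct items covered by the chosen sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 2}}


def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


selected = greedy_select(list(sets), coverage, k=2)
print(selected, coverage(selected))
```

For monotone submodular objectives under a cardinality constraint, this greedy rule carries the classical $(1 - 1/e)$ approximation guarantee, which is the usual reason submodularity makes greedy construction attractive.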

Translated: 2021-06-05 17:59:57 Published: 2021-06-01

# Grammar Accuracy Evaluation (GAE): Quantifiable Intrinsic Evaluation of Machine Translation Models ( http://arxiv.org/abs/2105.14277v2 )

License: CC BY 4.0

Dojun Park, Youngjin Jang and Harksoo Kim

Intrinsic evaluation by humans of the performance of natural language generation models is conducted to overcome the fact that the quality of generated sentences cannot be fully represented by extrinsic evaluation alone. Nevertheless, existing intrinsic evaluations have a large score deviation according to the evaluator's criteria. In this paper, we propose Grammar Accuracy Evaluation (GAE), which can provide specific evaluation criteria. As a result of analyzing the quality of machine translation by BLEU and GAE, it was confirmed that the BLEU score does not represent the absolute performance of machine translation models and that GAE compensates for the shortcomings of BLEU with a flexible evaluation of alternative synonyms and changes in sentence structure.

Translated: 2021-06-05 17:13:53 Published: 2021-06-01

# Automatic CT Segmentation from Bounding Box Annotations using Convolutional Neural Networks ( http://arxiv.org/abs/2105.14314v2 )

License: CC BY 4.0

Yuanpeng Liu, Qinglei Hui, Zhiyi Peng, Shaolin Gong and Dexing Kong

Accurate segmentation for medical images is important for clinical diagnosis. Existing automatic segmentation methods are mainly based on fully supervised learning and have an extremely high demand for precise annotations, which are very costly and time-consuming to obtain. To address this problem, we proposed an automatic CT segmentation method based on weakly supervised learning, by which one could train an accurate segmentation model only with weak annotations in the form of bounding boxes. The proposed method is composed of two steps: 1) generating pseudo masks with bounding box annotations by k-means clustering, and 2) iteratively training a 3D U-Net convolutional neural network as a segmentation model. Some data pre-processing methods are used to improve performance. The method was validated on four datasets containing three types of organs with a total of 627 CT volumes. For liver, spleen and kidney segmentation, it achieved an accuracy of 95.19%, 92.11%, and 91.45%, respectively. Experimental results demonstrate that our method is accurate, efficient, and suitable for clinical use.
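Step 1, turning a bounding box into a pseudo mask with k-means, might look like the sketch below. The abstract gives no details, so several choices here are assumptions: a single 2-D slice instead of a 3-D volume, clustering on raw intensity with k = 2, and treating the brighter cluster as the organ. All names are hypothetical.

```python
def two_means(values, iters=50):
    # 1-D k-means with k = 2, initialized at the minimum and maximum value.
    low, high = min(values), max(values)
    for _ in range(iters):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: nothing to split
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


def pseudo_mask(image, box):
    # Label pixels inside `box` that are closer to the brighter centroid.
    # `box` is (top, bottom, left, right) with half-open row/column ranges.
    # Pixels outside the box are always background (0).
    top, bottom, left, right = box
    values = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    low, high = two_means(values)
    mask = [[0] * len(row) for row in image]
    for r in range(top, bottom):
        for c in range(left, right):
            if abs(image[r][c] - high) < abs(image[r][c] - low):
                mask[r][c] = 1
    return mask


# Hypothetical 4x4 slice: a bright 2x2 structure inside the box, plus
# bright pixels outside the box that must not be labeled.
image = [[10, 12, 11, 90],
         [11, 80, 85, 10],
         [12, 82, 79, 11],
         [10, 11, 12, 95]]
mask = pseudo_mask(image, box=(0, 3, 0, 3))
for row in mask:
    print(row)
```

The resulting masks would then serve as noisy training targets for the segmentation network in step 2.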

Translated: 2021-06-05 16:02:37 Published: 2021-06-01

# Separated-Spectral-Distribution Estimation Based on Bayesian Inference with Single RGB Camera ( http://arxiv.org/abs/2106.01861v1 )

License: CC BY 4.0

Yuma Kinoshita and Hitoshi Kiya

In this paper, we propose a novel method for separately estimating spectral distributions from images captured by a typical RGB camera. The proposed method allows us to separately estimate a spectral distribution of illumination, reflectance, or camera sensitivity, while recent hyperspectral cameras are limited to capturing a joint spectral distribution from a scene. In addition, the use of Bayesian inference makes it possible to take into account prior information of both spectral distributions and image noise as probability distributions. As a result, the proposed method can estimate spectral distributions in a unified way, and it can enhance the robustness of the estimation against noise, which conventional spectral-distribution estimation methods cannot. The use of Bayesian inference also enables us to obtain the confidence of estimation results. In an experiment, the proposed method is shown not only to outperform conventional estimation methods in terms of RMSE but also to be robust against noise.

Translated: 2021-06-05 12:31:39 Published: 2021-06-01

# Deep Clustering Activation Maps for Emphysema Subtyping ( http://arxiv.org/abs/2106.01351v1 )

License: CC BY 4.0

Weiyi Xie, Colin Jacobs, Bram van Ginneken

We propose a deep learning clustering method that exploits dense features from a segmentation network for emphysema subtyping from computed tomography (CT) scans. Using dense features enables high-resolution visualization of image regions corresponding to the cluster assignment via dense clustering activation maps (dCAMs). This approach provides model interpretability. We evaluated clustering results on 500 subjects from the COPDGene study, where radiologists manually annotated emphysema sub-types according to their visual CT assessment. We achieved a 43% unsupervised clustering accuracy, outperforming our baseline at 41% and yielding results comparable to supervised classification at 45%. The proposed method also offers a better cluster formation than the baseline, achieving 0.54 in silhouette coefficient and 0.55 in Davies-Bouldin score.
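The silhouette coefficient reported above has a standard definition: for each sample, compare its mean distance $a$ to the other members of its own cluster with the smallest mean distance $b$ to any other cluster, score it as $(b - a) / \max(a, b)$, and average over all samples. The sketch below implements this definition with Euclidean distance on toy data; it is not the evaluation code of the paper, and the function name and data are assumptions.

```python
import math


def silhouette_coefficient(points, labels):
    # Mean silhouette score; points are coordinate tuples, labels are cluster ids.
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy clusters on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
score = silhouette_coefficient(points, labels)
print(round(score, 4))
```

Scores lie in $[-1, 1]$, with higher values indicating tighter, better separated clusters.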

Translated: 2021-06-05 12:20:59 Published: 2021-06-01

# A method using deep learning to discover new predictors of CRT response from mechanical dyssynchrony on gated SPECT MPI ( http://arxiv.org/abs/2106.01355v1 )

License: CC BY 4.0

Zhuo He, Xinwei Zhang, Chen Zhao, Zhiyong Qian, Yao Wang, Xiaofeng Hou, Jiangang Zou, Weihua Zhou

Background. Studies have shown that the conventional left ventricular mechanical dyssynchrony (LVMD) parameters have their own statistical limitations. The purpose of this study is to extract new LVMD parameters from the phase analysis of gated SPECT MPI by deep learning to help CRT patient selection. Methods. One hundred and three patients who underwent rest gated SPECT MPI were enrolled in this study. CRT response was defined as a decrease in left ventricular end-systolic volume (LVESV) ≥ 15% at 6 ± 1 month follow-up. An autoencoder (AE), an unsupervised deep learning method, was trained on the raw LV systolic phase polar maps to extract new LVMD parameters, called AE-based LVMD parameters. Correlation analysis was used to explain the relationships between the new parameters and conventional LVMD parameters. Univariate and multivariate analyses were used to establish a multivariate model for predicting CRT response. Results. Complete data were obtained in 102 patients, 44.1% of whom were classified as CRT responders. The AE-based LVMD parameter was significant in the univariate (OR 1.24, 95% CI 1.07 - 1.44, P = 0.006) and multivariate analyses (OR 1.03, 95% CI 1.01 - 1.06, P = 0.006). Moreover, it had incremental value over PSD (AUC 0.72 vs. 0.63, LH 8.06, P = 0.005) and PBW (AUC 0.72 vs. 0.64, LH 7.87, P = 0.005), combined with significant clinical characteristics, including LVEF and gender. Conclusions. The new LVMD parameters extracted by the autoencoder from the baseline gated SPECT MPI have the potential to improve the prediction of CRT response.

翻訳日:2021-06-05 11:59:52 公開日:2021-06-01

# (参考訳) Diffusion Schrödinger Bridgeとスコアベース生成モデルへの応用

Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling ( http://arxiv.org/abs/2106.01357v1 )

ライセンス: CC0 1.0

Valentin De Bortoli, James Thornton, Jeremy Heng, Arnaud Doucet

(参考訳) ガウス雑音を漸進的に加えると、複雑なデータ分布はほぼガウス分布に変換される。このダイナミクスを反転させることで生成モデルが定義される。前方のノイズ付加過程が確率微分方程式(SDE)によって与えられる場合、Song et al. (2021) は、対応する逆時間SDEの時間非一様なドリフトをスコアマッチングによって推定できることを示した。このアプローチの制限は、最終分布がほぼガウス的になるように、前向きのSDEを十分に長い時間実行しなければならないことである。これに対して、Schrödinger Bridge (SB) 問題、すなわち経路空間上のエントロピー正則化最適輸送問題を解くと、有限時間でデータ分布からサンプルを生成する拡散が得られる。本稿では、SB問題を解くためのIterative Proportional Fitting (IPF) 法の独自の近似である Diffusion SB (DSB) を提案し、生成モデルの実験とともに理論的解析を行う。最初のDSB反復は Song et al. (2021) が提案した方法論を再現するが、後続のDSB反復が前向き(resp. 後ろ向き)SDEの最終時刻の周辺分布と事前(resp. データ)分布とのずれを減少させるため、より短い時間間隔を使用できる柔軟性がある。生成モデリング以外にも、DSBは広く使われているSinkhornアルゴリズム(Cuturi, 2013)の連続状態空間版として、広く応用可能な計算最適輸送ツールを提供する。

Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
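The abstract describes DSB as a continuous state-space analogue of the Sinkhorn algorithm, which is itself IPF applied to discrete entropy-regularized optimal transport. As a rough illustration of that discrete counterpart only (not the DSB method of the paper), a minimal NumPy sketch might look like the following. The function name `sinkhorn`, the regularization strength `eps`, and the toy marginals are assumptions made for this example.

```python
import numpy as np

def sinkhorn(mu, nu, cost, eps=0.5, n_iter=500):
    """Discrete entropy-regularized OT via alternating marginal fitting (IPF)."""
    K = np.exp(-cost / eps)          # Gibbs kernel of the cost matrix
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)             # rescale rows to match the first marginal
        v = nu / (K.T @ u)           # rescale columns to match the second marginal
    return u[:, None] * K * v[None, :]   # transport plan

# Toy problem (hypothetical data): five points on a line, squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2
mu = np.full(5, 0.2)
nu = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
plan = sinkhorn(mu, nu, cost)
```

Each half-step fits one marginal exactly, which is the same alternating structure the abstract attributes to IPF; after enough iterations the row and column sums of `plan` agree with `mu` and `nu`.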

翻訳日:2021-06-05 11:47:09 公開日:2021-06-01

# (参考訳) balanced spiking neural networkを用いたオンライン振動異常検出

Online Detection of Vibration Anomalies Using Balanced Spiking Neural Networks ( http://arxiv.org/abs/2106.00687v1 )

ライセンス: CC BY 4.0

Nik Dennler, Germain Haessig, Matteo Cartiglia, Giacomo Indiveri

(参考訳) 振動パターンは稼働中の機械の健全性に関する貴重な情報をもたらし、大規模産業システムの予知保全タスクで一般的に利用されている。しかし、この情報を利用する古典的な手法が必要とするサイズ、複雑さ、電力予算の面でのオーバーヘッドは、自動運転車、ドローン、ロボット工学のような小規模のアプリケーションでは許容できないことが多い。本稿では、幅広いシナリオに適用可能な、スパイキングニューラルネットワークを用いて振動解析を行うニューロモルフィックなアプローチを提案する。アナログ・ディジタル混在のニューロモルフィック回路と互換性のあるビルディングブロックを用いて、振動データからシステム異常を検出するスパイクベースのエンドツーエンドパイプラインを提示する。このパイプラインはオンラインかつ教師なしで動作し、蝸牛モデル、フィードバック適応、およびバランスの取れたスパイキングニューラルネットワークに基づいている。提案手法は、2つの公開データセットにおいて最先端と同等以上の性能を達成することを示す。さらに、非同期ニューロモルフィックプロセッサ上に実装した概念実証を示す。この研究は、オンライン振動監視のための自律的な低消費電力エッジコンピューティングデバイスの設計と実装に向けた重要な一歩である。

Vibration patterns yield valuable information about the health state of a running machine, which is commonly exploited in predictive maintenance tasks for large industrial systems. However, the overhead, in terms of size, complexity and power budget, required by classical methods to exploit this information is often prohibitive for smaller-scale applications such as autonomous cars, drones or robotics. Here we propose a neuromorphic approach to perform vibration analysis using spiking neural networks that can be applied to a wide range of scenarios. We present a spike-based end-to-end pipeline able to detect system anomalies from vibration data, using building blocks that are compatible with analog-digital neuromorphic circuits. This pipeline operates in an online unsupervised fashion, and relies on a cochlea model, on feedback adaptation and on a balanced spiking neural network. We show that the proposed method achieves state-of-the-art performance or better against two publicly available data sets. Further, we demonstrate a working proof-of-concept implemented on an asynchronous neuromorphic processor device. This work represents a significant step towards the design and implementation of autonomous low-power edge-computing devices for online vibration monitoring.

翻訳日:2021-06-05 11:45:55 公開日:2021-06-01

# (参考訳) 対称性-via-duality:パラメータ空間相関器からの不変ニューラルネットワーク密度

Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators ( http://arxiv.org/abs/2106.00694v1 )

ライセンス: CC BY 4.0

Anindita Maiti, Keegan Stoner, James Halverson

(参考訳) パラメータ空間と関数空間は、ニューラルネットワークを研究するための2つの異なる双対フレームを提供する。ネットワーク密度の対称性は、密度が未知でネットワークが同変でない場合でも、ネットワーク相関関数の双対計算によって決定できることを示す。この「双対性による対称性」は、ネットワークパラメータ分布の選択に由来する相関関数の不変性に依存する。ニューラルネットワーク密度の入力および出力の対称性を決定し、無限幅極限において既知のガウス過程の結果を再現する。このメカニズムは、パラメータが相関している訓練中の対称性や、Neural Tangent Kernelの対称性を決定するためにも利用できる。初期化密度における対称性の量がFashion-MNISTで訓練されたネットワークの精度に影響を及ぼすこと、また対称性の破れはそれが正解(ground truth)の方向にある場合にのみ役立つことを実証する。

Parameter-space and function-space provide two different duality frames in which to study neural networks. We demonstrate that symmetries of network densities may be determined via dual computations of network correlation functions, even when the density is unknown and the network is not equivariant. Symmetry-via-duality relies on invariance properties of the correlation functions, which stem from the choice of network parameter distributions. Input and output symmetries of neural network densities are determined, which recover known Gaussian process results in the infinite width limit. The mechanism may also be utilized to determine symmetries during training, when parameters are correlated, as well as symmetries of the Neural Tangent Kernel. We demonstrate that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.

翻訳日:2021-06-05 11:37:36 公開日:2021-06-01

# (参考訳) グラフニューラルネットワークを用いた協調的モバイルクラウドソーシングのための低複雑性リクルート

Low Complexity Recruitment for Collaborative Mobile Crowdsourcing Using Graph Neural Networks ( http://arxiv.org/abs/2106.00717v1 )

ライセンス: CC BY 4.0

Aymen Hamrouni, Hakim Ghazzai, Turki Alelyani, Yehia Massoud

(参考訳) コラボレーティブ・モバイル・クラウドソーシング(CMCS、Collaborative Mobile Crowdsourcing)は、例えば地方自治体や個人が、接続された人々の群衆から労働者のチームを雇い、複雑なタスクを実行することを可能にする。本稿では,タスク依頼者が社会的につながりのある熟練した労働者のチームを作るための2つの異なるCMCS採用戦略について検討する。すなわち,プラットフォームが労働者に関する自身の知識を活用してチームを編成するプラットフォームベースの戦略と,プラットフォームがグループリーダーを指名し、そのリーダーが自身のソーシャル・ネットワーク(SN)上の隣人に関する知識に基づいて適切なチームを採用するリーダーベースの戦略である。まず、採用を Integer Linear Program (ILP) として定式化し、4つのファジィ論理に基づく基準(専門知識レベル、社会的関係の強さ、採用コスト、採用者の信頼度レベル)に従ってチームを最適に形成する。NP困難性に対処するため,グラフニューラルネットワーク(GNN)、特にグラフ埋め込みとクラスタリングの技術を用いて労働者の探索空間を縮小し,その後、メタヒューリスティックな遺伝的アルゴリズムを用いて適切な労働者を選択する、新しい低複雑度のCMCS採用手法を設計する。実世界のデータセットに適用したシミュレーション結果により,提案する両方のCMCS採用手法の性能を示す。提案する低複雑度のGNNベースの採用アルゴリズムは、ベースラインのILPに近い性能を達成しつつ、計算時間を大幅に削減し、大規模モバイルクラウドソーシングプラットフォーム上で動作できることが示された。また、リーダーベースの戦略と比較して、プラットフォームベースの戦略はより熟練したチームを採用するが、SN関係は弱く、コストも高いことが示された。

Collaborative Mobile crowdsourcing (CMCS) allows entities, e.g., local authorities or individuals, to hire a team of workers from the crowd of connected people, to execute complex tasks. In this paper, we investigate two different CMCS recruitment strategies allowing task requesters to form teams of socially connected and skilled workers: i) a platform-based strategy where the platform exploits its own knowledge about the workers to form a team and ii) a leader-based strategy where the platform designates a group leader that recruits its own suitable team given its own knowledge about its Social Network (SN) neighbors. We first formulate the recruitment as an Integer Linear Program (ILP) that optimally forms teams according to four fuzzy-logic-based criteria: level of expertise, social relationship strength, recruitment cost, and recruiter's confidence level. To cope with NP-hardness, we design a novel low-complexity CMCS recruitment approach relying on Graph Neural Networks (GNNs), specifically graph embedding and clustering techniques, to shrink the workers' search space and afterwards, exploiting a meta-heuristic genetic algorithm to select appropriate workers. Simulation results applied on a real-world dataset illustrate the performance of both proposed CMCS recruitment approaches. It is shown that our proposed low-complexity GNN-based recruitment algorithm achieves close performances to those of the baseline ILP with significant computational time saving and ability to operate on large-scale mobile crowdsourcing platforms. It is also shown that compared to the leader-based strategy, the platform-based strategy recruits a more skilled team but with lower SN relationships and higher cost.

翻訳日:2021-06-05 11:06:09 公開日:2021-06-01

# (参考訳) fair-net:識別可能なサブ人口間のパフォーマンス格差を軽減するネットワークアーキテクチャ

Fair-Net: A Network Architecture For Reducing Performance Disparity Between Identifiable Sub-Populations ( http://arxiv.org/abs/2106.00720v1 )

ライセンス: CC BY 4.0

Arghya Datta, S. Joshua Swamidass

(参考訳) 現実世界のデータセットでは、特定のグループが過小表現され、他のグループよりはるかに稀であり、機械学習の分類器は過小表現された集団に対して性能が低下することが多い。この問題は、データセットがクラス不均衡で、少数派クラスが多数派クラスよりもはるかに稀である多くの領域で悪化する。過小表現とクラス不均衡を扱う素朴なアプローチには、クラス不均衡を扱うサブ集団固有の分類器を訓練する方法や、サブ集団間の格差を無視し、クラス不均衡を扱うことで全体として高い精度を目指すグローバルな分類器を訓練する方法がある。本研究では、これらの手法が少数派サブ集団を含むクラス不均衡データセットにおいて脆弱であることを示す。我々は、クラス不均衡データセットにおいて識別可能なサブ集団全体で分類精度と確率較正の両方を改善する、分岐型マルチタスクニューラルネットワークアーキテクチャであるFair-Netを導入する。Fair-Netはネットワークの出力層と誤差関数に対する単純な拡張であるため、はるかに複雑なアーキテクチャにも組み込むことができる。3つの実世界のベンチマークデータセットを用いた実証研究により、Fair-Netは分類と較正の性能を改善し、性別および人種のサブ集団間の性能格差を大幅に減少させることが示された。

In real world datasets, particular groups are under-represented, much rarer than others, and machine learning classifiers will often perform worse on under-represented populations. This problem is aggravated across many domains where datasets are class imbalanced, with a minority class far rarer than the majority class. Naive approaches to handle under-representation and class imbalance include training sub-population specific classifiers that handle class imbalance or training a global classifier that overlooks sub-population disparities and aims to achieve high overall accuracy by handling class imbalance. In this study, we find that these approaches are vulnerable in class imbalanced datasets with minority sub-populations. We introduce Fair-Net, a branched multitask neural network architecture that improves both classification accuracy and probability calibration across identifiable sub-populations in class imbalanced datasets. Fair-Net is a straightforward extension to the output layer and error function of a network, so it can be incorporated in far more complex architectures. Empirical studies with three real world benchmark datasets demonstrate that Fair-Net improves classification and calibration performance, substantially reducing performance disparity between gender and racial sub-populations.

翻訳日:2021-06-05 10:33:41 公開日:2021-06-01

# (参考訳) 関数型オブジェクト指向ネットワークから生成するレシピの評価

Evaluating Recipes Generated from Functional Object-Oriented Network ( http://arxiv.org/abs/2106.00728v1 )

ライセンス: CC BY 4.0

Md Sadman Sakib, Hailey Baez, David Paulius, and Yu Sun

(参考訳) 関数型オブジェクト指向ネットワーク(foon)は,シンボリックタスク計画のためのグラフ形式の知識表現として導入された。ロボットは、操作タスクのシーケンシャルプランを得るために、FOONから知識検索プロセスを通じてタスクツリーを得ることができる。獲得したタスクツリーの品質を評価するため,レシピやマニュアルなどの従来のタスク知識と比較した。まずタスクツリーをレシピに自動的に変換し、それをRecipe1M+データセットの人間が作ったレシピと比較します。 Recipe1M+のレシピとFOONタスクツリーのレシピの間には,正確性,完全性,明確性という点で有意な差は認められなかった。

The functional object-oriented network (FOON) has been introduced as a knowledge representation, which takes the form of a graph, for symbolic task planning. To get a sequential plan for a manipulation task, a robot can obtain a task tree through a knowledge retrieval process from the FOON. To evaluate the quality of an acquired task tree, we compare it with a conventional form of task knowledge, such as recipes or manuals. We first automatically convert task trees to recipes, and we then compare them with the human-created recipes in the Recipe1M+ dataset via a survey. Our preliminary study finds no significant difference between the recipes in Recipe1M+ and the recipes generated from FOON task trees in terms of correctness, completeness, and clarity.

翻訳日:2021-06-05 10:21:06 公開日:2021-06-01

# (参考訳) 英語・アラビア語機械翻訳における品詞とUniversal Dependencyの影響

Part of Speech and Universal Dependency effects on English Arabic Machine Translation ( http://arxiv.org/abs/2106.00745v1 )

ライセンス: CC0 1.0

Omri Abend, Leshem Choshen, Dmitry Nikolaev, Ofek Rafaeli

(参考訳) 本稿では,英語とアラビア語の間の基礎的な統語現象に対する性能に基づいて機械翻訳モデルを評価する手法について述べる。このような「ニューラル」な「機械学習」モデルは微調整や変更が難しいため、この手法は特に重要である。したがって、それらを容易かつ多様に評価する方法を見つけることは、それらを改善する作業に大いに役立つ。

In this research paper, I will elaborate on a method to evaluate machine translation models based on their performance on underlying syntactical phenomena between the English and Arabic languages. This method is especially important as such "neural" and "machine learning" models are hard to fine-tune and change. Thus, finding a way to evaluate them easily and diversely would greatly help the task of bettering them.

翻訳日:2021-06-05 10:06:06 公開日:2021-06-01

# (参考訳) 無限ホライズン動的計画法におけるオンライン方策反復

On-Line Policy Iteration for Infinite Horizon Dynamic Programming ( http://arxiv.org/abs/2106.00746v1 )

ライセンス: CC BY 4.0

Dimitri Bertsekas

(参考訳) 本稿では,有限状態・無限ホライズンの割引付き動的計画法のためのオンライン方策反復 (PI) アルゴリズムを提案する。このアルゴリズムでは、方策改善の操作を、システムの運用中に遭遇した状態に対してのみオンラインで行う。これにより現在の方策を継続的に更新・改善でき、新しい状態と制御が生成されるたびに、改善された制御を現在の方策に組み込むオンラインPIとなる。このアルゴリズムは有限回の段階で一種の局所最適方策に収束し、方策改善を単純化したPIおよびマルチエージェントPIの変種の可能性を示唆する。さらに、このアルゴリズムはオンライン再計画と併用でき、価値近似および方策近似を用いるオンラインPIアルゴリズムにも適している。

In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during operation of the system. This allows the continuous updating/improvement of the current policy, thus resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated. The algorithm converges in a finite number of stages to a type of locally optimal policy, and suggests the possibility of variants of PI and multiagent PI where the policy improvement is simplified. Moreover, the algorithm can be used with on-line replanning, and is also well-suited for on-line PI algorithms with value and policy approximations.
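The key idea in the abstract is that policy improvement happens only at the states actually visited while the system runs. A small sketch of that idea for a tabular discounted cost-minimization problem is given below. It is an illustration under stated assumptions rather than the paper's algorithm: the current policy is re-evaluated exactly by a linear solve at every step, and the names `evaluate` and `online_policy_iteration`, the array layout `P[a, s, s']`, and the random toy MDP are all hypothetical.

```python
import numpy as np

def evaluate(policy, P, g, gamma):
    """Exact cost-to-go of a stationary policy: solve (I - gamma * P_mu) J = g_mu."""
    n = len(policy)
    idx = np.arange(n)
    P_mu = P[policy, idx]            # row s is the transition row under policy[s]
    g_mu = g[idx, policy]            # stage cost of the chosen control at each state
    return np.linalg.solve(np.eye(n) - gamma * P_mu, g_mu)

def online_policy_iteration(P, g, gamma, policy, state, steps, rng):
    """Improve the policy only at states encountered along one trajectory."""
    policy = policy.copy()
    for _ in range(steps):
        J = evaluate(policy, P, g, gamma)
        q = g[state] + gamma * P[:, state, :] @ J    # Q-factors at the current state
        policy[state] = int(np.argmin(q))            # local policy improvement
        state = rng.choice(len(J), p=P[policy[state], state])
    return policy

# Toy MDP (hypothetical data): 6 states, 3 controls, random dynamics and costs.
rng = np.random.default_rng(0)
n, A, gamma = 6, 3, 0.9
P = rng.random((A, n, n))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((n, A))
init = np.zeros(n, dtype=int)
final = online_policy_iteration(P, g, gamma, init, state=0, steps=200, rng=rng)
J_init = evaluate(init, P, g, gamma)
J_final = evaluate(final, P, g, gamma)
```

Because each update changes the control at a single state to one that is no worse under the current cost-to-go, the cost of the policy cannot increase at any state, which is what makes the local update safe to apply on-line.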

翻訳日:2021-06-05 09:57:26 公開日:2021-06-01

# (参考訳) 重み付き有限状態機械の高次微分

Higher-order Derivatives of Weighted Finite-state Machines ( http://arxiv.org/abs/2106.00749v1 )

ライセンス: CC BY 4.0

Ran Zmigrod, Tim Vieira, Ryan Cotterell

(参考訳) 重み付き有限状態機械はNLPシステムの基本的な構成要素である。1990年代の雑音のある通信路モデルにおける初期の利用から、現代のニューラルネットワークでパラメータ化された条件付き確率場に至るまで、時の試練に耐えてきた。本研究では,重み付き有限状態機械の正規化定数に関する高次導関数の計算について検討する。これまで文献に記載されていなかった、すべての次数の導関数を評価するための一般アルゴリズムを提案する。2階微分の場合、我々の手法は最適な $\mathcal{O}(A^2 N^4)$ 時間で実行される。ここで $A$ はアルファベットサイズ、$N$ は状態の数である。我々のアルゴリズムは以前のアルゴリズムより大幅に高速である。さらに,本手法により,共分散行列や一階期待値の勾配などの2階期待値を計算するアルゴリズムも大幅に高速化される。

Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time -- from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.
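For orientation, the first-order case can be written in a few lines. Assuming real-valued weights and a transition matrix `W` (summed over alphabet symbols) whose spectral radius is below one, the normalization constant is $Z = \alpha^\top (I - W)^{-1} \beta$ and its gradient with respect to `W` is the outer product of forward and backward weights. The sketch below shows only this standard first-order identity, not the higher-order algorithm of the paper; the function name and the toy weights are assumptions.

```python
import numpy as np

def normalizer_and_gradient(W, alpha, beta):
    """Z = alpha^T (I - W)^{-1} beta and dZ/dW for a weighted finite-state machine."""
    M = np.eye(len(W)) - W
    fwd = np.linalg.solve(M.T, alpha)    # total weight of paths arriving at each state
    bwd = np.linalg.solve(M, beta)       # total weight of paths leaving each state
    Z = alpha @ bwd
    grad = np.outer(fwd, bwd)            # dZ/dW[i, j] = fwd[i] * bwd[j]
    return Z, grad

# Toy machine (hypothetical weights): 4 states, small weights so the path sum converges.
rng = np.random.default_rng(0)
W = 0.1 * rng.random((4, 4))
alpha = rng.random(4)                    # initial weights
beta = rng.random(4)                     # final weights
Z, grad = normalizer_and_gradient(W, alpha, beta)
```

The gradient costs two linear solves on top of computing `Z`; the second-order quantities discussed in the abstract require differentiating this expression again.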

翻訳日:2021-06-05 09:50:45 公開日:2021-06-01

# (参考訳) 老いぼれのGRAPPAは死んだのか?

Is good old GRAPPA dead? ( http://arxiv.org/abs/2106.00753v1 )

ライセンス: CC BY 4.0

Zaccharie Ramzi, Alexandre Vignaud, Jean-Luc Starck, Philippe Ciuciu

(参考訳) 我々はMRI再建のための最先端深層学習手法であるXPDNetの性能を,従来のGRAPPAと比較して定性的に解析する。私たちはこれを複数の設定で実行し、特にXPDNetの堅牢性をテストすることで、XPDNetがある程度の一般化が可能であることを示しています。

We perform a qualitative analysis of performance of XPDNet, a state-of-the-art deep learning approach for MRI reconstruction, compared to GRAPPA, a classical approach. We do this in multiple settings, in particular testing the robustness of the XPDNet to unseen settings, and show that the XPDNet can to some degree generalize well.

翻訳日:2021-06-05 09:37:26 公開日:2021-06-01

# (参考訳) 入力表現の復号化によるニューラルネットワークの構成性向上

Improving Compositionality of Neural Networks by Decoding Representations to Inputs ( http://arxiv.org/abs/2106.00769v1 )

ライセンス: CC BY 4.0

Mike Wu, Noah Goodman, Stefano Ermon

(参考訳) 従来のソフトウェアプログラムでは、変数から入力までプログラムロジックをトレースし、ユニットテストとアサーションステートメントを適用して誤った振る舞いをブロックし、プログラムを一緒に構成することで、コードのデバッグがいかに簡単かを考慮します。しかし、プログラムが複雑化するにつれて、コンピュータビジョンや自然言語のようなアプリケーションに従来のソフトウェアを適用するのは難しくなります。ディープラーニングプログラムはこれらのアプリケーションで強いパフォーマンスを示しているが、従来のソフトウェアプログラムの機能の多くを犠牲にしている。本稿では,ニューラルネットワークのアクティベーションを"デコード"に制約するために,生成モデルを共同でトレーニングすることにより,従来型および深層学習プログラムの利点を橋渡しする。そうすることで、実践者はアクティベーションで符号化された情報を探索し追跡し、アクティベーションで符号化された情報にアサーションのような制約を適用し、プラグインとプレイで別々のニューラルネットワークを構成することができる。実験では、分散検出、逆例、キャリブレーション、公平性に対するデコダラブル表現の応用を、標準ニューラルネットワークの精度と一致させながら実証する。

In traditional software programs, we take for granted how easy it is to debug code by tracing program logic from variables back to input, apply unit tests and assertion statements to block erroneous behavior, and compose programs together. But as the programs we write grow more complex, it becomes hard to apply traditional software to applications like computer vision or natural language. Although deep learning programs have demonstrated strong performance on these applications, they sacrifice many of the functionalities of traditional software programs. In this paper, we work towards bridging the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs. Doing so enables practitioners to probe and track information encoded in activation(s), apply assertion-like constraints on what information is encoded in an activation, and compose separate neural networks together in a plug-and-play fashion. In our experiments, we demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness -- while matching standard neural networks in accuracy.

翻訳日:2021-06-05 09:33:43 公開日:2021-06-01

# (参考訳) フェアネスを考慮した特徴選択のための情報理論的尺度

Information Theoretic Measures for Fairness-aware Feature Selection ( http://arxiv.org/abs/2106.00772v1 )

ライセンス: CC BY 4.0

Sajad Khodadadian, Mohamed Nafea, AmirEmad Ghassami, Negar Kiyavash

(参考訳) 機械学習アルゴリズムは、関連する特徴に基づいて個人に関する重大な意思決定を行うためにますます使われている。しかし、正確な決定に関係のある特徴は、特定の人種や性別のような非特権集団に対する明示的または暗黙的な差別に繋がる可能性がある。これは訓練データに既存のバイアスがあるために起こり、そのバイアスは学習アルゴリズムによってしばしば複製されるか、さらに悪化する。特徴と決定結果の間の相互依存のため、これらのバイアスをデータレベルで識別し測定することは難しい問題である。本研究では,特徴の精度への影響と差別的影響に関する情報理論的尺度に基づく,公平性を考慮した特徴選択のためのフレームワークを開発する。特に、各特徴が正確かつ非差別的な判断にどのように影響するかを定量化する、特徴ごとの公平性効用スコアを設計することを目標とする。まず,特徴の異なるサブセットがモデルの精度と差別に与える影響に関する情報理論的尺度を提案する。その後,Shapley値関数を用いて各特徴の限界的な影響を導出する。我々のフレームワークは、特定の分類器の設計ではなくデータの結合統計量に依存する。実データおよび合成データ上で提案フレームワークを検証し,その性能を評価する。

Machine learning algorithms are increasingly used for consequential decision making regarding individuals based on their relevant features. Features that are relevant for accurate decisions may however lead to either explicit or implicit forms of discrimination against unprivileged groups, such as those of certain race or gender. This happens due to existing biases in the training data, which are often replicated or even exacerbated by the learning algorithm. Identifying and measuring these biases at the data level is a challenging problem due to the interdependence among the features, and the decision outcome. In this work, we develop a framework for fairness-aware feature selection, based on information theoretic measures for the accuracy and discriminatory impacts of features. Specifically, our goal is to design a fairness utility score for each feature which quantifies how this feature influences accurate as well as nondiscriminatory decisions. We first propose information theoretic measures for the impact of different subsets of features on the accuracy and discrimination of the model. Subsequently, we deduce the marginal impact of each feature using the Shapley value function. Our framework depends on the joint statistics of the data rather than a particular classifier design. We examine our proposed framework on real and synthetic data to evaluate its performance.
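The step from subset-level impact to per-feature marginal impact uses the Shapley value, which can be computed exactly when the number of features is small. The sketch below shows that computation with a toy set function standing in for the paper's information-theoretic measures; the feature names, the numbers in `toy_value`, and the function names are hypothetical and serve only to illustrate the mechanics.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [x for x in features if x != f]
        total = 0.0
        for k in range(n):                                  # size of the coalition without f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))   # marginal contribution
        phi[f] = total
    return phi

def toy_value(subset):
    """Hypothetical subset score: two additive effects plus one interaction."""
    score = 0.0
    if "age" in subset:
        score += 0.3
    if "income" in subset:
        score += 0.2
    if "income" in subset and "zip" in subset:
        score += 0.1                                        # only useful together
    return score

features = ["age", "income", "zip"]
phi = shapley_values(features, toy_value)
```

In this toy case the interaction term is split evenly between `income` and `zip`, and the scores add up to the value of the full feature set. Exact enumeration grows exponentially with the number of features, so larger problems would need sampling-based approximations.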

翻訳日:2021-06-05 09:16:15 公開日:2021-06-01

# (参考訳) $K$-best非射影依存ツリーの探索について

On Finding the $K$-best Non-projective Dependency Trees ( http://arxiv.org/abs/2106.00780v1 )

ライセンス: CC BY 4.0

Ran Zmigrod, Tim Vieira, Ryan Cotterell

(参考訳) 有向グラフにおける最大全域木と文の最良の依存構造木との関係は、NLPコミュニティによって活用されてきた。しかし、多くの依存構造解析スキームにおいて、このアプローチの重要な詳細は、全域木がルートから出るエッジをちょうど1つだけ持たなければならないことである。最良の依存構造木を1つ見つける場合についてはこの問題を効率的に解く研究が行われているが、$K$-best依存構造木を見つけるようにこの解法を拡張する研究は行われていない。復号された木のうち依存構造木のルート制約を満たさないものの割合が大きくなるため、これはおそらくより重要な拡張である。実際、ルート制約違反の割合は、$K\!=\!1$ の場合と比べて $K\!=\!50$ で復号した場合に平均 $13$ 倍に増加することを示す。本稿では,Camerini et al. (1980) の $K$-best 全域木アルゴリズムの単純化を示す。この単純化により、元のアルゴリズムに対して定数倍の高速化が得られる。さらに、ルート制約を受けるグラフの $K$-best 依存構造木を復号するための、アルゴリズムの新たな拡張を提案する。

The connection between the maximum spanning tree in a directed graph and the best dependency tree of a sentence has been exploited by the NLP community. However, for many dependency parsing schemes, an important detail of this approach is that the spanning tree must have exactly one edge emanating from the root. While work has been done to efficiently solve this problem for finding the one-best dependency tree, no research has attempted to extend this solution to finding the $K$-best dependency trees. This is arguably a more important extension as a larger proportion of decoded trees will not be subject to the root constraint of dependency trees. Indeed, we show that the rate of root constraint violations increases by an average of $13$ times when decoding with $K\!=\!50$ as opposed to $K\!=\!1$. In this paper, we provide a simplification of the $K$-best spanning tree algorithm of Camerini et al. (1980). Our simplification allows us to obtain a constant time speed-up over the original algorithm. Furthermore, we present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.
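The root constraint itself is easy to state in code. Assuming a tree is stored as a list of head indices, with `heads[i]` the head of token `i + 1` and `0` denoting the artificial root, a violation is any tree with more or fewer than one edge leaving the root. The helper names and the three example trees below are hypothetical; the sketch only checks the constraint and is not the $K$-best decoding algorithm from the paper.

```python
def satisfies_root_constraint(heads):
    """True if exactly one token is attached directly to the root (head index 0)."""
    return sum(1 for h in heads if h == 0) == 1

def violation_rate(trees):
    """Fraction of decoded trees that break the single-root-edge constraint."""
    violations = sum(not satisfies_root_constraint(t) for t in trees)
    return violations / len(trees)

# Three hypothetical decoded trees for a three-token sentence.
trees = [
    [0, 1, 2],   # chain: root -> 1 -> 2 -> 3
    [0, 0, 1],   # two root edges: violates the constraint
    [2, 0, 2],   # token 2 is the single root child
]
rate = violation_rate(trees)
```

A check like this could be run over a $K$-best list from an unconstrained spanning-tree decoder to measure how often the constraint is broken, which is the kind of statistic the abstract reports.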

翻訳日:2021-06-05 08:59:43 公開日:2021-06-01

# (参考訳) Cセキュリティ脆弱性検出のためのソースコードの分散表現について

On using distributed representations of source code for the detection of C security vulnerabilities ( http://arxiv.org/abs/2106.01367v1 )

ライセンス: CC BY 4.0

David Coimbra, Sofia Reis, Rui Abreu, Corina Păsăreanu, Hakan Erdogmus

(参考訳) 本稿では,Cソースコードのセキュリティ脆弱性検出タスクで訓練した場合の、コード表現モデル Code2vec の評価を行う。我々はオープンソースライブラリのastminerを利用して、ラベル付きC関数のコーパスの抽象構文木からパスコンテキストを抽出する。Code2vecは、関数を脆弱か非脆弱かに分類するタスクで、得られたパスコンテキストを用いて訓練される。CodeXGLUEベンチマークを用いて、このタスクにおけるCode2vecの精度は、事前訓練されたRoBERTaのような単純なTransformerベースの手法に匹敵し、より素朴なNLPベースの手法よりも優れていることを示す。より大きなモデルと比べて低い計算要件を維持しながら、61.43%の精度を達成した。

This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec for this task is comparable to simple transformer-based methods such as pre-trained RoBERTa, and outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.

翻訳日:2021-06-05 08:22:53 公開日:2021-06-01

# 敵対的適応正規化による単一領域一般化

Adversarially Adaptive Normalization for Single Domain Generalization ( http://arxiv.org/abs/2106.01899v1 )

ライセンス: Link先を確認

Xinjie Fan, Qifei Wang, Junjie Ke, Feng Yang, Boqing Gong, Mingyuan Zhou

(参考訳) 単一ドメインの一般化は、訓練用に1つのドメインのデータしかない状況で、多くの未知のドメインでうまく機能するモデルを学習することを目的とする。既存の研究は、モデルの一般化能力を改善するために、adversarial domain augmentation (ADA) の研究に焦点を当てている。正規化層の統計量がドメイン一般化に与える影響は、まだ十分に検討されていない。本稿では,従来の研究に欠けていた部分を補うために,汎用的な正規化アプローチである adaptive standardization and rescaling normalization (ASR-Norm) を提案する。ASR-Normは、ニューラルネットワークを介して標準化と再スケーリングの両方の統計量を学習する。この新しい正規化の形式は、従来の正規化の一般形と見なすことができる。ADAとともに訓練すると、ASR-Normの統計量は異なるドメインからのデータに適応するように学習され、その結果、特にソースドメインとの差が大きいターゲットドメインにおいて、ドメイン間でのモデルの一般化性能が向上する。実験の結果,ASR-NormはDigits、CIFAR-10-C、PACSの各ベンチマークにおいて、最先端のADAアプローチをそれぞれ平均1.6%、2.7%、6.3%一貫して改善することがわかった。汎用的なツールとして、ASR-Normによる改善はADA手法の選択に依存しない。

Single domain generalization aims to learn a model that performs well on many unseen domains with only one domain data for training. Existing works focus on studying the adversarial domain augmentation (ADA) to improve the model's generalization capability. The impact on domain generalization of the statistics of normalization layers is still underinvestigated. In this paper, we propose a generic normalization approach, adaptive standardization and rescaling normalization (ASR-Norm), to complement the missing part in previous works. ASR-Norm learns both the standardization and rescaling statistics via neural networks. This new form of normalization can be viewed as a generic form of the traditional normalizations. When trained with ADA, the statistics in ASR-Norm are learned to be adaptive to the data coming from different domains, and hence improve the model generalization performance across domains, especially on the target domain with large discrepancy from the source domain. The experimental results show that ASR-Norm can bring consistent improvement to the state-of-the-art ADA approaches by 1.6%, 2.7%, and 6.3% on average on the Digits, CIFAR-10-C, and PACS benchmarks, respectively. As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.

翻訳日:2021-06-04 16:07:47 公開日:2021-06-01

# memory wrap: 画像分類モデルへのデータ効率と解釈可能な拡張

Memory Wrap: a Data-Efficient and Interpretable Extension to Image Classification Models ( http://arxiv.org/abs/2106.01440v1 )

ライセンス: Link先を確認

Biagio La Rosa, Roberto Capobianco and Daniele Nardi

(参考訳) ブラックボックスであり大量のデータを必要とするという性質のため、ディープラーニング技術は医療や司法といった重要な分野における現実世界の応用にはまだ広く採用されていない。本稿では,任意の画像分類モデルに対するプラグアンドプレイ拡張であるMemory Wrapを提案する。Memory Wrapは、入力と過去の訓練サンプルのいくつかの記憶との間にコンテンツアテンション機構を採用することで、データ効率とモデルの解釈可能性の両方を改善する。Memory Wrapは、限られたデータ集合から学習する場合には標準的な分類器を上回り、完全なデータセットから学習する場合には同等の性能に達することを示す。その構造とコンテンツアテンション機構が、標準的な分類器と比べてどのように予測を解釈可能にするかを論じる。この目的のために,記憶内容に基づいて事例および反実仮想による説明を構築する手法と,それらを活用して意思決定プロセスに関する洞察を得る方法を示す。我々は,CIFAR10,SVHN,CINIC10という3つの異なるデータセット上で,複数のアーキテクチャを用いて画像分類タスクで本手法をテストする。

Due to their black-box and data-hungry nature, deep learning techniques are not yet widely adopted for real-world applications in critical domains, like healthcare and justice. This paper presents Memory Wrap, a plug-and-play extension to any image classification model. Memory Wrap improves both data-efficiency and model interpretability, adopting a content-attention mechanism between the input and some memories of past training samples. We show that Memory Wrap outperforms standard classifiers when it learns from a limited set of data, and it reaches comparable performance when it learns from the full dataset. We discuss how its structure and content-attention mechanisms make predictions interpretable, compared to standard classifiers. To this end, we both show a method to build explanations by examples and counterfactuals, based on the memory content, and how to exploit them to get insights about its decision process. We test our approach on image classification tasks using several architectures on three different datasets, namely CIFAR10, SVHN, and CINIC10.

翻訳日:2021-06-04 16:05:28 公開日:2021-06-01

# (参考訳) 深層生成モデルのための潜在空間の精緻化

Latent Space Refinement for Deep Generative Models ( http://arxiv.org/abs/2106.00792v1 )

ライセンス: CC BY 4.0

Ramon Winterhalder, Marco Bellagente, Benjamin Nachman

(参考訳) 深層生成モデルは様々な目的のために科学や産業で広く利用されている。一般的な課題は、データ確率密度の正確な暗黙的あるいは明示的な表現を達成することである。最近の提案では、深層生成モデルの学習密度を向上するために分類器重みを用いた。我々は、このアイデアをあらゆる種類の生成モデルに拡張し、反復生成モデリングによる潜在空間の洗練が位相的障害を回避し、精度を向上させる方法を示す。この方法論は、対象モデルが微分不能で、改良前に限界化されなければならない多くの内部潜在次元を持つ場合にも適用される。本稿では,LaSeR(Latent Space Refinement)プロトコルを実例で紹介し,正規化フローと生成逆数ネットワークの組み合わせに着目した。

Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. This methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions which must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on the combinations of Normalizing Flows and Generative Adversarial Networks.
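The "classifier weights" mentioned in the abstract are commonly obtained with the likelihood-ratio trick: a classifier trained to separate real from generated samples outputs a probability $p(x)$ that a sample is real, and $w(x) = p(x) / (1 - p(x))$ estimates the ratio of data density to model density. The sketch below shows only this reweighting step, on the assumption that this is the construction the abstract refers to. It is not the LaSeR protocol, and the function name, the clipping threshold, and the example probabilities are hypothetical.

```python
import numpy as np

def refinement_weights(p_real, clip=1e-6):
    """Turn classifier outputs P(real | x) into density-ratio weights p / (1 - p)."""
    p = np.clip(p_real, clip, 1.0 - clip)   # avoid division by zero at p = 1
    return p / (1.0 - p)

# Hypothetical classifier outputs for three generated samples.
p_real = np.array([0.5, 0.8, 0.2])
weights = refinement_weights(p_real)
```

A sample the classifier cannot tell apart from data (`p = 0.5`) keeps weight one, while samples from regions the generator under-covers are up-weighted and those from over-covered regions are down-weighted.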

翻訳日:2021-06-04 12:12:28 公開日:2021-06-01

# (参考訳) 機械学習会議のレビュープロセスにおける倫理的課題

Some Ethical Issues in the Review Process of Machine Learning Conferences ( http://arxiv.org/abs/2106.00810v1 )

ライセンス: CC BY 4.0

Alessio Russo

(参考訳) 最近の機械学習コミュニティの成功により、カンファレンスに投稿される論文の数が急増している。この増加は、これらのカンファレンスが用いている現在の査読プロセスに影響を及ぼすいくつかの問題をより顕著にした。査読プロセスには、科学研究の本質、すなわち完全に客観的で、非政治的で、偏りがなく、不正行為(盗用、不正、不適切な影響、その他の不適切な行為など)がないという性質を損なうおそれのある問題がいくつかある。本研究では,査読者の募集の問題、二重盲検プロセスの侵害、不正行為、数値評価におけるバイアス、付録現象(すなわち、結果を論文の付録部に掲載することが一般的になりつつあること)について検討する。これら各問題に対して、簡単な説明と可能な解決策を提供する。本研究の目標は、これらの問題に対する機械学習コミュニティの意識を高めることにある。

Recent successes in the Machine Learning community have led to a steep increase in the number of papers submitted to conferences. This increase made more prominent some of the issues that affect the current review process used by these conferences. The review process has several issues that may undermine the nature of scientific research, which should be fully objective, apolitical, unbiased and free of misconduct (such as plagiarism, cheating, improper influence, and other improprieties). In this work, we study the problem of reviewers' recruitment, infringements of the double-blind process, fraudulent behaviors, biases in numerical ratings, and the appendix phenomenon (i.e., the fact that it is becoming more common to publish results in the appendix section of a paper). For each of these problems, we provide a short description and possible solutions. The goal of this work is to raise awareness in the Machine Learning community regarding these issues.

翻訳日:2021-06-04 11:56:52 公開日:2021-06-01

# (参考訳) iMetコレクション2020のラベルスペースのクリーン化と構造化

Cleaning and Structuring the Label Space of the iMet Collection 2020 ( http://arxiv.org/abs/2106.00815v1 )

ライセンス: CC BY 4.0

Vivien Nguyen and Sunnie S. Y. Kim

(参考訳) iMet 2020データセットは、細粒度の美術品属性認識の分野で貴重なリソースだが、まだその真の可能性には達していないと考える。我々は、データセット固有の特性を文書化し、属性ラベルの多くが、データセットの説明から示唆されるよりもノイズが多いことを観察する。また、ラベル間には意味的な関係(例えば、同一、相互排他、包含、不確実性を伴う重なり)が存在することが多く、これらは十分に活用されていないと考える。我々は,iMet 2020のラベルをクリーニングし構造化するアプローチを提案し,その意義と価値について議論する。さらに,いくつかの実験を通じて提案手法の利点を示す。コードとクリーニング済みラベルは https://github.com/sunniesuhyoung/iMet2020cleaned で利用可能である。

The iMet 2020 dataset is a valuable resource in the space of fine-grained art attribution recognition, but we believe it has yet to reach its true potential. We document the unique properties of the dataset and observe that many of the attribute labels are noisy, more than is implied by the dataset description. Oftentimes, there are also semantic relationships between the labels (e.g., identical, mutual exclusion, subsumption, overlap with uncertainty) which we believe are underutilized. We propose an approach to cleaning and structuring the iMet 2020 labels, and discuss the implications and value of doing so. Further, we demonstrate the benefits of our proposed approach through several experiments. Our code and cleaned labels are available at https://github.com/sunniesuhyoung/iMet2020cleaned.

翻訳日:2021-06-04 11:50:32 公開日:2021-06-01

# (参考訳) ConvoSumm: 会話要約ベンチマークとArgument Miningによる抽象要約の改善

ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining ( http://arxiv.org/abs/2106.00829v1 )

ライセンス: CC BY 4.0

Alexander R. Fabbri, Faiaz Rahman, Imad Rizvi, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev

(参考訳) オンライン会話は膨大な情報をさまざまな形式でカバーすることができるが、抽象的なテキスト要約は主にニュース記事のモデリングに重点を置いている。この研究のギャップは、部分的にはオンラインの議論を要約するための標準化されたデータセットの欠如によるものだ。このギャップに対処するため、我々は、ニュースコメント、ディスカッションフォーラム、コミュニティ質問応答フォーラム、電子メールスレッドの4つの新しいデータセットをクラウドソースする、課題視点フレームワークによって動機付けられたアノテーションプロトコルを設計する。我々は、データセットの最先端モデルをベンチマークし、データに関連する特徴を分析する。包括的ベンチマークを作成するために、この領域で強力なベースラインを確立するために、広く使われている会話要約データセット上でこれらのモデルを評価する。さらに,会話に含まれる問題や視点,アサーションを直接モデル化するために,グラフ構築による議論マイニングを取り入れ,ノイズ入力をフィルタリングし,自動評価や人間評価による比較や改善結果を示す。

While online conversations can cover a vast amount of information in many different formats, abstractive text summarization has primarily focused on modeling solely news articles. This research gap is due, in part, to the lack of standardized datasets for summarizing online discussions. To address this gap, we design annotation protocols motivated by an issues--viewpoints--assertions framework to crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data. To create a comprehensive benchmark, we also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. Furthermore, we incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations.

翻訳日:2021-06-04 11:38:20 公開日:2021-06-01

# (参考訳) 価格アルゴリズム保険

Pricing Algorithmic Insurance ( http://arxiv.org/abs/2106.00839v1 )

ライセンス: CC BY 4.0

Dimitris Bertsimas, Agni Orfanoudaki

(参考訳) 機械学習アルゴリズムが企業や組織の意思決定プロセスに統合され始めると、保険製品は所有者をリスクから守るために開発される。本稿では, アルゴリズム保険の概念を導入し, 導出保険契約の価格設定を実現するための定量的枠組みを提案する。本稿では,バイナリ分類モデルのリスク露出と価格を推定する最適化式を提案する。本稿では,モデルの性質,すなわち正確性,解釈性,一般化性が保険契約評価に与える影響について概説する。提案手法の実践的実装を示すために,乳がん検出の文脈における医療的誤りの事例研究を行った。本分析は,モデルパラメータが期待される損失に与える影響を計測し,契約の価格に大きく影響するアルゴリズム性能の側面を特定することに焦点を当てる。

As machine learning algorithms start to get integrated into the decision-making process of companies and organizations, insurance products will be developed to protect their owners from risk. We introduce the concept of algorithmic insurance and present a quantitative framework to enable the pricing of the derived insurance contracts. We propose an optimization formulation to estimate the risk exposure and price for a binary classification model. Our approach outlines how properties of the model, such as accuracy, interpretability and generalizability, can influence the insurance contract evaluation. To showcase a practical implementation of the proposed framework, we present a case study of medical malpractice in the context of breast cancer detection. Our analysis focuses on measuring the effect of the model parameters on the expected financial loss and identifying the aspects of algorithmic performance that predominantly affect the price of the contract.

翻訳日:2021-06-04 11:16:18 公開日:2021-06-01

# (参考訳) 項目応答理論によるテストセットの比較

Comparing Test Sets with Item Response Theory ( http://arxiv.org/abs/2106.00840v1 )

ライセンス: CC BY 4.0

Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman

(参考訳) 近年,自然言語理解タスクにおける微調整モデルの性能を評価するために,多くのNLPデータセットが導入された。しかし、大規模な事前訓練されたモデルによる最近の結果は、これらのデータセットの大部分は飽和しており、さらなる進歩を検出することができないことを示している。強力なモデル間での差別化に依然として有効なデータセットは何か、将来の改善を検出できるデータセットはどのようなものか? これをデータセット全体にわたって一様に測定するために、項目応答理論に基づき、個別のテスト例で18の事前学習トランスフォーマーモデルの予測を用いて29のデータセットを評価する。 Quoref、HellaSwag、MC-TACOは最先端のモデルの区別に最適であるのに対して、SNLI、MNLI、CommitmentBankは現在の強力なモデルに飽和しているようだ。また、QAMRやSQuAD2.0のようなQAデータセットに使用されるスパン選択タスク形式は、強いモデルと弱いモデルとの差別化に有効である。

Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks. Recent results from large pretrained models, though, show that many of these datasets are largely saturated and unlikely to be able to detect further progress. What kind of datasets are still effective at discriminating among strong models, and what kind of datasets should we expect to be able to detect future improvements? To measure this uniformly across datasets, we draw on Item Response Theory and evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models. We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.

翻訳日:2021-06-04 11:15:24 公開日:2021-06-01
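
As a rough illustration of the Item Response Theory idea in the abstract above, the sketch below uses the textbook two-parameter logistic (2PL) item response function. The abstract does not say which IRT variant the authors fit, so the 2PL form, the names `p_correct` and `gap`, and all numeric values are assumptions made here for illustration only.

```python
import math

def p_correct(ability, difficulty, discrimination=1.0):
    """2PL item response function: probability that a model with the given
    latent ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Hypothetical items: an easy (saturated) one and a hard, highly discriminating one.
easy_item = {"difficulty": -3.0, "discrimination": 1.0}
hard_item = {"difficulty": 1.5, "discrimination": 2.0}

# Hypothetical latent abilities of a weaker and a stronger model.
weak_model, strong_model = 0.0, 2.0

def gap(item):
    """How far apart the item places the strong and the weak model."""
    return p_correct(strong_model, **item) - p_correct(weak_model, **item)
```

Under these made-up parameters, both models answer the easy item almost always, so it barely separates them, while the hard item separates them clearly. This is one way to read "saturated" versus "still discriminating" in IRT terms.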

# (参考訳) 多変量環境における非線形関係発見のための事前画像の活用

Leveraging Pre-Images to Discover Nonlinear Relationships in Multivariate Environments ( http://arxiv.org/abs/2106.00842v1 )

ライセンス: CC BY 4.0

M. Ali Vosoughi and Axel Wismuller

(参考訳) 因果発見は、接続された点の集合としてのネットワークの推論を超えて、人工知能を用いた科学的発見において重要な機能を提供する。物理学、生理学、不確定な環境における複数のエージェントによる戦略的決定、気候学、その他多くの領域で発生する問題は、因果関係や推論にルーツを持つ。多くの実世界の時間観測が互いに非線形に関連していることが判明した。観測の回数は数百万ポイントにも達するが、時間サンプルの数は倫理的あるいは実践的な理由から最小限に抑えられ、大規模システムにおける次元の呪いにつながる。本稿では,カーネルの主成分分析と事前イメージを用いて,多変量時系列データの非線形依存関係を求める手法を提案する。本手法は, 観測が時間的に制限され, 非線形関係にある場合に, 最先端の因果発見手法よりも優れることを示す。提案手法を評価するために,様々なトポロジを持つ実世界および合成データセットの広範なシミュレーションを行った。

Causal discovery, beyond the inference of a network as a collection of connected dots, offers a crucial functionality in scientific discovery using artificial intelligence. The questions that arise in multiple domains, such as physics, physiology, strategic decisions in uncertain environments with multiple agents, and climatology, among many others, have roots in causality and reasoning. It has become apparent that many real-world temporal observations are nonlinearly related to each other. While the number of observations can be as high as millions of points, the number of temporal samples can be minimal due to ethical or practical reasons, leading to the curse of dimensionality in large-scale systems. This paper proposes a novel method using kernel principal component analysis and pre-images to obtain nonlinear dependencies of multivariate time-series data. We show that our method outperforms state-of-the-art causal discovery methods when the observations are restricted by time and are nonlinearly related. Extensive simulations on both real-world and synthetic datasets with various topologies are provided to evaluate our proposed methods.

翻訳日:2021-06-04 09:29:26 公開日:2021-06-01

# (参考訳) グラフリッチドキュメンテーション表現を用いたパラメータ効率の良いニューラル質問応答モデル

Parameter-Efficient Neural Question Answering Models via Graph-Enriched Document Representations ( http://arxiv.org/abs/2106.00851v1 )

ライセンス: CC BY 4.0

Louis Castricato, Stephen Fitz, Won Young Shin

(参考訳) 現代のNLPシステムの計算フットプリントが増加するにつれて、より効率的なモデルに到達することがますます重要になる。グラフ畳み込み文書表現を用いることで、学習可能なパラメータの観点でリソースの5%未満を消費しながら、SOTAソリューションに匹敵し、場合によっては上回る質問応答システムが得られることを示す。現在、GCNをNLPに適用する際の大きな問題は文書表現である。本稿では,GCNで強化した文書表現が,自明なトポロジを用いてもHotPotQAで見られる結果を大幅に改善することを示す。我々のモデル(gQA)は、現在のSOTAと比較しても優れた性能を示し、前処理はほとんど必要としない。Shao et al. 2020 において、著者らはマルチホップQAで良好な性能を得るのにグラフネットワークは必要ないことを示唆した。本稿では,GCNの素朴な(naïve)実装が事前訓練された言語モデルに基づくSOTAモデルに匹敵する性能を示すことによって,良好な性能を得るのに大規模言語モデルは必要ないことを示唆する。

As the computational footprint of modern NLP systems grows, it becomes increasingly important to arrive at more efficient models. We show that by employing graph convolutional document representation, we can arrive at a question answering system that performs comparably to, and in some cases exceeds, the SOTA solutions, while using less than 5% of their resources in terms of trainable parameters. As it currently stands, a major issue in applying GCNs to NLP is document representation. In this paper, we show that a GCN-enriched document representation greatly improves the results seen in HotPotQA, even when using a trivial topology. Our model (gQA) performs admirably when compared to the current SOTA, and requires little to no preprocessing. In Shao et al. 2020, the authors suggest that graph networks are not necessary for good performance in multi-hop QA. In this paper, we suggest that large language models are not necessary for good performance by showing that a naïve implementation of a GCN performs comparably to SOTA models based on pretrained language models.

翻訳日:2021-06-04 09:18:43 公開日:2021-06-01

# In-Distribution Counterfactuals を用いた社会適応型特徴重要度記述のための検索手法

Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals ( http://arxiv.org/abs/2106.00786v1 )

ライセンス: Link先を確認

Peter Hase, Harry Xie, Mohit Bansal

(参考訳) 特徴重要度(FI)推定は一般的な説明形式であり、テスト時に特定の入力特徴を除去することによって生じるモデル信頼度の変化を計算し、評価することが一般的である。例えば、標準sufficiencyメトリックでは、最も重要なトークンはトップkのみ保持される。本稿では,fiベース説明の未検討次元をいくつか検討し,この説明形式に対する概念的および経験的改善について述べる。まず、説明の作成や評価において、なぜインプットから特徴を取り除くことが問題となるのか、という新たな議論を前進させる: モデルに対するこれらの反事実入力がアウト・オブ・ディストリビューション(OOD)であるという事実は、結果として生じる説明が社会的に不一致であることを意味する。問題の本質は、モデル事前化とランダムな重みの初期化が意図しない方法で説明(と説明メトリクス)に影響を与えることである。この問題を解決するために、モデルトレーニングプロセスの簡単な変更を提案し、より社会的に整合した説明とメトリクスをもたらす。第2に,モデル入力から機能を取り除くための5つのアプローチを比較した。いくつかの手法はOOD対策を他の方法よりも多く生成し,機能置換関数を選択することを推奨する。最後に,fi説明を識別し,lime,統合勾配,ランダム検索など,強力なベースラインと比較する検索ベース手法を4つ導入する。 6つの多様なテキスト分類データセットを用いて実験したところ、ランダム検索を一貫して上回る手法は並列局所探索のみであることがわかった。第2の方法による改善は、十分で5.4ポイント、包括性で17ポイントである。サポートコードはすべてhttps://github.com/peterbhase/ExplanationSearchで公開されている。

Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways. To resolve this issue, we propose a simple alteration to the model training process, which results in more socially aligned explanations and metrics. Second, we compare among five approaches for removing features from model inputs. We find that some methods produce more OOD counterfactuals than others, and we make recommendations for selecting a feature-replacement function. Finally, we introduce four search-based methods for identifying FI explanations and compare them to strong baselines, including LIME, Integrated Gradients, and random search. On experiments with six diverse text classification datasets, we find that the only method that consistently outperforms random search is a Parallel Local Search that we introduce. Improvements over the second-best method are as large as 5.4 points for Sufficiency and 17 points for Comprehensiveness. All supporting code is publicly available at https://github.com/peterbhase/ExplanationSearch.

翻訳日:2021-06-03 14:52:15 公開日:2021-06-01
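
The Sufficiency and Comprehensiveness metrics named in the abstract above can be sketched on a toy classifier. This is a minimal sketch under stated assumptions, not the authors' code: the bag-of-words model `WEIGHTS`, the `MASK` replacement token, the importance scores, and the sign convention (confidence on the full input minus confidence on the altered input) are all hypothetical choices made here. Replacing tokens with a mask is only one of several possible removal strategies.

```python
import math

# Hypothetical toy classifier: logistic regression over per-token weights.
WEIGHTS = {"great": 2.0, "fun": 1.0, "boring": -2.0}
MASK = "<mask>"  # stand-in replacement for a removed token (weight 0)

def confidence(tokens):
    """Model confidence in the positive class."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def top_k(tokens, importance, k):
    """Indices of the k tokens with the highest importance scores."""
    order = sorted(range(len(tokens)), key=lambda i: -importance[i])
    return set(order[:k])

def sufficiency(tokens, importance, k):
    """Confidence drop when only the top-k tokens are kept."""
    keep = top_k(tokens, importance, k)
    kept = [t if i in keep else MASK for i, t in enumerate(tokens)]
    return confidence(tokens) - confidence(kept)

def comprehensiveness(tokens, importance, k):
    """Confidence drop when the top-k tokens are removed."""
    drop = top_k(tokens, importance, k)
    rest = [MASK if i in drop else t for i, t in enumerate(tokens)]
    return confidence(tokens) - confidence(rest)

tokens = ["a", "great", "fun", "movie"]
importance = [0.0, 0.9, 0.4, 0.1]  # hypothetical feature-importance scores
```

With this convention, a good explanation has a Sufficiency drop near zero and a large Comprehensiveness drop. The masked inputs here are exactly the kind of counterfactuals the abstract warns may be out-of-distribution for a real model.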

# 不変政策学習:因果的視点

Invariant Policy Learning: A Causal Perspective ( http://arxiv.org/abs/2106.00808v1 )

ライセンス: Link先を確認

Sorawit Saengkyongam, Nikolaj Thams, Jonas Peters and Niklas Pfister

(参考訳) 過去10年間で、オンライン広告、レコメンダシステム、動的価格などの様々なインタラクティブな学習システムにおいて、文脈的帯域幅と強化学習アルゴリズムがうまく使われてきた。しかし、医療などの高度なアプリケーション領域では、まだ広く採用されていない。一つの理由は、既存のアプローチが、基盤となるメカニズムが、時間とともに異なる環境にまたがって変化しないという意味で静的であると仮定しているからかもしれない。しかし、多くの現実世界のシステムでは、メカニズムは静的環境の仮定を無効にする可能性のある環境にまたがるシフトの対象となる。本稿では,オフラインの文脈的帯域幅の枠組みの下での環境変化問題に対処する。我々は,因果関係のレンズを通して環境変化の問題を考察し,基盤メカニズムの変化を可能にするマルチ環境コンテキストバンディットを提案する。因果関係文献から不変性の概念を採用し,政策不変性の概念を導入する。政策不変性は、観測されていない共同創設者が存在する場合にのみ重要であり、その場合、ある仮定の下で最適な不変性が環境全体にわたって一般化されることを示す。本研究は,環境変化問題に対する解決策を提供するだけでなく,因果関係,不変性,文脈的バンディットの具体的関係を確立する。

In the past decade, contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over time or over different environments. In many real world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we tackle the problem of environmental shifts under the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved confounders are present and show that, in that case, an optimal invariant policy is guaranteed, under certain assumptions, to generalize across environments. Our results not only provide a solution to the environmental shift problem but also establish concrete connections among causality, invariance and contextual bandits.

翻訳日:2021-06-03 14:49:35 公開日:2021-06-01

# 不確かさ特性曲線:予測間隔の体系的評価

Uncertainty Characteristics Curves: A Systematic Assessment of Prediction Intervals ( http://arxiv.org/abs/2106.00858v1 )

ライセンス: Link先を確認

Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri

(参考訳) モデル不確実性の正確な定量化は、信頼できるAIの基本的な要件として長年認識されてきた。回帰タスクでは、不確実性は通常、特定の操作点に調整された予測間隔を用いて定量化され、異なる研究における評価と比較が困難になる。本研究は,(1)操作特性曲線の概念,(2)単純な参照よりも利得の概念を活用して,予測間隔に対する新たな操作点非依存評価手法を導出する。本稿では, 対応するアルゴリズムを記述し, 理論的解析を行い, 複数のシナリオでその有用性を実証する。提案手法は予測間隔の包括的評価の必要性に対処し,不確実性定量化ツールボックスの付加価値を示すものである。

Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to a specific operating point, making evaluation and comparison across different studies difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a simple reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper describes the corresponding algorithm, provides a theoretical analysis, and demonstrates its utility in multiple scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.

翻訳日:2021-06-03 14:49:16 公開日:2021-06-01
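
To make the notion of an "operating point" of a prediction interval concrete, the sketch below computes empirical coverage and mean interval width, then sweeps a scaling factor to obtain one (coverage, width) pair per operating point. It is not the authors' algorithm: the abstract does not define the axes of their curve, and the function `coverage_and_width`, the symmetric-interval assumption, and the data are all hypothetical.

```python
def coverage_and_width(y_true, y_pred, half_width, scale=1.0):
    """Fraction of targets inside [pred - scale*h, pred + scale*h]
    and the mean width of those intervals."""
    hits = 0
    total_width = 0.0
    for y, p, h in zip(y_true, y_pred, half_width):
        lo, hi = p - scale * h, p + scale * h
        hits += lo <= y <= hi  # True counts as 1
        total_width += hi - lo
    n = len(y_true)
    return hits / n, total_width / n

# Hypothetical regression targets, point predictions and interval half-widths.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.15, 2.5, 1.8, 4.0]
half_width = [0.2, 0.2, 0.5, 0.1]

# Sweeping the scale traces out one (coverage, width) pair per operating point.
scales = (0.5, 1.0, 2.0, 3.0)
curve = [coverage_and_width(y_true, y_pred, half_width, s) for s in scales]
```

Reporting the whole sweep rather than a single calibrated point is the general idea behind an operating-point-agnostic comparison: wider intervals buy coverage, and two methods can be compared along the full trade-off.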

# QLSD:ベイズ連邦学習のための量子Langevin確率力学

QLSD: Quantised Langevin stochastic dynamics for Bayesian federated learning ( http://arxiv.org/abs/2106.00797v1 )

ライセンス: Link先を確認

Maxime Vono, Vincent Plassier, Alain Durmus, Aymeric Dieuleveut, Eric Moulines

(参考訳) フェデレーション学習は、データが分散化され、複数のクライアントにローカルに保存された場合に、データオーナシップと通信オーバーヘッドという2つの主な制約の下で推論を実行することを目的としている。本稿では,これらの問題をベイズパラダイムのもとで扱う。この目的のために,確率勾配ランジュバンダイナミクスの量子化バージョンを基盤とした,QLSD と名付けた新しいマルコフ連鎖モンテカルロアルゴリズムを提案する。ビッグデータ領域での性能向上のために,QLSD$^\star$ および QLSD$^{++}$ と呼ばれる分散低減版を導入する。我々は,提案アルゴリズムの非漸近収束保証と漸近収束保証の両方を提供し,その利点を複数のフェデレート学習ベンチマークで示す。

Federated learning aims at conducting inference when data are decentralised and locally stored on several clients, under two main constraints: data ownership and communication overhead. In this paper, we address these issues under the Bayesian paradigm. To this end, we propose a novel Markov chain Monte Carlo algorithm coined QLSD, built upon quantised versions of stochastic gradient Langevin dynamics. To improve performance in a big data regime, we introduce variance-reduced alternatives of our methodology referred to as QLSD$^\star$ and QLSD$^{++}$. We provide both non-asymptotic and asymptotic convergence guarantees for the proposed algorithms and illustrate their benefits on several federated learning benchmarks.

翻訳日:2021-06-03 14:48:37 公開日:2021-06-01
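
The combination of Langevin dynamics with quantised client updates described in the abstract above can be illustrated with a generic one-dimensional sketch. This is not the QLSD algorithm itself, whose compression operator and variance-reduced variants are not specified in the abstract. The unbiased stochastic-rounding quantiser, the function names `quantise`, `qlsd_like_step` and `run_chain`, the two Gaussian client potentials, and all step sizes are assumptions made here.

```python
import math
import random

def quantise(value, step, rng):
    """Unbiased stochastic rounding of value to a multiple of step."""
    low = math.floor(value / step) * step
    p_up = (value - low) / step  # probability of rounding up
    return low + step if rng.random() < p_up else low

def qlsd_like_step(theta, client_grads, gamma, step, rng):
    """One Langevin update using the sum of quantised client gradients."""
    grad = sum(quantise(g(theta), step, rng) for g in client_grads)
    noise = rng.gauss(0.0, 1.0)
    return theta - gamma * grad + math.sqrt(2.0 * gamma) * noise

def run_chain(n_steps=20000, gamma=0.05, step=0.25, seed=0):
    """Sample from the posterior of two hypothetical clients whose
    potentials are U_i(theta) = (theta - mu_i)^2 / 2 with mu = 0 and 2."""
    rng = random.Random(seed)
    client_grads = [lambda t: t - 0.0, lambda t: t - 2.0]
    theta, samples = 0.0, []
    for _ in range(n_steps):
        theta = qlsd_like_step(theta, client_grads, gamma, step, rng)
        samples.append(theta)
    return samples
```

In this toy setting the target posterior is Gaussian with mean 1, so the sample mean of `run_chain()` should land near 1 even though each client only transmits gradients rounded to a coarse grid. Because the rounding is unbiased, it adds noise to the chain without shifting its mean.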

# ポリシーに基づく強化学習のためのエントロピー正規化自由機構

An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning ( http://arxiv.org/abs/2106.00707v1 )

ライセンス: Link先を確認

Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng

(参考訳) 政策に基づく強化学習手法は、政策崩壊問題に苦しむ。我々は,$\epsilon$-greedy機構を用いた価値ベースの強化学習手法が,クローズド・フォーム・ダイバーシティ,客観的不変探索,適応的トレードオフという3つの特徴を享受できることを示す。しかし、3つの特性をすべて達成するポリシーベース手法の並列メカニズムは存在しない。本稿では,閉じた形態の多様性,客観的不変な探索,適応的トレードオフを実現する政策に基づく手法のために設計されたエントロピー正規化自由機構を提案する。実験の結果,本機構は,政策に基づく手法では極めてサンプル効率が高く,アーケード学習環境における新たな最先端技術への政策ベースラインの強化が期待できることがわかった。

Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with the $\epsilon$-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there does not exist a parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy-regularization-free mechanism that is designed for policy-based methods and achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.

翻訳日:2021-06-03 14:46:15 公開日:2021-06-01
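
For readers unfamiliar with the $\epsilon$-greedy mechanism that the abstract above takes as its reference point, a minimal sketch follows. It shows only the standard value-based rule, not the mechanism the paper proposes; the function names and example values are chosen here for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def action_probabilities(q_values, epsilon):
    """Closed-form action distribution induced by epsilon-greedy selection."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs
```

The closed form shows that every action keeps probability at least `epsilon / n` regardless of the value estimates, so the behaviour policy cannot collapse onto a single action.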

# NLUデータ収集作業の難しさに対する効果的なクラウドソーシングプロトコルについて

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks? ( http://arxiv.org/abs/2106.00794v1 )

ライセンス: Link先を確認

Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman

(参考訳) クラウドソーシングは、共通の自然言語理解タスクのためのデータを作成するために広く使われている。これらのデータセットは、言語のモデル理解の測定と精細化において重要であるが、データセットの収集に使用されるクラウドソーシング手法にはほとんど焦点が当てられていない。本稿では,データ品質向上手法として,先行研究で提案された介入の有効性を比較した。複数項目の質問応答をテストベッドとして使用し、4つの異なるデータ収集プロトコルの1つで質問を書くようクラウドワーカーに割り当ててランダムに試行します。我々は,NLU例の難易度を高めるための非効率なスタンドアロン戦略として,実例の説明書を書くよう労働者に求めた。しかし,データ収集やフィードバックの送付,専門家の判断に基づく資格取得といった反復的なプロセスは,クラウドワーカーの育成に有効であることが判明した。しかし、専門家の判断ではなくクラウドソーシングを使って労働者を認定し、フィードバックを送ることは効果的ではない。専門家評価を伴う反復的プロトコルからのデータは、いくつかの尺度によりより困難である。特に、このデータの満場一致部分におけるヒューマンモデルギャップは、平均して、ベースラインプロトコルデータのギャップの2倍の大きさである。

Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used for collecting the datasets. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial by assigning crowdworkers to write questions under one of four different data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data. But using crowdsourced, instead of expert judgments, to qualify workers and send feedback does not prove to be effective. We observe that the data from the iterative protocol with expert assessments is more challenging by several measures. Notably, the human--model gap on the unanimous agreement portion of this data is, on average, twice as large as the gap for the baseline protocol data.

翻訳日:2021-06-03 14:44:41 公開日:2021-06-01

# 協調型非定常多変量ガウス過程モデル

Collaborative Nonstationary Multivariate Gaussian Process Model ( http://arxiv.org/abs/2106.00719v1 )

ライセンス: Link先を確認

Rui Meng, Herbie Lee, Kristofer Bouchard

(参考訳) 現在、マルチ出力ガウス過程回帰モデルは非定常性をモデル化しないか、あるいは厳しい計算負荷とストレージ要求に関連付けられている。非定常多変量ガウス過程モデル (NMGP) は、入力依存線形モデルを持つ非定常共分散関数を用いて、入力依存相関、スケール、出力の滑らかさを共同でモデル化する。変分スパース近似は、スケーラブルな計算を可能にするために点の誘導に依存する。そこで我々は,NMGPにおける潜在関数の変動フレームワークを誘導することを考えると,協調的非定常ガウス過程モデル(CNMGP)と呼ばれる新しいモデルを提案する。 cnmgpでは, 2倍の確率的変分推論が可能な計算可能な変分境界を導出する。これにより、出力が共通の入力セットを共有していないデータを、入力と出力のサイズに依存しない計算複雑性でモデル化することができる。本稿では,合成データと3つの実データを用いた手法の性能を概ね示し,そのモデルが概して最先端の予測性能よりも優れた予測性能を示すとともに,出力間で異なる時間変動相関の見積もりを提供する。

Currently, multi-output Gaussian process regression models either do not model nonstationarity or are associated with severe computational burdens and storage demands. Nonstationary multivariate Gaussian process models (NMGP) use a nonstationary covariance function with an input-dependent linear model of coregionalisation to jointly model input-dependent correlation, scale, and smoothness of outputs. Variational sparse approximation relies on inducing points to enable scalable computations. Here, we take the best of both worlds: considering an inducing variable framework on the underlying latent functions in NMGP, we propose a novel model called the collaborative nonstationary Gaussian process model (CNMGP). For CNMGP, we derive computationally tractable variational bounds amenable to doubly stochastic variational inference. Together, this allows us to model data in which outputs do not share a common input set, with a computational complexity that is independent of the size of the inputs and outputs. We illustrate the performance of our method on synthetic data and three real datasets and show that our model generally provides better predictive performance than the state-of-the-art, and also provides estimates of time-varying correlations that differ across outputs.

翻訳日:2021-06-03 14:43:41 公開日:2021-06-01

# 深層学習コンテストにおける反省会--シンプソンのパラドックスとスケールメトリクスと形状メトリクスの相補的役割

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics ( http://arxiv.org/abs/2106.00734v1 )

ライセンス: Link先を確認

Charles H. Martin and Michael W. Mahoney

(参考訳) 最先端ニューラルネットワーク(NN)モデルにおける優れた一般化性能の原因をよりよく理解するために,NNの一般化精度を予測するコンテスト向けに公開されたモデルのコーパスを分析する。これらのモデルは品質に幅があり,様々なアーキテクチャと正則化ハイパーパラメータで訓練されている。我々はシンプソンのパラドックスに相当する現象を特定した。すなわち,(従来の統計的学習理論に由来する)「スケール」メトリクスは全体としては良好に機能するが,正則化ハイパーパラメータを変化させた場合,深さを固定したデータの部分集合では性能が低い。一方,(Heavy-Tailed Self Regularization 理論に由来する)「形状」メトリクスは,深さを固定したモデルでハイパーパラメータを変化させた場合には部分集合上で良好に機能するが,深さの異なるモデルを集約すると全体としては性能が低い。この結果は,アーキテクチャとハイパーパラメータの両方が異なる場合にモデルを比較することの微妙さと,NNモデルの品質を理解する上での暗黙的スケールパラメータと暗黙的形状パラメータの相補的役割を浮き彫りにする。また,集約データに単一の指標を適用して因果的洞察を抽出しようとする際には注意が必要であること,さらに,最先端NNモデルの性能を記述するには,一般化理論の上界に基づく画一的な指標を超える必要があることを示している。これらの結果に基づき,データ非依存とデータ依存の2つの新しい形状メトリクスを提示する。これらは,アーキテクチャと深さを固定した一連のNNについて,ソルバのハイパーパラメータを変化させたときのテスト精度の傾向を予測できる。

To better understand the causes of good generalization performance in state-of-the-art neural network (NN) models, we analyze a corpus of models that was made publicly available for a contest to predict the generalization accuracy of NNs. These models include a wide range of qualities and were trained with a range of architectures and regularization hyperparameters. We identify what amounts to a Simpson's paradox: where "scale" metrics (from traditional statistical learning theory) perform well overall but perform poorly on subpartitions of the data of a given depth, when regularization hyperparameters are varied; and where "shape" metrics (from Heavy-Tailed Self Regularization theory) perform well on subpartitions of the data, when hyperparameters are varied for models of a given depth, but perform poorly overall when models with varying depths are aggregated. Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied, as well as the complementary role of implicit scale versus implicit shape parameters in understanding NN model quality. Our results also suggest caution when one tries to extract causal insight with a single metric applied to aggregate data, and they highlight the need to go beyond one-size-fits-all metrics based on upper bounds from generalization theory to describe the performance of state-of-the-art NN models. Based on these findings, we present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs, of a fixed architecture/depth, when varying solver hyperparameters.

翻訳日:2021-06-03 14:43:19 公開日:2021-06-01
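
The Simpson's paradox described in the abstract above can be reproduced with a few invented numbers. The sketch below is purely illustrative: the two groups (`shallow`, `deep`) and their (metric, test accuracy) pairs are made up here and are not taken from the contest data. Within each depth the metric is negatively correlated with accuracy, yet the correlation becomes strongly positive once the depths are pooled.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical (metric value, test accuracy) pairs for two model depths.
shallow = [(1.0, 0.62), (2.0, 0.61), (3.0, 0.60)]
deep = [(6.0, 0.92), (7.0, 0.91), (8.0, 0.90)]

def corr(pairs):
    """Correlation between metric and accuracy over a list of pairs."""
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

within_shallow = corr(shallow)   # negative within the group
within_deep = corr(deep)         # negative within the group
aggregated = corr(shallow + deep)  # positive once groups are pooled
```

The reversal arises because depth drives both the metric and the accuracy, which is why a single metric applied to aggregate data can mislead about what happens when only hyperparameters change.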

# 時間近傍符号化による時系列の教師なし表現学習

Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding ( http://arxiv.org/abs/2106.00750v1 )

ライセンス: Link先を確認

Sana Tonekaboni, Danny Eytan, Anna Goldenberg

(参考訳) 時系列はしばしば複雑で情報に富んでいるが、わずかにラベル付けされているためモデル化が難しい。本稿では,非定常時系列の一般化表現を学習するための自己教師付きフレームワークを提案する。我々の手法は、TNC(Temporal Neighborhood Coding)と呼ばれ、信号の生成過程の局所的滑らかさを利用して、定常特性のある近傍を定義する。偏りのある対比目的を用いて, 符号化空間において, 近傍からの信号の分布と非隣接信号の分布を区別できることを保証することにより, 時系列表現を学習する。我々のモチベーションは、時系列データのダイナミックな性質をモデル化する能力が、ラベル付けデータが事実上不可能な環境で患者の潜伏状態を特定し、追跡し、予測するのに特に有用である医療分野に起因している。提案手法と最近開発された教師なし表現学習手法を比較し,クラスタリングおよび複数のデータセットの分類タスクにおける優れた性能を示す。

Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.

翻訳日:2021-06-03 14:42:51 公開日:2021-06-01

# 入力凸ニューラルネットワークを用いた確率空間上の関数の最適化

Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks ( http://arxiv.org/abs/2106.00774v1 )

ライセンス: Link先を確認

David Alvarez-Melis, Yair Schiff, Youssef Mroueh

(参考訳) グラディエントフローは、ワッサーシュタイン計量によって与えられる確率の空間を含む一般計量空間における関数を最適化するための強力なツールである。この最適化問題を解決する典型的なアプローチは、最適輸送の動的定式化と有名なjordan-kinderlehrer-otto (jko) スキームとの関係に依存している。しかし、この定式化は凸関数の最適化を伴い、特に高次元では困難である。本研究では,最近導入された入力凸ニューラルネットワーク(ICNN)を用いて,JKOスキームを近似するために凸関数の空間をパラメータ化し,収束保証を享受する尺度よりも関数を設計する手法を提案する。我々は、このJKO-ICNNフレームワークの計算効率の良い実装を導き、その実現可能性と、既知の解を用いた低次元偏微分方程式の近似解の妥当性と妥当性を示す。また、分子発見のための制御生成実験により、JKO-ICNNアプローチを高次元に利用することについても検討する。

Gradient flows are a powerful tool for optimizing functionals in general metric spaces, including the space of probabilities endowed with the Wasserstein metric. A typical approach to solving this optimization problem relies on its connection to the dynamic formulation of optimal transport and the celebrated Jordan-Kinderlehrer-Otto (JKO) scheme. However, this formulation involves optimization over convex functions, which is challenging, especially in high dimensions. In this work, we propose an approach that relies on the recently introduced input-convex neural networks (ICNN) to parameterize the space of convex functions in order to approximate the JKO scheme, as well as in designing functionals over measures that enjoy convergence guarantees. We derive a computationally efficient implementation of this JKO-ICNN framework and use various experiments to demonstrate its feasibility and validity in approximating solutions of low-dimensional partial differential equations with known solutions. We also explore the use of our JKO-ICNN approach in high dimensions with an experiment in controlled generation for molecular discovery.

翻訳日:2021-06-03 14:40:54 公開日:2021-06-01
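
For reference, the JKO scheme named in the abstract above is usually written as a sequence of proximal steps in Wasserstein space. The block below states that standard form, followed by a reparameterisation over convex potentials of the kind an input-convex network could represent. The second formula is a sketch based on Brenier's theorem and assumes $\rho_k$ is absolutely continuous; the symbols $\tau$ (step size), $F$ (the functional) and $\psi$ (convex potential) are notation chosen here, and the abstract does not confirm that the paper uses exactly this form.

```latex
% JKO step with step size \tau > 0, starting from the current measure \rho_k
\rho_{k+1} \in \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
  \; F(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k)

% Reparameterisation over convex potentials \psi (assumes \rho_k is absolutely continuous)
\psi_{k+1} \in \operatorname*{arg\,min}_{\psi \ \text{convex}}
  \; F\big((\nabla \psi)_{\#} \rho_k\big)
  + \frac{1}{2\tau} \int \lVert x - \nabla \psi(x) \rVert^2 \, d\rho_k(x),
\qquad \rho_{k+1} = (\nabla \psi_{k+1})_{\#} \rho_k
```

The second form replaces an optimization over measures with one over convex functions, which is the step where a parametric family of convex functions becomes useful.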

# 機械学習のための重み付けベクトル:境界検出に適用した数値調和解析

Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection ( http://arxiv.org/abs/2106.00827v1 )

ライセンス: Link先を確認

Eric Bunch, Jeffery Kline, Daniel Dickinson, Suhaas Bhat, Glenn Fung

(参考訳) 計量空間マグニチュード(metric space magnitude)は、代数トポロジーで活発に研究されているスカラー量であり、一般的な計量空間に存在する相異なる点の有効個数を要約する。重み付けベクトル(weighting vector)は密接に関連する概念であり、元の計量空間の基礎となる幾何学の多くを非自明な形で捉える。最近の研究は、計量空間がユークリッドであるとき、重み付けベクトルが境界検出の有効なツールであることを示した。我々はこの結果を再定式化し、重み付けベクトルをカーネル化されたSVMの解と見なせることを示す。その帰結の一つとして、この新たな洞察を外れ値検出タスクに適用し、ベンチマークデータセットにおいて最先端技術と同等以上の性能を示す。穏やかな仮定の下では、行列反転の計算コストを持つ重み付けベクトルを線形時間で効率的に近似できることを示す。SVMが定義する最小化問題の解を、最近傍法がどのように近似できるかを示す。

Metric space magnitude, an active field of research in algebraic topology, is a scalar quantity that summarizes the effective number of distinct points that live in a general metric space. The weighting vector is a closely related concept that captures, in a nontrivial way, much of the underlying geometry of the original metric space. Recent work has demonstrated that when the metric space is Euclidean, the weighting vector serves as an effective tool for boundary detection. We recast this result and show that the weighting vector may be viewed as a solution to a kernelized SVM. As one consequence, we apply this new insight to the task of outlier detection, and we demonstrate performance that is competitive with, or exceeds, that of state-of-the-art techniques on benchmark data sets. Under mild assumptions, we show that the weighting vector, which has the computational cost of a matrix inversion, can be efficiently approximated in linear time. We show how nearest neighbor methods can approximate solutions to the minimization problems defined by SVMs.

翻訳日:2021-06-03 14:40:36 公開日:2021-06-01
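
The abstract above does not spell out the definition of the weighting vector, so the sketch below uses the standard one from the magnitude literature: for points $x_1, \dots, x_n$ with similarity matrix $Z_{ij} = e^{-d(x_i, x_j)}$, the weighting vector $w$ solves $Zw = \mathbf{1}$, and the magnitude is $\sum_i w_i$. The plain Gaussian-elimination solver and the five-point example are choices made here for a self-contained illustration; this is exact inversion, not the linear-time approximation the paper describes.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def weighting_vector(points):
    """Solve Z w = 1 with Z[i][j] = exp(-||x_i - x_j||) (Euclidean distance)."""
    z = [[math.exp(-math.dist(p, q)) for q in points] for p in points]
    return solve(z, [1.0] * len(points))

# Five evenly spaced points on a line: the two endpoints form the boundary.
points = [(float(i),) for i in range(5)]
w = weighting_vector(points)
magnitude = sum(w)  # "effective number of points"
```

For this example the endpoint weights come out larger than the interior weights, which is the boundary-detection behaviour the abstract refers to.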

# ニューラルネットワークモデルにおける意味の含意表現

Implicit Representations of Meaning in Neural Language Models ( http://arxiv.org/abs/2106.00737v1 )

ライセンス: Link先を確認

Belinda Z. Li, Maxwell Nye, Jacob Andreas

(参考訳) ニューラルランゲージモデルの有効性は、表層単語共起統計の正確なモデリングから完全に導かれるのか、それとも、これらのモデルが彼らが記述した世界と理性を表すのか? BARTおよびT5トランスフォーマー言語モデルでは、会話を通して進化するエンティティや状況のモデルとして機能する文脈的単語表現を識別する。これらのニューラル表現は、動的意味論の言語モデルと機能的類似性を持ち、それぞれのエンティティの現在の特性と関係の線形な読み出しをサポートし、言語生成に予測可能な効果で操作できる。その結果,少なくとも部分的には,意味の動的表現と実体状態の暗黙的シミュレーションによって,事前学習されたニューラルネットワークモデルの予測がサポートされ,学習データとしてテキストだけで学習できることがわかった。コードとデータはhttps://github.com/belindal/state-probesで入手できる。

Does the effectiveness of neural language models derive entirely from accurate modeling of surface word co-occurrence statistics, or do these models represent and reason about the world they describe? In BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity's current properties and relations, and can be manipulated with predictable effects on language generation. Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data. Code and data are available at https://github.com/belindal/state-probes.

翻訳日:2021-06-03 14:37:37 公開日:2021-06-01

# DYPLOC:テキスト生成のための混合言語モデルを用いたコンテンツの動的計画

DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation ( http://arxiv.org/abs/2106.00791v1 )

ライセンス: Link先を確認

Xinyu Hua, Ashwin Sreevatsa, and Lu Wang

(参考訳) 我々は,少なくとも2つの異なる課題に直面する長文意見テキスト生成の課題について検討する。まず、既存のニューラルジェネレーションモデルはコヒーレンスに欠けており、効率的なコンテンツプランニングが必要である。第二に、主観的コンテンツと客観的コンテンツの両方をカバーするようにジェネレータを導くには、多様な種類の情報が必要である。そこで本研究では,混合言語モデルの新たな設計に基づく出力生成をしながら,コンテンツの動的計画を行う生成フレームワークdyplocを提案する。多様なコンテンツで生成を豊かにするために、より大規模な事前学習モデルを用いて関連する概念を予測し、クレームを生成することを提案する。我々は,(1)Reddit ChangeMyViewを用いた引数生成,(2)New York Timesのオピニオンセクションを用いた記事作成という,新たに収集されたデータセットに関する2つの課題を実験した。自動評価は,本モデルが競合比較を著しく上回ることを示す。人間の審査員は、われわれの世代がよりリッチなコンテンツに忠実であることをさらに確認する。

We study the task of long-form opinion text generation, which faces at least two distinct challenges. First, existing neural generation models fall short of coherence, thus requiring efficient content planning. Second, diverse types of information are needed to guide the generator to cover both subjective and objective content. To this end, we propose DYPLOC, a generation framework that conducts dynamic planning of content while generating the output based on a novel design of mixed language models. To enrich the generation with diverse content, we further propose to use large pre-trained models to predict relevant concepts and to generate claims. We experiment with two challenging tasks on newly collected datasets: (1) argument generation with Reddit ChangeMyView, and (2) writing articles using New York Times' Opinion section. Automatic evaluation shows that our model significantly outperforms competitive comparisons. Human judges further confirm that our generations are more coherent with richer content.

翻訳日:2021-06-03 14:37:22 公開日:2021-06-01

# CoRI:オープン情報抽出のためのデータ拡張と集合関係の統合

CoRI: Collective Relation Integration with Data Augmentation for Open Information Extraction ( http://arxiv.org/abs/2106.00793v1 )

ライセンス: Link先を確認

Zhengbao Jiang, Jialong Han, Bunyamin Sisman, Xin Luna Dong

(参考訳) Webから抽出された知識を知識グラフ(KG)に統合することで、質問応答のような作業が容易になる。本研究では,対象kgにおける関係関係に対する主観-関係-対象抽出における自由テキスト関係の整合を目的とした関係統合について検討する。自由テキスト関係が曖昧であるという課題に対処するために、以前の方法は、追加の文脈で隣のエンティティとリレーションを利用する。しかし、予測は独立して行われ、相互に矛盾する可能性がある。本稿では,第1段階が個別に候補予測を行い,第2段階が全ての候補予測にアクセスしてグローバルにコヒーレントな予測を行う2段階集団関係統合(cori)モデルを提案する。さらに、未使用のターゲットKGの一部からデータを付加することで、集合モデルをさらに改善する。 2つのデータセットの実験結果から、CoRIはベースラインを大幅に上回り、AUCは.677から.748に、AUCは.716から.780に改善された。

Integrating extracted knowledge from the Web into knowledge graphs (KGs) can facilitate tasks like question answering. We study relation integration that aims to align free-text relations in subject-relation-object extractions to relations in a target KG. To address the challenge that free-text relations are ambiguous, previous methods exploit neighbor entities and relations for additional context. However, the predictions are made independently and can be mutually inconsistent. We propose a two-stage Collective Relation Integration (CoRI) model, where the first stage independently makes candidate predictions, and the second stage employs a collective model that accesses all candidate predictions to make globally coherent predictions. We further improve the collective model with augmented data from the portion of the target KG that is otherwise unused. Experimental results on two datasets show that CoRI can significantly outperform the baselines, improving AUC from .677 to .748 and from .716 to .780, respectively.

翻訳日:2021-06-03 14:37:06 公開日:2021-06-01

# 英語以外のクレームマッチングによるグローバルなファクトチェックのスケールアップ

Claim Matching Beyond English to Scale Global Fact-Checking ( http://arxiv.org/abs/2106.00853v1 )

ライセンス: Link先を確認

Ashkan Kazemi, Kiran Garimella, Devin Gaffney and Scott A. Hale

(参考訳) 手動の事実チェックは、インターネットのニーズを満たすためにうまくスケールしない。この問題は英語以外の文脈でさらに複雑になる。本稿では,ファクトチェックをスケールする手段として,クレームマッチングについて論じる。我々は、クレームマッチングを、1つのファクトチェックで提供可能なクレームを含むテキストメッセージのペアを特定するタスクとして定義する。我々は、WhatsAppのチップラインと公開グループメッセージのデータセットを、ファクトチェックされたクレームとともに構築し、最初に“claim-like statement”を含むアノテートされ、潜在的に類似したアイテムとマッチし、クレームマッチングのためのアノテートされる。我々のデータセットには、高リソース(英語、ヒンディー語)と低リソース(ベンガル語、マラヤラム語、タミル語)のコンテンツが含まれています。データセット内の低リソース言語と高リソース言語間の品質の不均衡に対処するため、知識の蒸留と高品質な"教師"モデルを使って、独自の組込みモデルをトレーニングします。本稿では,本ソリューションの性能評価を行い,ベースラインと既存の多言語埋め込みモデルであるLASERとLaBSEと比較する。すべての設定において、私たちのパフォーマンスがLASERとLaBSEを超えていることを示します。アノテーション付きデータセット、コードブック、トレーニングされた埋め込みモデルをリリースし、さらなる研究を可能にします。

Manual fact-checking does not scale well to serve the needs of the internet. This issue is further compounded in non-English contexts. In this paper, we discuss claim matching as a possible solution to scale fact-checking. We define claim matching as the task of identifying pairs of textual messages containing claims that can be served with one fact-check. We construct a novel dataset of WhatsApp tipline and public group messages alongside fact-checked claims that are first annotated for containing "claim-like statements" and then matched with potentially similar items and annotated for claim matching. Our dataset contains content in high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages. We train our own embedding model using knowledge distillation and a high-quality "teacher" model in order to address the imbalance in embedding quality between the low- and high-resource languages in our dataset. We provide evaluations on the performance of our solution and compare with baselines and existing state-of-the-art multilingual embedding models, namely LASER and LaBSE. We demonstrate that our performance exceeds LASER and LaBSE in all settings. We release our annotated datasets, codebooks, and trained embedding model to allow for further research.

翻訳日:2021-06-03 14:36:48 公開日:2021-06-01

# 小さなトレーニングハイパースペクトルデータを用いた高密度森林における樹木種マッピングのためのマルチタスク完全畳み込みネットワーク

Multi-task fully convolutional network for tree species mapping in dense forests using small training hyperspectral data ( http://arxiv.org/abs/2106.00799v1 )

ライセンス: Link先を確認

Laura Elena Cué La Rosa, Camile Sothe, Raul Queiroz Feitosa, Cláudia Maria de Almeida, Marcos Benedito Schimalski, Dario Augusto Borges Oliveira

(参考訳) 本研究は,超スペクトルuavデータを用いた多角形アノテーションによる密林の樹種マッピングのためのマルチタスク完全畳み込みアーキテクチャを提案する。本モデルでは, 樹冠境界制約を強制し, モデル性能を大幅に改善する距離回帰補完タスクを, 非密度トレーニングサンプルから高密度ツリーセマンティックラベリング結果を実現する部分損失関数を実装した。我々のマルチタスクアーキテクチャは、タスクと2つのタスク固有のデコーダの共通表現を学習する共有バックボーンネットワークを用いており、ひとつはセマンティックセグメンテーション出力、もう一つは距離マップレグレッションである。補完課題の導入により, 熱帯林の樹木種分類において, 総合F1スコア87.5%, 総合精度85.9%のセマンティックセマンティックセマンティクス性能が10%向上し, 木種分類の最先端性能が達成されたことを報告した。

This work proposes a multi-task fully convolutional architecture for tree species mapping in dense forests from sparse and scarce polygon-level annotations using hyperspectral UAV-borne data. Our model implements a partial loss function that enables dense tree semantic labeling outcomes from non-dense training samples, and a distance regression complementary task that enforces tree crown boundary constraints and substantially improves the model performance. Our multi-task architecture uses a shared backbone network that learns common representations for both tasks and two task-specific decoders, one for the semantic segmentation output and one for the distance map regression. We report that introducing the complementary task boosts the semantic segmentation performance by up to 10% compared to the single-task counterpart, reaching an overall F1 score of 87.5% and an overall accuracy of 85.9% and achieving state-of-the-art performance for tree species classification in tropical forests.
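One common way to realize a "partial" loss of the kind described above is to average the cross-entropy only over annotated pixels. The sketch below assumes that reading; the `IGNORE` marker, the function name, and the array layout are hypothetical and not taken from the paper.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels that carry no annotation

def partial_cross_entropy(logits, labels):
    """Mean cross-entropy over annotated pixels only.

    logits: (H, W, C) raw class scores.
    labels: (H, W) integer class ids, IGNORE where no label exists.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    mask = labels != IGNORE
    if not mask.any():
        return 0.0
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    labeled_log_probs = log_probs[mask]                  # (K, C)
    labeled_classes = labels[mask]                       # (K,)
    picked = labeled_log_probs[np.arange(len(labeled_classes)), labeled_classes]
    return float(-picked.mean())
```

Because unlabeled pixels are masked out, predictions there do not influence the loss, yet the network still produces a dense label map at inference time.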

翻訳日:2021-06-03 14:29:19 公開日:2021-06-01

# nnDetection:医療対象検出のための自己設定方法

nnDetection: A Self-configuring Method for Medical Object Detection ( http://arxiv.org/abs/2106.00817v1 )

ライセンス: Link先を確認

Michael Baumgartner, Paul F. Jaeger, Fabian Isensee, Klaus H. Maier-Hein

(参考訳) 医学画像における物体の同時局所化と分類は、医学的対象検出とも呼ばれ、診断決定は、例えばピクセルではなく対象の評価に依存することが多いため、高い臨床関連性を有する。このタスクでは、メソッド構成の面倒で反復的なプロセスが大きな研究ボトルネックとなります。近年、nnU-Netは画像分割の課題に対して大きな成功を収めている。 nnU-Netのアジェンダに従って,医療用オブジェクト検出の構成プロセスを体系化し,自動化する。結果の自己設定方法であるnnDetectionは、手動による介入なしに、任意の医学的検出問題に適応し、最先端に匹敵する、あるいはそれを上回る結果を得る。我々は,ADAM と LUNA16 の2つの公開ベンチマークにおいて nnDetection の有効性を実証し,総合的手法評価のために、さらに10の医療用物体検出タスクを提案する。コードはhttps://github.com/MIC-DKFZ/nnDetection にある。

Simultaneous localisation and categorization of objects in medical images, also referred to as medical object detection, is of high clinical relevance because diagnostic decisions often depend on rating of objects rather than, e.g., pixels. For this task, the cumbersome and iterative process of method configuration constitutes a major research bottleneck. Recently, nnU-Net has tackled this challenge for the task of image segmentation with great success. Following nnU-Net's agenda, in this work we systematize and automate the configuration process for medical object detection. The resulting self-configuring method, nnDetection, adapts itself without any manual intervention to arbitrary medical detection problems while achieving results on par with or superior to the state-of-the-art. We demonstrate the effectiveness of nnDetection on two public benchmarks, ADAM and LUNA16, and propose 10 further medical object detection tasks on public data sets for comprehensive method evaluation. Code is at https://github.com/MIC-DKFZ/nnDetection .

翻訳日:2021-06-03 14:29:00 公開日:2021-06-01

# 大規模ワッサースタイン勾配流れ

Large-Scale Wasserstein Gradient Flows ( http://arxiv.org/abs/2106.00736v1 )

ライセンス: Link先を確認

Petr Mokrov, Alexander Korotin, Lingxiao Li, Aude Genevay, Justin Solomon, Evgeny Burnaev

(参考訳) ワッサーシュタイン勾配流は多くの拡散方程式を理解・解く強力な手段を提供する。具体的には、確率測度の拡散をモデル化するフォッカー・プランク方程式は、ワッサーシュタイン空間におけるエントロピー汎函数の勾配勾配として理解することができる。この同値性はjordan、kinderlehrer、ottoによって導入され、いわゆるjkoスキームに触発され、ワッサーシュタイン空間の勾配流の暗黙の離散化を通じてこれらの拡散過程を近似した。しかし、各JKOステップに関連する最適化問題を解くことは、深刻な計算上の課題をもたらす。機械学習アプリケーションを対象として,Wasserstein勾配流を近似するスケーラブルな手法を提案する。提案手法は, 確率勾配降下により最適化できるJKOステップを識別するために, 入力凸ニューラルネットワーク(ICNN)に依存する。従来の研究と異なり, この手法では領域離散化や粒子シミュレーションは不要である。その結果、拡散の各時間ステップにおける測度からサンプルを採取し、その確率密度を計算することができる。フォッカー・プランク方程式に従う拡散を計算し,非正規化密度サンプリングや非線形フィルタリングに適用することにより,アルゴリズムの性能を実証する。

Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated to each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neural networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. Unlike previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. We demonstrate our algorithm's performance by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.
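For reference, the JKO step mentioned above is usually written in the following textbook form (this is not copied from the paper; the step size h, the potential \Phi, and the inverse temperature \beta are notation assumed here). The first line is the Fokker-Planck equation, the second the implicit step in Wasserstein space, and the third the free-energy functional being descended.

```latex
\frac{\partial \rho}{\partial t} = \nabla \cdot \left( \rho \, \nabla \Phi \right) + \beta^{-1} \Delta \rho ,
\qquad
\rho_{k+1} = \operatorname*{arg\,min}_{\rho} \; F(\rho) + \frac{1}{2h} \, W_2^2(\rho, \rho_k) ,
\qquad
F(\rho) = \int \Phi(x) \, \rho(x) \, \mathrm{d}x + \beta^{-1} \int \rho(x) \log \rho(x) \, \mathrm{d}x .
```

As h tends to zero, the sequence \rho_k approximates the solution of the Fokker-Planck equation at times t = kh; the computational difficulty the abstract refers to lies in solving the minimization at every step.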

翻訳日:2021-06-03 14:26:11 公開日:2021-06-01

# ICDAR 2021 オンライン署名検証に関するコンペティション

ICDAR 2021 Competition on On-Line Signature Verification ( http://arxiv.org/abs/2106.00739v1 )

ライセンス: Link先を確認

Ruben Tolosana, Ruben Vera-Rodriguez, Carlos Gonzalez-Garcia, Julian Fierrez, Santiago Rengifo, Aythami Morales, Javier Ortega-Garcia, Juan Carlos Ruiz-Garcia, Sergio Romero-Tapiador, Jiajia Jiang, Songxuan Lai, Lianwen Jin, Yecheng Zhu, Javier Galbally, Moises Diaz, Miguel Angel Ferrer, Marta Gomez-Barrero, Ilya Hodashinsky, Konstantin Sarin, Artem Slezkin, Marina Bardamova, Mikhail Svetlakov, Mohammad Saleem, Cintia Lia Szücs, Bence Kovari, Falk Pulsmeyer, Mohamad Wehbi, Dario Zanca, Sumaiya Ahmad, Sarthak Mishra and Suraiya Jabin

(参考訳) 本稿では,オンライン署名検証 (SVC 2021) に関する ICDAR 2021 コンペティションの枠組みと成果について述べる。 SVC 2021の目標は、一般的なシナリオ(オフィス/モバイル)におけるオンライン署名検証システムの限界を評価し、大規模なパブリックデータベースを通じて入力(スタイラス/フィンガー)を書くことである。競技では3つの異なるタスクが考慮され、各タスクにランダムと熟練した偽造が同時に考慮されるように、現実的なシナリオをシミュレートする。 svc 2021で得られた結果は,深層学習手法の可能性が高いことを証明した。特に、SVC 2021の最良のオンライン署名検証システムでは、EERの値は3.33%(Task 1)、 7.41%(Task2)、 6.04%(Task3)である。 SVC 2021は、現在進行中のコンペティションとして確立され、DeepSignDBやSVC2021_EvalDBといった大規模公開データベースと標準実験プロトコルを使用して、オープンな共通プラットフォームにおける最先端技術に対して、システムを簡単にベンチマークすることができる。

This paper describes the experimental framework and results of the ICDAR 2021 Competition on On-Line Signature Verification (SVC 2021). The goal of SVC 2021 is to evaluate the limits of on-line signature verification systems on popular scenarios (office/mobile) and writing inputs (stylus/finger) through large-scale public databases. Three different tasks are considered in the competition, simulating realistic scenarios as both random and skilled forgeries are simultaneously considered on each task. The results obtained in SVC 2021 prove the high potential of deep learning methods. In particular, the best on-line signature verification system of SVC 2021 obtained Equal Error Rate (EER) values of 3.33% (Task 1), 7.41% (Task 2), and 6.04% (Task 3). SVC 2021 will be established as an on-going competition, where researchers can easily benchmark their systems against the state of the art in an open common platform using large-scale public databases such as DeepSignDB and SVC2021_EvalDB, and standard experimental protocols.
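The Equal Error Rate (EER) reported above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a simple threshold sweep over hypothetical similarity scores (higher means "more likely genuine"); it is not the competition's official evaluation code, and the function name is an assumption.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # forgeries accepted at this threshold
        frr = np.mean(genuine < t)     # genuine signatures rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

With few scores the two error curves may not cross exactly, so the sketch returns the average of the two rates at the closest threshold.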

翻訳日:2021-06-03 14:24:19 公開日:2021-06-01

# 知覚画像の高分解能化のためのフーリエ空間損失

Fourier Space Losses for Efficient Perceptual Image Super-Resolution ( http://arxiv.org/abs/2106.00783v1 )

ライセンス: Link先を確認

Dario Fuoli, Luc Van Gool, and Radu Timofte

(参考訳) 多くの超解像モデル (SR) は高性能に最適化されているため、大きなモデルの複雑さのために効率が良くない。大規模モデルは実世界の応用では実用的ではないことが多いため、より効率的なモデルから高い知覚品質のSRを実現するために、新しい損失関数を研究・提案する。与えられた低複雑性ジェネレータネットワークの代表電力は、パラメータの最適セットに対する強いガイダンスによってのみ活用できる。提案した損失関数の適用のみで,最近導入された効率的なジェネレータアーキテクチャの性能向上が可能であることを示す。特に,フーリエ領域において直接動作する識別器アーキテクチャを設計し,対象のhf分布をよりよく一致させるため,フーリエ空間監督損失を用いて,地上真理画像から欠落した高周波(hf)コンテンツを復元する。フーリエ空間における損失の直接的強調は知覚的画質を著しく向上させると同時に,従来提案されていた損失関数と比較して高い復元品質を維持していることを示す。両方の表現がトレーニング中に相補的な情報を提供するので、空間領域と周波数領域の損失の組み合わせを利用してさらに性能を向上する。それに加えて、訓練されたジェネレータは、最先端の知覚的SR法である RankSRGAN と SRFlow よりも2.4倍、48倍高速である。

Many super-resolution (SR) models are optimized for high performance only and therefore lack efficiency due to large model complexity. As large models are often not practical in real-world applications, we investigate and propose novel loss functions to enable SR with high perceptual quality from much more efficient models. The representative power of a given low-complexity generator network can only be fully leveraged by strong guidance towards the optimal set of parameters. We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions. In particular, we use a Fourier space supervision loss for improved restoration of missing high-frequency (HF) content from the ground truth image and design a discriminator architecture working directly in the Fourier domain to better match the target HF distribution. We show that our losses' direct emphasis on the frequencies in Fourier space significantly boosts the perceptual image quality, while at the same time retaining high restoration quality in comparison to previously proposed loss functions for this task. The performance is further improved by utilizing a combination of spatial and frequency domain losses, as both representations provide complementary information during training. On top of that, the trained generator achieves results comparable with, and is 2.4x and 48x faster than, the state-of-the-art perceptual SR methods RankSRGAN and SRFlow, respectively.
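To make the idea of supervising in Fourier space concrete, here is a minimal sketch: an L1 distance between the 2D spectra of a prediction and its ground truth, plus a helper that measures energy above a frequency cutoff. This is an assumed simplification, not the loss defined in the paper; the function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def fourier_l1_loss(prediction, target):
    """Mean absolute difference between the 2D DFTs of two grayscale images."""
    pred_spec = np.fft.fft2(np.asarray(prediction, dtype=float))
    target_spec = np.fft.fft2(np.asarray(target, dtype=float))
    return float(np.mean(np.abs(pred_spec - target_spec)))

def high_frequency_energy(image, cutoff=0.25):
    """Spectral energy at normalized frequencies above `cutoff` (cycles/pixel)."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    mask = np.sqrt(fx ** 2 + fy ** 2) > cutoff
    return float(np.sum(np.abs(spectrum[mask]) ** 2))
```

A blurred reconstruction loses high-frequency energy relative to the ground truth, so a spectrum-level penalty responds directly to the missing detail, whereas a pixel-wise loss tends to average it away.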

翻訳日:2021-06-03 14:23:58 公開日:2021-06-01

# ボクセル化点雲幾何の無損失圧縮のための境界体積の精錬

Refining the bounding volumes for lossless compression of voxelized point clouds geometry ( http://arxiv.org/abs/2106.00828v1 )

ライセンス: Link先を確認

Emre Can Kaya, Sebastian Schwarz, Ioan Tabus

(参考訳) 本稿では, 点雲の体積のみを再構成することを目的とした, 最新の損失圧縮法を基にした, 点雲幾何学の新しい無損失圧縮法について述べる。提案手法は1つの投影方向に関連する2つの深度マップから幾何を部分的に再構成することから始まる。深度マップから得られた部分再構成は、一方向に沿って断面分割し、2つの深さマップに含まれない点を符号化することにより、点雲の完全な再構成に完成する。主成分は、入力データに存在する回転不変性を効率的に利用する新規な算術的3次元コンテキスト符号化手順により、内点(実現可能領域内)の一覧に基づく符号化である。ベンチマークデータセットでは、最先端のビット毎voxel結果が得られる。

This paper describes a novel lossless compression method for point cloud geometry, building on a recent lossy compression method that aimed at reconstructing only the bounding volume of a point cloud. The proposed scheme starts by partially reconstructing the geometry from the two depthmaps associated to a single projection direction. The partial reconstruction obtained from the depthmaps is completed to a full reconstruction of the point cloud by sweeping section by section along one direction and encoding the points which were not contained in the two depthmaps. The main ingredient is a list-based encoding of the inner points (situated inside the feasible regions) by a novel arithmetic three dimensional context coding procedure that efficiently utilizes rotational invariances present in the input data. State-of-the-art bits-per-voxel results are obtained on benchmark datasets.
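The first stage described above can be pictured on a voxel occupancy grid: project along one axis, record the nearest and farthest occupied voxel in every column (the two depth maps), and treat the remaining occupied voxels as the points still to be encoded. The sketch below assumes this reading; the function names and the -1 marker for empty columns are hypothetical, and the context-based arithmetic coder itself is not shown.

```python
import numpy as np

def depth_maps(occupancy):
    """Front and back depth maps along axis 2 of a boolean (X, Y, Z) grid.

    Columns without any occupied voxel are marked with -1 in both maps.
    """
    occ = np.asarray(occupancy, dtype=bool)
    depth = occ.shape[2]
    hit = occ.any(axis=2)
    front = np.where(hit, occ.argmax(axis=2), -1)
    back = np.where(hit, depth - 1 - occ[:, :, ::-1].argmax(axis=2), -1)
    return front, back

def residual_points(occupancy, front, back):
    """Occupied voxels that are not represented by either depth map."""
    occ = np.asarray(occupancy, dtype=bool).copy()
    xs, ys = np.nonzero(front >= 0)
    occ[xs, ys, front[xs, ys]] = False
    occ[xs, ys, back[xs, ys]] = False
    return np.argwhere(occ)
```

Every residual point lies strictly between the front and back surface of its column, which matches the abstract's notion of inner points inside the feasible region.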

翻訳日:2021-06-03 14:23:36 公開日:2021-06-01

# 物体マニフォールドのニューラルプロセッシングの統計力学

Statistical Mechanics of Neural Processing of Object Manifolds ( http://arxiv.org/abs/2106.00790v1 )

ライセンス: Link先を確認

SueYeon Chung

(参考訳) 不変物体認識は、脳が行う最も基本的な認知タスクの1つである。神経状態空間では、刺激変動を持つ異なる物体は異なる多様体として表現される。この幾何学的な観点では、オブジェクト認識は異なるオブジェクト多様体を線形に分離する問題となる。フィードフォワードの視覚階層では、オブジェクト多様体の表現は層全体に再フォーマットされ、より線形に分離可能であることが示唆されている。したがって、知覚の完全な理論は、可変神経応答から対象多様体を分類する線形読み出しネットワークの能力を特徴付ける必要がある。孤立点の知覚論は、E. Gardnerがこれを統計力学問題として定式化し、レプリカ理論を用いて解析した。本稿では、ガードナーの解析を一般化し、高次元信号の統計的および幾何学的性質を合成する多様体の線形分類の理論を確立する。次に、我々の理論をさらに一般化して、点雲のような一般的な知覚多様体の線形分類を行う。多様体のキャパシティは,有効半径, R_M, 有効次元, D_Mと決定される。最後に、相関多様体、異種多様体ジオメトリ、スパースラベル、非線形分類を含む実データへの応用に関する拡張を示す。次に、オブジェクトベース多様体が標準深層ネットワークでどのように変換されるかを示す。この論文は、対象の神経処理の計算理論の基礎を定め、対象多様体の線形分離性に関する定量的測度を提供する。この理論が、生体および人工ニューラルネットワークにおける感覚表現の処理の基礎となる計算原理に新たな洞察を与えることを期待している。

Invariant object recognition is one of the most fundamental cognitive tasks performed by the brain. In the neural state space, different objects with stimulus variabilities are represented as different manifolds. In this geometrical perspective, object recognition becomes the problem of linearly separating different object manifolds. In feedforward visual hierarchy, it has been suggested that the object manifold representations are reformatted across the layers, to become more linearly separable. Thus, a complete theory of perception requires characterizing the ability of linear readout networks to classify object manifolds from variable neural responses. A theory of the perceptron of isolated points was pioneered by E. Gardner who formulated it as a statistical mechanics problem and analyzed it using replica theory. In this thesis, we generalize Gardner's analysis and establish a theory of linear classification of manifolds synthesizing statistical and geometric properties of high dimensional signals. [..] Next, we generalize our theory further to linear classification of general perceptual manifolds, such as point clouds. We identify that the capacity of a manifold is determined by its effective radius, R_M, and effective dimension, D_M. Finally, we show extensions relevant for applications to real data, incorporating correlated manifolds, heterogeneous manifold geometries, sparse labels and nonlinear classifications. Then, we demonstrate how object-based manifolds transform in standard deep networks. This thesis lays the groundwork for a computational theory of neuronal processing of objects, providing quantitative measures for linear separability of object manifolds. We hope this theory will provide new insights into the computational principles underlying processing of sensory representations in biological and artificial neural networks.
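As background for the isolated-points case that the thesis generalizes, the classical capacity of a perceptron can be illustrated with Cover's function counting theorem (a standard result, not taken from this abstract): for P points in general position in N dimensions, the fraction of the 2^P labelings that a linear classifier through the origin can realize is 2^(1-P) times the sum of C(P-1, k) for k = 0..N-1. The function name below is an assumption.

```python
from math import comb

def separable_fraction(n_points, n_dims):
    """Fraction of dichotomies of points in general position that are linearly separable."""
    if n_points <= 0 or n_dims <= 0:
        raise ValueError("n_points and n_dims must be positive")
    count = sum(comb(n_points - 1, k) for k in range(n_dims))
    return 2.0 ** (1 - n_points) * count

# The fraction drops through 1/2 exactly at P = 2N, the classical capacity.
fractions = [separable_fraction(p, 10) for p in (10, 20, 40)]
```

The manifold theory in the thesis replaces each point with an object manifold, so the capacity additionally depends on manifold geometry (the effective radius R_M and effective dimension D_M named in the abstract).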

翻訳日:2021-06-03 14:22:49 公開日:2021-06-01

# 極端分類におけるラベル木の効率-精度トレードオフ

Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification ( http://arxiv.org/abs/2106.00730v1 )

ライセンス: Link先を確認

Tavor Z. Baharav, Daniel L. Jiang, Kedarnath Kolluri, Sujay Sanghavi, Inderjit S. Dhillon

(参考訳) Extreme Multi-label Classification (XMC) は、非常に大きなラベルセットから関連するラベルのサブセットでデータポイントをタグ付けできるモデルを学ぶことを目的としている。パーソナライズされたレコメンデーションや製品広告のような現実世界のeコマースアプリケーションは、XMC問題として定式化することができる。このようなアプリケーションでは、ラベルを木に整理し、ラベル数に対数的なトレーニングと推論時間を可能にするのが一般的なアプローチである。ラベルツリーが利用可能になったらモデルをトレーニングすることはよく研究されていますが、ツリーの構造を設計することは、まだよく理解されていない難しい作業であり、モデルのレイテンシと統計パフォーマンスの両方に劇的に影響を与えます。既存のツリー構築アプローチは、統計的なパフォーマンスにのみ最適化するか、レイテンシーに最適化される。我々は,両者の利益をトレードオフする中間操作点を構築するための効率的な情報理論インスパイアアルゴリズムを提案する。本アルゴリズムは,従来不可能であったこれらの目的間の補間を可能にする。 wiki-500kベンチマークデータセットでは、パラベルと同じ精度を維持しつつ、予測レイテンシのプロキシを最大28%削減できることを示した。電子商取引の顧客ログから得られたいくつかのデータセットでは、修正されたラベルツリーが、同じ精度を維持しながら、この予測レイテンシメトリックを最大20%改善することができます。最後に,デプロイモデルのレイテンシ向上を実現する上での課題について論じる。

Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real world e-commerce applications like personalized recommendations and product advertising can be formulated as XMC problems, where the objective is to predict for a user a small subset of items from a catalog of several million products. For such applications, a common approach is to organize these labels into a tree, enabling training and inference times that are logarithmic in the number of labels. While training a model once a label tree is available is well studied, designing the structure of the tree is a difficult task that is not yet well understood, and can dramatically impact both model latency and statistical performance. Existing approaches to tree construction fall at an extreme point, either optimizing exclusively for statistical performance, or for latency. We propose an efficient information theory inspired algorithm to construct intermediary operating points that trade off between the benefits of both. Our algorithm enables interpolation between these objectives, which was not previously possible. We corroborate our theoretical analysis with numerical results, showing that on the Wiki-500K benchmark dataset our method can reduce a proxy for expected latency by up to 28% while maintaining the same accuracy as Parabel. On several datasets derived from e-commerce customer logs, our modified label tree is able to improve this expected latency metric by up to 20% while maintaining the same accuracy. Finally, we discuss challenges in realizing these latency improvements in deployed models.

翻訳日:2021-06-03 14:21:43 公開日:2021-06-01

# マルチドメイン環境におけるc2意思決定を改善するための画像オーディオ符号化

Image-Audio Encoding to Improve C2 Decision-Making in Multi-Domain Environment ( http://arxiv.org/abs/2106.00787v1 )

ライセンス: Link先を確認

Piyush K. Sharma and Adrienne Raglin

(参考訳) 軍は、MDO(Multi- Domain Operation)におけるコミュニケーションと機敏性を改善する方法を調査している。 IoT(Internet of Things)が最近人気になったのは、パブリックドメインと政府ドメインだ。 MDOにおけるその使用は将来の戦場に革命をもたらし、戦略的優位性をもたらす可能性がある。この技術は軍事能力の活用を提供するが、不確実性と関連するリスクが問題となる。重要な疑問は、これらの不確実性に対処する方法だ。近年、あるデータ領域から別のデータ領域へ情報を変換するための情報カモフラージュが提案されている。これは比較的新しいアプローチであるため、このような変革の課題と、関連する不確実性の検出と対処方法、特に未知の未知の意思決定の改善について検討する。

The military is investigating methods to improve communication and agility in its multi-domain operations (MDO). The Internet of Things (IoT) has recently gained traction in public and government domains. Its usage in MDO may revolutionize future battlefields and may enable strategic advantage. While this technology offers leverage to military capabilities, it comes with challenges, one of which is uncertainty and the associated risk. A key question is how these uncertainties can be addressed. Recently published studies proposed information camouflage to transform information from one data domain to another. As this is a comparatively new approach, we investigate the challenges of such transformations and how the associated uncertainties, specifically unknown-unknowns, can be detected and addressed to improve decision-making.

翻訳日:2021-06-03 14:19:20 公開日:2021-06-01

# 分散型マルチエージェントq-learningによるuav基地局の省エネルギー配置最適化

Energy-aware placement optimization of UAV base stations via decentralized multi-agent Q-learning ( http://arxiv.org/abs/2106.00845v1 )

ライセンス: Link先を確認

Babatunji Omoniwa, Boris Galkin, Ivana Dusparic

(参考訳) 航空基地局(uav-bss)として機能する無人航空機は、ネットワーク需要の増加、既存のインフラの障害点、災害発生時の地上機器への無線接続を提供する。しかし、バッテリー容量の制限を考慮すると、長時間のカバー作業においてUAVのエネルギーを節約することは困難である。強化学習ベース(rl)アプローチは、これまで複数のuavのエネルギー利用を改善するために用いられてきたが、中央のクラウドコントローラは、エンドデバイスの位置に関する完全な知識を持っていると仮定されている。この仮定は、モバイルグラウンドデバイスを用いた動的ネットワーク環境では現実的ではない。この問題に対処するため,各UAV-BSには,地上機器との接続性を最大化し,エネルギー利用の向上を図る自律エージェントが備わっている。実験の結果,UAV-BSの連接する接地装置の数とエネルギー利用の最大化において,提案手法は集中型アプローチよりも有意に優れていた。

Unmanned aerial vehicles serving as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices in events of increased network demand, points-of-failure in existing infrastructure, or disasters. However, it is challenging to conserve the energy of UAVs during prolonged coverage tasks, considering their limited on-board battery capacity. Reinforcement learning-based (RL) approaches have previously been used to improve the energy utilization of multiple UAVs; however, they assume a central cloud controller with complete knowledge of the end-devices' locations, i.e., the controller periodically scans and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments with mobile ground devices. To address this problem, we propose a decentralized Q-learning approach, where each UAV-BS is equipped with an autonomous agent that maximizes the connectivity to ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms the centralized approaches in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.
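The decentralized setup described above can be sketched as one independent tabular Q-learning agent per UAV-BS, each keeping its own table and learning only from its own observations. The class below is a generic illustration of that idea; the state and action encodings, the reward, and all hyperparameter values are assumptions, not the paper's design.

```python
import numpy as np

class QAgent:
    """Independent tabular Q-learning agent (one instance per UAV-BS)."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning temporal-difference update."""
        td_target = reward + self.gamma * self.q[next_state].max()
        self.q[state, action] += self.alpha * (td_target - self.q[state, action])

# One agent per UAV; no central controller shares device locations between them.
agents = [QAgent(n_states=16, n_actions=5, seed=i) for i in range(3)]
```

In such a scheme a reward could, for example, combine the number of connected ground devices with a penalty on energy spent, although the exact reward used in the paper is not given in the abstract.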

翻訳日:2021-06-03 14:19:07 公開日:2021-06-01

# (参考訳) スパースグラフのサンプルFréchet平均はスパースである

The Sample Fréchet Mean of Sparse Graphs is Sparse ( http://arxiv.org/abs/2105.14397v2 )

ライセンス: CC BY 4.0

Daniel Ferguson, Francois G. Meyer

(参考訳) グラフからなる大規模なデータセットが利用可能になったことで、"グラフ値のランダム変数"の統計学習において、新しいツールを発明するというかつてない必要性が生じている。グラフのサンプルの「平均」を特徴づけるために、サンプルFréchet平均を計算することができる。サンプル平均はグラフサンプルの解釈可能な要約を与える必要があるので、サンプルの構造的性質がFréchet平均に伝達されると予想される。本稿では、次の基本的な問いに取り組む: サンプルFréchet平均は標本中のグラフの構造的性質を継承するのか? 具体的には、以下の結果を示す: スパースグラフの集合のサンプルFréchet平均はスパースである。グラフハミング距離とスペクトル隣接擬メトリックに対する結果は、非常に異なる議論を用いて証明する。実際に、サンプルFréchet平均のエッジ密度は、サンプル内のグラフのエッジ密度によって束縛されるという、より強い結果が証明される。この結果は、サンプルFréchet平均を推定するために用いられる方法にかかわらず、スパース性が、グラフサンプルからそのサンプルFréchet平均に伝達されうる遺伝的性質であることを保証している。

The availability of large datasets composed of graphs creates an unprecedented need to invent novel tools in statistical learning for "graph-valued random variables". To characterize the "average" of a sample of graphs, one can compute the sample Fréchet mean. Because the sample mean should provide an interpretable summary of the graph sample, one would expect that the structural properties of the sample be transmitted to the Fréchet mean. In this paper, we address the following foundational question: does the sample Fréchet mean inherit the structural properties of the graphs in the sample? Specifically, we prove the following result: the sample Fréchet mean of a set of sparse graphs is sparse. We prove the result for the graph Hamming distance, and the spectral adjacency pseudometric, using very different arguments. In fact, we prove a stronger result: the edge density of the sample Fréchet mean is bounded by the edge density of the graphs in the sample. This result guarantees that sparsity is a hereditary property, which can be transmitted from a graph sample to its sample Fréchet mean, irrespective of the method used to estimate the sample Fréchet mean.
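The definition used above can be checked by brute force on very small graphs: the sample Fréchet mean is any graph minimizing the sum of squared distances to the sample. The sketch below does this for the Hamming distance on labeled graphs encoded as 0/1 edge-indicator tuples. It enumerates all graphs on the node set, so it is only meant for a handful of nodes; the helper names are assumptions and this is not the authors' method.

```python
from itertools import combinations, product

def encode(edges, n_nodes):
    """Edge-indicator tuple over all node pairs, in lexicographic order."""
    edge_set = {tuple(sorted(e)) for e in edges}
    return tuple(int(pair in edge_set) for pair in combinations(range(n_nodes), 2))

def hamming(g, h):
    """Number of node pairs on which two graphs disagree."""
    return sum(a != b for a, b in zip(g, h))

def frechet_means(sample):
    """All graphs minimizing the sum of squared Hamming distances to the sample."""
    n_pairs = len(sample[0])
    best_cost, best = None, []
    for candidate in product((0, 1), repeat=n_pairs):
        cost = sum(hamming(candidate, g) ** 2 for g in sample)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [candidate]
        elif cost == best_cost:
            best.append(candidate)
    return best

sample = [encode([(0, 1)], 4), encode([(0, 1), (2, 3)], 4), encode([(0, 1)], 4)]
means = frechet_means(sample)
```

In this toy sample the unique minimizer is the single-edge graph, so the mean is no denser than the densest sample graph, in line with the edge-density bound stated in the abstract.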

翻訳日:2021-06-03 13:21:05 公開日:2021-06-01

# 周期gp:ガウス過程バンディットを用いた周期世界学習

Periodic-GP: Learning Periodic World with Gaussian Process Bandits ( http://arxiv.org/abs/2105.14422v2 )

ライセンス: Link先を確認

Hengrui Cai, Zhihao Cen, Ling Leng, Rui Song

(参考訳) 配車におけるドライバーの日々の需要や交通の動的な交通パターンなど、データが季節性を伴う場合に、様々な実世界のアプリケーションで発生する周期的環境における逐次的決定最適化を考える。本研究では,この季節法則を活用し,確率的周期世界を学ぶことに注力する。一般作用空間に対処するために,ガウス過程(GP)に基づくバンドイットを基本モデルとして,その柔軟性と一般性から用い,高信頼度境界に基づく周期的カーネルを用いた周期的GP法を提案する。理論的には、周期的定常モデルにおいて周期的核を明示的に特徴付けることにより、提案手法の新たな後悔のバウンドを与える。実験的に,提案アルゴリズムは,マドリードの交通汚染に対する合成データ実験と実データ応用の両方において,既存の手法を著しく上回っている。

We consider sequential decision optimization in a periodic environment, which occurs in a wide variety of real-world applications when the data involves seasonality, such as the daily demand of drivers in ride-sharing and dynamic traffic patterns in transportation. In this work, we focus on learning the stochastic periodic world by leveraging this seasonal law. To deal with the general action space, we use the bandit based on Gaussian process (GP) as the base model due to its flexibility and generality, and propose the Periodic-GP method with a temporal periodic kernel based on the upper confidence bound. Theoretically, we provide a new regret bound of the proposed method, by explicitly characterizing the periodic kernel in the periodic stationary model. Empirically, the proposed algorithm significantly outperforms the existing methods in both synthetic data experiments and a real data application on Madrid traffic pollution.
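A periodic kernel of the kind mentioned above can be sketched with the widely used exp-sine-squared form, k(t, t') = variance * exp(-2 sin^2(pi |t - t'| / period) / lengthscale^2). Whether the paper uses exactly this parameterization is an assumption; the function name and default values are illustrative only.

```python
import numpy as np

def periodic_kernel(t1, t2, period=24.0, lengthscale=1.0, variance=1.0):
    """Exp-sine-squared kernel matrix between two sets of time points."""
    t1 = np.asarray(t1, dtype=float)[:, None]
    t2 = np.asarray(t2, dtype=float)[None, :]
    s = np.sin(np.pi * np.abs(t1 - t2) / period)
    return variance * np.exp(-2.0 * s ** 2 / lengthscale ** 2)

# Observations exactly one period apart are treated as perfectly correlated.
K = periodic_kernel([0.0, 5.0], [24.0, 29.0, 12.0])
```

With such a kernel, an observation made at 8:00 today informs the GP posterior for 8:00 tomorrow just as strongly as for the same moment, which is how a seasonal pattern can be exploited by a GP bandit.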

翻訳日:2021-06-03 11:02:46 公開日:2021-06-01

# (参考訳) 自然災害評価のためのUAVデータセットの注意に基づくセマンティックセマンティックセグメンテーション

Attention Based Semantic Segmentation on UAV Dataset for Natural Disaster Damage Assessment ( http://arxiv.org/abs/2105.14540v2 )

ライセンス: CC BY 4.0

Tashnim Chowdhury, Maryam Rahnemoonfar

(参考訳) 気候変動による有害な影響には、世界中の強大で破壊的なハリケーンが含まれる。自然災害による被害を最小限に抑えるため、救助隊の計画を支援するため、建物や道路を含む地域の被害の異なる構造物の特定が不可欠である。セマンティックセグメンテーションは、画像の異なる部分を特定するのに役立つ。我々は,高分解能UAVデータセット上に,自己注意に基づくセマンティックセマンティックセマンティクスモデルを実装し,テストセットで約88%のMean IoUスコアを得る。その結果、人命を救うとともに経済損失を減らす自然災害被害評価に自己注意型スキームを使うことが示唆された。

The detrimental impacts of climate change include stronger and more destructive hurricanes happening all over the world. Identifying the different damaged structures of an area, including buildings and roads, is vital since it helps the rescue team plan its efforts to minimize the damage caused by a natural disaster. Semantic segmentation helps to identify different parts of an image. We implement a novel self-attention based semantic segmentation model on a high resolution UAV dataset and attain a mean IoU score of around 88% on the test set. The result encourages the use of self-attention schemes in natural disaster damage assessment, which can save human lives and reduce economic losses.
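The mean IoU score quoted above averages, over classes, the ratio of intersection to union between predicted and ground-truth pixels. A minimal sketch follows; skipping classes that appear in neither map is one common convention and an assumption here, as is the function name.

```python
import numpy as np

def mean_iou(prediction, target, n_classes):
    """Mean intersection-over-union across classes present in either label map."""
    prediction = np.asarray(prediction).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(n_classes):
        pred_c, target_c = prediction == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: leave it out of the average
        ious.append(np.logical_and(pred_c, target_c).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```

Unlike pixel accuracy, this metric penalizes a model that ignores small classes such as damaged roads, because each class contributes equally to the average.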

翻訳日:2021-06-03 09:18:59 公開日:2021-06-01

# (参考訳) 変分オートエンコーダ:調和的視点

Variational Autoencoders: A Harmonic Perspective ( http://arxiv.org/abs/2105.14866v2 )

ライセンス: CC BY 4.0

Alexander Camuto, Matthew Willetts

(参考訳) 本研究では,高調波解析の観点から変分オートエンコーダ(VAE)について検討する。 VAEの潜伏空間を様々な測度空間であるガウス空間として見ることにより、VAEのエンコーダ分散がVAEエンコーダとデコーダニューラルネットワークによってパラメータ化された関数の周波数内容を制御することを示す一連の結果を得る。特に、より大きなエンコーダ分散がこれらの関数の高周波含量を減少させることを示す。解析により,この分散の増大がvaeのデコーダネットワークにソフトリプシッツ制約を効果的に生じさせることを示した。さらに、VAEの入力にガウス雑音を加えることで、VAEエンコーダネットワークの周波数内容とリプシッツ定数をより細かく制御できることを示す。理論解析を支援するために、我々は、小さな完全連結ニューラルネットワークとより大きな畳み込みネットワークを用いたVAEの実験を行い、我々の理論が様々なニューラルネットワークアーキテクチャを実証した。

In this work we study Variational Autoencoders (VAEs) from the perspective of harmonic analysis. By viewing a VAE's latent space as a Gaussian Space, a variety of measure space, we derive a series of results that show that the encoder variance of a VAE controls the frequency content of the functions parameterised by the VAE encoder and decoder neural networks. In particular we demonstrate that larger encoder variances reduce the high frequency content of these functions. Our analysis allows us to show that increasing this variance effectively induces a soft Lipschitz constraint on the decoder network of a VAE, which is a core contributor to the adversarial robustness of VAEs. We further demonstrate that adding Gaussian noise to the input of a VAE allows us to more finely control the frequency content and the Lipschitz constant of the VAE encoder networks. To support our theoretical analysis we run experiments with VAEs with small fully-connected neural networks and with larger convolutional networks, demonstrating empirically that our theory holds for a variety of neural network architectures.
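The link between variance and frequency content can be illustrated with a standard fact about Gaussian smoothing, independent of the paper's specific theorems: averaging sin(omega * x) over Gaussian noise of standard deviation sigma leaves the same sinusoid scaled by exp(-sigma^2 * omega^2 / 2), so high frequencies are damped much more strongly than low ones. The sketch checks this numerically with Gauss-Hermite quadrature; the function name and node count are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_smoothed(f, x, sigma, n_nodes=80):
    """E[f(x + sigma * eps)] for eps ~ N(0, 1), via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)        # weight function exp(-t^2 / 2)
    x = np.asarray(x, dtype=float)
    values = f(x[..., None] + sigma * nodes)    # evaluate f at every shifted node
    return (values * weights).sum(axis=-1) / np.sqrt(2.0 * np.pi)

x = np.linspace(-1.0, 1.0, 5)
sigma = 0.5
low = gaussian_smoothed(lambda z: np.sin(1.0 * z), x, sigma)    # damped by exp(-0.125)
high = gaussian_smoothed(lambda z: np.sin(6.0 * z), x, sigma)   # damped by exp(-4.5)
```

Increasing sigma shrinks the high-frequency component far faster than the low-frequency one, which gives some intuition for why a larger encoder variance yields smoother learned functions.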

Translated: 2021-06-03 08:44:05 Published: 2021-06-01

# (Reference translation) SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning

SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning ( http://arxiv.org/abs/2105.14879v2 )

License: CC BY 4.0

Boyuan Zheng, Xiaoyu Yang, Yu-Ping Ruan, Zhenhua Ling, Quan Liu, Si Wei, Xiaodan Zhu

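(Reference translation) This paper introduces SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM). The shared task is designed to evaluate the ability of machines to represent and understand abstract concepts. Given a passage and a corresponding question, a participating system is expected to choose the correct answer from five candidate abstract concepts. Based on two typical definitions of abstractness, imperceptibility and nonspecificity, the task provides three subtasks for evaluating participating models. Specifically, Subtask 1 aims to evaluate how well a system can model concepts that cannot be directly perceived in the physical world. Subtask 2 focuses on a model's ability to comprehend nonspecific concepts located high in a hypernym hierarchy, given the context of a passage. Subtask 3 aims to provide insight into how well models generalize across the two types of abstractness. During the official SemEval-2021 evaluation period, we received 23 submissions to Subtask 1 and 28 to Subtask 2. Participating teams made a further 29 submissions to Subtask 3. The leaderboard and competition website are at https://competitions.codalab.org/competitions/26153. The data and baseline code are available at https://github.com/boyuanzheng010/SemEval2021-Reading-Comprehension-of-Abstract-Meaning.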

This paper introduces the SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM). This shared task is designed to help evaluate the ability of machines in representing and understanding abstract concepts. Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts in a cloze-style machine reading comprehension setup. Based on two typical definitions of abstractness, i.e., the imperceptibility and nonspecificity, our task provides three subtasks to evaluate the participating models. Specifically, Subtask 1 aims to evaluate how well a system can model concepts that cannot be directly perceived in the physical world. Subtask 2 focuses on models' ability in comprehending nonspecific concepts located high in a hypernym hierarchy given the context of a passage. Subtask 3 aims to provide some insights into models' generalizability over the two types of abstractness. During the SemEval-2021 official evaluation period, we received 23 submissions to Subtask 1 and 28 to Subtask 2. The participating teams additionally made 29 submissions to Subtask 3. The leaderboard and competition website can be found at https://competitions.codalab.org/competitions/26153. The data and baseline code are available at https://github.com/boyuanzheng010/SemEval2021-Reading-Comprehension-of-Abstract-Meaning.

Translated: 2021-06-03 08:24:41 Published: 2021-06-01

# (Reference translation) Exploring Sparse Expert Models and Beyond

Exploring Sparse Expert Models and Beyond ( http://arxiv.org/abs/2105.15082v2 )

License: CC BY 4.0

An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang

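(Reference translation) Mixture-of-Experts (MoE) models can achieve promising results with an enormous number of parameters at constant computation cost, and have therefore become a trend in model scaling. Still, how MoE layers bring quality gains by exploiting parameters through sparse activation remains a mystery. In this work, we investigate several factors in sparse expert models. Contrary to the view of recent studies, load imbalance may not be a significant problem for model quality, whereas the number of sparsely activated experts $k$ and the expert capacity $C$ in top-$k$ routing can make a significant difference. We further propose a simple method called expert prototyping, which splits experts into different prototypes and applies $k$ top-$1$ routing. This strategy improves model quality while keeping computational cost constant, and further exploration on extremely large models suggests that it is more effective when training larger models. We scale the model to over $1$ trillion parameters and implement it on only $480$ NVIDIA V100-32GB GPUs. The proposed giant model converges substantially faster than a same-size baseline.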

Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost, and thus they have become a trend in model scaling. Still, it is a mystery how MoE layers bring quality gains by leveraging the parameters with sparse activation. In this work, we investigate several key factors in sparse expert models. We observe that load imbalance may not be a significant problem affecting model quality, contrary to the perspectives of recent studies, while the number of sparsely activated experts $k$ and expert capacity $C$ in top-$k$ routing can make a significant difference in this context. Furthermore, we take a step forward to propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing. This strategy improves the model quality but maintains constant computational costs, and our further exploration on extremely large-scale models reflects that it is more effective in training larger models. We push the model scale to over $1$ trillion parameters and implement it on only $480$ NVIDIA V100-32GB GPUs, in comparison with the recent SOTAs on $2048$ TPU cores. The proposed giant model achieves substantial speedup in convergence over the same-size baseline.
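
As a rough illustration of the two routing schemes named in this abstract, the sketch below contrasts top-$k$ routing with $k$ top-$1$ routing for a single token's gate logits. The function names, the contiguous grouping of experts into prototypes, and the per-prototype softmax are assumptions made for illustration; the abstract does not specify these details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_routing(logits, k):
    """Return (expert_index, gate_weight) for the k highest-scoring experts."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def k_top1_routing(logits, k):
    """Split experts into k contiguous prototypes (an assumption) and
    pick the single best expert inside each prototype."""
    if len(logits) % k != 0:
        raise ValueError("number of experts must be divisible by k")
    size = len(logits) // k
    chosen = []
    for p in range(k):
        group = logits[p * size:(p + 1) * size]
        probs = softmax(group)  # hypothetical: normalise within the prototype
        best = max(range(size), key=lambda i: probs[i])
        chosen.append((p * size + best, probs[best]))
    return chosen


gate_logits = [2.0, 1.0, 0.5, 3.0]
print([i for i, _ in top_k_routing(gate_logits, 2)])
print([i for i, _ in k_top1_routing(gate_logits, 2)])
```

Both schemes activate the same number of experts per token, which is why the compute cost can stay constant; they differ only in which experts are allowed to compete with each other.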

Translated: 2021-06-03 08:02:58 Published: 2021-06-01

# (Reference translation) Explanations for Monotonic Classifiers

Explanations for Monotonic Classifiers ( http://arxiv.org/abs/2106.00154v1 )

License: CC BY 4.0

Joao Marques-Silva, Thomas Gerspacher, Martin Cooper, Alexey Ignatiev, Nina Narodytska

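(Reference translation) Many classification tasks require monotonicity. Concretely, if all else remains constant, increasing (resp. decreasing) the value of one or more features must not decrease (resp. increase) the value of the prediction. Despite comprehensive efforts on learning monotonic classifiers, dedicated approaches for explaining them are scarce and classifier-specific. This paper describes novel algorithms for computing one formal explanation of a (black-box) monotonic classifier. These algorithms are polynomial in the run-time complexity of the classifier and in the number of features. The paper also presents a practically efficient, model-agnostic algorithm for enumerating formal explanations.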

In many classification tasks there is a requirement of monotonicity. Concretely, if all else remains constant, increasing (resp. decreasing) the value of one or more features must not decrease (resp. increase) the value of the prediction. Despite comprehensive efforts on learning monotonic classifiers, dedicated approaches for explaining monotonic classifiers are scarce and classifier-specific. This paper describes novel algorithms for the computation of one formal explanation of a (black-box) monotonic classifier. These novel algorithms are polynomial in the run time complexity of the classifier and the number of features. Furthermore, the paper presents a practically efficient model-agnostic algorithm for enumerating formal explanations.
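
To make the idea of a formal explanation for a black-box monotonic classifier concrete, here is a minimal sketch. It relies on one fact: if a monotonic classifier gives the same prediction at the lowest and the highest point of a box, it gives that prediction everywhere in the box. The function name `one_explanation`, the explicit feature bounds, and the toy classifier are assumptions for illustration, and the procedure is not claimed to match the paper's algorithms in detail.

```python
def one_explanation(classify, instance, lower, upper):
    """Return indices of features that must keep their values so that the
    prediction for `instance` is guaranteed, assuming `classify` is monotonic.

    Each feature is tentatively freed to its full [lower, upper] range.
    If the predictions at the two extreme corners differ, the feature is
    needed and is fixed back to its original value.
    """
    lo = list(instance)
    hi = list(instance)
    needed = []
    for i in range(len(instance)):
        lo[i], hi[i] = lower[i], upper[i]      # tentatively free feature i
        if classify(lo) != classify(hi):       # prediction no longer guaranteed
            lo[i] = hi[i] = instance[i]        # so feature i must stay fixed
            needed.append(i)
    return needed


# Toy monotonic classifier (hypothetical): class 1 when x0 + x1 >= 10.
def toy_classifier(x):
    return 1 if x[0] + x[1] >= 10 else 0


print(one_explanation(toy_classifier, [10, 0, 5], [0, 0, 0], [10, 10, 10]))
```

The loop makes two classifier calls per feature, so its cost is linear in the number of features times the classifier's run time.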

Translated: 2021-06-03 03:26:05 Published: 2021-06-01

# (Reference translation) Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports

Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports ( http://arxiv.org/abs/2106.00173v1 )

License: CC BY 4.0

Brandon Victor, Aiden Nibali, Zhen He, David L. Carey

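(Reference translation) Sophisticated trajectory prediction models that effectively mimic team dynamics have many potential uses for sports coaches, broadcasters and spectators. However, experiments on soccer data showed that it is surprisingly difficult to train a deep learning model for player trajectory prediction that outperforms linear extrapolation in average distance between predicted and true future trajectories. We propose and test a novel method that improves training by predicting a sparse trajectory and interpolating with constant acceleration. This interpolation can also be used with models that were not trained with sparse outputs, and we find that it consistently improves the performance of all tested models. We also find that the accuracy of predicted trajectories for a subset of players improves when conditioning on the full trajectories of the other players, and improves further when combined with sparse prediction. Finally, we propose a novel architecture using graph networks and multi-head attention (GraN-MA), which outperforms the other tested state-of-the-art models on our dataset and adapts trivially to both sparse trajectories and full-trajectory-conditioned trajectory prediction.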

Sophisticated trajectory prediction models that effectively mimic team dynamics have many potential uses for sports coaches, broadcasters and spectators. However, through experiments on soccer data we found that it can be surprisingly challenging to train a deep learning model for player trajectory prediction which outperforms linear extrapolation on average distance between predicted and true future trajectories. We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration, which improves performance for several models. This interpolation can also be used on models that aren't trained with sparse outputs, and we find that this consistently improves performance for all tested models. Additionally, we find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players, and that this is further improved when combined with sparse predictions. We also propose a novel architecture using graph networks and multi-head attention (GraN-MA) which achieves better performance than other tested state-of-the-art models on our dataset and is trivially adapted for both sparse trajectories and full-trajectory conditioned trajectory prediction.
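
One plausible reading of "interpolating using constant acceleration" is sketched below: given the current position and velocity and the next sparse waypoint, solve for the single acceleration that reaches the waypoint, then fill in the intermediate time steps. The function name, the unit time step, and the way velocity is carried forward are assumptions; the abstract does not give the exact scheme.

```python
def interpolate_constant_acceleration(p0, v0, p1, steps):
    """Fill `steps` unit-time positions from p0 (exclusive) to p1 (inclusive).

    Solves p1 = p0 + v0*T + 0.5*a*T^2 for a per coordinate, with T = steps,
    then evaluates p(t) = p0 + v0*t + 0.5*a*t^2 for t = 1..T.
    Returns the dense path and the velocity at the waypoint.
    """
    accel = [2.0 * (b - a - v * steps) / steps ** 2
             for a, b, v in zip(p0, p1, v0)]
    path = [
        tuple(a + v * t + 0.5 * acc * t * t
              for a, v, acc in zip(p0, v0, accel))
        for t in range(1, steps + 1)
    ]
    end_velocity = tuple(v + acc * steps for v, acc in zip(v0, accel))
    return path, end_velocity


dense, velocity = interpolate_constant_acceleration(
    p0=(0.0, 0.0), v0=(0.0, 1.0), p1=(8.0, 4.0), steps=4)
print(dense)
print(velocity)
```

The returned end velocity can seed the next segment, so consecutive sparse waypoints produce a path that is continuous in both position and velocity.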

Translated: 2021-06-03 03:08:31 Published: 2021-06-01

# (Reference translation) Hybrid Generative Models for Two-Dimensional Datasets

Hybrid Generative Models for Two-Dimensional Datasets ( http://arxiv.org/abs/2106.00203v1 )

License: CC BY 4.0

Hoda Shajari, Jaemoon Lee, Sanjay Ranka, Anand Rangarajan

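(Reference translation) Datasets based on two-dimensional arrays are pervasive across a variety of domains. Current generative modeling approaches are typically limited to conventional image datasets and operate in the pixel domain, which does not explicitly capture correlations between pixels. Moreover, these approaches do not extend to scientific and other applications in which each element value is continuous and not limited to a fixed range. In this paper we propose a novel approach for generating two-dimensional datasets by moving the computation to the space of representation bases, and show its usefulness on two different datasets, one from imaging and one from scientific computing. The proposed approach is general and can be applied to any dataset, representation basis, or generative model. We provide a comprehensive comparison of various combinations of generative models and representation basis spaces. We also propose a new evaluation metric that captures the deficiency of generating images in pixel space.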

Two-dimensional array-based datasets are pervasive in a variety of domains. Current approaches for generative modeling have typically been limited to conventional image datasets and performed in the pixel domain which do not explicitly capture the correlation between pixels. Additionally, these approaches do not extend to scientific and other applications where each element value is continuous and is not limited to a fixed range. In this paper, we propose a novel approach for generating two-dimensional datasets by moving the computations to the space of representation bases and show its usefulness for two different datasets, one from imaging and another from scientific computing. The proposed approach is general and can be applied to any dataset, representation basis, or generative model. We provide a comprehensive performance comparison of various combinations of generative models and representation basis spaces. We also propose a new evaluation metric which captures the deficiency of generating images in pixel space.

Translated: 2021-06-03 02:50:58 Published: 2021-06-01

# (Reference translation) Discontinuous Named Entity Recognition as Maximal Clique Discovery

Discontinuous Named Entity Recognition as Maximal Clique Discovery ( http://arxiv.org/abs/2106.00218v1 )

License: CC BY 4.0

Yucheng Wang, Bowen Yu, Hongsong Zhu, Tingwen Liu, Nan Yu and Limin Sun

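(Reference translation) Named entity recognition (NER) remains challenging when entity mentions are discontinuous. Existing methods break the recognition process into several sequential steps. In training they predict conditioned on gold intermediate results, whereas at inference they rely on the model output of the previous steps, which introduces exposure bias. To solve this problem, we first construct a segment graph for each sentence, in which each node denotes a segment (a continuous entity on its own, or a part of a discontinuous entity) and an edge links two nodes that belong to the same entity. Nodes and edges can each be generated in one stage with a grid tagging scheme and learned jointly using a novel architecture named Mac. Discontinuous NER can then be reformulated as a non-parametric process of discovering maximal cliques in the graph and concatenating the spans in each clique. Experiments on three benchmarks show that our method improves F1 by up to 3.5 percentage points and achieves a 5x speedup over the SOTA model.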

Named entity recognition (NER) remains challenging when entity mentions can be discontinuous. Existing methods break the recognition process into several sequential steps. In training, they predict conditioned on the golden intermediate results, while at inference relying on the model output of the previous steps, which introduces exposure bias. To solve this problem, we first construct a segment graph for each sentence, in which each node denotes a segment (a continuous entity on its own, or a part of discontinuous entities), and an edge links two nodes that belong to the same entity. The nodes and edges can be generated respectively in one stage with a grid tagging scheme and learned jointly using a novel architecture named Mac. Then discontinuous NER can be reformulated as a non-parametric process of discovering maximal cliques in the graph and concatenating the spans in each clique. Experiments on three benchmarks show that our method outperforms the state-of-the-art (SOTA) results, with up to 3.5 percentage points improvement on F1, and achieves 5x speedup over the SOTA model.
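
The decoding step described here, finding maximal cliques in the segment graph and concatenating the spans in each clique, can be sketched with the classic Bron-Kerbosch enumeration. The graph below is hand-built for a toy sentence; in the paper the nodes and edges come from the learned tagger. The function names and the (start, end) span encoding are assumptions for illustration.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    `adjacency` maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


def decode_entities(tokens, adjacency):
    """Concatenate the spans of each maximal clique in sentence order."""
    entities = []
    for clique in maximal_cliques(adjacency):
        words = []
        for start, end in clique:  # spans are (start, end), inclusive
            words.extend(tokens[start:end + 1])
        entities.append(" ".join(words))
    return entities


tokens = ["muscle", "pain", "and", "fatigue"]
# Hypothetical segment graph: "muscle" is shared by two entities.
graph = {
    (0, 0): {(1, 1), (3, 3)},
    (1, 1): {(0, 0)},
    (3, 3): {(0, 0)},
}
print(decode_entities(tokens, graph))
```

A segment with no edges forms a clique of size one, which corresponds to an ordinary continuous entity.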

Translated: 2021-06-03 02:41:11 Published: 2021-06-01

# (Reference translation) VA-GCN: A Vector Attention Graph Convolution Network for Learning on Point Clouds

VA-GCN: A Vector Attention Graph Convolution Network for learning on Point Clouds ( http://arxiv.org/abs/2106.00227v1 )

License: CC BY 4.0

Haotian Hu, Fanyi Wang, Huixiao Le

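(Reference translation) Advances in research on local aggregation operators have led to dramatic breakthroughs in point cloud analysis models. However, existing local aggregation operators in the literature do not give enough weight to the local information of the point cloud, which limits the power of the models. We therefore propose an efficient Vector Attention Convolution module (VAConv), which uses K-Nearest Neighbor (KNN) to extract the neighbor points of each input point, and then uses the elevation and azimuth relationships of the vectors between the center point and its neighbors to construct an attention weight matrix for edge features. VAConv then adopts a dual-channel structure to fuse weighted edge features and global features. To verify the efficiency of VAConv, we connect VAConvs with different receptive fields in parallel to obtain a multi-scale graph convolutional network, VA-GCN. The proposed VA-GCN achieves state-of-the-art performance on standard benchmarks including ModelNet40, S3DIS and ShapeNet. On the ModelNet40 dataset for 3D classification, VA-GCN improves by 2.4% over the baseline.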

Owing to the development of research on local aggregation operators, dramatic breakthrough has been made in point cloud analysis models. However, existing local aggregation operators in the current literature fail to attach decent importance to the local information of the point cloud, which limits the power of the models. To fit this gap, we propose an efficient Vector Attention Convolution module (VAConv), which utilizes K-Nearest Neighbor (KNN) to extract the neighbor points of each input point, and then uses the elevation and azimuth relationship of the vectors between the center point and its neighbors to construct an attention weight matrix for edge features. Afterwards, the VAConv adopts a dual-channel structure to fuse weighted edge features and global features. To verify the efficiency of the VAConv, we connect the VAConvs with different receptive fields in parallel to obtain a Multi-scale graph convolutional network, VA-GCN. The proposed VA-GCN achieves state-of-the-art performance on standard benchmarks including ModelNet40, S3DIS and ShapeNet. Remarkably, on the ModelNet40 dataset for 3D classification, VA-GCN increased by 2.4% compared to the baseline.
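
The geometric inputs mentioned in this abstract, the K nearest neighbors of a point and the elevation and azimuth of the vectors pointing to them, can be computed as in the sketch below. How VAConv turns these angles into attention weights is not stated in the abstract and is not modelled here; the function names and angle conventions (azimuth in the x-y plane, elevation measured from that plane) are assumptions.

```python
import math


def k_nearest(points, center_index, k):
    """Indices of the k points closest to points[center_index] (brute force)."""
    center = points[center_index]
    others = [i for i in range(len(points)) if i != center_index]
    others.sort(key=lambda i: math.dist(center, points[i]))
    return others[:k]


def elevation_azimuth(center, neighbor):
    """Angles (in radians) of the vector from `center` to `neighbor`."""
    dx, dy, dz = (n - c for c, n in zip(center, neighbor))
    azimuth = math.atan2(dy, dx)                     # angle within the x-y plane
    elevation = math.atan2(dz, math.hypot(dx, dy))   # angle above the x-y plane
    return elevation, azimuth


cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 2.0), (5.0, 5.0, 5.0)]
for j in k_nearest(cloud, 0, 2):
    print(j, elevation_azimuth(cloud[0], cloud[j]))
```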

Translated: 2021-06-03 02:26:11 Published: 2021-06-01

# (Reference translation) Reconciliation of Statistical and Spatial Sparsity for Robust Image and Image-Set Classification

Reconciliation of Statistical and Spatial Sparsity For Robust Image and Image-Set Classification ( http://arxiv.org/abs/2106.00256v1 )

License: CC BY 4.0

Hao Cheng, Kim-Hui Yap, and Bihan Wen

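(Reference translation) Recent image classification algorithms, by learning deep features from large-scale datasets, have achieved considerably better results than classic feature-based approaches. In practice, however, image classification still faces various challenges, such as classifying noisy image or image-set queries and training deep image classification models on limited-scale datasets. Rather than applying generic deep features, model-based approaches can be more effective and data-efficient for robust image and image-set classification tasks. In this work, we propose a novel joint statistical and spatial sparse representation, dubbed J3S, which models image or image-set data for classification by reconciling their local patch structures with a global Gaussian distribution mapped onto a Riemannian manifold. To the best of our knowledge, no prior work has used global statistics and local patch structures together via a joint sparse representation. We solve the joint sparse coding problem based on the J3S model by coupling the local and global image representations through joint sparsity. The learned J3S models are used for robust image and image-set classification. Experiments show that the proposed method outperforms competing methods on the FMD, UIUC, ETH-80 and YTC databases.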

Recent image classification algorithms, by learning deep features from large-scale datasets, have achieved significantly better results comparing to the classic feature-based approaches. However, there are still various challenges of image classifications in practice, such as classifying noisy image or image-set queries and training deep image classification models over the limited-scale dataset. Instead of applying generic deep features, the model-based approaches can be more effective and data-efficient for robust image and image-set classification tasks, as various image priors are exploited for modeling the inter- and intra-set data variations while preventing over-fitting. In this work, we propose a novel Joint Statistical and Spatial Sparse representation, dubbed \textit{J3S}, to model the image or image-set data for classification, by reconciling both their local patch structures and global Gaussian distribution mapped into Riemannian manifold. To the best of our knowledge, no work to date utilized both global statistics and local patch structures jointly via joint sparse representation. We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity. The learned J3S models are used for robust image and image-set classification. Experiments show that the proposed J3S-based image classification scheme outperforms the popular or state-of-the-art competing methods over FMD, UIUC, ETH-80 and YTC databases.

Translated: 2021-06-03 02:14:06 Published: 2021-06-01

# (Reference translation) Divide and Rule: Recurrent Partitioned Network for Dynamic Processes

Divide and Rule: Recurrent Partitioned Network for Dynamic Processes ( http://arxiv.org/abs/2106.00258v1 )

License: CC BY 4.0

Qianyu Feng, Bang Zhang, Yi Yang

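(Reference translation) In general, many dynamic processes involve interacting variables, from physical systems to sociological analysis. The interplay of components in a system can give rise to confounding dynamic behavior. Many approaches model temporal sequences holistically, ignoring the internal interactions, and so fail to capture the underlying (protogenic) actuation. In contrast, our goal is to represent a system with a part-whole hierarchy and to discover the implied dependencies among intra-system variables, that is, to infer the interactions that have causal effects on sub-system behavior, using the REcurrent partItioned Network (REIN). The proposed architecture consists of (i) a perceptive module that extracts a hierarchical and temporally consistent representation of the observation at multiple levels, (ii) a deductive module that determines the relational connections between neurons at each level, and (iii) a statistical module that predicts the future by conditioning on the temporal distributional estimate. The model is shown to be effective at identifying interactions between components from limited observation and stable in long-term prediction across diverse physical systems.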

In general, many dynamic processes are involved with interacting variables, from physical systems to sociological analysis. The interplay of components in the system can give rise to confounding dynamic behavior. Many approaches model temporal sequences holistically ignoring the internal interaction which are impotent in capturing the protogenic actuation. Differently, our goal is to represent a system with a part-whole hierarchy and discover the implied dependencies among intra-system variables: inferring the interactions that possess causal effects on the sub-system behavior with REcurrent partItioned Network (REIN). The proposed architecture consists of (i) a perceptive module that extracts a hierarchical and temporally consistent representation of the observation at multiple levels, (ii) a deductive module for determining the relational connection between neurons at each level, and (iii) a statistical module that can predict the future by conditioning on the temporal distributional estimation. Our model is demonstrated to be effective in identifying the componential interactions with limited observation and stable in long-term future predictions experimented with diverse physical systems.

Translated: 2021-06-03 01:51:48 Published: 2021-06-01

# (Reference translation) 3D WaveUNet: 3D Wavelet Integrated Encoder-Decoder Network for Neuron Segmentation

3D WaveUNet: 3D Wavelet Integrated Encoder-Decoder Network for Neuron Segmentation ( http://arxiv.org/abs/2106.00259v1 )

License: CC BY 4.0

Qiufu Li and Linlin Shen

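(Reference translation) 3D neuron segmentation is a key step in digital neuron reconstruction, which is essential for exploring brain circuits and understanding brain functions. However, the fine, line-shaped nerve fibers of a neuron can spread over a large region, which makes segmentation of 3D neuronal images computationally expensive. Meanwhile, strong noise and disconnected nerve fibers in the images pose great challenges for the task. In this paper, we propose a 3D neuron segmentation method based on 3D wavelets and deep learning. The neuronal image is first partitioned into neuronal cubes to simplify the segmentation task. We then design 3D WaveUNet, the first 3D wavelet integrated encoder-decoder network, to segment the nerve fibers in the cubes. To train 3D WaveUNet, we also build a Neuronal Cube Dataset (NeuCuDa) from BigNeuron, the largest available annotated neuronal image dataset. Finally, the nerve fibers segmented in the cubes are assembled into the complete neuron, which is digitally reconstructed using an available automatic tracing algorithm. The experimental results show that the method can completely extract the target neuron from noisy neuronal images. The integrated 3D wavelets efficiently improve the performance of 3D neuron segmentation and reconstruction. The code and pre-trained models for this work are to be made available at https://github.com/LiQiufu/3D-WaveUNet.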

3D neuron segmentation is a key step for the neuron digital reconstruction, which is essential for exploring brain circuits and understanding brain functions. However, the fine line-shaped nerve fibers of neuron could spread in a large region, which brings great computational cost to the segmentation in 3D neuronal images. Meanwhile, the strong noises and disconnected nerve fibers in the image bring great challenges to the task. In this paper, we propose a 3D wavelet and deep learning based 3D neuron segmentation method. The neuronal image is first partitioned into neuronal cubes to simplify the segmentation task. Then, we design 3D WaveUNet, the first 3D wavelet integrated encoder-decoder network, to segment the nerve fibers in the cubes; the wavelets could assist the deep networks in suppressing data noise and connecting the broken fibers. We also produce a Neuronal Cube Dataset (NeuCuDa) using the biggest available annotated neuronal image dataset, BigNeuron, to train 3D WaveUNet. Finally, the nerve fibers segmented in cubes are assembled to generate the complete neuron, which is digitally reconstructed using an available automatic tracing algorithm. The experimental results show that our neuron segmentation method could completely extract the target neuron in noisy neuronal images. The integrated 3D wavelets can efficiently improve the performance of 3D neuron segmentation and reconstruction. The code and pre-trained models for this work will be available at https://github.com/LiQiufu/3D-WaveUNet.

Translated: 2021-06-03 01:41:06 Published: 2021-06-01

# (Reference translation) Exploring Dynamic Selection of Branch Expansion Orders for Code Generation

Exploring Dynamic Selection of Branch Expansion Orders for Code Generation ( http://arxiv.org/abs/2106.00261v1 )

License: CC BY 4.0

Hui Jiang, Chulun Zhou, Fandong Meng, Biao Zhang, Jie Zhou, Degen Huang, Qingqiang Wu, Jinsong Su

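(Reference translation) Because of its great potential to facilitate software development, code generation has recently attracted increasing attention. The dominant models are generally Seq2Tree models, which convert the input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an abstract syntax tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector that can dynamically determine the optimal expansion order of branches for multi-branch nodes. In particular, since selecting an expansion order is a non-differentiable multi-step operation, we optimize the selector with reinforcement learning and formulate the reward function as the difference between the model losses obtained under different expansion orders. Experimental results and in-depth analysis on several commonly used datasets demonstrate the effectiveness and generality of the approach. We have released our code at https://github.com/DeepLearnXMU/CG-RL.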

Due to the great potential in facilitating software development, code generation has attracted increasing attention recently. Generally, dominant models are Seq2Tree models, which convert the input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an Abstract Syntax Tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector, which is able to dynamically determine optimal expansion orders of branches for multi-branch nodes. Particularly, since the selection of expansion orders is a non-differentiable multi-step operation, we optimize the selector through reinforcement learning, and formulate the reward function as the difference of model losses obtained through different expansion orders. Experimental results and in-depth analysis on several commonly-used datasets demonstrate the effectiveness and generality of our approach. We have released our code at https://github.com/DeepLearnXMU/CG-RL.

Translated: 2021-06-03 00:56:42 Published: 2021-06-01

# (Reference translation) Did I do that? Blame as a means to identify controlled effects in reinforcement learning

Did I do that? Blame as a means to identify controlled effects in reinforcement learning ( http://arxiv.org/abs/2106.00266v1 )

License: CC BY-SA 4.0

Oriol Corcoll, Raul Vicente

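(Reference translation) Modeling the controllable aspects of the environment enables better prioritization of interventions and has become a popular exploration strategy in reinforcement learning methods. Despite repeatedly achieving state-of-the-art results, this approach has only been studied as a proxy for a reward-based task and has not yet been evaluated on its own. We show that solutions relying on action prediction fail to model important events. Humans, on the other hand, assign blame to their actions to decide what they controlled. Here we propose the Controlled Effect Network (CEN), an unsupervised method based on counterfactual measures of blame. CEN is evaluated in a wide range of environments, showing that it identifies controlled effects better than popular models based on action prediction.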

Modeling controllable aspects of the environment enables better prioritization of interventions and has become a popular exploration strategy in reinforcement learning methods. Despite repeatedly achieving state-of-the-art results, this approach has only been studied as a proxy to a reward-based task and has not yet been evaluated on its own. We show that solutions relying on action prediction fail to model important events. Humans, on the other hand, assign blame to their actions to decide what they controlled. Here we propose Controlled Effect Network (CEN), an unsupervised method based on counterfactual measures of blame. CEN is evaluated in a wide range of environments showing that it can identify controlled effects better than popular models based on action prediction.

Translated: 2021-06-03 00:28:55 Published: 2021-06-01

# (Reference translation) Adversarial Defense for Automatic Speaker Verification by Self-Supervised Learning

Adversarial Defense for Automatic Speaker Verification by Self-Supervised Learning ( http://arxiv.org/abs/2106.00273v1 )

License: CC BY 4.0

Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee

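(Reference translation) Previous work has shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks such as replay, synthetic speech, and the recently emerged adversarial attacks. Great effort has gone into defending ASV against replay and synthetic speech, but only a few approaches address adversarial attacks. All existing approaches to adversarial attacks on ASV require knowledge of how the adversarial samples are generated, yet it is impractical for defenders to know the exact attack algorithms used by in-the-wild attackers. This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithm. Inspired by self-supervised learning models (SSLMs), which have the merit of alleviating superficial noise in the inputs and reconstructing clean samples from corrupted ones, this work treats adversarial perturbations as a kind of noise and performs adversarial defense for ASV with SSLMs. Specifically, we propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that the detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%. Moreover, since there is no common metric for evaluating adversarial defense performance for ASV, this work also formalizes evaluation metrics that take both purification-based and detection-based approaches into account. We sincerely encourage future work to benchmark approaches with the proposed evaluation framework.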

Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV against replay and synthetic speech; however, only a few approaches have been explored to deal with adversarial attacks. All the existing approaches to tackle adversarial attacks for ASV require the knowledge for adversarial samples generation, but it is impractical for defenders to know the exact attack algorithms that are applied by the in-the-wild attackers. This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. Inspired by self-supervised learning models (SSLMs) that possess the merits of alleviating the superficial noise in the inputs and reconstructing clean samples from the interrupted ones, this work regards adversarial perturbations as one kind of noise and conducts adversarial defense for ASV by SSLMs. Specifically, we propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%. Moreover, since there is no common metric for evaluating the adversarial defense performance for ASV, this work also formalizes evaluation metrics for adversarial defense considering both purification and detection based approaches into account. We sincerely encourage future works to benchmark their approaches based on the proposed evaluation framework.

Translated: 2021-06-03 00:11:31 Published: 2021-06-01

# (Reference translation) Analysis of classifiers robust to noisy labels

Analysis of classifiers robust to noisy labels ( http://arxiv.org/abs/2106.00274v1 )

License: CC BY 4.0

Alex Díaz and Damian Steele

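(Reference translation) We explore contemporary robust classification algorithms for overcoming class-dependent labelling noise. The classifiers are trained and evaluated on data with class-conditional random label noise, while the final test data is clean. We demonstrate methods for estimating the transition matrix in order to improve classifier performance on noisy data. We apply deep learning to three datasets and derive, from scratch, an end-to-end analysis with unknown noise on the CIFAR dataset. We analyse the effectiveness and robustness of the classifiers, and compare and contrast the results of each experiment using top-1 accuracy.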

We explore contemporary robust classification algorithms for overcoming class-dependent labelling noise: Forward, Importance Re-weighting and T-revision. The classifiers are trained and evaluated on class-conditional random label noise data while the final test data is clean. We demonstrate methods for estimating the transition matrix in order to obtain better classifier performance when working with noisy data. We apply deep learning to three datasets and derive an end-to-end analysis with unknown noise on the CIFAR dataset from scratch. The effectiveness and robustness of the classifiers are analysed, and we compare and contrast the results of each experiment, using top-1 accuracy as our criterion.
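
To show how a transition matrix enters training, the sketch below computes the loss used by the Forward correction method: the model's clean-class probabilities are pushed through the noise transition matrix before being compared with the observed (possibly noisy) label. The matrix values and function names are hypothetical; estimating the matrix, which the abstract discusses, is not covered here.

```python
import math


def noisy_posterior(clean_probs, transition):
    """Map clean-class probabilities to noisy-label probabilities.

    transition[i][j] is P(observed label = j | true label = i),
    so each row of `transition` sums to 1.
    """
    num_noisy = len(transition[0])
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(len(clean_probs)))
        for j in range(num_noisy)
    ]


def forward_loss(clean_probs, transition, observed_label):
    """Cross-entropy between the noise-adjusted prediction and the noisy label."""
    return -math.log(noisy_posterior(clean_probs, transition)[observed_label])


# Hypothetical two-class example: class 0 flips to 1 with probability 0.2,
# class 1 flips to 0 with probability 0.3.
T = [[0.8, 0.2],
     [0.3, 0.7]]
model_output = [0.9, 0.1]  # the model's belief about the clean class
print(noisy_posterior(model_output, T))
print(forward_loss(model_output, T, observed_label=0))
```

Because the correction is applied only inside the loss, the classifier itself still outputs clean-class probabilities at test time.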

Translated: 2021-06-02 23:43:07 Published: 2021-06-01

# (Reference translation) AAPM DL-Sparse-View CT Challenge Submission Report: Designing an Iterative Network for Fanbeam-CT with Unknown Geometry

AAPM DL-Sparse-View CT Challenge Submission Report: Designing an Iterative Network for Fanbeam-CT with Unknown Geometry ( http://arxiv.org/abs/2106.00280v1 )

License: CC BY 4.0

Martin Genzel, Jan Macdonald, Maximilian März

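(Reference translation) This report gives a short motivation and description of our contribution to the AAPM DL-Sparse-View CT Challenge (team name: "robust-and-stable"). The task is to recover breast model phantom images from limited-view fanbeam measurements using data-driven reconstruction techniques. The challenge is distinctive in that participants are given a collection of ground truth images and their noiseless, subsampled sinograms (as well as the associated limited-view filtered backprojection images), but not the actual forward model. Our approach therefore first estimates the fanbeam geometry in a data-driven geometric calibration step. In a subsequent two-step procedure, we design an iterative end-to-end network that enables the computation of near-exact solutions.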

This report is dedicated to a short motivation and description of our contribution to the AAPM DL-Sparse-View CT Challenge (team name: "robust-and-stable"). The task is to recover breast model phantom images from limited view fanbeam measurements using data-driven reconstruction techniques. The challenge is distinctive in the sense that participants are provided with a collection of ground truth images and their noiseless, subsampled sinograms (as well as the associated limited view filtered backprojection images), but not with the actual forward model. Therefore, our approach first estimates the fanbeam geometry in a data-driven geometric calibration step. In a subsequent two-step procedure, we design an iterative end-to-end network that enables the computation of near-exact solutions.

Translated: 2021-06-02 23:34:19 Published: 2021-06-01

# (Reference translation) Preview, Attend and Review: Schema-Aware Curriculum Learning for Multi-Domain Dialog State Tracking

Preview, Attend and Review: Schema-Aware Curriculum Learning for Multi-Domain Dialog State Tracking ( http://arxiv.org/abs/2106.00291v1 )

License: CC BY 4.0

Yinpei Dai, Hangyu Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Xiaodan Zhu

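(Reference translation) Existing dialog state tracking (DST) models are trained on dialog data in random order, neglecting the rich structural information in a dataset. In this paper, we propose to use curriculum learning (CL) to better exploit both the curriculum structure and the schema structure of task-oriented dialogs. Specifically, we propose a model-agnostic framework called Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog), which consists of a preview module that pre-trains a DST model with schema information, a curriculum module that optimizes the model with CL, and a review module that augments mispredicted data to reinforce the CL training. We show that the proposed approach improves DST performance over both a transformer-based and an RNN-based DST model (TripPy and TRADE) and achieves new state-of-the-art results on WOZ2.0 and MultiWOZ2.1.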

Existing dialog state tracking (DST) models are trained with dialog data in a random order, neglecting rich structural information in a dataset. In this paper, we propose to use curriculum learning (CL) to better leverage both the curriculum structure and schema structure for task-oriented dialogs. Specifically, we propose a model-agnostic framework called Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog), which consists of a preview module that pre-trains a DST model with schema information, a curriculum module that optimizes the model with CL, and a review module that augments mispredicted data to reinforce the CL training. We show that our proposed approach improves DST performance over both a transformer-based and RNN-based DST model (TripPy and TRADE) and achieves new state-of-the-art results on WOZ2.0 and MultiWOZ2.1.

Translated: 2021-06-02 23:23:48 Published: 2021-06-01

# (Reference translation) A Non-commutative Extension of Lee-Seung's Algorithm for Positive Semidefinite Factorizations

A Non-commutative Extension of Lee-Seung's Algorithm for Positive Semidefinite Factorizations ( http://arxiv.org/abs/2106.00293v1 )

License: CC BY 4.0

Yong Sheng Soh, Antonios Varvitsiotis

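(Reference translation) Given a matrix $X\in \mathbb{R}_+^{m\times n}$ with nonnegative entries, a positive semidefinite (PSD) factorization of $X$ is a collection of $r \times r$-dimensional PSD matrices $\{A_i\}$ and $\{B_j\}$ satisfying $X_{ij}= \mathrm{tr}(A_i B_j)$ for all $i\in [m],\ j\in [n]$. PSD factorizations are fundamentally linked to understanding the expressiveness of semidefinite programs as well as the power and limitations of quantum resources in information theory. The PSD factorization task generalizes the non-negative matrix factorization (NMF) problem, in which we seek collections of $r$-dimensional nonnegative vectors $\{a_i\}$ and $\{b_j\}$ satisfying $X_{ij}= a_i^\top b_j$ for all $i\in [m],\ j\in [n]$; the latter problem is recovered by choosing the matrices in the PSD factorization to be diagonal. The most widely used algorithm for computing NMFs of a matrix is the multiplicative update algorithm developed by Lee and Seung, in which nonnegativity of the updates is preserved by scaling with positive diagonal matrices. In this paper, we describe a non-commutative extension of Lee-Seung's algorithm, which we call the Matrix Multiplicative Update (MMU) algorithm, for computing PSD factorizations. The MMU algorithm ensures that updates remain PSD by congruence scaling with the matrix geometric mean of appropriate PSD matrices. Building on the Majorization-Minimization framework, we also show that the squared loss objective is non-increasing and that fixed points correspond to critical points. The analysis relies on Lieb's Concavity Theorem. Beyond PSD factorizations, we use the MMU algorithm as a primitive to compute block-diagonal PSD factorizations and tensor PSD factorizations. We demonstrate the utility of our method with experiments on real and synthetic data.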

Given a matrix $X\in \mathbb{R}_+^{m\times n}$ with nonnegative entries, a Positive Semidefinite (PSD) factorization of $X$ is a collection of $r \times r$-dimensional PSD matrices $\{A_i\}$ and $\{B_j\}$ satisfying $X_{ij}= \mathrm{tr}(A_i B_j)$ for all $\ i\in [m],\ j\in [n]$. PSD factorizations are fundamentally linked to understanding the expressiveness of semidefinite programs as well as the power and limitations of quantum resources in information theory. The PSD factorization task generalizes the Non-negative Matrix Factorization (NMF) problem where we seek a collection of $r$-dimensional nonnegative vectors $\{a_i\}$ and $\{b_j\}$ satisfying $X_{ij}= a_i^\top b_j$, for all $i\in [m],\ j\in [n]$ -- one can recover the latter problem by choosing matrices in the PSD factorization to be diagonal. The most widely used algorithm for computing NMFs of a matrix is the Multiplicative Update algorithm developed by Lee and Seung, in which nonnegativity of the updates is preserved by scaling with positive diagonal matrices. In this paper, we describe a non-commutative extension of Lee-Seung's algorithm, which we call the Matrix Multiplicative Update (MMU) algorithm, for computing PSD factorizations. The MMU algorithm ensures that updates remain PSD by congruence scaling with the matrix geometric mean of appropriate PSD matrices, and it retains the simplicity of implementation that Lee-Seung's algorithm enjoys. Building on the Majorization-Minimization framework, we show that under our update scheme the squared loss objective is non-increasing and fixed points correspond to critical points. The analysis relies on Lieb's Concavity Theorem. Beyond PSD factorizations, we use the MMU algorithm as a primitive to calculate block-diagonal PSD factorizations and tensor PSD factorizations. We demonstrate the utility of our method with experiments on real and synthetic data.
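
For orientation, the classical Lee-Seung multiplicative update for NMF, which this paper extends, can be sketched in plain Python as follows. This is the commutative (diagonal) special case only; it does not implement the MMU algorithm or the matrix geometric mean. The helper names, the small `eps` guard against division by zero, and the example matrices are assumptions for illustration.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def squared_loss(x, w, h):
    """Squared Frobenius distance between x and the product w @ h."""
    approx = matmul(w, h)
    return sum((xv - av) ** 2
               for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))


def scale(m, num, den, eps):
    """Entrywise m * num / (den + eps); keeps entries nonnegative."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def lee_seung_step(x, w, h, eps=1e-12):
    """One round of updates: H <- H * (W^T X)/(W^T W H), then
    W <- W * (X H^T)/(W H H^T), all entrywise."""
    wt = transpose(w)
    h = scale(h, matmul(wt, x), matmul(matmul(wt, w), h), eps)
    ht = transpose(h)
    w = scale(w, matmul(x, ht), matmul(w, matmul(h, ht)), eps)
    return w, h


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
H = [[1.0, 0.5], [0.5, 1.0]]
for _ in range(30):
    W, H = lee_seung_step(X, W, H)
print(squared_loss(X, W, H))
```

Each update multiplies by a ratio of nonnegative quantities, so nonnegativity is preserved automatically; the paper's congruence scaling plays the analogous role for PSD matrices.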

翻訳日:2021-06-02 23:12:40 公開日:2021-06-01

# (参考訳) 電気料金請求書の裏側: 非侵入型負荷監視のためのデュアルDNNアプローチ

More Behind Your Electricity Bill: a Dual-DNN Approach to Non-Intrusive Load Monitoring ( http://arxiv.org/abs/2106.00297v1 )

ライセンス: CC BY 4.0

Yu Zhang, Guoming Tang, Qianyi Huang, Yi Wang, Hong Xu

(参考訳) 非侵入型負荷監視 (NILM) は、家庭全体のエネルギー消費を個々の家電製品ごとのエネルギー使用量に分解することを目的とした、よく知られた単一チャネルのブラインド信号源分離問題である。これにより、家庭のエネルギー使用に対する意識が高まり、かなりの省エネが期待できる。近年の研究では、ディープニューラルネットワーク (DNN) に基づく手法が NILM に有望であることが示されている。しかし、これらの手法は通常、ネットワーク設計において家電製品の動作に固有の性質を考慮しておらず、妥当でない結果につながる可能性がある。そこで我々は、i) 潜在特徴を学習する DNN の能力を活用し、ii) 普遍的な性質を識別する能力を DNN アーキテクチャに与えることを目的として、デュアルディープニューラルネットワーク (dual-DNN) を開発した。具体的には、dual-DNN の設計において、一方のサブネットワークで各家電製品の動作状態ごとの定格電力を推定し、もう一方のサブネットワークで対象家電製品の動作状態を識別する。最終結果は、家電製品の多状態性を考慮しつつ、これら 2 つのネットワークの出力を乗算して得られる。家電製品の状態動作のスパース性を課すために、状態識別用のサブネットワークにメディアンフィルタとハードゲーティング機構を適用する。最新の NILM 手法と比較して、我々の dual-DNN は 2 つの公開ベンチマークデータセットで平均 21.67% の性能向上を示す。

Non-intrusive load monitoring (NILM) is a well-known single-channel blind source separation problem that aims to decompose household energy consumption into the itemised energy usage of individual appliances. In this way, considerable energy savings could be achieved by enhancing households' awareness of their energy usage. Recent investigations have shown that approaches based on deep neural networks (DNNs) are promising for the NILM task. Nevertheless, they normally ignore the inherent properties of appliance operations in the network design, potentially leading to implausible results. We are thus motivated to develop the dual Deep Neural Network (dual-DNN), which aims to i) take advantage of DNNs' capability to learn latent features and ii) empower the DNN architecture with the ability to identify universal properties. Specifically, in the design of dual-DNN, we adopt one subnetwork to measure the power ratings of different appliances' operation states, and another subnetwork to identify the running states of target appliances. The final result is then obtained by multiplying these two network outputs while taking the multi-state property of household appliances into account. To enforce sparsity in appliance state operation, we apply median filtering and hard gating mechanisms to the state-identification subnetwork. Compared with state-of-the-art NILM methods, our dual-DNN approach demonstrates a 21.67% performance improvement on average on two public benchmark datasets.
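
The combination step described above (state probabilities smoothed and gated, then multiplied by power ratings) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the filter-then-gate order, the window size, the 0.5 threshold, and the sample numbers are all hypothetical, and the two "network outputs" are plain lists rather than real model predictions.

```python
from statistics import median

def median_filter(seq, window=3):
    """Sliding median with edge replication; window is assumed to be odd."""
    half = window // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(seq))]

def hard_gate(probs, threshold=0.5):
    """Turn soft state probabilities into 0/1 on-off decisions."""
    return [1.0 if p >= threshold else 0.0 for p in probs]

def combine(state_probs, power_ratings, threshold=0.5, window=3):
    """state_probs[s][t] and power_ratings[s][t] are per-state, per-time-step values.

    The appliance estimate at time t is the sum over states of gate * rating.
    """
    steps = len(state_probs[0])
    gates = [hard_gate(median_filter(p, window), threshold) for p in state_probs]
    return [sum(g[t] * r[t] for g, r in zip(gates, power_ratings)) for t in range(steps)]

# One "on" state with an isolated spike at t=1 and a dip at t=2 (made-up numbers).
state_probs = [[0.1, 0.9, 0.1, 0.8, 0.9, 0.9, 0.2, 0.1]]
power_ratings = [[2000.0] * 8]  # watts, hypothetical kettle-like appliance
estimate = combine(state_probs, power_ratings)
# estimate == [0.0, 0.0, 2000.0, 2000.0, 2000.0, 2000.0, 0.0, 0.0]
```

In this toy trace the median filter suppresses the isolated spike and fills the short dip before gating, so the estimate contains a single contiguous activation, which is the kind of sparsity in state changes the abstract refers to.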

翻訳日:2021-06-02 22:54:16 公開日:2021-06-01

# (参考訳) ゼロショット合成性のための独立プロトタイプ伝搬

Independent Prototype Propagation for Zero-Shot Compositionality ( http://arxiv.org/abs/2106.00305v1 )

ライセンス: CC BY 4.0

Frank Ruis, Gertjan Burghours, Doina Bucur

(参考訳) 人間は構成的なゼロショット推論が得意であり、シマウマを見たことがない人でも、白黒の縞模様がある馬のような動物だと教えられれば、シマウマを認識できる。一方、機械学習システムは通常、訓練データ中の疑似相関を利用しており、そのような相関は文脈の中で物体を認識するのに役立つ一方で、汎化を損なう。分類時に文脈的な手がかりを活用しつつ、仕様が不十分 (underspecified) なデータセットを扱えるようにするために、新しいプロトタイプ伝播グラフ手法 ProtoProp を提案する。まず、属性ラベル (例: 縞模様) に対して条件付き独立な物体 (例: シマウマ) のプロトタイプ表現、およびその逆を学習する。次に、独立なプロトタイプを構成グラフ上で伝播させ、対象分布の依存関係を反映した、新規の属性と物体の組み合わせに対する構成的プロトタイプを学習する。本手法は、クラス階層グラフや事前学習済み単語埋め込みといった外部データに依存しない。我々は、クリーンなラベルを持ち視覚的要素の強い合成データセット AO-Clever と、細粒度の靴の種類からなるノイズの多い実世界データセット UT-Zappos で本手法を評価する。一般化構成的ゼロショット設定において最先端の結果を上回ることを示し、アブレーションによって手法の各部分の重要性と最終結果への寄与を示す。

Humans are good at compositional zero-shot reasoning; someone who has never seen a zebra before could nevertheless recognize one when we tell them it looks like a horse with black and white stripes. Machine learning systems, on the other hand, usually leverage spurious correlations in the training data, and while such correlations can help recognize objects in context, they hurt generalization. To be able to deal with underspecified datasets while still leveraging contextual clues during classification, we propose ProtoProp, a novel prototype propagation graph method. First we learn prototypical representations of objects (e.g., zebra) that are conditionally independent w.r.t. their attribute labels (e.g., stripes) and vice versa. Next we propagate the independent prototypes through a compositional graph, to learn compositional prototypes of novel attribute-object combinations that reflect the dependencies of the target distribution. The method does not rely on any external data, such as class hierarchy graphs or pretrained word embeddings. We evaluate our approach on AO-Clever, a synthetic and strongly visual dataset with clean labels, and UT-Zappos, a noisy real-world dataset of fine-grained shoe types. We show that in the generalized compositional zero-shot setting we outperform state-of-the-art results, and through ablations we show the importance of each part of the method and their contribution to the final results.

翻訳日:2021-06-02 22:37:37 公開日:2021-06-01

# (参考訳) 世界ニュースを通して平和を理解する

Understanding peacefulness through the world news ( http://arxiv.org/abs/2106.00306v1 )

ライセンス: CC BY 4.0

Vasiliki Voukelatou, Ioanna Miliou, Fosca Giannotti, Luca Pappalardo

(参考訳) 平和は全人類にとって幸福の主要な側面であり、不公平とあらゆる形態の暴力から抜け出す道である。そのため、その測定は近年、研究者や政策立案者の注目を集めている。ここ数年、新しいデジタルデータストリームがこの分野の研究を大きく変えてきた。本研究では、GDELT (Global Database of Events, Language, and Tone) デジタルニュースデータベースから抽出した情報を利用し、世界平和度指数 (GPI) を通じて平和度を捉える。予測型の機械学習モデルを適用することで、GDELT におけるニュースメディアの注目度が、月単位で GPI を測定するための代理指標として利用できることを示す。さらに、SHAP の手法を用いて、予測を左右する最も重要な変数を特定する。この分析は各国のプロファイルを浮き彫りにし、予測全般について、特に誤差とその誤差を引き起こす事象についての説明を与える。ソーシャルグッドの研究者、政策立案者、平和構築に携わる人々が、機械学習のような強力なデータサイエンスツールとともにデジタルデータを活用することで、社会的利益の最大化と平和へのリスクの最小化に貢献できると考えている。

Peacefulness is a principal dimension of well-being for all humankind and is the way out of inequity and every form of violence. Thus, its measurement has lately drawn the attention of researchers and policy-makers. In recent years, novel digital data streams have drastically changed research in this field. In the current study, we exploit information extracted from the Global Database of Events, Language, and Tone (GDELT) digital news database to capture peacefulness through the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level. Additionally, we use the SHAP methodology to obtain the most important variables that drive the predictions. This analysis highlights each country's profile and provides explanations for the predictions overall, and particularly for the errors and the events that drive these errors. We believe that digital data exploited by Social Good researchers, policy-makers, and peace-builders, with data science tools as powerful as machine learning, could contribute to maximizing the societal benefits and minimizing the risks to peacefulness.

翻訳日:2021-06-02 22:22:23 公開日:2021-06-01

# (参考訳) Reinforce Security: セキュアなWiretapコーディングに向けたモデルフリーアプローチ

Reinforce Security: A Model-Free Approach Towards Secure Wiretap Coding ( http://arxiv.org/abs/2106.00343v1 )

ライセンス: CC BY 4.0

Rick Fritschek, Rafael F. Schaefer, Gerhard Wunder

(参考訳) セキュアな符号化関数を近似するためにディープラーニングに基づく技術を用いることは、無線通信システムの一般的な符号化・復号タスクで得られた優れた結果を背景に、無線通信分野で大きな関心を集めている。特に重要なのは、基礎となるチャネルに関する知識なしに機能するモデルフリーな技術の開発である。そのような技術は、例えば、条件付きチャネル分布を推定・モデル化するための敵対的生成ネットワーク、報酬関数としての相互情報量推定、あるいは強化学習を利用する。本稿では強化学習によるアプローチを取り上げ、特に、ニューラルネットワークに基づくセキュアな符号化をモデルフリーで行うための方策勾配法を検討する。符号化過程に特定の剰余類 (コセット) 構造を課すために以前開発された技術は、最近の強化学習手法と組み合わせることができる。この新しいアプローチを広範なシミュレーションによって評価し、盗聴者の復号性能が一定の誤りレベルで頭打ちになることを示す。

The use of deep learning-based techniques for approximating secure encoding functions has attracted considerable interest in wireless communications due to impressive results obtained for general coding and decoding tasks for wireless communication systems. Of particular importance is the development of model-free techniques that work without knowledge about the underlying channel. Such techniques utilize, for example, generative adversarial networks to estimate and model the conditional channel distribution, mutual information estimation as a reward function, or reinforcement learning. In this paper, the reinforcement learning approach is studied and, in particular, the policy gradient method for a model-free approach to neural network-based secure encoding is investigated. Previously developed techniques for enforcing a certain coset structure on the encoding process can be combined with recent reinforcement learning approaches. This new approach is evaluated by extensive simulations, and it is demonstrated that the resulting decoding performance of an eavesdropper is capped at a certain error level.

翻訳日:2021-06-02 21:53:10 公開日:2021-06-01

# (参考訳) 頂点$p$-center問題の解法のためのグラフ畳み込みネットワークによる実験

Experiments with graph convolutional networks for solving the vertex $p$-center problem ( http://arxiv.org/abs/2106.00357v1 )

ライセンス: CC BY 4.0

Elisabeth Gaar and Markus Sinnl

(参考訳) ここ数年、グラフ畳み込みネットワーク (GCN) は、グラフ上で定義された NP 困難な組合せ最適化問題 (COP) に取り組むための手法として、機械学習コミュニティで人気の研究テーマとなっている。得られる結果は、オペレーションズ・リサーチ分野の問題特化型の解法と比べるとまだ競争力に欠けることが多いが、巡回セールスパーソン問題 (TSP) のような古典的な COP に対しては、GCN は従来の機械学習手法より改善をもたらすことが多い。本稿では、グラフ上のもう一つの古典的な COP である頂点 p センター問題 (PCP) を GCN で解くことについての予備的検討を示す。特に、TSP に対するエンドツーエンド学習に基づく成功したモデルが、一般的に用いられる TSP と同様の 2 次元ユークリッドグラフを入力として定義される PCP に適応できるかどうかを調べる。ただし、PCP の目的関数は min-max 構造を持つため、対称な最適解 (すなわち正解データとなる解) が多数存在しうるほか、学習上の困難が生じる可能性がある。得られた予備的な結果は、ネットワークアーキテクチャのアイデアをそのまま転用してもあまりうまくいかないことを示している。したがって、PCP は GCN 分野における新しいアイデアや開発のための興味深いベンチマーク問題になりうると考えられる。

In the last few years, graph convolutional networks (GCNs) have become a popular research direction in the machine learning community for tackling NP-hard combinatorial optimization problems (COPs) defined on graphs. While the obtained results are usually still not competitive with problem-specific solution approaches from the operations research community, GCNs often lead to improvements compared to previous machine learning approaches for classical COPs such as the traveling salesperson problem (TSP). In this work, we present a preliminary study on using GCNs for solving the vertex p-center problem (PCP), which is another classic COP on graphs. In particular, we investigate whether a successful model based on end-to-end training for the TSP can be adapted to the PCP, which is defined on a 2D Euclidean graph input similar to that of the commonly used version of the TSP. However, the objective of the PCP has a min-max structure, which could lead to many symmetric optimal (i.e., ground-truth) solutions and other potential difficulties for learning. Our preliminary results show that a direct transfer of network architecture ideas indeed does not seem to work too well. Thus, we think that the PCP could be an interesting benchmark problem for new ideas and developments in the area of GCNs.
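
To make the min-max objective mentioned above concrete, here is a small sketch on 2D Euclidean points. It uses exhaustive search and the classical farthest-first greedy heuristic, not the GCN approach studied in the paper. The function names and the six sample points are illustrative assumptions.

```python
import math
from itertools import combinations

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def pcenter_objective(points, centers):
    """Min-max objective: the largest distance from any point to its nearest center."""
    return max(min(dist(p, points[c]) for c in centers) for p in points)

def brute_force_pcenter(points, p):
    """Exact solution by enumeration; only feasible for tiny instances."""
    return min(combinations(range(len(points)), p),
               key=lambda cs: pcenter_objective(points, cs))

def greedy_pcenter(points, p, first=0):
    """Farthest-first traversal, a classical heuristic with a factor-2 guarantee."""
    centers = [first]
    while len(centers) < p:
        farthest = max(range(len(points)),
                       key=lambda i: min(dist(points[i], points[c]) for c in centers))
        centers.append(farthest)
    return centers

# Two well-separated clusters of three points each (made-up coordinates).
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
best = brute_force_pcenter(points, 2)
best_value = pcenter_objective(points, best)      # 1.0, centers at (0, 0) and (10, 10)
greedy_value = pcenter_objective(points, greedy_pcenter(points, 2))
```

Because only the single worst-served vertex determines the objective value, many different center sets can share the same value on larger instances, which is the symmetry issue the abstract raises.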

翻訳日:2021-06-02 21:41:16 公開日:2021-06-01

# (参考訳) フットボールのボディーオリエンテーションを分類として学ぶ

Learning Football Body-Orientation as a Matter of Classification ( http://arxiv.org/abs/2106.00359v1 )

ライセンス: CC BY 4.0

Adrià Arbués-Sangüesa, Adrián Martín, Paulino Granero, Coloma Ballester, Gloria Haro

(参考訳) 体の向き (オリエンテーション) はサッカー選手にとって重要なスキルであり、多くのイベント、特にパスを伴うイベントにおいて差を生む要因となる。しかし、コンピュータビジョン技術に基づく既存の向き推定手法には、まだ改善の余地が大きい。我々の知る限り、本論文は映像から直接向きを推定する最初のディープラーニングモデルを提示する。この課題を、クラスが向きのビンに対応する分類問題として扱い、循環損失関数を導入することで、よく知られた畳み込みネットワークを調整して選手の向きデータを得る。モデルは、ウェアラブル EPTS デバイスから得られた正解の向きデータを用いて学習され、このデータは現在のフレームで知覚される向きに対して個別に補正される。得られた結果は従来手法を上回り、特に絶対誤差の中央値は選手あたり 12 度未満である。あらゆる種類のサッカー映像への汎化の可能性を示すために、アブレーション研究も行う。

Orientation is a crucial skill for football players that becomes a differential factor in a large set of events, especially the ones involving passes. However, existing orientation estimation methods, which are based on computer-vision techniques, still have a lot of room for improvement. To the best of our knowledge, this article presents the first deep learning model for estimating orientation directly from video footage. By approaching this challenge as a classification problem where classes correspond to orientation bins, and by introducing a cyclic loss function, a well-known convolutional network is refined to provide player orientation data. The model is trained by using ground-truth orientation data obtained from wearable EPTS devices, which are individually compensated with respect to the perceived orientation in the current frame. The obtained results outperform previous methods; in particular, the absolute median error is less than 12 degrees per player. An ablation study is included in order to show the potential generalization to any kind of football video footage.
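
The idea of treating orientation as classification over angular bins, with errors measured cyclically, can be illustrated with the short sketch below. The abstract does not give the paper's loss function, so this only shows why a cyclic distance is needed. The bin count of 12 and the function names are assumptions.

```python
def angle_to_bin(angle_deg, num_bins=12):
    """Map an angle in degrees to a class index; bins are equal-width sectors."""
    width = 360.0 / num_bins
    return int((angle_deg % 360.0) // width)

def cyclic_bin_distance(a, b, num_bins=12):
    """Distance between two bins when the last bin is adjacent to the first."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)

# 5 degrees and 355 degrees are only 10 degrees apart, yet fall in bins 0 and 11.
near_zero = angle_to_bin(5.0)      # 0
near_full = angle_to_bin(355.0)    # 11
plain_gap = abs(near_zero - near_full)                 # 11 bins if treated linearly
cyclic_gap = cyclic_bin_distance(near_zero, near_full)  # 1 bin on the circle
```

A loss that ignores this wrap-around would penalise a prediction of bin 11 for a true bin 0 as heavily as the worst possible mistake, even though the two orientations are neighbours.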

翻訳日:2021-06-02 21:33:58 公開日:2021-06-01

# (参考訳) 空間的・時間的制約を伴う大規模・動的・分散連立形成

Large-scale, Dynamic and Distributed Coalition Formation with Spatial and Temporal Constraints ( http://arxiv.org/abs/2106.00379v1 )

ライセンス: CC BY-SA 4.0

Luca Capezzuto, Danesh Tarapore, and Sarvapali D. Ramchurn

(参考訳) 空間的・時間的制約付き連合形成問題 (CFSTP) は、少数のエージェントが、それぞれ期限と作業量を持つ多数のタスクを実行しなければならないマルチエージェントのタスク割り当て問題である。完了するタスクの数を最大化するために、エージェントは連合の形成、解散、再形成を通じて協力する必要がある。CFSTP の元の数理計画による定式化は、長く、かつ問題の多い Big-M 法に基づいているため、実装が難しい。本稿では、コンパクトで実装が容易な定式化を提案する。さらに、最先端の CFSTP アルゴリズムの分散版である D-CTS を設計する。ロンドン消防隊の公開記録を用いて、$347588$ 件のタスクからなるデータセットと、動的環境における消防士の出動をシミュレートするテストフレームワークを作成する。最大 $150$ エージェント、$3000$ タスクの問題において、D-CTS は最先端の分散アルゴリズムである DSA-SDP と比べて $3.79\% \pm [42.22\%, 1.96\%]$ 多くのタスクを完了し、通信オーバーヘッドと時間計算量の点で 1 桁効率的である。D-CTS は、大規模かつ動的で分散型の CFSTP に対する最初のベンチマークを確立する。

The Coalition Formation with Spatial and Temporal constraints Problem (CFSTP) is a multi-agent task allocation problem in which a few agents have to perform many tasks, each with its own deadline and workload. To maximize the number of completed tasks, the agents need to cooperate by forming, disbanding and reforming coalitions. The original mathematical programming formulation of the CFSTP is difficult to implement, since it is lengthy and based on the problematic Big-M method. In this paper, we propose a compact and easy-to-implement formulation. Moreover, we design D-CTS, a distributed version of the state-of-the-art CFSTP algorithm. Using public London Fire Brigade records, we create a dataset with $347588$ tasks and a test framework that simulates the mobilization of firefighters in dynamic environments. In problems with up to $150$ agents and $3000$ tasks, compared to DSA-SDP, a state-of-the-art distributed algorithm, D-CTS completes $3.79\% \pm [42.22\%, 1.96\%]$ more tasks, and is one order of magnitude more efficient in terms of communication overhead and time complexity. D-CTS sets the first large-scale, dynamic and distributed CFSTP benchmark.

翻訳日:2021-06-02 21:21:45 公開日:2021-06-01

# (参考訳) 決定概念の格子対決定木とランダムフォレスト

Decision Concept Lattice vs. Decision Trees and Random Forests ( http://arxiv.org/abs/2106.00387v1 )

ライセンス: CC BY 4.0

Egor Dudyrev, Sergei O. Kuznetsov

(参考訳) 決定木とそのアンサンブルは、教師付き機械学習の非常に人気のあるモデルである。本稿では、多項式時間で構築可能な新しい教師付き機械学習モデルを提案し、分類問題と回帰問題の両方に適用できる決定木、それらのアンサンブル、FCAの考え方を融合する。具体的には,まず決定木に基づく概念格子の一部を構成する多項式時間アルゴリズムを提案する。第2に,最先端モデルに匹敵する予測品質で分類タスクと回帰タスクの両方を解くための概念格子に基づく予測スキームについて述べる。

Decision trees and their ensembles are very popular models of supervised machine learning. In this paper we merge the ideas underlying decision trees, their ensembles and FCA by proposing a new supervised machine learning model which can be constructed in polynomial time and is applicable for both classification and regression problems. Specifically, we first propose a polynomial-time algorithm for constructing a part of the concept lattice that is based on a decision tree. Second, we describe a prediction scheme based on a concept lattice for solving both classification and regression tasks with prediction quality comparable to that of state-of-the-art models.

翻訳日:2021-06-02 21:05:08 公開日:2021-06-01

# (参考訳) 視覚に基づく異常赤血球分類の解析

Analysis of Vision-based Abnormal Red Blood Cell Classification ( http://arxiv.org/abs/2106.00389v1 )

ライセンス: CC BY 4.0

Annika Wong and Nantheera Anantrasirichai and Thanarat H. Chalidabhongse and Duangdao Palasuwan and Attakorn Palasuwan and David Bull

(参考訳) 赤血球 (RBC) の異常の同定は、貧血から肝疾患まで幅広い疾患の診断の鍵となる。現在、これは手作業で行われており、時間がかかるうえ主観的なプロセスである。本稿では、機械学習の利点を活用して細胞異常検出の処理能力と標準化を高める自動化プロセスを提示し、その性能を分析する。古典的な機械学習技術であるサポートベクターマシン (SVM)、表形式データ向けのディープラーニングアーキテクチャである TabNet、医用画像セグメンテーション向けに設計されたセマンティックセグメンテーションネットワークである U-Net の 3 種類の機械学習技術を用いた。重要な課題は、機械学習の有効性に影響するデータセットの著しい不均衡であった。これに対処するため、SMOTE (Synthetic Minority Over-sampling Technique) による特徴空間での少数クラスサンプルの合成と、コスト考慮型学習を検討した。全体的な性能を改善するために、これら 2 つの手法の組み合わせも検討する。これらの戦略は少数クラスに対する感度を高めることが分かった。未知の細胞がセマンティックセグメンテーションに与える影響を示し、モデルがラベル付き細胞から学習した内容をこれらの未知の細胞に適用していることを示唆する結果も得た。これらの結果は、古典的モデルと新しいディープラーニングネットワークのいずれも、RBC 異常検出の自動化に有望な手法であることを示している。

Identification of abnormalities in red blood cells (RBC) is key to diagnosing a range of medical conditions from anaemia to liver disease. Currently this is done manually, a time-consuming and subjective process. This paper presents an automated process utilising the advantages of machine learning to increase capacity and standardisation of cell abnormality detection, and its performance is analysed. Three different machine learning technologies were used: a Support Vector Machine (SVM), a classical machine learning technology; TabNet, a deep learning architecture for tabular data; and U-Net, a semantic segmentation network designed for medical image segmentation. A critical issue was the highly imbalanced nature of the dataset, which impacts the efficacy of machine learning. To address this, two remedies were investigated: synthesising minority class samples in feature space via the Synthetic Minority Over-sampling Technique (SMOTE), and cost-sensitive learning. A combination of these two methods is also investigated to improve the overall performance. These strategies were found to increase sensitivity to minority classes. The impact of unknown cells on semantic segmentation is demonstrated, with some evidence of the model applying learning of labelled cells to these anonymous cells. These findings indicate that both classical models and new deep learning networks are promising methods for automating RBC abnormality detection.
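
The two imbalance remedies named above can be sketched in a few lines. The first function follows the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours. The second computes inverse-frequency class weights as one common form of cost-sensitive learning. Function names, the neighbour count `k`, the seed, and the toy feature vectors are assumptions, and the paper's actual pipeline is not reproduced here.

```python
import random
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

def class_weights(labels):
    """Inverse-frequency weights: rarer classes receive larger misclassification cost."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

# Hypothetical 2D feature vectors for a rare abnormal-cell class.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote(minority, n_new=5)
weights = class_weights(["normal"] * 8 + ["abnormal"] * 2)
# weights == {"normal": 0.625, "abnormal": 2.5}
```

Oversampling changes the training data while class weights change the loss, so the two can be applied together, as the abstract reports doing.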

翻訳日:2021-06-02 20:58:54 公開日:2021-06-01

# (参考訳) SHUOWEN-JIEZI: 中国語言語モデルの事前学習のための言語学的知見に基づくトークナイザ

SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining ( http://arxiv.org/abs/2106.00400v1 )

ライセンス: CC BY 4.0

Chenglei Si, Zhengyan Zhang, Yingfa Chen, Fanchao Qi, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

(参考訳) 中国語の事前学習済み言語モデル (PLM) における従来のトークン化手法は、各文字を分割不可能なトークンとして扱い (Devlin et al., 2019)、中国語の書記体系の特徴を無視している。本研究では、発音、字形 (グリフ)、単語境界という 3 つの主要因が PLM の中国語トークン化に与える影響を包括的に調査する。これに対応して、1) 発音に基づくトークナイザである SHUOWEN (「言葉を話す」の意)、2) 字形に基づくトークナイザである JIEZI (「文字を解く」の意)、3) 中国語の単語分割を用いる単語分割トークナイザ、の 3 種類のトークナイザを提案する。検討したトークナイザの有効性を実験的に比較するために、それぞれを用いて BERT 形式の言語モデルを事前学習し、さまざまな下流の NLU タスクで評価する。その結果、SHUOWEN と JIEZI は概して従来の単一文字トークナイザを上回る一方、前処理としての中国語単語分割には利点が見られなかった。さらに、提案した SHUOWEN と JIEZI は、ノイズの多いテキストの処理において頑健性が大幅に向上した。言語学的知見に基づく中国語 NLP を促進するために、コードと事前学習済みモデルを公開する予定である。

Conventional tokenization methods for Chinese pretrained language models (PLMs) treat each character as an indivisible token (Devlin et al., 2019), which ignores the characteristics of the Chinese writing system. In this work, we comprehensively study the influences of three main factors on the Chinese tokenization for PLM: pronunciation, glyph (i.e., shape), and word boundary. Correspondingly, we propose three kinds of tokenizers: 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; 2) JIEZI (meaning Solve Character), the glyph-based tokenizers; 3) Word segmented tokenizers, the tokenizers with Chinese word segmentation. To empirically compare the effectiveness of studied tokenizers, we pretrain BERT-style language models with them and evaluate the models on various downstream NLU tasks. We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers, while Chinese word segmentation shows no benefit as a preprocessing step. Moreover, the proposed SHUOWEN and JIEZI tokenizers exhibit significantly better robustness in handling noisy texts. The code and pretrained models will be publicly released to facilitate linguistically informed Chinese NLP.

翻訳日:2021-06-02 20:39:06 公開日:2021-06-01

# (参考訳) KGPool:関係抽出のための動的知識グラフコンテキスト選択

KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction ( http://arxiv.org/abs/2106.00459v1 )

ライセンス: CC BY 4.0

Abhishek Nadgeri, Anson Bastos, Kuldeep Singh, Isaiah Onando Mulang', Johannes Hoffart, Saeedeh Shekarpour, Vijay Saraswat

(参考訳) 本稿では、単一の文からの関係抽出 (RE) のための新しい手法を提示する。この手法は、文と与えられた 2 つのエンティティを知識グラフ (KG) 内の正準的な事実に対応付ける。特にこのような文単位の RE の設定では、単一の文の文脈は疎であることが多い。本稿では、この疎性に対処するために、KG から得られる追加の事実によって文脈を動的に拡張する KGPool 法を導入する。KGPool は、ニューラル手法を用いてこれらの事実 (エンティティの別名、エンティティの説明など) の表現を学習し、文の文脈を補う。拡張されたすべての事実を静的に用いる既存手法とは異なり、KGPool はこの拡張を文に応じて条件付ける。異なるニューラルモデルと KG (Wikidata と NYT Freebase) を用いて KGPool の有効性を評価する。標準データセットでの実験評価により、KGPool の表現をグラフニューラルネットワークに入力することで、手法全体が最先端手法よりも大幅に高精度になることを示す。

We present a novel method for relation extraction (RE) from a single sentence, mapping the sentence and two given entities to a canonical fact in a knowledge graph (KG). Especially in this presumed sentential RE setting, the context of a single sentence is often sparse. This paper introduces the KGPool method to address this sparsity, dynamically expanding the context with additional facts from the KG. It learns the representation of these facts (entity alias, entity descriptions, etc.) using neural methods, supplementing the sentential context. Unlike existing methods that statically use all expanded facts, KGPool conditions this expansion on the sentence. We study the efficacy of KGPool by evaluating it with different neural models and KGs (Wikidata and NYT Freebase). Our experimental evaluation on standard datasets shows that by feeding the KGPool representation into a Graph Neural Network, the overall method is significantly more accurate than state-of-the-art methods.

翻訳日:2021-06-02 20:26:39 公開日:2021-06-01

# (参考訳) 機械学習における公平度指標の動物園

The zoo of Fairness metrics in Machine Learning ( http://arxiv.org/abs/2106.00467v1 )

ライセンス: CC BY 4.0

Alessandro Castelnovo, Riccardo Crupi, Greta Greco, Daniele Regoli

(参考訳) 近年,機械学習(ML)における公平性と自動意思決定の問題が,人工知能を扱う科学コミュニティで注目されている。 MLにおける公平性の定義の多様さが提案され、人口の個人に影響を与える状況において「公正な決定」とは何かという異なる概念が検討されている。これらの概念間の正確な相違、含意、および「直交性」は、まだ文献で完全には分析されていない。本研究では、この定義の動物園から何らかの順序付けを試みる。

In recent years, the problem of addressing fairness in Machine Learning (ML) and automatic decision-making has attracted a lot of attention in the scientific communities dealing with Artificial Intelligence. A plethora of different definitions of fairness in ML have been proposed, each considering a different notion of what a "fair decision" is in situations impacting individuals in the population. The precise differences, implications and "orthogonality" between these notions have not yet been fully analyzed in the literature. In this work, we try to bring some order to this zoo of definitions.
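
To illustrate how two fairness notions can disagree on the same predictions, the sketch below computes two widely used group metrics, the demographic parity difference and the equal opportunity difference. The abstract does not name the specific metrics the paper covers, so these two are chosen here only as familiar examples. The function names and the eight toy records are assumptions.

```python
def rate(values):
    """Mean of a 0/1 list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def demographic_parity_diff(y_pred, group):
    """Gap in positive-prediction rates between group 0 and group 1."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(rate(g0) - rate(g1))

def equal_opportunity_diff(y_true, y_pred, group):
    """Gap in true-positive rates between group 0 and group 1."""
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    return abs(rate(g0) - rate(g1))

# Toy data: both groups receive positive predictions at the same rate,
# but qualified members of group 1 are accepted only half of the time.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
dp_gap = demographic_parity_diff(y_pred, group)         # 0.0
eo_gap = equal_opportunity_diff(y_true, y_pred, group)  # 0.5
```

Here the classifier looks perfectly fair under one definition and clearly unfair under the other, which is the kind of tension between notions that the abstract sets out to organise.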

翻訳日:2021-06-02 20:08:26 公開日:2021-06-01

# (参考訳) 微分プライバシーを持つガウス過程

Gaussian Processes with Differential Privacy ( http://arxiv.org/abs/2106.00474v1 )

ライセンス: CC BY 4.0

Antti Honkela

(参考訳) ガウス過程 (GP) は、さまざまな予測タスクに広く用いられるノンパラメトリックなベイズモデルである。差分プライバシー (DP) によって GP に強力なプライバシー保護を加える従来の研究は、予測対象 (モデル出力) のプライバシーのみを保護し、入力は保護しないものに限られていた。我々は、モデルの入力と出力の両方に DP 保護を備えた GP を導入することで、この制限を取り除く。これは、スパース GP の方法論を用い、既知の誘導点上でのプライベートな変分近似を公開することで実現する。近似の共分散は、DP ノイズによって加わる不確実性を近似的に考慮するように調整される。この近似は、標準的なスパース GP の技法を用いて任意の予測を計算するために利用できる。また、検証セットの対数尤度に適用するプライベート選択プロトコルを用いたハイパーパラメータ学習手法を提案する。実験により、十分な量のデータがあれば、強力なプライバシー保護のもとで正確なモデルが得られることを示す。

Gaussian processes (GPs) are non-parametric Bayesian models that are widely used for diverse prediction tasks. Previous work on adding strong privacy protection to GPs via differential privacy (DP) has been limited to protecting only the privacy of the prediction targets (model outputs) but not the inputs. We break this limitation by introducing GPs with DP protection for both model inputs and outputs. We achieve this by using sparse GP methodology and publishing a private variational approximation on known inducing points. The approximation covariance is adjusted to approximately account for the added uncertainty from DP noise. The approximation can be used to compute arbitrary predictions using standard sparse GP techniques. We propose a method for hyperparameter learning using a private selection protocol applied to the validation set log-likelihood. Our experiments demonstrate that, given a sufficient amount of data, the method can produce accurate models under strong privacy protection.

翻訳日:2021-06-02 19:43:40 公開日:2021-06-01

# (参考訳) 微分プライバシーのシャッフルモデルにおける厳密な会計

Tight Accounting in the Shuffle Model of Differential Privacy ( http://arxiv.org/abs/2106.00477v1 )

ライセンス: CC BY 4.0

Antti Koskela, Mikko A. Heikkilä, Antti Honkela

(参考訳) 差分プライバシーのシャッフルモデルは、ローカルなプライバシー機構と信頼できるシャッフラーの組み合わせに基づく新しい分散プライバシーモデルである。シャッフラーがもたらす追加のランダム化により、純粋にローカルな機構と比べてプライバシーの上界が改善されることが示されている。しかし、特にマルチメッセージプロトコルについて厳密な上界を求めることは、シャッフラーがもたらす複雑さのために難しい。$(\varepsilon,\delta)$ 差分プライバシー保証を評価するために最近提案された Fourier Accountant は、さまざまな複雑な機構の非適応的合成に対して、一般的に用いられる手法よりも厳密な上界を与えることが示されている。本稿では、シャッフルモデルにおけるいくつかの広く使われる機構のマルチメッセージ版に対して、Fourier Accountant を用いて厳密なプライバシー上界を計算する方法を示し、文献における既存の上界が緩いことを示す。

The shuffle model of differential privacy is a novel distributed privacy model based on a combination of local privacy mechanisms and a trusted shuffler. It has been shown that the additional randomisation provided by the shuffler improves privacy bounds compared to purely local mechanisms. Computing tight bounds, especially for multi-message protocols, is made difficult by the complexity the shuffler introduces. The recently proposed Fourier Accountant for evaluating $(\varepsilon,\delta)$-differential privacy guarantees has been shown to give tighter bounds than commonly used methods for non-adaptive compositions of various complex mechanisms. In this paper we show how to compute tight privacy bounds using the Fourier Accountant for multi-message versions of several ubiquitous mechanisms in the shuffle model and demonstrate the looseness of the existing bounds in the literature.
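
The structure of a single-message shuffle protocol (a local randomizer followed by a trusted shuffler) can be sketched as below, using binary randomized response as the local mechanism. The sketch does not compute the amplified $(\varepsilon,\delta)$ guarantee, which is the part the paper addresses with the Fourier Accountant. The function names, the local budget `eps0`, and the sample bits are assumptions.

```python
import math
import random

def truth_probability(eps0):
    """Probability of reporting the true bit under eps0-local randomized response."""
    return math.exp(eps0) / (1.0 + math.exp(eps0))

def randomized_response(bit, eps0, rng):
    """Local randomizer: keep the bit with probability e^eps0 / (1 + e^eps0), else flip it."""
    return bit if rng.random() < truth_probability(eps0) else 1 - bit

def shuffle_protocol(bits, eps0, seed=0):
    """Each user randomizes locally; the shuffler then removes the report order."""
    rng = random.Random(seed)
    reports = [randomized_response(b, eps0, rng) for b in bits]
    rng.shuffle(reports)
    return reports

def debiased_mean(reports, eps0):
    """Unbiased estimate of the true mean from the noisy reports."""
    p = truth_probability(eps0)
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)

bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
reports = shuffle_protocol(bits, eps0=1.0)
estimate = debiased_mean(reports, eps0=1.0)
```

The analyst sees only the shuffled multiset of reports, so the link between a report and the user who sent it is hidden. This is the source of the improved privacy bounds over the purely local model.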

翻訳日:2021-06-02 19:28:28 公開日:2021-06-01

# (参考訳) RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network

RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network ( http://arxiv.org/abs/2106.00496v1 )

ライセンス: CC BY 4.0

Lili Zhao, Zezhi Zhu, Xuhu Lin, Xuezhou Guo, Qian Yin, Wenyi Wang, Jianwen Chen

(参考訳) 取得されたフレームの間の中間フレームを合成する LiDAR 点群フレーム補間は、多くのアプリケーションにおいて重要な課題となっている。特に、点群の伝送量を削減するために、参照フレームから中間フレームを予測してデータを高フレームレートにアップサンプリングする用途がある。しかし、点群は高次元かつ疎であるため、LiDAR 点群の中間フレームの予測は動画の場合よりも難しい。本稿では、レンジ画像 (RI) を中間表現として利用し、CNN によってフレーム補間を行う新しい LiDAR 点群フレーム補間手法を提案する。RI 固有の特性がカラー画像とは異なることを考慮し、空間適応的な畳み込みを導入してレンジ特徴を適応的に抽出するとともに、オプティカルフローを生成する高効率なフロー推定手法を提示する。提案モデルは、オプティカルフローに基づいて入力フレームとレンジ特徴をワープし、補間フレームを合成する。KITTI データセットでの広範な実験により、本手法は最先端の動画フレーム補間手法を用いた場合と比べて、一貫して優れたフレーム補間結果とより良い知覚品質を達成することが示された。提案手法は、フレーム間予測のために任意の LiDAR 点群圧縮システムに統合できる。

LiDAR point cloud frame interpolation, which synthesizes the intermediate frame between captured frames, has emerged as an important issue for many applications. In particular, to reduce the amount of point cloud data transmitted, the intermediate frame can be predicted from reference frames so that the data is upsampled to a higher frame rate. However, due to the high-dimensional and sparse characteristics of point clouds, it is more difficult to predict the intermediate frame for LiDAR point clouds than for videos. In this paper, we propose a novel LiDAR point cloud frame interpolation method, which exploits range images (RIs) as an intermediate representation with CNNs to conduct the frame interpolation process. Considering that the inherent characteristics of RIs differ from those of color images, we introduce spatially adaptive convolutions to extract range features adaptively, while a highly efficient flow estimation method is presented to generate optical flows. The proposed model then warps the input frames and range features based on the optical flows to synthesize the interpolated frame. Extensive experiments on the KITTI dataset demonstrate that our method consistently achieves superior frame interpolation results, with better perceptual quality than state-of-the-art video frame interpolation methods. The proposed method could be integrated into any LiDAR point cloud compression system for inter prediction.
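
The range-image representation mentioned above is commonly obtained by a spherical projection of each 3D point onto a 2D grid indexed by elevation and azimuth. The sketch below shows one such projection under stated assumptions: the tiny 4 x 8 image size, the symmetric 30 degree vertical field of view, and the function name are hypothetical, and the paper's own preprocessing may differ.

```python
import math

def to_range_image(points, height=4, width=8, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points to a height x width grid of ranges (0.0 = empty pixel)."""
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)        # azimuth in [-pi, pi]
        pitch = math.asin(z / r)      # elevation
        u = 0.5 * (1.0 - yaw / math.pi) * width         # column coordinate
        v = (1.0 - (pitch - fov_down) / fov) * height   # row coordinate, top = fov_up
        col = min(width - 1, max(0, int(u)))
        row = min(height - 1, max(0, int(v)))
        # When several points fall into one pixel, keep the nearest return.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image

# Three made-up points: two straight ahead at different ranges, one to the left.
image = to_range_image([(10.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 5.0, 0.0)])
# image[2][4] == 4.0 (nearest of the two forward points), image[2][2] == 5.0
```

Once the point cloud is in this dense 2D form, image-style operations such as convolution and flow-based warping become applicable, which is what makes the range image a convenient intermediate representation.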

翻訳日:2021-06-02 19:05:40 公開日:2021-06-01

# (参考訳) 動的環境とタスクへの適応のための統一認知学習フレームワーク

A Unified Cognitive Learning Framework for Adapting to Dynamic Environment and Tasks ( http://arxiv.org/abs/2106.00501v1 )

ライセンス: CC BY 4.0

Qihui Wu, Tianchen Ruan, Fuhui Zhou, Yang Huang, Fan Xu, Shijin Zhao, Ya Liu, and Xuyang Huang

(参考訳) さまざまな目的を実現するために、多くの機械学習フレームワークが提案され、無線通信で利用されている。しかし、動的な無線環境やタスクに適応できず、自己学習もできないことが、その幅広い応用と達成可能な性能を制限している。脳の認知機構による霊長類の行動の高い柔軟性と適応性に着想を得て、動的な無線環境とタスクのための統一認知学習 (CL) フレームワークを提案する。提案する CL の数学的枠組みを確立する。公開されている権威あるデータセットを用い、変調認識を例として、提案する CL フレームワークには、動的な環境とタスクに適応する能力、自己学習能力、および「良貨が悪貨を駆逐する」能力という 3 つの利点があることを示す。提案する CL フレームワークは、既存の学習フレームワークを充実させ、その応用範囲を広げることができる。

Many machine learning frameworks have been proposed and used in wireless communications for realizing diverse goals. However, their inability to adapt to dynamic wireless environments and tasks, and to learn by themselves, limits their range of applications and achievable performance. Inspired by the great flexibility and adaptation of primate behaviors due to the brain's cognitive mechanism, a unified cognitive learning (CL) framework is proposed for dynamic wireless environments and tasks. The mathematical framework for our proposed CL is established. Using a public and authoritative dataset, and taking modulation recognition as an example, we demonstrate that our proposed CL framework has three advantages: the capability of adapting to dynamic environments and tasks, the self-learning capability, and the capability of 'good money driving out bad money'. The proposed CL framework can enrich current learning frameworks and widen their applications.

翻訳日:2021-06-02 18:54:44 公開日:2021-06-01

# (参考訳) マルチラベルリモートセンシング画像検索のための新しいグラフ理論的深層表現学習法

A Novel Graph-Theoretic Deep Representation Learning Method for Multi-Label Remote Sensing Image Retrieval ( http://arxiv.org/abs/2106.00506v1 )

ライセンス: CC BY 4.0

Gencer Sumbul and Begüm Demir

(参考訳) 本稿では、マルチラベルのリモートセンシング (RS) 画像検索問題の枠組みにおいて、新しいグラフ理論的深層表現学習手法を提示する。提案手法は、アーカイブ内の各 RS 画像に関連付けられたマルチラベルの共起関係を抽出し、活用することを目的とする。このために、各訓練画像をまずグラフ構造で表現する。このグラフ構造は、局所情報とそれに関連する空間的構成の両方を組み合わせた領域ベースの画像表現を与える。他のグラフベース手法とは異なり、提案手法は、アーカイブ内の各 RS 画像のグラフ構造を自動的に予測するディープニューラルネットワークを訓練するための新しい学習戦略を含む。この戦略は、領域表現学習の損失関数を用い、マルチラベルの共起関係に基づいて画像内容を特徴付ける。実験結果は、最先端の深層表現学習手法と比較して、RS における検索問題に対する提案手法の有効性を示している。提案手法のコードは https://git.tu-berlin.de/rsim/GT-DRL-CBIR で公開されている。

This paper presents a novel graph-theoretic deep representation learning method in the framework of multi-label remote sensing (RS) image retrieval problems. The proposed method aims to extract and exploit the multi-label co-occurrence relationships associated with each RS image in the archive. To this end, each training image is initially represented with a graph structure that provides a region-based image representation combining both local information and the related spatial organization. Unlike other graph-based methods, the proposed method contains a novel learning strategy to train a deep neural network that automatically predicts a graph structure for each RS image in the archive. This strategy employs a region representation learning loss function to characterize the image content based on its multi-label co-occurrence relationship. Experimental results show the effectiveness of the proposed method for retrieval problems in RS compared to state-of-the-art deep representation learning methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/GT-DRL-CBIR .

翻訳日:2021-06-02 18:45:12 公開日:2021-06-01

# (参考訳) レベル適応型クレジット割り当てを用いた協調型マルチエージェント転送学習

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment ( http://arxiv.org/abs/2106.00517v1 )

ライセンス: CC BY 4.0

Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

(参考訳) 転移学習を協調型マルチエージェント強化学習 (MARL) に拡張することは、近年大きな注目を集めている。単一エージェントの設定とは対照的に、協調型 MARL では不可欠な協調 (コーディネーション) が各エージェントの方策を制約する。しかし、既存の転移手法はエージェントの方策のみに着目し、協調に関する知識を無視している。我々は、協調全体を複数の協調パターンに適切に分解することで、頑健な協調知識の転移を実現する新しいアーキテクチャを提案する。クレジット割り当てを考慮したエージェント間の協調を実現するために、level-adaptive QTransformer (LA-QTransformer) と呼ぶ新しいミキシングネットワークを用いる。異なるエージェントに対する適切な協調パターンは、協調知識の転移に特化した新しい level-adaptive Transformer (LA-Transformer) によって実現される。さらに、より多様なシナリオで協調の転移を実現するために、Population Invariant agent with Transformer (PIT) と呼ぶ新しいエージェントネットワークを用いる。StarCraft II のマイクロマネジメントにおける広範な実験により、LA-QTransformer と PIT の組み合わせが最先端のベースラインと比べて優れた性能を達成することを示す。

Extending transfer learning to cooperative multi-agent reinforcement learning (MARL) has recently received much attention. In contrast to the single-agent setting, the coordination indispensable in cooperative MARL constrains each agent's policy. However, existing transfer methods focus exclusively on agent policy and ignore coordination knowledge. We propose a new architecture that realizes robust coordination knowledge transfer through appropriate decomposition of the overall coordination into several coordination patterns. We use a novel mixing network named level-adaptive QTransformer (LA-QTransformer) to realize agent coordination that considers credit assignment, with appropriate coordination patterns for different agents realized by a novel level-adaptive Transformer (LA-Transformer) dedicated to the transfer of coordination knowledge. In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in a wider variety of scenarios. Extensive experiments in StarCraft II micro-management show that LA-QTransformer together with PIT achieves superior performance compared with state-of-the-art baselines.

翻訳日:2021-06-02 18:38:14 公開日:2021-06-01

# (参考訳) MalPhase:ネットワークフローデータを用いた細粒度マルウェア検出

MalPhase: Fine-Grained Malware Detection Using Network Flow Data ( http://arxiv.org/abs/2106.00541v1 )

ライセンス: CC BY 4.0

Michal Piskozub, Fabio De Gaspari, Frederick Barr-Smith, Luigi V. Mancini, Ivan Martinovic

(参考訳) 経済的インセンティブにより、マルウェアの著者は、機密性の高いデータを盗み、個人や企業に多額の身代金を支払うよう脅迫するために、ますます複雑な新しいマルウェアを常に開発することが奨励される。 2017年、世界のサイバー攻撃の経済的影響は445～600億米ドル、世界のGDPの0.8%と推定されている。伝統的に、マルウェアに対する防御に使われるアプローチの1つはネットワークトラフィック分析であり、これは潜在的に悪意のあるソフトウェアの存在を検出するためにネットワークデータに依存する。しかし、ネットワーク速度とトラフィック量の増加に対応するために、ネットワーク分析は通常、集約されたネットワークデータを扱うことに限られる。本稿では,集約フローの限界に対処するシステムであるMalPhaseを提案する。 malphaseはマルウェアの検出、タイプ、家族分類のためのマルチフェーズパイプラインを備えている。拡張されたネットワークフロー機能と同時多層アーキテクチャを使用することで、ディープラーニングモデルのパフォーマンス向上が容易になり、悪意のあるフロー(>98% F1)を検出し、それらをそれぞれのマルウェアタイプ(>93% F1)とファミリー(>91% F1)に分類することができる。さらに、ロバストな機能の使用と自動エンコーダのデノナイズにより、MalPhaseは、さまざまな量の良質なトラフィックが混在するサンプルでうまく機能する。最後に、MalPhaseは、実際のネットワーク環境を反映する良質なフローにインターレースされた場合でも、既知のサンプルに匹敵するパフォーマンスで、目に見えないマルウェアサンプルを検出する。

Economic incentives encourage malware authors to constantly develop new, increasingly complex malware to steal sensitive data or blackmail individuals and companies into paying large ransoms. In 2017, the worldwide economic impact of cyberattacks is estimated to be between 445 and 600 billion USD, or 0.8% of global GDP. Traditionally, one of the approaches used to defend against malware is network traffic analysis, which relies on network data to detect the presence of potentially malicious software. However, to keep up with increasing network speeds and amount of traffic, network analysis is generally limited to work on aggregated network data, which is traditionally challenging and yields mixed results. In this paper we present MalPhase, a system that was designed to cope with the limitations of aggregated flows. MalPhase features a multi-phase pipeline for malware detection, type and family classification. The use of an extended set of network flow features and a simultaneous multi-tier architecture facilitates a performance improvement for deep learning models, making them able to detect malicious flows (>98% F1) and categorize them to a respective malware type (>93% F1) and family (>91% F1). Furthermore, the use of robust features and denoising autoencoders allows MalPhase to perform well on samples with varying amounts of benign traffic mixed in. Finally, MalPhase detects unseen malware samples with performance comparable to that of known samples, even when interlaced with benign flows to reflect realistic network environments.

翻訳日:2021-06-02 18:05:05 公開日:2021-06-01

# (参考訳) 関連集合による効率的な説明

Efficient Explanations With Relevant Sets ( http://arxiv.org/abs/2106.00546v1 )

ライセンス: CC BY 4.0

Yacine Izza, Alexey Ignatiev, Nina Narodytska, Martin C. Cooper, Joao Marques-Silva

(参考訳) 最近の研究は、与えられた入力に対する分類器による予測の確率論的説明として$\delta$-relevant inputs (またはset)を提案した。 $\delta$-relevant 集合は、(モデル非依存の)アンカーと(モデル-正確な) PI- の説明を関連付けるのに役立つので重要である。残念なことに、最小サイズの$\delta$-relevant集合の計算は${NP}^{PP}$に対して完備であり、その計算は実際はほとんど実現不可能である。本稿では,$\delta$-関係集合の実用的限界に取り組むための解について検討する。まず、本論文は部分最小集合の計算を交互に検討する。第2に、決定木などを含む分類器の具体的家族について研究する。これらの場合、本論文はnpにおける部分集合最小$\delta$-関係集合の計算がnpオラクルへの呼び出しの多項式数で解くことができることを示す。実験による評価は,提案手法と,本論文で研究した分類器の具体的な場合のヒューリスティックな説明器との比較を行い,提案手法の有効性を確認した。

Recent work proposed $\delta$-relevant inputs (or sets) as a probabilistic explanation for the predictions made by a classifier on a given input. $\delta$-relevant sets are significant because they serve to relate (model-agnostic) Anchors with (model-accurate) PI-explanations, among other explanation approaches. Unfortunately, the computation of smallest-size $\delta$-relevant sets is complete for ${NP}^{PP}$, rendering their computation largely infeasible in practice. This paper investigates solutions for tackling the practical limitations of $\delta$-relevant sets. First, the paper alternatively considers the computation of subset-minimal sets. Second, the paper studies concrete families of classifiers, including decision trees among others. For these cases, the paper shows that the computation of subset-minimal $\delta$-relevant sets is in NP, and can be solved with a polynomial number of calls to an NP oracle. The experimental evaluation compares the proposed approach with heuristic explainers for the concrete case of the classifiers studied in the paper, and confirms the advantage of the proposed solution over the state of the art.

翻訳日:2021-06-02 17:40:56 公開日:2021-06-01

# (参考訳) Shine: 双方向最適化と暗黙的モデルのためのフォワードパスからの逆推定

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models ( http://arxiv.org/abs/2106.00553v1 )

ライセンス: CC BY 4.0

Zaccharie Ramzi, Florian Mannel, Shaojie Bai, Jean-Luc Starck, Philippe Ciuciu, Thomas Moreau

(参考訳) 近年,深層ニューラルネットワークの深度を高める手法として暗黙の深度学習が登場している。トレーニングはメモリ効率が高いが、明示的なトレーニングに比べてトレーニングがかなり遅い。深層平衡モデル(deqs)では、トレーニングは双レベル問題として行われ、計算の複雑さは巨大なヤコビ行列の反復反転によって部分的に駆動される。本稿では,biレベルの問題が多く発生するこの計算ボトルネックに対処するための新しい手法を提案する。主な考え方は、フォワードパスから準ニュートン行列を用いて、勾配計算に必要な方向の逆ヤコビ行列を効率的に近似することである。本手法を元来のフォワードアルゴリズムで活用する動機付けとなる定理を提案する。さらに,これらのフォワードアルゴリズムを改良することにより,本手法が漸近的に真の暗黙的勾配を推定する理論的な保証を与える。我々は、超パラメータ最適化からCIFARやImageNetに適用された大規模DQまで、様々な環境でこのアプローチを実証的に研究している。これにより、後方通過の計算コストを最大2桁削減できることを示す。これらはすべて、ハイパーパラメータ最適化およびCIFARにおけるオリジナルのモデルの優れたパフォーマンスを維持し、ImageNet上での奨励的かつ競争的な結果を提供する。

In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks. While their training is memory-efficient, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), the training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we further provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach in many settings, ranging from hyperparameter optimization to large Multiscale DEQs applied to CIFAR and ImageNet. We show that it reduces the computational cost of the backward pass by up to two orders of magnitude. All this is achieved while retaining the excellent performance of the original models in hyperparameter optimization and on CIFAR, and giving encouraging and competitive results on ImageNet.

翻訳日:2021-06-02 17:26:05 公開日:2021-06-01

# (参考訳) マルチスケール特徴融合によるポーズ推定のためのフルレゾリューションエンコーダ・デコーダネットワーク

Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation ( http://arxiv.org/abs/2106.00566v1 )

ライセンス: CC BY 4.0

Jie Ou, Mingjian Chen, Hong Wu

(参考訳) より正確な2次元ポーズ推定を実現するために,エンコーダ・デコーダネットワーク,単純なベースラインネットワーク(SBN)を3つの方法で拡張する。大きな出力ストライドサイズに起因する量子化誤差を低減するため、単純なベースラインネットワークの端に2つのデコーダモジュールを追加して完全な出力解像度を得る。次に、グローバルコンテキストブロック(gcbs)がエンコーダとデコーダモジュールに追加され、グローバルコンテキスト機能によってそれらを強化する。さらに,マルチスケール特徴を融合分散し,ポーズ推定を促進するために,空間対応型マルチスケール特徴収集分散モジュール(sa-mfcd)を提案する。 ms cocoデータセットにおける実験結果から,本ネットワークはsbn上でのポーズ推定の精度を著しく向上し,resnet34をバックボーンネットワークとして使用するネットワークは,resnet152でsbnと同等の精度を達成し,大規模バックボーンネットワークで優れた結果を得ることができた。

To achieve more accurate 2D human pose estimation, we extend the successful encoder-decoder network, simple baseline network (SBN), in three ways. To reduce the quantization errors caused by the large output stride size, two more decoder modules are appended to the end of the simple baseline network to get full output resolution. Then, the global context blocks (GCBs) are added to the encoder and decoder modules to enhance them with global context features. Furthermore, we propose a novel spatial-attention-based multi-scale feature collection and distribution module (SA-MFCD) to fuse and distribute multi-scale features to boost the pose estimation. Experimental results on the MS COCO dataset indicate that our network remarkably improves the accuracy of human pose estimation over SBN. With ResNet34 as the backbone, our network even matches the accuracy of SBN with ResNet152, and it achieves superior results with large backbone networks.

翻訳日:2021-06-02 17:00:57 公開日:2021-06-01

# (参考訳) NewsEmbed: 事前訓練されたドキュメント表現によるニュースのモデリング

NewsEmbed: Modeling News through Pre-trained Document Representations ( http://arxiv.org/abs/2106.00590v1 )

ライセンス: CC BY 4.0

Jialu Liu, Tianqi Liu, Cong Yu

(参考訳) 文書レベルでのニュース記事などのテキストリッチな新鮮なコンテンツを効果的にモデル化することは難しい問題である。コンテンツベースモデルが広範囲のアプリケーションに適合するようにするためには、望ましい品質を達成しつつ、人間のラベルの規模を超えて大きなトレーニングデータセットを持つことが重要である。本稿では,この2つの課題に対して,意味的に関係のある新文書とその話題ラベルを人間の監督をほとんど受けずにマイニングする新しい手法を提案する。一方,マルチタスクモデルであるNewsEmbedを設計し,コントラスト学習をマルチラベル分類で訓練し,ユニバーサル文書エンコーダを導出する。提案手法は,数十億の高品質な有機学習例を提供し,異なる言語のテキストが同じ意味空間にエンコードされるような多言語環境に自然に拡張できることを示す。我々は,複数の自然言語理解タスクを対象としたNewsEmbedの競合性能を実験的に実証した。

Effectively modeling text-rich fresh content such as news articles at the document level is a challenging problem. To ensure that a content-based model generalizes well to a broad range of applications, it is critical to have a training dataset that is large beyond the scale of human labels while achieving the desired quality. In this work, we address those two challenges by proposing a novel approach to mine semantically-relevant fresh documents, and their topic labels, with little human supervision. Meanwhile, we design a multitask model called NewsEmbed that alternately trains a contrastive learning objective and a multi-label classification objective to derive a universal document encoder. We show that the proposed approach can provide billions of high-quality organic training examples and can be naturally extended to a multilingual setting where texts in different languages are encoded in the same semantic space. We experimentally demonstrate NewsEmbed's competitive performance across multiple natural language understanding tasks, both supervised and unsupervised.

翻訳日:2021-06-02 16:50:50 公開日:2021-06-01

# (参考訳) Look Wide and Interpret Twice: 対話型インストラクションフォロータスクのパフォーマンス向上

Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks ( http://arxiv.org/abs/2106.00596v1 )

ライセンス: CC BY 4.0

Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

(参考訳) インボディードAIエージェントを自然言語の指示に従う環境と対話しながら複雑なタスクを実行することに、コミュニティへの関心が高まっている。近年の研究では、タスクのためのよく設計されたデータセットであるALFREDを用いてこの問題に取り組んでいるが、精度は非常に低い。本稿では,従来の手法を大きなマージンで上回る新しい手法を提案する。それはいくつかの新しいアイデアの組み合わせに基づいている。 1つは提供された命令の2段階の解釈である。まず、視覚情報を用いずに命令を選択して解釈し、仮の動作シーケンス予測を行う。そして、その予測を視覚情報等と統合し、アクションとオブジェクトの最終的な予測を生成する。対話するオブジェクトのクラスが第一段階で識別されるので、入力画像から正しいオブジェクトを正確に選択することができる。また,本手法では,環境の複数の自己中心的視点を考察し,現在の指示に基づく階層的注意を応用して本質的な情報を抽出する。これはナビゲーションに対するアクションの正確な予測に寄与する。この手法の予備版がALFRED Challenge 2020で優勝した。現在のバージョンでは、単一のビューで4.45%の成功率を達成しており、複数のビューで8.37%に改善されている。

There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects and interprets an instruction without using visual information, yielding a tentative action sequence prediction. It then integrates the prediction with the visual information etc., yielding the final prediction of an action and an object. As the class of the object to interact with is identified in the first stage, the method can accurately select the correct object from the input image. Moreover, our method considers multiple egocentric views of the environment and extracts essential information by applying hierarchical attention conditioned on the current instruction. This contributes to the accurate prediction of actions for navigation. A preliminary version of the method won the ALFRED Challenge 2020. The current version achieves a success rate of 4.45% in unseen environments with a single view, which is further improved to 8.37% with multiple views.

翻訳日:2021-06-02 16:33:11 公開日:2021-06-01

# (参考訳) SpanNer: エンティティの再認識をスパン予測に

SpanNer: Named Entity Re-/Recognition as Span Prediction ( http://arxiv.org/abs/2106.00641v1 )

ライセンス: CC BY 4.0

Jinlan Fu, Xuanjing Huang, Pengfei Liu

(参考訳) 近年では、名前付きエンティティ認識(ner)システムのシーケンスラベリングからスパン予測へのパラダイムシフトが見られる。予備的な効果にもかかわらず、スパン予測モデルのアーキテクチャバイアスは完全には理解されていない。本稿では,名前付きエンティティ認識にスパン予測モデルを用いた場合の長所と短所について,シーケンスラベリングフレームワークと比較して検討し,その改善方法について検討する。次に、スパン予測がシステムコンビネータとして機能し、異なるシステムの出力から名前付きエンティティを再認識できることを明らかにする。 3つの言語をカバーする11のデータセット上で154のシステムを実験的に実装し,ベースnerシステムとシステムコンビネータとして機能するスパン予測モデルの有効性を示した。すべてのコードとデータセットを利用可能にする: \url{https://github.com/neulab/spanner} オンラインシステムのデモ: \url{https://spanner.sh} 。私たちのモデルはExplainaBoardプラットフォームにもデプロイされており、ユーザはインタラクティブな方法でトップスコーリングシステムのシステム組み合わせを柔軟に実行することができます。

Recent years have seen the paradigm shift of Named Entity Recognition (NER) systems from sequence labeling to span prediction. Despite its preliminary effectiveness, the span prediction model's architectural bias has not been fully understood. In this paper, we first investigate the strengths and weaknesses of the span prediction model when it is used for named entity recognition, compared with the sequence labeling framework, and how to further improve it, which motivates us to combine the complementary advantages of systems based on different paradigms. We then reveal that span prediction can simultaneously serve as a system combiner to re-recognize named entities from different systems' outputs. We experimentally implement 154 systems on 11 datasets covering three languages; comprehensive results show the effectiveness of span prediction models that serve both as base NER systems and as system combiners. We make all code and datasets available at https://github.com/neulab/spanner, as well as an online system demo at http://spanner.sh. Our model has also been deployed into the ExplainaBoard platform, which allows users to flexibly perform a system combination of top-scoring systems in an interactive way: http://explainaboard.nlpedia.ai/leaderboard/task-ner/.
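
The span prediction view of NER can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the maximum span length, and the greedy overlap resolution are assumptions made for illustration, and the scores are placeholders for what a trained span classifier would output.

```python
def enumerate_spans(tokens, max_len):
    """List every (start, end) span, end inclusive, with at most max_len tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_len, len(tokens))):
            spans.append((start, end))
    return spans


def resolve_overlaps(predictions):
    """Keep the highest-scoring spans that do not overlap (greedy, hypothetical rule).

    Each prediction is a (start, end, label, score) tuple.
    """
    kept = []
    for cand in sorted(predictions, key=lambda p: p[3], reverse=True):
        # Two inclusive spans overlap unless one ends before the other starts.
        if all(cand[1] < k[0] or cand[0] > k[1] for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda p: p[0])


if __name__ == "__main__":
    tokens = ["Barack", "Obama", "visited", "Paris"]
    print(enumerate_spans(tokens, max_len=2))
    print(resolve_overlaps([(0, 1, "PER", 0.9), (1, 2, "LOC", 0.8), (3, 3, "LOC", 0.7)]))
```

Because every candidate span is scored independently, predictions coming from several different systems can be pooled into the same candidate list before overlaps are resolved, which is the intuition behind using span prediction as a system combiner.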

翻訳日:2021-06-02 16:13:01 公開日:2021-06-01

# (参考訳) ネットワーク接続性が集団学習に及ぼす影響

The Impact of Network Connectivity on Collective Learning ( http://arxiv.org/abs/2106.00655v1 )

ライセンス: CC BY 4.0

Michael Crosscombe and Jonathan Lawry

(参考訳) 分散自律システムでは、システムの集団行動を管理する個々のエージェント間の相互作用である。これらのローカルレベルの相互作用は、しばしば基盤となるネットワーク構造によって制御される。これらのネットワークは、エージェントが環境から証拠を収集し、システム内の他のエージェントに情報を伝達する必要がある、集合学習や意思決定において特に重要である。集団行動のモデルは、システム内で効果的な情報共有を提供するためにエージェント間の完全な接続の仮定に依存することが多いが、この仮定は不十分である。本稿では,基礎となるネットワークが集団学習の文脈におけるパフォーマンスに与える影響について検討する。シミュレーションにより,接続性やランダム性のレベルが異なる小世界ネットワークを調査し,接続性の低いネットワークと比較して,完全接続ネットワークの方が平均誤差が高いと結論づけた。さらに,高規則性ネットワークはランダム接続のレベルが増大するネットワークよりも優れることを示す。

In decentralised autonomous systems it is the interactions between individual agents which govern the collective behaviours of the system. These local-level interactions are themselves often governed by an underlying network structure. These networks are particularly important for collective learning and decision-making whereby agents must gather evidence from their environment and propagate this information to other agents in the system. Models for collective behaviours may often rely upon the assumption of total connectivity between agents to provide effective information sharing within the system, but this assumption may be ill-advised. In this paper we investigate the impact that the underlying network has on performance in the context of collective learning. Through simulations we study small-world networks with varying levels of connectivity and randomness and conclude that totally-connected networks result in higher average error when compared to networks with less connectivity. Furthermore, we show that networks of high regularity outperform networks with increasing levels of random connectivity.
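
As a rough illustration of what "varying levels of connectivity and randomness" means for a small-world network, the sketch below builds a ring lattice and rewires edges at random. It assumes the standard Watts-Strogatz construction; the abstract does not say which generator the authors used, and the function name and parameters are hypothetical.

```python
import random


def watts_strogatz(n, k, p, seed=0):
    """Build a small-world graph as a dict mapping each node to its neighbour set.

    n: number of nodes; k: even number of nearest ring neighbours (k < n);
    p: probability of rewiring each lattice edge (0 = regular, 1 = random).
    """
    rng = random.Random(seed)
    neighbours = {i: set() for i in range(n)}
    # Regular ring lattice: connect each node to k/2 neighbours on each side.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            neighbours[i].add((i + j) % n)
            neighbours[(i + j) % n].add(i)
    # Rewire each lattice edge (i, i+j) with probability p.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in neighbours[i]]
                if candidates:
                    new = rng.choice(candidates)
                    neighbours[i].discard(old)
                    neighbours[old].discard(i)
                    neighbours[i].add(new)
                    neighbours[new].add(i)
    return neighbours


if __name__ == "__main__":
    regular = watts_strogatz(10, 4, p=0.0)
    print(sorted(regular[0]))  # ring neighbours of node 0
```

In this construction `k` controls connectivity and `p` controls randomness, so sweeping the two parameters gives the kind of network family the abstract compares against a totally-connected graph.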

翻訳日:2021-06-02 16:04:41 公開日:2021-06-01

# (参考訳) Markpainting: 敵の機械学習がInpaintingと出会う

Markpainting: Adversarial Machine Learning meets Inpainting ( http://arxiv.org/abs/2106.00660v1 )

ライセンス: CC BY 4.0

David Khachaturov, Ilia Shumailov, Yiren Zhao, Nicolas Papernot, Ross Anderson

(参考訳) インパインティング(Inpainting)は、画像にマスクや欠落したピースを投入するために使用される生成モデルに基づく、学習された補間技法であり、画像編集や修正に広く応用されている。近年、透かしの除去に塗布が使われ始め、懸念が高まった。本稿では,マークペイント技術を用いてその操作方法について検討する。まず、塗装モデルにアクセスした画像所有者が、そのモデルを使って編集しようとすると、任意の可視情報を付加するように画像を強化できることを示す。我々は,複数の異なるモデルを同時にターゲットとすることができる。これは、エディタがそれを削除しようとした場合、ウォーターマークを再構成するように設計できる。第二に、我々のマークペイント技術は異なるアーキテクチャを持つモデルや異なるデータセットで訓練されたモデルに転送可能であることを示し、それを用いて作成された透かしは敵が取り除くのが困難である。マークパインティング(Markpainting)は新規で、着色時に目に見えるアラームとして使用できる。

Inpainting is a learned interpolation technique that is based on generative modeling and used to populate masked or missing pieces in an image; it has wide applications in picture editing and retouching. Recently, inpainting started being used for watermark removal, raising concerns. In this paper we study how to manipulate it using our markpainting technique. First, we show how an image owner with access to an inpainting model can augment their image in such a way that any attempt to edit it using that model will add arbitrary visible information. We find that we can target multiple different models simultaneously with our technique. This can be designed to reconstitute a watermark if the editor had been trying to remove it. Second, we show that our markpainting technique is transferable to models that have different architectures or were trained on different datasets, so watermarks created using it are difficult for adversaries to remove. Markpainting is novel and can be used as a manipulation alarm that becomes visible in the event of inpainting.

翻訳日:2021-06-02 15:53:18 公開日:2021-06-01

# (参考訳) 科学テキスト分類のための視覚レイアウト構造の導入

Incorporating Visual Layout Structures for Scientific Text Classification ( http://arxiv.org/abs/2106.00676v1 )

ライセンス: CC BY 4.0

Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey

(参考訳) 科学論文のコアテキストコンポーネントの分類は、科学文書の自動理解において重要な第一歩である。これまでの研究は、基本的なレイアウト情報、すなわちページ上の各トークンの2D位置を用いて、より正確な分類を行う方法を示してきた。本稿では,VILA(Visual LAyout Structure)の新たな手法として,ページテキストのテキスト行やテキストブロックへのグループ化を言語モデルに導入し,パフォーマンスの向上を図る。モデル入力にレイアウト構造の境界を示す特別なトークンを追加するI-VILAアプローチは、トークン分類タスクにおいて+1~4.5 F1のスコア改善につながることを示す。さらに,これらのレイアウト構造を符号化し,予測精度を損なうことなく最大70%の効率向上を記録できる階層モデルh-vilaを設計した。実験は、新しい評価スイートであるS2-VLUEで行われ、VILAの意識を測定する新しい測定基準と、金のアノテーションで19の科学分野をカバーする新しいデータセットが提供される。トレーニング済みのウェイト、ベンチマークデータセット、ソースコードは https://github.com/allenai/VILA で入手できる。

Classifying the core textual components of a scientific paper (title, author, body text, etc.) is a critical first step in automated scientific document understanding. Previous work has shown how using elementary layout information, i.e., each token's 2D position on the page, leads to more accurate classification. We introduce new methods for incorporating VIsual LAyout structures (VILA), e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance. We show that the I-VILA approach, which simply adds special tokens denoting boundaries between layout structures into model inputs, can lead to +1~4.5 F1 score improvements in token classification tasks. Moreover, we design a hierarchical model, H-VILA, that encodes these layout structures and records an up to 70% efficiency boost without hurting prediction accuracy. The experiments are conducted on a newly curated evaluation suite, S2-VLUE, with a novel metric measuring VILA awareness and a new dataset covering 19 scientific disciplines with gold annotations. Pre-trained weights, benchmark datasets, and source code will be available at https://github.com/allenai/VILA.
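
The I-VILA idea of marking layout boundaries in the model input can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the `[BLK]` boundary token are hypothetical, and a real pipeline would go on to feed the resulting sequence to a tokenizer and language model.

```python
def insert_layout_tokens(blocks, boundary_token="[BLK]"):
    """Flatten layout groups (text lines or blocks) into one token sequence,
    placing a special boundary token between consecutive groups."""
    tokens = []
    for index, block in enumerate(blocks):
        if index > 0:
            tokens.append(boundary_token)  # marks where one layout group ends
        tokens.extend(block)
    return tokens


if __name__ == "__main__":
    page = [["Deep", "Learning"], ["Jane", "Doe"], ["We", "propose", "a", "method"]]
    print(insert_layout_tokens(page))
```

The appeal of this design is that the model architecture stays unchanged: the layout structure is injected purely through the input sequence.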

翻訳日:2021-06-02 15:36:41 公開日:2021-06-01

# 校正型クロスモーダル検索のための学習関係アライメント

Learning Relation Alignment for Calibrated Cross-modal Retrieval ( http://arxiv.org/abs/2105.13868v2 )

ライセンス: Link先を確認

Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang

(参考訳) 大規模なマルチモーダル事前学習アプローチの成果にもかかわらず、画像テキスト検索のようなクロスモーダル検索は難しい課題である。 2つのモダリティ間の意味的ギャップを埋めるために、これまでの研究では、主に対象レベルでの単語領域のアライメントに注目し、単語間の言語的関係と領域間の視覚的関係のマッチングを欠いている。このような関係一貫性の無視は、画像テキスト対の文脈的表現を損なうとともに、モデル性能と解釈可能性を妨げる。本稿では,まず,言語関係と視覚関係の間の意味的距離を計測し,関係一貫性を定量化する新しい指標であるisd(intra-modal self-attention distance)を提案する。そこで本研究では,isdを最適化し,両モダリティ間アライメントを介して相互にモダリティ内自己アライメントを校正するための正規化トレーニング手法であるiais(intra-modal self-attention)のモード間アライメントを提案する。 IAIS正規化器はFlickr30kおよびMS COCOデータセット上での一般的なモデルの性能を大幅に向上させ、我々のアプローチの優位性を示す。

Despite the achievements of large-scale multimodal pre-training approaches, cross-modal retrieval, e.g., image-text retrieval, remains a challenging task. To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, lacking the matching between the linguistic relation among the words and the visual relation among the regions. The neglect of such relation consistency impairs the contextualized representation of image-text pairs and hinders the model performance and the interpretability. In this paper, we first propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations. In response, we present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions from the two modalities mutually via inter-modal alignment. The IAIS regularizer boosts the performance of prevailing models on Flickr30k and MS COCO datasets by a considerable margin, which demonstrates the superiority of our approach.

翻訳日:2021-06-02 14:48:49 公開日:2021-06-01

# 1$\times$N Block Pattern for Network Sparsity

1$\times$N Block Pattern for Network Sparsity ( http://arxiv.org/abs/2105.14713v2 )

ライセンス: Link先を確認

Mingbao Lin, Yuchao Li, Yuxin Zhang, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

(参考訳) ネットワークの分散性は、ニューラルネットワークの大幅な規模拡大を克服するための有望な方向として現れるが、一般的なCPU上での大幅なスピードアップを達成するだけでなく、モデル精度の同時維持も未解決のままである。本稿では,この制限を破るために,ブロック間隔パターン(ブロックプルーニング)を1\times N$という新しい概念を提案する。特に、同じ入力チャネルインデックスを持つ連続$N$出力カーネルは、1つのブロックにグループ化され、プルーニングパターンの基本的なプルーニング粒度として機能する。われわれの$1 \times N$ sparsityパターンは、これらのブロックを重要視している。また,最初に出力チャネル次元の重み行列を再構成し,精度向上のためにより影響力のあるブロックを導出し,入力チャネル次元の次層重みに同様の再配置を適用し,畳み込み操作を確実にするフィルタ再配置のワークフローを提供する。さらに, 並列化されたブロックワイドベクトル化演算により, 1 ドルブロック間隔後の出力計算を実現し, 一般的な CPU ベースのプラットフォーム上での大幅な高速化を実現した。プルーニングパターンの有効性は,ilsvrc-2012実験により実証された。例えば、50%の間隔と$N=4$の場合、MobileNet-V2の上位1の精度でフィルタプルーニングよりも約3.0%改善する。一方、重量プルーニングよりもcortex-a7 cpuの56.04msの推論節約が得られる。コードはhttps://github.com/lmbxmu/1xn。

Though network sparsity emerges as a promising direction to overcome the drastically increasing size of neural networks, it remains an open problem to concurrently maintain model accuracy as well as achieve significant speedups on general CPUs. In this paper, we propose a novel concept, the $1\times N$ block sparsity pattern (block pruning), to break this limitation. In particular, consecutive $N$ output kernels with the same input channel index are grouped into one block, which serves as the basic pruning granularity of our pruning pattern. Our $1 \times N$ sparsity pattern prunes the blocks considered unimportant. We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements, and then applies a similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations. Moreover, the output computation after our $1 \times N$ block sparsity can be realized via a parallelized block-wise vectorized operation, leading to significant speedups on general CPU-based platforms. The efficacy of our pruning pattern is demonstrated by experiments on ILSVRC-2012. For example, in the case of 50% sparsity and $N=4$, our pattern obtains about 3.0% improvements over filter pruning in the top-1 accuracy of MobileNet-V2. Meanwhile, it obtains 56.04ms inference savings on Cortex-A7 CPU over weight pruning. Code is available at https://github.com/lmbxmu/1xN.
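
The grouping rule behind the $1\times N$ pattern can be made concrete with a small sketch. It is a simplified illustration, not the released code: the function name is hypothetical, block importance is assumed to be the sum of absolute weights, blocks are ranked within a single layer, and the filter rearrangement step is omitted.

```python
def block_prune_mask(weights, n, sparsity):
    """Return a 0/1 mask over kernels for 1xN block pruning.

    weights[o][i] is the flattened kernel connecting input channel i to
    output channel o. N consecutive output kernels sharing the same input
    channel index form one block; the least important blocks are zeroed.
    """
    c_out, c_in = len(weights), len(weights[0])
    assert c_out % n == 0, "output channels must be divisible by N"
    scores = []
    for group in range(c_out // n):
        for i in range(c_in):
            # Assumed importance: L1 norm of all weights in the block.
            score = sum(abs(v)
                        for o in range(group * n, (group + 1) * n)
                        for v in weights[o][i])
            scores.append((score, group, i))
    scores.sort()
    num_pruned = int(len(scores) * sparsity)
    mask = [[1] * c_in for _ in range(c_out)]
    for _, group, i in scores[:num_pruned]:
        for o in range(group * n, (group + 1) * n):
            mask[o][i] = 0
    return mask


if __name__ == "__main__":
    w = [[[1.0], [0.1]], [[1.0], [0.1]], [[0.2], [2.0]], [[0.2], [2.0]]]
    print(block_prune_mask(w, n=2, sparsity=0.5))
```

Because the zeros come in runs of N along the output channel dimension, the surviving weights for a given input channel can be loaded and multiplied as one contiguous vector, which is what makes the pattern friendly to CPU vector instructions.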

翻訳日:2021-06-02 14:48:25 公開日:2021-06-01

# 問合せから引用への変換に基づく引用推薦と解釈

Quotation Recommendation and Interpretation Based on Transformation from Queries to Quotations ( http://arxiv.org/abs/2105.14189v2 )

ライセンス: Link先を確認

Lingzhi Wang, Xingshan Zeng, Kam-Fai Wong

(参考訳) 個人が自分を表現するのを助けるために、引用推奨が注目を集めている。それでも、これまでのほとんどの取り組みは、引用とクエリを別々にモデル化することに集中し、引用とクエリの関係を無視する。本研究では,クエリ表現を直接引用表現にマッピングする変換行列を提案する。マッピング関係をよりよく学ぶために、2つの意味空間の距離を最小にするマッピング損失(1つは引用用、もう1つはマッピングクエリ用)を用いる。さらに,問合せ中の単語を用いて引用の擬人的言語を解釈し,問合せの上に引用を意識した注意を施し,指示語を強調する。英語と中国語の2つのデータセットの実験では、我々のモデルは過去の最先端モデルよりも優れていた。

To help individuals express themselves better, quotation recommendation is receiving growing attention. Nevertheless, most prior efforts focus on modeling quotations and queries separately and ignore the relationship between the quotations and the queries. In this work, we introduce a transformation matrix that directly maps the query representations to quotation representations. To better learn the mapping relationship, we employ a mapping loss that minimizes the distance of two semantic spaces (one for quotation and another for mapped-query). Furthermore, we explore using the words in history queries to interpret the figurative language of quotations, where quotation-aware attention is applied on top of history queries to highlight the indicator words. Experiments on two datasets in English and Chinese show that our model outperforms previous state-of-the-art models.
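
The mapping from query space to quotation space can be sketched as follows. The abstract only states that a transformation matrix maps query representations to quotation representations and that a mapping loss minimizes the distance between the two spaces; the squared Euclidean distance, the dot-product ranking, and all function names below are assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def mapping_loss(matrix, query_vec, quote_vec):
    """Squared Euclidean distance between the mapped query and the quotation
    representation (assumed form of the mapping loss)."""
    mapped = matvec(matrix, query_vec)
    return sum((a - b) ** 2 for a, b in zip(mapped, quote_vec))


def recommend(matrix, query_vec, quote_vecs):
    """Return the index of the quotation whose representation has the
    largest dot product with the mapped query."""
    mapped = matvec(matrix, query_vec)
    scores = [sum(a * b for a, b in zip(mapped, q)) for q in quote_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    M = [[0.0, 1.0], [1.0, 0.0]]  # toy transformation that swaps the two axes
    print(mapping_loss(M, [1.0, 0.0], [0.0, 1.0]))
    print(recommend(M, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In training, the matrix entries would be learned jointly with the encoders; here they are fixed by hand only to keep the example self-contained.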

翻訳日:2021-06-02 14:47:57 公開日:2021-06-01

# 変圧器の微調整と組成の相互作用について

On the Interplay Between Fine-tuning and Composition in Transformers ( http://arxiv.org/abs/2105.14668v2 )

ライセンス: Link先を確認

Lang Yu and Allyson Ettinger

(参考訳) 事前訓練されたトランスフォーマー言語モデルは、様々なNLPタスクにおいて顕著な性能を示した。しかし、近年の研究では、これらのモデルにおけるフレーズレベルの表現は、語彙内容の強い影響を反映しているが、洗練された合成句情報の証拠がないことが示唆されている。本稿では,語彙的内容を超えた句意味情報を取り込むための文脈的埋め込みの能力に対する微調整の影響について検討する。具体的には,語彙重複度の高い逆パラフレーズ分類タスクと感情分類タスクでモデルを微調整する。微調整後,事前作業後の制御設定におけるフラシアル表現の分析を行う。微調整はこれらの表現において構成性に恩恵をもたらすことがほとんどないが、感情の訓練は特定のモデルに小さな局所的な利益をもたらす。フォローアップ分析では,その課題から構成的利益の欠如を説明できるパラフレーズデータセット内の類似した手がかりを同定し,感情訓練による局所的利益の根底にある潜在的な要因について考察する。

Pre-trained transformer language models have shown remarkable performance on a variety of NLP tasks. However, recent research has suggested that phrase-level representations in these models reflect heavy influences of lexical content, but lack evidence of sophisticated, compositional phrase information. Here we investigate the impact of fine-tuning on the capacity of contextualized embeddings to capture phrase meaning information beyond lexical content. Specifically, we fine-tune models on an adversarial paraphrase classification task with high lexical overlap, and on a sentiment classification task. After fine-tuning, we analyze phrasal representations in controlled settings following prior work. We find that fine-tuning largely fails to benefit compositionality in these representations, though training on sentiment yields a small, localized benefit for certain models. In follow-up analyses, we identify confounding cues in the paraphrase dataset that may explain the lack of composition benefits from that task, and we discuss potential factors underlying the localized benefits from sentiment training.

翻訳日:2021-06-02 14:47:43 公開日:2021-06-01

# 探索と爆発:中国のスペル補正モデルを改善する2つの方法

Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models ( http://arxiv.org/abs/2105.14813v2 )

ライセンス: Link先を確認

Chong Li, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

(参考訳) ニューラルネットワークを用いたシーケンシャル・ツー・シーケンス学習は、いくつかの綴り誤りのある文を入力として出力する中国語綴り修正(csc)の有効な枠組みであることが実証的に証明されている。しかし、CSCモデルは混乱セットによってカバーされるスペルエラーの修正に失敗し、また目に見えないエラーに遭遇する。本稿では,モデルの弱点を継続的に識別し,より価値のあるトレーニングインスタンスを生成し,そのモデルを強化するためにタスク固有の事前学習戦略を適用する手法を提案する。生成した敵の例をトレーニングセットに徐々に追加する。実験結果から, 事前学習戦略と組み合わさって, 複数のCSCモデルの3つのデータセット間の一般化とロバスト性を改善し, CSCタスクの最先端性能を達成できることが示唆された。

Sequence-to-sequence learning with neural networks has empirically proven to be an effective framework for Chinese Spelling Correction (CSC), which takes a sentence with some spelling errors as input and outputs the corrected one. However, CSC models may fail to correct spelling errors covered by the confusion sets, and will also encounter unseen ones. We propose a method that continually identifies the weak spots of a model to generate more valuable training instances, and apply a task-specific pre-training strategy to enhance the model. The generated adversarial examples are gradually added to the training set. Experimental results show that such an adversarial training method combined with the pre-training strategy can improve both the generalization and robustness of multiple CSC models across three different datasets, achieving state-of-the-art performance for the CSC task.

翻訳日:2021-06-02 14:47:26 公開日:2021-06-01

# 加算ニューラルネットワーク

Adder Neural Networks ( http://arxiv.org/abs/2105.14202v2 )

ライセンス: Link先を確認

Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Chunjing Xu, Tong Zhang

(参考訳) 安価な加算演算と比較すると、乗算演算は計算の複雑さがはるかに高い。ディープニューラルネットワークにおける広く使われている畳み込みは、入力特徴と畳み込みフィルタの類似度を測定するために、正確にクロス相関である。本稿では,深層ニューラルネットワーク,特に畳み込みニューラルネットワーク(CNN)におけるこれらの膨大な乗算を,計算コストを削減するために,より安価な加算を行うための加算器ネットワーク(AdderNets)を提案する。 AdderNetsでは、フィルタと入力機能の間の$\ell_1$-norm距離を出力応答としています。この新たな類似度尺度がニューラルネットワークの最適化に与える影響を網羅的に分析した。より優れたパフォーマンスを実現するため,$\ell_p$-norm を調査し,AdderNets の特別なトレーニング手法を開発した。次に,各ニューロンの勾配の大きさに応じてアダネットの学習手順を強化する適応学習速度戦略を提案する。その結果、AdderNetsは画像Netデータセット上でResNet-50を使用して75.7%のTop-1精度92.3%のTop-5精度を達成することができる。さらに,ReLUアクティベーション関数を持つ単一の隠蔽層AdderNetと幅境界層AdderNetの両方が普遍関数近似器であることを示すことにより,AdderNetsの理論基盤を構築する。これらの結果は、より複雑な乗算単位を用いて従来のニューラルネットワークのものと一致する。単一の隠れレイヤでAdderNetsにバインドされた近似も提示される。

Compared with the cheap addition operation, the multiplication operation has much higher computational complexity. The widely-used convolutions in deep neural networks are exactly cross-correlation to measure the similarity between input feature and convolution filters, which involves massive multiplications between float values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input feature as the output response. The influence of this new similarity measure on the optimization of neural networks has been thoroughly analyzed. To achieve a better performance, we develop a special training approach for AdderNets by investigating the $\ell_p$-norm. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers. Moreover, we develop a theoretical foundation for AdderNets, by showing that both the single hidden layer AdderNet and the width-bounded deep AdderNet with ReLU activation functions are universal function approximators. These results match those of the traditional neural networks using the more complex multiplication units. An approximation bound for AdderNets with a single hidden layer is also presented.
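
To make the $\ell_1$-distance response concrete, here is a minimal one-dimensional sketch. It assumes the response is the negated sum of absolute differences, so that a closer match gives a larger output; the function names are hypothetical and the example ignores channels, padding, and training.

```python
def adder_response(patch, filt):
    """Negative L1 distance between an input patch and a filter.
    Uses only subtraction, absolute value, and addition."""
    return -sum(abs(x - f) for x, f in zip(patch, filt))


def adder_conv1d(signal, filt):
    """Slide the filter over the signal (stride 1, no padding) and record
    the adder response at each position."""
    k = len(filt)
    return [adder_response(signal[i:i + k], filt)
            for i in range(len(signal) - k + 1)]


if __name__ == "__main__":
    print(adder_conv1d([1, 2, 3, 4], [1, 2]))
```

The maximum response is 0, reached when the patch equals the filter exactly, which is why this measure can play the role that cross-correlation plays in an ordinary convolution.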

翻訳日:2021-06-02 14:47:11 公開日:2021-06-01

# VersatileGait:Wildシミュレーションに向けた大規模合成ゲイトデータセット

VersatileGait: A Large-Scale Synthetic Gait Dataset Towards in-the-Wild Simulation ( http://arxiv.org/abs/2105.14421v2 )

ライセンス: Link先を確認

Pengyi Zhang, Huanzhang Dou, Wenhu Zhang, Yuhan Zhao, Songyuan Li, Zequn Qin, Xi Li

(参考訳) 近年,歩行認識が急速に進展している。しかし、野生での歩行認識はまだ十分に研究されていない。明らかな理由は、本質的および外生的要因の観点からの多様なトレーニングデータが欠如していることにある。この問題を解決するために,制御可能なコンピュータシミュレーションを用いて大規模歩行データセットを構築することを提案する。詳しくは,歩行の本質的要因を多様化するために,多様な属性を持つ多数のキャラクターを生成し,様々なタイプの歩行スタイルを付与する。歩行の外部要因を多様化するために,高密度カメラレイアウトの複雑なシーンを構築する。最後に、歩行シナリオをシミュレーションし、歩行データを自動キャプチャする自動生成ツールキットをUnity3Dで設計する。その結果,多種多様なシナリオを持つ1万件の被験者のシルエット配列を100万件以上持つVersatileGaitという,Wildの歩行データセットが得られた。 versatilegaitには、巨大なデータセットサイズ、多様な歩行者属性、複雑なカメラレイアウト、高品質なアノテーション、実際のドメイン間隙、新しい要求に対する優れたスケーラビリティ、プライバシ問題のない、いくつかの優れた特性があります。 versatilegaitを基盤として,野生の歩行研究と実用研究の両面において,一連の実験と応用を提案する。我々のデータセットとその生成ツールキットは、さらなる研究のために公開されます。

Gait recognition has developed rapidly in recent years. However, gait recognition in the wild is not yet well explored. An obvious reason is the lack of diverse training data in terms of intrinsic and extrinsic factors. To remedy this problem, we propose to construct a large-scale gait dataset with the help of controllable computer simulation. In detail, to diversify the intrinsic factors of gait, we generate numerous characters with diverse attributes and give them various walking styles. To diversify the extrinsic factors of gait, we build a complicated scene with a dense camera layout. Finally, we design an automated generation toolkit under Unity3D for simulating the walking scenario and capturing the gait data automatically. As a result, we obtain an in-the-wild gait dataset, called VersatileGait, which has more than one million silhouette sequences of 10,000 subjects in diverse scenarios. VersatileGait possesses several nice properties, including huge dataset size, diverse pedestrian attributes, a complicated camera layout, high-quality annotations, a small domain gap to real data, good scalability to new demands, and no privacy issues. Based on VersatileGait, we propose a series of experiments and applications for both research exploration of gait in the wild and practical use. Our dataset and its corresponding generation toolkit will be publicly available for further studies.

翻訳日:2021-06-02 14:46:48 公開日:2021-06-01

# ディープニューラルネットワークを用いた運転意図予測

Predicting Driver Intention Using Deep Neural Network ( http://arxiv.org/abs/2105.14790v2 )

ライセンス: Link先を確認

Mahdi Bonyani, Mina Rahmanian, Simindokht Jahangard

(参考訳) 運転安全性の向上と自動車事故の回避のために,高度運転支援システム (ADAS) が注目されている。近年の研究では、運転者の意図をシステムの重要部分として予測することに焦点を当てている。本研究では,Brain4Carsデータセットを用いた運転者の操作の予測に4つの入力を用い,実際の動作が起こる5,4,3,2,1秒前に操作予測を行う新しい枠組みを提案する。 1) 内部ビューのみ、2) 外部ビューのみ、3) 内部ビューと外部ビューの両方、という3つのシナリオでフレームワークを評価した。データセットをトレーニング,検証,テストセットに分割し,K分割交差検証も活用した。最先端の研究と比較すると、提案アーキテクチャは高速で、2番目と3番目のシナリオで高い性能を達成している。評価指標として正解率(accuracy),適合率(precision),再現率(recall),F1スコアを用い,外部ビューではそれぞれ82.41%,82.28%,82.42%,82.24%,内部ビューと外部ビューの両方ではそれぞれ98.90%,98.96%,98.90%,98.88%を得た。

To improve driving safety and avoid car accidents, Advanced Driver Assistance Systems (ADAS) have received significant attention. Recent studies have focused on predicting driver intention as a key part of these systems. In this study, we propose a new framework in which four inputs are used to anticipate driver maneuvers on the Brain4Cars dataset, with the maneuver predicted 5, 4, 3, 2, and 1 seconds before the actual action occurs. We evaluated our framework in three scenarios: using 1) the inside view only, 2) the outside view only, and 3) both the inside and outside views. We divided the dataset into training, validation, and test sets, and also used K-fold cross-validation. Compared with state-of-the-art studies, our architecture is faster and achieves higher performance in the second and third scenarios. Using accuracy, precision, recall, and F1-score as evaluation metrics, we obtained 82.41%, 82.28%, 82.42%, and 82.24% for the outside view and 98.90%, 98.96%, 98.90%, and 98.88% for both inside and outside views, respectively.

翻訳日:2021-06-02 14:46:28 公開日:2021-06-01

# 都市交通監視(UTS:Urban Traffic Surveillance) : 2次元検出に基づく完全確率的3D追跡手法

Urban Traffic Surveillance (UTS): A fully probabilistic 3D tracking approach based on 2D detections ( http://arxiv.org/abs/2105.14993v2 )

ライセンス: Link先を確認

Henry Bradler, Adrian Kretz and Rudolf Mester

(参考訳) 都市交通監視 (Urban Traffic Surveillance, UTS) は、複数車線で交通量が多く、車両が急旋回を行う都市交通シナリオにおいて車両を検出する、校正済みの単眼ビデオカメラに基づく監視システムである。 UTSは3Dバウンディングボックス表現と、アンセンテッドカルマンフィルタ (unscented Kalman filter) に基づく物理的に妥当な3Dモーションモデルを用いて車両を追跡する。 UTSは3次元世界座標系における位置、形状、運動情報を復元するため、多様な交通違反の認識や、インテリジェント車両への有用な交通情報の提供に利用できる。各車両の2Dバウンディングボックスとクラスラベルを出力する検出器としてYOLOv3を利用している。 2D検出器は、多様なラベル付きトレーニングデータが利用できるため、我々のシステムを異なるカメラ視点に対してはるかに依存しにくくする。これにより、優れた一般化が可能になり、ハードウェア効率も向上する。 2次元検出に基づく3Dトラッキングのタスクは、車両形状に関するクラス固有の事前知識を統合することで支援される。都市部の車両監視設定とラベル付き3Dバウンディングボックスを備えたデータセットが存在しないため,CARLAシミュレータから自己生成した合成データとその正解 (ground truth) を用いてUTSを定量的に評価した。さらに,実世界のデータに対するUTSの動作の定性的な印象を示す。我々の実装は、それなりに新しいワークステーション上でリアルタイムに動作できる。我々の知る限り、UTSは監視シナリオ(静止カメラによる移動目標の観測)における現時点で唯一の3D車両追跡システムである。

Urban Traffic Surveillance (UTS) is a surveillance system based on a calibrated monocular video camera that detects vehicles in an urban traffic scenario with dense traffic on multiple lanes and vehicles performing sharp turning maneuvers. UTS then tracks the vehicles using a 3D bounding box representation and a physically reasonable 3D motion model relying on an unscented Kalman filter-based approach. Since UTS recovers position, shape, and motion information in a three-dimensional world coordinate system, it can be employed to recognize diverse traffic violations or to supply intelligent vehicles with valuable traffic information. We build on YOLOv3 as a detector yielding 2D bounding boxes and class labels for each vehicle. A 2D detector makes our system much less dependent on the camera perspective, as a variety of labeled training data is available. This allows for good generalization while also being more hardware efficient. The task of 3D tracking based on 2D detections is supported by integrating class-specific prior knowledge about the vehicle shape. We quantitatively evaluate UTS using self-generated synthetic data and ground truth from the CARLA simulator, due to the non-existence of datasets with an urban vehicle surveillance setting and labeled 3D bounding boxes. Additionally, we give a qualitative impression of how UTS performs on real-world data. Our implementation is capable of operating in real time on a reasonably modern workstation. To the best of our knowledge, UTS is to date the only 3D vehicle tracking system in a surveillance scenario (static camera observing moving targets).

翻訳日:2021-06-02 14:46:03 公開日:2021-06-01

# 新規白質路の少数ショットセグメンテーションのための知識伝達

Knowledge Transfer for Few-shot Segmentation of Novel White Matter Tracts ( http://arxiv.org/abs/2105.14513v2 )

ライセンス: Link先を確認

Qi Lu and Chuyang Ye

(参考訳) 畳み込みニューラルネットワーク(CNN)は,拡散磁気共鳴画像(dMRI)に基づいて,白色物質(WM)トラクションセグメンテーションの最先端性能を達成した。これらのCNNは、一般に労働集約的でコストがかかるWMの訓練に多くの手作業による指示を必要とする。新しいWMトラクション、すなわち既存の手動デラインに含まれていないトラクションを解析する場合、高価な手動デライン化は特に不利になる可能性がある。新規WMトラクトを正確にセグメンテーションするには、既存のWMトラクトについて学んだ知識を伝達することが望ましいので、新規WMトラクトをわずかに記述しても、CNNはセグメンテーションのために適切に学習することができる。本稿では,これらの知識を,いくつかの場面で新規なWMトラクトのセグメンテーションに移行することを検討する。古典的な微調整戦略は目的に利用できるが、既存のwmパスをセグメント化するための最後のタスク特定層の情報は、完全に破棄される。我々は、この最後の層の重みは、新しいWMトラクトをセグメント化するための貴重な情報を保持することができるため、情報を完全に破棄することは最適ではないと仮定する。特に,新しいWMトラクトは既存のWMトラクトと相関し,新しいWMトラクトのセグメンテーションは既存のWMトラクトのロジットで予測できると考えられる。このように、微調整のためにランダム初期化よりも最終層のより良い初期化が達成できる。さらに,古典的な微調整の前にウォームアップステージを挿入するだけで,既存のWMトラクトを分割するための最終層における知識をより適応的に利用できることを示す。提案手法はdmriデータセット上で評価され,提案手法が新規なwm路の少数画分節化に有用であることを実証した。

Convolutional neural networks (CNNs) have achieved state-of-the-art performance for white matter (WM) tract segmentation based on diffusion magnetic resonance imaging (dMRI). These CNNs require a large number of manual delineations of the WM tracts of interest for training, which are generally labor-intensive and costly. The expensive manual delineation can be a particular disadvantage when novel WM tracts, i.e., tracts that have not been included in existing manual delineations, are to be analyzed. To accurately segment novel WM tracts, it is desirable to transfer the knowledge learned about existing WM tracts, so that even with only a few delineations of the novel WM tracts, CNNs can learn adequately for the segmentation. In this paper, we explore the transfer of such knowledge to the segmentation of novel WM tracts in the few-shot setting. Although a classic fine-tuning strategy can be used for this purpose, the information in the last task-specific layer for segmenting existing WM tracts is completely discarded. We hypothesize that the weights of this last layer can bear valuable information for segmenting the novel WM tracts and thus completely discarding the information is not optimal. In particular, we assume that the novel WM tracts can correlate with existing WM tracts and that the segmentation of novel WM tracts can be predicted from the logits of existing WM tracts. In this way, a better initialization of the last layer than random initialization can be achieved for fine-tuning. Further, we show that a more adaptive use of the knowledge in the last layer for segmenting existing WM tracts can be conveniently achieved by simply inserting a warmup stage before classic fine-tuning. The proposed method was evaluated on a publicly available dMRI dataset, where we demonstrate the benefit of our method for few-shot segmentation of novel WM tracts.
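One way to read the initialization idea is that a linear map from existing-tract logits to novel-tract logits can be folded into a single replacement for the last layer. The sketch below illustrates that reading only; treating both the last layer and the logit-to-logit map as plain linear maps, and all names (`fold_last_layer`, `w_map`, `b_map`), are assumptions introduced here, not details taken from the abstract.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def fold_last_layer(w_old, b_old, w_map, b_map):
    """Fold logits_new = w_map @ (w_old @ h + b_old) + b_map
    into one layer: logits_new = w_init @ h + b_init."""
    n_feat = len(w_old[0])
    w_init = [
        [sum(w_map[i][k] * w_old[k][j] for k in range(len(w_old)))
         for j in range(n_feat)]
        for i in range(len(w_map))
    ]
    b_init = [folded + extra for folded, extra in zip(matvec(w_map, b_old), b_map)]
    return w_init, b_init


# Toy setup: 3 features, 2 existing tracts, 1 novel tract.
w_old, b_old = [[1, 0, 2], [0, 1, -1]], [1, -2]   # trained last layer
w_map, b_map = [[2, 3]], [5]                      # learned logit-to-logit map
w_init, b_init = fold_last_layer(w_old, b_old, w_map, b_map)

h = [1, 2, 3]  # a feature vector entering the last layer
old_logits = [x + b for x, b in zip(matvec(w_old, h), b_old)]
two_step = [x + b for x, b in zip(matvec(w_map, old_logits), b_map)]
one_step = [x + b for x, b in zip(matvec(w_init, h), b_init)]
print(two_step, one_step)  # → [12] [12]
```

Under these assumptions the folded weights give the new last layer a starting point that already encodes what was learned for the existing tracts, instead of starting from random values.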

翻訳日:2021-06-02 14:45:39 公開日:2021-06-01

# 強化学習に基づく車両ネットワークにおける動的サービス配置

Reinforcement Learning-based Dynamic Service Placement in Vehicular Networks ( http://arxiv.org/abs/2105.15022v2 )

ライセンス: Link先を確認

Anum Talpur and Mohan Gurusamy

(参考訳) 5Gやモバイルエッジコンピューティングといった技術が出現すると、車載ネットワーク内の車両に異なるリソースとサービス要件を持つ異なるタイプのサービスのプロビジョニングが可能となり、さまざまなタイプのサービスの要求に対するトラフィックモビリティパターンとダイナミックスの複雑さが増し、サービスの配置が困難な課題となっている。典型的な静的配置ソリューションは、トラフィック移動性とサービスダイナミクスを考慮していないため、効果的ではない。本稿では,車両の移動性や動的性を考慮しつつ,エッジサーバに最適なサービス配置を求めるための強化学習型動的(RL-Dynamic)サービス配置フレームワークを提案する。シミュレーション実験にはSUMOとMATLABを用いる。学習フレームワークでは,決定モジュールに対して,遅延最小化とエッジサーバ利用最小化という2つの目的関数を検討する。 2つの目的関数に対するILPに基づく問題定式化を開発した。実験の結果,1)静的サービス配置と比較して,RLベースの動的サービス配置はエッジサーバリソースの公平な利用とサービス遅延の低減を実現し,2)遅延最適化配置と比較して,サーバ利用最適化配置はリソースをより効果的に活用し,エッジサーバ利用率を低くする。

The emergence of technologies such as 5G and mobile edge computing has enabled the provisioning of different types of services with different resource and service requirements to the vehicles in a vehicular network. The growing complexity of traffic mobility patterns and dynamics in the requests for different types of services has made service placement a challenging task. A typical static placement solution is not effective, as it does not consider traffic mobility and service dynamics. In this paper, we propose a reinforcement learning-based dynamic (RL-Dynamic) service placement framework to find the optimal placement of services at the edge servers while considering the vehicles' mobility and the dynamics in the requests for different types of services. We use SUMO and MATLAB to carry out simulation experiments. In our learning framework, for the decision module, we consider two alternative objective functions: minimizing delay and minimizing edge server utilization. We developed an ILP-based problem formulation for the two objective functions. The experimental results show that 1) compared to static service placement, RL-based dynamic service placement achieves fair utilization of edge server resources and low service delay, and 2) compared to delay-optimized placement, server-utilization-optimized placement utilizes resources more effectively, achieving higher fairness with lower edge-server utilization.

翻訳日:2021-06-02 14:45:08 公開日:2021-06-01

# CIDER:対話説明と推論のための常識推論

CIDER: Commonsense Inference for Dialogue Explanation and Reasoning ( http://arxiv.org/abs/2106.00510v1 )

ライセンス: Link先を確認

Deepanway Ghosal and Pengfei Hong and Siqi Shen and Navonil Majumder and Rada Mihalcea and Soujanya Poria

(参考訳) 人間の言語を理解し説明するための常識推論は、自然言語処理における基本的な研究課題である。人間の会話を説明するには、文脈理解、計画、推論、因果関係、時間的、常識的推論を含む推論のいくつかの側面が必要であるため、大きな課題となる。本研究では,文脈コモンセンス推論を用いて推測される暗黙的かつ明示的な知識三重項という形で,ダイアディックな対話説明を含む手作業によるデータセットであるCIDERを紹介する。そのようなリッチな説明を会話から抽出することは、いくつかの下流アプリケーションを改善することにつながる。注釈付き三重項は、コモンセンス知識のタイプ(例えば因果、条件、時間)によって分類される。注釈付きデータセットでは,対話レベル自然言語推論,スパン抽出,複数選択スパン選択という3つのタスクを設定した。トランスフォーマーモデルで得られたベースライン結果は、タスクが困難であることを明らかにし、将来的な研究の道を開く。データセットとベースラインの実装はhttps://github.com/declare-lab/CIDERで公開されている。

Commonsense inference to understand and explain human language is a fundamental research problem in natural language processing. Explaining human conversations poses a great challenge as it requires contextual understanding, planning, inference, and several aspects of reasoning including causal, temporal, and commonsense reasoning. In this work, we introduce CIDER -- a manually curated dataset that contains dyadic dialogue explanations in the form of implicit and explicit knowledge triplets inferred using contextual commonsense inference. Extracting such rich explanations from conversations can be conducive to improving several downstream applications. The annotated triplets are categorized by the type of commonsense knowledge present (e.g., causal, conditional, temporal). We set up three different tasks conditioned on the annotated dataset: Dialogue-level Natural Language Inference, Span Extraction, and Multi-choice Span Selection. Baseline results obtained with transformer-based models reveal that the tasks are difficult, paving the way for promising future research. The dataset and the baseline implementations are publicly available at https://github.com/declare-lab/CIDER.

翻訳日:2021-06-02 14:44:30 公開日:2021-06-01

# GAN-BioBERTの検証 : 臨床治験における報告傾向の評価方法

Validating GAN-BioBERT: A Methodology For Assessing Reporting Trends In Clinical Trials ( http://arxiv.org/abs/2106.00665v1 )

ライセンス: Link先を確認

Joshua J Myszewski, Emily Klossowski, Patrick Meyer, Kristin Bevil, Lisa Klesius, Kristopher M Schroeder

(参考訳) 過去10年間、臨床研究におけるバイアスド・レポーティングの問題について多くの議論がなされてきた。この点にも拘わらず、臨床研究における質的記述の体系的評価のための限られたツールが開発されており、ほとんどの研究は、その大きさを制限する手作業のエキスパート・リサーの使用に依拠して質的記述を評価する。また、自然言語処理などの大規模ツール開発の試みは、その精度と発見の分類に使用されるカテゴリ数によって制限されていた。これらの制約を念頭に置いて、臨床試験の要約で表される定性的な感情を評価するために、大規模に適用するには適度に正確かつきめ細かな分類アルゴリズムを開発することが本研究の目的であった。さらに,本研究では,提案アルゴリズムであるGAN-BioBERTと過去の研究との比較や,臨床治験要約のマニュアル評価について検討する。本研究は,トランスフォーマー(bert)モデルからの双方向エンコーダ表現に基づく半教師自然言語プロセスモデルを用いて,臨床実施例の3種類の感情分類アルゴリズムを開発した。結果: このアルゴリズムを用いた場合, 分類精度は91.3%であり, マクロf1-scoreは0.92であり, 従来の方法やエキスパート格付けに比べ, 精度が大幅に向上した。提案手法であるgan-biobertは, 臨床文献における質的記述の大規模評価に適した分類モデルであり, 臨床出版動向の大規模研究に正確な再現性を提供する。

In the past decade, there has been much discussion about the issue of biased reporting in clinical research. Despite this attention, few tools have been developed for the systematic assessment of qualitative statements made in clinical research, and most studies assessing qualitative statements rely on manual expert raters, which limits their size. Also, previous attempts to develop larger-scale tools, such as those using natural language processing, were limited by both their accuracy and the number of categories used for the classification of their findings. With these limitations in mind, this study's goal was to develop a classification algorithm that is both suitably accurate and sufficiently fine-grained to be applied on a large scale for assessing the qualitative sentiment expressed in clinical trial abstracts. Additionally, this study seeks to compare the performance of the proposed algorithm, GAN-BioBERT, to previous studies as well as to expert manual rating of clinical trial abstracts. This study develops a three-class sentiment classification algorithm for clinical trial abstracts using a semi-supervised natural language processing model based on the Bidirectional Encoder Representations from Transformers (BERT) model, developed from a series of clinical trial abstracts annotated by a group of experts in academic medicine. The algorithm was found to have a classification accuracy of 91.3%, with a macro F1-score of 0.92, which is a significant improvement in accuracy compared to previous methods and expert ratings, while also making the sentiment classification finer-grained than in previous studies. The proposed algorithm, GAN-BioBERT, is a suitable classification model for the large-scale assessment of qualitative statements in clinical trial literature, providing an accurate, reproducible tool for the large-scale study of clinical publication trends.

翻訳日:2021-06-02 14:44:13 公開日:2021-06-01

# Rewardは凸型MDPに十分である

Reward is enough for convex MDPs ( http://arxiv.org/abs/2106.00661v1 )

ライセンス: Link先を確認

Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins and Satinder Singh

(参考訳) マルコフと定常である累積報酬関数の最大化、すなわち状態-作用対上で定義され、時間に依存しないことは、強化学習(RL)問題定式化に基づくマルコフ決定過程(MDP)において多くの種類の目標を捉えるのに十分である。しかし、この方法で全ての目標を達成できるわけではない。具体的には、目標が定常分布の凸関数として表される凸 MDP は、一般にこの方法では定式化できないことが分かりやすい。本稿では,Fenchel双対性を用いたポリシーとコスト(負の報酬)プレーヤー間のmin-maxゲームとして凸MDP問題を再構成し,その解決のためのメタアルゴリズムを提案する。本研究では,コストプレーヤが生成する非定常報酬を最大化するrlエージェントが生成するポリシーの平均値が,凸mdpの最適解に収束することを示す。最後に、メタアルゴリズムは、見習い学習、変分内在性制御、制約されたMDP、単一フレームワークへの純粋探索など、文学における強化学習アルゴリズムの様々な分岐を統一することを示す。

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov Decision Process (MDP) based on the Reinforcement Learning (RL) problem formulation. However, not all goals can be captured in this manner. Specifically, it is easy to see that Convex MDPs in which goals are expressed as convex functions of stationary distributions cannot, in general, be formulated in this manner. In this paper, we reformulate the convex MDP problem as a min-max game between the policy and cost (negative reward) players using Fenchel duality and propose a meta-algorithm for solving it. We show that the average of the policies produced by an RL agent that maximizes the non-stationary reward produced by the cost player converges to an optimal solution to the convex MDP. Finally, we show that the meta-algorithm unifies several disparate branches of reinforcement learning algorithms in the literature, such as apprenticeship learning, variational intrinsic control, constrained MDPs, and pure exploration into a single framework.
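The min-max reformulation follows from the standard Fenchel conjugate identity. As a sketch, assume $f$ is a closed convex function of the stationary state-action distribution $d$, $\mathcal{K}$ is the set of admissible distributions, and $\Lambda$ is the cost player's domain (these symbols are introduced here for illustration and are not taken from the abstract):

```latex
\begin{align}
f^{*}(\lambda) &= \sup_{d}\; \langle \lambda, d \rangle - f(d), \\
f(d) &= \max_{\lambda \in \Lambda}\; \langle \lambda, d \rangle - f^{*}(\lambda), \\
\min_{d \in \mathcal{K}} f(d)
  &= \min_{d \in \mathcal{K}} \max_{\lambda \in \Lambda}\;
     \langle \lambda, d \rangle - f^{*}(\lambda).
\end{align}
```

For a fixed $\lambda$, the policy player's problem $\min_{d \in \mathcal{K}} \langle \lambda, d \rangle$ is linear in $d$, so it can be read as a standard RL problem with reward $r = -\lambda$. Because $\lambda$ changes as the cost player updates, that reward is non-stationary.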

翻訳日:2021-06-02 14:42:36 公開日:2021-06-01

# サクセス機能を有する多種多様な最適政策の発見

Discovering Diverse Nearly Optimal Policies with Successor Features ( http://arxiv.org/abs/2106.00669v1 )

ライセンス: Link先を確認

Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag and Satinder Singh

(参考訳) 同じ問題に対する異なる解決策を見つけることは、創造性と新しい状況への適応に関連するインテリジェンスの重要な側面である。強化学習では、様々なポリシーが探索、転送、階層化、堅牢性に有用である。提案手法は,後継的特徴の空間において多様な方針を探索する手法であり,それらがほぼ最適であることを示すものである。我々は,この問題をCMDP(Constrained Markov Decision Process)として定式化し,本質的な多様性報酬を特徴とする多様性を最大化する政策を見つけることを目的としている。また,最近提案されたロバスト性および識別報酬がいかに機能するかを分析し,手続きの初期化に敏感であり,サブ最適解に収束する可能性を見出した。そこで,本稿では,政策の後継的特徴の相関を最小限に抑えることを目的とした,新たな明示的な多様性報酬を提案する。我々はDeepMind Control Suiteの異なる多様性メカニズムを比較し、提案している明示的な多様性のタイプが、例えば異なる移動パターンのような異なる振る舞いを発見するために重要であることを発見した。

Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features while ensuring that they are near-optimal. We formalize the problem as a Constrained Markov Decision Process (CMDP) where the goal is to find policies that maximize diversity, characterized by an intrinsic diversity reward, while remaining near-optimal with respect to the extrinsic reward of the MDP. We also analyze how recently proposed robustness and discrimination rewards perform and find that they are sensitive to the initialization of the procedure and may converge to sub-optimal solutions. To alleviate this, we propose new explicit diversity rewards that aim to minimize the correlation between the Successor Features of the policies in the set. We compare the different diversity mechanisms in the DeepMind Control Suite and find that the type of explicit diversity we propose is important for discovering distinct behavior, for example different locomotion patterns.

翻訳日:2021-06-02 14:42:14 公開日:2021-06-01

# 確率的スタイルマッチを用いた半教師付き領域一般化

Semi-Supervised Domain Generalization with Stochastic StyleMatch ( http://arxiv.org/abs/2106.00592v1 )

ライセンス: Link先を確認

Kaiyang Zhou, Chen Change Loy, Ziwei Liu

(参考訳) ドメイン一般化に関する既存の研究の多くは、複数のドメインから集めたソースデータが完全に注釈付けされていると仮定している。しかし、現実世界のアプリケーションでは、アノテーションのコストが高いため、各ソースドメインから利用可能なラベルはわずかにしかありません。本研究では,より現実的で実用的な半教師付き領域一般化(SSDG)について検討する。提案手法であるStyleMatchは,疑似ラベルをベースとした最先端の半教師付き学習手法であるFixMatchに触発され,SSDGを解くための新しい材料がいくつか提案されている。具体的には,うるさい擬似ラベルに対するロバスト性を改善しつつ,希少ラベル付きソースデータの過剰フィットを軽減するため,ガウス分布を持つクラスプロトタイプと見なされる分類器の重みに対する確率的モデリングを導入する。 2) ドメインシフト下での一般化を促進するために,fixmatch の 2-view 一貫性学習パラダイムを,スタイル拡張を第3の補完的視点として,マルチビュー版への弱みと強い拡張性に基づいてアップグレードする。そこで本研究では,ドメイン一般化や半教師付き学習など,幅広い領域で開発された強力なベースライン手法を網羅した2つのSSDGベンチマークを構築した。大規模な実験により、StyleMatchは低データ方式で最適な分布外一般化性能を達成することが示された。われわれのアプローチとベンチマークが、データ効率と一般化可能な学習システムに関する将来の研究の道を開くことを願っている。

Most existing research on domain generalization assumes source data gathered from multiple domains are fully annotated. However, in real-world applications, we might have only a few labels available from each source domain due to high annotation cost, along with abundant unlabeled data that are much easier to obtain. In this work, we investigate semi-supervised domain generalization (SSDG), a more realistic and practical setting. Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling, with several new ingredients tailored to solve SSDG. Specifically, 1) to mitigate overfitting in the scarce labeled source data while improving robustness against noisy pseudo labels, we introduce stochastic modeling to the classifier's weights, seen as class prototypes, with Gaussian distributions. 2) To enhance generalization under domain shift, we upgrade FixMatch's two-view consistency learning paradigm based on weak and strong augmentations to a multi-view version with style augmentation as the third complementary view. To provide a comprehensive study and evaluation, we establish two SSDG benchmarks, which cover a wide range of strong baseline methods developed in relevant areas including domain generalization and semi-supervised learning. Extensive experiments demonstrate that StyleMatch achieves the best out-of-distribution generalization performance in the low-data regime. We hope our approach and benchmarks can pave the way for future research on data-efficient and generalizable learning systems.
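The idea of modeling classifier weights as Gaussian-distributed class prototypes can be illustrated with a small sampling sketch. It is not the StyleMatch implementation: the function names, the per-weight standard deviations, and the choice to use the mean prototypes at test time are all assumptions made here.

```python
import random


def sample_prototypes(mu, sigma, rng):
    """Draw one set of class prototypes: w = mu + sigma * eps, eps ~ N(0, 1)."""
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, sigma_row)]
        for mu_row, sigma_row in zip(mu, sigma)
    ]


def class_logits(feature, prototypes):
    """Score a feature vector against each class prototype (dot product)."""
    return [sum(f * w for f, w in zip(feature, row)) for row in prototypes]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
mu = [[1.0, 0.0], [0.0, 1.0]]    # mean prototype per class
sigma = [[0.1, 0.1], [0.1, 0.1]] # hypothetical per-weight standard deviations
feature = [0.9, 0.2]

train_logits = class_logits(feature, sample_prototypes(mu, sigma, rng))
test_logits = class_logits(feature, mu)  # assumption: use the mean at test time
print(train_logits, test_logits)
```

Sampling a fresh set of prototypes on each training step acts as a form of noise on the decision boundary, which is one plausible reason such modeling could reduce overfitting when labels are scarce.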

翻訳日:2021-06-02 14:41:33 公開日:2021-06-01

# 1つのシーケンスだけを見る:オブジェクト検出による視界のトランスフォーマーの再考

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection ( http://arxiv.org/abs/2106.00666v1 )

ライセンス: Link先を確認

Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu

(参考訳) Transformerは、2次元の空間構造に関する最小限の知識だけで、純粋なシーケンス・ツー・シーケンスの観点から2次元のオブジェクトレベルの認識を実行できるだろうか? この疑問に答えるために、我々は、修正と帰納バイアスを可能な限り抑えた素朴な (naïve) Vision Transformer に基づく一連のオブジェクト検出モデルである You Only Look at One Sequence (YOLOS) を提示する。中規模のImageNet-1kデータセットのみで事前学習したYOLOSが、COCOにおいてすでに競争力のあるオブジェクト検出性能を達成できることを見出した。例えば、BERT-Baseの構成をそのまま採用したYOLOS-Baseは42.0のボックスAPを達成する。また、オブジェクト検出を通じて、視覚分野のTransformerにおける現在の事前学習スキームとモデルスケーリング戦略の影響と限界についても論じる。コードとモデルの重みは https://github.com/hustvl/YOLOS で公開されている。

Can Transformer perform 2D object-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure? To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the naïve Vision Transformer with the fewest possible modifications as well as inductive biases. We find that YOLOS pre-trained on the mid-sized ImageNet-1k dataset only can already achieve competitive object detection performance on COCO, e.g., YOLOS-Base directly adopted from BERT-Base can achieve 42.0 box AP. We also discuss the impacts as well as limitations of current pre-train schemes and model scaling strategies for Transformer in vision through object detection. Code and model weights are available at https://github.com/hustvl/YOLOS.

翻訳日:2021-06-02 14:41:04 公開日:2021-06-01

# PIGLeT:3次元世界におけるニューロ・シンボリック相互作用による言語接地

PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World ( http://arxiv.org/abs/2106.00188v1 )

ライセンス: Link先を確認

Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi

(参考訳) PIGLeT - 相互作用を通して物理的常識知識を学習し,その知識を基底言語に利用するモデルを提案する。我々はPIGLeTを物理力学モデルと別言語モデルに分類する。私たちのダイナミクスモデルは、どんな物体なのかだけでなく、それらが何をしているのかも学べます。次に、言語モデルのインターフェースとして使用し、言語形式と基礎的意味の統一モデルを提供します。 PIGLeTは文を読み、次に何が起こるか神経的にシミュレートし、その結果をリテラル記号表現または自然言語で伝達する。実験結果から,我々のモデルは世界力学を効果的に学習し,コミュニケーションの仕方を示した。 80%以上の英語の文から「次に何が起こるか」を正確に予測することができ、100倍以上のテキスト・テキスト・アプローチを10%以上上回っている。同様に、物理相互作用の自然言語の要約も、人間がLMの代替品よりも正確であると判断する。今後の仕事の場を示す包括的分析を行う。

We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language model, giving us a unified model of linguistic form and grounded meaning. PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation, or natural language. Experimental results show that our model effectively learns world dynamics, along with how to communicate them. It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%. Likewise, its natural language summaries of physical interactions are also judged by humans as more accurate than LM alternatives. We present comprehensive analysis showing room for future work.

翻訳日:2021-06-02 14:40:51 公開日:2021-06-01

# 長文合成質問応答のためのエンドツーエンドマルチホップ検索

End-to-End Multihop Retrieval for Compositional Question Answering over Long Documents ( http://arxiv.org/abs/2106.00200v1 )

ライセンス: Link先を確認

Haitian Sun, William W. Cohen, Ruslan Salakhutdinov

(参考訳) 長い文書から複雑な質問に答えるには、複数の証拠をまとめて答えを予測する必要がある。本稿では,長い文書に対して合成質問に答えるマルチホップ検索手法であるdochopperを提案する。各ステップでDocHopperは文書から段落や文を検索し、検索した結果とクエリを混合し、次のステップでクエリを更新する。他の多くの検索ベースメソッド(ragやrealmなど)とは対照的に、クエリはトークンシーケンスでは拡張されない。これはモデルがエンドツーエンドで微分可能であることを意味する。文書構造を活用すれば、長い文書の質問応答や検索性能を大幅に改善できることを示す。我々はDocHopperを3つの異なるQAタスクで実験し、長い文書を読むことで構成的疑問に答える:談話内容推論、テーブルとテキストによる事実的QA、学術論文からのQAを求める情報。 DocHopperはすべてのベースラインモデルを上回っ、すべてのデータセットで最先端の結果を達成する。さらに、DocHopperは推論時に効率的で、ベースラインの3～10倍高速である。

Answering complex questions from long documents requires aggregating multiple pieces of evidence and then predicting the answers. In this paper, we propose a multi-hop retrieval method, DocHopper, to answer compositional questions over long documents. At each step, DocHopper retrieves a paragraph or sentence embedding from the document, mixes the retrieved result with the query, and updates the query for the next step. In contrast to many other retrieval-based methods (e.g., RAG or REALM), the query is not augmented with a token sequence: instead, it is augmented by "numerically" combining it with another neural representation. This means that the model is end-to-end differentiable. We demonstrate that utilizing document structure in this way can largely improve question-answering and retrieval performance on long documents. We experimented with DocHopper on three different QA tasks that require reading long documents to answer compositional questions: discourse entailment reasoning, factual QA with table and text, and information-seeking QA from academic papers. DocHopper outperforms all baseline models and achieves state-of-the-art results on all datasets. Additionally, DocHopper is efficient at inference time, being 3 to 10 times faster than the baselines.
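The retrieve, mix, and update loop can be sketched with plain vectors. This is a toy illustration, not DocHopper itself: dot-product scoring, softmax-weighted retrieval, additive mixing, and the function name `hop` are all assumptions chosen to show how a query can be updated numerically rather than by appending tokens.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hop(query, passage_embeddings):
    """One retrieval step: score passages, take a soft (differentiable)
    mixture of their embeddings, and add it to the query."""
    scores = [dot(query, p) for p in passage_embeddings]
    weights = softmax(scores)
    retrieved = [
        sum(w * p[i] for w, p in zip(weights, passage_embeddings))
        for i in range(len(query))
    ]
    updated_query = [q + r for q, r in zip(query, retrieved)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, updated_query


passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy passage embeddings
query = [2.0, 0.0]
for step in range(2):
    best, query = hop(query, passages)
    print(step, best, [round(x, 3) for x in query])
```

Because every operation here is a smooth function of the query and the embeddings, gradients could flow through all hops, which is the property the abstract highlights.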

翻訳日:2021-06-02 14:40:34 公開日:2021-06-01

# 言語横断的名前付きエンティティ認識のための強化反復的知識蒸留法

Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition ( http://arxiv.org/abs/2106.00241v1 )

ライセンス: Link先を確認

Shining Liang, Ming Gong, Jian Pei, Linjun Shou, Wanli Zuo, Xianglin Zuo, Daxin Jiang

(参考訳) 名前付きエンティティ認識(NER)は、Web SearchやVoice Assistantsなど、多くのアプリケーションの基本コンポーネントである。ディープニューラルネットワークは、NERの性能を大幅に改善するが、大量のトレーニングデータを必要とするため、ディープニューラルネットワークは業界環境で多くの言語にスケールアウトすることができない。この課題に対処するため、クロス言語NERは、訓練済みの多言語言語モデルを通じて、リッチリソース言語から低リソース言語へ知識を転送する。ターゲット言語でトレーニングデータを使用する代わりに、言語間NERはソース言語のトレーニングデータのみに依存し、オプションでソース言語から派生したトレーニングデータを追加する必要がある。しかし、既存の言語間nerメソッドでは、ターゲット言語でラベルのないリッチなデータをうまく利用していないため、業界アプリケーションでは比較的簡単に収集できる。この機会と課題に対処するため、本論文では、マイクロソフトにおいて、このような大量のラベルのないデータを実際の運用環境でターゲット言語で活用する新しいプラクティスについて述べる。ラベルなしデータから弱い監督信号を効果的に抽出するため,半教師付き学習と強化学習のアイデアに基づく新しいアプローチを開発した。 3つのベンチマークデータセットに関する実証的研究は、我々のアプローチがクリアなエッジで新しい最先端のパフォーマンスを確立することを検証します。現在、この論文で報告されているNER技術は、Microsoft Bing検索エンジンにおけるWebランキング、Entity Pane、Answers Triggering、Issue Answeringの基本的なコンポーネントになりつつある。さらに,本手法は,商用音声アシスタントのための音声言語理解モジュールの一部としても機能する。デプロイ後にプロトタイプフレームワークのコードをオープンソース化する予定です。

Named entity recognition (NER) is a fundamental component in many applications, such as Web Search and Voice Assistants. Although deep neural networks greatly improve the performance of NER, their need for large amounts of training data means they can hardly scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a rich-resource language to languages with low resources through pre-trained multilingual language models. Instead of using training data in target languages, cross-lingual NER has to rely only on training data in source languages, optionally adding translated training data derived from the source languages. However, existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages, which is relatively easy to collect in industry applications. To address these opportunities and challenges, in this paper we describe our novel practice at Microsoft to leverage such large amounts of unlabeled data in target languages in real production settings. To effectively extract weak supervision signals from the unlabeled data, we develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning. The empirical study on three benchmark data sets verifies that our approach establishes new state-of-the-art performance by a clear margin. The NER techniques reported in this paper are now on their way to becoming a fundamental component for Web ranking, Entity Pane, Answers Triggering, and Question Answering in the Microsoft Bing search engine. Moreover, our techniques will also serve as part of the Spoken Language Understanding module for a commercial voice assistant. We plan to open source the code of the prototype framework after deployment.

翻訳日:2021-06-02 14:40:15 公開日:2021-06-01

# 強化学習に基づくきめ細かな質問応答システム

A Coarse to Fine Question Answering System based on Reinforcement Learning ( http://arxiv.org/abs/2106.00257v1 )

ライセンス: Link先を確認

Yu Wang, Hongxia Jin

In this paper, we present a coarse-to-fine question answering (CFQA) system based on reinforcement learning that can efficiently process documents of different lengths by choosing appropriate actions. The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models targeting datasets that mainly contain either short or long documents, our multi-step coarse-to-fine model combines the merits of multiple system modules and can handle both short and long documents. The system hence obtains much better accuracy and faster training speed than the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to the baselines using state-of-the-art models.

Translated: 2021-06-02 14:39:49 Published: 2021-06-01

# Replicating and Extending "Because Their Treebanks Leak": Graph Isomorphism, Covariants, and Parser Performance ( http://arxiv.org/abs/2106.00352v1 )

License: see the linked page

Mark Anderson, Anders Søgaard, Carlos Gómez Rodríguez

Søgaard (2020) obtained results suggesting that the fraction of trees in the test data that are isomorphic to trees in the training set accounts for a non-trivial variation in parser performance. Similar to other statistical analyses in NLP, the results were based on evaluating linear regressions. However, the study had methodological issues and used a small sample size, leading to unreliable results. We present a replication study in which we also bin sentences by length and find that only a small subset of sentences vary in performance with respect to graph isomorphism. Further, the correlation observed between parser performance and graph isomorphism in the wild disappears when controlling for covariants. However, in a controlled experiment, where covariants are kept fixed, we do observe a strong correlation. We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.

Translated: 2021-06-02 14:39:31 Published: 2021-06-01
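The leakage statistic discussed in the abstract above (the fraction of test trees isomorphic to some training tree) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes dependency trees are given as head arrays and that isomorphism ignores labels and word order; the function names `canonical_form` and `leakage` are hypothetical.

```python
from collections import defaultdict

def canonical_form(heads):
    """Return a canonical string for an unlabeled rooted tree.

    heads[i] is the head of token i+1 (1-based); 0 marks the root.
    Labels and word order are ignored (an assumption of this sketch),
    so isomorphic trees share one canonical form.
    """
    children = defaultdict(list)
    for dependent, head in enumerate(heads, start=1):
        children[head].append(dependent)

    def encode(node):
        # Sorting the child encodings makes sibling order irrelevant.
        return "(" + "".join(sorted(encode(c) for c in children[node])) + ")"

    return encode(0)

def leakage(train_trees, test_trees):
    """Fraction of test trees isomorphic to at least one training tree."""
    seen = {canonical_form(t) for t in train_trees}
    hits = sum(canonical_form(t) in seen for t in test_trees)
    return hits / len(test_trees)

# Toy data: two training trees and three test trees.
train = [[2, 0, 2], [0, 1, 2]]
test = [[0, 1, 1], [2, 3, 0], [0, 1, 1, 1]]
print(leakage(train, test))
```

The recursive encoding is adequate for sentence-sized trees; very deep trees would need an iterative version.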

# Volta at SemEval-2021 Task 6: Towards Detecting Persuasive Texts and Images using Textual and Multimodal Ensemble ( http://arxiv.org/abs/2106.00240v1 )

License: see the linked page

Kshitij Gupta, Devansh Gautam, Radhika Mamidi

Memes are one of the most popular types of content used to spread information online. They can influence a large number of people through rhetorical and psychological techniques. The task, Detection of Persuasion Techniques in Texts and Images, is to detect these persuasive techniques in memes. It consists of three subtasks: (A) multi-label classification using textual content, (B) multi-label classification and span identification using textual content, and (C) multi-label classification using visual and textual content. In this paper, we propose a transfer learning approach to fine-tune BERT-based models in different modalities. We also explore the effectiveness of ensembles of models trained in different modalities. We achieve F1-scores of 57.0, 48.2, and 52.1 on the corresponding subtasks.

Translated: 2021-06-02 14:39:15 Published: 2021-06-01

# Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models ( http://arxiv.org/abs/2106.00245v1 )

License: see the linked page

Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu

With large-scale pre-training, the past two years have witnessed a significant performance boost on the Visual Question Answering (VQA) task. Though rapid progress has been made, it remains unclear whether these state-of-the-art (SOTA) VQA models are robust when encountering test examples in the wild. To study this, we introduce Adversarial VQA, a new large-scale VQA benchmark, collected iteratively via an adversarial human-and-model-in-the-loop procedure. Through this new benchmark, we present several interesting findings. (i) Surprisingly, during dataset collection, we find that non-expert annotators can successfully attack SOTA VQA models with relative ease. (ii) We test a variety of SOTA VQA models on our new dataset to highlight their fragility, and find that both large-scale pre-trained models and adversarial training methods achieve far lower performance than they do on the standard VQA v2 dataset. (iii) When considered as data augmentation, our dataset can be used to improve performance on other robust VQA benchmarks. (iv) We present a detailed analysis of the dataset, providing valuable insights on the challenges it brings to the community. We hope Adversarial VQA can serve as a valuable benchmark that future work will use to test the robustness of the VQA models it develops. Our dataset is publicly available at https://adversarialvqa.github.io/.

Translated: 2021-06-02 14:39:02 Published: 2021-06-01

# ViTA: Visual-Linguistic Translation by Aligning Object Tags ( http://arxiv.org/abs/2106.00250v1 )

License: see the linked page

Kshitij Gupta, Devansh Gautam, Radhika Mamidi

Multimodal Machine Translation (MMT) enriches the source text with visual information for translation. It has gained popularity in recent years, and several pipelines have been proposed in the same direction. Yet, the task lacks quality datasets to illustrate the contribution of the visual modality in translation systems. In this paper, we propose our system for the Multimodal Translation Task of WAT 2021 from English to Hindi. We propose to use mBART, a pretrained multilingual sequence-to-sequence model, for text-only translation. Further, we bring the visual information into the textual domain by extracting object tags from the image and using them to enhance the input for the multimodal task. We also explore the robustness of our system by systematically degrading the source text. Finally, we achieve BLEU scores of 44.6 and 51.6 on the test set and challenge set of the task, respectively.

Translated: 2021-06-02 14:38:37 Published: 2021-06-01

# TransVOS: Video Object Segmentation with Transformers ( http://arxiv.org/abs/2106.00588v1 )

License: see the linked page

Jianbiao Mei, Mengmeng Wang, Yeneng Lin, Yong Liu

Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS). A critical problem in this task is how to model the dependency both among different frames and inside every frame. However, most of these methods neglect the spatial relationships (inside each frame) and do not make full use of the temporal relationships (among different frames). In this paper, we propose a new transformer-based framework, termed TransVOS, which introduces a vision transformer to fully exploit and model both the temporal and spatial relationships. Moreover, most STM-based approaches employ two disparate encoders to extract features of the two significant inputs, i.e., the reference sets (history frames with predicted masks) and the query frame, which increases the models' parameters and complexity. To slim the popular two-encoder pipeline while preserving its effectiveness, we design a single two-path feature extractor that encodes the above two inputs in a unified way. Extensive experiments demonstrate the superiority of our TransVOS over state-of-the-art methods on both the DAVIS and YouTube-VOS datasets. Code will be released upon publication.

Translated: 2021-06-02 14:38:26 Published: 2021-06-01

# Concurrent Adversarial Learning for Large-Batch Training ( http://arxiv.org/abs/2106.00221v1 )

License: see the linked page

Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You

Large-batch training has become a commonly used technique when training neural networks with a large number of GPU/TPU processors. As batch size increases, stochastic optimizers tend to converge to sharp local minima, leading to degraded test performance. Current methods usually use extensive data augmentation to increase the batch size, but we found that the performance gain from data augmentation decreases as batch size increases, and data augmentation becomes insufficient after a certain point. In this paper, we propose to use adversarial learning to increase the batch size in large-batch training. Despite being a natural choice for smoothing the decision surface and biasing towards a flat region, adversarial learning has not been successfully applied in large-batch training, since it requires at least two sequential gradient computations at each step, which at least doubles the running time compared with vanilla training even with a large number of processors. To overcome this issue, we propose a novel Concurrent Adversarial Learning (ConAdv) method that decouples the sequential gradient computations in adversarial learning by utilizing stale parameters. Experimental results demonstrate that ConAdv can successfully increase the batch size for both ResNet-50 and EfficientNet training on ImageNet while maintaining high accuracy. In particular, we show that ConAdv alone can achieve 75.3% top-1 accuracy on ImageNet ResNet-50 training with a 96K batch size, and that the accuracy can be further improved to 76.2% when combining ConAdv with data augmentation. This is the first work to successfully scale the ResNet-50 training batch size to 96K.

Translated: 2021-06-02 14:37:26 Published: 2021-06-01
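The idea of decoupling the two gradient computations with stale parameters can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the ConAdv implementation: it uses a one-dimensional linear model with squared loss, an FGSM-style sign perturbation, and a hypothetical `delay` parameter for how old the stale weights are. It runs sequentially; the point is only that the adversarial example depends on past weights, so in a real system it could be prepared concurrently with the previous update.

```python
def grad_wrt_input(w, x, y):
    # d/dx of 0.5 * (w*x - y)**2
    return (w * x - y) * w

def grad_wrt_weight(w, x, y):
    # d/dw of 0.5 * (w*x - y)**2
    return (w * x - y) * x

def sign(v):
    return (v > 0) - (v < 0)

def train(data, steps=200, lr=0.05, eps=0.1, delay=1):
    w = 0.0
    history = [w]  # all past weights; an older entry serves as the stale copy
    for t in range(steps):
        x, y = data[t % len(data)]
        stale_w = history[max(0, len(history) - 1 - delay)]
        # Adversarial example built from stale weights: it depends only on
        # past state, so it does not have to wait for the latest update.
        x_adv = x + eps * sign(grad_wrt_input(stale_w, x, y))
        # The weight update uses the current weights on the perturbed input.
        w -= lr * grad_wrt_weight(w, x_adv, y)
        history.append(w)
    return w

# Toy data generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0)]
w = train(data)
print(w)
```

Because every input is perturbed, the learned weight settles near, rather than exactly at, the noise-free value of 2.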

# Learning Representations for Sub-Symbolic Reasoning ( http://arxiv.org/abs/2106.00393v1 )

License: see the linked page

Giuseppe Marra, Michelangelo Diligenti, Francesco Giannini and Marco Maggini

Neuro-symbolic methods integrate neural architectures, knowledge representation and reasoning. However, they have struggled both with handling the intrinsic uncertainty of the observations and with scaling to real-world applications. This paper presents Relational Reasoning Networks (R2N), a novel end-to-end model that performs relational reasoning in the latent space of a deep learner architecture, where the representations of constants, ground atoms and their manipulations are learned in an integrated fashion. Unlike flat architectures such as Knowledge Graph Embedders, which can only represent relations between entities, R2Ns define an additional computational structure that accounts for higher-level relations among the ground atoms. The considered relations can be explicitly known, like the ones defined by logic formulas, or defined as unconstrained correlations among groups of ground atoms. R2Ns can be applied to purely symbolic tasks or used as a neuro-symbolic platform to integrate learning and reasoning in heterogeneous problems with both symbolic and feature-based represented entities. The proposed model bridges the gap left by previous neuro-symbolic methods, which have been limited in either scalability or expressivity. The proposed methodology is shown to achieve state-of-the-art results in different experimental settings.

Translated: 2021-06-02 14:36:55 Published: 2021-06-01

# OpenBox: A Generalized Black-box Optimization Service ( http://arxiv.org/abs/2106.00421v1 )

License: see the linked page

Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. However, it remains a challenge for users to apply BBO methods to their problems at hand with existing software packages, in terms of applicability, performance, and efficiency. In this paper, we build OpenBox, an open-source and general-purpose BBO service with improved usability. The modular design behind OpenBox also facilitates flexible abstraction and optimization of basic BBO components that are common in other existing systems. OpenBox is distributed, fault-tolerant, and scalable. To improve efficiency, OpenBox further utilizes "algorithm agnostic" parallelization and transfer learning. Our experimental results demonstrate the effectiveness and efficiency of OpenBox compared to existing systems.

Translated: 2021-06-02 14:36:35 Published: 2021-06-01

# Student Performance Prediction Using Dynamic Neural Models ( http://arxiv.org/abs/2106.00524v1 )

License: see the linked page

Marina Delianidi, Konstantinos Diamantaras, George Chrysogonidis, Vasileios Nikiforidis

We address the problem of predicting the correctness of a student's response to the next exam question based on their previous interactions in the course of their learning and evaluation process. We model student performance as a dynamic problem and compare the two major classes of dynamic neural architectures for its solution, namely the finite-memory Time Delay Neural Networks (TDNN) and the potentially infinite-memory Recurrent Neural Networks (RNN). Since the next response is a function of the knowledge state of the student and this, in turn, is a function of their previous responses and the skills associated with the previous questions, we propose a two-part network architecture. The first part employs a dynamic neural network (either TDNN or RNN) to trace the student's knowledge state. The second part is applied on top of the dynamic part; it is a multi-layer feed-forward network that completes the classification task of predicting the student's response based on our estimate of the student's knowledge state. Both input skills and previous responses are encoded using different embeddings. Regarding the skill embeddings, we tried two different initialization schemes using (a) random vectors and (b) pretrained vectors matching the textual descriptions of the skills. Our experiments show that the RNN approach performs better than the TDNN approach on all datasets that we have used. Also, we show that our RNN architecture outperforms the state-of-the-art models on four out of five datasets. It is worth noting that the TDNN approach also outperforms the state-of-the-art models on four out of five datasets, although it is slightly worse than our proposed RNN approach. Finally, contrary to our expectations, we find that initializing skill embeddings with pretrained vectors offers practically no advantage over random initialization.

Translated: 2021-06-02 14:36:26 Published: 2021-06-01

# Duckworth-Lewis-Stern Method Comparison with Machine Learning Approach ( http://arxiv.org/abs/2106.00175v1 )

License: see the linked page

Kumail Abbas and Sajjad Haider

This work presents an analysis of the Duckworth-Lewis-Stern (DLS) method for One Day International (ODI) cricket matches. The accuracy of the DLS method is compared against various supervised learning algorithms for result prediction. The result of a cricket match is predicted during the second innings. The paper also optimizes the DLS resource table, which is used in the Duckworth-Lewis (D/L) formula, to increase its predictive power. Finally, an Unpredictability Index is developed that ranks different cricket-playing nations according to how unpredictable they are while playing an ODI match.

Translated: 2021-06-02 14:35:21 Published: 2021-06-01
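To make the role of the resource table concrete, the sketch below applies the simple ratio rule of the Duckworth-Lewis Standard Edition, in which the chasing team's par score is the first team's score scaled by the ratio of available resources. The resource percentages here are hypothetical inputs (in practice they are looked up in the resource table by overs remaining and wickets lost), and the case where the second team has more resources is deliberately left out.

```python
import math

def par_score(team1_score, team1_resources, team2_resources):
    """Scale Team 1's score by the ratio of resources (both in percent).

    Simple ratio rule, valid when Team 2 has no more resources than
    Team 1; the other case uses a different adjustment and is omitted.
    """
    if team2_resources > team1_resources:
        raise ValueError("ratio rule assumes Team 2 has fewer resources")
    return team1_score * team2_resources / team1_resources

def revised_target(team1_score, team1_resources, team2_resources):
    # Team 2 must exceed the par score, so the target is the next whole run.
    par = par_score(team1_score, team1_resources, team2_resources)
    return math.floor(par) + 1

# Hypothetical numbers: Team 1 used 100% of its resources and scored 250;
# an interruption leaves Team 2 with 80% of the resources.
print(revised_target(250, 100.0, 80.0))
```

Optimizing the resource table, as the abstract describes, would change the percentages fed into this calculation rather than the calculation itself.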

# What's a good imputation to predict with missing values? ( http://arxiv.org/abs/2106.00311v1 )

License: see the linked page

Marine Le Morvan (PARIETAL, IJCLab), Julie Josse (CRISAM), Erwan Scornet (CMAP), Gaël Varoquaux (PARIETAL)

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and then learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting the imputation instead so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network that captures the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments with a finite number of samples confirm that joint imputation and regression through NeuMiss outperforms various two-step procedures.

Translated: 2021-06-02 14:35:21 Published: 2021-06-01
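The two-step impute-then-regress procedure that the abstract analyses can be sketched in a few lines. This is only an illustration under stated assumptions: mean imputation is one arbitrary choice of imputation function, a tiny nearest-neighbour regressor stands in for the "powerful learner", and all function names and data are hypothetical. It is not NeuMiss.

```python
def fit_means(rows):
    """Column means over observed entries (None marks a missing value)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def impute(rows, means):
    """Replace each missing entry with the mean of its column."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_predict(train_x, train_y, x, k=1):
    """Tiny k-nearest-neighbour regressor standing in for a flexible learner."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

train_x = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [None, 8.0]]
train_y = [3.0, 6.0, 9.0, 12.0]

means = fit_means(train_x)               # step 1: fit the imputer on training data
completed = impute(train_x, means)       # step 2: complete the training inputs
query = impute([[3.0, None]], means)[0]  # the same imputer is applied at test time
prediction = knn_predict(completed, train_y, query)  # step 3: regress
print(prediction)
```

The same fitted means are reused at test time, so the regressor always sees inputs completed in a consistent way.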

# Transformation Models for Flexible Posteriors in Variational Bayes ( http://arxiv.org/abs/2106.00528v1 )

License: see the linked page

Stefan Hörtling, Daniel Dold, Oliver Dürr, Beate Sick

The main challenge in Bayesian models is to determine the posterior for the model parameters. Even in models with only one or a few parameters, the analytical posterior can only be determined in special settings. In Bayesian neural networks, variational inference is widely used to approximate difficult-to-compute posteriors by variational distributions. Usually, Gaussians are used as variational distributions (Gaussian-VI), which limits the quality of the approximation because of their limited flexibility. Transformation models, on the other hand, are flexible enough to fit any distribution. Here we present transformation model-based variational inference (TM-VI) and demonstrate that it allows accurate approximation of complex posteriors in models with one parameter and also works in a mean-field fashion for multi-parameter models like neural networks.

Translated: 2021-06-02 14:35:01 Published: 2021-06-01

# Towards Real-time and Light-weight Line Segment Detection ( http://arxiv.org/abs/2106.00186v1 )

License: see the linked page

Geonmo Gu, Byungsoo Ko, SeoungHyun Go, Sung-Hyun Lee, Jingeun Lee, Minchul Shin

Previous deep learning-based line segment detection (LSD) methods suffer from immense model size and high computational cost for line prediction. This prevents real-time inference in computationally restricted environments. In this paper, we propose a real-time and light-weight line segment detector for resource-constrained environments named Mobile LSD (M-LSD). We design an extremely efficient LSD architecture by minimizing the backbone network and removing the typical multi-module process for line prediction used in previous methods. To maintain competitive performance with such a light-weight network, we present novel training schemes: Segments of Line segment (SoL) augmentation and a geometric learning scheme. SoL augmentation splits a line segment into multiple subparts, which are used to provide auxiliary line data during the training process. Moreover, the geometric learning scheme allows a model to capture additional geometry cues from matching loss, junction and line segmentation, and length and degree regression. Compared with TP-LSD-Lite, previously the best real-time LSD method, our model (M-LSD-tiny) achieves competitive performance with 2.5% of the model size and a 130.5% increase in inference speed on GPU when evaluated on the Wireframe and YorkUrban datasets. Furthermore, our model runs at 56.8 FPS and 48.6 FPS on Android and iPhone mobile devices, respectively. To the best of our knowledge, this is the first real-time deep LSD method available on mobile devices.

Translated: 2021-06-02 14:34:16 Published: 2021-06-01
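The core step of SoL augmentation, splitting one annotated line segment into several subparts, can be sketched as below. The abstract does not specify how the subparts are chosen, so splitting into `k` equal, non-overlapping pieces is an assumption of this sketch, and `split_segment` is a hypothetical name.

```python
def split_segment(p0, p1, k):
    """Split the segment p0 -> p1 into k equal, end-to-end sub-segments."""
    (x0, y0), (x1, y1) = p0, p1
    # k + 1 evenly spaced points along the segment, endpoints included.
    points = [
        (x0 + (x1 - x0) * i / k, y0 + (y1 - y0) * i / k) for i in range(k + 1)
    ]
    # Consecutive points form the sub-segments.
    return list(zip(points[:-1], points[1:]))

segment = ((0.0, 0.0), (4.0, 2.0))
subparts = split_segment(*segment, k=2)
print(subparts)
```

Each sub-segment could then be added to the training targets alongside the original segment as auxiliary line data.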

# Quantifying Predictive Uncertainty in Medical Image Analysis with Deep Kernel Learning ( http://arxiv.org/abs/2106.00638v1 )

License: see the linked page

Zhiliang Wu, Yinchong Yang, Jindong Gu, Volker Tresp

Deep neural networks are increasingly being used for the analysis of medical images. However, most works neglect the uncertainty in the model's prediction. We propose an uncertainty-aware deep kernel learning model that permits estimation of the uncertainty in the prediction through a pipeline of a Convolutional Neural Network and a sparse Gaussian Process. Furthermore, we adapt different pre-training methods to investigate their impact on the proposed model. We apply our approach to Bone Age Prediction and Lesion Localization. In most cases, the proposed model shows better performance than common architectures. More importantly, our model systematically expresses higher confidence in more accurate predictions and lower confidence in less accurate ones. Our model can also be used to detect challenging and controversial test samples. Compared to related methods such as Monte-Carlo Dropout, our approach derives the uncertainty information in a purely analytical fashion and is thus computationally more efficient.

Translated: 2021-06-02 14:33:53 Published: 2021-06-01

# To trust or not to trust an explanation: using LEAF to evaluate local linear XAI methods ( http://arxiv.org/abs/2106.00461v1 )

License: see the linked page

Elvio G. Amparore and Alan Perotti and Paolo Bajardi

The main objective of eXplainable Artificial Intelligence (XAI) is to provide effective explanations for black-box classifiers. The existing literature lists many desirable properties for explanations to be useful, but there is no consensus on how to quantitatively evaluate explanations in practice. Moreover, explanations are typically used only to inspect black-box models, and the proactive use of explanations as decision support is generally overlooked. Among the many approaches to XAI, a widely adopted paradigm is Local Linear Explanations, with LIME and SHAP emerging as state-of-the-art methods. We show that these methods are plagued by many defects, including unstable explanations, divergence of actual implementations from the promised theoretical properties, and explanations for the wrong label. This highlights the need for standard and unbiased evaluation procedures for Local Linear Explanations in the XAI field. In this paper we address the problem of identifying a clear and unambiguous set of metrics for the evaluation of Local Linear Explanations. This set includes both existing and novel metrics defined specifically for this class of explanations. All metrics have been included in an open Python framework named LEAF. The purpose of LEAF is to provide a reference for end users to evaluate explanations in a standardised and unbiased way, and to guide researchers towards developing improved explainable techniques.

Translated: 2021-06-02 14:32:39 Published: 2021-06-01

# What Matters for Adversarial Imitation Learning? ( http://arxiv.org/abs/2106.00672v1 )

License: see the linked page

Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

Adversarial imitation learning has become a popular framework for imitation in continuous control. Over the years, several variations of its components have been proposed to improve the performance of the learned policies as well as the sample complexity of the algorithm. In practice, these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand which choices, among the high-level algorithmic options as well as the low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impact in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. While many of our findings confirm common practices, some of them are surprising or even contradict prior work. In particular, our results suggest that artificial demonstrations are not a good proxy for human data and that the very common practice of evaluating imitation algorithms only with synthetic demonstrations may lead to algorithms that perform poorly in the more realistic scenarios with human demonstrations.

Translated: 2021-06-02 14:32:17 Published: 2021-06-01

# 深層学習モデルにおける局所的妥当性と識別的信頼区間

Locally Valid and Discriminative Confidence Intervals for Deep Learning Models ( http://arxiv.org/abs/2106.00225v1 )

ライセンス: Link先を確認

Zhen Lin, Shubhendu Trivedi, Jimeng Sun

(参考訳) 重要な現実世界の応用に向けてディープラーニングモデルへの信頼を構築するには、効率的で理論的に健全な不確実性定量化が不可欠であるが、これは依然として困難な課題である。有用な不確実性情報は2つの重要な特性を持つことが期待される: 妥当性(カバレッジの保証)と識別性(期待リスクが高い場合により不確実になること)である。さらに、ディープラーニング(DL)手法と組み合わせる場合、スケーラブルであり、DLモデルの性能への影響が最小限であるべきである。既存のベイズ法の多くは頻度論的なカバレッジ保証を欠き、通常はモデル性能に影響する。利用可能な数少ない頻度論的手法は、識別的であることが稀であるか、非現実的な仮定のためにカバレッジ保証に違反する(あるいはその両方である)。さらに、多くの手法は高コストであるか、ベースとなるニューラルネットワークに大きな修正を必要とする。近年のコンフォーマル予測の進歩と、カーネル回帰という古典的な考え方に基づき、ほぼ任意のDLモデルに対して識別的な信頼区間(CI)を構築する、簡易かつ効率的で軽量な手法である局所妥当・識別的信頼区間(LVD)を提案する。データ分布に関する仮定なしに、このようなCIは有限サンプルの局所カバレッジ保証も提供する(より単純な周辺カバレッジとは対照的である)。多様なデータセットを用いて、LVDは局所的に妥当な唯一の手法であるだけでなく、既存の不確実性定量化手法の性能(カバレッジ率と予測精度を含む)を上回るか同等であり、スケーラビリティと柔軟性の面でさらなる利点を提供することを実証的に検証する。

Crucial for building trust in deep learning models for critical real-world applications is efficient and theoretically sound uncertainty quantification, a task that continues to be challenging. Useful uncertainty information is expected to have two key properties: It should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high). Moreover, when combined with deep learning (DL) methods, it should be scalable and affect the DL model performance minimally. Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance. The few available frequentist methods are rarely discriminative and/or violate coverage guarantees due to unrealistic assumptions. Moreover, many methods are expensive or require substantial modifications to the base neural network. Building upon recent advances in conformal prediction and leveraging the classical idea of kernel regression, we propose Locally Valid and Discriminative confidence intervals (LVD), a simple, efficient and lightweight method to construct discriminative confidence intervals (CIs) for almost any DL model. With no assumptions on the data distribution, such CIs also offer finite-sample local coverage guarantees (contrasted to the simpler marginal coverage). Using a diverse set of datasets, we empirically verify that besides being the only locally valid method, LVD also exceeds or matches the performance (including coverage rate and prediction accuracy) of existing uncertainty quantification methods, while offering additional benefits in scalability and flexibility.
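
For orientation, the marginal coverage that the abstract contrasts with local coverage can be obtained from plain split conformal prediction. The sketch below is that simpler baseline, not the LVD method itself. The `predict` function, the synthetic data, and every name in the block are assumptions made purely for illustration.

```python
import math
import random


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    # 1-indexed rank ceil((n + 1) * (1 - alpha)) in the sorted residuals.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(residuals)[rank - 1]


def predict(x):
    """Hypothetical stand-in for a trained model (assumption, not from the paper)."""
    return 2.0 * x


def sample(n, rng):
    """Synthetic data: y = 2x + Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


rng = random.Random(0)
calibration = sample(500, rng)
residuals = [abs(y - predict(x)) for x, y in calibration]
half_width = conformal_quantile(residuals, alpha=0.1)


def interval(x):
    """Marginal 90% interval; the width is the same for every x."""
    center = predict(x)
    return center - half_width, center + half_width


test_points = sample(2000, rng)
coverage = sum(interval(x)[0] <= y <= interval(x)[1] for x, y in test_points) / len(test_points)
print(f"half-width = {half_width:.2f}, empirical coverage = {coverage:.3f}")
```

Because `half_width` is constant, the interval is valid only on average over x and says nothing about where the model is less reliable. That is the gap that locally valid, discriminative intervals aim to close.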

翻訳日:2021-06-02 14:31:09 公開日:2021-06-01

# 分布的にロバストなエキスパートの合成による逐次ドメイン適応

Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts ( http://arxiv.org/abs/2106.00322v1 )

ライセンス: Link先を確認

Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

(参考訳) 最小二乗推定器は、少数のターゲットドメインのサンプルで訓練すると、予測性能が低くなる可能性がある。教師付きドメイン適応は、ターゲット分布に近いソース分布からのラベル付き訓練サンプルを追加で活用することにより、予測精度を向上させることを目的としている。利用可能なデータに基づき、モーメント条件に関してロバストな最小二乗推定器エキスパートの族を合成する新しい戦略を検討する。これらのモーメント条件をKullback-LeiblerまたはWasserstein型のダイバージェンスを用いて指定すると、凸最適化を用いてロバスト推定器を効率的に求めることができる。提案するロバストなエキスパートの族に対してBernstein online aggregationアルゴリズムを用い、ターゲットテストサンプルの逐次ストリームに対する予測を生成する。実データに対する数値実験は、ロバストな戦略が経験的最小二乗推定器の非ロバストな補間を上回りうることを示している。

Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are robust with regard to moment conditions. When these moment conditions are specified using Kullback-Leibler or Wasserstein-type divergences, we can find the robust estimators efficiently using convex optimization. We use the Bernstein online aggregation algorithm on the proposed family of robust experts to generate predictions for the sequential stream of target test samples. Numerical experiments on real data show that the robust strategies may outperform non-robust interpolations of the empirical least squares estimators.
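
The sequential prediction step can be illustrated with a plain exponentially weighted average forecaster. This is a simpler relative of Bernstein online aggregation, shown here as a stand-in rather than the algorithm used in the paper. The experts, the learning rate `eta`, and the synthetic stream are all hypothetical.

```python
import math
import random


def aggregate(experts, stream, eta=1.0):
    """Exponentially weighted average forecaster under squared loss."""
    weights = [1.0 / len(experts)] * len(experts)
    total_loss = 0.0
    for x, y in stream:
        predictions = [expert(x) for expert in experts]
        # Forecast with the current weights before the label is revealed.
        forecast = sum(w * p for w, p in zip(weights, predictions))
        total_loss += (forecast - y) ** 2
        # Downweight each expert according to its own squared error.
        weights = [w * math.exp(-eta * (p - y) ** 2) for w, p in zip(weights, predictions)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, total_loss


# Hypothetical experts: fixed linear predictors with different slopes.
experts = [lambda x, s=s: s * x for s in (0.5, 2.0, 3.5)]

rng = random.Random(1)
stream = []
for _ in range(200):
    x = rng.uniform(0.0, 1.0)
    stream.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

weights, total_loss = aggregate(experts, stream)
print([round(w, 3) for w in weights])
```

In this toy stream the weight concentrates on the expert whose slope matches the data. In the setting of the abstract, the experts would instead be the robust least squares estimators.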

翻訳日:2021-06-02 14:30:48 公開日:2021-06-01

# Post-Contextual-Bandit推論

Post-Contextual-Bandit Inference ( http://arxiv.org/abs/2106.00418v1 )

ライセンス: Link先を確認

Aurélien Bibaut and Antoine Chambaz and Maria Dimakopoulou and Nathan Kallus and Mark van der Laan

(参考訳) 文脈的バンディットアルゴリズムは、研究参加者の成果を改善し、かつ良好な(あるいは最良の)ポリシーを特定する可能性を高められるため、eコマース、ヘルスケア、政策立案における非適応的なA/Bテストを置き換えつつある。それでも、研究終了時に新規介入について信頼できる推論を行うためには、平均処置効果、サブグループ効果、あるいは新しいポリシーの価値について、妥当な信頼区間を構築したい。しかし、文脈的バンディットアルゴリズムによって収集されたデータの適応性がこれを難しくする: 標準的な推定量はもはや漸近的に正規分布せず、古典的な信頼区間は正しいカバレッジを提供できない。この問題は非文脈的な設定では安定化推定量を用いて対処されてきたが、文脈的な設定には固有の課題があり、本論文で初めてこれに取り組む。本研究では、文脈適応的なデータ収集の下で漸近的に正規となる、ポリシー価値に対する最初の推定量であるCADR(Contextual Adaptive Doubly Robust)推定量を提案する。CADRの構築における主な技術的課題は、安定化のための適応的で一致性のある条件付き標準偏差推定量を設計することである。57のOpenMLデータセットを用いた大規模な数値実験により、CADRに基づく信頼区間のみが正しいカバレッjを提供することが示された。

Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. To support credible inference on novel interventions at the end of the study, nonetheless, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies. The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage. While this has been addressed in non-contextual settings by using stabilized estimators, the contextual setting poses unique challenges that we tackle for the first time in this paper. We propose the Contextual Adaptive Doubly Robust (CADR) estimator, the first estimator for policy value that is asymptotically normal under contextual adaptive data collection. The main technical challenge in constructing CADR is designing adaptive and consistent conditional standard deviation estimators for stabilization. Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.
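
As background for the "doubly robust" part of CADR, the following is a minimal sketch of the standard doubly robust estimator of a policy value from logged bandit data. It assumes a fixed, non-adaptive logging policy with known propensities and omits the stabilization that CADR adds for adaptively collected data. The reward table, policies, and names are hypothetical.

```python
import random

ACTIONS = (0, 1)
# Hypothetical mean rewards MEAN_REWARD[context][action]; the target policy below
# (play action == context) therefore has true value (0.7 + 0.8) / 2 = 0.75.
MEAN_REWARD = {0: (0.7, 0.2), 1: (0.3, 0.8)}


def target_policy(action, context):
    """Probability that the evaluated policy plays `action` in `context`."""
    return 1.0 if action == context else 0.0


def outcome_model(context, action):
    """Deliberately crude reward model; DR stays unbiased with known propensities."""
    return 0.5


def doubly_robust_value(logs):
    """Average of the model-based term plus an importance-weighted residual."""
    total = 0.0
    for context, action, reward, propensity in logs:
        direct = sum(target_policy(a, context) * outcome_model(context, a) for a in ACTIONS)
        residual = reward - outcome_model(context, action)
        correction = target_policy(action, context) / propensity * residual
        total += direct + correction
    return total / len(logs)


rng = random.Random(0)
logs = []
for _ in range(20000):
    context = rng.randrange(2)
    action = rng.randrange(2)  # uniform logging policy, propensity 0.5
    reward = 1.0 if rng.random() < MEAN_REWARD[context][action] else 0.0
    logs.append((context, action, reward, 0.5))

estimate = doubly_robust_value(logs)
print(f"DR estimate = {estimate:.3f} (true value 0.75)")
```

When the logging policy adapts to past data, as a contextual bandit does, the usual normal approximation for this average can fail. That failure is the problem the abstract describes.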

翻訳日:2021-06-02 14:30:32 公開日:2021-06-01

# 有限ベイズニューラルネットワークにおける表現学習の漸近性

Asymptotics of representation learning in finite Bayesian neural networks ( http://arxiv.org/abs/2106.00651v1 )

ライセンス: Link先を確認

Jacob A. Zavatone-Veth and Abdulkadir Canatar and Cengiz Pehlevan

(参考訳) 近年の研究では、有限ベイズニューラルネットワークは、内部表現を柔軟に適応できるため、無限幅の対応物より優れている可能性が示唆されている。しかし、有限ネットワークが学習した隠れ層表現が無限ネットワークの固定表現とどのように異なるかに関する理論的理解は不完全なままである。ネットワークの事前分布と事後分布に対する摂動的な有限幅補正は研究されてきたが、学習された特徴の漸近性は十分に特徴づけられていない。ここでは、線形読み出しと二次コストを持つ任意のベイズネットワークについて、平均特徴カーネルに対する主要な有限幅補正が概ね普遍的な形式を持つと主張する。これを、全結合ネットワークの2つのクラス(深い線形ネットワークと、単一の非線形隠れ層を持つネットワーク)について明示的に示す。我々の結果は、幅の広いベイズニューラルネットワークがデータのどの特徴を表現するよう学習するのかを解明し始めるものである。

Recent works have suggested that finite Bayesian neural networks may outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and quadratic cost have a largely universal form. We illustrate this explicitly for two classes of fully connected networks: deep linear networks and networks with a single nonlinear hidden layer. Our results begin to elucidate which features of data wide Bayesian neural networks learn to represent.

翻訳日:2021-06-02 14:30:13 公開日:2021-06-01

# ヒートマップ回帰と深い畳み込みオドメトリを用いたマルコフ局所化

Markov Localisation using Heatmap Regression and Deep Convolutional Odometry ( http://arxiv.org/abs/2106.00371v1 )

ライセンス: Link先を確認

Oscar Mendez, Simon Hadfield, Richard Bowden

(参考訳) 自動運転車の文脈では、視覚的ローカライゼーションに基づくアプローチとLiDARに基づくアプローチの間に激しい競争がある。LiDARは重要な深度情報を提供するが、解像度が疎で高価である。一方、カメラは低コストであり、ディープラーニングの最近の進歩により高いローカライズ性能を提供できる。しかし、いくつかの根本的な問題が残っており、特に不確実性の領域では、学習に基づくアプローチが過度に自信過剰になりうることが知られている。マルコフ(グリッドベース)ローカライゼーションは、ローカライズ問題の初期の解決策であったが、計算の複雑さのために使われなくなった。尤度場をグリッド(またはボリューム)として表現することは、精度とメモリサイズの間にトレードオフがあることを意味する。さらに、尤度ボリューム全体にわたって高価な畳み込みを行う必要がある。全ての可能な位置の尤度を同時に維持できるという利点にもかかわらず、グリッドベースのアプローチはより効率的な粒子フィルタとモンテカルロ・ローカライゼーション(MCL)に取って代わられた。しかし、MCLは粒子枯渇などの独自の問題を導入する。近年のディープラーニングハードウェアの進歩により、大きな尤度ボリュームをGPU上に直接格納でき、GPU上での3D畳み込みを効率的に実行するためのハードウェアも利用できるため、グリッドベースの手法の欠点の多くが解消される。本研究では、最新のディープラーニングハードウェアを活用できる新しいCNNベースのローカライゼーション手法を提案する。グリッドベースのマルコフ・ローカライゼーションをGPU上に直接実装することにより、単一のニューラルネットワーク内で画像に基づくローカライズとオドメトリに基づく尤度伝搬を実行できるハイブリッドCNNを作成する。結果として得られたアプローチは、直接ポーズ回帰法だけでなく、最先端のローカライズシステムをも上回ることができる。

In the context of self-driving vehicles there is strong competition between approaches based on visual localisation and LiDAR. While LiDAR provides important depth information, it is sparse in resolution and expensive. On the other hand, cameras are low-cost and recent developments in deep learning mean they can provide high localisation performance. However, several fundamental problems remain, particularly in the domain of uncertainty, where learning-based approaches can be notoriously over-confident. Markov, or grid-based, localisation was an early solution to the localisation problem but fell out of favour due to its computational complexity. Representing the likelihood field as a grid (or volume) means there is a trade-off between accuracy and memory size. Furthermore, it is necessary to perform expensive convolutions across the entire likelihood volume. Despite the benefit of simultaneously maintaining a likelihood for all possible locations, grid-based approaches were superseded by more efficient particle filters and Monte Carlo Localisation (MCL). However, MCL introduces its own problems, e.g. particle deprivation. Recent advances in deep learning hardware allow large likelihood volumes to be stored directly on the GPU, along with the hardware necessary to efficiently perform GPU-bound 3D convolutions, and this obviates many of the disadvantages of grid-based methods. In this work, we present a novel CNN-based localisation approach that can leverage modern deep learning hardware. By implementing a grid-based Markov localisation approach directly on the GPU, we create a hybrid CNN that can perform image-based localisation and odometry-based likelihood propagation within a single neural network. The resulting approach is capable of outperforming direct pose regression methods as well as state-of-the-art localisation systems.
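
The grid-based scheme the abstract refers to can be sketched as a classical histogram filter: odometry propagates the belief by a convolution over the grid, and each observation reweights it. The version below is a one-dimensional, CPU-only toy with a hand-written sensor model, not the paper's CNN. The map, motion kernel, and function names are assumptions for illustration.

```python
def normalize(belief):
    total = sum(belief)
    return [b / total for b in belief]


def propagate(belief, motion_kernel):
    """Odometry step: circular convolution of the belief with a motion kernel.

    motion_kernel maps a displacement in cells to its probability.
    """
    n = len(belief)
    moved = [0.0] * n
    for cell, mass in enumerate(belief):
        for shift, prob in motion_kernel.items():
            moved[(cell + shift) % n] += mass * prob
    return moved


def correct(belief, likelihood):
    """Measurement step: multiply by the per-cell likelihood and renormalise."""
    return normalize([b * l for b, l in zip(belief, likelihood)])


# Hypothetical 1D cyclic map: 1 marks a cell where a landmark is visible.
LANDMARKS = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
MOTION = {0: 0.1, 1: 0.8, 2: 0.1}  # commanded one-cell move with noise


def sensor_likelihood(observation, p_correct=0.9):
    return [p_correct if cell == observation else 1.0 - p_correct for cell in LANDMARKS]


belief = normalize([1.0] * len(LANDMARKS))  # uniform prior over all cells
# The robot starts in cell 2 and moves right twice: landmark, landmark, nothing.
observations = [1, 1, 0]
for step, observation in enumerate(observations):
    if step > 0:
        belief = propagate(belief, MOTION)
    belief = correct(belief, sensor_likelihood(observation))

best_cell = max(range(len(belief)), key=belief.__getitem__)
print(best_cell, round(belief[best_cell], 3))
```

The belief keeps a value for every cell at all times, which is what makes the method robust to ambiguity. The cost grows with grid size, since `propagate` visits every cell, and that cost is why the abstract turns to GPU convolutions.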

翻訳日:2021-06-02 14:29:59 公開日:2021-06-01

# COV-ECGNET:深部畳み込みニューラルネットワークを用いたECGトレース画像を用いたCOVID-19検出

COV-ECGNET: COVID-19 detection using ECG trace images with deep convolutional neural network ( http://arxiv.org/abs/2106.00436v1 )

ライセンス: Link先を確認

Tawsifur Rahman, Alex Akinbi, Muhammad E. H. Chowdhury, Tarik A. Rashid, Abdulkadir Şengür, Amith Khandakar, Khandaker Reajul Islam, Aras M. Ismael

(参考訳) 新型コロナウイルス感染症(COVID-19)の信頼性が高く迅速な識別は、急速な感染拡大を防ぎ、ロックダウンの制限を緩和し、公衆衛生インフラへの圧力を減らすために重要になっている。近年、さまざまな画像やデータを用いてSARS-CoV-2ウイルスを検出する手法や技術が提案されている。しかし、本研究は、心電図(ECG)トレース画像からCOVID-19を検出するために深層畳み込みニューラルネットワーク(CNN)モデルを使用する可能性を探る最初の研究である。本研究では、深層学習技術を用いてCOVID-19および他の心血管疾患(CVD)を検出した。正常、COVID-19、心筋梗塞(MI)、異常心拍(AHB)、回復した心筋梗塞(RMI)の5つのカテゴリからなる1937枚のECG画像の公開データセットを使用した。6種類の深層CNNモデル(ResNet18、ResNet50、ResNet101、InceptionV3、DenseNet201、MobileNetv2)を用いて、2クラス分類(Normal vs COVID-19)、3クラス分類(Normal、COVID-19、その他のCVD)、および5クラス分類(Normal、COVID-19、MI、AHB、RMI)の3つの分類方式を検討した。2クラス分類と3クラス分類では、DenseNet201がそれぞれ99.1%、97.36%の精度で他のネットワークを上回り、5クラス分類ではInceptionV3が97.83%の精度で他のネットワークを上回った。ScoreCAMによる可視化は、ネットワークがトレース画像の関連領域から学習していることを確認する。提案手法は、スマートフォンで撮影でき、低資源国でも容易に利用できるECGトレース画像を用いるため、COVID-19や他の心臓異常のコンピュータ支援診断の迅速化に役立つ。

The reliable and rapid identification of COVID-19 has become crucial to prevent the rapid spread of the disease, ease lockdown restrictions and reduce pressure on public health infrastructures. Recently, several methods and techniques have been proposed to detect the SARS-CoV-2 virus using different images and data. However, this is the first study that will explore the possibility of using deep convolutional neural network (CNN) models to detect COVID-19 from electrocardiogram (ECG) trace images. In this work, COVID-19 and other cardiovascular diseases (CVDs) were detected using deep-learning techniques. A public dataset of 1937 ECG images from five distinct categories, namely Normal, COVID-19, myocardial infarction (MI), abnormal heartbeat (AHB), and recovered myocardial infarction (RMI), was used in this study. Six different deep CNN models (ResNet18, ResNet50, ResNet101, InceptionV3, DenseNet201, and MobileNetv2) were used to investigate three different classification schemes: two-class classification (Normal vs COVID-19); three-class classification (Normal, COVID-19, and Other CVDs), and finally, five-class classification (Normal, COVID-19, MI, AHB, and RMI). For two-class and three-class classification, DenseNet201 outperforms other networks with an accuracy of 99.1% and 97.36%, respectively; while for the five-class classification, InceptionV3 outperforms others with an accuracy of 97.83%. ScoreCAM visualization confirms that the networks are learning from the relevant area of the trace images. Since the proposed method uses ECG trace images, which can be captured by smartphones and are readily available in low-resource countries, this study will help in faster computer-aided diagnosis of COVID-19 and other cardiac abnormalities.

翻訳日:2021-06-02 14:29:29 公開日:2021-06-01

# ディープニューラルネットワークにおける従来検出不能な障害の露呈

Exposing Previously Undetectable Faults in Deep Neural Networks ( http://arxiv.org/abs/2106.00576v1 )

ライセンス: Link先を確認

Isaac Dunn, Hadrien Pouget, Daniel Kroening and Tom Melham

(参考訳) DNNをテストするための既存の手法は、生の特徴(例えば画像のピクセル値)が、所望のDNN出力が既知であるデータセット例から小さな距離内に収まるよう制約することで、オラクル問題を解決している。しかしこれは、これらのアプローチが検出できる障害の種類を制限する。本稿では、他の手法では発見できないDNNの障害を見つけることができる新しいDNNテスト手法を提案する。要点は、生成的機械学習を利用することで、高レベルな特徴(画像の場合、オブジェクトの形、位置、テクスチャ、色など)が異なる新しいテスト入力を生成できるという点にある。我々は、本手法が意図的に注入された障害と最先端DNNの新たな障害の両方を検出できること、そしていずれの場合も既存の手法ではこれらの障害を発見できないことを実証する。

Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods cannot. The crux is that, by leveraging generative machine learning, we can generate fresh test inputs that vary in their high-level features (for images, these include object shape, location, texture, and colour). We demonstrate that our approach is capable of detecting deliberately injected faults as well as new faults in state-of-the-art DNNs, and that in both cases, existing methods are unable to find these faults.

翻訳日:2021-06-02 14:28:51 公開日:2021-06-01

# ここで何ができるか? 視覚的アフォーダンスの想像による新しいスキルの学習

What Can I Do Here? Learning New Skills by Imagining Visual Affordances ( http://arxiv.org/abs/2106.00671v1 )

ライセンス: Link先を確認

Alexander Khazatsky, Ashvin Nair, Daniel Jing, Sergey Levine

(参考訳) 学習したスキルを備えた汎用ロボットは、多くの異なる環境で多くのタスクを実行できなければならない。しかし、新しい設定へのゼロショットの一般化が常に可能であるとは限らない。ロボットが新しい環境や物体に遭遇したとき、この変化に対応するために、以前に学んだスキルの一部を微調整する必要があるかもしれない。しかし重要なのは、以前に学習した行動やモデルが、この再学習を加速するのに依然として役立つべきだという点である。本稿では、起こりうる結果の生成モデルによって、ロボットがアフォーダンスの視覚的表現を学習し、新たな状況において起こりうる結果をサンプリングし、さらにその結果を達成するようポリシーを訓練できるようにする方法を研究する。事実上、事前データはどのような結果が起こりうるかを学習するために使用され、ロボットは不慣れな設定に遭遇すると、モデルから潜在的な結果をサンプリングし、それらへの到達を試み、それによってスキルと結果モデルの両方を更新できる。この手法、すなわちVAL(visuomotor affordance learning)は、生の画像入力で動作する目標条件付きポリシーの訓練に使用でき、提案するアフォーダンス指向の探索スキームにより、新たなオブジェクトの操作を迅速に学習できる。我々は、VALが事前データを利用することで、新しいシーンにおいてわずか5分間のオンライン経験だけで、引き出しを開ける、把持する、オブジェクトを配置するといった現実世界のタスクを解決できることを示す。

A generalist robot equipped with learned skills must be able to perform many tasks in many different environments. However, zero-shot generalization to new settings is not always possible. When the robot encounters a new environment or object, it may need to finetune some of its previously learned skills to accommodate this change. But crucially, previously learned behaviors and models should still be suitable to accelerate this relearning. In this paper, we aim to study how generative models of possible outcomes can allow a robot to learn visual representations of affordances, so that the robot can sample potentially possible outcomes in new situations, and then further train its policy to achieve those outcomes. In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model, attempt to reach them, and thereby update both its skills and its outcome model. This approach, visuomotor affordance learning (VAL), can be used to train goal-conditioned policies that operate on raw image inputs, and can rapidly learn to manipulate new objects via our proposed affordance-directed exploration scheme. We show that VAL can utilize prior data to solve real-world tasks such as drawer opening, grasping, and placing objects in new scenes with only five minutes of online experience in the new scene.

翻訳日:2021-06-02 14:28:38 公開日:2021-06-01

# HERALD: ソーシャル会話におけるユーザ離脱を検出するアノテーション効率の高い手法

HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations ( http://arxiv.org/abs/2106.00162v1 )

ライセンス: Link先を確認

Weixin Liang, Kai-Hui Liang, Zhou Yu

(参考訳) オープンドメインダイアログシステムには、人間に魅力的な会話体験を提供するという、ユーザ中心の目標がある。ユーザエンゲージメントはオープンドメインダイアログシステムを評価する上で最も重要な指標の1つであり、ダイアログポリシー学習のためのリアルタイムフィードバックとしても利用できる。ユーザの離脱を検出する既存の研究は、通常、多くのダイアログサンプルを手作業でラベル付けする必要がある。本稿では、学習データのアノテーションプロセスをノイズ除去問題として捉え直す、アノテーション効率の高いフレームワークであるHERALDを提案する。具体的には、訓練サンプルを手作業でラベル付けする代わりに、まずラベル付けヒューリスティックのセットを用いて訓練サンプルを自動的にラベル付けする。次に、Shapleyアルゴリズムを用いて弱ラベル付きデータのノイズを除去する。最後に、ノイズ除去済みのデータを用いてユーザエンゲージメント検出器を訓練する。実験の結果、HERALDはアノテーションの効率を大幅に向上させ、2つのダイアログコーパスにおいて86%のユーザ離脱検出精度を達成した。

Open-domain dialog systems have a user-centric goal: to provide humans with an engaging conversation experience. User engagement is one of the most important metrics for evaluating open-domain dialog systems, and could also be used as real-time feedback to benefit dialog policy learning. Existing work on detecting user disengagement typically requires hand-labeling many dialog samples. We propose HERALD, an annotation efficient framework that reframes the training data annotation process as a denoising problem. Specifically, instead of manually labeling training samples, we first use a set of labeling heuristics to automatically label training samples. We then denoise the weakly labeled data using the Shapley algorithm. Finally, we use the denoised data to train a user engagement detector. Our experiments show that HERALD improves annotation efficiency significantly and achieves 86% user disengagement detection accuracy in two dialog corpora.
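
The abstract does not say which Shapley variant is used, so the following is only a generic sketch of the idea of valuing training points by their Shapley value and treating low-valued points as likely label noise. It computes exact values by enumerating orderings, with a 1-nearest-neighbour accuracy as the utility. The toy data, the utility, and the zero score for an empty training set are all assumptions.

```python
from itertools import permutations


def one_nn_accuracy(train, validation):
    """Utility of a training subset: 1-nearest-neighbour accuracy on validation."""
    if not train:
        return 0.0  # convention chosen here: an empty training set scores zero
    correct = 0
    for x, label in validation:
        nearest = min(train, key=lambda point: abs(point[0] - x))
        correct += nearest[1] == label
    return correct / len(validation)


def shapley_values(train, validation):
    """Exact Shapley values by enumerating every ordering (exponential cost)."""
    n = len(train)
    values = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        subset = []
        previous = one_nn_accuracy(subset, validation)
        for index in order:
            subset.append(train[index])
            current = one_nn_accuracy(subset, validation)
            # Marginal contribution of this point given the points before it.
            values[index] += current - previous
            previous = current
    return [v / len(orderings) for v in values]


# Toy 1D data (feature, label). The middle point carries a noisy weak label.
train = [(0.0, 0), (0.4, 1), (1.0, 1)]
validation = [(0.1, 0), (0.3, 0), (0.9, 1)]

values = shapley_values(train, validation)
print([round(v, 3) for v in values])
```

In this toy setup the mislabelled middle point receives the lowest value, so a threshold on the values would flag it for removal or relabelling. Exact enumeration is only feasible for a handful of points, so practical data-valuation methods rely on approximations or closed forms.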

翻訳日:2021-06-02 14:27:20 公開日:2021-06-01

# ニューラルマシン翻訳における速度品質最適化時のジェンダーバイアス増幅

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation ( http://arxiv.org/abs/2106.00169v1 )

ライセンス: Link先を確認

Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab

(参考訳) ニューラル機械翻訳(NMT)モデルが速度に最適化され、BLEUを用いた汎用テストセットで評価される場合、バイアスは増幅されるのか? 本稿では、貪欲探索、量子化、平均アテンションネットワーク(AAN)、浅いデコーダモデルなど、Transformerベースのモデルでデコードの高速化によく用いられるアーキテクチャや手法を調査し、性別付き名詞の翻訳への影響を示す。我々は、曖昧さのない唯一の正解が存在する性別付き名詞句に基づく、新しいジェンダーバイアステストセットSimpleGENを構築する。速度最適化を適用してもBLEU全体の劣化は最小限であるが、性別付き名詞の翻訳性能ははるかに速く低下することが観察された。

Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as greedy search, quantization, average attention networks (AANs) and shallow decoder models and show their effect on gendered noun translation. We construct a new gender bias test set, SimpleGEN, based on gendered noun phrases in which there is a single, unambiguous, correct answer. While we find minimal overall BLEU degradation as we apply speed optimizations, we observe that gendered noun translation performance degrades at a much faster rate.

翻訳日:2021-06-02 14:27:03 公開日:2021-06-01

# ジェンダーバイアスを隠した中国語の単語埋め込み:中国語の形容詞を例に

Gender Bias Hidden Behind Chinese Word Embeddings: The Case of Chinese Adjectives ( http://arxiv.org/abs/2106.00181v1 )

ライセンス: Link先を確認

Meichun Jiao, Ziyang Luo

(参考訳) 単語埋め込みにおけるジェンダーバイアスは、近年徐々に活発な研究分野になりつつある。この分野のほとんどの研究は、英語を対象言語として、バイアスの測定と除去の手法を目標としている。本研究は、中国語の形容詞という独自の観点から、静的単語埋め込みにおけるジェンダーバイアスを考察する。異なるモデルで単語表現を訓練することにより、形容詞のベクトルの背後にあるジェンダーバイアスを評価する。生成した結果と人間が採点したデータセットを比較することで、単語埋め込みに符号化されたジェンダーバイアスが人々の態度とどのように異なるかを示す。

Gender bias in word embeddings has gradually become an active research field in recent years. Most studies in this field aim at measurement and debiasing methods with English as the target language. This paper investigates gender bias in static word embeddings from a unique perspective, Chinese adjectives. By training word representations with different models, the gender bias behind the vectors of adjectives is assessed. Through a comparison between the produced results and a human-scored data set, we demonstrate how gender bias encoded in word embeddings differs from people's attitudes.

翻訳日:2021-06-02 14:26:49 公開日:2021-06-01

# 文脈対応ルール注入による形式的スタイル伝達の改善

Improving Formality Style Transfer with Context-Aware Rule Injection ( http://arxiv.org/abs/2106.00210v1 )

ライセンス: Link先を確認

Zonghai Yao and Hong Yu

(参考訳) 大規模正規テキストコーパスで事前学習されたモデルは、主流テキストと言語スタイルが大きく異なるユーザー生成データではうまく機能しないことが多い。ここでは、形式的スタイル転送(FST)の革新的な方法である文脈認識ルール注入(CARI)について述べる。 CARIは、エンドツーエンドのBERTベースのエンコーダとデコーダモデルに複数のルールを注入する。コンテキストに基づいて最適なルールを選択することを学ぶ。内在的評価により,CARIはFSTベンチマークデータセット上での新たな最高性能を達成した。本研究では,複数のツイート感情分析タスクにおいて,CARIが通常の事前学習モデルの性能を大幅に向上できることを示す。

Models pre-trained on large-scale regular text corpora often do not work well for user-generated data where the language styles differ significantly from the mainstream text. Here we present Context-Aware Rule Injection (CARI), an innovative method for formality style transfer (FST). CARI injects multiple rules into an end-to-end BERT-based encoder and decoder model. It learns to select optimal rules based on context. The intrinsic evaluation showed that CARI achieved the new highest performance on the FST benchmark dataset. Our extrinsic evaluation showed that CARI can greatly improve the regular pre-trained models' performance on several tweet sentiment analysis tasks.

翻訳日:2021-06-02 14:26:41 公開日:2021-06-01

# 消費者健康質問要約のための質問認識トランスフォーマーモデル

Question-aware Transformer Models for Consumer Health Question Summarization ( http://arxiv.org/abs/2106.00219v1 )

ライセンス: Link先を確認

Shweta Yadav, Deepak Gupta, Asma Ben Abacha and Dina Demner-Fushman

(参考訳) オンラインでの健康情報の検索は、日々ますます多くの消費者にとって習慣となっており、効率的で信頼性の高い質問応答システムの必要性が高まっている。これらのシステムの成功率に大きく寄与するのは、消費者の質問を完全に理解する能力である。しかし、これらの質問はしばしば必要以上に長く、適切な回答を見つけるのに役立たない周辺情報に言及する。質問の要約は、回答を探す前に、長く複雑な消費者の質問を単純化するための潜在的な解決策の1つである。本稿では、現実の消費者健康質問に対する抽象的要約のタスクを研究する。我々は、医療エンティティの認識を通じて質問の意味的解釈を活用し、情報量の多い要約の生成を可能にする抽象的質問要約モデルを開発する。このために、質問の焦点の認識においてモデルがより良いカバレッジを持つよう促す重要な医療エンティティを特定するため、複数のClozeタスク(すなわち、与えられた文脈で欠落した単語を埋めるタスク)を提案する。さらに、デコーダの入力に質問タイプの情報を加え、質問タイプに基づく要約を生成する。MeQSumベンチマークコーパスで評価したところ、我々のフレームワークは最先端の手法をROUGE-Lで10.2ポイント上回った。また、生成された要約の正確性を評価するために手動評価も行った。

Searching for health information online is becoming customary for more and more consumers every day, which makes the need for efficient and reliable question answering systems more pressing. An important contributor to the success rates of these systems is their ability to fully understand the consumers' questions. However, these questions are frequently longer than needed and mention peripheral information that is not useful in finding relevant answers. Question summarization is one of the potential solutions to simplifying long and complex consumer questions before attempting to find an answer. In this paper, we study the task of abstractive summarization for real-world consumer health questions. We develop an abstractive question summarization model that leverages the semantic interpretation of a question via recognition of medical entities, which enables the generation of informative summaries. Towards this, we propose multiple Cloze tasks (i.e. the task of filling in missing words in a given context) to identify the key medical entities that enforce the model to have better coverage in question-focus recognition. Additionally, we infuse the decoder inputs with question-type information to generate question-type driven summaries. When evaluated on the MeQSum benchmark corpus, our framework outperformed the state-of-the-art method by 10.2 ROUGE-L points. We also conducted a manual evaluation to assess the correctness of the generated summaries.

翻訳日:2021-06-02 14:26:30 公開日:2021-06-01

# 多単語表現機能によるヘイトスピーチ自動検出の改善

Improving Automatic Hate Speech Detection with Multiword Expression Features ( http://arxiv.org/abs/2106.00237v1 )

ライセンス: Link先を確認

Nicolas Zampieri, Irina Illina and Dominique Fohr

(参考訳) ソーシャルメディアでヘイトスピーチを自動的に検出するタスクは、ますます注目を集めている。毎日投稿される膨大な量のコンテンツを考えると、人間によるヘイトスピーチの監視は現実的ではない。本研究では、ヘイトスピーチ自動検出(HSD)のための新しい単語レベルの特徴として、多単語表現(MWE)を提案する。MWEは、慣用的意味と構成的意味を持つ、単語よりも大きい語彙単位である。我々は、深層ニューラルネットワークベースのHSDフレームワークにMWE特徴を統合することを提案する。我々のベースラインHSDシステムはUniversal Sentence Encoder(USE)に依存している。MWE特徴を組み込むために、3分岐の深層ニューラルネットワーク(USE用、MWEカテゴリ用、MWE埋め込み用の各1分岐)を作成する。異なるMWEカテゴリと2種類のMWE埋め込み(word2vecとBERT)を用いて、2つのヘイトスピーチツイートコーパスで実験を行う。実験の結果、MWE特徴を持つ提案HSDシステムは、マクロF1の点でベースラインシステムを有意に上回った。

The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units greater than a word that have idiomatic and compositional meanings. We propose to integrate MWE features in a deep neural network-based HSD framework. Our baseline HSD system relies on Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.

翻訳日:2021-06-02 14:26:14 公開日:2021-06-01

# Volta at SemEval-2021 Task 9: TAPASと転移学習を用いた表による文検証とエビデンス発見

Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning ( http://arxiv.org/abs/2106.00248v1 )

ライセンス: Link先を確認

Devansh Gautam, Kshitij Gupta, Manish Shrivastava

(参考訳) 表は、情報を簡潔に提示するために、様々な種類の文書で広く使われている。表の理解は、言語と表の構造の理解に加え、数値的および論理的推論を必要とする難しい問題である。本稿では、SemEval-2021のタスク9: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACTS) を解くための我々のシステムを提示する。タスクは2つのサブタスクで構成される: (A) 表と文が与えられたとき、表がその文を支持するかどうかを予測すること、および (B) 表内のどのセルがその文を支持または否定する証拠を提供するかを予測すること。TAPAS(BERTのアーキテクチャを拡張して表構造を捉えるモデル)は様々な表理解タスクで最先端の性能を示しているため、両方のサブタスクに対してTAPASを微調整する。サブタスクAでは、転移学習と、ヘッダ行が1つになるよう表を標準化することが、TAPASの性能をどのように向上させるかを評価する。サブタスクBでは、異なる微調整戦略がTAPASの性能をどのように向上させるかを評価する。我々のシステムは、サブタスクAの3値分類でF1スコア67.34、サブタスクAの2値分類で72.89、サブタスクBで62.95を達成した。

Tables are widely used in various kinds of documents to present information concisely. Understanding tables is a challenging problem that requires an understanding of language and table structure, along with numerical and logical reasoning. In this paper, we present our systems to solve Task 9 of SemEval-2021: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACTS). The task consists of two subtasks: (A) Given a table and a statement, predicting whether the table supports the statement and (B) Predicting which cells in the table provide evidence for/against the statement. We fine-tune TAPAS (a model which extends BERT's architecture to capture tabular structure) for both the subtasks as it has shown state-of-the-art performance in various table understanding tasks. In subtask A, we evaluate how transfer learning and standardizing tables to have a single header row improve TAPAS' performance. In subtask B, we evaluate how different fine-tuning strategies can improve TAPAS' performance. Our systems achieve an F1 score of 67.34 in subtask A three-way classification, 72.89 in subtask A two-way classification, and 62.95 in subtask B.

翻訳日:2021-06-02 14:26:00 公開日:2021-06-01

# LenAtten: テキスト要約に有効な長さ制御ユニット

LenAtten: An Effective Length Controlling Unit For Text Summarization ( http://arxiv.org/abs/2106.00316v1 )

ライセンス: Link先を確認

Zhongyi Yu, Zhenghao Wu, Hao Zheng, Zhe XuanYuan, Jefferson Fong, Weifeng Su

(参考訳) 固定長要約は、あらかじめ設定された単語数または文字数の要約を生成することを目的としている。最近の研究の多くは、長さ情報を単語埋め込みとともに再帰的な復号ユニットへの入力として取り込んでおり、長さ制御性と要約品質の間に妥協を生じさせている。本研究では、このトレードオフを解消するために、効果的な長さ制御ユニットであるLength Attention(LenAtten)を提案する。実験結果から、LenAttenは長さ制御性とROUGEスコアの改善をもたらすだけでなく、高い汎化能力も有することが示された。CNN/Daily Mailデータセットにおいて目標長の要約を生成するタスクでは、我々のモデルは、最も性能の高い長さ制御可能な要約器と比べて、長さ制御性において732倍優れている。

Fixed length summarization aims at generating summaries with a preset number of words or characters. Most recent studies incorporate length information with word embeddings as the input to the recurrent decoding unit, causing a compromise between length controllability and summary quality. In this work, we present an effective length controlling unit Length Attention (LenAtten) to break this trade-off. Experimental results show that LenAtten not only brings improvements in length controllability and ROUGE scores but also has great generalization ability. In the task of generating a summary with the target length, our model is 732 times better than the best-performing length controllable summarizer in length controllability on the CNN/Daily Mail dataset.

翻訳日:2021-06-02 14:25:39 公開日:2021-06-01

# 合理化のための分布マッチング

Distribution Matching for Rationalization ( http://arxiv.org/abs/2106.00320v1 )

ライセンス: Link先を確認

Yongfeng Huang, Yujun Chen, Yulun Du, Zhilin Yang

(参考訳) 合理化(rationalization)タスクは、入力テキストの一部を根拠(rationale)として抽出し、テキスト分類タスクにおけるニューラルネットワークの予測を正当化することを目的とする。定義上、根拠は予測に使われる重要なテキスト部分を表すため、元の入力テキストと類似した分類特徴分布を持つべきである。しかし、従来の手法は主に根拠とラベルの間の相互情報量の最大化に焦点を当て、根拠と入力テキストの関係を無視していた。この問題に対処するため、特徴空間と出力空間の両方で根拠と入力テキストの分布を一致させる新しい合理化手法を提案する。実験的に、提案する分布マッチング手法は、従来手法を大きな差で一貫して上回る。データとコードは公開されている。

The task of rationalization aims to extract pieces of input text as rationales to justify neural network predictions on text classification tasks. By definition, rationales represent key text pieces used for prediction and thus should have similar classification feature distribution compared to the original input text. However, previous methods mainly focused on maximizing the mutual information between rationales and labels while neglecting the relationship between rationales and input text. To address this issue, we propose a novel rationalization method that matches the distributions of rationales and input text in both the feature space and output space. Empirically, the proposed distribution matching approach consistently outperforms previous methods by a large margin. Our data and code are available.

翻訳日:2021-06-02 14:25:25 公開日:2021-06-01

# 中国語単語の内部構造に関する詳細な研究

An In-depth Study on Internal Structure of Chinese Words ( http://arxiv.org/abs/2106.00334v1 )

ライセンス: Link先を確認

Chen Gong, Saihao Huang, Houquan Zhou, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

(参考訳) 英語の文字とは異なり、漢字は豊かで特定の意味を持つ。通常、単語の意味は何らかの形でその構成文字から派生することができる。構文解析に関するいくつかの以前の研究は、文字レベルの情報を活用するために浅い単語内部構造を注釈付けすることを提案した。本研究は,中国語単語の深い内部構造を,構文的関係を識別するための11のラベルを持つ依存木としてモデル化することを提案する。まず,新たにコンパイルされたアノテーションガイドラインに基づいて,中国ペンツリーバンクの30万語以上の多字語からなる単語内部構造木バンク(WIST)を手作業で注釈する。品質を保証するため、各単語は独立して2つの注釈により注釈され、不整合は第3上級注釈者によって処理される。第2に,中国語の単語形成に関する知見を明らかにするために,WISTに関する詳細な,興味深い分析を行った。第3に,新しいタスクとして単語内構造解析を提案し,競合依存構文解析器を用いてベンチマーク実験を行う。最後に,単語内部構造を符号化する2つの簡単な方法を提案する。

Unlike English letters, Chinese characters have rich and specific meanings. Usually, the meaning of a word can be derived from its constituent characters in some way. Several previous works on syntactic parsing propose to annotate shallow word-internal structures for better utilizing character-level information. This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships. First, based on newly compiled annotation guidelines, we manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from Chinese Penn Treebank. To guarantee quality, each word is independently annotated by two annotators and inconsistencies are handled by a third senior annotator. Second, we present detailed and interesting analysis on WIST to reveal insights on Chinese word formation. Third, we propose word-internal structure parsing as a new task, and conduct benchmark experiments using a competitive dependency parser. Finally, we present two simple ways to encode word-internal structures, leading to promising gains on the sentence-level syntactic parsing task.

翻訳日:2021-06-02 14:25:14 公開日:2021-06-01

# 対話型事前学習

Dialogue-oriented Pre-training ( http://arxiv.org/abs/2106.00420v1 )

ライセンス: Link先を確認

Yi Xu, Hai Zhao

(参考訳) 事前訓練された言語モデル(PrLM)は、様々な対話に関連したタスクを含む幅広い下流タスクの強化に有効であることが示されている。しかし、PrLMは通常、共通言語モデル(LM)訓練目的の一般的なプレーンテキストで訓練されるため、そのようなトレーニング設定の制限により、対話排他的特徴を十分に捉えられないため、特定の対話タスクとLMタスクのギャップを埋める必要がすぐに生じる。本稿では,対話指向事前学習のための膨大な対話データを収集することができないため,一般的な平文における対話特徴をシミュレートする3つの手法を提案する。提案手法は, 話者認識, 連続性, 一貫性などの対話的特徴を学習しながら, 汎用のPrLMを生成でき, 詳細なタスクを特定できない既存の学習方法と異なる。その結果、Dialog-PrLMは3つの公開マルチターン対話データセットに基づいて微調整され、通常のPrLMよりも大幅に一貫した改善を実現する。

Pre-trained language models (PrLMs) have been shown to be powerful in enhancing a broad range of downstream tasks, including various dialogue-related ones. However, PrLMs are usually trained on general plain text with common language model (LM) training objectives, which cannot sufficiently capture dialogue-exclusive features due to the limitations of such a training setting, so there is an immediate need to fill the gap between a specific dialogue task and the LM task. As it is unlikely that huge amounts of dialogue data can be collected for dialogue-oriented pre-training, in this paper we propose three strategies to simulate conversation features on general plain text. Our proposed method differs from existing post-training methods in that it may yield a general-purpose PrLM and does not individualize to any detailed task, while keeping the capability of learning dialogue-related features including speaker awareness, continuity and consistency. The resulting Dialog-PrLM is fine-tuned on three public multi-turn dialogue datasets and helps achieve significant and consistent improvement over the plain PrLMs.

翻訳日:2021-06-02 14:24:56 公開日:2021-06-01

# SemEval-2021 Task 1: Lexical Complexity Prediction

SemEval-2021 Task 1: Lexical Complexity Prediction ( http://arxiv.org/abs/2106.00473v1 )

ライセンス: Link先を確認

Matthew Shardlow, Richard Evans, Gustavo Henrique Paetzold, Marcos Zampieri

(参考訳) 本稿では,SemEval-2021 Task 1: Lexical Complexity Predictionの結果と主な知見を示す。参加者にCompLex Corpus(Shardlow et al 2020)の拡張版を提供した。CompLexは英語のマルチドメインコーパスで、単語と複数語表現(MWE)が5段階のリッカート尺度を用いてその複雑さについて注釈付けされている。SemEval-2021 Task 1には2つのサブタスクがあり、サブタスク1は単一の単語、サブタスク2はMWEを対象とした。このコンペには合計198チームが参加し、うち54チームがテストデータの公式実行をサブタスク1に、37チームがサブタスク2に提出した。

This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 teams submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.

翻訳日:2021-06-02 14:24:37 公開日:2021-06-01

# DoT:テーブル付きNLPタスクのための効率的なダブルトランス

DoT: An efficient Double Transformer for NLP tasks with tables ( http://arxiv.org/abs/2106.00479v1 )

ライセンス: Link先を確認

Syrine Krichene, Thomas Müller and Julian Martin Eisenschlos

(参考訳) 半構造化テーブルを用いた自然言語処理(NLP)タスクにおいて、トランスフォーマーベースのアプローチは最先端の精度を達成している。これらのモデルアーキテクチャは一般的に深く、特に長い入力に対してトレーニングや推論が遅くなる。高い精度を維持しつつ効率を向上させるために、問題を2つのサブタスクに分解する新しいアーキテクチャ、DoT(double transformer model)を提案する。すなわち、上位K個のトークンを選択する浅いプルーニングトランスフォーマーと、そのK個のトークンを入力とする深いタスク固有トランスフォーマーである。さらに,タスク固有の注意機構を変更し,プルーニングスコアを組み込む。2つのトランスフォーマーはタスク固有の損失を最適化することで共同で訓練される。含意と質問応答を含む3つのベンチマークで実験を行う。わずかな精度低下と引き換えに、DoTはトレーニング時間と推論時間を少なくとも50%改善することを示す。また,プルーニングトランスフォーマーが関連するトークンを効果的に選択し、エンドツーエンドモデルが低速なベースラインモデルと同程度の精度を維持できることを示す。最後に、プルーニングを分析し、タスクモデルへの影響について知見を与える。

Transformer-based approaches have been successfully used to obtain state-of-the-art accuracy on natural language processing (NLP) tasks with semi-structured tables. These model architectures are typically deep, resulting in slow training and inference, especially for long inputs. To improve efficiency while maintaining a high accuracy, we propose a new architecture, DoT, a double transformer model, that decomposes the problem into two sub-tasks: A shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes as input those K tokens. Additionally, we modify the task-specific attention to incorporate the pruning scores. The two transformers are jointly trained by optimizing the task-specific loss. We run experiments on three benchmarks, including entailment and question-answering. We show that for a small drop of accuracy, DoT improves training and inference time by at least 50%. We also show that the pruning transformer effectively selects relevant tokens enabling the end-to-end model to maintain similar accuracy as slower baseline models. Finally, we analyse the pruning and give some insight into its impact on the task model.

翻訳日:2021-06-02 14:24:27 公開日:2021-06-01
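
The top-K selection step described in the DoT abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `select_top_k_tokens`, the tie-breaking rule, and the example scores are assumptions, and in the paper the scores come from a shallow pruning transformer rather than a hand-written list.

```python
from typing import List, Sequence, Tuple


def select_top_k_tokens(
    tokens: Sequence[str], pruning_scores: Sequence[float], k: int
) -> Tuple[List[str], List[float]]:
    """Keep the k highest-scoring tokens, preserving their original order.

    Hypothetical helper: the scores stand in for the output of a pruning model.
    """
    if len(tokens) != len(pruning_scores):
        raise ValueError("tokens and pruning_scores must have the same length")
    k = max(0, min(k, len(tokens)))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-pruning_scores[i], i))
    # Restore the original token order for the positions that survive.
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept], [pruning_scores[i] for i in kept]


if __name__ == "__main__":
    toks = ["[CLS]", "how", "many", "rows", "|", "a", "|", "b"]
    scores = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6, 0.1, 0.3]
    print(select_top_k_tokens(toks, scores, k=4))
```

The kept scores are returned alongside the tokens because the abstract states that the task-specific attention is modified to incorporate the pruning scores; how they are incorporated is not specified there and is left out of this sketch.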

# 定量的対話コヒーレンス評価に向けて

Towards Quantifiable Dialogue Coherence Evaluation ( http://arxiv.org/abs/2106.00507v1 )

ライセンス: Link先を確認

Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

(参考訳) 自動対話コヒーレンス評価は注目度が高まっており,有望な対話システムの開発に不可欠である。しかし、既存の指標には2つの大きな制限がある: (a) それらは主に単純化された2段階の設定(コヒーレント対非コヒーレント)で訓練されているのに対し、人間は「定量化可能」と呼ばれるリッカート型の多段階コヒーレンススコアを与える; (b) トレーニング中に人間の指導が欠如しているため、予測されたコヒーレンススコアは実際の人間の評価基準に適合しない。そこで本研究では,実際の人間の評価基準を反映できる定量化可能な対話コヒーレンス指標の学習を目的とした新しい枠組みであるQuantiDCEを提案する。具体的には、QuantiDCEには、マルチレベルランキング(MLR)事前トレーニングと知識蒸留(KD)微調整という2つのトレーニング段階が含まれている。MLR事前学習では、モデルがコヒーレンスの程度に関する粗い判断を学習できるようにするために、新しいMLR損失を提案する。そして、KD微調整では、事前訓練されたモデルをさらに微調整し、ごく少数の人間の注釈付きデータだけで実際の人間の評価基準を学習する。限られた微調整データでも一般化性を確保するため、事前学習段階で学んだ知識を保持する新しいKD正則化を導入する。実験結果から,QuantiDCEによりトレーニングされたモデルは,他の最先端の指標に比べて,人間の判断とより強い相関を示すことが示された。

Automatic dialogue coherence evaluation has attracted increasing attention and is crucial for developing promising dialogue systems. However, existing metrics have two major limitations: (a) they are mostly trained in a simplified two-level setting (coherent vs. incoherent), while humans give Likert-type multi-level coherence scores, dubbed as "quantifiable"; (b) their predicted coherence scores cannot align with the actual human rating standards due to the absence of human guidance during training. To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards. Specifically, QuantiDCE includes two training stages, Multi-Level Ranking (MLR) pre-training and Knowledge Distillation (KD) fine-tuning. During MLR pre-training, a new MLR loss is proposed for enabling the model to learn the coarse judgement of coherence degrees. Then, during KD fine-tuning, the pretrained model is further finetuned to learn the actual human rating standards with only very few human-annotated data. To advocate the generalizability even with limited fine-tuning data, a novel KD regularization is introduced to retain the knowledge learned at the pre-training stage. Experimental results show that the model trained by QuantiDCE presents stronger correlations with human judgements than the other state-of-the-art metrics.

翻訳日:2021-06-02 14:24:12 公開日:2021-06-01

# 学生のパフォーマンス予測のためのグラフベース演習・知識学習ネットワーク

Graph-based Exercise- and Knowledge-Aware Learning Network for Student Performance Prediction ( http://arxiv.org/abs/2106.00263v1 )

ライセンス: Link先を確認

Mengfan Liu, Pengyang Shao, Kun Zhang

(参考訳) 知的指導システム(ITS)では、生徒のパフォーマンスを予測することは、生徒の知識レベルを把握し、パーソナライズされた指導戦略を提供するための基本的なタスクである。研究者はこの課題に多くの努力をしてきた。学習された知識の熟達度に応じて生徒のスコアを予測するために教育心理学的手法を利用するか、生徒や演習の潜在因子を表すために協調フィルタリング(CF)モデルを活用するかのいずれかである。しかし、これらの手法のほとんどは、演習固有の特性(例えば演習の題材)を無視するか、生徒、演習、知識概念の間の高次相互作用を十分に探求できない。そこで本稿では,生徒のスコアを正確に予測するためのグラフベースの演習・知識認識学習ネットワーク(Graph-based Exercise- and Knowledge-Aware Learning Network)を提案する。具体的には,演習と知識概念の二重の効果をモデル化するために,演習と知識概念に対する生徒の熟達度をそれぞれ学習する。そして,高次相互作用をモデル化するために,予測プロセスにグラフ畳み込み手法を適用する。2つの実世界のデータセットに対する大規模な実験により、提案したGraph-EKLNの有効性が示された。

Predicting student performance is a fundamental task in Intelligent Tutoring Systems (ITSs), by which we can learn about students' knowledge level and provide personalized teaching strategies for them. Researchers have made plenty of efforts on this task. They either leverage educational psychology methods to predict students' scores according to the learned knowledge proficiency, or make full use of Collaborative Filtering (CF) models to represent latent factors of students and exercises. However, most of these methods either neglect the exercise-specific characteristics (e.g., exercise materials), or cannot fully explore the high-order interactions between students, exercises, as well as knowledge concepts. To this end, we propose a Graph-based Exercise- and Knowledge-Aware Learning Network for accurate student score prediction. Specifically, we learn students' mastery of exercises and knowledge concepts respectively to model the two-fold effects of exercises and knowledge concepts. Then, to model the high-order interactions, we apply graph convolution techniques in the prediction process. Extensive experiments on two real-world datasets prove the effectiveness of our proposed Graph-EKLN.

翻訳日:2021-06-02 14:23:25 公開日:2021-06-01

# 歴史からの探索と未来への理由:時間知識グラフの2段階推論

Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs ( http://arxiv.org/abs/2106.00327v1 )

ライセンス: Link先を確認

Zixuan Li, Xiaolong Jin, Saiping Guan, Wei Li, Jiafeng Guo, Yuanzhuo Wang and Xueqi Cheng

(参考訳) 時間的知識グラフ (TKG) は様々な分野で開発・利用されている。将来の潜在的な事実(イベント)を予測するTKG上の推論は、既存のモデルに大きな課題をもたらす。予測タスクに直面するとき、人間は通常、記憶の中の有用な歴史的情報(すなわち手がかり)を検索し、将来について慎重に推論する。そこで本研究では,この機構に触発されて,手がかり探索(Clue Searching)と時間推論(Temporal Reasoning)の2段階で将来の事実を予測するCluSTeRを提案する。具体的には、手がかり探索段階において、CluSTeRは強化学習(RL)を介してビーム探索ポリシーを学び、歴史的事実から複数の手がかりを導き出す。時間推論の段階では、グラフ畳み込みネットワークに基づくシーケンス法を採用し、手がかりから答えを導き出す。4つのデータセットの実験は、最先端の手法と比較してCluSTeRのかなりの利点を示している。さらに、CluSTeRが発見した手がかりは、結果に解釈可能性を与える。

Temporal Knowledge Graphs (TKGs) have been developed and used in many different areas. Reasoning on TKGs that predicts potential facts (events) in the future brings great challenges to existing models. When facing a prediction task, human beings usually search useful historical information (i.e., clues) in their memories and then reason for future meticulously. Inspired by this mechanism, we propose CluSTeR to predict future facts in a two-stage manner, Clue Searching and Temporal Reasoning, accordingly. Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts. At the temporal reasoning stage, it adopts a graph convolution network based sequence method to deduce answers from clues. Experiments on four datasets demonstrate the substantial advantages of CluSTeR compared with the state-of-the-art methods. Moreover, the clues found by CluSTeR further provide interpretability for the results.

翻訳日:2021-06-02 14:23:07 公開日:2021-06-01

# 典型性を有するファジィDLのKLM特性について

On the KLM properties of a fuzzy DL with Typicality ( http://arxiv.org/abs/2106.00390v1 )

ライセンス: Link先を確認

Laura Giordano

(参考訳) 本稿では,典型性を扱うファジィ論理の特性について考察する。近年,深層ニューラルネットワークを条件付き知識ベースとみなすことで多層パーセプトロンのファジィ多重選好(multipreference)セマンティクスを定義するために,ファジィ論理を典型性演算子で拡張する手法が提案されている。本稿では,その特性について考察する。まず、典型性を持つファジィALCの単調拡張(ALCFTと呼ぶ)を考え、この論理に対する選好的帰結関係のKLM特性を再定式化する。ほとんどの性質は、再定式化の仕方と考慮するファジィ結合関数に応じて満たされる。次に,以前に導入された条件付き知識ベースのコヒーレントモデルの概念を一般化する、重み付き知識ベースの忠実モデル(faithful model)の概念を導入することにより,ALCFTを閉包構成で強化し,その性質について検討する。

The paper investigates the properties of a fuzzy logic of typicality. The extension of fuzzy logic with a typicality operator was proposed in recent work to define a fuzzy multipreference semantics for Multilayer Perceptrons, by regarding the deep neural network as a conditional knowledge base. In this paper, we study its properties. First, a monotonic extension of a fuzzy ALC with typicality is considered (called ALCFT) and a reformulation of the KLM properties of a preferential consequence relation for this logic is devised. Most of the properties are satisfied, depending on the reformulation and on the fuzzy combination functions considered. We then strengthen ALCFT with a closure construction by introducing a notion of faithful model of a weighted knowledge base, which generalizes the notion of coherent model of a conditional knowledge base previously introduced, and we study its properties.

翻訳日:2021-06-02 14:22:51 公開日:2021-06-01

# マルコフ報酬過程にインスパイアされた値伝播に基づく時空間補間

Value propagation-based spatio-temporal interpolation inspired by Markov reward processes ( http://arxiv.org/abs/2106.00538v1 )

ライセンス: Link先を確認

Laurens Arp, Mitra Baratchi, Holger Hoos

(参考訳) リモートセンシング、生態学、気象学といった様々な分野の現実世界のアプリケーションでデータ欠損が一般的な問題であることを考えると、空間的・時空間的データの欠損補間は極めて有用となりうる。既存の空間補間法(特にガウス過程と空間自己回帰モデル)は、(a)局所的な空間的相互作用と大域的な空間的相互作用のモデル化の間のトレードオフ、(b)2点間に可能な経路が1つしかないという仮定、(c)点間の中間位置が均質であるという仮定に悩まされる傾向がある。これらの問題に対処するため,マルコフ報酬過程 (MRP) に着想を得た値伝播法を空間補間法として提案し,(i) 静的割引 (SD-MRP) と (ii) データ駆動重み予測 (WP-MRP) の2つの変種を導入する。これらの補間変種はどちらも局所的に動作し、再帰を通じてシステム全体の大域的な空間的関係を暗黙的に考慮する。提案手法は, 補間された格子セルの平均絶対誤差と実行時間を7つの一般的なベースラインと比較して評価した。分析では,2つの合成データセットと2つの実世界のデータセットについて、合計44の実験条件にわたる詳細な実験を行った。実験結果は実世界データにおけるMRP補間の競争上の優位性を示しており、全実験条件下での実世界データにおけるSD-MRPの平均性能は他のすべての手法よりも有意に高く順位付けされ、WP-MRPがそれに続いた。合成データでは、十分に情報的な特徴が与えられればWP-MRPがSD-MRPを上回りうることを示す。さらに,本手法がベースラインに対して数値的に有意な優位性を持たない場合においても,対象格子の空間構造をベースラインよりもよく保存することがわかった。

Given the common problem of missing data in real-world applications from various fields, such as remote sensing, ecology and meteorology, the interpolation of missing spatial and spatio-temporal data can be of tremendous value. Existing methods for spatial interpolation, most notably Gaussian processes and spatial autoregressive models, tend to suffer from (a) a trade-off between modelling local or global spatial interaction, (b) the assumption there is only one possible path between two points, and (c) the assumption of homogeneity of intermediate locations between points. Addressing these issues, we propose a value propagation method, inspired by Markov reward processes (MRPs), as a spatial interpolation method, and introduce two variants thereof: (i) a static discount (SD-MRP) and (ii) a data-driven weight prediction (WP-MRP) variant. Both these interpolation variants operate locally, while implicitly accounting for global spatial relationships in the entire system through recursion. We evaluated our proposed methods by comparing the mean absolute errors and running times of interpolated grid cells to those of 7 common baselines. Our analysis involved detailed experiments on two synthetic and two real-world datasets over 44 total experimental conditions. Experimental results show the competitive advantage of MRP interpolation on real-world data, as the average performance of SD-MRP on real-world data under all experimental conditions was ranked significantly higher than that of all other methods, followed by WP-MRP. On synthetic data, we show that WP-MRP can perform better than SD-MRP given sufficiently informative features. We further found that, even in cases where our methods had no significant advantage over baselines numerically, our methods preserved the spatial structure of the target grid better than the baselines.

翻訳日:2021-06-02 14:22:36 公開日:2021-06-01
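
To make the idea of locally operating, recursive value propagation concrete, here is a minimal sketch. It is an assumption-laden illustration rather than the paper's SD-MRP: the abstract does not give the update rule, so the discounted neighbour averaging, the 4-neighbourhood, the function name `propagate_values`, and the defaults for `gamma` and `iterations` are all choices made for this example.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def propagate_values(
    grid: Grid, gamma: float = 0.9, iterations: int = 100
) -> List[List[float]]:
    """Fill missing cells (None) by repeated discounted neighbour averaging.

    Assumed update (not taken from the paper): each missing cell becomes
    gamma times the mean of its 4-connected neighbours; observed cells stay
    fixed and act as the sources of value.
    """
    rows, cols = len(grid), len(grid[0])
    values = [[cell if cell is not None else 0.0 for cell in row] for row in grid]
    for _ in range(iterations):
        new_values = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] is not None:
                    continue  # observed cell: keep its measured value
                neighbours = [
                    values[nr][nc]
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= nr < rows and 0 <= nc < cols
                ]
                if neighbours:
                    new_values[r][c] = gamma * sum(neighbours) / len(neighbours)
        values = new_values
    return values


if __name__ == "__main__":
    print(propagate_values([[1.0, None, 1.0]], gamma=0.5))
```

Each update only looks at adjacent cells, yet after enough iterations a missing cell is influenced by observations many steps away, which is the sense in which recursion can account for global structure. A data-driven variant would replace the uniform neighbour weights with predicted ones.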

# コンピュータビジョンと無人航空機技術の公衆検査における統合的利用:異物デブリ画像収集

Integrative Use of Computer Vision and Unmanned Aircraft Technologies in Public Inspection: Foreign Object Debris Image Collection ( http://arxiv.org/abs/2106.00161v1 )

ライセンス: Link先を確認

Travis J. E. Munyer, Daniel Brinkman, Chenyu Huang, Xin Zhong

(参考訳) 無人航空機システム(UAS)は公共サービス事業者やスマートシティにとって重要な資源となっている。本研究の目的は,コンピュータビジョンとUAS技術を統合して公衆検査を自動化することにある。本研究の最初のケーススタディとして,軽量自動検出の可能性を評価するために,共通異物デブリ(fod)のデータセットを開発した。本稿では,このデータセットの根拠と作成について述べる。我々の研究の今後のイテレーションには、実験的な実装を分析する技術的な詳細が含まれます。地元の空港では、UASとポータブルカメラを使用して、このデータセットの初期バージョンに含まれるデータを収集する。 FODのビデオを収集した後、個々のフレームに分割され、数千の画像として保存された。これらのフレームは、標準のコンピュータビジョンフォーマットに従って注釈付けされ、フォルダ構造に格納される。データセットアノテーションは、将来のアプリケーションに適合するように抽象化できるカスタムツールを使用して検証される。提案したデータの実用性を示す有名なYou Only Look Onceアルゴリズムを用いて,初期検出モデルの作成に成功した。最後に、このデータセットまたは他の公共サービスのための類似のメソッドを利用する可能性のあるいくつかのシナリオが提示される。

Unmanned Aircraft Systems (UAS) have become an important resource for public service providers and smart cities. The purpose of this study is to expand this research area by integrating computer vision and UAS technology to automate public inspection. As an initial case study for this work, a dataset of common foreign object debris (FOD) is developed to assess the potential of light-weight automated detection. This paper presents the rationale and creation of this dataset. Future iterations of our work will include further technical details analyzing experimental implementation. At a local airport, UAS and portable cameras are used to collect the data contained in the initial version of this dataset. After collecting these videos of FOD, they were split into individual frames and stored as several thousand images. These frames are then annotated following standard computer vision format and stored in a folder-structure that reflects our creation method. The dataset annotations are validated using a custom tool that could be abstracted to fit future applications. Initial detection models were successfully created using the famous You Only Look Once algorithm, which indicates the practicality of the proposed data. Finally, several potential scenarios that could utilize either this dataset or similar methods for other public service are presented.

翻訳日:2021-06-02 14:21:42 公開日:2021-06-01

# 半教師付き物体検出のための擬似ラベル再考

Rethinking Pseudo Labels for Semi-Supervised Object Detection ( http://arxiv.org/abs/2106.00168v1 )

ライセンス: Link先を確認

Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

(参考訳) 半教師付き物体検出(SSOD)の最近の進歩は、擬似ラベルを教師信号として生成する、画像分類タスク向けの一貫性に基づく擬似ラベル法に大きく依存している。しかしながら、擬似ラベルを使用する場合、位置推定精度と増幅されたクラス不均衡に対する考慮が欠如しており、どちらも検出タスクには不可欠である。本稿では,物体検出に適した確実性を考慮した擬似ラベルを導入し,得られた擬似ラベルの分類品質と位置推定品質を効果的に推定する。これは、従来の位置推定を分類タスクに変換し、その後に精緻化を行うことで達成される。分類品質と位置推定品質のスコアに基づいて,擬似ラベルの生成に用いる閾値を動的に調整し,カテゴリごとに損失関数を再重み付けすることで,クラス不均衡問題を緩和する。大規模な実験の結果,提案手法は最先端のSSOD性能をCOCOで1-2% AP、PASCAL VOCで4-6% AP向上させた。アノテーションが限られた設定では,COCOのラベル付きデータの1-10%のみを用いて,教師付きベースラインを最大10% AP改善する。

Recent advances in semi-supervised object detection (SSOD) are largely driven by consistency-based pseudo-labeling methods for image classification tasks, producing pseudo labels as supervisory signals. However, when using pseudo labels, there is a lack of consideration in localization precision and amplified class imbalance, both of which are critical for detection tasks. In this paper, we introduce certainty-aware pseudo labels tailored for object detection, which can effectively estimate the classification and localization quality of derived pseudo labels. This is achieved by converting conventional localization as a classification task followed by refinement. Conditioned on classification and localization quality scores, we dynamically adjust the thresholds used to generate pseudo labels and reweight loss functions for each category to alleviate the class imbalance problem. Extensive experiments demonstrate that our method improves state-of-the-art SSOD performance by 1-2% and 4-6% AP on COCO and PASCAL VOC, respectively. In the limited-annotation regime, our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.

翻訳日:2021-06-02 14:21:24 公開日:2021-06-01

# 言語駆動イメージスタイル転送

Language-Driven Image Style Transfer ( http://arxiv.org/abs/2106.00178v1 )

ライセンス: Link先を確認

Tsu-Jui Fu, Xin Eric Wang, William Yang Wang

(参考訳) 期待できる結果を得たにもかかわらず、事前にスタイル画像を用意する必要があるスタイル転送は、創造性とアクセシビリティの欠如をもたらす可能性がある。一方、人間の指示に従うことは、視覚効果アプリケーションの制御性を大幅に向上させる芸術的スタイル転送を行う最も自然な方法である。テキストの指示に従ってコンテンツ画像のスタイルを操作する、言語駆動型画像スタイル転送(LDIST)という新たなタスクを導入する。そこで我々は,スタイル指示から視覚的意味を抽出し,パッチ単位のスタイル判別器でLDISTを実現するコントラスト言語ビジュアルアーティスト(CLVA)を提案する。判別器は、言語とスタイル画像または転送結果のパッチとの相関を考慮し、スタイル指示を共同で埋め込む。CLVAはさらに、コンテンツ画像とスタイル指示のコントラスト対を比較して、転送結果間の相互関連性を改善する。同じコンテンツ画像から転送された結果は、一貫したコンテンツ構造を保存できる。さらに、同様の視覚的意味を含むスタイル指示からは類似のスタイルパターンが現れるべきである。実験の結果, CLVAは有効であり, LDISTにおいて優れた転送結果が得られることがわかった。

Despite having promising results, style transfer, which requires preparing style images in advance, may result in lack of creativity and accessibility. Following human instruction, on the other hand, is the most natural way to perform artistic style transfer that can significantly improve controllability for visual effect applications. We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text. We propose contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplish LDIST by the patch-wise style discriminator. The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions. CLVA further compares contrastive pairs of content image and style instruction to improve the mutual relativeness between transfer results. The transferred results from the same content image can preserve consistent content structures. Besides, they should present analogous style patterns from style instructions that contain similar visual semantics. The experiments show that our CLVA is effective and achieves superb transferred results on LDIST.

翻訳日:2021-06-02 14:21:04 公開日:2021-06-01

# 少数ショットセマンティックセグメンテーションのためのアンチエイリアシング意味再構成

Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation ( http://arxiv.org/abs/2106.00184v1 )

ライセンス: Link先を確認

Binghao Liu and Yao Ding and Jianbin Jiao and Xiangyang Ji and Qixiang Ye

(参考訳) 少数ショットセマンティックセグメンテーションでは、十分なトレーニングデータを持つベースクラスで学習した特徴を活用して、少数の例しかない新規クラスを表現することで、有望な進展が得られている。しかし、この特徴共有機構は、新規クラス同士が類似した意味概念の構成を持つ場合、必然的にクラス間の意味的エイリアシングを引き起こす。本稿では,少数ショットセグメンテーションを意味的再構成問題として再定式化し,ベースクラスの特徴を、新規クラスの再構成のためのクラスレベルの意味空間を張る一連の基底ベクトルに変換する。対照損失を導入することにより,クラス間の意味的エイリアシングを最小化しつつ,基底ベクトルの直交性を最大化する。再構成された表現空間内では、クエリ特徴をサポートベクトルに射影して正確な意味的活性化を行うことにより、他のクラスからの干渉をさらに抑制する。提案手法はアンチエイリアシング意味再構成 (ASR) と呼ばれ, 少数ショット学習問題に対する体系的かつ解釈可能な解決策を提供する。PASCAL VOCとMS COCOデータセットの大規模な実験により、ASRは先行研究と比べて強い結果を達成することが示された。

Encouraging progress in few-shot semantic segmentation has been made by leveraging features learned upon base classes with sufficient training data to represent novel classes with few-shot examples. However, this feature sharing mechanism inevitably causes semantic aliasing between novel classes when they have similar compositions of semantic concepts. In this paper, we reformulate few-shot segmentation as a semantic reconstruction problem, and convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction. By introducing contrastive loss, we maximize the orthogonality of basis vectors while minimizing semantic aliasing between classes. Within the reconstructed representation space, we further suppress interference from other classes by projecting query features to the support vector for precise semantic activation. Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems. Extensive experiments on PASCAL VOC and MS COCO datasets show that ASR achieves strong results compared with the prior works.

翻訳日:2021-06-02 14:20:43 公開日:2021-06-01

# 不均衡半教師学習における再サンプリングの再考

Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning ( http://arxiv.org/abs/2106.00209v1 )

ライセンス: Link先を確認

Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille

(参考訳) Semi-Supervised Learning (SSL)はラベル付きデータが不足している場合にラベルなしデータを利用する強力な能力を示している。しかし、ほとんどのSSLアルゴリズムは、クラス分布がトレーニングセットとテストセットの両方で均衡しているという仮定の下で機能する。本研究では,現実の状況をよりよく反映しているにもかかわらずこれまで限られた注目しか集めてこなかった、クラス不均衡データに対するSSLの問題について考察する。特に、表現と分類器の訓練を分離し、分類器を含むネットワーク全体をトレーニングする場合と特徴抽出器のみを微調整する場合について、異なるデータ再サンプリング手法の効果を体系的に検討する。データ再サンプリングは、特にラベルなしデータの少数クラスにおいて擬似ラベルの精度を高めるため、優れた分類器を学ぶ上で非常に重要であることがわかった。興味深いことに、特徴抽出器をトレーニングする際には正確な擬似ラベルは役に立たず、むしろ逆にデータ再サンプリングが特徴抽出器のトレーニングを損なうことがわかった。この発見は、間違った擬似ラベルがSSLのモデルパフォーマンスを常に損なうという一般的な直観に反している。これらの結果を踏まえて,単一のデータ再サンプリング戦略を用いる現在のパラダイムを再考し,クラス不均衡データに対するSSLのための単純かつ高効率なBi-Sampling (BiS) 戦略を開発することを提案する。BiSは特徴抽出器と分類器をトレーニングするための2つの異なる再サンプリング戦略を実装し、この分離されたトレーニングをエンドツーエンドフレームワークに統合する。

Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce. However, most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets. In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations but has only received limited attention so far. In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques when training the whole network including a classifier as well as fine-tuning the feature extractor only. We find that data re-sampling is of critical importance to learn a good classifier as it increases the accuracy of the pseudo-labels, in particular for the minority classes in the unlabeled data. Interestingly, we find that accurate pseudo-labels do not help when training the feature extractor, rather contrariwise, data re-sampling harms the training of the feature extractor. This finding is against the general intuition that wrong pseudo-labels always harm the model performance in SSL. Based on these findings, we suggest to re-think the current paradigm of having a single data re-sampling strategy and develop a simple yet highly effective Bi-Sampling (BiS) strategy for SSL on class-imbalanced data. BiS implements two different re-sampling strategies for training the feature extractor and the classifier and integrates this decoupled training into an end-to-end framework... Code will be released at https://github.com/TACJu/Bi-Sampling.

翻訳日:2021-06-02 14:20:25 公開日:2021-06-01

# EV-VGCNN:イベントベースオブジェクト分類のためのVoxel Graph CNN

EV-VGCNN: A Voxel Graph CNN for Event-based Object Classification ( http://arxiv.org/abs/2106.00216v1 )

ライセンス: Link先を確認

Yongjian Deng, Hao Chen, Huiying Chen, Youfu Li

(参考訳) イベントカメラはスパースな輝度変化を報告し、ポータブルデバイス上での視覚知覚と理解において、低消費電力、高ダイナミックレンジ、高応答速度という顕著な利点を持つ。イベントベースの学習手法は、従来型の2D学習アルゴリズムを適用するためにイベントを高密度なフレームベースの表現に統合することで、近年オブジェクト認識において大きな成功を収めている。しかし、これらの手法は、スパースから高密度への変換の際に多くの冗長な情報を導入し、重く大容量のモデルを必要とするため、実際の応用におけるイベントカメラの可能性を制限する。イベントベース分類モデルの精度とモデル複雑性のバランスという中核的な問題に対処するため、(1) イベントデータのスパース性をより活用するグラフ表現を構築し、分類のための軽量なエンドツーエンドのグラフニューラルネットワーク(EV-VGCNN)を設計する。(2) 従来の点単位の手法ではなくボクセル単位の頂点を用いて、より多くの点の情報を取り込む。(3) マルチスケール特徴関係層(MFRL)を導入し、近傍との距離に応じて各頂点から意味的手がかりと動きの手がかりを適応的に抽出する。総合的な実験により,本手法は20倍近いパラメータ削減(わずか0.84Mパラメータ)を達成しつつ,最先端の分類精度を向上することが示された。

Event cameras report sparse intensity changes and hold noticeable advantages of low power consumption, high dynamic range, and high response speed for visual perception and understanding on portable devices. Event-based learning methods have recently achieved massive success on object recognition by integrating events into dense frame-based representations to apply traditional 2D learning algorithms. However, these approaches introduce much redundant information during the sparse-to-dense conversion and necessitate models with heavy-weight and large capacities, limiting the potential of event cameras on real-life applications. To address the core problem of balancing accuracy and model complexity for event-based classification models, we (1) construct graph representations for event data to utilize their sparsity nature better and design a lightweight end-to-end graph neural network (EV-VGCNN) for classification; (2) use voxel-wise vertices rather than traditional point-wise methods to incorporate the information from more points; (3) introduce a multi-scale feature relational layer (MFRL) to extract semantic and motion cues from each vertex adaptively concerning its distances to neighbors. Comprehensive experiments show that our approach advances state-of-the-art classification accuracy while achieving nearly 20 times parameter reduction (merely 0.84M parameters).

翻訳日:2021-06-02 14:20:00 公開日:2021-06-01

# 自己学習に基づくトランスダクティブゼロショット学習のためのハードネスサンプリング

Hardness Sampling for Self-Training Based Transductive Zero-Shot Learning ( http://arxiv.org/abs/2106.00264v1 )

ライセンス: Link先を確認

Liu Bo, Qiulei Dong, Zhanyi Hu

(参考訳) 既存のZSL作業におけるドメインシフト問題を緩和するトランスダクティブゼロショット学習(T-ZSL)が近年注目を集めている。しかし、T-ZSLのオープンな問題として、未確認クラスのサンプルを効果的にトレーニングに利用する方法が残っている。そこで本研究では,zsl法で見られる不均一な予測現象に基づいて,訓練過程における難易度が異なる非検出クラスサンプルの役割を経験的に解析し,3つの観察結果を得た。そこで本研究では,与えられた非知覚型データセットから多様かつ硬質なサンプルのサブセットを選択するための2つのハードネスサンプリング手法を提案する。第1はモデル予測のクラスレベル周波数に基づいてサンプルを識別し、第2は探索された事前推定アルゴリズムにより推定された近似クラスを介してクラス周波数を正規化することにより前者を強化する。最後に, 任意の誘導型ZSL法をシームレスに組み込むことができ, ハードネスサンプリング手法で選択した未確認のサンプルを反復的に学習できるSTHSという, T-ZSL用自己学習フレームワークを設計した。我々はSTHSフレームワークに2つの典型的なZSL手法を導入し、得られたT-ZSL法が3つの公開ベンチマークで多くの最先端手法より優れていることを示す。また,既存の一般化ZSL (T-GZSL) 法を学習するためには,未確認のデータセットを別々に使用し,GZSL タスクには厳密でない点に留意する。したがって、より厳密なT-GZSLデータ設定を提案し、提案したSTHSフレームワークをT-GZSLに導入することにより、この設定の競争ベースラインを確立する。

Transductive zero-shot learning (T-ZSL) which could alleviate the domain shift problem in existing ZSL works, has received much attention recently. However, an open problem in T-ZSL: how to effectively make use of unseen-class samples for training, still remains. Addressing this problem, we first empirically analyze the roles of unseen-class samples with different degrees of hardness in the training process based on the uneven prediction phenomenon found in many ZSL methods, resulting in three observations. Then, we propose two hardness sampling approaches for selecting a subset of diverse and hard samples from a given unseen-class dataset according to these observations. The first one identifies the samples based on the class-level frequency of the model predictions while the second enhances the former by normalizing the class frequency via an approximate class prior estimated by an explored prior estimation algorithm. Finally, we design a new Self-Training framework with Hardness Sampling for T-ZSL, called STHS, where an arbitrary inductive ZSL method could be seamlessly embedded and it is iteratively trained with unseen-class samples selected by the hardness sampling approach. We introduce two typical ZSL methods into the STHS framework and extensive experiments demonstrate that the derived T-ZSL methods outperform many state-of-the-art methods on three public benchmarks. Besides, we note that the unseen-class dataset is separately used for training in some existing transductive generalized ZSL (T-GZSL) methods, which is not strict for a GZSL task. Hence, we suggest a more strict T-GZSL data setting and establish a competitive baseline on this setting by introducing the proposed STHS framework to T-GZSL.

翻訳日:2021-06-02 14:19:41 公開日:2021-06-01
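
The first hardness sampling approach in the STHS abstract above ranks unseen-class samples by the class-level frequency of the model predictions. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes that samples predicted into rarely predicted classes count as hard, and the function name `select_hard_samples` and the score `1 - relative frequency` are hypothetical choices made for illustration.

```python
from collections import Counter


def select_hard_samples(predictions, num_select):
    """Return indices of the samples whose predicted class is rarest.

    Assumption (not from the abstract): hardness is approximated by
    1 - relative frequency of the predicted class over the unseen set.
    """
    freq = Counter(predictions)
    total = len(predictions)
    # Score every sample; ties are broken by the original index.
    scored = [(1.0 - freq[p] / total, i) for i, p in enumerate(predictions)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in scored[:num_select]]


# Predicted unseen-class labels for six samples (toy data).
preds = ["cat", "cat", "cat", "dog", "zebra", "cat"]
hard = select_hard_samples(preds, 2)
```

In this toy run the two samples predicted as the rare classes are picked first. The second approach in the abstract would additionally divide the frequency by an estimated class prior, which is omitted here.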

# 深層特徴再構成を用いた半教師付き視差推定

Semi-Supervised Disparity Estimation with Deep Feature Reconstruction ( http://arxiv.org/abs/2106.00318v1 )

ライセンス: Link先を確認

Julia Guerrero-Viu, Sergio Izquierdo, Philipp Schröppel and Thomas Brox

(参考訳) 視差推定におけるディープラーニングの成功にもかかわらず、ドメイン汎化のギャップは依然として問題である。本稿では、ラベル付き合成データでの教師付きトレーニングとラベルなし実データでの自己教師付きトレーニングを共同で行うことで、DispNetを実世界ドメインにうまく適応させる半教師付きパイプラインを提案する。さらに、広く使用されている測光損失の限界を考慮し、視差推定のための有望な教師信号としての深層特徴再構成の影響を分析する。

Despite the success of deep learning in disparity estimation, the domain generalization gap remains an issue. We propose a semi-supervised pipeline that successfully adapts DispNet to a real-world domain by joint supervised training on labeled synthetic data and self-supervised training on unlabeled real data. Furthermore, accounting for the limitations of the widely-used photometric loss, we analyze the impact of deep feature reconstruction as a promising supervisory signal for disparity estimation.

翻訳日:2021-06-02 14:19:06 公開日:2021-06-01

# 点雲の遠隔登録のための一貫した二流ネットワーク

Consistent Two-Flow Network for Tele-Registration of Point Clouds ( http://arxiv.org/abs/2106.00329v1 )

ライセンス: Link先を確認

Zihao Yan, Zimu Yi, Ruizhen Hu, Niloy J. Mitra, Daniel Cohen-Or, Hui Huang

(参考訳) 部分観測の厳密な登録は、様々な応用分野における根本的な問題である。コンピュータグラフィックスでは、走査装置によって生成される2つの部分点雲間の登録に特に注意が払われている。最先端の登録技術は、2つのポイントクラウド間のオーバーラップ領域が小さく、スキャンペア間のオーバーラップがなければ、完全に失敗する。本稿では,この問題を緩和し,任意のポーズで提示された点群間の登録を可能にし,重なりがほとんどあるいは全くない,遠隔登録と呼ばれる設定を学習ベースで行う手法を提案する。本手法は,一群の形状の先行を学習し,部分的な形状を完遂できる新しいニューラルネットワーク設計に基づいている。キーとなるアイデアは、登録と完了タスクを互いに強化する方法で組み合わせることです。特に,登録ネットワークと完了ネットワークを,登録・完了フローと完全登録フローの2つの結合フローを用いて同時に訓練し,両フローが一貫した結果を生み出すように促す。個別のフローと比較すると、この2フロートレーニングは堅牢で信頼性の高い遠隔登録につながり、したがって登録されたスキャンを完了したより良いポイントクラウド予測につながる。また、ニューラルネットワークの各コンポーネントは、完了と登録の両方において最先端の手法よりも優れています。我々はさらに,いくつかのアブレーション研究によりネットワークを解析し,その性能を合成と実世界の両方の多数の部分点雲上で実証した。

Rigid registration of partial observations is a fundamental problem in various applied fields. In computer graphics, special attention has been given to the registration between two partial point clouds generated by scanning devices. State-of-the-art registration techniques still struggle when the overlap region between the two point clouds is small, and fail completely if there is no overlap between the scan pairs. In this paper, we present a learning-based technique that alleviates this problem and allows registration between point clouds that are presented in arbitrary poses and have little or even no overlap, a setting that has been referred to as tele-registration. Our technique is based on a novel neural network design that learns a prior of a class of shapes and can complete a partial shape. The key idea is to combine the registration and completion tasks in a way that they reinforce each other. In particular, we simultaneously train the registration network and the completion network using two coupled flows, one that registers and then completes, and one that completes and then registers, and encourage the two flows to produce a consistent result. We show that, compared with each separate flow, this two-flow training leads to robust and reliable tele-registration, and hence to a better point cloud prediction that completes the registered scans. It is also worth mentioning that each of the components in our neural network outperforms state-of-the-art methods in both completion and registration. We further analyze our network with several ablation studies and demonstrate its performance on a large number of partial point clouds, both synthetic and real-world, that have only small or no overlap.

翻訳日:2021-06-02 14:18:59 公開日:2021-06-01

# Transformer-Encoder Deep Features を用いた効率的なクロスモーダル視覚・テキスト検索に向けて

Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features ( http://arxiv.org/abs/2106.00358v1 )

ライセンス: Link先を確認

Nicola Messina, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet

(参考訳) クロスモーダル検索は、クエリや検索対象を異なるモダリティに関連付けることでユーザエクスペリエンスを向上させるため、現代の検索エンジンにおいて重要な機能である。本稿では,ある文(画像検索)の関連画像や,ある画像(画像検索)の関連文を効率的に見つけることを目的とした画像文検索タスクに着目した。コンピュータビジョン文献は、注意と自己注意機構を備えたディープニューラルネットワークを用いた画像文マッチングタスクにおける最良の結果を報告する。データセット全体の逐次スキャンを行い,検索タスクのマッチング性能を評価する。この方法は画像や字幕の数が増えるほどスケールが良くない。本研究では,画像テキストマッチングのための最先端のディープラーニングアーキテクチャから抽出する,スパース化された深層マルチモーダル特徴を生成するための,さまざまな前処理手法について検討する。我々の主な目的は、複雑なマルチモーダル記述の効率的な索引付けのための経路を敷設することである。我々は最近導入されたTERNアーキテクチャを画像文特徴抽出器として利用する。画像全体と文を記述した固定サイズ1024-dベクターと、2つのモーダル(画像領域と文語)の様々な構成要素を記述する可変長1024-dベクターを作成するように設計されている。これらのベクトルはすべて、TERN設計によって同じ共通空間に置かれるように強制される。本実験では,本手法の予備実験を行い,本研究の方向性についてさらなる実験を行うことを提案する。

Cross-modal retrieval is an important functionality in modern search engines, as it improves the user experience by allowing queries and retrieved objects to pertain to different modalities. In this paper, we focus on the image-sentence retrieval task, where the objective is to efficiently find relevant images for a given sentence (image-retrieval) or the relevant sentences for a given image (sentence-retrieval). Computer vision literature reports the best results on the image-sentence matching task using deep neural networks equipped with attention and self-attention mechanisms. They evaluate the matching performance on the retrieval task by performing sequential scans of the whole dataset. This method does not scale well with an increasing number of images or captions. In this work, we explore different preprocessing techniques to produce sparsified deep multi-modal features, extracting them from state-of-the-art deep-learning architectures for image-text matching. Our main objective is to pave the way for efficient indexing of complex multi-modal descriptions. We use the recently introduced TERN architecture as an image-sentence feature extractor. It is designed for producing fixed-size 1024-d vectors describing whole images and sentences, as well as variable-length sets of 1024-d vectors describing the various building components of the two modalities (image regions and sentence words respectively). All these vectors are enforced by the TERN design to lie in the same common space. Our experiments show interesting preliminary results on the explored methods and suggest further experimentation in this important research direction.

翻訳日:2021-06-02 14:18:35 公開日:2021-06-01

# ネットワーク活性化の自然統計と知識蒸留への示唆

Natural Statistics of Network Activations and Implications for Knowledge Distillation ( http://arxiv.org/abs/2106.00368v1 )

ライセンス: Link先を確認

Michael Rotman and Lior Wolf

(参考訳) 自然画像統計の研究と同様に、ディープニューラルネットワークの各層における活性化の自然統計について検討する。本稿で示すように、これらの統計は画像統計と同様、べき乗則に従う。また、このべき乗則の指数が深さとともに線形に増加することを、解析的にも経験的にも示す。この発見の直接的な帰結として、知識蒸留(KD)を行う手法を提案する。従来のKD手法は教師ネットワークのロジットを考慮するが、近年の手法は活性化マップを考慮することで性能を大きく向上させている。しかし、これらは画像の比較に適したメトリクスを用いている。本稿では、中間活性化マップのスペクトル特性に基づく2つの追加の損失項を提案する。提案手法は、複数の画像認識KDベンチマークで最先端の結果を達成する。

In a manner analogous to the study of natural image statistics, we study the natural statistics of deep neural network activations at various layers. As we show, these statistics, similar to image statistics, follow a power law. We also show, both analytically and empirically, that the exponent of this power law increases with depth at a linear rate. As a direct implication of our discoveries, we present a method for performing Knowledge Distillation (KD). While classical KD methods consider the logits of the teacher network, more recent methods obtain a leap in performance by considering the activation maps. This, however, uses metrics that are suitable for comparing images. We propose to employ two additional loss terms that are based on the spectral properties of the intermediate activation maps. The proposed method obtains state-of-the-art results on multiple image recognition KD benchmarks.

翻訳日:2021-06-02 14:18:11 公開日:2021-06-01
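
The abstract above states that activation statistics follow a power law whose exponent grows with depth. As a rough illustration of how such an exponent could be measured, the sketch below fits a straight line in log-log space. It assumes a positive, rank-ordered spectrum of the form `value ~ rank**(-alpha)`; the abstract does not say which statistic the authors fit, and `fit_power_law_exponent` is a hypothetical helper.

```python
import math


def fit_power_law_exponent(spectrum):
    """Estimate alpha in value ~ rank**(-alpha) by least squares in log-log space.

    `spectrum` must contain at least two strictly positive values,
    ordered by rank (rank 1 first).
    """
    xs = [math.log(rank) for rank in range(1, len(spectrum) + 1)]
    ys = [math.log(value) for value in spectrum]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is -alpha, so flip the sign.
    return -cov / var


# Synthetic spectrum that decays exactly as rank**(-2).
synthetic = [rank ** -2.0 for rank in range(1, 51)]
alpha = fit_power_law_exponent(synthetic)
```

Running the same fit on the spectra of activation maps from successive layers would give one exponent per layer, which could then be plotted against depth.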

# DLA-Net: 大規模建物ファサードポイントクラウドのセマンティックセグメンテーションのためのデュアルローカルアテンション特徴の学習

DLA-Net: Learning Dual Local Attention Features for Semantic Segmentation of Large-Scale Building Facade Point Clouds ( http://arxiv.org/abs/2106.00376v1 )

ライセンス: Link先を確認

Yanfei Su, Weiquan Liu, Zhimin Yuan, Ming Cheng, Zhihong Zhang, Xuelun Shen, Cheng Wang

(参考訳) 建物ファサードのセマンティックセグメンテーションは、都市建物の再建や損傷評価など、様々な用途において重要である。細粒度のビルディングファサードに関連する3dポイントクラウドデータセットが不足しているため、最初の大規模ビルディングファサードポイントクラウドベンチマークデータセットをセマンティックセグメンテーションのために構築する。既存のセマンティックセグメンテーションの方法は、点雲の局所的な近傍情報を完全にはマイニングできない。本稿では,dlaと呼ばれる2つの局所的注意特徴を学習する学習可能な注意モジュールを提案する。提案したDLAモジュールは、自己注意ブロックと注意プールブロックの2つのブロックから構成されており、どちらも拡張された位置符号化ブロックを埋め込んでいる。 DLAモジュールは、ポイントクラウドセグメンテーションのために、様々なネットワークアーキテクチャに簡単に組み込むことができ、自然界では、DLA-Netと呼ばれるエンコーダデコーダアーキテクチャを持つ新しい3Dセグメンテーションネットワークとなる。構築したファサードデータセットの大規模な実験結果から,提案したDLA-Netは,セマンティックセグメンテーションの最先端手法よりも優れた性能を示すことが示された。

Semantic segmentation of building facades is significant in various applications, such as urban building reconstruction and damage assessment. As there is a lack of 3D point cloud datasets related to fine-grained building facades, we construct the first large-scale building facade point cloud benchmark dataset for semantic segmentation. The existing methods of semantic segmentation cannot fully mine the local neighborhood information of point clouds. To address this problem, we propose a learnable attention module that learns Dual Local Attention features, called DLA in this paper. The proposed DLA module consists of two blocks, a self-attention block and an attentive pooling block, both of which embed an enhanced position encoding block. The DLA module can be easily embedded into various network architectures for point cloud segmentation, naturally resulting in a new 3D semantic segmentation network with an encoder-decoder architecture, called DLA-Net in this work. Extensive experimental results on our constructed building facade dataset demonstrate that the proposed DLA-Net achieves better performance than the state-of-the-art methods for semantic segmentation.

翻訳日:2021-06-02 14:17:59 公開日:2021-06-01

# PanoDR: 屋内シーンのための球面パノラマDiminished Reality

PanoDR: Spherical Panorama Diminished Reality for Indoor Scenes ( http://arxiv.org/abs/2106.00446v1 )

ライセンス: Link先を確認

V. Gkitsas, V. Sterzentsenko, N. Zioulis, G. Albanis, D. Zarpalas

(参考訳) 屋内スキャンを民主化する商用$360^\circ$カメラの普及により、室内空間の再設計などの新しい応用への関心が高まっている。Diminished Reality (DR) は、シーン内の既存のオブジェクトを取り除くというこうした応用の要件を満たすものであり、これは本質的に反事実的なインペインティングタスクに相当する。近年のデータ駆動型インペインティングは現実的なサンプルの生成において大きな進歩を示しているが、現実に対応した構造を持つ結果を生成するようには制約されていない。屋内の(再)計画アプリケーションにおいて「現実」を保つためには、シーンの構造の保存が重要である。構造を考慮した反事実的インペインティングを実現するため、まず屋内シーンの構造を予測し、それを用いて同一シーンの空の(背景のみの)表現の再構成を導くモデルを提案する。DR用に修正したStructured3Dデータセット上で他の最先端手法と学習・比較を行い、定量的指標と定性的結果の両方で優れた結果を示す。さらに興味深いことに、本手法ははるかに速い収束を示す。コードとモデルは https://vcl3d.github.io/PanoDR/ で入手できる。

The rising availability of commercial $360^\circ$ cameras, which democratize indoor scanning, has increased interest in novel applications such as interior space re-design. Diminished Reality (DR) fulfills the requirement of such applications to remove existing objects in the scene, which essentially translates to a counterfactual inpainting task. While recent advances in data-driven inpainting have shown significant progress in generating realistic samples, they are not constrained to produce results with reality-mapped structures. To preserve the 'reality' in indoor (re-)planning applications, the preservation of the scene's structure is crucial. To ensure structure-aware counterfactual inpainting, we propose a model that initially predicts the structure of an indoor scene and then uses it to guide the reconstruction of an empty -- background only -- representation of the same scene. We train and compare against other state-of-the-art methods on a version of the Structured3D dataset modified for DR, showing superior results in both quantitative metrics and qualitative results. More interestingly, our approach exhibits a much faster convergence rate. Code and models are available at https://vcl3d.github.io/PanoDR/ .

翻訳日:2021-06-02 14:17:37 公開日:2021-06-01

# プロトタイプを用いたセマンティックセグメンテーションの異常検出

Detecting Anomalies in Semantic Segmentation with Prototypes ( http://arxiv.org/abs/2106.00472v1 )

ライセンス: Link先を確認

Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo

(参考訳) 従来のセマンティックセグメンテーションメソッドは、テスト時にトレーニングセットに存在するクラスのみを認識することができる。これは重要な制限であり、特にインテリジェントな自律システムに搭載されたセマンティックセグメンテーションアルゴリズムは、現実的な設定でデプロイされる。システムがトレーニング時に何つのクラスを見たかに関わらず、予期せぬ未知のオブジェクトがテスト時に現れることは避けられない。このような異常を特定することの失敗は、現実世界に配備された場合、そのようなセグメンテーションモデルを備えた自律エージェントの不正確で危険な行動を引き起こす可能性がある。異常セグメンテーション技術の現状は生成モデルを用いており、訓練中に見えないパターンを再構築することができない。しかし、これらのモデルのトレーニングは高価であり、生成されたアーティファクトは誤った異常を生み出す可能性がある。本稿では,異なる経路をとり,プロトタイプ学習による異常セグメンテーションに対処することを提案する。我々の直感では、異常画素はモデルで知られている全てのクラスプロトタイプと異なるものである。学習データから,コサイン類似度に基づく分類器を用いて,軽量にクラスプロトタイプを抽出する。また,StreetHazards実験の結果,計算オーバーヘッドの低減にもかかわらず,従来の手法に比べて大きな差があることがわかった。コードはhttps://github.com/DarioFontanel/PAnSで入手できる。

Traditional semantic segmentation methods can recognize at test time only the classes that are present in the training set. This is a significant limitation, especially for semantic segmentation algorithms mounted on intelligent autonomous systems that are deployed in realistic settings. Regardless of how many classes the system has seen at training time, it is inevitable that unexpected, unknown objects will appear at test time. Failure to identify such anomalies may lead to incorrect, even dangerous behaviors of an autonomous agent equipped with such a segmentation model when deployed in the real world. The current state of the art in anomaly segmentation uses generative models, exploiting their inability to reconstruct patterns unseen during training. However, training these models is expensive, and their generated artifacts may create false anomalies. In this paper we take a different route and propose to address anomaly segmentation through prototype learning. Our intuition is that anomalous pixels are those that are dissimilar to all class prototypes known by the model. We extract class prototypes from the training data in a lightweight manner using a cosine similarity-based classifier. Experiments on StreetHazards show that our approach achieves the new state of the art, with a significant margin over previous works, despite the reduced computational overhead. Code is available at https://github.com/DarioFontanel/PAnS.

翻訳日:2021-06-02 14:17:17 公開日:2021-06-01
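
The intuition in the abstract above, that an anomalous pixel is dissimilar to every known class prototype, can be sketched with plain cosine similarity. This is a minimal illustration under stated assumptions, not the PAnS code: the score `1 - max similarity` is an assumed scoring rule, the prototypes are given by hand rather than learned, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(feature, prototypes):
    """Assumed rule: 1 minus the best similarity to any class prototype.

    A score near 0 means the pixel matches some known class;
    larger scores mean it is dissimilar to all of them.
    """
    return 1.0 - max(cosine_similarity(feature, p) for p in prototypes)


# Two hand-written class prototypes in a 2-d feature space (toy data).
prototypes = [[1.0, 0.0], [0.0, 1.0]]
known_pixel = anomaly_score([2.0, 0.0], prototypes)
unknown_pixel = anomaly_score([-1.0, -1.0], prototypes)
```

Because cosine similarity ignores vector length, the first pixel scores as fully known even though its feature is twice as long as the prototype.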

# 赤外線小ターゲット検出のためのDense Nested Attention Network

Dense Nested Attention Network for Infrared Small Target Detection ( http://arxiv.org/abs/2106.00487v1 )

ライセンス: Link先を確認

Boyang Li, Chao Xiao, Longguang Wang, Yingqian Wang, Zaiping Lin, Miao Li, Wei An, Yulan Guo

(参考訳) 単一フレーム赤外線小ターゲット(SIRST)検出は、小さなターゲットを乱雑な背景から分離することを目的としている。ディープラーニングの進歩により、cnnベースの手法は強力なモデリング能力により、汎用オブジェクト検出に有望な結果をもたらした。しかし、既存のCNNベースの手法は、ネットワーク内のプール層が深い層内のターゲットを失う可能性があるため、赤外線小ターゲットに対して直接適用することはできない。この問題に対処するため,本論文では,高密度なネスト型注意ネットワーク(DNANet)を提案する。具体的には,高レベルかつ低レベルの特徴間のプログレッシブな相互作用を実現するために,高密度ネスト型インタラクティブモジュール(DNIM)を設計する。 DNIMにおける繰り返しの相互作用により、深い層内の赤外線小ターゲットを維持することができる。 DNIMに基づいて,多レベル特徴を適応的に拡張するカスケードチャネルと空間アテンションモジュール(CSAM)を提案する。我々のDNANetでは、小さなターゲットのコンテキスト情報をうまく組み込んで、繰り返し融合と拡張によって完全に活用することができる。さらに,赤外線小目標データセット(nudt-sirst)を開発し,総合的な性能評価を行うための評価指標を提案する。公開と自己開発の両方の実験により,本手法の有効性が示された。本手法は他の最先端手法と比較して,検出確率 (Pd), 偽アラーム率 (Fa), 結合の交叉率 (IoU) の点で, 性能が向上する。

Single-frame infrared small target (SIRST) detection aims at separating small targets from cluttered backgrounds. With the advances of deep learning, CNN-based methods have yielded promising results in generic object detection due to their powerful modeling capability. However, existing CNN-based methods cannot be directly applied to infrared small targets, since pooling layers in their networks could lead to the loss of targets in deep layers. To handle this problem, we propose a dense nested attention network (DNANet) in this paper. Specifically, we design a dense nested interactive module (DNIM) to achieve progressive interaction among high-level and low-level features. With the repeated interaction in DNIM, infrared small targets in deep layers can be maintained. Based on DNIM, we further propose a cascaded channel and spatial attention module (CSAM) to adaptively enhance multi-level features. With our DNANet, contextual information of small targets can be well incorporated and fully exploited by repeated fusion and enhancement. Moreover, we develop an infrared small target dataset (namely, NUDT-SIRST) and propose a set of evaluation metrics to conduct comprehensive performance evaluation. Experiments on both public and our self-developed datasets demonstrate the effectiveness of our method. Compared to other state-of-the-art methods, our method achieves better performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).

翻訳日:2021-06-02 14:16:58 公開日:2021-06-01

# 視覚前訓練作業における自己の多様性と不変性を探る

Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task ( http://arxiv.org/abs/2106.00537v1 )

ライセンス: Link先を確認

Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

(参考訳) 近年,自己指導型学習手法は視覚前訓練において顕著な成功を収めている。各画像の異なる拡張ビューをまとめたり、あるいは他の新しいメカニズムを取り入れることで、教師なしの知識を習得し、事前学習モデルの転送性能を大幅に向上させることができる。しかし、これらの作品は表現の崩壊問題を避けることはできない。つまり、それらは限られた領域のみに焦点を当てたり、画像内の全く異なる領域で抽出された特徴がほぼ同じである。一般に、この問題は、事前学習モデルが画像内の複数の粒度の情報を十分に記述できないため、転送性能の上限がさらに制限される。この問題を軽減するため,本稿では,e-diyにおける多様性と不変性を検討するという,単純かつ効果的なメカニズムを紹介する。 E-DIYは、各拡張ビュー内の最も異なる領域を移動させることで、抽出された領域レベルの特徴の多様性を維持できる。同じ画像の異なる拡張ビューから最も類似した領域を抽出することで、E-DIYは領域レベルの機能の堅牢性を確保することができる。上記の多様性と不変性探索機構から、E-DIYは各画像内の多粒度視覚情報を最大限に抽出する。例えば、COCO上の強力なベースラインであるBYOLに比べて2.1%改善され、R50-C4バックボーンと1X学習スケジュールを微調整したMask R-CNNが実現された。

Recently, self-supervised learning methods have achieved remarkable success in the visual pre-training task. By simply pulling the different augmented views of each image together, or through other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models. However, these works still cannot avoid the representation collapse problem, i.e., they only focus on limited regions, or the features extracted from totally different regions inside each image are nearly the same. Generally, this problem prevents the pre-training models from sufficiently describing the multi-grained information inside images, which further limits the upper bound of their transfer performance. To alleviate this issue, this paper introduces a simple but effective mechanism, called Exploring the Diversity and Invariance in Yourself (E-DIY). By simply pushing the most different regions inside each augmented view away, E-DIY can preserve the diversity of extracted region-level features. By pulling the most similar regions from different augmented views of the same image together, E-DIY can ensure the robustness of region-level features. Benefiting from the above diversity and invariance exploring mechanism, E-DIY maximally extracts the multi-grained visual information inside each image. Extensive experiments on downstream tasks demonstrate the superiority of our proposed approach, e.g., there is a 2.1% improvement over the strong baseline BYOL on COCO when fine-tuning Mask R-CNN with the R50-C4 backbone and 1X learning schedule.

翻訳日:2021-06-02 14:16:34 公開日:2021-06-01

# 変圧器ネットワークと拡張情報を用いた都市シナリオにおける車両軌跡予測

Predicting Vehicles Trajectories in Urban Scenarios with Transformer Networks and Augmented Information ( http://arxiv.org/abs/2106.00559v1 )

ライセンス: Link先を確認

A. Quintanar, D. Fernández-Llorca, I. Parra, R. Izquierdo, M. A. Sotelo

(参考訳) 道路利用者の行動を理解することは,軌道予測システムの開発に不可欠である。この文脈では、最新の進歩は繰り返しの構造に焦点を合わせ、現場に関わるエージェント間の社会的相互作用を確立している。近年では、変圧器ネットワークに基づく歩行者軌道予測や位置情報を用いた簡易な構造も導入されている。それぞれのエージェントの軌道の個々のモデリングを、複雑な相互作用項を使わずに別々に行うことができる。提案モデルでは, 都市シナリオにおける車両軌道予測問題に最大5秒間, 付加データ(位置と方向)を付加することにより, これらの単純な構造を利用する。さらに、最近のデータセット(inD, rounD, highD, InterAction)を使用して、ハイウェイ、交差点、ラウンドアバウトを含むさまざまなタイプのシナリオ間で、クロスパフォーマンス分析を行う。我々のモデルは最先端の成果を達成し、異なるタイプの都市環境に柔軟で適応可能であることを証明している。

Understanding the behavior of road users is of vital importance for the development of trajectory prediction systems. In this context, the latest advances have focused on recurrent structures, establishing the social interaction between the agents involved in the scene. More recently, simpler structures have also been introduced for predicting pedestrian trajectories, based on Transformer Networks, and using positional information. They allow the individual modelling of each agent's trajectory separately without any complex interaction terms. Our model exploits these simple structures by adding augmented data (position and heading), and adapting their use to the problem of vehicle trajectory prediction in urban scenarios in prediction horizons up to 5 seconds. In addition, a cross-performance analysis is performed between different types of scenarios, including highways, intersections and roundabouts, using recent datasets (inD, rounD, highD and INTERACTION). Our model achieves state-of-the-art results and proves to be flexible and adaptable to different types of urban contexts.

翻訳日:2021-06-02 14:16:10 公開日:2021-06-01

# メタプロトタイプを用いた事前知識強化Few-Shotセグメンテーション

Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes ( http://arxiv.org/abs/2106.00572v1 )

ライセンス: Link先を確認

Jian-Wei Zhang, Lei Lv, Yawei Luo, Hao-Zhe Feng, Yi Yang, Wei Chen

(参考訳) Few-shot segmentation~(FSS)のパフォーマンスは、エピソードトレーニングとクラスワイドプロトタイプの導入によって広範囲に向上している。しかし,FSS問題は,(1)モデルがタスク非関連情報に気を散らすこと,(2)単一プロトタイプの表現能力に制限があること,(3)クラス関連プロトタイプは基本クラスの事前の知識を無視すること,の3つの制約により,依然として困難なままである。これらの制約に対処するために,メタプロトタイプを用いた事前拡張ネットワークを提案する。 pre-enhanced networkは、機能抽出における support and query (pseudo-) ラベルを活用し、モデルが前景オブジェクトのタスク関連の特徴に焦点を合わせ、教師付き知識の欠如により多くのノイズを抑制する。さらに,階層的特徴をエンコードし,クラスに依存しない構造情報を学習するために,複数のメタプロトタイプを導入する。階層的特徴は決定境界を強調表示し,ハードピクセルに着目し,基本クラスから学習した構造情報は新規クラスの事前知識として扱われる。実験の結果, PASCAL-$5^i$およびCOCO-$20^i$では平均IoUスコアが60.79%, 41.16%となり, 5ショット設定では3.49%, 5.64%向上した。さらに,上記2つのベンチマークにおいて,5ショット精度を3.73%,10.32%向上させた。このメソッドのソースコードはhttps://github.com/jarvis73/pempで入手できます。

Few-shot segmentation (FSS) performance has been extensively promoted by introducing episodic training and class-wise prototypes. However, the FSS problem remains challenging due to three limitations: (1) Models are distracted by task-unrelated information; (2) The representation ability of a single prototype is limited; (3) Class-related prototypes ignore the prior knowledge of base classes. We propose the Prior-Enhanced network with Meta-Prototypes to tackle these limitations. The prior-enhanced network leverages the support and query (pseudo-) labels in feature extraction, which guides the model to focus on the task-related features of the foreground objects and to suppress much of the noise caused by the lack of supervised knowledge. Moreover, we introduce multiple meta-prototypes to encode hierarchical features and learn class-agnostic structural information. The hierarchical features help the model highlight the decision boundary and focus on hard pixels, and the structural information learned from base classes is treated as the prior knowledge for novel classes. Experiments show that our method achieves mean-IoU scores of 60.79% and 41.16% on PASCAL-$5^i$ and COCO-$20^i$, outperforming the state-of-the-art method by 3.49% and 5.64% in the 5-shot setting. Moreover, compared with the 1-shot results, our method improves 5-shot accuracy by 3.73% and 10.32% on the above two benchmarks. The source code of our method is available at https://github.com/Jarvis73/PEMP.

翻訳日:2021-06-02 14:15:53 公開日:2021-06-01

# 半教師付きセマンティックセグメンテーションのためのロバスト相互学習

Robust Mutual Learning for Semi-supervised Semantic Segmentation ( http://arxiv.org/abs/2106.00609v1 )

ライセンス: Link先を確認

Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Fang Wen

(参考訳) 最近の半教師付き学習(SSL)法は一般に擬似ラベリングに基づいている。 SSL性能は擬似ラベルの品質に大きく影響されているため,疑似監視における雑音を効果的に抑制するための相互学習が提案されている。本研究では,従来のアプローチを2つの側面で改善する頑健な相互学習を提案する。まず、バニラ相互学習者は、モデルが均質な知識を学ぶために収束するかもしれない結合の問題に苦しむ。この問題は,教師同士の直接の交流がないように,教師同士の相互監督を生み出すことによって解決する。また,モデル結合の緩和には強固なデータ拡張,モデルノイズ,異種ネットワークアーキテクチャが不可欠であることを示す。第2に,相互学習はネットワークの擬似ラベル改良能力の活用に失敗していることに気付く。そこで,本研究では,内部知識を活用し,相互指導前の擬似ラベルを明示的に修正する自己認識を導入する。このような自己修正と相互指導によって、学習を通して擬似ラベルの精度が向上する。提案した頑健な相互学習は、低データ状態におけるセマンティックセグメンテーションにおける最先端のパフォーマンスを示す。

Recent semi-supervised learning (SSL) methods are commonly based on pseudo labeling. Since SSL performance is greatly influenced by the quality of pseudo labels, mutual learning has been proposed to effectively suppress the noise in the pseudo supervision. In this work, we propose robust mutual learning that improves the prior approach in two aspects. First, the vanilla mutual learners suffer from the coupling issue that models may converge to learn homogeneous knowledge. We resolve this issue by introducing mean teachers to generate mutual supervision so that there is no direct interaction between the two students. We also show that strong data augmentations, model noise and heterogeneous network architectures are essential to alleviate the model coupling. Second, we notice that mutual learning fails to leverage the network's own ability for pseudo label refinement. Therefore, we introduce self-rectification that leverages the internal knowledge and explicitly rectifies the pseudo labels before the mutual teaching. Such self-rectification and mutual teaching collaboratively improve the pseudo label accuracy throughout the learning. The proposed robust mutual learning demonstrates state-of-the-art performance on semantic segmentation in the low-data regime.

翻訳日:2021-06-02 14:15:26 公開日:2021-06-01
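
The abstract above decouples the two students by letting mean teachers produce the mutual supervision. A mean teacher is commonly maintained as an exponential moving average (EMA) of the student weights, and the sketch below shows that update together with a simple argmax pseudo-labeling step. The momentum value, the flat weight lists, and both function names are assumptions for illustration; the paper's self-rectification step is not modeled.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight a small step toward the student weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def pseudo_labels(probabilities):
    """Pick the most likely class index for every pixel or sample."""
    return [max(range(len(p)), key=lambda c: p[c]) for p in probabilities]


# Teacher A follows student A; its predictions would then supervise student B.
teacher_a = [0.0, 1.0]
student_a = [1.0, 1.0]
teacher_a = ema_update(teacher_a, student_a, momentum=0.9)

# Class probabilities that teacher A might output for two pixels (toy data).
labels_for_student_b = pseudo_labels([[0.2, 0.8], [0.6, 0.4]])
```

Since each student only ever sees labels from the other branch's slowly moving teacher, the two students never exchange gradients or predictions directly.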

# 独自の対応をブートストラップする

Bootstrap Your Own Correspondences ( http://arxiv.org/abs/2106.00677v1 )

ライセンス: Link先を確認

Mohamed El Banani, Justin Johnson

(参考訳) 幾何学的特徴抽出はポイントクラウド登録パイプラインの重要なコンポーネントである。最近の研究は、より良くよりコンパクトな3d機能を学ぶために教師あり学習をどのように活用できるかを実証している。しかし、これらのアプローチは地道アノテーションに依存しているためスケーラビリティは制限される。本稿では,RGB-Dビデオから視覚的・幾何学的特徴を学習する自己教師型アプローチBYOCを提案する。我々の重要な観察は、ランダムに初期化されたcnnは私たちに良い対応を提供し、視覚と幾何学の両方の特徴の学習をブートストラップできるということです。我々のアプローチは、ポイントクラウド登録からの古典的なアイデアと、より最近の表現学習アプローチを組み合わせたものです。室内シーンデータセットに対するアプローチを評価し,従来型および学習済みのディスクリプタを上回りながら,現在の最先端の教師付きアプローチと競合することを見出した。

Geometric feature extraction is a crucial component of point cloud registration pipelines. Recent work has demonstrated how supervised learning can be leveraged to learn better and more compact 3D features. However, those approaches' reliance on ground-truth annotation limits their scalability. We propose BYOC: a self-supervised approach that learns visual and geometric features from RGB-D video without relying on ground-truth pose or correspondence. Our key observation is that randomly-initialized CNNs readily provide us with good correspondences; allowing us to bootstrap the learning of both visual and geometric features. Our approach combines classic ideas from point cloud registration with more recent representation learning approaches. We evaluate our approach on indoor scene datasets and find that our method outperforms traditional and learned descriptors, while being competitive with current state-of-the-art supervised approaches.

翻訳日:2021-06-02 14:15:09 公開日:2021-06-01

# 3次元物体のデータ駆動シャドウグラフシミュレーション

Data-Driven Shadowgraph Simulation of a 3D Object ( http://arxiv.org/abs/2106.00317v1 )

ライセンス: Link先を確認

Anna Willmann, Patrick Stiller, Alexander Debus, Arie Irman, Richard Pausch, Yen-Yu Chang, Michael Bussmann, Nico Hoffmann

(参考訳) 本研究では,プラズマシャドウグラフのためのディープニューラルネットワークに基づく代理モデルを提案する。数値計算法で必要となるすべての先行する電場を計算することなく、所定の時間で電場を近似できる計算コストの低い投射型代理モデルを用いて数値コードを置換する。これは、プロジェクションベースサロゲートモデルにより、与えられた計算領域の任意の点と構成において、3次元偏微分方程式(3次元波動方程式)の解を完全なシミュレーションを実行することなく復元することができることを意味する。このモデルでは、シミュレーションパラメータの狭い範囲におけるデータの補間問題において、再構成の質が良く、大規模な入力データに使用することができる。

In this work, we propose a deep neural network based surrogate model for a plasma shadowgraph, a technique for visualizing perturbations in a transparent medium. We substitute the numerical code with a computationally cheaper projection-based surrogate model that is able to approximate the electric fields at a given time without computing all preceding electric fields as required by numerical methods. This means that the projection-based surrogate model makes it possible to recover the solution of the governing 3D partial differential equation, the 3D wave equation, at any point of a given compute domain and configuration without running a full simulation. This model has shown good reconstruction quality in interpolating data within a narrow range of simulation parameters and can be used for large input data.

翻訳日:2021-06-02 14:14:07 公開日:2021-06-01

# 直交分位回帰による条件範囲の改善

Improving Conditional Coverage via Orthogonal Quantile Regression ( http://arxiv.org/abs/2106.00394v1 )

ライセンス: Link先を確認

Shai Feldman, Stephen Bates, Yaniv Romano

(参考訳) 本研究では,特徴空間の全領域にわたるユーザ指定カバレッジレベルを持つ予測区間を生成する手法を開発した。このタスクの典型的なアプローチは、質的回帰を伴う条件付き四分位数を推定することであり、有限サンプルでは正確ではないものの、大きなサンプル限界のカバレッジが正しいことがよく知られている。従来の量子レグレッションは条件付きカバレッジが低いという実験で明らかになった。これを解決するために,区間の大きさと誤検出の指標との独立性を促進するために,損失関数を変更する。真の条件量子について、これらの2つの量は独立(直交)であるため、修正された損失関数は引き続き有効である。さらに,いくつかの指標で評価されるように,修正損失関数が条件付きカバレッジを改善することを実証的に示す。また,間隔の大きさと誤発見の指標との依存性の強さを調べることで条件付きカバレッジをチェックする2つの新しい指標も導入した。

We develop a method to generate prediction intervals that have a user-specified coverage level across all regions of feature-space, a property called conditional coverage. A typical approach to this task is to estimate the conditional quantiles with quantile regression -- it is well-known that this leads to correct coverage in the large-sample limit, although it may not be accurate in finite samples. We find in experiments that traditional quantile regression can have poor conditional coverage. To remedy this, we modify the loss function to promote independence between the size of the intervals and the indicator of a miscoverage event. For the true conditional quantiles, these two quantities are independent (orthogonal), so the modified loss function continues to be valid. Moreover, we empirically show that the modified loss function leads to improved conditional coverage, as evaluated by several metrics. We also introduce two new metrics that check conditional coverage by looking at the strength of the dependence between the interval size and the indicator of miscoverage.

翻訳日:2021-06-02 14:13:54 公開日:2021-06-01
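
The abstract above modifies the quantile regression loss so that interval length and the miscoverage indicator become independent. The sketch below pairs the standard pinball loss with an assumed penalty, the absolute Pearson correlation between the two quantities. The abstract does not name the dependence measure, so the penalty, the `weight` parameter, and all function names are illustrative assumptions. The hard 0/1 indicator is also not differentiable, so this form suits evaluation rather than gradient training.

```python
import math


def pinball_loss(y, q, tau):
    """Standard quantile (pinball) loss for target y and predicted quantile q."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def orthogonality_penalty(lower, upper, targets):
    """Assumed penalty: |corr(interval length, miscoverage indicator)|."""
    lengths = [u - l for l, u in zip(lower, upper)]
    misses = [0.0 if l <= y <= u else 1.0 for l, u, y in zip(lower, upper, targets)]
    return abs(pearson(lengths, misses))


def total_loss(lower, upper, targets, alpha=0.1, weight=1.0):
    """Mean pinball loss of both interval ends plus the weighted penalty."""
    lo_tau, hi_tau = alpha / 2.0, 1.0 - alpha / 2.0
    fit = sum(
        pinball_loss(y, l, lo_tau) + pinball_loss(y, u, hi_tau)
        for l, u, y in zip(lower, upper, targets)
    ) / len(targets)
    return fit + weight * orthogonality_penalty(lower, upper, targets)


# Toy intervals where longer intervals miss more often, so the penalty is high.
penalty = orthogonality_penalty([0, 0, 0, 0], [1, 2, 3, 4], [0.5, 1.0, 5.0, 6.0])
```

A penalty near zero on held-out data would indicate that miscoverage does not depend on interval length, which is the property the two new metrics in the abstract are designed to check.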

# 半教師付きモデルは強い教師なしドメイン適応学習者である

Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners ( http://arxiv.org/abs/2106.00417v1 )

ライセンス: Link先を確認

Yabin Zhang, Haojian Zhang, Bin Deng, Shuai Li, Kui Jia, Lei Zhang

(参考訳) unsupervised domain adaptation (uda) と semi-supervised learning (ssl) の2つは、機械学習における高価な手動アノテーションを減らすための典型的な戦略である。ターゲットタスクの効果的なモデルを学ぶために、UDAは利用可能なラベル付きソースデータを使用し、ターゲットドメイン内のラベルなしサンプルと異なる分布を持つ可能性がある。 UDAとSSLは全く異なる戦略のように見えるが、それらはタスクの目的とソリューションに関して密接に関連しており、SSLはUDA問題の特別なケースである。この結果に基づき、SSLメソッドが UDA タスクで機能するかどうかをさらに調査する。 UDAベンチマークに8つの代表的なSSLアルゴリズムを適用することで、SSLメソッドが強力なUDA学習者であることが分かる。特に、最先端のSSLメソッドは、DomainNetの挑戦的なUDAベンチマークにおいて既存のUDAメソッドよりも大幅に優れており、最先端のUDAメソッドはSSL技術によってさらに強化される可能性がある。したがって,今後のUDA研究のベースラインとしてSSLメソッドを採用することを推奨し,UDAとSSLの関係を明らかにすることによって,今後のUDA開発に光を当てることが期待できる。コードは \url{https://github.com/ybzh} で入手できる。

Unsupervised domain adaptation (UDA) and semi-supervised learning (SSL) are two typical strategies to reduce expensive manual annotations in machine learning. In order to learn effective models for a target task, UDA utilizes the available labeled source data, which may have a different distribution from the unlabeled samples in the target domain, while SSL employs a few manually annotated target samples. Although UDA and SSL are seemingly very different strategies, we find that they are closely related in terms of task objectives and solutions, and SSL is a special case of UDA problems. Based on this finding, we further investigate whether SSL methods work on UDA tasks. By adapting eight representative SSL algorithms on UDA benchmarks, we show that SSL methods are strong UDA learners. In particular, state-of-the-art SSL methods significantly outperform existing UDA methods on the challenging UDA benchmark of DomainNet, and state-of-the-art UDA methods could be further enhanced with SSL techniques. We thus advocate that SSL methods be employed as baselines in future UDA studies and expect that the revealed relationship between UDA and SSL could shed light on future UDA development. Code is available at https://github.com/YBZh.

翻訳日:2021-06-02 14:13:38 公開日:2021-06-01

# 雑音ラベル学習における損失の不確実性を考慮したサンプル選択

Sample Selection with Uncertainty of Losses for Learning with Noisy Labels ( http://arxiv.org/abs/2106.00445v1 )

ライセンス: Link先を確認

Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama

(参考訳) ノイズの多いラベルによる学習では、損失の小さいデータを正しくラベル付けされたものとみなすサンプル選択アプローチが広く用いられている。しかし、損失はノイズラベルで訓練中のモデルに基づいてその場で計算されるため、損失の大きいデータは誤りである可能性が高いものの、必ずしも誤りとは限らない。損失の大きいデータ点には実際には2つの可能性がある。 (a) ラベルが間違っており、ディープニューラルネットワークは「まずパターンを学習する」ため、その損失は他のデータよりも遅く減少する。 (b) 過小表現されたデータ群に属しており、まだ選択されていない。本稿では,損失の点推定ではなく区間推定を採用することで損失の不確実性を取り入れ,損失そのものではなく,分布によらない集中不等式から導かれる損失の信頼区間の下界をサンプル選択に用いる。このようにして、損失は大きいが選択回数の少ないデータも試し、試行後に不確実性とともに損失が実際に減少するかどうかを見ることで、(a)と(b)をより適切に区別できる。その結果、正しくラベル付けされているが一見すると誤ってラベル付けされているように見える、過小表現されたデータをよりよく探索できる。実験により,提案手法はベースラインよりも優れ,幅広い種類のラベルノイズに対して頑健であることが示された。

In learning with noisy labels, the sample selection approach, which regards small-loss data as correctly labeled during training, is very popular. However, losses are generated on-the-fly based on the model being trained with noisy labels, and thus large-loss data are likely, but not certain, to be incorrect. There are actually two possibilities for a large-loss data point: (a) it is mislabeled, and then its loss decreases more slowly than that of other data, since deep neural networks "learn patterns first"; (b) it belongs to an underrepresented group of data and has not been selected yet. In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses, where lower bounds of the confidence intervals of losses derived from distribution-free concentration inequalities, but not losses themselves, are used for sample selection. In this way, we also give large-loss but less selected data a try; then, we can better distinguish between the cases (a) and (b) by seeing if the losses effectively decrease with the uncertainty after the try. As a result, we can better explore underrepresented data that are correctly labeled but seem to be mislabeled at first glance. Experiments demonstrate that the proposed method is superior to baselines and robust to a broad range of label noise types.
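The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Hoeffding-style radius, the assumption that losses are bounded in `[0, loss_range]`, and the names `loss_lower_bound` and `select_samples` are all hypothetical choices made for this example. The abstract only states that lower bounds from distribution-free concentration inequalities are used.

```python
import math

def loss_lower_bound(losses, loss_range=1.0, delta=0.05):
    """Hoeffding-style lower confidence bound on the mean of observed losses."""
    n = len(losses)
    if n == 0:
        # Never-tried examples get the most optimistic bound so they are explored.
        return float("-inf")
    mean = sum(losses) / n
    radius = loss_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return mean - radius

def select_samples(loss_history, keep):
    """Return the ids of the `keep` examples with the smallest lower bounds."""
    ranked = sorted(loss_history, key=lambda i: loss_lower_bound(loss_history[i]))
    return ranked[:keep]

history = {
    "a": [0.10, 0.12, 0.11, 0.09],  # small loss, selected often
    "b": [0.90, 0.88, 0.91, 0.92],  # large loss, well estimated
    "c": [0.60],                    # fairly large loss, but seen only once
}
selected = select_samples(history, keep=2)
```

In this toy history, example `c` has a larger mean loss than `a` but a much wider interval, so its lower bound is smaller and it is given a try; example `b` stays out because its large loss is already well estimated.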

翻訳日:2021-06-02 14:13:19 公開日:2021-06-01

# オープンセット雑音ラベルを用いた学習用インスタンス補正

Instance Correction for Learning with Open-set Noisy Labels ( http://arxiv.org/abs/2106.00455v1 )

ライセンス: Link先を確認

Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama

(参考訳) オープンセットノイズラベルの問題は、トレーニングデータの一部が真のクラスを含まないラベル空間を持っていることを意味する。オープンセットノイズラベルを学習するには、同じラベルスペースを共有するためのトレーニングデータとテストデータが必要であるため、例えば損失補正やラベル修正といった多くのアプローチでは、そのようなオープンセットノイズラベルをうまく処理できない。したがって、最先端の手法では、サンプル選択アプローチを用いてオープンセットノイズラベルを処理し、ネットワークパラメータ更新のためにノイズデータからクリーンなデータを選択する。廃棄されたデータは誤ってラベル付けされ、トレーニングに参加しない。このようなアプローチは一見すると直感的で合理的です。しかし、自然に「そのようなデータはトレーニング中にのみ破棄できるのか」という疑問を提起することができる。本稿では,その答えがノーであることを示す。具体的には、廃棄されたデータのインスタンスは、一般化のための有意義な情報から成り得ると論じる。このため、そのようなデータを捨てるのではなく、破棄されたデータのインスタンスを変更するためにインスタンス修正を使用することで、廃棄されたデータの予測をラベルと一致させる。インスタンス修正は、ターゲットの敵攻撃によって実行される。修正されたデータは、一般化を支援するためにトレーニングに利用されます。分析結果に加えて、我々の主張を正当化するための一連の実証的証拠が提供される。

The problem of open-set noisy labels denotes that part of the training data have a different label space that does not contain the true class. Many approaches, e.g., loss correction and label correction, cannot handle such open-set noisy labels well, since they need training data and test data to share the same label space, which does not hold for learning with open-set noisy labels. The state-of-the-art methods thus employ the sample selection approach to handle open-set noisy labels, which tries to select clean data from noisy data for network parameter updates. The discarded data are regarded as mislabeled and do not participate in training. Such an approach is intuitive and reasonable at first glance. However, a natural question arises: can such data only be discarded during training? In this paper, we show that the answer is no. Specifically, we argue that the instances of discarded data could contain meaningful information for generalization. For this reason, we do not abandon such data, but use instance correction to modify the instances of the discarded data, which makes the predictions for the discarded data consistent with the given labels. Instance correction is performed by targeted adversarial attacks. The corrected data are then exploited for training to help generalization. In addition to the analytical results, a series of empirical results is provided to justify our claims.
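To make the idea of correcting an instance with a targeted adversarial attack concrete, here is a small sketch on a binary logistic model. It is an illustration under stated assumptions only: the signed-gradient update (in the style of targeted FGSM/PGD), the step size, the iteration count, and the function names are hypothetical, and the abstract does not specify which attack or network the authors use.

```python
import math

def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def correct_instance(weights, bias, x, label, step=0.1, iters=20):
    """Nudge x so that the model's prediction moves toward `label` (0 or 1)."""
    x = list(x)
    for _ in range(iters):
        p = predict_proba(weights, bias, x)
        # Gradient of log p(label | x) with respect to x is (label - p) * w.
        grad = [(label - p) * w for w in weights]
        # Signed-gradient step toward the target label.
        x = [xi + step * math.copysign(1.0, g) if g != 0 else xi
             for xi, g in zip(x, grad)]
    return x

weights, bias = [2.0, -1.0], 0.0
x_discarded = [-1.0, 1.0]  # the model predicts class 0 for this instance
x_corrected = correct_instance(weights, bias, x_discarded, label=1)
before = predict_proba(weights, bias, x_discarded)
after = predict_proba(weights, bias, x_corrected)
```

The model parameters stay fixed and only the instance is modified, so after correction the prediction agrees with the given label and the pair can be reused for training.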

翻訳日:2021-06-02 14:12:55 公開日:2021-06-01

# 決定木を用いた解剖学の客観的構造化実技試験(OSPE)の自動採点

Automated Grading of Anatomical Objective Structured Practical Exams Using Decision Trees ( http://arxiv.org/abs/2106.00502v1 )

ライセンス: Link先を確認

Jason Bernard, Ranil Sonnadara, Anthony N. Saraco, Josh P. Mitchell, Alex B. Bak, Ilana Bayer, Bruce C. Wainman

(参考訳) 客観的構造化実用試験(ospe)は、解剖学的知識を評価するための効果的で堅牢だが資源集約的な手法である。ほとんどのOSPEは短い回答やブランクのスタイルの質問を使っているため、このフォーマットは試験をマークするためにコンテンツに精通した多くの人々を必要としている。しかし、解剖学と生理学のコースのオンライン配信の頻度が高まると、学生は対面学習セッションで受けるOSPEの実践を失う可能性がある。本研究の目的は、知的オンラインOSPE学習システムを構築するための第1ステップとして、OSPE質問のマーク付けにおいて、決定木(DT)の精度をテストすることである。この研究は、McMaster大学健康科学部(HTHSCI 2FF3/2LL3/1D06)の解剖学と生理学のコースから、2020年冬期最終OSPEの結果をデータセットとして使用した。データセットの90%は10倍の検証アルゴリズムで54の質問に対してDTをトレーニングするために使われました。それぞれのDTは、学生が書いた正しい回答に現れるユニークな単語で構成されていた。残りの10%のデータセットは生成されたdtsでマークされた。 DTで示される回答が職員や教員の回答と比較された場合、DTは54の質問に対して平均94.49%の精度を達成した。これは、DTのような機械学習アルゴリズムがOSPEのグレーティングに非常に効果的な選択肢であり、インテリジェントでオンラインのOSPE学習システムの開発に適していることを示唆している。

An Objective Structured Practical Examination (OSPE) is an effective and robust, but resource-intensive, means of evaluating anatomical knowledge. Since most OSPEs employ short answer or fill-in-the-blank style questions, the format requires many people familiar with the content to mark the exams. However, the increasing prevalence of online delivery for anatomy and physiology courses could result in students losing the OSPE practice that they would receive in face-to-face learning sessions. The purpose of this study was to test the accuracy of Decision Trees (DTs) in marking OSPE questions as a potential first step to creating an intelligent, online OSPE tutoring system. The study used the results of the winter 2020 semester final OSPE from McMaster University's anatomy and physiology course in the Faculty of Health Sciences (HTHSCI 2FF3/2LL3/1D06) as the data set. Ninety percent of the data set was used in a 10-fold validation algorithm to train a DT for each of the 54 questions. Each DT was comprised of unique words that appeared in correct, student-written answers. The remaining 10% of the data set was marked by the generated DTs. When the answers marked by the DT were compared to the answers marked by staff and faculty, the DT achieved an average accuracy of 94.49% across all 54 questions. This suggests that machine learning algorithms such as DTs are a highly effective option for OSPE grading and are suitable for the development of an intelligent, online OSPE tutoring system.

翻訳日:2021-06-02 14:12:36 公開日:2021-06-01

# Care Label の概念: 信頼とリソースを意識した機械学習のための認定スイート

The Care Label Concept: A Certification Suite for Trustworthy and Resource-Aware Machine Learning ( http://arxiv.org/abs/2106.00512v1 )

ライセンス: Link先を確認

Katharina Morik and Helena Kotthaus and Lukas Heppe and Danny Heinrich and Raphael Fischer and Andreas Pauly and Nico Piatkowski

(参考訳) 機械学習アプリケーションはユビキタスになった。これにより、機械学習を信頼できるものにする努力が増えた。説明可能な公正なAIはすでに成熟しています。彼らは知識のあるユーザとアプリケーションエンジニアに対処する。方法や学習したモデルを理解するのに時間を投資したくない人のために、私たちはケアラベルを提供しています。これは、例えばファクトシートやモデルカードのような記述をエンドユーザーに適した形式に変換する。一方、ケアラベルは、保証が守られているかどうかをテストする認証スイートの結果である。本稿では,認証スイートを用いて2つの実験を行った。ひとつは、マルコフランダムフィールド(MRF)の設定のためのケアラベルを示す。 MRFの基本理論に基づいて、それぞれの選択は、例えば表現性と信頼性のような静的特性の特定の評価につながる。さらに、実装をテストし、動的特性を産出するリソース消費量を計測する。この2段階の手順に続いて、ディープニューラルネットワーク(DNN)モデルを認定する別の実験が行われる。そこで、特定のモデルとデータセットに基づいて、文献から静的な特性を描画する。第2のレベルでは、特定の攻撃に対する堅牢性の測定を提供する実験が生成される。 ResNet-18 と MobileNetV3 が ImageNet に適用した。

Machine learning applications have become ubiquitous. This has led to an increased effort to make machine learning trustworthy. Explainable and fair AI have already matured. They address knowledgeable users and application engineers. For those who do not want to invest time into understanding the method or the learned model, we offer care labels: easy to understand at a glance, allowing for method or model comparisons, and, at the same time, scientifically well-founded. On the one hand, this transforms descriptions as given by, e.g., Fact Sheets or Model Cards, into a form that is well-suited for end-users. On the other hand, care labels are the result of a certification suite that tests whether stated guarantees hold. In this paper, we present two experiments with our certification suite. One shows the care labels for configurations of Markov random fields (MRFs). Based on the underlying theory of MRFs, each choice leads to its specific rating of static properties such as expressivity and reliability. In addition, the implementation is tested and resource consumption is measured, yielding dynamic properties. This two-level procedure is followed by another experiment certifying deep neural network (DNN) models. There, we draw the static properties from the literature on a particular model and data set. At the second level, experiments are generated that deliver measurements of robustness against certain attacks. We illustrate this with ResNet-18 and MobileNetV3 applied to ImageNet.

翻訳日:2021-06-02 14:12:08 公開日:2021-06-01

# IID-GAN : モード崩壊の正規化のためのIIDサンプリング視点

IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse ( http://arxiv.org/abs/2106.00563v1 )

ライセンス: Link先を確認

Liangliang Shi, Yang Li, Junchi Yan

(参考訳) その成功にもかかわらず、GAN(Generative Adversarial Network)は依然としてモード崩壊に悩まされており、ジェネレータは潜在変数をターゲット分布の一部のモードにしかマッピングできない。本稿では,この問題を独立同分布(IID)サンプリングの観点から解析して正則化を試み,ターゲット空間(すなわち実データ)における生成でIID特性を保つことで,モード崩壊を自然に回避できることを強調する。これは機械学習における実データに対する基本的なIID仮定に基づいている。しかし、ソースサンプル $\mathbf{z}$ が IID に従っていても、ターゲット生成 $G(\mathbf{z})$ は必ずしも IID とは限らない。この観察に基づいて、生成をIIDに正則化する方法として、生成結果から逆算したソースと潜在空間における標準ガウス分布との近さを促す新たな損失を与える。その論理は、ターゲットデータから逆算したサンプルも、ソース分布に対してIIDであるべきだというものである。合成データと実世界のデータの両方の実験は、我々のモデルの優越性と堅牢性を示している。

Despite their success, generative adversarial networks (GANs) still suffer from mode collapse, namely the generator can only map latent variables to a partial set of modes of the target distribution. In this paper, we analyze and try to regularize this issue from an independent and identically distributed (IID) sampling perspective and emphasize that holding the IID property for generation in target space (i.e. real data) can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples $\mathbf{z}$ obey IID, the target generation $G(\mathbf{z})$ may not necessarily be IID. Based on this observation, we provide a new loss to encourage the closeness between the inverse source from generation and a standard Gaussian distribution in the latent space, as a way of regularizing the generation to be IID. The logic is that the inverse samples back from target data should also be IID for the source distribution. Experiments on both synthetic and real-world data show the superiority and robustness of our model.
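One simple way to measure how close a batch of inverted latent codes is to a standard Gaussian is the closed-form KL divergence between a diagonal Gaussian fitted to the batch and $N(0, I)$. The sketch below uses that measure purely as an illustration; it is an assumption of this example, not the loss defined in the paper, and the function name is hypothetical. It also matches only per-dimension means and variances, so it does not test independence between samples.

```python
import math

def gaussian_closeness_penalty(latents):
    """KL( N(mu, diag(var)) || N(0, I) ) using per-dimension sample moments."""
    n = len(latents)
    dims = len(latents[0])
    penalty = 0.0
    for d in range(dims):
        column = [z[d] for z in latents]
        mu = sum(column) / n
        var = sum((v - mu) ** 2 for v in column) / n
        var = max(var, 1e-12)  # guard against log(0) for degenerate batches
        penalty += 0.5 * (var + mu * mu - 1.0 - math.log(var))
    return penalty

# Inverted latents with zero mean and unit variance in every dimension.
well_spread = [[-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [1.0, 1.0]]
# Inverted latents that all land on the same point, as under mode collapse.
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

penalty_spread = gaussian_closeness_penalty(well_spread)
penalty_collapsed = gaussian_closeness_penalty(collapsed)
```

A collapsed batch receives a large penalty while a batch whose moments match the standard Gaussian receives a penalty of zero, which is the qualitative behavior a regularizer of this kind is meant to have.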

翻訳日:2021-06-02 14:11:54 公開日:2021-06-01

# 短ホライズンのオフラインRLを用いた推薦システムの長期指標の改善

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL ( http://arxiv.org/abs/2106.00589v1 )

ライセンス: Link先を確認

Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan

(参考訳) セッションベースのレコメンデーションシナリオについて検討する。ここでは、逐次的なインタラクションの間にユーザへアイテムを推薦し、長期的な効用を改善したい。長期的な指標の最適化は、学習信号(推薦が望ましい目標を達成したかどうか)が遅延し、システムとの他のユーザインタラクションによって交絡されるため、難しい。クリックのような即時に測定可能なプロキシは、長期的な指標とのミスアライメントにより、最適でない推薦につながる可能性がある。多くの研究がセッションベースレコメンデーションにエピソード型強化学習(RL)技術を適用しているが、これらの手法はポリシーが引き起こすセッション間でのユーザ意図のドリフトを考慮していない。我々は,セッション間におけるポリシー誘起の分布シフトを近似する新しいバッチRLアルゴリズムである Short Horizon Policy Improvement (SHPI) を開発した。 SHPIのホライズン(horizon)ハイパーパラメータを変化させることで、RL文献でよく知られたポリシー改善スキームを復元できる。 4つのレコメンデーションタスクの実証結果から、SHPIは行列分解、オフラインバンディット、オフラインRLのベースラインを上回りうることが示された。また,重み付き回帰オラクルを用いた安定かつ計算効率の高い実装も提供する。

We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric. Many works have applied episodic reinforcement learning (RL) techniques for session-based recommendation but these methods do not account for policy-induced drift in user intent across sessions. We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions. By varying the horizon hyper-parameter in SHPI, we recover well-known policy improvement schemes in the RL literature. Empirical results on four recommendation tasks show that SHPI can outperform matrix factorization, offline bandits, and offline RL baselines. We also provide a stable and computationally efficient implementation using weighted regression oracles.

翻訳日:2021-06-02 14:11:37 公開日:2021-06-01

# 解毒剤データを用いたフェアクラスタリング

Fair Clustering Using Antidote Data ( http://arxiv.org/abs/2106.00600v1 )

ライセンス: Link先を確認

Anshuman Chhabra, Adish Singla, Prasant Mohapatra

(参考訳) クラスタリングアルゴリズムは多くの現代のデータサイエンスアプリケーションに広く利用されている。これにより、クラスタリングアルゴリズムの出力を公平にする必要がある。伝統的に、クラスタリングアルゴリズムに対する新しいフェアアルゴリズムの変種は、フェアネスの特定の概念のために開発されている。しかし、アプリケーションコンテキストによっては、フェアネスの定義が異なる場合もあります。その結果、クラスタリングアルゴリズムとフェアネス定義の組み合わせ毎に、新しいアルゴリズムと分析を提案する必要がある。さらに、新しいアルゴリズムは現実世界のシステムにデプロイするために再実装される必要がある。したがって、クラスタリングにおける公正性に対する代替的なアプローチとして、アンチドテデータと呼ばれる少数のデータポイントで元のデータセットを増強する手法を提案する。この新しいデータセット上でクラスタリングが行われると、選択されたクラスタリングアルゴリズムとフェアネス定義に対して出力が公正になる。我々はこれを、任意の中心的クラスタリングアルゴリズムと公平性の概念に対応できる一般的な二段階最適化問題として定式化する。次に、異なる問題設定に対するこの二段階最適化のアプローチを分類する。異なるクラスタリングアルゴリズムと公平性の概念に関する広範囲な実験により、我々のアルゴリズムは、非常に少ない反ドートデータを追加することで、多くの現実世界のデータセットで所望の公平性を達成できることが示された。また,本アルゴリズムは,他の最先端のフェアクラスタリングアルゴリズムと比較して,フェアネスコストと競合クラスタリング性能の低減を実現する。

Clustering algorithms are widely utilized for many modern data science applications. This motivates the need to make outputs of clustering algorithms fair. Traditionally, new fair algorithmic variants to clustering algorithms are developed for specific notions of fairness. However, depending on the application context, different definitions of fairness might need to be employed. As a result, new algorithms and analysis need to be proposed for each combination of clustering algorithm and fairness definition. Additionally, each new algorithm would need to be reimplemented for deployment in a real-world system. Hence, we propose an alternate approach to fairness in clustering where we augment the original dataset with a small number of data points, called antidote data. When clustering is undertaken on this new dataset, the output is fair, for the chosen clustering algorithm and fairness definition. We formulate this as a general bi-level optimization problem which can accommodate any center-based clustering algorithms and fairness notions. We then categorize approaches for solving this bi-level optimization for different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets with a very small percentage of antidote data added. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.
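The bi-level structure mentioned above can be written down schematically. The notation here is introduced only for illustration and is not taken from the paper: $U$ is the original dataset, $V$ the antidote set, $\mathcal{L}$ the objective of the chosen center-based clustering algorithm, $\Phi$ the chosen fairness cost evaluated on the original points, and $\alpha$ a budget on the fraction of antidote data.

```latex
\begin{aligned}
\min_{V}\quad & \Phi\bigl(\mathcal{C}^{*}(U \cup V),\, U\bigr) \\
\text{s.t.}\quad & \mathcal{C}^{*}(U \cup V) \in \arg\min_{\mathcal{C}} \; \mathcal{L}(\mathcal{C},\, U \cup V),
\qquad |V| \le \alpha\,|U| .
\end{aligned}
```

The inner problem is the unmodified clustering algorithm run on the augmented data, and the outer problem searches over antidote points, which is why the clustering algorithm itself does not have to be reimplemented for each fairness notion.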

翻訳日:2021-06-02 14:11:17 公開日:2021-06-01

# 機能的オブジェクト指向ネットワークを用いたロボットタスク実行へのロードマップ

A Road-map to Robot Task Execution with the Functional Object-Oriented Network ( http://arxiv.org/abs/2106.00158v1 )

ライセンス: Link先を確認

David Paulius, Alejandro Agostini, Yu Sun and Dongheui Lee

(参考訳) ロボットのための知識グラフ表現として関数型オブジェクト指向ネットワーク(foon)が導入された。 FOONは、二部グラフの形で、ロボットの環境やタスクに対する理解に関係のある象徴的あるいは高レベルな情報を、人間の行動理解を反映した形で含んでいる。本稿では,foonの今後の開発に向けたロードマップと,そのタスク計画のためのロボットシステムへの応用,および実演からの知識獲得について概説する。本研究では,ロボットと人間の教師が実世界のシナリオにおいて,FOONの既存の知識を協調的に強化し,実証された動作を再現し,与えられた操作問題を解くために必要なスキルをロボットに教えるための,予備的なアイデアを提案する。

Following work on joint object-action representations, the functional object-oriented network (FOON) was introduced as a knowledge graph representation for robots. Taking the form of a bipartite graph, a FOON contains symbolic or high-level information that would be pertinent to a robot's understanding of its environment and tasks in a way that mirrors human understanding of actions. In this work, we outline a road-map for future development of FOON and its application in robotic systems for task planning as well as knowledge acquisition from demonstration. We propose preliminary ideas to show how a FOON can be created in a real-world scenario with a robot and human teacher in a way that can jointly augment existing knowledge in a FOON and teach a robot the skills it needs to replicate the demonstrated actions and solve a given manipulation problem.

翻訳日:2021-06-02 14:10:47 公開日:2021-06-01

# マルチエージェント強化学習のためのShapley Counterfactual Credits

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2106.00285v1 )

ライセンス: Link先を確認

Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu and Jun Xiao

(参考訳) 分散実行による集中訓練(CTDE)は、協調的マルチエージェント強化学習(MARL)設定において一般的なパラダイムであり、多くの実アプリケーションで広く利用されている。トレーニングプロセスにおける大きな課題の1つは、グローバルな報酬に応じて各エージェントの貢献を推論することを目的としたクレジット割り当てである。既存のクレジット割当手法は、結合値関数を個々の値関数に分解するか、局所的な観測と行動がグローバルな値関数に与える影響を測定することに焦点を当てている。これらのアプローチは、複数のエージェント間の複雑な相互作用を十分に考慮していないため、クレジットの割り当てが不適当であり、MARL上でのメディカルな結果をもたらす。本稿では,エージェントの連立を考慮に入れた明示的なクレジット割当手法であるshapley counterfactual credit assignmentを提案する。具体的には、shapley値とその望ましい特性は、エージェントの組み合わせを信用するためにdeep marlで活用され、各エージェントの個々のクレジットを見積もる能力を与えてくれます。この能力にもかかわらず、主な技術的困難は、エージェントの数として要因的に成長するShapley Valueの計算複雑性にある。代わりにモンテカルロサンプリングによる近似法を用いて,その有効性を維持しつつ,サンプルの複雑さを低減する。異なるシナリオにわたるStarCraft IIベンチマークにおいて,本手法の評価を行った。本手法は,既存の協調的marlアルゴリズムを著しく上回り,特に困難度の高いタスクにおいて,最先端のマージンを達成する。

Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contributions of each agent according to the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Specifically, the Shapley Value and its desired properties are leveraged in deep MARL to credit any combination of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of the Shapley Value, which grows factorially with the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method outperforms existing cooperative MARL algorithms significantly and achieves state-of-the-art results, with especially large margins on the more difficult tasks.
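The Monte Carlo approximation mentioned above is commonly implemented by sampling random orderings of the agents and averaging each agent's marginal contribution. The sketch below shows that generic estimator on a toy cooperative game; the value function `toy_value`, the sample count, and the function names are assumptions for illustration, and the paper's estimator inside a deep MARL critic may differ in detail.

```python
import random

def monte_carlo_shapley(agents, value_fn, num_samples=200, seed=0):
    """Estimate each agent's Shapley value by sampling random orderings."""
    rng = random.Random(seed)
    credit = {a: 0.0 for a in agents}
    for _ in range(num_samples):
        order = list(agents)
        rng.shuffle(order)
        coalition = []
        previous = value_fn(frozenset(coalition))
        for agent in order:
            coalition.append(agent)
            current = value_fn(frozenset(coalition))
            # Marginal contribution of `agent` to the agents that preceded it.
            credit[agent] += current - previous
            previous = current
    return {a: total / num_samples for a, total in credit.items()}

# Toy cooperative game: agents 0 and 1 only earn reward when both are present;
# agent 2 contributes a fixed amount on its own.
def toy_value(coalition):
    reward = 0.0
    if 0 in coalition and 1 in coalition:
        reward += 2.0
    if 2 in coalition:
        reward += 1.0
    return reward

estimates = monte_carlo_shapley([0, 1, 2], toy_value)
```

Each sampled ordering costs one pass over the agents instead of an enumeration of all orderings, and the estimated credits always sum to the value of the full team because the marginal contributions telescope.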

翻訳日:2021-06-02 14:10:32 公開日:2021-06-01

# 統一トランスフォーマーによる多言語音声翻訳:Huawei Noah's Ark Lab at IWSLT 2021

Multilingual Speech Translation with Unified Transformer: Huawei Noah's Ark Lab at IWSLT 2021 ( http://arxiv.org/abs/2106.00197v1 )

ライセンス: Link先を確認

Xingshan Zeng, Liangyou Li and Qun Liu

(参考訳) 本稿では,Huawei Noah の Ark Lab から IWSLT 2021 Multilingual Speech Translation (MultiST) タスクに送信されたシステムについて述べる。マルチストモデルでは,異なるモダリティ(音声とテキスト)と異なるタスク(音声認識,機械翻訳,音声翻訳)からのデータを活用し,モデルの能力を高めるために,統一トランスフォーマアーキテクチャを用いる。具体的には、まず音声とテキストをそれぞれ異なる特徴抽出器に入力し、音響的特徴とテキスト的特徴を抽出する。次に、これらの機能は共有エンコーダ-デコーダアーキテクチャによって処理される。マルチタスク学習,タスクレベルのカリキュラム学習,データ拡張など,パフォーマンス向上にいくつかのトレーニング手法を適用する。最終システムは教師付き言語ペアのバイリンガルベースラインよりもかなり良い結果を得ることができ,ゼロショット言語ペアでは合理的な結果が得られる。

This paper describes the system submitted to the IWSLT 2021 Multilingual Speech Translation (MultiST) task from Huawei Noah's Ark Lab. We use a unified transformer architecture for our MultiST model, so that the data from different modalities (i.e., speech and text) and different tasks (i.e., Speech Recognition, Machine Translation, and Speech Translation) can be exploited to enhance the model's ability. Specifically, speech and text inputs are firstly fed to different feature extractors to extract acoustic and textual features, respectively. Then, these features are processed by a shared encoder--decoder architecture. We apply several training techniques to improve the performance, including multi-task learning, task-level curriculum learning, data augmentation, etc. Our final system achieves significantly better results than bilingual baselines on supervised language pairs and yields reasonable results on zero-shot language pairs.

翻訳日:2021-06-02 14:10:09 公開日:2021-06-01

# Nora: ウェルビーイングコーチ

Nora: The Well-Being Coach ( http://arxiv.org/abs/2106.00410v1 )

ライセンス: Link先を確認

Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, Pascale Fung

(参考訳) 現在のパンデミックは、人々が孤立し続け、社会的距離を取ることを強制し、結果として生じる孤独とネガティブな感情に対処するシステムの必要性を生み出している。本稿では,対話システムにおける自然言語理解を活用した仮想コーチングプラットフォームであるNoraを提案し,ユーザインタラクションに基づく他のレコメンデーションを提案する。自給自足や在宅勤務を行う人々を支援することを目的としている。 Noraは、ユーザの感情、感情、ストレスを検出し記録することで、幸福度を計測する。 Noraはまた、健康的な日々のルーチンの開発を支援するために、様々なワークアウト、想想、ヨガのエクササイズをユーザに推奨している。さらに私たちは,nora内のソーシャルコミュニティを提供して,ユーザが自身の経験を他の人たちと共有できるようにしています。 Noraはウェブリンクを通じてどこからでもアクセスでき、英語とマンダリンの両方をサポートしている。

The current pandemic has forced people globally to remain in isolation and practice social distancing, which creates the need for a system to combat the resulting loneliness and negative emotions. In this paper we propose Nora, a virtual coaching platform designed to utilize natural language understanding in its dialogue system and suggest other recommendations based on user interactions. It is intended to provide assistance and companionship to people undergoing self-quarantine or work-from-home routines. Nora helps users gauge their well-being by detecting and recording the user's emotion, sentiment, and stress. Nora also recommends various workout, meditation, or yoga exercises to users in support of developing a healthy daily routine. In addition, we provide a social community inside Nora, where users can connect and share their experiences with others undergoing a similar isolation procedure. Nora can be accessed from anywhere via a web link and has support for both English and Mandarin.

翻訳日:2021-06-02 14:09:53 公開日:2021-06-01

# 不正確なデータのベールによるロジスティック回帰

Logistic Regression Through the Veil of Imprecise Data ( http://arxiv.org/abs/2106.00492v1 )

ライセンス: Link先を確認

Nicholas Gray and Scott Ferson

(参考訳) ロジスティック回帰は、いくつかの予測変数に基づいて結果の確率を評価する重要な統計ツールである。標準的な手法は正確に知られているデータしか扱えないが、多くのデータセットには不確実性があり、従来の手法はそれを単一の点に縮約するか、完全に無視している。本稿では,区間内の値から得られうるモデルの集合を用いた不正確なロジスティック回帰モデルを考えることで,これらの不確実性を含められることを示す。これには、従来の方法では取り除かれてしまう認識論的不確実性(epistemic uncertainty)を明確に表現できる利点がある。

Logistic regression is an important statistical tool for assessing the probability of an outcome based upon some predictive variables. Standard methods can only deal with precisely known data; however, many datasets have uncertainties which traditional methods either reduce to a single point or completely disregard. In this paper we show that it is possible to include these uncertainties by considering an imprecise logistic regression model using the set of possible models that can be obtained from values within the intervals. This has the advantage of clearly expressing the epistemic uncertainty removed by traditional methods.
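The idea of a set of models induced by interval data can be illustrated with a one-dimensional toy example. The sketch below fits an ordinary logistic regression to every combination of interval endpoints and reports the envelope of predicted probabilities at a query point. Restricting the search to endpoints is a simplifying assumption of this example (the true envelope may be attained at interior values), and the fitting routine, data, and names are hypothetical rather than the authors' algorithm.

```python
import itertools
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(a*x + b) by batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def probability_bounds(intervals, ys, query):
    """Envelope of predictions over models fitted to every endpoint combination."""
    predictions = []
    for xs in itertools.product(*intervals):
        a, b = fit_logistic(xs, ys)
        predictions.append(1.0 / (1.0 + math.exp(-(a * query + b))))
    return min(predictions), max(predictions)

# Each observation is an interval (low, high); equal endpoints mean precise data.
intervals = [(0.0, 0.5), (1.0, 1.0), (1.5, 2.5), (3.0, 3.0), (3.0, 3.5), (4.0, 4.0)]
labels = [0, 0, 1, 0, 1, 1]
low, high = probability_bounds(intervals, labels, query=2.0)
```

With fully precise inputs the two bounds coincide and the procedure reduces to a single standard fit; the width of `[low, high]` is what expresses the uncertainty that a point-valued analysis would hide.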

翻訳日:2021-06-02 14:09:26 公開日:2021-06-01

# モーメントからのガウス混合学習のためのテンソル分解

Tensor decomposition for learning Gaussian mixtures from moments ( http://arxiv.org/abs/2106.00555v1 )

ライセンス: Link先を確認

Rima Khouja (AROMATH), Pierre-Alexandre Mattei (MAASAI), Bernard Mourrain (AROMATH)

(参考訳) データ処理や機械学習では、データを正確に表現できるモデルを復元し活用することが重要な課題である。データセットからガウス混合モデルを復元する問題を考察する。この問題に対処するための対称テンソル分解法について検討し,データ分布の経験的モーメントからテンソルを構築する。我々は一意な分解を持つ識別可能なテンソルを考えるが、球面のガウス混合から作られるモーメントテンソルは、この性質を持っていることを示している。補間次数が厳密に半分未満の対称テンソルは同定可能であることを証明し、それらの分解を計算するために単純な線形代数演算に基づくアルゴリズムを提案する。図示的な実験は、他の最先端のアプローチと比較して、ガウス混合物を回収するためのテンソル分解法の影響を示している。

In data processing and machine learning, an important challenge is to recover and exploit models that can represent accurately the data. We consider the problem of recovering Gaussian mixture models from datasets. We investigate symmetric tensor decomposition methods for tackling this problem, where the tensor is built from empirical moments of the data distribution. We consider identifiable tensors, which have a unique decomposition, showing that moment tensors built from spherical Gaussian mixtures have this property. We prove that symmetric tensors with interpolation degree strictly less than half their order are identifiable and we present an algorithm, based on simple linear algebra operations, to compute their decomposition. Illustrative experimentations show the impact of the tensor decomposition method for recovering Gaussian mixtures, in comparison with other state-of-the-art approaches.
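As a small illustration of building a tensor from empirical moments, the sketch below computes the raw third-order moment tensor of a dataset in plain Python. It shows only the generic empirical moment; any corrections the paper applies to obtain a decomposable tensor for spherical Gaussian mixtures are not reproduced here, and the function name and toy data are assumptions.

```python
def third_order_moment(data):
    """Empirical moment tensor M[i][j][k] = mean over samples of x_i * x_j * x_k."""
    n = len(data)
    d = len(data[0])
    tensor = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for x in data:
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    tensor[i][j][k] += x[i] * x[j] * x[k] / n
    return tensor

samples = [[1.0, 2.0], [3.0, 0.0], [-1.0, 1.0], [0.0, -2.0]]
M3 = third_order_moment(samples)
```

The result is symmetric under any permutation of its three indices, which is the property that symmetric tensor decomposition methods rely on.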

翻訳日:2021-06-02 14:09:17 公開日:2021-06-01

# Omnizart: 自動音楽書き起こしのための汎用ツールボックス

Omnizart: A General Toolbox for Automatic Music Transcription ( http://arxiv.org/abs/2106.00497v1 )

ライセンス: Link先を確認

Yu-Te Wu, Yin-Jyun Luo, Tsung-Ping Chen, I-Chieh Wei, Jui-Yang Hsu, Yi-Chin Chuang, Li Su

(参考訳) 自動音楽書き起こし(AMT)の合理化されたソリューションを提供する新しいPythonライブラリであるOmnizartを紹介し、リリースする。 Omnizartは、ディープラーニングベースのAMTのライフサイクルを構成するモジュールを含み、コンパクトなコマンドラインインタフェースで使いやすいように設計されている。我々の知る限り、Omnizartは、ソロ、楽器アンサンブル、打楽器からボーカルまで幅広い種類の楽器をカバーするモデルに加え、AMTと関連の深い2つの音楽情報検索(MIR)タスクであるコード認識とビート/ダウンビート追跡のモデルも提供する最初の書き起こしツールキットである。

We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT). Omnizart encompasses modules that construct the life-cycle of deep learning-based AMT, and is designed for ease of use with a compact command-line interface. To the best of our knowledge, Omnizart is the first transcription toolkit which offers models covering a wide class of instruments, ranging from solo instruments, instrument ensembles, and percussion instruments to vocals, as well as models for chord recognition and beat/downbeat tracking, two music information retrieval (MIR) tasks highly related to AMT.

翻訳日:2021-06-02 14:09:04 公開日:2021-06-01

# 都市森林における炭素隔離の定量化

Quantification of Carbon Sequestration in Urban Forests ( http://arxiv.org/abs/2106.00182v1 )

ライセンス: Link先を確認

Levente Klein, Wang Zhou, Conrad Albrecht

(参考訳) 植物、特に樹木は大気から二酸化炭素を吸収して炭素を隔離するが、樹木に蓄えられた炭素を効率的に定量化する方法がないため、その過程を追跡することは困難である。本稿では,マルチスペクトル航空画像とLiDARデータを融合して,炭素貯蔵量の定量化に重要な属性である樹木の被覆,幾何学的形状,樹種を特定し,樹木の炭素貯蔵量を推定する手法を提案する。樹木のバイオマスを計算するために,樹種情報とその3次元幾何学形状をリモート画像から推定できることを実証する。具体的には、ニューヨーク市マンハッタンについて、樹木に隔離された炭素の総量を $52,000$ トンと推定する。

Vegetation, trees in particular, sequesters carbon by absorbing carbon dioxide from the atmosphere. However, the lack of efficient methods for quantifying the carbon stored in trees makes it difficult to track the process. Here we present an approach to estimate the carbon storage in trees based on fusing multispectral aerial imagery and LiDAR data to identify tree coverage, geometric shape, and tree species, which are crucial attributes in carbon storage quantification. We demonstrate that tree species information and their three-dimensional geometric shapes can be estimated from remote imagery in order to calculate the tree's biomass. Specifically, for Manhattan, New York City, we estimate a total of $52,000$ tons of carbon sequestered in trees.

翻訳日:2021-06-02 14:08:49 公開日:2021-06-01

# 超音波画像における腕神経叢分割のためのハイブリッドディープニューラルネットワーク

Hybrid Deep Neural Network for Brachial Plexus Nerve Segmentation in Ultrasound Images ( http://arxiv.org/abs/2106.00373v1 )

ライセンス: Link先を確認

Juul P.A. van Boxtel, Vincent R.J. Vousten, Josien Pluim, Nastaran Mohammadian Rad

(参考訳) 超音波ガイド下局所麻酔(UGRA)は全身麻酔(GA)を代替し、鎮痛と回復時間を改善する。この方法は鎖骨外科手術後の腕神経叢(BP)に応用できる。しかし,超音波(US)画像からのBPの同定は,専門職でも困難である。この問題を解決するために、BP神経領域の同定とセグメンテーションに畳み込みニューラルネットワーク(CNN)とより高度なディープニューラルネットワーク(DNN)を用いることができる。本稿では,超音波画像中のbp神経領域をセグメント化するための分類モデルとセグメント化モデルを組み合わせたハイブリッドモデルを提案する。 CNNモデルは、BP領域で画像を正確に選択するための分類器として使用される。次に、セグメント化にU-netまたはM-netモデルを用いる。実験の結果,提案手法は単一セグメンテーションモデルに対するセグメンテーション性能を大幅に向上させることが示唆された。

Ultrasound-guided regional anesthesia (UGRA) can replace general anesthesia (GA), improving pain control and recovery time. This method can be applied on the brachial plexus (BP) after clavicular surgeries. However, identification of the BP from ultrasound (US) images is difficult, even for trained professionals. To address this problem, convolutional neural networks (CNNs) and more advanced deep neural networks (DNNs) can be used for identification and segmentation of the BP nerve region. In this paper, we propose a hybrid model consisting of a classification model followed by a segmentation model to segment BP nerve regions in ultrasound images. A CNN model is employed as a classifier to precisely select the images with the BP region. Then, a U-net or M-net model is used for the segmentation. Our experimental results indicate that the proposed hybrid model significantly improves the segmentation performance over a single segmentation model.
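The two-stage design described above (classify first, segment only when a BP region is present) can be sketched as a small control-flow wrapper. The classifier and segmenter below are toy stand-ins, and the threshold and function names are hypothetical; in the paper these roles are played by a trained CNN and a U-net or M-net.

```python
def hybrid_segment(image, classifier, segmenter, threshold=0.5):
    """Run the segmenter only on images the classifier flags as containing the BP."""
    height, width = len(image), len(image[0])
    if classifier(image) < threshold:
        # Classifier says no BP region: return an all-background mask.
        return [[0] * width for _ in range(height)]
    return segmenter(image)

# Stand-in models (hypothetical): a real system would use trained networks.
def toy_classifier(image):
    brightest = max(max(row) for row in image)
    return 1.0 if brightest > 0.8 else 0.0

def toy_segmenter(image):
    return [[1 if pixel > 0.8 else 0 for pixel in row] for row in image]

with_nerve = [[0.1, 0.9], [0.2, 0.95]]
without_nerve = [[0.1, 0.2], [0.3, 0.1]]
mask_a = hybrid_segment(with_nerve, toy_classifier, toy_segmenter)
mask_b = hybrid_segment(without_nerve, toy_classifier, toy_segmenter)
```

Gating the segmenter this way prevents it from producing false-positive regions on frames that contain no nerve at all, which is one plausible reason a hybrid model can outperform a single segmentation model.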

翻訳日:2021-06-02 14:08:38 公開日:2021-06-01

# 条件付き生成逆数ネットワークを用いた肝病変合成のためのデカップリング形状と密度

Decoupling Shape and Density for Liver Lesion Synthesis Using Conditional Generative Adversarial Networks ( http://arxiv.org/abs/2106.00629v1 )

ライセンス: Link先を確認

Dario Augusto Borges Oliveira

(参考訳) 病変合成は、トレーニングデータの増強、病変の進展シナリオの描画、専門家の訓練を支援するための効率的な生成モデルの台頭により、多くの注目を集めた。合成データの質と多様性は、モデルをトレーニングするのに使用される注釈付きデータに大きく依存する。これにより、病変分節アルゴリズムに固有のバイアスが加わり、病変の進化シナリオの合成を効率的に制限できる。本稿では,肝病変合成のための形状と密度を分離する手法を提案する。形状と密度を個々に修正して合成制御を示す定性的な結果と,生成器モデルに密度情報を組み込むことが,形状のみを用いた場合に比べて病変分割性能の向上に寄与することを示す定量的な結果を提供する。

Lesion synthesis has received much attention with the rise of efficient generative models for augmenting training data, drawing lesion evolution scenarios, or aiding expert training. The quality and diversity of synthesized data are highly dependent on the annotated data used to train the models, which often struggle to derive samples that are very different from the training ones yet still realistic. That adds an inherent bias to lesion segmentation algorithms and limits the efficient synthesis of lesion evolution scenarios. This paper presents a method for decoupling shape and density for liver lesion synthesis, creating a framework that makes it straightforward to steer the synthesis. We offer qualitative results that show the synthesis control by modifying shape and density individually, and quantitative results that demonstrate that embedding the density information in the generator model helps to increase lesion segmentation performance compared to using the shape alone.

翻訳日:2021-06-02 14:08:24 公開日:2021-06-01

# 畳み込みネットワークを用いたマルチスペクトル画像分類のためのハイパースペクトル帯域選択

Hyperspectral Band Selection for Multispectral Image Classification with Convolutional Networks ( http://arxiv.org/abs/2106.00645v1 )

ライセンス: Link先を確認

Giorgio Morales and John Sheppard and Riley Logan and Joseph Shaw

(参考訳) 近年、ハイパースペクトルイメージング(HSI)はリモートセンシング、農業、バイオメディシンといったアプリケーションにおける信頼性の高いデータ源となっている。しかし、ハイパースペクトル画像は非常にデータ密度が高く、特定のアプリケーションに最も有用な情報を保持しながらスペクトル帯域を減らす方法の恩恵を受けることが多い。画像分類の文脈において、HSIシステムから得られた波長の削減されたセットを選択するための新しいバンド選択法を提案する。提案手法は2つの主要なステップから構成される: 1つは、フィルタに基づくアプローチを用いて、帯域とその近傍のコリニアリティ解析に基づいて、関連するスペクトル帯域を求める。この分析は冗長バンドの除去に役立ち、検索スペースを劇的に削減する。第2のステップは、情報エントロピー値に基づいて縮小集合からバンドを選択するラッパーベースアプローチを適用し、コンパクト畳み込みニューラルネットワーク(cnn)を訓練し、現在の選択性能を評価する。提案手法から得られた分類結果を,2つのハイパースペクトル画像データセット上の他の特徴選択法と比較する。さらに、元のハイパースペクトルデータキューブを使用して、マルチスペクトルイメージにおける実際のフィルタの使用プロセスをシミュレートする。本手法はマルチスペクトルセンサの設計に適した結果が得られることを示す。

In recent years, Hyperspectral Imaging (HSI) has become a powerful source for reliable data in applications such as remote sensing, agriculture, and biomedicine. However, hyperspectral images are highly data-dense and often benefit from methods to reduce the number of spectral bands while retaining the most useful information for a specific application. We propose a novel band selection method to select a reduced set of wavelengths, obtained from an HSI system in the context of image classification. Our approach consists of two main steps: the first utilizes a filter-based approach to find relevant spectral bands based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. The second step applies a wrapper-based approach to select bands from the reduced set based on their information entropy values, and trains a compact Convolutional Neural Network (CNN) to evaluate the performance of the current selection. We present classification results obtained from our method and compare them to other feature selection methods on two hyperspectral image datasets. Additionally, we use the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager. We show that our method produces more suitable results for a multispectral sensor design.

翻訳日:2021-06-02 14:08:09 公開日:2021-06-01
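
The two-step selection described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Pearson correlation with the previously kept band stands in for the collinearity analysis, a histogram-based Shannon entropy stands in for the information entropy values, and the CNN that evaluates each candidate selection is omitted. All function names and the 0.95 threshold are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long pixel vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant band has no linear relationship to report
    return cov / math.sqrt(var_a * var_b)


def shannon_entropy(band, bins=8):
    """Entropy (in bits) of the band's intensity histogram."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for x in band:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(band)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def filter_redundant(bands, threshold=0.95):
    """Step 1 (filter): drop a band that is nearly collinear with the last kept band."""
    kept = [0]
    for i in range(1, len(bands)):
        if abs(pearson(bands[kept[-1]], bands[i])) < threshold:
            kept.append(i)
    return kept


def select_bands(bands, k, threshold=0.95):
    """Step 2 (simplified): keep the k highest-entropy bands among the survivors."""
    candidates = filter_redundant(bands, threshold)
    ranked = sorted(candidates, key=lambda i: shannon_entropy(bands[i]), reverse=True)
    return sorted(ranked[:k])


# Toy data cube: four bands, eight pixels each (made-up values).
bands = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [1, 3, 5, 7, 9, 11, 13, 15],  # exact linear function of band 0
    [5, 1, 4, 1, 5, 9, 2, 6],
    [3, 3, 3, 3, 3, 3, 3, 3],     # constant band, zero entropy
]
selected = select_bands(bands, k=2)
```

In the paper's wrapper step, a compact CNN would be trained on each candidate selection to score it; here the entropy ranking alone decides, which is only a placeholder for that loop.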

# 体組成解析のための3次元CT画像からの全身骨格筋・脂肪組織・骨の自動セグメンテーションの包括的検証 : 拡張体組成に向けて

Comprehensive Validation of Automated Whole Body Skeletal Muscle, Adipose Tissue, and Bone Segmentation from 3D CT images for Body Composition Analysis: Towards Extended Body Composition ( http://arxiv.org/abs/2106.00652v1 )

ライセンス: Link先を確認

Da Ma, Vincent Chow, Karteek Popuri, Mirza Faisal Beg

(参考訳) コンピュータ支援精密医療の最近の進歩は、グループベースの分析に有効な集合パターンを見つけるのに役立つ集団全体モデルから、治療の選択や治療結果の予測に関して患者固有の決定を導くことができる患者固有のモデルへと移行しやすくしている。身体構成は、様々な疾患にとって重要な要因であり、また治療選択や外科的介入に対する患者固有の臨床結果の予測因子として認識されている。 3次元CT画像は、腫瘍学的ワークローで日常的に取得され、内部解剖の正確なレンダリングを提供するため、骨格筋の量や組織区画の分別を同時に評価することができる。ディープラーニングのような強力な人工知能のツールは、3D画像全体を分割し、すべての内部解剖を正確に測定することを可能にする。これにより、それまで存在した深刻なボトルネック、すなわち3dボリュームイメージを構成する数百の2d軸スライスにスケールすることを禁じられていた手動セグメンテーションの必要性が克服される。今回紹介したような自動化ツールは、3dctやmri画像から全身の計測値を取り出すことができるようになり、個々の組織、臓器容積、形状、機能状態に基づいて様々な疾患のドライバが発見される新しい時代へと繋がる。これらの測定は不可能であったため、フィールドを非常に小さく限られたサブセットに制限した。これらの発見と、高速かつ精度で個々の画像セグメンテーションを行う能力は、がんなどの主要な疾患の発症後の栄養、老化、化学療法、手術、生存に関連する個々の治療計画モデルにこれらの3D尺度を組み込むことにつながる可能性が高い。

The latest advances in computer-assisted precision medicine are making it feasible to move from population-wide models that are useful to discover aggregate patterns that hold for group-based analysis to patient-specific models that can drive patient-specific decisions with regard to treatment choices, and predictions of outcomes of treatment. Body Composition is recognized as an important driver and risk factor for a wide variety of diseases, as well as a predictor of individual patient-specific clinical outcomes to treatment choices or surgical interventions. 3D CT images are routinely acquired in oncological workflows and deliver an accurate rendering of internal anatomy and therefore can be used opportunistically to assess the amount of skeletal muscle and adipose tissue compartments. Powerful tools of artificial intelligence such as deep learning are making it feasible now to segment the entire 3D image and generate accurate measurements of all internal anatomy. These will enable the overcoming of the severe bottleneck that existed previously, namely, the need for manual segmentation, which was prohibitive to scale to the hundreds of 2D axial slices that made up a 3D volumetric image. Automated tools such as presented here will now enable harvesting whole-body measurements from 3D CT or MRI images, leading to a new era of discovery of the drivers of various diseases based on individual tissue, organ volume, shape, and functional status. These measurements were hitherto unavailable, thereby limiting the field to a very small and limited subset. These discoveries and the potential to perform individual image segmentation with high speed and accuracy are likely to lead to the incorporation of these 3D measures into individual specific treatment planning models related to nutrition, aging, chemotoxicity, surgery and survival after the onset of a major disease such as cancer.

翻訳日:2021-06-02 14:07:52 公開日:2021-06-01

# 忠実度推定による事前学習済みネットワークのノイズ画像分類の改善

Fidelity Estimation Improves Noisy-Image Classification with Pretrained Networks ( http://arxiv.org/abs/2106.00673v1 )

ライセンス: Link先を確認

Xiaoyu Lin, Deblina Bhattacharjee, Majed El Helou and Sabine Süsstrunk

(参考訳) 画像分類はディープラーニングを用いて大幅に改善された。これは主に、大規模なデータセットから豊富な特徴抽出器を学習できる畳み込みニューラルネットワーク(cnns)に起因する。しかし、ほとんどのディープラーニング分類法はクリーンな画像に基づいて訓練されており、復元前処理のステップが適用されたとしても、ノイズ処理では堅牢ではない。新しい手法はこの問題に対処するが、それらは修正された特徴抽出器に依存し、したがって再訓練を必要とする。代わりに,事前学習した分類器に適用可能な手法を提案する。提案手法は,特徴抽出器の内部表現に融合した忠実度マップ推定を活用し,ネットワークの注意を誘導し,ノイズデータに対してより頑健にする。我々は,特に高雑音レベルにおいて,ノイズ画像分類(NIC)の結果を大幅に改善し,完全に再訓練されたアプローチに近づいた。さらに, 概念実証として, オラクルの忠実度マップを用いた場合, ノイズや復元画像の訓練の有無にかかわらず, 完全に再現された手法よりも優れていることを示す。

Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods address this problem, they rely on modified feature extractors and thus necessitate retraining. We instead propose a method that can be applied on a pretrained classifier. Our method exploits a fidelity map estimate that is fused into the internal representations of the feature extractor, thereby guiding the attention of the network and making it more robust to noisy data. We improve the noisy-image classification (NIC) results by significantly large margins, especially at high noise levels, and come close to the fully retrained approaches. Furthermore, as proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images.

翻訳日:2021-06-02 14:07:21 公開日:2021-06-01

# Bures-Wasserstein幾何をもつ正定値行列上のリーマン最適化について

On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry ( http://arxiv.org/abs/2106.00286v1 )

ライセンス: Link先を確認

Andi Han, Bamdev Mishra, Pratik Jawanpuria, Junbin Gao

(参考訳) 本稿では、対称正定値(spd)行列多様体上のリーマン最適化のための一般的なアフィン不変量(ai)幾何と、bures-wasserstein(bw)幾何を比較分析する。我々の研究は、AIメトリックの二次的依存とは対照的に、BWメトリックがSPD行列に線形依存していることから始まる。我々は、不条件のSPD行列に対するいくつかのリーマン最適化問題に対して、BW計量がより適切で堅牢な選択であることを示す。 BW幾何学は非負の曲率を持ち、非正の曲線を持つAI幾何に対するアルゴリズムの収束率をさらに向上させることを示す。最後に、AI幾何学では測地線凸(geodeic convex)として知られているいくつかの一般的なコスト関数が、BW幾何学では測地線凸(geodeic convex)であることを示す。様々な応用に関する広範な実験が我々の発見を裏付けている。

In this paper, we comparatively analyze the Bures-Wasserstein (BW) geometry with the popular Affine-Invariant (AI) geometry for Riemannian optimization on the symmetric positive definite (SPD) matrix manifold. Our study begins with an observation that the BW metric has a linear dependence on SPD matrices in contrast to the quadratic dependence of the AI metric. We build on this to show that the BW metric is a more suitable and robust choice for several Riemannian optimization problems over ill-conditioned SPD matrices. We show that the BW geometry has a non-negative curvature, which further improves convergence rates of algorithms over the non-positively curved AI geometry. Finally, we verify that several popular cost functions, which are known to be geodesic convex under the AI geometry, are also geodesic convex under the BW geometry. Extensive experiments on various applications support our findings.

翻訳日:2021-06-02 14:06:06 公開日:2021-06-01
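
For orientation, the two metrics compared in this abstract are commonly written as below. This is a reference sketch in standard notation rather than a quotation from the paper: $X$ is an SPD matrix, $U, V$ are symmetric tangent vectors, and $\mathcal{L}_X[U]$ denotes the solution of a Lyapunov equation. The scalar case at the end is added here only to make the difference in how each metric depends on $X$ visible.

```latex
% Affine-invariant (AI) metric at an SPD matrix X
g^{\mathrm{AI}}_X(U, V) = \operatorname{tr}\!\left(X^{-1} U X^{-1} V\right)

% Bures-Wasserstein (BW) metric; \mathcal{L}_X[U] solves the Lyapunov equation
g^{\mathrm{BW}}_X(U, V) = \tfrac{1}{2} \operatorname{tr}\!\left(\mathcal{L}_X[U]\, V\right),
\qquad X \mathcal{L}_X[U] + \mathcal{L}_X[U] X = U

% Scalar case (x > 0): the Lyapunov solution is \mathcal{L}_x[u] = u / (2x)
g^{\mathrm{AI}}_x(u, v) = \frac{uv}{x^{2}}, \qquad
g^{\mathrm{BW}}_x(u, v) = \frac{uv}{4x}
```

In the scalar case the AI metric involves $x$ through $1/x^{2}$ while the BW metric involves it through $1/x$, which is one way to read the "quadratic versus linear dependence" remark and its relevance for ill-conditioned matrices.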

# リッジ関数のバンディット凸最適化におけるミニマックスリグレット

Minimax Regret for Bandit Convex Optimisation of Ridge Functions ( http://arxiv.org/abs/2106.00444v1 )

ライセンス: Link先を確認

Tor Lattimore

(参考訳) 敵対者が、凸関数 $g : \mathbb R \to \mathbb R$ と $\theta \in \mathbb R^d$ に対して $f(x) = g(\langle x, \theta\rangle)$ という形式の関数しか選べない場合の敵対的バンディット凸最適化を解析する。ミニマックスリグレットが高々 $O(d\sqrt{n} \log(\operatorname{diam}\mathcal K))$ であることの短い情報理論的証明を与える。ここで $n$ は相互作用の数、$d$ は次元、$\operatorname{diam}(\mathcal K)$ は制約集合の直径である。したがって、この関数クラスは線形の場合と比べて高々対数的に難しいだけである。

We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f(x) = g(\langle x, \theta\rangle)$ for convex $g : \mathbb R \to \mathbb R$ and $\theta \in \mathbb R^d$. We provide a short information-theoretic proof that the minimax regret is at most $O(d\sqrt{n} \log(\operatorname{diam}\mathcal K))$ where $n$ is the number of interactions, $d$ the dimension and $\operatorname{diam}(\mathcal K)$ is the diameter of the constraint set. Hence, this class of functions is at most logarithmically harder than the linear case.

翻訳日:2021-06-02 14:05:53 公開日:2021-06-01

# 拡張触覚知覚: 道具と把持物体を介した振動センシング

Extended Tactile Perception: Vibration Sensing through Tools and Grasped Objects ( http://arxiv.org/abs/2106.00489v1 )

ライセンス: Link先を確認

Tasbolat Taunyazov, Luar Shui Song, Eugene Lim, Hian Hian See, David Lee, Benjamin C.K. Tee, Harold Soh

(参考訳) 人間は道具やその他の保持物を通して世界を感知する驚くべき能力を示す。例えば、保持された棒の衝突箇所をピンポイントで特定し、硬いプローブを使って異なるテクスチャを区別することができる。本研究では,ロボットが道具を具現化し,標準的な把持物体を用いて知覚を拡張できるような能力を実現する方法について検討する。ロボットの指の動的触覚センサを用いた視覚触覚センシングと機械学習モデルにより,ロボットは剛体物体に沿って伝達される振動として伝達される接触情報を解読できる。本稿では,BioTacマイクロ振動センサと4〜kHzでマルチタキセルセンシングが可能な新しいイベントダイナミックセンサであるNUSkinを用いた広範囲な実験について報告する。保持棒上の微細な局在化は我々のアプローチ(20cmロッド上の誤差が1cm未満)により可能であることを示す。次に, 振動触覚知覚は, 物体ハンドオーバ時の適度な把握安定性予測と, 標準フォークを用いた正確な食品識別につながることを示す。マルチタキセルビブロ触覚を十分に高いサンプリングレート(2kHz以上)で検出すると,様々なタスクやオブジェクトに対して最高の性能が得られることがわかった。両者を組み合わせることで,触覚知覚の拡張にvibro-tactile perceptionを使用するためのエビデンスとガイドラインが提供され,ツールによる能力向上と,人間とロボットの対話性の向上につながると我々は信じている。

Humans display the remarkable ability to sense the world through tools and other held objects. For example, we are able to pinpoint impact locations on a held rod and tell apart different textures using a rigid probe. In this work, we consider how we can enable robots to have a similar capacity, i.e., to embody tools and extend perception using standard grasped objects. We propose that vibro-tactile sensing using dynamic tactile sensors on the robot fingers, along with machine learning models, enables robots to decipher contact information that is transmitted as vibrations along rigid objects. This paper reports on extensive experiments using the BioTac micro-vibration sensor and a new event dynamic sensor, the NUSkin, capable of multi-taxel sensing at 4 kHz. We demonstrate that fine localization on a held rod is possible using our approach (with errors less than 1 cm on a 20 cm rod). Next, we show that vibro-tactile perception can lead to reasonable grasp stability prediction during object handover, and accurate food identification using a standard fork. We find that multi-taxel vibro-tactile sensing at a sufficiently high sampling rate (above 2 kHz) led to the best performance across the various tasks and objects. Taken together, our results provide both evidence and guidelines for using vibro-tactile perception to extend tactile perception, which we believe will lead to enhanced competency with tools and better physical human-robot interaction.

翻訳日:2021-06-02 14:05:35 公開日:2021-06-01

# ClustRank: クラスタパターンによる散布図のソートのための、知覚データで学習した視覚品質尺度

ClustRank: a Visual Quality Measure Trained on Perceptual Data for Sorting Scatterplots by Cluster Patterns ( http://arxiv.org/abs/2106.00599v1 )

ライセンス: Link先を確認

Mostafa Abbas, Ehsan Ullah, Abdelkader Baggag, Halima Bensmail, Michael Sedlmair, Michael Aupetit

(参考訳) 視覚品質尺度(VQM)は、可視化に現れるパターンを自動的に検出し定量化することで、アナリストを支援するように設計されている。本研究では,視認可能なグループ化パターンに従って散布図をランク付けする,ClustRankと呼ばれる新しいデータ駆動手法を提案する。本モデルはまず,散布図をガウス混合モデルのパラメトリック空間に符号化し,次に人間の判断データで学習した分類器を用いてグループ化パターンの知覚的複雑性を推定する。初期混合成分の数と最終的に結合されたグループの数が散布図のランクを決定する。ClustRankは、2ガウスのクラスタパターンに対する人間の判断を模倣することで既存のVQM技術を改善し、散布図における一般的なクラスタパターンをランク付けする際の精度を高める。我々は,専門家が大量の散布図の視覚的解析に依存する領域であるゲノムワイド関連解析のための血縁関係データを分析することで,その利点を実証する。3つのベンチマークデータセットとClustRank VQMを、実用およびさらなる改善のために公開する。

Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new data-driven technique called ClustRank that allows ranking scatterplots according to visible grouping patterns. Our model first encodes scatterplots in the parametric space of a Gaussian Mixture Model, and then uses a classifier trained on human judgment data to estimate the perceptual complexity of grouping patterns. The numbers of initial mixture components and final combined groups determine the rank of the scatterplot. ClustRank improves on existing VQM techniques by mimicking human judgments on two-Gaussian cluster patterns and gives more accuracy when ranking general cluster patterns in scatterplots. We demonstrate its benefit by analyzing kinship data for genome-wide association studies, a domain in which experts rely on the visual analysis of large sets of scatterplots. We make the three benchmark datasets and the ClustRank VQM available for practical use and further improvements.

翻訳日:2021-06-02 14:05:12 公開日:2021-06-01

# 深層学習を用いた、スキャンされた胎児心拍陣痛図記録からの分娩時の予防可能な胎児機能不全の検出

Detection of preventable fetal distress during labor from scanned cardiotocogram tracings using deep learning ( http://arxiv.org/abs/2106.00628v1 )

ライセンス: Link先を確認

Martin G. Frasch, Shadrian B. Strong, David Nilosek, Joshua Leaverton, Barry S. Schifrin

(参考訳) 分娩時に広く適用されているにもかかわらず、電子胎児モニタリング(EFM)の価値についてはかなりの議論が続いている。EFMは、母体の子宮収縮と併せて胎児心拍数(FHR)パターンを監視するものであり、胎児の行動や、酸素化および灌流の低下の脅威に関する豊富なデータを提供する。有害な転帰は、例外なく、FHRパターン情報に適時に対応できなかったことと胎児損傷とを結び付けている。従来、デジタル保存されたEFMデータは、その時点または事後の議論と検査のためのラスタライズされたPDF画像としてのみ利用可能である。しかし実際には、体系的にレビューされることはめったにない。本研究は,50年以上の診療で収集された、有害な転帰と関連付けられたEFMの独自のアーカイブを用いて,初期または過去の胎児損傷の学習および検出のための深層学習フレームワークを提案する。分娩中の早期の予防可能な胎児損傷の識別精度は94%であった。このフレームワークは、分娩のストレス下で胎児の健康を維持するための早期警告および意思決定支援システムの自動化に適している。最終的には、そのようなシステムにより、医師が分娩中に適時に対応し、有害な転帰を防ぐことが可能になる。有害な転帰が回避できない場合には、新生児の早期神経保護治療への指針を提供することができる。

Despite broad application during labor and delivery, there remains considerable debate about the value of electronic fetal monitoring (EFM). EFM includes the surveillance of the fetal heart rate (FHR) patterns in conjunction with the maternal uterine contractions, providing a wealth of data about fetal behavior and the threat of diminished oxygenation and perfusion. Adverse outcomes universally associate a fetal injury with the failure to respond in a timely manner to FHR pattern information. Historically, the EFM data, stored digitally, are available only as rasterized PDF images for contemporary or historical discussion and examination. In reality, however, they are rarely reviewed systematically. Using a unique archive of EFM collected over 50 years of practice in conjunction with adverse outcomes, we present a deep learning framework for training and detection of incipient or past fetal injury. We report 94% accuracy in identifying early, preventable fetal injury intrapartum. This framework is suited for automating an early warning and decision support system for maintaining fetal well-being during the stresses of labor. Ultimately, such a system could enable a physician to respond in a timely manner during labor and prevent adverse outcomes. When adverse outcomes cannot be avoided, it can provide guidance to the early neuroprotective treatment of the newborn.

翻訳日:2021-06-02 14:04:08 公開日:2021-06-01

# 音像定位のためのデュアル正規化マルチタスキング

Dual Normalization Multitasking for Audio-Visual Sounding Object Localization ( http://arxiv.org/abs/2106.00180v1 )

ライセンス: Link先を確認

Tokuhiro Nishikawa, Daiki Shimada, Jerry Jun Yokono

(参考訳) 未訓練映像における視聴覚音源の定位に関するいくつかの研究が報告されているが、その性能を定量的に評価するためのデータセットやメトリクスは提案されていない。音源定位のための基礎的真理を定義することは, 音源の位置は音源の範囲に限らず, 振動が周囲の物体を伝播・伝播させるため, 困難である。そこで本研究では,音の視的位置の曖昧さを低減し,幅広い音源の位置をアノテートする新しい概念であるサウンド・オブジェクトを提案する。定量的評価のためのメトリクスを新たに提案し,AVSOL(Audio-Visual Sounding Object Localization)の問題を定式化する。また、よく知られたAVEデータセットのテストセットを手動でアノテートすることで、評価データセット(AVSOL-Eデータセット)を作成しました。本稿では,この新たなavsol問題に対処するために,オーディオ・ビジュアル対応 (avc) タスクとビデオイベントの分類タスクを1つのオーディオ・ビジュアル類似度マップに集約する,デュアル・ノーマライズ・マルチタスク (dnm) と呼ばれる新しいマルチタスク・トレーニング戦略とアーキテクチャを提案する。 DNMによる両監視を効率的に活用することにより,提案アーキテクチャはベースライン法よりも大幅に優れる。

Although several research works have been reported on audio-visual sound source localization in unconstrained videos, no datasets and metrics have been proposed in the literature to quantitatively evaluate its performance. Defining the ground truth for sound source localization is difficult, because the location where the sound is produced is not limited to the range of the source object, but the vibrations propagate and spread through the surrounding objects. Therefore we propose a new concept, Sounding Object, to reduce the ambiguity of the visual location of sound, making it possible to annotate the location of the wide range of sound sources. With newly proposed metrics for quantitative evaluation, we formulate the problem of Audio-Visual Sounding Object Localization (AVSOL). We also created the evaluation dataset (AVSOL-E dataset) by manually annotating the test set of the well-known Audio-Visual Event (AVE) dataset. To tackle this new AVSOL problem, we propose a novel multitask training strategy and architecture called Dual Normalization Multitasking (DNM), which aggregates the Audio-Visual Correspondence (AVC) task and the classification task for video events into a single audio-visual similarity map. By efficiently utilizing both supervisions through DNM, our proposed architecture significantly outperforms the baseline methods.

翻訳日:2021-06-02 14:03:51 公開日:2021-06-01

# マルチエージェントマルコフ確率ゲームにおけるグラディエントプレイ:静止点と収束

Gradient Play in Multi-Agent Markov Stochastic Games: Stationary Points and Convergence ( http://arxiv.org/abs/2106.00198v1 )

ライセンス: Link先を確認

Runyu Zhang, Zhaolin Ren, Na Li

(参考訳) エージェント間で共有される現在の状態情報に基づいて決定を独立に行うことにより、各エージェントが自己の合計割引報酬を最大化しようとする確率ゲーム(SGs)としても知られるマルチエージェントタブラマルコフ決定プロセス(MDPs)の勾配プレイアルゴリズムの性能について検討する。ポリシーは、ある状態において特定のアクションを選択する確率によって直接パラメータ化される。 nash平衡(nes)と1次定常ポリシーがこの設定において等価であることを示し、マルコフポテンシャルゲームと呼ばれるマルチエージェントmdpのサブクラスに対して、非漸近的大域収束率解析を$\epsilon$-neに与える。その結果,エージェント数で指数関数的にではなく,$\epsilon$-neに達するイテレーションの数は線形にスケールすることがわかった。局所幾何学や局所安定性も考慮される。マルコフポテンシャルゲームに対しては、厳密な NE が全ポテンシャル関数の局所極大であり、完全混合の NE がサドル点であることを証明する。さらに、より一般的な設定では、厳格なnes周辺の局所収束率も与えます。

We study the performance of the gradient play algorithm for multi-agent tabular Markov decision processes (MDPs), which are also known as stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first order stationary policies are equivalent in this setting, and give a non-asymptotic global convergence rate analysis to an $\epsilon$-NE for a subclass of multi-agent MDPs called Markov potential games, which includes the cooperative setting with identical rewards among agents as an important special case. Our result shows that the number of iterations to reach an $\epsilon$-NE scales linearly, instead of exponentially, with the number of agents. Local geometry and local stability are also considered. For Markov potential games, we prove that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points. We also give a local convergence rate around strict NEs for more general settings.

翻訳日:2021-06-02 14:01:58 公開日:2021-06-01
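
To make the gradient play update concrete, here is a minimal sketch for the simplest special case: a single-state, two-agent game with identical rewards, which is a (stateless) potential game. Each agent holds a directly parameterized policy, takes a gradient step on its own expected reward, and projects back onto the probability simplex. The reward matrix, step size, and function names are illustrative assumptions, and exact gradients are used instead of sampled ones; the paper's multi-state analysis is not reproduced here.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, tau = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            tau = t  # the last index passing this check defines the shift
    return [max(x - tau, 0.0) for x in v]


# Shared reward: both agents receive REWARD[a][b] when they play actions (a, b).
REWARD = [[2.0, 0.0],
          [0.0, 1.0]]


def gradient_play(p, q, eta=0.1, iterations=100):
    """Simultaneous projected gradient ascent by two independent agents."""
    for _ in range(iterations):
        # Gradient of the expected reward p^T R q with respect to each agent's own policy.
        grad_p = [sum(REWARD[a][b] * q[b] for b in range(2)) for a in range(2)]
        grad_q = [sum(REWARD[a][b] * p[a] for a in range(2)) for b in range(2)]
        p = project_simplex([p[a] + eta * grad_p[a] for a in range(2)])
        q = project_simplex([q[b] + eta * grad_q[b] for b in range(2)])
    return p, q


p_hi, q_hi = gradient_play([0.6, 0.4], [0.6, 0.4])  # ends at the NE (action 0, action 0)
p_lo, q_lo = gradient_play([0.2, 0.8], [0.2, 0.8])  # ends at the NE (action 1, action 1)
```

In this toy game the two pure equilibria attract nearby starting points, while the fully mixed point p = q = (1/3, 2/3) is stationary but unstable, which mirrors the local-maximum versus saddle-point distinction mentioned in the abstract.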

# ベイズメタラーニングにおける認識不確実性の情報理論解析

Information-Theoretic Analysis of Epistemic Uncertainty in Bayesian Meta-learning ( http://arxiv.org/abs/2106.00252v1 )

ライセンス: Link先を確認

Sharu Theresa Jose, Sangwoo Park, Osvaldo Simeone

(参考訳) 訓練された予測器の全体的な予測の不確実性は、認識論的不確実性と偶然的不確実性それぞれに起因する寄与に分解することができる。ベイズ的定式化の下では、モデルが正しく特定されていると仮定すると、2つの寄与は情報理論的な量を用いて(対数損失については)厳密に表現でき、(より一般的な損失については)上界を与えられる(Xu and Raginsky, 2020)。本稿では,ベイズメタラーニングというより広い設定において、情報理論の枠組みで認識論的不確実性を考察する。ハイパーパラメータがモデルパラメータのタスクごとの事前分布を決定する、一般的な階層ベイズモデルを仮定する。最適なメタ学習規則の最小過剰メタリスク(MEMR)によって定量化される認識論的不確実性に対して、(対数損失については)厳密な特徴付けを、(より一般的な損失については)上界を導出する。この特徴付けを利用して、認識論的不確実性がタスク数およびタスクごとの訓練データ量にどのように依存するかについての洞察を得る。ニューラル相互情報量推定器を用いて評価した提案の情報理論的上界と,Langevin-Stein Bayesian Meta-Learning(LS-BML)と呼ばれる新しい近似的完全ベイズメタラーニング戦略の性能とを比較する実験を示す。

The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic quantities (Xu and Raginsky, 2020). This paper addresses the study of epistemic uncertainty within an information-theoretic framework in the broader setting of Bayesian meta-learning. A general hierarchical Bayesian model is assumed in which hyperparameters determine the per-task priors of the model parameters. Exact characterizations (for the log-loss) and bounds (for more general losses) are derived for the epistemic uncertainty, quantified by the minimum excess meta-risk (MEMR), of optimal meta-learning rules. This characterization is leveraged to bring insights into the dependence of the epistemic uncertainty on the number of tasks and on the amount of per-task training data. Experiments are presented that compare the proposed information-theoretic bounds, evaluated via neural mutual information estimators, with the performance of a novel approximate fully Bayesian meta-learning strategy termed Langevin-Stein Bayesian Meta-Learning (LS-BML).

翻訳日:2021-06-02 14:01:36 公開日:2021-06-01
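
As background for the decomposition mentioned in this abstract, the conventional (single-task) log-loss case can be written as a standard entropy identity. The notation is assumed here and not taken from the paper: $W$ is the model parameter, $D$ the training data, and $(X, Y)$ a test input and label, with $Y$ conditionally independent of $D$ given $(X, W)$.

```latex
% Total predictive uncertainty = aleatoric part + epistemic part (log-loss)
\underbrace{H(Y \mid X, D)}_{\text{total}}
  = \underbrace{H(Y \mid X, W)}_{\text{aleatoric}}
  + \underbrace{I(W; Y \mid X, D)}_{\text{epistemic}}
```

The identity follows from $I(W; Y \mid X, D) = H(Y \mid X, D) - H(Y \mid X, W, D)$ together with the conditional independence assumption. The meta-learning quantity studied in the paper (MEMR) extends the epistemic term to a hierarchy with hyperparameters; its exact form is not reproduced here.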

# 情報リスク最小化によるマシンアンラーニングのための統合PAC-Bayesianフレームワーク

A unified PAC-Bayesian framework for machine unlearning via information risk minimization ( http://arxiv.org/abs/2106.00265v1 )

ライセンス: Link先を確認

Sharu Theresa Jose, Osvaldo Simeone

(参考訳) マシンアンラーニング(machine unlearning)とは、スクラッチから再訓練するコストをかけずに、要求に応じて訓練データの一部の影響を訓練済みモデルから取り除くメカニズムを指す。本稿では,マシンアンラーニングのための統一的なPAC-Bayesianフレームワークを開発し,変分アンラーニング(Nguyen et al., 2020)と忘却ラグランジアン(Golatkar et al., 2020)という2つの最近の設計原則を,情報リスク最小化問題(Zhang, 2006)として導出する。したがって、どちらの基準も、自由エネルギー指標の形をとる、アンラーニング後のモデルのテスト損失に対するPAC-Bayesian上界として解釈できる。

Machine unlearning refers to mechanisms that can remove the influence of a subset of training data upon request from a trained model without incurring the cost of re-training from scratch. This paper develops a unified PAC-Bayesian framework for machine unlearning that recovers the two recent design principles - variational unlearning (Nguyen et al., 2020) and forgetting Lagrangian (Golatkar et al., 2020) - as information risk minimization problems (Zhang, 2006). Accordingly, both criteria can be interpreted as PAC-Bayesian upper bounds on the test loss of the unlearned model that take the form of free energy metrics.

翻訳日:2021-06-02 14:01:12 公開日:2021-06-01

# H-FL:フェデレートラーニングのための階層的コミュニケーション効率とプライバシ保護アーキテクチャ

H-FL: A Hierarchical Communication-Efficient and Privacy-Protected Architecture for Federated Learning ( http://arxiv.org/abs/2106.00275v1 )

ライセンス: Link先を確認

He Yang

(参考訳) 連合学習(federated learning:fl)の長年の目標は、厳密なプライバシー保証と、比較的高いモデル精度を維持しながら、低い通信オーバーヘッドを必要とする。しかし、すべての目標を同時に達成することは極めて難しい。本稿では,この課題に対処するため,階層型連合学習(H-FL)と呼ばれる新しい枠組みを提案する。トレーニングデータの統計的不均一性によるモデル性能の劣化を考慮し、クライアントを適切に配置し、仲介者を利用してクライアントのローカルトレーニングを再構成する実行時分布再構築戦略を考案する。さらに,H-FLに組み込まれた圧縮補正機構を設計し,モデル性能を犠牲にすることなく通信オーバーヘッドを低減する。さらに,プライバシの保証を提供するために,ローカルトレーニングを実施しながらディファレンシャルプライバシを導入し,モデルの一部のみに適度なノイズを注入する。実験結果から,H-FLフレームワークは実世界の画像認識タスクに対して,異なるデータセット上での最先端性能を実現することがわかった。

The longstanding goals of federated learning (FL) require rigorous privacy guarantees and low communication overhead while holding a relatively high model accuracy. However, simultaneously achieving all the goals is extremely challenging. In this paper, we propose a novel framework called hierarchical federated learning (H-FL) to tackle this challenge. Considering the degradation of the model performance due to the statistical heterogeneity of the training data, we devise a runtime distribution reconstruction strategy, which reallocates the clients appropriately and utilizes mediators to rearrange the local training of the clients. In addition, we design a compression-correction mechanism incorporated into H-FL to reduce the communication overhead while not sacrificing the model performance. To further provide privacy guarantees, we introduce differential privacy while performing local training, which injects a moderate amount of noise into only part of the complete model. Experimental results show that our H-FL framework achieves state-of-the-art performance on different datasets for the real-world image recognition tasks.

翻訳日:2021-06-02 14:00:56 公開日:2021-06-01

# 可逆サロゲートモデル:可逆ニューラルネットワークによるレーザー-ウェークフィールド加速の合同サロゲートモデルと再構成

Invertible Surrogate Models: Joint surrogate modelling and reconstruction of Laser-Wakefield Acceleration by invertible neural networks ( http://arxiv.org/abs/2106.00432v1 )

ライセンス: Link先を確認

Friedrich Bethke, Richard Pausch, Patrick Stiller, Alexander Debus, Michael Bussmann, Nico Hoffmann

(参考訳) 可逆ニューラルネットワークは、前と逆モードで実行できる、機械学習の有望なニューラルネットワークアーキテクチャにおける最近の技術である。本稿では,レーザープラズマ加速器(iLWFA)に係わる物理の複雑な前方シミュレーションを近似する,可逆サロゲートモデルを導入する。代理モデルの客観的設計は、実験的に得られた診断を再構築するためのあらゆる手段を提供する。我々の逆レーザーウェイクフィールド加速ネットワークの品質は、大規模な数値LWFAシミュレーションで検証される。

Invertible neural networks are a recent technique in machine learning promising neural network architectures that can be run in forward and reverse mode. In this paper, we will be introducing invertible surrogate models that approximate complex forward simulation of the physics involved in laser plasma accelerators: iLWFA. The bijective design of the surrogate model also provides all means for reconstruction of experimentally acquired diagnostics. The quality of our invertible laser wakefield acceleration network will be verified on a large set of numerical LWFA simulations.

翻訳日:2021-06-02 14:00:41 公開日:2021-06-01
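
The abstract does not say which invertible architecture is used, so the following is only a generic illustration of how a network can be run in both forward and reverse mode: a single affine coupling layer, a common building block of invertible networks. The tiny `scale_net` and `shift_net` functions are hypothetical stand-ins for learned sub-networks.

```python
import math


def scale_net(x1):
    """Stand-in for a learned log-scale sub-network (any function of x1 works)."""
    return math.tanh(0.5 * x1)


def shift_net(x1):
    """Stand-in for a learned shift sub-network."""
    return 2.0 * x1 + 1.0


def forward(x):
    """Forward mode: keep x1, transform x2 conditioned on x1."""
    x1, x2 = x
    return (x1, x2 * math.exp(scale_net(x1)) + shift_net(x1))


def inverse(y):
    """Reverse mode: undo the transform exactly, without inverting the sub-networks."""
    y1, y2 = y
    return (y1, (y2 - shift_net(y1)) * math.exp(-scale_net(y1)))


x = (0.3, -1.7)
y = forward(x)
x_recovered = inverse(y)
```

The design point is that invertibility comes from the layer structure, not from the sub-networks themselves, so those can be arbitrary functions. In a surrogate-modelling setting the forward direction would map simulation parameters to diagnostics and the reverse direction would reconstruct parameters from diagnostics.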

# 実世界の画像復元と超解像のための2段階領域適応トレーニング

Two-stage domain adapted training for better generalization in real-world image restoration and super-resolution ( http://arxiv.org/abs/2106.00504v1 )

ライセンス: Link先を確認

Cansu Korkmaz, A.Murat Tekalp, Zafer Dogan

(参考訳) 逆問題では、エンドツーエンドのトレーニングされたネットワークがトレーニングセットに見られる劣化モデルに過剰に適合していること、すなわち、それらは他のタイプの劣化にうまく一般化しないことがよく知られている。近年,未知フィルタによりサンプリングされた画像を,ビキュービックにダウンサンプリングされたルックアライクな画像にマッピングする手法が提案されている。本稿では,まず入力された劣化した画像を中間領域にマッピングし,次いでその中間領域から出力画像を生成するための第2のネットワークを訓練することにより,任意の逆問題を定式化できることを示す。さらに、最適な中間領域はタスクによって異なる場合がある。実験の結果, この2段階のドメイン適応トレーニング戦略は, 未知の劣化のクラスに対してより良い結果を得るだけでなく, 他の未知の劣化クラスにも一般化できることがわかった。

It is well-known that in inverse problems, end-to-end trained networks overfit the degradation model seen in the training set, i.e., they do not generalize to other types of degradations well. Recently, an approach to first map images downsampled by unknown filters to bicubically downsampled look-alike images was proposed to successfully super-resolve such images. In this paper, we show that any inverse problem can be formulated by first mapping the input degraded images to an intermediate domain, and then training a second network to form output images from these intermediate images. Furthermore, the best intermediate domain may vary according to the task. Our experimental results demonstrate that this two-stage domain-adapted training strategy does not only achieve better results on a given class of unknown degradations but can also generalize to other unseen classes of degradations better.

翻訳日:2021-06-02 14:00:07 公開日:2021-06-01

# 限定的な通信と差分プライバシーを持つ無線フェデレーション学習

Wireless Federated Learning with Limited Communication and Differential Privacy ( http://arxiv.org/abs/2106.00564v1 )

ライセンス: Link先を確認

Amir Sonee and Stefano Rini and Yu-Chih Huang

(参考訳) 本稿では,空力計算(AirComp)に基づくフェデレーション学習(FL)モデルにおいて,ローカルデータセットの効率的な通信と差分プライバシー(DP)における次元性低減の役割について検討する。より正確には、ガウスマルチアクセスチャネル(gmac)上のパラメータサーバ(ps)との同時チャネル認識と限定的な通信により、クライアントが機械学習モデルをトレーニングするように促されるfl設定を考える。この設定のために、局所勾配に基づいて与えられた損失関数の最小値をトレーニングするためのフェデレート確率勾配降下(FedSGD)、局所的な更新の次元を小さくするためのジョンソン・リンデンシュトラウス(JL)ランダムプロジェクション、ユーザのプライバシーをさらに支援するための人工ノイズを適用するアルゴリズムを提案する。本手法では,各次元に大きなノイズを注入し,ベクトルの感度を一定に保ちながら,局所DP性能が主に向上していることが示唆された。これは次元減少のない場合と比較して収束速度が遅くなるのに対してである。コンバージェンスが遅いため、プライバシとコンバージェンスの間のトレードオフは高いが、高次元のシステムでは、通信コストをはるかに少なくしてほぼ同じトレードオフが発生することが示されている。

This paper investigates the role of dimensionality reduction in efficient communication and differential privacy (DP) of the local datasets at the remote users for an over-the-air computation (AirComp)-based federated learning (FL) model. More precisely, we consider the FL setting in which clients are prompted to train a machine learning model by simultaneous channel-aware and limited communications with a parameter server (PS) over a Gaussian multiple-access channel (GMAC), so that transmissions sum coherently at the PS globally aware of the channel coefficients. For this setting, an algorithm is proposed based on applying federated stochastic gradient descent (FedSGD) for training the minimum of a given loss function based on the local gradients, Johnson-Lindenstrauss (JL) random projection for reducing the dimension of the local updates, and artificial noise to further aid users' privacy. For this scheme, our results show that the local DP performance is mainly improved due to injecting noise of greater variance on each dimension while keeping the sensitivity of the projected vectors unchanged. This is while the convergence rate is slowed down compared to the case without dimensionality reduction. As the performance outweighs the slower convergence, the trade-off between privacy and convergence is higher but is shown to lessen in the high-dimensional regime, yielding almost the same trade-off with much less communication cost.

翻訳日:2021-06-02 13:59:52 公開日:2021-06-01
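
The combination of JL random projection and artificial noise described in this abstract can be illustrated with a small sketch. It is written under simplifying assumptions and is not the paper's scheme: all clients share one Gaussian projection matrix (scaled by 1/sqrt(k)), each adds independent Gaussian noise to its projected gradient, and the multiple-access channel is idealized as a plain sum with no fading or channel noise. Function names, dimensions, and noise levels are made up for the example.

```python
import math
import random


def make_jl_matrix(d, k, seed):
    """Shared k-by-d Gaussian matrix; the 1/sqrt(k) scaling preserves norms on average."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Reduce a d-dimensional update to k dimensions."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def privatize(vec, sigma, rng):
    """Add independent Gaussian noise to every projected coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def aggregate(client_gradients, matrix, sigma, seed=1):
    """Idealized over-the-air sum of the clients' projected, noisy updates."""
    rng = random.Random(seed)
    total = [0.0] * len(matrix)
    for grad in client_gradients:
        update = privatize(project(matrix, grad), sigma, rng)
        total = [t + u for t, u in zip(total, update)]
    return total


d, k = 200, 100
data_rng = random.Random(42)
g1 = [data_rng.gauss(0.0, 1.0) for _ in range(d)]
g2 = [data_rng.gauss(0.0, 1.0) for _ in range(d)]
jl = make_jl_matrix(d, k, seed=0)

norm_ratio = l2_norm(project(jl, g1)) / l2_norm(g1)  # close to 1 with high probability
noisy_sum = aggregate([g1, g2], jl, sigma=0.5)
clean_sum = aggregate([g1, g2], jl, sigma=0.0)
```

Because the projection is linear, the sum of projected updates equals the projection of the summed gradient, which is what lets the server work directly with the superimposed signal. Each client sends k values instead of d.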

# 機械学習に基づく物理イベント生成に関する調査

A survey of machine learning-based physics event generation ( http://arxiv.org/abs/2106.00643v1 )

ライセンス: Link先を確認

Yasir Alanazi, N. Sato, Pawel Ambrozewicz, Astrid N. Hiller Blin, W. Melnitchouk, Marco Battaglieri, Tianbo Liu, Yaohang Li

(参考訳) 高エネルギー核および素粒子物理学における事象生成子は、粒子反応の研究を促進する上で重要な役割を果たす。物理イベントジェネレータの構築における機械学習(ML)の取り組みの現状について調査する。 MLベースのイベントジェネレータで使用されるML生成モデルとその特定の課題について検討し、これらの課題を克服するために、MLモデル設計に物理を組み込む様々なアプローチについて議論する。最後に,ML技術に基づく物理イベント生成のための超解像,忠実度,外挿に関するオープンな質問について検討する。

Event generators in high-energy nuclear and particle physics play an important role in facilitating studies of particle reactions. We survey the state-of-the-art of machine learning (ML) efforts at building physics event generators. We review ML generative models used in ML-based event generators and their specific challenges, and discuss various approaches of incorporating physics into the ML model designs to overcome these challenges. Finally, we explore some open questions related to super-resolution, fidelity, and extrapolation for physics event generation based on ML technology.

翻訳日:2021-06-02 13:58:58 公開日:2021-06-01

# フォグベースのIoTにおける通信性能とエネルギー利用向上のための強化学習手法

A reinforcement learning approach to improve communication performance and energy utilization in fog-based IoT ( http://arxiv.org/abs/2106.00654v1 )

ライセンス: Link先を確認

Babatunji Omoniwa, Maxime Gueriau and Ivana Dusparic

(参考訳) 近年の研究では、利用可能なモバイルフォグデバイス(スマートフォン、ドローン、国内および産業用ロボットなど)をリレーとして、センサーと目的地デバイス間の通信停止を最小限に抑える可能性を実証している。しかし、移動中のリレーは移動時にエネルギーを減らし、遠隔地へ送信する。したがって、中継装置の電力制御機構とインテリジェントモビリティは、通信性能とエネルギー利用の改善に不可欠である。本稿では,各移動式フォグ中継エージェント(MFRA)を,強化学習を用いて通信性能とエネルギー利用を同時に向上させる自律エージェントによって制御する,Qラーニングに基づく分散型アプローチを提案する。それぞれの自律エージェントは、目的地とそのエネルギーレベルからのフィードバックに基づいて、メッセージの送信を継続するか、送信フェーズに受動的になるかを学習する。本手法は集中型アプローチと比較し,MFRAの少ない数で信頼性の高いデータ配信を実現し,全体のエネルギーコストを 56.76\% -- 88.03\% 削減できることを示した。

Recent research has shown the potential of using available mobile fog devices (such as smartphones, drones, domestic and industrial robots) as relays to minimize communication outages between sensors and destination devices, where localized Internet-of-Things services (e.g., manufacturing process control, health and security monitoring) are delivered. However, these mobile relays deplete energy when they move and transmit to distant destinations. As such, power-control mechanisms and intelligent mobility of the relay devices are critical in improving communication performance and energy utilization. In this paper, we propose a Q-learning-based decentralized approach where each mobile fog relay agent (MFRA) is controlled by an autonomous agent which uses reinforcement learning to simultaneously improve communication performance and energy utilization. Each autonomous agent learns based on the feedback from the destination and its own energy levels whether to remain active and forward the message, or become passive for that transmission phase. We evaluate the approach by comparing with the centralized approach, and observe that with a smaller number of MFRAs, our approach is able to ensure reliable delivery of data and reduce overall energy cost by 56.76% to 88.03%.

翻訳日:2021-06-02 13:58:50 公開日:2021-06-01
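
As a rough illustration of the active-or-passive decision described in this abstract, the sketch below runs tabular Q-learning on a toy relay model. Everything about the environment is a hypothetical simplification invented for this example: two energy levels, a fixed reward for forwarding at high energy, a penalty for forwarding at low energy, and an energy level that changes at random rather than in response to the agent's actions.

```python
import random

ACTIONS = ("passive", "active")
LOW, HIGH = 0, 1  # coarse energy levels of the relay


def reward(state, action):
    """Toy feedback: forwarding pays off at high energy and is penalized at low energy."""
    if ACTIONS[action] == "passive":
        return 0.0
    return 0.8 if state == HIGH else -1.0


def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    state = rng.choice((LOW, HIGH))
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore
        else:
            action = max(range(2), key=lambda a: q[state][a])  # exploit
        r = reward(state, action)
        next_state = rng.choice((LOW, HIGH))  # energy level drawn at random in this toy
        target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Best action per energy level according to the learned Q-table."""
    return {
        name: ACTIONS[max(range(2), key=lambda a: q[s][a])]
        for name, s in (("low", LOW), ("high", HIGH))
    }


policy = greedy_policy(train())
```

In the decentralized setting of the paper, each relay would keep its own table and learn from destination feedback; this single-agent toy only shows the shape of the update rule.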

PDF登録状況（公開日: 20210601）