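Fugu-MT: arXiv Paper Translations

This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC0, CC BY, or CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated (arXiv metadata is CC0). The translations are licensed under CC BY-SA 4.0 and are produced with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use.

The papers listed here have a publication date of 20210625 (2021-06-25).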

Title	Authors	Abstract	Publication date / Translation date
# ソーシャルメディアにおける攻撃的伝播のモデル化 Modeling Aggression Propagation on Social Media ( http://arxiv.org/abs/2002.10131v3 ) ライセンス: Link先を確認	Chrysoula Terizi, Despoina Chatzakou, Evaggelia Pitoura, Panayiotis Tsaparas and Nicolas Kourtellis	(参考訳) Cyberaggressionはさまざまなコンテキストやオンラインソーシャルプラットフォームで研究され、最先端の機械学習アルゴリズムとディープラーニングアルゴリズムを使用してさまざまなデータをモデル化して、この動作の自動検出とブロックを実現している。ユーザは、自身の(オンラインの)ソーシャルサークルにおける毒性と攻撃性の増加によって、攻撃的に行動したり、他人をいじめたりすることができる。事実上、この動作は、あるユーザや近隣から別のユーザへと伝播するので、ネットワーク中に広がります。興味深いことに、我々の知る限り、攻撃的行動のネットワークダイナミクスをモデル化した研究は行われていない。本稿では,ソーシャルメディア上での攻撃の伝播を意見ダイナミクスを用いて研究することで,この方向への第一歩を踏み出す。我々は,攻撃的あるいは通常のユーザとの接続方法に応じて,攻撃性をあるユーザから別のユーザへと伝播させる方法をモデル化する方法を提案する。 Twitterデータに対する広範なシミュレーションを通じて、攻撃的行動がネットワーク内でどのように伝播するかを研究する。我々は,アノテートされた基底真理データを用いてモデルの検証を行い,最大80%のaucに到達し,研究の結果と影響について議論した。 Cyberaggression has been studied in various contexts and online social platforms, and modeled on different data using state-of-the-art machine and deep learning algorithms to enable automatic detection and blocking of this behavior. Users can be influenced to act aggressively or even bully others because of elevated toxicity and aggression in their own (online) social circle. In effect, this behavior can propagate from one user and neighborhood to another, and therefore, spread in the network. Interestingly, to our knowledge, no work has modeled the network dynamics of aggressive behavior. In this paper, we take a first step towards this direction by studying propagation of aggression on social media using opinion dynamics. We propose ways to model how aggression may propagate from one user to another, depending on how each user is connected to other aggressive or regular users. Through extensive simulations on Twitter data, we study how aggressive behavior could propagate in the network. We validate our models with crawled and annotated ground truth data, reaching up to 80% AUC, and discuss the results and implications of our work.	翻訳日:2023-06-02 05:16:42 公開日:2021-06-25
# 特定の量子積符号と鎖複体のテンソル積に対する最小距離 Minimal distances for certain quantum product codes and tensor products of chain complexes ( http://arxiv.org/abs/2007.12152v3 ) ライセンス: Link先を確認	Weilei Zeng and Leonid P. Pryadko	(参考訳) 量子誤り訂正符号への写像と部分空間射影を用いて、有限体上のベクトル空間の2つの鎖複体のテンソル積における最小ホモロジー距離に対する下界を得る。そのような複体のホモロジー群は、Künnethの定理によって記述される。複体の一方が2つの空間の間の線型写像であるとき、距離の明示的な表式を与える。この構成で現れる符号、すなわちサブシステム積符号とそのゲージ固定された変種は、いくつかの既知の量子誤り訂正符号の族を一般化する。 We use a map to quantum error-correcting codes and a subspace projection to get lower bounds for minimal homological distances in a tensor product of two chain complexes of vector spaces over a finite field. Homology groups of such a complex are described by the Künneth theorem. We give an explicit expression for the distances when one of the complexes is a linear map between two spaces. The codes in the construction, subsystem product codes and their gauge-fixed variants, generalize several known families of quantum error-correcting codes.	翻訳日:2023-05-08 10:49:56 公開日:2021-06-25
# 波長多重エンタングルメントに基づく量子暗号の実験 Experimental wavelength-multiplexed entanglement-based quantum cryptography ( http://arxiv.org/abs/2009.03691v3 ) ライセンス: Link先を確認	Johannes Pseiner, Lukas Achatz, Lukas Bulla, Martin Bohmann, Rupert Ursin	(参考訳) 最先端量子鍵分布(qkd)システムでは、鍵生成速度を増加させる主な制限因子は光子検出のタイミング分解能である。本稿では,この限界を克服する戦略を提示し,実験的に実証する。波長多重化を用いた絡み合い光子の固有波長相関を利用して、偏光絡みから量子セキュア鍵を生成する。提案手法は、多くのタイプの絡み合い源を変更することなく、ファイバーおよび衛星ベースの量子通信方式に統合することができる。この手法は、多重化されていないスキームと比較して、安全な鍵レートを数桁向上できる巨大なスケーリングポテンシャルを備えている。 In state-of-the-art quantum key distribution (QKD) systems, the main limiting factor in increasing the key generation rate is the timing resolution in detecting photons. Here, we present and experimentally demonstrate a strategy to overcome this limitation, also for high-loss and long-distance implementations. We exploit the intrinsic wavelength correlations of entangled photons using wavelength multiplexing to generate a quantum secure key from polarization entanglement. The presented approach can be integrated into both fiber- and satellite-based quantum-communication schemes, without any changes to most types of entanglement sources. This technique features a huge scaling potential allowing to increase the secure key rate by several orders of magnitude as compared to non-multiplexed schemes.	翻訳日:2023-05-03 05:15:26 公開日:2021-06-25
# 量子調和振動子スペクトル分析器 Quantum harmonic oscillator spectrum analyzers ( http://arxiv.org/abs/2010.10438v3 ) ライセンス: Link先を確認	Jonas Keller, Pan-Yu Hou, Katherine C. McCormick, Daniel C. Cole, Stephen D. Erickson, Jenny J. Wu, Andrew C. Wilson, Dietrich Leibfried	(参考訳) ノイズの特性評価と抑制は、量子領域における調和振動子の制御に不可欠である。量子調和振動子の雑音スペクトルを、振幅変調された周期駆動に対する応答をキュービットで検出することにより、低周波から振動子共鳴近傍まで測定する。捕捉したイオンの運動を用いて、合わせて500Hzから600kHzの雑音に感度を持つ2つの異なる実装を実験的に実証した。本手法を適用し、イオントラップ電位の固有ノイズスペクトルを、従来アクセスされていなかった周波数範囲で測定する。 Characterization and suppression of noise are essential for the control of harmonic oscillators in the quantum regime. We measure the noise spectrum of a quantum harmonic oscillator from low frequency to near the oscillator resonance by sensing its response to amplitude modulated periodic drives with a qubit. Using the motion of a trapped ion, we experimentally demonstrate two different implementations with combined sensitivity to noise from 500 Hz to 600 kHz. We apply our method to measure the intrinsic noise spectrum of an ion trap potential in a previously unaccessed frequency range.	翻訳日:2023-04-28 05:32:25 公開日:2021-06-25
# チェビシェフ多項式の多スピン系における何千もの中心固有値の効率的な計算への応用 Dual application of Chebyshev polynomial for efficiently computing thousands of central eigenvalues in many-spin systems ( http://arxiv.org/abs/2011.02107v2 ) ライセンス: Link先を確認	Haoyu Guan, Wenxian Zhang	(参考訳) スペクトルの統計的性質が量子カオスの本質的な特徴を与えることが知られている。したがって、中間スペクトルにおける大きな内部固有値群の計算は、量子多体系にとって重要な問題である。本稿では,システムサイズの観点から指数関数的に互いに近い何千もの中心固有値を効率よく見つけるために,チェビシェフ多項式(DACP)法の二重応用を提案する。近縮退問題に対処するため、チェビシェフ多項式を用いて、プレコンディショニングステップとして半円フィルタの指数関数を構成するとともに、所望の部分空間の基底として適切な状態の大きな集合を生成する。さらに、DACPは計算時間が要求される固有値の数に影響されないという優れた性質を持つ。イジングスピンチェーンとスピングラスシャードに関する数値実験により,提案手法の正確性と効率性が示された。以上の結果から,DACPは最先端のシフト反転法と比べて,Isingスピンチェーンでは30倍,スピングラスシャードでは8倍高速であることがわかった。メモリ要件はシステムサイズに対するスケーリングがより良好で、シフト反転法の100分の1になりうる。 It is known that the statistical properties of the spectrum provide an essential characterization of quantum chaos. The computation of a large group of interior eigenvalues at the middle spectrum is thus an important problem for quantum many-body systems. We propose a dual application of Chebyshev polynomial (DACP) method to efficiently find thousands of central eigenvalues, which are exponentially close to each other in terms of the system size. To cope with the near-degenerate problem, we use the Chebyshev polynomial to both construct an exponential of semicircle filter as the preconditioning step and generate a large set of proper states as the basis of the desired subspace. Besides, DACP has the excellent property that its computation time is not influenced by the required number of eigenvalues. Numerical experiments on Ising spin chain and spin glass shards show the correctness and efficiency of the proposed method. As our results demonstrate, DACP is a factor of 30 faster than the state-of-the-art shift-invert method for the Ising spin chain while 8 times faster for the spin glass shards. The memory requirements scale better with system size and could be a factor of 100 less than in the shift-invert approach.	翻訳日:2023-04-25 07:39:05 公開日:2021-06-25
# 事前選択・事後選択測定を伴わないスケーラブルな多光子量子メトロロジー Scalable multiphoton quantum metrology with neither pre- nor post-selected measurements ( http://arxiv.org/abs/2011.02454v2 ) ライセンス: Link先を確認	Chenglong You, Mingyuan Hong, Peter Bierhorst, Adriana E. Lita, Scott Glancy, Steve Kolthammer, Emanuel Knill, Sae Woo Nam, Richard P. Mirin, Omar S. Magana-Loaiza, Thomas Gerrits	(参考訳) 電磁場の量子統計ゆらぎは、古典的な技術で実行される光学測定の感度に対して、ショットノイズ限界と呼ばれる限界を課す。しかし、量子技術はこのショットノイズ限界に拘束されない。この点において、量子光源によって生成された全ての光子を用いて、ショットノイズ限界を超えて小さな物理パラメータを推定できる可能性は、量子光学の主要な目標の1つである。本研究では、事前選択も事後選択も伴わない測定により、広い範囲の位相にわたる量子エンハンスド光位相推定のためのスケーラブルなプロトコルを実験的に実証する。これは、光子数分解検出と組み合わせた自発的パラメトリックダウンコンバージョン源の効率的な設計によって達成される。2モード圧縮真空状態は損失に対してロバストであるため、単一光子の損失だけで量子状態から全ての位相情報が失われるN00N状態に基づくスキームを上回ることができる。N00N状態や条件付き測定に依存する他のスキームとは対照的に、高次光子対の生成と検出により、本手法の感度を向上させることができる。プロトコルのこの独自の特徴が、スケーラビリティをもたらす。我々の研究は、量子イメージング、ボソンサンプリング、量子ネットワークなどの多光子干渉に依存する量子技術にとって重要である。 The quantum statistical fluctuations of the electromagnetic field establish a limit, known as the shot-noise limit, on the sensitivity of optical measurements performed with classical technologies. However, quantum technologies are not constrained by this shot-noise limit. In this regard, the possibility of using every photon produced by quantum sources of light to estimate small physical parameters, beyond the shot-noise limit, constitutes one of the main goals of quantum optics. Here we experimentally demonstrate a scalable protocol for quantum-enhanced optical phase estimation across a broad range of phases, with neither pre- nor post-selected measurements. This is achieved through the efficient design of a source of spontaneous parametric down-conversion in combination with photon-number-resolving detection. The robustness of two-mode squeezed vacuum states against loss allows us to outperform schemes based on N00N states, in which the loss of a single photon is enough to remove all phase information from a quantum state. In contrast to other schemes that rely on N00N states or conditional measurements, the sensitivity of our technique could be improved through the generation and detection of high-order photon pairs. This unique feature of our protocol makes it scalable. Our work is important for quantum technologies that rely on multiphoton interference such as quantum imaging, boson sampling and quantum networks.	翻訳日:2023-04-25 07:23:07 公開日:2021-06-25
# ロバスト陰影推定 Robust shadow estimation ( http://arxiv.org/abs/2011.09636v2 ) ライセンス: Link先を確認	Senrui Chen, Wenjun Yu, Pei Zeng and Steven T. Flammia	(参考訳) 大規模で強結合した量子システムの特性を効率的に推定することは、多体物理学と量子情報理論の中心的な焦点である。量子コンピュータは多くのタスクでスピードアップを約束するが、短期的なデバイスではノイズが発生しやすいため、推定の精度は一般的に低下する。本稿では,Huang,Kueng,Preskillが最近提案したシャドウ推定プロトコルにおける誤りの軽減方法を紹介する。標準影推定法に実験的にフレンドリーなキャリブレーションステージを付加することにより,量子システムの古典影の偏りのない推定を得ることができ,実験条件において最小の仮定しか与えられず,サンプル効率と耐雑音性に優れた方法で多くの有用な特性を抽出することができる。我々は,本プロトコルのサンプル複雑性に厳密な限界を与え,その性能をいくつかの数値実験で実証する。 Efficiently estimating properties of large and strongly coupled quantum systems is a central focus in many-body physics and quantum information theory. While quantum computers promise speedups for many such tasks, near-term devices are prone to noise that will generally reduce the accuracy of such estimates. Here we show how to mitigate errors in the shadow estimation protocol recently proposed by Huang, Kueng, and Preskill. By adding an experimentally friendly calibration stage to the standard shadow estimation scheme, our robust shadow estimation algorithm can obtain an unbiased estimate of the classical shadow of a quantum system and hence extract many useful properties in a sample-efficient and noise-resilient manner given only minimal assumptions on the experimental conditions. We give rigorous bounds on the sample complexity of our protocol and demonstrate its performance with several numerical experiments.	翻訳日:2023-04-23 17:23:23 公開日:2021-06-25
# 散逸によって引き起こされるトポロジカル状態の障害:異なるタイプの局在遷移の証拠 Disorder in dissipation-induced topological states: Evidence for a different type of localization transition ( http://arxiv.org/abs/2011.09730v3 ) ライセンス: Link先を確認	Alon Beck, Moshe Goldstein	(参考訳) 非平衡量子相転移の探求は、しばしば駆動と散逸が効果的な温度をもたらす傾向によって妨げられ、古典的振る舞いをもたらす。散逸が非自明な量子コヒーレント定常状態へとシステムを動かすように設計されたとき、これは違うのだろうか? 本研究では,最近導入された散逸誘発チャーン位相状態に対する障害の影響を考察し,エルミート定常密度行列や絡み合いハミルトン行列の固有モードを調べることで,この問題に光を当てた。平衡と同様に、各ランダウバンドは中心付近に1つの非局在化レベルを持つ。しかし、3つの異なる有限サイズスケーリング法を用いて、非局在化状態に近づくときの局所化長のばらつきを記述する臨界指数$\nu$が、力学の非散逸部分に障害が導入された場合の平衡とは大きく異なることを示す。これは、冷原子実験で利用できる異なるタイプの非平衡量子臨界普遍性クラスを示す。 The quest for nonequilibrium quantum phase transitions is often hampered by the tendency of driving and dissipation to give rise to an effective temperature, resulting in classical behavior. Could this be different when the dissipation is engineered to drive the system into a nontrivial quantum coherent steady state? In this work we shed light on this issue by studying the effect of disorder on recently-introduced dissipation-induced Chern topological states, and examining the eigenmodes of the Hermitian steady state density matrix or entanglement Hamiltonian. We find that, similarly to equilibrium, each Landau band has a single delocalized level near its center. However, using three different finite size scaling methods we show that the critical exponent $\nu$ describing the divergence of the localization length upon approaching the delocalized state is significantly different from equilibrium if disorder is introduced into the non-dissipative part of the dynamics. This indicates a different type of nonequilibrium quantum critical universality class accessible in cold-atom experiments.	翻訳日:2023-04-23 17:16:56 公開日:2021-06-25
# 共変量子チャネルのプログラム可能性 Programmability of covariant quantum channels ( http://arxiv.org/abs/2012.00717v2 ) ライセンス: Link先を確認	Martina Gschwendtner, Andreas Bluhm, Andreas Winter	(参考訳) プログラム可能な量子プロセッサは、プログラムレジスタの状態を使用して、入力レジスタに適用される量子チャネルの集合の1つの要素を指定する。無限に多くのユニタリ量子チャネルを含む任意の集合に対して、有限次元のプログラムレジスタではそのようなデバイスが不可能であることはよく知られており(NielsenとChuangのNo-Programming Theorem)、これは万能なプログラム可能量子プロセッサが存在しないことを意味する。システムが対称性を持つ場合、状況は変化する。実際、ここでは群共変チャネルを考える。群がチャネル入力に既約に作用する場合、これらのチャネルは有限のプログラム次元のプログラム可能量子プロセッサによって厳密に実装できる(チャネルのChoi-Jamiolkowski状態をプログラムとして使用するテレポーテーションシミュレーションによる)。さらに、対称性の群作用の表現論を活用することで、プログラムの冗長性を除去する方法を示し、その結果のプログラムレジスタのヒルベルト空間次元が最小であることを証明する。加えて、全ての群共変チャネルを近似的に実装するプロセッサのプログラムレジスタ次元の上界と下界を与える。 A programmable quantum processor uses the states of a program register to specify one element of a set of quantum channels which is applied to an input register. It is well-known that such a device is impossible with a finite-dimensional program register for any set that contains infinitely many unitary quantum channels (Nielsen and Chuang's No-Programming Theorem), meaning that a universal programmable quantum processor does not exist. The situation changes if the system has symmetries. Indeed, here we consider group-covariant channels. If the group acts irreducibly on the channel input, these channels can be implemented exactly by a programmable quantum processor with finite program dimension (via teleportation simulation, which uses the Choi-Jamiolkowski state of the channel as a program). Moreover, by leveraging the representation theory of the symmetry group action, we show how to remove redundancy in the program and prove that the resulting program register has minimum Hilbert space dimension. Furthermore, we provide upper and lower bounds on the program register dimension of a processor implementing all group-covariant channels approximately.	翻訳日:2023-04-22 11:59:58 公開日:2021-06-25
# 超微細相互作用によるキタエフ物質のトーリック符号相の電気的プローブ Electric probe for the toric code phase in Kitaev materials through the hyperfine interaction ( http://arxiv.org/abs/2012.08825v5 ) ライセンス: Link先を確認	Masahiko G. Yamada, Satoshi Fujimoto	(参考訳) キタエフモデルは、ギャップのあるスピン液体相とギャップレスのスピン液体相を持つ注目すべきスピンモデルであり、イリジウム酸化物や$\alpha$-RuCl$_3$で実現される可能性がある。最近の$\alpha$-RuCl$_3$の実験では、系の$C_3$対称性を破る、ギャップのあるトーリック符号相へのネマティック遷移の兆候が、熱容量の角度依存性を通じて観測されている。本稿では、ネマティック遷移を電気的に検出する機構を提案する。$J_\textrm{eff}=1/2$スピンは電気四極子モーメント(EQM)を持たないため、これは一見不可能に思える。しかし、2次摂動では非ゼロのEQMを持つ仮想状態が現れ、核磁気共鳴とメスバウアー分光によってネマティック秩序パラメータが検出可能になる。EQMの起源が純粋に磁気的である点は従来の電子ネマティック相とは異なり、キタエフのトーリック誤り訂正符号の実現を直接検出することを可能にする。 The Kitaev model is a remarkable spin model with gapped and gapless spin liquid phases, which are potentially realized in iridates and $\alpha$-RuCl$_3$. In the recent experiment of $\alpha$-RuCl$_3$, the signature of a nematic transition to the gapped toric code phase, which breaks the $C_3$ symmetry of the system, has been observed through the angle dependence of the heat capacity. We here propose a mechanism by which the nematic transition can be detected electrically. This is seemingly impossible because $J_\textrm{eff}=1/2$ spins do not have an electric quadrupole moment (EQM). However, in the second-order perturbation the virtual state with a nonzero EQM appears, which makes the nematic order parameter detectable by nuclear magnetic resonance and Mössbauer spectroscopy. The purely magnetic origin of EQM is different from conventional electronic nematic phases, allowing the direct detection of the realization of Kitaev's toric error-correction code.	翻訳日:2023-04-20 11:21:06 公開日:2021-06-25
# スケーラブルな量子コンピューティングアーキテクチャのためのフローティング可変カプラ Floating tunable coupler for scalable quantum computing architectures ( http://arxiv.org/abs/2103.07030v2 ) ライセンス: Link先を確認	Eyob A. Sete, Angela Q. Chen, Riccardo Manenti, Shobhan Kulshreshtha, and Stefano Poletto	(参考訳) ゼロ結合条件を達成するために、量子ビット間の直接結合容量に依存しないフローティング可変カプラを提案する。量子ビットとカプラの結合の極性は、そうでなければ一定である量子ビット間結合を相殺し、カプラ周波数が量子ビット周波数より上または下にあるときにゼロ結合条件に達するように設計できることを示す。量子ビットに対するカプラの超伝導パッドの対称および非対称配置を実装し、可変カプラのこれら2つの動作領域を実験的に実証した。このようなフローティング可変カプラは、常時オンの残留結合を低減しつつ、大規模量子プロセッサを設計する柔軟性を提供する。 We propose a floating tunable coupler that does not rely on direct qubit-qubit coupling capacitances to achieve the zero-coupling condition. We show that the polarity of the qubit-coupler couplings can be engineered to offset the otherwise constant qubit-qubit coupling and attain the zero-coupling condition when the coupler frequency is above or below the qubit frequencies. We experimentally demonstrate these two operating regimes of the tunable coupler by implementing symmetric and asymmetric configurations of the coupler's superconducting pads with respect to the qubits. Such a floating tunable coupler provides flexibility in designing large-scale quantum processors while reducing the always-on residual couplings.	翻訳日:2023-04-08 08:57:07 公開日:2021-06-25
# プレサーマルなフロケ時間結晶の臨界特性 Critical properties of the prethermal Floquet Time Crystal ( http://arxiv.org/abs/2103.10818v2 ) ライセンス: Link先を確認	Muath Natsheh, Andrea Gambassi, Aditi Mitra	(参考訳) プレサーマル相におけるフロケ時間結晶の形成を特徴付ける臨界特性を、周期的に駆動された$O(N)$模型において解析的に調べる。特に、周期同期したダイナミクスを示し長距離空間秩序を持たない自明相と、長距離空間秩序が周期倍化のダイナミクスを伴う非自明相とを分ける臨界線に着目する。臨界線の近傍において、次元展開と$N\to\infty$での厳密解を組み合わせることにより、等時相関関数の空間相関長の発散を特徴付ける指数$\nu$、秩序パラメータの振幅の成長を特徴付ける指数$\beta$、および自明相の奥深くから臨界線へクエンチしたときのエージングダイナミクスの初期スリップ指数$\theta$を決定する。指数$\nu, \beta, \theta$は駆動がない場合と同一であることが分かる。また、エージングの関数形は、系を観測する時刻が駆動周期に比べて短いか長いかに依存することが分かる。臨界線近傍での摂動ポテンシャルに対する線形応答として得られる2点相関関数の空間構造は、駆動がない場合よりも長距離の代数的減衰を示し、周期倍化しているだけでなく、波数ベクトル$\omega/(2 v)$で空間的に振動することも見出された。ここで$v$は準粒子の速度、$\omega$は駆動周波数である。 The critical properties characterizing the formation of the Floquet time crystal in the prethermal phase are investigated analytically in the periodically driven $O(N)$ model. In particular, we focus on the critical line separating the trivial phase with period synchronized dynamics and absence of long-range spatial order from the non-trivial phase where long-range spatial order is accompanied by period-doubling dynamics. In the vicinity of the critical line, with a combination of dimensional expansion and exact solution for $N\to\infty$, we determine the exponent $\nu$ that characterizes the divergence of the spatial correlation length of the equal-time correlation functions, the exponent $\beta$ characterizing the growth of the amplitude of the order-parameter, as well as the initial-slip exponent $\theta$ of the aging dynamics when a quench is performed from deep in the trivial phase to the critical line. The exponents $\nu, \beta, \theta$ are found to be identical to those in the absence of the drive. In addition, the functional form of the aging is found to depend on whether the system is probed at times that are small or large compared to the drive period. The spatial structure of the two-point correlation functions, obtained as a linear response to a perturbing potential in the vicinity of the critical line, is found to show algebraic decays that are longer ranged than in the absence of a drive, and besides being period-doubled, are also found to oscillate in space at the wave-vector $\omega/(2 v)$, $v$ being the velocity of the quasiparticles, and $\omega$ being the drive frequency.	翻訳日:2023-04-07 10:53:23 公開日:2021-06-25
# Fairmandering: 公平性を最適化した選挙区割りのための列生成ヒューリスティック Fairmandering: A column generation heuristic for fairness-optimized political districting ( http://arxiv.org/abs/2103.11469v2 ) ライセンス: Link先を確認	Wes Gurnee and David B. Shmoys	(参考訳) アメリカの勝者総取り方式の連邦議会選挙区制度は、選挙区の境界を操作することで選挙結果を作り出す力を政治家に与えている。既存の計算的解法の多くは、政治的・人口統計的な入力を無視して偏りのない地図を描くことに注力し、単にコンパクトさを最適化している。我々は、コンパクトさと公平性は直交する性質であるためこれは欠陥のあるアプローチであると主張し、公平性の任意の区分線形な定義を明示的に最適化するスケーラブルな2段階法を導入する。第1段階はランダム化された分割統治型の列生成ヒューリスティックであり、グラフ分割問題の構成的構造を利用して、指数的な数の異なる区割り案を生成する。この選挙区アンサンブルは、最終案に含める選挙区を選ぶマスター選択問題への入力となる。この分離された設計により、公平性に沿った目的関数を定義する上で前例のない柔軟性が得られる。パイプラインは任意に並列化可能で、追加の区割り制約にも柔軟に対応でき、他の幅広い地域区分問題にも適用できる。連邦議会選挙区に関する過去最大規模のアンサンブル研究において、我々の手法を用いて、起こりうる期待結果の範囲と、この範囲が公平性の定義の候補に対して持つ含意を理解する。 The American winner-take-all congressional district system empowers politicians to engineer electoral outcomes by manipulating district boundaries. Existing computational solutions mostly focus on drawing unbiased maps by ignoring political and demographic input, and instead simply optimize for compactness. We claim that this is a flawed approach because compactness and fairness are orthogonal qualities, and introduce a scalable two-stage method to explicitly optimize for arbitrary piecewise-linear definitions of fairness. The first stage is a randomized divide-and-conquer column generation heuristic which produces an exponential number of distinct district plans by exploiting the compositional structure of graph partitioning problems. This district ensemble forms the input to a master selection problem to choose the districts to include in the final plan. Our decoupled design allows for unprecedented flexibility in defining fairness-aligned objective functions. The pipeline is arbitrarily parallelizable, is flexible to support additional redistricting constraints, and can be applied to a wide array of other regionalization problems. In the largest ever ensemble study of congressional districts, we use our method to understand the range of possible expected outcomes and the implications of this range on potential definitions of fairness.	翻訳日:2023-04-07 06:34:30 公開日:2021-06-25
# トーリック符号のグラフ状態表現 Graph state representation of the toric code ( http://arxiv.org/abs/2103.12268v3 ) ライセンス: Link先を確認	Pengcheng Liao, David L. Feder	(参考訳) フォールトトレラントな操作の可能性を考えると、トポロジカル量子状態は現在、激しい活動の焦点となっている。特に興味深いのは、有名なトーリック符号と等価な表面符号や平面スタビライザ符号などのトポロジカル量子誤り訂正符号である。すべてのスタビライザ状態は局所クリフォード操作下でグラフ状態にマップされるが、トポロジカルスタビライザ符号に関連するグラフは未知のままである。トーリック符号のグラフは、スターグラフ(グリーンバーガー=ホーン=ツァイリンガー状態を符号化する)とハーフグラフの2種類の部分グラフのみからなることを示す。トポロジカル秩序は複数のスターグラフの存在と同一視され、これは繰り返し符号とトーリック符号の間の関係を明らかにする。このグラフ構造からは、幾何学的に非局所なゲートを仮定すれば、状態準備のための対数深さの量子回路が容易に得られ、回路幅の増大と引き換えに、アンシラと測定を用いて定数深さに削減できる。これらの結果は、トポロジカル秩序の研究と新しいトポロジカル誤り訂正符号の開発のための新たなグラフ理論的枠組みを与える。 Given their potential for fault-tolerant operations, topological quantum states are currently the focus of intense activity. Of particular interest are topological quantum error correction codes, such as the surface and planar stabilizer codes that are equivalent to the celebrated toric code. While every stabilizer state maps to a graph state under local Clifford operations, the graphs associated with topological stabilizer codes remain unknown. We show that the toric code graph is composed of only two kinds of subgraphs: star graphs (which encode Greenberger-Horne-Zeilinger states) and half graphs. The topological order is identified with the existence of multiple star graphs, which reveals a connection between the repetition and toric codes. The graph structure readily yields a log-depth quantum circuit for state preparation, assuming geometrically non-local gates, which can be reduced to a constant depth including ancillae and measurements at the cost of increasing the circuit width. The results provide a new graph-theoretic framework for the investigation of topological order and the development of novel topological error correction codes.	翻訳日:2023-04-07 02:30:12 公開日:2021-06-25
# 断熱臨界量子メトロロジーは、断熱へのショートカットを適用してもハイゼンベルク極限に達することができない Adiabatic critical quantum metrology cannot reach the Heisenberg limit even when shortcuts to adiabaticity are applied ( http://arxiv.org/abs/2103.12939v2 ) ライセンス: Link先を確認	Karol Gietka, Friederike Metz, Tim Keller, and Jing Li	(参考訳) 臨界量子メトロロジーに対する断熱的なアプローチで得られた量子フィッシャー情報は、精度のハイゼンベルク限界につながり得ないため、最適な設定下での通常の量子メトロロジーは常に優れていることを示す。さらに, 断熱への近道は臨界基底状態の生成時間を任意に減少させることができるが, 断熱臨界量子メトロロジーにおける量子パラメータ推定のハイゼンベルク限界の達成や克服には使用できないと論じた。ケーススタディとして、ランダウ・ツェナーモデルと量子ラビモデルへの反断熱駆動の適用について検討する。 We show that the quantum Fisher information attained in an adiabatic approach to critical quantum metrology cannot lead to the Heisenberg limit of precision and therefore regular quantum metrology under optimal settings is always superior. Furthermore, we argue that even though shortcuts to adiabaticity can arbitrarily decrease the time of preparing critical ground states, they cannot be used to achieve or overcome the Heisenberg limit for quantum parameter estimation in adiabatic critical quantum metrology. As case studies, we explore the application of counter-diabatic driving to the Landau-Zener model and the quantum Rabi model.	翻訳日:2023-04-06 23:59:23 公開日:2021-06-25
# フェルミオン双対性:強い散逸と記憶を持つ開放系の一般的対称性 Fermionic duality: General symmetry of open systems with strong dissipation and memory ( http://arxiv.org/abs/2104.11202v2 ) ライセンス: Link先を確認	V. Bruch, K. Nestmann, J. Schulenborg and M. R. Wegewijs	(参考訳) 強い相互作用と広帯域リザーバーとの強い結合をともに持つ、幅広いクラスのフェルミオン開放量子系の厳密な時間発展を考える。状態の時間発展(Schrödinger)と観測量の時間発展(Heisenberg)の間の非自明なフェルミオン双対関係を示す。この非常に非直感的な関係が、クラウス測定演算子、Choi-Jamiołkowski状態、時間畳み込み型および時間畳み込みのない量子マスター方程式、一般化されたリンドブラッドジャンプ演算子など、量子ダイナミクスに対するすべての標準的アプローチにおける解析計算の中でどのように理解され、活用できるかを示す。これにより得られる、ダイナミクスの可分性と因果構造に関する洞察と、非摂動的マルコフ近似とその初期スリップ補正への応用について論じる。我々の結果は、フェルミオンモデルに対する予測が、これまで考えられていたよりもはるかに大きな程度で基本原理によって既に決定されていることを明確に示す。 We consider the exact time-evolution of a broad class of fermionic open quantum systems with both strong interactions and strong coupling to wide-band reservoirs. We present a nontrivial fermionic duality relation between the evolution of states (Schrödinger) and of observables (Heisenberg). We show how this highly nonintuitive relation can be understood and exploited in analytical calculations within all canonical approaches to quantum dynamics, covering Kraus measurement operators, the Choi-Jamiołkowski state, time-convolution and convolutionless quantum master equations and generalized Lindblad jump operators. We discuss the insights this offers into the divisibility and causal structure of the dynamics and the application to nonperturbative Markov approximations and their initial-slip corrections. Our results underscore that predictions for fermionic models are already fixed by fundamental principles to a much greater extent than previously thought.	翻訳日:2023-04-02 20:09:25 公開日:2021-06-25
# 任意の逐次アルゴリズムを可逆化する Reversify any sequential algorithm ( http://arxiv.org/abs/2105.05626v2 ) ライセンス: Link先を確認	Yuri Gurevich	(参考訳) 任意の逐次アルゴリズム$A$を可逆化するために、$A$に簿記のための機構を控えめに組み込む。その結果、$A$をステップごとに模倣し、$A$が停止するちょうどそのときに停止する、ステップごとに可逆なアルゴリズムが得られる。一般性を失うことなく、アルゴリズム$A$は、$A$と振る舞いが同一の抽象状態機械として与えられているものとする。そのような表現の存在は理論的に証明されており、その実用性も十分に実証されている。 To reversify an arbitrary sequential algorithm $A$, we gently instrument $A$ with bookkeeping machinery. The result is a step-for-step reversible algorithm that mimics $A$ step-for-step and stops exactly when $A$ does. Without loss of generality, we presume that algorithm $A$ is presented as an abstract state machine that is behaviorally identical to $A$. The existence of such representation has been proven theoretically, and the practicality of such representation has been amply demonstrated.	翻訳日:2023-03-31 08:52:24 公開日:2021-06-25
# 線形光学系におけるガウス絡み合いの分布 Distribution of Gaussian Entanglement in Linear Optical Systems ( http://arxiv.org/abs/2105.13441v2 ) ライセンス: Link先を確認	Jiru Liu, Wenchao Ge and M. Suhail Zubairy	(参考訳) 絡み合いは、多くのアプリケーションを持つ量子ネットワークを構築する上で不可欠な要素である。ネットワーク内での絡み合いの分散を理解することは、前進するための重要なステップです。本稿では,二部交絡のための新しい定量化器を用いて,線形ネットワークにおけるガウス交絡の保存と分布について検討する。本研究では, ビームスプリッタを透過率, 反射率と同等に, 絡み合いを分散できることを示す。絡み合った状態の要件と、この関係を満たすネットワークの種類は明確に示される。本研究は,量子エンタングルメントの新しい定量化とネットワーク内のエンタングルメントの構造に関するさらなる知見を提供する。 Entanglement is an essential ingredient for building a quantum network that can have many applications. Understanding how entanglement is distributed in a network is a crucial step to move forward. Here we study the conservation and distribution of Gaussian entanglement in a linear network using a new quantifier for bipartite entanglement. We show that the entanglement can be distributed through a beam-splitter in the same way as the transmittance and the reflectance. The requirements on the entangled states and the type of networks to satisfy this relation are presented explicitly. Our results provide a new quantification for quantum entanglement and further insights into the structure of entanglement in a network.	翻訳日:2023-03-29 06:55:29 公開日:2021-06-25
# 非熱的量子状態と組み合わせた非断熱駆動のエネルギー的利点 Energetic advantages of non-adiabatic drives combined with non-thermal quantum states ( http://arxiv.org/abs/2106.05990v3 ) ライセンス: Link先を確認	Camille L Latune	(参考訳) 量子系のユニタリ駆動は量子力学の実験や応用の至るところに現れ、その背後にあるエネルギー的側面は、特に量子熱力学に関連して注目を集めている。本稿では、初期非熱状態から得られるユニタリ駆動のエネルギー的利点を調べる。エネルギー利得を定量化するために非巡回エルゴトロピーを導入し、そこからコヒーレントな(コヒーレンスに基づく)寄与とインコヒーレントな(占有数に基づく)寄与を同定する。特に、初期量子コヒーレンスは常に有益であるように見えるが、非パッシブな占有数分布は必ずしも有益とは限らない。さらに、これらのエネルギー利得は非断熱的なダイナミクスを通してのみ得られ、初期熱状態に対しては断熱的なダイナミクスが通常最適であることと対照的である。最後に、断熱へのショートカットの文脈で確立された枠組みに従って、最適な駆動の実装に伴うエネルギーコストを解析し、ほとんどの状況で断熱へのショートカットに伴うエネルギーコストよりも小さいことを見出した。二準位系の例を明示的に扱い、初期コヒーレンスが大きいほどエネルギー的利点が増大することを示す。これは、初期コヒーレンスと、ダイナミクスがコヒーレンスを消費し利用する能力との相互作用を例示している。 Unitary drivings of quantum systems are ubiquitous in experiments and applications of quantum mechanics and the underlying energetic aspects, particularly relevant in quantum thermodynamics, are receiving growing attention. We investigate energetic advantages in unitary driving obtained from initial non-thermal states. We introduce the non-cyclic ergotropy to quantify the energetic gains, from which coherent (coherence-based) and incoherent (population-based) contributions are identified. In particular, initial quantum coherences appear to be always beneficial whereas non-passive population distributions not systematically. Additionally, these energetic gains are accessible only through non-adiabatic dynamics, contrasting with the usual optimality of adiabatic dynamics for initial thermal states. Finally, following frameworks established in the context of shortcut-to-adiabaticity, the energetic costs related to the implementation of the optimal drives are analysed and, in most situations, are found to be smaller than the energetic costs associated with shortcut-to-adiabaticity. We treat explicitly the example of a two-level system and show that energetic advantages increase with larger initial coherences, illustrating the interplay between initial coherences and the ability of the dynamics to consume and use coherences.	翻訳日:2023-03-27 01:41:07 公開日:2021-06-25
# 古典的二次元ハイゼンベルク模型の再検討:$SU(2)$対称テンソルネットワークによる研究 The classical two-dimensional Heisenberg model revisited: An $SU(2)$-symmetric tensor network study ( http://arxiv.org/abs/2106.06310v2 ) ライセンス: Link先を確認	Philipp Schmoll, Augustine Kshetrimayum, Jens Eisert, Roman Orus, Matteo Rizzi	(参考訳) 2空間次元の古典ハイゼンベルク模型は最も典型的なスピン模型の一つであり、磁性を理解するうえで統計物理学や物性物理学において重要な役割を果たす。しかし、その典型的な性格と、(連続的な)自発的対称性の破れが禁じられることが広く受け入れられているにもかかわらず、この模型が有限温度で相転移を示すかどうかについては議論が残っている。重要なことに、この模型は$1+1$次元における$O(3)$非線形シグマ模型の格子離散化として解釈できる。これは、有名な高次元の理論($3+1$次元の量子色力学など)の重要な特徴、すなわち漸近的自由性の現象を備えた最も単純な場の量子論の一つである。このことも有限温度での転移を排除するはずであるが、格子効果が主流の描像を修正するうえで重要な役割を果たす可能性がある。本研究では、広い温度範囲にわたって熱力学的極限における古典的分配関数を表現する最先端のテンソルネットワーク手法を用いて、Gibbs状態の相関構造を包括的に調べる。2次元テンソルネットワーク縮約スキームに$SU(2)$対称性を実装することで、$\chi_E^\text{eff} \sim 1500$までの非常に大きな環境の有効ボンド次元を扱うことができ、これは相転移の検出において決定的に重要である。温度の低下とともに相関長は急速に発散し、その振る舞いは文献で知られている2つの相反する主要な仮説、すなわち有限$T$での転移と漸近的自由性のどちらとも一見整合するが、後者がわずかに支持される。 The classical Heisenberg model in two spatial dimensions constitutes one of the most paradigmatic spin models, taking an important role in statistical and condensed matter physics to understand magnetism. Still, despite its paradigmatic character and the widely accepted ban of a (continuous) spontaneous symmetry breaking, controversies remain whether the model exhibits a phase transition at finite temperature. Importantly, the model can be interpreted as a lattice discretization of the $O(3)$ non-linear sigma model in $1+1$ dimensions, one of the simplest quantum field theories encompassing crucial features of celebrated higher-dimensional ones (like quantum chromodynamics in $3+1$ dimensions), namely the phenomenon of asymptotic freedom. This should also exclude finite-temperature transitions, but lattice effects might play a significant role in correcting the mainstream picture. In this work, we make use of state-of-the-art tensor network approaches, representing the classical partition function in the thermodynamic limit over a large range of temperatures, to comprehensively explore the correlation structure for Gibbs states. By implementing an $SU(2)$ symmetry in our two-dimensional tensor network contraction scheme, we are able to handle very large effective bond dimensions of the environment up to $\chi_E^\text{eff} \sim 1500$, a feature that is crucial in detecting phase transitions. With decreasing temperatures, we find a rapidly diverging correlation length, whose behaviour is apparently compatible with the two main contradictory hypotheses known in the literature, namely a finite-$T$ transition and asymptotic freedom, though with a slight preference for the second.	翻訳日:2023-03-26 23:42:02 公開日:2021-06-25
# 時間符号化多層スパイクニューラルネットワークにおけるVLSI回路制約の影響 Effects of VLSI Circuit Constraints on Temporal-Coding Multilayer Spiking Neural Networks ( http://arxiv.org/abs/2106.10382v2 ) ライセンス: Link先を確認	Yusuke Sakemi, Takashi Morie, Takeo Hosomi, Kazuyuki Aihara	(参考訳) spiking neural network (snn)は、脳の数学的モデルとしてだけでなく、現実世界のアプリケーションのためのエネルギー効率の良い情報処理モデルとしても注目されている。特に、テンポラリ符号化に基づくsnsは、タスクの実行にかなり少ないスパイクを必要とするため、レート符号化に基づくものよりもずっと効率的であることが期待されている。 SNNは連続状態および連続時間モデルであるため、アナログVLSI回路で実装することが好ましい。しかし、システムサイズが非常に大きい場合、連続時間アナログ回路によるシステム全体の構築は不可能である。したがって、混合信号回路を用いる必要があり、シナプス重みの時間離散化と量子化が必要である。さらに、SNNのアナログVLSI実装は、ノイズやデバイスミスマッチの影響、アナログ回路操作に起因する他の制約など、非理想性を示す。本研究では,SNNの性能に及ぼす時間離散化および/または重み量子化の影響を検討した。さらに, 膜電位の下限と焼成閾値の時間的変動の影響を解明した。最後に,数理SNNモデルを離散時間でアナログ回路にマッピングするための最適手法を提案する。 The spiking neural network (SNN) has been attracting considerable attention not only as a mathematical model for the brain, but also as an energy-efficient information processing model for real-world applications. In particular, SNNs based on temporal coding are expected to be much more efficient than those based on rate coding, because the former requires substantially fewer spikes to carry out tasks. As SNNs are continuous-state and continuous-time models, it is favorable to implement them with analog VLSI circuits. However, the construction of the entire system with continuous-time analog circuits would be infeasible when the system size is very large. Therefore, mixed-signal circuits must be employed, and the time discretization and quantization of the synaptic weights are necessary. Moreover, the analog VLSI implementation of SNNs exhibits non-idealities, such as the effects of noise and device mismatches, as well as other constraints arising from the analog circuit operation. In this study, we investigated the effects of the time discretization and/or weight quantization on the performance of SNNs. 
Furthermore, we elucidated the effects of the lower bound of the membrane potentials and the temporal fluctuation of the firing threshold. Finally, we propose an optimal approach for the mapping of mathematical SNN models to analog circuits with discretized time.	翻訳日:2023-03-26 08:07:48 公開日:2021-06-25
# 強化学習によるインタラクティブレコメンデーションの精度と公平性のバランス Balancing Accuracy and Fairness for Interactive Recommendation with Reinforcement Learning ( http://arxiv.org/abs/2106.13386v1 ) ライセンス: Link先を確認	Weiwen Liu, Feng Liu, Ruiming Tang, Ben Liao, Guangyong Chen, Pheng Ann Heng	(参考訳) レコメンデーションの公平性は、従来のレコメンデーションによって引き起こされるバイアスと差別のために、注目を集めている。 Interactive Recommender Systems (IRS)では、ユーザの好みとシステムの公平性は時間とともに常に変化している。既存の公正を意識した推奨者は、主に静的な設定における公平性を考慮する。 IRSに直接既存手法を適用すると、推奨度は低下する。この問題を解決するために,IRSの精度と公平性の長期的バランスを動的に維持する強化学習ベースのフレームワークであるFairRecを提案する。ユーザの好みとシステムの公平性ステータスは、状態表現に共同で圧縮され、レコメンデーションを生成する。 FairRecは、正確性と公正性を組み合わせた設計された累積報酬の最大化を目指している。大規模な実験は、FairRecが優れたレコメンデーション品質を維持しながら、公正性を改善することを実証する。 Fairness in recommendation has attracted increasing attention due to bias and discrimination possibly caused by traditional recommenders. In Interactive Recommender Systems (IRS), user preferences and the system's fairness status are constantly changing over time. Existing fairness-aware recommenders mainly consider fairness in static settings. Directly applying existing methods to IRS will result in poor recommendation. To resolve this problem, we propose a reinforcement learning based framework, FairRec, to dynamically maintain a long-term balance between accuracy and fairness in IRS. User preferences and the system's fairness status are jointly compressed into the state representation to generate recommendations. FairRec aims at maximizing our designed cumulative reward that combines accuracy and fairness. Extensive experiments validate that FairRec can improve fairness, while preserving good recommendation quality.	翻訳日:2023-03-25 14:11:22 公開日:2021-06-25
# 非マルコフ性との量子相関とコヒーレンスの保存 Preserving quantum correlations and coherence with non-Markovianity ( http://arxiv.org/abs/2106.13573v1 ) ライセンス: Link先を確認	Marek Miller, Kang-Da Wu, Manfredi Scalici, Jan Kolodynski, Guo-Yong Xiang, Chuan-Feng Li, Guang-Can Guo, Alexander Streltsov	(参考訳) 開量子系はシュリンガー方程式に従って一元的に進化する閉量子系と比較して豊かな現象論を示す。開量子系の力学は通常、任意の時間スケールで量子力学を有効な量子演算に分解できるかどうかによってマルコフ的および非マルコフ的に分類される。マルコフの進化は非マルコフ力学と比較してシミュレートが容易であるため、非マルコフ性は有用な量子技術応用に利用できると仮定することは妥当である。本稿では,量子系における相関とコヒーレンスを保存するための非マルコフ性の有用性を示す。このために我々は、デコヒーレンス行列をゼロから大きく分離した、広範囲な量子ビット進化のクラスを考える。そのようなマルコフの進化は、指数関数的な相関の損失をもたらすが、非マルコフ性は、極限 $t \rightarrow \infty$ においても相関を維持するのに役立つ。共変量子ビットの進化について、非マルコビアン性は、常に量子コヒーレンスを維持するのに利用できることを示す。我々は,この効果を線形光学を用いて実験的に証明し,常に非マルコフ型である所要進化を実装した。 Open quantum systems exhibit a rich phenomenology, in comparison to closed quantum systems that evolve unitarily according to the Schr\"odinger equation. The dynamics of an open quantum system are typically classified into Markovian and non-Markovian, depending on whether the dynamics can be decomposed into valid quantum operations at any time scale. Since Markovian evolutions are easier to simulate, compared to non-Markovian dynamics, it is reasonable to assume that non-Markovianity can be employed for useful quantum-technological applications. Here, we demonstrate the usefulness of non-Markovianity for preserving correlations and coherence in quantum systems. For this, we consider a broad class of qubit evolutions, having a decoherence matrix separated from zero for large times. While any such Markovian evolution leads to an exponential loss of correlations, non-Markovianity can help to preserve correlations even in the limit $t \rightarrow \infty$. For covariant qubit evolutions, we also show that non-Markovianity can be used to preserve quantum coherence at all times, which is an important resource for quantum metrology. 
We explicitly demonstrate this effect experimentally with linear optics, by implementing the required evolution that is non-Markovian at all times.	翻訳日:2023-03-25 14:08:41 公開日:2021-06-25
# 量子論のQに基づく解釈の導入 Introducing the Q-based interpretation of quantum theory ( http://arxiv.org/abs/2106.13502v1 ) ライセンス: Link先を確認	Simon Friederich	(参考訳) この記事では、量子論の新しい解釈であるQに基づく解釈について概説する。この解釈の根底にある考え方は、Drummond と Reid [2020] によって場の量子論に対して最近提案されたもので、位相空間関数 Q(よく知られたウィグナー関数の変換)を、古典的な統計力学における確率分布 \rho に大まかに類似した適切な確率分布として解釈することである。ここで、Qに基づく解釈を動機付け、経験的に適切かどうかを調べ、その重要な概念的特徴を概説する。 Qに基づく解釈は、測定問題を持たないことを約束し、概念的に控えめであり、相対論的および場理論的な文脈にエレガントに適用できる可能性を秘めているという点で魅力的である。 This article outlines a novel interpretation of quantum theory: the Q-based interpretation. The core idea underlying this interpretation, recently suggested for quantum field theories by Drummond and Reid [2020], is to interpret the phase space function Q -- a transform of the better known Wigner function -- as a proper probability distribution, roughly analogous to the probability distribution \rho in classical statistical mechanics. Here I motivate the Q-based interpretation, investigate whether it is empirically adequate, and outline some of its key conceptual features. I argue that the Q-based interpretation is attractive in that it promises to have no measurement problem, is conceptually parsimonious and has the potential to apply elegantly to relativistic and field-theoretic contexts.	翻訳日:2023-03-25 14:07:35 公開日:2021-06-25
# 3 x 3方向不偏光線形光マルチポートの実装 Implementation of a 3 x 3 directionally-unbiased linear optical multiport ( http://arxiv.org/abs/2106.13473v1 ) ライセンス: Link先を確認	Ilhwan Kim, Donghwa Lee, Seongjin Hong, Young-Wook Cho, Kwang Jo Lee, Yong-Su Kim, Hyang-Tag Lim	(参考訳) 線形光マルチポートはフォトニック量子情報処理に広く用いられている。当然これらのデバイスは、光子が常に入力ポートから出力ポートに向かって伝播するため、方向バイアスを受ける。近年,方向不偏光多重ポートの概念が提案されている。これらの方向が不偏な多重ポートは光子を逆方向に伝播させ、複雑な線形光量子ネットワークに必要な線形光学素子の数を大幅に減少させる。本稿では,光トリッタとミラーを用いた3×3方向偏りのない線形光ファイバマルチポートの実証実験を行う。長いコヒーレンス長の光源でしか動作しないバルク光学素子を用いた以前の実演と比較して、実験用3×3光マルチポートは、可能なすべての光軌道に無視可能な光路長差を与えるため、長コヒーレンス長を必要としない。これは複雑なグラフネットワーク上で大規模量子ウォークを実装するのに有用なビルディングブロックである。 Linear optical multiports are widely used in photonic quantum information processing. Naturally, these devices are directionally-biased since photons always propagate from the input ports toward the output ports. Recently, the concept of directionally-unbiased linear optical multiports was proposed. These directionally-unbiased multiports allow photons to propagate along a reverse direction, which can greatly reduce the number of required linear optical elements for complicated linear optical quantum networks. Here, we report an experimental demonstration of a 3 x 3 directionally-unbiased linear optical fiber multiport using an optical tritter and mirrors. Compared to the previous demonstration using bulk optical elements which works only with light sources with a long coherence length, our experimental directionally-unbiased 3 x 3 optical multiport does not require a long coherence length since it provides negligible optical path length differences among all possible optical trajectories. It can be a useful building block for implementing large-scale quantum walks on complex graph networks.	翻訳日:2023-03-25 14:06:52 公開日:2021-06-25
# 全光伝送による水中デコイ状態量子鍵分布の実験 Experimental Demonstration of Underwater Decoy-state Quantum Key Distribution with All-optical Transmission ( http://arxiv.org/abs/2106.13441v1 ) ライセンス: Link先を確認	Yonghe Yu, Wendong Li, Yu Wei, Yang Yang, Shanchuan Dong, Tian Qian, Shuo Wang, Qiming Zhu, Shangshuai Zheng, Xinjian Zhang and Yongjian Gu	(参考訳) 量子信号、同期信号、古典的通信信号の全光伝送を備えた完全なUWQKDシステムを構築し、長さ10.4メートルのJerlov型III海水チャンネル上の水中量子鍵分布(UWQKD)を実証する。波長分割多重化と時空間波長フィルタリング技術を適用し、光信号が互いに干渉しないようにする。このシステムはFPGAで制御されており、水密キャビンに容易に統合してフィールド実験を行うことができる。偏光符号化によるデコイ状態BB84プロトコルを用いることで、13.26dBの減衰で、セキュア鍵レート 1.82Kbps、エラーレート 1.55% が得られる。このシステムは最大23.7dBのチャネル損失を許容できることを証明し、したがって長さ300mのJerlov型I清浄海水チャンネルで使用できる可能性がある。 We demonstrate the underwater quantum key distribution (UWQKD) over a 10.4-meter Jerlov type III seawater channel by building a complete UWQKD system with all-optical transmission of quantum signals, synchronization signal and classical communication signal. The wavelength division multiplexing and the space-time-wavelength filtering technology are applied to ensure that the optical signals do not interfere with each other. The system is controlled by FPGA, and can be easily integrated into watertight cabins to perform field experiment. By using the decoy-state BB84 protocol with polarization encoding, we obtain a secure key rate of 1.82Kbps and an error rate of 1.55% at the attenuation of 13.26dB. We prove that the system can tolerate the channel loss up to 23.7dB, therefore may be used in the 300-meter-long Jerlov type I clean seawater channel.	翻訳日:2023-03-25 14:06:03 公開日:2021-06-25
# 衛星から地球への量子通信における光時間モード Temporal Modes of Light in Satellite-to-Earth Quantum Communications ( http://arxiv.org/abs/2106.13693v1 ) ライセンス: Link先を確認	Ziqing Wang, Robert Malaney, Ryan Aguinaldo	(参考訳) フォトニック・テンポラル・モード(TM)は、実現可能な多次元量子通信の候補である。しかし、Orbital Angular Momentum (OAM)のような他の多次元量子情報キャリアと比較して、TMはあまり注目されていない。さらに、新興量子インターネットと衛星ベースの量子通信の文脈では、TMは注目されていない。本研究では,tm空間に符号化された単一光子の衛星から地球への通信路を考慮し,この状況を改善する。以上の結果から,photonic tmは衛星から地上への高スループット量子通信を実現するための有望な手段であることが示唆された。特に、これらのモードが、OAM単一光子状態と比較して、衛星-地球間通信路における多重化性能と優れた量子鍵分布を実現する方法を示す。この結果を保証するtm識別のレベルを概説し、衛星ベースの量子インターネットにおける我々の結果の意義について論じる。 The photonic Temporal Mode (TM) represents a possible candidate for the delivery of viable multidimensional quantum communications. However, relative to other multidimensional quantum information carriers such as the Orbital Angular Momentum (OAM), the TM has received less attention. Moreover, in the context of the emerging quantum internet and satellite-based quantum communications, the TM has received no attention. In this work, we remedy this situation by considering the traversal through the satellite-to-Earth channel of single photons encoded in TM space. Our results indicate that for anticipated atmospheric conditions the photonic TM offers a promising avenue for the delivery of high-throughput quantum communications from a satellite to a terrestrial receiver. In particular, we show how these modes can provide for improved multiplexing performance and superior quantum key distribution in the satellite-to-Earth channel, relative to OAM single-photon states. The levels of TM discrimination that guarantee this outcome are outlined and implications of our results for the emerging satellite-based quantum internet are discussed.	翻訳日:2023-03-25 13:59:50 公開日:2021-06-25
# 条件付きフォン・ノイマンエントロピー上のデバイス独立な下界 Device-independent lower bounds on the conditional von Neumann entropy ( http://arxiv.org/abs/2106.13692v1 ) ライセンス: Link先を確認	Peter Brown, Hamza Fawzi and Omar Fawzi	(参考訳) 量子鍵分布(QKD)やランダムネス展開(RE)を含むいくつかのデバイス非依存(DI)プロトコルの速度は、特定の量子状態のクラスに対する条件付きフォン・ノイマンエントロピーの最適化によって計算できる。本研究では,そのようなレートで下限を計算する数値計算手法を提案する。一般分離ヒルベルト空間上で定義される系の条件付きフォン・ノイマンエントロピーに収束する最適化問題を導出する。 Navascu\'es-Pironio-Ac\'in階層を用いて、これらの問題を半定値プログラムに緩和し、DIプロトコルのレートの低い境界を計算する計算可能な方法を与える。提案手法を適用してDI-REおよびDI-QKDプロトコルの速度を計算することにより,従来の数値手法よりも大幅に改善され,DI-REとDI-QKDの両者の速度が大幅に向上したことを示す。特に、DI-QKDでは、現在の能力の範囲内にある最小限の検出効率しきい値を示す。さらに, 本手法は, 既知の密接な解析境界のインスタンスを回収することで, 高速に収束できることを実証する。最後に,本手法はエントロピー累積定理に適合するので,有限ラウンドプロトコルの計算速度を計算し,その安全性を証明できることを示す。 The rates of several device-independent (DI) protocols, including quantum key-distribution (QKD) and randomness expansion (RE), can be computed via an optimization of the conditional von Neumann entropy over a particular class of quantum states. In this work we introduce a numerical method to compute lower bounds on such rates. We derive a sequence of optimization problems that converge to the conditional von Neumann entropy of systems defined on general separable Hilbert spaces. Using the Navascu\'es-Pironio-Ac\'in hierarchy we can then relax these problems to semidefinite programs, giving a computationally tractable method to compute lower bounds on the rates of DI protocols. Applying our method to compute the rates of DI-RE and DI-QKD protocols we find substantial improvements over all previous numerical techniques, demonstrating significantly higher rates for both DI-RE and DI-QKD. In particular, for DI-QKD we show a new minimal detection efficiency threshold which is within the realm of current capabilities. Moreover, we demonstrate that our method is capable of converging rapidly by recovering instances of known tight analytical bounds. 
Finally, we note that our method is compatible with the entropy accumulation theorem and can thus be used to compute rates of finite round protocols and subsequently prove their security.	翻訳日:2023-03-25 13:59:34 公開日:2021-06-25
# 2つのローカライズ技術--ローエンド電話における高精度ローカライズを可能にする- The Tale of Two Localization Technologies: Enabling Accurate Low-Overhead WiFi-based Localization for Low-end Phones ( http://arxiv.org/abs/2106.13663v1 ) ライセンス: Link先を確認	Ahmed Shokry, Moustafa Elhamshary, Moustafa Youssef	(参考訳) WiFiフィンガープリントは、屋内ローカライゼーションの主流技術の一つである。しかし、指紋データベースを手動で構築する初期校正フェーズが必要となる。このプロセスは労働集約的であり、環境の変化を繰り返す必要があります。 RF伝搬モデルやクラウドソーシングによる校正作業を減らすために多くのシステムが導入されたが、これらにはいくつかの制限がある。他のアプローチでは、最近開発されたiBeacon技術が、屋内ローカライゼーションのためのWiFiに代わるものとして使われている。しかし、ビーコンベースのソリューションは、ハイエンドフォンの小さなサブセットに限られている。本稿では,精度の低い屋内ローカライズシステムであるhybridlocを提案する。 HybridLocの基本的な考え方は、ハイエンドのスマートフォンのセンサーを利用してローエンドのスマートフォンをローカライズすることだ。具体的には、WiFi指紋は、BLE対応ハイエンドスマートフォンから取得した位置情報をラベル付けしたWi-Fiスキャンによってクラウドソースされる。これらのスキャンは、Wi-Fiフィンガープリントを自動で構築するために使用され、その後、ローエンドの携帯電話をユビキタスなWiFi技術でローカライズするために使われる。 hybridlocはまた、指紋作成に使用される推定ble位置の固有のエラーに対処するとともに、ノイズの多いワイヤレス環境や異種デバイスなどの実用的な配置問題に対処するための規定も備えている。 Android携帯電話を用いたHybridLocの評価では,手動指紋認証技術と同じ範囲で,正確な位置推定が可能である。さらに、WiFiのみをサポートするローエンド端末でのローエンド端末のローカライズ精度は、BLEをサポートするハイエンド端末と同等である。この精度はトレーニングオーバーヘッドなしで達成され、異なるユーザデバイスに対して堅牢であり、環境変化下で一貫性がある。 WiFi fingerprinting is one of the mainstream technologies for indoor localization. However, it requires an initial calibration phase during which the fingerprint database is built manually. This process is labour intensive and needs to be repeated with any change in the environment. While a number of systems have been introduced to reduce the calibration effort through RF propagation models or crowdsourcing, these still have some limitations. Other approaches use the recently developed iBeacon technology as an alternative to WiFi for indoor localization. However, these beacon-based solutions are limited to a small subset of high-end phones. In this paper, we present HybridLoc: an accurate low-overhead indoor localization system. The basic idea HybridLoc builds on is to leverage the sensors of high-end phones to enable localization of lower-end phones. 
Specifically, the WiFi fingerprint is crowdsourced by opportunistically collecting WiFi-scans labeled with location data obtained from BLE-enabled high-end smart phones. These scans are used to automatically construct the WiFi-fingerprint, that is used later to localize any lower-end cell phone with the ubiquitous WiFi technology. HybridLoc also has provisions for handling the inherent error in the estimated BLE locations used in constructing the fingerprint as well as to handle practical deployment issues including the noisy wireless environment, heterogeneous devices, among others. Evaluation of HybridLoc using Android phones shows that it can provide accurate localization in the same range as manual fingerprinting techniques under the same conditions. Moreover, the localization accuracy on low-end phones supporting only WiFi is comparable to that achieved with high-end phones supporting BLE. This accuracy is achieved with no training overhead, is robust to the different user devices, and is consistent under environment changes.	翻訳日:2023-03-25 13:58:37 公開日:2021-06-25
# 相互作用する量子系の散逸および散逸のないダイナミクスの等価性とそのユニタリフェルミ気体への応用 Equivalence of dissipative and dissipationless dynamics of interacting quantum systems with its application to the unitary Fermi gas ( http://arxiv.org/abs/2106.13621v1 ) ライセンス: Link先を確認	Masaaki Tokieda and Shimpei Endo	(参考訳) 粒子間相互作用を用いたCaldirola-Kanaiモデルによる量子散逸ダイナミクスの解析を行った。カルディロラ・カナイモデルの散逸量子力学は、粒子が強く相互作用している場合でも、負の外部調和ポテンシャルの下で散逸のない量子力学に正確にマッピングできることがわかった。特に,低温原子や原子核問題に関連する一元的フェルミ気体ではマッピングが有効であることを示す。 We analytically study quantum dissipative dynamics described by the Caldirola-Kanai model with inter-particle interactions. We have found that the dissipative quantum dynamics of the Caldirola-Kanai model can be exactly mapped to a dissipationless quantum dynamics under a negative external harmonic potential, even when the particles are strongly interacting. In particular, we show that the mapping is valid for the unitary Fermi gas, which is relevant for cold atoms and nuclear matters.	翻訳日:2023-03-25 13:57:54 公開日:2021-06-25
# 局地化システム評価における地中真実の精度の影響 The Effect of Ground Truth Accuracy on the Evaluation of Localization Systems ( http://arxiv.org/abs/2106.13614v1 ) ライセンス: Link先を確認	Chen Gu, Ahmed Shokry, Moustafa Youssef	(参考訳) 位置決定システムの性能を正確に評価する能力は、多くのアプリケーションにおいて不可欠である。通常、そのようなシステムの性能は、地上の真理位置と推定位置を比較して得られる。しかし、これらの地上の真実の場所は通常、地図をクリックするか、GPSのような他のグローバルな技術を使って得られる。これは、マーキングプロセス、地図歪み、または固有のGPS不正確さに起因する、地上の真実の誤りをもたらす。本稿では,局所化システムの評価に対する基底的真理誤差の影響を分析するための理論的枠組みを提案する。そこで本研究では,検証誤差とマーキング/マップグランド真理誤差から実アルゴリズム誤差を計算するアルゴリズムを2つ設計した。さまざまなパフォーマンス指標の境界をさらに確立します。典型的な環境において収集された実データを用いた理論的仮定と解析の検証は、地中真理誤差の存在下での局所化アルゴリズムの推定誤差を補正する理論的枠組みの能力を示している。具体的には, マーキング誤差アルゴリズムは実誤差CDFと4%以内の精度で一致し, マップが6mずれた場合には, 中央値/テール誤差を150%/72%精度で推定する。 The ability to accurately evaluate the performance of location determination systems is crucial for many applications. Typically, the performance of such systems is obtained by comparing ground truth locations with estimated locations. However, these ground truth locations are usually obtained by clicking on a map or using other worldwide available technologies like GPS. This introduces ground truth errors that are due to the marking process, map distortions, or inherent GPS inaccuracy. In this paper, we present a theoretical framework for analyzing the effect of ground truth errors on the evaluation of localization systems. Based on that, we design two algorithms for computing the real algorithmic error from the validation error and marking/map ground truth errors, respectively. We further establish bounds on different performance metrics. Validation of our theoretical assumptions and analysis using real data collected in a typical environment shows the ability of our theoretical framework to correct the estimated error of a localization algorithm in the presence of ground truth errors. 
Specifically, our marking error algorithm matches the real error CDF within 4%, and our map error algorithm provides a more accurate estimate of the median/tail error by 150%/72% when the map is shifted by 6m.	翻訳日:2023-03-25 13:57:43 公開日:2021-06-25
# 集束原子レーザービーム Focusing Atom Laser Beams ( http://arxiv.org/abs/2106.13845v1 ) ライセンス: Link先を確認	R. Richberg and A. M. Martin	(参考訳) 我々は、ルビジウム85($^{85}$Rb)の準連続原子レーザービームの集束を理論的に研究する。 Gross-Pitaevskii方程式に基づく2状態モデル解析は、2体原子-原子相互作用と3体再結合損失の影響を含む。調和ポテンシャルなどの光集束ポテンシャルを用いて、集束原子レーザービームプロファイルの幅、ピーク密度、原子損失率などの重要な因子について検討した。我々の分析は、原子レーザーを用いることで分解能が最大$8$ nmまで劇的に向上すると予測している。 We theoretically study the focusing of a quasi-continuous atom laser beam of rubidium-85 ($^{85}$Rb). A two-state model analysis based on the Gross-Pitaevskii equation is used which comprises the effects of two-body atom-atom interactions and three-body recombination losses. Utilizing optical focusing potentials such as harmonic potentials, the essential factors such as the width, peak density and atom loss rate of the focused atom laser beam profile are investigated. Our analysis predicts that using an atom laser offers a dramatic improvement in resolution of up to $8$ nm.	翻訳日:2023-03-25 13:50:29 公開日:2021-06-25
# 機械学習のための量子埋め込み実験 Experimental Quantum Embedding for Machine Learning ( http://arxiv.org/abs/2106.13835v1 ) ライセンス: Link先を確認	Ilaria Gianani, Ivana Mastroserio, Lorenzo Buffoni, Natalia Bruno, Ludovica Donati, Valeria Cimini, Marco Barbieri, Francesco S. Cataliotti, and Filippo Caruso	(参考訳) ビッグデータの分類は通常、新しいデータクラスタへのマッピングが必要であり、より効率的で実行可能な線形セパレータによって機械学習アルゴリズムによって処理される。最近、lloydらは古典データを量子空間に埋め込む提案を推し進めた: これらはより複雑なヒルベルト空間に存在し、線形に分離可能なクラスターに分割できる。本稿では、量子光学と超低温原子をベースとした2つの異なる実験プラットフォームを設計し、深層学習法により量子埋め込みプロトコルを適応・数値的に最適化し、いくつかの古典的データに対して検証する。リゲッティ超伝導量子コンピュータでも同様の解析を行う。したがって、量子埋め込みアプローチは実験レベルでもうまく機能し、特に、異なるプラットフォームがこのタスクを達成するために補完的な方法でどのように機能するかを示す。これらの研究は、特にハイブリッド量子技術に基づく量子機械学習技術に関する将来の研究の道を開くかもしれない。 The classification of big data usually requires a mapping onto new data clusters which can then be processed by machine learning algorithms by means of more efficient and feasible linear separators. Recently, Lloyd et al. have advanced the proposal to embed classical data into quantum ones: these live in the more complex Hilbert space where they can get split into linearly separable clusters. Here, we implement these ideas by engineering two different experimental platforms, based on quantum optics and ultra-cold atoms respectively, where we adapt and numerically optimize the quantum embedding protocol by deep learning methods, and test it for some trial classical data. We perform also a similar analysis on the Rigetti superconducting quantum computer. Therefore, we find that the quantum embedding approach successfully works also at the experimental level and, in particular, we show how different platforms could work in a complementary fashion to achieve this task. These studies might pave the way for future investigations on quantum machine learning techniques especially based on hybrid quantum technologies.	翻訳日:2023-03-25 13:49:36 公開日:2021-06-25
# 分散センシングのための近似デコヒーレンス自由部分空間 Approximate decoherence free subspaces for distributed sensing ( http://arxiv.org/abs/2106.13828v1 ) ライセンス: Link先を確認	Arne Hamann, Pavel Sekatski and Wolfgang D\"ur	(参考訳) トラップ内の異なる位置に位置する複数の原子などのセンサネットワークを用いて,空間依存性のあるスカラー値場を検知することを検討する。特定信号のみを検知する空間相関の活用法を示し、定評のある位置における雑音源の非一貫性部分空間を構築することにより、異なる位置や不等な空間依存の他人に対して無感を与える方法を示す。これは特定の表面にあるノイズ源に拡張することができ、そこでは古典的な静電気学におけるミラー電荷と等方性表面との接続に遭遇する。一般的な状況では、制御された方法で信号強度を減少させるコストで、あるボリューム内のすべてのソースのノイズを著しく抑制する、近似デコヒーレンスのない部分空間の概念を導入する。多数のノイズ源が存在するにも関わらず, 長期間にわたって, 多数のセンサに対して, ハイゼンベルクスケーリングを維持できることを示す。内部状態とセンサ構成を構築するための効率的なフォーマリズムを導入し、それをいくつかの例に適用して、我々のアプローチの有用性と幅広い適用性を示す。 We consider the sensing of scalar valued fields with specific spatial dependence using a network of sensors, e.g. multiple atoms located at different positions within a trap. We show how to harness the spatial correlations to sense only a specific signal, and be insensitive to others at different positions or with unequal spatial dependence by constructing a decoherence-free subspace for noise sources at fixed, known positions. This can be extended to noise sources lying on certain surfaces, where we encounter a connection to mirror charges and equipotential surfaces in classical electrostatics. For general situations, we introduce the notion of an approximate decoherence-free subspace, where noise for all sources within some volume is significantly suppressed, at the cost of reducing the signal strength in a controlled way. We show that one can use this approach to maintain Heisenberg-scaling over long times and for a large number of sensors, despite the presence of multiple noise sources in large volumes. We introduce an efficient formalism to construct internal states and sensor configurations, and apply it to several examples to demonstrate the usefulness and wide applicability of our approach.	翻訳日:2023-03-25 13:49:21 公開日:2021-06-25
# リニア光学による絡み合ったフォトニック状態の生成 Creation of Entangled Photonic States Using Linear Optics ( http://arxiv.org/abs/2106.13825v1 ) ライセンス: Link先を確認	Sara Bartolucci, Patrick M. Birchall, Mercedes Gimeno-Segovia, Eric Johnston, Konrad Kieling, Mihir Pant, Terry Rudolph, Jake Smith, Chris Sparrow, Mihai D. Vidrighin	(参考訳) 線形光学素子のみを用いることで、デュアルレールフォトニックエンタングル状態の生成は本質的に確率的である。量子情報処理プロトコルのほぼ決定論的操作を実現するために大規模な多重化を必要とする。本稿では,線形光量子コンピューティング(loqc)アーキテクチャのフットプリントを大幅に削減する可能性を持つ,高い確率でフォトニック絡み合い状態を生成する複数の手法と手法を提案する。特に,4つの単一光子から最大p=2/3までのベル状態調製の改善,デュアルレールベル状態アンシラによるType-I融合の75%向上,ベル状態識別限界を超えるType-II融合の改善を示す。 Using only linear optical elements, the creation of dual-rail photonic entangled states is inherently probabilistic. Known entanglement generation schemes have low success probabilities, requiring large-scale multiplexing to achieve near-deterministic operation of quantum information processing protocols. In this paper, we introduce multiple techniques and methods to generate photonic entangled states with high probability, which have the potential to reduce the footprint of Linear Optical Quantum Computing (LOQC) architectures drastically. Most notably, we are showing how to improve Bell state preparation from four single photons to up to p=2/3, boost Type-I fusion to 75% with a dual-rail Bell state ancilla and improve Type-II fusion beyond the limits of Bell state discrimination.	翻訳日:2023-03-25 13:49:01 公開日:2021-06-25
# SiおよびGe量子ドットにおけるホールスピン量子ビットの完全な可変超微細相互作用 Fully tunable hyperfine interactions of hole spin qubits in Si and Ge quantum dots ( http://arxiv.org/abs/2106.13744v1 ) ライセンス: Link先を確認	Stefano Bosco and Daniel Loss	(参考訳) ホールスピン量子ビットはスケーラブルな量子コンピュータのフロントエンドプラットフォームであるが、最先端のデバイスは原子核欠陥との超微細な相互作用に起因するノイズに悩まされている。これらの相互作用は、デバイス設計と外部電界によって制御される高度に調整可能な異方性を有する。この調整性により、超微細ノイズがマグニチュードで抑制され、異方性に精製された材料に匹敵するスイートスポットが可能になる。量子ビットは非常に整合性が高く、電荷と超微細ノイズの影響を受けない驚くほど単純な設計を同定する。長い量子ドットに典型的な大きなスピン軌道相互作用は、量子ビット演算を高速化するだけでなく、極小ノイズを劇的に再正規化し、駆動する量子ビットのダイナミクスを定性的に変化させ、量子ビットゲートの忠実度を高める。本研究は,量子コンピュータのスケールアップのための高性能量子ビットの設計ガイドラインとして機能する。 Hole spin qubits are frontrunner platforms for scalable quantum computers, but state-of-the-art devices suffer from noise originating from the hyperfine interactions with nuclear defects. We show that these interactions have a highly tunable anisotropy that is controlled by device design and external electric fields. This tunability enables sweet spots where the hyperfine noise is suppressed by an order of magnitude and is comparable to isotopically purified materials. We identify surprisingly simple designs where the qubits are highly coherent and are largely unaffected by both charge and hyperfine noise. We find that the large spin-orbit interaction typical of elongated quantum dots not only speeds up qubit operations, but also dramatically renormalizes the hyperfine noise, altering qualitatively the dynamics of driven qubits and enhancing the fidelity of qubit gates. Our findings serve as guidelines to design high performance qubits for scaling up quantum computers.	翻訳日:2023-03-25 13:48:16 公開日:2021-06-25
# ロスレス画像フォーマットの比較 Comparison of Lossless Image Formats ( http://arxiv.org/abs/2108.02557v1 ) ライセンス: Link先を確認	David Barina	(参考訳) 近年、画像圧縮フォーマットとビデオ圧縮フォーマットが次々と登場している。しかし、それらのほとんどは非可逆圧縮に重点を置いており、ロスレスモードは限定的にしかサポートしていない。本稿では、ロスレスフォーマットと「どれが最も効率的か」という重要な疑問に焦点をあてる。 FLIFは現在、ロスレス画像圧縮の最も効率的なフォーマットであることが判明した。この結果は、FLIF開発者がJPEG XLを優先してFLIFの開発を中止したという事実とは対照的である。 In recent years, there has been a flood of new image and video compression formats. However, most of them are focused on lossy compression and only marginally support the lossless mode. In this paper, I will focus on lossless formats and the critical question: "Which one is the most efficient?" It turned out that FLIF is currently the most efficient format for lossless image compression. This finding stands in contrast to the fact that the FLIF developers stopped its development in favor of JPEG XL.	翻訳日:2023-03-25 13:41:51 公開日:2021-06-25
# ポアソン方程式とシュレーディンガー方程式の3次元における双対性 Duality between Poisson and Schrödinger equations in three dimensions ( http://arxiv.org/abs/2107.04669v1 ) ライセンス: Link先を確認	G. Gonzalez	(参考訳) 3次元世界における静電問題と1次元世界における量子力学的問題との間の双対性は、静電気学の結果を用いてシュレーディンガー方程式の基底状態解を得ることを可能にするが、本研究ではこれを3次元に一般化する。ここでは、同じ変換手法が中心ポテンシャルに対する3次元のs波シュレーディンガー方程式にも適用可能であることを示す。このアプローチは、静電ポテンシャルとs波関数との間、および電気エネルギー密度と量子力学的エネルギーとの間の一般的な関係をもたらす。 A duality between an electrostatic problem in a three dimensional world and a quantum mechanical problem in a one dimensional world which allows one to obtain the ground state solution of the Schrödinger equation by using electrostatic results is generalized to three dimensions. Here, it is demonstrated that the same transformation technique is also applicable to the s-wave Schrödinger equation in three dimensions for central potentials. This approach leads to a general relationship between the electrostatic potential and the s-wave function, and between the electric energy density and the quantum mechanical energy.	翻訳日:2023-03-25 13:41:45 公開日:2021-06-25
# 表面符号モデルの量子二重性 Quantum double aspects of surface code models ( http://arxiv.org/abs/2107.04411v1 ) ライセンス: Link先を確認	Alexander Cowtan and Shahn Majid	(参考訳) 我々は、量子二重 $D(G)$ 対称性を持つ正方格子上のフォールトトレラント量子コンピューティングのKitaevモデルを再検討する。ここで $G$ は有限群である。我々は、その準粒子内容の射影演算子を $D(G)$ の既約表現として提供し、これを開リボン励起空間 $L(s_0,s_1)$ の $D(G)$-双加群特性と組み合わせて、開リボンがそれらのエンドポイント $s_0,s_1$ 間の情報をテレポートするのにどのように使えるかを示す。先行研究に基づく自己完結的な解説を与えつつ、$D(S_3)$ 上のゲートを含む表面符号理論として量子コンピューティングへの応用を強調する。 $D( \Bbb Z_n)\cong \Bbb C\Bbb Z_n^2$ の場合、この理論がトーリック符号のより単純な理論(トーリックリボン作用素とそのブレイディングを含む)にどのように還元されるかを示す。一方、我々の構成が有限次元ホップ代数 $H$ に基づく $D(H)$ モデルにどのように一般化されるかを示し、これにはホップ代数が半単純でない場合でも成り立つ、$D(H)$ のサイト作用とリボン同変性に関する部分的結果が含まれる。 We revisit the Kitaev model for fault tolerant quantum computing on a square lattice with underlying quantum double $D(G)$ symmetry, where $G$ is a finite group. We provide projection operators for its quasiparticle content as irreducible representations of $D(G)$ and combine this with $D(G)$-bimodule properties of open ribbon excitation spaces $L(s_0,s_1)$ to show how open ribbons can be used to teleport information between their endpoints $s_0,s_1$. We give a self-contained account that builds on earlier work but emphasises applications to quantum computing as surface code theory, including gates on $D(S_3)$. We show how the theory reduces to a simpler theory for toric codes in the case of $D( \Bbb Z_n)\cong \Bbb C\Bbb Z_n^2$, including toric ribbon operators and their braiding. In the other direction, we show how our constructions generalise to $D(H)$ models based on a finite-dimensional Hopf algebra $H$, including site actions of $D(H)$ and partial results on ribbon equivariance even when the Hopf algebra is not semisimple.	翻訳日:2023-03-25 13:41:36 公開日:2021-06-25
# 番組は継続しなければならない --パンデミック中の検査 The Show Must Go On -- Examination During a Pandemic ( http://arxiv.org/abs/2107.04014v1 ) ライセンス: Link先を確認	Pamela Fleischmann and Mitja Kulczynski and Dirk Nowotka	(参考訳) 予期せぬインシデントが発生すると、新しい革新的で柔軟なソリューションが求められます。もしこのイベントが新型コロナウイルス(COVID-19)のパンデミックのように急激で劇的なものであるなら、これらの解決策は生命を守りながら可能な限り正常性を保証することを目指している。ショックを受けた後,大学側は,学生の失業による財政問題に直面している学生が多いため,期待できる時間内での学位確保のために,学業を追求しなくてはならないと判断した。これは,教師としての私たちにとって,授業方法がほぼ1日から次の1日間に再編成されるだけでなく,厳格な衛生規則の下でペンや紙で直接行うべき試験方法の調整が必要であったことを暗示している。一方、修正は個人的な接触を避けるべきである。我々は、我が国の一般データ保護規制による高い基準を提供しながら、自宅でのデジタル化試験を安全に修正できる枠組みを開発した。さらに、自動テストシートが自動生成され、自動でデジタル化され、ワークオンテストがソートされるため、オフィスでの時間を最小限に抑えることができる。 When unexpected incidents occur, new innovative and flexible solutions are required. If this event is something such radical and dramatic like the COVID-19 pandemic, these solutions must aim to guarantee as much normality as possible while protecting lives. After a moment of shock our university decided that the students have to be able to pursue their studies for guaranteeing a degree in the expected time since most of them faced immediate financial problems due to the loss of their student jobs. This implied, for us as teachers, that we had to reorganise not only the teaching methods from nearly one day to the next, but we also had to come up with an adjusted way of examinations which had to take place in person with pen and paper under strict hygiene rules. On the other hand the correction should avoid personal contacts. We developed a framework which allowed us to correct the digitalised exams safely at home while providing the high standards given by the general data protection regulation of our country. Moreover, the time spent in the offices could be reduced to a minimum thanks to automatically generated exam sheets, automatically re-digitalised and sorted worked-on exams.	翻訳日:2023-03-25 13:41:09 公開日:2021-06-25
# AIと医薬品研究の将来 AI and the future of pharmaceutical research ( http://arxiv.org/abs/2107.03896v1 ) ライセンス: Link先を確認	Adam Zielinski	(参考訳) 本稿では今後,医薬品の進歩が医薬品開発にどのような影響を与えるかを検討する。この質問は、業界文献、研究雑誌、ai研究、市場報告、市場予測、討論論文、プレスリリース、組織のウェブサイトを含む豊富なソース資料をレビューすることで答えられた。この論文は、医薬品のAIの継続的な革新は、これまで治療できなかった病気に対する安全で効果的な治療法の迅速な開発を可能にすると論じている。製薬業界は今日、重大な生産性危機に陥っており、aiを活用した研究手法を直接適用することで、医薬品発見プロジェクトの時間とコストを削減することができる。業界はすでに、薬物分子発見時間の10倍削減などの結果を報告している。産業、政府、アカデミア間の多くのAIアライアンスにより、独自データの利用が可能となり、これまでで最大の分子毒性データベースや200以上の薬物安全性予測モデルなどの結果につながった。最近、テック大企業と記録的な資金調達ラウンドが組み合わさったことで勢いが増した。長期的な効果は、安全で効果的な治療法から、薬品の特許の役割が減り、大規模なコラボレーションや、現在治療不能な病気に焦点を当てた新しいビジネス戦略まで幅広い。論文は、多くのレビューされたリソースは、過度に楽観的な将来の期待を持っているように見えるが、これらの開発のごく一部でさえ、生産性の危機を緩和するだろうと指摘している。最後に、この論文は、医薬品のAIに焦点をあてることによって、別の大きな破壊、すなわちオープンデータ共有とコラボレーションに向けて業界を軌道に乗せたと結論付けている。 This paper examines how pharmaceutical Artificial Intelligence advancements may affect the development of new drugs in the coming years. The question was answered by reviewing a rich body of source material, including industry literature, research journals, AI studies, market reports, market projections, discussion papers, press releases, and organizations' websites. The paper argues that continued innovation in pharmaceutical AI will enable rapid development of safe and effective therapies for previously untreatable diseases. A series of major points support this conclusion: The pharmaceutical industry is in a significant productivity crisis today, and AI-enabled research methods can be directly applied to reduce the time and cost of drug discovery projects. The industry already reported results such as a 10-fold reduction in drug molecule discovery times. Numerous AI alliances between industry, governments, and academia enabled utilizing proprietary data and led to outcomes such as the largest molecule toxicity database to date or more than 200 drug safety predictive models. 
The momentum was recently increased by the involvement of tech giants combined with record rounds of funding. The long-term effects will range from safer and more effective therapies, through the diminished role of pharmaceutical patents, to large-scale collaboration and new business strategies oriented around currently untreatable diseases. The paper notes that while many reviewed resources seem to have overly optimistic future expectations, even a fraction of these developments would alleviate the productivity crisis. Finally, the paper concludes that the focus on pharmaceutical AI put the industry on a trajectory towards another significant disruption: open data sharing and collaboration.	翻訳日:2023-03-25 13:40:49 公開日:2021-06-25
# 移動センサを用いた日常行動リズム変化を用いた患者非依存型統合失調症再発予測 Patient-independent Schizophrenia Relapse Prediction Using Mobile Sensor based Daily Behavioral Rhythm Changes ( http://arxiv.org/abs/2106.15353v1 ) ライセンス: Link先を確認	Bishal Lamichhane, Dror Ben-Zeev, Andrew Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Mikio Obuchi, Emily Scherer, Megan Walsh, Rui Wang, Weichen Wang, and Akane Sano	(参考訳) 統合失調症の再発は患者の健康、仕事、時には生命の安全に深刻な影響を与える。例えば、患者の早期の行動変化を検出することで、予定される再発を予測することができれば、再発を防ぐための介入が提供される。本研究では,モバイルセンシングデータを用いた統合失調症再発予測モデルを用いて,行動の特徴を特徴付ける。再発予測のための臨床展開シナリオを忠実に表現した逐次予測を提供する患者独立モデルの評価を行った。このモデルは、過去4週間のモバイルセンシングデータを使用して、来週の到来を予測している。本研究では,モバイルセンシングデータの毎日のテンプレートから抽出した行動リズム特徴,ema(ecological momentary assessment)による自己報告症状,および再帰予測のための分類器の比較を行った。 Naive Bayesをベースとしたモデルでは、63人の統合失調症患者からなるデータセットで最大1年間測定されたF2スコアが0.083であった。得られたF2スコアは低いが、ランダム分類のベースライン性能より優れている(F2スコアは0.02$\pm$0.024)。このように、移動センシングは、来るべき再発を検出するための予測値を持ち、現在の性能を改善するためにさらなる調査が必要である。その目的に向けて、患者の行動的特徴に基づくさらなる機能工学とモデルパーソナライズが役立つかもしれない。 A schizophrenia relapse has severe consequences for a patient's health, work, and sometimes even life safety. If an oncoming relapse can be predicted on time, for example by detecting early behavioral changes in patients, then interventions could be provided to prevent the relapse. In this work, we investigated a machine learning based schizophrenia relapse prediction model using mobile sensing data to characterize behavioral features. A patient-independent model providing sequential predictions, closely representing the clinical deployment scenario for relapse prediction, was evaluated. The model uses the mobile sensing data from the recent four weeks to predict an oncoming relapse in the next week. We used the behavioral rhythm features extracted from daily templates of mobile sensing data, self-reported symptoms collected via EMA (Ecological Momentary Assessment), and demographics to compare different classifiers for the relapse prediction. 
Naive Bayes based model gave the best results with an F2 score of 0.083 when evaluated in a dataset consisting of 63 schizophrenia patients, each monitored for up to a year. The obtained F2 score, though low, is better than the baseline performance of random classification (F2 score of 0.02 $\pm$ 0.024). Thus, mobile sensing has predictive value for detecting an oncoming relapse and needs further investigation to improve the current performance. Towards that end, further feature engineering and model personalization based on the behavioral idiosyncrasies of a patient could be helpful.	翻訳日:2023-03-25 13:40:09 公開日:2021-06-25
# 幾何ヒートポンプ:時間依存変調による熱輸送制御 Geometric Heat Pump: Controlling Thermal Transport with Time-dependent Modulations ( http://arxiv.org/abs/2106.14687v1 ) ライセンス: Link先を確認	Zi Wang, Luqin Wang, Jiangzhi Chen, Chen Wang, and Jie Ren	(参考訳) 熱力学の第2法則は、熱が平均として高温の熱浴から低温の熱浴へ自発的に流れることを定めている。この描像を超えて、過去10年間の一連の研究は、時間的に駆動される系において、瞬間的な熱バイアスによって決まる平均的な動的熱流束に加えて、内在的な幾何学的起源を持つ非自明な熱流束の寄与が一般に存在することを示している。この追加の熱流束は、ポンプされる熱に対して「フリーランチ」を与え、バイアスに逆らって熱を駆動することさえできる。ここでは、トポロジカルな幾何学的位相効果に由来する、いわゆる「幾何学的ヒートポンプ」の出現と発展を概観し、内部ダイナミクスの異なる様々な量子および古典輸送系を取り上げる。断熱領域から非断熱領域への一般化と制御理論の応用についても論じる。次に、双対性、超対称性、時間反転対称性など、ヒートポンプ効果に対する対称性による制約について簡単に述べる。最後に、幾何学的ヒートポンプ過程に関する未解決問題を検討し、高性能な熱機械の考案におけるその将来的な意義を明らかにする。 The second law of thermodynamics dictates that heat spontaneously flows from the hot to cold bath on average. To go beyond this picture, a range of works in the past decade show that, other than the average dynamical heat flux determined by instantaneous thermal bias, a non-trivial flux contribution of intrinsic geometric origin is generally present in temporally driven systems. This additional heat flux provides a free lunch for the pumped heat and could even drive heat against the bias. We review here the emergence and development of this so-called "geometric heat pump", originating from the topological geometric phase effect, and cover various quantum and classical transport systems with different internal dynamics. The generalization from the adiabatic to the non-adiabatic regime and the application of control theory are also discussed. Then, we briefly discuss the symmetry restriction on the heat pump effect, such as duality, supersymmetry and time-reversal symmetry. Finally, we examine open problems concerning the geometric heat pump process and elucidate their prospective significance in devising thermal machines with high performance.	翻訳日:2023-03-25 13:39:45 公開日:2021-06-25
# 「量子エネルギーテレポーテーションはいつ観測可能か?」へのコメント Comment on "When the Quantum Energy Teleportation is Observable?" ( http://arxiv.org/abs/2106.14680v1 ) ライセンス: Link先を確認	Masahiro Hotta	(参考訳) 最近の論文(arXiv:2105.04407)の著者らは、時間-エネルギーの不確定性関係のため、量子エネルギーテレポーテーションは観測不可能であると主張している。この短いノートでは、彼らの議論が誤りであることを指摘する。彼らは不確定性関係を誤用している。 Recently authors of a paper (arXiv:2105.04407) claim that quantum energy teleportation is unobservable due to a time-energy uncertainty relation. In this short note, I will point out that their argument is wrong. They misuse the uncertainty relation.	翻訳日:2023-03-25 13:39:23 公開日:2021-06-25
# ハードウェアニューラルネットワークにおけるノイズ摂動下でのブール学習 Boolean learning under noise-perturbations in hardware neural networks ( http://arxiv.org/abs/2003.12319v2 ) ライセンス: Link先を確認	Louis Andreoli, Xavier Porte, Stéphane Chrétien, Maxime Jacquot, Laurent Larger and Daniel Brunner	(参考訳) ニューラルネットワークの高効率なハードウェア統合は、非線形性、ネットワーク接続、学習のすべてを物理基板上で実現することから恩恵を受ける。最近、複数のシステムがこれらの操作の一部または全部を実装したが、焦点は技術的課題への対処に置かれていた。ハードウェアニューラルネットワークにおける学習に関する基本的な問いは、ほとんど未解明のままである。このようなアーキテクチャでは特にノイズが避けられず、本研究では光電子リカレントニューラルネットワークを用いて、ノイズと学習アルゴリズムとの相互作用を調べる。ノイズは収束過程におけるシステムの経路を大きく変化させ、驚くべきことに、最終的な読み出し重み行列を完全に無相関化することがわかった。これは、アーキテクチャ、ノイズ、学習アルゴリズムを相互作用する要素として理解することの重要性を浮き彫りにし、ノイズのあるアナログシステムの最適化のための数学的ツールの必要性を示している。 A high efficiency hardware integration of neural networks benefits from realizing nonlinearity, network connectivity and learning fully in a physical substrate. Multiple systems have recently implemented some or all of these operations, yet the focus was placed on addressing technological challenges. Fundamental questions regarding learning in hardware neural networks remain largely unexplored. Noise in particular is unavoidable in such architectures, and here we investigate its interaction with a learning algorithm using an opto-electronic recurrent neural network. We find that noise strongly modifies the system's path during convergence, and surprisingly fully decorrelates the final readout weight matrices. This highlights the importance of understanding architecture, noise and learning algorithm as interacting players, and therefore identifies the need for mathematical tools for noisy, analogue system optimization.	翻訳日:2022-12-19 04:27:01 公開日:2021-06-25
# ベイズ最適化を用いた未知非線形システムの安全学習に基づくオブザーバ Safe Learning-based Observers for Unknown Nonlinear Systems using Bayesian Optimization ( http://arxiv.org/abs/2005.05888v2 ) ライセンス: Link先を確認	Ankush Chakrabarty and Mouhacine Benosman	(参考訳) 未知のダイナミクスを持つ動的システムから生成されたデータは、モデル化誤差に対してロバストで、設計が計算上扱いやすく、保証された性能で動作可能な状態オブザーバの学習を可能にする。本稿では、3つの設計段階からなるモジュール式の設計方法論を定式化する。 (i) 状態推定誤差を発散させることなくダイナミクスを学習できる(したがって安全な)初期ロバストオブザーバの設計、 (ii) ベイズ最適化とガウス過程を用いて未モデル化成分を推定する学習段階、 (iii) 学習したダイナミクスを利用して状態推定誤差の収束率を改善する再設計段階。提案する学習ベースオブザーバの可能性をベンチマーク非線形システムで実証する。さらに、保証された推定性能の証明書を与える。 Data generated from dynamical systems with unknown dynamics enable the learning of state observers that are: robust to modeling error, computationally tractable to design, and capable of operating with guaranteed performance. In this paper, a modular design methodology is formulated, that consists of three design phases: (i) an initial robust observer design that enables one to learn the dynamics without allowing the state estimation error to diverge (hence, safe); (ii) a learning phase wherein the unmodeled components are estimated using Bayesian optimization and Gaussian processes; and, (iii) a re-design phase that leverages the learned dynamics to improve convergence rate of the state estimation error. The potential of our proposed learning-based observer is demonstrated on a benchmark nonlinear system. Additionally, certificates of guaranteed estimation performance are provided.	翻訳日:2022-12-03 20:00:12 公開日:2021-06-25
# daemon:マルチステージ特徴マイニングを用いたデータセット非依存なマルウェア分類 DAEMON: Dataset-Agnostic Explainable Malware Classification Using Multi-Stage Feature Mining ( http://arxiv.org/abs/2008.01855v2 ) ライセンス: Link先を確認	Ron Korine and Danny Hendler	(参考訳) シグネチャに基づく検出を回避するために、悪意のあるプログラムのコードを変換する変異エンジンによって、多くの変成的および多形的な悪意のある変種が毎日自動的に生成される。これらの自動処理によってマルウェアの変種数が大幅に増加し、完全な手動解析が不可能になった。マルウェア分類は、新しい悪意のある亜種が属する家族を決定するタスクである。同じマルウェアファミリーの変異種は、同様の行動パターンを示す。したがって、新たに発見された悪意あるプログラムとアプリケーションの分類は、それらが引き起こすリスクを評価するのに役立つ。さらに、マルウェア分類は、新たに発見された変異種のうちどれがセキュリティ専門家による手動分析を受けるべきかを決定するのを容易とし、新しいファミリー(例えば、ゼロデイ脆弱性を利用するメンバー)に属するか、または単に既知の悪意のある家族内の概念のドリフトの結果であるかを決定する。これは近年、マルウェア分類のための高精度自動ツールの開発に力を入れている。本稿では,新しいデータセット非依存マルウェア分類器であるdaemonを提案する。デーモンの重要な特性は、使用する特徴の種類とそれらの採掘方法が、マルウェアファミリーの識別行動の理解を容易とし、その分類決定を説明可能にすることである。私たちは、x86バイナリの大規模なデータセットを使用して、DAEMONを最適化しました。その後、再トレーニングして、アルゴリズム的な変更なしに、多数のマルウェアファミリーからなる悪意のあるandroidアプリケーションの2つの大規模データセットに、機能再設計やパラメータチューニングを適用しました。 DAEMONは全てのデータセットの高精度な分類結果を得て、プラットフォームに依存しないことも確認した。 Numerous metamorphic and polymorphic malicious variants are generated automatically on a daily basis by mutation engines that transform the code of a malicious program while retaining its functionality, in order to evade signature-based detection. These automatic processes have greatly increased the number of malware variants, deeming their fully-manual analysis impossible. Malware classification is the task of determining to which family a new malicious variant belongs. Variants of the same malware family show similar behavioral patterns. Thus, classifying newly discovered malicious programs and applications helps assess the risks they pose. Moreover, malware classification facilitates determining which of the newly discovered variants should undergo manual analysis by a security expert, in order to determine whether they belong to a new family (e.g., one whose members exploit a zero-day vulnerability) or are simply the result of a concept drift within a known malicious family. 
This motivated intense research in recent years on devising high-accuracy automatic tools for malware classification. In this work, we present DAEMON - a novel dataset-agnostic malware classifier. A key property of DAEMON is that the type of features it uses and the manner in which they are mined facilitate understanding the distinctive behavior of malware families, making its classification decisions explainable. We've optimized DAEMON using a large-scale dataset of x86 binaries, belonging to a mix of several malware families targeting computers running Windows. We then re-trained it and applied it, without any algorithmic change, feature re-engineering or parameter tuning, to two other large-scale datasets of malicious Android applications consisting of numerous malware families. DAEMON obtained highly accurate classification results on all datasets, establishing that it is also platform-agnostic.	翻訳日:2022-11-03 01:17:49 公開日:2021-06-25
# 一般化線形モデルに対する線形およびスペクトル推定器の最適組み合わせ Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models ( http://arxiv.org/abs/2008.03326v3 ) ライセンス: Link先を確認	Marco Mondelli, Christos Thrampoulidis and Ramji Venkataramanan	(参考訳) 本研究では,gaussian sensing matrixを用いた一般化線形モデルから得られた未知信号 $\boldsymbol x$ の回復問題について検討する。 2つの一般的な解は、線形推定器 $\hat{\boldsymbol x}^{\rm l}$ とスペクトル推定器 $\hat{\boldsymbol x}^{\rm s}$ に基づいている。前者は測定行列の列のデータ依存線形結合であり、その解析は非常に単純である。後者はデータ依存行列の主要な固有ベクトルであり、最近の研究でその性能が研究されている。本稿では、$\hat{\boldsymbol x}^{\rm L}$と$\hat{\boldsymbol x}^{\rm s}$を最適に組み合わせる方法を示す。我々の分析の中心は、高次元極限において$(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$の合同経験的分布の正確な特徴づけである。これにより、$\hat{\boldsymbol x}^{\rm l}$ と $\hat{\boldsymbol x}^{\rm s}$ のベイズ・オプティカル結合を計算することができる。信号の分布がガウス的であるとき、ベイズ-最適結合は $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$ という形式を持ち、最適結合係数を導出する。 $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$の極限分布を確立するために、反復が$\hat{\boldsymbol x}^{\rm L}$を付与する近似メッセージパッシング(AMP)アルゴリズムを設計・解析し、$\hat{\boldsymbol x}^{\rm s}$にアプローチする。数値シミュレーションにより, 2つの手法を別々に検討した結果, 提案手法の改良が示された。 We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. 
At the heart of our analysis is the exact characterization of the joint empirical distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$. When the distribution of the signal is Gaussian, then the Bayes-optimal combination has the form $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$ and we derive the optimal combination coefficient. In order to establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm whose iterates give $\hat{\boldsymbol x}^{\rm L}$ and approach $\hat{\boldsymbol x}^{\rm s}$. Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.	翻訳日:2022-11-02 01:22:51 公開日:2021-06-25
# データの目における公平性: 機械学習モデルの証明 Fairness in the Eyes of the Data: Certifying Machine-Learning Models ( http://arxiv.org/abs/2009.01534v3 ) ライセンス: Link先を確認	Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet	(参考訳) 本稿では,対話型およびプライバシ保護テストに基づいて,モデルの公正度を認定するフレームワークを提案する。フレームワークはトレーニングプロセスやアーキテクチャに関係なく、トレーニングされたモデルを検証する。これにより、複数のフェアネス定義に基づくディープラーニングモデルの評価を経験的に行うことができる。テストデータはテスタにのみプライベートに提供されるか、あるいはモデル作成者にも事前に公開されている2つのシナリオに対処します。理論解析を用いて提案手法の健全性を調査し,インタラクティブテストのための統計的保証を提案する。最後に,参加者の機密データを隠蔽しながら,対象モデルへのブラックボックスアクセスのみを用いて,公正性テストを自動化する暗号手法を提案する。 We present a framework that allows to certify the fairness degree of a model based on an interactive and privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture. Thus, it allows us to evaluate any deep learning model on multiple fairness definitions empirically. We tackle two scenarios, where either the test data is privately available only to the tester or is publicly known in advance, even to the model creator. We investigate the soundness of the proposed approach using theoretical analysis and present statistical guarantees for the interactive test. Finally, we provide a cryptographic technique to automate fairness testing and certified inference with only black-box access to the model at hand while hiding the participants' sensitive data.	翻訳日:2022-10-22 06:50:14 公開日:2021-06-25
# 正規化流れにおける深さと条件の表現的側面 Representational aspects of depth and conditioning in normalizing flows ( http://arxiv.org/abs/2010.01155v2 ) ライセンス: Link先を確認	Frederic Koehler, Viraj Mehta, Andrej Risteski	(参考訳) 正規化フローは、特に画像において、データポイントの確率を効率的に評価できるため、生成モデリングにおいて最も一般的なパラダイムの一つである。これは、モデルの適合性を評価することと、トレーニングの容易性の両方において、勾配降下による可能性の最大化が望ましい。しかし、フローの正規化のトレーニングには、困難も伴う。良いサンプルを生成するモデルは通常、非常に深いものが必要です。それらは $\mathbb{R}^d \to \mathbb{R}^d$ から可逆写像としてパラメータ化され、画像のような典型的な訓練データは直感的に低次元であるため、学習された写像は特異点に近いヤコビアンを持つことが多い。本稿では,一般的な可逆アーキテクチャと一般的なアーキテクチャであるアフィンカップリングについて,奥行きと正規化フローの条件付けに関する表現的側面を取り上げる。 GLOWで使われているように、$\Theta(1)$アフィン結合層は、置換を正確に表すのに十分か、または1ドル \times 1$畳み込みを表すのに十分であることを示す。また, 浅いアフィンカップリングネットワークは不調が許容される場合, ワッサースタイン距離の普遍近似であり, パディングに関連する関連する現象を実験的に検討する。最後に,層ごとのニューロン数が少なく,リプシッツ定数が有界な一般フローアーキテクチャの深さ下界を示す。 Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. This is desirable both for evaluating the fit of a model, and for ease of training, as maximizing the likelihood can be done by gradient descent. However, training normalizing flows comes with difficulties as well: models which produce good samples typically need to be extremely deep -- which comes with accompanying vanishing/exploding gradient problems. A very related problem is that they are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d \to \mathbb{R}^d$, and typical training data like images intuitively is lower-dimensional, the learned maps often have Jacobians that are close to being singular. In our paper, we tackle representational aspects around depth and conditioning of normalizing flows: both for general invertible architectures, and for a particular common architecture, affine couplings. 
We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or $1 \times 1$ convolution, as used in GLOW, showing that representationally the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and experimentally investigate related phenomena involving padding. Finally, we show a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.	翻訳日:2022-10-12 00:59:45 公開日:2021-06-25
# ニューラルマシン翻訳におけるソース分析と予測への目標貢献 Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation ( http://arxiv.org/abs/2010.10907v3 ) ライセンス: Link先を確認	Elena Voita, Rico Sennrich, Ivan Titov	(参考訳) ニューラルネットワーク翻訳(およびより一般的には、条件付き言語モデリング)では、ターゲットトークンの生成は、ターゲットシーケンスのソースとプレフィックスの2つのタイプのコンテキストに影響される。 NMTモデルの内部動作を理解するために多くの試みがなされているが、いずれも相対的な情報源と世代決定への目標貢献を明示的に評価するものではない。この相対的貢献は、Layerwise Relevance Propagation (LRP)の変種を採用することで評価できると論じる。他の方法とは異なり、トークンの重要性を反映した抽象的な量ではなく、それぞれのトークンの影響の比率を評価する。我々は、LPPをTransformerに拡張し、生成プロセスに対するソースおよびターゲット相対的コントリビューションを明確に評価するNMTモデルの解析を行う。本研究は,プレフィックスの種類による条件づけや,トレーニング目標やトレーニングデータ量の変化,トレーニングプロセスにおける貢献度の変化を分析する。より多くのデータでトレーニングされたモデルは、ソース情報に依存する傾向があり、より鋭いトークンコントリビュートを持つ傾向があることが分かりました。 In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. 
We find that models trained with more data tend to rely on source information more and to have more sharp token contributions; the training process is non-monotonic with several stages of different nature.	翻訳日:2022-10-04 23:50:19 公開日:2021-06-25
# 視覚オブジェクト操作のためのキーポイント予測モデルのマルチモーダル学習 Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation ( http://arxiv.org/abs/2011.03882v2 ) ライセンス: Link先を確認	Sarah Bechtle, Neha Das and Franziska Meier	(参考訳) 人間は、全く新しい環境でオブジェクトやツールを操作できるという印象的な一般化能力を持っている。これらの能力は、少なくとも部分的には、人間の体の内部モデルと把握された物体を持つ結果である。ロボットのボディスキーマを学習する方法は、まだ未解決の問題である。本研究では,視覚潜在表現から物体をつかむ際にロボットの運動モデルを拡張できる自己教師付きアプローチを開発した。本フレームワークは,(1) 物体上の視覚的キーポイントを予測するためにプロセプションと視覚を融合させて訓練したオートエンコーダアーキテクチャ,(2) 学習したキーポイント検出器を用いて,予測された視覚的キーポイントから仮想ジョイントを回帰することにより,キネマティックチェーンの拡張を学習する方法を示す。提案手法は,マニピュレータの手にある物体の視覚的キーポイントを一貫して予測することを学び,数秒間の視覚データから,様々な構成で把握された物体を含む拡張キネマティックチェーンの学習を容易にする。最後に, この拡張キネマティックチェーンは, 把握対象の配置やシミュレーション実験, ハードウェア上での実験など, オブジェクト操作作業に役立てることを示す。 Humans have impressive generalization capabilities when it comes to manipulating objects and tools in completely novel environments. These capabilities are, at least partially, a result of humans having internal models of their bodies and any grasped object. How to learn such body schemas for robots remains an open problem. In this work, we develop an self-supervised approach that can extend a robot's kinematic model when grasping an object from visual latent representations. Our framework comprises two components: (1) we present a multi-modal keypoint detector: an autoencoder architecture trained by fusing proprioception and vision to predict visual key points on an object; (2) we show how we can use our learned keypoint detector to learn an extension of the kinematic chain by regressing virtual joints from the predicted visual keypoints. Our evaluation shows that our approach learns to consistently predict visual keypoints on objects in the manipulator's hand, and thus can easily facilitate learning an extended kinematic chain to include the object grasped in various configurations, from a few seconds of visual data. 
Finally we show that this extended kinematic chain lends itself for object manipulation tasks such as placing a grasped object and present experiments in simulation and on hardware.	翻訳日:2022-09-28 08:35:52 公開日:2021-06-25
# テキスト非依存話者照合のためのMasked Proxy損失 Masked Proxy Loss For Text-Independent Speaker Verification ( http://arxiv.org/abs/2011.04491v2 ) ライセンス: Link先を確認	Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh	(参考訳) オープンセット話者認識は、クラス間分散を最大化しクラス内分散を最小化する距離学習(メトリック学習)問題とみなすことができる。教師付き距離学習は、エンティティベースの学習とプロキシベースの学習に分類できる。Contrastive、Triplet、Prototypical、GE2Eなど、既存の距離学習の目的関数のほとんどは前者に属しており、その性能はサンプルマイニング戦略に大きく依存するか、ミニバッチ内のラベル情報の不足によって制限される。プロキシベースの損失はどちらの欠点も軽減するが、エンティティ間のきめ細かな関係は活用されないか、間接的にしか活用されない。本稿では、プロキシベースの関係とペアベースの関係の両方を直接組み込んだMasked Proxy (MP)損失を提案する。さらに、話者ペアの難しさを活用するために、Multinomial Masked Proxy (MMP)損失を提案する。これらの手法をVoxCelebテストセットで評価し、最先端の等価エラー率(EER)を達成した。 Open-set speaker recognition can be regarded as a metric learning problem, which is to maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning. Most of the existing metric learning objectives like Contrastive, Triplet, Prototypical, GE2E, etc all belong to the former division, the performance of which is either highly dependent on sample mining strategy or restricted by insufficient label information in the mini-batch. Proxy-based losses mitigate both shortcomings, however, fine-grained connections among entities are either not or indirectly leveraged. This paper proposes a Masked Proxy (MP) loss which directly incorporates both proxy-based relationships and pair-based relationships. We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs. These methods have been applied to evaluate on VoxCeleb test set and reach state-of-the-art Equal Error Rate (EER).	翻訳日:2022-09-28 02:02:57 公開日:2021-06-25
# 強化学習によるファクターポートフォリオのためのディリクレ方策 Dirichlet policies for reinforced factor portfolios ( http://arxiv.org/abs/2011.05381v3 ) ライセンス: Link先を確認	Eric André and Guillaume Coqueret	(参考訳) 本稿は、ファクター投資と強化学習(RL)を組み合わせることを目的とする。エージェントは、企業の特性に依存する逐次的なランダム配分を通じて学習する。ディリクレ分布を駆動方策として用いることで、方策勾配の閉形式と性能指標の解析的性質を導出する。これにより、REINFORCE法の実装が可能となり、米国株式の大規模データセット上で実行する。幅広いパラメータ設定において、RLベースのポートフォリオは等ウェイト(1/N)配分に非常に近いことが示された。これは、エージェントがファクターに対して中立的(agnostic)であることを学習することを意味する。このことは、リターンと企業特性の関係が強い時間変動を示すというクロスセクション回帰の結果によって部分的に説明できる。 This article aims to combine factor investing and reinforcement learning (RL). The agent learns through sequential random allocations which rely on firms' characteristics. Using Dirichlet distributions as the driving policy, we derive closed forms for the policy gradients and analytical properties of the performance measure. This enables the implementation of REINFORCE methods, which we perform on a large dataset of US equities. Across a large range of parametric choices, our result indicates that RL-based portfolios are very close to the equally-weighted (1/N) allocation. This implies that the agent learns to be agnostic with regard to factors, which can partly be explained by cross-sectional regressions showing a strong time variation in the relationship between returns and firm characteristics.	翻訳日:2022-09-27 08:16:52 公開日:2021-06-25
# iReason: ビデオと解釈可能な自然言語を用いたマルチモーダルコモンセンス推論 iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability ( http://arxiv.org/abs/2107.10300v1 ) ライセンス: Link先を確認	Aman Chadha and Vinija Jain	(参考訳) 因果的知識は、堅牢なAIシステムを構築する上で不可欠である。ディープラーニングモデルは、しばしば因果推論を必要とするタスクでパフォーマンスが悪く、入力ですぐには利用できないが、人間によって暗黙的に推論されるある種のコモンセンス知識を用いて導出されることが多い。先行研究は、モデルが因果性の欠如に危険を及ぼすような、散発的な観察バイアスを生じさせていない。言語表現モデルは学習された組込みの中で文脈知識を保存するが、訓練中の因果関係には影響しない。視覚認知タスク(シーン理解、ビデオキャプション、ビデオ質問回答など)を実行する既存のモデルに、入力特徴と因果関係をブレンドすることにより。 ) 因果関係がもたらす洞察により、より良いパフォーマンスを達成することができる。近年,視覚的・テキスト的モダリティから因果データを抽出する作業に取り組むモデルがいくつか提案されている。しかし、視覚と言語的モダリティを併せ持つ因果関係を探究する広範な研究は存在していない。画像は因果関係の知識を抽出するためのリッチでプロセスのリソースを提供するが、ビデオはより密度が高く、自然に時間順のイベントで構成されている。また、テキスト情報はビデオで暗黙的な詳細を提供する。 ireasonは,映像と自然言語キャプションを用いて視覚・視覚常識知識を推定するフレームワークである。さらに、iReasonのアーキテクチャは因果合理化モジュールを統合し、解釈可能性、エラー分析、バイアス検出のプロセスを支援する。言語表現学習モデル(BERT, GPT-2)と現在の最先端マルチモーダル因果モデルを用いた2段階比較分析によるiReasonの有効性を実証する。 Causality knowledge is vital to building robust AI systems. Deep learning models often perform poorly on tasks that require causal reasoning, which is often derived using some form of commonsense knowledge not immediately available in the input but implicitly inferred by humans. Prior work has unraveled spurious observational biases that models fall prey to in the absence of causality. While language representation models preserve contextual knowledge within learned embeddings, they do not factor in causal relationships during training. By blending causal relationships with the input features to an existing model that performs visual cognition tasks (such as scene understanding, video captioning, video question-answering, etc.), better performance can be achieved owing to the insight causal relationships bring about. Recently, several models have been proposed that have tackled the task of mining causal data from either the visual or textual modality. 
However, there does not exist widespread research that mines causal relationships by juxtaposing the visual and language modalities. While images offer a rich and easy-to-process resource for us to mine causality knowledge from, videos are denser and consist of naturally time-ordered events. Also, textual information offers details that could be implicit in videos. We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions. Furthermore, iReason's architecture integrates a causal rationalization module to aid the process of interpretability, error analysis and bias detection. We demonstrate the effectiveness of iReason using a two-pronged comparative analysis with language representation learning models (BERT, GPT-2) as well as current state-of-the-art multimodal causality models.	翻訳日:2021-07-25 11:53:39 公開日:2021-06-25
# (参考訳) aiSTROM -- AI戦略を成功させるロードマップ aiSTROM -- A roadmap for developing a successful AI strategy ( http://arxiv.org/abs/2107.06071v1 ) ライセンス: CC BY 4.0	Dorien Herremans	(参考訳) 1,870社のRackspace Technologyによる最近の調査によると、AI研究開発プロジェクトの34%が失敗または放棄されている。我々は、管理者が詳細な文献レビューに基づいてAI戦略を成功させるための新しい戦略フレームワーク、aiSTROMを提案する。これは、マネージャと開発者を、実装プロセスのさまざまな課題を通じて導く、ユニークで統合されたアプローチを提供する。 aiSTROMフレームワークでは、まずトップnプロジェクト(典型的には3-5)を特定します。それぞれ、焦点の7つの領域を徹底的に分析する。これらの領域には、独自の部門間機械学習データ要件、セキュリティ、法的要件を考慮したデータ戦略の作成が含まれる。そして、AI人材の不足を踏まえた学際的人工知能(AI)実装チームを編成する方法を考えるようマネージャに指示する。 AIチームの戦略が確立すれば、部門横断的あるいは独立した部門として、組織内に配置する必要があります。その他には、AI as a service(AIaas)やアウトソーシング開発などがある。新しい技術を見てみると、バイアス、ブラックボックスモデルの合法性、人間をループに留めるといった課題を考える必要がある。次に、他のプロジェクトと同様に、進捗を追跡し検証するために価値ベースのキーパフォーマンス指標(KPI)が必要です。企業のリスク戦略によって、SWOT分析(強度、弱点、機会、脅威)は、ショートリスト化されたプロジェクトをさらに分類するのに役立ちます。最後に、採用の文化を実現するために、当社の戦略が従業員の継続的な教育を含むことを確実にすべきです。このユニークで包括的なフレームワークは、マネージャとリードディベロッパに価値ある文学的サポートを提供する。 A total of 34% of AI research and development projects fails or are abandoned, according to a recent survey by Rackspace Technology of 1,870 companies. We propose a new strategic framework, aiSTROM, that empowers managers to create a successful AI strategy based on a thorough literature review. This provides a unique and integrated approach that guides managers and lead developers through the various challenges in the implementation process. In the aiSTROM framework, we start by identifying the top n potential projects (typically 3-5). For each of those, seven areas of focus are thoroughly analysed. These areas include creating a data strategy that takes into account unique cross-departmental machine learning data requirements, security, and legal requirements. aiSTROM then guides managers to think about how to put together an interdisciplinary artificial intelligence (AI) implementation team given the scarcity of AI talent. 
Once an AI team strategy has been established, it needs to be positioned within the organization, either cross-departmental or as a separate division. Other considerations include AI as a service (AIaas), or outsourcing development. Looking at new technologies, we have to consider challenges such as bias, legality of black-box-models, and keeping humans in the loop. Next, like any project, we need value-based key performance indicators (KPIs) to track and validate the progress. Depending on the company's risk-strategy, a SWOT analysis (strengths, weaknesses, opportunities, and threats) can help further classify the shortlisted projects. Finally, we should make sure that our strategy includes continuous education of employees to enable a culture of adoption. This unique and comprehensive framework offers a valuable, literature supported, tool for managers and lead developers.	翻訳日:2021-07-18 19:56:15 公開日:2021-06-25
# 深部ニューラルネットワークを用いた内因性電位を用いた自然脳と機械の相互作用 Towards Natural Brain-Machine Interaction using Endogenous Potentials based on Deep Neural Networks ( http://arxiv.org/abs/2107.07335v1 ) ライセンス: Link先を確認	Hyung-Ju Ahn, Dae-Hyeok Lee, Ji-Hoon Jeong, Seong-Whan Lee	(参考訳) 人間とロボットのコラボレーションは、自律ロボットの操作効率を最大化する可能性がある。脳機械インタフェース(BMI)は、ユーザーの意図や状態が神経活動から翻訳できるため、ロボットと協調する上で望ましい技術である。しかし、最も一般的な非侵襲的BMIモダリティの1つである脳波図(EEG)は、信号対雑音比が低いため、精度が低く、自由度(DoF)が制限されている。したがって、より柔軟なBMIベースの人間ロボットコラボレーションを開発するためには、マルチクラス脳波分類の性能向上が不可欠である。本研究では,運動画像 (MI) や視覚画像 (VI) ,音声画像 (SI) などの複数の内因性BMIパラダイムのパラダイム間分類の可能性を検討した。 MI, VI, SIの統計的, 神経生理学的解析を行い, 提案した時間情報ベースニューラルネットワーク(TINN)を用いて3つのパラダイムを分類した。 3つの内在的パラダイムを分類すると, 統計的に有意な特徴が異なる脳領域で抽出できることを確認した。さらに,提案したTINNは,従来の3種類の精神イメージタスク(MI, VI, SI)の分類法と比較して0.93の精度を示した。 Human-robot collaboration has the potential to maximize the efficiency of the operation of autonomous robots. Brain-machine interface (BMI) would be a desirable technology to collaborate with robots since the intention or state of users can be translated from the neural activities. However, the electroencephalogram (EEG), which is one of the most popularly used non-invasive BMI modalities, has low accuracy and a limited degree of freedom (DoF) due to a low signal-to-noise ratio. Thus, improving the performance of multi-class EEG classification is crucial to develop more flexible BMI-based human-robot collaboration. In this study, we investigated the possibility for inter-paradigm classification of multiple endogenous BMI paradigms, such as motor imagery (MI), visual imagery (VI), and speech imagery (SI), to enhance the limited DoF while maintaining robust accuracy. We conducted the statistical and neurophysiological analyses on MI, VI, and SI and classified three paradigms using the proposed temporal information-based neural network (TINN). We confirmed that statistically significant features could be extracted on different brain regions when classifying three endogenous paradigms. 
Moreover, our proposed TINN showed the highest accuracy of 0.93 compared to the previous methods for classifying three different types of mental imagery tasks (MI, VI, and SI).	翻訳日:2021-07-18 12:21:15 公開日:2021-06-25
# (参考訳) IoTセキュリティにおける侵入検出のためのフェデレーション学習:ハイブリッドアンサンブルアプローチ Federated Learning for Intrusion Detection in IoT Security: A Hybrid Ensemble Approach ( http://arxiv.org/abs/2106.15349v1 ) ライセンス: CC BY 4.0	Sayan Chatterjee and Manjesh K. Hanawal	(参考訳) スマートシティ、ヘルスケア、サプライチェーン、輸送といったさまざまなドメインにおけるIoT(Internet of Things)の役割は、悪意のある攻撃の標的となっている。この領域における過去の研究は、データ分析と脅威を特定する中央エンティティの存在を前提として、集中侵入検知システム(IDS)に焦点を当てていた。しかし、複数のソースにまたがるデータの拡散と中央ノードでの収集がコストがかかるため、そのようなIDSが常に実現可能であるとは限らない。また、初期の作業は、主にTrue Positive Rate(TPR)の改善に重点を置いており、システムの不必要なダウンタイムを回避する上でも不可欠である偽陽性レート(FPR)を無視している。本稿では、まず、PHECと呼ばれるハイブリッドアンサンブルモデルに基づくIDSのためのアーキテクチャを提案する。次に、このモデルを、ローカルトレーニングを実行し、モデルパラメータのみを集約する連合学習フレームワークに適応させる。次に、ラベルノイズ問題に対処するために、集中型およびフェデレートされた環境における耐雑音性PHECを提案する。提案手法は重み付き凸代理損失関数を用いた分類器を用いる。提案アーキテクチャでは,KNN分類器のノイズデータに対する自然な堅牢性も利用している。各種セキュリティ攻撃から得られた4つのベンチマークデータセットによる実験結果から,FPRを低ノイズでクリーンなデータに保ちながら高いTPRを達成することが示された。さらに, ハイブリッドアンサンブルモデルにより, 集中型設定に近いフェデレーション設定において, 性能が向上することを示した。 Critical role of Internet of Things (IoT) in various domains like smart city, healthcare, supply chain and transportation has made them the target of malicious attacks. Past works in this area focused on centralized Intrusion Detection System (IDS), assuming the existence of a central entity to perform data analysis and identify threats. However, such IDS may not always be feasible, mainly due to spread of data across multiple sources and gathering at central node can be costly. Also, the earlier works primarily focused on improving True Positive Rate (TPR) and ignored the False Positive Rate (FPR), which is also essential to avoid unnecessary downtime of the systems. In this paper, we first present an architecture for IDS based on hybrid ensemble model, named PHEC, which gives improved performance compared to state-of-the-art architectures. We then adapt this model to a federated learning framework that performs local training and aggregates only the model parameters. 
Next, we propose Noise-Tolerant PHEC in centralized and federated settings to address the label-noise problem. The proposed idea uses classifiers using weighted convex surrogate loss functions. Natural robustness of KNN classifier towards noisy data is also used in the proposed architecture. Experimental results on four benchmark datasets drawn from various security attacks show that our model achieves high TPR while keeping FPR low on noisy and clean data. Further, they also demonstrate that the hybrid ensemble models achieve performance in federated settings close to that of the centralized settings.	翻訳日:2021-07-02 04:49:45 公開日:2021-06-25
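The abstract above describes a federated setup that "performs local training and aggregates only the model parameters". A minimal sketch of that aggregation step follows. It assumes a FedAvg-style average weighted by each client's sample count, which the abstract does not specify; `aggregate_parameters`, the parameter names `w` and `b`, and the client sizes are hypothetical and are not taken from the PHEC paper.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def aggregate_parameters(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Return the sample-weighted average of the clients' parameters.

    Only parameters are exchanged; the raw training data stays on each client.
    The weighting by sample count is an assumption (FedAvg-style).
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        aggregated[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two hypothetical clients that trained locally on 100 and 300 samples.
clients = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
sizes = [100, 300]
global_params = aggregate_parameters(clients, sizes)
print(global_params)
```

The larger client pulls the average towards its own parameters. Non-parametric members of an ensemble (for example a KNN classifier) have no weights to average and would need a different treatment, which this sketch does not cover.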
# (参考訳) PQK: プルーニング、量子化、知識蒸留によるモデル圧縮 PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation ( http://arxiv.org/abs/2106.14681v1 ) ライセンス: CC BY 4.0	Jangho Kim, Simyung Chang and Nojun Kwak	(参考訳) エッジデバイスが普及するにつれて、エッジデバイスにディープニューラルネットワーク(DNN)をデプロイすることが重要な問題となっている。しかし、DNNはエッジデバイスではほとんど利用できない高い計算資源を必要とする。そこで本稿では, プルーニング, 量子化, 知識蒸留(KD)プロセスからなるPQKと呼ばれる, 限られた計算資源を持つデバイスを対象とした新しいモデル圧縮手法を提案する。従来のプルーニングやKDとは異なり、PQKはプルーニング過程において重要でない重みを利用して、教師モデルを事前訓練することなく、より良い学生ネットワークをトレーニングするための教師ネットワークを構築している。 PQKには2つのフェーズがある。フェーズ1は、反復的プルーニングと量子化対応トレーニングを利用して、軽量で電力効率の良いモデルを作成する。第2相では、第1相未使用の重要度重みを刈り込みネットワークに付加して教師ネットワークを構築する。この教師ネットワークを用いて,学生ネットワークとして刈り取られたネットワークを訓練する。このような場合、教師と学生のネットワークが同一ネットワーク内で共存するため、KDフレームワーク用に事前学習した教師ネットワークは必要ない。本手法を認識モデルに適用し,キーワードスポッティング(KWS)と画像認識におけるPQKの有効性を検証する。 As edge devices become prevalent, deploying Deep Neural Networks (DNN) on edge devices has become a critical issue. However, DNN requires a high computational resource which is rarely available for edge devices. To handle this, we propose a novel model compression method for the devices with limited computational resources, called PQK consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of unimportant weights pruned in the pruning process to make a teacher network for training a better student network without pre-training the teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to make a lightweight and power-efficient model. In phase 2, we make a teacher network by adding unimportant weights unused in phase 1 to a pruned network. By using this teacher network, we train the pruned network as a student network. In doing so, we do not need a pre-trained teacher network for the KD framework because the teacher and the student networks coexist within the same network. 
We apply our method to the recognition model and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.	翻訳日:2021-07-02 04:32:43 公開日:2021-06-25
# (参考訳) 問題を使って法的決定を説明する Using Issues to Explain Legal Decisions ( http://arxiv.org/abs/2106.14688v1 ) ライセンス: CC BY 4.0	Trevor Bench-Capon	(参考訳) 訴訟の結果を予測するために設計された機械学習システムからのアウトプットを説明する必要性は、従来のaiと法システム、特に因子ベースの推論と前例を用いた説明に対する新たな関心をもたらした。本稿では,このようなシステムに対してどのような説明が期待できるのか,特に問題の利用によって提供できる構造に焦点をあてて検討する。 The need to explain the output from Machine Learning systems designed to predict the outcomes of legal cases has led to a renewed interest in the explanations offered by traditional AI and Law systems, especially those using factor based reasoning and precedent cases. In this paper we consider what sort of explanations we should expect from such systems, with a particular focus on the structure that can be provided by the use of issues in cases.	翻訳日:2021-07-01 12:27:13 公開日:2021-06-25
# (参考訳) 層状量子近似最適化におけるトレーニング飽和 Training Saturation in Layerwise Quantum Approximate Optimisation ( http://arxiv.org/abs/2106.13814v1 ) ライセンス: CC BY 4.0	E. Campos, D. Rabinovich, V. Akshay, J. Biamonte	(参考訳) 量子近似最適化(QAOA)は、今日最も研究されているゲートベースの変分量子アルゴリズムである。QAOAを一度に1層ずつトレーニングし、$n$ 量子ビットのターゲット状態とのオーバーラップを最大化する。その結果、このようなトレーニングは常にある深さ $p^*$ で飽和すること(\textit{training saturation} と呼ぶ)、すなわち、ある深さを超えると層を追加してもオーバーラップを改善できないことが分かった。我々は飽和の必要条件を定式化する。数値的には、層ごとのQAOAは深さ $p^*=n$ で最大オーバーラップに達する。トレーニングにコヒーレントな位相緩和誤差を加えると飽和が解消され、層ごとのトレーニングに対する堅牢性が回復する。本研究は,QAOAの性能限界と今後の展望について,新たな光を当てるものである。 Quantum Approximate Optimisation (QAOA) is the most studied gate based variational quantum algorithm today. We train QAOA one layer at a time to maximize overlap with an $n$ qubit target state. Doing so we discovered that such training always saturates -- called \textit{training saturation} -- at some depth $p^*$, meaning that past a certain depth, overlap can not be improved by adding subsequent layers. We formulate necessary conditions for saturation. Numerically, we find layerwise QAOA reaches its maximum overlap at depth $p^*=n$. The addition of coherent dephasing errors to training removes saturation, recovering robustness to layerwise training. This study sheds new light on the performance limitations and prospects of QAOA.	翻訳日:2021-07-01 11:55:24 公開日:2021-06-25
# (参考訳) ペルシャの修辞構造理論 Persian Rhetorical Structure Theory ( http://arxiv.org/abs/2106.13833v1 ) ライセンス: CC BY 4.0	Sara Shahmohammadi, Hadi Veisi, Ali Darzi	(参考訳) 過去数年間、談話分析や談話解析への関心は着実に高まり、多くの談話注釈コーパスが作られ、結果として談話パーサーが作られた。本稿では、レトリック構造理論の枠組みで構築されたペルシア語の言論注釈コーパスと、オープンソースの言論パーサであるDPLPパーサ上に構築された言論パーサについて述べる。私たちのコーパスは150のジャーナリストのテキストで構成され、各テキストは平均約400語である。コーパステキストは18の談話関係を用いて注釈付けされ、英語のrst談話ツリーバンクコーパスの注釈ガイドラインに基づいている。テキストレベルの談話パーサは金セグメンテーションを用いて訓練され,DPLP談話パーサ上に構築されている。スパン (s), 核性 (n), 関係性 (r) における我々の談話解析器の性能は, それぞれ78%, 64%, 44%であった。 Over the past years, interest in discourse analysis and discourse parsing has steadily grown, and many discourse-annotated corpora and, as a result, discourse parsers have been built. In this paper, we present a discourse-annotated corpus for the Persian language built in the framework of Rhetorical Structure Theory as well as a discourse parser built upon the DPLP parser, an open-source discourse parser. Our corpus consists of 150 journalistic texts, each text having an average of around 400 words. Corpus texts were annotated using 18 discourse relations and based on the annotation guideline of the English RST Discourse Treebank corpus. Our text-level discourse parser is trained using gold segmentation and is built upon the DPLP discourse parser, which uses a large-margin transition-based approach to solve the problem of discourse parsing. The performance of our discourse parser in span (S), nuclearity (N) and relation (R) detection is around 78%, 64%, 44% respectively, in terms of F1 measure.	翻訳日:2021-07-01 11:45:58 公開日:2021-06-25
# (参考訳) Ladder Polynomial Neural Networks Ladder Polynomial Neural Networks ( http://arxiv.org/abs/2106.13834v1 ) ライセンス: CC BY-SA 4.0	Li-Ping Liu, Ruiyuan Gu, Xiaozhe Hu	(参考訳) 多項式関数は有用な解析的性質を多数持っているが、それらの関数クラスは制限されていると考えられるため、学習モデルとして使われることは滅多にない。この研究は、適切な多項式関数を訓練すると強い学習モデルになることを示す。特にこの研究は、乗法から構築した新しい活性化関数である積活性化を用いて多項式フィードフォワードニューラルネットワークを構築する。新しいニューラルネットワークは多項式関数であり、多項式の順序を正確に制御する。バッチ正規化やドロップアウトといった標準的なトレーニングテクニックでトレーニングすることができる。この新しいfeedforwardネットワークは、いくつかの以前の多項式モデルを特別なケースとしてカバーする。一般的なフィードフォワードニューラルネットワークと比較して、多項式フィードフォワードネットワークはいくつかの興味深い量のクローズドフォーム計算を持ち、ベイズ学習において非常に有用である。経験的研究における回帰と分類の一連のタスクにおいて、提案モデルは以前の多項式モデルよりも優れている。 Polynomial functions have plenty of useful analytical properties, but they are rarely used as learning models because their function class is considered to be restricted. This work shows that when trained properly polynomial functions can be strong learning models. Particularly this work constructs polynomial feedforward neural networks using the product activation, a new activation function constructed from multiplications. The new neural network is a polynomial function and provides accurate control of its polynomial order. It can be trained by standard training techniques such as batch normalization and dropout. This new feedforward network covers several previous polynomial models as special cases. Compared with common feedforward neural networks, the polynomial feedforward network has closed-form calculations of a few interesting quantities, which are very useful in Bayesian learning. In a series of regression and classification tasks in the empirical study, the proposed model outperforms previous polynomial models.	翻訳日:2021-07-01 11:24:27 公開日:2021-06-25
# (参考訳) EARLIN:資源効率の協調推論のための早期分布検出 EARLIN: Early Out-of-Distribution Detection for Resource-efficient Collaborative Inference ( http://arxiv.org/abs/2106.13842v1 ) ライセンス: CC BY 4.0	Sumaiya Tabassum Nimi, Md Adnan Arefeen, Md Yusuf Sarwar Uddin, Yugyung Lee	(参考訳) 協調推論により、リソース制約のあるエッジデバイスは、重いディープラーニングモデルを実行するサーバ(クラウド)に入力(画像など)をアップロードすることで、推論を行うことができる。このセットアップは、成功した推論のためにコスト効率よく機能するが、モデルがトレーニングされていない入力サンプル(OOD(Out-of-Distribution)サンプル)に直面すると、非常にパフォーマンスが低下する。エッジデバイスが少なくとも、入力サンプルがOODであることを検出できれば、推論ワークロードのためにこれらの入力をサーバにアップロードしないことで、通信と計算リソースを節約できる可能性がある。本稿では,事前学習したCNNモデルの浅い層から重要な特徴を抽出し,縮小した特徴空間上に定義された距離関数に基づいて,入力サンプルをID(In-Distribution)またはOODとして検出する,新しい軽量なOOD検出手法を提案する。提案手法(a)は,事前学習したモデルに対して,それらのモデルの再トレーニングを伴わずに動作し,(b)任意のOODデータセットに自身を公開しない(すべての検出パラメータはIDトレーニングデータセットから得られる)。この目的のために、事前訓練されたモデルを用いて、OOD検出層でモデルを分割し、エッジデバイスとその他をクラウド上に展開するEARLIN(EARLy OOD Detection for Collaborative Inference)を開発した。実際のデータセットとプロトタイプの実装を用いて実験することにより,ベンチマークデータセットで事前学習された一般的なディープラーニングモデル上で,一般的なoodデータセットに対してテストした場合の全体的な精度とコストの観点から,他のアプローチよりも優れた結果が得られることを示す。 Collaborative inference enables resource-constrained edge devices to make inferences by uploading inputs (e.g., images) to a server (i.e., cloud) where the heavy deep learning models run. While this setup works cost-effectively for successful inferences, it severely underperforms when the model faces input samples on which the model was not trained (known as Out-of-Distribution (OOD) samples). If the edge devices could, at least, detect that an input sample is an OOD, that could potentially save communication and computation resources by not uploading those inputs to the server for inference workload. In this paper, we propose a novel lightweight OOD detection approach that mines important features from the shallow layers of a pretrained CNN model and detects an input sample as ID (In-Distribution) or OOD based on a distance function defined on the reduced feature space. 
Our technique (a) works on pretrained models without any retraining of those models, and (b) does not expose itself to any OOD dataset (all detection parameters are obtained from the ID training dataset). To this end, we develop EARLIN (EARLy OOD detection for Collaborative INference) that takes a pretrained model and partitions the model at the OOD detection layer and deploys the considerably small OOD part on an edge device and the rest on the cloud. By experimenting using real datasets and a prototype implementation, we show that our technique achieves better results than other approaches in terms of overall accuracy and cost when tested against popular OOD datasets on top of popular deep learning models pretrained on benchmark datasets.	翻訳日:2021-07-01 11:12:43 公開日:2021-06-25
# (参考訳) 近似最大半空間差 Approximate Maximum Halfspace Discrepancy ( http://arxiv.org/abs/2106.13851v1 ) ライセンス: CC BY 4.0	Michael Matheny and Jeff M. Phillips	(参考訳) 幾何学的範囲空間 $(X, \mathcal{H}_d)$ を考える。ここで $X \subset \mathbb{R}^d$ であり、$\mathcal{H}_d$ は $d$ 次元半空間で定義される範囲の集合である。この設定では、$X$ は赤の集合と青の集合の非交和であるとする。各半空間 $h \in \mathcal{H}_d$ に対して、範囲 $h$ に含まれる赤の点の割合と青の点の割合の「差」を測る関数 $\Phi(h)$ を定義する。この文脈における最大不一致(maximum discrepancy)問題は、$h^* = \arg \max_{h \in (X, \mathcal{H}_d)} \Phi(h)$ を見つけることである。本研究では代わりに、$\Phi(h^*) - \Phi(\hat{h}) \le \varepsilon$ となる $\hat{h}$ を求める。これは、機械学習の線形分類や、空間的異常検出のための空間スキャン統計における中心的な問題であり、他の多くの領域にも現れる。この問題に対して $O(\|X\| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ 時間の解法を与える。これは従来の最良の解法を多項式的に改善する。 $d=2$ の場合、条件付き下界により、これがほぼタイトであることを示す。$\Phi$ のクラスに応じて、APSP への帰着により厳密解に対する $\Omega(\|X\|^{3/2 - o(1)})$ 時間の下界を、または 3SUM への帰着により近似解に対する $\Omega(\|X\| + 1/\varepsilon^{2-o(1)})$ の下界を与えることができる。主要な技術的結果は、サイズ $O(1/\varepsilon^d)$、クエリ時間 $O(\log (1/\varepsilon))$ の $\varepsilon$ 近似半空間範囲カウントデータ構造であり、$O(\|X\| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ 時間で構築できる。 Consider the geometric range space $(X, \mathcal{H}_d)$ where $X \subset \mathbb{R}^d$ and $\mathcal{H}_d$ is the set of ranges defined by $d$-dimensional halfspaces. In this setting we consider that $X$ is the disjoint union of a red and blue set. For each halfspace $h \in \mathcal{H}_d$ define a function $\Phi(h)$ that measures the "difference" between the fraction of red and fraction of blue points which fall in the range $h$. In this context the maximum discrepancy problem is to find the $h^* = \arg \max_{h \in (X, \mathcal{H}_d)} \Phi(h)$. We aim to instead find an $\hat{h}$ such that $\Phi(h^*) - \Phi(\hat{h}) \le \varepsilon$. This is the central problem in linear classification for machine learning, in spatial scan statistics for spatial anomaly detection, and shows up in many other areas. We provide a solution for this problem in $O(\|X\| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ time, which improves polynomially over the previous best solutions.
For $d=2$ we show that this is nearly tight through conditional lower bounds. For different classes of $\Phi$ we can either provide a $\Omega(\|X\|^{3/2 - o(1)})$ time lower bound for the exact solution with a reduction to APSP, or an $\Omega(\|X\| + 1/\varepsilon^{2-o(1)})$ lower bound for the approximate solution with a reduction to 3SUM. A key technical result is a $\varepsilon$-approximate halfspace range counting data structure of size $O(1/\varepsilon^d)$ with $O(\log (1/\varepsilon))$ query time, which we can build in $O(\|X\| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ time.	翻訳日:2021-07-01 10:56:38 公開日:2021-06-25
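To make the quantity $\Phi(h)$ from the abstract above concrete, here is a small brute-force sketch for $d=2$. It is not the paper's algorithm: it assumes $\Phi$ is the absolute difference of the two fractions (the abstract leaves the class of $\Phi$ open), represents a halfspace as $\{p : \langle \text{normal}, p \rangle \le \text{offset}\}$, and only tries a fixed number of sampled directions, so it may miss the true optimum. The names `phi` and `brute_force_max_discrepancy` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def phi(red: List[Point], blue: List[Point], normal: Point, offset: float) -> float:
    """|fraction of red in h - fraction of blue in h| for h = {p : <normal, p> <= offset}."""
    def fraction_inside(points: List[Point]) -> float:
        inside = sum(1 for p in points
                     if normal[0] * p[0] + normal[1] * p[1] <= offset)
        return inside / len(points)
    return abs(fraction_inside(red) - fraction_inside(blue))


def brute_force_max_discrepancy(
    red: List[Point], blue: List[Point], num_directions: int = 180
) -> Tuple[float, Optional[Point], Optional[float]]:
    """Try sampled directions and, for each, every offset that passes through a data point."""
    best: Tuple[float, Optional[Point], Optional[float]] = (0.0, None, None)
    points = red + blue
    for k in range(num_directions):
        theta = 2.0 * math.pi * k / num_directions
        normal = (math.cos(theta), math.sin(theta))
        for p in points:
            offset = normal[0] * p[0] + normal[1] * p[1]
            value = phi(red, blue, normal, offset)
            if value > best[0]:
                best = (value, normal, offset)
    return best


# Red points lie to the left of the y-axis, blue points to the right,
# so a vertical boundary separates them perfectly.
red_points = [(-2.0, 0.0), (-1.0, 1.0), (-1.5, -1.0)]
blue_points = [(1.0, 0.0), (2.0, 1.0), (1.5, -1.0)]
value, normal, offset = brute_force_max_discrepancy(red_points, blue_points)
print(value, normal, offset)
```

This exhaustive search costs on the order of `num_directions * |X|^2` point tests, which is the kind of cost the approximate algorithm described in the abstract is designed to avoid.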
# (参考訳) セマンティックパーシング自然言語をリレーショナル代数に変換する Semantic Parsing Natural Language into Relational Algebra ( http://arxiv.org/abs/2106.13858v1 ) ライセンス: CC BY 4.0	Ruiyang Xu, Ayush Singh	(参考訳) データベースへの自然なインターフェース(NLIDB)は、過去数十年で多く研究されてきた。 NLIDBの中核は、自然言語をSQLに変換するために使われるセマンティックパーサである。従来のNLP方法論の解決策は文法規則パターン学習と中間論理形式によるペアリングに焦点を当てている。これらのメソッドは特定のデータベースやパースタスクに対して許容できるパフォーマンスを提供するが、一般化や拡張は困難である。一方,近年のニューラルディープ・ラーニングの進歩は,一般的なNLIDBシステム構築に有望な方向性をもたらすと考えられる。従来のアプローチとは異なり、これらの神経方法論は解析問題をシーケンスからシーケンスへの学習問題として扱う。本稿では,いくつかのシーケンスからシーケンスへの学習モデルを実験し,その性能を一般データベース解析タスクで評価する。 Natural interface to database (NLIDB) has been researched a lot during the past decades. In the core of NLIDB, is a semantic parser used to convert natural language into SQL. Solutions from traditional NLP methodology focuses on grammar rule pattern learning and pairing via intermediate logic forms. Although those methods give an acceptable performance on certain specific database and parsing tasks, they are hard to generalize and scale. On the other hand, recent progress in neural deep learning seems to provide a promising direction towards building a general NLIDB system. Unlike the traditional approach, those neural methodologies treat the parsing problem as a sequence-to-sequence learning problem. In this paper, we experimented on several sequence-to-sequence learning models and evaluate their performance on general database parsing task.	翻訳日:2021-07-01 10:33:14 公開日:2021-06-25
# (参考訳) AutoPipeline: 強化学習と検索を使ってデータパイプラインをターゲット別に合成する AutoPipeline: Synthesize Data Pipelines By-Target Using Reinforcement Learning and Search ( http://arxiv.org/abs/2106.13861v1 ) ライセンス: CC BY 4.0	Junwen Yang, Yeye He, Surajit Chaudhuri	(参考訳) 最近の作業は、文字列変換やテーブル操作演算子(join、groupby、pivotなど)のような単一のデータ準備ステップの自動化を支援する上で大きな進歩を遂げている。本研究では、文字列変換とテーブル操作演算子の両方で複雑なデータパイプラインを合成することにより、複数のステップをエンドツーエンドで自動化することを提案する。本稿では,従来のバイサンプルパラダイムとは大きく離れているパイプラインをユーザが容易に指定できる,新たな"バイターゲット"パラダイムを提案する。 by-targetを使用することで、ユーザは入力テーブル(csvやjsonファイルなど)を提供して、“ターゲットテーブル”(既存のデータベーステーブルやbiダッシュボードなど)を指して、希望するパイプラインからの出力がどのようにスキーマ的に“見た目”するかを実証する。問題は具体的でないように見えるが、FDやキーといった暗黙のテーブルの制約を利用して、空間を著しく制約し、問題を抽出できるようにするというユニークな洞察がある。我々は、強化学習と探索を用いてパイプラインを合成するオートパイプシステムを開発した。 GitHubからクロールされた多数の実パイプラインの実験によると、Auto-Pipelineは、これらの複雑なパイプラインの60～70%(最大10ステップ)を平均10～20秒で合成できる。 Recent work has made significant progress in helping users to automate single data preparation steps, such as string-transformations and table-manipulation operators (e.g., Join, GroupBy, Pivot, etc.). We in this work propose to automate multiple such steps end-to-end, by synthesizing complex data pipelines with both string transformations and table-manipulation operators. We propose a novel "by-target" paradigm that allows users to easily specify the desired pipeline, which is a significant departure from the traditional by-example paradigm. Using by-target, users would provide input tables (e.g., csv or json files), and point us to a "target table" (e.g., an existing database table or BI dashboard) to demonstrate how the output from the desired pipeline would schematically "look like". While the problem is seemingly underspecified, our unique insight is that implicit table constraints such as FDs and keys can be exploited to significantly constrain the space to make the problem tractable. We develop an Auto-Pipeline system that learns to synthesize pipelines using reinforcement learning and search. 
Experiments on large numbers of real pipelines crawled from GitHub suggest that Auto-Pipeline can successfully synthesize 60-70% of these complex pipelines (up to 10 steps) in 10-20 seconds on average.	翻訳日:2021-07-01 10:25:52 公開日:2021-06-25
# LB-CNN:チェインとキューピーを用いた軽二元畳み込みニューラルネットワークの高速トレーニングのためのオープンソースフレームワーク LB-CNN: An Open Source Framework for Fast Training of Light Binary Convolutional Neural Networks using Chainer and Cupy ( http://arxiv.org/abs/2106.15350v1 ) ライセンス: Link先を確認	Radu Dogaru, Ioana Dogaru	(参考訳) 軽量バイナリ畳み込みニューラルネットワーク(LB-CNN)は、多くの産業アプリケーションで必要とされる低エネルギーのコンピューティングプラットフォームで実装する場合、特に有用である。本稿では,コンパクトLB-CNNの最適化フレームワークを導入し,その有効性を評価する。このフレームワークは無償で利用可能であり、フリーアクセスのクラウドプラットフォームで動作する可能性がある。最適化されたモデルは標準化された.h5形式で保存され、特定の技術へのさらなる展開のための特別なツールへの入力として使用できる。モデル最適化,特にバイナリ畳み込みカーネルの選択を高速化する主な要素は,出力層を極端な学習機械として訓練するための大幅な高速化を提供するChainer/Cupy機械学習ライブラリである。 Keras/Tensorflowを使った出力層の追加トレーニングは、精度の向上を可能にするため含まれる。 MNIST, GTSRB, ORL, VGGなど, 広く使用されているデータセットの結果は, 精度と複雑性の間に非常に良い妥協点を示す。特に顔認識問題では、慎重に最適化されたlb-cnnモデルが最大100%の精度を提供する。このようなTinyMLソリューションは、低消費電力の画像認識を必要とする産業用途に適している。 Light binary convolutional neural networks (LB-CNN) are particularly useful when implemented in low-energy computing platforms as required in many industrial applications. Herein, a framework for optimizing compact LB-CNN is introduced and its effectiveness is evaluated. The framework is freely available and may run on free-access cloud platforms, thus requiring no major investments. The optimized model is saved in the standardized .h5 format and can be used as input to specialized tools for further deployment into specific technologies, thus enabling the rapid development of various intelligent image sensors. The main ingredient in accelerating the optimization of our model, particularly the selection of binary convolution kernels, is the Chainer/Cupy machine learning library offering significant speed-ups for training the output layer as an extreme-learning machine. Additional training of the output layer using Keras/Tensorflow is included, as it allows an increase in accuracy. Results for widely used datasets including MNIST, GTSRB, ORL, VGG show very good compromise between accuracy and complexity. 
Particularly, for face recognition problems a carefully optimized LB-CNN model provides up to 100% accuracies. Such TinyML solutions are well suited for industrial applications requiring image recognition with low energy consumption.	翻訳日:2021-06-30 15:31:57 公開日:2021-06-25
# (参考訳) POLAR:ニューラルネットワーク制御システムの検証のための多項式算術フレームワーク POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems ( http://arxiv.org/abs/2106.13867v1 ) ライセンス: CC BY 4.0	Chao Huang, Jiameng Fan, Xin Chen, Wenchao Li, Qi Zhu	(参考訳) 本稿では,ニューラルネットワーク制御システム(NNCSs)の有界時間到達性解析に多項式オーバー近似を時間間隔で利用した,\textbf{pol}ynomial \textbf{ar}ithmetic frameworkであるPOLARを提案する。標準テイラーモデルを用いた既存の算術手法と比較して,本手法では,連続活性化関数のベルンシュタイン多項式補間法と他の演算のテイラーモデル算術法を組み合わせて,ニューロン出力を層々に重ね合わせて反復的に近似する新しい手法を用いる。このアプローチは標準テイラーモデルの算術における主な欠点を克服することができる。テイラー多項式で十分に近似できない関数を扱うことができず、NNCSの到達可能な状態計算の精度と効率を大幅に改善する。さらに,本手法では,ニューラルネットワークの出力範囲を推定する際に,線形写像下でのテイラーモデル残差を象徴的に保持する。 POLARが既存のTaylorモデルフローパイプ構築技術とシームレスに統合できることを示し、POLARが一連のベンチマークで現在の最先端技術よりも大幅に優れていることを示す。 We propose POLAR, a \textbf{pol}ynomial \textbf{ar}ithmetic framework that leverages polynomial overapproximations with interval remainders for bounded-time reachability analysis of neural network-controlled systems (NNCSs). Compared with existing arithmetic approaches that use standard Taylor models, our framework uses a novel approach to iteratively overapproximate the neuron output ranges layer-by-layer with a combination of Bernstein polynomial interpolation for continuous activation functions and Taylor model arithmetic for the other operations. This approach can overcome the main drawback in the standard Taylor model arithmetic, i.e. its inability to handle functions that cannot be well approximated by Taylor polynomials, and significantly improve the accuracy and efficiency of reachable states computation for NNCSs. To further tighten the overapproximation, our method keeps the Taylor model remainders symbolic under the linear mappings when estimating the output range of a neural network. We show that POLAR can be seamlessly integrated with existing Taylor model flowpipe construction techniques, and demonstrate that POLAR significantly outperforms the current state-of-the-art techniques on a suite of benchmarks.	翻訳日:2021-06-30 15:04:19 公開日:2021-06-25
# (参考訳) トランスフラワー:マルチモーダルアテンションによる確率的自己回帰ダンス生成 Transflower: probabilistic autoregressive dance generation with multimodal attention ( http://arxiv.org/abs/2106.13871v1 ) ライセンス: CC BY 4.0	Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson	(参考訳) ダンスは、音楽のリズム、調律、音節の特徴に従う複雑な動きの巧みな構成を必要とする。形式的には、音楽に条件付けされたダンスの生成は、オーディオ信号に基づいて条件付けされた高次元連続モーション信号をモデル化する問題として表現することができる。この作業では、この問題に取り組むために2つの貢献をします。まず,マルチモーダルトランスフォーマーエンコーダを用いて,前回のポーズと音楽コンテキストで条件付けられた正規化フローを用いて,将来的なポーズの分布をモデル化する,新しい確率的自己回帰型アーキテクチャを提案する。第2に,現在最大規模の3dダンスモーションデータセットを紹介し,さまざまなモーションキャプチャ技術を用いて,プロダンサーとカジュアルダンサーの両方を含む。このデータセットを用いて,客観的指標とユーザスタディを用いて,新たなモデルを2つのベースラインと比較し,確率分布をモデル化する能力と,大きな動きや音楽のコンテキストを乗り越える能力の両方が,音楽にマッチする興味深い,多様で現実的なダンスを生み出すために必要なことを示す。 Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution, as well as being able to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.	翻訳日:2021-06-30 14:42:49 公開日:2021-06-25
# (参考訳) コモンセンスを用いたRationale-Inspireed Natural Language Explanations Rationale-Inspired Natural Language Explanations with Commonsense ( http://arxiv.org/abs/2106.13876v1 ) ライセンス: CC BY 4.0	Bodhisattwa Prasad Majumder, Oana-Maria Camburu, Thomas Lukasiewicz, Julian McAuley	(参考訳) 説明可能な機械学習モデルは、主に、抽出的論理(入力特徴のサブセット)または抽象的正当化として自由テキスト自然言語説明(NLE)を用いて予測されたラベルを正当化する。 NLEは抽出的理性よりも包括的であるが、機械生成のNLEは時に常識的知識を欠いていることが示されている。ここでは,コモンセンス知識が抽出的合理性とnlesの橋渡しとして機能し,両タイプの説明をより良くすることを示す。より正確には、(1)機械予測に責任のある特徴の集合として有理を抽出し、(2)利用可能なコモンセンスリソースを用いて抽出有理を拡大し、(3)拡張された知識を用いて自然言語の説明を生成する、RExC(Rationale-Inspired Explanations with Commonsense)と呼ばれる統一的なフレームワークを導入する。我々のフレームワークは、自然言語処理と視覚言語理解の両方において5つのタスクにまたがるNLEの生成において、これまでの最先端よりも大きなマージンを上回り、人間のアノテータは、RExCが生成した説明をより包括的で、常識に根ざし、従来の最先端モデルよりも全体的に好まれていることを常に評価している。さらに,コモンセンスに基づく説明により,作業性能と合理化抽出能力が向上することを示す。 Explainable machine learning models primarily justify predicted labels using either extractive rationales (i.e., subsets of input features) or free-text natural language explanations (NLEs) as abstractive justifications. While NLEs can be more comprehensive than extractive rationales, machine-generated NLEs have been shown to sometimes lack commonsense knowledge. Here, we show that commonsense knowledge can act as a bridge between extractive rationales and NLEs, rendering both types of explanations better. More precisely, we introduce a unified framework, called RExC (Rationale-Inspired Explanations with Commonsense), that (1) extracts rationales as a set of features responsible for machine predictions, (2) expands the extractive rationales using available commonsense resources, and (3) uses the expanded knowledge to generate natural language explanations. 
Our framework surpasses by a large margin the previous state-of-the-art in generating NLEs across five tasks in both natural language processing and vision-language understanding, with human annotators consistently rating the explanations generated by RExC to be more comprehensive, grounded in commonsense, and overall preferred compared to previous state-of-the-art models. Moreover, our work shows that commonsense-grounded explanations can enhance both task performance and rationales extraction capabilities.	翻訳日:2021-06-30 14:14:41 公開日:2021-06-25
# (参考訳) Pastprop-RNN:過去補正による未来予測の改善 Pastprop-RNN: improved predictions of the future by correcting the past ( http://arxiv.org/abs/2106.13881v1 ) ライセンス: CC BY 4.0	André Baptista, Yassine Baghoussi, Carlos Soares, João Mendes-Moreira, Miguel Arantes	(参考訳) 予測精度は、利用可能な過去のデータの品質に依存する。データ破壊は生成されたモデルの品質に悪影響を及ぼす可能性がある(例えば、需要予測時の在庫切れ商品などの予期せぬ出来事)。我々はこの問題に、未来をよりよく説明するために過去のデータがどうあるべきだったかを予測する「パストキャスティング」で対処する。本研究では,誤差の責任の一部を訓練データに割り当て,それに応じて訓練データを修正する,データ中心のバックプロパゲーションアルゴリズムであるPastprop-LSTMを提案する。予測コンペティションのデータセット M4 と M5、および Numenta Anomaly Benchmark で、Pastprop-LSTM の3つの変種を検証した。実験により,特に標準LSTMの予測誤差が高い場合,提案手法は予測精度を向上させることができることが示された。また、異常を含むデータセット上でアルゴリズムの可能性を示す。 Forecasting accuracy is reliant on the quality of available past data. Data disruptions can adversely affect the quality of the generated model (e.g., unexpected events such as out-of-stock products when forecasting demand). We address this problem by pastcasting: predicting how data should have been in the past to explain the future better. We propose Pastprop-LSTM, a data-centric backpropagation algorithm that assigns part of the responsibility for errors to the training data and changes it accordingly. We test three variants of Pastprop-LSTM on forecasting competition datasets, M4 and M5, plus the Numenta Anomaly Benchmark. Empirical evaluation indicates that the proposed method can improve forecasting accuracy, especially when the prediction errors of standard LSTM are high. It also demonstrates the potential of the algorithm on datasets containing anomalies.	翻訳日:2021-06-30 13:51:57 公開日:2021-06-25
# (参考訳) 関係帯域に対する信頼度境界付き知識注入型政策勾配 Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits ( http://arxiv.org/abs/2106.13895v1 ) ライセンス: CC BY 4.0	Kaushik Roy and Qi Zhang and Manas Gaur and Amit Sheth	(参考訳) コンテキストバンドは、オンライン広告、レコメンデーションシステム、ヘルスケアなど、さまざまな現実のシナリオで重要なユースケースを見つけます。しかし、ほとんどのアルゴリズムは文脈を表現するのに平坦な特徴ベクトルを使い、現実の世界では様々なオブジェクトとそれらの間の関係が文脈でモデル化される。例えば、音楽レコメンデーションシステムでは、ユーザのコンテキストには、聴く音楽、どのアーティストがこの音楽を作成するか、アーティストのアルバムなどが含まれる。よりリッチなリレーショナルコンテキスト表現を追加することで、探索と探索が難しくなる。探索探索戦略を導くために、文脈に関する探索探索知識の効率を向上させる。リレーショナルな文脈表現は、人間が記述的な性質から知識を特定できる自然な方法である。本研究では,知識注入政策グラディエンスを文脈帯域設定や新しい知識注入政策グラディエンス・アッパー信頼境界アルゴリズムに適応させ,模擬音楽レコメンデーションデータセットと各種実生活データセットを実験的に解析し,専門家の知識が全体の後悔を劇的に減らし,それを不可能にする方法を提案する。 Contextual Bandits find important use cases in various real-life scenarios such as online advertising, recommendation systems, healthcare, etc. However, most of the algorithms use flat feature vectors to represent context whereas, in the real world, there is a varying number of objects and relations among them to model in the context. For example, in a music recommendation system, the user context contains what music they listen to, which artists create this music, the artist albums, etc. Adding richer relational context representations also introduces a much larger context space making exploration-exploitation harder. To improve the efficiency of exploration-exploitation knowledge about the context can be infused to guide the exploration-exploitation strategy. Relational context representations allow a natural way for humans to specify knowledge owing to their descriptive nature. We propose an adaptation of Knowledge Infused Policy Gradients to the Contextual Bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm and perform an experimental analysis of a simulated music recommendation dataset and various real-life datasets where expert knowledge can drastically reduce the total regret and where it cannot.	
翻訳日:2021-06-30 13:42:42 公開日:2021-06-25
# (参考訳) ローリング水平展開による学習状態空間モデルによる予測制御 Predictive Control Using Learned State Space Models via Rolling Horizon Evolution ( http://arxiv.org/abs/2106.13911v1 ) ライセンス: CC BY 4.0	Alvaro Ovalle, Simon M. Lucas	(参考訳) モデルに基づく強化学習への関心の大部分は、戦略的長期的意思決定が可能な前方モデルを取得する可能性から導かれる。エージェントが有用な予測モデルを学ぶのに成功すると仮定すると、シミュレーションされた計画の生成と選択にそれを利用するメカニズムが必要である。本稿では,進化的アルゴリズム計画手法とディープラーニングと変分推論を用いて学習したモデルを組み合わせることを目的とした。視覚的ナビゲーションタスクのセットでオンライン計画を確実に行うエージェントを用いて,このアプローチを実証する。 A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolutionary algorithmic planning techniques with models learned via deep learning and variational inference. We demonstrate the approach with an agent that reliably performs online planning in a set of visual navigation tasks.	翻訳日:2021-06-30 13:27:32 公開日:2021-06-25
# XL-Sum:44言語のための大規模多言語抽象要約 XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages ( http://arxiv.org/abs/2106.13822v1 ) ライセンス: Link先を確認	Tahmid Hasan, Abhik Bhattacharjee, Md Saiful Islam, Kazi Samin, Yuan-Fang Li, Yong-Bin Kang, M. Sohel Rahman, Rifat Shahriyar	(参考訳) 抽象的テキスト要約(abstractive text summarization)に関する現代の研究は、主に英語のような高リソース言語に焦点を当ててきた。本稿では,bbcから100万の専門的注釈付記事要約ペアからなる包括的かつ多様なデータセットであるxl-sumを提案する。データセットは、ローからハイリソースまでの44の言語をカバーしており、その多くは、現在公開データセットが使用できない。 XL-Sumは非常に抽象的で簡潔で高品質で、人間や本質的な評価によって示される。我々は,最先端の事前学習型多言語モデルであるmt5をxl-sumで微調整し,多言語および低リソースの要約タスク実験を行った。 XL-Sumは、類似したモノリンガルデータセットを用いて得られたものと比較して、競合的な結果を誘導する: ベンチマークした10言語で11 ROUGE-2スコアを上回り、そのうちのいくつかはマルチリンガルトレーニングによって得られた15を超えている。さらに、低リソース言語でのトレーニングは、個々に競争的なパフォーマンスを提供する。我々の知る限り、XL-Sumは単一のソースから収集されたサンプルの数とカバーされる言語数で最大の抽象的な要約データセットである。我々は,多言語抽象要約に関する今後の研究を促進するために,データセットとモデルをリリースする。リソースは \url{https://github.com/csebuetnlp/xl-sum} にある。 Contemporary works on abstractive text summarization have focused primarily on high-resource languages like English, mostly due to the limited availability of datasets for low/mid-resource ones. In this work, we present XL-Sum, a comprehensive and diverse dataset comprising 1 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 44 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation. We fine-tune mT5, a state-of-the-art pretrained multilingual model, with XL-Sum and experiment on multilingual and low-resource summarization tasks. XL-Sum induces competitive results compared to the ones obtained using similar monolingual datasets: we show higher than 11 ROUGE-2 scores on 10 languages we benchmark on, with some of them exceeding 15, as obtained by multilingual training. 
Additionally, training on low-resource languages individually also provides competitive performance. To the best of our knowledge, XL-Sum is the largest abstractive summarization dataset in terms of the number of samples collected from a single source and the number of languages covered. We are releasing our dataset and models to encourage future research on multilingual abstractive summarization. The resources can be found at https://github.com/csebuetnlp/xl-sum.	翻訳日:2021-06-29 18:13:37 公開日:2021-06-25
# ビルディングブリッジ:AI倫理を探求するジェネレーティブアートワーク Building Bridges: Generative Artworks to Explore AI Ethics ( http://arxiv.org/abs/2106.13901v1 ) ライセンス: Link先を確認	Ramya Srinivasan and Devi Parikh	(参考訳) 近年,人工知能(AI)技術が社会に与える影響の理解と緩和に重点が置かれている。学術、産業、政府機関全体で、AI倫理の強化に向けた様々な取り組みが追求されている。倫理的AIシステムの設計における重要な課題は、AIパイプラインには複数の利害関係者があり、それぞれが独自の制約と関心を持っていることだ。これらの異なる視点は、コミュニケーションのギャップもあって、しばしば理解されていない。例えば、AIモデルを設計し開発するAI研究者は、AI決定の複合的な効果によって消費者の生活に生じる不安定性を必ずしも認識していない。より広い文脈で、異なるステークホルダーの役割と責任について教育する必要がある。本稿では,異なる視点を捉えるためのアクセス可能で強力な教育ツールとして機能することにより,生成的アートワークがこの役割を果たす可能性について概説する。 AI倫理を強化するツールとして、計算創造性に関する学際的な議論を広く起こしたいと考えています。 In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society. Across academia, industry, and government bodies, a variety of endeavours are being pursued towards enhancing AI ethics. A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests. These different perspectives are often not understood, due in part to communication gaps. For example, AI researchers who design and develop AI models are not necessarily aware of the instability induced in consumers' lives by the compounded effects of AI decisions. Educating different stakeholders about their roles and responsibilities in the broader context becomes necessary. In this position paper, we outline some potential ways in which generative artworks can play this role by serving as accessible and powerful educational tools for surfacing different perspectives. We hope to spark interdisciplinary discussions about computational creativity broadly as a tool for enhancing AI ethics.	翻訳日:2021-06-29 17:57:29 公開日:2021-06-25
# 超音波スキャンにおけるCNNセグメンテーションに基づく物体検出・追跡法と迷走神経検出への応用 A CNN Segmentation-Based Approach to Object Detection and Tracking in Ultrasound Scans with Application to the Vagus Nerve Detection ( http://arxiv.org/abs/2106.13849v1 ) ライセンス: Link先を確認	Abdullah F. Al-Battal, Yan Gong, Lu Xu, Timothy Morton, Chen Du, Yifeng Bu, Imanuel R Lerman, Radhika Madhavan, Truong Q. Nguyen	(参考訳) 超音波検査はいくつかの医療診断や治療に不可欠である。治療計画に影響を与える解剖学的特徴や構造を可視化し分析するために用いられる。しかし、超音波検査は労働集約的であり、その効果は操作者に依存する。リアルタイムで正確でロバストな解剖学的構造の自動検出と追跡は、診断と治療の手順の一貫性と効率性に大きな影響を与える。本稿では,超音波スキャンで特定の解剖学的標的構造を自動的に検出し追跡する深層学習フレームワークを提案する。我々のフレームワークは、被験者や撮像装置間で正確で堅牢で、リアルタイムで動作し、大規模なトレーニングセットを必要としないように設計されています。元のトレーニングセットの20%程度の大きさのトレーニングセットで学習した場合でも、局在化の適合率と再現率を90%以上に維持する。フレームワークのバックボーンは、U-Netに基づいた弱いトレーニングを受けたセグメンテーションニューラルネットワークである。このフレームワークを、迷走神経を検出し追跡することを目的として2つの異なる超音波データセット上でテストし、最先端のリアルタイム物体検出ネットワークよりも優れた性能を示した。 Ultrasound scanning is essential in several medical diagnostic and therapeutic applications. It is used to visualize and analyze anatomical features and structures that influence treatment plans. However, it is labor intensive, and its effectiveness is operator dependent. Real-time accurate and robust automatic detection and tracking of anatomical structures while scanning would significantly impact diagnostic and therapeutic procedures to be consistent and efficient. In this paper, we propose a deep learning framework to automatically detect and track a specific anatomical target structure in ultrasound scans. Our framework is designed to be accurate and robust across subjects and imaging devices, to operate in real-time, and to not require a large training set. It maintains a localization precision and recall higher than 90% when trained on training sets that are as small as 20% in size of the original training set. The framework backbone is a weakly trained segmentation neural network based on U-Net. 
We tested the framework on two different ultrasound datasets with the aim to detect and track the Vagus nerve, where it outperformed current state-of-the-art real-time object detection networks.	翻訳日:2021-06-29 17:54:55 公開日:2021-06-25
# 半監督Raw-to-Rawマッピング Semi-Supervised Raw-to-Raw Mapping ( http://arxiv.org/abs/2106.13883v1 ) ライセンス: Link先を確認	Mahmoud Afifi and Abdullah Abuolaim	(参考訳) カメラセンサーの生RGB色は、異なるセンサーのメーカーやモデル間のスペクトル感度の違いによって異なる。本稿では,異なるセンサRGB色空間間のマッピング作業に焦点をあてる。以前の研究は、正確な色マッピングを実現するためにペアワイズキャリブレーションを使用してこの問題に対処した。精度は高いものの、(1)カラーキャリブレーション対象を各シーンに配置した両カメラ装置で一対の画像を撮影する、(2)カラーキャリブレーション対象の正確な画像アライメントまたは手動アノテーション。本稿では,より実用的な構成で生空間のカラーマッピングを実現することを目的とする。具体的には,各カメラ装置で撮影された非ペア画像群とペア画像群で訓練された半教師付きraw-to-rawマッピング法を提案する。実験により,本手法は単一校正法に加えて,他の領域適応法よりも優れた結果が得られることを示す。この取り組みの一環として、2つの異なるスマートフォンカメラから生画像の新しいデータセットを作成しました。データセットには、セミ教師付きトレーニングと評価のためのペアとペアのセットが含まれています。 The raw-RGB colors of a camera sensor vary due to the spectral sensitivity differences across different sensor makes and models. This paper focuses on the task of mapping between different sensor raw-RGB color spaces. Prior work addressed this problem using a pairwise calibration to achieve accurate color mapping. Although being accurate, this approach is less practical as it requires: (1) capturing pair of images by both camera devices with a color calibration object placed in each new scene; (2) accurate image alignment or manual annotation of the color calibration object. This paper aims to tackle color mapping in the raw space through a more practical setup. Specifically, we present a semi-supervised raw-to-raw mapping method trained on a small set of paired images alongside an unpaired set of images captured by each camera device. Through extensive experiments, we show that our method achieves better results compared to other domain adaptation alternatives in addition to the single-calibration solution. We have generated a new dataset of raw images from two different smartphone cameras as part of this effort. Our dataset includes unpaired and paired sets for our semi-supervised training and evaluation.	翻訳日:2021-06-29 17:54:39 公開日:2021-06-25
# フォトニック回路インスパイアされた小型ネットワーク:エッジにおけるリアルタイム無線信号分類を目指して A Photonic-Circuits-Inspired Compact Network: Toward Real-Time Wireless Signal Classification at the Edge ( http://arxiv.org/abs/2106.13865v1 ) ライセンス: Link先を確認	Hsuan-Tung Peng, Joshua Lederman, Lei Xu, Thomas Ferreira de Lima, Chaoran Huang, Bhavin Shastri, David Rosenbluth, Paul Prucnal	(参考訳) 機械学習(ML)法は無線通信システムに広く普及しており、無線周波数(RF)フィンガープリント、自動変調分類、認知無線などの応用に強力であることが証明されている。しかし、mlモデルのサイズが大きいため、レイテンシに敏感なダウンストリームタスクのためにエッジデバイスを実装するのが難しくなる。無線通信システムでは、ミリ秒以下のスケールでのMLデータ処理により、リアルタイムネットワーク監視がセキュリティを改善し、侵入を防ぐことができる。さらに、チップスケールでMLモデルを実装可能なコンパクトで統合可能なハードウェアプラットフォームは、無線通信ネットワークに対するより広範な応用を見出すだろう。エッジにおけるリアルタイム無線信号分類に向けて,フォトニックハードウェアに触発されたリカレントニューラルネットワークモデルと簡易畳み込み分類器を組み合わせた,新しい小型深層ネットワークを提案し,そのランダム送信によるrfエミッタ同定への応用を実証する。提案モデルにより、既存の最先端CNN分類器の50倍のトレーニングパラメータを使用する場合、ZigBeeと同一の30個のデバイスに対して96.32%の分類精度が得られる。ネットワークサイズを大幅に削減したことにより、小型FPGAボードPYNQ-Z1を用いて、0.219ミリ秒のリアルタイムRFフィンガープリントを実演した。 Machine learning (ML) methods are ubiquitous in wireless communication systems and have proven powerful for applications including radio-frequency (RF) fingerprinting, automatic modulation classification, and cognitive radio. However, the large size of ML models can make them difficult to implement on edge devices for latency-sensitive downstream tasks. In wireless communication systems, ML data processing at a sub-millisecond scale will enable real-time network monitoring to improve security and prevent infiltration. In addition, compact and integratable hardware platforms which can implement ML models at the chip scale will find much broader application to wireless communication networks. Toward real-time wireless signal classification at the edge, we propose a novel compact deep network that consists of a photonic-hardware-inspired recurrent neural network model in combination with a simplified convolutional classifier, and we demonstrate its application to the identification of RF emitters by their random transmissions. 
With the proposed model, we achieve 96.32% classification accuracy over a set of 30 identical ZigBee devices when using 50 times fewer training parameters than an existing state-of-the-art CNN classifier. Thanks to the large reduction in network size, we demonstrate real-time RF fingerprinting with 0.219 ms latency using a small-scale FPGA board, the PYNQ-Z1.	翻訳日:2021-06-29 17:47:17 公開日:2021-06-25
# 量子データ圧縮と量子クロスエントロピー Quantum Data Compression and Quantum Cross Entropy ( http://arxiv.org/abs/2106.13823v1 ) ライセンス: Link先を確認	Zhou Shangnan	(参考訳) 量子機械学習は、機械学習と量子コンピューティングの交差点における新興分野である。量子機械学習の理論の基礎となる中心的な量は、量子クロスエントロピーである。本稿では,量子クロスエントロピーが準最適量子源符号化の圧縮率であることを示す。そこで我々は,可変長符号化の量子一般化と量子強度の典型性に基づいて開発した,単純で普遍的な量子データ圧縮プロトコルを提案する。 Quantum machine learning is an emerging field at the intersection of machine learning and quantum computing. A central quantity for the theoretical foundation of quantum machine learning is the quantum cross entropy. In this paper, we present one operational interpretation of this quantity, that the quantum cross entropy is the compression rate for sub-optimal quantum source coding. To do so, we give a simple, universal quantum data compression protocol, which is developed based on quantum generalization of variable-length coding, as well as quantum strong typicality.	翻訳日:2021-06-29 17:38:03 公開日:2021-06-25
# 自己ペース主成分分析 Self-paced Principal Component Analysis ( http://arxiv.org/abs/2106.13880v1 ) ライセンス: Link先を確認	Zhao Kang, Hongfei Liu, Jiangxin Li, Xiaofeng Zhu, and Ling Tian	(参考訳) 主成分分析(PCA)は次元削減と特徴抽出に広く用いられている。ロバストPCA (RPCA) は、l1-norm や l2,p-norm のような異なる頑健な距離尺度の下で、ある程度ノイズや外れ値を扱うことができる。しかし、実世界のデータはこれらの単純な関数によって完全にキャプチャできない構造を示すかもしれない。さらに、既存の方法は複雑なサンプルと単純なサンプルを等しく扱う。対照的に、人間が一般的に採用する学習パターンは、単純なものから複雑なものへ、少ないものから多いものへと学ぶことである。この原理に基づき, 雑音や外れ値の影響を更に低減する, Self-paced PCA (SPCA) と呼ばれる新しい手法を提案する。特に、各サンプルの複雑さは、単純なサンプルからより複雑なサンプルへと順にトレーニングに統合するために、各イテレーションの初めに計算される。交互最適化に基づいて、SPCAは最適な射影行列を見つけ、外れ値を反復的に除去する。理論的解析はSPCAの合理性を示す。一般的なデータセットに関する広範囲な実験により,提案手法が最先端の結果を大幅に改善できることが示された。 Principal Component Analysis (PCA) has been widely used for dimensionality reduction and feature extraction. Robust PCA (RPCA), under different robust distance metrics, such as l1-norm and l2,p-norm, can deal with noise or outliers to some extent. However, real-world data may display structures that cannot be fully captured by these simple functions. In addition, existing methods treat complex and simple samples equally. By contrast, a learning pattern typically adopted by human beings is to learn from simple to complex and less to more. Based on this principle, we propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers. Notably, the complexity of each sample is calculated at the beginning of each iteration in order to integrate samples from simple to more complex into training. Based on an alternating optimization, SPCA finds an optimal projection matrix and filters out outliers iteratively. Theoretical analysis is presented to show the rationality of SPCA. Extensive experiments on popular data sets demonstrate that the proposed method can improve the state-of-the-art results considerably.	翻訳日:2021-06-29 14:08:28 公開日:2021-06-25
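The alternating scheme described in the SPCA abstract above (score each sample's complexity, train on the easy samples first, then admit harder ones while filtering outliers) can be illustrated with a toy sketch. This is not the authors' algorithm: the hard 0/1 sample weights, the quantile-based pacing schedule, the squared reconstruction error used as "complexity", and every name and parameter (`self_paced_pca`, `start_frac`, `growth`, `max_frac`) are assumptions made only for illustration.

```python
import numpy as np


def self_paced_pca(X, k, n_iters=10, start_frac=0.5, growth=1.2, max_frac=0.9):
    """Toy self-paced PCA (illustrative assumption, not the paper's method).

    Alternates between (1) fitting PCA on the currently selected samples and
    (2) re-selecting the "easy" samples, i.e. those with small reconstruction
    error, while gradually admitting more samples.
    """
    n = X.shape[0]
    v = np.ones(n)          # hard sample weights: 1 = used for fitting
    frac = start_frac       # fraction of samples admitted this round
    for _ in range(n_iters):
        # Step 1: weighted mean and covariance over the selected samples.
        mu = (v[:, None] * X).sum(axis=0) / v.sum()
        Xc = X - mu
        cov = (v[:, None] * Xc).T @ Xc / v.sum()
        # eigh returns eigenvalues in ascending order; keep the top-k vectors.
        _, eigvecs = np.linalg.eigh(cov)
        W = eigvecs[:, -k:]
        # Step 2: squared reconstruction error as a proxy for sample complexity.
        resid = Xc - Xc @ W @ W.T
        loss = (resid ** 2).sum(axis=1)
        # Step 3: keep the easiest `frac` of the samples, then raise `frac`.
        lam = np.quantile(loss, frac)
        v = (loss <= lam).astype(float)
        frac = min(frac * growth, max_frac)
    return W, mu, v
```

Capping the admitted fraction at `max_frac` is what lets this sketch keep rejecting the highest-error samples at the end; a soft weighting scheme would be an equally plausible choice.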
# 凍結言語モデルを用いたマルチモーダルファウショット学習 Multimodal Few-Shot Learning with Frozen Language Models ( http://arxiv.org/abs/2106.13884v1 ) ライセンス: Link先を確認	Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S.M. Ali Eslami, Oriol Vinyals, Felix Hill	(参考訳) 十分な規模でトレーニングを行うと、自動回帰言語モデルは、ほんの数例で促された後、新しい言語タスクを学習する顕著な能力を示す。本稿では,このマイナショット学習能力をマルチモーダル環境(ビジョンと言語)に移すための,単純かつ効果的なアプローチを提案する。調整された画像とキャプションデータを用いて、視覚エンコーダを訓練し、各画像を連続した埋め込みのシーケンスとして表現し、プレトレーニングされた凍結言語モデルが適切なキャプションを生成する。結果として得られたシステムはマルチモーダルな数ショット学習者であり、実例に条件付けして、複数のインターリーブ画像とテキスト埋め込みのシーケンスとして表現された、様々な新しいタスクを学習する驚くべき能力を持つ。我々は,新しいオブジェクトや新しい視覚カテゴリーの単語を素早く学習し,ごく少数の例で視覚的質問応答を行い,複数の確立された新しいベンチマークで単一のモデルを測定することで外部知識を活用することを実証した。 When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption. The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings. We demonstrate that it can rapidly learn words for new objects and novel visual categories, do visual question-answering with only a handful of examples, and make use of outside knowledge, by measuring a single model on a variety of established and new benchmarks.	翻訳日:2021-06-29 14:07:37 公開日:2021-06-25
# ドメイン適応のためのドメイン条件予測器 Domain Conditional Predictors for Domain Adaptation ( http://arxiv.org/abs/2106.13899v1 ) ライセンス: Link先を確認	Joao Monteiro, Xavier Gibert, Jianqiao Feng, Vincent Dumoulin, Dar-Shyang Lee	(参考訳) 学習保証は、しばしばi.i.d.の仮定に依存する予測器が実際のタスクを実行するためにデプロイされると、実際に違反する可能性があるデータ。このようにドメイン適応アプローチは、ラベル上の条件分布が基礎となるデータ分布から独立することを期待する共変量シフトなど、他の仮定が満たされるならば、異なるトレインとテストデータ分布をサポートするという、余分な柔軟性をもたらす有用なフレームワークとして現れた。様々な列車やテストデータソースをまたいだ一般化を誘導するために,データ生成分布が予測モデルに無視されるように,ドメイン不変性の一般的な考え方に依存する方法がいくつか導入された。本研究では,入力データに依存することに加えて,基礎となるデータ生成分布に関する情報を利用する条件付きモデリング手法を検討する。例えば、モデルには環境の変化や新しいデータソースに適応するための明確なメカニズムがあります。このようなアプローチは、共変量シフトのような余分な仮定を必要とせず、最小限の定式化によって生じるトレーニング不安定性の共通源を避けるため、より単純なトレーニングアルゴリズムが得られるため、現在の領域適応法よりも一般的に適用可能である。 Learning guarantees often rely on assumptions of i.i.d. data, which will likely be violated in practice once predictors are deployed to perform real-world tasks. Domain adaptation approaches thus appeared as a useful framework yielding extra flexibility in that distinct train and test data distributions are supported, provided that other assumptions are satisfied such as covariate shift, which expects the conditional distributions over labels to be independent of the underlying data distribution. Several approaches were introduced in order to induce generalization across varying train and test data sources, and those often rely on the general idea of domain-invariance, in such a way that the data-generating distributions are to be disregarded by the prediction model. In this contribution, we tackle the problem of generalizing across data sources by approaching it from the opposite direction: we consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution. For instance, the model has an explicit mechanism to adapt to changing environments and/or new data sources. 
We argue that such an approach is more generally applicable than current domain adaptation methods since it does not require extra assumptions such as covariate shift and further yields simpler training algorithms that avoid a common source of training instabilities caused by minimax formulations, often employed in domain-invariant methods.	翻訳日:2021-06-29 14:07:19 公開日:2021-06-25
# 決定論的画像分類器のシーン不確かさとウェリントン後方 Scene Uncertainty and the Wellington Posterior of Deterministic Image Classifiers ( http://arxiv.org/abs/2106.13870v1 ) ライセンス: Link先を確認	Stephanie Tsuei, Aditya Golatkar, Stefano Soatto	(参考訳) 本研究では,画像分類器の出力結果の不確実性を評価する手法を提案する。画像分類によく使用されるディープニューラルネットワークは、入力画像から出力クラスへの決定論的マップである。そのため、「自信」を定義し、測定し、解釈する場合に、どのような変動性について言及しているかを明確にする必要がある。この目的のために、Wellington Posteriorは、与えられた画像を生成する同じシーンから生成される可能性のあるデータに応答して得られる結果の分布である。与えられたイメージを生成できるシーンは無限に多いため、ウェリントン郵便局は描かれたシーン以外のシーンから誘導する必要がある。データ拡張、アンサンブル、モデル線形化を用いた代替手法について検討する。他にも、生成的敵ネットワーク、条件付き事前ネットワーク、教師付き単一ビュー再構築などがある。ビデオ中の時間隣接フレームのクラスを推測して得られた経験的後肢に対して,これらの代替案をテストした。これらの開発は、安全クリティカルなアプリケーションと互換性のある方法でディープネットワーク分類器の信頼性を評価するための小さなステップにすぎない。 We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum. Deep neural networks commonly used for image classification are deterministic maps from an input image to an output class. As such, their outcome on a given datum involves no uncertainty, so we must specify what variability we are referring to when defining, measuring and interpreting "confidence." To this end, we introduce the Wellington Posterior, which is the distribution of outcomes that would have been obtained in response to data that could have been generated by the same scene that produced the given image. Since there are infinitely many scenes that could have generated the given image, the Wellington Posterior requires induction from scenes other than the one portrayed. We explore alternate methods using data augmentation, ensembling, and model linearization. Additional alternatives include generative adversarial networks, conditional prior networks, and supervised single-view reconstruction. We test these alternatives against the empirical posterior obtained by inferring the class of temporally adjacent frames in a video. 
These developments are only a small step towards assessing the reliability of deep network classifiers in a manner that is compatible with safety-critical applications.	翻訳日:2021-06-29 14:06:16 公開日:2021-06-25
# 食道マノメトリー診断のための多段階機械学習モデル A multi-stage machine learning model on diagnosis of esophageal manometry ( http://arxiv.org/abs/2106.13869v1 ) ライセンス: Link先を確認	Wenjun Kou, Dustin A. Carlson, Alexandra J. Baumann, Erica N. Donnan, Jacob M. Schauer, Mozziyar Etemadi, John E. Pandolfino	(参考訳) 高分解能マノメトリー(HRM)は食道運動障害を診断するための第一処置である。その解釈と分類には、嚥下レベルの結果の初期評価と、木のようなアルゴリズムを用いてシカゴ分類(CC)に基づく検査レベルの診断の導出が含まれる。 HRMを用いた運動障害のこの診断手法は、様々な機械学習手法の組み合わせを用いて開発された多段階モデリングフレームワークを用いて反映された。特に、このフレームワークは、嚥下レベルにおけるディープラーニングモデルと、検査レベルにおける特徴ベースの機械学習モデルを含んでいる。嚥下レベルでは、畳み込みニューラルネットワーク(CNN)に基づく3つのモデルを開発し、嚥下タイプ、嚥下時の加圧、統合緩和圧(IRP)の予測を行った。検査レベルでは、専門知識に基づくルールモデル、xgboostモデル、人工ニューラルネットワーク(ANN)モデルのファミリーからモデル選択を行い、後者の2つのモデルは専門知識から着想を得て設計および拡張された。ベイズ原理に動機づけられたモデルバランスの単純なモデル非依存戦略を利用して、適合率(precision)スコアによって重み付けされたモデル平均化を生み出した。平均モデルと各モデルを比較して評価し、テストデータセットでの最高性能はtop-1予測で0.81、top-2予測で0.92であった。これは、生のマルチスワローデータからHRM検査のCC診断を自動的に予測する最初の人工知能モデルである。さらに、HRMおよび機能的内腔イメージングプローブ(FLIP)パノメトリーによる臨床データに基づく食道患者の診断など、マルチモーダルタスクに容易に拡張することができる。 High-resolution manometry (HRM) is the primary procedure used to diagnose esophageal motility disorders. Its interpretation and classification include an initial evaluation of swallow-level outcomes and then derivation of a study-level diagnosis based on Chicago Classification (CC), using a tree-like algorithm. This diagnostic approach to motility disorders using HRM was mirrored using a multi-stage modeling framework developed using a combination of various machine learning approaches. Specifically, the framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. In the swallow-level stage, three models based on convolutional neural networks (CNNs) were developed to predict swallow type, swallow pressurization, and integrated relaxation pressure (IRP). 
At the study-level stage, model selection was conducted from families of expert-knowledge-based rule models, xgboost models and artificial neural network (ANN) models, with the latter two models designed and augmented with motivation from the expert knowledge. A simple model-agnostic strategy of model balancing motivated by Bayesian principles was utilized, which gave rise to model averaging weighted by precision scores. The averaged (blended) models and individual models were compared and evaluated, of which the best performance on the test dataset is 0.81 for top-1 prediction and 0.92 for top-2 predictions. This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data. Moreover, the proposed modeling framework could be easily extended to multi-modal tasks, such as diagnosis of esophageal patients based on clinical data from both HRM and functional luminal imaging probe panometry (FLIP).	翻訳日:2021-06-29 14:01:55 公開日:2021-06-25
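The "model averaging weighted by precision scores" and the top-1/top-2 figures mentioned in the abstract above can be made concrete with a small sketch. The abstract does not say how the precision scores enter the average, so the version below assumes one scalar precision per model, normalized to sum to one; the function names `blend` and `top_k_accuracy` and the toy numbers are hypothetical.

```python
import numpy as np


def blend(probs, precisions):
    """Average class probabilities across models, weighted by precision.

    probs:      array-like of shape (n_models, n_samples, n_classes)
    precisions: array-like of shape (n_models,), one score per model
                (an assumption; per-class weights are equally plausible)
    """
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(precisions, dtype=float)
    w = w / w.sum()                         # normalize weights to sum to 1
    return np.tensordot(w, probs, axes=1)   # shape (n_samples, n_classes)


def top_k_accuracy(blended, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    topk = np.argsort(blended, axis=1)[:, -k:]
    hits = (topk == np.asarray(labels)[:, None]).any(axis=1)
    return float(hits.mean())


# Toy usage: two models, two samples, three classes (made-up numbers).
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
blended = blend([model_a, model_b], precisions=[0.75, 0.25])
top1 = top_k_accuracy(blended, labels=[0, 2], k=1)
top2 = top_k_accuracy(blended, labels=[0, 2], k=2)
```

In this toy case the second sample is missed at top-1 but recovered at top-2, which is the kind of gap the reported 0.81 versus 0.92 figures describe.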
# 論理仕様からの合成強化学習 Compositional Reinforcement Learning from Logical Specifications ( http://arxiv.org/abs/2106.13906v1 ) ライセンス: Link先を確認	Kishor Jothimurugan, Suguman Bansal, Osbert Bastani and Rajeev Alur	(参考訳) 論理仕様による複雑なタスクに対する学習制御ポリシーの問題点について検討する。最近のアプローチでは、与えられた仕様から報酬関数を自動的に生成し、適切な強化学習アルゴリズムを用いて、期待される報酬を最大化するポリシーを学ぶ。しかし、これらのアプローチは、高レベルの計画を必要とする複雑なタスクに不十分にスケールする。本研究では,高レベルの計画と強化学習をインターリーブするDiRLという構成学習手法を開発する。まず、dirlは仕様を抽象グラフとしてエンコードする。直感的には、グラフの頂点と辺はそれぞれ状態空間の領域と単純なサブタスクに対応する。このアプローチでは、強化学習を取り入れて、dijkstraスタイルの計画アルゴリズムで各エッジ(サブタスク)のニューラルネットワークポリシを学習し、グラフ内の高レベルプランを計算する。連続状態とアクション空間を持つ一連の挑戦的制御ベンチマークに対する提案手法の評価は、最先端のベースラインよりも優れていることを示す。 We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning. First, DiRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.	翻訳日:2021-06-29 14:01:26 公開日:2021-06-25
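The planning half of the DiRL idea above (a specification encoded as an abstract graph whose edges are sub-tasks, with a Dijkstra-style search for a high-level plan) can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes each edge already carries an estimated success probability for its learned sub-task policy and uses the negative log of that probability as the edge cost, whereas the method described in the abstract learns the edge policies inside the search. The graph, the probabilities, and the name `best_plan` are hypothetical.

```python
import heapq
import math


def best_plan(edges, source, target):
    """Return the path maximizing the product of edge success probabilities.

    edges: {vertex: {next_vertex: success_probability}}, probabilities in (0, 1].
    Uses Dijkstra with cost -log(p), so a shortest path is a most-likely plan.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue            # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, p in edges.get(u, {}).items():
            if p <= 0.0:
                continue        # sub-task judged impossible; skip this edge
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Toy abstract graph: vertices stand for regions of the state space.
graph = {
    "start": {"A": 0.9, "B": 0.5},
    "A": {"goal": 0.5},
    "B": {"goal": 1.0},
}
plan, success = best_plan(graph, "start", "goal")
```

Here the route through `B` wins (overall success 0.5 against 0.45 through `A`), even though its first edge looks worse in isolation, which is why planning over the whole graph matters.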
# 真理発見と幾何最適化による深層ニューラルネットワークの不確実性校正の改善 Improving Uncertainty Calibration of Deep Neural Networks via Truth Discovery and Geometric Optimization ( http://arxiv.org/abs/2106.14662v1 ) ライセンス: Link先を確認	Chunwei Ma, Ziyun Huang, Jiayi Xian, Mingchen Gao, Jinhui Xu	(参考訳) 近年のDeep Neural Networks(DNN)は、その成功にもかかわらず、学習プロセスに固有の不確実性のために、予測に疑問を投げかける可能性がある。アンサンブル技術とポストホックキャリブレーションは、DNNの不確実性キャリブレーションの改善を個別に示す2種類のアプローチである。しかし,2種類の手法の相乗効果はよく研究されていない。本稿では,アンサンブル法とポストホック校正法を統合するための真理発見フレームワークを提案する。そこで,アンサンブル候補の幾何分散をサンプル不確かさのよい指標として用い,精度を保った真理推定器を設計した。さらに,真理発見で正則化した最適化により,ポストホックキャリブレーションを向上できることを示す。 CIFAR や ImageNet などの大規模データセットでは,ヒストグラムとカーネル密度に基づく評価指標に対する最先端キャリブレーション手法に対して一貫した改善が見られた。私たちのコードは https://github.com/horsepurve/truly-uncertain で入手できます。 Deep Neural Networks (DNNs), despite their tremendous success in recent years, could still cast doubts on their predictions due to the intrinsic uncertainty associated with their learning process. Ensemble techniques and post-hoc calibrations are two types of approaches that have individually shown promise in improving the uncertainty calibration of DNNs. However, the synergistic effect of the two types of methods has not been well explored. In this paper, we propose a truth discovery framework to integrate ensemble-based and post-hoc calibration methods. Using the geometric variance of the ensemble candidates as a good indicator for sample uncertainty, we design an accuracy-preserving truth estimator with provably no accuracy drop. Furthermore, we show that post-hoc calibration can also be enhanced by truth discovery-regularized optimization. On large-scale datasets including CIFAR and ImageNet, our method shows consistent improvement against state-of-the-art calibration approaches on both histogram-based and kernel density-based evaluation metrics. Our codes are available at https://github.com/horsepurve/truly-uncertain.	翻訳日:2021-06-29 13:59:17 公開日:2021-06-25
# 分散学習と連合学習における暗黙的勾配アライメント Implicit Gradient Alignment in Distributed and Federated Learning ( http://arxiv.org/abs/2106.13897v1 ) ライセンス: Link先を確認	Yatin Dandi, Luis Barba, Martin Jaggi	(参考訳) 分散学習におけるグローバル収束を達成するための大きな障害は、分散データの不均一性と確率性によるクライアント間の勾配やミニバッチの誤調整である。この問題を軽減するひとつの方法は、トレーニングを通じて異なるクライアント間の勾配のアライメントを促進することだ。解析の結果,SGDの暗黙的正規化効果を再現する適切な最適化手法を用いることで,勾配アライメントとテスト精度の向上が実現可能であることがわかった。 sgdにおけるこの正規化の存在は、訓練中の異なるミニバッチの逐次使用に依存しているため、大きなミニバッチでのトレーニングでは本質的に欠如している。並列性を高めつつ、この正規化の一般化の利点を得るため、各更新で任意に大きなバッチを利用可能にしつつ、同じ暗黙の正規化を誘導する新しいgradalignアルゴリズムを提案する。分散学習とフェデレーション学習において,アルゴリズムの利点を実験的に検証した。 A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients, or mini-batches due to heterogeneity and stochasticity of the distributed data. One way to alleviate this problem is to encourage the alignment of gradients across different clients throughout training. Our analysis reveals that this goal can be accomplished by utilizing the right optimization method that replicates the implicit regularization effect of SGD, leading to gradient alignment as well as improvements in test accuracies. Since the existence of this regularization in SGD completely relies on the sequential use of different mini-batches during training, it is inherently absent when training with large mini-batches. To obtain the generalization benefits of this regularization while increasing parallelism, we propose a novel GradAlign algorithm that induces the same implicit regularization while allowing the use of arbitrarily large batches in each update. We experimentally validate the benefit of our algorithm in different distributed and federated learning settings.	翻訳日:2021-06-29 13:58:20 公開日:2021-06-25
# 閉形式連続深層モデル Closed-form Continuous-Depth Models ( http://arxiv.org/abs/2106.13898v1 ) ライセンス: Link先を確認	Ramin Hasani, Mathias Lechner, Alexander Amini, Lucas Liebenwein, Max Tschaikowski, Gerald Teschl, Daniela Rus	(参考訳) モデル隠れ状態の微分がニューラルネットワークによって定義される連続深度ニューラルネットワークは、強力なシーケンシャルなデータ処理機能を実現している。しかし、これらのモデルは高度な数値微分方程式(DE)の解法に依存しており、計算コストとモデルの複雑さの両方において大きなオーバーヘッドをもたらす。本稿では,CfCネットワークと呼ばれる新しいモデル群について述べる。そのモデル群は,ODEをベースとしたモデルと同等に強力なモデリング能力を示しながら,記述が簡単で,少なくとも1桁高速である。モデルは、時間連続モデルの表現的部分集合の解析的閉形式解から導出され、複雑なdeソルバの必要性を全て和らげる。実験により,CfCネットワークは長期依存や不規則なサンプルデータを含む様々な時系列予測タスクにおいて,高度で反復的なモデルよりも優れていることを示した。私たちは、リソース制約のある環境でリッチで継続的なニューラルモデルをトレーニングし、デプロイする新たな機会が、パフォーマンスと効率の両方を必要としている、と信じています。 Continuous-depth neural models, where the derivative of the model's hidden state is defined by a neural network, have enabled strong sequential data processing capabilities. However, these models rely on advanced numerical differential equation (DE) solvers resulting in a significant overhead both in terms of computational cost and model complexity. In this paper, we present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster while exhibiting equally strong modeling abilities compared to their ODE-based counterparts. The models are hereby derived from the analytical closed-form solution of an expressive subset of time-continuous models, thus alleviating the need for complex DE solvers all together. In our experimental evaluations, we demonstrate that CfC networks outperform advanced, recurrent models over a diverse set of time-series prediction tasks, including those with long-term dependencies and irregularly sampled data. We believe our findings open new opportunities to train and deploy rich, continuous neural models in resource-constrained settings, which demand both performance and efficiency.	翻訳日:2021-06-29 13:46:22 公開日:2021-06-25
# 低リソース高表現性音声のための明示的持続時間モデルを用いた非自己回帰TTS Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech ( http://arxiv.org/abs/2106.12896v2 ) ライセンス: Link先を確認	Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt	(参考訳) 最近のニューラルテキスト音声合成(TTS)アプローチは高品質な音声を生成するが、通常はターゲット話者からの大量の録音を必要とする。先行研究では,高品質TTSを生成するための3段階の手法を提案し,トレーニングに必要なデータ量を大幅に削減した。しかし, この手法では, 高い表現力を持つ音声に対して, 達成可能な自然性レベルに天井効果が認められている。本稿では,ターゲット話者からわずか15分間の音声データを用いて,高い表現力を持つTTS音声を構築する手法を提案する。現在の最先端のアプローチと比較して,提案手法は録音音声との差を音声の自然性で23.3%,話者の類似性で16.3%縮めている。さらに,15分間のターゲット話者データのみを用いて,Tacotron2ベースのフルデータモデル(約10時間)の自然性と話者の類似性に匹敵し,30分以上のデータでは大幅に上回る。以下の改善を提案する: 1) 自己回帰型の注意ベースTTSモデルから, 注意を外部持続時間モデルに置き換えた非自己回帰型モデルに変更すること, 2) 条件付き生成敵ネットワーク(cGAN)ベースの微調整ステップを追加すること。 Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large amount of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly reducing the amount of data required for training. However, we have observed a ceiling effect in the level of naturalness achievable for highly expressive voices when using this approach. In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker. Compared to the current state-of-the-art approach, our proposed improvements close the gap to recordings by 23.3% for naturalness of speech and by 16.3% for speaker similarity. Further, we match the naturalness and speaker similarity of a Tacotron2-based full-data (~10 hours) model using only 15 minutes of target speaker data, whereas with 30 minutes or more, we significantly outperform it. 
The following improvements are proposed: 1) changing from an autoregressive, attention-based TTS model to a non-autoregressive model replacing attention with an external duration model and 2) an additional Conditional Generative Adversarial Network (cGAN) based fine-tuning step.	翻訳日:2021-06-29 11:11:06 公開日:2021-06-25
# (参考訳) 正規化フローからの知識の蒸留 Distilling the Knowledge from Normalizing Flows ( http://arxiv.org/abs/2106.12699v2 ) ライセンス: CC BY 4.0	Dmitry Baranchuk, Vladimir Aliev, Artem Babenko	(参考訳) 正規化フローは、複数の音声および視覚問題において強力な性能を示す生成モデルの強力なクラスである。他の生成モデルとは対照的に、正規化フローは扱いやすい可能性を持つ潜在変数モデルであり、安定したトレーニングを可能にする。しかし、それらは効率的なヤコビ行列式計算で可逆関数を表現するように慎重に設計する必要がある。実際には、これらの要件は、推論時間とメモリ消費の観点から、代替フィードフォワードモデルよりも劣る、過度にパラメータ化され、洗練されたアーキテクチャをもたらす。本研究では,フローベースモデルをより効率的な代替品に蒸留できるかどうかを検討する。本稿では, 簡単な蒸留法を提案し, 画像超解像および音声合成のための現状条件付きフローベースモデルの有効性を示すことで, この問題に対する肯定的な回答を提供する。 Normalizing flows are a powerful class of generative models demonstrating strong performance in several speech and vision problems. In contrast to other generative models, normalizing flows are latent variable models with tractable likelihoods and allow for stable training. However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation. In practice, these requirements lead to overparameterized and sophisticated architectures that are inferior to alternative feed-forward models in terms of inference time and memory consumption. In this work, we investigate whether one can distill flow-based models into more efficient alternatives. We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution and speech synthesis.	翻訳日:2021-06-29 06:02:03 公開日:2021-06-25
# (参考訳) クラスタリング広告の意図による入札:Eコマースのための効率的な検索エンジンマーケティングシステム Bidding via Clustering Ads Intentions: an Efficient Search Engine Marketing System for E-commerce ( http://arxiv.org/abs/2106.12700v2 ) ライセンス: CC0 1.0	Cheng Jie, Da Xu, Zigeng Wang, Lu Wang, Wei Shen	(参考訳) 検索エンジンのマーケティングの規模が拡大するにつれ、効率的な入札システムの設計がeコマース企業の成功にとって最重要になっている。現代の産業レベルの入札システムで直面する重要な課題は、そのカタログは巨大であり、関連する入札機能は高い疎性である; 2. 大量の入札要求は、オフラインとオンラインの両方のサービスに大きな計算負担を生じさせる。不要なユーザ項目情報を活用することは,ユーザクエリからの自然言語信号と製品からのコンテキスト知識を活用するため,スパーシティの問題を軽減する上で不可欠である。特に,広告のベクトル表現をトランスフォーマモデルを用いて抽出し,それらの幾何学的関係をクラスタリングによる協調入札予測の構築に活用する。 2段階の手続きは入札評価と最適化の計算ストレスを大幅に低減する。本稿では,walmart eコマースにおける検索エンジンマーケティングのための入札システムのエンドツーエンド構造について紹介する。当社のアプローチのオンラインおよびオフラインのパフォーマンスを分析し、それを運用効率のよいソリューションとみなす方法について論じます。 With the increasing scale of search engine marketing, designing an efficient bidding system is becoming paramount for the success of e-commerce companies. The critical challenges faced by a modern industrial-level bidding system include: 1. the catalog is enormous, and the relevant bidding features are of high sparsity; 2. the large volume of bidding requests induces significant computation burden to both the offline and online serving. Leveraging extraneous user-item information proves essential to mitigate the sparsity issue, for which we exploit the natural language signals from the users' query and the contextual knowledge from the products. In particular, we extract the vector representations of ads via the Transformer model and leverage their geometric relation to building collaborative bidding predictions via clustering. The two-step procedure also significantly reduces the computation stress of bid evaluation and optimization. In this paper, we introduce the end-to-end structure of the bidding system for search engine marketing for Walmart e-commerce, which successfully handles tens of millions of bids each day. 
We analyze the online and offline performance of our approach and discuss how we found it to be a production-efficient solution.	翻訳日:2021-06-29 05:44:08 公開日:2021-06-25
# (参考訳) 自律走行における多モード3次元物体検出:サーベイ Multi-Modal 3D Object Detection in Autonomous Driving: a Survey ( http://arxiv.org/abs/2106.12735v2 ) ライセンス: CC BY 4.0	Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Yu Zhang, Jianmin Ji, Yanyong Zhang	(参考訳) 過去数年間、我々は自動運転の急速な発展を目撃してきた。しかし、複雑でダイナミックな運転環境のため、完全な自律性を実現することは依然として厄介な課題である。その結果、自動運転車は、堅牢で正確な環境認識を行うための一連のセンサーを備えている。センサーの数や種類が増加し続けており、それらを組み合わせて知覚を向上させることが自然なトレンドになりつつある。これまでのところ、マルチセンサー融合に基づく知覚に焦点を当てた詳細なレビューは行われていない。このギャップを埋め、将来の研究を動機付けるために、この調査では、複数のセンサーデータソース、特にカメラやLiDARを活用する、最近のフュージョンベースの3D検出ディープラーニングモデルについてレビューする。本調査では,各センサデータに共通するデータ表現やオブジェクト検出ネットワークを含む,自動運転車用の一般的なセンサの背景について紹介する。次に,マルチモーダル3dオブジェクト検出のための一般的なデータセットについて議論し,各データセットに含まれるセンサデータに着目した。次に, 核融合位置, 核融合データ表現, 核融合粒度の3つの側面を考慮し, 最新のマルチモーダル3次元検出ネットワークについて詳細に検討する。詳細なレビューの後、オープンチャレンジについて議論し、可能な解決策を指摘します。われわれの詳細なレビューが、マルチモーダルな3Dオブジェクト検出の分野での研究に役立てることを願っている。 In the past few years, we have witnessed rapid development of autonomous driving. However, achieving full autonomy remains a daunting task due to the complex and dynamic driving environment. As a result, self-driving cars are equipped with a suite of sensors to conduct robust and accurate environment perception. As the number and type of sensors keep increasing, combining them for better perception is becoming a natural trend. So far, there has been no indepth review that focuses on multi-sensor fusion based perception. To bridge this gap and motivate future research, this survey devotes to review recent fusion-based 3D detection deep learning models that leverage multiple sensor data sources, especially cameras and LiDARs. In this survey, we first introduce the background of popular sensors for autonomous cars, including their common data representations as well as object detection networks developed for each type of sensor data. Next, we discuss some popular datasets for multi-modal 3D object detection, with a special focus on the sensor data included in each dataset. 
Then we present in-depth reviews of recent multi-modal 3D detection networks by considering the following three aspects of the fusion: fusion location, fusion data representation, and fusion granularity. After a detailed review, we discuss open challenges and point out possible solutions. We hope that our detailed review can help researchers to embark on investigations in the area of multi-modal 3D object detection.	翻訳日:2021-06-29 05:34:41 公開日:2021-06-25
# (参考訳) 時間的ルーティング適応と最適輸送を用いた複数ストックトレーディングパターンの学習 Learning Multiple Stock Trading Patterns with Temporal Routing Adaptor and Optimal Transport ( http://arxiv.org/abs/2106.12950v2 ) ライセンス: CC BY 4.0	Hengxu Lin, Dong Zhou, Weiqing Liu, Jiang Bian	(参考訳) 有効な量的投資は通常、株価の将来の動きの正確な予測に依存する。近年、機械学習ベースのソリューションは、より正確な株価予測を行う能力を示し、現代の量的投資システムにおいて欠かせない要素となっている。しかし、既存手法の背景にある i.i.d. 仮定は、株式市場における多様な取引パターンの存在と矛盾しており、それは必然的に、より良い株価予測性能を達成する能力を制限する。本稿では,既存の株価予測モデルに複数の株式取引パターンをモデル化する能力を持たせるための,新しいアーキテクチャである Temporal Routing Adaptor (TRA) を提案する。 TRAは、複数のパターンを学習するための独立した予測器のセットと、異なる予測器にサンプルをディスパッチするルータで構成される軽量モジュールである。それでも、明示的なパターン識別子がないため、効果的なTRAベースのモデルをトレーニングすることは極めて困難である。この課題に取り組むため,我々は,最適輸送(OT)に基づく学習アルゴリズムを更に設計し,サンプルの予測器への最適な割り当てを求めるとともに,その割り当てを用いて補助損失項を通じてルータを効果的に最適化する。実世界の株式ランキングタスクの実験では,Attention LSTM や Transformer といった最先端のベースラインと比較して,提案手法は情報係数(IC)をそれぞれ 0.053 から 0.059 ,0.051 から 0.056 に向上させることができる。本研究で使用したデータセットとコードは公開されている: https://github.com/microsoft/qlib/tree/main/examples/benchmarks/TRA 。 Successful quantitative investment usually relies on precise predictions of the future movement of the stock price. Recently, machine learning based solutions have shown their capacity to give more accurate stock prediction and become indispensable components in modern quantitative investment systems. However, the i.i.d. assumption behind existing methods is inconsistent with the existence of diverse trading patterns in the stock market, which inevitably limits their ability to achieve better stock prediction performance. In this paper, we propose a novel architecture, Temporal Routing Adaptor (TRA), to empower existing stock prediction models with the ability to model multiple stock trading patterns. Essentially, TRA is a lightweight module that consists of a set of independent predictors for learning multiple patterns as well as a router to dispatch samples to different predictors. Nevertheless, the lack of explicit pattern identifiers makes it quite challenging to train an effective TRA-based model. 
To tackle this challenge, we further design a learning algorithm based on Optimal Transport (OT) to obtain the optimal sample to predictor assignment and effectively optimize the router with such assignment through an auxiliary loss term. Experiments on the real-world stock ranking task show that compared to the state-of-the-art baselines, e.g., Attention LSTM and Transformer, the proposed method can improve information coefficient (IC) from 0.053 to 0.059 and 0.051 to 0.056 respectively. Our dataset and code used in this work are publicly available: https://github.com/microsoft/qlib/tree/main/examples/benchmarks/TRA.	翻訳日:2021-06-29 04:48:43 公開日:2021-06-25
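The information coefficient (IC) quoted in the entry above is conventionally a correlation between predicted scores and realized returns. The sketch below computes a Pearson-style IC in plain Python. It is only an illustration: the abstract does not say whether the authors use Pearson or rank correlation or how they average over trading days, so the definition and the names `information_coefficient`, `predicted`, and `realized` are assumptions.

```python
import math


def information_coefficient(predicted, realized):
    """Pearson correlation between predicted scores and realized returns.

    Assumption: a single cross-section (e.g. one trading day) of stocks;
    the paper's exact IC definition is not given in the abstract.
    """
    if len(predicted) != len(realized) or len(predicted) < 2:
        raise ValueError("need two equally sized sequences with >= 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(realized) / n
    # Centered cross-product and centered sums of squares.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, realized))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in realized)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)
```

On this scale, the reported improvements (for example 0.053 to 0.059) are small absolute changes in correlation, which is typical for noisy return-prediction tasks.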
# (参考訳) ディファレンシャルプライバシが解釈可能性を満たす場合--ケーススタディ When Differential Privacy Meets Interpretability: A Case Study ( http://arxiv.org/abs/2106.13203v2 ) ライセンス: CC BY 4.0	Rakshit Naidu, Aman Priyanshu, Aadith Kumar, Sasikanth Kotti, Haofan Wang, Fatemehsadat Mireshghallah	(参考訳) 医療画像や診断などのタスクにおけるDeep Neural Networks(DNN)のトレーニングにおける個人データの利用の増加を踏まえ、DNNの差分プライベートトレーニングの重要性が高まっている。しかし,これらのモデルの解釈可能性やDPの適用が解釈の質に与える影響についてはほとんど注目されていない。本稿では,DPトレーニングがDNN,特に医療画像への応用に与える影響について,APTOSデータセット上で広範囲に研究する。 Given the increase in the use of personal data for training Deep Neural Networks (DNNs) in tasks such as medical imaging and diagnosis, differentially private training of DNNs is surging in importance and there is a large body of work focusing on providing better privacy-utility trade-off. However, little attention is given to the interpretability of these models, and how the application of DP affects the quality of interpretations. We propose an extensive study into the effects of DP training on DNNs, especially on medical imaging applications, on the APTOS dataset.	翻訳日:2021-06-29 04:14:30 公開日:2021-06-25
# (参考訳) RSN: 高精度LiDAR3次元物体検出のためのレンジスパースネット RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection ( http://arxiv.org/abs/2106.13365v1 ) ライセンス: CC0 1.0	Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, Dragomir Anguelov	(参考訳) LiDARデータから3Dオブジェクトを検出することは、ほとんどの自律運転システムにおいて重要な要素である。安全で高速な運転には、新しいLiDARによって実現されるより大きな検知範囲が必要である。これらのより大きな検出範囲はより効率的で正確な検出モデルを必要とする。本研究では,この拡張検出方式でリアルタイム3次元物体検出を実現するために,簡易で効率的かつ高精度な3次元物体検出器であるレンジスパースネット(RSN)を提案する。 RSNは、範囲画像からフォアグラウンドポイントを予測し、選択したフォアグラウンドポイントにスパース畳み込みを適用してオブジェクトを検出する。高密度領域画像上の軽量な2D畳み込みは、選択された前景点を著しく減らし、後の粗い畳み込みをRCNで効率的に操作できるようにする。距離画像の特徴を組み合わせることで検出精度がさらに向上する。 rsnはwaymo open dataset(wod)上の150m x 150m検出領域で毎秒60フレーム以上動作し、以前公開された検出器よりも正確である。 RSNは2020年11月11日現在、LiDARをベースとした歩行者および車両検出のためのAPH/LEVEL 1測定値に基づいて、WODのリーダーボードで第1位にランクされている。 The detection of 3D objects from LiDAR data is a critical component in most autonomous driving systems. Safe, high speed driving needs larger detection ranges, which are enabled by new LiDARs. These larger detection ranges require more efficient and accurate detection models. Towards this goal, we propose Range Sparse Net (RSN), a simple, efficient, and accurate 3D object detector in order to tackle real time 3D object detection in this extended detection regime. RSN predicts foreground points from range images and applies sparse convolutions on the selected foreground points to detect objects. The lightweight 2D convolutions on dense range images results in significantly fewer selected foreground points, thus enabling the later sparse convolutions in RSN to efficiently operate. Combining features from the range image further enhance detection accuracy. RSN runs at more than 60 frames per second on a 150m x 150m detection region on Waymo Open Dataset (WOD) while being more accurate than previously published detectors. 
As of 11/2020, RSN is ranked first in the WOD leaderboard based on the APH/LEVEL 1 metrics for LiDAR-based pedestrian and vehicle detection, while being several times faster than alternatives.	翻訳日:2021-06-29 01:14:03 公開日:2021-06-25
# (参考訳) グラフ畳み込みカーネルを用いた距離画像における効率的な3次元物体検出 To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels ( http://arxiv.org/abs/2106.13381v1 ) ライセンス: CC0 1.0	Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, Dragomir Anguelov	(参考訳) 3Dオブジェクト検出は多くのロボティクス応用において不可欠である。 2次元視点範囲画像が存在するタスクに対しては,この範囲画像から直接3次元表現を学習することを提案する。この目的のために,我々は,各画素の3次元球面座標をネットワーク全体に伝達する2次元畳み込みネットワークアーキテクチャを設計した。その層は、デフォルトの内積カーネルの代わりに任意の畳み込みカーネルを消費し、各ピクセルの周囲の基底となる局所幾何学を利用することができる。我々は4つのカーネルを概説する: bag-of-wordsパラダイムに基づく密なカーネルと、最近のグラフニューラルネットワークの進歩に触発された3つのグラフカーネル(トランスフォーマー、ポイントネット、エッジ畳み込み)である。また、遠近距離画像ビューの操作により容易になる、カメラ画像とのクロスモダリティ融合についても検討する。本手法はWaymo Open Dataset上で競合的に動作し,歩行者検出の最先端APを69.7%から75.5%に改善する。また本手法は効率的であり、最小のモデルでも人気の高いPointPillarsを品質で上回りつつ、必要なFLOPSとモデルパラメータは180分の1である。 3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we designed a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place of the default inner product kernel and exploit the underlying local geometry around each pixel. We outline four such kernels: a dense kernel according to the bag-of-words paradigm, and three graph kernels inspired by recent graph neural network advances: the Transformer, the PointNet, and the Edge Convolution. We also explore cross-modality fusion with the camera image, facilitated by operating in the perspective range image view. Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also efficient in that our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPS and model parameters.	翻訳日:2021-06-29 00:59:33 公開日:2021-06-25
# (参考訳) グローブインベディングのソース・クリティカル・デバイアス法 A Source-Criticism Debiasing Method for GloVe Embeddings ( http://arxiv.org/abs/2106.13382v1 ) ライセンス: CC BY-SA 4.0	Hope McGovern	(参考訳) 大規模な公共コーパスで訓練された単語の埋め込みは、既知の人間の社会的偏見を一貫して示すことはよく文書化されている。多くのデバイアスの方法が存在するが、ほとんどの場合、埋め込みからバイアス情報を完全に排除し、プロセス内のトレーニングセットのサイズを小さくする。本稿では,偏りのあるデータを取り除くのではなく,トレーニングセットの偏りに関する明示的な情報を取り込むことにより,グローブワード埋め込み(pennington et al., 2014)の偏りを解消する簡易かつ効果的な手法を提案する。提案手法は,Brunetらによる高速バイアス勾配近似法の助けを借りて,迅速かつ効率的に動作する。 (2019). 私たちのアプローチは、人文科学における「ソース批判」の概念に似ているので、本手法をソースクリティカルグローブ(sc-glove)と呼ぶ。 SC-GloVeは,トレーニングデータやTOP-1の性能を犠牲にすることなく,ワード埋め込みアソシエーションテスト(WEAT)セットへの影響を小さくする。 It is well-documented that word embeddings trained on large public corpora consistently exhibit known human social biases. Although many methods for debiasing exist, almost all fixate on completely eliminating biased information from the embeddings and often diminish training set size in the process. In this paper, we present a simple yet effective method for debiasing GloVe word embeddings (Pennington et al., 2014) which works by incorporating explicit information about training set bias rather than removing biased data outright. Our method runs quickly and efficiently with the help of a fast bias gradient approximation method from Brunet et al. (2019). As our approach is akin to the notion of 'source criticism' in the humanities, we term our method Source-Critical GloVe (SC-GloVe). We show that SC-GloVe reduces the effect size on Word Embedding Association Test (WEAT) sets without sacrificing training data or TOP-1 performance.	翻訳日:2021-06-29 00:40:49 公開日:2021-06-25
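The entry above reports results as a WEAT effect size. As a reading aid, the following sketch computes that quantity for toy vectors, following the commonly cited WEAT definition (difference of mean associations of two target sets, divided by the standard deviation over all targets). It does not implement SC-GloVe itself. The function names, the use of the sample standard deviation, and the toy inputs are assumptions made for illustration.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(w, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(w, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Effect size over target sets X, Y and attribute sets A, B.

    Assumption: the denominator is the sample standard deviation
    (statistics.stdev); some implementations use the population form.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    pooled_std = statistics.stdev(s_x + s_y)
    return (sum(s_x) / len(s_x) - sum(s_y) / len(s_y)) / pooled_std
```

A smaller absolute effect size after debiasing indicates that the two target sets are associated more evenly with the two attribute sets.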
# (参考訳) ベイジアンアイトラッキング Bayesian Eye Tracking ( http://arxiv.org/abs/2106.13387v1 ) ライセンス: CC BY 4.0	Qiang Ji and Kang Wang	(参考訳) モデルに基づく視線追跡は、訓練データや視線アノテーションを必要とせず、異なる対象に一般化できるため、視線追跡において支配的なアプローチである。しかし、モデルベースの眼球追跡は、特に野生の眼球追跡において、眼球の特徴検出エラーの影響を受けやすい。そこで本研究では,モデルベースアイトラッキングのためのベイズフレームワークを提案する。提案システムは,眼の外観とランドマークとの確率的関係を捉えるカスケード・ベイズ畳み込みニューラルネットワーク(c-BCNN)と,眼のランドマークから視線を推定する幾何学的アイモデルから構成される。ベイジアンフレームワークは、テスト眼画像からベイジアン推定により、明確な目印検出やモデルトレーニングを伴わない視線分布を生成し、最も可能性の高い視線を推定するだけでなく、その不確実性を推定する。さらに,点に基づく推論ではなくベイズ推論を用いることで,異なるサブジェクトやヘッドポーズ,環境に対してよりよく一般化できるだけでなく,画像ノイズやランドマーク検出誤差にも頑健である。最後に、視線の不確実性の推定により、視線推定精度を段階的に向上できるカスケードアーキテクチャを構築することができる。最先端のモデルベースと学習ベースの手法と比較して,提案手法は,いくつかのベンチマークデータセットにおける一般化能力と,実世界の課題条件下での正確性と堅牢性が大幅に向上することを示す。 Model-based eye tracking has been a dominant approach for eye gaze tracking because of its ability to generalize to different subjects, without the need of any training data and eye gaze annotations. Model-based eye tracking, however, is susceptible to eye feature detection errors, in particular for eye tracking in the wild. To address this issue, we propose a Bayesian framework for model-based eye tracking. The proposed system consists of a cascade-Bayesian Convolutional Neural Network (c-BCNN) to capture the probabilistic relationships between eye appearance and its landmarks, and a geometric eye model to estimate eye gaze from the eye landmarks. Given a testing eye image, the Bayesian framework can generate, through Bayesian inference, the eye gaze distribution without explicit landmark detection and model training, based on which it not only estimates the most likely eye gaze but also its uncertainty. Furthermore, with Bayesian inference instead of point-based inference, our model can not only generalize better to different sub-jects, head poses, and environments but also is robust to image noise and landmark detection errors. 
Finally, with the estimated gaze uncertainty, we can construct a cascade architecture that allows us to progressively improve gaze estimation accuracy. Compared to state-of-the-art model-based and learning-based methods, the proposed Bayesian framework demonstrates significant improvement in generalization capability across several benchmark datasets and in accuracy and robustness under challenging real-world conditions.	翻訳日:2021-06-29 00:33:11 公開日:2021-06-25
# (参考訳) HAN:骨格型ジェスチャー認識のための効率的な階層型自己認識ネットワーク HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition ( http://arxiv.org/abs/2106.13391v1 ) ライセンス: CC BY 4.0	Jianbo Liu, Ying Wang, Shiming Xiang, Chunhong Pan	(参考訳) 骨格に基づくジェスチャー認識の従来の手法は、骨格配列を擬似画像や時空間グラフに配置し、深層畳み込みニューラルネットワーク(CNN)やグラフ畳み込みニューラルネットワーク(GCN)を用いて特徴抽出を行う。優れた結果を得たにもかかわらず、これらの手法はインタラクティブな手の部品の局所的な特徴を動的に捉えることに固有の制限があり、計算効率は依然として深刻な問題である。本研究では,この問題を緩和するために自己着脱機構を導入する。本稿では,手関節の階層構造を考慮し,CNN,RNN,GCN演算子を使わずに,純粋な自己認識に基づく骨格型ジェスチャー認識のための効率的な階層型自己認識ネットワーク(HAN)を提案する。具体的には、関節型自己保持モジュールは指の空間的特徴を捉え、指型自己保持モジュールは手全体の特徴を集約するように設計されている。時間的特徴の観点からは、時間的自己アテンションモジュールを使用して指と手全体の時間的ダイナミクスを捉える。最後に、これらの機能はジェスチャ分類のための融合自己注意モジュールによって融合される。提案手法は,計算複雑性がはるかに低い3つのジェスチャ認識データセットにおいて,競合する結果が得られることを示す。 Previous methods for skeleton-based gesture recognition mostly arrange the skeleton sequence into a pseudo picture or spatial-temporal graph and apply deep Convolutional Neural Network (CNN) or Graph Convolutional Network (GCN) for feature extraction. Although achieving superior results, these methods have inherent limitations in dynamically capturing local features of interactive hand parts, and the computing efficiency still remains a serious issue. In this work, the self-attention mechanism is introduced to alleviate this problem. Considering the hierarchical structure of hand joints, we propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition, which is based on pure self-attention without any CNN, RNN or GCN operators. Specifically, the joint self-attention module is used to capture spatial features of fingers, the finger self-attention module is designed to aggregate features of the whole hand. In terms of temporal features, the temporal self-attention module is utilized to capture the temporal dynamics of the fingers and the entire hand. Finally, these features are fused by the fusion self-attention module for gesture classification. 
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.	翻訳日:2021-06-29 00:20:57 公開日:2021-06-25
# (参考訳) SDS評価の長期ビデオ記録による抑うつの解釈 Interpreting Depression From Question-wise Long-term Video Recording of SDS Evaluation ( http://arxiv.org/abs/2106.13393v1 ) ライセンス: CC BY 4.0	Wanqing Xie, Lizhong Liang, Yao Lu, Chen Wang, Jihong Shen, Hui Luo, Xiaofeng Liu	(参考訳) SDS (Self-Rating Depression Scale) は, うつ病早期スクリーニングによく用いられている。しかし, コントロール不能な自己管理尺度は, 不合理に, 偏見的に答えることによって容易に影響を受け, 臨床医によるハミルトン抑うつ評価尺度 (HDRS) と最終診断で異なる結果が得られた。臨床では, 顔面表情(FE)と行動は, 臨床医による評価において重要な役割を担っている。本研究では,200人の被験者を対象とした新しいデータセットを収集し,自己評価アンケートの有効性を示す。 SDS評価とペアビデオからうつ病を自動的に解釈するために,質問票結果と回答時間にも配慮した,長期可変長ビデオのエンドツーエンド階層化フレームワークを提案する。具体的には,局所的時間的パターン探索に3D CNNを利用する階層モデルと,疑わしいグローバルな特徴集約のための冗長性を考慮した自己認識(RAS)方式を用いる。冗長なfeビデオ処理をターゲットとしたrasは,質問集合内の各ビデオクリップの相関を効果的に活用し,識別情報を強調し,特徴対の親和性に基づく冗長性を解消する。そして、質問側の映像特徴とアンケートスコアとを連結して最終抑うつ検出を行う。また,SDS評価とその映像記録の有効性,および従来の最先端の時間的モデリング手法に対するフレームワークの優位性も明らかにした。 Self-Rating Depression Scale (SDS) questionnaire has frequently been used for efficient depression preliminary screening. However, the uncontrollable self-administered measure can be easily affected by insouciantly or deceptively answering, and producing the different results with the clinician-administered Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically, facial expression (FE) and actions play a vital role in clinician-administered evaluation, while FE and action are underexplored for self-administered evaluations. In this work, we collect a novel dataset of 200 subjects to evidence the validity of self-rating questionnaires with their corresponding question-wise video recording. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for the long-term variable-length video, which is also conditioned on the questionnaire results and the answering time. 
Specifically, we resort to a hierarchical model which utilizes a 3D CNN for local temporal pattern exploration and a redundancy-aware self-attention (RAS) scheme for question-wise global feature aggregation. Targeting redundant long-term FE video processing, our RAS is able to effectively exploit the correlations of each video clip within a question set to emphasize the discriminative information and eliminate the redundancy based on feature pair-wise affinity. Then, the question-wise video feature is concatenated with the questionnaire scores for final depression detection. Our thorough evaluations also show the validity of fusing SDS evaluation and its video recording, and the superiority of our framework over the conventional state-of-the-art temporal modeling methods.	翻訳日:2021-06-28 23:45:20 公開日:2021-06-25
# (参考訳) 逆算の例:入力変換と雑音学習の組み合わせ Countering Adversarial Examples: Combining Input Transformation and Noisy Training ( http://arxiv.org/abs/2106.13394v1 ) ライセンス: CC BY 4.0	Cheng Zhang, Pan Gao	(参考訳) 近年の研究では、ニューラルネットワーク(nn)ベースの画像分類器は、セキュリティに敏感な画像認識タスクの脅威となる敵の例に非常に脆弱であることが示されている。これまでの研究では、JPEG圧縮はある程度の逆例の分類精度の低下に対処できることを示した。しかし、圧縮比が大きくなるにつれて、従来のJPEG圧縮はこれらの攻撃を防御するには不十分である。本稿では,逆方向の摂動を完全にフィルタリングすることを目的として,NNに好適な従来のJPEG圧縮アルゴリズムを改良する。具体的には,周波数係数の解析に基づいて圧縮のためのnn-favored quantization tableを設計する。データ拡張戦略として圧縮を考えると、モデルに依存しない前処理とノイズの多いトレーニングを組み合わせる。異なる圧縮レベルで符号化された画像を用いてトレーニングすることにより,事前学習したモデルを微調整し,複数の分類器を生成する。最後に、低(高)圧縮比は摂動と原特徴をわずかに除去できるので、モデルアンサンブルのためにこれらの訓練された複数のモデルを使用する。モデルのアンサンブルの大多数の投票は最終予測として採用される。実験の結果,本手法はオリジナル精度を維持しつつ防御効率を向上させることができた。 Recent studies have shown that neural network (NN) based image classifiers are highly vulnerable to adversarial examples, which poses a threat to security-sensitive image recognition task. Prior work has shown that JPEG compression can combat the drop in classification accuracy on adversarial examples to some extent. But, as the compression ratio increases, traditional JPEG compression is insufficient to defend those attacks but can cause an abrupt accuracy decline to the benign images. In this paper, with the aim of fully filtering the adversarial perturbations, we firstly make modifications to traditional JPEG compression algorithm which becomes more favorable for NN. Specifically, based on an analysis of the frequency coefficient, we design a NN-favored quantization table for compression. Considering compression as a data augmentation strategy, we then combine our model-agnostic preprocess with noisy training. We fine-tune the pre-trained model by training with images encoded at different compression levels, thus generating multiple classifiers. Finally, since lower (higher) compression ratio can remove both perturbations and original features slightly (aggressively), we use these trained multiple models for model ensemble. 
The majority vote of the ensemble of models is adopted as the final prediction. Experimental results show our method can improve defense efficiency while maintaining original accuracy.	翻訳日:2021-06-28 23:23:48 公開日:2021-06-25
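The final step described in the entry above, a majority vote over classifiers fine-tuned at different compression levels, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `majority_vote`, the list-of-lists input layout, and the tie handling are assumptions.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine class labels from several models by per-sample majority vote.

    per_model_predictions: one list of predicted labels per model,
    all of the same length (one label per input sample).
    Assumption: ties fall to the label seen first, following the
    documented ordering of Counter.most_common.
    """
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    num_samples = len(per_model_predictions[0])
    if any(len(p) != num_samples for p in per_model_predictions):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    # zip(*...) regroups the labels so each tuple holds all votes for one sample.
    for votes in zip(*per_model_predictions):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted
```

An odd number of models avoids most ties in the binary case; with more classes, a tie-breaking rule such as the one assumed here still has to be chosen explicitly.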
# (参考訳) インテリジェントな自律ナビゲーションエージェントの構築 Building Intelligent Autonomous Navigation Agents ( http://arxiv.org/abs/2106.13415v1 ) ライセンス: CC BY 4.0	Devendra Singh Chaplot	(参考訳) 過去10年間の機械学習のブレークスルーは‘デジタルインテリジェンス’、すなわち“デジタルインテリジェンス’につながった。膨大なラベル付きデータから学習し、音声認識、顔認識、機械翻訳などのデジタルタスクを実行することができる機械学習モデル。この論文の目標は「物理知性」が可能なアルゴリズムの設計を前進させることである。視覚知覚、自然言語理解、推論、計画、シーケンシャルな意思決定を含む、物理的な世界で複雑なナビゲーションタスクを実行できるインテリジェントな自律ナビゲーションエージェントの構築。過去数十年間の古典的ナビゲーション手法の進歩にもかかわらず、現在のナビゲーションエージェントは長期的な意味的ナビゲーションタスクで苦労している。論文の前半では,障害回避,意味認識,言語接地,推論といった課題に取り組むために,エンドツーエンドの強化学習を用いた短期ナビゲーションについて論じる。第2部では,モジュール型学習と構造化された明示的地図表現に基づく新しいナビゲーション手法について紹介する。これらの手法は, ローカライゼーション, マッピング, 長期計画, 探索, セマンティック事前学習といった課題に効果的に対処できることを示す。これらのモジュール型学習手法は,長期的空間的・意味的理解と,様々なナビゲーションタスクにおける最先端の成果を達成することができる。 Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. 
In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.	翻訳日:2021-06-28 23:09:36 公開日:2021-06-25
# (参考訳) ドメイン名と関連する時間的特性を用いたブロックチェーン内の悪意のあるアカウントの識別 Identifying malicious accounts in Blockchains using Domain Names and associated temporal properties ( http://arxiv.org/abs/2106.13420v1 ) ライセンス: CC BY 4.0	Rohit Kumar Sachan, Rachit Agarwal, Sandeep Kumar Shukla	(参考訳) ブロックチェーン技術の普及は、サイバー犯罪者による違法行為の増加につながり、何十億ドルもの費用がかかっている。このような不正行為を検出するために、多くの機械学習アルゴリズムが適用される。これらのアルゴリズムは、しばしばトランザクションの振る舞いに基づいて訓練され、場合によっては、システムに存在する脆弱性について訓練される。このアプローチでは、ブロックチェーン内のアカウントに関連付けられたドメイン名(DN)などのメタデータを使用することで、アカウントに悪意のあるタグ付けをすべきかどうかを判定する。ここでは、DNに付随する時間的側面を活用する。その結果,144930個のDNを同定し,そのうち54114個のDNは時間とともに永続的な悪意を示すことがわかった。それにもかかわらず、新たにタグ付けされた悪意のあるブロックチェーンDNには、これらの悪意のあるDNが報告されていない。 The rise in the adoption of blockchain technology has led to increased illegal activities by cyber-criminals costing billions of dollars. Many machine learning algorithms are applied to detect such illegal behavior. These algorithms are often trained on the transaction behavior and, in some cases, trained on the vulnerabilities that exist in the system. In our approach, we study the feasibility of using metadata such as Domain Name (DN) associated with the account in the blockchain and identify whether an account should be tagged malicious or not. Here, we leverage the temporal aspects attached to the DNs. Our results identify 144930 DNs that show malicious behavior, and out of these, 54114 DNs show persistent malicious behavior over time. Nonetheless, none of these identified malicious DNs were reported in new officially tagged malicious blockchain DNs.	翻訳日:2021-06-28 23:07:58 公開日:2021-06-25
# (参考訳) 不正スマートコントラクトの脆弱性とトランザクション行動に基づく検出 Vulnerability and Transaction behavior based detection of Malicious Smart Contracts ( http://arxiv.org/abs/2106.13422v1 ) ライセンス: CC BY 4.0	Rachit Agarwal, Tanmay Thapliyal, Sandeep Kumar Shukla	(参考訳) EthereumのSmart Contracts(SC)はタスクを自動化し、ユーザにさまざまな機能を提供する。このような自動化は、SCが書かれたプログラミング言語(Solidity)の'Turing-complete'の性質によって実現される。これはまた、悪意あるアクターが暗号通貨プラットフォーム上で悪意あるまたは違法なアクティビティを実行するために悪用する、SCのさまざまな脆弱性とバグを開放する。本研究では,悪質な活動とSCに存在する脆弱性の相関関係を調べ,悪質な活動が特定の種類の脆弱性と相関していることを見いだす。次に、SCの脆弱性の重大度に対応するスコアリング機構の実現可能性について検討し、不審なSCの特定に関連性があるかどうかを判断する。非教師付き機械学習(ML)アルゴリズムを用いて,不審なSCの検出に向けた重大度スコアの有用性を分析し,行動変化の同定を行う。オンチェーンSCを用いた実験では、機能セットにスマートコントラクトの脆弱性スコアを含めることで、さまざまな粒度にわたり、悪意のあるSCと同じような振る舞いをする良性SCを合計1094個発見した。 Smart Contracts (SCs) in Ethereum can automate tasks and provide different functionalities to a user. Such automation is enabled by the 'Turing-complete' nature of the programming language (Solidity) in which SCs are written. This also opens up different vulnerabilities and bugs in SCs that malicious actors exploit to carry out malicious or illegal activities on the cryptocurrency platform. In this work, we study the correlation between malicious activities and the vulnerabilities present in SCs and find that some malicious activities are correlated with certain types of vulnerabilities. We then develop and study the feasibility of a scoring mechanism that corresponds to the severity of the vulnerabilities present in SCs to determine if it is a relevant feature to identify suspicious SCs. We analyze the utility of severity score towards detection of suspicious SCs using unsupervised machine learning (ML) algorithms across different temporal granularities and identify behavioral changes. In our experiments with on-chain SCs, we were able to find a total of 1094 benign SCs across different granularities which behave similarly to malicious SCs, with the inclusion of the smart contract vulnerability scores in the feature set.	翻訳日:2021-06-28 22:48:16 公開日:2021-06-25
# (参考訳) 強化学習問題としての分岐予測 : なぜ, 方法, 事例研究 Branch Prediction as a Reinforcement Learning Problem: Why, How and Case Studies ( http://arxiv.org/abs/2106.13429v1 ) ライセンス: CC BY 4.0	Anastasios Zouzias, Kleovoulos Kalaitzidis and Boris Grot	(参考訳) 近年、分岐予測器(BP)の有効性が停滞し、分岐予測器の設計における新しいアイデアが失われ、この分野における新しい思考が求められている。本稿では,Reinforcement Learning(RL)の観点からBPを考察することにより,BP設計の体系的推論と探索を容易にする。本稿では、分岐予測器にRLの定式化を適用し、この定式化で既存の予測器を簡潔に表現できることを示し、従来のBPの2つのRLに基づく変種について検討する。 Recent years have seen stagnating improvements to branch predictor (BP) efficacy and a dearth of fresh ideas in branch predictor design, calling for fresh thinking in this area. This paper argues that looking at BP from the viewpoint of Reinforcement Learning (RL) facilitates systematic reasoning about, and exploration of, BP designs. We describe how to apply the RL formulation to branch predictors, show that existing predictors can be succinctly expressed in this formulation, and study two RL-based variants of conventional BPs.	翻訳日:2021-06-28 22:30:48 公開日:2021-06-25
# (参考訳) 限られた数の学習サンプルを用いたハイブリッドモデルと学習モデルによる分類手法 A hybrid model-based and learning-based approach for classification using limited number of training samples ( http://arxiv.org/abs/2106.13436v1 ) ライセンス: CC BY 4.0	Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam	(参考訳) 限られた数のトレーニングデータサンプルが与えられた分類の基本的なタスクは、既知のパラメトリック統計モデルを持つ物理システムである。独立した学習ベースおよび統計モデルベース分類器は、小さなトレーニングセットを用いた分類タスクの実現に向けて大きな課題に直面している。具体的には、物理に基づく統計モデルにのみ依存する分類器は、基礎となる観測不可能なパラメータを適切に調整できないため、システムの振舞いが不一致となる。一方、学習ベースの分類器は通常、基礎となる物理的プロセスからの大量のトレーニングデータに依存しており、ほとんどの現実的なシナリオでは実現できないかもしれない。本稿では,物理ベースの統計モデルと学習に基づく分類器の両方を利用するハイブリッド分類法であるhyphylearnを提案する。提案手法は,HyPhyLearnが学習ベースおよび統計モデルに基づく分類器の個人的アプローチに関わる課題を,それぞれの強みを融合することによって緩和する,という予想に基づいている。提案手法は,まず利用可能な(最適でない)統計的推定手法を用いて観測不可能なモデルパラメータを推定し,次いで物理に基づく統計モデルを用いて合成データを生成する。次に、ニューラルネットワークのドメイン対逆トレーニングに基づく学習ベース分類器に、トレーニングデータサンプルを合成データに組み込む。具体的には、ミスマッチ問題に対処するために、分類器は、トレーニングデータと合成データとから共通の特徴空間へのマッピングを学習する。同時に、分類器は、分類タスクを満たすために、この空間内で識別的特徴を見つけるように訓練される。 The fundamental task of classification given a limited number of training data samples is considered for physical systems with known parametric statistical models. The standalone learning-based and statistical model-based classifiers face major challenges towards the fulfillment of the classification task using a small training set. Specifically, classifiers that solely rely on the physics-based statistical models usually suffer from their inability to properly tune the underlying unobservable parameters, which leads to a mismatched representation of the system's behaviors. Learning-based classifiers, on the other hand, typically rely on a large number of training data from the underlying physical process, which might not be feasible in most practical scenarios. In this paper, a hybrid classification method -- termed HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers. 
The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers by fusing their respective strengths. The proposed hybrid approach first estimates the unobservable model parameters using the available (suboptimal) statistical estimation procedures, and subsequently uses the physics-based statistical models to generate synthetic data. Then, the training data samples are incorporated with the synthetic data in a learning-based classifier that is based on domain-adversarial training of neural networks. Specifically, in order to address the mismatch problem, the classifier learns a mapping from the training data and the synthetic data to a common feature space. Simultaneously, the classifier is trained to find discriminative features within this space in order to fulfill the classification task.	翻訳日:2021-06-28 22:19:07 公開日:2021-06-25
# (参考訳) 絵は、視覚的な質問に答えるために100語分の価値があるかもしれない A Picture May Be Worth a Hundred Words for Visual Question Answering ( http://arxiv.org/abs/2106.13445v1 ) ライセンス: CC BY 4.0	Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye	(参考訳) 写真を理解するためのテキスト表現はどこまでできるのか? 画像理解では、簡潔だが詳細な画像表現を使うことが不可欠である。 Faster R-CNNのような視覚モデルによって抽出された深い視覚的特徴は、複数のタスク、特に視覚的質問応答(VQA)で広く使われている。しかし、従来の深い視覚的特徴は、人間のように画像内のすべての詳細を伝えるのに苦労するかもしれない。一方、最近の言語モデルの進歩により、記述テキストはこの問題の代替となるかもしれない。本稿では,VQAの特定の文脈における画像理解のためのテキスト表現の有効性について検討する。本稿では,深い視覚的特徴の代わりに記述・質問対を入力として,言語のみのトランスフォーマーモデルに導入し,プロセスと計算コストを単純化することを提案する。また、トレーニングセットの多様性を高め、統計的バイアスの学習を避けるために、データ拡張手法も実験した。大規模な評価では、VQA 2.0とVQA-CP v2の両方の深い視覚的特徴と競合するために、テキスト表現は100語程度しか必要としない。 How far can we go with textual representations for understanding pictures? In image understanding, it is essential to use concise but detailed image representations. Deep visual features extracted by vision models, such as Faster R-CNN, are widely used in multiple tasks, and especially in visual question answering (VQA). However, conventional deep visual features may struggle to convey all the details in an image as we humans do. Meanwhile, with recent language models' progress, descriptive text may be an alternative to this problem. This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA. We propose to take description-question pairs as input, instead of deep visual features, and feed them into a language-only Transformer model, simplifying the process and the computational cost. We also experiment with data augmentation techniques to increase the diversity in the training set and avoid learning statistical bias. Extensive evaluations have shown that textual representations require only about a hundred words to compete with deep visual features on both VQA 2.0 and VQA-CP v2.	翻訳日:2021-06-28 21:15:24 公開日:2021-06-25
# (参考訳) 深層学習による解釈可能な刑事訴追予測とアルゴリズムバイアス Deep Interpretable Criminal Charge Prediction and Algorithmic Bias ( http://arxiv.org/abs/2106.13456v1 ) ライセンス: CC BY 4.0	Abdul Rafae Khan, Jia Xu, Peter Varsanyi, Rachit Pabreja	(参考訳) 刑事司法制度における決定を補助する上で、予測的警察はますます一般的になっているが、これらの結果の使用はいまだに議論の余地がある。深層学習に基づくソフトウェアの中には精度(例えばF-1)に欠けるものもあり、多くの意思決定プロセスは透明性がないため、人種、年齢、性別格差などの決定バイアスに疑念を生じさせる。本稿では,20年以上の時間行動パターンを学習することで,過去の犯罪記録から将来の刑事訴追を受けるかという信頼性の高い予測を行うため,ポストホックな説明を伴うバイアス問題に対処する。 Bi-LSTMは、消失する勾配問題を緩和し、注意機構は特徴の重要性の学習と解釈を可能にする。提案手法は,実生活データセット上での予測精度とリコールの一貫性を示す。筆者らは,各入力特徴の重要性を分析し,犯罪履歴が統計的に重要な要因であるのに対して,人種,性別,年齢などの識別子はそうではないことを示唆した。最後に,我々のアルゴリズムは,容疑者の犯罪の深刻度が時間とともに急激にではなく徐々に上昇する傾向にあることを示す。 While predictive policing has become increasingly common in assisting with decisions in the criminal justice system, the use of these results is still controversial. Some software based on deep learning lacks accuracy (e.g., in F-1), and many decision processes are not transparent, causing doubt about decision bias, such as perceived racial, age, and gender disparities. This paper addresses bias issues with post-hoc explanations to provide a trustable prediction of whether a person will receive future criminal charges given one's previous criminal records by learning temporal behavior patterns over twenty years. Bi-LSTM relieves the vanishing gradient problem, and attentional mechanisms allow learning and interpretation of feature importance. Our approach shows consistent and reliable prediction precision and recall on a real-life dataset. Our analysis of the importance of each input feature shows the critical causal impact on decision-making, suggesting that criminal histories are statistically significant factors, while identifiers, such as race, gender, and age, are not. Finally, our algorithm indicates that a suspect tends to gradually rather than suddenly increase crime severity level over time.	翻訳日:2021-06-28 20:56:53 公開日:2021-06-25
# (参考訳) Adapt-and-Distill: ドメインのための小さくて高速で効果的な事前学習言語モデルの開発 Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains ( http://arxiv.org/abs/2106.13474v1 ) ライセンス: CC BY 4.0	Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei	(参考訳) 訓練済みの大きなモデルは多くの自然言語処理タスクで大きな成功を収めた。しかしながら、特定のドメインに適用されると、これらのモデルはドメインシフトに悩まされ、レイテンシとキャパシティの制約に対して、微調整とオンラインサービスに課題をもたらす。本稿では、特定の領域に対して、小さくて高速で効果的な事前学習モデルを開発するための一般的なアプローチを提案する。これは、既成の一般訓練モデルに適応し、ターゲットドメインでタスク非依存の知識蒸留を行うことによって達成される。具体的には,適応段階におけるドメイン固有語彙拡張を提案し,コーパスレベル発生確率を用いてインクリメンタル語彙のサイズを自動的に選択する。そこで我々は,特定の領域に対する大規模事前学習モデルを圧縮するための様々な戦略を体系的に検討する。我々は生物医学とコンピュータ科学の領域で実験を行う。実験の結果、ドメイン固有タスクにおいてBERT BASEモデルよりも高い性能を達成し、かつBERT BASEより3.3倍小さく5.1倍高速であることが示された。コードと事前学習されたモデルはhttps://aka.ms/adalmで入手できる。 Large pre-trained models have achieved great success in many natural language processing tasks. However, when they are applied in specific domains, these models suffer from domain shift and bring challenges in fine-tuning and online serving for latency and capacity constraints. In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains. This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains. Specifically, we propose domain-specific vocabulary expansion in the adaptation stage and employ corpus level occurrence probability to choose the size of incremental vocabulary automatically. Then we systematically explore different strategies to compress the large pre-trained models for specific domains. We conduct our experiments in the biomedical and computer science domain. The experimental results demonstrate that our approach achieves better performance over the BERT BASE model in domain-specific tasks while being 3.3x smaller and 5.1x faster than BERT BASE. The code and pre-trained models are available at https://aka.ms/adalm.	翻訳日:2021-06-28 20:51:03 公開日:2021-06-25
# (参考訳) 建物エネルギー予測のための機械学習の限界:ASHRAE Great Energy Predictor III Kaggle 競合誤差解析 Limitations of machine learning for building energy prediction: ASHRAE Great Energy Predictor III Kaggle competition error analysis ( http://arxiv.org/abs/2106.13475v1 ) ライセンス: CC BY 4.0	Clayton Miller, Bianca Picchetti, Chun Fu, Jovan Pantelic	(参考訳) 近年、建物のエネルギー予測のための機械学習が人気を博しているが、その限界と改善の可能性についての理解は不足している。 ASHRAE Great Energy Predictor III (GEPIII) Kaggleコンペティションは、4,370人の参加者が39,403件の予測を提出した、これまでで最大の建物エネルギーメーター機械学習コンペティションである。テストデータには、16カ所の1,448棟の建物に設置された2,380個のメーターから得られた、2年分の1時間ごとの電力、温水、冷水、蒸気の測定値が含まれていた。本稿では,コンペティションのトップ50ソリューションの集約から残留モデルエラーの各種発生源と種類を分析した。この分析は、過去のメーター、天気、基本的な建築メタデータの標準モデル入力を用いた機械学習の限界を明らかにする。エラーの種類は、各インスタンスで発生した時間誤差の量、突然の振る舞いと漸進的な振る舞い、エラーの大きさ、エラーが1つの建物または複数の建物に一度に存在するかどうかによって分類される。結果は、機械学習モデルがテストデータの79.1%の許容範囲内でエラーを持っていることを示している。低等級のモデルエラーはテストデータの16.1%で発生する。これらの相違は、機械学習における追加のトレーニングデータソースやイノベーションによって対処される可能性がある。高次の誤差はテストデータの4.8%で発生し、イノベーションに関係なく正確に予測されることはない。エネルギーメータータイプ(電気予測モデルはテストデータの10%未満で許容できないエラーを持ち、温水は60%以上である)と使用タイプ(公共サービスでは14%未満、技術/科学では46%以上である)によって、エラーの振る舞いは様々である。 Machine learning for building energy prediction has exploded in popularity in recent years, yet an understanding of its limitations and potential for improvement is lacking. The ASHRAE Great Energy Predictor III (GEPIII) Kaggle competition was the largest building energy meter machine learning competition ever held with 4,370 participants who submitted 39,403 predictions. The test data set included two years of hourly electricity, hot water, chilled water, and steam readings from 2,380 meters in 1,448 buildings at 16 locations. This paper analyzes the various sources and types of residual model error from an aggregation of the competition's top 50 solutions. This analysis reveals the limitations for machine learning using the standard model inputs of historical meter, weather, and basic building metadata. 
The types of error are classified according to the amount of time errors occur in each instance, abrupt versus gradual behavior, the magnitude of error, and whether the error existed on single buildings or several buildings at once from a single location. The results show machine learning models have errors within a range of acceptability on 79.1% of the test data. Lower magnitude model errors occur in 16.1% of the test data. These discrepancies can likely be addressed through additional training data sources or innovations in machine learning. Higher magnitude errors occur in 4.8% of the test data and are unlikely to be accurately predicted regardless of innovation. There is a diversity of error behavior depending on the energy meter type (electricity prediction models have unacceptable error in under 10% of test data, while hot water is over 60%) and building use type (public service less than 14%, while technology/science is just over 46%).	翻訳日:2021-06-28 20:39:06 公開日:2021-06-25
# (参考訳) グラフパターン損失に基づくクロスモーダル検索のための分散注意ネットワーク Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval ( http://arxiv.org/abs/2106.13552v1 ) ライセンス: CC BY 4.0	Xueying Chen, Rong Zhang, Yibing Zhan	(参考訳) クロスモーダル検索は、画像、ビデオ、テキスト、オーディオなどのマルチメディアデータを組み合わせることで、柔軟な検索エクスペリエンスを実現することを目的としている。教師なしアプローチのコアの1つは、異なるオブジェクト表現間の相関を掘り下げて、高価なラベルを必要とせずに満足のいく検索性能を達成することである。本稿では,表現間の相関関係を深く解析するために,教師なしクロスモーダル検索のためのグラフパターン損失に基づく分散注意ネットワーク(GPLDAN)を提案する。まず、インスタンスの複数の表現を生成するために異なる表現間の相互作用を考慮し、多様な注目機能プロジェクタを提案する。そこで我々は,異なる表現間の相関関係を探索するために,新しいグラフパターンの損失を設計する。このグラフでは、異なる表現間のすべての可能な距離が考慮される。さらに、融合前に対応する特徴のモダリティを明示的に宣言するためにモダリティ分類器を追加し、ネットワークを誘導して識別能力を高める。 GPLDANを4つの公開データセットでテストする。最先端のクロスモーダル検索手法と比較して,GPLDANの性能と競争性を示す実験結果が得られた。 Cross-modal retrieval aims to enable flexible retrieval experience by combining multimedia data such as image, video, text, and audio. One core of unsupervised approaches is to dig the correlations among different object representations to achieve satisfactory retrieval performance without requiring expensive labels. In this paper, we propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval to deeply analyze correlations among representations. First, we propose a diversified attention feature projector by considering the interaction between different representations to generate multiple representations of an instance. Then, we design a novel graph pattern loss to explore the correlations among different representations; in this graph, all possible distances between different representations are considered. In addition, a modality classifier is added to explicitly declare the corresponding modalities of features before fusion and guide the network to enhance discrimination ability. We test GPLDAN on four public datasets. Compared with the state-of-the-art cross-modal retrieval methods, the experimental results demonstrate the performance and competitiveness of GPLDAN.	翻訳日:2021-06-28 20:18:26 公開日:2021-06-25
# (参考訳) 文脈における単語の意味表現の探索:ホモニミーとシノニミーを事例として Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy ( http://arxiv.org/abs/2106.13553v1 ) ライセンス: CC BY 4.0	Marcos Garcia	(参考訳) 本稿では,文脈における単語の意味表現の多言語的研究について述べる。我々は,静的モデルと文脈モデルの両方が,同音異義や同義などの語彙意味関係を適切に表現できる能力を評価する。そこで我々は,周囲の文脈の影響や、同一あるいは異なる意味を伝える単語間の重なりなど,複数の要因の制御された評価を行うことができる,新たな多言語データセットを作成した。 4つのシナリオに関する体系的な評価は、トランスフォーマーに基づく最良の単言語モデルが文脈における同音異義語を適切に曖昧性解消できることを示している。しかし、これらのモデルは文脈に大きく依存しているため、類似した文で発生する異なる意味の単語を表現できない。ガリシア語、ポルトガル語、英語、スペイン語で実験が行われ、データセット(3000以上の評価項目を含む)と新しいモデルの両方がこの研究で自由にリリースされる。 This paper presents a multilingual study of word meaning representations in context. We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations, such as homonymy and synonymy. To do so, we created a new multilingual dataset that allows us to perform a controlled evaluation of several factors such as the impact of the surrounding context or the overlap between words, conveying the same or different senses. A systematic assessment on four scenarios shows that the best monolingual models based on Transformers can adequately disambiguate homonyms in context. However, as they rely heavily on context, these models fail at representing words with different senses when occurring in similar sentences. Experiments are performed in Galician, Portuguese, English, and Spanish, and both the dataset (with more than 3,000 evaluation items) and new models are freely released with this study.	翻訳日:2021-06-28 20:08:27 公開日:2021-06-25
# (参考訳) srpn:組織画像における核および細胞検出のための類似性に基づく領域提案ネットワーク SRPN: similarity-based region proposal networks for nuclei and cells detection in histology images ( http://arxiv.org/abs/2106.13556v1 ) ライセンス: CC BY 4.0	Yibao Sun, Xingru Huang, Huiyu Zhou, Qianni Zhang	(参考訳) 組織像中の核と細胞の検出は臨床と病理学的研究の両方において非常に有用である。しかし, 原子核や細胞の形態変化などの複数の理由から, 従来の物体検出法では良好な性能が得られない課題となっている。検出タスクは2つのサブタスク、分類とローカライゼーションで構成される。密度の高い物体検出条件下では、分類は検出性能を高める鍵となる。そこで本研究では,核・細胞検出のための類似性に基づく領域提案ネットワーク(SRPN)を提案する。特に、組み込み層と呼ばれるカスタマイズされた畳み込み層は、ネットワーク構築のために設計されている。埋め込み層がリージョン提案ネットワークに追加され、類似性学習に基づいて識別的特徴を学習することができる。類似学習によって得られる特徴は,従来の手法に比べて分類性能を著しく向上させることができる。 SRPNは、Faster R-CNNやRetinaNetのような標準の畳み込みニューラルネットワークアーキテクチャに容易に統合できる。組織像における多臓器核検出とシグナレットリング細胞検出の課題について,提案手法を検証した。実験の結果,類似性学習を施したネットワークは,両タスクにおいて,両課題とも同等の性能を得た。特に,提案したSRPNは,従来手法と比較して核分割と検出のためのMoNuSegベンチマーク,およびベースラインと比較した場合のシグレットリング細胞検出ベンチマークにおいて,最先端性能を実現している。ソースコードはhttps://github.com/sigma10010/nuclei_cells_detで公開されている。 The detection of nuclei and cells in histology images is of great value in both clinical practice and pathological studies. However, multiple reasons such as morphological variations of nuclei or cells make it a challenging task where conventional object detection methods cannot obtain satisfactory performance in many cases. A detection task consists of two sub-tasks, classification and localization. Under the condition of dense object detection, classification is a key to boost the detection performance. Considering this, we propose similarity based region proposal networks (SRPN) for nuclei and cells detection in histology images. In particular, a customized convolution layer termed as embedding layer is designed for network building. The embedding layer is added into the region proposal networks, enabling the networks to learn discriminative features based on similarity learning. Features obtained by similarity learning can significantly boost the classification performance compared to conventional methods. 
SRPN can be easily integrated into standard convolutional neural network architectures such as Faster R-CNN and RetinaNet. We test the proposed approach on tasks of multi-organ nuclei detection and signet ring cell detection in histological images. Experimental results show that networks applying similarity learning achieved superior performance on both tasks when compared to their counterparts. In particular, the proposed SRPN achieves state-of-the-art performance on the MoNuSeg benchmark for nuclei segmentation and detection when compared to previous methods, and on the signet ring cell detection benchmark when compared with baselines. The source code is publicly available at: https://github.com/sigma10010/nuclei_cells_det.	翻訳日:2021-06-28 18:59:51 公開日:2021-06-25
# (参考訳) HEVC画面コンテンツ符号化によるマルチビュー映像圧縮 Multiview Video Compression Using Advanced HEVC Screen Content Coding ( http://arxiv.org/abs/2106.13574v1 ) ライセンス: CC BY 4.0	Jarosław Samelak, Marek Domański	(参考訳) 本稿では,スクリーンコンテンツ符号化を用いたマルチビュー映像符号化手法を提案する。ある時刻において、すべてのビューに対応するフレームが単一のフレームに詰め込まれていると仮定する。すなわち、マルチビュー符号化へのフレーム互換アプローチが適用される。このようなコーディングシナリオに対して,マルチビュー映像符号化にスクリーンコンテンツ符号化が有効であることを示す。 2つのアプローチが検討されている: 1つは標準HEVCスクリーンコンテンツコーディング、もう1つは高度なスクリーンコンテンツコーディングである。後者は、1/4画素精度の動きベクトルやHEVCスクリーンコンテンツ符号化の他の非標準拡張を利用する著者の原案である。実験結果から,標準的なHEVC画面コンテンツ符号化を用いたマルチビュー映像符号化であっても,サイマルキャストHEVC符号化よりもはるかに効率的であることが示された。提案したAdvanced Screen Content Codingは、最先端のマルチビュービデオ圧縮技術であるMV-HEVCとほぼ同等の符号化効率を提供する。著者らは、新しいVersatile Video Coding(VVC)技術で、Advanced Screen Content Codingを効率的に利用できることを示唆している。しかしながら、VVCの参照マルチビュー拡張はまだ存在しないため、VVCベースのコーディングでは、将来の作業のために実験的な比較が残されている。 The paper presents a new approach to multiview video coding using Screen Content Coding. It is assumed that for a time instant the frames corresponding to all views are packed into a single frame, i.e. the frame-compatible approach to multiview coding is applied. For such coding scenario, the paper demonstrates that Screen Content Coding can be efficiently used for multiview video coding. Two approaches are considered: the first using standard HEVC Screen Content Coding, and the second using Advanced Screen Content Coding. The latter is the original proposal of the authors that exploits quarter-pel motion vectors and other nonstandard extensions of HEVC Screen Content Coding. The experimental results demonstrate that multiview video coding even using standard HEVC Screen Content Coding is much more efficient than simulcast HEVC coding. The proposed Advanced Screen Content Coding provides virtually the same coding efficiency as MV-HEVC, which is the state-of-the-art multiview video compression technique. The authors suggest that Advanced Screen Content Coding can be efficiently used within the new Versatile Video Coding (VVC) technology. 
Nevertheless a reference multiview extension of VVC does not exist yet, therefore, for VVC-based coding, the experimental comparisons are left for future work.	翻訳日:2021-06-28 18:34:25 公開日:2021-06-25
# (参考訳) 遺伝的アルゴリズムを用いた段階的議論フレームワークの学習 Learning Gradual Argumentation Frameworks using Genetic Algorithms ( http://arxiv.org/abs/2106.13585v1 ) ライセンス: CC BY 4.0	Jonathan Spieler, Nico Potyka, Steffen Staab	(参考訳) 段階的議論フレームワークは、重み付きグラフで引数とその関係を表現する。彼らのグラフィカルな構造と直感的な意味論は、機械学習を解釈するための潜在的に興味深いツールとなる。近年、そのメカニズムはニューラルネットワークと密接に関連しており、標準のディープラーニングフレームワークによってデータから重み付けを学習することができる。最初の概念実証として,議論型分類モデルの構造を同時に学習する遺伝的アルゴリズムを提案する。良好に解釈可能なモデルを得るには、適合関数は分類器のスパースネスと精度のバランスをとる。提案アルゴリズムについて考察し,UCI機械学習レポジトリの標準ベンチマークに関する最初の実験結果を示す。本プロトタイプでは,学習性能と解釈可能性の観点から,決定木に匹敵する議論的分類モデルを学習する。 Gradual argumentation frameworks represent arguments and their relationships in a weighted graph. Their graphical structure and intuitive semantics make them a potentially interesting tool for interpretable machine learning. It has been noted recently that their mechanics are closely related to neural networks, which allows learning their weights from data by standard deep learning frameworks. As a first proof of concept, we propose a genetic algorithm to simultaneously learn the structure of argumentative classification models. To obtain a well interpretable model, the fitness function balances sparseness and accuracy of the classifier. We discuss our algorithm and present first experimental results on standard benchmarks from the UCI machine learning repository. Our prototype learns argumentative classification models that are comparable to decision trees in terms of learning performance and interpretability.	翻訳日:2021-06-28 18:23:39 公開日:2021-06-25
# (参考訳) 本物の熱画像と偽の熱画像を混ぜて、オブジェクト検出を改善する Partially fake it till you make it: mixing real and fake thermal images for improved object detection ( http://arxiv.org/abs/2106.13603v1 ) ライセンス: CC BY-SA 4.0	Francesco Bongini, Lorenzo Berlincioni, Marco Bertini, Alberto Del Bimbo	(参考訳) 本稿では,学習データセットが乏しい視覚コンテンツ領域に対して,実シーンで合成された3Dオブジェクトを合成する新しいデータ拡張手法を提案する。熱画像における物体検出の文脈において, 提案システムの性能を示す。この領域では、1) トレーニングデータセットは可視スペクトルデータセットと比較して非常に限られており, 2) シーンの素材の熱特性のモデル化が困難であるため, 完全なリアルな合成シーンの作成は非常に困難で費用がかかる。我々は,RL技術によって得られた最先端のアプローチ,シミュレーションデータの注入,生成モデルの活用など,さまざまな拡張戦略を比較し,提案手法と他の手法を最大限に組み合わせる方法について検討する。実験結果から,我々のアプローチの有効性が示され,我々の単一モード検出装置はFLIR ADASデータセット上で最先端の成果を達成できる。 In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes. We show the performance of the proposed system in the context of object detection in thermal videos, a domain where 1) training datasets are very limited compared to visible spectrum datasets and 2) creating full realistic synthetic scenes is extremely cumbersome and expensive due to the difficulty in modeling the thermal properties of the materials of the scene. We compare different augmentation strategies, including state-of-the-art approaches obtained through RL techniques, the injection of simulated data and the employment of a generative model, and study how to best combine our proposed augmentation with these other techniques. Experimental results demonstrate the effectiveness of our approach, and our single-modality detector achieves state-of-the-art results on the FLIR ADAS dataset.	翻訳日:2021-06-28 18:08:21 公開日:2021-06-25
# (参考訳) 重み付き多数決のためのChebyshev-Cantelli PAC-Bayes-Bennettの不等式 Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote ( http://arxiv.org/abs/2106.13624v1 ) ライセンス: CC BY 4.0	Yi-Shan Wu, Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin	(参考訳) 我々は、重み付き多数決の期待リスクに対する新しい2次オラクル境界を提示する。この境界は、効率的な最小化が可能な、チェビシェフ・カンテリの不等式(別名、片側チェビシェフの不等式)の新たなパラメトリック形式に基づいている。この新しい形式は、チェビシェフ・カンテリの不等式に基づく従来のオラクル境界であるC-bound [Germain et al., 2015] が直面する最適化上の課題を解決すると同時に、Masegosa et al. [2020] によって導入された2次マルコフの不等式に基づくオラクル境界を改善する。また、オラクル境界の経験的推定に用いるPAC-Bayes-Bennettの不等式を導出する。 PAC-Bayes-Bennettの不等式は、Seldin et al. [2012] によるPAC-Bayes-Bernsteinの不等式を改善する。我々は、新しい境界がMasegosa et al. [2020] の結果を改善できることを示す経験的評価を行う。チェビシェフ・カンテリの不等式のパラメトリック形式とPAC-Bayes-Bennettの不等式はいずれも、他の領域における測度集中の研究にとって独立した関心の対象となる可能性がある。 We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev-Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second order Markov's inequality introduced by Masegosa et al. [2020]. We also derive the PAC-Bayes-Bennett inequality, which we use for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality by Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work by Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains.	翻訳日:2021-06-28 17:55:05 公開日:2021-06-25
# (参考訳) 言語モデルは優れた翻訳者です Language Models are Good Translators ( http://arxiv.org/abs/2106.13627v1 ) ライセンス: CC BY 4.0	Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, Yang Liu	(参考訳) 近年、エンコーダ-デコーダアーキテクチャを中核とするニューラル機械翻訳(NMT)が急速に進歩しているのを目撃している。機械翻訳における大規模事前学習言語モデルの限られたシナリオにおける最近の進歩に触発されて、我々はまず、単一の言語モデル(LM4MT)が、同じ訓練データと同程度のモデルパラメータ数を用いて、標準機械翻訳ベンチマークにおける強力なエンコーダ・デコーダNMTモデルと同等の性能を達成できることを実証した。 LM4MTはソースサイドのテキストを簡単に追加の監視として利用することができる。同じメカニズムでソースとターゲットのテキストをモデリングするが、LM4MTはソースとターゲットの文の両方に統一表現を提供し、言語間で知識を伝達する。ピボットベースおよびゼロショット変換タスクの広範囲な実験により、LM4MTはエンコーダ・デコーダNMTモデルよりも大きなマージンで優れていることが示された。 Recent years have witnessed the rapid advance in neural machine translation (NMT), the core of which lies in the encoder-decoder architecture. Inspired by the recent progress of large-scale pre-trained language models on machine translation in a limited scenario, we first demonstrate that a single language model (LM4MT) can achieve performance comparable to strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and a similar number of model parameters. LM4MT can also easily utilize source-side texts as additional supervision. Though modeling the source- and target-language texts with the same mechanism, LM4MT can provide unified representations for both source and target sentences, which can better transfer knowledge across languages. Extensive experiments on pivot-based and zero-shot translation tasks show that LM4MT can outperform the encoder-decoder NMT model by a large margin.	翻訳日:2021-06-28 17:26:51 公開日:2021-06-25
# (参考訳) deeploc: ユビキタスな精度と低オーバヘッドな屋外セルローカライズシステム DeepLoc: A Ubiquitous Accurate and Low-Overhead Outdoor Cellular Localization System ( http://arxiv.org/abs/2106.13632v1 ) ライセンス: CC BY 4.0	Ahmed Shokry, Marwan Torki, Moustafa Youssef	(参考訳) 近年,屋外位置情報サービスの普及が進んでいる。 gpsはユビキタスなローカライズシステムと考えられているが、ローエンドの携帯電話ではサポートされておらず、衛星への直接の視線を必要とする。本稿では,GPSライクな位置決め精度を限界なく獲得する深層学習型屋外位置決めシステムDeepLocを提案する。特にDeepLocは、モバイル端末が受信した異なるセルタワーから受信したユビキタスなセル信号を、それをローカライズするためのヒントとして利用する。そのため、異なるセルタワーから受信した信号強度情報をクラウドセンシングしたジオタグを用いて、利用者の位置を推定する深層モデルの訓練を行う。 deeploc設計の一環として,大規模領域へのデータ収集のスケールアップ,セル信号の固有ノイズ処理やジオタグデータ処理,低オーバヘッドのディープラーニングモデルに必要な十分なデータの提供など,多くの実用的な課題に対処するモジュールを導入する。私たちはさまざまなAndroidデバイスにDeepLocを実装しました。現実的な都市・農村環境の評価結果から、DeepLocは都市部では18.8m、農村部では15.7mの範囲で、中央値のローカライズ精度を達成できることが示された。この精度は、最先端のセルベースシステムよりも470%以上優れており、GPSと比較して330%の省電力が可能である。これはDeepLocがユビキタスで高精度かつ低オーバーヘッドなローカライゼーションシステムであることを示すものだ。 Recent years have witnessed fast growth in outdoor location-based services. While GPS is considered a ubiquitous localization system, it is not supported by low-end phones, requires direct line of sight to the satellites, and can drain the phone battery quickly. In this paper, we propose DeepLoc: a deep learning-based outdoor localization system that obtains GPS-like localization accuracy without its limitations. In particular, DeepLoc leverages the ubiquitous cellular signals received from the different cell towers heard by the mobile device as hints to localize it. To do that, crowd-sensed geo-tagged received signal strength information coming from different cell towers is used to train a deep model that is used to infer the user's position. As part of DeepLoc design, we introduce modules to address a number of practical challenges including scaling the data collection to large areas, handling the inherent noise in the cellular signal and geo-tagged data, as well as providing enough data that is required for deep learning models with low-overhead. 
We implemented DeepLoc on different Android devices. Evaluation results in realistic urban and rural environments show that DeepLoc can achieve a median localization accuracy within 18.8m in urban areas and within 15.7m in rural areas. This accuracy outperforms the state-of-the-art cellular-based systems by more than 470% and comes with 330% savings in power compared to the GPS. This highlights the promise of DeepLoc as a ubiquitous accurate and low-overhead localization system.	翻訳日:2021-06-28 17:11:39 公開日:2021-06-25
# (参考訳) 衝突依存報酬分布を有するマルチプレイヤーマルチアームバンディット Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions ( http://arxiv.org/abs/2106.13669v1 ) ライセンス: CC BY 4.0	Chengshuai Shi, Cong Shen	(参考訳) 本研究では,腕に衝突した場合に報酬分布が変化する確率的マルチプレイヤーマルチアームバンディット問題(mp-mab)について検討した。既存の文献は常に、衝突が発生した場合、関連するプレイヤーにゼロ報酬を仮定するが、認知無線のような応用の場合、より現実的なシナリオは、衝突が平均報酬を減らし、必ずしもゼロにしないことである。我々は,プレイヤーが直接衝突を知覚しない,より実用的なno-sensing設定に着目し,暗黙的通信をノイズチャネル問題に対する信頼性の高い通信としてモデル化する誤り訂正衝突通信(ec3)アルゴリズムを提案する。最後に、コード長とデコードエラー率のトレードオフを最適化することは、自然の低い境界を表す集中的なMP-MABの後悔に近づくことを後悔させる。合成データと実世界のデータセットの両方における実用的な誤り訂正コードによる実験は、ec3の優位を示している。特に, コーディングスキームの選択が後悔のパフォーマンスに大きな影響を与えることが示された。 We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm. Existing literature always assumes a zero reward for involved players if collision happens, but for applications such as cognitive radio, the more realistic scenario is that collision reduces the mean reward but not necessarily to zero. We focus on the more practical no-sensing setting where players do not perceive collisions directly, and propose the Error-Correction Collision Communication (EC3) algorithm that models implicit communication as a reliable communication over noisy channel problem, for which random coding error exponent is used to establish the optimal regret that no communication protocol can beat. Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound. Experiments with practical error-correction codes on both synthetic and real-world datasets demonstrate the superiority of EC3. In particular, the results show that the choice of coding schemes has a profound impact on the regret performance.	翻訳日:2021-06-28 16:56:28 公開日:2021-06-25
# (参考訳) フェデレーション学習のためのクリッピングの理解:収束とクライアントレベルの差分プライバシー Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy ( http://arxiv.org/abs/2106.13673v1 ) ライセンス: CC BY 4.0	Xinwei Zhang, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu and Jinfeng Yi	(参考訳) プライバシ保護の提供は、フェデレートラーニング(FL)の主要な動機の1つだ。近年、差分プライバシーという形式的なプライバシー概念をFLに組み込むことに取り組んできた。 flアルゴリズムにおけるクライアントレベルのディファレンシャルプライバシを保証するためには、プライバシノイズを追加する前に、クライアントのモデル更新をクリップする必要がある。このようなクリッピング操作は、偏微分プライベートSGDにおける勾配クリッピングとは大きく異なり、十分に理解されていない。本稿では,ニューラルネットワークのトレーニングにおいて,有意なデータ不均一性を伴っても,カットしたFedAvgが驚くほど良好に動作可能であることを実証的に実証する。このキーとなる観測に基づいて、差分プライベート(DP)のFedAvgアルゴリズムの収束解析を行い、クリッピングバイアスとクライアント更新の分布との関係を明らかにする。私たちの知る限りでは、flアルゴリズムのクリッピング操作に関する理論的および経験的問題を厳格に調査するのはこれが初めてです。 Providing privacy protection has been one of the primary motivations of Federated Learning (FL). Recently, there has been a line of work on incorporating the formal privacy notion of differential privacy with FL. To guarantee the client-level differential privacy in FL algorithms, the clients' transmitted model updates have to be clipped before adding privacy noise. Such clipping operation is substantially different from its counterpart of gradient clipping in the centralized differentially private SGD and has not been well-understood. In this paper, we first empirically demonstrate that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients' updates become similar for several popular deep architectures. Based on this key observation, we provide the convergence analysis of a differential private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates. To the best of our knowledge, this is the first work that rigorously investigates theoretical and empirical issues regarding the clipping operation in FL algorithms.	翻訳日:2021-06-28 16:20:09 公開日:2021-06-25
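The clipping step described in the abstract above can be illustrated with a minimal sketch. It assumes the common recipe of clipping each client's update to a fixed L2 norm, averaging, and adding Gaussian noise scaled to the clipping norm. The function names (`clip_update`, `dp_fedavg_round`) and the parameters `clip_norm` and `noise_multiplier` are hypothetical and are not taken from the paper, whose exact algorithm may differ.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale a client's model update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(client_updates, clip_norm, noise_multiplier, rng):
    """Average the clipped client updates and add Gaussian noise to the average.

    Assumption: the noise standard deviation on the average is
    noise_multiplier * clip_norm / n, i.e. noise calibrated to the
    per-client sensitivity of the sum, then divided by the client count.
    """
    n = len(client_updates)
    dim = len(client_updates[0])
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    mean = [sum(u[j] for u in clipped) / n for j in range(dim)]
    sigma = noise_multiplier * clip_norm / n
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[3.0, 4.0], [0.1, -0.2], [-1.0, 0.5]]
    print(dp_fedavg_round(updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```

Clipping bounds how much any single client can move the average, which is what lets the added noise provide a client-level guarantee. The clipping bias studied in the paper arises because updates whose norm exceeds `clip_norm` are shrunk while smaller ones are left unchanged.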
# (参考訳) 非凸チューニングフリーロバスト回帰問題に対する近近大乗数最小化アルゴリズム A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems ( http://arxiv.org/abs/2106.13683v1 ) ライセンス: CC0 1.0	Peipei Tang, Chengjing Wang and Bo Jiang	(参考訳) 本稿では,非凸チューニングフリーロバスト回帰問題に対する近近大乗数最小化(ppmm)アルゴリズムを提案する。基本的考え方は、近位偏極最小化アルゴリズムを用いて、スパース半平滑ニュートン(SSN)法に基づく近位点アルゴリズム(PPA)によって解かれた内部のサブプロブレムで非凸問題を解くことである。アルゴリズムの設計における主な困難は、内部サブプロブレムの特異な難しさを克服する方法にあることを強調する必要がある。さらに、PPMMアルゴリズムがd-定常点に収束することを証明した。この問題のKurdyka-Lojasiewicz(KL)特性のため、PPMMアルゴリズムの収束率を示す。数値実験により,提案アルゴリズムが既存の最先端アルゴリズムよりも優れていることを示す。 In this paper, we introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems. The basic idea is to apply the proximal majorization-minimization algorithm to solve the nonconvex problem with the inner subproblems solved by a sparse semismooth Newton (SSN) method based proximal point algorithm (PPA). We must emphasize that the main difficulty in the design of the algorithm lies in how to overcome the singular difficulty of the inner subproblem. Furthermore, we also prove that the PPMM algorithm converges to a d-stationary point. Due to the Kurdyka-Lojasiewicz (KL) property of the problem, we present the convergence rate of the PPMM algorithm. Numerical experiments demonstrate that our proposed algorithm outperforms the existing state-of-the-art algorithms.	翻訳日:2021-06-28 15:35:43 公開日:2021-06-25
# (参考訳) フランク・エミカ・パンダシミュレーションロボットのための多方向強化学習環境 Multi-Goal Reinforcement Learning environments for simulated Franka Emika Panda robot ( http://arxiv.org/abs/2106.13687v1 ) ライセンス: CC BY 4.0	Quentin Gallouédec, Nicolas Cazin, Emmanuel Dellandréa, Liming Chen	(参考訳) 本報告では,openai gym と統合した franka emika panda ロボットの強化学習(rl)環境である panda-gym を提案する。 reach、push、slide、pick & place、stackの5つのタスクが含まれている。それらはすべてMulti-Goal RLフレームワークに従っており、目標指向のRLアルゴリズムを使用することができる。オープンリサーチを促進するために、私たちはオープンソースの物理エンジンpybulletを使うことを選択しました。このパッケージに選択された実装は、非常に簡単に新しいタスクや新しいロボットを定義することができる。本報告では,最先端のモデルレスオフポリシーアルゴリズムを用いて得られた結果のベースラインを示す。 panda-gymはhttps://github.com/qgallouedec/panda-gymでオープンソースである。 This technical report presents panda-gym, a set of Reinforcement Learning (RL) environments for the Franka Emika Panda robot integrated with OpenAI Gym. Five tasks are included: reach, push, slide, pick & place and stack. They all follow a Multi-Goal RL framework, allowing the use of goal-oriented RL algorithms. To foster open research, we chose to use the open-source physics engine PyBullet. The implementation chosen for this package makes it very easy to define new tasks or new robots. This report also presents a baseline of results obtained with state-of-the-art model-free off-policy algorithms. panda-gym is open-source at https://github.com/qgallouedec/panda-gym.	翻訳日:2021-06-28 15:34:45 公開日:2021-06-25
# (参考訳) 後方共分散情報基準 Posterior Covariance Information Criterion ( http://arxiv.org/abs/2106.13694v1 ) ライセンス: CC BY 4.0	Yukito Iba and Keisuke Yano	(参考訳) 準後続分布に基づく予測評価のための情報基準であるPCICを導入する。広く適用可能な情報基準(waic)の自然な一般化と見なされ、単一のマルコフ連鎖モンテカルロランで計算することができる。 PCICは、重み付き確率推定や準ベイズ予測など、WAICではうまく扱えない様々な予測設定において有用である。 We introduce an information criterion, PCIC, for predictive evaluation based on quasi-posterior distributions. It is regarded as a natural generalization of widely applicable information criterion (WAIC) and can be computed via a single Markov Chain Monte Carlo run. PCIC is useful in a variety of predictive settings that are not well dealt with in WAIC, including weighted likelihood inference and quasi-Bayesian prediction.	翻訳日:2021-06-28 15:27:30 公開日:2021-06-25
# (参考訳) 補助条件による画像間変換 Image-to-image Transformation with Auxiliary Condition ( http://arxiv.org/abs/2106.13696v1 ) ライセンス: CC BY 4.0	Robert Leer, Hessi Roma, James Amelia	(参考訳) シミュレーション画像で訓練された人間のポーズ検出のような画像認識の性能は通常、実際のデータとシミュレーションデータのばらつきによって悪化する。シミュレーション画像の分布を実画像に近いものにするために、SimGAN や CycleGAN といった GAN ベースの画像-画像変換手法を適用する研究がいくつかある。しかし、これらの方法は、特に訓練データが不均衡である場合、例えば、訓練データにおいて特定のポーズや形状が小さい場合など、被験者の姿勢や形の変化に十分敏感ではない。この問題を克服するために, 被験者のポーズや物体の種類といったラベル情報をサイクガンの訓練に導入し, ラベルワイズ・トランスフォーメーションモデルを得ることを提案する。提案手法であるラベルサイクガンをsvhnからmnistへのデジット画像変換とシミュレーション画像から実画像への監視カメラ画像変換実験により評価した。 The performance of image recognition like human pose detection, trained with simulated images would usually get worse due to the divergence between real and simulated data. To make the distribution of a simulated image close to that of a real one, there are several works applying GAN-based image-to-image transformation methods, e.g., SimGAN and CycleGAN. However, these methods would not be sensitive enough to the various changes in pose and shape of subjects, especially when the training data are imbalanced, e.g., some particular poses and shapes are minor in the training data. To overcome this problem, we propose to introduce the label information of subjects, e.g., pose and type of objects in the training of CycleGAN, and lead it to obtain label-wise transformation models. We evaluate our proposed method called Label-CycleGAN, through experiments on the digit image transformation from SVHN to MNIST and the surveillance camera image transformation from simulated to real images.	翻訳日:2021-06-28 15:11:04 公開日:2021-06-25
# (参考訳) 代替現実感ゲームを用いた社会科学研究の促進手法:個人差と適応性の測定による概念実証とチームパフォーマンスへの影響 Advancing Methodology for Social Science Research Using Alternate Reality Games: Proof-of-Concept Through Measuring Individual Differences and Adaptability and their impact on Team Performance ( http://arxiv.org/abs/2106.13740v1 ) ライセンス: CC BY 4.0	Magy Seif El-Nasr, Casper Harteveld, Paul Fombelle, Truong-Huy Nguyen, Paola Rizzo, Dylan Schouten, Abdelrahman Madkour, Chaima Jemmali, Erica Kleinman, Nithesh Javvaji, Zhaoqing Teng, Extra Ludic Inc	(参考訳) cscw(computer supported collaborative work)、心理学、社会科学(social sciences)といった分野の研究は、チームプロセスとそれがパフォーマンスや有効性に与える影響の理解を進歩させてきたが、現在の手法は観察や自己報告に依存しており、行動データに基づく定量的な尺度でチームプロセスを研究する取り組みはほとんどない。この報告では、個人の違いとそのチーム適応への影響を理解することに焦点を当てて、このオープンな問題に取り組む作業について議論し、これらの要因が結果とプロセスの両方としてチームパフォーマンスに与える影響をさらに探ります。具体的には、調査データと行動データを強化し、チームパフォーマンスに関する洞察を深め、グループ内およびグループ内における適応とパフォーマンスを評価する方法の開発を可能にする方法に関する貢献について論じます。この問題をより扱いやすくするため、私たちは特定のタイプの環境、代替現実ゲーム(arg)、そしていくつかの理由に焦点を当てることを選びました。まず、これらのゲームは、例えばスラックや電子メールによるコミュニケーションなど、現実世界のセットアップと類似したセットアップを含む。第二に、実際の環境よりも制御可能で、必要に応じて刺激を埋め込むことができます。最後に、経験の全体を通して意思決定やコミュニケーションを理解するのに必要なデータを集めることができるため、チームプロセスは可能な限り透過的になります。本報告では,これまでに行った作業について論じ,その効果を実証する。 While work in the fields of CSCW (Computer Supported Collaborative Work), Psychology and Social Sciences has advanced our understanding of team processes and their effect on performance and effectiveness, current methods rely on observations or self-report, with little work directed towards studying team processes with quantifiable measures based on behavioral data. In this report we discuss work tackling this open problem with a focus on understanding individual differences and their effect on team adaptation, and further explore the effect of these factors on team performance as both an outcome and a process. 
We specifically discuss our contribution in terms of methods that augment survey data and behavioral data that allow us to gain more insight on team performance as well as develop a method to evaluate adaptation and performance across and within a group. To make this problem more tractable we chose to focus on specific types of environments, Alternate Reality Games (ARGs), and for several reasons. First, these types of games involve setups that are similar to a real-world setup, e.g., communication through slack or email. Second, they are more controllable than real environments allowing us to embed stimuli if needed. Lastly, they allow us to collect data needed to understand decisions and communications made through the entire duration of the experience, which makes team processes more transparent than otherwise possible. In this report we discuss the work we did so far and demonstrate the efficacy of the approach.	翻訳日:2021-06-28 15:04:43 公開日:2021-06-25
# (参考訳) Privileged Zero-Shot AutoML Privileged Zero-Shot AutoML ( http://arxiv.org/abs/2106.13743v1 ) ライセンス: CC BY 4.0	Nikhil Singh, Brandon Kates, Jeff Mentch, Anant Kharkar, Madeleine Udell, Iddo Drori	(参考訳) この研究は、データセットと関数記述を用いて自動機械学習(AutoML)システムの品質を改善し、ゼロショットアプローチを用いて計算時間を数分からミリ秒に大幅に短縮する。新しいデータセットと明確に定義された機械学習タスクが与えられたとき、人間はデータセットの説明と使用するアルゴリズムのドキュメンテーションを読むことから始める。この作業は、AutoMLで特権情報と呼ばれるこれらのテキスト記述を使った最初のものです。トレーニング済みのTransformerモデルを使用して、特権テキストを処理し、この情報を使うことでAutoMLのパフォーマンスが向上することを示す。このように、自然言語処理における教師なし表現学習の進歩を活用し、AutoMLを大幅に向上させる。データと関数のテキスト記述のみを使用することで、合理的な分類性能が得られ、データメタ機能にテキスト記述を追加することで、表型データセット全体の分類が向上することを示す。ゼロショットAutoMLを達成するために、これらの記述埋め込みとデータメタ機能を使ってグラフニューラルネットワークをトレーニングする。各ノードはトレーニングデータセットを表しており、ゼロショット形式で新しいテストデータセットの最高の機械学習パイプラインを予測するために使用します。私たちのゼロショットアプローチは、教師付き学習タスクとデータセットのための高品質なパイプラインを迅速に予測します。対照的に、ほとんどのAutoMLシステムは、数十から数百のパイプライン評価を必要とする。ゼロショットのAutoMLは、実行時間と予測時間を数分からミリ秒に短縮する。 AutoMLを桁違いにスピードアップすることで、この作業はリアルタイムのAutoMLを示す。 This work improves the quality of automated machine learning (AutoML) systems by using dataset and function descriptions while significantly decreasing computation time from minutes to milliseconds by using a zero-shot approach. Given a new dataset and a well-defined machine learning task, humans begin by reading a description of the dataset and documentation for the algorithms to be used. This work is the first to use these textual descriptions, which we call privileged information, for AutoML. We use a pre-trained Transformer model to process the privileged text and demonstrate that using this information improves AutoML performance. Thus, our approach leverages the progress of unsupervised representation learning in natural language processing to provide a significant boost to AutoML. We demonstrate that using only textual descriptions of the data and functions achieves reasonable classification performance, and adding textual descriptions to data meta-features improves classification across tabular datasets. 
To achieve zero-shot AutoML we train a graph neural network with these description embeddings and the data meta-features. Each node represents a training dataset, which we use to predict the best machine learning pipeline for a new test dataset in a zero-shot fashion. Our zero-shot approach rapidly predicts a high-quality pipeline for a supervised learning task and dataset. In contrast, most AutoML systems require tens or hundreds of pipeline evaluations. We show that zero-shot AutoML reduces running and prediction times from minutes to milliseconds, consistently across datasets. By speeding up AutoML by orders of magnitude this work demonstrates real-time AutoML.	翻訳日:2021-06-28 15:03:15 公開日:2021-06-25
# (参考訳) InteL-VAEs:中間潜水剤による変分オートエンコーダへの誘導バイアス付加 InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents ( http://arxiv.org/abs/2106.13746v1 ) ライセンス: CC BY 4.0	Ning Miao, Emile Mathieu, N. Siddharth, Yee Whye Teh, Tom Rainforth	(参考訳) 本稿では,潜在変数の中間集合を用いて,制御可能な帰納バイアスを持つvaes学習法を提案する。これにより、標準ガウス事前仮定の制限を克服することができる。特に、学習した表現に疎結合やクラスタリングのような望ましい特性を課し、学習したモデルに事前情報を組み込むことができる。 InteL-VAE(Intermediary Latent Space VAE)と呼ばれる我々のアプローチは、符号化プロセスの確率性を中間潜時変数で制御することに基づいており、それらを対象潜時表現に決定的にマッピングし、そこから再構成を行う。これにより、望まれる事前情報、帰納的バイアス、さらには潜在マッピングによるトポロジ情報も取り入れながら、従来のVAEフレームワークのすべての利点を維持できます。これにより、InteL-VAEはより優れた生成モデルと表現の両方を学ぶことができる。 We introduce a simple and effective method for learning VAEs with controllable inductive biases by using an intermediary set of latent variables. This allows us to overcome the limitations of the standard Gaussian prior assumption. In particular, it allows us to impose desired properties like sparsity or clustering on learned representations, and incorporate prior information into the learned model. Our approach, which we refer to as the Intermediary Latent Space VAE (InteL-VAE), is based around controlling the stochasticity of the encoding process with the intermediary latent variables, before deterministically mapping them forward to our target latent representation, from which reconstruction is performed. This allows us to maintain all the advantages of the traditional VAE framework, while incorporating desired prior information, inductive biases, and even topological information through the latent mapping. We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.	翻訳日:2021-06-28 14:50:02 公開日:2021-06-25
# (参考訳) 新型コロナウイルス対策におけるロックダウン効果の評価 Assessing the Lockdown Effects on Air Quality during COVID-19 Era ( http://arxiv.org/abs/2106.13750v1 ) ライセンス: CC BY 4.0	Ioannis Kavouras, Eftychios Protopapadakis, Maria Kaselimia, Emmanuel Sardis, Nikolaos Doulamis	(参考訳) 本研究は、新型コロナウイルスの感染拡大を抑制するため、各都市で適用された予防策による大気汚染の短期的変動について検討する。特に、一酸化炭素(CO)、オゾン(O3)、二酸化窒素(NO2)、二酸化硫黄(SO2)などの特定の汚染ガスに対する濃度効果を強調した。ロックダウンの影響の評価は4つのヨーロッパ都市(Athens, Gladsaxe, Lodz, Rome)に焦点を当てた。地球規模の衛星観測により,汚染物質に関するデータを得た。雇用予防対策のレベルは、オックスフォード市政府対応トラッカーを用いて採用されている。分析の第2部では、さまざまな機械学習ツールを使用して、各汚染物質の濃度を2日前に推定した。その結果, 対応する指標と汚染要因との間には, 弱ないし中程度の相関関係が存在し, 日常生活における汚染物質ガスの挙動を予測できるモデルを作成することが可能であった。 In this work we investigate the short-term variations in air quality emissions, attributed to the prevention measures, applied in different cities, to mitigate the COVID-19 spread. In particular, we emphasize on the concentration effects regarding specific pollutant gases, such as carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2) and sulphur dioxide (SO2). The assessment of the impact of lockdown on air quality focused on four European Cities (Athens, Gladsaxe, Lodz and Rome). Available data on pollutant factors were obtained using global satellite observations. The level of the employed prevention measures is employed using the Oxford COVID-19 Government Response Tracker. The second part of the analysis employed a variety of machine learning tools, utilized for estimating the concentration of each pollutant, two days ahead. The results showed that a weak to moderate correlation exists between the corresponding measures and the pollutant factors and that it is possible to create models which can predict the behaviour of the pollutant gases under daily human activities.	翻訳日:2021-06-28 14:27:46 公開日:2021-06-25
# (参考訳) ニューラルスタイル伝達のための対話型マルチレベルストローク制御 Interactive Multi-level Stroke Control for Neural Style Transfer ( http://arxiv.org/abs/2106.13787v1 ) ライセンス: CC BY 4.0	Max Reimann and Benito Buchheim and Amir Semmo and Jürgen Döllner and Matthias Trapp	(参考訳) 本稿では,スタイル要素の創造的調整を容易にし,高出力の忠実度を実現するニューラルスタイル転送をインタラクティブにマルチレベル制御するモバイルアプリstyletuneを提案する。現在のモバイルのニューラルスタイル転送アプリとは対照的に、styletuneでは、ブラシストロークやテクスチャパッチといったスタイル要素のサイズと向きを、グローバルおよびローカルレベルで調整することができる。そこで本研究では、ストロークサイズと強度を制御し、現在のアプローチよりも広い範囲の編集を可能にする、新しいストローク適応フィードフォワード型転送ネットワークを提案する。さらに,CNNの回転分散を利用したストローク向き調整のためのネットワーク非依存手法を提案する。さらに,高出力率を実現するために,20メガピクセル以上の出力解像度が得られるパッチベースのスタイル転送手法を提案する。当社のアプローチは,現在のモバイルニューラルスタイル転送アプリでは不可能な,多くの新しい結果を生成する上で有効です。 We present StyleTune, a mobile app for interactive multi-level control of neural style transfers that facilitates creative adjustments of style elements and enables high output fidelity. In contrast to current mobile neural style transfer apps, StyleTune supports users to adjust both the size and orientation of style elements, such as brushstrokes and texture patches, on a global as well as local level. To this end, we propose a novel stroke-adaptive feed-forward style transfer network, that enables control over stroke size and intensity and allows a larger range of edits than current approaches. For additional level-of-control, we propose a network agnostic method for stroke-orientation adjustment by utilizing the rotation-variance of CNNs. To achieve high output fidelity, we further add a patch-based style transfer method that enables users to obtain output resolutions of more than 20 Megapixel. Our approach empowers users to create many novel results that are not possible with current mobile neural style transfer apps.	翻訳日:2021-06-28 14:16:14 公開日:2021-06-25
# (参考訳) PVTv2:ピラミッドビジョントランスによるベースライン改善 PVTv2: Improved Baselines with Pyramid Vision Transformer ( http://arxiv.org/abs/2106.13797v1 ) ライセンス: CC BY 4.0	Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao	(参考訳) コンピュータビジョンのトランスフォーマーは、最近進歩している。本研究では,(1)畳み込みを伴う局所連続的な特徴,(2)ゼロパディングによる位置符号化,(3)平均プールを用いた線形複雑注意層を含む3つの改良設計を加えることにより,元のピラミッドビジョン変換器(PVTv1)を改善した。これらの簡単な修正により、PVTv2は分類、検出、セグメンテーションにおいてPVTv1を大幅に改善する。さらにPVTv2は、ImageNet-1K事前トレーニングの下で、Swin Transformerを含む最近の作業よりもはるかに優れたパフォーマンスを実現している。この研究により、最先端の視覚トランスフォーマー研究がよりアクセス可能になることを願っている。コードはhttps://github.com/whai362/PVTで入手できる。 Transformer in computer vision has recently shown encouraging progress. In this work, we improve the original Pyramid Vision Transformer (PVTv1) by adding three improvement designs, which include (1) locally continuous features with convolutions, (2) position encodings with zero paddings, and (3) linear complexity attention layers with average pooling. With these simple modifications, our PVTv2 significantly improves PVTv1 on classification, detection, and segmentation. Moreover, PVTv2 achieves much better performance than recent works, including Swin Transformer, under ImageNet-1K pre-training. We hope this work will make state-of-the-art vision Transformer research more accessible. Code is available at https://github.com/whai362/PVT .	翻訳日:2021-06-28 14:01:29 公開日:2021-06-25
# (参考訳) 領域ベースグラフニューラルネットワークを用いた効率的な文書画像分類 Efficient Document Image Classification Using Region-Based Graph Neural Network ( http://arxiv.org/abs/2106.13802v1 ) ライセンス: CC BY 4.0	Jaya Krishna Mandivarapu, Eric Bunch, Qian You, Glenn Fung	(参考訳) ドキュメントイメージの分類は、さまざまな業界にわたる多くのエンタープライズアプリケーションで商用化できるため、依然として一般的な研究分野である。大規模事前訓練されたコンピュータビジョンや言語モデル、グラフニューラルネットワークの最近の進歩は、画像分類を多くのツールに貸し出している。しかし、大きな事前訓練されたモデルを使用するには、通常かなりの計算資源が必要であるため、自動文書画像分類のコスト削減の利点を損なう可能性がある。本稿では,グラフ畳み込みニューラルネットワークを用いて,文書のテキスト情報,視覚情報,レイアウト情報を組み込んだ効率的な文書画像分類フレームワークを提案する。提案するアルゴリズムを,公開データセットと実生活保険書分類データセットの両方で,最先端のビジョンと言語モデルに対して厳格にベンチマークした。公開データと実世界のデータの両方で実証的な結果から,我々の手法はSOTAに近い性能を達成できるが,計算資源やモデルトレーニングや推論に要する時間をはるかに少なくすることがわかった。これにより、特にエンタープライズアプリケーションのスケーラブルなデプロイメントにおいて、コスト面でのメリットよりも優れたソリューションが実現される。その結果,本アルゴリズムはSOTAに非常に近い分類性能が得られることがわかった。また,提案手法とベースライン間の計算資源,モデルサイズ,トレーニング時間,推論時間を包括的に比較した。さらに、本手法および他のベースラインを用いて画像当たりのコストを並べる。 Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advancements in large pre-trained computer vision and language models and graph neural networks have lent document image classification many tools. However, using large pre-trained models usually requires substantial computing resources, which could defeat the cost-saving advantages of automatic document image classification. In this paper we propose an efficient document image classification framework that uses graph convolution neural networks and incorporates textual, visual and layout information of the document. We have rigorously benchmarked our proposed algorithm against several state-of-the-art vision and language models on both a publicly available dataset and a real-life insurance document classification dataset. Empirical results on both publicly available and real-world data show that our methods achieve near SOTA performance yet require much less computing resources and time for model training and inference. 
This results in solutions that offer better cost advantages, especially in scalable deployment for enterprise applications. The results showed that our algorithm can achieve classification performance quite close to SOTA. We also provide comprehensive comparisons of computing resources, model sizes, training and inference time between our proposed methods and baselines. In addition we delineate the cost per image using our method and other baselines.	翻訳日:2021-06-28 13:46:54 公開日:2021-06-25
# 室内回転シーンの自己監督による単眼深度推定 Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated Scenes ( http://arxiv.org/abs/2106.12958v2 ) ライセンス: Link先を確認	Benjamin Keltjens and Tom van Dijk and Guido de Croon	(参考訳) 自己教師付き深層学習法では,単眼深度推定の訓練にステレオ画像を用いた。これらの手法は、KITTIなどの屋外データセットに対して強い結果を示すが、室内環境における監視手法の性能とカメラ回転とは一致しない。屋内で回転するシーンは、低テクスチャ領域の存在度と回転下の画像の奥行き手がかりの複雑さの増加という2つの理由から、制約の少ないアプリケーションでは一般的である。自己教師あり学習をより一般化した環境に拡張するために、我々は2つの追加を提案する。まず,テクスチャレス領域における画像再構成誤差損失の曖昧さを補正する新しい不均一損失項を提案する。具体的には, 周囲のテクスチャ領域からの距離を推定し, 元の推定値の補正にL1損失を用いる。実験の結果,ゴダードらによるモノデプスと比較すると,低テクスチャシーンでは,テクスチャシーンに損なわれることなく,奥行き推定が大幅に改善された。第2に, アプリケーションの代表回転によるトレーニングは, ピッチとロールの両方において, 期待回転範囲全体の性能を著しく向上させるのに十分であることを示す。カメラ回転のないテストセットで評価すると,性能が低下しないため,深さ推定がうまく一般化されることを示す。これらの開発により、複雑な環境に対する単眼深度推定の自己教師付き学習をより広く活用することができる。 Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation. Although these methods show strong results on outdoor datasets such as KITTI, they do not match performance of supervised methods on indoor environments with camera rotation. Indoor, rotated scenes are common for less constrained applications and pose problems for two reasons: abundance of low texture regions and increased complexity of depth cues for images under rotation. In an effort to extend self-supervised learning to more generalised environments we propose two additions. First, we propose a novel Filled Disparity Loss term that corrects for ambiguity of image reconstruction error loss in textureless regions. Specifically, we interpolate disparity in untextured regions, using the estimated disparity from surrounding textured areas, and use L1 loss to correct the original estimation. Our experiments show that depth estimation is substantially improved on low-texture scenes, without any loss on textured scenes, when compared to Monodepth by Godard et al. 
Secondly, we show that training with an application's representative rotations, in both pitch and roll, is sufficient to significantly improve performance over the entire range of expected rotation. We demonstrate that depth estimation is successfully generalised as performance is not lost when evaluated on test sets with no camera rotation. Together these developments enable a broader use of self-supervised learning of monocular depth estimation for complex environments.	翻訳日:2021-06-28 13:21:53 公開日:2021-06-25
# パストレースのためのリアルタイムニューラルネットワークラミアンスキャッシング Real-time Neural Radiance Caching for Path Tracing ( http://arxiv.org/abs/2106.12372v2 ) ライセンス: Link先を確認	Thomas Müller, Fabrice Rousselle, Jan Novák, Alexander Keller	(参考訳) 本稿では,パストレースによるグローバル照明のためのリアルタイムニューラルネットワークラミアンスキャッシング手法を提案する。我々のシステムは、完全にダイナミックなシーンを扱うように設計されており、照明、幾何学、材料に関する仮定は一切ない。私たちのアプローチのデータ駆動性は、キャッシュポイントの配置、補間、更新など、キャッシュアルゴリズムの多くの難しさを回避します。ニューラルネットワークをトレーニングして新しいものを扱うため、動的シーンは恐ろしい一般化の課題であるので、事前トレーニングを廃止し、適応によって一般化する。レンダリング中にレイディアンスキャッシュを訓練することにしました。低ノイズのトレーニングターゲットを提供し、数バウンストレーニング更新を単に繰り返して無限バウンス輸送をシミュレートするために、自己学習を採用している。最新のハードウェアをフル活用したニューラルネットワークのストリーミング実装のおかげで、更新とキャッシュクエリは -- フルhd解像度で約2.6ミリ秒の軽いオーバーヘッドを伴います。バイアスを小さく抑えることで大きなノイズ低減効果を示すとともに,多くの課題に対して最先端のリアルタイム性能を報告した。 We present a real-time neural radiance caching method for path-traced global illumination. Our system is designed to handle fully dynamic scenes, and makes no assumptions about the lighting, geometry, and materials. The data-driven nature of our approach sidesteps many difficulties of caching algorithms, such as locating, interpolating, and updating cache points. Since pretraining neural networks to handle novel, dynamic scenes is a formidable generalization challenge, we do away with pretraining and instead achieve generalization via adaptation, i.e. we opt for training the radiance cache while rendering. We employ self-training to provide low-noise training targets and simulate infinite-bounce transport by merely iterating few-bounce training updates. The updates and cache queries incur a mild overhead -- about 2.6ms on full HD resolution -- thanks to a streaming implementation of the neural network that fully exploits modern hardware. We demonstrate significant noise reduction at the cost of little induced bias, and report state-of-the-art, real-time performance on a number of challenging scenarios.	翻訳日:2021-06-28 13:21:29 公開日:2021-06-25
# 非iidグラフ上のフェデレートグラフ分類 Federated Graph Classification over Non-IID Graphs ( http://arxiv.org/abs/2106.13423v1 ) ライセンス: Link先を確認	Han Xie, Jing Ma, Li Xiong, Carl Yang	(参考訳) フェデレートラーニングは、異なるドメインで機械学習モデルをトレーニングするための重要なパラダイムとして登場した。グラフ分類のようなグラフレベルのタスクでは、グラフは特別な種類のデータサンプルと見なすこともできる。他のドメインと同様に、グラフの小さなセットを持つ複数のローカルシステムは、人気のあるグラフニューラルネットワーク(GNN)のような強力なグラフマイニングモデルを協調的にトレーニングする利点がある。このような取り組みへのモチベーションを高めるために、異なるドメインの現実世界のグラフを分析し、ランダムグラフと比較して統計的に有意なグラフ特性を実際に共有していることを確認する。しかし、同じ領域や同じデータセットからでも異なるグラフ集合が、グラフ構造とノードの特徴の両方に関してIIDではないことが分かる。そこで本研究では,gnnの勾配に基づく局所システムのクラスタを動的に探索し,そのクラスタが局所システムが所有するグラフの構造や特徴の多様性を低減できることを理論的に正当化する,グラフクラスタリングフェデレーション学習(gcfl)フレームワークを提案する。さらに,gnnの勾配がgcflでかなり変動するのを観察し,動的時間ウォーピング(gcfl+)に基づく勾配シーケンスに基づくクラスタリング機構を設計する。広範な実験結果と詳細な分析により,提案手法の有効性が実証された。 Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation towards such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustering federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. 
Moreover, we observe the gradients of GNNs to be rather fluctuating in GCFL which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.	翻訳日:2021-06-28 13:21:13 公開日:2021-06-25
# 不一致によるSGDの一般化評価 Assessing Generalization of SGD via Disagreement ( http://arxiv.org/abs/2106.13799v1 ) ライセンス: Link先を確認	Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter	(参考訳) 同一のトレーニングセット上で同じアーキテクチャをSGD(Stochastic Gradient Descent)の異なる実行で訓練し、ラベルのないテストデータ上で2つのネットワーク間の不一致率を測定するだけで、深層ネットワークのテスト誤差を推定できることを実験的に示す。これは、2回目の実行に全く新しいトレーニングセットを必要とするNakkiran & Bansal '20における観察の、より強力なバージョンの上に構築されている。さらに、この特異な現象は、SGD訓練モデルのアンサンブルが良好に較正されている(well-calibrated)という性質から生じることを理論的に示す。この発見は、ラベルのないテストデータを使ってテストエラーを直接予測する単純な経験的尺度を提供するだけでなく、一般化とキャリブレーションの間に新たな概念的接続を確立する。 We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on -- and is a stronger version of -- the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the well-calibrated nature of ensembles of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.	翻訳日:2021-06-28 13:20:50 公開日:2021-06-25
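The disagreement measure described in the abstract above is simple enough to sketch directly. The following is a minimal illustration, assuming the predictions of two independently trained networks are already available as lists of class labels. The function name `disagreement_rate` and the toy labels are hypothetical and do not come from the paper.

```python
def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabeled test points on which two models predict different classes."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must be non-empty")
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


if __name__ == "__main__":
    # Toy predicted labels from two training runs of the same architecture.
    run_1 = [0, 1, 2, 1, 0, 2]
    run_2 = [0, 2, 2, 1, 1, 2]
    print(disagreement_rate(run_1, run_2))
```

No ground-truth labels appear anywhere in the computation, which is the point of the approach: according to the abstract, this quantity tracks the test error when the two runs differ only in their SGD randomness.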
# CausalCity:Causal DiscoveryとReasoningのための複雑なシミュレーション CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning ( http://arxiv.org/abs/2106.13364v1 ) ライセンス: Link先を確認	Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn and Ashish Kapoor	(参考訳) 因果推論と反事実推論を行う能力は、人間の知性の中心的な性質である。このような推論を実行できる意思決定システムは、より一般化可能で解釈可能な可能性がある。シミュレーションは、パラメータ(例えば、共同設立者)を体系的に変化させ、反現実的なシナリオの場合の結果の例を生成する能力を提供することによって、この領域における最先端の進歩に役立っている。しかし、運転や車両ナビゲーションなど、多エージェントシナリオにおける複雑な時間的因果事象をシミュレートすることは困難である。そこで本研究では,安全クリティカルな文脈における因果発見と反事実推論のためのアルゴリズム開発を目的とした,忠実度の高いシミュレーション環境を提案する。私たちの作業の中核となるコンポーネントは agency を導入することで、ハイレベルな定義を使って複雑なシナリオを簡単に定義し作成できます。車両はこれらの目的を達成するために機関と共に運用され、低レベルの行動は必要に応じてのみ制御される。我々は,3つの最先端手法を用いて実験を行い,ベースラインを作成し,この環境の余裕を強調する。最後に、将来の仕事の課題と機会を強調します。 The ability to perform causal and counterfactual reasoning are central properties of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state-of-the-art in this domain, by providing the ability to systematically vary parameters (e.g., confounders) and generate examples of the outcomes in the case of counterfactual scenarios. However, simulating complex temporal causal events in multi-agent scenarios, such as those that exist in driving and vehicle navigation, is challenging. To help address this, we present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning in the safety-critical context. A core component of our work is to introduce agency, such that it is simple to define and create complex scenarios using high-level definitions. The vehicles then operate with agency to complete these objectives, meaning low-level behaviors need only be controlled if necessary. 
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment. Finally, we highlight challenges and opportunities for future work.	翻訳日:2021-06-28 13:20:08 公開日:2021-06-25
# データ拡張のための単一画像テクスチャ変換 Single Image Texture Translation for Data Augmentation ( http://arxiv.org/abs/2106.13804v1 ) ライセンス: Link先を確認	Boyi Li and Yin Cui and Tsung-Yi Lin and Serge Belongie	(参考訳) 画像合成の最近の進歩により、ソースドメインとターゲットドメインのマッピングを学習することで、画像の翻訳が可能になる。既存の手法では、様々なデータセット上でモデルをトレーニングすることで分布を学習する傾向があり、その結果は主観的に評価される。しかし,画像認識タスクにおける意味的画像翻訳手法の可能性について検討する研究は比較的少ない。本稿では,データ拡張におけるSITT(Single Image Texture Translation)の利用について検討する。まず,ソーステクスチャの単一の入力に基づいてテクスチャを画像に変換する軽量モデルを提案し,高速なトレーニングとテストを可能にした。 SITTに基づいて、長い尾と少数ショットの画像分類タスクにおける拡張データの利用について検討する。提案手法は,入力データを対象領域に翻訳し,一貫した画像認識性能の向上を実現する。最後に、SITTと関連する画像翻訳手法が、モデルトレーニングにおけるデータ効率向上工学アプローチの基盤となるかを検討する。 Recent advances in image synthesis enables one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on a variety of datasets, with results evaluated largely in a subjective manner. Relatively few works in this area, however, study the potential use of semantic image translation methods for image recognition tasks. In this paper, we explore the use of Single Image Texture Translation (SITT) for data augmentation. We first propose a lightweight model for translating texture to images based on a single input of source texture, allowing for fast training and testing. Based on SITT, we then explore the use of augmented data in long-tailed and few-shot image classification tasks. We find the proposed method is capable of translating input data into a target domain, leading to consistent improved image recognition performance. Finally, we examine how SITT and related image translation methods can provide a basis for a data-efficient, augmentation engineering approach to model training.	翻訳日:2021-06-28 13:19:49 公開日:2021-06-25
# ParaLaw Nets -- 法的テキスト処理のための言語間文レベルの事前学習 ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing ( http://arxiv.org/abs/2106.13403v1 ) ライセンス: Link先を確認	Ha-Thanh Nguyen, Vu Tran, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Minh Le Nguyen, Ken Satoh	(参考訳) 曖昧さは自然言語の特徴であり、表現のアイデアを柔軟にする。しかし、正確なステートメントを必要とするドメインでは、それは障壁になります。具体的には、1つの単語が複数の意味を持ち、複数の単語が同じ意味を持つ。テキストを外国語に翻訳する場合、翻訳者は原文の各要素の正確な意味を判断し、正しい翻訳文を生成する必要がある。そこで本研究では,文レベルの言語間情報を用いた事前学習されたモデルファミリーであるParaLaw Netsを提案する。このアプローチは coliee-2021 の質問応答タスクで最高の結果を得た。 Ambiguity is a characteristic of natural language, which makes expression ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original sentence to produce the correct translation sentence. From that observation, in this paper, we propose ParaLaw Nets, a pretrained model family using sentence-level cross-lingual information to reduce ambiguity and increase the performance in legal text processing. This approach achieved the best result in the Question Answering task of COLIEE-2021.	翻訳日:2021-06-28 13:19:35 公開日:2021-06-25
# コントラスト表現学習のための分解的相互情報推定 Decomposed Mutual Information Estimation for Contrastive Representation Learning ( http://arxiv.org/abs/2106.13401v1 ) ライセンス: Link先を確認	Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet	(参考訳) 最近のコントラスト表現学習法は、基礎となるコンテキストの複数のビュー間の相互情報(mi)の推定に依存する。例えば、データ拡張を適用することで、与えられた画像の複数のビューを導出したり、シーケンスをシーケンス内の何らかのステップの過去と未来からなるビューに分割することができる。 MI 上の対照的な下界は最適化が容易であるが、MI の大量推定では強い過小評価バイアスを持つ。そこで本稿では,MI推定問題をより小さな推定問題に分解する手法として,ビューの1つを段階的により情報的なサブビューに分割し,分割されたビュー間でMIに連鎖ルールを適用する手法を提案する。この式は無条件および条件MIの項の和を含み、それぞれが全MIのモデストチャンクを測定し、対照的な境界による近似を容易にする。和を最大化するために、効率的に近似できる条件 MI 上の対照的な下界を定式化する。我々は、一般的なアプローチをDEMI(Decomposed Estimation of Mutual Information)と呼ぶ。 DEMIは、合成環境では非分解コントラスト境界よりも多くのMIを捕捉でき、視覚領域や対話生成においてより良い表現を学習できることを示す。 Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). 
We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.	翻訳日:2021-06-28 13:18:56 公開日:2021-06-25
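The chain rule this abstract refers to can be written out for the simplest case of a single split. The notation here ($x$ for one view, $y$ for the other, $x'$ for a subview of $x$) is an assumption for illustration and may differ from the paper's. The identity itself is the standard chain rule of mutual information, which applies because $x'$ is a function of $x$.

```latex
% x' is a subview of x, so the pair (x', x) carries the same information as x.
I(x; y) \;=\; I(x'; y) \;+\; I(x; y \mid x')
```

Each term on the right is at most the total on the left, so each is a smaller quantity to estimate, which is the motivation the abstract gives for bounding the terms separately.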
# マルチドメインアクティブラーニング:比較研究 Multi-Domain Active Learning: A Comparative Study ( http://arxiv.org/abs/2106.13516v1 ) ライセンス: Link先を確認	Rui He, Shan He, Ke Tang	(参考訳) 複数のドメインに分類器を構築することは実生活において現実的な問題である。分類器を1つずつ構築する代わりに、マルチドメイン学習(MDL)は同時に複数のドメインに分類器を構築する。 MDLはドメイン間で共有される情報を利用してパフォーマンスを向上させる。教師付き学習問題として,MDL問題ではラベル付け作業が依然として高い。通常、この高いラベル付けコスト問題は、アクティブラーニングを使用することで軽減できる。したがって、MDLにおけるラベル付けの労力を減らすためにアクティブラーニングを活用することは自然であり、この設定をマルチドメインアクティブラーニング(MDAL)と呼ぶ。しかし、この設定で作られた作品はほとんどない。そして、研究がこの問題に直面するとき、既成の解決策は存在しない。この状況下では、現在のマルチドメイン学習モデルと単一ドメインのアクティブ学習戦略を組み合わせることが、MDAL問題の予備的な解決策となるかもしれない。この予備解の可能性を明らかにするために,5モデルと4つの選択戦略の比較研究を行った。私たちの知る限りでは、これがMDALの正式な定義を提供する最初の作品です。さらに、MDAL問題に対する最初の比較研究である。その結果,単純最良対第二最良(bvsb)不確実性戦略を用いた多項逆ネットワーク(man)モデルは,ほとんどの場合においてその優位を示す。我々はこの組み合わせをMDAL問題に対する既定の勧告だと考えている。 Building classifiers on multiple domains is a practical problem in the real life. Instead of building classifiers one by one, multi-domain learning (MDL) simultaneously builds classifiers on multiple domains. MDL utilizes the information shared among the domains to improve the performance. As a supervised learning problem, the labeling effort is still high in MDL problems. Usually, this high labeling cost issue could be relieved by using active learning. Thus, it is natural to utilize active learning to reduce the labeling effort in MDL, and we refer this setting as multi-domain active learning (MDAL). However, there are only few works which are built on this setting. And when the researches have to face this problem, there is no off-the-shelf solutions. Under this circumstance, combining the current multi-domain learning models and single-domain active learning strategies might be a preliminary solution for MDAL problem. To find out the potential of this preliminary solution, a comparative study over 5 models and 4 selection strategies is made in this paper. To the best of our knowledge, this is the first work provides the formal definition of MDAL. 
Besides, this is the first comparative work for MDAL problem. From the results, the Multinomial Adversarial Networks (MAN) model with a simple best vs second best (BvSB) uncertainty strategy shows its superiority in most cases. We take this combination as our off-the-shelf recommendation for the MDAL problem.	翻訳日:2021-06-28 13:18:37 公開日:2021-06-25
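A minimal sketch of the best-vs-second-best (BvSB) selection strategy named in this abstract is given below. It assumes a classifier that outputs class probabilities for each unlabeled item. The function names and the toy `pool_probs` values are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def bvsb_margin(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest class probability."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    best, second = sorted(probs, reverse=True)[:2]
    return best - second


def select_most_uncertain(pool: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k pool items with the smallest BvSB margin."""
    ranked = sorted(range(len(pool)), key=lambda i: bvsb_margin(pool[i]))
    return ranked[:k]


# Hypothetical class probabilities for four unlabeled items.
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # ambiguous between the top two classes
    [0.50, 0.49, 0.01],  # smallest margin, queried first
    [0.70, 0.20, 0.10],
]
to_label = select_most_uncertain(pool_probs, k=2)
```

A small margin means the model can barely separate its top two candidate classes, so those items are sent for labeling first.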
# 集団意思決定におけるエキスパートバイアスの対応 Dealing with Expert Bias in Collective Decision-Making ( http://arxiv.org/abs/2106.13539v1 ) ライセンス: Link先を確認	Axel Abels, Tom Lenaerts, Vito Trianni, Ann Nowé	(参考訳) いくつかの現実の問題は意思決定の問題として定式化され、ある選択肢から適切な選択を繰り返す必要がある。人間であれ人工であれ、専門家の判断は、特に代替ソリューションの探索にコストがかかる場合に、正しい判断を下すのに役立つ。専門家の意見が逸脱する可能性があるため、正しい選択肢を見つけるという問題は、集団意思決定問題(CDM)としてアプローチできる。現在の最先端のCDM解決アプローチは、グループ内の最高の専門家の質によって制限されており、専門家が資格がない場合や過度に偏りがある場合、判断プロセスの妨げになる可能性がある。本稿では,このような偏った専門知識を特定し,それに対抗するために,文脈的マルチアームバンディット問題(CMAB)に基づく新たなアルゴリズムアプローチを提案する。我々は,同種,異種,偏極した専門家グループを検討し,提供された助言が良好な性能に直結するかどうかにかかわらず,提案手法が集団の専門知識を効果的に活用でき,特に提供された専門知識の品質が低下した場合に最先端の手法を上回ることを示す。 CMABに触発された新しいアプローチは,従来の適応型アルゴリズムよりも高速に収束しながら,高い最終性能を実現している。 Quite some real-world problems can be formulated as decision-making problems wherein one must repeatedly make an appropriate choice from a set of alternatives. Expert judgements, whether human or artificial, can help in taking correct decisions, especially when exploration of alternative solutions is costly. As expert opinions might deviate, the problem of finding the right alternative can be approached as a collective decision making problem (CDM). Current state-of-the-art approaches to solve CDM are limited by the quality of the best expert in the group, and perform poorly if experts are not qualified or if they are overly biased, thus potentially derailing the decision-making process. In this paper, we propose a new algorithmic approach based on contextual multi-armed bandit problems (CMAB) to identify and counteract such biased expertises. We explore homogeneous, heterogeneous and polarised expert groups and show that this approach is able to effectively exploit the collective expertise, irrespective of whether the provided advice is directly conducive to good performance, outperforming state-of-the-art methods, especially when the quality of the provided expertise degrades. 
Our novel CMAB-inspired approach achieves a higher final performance and does so while converging more rapidly than previous adaptive algorithms, especially when heterogeneous expertise is readily available.	翻訳日:2021-06-28 13:18:20 公開日:2021-06-25
# フレキシブルニューラルネットワークのトレーニングのためのテンソルベースフレームワーク Tensor-based framework for training flexible neural networks ( http://arxiv.org/abs/2106.13542v1 ) ライセンス: Link先を確認	Yassine Zniyed, Konstantin Usevich, Sebastian Miron, David Brie	(参考訳) 活性化関数(AF)はニューラルネットワーク(NN)の設計において重要な部分であり、その選択はNNのパフォーマンスにおいて重要な役割を果たす。本研究では,afsを既定基底関数の重み付き和として表現したテンソルに基づく解を用いたフレキシブルアクティベーション関数の推定に特に注目する。そこで本研究では,制約付き結合行列-テンソル因子分解(cmtf)問題を解く新しい学習アルゴリズムを提案する。この手法は、制約付き正準多進分解(CPD)に続いて、一階情報がヤコビテンソルに含まれるNNの第1次及び第0次情報を融合する。提案アルゴリズムは、異なる分解基盤を扱える。この方法の目標は、元のネットワークの1層または複数の層を新しいフレキシブルな層に置き換えることで、大きな事前学習されたnnモデルを圧縮することである。このアプローチは、文字分類に使用される事前訓練された畳み込みニューラルネットワーク(CNN)に適用される。 Activation functions (AFs) are an important part of the design of neural networks (NNs), and their choice plays a predominant role in the performance of a NN. In this work, we are particularly interested in the estimation of flexible activation functions using tensor-based solutions, where the AFs are expressed as a weighted sum of predefined basis functions. To do so, we propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem. This technique fuses the first and zeroth order information of the NN, where the first-order information is contained in a Jacobian tensor, following a constrained canonical polyadic decomposition (CPD). The proposed algorithm can handle different decomposition bases. The goal of this method is to compress large pretrained NN models, by replacing subnetworks, {\em i.e.,} one or multiple layers of the original network, by a new flexible layer. The approach is applied to a pretrained convolutional neural network (CNN) used for character classification.	翻訳日:2021-06-28 13:17:58 公開日:2021-06-25
# VEGN: グラフニューラルネットワークによる変数効果予測 VEGN: Variant Effect Prediction with Graph Neural Networks ( http://arxiv.org/abs/2106.13642v1 ) ライセンス: Link先を確認	Jun Cheng, Carolin Lawrence, Mathias Niepert	(参考訳) 遺伝的変異は、正常な遺伝子機能を破壊して病気を引き起こすことがある。個々の患者内の数百万の遺伝子変異から病気を引き起こす突然変異を特定することは難しい問題である。病原性突然変異を優先順位付けできる計算方法には、膨大な応用がある。遺伝子は複雑な制御ネットワークを介して機能することが知られている。しかし、既存の変種効果予測モデルは、単独で変種を考えるのみである。対照的に、遺伝子と変異を持つ異種グラフ上で動作するグラフニューラルネットワーク(GNN)を用いて、変動効果予測をモデル化するVEGNを提案する。このグラフは遺伝子に変異を割り当て、遺伝子と遺伝子相互作用ネットワークを接続することによって作られる。この文脈では、遺伝子遺伝子グラフが与えられるアプローチと、VEGNが遺伝子遺伝子グラフを学習し、与えられたエッジと学習したエッジの両方で操作するアプローチを探索する。グラフニューラルネットワークは、遺伝子間、および遺伝子と変異種間の情報を集約するために訓練される。変数は、接続する遺伝子を介して情報を交換することができる。このアプローチは既存の最先端モデルの性能を改善する。 Genetic mutations can cause disease by disrupting normal gene function. Identifying the disease-causing mutations from millions of genetic variants within an individual patient is a challenging problem. Computational methods which can prioritize disease-causing mutations have, therefore, enormous applications. It is well-known that genes function through a complex regulatory network. However, existing variant effect prediction models only consider a variant in isolation. In contrast, we propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants. The graph is created by assigning variants to genes and connecting genes with an gene-gene interaction network. In this context, we explore an approach where a gene-gene graph is given and another where VEGN learns the gene-gene graph and therefore operates both on given and learnt edges. The graph neural network is trained to aggregate information between genes, and between genes and variants. Variants can exchange information via the genes they connect to. This approach improves the performance of existing state-of-the-art models.	翻訳日:2021-06-28 13:17:25 公開日:2021-06-25
# ニューラルネットワークを用いた遺伝性癌の予測 Prediction of Hereditary Cancers Using Neural Networks ( http://arxiv.org/abs/2106.13682v1 ) ライセンス: Link先を確認	Zoe Guan, Giovanni Parmigiani, Danielle Braun, and Lorenzo Trippa	(参考訳) 家族歴は多くの種類のがんの主要な危険因子である。メンデルリスク予測モデルは、癌感受性遺伝子の知識に基づいて、家族の歴史をがんリスク予測に変換する。これらのモデルは、リスクの高い個人を特定するために臨床実践で広く利用されている。メンデルモデルは家族の歴史全体を生かしているが、変異の頻度が低いため、非現実的または検証が難しいがん感受性遺伝子に関する多くの仮定に依存している。ニューラルネットワークなどのフレキシブルなモデルを1桁の大規模なデータベースでトレーニングすることは、精度の向上につながる可能性がある。本稿では,家族史データにニューラルネットワークを適用する枠組みを開発し,癌に対する遺伝感受性を学習する能力について検討する。多くのタスクでは、ニューラルネットワークとその最先端のパフォーマンスに関する広範な文献があるが、家族の歴史データに適用する作業はほとんどない。本稿では,完全連結ニューラルネットワークと畳み込みニューラルネットワークの系統への適応を提案する。メンデル継承下でシミュレーションされたデータでは,提案するニューラルネットワークモデルがほぼ最適予測性能を達成できることを実証する。さらに、観測された家族歴が誤報告されたがん診断を含んでいる場合、ニューラルネットワークは正しい遺伝法則を組み込んだメンデル型BRCAPROモデルよりも優れている。リスクサービスのコホートである20万以上の家族履歴の大規模なデータセットを使用して、将来の乳癌リスク予測モデルをトレーニングします。がん遺伝学ネットワークのデータを用いてモデルを検証する。 Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but they rely on many assumptions about cancer susceptibility genes that are either unrealistic or challenging to validate due to low mutation prevalence. Training more flexible models, such as neural networks, on large databases of pedigrees can potentially lead to accuracy gains. In this paper, we develop a framework to apply neural networks to family history data and investigate their ability to learn inherited susceptibility to cancer. While there is an extensive literature on neural networks and their state-of-the-art performance in many tasks, there is little work applying them to family history data. We propose adaptations of fully-connected neural networks and convolutional neural networks to pedigrees. 
In data simulated under Mendelian inheritance, we demonstrate that our proposed neural network models are able to achieve nearly optimal prediction performance. Moreover, when the observed family history includes misreported cancer diagnoses, neural networks are able to outperform the Mendelian BRCAPRO model embedding the correct inheritance laws. Using a large dataset of over 200,000 family histories, the Risk Service cohort, we train prediction models for future risk of breast cancer. We validate the models using data from the Cancer Genetics Network.	翻訳日:2021-06-28 13:17:10 公開日:2021-06-25
# 共役エネルギーモデル Conjugate Energy-Based Models ( http://arxiv.org/abs/2106.13798v1 ) ライセンス: Link先を確認	Hao Wu, Babak Esmaeili, Michael Wick, Jean-Baptiste Tristan, Jan-Willem van de Meent	(参考訳) 本稿では,データと潜在変数の結合密度を定義する新しいエネルギー系モデルである共役エネルギーベースモデル(cebms)を提案する。 CEBMの結合密度は、データ上の難解な分布と遅延変数上の引き込み可能な後部分布に分解される。 CEBMは、データから潜在変数への教師なしマッピングを学ぶという意味で、変分オートエンコーダのようなユースケースを持つ。しかし、これらのモデルはジェネレータネットワークを省略し、データポイント間の類似性のより柔軟な概念を学ぶことができる。実験により,共役型EMMは画像モデリング,潜在空間の予測能力,および様々なデータセットの領域外検出において競合する結果が得られることを示した。 In this paper, we propose conjugate energy-based models (CEBMs), a new class of energy-based models that define a joint density over data and latent variables. The joint density of a CEBM decomposes into an intractable distribution over data and a tractable posterior over latent variables. CEBMs have similar use cases as variational autoencoders, in the sense that they learn an unsupervised mapping from data to latent variables. However, these models omit a generator network, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that conjugate EBMs achieve competitive results in terms of image modelling, predictive power of latent space, and out-of-domain detection on a variety of datasets.	翻訳日:2021-06-28 13:16:49 公開日:2021-06-25
# 正則化のための階層的接続球面多様体 Connecting Sphere Manifolds Hierarchically for Regularization ( http://arxiv.org/abs/2106.13549v1 ) ライセンス: Link先を確認	Damien Scieur, Youngsung Kim	(参考訳) 本稿では階層的なクラスによる分類問題を考察する。各クラスの分類器(超平面)を、その中心がスーパークラスの分類器である球面多様体に属するように強制する。そして、個々の球面多様体はその階層関係に基づいて連結される。本手法は,球状完全連結層と階層層を組み合わせることで,ニューラルネットワークの最終層を置き換えるものである。この正規化は、公開データセット(CIFAR100、CUB200、スタンフォード犬、スタンフォード車、Tiny-ImageNet)上で広く使用されているディープニューラルネットワークアーキテクチャ(ResNetとDenseNet)のパフォーマンスを改善することが示されている。 This paper considers classification problems with hierarchically organized classes. We force the classifier (hyperplane) of each class to belong to a sphere manifold, whose center is the classifier of its super-class. Then, individual sphere manifolds are connected based on their hierarchical relations. Our technique replaces the last layer of a neural network by combining a spherical fully-connected layer with a hierarchical layer. This regularization is shown to improve the performance of widely used deep neural network architectures (ResNet and DenseNet) on publicly available datasets (CIFAR100, CUB200, Stanford dogs, Stanford cars, and Tiny-ImageNet).	翻訳日:2021-06-28 13:15:47 公開日:2021-06-25
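The geometric constraint described in this abstract, a class's classifier lying on a sphere centered at its super-class's classifier, can be sketched as follows. This is only an illustration of the constraint under stated assumptions: the function name `place_on_sphere`, the fixed `radius`, and the toy vectors are hypothetical, and the paper's actual layer and its training procedure are not reproduced here.

```python
import math
from typing import List, Sequence


def place_on_sphere(
    parent: Sequence[float], direction: Sequence[float], radius: float
) -> List[float]:
    """Return parent + radius * direction / ||direction||."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    # Normalizing the offset keeps the result exactly `radius` from `parent`.
    return [p + radius * d / norm for p, d in zip(parent, direction)]


# Hypothetical super-class classifier and a free direction for one sub-class.
super_class_w = [1.0, 0.0, 0.0]
child_w = place_on_sphere(super_class_w, direction=[0.0, 3.0, 4.0], radius=0.5)
distance = math.dist(child_w, super_class_w)  # equals the radius up to rounding
```

Because only the direction is free, a sub-class classifier cannot drift arbitrarily far from its super-class, which is one way to read the regularization effect the abstract reports.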
# 視覚変換器アーキテクチャ探索 Vision Transformer Architecture Search ( http://arxiv.org/abs/2106.13700v1 ) ライセンス: Link先を確認	Xiu Su, Shan You, Jiyang Xie, Mingkai Zheng, Fei Wang, Chen Qian, Changshui Zhang, Xiaogang Wang, Chang Xu	(参考訳) 近年,画像を手動分割パッチのシーケンスとして自己注意機構でモデル化することで,コンピュータビジョンタスクの解法において,トランスフォーマーは優れた優位性を示している。しかし、現在の視覚トランスフォーマー(ViT)のアーキテクチャは自然言語処理(NLP)タスクから単に継承されたものであり、十分に研究・最適化されていない。本稿では,視覚タスクにおけるトランスフォーマーの固有構造を検証し,同様のハードウェア予算で最適なアーキテクチャを探索するためのアーキテクチャ探索手法ViTASを提案する。具体的には, トークン埋め込み, シーケンスサイズ, ヘッド数, 幅, 深さの異なるアーキテクチャを単一のスーパートランスフォーマーから導出できるような, 有効かつ効率的なViTのための新しい重み共有パラダイムを設計する。さらに、異なるアーキテクチャのばらつきに対応するため、スーパートランスフォーマーに \textit{private} クラストークンとセルフアテンションマップを導入する。また,異なる予算の探索に適応するために,同一性操作のサンプリング確率を探索することを提案する。実験の結果,既存の純粋なトランスフォーマーアーキテクチャに比べ,ViTASは優れた結果を得た。例えば、$1.3$G FLOPsの予算で、探索されたアーキテクチャはImageNetで$74.7\%$のtop-$1$精度を達成し、現在のベースラインViTアーキテクチャよりも$2.5\%$優れている。コードは \url{https://github.com/xiusu/ViTAS} で入手できる。 Recently, transformers have shown great superiority in solving computer vision tasks by modeling images as a sequence of manually-split patches with self-attention mechanism. However, current architectures of vision transformers (ViTs) are simply inherited from natural language processing (NLP) tasks and have not been sufficiently investigated and optimized. In this paper, we make a further step by examining the intrinsic structure of transformers for vision tasks and propose an architecture search method, dubbed ViTAS, to search for the optimal architecture with similar hardware budgets. Concretely, we design a new effective yet efficient weight sharing paradigm for ViTs, such that architectures with different token embedding, sequence size, number of heads, width, and depth can be derived from a single super-transformer. Moreover, to cater for the variance of distinct architectures, we introduce \textit{private} class token and self-attention maps in the super-transformer. In addition, to adapt the searching for different budgets, we propose to search the sampling probability of identity operation. 
Experimental results show that our ViTAS attains excellent results compared to existing pure transformer architectures. For example, with $1.3$G FLOPs budget, our searched architecture achieves $74.7\%$ top-$1$ accuracy on ImageNet and is $2.5\%$ superior than the current baseline ViT architecture. Code is available at \url{https://github.com/xiusu/ViTAS}.	翻訳日:2021-06-28 13:15:35 公開日:2021-06-25
# 安定性のためのVAEの再パラメータ化 Re-parameterizing VAEs for stability ( http://arxiv.org/abs/2106.13739v1 ) ライセンス: Link先を確認	David Dehaene and Rémy Brossard	(参考訳) 本稿では,変分オートエンコーダ(VAE)の訓練時の数値安定性に対する理論的アプローチを提案する。我々の研究は、VAEが複雑な画像データセットで最先端の生成結果に到達できるようにする最近の研究によって動機づけられている。これらの非常に深いVAEアーキテクチャと、より複雑な出力分布を用いたVAEは、突発的に高いトレーニング勾配やNaN損失を生み出す傾向を浮き彫りにしている。こうした制限にもかかわらず訓練するために提案された経験的な修正は、理論的に十分根拠づけられておらず、実際にも一般に十分ではない。そこで本研究では,モデルのニューラルネットワークとその出力確率分布とのインタフェースに問題の原因を局所化する。符号化された正規分布の分散の不用意な定式化から生じる不安定性の共通源を説明し、他の明らかでないソースにも同様のアプローチを適用する。VAEが依存する正規分布をパラメータ化する方法に小さな変更を加えることで、VAEを安全にトレーニングできることを示す。 We propose a theoretical approach towards the training numerical stability of Variational AutoEncoders (VAE). Our work is motivated by recent studies empowering VAEs to reach state of the art generative results on complex image datasets. These very deep VAE architectures, as well as VAEs using more complex output distributions, highlight a tendency to haphazardly produce high training gradients as well as NaN losses. The empirical fixes proposed to train them despite their limitations are neither fully theoretically grounded nor generally sufficient in practice. Building on this, we localize the source of the problem at the interface between the model's neural networks and their output probabilistic distributions. We explain a common source of instability stemming from an incautious formulation of the encoded Normal distribution's variance, and apply the same approach on other, less obvious sources. We show that by implementing small changes to the way we parameterize the Normal distributions on which they rely, VAEs can securely be trained.	翻訳日:2021-06-28 13:15:08 公開日:2021-06-25
# ヒューマンマシン協調によるつぶやきのきめ細かい位置情報予測 Fine-grained Geolocation Prediction of Tweets with Human Machine Collaboration ( http://arxiv.org/abs/2106.13411v1 ) ライセンス: Link先を確認	Florina Dutt and Subhajit Das	(参考訳) Twitterは、さまざまなトピックに関する人々の意見を分析するのに有用なリソースである。多くの場合、これらのトピックは、これらのつぶやきが投稿された場所と関連付けられている。例えば、レストランのオーナーは、食事に関する投稿の感情に関して、ターゲットの顧客がどこで食事をしているかを知る必要があり、政策プランナーは、犯罪、安全、渋滞などの関連する問題について、市民の意見を分析する必要がある。市の特定の部分、または郡または州について。約束通り、クロールされたツイートの投稿に位置情報タグが付くのは$1\%以下だ。これにより、ジオタグ付けされていないツイートに対するツイートの正確な予測が、さまざまなドメインのデータ分析に非常に重要である。本研究では,自然言語処理(NLP)技術を用いて,近隣,ジップコード,緯度などの様々な粒度における非ジオタグのつぶやき投稿の位置を推定するディープニューラルネットワークモデルを構築するために,何百万ものTwitter投稿とエンドユーザドメインの専門知識を活用した。複数のニューラルアーキテクチャ実験と協調的なヒューマンマシンワークフロー設計により、位置情報検出に関する現在進行中の研究は、エンドユーザが選択した変数と位置情報の関係を関連付けるための有望な結果を示している。 Twitter is a useful resource to analyze peoples' opinions on various topics. Often these topics are correlated or associated with locations from where these Tweet posts are made. For example, restaurant owners may need to know where their target customers eat with respect to the sentiment of the posts made related to food, policy planners may need to analyze citizens' opinion on relevant issues such as crime, safety, congestion, etc. with respect to specific parts of the city, or county or state. As promising as this is, less than $1\%$ of the crawled Tweet posts come with geolocation tags. That makes accurate prediction of Tweet posts for the non geo-tagged tweets very critical to analyze data in various domains. In this research, we utilized millions of Twitter posts and end-users domain expertise to build a set of deep neural network models using natural language processing (NLP) techniques, that predicts the geolocation of non geo-tagged Tweet posts at various level of granularities such as neighborhood, zipcode, and longitude with latitudes. 
With multiple neural architecture experiments, and a collaborative human-machine workflow design, our ongoing work on geolocation detection shows promising results that empower end-users to correlate relationship between variables of choice with the location information.	翻訳日:2021-06-28 13:14:53 公開日:2021-06-25
# リアルタイム話者分離のためのオンライン自己注意ゲート付きRNN Online Self-Attentive Gated RNNs for Real-Time Speaker Separation ( http://arxiv.org/abs/2106.13493v1 ) ライセンス: Link先を確認	Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar	(参考訳) ディープニューラルネットワークは近年、モノラルとバイノーラルの両方の設定下で、ブラインド音源分離のタスクで大きな成功を収めている。これらの手法は高品質な分離を実現することが示されているが、主にオフライン環境で適用されており、そこではモデルが信号分離中に全入力信号にアクセスできる。本研究では,非因果的な最先端の分離モデルを因果的かつリアルタイムなモデルに変換し,その性能をオンラインとオフラインの両方の設定で評価する。モノラルおよびバイノーラルの入力と出力を検討しながら,無響,雑音,雑音・残響の記録条件下で,提案モデルの性能を複数のベースライン法と比較する。この結果は,分離時の因果モデルと非因果モデルとの相対的な差異を明らかにする。オンライン分離のためのステートフルな実装は,オフラインモデルに比べてわずかな性能低下(モノラル入力で0.8dB,バイノーラル入力で0.3dB)にとどまり,リアルタイム係数0.65を達成した。サンプルは https://kwanum.github.io/sagrnnc-stream-results/ にある。 Deep neural networks have recently shown great success in the task of blind source separation, both under monaural and binaural settings. Although these methods were shown to produce high-quality separations, they were mainly applied under offline settings, in which the model has access to the full input signal while separating the signal. In this study, we convert a non-causal state-of-the-art separation model into a causal and real-time model and evaluate its performance under both online and offline settings. We compare the performance of the proposed model to several baseline methods under anechoic, noisy, and noisy-reverberant recording conditions while exploring both monaural and binaural inputs and outputs. Our findings shed light on the relative difference between causal and non-causal models when performing separation. Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model; 0.8dB for monaural inputs and 0.3dB for binaural inputs while reaching a real-time factor of 0.65. Samples can be found under the following link: https://kwanum.github.io/sagrnnc-stream-results/.	翻訳日:2021-06-28 13:14:20 公開日:2021-06-25
# 空間的進化的敵対的生成ネットワークにおける多様性の育成 Fostering Diversity in Spatial Evolutionary Generative Adversarial Networks ( http://arxiv.org/abs/2106.13590v1 ) ライセンス: Link先を確認	Jamal Toutouh and Erik Hemberg and Una-May O'Reilly	(参考訳) 敵対的生成ネットワーク(GAN)は不安定性やモード崩壊などのトレーニング病理に悩まされ、これらは主に敵対的相互作用の多様性の欠如から生じる。共進化的GAN(CoE-GAN)トレーニングアルゴリズムはこれらの病理に耐性があることが示されている。本稿では,トレーニング中に異なる損失関数を用いることで多様性を育む空間分散型CoE-GANであるMustangsを紹介する。 MNISTとCelebAの実験分析により、Mustangsは統計的により正確な生成器を訓練することを示した。 Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse, which mainly arise from a lack of diversity in their adversarial interactions. Co-evolutionary GAN (CoE-GAN) training algorithms have been shown to be resilient to these pathologies. This article introduces Mustangs, a spatially distributed CoE-GAN, which fosters diversity by using different loss functions during the training. Experimental analysis on MNIST and CelebA demonstrated that Mustangs trains statistically more accurate generators.	翻訳日:2021-06-28 13:14:03 公開日:2021-06-25
# 直交確率線形混合モデルを用いた高次元時系列ベイズ推定 Bayesian Inference in High-Dimensional Time-Series with the Orthogonal Stochastic Linear Mixing Model ( http://arxiv.org/abs/2106.13379v1 ) ライセンス: Link先を確認	Rui Meng, Kristofer Bouchard	(参考訳) 現代の時系列データセットの多くは、長期間にわたってサンプリングされた大量の出力応答変数を含んでいる。例えば神経科学では、数百から数千のニューロンの活動が行動中や感覚刺激に反応して記録される。多出力ガウス過程モデルでは、ガウス過程の非パラメトリックな性質を利用して複数の出力をまたいだ構造を捉える。しかし、このモデルのクラスは、通常、出力応答変数間の相関が入力空間で不変であると仮定する。確率線形混合モデル(SLMM)は、混合係数が入力に依存すると仮定し、より柔軟で複雑な出力依存を捉えるのに効果的である。しかし、現在、SLMMの推論は大規模なデータセットでは計算困難であり、現代の時系列問題のいくつかには適用できない。本稿では,混合係数間の直交制約を導入する新しい回帰フレームワークである直交確率線形混合モデル(OSLMM)を提案する。この制約は、複雑な出力依存を処理する能力を保持しながら、推論の計算負荷を軽減する。我々は,SLMMとOSLMMの双方に対してマルコフ連鎖モンテカルロ推定手法を提供し,複数の実世界のアプリケーションにおいて,最先端の手法と比較してOSLMMが優れたモデル拡張性と低い予測誤差を持つことを実証した。神経生理学記録では,聴覚刺激に対する集団応答のコンパクトな可視化に推定潜在関数を用い,競合法(GPFA)と比較して優れた結果を示した。これらの結果から,OSLMMは多様な大規模時系列データセットの分析に有用であることが示唆された。 Many modern time-series datasets contain large numbers of output response variables sampled for prolonged periods of time. For example, in neuroscience, the activities of 100s-1000's of neurons are recorded during behaviors and in response to sensory stimuli. Multi-output Gaussian process models leverage the nonparametric nature of Gaussian processes to capture structure across multiple outputs. However, this class of models typically assumes that the correlations between the output response variables are invariant in the input space. Stochastic linear mixing models (SLMM) assume the mixture coefficients depend on input, making them more flexible and effective to capture complex output dependence. However, currently, the inference for SLMMs is intractable for large datasets, making them inapplicable to several modern time-series problems. In this paper, we propose a new regression framework, the orthogonal stochastic linear mixing model (OSLMM) that introduces an orthogonal constraint amongst the mixing coefficients. This constraint reduces the computational burden of inference while retaining the capability to handle complex output dependence. 
We provide Markov chain Monte Carlo inference procedures for both SLMM and OSLMM and demonstrate superior model scalability and reduced prediction error of OSLMM compared with state-of-the-art methods on several real-world applications. In neurophysiology recordings, we use the inferred latent functions for compact visualization of population responses to auditory stimuli, and demonstrate superior results compared to a competing method (GPFA). Together, these results demonstrate that OSLMM will be useful for the analysis of diverse, large-scale time-series datasets.	翻訳日:2021-06-28 13:13:55 公開日:2021-06-25
# グルーピング効果を用いたロバスト行列因子分解 Robust Matrix Factorization with Grouping Effect ( http://arxiv.org/abs/2106.13681v1 ) ライセンス: Link先を確認	Haiyan Jiang, Shuyu Li, Luwei Zhang, Haoyi Xiong, Dejing Dou	(参考訳) 行列分解(MF)には多くの技術が応用されているが、特徴構造を完全に活用するものではない。本稿では,グループ化効果をMFに組み込んで,グループ化効果を用いたロバスト行列分解法(GRMF)を提案する。グルーピング効果はsparsity効果の一般化であり、0.0前後ではなく、複数の中心に類似した値をクラスタリングすることでデノイジングを行う。既存のアルゴリズムと比較して,提案したGRMFは,自然に調整可能な非凸正規化を導入し,同時分散とグループ化効果を実現することで,MF内のグループ構造と疎性を自動的に学習することができる。具体的には、GRMFは効率の良い交互最小化フレームワークを使用してMFを実行し、元の非凸問題はまず差分凸(DC)プログラミングによって凸問題に変換され、次に交互乗算器の方向法(ADMM)によって解決される。さらに、GRMFはNon- negative Matrix Factorization (NMF)設定に容易に拡張できる。 5つのベンチマークアルゴリズムと比較して, GRMFが性能と堅牢性を向上したことを示す実験結果が得られた。 Although many techniques have been applied to matrix factorization (MF), they may not fully exploit the feature structure. In this paper, we incorporate the grouping effect into MF and propose a novel method called Robust Matrix Factorization with Grouping effect (GRMF). The grouping effect is a generalization of the sparsity effect, which conducts denoising by clustering similar values around multiple centers instead of just around 0. Compared with existing algorithms, the proposed GRMF can automatically learn the grouping structure and sparsity in MF without prior knowledge, by introducing a naturally adjustable non-convex regularization to achieve simultaneous sparsity and grouping effect. Specifically, GRMF uses an efficient alternating minimization framework to perform MF, in which the original non-convex problem is first converted into a convex problem through Difference-of-Convex (DC) programming, and then solved by Alternating Direction Method of Multipliers (ADMM). In addition, GRMF can be easily extended to the Non-negative Matrix Factorization (NMF) settings. Extensive experiments have been conducted using real-world data sets with outliers and contaminated noise, where the experimental results show that GRMF has promoted performance and robustness, compared to five benchmark algorithms.	翻訳日:2021-06-28 13:13:30 公開日:2021-06-25
# Private Adaptive Gradient Methods for Convex Optimization ( http://arxiv.org/abs/2106.13756v1 ) License: see the linked page	Hilal Asi, John Duchi, Alireza Fallah, Omid Javidbakht, Kunal Talwar	We study adaptive methods for differentially private convex optimization, proposing and analyzing differentially private variants of a Stochastic Gradient Descent (SGD) algorithm with adaptive stepsizes, as well as the AdaGrad algorithm. We provide upper bounds on the regret of both algorithms and show that the bounds are (worst-case) optimal. As a consequence of our development, we show that our private versions of AdaGrad outperform adaptive SGD, which in turn outperforms traditional SGD in scenarios with non-isotropic gradients where (non-private) AdaGrad provably outperforms SGD. The major challenge is that the isotropic noise typically added for privacy dominates the signal in gradient geometry for high-dimensional problems; approaches to this that effectively optimize over lower-dimensional subspaces simply ignore the actual problems that varying gradient geometries introduce. In contrast, we study non-isotropic clipping and noise addition, developing a principled theoretical approach; the consequent procedures also enjoy significantly stronger empirical performance than prior approaches.	Translated: 2021-06-28 13:13:09 Published: 2021-06-25
# Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems ( http://arxiv.org/abs/2106.13781v1 ) License: see the linked page	Tianyi Chen, Yuejiao Sun, and Wotao Yin	Stochastic nested optimization, including stochastic compositional, min-max and bilevel optimization, is gaining popularity in many machine learning applications. While the three problems share the nested structure, existing works often treat them separately, and thus develop problem-specific algorithms and their analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) are still prevalent in solving this class of nested problems, but they are believed to have a slower convergence rate than for non-nested problems. This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging the hidden smoothness of the problem, this paper presents a tighter analysis of ALSET for stochastic nested problems. Under the new analysis, ALSET requires ${\cal O}(\epsilon^{-2})$ samples to achieve an $\epsilon$-stationary point of the nested problem. Under certain regularity conditions, applying our results to stochastic compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases. Our results explain why simple SGD-type algorithms in stochastic nested problems all work very well in practice without the need for further modifications.	Translated: 2021-06-28 13:12:50 Published: 2021-06-25
# Active Learning with Multifidelity Modeling for Efficient Rare Event Simulation ( http://arxiv.org/abs/2106.13790v1 ) License: see the linked page	S. L. N. Dhulipala, M. D. Shields, B. W. Spencer, C. Bolisetti, A. E. Slaughter, V. M. Laboure, P. Chakroborty	While multifidelity modeling provides a cost-effective way to conduct uncertainty quantification with computationally expensive models, much greater efficiency can be achieved by adaptively deciding the number of required high-fidelity (HF) simulations, depending on the type and complexity of the problem and the desired accuracy in the results. We propose a framework for active learning with multifidelity modeling emphasizing the efficient estimation of rare events. Our framework works by fusing a low-fidelity (LF) prediction with an HF-inferred correction, filtering the corrected LF prediction to decide whether to call the high-fidelity model, and for enhanced subsequent accuracy, adapting the correction for the LF prediction after every HF model call. The framework does not make any assumptions as to the LF model type or its correlations with the HF model. In addition, for improved robustness when estimating smaller failure probabilities, we propose using dynamic active learning functions that decide when to call the HF model. We demonstrate our framework using several academic case studies and two finite element (FE) model case studies: estimating Navier-Stokes velocities using the Stokes approximation and estimating stresses in a transversely isotropic model subjected to displacements via a coarsely meshed isotropic model. Across these case studies, not only did the proposed framework estimate the failure probabilities accurately, but compared with either Monte Carlo or a standard variance reduction method, it also required only a small fraction of the calls to the HF model.	Translated: 2021-06-28 13:12:25 Published: 2021-06-25
# Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent ( http://arxiv.org/abs/2106.13792v1 ) License: see the linked page	Spencer Frei and Quanquan Gu	Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the problem studied in each setting, relying on different assumptions on the distribution, optimization parameters, and network architectures, making it difficult to generalize across different settings. In this work, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.	Translated: 2021-06-28 13:11:58 Published: 2021-06-25
# Self-training Converts Weak Learners to Strong Learners in Mixture Models ( http://arxiv.org/abs/2106.13805v1 ) License: see the linked page	Spencer Frei and Difan Zou and Zixiang Chen and Quanquan Gu	We consider a binary classification problem when the data comes from a mixture of two isotropic distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.	Translated: 2021-06-28 13:11:38 Published: 2021-06-25
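The self-training loop described in this abstract can be illustrated with a small sketch. The abstract gives only the pseudolabel rule $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and the initialization at a weak pseudolabeler; everything else below is an assumption made for illustration: the Gaussian mixture, the logistic loss on pseudolabeled margins, the step size, the renormalization after each step, and all function names. It is not the authors' exact algorithm.

```python
import math
import random


def normalize(v):
    """Return v scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sample_mixture(n, mu, rng):
    """Draw n unlabeled points x = y * mu + z, with y uniform in {-1, +1} and z ~ N(0, I)."""
    points = []
    for _ in range(n):
        y = rng.choice((-1.0, 1.0))
        points.append([y * m + rng.gauss(0.0, 1.0) for m in mu])
    return points


def self_train(beta0, unlabeled, steps=30, lr=1.0):
    """Self-training with pseudolabels y_hat = sgn(<beta_t, x>).

    Assumed update (not taken from the abstract): one gradient step on the
    average logistic loss of the pseudolabeled margins, then renormalization.
    """
    beta = normalize(beta0)
    d = len(beta)
    n = len(unlabeled)
    for _ in range(steps):
        grad = [0.0] * d
        for x in unlabeled:
            score = sum(b * c for b, c in zip(beta, x))
            y_hat = 1.0 if score >= 0 else -1.0
            # Gradient of log(1 + exp(-y_hat * <beta, x>)) with y_hat held fixed;
            # y_hat * score equals abs(score) by construction.
            weight = -y_hat / (1.0 + math.exp(abs(score)))
            for j in range(d):
                grad[j] += weight * x[j] / n
        beta = normalize([b - lr * g for b, g in zip(beta, grad)])
    return beta
```

A possible use: draw unlabeled points with `sample_mixture(2000, [2.0, 0.0, 0.0, 0.0, 0.0], random.Random(0))`, start from a weak direction such as `[0.6, 0.8, 0.0, 0.0, 0.0]`, and compare `cosine(beta, mu)` before and after `self_train`. No labels are used inside the loop, which is the point the abstract makes.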
# Circumpapillary OCT-Focused Hybrid Learning for Glaucoma Grading Using Tailored Prototypical Neural Networks ( http://arxiv.org/abs/2106.13551v1 ) License: see the linked page	Gabriel García, Rocío del Amor, Adrián Colomer, Rafael Verdú-Monedero, Juan Morales-Sánchez and Valery Naranjo	Glaucoma is one of the leading causes of blindness worldwide and Optical Coherence Tomography (OCT) is the quintessential imaging technique for its detection. Unlike most of the state-of-the-art studies focused on glaucoma detection, in this paper, we propose, for the first time, a novel framework for glaucoma grading using raw circumpapillary B-scans. In particular, we set out a new OCT-based hybrid network which combines hand-driven and deep learning algorithms. An OCT-specific descriptor is proposed to extract hand-crafted features related to the retinal nerve fibre layer (RNFL). In parallel, an innovative CNN is developed using skip-connections to include tailored residual and attention modules to refine the automatic features of the latent space. The proposed architecture is used as a backbone to conduct novel few-shot learning based on static and dynamic prototypical networks. The k-shot paradigm is redefined, giving rise to a supervised end-to-end system which provides substantial improvements in discriminating between healthy, early and advanced glaucoma samples. The training and evaluation of the dynamic prototypical network use two fused databases acquired with the Heidelberg Spectralis system. Validation and testing results reach a categorical accuracy of 0.9459 and 0.8788 for glaucoma grading, respectively. In addition, the high performance reported by the proposed model for glaucoma detection deserves a special mention. The findings from the class activation maps are directly in line with the clinicians' opinion since the heatmaps pointed out the RNFL as the most relevant structure for glaucoma diagnosis.	Translated: 2021-06-28 13:10:36 Published: 2021-06-25
# A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images ( http://arxiv.org/abs/2106.13559v1 ) License: see the linked page	Gabriel García, Anna Esteve, Adrián Colomer, David Ramos and Valery Naranjo	Recently, the incidence and mortality of bladder cancer have increased significantly. Currently, two subtypes are known based on tumour growth: non-muscle invasive (NMIBC) and muscle-invasive bladder cancer (MIBC). In this work, we focus on the MIBC subtype because it has the worst prognosis and can spread to adjacent organs. We present a self-learning framework to grade bladder cancer from histological images stained via immunohistochemical techniques. Specifically, we propose a novel Deep Convolutional Embedded Attention Clustering (DCEAC) which allows classifying histological patches into different severity levels of the disease, according to the patterns established in the literature. The proposed DCEAC model follows a two-step fully unsupervised learning methodology to discern between non-tumour, mild and infiltrative patterns from high-resolution samples of 512x512 pixels. Our system outperforms previous clustering-based methods by including a convolutional attention module, which allows refining the features of the latent space before the classification stage. The proposed network exceeds state-of-the-art approaches by 2-3% across different metrics, achieving a final average accuracy of 0.9034 in a multi-class scenario. Furthermore, the reported class activation maps show that our model is able to learn by itself the same patterns that clinicians consider relevant, without any prior annotation step. This represents a breakthrough in muscle-invasive bladder cancer grading which bridges the gap with respect to training the model on labelled data.	Translated: 2021-06-28 13:10:12 Published: 2021-06-25
# Shape registration in the time of transformers ( http://arxiv.org/abs/2106.13679v1 ) License: see the linked page	Giovanni Trappolini, Luca Cosmo, Luca Moschella, Riccardo Marin, Emanuele Rodolà	In this paper, we propose a transformer-based procedure for the efficient registration of non-rigid 3D point clouds. The proposed approach is data-driven and adopts for the first time the transformer architecture in the registration task. Our method is general and applies to different settings. Given a fixed template with some desired properties (e.g. skinning weights or other animation cues), we can register raw acquired data to it, thereby transferring all the template properties to the input geometry. Alternatively, given a pair of shapes, our method can register the first onto the second (or vice-versa), obtaining a high-quality dense correspondence between the two. In both contexts, the quality of our results enables us to target real applications such as texture transfer and shape interpolation. Furthermore, we also show that including an estimation of the underlying density of the surface eases the learning process. By exploiting the potential of this architecture, we can train our model requiring only a sparse set of ground truth correspondences ($10\sim20\%$ of the total points). The proposed model and the analysis that we perform pave the way for future exploration of transformer-based architectures for registration and matching applications. Qualitative and quantitative evaluations demonstrate that our pipeline outperforms state-of-the-art methods for deformable and unordered 3D data registration on different datasets and scenarios.	Translated: 2021-06-28 13:09:46 Published: 2021-06-25
# Semantic annotation for computational pathology: Multidisciplinary experience and best practice recommendations ( http://arxiv.org/abs/2106.13689v1 ) License: see the linked page	Noorul Wahab, Islam M Miligy, Katherine Dodd, Harvir Sahota, Michael Toss, Wenqi Lu, Mostafa Jahanifar, Mohsin Bilal, Simon Graham, Young Park, Giorgos Hadjigeorghiou, Abhir Bhalerao, Ayat Lashen, Asmaa Ibrahim, Ayaka Katayama, Henry O Ebili, Matthew Parkin, Tom Sorell, Shan E Ahmed Raza, Emily Hero, Hesham Eldaly, Yee Wah Tsang, Kishore Gopalakrishnan, David Snead, Emad Rakha, Nasir Rajpoot, Fayyaz Minhas	Recent advances in whole slide imaging (WSI) technology have led to the development of a myriad of computer vision and artificial intelligence (AI) based diagnostic, prognostic, and predictive algorithms. Computational Pathology (CPath) offers an integrated solution to utilize information embedded in pathology WSIs beyond what we obtain through visual assessment. For automated analysis of WSIs and validation of machine learning (ML) models, annotations at the slide, tissue and cellular levels are required. The annotation of important visual constructs in pathology images is an important component of CPath projects. Improper annotations can result in algorithms which are hard to interpret and can potentially produce inaccurate and inconsistent results. Despite the crucial role of annotations in CPath projects, there are no well-defined guidelines or best practices on how annotations should be carried out. In this paper, we address this shortcoming by presenting the experience and best practices acquired during the execution of a large-scale annotation exercise involving a multidisciplinary team of pathologists, ML experts and researchers as part of the Pathology image data Lake for Analytics, Knowledge and Education (PathLAKE) consortium. We present a real-world case study along with examples of different types of annotations, diagnostic algorithm, annotation data dictionary and annotation constructs. The analyses reported in this work highlight best practice recommendations that can be used as annotation guidelines over the lifecycle of a CPath project.	Translated: 2021-06-28 13:09:27 Published: 2021-06-25
# JNLP Team: Deep Learning Approaches for Legal Processing Tasks in COLIEE 2021 ( http://arxiv.org/abs/2106.13405v1 ) License: see the linked page	Ha-Thanh Nguyen, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Vu Tran, Minh Le Nguyen, Ken Satoh	COLIEE is an annual competition in automatic computerized legal text processing. Automatic legal document processing is an ambitious goal, and the structure and semantics of the law are often far more complex than everyday language. In this article, we survey and report our methods and experimental results in using deep learning in legal document processing. The results show the difficulties as well as potentials in this family of approaches.	Translated: 2021-06-28 13:09:01 Published: 2021-06-25
# Manually Annotated Spelling Error Corpus for Amharic ( http://arxiv.org/abs/2106.13521v1 ) License: see the linked page	Andargachew Mekonnen Gezmu, Tirufat Tesifaye Lema, Binyam Ephrem Seyoum, Andreas Nürnberger	This paper presents a manually annotated spelling error corpus for Amharic, the lingua franca of Ethiopia. The corpus is designed to be used for the evaluation of spelling error detection and correction. The misspellings are tagged as non-word and real-word errors. In addition, the contextual information available in the corpus makes it useful in dealing with both types of spelling errors.	Translated: 2021-06-28 13:08:56 Published: 2021-06-25
# Learning to Sample Replacements for ELECTRA Pre-Training ( http://arxiv.org/abs/2106.13715v1 ) License: see the linked page	Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei	ELECTRA pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling. Despite the compelling performance, ELECTRA suffers from the following two issues. First, there is no direct feedback loop from discriminator to generator, which renders replacement sampling inefficient. Second, the generator's prediction tends to become over-confident as training proceeds, making replacements biased toward correct tokens. In this paper, we propose two methods to improve replacement sampling for ELECTRA pre-training. Specifically, we augment sampling with a hardness prediction mechanism, so that the generator can encourage the discriminator to learn what it has not acquired. We also prove that efficient sampling reduces the training variance of the discriminator. Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements. Experimental results show that our method improves ELECTRA pre-training on various downstream tasks.	Translated: 2021-06-28 13:08:49 Published: 2021-06-25
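The effect of a focal loss on an over-confident generator can be seen in a few lines. The abstract only states that a focal loss is used for the generator; the sketch below uses the standard focal-loss form $-(1-p)^{\gamma}\log p$, and the value `gamma=2.0` and the function names are assumptions for illustration rather than the paper's settings.

```python
import math


def cross_entropy(p_true):
    """Standard log loss for probability p_true assigned to the correct token."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0):
    """Focal loss -(1 - p)^gamma * log(p); gamma = 0 recovers cross-entropy.

    The factor (1 - p)^gamma shrinks the loss on tokens the generator already
    predicts confidently, so those tokens contribute less to training.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)
```

Comparing `focal_loss(p) / cross_entropy(p)` for a confident prediction (`p = 0.9`) and an uncertain one (`p = 0.1`) shows that the confident case is down-weighted far more strongly, which is the mechanism that would discourage the generator from sampling the correct token as a replacement.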
# DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders ( http://arxiv.org/abs/2106.13736v1 ) License: see the linked page	Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei	While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation.	Translated: 2021-06-28 13:08:35 Published: 2021-06-25
# Energy-Based Generative Cooperative Saliency Prediction ( http://arxiv.org/abs/2106.13389v1 ) License: see the linked page	Jing Zhang and Jianwen Xie and Zilong Zheng and Nick Barnes	Conventional saliency prediction models typically learn a deterministic mapping from images to the corresponding ground truth saliency maps. In this paper, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over saliency maps given an image, and treating the prediction as a sampling process. Specifically, we propose a generative cooperative saliency prediction framework based on the generative cooperative networks, where a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. We call our model the SalCoopNets. The latent variable model serves as a fast but coarse predictor to efficiently produce an initial prediction, which is then refined by the iterative Langevin revision of the energy-based model that serves as a fine predictor. Such a coarse-to-fine cooperative saliency prediction strategy offers the best of both worlds. Moreover, we generalize our framework to the scenario of weakly supervised saliency prediction, where saliency annotation of training images is partially observed, by proposing a cooperative learning-while-recovering strategy. Lastly, we show that the learned energy function can serve as a refinement module that can refine the results of other pre-trained saliency prediction models. Experimental results show that our generative model can achieve state-of-the-art performance. Our code is publicly available at: https://github.com/JingZhang617/SalCoopNets.	Translated: 2021-06-28 13:07:55 Published: 2021-06-25
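The coarse-to-fine step in this abstract, an initial prediction refined by iterative Langevin revision under an energy function, can be sketched on a toy problem. The update used below is the generic Langevin rule $x \leftarrow x - \tfrac{\delta^2}{2}\nabla E(x) + \delta\,\epsilon$ with $\epsilon \sim N(0, I)$. The quadratic energy, the step size, and all names are assumptions for illustration; in the paper the energy is a learned conditional model over saliency maps, which this sketch does not reproduce.

```python
import random


def quadratic_grad(target):
    """Gradient of the toy energy E(x) = 0.5 * ||x - target||^2 (a hypothetical stand-in)."""
    def grad(x):
        return [xi - ti for xi, ti in zip(x, target)]
    return grad


def langevin_refine(x0, grad_energy, steps=50, step_size=0.1, noise_scale=1.0, rng=None):
    """Refine an initial guess x0 with Langevin updates on a given energy gradient.

    Each step applies x <- x - (step_size^2 / 2) * grad E(x) + noise_scale * step_size * eps,
    with eps drawn from a standard normal. Setting noise_scale to 0 gives plain
    gradient descent on the energy.
    """
    rng = rng or random.Random(0)
    x = list(x0)
    for _ in range(steps):
        g = grad_energy(x)
        x = [
            xi - 0.5 * step_size ** 2 * gi + noise_scale * step_size * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x
```

Here `x0` plays the role of the fast, coarse prediction and the returned value the refined one. With `noise_scale=0.0` the loop moves deterministically toward the energy minimum, which makes the refinement behaviour easy to check on the toy energy.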
# Generative Modeling for Multi-task Visual Learning ( http://arxiv.org/abs/2106.13409v1 ) License: see the linked page	Zhipeng Bao, Martial Hebert, Yu-Xiong Wang	Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider a novel problem of learning a shared generative model that is useful across various visual perception tasks. Correspondingly, we propose a general multi-task oriented generative modeling (MGM) framework, by coupling a discriminative multi-task network with a generative network. While it is challenging to synthesize both RGB images and pixel-level annotations in multi-task scenarios, our framework enables us to use synthesized images paired with only weak annotations (i.e., image-level scene labels) to facilitate multiple visual tasks. Experimental evaluation on challenging multi-task benchmarks, including NYUv2 and Taskonomy, demonstrates that our MGM framework improves the performance of all the tasks by large margins, consistently outperforming state-of-the-art multi-task approaches.	Translated: 2021-06-28 13:07:33 Published: 2021-06-25
# ビデオ質問応答のための階層的オブジェクト指向時空間推論 Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering ( http://arxiv.org/abs/2106.13432v1 ) ライセンス: Link先を確認	Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran	(参考訳) Video Question Answering(ビデオQA)は新しいAI機能を開発するための強力なテストベッドである。このタスクは、時空における視覚ドメインと言語ドメイン間のオブジェクト、関係、イベントの推論を学ぶ必要がある。高レベルの推論は、連想的な視覚的パターン認識から、オブジェクトに対するシンボルのような操作、その振る舞いと相互作用への要求を軽減します。この目標を達成するために,映像を相互作用するオブジェクトの動的ストリームとして抽象化するオブジェクト指向推論手法を提案する。ビデオイベントフローの各段階で、これらのオブジェクトは相互に相互作用し、それらの相互作用は、クエリおよびビデオの全体的なコンテキストの下で、推論される。このメカニズムは汎用神経ユニットのファミリーと階層的オブジェクト指向時空間推論(HOSTR)ネットワークと呼ばれる多層アーキテクチャに実体化されている。このニューラルモデルは、階層的にネストされた時空間グラフの形で、オブジェクトの一貫したライフラインを維持する。このグラフ内では、動的インタラクティブなオブジェクト指向表現がビデオシーケンスに沿って構築され、階層的にボトムアップ的に抽象化され、正しい回答のキー情報に収束する。この手法は、複数の主要なビデオQAデータセットで評価され、これらのタスクに新しい最先端技術を確立する。モデルの振る舞いの分析は、オブジェクト指向推論がビデオQAに対する信頼性、解釈可能、効率的なアプローチであることを示している。 Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their behavior and interactions. Toward reaching this goal we propose an object-oriented reasoning approach in that video is abstracted as a dynamic stream of interacting objects. At each stage of the video event flow, these objects interact with each other, and their interactions are reasoned about with respect to the query and under the overall context of a video. This mechanism is materialized into a family of general-purpose neural units and their multi-level architecture called Hierarchical Object-oriented Spatio-Temporal Reasoning (HOSTR) networks. This neural model maintains the objects' consistent lifelines in the form of a hierarchically nested spatio-temporal graph. 
Within this graph, the dynamic interactive object-oriented representations are built up along the video sequence, hierarchically abstracted in a bottom-up manner, and converge toward the key information for the correct answer. The method is evaluated on multiple major Video QA datasets and establishes a new state of the art on these tasks. Analysis of the model's behavior indicates that object-oriented reasoning is a reliable, interpretable and efficient approach to Video QA.	翻訳日:2021-06-28 13:07:17 公開日:2021-06-25
# NP-DRAW:画像生成のための非パラメータ構造潜在変数モデル NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation ( http://arxiv.org/abs/2106.13435v1 ) ライセンス: Link先を確認	Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao	(参考訳) 本稿では、NP-DRAWと呼ばれる画像生成のための非パラメトリック構造化潜在変数モデルを提案する。主な貢献は以下の通りである。 1)ステップ毎の潜在変数 "what-to-draw" がカテゴリ確率変数となるように,画像部分の出現に関する非パラメトリック事前分布を提案する。これにより表現性が向上し、文献で用いられるガウス分布と比較して学習が大幅に楽になる。 2)本論文では,トランスフォーマーを用いて部品の逐次依存性構造をモデル化する。 3) 事前学習のための効果的なヒューリスティック解析アルゴリズムを提案する。 MNIST,Omniglot,CIFAR-10,CelebAによる実験により,本手法は従来のDRAWやAIRなどの画像モデルよりも大幅に優れており,他のジェネリック生成モデルと競合することを示す。さらに,本モデル固有の構成性や解釈性は,低データ学習システムや潜在空間編集において大きなメリットをもたらすことを示す。コードは https://github.com/ZENGXH/NPDRAW で入手できる。 In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows. 1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per step becomes a categorical random variable. This improves the expressiveness and greatly eases the learning compared to Gaussians used in the literature. 2) We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train compared to RNNs used in the literature. 3) We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms previous structured image models like DRAW and AIR and is competitive with other generic generative models. Moreover, we show that our model's inherent compositionality and interpretability bring significant benefits in the low-data learning regime and latent space editing. Code is available at https://github.com/ZENGXH/NPDRAW.	翻訳日:2021-06-28 13:06:58 公開日:2021-06-25
# モダリティの探索:視覚言語事前学習のための自己注意型視覚解析 Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training ( http://arxiv.org/abs/2106.13488v1 ) ライセンス: Link先を確認	Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo	(参考訳) Vision-Language Pre-Training (VLP)は、画像テキストペアからマルチモーダル表現を学習し、微調整で下流の視覚言語タスクに役立てることを目的としている。支配的なVLPモデルはCNN-Transformerアーキテクチャを採用し、CNNにイメージを埋め込んで、画像とテキストをTransformerにアライメントする。視覚コンテンツ間の視覚的関係は画像理解において重要な役割を担い、モーダル間アライメント学習の基礎となる。しかしながら、cnnは、長距離依存関係のモデリングにおける局所受容野の弱さのため、視覚関係学習に制限がある。したがって、視覚関係とモーダル間アライメントの2つの目的は同じトランスフォーマーネットワークにカプセル化される。このような設計は、各目的の特殊特性を無視してトランスフォーマーにおけるモーダル間アライメント学習を制限する可能性がある。そこで本研究では,視覚関係をよりよく学習し,モーダル間アライメントを促進するために,VLPのためのフルトランスフォーマー視覚埋め込みを提案する。具体的には、視覚と言語モダリティ(モダリティ間)の相互作用を測定するために、IMF(Inter-Modality Flow)と呼ばれる指標を提案する。また,モダリティ間の学習をさらに促進するために,Transformer で Masked Feature Regression (MFR) という新しいマスキング最適化機構を設計する。我々の知る限りでは、VLPにおける視覚的特徴学習におけるTransformerのメリットを探求する最初の研究である。本稿では,視覚的質問応答(VQA),視覚的ヒント(Visual Entailment),視覚的推論(Visual Reasoning)など,幅広い視覚言語タスクについて検証する。当社のアプローチは、最先端のVLPのパフォーマンスを上回るだけでなく、IMFの指標にもメリットがあります。 Vision-Language Pre-training (VLP) aims to learn multi-modal representations from image-text pairs and serves for downstream vision-language tasks in a fine-tuning fashion. The dominant VLP models adopt a CNN-Transformer architecture, which embeds images with a CNN, and then aligns images and text with a Transformer. Visual relationship between visual contents plays an important role in image understanding and is the basic for inter-modal alignment learning. However, CNNs have limitations in visual relation learning due to local receptive field's weakness in modeling long-range dependencies. Thus the two objectives of learning visual relation and inter-modal alignment are encapsulated in the same Transformer network. Such design might restrict the inter-modal alignment learning in the Transformer by ignoring the specialized characteristic of each objective. 
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment. Specifically, we propose a metric named Inter-Modality Flow (IMF) to measure the interaction between vision and language modalities (i.e., inter-modality). We also design a novel masking optimization mechanism named Masked Feature Regression (MFR) in Transformer to further promote the inter-modality learning. To the best of our knowledge, this is the first study to explore the benefit of Transformer for visual feature learning in VLP. We verify our method on a wide range of vision-language tasks, including Visual Question Answering (VQA), Visual Entailment and Visual Reasoning. Our approach not only outperforms the state-of-the-art VLP performance, but also shows benefits on the IMF metric.	翻訳日:2021-06-28 13:06:37 公開日:2021-06-25
# 糖尿病網膜症の深層学習に基づく分析における事前訓練と自己監視のロバスト性について On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy ( http://arxiv.org/abs/2106.13497v1 ) ライセンス: Link先を確認	Vignesh Srinivasan, Nils Strodthoff, Jackie Ma, Alexander Binder, Klaus-Robert Müller, Wojciech Samek	(参考訳) ディープニューラルネットワークに基づく分類アルゴリズムが、人間の医療専門家と競合するパフォーマンスレベルに達する医療用ユースケースが増えている。小さなデータセットサイズの課題を軽減するため、これらのシステムは事前トレーニングに依存することが多い。本研究は,これらのアプローチの広範な影響を評価することを目的とする。糖尿病網膜症を模範とした症例では, コントラスト学習に基づく自己指導型事前訓練法を含め, 異なる訓練方法の影響を比較検討した。この目的のために, 定量的性能, 学習特徴表現の統計, 解釈可能性, 画像歪みに対するロバスト性など, 様々な側面について検討した。以上の結果から,imagenetプリトレーニングから初期化したモデルでは,画像歪みに対する性能,一般化,ロバスト性が著しく向上することが示唆された。特に、自己教師付きモデルは教師付きモデルにさらなる利点をもたらす。 ImageNetから初期化した自己教師型モデルは、高いパフォーマンスを報告するだけでなく、大きな病変への過剰適合を減らし、疾患の進行を示す微小病変を考慮に入れた。簡単なパフォーマンス比較を超えて、より広い意味でプレトレーニングの効果を理解することは、この研究で考慮されたユースケースを超えて、幅広い医療画像コミュニティにとって重要である。 There is an increasing number of medical use-cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches. For diabetic retinopathy grading as exemplary use case, we compare the impact of different training procedures including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits to supervised models. 
Self-supervised models with initialization from ImageNet pretraining not only report higher performance, they also reduce overfitting to large lesions along with improvements in taking into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use-case considered in this work.	翻訳日:2021-06-28 13:06:07 公開日:2021-06-25
# 多対多対応を考慮したテキストクエリによるビデオモーメント検索 Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair ( http://arxiv.org/abs/2106.13566v1 ) ライセンス: Link先を確認	Sho Maeoki, Yusuke Mukuta, Tatsuya Harada	(参考訳) 本稿では,ビデオコーパスからテキストベースの映像モーメント検索を行う。モデルをトレーニングするために、テキストモーメントペアデータセットを使用して正しい対応を学習した。典型的な訓練法では、接地型テキストモーメントペアは正の対として、他のペアは負の対として用いられる。しかし、地対と地対は別として、一部の文対は正と見なすべきである。この場合、1つのテキストアノテーションは多くのビデオモーメントに対して陽性となる。逆に、あるビデオモーメントは多くのテキストアノテーションに対応できる。したがって、テキストアノテーションとビデオモーメントの間には多くの対多の対応がある。これらの対応に基づき、基礎的真理として与えられていなくても否定的でない潜在的に関連性のあるペアを形成することができ、そのような関連性のあるペアを効果的にトレーニングに組み込むことで、検索性能を向上させることができる。テキストクエリは、ビデオの瞬間に起きていることを記述すべきである。したがって、類似したアクションを含む類似のテキストでアノテートされた異なるビデオモーメントは、類似のアクションを持つ可能性が高いため、これらのペアは関連するペアと見なすことができる。本稿では,テキストアノテーションに関する言語解析に基づいて,潜在的に関連のあるペアを活用できる新しい学習手法を提案する。 2つのベンチマークデータセットを用いた実験により,本手法は定量的かつ定性的に検索性能を向上することがわかった。 In this paper we undertake the task of text-based video moment retrieval from a corpus of videos. To train the model, text-moment paired datasets were used to learn the correct correspondences. In typical training methods, ground-truth text-moment pairs are used as positive pairs, whereas other pairs are regarded as negative pairs. However, aside from the ground-truth pairs, some text-moment pairs should be regarded as positive. In this case, one text annotation can be positive for many video moments. Conversely, one video moment can be corresponded to many text annotations. Thus, there are many-to-many correspondences between the text annotations and video moments. Based on these correspondences, we can form potentially relevant pairs, which are not given as ground truth yet are not negative; effectively incorporating such relevant pairs into training can improve the retrieval performance. The text query should describe what is happening in a video moment. 
Hence, different video moments annotated with similar texts that describe a similar action are likely to contain a similar action, so these pairs can be considered potentially relevant pairs. In this paper, we propose a novel training method that takes advantage of potentially relevant pairs, which are detected through linguistic analysis of the text annotations. Experiments on two benchmark datasets revealed that our method improves the retrieval performance both quantitatively and qualitatively.	翻訳日:2021-06-28 13:05:48 公開日:2021-06-25
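As a rough illustration of the potentially relevant pairs described in the abstract above, the following Python sketch labels text-moment pairs in three ways instead of two. The abstract only says that such pairs are detected by linguistic analysis of the text annotations. The token-overlap (Jaccard) similarity, the 0.5 threshold, and the names `jaccard` and `label_pairs` are assumptions made for this example and are not the authors' method.

```python
def jaccard(text_a, text_b):
    """Token-overlap similarity between two annotations.

    Assumed stand-in for the linguistic analysis mentioned in the abstract.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def label_pairs(annotations, threshold=0.5):
    """Label every (text i, moment j) pair.

    Moment j is the video moment that annotation j was written for.
    The threshold is a hypothetical value chosen for illustration.
    """
    labels = {}
    for i, text_i in enumerate(annotations):
        for j, text_j in enumerate(annotations):
            if i == j:
                labels[(i, j)] = "positive"  # ground-truth pair
            elif jaccard(text_i, text_j) >= threshold:
                labels[(i, j)] = "potentially_relevant"  # not used as a negative
            else:
                labels[(i, j)] = "negative"
    return labels


annotations = [
    "a man opens the door",
    "the man opens a door",
    "a dog runs on grass",
]
labels = label_pairs(annotations)
```

In a training loop, pairs labeled `potentially_relevant` could be excluded from the negatives or given a softer target; which of the two the paper does is not stated in the abstract.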
# 単眼RGB映像からのアニマタブルニューラルラジアンス場 Animatable Neural Radiance Fields from Monocular RGB Video ( http://arxiv.org/abs/2106.13629v1 ) ライセンス: Link先を確認	Jianchuan Chen, Ying Zhang, Di Kang, Xuefei Zhe, Linchao Bao, Huchuan Lu	(参考訳) 単眼ビデオからの詳細な人体アバター作成のためのアニマタブル神経放射場を提案する。提案手法は,シーン表現ネットワークを学習しながら,明示的なポーズ誘導変形を導入することで,人間の動きを伴う動的シーンにニューラルレイディアンス場(NeRF)を拡張する。特に、各フレームの人間のポーズを推定し、詳細な人間のテンプレートに対して一定の標準空間を学習し、ポーズパラメータの明示的な制御の下で観察空間から標準空間への自然な形状変形を可能にする。不正確なポーズ推定を補うために、学習過程における最初のポーズを更新するポーズ改善戦略を導入し、より正確な人間の再構築を学ぶだけでなく、収束を加速させる。実験の結果, 提案手法は, 1) 質の高い細部を持つ暗黙の人間の形状と外観の復元, 2) 任意の視点からの人間の写真リアルなレンダリング, 3) 任意のポーズを持つ人間のアニメーションを実現する。 We present animatable neural radiance fields for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields (NeRF) to the dynamic scenes with human movements via introducing explicit pose-guided deformation while learning the scene representation network. In particular, we estimate the human pose for each frame and learn a constant canonical space for the detailed human template, which enables natural shape deformation from the observation space to the canonical space under the explicit control of the pose parameters. To compensate for inaccurate pose estimation, we introduce the pose refinement strategy that updates the initial pose during the learning process, which not only helps to learn more accurate human reconstruction but also accelerates the convergence. In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.	翻訳日:2021-06-28 13:05:29 公開日:2021-06-25
# 公平かつ解釈可能な表現学習のための投影的遠近法:3次元顔形状解析への応用 Projection-wise Disentangling for Fair and Interpretable Representation Learning: Application to 3D Facial Shape Analysis ( http://arxiv.org/abs/2106.13734v1 ) ライセンス: Link先を確認	Xianjing Liu, Bo Li, Esther Bron, Wiro Niessen, Eppo Wolvius and Gennady Roshchupkin	(参考訳) 合流バイアスは、特に臨床実践において、機械学習を実践する上で重要な問題である。我々は,複数のバイアスに依存しない学習表現の問題を考える。文学では、これは主にバイアス情報を学習した表現から取り除くことで解決される。しかし我々は,この戦略が表現における情報の多様性を損なうことを期待し,その将来的な利用(解釈など)を制限する。そこで本研究では,ほぼすべての情報を潜在表現に保持しながらバイアスを軽減することを提案する。これを実現するため,学習ベクトル方向に潜在機能を投影し,すべての学習特徴よりもバイアスと予測特徴の独立性を強制する。投影特徴と入力データとのマッピングを解釈するために,学習ベクトル方向に沿ってサンプリングと再構成を行うプロジェクションワイド・アンタングリングを提案する。提案手法は3次元顔の形状と患者特性(n=5011)の分析に基づいて評価した。実験により、この概念的に単純な手法は、最先端の公正な予測性能と解釈性を達成し、臨床応用への大きな可能性を示した。 Confounding bias is a crucial problem when applying machine learning to practice, especially in clinical practice. We consider the problem of learning representations independent to multiple biases. In literature, this is mostly solved by purging the bias information from learned representations. We however expect this strategy to harm the diversity of information in the representation, and thus limiting its prospective usage (e.g., interpretation). Therefore, we propose to mitigate the bias while keeping almost all information in the latent representations, which enables us to observe and interpret them as well. To achieve this, we project latent features onto a learned vector direction, and enforce the independence between biases and projected features rather than all learned features. To interpret the mapping between projected features and input data, we propose projection-wise disentangling: a sampling and reconstruction along the learned vector direction. The proposed method was evaluated on the analysis of 3D facial shape and patient characteristics (N=5011). 
Experiments showed that this conceptually simple method achieved state-of-the-art fair prediction performance and interpretability, showing its great potential for clinical applications.	翻訳日:2021-06-28 13:05:13 公開日:2021-06-25
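The projection step in the abstract above can be pictured with a small Python sketch: latent features are projected onto one direction, and dependence on a bias variable is checked only for that projected feature. The abstract does not say how independence is enforced or measured, so the Pearson correlation used here, the toy data, and the names `project` and `pearson` are assumptions for illustration only.

```python
import math


def project(latents, direction):
    """Project each latent vector onto the unit vector along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [sum(z * u for z, u in zip(row, unit)) for row in latents]


def pearson(a, b):
    """Pearson correlation, used here as an assumed proxy for dependence."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy latent space: dimension 0 tracks the bias, dimension 1 does not.
latents = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
bias = [1.0, 2.0, 3.0, 4.0]

corr_biased = pearson(project(latents, [1.0, 0.0]), bias)
corr_fair = pearson(project(latents, [0.0, 1.0]), bias)
```

The full latent vectors still contain the bias information in dimension 0. Only the feature projected along `[0.0, 1.0]` is uncorrelated with it, which mirrors the abstract's point that information is kept in the representation rather than purged.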
# 「ゼロショット」ポイントクラウドアップサンプリング "Zero Shot" Point Cloud Upsampling ( http://arxiv.org/abs/2106.13765v1 ) ライセンス: Link先を確認	Kaiyue Zhou, Ming Dong, Suzan Arslanturk	(参考訳) ディープラーニングを使ったポイントクラウドのアップサンプリングは、ここ数年でさまざまな成果を上げている。近年の教師付き深層学習法は, 訓練データのサイズに制限されており, 点雲の形状を網羅する点で制限されている。さらに、そのような量のデータの取得は非現実的であり、ネットワークは一般に、見当たらないレコードで期待されたほど強力ではない。本稿では,ゼロショット (Zero Shot) Point Cloud Upsampling (ZSPU) と呼ばれる,点群を全体的なレベルでアップサンプリングする教師なし手法を提案する。我々のアプローチは、自己学習とテストの両方のフェーズにパッチを当てることなく、特定のポイントクラウドが提供する内部情報のみに基づいています。このシングルストリーム設計は、低解像度(LR)点雲と高解像度(HR)雲の関係を学習することにより、アップサンプリングタスクのトレーニング時間を著しく短縮する。このアソシエーションは、元の点雲が入力としてロードされたときに超解像(SR)出力を提供する。ベンチマークポイントクラウドデータセット上で、他のアップサンプリング手法と比較して、競合性能を示す。さらに、ZSPUは複雑な局所的な詳細や高い曲率を持つ形状の質的な結果を得る。 Point cloud upsampling using deep learning has received considerable attention in the past few years. Recent supervised deep learning methods are restricted by the size of their training data and are limited in how well they cover all shapes of point clouds. Besides, acquiring such an amount of data is unrealistic, and the network generally performs worse than expected on unseen records. In this paper, we present an unsupervised approach that upsamples point clouds internally at a holistic level, referred to as "Zero Shot" Point Cloud Upsampling (ZSPU). Our approach is based solely on the internal information provided by a particular point cloud, without patching, in both the self-training and testing phases. This single-stream design significantly reduces the training time of the upsampling task by learning the relation between low-resolution (LR) point clouds and their high (original) resolution (HR) counterparts. This association provides super-resolution (SR) outputs when original point clouds are loaded as input. We demonstrate competitive performance on benchmark point cloud datasets when compared to other upsampling methods. Furthermore, ZSPU achieves superior qualitative results on shapes with complex local details or high curvatures.	翻訳日:2021-06-28 13:04:55 公開日:2021-06-25
# 逆学習による信頼グラフニューラルネットワークの説明 Reliable Graph Neural Network Explanations Through Adversarial Training ( http://arxiv.org/abs/2106.13427v1 ) ライセンス: Link先を確認	Donald Loveland, Shusen Liu, Bhavya Kailkhura, Anna Hiszpanski, Yong Han	(参考訳) グラフニューラルネットワーク(GNN)の説明は大半がポストホックイントロスペクションによって進められている。これは成功と見なされているが、多くのポストホックな説明方法はモデルの学習した表現を捉えるのに失敗することが示されている。この問題のため、モデルをどのようにトレーニングして、ポストホック解析がより快適になるか検討する価値がある。コンピュータビジョン領域における、より信頼性の高い表現でモデルを訓練するための逆トレーニングの成功を踏まえ、GNNの同様の訓練パラダイムを提案し、モデルの説明に対するそれぞれの影響を分析する。基底的真理ラベルのない例では、説明法がモデルの学習した表現を新しいメトリックを通していかにうまく活用しているかを判断し、逆行訓練が化学におけるドメイン関連洞察の抽出に役立つことを示す。 Graph neural network (GNN) explanations have largely been facilitated through post-hoc introspection. While this has been deemed successful, many post-hoc explanation methods have been shown to fail in capturing a model's learned representation. Due to this problem, it is worthwhile to consider how one might train a model so that it is more amenable to post-hoc analysis. Given the success of adversarial training in the computer vision domain to train models with more reliable representations, we propose a similar training paradigm for GNNs and analyze the respective impact on a model's explanations. In instances without ground truth labels, we also determine how well an explanation method is utilizing a model's learned representation through a new metric and demonstrate adversarial training can help better extract domain-relevant insights in chemistry.	翻訳日:2021-06-28 13:03:35 公開日:2021-06-25
# 時間グラフ信号分解 Temporal Graph Signal Decomposition ( http://arxiv.org/abs/2106.13517v1 ) ライセンス: Link先を確認	Maxwell McNeil and Lin Zhang and Petko Bogdanov	(参考訳) 時間グラフ信号は、固定グラフ構造のノードに関連付けられた個々のコンポーネントを持つ多変量時系列である。この種のデータは、ソーシャルネットワークユーザーの活動、時間の経過とともにセンサーネットワークを読み取ること、モデル生物の相互作用ネットワーク内の時間コース遺伝子表現など、多くの領域で発生する。このようなデータに適用される従来の行列分解法は、基礎となるグラフにエンコードされた構造的規則性や、信号の時間的パターンを活用できない。このような構造を考慮すれば、時間グラフ信号の簡潔かつ解釈可能な表現が得られるか。本稿では、時間グラフ信号分解(TGSD)のための一般的な辞書ベースのフレームワークを提案する。鍵となるアイデアは、グラフと時間辞書を組み合わせることで、データの低ランクでジョイントなエンコーディングを学ぶことである。本稿では, 完全データと不完全データの両方に対する高度にスケーラブルな分解アルゴリズムを提案し, 行列分解, 欠落値の計算, 時間的補間, クラスタリング, 周期推定, および, 交通パターンからソーシャルメディア活動まで, 実世界のデータにおけるランク推定の利点を示す。観察の75%が欠落している時, 時間的補間のための基準線に比べてRMSEの28%の減少が達成された。ベースライン間では、350万のデータポイントで20秒未満でスケールし、最も控えめなモデルを生成する。我々の知る限りでは、TGSDは時間辞書とグラフ辞書によってグラフ信号を共同でモデル化する最初のフレームワークである。 Temporal graph signals are multivariate time series with individual components associated with nodes of a fixed graph structure. Data of this kind arises in many domains including activity of social network users, sensor network readings over time, and time course gene expression within the interaction network of a model organism. Traditional matrix decomposition methods applied to such data fall short of exploiting structural regularities encoded in the underlying graph and also in the temporal patterns of the signal. How can we take into account such structure to obtain a succinct and interpretable representation of temporal graph signals? We propose a general, dictionary-based framework for temporal graph signal decomposition (TGSD). The key idea is to learn a low-rank, joint encoding of the data via a combination of graph and time dictionaries. 
We propose a highly scalable decomposition algorithm for both complete and incomplete data, and demonstrate its advantage for matrix decomposition, imputation of missing values, temporal interpolation, clustering, period estimation, and rank estimation in synthetic and real-world data ranging from traffic patterns to social media activity. Our framework achieves 28% reduction in RMSE compared to baselines for temporal interpolation when as many as 75% of the observations are missing. It scales best among baselines taking under 20 seconds on 3.5 million data points and produces the most parsimonious models. To the best of our knowledge, TGSD is the first framework to jointly model graph signals by temporal and graph dictionaries.	翻訳日:2021-06-28 13:03:22 公開日:2021-06-25
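One way to picture the "low-rank, joint encoding of the data via a combination of graph and time dictionaries" is a factorization in which a nodes-by-time signal is rebuilt as graph dictionary × Y × W × time dictionary, with Y and W sharing a small inner rank. The abstract does not spell out the formula, so this form, the identity matrices used as placeholder dictionaries, and every name below are assumptions. The sketch shows only the reconstruction and the size of the encoding, not the learning algorithm.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(size):
    """Placeholder dictionary; a real one might come from the graph or a Fourier basis."""
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]


def reconstruct(graph_dict, y, w, time_dict):
    """Rebuild the nodes-by-time signal from the two low-rank encoding matrices."""
    return matmul(matmul(matmul(graph_dict, y), w), time_dict)


def encoding_size(n_nodes, n_steps, rank):
    """Number of coefficients stored in Y (n_nodes x rank) and W (rank x n_steps)."""
    return rank * (n_nodes + n_steps)


n_nodes, n_steps, rank = 3, 4, 1
psi = identity(n_nodes)   # hypothetical graph dictionary
phi = identity(n_steps)   # hypothetical time dictionary
y = [[1], [2], [3]]       # per-node coefficients
w = [[1, 0, 1, 0]]        # per-time-step coefficients
x_hat = reconstruct(psi, y, w, phi)
```

In this toy case the rank-1 encoding stores 7 coefficients in place of the 12 entries of the full signal. The gap widens as the number of nodes and time steps grows while the rank stays small.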
# 有限要素畳み込みニューラルネットワーク(fe-cnn)による構造トポロジ最適化の高速化 A mechanistic-based data-driven approach to accelerate structural topology optimization through finite element convolutional neural network (FE-CNN) ( http://arxiv.org/abs/2106.13652v1 ) ライセンス: Link先を確認	Tianle Yue, Hang Yang, Zongliang Du, Chang Liu, Khalil I. Elkhodary, Shan Tang, Xu Guo	(参考訳) 本稿では, 内部で開発された有限要素畳み込みニューラルネットワーク(FE-CNN)を用いて, 構造トポロジ最適化を高速化するメカニスティックなデータ駆動手法を提案する。我々のアプローチは、オフライントレーニングとオンライン最適化の2つの段階に分けられる。オフライントレーニングでは、所定の設計ドメインの高解像度表現と低解像度表現の間にマッピング関数が構築される。このマッピングはFE-CNNによって表現され、異なる解像度の設計領域間で共通の目的関数値(例えば、構造的コンプライアンス)をターゲットにしている。オンライン最適化では、訓練されたマッピング機能により、高解像度の任意の設計領域を低解像度に還元する。従って、オリジナルの高解像度ドメインは、低解像度バージョンのみで実行される計算と、高解像度ドメインへの逆マッピングによって設計されている。数値例は、このアプローチが計算時間の最大桁まで最適化を加速できることを示しています。したがって,提案手法は密度に基づく構造トポロジー最適化によって生じる次元の呪いを克服する大きな可能性を示す。本研究のアプローチの限界についても論じる。 In this paper, a mechanistic data-driven approach is proposed to accelerate structural topology optimization, employing an in-house developed finite element convolutional neural network (FE-CNN). Our approach can be divided into two stages: offline training, and online optimization. During offline training, a mapping function is built between high and low resolution representations of a given design domain. The mapping is expressed by a FE-CNN, which targets a common objective function value (e.g., structural compliance) across design domains of differing resolutions. During online optimization, an arbitrary design domain of high resolution is reduced to low resolution through the trained mapping function. The original high-resolution domain is thus designed by computations performed on only the low-resolution version, followed by an inverse mapping back to the high-resolution domain. Numerical examples demonstrate that this approach can accelerate optimization by up to an order of magnitude in computational time. 
Our proposed approach therefore shows great potential to overcome the curse-of-dimensionality incurred by density-based structural topology optimization. The limitation of our present approach is also discussed.	翻訳日:2021-06-28 13:02:58 公開日:2021-06-25
# cadda:脳波信号に対するクラス別自動微分可能データ拡張 CADDA: Class-wise Automatic Differentiable Data Augmentation for EEG Signals ( http://arxiv.org/abs/2106.13695v1 ) ライセンス: Link先を確認	Cédric Rommel, Thomas Moreau, Alexandre Gramfort	(参考訳) データ拡張はディープラーニングパイプラインの重要な要素であり、ラベルを不変に保つ入力データの変換に関するトレーニング中にネットワークに通知する。しかし、与えられたパイプラインの適切な拡張メソッドとパラメータを手動で見つけるのは、急速に面倒です。特に、直観は画像に対してこの決定を導くことができるが、神経科学信号のようなより複雑なデータに対して、拡張ポリシーの設計と選択は不明確である。さらに、このような構造化データにはラベル独立戦略が適さない場合や、クラス依存の強化が必要かもしれない。カーイメージの色を変えることは、予測されるオブジェクトクラスを変えるのではなく、オレンジの画像に同じことをすることです。本稿では,データ拡張による一般化能力の向上を目的とする。しかし、クラスに依存した変換を求めるとタスクの複雑さが大きくなり、既存のほとんどの自動手法による勾配のない最適化手法が現実のデータセットにとって難解になる。そこで本研究では,勾配に基づく学習に適した微分可能データ拡張法を提案する。脳波信号は、良い拡張ポリシーがほとんど知られていないデータの完璧な例です。本研究は,臨床関連睡眠ステージ分類課題に対する我々のアプローチの意義を実証するものであり,また,異なる変換も提案する。 Data augmentation is a key element of deep learning pipelines, as it informs the network during training about transformations of the input data that keep the label unchanged. Manually finding adequate augmentation methods and parameters for a given pipeline however quickly becomes cumbersome. In particular, while intuition can guide this decision for images, the design and choice of augmentation policies remains unclear for more complex types of data, such as neuroscience signals. Moreover, label independent strategies might not be suitable for such structured data and class-dependent augmentations might be necessary. This idea has been surprisingly unexplored in the literature, while it is quite intuitive: changing the color of a car image does not change the object class to be predicted, but doing the same to the picture of an orange does. This paper aims to increase the generalization power added through class-wise data augmentation. Yet, as seeking transformations depending on the class largely increases the complexity of the task, using gradient-free optimization techniques as done by most existing automatic approaches becomes intractable for real-world datasets. 
For this reason we propose to use differentiable data augmentation amenable to gradient-based learning. EEG signals are a perfect example of data for which good augmentation policies are mostly unknown. In this work, we demonstrate the relevance of our approach on the clinically relevant sleep staging classification task, for which we also propose differentiable transformations.	翻訳日:2021-06-28 13:02:38 公開日:2021-06-25
# Ranger21: シナジスティックなディープラーニングオプティマイザ Ranger21: a synergistic deep learning optimizer ( http://arxiv.org/abs/2106.13731v1 ) ライセンス: Link先を確認	Less Wright and Nestor Demeure	(参考訳) ニューラルネットワークの性能に最適化器が不可欠であるため、毎年多くの論文が発表されている。しかし、これらの出版物の多くは既存のアルゴリズムを漸進的に改善しているが、それらは構成可能なアルゴリズムではなく、新しい最適化として提示される傾向がある。このように、初期の出版物から多くの価値ある改善が見られることは滅多にない。この未解決の可能性を生かして、AdamWと8つのコンポーネントを組み合わせた新しいオプティマイザ Ranger21 を紹介し、文献からアイデアをレビューおよびテストした後、慎重に選択する。その結果、オプティマイザは検証精度とトレーニング速度を大幅に改善し、スムーズなトレーニング曲線を提供し、バッチ正規化レイヤなしでImageNet2012上でResNet50をトレーニングできることがわかった。これはAdamWが体系的に悪い初期状態に留まってしまう問題である。 As optimizers are critical to the performance of neural networks, every year a large number of papers innovating on the subject are published. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than composable algorithms. Thus, many worthwhile improvements are rarely seen outside of their initial publication. Taking advantage of this untapped potential, we introduce Ranger21, a new optimizer which combines AdamW with eight components, carefully selected after reviewing and testing ideas from the literature. We found that the resulting optimizer provides significantly improved validation accuracy and training speed, smoother training curves, and is even able to train a ResNet50 on ImageNet2012 without Batch Normalization layers, a problem on which AdamW stays systematically stuck in a bad initial state.	翻訳日:2021-06-28 13:02:18 公開日:2021-06-25
# Jitter:ランダムジッタリング損失関数 Jitter: Random Jittering Loss Function ( http://arxiv.org/abs/2106.13749v1 ) ライセンス: Link先を確認	Zhicheng Cai, Chenglei Peng and Sidan Du	(参考訳) 正規化は機械学習の最適化において重要な役割を果たす。フラッディングと呼ばれる新しい正規化手法により、トレーニング損失はフラッディングレベル付近で変動する。一般化を促進するために、フラットな損失の風景に達するまで、モデルをランダムに歩き続けることを意図しています。しかし、洪水法のハイパーパラメータフラッディングレベルを適切に均一に選択することができない。そこで我々は,jitter という新しい手法を提案する。 jitterは本質的にランダムな損失関数の一種です。トレーニング前に、特定の確率分布からジッタ点をランダムにサンプリングする。浸水レベルをジッターポイントに置き換えて新しい目標関数を取得し、それに従ってモデルを訓練する必要がある。ランダムな要素として作用するジッター点は、実際に損失関数にランダム性を加えるが、これは機械学習モデルの学習プロセスに無数のランダムな振る舞いが存在するという事実と一致し、モデルをより堅牢にすることが期待される。さらに、jitterはランダムにランダムにウォークを行い、損失曲線を小さな間隔に分けて反転させ、損失曲線をよりフラットにし、一般化能力を高める。さらに、Jitterはドメイン、タスク、モデルに依存しない正規化手法であり、トレーニングエラーがゼロになった後にモデルを効果的に訓練することができる。実験の結果,jitter法では,従来のフラッディング法よりもモデル性能が大幅に向上し,試験損失曲線を2回降下できることがわかった。 Regularization plays a vital role in machine learning optimization. One novel regularization method called flooding makes the training loss fluctuate around the flooding level. It intends to make the model continue to random walk until it comes to a flat loss landscape to enhance generalization. However, the hyper-parameter flooding level of the flooding method fails to be selected properly and uniformly. We propose a novel method called Jitter to improve it. Jitter is essentially a kind of random loss function. Before training, we randomly sample the Jitter Point from a specific probability distribution. The flooding level should be replaced by Jitter point to obtain a new target function and train the model accordingly. As Jitter point acting as a random factor, we actually add some randomness to the loss function, which is consistent with the fact that there exists innumerable random behaviors in the learning process of the machine learning model and is supposed to make the model more robust. 
In addition, Jitter performs a random walk that divides the loss curve into small intervals and flips them over, ideally making the loss curve much flatter and enhancing generalization ability. Moreover, Jitter can serve as a domain-, task-, and model-independent regularization method and can train the model effectively after the training error reduces to zero. Our experimental results show that the Jitter method can improve model performance more significantly than the previous flooding method and make the test loss curve descend twice.	翻訳日:2021-06-28 13:02:03 公開日:2021-06-25
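A minimal Python sketch of the idea in the Jitter abstract above. It assumes the flooding objective has the commonly cited form |loss − b| + b for a fixed flooding level b, and that the Jitter point is drawn from a Gaussian. The abstract only says "a specific probability distribution", so the distribution, its parameters, and all function names here are illustrative assumptions.

```python
import random


def flooding_loss(loss, flood_level):
    """Flooding objective: reflect the loss around a fixed level b."""
    return abs(loss - flood_level) + flood_level


def sample_jitter_point(rng, mean=0.05, std=0.01):
    """Draw a jitter point; the Gaussian and its parameters are assumptions."""
    return max(0.0, rng.gauss(mean, std))


def jitter_loss(loss, jitter_point):
    """Same form as flooding, with the fixed level replaced by a sampled point."""
    return abs(loss - jitter_point) + jitter_point


rng = random.Random(0)
jitter_point = sample_jitter_point(rng)

above = jitter_loss(0.5, jitter_point)   # loss above the point: unchanged
below = jitter_loss(0.0, jitter_point)   # loss below the point: reflected upward
```

When the training loss is above the sampled point the objective equals the plain loss, so training proceeds as usual. Once the loss drops below the point, the sign of the gradient flips, which is what keeps the loss fluctuating around a level that is now random instead of hand-picked.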
# 等分散グラフネットワークにおけるデータ効率 Data efficiency in graph networks through equivariance ( http://arxiv.org/abs/2106.13786v1 ) ライセンス: Link先を確認	Francesco Farina, Emma Slade	(参考訳) 本稿では,隣接ノード間の距離を保つ座標埋め込み内の任意の変換に同値なグラフネットワークのための新しいアーキテクチャを提案する。特に、n$-次元におけるユークリッド群と共形直交群とに同値である。その同値性のおかげで、提案モデルは古典的なグラフアーキテクチャに関して非常にデータ効率が良く、本質的にはより優れた帰納バイアスを備える。提案するアーキテクチャは、最小限のデータ量で学習することで、合成問題で見つからないデータに完全に一般化できる一方で、標準モデルからより多くのトレーニングデータが、同等のパフォーマンスに達するのに必要であることを示す。 We introduce a novel architecture for graph networks which is equivariant to any transformation in the coordinate embeddings that preserves the distance between neighbouring nodes. In particular, it is equivariant to the Euclidean and conformal orthogonal groups in $n$-dimensions. Thanks to its equivariance properties, the proposed model is extremely more data efficient with respect to classical graph architectures and also intrinsically equipped with a better inductive bias. We show that, learning on a minimal amount of data, the architecture we propose can perfectly generalise to unseen data in a synthetic problem, while much more training data are required from a standard model to reach comparable performance.	翻訳日:2021-06-28 13:01:43 公開日:2021-06-25
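The property this abstract relies on, that some coordinate transformations leave the distances between neighbouring nodes unchanged, can be checked numerically for the simplest case of a 2-D rotation plus translation. The Python sketch below is not the proposed architecture. The point set, the edge list, and the function names are assumptions chosen for illustration.

```python
import math


def rotate_translate(points, angle, shift):
    """Apply a 2-D Euclidean transformation (rotation, then translation)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (cos_a * x - sin_a * y + shift[0], sin_a * x + cos_a * y + shift[1])
        for x, y in points
    ]


def edge_lengths(points, edges):
    """Distance between the two endpoints of every edge."""
    return [math.dist(points[i], points[j]) for i, j in edges]


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]   # node coordinates
edges = [(0, 1), (1, 2)]                         # neighbouring nodes

moved = rotate_translate(points, angle=0.7, shift=(3.0, -1.0))
before = edge_lengths(points, edges)
after = edge_lengths(moved, edges)
```

A network whose messages depend on coordinates only through such edge lengths produces the same node features before and after the transformation, so it does not need to see rotated or shifted copies of the training data. This is one intuition for the data efficiency claimed above.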
# 知識グラフに基づくソフトウェア定義ネットワークの自律管理に向けて Towards A Knowledge Graph Based Autonomic Management of Software Defined Networks ( http://arxiv.org/abs/2106.13367v1 ) ライセンス: Link先を確認	Qianru Zhou and Alasdair J.G. Gray and Stephen McLaughlin	(参考訳) 人工知能技術による自動ネットワーク管理は何十年にもわたって熱く議論されてきた。しかし、現在の報告では、主に理論的な提案とアーキテクチャ設計に焦点を当てており、現実のネットワーク上での実践的実装に関する作業は未だ現れていない。本稿では,ソフトウェア定義ネットワーク(SDN)における自律的ネットワーク管理のための知識グラフ駆動型アプローチの実装に向けた取り組みについて述べる。 ToCoオントロジーによって駆動されるSeaNetは、Mininet(SDNエミュレータ)に基づいて再プログラムされる。それは3つのコアコンポーネント、ナレッジグラフジェネレータ、sparqlエンジン、ネットワーク管理apiで構成されている。知識グラフ生成器は、通信ネットワーク管理タスクの知識を、正式に表現されたオントロジー駆動モデルに表現する。エキスパートエクスペリエンスとネットワーク管理ルールはナレッジグラフに形式化することができ、SPARQLエンジンによって自動的に推論されることにより、Network Management APIはテクノロジ固有の詳細をパケット化し、テクノロジに依存しないインターフェースをユーザに公開することができる。同一言語pythonで実装された商用sdnコントローラryuとの比較により,提案手法を評価する実験を行った。評価の結果,ほとんどの場合,SeaNetはRyuよりかなり高速であり,SeaNetのコードははるかにコンパクトであることがわかった。 RDF推論の利点として、SeaNetは知識グラフの異なるスケールでO(1)時間複雑性を達成でき、一方従来のデータベースはO(nlogn)を最大限に達成できる。 SeaNetは、開発したネットワーク管理APIにより、研究者が自身のSDN上でセマンティック・インテリジェントなアプリケーションを開発できるようにする。 Automatic network management driven by artificial intelligence technologies has been heatedly discussed for decades. However, current reports mainly focus on theoretical proposals and architecture designs; work on practical implementations on real-life networks is yet to appear. This paper describes our effort toward the implementation of a knowledge graph driven approach for autonomic network management in software defined networks (SDNs), termed SeaNet. Driven by the ToCo ontology, SeaNet is reprogrammed based on Mininet (an SDN emulator). It consists of three core components: a knowledge graph generator, a SPARQL engine, and a network management API. The knowledge graph generator represents the knowledge in telecommunication network management tasks as a formally represented, ontology-driven model. 
Expert experience and network management rules can be formalized into the knowledge graph and automatically inferred by the SPARQL engine. The network management API encapsulates technology-specific details and exposes technology-independent interfaces to users. Experiments are carried out to evaluate the proposed work by comparing it with Ryu, a commercial SDN controller implemented in the same language, Python. The evaluation results show that SeaNet is considerably faster than Ryu in most circumstances and that the SeaNet code is significantly more compact. Benefiting from RDF reasoning, SeaNet is able to achieve O(1) time complexity on different scales of the knowledge graph, while a traditional database can achieve O(n log n) at best. With the developed network management API, SeaNet enables researchers to develop semantic-intelligent applications on their own SDNs.	翻訳日:2021-06-28 13:00:38 公開日:2021-06-25
# Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature ( http://arxiv.org/abs/2106.13375v1 ) License: check the linked page	Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon	Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. 
Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.	Translated: 2021-06-28 13:00:13 Published: 2021-06-25
# Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance ( http://arxiv.org/abs/2106.13479v1 ) License: check the linked page	Hieu-Thi Luong and Junichi Yamagishi	Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the output layer of the neural network, without much attention given to the hidden layers. However, by learning a useful latent representation, the system can be used in many more practical scenarios. In this paper, we investigate the use of quantized vectors to model the latent linguistic embedding and compare it with the continuous counterpart. By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding that takes on different properties while having similar performance in terms of quality and speaker similarity. Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptual evaluations, but has a discrete latent space that is useful for reducing the representation bit-rate, which is desirable for data transfer, or for limiting information leakage, which is important for speaker anonymization and other tasks of that nature.	Translated: 2021-06-28 12:59:53 Published: 2021-06-25
# Feature Grouping and Sparse Principal Component Analysis ( http://arxiv.org/abs/2106.13685v1 ) License: check the linked page	Haiyan Jiang, Shanshan Qin, Dejing Dou	Sparse Principal Component Analysis (SPCA) is widely used in data processing and dimension reduction; it uses the lasso to produce modified principal components with sparse loadings for better interpretability. However, sparse PCA never considers an additional grouping structure where the loadings share similar coefficients (i.e., feature grouping), besides a special group with all coefficients being zero (i.e., feature selection). In this paper, we propose a novel method called Feature Grouping and Sparse Principal Component Analysis (FGSPCA), which allows the loadings to belong to disjoint homogeneous groups, with sparsity as a special case. The proposed FGSPCA is a subspace learning method designed to simultaneously perform grouping pursuit and feature selection, by imposing a non-convex regularization with naturally adjustable sparsity and grouping effect. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that incorporates difference-of-convex programming, augmented Lagrangian, and coordinate descent methods. Additionally, the experimental results on real data sets show that the proposed FGSPCA benefits from the grouping effect compared with methods without a grouping effect.	Translated: 2021-06-28 12:59:34 Published: 2021-06-25
# Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs ( http://arxiv.org/abs/2106.13416v1 ) License: check the linked page	Yuki Endo, Yoshihiro Kanamori	Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multiple factors. To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces. Furthermore, we demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods via extensive experiments with real and synthetic datasets in three different domains. We also show that our method enables a wide range of applications in image synthesis and editing tasks.	Translated: 2021-06-28 12:58:14 Published: 2021-06-25
# Multifidelity Modeling for Physics-Informed Neural Networks (PINNs) ( http://arxiv.org/abs/2106.13361v1 ) License: check the linked page	Michael Penwarden, Shandian Zhe, Akil Narayan, Robert M. Kirby	Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature.	Translated: 2021-06-28 12:57:48 Published: 2021-06-25
# Subgraph Federated Learning with Missing Neighbor Generation ( http://arxiv.org/abs/2106.13430v1 ) License: check the linked page	Ke Zhang, Carl Yang, Xiaoxiao Li, Lichao Sun, Siu Ming Yiu	Graphs have been widely used in data mining and machine learning due to their unique representation of real-world objects and their interactions. As graphs are getting bigger and bigger nowadays, it is common to see their subgraphs separately collected and stored in multiple local systems. Therefore, it is natural to consider the subgraph federated learning setting, where each local system holds a small subgraph that may be biased from the distribution of the whole graph. Hence, subgraph federated learning aims to collaboratively train a powerful and generalizable graph mining model without directly sharing the local graph data. In this work, towards the novel yet realistic setting of subgraph federated learning, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; (2) FedSage+, which trains a missing neighbor generator along with FedSage to deal with missing links across local subgraphs. Empirical results on four real-world graph datasets with synthesized subgraph federated learning settings demonstrate the effectiveness and efficiency of our proposed techniques. 
At the same time, consistent theoretical implications are made towards their generalization ability on the global graphs.	Translated: 2021-06-28 12:57:34 Published: 2021-06-25
# Data-based Design of Inferential Sensors for Petrochemical Industry ( http://arxiv.org/abs/2106.13503v1 ) License: check the linked page	Martin Mojto, Karol Ľubušký, Miroslav Fikar and Radoslav Paulen	Inferential (or soft) sensors are used in industry to infer the values of imprecisely and rarely measured (or completely unmeasured) variables from variables measured online (e.g., pressures, temperatures). The main challenge in designing an effective inferential sensor, akin to classical model overfitting, is the selection of a correct structure of the sensor. The sensor structure is represented by the number of inputs to the sensor, which correspond to the variables measured online and their (simple) combinations. This work is focused on the design of inferential sensors for product composition of an industrial distillation column in two oil refinery units, a Fluid Catalytic Cracking unit and a Vacuum Gasoil Hydrogenation unit. As the first design step, we use several well-known data pre-treatment (gross error detection) methods and compare the ability of these approaches to indicate systematic errors and outliers in the available industrial data. We then study the effectiveness of various methods for the design of inferential sensors, taking into account the complexity and accuracy of the resulting model. The effectiveness analysis indicates that the improvements achieved over the current inferential sensors are up to 19%.	Translated: 2021-06-28 12:57:13 Published: 2021-06-25
# Accelerated Computation of a High Dimensional Kolmogorov-Smirnov Distance ( http://arxiv.org/abs/2106.13706v1 ) License: check the linked page	Alex Hagen, Shane Jackson, James Kahn, Jan Strube, Isabel Haide, Karl Pazdernik, Connor Hainje	Statistical testing is widespread and critical for a variety of scientific disciplines. The advent of machine learning and the increase of computing power have increased the interest in the analysis and statistical testing of multidimensional data. We extend the powerful Kolmogorov-Smirnov two sample test to a high dimensional form in a similar manner to Fasano (Fasano, 1987). 
We call our result the d-dimensional Kolmogorov-Smirnov test (ddKS) and provide three novel contributions therewith: we develop an analytical equation for the significance of a given ddKS score, we provide an algorithm for computation of ddKS on modern computing hardware that is of constant time complexity for small sample sizes and dimensions, and we provide two approximate calculations of ddKS: one that reduces the time complexity to linear at larger sample sizes, and another that reduces the time complexity to linear with increasing dimension. We perform power analysis of ddKS and its approximations on a corpus of datasets and compare to other common high dimensional two sample tests and distances: Hotelling's T^2 test and Kullback-Leibler divergence. Our ddKS test performs well for all datasets, dimensions, and sizes tested, whereas the other tests and distances fail to reject the null hypothesis on at least one dataset. We therefore conclude that ddKS is a powerful multidimensional two sample test for general use, and can be calculated in a fast and efficient manner using our parallel or approximate methods. Open source implementations of all methods described in this work are located at https://github.com/pnnl/ddks.	Translated: 2021-06-28 12:56:54 Published: 2021-06-25
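To make the idea of a multidimensional KS distance concrete, here is a minimal brute-force sketch in the spirit of the Fasano-style orthant counting that the abstract refers to. It is not the authors' accelerated ddKS implementation: the function names `orthant_fractions` and `naive_ddks` are hypothetical, and the cost is roughly O(N^2 · 2^d), which is exactly the kind of scaling the paper's approximations aim to avoid.

```python
from itertools import product


def orthant_fractions(origin, sample):
    """Fraction of `sample` lying in each of the 2^d orthants around `origin`."""
    d = len(origin)
    counts = {signs: 0 for signs in product((False, True), repeat=d)}
    for point in sample:
        # An orthant is identified by which coordinates are >= the origin's.
        key = tuple(point[i] >= origin[i] for i in range(d))
        counts[key] += 1
    n = len(sample)
    return {key: c / n for key, c in counts.items()}


def naive_ddks(pred, true):
    """Largest orthant-fraction gap, using every point of both samples as origin."""
    distance = 0.0
    for origin in list(pred) + list(true):
        f_pred = orthant_fractions(origin, pred)
        f_true = orthant_fractions(origin, true)
        gap = max(abs(f_pred[k] - f_true[k]) for k in f_pred)
        distance = max(distance, gap)
    return distance
```

Under these assumptions, two identical samples give a distance of 0, and two samples that are fully separated along every axis give a distance of 1.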
# Reinforcement Learning for Mean Field Games, with Applications to Economics ( http://arxiv.org/abs/2106.13755v1 ) License: check the linked page	Andrea Angiuli and Jean-Pierre Fouque and Mathieu Lauriere	Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large finite number of agents and have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which leads to the development of reinforcement learning (RL) methods. After reviewing the literature on this topic, we present a two-timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm. The main novelty of this method is to simultaneously update an action-value function and a distribution, but with different rates, in a model-free fashion. Depending on the ratio of the two learning rates, the algorithm learns either the MFG or the MFC solution. 
To illustrate this method, we apply it to a mean field problem of accumulated consumption over a finite horizon with a HARA utility function, and to a trader's optimal liquidation problem.	Translated: 2021-06-28 12:56:03 Published: 2021-06-25
# HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters ( http://arxiv.org/abs/2106.13777v1 ) License: check the linked page	Gabriel Appleby, Mateus Espadoto, Rui Chen, Samuel Goree, Alexandru Telea, Erik W Anderson, Remco Chang	Projection algorithms such as t-SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter value is computationally intensive and unintuitive due to the stochastic nature of these methods. In this paper we propose HyperNP, a scalable method that allows for real-time interactive hyperparameter exploration of projection methods by training neural network approximations. HyperNP can be trained on a fraction of the total data instances and hyperparameter configurations and can compute projections for new data and hyperparameters at interactive speeds. HyperNP is compact in size and fast to compute, thus allowing it to be embedded in lightweight visualization systems such as web browsers. We evaluate HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP is accurate, scalable, interactive, and appropriate for use in real-world settings.	Translated: 2021-06-28 12:55:36 Published: 2021-06-25
# Binary Matrix Factorisation and Completion via Integer Programming ( http://arxiv.org/abs/2106.13434v1 ) License: check the linked page	Reka A. Kovacs, Oktay Gunluk, Raphael A. Hauser	Binary matrix factorisation is an essential tool for identifying discrete patterns in binary data. In this paper we consider the rank-k binary matrix factorisation problem (k-BMF) under Boolean arithmetic: we are given an n x m binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension n x k and k x m respectively, which minimise the distance between X and the Boolean product of A and B in the squared Frobenius distance. We present one compact and two exponential-size integer programs (IPs) for k-BMF and show that the compact IP has a weak LP relaxation, while the exponential-size IPs have a stronger equivalent LP relaxation. We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times the zero is erroneously covered in a rank-k factorisation. For one of the exponential-size IPs we describe a computational approach based on column generation. Experimental results on synthetic and real-world datasets suggest that our integer programming approach is competitive against available methods for k-BMF and provides accurate low-error factorisations.	Translated: 2021-06-28 12:54:20 Published: 2021-06-25
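The k-BMF objective described above can be illustrated with a short sketch. It only evaluates a given factorisation and does not implement the paper's integer programs or column generation. The helper names `boolean_product` and `factorisation_error` are hypothetical, and representing missing entries of X as `None` is an assumption made for this example.

```python
def boolean_product(A, B):
    """Boolean matrix product: Z[i][j] = OR over l of (A[i][l] AND B[l][j])."""
    k = len(B)
    m = len(B[0])
    return [[int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
            for row in A]


def factorisation_error(X, A, B):
    """Squared Frobenius distance between X and the Boolean product of A and B,
    summed over observed entries only (missing entries of X are None)."""
    Z = boolean_product(A, B)
    return sum((x - z) ** 2
               for x_row, z_row in zip(X, Z)
               for x, z in zip(x_row, z_row)
               if x is not None)


# A rank-2 Boolean factorisation that reproduces X exactly.
X = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
A = [[1, 0],
     [1, 1],
     [0, 1]]
B = [[1, 1, 0],
     [0, 1, 1]]
print(factorisation_error(X, A, B))
```

In this example the middle entry of the second row is covered by both rank-1 patterns. Under Boolean arithmetic it is still 1, whereas the ordinary matrix product would give 2.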
# Evaluation of Deep-Learning-Based Voice Activity Detectors and Room Impulse Response Models in Reverberant Environments ( http://arxiv.org/abs/2106.13511v1 ) License: check the linked page	Amir Ivry, Israel Cohen, Baruch Berdugo	State-of-the-art deep-learning-based voice activity detectors (VADs) are often trained with anechoic data. However, real acoustic environments are generally reverberant, which causes the performance to significantly deteriorate. To mitigate this mismatch between training data and real data, we simulate an augmented training set that contains nearly five million utterances. This extension comprises anechoic utterances and their reverberant modifications, generated by convolutions of the anechoic utterances with a variety of room impulse responses (RIRs). We consider five different models to generate RIRs, and five different VADs that are trained with the augmented training set. We test all trained systems in three different real reverberant environments. Experimental results show a 20% increase on average in accuracy, precision and recall for all detectors and response models, compared to anechoic training. Furthermore, one of the RIR models consistently yields better performance than the other models, for all the tested VADs. Additionally, one of the VADs consistently outperformed the other VADs in all experiments.	Translated: 2021-06-28 12:53:58 Published: 2021-06-25
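The augmentation step described above, convolving each anechoic utterance with a set of room impulse responses, can be sketched in a few lines. This is a dependency-free toy version with hypothetical function names (`convolve`, `augment`). A real pipeline would more likely use an FFT-based convolution from a signal-processing library, and the RIRs would come from one of the generation models the paper compares.

```python
def convolve(signal, rir):
    """Full discrete convolution of a signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(rir):
            out[n + k] += s * h
    return out


def augment(anechoic_utterances, rirs):
    """Return every anechoic utterance followed by one reverberant copy per RIR."""
    augmented = []
    for utterance in anechoic_utterances:
        augmented.append(list(utterance))  # keep the clean version
        for rir in rirs:
            augmented.append(convolve(utterance, rir))
    return augmented
```

With U utterances and R impulse responses, the augmented set holds U · (R + 1) signals, which is how a modest anechoic corpus can grow to millions of training utterances.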
# Littlestone Classes are Privately Online Learnable ( http://arxiv.org/abs/2106.13513v1 ) License: check the linked page	Noah Golowich and Roi Livni	We consider the problem of online classification under a privacy constraint. In this setting a learner sequentially observes a stream of labelled examples $(x_t, y_t)$, for $1 \leq t \leq T$, and returns at each iteration $t$ a hypothesis $h_t$ which is used to predict the label of each new example $x_t$. The learner's performance is measured by her regret against a known hypothesis class $\mathcal{H}$. We require that the algorithm satisfies the following privacy constraint: the sequence $h_1, \ldots, h_T$ of hypotheses output by the algorithm needs to be an $(\epsilon, \delta)$-differentially private function of the whole input sequence $(x_1, y_1), \ldots, (x_T, y_T)$. We provide the first non-trivial regret bound for the realizable setting. Specifically, we show that if the class $\mathcal{H}$ has constant Littlestone dimension then, given an oblivious sequence of labelled examples, there is a private learner that makes in expectation at most $O(\log T)$ mistakes, comparable to the optimal mistake bound in the non-private case, up to a logarithmic factor. 
Moreover, for general values of the Littlestone dimension $d$, the same mistake bound holds but with a factor that is doubly exponential in $d$. A recent line of work has demonstrated a strong connection between classes that are online learnable and those that are differentially-private learnable. Our results strengthen this connection and show that an online learning algorithm can in fact be directly privatized (in the realizable setting). We also discuss an adaptive setting and provide a sublinear regret bound of $O(\sqrt{T})$.	Translated: 2021-06-28 12:53:43 Published: 2021-06-25
# Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification ( http://arxiv.org/abs/2106.13514v1 ) License: check the linked page	Yan Liu, Zheng Li, Lin Li, Qingyang Hong	This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV). In the proposed structure, frame-level multi-task learning along with segment-level adversarial learning is adopted for speaker embedding extraction. Phoneme-aware attentive pooling is applied to the frame-level features in the main network for the speaker classifier, using the corresponding posterior probability of the phoneme distribution from the auxiliary subnet. Further, the introduction of the Squeeze-and-Excitation block (SE-block) performs dynamic channel-wise feature recalibration, which improves the representational ability. The proposed method exploits speaker idiosyncrasies associated with pass-phrases, and is further improved by the phoneme-aware attentive pooling and the SE-block from temporal and channel-wise aspects, respectively. The experiments conducted on the RSR2015 Part 1 database confirm that the proposed system achieves outstanding results for text-dependent SV.	Translated: 2021-06-28 12:53:06 Published: 2021-06-25
# Deep Residual Echo Suppression with A Tunable Tradeoff Between Signal Distortion and Echo Suppression ( http://arxiv.org/abs/2106.13531v1 ) License: check the linked page	Amir Ivry, Israel Cohen, Baruch Berdugo	In this paper, we propose a residual echo suppression method using a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. This system embeds a design parameter that allows a tunable tradeoff between the desired-signal distortion and residual echo suppression in double-talk scenarios. The system employs 136 thousand parameters, and requires 1.6 giga floating-point operations per second and 10 megabytes of memory. The implementation satisfies both the timing requirements of the AEC challenge and the computational and memory limitations of on-device applications. Experiments are conducted with 161 h of data from the AEC challenge database and from real independent recordings. We demonstrate the performance of the proposed system in real-life conditions and compare it with two competing methods regarding echo suppression and desired-signal distortion, generalization to various environments, and robustness to high echo levels.	Translated: 2021-06-28 12:52:51 Published: 2021-06-25
# Transient Stability Analysis with Physics-Informed Neural Networks ( http://arxiv.org/abs/2106.13638v1 ) License: check the linked page	Jochen Stiasny, Georgios S. Misyris, Spyros Chatzivasileiadis	Solving the ordinary differential equations that govern the power system is an indispensable part of transient stability analysis. However, the traditionally applied methods either carry a significant computational burden, require model simplifications, or use overly conservative surrogate models. Neural networks can circumvent these limitations but are faced with high demands on the datasets used. Furthermore, they are agnostic to the underlying governing equations. Physics-informed neural networks tackle this problem, and we explore their advantages and challenges in this paper. We illustrate the findings on the Kundur two-area system and highlight possible pathways forward in developing this method further.	Translated: 2021-06-28 12:52:35 Published: 2021-06-25
# Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning ( http://arxiv.org/abs/2106.13703v1 ) License: check the linked page	Alec Farid, Sushant Veer, Anirudha Majumdar	Our goal is to perform out-of-distribution (OOD) detection, i.e., to detect when a robot is operating in environments that are drawn from a different distribution than the environments used to train the robot. We leverage Probably Approximately Correct (PAC)-Bayes theory in order to train a policy with a guaranteed bound on performance on the training distribution. Our key idea for OOD detection then relies on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD. We formalize this via statistical techniques based on p-values and concentration inequalities. The resulting approach (i) provides guaranteed confidence bounds on OOD detection, and (ii) is task-driven and sensitive only to changes that impact the robot's performance. We demonstrate our approach on a simulated example of grasping objects with unfamiliar poses or shapes. We also present both simulation and hardware experiments for a drone performing vision-based obstacle avoidance in unfamiliar environments (including wind disturbances and different obstacle densities). Our examples demonstrate that we can perform task-driven OOD detection within just a handful of trials. 
Comparisons with baselines also demonstrate the advantages of our approach in terms of providing statistical guarantees and being insensitive to task-irrelevant distribution shifts.	翻訳日:2021-06-28 12:52:24 公開日:2021-06-25
# 完了したSDSS-IV拡張バリオン振動分光サーベイによる原始非ガウス性 I:カタログ作成と系統誤差の緩和 Primordial non-Gaussianity from the Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey I: Catalogue Preparation and Systematic Mitigation ( http://arxiv.org/abs/2106.13724v1 ) ライセンス: Link先を確認	Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Eva-Maria Mueller, Will J. Percival, Grant Merz, Reza Katebi, Razvan C. Bunescu, Julian Bautista, Joel R. Brownstein, Etienne Burtin, Kyle Dawson, Héctor Gil-Marín, Jiamin Hou, Eleanor B. Lyke, Axel de la Macorra, Graziano Rossi, Donald P. Schneider, Pauline Zarrouk, Gong-Bo Zhao	(参考訳) 最近完了した拡張バリオン振動分光サーベイ(eBOSS)によるクエーサーの最終分光サンプルの大規模クラスタリングを調べる。サンプルには、赤方偏移 $0.8<z<2.2$ の範囲の $343708$ 天体と、赤方偏移 $2.2<z<3.5$ の $72667$ 天体が含まれ、有効面積は $4699~{\rm deg}^{2}$ である。追観測分光のターゲット選択に用いた撮像データの品質の空間的変動に起因する、密度場の偽の揺らぎを緩和するため、ニューラルネットワークに基づく手法を開発する。実データと同じ角度分布と動径分布を持つシミュレーションを用いて、共分散行列を推定し、誤差解析を行い、残存する系統的不確かさを評価する。アルゴリズムの有効性を調べるため、eBOSSクエーサーの平均密度コントラストと、撮像系統誤差の潜在的な原因のマップとの相互相関を測定し、ニューラルネットワークに基づくアプローチが標準的な線形回帰よりも優れていることを示した。恒星密度は偽の揺らぎの最も重要な原因の1つであり、Gaia衛星のデータを用いて構築した新しいテンプレートが、観測されたクエーサーのクラスタリングに最もよく一致する。本研究の最終成果物は、非線形な撮像系統効果を補正する改良された重みを備えた新しい付加価値クエーサーカタログであり、公開される予定である。我々のクエーサーカタログは、姉妹論文(Mueller et al., 準備中)において局所型の原始非ガウス性の測定に用いられる。 We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate spurious fluctuations in the density field caused by spatial variations in the quality of the imaging data used to select targets for follow-up spectroscopy. Simulations are used with the same angular and radial distributions as the real data to estimate covariance matrices, perform error analyses, and assess residual systematic uncertainties. 
We measure the mean density contrast and cross-correlations of the eBOSS quasars against maps of potential sources of imaging systematics to address algorithm effectiveness, finding that the neural network-based approach outperforms standard linear regression. Stellar density is one of the most important sources of spurious fluctuations, and a new template constructed using data from the Gaia spacecraft provides the best match to the observed quasar clustering. The end-product from this work is a new value-added quasar catalogue with the improved weights to correct for nonlinear imaging systematic effects, which will be made public. Our quasar catalogue is used to measure the local-type primordial non-Gaussianity in our companion paper, Mueller et al. in preparation.	翻訳日:2021-06-28 12:52:04 公開日:2021-06-25
# ディープラーニングを用いた非線形音響エコーキャンセラ Nonlinear Acoustic Echo Cancellation with Deep Learning ( http://arxiv.org/abs/2106.13754v1 ) ライセンス: Link先を確認	Amir Ivry, Israel Cohen, Baruch Berdugo	(参考訳) 遠端信号から近端マイクロホンへのエコーパスを2つの部分でモデル化することを目的とした非線形音響エコーキャンセリングシステムを提案する。現代のハンズフリーデバイスの物理的挙動に着想を得て、まず、これらのデバイスが遠端信号の受信から再生までの間に引き起こす非線形歪みをモデル化するよう設計した、新しいニューラルネットワークアーキテクチャを導入する。デバイス間のばらつきを考慮し,メモリ長と非線形活性化関数を事前にパラメータ化せず,訓練データを用いて訓練段階で最適化する,学習可能なネットワークを構築する。次に、スピーカ出力とマイクの間のエコーパスを常に追跡する標準的な適応線形フィルタをネットワークの後段に置く。訓練中、ネットワークとフィルタはネットワークパラメータを学習するために同時に最適化される。このシステムは1万7千個のパラメータを必要とし、毎秒5億回の浮動小数点演算と40キロバイトのメモリを消費する。また、標準的なニューラルプロセッサ上でハンズフリー通信のタイミング要件を満たすため、ハンズフリー通信デバイスへの組み込みに適している。 280時間の実データと合成データを用いた実験で、競合手法と比較して有利な性能を示す。 We propose a nonlinear acoustic echo cancellation system, which aims to model the echo path from the far-end signal to the near-end microphone in two parts. Inspired by the physical behavior of modern hands-free devices, we first introduce a novel neural network architecture that is specifically designed to model the nonlinear distortions these devices induce between receiving and playing the far-end signal. To account for variations between devices, we construct this network with trainable memory length and nonlinear activation functions that are not parameterized in advance, but are rather optimized during the training stage using the training data. Second, the network is succeeded by a standard adaptive linear filter that constantly tracks the echo path between the loudspeaker output and the microphone. During training, the network and filter are jointly optimized to learn the network parameters. This system requires 17 thousand parameters that consume 500 Million floating-point operations per second and 40 Kilo-bytes of memory. It also satisfies hands-free communication timing requirements on a standard neural processor, which renders it adequate for embedding on hands-free communication devices. 
Using 280 hours of real and synthetic data, experiments show advantageous performance compared to competing methods.	翻訳日:2021-06-28 12:51:34 公開日:2021-06-25
# 拡散ネットに基づく過渡雑音環境における音声活動検出 Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets ( http://arxiv.org/abs/2106.13763v1 ) ライセンス: Link先を確認	Amir Ivry, Baruch Berdugo, Israel Cohen	(参考訳) 実生活シナリオにおいてしばしば発生する過渡音と定常音の音響環境における音声活動の検出に対処する。音声と非音声の空間的パターンを独立に学習し,その基礎となる幾何学的構造を学習する。このプロセスはディープエンコーダ-デコーダベースのニューラルネットワークアーキテクチャを通じて行われる。この構造は、時間的情報を持つスペクトル特徴を拡散写像法を適用して生成される低次元表現にマッピングするエンコーダを含んでいる。エンコーダは、埋め込みデータを高次元空間にマッピングするデコーダを供給する。非音声フレームから音声を分離するように訓練されたディープニューラルネットワークは、既知の拡散ネットアーキテクチャに似たエンコーダにデコーダを結合することで得られる。実験の結果, 競合音声活動検出法と比較して, 性能が向上した。この改善は精度、堅牢性、一般化能力の両方で達成される。我々のモデルはリアルタイムに動作し、音声ベースの通信システムに統合することができる。また,オフラインアプリケーションに対して,より高精度なバッチアルゴリズムを提案する。 We address voice activity detection in acoustic environments of transients and stationary noises, which often occur in real life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder-decoder based neural network architecture. This structure involves an encoder that maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, which is trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known Diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods. The improvement is achieved in both accuracy, robustness and generalization ability. Our model performs in a real-time manner and can be integrated into audio-based communication systems. We also present a batch algorithm which obtains an even higher accuracy for off-line applications.	翻訳日:2021-06-28 12:51:14 公開日:2021-06-25
# 分布検定における許容性の代償 The Price of Tolerance in Distribution Testing ( http://arxiv.org/abs/2106.13414v1 ) ライセンス: Link先を確認	Clément L. Canonne, Ayush Jain, Gautam Kamath, Jerry Li	(参考訳) 許容的分布検定の問題を再検討する。すなわち、$\{1, \dots, n\}$ 上の未知の分布 $p$ からのサンプルが与えられたとき、$p$ は参照分布 $q$ に(全変動距離で)$\varepsilon_1$-近いか、それとも $\varepsilon_2$-遠いかを判定する。過去10年間の大きな関心にもかかわらず、この問題は極端なケースでのみよく理解されている。ノイズのない設定(すなわち $\varepsilon_1 = 0$)では、サンプル複雑性は $\Theta(\sqrt{n})$ であり、ドメインサイズに対して強く劣線形である。スペクトルの反対側では、$\varepsilon_1 = \varepsilon_2/2$ の場合、サンプル複雑性はかろうじて劣線形な $\Theta(n/\log n)$ に跳ね上がる。しかし、中間領域についてはほとんど知られていない。我々は、分布検定における許容性の代償を、$n$, $\varepsilon_1$, $\varepsilon_2$ の関数として、単一の $\log n$ 因子を除いて完全に特徴づける。具体的には、サンプル複雑性が \[\tilde \Theta\left(\frac{\sqrt{n}}{\varepsilon_2^{2}} + \frac{n}{\log n} \cdot \max \left\{\frac{\varepsilon_1}{\varepsilon_2^2},\left(\frac{\varepsilon_1}{\varepsilon_2^2}\right)^{\!\!2}\right\}\right),\] であることを示し、これまで知られていた2つのケースの間の滑らかなトレードオフを与える。また、$p$ と $q$ の両方が未知である許容的同値性検定の問題についても同様の特徴づけを与える。驚くべきことに、いずれの場合も、サンプル複雑性を決定する主な量は比 $\varepsilon_1/\varepsilon_2^2$ であり、より直感的な $\varepsilon_1/\varepsilon_2$ ではない。特に技術的に興味深いのは我々の下界の枠組みであり、$\varepsilon_1$ と $\varepsilon_2$ の間の非対称性を扱うのに必要な新しい近似理論的ツールを含んでいる。この非対称性は先行研究にはなかった課題である。 We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution $p$ over $\{1, \dots, n\}$, is it $\varepsilon_1$-close to or $\varepsilon_2$-far from a reference distribution $q$ (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e., $\varepsilon_1 = 0$) the sample complexity is $\Theta(\sqrt{n})$, strongly sublinear in the domain size. At the other end of the spectrum, when $\varepsilon_1 = \varepsilon_2/2$, the sample complexity jumps to the barely sublinear $\Theta(n/\log n)$. However, very little is known about the intermediate regime. We fully characterize the price of tolerance in distribution testing as a function of $n$, $\varepsilon_1$, $\varepsilon_2$, up to a single $\log n$ factor. 
Specifically, we show the sample complexity to be \[\tilde \Theta\left(\frac{\sqrt{n}}{\varepsilon_2^{2}} + \frac{n}{\log n} \cdot \max \left\{\frac{\varepsilon_1}{\varepsilon_2^2},\left(\frac{\varepsilon_1}{\varepsilon_2^2}\right)^{\!\!2}\right\}\right),\] providing a smooth tradeoff between the two previously known cases. We also provide a similar characterization for the problem of tolerant equivalence testing, where both $p$ and $q$ are unknown. Surprisingly, in both cases, the main quantity dictating the sample complexity is the ratio $\varepsilon_1/\varepsilon_2^2$, and not the more intuitive $\varepsilon_1/\varepsilon_2$. Of particular technical interest is our lower bound framework, which involves novel approximation-theoretic tools required to handle the asymmetry between $\varepsilon_1$ and $\varepsilon_2$, a challenge absent from previous works.	翻訳日:2021-06-28 12:50:59 公開日:2021-06-25
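
As a quick consistency check on the sample-complexity bound stated in the abstract above, the two previously known cases can be recovered by substitution. This is only a sketch: it treats $\varepsilon_2$ as a constant with $\varepsilon_2 \le 1/2$, an assumption made here for illustration and not stated in the abstract.

```latex
% Noiseless case: \varepsilon_1 = 0, so the max term vanishes.
\[
\tilde\Theta\!\left(\frac{\sqrt{n}}{\varepsilon_2^{2}}\right)
\]
% Tolerant case: \varepsilon_1 = \varepsilon_2/2, so
% \varepsilon_1/\varepsilon_2^{2} = 1/(2\varepsilon_2) \ge 1 and the squared term dominates.
\[
\max\left\{\frac{1}{2\varepsilon_2},\,\frac{1}{4\varepsilon_2^{2}}\right\}
= \frac{1}{4\varepsilon_2^{2}}
\quad\Longrightarrow\quad
\tilde\Theta\!\left(\frac{\sqrt{n}}{\varepsilon_2^{2}}
+ \frac{n}{4\,\varepsilon_2^{2}\log n}\right)
= \tilde\Theta\!\left(\frac{n}{\log n}\right)
\ \text{for constant } \varepsilon_2 .
\]
```

For constant $\varepsilon_2$ the first expression matches the $\Theta(\sqrt{n})$ noiseless rate and the second matches the $\Theta(n/\log n)$ rate, up to the logarithmic factor hidden by $\tilde\Theta$.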

Title

Authors

Abstract

論文公表日・翻訳日

# ソーシャルメディアにおける攻撃的伝播のモデル化

Modeling Aggression Propagation on Social Media ( http://arxiv.org/abs/2002.10131v3 )

ライセンス: Link先を確認

Chrysoula Terizi, Despoina Chatzakou, Evaggelia Pitoura, Panayiotis Tsaparas and Nicolas Kourtellis

(参考訳) サイバーアグレッションはさまざまな文脈やオンラインソーシャルプラットフォームで研究されており、この行動の自動検出とブロックを可能にするために、最先端の機械学習および深層学習アルゴリズムを用いてさまざまなデータ上でモデル化されてきた。ユーザは、自身の(オンラインの)ソーシャルサークルにおける毒性や攻撃性の高まりに影響されて、攻撃的に振る舞ったり、他者をいじめたりすることがある。実際、この行動はあるユーザやその近傍から別のユーザへと伝播し、ネットワーク中に広がりうる。興味深いことに、我々の知る限り、攻撃的行動のネットワークダイナミクスをモデル化した研究はない。本稿では,意見ダイナミクスを用いてソーシャルメディア上での攻撃性の伝播を研究することで,この方向への第一歩を踏み出す。各ユーザが他の攻撃的なユーザや通常のユーザとどのように接続しているかに応じて,攻撃性があるユーザから別のユーザへと伝播する様子をモデル化する方法を提案する。 Twitterデータに対する広範なシミュレーションを通じて、攻撃的行動がネットワーク内でどのように伝播しうるかを調べる。クロールしてアノテーションを付与した正解データを用いてモデルを検証し,最大80%のAUCを達成した上で,本研究の結果と含意について議論する。

Cyberaggression has been studied in various contexts and online social platforms, and modeled on different data using state-of-the-art machine and deep learning algorithms to enable automatic detection and blocking of this behavior. Users can be influenced to act aggressively or even bully others because of elevated toxicity and aggression in their own (online) social circle. In effect, this behavior can propagate from one user and neighborhood to another, and therefore, spread in the network. Interestingly, to our knowledge, no work has modeled the network dynamics of aggressive behavior. In this paper, we take a first step towards this direction by studying propagation of aggression on social media using opinion dynamics. We propose ways to model how aggression may propagate from one user to another, depending on how each user is connected to other aggressive or regular users. Through extensive simulations on Twitter data, we study how aggressive behavior could propagate in the network. We validate our models with crawled and annotated ground truth data, reaching up to 80% AUC, and discuss the results and implications of our work.

翻訳日:2023-06-02 05:16:42 公開日:2021-06-25
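
To make the phrase "opinion dynamics" in the abstract above concrete, here is a minimal sketch of one classic update rule, DeGroot-style averaging. It is not the authors' model: the function name `degroot_step`, the toy graph, and the 0-to-1 aggression scale are all assumptions chosen for illustration.

```python
def degroot_step(scores, neighbors):
    """One synchronous update: each user moves to the mean of their own
    score and the scores of the users they are connected to."""
    new_scores = {}
    for user, score in scores.items():
        group = [score] + [scores[v] for v in neighbors[user]]
        new_scores[user] = sum(group) / len(group)
    return new_scores


# Hypothetical 4-user path graph a - b - c - d; 1.0 = aggressive, 0.0 = regular.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}

# Iterate the update and keep every intermediate state for inspection.
history = [scores]
for _ in range(50):
    history.append(degroot_step(history[-1], neighbors))
final = history[-1]
```

On this toy graph the single aggressive user's score spreads to its neighbours and all scores drift toward a common value, which is the kind of propagation behaviour such models are used to study.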

# ある種の量子積符号と鎖複体のテンソル積に対する最小距離

Minimal distances for certain quantum product codes and tensor products of chain complexes ( http://arxiv.org/abs/2007.12152v3 )

ライセンス: Link先を確認

Weilei Zeng and Leonid P. Pryadko

(参考訳) 量子誤り訂正符号への写像と部分空間射影を用いて、有限体上のベクトル空間の2つの鎖複体のテンソル積における最小ホモロジー距離の下界を得る。そのような複体のホモロジー群は、Künnethの定理によって記述される。複体の一方が2つの空間の間の線形写像である場合について、距離の明示的な表式を与える。この構成に現れる符号、すなわちサブシステム積符号とそのゲージ固定版は、いくつかの既知の量子誤り訂正符号の族を一般化する。

We use a map to quantum error-correcting codes and a subspace projection to get lower bounds for minimal homological distances in a tensor product of two chain complexes of vector spaces over a finite field. Homology groups of such a complex are described by the Künneth theorem. We give an explicit expression for the distances when one of the complexes is a linear map between two spaces. The codes in the construction, subsystem product codes and their gauge-fixed variants, generalize several known families of quantum error-correcting codes.

翻訳日:2023-05-08 10:49:56 公開日:2021-06-25

# 波長多重エンタングルメントに基づく量子暗号の実験

Experimental wavelength-multiplexed entanglement-based quantum cryptography ( http://arxiv.org/abs/2009.03691v3 )

ライセンス: Link先を確認

Johannes Pseiner, Lukas Achatz, Lukas Bulla, Martin Bohmann, Rupert Ursin

(参考訳) 最先端の量子鍵配送(QKD)システムでは、鍵生成レートの向上を制限する主な要因は光子検出のタイミング分解能である。本稿では,高損失・長距離の実装においても,この限界を克服する戦略を提示し,実験的に実証する。波長多重化を用いてもつれ光子対に固有の波長相関を利用し、偏光もつれから量子安全な鍵を生成する。提案手法は、ほとんどの種類のもつれ光源に変更を加えることなく、ファイバーベースおよび衛星ベースの量子通信方式に統合できる。この手法は大きなスケーリングの可能性を持ち、多重化しない方式と比べて安全な鍵レートを数桁向上できる。

In state-of-the-art quantum key distribution (QKD) systems, the main limiting factor in increasing the key generation rate is the timing resolution in detecting photons. Here, we present and experimentally demonstrate a strategy to overcome this limitation, also for high-loss and long-distance implementations. We exploit the intrinsic wavelength correlations of entangled photons using wavelength multiplexing to generate a quantum secure key from polarization entanglement. The presented approach can be integrated into both fiber- and satellite-based quantum-communication schemes, without any changes to most types of entanglement sources. This technique features a huge scaling potential allowing to increase the secure key rate by several orders of magnitude as compared to non-multiplexed schemes.

翻訳日:2023-05-03 05:15:26 公開日:2021-06-25

# 量子調和振動子スペクトル分析器

Quantum harmonic oscillator spectrum analyzers ( http://arxiv.org/abs/2010.10438v3 )

ライセンス: Link先を確認

Jonas Keller, Pan-Yu Hou, Katherine C. McCormick, Daniel C. Cole, Stephen D. Erickson, Jenny J. Wu, Andrew C. Wilson, Dietrich Leibfried

(参考訳) ノイズの特性評価と抑制は、量子領域における調和振動子の制御に不可欠である。量子調和振動子のノイズスペクトルを、振幅変調された周期駆動に対する応答を量子ビットで検出することで、低周波から振動子の共鳴周波数付近まで測定する。捕捉イオンの運動を用いて,合わせて500Hzから600kHzのノイズに感度を持つ2つの異なる実装を実験的に実証した。本手法を適用して, イオントラップポテンシャルの固有ノイズスペクトルを, これまで測定されていなかった周波数範囲で測定する。

Characterization and suppression of noise are essential for the control of harmonic oscillators in the quantum regime. We measure the noise spectrum of a quantum harmonic oscillator from low frequency to near the oscillator resonance by sensing its response to amplitude modulated periodic drives with a qubit. Using the motion of a trapped ion, we experimentally demonstrate two different implementations with combined sensitivity to noise from 500 Hz to 600 kHz. We apply our method to measure the intrinsic noise spectrum of an ion trap potential in a previously unaccessed frequency range.

翻訳日:2023-04-28 05:32:25 公開日:2021-06-25

# チェビシェフ多項式の二重適用による多スピン系の数千の中心固有値の効率的な計算

Dual application of Chebyshev polynomial for efficiently computing thousands of central eigenvalues in many-spin systems ( http://arxiv.org/abs/2011.02107v2 )

ライセンス: Link先を確認

Haoyu Guan, Wenxian Zhang

(参考訳) スペクトルの統計的性質が量子カオスの本質的な特徴づけを与えることが知られている。したがって、スペクトル中央部にある多数の内部固有値の計算は、量子多体系にとって重要な問題である。本稿では,システムサイズに対して指数関数的に近接する数千の中心固有値を効率よく求めるために,チェビシェフ多項式の二重適用(DACP)法を提案する。近縮退問題に対処するため、チェビシェフ多項式を、前処理ステップとして半円の指数関数型フィルタを構成することと、所望の部分空間の基底となる多数の適切な状態を生成することの両方に用いる。さらに、DACPは、計算時間が要求される固有値の数に影響されないという優れた性質を持つ。イジングスピン鎖とスピングラスシャードに関する数値実験により,提案手法の正確性と効率性が示された。結果が示すように,DACPは最先端のシフト反転法と比べて、イジングスピン鎖では30倍、スピングラスシャードでは8倍高速である。メモリ要件はシステムサイズに対してより良くスケールし、シフト反転法の100分の1になりうる。

It is known that the statistical properties of the spectrum provide an essential characterization of quantum chaos. The computation of a large group of interior eigenvalues at the middle spectrum is thus an important problem for quantum many-body systems. We propose a dual application of Chebyshev polynomial (DACP) method to efficiently find thousands of central eigenvalues, which are exponentially close to each other in terms of the system size. To cope with the near-degenerate problem, we use the Chebyshev polynomial to both construct an exponential of semicircle filter as the preconditioning step and generate a large set of proper states as the basis of the desired subspace. Besides, DACP owes an excellent property that its computation time is not influenced by the required number of eigenvalues. Numerical experiments on Ising spin chain and spin glass shards show the correctness and efficiency of the proposed method. As our results demonstrate, DACP is a factor of 30 faster than the state-of-the-art shift-invert method for the Ising spin chain while 8 times faster for the spin glass shards. The memory requirements scale better with system size and could be a factor of 100 less than in the shift-invert approach.

翻訳日:2023-04-25 07:39:05 公開日:2021-06-25
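
Methods built on Chebyshev polynomials of a Hamiltonian, such as the one described in the abstract above, typically rest on the three-term recurrence, which needs only matrix-vector products. The sketch below shows that building block alone. It is not the DACP method itself, and `matvec`, `chebyshev_apply`, and the toy 2x2 matrix are hypothetical names and data chosen for illustration.

```python
def matvec(H, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]


def chebyshev_apply(H, v, k):
    """Return T_k(H) v via T_{j+1}(H) v = 2 H T_j(H) v - T_{j-1}(H) v.

    Assumes the spectrum of H has already been rescaled to lie in [-1, 1].
    """
    if k == 0:
        return list(v)
    prev, curr = list(v), matvec(H, v)  # T_0(H) v and T_1(H) v
    for _ in range(k - 1):
        Hc = matvec(H, curr)
        prev, curr = curr, [2.0 * a - b for a, b in zip(Hc, prev)]
    return curr


# Toy symmetric matrix with eigenvalues +1/2 and -1/2
# (eigenvectors (1, 1) and (1, -1) respectively).
H = [[0.0, 0.5], [0.5, 0.0]]
out_plus = chebyshev_apply(H, [1.0, 1.0], 3)
out_minus = chebyshev_apply(H, [1.0, -1.0], 3)
```

Because the inputs are eigenvectors, each output is the input scaled by $T_3(\lambda) = 4\lambda^3 - 3\lambda$ evaluated at the corresponding eigenvalue, which makes the recurrence easy to check by hand.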

# 事前選択も事後選択も用いないスケーラブルな多光子量子計測

Scalable multiphoton quantum metrology with neither pre- nor post-selected measurements ( http://arxiv.org/abs/2011.02454v2 )

ライセンス: Link先を確認

Chenglong You, Mingyuan Hong, Peter Bierhorst, Adriana E. Lita, Scott Glancy, Steve Kolthammer, Emanuel Knill, Sae Woo Nam, Richard P. Mirin, Omar S. Magana-Loaiza, Thomas Gerrits

(参考訳) 電磁場の量子統計的ゆらぎは、古典的な技術で行う光学測定の感度に対して、ショットノイズ限界と呼ばれる限界を課す。しかし、量子技術はこのショットノイズ限界に制約されない。この点で、量子光源が生成するすべての光子を用いて、ショットノイズ限界を超えて小さな物理パラメータを推定できる可能性は、量子光学の主要な目標の1つである。本研究では,事前選択も事後選択も用いない測定により,広い位相範囲にわたる量子増強光位相推定のためのスケーラブルなプロトコルを実験的に実証する。これは、自発パラメトリック下方変換光源の効率的な設計と光子数識別検出の組み合わせによって達成される。 2モードスクイーズド真空状態は損失に対してロバストであるため、単一光子の損失だけで量子状態からすべての位相情報が失われるN00N状態に基づく方式を上回ることができる。 N00N状態や条件付き測定に依存する他の方式とは対照的に,本手法の感度は高次の光子対の生成と検出によって向上させることができる。このプロトコル固有の特徴が、スケーラビリティをもたらす。我々の研究は、量子イメージング、ボソンサンプリング、量子ネットワークなど、多光子干渉に依存する量子技術にとって重要である。

The quantum statistical fluctuations of the electromagnetic field establish a limit, known as the shot-noise limit, on the sensitivity of optical measurements performed with classical technologies. However, quantum technologies are not constrained by this shot-noise limit. In this regard, the possibility of using every photon produced by quantum sources of light to estimate small physical parameters, beyond the shot-noise limit, constitutes one of the main goals of quantum optics. Here we experimentally demonstrate a scalable protocol for quantum-enhanced optical phase estimation across a broad range of phases, with neither pre- nor post-selected measurements. This is achieved through the efficient design of a source of spontaneous parametric down-conversion in combination with photon-number-resolving detection. The robustness of two-mode squeezed vacuum states against loss allows us to outperform schemes based on N00N states, in which the loss of a single photon is enough to remove all phase information from a quantum state. In contrast to other schemes that rely on N00N states or conditional measurements, the sensitivity of our technique could be improved through the generation and detection of high-order photon pairs. This unique feature of our protocol makes it scalable. Our work is important for quantum technologies that rely on multiphoton interference such as quantum imaging, boson sampling and quantum networks.

翻訳日:2023-04-25 07:23:07 公開日:2021-06-25
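
For orientation, the shot-noise limit mentioned in the abstract above is usually contrasted with the Heisenberg limit. The scalings below are the standard textbook forms; the symbol $N$ for the number of photons used and the subscripts are notation assumed here, not taken from the abstract.

```latex
\[
\Delta\phi_{\mathrm{SNL}} \sim \frac{1}{\sqrt{N}},
\qquad
\Delta\phi_{\mathrm{HL}} \sim \frac{1}{N}.
\]
```

A protocol is called quantum-enhanced when its phase uncertainty falls below the first scaling for the same photon budget.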

# ロバストなシャドウ推定

Robust shadow estimation ( http://arxiv.org/abs/2011.09636v2 )

ライセンス: Link先を確認

Senrui Chen, Wenjun Yu, Pei Zeng and Steven T. Flammia

(参考訳) 大規模で強結合した量子システムの特性を効率的に推定することは、多体物理学と量子情報理論の中心的な焦点である。量子コンピュータは多くのタスクでスピードアップを約束するが、短期的なデバイスではノイズが発生しやすいため、推定の精度は一般的に低下する。本稿では,Huang,Kueng,Preskillが最近提案したシャドウ推定プロトコルにおける誤りの軽減方法を紹介する。標準影推定法に実験的にフレンドリーなキャリブレーションステージを付加することにより,量子システムの古典影の偏りのない推定を得ることができ,実験条件において最小の仮定しか与えられず,サンプル効率と耐雑音性に優れた方法で多くの有用な特性を抽出することができる。我々は,本プロトコルのサンプル複雑性に厳密な限界を与え,その性能をいくつかの数値実験で実証する。

Efficiently estimating properties of large and strongly coupled quantum systems is a central focus in many-body physics and quantum information theory. While quantum computers promise speedups for many such tasks, near-term devices are prone to noise that will generally reduce the accuracy of such estimates. Here we show how to mitigate errors in the shadow estimation protocol recently proposed by Huang, Kueng, and Preskill. By adding an experimentally friendly calibration stage to the standard shadow estimation scheme, our robust shadow estimation algorithm can obtain an unbiased estimate of the classical shadow of a quantum system and hence extract many useful properties in a sample-efficient and noise-resilient manner given only minimal assumptions on the experimental conditions. We give rigorous bounds on the sample complexity of our protocol and demonstrate its performance with several numerical experiments.

翻訳日:2023-04-23 17:23:23 公開日:2021-06-25

# 散逸誘起トポロジカル状態における乱れ:異なるタイプの局在転移の証拠

Disorder in dissipation-induced topological states: Evidence for a different type of localization transition ( http://arxiv.org/abs/2011.09730v3 )

ライセンス: Link先を確認

Alon Beck, Moshe Goldstein

(参考訳) 非平衡量子相転移の探求は、駆動と散逸が有効温度を生じさせ、古典的な振る舞いをもたらす傾向によってしばしば妨げられる。散逸が、系を非自明な量子コヒーレント定常状態へと駆動するように設計されている場合、状況は異なりうるだろうか? 本研究では,最近導入された散逸誘起チャーントポロジカル状態に対する乱れの影響を調べ、エルミートな定常状態密度行列、あるいはエンタングルメントハミルトニアンの固有モードを調べることで,この問題に光を当てる。平衡の場合と同様に、各ランダウバンドは中心付近に単一の非局在準位を持つ。しかし、3つの異なる有限サイズスケーリング法を用いて、非局在状態に近づく際の局在長の発散を記述する臨界指数 $\nu$ が、ダイナミクスの非散逸部分に乱れを導入した場合、平衡の場合と大きく異なることを示す。これは、冷却原子実験で実現可能な、異なるタイプの非平衡量子臨界普遍性クラスを示唆する。

The quest for nonequilibrium quantum phase transitions is often hampered by the tendency of driving and dissipation to give rise to an effective temperature, resulting in classical behavior. Could this be different when the dissipation is engineered to drive the system into a nontrivial quantum coherent steady state? In this work we shed light on this issue by studying the effect of disorder on recently-introduced dissipation-induced Chern topological states, and examining the eigenmodes of the Hermitian steady state density matrix or entanglement Hamiltonian. We find that, similarly to equilibrium, each Landau band has a single delocalized level near its center. However, using three different finite size scaling methods we show that the critical exponent $\nu$ describing the divergence of the localization length upon approaching the delocalized state is significantly different from equilibrium if disorder is introduced into the non-dissipative part of the dynamics. This indicates a different type of nonequilibrium quantum critical universality class accessible in cold-atom experiments.

翻訳日:2023-04-23 17:16:56 公開日:2021-06-25

# 共変量子チャネルのプログラム可能性

Programmability of covariant quantum channels ( http://arxiv.org/abs/2012.00717v2 )

ライセンス: Link先を確認

Martina Gschwendtner, Andreas Bluhm, Andreas Winter

(参考訳) プログラム可能な量子プロセッサは、プログラムレジスタの状態を使用して、入力レジスタに適用される一連の量子チャネルの1つの要素を特定する。そのようなデバイスは、無限に多くのユニタリ量子チャネル(NielsenとChuangのNo-Programming Theorem)を含む集合に対して有限次元のプログラムレジスタでは不可能であることはよく知られている。システムが対称性を持つ場合、状況は変化する。実際、ここでは群共変チャネルを考える。群がチャネル入力に対して無作為に作用すると、これらのチャネルは有限のプログラム次元のプログラマブル量子プロセッサによって正確に実装できる(チャネルのチェイ・ジャミオルコフスキー状態をプログラムとして使用するテレポーテーションシミュレーションにより)。さらに、対称群作用の表現理論を活用することで、プログラムの冗長性を除去し、その結果のプログラムレジスタがヒルベルト空間次元が最小であることを示す方法を示す。さらに、全てのグループ共変チャネルを概ね実装したプロセッサのプログラムレジスタ次元の上限と下限を提供する。

A programmable quantum processor uses the states of a program register to specify one element of a set of quantum channels which is applied to an input register. It is well-known that such a device is impossible with a finite-dimensional program register for any set that contains infinitely many unitary quantum channels (Nielsen and Chuang's No-Programming Theorem), meaning that a universal programmable quantum processor does not exist. The situation changes if the system has symmetries. Indeed, here we consider group-covariant channels. If the group acts irreducibly on the channel input, these channels can be implemented exactly by a programmable quantum processor with finite program dimension (via teleportation simulation, which uses the Choi-Jamiolkowski state of the channel as a program). Moreover, by leveraging the representation theory of the symmetry group action, we show how to remove redundancy in the program and prove that the resulting program register has minimum Hilbert space dimension. Furthermore, we provide upper and lower bounds on the program register dimension of a processor implementing all group-covariant channels approximately.

翻訳日:2023-04-22 11:59:58 公開日:2021-06-25

# 超微細相互作用を通じたキタエフ物質におけるトーリック符号相の電気的プローブ

Electric probe for the toric code phase in Kitaev materials through the hyperfine interaction ( http://arxiv.org/abs/2012.08825v5 )

ライセンス: Link先を確認

Masahiko G. Yamada, Satoshi Fujimoto

(参考訳) キタエフ模型は、ギャップのあるスピン液体相とギャップレスのスピン液体相を持つ注目すべきスピン模型であり、イリジウム酸化物や $\alpha$-RuCl$_3$ で実現される可能性がある。最近の $\alpha$-RuCl$_3$ の実験では、系の $C_3$ 対称性を破る、ギャップのあるトーリック符号相へのネマティック転移の兆候が、熱容量の角度依存性を通じて観測されている。本稿では,ネマティック転移を電気的に検出できる機構を提案する。 $J_\textrm{eff}=1/2$ スピンは電気四極子モーメント(EQM)を持たないため、これは一見不可能に思える。しかし、2次摂動では非ゼロのEQMを持つ仮想状態が現れ、これにより核磁気共鳴とメスバウアー分光によってネマティック秩序パラメータが検出可能になる。 EQMの起源が純粋に磁気的である点は従来の電子ネマティック相とは異なり、キタエフのトーリック誤り訂正符号の実現を直接検出することを可能にする。

The Kitaev model is a remarkable spin model with gapped and gapless spin liquid phases, which are potentially realized in iridates and $\alpha$-RuCl$_3$. In the recent experiment of $\alpha$-RuCl$_3$, the signature of a nematic transition to the gapped toric code phase, which breaks the $C_3$ symmetry of the system, has been observed through the angle dependence of the heat capacity. We here propose a mechanism by which the nematic transition can be detected electrically. This is seemingly impossible because $J_\textrm{eff}=1/2$ spins do not have an electric quadrupole moment (EQM). However, in the second-order perturbation the virtual state with a nonzero EQM appears, which makes the nematic order parameter detectable by nuclear magnetic resonance and Mössbauer spectroscopy. The purely magnetic origin of EQM is different from conventional electronic nematic phases, allowing the direct detection of the realization of Kitaev's toric error-correction code.

翻訳日:2023-04-20 11:21:06 公開日:2021-06-25

# スケーラブルな量子コンピューティングアーキテクチャのためのフローティング可変カプラ

Floating tunable coupler for scalable quantum computing architectures ( http://arxiv.org/abs/2103.07030v2 )

ライセンス: Link先を確認

Eyob A. Sete, Angela Q. Chen, Riccardo Manenti, Shobhan Kulshreshtha, and Stefano Poletto

(参考訳) ゼロ結合条件を達成するために、量子ビット間の直接結合容量に依存しないフローティング可変カプラを提案する。量子ビットとカプラの間の結合の極性を設計することで、本来は一定である量子ビット間結合を相殺し、カプラ周波数が量子ビット周波数より上にある場合にも下にある場合にもゼロ結合条件を達成できることを示す。量子ビットに対するカプラの超伝導パッドの対称および非対称な配置を実装し,可変カプラのこれら2つの動作領域を実験的に実証した。このようなフローティング可変カプラは、常時オンの残留結合を低減しつつ、大規模量子プロセッサの設計に柔軟性をもたらす。

We propose a floating tunable coupler that does not rely on direct qubit-qubit coupling capacitances to achieve the zero-coupling condition. We show that the polarity of the qubit-coupler couplings can be engineered to offset the otherwise constant qubit-qubit coupling and attain the zero-coupling condition when the coupler frequency is above or below the qubit frequencies. We experimentally demonstrate these two operating regimes of the tunable coupler by implementing symmetric and asymmetric configurations of the coupler's superconducting pads with respect to the qubits. Such a floating tunable coupler provides flexibility in designing large-scale quantum processors while reducing the always-on residual couplings.

翻訳日:2023-04-08 08:57:07 公開日:2021-06-25

# 前熱化フロケ時間結晶の臨界特性

Critical properties of the prethermal Floquet Time Crystal ( http://arxiv.org/abs/2103.10818v2 )

ライセンス: Link先を確認

Muath Natsheh, Andrea Gambassi, Aditi Mitra

(参考訳) 前熱化相におけるフロケ時間結晶の形成を特徴づける臨界特性を、周期駆動された $O(N)$ 模型において解析的に調べる。特に,周期同期ダイナミクスを示し長距離空間秩序を持たない自明相と、長距離空間秩序が周期倍化ダイナミクスを伴う非自明相とを分ける臨界線に着目する。臨界線の近傍では、次元展開と $N\to\infty$ での厳密解を組み合わせて、等時相関関数の空間相関長の発散を特徴づける指数 $\nu$、秩序パラメータの振幅の成長を特徴づける指数 $\beta$、および自明相の奥深くから臨界線へクエンチした際のエイジングダイナミクスの初期スリップ指数 $\theta$ を決定する。指数 $\nu, \beta, \theta$ は駆動がない場合と同一であることが分かる。また、エイジングの関数形は、系を調べる時刻が駆動周期に比べて短いか長いかに依存することが分かる。臨界線近傍の摂動ポテンシャルに対する線形応答として得られる2点相関関数の空間構造は、駆動がない場合よりも長距離の代数的減衰を示し、周期倍化しているだけでなく、波数ベクトル $\omega/(2 v)$ で空間的に振動することも分かる。ここで $v$ は準粒子の速度、$\omega$ は駆動周波数である。

The critical properties characterizing the formation of the Floquet time crystal in the prethermal phase are investigated analytically in the periodically driven $O(N)$ model. In particular, we focus on the critical line separating the trivial phase with period synchronized dynamics and absence of long-range spatial order from the non-trivial phase where long-range spatial order is accompanied by period-doubling dynamics. In the vicinity of the critical line, with a combination of dimensional expansion and exact solution for $N\to\infty$, we determine the exponent $\nu$ that characterizes the divergence of the spatial correlation length of the equal-time correlation functions, the exponent $\beta$ characterizing the growth of the amplitude of the order-parameter, as well as the initial-slip exponent $\theta$ of the aging dynamics when a quench is performed from deep in the trivial phase to the critical line. The exponents $\nu, \beta, \theta$ are found to be identical to those in the absence of the drive. In addition, the functional form of the aging is found to depend on whether the system is probed at times that are small or large compared to the drive period. The spatial structure of the two-point correlation functions, obtained as a linear response to a perturbing potential in the vicinity of the critical line, is found to show algebraic decays that are longer ranged than in the absence of a drive, and besides being period-doubled, are also found to oscillate in space at the wave-vector $\omega/(2 v)$, $v$ being the velocity of the quasiparticles, and $\omega$ being the drive frequency.

翻訳日:2023-04-07 10:53:23 公開日:2021-06-25

# Fairmandering: 公平性を最適化した選挙区割りのための列生成ヒューリスティック

Fairmandering: A column generation heuristic for fairness-optimized political districting ( http://arxiv.org/abs/2103.11469v2 )

ライセンス: Link先を確認

Wes Gurnee and David B. Shmoys

(参考訳) アメリカの勝者総取りの連邦議会選挙区制度は、選挙区の境界を操作することで選挙結果を作り出す力を政治家に与えている。既存の計算的解法の多くは、政治的・人口統計的な入力を無視して偏りのない地図を描くことに集中し、代わりに単にコンパクトさを最適化している。我々は、コンパクトさと公平性は直交する性質であるためこれは欠陥のあるアプローチであると主張し、公平性の任意の区分線形な定義を明示的に最適化するスケーラブルな2段階法を導入する。第1段階はランダム化された分割統治型の列生成ヒューリスティックであり、グラフ分割問題の合成的構造を利用して、指数的な数の異なる選挙区割り案を生成する。この選挙区アンサンブルは、最終案に含める選挙区を選ぶマスター選択問題への入力となる。この分離された設計により、公平性に沿った目的関数を定義する上でかつてない柔軟性が得られる。パイプラインは任意に並列化可能で、追加の区割り制約にも柔軟に対応でき、他の幅広い地域区分問題に適用できる。連邦議会選挙区に関する過去最大のアンサンブル研究において、我々の手法を用いて、起こりうる期待結果の範囲と、この範囲が公平性の潜在的な定義に与える含意を理解する。

The American winner-take-all congressional district system empowers politicians to engineer electoral outcomes by manipulating district boundaries. Existing computational solutions mostly focus on drawing unbiased maps by ignoring political and demographic input, and instead simply optimize for compactness. We claim that this is a flawed approach because compactness and fairness are orthogonal qualities, and introduce a scalable two-stage method to explicitly optimize for arbitrary piecewise-linear definitions of fairness. The first stage is a randomized divide-and-conquer column generation heuristic which produces an exponential number of distinct district plans by exploiting the compositional structure of graph partitioning problems. This district ensemble forms the input to a master selection problem to choose the districts to include in the final plan. Our decoupled design allows for unprecedented flexibility in defining fairness-aligned objective functions. The pipeline is arbitrarily parallelizable, is flexible to support additional redistricting constraints, and can be applied to a wide array of other regionalization problems. In the largest ever ensemble study of congressional districts, we use our method to understand the range of possible expected outcomes and the implications of this range on potential definitions of fairness.

翻訳日:2023-04-07 06:34:30 公開日:2021-06-25

# トーリック符号のグラフ状態表現

Graph state representation of the toric code ( http://arxiv.org/abs/2103.12268v3 )

ライセンス: Link先を確認

Pengcheng Liao, David L. Feder

(参考訳) フォールトトレラントな操作の可能性を考えると、トポロジカル量子状態は現在、激しい活動の焦点となっている。特に興味深いのは、表面や平らな安定化符号のようなトポロジカルな量子誤り訂正符号であり、これは有名なトーリック符号と等価である。すべてのスタビライザ状態は局所クリフォード操作下でグラフ状態にマップされるが、トポロジカルスタビライザ符号に関連するグラフは未知のままである。トーリック符号グラフは、星グラフ(グリーンベルガー=ホルン=ゼーリンガー状態の符号化)とハーフグラフの2種類の部分グラフからなる。位相次数は、繰り返し符号とトーリック符号の間の関係を明らかにする多重星グラフの存在と同一視される。グラフ構造は、幾何学的に非局所ゲートを仮定して、状態形成のためのログ深さ量子回路を容易に生成し、アキラを含む一定の深さに縮小でき、回路幅を増大させるコストで測定することができる。その結果, トポロジ的順序の調査と新しいトポロジ的誤り訂正符号の開発のためのグラフ理論の枠組みが得られた。

Given their potential for fault-tolerant operations, topological quantum states are currently the focus of intense activity. Of particular interest are topological quantum error correction codes, such as the surface and planar stabilizer codes that are equivalent to the celebrated toric code. While every stabilizer state maps to a graph state under local Clifford operations, the graphs associated with topological stabilizer codes remain unknown. We show that the toric code graph is composed of only two kinds of subgraphs: star graphs (which encode Greenberger-Horne-Zeilinger states) and half graphs. The topological order is identified with the existence of multiple star graphs, which reveals a connection between the repetition and toric codes. The graph structure readily yields a log-depth quantum circuit for state preparation, assuming geometrically non-local gates, which can be reduced to a constant depth including ancillae and measurements at the cost of increasing the circuit width. The results provide a new graph-theoretic framework for the investigation of topological order and the development of novel topological error correction codes.
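
The statement that star graphs encode Greenberger-Horne-Zeilinger states can be checked numerically on a small case. The sketch below is illustrative only: the function names and the choice of four qubits are assumptions, and it verifies only the star-graph/GHZ relation, not the toric code construction itself.

```python
import numpy as np
from functools import reduce

def graph_state(n, edges):
    """n-qubit graph state: a CZ gate on every edge, applied to |+>^n."""
    dim = 2 ** n
    state = np.full(dim, 1 / np.sqrt(dim))
    for idx in range(dim):
        # Qubit 0 is the most significant bit, matching np.kron ordering.
        bits = [(idx >> (n - 1 - q)) & 1 for q in range(n)]
        for a, b in edges:
            if bits[a] and bits[b]:
                state[idx] = -state[idx]
    return state

def hadamard_on(n, qubits):
    """Hadamard on the listed qubits, identity on the others."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I = np.eye(2)
    return reduce(np.kron, [H if q in qubits else I for q in range(n)])

n = 4
star_edges = [(0, j) for j in range(1, n)]        # qubit 0 is the centre
star = graph_state(n, star_edges)

# Hadamards on the leaves are local Clifford operations.
ghz_from_star = hadamard_on(n, set(range(1, n))) @ star

ghz = np.zeros(2 ** n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)                 # (|0000> + |1111>)/sqrt(2)
is_ghz = np.allclose(ghz_from_star, ghz)
```

Because only single-qubit Hadamards are applied, the star graph state and the GHZ state are equivalent under local Clifford operations, which is the sense in which the abstract relates them.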

翻訳日:2023-04-07 02:30:12 公開日:2021-06-25

# 断熱臨界量子距離論は、断熱へのショートカットを適用してもハイゼンベルク極限に達することができない

Adiabatic critical quantum metrology cannot reach the Heisenberg limit even when shortcuts to adiabaticity are applied ( http://arxiv.org/abs/2103.12939v2 )

ライセンス: Link先を確認

Karol Gietka, Friederike Metz, Tim Keller, and Jing Li

(参考訳) 臨界量子メトロロジーに対する断熱的なアプローチで得られた量子フィッシャー情報は、精度のハイゼンベルク限界につながり得ないため、最適な設定下での正則量子メトロロジーは常に優れていることを示す。さらに, 断熱への近道は臨界基底状態の生成時間を任意に減少させることができるが, 断熱臨界量子メトロロジーにおける量子パラメータ推定のハイゼンベルク限界の達成や克服には使用できないと論じた。ケーススタディとして、ランダウ・ツェナーモデルと量子ラビモデルへの反断熱駆動の適用について検討する。

We show that the quantum Fisher information attained in an adiabatic approach to critical quantum metrology cannot lead to the Heisenberg limit of precision and therefore regular quantum metrology under optimal settings is always superior. Furthermore, we argue that even though shortcuts to adiabaticity can arbitrarily decrease the time of preparing critical ground states, they cannot be used to achieve or overcome the Heisenberg limit for quantum parameter estimation in adiabatic critical quantum metrology. As case studies, we explore the application of counter-diabatic driving to the Landau-Zener model and the quantum Rabi model.

翻訳日:2023-04-06 23:59:23 公開日:2021-06-25

# フェルミオン双対性:強い散逸と記憶を持つ開システムの一般対称性

Fermionic duality: General symmetry of open systems with strong dissipation and memory ( http://arxiv.org/abs/2104.11202v2 )

ライセンス: Link先を確認

V. Bruch, K. Nestmann, J. Schulenborg and M. R. Wegewijs

(参考訳) 我々は、強い相互作用と広帯域貯水池との強い結合を持つ幅広い種類のフェルミオン開量子系の正確な時間進化を考える。我々は、状態の進化(Schrödinger)と観測量の進化(Heisenberg)の間の非自明なフェルミオン双対関係を示す。この非常に非直感的な関係が、クラウス測定演算子、Choi-Jamiołkowski状態、時間畳み込みおよび畳み込みのない量子マスター方程式、一般化されたリンドブラッドジャンプ演算子など、量子力学のすべての正準的アプローチにおける解析計算においてどのように理解され、活用されるかを示す。力学の可除性と因果構造に関する洞察と、非摂動マルコフ近似とその初期すべり補正への応用について論じる。フェルミオンモデルに対する予測は、これまで考えられていたよりもはるかに広範囲に根本原理によって既に固定されている。

We consider the exact time-evolution of a broad class of fermionic open quantum systems with both strong interactions and strong coupling to wide-band reservoirs. We present a nontrivial fermionic duality relation between the evolution of states (Schrödinger) and of observables (Heisenberg). We show how this highly nonintuitive relation can be understood and exploited in analytical calculations within all canonical approaches to quantum dynamics, covering Kraus measurement operators, the Choi-Jamiołkowski state, time-convolution and convolutionless quantum master equations and generalized Lindblad jump operators. We discuss the insights this offers into the divisibility and causal structure of the dynamics and the application to nonperturbative Markov approximations and their initial-slip corrections. Our results underscore that predictions for fermionic models are already fixed by fundamental principles to a much greater extent than previously thought.

翻訳日:2023-04-02 20:09:25 公開日:2021-06-25

# 逐次アルゴリズムを逆転させる

Reversify any sequential algorithm ( http://arxiv.org/abs/2105.05626v2 )

ライセンス: Link先を確認

Yuri Gurevich

(参考訳) 任意のシーケンシャルアルゴリズム$A$を可逆化するには、$A$に簿記機構を穏やかに組み込む。結果として、$A$をステップごとに模倣し、$A$が停止するちょうどその時点で停止する、ステップごとに可逆なアルゴリズムが得られる。一般性を失うことなく、アルゴリズム$A$は、動作的に$A$と同一の抽象状態マシンとして提示されると仮定する。そのような表現の存在は理論的に証明され、そのような表現の実用性は十分に実証されている。

To reversify an arbitrary sequential algorithm $A$, we gently instrument $A$ with bookkeeping machinery. The result is a step-for-step reversible algorithm that mimics $A$ step-for-step and stops exactly when $A$ does. Without loss of generality, we presume that algorithm $A$ is presented as an abstract state machine that is behaviorally identical to $A$. The existence of such representation has been proven theoretically, and the practicality of such representation has been amply demonstrated.
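
The idea of instrumenting an algorithm with bookkeeping so that each step can be undone can be sketched generically. This is not the abstract-state-machine construction of the paper: the `ReversibleRun` class, the history stack, and the Euclidean-algorithm example are assumptions chosen to show the simplest form of the technique, namely logging each overwritten state.

```python
class ReversibleRun:
    """Wrap a step function so every forward step can be undone."""

    def __init__(self, step, state):
        self.step = step        # state -> next state, or None once halted
        self.state = state
        self.history = []       # bookkeeping: states that were overwritten

    def forward(self):
        """Take one step; return False if the wrapped algorithm has halted."""
        nxt = self.step(self.state)
        if nxt is None:
            return False
        self.history.append(self.state)
        self.state = nxt
        return True

    def backward(self):
        """Undo one step; return False if already at the initial state."""
        if not self.history:
            return False
        self.state = self.history.pop()
        return True

def gcd_step(state):
    """One step of Euclid's algorithm on a pair (a, b); halts when b == 0."""
    a, b = state
    return None if b == 0 else (b, a % b)

run = ReversibleRun(gcd_step, (48, 18))
while run.forward():
    pass
final_state = run.state

while run.backward():
    pass
restored_state = run.state
```

The wrapper takes exactly one forward step per step of the wrapped algorithm and halts when it does, mirroring the step-for-step property stated above. Storing whole states is the crudest bookkeeping; a more economical scheme would log only what each step destroys.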

翻訳日:2023-03-31 08:52:24 公開日:2021-06-25

# 線形光学系におけるガウス絡み合いの分布

Distribution of Gaussian Entanglement in Linear Optical Systems ( http://arxiv.org/abs/2105.13441v2 )

ライセンス: Link先を確認

Jiru Liu, Wenchao Ge and M. Suhail Zubairy

(参考訳) 絡み合いは、多くのアプリケーションを持つ量子ネットワークを構築する上で不可欠な要素である。ネットワーク内での絡み合いの分散を理解することは、前進するための重要なステップです。本稿では,二部交絡のための新しい定量化器を用いて,線形ネットワークにおけるガウス交絡の保存と分布について検討する。本研究では, ビームスプリッタを透過率, 反射率と同等に, 絡み合いを分散できることを示す。絡み合った状態の要件と、この関係を満たすネットワークの種類は明確に示される。本研究は,量子エンタングルメントの新しい定量化とネットワーク内のエンタングルメントの構造に関するさらなる知見を提供する。

Entanglement is an essential ingredient for building a quantum network that can have many applications. Understanding how entanglement is distributed in a network is a crucial step to move forward. Here we study the conservation and distribution of Gaussian entanglement in a linear network using a new quantifier for bipartite entanglement. We show that the entanglement can be distributed through a beam-splitter in the same way as the transmittance and the reflectance. The requirements on the entangled states and the type of networks to satisfy this relation are presented explicitly. Our results provide a new quantification for quantum entanglement and further insights into the structure of entanglement in a network.

翻訳日:2023-03-29 06:55:29 公開日:2021-06-25

# 非断熱ドライブと非熱量子状態のエネルギー的利点

Energetic advantages of non-adiabatic drives combined with non-thermal quantum states ( http://arxiv.org/abs/2106.05990v3 )

ライセンス: Link先を確認

Camille L Latune

(参考訳) 量子力学の実験や応用において、量子系のユニタリ駆動はユビキタスであり、特に量子熱力学に関連するエネルギー的側面が注目されている。初期非熱状態から得られるユニタリ駆動のエネルギー的利点について検討する。我々は非環状エルゴトロピーを導入してエネルギー利得を定量化し、コヒーレント(コヒーレンスベース)と非コヒーレント(人口ベース)の寄与を同定する。特に、初期量子コヒーレンスは常に有益であるように見えるが、非パッシブ集団分布は体系的ではない。さらに、これらのエネルギーゲインは、初期熱状態に対する断熱力学の通常の最適性とは対照的に、非断熱力学を通してのみアクセス可能である。最後に、ショートカット・トゥ・アディバチティの文脈で確立されたフレームワークに従って、最適なドライブの実装に関連するエネルギーコストが分析され、ほとんどの場合、ショートカット・トゥ・アディバチティティに関連するエネルギーコストよりも小さいことが判明した。我々は,二段階システムの例を明示的に扱い,より大きな初期コヒーレンスによってエネルギッシュなアドバンテージが増大し,初期コヒーレンスとダイナミクスがコヒーレンスを消費し使用する能力との間に相互作用することを示す。

Unitary drivings of quantum systems are ubiquitous in experiments and applications of quantum mechanics and the underlying energetic aspects, particularly relevant in quantum thermodynamics, are receiving growing attention. We investigate energetic advantages in unitary driving obtained from initial non-thermal states. We introduce the non-cyclic ergotropy to quantify the energetic gains, from which coherent (coherence-based) and incoherent (population-based) contributions are identified. In particular, initial quantum coherences appear to be always beneficial, whereas non-passive population distributions are not systematically so. Additionally, these energetic gains are accessible only through non-adiabatic dynamics, contrasting with the usual optimality of adiabatic dynamics for initial thermal states. Finally, following frameworks established in the context of shortcut-to-adiabaticity, the energetic costs related to the implementation of the optimal drives are analysed and, in most situations, are found to be smaller than the energetic cost associated with shortcut-to-adiabaticity. We treat explicitly the example of a two-level system and show that energetic advantages increase with larger initial coherences, illustrating the interplay between initial coherences and the ability of the dynamics to consume and use coherences.

翻訳日:2023-03-27 01:41:07 公開日:2021-06-25

# 古典的二次元ハイゼンベルク模型の再検討:$SU(2)$-symmetric tensor network study

The classical two-dimensional Heisenberg model revisited: An $SU(2)$-symmetric tensor network study ( http://arxiv.org/abs/2106.06310v2 )

ライセンス: Link先を確認

Philipp Schmoll, Augustine Kshetrimayum, Jens Eisert, Roman Orus, Matteo Rizzi

(参考訳) 2つの空間次元の古典的ハイゼンベルク模型は最もパラダイム的なスピンモデルの一つであり、磁性を理解するために統計物理学や凝縮物質物理学において重要な役割を果たす。それでも、そのパラダイム的特徴と(連続的な)自発的対称性の破れの禁止が広く受け入れられているにもかかわらず、モデルが有限温度で相転移を示すかどうかの議論は残る。重要なことに、このモデルは $1+1$ 次元における $O(3)$ 非線形シグマモデルの格子離散化として解釈することができ、これは有名な高次元理論($3+1$ 次元の量子色力学のような)の重要な特徴、すなわち漸近自由現象を含む最も単純な量子場理論の1つである。これは有限温度遷移も除外するはずだが、格子効果は主流の描像の修正に重要な役割を果たす可能性がある。本研究では,Gibbs状態の相関構造を包括的に探究するために,広い温度範囲にわたって熱力学的極限における古典的分配関数を表現する最先端テンソルネットワーク手法を用いた。 2次元テンソルネットワーク収縮スキームに$SU(2)$対称性を実装することで、相転移を検出する上で重要な特徴である$\chi_E^\text{eff} \sim 1500$までの環境の非常に大きな有効結合次元を処理できる。温度が下がるにつれて、急速に発散する相関長がみられ、その振る舞いは文献で知られている2つの主要な矛盾する仮説、すなわち有限$T$遷移と漸近自由の両方に適合するように見えるが、後者がわずかに支持される。

The classical Heisenberg model in two spatial dimensions constitutes one of the most paradigmatic spin models, taking an important role in statistical and condensed matter physics to understand magnetism. Still, despite its paradigmatic character and the widely accepted ban on (continuous) spontaneous symmetry breaking, controversy remains over whether the model exhibits a phase transition at finite temperature. Importantly, the model can be interpreted as a lattice discretization of the $O(3)$ non-linear sigma model in $1+1$ dimensions, one of the simplest quantum field theories encompassing crucial features of celebrated higher-dimensional ones (like quantum chromodynamics in $3+1$ dimensions), namely the phenomenon of asymptotic freedom. This should also exclude finite-temperature transitions, but lattice effects might play a significant role in correcting the mainstream picture. In this work, we make use of state-of-the-art tensor network approaches, representing the classical partition function in the thermodynamic limit over a large range of temperatures, to comprehensively explore the correlation structure for Gibbs states. By implementing an $SU(2)$ symmetry in our two-dimensional tensor network contraction scheme, we are able to handle very large effective bond dimensions of the environment up to $\chi_E^\text{eff} \sim 1500$, a feature that is crucial in detecting phase transitions. With decreasing temperatures, we find a rapidly diverging correlation length, whose behaviour is apparently compatible with the two main contradictory hypotheses known in the literature, namely a finite-$T$ transition and asymptotic freedom, though with a slight preference for the second.

翻訳日:2023-03-26 23:42:02 公開日:2021-06-25

# 時間符号化多層スパイクニューラルネットワークにおけるVLSI回路制約の影響

Effects of VLSI Circuit Constraints on Temporal-Coding Multilayer Spiking Neural Networks ( http://arxiv.org/abs/2106.10382v2 )

ライセンス: Link先を確認

Yusuke Sakemi, Takashi Morie, Takeo Hosomi, Kazuyuki Aihara

(参考訳) spiking neural network (snn)は、脳の数学的モデルとしてだけでなく、現実世界のアプリケーションのためのエネルギー効率の良い情報処理モデルとしても注目されている。特に、テンポラリ符号化に基づくsnsは、タスクの実行にかなり少ないスパイクを必要とするため、レート符号化に基づくものよりもずっと効率的であることが期待されている。 SNNは連続状態および連続時間モデルであるため、アナログVLSI回路で実装することが好ましい。しかし、システムサイズが非常に大きい場合、連続時間アナログ回路によるシステム全体の構築は不可能である。したがって、混合信号回路を用いる必要があり、シナプス重みの時間離散化と量子化が必要である。さらに、SNNのアナログVLSI実装は、ノイズやデバイスミスマッチの影響、アナログ回路操作に起因する他の制約など、非理想性を示す。本研究では,SNNの性能に及ぼす時間離散化および/または重み量子化の影響を検討した。さらに, 膜電位の下限と焼成閾値の時間的変動の影響を解明した。最後に,数理SNNモデルを離散時間でアナログ回路にマッピングするための最適手法を提案する。

The spiking neural network (SNN) has been attracting considerable attention not only as a mathematical model for the brain, but also as an energy-efficient information processing model for real-world applications. In particular, SNNs based on temporal coding are expected to be much more efficient than those based on rate coding, because the former require substantially fewer spikes to carry out tasks. As SNNs are continuous-state and continuous-time models, it is favorable to implement them with analog VLSI circuits. However, the construction of the entire system with continuous-time analog circuits would be infeasible when the system size is very large. Therefore, mixed-signal circuits must be employed, and the time discretization and quantization of the synaptic weights are necessary. Moreover, the analog VLSI implementation of SNNs exhibits non-idealities, such as the effects of noise and device mismatches, as well as other constraints arising from the analog circuit operation. In this study, we investigated the effects of the time discretization and/or weight quantization on the performance of SNNs. Furthermore, we elucidated the effects of the lower bound of the membrane potentials and the temporal fluctuation of the firing threshold. Finally, we propose an optimal approach for the mapping of mathematical SNN models to analog circuits with discretized time.
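
The two constraints named above, weight quantization and time discretization, can be made concrete with a minimal sketch. The abstract does not specify the schemes used, so the uniform quantizer, the nearest-step rounding of spike times, and all parameter names (`n_bits`, `w_max`, `dt`) are assumptions for illustration.

```python
import numpy as np

def quantize_weights(w, n_bits, w_max):
    """Uniformly quantize weights to 2**n_bits levels spanning [-w_max, w_max]."""
    levels = 2 ** n_bits - 1                 # number of intervals between levels
    step = 2 * w_max / levels
    clipped = np.clip(w, -w_max, w_max)      # saturate out-of-range weights
    return np.round((clipped + w_max) / step) * step - w_max

def discretize_times(t, dt):
    """Round spike times to the nearest multiple of the time step dt."""
    return np.round(np.asarray(t) / dt) * dt

weights = np.array([-2.0, -0.3, 0.3, 2.0])
q_weights = quantize_weights(weights, n_bits=2, w_max=1.0)

spike_times = [0.12, 0.26]
q_times = discretize_times(spike_times, dt=0.1)
```

With 2 bits the weights collapse onto four levels, which gives a feel for how coarse the synaptic resolution becomes. In a temporal-coding network the rounding of spike times perturbs the information carrier itself, which is presumably why both effects are studied together.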

翻訳日:2023-03-26 08:07:48 公開日:2021-06-25

# 強化学習によるインタラクティブレコメンデーションの精度と公平性のバランス

Balancing Accuracy and Fairness for Interactive Recommendation with Reinforcement Learning ( http://arxiv.org/abs/2106.13386v1 )

ライセンス: Link先を確認

Weiwen Liu, Feng Liu, Ruiming Tang, Ben Liao, Guangyong Chen, Pheng Ann Heng

(参考訳) レコメンデーションの公平性は、従来のレコメンデーションによって引き起こされるバイアスと差別のために、注目を集めている。 Interactive Recommender Systems (IRS)では、ユーザの好みとシステムの公平性は時間とともに常に変化している。既存の公正を意識した推奨者は、主に静的な設定における公平性を考慮する。 IRSに直接既存手法を適用すると、推奨度は低下する。この問題を解決するために,IRSの精度と公平性の長期的バランスを動的に維持する強化学習ベースのフレームワークであるFairRecを提案する。ユーザの好みとシステムの公平性ステータスは、状態表現に共同で圧縮され、レコメンデーションを生成する。 FairRecは、正確性と公正性を組み合わせた設計された累積報酬の最大化を目指している。大規模な実験は、FairRecが優れたレコメンデーション品質を維持しながら、公正性を改善することを実証する。

Fairness in recommendation has attracted increasing attention due to bias and discrimination possibly caused by traditional recommenders. In Interactive Recommender Systems (IRS), user preferences and the system's fairness status are constantly changing over time. Existing fairness-aware recommenders mainly consider fairness in static settings. Directly applying existing methods to IRS will result in poor recommendation. To resolve this problem, we propose a reinforcement learning based framework, FairRec, to dynamically maintain a long-term balance between accuracy and fairness in IRS. User preferences and the system's fairness status are jointly compressed into the state representation to generate recommendations. FairRec aims at maximizing our designed cumulative reward that combines accuracy and fairness. Extensive experiments validate that FairRec can improve fairness, while preserving good recommendation quality.

翻訳日:2023-03-25 14:11:22 公開日:2021-06-25

# 非マルコフ性との量子相関とコヒーレンスの保存

Preserving quantum correlations and coherence with non-Markovianity ( http://arxiv.org/abs/2106.13573v1 )

ライセンス: Link先を確認

Marek Miller, Kang-Da Wu, Manfredi Scalici, Jan Kolodynski, Guo-Yong Xiang, Chuan-Feng Li, Guang-Can Guo, Alexander Streltsov

(参考訳) 開量子系はシュレーディンガー方程式に従って一元的に進化する閉量子系と比較して豊かな現象論を示す。開量子系の力学は通常、任意の時間スケールで量子力学を有効な量子演算に分解できるかどうかによってマルコフ的および非マルコフ的に分類される。マルコフの進化は非マルコフ力学と比較してシミュレートが容易であるため、非マルコフ性は有用な量子技術応用に利用できると仮定することは妥当である。本稿では,量子系における相関とコヒーレンスを保存するための非マルコフ性の有用性を示す。このために我々は、大きな時間でデコヒーレンス行列がゼロから分離している、広範囲な量子ビット進化のクラスを考える。そのようなマルコフの進化は、指数関数的な相関の損失をもたらすが、非マルコフ性は、極限 $t \rightarrow \infty$ においても相関を維持するのに役立つ。共変量子ビットの進化について、非マルコフ性は、量子メトロロジーの重要な資源である量子コヒーレンスを常に維持するのに利用できることも示す。我々は,この効果を線形光学を用いて実験的に実証し,常に非マルコフ的である所要の進化を実装した。

Open quantum systems exhibit a rich phenomenology, in comparison to closed quantum systems that evolve unitarily according to the Schrödinger equation. The dynamics of an open quantum system are typically classified into Markovian and non-Markovian, depending on whether the dynamics can be decomposed into valid quantum operations at any time scale. Since Markovian evolutions are easier to simulate, compared to non-Markovian dynamics, it is reasonable to assume that non-Markovianity can be employed for useful quantum-technological applications. Here, we demonstrate the usefulness of non-Markovianity for preserving correlations and coherence in quantum systems. For this, we consider a broad class of qubit evolutions, having a decoherence matrix separated from zero for large times. While any such Markovian evolution leads to an exponential loss of correlations, non-Markovianity can help to preserve correlations even in the limit $t \rightarrow \infty$. For covariant qubit evolutions, we also show that non-Markovianity can be used to preserve quantum coherence at all times, which is an important resource for quantum metrology. We explicitly demonstrate this effect experimentally with linear optics, by implementing the required evolution that is non-Markovian at all times.

翻訳日:2023-03-25 14:08:41 公開日:2021-06-25

# 量子論のQに基づく解釈の導入

Introducing the Q-based interpretation of quantum theory ( http://arxiv.org/abs/2106.13502v1 )

ライセンス: Link先を確認

Simon Friederich

(参考訳) この記事では量子論の新しい解釈について概説する。この解釈の根底にある考え方は、Drummond と Reid [2020] によって最近提案された場の量子論において、位相空間函数 Q を、古典的な統計力学における確率分布 \rho に大まかに類似した適切な確率分布として解釈することである。ここで、qに基づく解釈を動機付け、経験的に適切かどうかを調べ、その重要な概念的特徴を概説する。 qに基づく解釈は、測定問題を持たないことを約束し、概念的に控えめであり、相対論的および場理論的な文脈にエレガントに適用する可能性を秘めているという点で魅力的である。

This article outlines a novel interpretation of quantum theory: the Q-based interpretation. The core idea underlying this interpretation, recently suggested for quantum field theories by Drummond and Reid [2020], is to interpret the phase space function Q -- a transform of the better known Wigner function -- as a proper probability distribution, roughly analogous to the probability distribution $\rho$ in classical statistical mechanics. Here I motivate the Q-based interpretation, investigate whether it is empirically adequate, and outline some of its key conceptual features. I argue that the Q-based interpretation is attractive in that it promises to have no measurement problem, is conceptually parsimonious, and has the potential to apply elegantly to relativistic and field-theoretic contexts.

翻訳日:2023-03-25 14:07:35 公開日:2021-06-25

# 3 x 3方向不偏光線形光マルチポートの実装

Implementation of a 3 x 3 directionally-unbiased linear optical multiport ( http://arxiv.org/abs/2106.13473v1 )

ライセンス: Link先を確認

Ilhwan Kim, Donghwa Lee, Seongjin Hong, Young-Wook Cho, Kwang Jo Lee, Yong-Su Kim, Hyang-Tag Lim

(参考訳) 線形光マルチポートはフォトニック量子情報処理に広く用いられている。当然これらのデバイスは、光子が常に入力ポートから出力ポートに向かって伝播するため、方向バイアスを受ける。近年,方向不偏光多重ポートの概念が提案されている。これらの方向が不偏な多重ポートは光子を逆方向に伝播させ、複雑な線形光量子ネットワークに必要な線形光学素子の数を大幅に減少させる。本稿では,光トリッタとミラーを用いた3×3方向偏りのない線形光ファイバマルチポートの実証実験を行う。長いコヒーレンス長の光源でしか動作しないバルク光学素子を用いた以前の実演と比較して、実験用3×3光マルチポートは、可能なすべての光軌道に無視可能な光路長差を与えるため、長コヒーレンス長を必要としない。これは複雑なグラフネットワーク上で大規模量子ウォークを実装するのに有用なビルディングブロックである。

Linear optical multiports are widely used in photonic quantum information processing. Naturally, these devices are directionally-biased since photons always propagate from the input ports toward the output ports. Recently, the concept of directionally-unbiased linear optical multiports was proposed. These directionally-unbiased multiports allow photons to propagate along a reverse direction, which can greatly reduce the number of required linear optical elements for complicated linear optical quantum networks. Here, we report an experimental demonstration of a 3 x 3 directionally-unbiased linear optical fiber multiport using an optical tritter and mirrors. Compared to the previous demonstration using bulk optical elements which works only with light sources with a long coherence length, our experimental directionally-unbiased 3 x 3 optical multiport does not require a long coherence length since it provides negligible optical path length differences among all possible optical trajectories. It can be a useful building block for implementing large-scale quantum walks on complex graph networks.

翻訳日:2023-03-25 14:06:52 公開日:2021-06-25

# 全光伝送による水中デコイ状態量子鍵分布の実験

Experimental Demonstration of Underwater Decoy-state Quantum Key Distribution with All-optical Transmission ( http://arxiv.org/abs/2106.13441v1 )

ライセンス: Link先を確認

Yonghe Yu, Wendong Li, Yu Wei, Yang Yang, Shanchuan Dong, Tian Qian, Shuo Wang, Qiming Zhu, Shangshuai Zheng, Xinjian Zhang and Yongjian Gu

(参考訳) 量子信号、同期信号、古典的通信信号の全光伝送を備えた完全なUWQKDシステムを構築し、長さ10.4メートルのJerlov型III海水チャンネル上の水中量子鍵分布(UWQKD)を実証する。波長分割多重化と時空間波長フィルタリング技術を適用し、光信号が干渉しないようにする。このシステムはFPGAで制御されており、水密キャビンに容易に統合してフィールド実験を行うことができる。偏光符号化によるデコイ状態BB84プロトコルを用いることで、13.26 dBの減衰で、安全な鍵レート 1.82 kbps、エラーレート 1.55% が得られる。最大23.7 dBのチャネル損失を許容できることを示し、したがって長さ300 mのJerlov型I清浄海水チャンネルで使用できる可能性がある。

We demonstrate underwater quantum key distribution (UWQKD) over a 10.4-meter Jerlov type III seawater channel by building a complete UWQKD system with all-optical transmission of quantum signals, a synchronization signal and a classical communication signal. Wavelength division multiplexing and space-time-wavelength filtering technology are applied to ensure that the optical signals do not interfere with each other. The system is controlled by an FPGA and can be easily integrated into watertight cabins to perform field experiments. By using the decoy-state BB84 protocol with polarization encoding, we obtain a secure key rate of 1.82 kbps and an error rate of 1.55% at an attenuation of 13.26 dB. We prove that the system can tolerate a channel loss of up to 23.7 dB, and therefore may be used in a 300-meter-long Jerlov type I clean seawater channel.
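
The loss figures quoted above are in decibels, and it can help to see them as fractions of transmitted optical power. The sketch below uses only the standard definition of dB loss. The function and variable names are assumptions, and no claim is made about how the authors decompose the loss between water and system components.

```python
import math

def db_to_transmittance(loss_db):
    """Convert a channel loss in dB to the fraction of optical power transmitted."""
    return 10 ** (-loss_db / 10)

def transmittance_to_db(eta):
    """Convert a transmitted power fraction (0 < eta <= 1) back to a loss in dB."""
    return -10 * math.log10(eta)

measured = db_to_transmittance(13.26)    # attenuation quoted for the experiment
tolerated = db_to_transmittance(23.7)    # maximum channel loss quoted as tolerable
extra_margin_db = 23.7 - 13.26           # headroom between the two figures, in dB
```

Under this definition, 13.26 dB means that a little under 5% of the optical power arrives, while the 23.7 dB tolerance corresponds to well under 1%.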

翻訳日:2023-03-25 14:06:03 公開日:2021-06-25

# 衛星から地球への量子通信における光時間モード

Temporal Modes of Light in Satellite-to-Earth Quantum Communications ( http://arxiv.org/abs/2106.13693v1 )

ライセンス: Link先を確認

Ziqing Wang, Robert Malaney, Ryan Aguinaldo

(参考訳) フォトニック・テンポラル・モード(TM)は、実現可能な多次元量子通信の候補である。しかし、Orbital Angular Momentum (OAM)のような他の多次元量子情報キャリアと比較して、TMはあまり注目されていない。さらに、新興量子インターネットと衛星ベースの量子通信の文脈では、TMは注目されていない。本研究では,tm空間に符号化された単一光子の衛星から地球への通信路を考慮し,この状況を改善する。以上の結果から,photonic tmは衛星から地上への高スループット量子通信を実現するための有望な手段であることが示唆された。特に、これらのモードが、OAM単一光子状態と比較して、衛星-地球間通信路における多重化性能と優れた量子鍵分布を実現する方法を示す。この結果を保証するtm識別のレベルを概説し、衛星ベースの量子インターネットにおける我々の結果の意義について論じる。

The photonic Temporal Mode (TM) represents a possible candidate for the delivery of viable multidimensional quantum communications. However, relative to other multidimensional quantum information carriers such as the Orbital Angular Momentum (OAM), the TM has received less attention. Moreover, in the context of the emerging quantum internet and satellite-based quantum communications, the TM has received no attention. In this work, we remedy this situation by considering the traversal through the satellite-to-Earth channel of single photons encoded in TM space. Our results indicate that for anticipated atmospheric conditions the photonic TM offers a promising avenue for the delivery of high-throughput quantum communications from a satellite to a terrestrial receiver. In particular, we show how these modes can provide for improved multiplexing performance and superior quantum key distribution in the satellite-to-Earth channel, relative to OAM single-photon states. The levels of TM discrimination that guarantee this outcome are outlined and implications of our results for the emerging satellite-based quantum internet are discussed.

翻訳日:2023-03-25 13:59:50 公開日:2021-06-25

# 条件付きフォン・ノイマンエントロピー上のデバイス独立な下界

Device-independent lower bounds on the conditional von Neumann entropy ( http://arxiv.org/abs/2106.13692v1 )

ライセンス: Link先を確認

Peter Brown, Hamza Fawzi and Omar Fawzi

(参考訳) 量子鍵分布(QKD)やランダムネス展開(RE)を含むいくつかのデバイス非依存(DI)プロトコルの速度は、特定の量子状態のクラスに対する条件付きフォン・ノイマンエントロピーの最適化によって計算できる。本研究では,そのようなレートで下限を計算する数値計算手法を提案する。一般分離ヒルベルト空間上で定義される系の条件付きフォン・ノイマンエントロピーに収束する最適化問題を導出する。 Navascués-Pironio-Acín階層を用いて、これらの問題を半定値プログラムに緩和し、DIプロトコルのレートの低い境界を計算する計算可能な方法を与える。提案手法を適用してDI-REおよびDI-QKDプロトコルの速度を計算することにより,従来の数値手法よりも大幅に改善され,DI-REとDI-QKDの両者の速度が大幅に向上したことを示す。特に、DI-QKDでは、現在の能力の範囲内にある最小限の検出効率しきい値を示す。さらに, 本手法は, 既知の密接な解析境界のインスタンスを回収することで, 高速に収束できることを実証する。最後に,本手法はエントロピー累積定理に適合するので,有限ラウンドプロトコルの計算速度を計算し,その安全性を証明できることを示す。

The rates of several device-independent (DI) protocols, including quantum key-distribution (QKD) and randomness expansion (RE), can be computed via an optimization of the conditional von Neumann entropy over a particular class of quantum states. In this work we introduce a numerical method to compute lower bounds on such rates. We derive a sequence of optimization problems that converge to the conditional von Neumann entropy of systems defined on general separable Hilbert spaces. Using the Navascués-Pironio-Acín hierarchy we can then relax these problems to semidefinite programs, giving a computationally tractable method to compute lower bounds on the rates of DI protocols. Applying our method to compute the rates of DI-RE and DI-QKD protocols we find substantial improvements over all previous numerical techniques, demonstrating significantly higher rates for both DI-RE and DI-QKD. In particular, for DI-QKD we show a new minimal detection efficiency threshold which is within the realm of current capabilities. Moreover, we demonstrate that our method is capable of converging rapidly by recovering instances of known tight analytical bounds. Finally, we note that our method is compatible with the entropy accumulation theorem and can thus be used to compute rates of finite round protocols and subsequently prove their security.

翻訳日:2023-03-25 13:59:34 公開日:2021-06-25

# 2つのローカライズ技術--ローエンド電話における高精度ローカライズを可能にする-

The Tale of Two Localization Technologies: Enabling Accurate Low-Overhead WiFi-based Localization for Low-end Phones ( http://arxiv.org/abs/2106.13663v1 )

ライセンス: Link先を確認

Ahmed Shokry, Moustafa Elhamshary, Moustafa Youssef

(参考訳) WiFiフィンガープリントは、屋内ローカライゼーションの主流技術の一つである。しかし、指紋データベースを手動で構築する初期校正フェーズが必要となる。このプロセスは労働集約的であり、環境の変化を繰り返す必要があります。 RF伝搬モデルやクラウドソーシングによる校正作業を減らすために多くのシステムが導入されたが、これらにはいくつかの制限がある。他のアプローチでは、最近開発されたiBeacon技術が、屋内ローカライゼーションのためのWiFiに代わるものとして使われている。しかし、ビーコンベースのソリューションは、ハイエンドフォンの小さなサブセットに限られている。本稿では,精度の低い屋内ローカライズシステムであるhybridlocを提案する。 HybridLocの基本的な考え方は、ハイエンドのスマートフォンのセンサーを利用してローエンドのスマートフォンをローカライズすることだ。具体的には、WiFi指紋は、BLE対応ハイエンドスマートフォンから取得した位置情報をラベル付けしたWi-Fiスキャンによってクラウドソースされる。これらのスキャンは、Wi-Fiフィンガープリントを自動で構築するために使用され、その後、ローエンドの携帯電話をユビキタスなWiFi技術でローカライズするために使われる。 hybridlocはまた、指紋作成に使用される推定ble位置の固有のエラーに対処するとともに、ノイズの多いワイヤレス環境や異種デバイスなどの実用的な配置問題に対処するための規定も備えている。 Android携帯電話を用いたHybridLocの評価では,手動指紋認証技術と同じ範囲で,正確な位置推定が可能である。さらに、WiFiのみをサポートするローエンド端末でのローエンド端末のローカライズ精度は、BLEをサポートするハイエンド端末と同等である。この精度はトレーニングオーバーヘッドなしで達成され、異なるユーザデバイスに対して堅牢であり、環境変化下で一貫性がある。

WiFi fingerprinting is one of the mainstream technologies for indoor localization. However, it requires an initial calibration phase during which the fingerprint database is built manually. This process is labour-intensive and needs to be repeated with any change in the environment. While a number of systems have been introduced to reduce the calibration effort through RF propagation models or crowdsourcing, these still have some limitations. Other approaches use the recently developed iBeacon technology as an alternative to WiFi for indoor localization. However, these beacon-based solutions are limited to a small subset of high-end phones. In this paper, we present HybridLoc: an accurate low-overhead indoor localization system. The basic idea HybridLoc builds on is to leverage the sensors of high-end phones to enable localization of lower-end phones. Specifically, the WiFi fingerprint is crowdsourced by opportunistically collecting WiFi scans labeled with location data obtained from BLE-enabled high-end smart phones. These scans are used to automatically construct the WiFi fingerprint, which is later used to localize any lower-end cell phone with the ubiquitous WiFi technology. HybridLoc also has provisions for handling the inherent error in the estimated BLE locations used in constructing the fingerprint, as well as practical deployment issues such as the noisy wireless environment and heterogeneous devices. Evaluation of HybridLoc using Android phones shows that it can provide accurate localization in the same range as manual fingerprinting techniques under the same conditions. Moreover, the localization accuracy on low-end phones supporting only WiFi is comparable to that achieved with high-end phones supporting BLE. This accuracy is achieved with no training overhead, is robust to the different user devices, and is consistent under environment changes.

翻訳日:2023-03-25 13:58:37 公開日:2021-06-25

# 相互作用する量子系の散逸および散逸のないダイナミクスの等価性とそのユニタリフェルミ気体への応用

Equivalence of dissipative and dissipationless dynamics of interacting quantum systems with its application to the unitary Fermi gas ( http://arxiv.org/abs/2106.13621v1 )

ライセンス: Link先を確認

Masaaki Tokieda and Shimpei Endo

(参考訳) 粒子間相互作用を用いたCaldirola-Kanaiモデルによる量子散逸ダイナミクスの解析を行った。カルディロラ・カナイモデルの散逸量子力学は、粒子が強く相互作用している場合でも、負の外部調和ポテンシャルの下で散逸のない量子力学に正確にマッピングできることがわかった。特に,低温原子や原子核問題に関連する一元的フェルミ気体ではマッピングが有効であることを示す。

We analytically study quantum dissipative dynamics described by the Caldirola-Kanai model with inter-particle interactions. We find that the dissipative quantum dynamics of the Caldirola-Kanai model can be exactly mapped to dissipationless quantum dynamics under a negative external harmonic potential, even when the particles are strongly interacting. In particular, we show that the mapping is valid for the unitary Fermi gas, which is relevant for cold atoms and nuclear matter.

翻訳日:2023-03-25 13:57:54 公開日:2021-06-25

# The Effect of Ground Truth Accuracy on the Evaluation of Localization Systems

http://arxiv.org/abs/2106.13614v1

License: see the linked page

Chen Gu, Ahmed Shokry, Moustafa Youssef

The ability to accurately evaluate the performance of location determination systems is crucial for many applications. Typically, the performance of such systems is obtained by comparing ground truth locations with estimated locations. However, these ground truth locations are usually obtained by clicking on a map or using other worldwide available technologies like GPS. This introduces ground truth errors that are due to the marking process, map distortions, or inherent GPS inaccuracy. In this paper, we present a theoretical framework for analyzing the effect of ground truth errors on the evaluation of localization systems. Based on that, we design two algorithms for computing the real algorithmic error from the validation error and marking/map ground truth errors, respectively. We further establish bounds on different performance metrics. Validation of our theoretical assumptions and analysis using real data collected in a typical environment shows the ability of our theoretical framework to correct the estimated error of a localization algorithm in the presence of ground truth errors. Specifically, our marking error algorithm matches the real error CDF within 4%, and our map error algorithm provides a more accurate estimate of the median/tail error by 150%/72% when the map is shifted by 6m.

Translated: 2023-03-25 13:57:43 Published: 2021-06-25
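
To make the effect described in this abstract concrete, the sketch below simulates how ground truth marking error inflates the error an evaluator observes. It is only an illustration under assumed conditions, not either of the paper's algorithms: the true algorithmic error and the marking error are modeled as independent zero-mean 2D Gaussians, and the names `simulate_errors`, `sigma_alg`, and `sigma_mark` are hypothetical.

```python
import math
import random
import statistics


def simulate_errors(n, sigma_alg, sigma_mark, seed=0):
    """Return (true_errors, observed_errors) in metres for n test points.

    Assumption for illustration: the localization error and the ground
    truth marking error are independent zero-mean 2D Gaussians.
    """
    rng = random.Random(seed)
    true_errors, observed_errors = [], []
    for _ in range(n):
        # Offset of the estimated position from the real position.
        ax, ay = rng.gauss(0, sigma_alg), rng.gauss(0, sigma_alg)
        # Offset of the marked ground truth from the real position.
        mx, my = rng.gauss(0, sigma_mark), rng.gauss(0, sigma_mark)
        true_errors.append(math.hypot(ax, ay))
        # The evaluator only sees the estimate minus the marked ground truth.
        observed_errors.append(math.hypot(ax - mx, ay - my))
    return true_errors, observed_errors


true_err, observed_err = simulate_errors(20000, sigma_alg=2.0, sigma_mark=1.5)
median_true = statistics.median(true_err)
median_observed = statistics.median(observed_err)
print(f"median true error:     {median_true:.2f} m")
print(f"median observed error: {median_observed:.2f} m")
```

Under these assumptions the observed median is larger than the true one, because the variances of the two independent error sources add. Recovering the true error distribution from the observed one is the harder inverse problem that the paper addresses.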

# Focusing Atom Laser Beams

http://arxiv.org/abs/2106.13845v1

License: see the linked page

R. Richberg and A. M. Martin

We theoretically study the focusing of a quasi-continuous atom laser beam of rubidium-85 ($^{85}$Rb). A two-state model analysis based on the Gross-Pitaevskii equation is used, which includes the effects of two-body atom-atom interactions and three-body recombination losses. Using optical focusing potentials such as harmonic potentials, we investigate essential factors such as the width, peak density and atom loss rate of the focused atom laser beam profile. Our analysis predicts that using an atom laser offers a dramatic improvement in resolution of up to $8$ nm.

Translated: 2023-03-25 13:50:29 Published: 2021-06-25

# Experimental Quantum Embedding for Machine Learning

http://arxiv.org/abs/2106.13835v1

License: see the linked page

Ilaria Gianani, Ivana Mastroserio, Lorenzo Buffoni, Natalia Bruno, Ludovica Donati, Valeria Cimini, Marco Barbieri, Francesco S. Cataliotti, and Filippo Caruso

The classification of big data usually requires a mapping onto new data clusters which can then be processed by machine learning algorithms by means of more efficient and feasible linear separators. Recently, Lloyd et al. have advanced the proposal to embed classical data into quantum states: these live in a more complex Hilbert space where they can be split into linearly separable clusters. Here, we implement these ideas by engineering two different experimental platforms, based on quantum optics and ultra-cold atoms respectively, where we adapt and numerically optimize the quantum embedding protocol by deep learning methods, and test it on some trial classical data. We also perform a similar analysis on the Rigetti superconducting quantum computer. We find that the quantum embedding approach also works successfully at the experimental level and, in particular, we show how different platforms could work in a complementary fashion to achieve this task. These studies might pave the way for future investigations on quantum machine learning techniques, especially those based on hybrid quantum technologies.

Translated: 2023-03-25 13:49:36 Published: 2021-06-25

# Approximate decoherence free subspaces for distributed sensing

http://arxiv.org/abs/2106.13828v1

License: see the linked page

Arne Hamann, Pavel Sekatski and Wolfgang Dür

We consider the sensing of scalar valued fields with specific spatial dependence using a network of sensors, e.g. multiple atoms located at different positions within a trap. We show how to harness the spatial correlations to sense only a specific signal, and be insensitive to others at different positions or with unequal spatial dependence, by constructing a decoherence-free subspace for noise sources at fixed, known positions. This can be extended to noise sources lying on certain surfaces, where we encounter a connection to mirror charges and equipotential surfaces in classical electrostatics. For general situations, we introduce the notion of an approximate decoherence-free subspace, where noise for all sources within some volume is significantly suppressed, at the cost of reducing the signal strength in a controlled way. We show that one can use this approach to maintain Heisenberg scaling over long times and for a large number of sensors, despite the presence of multiple noise sources in large volumes. We introduce an efficient formalism to construct internal states and sensor configurations, and apply it to several examples to demonstrate the usefulness and wide applicability of our approach.

Translated: 2023-03-25 13:49:21 Published: 2021-06-25

# Creation of Entangled Photonic States Using Linear Optics

http://arxiv.org/abs/2106.13825v1

License: see the linked page

Sara Bartolucci, Patrick M. Birchall, Mercedes Gimeno-Segovia, Eric Johnston, Konrad Kieling, Mihir Pant, Terry Rudolph, Jake Smith, Chris Sparrow, Mihai D. Vidrighin

Using only linear optical elements, the creation of dual-rail photonic entangled states is inherently probabilistic. Known entanglement generation schemes have low success probabilities, requiring large-scale multiplexing to achieve near-deterministic operation of quantum information processing protocols. In this paper, we introduce multiple techniques and methods to generate photonic entangled states with high probability, which have the potential to reduce the footprint of Linear Optical Quantum Computing (LOQC) architectures drastically. Most notably, we show how to improve Bell state preparation from four single photons to up to p=2/3, boost Type-I fusion to 75% with a dual-rail Bell state ancilla, and improve Type-II fusion beyond the limits of Bell state discrimination.

Translated: 2023-03-25 13:49:01 Published: 2021-06-25
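
The link between per-attempt success probability and multiplexing overhead can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes independent attempts and asks how many must run in parallel for at least one to succeed with a given overall probability. The helper name `attempts_needed` and the 1/2 baseline (the value commonly quoted for unboosted Type-I fusion) are assumptions made for illustration.

```python
import math


def attempts_needed(p_success, target=0.99):
    """Smallest number of independent attempts such that at least one
    succeeds with probability >= target."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # P(all n attempts fail) = (1 - p)^n must be <= 1 - target.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))


for label, p in [
    ("assumed baseline, p = 1/2", 0.5),
    ("Bell state preparation, p = 2/3", 2 / 3),
    ("boosted Type-I fusion, p = 3/4", 0.75),
]:
    print(f"{label}: {attempts_needed(p)} attempts for 99% overall success")
```

Because the required number of attempts scales as 1/log(1 - p), even moderate gains in p reduce the multiplexing overhead noticeably, which is the sense in which higher success probabilities shrink the hardware footprint.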

# Fully tunable hyperfine interactions of hole spin qubits in Si and Ge quantum dots

http://arxiv.org/abs/2106.13744v1

License: see the linked page

Stefano Bosco and Daniel Loss

Hole spin qubits are frontrunner platforms for scalable quantum computers, but state-of-the-art devices suffer from noise originating from the hyperfine interactions with nuclear defects. We show that these interactions have a highly tunable anisotropy that is controlled by device design and external electric fields. This tunability enables sweet spots where the hyperfine noise is suppressed by an order of magnitude and is comparable to isotopically purified materials. We identify surprisingly simple designs where the qubits are highly coherent and are largely unaffected by both charge and hyperfine noise. We find that the large spin-orbit interaction typical of elongated quantum dots not only speeds up qubit operations, but also dramatically renormalizes the hyperfine noise, altering qualitatively the dynamics of driven qubits and enhancing the fidelity of qubit gates. Our findings serve as guidelines to design high performance qubits for scaling up quantum computers.

Translated: 2023-03-25 13:48:16 Published: 2021-06-25

# Comparison of Lossless Image Formats

http://arxiv.org/abs/2108.02557v1

License: see the linked page

David Barina

In recent years, a flood of image and video compression formats has appeared. However, most of them focus on lossy compression and support the lossless mode only marginally. In this paper, I focus on lossless formats and the critical question: "Which one is the most efficient?" It turns out that FLIF is currently the most efficient format for lossless image compression. This finding stands in contrast to the fact that the FLIF developers stopped its development in favor of JPEG XL.

Translated: 2023-03-25 13:41:51 Published: 2021-06-25
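
The efficiency of a lossless format is usually reported as compressed size in bits per pixel, together with a check that decoding reproduces the input exactly. The sketch below shows that measurement on a synthetic image. It is an assumption-laden illustration: zlib from the Python standard library stands in for a real image codec (it is not FLIF or JPEG XL), and `make_gradient_image` and `bits_per_pixel` are hypothetical helper names.

```python
import zlib


def make_gradient_image(width, height):
    """Build a synthetic 8-bit grayscale image as raw row-major bytes."""
    return bytes(((x + y) // 2) % 256 for y in range(height) for x in range(width))


def bits_per_pixel(raw, width, height, level=9):
    """Compress losslessly with zlib and return (bits per pixel, compressed bytes)."""
    compressed = zlib.compress(raw, level)
    # Lossless means the round trip must reproduce the input exactly.
    assert zlib.decompress(compressed) == raw
    return 8 * len(compressed) / (width * height), compressed


width, height = 256, 256
raw = make_gradient_image(width, height)
bpp, compressed = bits_per_pixel(raw, width, height)
print(f"raw: 8.00 bpp, zlib: {bpp:.3f} bpp, ratio: {len(raw) / len(compressed):.1f}x")
```

A comparison of real formats would run each codec over the same corpus of natural images and compare the resulting bits per pixel; a smooth synthetic gradient like this one compresses far better than photographs do.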

# Duality between Poisson and Schrödinger equations in three dimensions

http://arxiv.org/abs/2107.04669v1

License: see the linked page

G. Gonzalez

A duality between an electrostatic problem in a three dimensional world and a quantum mechanical problem in a one dimensional world, which allows one to obtain the ground state solution of the Schrödinger equation by using electrostatic results, is generalized to three dimensions. Here, it is demonstrated that the same transformation technique is also applicable to the s-wave Schrödinger equation in three dimensions for central potentials. This approach leads to a general relationship between the electrostatic potential and the s-wave function, and between the electric energy density and the quantum mechanical energy.

Translated: 2023-03-25 13:41:45 Published: 2021-06-25

# Quantum double aspects of surface code models

http://arxiv.org/abs/2107.04411v1

License: see the linked page

Alexander Cowtan and Shahn Majid

We revisit the Kitaev model for fault tolerant quantum computing on a square lattice with underlying quantum double $D(G)$ symmetry, where $G$ is a finite group. We provide projection operators for its quasiparticle content as irreducible representations of $D(G)$ and combine this with $D(G)$-bimodule properties of open ribbon excitation spaces $L(s_0,s_1)$ to show how open ribbons can be used to teleport information between their endpoints $s_0,s_1$. We give a self-contained account that builds on earlier work but emphasises applications to quantum computing as surface code theory, including gates on $D(S_3)$. We show how the theory reduces to a simpler theory for toric codes in the case of $D(\mathbb{Z}_n)\cong \mathbb{C}\mathbb{Z}_n^2$, including toric ribbon operators and their braiding. In the other direction, we show how our constructions generalise to $D(H)$ models based on a finite-dimensional Hopf algebra $H$, including site actions of $D(H)$ and partial results on ribbon equivariance even when the Hopf algebra is not semisimple.

Translated: 2023-03-25 13:41:36 Published: 2021-06-25

# The Show Must Go On -- Examination During a Pandemic

http://arxiv.org/abs/2107.04014v1

License: see the linked page

Pamela Fleischmann and Mitja Kulczynski and Dirk Nowotka

When unexpected incidents occur, new innovative and flexible solutions are required. If the event is as radical and dramatic as the COVID-19 pandemic, these solutions must aim to guarantee as much normality as possible while protecting lives. After a moment of shock, our university decided that the students had to be able to pursue their studies so that they could obtain a degree in the expected time, since most of them faced immediate financial problems due to the loss of their student jobs. This implied, for us as teachers, that we had to reorganise not only the teaching methods from nearly one day to the next, but we also had to come up with an adjusted way of examinations, which had to take place in person with pen and paper under strict hygiene rules. On the other hand, the correction should avoid personal contacts. We developed a framework which allowed us to correct the digitalised exams safely at home while meeting the high standards given by the general data protection regulation of our country. Moreover, the time spent in the offices could be reduced to a minimum thanks to automatically generated exam sheets and automatically re-digitalised and sorted worked-on exams.

Translated: 2023-03-25 13:41:09 Published: 2021-06-25

# AI and the future of pharmaceutical research

http://arxiv.org/abs/2107.03896v1

License: see the linked page

Adam Zielinski

This paper examines how pharmaceutical Artificial Intelligence advancements may affect the development of new drugs in the coming years. The question was answered by reviewing a rich body of source material, including industry literature, research journals, AI studies, market reports, market projections, discussion papers, press releases, and organizations' websites. The paper argues that continued innovation in pharmaceutical AI will enable rapid development of safe and effective therapies for previously untreatable diseases. A series of major points support this conclusion: The pharmaceutical industry is in a significant productivity crisis today, and AI-enabled research methods can be directly applied to reduce the time and cost of drug discovery projects. The industry has already reported results such as a 10-fold reduction in drug molecule discovery times. Numerous AI alliances between industry, governments, and academia enabled utilizing proprietary data and led to outcomes such as the largest molecule toxicity database to date or more than 200 drug safety predictive models. The momentum was recently increased by the involvement of tech giants combined with record rounds of funding. The long-term effects will range from safer and more effective therapies, through the diminished role of pharmaceutical patents, to large-scale collaboration and new business strategies oriented around currently untreatable diseases. The paper notes that while many reviewed resources seem to have overly optimistic future expectations, even a fraction of these developments would alleviate the productivity crisis. Finally, the paper concludes that the focus on pharmaceutical AI put the industry on a trajectory towards another significant disruption: open data sharing and collaboration.

Translated: 2023-03-25 13:40:49 Published: 2021-06-25

# Patient-independent Schizophrenia Relapse Prediction Using Mobile Sensor based Daily Behavioral Rhythm Changes

http://arxiv.org/abs/2106.15353v1

License: see the linked page

Bishal Lamichhane, Dror Ben-Zeev, Andrew Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Mikio Obuchi, Emily Scherer, Megan Walsh, Rui Wang, Weichen Wang, and Akane Sano

A schizophrenia relapse has severe consequences for a patient's health, work, and sometimes even life safety. If an oncoming relapse can be predicted in time, for example by detecting early behavioral changes in patients, then interventions could be provided to prevent the relapse. In this work, we investigated a machine learning based schizophrenia relapse prediction model using mobile sensing data to characterize behavioral features. A patient-independent model providing sequential predictions, closely representing the clinical deployment scenario for relapse prediction, was evaluated. The model uses the mobile sensing data from the most recent four weeks to predict an oncoming relapse in the next week. We used the behavioral rhythm features extracted from daily templates of mobile sensing data, self-reported symptoms collected via EMA (Ecological Momentary Assessment), and demographics to compare different classifiers for the relapse prediction. A Naive Bayes based model gave the best results, with an F2 score of 0.083 when evaluated on a dataset consisting of 63 schizophrenia patients, each monitored for up to a year. The obtained F2 score, though low, is better than the baseline performance of random classification (F2 score of 0.02 $\pm$ 0.024). Thus, mobile sensing has predictive value for detecting an oncoming relapse and needs further investigation to improve the current performance. Towards that end, further feature engineering and model personalization based on the behavioral idiosyncrasies of a patient could be helpful.

Translated: 2023-03-25 13:40:09 Published: 2021-06-25
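
The F2 score reported in this abstract is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below is a minimal implementation of the standard formula. The function name `f_beta` and the operating points are hypothetical and are not results from the paper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 2 gives the F2 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical operating points, chosen only to illustrate the formula.
balanced = f_beta(0.5, 0.5)     # precision equals recall
rare_event = f_beta(0.02, 0.5)  # many false alarms on a rare event
print(f"balanced: {balanced:.3f}, rare event: {rare_event:.3f}")
```

When relapses are rare, even a model that catches half of them can have very low precision, and the F2 score stays small. This is one reason a low absolute score can still exceed a random baseline.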

# Geometric Heat Pump: Controlling Thermal Transport with Time-dependent Modulations

http://arxiv.org/abs/2106.14687v1

License: see the linked page

Zi Wang, Luqin Wang, Jiangzhi Chen, Chen Wang, and Jie Ren

The second law of thermodynamics dictates that heat spontaneously flows from the hot to the cold bath on average. To go beyond this picture, a range of works in the past decade show that, other than the average dynamical heat flux determined by instantaneous thermal bias, a non-trivial flux contribution of intrinsic geometric origin is generally present in temporally driven systems. This additional heat flux provides a free lunch for the pumped heat and could even drive heat against the bias. We review here the emergence and development of this so-called "geometric heat pump", originating from the topological geometric phase effect, and cover various quantum and classical transport systems with different internal dynamics. The generalization from the adiabatic to the non-adiabatic regime and the application of control theory are also discussed. Then, we briefly discuss the symmetry restrictions on the heat pump effect, such as duality, supersymmetry and time-reversal symmetry. Finally, we examine open problems concerning the geometric heat pump process and elucidate their prospective significance in devising thermal machines with high performance.

Translated: 2023-03-25 13:39:45 Published: 2021-06-25

# Comment on "When the Quantum Energy Teleportation is Observable?"

http://arxiv.org/abs/2106.14680v1

License: see the linked page

Masahiro Hotta

The authors of a recent paper (arXiv:2105.04407) claim that quantum energy teleportation is unobservable due to a time-energy uncertainty relation. In this short note, I point out that their argument is wrong: they misuse the uncertainty relation.

Translated: 2023-03-25 13:39:23 Published: 2021-06-25

# Boolean learning under noise-perturbations in hardware neural networks

http://arxiv.org/abs/2003.12319v2

License: see the linked page

Louis Andreoli, Xavier Porte, Stéphane Chrétien, Maxime Jacquot, Laurent Larger and Daniel Brunner

A high-efficiency hardware integration of neural networks benefits from realizing nonlinearity, network connectivity and learning fully in a physical substrate. Multiple systems have recently implemented some or all of these operations, yet the focus was placed on addressing technological challenges. Fundamental questions regarding learning in hardware neural networks remain largely unexplored. Noise in particular is unavoidable in such architectures, and here we investigate its interaction with a learning algorithm using an opto-electronic recurrent neural network. We find that noise strongly modifies the system's path during convergence and, surprisingly, fully decorrelates the final readout weight matrices. This highlights the importance of understanding architecture, noise and learning algorithm as interacting players, and therefore identifies the need for mathematical tools for noisy, analogue system optimization.

Translated: 2022-12-19 04:27:01 Published: 2021-06-25

# Safe Learning-based Observers for Unknown Nonlinear Systems using Bayesian Optimization

http://arxiv.org/abs/2005.05888v2

License: see the linked page

Ankush Chakrabarty and Mouhacine Benosman

Data generated from dynamical systems with unknown dynamics enable the learning of state observers that are: robust to modeling error, computationally tractable to design, and capable of operating with guaranteed performance. In this paper, a modular design methodology is formulated that consists of three design phases: (i) an initial robust observer design that enables one to learn the dynamics without allowing the state estimation error to diverge (hence, safe); (ii) a learning phase wherein the unmodeled components are estimated using Bayesian optimization and Gaussian processes; and (iii) a re-design phase that leverages the learned dynamics to improve the convergence rate of the state estimation error. The potential of our proposed learning-based observer is demonstrated on a benchmark nonlinear system. Additionally, certificates of guaranteed estimation performance are provided.

Translated: 2022-12-03 20:00:12 Published: 2021-06-25

# DAEMON: Dataset-Agnostic Explainable Malware Classification Using Multi-Stage Feature Mining

http://arxiv.org/abs/2008.01855v2

License: see the linked page

Ron Korine and Danny Hendler

Numerous metamorphic and polymorphic malicious variants are generated automatically on a daily basis by mutation engines that transform the code of a malicious program while retaining its functionality, in order to evade signature-based detection. These automatic processes have greatly increased the number of malware variants, rendering their fully manual analysis impossible. Malware classification is the task of determining to which family a new malicious variant belongs. Variants of the same malware family show similar behavioral patterns. Thus, classifying newly discovered malicious programs and applications helps assess the risks they pose. Moreover, malware classification facilitates determining which of the newly discovered variants should undergo manual analysis by a security expert, in order to determine whether they belong to a new family (e.g., one whose members exploit a zero-day vulnerability) or are simply the result of a concept drift within a known malicious family. This has motivated intense research in recent years on devising high-accuracy automatic tools for malware classification. In this work, we present DAEMON, a novel dataset-agnostic malware classifier. A key property of DAEMON is that the type of features it uses and the manner in which they are mined facilitate understanding the distinctive behavior of malware families, making its classification decisions explainable. We optimized DAEMON using a large-scale dataset of x86 binaries belonging to a mix of several malware families targeting computers running Windows. We then re-trained it and applied it, without any algorithmic change, feature re-engineering or parameter tuning, to two other large-scale datasets of malicious Android applications consisting of numerous malware families. DAEMON obtained highly accurate classification results on all datasets, establishing that it is also platform-agnostic.

Translated: 2022-11-03 01:17:49 Published: 2021-06-25

# Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

http://arxiv.org/abs/2008.03326v3

License: see the linked page

Marco Mondelli, Christos Thrampoulidis and Ramji Venkataramanan

We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. At the heart of our analysis is the exact characterization of the joint empirical distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$. When the distribution of the signal is Gaussian, then the Bayes-optimal combination has the form $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$ and we derive the optimal combination coefficient. In order to establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm whose iterates give $\hat{\boldsymbol x}^{\rm L}$ and approach $\hat{\boldsymbol x}^{\rm s}$. Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.

Translated: 2022-11-02 01:22:51 Published: 2021-06-25

# Fairness in the Eyes of the Data: Certifying Machine-Learning Models

http://arxiv.org/abs/2009.01534v3

License: see the linked page

Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet

We present a framework that allows one to certify the degree of fairness of a model based on an interactive and privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture. Thus, it allows us to evaluate any deep learning model on multiple fairness definitions empirically. We tackle two scenarios, where either the test data is privately available only to the tester or is publicly known in advance, even to the model creator. We investigate the soundness of the proposed approach using theoretical analysis and present statistical guarantees for the interactive test. Finally, we provide a cryptographic technique to automate fairness testing and certified inference with only black-box access to the model at hand while hiding the participants' sensitive data.

Translated: 2022-10-22 06:50:14 Published: 2021-06-25
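
As a rough illustration of what an empirical fairness test with black-box access can look like, the sketch below measures the demographic parity gap of a model on a held-out test set. This is only one common fairness definition and not necessarily one the paper uses. It also omits the interactive, privacy-preserving and cryptographic parts of the framework entirely. The names `demographic_parity_gap` and `toy_model`, and the test records, are hypothetical.

```python
def demographic_parity_gap(model, records, group_key):
    """Largest difference in positive-prediction rate between groups.

    `model` is treated as a black box: it is only called, never inspected.
    Returns (gap, per-group positive rates).
    """
    positives, totals = {}, {}
    for record in records:
        group = record[group_key]
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(bool(model(record)))
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical model and test set, for illustration only.
def toy_model(record):
    return record["income"] >= 50


test_set = [
    {"group": "a", "income": 60}, {"group": "a", "income": 40},
    {"group": "a", "income": 70}, {"group": "a", "income": 55},
    {"group": "b", "income": 45}, {"group": "b", "income": 65},
    {"group": "b", "income": 30}, {"group": "b", "income": 20},
]

gap, rates = demographic_parity_gap(toy_model, test_set, "group")
print(f"positive rates: {rates}, gap: {gap:.2f}")
```

A certification procedure would additionally need a statistical guarantee on how far such an empirical gap can deviate from the true one, which depends on the test set size.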

# 正規化流れにおける深さと条件の表現的側面

Representational aspects of depth and conditioning in normalizing flows ( http://arxiv.org/abs/2010.01155v2 )

ライセンス: Link先を確認

Frederic Koehler, Viraj Mehta, Andrej Risteski

(参考訳) 正規化フローは、特に画像において、データポイントの確率を効率的に評価できるため、生成モデリングにおいて最も一般的なパラダイムの一つである。これは、モデルの適合性を評価することと、トレーニングの容易性の両方において、勾配降下による可能性の最大化が望ましい。しかし、フローの正規化のトレーニングには、困難も伴う。良いサンプルを生成するモデルは通常、非常に深いものが必要です。それらは $\mathbb{R}^d \to \mathbb{R}^d$ から可逆写像としてパラメータ化され、画像のような典型的な訓練データは直感的に低次元であるため、学習された写像は特異点に近いヤコビアンを持つことが多い。本稿では,一般的な可逆アーキテクチャと一般的なアーキテクチャであるアフィンカップリングについて,奥行きと正規化フローの条件付けに関する表現的側面を取り上げる。 GLOWで使われているように、$\Theta(1)$アフィン結合層は、置換を正確に表すのに十分か、または1ドル \times 1$畳み込みを表すのに十分であることを示す。また, 浅いアフィンカップリングネットワークは不調が許容される場合, ワッサースタイン距離の普遍近似であり, パディングに関連する関連する現象を実験的に検討する。最後に,層ごとのニューロン数が少なく,リプシッツ定数が有界な一般フローアーキテクチャの深さ下界を示す。

Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. This is desirable both for evaluating the fit of a model, and for ease of training, as maximizing the likelihood can be done by gradient descent. However, training normalizing flows comes with difficulties as well: models which produce good samples typically need to be extremely deep -- which comes with accompanying vanishing/exploding gradient problems. A very related problem is that they are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d \to \mathbb{R}^d$, and typical training data like images intuitively is lower-dimensional, the learned maps often have Jacobians that are close to being singular. In our paper, we tackle representational aspects around depth and conditioning of normalizing flows: both for general invertible architectures, and for a particular common architecture, affine couplings. We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or $1 \times 1$ convolution, as used in GLOW, showing that representationally the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and experimentally investigate related phenomena involving padding. Finally, we show a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.

翻訳日:2022-10-12 00:59:45 公開日:2021-06-25
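
As a rough illustration of the affine coupling architecture discussed in the abstract above (arXiv:2010.01155), the following minimal Python sketch shows why a single coupling layer is invertible by construction and has a cheap log-determinant. The scale and shift functions `s` and `t` are hypothetical stand-ins for learned networks, and the half/half partition of the input is an assumption made only for this example; it is not taken from the paper.

```python
import math


def s(x1):
    # Hypothetical scale function; in practice this would be a learned network.
    return [math.tanh(v) for v in x1]


def t(x1):
    # Hypothetical shift function; in practice this would be a learned network.
    return [0.5 * v for v in x1]


def coupling_forward(x):
    """Apply y1 = x1, y2 = x2 * exp(s(x1)) + t(x1) to an even-length vector."""
    half = len(x) // 2
    x1, x2 = list(x[:half]), list(x[half:])
    scale, shift = s(x1), t(x1)
    y2 = [b * math.exp(a) + c for a, b, c in zip(scale, x2, shift)]
    # The Jacobian is triangular, so its log-determinant is the sum of scales.
    log_det = sum(scale)
    return x1 + y2, log_det


def coupling_inverse(y):
    """Invert the layer: x1 = y1, x2 = (y2 - t(y1)) * exp(-s(y1))."""
    half = len(y) // 2
    y1, y2 = list(y[:half]), list(y[half:])
    scale, shift = s(y1), t(y1)
    x2 = [(b - c) * math.exp(-a) for a, b, c in zip(scale, y2, shift)]
    return y1 + x2


x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x)
x_recovered = coupling_inverse(y)
```

Because the first half passes through unchanged, the inverse can recompute `s` and `t` from it without ever inverting those functions. When the scale values are large in magnitude, the Jacobian becomes badly conditioned, which is the regime the abstract refers to as ill-conditioning.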

# ニューラルマシン翻訳におけるソース分析と予測への目標貢献

Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation ( http://arxiv.org/abs/2010.10907v3 )

ライセンス: Link先を確認

Elena Voita, Rico Sennrich, Ivan Titov

(参考訳) ニューラル機械翻訳(およびより一般的には、条件付き言語モデリング)では、ターゲットトークンの生成は、ソースとターゲットシーケンスのプレフィックスという2つのタイプのコンテキストに影響される。 NMTモデルの内部動作を理解するために多くの試みがなされているが、いずれも生成の決定に対するソースとターゲットの相対的な貢献を明示的に評価するものではない。この相対的貢献は、Layerwise Relevance Propagation (LRP)の変種を採用することで評価できると論じる。他の方法とは異なり、トークンの重要性を反映した抽象的な量ではなく、それぞれのトークンの影響の比率を評価する。我々は、LRPをTransformerに拡張し、生成プロセスに対するソースおよびターゲットの相対的な貢献を明示的に評価するNMTモデルの解析を行う。本研究は,プレフィックスの種類による条件づけや,トレーニング目標やトレーニングデータ量の変化,トレーニングプロセスにおける貢献度の変化を分析する。より多くのデータでトレーニングされたモデルは、ソース情報により強く依存し、より鋭いトークン貢献を持つ傾向があることが分かりました。

In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. We find that models trained with more data tend to rely on source information more and to have more sharp token contributions; the training process is non-monotonic with several stages of different nature.

翻訳日:2022-10-04 23:50:19 公開日:2021-06-25

# 視覚オブジェクト操作のためのキーポイント予測モデルのマルチモーダル学習

Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation ( http://arxiv.org/abs/2011.03882v2 )

ライセンス: Link先を確認

Sarah Bechtle, Neha Das and Franziska Meier

(参考訳) 人間は、全く新しい環境でオブジェクトやツールを操作できるという印象的な一般化能力を持っている。これらの能力は、少なくとも部分的には、人間の体の内部モデルと把握された物体を持つ結果である。ロボットのボディスキーマを学習する方法は、まだ未解決の問題である。本研究では,視覚潜在表現から物体をつかむ際にロボットの運動モデルを拡張できる自己教師付きアプローチを開発した。本フレームワークは,(1) 物体上の視覚的キーポイントを予測するためにプロセプションと視覚を融合させて訓練したオートエンコーダアーキテクチャ,(2) 学習したキーポイント検出器を用いて,予測された視覚的キーポイントから仮想ジョイントを回帰することにより,キネマティックチェーンの拡張を学習する方法を示す。提案手法は,マニピュレータの手にある物体の視覚的キーポイントを一貫して予測することを学び,数秒間の視覚データから,様々な構成で把握された物体を含む拡張キネマティックチェーンの学習を容易にする。最後に, この拡張キネマティックチェーンは, 把握対象の配置やシミュレーション実験, ハードウェア上での実験など, オブジェクト操作作業に役立てることを示す。

Humans have impressive generalization capabilities when it comes to manipulating objects and tools in completely novel environments. These capabilities are, at least partially, a result of humans having internal models of their bodies and any grasped object. How to learn such body schemas for robots remains an open problem. In this work, we develop a self-supervised approach that can extend a robot's kinematic model when grasping an object from visual latent representations. Our framework comprises two components: (1) we present a multi-modal keypoint detector: an autoencoder architecture trained by fusing proprioception and vision to predict visual keypoints on an object; (2) we show how we can use our learned keypoint detector to learn an extension of the kinematic chain by regressing virtual joints from the predicted visual keypoints. Our evaluation shows that our approach learns to consistently predict visual keypoints on objects in the manipulator's hand, and thus can easily facilitate learning an extended kinematic chain to include the object grasped in various configurations, from a few seconds of visual data. Finally, we show that this extended kinematic chain lends itself to object manipulation tasks such as placing a grasped object, and we present experiments in simulation and on hardware.

翻訳日:2022-09-28 08:35:52 公開日:2021-06-25

# テキスト非依存話者照合のためのMasked Proxy損失

Masked Proxy Loss For Text-Independent Speaker Verification ( http://arxiv.org/abs/2011.04491v2 )

ライセンス: Link先を確認

Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

(参考訳) オープンセット話者認識は,クラス間分散を最大化し,クラス内分散を最小化する,メートル法学習問題とみなすことができる。教師付きメトリック学習は、エンティティベースの学習とプロキシベースの学習に分類できる。 contrastive、triplet、prototypical、ge2eなど、既存のメトリック学習の目標のほとんどは、前者の区分に属しており、そのパフォーマンスはサンプルマイニング戦略に大きく依存するか、ミニバッチのラベル情報に制限されている。プロキシベースの損失は、どちらの欠点も軽減するが、エンティティ間のきめ細かい接続は、そうでないか間接的に活用される。本稿では、プロキシベースの関係とペアベースの関係を直接組み込んだMasked Proxy(MP)損失を提案する。さらに,話者対の難易度を活用するために,マルチノマルマスクドプロキシ(mmp)損失を提案する。これらの手法はVoxCelebテストセットの評価に応用され、最先端のEER(Equal Error Rate)に達する。

Open-set speaker recognition can be regarded as a metric learning problem, in which the goal is to maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning. Most existing metric learning objectives, such as Contrastive, Triplet, Prototypical, and GE2E, belong to the former category; their performance is either highly dependent on the sample mining strategy or restricted by insufficient label information in the mini-batch. Proxy-based losses mitigate both shortcomings; however, fine-grained connections among entities are either not leveraged or leveraged only indirectly. This paper proposes a Masked Proxy (MP) loss which directly incorporates both proxy-based relationships and pair-based relationships. We further propose a Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs. These methods are evaluated on the VoxCeleb test set and reach a state-of-the-art Equal Error Rate (EER).

翻訳日:2022-09-28 02:02:57 公開日:2021-06-25

# 強化因子ポートフォリオに対するディリクレポリシー

Dirichlet policies for reinforced factor portfolios ( http://arxiv.org/abs/2011.05381v3 )

ライセンス: Link先を確認

Eric Andr\'e and Guillaume Coqueret

(参考訳) 本稿では、ファクター投資と強化学習(RL)を組み合わせることを目的とする。エージェントは、企業の特性に依存する逐次ランダム割り当てを通じて学習する。ディリクレ分布を駆動方針として用いることにより,政策勾配の閉形式と性能尺度の解析的性質を導出する。これにより、米国株式の大きなデータセット上で実行されるREINFORCEメソッドの実装が可能になる。幅広いパラメータ選択において、rlベースのポートフォリオは均等に重み付けされた(1/n)の割り当てに非常に近いことがわかった。これは、エージェントが因子に関して*不可知的*(agnostic)であることを学ぶことを意味し、これは部分的には、リターンと企業特性の関係において強い時間変化を示す断面回帰によって説明できる。

This article aims to combine factor investing and reinforcement learning (RL). The agent learns through sequential random allocations which rely on firms' characteristics. Using Dirichlet distributions as the driving policy, we derive closed forms for the policy gradients and analytical properties of the performance measure. This enables the implementation of REINFORCE methods, which we perform on a large dataset of US equities. Across a large range of parametric choices, our result indicates that RL-based portfolios are very close to the equally-weighted (1/N) allocation. This implies that the agent learns to be *agnostic* with regard to factors, which can partly be explained by cross-sectional regressions showing a strong time variation in the relationship between returns and firm characteristics.

翻訳日:2022-09-27 08:16:52 公開日:2021-06-25
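
To make the idea of a Dirichlet-distributed allocation policy from the abstract above (arXiv:2011.05381) more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each asset's concentration parameter is `exp(theta . x_i)` for a characteristics vector `x_i`; this parameterization and the function names are hypothetical and are not taken from the paper. The sampler uses the standard construction of a Dirichlet draw from normalized Gamma variates.

```python
import math
import random


def concentrations(theta, characteristics):
    # Hypothetical link: alpha_i = exp(theta . x_i), which keeps alpha_i > 0.
    return [
        math.exp(sum(t * c for t, c in zip(theta, x)))
        for x in characteristics
    ]


def sample_weights(alpha, rng):
    # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1) draws
    # divided by their sum, so the weights are positive and sum to one.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def expected_weights(alpha):
    # The mean of a Dirichlet distribution is alpha_i / sum(alpha).
    total = sum(alpha)
    return [a / total for a in alpha]


rng = random.Random(0)
firm_characteristics = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.2]]  # made-up values
alpha = concentrations([0.0, 0.0], firm_characteristics)
weights = sample_weights(alpha, rng)
mean_weights = expected_weights(alpha)
```

With `theta` set to zero the policy ignores the characteristics entirely, every concentration equals one, and the expected allocation is the equally-weighted 1/N portfolio, which is the benchmark the abstract says the learned portfolios end up close to.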

# iReason: ビデオと解釈可能な自然言語を用いたマルチモーダルコモンセンス推論

iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability ( http://arxiv.org/abs/2107.10300v1 )

ライセンス: Link先を確認

Aman Chadha and Vinija Jain

(参考訳) 因果的知識は、堅牢なAIシステムを構築する上で不可欠である。ディープラーニングモデルは、しばしば因果推論を必要とするタスクでパフォーマンスが悪く、入力ですぐには利用できないが、人間によって暗黙的に推論されるある種のコモンセンス知識を用いて導出されることが多い。先行研究は、モデルが因果性の欠如に危険を及ぼすような、散発的な観察バイアスを生じさせていない。言語表現モデルは学習された組込みの中で文脈知識を保存するが、訓練中の因果関係には影響しない。視覚認知タスク(シーン理解、ビデオキャプション、ビデオ質問回答など)を実行する既存のモデルに、入力特徴と因果関係をブレンドすることにより。 ) 因果関係がもたらす洞察により、より良いパフォーマンスを達成することができる。近年,視覚的・テキスト的モダリティから因果データを抽出する作業に取り組むモデルがいくつか提案されている。しかし、視覚と言語的モダリティを併せ持つ因果関係を探究する広範な研究は存在していない。画像は因果関係の知識を抽出するためのリッチでプロセスのリソースを提供するが、ビデオはより密度が高く、自然に時間順のイベントで構成されている。また、テキスト情報はビデオで暗黙的な詳細を提供する。 ireasonは,映像と自然言語キャプションを用いて視覚・視覚常識知識を推定するフレームワークである。さらに、iReasonのアーキテクチャは因果合理化モジュールを統合し、解釈可能性、エラー分析、バイアス検出のプロセスを支援する。言語表現学習モデル(BERT, GPT-2)と現在の最先端マルチモーダル因果モデルを用いた2段階比較分析によるiReasonの有効性を実証する。

Causality knowledge is vital to building robust AI systems. Deep learning models often perform poorly on tasks that require causal reasoning, which is often derived using some form of commonsense knowledge not immediately available in the input but implicitly inferred by humans. Prior work has unraveled spurious observational biases that models fall prey to in the absence of causality. While language representation models preserve contextual knowledge within learned embeddings, they do not factor in causal relationships during training. By blending causal relationships with the input features to an existing model that performs visual cognition tasks (such as scene understanding, video captioning, video question-answering, etc.), better performance can be achieved owing to the insight causal relationships bring about. Recently, several models have been proposed that have tackled the task of mining causal data from either the visual or textual modality. However, there does not exist widespread research that mines causal relationships by juxtaposing the visual and language modalities. While images offer a rich and easy-to-process resource for us to mine causality knowledge from, videos are denser and consist of naturally time-ordered events. Also, textual information offers details that could be implicit in videos. We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions. Furthermore, iReason's architecture integrates a causal rationalization module to aid the process of interpretability, error analysis and bias detection. We demonstrate the effectiveness of iReason using a two-pronged comparative analysis with language representation learning models (BERT, GPT-2) as well as current state-of-the-art multimodal causality models.

翻訳日:2021-07-25 11:53:39 公開日:2021-06-25

# (参考訳) aiSTROM -- AI戦略を成功させるロードマップ

aiSTROM -- A roadmap for developing a successful AI strategy ( http://arxiv.org/abs/2107.06071v1 )

ライセンス: CC BY 4.0

Dorien Herremans

(参考訳) 1,870社のRackspace Technologyによる最近の調査によると、AI研究開発プロジェクトの34%が失敗または放棄されている。我々は、管理者が詳細な文献レビューに基づいてAI戦略を成功させるための新しい戦略フレームワーク、aiSTROMを提案する。これは、マネージャと開発者を、実装プロセスのさまざまな課題を通じて導く、ユニークで統合されたアプローチを提供する。 aiSTROMフレームワークでは、まずトップnプロジェクト(典型的には3-5)を特定します。それぞれ、焦点の7つの領域を徹底的に分析する。これらの領域には、独自の部門間機械学習データ要件、セキュリティ、法的要件を考慮したデータ戦略の作成が含まれる。そして、AI人材の不足を踏まえた学際的人工知能(AI)実装チームを編成する方法を考えるようマネージャに指示する。 AIチームの戦略が確立すれば、部門横断的あるいは独立した部門として、組織内に配置する必要があります。その他には、AI as a service(AIaas)やアウトソーシング開発などがある。新しい技術を見てみると、バイアス、ブラックボックスモデルの合法性、人間をループに留めるといった課題を考える必要がある。次に、他のプロジェクトと同様に、進捗を追跡し検証するために価値ベースのキーパフォーマンス指標(KPI)が必要です。企業のリスク戦略によって、SWOT分析(強度、弱点、機会、脅威)は、ショートリスト化されたプロジェクトをさらに分類するのに役立ちます。最後に、採用の文化を実現するために、当社の戦略が従業員の継続的な教育を含むことを確実にすべきです。このユニークで包括的なフレームワークは、マネージャとリードディベロッパに価値ある文学的サポートを提供する。

A total of 34% of AI research and development projects fail or are abandoned, according to a recent survey by Rackspace Technology of 1,870 companies. We propose a new strategic framework, aiSTROM, that empowers managers to create a successful AI strategy based on a thorough literature review. This provides a unique and integrated approach that guides managers and lead developers through the various challenges in the implementation process. In the aiSTROM framework, we start by identifying the top n potential projects (typically 3-5). For each of those, seven areas of focus are thoroughly analysed. These areas include creating a data strategy that takes into account unique cross-departmental machine learning data requirements, security, and legal requirements. aiSTROM then guides managers to think about how to put together an interdisciplinary artificial intelligence (AI) implementation team given the scarcity of AI talent. Once an AI team strategy has been established, the team needs to be positioned within the organization, either cross-departmentally or as a separate division. Other considerations include AI as a service (AIaaS) and outsourcing development. Looking at new technologies, we have to consider challenges such as bias, the legality of black-box models, and keeping humans in the loop. Next, as with any project, we need value-based key performance indicators (KPIs) to track and validate progress. Depending on the company's risk strategy, a SWOT analysis (strengths, weaknesses, opportunities, and threats) can help further classify the shortlisted projects. Finally, we should make sure that our strategy includes continuous education of employees to enable a culture of adoption. This unique and comprehensive framework offers a valuable, literature-supported tool for managers and lead developers.

翻訳日:2021-07-18 19:56:15 公開日:2021-06-25

# 深部ニューラルネットワークを用いた内因性電位を用いた自然脳と機械の相互作用

Towards Natural Brain-Machine Interaction using Endogenous Potentials based on Deep Neural Networks ( http://arxiv.org/abs/2107.07335v1 )

ライセンス: Link先を確認

Hyung-Ju Ahn, Dae-Hyeok Lee, Ji-Hoon Jeong, Seong-Whan Lee

(参考訳) 人間とロボットのコラボレーションは、自律ロボットの操作効率を最大化する可能性がある。脳機械インタフェース(BMI)は、ユーザーの意図や状態が神経活動から翻訳できるため、ロボットと協調する上で望ましい技術である。しかし、最も一般的な非侵襲的BMIモダリティの1つである脳波図(EEG)は、信号対雑音比が低いため、精度が低く、自由度(DoF)が制限されている。したがって、より柔軟なBMIベースの人間ロボットコラボレーションを開発するためには、マルチクラス脳波分類の性能向上が不可欠である。本研究では,運動画像 (MI) や視覚画像 (VI) ,音声画像 (SI) などの複数の内因性BMIパラダイムのパラダイム間分類の可能性を検討した。 MI, VI, SIの統計的, 神経生理学的解析を行い, 提案した時間情報ベースニューラルネットワーク(TINN)を用いて3つのパラダイムを分類した。 3つの内在的パラダイムを分類すると, 統計的に有意な特徴が異なる脳領域で抽出できることを確認した。さらに,提案したTINNは,従来の3種類の精神イメージタスク(MI, VI, SI)の分類法と比較して0.93の精度を示した。

Human-robot collaboration has the potential to maximize the efficiency of the operation of autonomous robots. Brain-machine interface (BMI) would be a desirable technology to collaborate with robots since the intention or state of users can be translated from the neural activities. However, the electroencephalogram (EEG), which is one of the most popularly used non-invasive BMI modalities, has low accuracy and a limited degree of freedom (DoF) due to a low signal-to-noise ratio. Thus, improving the performance of multi-class EEG classification is crucial to develop more flexible BMI-based human-robot collaboration. In this study, we investigated the possibility for inter-paradigm classification of multiple endogenous BMI paradigms, such as motor imagery (MI), visual imagery (VI), and speech imagery (SI), to enhance the limited DoF while maintaining robust accuracy. We conducted the statistical and neurophysiological analyses on MI, VI, and SI and classified three paradigms using the proposed temporal information-based neural network (TINN). We confirmed that statistically significant features could be extracted on different brain regions when classifying three endogenous paradigms. Moreover, our proposed TINN showed the highest accuracy of 0.93 compared to the previous methods for classifying three different types of mental imagery tasks (MI, VI, and SI).

翻訳日:2021-07-18 12:21:15 公開日:2021-06-25

# (参考訳) IoTセキュリティにおける侵入検出のためのフェデレーション学習:ハイブリッドアンサンブルアプローチ

Federated Learning for Intrusion Detection in IoT Security: A Hybrid Ensemble Approach ( http://arxiv.org/abs/2106.15349v1 )

ライセンス: CC BY 4.0

Sayan Chatterjee and Manjesh K. Hanawal

(参考訳) スマートシティ、ヘルスケア、サプライチェーン、輸送といったさまざまなドメインにおけるIoT(Internet of Things)の役割は、悪意のある攻撃の標的となっている。この領域における過去の研究は、データ分析と脅威を特定する中央エンティティの存在を前提として、集中侵入検知システム(IDS)に焦点を当てていた。しかし、複数のソースにまたがるデータの拡散と中央ノードでの収集がコストがかかるため、そのようなIDSが常に実現可能であるとは限らない。また、初期の作業は、主にTrue Positive Rate(TPR)の改善に重点を置いており、システムの不必要なダウンタイムを回避する上でも不可欠である偽陽性レート(FPR)を無視している。本稿では、まず、PHECと呼ばれるハイブリッドアンサンブルモデルに基づくIDSのためのアーキテクチャを提案する。次に、このモデルを、ローカルトレーニングを実行し、モデルパラメータのみを集約する連合学習フレームワークに適応させる。次に、ラベルノイズ問題に対処するために、集中型およびフェデレートされた環境における耐雑音性PHECを提案する。提案手法は重み付き凸代理損失関数を用いた分類器を用いる。提案アーキテクチャでは,KNN分類器のノイズデータに対する自然な堅牢性も利用している。各種セキュリティ攻撃から得られた4つのベンチマークデータセットによる実験結果から,FPRを低ノイズでクリーンなデータに保ちながら高いTPRを達成することが示された。さらに, ハイブリッドアンサンブルモデルにより, 集中型設定に近いフェデレーション設定において, 性能が向上することを示した。

The critical role of the Internet of Things (IoT) in various domains such as smart cities, healthcare, supply chains, and transportation has made IoT systems the target of malicious attacks. Past works in this area focused on centralized Intrusion Detection Systems (IDS), assuming the existence of a central entity to perform data analysis and identify threats. However, such an IDS may not always be feasible, mainly because data is spread across multiple sources and gathering it at a central node can be costly. Also, earlier works primarily focused on improving the True Positive Rate (TPR) and ignored the False Positive Rate (FPR), which is also essential to avoid unnecessary downtime of the systems. In this paper, we first present an architecture for IDS based on a hybrid ensemble model, named PHEC, which gives improved performance compared to state-of-the-art architectures. We then adapt this model to a federated learning framework that performs local training and aggregates only the model parameters. Next, we propose Noise-Tolerant PHEC in centralized and federated settings to address the label-noise problem. The proposed idea uses classifiers with weighted convex surrogate loss functions. The natural robustness of the KNN classifier to noisy data is also used in the proposed architecture. Experimental results on four benchmark datasets drawn from various security attacks show that our model achieves high TPR while keeping FPR low on noisy and clean data. Further, they also demonstrate that the hybrid ensemble models achieve performance in federated settings close to that of the centralized settings.

翻訳日:2021-07-02 04:49:45 公開日:2021-06-25

# (参考訳) PQK: プルーニング、量子化、知識蒸留によるモデル圧縮

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation ( http://arxiv.org/abs/2106.14681v1 )

ライセンス: CC BY 4.0

Jangho Kim, Simyung Chang and Nojun Kwak

(参考訳) エッジデバイスが普及するにつれて、エッジデバイスにディープニューラルネットワーク(DNN)をデプロイすることが重要な問題となっている。しかし、DNNはエッジデバイスではほとんど利用できない高い計算資源を必要とする。そこで本稿では, プルーニング, 量子化, 知識蒸留(KD)プロセスからなるPQKと呼ばれる, 限られた計算資源を持つデバイスを対象とした新しいモデル圧縮手法を提案する。従来のプルーニングやKDとは異なり、PQKはプルーニング過程において重要でない重みを利用して、教師モデルを事前訓練することなく、より良い学生ネットワークをトレーニングするための教師ネットワークを構築している。 PQKには2つのフェーズがある。フェーズ1は、反復的プルーニングと量子化対応トレーニングを利用して、軽量で電力効率の良いモデルを作成する。第2相では、第1相未使用の重要度重みを刈り込みネットワークに付加して教師ネットワークを構築する。この教師ネットワークを用いて,学生ネットワークとして刈り取られたネットワークを訓練する。このような場合、教師と学生のネットワークが同一ネットワーク内で共存するため、KDフレームワーク用に事前学習した教師ネットワークは必要ない。本手法を認識モデルに適用し,キーワードスポッティング(KWS)と画像認識におけるPQKの有効性を検証する。

As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on edge devices has become a critical issue. However, DNNs require substantial computational resources, which are rarely available on edge devices. To handle this, we propose a novel model compression method for devices with limited computational resources, called PQK, consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of unimportant weights pruned in the pruning process to make a teacher network for training a better student network without pre-training the teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to make a lightweight and power-efficient model. In phase 2, we make a teacher network by adding unimportant weights unused in phase 1 to a pruned network. By using this teacher network, we train the pruned network as a student network. In doing so, we do not need a pre-trained teacher network for the KD framework because the teacher and the student networks coexist within the same network. We apply our method to recognition models and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.

翻訳日:2021-07-02 04:32:43 公開日:2021-06-25
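
The two-phase structure described in the PQK abstract above (arXiv:2106.14681) can be sketched on a toy weight vector. The Python snippet below is only an illustration under stated assumptions: it uses magnitude-based pruning and symmetric uniform quantization, which are common choices but are not confirmed by the abstract, and all function names are hypothetical. It shows how a pruned "student" and a "teacher" that re-adds the pruned weights can share one set of parameters.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the smallest-magnitude fraction of weights."""
    n_pruned = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_pruned])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def quantize(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit levels (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]  # made-up example values

# Phase 1 (sketch): prune, then quantize the surviving weights.
mask = magnitude_mask(weights, sparsity=0.5)
student = quantize([w * m for w, m in zip(weights, mask)], bits=4)

# Phase 2 (sketch): the teacher re-adds the pruned ("unimportant") weights
# to the student, so both networks live in the same parameter set.
teacher = [s if m else w for s, w, m in zip(student, weights, mask)]
```

In an actual training loop the teacher's outputs would supply the distillation targets for the student; that step is omitted here because the abstract does not specify the loss.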

# (参考訳) 問題を使って法的決定を説明する

Using Issues to Explain Legal Decisions ( http://arxiv.org/abs/2106.14688v1 )

ライセンス: CC BY 4.0

Trevor Bench-Capon

(参考訳) 訴訟の結果を予測するために設計された機械学習システムからのアウトプットを説明する必要性は、従来のaiと法システム、特に因子ベースの推論と前例を用いた説明に対する新たな関心をもたらした。本稿では,このようなシステムに対してどのような説明が期待できるのか,特に問題の利用によって提供できる構造に焦点をあてて検討する。

The need to explain the output from Machine Learning systems designed to predict the outcomes of legal cases has led to a renewed interest in the explanations offered by traditional AI and Law systems, especially those using factor based reasoning and precedent cases. In this paper we consider what sort of explanations we should expect from such systems, with a particular focus on the structure that can be provided by the use of issues in cases.

翻訳日:2021-07-01 12:27:13 公開日:2021-06-25

# (参考訳) 層状量子近似最適化におけるトレーニング飽和

Training Saturation in Layerwise Quantum Approximate Optimisation ( http://arxiv.org/abs/2106.13814v1 )

ライセンス: CC BY 4.0

E. Campos, D. Rabinovich, V. Akshay, J. Biamonte

(参考訳) 量子近似最適化(QAOA)は、今日最も研究されているゲートベースの変分量子アルゴリズムである。QAOAを1層ずつトレーニングし、$n$ qubitのターゲットステートとのオーバーラップを最大化する。これによって、このようなトレーニングは常に -- \textit{training saturation} と呼ばれる -- 一定の深さ $p^*$ で飽和すること、すなわち、ある深さを超えると層を追加してもオーバーラップが改善しないことが分かった。我々は飽和に必要な条件を定式化する。数値的には、層ごとのQAOAは深さ$p^*=n$で最大オーバーラップに達する。トレーニングにコヒーレントな位相緩和(dephasing)エラーを加えることで、飽和が解消され、層ごとのトレーニングの堅牢性が回復する。本研究は,QAOAの性能限界と今後の展望について,新たな光を当てるものである。

Quantum Approximate Optimisation (QAOA) is the most studied gate-based variational quantum algorithm today. We train QAOA one layer at a time to maximize overlap with an $n$ qubit target state. Doing so, we discovered that such training always saturates -- called \textit{training saturation} -- at some depth $p^*$, meaning that past a certain depth, overlap cannot be improved by adding subsequent layers. We formulate necessary conditions for saturation. Numerically, we find layerwise QAOA reaches its maximum overlap at depth $p^*=n$. The addition of coherent dephasing errors to training removes saturation, recovering robustness to layerwise training. This study sheds new light on the performance limitations and prospects of QAOA.

翻訳日:2021-07-01 11:55:24 公開日:2021-06-25

# (参考訳) ペルシャの修辞構造理論

Persian Rhetorical Structure Theory ( http://arxiv.org/abs/2106.13833v1 )

ライセンス: CC BY 4.0

Sara Shahmohammadi, Hadi Veisi, Ali Darzi

(参考訳) 過去数年間、談話分析や談話解析への関心は着実に高まり、多くの談話注釈コーパスが作られ、結果として談話パーサーが作られた。本稿では、レトリック構造理論の枠組みで構築されたペルシア語の言論注釈コーパスと、オープンソースの言論パーサであるDPLPパーサ上に構築された言論パーサについて述べる。私たちのコーパスは150のジャーナリストのテキストで構成され、各テキストは平均約400語である。コーパステキストは18の談話関係を用いて注釈付けされ、英語のrst談話ツリーバンクコーパスの注釈ガイドラインに基づいている。テキストレベルの談話パーサは金セグメンテーションを用いて訓練され,DPLP談話パーサ上に構築されている。スパン (s), 核性 (n), 関係性 (r) における我々の談話解析器の性能は, それぞれ78%, 64%, 44%であった。

Over the past years, interest in discourse analysis and discourse parsing has steadily grown, and many discourse-annotated corpora and, as a result, discourse parsers have been built. In this paper, we present a discourse-annotated corpus for the Persian language built in the framework of Rhetorical Structure Theory as well as a discourse parser built upon the DPLP parser, an open-source discourse parser. Our corpus consists of 150 journalistic texts, each text having an average of around 400 words. Corpus texts were annotated using 18 discourse relations and based on the annotation guideline of the English RST Discourse Treebank corpus. Our text-level discourse parser is trained using gold segmentation and is built upon the DPLP discourse parser, which uses a large-margin transition-based approach to solve the problem of discourse parsing. The performance of our discourse parser in span (S), nuclearity (N) and relation (R) detection is around 78%, 64%, 44% respectively, in terms of F1 measure.

翻訳日:2021-07-01 11:45:58 公開日:2021-06-25

# (参考訳) Ladder Polynomial Neural Networks

Ladder Polynomial Neural Networks ( http://arxiv.org/abs/2106.13834v1 )

ライセンス: CC BY-SA 4.0

Li-Ping Liu, Ruiyuan Gu, Xiaozhe Hu

(参考訳) 多項式関数は有用な解析的性質を多数持っているが、それらの関数クラスは制限されていると考えられるため、学習モデルとして使われることは滅多にない。この研究は、適切な多項式関数を訓練すると強い学習モデルになることを示す。特にこの研究は、乗法から構築した新しい活性化関数である積活性化を用いて多項式フィードフォワードニューラルネットワークを構築する。新しいニューラルネットワークは多項式関数であり、多項式の順序を正確に制御する。バッチ正規化やドロップアウトといった標準的なトレーニングテクニックでトレーニングすることができる。この新しいfeedforwardネットワークは、いくつかの以前の多項式モデルを特別なケースとしてカバーする。一般的なフィードフォワードニューラルネットワークと比較して、多項式フィードフォワードネットワークはいくつかの興味深い量のクローズドフォーム計算を持ち、ベイズ学習において非常に有用である。経験的研究における回帰と分類の一連のタスクにおいて、提案モデルは以前の多項式モデルよりも優れている。

Polynomial functions have plenty of useful analytical properties, but they are rarely used as learning models because their function class is considered to be restricted. This work shows that when trained properly polynomial functions can be strong learning models. Particularly this work constructs polynomial feedforward neural networks using the product activation, a new activation function constructed from multiplications. The new neural network is a polynomial function and provides accurate control of its polynomial order. It can be trained by standard training techniques such as batch normalization and dropout. This new feedforward network covers several previous polynomial models as special cases. Compared with common feedforward neural networks, the polynomial feedforward network has closed-form calculations of a few interesting quantities, which are very useful in Bayesian learning. In a series of regression and classification tasks in the empirical study, the proposed model outperforms previous polynomial models.

翻訳日:2021-07-01 11:24:27 公開日:2021-06-25

# (参考訳) EARLIN:資源効率の協調推論のための早期分布検出

EARLIN: Early Out-of-Distribution Detection for Resource-efficient Collaborative Inference ( http://arxiv.org/abs/2106.13842v1 )

ライセンス: CC BY 4.0

Sumaiya Tabassum Nimi, Md Adnan Arefeen, Md Yusuf Sarwar Uddin, Yugyung Lee

(参考訳) 協調推論により、リソース制約のあるエッジデバイスは、重いディープラーニングモデルを実行するサーバ(クラウド)に入力(画像など)をアップロードすることで、推論を行うことができる。このセットアップは、成功した推論のためにコスト効率よく機能するが、モデルがトレーニングされていない入力サンプル(OOD(Out-of-Distribution)サンプル)に直面すると、非常にパフォーマンスが低下する。エッジデバイスが少なくとも、入力サンプルがOODであることを検出できれば、推論ワークロードのためにこれらの入力をサーバにアップロードしないことで、通信と計算リソースを節約できる可能性がある。本稿では,事前学習したCNNモデルの浅い層から重要な特徴を抽出し,縮小した特徴空間上に定義された距離関数に基づいて,入力サンプルをID(In-Distribution)またはOODとして検出する,新しい軽量なOOD検出手法を提案する。提案手法(a)は,事前学習したモデルに対して,それらのモデルの再トレーニングを伴わずに動作し,(b)任意のOODデータセットに自身を公開しない(すべての検出パラメータはIDトレーニングデータセットから得られる)。この目的のために、事前訓練されたモデルを用いて、OOD検出層でモデルを分割し、エッジデバイスとその他をクラウド上に展開するEARLIN(EARLy OOD Detection for Collaborative Inference)を開発した。実際のデータセットとプロトタイプの実装を用いて実験することにより,ベンチマークデータセットで事前学習された一般的なディープラーニングモデル上で,一般的なoodデータセットに対してテストした場合の全体的な精度とコストの観点から,他のアプローチよりも優れた結果が得られることを示す。

Collaborative inference enables resource-constrained edge devices to make inferences by uploading inputs (e.g., images) to a server (i.e., cloud) where the heavy deep learning models run. While this setup works cost-effectively for successful inferences, it severely underperforms when the model faces input samples on which the model was not trained (known as Out-of-Distribution (OOD) samples). If the edge devices could, at least, detect that an input sample is an OOD, that could potentially save communication and computation resources by not uploading those inputs to the server for inference workload. In this paper, we propose a novel lightweight OOD detection approach that mines important features from the shallow layers of a pretrained CNN model and detects an input sample as ID (In-Distribution) or OOD based on a distance function defined on the reduced feature space. Our technique (a) works on pretrained models without any retraining of those models, and (b) does not expose itself to any OOD dataset (all detection parameters are obtained from the ID training dataset). To this end, we develop EARLIN (EARLy OOD detection for Collaborative INference) that takes a pretrained model and partitions the model at the OOD detection layer and deploys the considerably small OOD part on an edge device and the rest on the cloud. By experimenting using real datasets and a prototype implementation, we show that our technique achieves better results than other approaches in terms of overall accuracy and cost when tested against popular OOD datasets on top of popular deep learning models pretrained on benchmark datasets.

翻訳日:2021-07-01 11:12:43 公開日:2021-06-25
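
The EARLIN abstract above (arXiv:2106.13842) describes detecting OOD inputs on the edge device with a distance function over reduced shallow-layer features, with all detection parameters fitted on in-distribution (ID) data only. The Python sketch below illustrates that general pattern with a deliberately simple stand-in: Euclidean distance to the centroid of the ID features and a quantile threshold. The actual feature reduction and distance function used by the authors are not given in the abstract, so every name and choice here is an assumption.

```python
import math


def centroid(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    n, d = len(features), len(features[0])
    return [sum(f[j] for f in features) / n for j in range(d)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_detector(id_features, quantile=0.95):
    """Fit the centroid and a distance threshold using ID data only."""
    center = centroid(id_features)
    dists = sorted(distance(f, center) for f in id_features)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[index]


def is_ood(feature, detector):
    """Flag an input whose features lie farther from the centroid than the threshold."""
    center, threshold = detector
    return distance(feature, center) > threshold


# Made-up 2-D "shallow-layer features" for in-distribution samples.
id_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
detector = fit_detector(id_features)

near = is_ood((0.5, 0.6), detector)    # close to the ID centroid
far = is_ood((10.0, 10.0), detector)   # far from the ID centroid
```

On an edge device, an input flagged by `is_ood` would simply not be uploaded to the server, which is where the communication and computation savings mentioned in the abstract would come from.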

# (参考訳) 近似最大半空間差

Approximate Maximum Halfspace Discrepancy ( http://arxiv.org/abs/2106.13851v1 )

ライセンス: CC BY 4.0

Michael Matheny and Jeff M. Phillips

(参考訳) 幾何学的範囲空間 $(X, \mathcal{H}_d)$ を考える。ここで $X \subset \mathbb{R}^d$ であり、$\mathcal{H}_d$ は$d$次元半空間で定義される範囲の集合である。この設定では、$X$ は赤と青の集合の非交和であると考える。各半空間 $h \in \mathcal{H}_d$ に対して、範囲 $h$ に入る赤の点の割合と青の点の割合の「差」を測る関数 $\Phi(h)$ を定義する。この文脈における最大不一致問題は、$h^* = \arg \max_{h \in (X, \mathcal{H}_d)} \Phi(h)$ を見つけることである。代わりに、$\Phi(h^*) - \Phi(\hat{h}) \le \varepsilon$ となる$\hat{h}$を求める。これは、機械学習の線形分類、空間的異常検出のための空間スキャン統計における中心的な問題であり、他の多くの領域に見られる。この問題に対する解を $O(|X| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ 時間で与え、これは従来の最良の解を多項式的に改善する。 $d=2$ の場合、条件付き下界によりこれがほぼタイトであることを示す。異なる$\Phi$のクラスに対して、APSP への還元による厳密解の $\Omega(|X|^{3/2 - o(1)})$ 時間下界、または 3SUM への還元による近似解の $\Omega(|X| + 1/\varepsilon^{2-o(1)})$ 時間下界を与えることができる。主要な技術的結果は、サイズ $O(1/\varepsilon^d)$、クエリ時間 $O(\log (1/\varepsilon))$ の $\varepsilon$-近似半空間範囲カウントデータ構造であり、$O(|X| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ 時間で構築できる。

Consider the geometric range space $(X, \mathcal{H}_d)$ where $X \subset \mathbb{R}^d$ and $\mathcal{H}_d$ is the set of ranges defined by $d$-dimensional halfspaces. In this setting we consider that $X$ is the disjoint union of a red and blue set. For each halfspace $h \in \mathcal{H}_d$ define a function $\Phi(h)$ that measures the "difference" between the fraction of red and fraction of blue points which fall in the range $h$. In this context the maximum discrepancy problem is to find the $h^* = \arg \max_{h \in (X, \mathcal{H}_d)} \Phi(h)$. We aim to instead find an $\hat{h}$ such that $\Phi(h^*) - \Phi(\hat{h}) \le \varepsilon$. This is the central problem in linear classification for machine learning, in spatial scan statistics for spatial anomaly detection, and shows up in many other areas. We provide a solution for this problem in $O(|X| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ time, which improves polynomially over the previous best solutions. For $d=2$ we show that this is nearly tight through conditional lower bounds. For different classes of $\Phi$ we can either provide an $\Omega(|X|^{3/2 - o(1)})$ time lower bound for the exact solution with a reduction to APSP, or an $\Omega(|X| + 1/\varepsilon^{2-o(1)})$ lower bound for the approximate solution with a reduction to 3SUM. A key technical result is an $\varepsilon$-approximate halfspace range counting data structure of size $O(1/\varepsilon^d)$ with $O(\log (1/\varepsilon))$ query time, which we can build in $O(|X| + (1/\varepsilon^d) \log^4 (1/\varepsilon))$ time.

翻訳日:2021-07-01 10:56:38 公開日:2021-06-25

# (参考訳) セマンティックパーシング自然言語をリレーショナル代数に変換する

Semantic Parsing Natural Language into Relational Algebra ( http://arxiv.org/abs/2106.13858v1 )

ライセンス: CC BY 4.0

Ruiyang Xu, Ayush Singh

(参考訳) データベースへの自然なインターフェース(NLIDB)は、過去数十年で多く研究されてきた。 NLIDBの中核は、自然言語をSQLに変換するために使われるセマンティックパーサである。従来のNLP方法論の解決策は文法規則パターン学習と中間論理形式によるペアリングに焦点を当てている。これらのメソッドは特定のデータベースやパースタスクに対して許容できるパフォーマンスを提供するが、一般化や拡張は困難である。一方,近年のニューラルディープ・ラーニングの進歩は,一般的なNLIDBシステム構築に有望な方向性をもたらすと考えられる。従来のアプローチとは異なり、これらの神経方法論は解析問題をシーケンスからシーケンスへの学習問題として扱う。本稿では,いくつかのシーケンスからシーケンスへの学習モデルを実験し,その性能を一般データベース解析タスクで評価する。

Natural language interfaces to databases (NLIDB) have been researched extensively over the past decades. At the core of an NLIDB is a semantic parser that converts natural language into SQL. Solutions from traditional NLP methodology focus on grammar rule pattern learning and pairing via intermediate logic forms. Although these methods give acceptable performance on certain specific databases and parsing tasks, they are hard to generalize and scale. On the other hand, recent progress in neural deep learning seems to provide a promising direction towards building a general NLIDB system. Unlike the traditional approach, these neural methodologies treat the parsing problem as a sequence-to-sequence learning problem. In this paper, we experiment with several sequence-to-sequence learning models and evaluate their performance on a general database parsing task.

翻訳日:2021-07-01 10:33:14 公開日:2021-06-25

# (参考訳) AutoPipeline: 強化学習と検索を使ってデータパイプラインをターゲット別に合成する

AutoPipeline: Synthesize Data Pipelines By-Target Using Reinforcement Learning and Search ( http://arxiv.org/abs/2106.13861v1 )

ライセンス: CC BY 4.0

Junwen Yang, Yeye He, Surajit Chaudhuri

(参考訳) 最近の作業は、文字列変換やテーブル操作演算子(join、groupby、pivotなど)のような単一のデータ準備ステップの自動化を支援する上で大きな進歩を遂げている。本研究では、文字列変換とテーブル操作演算子の両方で複雑なデータパイプラインを合成することにより、複数のステップをエンドツーエンドで自動化することを提案する。本稿では,従来のバイサンプルパラダイムとは大きく離れているパイプラインをユーザが容易に指定できる,新たな"バイターゲット"パラダイムを提案する。 by-targetを使用することで、ユーザは入力テーブル(csvやjsonファイルなど)を提供して、“ターゲットテーブル”(既存のデータベーステーブルやbiダッシュボードなど)を指して、希望するパイプラインからの出力がどのようにスキーマ的に“見た目”するかを実証する。問題は具体的でないように見えるが、FDやキーといった暗黙のテーブルの制約を利用して、空間を著しく制約し、問題を抽出できるようにするというユニークな洞察がある。我々は、強化学習と探索を用いてパイプラインを合成するオートパイプシステムを開発した。 GitHubからクロールされた多数の実パイプラインの実験によると、Auto-Pipelineは、これらの複雑なパイプラインの60～70%(最大10ステップ)を平均10～20秒で合成できる。

Recent work has made significant progress in helping users to automate single data preparation steps, such as string-transformations and table-manipulation operators (e.g., Join, GroupBy, Pivot, etc.). In this work, we propose to automate multiple such steps end-to-end, by synthesizing complex data pipelines with both string transformations and table-manipulation operators. We propose a novel "by-target" paradigm that allows users to easily specify the desired pipeline, which is a significant departure from the traditional by-example paradigm. Using by-target, users would provide input tables (e.g., csv or json files), and point us to a "target table" (e.g., an existing database table or BI dashboard) to demonstrate what the output from the desired pipeline should schematically "look like". While the problem is seemingly underspecified, our unique insight is that implicit table constraints such as FDs and keys can be exploited to significantly constrain the space to make the problem tractable. We develop an Auto-Pipeline system that learns to synthesize pipelines using reinforcement learning and search. Experiments on large numbers of real pipelines crawled from GitHub suggest that Auto-Pipeline can successfully synthesize 60-70% of these complex pipelines (up to 10 steps) in 10-20 seconds on average.
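
The role of implicit constraints can be illustrated with a small Python sketch. It assumes tables are lists of dictionaries and that the key columns and functional dependencies (FDs) of the target table are already known; the helper names are hypothetical and this is not the Auto-Pipeline implementation. A candidate pipeline whose output violates a key or FD that holds in the target could be pruned from the search.

```python
def is_key(rows, columns):
    """True if the given columns uniquely identify every row."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

def holds_fd(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows."""
    mapping = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # The first value seen for `left` must match all later ones.
        if mapping.setdefault(left, right) != right:
            return False
    return True

def satisfies_target_constraints(candidate_rows, target_keys, target_fds):
    """Check a candidate pipeline output against the target's constraints."""
    return (all(is_key(candidate_rows, k) for k in target_keys)
            and all(holds_fd(candidate_rows, l, r) for l, r in target_fds))
```

For example, a join that duplicates rows would fail `is_key` on the target's key columns, so that candidate can be discarded without comparing any cell values.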

翻訳日:2021-07-01 10:25:52 公開日:2021-06-25

# LB-CNN:チェインとキューピーを用いた軽二元畳み込みニューラルネットワークの高速トレーニングのためのオープンソースフレームワーク

LB-CNN: An Open Source Framework for Fast Training of Light Binary Convolutional Neural Networks using Chainer and Cupy ( http://arxiv.org/abs/2106.15350v1 )

ライセンス: Link先を確認

Radu Dogaru, Ioana Dogaru

(参考訳) 軽量バイナリ畳み込みニューラルネットワーク(LB-CNN)は、多くの産業アプリケーションで必要とされる低エネルギーのコンピューティングプラットフォームで実装する場合、特に有用である。本稿では,コンパクトLB-CNNの最適化フレームワークを導入し,その有効性を評価する。このフレームワークは無償で利用可能であり、フリーアクセスのクラウドプラットフォームで動作する可能性がある。最適化されたモデルは標準化された.h5形式で保存され、特定の技術へのさらなる展開のための特別なツールへの入力として使用できる。モデル最適化,特にバイナリ畳み込みカーネルの選択を高速化する主な要素は,出力層を極端な学習機械として訓練するための大幅な高速化を提供するChainer/Cupy機械学習ライブラリである。 Keras/Tensorflowを使った出力層の追加トレーニングは、精度の向上を可能にするため含まれる。 MNIST, GTSRB, ORL, VGGなど, 広く使用されているデータセットの結果は, 精度と複雑性の間に非常に良い妥協点を示す。特に顔認識問題では、慎重に最適化されたlb-cnnモデルが最大100%の精度を提供する。このようなTinyMLソリューションは、低消費電力の画像認識を必要とする産業用途に適している。

Light binary convolutional neural networks (LB-CNN) are particularly useful when implemented in low-energy computing platforms as required in many industrial applications. Herein, a framework for optimizing compact LB-CNNs is introduced and its effectiveness is evaluated. The framework is freely available and may run on free-access cloud platforms, thus requiring no major investments. The optimized model is saved in the standardized .h5 format and can be used as input to specialized tools for further deployment into specific technologies, thus enabling the rapid development of various intelligent image sensors. The main ingredient in accelerating the optimization of our model, particularly the selection of binary convolution kernels, is the Chainer/Cupy machine learning library offering significant speed-ups for training the output layer as an extreme-learning machine. Additional training of the output layer using Keras/Tensorflow is included, as it allows an increase in accuracy. Results for widely used datasets including MNIST, GTSRB, ORL, VGG show a very good compromise between accuracy and complexity. In particular, for face recognition problems, a carefully optimized LB-CNN model provides up to 100% accuracy. Such TinyML solutions are well suited for industrial applications requiring image recognition with low energy consumption.

翻訳日:2021-06-30 15:31:57 公開日:2021-06-25

# (参考訳) POLAR:ニューラルネットワーク制御システムの検証のための多項式算術フレームワーク

POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems ( http://arxiv.org/abs/2106.13867v1 )

ライセンス: CC BY 4.0

Chao Huang, Jiameng Fan, Xin Chen, Wenchao Li, Qi Zhu

(参考訳) 本稿では,ニューラルネットワーク制御システム(NNCSs)の有界時間到達性解析に,区間剰余項付きの多項式過大近似を利用する \textbf{pol}ynomial \textbf{ar}ithmetic framework である POLAR を提案する。標準テイラーモデルを用いた既存の算術手法と比較して,本手法では,連続活性化関数に対するベルンシュタイン多項式補間と,その他の演算に対するテイラーモデル算術を組み合わせて,ニューロンの出力範囲を層ごとに反復的に過大近似する新しい手法を用いる。このアプローチは,標準テイラーモデル算術の主な欠点,すなわちテイラー多項式では十分に近似できない関数を扱えないという点を克服し,NNCSの到達可能状態計算の精度と効率を大幅に改善する。さらに,過大近似をより厳密にするため,本手法では,ニューラルネットワークの出力範囲を推定する際に,線形写像の下でテイラーモデルの剰余項を記号的に保持する。 POLARが既存のTaylorモデルフローパイプ構築技術とシームレスに統合できることを示し、POLARが一連のベンチマークで現在の最先端技術よりも大幅に優れていることを示す。

We propose POLAR, a \textbf{pol}ynomial \textbf{ar}ithmetic framework that leverages polynomial overapproximations with interval remainders for bounded-time reachability analysis of neural network-controlled systems (NNCSs). Compared with existing arithmetic approaches that use standard Taylor models, our framework uses a novel approach to iteratively overapproximate the neuron output ranges layer-by-layer with a combination of Bernstein polynomial interpolation for continuous activation functions and Taylor model arithmetic for the other operations. This approach can overcome the main drawback in the standard Taylor model arithmetic, i.e. its inability to handle functions that cannot be well approximated by Taylor polynomials, and significantly improve the accuracy and efficiency of reachable states computation for NNCSs. To further tighten the overapproximation, our method keeps the Taylor model remainders symbolic under the linear mappings when estimating the output range of a neural network. We show that POLAR can be seamlessly integrated with existing Taylor model flowpipe construction techniques, and demonstrate that POLAR significantly outperforms the current state-of-the-art techniques on a suite of benchmarks.
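
As a rough illustration of the Bernstein-polynomial ingredient, the Python sketch below builds the classical degree-n Bernstein polynomial of an activation function on an interval and estimates the approximation error on a grid. The function names are hypothetical, and the grid-based error is only an estimate: a sound interval remainder, as a verification framework requires, needs a rigorous bound that this sketch does not compute.

```python
import math

def bernstein_approx(f, a, b, degree):
    """Return the degree-n Bernstein polynomial of f on [a, b] as a callable."""
    samples = [f(a + (b - a) * k / degree) for k in range(degree + 1)]

    def poly(x):
        t = (x - a) / (b - a)  # map [a, b] onto [0, 1]
        return sum(
            samples[k] * math.comb(degree, k) * t**k * (1 - t)**(degree - k)
            for k in range(degree + 1)
        )
    return poly

def sampled_error(f, poly, a, b, num_points=1001):
    """Largest |f - poly| on a uniform grid. An estimate, NOT a sound bound."""
    xs = [a + (b - a) * i / (num_points - 1) for i in range(num_points)]
    return max(abs(f(x) - poly(x)) for x in xs)

# Usage: approximate tanh on [-1, 1] and inspect the sampled error.
tanh_poly = bernstein_approx(math.tanh, -1.0, 1.0, 20)
tanh_error = sampled_error(math.tanh, tanh_poly, -1.0, 1.0)
```

Unlike a Taylor expansion, this construction only needs function values on the interval, which is why it also applies to activations that are continuous but not smooth.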

翻訳日:2021-06-30 15:04:19 公開日:2021-06-25

# (参考訳) トランスフラワー:マルチモーダルアテンションによる確率的自己回帰ダンス生成

Transflower: probabilistic autoregressive dance generation with multimodal attention ( http://arxiv.org/abs/2106.13871v1 )

ライセンス: CC BY 4.0

Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson

(参考訳) ダンスは、音楽のリズム、調律、音節の特徴に従う複雑な動きの巧みな構成を必要とする。形式的には、音楽に条件付けされたダンスの生成は、オーディオ信号に基づいて条件付けされた高次元連続モーション信号をモデル化する問題として表現することができる。この作業では、この問題に取り組むために2つの貢献をします。まず,マルチモーダルトランスフォーマーエンコーダを用いて,前回のポーズと音楽コンテキストで条件付けられた正規化フローを用いて,将来的なポーズの分布をモデル化する,新しい確率的自己回帰型アーキテクチャを提案する。第2に,現在最大規模の3dダンスモーションデータセットを紹介し,さまざまなモーションキャプチャ技術を用いて,プロダンサーとカジュアルダンサーの両方を含む。このデータセットを用いて,客観的指標とユーザスタディを用いて,新たなモデルを2つのベースラインと比較し,確率分布をモデル化する能力と,大きな動きや音楽のコンテキストを乗り越える能力の両方が,音楽にマッチする興味深い,多様で現実的なダンスを生み出すために必要なことを示す。

Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution and the ability to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.
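
The probabilistic autoregressive formulation can be written compactly as follows. This is a sketch under assumed notation that does not appear in the abstract: $x_t$ is the pose at step $t$, $m$ is the music signal, $c_t$ is the encoder's summary of previous poses and music context, $f_\theta$ is the invertible flow, and $p_Z$ is a simple base density. The first line is the autoregressive factorization; the second is the standard change-of-variables formula for a conditional normalizing flow.

$$p_\theta(x_{1:T} \mid m) = \prod_{t=1}^{T} p_\theta(x_t \mid c_t), \qquad c_t = \mathrm{Enc}(x_{<t}, m)$$

$$\log p_\theta(x_t \mid c_t) = \log p_Z\big(f_\theta(x_t; c_t)\big) + \log \left| \det \frac{\partial f_\theta(x_t; c_t)}{\partial x_t} \right|$$

Because each step defines a full density over the next pose rather than a single point estimate, sampling from it can yield different plausible continuations for the same music.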

翻訳日:2021-06-30 14:42:49 公開日:2021-06-25

# (参考訳) コモンセンスを用いたRationale-Inspireed Natural Language Explanations

Rationale-Inspired Natural Language Explanations with Commonsense ( http://arxiv.org/abs/2106.13876v1 )

ライセンス: CC BY 4.0

Bodhisattwa Prasad Majumder, Oana-Maria Camburu, Thomas Lukasiewicz, Julian McAuley

(参考訳) 説明可能な機械学習モデルは、主に、抽出的論理(入力特徴のサブセット)または抽象的正当化として自由テキスト自然言語説明(NLE)を用いて予測されたラベルを正当化する。 NLEは抽出的理性よりも包括的であるが、機械生成のNLEは時に常識的知識を欠いていることが示されている。ここでは,コモンセンス知識が抽出的合理性とnlesの橋渡しとして機能し,両タイプの説明をより良くすることを示す。より正確には、(1)機械予測に責任のある特徴の集合として有理を抽出し、(2)利用可能なコモンセンスリソースを用いて抽出有理を拡大し、(3)拡張された知識を用いて自然言語の説明を生成する、RExC(Rationale-Inspired Explanations with Commonsense)と呼ばれる統一的なフレームワークを導入する。我々のフレームワークは、自然言語処理と視覚言語理解の両方において5つのタスクにまたがるNLEの生成において、これまでの最先端よりも大きなマージンを上回り、人間のアノテータは、RExCが生成した説明をより包括的で、常識に根ざし、従来の最先端モデルよりも全体的に好まれていることを常に評価している。さらに,コモンセンスに基づく説明により,作業性能と合理化抽出能力が向上することを示す。

Explainable machine learning models primarily justify predicted labels using either extractive rationales (i.e., subsets of input features) or free-text natural language explanations (NLEs) as abstractive justifications. While NLEs can be more comprehensive than extractive rationales, machine-generated NLEs have been shown to sometimes lack commonsense knowledge. Here, we show that commonsense knowledge can act as a bridge between extractive rationales and NLEs, rendering both types of explanations better. More precisely, we introduce a unified framework, called RExC (Rationale-Inspired Explanations with Commonsense), that (1) extracts rationales as a set of features responsible for machine predictions, (2) expands the extractive rationales using available commonsense resources, and (3) uses the expanded knowledge to generate natural language explanations. Our framework surpasses by a large margin the previous state-of-the-art in generating NLEs across five tasks in both natural language processing and vision-language understanding, with human annotators consistently rating the explanations generated by RExC to be more comprehensive, grounded in commonsense, and overall preferred compared to previous state-of-the-art models. Moreover, our work shows that commonsense-grounded explanations can enhance both task performance and rationales extraction capabilities.

翻訳日:2021-06-30 14:14:41 公開日:2021-06-25

# (参考訳) Pastprop-RNN:過去補正による未来予測の改善

Pastprop-RNN: improved predictions of the future by correcting the past ( http://arxiv.org/abs/2106.13881v1 )

ライセンス: CC BY 4.0

André Baptista, Yassine Baghoussi, Carlos Soares, João Mendes-Moreira, Miguel Arantes

(参考訳) 予測精度は、利用可能な過去のデータの品質に依存する。データの乱れ(例: 需要予測時の在庫切れ商品などの予期せぬ出来事)は、生成されるモデルの品質に悪影響を及ぼす可能性がある。我々はこの問題にパストキャスティング(pastcasting)、すなわち未来をよりよく説明するために過去のデータが本来どうあるべきだったかを予測することで対処する。本研究では,誤差に対する責任の一部を訓練データに割り当て,それに応じてデータを修正する,データ中心のバックプロパゲーションアルゴリズムであるPastprop-LSTMを提案する。予測コンペティションのデータセット M4 と M5,および Numenta Anomaly Benchmark で,3種類の Pastprop-LSTM を検証した。実験により,特に標準LSTMの予測誤差が高い場合に,提案手法が予測精度を向上させうることが示された。また、異常を含むデータセット上でのアルゴリズムの可能性も示す。

Forecasting accuracy is reliant on the quality of available past data. Data disruptions can adversely affect the quality of the generated model (e.g. unexpected events such as out-of-stock products when forecasting demand). We address this problem by pastcasting: predicting how data should have been in the past to explain the future better. We propose Pastprop-LSTM, a data-centric backpropagation algorithm that assigns part of the responsibility for errors to the training data and changes it accordingly. We test three variants of Pastprop-LSTM on forecasting competition datasets, M4 and M5, plus the Numenta Anomaly Benchmark. Empirical evaluation indicates that the proposed method can improve forecasting accuracy, especially when the prediction errors of standard LSTM are high. It also demonstrates the potential of the algorithm on datasets containing anomalies.
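
The idea of assigning part of the error to the training data can be sketched in a few lines of Python. The example below assumes a one-parameter linear model `y = w * x` in place of an LSTM and plain stochastic gradient steps on both the weight and the inputs; the function name and learning rates are hypothetical, and this is not the Pastprop-LSTM algorithm itself.

```python
def train_with_data_correction(xs, ys, lr_model=0.01, lr_data=0.01, epochs=200):
    """Fit y = w * x while also nudging the inputs along the loss gradient.

    Returns the learned weight and a corrected copy of the inputs;
    the caller's list is left unchanged.
    """
    xs = list(xs)
    w = 0.0
    for _ in range(epochs):
        for i, y in enumerate(ys):
            error = w * xs[i] - y
            grad_w = 2 * error * xs[i]  # d(error^2)/dw
            grad_x = 2 * error * w      # d(error^2)/dx: the data's share
            w -= lr_model * grad_w
            xs[i] -= lr_data * grad_x
    return w, xs
```

In this toy setting, an input that is inconsistent with the rest of the series (for example a recorded demand of 0.5 where the pattern suggests about 3) is pulled towards a value that explains its target, instead of only distorting the weight.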

翻訳日:2021-06-30 13:51:57 公開日:2021-06-25

# (参考訳) 関係帯域に対する信頼度境界付き知識注入型政策勾配

Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits ( http://arxiv.org/abs/2106.13895v1 )

ライセンス: CC BY 4.0

Kaushik Roy and Qi Zhang and Manas Gaur and Amit Sheth

(参考訳) コンテキストバンドは、オンライン広告、レコメンデーションシステム、ヘルスケアなど、さまざまな現実のシナリオで重要なユースケースを見つけます。しかし、ほとんどのアルゴリズムは文脈を表現するのに平坦な特徴ベクトルを使い、現実の世界では様々なオブジェクトとそれらの間の関係が文脈でモデル化される。例えば、音楽レコメンデーションシステムでは、ユーザのコンテキストには、聴く音楽、どのアーティストがこの音楽を作成するか、アーティストのアルバムなどが含まれる。よりリッチなリレーショナルコンテキスト表現を追加することで、探索と探索が難しくなる。探索探索戦略を導くために、文脈に関する探索探索知識の効率を向上させる。リレーショナルな文脈表現は、人間が記述的な性質から知識を特定できる自然な方法である。本研究では,知識注入政策グラディエンスを文脈帯域設定や新しい知識注入政策グラディエンス・アッパー信頼境界アルゴリズムに適応させ,模擬音楽レコメンデーションデータセットと各種実生活データセットを実験的に解析し,専門家の知識が全体の後悔を劇的に減らし,それを不可能にする方法を提案する。

Contextual Bandits find important use cases in various real-life scenarios such as online advertising, recommendation systems, healthcare, etc. However, most of the algorithms use flat feature vectors to represent context whereas, in the real world, there is a varying number of objects and relations among them to model in the context. For example, in a music recommendation system, the user context contains what music they listen to, which artists create this music, the artist albums, etc. Adding richer relational context representations also introduces a much larger context space, making exploration-exploitation harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the exploration-exploitation strategy. Relational context representations allow a natural way for humans to specify knowledge owing to their descriptive nature. We propose an adaptation of Knowledge Infused Policy Gradients to the Contextual Bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm, and perform an experimental analysis on a simulated music recommendation dataset and various real-life datasets where expert knowledge can drastically reduce the total regret and where it cannot.
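
For readers unfamiliar with upper-confidence-bound exploration, the sketch below shows the classical context-free UCB1 rule in Python. It is included only to illustrate the exploration bonus; it has no relational context and no knowledge infusion, so it is not the algorithm proposed in the abstract. The arm reward probabilities in `run_bandit` are a made-up simulation.

```python
import math
import random

def ucb1_select(counts, sums, t):
    """Pick the arm with the highest upper confidence bound at round t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the bound
    scores = [
        sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)

def run_bandit(reward_probs, rounds, seed=0):
    """Simulate Bernoulli arms and return how often each arm was played."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    sums = [0.0] * len(reward_probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The square-root term shrinks as an arm is played more often, so rarely tried arms keep getting revisited until their estimates are reliable. Prior knowledge about the context would, in principle, let an algorithm skip part of that exploration.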

翻訳日:2021-06-30 13:42:42 公開日:2021-06-25

# (参考訳) ローリング水平展開による学習状態空間モデルによる予測制御

Predictive Control Using Learned State Space Models via Rolling Horizon Evolution ( http://arxiv.org/abs/2106.13911v1 )

ライセンス: CC BY 4.0

Alvaro Ovalle, Simon M. Lucas

(参考訳) モデルに基づく強化学習への関心の大部分は、戦略的長期的意思決定が可能な前方モデルを取得する可能性から導かれる。エージェントが有用な予測モデルを学ぶのに成功すると仮定すると、シミュレーションされた計画の生成と選択にそれを利用するメカニズムが必要である。本稿では,進化的アルゴリズム計画手法とディープラーニングと変分推論を用いて学習したモデルを組み合わせることを目的とした。視覚的ナビゲーションタスクのセットでオンライン計画を確実に行うエージェントを用いて,このアプローチを実証する。

A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolutionary algorithmic planning techniques with models learned via deep learning and variational inference. We demonstrate the approach with an agent that reliably performs online planning in a set of visual navigation tasks.
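
A rolling horizon evolutionary planner can be sketched as follows. The Python example assumes a hand-written one-dimensional `toy_model` standing in for the learned state-space model, a simple mutation-only search, and a distance-based return; all names and settings are hypothetical and chosen for illustration only.

```python
import random

ACTIONS = (-1, 0, 1)

def toy_model(state, action):
    """Stand-in for a learned forward model: predicts the next state."""
    return state + action

def rollout_return(model, state, plan, goal):
    """Simulated return of a plan: negative distance to goal, summed over steps."""
    total = 0
    for action in plan:
        state = model(state, action)
        total -= abs(goal - state)
    return total

def plan_step(model, state, goal, rng, horizon=5, generations=200):
    """Evolve one action sequence by single-gene mutation; return its first action."""
    best = [rng.choice(ACTIONS) for _ in range(horizon)]
    best_score = rollout_return(model, state, best, goal)
    for _ in range(generations):
        child = list(best)
        child[rng.randrange(horizon)] = rng.choice(ACTIONS)
        score = rollout_return(model, state, child, goal)
        if score >= best_score:
            best, best_score = child, score
    return best[0]

def run_episode(goal=5, steps=20, seed=0):
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(seed)
    state = 0
    for _ in range(steps):
        state = toy_model(state, plan_step(toy_model, state, goal, rng))
    return state
```

Only the first action of each evolved plan is executed before replanning, which is what makes the horizon "rolling". Note that this simple mutation scheme can settle on locally optimal plans close to the goal; population-based variants are one common way to reduce that effect.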

翻訳日:2021-06-30 13:27:32 公開日:2021-06-25

# XL-Sum:44言語のための大規模多言語抽象要約

XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages ( http://arxiv.org/abs/2106.13822v1 )

ライセンス: Link先を確認

Tahmid Hasan, Abhik Bhattacharjee, Md Saiful Islam, Kazi Samin, Yuan-Fang Li, Yong-Bin Kang, M. Sohel Rahman, Rifat Shahriyar

(参考訳) 抽象的テキスト要約(abstractive text summarization)に関する現代の研究は、主に英語のような高リソース言語に焦点を当ててきた。本稿では,bbcから100万の専門的注釈付記事要約ペアからなる包括的かつ多様なデータセットであるxl-sumを提案する。データセットは、ローからハイリソースまでの44の言語をカバーしており、その多くは、現在公開データセットが使用できない。 XL-Sumは非常に抽象的で簡潔で高品質で、人間や本質的な評価によって示される。我々は,最先端の事前学習型多言語モデルであるmt5をxl-sumで微調整し,多言語および低リソースの要約タスク実験を行った。 XL-Sumは、類似したモノリンガルデータセットを用いて得られたものと比較して、競合的な結果を誘導する: ベンチマークした10言語で11 ROUGE-2スコアを上回り、そのうちのいくつかはマルチリンガルトレーニングによって得られた15を超えている。さらに、低リソース言語でのトレーニングは、個々に競争的なパフォーマンスを提供する。我々の知る限り、XL-Sumは単一のソースから収集されたサンプルの数とカバーされる言語数で最大の抽象的な要約データセットである。我々は,多言語抽象要約に関する今後の研究を促進するために,データセットとモデルをリリースする。リソースは \url{https://github.com/csebuetnlp/xl-sum} にある。

Contemporary works on abstractive text summarization have focused primarily on high-resource languages like English, mostly due to the limited availability of datasets for low/mid-resource ones. In this work, we present XL-Sum, a comprehensive and diverse dataset comprising 1 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 44 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation. We fine-tune mT5, a state-of-the-art pretrained multilingual model, with XL-Sum and experiment on multilingual and low-resource summarization tasks. XL-Sum induces competitive results compared to the ones obtained using similar monolingual datasets: we show higher than 11 ROUGE-2 scores on 10 languages we benchmark on, with some of them exceeding 15, as obtained by multilingual training. Additionally, training on low-resource languages individually also provides competitive performance. To the best of our knowledge, XL-Sum is the largest abstractive summarization dataset in terms of the number of samples collected from a single source and the number of languages covered. We are releasing our dataset and models to encourage future research on multilingual abstractive summarization. The resources can be found at https://github.com/csebuetnlp/xl-sum.
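
Since the results are reported as ROUGE-2 scores, a minimal Python sketch of the metric may help. It assumes lowercasing and whitespace tokenization, which is a simplification: standard ROUGE tooling applies its own tokenization and optional stemming, and whitespace splitting is not appropriate for every language. The function names are illustrative.

```python
from collections import Counter

def bigrams(text):
    """Multiset of adjacent token pairs under naive whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge_2(candidate, reference):
    """Return (precision, recall, F1) of bigram overlap, each in [0, 1]."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Scores are commonly multiplied by 100 when reported, so a ROUGE-2 of 11 corresponds to 0.11 on the scale returned here.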

翻訳日:2021-06-29 18:13:37 公開日:2021-06-25

# ビルディングブリッジ:AI倫理を探求するジェネレーティブアートワーク

Building Bridges: Generative Artworks to Explore AI Ethics ( http://arxiv.org/abs/2106.13901v1 )

ライセンス: Link先を確認

Ramya Srinivasan and Devi Parikh

(参考訳) 近年,人工知能(AI)技術が社会に与える影響の理解と緩和に重点が置かれている。学術、産業、政府機関全体で、AI倫理の強化に向けた様々な取り組みが追求されている。倫理的AIシステムの設計における重要な課題は、AIパイプラインには複数の利害関係者があり、それぞれがそれぞれ独自の制約と関心を持っていることだ。例えば、AIモデルを設計し開発するAI研究者は、AI決定の複合的な効果によって消費者の生活に生じる不安定性を必ずしも認識していない。より広い文脈で、異なるステークホルダーの役割と責任について教育する必要がある。本稿では,異なる視点を捉えるためのアクセス可能で強力な教育ツールとして機能することにより,生成的アートワークがこの役割を果たす可能性について概説する。 AI倫理を強化するツールとして、計算創造性に関する学際的な議論を広く起こしたいと考えています。

In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society. Across academia, industry, and government bodies, a variety of endeavours are being pursued towards enhancing AI ethics. A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests. These different perspectives are often not understood, due in part to communication gaps. For example, AI researchers who design and develop AI models are not necessarily aware of the instability induced in consumers' lives by the compounded effects of AI decisions. Educating different stakeholders about their roles and responsibilities in the broader context becomes necessary. In this position paper, we outline some potential ways in which generative artworks can play this role by serving as accessible and powerful educational tools for surfacing different perspectives. We hope to spark interdisciplinary discussions about computational creativity broadly as a tool for enhancing AI ethics.

翻訳日:2021-06-29 17:57:29 公開日:2021-06-25

# 超音波スキャンにおけるcnnセグメンテーションに基づく物体検出・追跡法と迷走神経検出への応用

A CNN Segmentation-Based Approach to Object Detection and Tracking in Ultrasound Scans with Application to the Vagus Nerve Detection ( http://arxiv.org/abs/2106.13849v1 )

ライセンス: Link先を確認

Abdullah F. Al-Battal, Yan Gong, Lu Xu, Timothy Morton, Chen Du, Yifeng Bu, Imanuel R Lerman, Radhika Madhavan, Truong Q. Nguyen

(参考訳) 超音波検査はいくつかの医療診断や治療に不可欠である。治療計画に影響を与える解剖学的特徴や構造を可視化し分析するために用いられる。しかし、どちらも労働集約的であり、その効果は操作者に依存する。リアルタイムで正確でロバストな解剖学的構造の自動検出と追跡は、診断と治療の手順に一貫性と効率性に大きな影響を与える。本稿では,超音波スキャンで特定の解剖学的標的構造を自動的に検出し追跡する深層学習フレームワークを提案する。我々のフレームワークは、被験者や撮像装置間で正確で堅牢で、リアルタイムで動作し、大規模なトレーニングセットを必要としないように設計されています。ローカライズ精度を維持しており、元のトレーニングセットの20%程度の大きさのトレーニングセットでトレーニングした場合、90%以上をリコールする。フレームワークのバックボーンは、U-Netに基づいた弱いトレーニングを受けたセグメンテーションニューラルネットワークである。このフレームワークを2つの異なる超音波データセット上でテストし、Vagus神経を検出し、追跡することを目的として、最先端のリアルタイム物体検出ネットワークよりも優れた性能を示した。

Ultrasound scanning is essential in several medical diagnostic and therapeutic applications. It is used to visualize and analyze anatomical features and structures that influence treatment plans. However, it is labor intensive, and its effectiveness is operator dependent. Real-time, accurate, and robust automatic detection and tracking of anatomical structures while scanning would make diagnostic and therapeutic procedures more consistent and efficient. In this paper, we propose a deep learning framework to automatically detect and track a specific anatomical target structure in ultrasound scans. Our framework is designed to be accurate and robust across subjects and imaging devices, to operate in real-time, and to not require a large training set. It maintains a localization precision and recall higher than 90% when trained on training sets that are as small as 20% of the size of the original training set. The framework backbone is a weakly trained segmentation neural network based on U-Net. We tested the framework on two different ultrasound datasets with the aim to detect and track the Vagus nerve, where it outperformed current state-of-the-art real-time object detection networks.
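
One way a segmentation output can be turned into a detection is sketched below in Python. It assumes a binary mask given as a list of rows and inclusive pixel coordinates; the helper names are hypothetical, and the framework's actual post-processing is not described in the abstract.

```python
def mask_to_box(mask):
    """Return (row_min, col_min, row_max, col_max) of foreground pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None  # nothing segmented, so nothing detected
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

def iou(box_a, box_b):
    """Intersection over union of two boxes with inclusive pixel coordinates."""
    r1, c1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    r2, c2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, r2 - r1 + 1) * max(0, c2 - c1 + 1)
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / (area_a + area_b - inter)
```

A detection is then typically counted as correct when its IoU with the annotated box exceeds a chosen threshold, which is how localization precision and recall figures are usually derived.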

翻訳日:2021-06-29 17:54:55 公開日:2021-06-25

# 半監督Raw-to-Rawマッピング

Semi-Supervised Raw-to-Raw Mapping ( http://arxiv.org/abs/2106.13883v1 )

ライセンス: Link先を確認

Mahmoud Afifi and Abdullah Abuolaim

(参考訳) カメラセンサーの生RGB色は、異なるセンサーのメーカーやモデル間のスペクトル感度の違いによって異なる。本稿では,異なるセンサRGB色空間間のマッピング作業に焦点をあてる。以前の研究は、正確な色マッピングを実現するためにペアワイズキャリブレーションを使用してこの問題に対処した。精度は高いものの、(1)カラーキャリブレーション対象を各シーンに配置した両カメラ装置で一対の画像を撮影する、(2)カラーキャリブレーション対象の正確な画像アライメントまたは手動アノテーション。本稿では,より実用的な構成で生空間のカラーマッピングを実現することを目的とする。具体的には,各カメラ装置で撮影された非ペア画像群とペア画像群で訓練された半教師付きraw-to-rawマッピング法を提案する。実験により,本手法は単一校正法に加えて,他の領域適応法よりも優れた結果が得られることを示す。この取り組みの一環として、2つの異なるスマートフォンカメラから生画像の新しいデータセットを作成しました。データセットには、セミ教師付きトレーニングと評価のためのペアとペアのセットが含まれています。

The raw-RGB colors of a camera sensor vary due to the spectral sensitivity differences across different sensor makes and models. This paper focuses on the task of mapping between different sensor raw-RGB color spaces. Prior work addressed this problem using a pairwise calibration to achieve accurate color mapping. Although accurate, this approach is less practical as it requires: (1) capturing a pair of images with both camera devices, with a color calibration object placed in each new scene; (2) accurate image alignment or manual annotation of the color calibration object. This paper aims to tackle color mapping in the raw space through a more practical setup. Specifically, we present a semi-supervised raw-to-raw mapping method trained on a small set of paired images alongside an unpaired set of images captured by each camera device. Through extensive experiments, we show that our method achieves better results compared to other domain adaptation alternatives in addition to the single-calibration solution. We have generated a new dataset of raw images from two different smartphone cameras as part of this effort. Our dataset includes unpaired and paired sets for our semi-supervised training and evaluation.

翻訳日:2021-06-29 17:54:39 公開日:2021-06-25

# フォトニック回路インスパイアされた小型ネットワーク:エッジにおけるリアルタイム無線信号分類を目指して

A Photonic-Circuits-Inspired Compact Network: Toward Real-Time Wireless Signal Classification at the Edge ( http://arxiv.org/abs/2106.13865v1 )

ライセンス: Link先を確認

Hsuan-Tung Peng, Joshua Lederman, Lei Xu, Thomas Ferreira de Lima, Chaoran Huang, Bhavin Shastri, David Rosenbluth, Paul Prucnal

(参考訳) 機械学習(ML)法は無線通信システムに広く普及しており、無線周波数(RF)フィンガープリント、自動変調分類、認知無線などの応用に強力であることが証明されている。しかし、mlモデルのサイズが大きいため、レイテンシに敏感なダウンストリームタスクのためにエッジデバイスを実装するのが難しくなる。無線通信システムでは、ミリ秒以下のスケールでのMLデータ処理により、リアルタイムネットワーク監視がセキュリティを改善し、侵入を防ぐことができる。さらに、チップスケールでMLモデルを実装可能なコンパクトで統合可能なハードウェアプラットフォームは、無線通信ネットワークに対するより広範な応用を見出すだろう。エッジにおけるリアルタイム無線信号分類に向けて,フォトニックハードウェアに触発されたリカレントニューラルネットワークモデルと簡易畳み込み分類器を組み合わせた,新しい小型深層ネットワークを提案し,そのランダム送信によるrfエミッタ同定への応用を実証する。提案モデルにより、既存の最先端CNN分類器の50倍のトレーニングパラメータを使用する場合、ZigBeeと同一の30個のデバイスに対して96.32%の分類精度が得られる。ネットワークサイズを大幅に削減したことにより、小型FPGAボードPYNQ-Z1を用いて、0.219ミリ秒のリアルタイムRFフィンガープリントを実演した。

Machine learning (ML) methods are ubiquitous in wireless communication systems and have proven powerful for applications including radio-frequency (RF) fingerprinting, automatic modulation classification, and cognitive radio. However, the large size of ML models can make them difficult to implement on edge devices for latency-sensitive downstream tasks. In wireless communication systems, ML data processing at a sub-millisecond scale will enable real-time network monitoring to improve security and prevent infiltration. In addition, compact and integratable hardware platforms which can implement ML models at the chip scale will find much broader application to wireless communication networks. Toward real-time wireless signal classification at the edge, we propose a novel compact deep network that consists of a photonic-hardware-inspired recurrent neural network model in combination with a simplified convolutional classifier, and we demonstrate its application to the identification of RF emitters by their random transmissions. With the proposed model, we achieve 96.32% classification accuracy over a set of 30 identical ZigBee devices when using 50 times fewer training parameters than an existing state-of-the-art CNN classifier. Thanks to the large reduction in network size, we demonstrate real-time RF fingerprinting with 0.219 ms latency using a small-scale FPGA board, the PYNQ-Z1.

翻訳日:2021-06-29 17:47:17 公開日:2021-06-25

# 量子データ圧縮と量子クロスエントロピー

Quantum Data Compression and Quantum Cross Entropy ( http://arxiv.org/abs/2106.13823v1 )

ライセンス: Link先を確認

Zhou Shangnan

(参考訳) 量子機械学習は、機械学習と量子コンピューティングの交差点における新興分野である。量子機械学習の理論の基礎となる中心的な量は、量子クロスエントロピーである。本稿では,量子クロスエントロピーが準最適量子源符号化の圧縮率であることを示す。そこで我々は,可変長符号化の量子一般化と量子強度の典型性に基づいて開発した,単純で普遍的な量子データ圧縮プロトコルを提案する。

Quantum machine learning is an emerging field at the intersection of machine learning and quantum computing. A central quantity for the theoretical foundation of quantum machine learning is the quantum cross entropy. In this paper, we present one operational interpretation of this quantity, that the quantum cross entropy is the compression rate for sub-optimal quantum source coding. To do so, we give a simple, universal quantum data compression protocol, which is developed based on quantum generalization of variable-length coding, as well as quantum strong typicality.
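
For orientation, the quantity in question and its relation to more familiar entropies can be written as below. The notation is assumed here rather than taken from the abstract: $\rho$ is the true source state, $\sigma$ is the state the code is designed for, $S(\rho) = -\mathrm{Tr}(\rho \log \rho)$ is the von Neumann entropy, and $S(\rho \,\|\, \sigma) = \mathrm{Tr}(\rho \log \rho) - \mathrm{Tr}(\rho \log \sigma)$ is the quantum relative entropy.

$$S(\rho, \sigma) = -\mathrm{Tr}(\rho \log \sigma) = S(\rho) + S(\rho \,\|\, \sigma)$$

Since the relative entropy is non-negative, the cross entropy is at least $S(\rho)$, the optimal compression rate. This matches the reading in the abstract: a code tuned to the wrong state $\sigma$ is sub-optimal, and the excess rate is the relative entropy.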

翻訳日:2021-06-29 17:38:03 公開日:2021-06-25

# 自己ペース主成分分析

Self-paced Principal Component Analysis ( http://arxiv.org/abs/2106.13880v1 )

ライセンス: Link先を確認

Zhao Kang, Hongfei Liu, Jiangxin Li, Xiaofeng Zhu, and Ling Tian

(参考訳) 主成分分析(PCA)は次元減少と特徴抽出に広く用いられている。ロバストPCA (RPCA) は、l1-norm や l2, p-norm のような異なる頑健な距離の測定値の下で、ある程度ノイズや外れ値を扱うことができる。しかし、実世界のデータはこれらの単純な関数によって完全にキャプチャできない構造を表示するかもしれない。さらに、既存の方法は複雑で単純なサンプルを等しく扱う。対照的に、人間が一般的に採用する学習パターンは、単純から複雑、より少ないものから学ぶことである。この原理に基づき, 雑音や異常値の影響を更に低減する, セルフペーシングpca (spca) と呼ばれる新しい手法を提案する。特に、各サンプルの複雑さは、単純からより複雑なサンプルをトレーニングに統合するために、各イテレーションの初めに計算されます。交互最適化に基づいて、SPCAは最適なプロジェクション行列を見つけ、アウトリーチを反復的にフィルタする。理論的解析はSPCAの合理性を示す。一般的なデータセットに関する広範囲な実験により,提案手法が技術結果を大幅に改善できることが証明された。

Principal Component Analysis (PCA) has been widely used for dimensionality reduction and feature extraction. Robust PCA (RPCA), under different robust distance metrics, such as l1-norm and l2,p-norm, can deal with noise or outliers to some extent. However, real-world data may display structures that cannot be fully captured by these simple functions. In addition, existing methods treat complex and simple samples equally. By contrast, a learning pattern typically adopted by human beings is to learn from simple to complex and from less to more. Based on this principle, we propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers. Notably, the complexity of each sample is calculated at the beginning of each iteration in order to integrate samples from simple to more complex into training. Based on an alternating optimization, SPCA finds an optimal projection matrix and filters out outliers iteratively. Theoretical analysis is presented to show the rationality of SPCA. Extensive experiments on popular data sets demonstrate that the proposed method can improve the state-of-the-art results considerably.
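
The simple-to-complex schedule can be illustrated with the classic hard-threshold form of self-paced learning, sketched in Python below. The sketch assumes per-sample losses (for PCA, for example, reconstruction errors) are already given and held fixed, whereas the method described above recomputes sample complexity at each iteration; the weighting rule used by SPCA itself may differ, and the function names are hypothetical.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced weights: include a sample only if its loss is below the threshold."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]

def pace_schedule(losses, start, growth, rounds):
    """Yield (threshold, weights) per round as the threshold grows geometrically."""
    threshold = start
    for _ in range(rounds):
        yield threshold, self_paced_weights(losses, threshold)
        threshold *= growth
```

With losses `[0.1, 0.5, 2.0, 9.0]`, a start of 0.3, and a growth factor of 3, the first three rounds admit one, two, and then three samples, while the sample with loss 9.0 stays excluded. That is the intended effect on outliers.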

翻訳日:2021-06-29 14:08:28 公開日:2021-06-25

# 凍結言語モデルを用いたマルチモーダルファウショット学習

Multimodal Few-Shot Learning with Frozen Language Models ( http://arxiv.org/abs/2106.13884v1 )

ライセンス: Link先を確認

Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S.M. Ali Eslami, Oriol Vinyals, Felix Hill

(参考訳) 十分な規模でトレーニングを行うと、自動回帰言語モデルは、ほんの数例で促された後、新しい言語タスクを学習する顕著な能力を示す。本稿では,このマイナショット学習能力をマルチモーダル環境(ビジョンと言語)に移すための,単純かつ効果的なアプローチを提案する。調整された画像とキャプションデータを用いて、視覚エンコーダを訓練し、各画像を連続した埋め込みのシーケンスとして表現し、プレトレーニングされた凍結言語モデルが適切なキャプションを生成する。結果として得られたシステムはマルチモーダルな数ショット学習者であり、実例に条件付けして、複数のインターリーブ画像とテキスト埋め込みのシーケンスとして表現された、様々な新しいタスクを学習する驚くべき能力を持つ。我々は,新しいオブジェクトや新しい視覚カテゴリーの単語を素早く学習し,ごく少数の例で視覚的質問応答を行い,複数の確立された新しいベンチマークで単一のモデルを測定することで外部知識を活用することを実証した。

When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption. The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings. We demonstrate that it can rapidly learn words for new objects and novel visual categories, do visual question-answering with only a handful of examples, and make use of outside knowledge, by measuring a single model on a variety of established and new benchmarks.

翻訳日:2021-06-29 14:07:37 公開日:2021-06-25

# ドメイン適応のためのドメイン条件予測器

Domain Conditional Predictors for Domain Adaptation ( http://arxiv.org/abs/2106.13899v1 )

ライセンス: Link先を確認

Joao Monteiro, Xavier Gibert, Jianqiao Feng, Vincent Dumoulin, Dar-Shyang Lee

(参考訳) 学習保証はしばしばデータがi.i.d.であるという仮定に依存するが、予測器が実世界のタスクを実行するためにデプロイされると、この仮定は実際には破られる可能性が高い。このようにドメイン適応アプローチは、ラベル上の条件分布が基礎となるデータ分布から独立することを期待する共変量シフトなど、他の仮定が満たされるならば、異なる訓練データ分布とテストデータ分布をサポートするという、余分な柔軟性をもたらす有用なフレームワークとして現れた。様々な訓練データソースやテストデータソースをまたいだ一般化を誘導するために,データ生成分布が予測モデルに無視されるように,ドメイン不変性の一般的な考え方に依存する方法がいくつか導入された。本研究では,これとは逆の方向からデータソースをまたいだ一般化の問題に取り組み,予測が入力データに依存することに加えて,基礎となるデータ生成分布に関する情報を利用する条件付きモデリング手法を検討する。例えば、モデルには環境の変化や新しいデータソースに適応するための明示的なメカニズムがある。このようなアプローチは、共変量シフトのような余分な仮定を必要とせず、さらに、ドメイン不変な手法でよく用いられるミニマックス定式化によって生じるトレーニング不安定性の共通源を避けた、より単純なトレーニングアルゴリズムが得られるため、現在のドメイン適応法よりも一般的に適用可能であると主張する。

Learning guarantees often rely on assumptions of i.i.d. data, which will likely be violated in practice once predictors are deployed to perform real-world tasks. Domain adaptation approaches thus appeared as a useful framework yielding extra flexibility in that distinct train and test data distributions are supported, provided that other assumptions are satisfied such as covariate shift, which expects the conditional distributions over labels to be independent of the underlying data distribution. Several approaches were introduced in order to induce generalization across varying train and test data sources, and those often rely on the general idea of domain-invariance, in such a way that the data-generating distributions are to be disregarded by the prediction model. In this contribution, we tackle the problem of generalizing across data sources by approaching it from the opposite direction: we consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution. For instance, the model has an explicit mechanism to adapt to changing environments and/or new data sources. We argue that such an approach is more generally applicable than current domain adaptation methods since it does not require extra assumptions such as covariate shift and further yields simpler training algorithms that avoid a common source of training instabilities caused by minimax formulations, often employed in domain-invariant methods.

翻訳日:2021-06-29 14:07:19 公開日:2021-06-25

# 決定論的画像分類器のシーン不確かさとウェリントン事後分布

Scene Uncertainty and the Wellington Posterior of Deterministic Image Classifiers ( http://arxiv.org/abs/2106.13870v1 )

ライセンス: Link先を確認

Stephanie Tsuei, Aditya Golatkar, Stefano Soatto

(参考訳) 本研究では,与えられた入力データに対する画像分類器の出力結果の不確実性を推定する手法を提案する。画像分類によく使用されるディープニューラルネットワークは、入力画像から出力クラスへの決定論的マップである。そのため、与えられたデータに対する出力結果には不確実性がなく、「自信」を定義し、測定し、解釈する場合に、どのような変動性について言及しているかを明確にする必要がある。この目的のために、Wellington Posteriorを導入する。これは、与えられた画像を生成したのと同じシーンから生成され得たデータに応答して得られたであろう結果の分布である。与えられた画像を生成し得たシーンは無限に多いため、Wellington Posteriorは描かれたシーン以外のシーンからの帰納を必要とする。データ拡張、アンサンブル、モデル線形化を用いた代替手法について検討する。他にも、生成的敵対ネットワーク、条件付き事前ネットワーク、教師付き単一ビュー再構築などがある。ビデオ中の時間的に隣接するフレームのクラスを推論して得られた経験的事後分布に対して,これらの代替案をテストした。これらの開発は、安全クリティカルなアプリケーションと互換性のある方法でディープネットワーク分類器の信頼性を評価するための小さなステップにすぎない。

We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum. Deep neural networks commonly used for image classification are deterministic maps from an input image to an output class. As such, their outcome on a given datum involves no uncertainty, so we must specify what variability we are referring to when defining, measuring and interpreting "confidence." To this end, we introduce the Wellington Posterior, which is the distribution of outcomes that would have been obtained in response to data that could have been generated by the same scene that produced the given image. Since there are infinitely many scenes that could have generated the given image, the Wellington Posterior requires induction from scenes other than the one portrayed. We explore alternate methods using data augmentation, ensembling, and model linearization. Additional alternatives include generative adversarial networks, conditional prior networks, and supervised single-view reconstruction. We test these alternatives against the empirical posterior obtained by inferring the class of temporally adjacent frames in a video. These developments are only a small step towards assessing the reliability of deep network classifiers in a manner that is compatible with safety-critical applications.
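
Of the alternatives listed in the abstract, the data-augmentation route is the simplest to picture: perturb the given image many times, classify each perturbed copy with the same deterministic classifier, and read off the empirical distribution of predicted classes. The sketch below only illustrates that idea. The classifier `classify`, the additive-noise augmentation `augment`, and the sample count are hypothetical stand-ins, not the models or augmentations used in the paper.

```python
import random
from collections import Counter

def classify(image):
    """Stand-in deterministic classifier: thresholds the mean brightness."""
    mean = sum(image) / len(image)
    return "bright" if mean > 0.5 else "dark"

def augment(image, rng, noise=0.1):
    """Hypothetical augmentation: additive Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, px + rng.gauss(0.0, noise))) for px in image]

def outcome_distribution(image, n_samples=500, seed=0):
    """Empirical distribution of class outcomes over augmented copies."""
    rng = random.Random(seed)
    counts = Counter(classify(augment(image, rng)) for _ in range(n_samples))
    return {label: count / n_samples for label, count in counts.items()}
```

An image far from the decision boundary yields a distribution concentrated on one class, while an image near the boundary spreads its mass over several classes, even though each individual prediction is deterministic.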

翻訳日:2021-06-29 14:06:16 公開日:2021-06-25

# 食道マノメトリー診断のための多段階機械学習モデル

A multi-stage machine learning model on diagnosis of esophageal manometry ( http://arxiv.org/abs/2106.13869v1 )

ライセンス: Link先を確認

Wenjun Kou, Dustin A. Carlson, Alexandra J. Baumann, Erica N. Donnan, Jacob M. Schauer, Mozziyar Etemadi, John E. Pandolfino

(参考訳) 高分解能マノメトリー(HRM)は食道運動障害を診断するための主要な検査である。その解釈と分類には、嚥下レベルの結果の初期評価と、木のようなアルゴリズムを用いたシカゴ分類(CC)に基づく研究レベルの診断の導出が含まれる。HRMを用いた運動障害のこの診断手法は、様々な機械学習手法の組み合わせを用いて開発された多段階モデリングフレームワークによって再現された。特に、このフレームワークは、嚥下レベルの段階におけるディープラーニングモデルと、研究レベルの段階における特徴量ベースの機械学習モデルを含んでいる。嚥下レベルの段階では,畳み込みニューラルネットワーク(CNN)に基づく3つのモデルを開発し,嚥下タイプ,嚥下加圧,統合弛緩圧(IRP)の予測を行った。研究レベルの段階では、専門家の知識に基づくルールモデル、xgboostモデル、人工ニューラルネットワーク(ANN)モデルのファミリーからモデル選択を行い、後者の2つのモデルは専門家の知識から動機を得て設計および拡張された。ベイズ原理に動機づけられたモデルバランスの単純なモデル非依存戦略を利用して、精度スコアによって重み付けされたモデル平均化を導いた。平均化(ブレンド)モデルと個々のモデルを比較して評価し,テストデータセットでの最良の性能はtop-1予測で0.81,top-2予測で0.92であった。これは、生のマルチ嚥下データからHRM研究のCC診断を自動的に予測する最初の人工知能モデルである。さらに,提案するモデリングフレームワークは,HRMと機能的管腔イメージングプローブ(FLIP)パノメトリーの両方の臨床データに基づく食道疾患患者の診断など,マルチモーダルタスクに容易に拡張することができる。

High-resolution manometry (HRM) is the primary procedure used to diagnose esophageal motility disorders. Its interpretation and classification include an initial evaluation of swallow-level outcomes and then derivation of a study-level diagnosis based on the Chicago Classification (CC), using a tree-like algorithm. This diagnostic approach to motility disorders using HRM was mirrored by a multi-stage modeling framework developed using a combination of various machine learning approaches. Specifically, the framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. In the swallow-level stage, three models based on convolutional neural networks (CNNs) were developed to predict swallow type, swallow pressurization, and integrated relaxation pressure (IRP). At the study-level stage, model selection was conducted among families of expert-knowledge-based rule models, xgboost models, and artificial neural network (ANN) models, with the latter two designed and augmented with motivation from expert knowledge. A simple model-agnostic strategy of model balancing motivated by Bayesian principles was utilized, which gave rise to model averaging weighted by precision scores. The averaged (blended) models and individual models were compared and evaluated; the best performance on the test dataset was 0.81 in top-1 prediction and 0.92 in top-2 predictions. This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data. Moreover, the proposed modeling framework could be easily extended to multi-modal tasks, such as diagnosis of esophageal patients based on clinical data from both HRM and functional luminal imaging probe panometry (FLIP).
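
The blending step (model averaging weighted by precision scores, evaluated on top-1 and top-2 predictions) can be sketched as follows. The abstract does not say how the precision scores are computed or normalized, so this sketch simply treats them as given non-negative weights. The function names `blend` and `top_k` and the example numbers are hypothetical.

```python
def blend(prob_lists, precisions):
    """Average per-class probabilities across models, weighting each
    model by its (externally supplied) precision score."""
    total = sum(precisions)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[k] for w, probs in zip(precisions, prob_lists)) / total
        for k in range(n_classes)
    ]

def top_k(probs, k):
    """Indices of the k most probable classes, most probable first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

# Two hypothetical study-level models scoring three diagnoses.
blended = blend([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]], precisions=[0.9, 0.6])
print(top_k(blended, 1), top_k(blended, 2))
```

A top-2 evaluation counts a study as correct when the true diagnosis is among the two indices returned by `top_k(blended, 2)`.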

翻訳日:2021-06-29 14:01:55 公開日:2021-06-25

# 論理仕様からの合成強化学習

Compositional Reinforcement Learning from Logical Specifications ( http://arxiv.org/abs/2106.13906v1 )

ライセンス: Link先を確認

Kishor Jothimurugan, Suguman Bansal, Osbert Bastani and Rajeev Alur

(参考訳) 論理仕様による複雑なタスクに対する学習制御ポリシーの問題点について検討する。最近のアプローチでは、与えられた仕様から報酬関数を自動的に生成し、適切な強化学習アルゴリズムを用いて、期待される報酬を最大化するポリシーを学ぶ。しかし、これらのアプローチは、高レベルの計画を必要とする複雑なタスクに不十分にスケールする。本研究では,高レベルの計画と強化学習をインターリーブするDiRLという構成学習手法を開発する。まず、dirlは仕様を抽象グラフとしてエンコードする。直感的には、グラフの頂点と辺はそれぞれ状態空間の領域と単純なサブタスクに対応する。このアプローチでは、強化学習を取り入れて、dijkstraスタイルの計画アルゴリズムで各エッジ(サブタスク)のニューラルネットワークポリシを学習し、グラフ内の高レベルプランを計算する。連続状態とアクション空間を持つ一連の挑戦的制御ベンチマークに対する提案手法の評価は、最先端のベースラインよりも優れていることを示す。

We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning. First, DiRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.
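
To make the planning side concrete, here is a minimal sketch of a Dijkstra-style search over an abstract graph whose edges are sub-tasks. It assumes each edge already carries an estimated success probability for its sub-task policy and uses the negative log of that probability as the edge cost, so the cheapest path is the plan most likely to succeed end to end. In DiRL the edge policies are learned during the search; the fixed probabilities, the cost choice, and the name `best_plan` are illustrative assumptions, not the paper's implementation.

```python
import heapq
import math

def best_plan(edges, source, target):
    """edges maps a vertex to {successor: success probability of the sub-task}.
    Returns (path, probability that every sub-task on the path succeeds)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in edges.get(u, {}).items():
            if p <= 0.0:
                continue  # a sub-task that never succeeds is unusable
            nd = d - math.log(p)  # costs add, so probabilities multiply
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])
```

The greedy first step is not always the best plan: a sub-task that is easy to complete can lead to a region from which the remaining sub-task is hard.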

翻訳日:2021-06-29 14:01:26 公開日:2021-06-25

# 真理発見と幾何最適化による深層ニューラルネットワークの不確実性校正の改善

Improving Uncertainty Calibration of Deep Neural Networks via Truth Discovery and Geometric Optimization ( http://arxiv.org/abs/2106.14662v1 )

ライセンス: Link先を確認

Chunwei Ma, Ziyun Huang, Jiayi Xian, Mingchen Gao, Jinhui Xu

(参考訳) 近年のDeep Neural Networks(DNN)は、その成功にもかかわらず、学習プロセスに固有の不確実性のために、予測に疑問を投げかける可能性がある。アンサンブル技術とポストホックキャリブレーションは、DNNの不確実性キャリブレーションの改善を個別に示す2種類のアプローチである。しかし,2種類の手法の相乗効果はよく研究されていない。本稿では,アンサンブル法とポストホック校正法を統合するための真理発見フレームワークを提案する。そこで,アンサンブル候補の幾何分散をサンプル不確かさのよい指標として用い,精度を保った真理推定器を設計した。さらに,真理発見正規化最適化により,ポストホックキャリブレーションを向上できることを示す。 CIFAR や ImageNet などの大規模データセットでは,ヒストグラムとカーネル密度に基づく評価指標に対する最先端キャリブレーション手法に対して一貫した改善が見られた。私たちのコードは https://github.com/horsepurve/truly-uncertain で入手できます。

Deep Neural Networks (DNNs), despite their tremendous success in recent years, could still cast doubts on their predictions due to the intrinsic uncertainty associated with their learning process. Ensemble techniques and post-hoc calibrations are two types of approaches that have individually shown promise in improving the uncertainty calibration of DNNs. However, the synergistic effect of the two types of methods has not been well explored. In this paper, we propose a truth discovery framework to integrate ensemble-based and post-hoc calibration methods. Using the geometric variance of the ensemble candidates as a good indicator for sample uncertainty, we design an accuracy-preserving truth estimator with provably no accuracy drop. Furthermore, we show that post-hoc calibration can also be enhanced by truth discovery-regularized optimization. On large-scale datasets including CIFAR and ImageNet, our method shows consistent improvement against state-of-the-art calibration approaches on both histogram-based and kernel density-based evaluation metrics. Our codes are available at https://github.com/horsepurve/truly-uncertain.
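
The idea of using the spread of ensemble outputs as a per-sample uncertainty signal can be sketched in a few lines. The abstract does not define the truth estimator or the exact form of the geometric variance, so this sketch substitutes the arithmetic mean of the members' probability vectors and the mean squared Euclidean distance to that mean. The name `ensemble_mean_and_variance` is hypothetical.

```python
def ensemble_mean_and_variance(member_probs):
    """member_probs: one probability vector per ensemble member, same sample.
    Returns (mean vector, mean squared Euclidean distance to the mean)."""
    m = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[j] for p in member_probs) / m for j in range(k)]
    variance = sum(
        sum((p[j] - mean[j]) ** 2 for j in range(k)) for p in member_probs
    ) / m
    return mean, variance
```

Members that agree give a variance of zero, and members that disagree give a larger value, which is what makes the quantity usable as an uncertainty indicator.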

翻訳日:2021-06-29 13:59:17 公開日:2021-06-25

# 分散学習と連合学習における暗黙的勾配アライメント

Implicit Gradient Alignment in Distributed and Federated Learning ( http://arxiv.org/abs/2106.13897v1 )

ライセンス: Link先を確認

Yatin Dandi, Luis Barba, Martin Jaggi

(参考訳) 分散学習におけるグローバル収束を達成するための大きな障害は、分散データの不均一性と確率性によるクライアント間の勾配やミニバッチの誤調整である。この問題を軽減するひとつの方法は、トレーニングを通じて異なるクライアント間の勾配のアライメントを促進することだ。解析の結果,SGDの暗黙的正規化効果を再現する適切な最適化手法を用いることで,勾配アライメントとテスト精度の向上が実現可能であることがわかった。 sgdにおけるこの正規化の存在は、訓練中の異なるミニバッチの逐次使用に依存しているため、大きなミニバッチでのトレーニングでは本質的に欠如している。並列性を高めつつ、この正規化の一般化の利点を得るため、各更新で任意に大きなバッチを利用可能にしつつ、同じ暗黙の正規化を誘導する新しいgradalignアルゴリズムを提案する。分散学習とフェデレーション学習において,アルゴリズムの利点を実験的に検証した。

A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients or mini-batches, due to the heterogeneity and stochasticity of the distributed data. One way to alleviate this problem is to encourage the alignment of gradients across different clients throughout training. Our analysis reveals that this goal can be accomplished by utilizing the right optimization method that replicates the implicit regularization effect of SGD, leading to gradient alignment as well as improvements in test accuracies. Since the existence of this regularization in SGD completely relies on the sequential use of different mini-batches during training, it is inherently absent when training with large mini-batches. To obtain the generalization benefits of this regularization while increasing parallelism, we propose a novel GradAlign algorithm that induces the same implicit regularization while allowing the use of arbitrarily large batches in each update. We experimentally validate the benefit of our algorithm in different distributed and federated learning settings.

翻訳日:2021-06-29 13:58:20 公開日:2021-06-25

# 閉形式連続深層モデル

Closed-form Continuous-Depth Models ( http://arxiv.org/abs/2106.13898v1 )

ライセンス: Link先を確認

Ramin Hasani, Mathias Lechner, Alexander Amini, Lucas Liebenwein, Max Tschaikowski, Gerald Teschl, Daniela Rus

(参考訳) モデル隠れ状態の微分がニューラルネットワークによって定義される連続深度ニューラルネットワークは、強力なシーケンシャルなデータ処理機能を実現している。しかし、これらのモデルは高度な数値微分方程式(DE)の解法に依存しており、計算コストとモデルの複雑さの両方において大きなオーバーヘッドをもたらす。本稿では,CfCネットワークと呼ばれる新しいモデル群について述べる。そのモデル群は,ODEをベースとしたモデルと同等に強力なモデリング能力を示しながら,記述が簡単で,少なくとも1桁高速である。モデルは、時間連続モデルの表現的部分集合の解析的閉形式解から導出され、複雑なdeソルバの必要性を全て和らげる。実験により,CfCネットワークは長期依存や不規則なサンプルデータを含む様々な時系列予測タスクにおいて,高度で反復的なモデルよりも優れていることを示した。私たちは、リソース制約のある環境でリッチで継続的なニューラルモデルをトレーニングし、デプロイする新たな機会が、パフォーマンスと効率の両方を必要としている、と信じています。

Continuous-depth neural models, where the derivative of the model's hidden state is defined by a neural network, have enabled strong sequential data processing capabilities. However, these models rely on advanced numerical differential equation (DE) solvers, resulting in a significant overhead both in terms of computational cost and model complexity. In this paper, we present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster while exhibiting equally strong modeling abilities compared to their ODE-based counterparts. The models are hereby derived from the analytical closed-form solution of an expressive subset of time-continuous models, thus alleviating the need for complex DE solvers altogether. In our experimental evaluations, we demonstrate that CfC networks outperform advanced, recurrent models over a diverse set of time-series prediction tasks, including those with long-term dependencies and irregularly sampled data. We believe our findings open new opportunities to train and deploy rich, continuous neural models in resource-constrained settings, which demand both performance and efficiency.

翻訳日:2021-06-29 13:46:22 公開日:2021-06-25

# 低リソース高表現性音声のための明示的持続時間モデルを用いた非自己回帰TTS

Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech ( http://arxiv.org/abs/2106.12896v2 )

ライセンス: Link先を確認

Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt

(参考訳) 最近のニューラルテキスト音声合成(TTS)アプローチは高品質な音声を生成するが、通常はターゲット話者からの大量の録音を必要とする。先行研究では,高品質TTSを生成するための3段階の手法が提案され,トレーニングに必要なデータ量が大幅に削減された。しかし,この手法では,高い表現力を持つ音声に対して,達成可能な自然性のレベルに天井効果が認められている。本稿では,ターゲット話者のわずか15分間の音声データを用いて,高い表現力を持つTTS音声を構築する手法を提案する。現在の最先端のアプローチと比較して,提案する改善は録音との差を音声の自然性で23.3%,話者の類似性で16.3%縮めている。さらに,15分間のターゲット話者データのみを用いて,Tacotron2ベースのフルデータモデル(約10時間)の自然性と話者の類似性に並び,30分以上のデータではこれを大幅に上回る。以下の改善を提案する: 1) 自己回帰型で注意機構ベースのTTSモデルから,注意機構を外部持続時間モデルに置き換えた非自己回帰型モデルに変更すること, 2) 条件付き生成敵対ネットワーク(cGAN)に基づく追加の微調整ステップ。

Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large amount of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly reducing the amount of data required for training. However, we have observed a ceiling effect in the level of naturalness achievable for highly expressive voices when using this approach. In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker. Compared to the current state-of-the-art approach, our proposed improvements close the gap to recordings by 23.3% for naturalness of speech and by 16.3% for speaker similarity. Further, we match the naturalness and speaker similarity of a Tacotron2-based full-data (~10 hours) model using only 15 minutes of target speaker data, whereas with 30 minutes or more, we significantly outperform it. The following improvements are proposed: 1) changing from an autoregressive, attention-based TTS model to a non-autoregressive model replacing attention with an external duration model and 2) an additional Conditional Generative Adversarial Network (cGAN) based fine-tuning step.

翻訳日:2021-06-29 11:11:06 公開日:2021-06-25

# (参考訳) 正規化フローからの知識の蒸留

Distilling the Knowledge from Normalizing Flows ( http://arxiv.org/abs/2106.12699v2 )

ライセンス: CC BY 4.0

Dmitry Baranchuk, Vladimir Aliev, Artem Babenko

(参考訳) 正規化フローは、複数の音声および視覚問題において強力な性能を示す生成モデルの強力なクラスである。他の生成モデルとは対照的に、正規化フローは扱いやすい可能性を持つ潜在変数モデルであり、安定したトレーニングを可能にする。しかし、それらは効率的なヤコビ行列式計算で可逆関数を表現するように慎重に設計する必要がある。実際には、これらの要件は、推論時間とメモリ消費の観点から、代替フィードフォワードモデルよりも劣る、過度にパラメータ化され、洗練されたアーキテクチャをもたらす。本研究では,フローベースモデルをより効率的な代替品に蒸留できるかどうかを検討する。本稿では, 簡単な蒸留法を提案し, 画像超解像および音声合成のための現状条件付きフローベースモデルの有効性を示すことで, この問題に対する肯定的な回答を提供する。

Normalizing flows are a powerful class of generative models demonstrating strong performance in several speech and vision problems. In contrast to other generative models, normalizing flows are latent variable models with tractable likelihoods and allow for stable training. However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation. In practice, these requirements lead to overparameterized and sophisticated architectures that are inferior to alternative feed-forward models in terms of inference time and memory consumption. In this work, we investigate whether one can distill flow-based models into more efficient alternatives. We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution and speech synthesis.

翻訳日:2021-06-29 06:02:03 公開日:2021-06-25

# (参考訳) クラスタリング広告の意図による入札:Eコマースのための効率的な検索エンジンマーケティングシステム

Bidding via Clustering Ads Intentions: an Efficient Search Engine Marketing System for E-commerce ( http://arxiv.org/abs/2106.12700v2 )

ライセンス: CC0 1.0

Cheng Jie, Da Xu, Zigeng Wang, Lu Wang, Wei Shen

(参考訳) 検索エンジンのマーケティングの規模が拡大するにつれ、効率的な入札システムの設計がeコマース企業の成功にとって最重要になっている。現代の産業レベルの入札システムで直面する重要な課題は、そのカタログは巨大であり、関連する入札機能は高い疎性である; 2. 大量の入札要求は、オフラインとオンラインの両方のサービスに大きな計算負担を生じさせる。不要なユーザ項目情報を活用することは,ユーザクエリからの自然言語信号と製品からのコンテキスト知識を活用するため,スパーシティの問題を軽減する上で不可欠である。特に,広告のベクトル表現をトランスフォーマモデルを用いて抽出し,それらの幾何学的関係をクラスタリングによる協調入札予測の構築に活用する。 2段階の手続きは入札評価と最適化の計算ストレスを大幅に低減する。本稿では,walmart eコマースにおける検索エンジンマーケティングのための入札システムのエンドツーエンド構造について紹介する。当社のアプローチのオンラインおよびオフラインのパフォーマンスを分析し、それを運用効率のよいソリューションとみなす方法について論じます。

With the increasing scale of search engine marketing, designing an efficient bidding system is becoming paramount for the success of e-commerce companies. The critical challenges faced by a modern industrial-level bidding system include: 1. the catalog is enormous, and the relevant bidding features are highly sparse; 2. the large volume of bidding requests imposes a significant computation burden on both offline and online serving. Leveraging extraneous user-item information proves essential to mitigate the sparsity issue, for which we exploit the natural language signals from the users' queries and the contextual knowledge from the products. In particular, we extract the vector representations of ads via the Transformer model and leverage their geometric relation to build collaborative bidding predictions via clustering. The two-step procedure also significantly reduces the computation stress of bid evaluation and optimization. In this paper, we introduce the end-to-end structure of the bidding system for search engine marketing for Walmart e-commerce, which successfully handles tens of millions of bids each day. We analyze the online and offline performance of our approach and discuss how we find it to be a production-efficient solution.
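
One way to picture how clustering eases sparsity: once each ad has a vector representation, ads that sit close together can share statistics, so an ad with no traffic of its own inherits an estimate from its neighbours. The sketch below assumes the ad vectors and cluster centroids already exist (in the paper the vectors come from a Transformer model) and pools clicks and conversions per cluster. The names `nearest` and `pooled_conversion_rates`, the Euclidean distance, and the conversion-rate target are illustrative assumptions, not the production system's design.

```python
import math
from collections import defaultdict

def nearest(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))

def pooled_conversion_rates(ads, centroids):
    """ads: list of (vector, clicks, conversions).
    Returns (cluster label per ad, conversion rate per cluster)."""
    clicks = defaultdict(int)
    conversions = defaultdict(int)
    labels = []
    for vec, n_clicks, n_conv in ads:
        k = nearest(vec, centroids)
        labels.append(k)
        clicks[k] += n_clicks
        conversions[k] += n_conv
    rates = {k: conversions[k] / clicks[k] for k in clicks if clicks[k] > 0}
    return labels, rates
```

Scoring one estimate per cluster rather than per ad is also where the computational saving mentioned in the abstract would come from in a scheme like this.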

翻訳日:2021-06-29 05:44:08 公開日:2021-06-25

# (参考訳) 自律走行における多モード3次元物体検出:サーベイ

Multi-Modal 3D Object Detection in Autonomous Driving: a Survey ( http://arxiv.org/abs/2106.12735v2 )

ライセンス: CC BY 4.0

Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Yu Zhang, Jianmin Ji, Yanyong Zhang

(参考訳) 過去数年間、我々は自動運転の急速な発展を目撃してきた。しかし、複雑でダイナミックな運転環境のため、完全な自律性を実現することは依然として厄介な課題である。その結果、自動運転車は、堅牢で正確な環境認識を行うための一連のセンサーを備えている。センサーの数や種類が増加し続けており、それらを組み合わせて知覚を向上させることが自然なトレンドになりつつある。これまでのところ、マルチセンサー融合に基づく知覚に焦点を当てた詳細なレビューは行われていない。このギャップを埋め、将来の研究を動機付けるために、この調査では、複数のセンサーデータソース、特にカメラやLiDARを活用する、最近のフュージョンベースの3D検出ディープラーニングモデルについてレビューする。本調査では,各センサデータに共通するデータ表現やオブジェクト検出ネットワークを含む,自動運転車用の一般的なセンサの背景について紹介する。次に,マルチモーダル3dオブジェクト検出のための一般的なデータセットについて議論し,各データセットに含まれるセンサデータに着目した。次に, 核融合位置, 核融合データ表現, 核融合粒度の3つの側面を考慮し, 最新のマルチモーダル3次元検出ネットワークについて詳細に検討する。詳細なレビューの後、オープンチャレンジについて議論し、可能な解決策を指摘します。われわれの詳細なレビューが、マルチモーダルな3Dオブジェクト検出の分野での研究に役立てることを願っている。

In the past few years, we have witnessed rapid development of autonomous driving. However, achieving full autonomy remains a daunting task due to the complex and dynamic driving environment. As a result, self-driving cars are equipped with a suite of sensors to conduct robust and accurate environment perception. As the number and type of sensors keep increasing, combining them for better perception is becoming a natural trend. So far, there has been no in-depth review that focuses on multi-sensor fusion based perception. To bridge this gap and motivate future research, this survey is devoted to reviewing recent fusion-based 3D detection deep learning models that leverage multiple sensor data sources, especially cameras and LiDARs. In this survey, we first introduce the background of popular sensors for autonomous cars, including their common data representations as well as object detection networks developed for each type of sensor data. Next, we discuss some popular datasets for multi-modal 3D object detection, with a special focus on the sensor data included in each dataset. Then we present in-depth reviews of recent multi-modal 3D detection networks by considering the following three aspects of the fusion: fusion location, fusion data representation, and fusion granularity. After a detailed review, we discuss open challenges and point out possible solutions. We hope that our detailed review can help researchers to embark on investigations in the area of multi-modal 3D object detection.

翻訳日:2021-06-29 05:34:41 公開日:2021-06-25

# (参考訳) 時間的ルーティング適応と最適輸送を用いた複数ストックトレーディングパターンの学習

Learning Multiple Stock Trading Patterns with Temporal Routing Adaptor and Optimal Transport ( http://arxiv.org/abs/2106.12950v2 )

ライセンス: CC BY 4.0

Hengxu Lin, Dong Zhou, Weiqing Liu, Jiang Bian

(参考訳) 有効な量的投資は通常、株価の将来の動きの正確な予測に依存する。近年、機械学習ベースのソリューションは、より正確な株価予測を行う能力を示し、現代の量的投資システムにおいて欠かせない要素となっている。しかし、既存手法の背景にあるi.i.d.仮定は、株式市場における多様な取引パターンの存在と矛盾しており、それは必然的に、より良い株価予測性能を達成する能力を制限する。本稿では,既存の株価予測モデルに複数の株式取引パターンをモデル化する能力を持たせるための,新しいアーキテクチャである時間的ルーティングアダプタ(TRA)を提案する。 TRAは、複数のパターンを学習するための独立した予測器のセットと、異なる予測器にサンプルをディスパッチするルータで構成される軽量モジュールである。それでも、明示的なパターン識別子がないため、効果的なTRAベースのモデルをトレーニングすることは極めて困難である。この課題に取り組むため,我々は,最適輸送(OT)に基づく学習アルゴリズムを更に設計し,サンプルから予測器への最適な割り当てを得るとともに,その割り当てを用いて補助損失項を通じてルータを効果的に最適化する。実世界の株式ランキングタスクの実験では,Attention LSTM や Transformer といった最先端のベースラインと比較して,提案手法は情報係数(IC)をそれぞれ 0.053 から 0.059,0.051 から 0.056 に向上させることができる。本研究で使用したデータセットとコードは https://github.com/microsoft/qlib/tree/main/examples/benchmarks/TRA で公開されている。

Successful quantitative investment usually relies on precise predictions of the future movement of the stock price. Recently, machine learning based solutions have shown their capacity to give more accurate stock prediction and become indispensable components in modern quantitative investment systems. However, the i.i.d. assumption behind existing methods is inconsistent with the existence of diverse trading patterns in the stock market, which inevitably limits their ability to achieve better stock prediction performance. In this paper, we propose a novel architecture, Temporal Routing Adaptor (TRA), to empower existing stock prediction models with the ability to model multiple stock trading patterns. Essentially, TRA is a lightweight module that consists of a set of independent predictors for learning multiple patterns as well as a router to dispatch samples to different predictors. Nevertheless, the lack of explicit pattern identifiers makes it quite challenging to train an effective TRA-based model. To tackle this challenge, we further design a learning algorithm based on Optimal Transport (OT) to obtain the optimal sample-to-predictor assignment and effectively optimize the router with such assignment through an auxiliary loss term. Experiments on the real-world stock ranking task show that, compared to the state-of-the-art baselines, e.g., Attention LSTM and Transformer, the proposed method can improve the information coefficient (IC) from 0.053 to 0.059 and from 0.051 to 0.056, respectively. Our dataset and code used in this work are publicly available: https://github.com/microsoft/qlib/tree/main/examples/benchmarks/TRA.
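
For readers unfamiliar with the reported metric: the information coefficient is commonly computed as the correlation between predicted scores and realized returns over a cross-section of stocks. The abstract does not state which variant is used (Pearson or rank-based, or how it is averaged over time), so the sketch below shows only the plain Pearson form for a single cross-section. The function name `information_coefficient` is hypothetical.

```python
import math

def information_coefficient(predicted, realized):
    """Pearson correlation between predicted scores and realized returns.
    Assumes both sequences have the same length and non-zero variance."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(realized) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, realized))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in realized))
    return cov / (std_p * std_r)
```

Values lie between -1 and 1, which gives some context for the scale of the figures quoted in the abstract.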

翻訳日:2021-06-29 04:48:43 公開日:2021-06-25

# (参考訳) ディファレンシャルプライバシが解釈可能性を満たす場合--ケーススタディ

When Differential Privacy Meets Interpretability: A Case Study ( http://arxiv.org/abs/2106.13203v2 )

ライセンス: CC BY 4.0

Rakshit Naidu, Aman Priyanshu, Aadith Kumar, Sasikanth Kotti, Haofan Wang, Fatemehsadat Mireshghallah

(参考訳) 医療画像や診断などのタスクにおけるDeep Neural Networks(DNN)のトレーニングにおける個人データの利用の増加を踏まえ、DNNの差分プライベートトレーニングの重要性が高まっている。しかし,これらのモデルの解釈可能性やDPの適用が解釈の質に与える影響についてはほとんど注目されていない。本稿では,DPトレーニングがDNN,特に医療画像への応用に与える影響について,APTOSデータセット上で広範囲に研究する。

Given the increase in the use of personal data for training Deep Neural Networks (DNNs) in tasks such as medical imaging and diagnosis, differentially private training of DNNs is surging in importance and there is a large body of work focusing on providing better privacy-utility trade-off. However, little attention is given to the interpretability of these models, and how the application of DP affects the quality of interpretations. We propose an extensive study into the effects of DP training on DNNs, especially on medical imaging applications, on the APTOS dataset.

翻訳日:2021-06-29 04:14:30 公開日:2021-06-25

# (参考訳) RSN: 高精度LiDAR3次元物体検出のためのレンジスパースネット

RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection ( http://arxiv.org/abs/2106.13365v1 )

ライセンス: CC0 1.0

Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, Dragomir Anguelov

(参考訳) LiDARデータから3Dオブジェクトを検出することは、ほとんどの自律運転システムにおいて重要な要素である。安全で高速な運転には、新しいLiDARによって実現されるより大きな検知範囲が必要である。これらのより大きな検出範囲はより効率的で正確な検出モデルを必要とする。本研究では,この拡張検出方式でリアルタイム3次元物体検出を実現するために,簡易で効率的かつ高精度な3次元物体検出器であるレンジスパースネット(RSN)を提案する。 RSNは、範囲画像からフォアグラウンドポイントを予測し、選択したフォアグラウンドポイントにスパース畳み込みを適用してオブジェクトを検出する。高密度領域画像上の軽量な2D畳み込みは、選択された前景点を著しく減らし、後の粗い畳み込みをRCNで効率的に操作できるようにする。距離画像の特徴を組み合わせることで検出精度がさらに向上する。 rsnはwaymo open dataset(wod)上の150m x 150m検出領域で毎秒60フレーム以上動作し、以前公開された検出器よりも正確である。 RSNは2020年11月11日現在、LiDARをベースとした歩行者および車両検出のためのAPH/LEVEL 1測定値に基づいて、WODのリーダーボードで第1位にランクされている。

The detection of 3D objects from LiDAR data is a critical component in most autonomous driving systems. Safe, high-speed driving needs larger detection ranges, which are enabled by new LiDARs. These larger detection ranges require more efficient and accurate detection models. Towards this goal, we propose Range Sparse Net (RSN), a simple, efficient, and accurate 3D object detector, in order to tackle real-time 3D object detection in this extended detection regime. RSN predicts foreground points from range images and applies sparse convolutions on the selected foreground points to detect objects. The lightweight 2D convolutions on dense range images result in significantly fewer selected foreground points, thus enabling the later sparse convolutions in RSN to operate efficiently. Combining features from the range image further enhances detection accuracy. RSN runs at more than 60 frames per second on a 150m x 150m detection region on the Waymo Open Dataset (WOD) while being more accurate than previously published detectors. As of 11/2020, RSN is ranked first on the WOD leaderboard based on the APH/LEVEL 1 metrics for LiDAR-based pedestrian and vehicle detection, while being several times faster than alternatives.
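
The efficiency argument rests on a simple data-flow step: a cheap per-pixel foreground score on the dense range image decides which points are passed on to the more expensive sparse stage. A minimal sketch of that gather step follows. It assumes the per-pixel scores are already available (in RSN they come from a 2D convolutional network), and the name `select_foreground` and the 0.5 threshold are hypothetical.

```python
def select_foreground(range_image, scores, threshold=0.5):
    """Keep pixels whose foreground score passes the threshold.

    range_image[r][c] is the measured range (0.0 meaning no LiDAR return),
    scores[r][c] is a foreground score in [0, 1] for the same pixel.
    Returns a sparse list of (row, column, range) tuples."""
    points = []
    for r, (range_row, score_row) in enumerate(zip(range_image, scores)):
        for c, (rng, score) in enumerate(zip(range_row, score_row)):
            if rng > 0.0 and score >= threshold:
                points.append((r, c, rng))
    return points
```

Because typical driving scenes are mostly background, the returned list is usually much shorter than the dense image, which is what makes the later sparse stage cheap.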

翻訳日:2021-06-29 01:14:03 公開日:2021-06-25

# (参考訳) グラフ畳み込みカーネルを用いた距離画像における効率的な3次元物体検出

To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels ( http://arxiv.org/abs/2106.13381v1 )

ライセンス: CC0 1.0

Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, Dragomir Anguelov

(参考訳) 3Dオブジェクト検出は多くのロボティクス応用において不可欠である。 2次元視点範囲画像が存在するタスクに対しては,この範囲画像から直接3次元表現を学習することを提案する。この目的のために,我々は,各画素の3次元球面座標をネットワーク全体に伝達する2次元畳み込みネットワークアーキテクチャを設計した。その層は、デフォルトの内積カーネルの代わりに任意の畳み込みカーネルを消費し、各ピクセルの周囲の基底となる局所幾何学を利用することができる。我々は4つのカーネルを概説する: 単語の袋型パラダイムに基づく密集したカーネル、最近のグラフニューラルネットワークの進歩に触発された3つのグラフカーネル: トランスフォーマー、ポイントネット、エッジ畳み込み。また、遠近距離画像ビューの操作により、カメラ画像とのクロスモダリティ融合についても検討する。本手法はWaymo Open Dataset上で競合的に動作し,歩行者検出の最先端APを69.7%から75.5%に改善する。私たちの最小のモデルは、今でも人気の高いPointPillarsを上回り、180倍のFLOPSとモデルパラメータを必要としています。

3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we designed a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place of the default inner product kernel and exploit the underlying local geometry around each pixel. We outline four such kernels: a dense kernel according to the bag-of-words paradigm, and three graph kernels inspired by recent graph neural network advances: the Transformer, the PointNet, and the Edge Convolution. We also explore cross-modality fusion with the camera image, facilitated by operating in the perspective range image view. Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also efficient in that our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPS and model parameters.
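
Each range-image pixel corresponds to a point in spherical coordinates (range, azimuth, inclination), and that geometry is what a kernel can exploit beyond the 2D pixel grid. The sketch below shows the standard spherical-to-Cartesian conversion for such pixels. The angle convention (azimuth in the x-y plane from the x axis, inclination measured upward from that plane, both in radians) and the names `pixel_to_xyz` and `range_image_to_points` are assumptions for illustration; the paper's exact parameterization may differ.

```python
import math

def pixel_to_xyz(rng, azimuth, inclination):
    """Convert one range-image pixel from spherical to Cartesian coordinates
    under the convention assumed above."""
    horizontal = rng * math.cos(inclination)
    return (
        horizontal * math.cos(azimuth),
        horizontal * math.sin(azimuth),
        rng * math.sin(inclination),
    )

def range_image_to_points(range_image, azimuths, inclinations):
    """range_image[r][c] is a range; every row shares inclinations[r] and
    every column shares azimuths[c]. Returns a grid of (x, y, z) tuples."""
    return [
        [pixel_to_xyz(range_image[r][c], azimuths[c], inclinations[r])
         for c in range(len(azimuths))]
        for r in range(len(inclinations))
    ]
```

Neighbouring pixels in the image can be far apart in 3D (for example at an object boundary), which is one reason to let a kernel weigh neighbours by these coordinates rather than by grid position alone.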

翻訳日:2021-06-29 00:59:33 公開日:2021-06-25

# (参考訳) グローブインベディングのソース・クリティカル・デバイアス法

A Source-Criticism Debiasing Method for GloVe Embeddings ( http://arxiv.org/abs/2106.13382v1 )

ライセンス: CC BY-SA 4.0

Hope McGovern

(参考訳) 大規模な公共コーパスで訓練された単語の埋め込みは、既知の人間の社会的偏見を一貫して示すことはよく文書化されている。多くのデバイアスの方法が存在するが、ほとんどの場合、埋め込みからバイアス情報を完全に排除し、プロセス内のトレーニングセットのサイズを小さくする。本稿では,偏りのあるデータを取り除くのではなく,トレーニングセットの偏りに関する明示的な情報を取り込むことにより,グローブワード埋め込み(pennington et al., 2014)の偏りを解消する簡易かつ効果的な手法を提案する。提案手法は,Brunetらによる高速バイアス勾配近似法の助けを借りて,迅速かつ効率的に動作する。 (2019). 私たちのアプローチは、人文科学における「ソース批判」の概念に似ているので、本手法をソースクリティカルグローブ(sc-glove)と呼ぶ。 SC-GloVeは,トレーニングデータやTOP-1の性能を犠牲にすることなく,ワード埋め込みアソシエーションテスト(WEAT)セットへの影響を小さくする。

It is well-documented that word embeddings trained on large public corpora consistently exhibit known human social biases. Although many methods for debiasing exist, almost all fixate on completely eliminating biased information from the embeddings and often diminish training set size in the process. In this paper, we present a simple yet effective method for debiasing GloVe word embeddings (Pennington et al., 2014) which works by incorporating explicit information about training set bias rather than removing biased data outright. Our method runs quickly and efficiently with the help of a fast bias gradient approximation method from Brunet et al. (2019). As our approach is akin to the notion of 'source criticism' in the humanities, we term our method Source-Critical GloVe (SC-GloVe). We show that SC-GloVe reduces the effect size on Word Embedding Association Test (WEAT) sets without sacrificing training data or TOP-1 performance.
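
The WEAT effect size that the abstract reports is usually defined as follows: for target word sets X and Y and attribute sets A and B, each word's association is its mean cosine similarity to A minus its mean cosine similarity to B, and the effect size is the difference between the mean associations of X and Y divided by the standard deviation of the associations over X and Y combined. The sketch below implements that definition on plain vectors. Using the sample standard deviation is an assumption (implementations differ on this point), and the function names are hypothetical.

```python
import math
import statistics

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def association(w, A, B):
    """Mean similarity of w to attribute set A minus that to attribute set B."""
    return (sum(cosine(w, a) for a in A) / len(A)
            - sum(cosine(w, b) for b in B) / len(B))

def weat_effect_size(X, Y, A, B):
    """Standardized difference in association between target sets X and Y."""
    assoc_x = [association(x, A, B) for x in X]
    assoc_y = [association(y, A, B) for y in Y]
    spread = statistics.stdev(assoc_x + assoc_y)  # sample std over X and Y
    return (statistics.mean(assoc_x) - statistics.mean(assoc_y)) / spread
```

An effect size near zero means the two target sets are about equally associated with both attribute sets, so a debiasing method aims to move this value toward zero.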

翻訳日:2021-06-29 00:40:49 公開日:2021-06-25
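The SC-GloVe abstract above reports results as a WEAT effect size. As a rough illustration of what that quantity measures, the following is a minimal sketch of the standard WEAT effect size: the difference in mean association of two target word sets with two attribute word sets, divided by the standard deviation over all target words. It is not the SC-GloVe method itself. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since implementations differ on that point.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Effect size for target sets X, Y and attribute sets A, B.

    Assumption: sample standard deviation over the pooled targets X and Y.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)
```

With toy two-dimensional vectors in which X aligns with A and Y aligns with B, the value is positive, and swapping X and Y flips its sign. A debiasing method aims to move this value toward zero.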

# (参考訳) ベイジアンアイトラッキング

Bayesian Eye Tracking ( http://arxiv.org/abs/2106.13387v1 )

ライセンス: CC BY 4.0

Qiang Ji and Kang Wang

(参考訳) モデルに基づく視線追跡は、訓練データや視線アノテーションを必要とせず、異なる対象に一般化できるため、視線追跡において支配的なアプローチである。しかし、モデルベースの眼球追跡は、特に野生の眼球追跡において、眼球の特徴検出エラーの影響を受けやすい。そこで本研究では,モデルベースアイトラッキングのためのベイズフレームワークを提案する。提案システムは,眼の外観とランドマークとの確率的関係を捉えるカスケード・ベイズ畳み込みニューラルネットワーク(c-BCNN)と,眼のランドマークから視線を推定する幾何学的アイモデルから構成される。ベイジアンフレームワークは、テスト眼画像からベイジアン推定により、明確な目印検出やモデルトレーニングを伴わない視線分布を生成し、最も可能性の高い視線を推定するだけでなく、その不確実性を推定する。さらに,点に基づく推論ではなくベイズ推論を用いることで,異なるサブジェクトやヘッドポーズ,環境に対してよりよく一般化できるだけでなく,画像ノイズやランドマーク検出誤差にも頑健である。最後に、視線の不確実性の推定により、視線推定精度を段階的に向上できるカスケードアーキテクチャを構築することができる。最先端のモデルベースと学習ベースの手法と比較して,提案手法は,いくつかのベンチマークデータセットにおける一般化能力と,実世界の課題条件下での正確性と堅牢性が大幅に向上することを示す。

Model-based eye tracking has been a dominant approach for eye gaze tracking because of its ability to generalize to different subjects, without the need for any training data and eye gaze annotations. Model-based eye tracking, however, is susceptible to eye feature detection errors, in particular for eye tracking in the wild. To address this issue, we propose a Bayesian framework for model-based eye tracking. The proposed system consists of a cascade-Bayesian Convolutional Neural Network (c-BCNN) to capture the probabilistic relationships between eye appearance and its landmarks, and a geometric eye model to estimate eye gaze from the eye landmarks. Given a testing eye image, the Bayesian framework can generate, through Bayesian inference, the eye gaze distribution without explicit landmark detection and model training, based on which it not only estimates the most likely eye gaze but also its uncertainty. Furthermore, with Bayesian inference instead of point-based inference, our model can not only generalize better to different subjects, head poses, and environments but is also robust to image noise and landmark detection errors. Finally, with the estimated gaze uncertainty, we can construct a cascade architecture that allows us to progressively improve gaze estimation accuracy. Compared to state-of-the-art model-based and learning-based methods, the proposed Bayesian framework demonstrates significant improvement in generalization capability across several benchmark datasets and in accuracy and robustness under challenging real-world conditions.

翻訳日:2021-06-29 00:33:11 公開日:2021-06-25

# (参考訳) HAN:骨格型ジェスチャー認識のための効率的な階層型自己認識ネットワーク

HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition ( http://arxiv.org/abs/2106.13391v1 )

ライセンス: CC BY 4.0

Jianbo Liu, Ying Wang, Shiming Xiang, Chunhong Pan

(参考訳) 骨格に基づくジェスチャー認識の従来の手法は、骨格配列を擬似画像や時空間グラフに配置し、深層畳み込みニューラルネットワーク(CNN)やグラフ畳み込みニューラルネットワーク(GCN)を用いて特徴抽出を行う。優れた結果を得たにもかかわらず、これらの手法はインタラクティブな手の部品の局所的な特徴を動的に捉えることに固有の制限があり、計算効率は依然として深刻な問題である。本研究では,この問題を緩和するために自己着脱機構を導入する。本稿では,手関節の階層構造を考慮し,CNN,RNN,GCN演算子を使わずに,純粋な自己認識に基づく骨格型ジェスチャー認識のための効率的な階層型自己認識ネットワーク(HAN)を提案する。具体的には、関節型自己保持モジュールは指の空間的特徴を捉え、指型自己保持モジュールは手全体の特徴を集約するように設計されている。時間的特徴の観点からは、時間的自己アテンションモジュールを使用して指と手全体の時間的ダイナミクスを捉える。最後に、これらの機能はジェスチャ分類のための融合自己注意モジュールによって融合される。提案手法は,計算複雑性がはるかに低い3つのジェスチャ認識データセットにおいて,競合する結果が得られることを示す。

Previous methods for skeleton-based gesture recognition mostly arrange the skeleton sequence into a pseudo picture or spatial-temporal graph and apply deep Convolutional Neural Network (CNN) or Graph Convolutional Network (GCN) for feature extraction. Although achieving superior results, these methods have inherent limitations in dynamically capturing local features of interactive hand parts, and the computing efficiency still remains a serious issue. In this work, the self-attention mechanism is introduced to alleviate this problem. Considering the hierarchical structure of hand joints, we propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition, which is based on pure self-attention without any CNN, RNN or GCN operators. Specifically, the joint self-attention module is used to capture spatial features of fingers, the finger self-attention module is designed to aggregate features of the whole hand. In terms of temporal features, the temporal self-attention module is utilized to capture the temporal dynamics of the fingers and the entire hand. Finally, these features are fused by the fusion self-attention module for gesture classification. Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.

翻訳日:2021-06-29 00:20:57 公開日:2021-06-25
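The HAN abstract above describes attention applied first over the joints of each finger and then over the fingers of the hand. The following is a minimal sketch of that two-level idea using plain scaled dot-product self-attention. It is not the authors' architecture: there are no learned projections, no temporal or fusion modules, and the mean pooling used to summarize a finger is an assumption made only for illustration. All function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with queries = keys = values."""
    d = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)])
    return out


def hand_features(fingers):
    """fingers: list of fingers, each a list of joint feature vectors.

    Level 1: attention over the joints within each finger, then mean pooling
    (an assumption) to get one vector per finger.
    Level 2: attention over the per-finger vectors.
    """
    finger_vecs = []
    for joints in fingers:
        attended = self_attention(joints)
        d = len(attended[0])
        finger_vecs.append([sum(v[j] for v in attended) / len(attended) for j in range(d)])
    return self_attention(finger_vecs)
```

For a hand given as five fingers with four joints each, `hand_features` returns five vectors of the same dimensionality as the joint features.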

# (参考訳) SDS評価の長期ビデオ記録による抑うつの解釈

Interpreting Depression From Question-wise Long-term Video Recording of SDS Evaluation ( http://arxiv.org/abs/2106.13393v1 )

ライセンス: CC BY 4.0

Wanqing Xie, Lizhong Liang, Yao Lu, Chen Wang, Jihong Shen, Hui Luo, Xiaofeng Liu

(参考訳) SDS (Self-Rating Depression Scale) は, うつ病早期スクリーニングによく用いられている。しかし, コントロール不能な自己管理尺度は, 不合理に, 偏見的に答えることによって容易に影響を受け, 臨床医によるハミルトン抑うつ評価尺度 (HDRS) と最終診断で異なる結果が得られた。臨床では, 顔面表情(FE)と行動は, 臨床医による評価において重要な役割を担っている。本研究では,200人の被験者を対象とした新しいデータセットを収集し,自己評価アンケートの有効性を示す。 SDS評価とペアビデオからうつ病を自動的に解釈するために,質問票結果と回答時間にも配慮した,長期可変長ビデオのエンドツーエンド階層化フレームワークを提案する。具体的には,局所的時間的パターン探索に3D CNNを利用する階層モデルと,疑わしいグローバルな特徴集約のための冗長性を考慮した自己認識(RAS)方式を用いる。冗長なfeビデオ処理をターゲットとしたrasは,質問集合内の各ビデオクリップの相関を効果的に活用し,識別情報を強調し,特徴対の親和性に基づく冗長性を解消する。そして、質問側の映像特徴とアンケートスコアとを連結して最終抑うつ検出を行う。また,SDS評価とその映像記録の有効性,および従来の最先端の時間的モデリング手法に対するフレームワークの優位性も明らかにした。

The Self-Rating Depression Scale (SDS) questionnaire has frequently been used for efficient preliminary depression screening. However, this uncontrolled self-administered measure can easily be affected by careless or deceptive answers, producing results that differ from the clinician-administered Hamilton Depression Rating Scale (HDRS) and from the final diagnosis. Clinically, facial expression (FE) and actions play a vital role in clinician-administered evaluation, while FE and actions are underexplored for self-administered evaluations. In this work, we collect a novel dataset of 200 subjects to evidence the validity of self-rating questionnaires with their corresponding question-wise video recording. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for the long-term variable-length video, which is also conditioned on the questionnaire results and the answering time. Specifically, we resort to a hierarchical model which utilizes a 3D CNN for local temporal pattern exploration and a redundancy-aware self-attention (RAS) scheme for question-wise global feature aggregation. Targeting redundant long-term FE video processing, our RAS is able to effectively exploit the correlations of each video clip within a question set to emphasize the discriminative information and eliminate the redundancy based on feature pair-wise affinity. Then, the question-wise video feature is concatenated with the questionnaire scores for final depression detection. Our thorough evaluations also show the validity of fusing the SDS evaluation and its video recording, and the superiority of our framework to the conventional state-of-the-art temporal modeling methods.

翻訳日:2021-06-28 23:45:20 公開日:2021-06-25

# (参考訳) 逆算の例:入力変換と雑音学習の組み合わせ

Countering Adversarial Examples: Combining Input Transformation and Noisy Training ( http://arxiv.org/abs/2106.13394v1 )

ライセンス: CC BY 4.0

Cheng Zhang, Pan Gao

(参考訳) 近年の研究では、ニューラルネットワーク(nn)ベースの画像分類器は、セキュリティに敏感な画像認識タスクの脅威となる敵の例に非常に脆弱であることが示されている。これまでの研究では、JPEG圧縮はある程度の逆例の分類精度の低下に対処できることを示した。しかし、圧縮比が大きくなるにつれて、従来のJPEG圧縮はこれらの攻撃を防御するには不十分である。本稿では,逆方向の摂動を完全にフィルタリングすることを目的として,NNに好適な従来のJPEG圧縮アルゴリズムを改良する。具体的には,周波数係数の解析に基づいて圧縮のためのnn-favored quantization tableを設計する。データ拡張戦略として圧縮を考えると、モデルに依存しない前処理とノイズの多いトレーニングを組み合わせる。異なる圧縮レベルで符号化された画像を用いてトレーニングすることにより,事前学習したモデルを微調整し,複数の分類器を生成する。最後に、低(高)圧縮比は摂動と原特徴をわずかに除去できるので、モデルアンサンブルのためにこれらの訓練された複数のモデルを使用する。モデルのアンサンブルの大多数の投票は最終予測として採用される。実験の結果,本手法はオリジナル精度を維持しつつ防御効率を向上させることができた。

Recent studies have shown that neural network (NN) based image classifiers are highly vulnerable to adversarial examples, which poses a threat to security-sensitive image recognition tasks. Prior work has shown that JPEG compression can combat the drop in classification accuracy on adversarial examples to some extent. However, as the compression ratio increases, traditional JPEG compression is insufficient to defend against those attacks and can cause an abrupt accuracy decline on benign images. In this paper, with the aim of fully filtering the adversarial perturbations, we first modify the traditional JPEG compression algorithm to make it more favorable for NNs. Specifically, based on an analysis of the frequency coefficients, we design an NN-favored quantization table for compression. Considering compression as a data augmentation strategy, we then combine our model-agnostic preprocessing with noisy training. We fine-tune the pre-trained model by training with images encoded at different compression levels, thus generating multiple classifiers. Finally, since a lower (higher) compression ratio can remove both perturbations and original features slightly (aggressively), we use these multiple trained models for model ensemble. The majority vote of the ensemble of models is adopted as the final prediction. Experimental results show our method can improve defense efficiency while maintaining original accuracy.

翻訳日:2021-06-28 23:23:48 公開日:2021-06-25
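The abstract above ends with a majority vote over classifiers fine-tuned at different compression levels. The following is a minimal sketch of that voting step only. The models and preprocessors are stand-in callables, no actual JPEG compression or quantization table is implemented, and the tie-breaking rule (the earliest ensemble member wins) is an assumption, since the abstract does not specify one.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the earliest member (assumption)."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def ensemble_predict(models, preprocessors, image):
    """Each model sees the image after its own preprocessing step.

    models and preprocessors are hypothetical callables, paired by position:
    for example, a compression step at one level and the classifier that was
    fine-tuned on images compressed at that level.
    """
    votes = [model(pre(image)) for model, pre in zip(models, preprocessors)]
    return majority_vote(votes)
```

Pairing each classifier with the preprocessing it was trained on keeps the input distribution at inference consistent with training, which is the motivation the abstract gives for training at several compression levels.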

# (参考訳) インテリジェントな自律ナビゲーションエージェントの構築

Building Intelligent Autonomous Navigation Agents ( http://arxiv.org/abs/2106.13415v1 )

ライセンス: CC BY 4.0

Devendra Singh Chaplot

(参考訳) 過去10年間の機械学習のブレークスルーは「デジタルインテリジェンス」、すなわち膨大なラベル付きデータから学習し、音声認識、顔認識、機械翻訳などのデジタルタスクを実行することができる機械学習モデルにつながった。この論文の目標は「物理知性」が可能なアルゴリズムの設計を前進させること、すなわち視覚知覚、自然言語理解、推論、計画、シーケンシャルな意思決定を含む、物理的な世界で複雑なナビゲーションタスクを実行できるインテリジェントな自律ナビゲーションエージェントを構築することである。過去数十年間の古典的ナビゲーション手法の進歩にもかかわらず、現在のナビゲーションエージェントは長期的な意味的ナビゲーションタスクで苦労している。論文の前半では,障害回避,意味認識,言語接地,推論といった課題に取り組むために,エンドツーエンドの強化学習を用いた短期ナビゲーションについて論じる。第2部では,モジュール型学習と構造化された明示的地図表現に基づく新しいナビゲーション手法について紹介する。これらの手法は, ローカライゼーション, マッピング, 長期計画, 探索, セマンティック事前学習といった課題に効果的に対処できることを示す。これらのモジュール型学習手法は,長期的空間的・意味的理解と,様々なナビゲーションタスクにおける最先端の成果を達成することができる。

Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.

翻訳日:2021-06-28 23:09:36 公開日:2021-06-25

# (参考訳) ドメイン名と関連する時間的特性を用いたブロックチェーン内の悪意のあるアカウントの識別

Identifying malicious accounts in Blockchains using Domain Names and associated temporal properties ( http://arxiv.org/abs/2106.13420v1 )

ライセンス: CC BY 4.0

Rohit Kumar Sachan, Rachit Agarwal, Sandeep Kumar Shukla

(参考訳) ブロックチェーン技術の普及は、サイバー犯罪者による違法行為の増加につながり、何十億ドルもの費用がかかっている。このような不正行為を検出するために、多くの機械学習アルゴリズムが適用される。これらのアルゴリズムは、しばしばトランザクションの振る舞いに基づいて訓練され、場合によっては、システムに存在する脆弱性について訓練される。このアプローチでは、ブロックチェーン内のアカウントに関連付けられたドメイン名(DN)などのメタデータを使用することで、アカウントに悪意のあるタグ付けをすべきかどうかを判定する。ここでは、DNに付随する時間的側面を活用する。その結果,144930個のDNを同定し,そのうち54114個のDNは時間とともに永続的な悪意を示すことがわかった。それにもかかわらず、新たにタグ付けされた悪意のあるブロックチェーンDNには、これらの悪意のあるDNが報告されていない。

The rise in the adoption of blockchain technology has led to increased illegal activities by cyber-criminals costing billions of dollars. Many machine learning algorithms are applied to detect such illegal behavior. These algorithms are often trained on the transaction behavior and, in some cases, trained on the vulnerabilities that exist in the system. In our approach, we study the feasibility of using metadata such as Domain Name (DN) associated with the account in the blockchain and identify whether an account should be tagged malicious or not. Here, we leverage the temporal aspects attached to the DNs. Our results identify 144930 DNs that show malicious behavior, and out of these, 54114 DNs show persistent malicious behavior over time. Nonetheless, none of these identified malicious DNs were reported in new officially tagged malicious blockchain DNs.

翻訳日:2021-06-28 23:07:58 公開日:2021-06-25

# (参考訳) 不正スマートコントラクトの脆弱性とトランザクション行動に基づく検出

Vulnerability and Transaction behavior based detection of Malicious Smart Contracts ( http://arxiv.org/abs/2106.13422v1 )

ライセンス: CC BY 4.0

Rachit Agarwal, Tanmay Thapliyal, Sandeep Kumar Shukla

(参考訳) ethereumのsmart contracts(scs)はタスクを自動化し、ユーザにさまざまな機能を提供する。このような自動化は、SCが書かれたプログラミング言語(Solidity)の'Turing-complete'の性質によって実現される。これはまた、悪意あるアクターが暗号通貨プラットフォーム上で悪意あるまたは違法なアクティビティを実行するために悪用する、SCのさまざまな脆弱性とバグを開放する。本研究では,悪質な活動とscsに存在する脆弱性の相関関係を調べ,悪質な活動が特定の種類の脆弱性と相関していることを見いだす。次に、SCの脆弱性の重大度に対応するスコアリング機構の実現可能性について検討し、不審なSCの特定に関連性があるかどうかを判断する。非教師付き機械学習(ml)アルゴリズムを用いて,不審なscsの検出に向けた重大度スコアの有用性を分析し,行動変化の同定を行う。オンチェーンSCを用いた実験では、さまざまな粒度にわたる合計1094個の良性SCが、悪意のあるSCと同じような振る舞いをしており、機能セットにスマートコントラクトの脆弱性スコアが組み込まれています。

Smart Contracts (SCs) in Ethereum can automate tasks and provide different functionalities to a user. Such automation is enabled by the `Turing-complete' nature of the programming language (Solidity) in which SCs are written. This also opens up different vulnerabilities and bugs in SCs that malicious actors exploit to carry out malicious or illegal activities on the cryptocurrency platform. In this work, we study the correlation between malicious activities and the vulnerabilities present in SCs and find that some malicious activities are correlated with certain types of vulnerabilities. We then develop and study the feasibility of a scoring mechanism that corresponds to the severity of the vulnerabilities present in SCs to determine if it is a relevant feature to identify suspicious SCs. We analyze the utility of severity score towards detection of suspicious SCs using unsupervised machine learning (ML) algorithms across different temporal granularities and identify behavioral changes. In our experiments with on-chain SCs, we were able to find a total of 1094 benign SCs across different granularities which behave similar to malicious SCs, with the inclusion of the smart contract vulnerability scores in the feature set.

翻訳日:2021-06-28 22:48:16 公開日:2021-06-25

# (参考訳) 強化学習問題としての分岐予測 : なぜ, 方法, 事例研究

Branch Prediction as a Reinforcement Learning Problem: Why, How and Case Studies ( http://arxiv.org/abs/2106.13429v1 )

ライセンス: CC BY 4.0

Anastasios Zouzias, Kleovoulos Kalaitzidis and Boris Grot

(参考訳) 近年、分岐予測器(BP)の有効性が停滞し、分岐予測器の設計における新しいアイデアが失われ、この分野における新しい思考が求められている。本稿では,Reinforcement Learning(RL)の観点からBPを考察することにより,BP設計の体系的推論と探索を容易にする。本稿では、分岐予測器にRLの定式化を適用し、この定式化で既存の予測器を簡潔に表現できることを示し、従来のBPの2つのRLに基づく変種について検討する。

Recent years have seen stagnating improvements to branch predictor (BP) efficacy and a dearth of fresh ideas in branch predictor design, calling for fresh thinking in this area. This paper argues that looking at BP from the viewpoint of Reinforcement Learning (RL) facilitates systematic reasoning about, and exploration of, BP designs. We describe how to apply the RL formulation to branch predictors, show that existing predictors can be succinctly expressed in this formulation, and study two RL-based variants of conventional BPs.

翻訳日:2021-06-28 22:30:48 公開日:2021-06-25
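The abstract above states that existing predictors can be expressed in a reinforcement learning formulation. As an illustration only, the following sketch writes the classic two-bit saturating-counter predictor as an agent loop: observe the branch address, act (predict taken or not taken), receive a reward, and update. The mapping of state, action, and reward shown here is an assumption made for this example and is not taken from the paper. The class name, the table size, and the +1/-1 reward are likewise hypothetical.

```python
class TwoBitPredictor:
    """Table of two-bit saturating counters indexed by low bits of the branch PC."""

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        # Every counter starts at 1, i.e. "weakly not taken".
        self.counters = [1] * (1 << table_bits)

    def act(self, pc):
        """Action: predict taken when the counter is in its upper half."""
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        """Move the counter toward the observed outcome, saturating at 0 and 3."""
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run(predictor, trace):
    """Replay (pc, taken) pairs; reward is +1 per correct prediction, -1 otherwise."""
    total_reward = 0
    for pc, taken in trace:
        guess = predictor.act(pc)
        total_reward += 1 if guess == taken else -1
        predictor.update(pc, taken)
    return total_reward
```

On a trace where one branch is taken ten times in a row, the predictor mispredicts only the first occurrence, so `run` returns 8.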

# (参考訳) 限られた数の学習サンプルを用いたハイブリッドモデルと学習モデルによる分類手法

A hybrid model-based and learning-based approach for classification using limited number of training samples ( http://arxiv.org/abs/2106.13436v1 )

ライセンス: CC BY 4.0

Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam

(参考訳) 限られた数のトレーニングデータサンプルが与えられた分類の基本的なタスクは、既知のパラメトリック統計モデルを持つ物理システムである。独立した学習ベースおよび統計モデルベース分類器は、小さなトレーニングセットを用いた分類タスクの実現に向けて大きな課題に直面している。具体的には、物理に基づく統計モデルにのみ依存する分類器は、基礎となる観測不可能なパラメータを適切に調整できないため、システムの振舞いが不一致となる。一方、学習ベースの分類器は通常、基礎となる物理的プロセスからの大量のトレーニングデータに依存しており、ほとんどの現実的なシナリオでは実現できないかもしれない。本稿では,物理ベースの統計モデルと学習に基づく分類器の両方を利用するハイブリッド分類法であるhyphylearnを提案する。提案手法は,HyPhyLearnが学習ベースおよび統計モデルに基づく分類器の個人的アプローチに関わる課題を,それぞれの強みを融合することによって緩和する,という予想に基づいている。提案手法は,まず利用可能な(最適でない)統計的推定手法を用いて観測不可能なモデルパラメータを推定し,次いで物理に基づく統計モデルを用いて合成データを生成する。次に、ニューラルネットワークのドメイン対逆トレーニングに基づく学習ベース分類器に、トレーニングデータサンプルを合成データに組み込む。具体的には、ミスマッチ問題に対処するために、分類器は、トレーニングデータと合成データとから共通の特徴空間へのマッピングを学習する。同時に、分類器は、分類タスクを満たすために、この空間内で識別的特徴を見つけるように訓練される。

The fundamental task of classification given a limited number of training data samples is considered for physical systems with known parametric statistical models. The standalone learning-based and statistical model-based classifiers face major challenges towards the fulfillment of the classification task using a small training set. Specifically, classifiers that solely rely on the physics-based statistical models usually suffer from their inability to properly tune the underlying unobservable parameters, which leads to a mismatched representation of the system's behaviors. Learning-based classifiers, on the other hand, typically rely on a large number of training data from the underlying physical process, which might not be feasible in most practical scenarios. In this paper, a hybrid classification method -- termed HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers. The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers by fusing their respective strengths. The proposed hybrid approach first estimates the unobservable model parameters using the available (suboptimal) statistical estimation procedures, and subsequently uses the physics-based statistical models to generate synthetic data. Then, the training data samples are incorporated with the synthetic data in a learning-based classifier that is based on domain-adversarial training of neural networks. Specifically, in order to address the mismatch problem, the classifier learns a mapping from the training data and the synthetic data to a common feature space. Simultaneously, the classifier is trained to find discriminative features within this space in order to fulfill the classification task.

翻訳日:2021-06-28 22:19:07 公開日:2021-06-25

# (参考訳) 絵は、視覚的な質問に答えるために100語分の価値があるかもしれない

A Picture May Be Worth a Hundred Words for Visual Question Answering ( http://arxiv.org/abs/2106.13445v1 )

ライセンス: CC BY 4.0

Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye

(参考訳) 写真を理解するためのテキスト表現はどこまでできるのか? 画像理解では、簡潔だが詳細な画像表現を使うことが不可欠である。より高速なR-CNNのような視覚モデルによって抽出された深い視覚的特徴は、複数のタスク、特に視覚的質問応答(VQA)で広く使われている。しかし、従来の深い視覚的特徴は、人間のように画像内のすべての詳細を伝えるのに苦労するかもしれない。一方、最近の言語モデルの進歩により、記述テキストはこの問題の代替となるかもしれない。本稿では,VQAの特定の文脈における画像理解のためのテキスト表現の有効性について検討する。本稿では,記述・質問対を入力として,言語のみのトランスフォーマーモデルに導入し,プロセスと計算コストを単純化することを提案する。また、トレーニングセットの多様性を高め、統計的バイアスの学習を避けるために、データ拡張手法も実験した。大規模な評価では、VQA 2.0とVQA-CP v2の両方の深い視覚的特徴と競合するために、テキスト表現は100語程度しか必要としない。

How far can we go with textual representations for understanding pictures? In image understanding, it is essential to use concise but detailed image representations. Deep visual features extracted by vision models, such as Faster R-CNN, are widely used in multiple tasks, and especially in visual question answering (VQA). However, conventional deep visual features may struggle to convey all the details in an image as we humans do. Meanwhile, with recent language models' progress, descriptive text may be an alternative to this problem. This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA. We propose to take description-question pairs as input, instead of deep visual features, and feed them into a language-only Transformer model, simplifying the process and the computational cost. We also experiment with data augmentation techniques to increase the diversity in the training set and avoid learning statistical bias. Extensive evaluations have shown that textual representations require only about a hundred words to compete with deep visual features on both VQA 2.0 and VQA-CP v2.

翻訳日:2021-06-28 21:15:24 公開日:2021-06-25

# (参考訳) 深い解釈可能な刑事電荷予測とアルゴリズムバイアス

Deep Interpretable Criminal Charge Prediction and Algorithmic Bias ( http://arxiv.org/abs/2106.13456v1 )

ライセンス: CC BY 4.0

Abdul Rafae Khan, Jia Xu, Peter Varsanyi, Rachit Pabreja

(参考訳) 刑事司法制度における決定を補助する上で、予測的警察はますます一般的になっているが、これらの結果の使用はいまだに議論の余地がある。深層学習に基づくソフトウェアの中には精度(例えばF-1)に欠けるものもあるが、多くの意思決定プロセスは、人種、年齢、性別格差などの決定バイアスに疑念を生じさせるものではない。本稿では,20年以上の時間行動パターンを学習することで,過去の犯罪記録から将来の刑事訴追を受けるかという信頼性の高い予測を行うため,ポストホックな説明を伴うバイアス問題に対処する。 Bi-LSTMは、消失する勾配問題を緩和し、注意機構は特徴の重要性の学習と解釈を可能にする。提案手法は,実生活データセット上での予測精度とリコールの一貫性を示す。筆者らは,各入力特徴の重要性を分析し,犯罪履歴が統計的に重要な要因であるのに対して,人種,性別,年齢などの識別子はそうではないことを示唆した。最後に,我々のアルゴリズムは,犯罪の深刻度が時間とともに急激に上昇する傾向にあることを示す。

While predictive policing has become increasingly common in assisting with decisions in the criminal justice system, the use of these results is still controversial. Some software based on deep learning lacks accuracy (e.g., in F-1), and many decision processes are not transparent, causing doubt about decision bias, such as perceived racial, age, and gender disparities. This paper addresses bias issues with post-hoc explanations to provide a trustable prediction of whether a person will receive future criminal charges given one's previous criminal records by learning temporal behavior patterns over twenty years. Bi-LSTM relieves the vanishing gradient problem, and attention mechanisms allow learning and interpretation of feature importance. Our approach shows consistent and reliable prediction precision and recall on a real-life dataset. Our analysis of the importance of each input feature shows the critical causal impact on decision-making, suggesting that criminal histories are statistically significant factors, while identifiers, such as race, gender, and age, are not. Finally, our algorithm indicates that a suspect tends to gradually rather than suddenly increase crime severity level over time.

翻訳日:2021-06-28 20:56:53 公開日:2021-06-25

# (参考訳) adapt-and-distill: ドメインのための小さくて高速で効果的な事前学習言語モデルの開発

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains ( http://arxiv.org/abs/2106.13474v1 )

ライセンス: CC BY 4.0

Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei

(参考訳) 訓練済みの大きなモデルは多くの自然言語処理タスクで大きな成功を収めた。しかしながら、特定のドメインに適用されると、これらのモデルはドメインシフトに悩まされ、レイテンシとキャパシティの制約に対して、微調整とオンラインサービスに課題をもたらす。本稿では、特定の領域に対して、小さくて高速で効果的な事前学習モデルを開発するための一般的なアプローチを提案する。これは、既成の一般訓練モデルに適応し、ターゲットドメインでタスク非依存の知識蒸留を行うことによって達成される。具体的には,適応段階におけるドメイン固有語彙拡張を提案し,コーパスレベル発生確率を用いてインクリメンタル語彙のサイズを自動的に選択する。そこで我々は,特定の領域に対する大規模事前学習モデルを圧縮するための様々な戦略を体系的に検討する。我々は生物医学とコンピュータ科学の領域で実験を行う。実験の結果、ドメイン固有タスクにおけるbertベースモデルよりもパフォーマンスが向上し、bertベースより3.3倍小さく5.1倍高速になった。コードと事前学習されたモデルはhttps://aka.ms/adalmで入手できる。

Large pre-trained models have achieved great success in many natural language processing tasks. However, when they are applied in specific domains, these models suffer from domain shift and bring challenges in fine-tuning and online serving for latency and capacity constraints. In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains. This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains. Specifically, we propose domain-specific vocabulary expansion in the adaptation stage and employ corpus level occurrence probability to choose the size of incremental vocabulary automatically. Then we systematically explore different strategies to compress the large pre-trained models for specific domains. We conduct our experiments in the biomedical and computer science domain. The experimental results demonstrate that our approach achieves better performance over the BERT BASE model in domain-specific tasks while 3.3x smaller and 5.1x faster than BERT BASE. The code and pre-trained models are available at https://aka.ms/adalm.

翻訳日:2021-06-28 20:51:03 公開日:2021-06-25

# (参考訳) エネルギー予測のための機械学習の限界:ASHRAE Great Energy Predictor III Kaggle 競合誤差解析

Limitations of machine learning for building energy prediction: ASHRAE Great Energy Predictor III Kaggle competition error analysis ( http://arxiv.org/abs/2106.13475v1 )

ライセンス: CC BY 4.0

Clayton Miller, Bianca Picchetti, Chun Fu, Jovan Pantelic

(参考訳) 近年、エネルギー予測を構築するための機械学習が人気を博しているが、その限界と改善の可能性を理解していない。 ASHRAE Great Energy Predictor III (GEPIII) Kaggleコンペティションは、39,403件の予測を提出した4,370人の参加者による建築エネルギーメーター機械学習コンペティションである。テストデータには、時間給電、2年分の給湯、冷水、および16カ所の1,448棟の建物で2,380メートルの蒸気読み取りが含まれていた。本稿では,コンペティションのトップ50ソリューションの集約から残留モデルエラーの各種発生源と種類を分析した。この分析は、過去のメーター、天気、基本的な建築メタデータの標準モデル入力を用いた機械学習の限界を明らかにする。エラーの種類は、各インスタンスで発生した時間誤差の量、突然の振る舞いと漸進的な振る舞い、エラーの大きさ、エラーが1つの建物または複数の建物に一度に存在するかどうかによって分類される。結果は、機械学習モデルがテストデータの79.1%の許容範囲内でエラーを持っていることを示している。低等級のモデルエラーはテストデータの16.1%で発生する。これらの相違は、機械学習における追加のトレーニングデータソースやイノベーションによって対処される可能性がある。高次の誤差はテストデータの4.8%で発生し、イノベーションに関係なく正確に予測されることはない。エネルギーメータータイプ(電気予測モデルはテストデータの10%未満で許容できないエラーを持ち、温水は60%以上である)と使用タイプ(公共サービスでは14%未満、技術/科学では46%以上である)によって、エラーの振る舞いは様々である。

Machine learning for building energy prediction has exploded in popularity in recent years, yet an understanding of its limitations and potential for improvement is lacking. The ASHRAE Great Energy Predictor III (GEPIII) Kaggle competition was the largest building energy meter machine learning competition ever held with 4,370 participants who submitted 39,403 predictions. The test data set included two years of hourly electricity, hot water, chilled water, and steam readings from 2,380 meters in 1,448 buildings at 16 locations. This paper analyzes the various sources and types of residual model error from an aggregation of the competition's top 50 solutions. This analysis reveals the limitations for machine learning using the standard model inputs of historical meter, weather, and basic building metadata. The types of error are classified according to the amount of time errors occur in each instance, abrupt versus gradual behavior, the magnitude of error, and whether the error existed on single buildings or several buildings at once from a single location. The results show machine learning models have errors within a range of acceptability on 79.1% of the test data. Lower magnitude model errors occur in 16.1% of the test data. These discrepancies can likely be addressed through additional training data sources or innovations in machine learning. Higher magnitude errors occur in 4.8% of the test data and are unlikely to be accurately predicted regardless of innovation. There is a diversity of error behavior depending on the energy meter type (electricity prediction models have unacceptable error in under 10% of test data, while hot water is over 60%) and building use type (public service less than 14%, while technology/science is just over 46%).

翻訳日:2021-06-28 20:39:06 公開日:2021-06-25

# (参考訳) グラフパターン損失に基づくクロスモーダル検索のための分散注意ネットワーク

Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval ( http://arxiv.org/abs/2106.13552v1 )

ライセンス: CC BY 4.0

Xueying Chen, Rong Zhang, Yibing Zhan

(参考訳) クロスモーダル検索は、画像、ビデオ、テキスト、オーディオなどのマルチメディアデータを組み合わせることで、柔軟な検索エクスペリエンスを実現することを目的としている。教師なしアプローチのコアの1つは、異なるオブジェクト表現間の相関を掘り下げて、高価なラベルを必要とせずに完全な検索性能を達成することである。本稿では,表現間の相関関係を深く解析するために,教師なしクロスモーダル検索のためのグラフパターン損失に基づく分散注意ネットワーク(GPLDAN)を提案する。まず、インスタンスの複数の表現を生成するために異なる表現間の相互作用を考慮し、多様な注目機能プロジェクタを提案する。そこで我々は,異なる表現間の相関関係を探索するために,新しいグラフパターンの損失を設計する。さらに、融合前に対応する特徴のモダリティを明示的に宣言するためにモダリティ分類器を追加し、ネットワークを誘導して識別能力を高める。 GPLDANを4つの公開データセットでテストする。最先端のクロスモーダル検索手法と比較して,GPLDANの性能と競争性を示す実験結果が得られた。

Cross-modal retrieval aims to enable a flexible retrieval experience by combining multimedia data such as image, video, text, and audio. One core of unsupervised approaches is to dig the correlations among different object representations to achieve satisfactory retrieval performance without requiring expensive labels. In this paper, we propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval to deeply analyze correlations among representations. First, we propose a diversified attention feature projector by considering the interaction between different representations to generate multiple representations of an instance. Then, we design a novel graph pattern loss to explore the correlations among different representations; in this graph, all possible distances between different representations are considered. In addition, a modality classifier is added to explicitly declare the corresponding modalities of features before fusion and guide the network to enhance discrimination ability. We test GPLDAN on four public datasets. Compared with the state-of-the-art cross-modal retrieval methods, the experimental results demonstrate the performance and competitiveness of GPLDAN.

翻訳日:2021-06-28 20:18:26 公開日:2021-06-25

# (参考訳) 文脈における単語の意味表現の探索:ホモニミーとシンノミーを事例として

Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy ( http://arxiv.org/abs/2106.13553v1 )

ライセンス: CC BY 4.0

Marcos Garcia

(参考訳) 本稿では,文脈における単語の意味表現の多言語的研究について述べる。我々は,静的モデルと文脈モデルの両方が,同義語や同義語などの語彙関係を適切に表現できる能力を評価する。そこで我々は,周囲の文脈の影響や単語間の重なりなど,複数の要因の制御された評価を行い,同一あるいは異なる感覚を伝達できる,新たな多言語データセットを作成した。 4つのシナリオに関する体系的な評価は、トランスフォーマーに基づく最良の単言語モデルが文脈におけるホモニムを適切に曖昧化することができることを示している。しかし、これらのモデルは文脈に大きく依存しているため、類似した文で発生する異なる感覚の単語を表現できない。ガリシア語、ポルトガル語、英語、スペイン語で実験が行われ、データセット(3000以上の評価項目を含む)と新しいモデルの両方がこの研究で自由にリリースされる。

This paper presents a multilingual study of word meaning representations in context. We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations, such as homonymy and synonymy. To do so, we created a new multilingual dataset that allows us to perform a controlled evaluation of several factors such as the impact of the surrounding context or the overlap between words, conveying the same or different senses. A systematic assessment on four scenarios shows that the best monolingual models based on Transformers can adequately disambiguate homonyms in context. However, as they rely heavily on context, these models fail at representing words with different senses when occurring in similar sentences. Experiments are performed in Galician, Portuguese, English, and Spanish, and both the dataset (with more than 3,000 evaluation items) and new models are freely released with this study.

翻訳日:2021-06-28 20:08:27 公開日:2021-06-25

# (参考訳) srpn:組織画像における核および細胞検出のための類似性に基づく領域提案ネットワーク

SRPN: similarity-based region proposal networks for nuclei and cells detection in histology images ( http://arxiv.org/abs/2106.13556v1 )

ライセンス: CC BY 4.0

Yibao Sun, Xingru Huang, Huiyu Zhou, Qianni Zhang

(参考訳) 組織像中の核と細胞の検出は臨床と病理学的研究の両方において非常に有用である。しかし, 原子核や細胞の形態変化などの複数の理由から, 従来の物体検出法では良好な性能が得られない課題となっている。検出タスクは2つのサブタスク、分類とローカライゼーションで構成される。密度の高い物体検出条件下では、分類は検出性能を高める鍵となる。そこで本研究では,核・細胞検出のための類似性に基づく領域提案ネットワーク(SRPN)を提案する。特に、組み込み層と呼ばれるカスタマイズされた畳み込み層は、ネットワーク構築のために設計されている。埋め込み層がリージョン提案ネットワークに追加され、類似性学習に基づいて識別的特徴を学習することができる。類似学習によって得られる特徴は,従来の手法に比べて分類性能を著しく向上させることができる。 SRPNは、Faster R-CNNやRetinaNetのような標準の畳み込みニューラルネットワークアーキテクチャに容易に統合できる。組織像における多臓器核検出とシグナレットリング細胞検出の課題について,提案手法を検証した。実験の結果,類似性学習を施したネットワークは,両タスクにおいて,両課題とも同等の性能を得た。特に,提案したSRPNは,従来手法と比較して核分割と検出のためのMoNuSegベンチマーク,およびベースラインと比較した場合のシグレットリング細胞検出ベンチマークにおいて,最先端性能を実現している。ソースコードはhttps://github.com/sigma10010/nuclei_cells_detで公開されている。

The detection of nuclei and cells in histology images is of great value in both clinical practice and pathological studies. However, multiple factors, such as morphological variations of nuclei or cells, make it a challenging task on which conventional object detection methods often fail to achieve satisfactory performance. A detection task consists of two sub-tasks, classification and localization. In dense object detection, classification is key to boosting detection performance. Considering this, we propose similarity-based region proposal networks (SRPN) for nuclei and cell detection in histology images. In particular, a customized convolution layer, termed the embedding layer, is designed for network building. The embedding layer is added to the region proposal networks, enabling them to learn discriminative features based on similarity learning. Features obtained by similarity learning can significantly boost classification performance compared to conventional methods. SRPN can be easily integrated into standard convolutional neural network architectures such as Faster R-CNN and RetinaNet. We test the proposed approach on the tasks of multi-organ nuclei detection and signet ring cell detection in histological images. Experimental results show that networks applying similarity learning achieve superior performance on both tasks compared to their counterparts. In particular, the proposed SRPN achieves state-of-the-art performance on the MoNuSeg benchmark for nuclei segmentation and detection when compared to previous methods, and on the signet ring cell detection benchmark when compared with baselines. The source code is publicly available at: https://github.com/sigma10010/nuclei_cells_det.

翻訳日:2021-06-28 18:59:51 公開日:2021-06-25

# (参考訳) HEVC画面コンテンツ符号化によるマルチビュー映像圧縮

Multiview Video Compression Using Advanced HEVC Screen Content Coding ( http://arxiv.org/abs/2106.13574v1 )

ライセンス: CC BY 4.0

Jarosław Samelak, Marek Domański

(参考訳) 本稿では,スクリーンコンテンツ符号化を用いたマルチビュー映像符号化手法を提案する。一瞬の間、すべてのビューに対応するフレームが単一のフレームに詰め込まれていると仮定される。マルチビュー符号化へのフレーム互換アプローチが適用される。このようなコーディングシナリオに対して,マルチビュー映像符号化にスクリーンコンテンツ符号化が有効であることを示す。 2つのアプローチが検討されている: 1つは標準hevcスクリーンコンテンツコーディング、もう1つは高度なスクリーンコンテンツコーディングである。後者は、HEVCスクリーンコンテンツ符号化の4分の1のモーションベクトルや他の非標準拡張を利用する著者の原案である。実験結果から,標準的なHEVC画面コンテンツ符号化を用いたマルチビュー映像符号化の方が,HEVC符号化のシミュレートよりもはるかに効率的であることが示された。提案したAdvanced Screen Content Codingは、最先端のマルチビュービデオ圧縮技術であるMV-HEVCとほぼ同等の符号化効率を提供する。著者らは、新しいVersatile Video Coding(VVC)技術で、Advanced Screen Content Codingを効率的に利用できることを示唆している。しかしながら、vvcの参照マルチビュー拡張はまだ存在しないため、vvcベースのコーディングでは、将来の作業のために実験的な比較が残されている。

The paper presents a new approach to multiview video coding using Screen Content Coding. It is assumed that, for a given time instant, the frames corresponding to all views are packed into a single frame, i.e. the frame-compatible approach to multiview coding is applied. For such a coding scenario, the paper demonstrates that Screen Content Coding can be efficiently used for multiview video coding. Two approaches are considered: the first using standard HEVC Screen Content Coding, and the second using Advanced Screen Content Coding. The latter is the original proposal of the authors that exploits quarter-pel motion vectors and other nonstandard extensions of HEVC Screen Content Coding. The experimental results demonstrate that multiview video coding, even using standard HEVC Screen Content Coding, is much more efficient than simulcast HEVC coding. The proposed Advanced Screen Content Coding provides virtually the same coding efficiency as MV-HEVC, which is the state-of-the-art multiview video compression technique. The authors suggest that Advanced Screen Content Coding can be efficiently used within the new Versatile Video Coding (VVC) technology. Nevertheless, a reference multiview extension of VVC does not exist yet; therefore, for VVC-based coding, the experimental comparisons are left for future work.
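
The frame-compatible packing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a side-by-side layout and represents each view as a list of pixel rows; the abstract does not say which packing arrangement the authors use, and the function name `pack_side_by_side` is purely illustrative.

```python
def pack_side_by_side(views):
    """Pack same-sized views (each a list of pixel rows) into one wide frame.

    Assumption: side-by-side layout; the paper may use a different arrangement.
    """
    if not views:
        raise ValueError("at least one view is required")
    height = len(views[0])
    width = len(views[0][0]) if height else 0
    for view in views:
        if len(view) != height or any(len(row) != width for row in view):
            raise ValueError("all views must share the same dimensions")
    # Row r of the packed frame is row r of every view, concatenated left to right.
    return [sum((view[r] for view in views), []) for r in range(height)]


left = [[1, 2], [3, 4]]
right = [[5, 6], [7, 8]]
packed = pack_side_by_side([left, right])
print(packed)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

Once the views share one frame, a neighbouring view becomes ordinary image content inside the same picture, which is presumably why tools designed for screen content can exploit the inter-view redundancy.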

翻訳日:2021-06-28 18:34:25 公開日:2021-06-25

# (参考訳) 遺伝的アルゴリズムを用いた段階的議論フレームワークの学習

Learning Gradual Argumentation Frameworks using Genetic Algorithms ( http://arxiv.org/abs/2106.13585v1 )

ライセンス: CC BY 4.0

Jonathan Spieler, Nico Potyka, Steffen Staab

(参考訳) グラフィカルな議論フレームワークは、重み付きグラフで引数とその関係を表現する。彼らのグラフィカルな構造と直感的な意味論は、機械学習を解釈するための潜在的に興味深いツールとなる。近年、そのメカニズムはニューラルネットワークと密接に関連しており、標準のディープラーニングフレームワークによってデータから重み付けを学習することができる。最初の概念実証として,議論型分類モデルの構造を同時に学習する遺伝的アルゴリズムを提案する。良好に解釈可能なモデルを得るには、適合関数は分類器のスパースネスと精度のバランスをとる。提案アルゴリズムについて考察し,UCI機械学習レポジトリの標準ベンチマークに関する最初の実験結果を示す。本プロトタイプでは,学習性能と解釈可能性の観点から,決定木に匹敵する議論的分類モデルを学習する。

Gradual argumentation frameworks represent arguments and their relationships in a weighted graph. Their graphical structure and intuitive semantics make them a potentially interesting tool for interpretable machine learning. It has been noted recently that their mechanics are closely related to neural networks, which allows learning their weights from data by standard deep learning frameworks. As a first proof of concept, we propose a genetic algorithm to simultaneously learn the structure of argumentative classification models. To obtain a well-interpretable model, the fitness function balances sparseness and accuracy of the classifier. We discuss our algorithm and present first experimental results on standard benchmarks from the UCI machine learning repository. Our prototype learns argumentative classification models that are comparable to decision trees in terms of learning performance and interpretability.
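
As a rough illustration of a fitness function that balances sparseness and accuracy, the sketch below uses a convex combination of the two. The weighting scheme, the edge-count measure of sparseness, and the parameter name `sparsity_weight` are assumptions made for illustration; the abstract does not give the authors' actual fitness function.

```python
def fitness(accuracy, num_edges, max_edges, sparsity_weight=0.1):
    """Score a candidate graph: higher is better.

    Assumed form (not from the paper): a convex combination of classification
    accuracy and the fraction of possible edges that are absent.
    """
    if max_edges <= 0:
        raise ValueError("max_edges must be positive")
    if not 0.0 <= sparsity_weight <= 1.0:
        raise ValueError("sparsity_weight must lie in [0, 1]")
    sparseness = 1.0 - num_edges / max_edges
    return (1.0 - sparsity_weight) * accuracy + sparsity_weight * sparseness


# With equal accuracy, the sparser candidate receives the higher score.
dense = fitness(accuracy=0.9, num_edges=8, max_edges=10)
sparse = fitness(accuracy=0.9, num_edges=2, max_edges=10)
print(sparse > dense)  # True
```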

翻訳日:2021-06-28 18:23:39 公開日:2021-06-25

# (参考訳) 本物の熱画像と偽の熱画像を混ぜて、オブジェクト検出を改善する

Partially fake it till you make it: mixing real and fake thermal images for improved object detection ( http://arxiv.org/abs/2106.13603v1 )

ライセンス: CC BY-SA 4.0

Francesco Bongini, Lorenzo Berlincioni, Marco Bertini, Alberto Del Bimbo

(参考訳) 本稿では,学習データセットが乏しい視覚コンテンツ領域に対して,実シーンで合成された3Dオブジェクトを合成する新しいデータ拡張手法を提案する。熱画像における物体検出の文脈において, 提案システムの性能を示す。1) トレーニングデータセットは可視スペクトルデータセットと比較して非常に限られた領域であり, 2) シーンの素材の熱特性のモデル化が困難であるため, 完全なリアルな合成シーンの作成は非常に困難で費用がかかる。我々は,RL法を用いて得られた技術アプローチの状況,シミュレーションデータの注入,生成モデルの活用など,さまざまな拡張戦略を比較し,提案手法と他の手法を最大限に組み合わせる方法について検討する。実験結果から,我々のアプローチの有効性が示され,我々の単一モード検出装置はFLIR ADASデータセット上で最先端の成果を達成できる。

In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes. We show the performance of the proposed system in the context of object detection in thermal videos, a domain where 1) training datasets are very limited compared to visible spectrum datasets and 2) creating fully realistic synthetic scenes is extremely cumbersome and expensive due to the difficulty in modeling the thermal properties of the materials of the scene. We compare different augmentation strategies, including state-of-the-art approaches obtained through RL techniques, the injection of simulated data and the employment of a generative model, and study how to best combine our proposed augmentation with these other techniques. Experimental results demonstrate the effectiveness of our approach, and our single-modality detector achieves state-of-the-art results on the FLIR ADAS dataset.

翻訳日:2021-06-28 18:08:21 公開日:2021-06-25

# (参考訳) Chebyshev-Cantelli PAC-Bayes-Bennettの不等式

Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote ( http://arxiv.org/abs/2106.13624v1 )

ライセンス: CC BY 4.0

Yi-Shan Wu, Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin

(参考訳) 我々は、重み付けされた多数決のリスクを期待する2次託宣を新たに提示する。この境界は、チェビシェフ=カンテリの不等式(英語版)(a.k.a.\ one-sided chebyshev's)の新たなパラメトリック形式に基づいている。この新しい形式は、チェビシェフ・カンテリの不等式、C-バウンド [Germain et al., 2015] に基づく事前のオラクル境界が直面する最適化課題を解決し、同時に、マセゴサらによって導入された2階マルコフの不等式に基づくオラクル境界を改善する。 [2020]. また、私たちはpac-bayes-bennettの不等式を導出します。 PAC-Bayes-Bennettの不等式はセルディンらによってPAC-Bayes-Bernsteinの不等式を改善する。 [2012]. 我々は,Masegosaらによる新たな限界が改善できることを実証的に評価する。 [2020]. チェビシェフ・カンテッリ不等式のパラメトリック形式とPAC-ベイズ・ベネット不等式は、他の領域における測度集中の研究には独立した関心を持つかもしれない。

We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev-Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second order Markov's inequality introduced by Masegosa et al. [2020]. We also derive the PAC-Bayes-Bennett inequality, which we use for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality by Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work by Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains.
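
For reference, the classical Chebyshev-Cantelli (one-sided Chebyshev) inequality that the abstract builds on can be stated as follows. The symbols $Z$ and $\varepsilon$ are notation chosen here, and the paper's new parametric form is not reproduced, since the abstract does not state it.

```latex
% Classical Chebyshev-Cantelli inequality for a random variable Z with finite variance:
\Pr\bigl(Z - \mathbb{E}[Z] \ge \varepsilon\bigr)
  \;\le\; \frac{\operatorname{Var}[Z]}{\operatorname{Var}[Z] + \varepsilon^{2}},
  \qquad \varepsilon > 0 .
```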

翻訳日:2021-06-28 17:55:05 公開日:2021-06-25

# (参考訳) 言語モデルは優れた翻訳者です

Language Models are Good Translators ( http://arxiv.org/abs/2106.13627v1 )

ライセンス: CC BY 4.0

Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, Yang Liu

(参考訳) 近年、エンコーダ-デコーダアーキテクチャの中核であるニューラルネットワーク翻訳(NMT)が急速に進歩しているのを目撃している。機械翻訳における大規模事前学習言語モデルの限られたシナリオにおける最近の進歩に触発されて、我々はまず、単一の言語モデル(LM4MT)が標準機械翻訳ベンチマークにおける強力なエンコーダ・デコーダNMTモデルと同等の性能を達成できることを実証した。 LM4MTはソースサイドのテキストを簡単に追加の監視として利用することができる。同じメカニズムでソースとターゲットのテキストをモデリングするが、LM4MTはソースとターゲットの文の両方に統一表現を提供し、言語間で知識を伝達する。ピボットベースおよびゼロショット変換タスクの広範囲な実験により、LM4MTはエンコーダ・デコーダNMTモデルよりも大きなマージンで優れていることが示された。

Recent years have witnessed the rapid advance in neural machine translation (NMT), the core of which lies in the encoder-decoder architecture. Inspired by the recent progress of large-scale pre-trained language models on machine translation in a limited scenario, we firstly demonstrate that a single language model (LM4MT) can achieve comparable performance with strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and similar amount of model parameters. LM4MT can also easily utilize source-side texts as additional supervision. Though modeling the source- and target-language texts with the same mechanism, LM4MT can provide unified representations for both source and target sentences, which can better transfer knowledge across languages. Extensive experiments on pivot-based and zero-shot translation tasks show that LM4MT can outperform the encoder-decoder NMT model by a large margin.

翻訳日:2021-06-28 17:26:51 公開日:2021-06-25

# (参考訳) deeploc: ユビキタスな精度と低オーバヘッドな屋外セルローカライズシステム

DeepLoc: A Ubiquitous Accurate and Low-Overhead Outdoor Cellular Localization System ( http://arxiv.org/abs/2106.13632v1 )

ライセンス: CC BY 4.0

Ahmed Shokry, Marwan Torki, Moustafa Youssef

(参考訳) 近年,屋外位置情報サービスの普及が進んでいる。 gpsはユビキタスなローカライズシステムと考えられているが、ローエンドの携帯電話ではサポートされておらず、衛星への直接の視線を必要とする。本稿では,GPSライクな位置決め精度を限界なく獲得する深層学習型屋外位置決めシステムDeepLocを提案する。特にDeepLocは、モバイル端末が受信した異なるセルタワーから受信したユビキタスなセル信号を、それをローカライズするためのヒントとして利用する。そのため、異なるセルタワーから受信した信号強度情報をクラウドセンシングしたジオタグを用いて、利用者の位置を推定する深層モデルの訓練を行う。 deeploc設計の一環として,大規模領域へのデータ収集のスケールアップ,セル信号の固有ノイズ処理やジオタグデータ処理,低オーバヘッドのディープラーニングモデルに必要な十分なデータの提供など,多くの実用的な課題に対処するモジュールを導入する。私たちはさまざまなAndroidデバイスにDeepLocを実装しました。現実的な都市・農村環境の評価結果から、DeepLocは都市部では18.8m、農村部では15.7mの範囲で、中央値のローカライズ精度を達成できることが示された。この精度は、最先端のセルベースシステムよりも470%以上優れており、GPSと比較して330%の省電力が可能である。これはDeepLocがユビキタスで高精度かつ低オーバーヘッドなローカライゼーションシステムであることを示すものだ。

Recent years have witnessed fast growth in outdoor location-based services. While GPS is considered a ubiquitous localization system, it is not supported by low-end phones, requires direct line of sight to the satellites, and can drain the phone battery quickly. In this paper, we propose DeepLoc: a deep learning-based outdoor localization system that obtains GPS-like localization accuracy without its limitations. In particular, DeepLoc leverages the ubiquitous cellular signals received from the different cell towers heard by the mobile device as hints to localize it. To do that, crowd-sensed geo-tagged received signal strength information coming from different cell towers is used to train a deep model that is used to infer the user's position. As part of DeepLoc design, we introduce modules to address a number of practical challenges including scaling the data collection to large areas, handling the inherent noise in the cellular signal and geo-tagged data, as well as providing enough data that is required for deep learning models with low-overhead. We implemented DeepLoc on different Android devices. Evaluation results in realistic urban and rural environments show that DeepLoc can achieve a median localization accuracy within 18.8m in urban areas and within 15.7m in rural areas. This accuracy outperforms the state-of-the-art cellular-based systems by more than 470% and comes with 330% savings in power compared to the GPS. This highlights the promise of DeepLoc as a ubiquitous accurate and low-overhead localization system.

翻訳日:2021-06-28 17:11:39 公開日:2021-06-25

# (参考訳) 衝突依存報酬分布を有するマルチプレイヤーマルチアームバンディット

Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions ( http://arxiv.org/abs/2106.13669v1 )

ライセンス: CC BY 4.0

Chengshuai Shi, Cong Shen

(参考訳) 本研究では,腕に衝突した場合に報酬分布が変化する確率的マルチプレイヤーマルチアームバンディット問題(mp-mab)について検討した。既存の文献は常に、衝突が発生した場合、関連するプレイヤーにゼロ報酬を仮定するが、認知無線のような応用の場合、より現実的なシナリオは、衝突が平均報酬を減らし、必ずしもゼロにしないことである。我々は,プレイヤーが直接衝突を知覚しない,より実用的なno-sensing設定に着目し,暗黙的通信をノイズチャネル問題に対する信頼性の高い通信としてモデル化する誤り訂正衝突通信(ec3)アルゴリズムを提案する。最後に、コード長とデコードエラー率のトレードオフを最適化することは、自然の低い境界を表す集中的なMP-MABの後悔に近づくことを後悔させる。合成データと実世界のデータセットの両方における実用的な誤り訂正コードによる実験は、ec3の優位を示している。特に, コーディングスキームの選択が後悔のパフォーマンスに大きな影響を与えることが示された。

We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm. Existing literature always assumes a zero reward for involved players if collision happens, but for applications such as cognitive radio, the more realistic scenario is that collision reduces the mean reward but not necessarily to zero. We focus on the more practical no-sensing setting where players do not perceive collisions directly, and propose the Error-Correction Collision Communication (EC3) algorithm that models implicit communication as a reliable communication over noisy channel problem, for which random coding error exponent is used to establish the optimal regret that no communication protocol can beat. Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound. Experiments with practical error-correction codes on both synthetic and real-world datasets demonstrate the superiority of EC3. In particular, the results show that the choice of coding schemes has a profound impact on the regret performance.

翻訳日:2021-06-28 16:56:28 公開日:2021-06-25

# (参考訳) フェデレーション学習のためのクリッピングの理解:収束とクライアントレベルの差分プライバシー

Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy ( http://arxiv.org/abs/2106.13673v1 )

ライセンス: CC BY 4.0

Xinwei Zhang, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu and Jinfeng Yi

(参考訳) プライバシ保護の提供は、フェデレートラーニング(FL)の主要な動機の1つだ。近年、差分プライバシーという形式的なプライバシー概念をFLに組み込むことに取り組んできた。 flアルゴリズムにおけるクライアントレベルのディファレンシャルプライバシを保証するためには、プライバシノイズを追加する前に、クライアントのモデル更新をクリップする必要がある。このようなクリッピング操作は、偏微分プライベートSGDにおける勾配クリッピングとは大きく異なり、十分に理解されていない。本稿では,ニューラルネットワークのトレーニングにおいて,有意なデータ不均一性を伴っても,カットしたFedAvgが驚くほど良好に動作可能であることを実証的に実証する。このキーとなる観測に基づいて、差分プライベート(DP)のFedAvgアルゴリズムの収束解析を行い、クリッピングバイアスとクライアント更新の分布との関係を明らかにする。私たちの知る限りでは、flアルゴリズムのクリッピング操作に関する理論的および経験的問題を厳格に調査するのはこれが初めてです。

Providing privacy protection has been one of the primary motivations of Federated Learning (FL). Recently, there has been a line of work on incorporating the formal privacy notion of differential privacy with FL. To guarantee client-level differential privacy in FL algorithms, the clients' transmitted model updates have to be clipped before adding privacy noise. Such a clipping operation is substantially different from its counterpart of gradient clipping in the centralized differentially private SGD and has not been well understood. In this paper, we first empirically demonstrate that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients' updates become similar for several popular deep architectures. Based on this key observation, we provide the convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates. To the best of our knowledge, this is the first work that rigorously investigates theoretical and empirical issues regarding the clipping operation in FL algorithms.
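
The clip-then-add-noise step described in the abstract can be sketched in plain Python. This is a simplified illustration, not the authors' algorithm: the function names, the choice to add Gaussian noise directly to the averaged update, and the parameters `clip_norm` and `noise_std` are all assumptions.

```python
import math
import random


def clip_update(update, clip_norm):
    """Rescale a client's model update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(global_model, client_updates, clip_norm, noise_std, rng=None):
    """One simplified server round: clip each update, average, add Gaussian noise.

    Assumption: noise with standard deviation noise_std is added to the averaged
    update; calibrating noise_std to a privacy budget is out of scope here.
    """
    rng = rng or random.Random(0)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    n = len(clipped)
    averaged = [sum(u[i] for u in clipped) / n for i in range(len(global_model))]
    noisy = [a + rng.gauss(0.0, noise_std) for a in averaged]
    return [w + d for w, d in zip(global_model, noisy)]


# The first update has norm 5 and is scaled down to norm 1; the second is untouched.
print(clip_update([3.0, 4.0], 1.0))
print(dp_fedavg_round([0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0, noise_std=0.0))
```

Because clipping rescales only the updates whose norm exceeds the threshold, the average of clipped updates generally differs from the true average, which is the clipping bias the abstract relates to the distribution of client updates.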

翻訳日:2021-06-28 16:20:09 公開日:2021-06-25

# (参考訳) 非凸チューニングフリーロバスト回帰問題に対する近近大乗数最小化アルゴリズム

A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems ( http://arxiv.org/abs/2106.13683v1 )

ライセンス: CC0 1.0

Peipei Tang, Chengjing Wang and Bo Jiang

(参考訳) 本稿では,非凸チューニングフリーロバスト回帰問題に対する近近大乗数最小化(PPMM)アルゴリズムを提案する。基本的考え方は、近位偏極最小化アルゴリズムを用いて非凸問題を解き、その内部のサブプロブレムをスパース半平滑ニュートン(SSN)法に基づく近位点アルゴリズム(PPA)によって解くことである。アルゴリズムの設計における主な困難は、内部サブプロブレムの特異性に起因する困難を克服する方法にあることを強調する必要がある。さらに、PPMMアルゴリズムがd-定常点に収束することを証明した。この問題のKurdyka-Lojasiewicz(KL)特性のため、PPMMアルゴリズムの収束率を示す。数値実験により,提案アルゴリズムが既存の最先端アルゴリズムよりも優れていることを示す。

In this paper, we introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems. The basic idea is to apply the proximal majorization-minimization algorithm to solve the nonconvex problem with the inner subproblems solved by a sparse semismooth Newton (SSN) method based proximal point algorithm (PPA). We must emphasize that the main difficulty in the design of the algorithm lies in how to overcome the singular difficulty of the inner subproblem. Furthermore, we also prove that the PPMM algorithm converges to a d-stationary point. Due to the Kurdyka-Lojasiewicz (KL) property of the problem, we present the convergence rate of the PPMM algorithm. Numerical experiments demonstrate that our proposed algorithm outperforms the existing state-of-the-art algorithms.

翻訳日:2021-06-28 15:35:43 公開日:2021-06-25

# (参考訳) フランク・エミカ・パンダシミュレーションロボットのための多方向強化学習環境

Multi-Goal Reinforcement Learning environments for simulated Franka Emika Panda robot ( http://arxiv.org/abs/2106.13687v1 )

ライセンス: CC BY 4.0

Quentin Gallouédec, Nicolas Cazin, Emmanuel Dellandréa, Liming Chen

(参考訳) 本報告では,openai gym と統合した franka emika panda ロボットの強化学習(rl)環境である panda-gym を提案する。 reach、push、slide、pick & place、stackの5つのタスクが含まれている。それらはすべてMulti-Goal RLフレームワークに従っており、目標指向のRLアルゴリズムを使用することができる。オープンリサーチを促進するために、私たちはオープンソースの物理エンジンpybulletを使うことを選択しました。このパッケージに選択された実装は、非常に簡単に新しいタスクや新しいロボットを定義することができる。本報告では,最先端のモデルレスオフポリシーアルゴリズムを用いて得られた結果のベースラインを示す。 panda-gymはhttps://github.com/qgallouedec/panda-gymでオープンソースである。

This technical report presents panda-gym, a set of Reinforcement Learning (RL) environments for the Franka Emika Panda robot integrated with OpenAI Gym. Five tasks are included: reach, push, slide, pick & place and stack. They all follow a Multi-Goal RL framework, allowing the use of goal-oriented RL algorithms. To foster open research, we chose to use the open-source physics engine PyBullet. The implementation chosen for this package makes it very easy to define new tasks or new robots. This report also presents a baseline of results obtained with state-of-the-art model-free off-policy algorithms. panda-gym is open-source at https://github.com/qgallouedec/panda-gym.

翻訳日:2021-06-28 15:34:45 公開日:2021-06-25

# (参考訳) 後方共分散情報基準

Posterior Covariance Information Criterion ( http://arxiv.org/abs/2106.13694v1 )

ライセンス: CC BY 4.0

Yukito Iba and Keisuke Yano

(参考訳) 準後続分布に基づく予測評価のための情報基準であるPCICを導入する。広く適用可能な情報基準(waic)の自然な一般化と見なされ、単一のマルコフ連鎖モンテカルロランで計算することができる。 PCICは、重み付き確率推定や準ベイズ予測など、WAICではうまく扱えない様々な予測設定において有用である。

We introduce an information criterion, PCIC, for predictive evaluation based on quasi-posterior distributions. It is regarded as a natural generalization of widely applicable information criterion (WAIC) and can be computed via a single Markov Chain Monte Carlo run. PCIC is useful in a variety of predictive settings that are not well dealt with in WAIC, including weighted likelihood inference and quasi-Bayesian prediction.
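
To make the "single Markov Chain Monte Carlo run" point concrete, the sketch below computes WAIC, the criterion that PCIC is said to generalize, from a matrix of pointwise log-likelihoods evaluated at posterior draws. It uses the deviance-scale convention with the posterior-variance penalty; other scalings exist. It does not implement PCIC itself, whose formula is not given in the abstract, and the function name `waic` and the input layout are assumptions.

```python
import math


def waic(log_lik):
    """WAIC on the deviance scale from pointwise log-likelihoods.

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and data point i.
    At least two draws are required for the variance estimate.
    """
    num_draws = len(log_lik)
    if num_draws < 2:
        raise ValueError("need at least two posterior draws")
    num_points = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (posterior-variance form)
    for i in range(num_points):
        column = [log_lik[s][i] for s in range(num_draws)]
        top = max(column)
        # log of the posterior-mean likelihood, via log-sum-exp for stability
        lppd += top + math.log(sum(math.exp(v - top) for v in column) / num_draws)
        mean = sum(column) / num_draws
        p_waic += sum((v - mean) ** 2 for v in column) / (num_draws - 1)
    return -2.0 * (lppd - p_waic)


# Identical draws give zero penalty, so WAIC reduces to -2 * (sum of log-likelihoods).
print(waic([[-1.0, -2.0], [-1.0, -2.0]]))  # 6.0
```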

翻訳日:2021-06-28 15:27:30 公開日:2021-06-25

# (参考訳) 補助条件による画像間変換

Image-to-image Transformation with Auxiliary Condition ( http://arxiv.org/abs/2106.13696v1 )

ライセンス: CC BY 4.0

Robert Leer, Hessi Roma, James Amelia

(参考訳) シミュレーション画像で訓練された人間のポーズ検出のような画像認識の性能は通常、実際のデータとシミュレーションデータのばらつきによって悪化する。シミュレーション画像の分布を実画像に近いものにするために、SimGAN や CycleGAN といった GAN ベースの画像-画像変換手法を適用する研究がいくつかある。しかし、これらの方法は、特に訓練データが不均衡である場合、例えば、訓練データにおいて特定のポーズや形状が小さい場合など、被験者の姿勢や形の変化に十分敏感ではない。この問題を克服するために, 被験者のポーズや物体の種類といったラベル情報をサイクガンの訓練に導入し, ラベルワイズ・トランスフォーメーションモデルを得ることを提案する。提案手法であるラベルサイクガンをsvhnからmnistへのデジット画像変換とシミュレーション画像から実画像への監視カメラ画像変換実験により評価した。

The performance of image recognition models, such as human pose detection, trained with simulated images usually degrades due to the divergence between real and simulated data. To make the distribution of simulated images close to that of real ones, several works apply GAN-based image-to-image transformation methods, e.g., SimGAN and CycleGAN. However, these methods are not sensitive enough to the various changes in pose and shape of subjects, especially when the training data are imbalanced, e.g., when some particular poses and shapes are rare in the training data. To overcome this problem, we propose to introduce label information about subjects, e.g., pose and type of objects, into the training of CycleGAN, leading it to obtain label-wise transformation models. We evaluate our proposed method, called Label-CycleGAN, through experiments on digit image transformation from SVHN to MNIST and surveillance camera image transformation from simulated to real images.

翻訳日:2021-06-28 15:11:04 公開日:2021-06-25

# (参考訳) 代替現実感ゲームを用いた社会科学研究の促進手法:個人差と適応性の測定による概念実証とチームパフォーマンスへの影響

Advancing Methodology for Social Science Research Using Alternate Reality Games: Proof-of-Concept Through Measuring Individual Differences and Adaptability and their impact on Team Performance ( http://arxiv.org/abs/2106.13740v1 )

ライセンス: CC BY 4.0

Magy Seif El-Nasr, Casper Harteveld, Paul Fombelle, Truong-Huy Nguyen, Paola Rizzo, Dylan Schouten, Abdelrahman Madkour, Chaima Jemmali, Erica Kleinman, Nithesh Javvaji, Zhaoqing Teng, Extra Ludic Inc

(参考訳) cscw(computer supported collaborative work)、心理学、社会科学(social sciences)といった分野の研究は、チームプロセスとその効果と効果の理解を進歩させていますが、現在の手法は観察や自己報告に依存しています。この報告では、個人の違いとそのチーム適応への影響を理解することに焦点を当てて、このオープンな問題に取り組む作業について議論し、これらの要因が結果とプロセスの両方としてチームパフォーマンスに与える影響をさらに探ります。具体的には、調査データと行動データを強化し、チームパフォーマンスに関する洞察を深め、グループ内およびグループ内における適応とパフォーマンスを評価する方法の開発を可能にする方法に関する貢献について論じます。この問題をより扱いやすくするため、私たちは特定のタイプの環境、代替現実ゲーム(arg)、そしていくつかの理由に焦点を当てることを選びました。まず、これらのゲームは、例えばスラックや電子メールによるコミュニケーションなど、現実世界のセットアップと類似したセットアップを含む。第二に、実際の環境よりも制御可能で、必要に応じて刺激を埋め込むことができます。最後に、経験の全体を通して意思決定やコミュニケーションを理解するのに必要なデータを集めることができるため、チームプロセスは可能な限り透過的になります。本報告では,これまでに行った作業について論じ,その効果を実証する。

While work in the fields of CSCW (Computer Supported Collaborative Work), Psychology and Social Sciences has advanced our understanding of team processes and their effect on performance and effectiveness, current methods rely on observations or self-report, with little work directed towards studying team processes with quantifiable measures based on behavioral data. In this report we discuss work tackling this open problem with a focus on understanding individual differences and their effect on team adaptation, and further explore the effect of these factors on team performance as both an outcome and a process. We specifically discuss our contribution in terms of methods that augment survey data and behavioral data, allowing us to gain more insight into team performance as well as to develop a method to evaluate adaptation and performance across and within a group. To make this problem more tractable, we chose to focus on a specific type of environment, Alternate Reality Games (ARGs), for several reasons. First, these types of games involve setups that are similar to a real-world setup, e.g., communication through Slack or email. Second, they are more controllable than real environments, allowing us to embed stimuli if needed. Lastly, they allow us to collect the data needed to understand decisions and communications made throughout the entire duration of the experience, which makes team processes more transparent than otherwise possible. In this report we discuss the work we have done so far and demonstrate the efficacy of the approach.

翻訳日:2021-06-28 15:04:43 公開日:2021-06-25

# (参考訳) Privileged Zero-Shot AutoML

Privileged Zero-Shot AutoML ( http://arxiv.org/abs/2106.13743v1 )

ライセンス: CC BY 4.0

Nikhil Singh, Brandon Kates, Jeff Mentch, Anant Kharkar, Madeleine Udell, Iddo Drori

(参考訳) この研究は、データセットと関数記述を用いて自動機械学習(AutoML)システムの品質を改善し、ゼロショットアプローチを用いて計算時間を数分からミリ秒に大幅に短縮する。新しいデータセットと明確に定義された機械学習タスクが与えられたとき、人間はデータセットの説明と使用するアルゴリズムのドキュメンテーションを読むことから始める。この作業は、AutoMLで特権情報と呼ばれるこれらのテキスト記述を使った最初のものです。トレーニング済みのTransformerモデルを使用して、特権テキストを処理し、この情報を使うことでAutoMLのパフォーマンスが向上することを示す。このように、自然言語処理における教師なし表現学習の進歩を活用し、AutoMLを大幅に向上させる。データと関数のテキスト記述のみを使用することで、合理的な分類性能が得られ、データメタ機能にテキスト記述を追加することで、表型データセット全体の分類が向上することを示す。ゼロショットAutoMLを達成するために、これらの記述埋め込みとデータメタ機能を使ってグラフニューラルネットワークをトレーニングする。各ノードはトレーニングデータセットを表しており、ゼロショット形式で新しいテストデータセットの最高の機械学習パイプラインを予測するために使用します。私たちのゼロショットアプローチは、教師付き学習タスクとデータセットのための高品質なパイプラインを迅速に予測します。対照的に、ほとんどのAutoMLシステムは、数十から数百のパイプライン評価を必要とする。ゼロショットのAutoMLは、実行時間と予測時間を数分からミリ秒に短縮する。 AutoMLを桁違いにスピードアップすることで、この作業はリアルタイムのAutoMLを示す。

This work improves the quality of automated machine learning (AutoML) systems by using dataset and function descriptions while significantly decreasing computation time from minutes to milliseconds by using a zero-shot approach. Given a new dataset and a well-defined machine learning task, humans begin by reading a description of the dataset and documentation for the algorithms to be used. This work is the first to use these textual descriptions, which we call privileged information, for AutoML. We use a pre-trained Transformer model to process the privileged text and demonstrate that using this information improves AutoML performance. Thus, our approach leverages the progress of unsupervised representation learning in natural language processing to provide a significant boost to AutoML. We demonstrate that using only textual descriptions of the data and functions achieves reasonable classification performance, and adding textual descriptions to data meta-features improves classification across tabular datasets. To achieve zero-shot AutoML we train a graph neural network with these description embeddings and the data meta-features. Each node represents a training dataset, which we use to predict the best machine learning pipeline for a new test dataset in a zero-shot fashion. Our zero-shot approach rapidly predicts a high-quality pipeline for a supervised learning task and dataset. In contrast, most AutoML systems require tens or hundreds of pipeline evaluations. We show that zero-shot AutoML reduces running and prediction times from minutes to milliseconds, consistently across datasets. By speeding up AutoML by orders of magnitude this work demonstrates real-time AutoML.

翻訳日:2021-06-28 15:03:15 公開日:2021-06-25

# (参考訳) InteL-VAEs:中間潜水剤による変分オートエンコーダへの誘導バイアス付加

InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents ( http://arxiv.org/abs/2106.13746v1 )

ライセンス: CC BY 4.0

Ning Miao, Emile Mathieu, N. Siddharth, Yee Whye Teh, Tom Rainforth

(参考訳) 本稿では,潜在変数の中間集合を用いて,制御可能な帰納バイアスを持つvaes学習法を提案する。これにより、標準ガウス事前仮定の制限を克服することができる。特に、学習した表現に疎結合やクラスタリングのような望ましい特性を課し、学習したモデルに事前情報を組み込むことができる。 InteL-VAE(Intermediary Latent Space VAE)と呼ばれる我々のアプローチは、符号化プロセスの確率性を中間潜時変数で制御することに基づいており、それらを対象潜時表現に決定的にマッピングし、そこから再構成を行う。これにより、望まれる事前情報、帰納的バイアス、さらには潜在マッピングによるトポロジ情報も取り入れながら、従来のVAEフレームワークのすべての利点を維持できます。これにより、InteL-VAEはより優れた生成モデルと表現の両方を学ぶことができる。

We introduce a simple and effective method for learning VAEs with controllable inductive biases by using an intermediary set of latent variables. This allows us to overcome the limitations of the standard Gaussian prior assumption. In particular, it allows us to impose desired properties like sparsity or clustering on learned representations, and incorporate prior information into the learned model. Our approach, which we refer to as the Intermediary Latent Space VAE (InteL-VAE), is based around controlling the stochasticity of the encoding process with the intermediary latent variables, before deterministically mapping them forward to our target latent representation, from which reconstruction is performed. This allows us to maintain all the advantages of the traditional VAE framework, while incorporating desired prior information, inductive biases, and even topological information through the latent mapping. We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.

翻訳日:2021-06-28 14:50:02 公開日:2021-06-25

# (参考訳) 新型コロナウイルス対策におけるロックダウン効果の評価

Assessing the Lockdown Effects on Air Quality during COVID-19 Era ( http://arxiv.org/abs/2106.13750v1 )

ライセンス: CC BY 4.0

Ioannis Kavouras, Eftychios Protopapadakis, Maria Kaselimia, Emmanuel Sardis, Nikolaos Doulamis

(参考訳) 本研究は、新型コロナウイルスの感染拡大を抑制するため、各都市で適用された予防策による大気汚染の短期的変動について検討する。特に、一酸化炭素(CO)、オゾン(O3)、二酸化窒素(NO2)、二酸化硫黄(SO2)などの特定の汚染ガスに対する濃度効果を強調した。ロックダウンの影響の評価は4つのヨーロッパ都市(Athens, Gladsaxe, Lodz, Rome)に焦点を当てた。地球規模の衛星観測により,汚染物質に関するデータを得た。雇用予防対策のレベルは、オックスフォード市政府対応トラッカーを用いて採用されている。分析の第2部では、さまざまな機械学習ツールを使用して、各汚染物質の濃度を2日前に推定した。その結果, 対応する指標と汚染要因との間には, 弱ないし中程度の相関関係が存在し, 日常生活における汚染物質ガスの挙動を予測できるモデルを作成することが可能であった。

In this work we investigate the short-term variations in air quality attributed to the prevention measures applied in different cities to mitigate the spread of COVID-19. In particular, we focus on the effects on the concentrations of specific pollutant gases, such as carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2) and sulphur dioxide (SO2). The assessment of the impact of lockdown on air quality focuses on four European cities (Athens, Gladsaxe, Lodz and Rome). Available data on pollutant factors were obtained using global satellite observations. The level of the employed prevention measures is quantified using the Oxford COVID-19 Government Response Tracker. The second part of the analysis employs a variety of machine learning tools to estimate the concentration of each pollutant two days ahead. The results show that a weak to moderate correlation exists between the corresponding measures and the pollutant factors, and that it is possible to create models that can predict the behaviour of the pollutant gases under daily human activities.

翻訳日:2021-06-28 14:27:46 公開日:2021-06-25

# (参考訳) ニューラルスタイル伝達のための対話型マルチレベルストローク制御

Interactive Multi-level Stroke Control for Neural Style Transfer ( http://arxiv.org/abs/2106.13787v1 )

ライセンス: CC BY 4.0

Max Reimann and Benito Buchheim and Amir Semmo and Jürgen Döllner and Matthias Trapp

(参考訳) 本稿では,スタイル要素の創造的調整を容易にし,高出力の忠実度を実現するニューラルスタイル転送をインタラクティブにマルチレベル制御するモバイルアプリstyletuneを提案する。現在のモバイルのニューラルスタイル転送アプリとは対照的に、styletuneでは、ブラシストロークやテクスチャパッチといったスタイル要素のサイズと向きを、グローバルおよびローカルレベルで調整することができる。そこで本研究では、ストロークサイズと強度を制御し、現在のアプローチよりも広い範囲の編集を可能にする、新しいストローク適応フィードフォワード型転送ネットワークを提案する。さらに,CNNの回転分散を利用したストローク向き調整のためのネットワーク非依存手法を提案する。さらに,高出力率を実現するために,20メガピクセル以上の出力解像度が得られるパッチベースのスタイル転送手法を提案する。当社のアプローチは,現在のモバイルニューラルスタイル転送アプリでは不可能な,多くの新しい結果を生成する上で有効です。

We present StyleTune, a mobile app for interactive multi-level control of neural style transfers that facilitates creative adjustments of style elements and enables high output fidelity. In contrast to current mobile neural style transfer apps, StyleTune supports users to adjust both the size and orientation of style elements, such as brushstrokes and texture patches, on a global as well as local level. To this end, we propose a novel stroke-adaptive feed-forward style transfer network, that enables control over stroke size and intensity and allows a larger range of edits than current approaches. For additional level-of-control, we propose a network agnostic method for stroke-orientation adjustment by utilizing the rotation-variance of CNNs. To achieve high output fidelity, we further add a patch-based style transfer method that enables users to obtain output resolutions of more than 20 Megapixel. Our approach empowers users to create many novel results that are not possible with current mobile neural style transfer apps.

翻訳日:2021-06-28 14:16:14 公開日:2021-06-25

# (参考訳) PVTv2:ピラミッドビジョントランスによるベースライン改善

PVTv2: Improved Baselines with Pyramid Vision Transformer ( http://arxiv.org/abs/2106.13797v1 )

ライセンス: CC BY 4.0

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

(参考訳) コンピュータビジョンのトランスフォーマーは、最近進歩している。本研究では,(1)畳み込みを伴う局所連続的な特徴,(2)ゼロパディングによる位置符号化,(3)平均プールを用いた線形複雑注意層を含む3つの改良設計を加えることにより,元のピラミッドビジョン変換器(PVTv1)を改善した。これらの簡単な修正により、PVTv2は分類、検出、セグメンテーションにおいてPVTv1を大幅に改善する。さらにPVTv2は、ImageNet-1K事前トレーニングの下で、Swin Transformerを含む最近の作業よりもはるかに優れたパフォーマンスを実現している。この研究により、最先端の視覚トランスフォーマー研究がよりアクセス可能になることを願っている。コードはhttps://github.com/whai362/PVTで入手できる。

Transformers in computer vision have recently shown encouraging progress. In this work, we improve the original Pyramid Vision Transformer (PVTv1) by adding three design improvements: (1) locally continuous features with convolutions, (2) position encodings with zero padding, and (3) linear-complexity attention layers with average pooling. With these simple modifications, our PVTv2 significantly improves on PVTv1 in classification, detection, and segmentation. Moreover, PVTv2 achieves much better performance than recent works, including Swin Transformer, under ImageNet-1K pre-training. We hope this work will make state-of-the-art vision Transformer research more accessible. Code is available at https://github.com/whai362/PVT .

翻訳日:2021-06-28 14:01:29 公開日:2021-06-25

# (参考訳) 領域ベースグラフニューラルネットワークを用いた効率的な文書画像分類

Efficient Document Image Classification Using Region-Based Graph Neural Network ( http://arxiv.org/abs/2106.13802v1 )

ライセンス: CC BY 4.0

Jaya Krishna Mandivarapu, Eric Bunch, Qian You, Glenn Fung

(参考訳) ドキュメントイメージの分類は、さまざまな業界にわたる多くのエンタープライズアプリケーションで商用化できるため、依然として一般的な研究分野である。大規模事前訓練されたコンピュータビジョンや言語モデル、グラフニューラルネットワークの最近の進歩は、画像分類を多くのツールに貸し出している。しかし、大きな事前訓練されたモデルを使用するには、通常かなりの計算資源が必要であるため、自動文書画像分類のコスト削減の利点を損なう可能性がある。本稿では,グラフ畳み込みニューラルネットワークを用いて,文書のテキスト情報,視覚情報,レイアウト情報を組み込んだ効率的な文書画像分類フレームワークを提案する。提案するアルゴリズムを,公開データセットと実生活保険書分類データセットの両方で,最先端のビジョンと言語モデルに対して厳格にベンチマークした。公開データと実世界のデータの両方で実証的な結果から,我々の手法はSOTAに近い性能を達成できるが,計算資源やモデルトレーニングや推論に要する時間をはるかに少なくすることがわかった。これにより、特にエンタープライズアプリケーションのスケーラブルなデプロイメントにおいて、コスト面でのメリットよりも優れたソリューションが実現される。その結果,本アルゴリズムはSOTAに非常に近い分類性能が得られることがわかった。また,提案手法とベースライン間の計算資源,モデルサイズ,トレーニング時間,推論時間を包括的に比較した。さらに、本手法および他のベースラインを用いて画像当たりのコストを並べる。

Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advancements in large pre-trained computer vision and language models and in graph neural networks have lent document image classification many tools. However, using large pre-trained models usually requires substantial computing resources, which could defeat the cost-saving advantages of automatic document image classification. In this paper, we propose an efficient document image classification framework that uses graph convolutional neural networks and incorporates textual, visual and layout information of the document. We have rigorously benchmarked our proposed algorithm against several state-of-the-art vision and language models on both a publicly available dataset and a real-life insurance document classification dataset. Empirical results on both publicly available and real-world data show that our methods achieve near-SOTA performance yet require far fewer computing resources and much less time for model training and inference. This results in solutions that offer better cost advantages, especially in scalable deployment for enterprise applications. The results showed that our algorithm can achieve classification performance quite close to SOTA. We also provide comprehensive comparisons of computing resources, model sizes, and training and inference times between our proposed methods and the baselines. In addition, we delineate the cost per image for our method and the other baselines.

翻訳日:2021-06-28 13:46:54 公開日:2021-06-25

# 室内回転シーンの自己監督による単眼深度推定

Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated Scenes ( http://arxiv.org/abs/2106.12958v2 )

ライセンス: Link先を確認

Benjamin Keltjens and Tom van Dijk and Guido de Croon

(参考訳) 自己教師付き深層学習法では,単眼深度推定の訓練にステレオ画像を用いた。これらの手法は、KITTIなどの屋外データセットに対して強い結果を示すが、室内環境における監視手法の性能とカメラ回転とは一致しない。屋内で回転するシーンは、低テクスチャ領域の存在度と回転下の画像の奥行き手がかりの複雑さの増加という2つの理由から、制約の少ないアプリケーションでは一般的である。自己教師あり学習をより一般化した環境に拡張するために、我々は2つの追加を提案する。まず,テクスチャレス領域における画像再構成誤差損失の曖昧さを補正する新しい不均一損失項を提案する。具体的には, 周囲のテクスチャ領域からの距離を推定し, 元の推定値の補正にL1損失を用いる。実験の結果,ゴダードらによるモノデプスと比較すると,低テクスチャシーンでは,テクスチャシーンに損なわれることなく,奥行き推定が大幅に改善された。第2に, アプリケーションの代表回転によるトレーニングは, ピッチとロールの両方において, 期待回転範囲全体の性能を著しく向上させるのに十分であることを示す。カメラ回転のないテストセットで評価すると,性能が低下しないため,深さ推定がうまく一般化されることを示す。これらの開発により、複雑な環境に対する単眼深度推定の自己教師付き学習をより広く活用することができる。

Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation. Although these methods show strong results on outdoor datasets such as KITTI, they do not match the performance of supervised methods in indoor environments with camera rotation. Indoor, rotated scenes are common in less constrained applications and pose problems for two reasons: an abundance of low-texture regions and the increased complexity of depth cues for images under rotation. In an effort to extend self-supervised learning to more generalised environments, we propose two additions. First, we propose a novel Filled Disparity Loss term that corrects for the ambiguity of the image reconstruction error loss in textureless regions. Specifically, we interpolate disparity in untextured regions, using the estimated disparity from surrounding textured areas, and use an L1 loss to correct the original estimation. Our experiments show that depth estimation is substantially improved on low-texture scenes, without any loss on textured scenes, when compared to Monodepth by Godard et al. Second, we show that training with an application's representative rotations, in both pitch and roll, is sufficient to significantly improve performance over the entire range of expected rotation. We demonstrate that depth estimation is successfully generalised, as performance is not lost when evaluated on test sets with no camera rotation. Together, these developments enable a broader use of self-supervised learning of monocular depth estimation for complex environments.

翻訳日:2021-06-28 13:21:53 公開日:2021-06-25

# パストレーシングのためのリアルタイムニューラルラディアンスキャッシング

Real-time Neural Radiance Caching for Path Tracing ( http://arxiv.org/abs/2106.12372v2 )

ライセンス: Link先を確認

Thomas Müller, Fabrice Rousselle, Jan Novák, Alexander Keller

(参考訳) 本稿では,パストレースによるグローバル照明のためのリアルタイムニューラルネットワークラミアンスキャッシング手法を提案する。我々のシステムは、完全にダイナミックなシーンを扱うように設計されており、照明、幾何学、材料に関する仮定は一切ない。私たちのアプローチのデータ駆動性は、キャッシュポイントの配置、補間、更新など、キャッシュアルゴリズムの多くの難しさを回避します。ニューラルネットワークをトレーニングして新しいものを扱うため、動的シーンは恐ろしい一般化の課題であるので、事前トレーニングを廃止し、適応によって一般化する。レンダリング中にレイディアンスキャッシュを訓練することにしました低ノイズのトレーニングターゲットを提供し、数バウンストレーニング更新を単に繰り返して無限バウンス輸送をシミュレートするために、自己学習を採用している。最新のハードウェアをフル活用したニューラルネットワークのストリーミング実装のおかげで、更新とキャッシュクエリは -- フルhd解像度で約2.6ミリ秒の軽いオーバーヘッドを伴います。バイアスを小さく抑えることで大きなノイズ低減効果を示すとともに,多くの課題に対して最先端のリアルタイム性能を報告した。

We present a real-time neural radiance caching method for path-traced global illumination. Our system is designed to handle fully dynamic scenes, and makes no assumptions about the lighting, geometry, and materials. The data-driven nature of our approach sidesteps many difficulties of caching algorithms, such as locating, interpolating, and updating cache points. Since pretraining neural networks to handle novel, dynamic scenes is a formidable generalization challenge, we do away with pretraining and instead achieve generalization via adaptation, i.e. we opt for training the radiance cache while rendering. We employ self-training to provide low-noise training targets and simulate infinite-bounce transport by merely iterating few-bounce training updates. The updates and cache queries incur a mild overhead -- about 2.6ms on full HD resolution -- thanks to a streaming implementation of the neural network that fully exploits modern hardware. We demonstrate significant noise reduction at the cost of little induced bias, and report state-of-the-art, real-time performance on a number of challenging scenarios.

翻訳日:2021-06-28 13:21:29 公開日:2021-06-25

# 非iidグラフ上のフェデレートグラフ分類

Federated Graph Classification over Non-IID Graphs ( http://arxiv.org/abs/2106.13423v1 )

ライセンス: Link先を確認

Han Xie, Jing Ma, Li Xiong, Carl Yang

(参考訳) フェデレートラーニングは、異なるドメインで機械学習モデルをトレーニングするための重要なパラダイムとして登場した。グラフ分類のようなグラフレベルのタスクでは、グラフは特別な種類のデータサンプルと見なすこともできる。他のドメインと同様に、グラフの小さなセットを持つ複数のローカルシステムは、人気のあるグラフニューラルネットワーク(GNN)のような強力なグラフマイニングモデルを協調的にトレーニングする利点がある。このような取り組みへのモチベーションを高めるために、異なるドメインの現実世界のグラフを分析し、ランダムグラフと比較して統計的に有意なグラフ特性を実際に共有していることを確認する。しかし、同じ領域や同じデータセットからでも異なるグラフ集合が、グラフ構造とノードの特徴の両方に関してIIDではないことが分かる。そこで本研究では,gnnの勾配に基づく局所システムのクラスタを動的に探索し,そのクラスタが局所システムが所有するグラフの構造や特徴の多様性を低減できることを理論的に正当化する,グラフクラスタリングフェデレーション学習(gcfl)フレームワークを提案する。さらに,gnnの勾配がgcflでかなり変動するのを観察し,動的時間ウォーピング(gcfl+)に基づく勾配シーケンスに基づくクラスタリング機構を設計する。広範な実験結果と詳細な分析により,提案手法の有効性が実証された。

Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data sample, which can be collected and stored in separate local systems. As in other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation for such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or the same dataset, are non-IID with respect to both graph structures and node features. To handle this, we propose a graph clustering federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe that the gradients of GNNs fluctuate considerably in GCFL, which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.
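
The dynamic time warping step mentioned for GCFL+ can be illustrated with a minimal sketch. The abstract does not say which per-client sequence is compared, so the use of gradient-norm sequences here, and the names `dtw_distance` and `grad_norms`, are assumptions made for illustration only and not the authors' implementation.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical per-round gradient-norm sequences for three clients (toy values).
grad_norms = {
    "client_a": [1.0, 2.0, 3.0, 2.0],
    "client_b": [1.0, 1.0, 2.0, 3.0, 2.0],
    "client_c": [5.0, 5.0, 5.0, 5.0],
}

# Pairwise distances that a clustering step could consume.
distances = {
    (u, v): dtw_distance(grad_norms[u], grad_norms[v])
    for u in grad_norms
    for v in grad_norms
    if u < v
}
```

In this toy data, `client_a` and `client_b` follow the same shape at different speeds, so DTW treats them as close even though their lengths differ, which is the property that makes it attractive for fluctuating sequences.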

翻訳日:2021-06-28 13:21:13 公開日:2021-06-25

# 不一致によるSGDの一般化評価

Assessing Generalization of SGD via Disagreement ( http://arxiv.org/abs/2106.13799v1 )

ライセンス: Link先を確認

Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter

(参考訳) 同一のトレーニングセット上で同じアーキテクチャをSGD(Stochastic Gradient Descent)の異なる実行で訓練し、ラベルのないテストデータ上で2つのネットワーク間の不一致率を測定するだけで、深層ネットワークのテスト誤差を推定できることを実験的に示す。これは、2回目の実行を全く新しいトレーニングセットで行う必要があるNakkiran & Bansal '20の観察に基づいており、そのより強力なバージョンである。さらに、この特異な現象は、SGDで訓練されたモデルのアンサンブルがよく較正されている(well-calibrated)という性質から生じることを理論的に示す。この発見は、ラベルのないテストデータを使ってテストエラーを直接予測する単純な経験的尺度を提供するだけでなく、一般化とキャリブレーションの間に新たな概念的接続を確立する。

We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on, and is a stronger version of, the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the well-calibrated nature of ensembles of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
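
The disagreement rate described above is simple to compute once two independently trained models have produced predictions on the same unlabeled inputs. The following is a minimal sketch; the function name `disagreement_rate` and the toy prediction lists are hypothetical and stand in for the outputs of two SGD runs.

```python
def disagreement_rate(preds_a, preds_b):
    """Fraction of inputs on which two models predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must be non-empty")
    differing = sum(pa != pb for pa, pb in zip(preds_a, preds_b))
    return differing / len(preds_a)


# Hypothetical predicted labels from two runs on the same unlabeled test inputs.
run_1 = [0, 1, 2, 1, 0, 2, 1, 0]
run_2 = [0, 1, 1, 1, 0, 2, 2, 0]

estimate = disagreement_rate(run_1, run_2)
```

No ground-truth labels appear anywhere in the computation, which is the point of the method: according to the abstract, this quantity can serve as an estimate of the test error.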

翻訳日:2021-06-28 13:20:50 公開日:2021-06-25

# CausalCity:Causal DiscoveryとReasoningのための複雑なシミュレーション

CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning ( http://arxiv.org/abs/2106.13364v1 )

ライセンス: Link先を確認

Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn and Ashish Kapoor

(参考訳) 因果推論と反事実推論を行う能力は、人間の知性の中心的な性質である。このような推論を実行できる意思決定システムは、より一般化可能で解釈可能な可能性がある。シミュレーションは、パラメータ(例えば、共同設立者)を体系的に変化させ、反現実的なシナリオの場合の結果の例を生成する能力を提供することによって、この領域における最先端の進歩に役立っている。しかし、運転や車両ナビゲーションなど、多エージェントシナリオにおける複雑な時間的因果事象をシミュレートすることは困難である。そこで本研究では,安全クリティカルな文脈における因果発見と反事実推論のためのアルゴリズム開発を目的とした,忠実度の高いシミュレーション環境を提案する。私たちの作業の中核となるコンポーネントは \textit{agency} を導入することで、ハイレベルな定義を使って複雑なシナリオを簡単に定義し作成できます。車両はこれらの目的を達成するために機関と共に運用され、低レベルの行動は必要に応じてのみ制御される。我々は,3つの最先端手法を用いて実験を行い,ベースラインを作成し,この環境の余裕を強調する。最後に、将来の仕事の課題と機会を強調します。

The abilities to perform causal and counterfactual reasoning are central properties of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state of the art in this domain by providing the ability to systematically vary parameters (e.g., confounders) and generate examples of the outcomes in counterfactual scenarios. However, simulating complex temporal causal events in multi-agent scenarios, such as those that exist in driving and vehicle navigation, is challenging. To help address this, we present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning in safety-critical contexts. A core component of our work is to introduce agency, such that it is simple to define and create complex scenarios using high-level definitions. The vehicles then operate with agency to complete these objectives, meaning low-level behaviors need only be controlled if necessary. We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment. Finally, we highlight challenges and opportunities for future work.

翻訳日:2021-06-28 13:20:08 公開日:2021-06-25

# データ拡張のための単一画像テクスチャ変換

Single Image Texture Translation for Data Augmentation ( http://arxiv.org/abs/2106.13804v1 )

ライセンス: Link先を確認

Boyi Li and Yin Cui and Tsung-Yi Lin and Serge Belongie

(参考訳) 画像合成の最近の進歩により、ソースドメインとターゲットドメインのマッピングを学習することで、画像の翻訳が可能になる。既存の手法では、様々なデータセット上でモデルをトレーニングすることで分布を学習する傾向があり、その結果は主観的に評価される。しかし,画像認識タスクにおける意味的画像翻訳手法の可能性について検討する研究は比較的少ない。本稿では,データ拡張におけるSITT(Single Image Texture Translation)の利用について検討する。まず,ソーステクスチャの単一の入力に基づいてテクスチャを画像に変換する軽量モデルを提案し,高速なトレーニングとテストを可能にした。 SITTに基づいて、長い尾と少数ショットの画像分類タスクにおける拡張データの利用について検討する。提案手法は,入力データを対象領域に翻訳し,一貫した画像認識性能の向上を実現する。最後に、SITTと関連する画像翻訳手法が、モデルトレーニングにおけるデータ効率向上工学アプローチの基盤となるかを検討する。

Recent advances in image synthesis enable one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on a variety of datasets, with results evaluated largely in a subjective manner. Relatively few works in this area, however, study the potential use of semantic image translation methods for image recognition tasks. In this paper, we explore the use of Single Image Texture Translation (SITT) for data augmentation. We first propose a lightweight model for translating texture to images based on a single input of source texture, allowing for fast training and testing. Based on SITT, we then explore the use of augmented data in long-tailed and few-shot image classification tasks. We find that the proposed method is capable of translating input data into a target domain, leading to consistently improved image recognition performance. Finally, we examine how SITT and related image translation methods can provide a basis for a data-efficient, augmentation engineering approach to model training.

翻訳日:2021-06-28 13:19:49 公開日:2021-06-25

# ParaLaw Nets -- 法的テキスト処理のための言語間文レベルの事前学習

ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing ( http://arxiv.org/abs/2106.13403v1 )

ライセンス: Link先を確認

Ha-Thanh Nguyen, Vu Tran, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Minh Le Nguyen, Ken Satoh

(参考訳) 曖昧さは自然言語の特徴であり、表現のアイデアを柔軟にする。しかし、正確なステートメントを必要とするドメインでは、それは障壁になります。具体的には、1つの単語が複数の意味を持ち、複数の単語が同じ意味を持つ。テキストを外国語に翻訳する場合、翻訳者は原文の各要素の正確な意味を判断し、正しい翻訳文を生成する必要がある。そこで本研究では,文レベルの言語間情報を用いた事前学習されたモデルファミリーであるParaLaw Netsを提案する。このアプローチは coliee-2021 の質問応答タスクで最高の結果を得た。

Ambiguity is a characteristic of natural language, which makes the expression of ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original sentence to produce the correct translation. Building on that observation, in this paper we propose ParaLaw Nets, a pretrained model family that uses sentence-level cross-lingual information to reduce ambiguity and increase performance in legal text processing. This approach achieved the best result in the Question Answering task of COLIEE-2021.

翻訳日:2021-06-28 13:19:35 公開日:2021-06-25

# コントラスト表現学習のための分解的相互情報推定

Decomposed Mutual Information Estimation for Contrastive Representation Learning ( http://arxiv.org/abs/2106.13401v1 )

ライセンス: Link先を確認

Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

(参考訳) 最近のコントラスト表現学習法は、基礎となるコンテキストの複数のビュー間の相互情報(mi)の推定に依存する。例えば、データ拡張を適用することで、与えられた画像の複数のビューを導出したり、シーケンスをシーケンス内の何らかのステップの過去と未来からなるビューに分割することができる。 MI 上の対照的な下界は最適化が容易であるが、MI の大量推定では強い過小評価バイアスを持つ。そこで本稿では,MI推定問題をより小さな推定問題に分解する手法として,ビューの1つを段階的により情報的なサブビューに分割し,分割されたビュー間でMIに連鎖ルールを適用する手法を提案する。この式は無条件および条件MIの項の和を含み、それぞれが全MIのモデストチャンクを測定し、対照的な境界による近似を容易にする。和を最大化するために、効率的に近似できる条件 MI 上の対照的な下界を定式化する。我々は、一般的なアプローチをDEMI(Decomposed Estimation of Mutual Information)と呼ぶ。 DEMIは、合成環境では非分解コントラスト境界よりも多くのMIを捕捉でき、視覚領域や対話生成においてより良い表現を学習できることを示す。

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
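
The decomposition described above can be written out as a short worked equation. The notation is an assumption chosen for illustration: one view $x$ is split into two subviews $(x', x'')$, $y$ is the other view, and $K$ is the number of samples (one positive plus $K-1$ negatives) used by an InfoNCE-style contrastive bound.

```latex
% Chain rule for mutual information applied to the split view x = (x', x''):
\begin{align}
  I(x; y) \;=\; I(x', x''; y) \;=\; I(x'; y) \;+\; I(x''; y \mid x').
\end{align}
% An InfoNCE-style estimate computed with K samples cannot exceed log K:
\begin{align}
  I_{\mathrm{NCE}} \;\le\; \log K .
\end{align}
```

Because each term on the right-hand side of the chain rule is estimated separately, each estimate only has to capture a modest share of the total, so the $\log K$ ceiling is less restrictive for the sum than for a single bound on $I(x; y)$. This is one way to read the motivation given in the abstract, not a statement of the paper's exact derivation.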

翻訳日:2021-06-28 13:18:56 公開日:2021-06-25

# マルチドメインアクティブラーニング:比較研究

Multi-Domain Active Learning: A Comparative Study ( http://arxiv.org/abs/2106.13516v1 )

ライセンス: Link先を確認

Rui He, Shan He, Ke Tang

(参考訳) 複数のドメインに分類器を構築することは実生活において現実的な問題である。分類器を1つずつ構築する代わりに、マルチドメイン学習(MDL)は同時に複数のドメインに分類器を構築する。 MDLはドメイン間で共有される情報を利用してパフォーマンスを向上させる。教師付き学習問題として,MDL問題ではラベル付け作業が依然として高い。通常、この高いラベル付けコスト問題は、アクティブラーニングを使用することで軽減できる。したがって、MDLにおけるラベル付けの労力を減らすためにアクティブラーニングを活用することは自然であり、この設定をマルチドメインアクティブラーニング(MDAL)と呼ぶ。しかし、この設定で作られた作品はほとんどない。そして、研究がこの問題に直面するとき、既成の解決策は存在しない。この状況下では、現在のマルチドメイン学習モデルと単一ドメインのアクティブ学習戦略を組み合わせることが、MDAL問題の予備的な解決策となるかもしれない。この予備解の可能性を明らかにするために,5モデルと4つの選択戦略の比較研究を行った。私たちの知る限りでは、これがMDALの正式な定義を提供する最初の作品です。さらに、MDAL問題に対する最初の比較研究である。その結果,単純最良対第二最良(bvsb)不確実性戦略を用いた多項逆ネットワーク(man)モデルは,ほとんどの場合においてその優位を示す。我々はこの組み合わせをMDAL問題に対する既定の勧告だと考えている。

Building classifiers on multiple domains is a practical problem in real life. Instead of building classifiers one by one, multi-domain learning (MDL) builds classifiers on multiple domains simultaneously. MDL utilizes the information shared among the domains to improve performance. As MDL is a supervised learning problem, the labeling effort remains high. Usually, this high labeling cost can be relieved by using active learning. Thus, it is natural to utilize active learning to reduce the labeling effort in MDL, and we refer to this setting as multi-domain active learning (MDAL). However, only a few works have been built on this setting, and when researchers face this problem, there are no off-the-shelf solutions. Under these circumstances, combining current multi-domain learning models with single-domain active learning strategies might be a preliminary solution to the MDAL problem. To find out the potential of this preliminary solution, a comparative study over 5 models and 4 selection strategies is made in this paper. To the best of our knowledge, this is the first work that provides a formal definition of MDAL. It is also the first comparative work on the MDAL problem. In the results, the Multinomial Adversarial Networks (MAN) model with a simple best vs second best (BvSB) uncertainty strategy shows its superiority in most cases. We take this combination as our off-the-shelf recommendation for the MDAL problem.
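
The best vs second best (BvSB) strategy named above is a standard margin-based uncertainty criterion: a sample is considered uncertain when the gap between its two highest predicted class probabilities is small. A minimal sketch follows; the function names and the toy probability pool are hypothetical and do not come from the paper.

```python
def bvsb_margin(probs):
    """Gap between the best and second-best class probabilities of one sample."""
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    best, second = sorted(probs, reverse=True)[:2]
    return best - second


def select_most_uncertain(pool_probs, k):
    """Indices of the k pool samples with the smallest BvSB margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: bvsb_margin(pool_probs[i]))
    return ranked[:k]


# Hypothetical class-probability predictions for four unlabeled samples.
pool = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # very uncertain
    [0.50, 0.40, 0.10],  # uncertain
    [0.70, 0.20, 0.10],  # fairly confident
]

to_label = select_most_uncertain(pool, k=2)
```

In a multi-domain setting, the same ranking could be applied to a pool that mixes samples from all domains, or to each domain separately; which of these the compared strategies use is not stated in the abstract.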

翻訳日:2021-06-28 13:18:37 公開日:2021-06-25

# 集団意思決定におけるエキスパートバイアスの対応

Dealing with Expert Bias in Collective Decision-Making ( http://arxiv.org/abs/2106.13539v1 )

ライセンス: Link先を確認

Axel Abels, Tom Lenaerts, Vito Trianni, Ann Nowé

(参考訳) いくつかの現実の問題は意思決定の問題として定式化され、ある選択肢から適切な選択を繰り返す必要がある。人間であれ人工であれ、専門家の判断は、特に代替ソリューションの探索にコストがかかる場合に、正しい判断を下すのに役立つ。専門家の意見が逸脱する可能性があるため、正しい選択肢を見つけるという問題は、集団意思決定問題(CDM)としてアプローチできる。現在の最先端のCDM解決アプローチは、グループ内の最高の専門家の質によって制限されており、専門家が資格がない場合や過度に偏りがある場合、判断プロセスの妨げになる可能性がある。本稿では,文脈的マルチアームバンディット問題(CMAB)に基づく新たなアルゴリズムアプローチを提案する。我々は,同種,異種,偏極性の専門家グループを探索し,提案手法が優れた評価に直結するかどうか,最先端の手法,特に提供された専門知識の品質が低下した場合に,効果的に活用可能であることを示す。 cmabに触発された新しいアプローチは,従来の適応型アルゴリズムよりも高速に収束しながら,高い最終性能を実現している。

Many real-world problems can be formulated as decision-making problems wherein one must repeatedly make an appropriate choice from a set of alternatives. Expert judgements, whether human or artificial, can help in making correct decisions, especially when exploration of alternative solutions is costly. As expert opinions might deviate, the problem of finding the right alternative can be approached as a collective decision-making (CDM) problem. Current state-of-the-art approaches to solving CDM are limited by the quality of the best expert in the group, and perform poorly if experts are not qualified or if they are overly biased, thus potentially derailing the decision-making process. In this paper, we propose a new algorithmic approach based on contextual multi-armed bandit problems (CMAB) to identify and counteract such biased expertise. We explore homogeneous, heterogeneous and polarised expert groups and show that this approach is able to effectively exploit the collective expertise, irrespective of whether the provided advice is directly conducive to good performance, outperforming state-of-the-art methods, especially when the quality of the provided expertise degrades. Our novel CMAB-inspired approach achieves a higher final performance and does so while converging more rapidly than previous adaptive algorithms, especially when heterogeneous expertise is readily available.

翻訳日:2021-06-28 13:18:20 公開日:2021-06-25

# フレキシブルニューラルネットワークのトレーニングのためのテンソルベースフレームワーク

Tensor-based framework for training flexible neural networks ( http://arxiv.org/abs/2106.13542v1 )

ライセンス: Link先を確認

Yassine Zniyed, Konstantin Usevich, Sebastian Miron, David Brie

(参考訳) 活性化関数(AF)はニューラルネットワーク(NN)の設計において重要な部分であり、その選択はNNのパフォーマンスにおいて重要な役割を果たす。本研究では,afsを既定基底関数の重み付き和として表現したテンソルに基づく解を用いたフレキシブルアクティベーション関数の推定に特に注目する。そこで本研究では,制約付き結合行列-テンソル因子分解(cmtf)問題を解く新しい学習アルゴリズムを提案する。この手法は、制約付き正準多進分解(CPD)に続いて、一階情報がヤコビテンソルに含まれるNNの第1次及び第0次情報を融合する。提案アルゴリズムは、異なる分解基盤を扱える。この方法の目標は、元のネットワークの1層または複数の層を新しいフレキシブルな層に置き換えることで、大きな事前学習されたnnモデルを圧縮することである。このアプローチは、文字分類に使用される事前訓練された畳み込みニューラルネットワーク(CNN)に適用される。

Activation functions (AFs) are an important part of the design of neural networks (NNs), and their choice plays a predominant role in the performance of an NN. In this work, we are particularly interested in the estimation of flexible activation functions using tensor-based solutions, where the AFs are expressed as a weighted sum of predefined basis functions. To do so, we propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem. This technique fuses the first- and zeroth-order information of the NN, where the first-order information is contained in a Jacobian tensor that follows a constrained canonical polyadic decomposition (CPD). The proposed algorithm can handle different decomposition bases. The goal of this method is to compress large pretrained NN models by replacing subnetworks, i.e., one or multiple layers of the original network, with a new flexible layer. The approach is applied to a pretrained convolutional neural network (CNN) used for character classification.
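
The idea of an activation function expressed as a weighted sum of predefined basis functions can be sketched in a few lines. The particular basis below (identity, tanh, ReLU) and the coefficient values are assumptions chosen for illustration; the abstract does not specify which bases the authors use, and the tensor-based estimation of the coefficients is not shown.

```python
import math

# Hypothetical predefined basis functions phi_k.
BASIS = [
    lambda x: x,              # identity
    lambda x: math.tanh(x),   # tanh
    lambda x: max(0.0, x),    # ReLU
]


def flexible_activation(x, coeffs, basis=BASIS):
    """Evaluate f(x) = sum_k coeffs[k] * basis[k](x)."""
    if len(coeffs) != len(basis):
        raise ValueError("need exactly one coefficient per basis function")
    return sum(c * phi(x) for c, phi in zip(coeffs, basis))


# With all weight on the ReLU basis, the flexible AF reduces to ReLU.
relu_like = [0.0, 0.0, 1.0]
# An equal blend of identity and ReLU behaves like a leaky ReLU with slope 0.5.
leaky_like = [0.5, 0.0, 0.5]
```

In a flexible layer, the coefficients would be learned per neuron rather than fixed by hand as they are here.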

翻訳日:2021-06-28 13:17:58 公開日:2021-06-25

# VEGN: グラフニューラルネットワークによる変異効果予測

VEGN: Variant Effect Prediction with Graph Neural Networks ( http://arxiv.org/abs/2106.13642v1 )

ライセンス: Link先を確認

Jun Cheng, Carolin Lawrence, Mathias Niepert

(参考訳) 遺伝的変異は、正常な遺伝子機能を破壊して病気を引き起こすことがある。個々の患者内の数百万の遺伝子変異から病気を引き起こす突然変異を特定することは難しい問題である。病原性突然変異を優先順位付けできる計算方法には、膨大な応用がある。遺伝子は複雑な制御ネットワークを介して機能することが知られている。しかし、既存の変種効果予測モデルは、単独で変種を考えるのみである。対照的に、遺伝子と変異を持つ異種グラフ上で動作するグラフニューラルネットワーク(GNN)を用いて、変動効果予測をモデル化するVEGNを提案する。このグラフは遺伝子に変異を割り当て、遺伝子と遺伝子相互作用ネットワークを接続することによって作られる。この文脈では、遺伝子遺伝子グラフが与えられるアプローチと、VEGNが遺伝子遺伝子グラフを学習し、与えられたエッジと学習したエッジの両方で操作するアプローチを探索する。グラフニューラルネットワークは、遺伝子間、および遺伝子と変異種間の情報を集約するために訓練される。変数は、接続する遺伝子を介して情報を交換することができる。このアプローチは既存の最先端モデルの性能を改善する。

Genetic mutations can cause disease by disrupting normal gene function. Identifying the disease-causing mutations among the millions of genetic variants within an individual patient is a challenging problem. Computational methods that can prioritize disease-causing mutations therefore have enormous applications. It is well known that genes function through a complex regulatory network. However, existing variant effect prediction models only consider a variant in isolation. In contrast, we propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants. The graph is created by assigning variants to genes and connecting genes with a gene-gene interaction network. In this context, we explore one approach where a gene-gene graph is given and another where VEGN learns the gene-gene graph and therefore operates on both given and learnt edges. The graph neural network is trained to aggregate information between genes, and between genes and variants. Variants can exchange information via the genes they connect to. This approach improves the performance of existing state-of-the-art models.
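
The graph construction described above (assign variants to genes, then connect genes through an interaction network) can be sketched with plain dictionaries. All identifiers below, including `build_hetero_graph`, the variant names and the gene names, are hypothetical placeholders; the sketch covers only the graph, not the GNN that runs on it.

```python
from collections import defaultdict


def build_hetero_graph(variant_to_gene, gene_gene_edges):
    """Undirected adjacency over typed nodes ("variant", id) and ("gene", id)."""
    adj = defaultdict(set)

    def connect(u, v):
        adj[u].add(v)
        adj[v].add(u)

    # Each variant is attached to the gene it falls in.
    for variant, gene in variant_to_gene.items():
        connect(("variant", variant), ("gene", gene))
    # Genes are linked according to the interaction network.
    for g1, g2 in gene_gene_edges:
        connect(("gene", g1), ("gene", g2))
    return adj


# Toy inputs: three variants on two genes, one gene-gene interaction.
variant_to_gene = {"v1": "geneA", "v2": "geneA", "v3": "geneB"}
gene_gene_edges = [("geneA", "geneB")]

graph = build_hetero_graph(variant_to_gene, gene_gene_edges)
```

In this toy graph, `v1` and `v2` are two hops apart through `geneA`, and `v1` reaches `v3` through `geneA` and `geneB`, which mirrors the statement that variants exchange information via the genes they connect to.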

翻訳日:2021-06-28 13:17:25 公開日:2021-06-25

# ニューラルネットワークを用いた遺伝性癌の予測

Prediction of Hereditary Cancers Using Neural Networks ( http://arxiv.org/abs/2106.13682v1 )

ライセンス: Link先を確認

Zoe Guan, Giovanni Parmigiani, Danielle Braun, and Lorenzo Trippa

(参考訳) 家族歴は多くの種類のがんの主要な危険因子である。メンデルリスク予測モデルは、癌感受性遺伝子の知識に基づいて、家族の歴史をがんリスク予測に変換する。これらのモデルは、リスクの高い個人を特定するために臨床実践で広く利用されている。メンデルモデルは家族の歴史全体を生かしているが、変異の頻度が低いため、非現実的または検証が難しいがん感受性遺伝子に関する多くの仮定に依存している。ニューラルネットワークなどのフレキシブルなモデルを1桁の大規模なデータベースでトレーニングすることは、精度の向上につながる可能性がある。本稿では,家族史データにニューラルネットワークを適用する枠組みを開発し,癌に対する遺伝感受性を学習する能力について検討する。多くのタスクでは、ニューラルネットワークとその最先端のパフォーマンスに関する広範な文献があるが、家族の歴史データに適用する作業はほとんどない。本稿では,完全連結ニューラルネットワークと畳み込みニューラルネットワークの系統への適応を提案する。メンデル継承下でシミュレーションされたデータでは,提案するニューラルネットワークモデルがほぼ最適予測性能を達成できることを実証する。さらに、観測された家族歴が誤報告されたがん診断を含んでいる場合、ニューラルネットワークは正しい遺伝法則を組み込んだメンデル型BRCAPROモデルよりも優れている。リスクサービスのコホートである20万以上の家族履歴の大規模なデータセットを使用して、将来の乳癌リスク予測モデルをトレーニングします。がん遺伝学ネットワークのデータを用いてモデルを検証する。

Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but they rely on many assumptions about cancer susceptibility genes that are either unrealistic or challenging to validate due to low mutation prevalence. Training more flexible models, such as neural networks, on large databases of pedigrees can potentially lead to accuracy gains. In this paper, we develop a framework to apply neural networks to family history data and investigate their ability to learn inherited susceptibility to cancer. While there is an extensive literature on neural networks and their state-of-the-art performance in many tasks, there is little work applying them to family history data. We propose adaptations of fully-connected neural networks and convolutional neural networks to pedigrees. In data simulated under Mendelian inheritance, we demonstrate that our proposed neural network models are able to achieve nearly optimal prediction performance. Moreover, when the observed family history includes misreported cancer diagnoses, neural networks are able to outperform the Mendelian BRCAPRO model embedding the correct inheritance laws. Using a large dataset of over 200,000 family histories, the Risk Service cohort, we train prediction models for future risk of breast cancer. We validate the models using data from the Cancer Genetics Network.

翻訳日:2021-06-28 13:17:10 公開日:2021-06-25

# 共役エネルギーモデル

Conjugate Energy-Based Models ( http://arxiv.org/abs/2106.13798v1 )

ライセンス: Link先を確認

Hao Wu, Babak Esmaeili, Michael Wick, Jean-Baptiste Tristan, Jan-Willem van de Meent

(参考訳) 本稿では,データと潜在変数の結合密度を定義する新しいエネルギー系モデルである共役エネルギーベースモデル(cebms)を提案する。 CEBMの結合密度は、データ上の難解な分布と遅延変数上の引き込み可能な後部分布に分解される。 CEBMは、データから潜在変数への教師なしマッピングを学ぶという意味で、変分オートエンコーダのようなユースケースを持つ。しかし、これらのモデルはジェネレータネットワークを省略し、データポイント間の類似性のより柔軟な概念を学ぶことができる。実験により,共役型EMMは画像モデリング,潜在空間の予測能力,および様々なデータセットの領域外検出において競合する結果が得られることを示した。

In this paper, we propose conjugate energy-based models (CEBMs), a new class of energy-based models that define a joint density over data and latent variables. The joint density of a CEBM decomposes into an intractable distribution over data and a tractable posterior over latent variables. CEBMs have similar use cases as variational autoencoders, in the sense that they learn an unsupervised mapping from data to latent variables. However, these models omit a generator network, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that conjugate EBMs achieve competitive results in terms of image modelling, predictive power of latent space, and out-of-domain detection on a variety of datasets.

翻訳日:2021-06-28 13:16:49 公開日:2021-06-25

# 正則化のための階層的接続球面多様体

Connecting Sphere Manifolds Hierarchically for Regularization ( http://arxiv.org/abs/2106.13549v1 )

ライセンス: Link先を確認

Damien Scieur, Youngsung Kim

(参考訳) 本稿では階層的なクラスによる分類問題を考察する。各クラスの分類器(超平面)を、その中心がスーパークラスの分類器である球面多様体に属するように強制する。そして、個々の球面多様体はその階層関係に基づいて連結される。本手法は,球状完全連結層と階層層を組み合わせることで,ニューラルネットワークの最終層を置き換えるものである。この正規化は、公開データセット(CIFAR100、CUB200、スタンフォード犬、スタンフォード車、Tiny-ImageNet)上で広く使用されているディープニューラルネットワークアーキテクチャ(ResNetとDenseNet)のパフォーマンスを改善することが示されている。

This paper considers classification problems with hierarchically organized classes. We force the classifier (hyperplane) of each class to belong to a sphere manifold, whose center is the classifier of its super-class. Then, individual sphere manifolds are connected based on their hierarchical relations. Our technique replaces the last layer of a neural network by combining a spherical fully-connected layer with a hierarchical layer. This regularization is shown to improve the performance of widely used deep neural network architectures (ResNet and DenseNet) on publicly available datasets (CIFAR100, CUB200, Stanford dogs, Stanford cars, and Tiny-ImageNet).
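
The geometric constraint described above, where each class's classifier lies on a sphere centred at its super-class's classifier, can be illustrated with a small sketch. The parameterization used here (a free direction vector normalized and scaled by a fixed radius) and all names and values are assumptions for illustration; the abstract does not give the authors' exact formulation.

```python
import math


def on_sphere(center, direction, radius):
    """Point at distance `radius` from `center` along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [c + radius * d / norm for c, d in zip(center, direction)]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical two-level hierarchy: super-class "animal", sub-class "dog".
w_animal = [1.0, 0.0]                            # super-class classifier
w_dog = on_sphere(w_animal, [3.0, 4.0], 0.5)     # constrained sub-class classifier
```

However the direction parameter changes during training, `w_dog` stays at distance 0.5 from `w_animal`, so sub-class classifiers cannot drift arbitrarily far from their parent. Chaining the same construction down the hierarchy connects the individual spheres.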

翻訳日:2021-06-28 13:15:47 公開日:2021-06-25

# Vision Transformerのアーキテクチャ探索

Vision Transformer Architecture Search ( http://arxiv.org/abs/2106.13700v1 )

ライセンス: Link先を確認

Xiu Su, Shan You, Jiyang Xie, Mingkai Zheng, Fei Wang, Chen Qian, Changshui Zhang, Xiaogang Wang, Chang Xu

(参考訳) 近年,手動で分割したパッチの系列として画像を自己注意機構でモデル化することで,トランスフォーマーはコンピュータビジョンタスクの解決において大きな優位性を示している。しかし、現在の視覚トランスフォーマー(ViT)のアーキテクチャは自然言語処理(NLP)タスクから単純に継承されたものであり、十分に研究・最適化されていない。本稿では,視覚タスクにおけるトランスフォーマーの固有構造を検証し,同程度のハードウェア予算の下で最適なアーキテクチャを探索するためのアーキテクチャ探索手法ViTASを提案する。具体的には, トークン埋め込み, シーケンスサイズ, ヘッド数, 幅, 深さの異なるアーキテクチャを単一のスーパートランスフォーマーから導出できるような, 効果的かつ効率的なViT向けの新しい重み共有パラダイムを設計する。さらに、異なるアーキテクチャ間のばらつきに対応するため、スーパートランスフォーマーに \textit{private} なクラストークンと自己注意マップを導入する。また,異なる予算での探索に適応するために,恒等操作(identity operation)のサンプリング確率を探索することを提案する。実験の結果,ViTASは既存の純粋なトランスフォーマーアーキテクチャと比べて優れた結果を達成した。例えば、$1.3$G FLOPsの予算で、探索されたアーキテクチャはImageNetで$74.7\%$のtop-$1$精度を達成し、現在のベースラインViTアーキテクチャを$2.5\%$上回る。コードは \url{https://github.com/xiusu/ViTAS} で入手できる。

Recently, transformers have shown great superiority in solving computer vision tasks by modeling images as a sequence of manually-split patches with self-attention mechanism. However, current architectures of vision transformers (ViTs) are simply inherited from natural language processing (NLP) tasks and have not been sufficiently investigated and optimized. In this paper, we make a further step by examining the intrinsic structure of transformers for vision tasks and propose an architecture search method, dubbed ViTAS, to search for the optimal architecture with similar hardware budgets. Concretely, we design a new effective yet efficient weight sharing paradigm for ViTs, such that architectures with different token embedding, sequence size, number of heads, width, and depth can be derived from a single super-transformer. Moreover, to cater for the variance of distinct architectures, we introduce \textit{private} class token and self-attention maps in the super-transformer. In addition, to adapt the searching for different budgets, we propose to search the sampling probability of identity operation. Experimental results show that our ViTAS attains excellent results compared to existing pure transformer architectures. For example, with $1.3$G FLOPs budget, our searched architecture achieves $74.7\%$ top-$1$ accuracy on ImageNet and is $2.5\%$ superior than the current baseline ViT architecture. Code is available at \url{https://github.com/xiusu/ViTAS}.

翻訳日:2021-06-28 13:15:35 公開日:2021-06-25

# 安定性のためのVAEの再パラメータ化

Re-parameterizing VAEs for stability ( http://arxiv.org/abs/2106.13739v1 )

ライセンス: Link先を確認

David Dehaene and R\'emy Brossard

(参考訳) 本稿では,変分オートエンコーダ(VAE)の訓練における数値安定性に対する理論的アプローチを提案する。我々の研究は、VAEが複雑な画像データセットで最先端の生成結果に到達できるようにした最近の研究に動機づけられている。これらの非常に深いVAEアーキテクチャや、より複雑な出力分布を用いるVAEでは、高い訓練勾配やNaN損失が不規則に生じる傾向が浮き彫りになっている。こうした制約にもかかわらず訓練を行うために提案された経験的な修正は、理論的に十分に根拠づけられておらず、実際にも一般に十分ではない。これを踏まえ,モデルのニューラルネットワークとその出力確率分布とのインタフェースに問題の原因を特定する。符号化された正規分布の分散の不注意な定式化に起因する不安定性の一般的な原因を説明し、同じアプローチをより明白でない他の原因にも適用する。VAEが依存する正規分布のパラメータ化の方法に小さな変更を加えることで、VAEを安全に訓練できることを示す。

We propose a theoretical approach towards the training numerical stability of Variational AutoEncoders (VAE). Our work is motivated by recent studies empowering VAEs to reach state of the art generative results on complex image datasets. These very deep VAE architectures, as well as VAEs using more complex output distributions, highlight a tendency to haphazardly produce high training gradients as well as NaN losses. The empirical fixes proposed to train them despite their limitations are neither fully theoretically grounded nor generally sufficient in practice. Building on this, we localize the source of the problem at the interface between the model's neural networks and their output probabilistic distributions. We explain a common source of instability stemming from an incautious formulation of the encoded Normal distribution's variance, and apply the same approach on other, less obvious sources. We show that by implementing small changes to the way we parameterize the Normal distributions on which they rely, VAEs can securely be trained.
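
To make the variance issue concrete, here is a minimal sketch of one familiar failure mode: when a network output `s` is read as a log-variance, the KL term of a VAE contains `exp(s)`, which overflows for large `s`. The softplus-based alternative shown next to it is only an illustrative assumption; the abstract does not state which re-parameterization the authors actually recommend, and `kl_exp`, `kl_softplus`, and `eps` are hypothetical names.

```python
import math


def kl_exp(mu, s):
    """KL(N(mu, exp(s)) || N(0, 1)), with s interpreted as the log-variance."""
    return 0.5 * (math.exp(s) + mu * mu - 1.0 - s)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def kl_softplus(mu, s, eps=1e-6):
    """Same KL term, with s mapped to a standard deviation through softplus.

    eps keeps the standard deviation strictly positive (illustrative choice).
    """
    sigma = softplus(s) + eps
    return 0.5 * (sigma * sigma + mu * mu - 1.0 - 2.0 * math.log(sigma))
```

With these definitions, `kl_exp(0.0, 1000.0)` raises `OverflowError` in pure Python (array libraries would typically return `inf` and then propagate NaN gradients), whereas `kl_softplus(0.0, 1000.0)` stays finite because the standard deviation grows only linearly in `s`.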

翻訳日:2021-06-28 13:15:08 公開日:2021-06-25

# ヒューマンマシン協調によるつぶやきのきめ細かい位置情報予測

Fine-grained Geolocation Prediction of Tweets with Human Machine Collaboration ( http://arxiv.org/abs/2106.13411v1 )

ライセンス: Link先を確認

Florina Dutt and Subhajit Das

(参考訳) Twitterは、さまざまなトピックに関する人々の意見を分析するのに有用なリソースである。多くの場合、これらのトピックは、ツイートが投稿された場所と相関または関連している。例えば、レストランのオーナーは、食事に関する投稿の感情に照らして、ターゲット顧客がどこで食事をしているかを知る必要があるかもしれず、政策プランナーは、犯罪、安全、渋滞などの関連する問題について、市の特定の地域、あるいは郡や州ごとに市民の意見を分析する必要があるかもしれない。これは有望である一方、クロールされたツイートのうち位置情報タグが付いているものは$1\%$未満である。そのため、ジオタグのないツイートの位置を正確に予測することは、さまざまなドメインのデータ分析において非常に重要である。本研究では,何百万ものTwitter投稿とエンドユーザのドメイン専門知識を活用し、自然言語処理(NLP)技術を用いて,近隣地区,郵便番号,経度・緯度などの様々な粒度でジオタグのないツイートの位置を予測するディープニューラルネットワークモデル群を構築した。複数のニューラルアーキテクチャ実験と協調的なヒューマンマシンワークフロー設計により、位置情報検出に関する現在進行中の研究は、エンドユーザが選択した変数と位置情報との関係を関連付けられるようにする有望な結果を示している。

Twitter is a useful resource to analyze peoples' opinions on various topics. Often these topics are correlated or associated with locations from where these Tweet posts are made. For example, restaurant owners may need to know where their target customers eat with respect to the sentiment of the posts made related to food, policy planners may need to analyze citizens' opinion on relevant issues such as crime, safety, congestion, etc. with respect to specific parts of the city, or county or state. As promising as this is, less than $1\%$ of the crawled Tweet posts come with geolocation tags. That makes accurate prediction of Tweet posts for the non geo-tagged tweets very critical to analyze data in various domains. In this research, we utilized millions of Twitter posts and end-users domain expertise to build a set of deep neural network models using natural language processing (NLP) techniques, that predicts the geolocation of non geo-tagged Tweet posts at various level of granularities such as neighborhood, zipcode, and longitude with latitudes. With multiple neural architecture experiments, and a collaborative human-machine workflow design, our ongoing work on geolocation detection shows promising results that empower end-users to correlate relationship between variables of choice with the location information.

翻訳日:2021-06-28 13:14:53 公開日:2021-06-25

# リアルタイム話者分離のためのオンライン自己注意ゲート付きRNN

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation ( http://arxiv.org/abs/2106.13493v1 )

ライセンス: Link先を確認

Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

(参考訳) ディープニューラルネットワークは近年、モノラルとバイノーラルの両方の設定下で、ブラインド音源分離のタスクで大きな成功を収めている。これらの手法は高品質な分離を実現することが示されているが、主にオフライン設定、すなわちモデルが分離中に入力信号全体にアクセスできる設定で適用されてきた。本研究では,非因果的な最先端の分離モデルを因果的かつリアルタイムなモデルに変換し,その性能をオンラインとオフラインの両方の設定で評価する。提案モデルの性能を、無響・雑音・雑音残響の各録音条件下で複数のベースライン手法と比較し,モノラルとバイノーラルの両方の入出力について検討した。我々の知見は、分離における因果モデルと非因果モデルの相対的な差異を明らかにするものである。オンライン分離のためのステートフルな実装は,オフラインモデルに比べてわずかな性能低下(モノラル入力で0.8dB,バイノーラル入力で0.3dB)にとどまり,リアルタイム係数0.65に達した。サンプルは次のリンクから入手できる: https://kwanum.github.io/sagrnnc-stream-results/

Deep neural networks have recently shown great success in the task of blind source separation, both under monaural and binaural settings. Although these methods were shown to produce high-quality separations, they were mainly applied under offline settings, in which the model has access to the full input signal while separating the signal. In this study, we convert a non-causal state-of-the-art separation model into a causal and real-time model and evaluate its performance under both online and offline settings. We compare the performance of the proposed model to several baseline methods under anechoic, noisy, and noisy-reverberant recording conditions while exploring both monaural and binaural inputs and outputs. Our findings shed light on the relative difference between causal and non-causal models when performing separation. Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model; 0.8dB for monaural inputs and 0.3dB for binaural inputs while reaching a real-time factor of 0.65. Samples can be found under the following link: https://kwanum.github.io/sagrnnc-stream-results/.
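
The reported real-time factor can be read with a small calculation. The sketch below assumes the common definition (processing time divided by the duration of the audio processed); the abstract itself does not spell out the definition, and the function names and the 10-second example are illustrative only.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Ratio of compute time to audio duration (assumed definition)."""
    return processing_seconds / audio_seconds


def keeps_up_with_stream(rtf):
    """A streaming system keeps pace with its input only if the factor is below 1."""
    return rtf < 1.0


# Hypothetical example: 10 s of audio separated in 6.5 s of compute.
rtf = real_time_factor(6.5, 10.0)
```

Under that reading, a factor of 0.65 means each second of audio takes about 0.65 s to process, leaving roughly a third of the time budget as headroom.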

翻訳日:2021-06-28 13:14:20 公開日:2021-06-25

# 空間的進化的敵対的生成ネットワークにおける多様性の育成

Fostering Diversity in Spatial Evolutionary Generative Adversarial Networks ( http://arxiv.org/abs/2106.13590v1 )

ライセンス: Link先を確認

Jamal Toutouh and Erik Hemberg and Una-May O'Reilly

(参考訳) 敵対的生成ネットワーク(GAN)は不安定性やモード崩壊などの訓練上の病理に悩まされており、これらは主に敵対的相互作用における多様性の欠如から生じる。共進化的GAN (CoE-GAN) の訓練アルゴリズムは、これらの病理に対して耐性があることが示されている。本稿では,訓練中に異なる損失関数を用いることで多様性を促進する、空間分散型CoE-GANであるMustangsを紹介する。 MNISTとCelebAでの実験分析により、Mustangsが統計的により正確な生成器を訓練することが示された。

Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse, which mainly arise from a lack of diversity in their adversarial interactions. Co-evolutionary GAN (CoE-GAN) training algorithms have been shown to be resilient to these pathologies. This article introduces Mustangs, a spatially distributed CoE-GAN, which fosters diversity by using different loss functions during the training. Experimental analysis on MNIST and CelebA demonstrated that Mustangs trains statistically more accurate generators.

翻訳日:2021-06-28 13:14:03 公開日:2021-06-25

# 直交確率線形混合モデルを用いた高次元時系列ベイズ推定

Bayesian Inference in High-Dimensional Time-Series with the Orthogonal Stochastic Linear Mixing Model ( http://arxiv.org/abs/2106.13379v1 )

ライセンス: Link先を確認

Rui Meng, Kristofer Bouchard

(参考訳) 現代の時系列データセットの多くは、長期間にわたってサンプリングされた多数の出力応答変数を含んでいる。例えば神経科学では、数百から数千のニューロンの活動が、行動中や感覚刺激への応答として記録される。多出力ガウス過程モデルは、ガウス過程のノンパラメトリックな性質を利用して複数の出力にまたがる構造を捉える。しかし、このクラスのモデルは通常、出力応答変数間の相関が入力空間で不変であると仮定する。確率線形混合モデル(SLMM)は、混合係数が入力に依存すると仮定することで、より柔軟になり、複雑な出力依存性を効果的に捉えられる。しかし現在のところ、SLMMの推論は大規模なデータセットに対しては計算困難であり、現代のいくつかの時系列問題には適用できない。本稿では,混合係数間に直交制約を導入する新しい回帰フレームワークである直交確率線形混合モデル(OSLMM)を提案する。この制約は、複雑な出力依存性を扱う能力を保持しながら、推論の計算負荷を軽減する。我々は,SLMMとOSLMMの双方に対してマルコフ連鎖モンテカルロ推論手順を提供し,いくつかの実世界のアプリケーションにおいて,最先端の手法と比較してOSLMMが優れたモデルスケーラビリティと低い予測誤差を持つことを実証した。神経生理学的記録では,推定された潜在関数を聴覚刺激に対する集団応答のコンパクトな可視化に用い,競合手法(GPFA)と比較して優れた結果を示した。これらの結果から,OSLMMは多様な大規模時系列データセットの分析に有用であることが示される。

Many modern time-series datasets contain large numbers of output response variables sampled for prolonged periods of time. For example, in neuroscience, the activities of 100s-1000's of neurons are recorded during behaviors and in response to sensory stimuli. Multi-output Gaussian process models leverage the nonparametric nature of Gaussian processes to capture structure across multiple outputs. However, this class of models typically assumes that the correlations between the output response variables are invariant in the input space. Stochastic linear mixing models (SLMM) assume the mixture coefficients depend on input, making them more flexible and effective to capture complex output dependence. However, currently, the inference for SLMMs is intractable for large datasets, making them inapplicable to several modern time-series problems. In this paper, we propose a new regression framework, the orthogonal stochastic linear mixing model (OSLMM) that introduces an orthogonal constraint amongst the mixing coefficients. This constraint reduces the computational burden of inference while retaining the capability to handle complex output dependence. We provide Markov chain Monte Carlo inference procedures for both SLMM and OSLMM and demonstrate superior model scalability and reduced prediction error of OSLMM compared with state-of-the-art methods on several real-world applications. In neurophysiology recordings, we use the inferred latent functions for compact visualization of population responses to auditory stimuli, and demonstrate superior results compared to a competing method (GPFA). Together, these results demonstrate that OSLMM will be useful for the analysis of diverse, large-scale time-series datasets.

翻訳日:2021-06-28 13:13:55 公開日:2021-06-25

# グルーピング効果を用いたロバスト行列因子分解

Robust Matrix Factorization with Grouping Effect ( http://arxiv.org/abs/2106.13681v1 )

ライセンス: Link先を確認

Haiyan Jiang, Shuyu Li, Luwei Zhang, Haoyi Xiong, Dejing Dou

(参考訳) 行列分解(MF)には多くの技術が適用されてきたが、それらは特徴構造を十分に活用できていない可能性がある。本稿では,グループ化効果をMFに組み込み,グループ化効果を用いたロバスト行列分解法(GRMF)と呼ぶ新しい手法を提案する。グループ化効果はスパース性効果の一般化であり、0の周りだけでなく、複数の中心の周りに類似した値をクラスタリングすることでノイズ除去を行う。既存のアルゴリズムと比較して,提案するGRMFは,自然に調整可能な非凸正則化を導入してスパース性とグループ化効果を同時に達成することで,事前知識なしにMFにおけるグループ構造とスパース性を自動的に学習できる。具体的には、GRMFは効率的な交互最小化フレームワークを用いてMFを実行する。そこでは元の非凸問題がまずDifference-of-Convex (DC) プログラミングによって凸問題に変換され、次に交互方向乗数法(ADMM)によって解かれる。さらに、GRMFは非負値行列因子分解(NMF)の設定に容易に拡張できる。外れ値や汚染ノイズを含む実世界のデータセットを用いて広範な実験を行い、5つのベンチマークアルゴリズムと比較してGRMFが性能と頑健性を向上させることを示す実験結果が得られた。

Although many techniques have been applied to matrix factorization (MF), they may not fully exploit the feature structure. In this paper, we incorporate the grouping effect into MF and propose a novel method called Robust Matrix Factorization with Grouping effect (GRMF). The grouping effect is a generalization of the sparsity effect, which conducts denoising by clustering similar values around multiple centers instead of just around 0. Compared with existing algorithms, the proposed GRMF can automatically learn the grouping structure and sparsity in MF without prior knowledge, by introducing a naturally adjustable non-convex regularization to achieve simultaneous sparsity and grouping effect. Specifically, GRMF uses an efficient alternating minimization framework to perform MF, in which the original non-convex problem is first converted into a convex problem through Difference-of-Convex (DC) programming, and then solved by Alternating Direction Method of Multipliers (ADMM). In addition, GRMF can be easily extended to the Non-negative Matrix Factorization (NMF) settings. Extensive experiments have been conducted using real-world data sets with outliers and contaminated noise, where the experimental results show that GRMF has promoted performance and robustness, compared to five benchmark algorithms.

翻訳日:2021-06-28 13:13:30 公開日:2021-06-25

# 凸最適化のためのプライベート適応勾配法

Private Adaptive Gradient Methods for Convex Optimization ( http://arxiv.org/abs/2106.13756v1 )

ライセンス: Link先を確認

Hilal Asi, John Duchi, Alireza Fallah, Omid Javidbakht, Kunal Talwar

(参考訳) 本稿では,微分プライベートな凸最適化のための適応的手法を研究し、適応ステップサイズを持つ確率的勾配降下法(SGD)アルゴリズムおよびAdaGradアルゴリズムの微分プライベート版を提案・解析する。両アルゴリズムのリグレットの上界を与え,その上界が(最悪の場合において)最適であることを示す。その帰結として,AdaGradのプライベート版が適応的SGDより優れ、その適応的SGDは、(非プライベートな)AdaGradがSGDより優れることが証明可能な非等方的勾配のシナリオにおいて、従来のSGDより優れることを示す。主な課題は、プライバシーのために通常付加される等方性ノイズが、高次元問題では勾配の幾何に含まれる信号を支配してしまうことである。低次元部分空間上で実質的に最適化を行うことでこれに対処するアプローチは、変化する勾配の幾何がもたらす実際の問題を単に無視している。これに対し,本稿では非等方的なクリッピングとノイズ付加を研究し,原理に基づく理論的アプローチを構築する。その結果得られる手続きは、従来のアプローチよりも大幅に優れた経験的性能も示す。

We study adaptive methods for differentially private convex optimization, proposing and analyzing differentially private variants of a Stochastic Gradient Descent (SGD) algorithm with adaptive stepsizes, as well as the AdaGrad algorithm. We provide upper bounds on the regret of both algorithms and show that the bounds are (worst-case) optimal. As a consequence of our development, we show that our private versions of AdaGrad outperform adaptive SGD, which in turn outperforms traditional SGD in scenarios with non-isotropic gradients where (non-private) Adagrad provably outperforms SGD. The major challenge is that the isotropic noise typically added for privacy dominates the signal in gradient geometry for high-dimensional problems; approaches to this that effectively optimize over lower-dimensional subspaces simply ignore the actual problems that varying gradient geometries introduce. In contrast, we study non-isotropic clipping and noise addition, developing a principled theoretical approach; the consequent procedures also enjoy significantly stronger empirical performance than prior approaches.
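
For context, the isotropic baseline that the abstract contrasts against can be sketched as follows: clip each gradient to a fixed Euclidean norm, then add Gaussian noise with the same scale in every coordinate. This is a generic sketch of that common recipe, not the paper's non-isotropic procedure; `clip`, `privatize`, `max_norm`, and `noise_multiplier` are illustrative names, and no privacy accounting is performed here.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, max_norm, noise_multiplier, rng):
    """Clip, then add isotropic Gaussian noise with std noise_multiplier * max_norm."""
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clip(grad, max_norm)]


# Hypothetical anisotropic gradient: one large coordinate, many tiny ones.
rng = random.Random(0)
grad = [5.0] + [0.01] * 9
noisy = privatize(grad, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Because every coordinate receives noise of the same standard deviation, the nine small coordinates in this toy gradient are swamped by noise after clipping, which is the effect the abstract describes when it says isotropic noise dominates the signal in the gradient geometry.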

翻訳日:2021-06-28 13:13:09 公開日:2021-06-25

# 確率的ネスト問題に対する交互確率的勾配法のより厳密な解析

Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems ( http://arxiv.org/abs/2106.13781v1 )

ライセンス: Link先を確認

Tianyi Chen, Yuejiao Sun, and Wotao Yin

(参考訳) 確率的合成最適化、min-max最適化、bilevel最適化を含む確率的ネスト最適化は、多くの機械学習アプリケーションで人気を集めている。これら3つの問題はネスト構造を共有しているが、既存の研究はしばしばそれらを別々に扱い、問題固有のアルゴリズムとその解析を開発している。様々な興味深い進展の中で、単純なSGD型の更新(場合によっては複数の変数に対するもの)は、このクラスのネスト問題を解くために今でも広く使われているが、非ネスト問題に比べて収束速度が遅いと考えられている。本稿では,確率的ネスト問題に対する複数のSGD型更新を1つのSGDアプローチに統合し,これをALternating Stochastic gradient dEscenT (ALSET) 法と呼ぶ。問題の隠れた滑らかさを利用することで,確率的ネスト問題に対するALSETのより厳密な解析を示す。新しい解析の下では、ネスト問題の$\epsilon$-定常点を達成するのに${\cal O}(\epsilon^{-2})$個のサンプルが必要である。一定の正則性条件の下で, 我々の結果を確率的合成, min-max, 強化学習の各問題に適用すると, それぞれの場合において既知の最良のサンプル複雑度を改善するか、それに一致する。我々の結果は、確率的ネスト問題における単純なSGD型アルゴリズムが、さらなる修正を必要とせずに実際に非常にうまく機能する理由を説明する。

Stochastic nested optimization, including stochastic compositional, min-max and bilevel optimization, is gaining popularity in many machine learning applications. While the three problems share the nested structure, existing works often treat them separately, and thus develop problem-specific algorithms and their analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) are still prevalent in solving this class of nested problems, but they are believed to have slower convergence rate compared to that of the non-nested problems. This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging the hidden smoothness of the problem, this paper presents a tighter analysis of ALSET for stochastic nested problems. Under the new analysis, to achieve an $\epsilon$-stationary point of the nested problem, it requires ${\cal O}(\epsilon^{-2})$ samples. Under certain regularity conditions, applying our results to stochastic compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases. Our results explain why simple SGD-type algorithms in stochastic nested problems all work very well in practice without the need for further modifications.

翻訳日:2021-06-28 13:12:50 公開日:2021-06-25

# 効率的なレアイベントシミュレーションのためのマルチフィデリティモデリングによるアクティブラーニング

Active Learning with Multifidelity Modeling for Efficient Rare Event Simulation ( http://arxiv.org/abs/2106.13790v1 )

ライセンス: Link先を確認

S. L. N. Dhulipala, M. D. Shields, B. W. Spencer, C. Bolisetti, A. E. Slaughter, V. M. Laboure, P. Chakroborty

(参考訳) マルチフィデリティモデリングは、計算コストの高いモデルで不確実性定量化を行うための費用対効果の高い方法を提供するが、問題の種類や複雑さ、結果に求められる精度に応じて、必要な高忠実度(HF)シミュレーションの数を適応的に決定することで、はるかに高い効率を達成できる。本稿では、希少事象の効率的な推定に重点を置いた、マルチフィデリティモデリングによる能動学習フレームワークを提案する。提案フレームワークは,低忠実度(LF)予測をHFから推定された補正と融合し,補正後のLF予測をフィルタリングして高忠実度モデルを呼び出すか否かを判断し,その後の精度を高めるためにHFモデルの呼び出しごとにLF予測の補正を適応させる。このフレームワークは、LFモデルの種類やHFモデルとの相関について、いかなる仮定も置かない。さらに,より小さな故障確率を推定する際の頑健性を向上させるために,HFモデルをいつ呼び出すかを決定する動的な能動学習関数の使用を提案する。我々は,いくつかの学術的なケーススタディと2つの有限要素(FE)モデルのケーススタディ、すなわちStokes近似を用いたNavier-Stokes速度の推定と,粗くメッシュ分割した等方性モデルを介した、変位を受ける横等方性モデルの応力推定を用いて、フレームワークを実証する。これらのケーススタディを通じて,提案フレームワークは故障確率を正確に推定しただけでなく,モンテカルロ法や標準的な分散低減法と比較して,HFモデルの呼び出し回数をごく一部しか必要としなかった。

While multifidelity modeling provides a cost-effective way to conduct uncertainty quantification with computationally expensive models, much greater efficiency can be achieved by adaptively deciding the number of required high-fidelity (HF) simulations, depending on the type and complexity of the problem and the desired accuracy in the results. We propose a framework for active learning with multifidelity modeling emphasizing the efficient estimation of rare events. Our framework works by fusing a low-fidelity (LF) prediction with an HF-inferred correction, filtering the corrected LF prediction to decide whether to call the high-fidelity model, and for enhanced subsequent accuracy, adapting the correction for the LF prediction after every HF model call. The framework does not make any assumptions as to the LF model type or its correlations with the HF model. In addition, for improved robustness when estimating smaller failure probabilities, we propose using dynamic active learning functions that decide when to call the HF model. We demonstrate our framework using several academic case studies and two finite element (FE) model case studies: estimating Navier-Stokes velocities using the Stokes approximation and estimating stresses in a transversely isotropic model subjected to displacements via a coarsely meshed isotropic model. Across these case studies, not only did the proposed framework estimate the failure probabilities accurately, but compared with either Monte Carlo or a standard variance reduction method, it also required only a small fraction of the calls to the HF model.

翻訳日:2021-06-28 13:12:25 公開日:2021-06-25

# Proxy Convexity: 勾配降下法で訓練されたニューラルネットワークの解析のための統一フレームワーク

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent ( http://arxiv.org/abs/2106.13792v1 )

ライセンス: Link先を確認

Spencer Frei and Quanquan Gu

(参考訳) ニューラルネットワークを学習するための最適化目的関数は高度に非凸であるが、勾配に基づく手法は実際にはニューラルネットワークの学習において大きな成功を収めている。この対比は、勾配降下法によって訓練されたニューラルネットワークの証明可能な保証に関する最近の多くの研究につながった。残念なことに、これらの研究の技法は、各設定で研究された問題に高度に特化していることが多く、分布、最適化パラメータ、ネットワークアーキテクチャに関する異なる仮定に依存しているため、異なる設定にまたがって一般化することが難しい。本稿では,ニューラルネットワークの訓練の解析のための統一的な非凸最適化フレームワークを提案する。プロキシ凸性(proxy convexity)とプロキシPolyak-Lojasiewicz(PL)不等式の概念を導入する。これらは、勾配法を用いる際に暗黙的に最小化されるプロキシ目的関数を元の目的関数が誘導する場合に満たされる。プロキシ凸性またはプロキシPL不等式を満たす目的関数に対する確率的勾配降下法(SGD)が, プロキシ目的関数に対する効率的な保証をもたらすことを示す。さらに,勾配降下法によって訓練されたニューラルネットworkに対する既存の保証の多くが,プロキシ凸性とプロキシPL不等式によって統一できることを示す。

Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the problem studied in each setting, relying on different assumptions on the distribution, optimization parameters, and network architectures, making it difficult to generalize across different settings. In this work, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.
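
For readers unfamiliar with the terminology, the classical (non-proxy) Polyak-Lojasiewicz inequality and its standard consequence for gradient descent are recalled below. The symbols $f$, $f^*$, $\mu$, $L$, and $w_t$ are notation chosen here for illustration; the abstract does not state the exact form of the proxy variants, so they are not reproduced.

```latex
% Classical PL inequality for a differentiable f with infimum f^* and some \mu > 0:
\frac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\ge\; \mu\,\bigl(f(w) - f^{*}\bigr)
\qquad \text{for all } w .

% If f is also L-smooth, gradient descent with step size 1/L converges linearly:
f(w_{t}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{t}\,\bigl(f(w_{0}) - f^{*}\bigr) .
```

The point of the classical inequality is that it yields a linear rate without requiring convexity. The abstract describes the proxy versions as conditions under which guarantees are obtained for a proxy objective rather than for the original one.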

翻訳日:2021-06-28 13:11:58 公開日:2021-06-25

# 自己学習は混合モデルにおいて弱学習器を強学習器に変換する

Self-training Converts Weak Learners to Strong Learners in Mixture Models ( http://arxiv.org/abs/2106.13805v1 )

ライセンス: Link先を確認

Spencer Frei and Difan Zou and Zixiang Chen and Quanquan Gu

(参考訳) 本研究では, データが2つの等方的分布の混合から得られる場合の二値分類問題を考える。これらの分布は、対数凹分布などが持つ集中性および反集中性の性質を満たすものとする。普遍定数 $C_{\mathrm{err}}>0$ が存在して、擬似ラベラー $\boldsymbol{\beta}_{\mathrm{pl}}$ が高々 $C_{\mathrm{err}}$ の分類誤差を達成できるならば、任意の $\varepsilon>0$ に対し、$\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ で初期化され、擬似ラベル $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ と高々 $\tilde O(d/\varepsilon^2)$ 個のラベルなし例を用いる反復的な自己学習アルゴリズムによって、ベイズ最適分類器を誤差 $\varepsilon$ まで学習できることを示す。ここで $d$ は周囲空間の次元である。すなわち、自己学習は、ラベルなし例のみを用いて弱学習器を強学習器に変換する。さらに、ロジスティック損失に対して勾配降下法を実行することで、$O(d)$ 個のラベル付き例のみを用いて(すなわち $\varepsilon$ とは無関係に)分類誤差 $C_{\mathrm{err}}$ を持つ擬似ラベラー $\boldsymbol{\beta}_{\mathrm{pl}}$ が得られることを示す。これらの結果を合わせると,半教師付き自己学習アルゴリズムにより,高々 $O(d)$ 個のラベル付き例と $\tilde O(d/\varepsilon^2)$ 個のラベルなし例を用いて,混合モデルをベイズ最適精度から $\varepsilon$ 以内で学習できることが示唆される。

We consider a binary classification problem when the data comes from a mixture of two isotropic distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
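
The self-training loop described above can be sketched on a toy problem. The code below assumes a balanced mixture of two unit-variance Gaussians with means $\pm\mu$, starts from a deliberately misaligned pseudolabeler, and repeatedly takes a gradient step on the logistic loss of the pseudolabelled margins using fresh unlabeled batches. The renormalization of $\boldsymbol{\beta}$ after each step, the step size, the batch size, and all function names are assumptions for illustration; the abstract does not specify these details of the authors' algorithm.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def sample(rng, mu):
    """Draw (x, y) from a balanced mixture of N(+mu, I) and N(-mu, I)."""
    y = 1 if rng.random() < 0.5 else -1
    x = [y * m + rng.gauss(0.0, 1.0) for m in mu]
    return x, y


def self_train(beta, unlabeled, steps, batch, lr):
    """Gradient descent on the logistic loss of pseudolabelled margins.

    Each step uses a fresh batch, labels it with sgn(<beta, x>),
    and renormalises beta afterwards (illustrative choice).
    """
    pos = 0
    for _ in range(steps):
        descent = [0.0] * len(beta)
        for x in unlabeled[pos:pos + batch]:
            margin = dot(beta, x)
            y_hat = 1.0 if margin >= 0 else -1.0          # pseudolabel
            weight = 1.0 / (1.0 + math.exp(abs(margin)))  # |loss'(t)| at t = |margin|
            for j, xj in enumerate(x):
                descent[j] += y_hat * weight * xj / batch  # negative gradient
        beta = normalize([b + lr * g for b, g in zip(beta, descent)])
        pos += batch
    return beta


def error_rate(beta, labeled):
    """Fraction of labelled points that sgn(<beta, x>) gets wrong."""
    wrong = sum(1 for x, y in labeled if y * dot(beta, x) <= 0)
    return wrong / len(labeled)


rng = random.Random(0)
mu = [2.0, 0.0, 0.0, 0.0, 0.0]
beta_pl = normalize([1.0, 1.0, 0.0, 0.0, 0.0])  # weak pseudolabeler, 45 degrees off
unlabeled = [sample(rng, mu)[0] for _ in range(20000)]
test_set = [sample(rng, mu) for _ in range(5000)]
beta_final = self_train(beta_pl, unlabeled, steps=40, batch=500, lr=0.5)
```

In this toy setting the labels in `test_set` are used only for evaluation via `error_rate`; training touches unlabeled points alone, mirroring the claim that the improvement from weak to strong learner needs no further labels.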

翻訳日:2021-06-28 13:11:38 公開日:2021-06-25

# 調整されたプロトタイプニューラルネットワークを用いた緑内障グレーディングのための乳頭周囲OCTに焦点を当てたハイブリッド学習

Circumpapillary OCT-Focused Hybrid Learning for Glaucoma Grading Using Tailored Prototypical Neural Networks ( http://arxiv.org/abs/2106.13551v1 )

ライセンス: Link先を確認

Gabriel Garc\'ia, Roc\'io del Amor, Adri\'an Colomer, Rafael Verd\'u-Monedero, Juan Morales-S\'anchez and Valery Naranjo

(参考訳) 緑内障は世界的に失明の主要な原因の1つであり、光コヒーレンストモグラフィー(OCT)はその検出に欠かせない画像技術である。緑内障の検出に焦点をあてた最先端研究の多くとは異なり,本論文では,生の乳頭周囲Bスキャンを用いた緑内障グレーディングのための新しい枠組みを初めて提案する。特に,手作りの特徴に基づくアルゴリズムと深層学習アルゴリズムを組み合わせた新しいOCTベースのハイブリッドネットワークを構築した。網膜神経線維層(RNFL)に関連する手作りの特徴を抽出するために,OCT特有の記述子を提案する。並行して、潜在空間の自動的特徴を洗練させるために、調整された残差モジュールと注意モジュールを含むスキップ接続を用いた革新的なCNNを開発した。提案アーキテクチャは,静的および動的プロトタイプネットワークに基づく新しいfew-shot学習を行うためのバックボーンとして使用される。 k-shotパラダイムを再定義することで、健常、早期、進行期の緑内障サンプルの識別を大幅に改善する教師ありのエンドツーエンドシステムを実現している。動的プロトタイプネットワークの訓練と評価は、Heidelberg Spectralisシステムで取得された2つのデータベースを統合して行った。検証およびテストの結果、緑内障グレーディングのカテゴリ精度はそれぞれ 0.9459 と 0.8788 に達した。さらに,提案モデルが緑内障検出において報告した高い性能は特筆に値する。ヒートマップはRNFLを緑内障診断に最も関連性の高い構造として示しており, クラスアクティベーションマップから得られた知見は臨床医の見解と直接一致している。

Glaucoma is one of the leading causes of blindness worldwide and Optical Coherence Tomography (OCT) is the quintessential imaging technique for its detection. Unlike most of the state-of-the-art studies focused on glaucoma detection, in this paper, we propose, for the first time, a novel framework for glaucoma grading using raw circumpapillary B-scans. In particular, we set out a new OCT-based hybrid network which combines hand-driven and deep learning algorithms. An OCT-specific descriptor is proposed to extract hand-crafted features related to the retinal nerve fibre layer (RNFL). In parallel, an innovative CNN is developed using skip-connections to include tailored residual and attention modules to refine the automatic features of the latent space. The proposed architecture is used as a backbone to conduct a novel few-shot learning based on static and dynamic prototypical networks. The k-shot paradigm is redefined giving rise to a supervised end-to-end system which provides substantial improvements discriminating between healthy, early and advanced glaucoma samples. The training and evaluation processes of the dynamic prototypical network are addressed from two fused databases acquired via Heidelberg Spectralis system. Validation and testing results reach a categorical accuracy of 0.9459 and 0.8788 for glaucoma grading, respectively. Besides, the high performance reported by the proposed model for glaucoma detection deserves a special mention. The findings from the class activation maps are directly in line with the clinicians' opinion since the heatmaps pointed out the RNFL as the most relevant structure for glaucoma diagnosis.

翻訳日:2021-06-28 13:10:36 公開日:2021-06-25

# 病理組織像を用いた膀胱癌グレーディングのための新しい自己学習フレームワーク

A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images ( http://arxiv.org/abs/2106.13559v1 )

ライセンス: Link先を確認

Gabriel Garc\'ia, Anna Esteve, Adri\'an Colomer, David Ramos and Valery Naranjo

(参考訳) 近年,膀胱癌の罹患率と死亡率は大幅に増加している。現在、腫瘍の増殖に基づいて、筋層非浸潤性膀胱癌(NMIBC)と筋層浸潤性膀胱癌(MIBC)の2つのサブタイプが知られている。本研究では,MIBCサブタイプに焦点をあてる。これは予後が最も悪く,隣接する臓器に広がる可能性があるためである。免疫組織化学的手法により染色された組織像から膀胱癌をグレーディングする自己学習フレームワークを提案する。具体的には、文献で確立されたパターンに従って、組織学的パッチを疾患の異なる重症度レベルに分類できる新しいDeep Convolutional Embedded Attention Clustering (DCEAC)を提案する。提案するDCEACモデルは,512×512ピクセルの高解像度サンプルから非腫瘍,軽度,浸潤パターンを識別するために、2段階の完全な教師なし学習手法に従う。本システムは,分類段階の前に潜在空間の特徴を洗練できる畳み込み注意モジュールを組み込むことで,従来のクラスタリングベースの手法を上回る。提案ネットワークは複数の指標で最先端のアプローチを2-3%上回り、マルチクラスのシナリオで 0.9034 という最終的な平均精度を達成している。さらに、報告されたクラスアクティベーションマップは、我々のモデルが、事前のアノテーション作業を必要とせずに、臨床医が重要と考えるのと同じパターンを自ら学習できることを示している。この事実は、筋層浸潤性膀胱癌のグレーディングにおけるブレークスルーであり、ラベル付きデータでモデルを訓練する場合とのギャップを埋めるものである。

Recently, bladder cancer has been significantly increased in terms of incidence and mortality. Currently, two subtypes are known based on tumour growth: non-muscle invasive (NMIBC) and muscle-invasive bladder cancer (MIBC). In this work, we focus on the MIBC subtype because it is of the worst prognosis and can spread to adjacent organs. We present a self-learning framework to grade bladder cancer from histological images stained via immunohistochemical techniques. Specifically, we propose a novel Deep Convolutional Embedded Attention Clustering (DCEAC) which allows classifying histological patches into different severity levels of the disease, according to the patterns established in the literature. The proposed DCEAC model follows a two-step fully unsupervised learning methodology to discern between non-tumour, mild and infiltrative patterns from high-resolution samples of 512x512 pixels. Our system outperforms previous clustering-based methods by including a convolutional attention module, which allows refining the features of the latent space before the classification stage. The proposed network exceeds state-of-the-art approaches by 2-3% across different metrics, achieving a final average accuracy of 0.9034 in a multi-class scenario. Furthermore, the reported class activation maps evidence that our model is able to learn by itself the same patterns that clinicians consider relevant, without incurring prior annotation steps. This fact supposes a breakthrough in muscle-invasive bladder cancer grading which bridges the gap with respect to train the model on labelled data.

翻訳日:2021-06-28 13:10:12 公開日:2021-06-25

# トランスフォーマー時代の形状レジストレーション

Shape registration in the time of transformers ( http://arxiv.org/abs/2106.13679v1 )

ライセンス: Link先を確認

Giovanni Trappolini, Luca Cosmo, Luca Moschella, Riccardo Marin, Emanuele Rodol\`a

(参考訳) 本稿では,非剛体3次元点群の効率的なレジストレーションのためのトランスフォーマーに基づく手法を提案する。提案手法はデータ駆動型であり、レジストレーションタスクにおいて初めてトランスフォーマーアーキテクチャを採用する。本手法は汎用的であり、異なる設定に適用できる。いくつかの望ましい特性(例えばスキニングウェイトやその他のアニメーション用の手がかり)を持つ固定テンプレートが与えられた場合、取得した生データをそのテンプレートにレジストレーションすることで、すべてのテンプレート特性を入力ジオメトリに転送できる。あるいは、一対の形状が与えられた場合、本手法は第1の形状を第2の形状に(あるいはその逆に)レジストレーションし、2つの間の高品質な密な対応を得ることができる。どちらの文脈においても、結果の品質は、テクスチャ転送や形状補間といった実際の応用を対象にできる水準にある。さらに,表面の基礎となる密度の推定を含めることで,学習プロセスが容易になることを示す。このアーキテクチャの潜在能力を活用することで、正解対応の疎な集合(全点の$10\sim20\%$)のみを必要とする形でモデルを訓練できる。提案するモデルと我々が行った解析は、レジストレーションおよびマッチング応用のためのトランスフォーマーベースのアーキテクチャの今後の探究への道を開く。定性的および定量的な評価により,変形可能で順序付けられていない3Dデータのレジストレーションにおいて、異なるデータセットとシナリオで我々のパイプラインが最先端の手法を上回ることが示された。

In this paper, we propose a transformer-based procedure for the efficient registration of non-rigid 3D point clouds. The proposed approach is data-driven and adopts for the first time the transformer architecture in the registration task. Our method is general and applies to different settings. Given a fixed template with some desired properties (e.g. skinning weights or other animation cues), we can register raw acquired data to it, thereby transferring all the template properties to the input geometry. Alternatively, given a pair of shapes, our method can register the first onto the second (or vice-versa), obtaining a high-quality dense correspondence between the two. In both contexts, the quality of our results enables us to target real applications such as texture transfer and shape interpolation. Furthermore, we also show that including an estimation of the underlying density of the surface eases the learning process. By exploiting the potential of this architecture, we can train our model requiring only a sparse set of ground truth correspondences ($10\sim20\%$ of the total points). The proposed model and the analysis that we perform pave the way for future exploration of transformer-based architectures for registration and matching applications. Qualitative and quantitative evaluations demonstrate that our pipeline outperforms state-of-the-art methods for deformable and unordered 3D data registration on different datasets and scenarios.

翻訳日:2021-06-28 13:09:46 公開日:2021-06-25

# 計算病理学のセマンティックアノテーション:多分野経験とベストプラクティス勧告

Semantic annotation for computational pathology: Multidisciplinary experience and best practice recommendations ( http://arxiv.org/abs/2106.13689v1 )

ライセンス: Link先を確認

Noorul Wahab, Islam M Miligy, Katherine Dodd, Harvir Sahota, Michael Toss, Wenqi Lu, Mostafa Jahanifar, Mohsin Bilal, Simon Graham, Young Park, Giorgos Hadjigeorghiou, Abhir Bhalerao, Ayat Lashen, Asmaa Ibrahim, Ayaka Katayama, Henry O Ebili, Matthew Parkin, Tom Sorell, Shan E Ahmed Raza, Emily Hero, Hesham Eldaly, Yee Wah Tsang, Kishore Gopalakrishnan, David Snead, Emad Rakha, Nasir Rajpoot, Fayyaz Minhas

(参考訳) ホールスライドイメージング(WSI)技術の最近の進歩は、コンピュータビジョンと人工知能(AI)に基づく無数の診断、予後、予測アルゴリズムの開発につながった。 CPath(Computational Pathology)は、病理学のWSIに埋め込まれた情報を、視覚的評価で得られる以上に活用するための統合されたソリューションを提供する。 WSIの自動分析と機械学習(ML)モデルの検証には、スライド、組織、細胞の各レベルでのアノテーションが必要である。病理画像における重要な視覚構成物のアノテーションはCPathプロジェクトの重要な構成要素である。不適切なアノテーションは解釈が難しいアルゴリズムにつながり、不正確で一貫性のない結果を生み出す可能性がある。 CPathプロジェクトにおけるアノテーションの重要な役割にもかかわらず、アノテーションの実施方法に関する明確なガイドラインやベストプラクティスは存在しない。本稿では,病理学者,ML専門家,研究者からなる多分野チームによる大規模アノテーション演習の実施中に得られた経験とベストプラクティスを,PathLAKE(Pathology image data Lake for Analytics, Knowledge and Education)コンソーシアムの一部として提示することで,この問題に対処する。本稿では,様々な種類のアノテーション,診断アルゴリズム,アノテーションデータ辞書,アノテーション構成物の例とともに実世界のケーススタディを示す。この研究で報告された分析は、CPathプロジェクトのライフサイクルを通じてアノテーションガイドラインとして使用できるベストプラクティスの推奨を強調している。

Recent advances in whole slide imaging (WSI) technology have led to the development of a myriad of computer vision and artificial intelligence (AI) based diagnostic, prognostic, and predictive algorithms. Computational Pathology (CPath) offers an integrated solution to utilize information embedded in pathology WSIs beyond what we obtain through visual assessment. For automated analysis of WSIs and validation of machine learning (ML) models, annotations at the slide, tissue and cellular levels are required. The annotation of important visual constructs in pathology images is an important component of CPath projects. Improper annotations can result in algorithms which are hard to interpret and can potentially produce inaccurate and inconsistent results. Despite the crucial role of annotations in CPath projects, there are no well-defined guidelines or best practices on how annotations should be carried out. In this paper, we address this shortcoming by presenting the experience and best practices acquired during the execution of a large-scale annotation exercise involving a multidisciplinary team of pathologists, ML experts and researchers as part of the Pathology image data Lake for Analytics, Knowledge and Education (PathLAKE) consortium. We present a real-world case study along with examples of different types of annotations, diagnostic algorithm, annotation data dictionary and annotation constructs. The analyses reported in this work highlight best practice recommendations that can be used as annotation guidelines over the lifecycle of a CPath project.

翻訳日:2021-06-28 13:09:27 公開日:2021-06-25

# JNLPチーム:COLIEE 2021における法律処理タスクのためのディープラーニングアプローチ

JNLP Team: Deep Learning Approaches for Legal Processing Tasks in COLIEE 2021 ( http://arxiv.org/abs/2106.13405v1 )

ライセンス: Link先を確認

Ha-Thanh Nguyen, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Vu Tran, Minh Le Nguyen, Ken Satoh

(参考訳) COLIEEは、自動コンピュータ化された法律テキスト処理における毎年のコンペティションである。自動法的文書処理は野心的な目標であり、法律の構造と意味論は日常言語よりもはるかに複雑であることが多い。本稿では,法律文書処理における深層学習の方法と実験結果について調査・報告する。結果は、この一連のアプローチの難しさと可能性を示している。

COLIEE is an annual competition in automatic computerized legal text processing. Automatic legal document processing is an ambitious goal, and the structure and semantics of the law are often far more complex than everyday language. In this article, we survey and report our methods and experimental results in using deep learning in legal document processing. The results show both the difficulties and the potential of this family of approaches.

翻訳日:2021-06-28 13:09:01 公開日:2021-06-25

# アムハラ語用マニュアルアノテーション付きスペル誤りコーパス

Manually Annotated Spelling Error Corpus for Amharic ( http://arxiv.org/abs/2106.13521v1 )

ライセンス: Link先を確認

Andargachew Mekonnen Gezmu, Tirufat Tesifaye Lema, Binyam Ephrem Seyoum, Andreas Nürnberger

(参考訳) 本稿では,エチオピアのリンガ・フランカ(共通語)であるアムハラ語(Amharic)に対する手動アノテーション付きスペル誤りコーパスを提案する。コーパスはスペル誤りの検出と修正の評価に使用されるように設計されている。ミススペルは非単語エラーと実単語エラーとしてタグ付けされる。さらに、コーパスで利用可能なコンテキスト情報は、両方の種類のスペル誤りを扱うのに役立つ。

This paper presents a manually annotated spelling error corpus for Amharic, the lingua franca of Ethiopia. The corpus is designed to be used for the evaluation of spelling error detection and correction. The misspellings are tagged as non-word and real-word errors. In addition, the contextual information available in the corpus makes it useful in dealing with both types of spelling errors.

翻訳日:2021-06-28 13:08:56 公開日:2021-06-25

# ELECTRA事前学習のための置換サンプリングの学習

Learning to Sample Replacements for ELECTRA Pre-Training ( http://arxiv.org/abs/2106.13715v1 )

ライセンス: Link先を確認

Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei

(参考訳) ELECTRAは、置換トークンを検出するために識別器を事前訓練し、置換はマスク付き言語モデリングで訓練されたジェネレータからサンプリングされる。優れた性能にもかかわらず、ELECTRAは以下の2つの問題に悩まされている。まず、識別器からジェネレータへの直接フィードバックループはなく、置換サンプリングが非効率になる。第二に、ジェネレータの予測はトレーニングが進むにつれて過信される傾向があり、置換は正しいトークンに偏る。本稿では,ELECTRA事前学習のための置換サンプリングを改善する2つの手法を提案する。具体的には,識別器がまだ獲得していないものを学習するようジェネレータが促せるように,難易度(hardness)予測機構によってサンプリングを拡張する。また,効率的なサンプリングが識別器のトレーニング分散を減少させることを証明する。さらに,置換として正しいトークンがオーバーサンプリングされることを緩和するために,ジェネレータに焦点損失(focal loss)を用いることを提案する。実験の結果,提案手法は様々な下流タスクにおけるELECTRA事前学習を改善することがわかった。

ELECTRA pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling. Despite its compelling performance, ELECTRA suffers from the following two issues. First, there is no direct feedback loop from discriminator to generator, which renders replacement sampling inefficient. Second, the generator's prediction tends to become over-confident as training progresses, biasing replacements toward correct tokens. In this paper, we propose two methods to improve replacement sampling for ELECTRA pre-training. Specifically, we augment sampling with a hardness prediction mechanism, so that the generator can encourage the discriminator to learn what it has not acquired. We also prove that efficient sampling reduces the training variance of the discriminator. Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements. Experimental results show that our method improves ELECTRA pre-training on various downstream tasks.
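
The abstract does not give the exact loss, so the following is only a minimal sketch of the standard focal loss, assuming the form `(1 - p)^gamma * (-log p)` for the probability `p` the generator assigns to the correct token. The function names and the default `gamma` are illustrative assumptions, not taken from the paper.

```python
import math

def cross_entropy(p_correct: float) -> float:
    """Standard negative log-likelihood of the correct token."""
    return -math.log(p_correct)

def focal_loss(p_correct: float, gamma: float = 2.0) -> float:
    """Focal loss: down-weights tokens the generator already predicts confidently.

    gamma is a hypothetical focusing parameter; gamma = 0 recovers cross-entropy.
    """
    return ((1.0 - p_correct) ** gamma) * cross_entropy(p_correct)

# Compare the two losses for an uncertain, a neutral, and a confident prediction.
for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  CE={cross_entropy(p):.4f}  focal={focal_loss(p):.4f}")
```

With this form, a confidently predicted token contributes far less to the loss than under plain cross-entropy, which is one way an over-confident generator could be discouraged from sampling the correct token as its own replacement.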

翻訳日:2021-06-28 13:08:49 公開日:2021-06-25

# DeltaLM: 事前訓練された多言語エンコーダの拡張による言語生成と翻訳のためのエンコーダデコーダ事前学習

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders ( http://arxiv.org/abs/2106.13736v1 )

ライセンス: Link先を確認

Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

(参考訳) プリトレーニングエンコーダは、様々な自然言語理解(nlu)タスクで成功を収めているが、これらのプリトレーニングエンコーダと自然言語生成(nlg)の間にはギャップがある。 nlgタスクはしばしばエンコーダ/デコーダフレームワークに基づいており、プリトレーニングされたエンコーダはその一部しか役に立たない。このギャップを減らすために,本モデルでは,デコーダを既訓練の既訓練エンコーダのタスク層とみなす,事前訓練された多言語エンコーダ-デコーダモデルであるDeltaLMを導入する。具体的には,事前学習した多言語エンコーダをデコーダで拡張し,自己指導型で事前学習する。大規模単言語データとバイリンガルデータの両方を活用するために,スパン破壊と翻訳スパン破壊を事前学習タスクとして採用する。実験により、DeltaLMは、機械翻訳、抽象テキスト要約、データ・トゥ・テキスト、質問生成など、自然言語生成と翻訳タスクの両方において、様々な強力なベースラインを上回ります。

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation.
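
As a rough illustration of the two pre-training tasks, the sketch below replaces chosen spans with sentinel tokens and emits the removed text as the target. The sentinel naming (`<extra_id_N>`, borrowed from T5), the hand-picked spans, and the treatment of translation span corruption as corruption of a concatenated sentence pair are all assumptions made for the example; the abstract does not specify these details.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel; return (input, target).

    spans must be sorted, non-overlapping, half-open index ranges.
    """
    corrupted, target = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"  # T5-style sentinel name, assumed here
        corrupted.extend(tokens[cursor:start])  # keep text before the span
        corrupted.append(sentinel)              # mask the span in the input
        target.append(sentinel)                 # the target lists each sentinel...
        target.extend(tokens[start:end])        # ...followed by the removed tokens
        cursor = end
    corrupted.extend(tokens[cursor:])
    return corrupted, target

# Span corruption on monolingual text.
src = "the cat sat on the mat".split()
print(span_corrupt(src, [(1, 3)]))

# Translation span corruption, sketched as corruption of a concatenated pair.
pair = "the cat sat".split() + "le chat est assis".split()
print(span_corrupt(pair, [(1, 2), (4, 5)]))
```

In practice the spans would be sampled at random rather than fixed by hand; they are fixed here only to keep the example deterministic.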

翻訳日:2021-06-28 13:08:35 公開日:2021-06-25

# エネルギーベースの生成的協調サリエンシー予測

Energy-Based Generative Cooperative Saliency Prediction ( http://arxiv.org/abs/2106.13389v1 )

ライセンス: Link先を確認

Jing Zhang and Jianwen Xie and Zilong Zheng and Nick Barnes

(参考訳) 従来のサリエンシー予測モデルは、通常、画像から対応する正解サリエンシーマップへの決定論的マッピングを学習する。本稿では,画像が与えられたときのサリエンシーマップ上の条件付き確率分布を学習し,予測をサンプリングプロセスとして扱うことにより,生成モデルの観点からサリエンシー予測問題を検討する。具体的には,条件付き潜在変数モデルと条件付きエネルギーベースモデルとを共同で訓練し,協調的にサリエンシーを予測する,生成協調ネットワークに基づく生成的協調サリエンシー予測フレームワークを提案する。私たちはこのモデルをSalCoopNetsと呼ぶ。潜在変数モデルは、高速だが粗い予測器として機能して初期予測を効率的に生成し、その予測は、精密な予測器として機能するエネルギーベースモデルの反復的なランジュバン修正によって洗練される。このような粗から密への協調的サリエンシー予測戦略は、両方の長所を提供する。さらに,回復しながら協調学習を行う戦略を提案することによって,トレーニング画像のサリエンシーアノテーションが部分的にしか観測されない弱教師付きサリエンシー予測のシナリオにフレームワークを一般化する。最後に,学習したエネルギー関数が,事前学習済みの他のサリエンシー予測モデルの結果を洗練する改良モジュールとして機能することを示す。実験の結果, 本生成モデルが最先端の性能を達成できることが示された。コードは https://github.com/JingZhang617/SalCoopNets で公開されている。

Conventional saliency prediction models typically learn a deterministic mapping from images to the corresponding ground truth saliency maps. In this paper, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over saliency maps given an image, and treating the prediction as a sampling process. Specifically, we propose a generative cooperative saliency prediction framework based on generative cooperative networks, where a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. We call our model SalCoopNets. The latent variable model serves as a fast but coarse predictor to efficiently produce an initial prediction, which is then refined by the iterative Langevin revision of the energy-based model that serves as a fine predictor. Such a coarse-to-fine cooperative saliency prediction strategy offers the best of both worlds. Moreover, we generalize our framework to the scenario of weakly supervised saliency prediction, where the saliency annotation of training images is only partially observed, by proposing a cooperative learning-while-recovering strategy. Lastly, we show that the learned energy function can serve as a refinement module that refines the results of other pre-trained saliency prediction models. Experimental results show that our generative model can achieve state-of-the-art performance. Our code is publicly available at https://github.com/JingZhang617/SalCoopNets.
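
The coarse-to-fine step can be pictured with a generic Langevin update, `y <- y - (step^2 / 2) * grad E(y) + step * eps`. The sketch below is a toy version under stated assumptions: the quadratic energy, the helper name `langevin_refine`, and the `noise_scale` switch are hypothetical stand-ins, and the real model would use a learned, image-conditioned energy network.

```python
import random

def langevin_refine(y0, grad_energy, step=0.1, n_steps=50, noise_scale=1.0, seed=0):
    """Refine an initial prediction y0 by noisy gradient descent on an energy.

    Each step applies y <- y - (step**2 / 2) * grad E(y) + noise_scale * step * eps,
    with eps drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    y = list(y0)
    for _ in range(n_steps):
        g = grad_energy(y)
        y = [yi - 0.5 * step ** 2 * gi + noise_scale * step * rng.gauss(0.0, 1.0)
             for yi, gi in zip(y, g)]
    return y

# Toy energy E(y) = 0.5 * ||y - target||^2, so grad E(y) = y - target.
target = [0.2, 0.8, 0.5]

def grad(y):
    return [yi - ti for yi, ti in zip(y, target)]

coarse = [0.0, 0.0, 0.0]  # stands in for the fast latent-variable prediction
# noise_scale=0.0 turns the sampler into plain gradient descent for a repeatable demo.
refined = langevin_refine(coarse, grad, step=1.0, n_steps=30, noise_scale=0.0)
print(refined)
```

Setting `noise_scale` back to 1.0 gives the stochastic version, in which repeated runs with different seeds produce different refined samples around the low-energy region.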

翻訳日:2021-06-28 13:07:55 公開日:2021-06-25

# マルチタスク視覚学習のための生成モデル

Generative Modeling for Multi-task Visual Learning ( http://arxiv.org/abs/2106.13409v1 )

ライセンス: Link先を確認

Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

(参考訳) 生成モデリングはコンピュータビジョンにおいて非常に有望であるが、主に視覚的にリアルなイメージの合成に焦点を当てている。本稿では,共有可能な特徴表現のマルチタスク学習をモチベーションとして,様々な視覚的タスクにおいて有用な共有生成モデルを学ぶという,新たな課題について考察する。そこで本研究では,識別型マルチタスクネットワークと生成ネットワークを結合した汎用マルチタスク指向生成モデリング(mgm)フレームワークを提案する。 RGB画像と画素レベルのアノテーションの両方をマルチタスクシナリオで合成することは難しいが、我々のフレームワークは、弱いアノテーション(画像レベルのシーンラベル)のみをペアにした合成画像を使用することで、複数の視覚的タスクを容易にすることができる。 NYUv2やTaskonomyなど、挑戦的なマルチタスクベンチマークに関する実験的評価は、我々のMGMフレームワークがすべてのタスクのパフォーマンスを大きなマージンで改善し、一貫して最先端のマルチタスクアプローチよりも優れています。

Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider a novel problem of learning a shared generative model that is useful across various visual perception tasks. Correspondingly, we propose a general multi-task oriented generative modeling (MGM) framework, by coupling a discriminative multi-task network with a generative network. While it is challenging to synthesize both RGB images and pixel-level annotations in multi-task scenarios, our framework enables us to use synthesized images paired with only weak annotations (i.e., image-level scene labels) to facilitate multiple visual tasks. Experimental evaluation on challenging multi-task benchmarks, including NYUv2 and Taskonomy, demonstrates that our MGM framework improves the performance of all the tasks by large margins, consistently outperforming state-of-the-art multi-task approaches.

翻訳日:2021-06-28 13:07:33 公開日:2021-06-25

# ビデオ質問応答のための階層的オブジェクト指向時空間推論

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering ( http://arxiv.org/abs/2106.13432v1 )

ライセンス: Link先を確認

Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran

(参考訳) Video Question Answering(ビデオQA)は新しいAI機能を開発するための強力なテストベッドである。このタスクは、時空における視覚ドメインと言語ドメイン間のオブジェクト、関係、イベントの推論を学ぶ必要がある。高レベルの推論は、連想的な視覚的パターン認識から、オブジェクトに対するシンボルのような操作、その振る舞いと相互作用への要求を軽減します。この目標を達成するために,映像を相互作用するオブジェクトの動的ストリームとして抽象化するオブジェクト指向推論手法を提案する。ビデオイベントフローの各段階で、これらのオブジェクトは相互に相互作用し、それらの相互作用は、クエリおよびビデオの全体的なコンテキストの下で、推論される。このメカニズムは汎用神経ユニットのファミリーと階層的オブジェクト指向時空間推論(HOSTR)ネットワークと呼ばれる多層アーキテクチャに実体化されている。このニューラルモデルは、階層的にネストされた時空間グラフの形で、オブジェクトの一貫したライフラインを維持する。このグラフ内では、動的インタラクティブなオブジェクト指向表現がビデオシーケンスに沿って構築され、階層的にボトムアップ的に抽象化され、正しい回答のキー情報に収束する。この手法は、複数の主要なビデオQAデータセットで評価され、これらのタスクに新しい最先端技術を確立する。モデルの振る舞いの分析は、オブジェクト指向推論がビデオQAに対する信頼性、解釈可能、効率的なアプローチであることを示している。

Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their behavior and interactions. Toward this goal, we propose an object-oriented reasoning approach in which video is abstracted as a dynamic stream of interacting objects. At each stage of the video event flow, these objects interact with each other, and their interactions are reasoned about with respect to the query and under the overall context of the video. This mechanism is materialized into a family of general-purpose neural units and their multi-level architecture called Hierarchical Object-oriented Spatio-Temporal Reasoning (HOSTR) networks. This neural model maintains the objects' consistent lifelines in the form of a hierarchically nested spatio-temporal graph. Within this graph, the dynamic interactive object-oriented representations are built up along the video sequence, hierarchically abstracted in a bottom-up manner, and converge toward the key information for the correct answer. The method is evaluated on multiple major Video QA datasets and establishes a new state of the art on these tasks. Analysis of the model's behavior indicates that object-oriented reasoning is a reliable, interpretable and efficient approach to Video QA.

翻訳日:2021-06-28 13:07:17 公開日:2021-06-25

# NP-DRAW:画像生成のための非パラメータ構造潜在変数モデル

NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation ( http://arxiv.org/abs/2106.13435v1 )

ライセンス: Link先を確認

Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao

(参考訳) 本稿では、NP-DRAWと呼ばれる画像生成のための非パラメトリック構造化潜在変数モデルを提案する。このモデルは、潜在キャンバス上に部分ごとに順次描画し、その後キャンバスから画像を復号する。主な貢献は以下の通りである。 1)ステップ毎の潜在変数「what-to-draw」がカテゴリ確率変数となるように,画像部分の外観に関する非パラメトリック事前分布を提案する。これにより表現性が向上し、文献で使用されるガウス分布と比較して学習が大幅に容易になる。 2)部品の逐次依存構造をトランスフォーマーによってモデル化する。これは文献で使用されるRNNよりも強力で訓練しやすい。 3) 事前分布を事前学習するための効果的なヒューリスティック解析アルゴリズムを提案する。 MNIST,Omniglot,CIFAR-10,CelebAによる実験により,本手法は従来のDRAWやAIRなどの構造化画像モデルよりも大幅に優れており,他の汎用生成モデルと競合することを示す。さらに,本モデル固有の構成性や解釈性は,低データ学習や潜在空間編集において大きなメリットをもたらすことを示す。コードは https://github.com/ZENGXH/NPDRAW で入手できる。

In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows. 1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per step becomes a categorical random variable. This improves the expressiveness and greatly eases the learning compared to Gaussians used in the literature. 2) We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train compared to RNNs used in the literature. 3) We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms previous structured image models like DRAW and AIR and is competitive with other generic generative models. Moreover, we show that our model's inherent compositionality and interpretability bring significant benefits in the low-data learning regime and latent space editing. Code is available at https://github.com/ZENGXH/NPDRAW.

翻訳日:2021-06-28 13:06:58 公開日:2021-06-25

# モダリティの探索:視覚言語事前学習のための自己注意型視覚解析

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training ( http://arxiv.org/abs/2106.13488v1 )

ライセンス: Link先を確認

Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

(参考訳) Vision-Language Pre-Training (VLP)は、画像テキストペアからマルチモーダル表現を学習し、微調整で下流の視覚言語タスクに役立てることを目的としている。支配的なVLPモデルはCNN-Transformerアーキテクチャを採用し、CNNにイメージを埋め込んで、画像とテキストをTransformerにアライメントする。視覚コンテンツ間の視覚的関係は画像理解において重要な役割を担い、モーダル間アライメント学習の基礎となる。しかしながら、cnnは、長距離依存関係のモデリングにおける局所受容野の弱さのため、視覚関係学習に制限がある。したがって、視覚関係とモーダル間アライメントの2つの目的は同じトランスフォーマーネットワークにカプセル化される。このような設計は、各目的の特殊特性を無視してトランスフォーマーにおけるモーダル間アライメント学習を制限する可能性がある。そこで本研究では,視覚関係をよりよく学習し,モーダル間アライメントを促進するために,VLPのためのフルトランスフォーマー視覚埋め込みを提案する。具体的には、視覚と言語モダリティ(モダリティ間)の相互作用を測定するために、IMF(Inter-Modality Flow)と呼ばれる指標を提案する。また,モダリティ間の学習をさらに促進するために,Transformer で Masked Feature Regression (MFR) という新しいマスキング最適化機構を設計する。我々の知る限りでは、VLPにおける視覚的特徴学習におけるTransformerのメリットを探求する最初の研究である。本稿では,視覚的質問応答(VQA),視覚的ヒント(Visual Entailment),視覚的推論(Visual Reasoning)など,幅広い視覚言語タスクについて検証する。当社のアプローチは、最先端のVLPのパフォーマンスを上回るだけでなく、IMFの指標にもメリットがあります。

Vision-Language Pre-training (VLP) aims to learn multi-modal representations from image-text pairs that serve downstream vision-language tasks through fine-tuning. The dominant VLP models adopt a CNN-Transformer architecture, which embeds images with a CNN, and then aligns images and text with a Transformer. The visual relationships among visual contents play an important role in image understanding and are the basis for inter-modal alignment learning. However, CNNs have limitations in visual relation learning because their local receptive fields are weak at modeling long-range dependencies. Thus the two objectives of learning visual relation and inter-modal alignment are encapsulated in the same Transformer network. Such a design might restrict the inter-modal alignment learning in the Transformer by ignoring the specialized characteristics of each objective. To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment. Specifically, we propose a metric named Inter-Modality Flow (IMF) to measure the interaction between vision and language modalities (i.e., inter-modality). We also design a novel masking optimization mechanism named Masked Feature Regression (MFR) in Transformer to further promote the inter-modality learning. To the best of our knowledge, this is the first study to explore the benefit of Transformer for visual feature learning in VLP. We verify our method on a wide range of vision-language tasks, including Visual Question Answering (VQA), Visual Entailment and Visual Reasoning. Our approach not only outperforms the state-of-the-art VLP performance, but also shows benefits on the IMF metric.

翻訳日:2021-06-28 13:06:37 公開日:2021-06-25

# 糖尿病網膜症の深層学習に基づく分析における事前訓練と自己監視のロバスト性について

On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy ( http://arxiv.org/abs/2106.13497v1 )

ライセンス: Link先を確認

Vignesh Srinivasan, Nils Strodthoff, Jackie Ma, Alexander Binder, Klaus-Robert Müller, Wojciech Samek

(参考訳) ディープニューラルネットワークに基づく分類アルゴリズムが、人間の医療専門家と競合するパフォーマンスレベルに達する医療用ユースケースが増えている。小さなデータセットサイズの課題を軽減するため、これらのシステムは事前トレーニングに依存することが多い。本研究は,これらのアプローチの広範な影響を評価することを目的とする。糖尿病網膜症を模範とした症例では, コントラスト学習に基づく自己指導型事前訓練法を含め, 異なる訓練方法の影響を比較検討した。この目的のために, 定量的性能, 学習特徴表現の統計, 解釈可能性, 画像歪みに対するロバスト性など, 様々な側面について検討した。以上の結果から,imagenetプリトレーニングから初期化したモデルでは,画像歪みに対する性能,一般化,ロバスト性が著しく向上することが示唆された。特に、自己教師付きモデルは教師付きモデルにさらなる利点をもたらす。 ImageNetから初期化した自己教師型モデルは、高いパフォーマンスを報告するだけでなく、大きな病変への過剰適合を減らし、疾患の進行を示す微小病変を考慮に入れた。簡単なパフォーマンス比較を超えて、より広い意味でプレトレーニングの効果を理解することは、この研究で考慮されたユースケースを超えて、幅広い医療画像コミュニティにとって重要である。

There is an increasing number of medical use-cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches. Taking diabetic retinopathy grading as an exemplary use case, we compare the impact of different training procedures, including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits over supervised models. Self-supervised models initialized from ImageNet pretraining not only report higher performance, they also reduce overfitting to large lesions and better take into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use-case considered in this work.

翻訳日:2021-06-28 13:06:07 公開日:2021-06-25

# 多対多対応を考慮したテキストクエリによるビデオモーメント検索

Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair ( http://arxiv.org/abs/2106.13566v1 )

ライセンス: Link先を確認

Sho Maeoki, Yusuke Mukuta, Tatsuya Harada

(参考訳) 本稿では,ビデオコーパスからテキストベースの映像モーメント検索を行う。モデルをトレーニングするために、テキストモーメントペアデータセットを使用して正しい対応を学習した。典型的な訓練法では、接地型テキストモーメントペアは正の対として、他のペアは負の対として用いられる。しかし、地対と地対は別として、一部の文対は正と見なすべきである。この場合、1つのテキストアノテーションは多くのビデオモーメントに対して陽性となる。逆に、あるビデオモーメントは多くのテキストアノテーションに対応できる。したがって、テキストアノテーションとビデオモーメントの間には多くの対多の対応がある。これらの対応に基づき、基礎的真理として与えられていなくても否定的でない潜在的に関連性のあるペアを形成することができ、そのような関連性のあるペアを効果的にトレーニングに組み込むことで、検索性能を向上させることができる。テキストクエリは、ビデオの瞬間に起きていることを記述すべきである。したがって、類似したアクションを含む類似のテキストでアノテートされた異なるビデオモーメントは、類似のアクションを持つ可能性が高いため、これらのペアは関連するペアと見なすことができる。本稿では,テキストアノテーションに関する言語解析に基づいて,潜在的に関連のあるペアを活用できる新しい学習手法を提案する。 2つのベンチマークデータセットを用いた実験により,本手法は定量的かつ定性的に検索性能を向上することがわかった。

In this paper we undertake the task of text-based video moment retrieval from a corpus of videos. To train the model, text-moment paired datasets were used to learn the correct correspondences. In typical training methods, ground-truth text-moment pairs are used as positive pairs, whereas other pairs are regarded as negative pairs. However, aside from the ground-truth pairs, some text-moment pairs should be regarded as positive. In this case, one text annotation can be positive for many video moments. Conversely, one video moment can correspond to many text annotations. Thus, there are many-to-many correspondences between the text annotations and video moments. Based on these correspondences, we can form potentially relevant pairs, which are not given as ground truth yet are not negative; effectively incorporating such relevant pairs into training can improve the retrieval performance. The text query should describe what is happening in a video moment. Hence, different video moments annotated with similar texts that describe a similar action are likely to contain that action, so these pairs can be considered potentially relevant pairs. In this paper, we propose a novel training method that takes advantage of potentially relevant pairs, which are detected based on linguistic analysis of the text annotations. Experiments on two benchmark datasets revealed that our method improves the retrieval performance both quantitatively and qualitatively.

翻訳日:2021-06-28 13:05:48 公開日:2021-06-25

# 単眼RGB映像からのアニマタブルニューラルラジアンス場

Animatable Neural Radiance Fields from Monocular RGB Video ( http://arxiv.org/abs/2106.13629v1 )

ライセンス: Link先を確認

Jianchuan Chen, Ying Zhang, Di Kang, Xuefei Zhe, Linchao Bao, Huchuan Lu

(参考訳) 単眼ビデオからの詳細な人体アバター作成のためのアニマタブル神経放射場を提案する。提案手法は,シーン表現ネットワークを学習しながら,明示的なポーズ誘導変形を導入することで,人間の動きを伴う動的シーンにニューラルレイディアンス場(NeRF)を拡張する。特に、各フレームの人間のポーズを推定し、詳細な人間のテンプレートに対して一定の標準空間を学習し、ポーズパラメータの明示的な制御の下で観察空間から標準空間への自然な形状変形を可能にする。不正確なポーズ推定を補うために、学習過程における最初のポーズを更新するポーズ改善戦略を導入し、より正確な人間の再構築を学ぶだけでなく、収束を加速させる。実験の結果, 提案手法は, 1) 質の高い細部を持つ暗黙の人間の形状と外観の復元, 2) 任意の視点からの人間の写真リアルなレンダリング, 3) 任意のポーズを持つ人間のアニメーションを実現する。

We present animatable neural radiance fields for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields (NeRF) to the dynamic scenes with human movements via introducing explicit pose-guided deformation while learning the scene representation network. In particular, we estimate the human pose for each frame and learn a constant canonical space for the detailed human template, which enables natural shape deformation from the observation space to the canonical space under the explicit control of the pose parameters. To compensate for inaccurate pose estimation, we introduce the pose refinement strategy that updates the initial pose during the learning process, which not only helps to learn more accurate human reconstruction but also accelerates the convergence. In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.

翻訳日:2021-06-28 13:05:29 公開日:2021-06-25

# 公平かつ解釈可能な表現学習のための射影ごとの分離(Projection-wise Disentangling):3次元顔形状解析への応用

Projection-wise Disentangling for Fair and Interpretable Representation Learning: Application to 3D Facial Shape Analysis ( http://arxiv.org/abs/2106.13734v1 )

ライセンス: Link先を確認

Xianjing Liu, Bo Li, Esther Bron, Wiro Niessen, Eppo Wolvius and Gennady Roshchupkin

(参考訳) 合流バイアスは、特に臨床実践において、機械学習を実践する上で重要な問題である。我々は,複数のバイアスに依存しない学習表現の問題を考える。文学では、これは主にバイアス情報を学習した表現から取り除くことで解決される。しかし我々は,この戦略が表現における情報の多様性を損なうことを期待し,その将来的な利用(解釈など)を制限する。そこで本研究では,ほぼすべての情報を潜在表現に保持しながらバイアスを軽減することを提案する。これを実現するため,学習ベクトル方向に潜在機能を投影し,すべての学習特徴よりもバイアスと予測特徴の独立性を強制する。投影特徴と入力データとのマッピングを解釈するために,学習ベクトル方向に沿ってサンプリングと再構成を行うプロジェクションワイド・アンタングリングを提案する。提案手法は3次元顔の形状と患者特性(n=5011)の分析に基づいて評価した。実験により、この概念的に単純な手法は、最先端の公正な予測性能と解釈性を達成し、臨床応用への大きな可能性を示した。

Confounding bias is a crucial problem when applying machine learning to practice, especially in clinical practice. We consider the problem of learning representations independent of multiple biases. In the literature, this is mostly solved by purging the bias information from learned representations. We however expect this strategy to harm the diversity of information in the representation, and thus to limit its prospective usage (e.g., interpretation). Therefore, we propose to mitigate the bias while keeping almost all information in the latent representations, which enables us to observe and interpret them as well. To achieve this, we project latent features onto a learned vector direction, and enforce the independence between biases and projected features rather than all learned features. To interpret the mapping between projected features and input data, we propose projection-wise disentangling: a sampling and reconstruction along the learned vector direction. The proposed method was evaluated on the analysis of 3D facial shape and patient characteristics (N=5011). Experiments showed that this conceptually simple method achieved state-of-the-art fair prediction performance and interpretability, showing its great potential for clinical applications.
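
To make the projection idea concrete, the sketch below projects latent vectors onto a direction and checks how strongly the projected values track a bias variable. Pearson correlation is used only as a simple stand-in for whatever independence criterion the paper enforces, and the latent codes, bias values, and function names are hypothetical.

```python
import math

def project(features, direction):
    """Scalar projection of each feature vector onto the unit-normalised direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [sum(f * d for f, d in zip(row, direction)) / norm for row in features]

def pearson(xs, ys):
    """Pearson correlation, used here as a simple stand-in for a dependence measure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 2-D latent codes: the first axis tracks the bias, the second does not.
latents = [[1.0, 0.5], [2.0, -0.5], [3.0, -0.5], [4.0, 0.5]]
bias = [1.0, 2.0, 3.0, 4.0]

print(pearson(project(latents, [1.0, 0.0]), bias))  # direction aligned with the bias
print(pearson(project(latents, [0.0, 1.0]), bias))  # direction uncorrelated with the bias
```

Only the one-dimensional projection is constrained in this picture, so the remaining latent dimensions are free to keep bias-related information for later inspection.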

翻訳日:2021-06-28 13:05:13 公開日:2021-06-25

# 「ゼロショット」ポイントクラウドアップサンプリング

"Zero Shot" Point Cloud Upsampling ( http://arxiv.org/abs/2106.13765v1 )

ライセンス: Link先を確認

Kaiyue Zhou, Ming Dong, Suzan Arslanturk

(参考訳) ディープラーニングを使った点群のアップサンプリングには、ここ数年でさまざまな取り組みがなされてきた。近年の教師付き深層学習法は、訓練データのサイズに制約され、あらゆる形状の点群を網羅する点で限界がある。さらに、そのような量のデータの取得は非現実的であり、ネットワークは一般に、未知のデータでは期待されるほどの性能を発揮しない。本稿では,「ゼロショット」点群アップサンプリング(ZSPU)と呼ぶ,点群を全体(holistic)レベルでアップサンプリングする教師なし手法を提案する。我々のアプローチは、自己学習とテストの両方のフェーズでパッチ分割を行わず、特定の点群が提供する内部情報のみに基づいている。このシングルストリーム設計は、低解像度(LR)点群と対応する高解像度(HR、元の解像度)点群の関係を学習することにより、アップサンプリングタスクのトレーニング時間を著しく短縮する。この対応関係により、元の点群を入力として与えたときに超解像(SR)出力が得られる。ベンチマーク点群データセット上で、他のアップサンプリング手法と比較して競争力のある性能を示す。さらに、ZSPUは複雑な局所的詳細や高い曲率を持つ形状において優れた定性的結果を得る。

Point cloud upsampling using deep learning has received considerable attention in the past few years. Recent supervised deep learning methods are restricted by the size of their training data and are limited in covering all shapes of point clouds. Besides, acquiring such an amount of data is unrealistic, and the network generally performs worse than expected on unseen records. In this paper, we present an unsupervised approach to upsampling point clouds at the holistic level, referred to as "Zero Shot" Point Cloud Upsampling (ZSPU). Our approach is solely based on the internal information provided by a particular point cloud, without patching in either the self-training or the testing phase. This single-stream design significantly reduces the training time of the upsampling task, by learning the relation between low-resolution (LR) point clouds and their high (original) resolution (HR) counterparts. This association will provide super-resolution (SR) outputs when original point clouds are loaded as input. We demonstrate competitive performance on benchmark point cloud datasets when compared to other upsampling methods. Furthermore, ZSPU achieves superior qualitative results on shapes with complex local details or high curvatures.

翻訳日:2021-06-28 13:04:55 公開日:2021-06-25

# 逆学習による信頼グラフニューラルネットワークの説明

Reliable Graph Neural Network Explanations Through Adversarial Training ( http://arxiv.org/abs/2106.13427v1 )

ライセンス: Link先を確認

Donald Loveland, Shusen Liu, Bhavya Kailkhura, Anna Hiszpanski, Yong Han

(参考訳) グラフニューラルネットワーク(GNN)の説明は大半がポストホックイントロスペクションによって進められている。これは成功と見なされているが、多くのポストホックな説明方法はモデルの学習した表現を捉えるのに失敗することが示されている。この問題のため、モデルをどのようにトレーニングして、ポストホック解析がより快適になるか検討する価値がある。コンピュータビジョン領域における、より信頼性の高い表現でモデルを訓練するための逆トレーニングの成功を踏まえ、GNNの同様の訓練パラダイムを提案し、モデルの説明に対するそれぞれの影響を分析する。基底的真理ラベルのない例では、説明法がモデルの学習した表現を新しいメトリックを通していかにうまく活用しているかを判断し、逆行訓練が化学におけるドメイン関連洞察の抽出に役立つことを示す。

Graph neural network (GNN) explanations have largely been facilitated through post-hoc introspection. While this has been deemed successful, many post-hoc explanation methods have been shown to fail in capturing a model's learned representation. Due to this problem, it is worthwhile to consider how one might train a model so that it is more amenable to post-hoc analysis. Given the success of adversarial training in the computer vision domain to train models with more reliable representations, we propose a similar training paradigm for GNNs and analyze the respective impact on a model's explanations. In instances without ground truth labels, we also determine how well an explanation method is utilizing a model's learned representation through a new metric and demonstrate adversarial training can help better extract domain-relevant insights in chemistry.

翻訳日:2021-06-28 13:03:35 公開日:2021-06-25

# 時間グラフ信号分解

Temporal Graph Signal Decomposition ( http://arxiv.org/abs/2106.13517v1 )

ライセンス: Link先を確認

Maxwell McNeil and Lin Zhang and Petko Bogdanov

(参考訳) 時間グラフ信号は、固定グラフ構造のノードに関連付けられた個々のコンポーネントを持つ多変量時系列である。この種のデータは、ソーシャルネットワークユーザーの活動、時間の経過とともにセンサーネットワークを読み取ること、モデル生物の相互作用ネットワーク内の時間コース遺伝子表現など、多くの領域で発生する。このようなデータに適用される従来の行列分解法は、基礎となるグラフにエンコードされた構造的規則性や、信号の時間的パターンを活用できない。このような構造を考慮すれば、時間グラフ信号の簡潔かつ解釈可能な表現が得られるか。本稿では、時間グラフ信号分解(TGSD)のための一般的な辞書ベースのフレームワークを提案する。鍵となるアイデアは、グラフと時間辞書を組み合わせることで、データの低ランクでジョイントなエンコーディングを学ぶことである。本稿では, 完全データと不完全データの両方に対する高度にスケーラブルな分解アルゴリズムを提案し, 行列分解, 欠落値の計算, 時間的補間, クラスタリング, 周期推定, および, 交通パターンからソーシャルメディア活動まで, 実世界のデータにおけるランク推定の利点を示す。観察の75%が欠落している時, 時間的補間のための基準線に比べてRMSEの28%の減少が達成された。ベースライン間では、350万のデータポイントで20秒未満でスケールし、最も控えめなモデルを生成する。我々の知る限りでは、TGSDは時間辞書とグラフ辞書によってグラフ信号を共同でモデル化する最初のフレームワークである。

Temporal graph signals are multivariate time series with individual components associated with nodes of a fixed graph structure. Data of this kind arises in many domains including activity of social network users, sensor network readings over time, and time course gene expression within the interaction network of a model organism. Traditional matrix decomposition methods applied to such data fall short of exploiting structural regularities encoded in the underlying graph and also in the temporal patterns of the signal. How can we take into account such structure to obtain a succinct and interpretable representation of temporal graph signals? We propose a general, dictionary-based framework for temporal graph signal decomposition (TGSD). The key idea is to learn a low-rank, joint encoding of the data via a combination of graph and time dictionaries. We propose a highly scalable decomposition algorithm for both complete and incomplete data, and demonstrate its advantage for matrix decomposition, imputation of missing values, temporal interpolation, clustering, period estimation, and rank estimation in synthetic and real-world data ranging from traffic patterns to social media activity. Our framework achieves 28% reduction in RMSE compared to baselines for temporal interpolation when as many as 75% of the observations are missing. It scales best among baselines taking under 20 seconds on 3.5 million data points and produces the most parsimonious models. To the best of our knowledge, TGSD is the first framework to jointly model graph signals by temporal and graph dictionaries.

翻訳日:2021-06-28 13:03:22 公開日:2021-06-25

# 有限要素畳み込みニューラルネットワーク(fe-cnn)による構造トポロジ最適化の高速化

A mechanistic-based data-driven approach to accelerate structural topology optimization through finite element convolutional neural network (FE-CNN) ( http://arxiv.org/abs/2106.13652v1 )

ライセンス: Link先を確認

Tianle Yue, Hang Yang, Zongliang Du, Chang Liu, Khalil I. Elkhodary, Shan Tang, Xu Guo

(参考訳) 本稿では, 内部で開発された有限要素畳み込みニューラルネットワーク(FE-CNN)を用いて, 構造トポロジ最適化を高速化するメカニスティックなデータ駆動手法を提案する。我々のアプローチは、オフライントレーニングとオンライン最適化の2つの段階に分けられる。オフライントレーニングでは、所定の設計ドメインの高解像度表現と低解像度表現の間にマッピング関数が構築される。このマッピングはFE-CNNによって表現され、異なる解像度の設計領域間で共通の目的関数値(例えば、構造的コンプライアンス)をターゲットにしている。オンライン最適化では、訓練されたマッピング機能により、高解像度の任意の設計領域を低解像度に還元する。従って、オリジナルの高解像度ドメインは、低解像度バージョンのみで実行される計算と、高解像度ドメインへの逆マッピングによって設計されている。数値例は、このアプローチが計算時間の最大桁まで最適化を加速できることを示しています。したがって,提案手法は密度に基づく構造トポロジー最適化によって生じる次元の呪いを克服する大きな可能性を示す。本研究のアプローチの限界についても論じる。

In this paper, a mechanistic data-driven approach is proposed to accelerate structural topology optimization, employing an in-house developed finite element convolutional neural network (FE-CNN). Our approach can be divided into two stages: offline training and online optimization. During offline training, a mapping function is built between high- and low-resolution representations of a given design domain. The mapping is expressed by an FE-CNN, which targets a common objective function value (e.g., structural compliance) across design domains of differing resolutions. During online optimization, an arbitrary design domain of high resolution is reduced to low resolution through the trained mapping function. The original high-resolution domain is thus designed by computations performed on only the low-resolution version, followed by an inverse mapping back to the high-resolution domain. Numerical examples demonstrate that this approach can accelerate optimization by up to an order of magnitude in computational time. Our proposed approach therefore shows great potential to overcome the curse of dimensionality incurred by density-based structural topology optimization. The limitation of our present approach is also discussed.

翻訳日:2021-06-28 13:02:58 公開日:2021-06-25

# cadda:脳波信号に対するクラス別自動微分可能データ拡張

CADDA: Class-wise Automatic Differentiable Data Augmentation for EEG Signals ( http://arxiv.org/abs/2106.13695v1 )

ライセンス: Link先を確認

Cédric Rommel, Thomas Moreau, Alexandre Gramfort

(参考訳) データ拡張はディープラーニングパイプラインの重要な要素であり、ラベルを不変に保つ入力データの変換に関するトレーニング中にネットワークに通知する。しかし、与えられたパイプラインの適切な拡張メソッドとパラメータを手動で見つけるのは、急速に面倒です。特に、直観は画像に対してこの決定を導くことができるが、神経科学信号のようなより複雑なデータに対して、拡張ポリシーの設計と選択は不明確である。さらに、このような構造化データにはラベル独立戦略が適さない場合や、クラス依存の強化が必要かもしれない。カーイメージの色を変えることは、予測されるオブジェクトクラスを変えるのではなく、オレンジの画像に同じことをすることです。本稿では,データ拡張による一般化能力の向上を目的とする。しかし、クラスに依存した変換を求めるとタスクの複雑さが大きくなり、既存のほとんどの自動手法による勾配のない最適化手法が現実のデータセットにとって難解になる。そこで本研究では,勾配に基づく学習に適した微分可能データ拡張法を提案する。脳波信号は、良い拡張ポリシーがほとんど知られていないデータの完璧な例です。本研究は,臨床関連睡眠ステージ分類課題に対する我々のアプローチの意義を実証するものであり,また,異なる変換も提案する。

Data augmentation is a key element of deep learning pipelines, as it informs the network during training about transformations of the input data that keep the label unchanged. Manually finding adequate augmentation methods and parameters for a given pipeline, however, quickly becomes cumbersome. In particular, while intuition can guide this decision for images, the design and choice of augmentation policies remain unclear for more complex types of data, such as neuroscience signals. Moreover, label-independent strategies might not be suitable for such structured data, and class-dependent augmentations might be necessary. This idea has been surprisingly unexplored in the literature, although it is quite intuitive: changing the color of a car image does not change the object class to be predicted, but doing the same to the picture of an orange does. This paper aims to increase the generalization power added through class-wise data augmentation. Yet, as seeking transformations that depend on the class greatly increases the complexity of the task, using gradient-free optimization techniques, as done by most existing automatic approaches, becomes intractable for real-world datasets. For this reason, we propose to use differentiable data augmentation amenable to gradient-based learning. EEG signals are a perfect example of data for which good augmentation policies are mostly unknown. In this work, we demonstrate the relevance of our approach on the clinically relevant sleep staging classification task, for which we also propose differentiable transformations.

翻訳日:2021-06-28 13:02:38 公開日:2021-06-25

# Ranger21: シナジスティックなディープラーニングオプティマイザ

Ranger21: a synergistic deep learning optimizer ( http://arxiv.org/abs/2106.13731v1 )

ライセンス: Link先を確認

Less Wright and Nestor Demeure

(参考訳) ニューラルネットワークの性能に最適化器が不可欠であるため、毎年多くの論文が発表されている。しかし、これらの出版物の多くは既存のアルゴリズムを漸進的に改善しているが、それらは構成可能なアルゴリズムではなく、新しいオプティマイザとして提示される傾向がある。このため、多くの価値ある改善が最初の出版物の外で見られることは滅多にない。この未開拓の可能性を生かして、文献からアイデアをレビューおよびテストした後に慎重に選択した8つのコンポーネントをAdamWと組み合わせた新しいオプティマイザ Ranger21 を紹介する。その結果、このオプティマイザは検証精度とトレーニング速度を大幅に改善し、よりスムーズなトレーニング曲線を提供し、バッチ正規化レイヤなしでImageNet2012上でResNet50をトレーニングできることがわかった。これはAdamWが体系的に悪い初期状態に留まってしまう問題である。

As optimizers are critical to the performance of neural networks, a large number of papers innovating on the subject are published every year. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than composable algorithms. Thus, many worthwhile improvements are rarely seen outside of their initial publication. Taking advantage of this untapped potential, we introduce Ranger21, a new optimizer which combines AdamW with eight components, carefully selected after reviewing and testing ideas from the literature. We found that the resulting optimizer provides significantly improved validation accuracy and training speed, smoother training curves, and is even able to train a ResNet50 on ImageNet2012 without Batch Normalization layers, a problem on which AdamW stays systematically stuck in a bad initial state.

翻訳日:2021-06-28 13:02:18 公開日:2021-06-25

# Jitter:ランダムジッタリング損失関数

Jitter: Random Jittering Loss Function ( http://arxiv.org/abs/2106.13749v1 )

ライセンス: Link先を確認

Zhicheng Cai, Chenglei Peng and Sidan Du

(参考訳) 正規化は機械学習の最適化において重要な役割を果たす。フラッディングと呼ばれる新しい正規化手法により、トレーニング損失はフラッディングレベル付近で変動する。一般化を促進するために、フラットな損失の風景に達するまで、モデルをランダムに歩き続けることを意図しています。しかし、洪水法のハイパーパラメータフラッディングレベルを適切に均一に選択することができない。そこで我々は,jitter という新しい手法を提案する。 jitterは本質的にランダムな損失関数の一種です。トレーニング前に、特定の確率分布からジッタ点をランダムにサンプリングする。浸水レベルをジッターポイントに置き換えて新しい目標関数を取得し、それに従ってモデルを訓練する必要がある。ランダムな要素として作用するジッター点は、実際に損失関数にランダム性を加えるが、これは機械学習モデルの学習プロセスに無数のランダムな振る舞いが存在するという事実と一致し、モデルをより堅牢にすることが期待される。さらに、jitterはランダムにランダムにウォークを行い、損失曲線を小さな間隔に分けて反転させ、損失曲線をよりフラットにし、一般化能力を高める。さらに、Jitterはドメイン、タスク、モデルに依存しない正規化手法であり、トレーニングエラーがゼロになった後にモデルを効果的に訓練することができる。実験の結果,jitter法では,従来のフラッディング法よりもモデル性能が大幅に向上し,試験損失曲線を2回降下できることがわかった。

Regularization plays a vital role in machine learning optimization. One novel regularization method, called flooding, makes the training loss fluctuate around the flooding level. It is intended to keep the model on a random walk until it reaches a flat region of the loss landscape, which enhances generalization. However, the flooding level, a hyper-parameter of the flooding method, cannot be selected properly and uniformly. We propose a novel method called Jitter to improve on it. Jitter is essentially a kind of random loss function. Before training, we randomly sample the Jitter point from a specific probability distribution. The flooding level is then replaced by the Jitter point to obtain a new target function, and the model is trained accordingly. Because the Jitter point acts as a random factor, we effectively add randomness to the loss function, which is consistent with the fact that innumerable random behaviors exist in the learning process of a machine learning model, and which is expected to make the model more robust. In addition, Jitter performs the random walk randomly, which divides the loss curve into small intervals and then flips them over, ideally making the loss curve much flatter and enhancing generalization ability. Moreover, Jitter is a domain-, task-, and model-independent regularization method and can train the model effectively after the training error has reduced to zero. Our experimental results show that the Jitter method improves model performance more significantly than the previous flooding method and makes the test loss curve descend twice.
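The flooding objective that Jitter builds on is usually written as $\tilde{J}(\theta) = |J(\theta) - b| + b$ with flooding level $b$. The following is a minimal, framework-free sketch of replacing the fixed $b$ by a randomly drawn point. The function names, the Gaussian sampling distribution, its parameters, and the choice of one point per training step are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import random
from typing import List


def flooded_loss(loss: float, level: float) -> float:
    """Flooding objective |loss - level| + level.

    Above the level this equals the plain loss; below it the sign of the
    gradient flips, so training pushes the loss back up toward the level.
    """
    return abs(loss - level) + level


def sample_jitter_points(n_steps: int, mean: float, std: float, seed: int = 0) -> List[float]:
    """Draw one jitter point per step (assumed Gaussian, clipped at zero)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, std)) for _ in range(n_steps)]


def jitter_loss(loss: float, jitter_point: float) -> float:
    """Same form as flooding, with the fixed level replaced by a random point."""
    return flooded_loss(loss, jitter_point)


if __name__ == "__main__":
    # Hypothetical values: a tiny training loss compared against random levels.
    for step, point in enumerate(sample_jitter_points(n_steps=5, mean=0.05, std=0.02)):
        print(step, round(jitter_loss(0.01, point), 4))
```

In an autodiff framework the same expression would be applied to the mini-batch loss tensor before calling the backward pass.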

翻訳日:2021-06-28 13:02:03 公開日:2021-06-25

# 等分散グラフネットワークにおけるデータ効率

Data efficiency in graph networks through equivariance ( http://arxiv.org/abs/2106.13786v1 )

ライセンス: Link先を確認

Francesco Farina, Emma Slade

(参考訳) 本稿では,隣接ノード間の距離を保つ座標埋め込み内の任意の変換に同値なグラフネットワークのための新しいアーキテクチャを提案する。特に、n$-次元におけるユークリッド群と共形直交群とに同値である。その同値性のおかげで、提案モデルは古典的なグラフアーキテクチャに関して非常にデータ効率が良く、本質的にはより優れた帰納バイアスを備える。提案するアーキテクチャは、最小限のデータ量で学習することで、合成問題で見つからないデータに完全に一般化できる一方で、標準モデルからより多くのトレーニングデータが、同等のパフォーマンスに達するのに必要であることを示す。

We introduce a novel architecture for graph networks which is equivariant to any transformation in the coordinate embeddings that preserves the distance between neighbouring nodes. In particular, it is equivariant to the Euclidean and conformal orthogonal groups in $n$ dimensions. Thanks to its equivariance properties, the proposed model is far more data efficient than classical graph architectures and is also intrinsically equipped with a better inductive bias. We show that, when learning on a minimal amount of data, the architecture we propose can perfectly generalise to unseen data in a synthetic problem, while a standard model requires much more training data to reach comparable performance.
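To make the notion of a distance-preserving transformation concrete, the sketch below applies a 2-D rotation and translation (an element of the Euclidean group) to node coordinates and compares edge lengths before and after. It only illustrates the symmetry the abstract refers to, not the proposed architecture; the helper names and the toy graph are hypothetical.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]


def rotate_and_translate(points: List[Point2D], angle: float, shift: Point2D) -> List[Point2D]:
    """Apply a 2-D rotation by `angle` (radians), then a translation by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


def edge_lengths(points: List[Point2D], edges: List[Tuple[int, int]]) -> List[float]:
    """Euclidean distance between the two endpoints of every edge."""
    return [math.dist(points[i], points[j]) for i, j in edges]


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]  # toy coordinate embeddings
    edges = [(0, 1), (1, 2)]                      # toy neighbour pairs
    moved = rotate_and_translate(nodes, angle=0.7, shift=(3.0, -1.0))
    print(edge_lengths(nodes, edges))
    print(edge_lengths(moved, edges))
```

Any layer whose messages depend on coordinates only through such edge lengths is unaffected by the transformation, which is one common way to obtain this kind of symmetry.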

翻訳日:2021-06-28 13:01:43 公開日:2021-06-25

# 知識グラフに基づくソフトウェア定義ネットワークの自律管理に向けて

Towards A Knowledge Graph Based Autonomic Management of Software Defined Networks ( http://arxiv.org/abs/2106.13367v1 )

ライセンス: Link先を確認

Qianru Zhou and Alasdair J.G. Gray and Stephen McLaughlin

(参考訳) 人工知能技術による自動ネットワーク管理は何十年にもわたって熱く議論されてきた。しかし、現在の報告では、主に理論的な提案とアーキテクチャ設計に焦点を当てており、現実のネットワーク上での実践的実装に関する作業は未だ現れていない。本稿では,ソフトウェア定義ネットワーク(SDN)における自律的ネットワーク管理のための知識グラフ駆動型アプローチの実装に向けた取り組みについて述べる。 ToCoオントロジーによって駆動されるSeaNetは、Mininet(SDNエミュレータ)に基づいて再プログラムされる。それは3つのコアコンポーネント、ナレッジグラフジェネレータ、sparqlエンジン、ネットワーク管理apiで構成されている。知識グラフ生成器は、通信ネットワーク管理タスクの知識を、正式に表現されたオントロジー駆動モデルに表現する。エキスパートエクスペリエンスとネットワーク管理ルールはナレッジグラフに形式化することができ、SPARQLエンジンによって自動的に推論されることにより、Network Management APIはテクノロジ固有の詳細をパケット化し、テクノロジに依存しないインターフェースをユーザに公開することができる。同一言語pythonで実装された商用sdnコントローラryuとの比較により,提案手法を評価する実験を行った。評価の結果,ほとんどの場合,SeaNetはRyuよりかなり高速であり,SeaNetのコードははるかにコンパクトであることがわかった。 RDF推論の利点として、SeaNetは知識グラフの異なるスケールでO(1)時間複雑性を達成でき、一方従来のデータベースはO(nlogn)を最大限に達成できる。 SeaNetは、開発したネットワーク管理APIにより、研究者が自身のSDN上でセマンティック・インテリジェントなアプリケーションを開発できるようにする。

Automatic network management driven by artificial intelligence technologies has been heatedly discussed for decades. However, current reports mainly focus on theoretical proposals and architecture designs; work on practical implementations in real-life networks has yet to appear. This paper presents our effort toward implementing a knowledge-graph-driven approach to autonomic network management in software defined networks (SDNs), termed SeaNet. Driven by the ToCo ontology, SeaNet is reprogrammed based on Mininet (an SDN emulator). It consists of three core components: a knowledge graph generator, a SPARQL engine, and a network management API. The knowledge graph generator represents the knowledge in telecommunication network management tasks as a formally represented, ontology-driven model. Expert experience and network management rules can be formalized into the knowledge graph and automatically inferred by the SPARQL engine, and the network management API is able to package technology-specific details and expose technology-independent interfaces to users. Experiments are carried out to evaluate the proposed work by comparing it with Ryu, a commercial SDN controller implemented in the same language, Python. The evaluation results show that SeaNet is considerably faster than Ryu in most circumstances and that the SeaNet code is significantly more compact. Benefiting from RDF reasoning, SeaNet is able to achieve O(1) time complexity on different scales of the knowledge graph, while a traditional database achieves O(n log n) at best. With the developed network management API, SeaNet enables researchers to develop semantic-intelligent applications on their own SDNs.

翻訳日:2021-06-28 13:00:38 公開日:2021-06-25

# 垂直探索のためのドメイン特化事前学習:生物医学文献の事例研究

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature ( http://arxiv.org/abs/2106.13375v1 )

ライセンス: Link先を確認

Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon

(参考訳) 情報過負荷は、多くの高価値ドメインにおいて一般的な課題である。特筆すべき事例は、新型コロナウイルス(covid-19)に関する生物医学文献が爆発的に爆発し、何ヶ月にもわたって数十万の論文に膨れ上がったことだ。概して、生物医学の文献は毎分2つの論文に拡張され、毎年100万以上の新しい論文が発行されている。クリックログからの直接監督が不足しているため、バイオメディカル領域や多くの垂直領域での検索は困難である。自己監督学習は、アノテーションのボトルネックを克服するための有望な方向性として現れてきた。本稿では、ドメイン固有の事前学習に基づく垂直探索のための一般的なアプローチを提案し、バイオメディカルドメインのケーススタディを提案する。極めてシンプルで,訓練や開発に関連ラベルを使用しないにもかかわらず,本手法は,新型コロナ関連生物医学的検索競争である公式trec-covid評価において,優れたシステムと同等かそれ以上の性能を発揮する。現代のクラウドインフラで分散コンピューティングを使用することで、私たちのシステムはPubMed上で数千万の記事にスケールでき、Microsoft Biomedical Searchとしてデプロイされた。

Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.

翻訳日:2021-06-28 13:00:13 公開日:2021-06-25

# tts/vcシステムにおけるベクトル量子化潜在空間の利用に関する予備的検討

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance ( http://arxiv.org/abs/2106.13479v1 )

ライセンス: Link先を確認

Hieu-Thi Luong and Junichi Yamagishi

(参考訳) 一般に、ニューラルネットワーク合成システムの訓練の主な目的は、隠れた層にあまり注意を払わずに、ニューラルネットワークの出力層から自然で表現豊かな音声を合成することである。しかし、有用な潜在表現を学習することで、システムはより実用的なシナリオで使用できる。本稿では,潜在言語埋め込みのモデル化における量子化ベクトルの利用について検討し,それと比較する。学習における潜在空間上の異なるポリシーを強制することにより、品質と話者の類似性の観点から同様の性能を保ちながら、異なる特性を生かした潜在言語埋め込みを得ることができる。実験により,ベクトル量子化によって構築された音声クローンシステムは,知覚的評価の面では小さな劣化しか持たないが,データ転送や情報漏洩の抑制に望ましい表現ビットレートの低減や,話者の匿名化などのタスクにおいて重要な離散的潜在空間を有することが分かった。

Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the output layer of the neural network without much attention given to the hidden layers. However, by learning useful latent representation, the system can be used for many more practical scenarios. In this paper, we investigate the use of quantized vectors to model the latent linguistic embedding and compare it with the continuous counterpart. By enforcing different policies over the latent spaces in the training, we are able to obtain a latent linguistic embedding that takes on different properties while having a similar performance in terms of quality and speaker similarity. Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations, but has a discrete latent space that is useful for reducing the representation bit-rate, which is desirable for data transferring, or limiting the information leaking, which is important for speaker anonymization and other tasks of that nature.
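As a rough illustration of why a discrete latent space lowers the representation bit-rate, the sketch below quantizes a continuous latent vector to its nearest codebook entry and reports the bits needed to transmit one index. The codebook values, its size, and the function names are hypothetical; the abstract does not describe the system's actual quantizer.

```python
import math
from typing import List, Sequence, Tuple


def quantize(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index and entry of the codebook vector closest to `vector`."""

    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: squared_distance(vector, codebook[k]))
    return index, list(codebook[index])


def bits_per_frame(codebook_size: int) -> float:
    """Bits needed to send one codebook index instead of the continuous vector."""
    return math.log2(codebook_size)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # toy 3-entry codebook
    index, entry = quantize([0.9, 1.2], codebook)
    print(index, entry, bits_per_frame(len(codebook)))
```

Only the index has to be stored or transmitted per frame, so the bit-rate is bounded by the codebook size rather than by the precision of the continuous embedding.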

翻訳日:2021-06-28 12:59:53 公開日:2021-06-25

# 特徴群とスパース主成分分析

Feature Grouping and Sparse Principal Component Analysis ( http://arxiv.org/abs/2106.13685v1 )

ライセンス: Link先を確認

Haiyan Jiang, Shanshan Qin, Dejing Dou

(参考訳) スパース主成分分析 (sparse principal component analysis, spca) はデータ処理や次元縮小に広く使われている。しかし、スパースPCAは、すべての係数が 0 である特別な群(つまり、特徴選択)に加えて、負荷が類似した係数(すなわち、特徴群)を共有する追加のグループ構造を決して考慮しない。本稿では,FGSPCA(Feature Grouping and Sparse principal Component Analysis)と呼ばれる新しい手法を提案する。提案したFGSPCAは,非凸正規化を自然に調整可能な間隔とグループ化効果を付与することにより,グループ探索と特徴選択を同時に行うためのサブスペース学習手法である。結果として得られる非凸最適化問題を解決するために、差分凸プログラミング、拡張ラグランジュおよび座標降下法を組み込んだ交互アルゴリズムを提案する。さらに, 実データを用いた実験結果から, 提案したFGSPCAはグループ化効果のない手法と比較してグループ化効果の恩恵を受けることが示された。

Sparse Principal Component Analysis (SPCA) is widely used in data processing and dimension reduction; it uses the lasso to produce modified principal components with sparse loadings for better interpretability. However, sparse PCA never considers an additional grouping structure where the loadings share similar coefficients (i.e., feature grouping), besides a special group with all coefficients being zero (i.e., feature selection). In this paper, we propose a novel method called Feature Grouping and Sparse Principal Component Analysis (FGSPCA) which allows the loadings to belong to disjoint homogeneous groups, with sparsity as a special case. The proposed FGSPCA is a subspace learning method designed to simultaneously perform grouping pursuit and feature selection, by imposing a non-convex regularization with naturally adjustable sparsity and grouping effect. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that incorporates the difference-of-convex programming, augmented Lagrange and coordinate descent methods. Additionally, the experimental results on real data sets show that the proposed FGSPCA benefits from the grouping effect compared with methods without grouping effect.

翻訳日:2021-06-28 12:59:34 公開日:2021-06-25

# クラス及び層別VAEによる意味的画像合成と編集の多様化

Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs ( http://arxiv.org/abs/2106.13416v1 )

ライセンス: Link先を確認

Yuki Endo, Yoshihiro Kanamori

(参考訳) セマンティック画像合成は、単一のセマンティックマスクからフォトリアリスティック画像を生成するプロセスである。マルチモーダル画像合成の多様性を高めるため、従来の手法では1つの潜在空間を学習することで出力画像のグローバル外観を制御する。しかし、オブジェクトの外観が複数の要因に依存するため、複数のオブジェクトスタイルをキャプチャするには、単一の潜時コードは不十分であることが多い。オブジェクトのスタイルを決定する個々の要素を扱うため、複数の潜在空間を学習することにより、各オブジェクトクラスをローカルからグローバルレベルまで柔軟に制御できるvaruational autoencoder(vae)フレームワークのクラスおよびレイヤごとに拡張する。さらに,本手法は3つの異なる領域における実データと合成データを用いた広範囲な実験により,最先端の手法と比較して,多種多様な画像を生成することを実証する。また,本手法は画像合成や編集作業において幅広い応用が可能となることを示した。

Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multiple factors. To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces. Furthermore, we demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods via extensive experiments with real and synthetic datasets in three different domains. We also show that our method enables a wide range of applications in image synthesis and editing tasks.

翻訳日:2021-06-28 12:58:14 公開日:2021-06-25

# 物理形ニューラルネットワーク(pinns)のマルチフィデリティモデリング

Multifidelity Modeling for Physics-Informed Neural Networks (PINNs) ( http://arxiv.org/abs/2106.13361v1 )

ライセンス: Link先を確認

Michael Penwarden, Shandian Zhe, Akil Narayan, Robert M. Kirby

(参考訳) マルチファイダリティシミュレーション手法は、低フィダリティと高フィダリティシミュレーションを巧みに組み合わせることで、精度の向上とコスト削減を図っている。このアプローチの候補は、重要な計算コストの差と忠実性の違いがあるシミュレーション方法論である。物理インフォームドニューラルネットワーク(PINN)は、異なる忠実度(アーキテクチャの幅と深さおよび最適化基準で表される)が採用されるために必要なトレーニング時間に大きな違いがあるため、この種のアプローチの候補となっている。本稿では,低ランク構造を利用するPINNに適用した,特定の多重忠実度アプローチを提案する。モデルの忠実度に関するパラメータとして,幅,深さ,最適化基準が利用可能であることを実証し,忠実度パラメータの選択によるトレーニングにおけるコスト差の数値的正当性を示す。我々は新しいピンズ文学で提示された様々な正準フォワードpdeモデル上で多元性スキームをテストする。

Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature.

翻訳日:2021-06-28 12:57:48 公開日:2021-06-25

# 隣接世代を欠いたサブグラフフェデレーション学習

Subgraph Federated Learning with Missing Neighbor Generation ( http://arxiv.org/abs/2106.13430v1 )

ライセンス: Link先を確認

Ke Zhang, Carl Yang, Xiaoxiao Li, Lichao Sun, Siu Ming Yiu

(参考訳) グラフは、現実世界のオブジェクトとその相互作用のユニークな表現のために、データマイニングや機械学習で広く使われている。近年,グラフがますます大きくなってきているため,各サブグラフが個別に収集され,複数のローカルシステムに格納されることが一般的である。したがって、グラフ全体の分布からバイアスを受ける可能性のある小さなサブグラフを持つ各ローカルシステムにおいて、サブグラフフェデレーション学習環境を考えるのは自然である。したがって、subgraphフェデレーション学習は、グラフデータを直接共有することなく、強力で一般化可能なグラフマイニングモデルを協調的にトレーニングすることを目的としている。本研究では,1)FedAvgをベースとしたGraphSageモデルを訓練し,ノードの特徴,リンク構造,タスクラベルを複数のローカルサブグラフに統合するFedSage+,2)FedSageに沿って欠落した隣人ジェネレータを訓練してローカルサブグラフ間のリンクに対処するFedSage+という2つの主要な手法を提案する。合成サブグラフフェデレーション学習設定を用いた4つの実世界のグラフデータセットの実証結果から,提案手法の有効性と有効性を示す。同時に、大域グラフ上の一般化能力に対して一貫した理論的含意がもたらされる。

Graphs have been widely used in data mining and machine learning due to their unique representation of real-world objects and their interactions. As graphs are getting bigger and bigger nowadays, it is common to see their subgraphs separately collected and stored in multiple local systems. Therefore, it is natural to consider the subgraph federated learning setting, where each local system holds a small subgraph that may be biased from the distribution of the whole graph. Hence, subgraph federated learning aims to collaboratively train a powerful and generalizable graph mining model without directly sharing graph data. In this work, towards the novel yet realistic setting of subgraph federated learning, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; (2) FedSage+, which trains a missing neighbor generator along FedSage to deal with missing links across local subgraphs. Empirical results on four real-world graph datasets with synthesized subgraph federated learning settings demonstrate the effectiveness and efficiency of our proposed techniques. At the same time, consistent theoretical implications are made towards their generalization ability on the global graphs.
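The FedAvg step that FedSage is said to build on can be sketched as a size-weighted average of client parameters. The version below works on flat lists of floats; treating each client's weight as its number of local nodes, and the function name itself, are assumptions for illustration rather than details taken from the paper.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its data size."""
    if len(client_params) != len(client_sizes):
        raise ValueError("one size is required per client")
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Two hypothetical clients holding subgraphs with 1 and 3 nodes.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a real system the vectors would be the flattened GraphSage weights of each local model, and the server would broadcast the averaged vector back to all clients after every round.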

翻訳日:2021-06-28 12:57:34 公開日:2021-06-25

# ペトロケミカル産業用推論センサのデータベース設計

Data-based Design of Inferential Sensors for Petrochemical Industry ( http://arxiv.org/abs/2106.13503v1 )

ライセンス: Link先を確認

Martin Mojto, Karol Ľubušký, Miroslav Fikar and Radoslav Paulen

(参考訳) 産業において、不正確な(または柔らかい)センサーは、オンラインで測定された変数(例えば圧力、温度)から不正確かつ稀に測定された(または完全に測定されていない)変数の値を推測するために用いられる。効果的な推論センサーを設計する際の、古典的なモデルオーバーフィッティングに似た主な課題は、センサーの正しい構造を選択することである。センサ構造は、オンラインで測定された変数とその(単純な)組み合わせに対応するセンサへの入力数によって表現される。本研究は,2つの油精製ユニット,流体触媒分解ユニットと真空ガス水素化ユニットにおける工業蒸留塔の製品組成推定センサの設計に焦点をあてたものである。最初の設計ステップとして,いくつかのよく知られたデータ前処理(gross error detection)手法を用いて,利用可能な産業データにおける系統的エラーと異常値を示すために,これらの手法の能力を比較する。次に,得られたモデルの複雑さと精度を考慮した推論センサの設計手法の有効性について検討する。有効性分析の結果、現在の平均センサによる改善は最大19%であった。

Inferential (or soft) sensors are used in industry to infer the values of imprecisely and rarely measured (or completely unmeasured) variables from variables measured online (e.g., pressures, temperatures). The main challenge in designing an effective inferential sensor, akin to classical model overfitting, is the selection of a correct structure for the sensor. The sensor structure is represented by the number of inputs to the sensor, which correspond to the variables measured online and their (simple) combinations. This work focuses on the design of inferential sensors for the product composition of an industrial distillation column in two oil refinery units, a Fluid Catalytic Cracking unit and a Vacuum Gasoil Hydrogenation unit. As the first design step, we use several well-known data pre-treatment (gross error detection) methods and compare the ability of these approaches to indicate systematic errors and outliers in the available industrial data. We then study the effectiveness of various methods for the design of inferential sensors, taking into account the complexity and accuracy of the resulting model. The effectiveness analysis indicates that the improvements achieved over the current inferential sensors are up to 19%.

翻訳日:2021-06-28 12:57:13 公開日:2021-06-25

# 高次元コルモゴロフ-スミルノフ距離の加速計算

Accelerated Computation of a High Dimensional Kolmogorov-Smirnov Distance ( http://arxiv.org/abs/2106.13706v1 )

ライセンス: Link先を確認

Alex Hagen, Shane Jackson, James Kahn, Jan Strube, Isabel Haide, Karl Pazdernik, Connor Hainje

(参考訳) 統計検査は、様々な科学分野において広く、重要である。機械学習の出現と計算能力の増大により、多次元データの分析と統計的テストへの関心が高まっている。強力なコルモゴロフ・スミルノフの2標本検定を、Fasano (Fasano, 1987) と同様の方法で高次元形式に拡張する。この結果をd次元コルモゴロフ・スミルノフ検定(ddKS)と呼び、3つの新しい貢献を行う。すなわち、与えられたddKSスコアの有意性に関する解析式を導出し、小さなサンプルサイズと次元に対して定数時間計算量となる、現代の計算ハードウェア上でのddKS計算アルゴリズムを提供し、さらにddKSの2つの近似計算法(サンプルサイズが大きい場合に時間計算量を線形に抑えるものと、次元の増加に対して時間計算量を線形に抑えるもの)を提供する。我々は、ddKSとその近似についてデータセットのコーパス上で検出力分析を行い、HotellingのT^2検定やKullback-Leiblerダイバージェンスといった、他の一般的な高次元の2標本検定および距離と比較する。ddKS検定は、テストしたすべてのデータセット、次元、サイズで良好に動作するが、他の検定と距離は、少なくとも1つのデータセットで帰無仮説を棄却できない。したがって、ddKSは汎用的に利用できる強力な多次元2標本検定であり、並列手法や近似手法を用いて高速かつ効率的に計算できると結論づける。本研究で説明したすべての手法のオープンソース実装は https://github.com/pnnl/ddks にある。

Statistical testing is widespread and critical for a variety of scientific disciplines. The advent of machine learning and the increase of computing power has increased the interest in the analysis and statistical testing of multidimensional data. We extend the powerful Kolmogorov-Smirnov two sample test to a high dimensional form in a similar manner to Fasano (Fasano, 1987). We call our result the d-dimensional Kolmogorov-Smirnov test (ddKS) and provide three novel contributions therewith: we develop an analytical equation for the significance of a given ddKS score, we provide an algorithm for computation of ddKS on modern computing hardware that is of constant time complexity for small sample sizes and dimensions, and we provide two approximate calculations of ddKS: one that reduces the time complexity to linear at larger sample sizes, and another that reduces the time complexity to linear with increasing dimension. We perform power analysis of ddKS and its approximations on a corpus of datasets and compare to other common high dimensional two sample tests and distances: Hotelling's T^2 test and Kullback-Leibler divergence. Our ddKS test performs well for all datasets, dimensions, and sizes tested, whereas the other tests and distances fail to reject the null hypothesis on at least one dataset. We therefore conclude that ddKS is a powerful multidimensional two sample test for general use, and can be calculated in a fast and efficient manner using our parallel or approximate methods. Open source implementations of all methods described in this work are located at https://github.com/pnnl/ddks.
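The idea of extending the Kolmogorov-Smirnov statistic to $d$ dimensions by counting points per orthant can be sketched with a brute-force loop. The code below is a simplified illustration in that spirit, not the ddKS algorithm or its accelerated approximations: the function names are hypothetical, every point of both samples is used as an origin, and the cost grows quadratically with sample size and exponentially with dimension. The authors' reference implementation is the one at the repository named in the abstract.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Point = Sequence[float]
Orthant = Tuple[bool, ...]


def orthant_fractions(origin: Point, sample: Sequence[Point]) -> Dict[Orthant, float]:
    """Fraction of `sample` lying in each of the 2^d orthants around `origin`."""
    d = len(origin)
    counts = {key: 0 for key in product((False, True), repeat=d)}
    for point in sample:
        # An orthant is identified by which coordinates exceed the origin's.
        counts[tuple(point[i] > origin[i] for i in range(d))] += 1
    return {key: count / len(sample) for key, count in counts.items()}


def orthant_ks_distance(sample_a: Sequence[Point], sample_b: Sequence[Point]) -> float:
    """Largest orthant-fraction gap between the samples over all candidate origins."""
    largest = 0.0
    for origin in list(sample_a) + list(sample_b):
        frac_a = orthant_fractions(origin, sample_a)
        frac_b = orthant_fractions(origin, sample_b)
        largest = max(largest, max(abs(frac_a[k] - frac_b[k]) for k in frac_a))
    return largest


if __name__ == "__main__":
    near = [(0.0, 0.0), (1.0, 1.0)]
    far = [(10.0, 10.0), (11.0, 11.0)]
    print(orthant_ks_distance(near, near), orthant_ks_distance(near, far))
```

The statistic lies between 0 and 1, with larger values indicating that some orthant around some data point holds very different shares of the two samples.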

翻訳日:2021-06-28 12:56:54 公開日:2021-06-25

# 平均フィールドゲームのための強化学習と経済学への応用

Reinforcement Learning for Mean Field Games, with Applications to Economics ( http://arxiv.org/abs/2106.13755v1 )

ライセンス: Link先を確認

Andrea Angiuli and Jean-Pierre Fouque and Mathieu Lauriere

(参考訳) 平均場ゲーム (mfg) と平均場制御問題 (mfc) は、エージェントの連続体を持つゲームにおけるnash平衡や社会光学を研究するためのフレームワークである。これらの問題は、大きな有限個のエージェントによる競争的または協調的なゲーム近似に利用することができ、特に経済学において幅広い応用を見いだすことができる。近年、MFGとMFCにおける学習の問題は、解を計算する方法と、学習者の大多数が均衡にどのように収束するかをモデル化する方法の両方として関心を集めている。特に興味深いのは、エージェントがモデルを知らない設定であり、これは強化学習(rl)メソッドの開発につながる。このトピックに関する文献をレビューした後、統一的なQ-ラーニングアルゴリズムに依存するMFGとMFCのためのRLを用いた2つのタイムスケールアプローチを提案する。この手法の主な目新しさは、アクション値関数と分布を同時に更新するが、異なるレートでモデルフリーで更新することである。 2つの学習率の比率に応じて、アルゴリズムはMFGまたはMFCソリューションのいずれかを学習する。この方法を説明するために,原ユーティリティ関数を用いた有限方向の累積消費の平均場問題と,トレーダの最適清算問題に適用する。

Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large finite number of agents and have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which leads to the development of reinforcement learning (RL) methods. After reviewing the literature on this topic, we present a two timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm. The main novelty of this method is to simultaneously update an action-value function and a distribution but with different rates, in a model-free fashion. Depending on the ratio of the two learning rates, the algorithm learns either the MFG or the MFC solution. To illustrate this method, we apply it to a mean field problem of accumulated consumption in finite horizon with HARA utility function, and to a trader's optimal liquidation problem.
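The two-timescale idea, updating an action-value table and a population distribution at the same time but with different learning rates, can be sketched with two small update functions. This is a tabular toy under stated assumptions: the state and action spaces are finite, the reward is supplied by the caller (in a mean-field setting it would depend on the current distribution estimate), a reward-maximising convention is used, and all names are hypothetical. Which ratio of rates leads to which solution is left to the paper; the sketch only exposes both rates as parameters.

```python
from typing import List


def update_distribution(mu: List[float], state: int, rate_mu: float) -> List[float]:
    """Move the distribution estimate toward the point mass at the visited state."""
    return [m + rate_mu * ((1.0 if i == state else 0.0) - m) for i, m in enumerate(mu)]


def update_q(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    rate_q: float,
    gamma: float,
) -> List[List[float]]:
    """One tabular Q-learning step, applied in place and returned for convenience."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += rate_q * (target - q[state][action])
    return q


if __name__ == "__main__":
    mu = [0.5, 0.5]                  # distribution estimate over two states
    q = [[0.0, 0.0], [0.0, 0.0]]     # Q-table: 2 states x 2 actions
    # One hypothetical transition: state 0, action 1, reward 1.0, next state 1.
    mu = update_distribution(mu, state=1, rate_mu=0.01)
    q = update_q(q, state=0, action=1, reward=1.0, next_state=1, rate_q=0.5, gamma=0.9)
    print(mu, q)
```

Running both updates on every transition with `rate_mu` and `rate_q` chosen far apart is what makes one quantity look almost frozen from the point of view of the other.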

Translated: 2021-06-28 12:56:03 Published: 2021-06-25

# HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters ( http://arxiv.org/abs/2106.13777v1 )

License: see the linked page

Gabriel Appleby, Mateus Espadoto, Rui Chen, Samuel Goree, Alexandru Telea, Erik W Anderson, Remco Chang

Projection algorithms such as t-SNE or UMAP are useful for the visualization of high-dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter value is computationally intensive and unintuitive due to the stochastic nature of these methods. In this paper we propose HyperNP, a scalable method that allows for real-time interactive hyperparameter exploration of projection methods by training neural network approximations. HyperNP can be trained on a fraction of the total data instances and hyperparameter configurations and can compute projections for new data and hyperparameters at interactive speeds. HyperNP is compact in size and fast to compute, thus allowing it to be embedded in lightweight visualization systems such as web browsers. We evaluate HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP is accurate, scalable, interactive, and appropriate for use in real-world settings.

Translated: 2021-06-28 12:55:36 Published: 2021-06-25

# Binary Matrix Factorisation and Completion via Integer Programming ( http://arxiv.org/abs/2106.13434v1 )

License: see the linked page

Reka A. Kovacs, Oktay Gunluk, Raphael A. Hauser

Binary matrix factorisation is an essential tool for identifying discrete patterns in binary data. In this paper we consider the rank-k binary matrix factorisation problem (k-BMF) under Boolean arithmetic: we are given an n x m binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension n x k and k x m respectively, which minimise the distance between X and the Boolean product of A and B in the squared Frobenius distance. We present a compact and two exponential size integer programs (IPs) for k-BMF and show that the compact IP has a weak LP relaxation, while the exponential size IPs have a stronger equivalent LP relaxation. We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times the zero is erroneously covered in a rank-k factorisation. For one of the exponential size IPs we describe a computational approach based on column generation. Experimental results on synthetic and real-world datasets suggest that our integer programming approach is competitive against available methods for k-BMF and provides accurate low-error factorisations.
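The k-BMF objective stated in the abstract can be made concrete with a small sketch. It only evaluates the squared Frobenius distance between X and the Boolean product of A and B over the observed entries; it does not implement the integer programs or column generation. The function names and the convention of marking missing entries with `None` are assumptions for illustration.

```python
def boolean_product(a, b):
    """Boolean matrix product: entry (i, j) is 1 if a[i][l] and b[l][j] are both 1 for some l."""
    n, k, m = len(a), len(b), len(b[0])
    return [[int(any(a[i][l] and b[l][j] for l in range(k))) for j in range(m)]
            for i in range(n)]


def bmf_error(x, a, b):
    """Squared Frobenius distance between x and the Boolean product of a and b.

    Entries of x equal to None are treated as missing and skipped (an assumed convention).
    """
    z = boolean_product(a, b)
    return sum((x[i][j] - z[i][j]) ** 2
               for i in range(len(x))
               for j in range(len(x[0]))
               if x[i][j] is not None)


# Small rank-2 example with one missing entry.
A = [[1, 0], [1, 1], [0, 1]]
B = [[1, 1, 0], [0, 1, 1]]
X = [[1, 1, 0], [1, None, 1], [0, 0, 1]]
error = bmf_error(X, A, B)
```

In this example the only mismatch is the zero at position (2, 1) of X, which the factorisation covers with a one; such erroneously covered zeros are the entries the abstract's new objective weights differently.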

Translated: 2021-06-28 12:54:20 Published: 2021-06-25

# Evaluation of Deep-Learning-Based Voice Activity Detectors and Room Impulse Response Models in Reverberant Environments ( http://arxiv.org/abs/2106.13511v1 )

License: see the linked page

Amir Ivry, Israel Cohen, Baruch Berdugo

State-of-the-art deep-learning-based voice activity detectors (VADs) are often trained with anechoic data. However, real acoustic environments are generally reverberant, which causes the performance to significantly deteriorate. To mitigate this mismatch between training data and real data, we simulate an augmented training set that contains nearly five million utterances. This extension comprises anechoic utterances and their reverberant modifications, generated by convolutions of the anechoic utterances with a variety of room impulse responses (RIRs). We consider five different models to generate RIRs, and five different VADs that are trained with the augmented training set. We test all trained systems in three different real reverberant environments. Experimental results show a $20\%$ increase on average in accuracy, precision and recall for all detectors and response models, compared to anechoic training. Furthermore, one of the RIR models consistently yields better performance than the other models, for all the tested VADs. Additionally, one of the VADs consistently outperformed the other VADs in all experiments.
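The augmentation step mentioned in the abstract, convolving an anechoic utterance with a room impulse response, can be sketched as follows. This is a plain direct-form convolution written for clarity; the sample values and the two-tap impulse response are hypothetical, and the abstract does not say how the authors implemented the convolution or generated the RIRs.

```python
def reverberate(utterance, rir):
    """Convolve an anechoic utterance with a room impulse response (full linear convolution)."""
    out = [0.0] * (len(utterance) + len(rir) - 1)
    for i, sample in enumerate(utterance):
        for j, tap in enumerate(rir):
            out[i + j] += sample * tap
    return out


# Hypothetical toy signals: a three-sample utterance and a direct path plus one reflection.
anechoic = [1.0, 0.0, -1.0]
toy_rir = [1.0, 0.5]
reverberant = reverberate(anechoic, toy_rir)
```

For utterances of realistic length an FFT-based convolution would normally be preferred for speed; the direct form is used here only because it needs no external library.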

Translated: 2021-06-28 12:53:58 Published: 2021-06-25

# Littlestone Classes are Privately Online Learnable ( http://arxiv.org/abs/2106.13513v1 )

License: see the linked page

Noah Golowich and Roi Livni

We consider the problem of online classification under a privacy constraint. In this setting a learner sequentially observes a stream of labelled examples $(x_t, y_t)$, for $1 \leq t \leq T$, and returns at each iteration $t$ a hypothesis $h_t$ which is used to predict the label of each new example $x_t$. The learner's performance is measured by her regret against a known hypothesis class $\mathcal{H}$. We require that the algorithm satisfies the following privacy constraint: the sequence $h_1, \ldots, h_T$ of hypotheses output by the algorithm needs to be an $(\epsilon, \delta)$-differentially private function of the whole input sequence $(x_1, y_1), \ldots, (x_T, y_T)$. We provide the first non-trivial regret bound for the realizable setting. Specifically, we show that if the class $\mathcal{H}$ has constant Littlestone dimension then, given an oblivious sequence of labelled examples, there is a private learner that makes in expectation at most $O(\log T)$ mistakes -- comparable to the optimal mistake bound in the non-private case, up to a logarithmic factor. Moreover, for general values of the Littlestone dimension $d$, the same mistake bound holds but with a factor doubly exponential in $d$. A recent line of work has demonstrated a strong connection between classes that are online learnable and those that are differentially-private learnable. Our results strengthen this connection and show that an online learning algorithm can in fact be directly privatized (in the realizable setting). We also discuss an adaptive setting and provide a sublinear regret bound of $O(\sqrt{T})$.

Translated: 2021-06-28 12:53:43 Published: 2021-06-25

# Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification ( http://arxiv.org/abs/2106.13514v1 )

License: see the linked page

Yan Liu, Zheng Li, Lin Li, Qingyang Hong

This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV). In the proposed structure, frame-level multi-task learning along with segment-level adversarial learning is adopted for speaker embedding extraction. The phoneme-aware attentive pooling is exploited on frame-level features in the main network for the speaker classifier, with the corresponding posterior probability for the phoneme distribution in the auxiliary subnet. Further, the introduction of Squeeze and Excitation (SE-block) performs dynamic channel-wise feature recalibration, which improves the representational ability. The proposed method exploits speaker idiosyncrasies associated with pass-phrases, and is further improved by the phoneme-aware attentive pooling and SE-block from temporal and channel-wise aspects, respectively. The experiments conducted on the RSR2015 Part 1 database confirm that the proposed system achieves outstanding results for text-dependent SV.
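The channel-wise recalibration performed by a Squeeze and Excitation block can be sketched in a few lines. The sketch follows the generic SE recipe (average over frames, a small bottleneck with ReLU, a sigmoid gate per channel, then rescaling); the weight shapes, the omission of bias terms, and the function name `se_block` are assumptions here, and the abstract does not specify where or how the block is placed in the authors' network.

```python
import math


def se_block(features, w1, w2):
    """Squeeze-and-Excitation over a C x T feature map (C channels, T frames).

    w1 has shape (C // r) x C and w2 has shape C x (C // r); biases are omitted
    for brevity. Both weight matrices are hypothetical placeholders.
    """
    # Squeeze: one summary value per channel (mean over frames).
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Recalibration: scale every frame of a channel by that channel's gate.
    return [[gate * value for value in channel]
            for gate, channel in zip(gates, features)]
```

With all-zero weights every gate equals 0.5, so each channel is simply halved; trained weights would instead emphasise some channels and suppress others depending on the input.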

Translated: 2021-06-28 12:53:06 Published: 2021-06-25

# Deep Residual Echo Suppression with A Tunable Tradeoff Between Signal Distortion and Echo Suppression ( http://arxiv.org/abs/2106.13531v1 )

License: see the linked page

Amir Ivry, Israel Cohen, Baruch Berdugo

In this paper, we propose a residual echo suppression method using a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. This system embeds a design parameter that allows a tunable tradeoff between the desired-signal distortion and residual echo suppression in double-talk scenarios. The system employs 136 thousand parameters, and requires 1.6 giga floating-point operations per second and 10 megabytes of memory. The implementation satisfies both the timing requirements of the AEC challenge and the computational and memory limitations of on-device applications. Experiments are conducted with 161 h of data from the AEC challenge database and from real independent recordings. We demonstrate the performance of the proposed system in real-life conditions and compare it with two competing methods regarding echo suppression and desired-signal distortion, generalization to various environments, and robustness to high echo levels.

Translated: 2021-06-28 12:52:51 Published: 2021-06-25

# Transient Stability Analysis with Physics-Informed Neural Networks ( http://arxiv.org/abs/2106.13638v1 )

License: see the linked page

Jochen Stiasny, Georgios S. Misyris, Spyros Chatzivasileiadis

Solving the ordinary differential equations that govern the power system is an indispensable part of transient stability analysis. However, the traditionally applied methods either carry a significant computational burden, require model simplifications, or use overly conservative surrogate models. Neural networks can circumvent these limitations but are faced with high demands on the used datasets. Furthermore, they are agnostic to the underlying governing equations. Physics-informed neural networks tackle this problem, and we explore their advantages and challenges in this paper. We illustrate the findings on the Kundur two-area system and highlight possible pathways forward in developing this method further.

Translated: 2021-06-28 12:52:35 Published: 2021-06-25

# Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning ( http://arxiv.org/abs/2106.13703v1 )

License: see the linked page

Alec Farid, Sushant Veer, Anirudha Majumdar

Our goal is to perform out-of-distribution (OOD) detection, i.e., to detect when a robot is operating in environments that are drawn from a different distribution than the environments used to train the robot. We leverage Probably Approximately Correct (PAC)-Bayes theory in order to train a policy with a guaranteed bound on performance on the training distribution. Our key idea for OOD detection then relies on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD. We formalize this via statistical techniques based on p-values and concentration inequalities. The resulting approach (i) provides guaranteed confidence bounds on OOD detection, and (ii) is task-driven and sensitive only to changes that impact the robot's performance. We demonstrate our approach on a simulated example of grasping objects with unfamiliar poses or shapes. We also present both simulation and hardware experiments for a drone performing vision-based obstacle avoidance in unfamiliar environments (including wind disturbances and different obstacle densities). Our examples demonstrate that we can perform task-driven OOD detection within just a handful of trials. Comparisons with baselines also demonstrate the advantages of our approach in terms of providing statistical guarantees and being insensitive to task-irrelevant distribution shifts.
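The intuition that a violated performance bound is evidence of OOD operation can be illustrated with a concentration inequality. The sketch below uses Hoeffding's inequality as one possible choice; the abstract only says that p-values and concentration inequalities are used, so the specific inequality, the assumption that costs lie in [0, 1], and the function name `ood_p_value` are assumptions made here.

```python
import math


def ood_p_value(test_costs, bound):
    """Upper bound on the p-value for the null hypothesis 'the test environments are in-distribution'.

    Assumes each cost lies in [0, 1], the test environments are drawn i.i.d.,
    and `bound` upper-bounds the expected cost on the training distribution.
    Under these assumptions Hoeffding's inequality gives
    P(empirical mean - expected cost >= gap) <= exp(-2 * n * gap**2).
    """
    n = len(test_costs)
    gap = sum(test_costs) / n - bound
    if gap <= 0.0:
        return 1.0  # The bound is not violated, so there is no evidence of a shift.
    return math.exp(-2.0 * n * gap ** 2)
```

With ten trials that each cost 0.9 against a bound of 0.4, the value is exp(-5), which is below 0.01; this is consistent with the abstract's remark that a handful of trials can suffice when the bound is clearly violated.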

Translated: 2021-06-28 12:52:24 Published: 2021-06-25

# Primordial non-Gaussianity from the Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey I: Catalogue Preparation and Systematic Mitigation ( http://arxiv.org/abs/2106.13724v1 )

License: see the linked page

Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Eva-Maria Mueller, Will J. Percival, Grant Merz, Reza Katebi, Razvan C. Bunescu, Julian Bautista, Joel R. Brownstein, Etienne Burtin, Kyle Dawson, Héctor Gil-Marín, Jiamin Hou, Eleanor B. Lyke, Axel de la Macorra, Graziano Rossi, Donald P. Schneider, Pauline Zarrouk, Gong-Bo Zhao

We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate spurious fluctuations in the density field caused by spatial variations in the quality of the imaging data used to select targets for follow-up spectroscopy. Simulations are used with the same angular and radial distributions as the real data to estimate covariance matrices, perform error analyses, and assess residual systematic uncertainties. We measure the mean density contrast and cross-correlations of the eBOSS quasars against maps of potential sources of imaging systematics to assess the effectiveness of the algorithm, finding that the neural network-based approach outperforms standard linear regression. Stellar density is one of the most important sources of spurious fluctuations, and a new template constructed using data from the Gaia spacecraft provides the best match to the observed quasar clustering. The end product of this work is a new value-added quasar catalogue with improved weights to correct for nonlinear imaging systematic effects, which will be made public. Our quasar catalogue is used to measure the local-type primordial non-Gaussianity in our companion paper, Mueller et al. (in preparation).

Translated: 2021-06-28 12:52:04 Published: 2021-06-25

# Nonlinear Acoustic Echo Cancellation with Deep Learning ( http://arxiv.org/abs/2106.13754v1 )

License: see the linked page

Amir Ivry, Israel Cohen, Baruch Berdugo

We propose a nonlinear acoustic echo cancellation system, which aims to model the echo path from the far-end signal to the near-end microphone in two parts. Inspired by the physical behavior of modern hands-free devices, we first introduce a novel neural network architecture that is specifically designed to model the nonlinear distortions these devices induce between receiving and playing the far-end signal. To account for variations between devices, we construct this network with trainable memory length and nonlinear activation functions that are not parameterized in advance, but are rather optimized during the training stage using the training data. Second, the network is succeeded by a standard adaptive linear filter that constantly tracks the echo path between the loudspeaker output and the microphone. During training, the network and filter are jointly optimized to learn the network parameters. This system requires 17 thousand parameters that consume 500 million floating-point operations per second and 40 kilobytes of memory. It also satisfies hands-free communication timing requirements on a standard neural processor, which renders it adequate for embedding on hands-free communication devices. Using 280 hours of real and synthetic data, experiments show advantageous performance compared to competing methods.
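The second stage described in the abstract, an adaptive linear filter that tracks the echo path between loudspeaker and microphone, can be sketched with a normalised LMS (NLMS) filter. NLMS is an assumption here: the abstract only says "standard adaptive linear filter". The neural network that models the nonlinear distortions is not included, and the two-tap echo path, step size, and filter length are hypothetical toy values.

```python
import random


def nlms(reference, microphone, taps=4, step=0.5, eps=1e-8):
    """Track a linear echo path with a normalised LMS filter; returns weights and residual."""
    w = [0.0] * taps
    residual = []
    for n in range(len(reference)):
        # Most recent `taps` loudspeaker samples, newest first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = microphone[n] - echo_estimate  # Residual after echo removal.
        power = sum(xk * xk for xk in x) + eps
        # Normalised gradient step toward the echo path.
        w = [wk + step * e * xk / power for wk, xk in zip(w, x)]
        residual.append(e)
    return w, residual


# Toy demonstration: noiseless echo through a hypothetical two-tap path.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3]
mic = [sum(h * far_end[n - k] for k, h in enumerate(true_path) if n - k >= 0)
       for n in range(len(far_end))]
w_est, residual = nlms(far_end, mic)
```

In this noiseless toy setting the estimated weights approach the true path and the residual decays toward zero; with near-end speech or nonlinear loudspeaker distortion a purely linear filter leaves residual echo, which is the gap the abstract's network is designed to address.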

Translated: 2021-06-28 12:51:34 Published: 2021-06-25

# Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets ( http://arxiv.org/abs/2106.13763v1 )

License: see the linked page

Amir Ivry, Baruch Berdugo, Israel Cohen

We address voice activity detection in acoustic environments with transient and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder-decoder based neural network architecture. This structure involves an encoder that maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, which is trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known Diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods. The improvement is achieved in accuracy, robustness, and generalization ability. Our model performs in a real-time manner and can be integrated into audio-based communication systems. We also present a batch algorithm which obtains an even higher accuracy for off-line applications.

Translated: 2021-06-28 12:51:14 Published: 2021-06-25

# The Price of Tolerance in Distribution Testing ( http://arxiv.org/abs/2106.13414v1 )

License: see the linked page

Clément L. Canonne, Ayush Jain, Gautam Kamath, Jerry Li

We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution $p$ over $\{1, \dots, n\}$, is it $\varepsilon_1$-close to or $\varepsilon_2$-far from a reference distribution $q$ (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e., $\varepsilon_1 = 0$) the sample complexity is $\Theta(\sqrt{n})$, strongly sublinear in the domain size. At the other end of the spectrum, when $\varepsilon_1 = \varepsilon_2/2$, the sample complexity jumps to the barely sublinear $\Theta(n/\log n)$. However, very little is known about the intermediate regime. We fully characterize the price of tolerance in distribution testing as a function of $n$, $\varepsilon_1$, $\varepsilon_2$, up to a single $\log n$ factor. Specifically, we show the sample complexity to be \[\tilde \Theta\left(\frac{\sqrt{n}}{\varepsilon_2^{2}} + \frac{n}{\log n} \cdot \max \left\{\frac{\varepsilon_1}{\varepsilon_2^2},\left(\frac{\varepsilon_1}{\varepsilon_2^2}\right)^{\!\!2}\right\}\right),\] providing a smooth tradeoff between the two previously known cases. We also provide a similar characterization for the problem of tolerant equivalence testing, where both $p$ and $q$ are unknown. Surprisingly, in both cases, the main quantity dictating the sample complexity is the ratio $\varepsilon_1/\varepsilon_2^2$, and not the more intuitive $\varepsilon_1/\varepsilon_2$. Of particular technical interest is our lower bound framework, which involves novel approximation-theoretic tools required to handle the asymmetry between $\varepsilon_1$ and $\varepsilon_2$, a challenge absent from previous works.
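To see how the stated sample complexity interpolates between the two extreme cases, the expression inside the $\tilde \Theta$ can be evaluated numerically. The function below drops all constants and the hidden logarithmic factor, so its output is a growth rate rather than an actual sample count; the function name is an assumption for illustration.

```python
import math


def tolerant_testing_rate(n, eps1, eps2):
    """Expression inside the Theta-tilde bound, with constants and hidden log factors dropped."""
    ratio = eps1 / eps2 ** 2
    return math.sqrt(n) / eps2 ** 2 + n / math.log(n) * max(ratio, ratio ** 2)


# Noiseless case (eps1 = 0): only the sqrt(n) / eps2^2 term remains.
noiseless = tolerant_testing_rate(10_000, 0.0, 0.1)
# Constant eps2 with eps1 = eps2 / 2: the n / log n term dominates for large n.
tolerant = tolerant_testing_rate(10_000, 0.25, 0.5)
```

Both terms depend on the tolerance only through the ratio $\varepsilon_1/\varepsilon_2^2$, which is the quantity the abstract singles out as dictating the sample complexity.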

Translated: 2021-06-28 12:50:59 Published: 2021-06-25

PDF registration status (publication date: 20210625)