Fugu-MT: Translations of arXiv Papers

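This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.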

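The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).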

Papers with a publication date of 20210702.

Title	Authors	Abstract	Publication date / translation date
# Concentration for random product formulas ( http://arxiv.org/abs/2008.11751v3 ) License: see linked page	Chi-Fang Chen, Hsin-Yuan Huang, Richard Kueng, Joel A. Tropp	Quantum simulation has wide applications in quantum chemistry and physics. Recently, scientists have begun exploring the use of randomized methods for accelerating quantum simulation. Among them, a simple and powerful technique, called qDRIFT, is known to generate random product formulas for which the average quantum channel approximates the ideal evolution. qDRIFT achieves a gate count that does not explicitly depend on the number of terms in the Hamiltonian, which contrasts with Suzuki formulas. This work aims to understand the origin of this speed-up by comprehensively analyzing a single realization of the random product formula produced by qDRIFT. The main results prove that a typical realization of the randomized product formula approximates the ideal unitary evolution up to a small diamond-norm error. The gate complexity is already independent of the number of terms in the Hamiltonian, but it depends on the system size and the sum of the interaction strengths in the Hamiltonian. Remarkably, the same random evolution starting from an arbitrary, but fixed, input state yields a much shorter circuit suitable for that input state. In contrast, in deterministic settings, such an improvement usually requires initial state knowledge. The proofs depend on concentration inequalities for vector and matrix martingales, and the framework is applicable to other randomized product formulas. Our bounds are saturated by certain commuting Hamiltonians.	Translated: 2023-05-04 21:30:30 Published: 2021-07-02
# Photon retention in coherently excited nitrogen ions ( http://arxiv.org/abs/2011.11926v2 ) License: see linked page	Jinping Yao, Luojia Wang, Jinming Chen, Yuexin Wan, Zhihao Zhang, Fangbo Zhang, Lingling Qiao, Shupeng Yu, Botao Fu, Zengxiu Zhao, Chengyin Wu, Vladislav V. Yakovlev, Luqi Yuan, Xianfeng Chen, Ya Cheng	Quantum coherence in quantum optics is an essential part of optical information processing and light manipulation. Alkali metal vapors, despite their numerous shortcomings, are traditionally used in quantum optics as a working medium due to convenient near-infrared excitation, strong dipole transitions and long-lived coherence. Here, we propose and experimentally demonstrate photon retention and subsequent re-emittance with quantum coherence in a system of coherently excited molecular nitrogen ions (N2+) which are produced using a strong 800 nm femtosecond laser pulse. Such photon retention, facilitated by quantum coherence, keeps releasing directly-unmeasurable coherent photons for tens of picoseconds, but can be read out by a time-delayed femtosecond pulse centered at 1580 nm via two-photon resonant absorption, resulting in strong radiation at 329.3 nm. We reveal a pivotal role of the excited-state population in transmitting such extremely weak re-emitted photons in this system. This new finding unveils the nature of the coherent quantum control in N2+ as a potential platform for optical information storage in the remote atmosphere, and facilitates further exploration of fundamental interactions in the quantum optical platform with strong-field ionized molecules.	Translated: 2023-04-23 06:49:22 Published: 2021-07-02
# Reinforcement Learning Assisted Beamforming for Inter-cell Interference Mitigation in 5G Massive MIMO Networks ( http://arxiv.org/abs/2103.11782v2 ) License: see linked page	Aidong Yang, Xinlang Yue, Ye Ouyang	Beamforming is an essential technology in 5G massive multiple-input-multiple-output (MMIMO) communications, which are subject to many impairments due to the nature of the wireless transmission channel, i.e., the air. Inter-cell interference (ICI) is one of the main impairments faced by 5G communications due to frequency-reuse technologies. In this paper, we propose reinforcement learning (RL) assisted full dynamic beamforming for ICI mitigation in the 5G downlink. The proposed algorithm combines beamforming with full dynamic Q-learning to minimize the ICI, and results in a low-complexity method without channel estimation. Performance analysis shows the quality of service improvement in terms of signal-to-interference-plus-noise-ratio (SINR) and computational complexity compared to other algorithms.	Translated: 2023-04-13 19:40:38 Published: 2021-07-02
# Measurement of work and heat in the classical and quantum regimes ( http://arxiv.org/abs/2102.01493v2 ) License: see linked page	Paolo Solinas, Mirko Amico and Nino Zanghì	Despite the increasing interest, the research field which studies the concepts of work and heat at the quantum level has suffered from two main drawbacks: first, the difficulty of properly defining and measuring the work, heat and internal energy variation in a quantum system and, second, the lack of experiments. Here, we report a full characterization of the dissipated heat, work and internal energy variation in a two-level quantum system interacting with an engineered environment. We use the IBMQ quantum computer to implement the driven system's dynamics in a dissipative environment. The experimental data allow us to construct quasi-probability distribution functions from which we recover the correct averages of work, heat and internal energy variation in the dissipative processes. Interestingly, by increasing the environment coupling strength, we observe a reduction of the pure quantum features of the energy exchange processes that we interpret as the emergence of the classical limit. This makes the present approach a privileged tool to study, understand and exploit quantum effects in energy exchanges.	Translated: 2023-04-13 00:39:22 Published: 2021-07-02
# Bipartite and tripartite entanglement in a Bose-Einstein acoustic black hole ( http://arxiv.org/abs/2102.06175v2 ) License: see linked page	Mathieu Isoard, Nadia Milazzo, Nicolas Pavloff, Olivier Giraud	We investigate quantum entanglement in an analogue black hole realized in the flow of a Bose-Einstein condensate. The system is described by a three-mode Gaussian state and we construct the corresponding covariance matrix at zero and finite temperature. We study associated bipartite and tripartite entanglement measures and discuss their experimental observation. We identify a simple optical setup equivalent to the analogue Bose-Einstein black hole which suggests a new way of determining the Hawking temperature and grey-body factor of the system.	Translated: 2023-04-11 11:51:18 Published: 2021-07-02
# Modeling the Nervous System as An Open Quantum System ( http://arxiv.org/abs/2104.09424v2 ) License: see linked page	Yu-Juan Sun and Wei-Min Zhang	We propose a neural network model of a multi-neuron interacting system in which neurons interact with each other through the surroundings of neuronal cell bodies. We physically model the neuronal cell surroundings, including the dendrites, the axons and the synapses as well as the surrounding glial cells, as a collection of all kinds of oscillating modes arising from the electric circuital environment of neuronal action potentials. By analyzing the dynamics of this neural model through the master equation approach of open quantum systems, we investigate the collective behavior of neurons. After applying stimulations to the neural network, the neuronal collective state is activated and shows the action potential behavior. We find that this model can generate random neuron-neuron interactions and is suitable for physically describing the process of information transmission in the nervous system, which may pave a potential route toward understanding the dynamics of the nervous system.	Translated: 2023-04-07 18:40:23 Published: 2021-07-02
# Entanglement Entropy with Lifshitz Fermions ( http://arxiv.org/abs/2104.10913v3 ) License: see linked page	Dion Hartmann and Kevin Kavanagh and Stefan Vandoren	We investigate fermions with Lifshitz scaling symmetry and study their entanglement entropy in 1+1 dimensions as a function of the scaling exponent $z$. Remarkably, in the ground state the entanglement entropy vanishes for even values of $z$, whereas for odd values it is independent of $z$ and equal to the relativistic case with $z=1$. We show this using the correlation method on the lattice, and also using a holographic cMERA approach. The entanglement entropy in a thermal state is a more detailed function of $z$ and $T$ which we plot using the lattice correlation method. The dependence on the even- or oddness of $z$ still shows for small temperatures, but is washed out for large temperatures or large values of $z$.	Translated: 2023-04-02 20:28:47 Published: 2021-07-02
# Quantum feedback control in quantum photosynthesis ( http://arxiv.org/abs/2105.12128v3 ) License: see linked page	S.V. Kozyrev, A.N. Pechen	A model of charge separation in quantum photosynthesis as a model of quantum feedback control in a system of interacting excitons and vibrons is introduced. Quantum feedback in this approach describes the Landau-Zener transition with decoherence. The model explains irreversibility in the process of charge separation for quantum photosynthesis: direct transitions for this quantum control model will have probabilities close to one and reverse transitions will have probabilities close to zero. This can be considered as a model of a quantum ratchet. This model also explains the coincidence of the energy of the vibron paired to the transition with the Bohr frequency of the transition.	Translated: 2023-03-29 20:40:35 Published: 2021-07-02
# Gauge dependence of spontaneous radiation spectrum in a time-dependent relativistic non-perturbative Coulomb field ( http://arxiv.org/abs/2106.03429v3 ) License: see linked page	Xue-Nan Chen, Yu-hang Luo and Xiang-Song Chen	We extend the "gauge choice" problem Lamb noticed to include a time-dependent relativistic non-perturbative Coulomb field, which can be produced by a cluster of relativistic charged particles. If adiabatic conditions are carefully maintained, such a field must be included alongside the nuclear Coulomb potential when defining the atomic state. We reveal that when taking the external field approximation, the gauge choice for this time-dependent relativistic non-perturbative Coulomb field cannot be overcome by previous methods, and leads to considerable gauge-dependence of the transient spontaneous radiation spectrum. We calculate explicitly with a simple one-dimensional charged harmonic oscillator that such a gauge-dependence can be of a measurable magnitude of 10 MHz or larger for the commonly used Coulomb, Lorentz, and multipolar gauges. Contrary to the popular view, we explain that this gauge dependence is not really a disaster, but actually an advantage here: The relativistic bound-state problem is so complicated that a fully quantum-field method is still lacking, thus the external field approximation cannot be derived and hence is not guaranteed. However, by fitting to the experimental data, one may always define an effective external field, which may likely be parameterized with the gauge potential in a particular gauge. This effective external field would not only be of phenomenological use, but also shed light on the physical significance of the gauge field.	Translated: 2023-03-27 09:16:24 Published: 2021-07-02
# Quantum spin solver near saturation: QS$^3_{~}$ ( http://arxiv.org/abs/2107.00872v1 ) License: see linked page	Hiroshi Ueda, Seiji Yunoki, Tokuro Shimokawa	We develop a program package named QS$^{3}$ [kjúː-és-kjúːb] based on the (thick-restart) Lanczos method for analyzing spin-1/2 XXZ-type quantum spin models on spatially uniform/non-uniform lattices near fully polarized states, which can be mapped to dilute hardcore Bose systems. All calculations in QS$^{3}$, including eigenvalue problems, expectation values for one/two-point spin operators, and static/dynamical spin structure factors, are performed in the symmetry-adapted bases specified by the number $N_{\downarrow}$ of down spins and the wave number $\boldsymbol{k}$ associated with the translational symmetry without using the bit representation for specifying spin configurations. Because of these treatments, QS$^{3}$ can support large-scale quantum systems containing more than 1000 sites with dilute $N_{\downarrow}$. We show the benchmark results of QS$^{3}$ for the low-energy excitation dispersion of the isotropic Heisenberg model on the $10\times10\times10$ cubic lattice, the static and dynamical spin structure factors of the isotropic Heisenberg model on the $10\times10$ square lattice, and the OpenMP parallelization efficiency on the supercomputer (Ohtaka) based on AMD EPYC 7702 installed at the Institute for Solid State Physics (ISSP). Theoretical backgrounds and the user interface of QS$^{3}$ are also described.	Translated: 2023-03-23 18:51:19 Published: 2021-07-02
# Coupling two laser-cooled ions via a room-temperature conductor ( http://arxiv.org/abs/2107.00851v1 ) License: see linked page	Da An, Alberto M. Alonso, Clemens Matthiesen, and Hartmut Häffner	We demonstrate coupling between the motions of two independently trapped ions with a separation distance of 620 $\mu$m. The ion-ion interaction is enhanced via a room-temperature electrically floating metallic wire which connects two surface traps. Tuning the motion of both ions into resonance, we show flow of energy with a coupling rate of 11 Hz. Quantum-coherent coupling is hindered by strong surface electric-field noise in our device. Our ion-wire-ion system demonstrates that room-temperature conductors can be used to mediate and tune interactions between independently trapped charges over distances beyond those achievable with free-space dipole-dipole coupling. This technology may be used to sympathetically cool or entangle remotely trapped charges and enable coupling between disparate physical systems.	Translated: 2023-03-23 18:50:42 Published: 2021-07-02
# A microscopic derivation of the quantum measurement postulates ( http://arxiv.org/abs/2107.00803v1 ) License: see linked page	Vyacheslav Lysov and Yasha Neiman	In the mid-19th century, both the laws of mechanics and thermodynamics were known, and both appeared fundamental. This was changed by Boltzmann and Gibbs, who showed that thermodynamics can be derived, by applying mechanics to very large systems, and making simple statistical assumptions about their behavior. Similarly, when Quantum Mechanics (QM) was first discovered, it appeared to require two sets of postulates: one about the deterministic evolution of wavefunctions, and another about the probabilistic measurement process. Here again, the latter is derivable from the former: by applying unitary evolution to large systems (apparatuses, observers and environment), and making simple assumptions about their behavior, one can derive all the features of quantum measurement. We set out to demonstrate this claim, using a simple and explicit model of a quantum experiment, which we hope will be clear and compelling to the average physicist.	Translated: 2023-03-23 18:50:23 Published: 2021-07-02
# Quantum Imaginary Time Evolution Algorithm for Quantum Field Theories with Continuous Variables ( http://arxiv.org/abs/2107.00791v1 ) License: see linked page	Kübra Yeter-Aydeniz, Eleftherios Moschandreou, George Siopsis	We calculate the energy levels and corresponding eigenstates of an interacting scalar quantum field theory on a lattice using a continuous-variable version of the quantum imaginary time evolution algorithm. Only a single qumode is needed for the simulation of the field at each point on the lattice. Our quantum algorithm avoids the use of non-Gaussian quantum gates and relies, instead, on detectors projecting onto eigenstates of the photon-number operator. Using Xanadu's Strawberry Fields simulator, we obtain results on energy levels that are in very good agreement with results from exact calculations. We propose an experimental setup that can be realized with existing technology.	Translated: 2023-03-23 18:50:05 Published: 2021-07-02
# Exact solution of a non-stationary cavity with one intermode interaction ( http://arxiv.org/abs/2107.00785v1 ) License: see linked page	I. Ramos-Prieto, R. Román-Ancheyta, J. Récamier and H. M. Moya-Cessa	A non-stationary one-dimensional cavity can be described by the time-dependent and multi-mode effective Hamiltonian of the so-called dynamical Casimir effect. Due to the non-adiabatic boundary conditions imposed in one of the cavity mirrors, this effect predicts the generation of real photons out of vacuum fluctuations of the electromagnetic field. Such photon generation strongly depends on the number of modes in the cavity and their intermode couplings. Here, by using an algebraic approach, we show that for any set of functions parameterizing the effective Hamiltonian, the corresponding time-dependent Schrödinger equation admits an exact solution when the cavity has one intermode interaction. With the exact time evolution operator, written as a product of eleven exponentials, we obtain the average photon number in each mode, a few relevant observables and some statistical properties for the evolved vacuum state.	Translated: 2023-03-23 18:49:56 Published: 2021-07-02
# Einstein-Podolsky-Rosen uncertainty limits for bipartite multimode states ( http://arxiv.org/abs/2107.01058v1 ) License: see linked page	Paulina Marian and Tudor A. Marian	Certification and quantification of correlations for multipartite states of quantum systems appear to be a central task in quantum information theory. We give here a unitary quantum-mechanical perspective of both entanglement and Einstein-Podolsky-Rosen (EPR) steering of continuous-variable multimode states. This originates in the Heisenberg uncertainty relations for the canonical quadrature operators of the modes. Correlations of two-party $(N\, \text{vs} \,1)$-mode states are examined by using the variances of a pair of suitable EPR-like observables. It turns out that the uncertainty sum of these nonlocal variables is bounded from below by local uncertainties and is strengthened differently for separable states and for each kind of one-way unsteerable state. The analysis of the minimal properly normalized sums of these variances yields necessary conditions of separability and EPR unsteerability of $(N\, \text{vs} \,1)$-mode states in both possible ways of steering. When the states and the performed measurements are Gaussian, then these conditions are precisely the previously-known criteria of separability and one-way unsteerability.	Translated: 2023-03-23 18:45:21 Published: 2021-07-02
# Large-scale and High-speed Privacy Amplification for FPGA-based Quantum Key Distribution ( http://arxiv.org/abs/2107.01013v1 ) License: see linked page	Yan Bingze and Li Qiong and Mao Haokun	The FPGA-based quantum key distribution (QKD) system is an important trend in QKD systems. It has several advantages: real-time operation, low power consumption and high integration density. Privacy amplification (PA) is an essential part of a QKD system to ensure the security of QKD. Existing FPGA-based privacy amplification schemes have a disadvantage: the throughput and the input size of these schemes (the best scheme 116Mbps@10^6) are much lower than those on other platforms (the best scheme 1Gbps@10^8). This paper designs a new PA scheme for FPGA-based QKD with multilinear modular hash-modular arithmetic hash (MMH-MH) PA and the number theoretical transform (NTT) algorithm. The new PA scheme, named the large-scale and high-speed (LSHS) PA scheme, designs a multiplication-reusable architecture and three key units to improve the performance. This scheme improves the input size and throughput of PA by more than an order of magnitude. The throughput and input size of this scheme (1Gbps@10^8) are at a comparable level with those on other platforms.	Translated: 2023-03-23 18:44:38 Published: 2021-07-02
# Quantum dynamics of a one degree-of-freedom Hamiltonian saddle-node bifurcation ( http://arxiv.org/abs/2107.00979v1 ) License: see linked page	Wenyang Lyu, Shibabrat Naik, Stephen Wiggins	In this paper, we study the quantum dynamics of a one degree-of-freedom (DOF) Hamiltonian that is a normal form for a saddle-node bifurcation of equilibrium points in phase space. The Hamiltonian has the form of the sum of kinetic energy and potential energy. The bifurcation parameter is in the potential energy function and its effect on the potential energy is to vary the depth of the potential well. The main focus is to evaluate the effect of the depth of the well on the quantum dynamics. This evaluation is carried out through the computation of energy eigenvalues and eigenvectors of the time-independent Schrödinger equations, expectation values and position uncertainties for the position coordinate, and Wigner functions.	Translated: 2023-03-23 18:44:22 Published: 2021-07-02
# Time and Evolution in Quantum and Classical Cosmology ( http://arxiv.org/abs/2107.00917v1 ) License: see linked page	Alexander Yu. Kamenshchik, Jeinny Nallely Perez Rodriguez and Tereza Vardanyan	We analyze the issue of dynamical evolution and time in quantum cosmology. We emphasize the problem of choice of phase space variables that can play the role of a time parameter in such a way that for expectation values of quantum operators the classical evolution is reproduced. We show that it is neither necessary nor sufficient for the Poisson bracket between the time variable and the super-Hamiltonian to be equal to unity in all of the phase space. We also discuss the question of switching between different internal times as well as the Montevideo interpretation of quantum theory.	Translated: 2023-03-23 18:43:50 Published: 2021-07-02
# Transfer and teleportation of system-environment entanglement ( http://arxiv.org/abs/2107.00895v1 ) License: see linked page	Tytus Harlender and Katarzyna Roszak	We study bidirectional teleportation while explicitly taking into account an environment. This environment initially causes pure dephasing decoherence of the Bell state which assists teleportation. We find that when teleportation is performed in one direction it is accompanied by a transfer of correlations into the post-teleportation state of qubit $C$, which results in decoherence of the state. In the other direction, if no new decoherence process occurs, we find that not only the state of the qubit but also its correlations with an environment are being teleported with unit fidelity. These processes do not depend on the measurement outcome during teleportation and do not differentiate between classical and quantum correlations. If, on the other hand, the second teleportation step is preceded by decoherence of the Bell state then the situation is much more complicated. Teleportation and transfer of correlations occur simultaneously, yielding different teleported qubit-environment states for different measurement outcomes. These states can differ in the degree of coherence of the teleported qubit, but only for an entangling Bell-state-environment interaction in the first step of teleportation can they have different amounts of qubit-environment entanglement. In the extreme case, one of the teleported qubit states can be entangled with the environment while the other is separable.	Translated: 2023-03-23 18:43:02 Published: 2021-07-02
# Reduced quantum circuits for stabilizer states and graph states ( http://arxiv.org/abs/2107.00885v1 ) License: see linked page	Marc Bataille	We start by studying the subgroup structures underlying stabilizer circuits and we use our results to propose a new normal form for stabilizer circuits. This normal form is computed by induction using simple conjugation rules in the Clifford group. It has shape CX-CZ-P-H-CZ-P-H, where CX (resp. CZ) denotes a layer of CNOT (resp. CZ) gates, P a layer of phase gates and H a layer of Hadamard gates. Then we consider a normal form for stabilizer states and we show how to reduce the two-qubit gate count in circuits implementing graph states. Finally we carry out a few numerical tests on classical and quantum computers in order to show the practical utility of our methods. All the algorithms described in the paper are implemented in the C language as a Linux command available on GitHub.	Translated: 2023-03-23 18:42:39 Published: 2021-07-02
# 最大エントロピー法による再構成密度行列の純状態への収束 Convergence of reconstructed density matrix to a pure state using maximal entropy approach ( http://arxiv.org/abs/2107.01191v1 ) ライセンス: Link先を確認	Rishabh Gupta, Sabre Kais and Raphael D. Levine	(参考訳) 様々な種類の量子システムの技術応用の研究において、過去10年間に印象的な進歩があった。 IBMのような業界の巨人が、2023年末までに1000量子ビットを超えるスケーラブルな量子デバイスに関するロードマップを公開し、これらのデバイス上で量子処理をテストするための効率的な検証技術も開発されている。量子状態のキャラクタリゼーションは、量子状態トモグラフィ(QST)と呼ばれるプロセスを通じて実験的に測定され、システムのサイズと指数関数的にスケールする。しかし、不完全測定を用いたQSTは、これらの量子技術の特徴、特に全ての平均測定が高忠実で利用できるわけではないノイズの多い中間規模量子(NISQ)デバイスの現在の性質に適している。本稿では,量子系の密度行列を任意の数の量子ビットに対して完全再構成するために,最大エントロピー形式を既知平均測定値のペアワイズ結合に適用することにより,qstの代替手法を提案する。このアプローチは、再構成された密度行列を純粋な状態に収束する場合の観測可能な完全な集合を知っているとき、対象状態の最良の推定を提供する。我々のゴールは、純粋状態の量子システムの実用的な推論を提供することで、その応用を実際の量子コンピュータにおける量子エラー軽減の分野に適用し、さらなる研究を予定している。 Impressive progress has been made in the past decade in the study of technological applications of varied types of quantum systems. With industry giants like IBM laying down their roadmap for scalable quantum devices with more than 1000-qubits by the end of 2023, efficient validation techniques are also being developed for testing quantum processing on these devices. The characterization of a quantum state is done by experimental measurements through the process called quantum state tomography (QST) which scales exponentially with the size of the system. However, QST performed using incomplete measurements is aptly suited for characterizing these quantum technologies especially with the current nature of noisy intermediate-scale quantum (NISQ) devices where not all mean measurements are available with high fidelity. We, hereby, propose an alternative approach to QST for the complete reconstruction of the density matrix of a quantum system in a pure state for any number of qubits by applying the maximal entropy formalism on the pairwise combinations of the known mean measurements. 
This approach provides the best estimate of the target state when we know the complete set of observables which is the case of convergence of the reconstructed density matrix to a pure state. Our goal is to provide a practical inference of a quantum system in a pure state that can find its applications in the field of quantum error mitigation on a real quantum computer that we intend to investigate further.	翻訳日:2023-03-23 18:36:32 公開日:2021-07-02
# 図形計算における量子多値決定図 Quantum Multiple-Valued Decision Diagrams in Graphical Calculi ( http://arxiv.org/abs/2107.01186v1 ) ライセンス: Link先を確認	Renaud Vilmart	(参考訳) zh計算のようなグラフィカル計算は、量子過程の研究と解析において強力なツールであり、量子回路や測定に基づく計算などの量子計算の他のモデルとリンクしている。量子過程を記述するためのややコンパクトだが体系的な方法は量子多重値決定図(QMDD)を使うことであり、量子回路の合成や検証にすでに使われている。本稿では,QMDDを等価なZH-ダイアグラム,逆変換に変換する方法を示し,QMDDの削減がZH-カルキュラスでどのように変換されるかを示す。 Graphical calculi such as the ZH-calculus are powerful tools in the study and analysis of quantum processes, with links to other models of quantum computation such as quantum circuits, measurement-based computing, etc. A somewhat compact but systematic way to describe a quantum process is through the use of quantum multiple-valued decision diagrams (QMDDs), which have already been used for the synthesis of quantum circuits as well as for verification. We show in this paper how to turn a QMDD into an equivalent ZH-diagram, and vice-versa, and show how reducing a QMDD translates in the ZH-Calculus, hence allowing tools from one formalism to be used into the other.	翻訳日:2023-03-23 18:36:12 公開日:2021-07-02
# 表面イオントラップの1/{\omega} 電界雑音に上昇する実効性基板の吸着ダイナミクス How Correlated Adsorbate Dynamics on Realistic Substrates Can Give Rise to 1/{\omega} Electric-Field Noise in Surface Ion Traps ( http://arxiv.org/abs/2107.01177v1 ) ライセンス: Link先を確認	Benjamin Foulon, Keith G. Ray, Chang-Eun Kim, Yuan Liu, Brenda M. Rubenstein, and Vincenzo Lordi	(参考訳) イオントラップは、スケーラブルな量子コンピューティングを実装する上で有望なアーキテクチャであるが、過度の"異常"加熱に悩まされ、その潜在能力の完全な実現を妨げている。この加熱はジョンソン-ナイキストノイズから予想されるよりも桁違いに大きく、量子論理ゲートにおけるデコヒーレンスと忠実度の低下をもたらすイオン運動を引き起こす。異常加熱の正確な起源は未解決の問題であるが、実験ではトラップ電極上の吸着物が有力な原因であることが示唆されている。異常加熱の多くのモデルが提案されているが、これらのモデルは0.1-10 MHzの周波数でイオントラップで観測される$1/\omega$電界ノイズスケーリングの原子論的起源を突き止めていない。本研究では,第一原理ポテンシャルによって記述された吸着剤の多層膜の運動によって生じるイオントラップ電界雑音の計算的研究を行う。このようにして、相関吸着運動が$1/\omega$ノイズの生成において決定的な役割を果たすことを示すとともに、一般的にイオントラップで使用されるMHz周波数での$1/\omega$スケーリングを引き起こす吸着パッチと多層交換の変換および回転運動を含む、候補の集合吸着運動を特定する。これらの結果は、複数の吸着系が、単純なものであっても、イオントラップで観測される$1/\omega$のノイズを発生させる一連の活性化運動を生じさせ、個々の吸着運動よりも集団的に低周波加熱を引き起こす可能性が高いことを示している。 Ion traps are promising architectures for implementing scalable quantum computing, but they suffer from excessive "anomalous" heating that prevents their full potential from being realized. This heating, which is orders of magnitude larger than that expected from Johnson-Nyquist noise, results in ion motion that leads to decoherence and reduced fidelity in quantum logic gates. The exact origin of anomalous heating is an open question, but experiments point to adsorbates on trap electrodes as a likely source. Many different models of anomalous heating have been proposed, but these models have yet to pinpoint the atomistic origin of the experimentally observed $1/\omega$ electric-field noise scaling in ion traps at frequencies between 0.1 and 10 MHz. In this work, we perform the first computational study of the ion trap electric field noise produced by the motions of multiple monolayers of adsorbates described by first principles potentials. 
In so doing, we show that correlated adsorbate motions play a definitive role in producing $1/\omega$ noise and identify candidate collective adsorbate motions, including translational and rotational motions of adsorbate patches and multilayer exchanges, that give rise to $1/\omega$ scaling at the MHz frequencies typically employed in ion traps. These results demonstrate that multi-adsorbate systems, even simple ones, can give rise to a set of activated motions that can produce the $1/\omega$ noise observed in ion traps and that collective, rather than individual, adsorbate motions are much more likely to give rise to low-frequency heating.	翻訳日:2023-03-23 18:35:59 公開日:2021-07-02
# NVスピンレジスタの初期化のための量子制御シーケンスの最適化 Optimization of a quantum control sequence for initializing an NV spin register ( http://arxiv.org/abs/2107.01116v1 ) ライセンス: Link先を確認	T. Chakraborty, J. Zhang and D. Suter	(参考訳) 多くの量子情報プロトコルの実装は、量子レジスタの効率的な初期化を必要とする。本報告では,ダイヤモンド中の窒素空孔(NV)中心に付随するハイブリッドスピンレジスタを初期化するための集団トラッププロトコルを最適化する。我々はNVの電子スピンと核スピンをマイクロ波、高周波、光パルスのシーケンスで分極することで量子レジスタを初期化する。我々は、光パルスの影響下での人口分布を説明するために、レート方程式モデルを用いる。このモデルは、部分量子状態トモグラフィーによって得られた実験データと比較される。スピン偏極をさらに高めるため,光パルスを最適化した再帰プロトコルを提案する。 Implementation of many quantum information protocols requires an efficient initialization of the quantum register. In the present report, we optimize a population trapping protocol for initializing a hybrid spin register associated with a single nitrogen-vacancy (NV) center in diamond. We initialize the quantum register by polarizing the electronic and the nuclear spins of the NV with a sequence of microwave, radio-frequency and optical pulses. We use a rate equation model to explain the distribution of population under the effect of the optical pulses. The model is compared to the experimental data obtained by performing partial quantum state tomography. To further increase the spin polarization, we propose a recursive protocol with optimized optical pulses.	翻訳日:2023-03-23 18:35:15 公開日:2021-07-02
# マルコフのボゾン環境における進化の量子速度 Quantum speed of evolution in a Markovian bosonic environment ( http://arxiv.org/abs/2107.01075v1 ) ライセンス: Link先を確認	Paulina Marian and Tudor A. Marian	(参考訳) 本稿では,開連続変数系のマルコフ力学に関連する量子速度制限時間の明示的な評価を行う。具体的には,熱ボソニック貯留層に弱結合した量子放射場のキャビティモードの標準設定について検討する。場の状態の進化は、正確な解析解を持つことが知られている量子光学マスター方程式によって制御される。純粋な入力状態から始まり、初期状態と進化状態の違い、すなわち進化の忠実性と進化のヒルベルト・シュミット距離の2つの指標を用いている。前者は del Campo らによって導入され、彼らはマルコフ開放系の進化に対する時間に依存しない速度限界を導出した。フィールドモードの任意の入力純状態を用いて、このフィールド貯留層設定について評価する。結果公式はコヒーレント状態とフォック状態に特殊化される。一方,我々は,上述の2つの進化指標を用いた代替手法を活用している。それらの変化速度は同じ上限を持ち、従って独自の時間依存量子速度制限を与える。ヒルベルト・シュミット計量で構築された量子速度制限時間は、忠実度に基づくものよりも厳密であることが判明した。応用例として,対応する進化状態の特性関数を用いて,コヒーレント状態とフォック状態の減衰について検討する。これら2つの入力状態のクラスについて、忠実度とヒルベルト・シュミット距離の両方の一般表現を求め、解析する。コヒーレント状態の場合、それらの共通速度限界と、対応する一対の制限時間に関する正確な公式を導出する。 We present explicit evaluations of quantum speed limit times pertinent to the Markovian dynamics of an open continuous-variable system. Specifically, we consider the standard setting of a cavity mode of the quantum radiation field weakly coupled to a thermal bosonic reservoir. The evolution of the field state is ruled by the quantum optical master equation, which is known to have an exact analytic solution. Starting from a pure input state, we employ two indicators of how different the initial and evolved states are, namely, the fidelity of evolution and the Hilbert-Schmidt distance of evolution. The former was introduced by del Campo et al., who derived a time-independent speed limit for the evolution of a Markovian open system. We evaluate it for this field-reservoir setting, with an arbitrary input pure state of the field mode. The resultant formula is then specialized to the coherent and Fock states. On the other hand, we exploit an alternative approach that employs both indicators of evolution mentioned above. Their rates of change have the same upper bound, and consequently provide a unique time-dependent quantum speed limit. 
It turns out that the associated quantum speed limit time built with the Hilbert-Schmidt metric is tighter than the fidelity-based one. As apposite applications, we investigate the damping of the coherent and Fock states by using the characteristic functions of the corresponding evolved states. General expressions of both the fidelity and the Hilbert-Schmidt distance of evolution are obtained and analyzed for these two classes of input states. In the case of a coherent state, we derive accurate formulas for their common speed limit and the pair of associated limit times.	翻訳日:2023-03-23 18:33:53 公開日:2021-07-02
# ボース・アインシュタイン凝縮と準結晶 Bose-Einstein Condensation and quasicrystals ( http://arxiv.org/abs/2107.02901v1 ) ライセンス: Link先を確認	Moorad Alexanian and Vanik E. Mkrtchian	(参考訳) 相互作用するボース粒子を外部の局所ポテンシャルで検討する。外部準結晶ポテンシャルの大きなクラスではボース・アインシュタイン凝縮系は維持できないことが示されている。したがって、そのような準結晶ポテンシャルにおける空間次元 $D\leq 2$ では、有限温度でのボース・アインシュタイン凝縮による超固体は不可能である。後者はまた、二次元フィボナッチタイリングについても真である。しかし、無限に長距離で非局所的な粒子間ポテンシャルがあれば、$D\leq 2$ でもボース・アインシュタイン凝縮を介して超固体が生じる。 We consider interacting Bose particles in an external local potential. It is shown that a large class of external quasicrystal potentials cannot sustain any type of Bose-Einstein condensates. Accordingly, at spatial dimensions $D\leq 2$ in such quasicrystal potentials a supersolid is not possible via Bose-Einstein condensates at finite temperatures. The latter also holds true for the two-dimensional Fibonacci tiling. However, supersolids do arise at $D\leq 2$ via Bose-Einstein condensates from infinitely long-range, nonlocal interparticle potentials.	翻訳日:2023-03-23 18:26:49 公開日:2021-07-02
# コンピュータ教育研究文学における経験主義と報告のノームの体系的文献レビュー A Systematic Literature Review of Empiricism and Norms of Reporting in Computing Education Research Literature ( http://arxiv.org/abs/2107.01984v1 ) ライセンス: Link先を確認	Sarah Heckman and Jeffrey C. Carver and Mark Sherriff and Ahmed Al-Zubidy	(参考訳) コンピュータ教育研究(CER)は、コンピュータのスキルを習得する学生の増加をサポートするために重要である。知識を体系的に前進させるためには、出版物は複製、メタ分析、理論構築をサポートするのに十分クリアでなければならない。本研究の目的は,出版物が複製,メタアナリシス,理論構築をサポートする情報を含むか否かを特定することで,CER文学における経験主義の報告を特徴付けることである。 RQ1) CER会場の論文のどの割合に経験的評価があるか。 RQ2) 経験的評価の特徴は何か。 rq3) 経験的評価を持つ論文は報告基準(包含とキー情報のラベル付けの両方)に従うか? 2014年と2015年に、SIGCSE TS, ICER, ITiCSE, TOCE, CSEの5つの会場で427の論文を発表した。我々はcerempiricism assessment rubricを開発し,応用した。 80%以上の論文がある種の経験的評価をしていた。定量的評価手法が最も多かった。最も頻繁に報告されている論文は教育技術、カリキュラム、コミュニティ、ツールに関する介入に関するものである。介入と他のデータセットやベースラインとを何らかの形で比較した論文の分割があった。多くの論文は、適切に報告された研究目標、目標、研究質問、仮説、参加者の説明、研究設計、データ収集、妥当性への脅威を欠いていた。 CERの著者は文献に経験的な結果をもたらしているが、報告の規範がすべて満たされているわけではない。著者には、その作業に関する明確なラベル付き詳細を提供して、読者がレプリケーションやメタ分析に方法論と結果を使用することを推奨します。コミュニティが成長するにつれて、CERの報告は成熟して、次世代のコンピューティング学習者を支援するためのコンピューティング教育理論の確立に役立ちます。 Computing Education Research (CER) is critical for supporting the increasing number of students who need to learn computing skills. To systematically advance knowledge, publications must be clear enough to support replications, meta-analyses, and theory-building. The goal of this study is to characterize the reporting of empiricism in CER literature by identifying whether publications include information to support replications, meta-analyses, and theory building. The research questions are: RQ1) What percentage of papers in CER venues have empirical evaluation? RQ2) What are the characteristics of the empirical evaluation? RQ3) Do the papers with empirical evaluation follow reporting norms (both for inclusion and for labeling of key information)? We conducted an SLR of 427 papers published during 2014 and 2015 in five CER venues: SIGCSE TS, ICER, ITiCSE, TOCE, and CSE. 
We developed and applied the CER Empiricism Assessment Rubric. Over 80% of papers had some form of empirical evaluation. Quantitative evaluation methods were the most frequent. Papers most frequently reported results on interventions around pedagogical techniques, curriculum, community, or tools. There was a split in papers that had some type of comparison between an intervention and some other data set or baseline. Many papers lacked properly reported research objectives, goals, research questions, or hypotheses, description of participants, study design, data collection, and threats to validity. CER authors are contributing empirical results to the literature; however, not all norms for reporting are met. We encourage authors to provide clear, labeled details about their work so readers can use the methodologies and results for replications and meta-analyses. As our community grows, our reporting of CER should mature to help establish computing education theory to support the next generation of computing learners.	翻訳日:2023-03-23 18:26:35 公開日:2021-07-02
# ベイジアンアプローチを用いたQKDシステムの量子クロック同期 Qubit-based clock synchronization for QKD systems using a Bayesian approach ( http://arxiv.org/abs/2107.01304v1 ) ライセンス: Link先を確認	Roderick D. Cochran and Daniel J. Gauthier	(参考訳) 量子鍵分配(QKD)システムは、2人のユーザが証明可能な安全な鍵を交換する方法を提供する。ユーザーの時計の同期は、セキュアな鍵を蒸留する前に必須のステップである。量子ビットベースの同期プロトコルは、送信された量子状態を直接使用して同期を実現する。従来のqubitベースの同期プロトコルは、直接または間接にセキュアな鍵を犠牲にしており、既知のqubitベースの同期プロトコルはすべて、ユーザが公開する公開情報をすべて効率的に使用していない。本稿では,すべての公開情報を組み込んだベイズ確率アルゴリズムを導入し,セキュアな鍵を犠牲にすることなく,クロックオフセットを効率的に検出する。さらに、アルゴリズムの出力は確率であり、同期に対する信頼度を定量化することができる。実演目的のために,効率の良い3状態BB84の準備・測定プロトコルのシミュレーションを伴うモデルシステムを提案する。我々のアルゴリズムは、アリスの公表した基底と平均光子数選択とボブの測定結果との相関を利用して、確率論的に最も起こりそうなクロックオフセットを決定する。この例では、通信用ビン幅8e-4のダークカウント確率と受信平均光子数0.01をシミュレートする場合に、通信用ビン幅4,140で95%の同期信頼性が得られることが判明した。 Quantum key distribution (QKD) systems provide a method for two users to exchange a provably secure key. Synchronizing the users' clocks is an essential step before a secure key can be distilled. Qubit-based synchronization protocols directly use the transmitted quantum states to achieve synchronization and thus avoid the need for additional classical synchronization hardware. Previous qubit-based synchronization protocols sacrifice secure key either directly or indirectly, and all known qubit-based synchronization protocols do not efficiently use all publicly available information published by the users. Here, we introduce a Bayesian probabilistic algorithm that incorporates all published information to efficiently find the clock offset without sacrificing any secure key. Additionally, the output of the algorithm is a probability, which allows us to quantify our confidence in the synchronization. For demonstration purposes, we present a model system with accompanying simulations of an efficient three-state BB84 prepare-and-measure protocol with decoy states. 
We use our algorithm to exploit the correlations between Alice's published basis and mean photon number choices and Bob's measurement outcomes to probabilistically determine the most likely clock offset. We find that we can achieve a 95 percent synchronization confidence in only 4,140 communication bin widths, meaning we can tolerate clock drift approaching 1 part in 4,140 in this example when simulating this system with a dark count probability per communication bin width of 8e-4 and a received mean photon number of 0.01.	翻訳日:2023-03-23 18:26:03 公開日:2021-07-02
# DiSH-trend:トレンドを考慮したインターベンションモデリングシミュレータ DiSH-trend: Intervention Modeling Simulator That Accounts for Trend Influences ( http://arxiv.org/abs/2107.01302v1 ) ライセンス: Link先を確認	Stefan Andjelkovic and Natasa Miskov-Zivanov	(参考訳) 有向グラフのシミュレーションは、接続グラフが周期を含むシステムの力学を理解するための重要な方法である。 Discrete Stochastic Heterogeneous Simulator (DiSH) は、規制値を用いて規制要素の状態の更新を計算するシミュレーションツールの1つである。ここでは、要素制御のトレンドを考慮に入れた新しいシミュレーション手法であるDiSH-trendを提案する。本稿では,トレンドベースとレベルベースを組み合わせたハイブリッドレギュレーションとともに,トレンドベースのレギュレーションの特徴を示す。モデリング機能は、様々な機能を示す小さなおもちゃモデルで実証される。現実世界の能力はエチオピアのオロミア地方における食料不安のより大きなネットワークモデルで実証されている。モデルにトレンドベースのレギュレーションを加えると、モデリングの柔軟性が向上し、ハイブリッドレギュレーションは定性的な動的振る舞い予測を改善する。適切なデータがあれば、DiSH-trendは介入戦略を探求するための強力なツールになります。 Simulation on directed graphs is an important method for understanding the dynamics in the systems where connectivity graphs contain cycles. Discrete Stochastic Heterogeneous Simulator (DiSH) is one of the simulation tools with wide application, which uses regulator values to calculate state updates of regulated elements. Here we present a new simulation approach DiSH-trend which also takes into account the trends in regulating elements. We demonstrate the features of trend-based regulation, as well as hybrid regulation, which is a combination of the trend- and level-based approaches. The modeling capabilities are demonstrated on a small toy model, showcasing different functionalities. Real-world capabilities are demonstrated on a larger network model of food insecurity in the Ethiopian region Oromia. Adding trend-based regulation to models results in increased modeling flexibility, and hybrid regulation improves qualitative dynamic behavior prediction. With appropriate data, DiSH-trend becomes a powerful tool for exploring intervention strategies.	翻訳日:2023-03-23 18:25:37 公開日:2021-07-02
# 正方根特異点における非線形光学センサの異常精度 Exceptional precision of a nonlinear optical sensor at a square-root singularity ( http://arxiv.org/abs/2107.01291v1 ) ライセンス: Link先を確認	K. J. H. Peters and S. R. K. Rodriguez	(参考訳) 例外点(eps) --非エルミート線型系のスペクトル特異点 -- は、最近センシングに大きな関心を集めている。最初の提案と実験ではノイズを無視する感度の向上に焦点が当てられたが、その後の研究でノイズ環境におけるepセンサの問題点が明らかになった。本稿では,雑音下での特別なセンシングのための単一モードkerr非線形共振器を提案する。共振器の動的ヒステリシスに基づいて、EPに似た平方根特異点を示す信号を定義する。 epセンサとは対照的に,センサの信号対雑音比は測定速度とともに増加し,正方根特異性では精度が向上した。驚くべきことに、信号の平均化は素早く向上し、精度を低下させる。これらの非慣習的な特徴は、線形システムの制約を超えた高速で精密なセンシングの新たな機会を開く。光センシングに焦点をあてる一方で、我々のアプローチは他のヒステリックシステムにも拡張できる。 Exceptional points (EPs) -- spectral singularities of non-Hermitian linear systems -- have recently attracted great interest for sensing. While initial proposals and experiments focused on enhanced sensitivities neglecting noise, subsequent studies revealed issues with EP sensors in noisy environments. Here we propose a single-mode Kerr-nonlinear resonator for exceptional sensing in noisy environments. Based on the resonator's dynamic hysteresis, we define a signal that displays a square-root singularity akin to an EP. In contrast to EP sensors, our sensor has a signal-to-noise ratio that increases with the measurement speed, and a precision enhanced at the square-root singularity. Remarkably, averaging the signal can quickly enhance and then degrade the precision. These unconventional features open up new opportunities for fast and precise sensing beyond the constraints of linear systems. While we focus on optical sensing, our approach can be extended to other hysteretic systems.	翻訳日:2023-03-23 18:25:22 公開日:2021-07-02
# 三角形格子上の$U(1)$量子リンクモデルにおけるネマティック収束相:チップ上の文字列ダイナミクスの短期量子計算の可能性 Nematic Confined Phases in the $U(1)$ Quantum Link Model on a Triangular Lattice: An Opportunity for Near-Term Quantum Computations of String Dynamics on a Chip ( http://arxiv.org/abs/2107.01283v1 ) ライセンス: Link先を確認	D. Banerjee, S. Caspar, F.-J. Jiang, J.-H. Peng, and U.-J. Wiese	(参考訳) 三角格子上の$U(1)$量子リンクモデルは、2つの回転対称性を破るネマティック制限相を持つ。静電荷は、分極化された電気束を持つ個々のストランドからなる弦で接続される。 2つの相は、ほぼ正確な$SO(2)$対称性を持つ弱い1次相転移によって分離される。我々はチップ上に量子回路を構築し、非自明な弦力学の短期量子計算を容易にする。 The $U(1)$ quantum link model on the triangular lattice has two rotation-symmetry-breaking nematic confined phases. Static external charges are connected by confining strings consisting of individual strands with fractionalized electric flux. The two phases are separated by a weak first order phase transition with an emergent almost exact $SO(2)$ symmetry. We construct a quantum circuit on a chip to facilitate near-term quantum computations of the non-trivial string dynamics.	翻訳日:2023-03-23 18:25:09 公開日:2021-07-02
# 運動ロボットを用いた目標筋力分布 : 軌道と抵抗効果 Targeted Muscle Effort Distribution with Exercise Robots: Trajectory and Resistance Effects ( http://arxiv.org/abs/2107.01280v1 ) ライセンス: Link先を確認	Humberto De las Casas and Santino Bianco and Hanz Richter	(参考訳) 本研究の目的は,ロボット運動・リハビリテーション機械の筋力分布を軌道と抵抗設定に関連付けることである。筋活動における各筋の関与を表す筋活動分布を筋電図センサ(EMG)を用いて測定し,筋集団の活性化によって個別の活性化が決定された。 4自由度ロボットとそのインピーダンス制御システムは、ユーザが機械の中立経路と抵抗に対して経路に従うように要求される高度な運動プロトコルを作成するために使用される。この研究では、ロボットはゼロエフォート円形経路を確立し、被験者は楕円軌道に従うように要求される。制御システムは、中性経路からのずれと被検者によるトルクとの間にユーザが定義した剛性を生成する。実験で使用された軌道と抵抗の設定は楕円の向きと剛性パラメータであった。これらのパラメータを複数組み合わせて筋力分布に及ぼす影響を測定した。人工知能ニューラルネットワーク(ANN)は、モデルのトレーニングにデータの一部を使用した。そして、残りのデータを用いてモデルの精度を評価した。その結果,モデルの精度は時間とともに低下することがわかった。これらの結果は、疲労に関連する可能性のある時間変化ダイナミクスの存在を示唆する長期推定のための筋力学の複雑さを示している。 The objective of this work is to relate muscle effort distributions to the trajectory and resistance settings of a robotic exercise and rehabilitation machine. Muscular effort distribution, representing the participation of each muscle in the training activity, was measured with electromyography sensors (EMG) and defined as the individual activation divided by the total muscle group activation. A four degrees-of-freedom robot and its impedance control system are used to create advanced exercise protocols whereby the user is asked to follow a path against the machine's neutral path and resistance. In this work, the robot establishes a zero-effort circular path, and the subject is asked to follow an elliptical trajectory. The control system produces a user-defined stiffness between the deviations from the neutral path and the torque applied by the subject. The trajectory and resistance settings used in the experiments were the orientation of the ellipse and a stiffness parameter. Multiple combinations of these parameters were used to measure their effects on the muscle effort distribution. An artificial neural network (ANN) used part of the data for training the model. 
Then, the accuracy of the model was evaluated using the rest of the data. The results show how the precision of the model is lost over time. These outcomes show the complexity of the muscle dynamics for long-term estimations suggesting the existence of time-varying dynamics possibly associated with fatigue.	翻訳日:2023-03-23 18:25:01 公開日:2021-07-02
# コヒーレント光吸収をもつ非対称ミラーの量子光学 The quantum optics of asymmetric mirrors with coherent light absorption ( http://arxiv.org/abs/2107.01279v1 ) ライセンス: Link先を確認	Benjamin Dawson, Nicholas Furtak-Wells, Thomas Mann, Gin Jose and Almut Beige	(参考訳) ミラー被覆界面近傍の量子化された電磁界の局所観測量は、両側の媒体の性質に強く依存する。巨視的量子電磁力学では、この事実は観測者の位置と他の全ての空間的位置と光子周波数を関連付ける光グリーン関数の助けを借りて考慮される。ここでは,量子ミラー画像検出法 [Furtak-Wells et al., Phys. Rev. A 97, 043827 (2018)] の助けを借りて,より直感的な手法で局所場観測量を得る。電場作用素を正しく正規化するために、自発的原子崩壊率を反射面から遠く離れたそれぞれの自由空間値に簡易化することを要求する。ミラーコーティング界面は量子フォトニックデバイスのための共通の基本構成ブロックであるので,このアプローチは興味深い。 The local observables of the quantised electromagnetic field near a mirror-coated interface depend strongly on the properties of the media on both sides. In macroscopic quantum electrodynamics, this fact is taken into account with the help of optical Green's functions which correlate the position of an observer with all other spatial positions and photon frequencies. Here we present an alternative, more intuitive approach and obtain the local field observables with the help of a quantum mirror image detector method [Furtak-Wells et al., Phys. Rev. A 97, 043827 (2018)]. In order to correctly normalise electric field operators, we demand that spontaneous atomic decay rates simplify to their respective free space values far away from the reflecting surface. Our approach is interesting, since mirror-coated interfaces constitute a common basic building block for quantum photonic devices.	翻訳日:2023-03-23 18:24:41 公開日:2021-07-02
# アナログ量子アルゴリズムの挙動 Behavior of Analog Quantum Algorithms ( http://arxiv.org/abs/2107.01218v1 ) ライセンス: Link先を確認	Lucas T. Brady, Lucas Kocia, Przemyslaw Bienias, Aniruddha Bapat, Yaroslav Kharkov, Alexey V. Gorshkov	(参考訳) アナログ量子アルゴリズムはユニタリゲートではなくハミルトニアンによって定式化され、量子断熱計算、量子アニーリング、量子近似最適化アルゴリズム(QAOA)が含まれる。これらのアルゴリズムは、短期量子アプリケーションには有望な候補であるが、アニーリングスケジュールや変分パラメータによる微調整を必要とすることが多い。本研究では,これらのアナログアルゴリズム間の関係や,最適手順の近似となる限界について検討する。特に,最適手順がスムーズな断熱処理にどのようにアプローチするかを,基底状態とダイアバティック遷移のコヒーレントなエラーキャンセルに影響を及ぼす最初の励起状態との相互作用から説明できる重畳発振パターンを用いて検討する。さらに、各QAOA層の長さが振動パターンの周期に等しい形で、QAOAがこの最適手順をエミュレートするという数値的および解析的な証拠を提供する。さらに、QAOAバングの比率は、最適手順の滑らかで非振動部分によって決定される。最適手順の積公式展開の観点からこれらの現象について議論する。これらの議論により、異なるアナログアルゴリズムは、異なる極限と近似の下で最適なプロトコルをエミュレートできると結論付ける。最後に,論文の他の部分から得られた解析的および数値的洞察を用いて,最適なプロトコルを近似する新しいアルゴリズムを提案する。実際、数値的には、このアルゴリズムは標準的なQAOAおよびナイーブ量子アニール法よりも優れている。 Analog quantum algorithms are formulated in terms of Hamiltonians rather than unitary gates and include quantum adiabatic computing, quantum annealing, and the quantum approximate optimization algorithm (QAOA). These algorithms are promising candidates for near-term quantum applications, but they often require fine tuning via the annealing schedule or variational parameters. In this work, we explore connections between these analog algorithms, as well as limits in which they become approximations of the optimal procedure. Notably, we explore how the optimal procedure approaches a smooth adiabatic procedure but with a superposed oscillatory pattern that can be explained in terms of the interactions between the ground state and first excited state that effect the coherent error cancellation of diabatic transitions. Furthermore, we provide numeric and analytic evidence that QAOA emulates this optimal procedure with the length of each QAOA layer equal to the period of the oscillatory pattern. Additionally, the ratios of the QAOA bangs are determined by the smooth, non-oscillatory part of the optimal procedure. 
We provide arguments for these phenomena in terms of the product formula expansion of the optimal procedure. With these arguments, we conclude that different analog algorithms can emulate the optimal protocol under different limits and approximations. Finally, we present a new algorithm for better approximating the optimal protocol using the analytic and numeric insights from the rest of the paper. In practice, numerically, we find that this algorithm outperforms standard QAOA and naive quantum annealing procedures.	翻訳日:2023-03-23 18:24:14 公開日:2021-07-02
# 多重化光子数測定 Multiplexed photon number measurement ( http://arxiv.org/abs/2001.03217v3 ) ライセンス: Link先を確認	Antoine Essig, Quentin Ficheux, Théau Peronnin, Nathanaël Cottet, Raphaël Lescanne, Alain Sarlette, Pierre Rouchon, Zaki Leghtas, Benjamin Huard	(参考訳) 2準位系(量子ビット)がより大きなシステムのプローブとして使用される場合,システム状態に関する1つのイエスノー質問に自然に答えることになります。本稿では,マイクロ波共振器の光子数について,単一の量子ビットが1ビットではなく多数のビットの情報を連続測定によって抽出できる手法を提案する。周波数コムを反射する超伝導量子ビットから放出される蛍光を記録することにより、各フォック状態(0から8)に関する情報が独立な測定チャネルに同時に符号化される多重光子計数を実現することにより、原理実証実験を実現する。共振器の量子状態の直接ウィグナートモグラフィーは、測定のバックアクションと最適な情報抽出パラメータを証明している。本実験は、逐次量子測定を周波数領域で分離した同時連続測定に置き換えることで、量子メータの全ポテンシャルを解き明かす。 When a two-level system -- a qubit -- is used as a probe of a larger system, it naturally leads to answering a single yes-no question about the system state. Here we propose a method where a single qubit is able to extract, not a single, but many bits of information about the photon number of a microwave resonator using continuous measurement. We realize a proof-of-principle experiment by recording the fluorescence emitted by a superconducting qubit reflecting a frequency comb, thus implementing multiplexed photon counting where the information about each Fock state -- from 0 to 8 -- is simultaneously encoded in independent measurement channels. Direct Wigner tomography of the quantum state of the resonator evidences the back-action of the measurement as well as the optimal information extraction parameters. Our experiment unleashes the full potential of quantum meters by replacing sequential quantum measurements with simultaneous and continuous measurements separated in the frequency domain.	翻訳日:2023-01-13 05:32:52 公開日:2021-07-02
# Vine Copulaによる変量推論:ベイジアンコンピュータモデル校正のための効率的なアプローチ Variational Inference with Vine Copulas: An efficient Approach for Bayesian Computer Model Calibration ( http://arxiv.org/abs/2003.12890v2 ) ライセンス: Link先を確認	Vojtech Kejzlar and Tapabrata Maiti	(参考訳) コンピュータアーキテクチャの進歩により、計算モデルの使用が増加し、核物理学や気候研究など多くの科学的応用において複雑な問題を解く。しかし、そのようなモデルのポテンシャルは計算コストが高く、その結果不確かさの定量化が不適当になる傾向があるため、しばしば妨げられる。さらに、通常はリアルタイム観測では校正されない。ガウス過程を持つ計算機モデルの校正のための変分ベイズ推定(vbi)に基づく計算効率の高いアルゴリズムを開発した。残念ながら、VBIの速度とスケーラビリティは、依存データによるキャリブレーションフレームワークに適用すると低下する。 VBIの効率性を維持するために,Vine copulas を用いてデータ間の依存構造に関する情報を境界分布から分離し,データ可能性のペアワイズ分解を行う。本稿では,提案手法の計算スケーラビリティに関する理論的および実証的な証拠と,提案アルゴリズムの効率的な実装に必要な詳細をすべて記述する。また,核結合エネルギーの液滴モデルのキャリブレーションを通じて実データを用いた実践者に対して,本手法がもたらす機会を実証する。 With the advancements of computer architectures, the use of computational models proliferates to solve complex problems in many scientific applications such as nuclear physics and climate research. However, the potential of such models is often hindered because they tend to be computationally expensive and consequently ill-fitting for uncertainty quantification. Furthermore, they are usually not calibrated with real-time observations. We develop a computationally efficient algorithm based on variational Bayes inference (VBI) for calibration of computer models with Gaussian processes. Unfortunately, the speed and scalability of VBI diminishes when applied to the calibration framework with dependent data. To preserve the efficiency of VBI, we adopt a pairwise decomposition of the data likelihood using vine copulas that separate the information on dependence structure in data from their marginal distributions. We provide both theoretical and empirical evidence for the computational scalability of our methodology and describe all the necessary details for an efficient implementation of the proposed algorithm. 
We also demonstrate the opportunities given by our method for practitioners on a real data example through calibration of the Liquid Drop Model of nuclear binding energies.	翻訳日:2022-12-19 00:03:15 公開日:2021-07-02
# ミラーレス・ミラー降下法:ミラー降下法の自然な導出 Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent ( http://arxiv.org/abs/2004.01025v3 ) ライセンス: Link先を確認	Suriya Gunasekar, Blake Woodworth, Nathan Srebro	(参考訳) 我々は、ミラー降下法の主(primal)のみによる導出を、計量テンソルがミラー降下ポテンシャルのヘッシアンであるリーマン多様体上の勾配流れの「部分的」離散化として示す。我々は、この離散化を「完全な」前方オイラー離散化によって得られる自然勾配降下法と対比する。この見解は、これらの手法の関係性に光を当て、計量テンソルがヘッシアンでなく、従って「双対」が存在しない場合でも、一般リーマン幾何学へのミラー降下法の一般化を可能にする。 We present a primal only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential. We contrast this discretization to Natural Gradient Descent, which is obtained by a "full" forward Euler discretization. This view helps shed light on the relationship between the methods and allows generalizing Mirror Descent to general Riemannian geometries, even when the metric tensor is not a Hessian, and thus there is no "dual."	翻訳日:2022-12-17 09:54:20 公開日:2021-07-02
# 大規模複合飛行ネットワークを用いた航空会社のクルーペアリング最適化のための新しいカラム生成ヒューリスティック A Novel Column Generation Heuristic for Airline Crew Pairing Optimization with Large-scale Complex Flight Networks ( http://arxiv.org/abs/2005.08636v4 ) ライセンス: Link先を確認	Divyam Aggarwal, Dhish Kumar Saxena, Saaju Pualose, Thomas Bäck, Michael Emmerich	(参考訳) 乗組員のペアリング最適化(CPO)は、乗組員の運用コストが燃料コストに次いで第2位であることから、航空会社のビジネスの存続に不可欠である。 CPOは、いくつかの法的制約を満たしながら、スケジュールされたすべてのフライトを最小コストでカバーする一連の飛行シーケンス(クルーペアリング)を作成することを目指している。最先端の手法は、基礎となる整数プログラミング問題を線形計画問題に緩和することに大きく依存しており、これはカラム生成(CG)技術によって解決される。しかし、航空会社の事業拡大に伴い、CPOは次元性の呪いに悩まされ、正確なCG実装は廃止され、ヒューリスティックベースのCG実装が必要とされる。しかし、文献では、複数のクルー拠点やハブ・アンド・スポークのサブネットワークを含む、非常に一般的な大規模な複雑な飛行ネットワークは、ほとんど調査されていない。本稿では,AirCROP(Airline Crew Pairing Optimizer)の社内開発を可能にする新しいCGヒューリスティックを提案する。ヒューリスティック/AirCROPの有効性は、実世界の大規模で複雑なネットワークインスタンスで4,200便以上のフライト、15のクルー拠点、複数のハブ・アンド・スポーク・サブネットワーク(数十億以上のペアリング)でテストされている。特に,本論文では,ペアリングのランダムな探索,ドメイン知識の活用(最適解の特徴に基づく),アーカイビングによる過去の計算・探索の活用を中心に,提案したCGヒューリスティック(AirCROPフレームワーク全体ではない)に焦点をあてる。本論文は航空会社の文脈を持つが,提案したCGヒューリスティックは,ドメイン知識の活用による組合せ最適化問題への対処方法のテンプレートとして,様々な分野にまたがる幅広い応用を見出すことができる。 Crew Pairing Optimization (CPO) is critical for an airline's business viability, given that the crew operating cost is second only to the fuel cost. CPO aims at generating a set of flight sequences (crew pairings) to cover all scheduled flights, at minimum cost, while satisfying several legality constraints. The state-of-the-art heavily relies on relaxing the underlying Integer Programming Problem into a Linear Programming Problem, which in turn is solved through the Column Generation (CG) technique. However, with the alarmingly expanding airlines' operations, CPO is marred by the curse of dimensionality, rendering the exact CG-implementations obsolete, and necessitating the heuristic-based CG-implementations. Yet, in literature, the much prevalent large-scale complex flight networks involving multiple crew bases and/or hub-and-spoke sub-networks, largely remain uninvestigated. 
This paper proposes a novel CG heuristic, which has enabled the in-house development of an Airline Crew Pairing Optimizer (AirCROP). The efficacy of the heuristic/AirCROP has been tested on real-world, large-scale, complex network instances with over 4,200 flights, 15 crew bases, and multiple hub-and-spoke sub-networks (resulting in billion-plus possible pairings). Notably, this paper has a dedicated focus on the proposed CG heuristic (not the entire AirCROP framework) based on balancing random exploration of pairings; exploitation of domain knowledge (on optimal solution features); and utilization of the past computational & search effort through archiving. Though this paper has an airline context, the proposed CG heuristic may find wider applications across different domains, by serving as a template on how to utilize domain knowledge to better tackle combinatorial optimization problems.	翻訳日:2022-12-02 00:23:58 公開日:2021-07-02
# SHADOWCAST: 制御可能なグラフ生成 SHADOWCAST: Controllable Graph Generation ( http://arxiv.org/abs/2006.03774v4 ) ライセンス: Link先を確認	Wesley Joon-Wie Tann, Ee-Chien Chang, and Bryan Hooi	(参考訳) 生成過程におけるグラフ属性の制御として定式化された制御可能なグラフ生成問題を導入し,理解可能な構造を持つ所望のグラフを生成する。この生成プロセスを導くために透明で分かりやすいマルコフモデルを使用することで、実務者は生成したグラフを形作り、理解することができる。本稿では,元のグラフ固有の特性を維持しつつ,グラフ生成を制御可能な生成モデルであるSHADOWCASTを提案する。提案モデルは条件付き敵対的生成ネットワークに基づいている。観察されたグラフとユーザ指定のマルコフモデルパラメータが与えられたとき、SHADOWCASTは所望のグラフを生成するための条件を制御する。 3つの実世界のネットワークデータセットに関する総合的な実験は、グラフ生成タスクにおける我々のモデルの競争力のある性能を示す。さらに、SHADOWCASTに異なるグラフ構造を持つ仮説的シナリオを生成させることで、その効果的な制御性を示す。 We introduce the controllable graph generation problem, formulated as controlling graph attributes during the generative process to produce desired graphs with understandable structures. Using a transparent and straightforward Markov model to guide this generative process, practitioners can shape and understand the generated graphs. We propose SHADOWCAST, a generative model capable of controlling graph generation while retaining the original graph's intrinsic properties. The proposed model is based on a conditional generative adversarial network. Given an observed graph and some user-specified Markov model parameters, SHADOWCAST controls the conditions to generate desired graphs. Comprehensive experiments on three real-world network datasets demonstrate our model's competitive performance in the graph generation task. Furthermore, we show its effective controllability by directing SHADOWCAST to generate hypothetical scenarios with different graph structures.	翻訳日:2022-11-24 20:56:15 公開日:2021-07-02
# シーケンス学習のための時間相関タスクスケジューリング Temporally Correlated Task Scheduling for Sequence Learning ( http://arxiv.org/abs/2007.05290v2 ) ライセンス: Link先を確認	Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu	(参考訳) 近年、シーケンス学習は機械学習コミュニティから多くの研究の注目を集めている。多くのアプリケーションにおいて、シーケンス学習タスクは、通常、複数の時間的に相関した補助タスクと関連付けられている。例えば (i)同時機械翻訳では、異なるレイテンシで翻訳を行うことができる(つまり、翻訳の前に読み待ちする入力語数)。 (二)株価トレンド予測においては、将来日(例えば、明日、明日の翌日)の株価を予測することができる。これらの時間的相関タスクが互いに助け合うことは明らかだが、メインタスクの性能を高めるために複数の補助タスクをよりよく活用する方法について、非常に限定的な調査が行われている。本研究では,学習用補助タスクをモデル状態と現在のトレーニングデータに応じて適応的に選択できるシーケンス学習のための学習可能なスケジューラを提案する。メインタスクのスケジューラとモデルは、バイレベル最適化によって共同で訓練される。実験の結果,本手法は同時翻訳と株価トレンド予測の性能を著しく向上させることがわかった。 Sequence learning has attracted much research attention from the machine learning community in recent years. In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict. For example, (i) in simultaneous machine translation, one can conduct translation under different latency (i.e., how many input words to read/wait before translation); (ii) in stock trend forecasting, one can predict the price of a stock in different future days (e.g., tomorrow, the day after tomorrow). While it is clear that those temporally correlated tasks can help each other, there is a very limited exploration on how to better leverage multiple auxiliary tasks to boost the performance of the main task. In this work, we introduce a learnable scheduler to sequence learning, which can adaptively select auxiliary tasks for training depending on the model status and the current training data. The scheduler and the model for the main task are jointly trained through bi-level optimization. Experiments show that our method significantly improves the performance of simultaneous machine translation and stock trend forecasting.	翻訳日:2022-11-11 22:00:28 公開日:2021-07-02
# 創造産業における人工知能 : レビュー Artificial Intelligence in the Creative Industries: A Review ( http://arxiv.org/abs/2007.12391v6 ) ライセンス: Link先を確認	Nantheera Anantrasirichai and David Bull	(参考訳) 本稿では,創造産業の文脈における人工知能(AI)技術と応用の現状を概観する。 AI、特に機械学習(ML)アルゴリズムの簡単な背景として、畳み込みニューラルネットワーク(CNN)、敵対的生成ネットワーク(GAN)、リカレントニューラルネットワーク(RNN)、深層強化学習(DRL)を紹介する。我々は創造的なアプリケーションを、AI技術の使われ方に応じて5つのグループに分類する。 i) コンテンツの作成、ii) 情報分析、iii) コンテンツの強化とポストプロダクションのワークフロー、iv) 情報抽出と強化、v) データ圧縮。我々は、これらの各分野におけるこの急速に進歩する技術の成功と限界について批判的に検討する。さらに、創造的なツールとしてのAIの利用と、それ自体が創造者となる可能性とを区別する。近い将来、機械学習ベースのAIは、創造性のためのツールや協働アシスタントとして広く採用されると予想する。対照的に、AIが「創造者」となるような制約の少ない領域での機械学習の成功は、控えめなままである。 AI(あるいはその開発者)が、人間のクリエイターと競ってオリジナルの創作物で賞を獲得する可能性も、現代の技術に基づく限り限定的である。それゆえ、創造産業の文脈では、AIによる最大の利益は、その焦点が人間中心であり、人間の創造性を置き換えるのではなく強化するように設計された場合にもたらされる、と結論づける。 This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. 
The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.	翻訳日:2022-11-07 05:56:23 公開日:2021-07-02
# 外れ値にロバストな推定: 計算困難性、最小限の調整で済むアルゴリズム、および応用 Outlier-Robust Estimation: Hardness, Minimally Tuned Algorithms, and Applications ( http://arxiv.org/abs/2007.15109v3 ) ライセンス: Link先を確認	Pasquale Antonante, Vasileios Tzoumas, Heng Yang, Luca Carlone	(参考訳) ロボット工学とコンピュータビジョンにおける非線形推定は、誤ったデータ対応付けや、信号処理・機械学習手法による誤検出に起因する外れ値に悩まされることが多い。本稿では、外れ値にロバストな推定のための2つの統一的な定式化、一般化最大コンセンサス(G-MC)と一般化切断最小二乗(G-TLS)を導入し、基本的限界、実用的アルゴリズム、応用について検討する。第1の貢献は、外れ値にロバストな推定が近似不可能であることの証明である。最悪の場合、多項式時間より遅いアルゴリズム(特に準多項式時間で動作するアルゴリズム)を用いても、外れ値の集合を(近似的にさえ)見つけることは不可能である。第2の貢献として、2つの汎用アルゴリズムをレビューし、拡張する。1つ目のAdaptive Trimming (ADAPT) は組合せ的でG-MCに適しており、2つ目のGraduated Non-Convexity (GNC) はホモトピー法に基づいておりG-TLSに適している。 我々はADAPTとGNCを、ユーザがインライアのノイズ統計に関する事前知識を持たず(あるいは統計が時間とともに変化し)、インライアと外れ値を分離する妥当なしきい値(RANSACでよく使われるものなど)を推測できない場合へ拡張する。インライアと外れ値の分離方法を動的に決定する、外れ値除去のための最初の最小調整アルゴリズムを提案する。第3の貢献は、メッシュ位置合わせ、画像に基づく物体検出(形状アライメント)、ポーズグラフ最適化といったロボット知覚問題における提案アルゴリズムの評価である。 ADAPTとGNCはリアルタイムで動作し、決定論的で、RANSACより優れ、80-90%の外れ値に対してもロバストである。最小調整版も、インライアのノイズ上界に頼らないにもかかわらず、最先端手法と比べて遜色ない。 Nonlinear estimation in robotics and vision is typically plagued with outliers due to wrong data association, or to incorrect detections from signal processing and machine learning methods. This paper introduces two unifying formulations for outlier-robust estimation, Generalized Maximum Consensus (G-MC) and Generalized Truncated Least Squares (G-TLS), and investigates fundamental limits, practical algorithms, and applications. Our first contribution is a proof that outlier-robust estimation is inapproximable: in the worst case, it is impossible to (even approximately) find the set of outliers, even with slower-than-polynomial-time algorithms (particularly, algorithms running in quasi-polynomial time). As a second contribution, we review and extend two general-purpose algorithms. The first, Adaptive Trimming (ADAPT), is combinatorial, and is suitable for G-MC; the second, Graduated Non-Convexity (GNC), is based on homotopy methods, and is suitable for G-TLS. 
We extend ADAPT and GNC to the case where the user does not have prior knowledge of the inlier-noise statistics (or the statistics may vary over time) and is unable to guess a reasonable threshold to separate inliers from outliers (as the one commonly used in RANSAC). We propose the first minimally tuned algorithms for outlier rejection, that dynamically decide how to separate inliers from outliers. Our third contribution is an evaluation of the proposed algorithms on robot perception problems: mesh registration, image-based object detection (shape alignment), and pose graph optimization. ADAPT and GNC execute in real-time, are deterministic, outperform RANSAC, and are robust up to 80-90% outliers. Their minimally tuned versions also compare favorably with the state of the art, even though they do not rely on a noise bound for the inliers.	翻訳日:2022-11-05 20:53:27 公開日:2021-07-02
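The two objectives named in this abstract, maximum consensus and truncated least squares, can be made concrete with a toy scalar example. The sketch below is an illustration under stated assumptions, not the paper's ADAPT or GNC algorithms or its generalized formulations: it estimates a single scalar, takes the inlier threshold `eps` and truncation level `c` as given, and searches by brute force over candidates drawn from the measurements. All function names are hypothetical.

```python
def max_consensus(measurements, eps):
    """Return (estimate, inliers) with the largest set of points within eps.

    Brute force: every measurement is tried as the candidate estimate.
    """
    best_est, best_inliers = None, []
    for cand in measurements:
        inliers = [m for m in measurements if abs(m - cand) <= eps]
        if len(inliers) > len(best_inliers):
            best_est, best_inliers = cand, inliers
    return best_est, best_inliers


def tls_cost(estimate, measurements, c):
    """Truncated least squares cost: each squared residual is capped at c**2."""
    return sum(min((m - estimate) ** 2, c ** 2) for m in measurements)
```

In the truncated cost, a gross outlier contributes at most `c**2` regardless of how far it lies from the estimate, which is what limits its influence.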
# ニューラルネットワークを用いた時系列分類のためのデータ拡張に関する実証的研究 An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks ( http://arxiv.org/abs/2007.15951v4 ) ライセンス: Link先を確認	Brian Kenji Iwana, Seiichi Uchida	(参考訳) 近年、深層ニューラルネットワークはパターン認識において多くの成功を収めている。この成功の一部は、一般化を促進するためにビッグデータに依存しているためである。しかし、時系列認識の分野では、多くのデータセットは非常に小さい。この問題に対処する1つの方法は、データ拡張の利用である。本稿では,時系列データ拡張手法とニューラルネットワークを用いた時系列分類への応用について検討する。本稿では,時系列データ拡張において,変換に基づく手法,パターン混合法,生成モデル,分解法を含む4つのファミリーを分類・概説する。さらに,6種類のニューラルネットワークを用いた128の時系列分類データセットにおいて,12の時系列データ拡張手法を実証的に評価した。その結果,各データ拡張手法の特徴,長所,短所,レコメンデーションを解析できた。この調査は、ニューラルネットワークアプリケーションのための時系列データ拡張の選択を支援することを目的としている。 In recent times, deep artificial neural networks have achieved many successes in pattern recognition. Part of this success can be attributed to the reliance on big data to increase generalization. However, in the field of time series recognition, many datasets are often very small. One method of addressing this problem is through the use of data augmentation. In this paper, we survey data augmentation techniques for time series and their application to time series classification with neural networks. We propose a taxonomy and outline the four families in time series data augmentation, including transformation-based methods, pattern mixing, generative models, and decomposition methods. Furthermore, we empirically evaluate 12 time series data augmentation methods on 128 time series classification datasets with six different types of neural networks. Through the results, we are able to analyze the characteristics, advantages and disadvantages, and recommendations of each data augmentation method. This survey aims to help in the selection of time series data augmentation for neural network applications.	翻訳日:2022-11-04 05:52:56 公開日:2021-07-02
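Two of the four families named in this abstract, transformation-based methods and pattern mixing, can be sketched briefly. The code below is illustrative only: jittering, magnitude scaling, and pointwise interpolation are common examples in the literature, but the abstract does not say whether these exact variants are among the 12 methods evaluated. The function names and `sigma` parameters are assumptions.

```python
import random


def jitter(series, sigma, rng):
    """Transformation-based: add independent Gaussian noise to every time step."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def scale(series, sigma, rng):
    """Transformation-based: multiply the whole series by one random factor near 1."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in series]


def mix(series_a, series_b, weight):
    """Pattern mixing: pointwise interpolation of two equal-length series."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(series_a, series_b)]
```

Passing a seeded `random.Random` instance as `rng` keeps the augmented training set reproducible across runs.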
# 合理的攻撃者に対する未知の提示攻撃検出 Unknown Presentation Attack Detection against Rational Attackers ( http://arxiv.org/abs/2010.01592v2 ) ライセンス: Link先を確認	Ali Khodabakhsh, Zahid Akhtar	(参考訳) 過去10年間のプレゼンテーション攻撃検出とマルチメディア法医学の分野では目覚ましい進歩があったが、これらのシステムは実際の環境での攻撃には弱い。既存のソリューションの課題は、未知の攻撃の検出、敵対的な設定で実行する能力、最小限の学習、説明可能性である。本研究では,アタッカーと検出器の相互作用をモデル化するゲーム理論的な視点に依拠して,これらの限界にアプローチする。その結果、新しい最適化基準が提案され、実際の環境での性能を改善するための一連の要件が定義される。さらに,特定の攻撃種に偏らないジェネレータベースの特徴セットを用いて,新たな検出手法を提案する。既知の攻撃の性能をさらに最適化するために, カテゴリー的マージン最大化損失(c-marmax)という新たな損失関数が提案され, 最も強力な攻撃に対する性能が徐々に向上する。提案手法は、既知の攻撃と未知の攻撃の間でよりバランスの取れた性能を提供し、合理的な攻撃に対して、未知の攻撃検出ケースにおいて最先端のパフォーマンスを達成する。最後に,提案手法の数少ない学習可能性と,画素レベルの説明可能性について検討した。 Despite the impressive progress in the field of presentation attack detection and multimedia forensics over the last decade, these systems are still vulnerable to attacks in real-life settings. Some of the challenges for existing solutions are the detection of unknown attacks, the ability to perform in adversarial settings, few-shot learning, and explainability. In this study, these limitations are approached by reliance on a game-theoretic view for modeling the interactions between the attacker and the detector. Consequently, a new optimization criterion is proposed and a set of requirements are defined for improving the performance of these systems in real-life settings. Furthermore, a novel detection technique is proposed using generator-based feature sets that are not biased towards any specific attack species. To further optimize the performance on known attacks, a new loss function coined categorical margin maximization loss (C-marmax) is proposed which gradually improves the performance against the most powerful attack. The proposed approach provides a more balanced performance across known and unknown attacks and achieves state-of-the-art performance in known and unknown attack detection cases against rational attackers. 
Lastly, the few-shot learning potential of the proposed approach is studied as well as its ability to provide pixel-level explainability.	翻訳日:2022-10-11 03:32:07 公開日:2021-07-02
# シンボリック有限状態オートマトンの複雑性について On the Complexity of Symbolic Finite-State Automata ( http://arxiv.org/abs/2011.05389v3 ) ライセンス: Link先を確認	Dana Fisman and Hadar Frenkel and Sandra Zilles	(参考訳) 我々は、SFAに対する手続き(積集合、空性判定など)の計算量を再考し、シンボリックオートマトンに適すると考える尺度、すなわち状態数、1つの状態から出る遷移の最大数、最も複雑な遷移述語のサイズに従って分析する。SFAの特殊形式である正規化SFA(normalized SFAs)とニートSFA(neat SFAs)、および単調(monotonic)な有効ブール代数上のSFAに注目する。 We revisit the complexity of procedures on SFAs (such as intersection, emptiness, etc.) and analyze them according to the measures we find suitable for symbolic automata: the number of states, the maximal number of transitions exiting a state, and the size of the most complex transition predicate. We pay attention to the special forms of SFAs: normalized SFAs and neat SFAs, as well as to SFAs over a monotonic effective Boolean algebra.	翻訳日:2022-09-27 08:35:27 公開日:2021-07-02
# 漸近的最適情報指向サンプリング Asymptotically Optimal Information-Directed Sampling ( http://arxiv.org/abs/2011.05944v4 ) ライセンス: Link先を確認	Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári	(参考訳) 有限個の行動を持つ確率的線形バンディットに対して、漸近的に最適であり、かつ有限時間で(ほぼ)最悪ケース最適となる、単純で効率的なアルゴリズムを導入する。このアプローチは頻度論的な情報指向サンプリング(IDS)フレームワークに基づいており、情報ゲインの代理として、漸近的下界を定義する最適化問題から導かれるものを用いる。我々の分析は、IDSが後悔と情報のトレードオフをどのように釣り合わせるかを明らかにし、最近提案された主双対法とIDSアルゴリズムの間の驚くべき関係を明らかにする。 IDSが有限時間ではUCBと競合し、漸近的領域では有意に優れうることを実験的に示す。 We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time, and can be significantly better in the asymptotic regime.	翻訳日:2022-09-26 23:23:53 publicationdate:2021-07-02
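The abstract names UCB only as the baseline that IDS is compared against. For orientation, the sketch below shows the classical UCB1 rule on unstructured Bernoulli arms. It is neither the linear-bandit UCB variant likely used in the paper nor the IDS algorithm itself. The arm means, horizon, and function name are assumptions for illustration.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Run UCB1 on Bernoulli arms and return the pull count of each arm."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # optimism: empirical mean plus an exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The exploration bonus shrinks as an arm is pulled more often, so pulls concentrate on the arm with the highest empirical mean.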
# オンラインマッチング,フロー,ロードバランシングのための学習可能な,インスタンス・ロバスト予測 Learnable and Instance-Robust Predictions for Online Matching, Flows and Load Balancing ( http://arxiv.org/abs/2011.11743v2 ) ライセンス: Link先を確認	Thomas Lavastida, Benjamin Moseley, R. Ravi and Chenyang Xu	(参考訳) 本稿では,予測が形式的に学習可能かつインスタンスロバストであることを要求することで,アルゴリズムを予測によって拡張するための新しいモデルを提案する。学習可能性は、妥当な量の過去のデータから予測を効率的に構築できることを保証する。インスタンスロバスト性は、予測が問題入力の適度な変更に対して頑健であることを保証する。ここで変更の尺度は問題固有であってよい。インスタンスロバスト性は、変更の関数として性能が滑らかに低下することを要求する。理想的には、性能は最悪ケースの境界より悪くならない。これにより、予測を客観的に比較することもできる。我々は,ネットワークフロー割当問題と制限付き割当のメイクスパン最小化に対して,予測を伴うオンラインアルゴリズムを設計する。両方の問題に対して、2つの重要な特性を確立する。高品質な予測は以前のインスタンスの小さなサンプルから学習でき、これらの予測は基礎となる問題インスタンスの変化に応じて滑らかに劣化する誤差に対して頑健である。 We propose a new model for augmenting algorithms with predictions by requiring that they are formally learnable and instance robust. Learnability ensures that predictions can be efficiently constructed from a reasonable amount of past data. Instance robustness ensures that the prediction is robust to modest changes in the problem input, where the measure of the change may be problem specific. Instance robustness insists on a smooth degradation in performance as a function of the change. Ideally, the performance is never worse than worst-case bounds. This also allows predictions to be objectively compared. We design online algorithms with predictions for a network flow allocation problem and restricted assignment makespan minimization. For both problems, two key properties are established: high quality predictions can be learned from a small sample of prior instances and these predictions are robust to errors that smoothly degrade as the underlying problem instance changes.	翻訳日:2022-09-22 03:24:12 公開日:2021-07-02
# (参考訳) 深層学習フレームワークのためのHfOx抵抗変化メモリにおける弱いRESET過程のモデル Model of the Weak Reset Process in HfOx Resistive Memory for Deep Learning Frameworks ( http://arxiv.org/abs/2107.06064v1 ) ライセンス: CC BY 4.0	Atreya Majumdar, Marc Bocquet, Tifenn Hirtzlin, Axel Laborieux, Jacques-Olivier Klein, Etienne Nowak, Elisa Vianello, Jean-Michel Portal, Damien Querlioz	(参考訳) 現在のディープラーニング訓練アルゴリズムの実装は、メモリと論理ユニット間のデータ転送のために電力を大量に消費する。酸化物ベースのRRAMは、より低消費電力なインメモリコンピューティングを実装するための優れた候補である。その弱いRESET領域は、優れた耐久性でデバイスの抵抗を調整できるため、学習には特に魅力的である。しかし、この領域における抵抗変化の挙動は多くの揺らぎを伴い、特にディープラーニングのシミュレーションに用いるツールと互換性のある形でモデル化することは困難である。本研究では,酸化ハフニウムRRAMにおける弱いRESET過程のモデルを示し,このモデルをPyTorchディープラーニングフレームワークに統合する。我々のモデルは、ハイブリッドCMOS/RRAM技術上の実験で検証されており、ノイズを伴う漸進的な挙動とデバイス間(D2D)変動の両方を再現する。我々はこのツールを用いて、MNIST手書き数字認識タスクとCIFAR-10物体分類タスクに対して二値化ニューラルネットワークを訓練する。デバイス不完全性のさまざまな側面を含める場合と含めない場合でモデルをシミュレートして訓練プロセスへの影響を理解し、D2D変動が最も有害な側面であることを特定する。このフレームワークは、他の種類のメモリに対しても同様に、最も劣化の原因となるデバイス不完全性を特定するために使用でき、その結果をデバイスの最適化に利用して不完全性の影響を減らすことができる。 The implementation of current deep learning training algorithms is power-hungry, owing to data transfer between memory and logic units. Oxide-based RRAMs are outstanding candidates to implement in-memory computing, which is less power-intensive. Their weak RESET regime is particularly attractive for learning, as it allows tuning the resistance of the devices with remarkable endurance. However, the resistive change behavior in this regime suffers from many fluctuations and is particularly challenging to model, especially in a way compatible with tools used for simulating deep learning. In this work, we present a model of the weak RESET process in hafnium oxide RRAM and integrate this model within the PyTorch deep learning framework. Validated on experiments on a hybrid CMOS/RRAM technology, our model reproduces both the noisy progressive behavior and the device-to-device (D2D) variability. 
We use this tool to train Binarized Neural Networks for the MNIST handwritten digit recognition task and the CIFAR-10 object classification task. We simulate our model with and without various aspects of device imperfections to understand their impact on the training process and identify that the D2D variability is the most detrimental aspect. The framework can be used in the same manner for other types of memories to identify the device imperfections that cause the most degradation, which can, in turn, be used to optimize the devices to reduce the impact of these imperfections.	翻訳日:2021-07-18 16:58:37 公開日:2021-07-02
# マルチドメイン学習における転移と干渉の分離 Disentangling Transfer and Interference in Multi-Domain Learning ( http://arxiv.org/abs/2107.05445v1 ) ライセンス: Link先を確認	Yipeng Zhang, Tyler L. Hayes, Christopher Kanan	(参考訳) 人間は、あるドメインから別のドメインに知識を移すことがとても得意で、新しいタスクを素早く学習できる。同様に、転移学習は事前学習を用いた多くのコンピュータビジョン問題において大きな成功を収めてきた。しかし、ネットワークが異なるデータセットで定義された複数のタスクを学習するマルチドメイン学習における転移の利点は十分に研究されていない。複数のドメインの学習は有益でありうるが、ネットワーク容量が限られているためドメイン同士が干渉する可能性もある。本研究では,マルチドメイン学習において干渉や知識転移が発生する条件を解明する。干渉と転移を分離する新しい指標を提案し、実験プロトコルを設定する。さらに,ネットワーク容量,タスクのグループ化,動的な損失重み付けが干渉の軽減と転移の促進に果たす役割について検討する。我々は、CIFAR-100、MiniPlaces、Tiny-ImageNetデータセットでこの結果を示す。 Humans are incredibly good at transferring knowledge from one domain to another, enabling rapid learning of new tasks. Likewise, transfer learning has enabled enormous success in many computer vision problems using pretraining. However, the benefits of transfer in multi-domain learning, where a network learns multiple tasks defined by different datasets, have not been adequately studied. Learning multiple domains could be beneficial or these domains could interfere with each other given limited network capacity. In this work, we decipher the conditions where interference and knowledge transfer occur in multi-domain learning. We propose new metrics disentangling interference and transfer and set up experimental protocols. We further examine the roles of network capacity, task grouping, and dynamic loss weighting in reducing interference and facilitating transfer. We demonstrate our findings on the CIFAR-100, MiniPlaces, and Tiny-ImageNet datasets.	翻訳日:2021-07-18 12:26:13 公開日:2021-07-02
# ニューラルネットワークを用いた因果構造学習のための逐次MDL Prequential MDL for Causal Structure Learning with Neural Networks ( http://arxiv.org/abs/2107.05481v1 ) ライセンス: Link先を確認	Jorg Bornschein and Silvia Chiappa and Alan Malek and Rosemary Nan Ke	(参考訳) 観測からベイジアンネットワークの構造と因果関係を学習することは、科学と技術のいくつかの分野において共通の目標である。本稿では,柔軟で過剰パラメータ化されたニューラルネットワークを用いて観測変数間の条件付き確率分布をモデル化する場合に,逐次(prequential)最小記述長原理(MDL)を用いてベイジアンネットワークの実用的なスコアリング関数を導出できることを示す。 MDLはオッカムの剃刀を具現化したものであり、調整が必要なスパース性誘導の事前分布やその他の正則化に頼ることなく、妥当で簡潔なグラフ構造が得られる。合成データおよび実世界のデータで競争力のある結果を実験的に示す。このスコアは、変数間に強い非線形関係が存在する場合でも、しばしば正しい構造を回復する。これは従来の手法が苦戦し、通常は失敗する状況である。さらに、逐次スコアが、分布シフトを受けている情報源から観測が得られる場合に適応速度から因果構造を推定する最近の研究とどのように関係するかについても議論する。 Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data. The score often recovers the correct structure even in the presence of strongly nonlinear relationships between variables; a scenario where prior approaches struggle and usually fail. Furthermore we discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift.	翻訳日:2021-07-18 12:25:43 公開日:2021-07-02
# 畳み込みニューラルバンディット: 視覚を考慮した広告のための証明可能なアルゴリズム Convolutional Neural Bandit: Provable Algorithm for Visual-aware Advertising ( http://arxiv.org/abs/2107.07438v1 ) ライセンス: Link先を確認	Yikun Ban, Jingrui He	(参考訳) オンライン広告はウェブビジネスで広く使われている。画像表示は、顧客と対話する最も一般的なフォーマットの1つと考えられている。文脈付き多腕バンディットは、レコメンデーション手順に存在する探索と活用のジレンマを解決する手段として、広告への応用で成功を収めている。本稿では,視覚を考慮した広告に着想を得て,畳み込みニューラルネットワーク(CNN)を用いて報酬関数を学習し,探索のために上側信頼限界(UCB)を併用する文脈付きバンディットアルゴリズムを提案する。また、ネットワークが過剰パラメータ化されている場合に、ほぼ最適なリグレット上界 $\tilde{\mathcal{O}}(\sqrt{T})$ を証明し、畳み込みニューラルタンジェントカーネル(CNTK)との強い関連を確立する。最後に,提案アルゴリズムの実験的性能を評価し,実世界の画像データセット上で他の最先端のUCBベースのバンディットアルゴリズムよりも優れていることを示す。 Online advertising is ubiquitous in web business. Image display is considered one of the most commonly used formats to interact with customers. Contextual multi-armed bandits have shown success in advertising applications by addressing the exploration-exploitation dilemma that exists in the recommendation procedure. Inspired by visual-aware advertising, in this paper, we propose a contextual bandit algorithm, where the convolutional neural network (CNN) is utilized to learn the reward function along with an upper confidence bound (UCB) for exploration. We also prove a near-optimal regret bound $\tilde{\mathcal{O}}(\sqrt{T})$ when the network is over-parameterized and establish strong connections with convolutional neural tangent kernel (CNTK). Finally, we evaluate the empirical performance of the proposed algorithm and show that it outperforms other state-of-the-art UCB-based bandit algorithms on real-world image data sets.	翻訳日:2021-07-18 12:25:14 公開日:2021-07-02
# You Only Write Thrice: 単一のソースから文書、計算ノートブック、プレゼンテーションを作成 You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source ( http://arxiv.org/abs/2107.06639v1 ) ライセンス: Link先を確認	Kacper Sokol and Peter Flach	(参考訳) 学術活動では、原稿、プレゼンテーション、ポスター、計算ノートブックなど、異なるフォーマットで公開される同じコンテンツの複数のバリエーションを同時に扱う必要がある。 write--review--rebut--revise のライフサイクルに対応するためにバージョンを追跡する必要があり、これがさらに複雑さを増す。本稿では,バージョン管理環境(gitなど)で単一のソース文書を維持し,学術界で広く使われる出力フォーマット群を生成する機能を追加することで,この負担を大幅に軽減することを提案する。この目的のために、Jupyterの科学計算エコシステムのさまざまなオープンソースツールを活用し、選択したソフトウェアエンジニアリングの概念を実践に移す。単一のmarkdownソースファイルから、Jupyter Book(オンライン文書)、Jupyter Notebook(計算ナラティブ)、reveal.jsのスライドを作成する概念実証ワークフローを提供する。 GitHubにホストされるこのアプローチは、変更追跡とバージョニングをサポートし、基盤となるコードのイシュー管理インフラストラクチャに基づいた透明なレビュープロセスもサポートする。我々のワークフローの展示はhttps://so-cool.github.io/you-only-write-thrice/でプレビューできる。 Academic trade requires juggling multiple variants of the same content published in different formats: manuscripts, presentations, posters and computational notebooks. The need to track versions to accommodate the write--review--rebut--revise life-cycle adds another layer of complexity. We propose to significantly reduce this burden by maintaining a single source document in a version-controlled environment (such as git), adding functionality to generate a collection of output formats popular in academia. To this end, we utilise various open-source tools from the Jupyter scientific computing ecosystem and operationalise selected software engineering concepts. We offer a proof-of-concept workflow that composes Jupyter Book (an online document), Jupyter Notebook (a computational narrative) and reveal.js slides from a single markdown source file. Hosted on GitHub, our approach supports change tracking and versioning, as well as a transparent review process based on the underlying code issue management infrastructure. An exhibit of our workflow can be previewed at https://so-cool.github.io/you-only-write-thrice/.	翻訳日:2021-07-18 12:24:57 公開日:2021-07-02
# 量子ビットに話させる方法 How to make qubits speak ( http://arxiv.org/abs/2107.06776v1 ) ライセンス: Link先を確認	Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi	(参考訳) これは量子コンピュータに話させること、そしてそれを量子ネイティブで、構成的で、意味を意識した方法で行うことについての物語である。最近、我々は実際の量子コンピュータを用いて質問応答を行った。我々は何をしたかを説明し、これがすべて図式(ピクチャ)によって行われたことを強調し、関連文献への多くのポインタを提供する。実際、自然言語以外にも、他の多くのことを量子ネイティブで、構成的で、意味を意識した方法で実装することができ、構成性の概念に関する我々の説明を含め、そのより広い図式的な展望についていくつかの手がかりを読者に示す。また、読者自身も試せるように、実際の実行のためのガイダンスも提供する。 This is a story about making quantum computers speak, and doing so in a quantum-native, compositional and meaning-aware manner. Recently we did question-answering with an actual quantum computer. We explain what we did, stress that this was all done in terms of pictures, and provide many pointers to the related literature. In fact, besides natural language, many other things can be implemented in a quantum-native, compositional and meaning-aware manner, and we provide the reader with some indications of that broader pictorial landscape, including our account on the notion of compositionality. We also provide some guidance for the actual execution, so that the reader can give it a go as well.	翻訳日:2021-07-18 12:23:59 公開日:2021-07-02
# 戦略的航空交通流管理のための気象シーンの圧縮表現 Compressive Representations of Weather Scenes for Strategic Air Traffic Flow Management ( http://arxiv.org/abs/2107.06394v1 ) ライセンス: Link先を確認	Sandip Roy	(参考訳) 戦略的航空交通フロー管理の目的を支援するため,高次元気象シーンデータの長期的表現について検討した。具体的には,航空関係の気象シーンが圧縮可能かどうかについて考察する。ここでは,metarデータ(気温,飛行カテゴリ,アメリカ大陸の可視性プロファイルを含む)から抽出した気象シーンの圧縮をグラフスペクトルベースで検討した。シーンは圧縮可能であり、シーンコンテンツの75-95%は基底ベクトルの0.5-4%でキャプチャされる。さらに、各シーンにおける支配的基底ベクトルは、気象の時変空間特性を識別し、圧縮された表現からの再構成を示す。最後に、戦略的TFM設計における圧縮表現の潜在的利用について概説する。 Terse representation of high-dimensional weather scene data is explored, in support of strategic air traffic flow management objectives. Specifically, we consider whether aviation-relevant weather scenes are compressible, in the sense that each scene admits a possibly-different sparse representation in a basis of interest. Here, compression of weather scenes extracted from METAR data (including temperature, flight categories, and visibility profiles for the contiguous United States) is examined, for the graph-spectral basis. The scenes are found to be compressible, with 75-95% of the scene content captured using 0.5-4% of the basis vectors. Further, the dominant basis vectors for each scene are seen to identify time-varying spatial characteristics of the weather, and reconstruction from the compressed representation is demonstrated. Finally, potential uses of the compressive representations in strategic TFM design are briefly scoped.	翻訳日:2021-07-18 12:23:49 公開日:2021-07-02
# CHASE: セルレベルの微分可能ニューラルアーキテクチャ探索によるロバストなビジュアルトラッキング CHASE: Robust Visual Tracking via Cell-Level Differentiable Neural Architecture Search ( http://arxiv.org/abs/2107.03463v1 ) ライセンス: Link先を確認	Seyed Mojtaba Marvasti-Zadeh, Javad Khaghani, Li Cheng, Hossein Ghanei-Yakhdan, Shohreh Kasaei	(参考訳) 現在、強力なビジュアルオブジェクトトラッカーは、よく作り込まれたモジュールに依存しており、それらは通常、高品質なトラッキング結果を提供するために手作業で設計されたネットワークアーキテクチャで構成されている。手動設計プロセスは、十分な事前経験、膨大な努力、直感、そしておそらく幸運を必要とするため、特に困難な障壁となる。一方、ニューラルアーキテクチャ探索は、実現可能なネットワーク構造の自動探索問題に取り組むための有望な手法として、画像セグメンテーションなどの実用的応用において普及しつつある。本研究では,トラッキングモジュールのネットワーク設計を自動化し,オフライントレーニング中にバックボーン特徴をトラッキングネットワークの目的に適応させることを目的とした,セルレベルの微分可能アーキテクチャ探索機構を提案する。提案手法はシンプルで効率的であり、ネットワークを構築するために一連のモジュールを積み重ねる必要がない。我々の手法は既存のトラッカーに容易に組み込むことができ、このことは異なる微分可能アーキテクチャ探索ベースの手法とトラッキング目的を用いて実証的に検証されている。広範な実験評価の結果,広く使われる5つのベンチマークにおいて優れた性能が得られた。一方、我々の自動探索プロセスは、TrackingNetデータセット上で2次(1次)のDARTS法に対して41時間(18時間)かかる。 A strong visual object tracker nowadays relies on its well-crafted modules, which typically consist of manually-designed network architectures to deliver high-quality tracking results. Not surprisingly, the manual design process becomes a particularly challenging barrier, as it demands sufficient prior experience, enormous effort, intuition and perhaps some good luck. Meanwhile, neural architecture search has been gaining ground in practical applications such as image segmentation, as a promising method for tackling the issue of automated search of feasible network structures. In this work, we propose a novel cell-level differentiable architecture search mechanism to automate the network design of the tracking module, aiming to adapt backbone features to the objective of a tracking network during offline training. The proposed approach is simple and efficient, and does not need to stack a series of modules to construct a network. Our approach is easy to incorporate into existing trackers, which is empirically validated using different differentiable architecture search-based methods and tracking objectives. 
Extensive experimental evaluations demonstrate the superior performance of our approach over five commonly-used benchmarks. Meanwhile, our automated searching process takes 41 (18) hours for the second (first) order DARTS method on the TrackingNet dataset.	翻訳日:2021-07-11 11:37:54 公開日:2021-07-02
# Deep Mesh Prior: グラフ畳み込みネットワークを用いた教師なしメッシュ復元 Deep Mesh Prior: Unsupervised Mesh Restoration using Graph Convolutional Networks ( http://arxiv.org/abs/2107.02909v1 ) ライセンス: Link先を確認	Shota Hattori, Tatsuya Yatagawa, Yutaka Ohtake, Hiromasa Suzuki	(参考訳) 本稿では,教師なしで自己相似性を学習することで,メッシュ復元問題,すなわちノイズ除去と補完に取り組む。この目的のために、Deep Mesh Priorと呼ぶ提案手法は、メッシュ上のグラフ畳み込みネットワークを用いて自己相似性を学習する。ネットワークは単一の不完全なメッシュを入力データとして受け取り、大規模なデータセットで訓練されることなく、再構築されたメッシュを直接出力する。本手法では,プロセス全体がメッシュ上で動作するため,陰関数場などの中間表現は使用しない。我々の教師なし手法は、大規模データセットを用いる最先端手法と同等かそれ以上の性能を示す。 This paper addresses mesh restoration problems, i.e., denoising and completion, by learning self-similarity in an unsupervised manner. For this purpose, the proposed method, which we refer to as Deep Mesh Prior, uses a graph convolutional network on meshes to learn the self-similarity. The network takes a single incomplete mesh as input data and directly outputs the reconstructed mesh without being trained using large-scale datasets. Our method does not use any intermediate representations such as an implicit field because the whole process works on a mesh. We demonstrate that our unsupervised method performs equally well or even better than the state-of-the-art methods using large-scale datasets.	翻訳日:2021-07-11 11:36:22 公開日:2021-07-02
# (参考訳) 機械学習の問題を解決する Solving Machine Learning Problems ( http://arxiv.org/abs/2107.01238v1 ) ライセンス: CC BY 4.0	Sunny Tran, Pranav Krishna, Ishan Pakuwal, Prabhakar Kafle, Nikhil Singh, Jayson Lynch, Iddo Drori	(参考訳) 機械は機械学習を学べるのか? この研究は、機械学習モデルをトレーニングして、大学の学部レベルのコースから機械学習問題を解決する。我々は、MITの6.036のIntroduction to Machine Learningコースからコース演習、宿題、クイズ質問からなる新しいトレーニングセットを作成し、これらの質問に答えるために機械学習モデルをトレーニングします。本システムでは,MIT学生の平均93%に対して,オープン応答質問では96%,マルチチョイス質問では97%の総合的精度をリアルタイムで達成している。質問はコースで教えられた12のトピックすべてをカバーする。 i)基本的な機械学習原則、(ii)パーセプトロン、(iii)特徴抽出と選択、(iv)ロジスティック回帰、(v)回帰、(vi)ニューラルネットワーク、(vi)高度なニューラルネットワーク、(viii)畳み込みニューラルネットワーク、(ix)リカレントニューラルネットワーク、(x)ステートマシンとmdp、(xi)強化学習、(xii)決定木。本システムは,グラフとツリー表現を備えたエンコーダデコーダアーキテクチャ内でTransformerモデルを使用する。私たちのアプローチの重要な側面は、新しい例問題を生成するためのデータ提供スキームです。また、問題ヒントを生成するために機械学習モデルをトレーニングします。そこで,本システムは,トピック間の新たな質問を自動的に生成し,オープン応答質問と複数質問の両方に回答し,問題を分類し,問題ヒントを生成し,stem教育のためのaiの包含を押し上げる。 Can a machine learn Machine Learning? This work trains a machine learning model to solve machine learning problems from a University undergraduate level course. We generate a new training set of questions and answers consisting of course exercises, homework, and quiz questions from MIT's 6.036 Introduction to Machine Learning course and train a machine learning model to answer these questions. Our system demonstrates an overall accuracy of 96% for open-response questions and 97% for multiple-choice questions, compared with MIT students' average of 93%, achieving grade A performance in the course, all in real-time. Questions cover all 12 topics taught in the course, excluding coding questions or questions with images. Topics include: (i) basic machine learning principles; (ii) perceptrons; (iii) feature extraction and selection; (iv) logistic regression; (v) regression; (vi) neural networks; (vii) advanced neural networks; (viii) convolutional neural networks; (ix) recurrent neural networks; (x) state machines and MDPs; (xi) reinforcement learning; and (xii) decision trees. 
Our system uses Transformer models within an encoder-decoder architecture with graph and tree representations. An important aspect of our approach is a data-augmentation scheme for generating new example problems. We also train a machine learning model to generate problem hints. Thus, our system automatically generates new questions across topics, answers both open-response questions and multiple-choice questions, classifies problems, and generates problem hints, pushing the envelope of AI for STEM education.	翻訳日:2021-07-07 13:10:08 公開日:2021-07-02
# (参考訳) AutoMLサロゲートモデリング最適化のための機械学習パイプラインツールキットの設計 Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization ( http://arxiv.org/abs/2107.01253v1 ) ライセンス: CC BY 4.0	Paulito P. Palmes, Akihiro Kishimoto, Radu Marinescu, Parikshit Ram, Elizabeth Daly	(参考訳) 機械学習におけるパイプライン最適化問題は、パイプライン構造とそれらの要素のパラメータ適応の同時最適化を必要とする。これらの構造を表現するエレガントな方法を持つことは、最適化戦略の異なる選択とともに、パフォーマンスの管理と分析の複雑さを減らすのに役立ちます。これらの問題を念頭に,我々は,シンプルな表現を用いた複雑な機械学習パイプライン構造の作成と評価を容易にするAMLPツールキットを開発した。 AMLPを使って最適なパイプラインシグネチャを見つけ、それらをデータマイニングし、これらのデータマイニング機能を使って学習と予測を高速化します。我々は、AMLP計算時間5分未満で4時間の予算で他のAutoMLアプローチを上回り、AMLPのサロゲートモデルを用いた2段階パイプライン最適化を定式化した。 The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. Having an elegant way to express these structures can help lessen the complexity in the management and analysis of their performances together with the different choices of optimization strategies. With these issues in mind, we created the AMLP toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed-up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches with a 4-hour time budget in less than 5 minutes of AMLP computation time.	翻訳日:2021-07-07 13:08:54 公開日:2021-07-02
# (参考訳) 注意モデルを用いた新しい災害画像データセットと特徴解析 A Novel Disaster Image Dataset and Characteristics Analysis using Attention Model ( http://arxiv.org/abs/2107.01284v1 ) ライセンス: CC BY 4.0	Fahim Faisal Niloy, Arif, Abu Bakar Siddik Nayem, Anis Sarker, Ovi Paul, M. Ashraful Amin, Amin Ahsan Ali, Moinul Islam Zaber, AKM Mahbubur Rahman	(参考訳) ディープラーニング技術の進歩により、他の分類技術よりも優れたシステムを開発することができた。しかし,実験システムの成功は,提案システムの学習に利用可能なデータの品質と多様性に依存している。本研究では, 火災, 水, 陸の3つの災害現場から収集した画像を含む比較的困難なデータセットを慎重に収集した。また,自然や人による災害や,戦争や事故による人的被害など,さまざまな被害インフラの画像も収集した。また,このような災害や被害の徴候のない画像を含む非損傷クラスに対する画像データも蓄積した。このデータセットには13,720の注釈付き画像があり、各画像は3人で注釈付けされている。また,200種類のテスト画像に対して,バウンディングボックスを手作業で付与した画像クラス情報を識別する。画像は、他の研究者が利用可能なさまざまなニュースポータル、ソーシャルメディア、標準データセットから収集される。 3層注意モデル(tlam)を訓練し、平均5つの折りたたみ検証精度95.88%を達成する。さらに、200個の未検出画像では、この精度は96.48%である。また,これらの実験画像に対して注意マップを作成し,比較し,注意モデルの特性について検討した。私たちのデータセットはhttps://niloy193.github.io/Disaster-Datasetで利用可能です。 The advancement of deep learning technology has enabled us to develop systems that outperform any other classification technique. However, success of any empirical system depends on the quality and diversity of the data available to train the proposed system. In this research, we have carefully accumulated a relatively challenging dataset that contains images collected from various sources for three different disasters: fire, water and land. Besides this, we have also collected images for various damaged infrastructure due to natural or man made calamities and damaged human due to war or accidents. We have also accumulated image data for a class named non-damage that contains images with no such disaster or sign of damage in them. There are 13,720 manually annotated images in this dataset, each image is annotated by three individuals. We are also providing discriminating image class information annotated manually with bounding box for a set of 200 test images. 
Images are collected from different news portals, social media, and standard datasets made available by other researchers. A three-layer attention model (TLAM) is trained and achieves an average five-fold validation accuracy of 95.88%. Moreover, on the 200 unseen test images the accuracy is 96.48%. We also generate and compare attention maps for these test images to determine the characteristics of the trained attention model. Our dataset is available at https://niloy193.github.io/Disaster-Dataset	翻訳日:2021-07-07 12:58:27 公開日:2021-07-02
# (参考訳) 2値分類と変更点検出のためのソルトベースサロゲート損失関数を用いたROC曲線の最適化 Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection ( http://arxiv.org/abs/2107.01285v1 ) ライセンス: CC BY 4.0	Jonathan Hillman and Toby Dylan Hocking	(参考訳) 受信者動作特性(roc)曲線は、バイナリ分類モデルの評価に有用な真正率と偽正率のプロットであるが、曲線(auc)下の領域が凸でないため、学習に使用するのが困難である。 ROC曲線は、変化点検出のような偽陽性と真正の確率を持つ他の問題にも用いられる。このより一般的な文脈では、ROC曲線はループ、高い準最適誤差率を持つ点、AUCが1より大きい点を持つことが示される。この観測は、AUCを最大化する代わりに、Min(FP,FN) に対して大きな値を持つ点を避ける AUC=1 の単調ROC曲線を求める。本研究では,AUMと呼ばれる新しいサロゲート損失関数(AUM, Area Under Min(FP, FN))を導出する凸緩和法を提案する。以前の損失関数はすべてのラベル付き例やペアの和に基づいているが、AUMはROC曲線上の点列上のソートと和を必要とする。勾配降下学習アルゴリズムでは,AUM方向微分を効率的に計算し,利用できることを示す。教師付きバイナリ分類と変更点検出問題に関する実証的研究では、新しいAUM最小化学習アルゴリズムがAUCを改良し、以前のベースラインと同等の速度をもたらすことを示した。 Receiver Operating Characteristic (ROC) curves are plots of true positive rate versus false positive rate which are useful for evaluating binary classification models, but difficult to use for learning since the Area Under the Curve (AUC) is non-convex. ROC curves can also be used in other problems that have false positive and true positive rates such as changepoint detection. We show that in this more general context, the ROC curve can have loops, points with highly sub-optimal error rates, and AUC greater than one. This observation motivates a new optimization objective: rather than maximizing the AUC, we would like a monotonic ROC curve with AUC=1 that avoids points with large values for Min(FP,FN). We propose a convex relaxation of this objective that results in a new surrogate loss function called the AUM, short for Area Under Min(FP, FN). Whereas previous loss functions are based on summing over all labeled examples or pairs, the AUM requires a sort and a sum over the sequence of points on the ROC curve. We show that AUM directional derivatives can be efficiently computed and used in a gradient descent learning algorithm. 
In our empirical study of supervised binary classification and changepoint detection problems, we show that our new AUM minimization learning algorithm results in improved AUC and comparable speed relative to previous baselines.	翻訳日:2021-07-07 12:46:27 公開日:2021-07-02
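The abstract above says the AUM needs "a sort and a sum over the sequence of points on the ROC curve". The following is a minimal sketch of one reading of that idea for the binary case, not the authors' implementation. It assumes that a constant offset `c` is added to every predicted score, that an example is predicted positive when `score + c > 0`, and that FP and FN are raw counts; the function name `aum` and these conventions are illustrative assumptions.

```python
def aum(scores, labels):
    """Area under min(FP, FN) as a constant offset c is added to every score.

    Assumption: an example is predicted positive when score + c > 0, so
    example i flips from negative to positive as c passes -score_i.
    labels are 1 (positive) or 0 (negative).
    """
    # Sort the offsets at which each example flips to "predicted positive".
    flips = sorted((-s, y) for s, y in zip(scores, labels))
    fp = 0
    fn = sum(1 for y in labels if y == 1)  # c -> -inf: everything predicted negative
    area = 0.0
    for k, (c, y) in enumerate(flips):
        if y == 1:
            fn -= 1  # a positive example is now detected
        else:
            fp += 1  # a negative example is now a false alarm
        if k + 1 < len(flips):
            width = flips[k + 1][0] - c  # FP and FN are constant on this interval
            area += min(fp, fn) * width
    return area


# A perfectly ranked set has zero area; a swapped pair does not.
perfect = aum([2.0, 1.0, -1.0, -2.0], [1, 1, 0, 0])
swapped = aum([1.0, -1.0], [0, 1])
```

Outside the smallest and largest flip points either FP or FN is zero, so the area is finite and only the intervals between consecutive sorted offsets contribute.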
# (参考訳) Scarecrow: マシンテキストの精査のためのフレームワーク Scarecrow: A Framework for Scrutinizing Machine Text ( http://arxiv.org/abs/2107.01294v1 ) ライセンス: CC BY 4.0	Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A.Smith, Yejin Choi	(参考訳) 現代のニューラルテキスト生成システムは、驚くほど流動的で文法的なテキストを生成することができる。初期の言語モデルは反復と構文上の誤りに苦しんだが、現代のモデルによる誤りはしばしば意味的、物語的、あるいは談話的失敗である。これらの複雑なエラータイプの研究を容易にするために、Scarecrowと呼ばれる新しい構造化されたクラウドソースエラーアノテーションスキーマを導入する。 Scarecrowで使用されるエラーカテゴリ(冗長性、コモンセンスエラー、不整合など)は、専門家分析とオントロジーのないクラウドアノテーションのパイロットラウンドを組み合わせて、実際のマシン生成テキストで見られるエラー現象をカバーするスキーマに到達することで特定された。我々は、Scarecrowを使って1.3kの人文と機械が生成する英語ニューステキストの13kのアノテーションを収集し、それぞれ41k以上のスパンにエラーカテゴリ、重大さ、自然言語の説明、先行スパン(関連する部分)をラベル付けした。我々は、GPT-2 Smallから最大のGPT-3まで、様々なパフォーマンスレベルを持つ最先端システムによって生成されたテキストのアノテーションを収集する。パラメータ数,トレーニングデータ,復号化技術など,詳細な解析のためのいくつかの因子を分離した。以上の結果から,これらの設定の相違点が期待できる。これらの結果から,現在および将来のテキスト生成システムの評価において,カカシアノテーションの価値が示された。私たちは完全なアノテーションツールキットとデータセットをhttps://yao-dou.github.io/scarecrow/でリリースしています。 Modern neural text generation systems can produce remarkably fluent and grammatical texts. While earlier language models suffered from repetition and syntactic errors, the errors made by contemporary models are often semantic, narrative, or discourse failures. To facilitate research of these complex error types, we introduce a new structured, crowdsourced error annotation schema called Scarecrow. The error categories used in Scarecrow -- such as redundancy, commonsense errors, and incoherence -- were identified by combining expert analysis with several pilot rounds of ontology-free crowd annotation to arrive at a schema which covers the error phenomena found in real machine generated text. We use Scarecrow to collect 13k annotations of 1.3k human and machine generate paragraphs of English language news text, amounting to over 41k spans each labeled with its error category, severity, a natural language explanation, and antecedent span (where relevant). 
We collect annotations for text generated by state-of-the-art systems with varying known performance levels, from GPT-2 Small through the largest GPT-3. We isolate several factors for detailed analysis, including parameter count, training data, and decoding technique. Our results show both expected and surprising differences across these settings. These findings demonstrate the value of Scarecrow annotations in the assessment of current and future text generation systems. We release our complete annotation toolkit and dataset at https://yao-dou.github.io/scarecrow/.	翻訳日:2021-07-07 12:21:44 公開日:2021-07-02
# (参考訳) ニューラルネットワークのサブスペースクラスタリングに基づく解析 Subspace Clustering Based Analysis of Neural Networks ( http://arxiv.org/abs/2107.01296v1 ) ライセンス: CC BY 4.0	Uday Singh Saini, Pravallika Devineni, Evangelos E. Papalexakis	(参考訳) ディープニューラルネットワークの潜在空間を分析するツールは、それらを理解するためのステップを提供する。本研究では,入力セット上で訓練されたニューラルネットワーク層の潜在構造から親和性グラフを学習することを目的として,スパース部分空間クラスタリング(ssc)の動機付けを行う。次に、コミュニティ検出のツールを使用して、入力に存在する構造を定量化する。これらの実験は、ネットワークの奥深くに進むにつれて、入力は同じクラスの他の入力と親和性が高まる傾向があることを示しています。次に,アフィニティグラフ間の層間比較を行うために,行列類似度尺度を利用する。そうすることで、我々はまず、トレーニング中のある層を最終状態と比較すると、ネットワークの層が浅いほど、より深い層よりも収束が早いことを実証する。ネットワークアーキテクチャ全体のペアワイズ分析を行う場合、ネットワークのサイズが大きくなるにつれて、各レイヤが隣のレイヤと適度に類似している状態から、ブロック内のレイヤが他のブロックのレイヤと高い類似度を持つ状態へと再編成されるのが観察される。最後に,ネットワークの最終畳み込み層の学習された親和性グラフを分析し,入力の局所的近傍がネットワークの分類にどのように影響するかを示す。 Tools to analyze the latent space of deep neural networks provide a step towards better understanding them. In this work, we motivate sparse subspace clustering (SSC) with an aim to learn affinity graphs from the latent structure of a given neural network layer trained over a set of inputs. We then use tools from Community Detection to quantify structures present in the input. These experiments reveal that as we go deeper in a network, inputs tend to have an increasing affinity to other inputs of the same class. Subsequently, we utilise matrix similarity measures to perform layer-wise comparisons between affinity graphs. In doing so we first demonstrate that when comparing a given layer currently under training to its final state, the shallower the layer of the network, the quicker it is to converge than the deeper layers. When performing a pairwise analysis of the entire network architecture, we observe that, as the network increases in size, it reorganises from a state where each layer is moderately similar to its neighbours, to a state where layers within a block have high similarity than to layers in other blocks. 
Finally, we analyze the learned affinity graphs of the final convolutional layer of the network and demonstrate how an input's local neighbourhood affects its classification by the network.	翻訳日:2021-07-07 11:50:32 公開日:2021-07-02
# データ不確かさに基づく指紋の前処理 Data Uncertainty Guided Noise-aware Preprocessing Of Fingerprints ( http://arxiv.org/abs/2107.01248v1 ) ライセンス: Link先を確認	Indu Joshi and Ayush Utkarsh and Riya Kothari and Vinod K Kurmi and Antitza Dantcheva and Sumantra Dutta Roy and Prem Kumar Kalra	(参考訳) 良質な指紋に対する指紋認証システムの有効性は, 昔から確立されてきた。しかし, ノイズや品質の悪い指紋に対する標準指紋照合システムの性能は十分ではない。そこで本研究では,最先端の指紋前処理モデルを用いて,入力画像に存在する雑音を定量化し,背景雑音やリッジの明度が低い指紋領域を識別する手法を提案する。ノイズの定量化は、2つの点でモデルに役立つ: まず、目的関数を特定の入力指紋のノイズに適応させ、その結果、ノイズや歪んだ指紋領域の堅牢な性能を達成する。第二に、入力指紋画像中のノイズの多い画素を示すノイズ分散マップを提供する。予測ノイズ分散マップは、入力画像に存在するノイズによる誤予測をエンドユーザが理解できるようにする。様々なアーキテクチャの選択と2つの指紋処理タスクにわたる13の公開指紋データベースの広範な実験評価は,提案手法の有効性を示している。 The effectiveness of fingerprint-based authentication systems on good quality fingerprints was established long ago. However, the performance of standard fingerprint matching systems on noisy and poor quality fingerprints is far from satisfactory. To address this, we propose a data uncertainty-based framework which enables the state-of-the-art fingerprint preprocessing models to quantify noise present in the input image and identify fingerprint regions with background noise and poor ridge clarity. Quantification of noise helps the model in two ways. First, it makes the objective function adaptive to the noise in a particular input fingerprint and consequently helps achieve robust performance on noisy and distorted fingerprint regions. Second, it provides a noise variance map which indicates noisy pixels in the input fingerprint image. The predicted noise variance map enables the end-users to understand erroneous predictions due to noise present in the input image. Extensive experimental evaluation on 13 publicly available fingerprint databases, across different architectural choices and two fingerprint processing tasks, demonstrates the effectiveness of the proposed framework.	翻訳日:2021-07-06 15:22:57 公開日:2021-07-02
# リラックスした注意:エンドツーエンド自動音声認識の性能向上のための簡易手法 Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition ( http://arxiv.org/abs/2107.01275v1 ) ライセンス: Link先を確認	Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt	(参考訳) 近年,アテンションベースのエンコーダデコーダ(AED)モデルでは,複数タスクにわたるエンドツーエンド自動音声認識(ASR)の性能が向上している。本稿では,2行のコードで容易に実装できる訓練において,エンコーダ・デコーダの注意重みに対する一様分布の簡易な段階的注入である緩和注意の概念を紹介する。我々は,様々なAEDモデルアーキテクチャと,ウォールストリートジャーナル (WSJ) とリブリスペック (Librispeech) の2つの顕著なASRタスクにおける緩和された注意の効果について検討した。ゆるやかな注意で訓練されたトランスフォーマーは、外部言語モデルで復号する際に標準ベースラインモデルより一貫して優れていた。 wsjでは、単語誤り率3.65%のトランスフォーマ・エンド・ツー・エンド音声認識のベンチマークを新たに設定し、その性能(4.20%)を13.1%向上させた。受け入れられると、モデルはgithubで公開される。 Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of relaxed attention, which is a simple gradual injection of a uniform distribution to the encoder-decoder attention weights during training that is easily implemented with two lines of code. We investigate the effect of relaxed attention across different AED model architectures and two prominent ASR tasks, Wall Street Journal (WSJ) and Librispeech. We found that transformers trained with relaxed attention outperform the standard baseline models consistently during decoding with external language models. On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming state of the art (4.20%) by 13.1% relative, while introducing only a single hyperparameter. Upon acceptance, models will be published on github.	翻訳日:2021-07-06 15:12:16 公開日:2021-07-02
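The relaxed attention abstract describes injecting a uniform distribution into the encoder-decoder attention weights during training. The following is a minimal sketch of that idea, assuming a simple convex blend controlled by a single coefficient. The name `gamma`, the helper `relax_attention`, and the exact blending formula are illustrative assumptions, not taken from the paper's code.

```python
def relax_attention(weights, gamma):
    """Blend one row of attention weights with a uniform distribution.

    weights: attention weights over the encoder frames (non-negative, sum to 1).
    gamma:   relaxation coefficient in [0, 1]; a hypothetical name for the
             single hyperparameter mentioned in the abstract.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    uniform = 1.0 / len(weights)
    # Convex combination: the result is still a valid probability distribution.
    return [(1.0 - gamma) * w + gamma * uniform for w in weights]


attn = [0.90, 0.05, 0.03, 0.02]       # a sharply peaked attention row
relaxed = relax_attention(attn, 0.2)  # would be applied during training only
```

Because the blend is a convex combination, the relaxed row still sums to one while its peak is flattened, which is one plausible way to counter the overconfidence the abstract mentions.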
# Truncated Marginal Neural Ratio Estimation Truncated Marginal Neural Ratio Estimation ( http://arxiv.org/abs/2107.01214v1 ) ライセンス: Link先を確認	Benjamin Kurt Miller, Alex Cole, Patrick Forré, Gilles Louppe, Christoph Weniger	(参考訳) パラメトリック確率シミュレータは科学においてユビキタスであり、しばしば高次元の入力パラメータや扱いにくい尤度を特徴とする。この文脈でベイズパラメータ推論を行うことは困難である。本稿では,シミュレーション効率と高速な経験的事後検証性を備えたニューラルシミュレータに基づく推論アルゴリズムを提案する。提案手法は, 同時事後分布ではなく低次元の周辺事後分布を同時に推定し, 指示関数によって適切に打ち切られた事前分布を用いて, 関心のある観測に的を絞ったシミュレーションを提案する。さらに, 局所的に償却された事後分布を推定することにより, 推定結果のロバスト性に関する効率的な実証実験が可能となる。このようなテストは、実世界のアプリケーションにおける正当性チェックの推論において重要である。シミュレーションベース推論ベンチマークのマージン化版と,2つの複雑で狭い事後分布について実験を行い,本アルゴリズムのシミュレーター効率と推定された周辺事後分布の品質について検討した。 GitHub上の実装。 Parametric stochastic simulators are ubiquitous in science, often featuring high-dimensional input parameters and/or an intractable likelihood. Performing Bayesian parameter inference in this context can be challenging. We present a neural simulator-based inference algorithm which simultaneously offers simulation efficiency and fast empirical posterior testability, which is unique among modern algorithms. Our approach is simulation efficient by simultaneously estimating low-dimensional marginal posteriors instead of the joint posterior and by proposing simulations targeted to an observation of interest via a prior suitably truncated by an indicator function. Furthermore, by estimating a locally amortized posterior our algorithm enables efficient empirical tests of the robustness of the inference results. Such tests are important for sanity-checking inference in real-world applications, which do not feature a known ground truth. We perform experiments on a marginalized version of the simulation-based inference benchmark and two complex and narrow posteriors, highlighting the simulator efficiency of our algorithm as well as the quality of the estimated marginal posteriors. Implementation on GitHub.	翻訳日:2021-07-06 15:09:17 公開日:2021-07-02
# Visual Time Series Forecasting: イメージ駆動型アプローチ Visual Time Series Forecasting: An Image-driven Approach ( http://arxiv.org/abs/2107.01273v1 ) ライセンス: Link先を確認	Naftali Cohen, Srijan Sood, Zhen Zeng, Tucker Balch, Manuela Veloso	(参考訳) 本研究では,時系列予測をコンピュータビジョンタスクとして扱う。入力データを画像としてキャプチャし,モデルをトレーニングして次の画像を生成する。このアプローチは、ポイントワイズ値とは対照的に分布を予測する。提案手法のロバスト性と品質を評価するため,様々なデータセットと複数の評価指標について検討する。実験の結果, 予測ツールは循環データには有効であるが, 株価などの不規則データには若干少ないことがわかった。重要な点は、画像に基づく評価メトリクスを使用する場合、arimaを含むさまざまなベースラインと、ディープラーニングアプローチの数値的変化を比較できる方法を見つけることです。 In this work, we address time-series forecasting as a computer vision task. We capture input data as an image and train a model to produce the subsequent image. This approach results in predicting distributions as opposed to pointwise values. To assess the robustness and quality of our approach, we examine various datasets and multiple evaluation metrics. Our experiments show that our forecasting tool is effective for cyclic data but somewhat less for irregular data such as stock prices. Importantly, when using image-based evaluation metrics, we find our method to outperform various baselines, including ARIMA, and a numerical variation of our deep learning approach.	翻訳日:2021-07-06 15:07:17 公開日:2021-07-02
# バリューファンクションギャップを超えて: エピソード強化学習のためのインスタンス依存リグレット境界の改善 Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning ( http://arxiv.org/abs/2107.01264v1 ) ライセンス: Link先を確認	Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert	(参考訳) 有限エピソディックマルコフ決定過程における強化学習のためのギャップ依存的後悔境界の改善を提案する。以前の仕事と比較して、私たちの境界はギャップの代替定義に依存する。これらの定義は、好意的な後悔を達成するために、アルゴリズムが最適なポリシーによって到達されない状態において最適に振る舞う方法を学習する必要がないという洞察に基づいている。楽観的なアルゴリズムでは,より強い後悔境界を証明し,多数のMDPに対して新たな情報理論的下限を伴う。楽観的アルゴリズムは, 決定論的 MDP においても, 独特な最適政策がない限り, 情報理論の下限を達成できないことを示す。 We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs. Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.	翻訳日:2021-07-06 14:56:21 公開日:2021-07-02
# 力学系のための物理誘導深層学習:調査 Physics-Guided Deep Learning for Dynamical Systems: A survey ( http://arxiv.org/abs/2107.01272v1 ) ライセンス: Link先を確認	Rui Wang	(参考訳) 複雑な物理力学のモデリングは科学と工学の基本的な課題である。従来の物理モデルは解釈可能であるが、厳密な仮定に依存している。直接数値近似は通常計算集約であり、かなりの計算資源と専門知識を必要とする。ディープラーニング(DL)は、複雑なパターンを効率的に認識し、非線形力学をエミュレートするための新しい代替手段を提供するが、必ずしも物理系の規則に従わないし、異なるシステムにまたがってうまく一般化しない。このように、物理誘導型DLの研究が登場し、大きな進歩を遂げた。物理学に基づくモデリングと最先端のDLモデルの両方から、科学的な問題を解決することを目指している。本稿では,従来の物理知識や物理に基づくモデリングをDLに統合する手法について概説し,新たな可能性について論じる。 Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are interpretable but rely on rigid assumptions. Direct numerical approximation is usually computationally intensive, requiring significant computational resources and expertise. While deep learning (DL) provides novel alternatives for efficiently recognizing complex patterns and emulating nonlinear dynamics, it does not necessarily obey the governing laws of physical systems, nor does it generalize well across different systems. Thus, the study of physics-guided DL has emerged and made great progress. It aims to take the best from both physics-based modeling and state-of-the-art DL models to better solve scientific problems. In this paper, we provide a structured overview of existing methodologies of integrating prior physical knowledge or physics-based modeling into DL and discuss the emerging opportunities.	翻訳日:2021-07-06 14:56:10 公開日:2021-07-02
# 過パラメータ線形ネットワークによるオートエンコーダにおける暗黙の貪欲ランク学習 Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks ( http://arxiv.org/abs/2107.01301v1 ) ライセンス: Link先を確認	Shih-Yu Sun, Vimal Thilak, Etai Littwin, Omid Saremi, Joshua M. Susskind	(参考訳) 勾配降下で訓練された深い線形ネットワークは、行列因子分解でよく研究されるように、低いランクの解を与える。本稿では,さらに一歩進めて,オートエンコーダにおける暗黙のランク正規化を解析する。オートエンコーダボトルネックにおける線形サブネットワークによって誘導される低ランク潜伏符号の貪欲学習を示す。さらに,スペクトル先行および線形深度に対するトレーニング力学の感度を緩和するために,直交初期化と学習率調整法を提案する。合成データ上の線形オートエンコーダでは,本手法は定常的に基底潜在コードランクに収束する。非線形オートエンコーダでは,下流分類と画像サンプリングに最適な潜在ランクに収束する。 Deep linear networks trained with gradient descent yield low rank solutions, as is typically studied in matrix factorization. In this paper, we take a step further and analyze implicit rank regularization in autoencoders. We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck. We further propose orthogonal initialization and principled learning rate adjustment to mitigate sensitivity of training dynamics to spectral prior and linear depth. With linear autoencoders on synthetic data, our method converges stably to ground-truth latent code rank. With nonlinear autoencoders, our method converges to latent ranks optimal for downstream classification and image sampling.	翻訳日:2021-07-06 14:55:56 公開日:2021-07-02
# ヒューマノイドロボットの先天的遠隔操作 Prescient teleoperation of humanoid robots ( http://arxiv.org/abs/2107.01281v1 ) ライセンス: Link先を確認	Luigi Penco, Jean-Baptiste Mouret, Serena Ivaldi	(参考訳) 人間型ロボットは、操作者に視覚フィードバックを送りながら、ウェアラブルモーションキャプチャ装置を装備した操作者の動きを遠隔地で再現することができる。人間の動作(再ターゲット)をヒューマノイドロボットに転送する大きな進歩があったが、そのようなシステムが実際の応用に配備されるのを防ぐ主要な問題は、人間の入力とロボットからのフィードバックの間の通信遅延の存在である。これらの遅延を克服するために、ヒューマノイドロボットが実際にコマンドを受け取る前にコマンドを実行するシステムを導入し、視覚フィードバックがオペレータに同期するようにし、ロボットは過去にコマンドを実行した。そのためロボットは、過去の軌道で訓練され、最後に受信したコマンドで条件付けられた機械学習モデルをクエリすることで、将来のコマンドを継続的に予測する。私たちの実験では、オペレーターがヒューマノイドロボット(32度自由度)を、複数の身体操作タスクで最大2秒の確率的遅延で制御することに成功しました。 Humanoid robots could be versatile and intuitive human avatars that operate remotely in inaccessible places: the robot could reproduce in the remote location the movements of an operator equipped with a wearable motion capture device while sending visual feedback to the operator. While substantial progress has been made on transferring ("retargeting") human motions to humanoid robots, a major problem preventing the deployment of such systems in real applications is the presence of communication delays between the human input and the feedback from the robot: even a few hundred milliseconds of delay can irreversibly disturb the operator, let alone a few seconds. To overcome these delays, we introduce a system in which a humanoid robot executes commands before it actually receives them, so that the visual feedback appears to be synchronized to the operator, whereas the robot executed the commands in the past. To do so, the robot continuously predicts future commands by querying a machine learning model that is trained on past trajectories and conditioned on the last received commands. In our experiments, an operator was able to successfully control a humanoid robot (32 degrees of freedom) with stochastic delays up to 2 seconds in several whole-body manipulation tasks, including reaching different targets, picking up, and placing a box at distinct locations.	
翻訳日:2021-07-06 14:46:31 公開日:2021-07-02
# 最適トランスポートを用いた機能コネクトーム間のデータ駆動マッピング Data-driven mapping between functional connectomes using optimal transport ( http://arxiv.org/abs/2107.01303v1 ) ライセンス: Link先を確認	Javid Dadashkarimi and Amin Karbasi and Dustin Scheinost	(参考訳) 機能的磁気共鳴イメージングに由来する機能的コネクトームは、長い間脳の機能的構造を理解するために用いられてきた。それにもかかわらず、コネクトームは本質的にアトラスと結びついている。言い換えれば、あるアトラスから生成されたコネクトームは、別のアトラスから生成されたコネクトームと比べてスケールと解像度が異なる。コネクトームと導出結果を、追加の事前処理なしで異なるアトラス間でマッピングできることは、異なるアトラスを使用する研究間の解釈と一般化を改善する重要なステップである。ここでは、2つのアトラス間の最適マッピングを見つけるために、強力な数学的手法である最適輸送を用いる。このマッピングはコネクトームを再構築するために、あるアトラスから別のアトラスへの時系列変換に使用される。我々は、変換コネクトームと「金標準」コネクトーム(すなわち、アトラスから直接生成されたコネクトーム)を比較し、これらのコネクトームを異なるアトラスに基づく予測モデルに適用することにより、変換コネクトームの有用性を示す。これらのトランスフォーメーションコネクトームは,「金標準」コネクトームと著しく類似しており,脳行動関連における個人差を維持しており,本手法の有効性と下流解析における有用性を示している。全体として、我々のアプローチはコネクトームに基づく様々なアトラスの一般化を促進するための有望な道である。 Functional connectomes derived from functional magnetic resonance imaging have long been used to understand the functional organization of the brain. Nevertheless, a connectome is intrinsically linked to the atlas used to create it. In other words, a connectome generated from one atlas is different in scale and resolution compared to a connectome generated from another atlas. Being able to map connectomes and derived results between different atlases without additional pre-processing is a crucial step in improving interpretation and generalization between studies that use different atlases. Here, we use optimal transport, a powerful mathematical technique, to find an optimum mapping between two atlases. This mapping is then used to transform time series from one atlas to another in order to reconstruct a connectome. We validate our approach by comparing transformed connectomes against their "gold-standard" counterparts (i.e., connectomes generated directly from an atlas) and demonstrate the utility of transformed connectomes by applying these connectomes to predictive models based on a different atlas. 
We show that these transformed connectomes are significantly similar to their "gold-standard" counterparts and maintain individual differences in brain-behavior associations, demonstrating both the validity of our approach and its utility in downstream analyses. Overall, our approach is a promising avenue to increase the generalization of connectome-based results across different atlases.	翻訳日:2021-07-06 14:46:09 公開日:2021-07-02
# ストリーミングエンドツーエンド音声認識のための二重因果・非因果自己注意 Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition ( http://arxiv.org/abs/2107.01269v1 ) ライセンス: Link先を確認	Niko Moritz, Takaaki Hori, Jonathan Le Roux	(参考訳) 注意に基づくエンドツーエンド自動音声認識(ASR)システムは、最近、多くのタスクに対する最先端の結果を実証している。しかし、セルフアテンションと注意に基づくエンコーダ・デコーダモデルの適用は、各単語が話された直後に認識されなければならないストリーミングASRでは依然として困難である。本稿では,二重因果/非因果自己注意(DCN)アーキテクチャについて述べる。このアーキテクチャは,制限付き自己注意とは対照的に,ディープアーキテクチャで使用される場合の単一レイヤのルック・アヘッドを超えて,全体的なコンテキストが成長することを妨げている。 DCNは、ストリーミングトランスフォーマーとコンフォーマーアーキテクチャを用いたチャンクベースおよび制限付き自己注意と比較され、制限付き自己注意よりもASR性能が向上し、チャンクベースの自己注意に対しても競合するASR結果を示しつつ、フレーム同期処理の利点を提供する。トリガードアテンションと組み合わせることで、提案されたストリーミングエンドツーエンドASRシステムは、LibriSpeech、HKUST、Switchboard ASRタスクの最先端の結果を得た。 Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks. However, the application of self-attention and attention-based encoder-decoder models remains challenging for streaming ASR, where each word must be recognized shortly after it was spoken. In this work, we present the dual causal/non-causal self-attention (DCN) architecture, which in contrast to restricted self-attention prevents the overall context from growing beyond the look-ahead of a single layer when used in a deep architecture. DCN is compared to chunk-based and restricted self-attention using streaming transformer and conformer architectures, showing improved ASR performance over restricted self-attention and competitive ASR results compared to chunk-based self-attention, while providing the advantage of frame-synchronous processing. Combined with triggered attention, the proposed streaming end-to-end ASR systems obtained state-of-the-art results on the LibriSpeech, HKUST, and Switchboard ASR tasks.	翻訳日:2021-07-06 14:41:08 公開日:2021-07-02
# (参考訳) o2d2: オーサシップ検証における決定不能な試行をキャプチャする分散検出装置 O2D2: Out-Of-Distribution Detector to Capture Undecidable Trials in Authorship Verification ( http://arxiv.org/abs/2106.15825v2 ) ライセンス: CC BY 4.0	Benedikt Boenninghoff, Robert M. Nickel, Dorothea Kolossa	(参考訳) pan 2021 authorship verification (av) challengeは、クロストピック/クローズドセットavタスクからクロストピック/オープンセットavタスクへ、ファンフィクションテキストのコレクションに移行した3年間の戦略の一部である。本稿では,2021年の課題に対処するために設計された,新しいハイブリッド型ニューラル確率的フレームワークを提案する。提案方式は,2020年度の入賞提案に基づいて,話題の変化に対する感性を大幅に低減し,不確実性対応層を用いてシステムの校正をさらに改善する更新を行った。当社のフレームワークには、非応答を定義するためのout-of-distribution detector(o2d2)も含まれています。提案システムは、PAN 2021 AVタスクに参加した他のシステムよりも優れていた。 The PAN 2021 authorship verification (AV) challenge is part of a three-year strategy, moving from a cross-topic/closed-set AV task to a cross-topic/open-set AV task over a collection of fanfiction texts. In this work, we present a novel hybrid neural-probabilistic framework that is designed to tackle the challenges of the 2021 task. Our system is based on our 2020 winning submission, with updates to significantly reduce sensitivities to topical variations and to further improve the system's calibration by means of an uncertainty-adaptation layer. Our framework additionally includes an out-of-distribution detector (O2D2) for defining non-responses. Our proposed system outperformed all other systems that participated in the PAN 2021 AV task.	翻訳日:2021-07-06 07:20:10 公開日:2021-07-02
# (参考訳) ニューラルボコーダによる話者検証のための対向サンプルの抽出 Spotting adversarial samples for speaker verification by neural vocoders ( http://arxiv.org/abs/2107.00309v2 ) ライセンス: CC0 1.0	Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-yi Lee	(参考訳) 生体認証の最も重要な技術の一つである自動話者認証(ASV)は、トランザクション認証やアクセス制御を含むセキュリティクリティカルなアプリケーションで広く採用されている。しかし、以前の研究では、ASVは最近出現した敵攻撃に対して深刻な脆弱性があることが示されている。本稿では,ASVの対立サンプルを見つけるために,ニューラルボコーダを用いる。ニューラルボコーダを用いてオーディオを再合成し、元のオーディオと再合成オーディオのASVスコアの違いが真と逆のサンプルの識別に良い指標であることを示す。この取り組みは、私たちの知る限り、ASVの敵対的サンプルを検出するための技術的方向性を最初に追求するものであり、そのため、比較のための確立された基準線が欠如している。その結果,検出基準としてGriffin-Limアルゴリズムを実装した。提案手法は,すべての設定において,すべてのベースラインを上回る効果的な検出性能を実現する。また,検出フレームワークで採用されているニューラルボコーダはデータセットに依存しないことを示す。私たちのコードは、将来的な比較作業のためにオープンソースにされます。 Automatic speaker verification (ASV), one of the most important technology for biometric identification, has been widely adopted in security-critical applications, including transaction authentication and access control. However, previous work has shown that ASV is seriously vulnerable to recently emerged adversarial attacks, yet effective countermeasures against them are limited. In this paper, we adopt neural vocoders to spot adversarial samples for ASV. We use the neural vocoder to re-synthesize audio and find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discrimination between genuine and adversarial samples. This effort is, to the best of our knowledge, among the first to pursue such a technical direction for detecting adversarial samples for ASV, and hence there is a lack of established baselines for comparison. Consequently, we implement the Griffin-Lim algorithm as the detection baseline. The proposed approach achieves effective detection performance that outperforms all the baselines in all the settings. We also show that the neural vocoder adopted in the detection framework is dataset-independent. 
Our code will be released as open source to enable future comparisons.	翻訳日:2021-07-06 06:55:07 公開日:2021-07-02
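The detection rule described in the abstract above (compare the ASV score of the original audio with the score of its vocoder re-synthesis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `asv_score`, `resynthesize`, the threshold value, and the toy stand-ins at the bottom are all hypothetical placeholders for a real speaker-verification scorer and a real neural vocoder.

```python
from typing import Callable, List

Audio = List[float]  # toy stand-in for a waveform (assumption)


def score_gap(asv_score: Callable[[Audio], float],
              resynthesize: Callable[[Audio], Audio],
              audio: Audio) -> float:
    """Absolute change in ASV score after re-synthesizing the audio."""
    return abs(asv_score(audio) - asv_score(resynthesize(audio)))


def is_adversarial(asv_score: Callable[[Audio], float],
                   resynthesize: Callable[[Audio], Audio],
                   audio: Audio,
                   threshold: float) -> bool:
    """Flag the sample when the score gap exceeds a chosen threshold."""
    return score_gap(asv_score, resynthesize, audio) > threshold


# Hypothetical toy stand-ins: the "scorer" sums the samples and the
# "vocoder" rounds them, which wipes out a small additive perturbation.
def toy_score(audio: Audio) -> float:
    return sum(audio)


def toy_resynthesize(audio: Audio) -> Audio:
    return [round(v, 1) for v in audio]


genuine = [0.5, 0.2]
perturbed = [0.54, 0.24]  # genuine plus a small perturbation

genuine_flag = is_adversarial(toy_score, toy_resynthesize, genuine, 0.05)
perturbed_flag = is_adversarial(toy_score, toy_resynthesize, perturbed, 0.05)
```

In practice the threshold would presumably be calibrated on genuine data, for example to hit a target false-alarm rate; that choice is an assumption here, not something the abstract specifies.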
# (参考訳) Multilingual Central Repository: wordnetを開発するためのクロスリンガルフレームワーク Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets ( http://arxiv.org/abs/2107.00333v2 ) ライセンス: CC BY 4.0	Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau	(参考訳) 言語処理には言語リソースが必要ですが、その構築にはコストがかかり、さまざまな分野の多くの研究者が関わり、常に更新が必要です。本稿では,バスク語,カタルーニャ語,英語,ガリシア語,ポルトガル語,スペイン語のwordnet,および以下のオントロジー(ベースコンセプト,トップオントロジー,WordNetドメイン,Suggested Upper Merged Ontology)を含む多言語知識基盤であるMCR(Multilingual Central Repository)の開発に使用されるクロスリンガルフレームワークについて述べる。我々は、MCRの経緯、2017年時点の状態、および開発したツールについて紹介する。 Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas, and needs constant updating. In this paper, we describe the cross-lingual framework used for developing the Multilingual Central Repository (MCR), a multilingual knowledge base that includes wordnets of Basque, Catalan, English, Galician, Portuguese, Spanish and the following ontologies: Base Concepts, Top Ontology, WordNet Domains and Suggested Upper Merged Ontology. We present the story of MCR, its state in 2017 and the developed tools.	翻訳日:2021-07-06 06:18:11 公開日:2021-07-02
# (参考訳) ボトルネック付き無限広ニューラルネットワークにおける暗黙の加速と特徴学習 Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks ( http://arxiv.org/abs/2107.00364v2 ) ライセンス: CC0 1.0	Etai Littwin, Omid Saremi, Shuangfei Zhai, Vimal Thilak, Hanlin Goh, Joshua M. Susskind, Greg Yang	(参考訳) 有限サイズのボトルネックを用いて無限大ニューラルネットワークの学習ダイナミクスを分析する。ニューラルタンジェントカーネル極限とは異なり、それ以外は無限幅のネットワークにおけるボトルネックは、ボトルネック表現におけるデータ依存的特徴学習を可能にする。無限ネットワークにおける単一ボトルネックは、純粋に無限ネットワークと比較してトレーニングを劇的に加速し、全体的なパフォーマンスが向上することを実験的に示す。ボトルネックの加速度効果を理論的に理解できる無限大のディープリニアモデルと類似性を引き出すことで加速度現象を考察する。 We analyze the learning dynamics of infinitely wide neural networks with a finite-sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite-width network allows data-dependent feature learning in its bottleneck representation. We empirically show that a single bottleneck in infinite networks dramatically accelerates training when compared to purely infinite networks, with an improved overall performance. We discuss the acceleration phenomena by drawing similarities to infinitely wide deep linear models, where the acceleration effect of a bottleneck can be understood theoretically.	翻訳日:2021-07-06 06:07:37 公開日:2021-07-02
# (参考訳) CBNetV2:オブジェクト検出のための複合バックボーンネットワークアーキテクチャ CBNetV2: A Composite Backbone Network Architecture for Object Detection ( http://arxiv.org/abs/2107.00420v2 ) ライセンス: CC0 1.0	Tingting Liang, Xiaojie Chu, Yudong Liu, Yongtao Wang, Zhi Tang, Wei Chu, Jingdong Chen, Haibing Ling	(参考訳) 現代のトップパフォーマンスオブジェクト検出器はバックボーンネットワークに大きく依存しており、その進歩はより効率的なネットワーク構造を探索することで一貫した性能向上をもたらす。しかし、新しいバックボーンを設計してimagenetで事前トレーニングするには大量の計算リソースが必要となり、より良い検出性能を得るのにコストがかかる。本稿では,既存のオープンソースの学習済みバックボーンの構成を組み込んだ新しいバックボーンネットワークCBNetV2を提案する。特にCBNetV2アーキテクチャは、複合接続を介して接続される複数の同一のバックボーンをグループ化する。また、CBNetベースの検出器のためのAssistant Supervisionによるより良いトレーニング戦略を提案する。 CBNetV2は追加の事前訓練がなければ、1段と2段の検出器を含むメインストリームの検出器とアンカーベースとアンカーフリーベースの検出器に組み込むことができ、COCOのベースライン上での性能は3.0%以上向上する。また、複合バックボーンは、手動ベースやNASベース、CNNベースやTransformerベースなど、トレーニング済みのより広いネットワークよりも効率的でリソースフレンドリであることを示す強力な証拠を提供する。特に、シングルモデルとシングルスケールのテストでは、HTC Dual-Swin-Bが58.6%のボックスAPと51.1%のマスクAPをCOCOテストデブで達成しています。これは最先端の結果(57.7%のボックスAPと50.2%のマスクAP)よりもはるかに優れています。 Modern top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through exploring more effective network structures. However, designing or searching for a new backbone and pre-training it on ImageNet may require a large number of computational resources, making it costly to obtain better detection performance. In this paper, we propose a novel backbone network, namely CBNetV2, by constructing compositions of existing open-sourced pre-trained backbones. In particular, CBNetV2 architecture groups multiple identical backbones, which are connected through composite connections. We also propose a better training strategy with the Assistant Supervision for CBNet-based detectors. Without additional pre-training, CBNetV2 can be integrated into mainstream detectors, including one-stage and two-stage detectors, as well as anchor-based and anchor-free-based ones, and significantly improve their performance by more than 3.0% AP over the baseline on COCO. 
Also, experiments provide strong evidence showing that composite backbones are more efficient and resource-friendly than pre-trained wider and deeper networks, including manual-based and NAS-based, as well as CNN-based and Transformer-based ones. Particularly, with single-model and single-scale testing, our HTC Dual-Swin-B achieves 58.6% box AP and 51.1% mask AP on COCO test-dev, which is significantly better than the state-of-the-art result (i.e., 57.7% box AP and 50.2% mask AP) achieved by a stronger baseline HTC++ with a larger backbone Swin-L. Code will be released at https://github.com/VDIGPKU/CBNetV2.	翻訳日:2021-07-06 05:46:21 公開日:2021-07-02
# (参考訳) 因果的神経結合:表現力、学習力、推論 The Causal Neural Connection: Expressiveness, Learnability, and Inference ( http://arxiv.org/abs/2107.00793v1 ) ライセンス: CC BY 4.0	Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim	(参考訳) 因果推論の中心的な要素の1つは構造因果モデル (Structure causal model, SCM) と呼ばれる対象であり、これは調査中のシステムのランダムな変動のメカニズムと外因性源の集合を表す(Pearl, 2000)。多くの種類のニューラルネットワークの重要な性質は、任意の関数を任意の精度で近似できる普遍近似性である。この性質から、ニューラルネットワークの集合がそのSCMによって生成されたデータに基づいてトレーニングすることで、任意のSCMを学習できると推測する誘惑があるかもしれない。本稿では,表現性や学習可能性の概念を否定することで,この現象は当てはまらないことを示す。具体的には、因果階層定理(Thm)を示す。データから学べるものの限界を記述するBareinboim et al., 2020)は、依然としてニューラルモデルに当てはまる。例えば、任意に複雑で表現力のあるニューラルネットは、観測データのみによる介入の効果を予測できない。この結果から,ニューラル因果モデル(NCM)と呼ばれる特殊なSCMを導入し,因果推論に必要な構造的制約をエンコードする新しいタイプの帰納バイアスを定式化する。この新たなモデルに基づいて、因果同定と推定として知られる文献に見られる2つの正準タスクの解決に焦点をあてる。ニューラルツールボックスを活用することで、データから因果効果を学習できるかどうかを判断するために必要なアルゴリズム(すなわち因果識別可能性)を開発し、識別性が保持されるたびにその効果を推定する(因果推定)。シミュレーションは提案手法を裏付ける。 One of the central elements of any causal inference is an object called structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. 
Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.	翻訳日:2021-07-06 02:49:28 公開日:2021-07-02
# (参考訳) 説明可能なk-mediansとk-meansの近似最適アルゴリズム Near-optimal Algorithms for Explainable k-Medians and k-Means ( http://arxiv.org/abs/2107.00798v1 ) ライセンス: CC BY 4.0	Konstantin Makarychev, Liren Shan	(参考訳) 我々は,Dasgupta,Frost,Moshkovitz,Rashtchian (ICML 2020) が導入した説明可能な$k$-mediansと$k$-meansの問題を考える。この問題では、データを$k$クラスタに分割し、$k$-mediansや$k$-meansの目的を最小化する、閾値決定木(threshold decision tree)を見つけることが目的です。閾値木のすべての決定ノードは、1つの特徴に基づいてデータを2つのグループに分割するため、得られたクラスタリングは容易に解釈できる。我々は、$\ell_1$ ノルムの$k$-mediansに対して$\tilde O(\log k)$競合的、$k$-meansに対して$\tilde O(k)$競合的な新しいアルゴリズムを提案する。これは Dasgupta et al. (2020) による$O(k)$ と $O(k^2)$ の以前の保証よりも改善されている。また、$\ell_2$ ノルムの$k$-mediansに対して$O(\log^{3/2} k)$競合的な新しいアルゴリズムも提供する。最初のアルゴリズムはほぼ最適である: Dasgupta et al. (2020) は$k$-medians に対して$\Omega(\log k)$という下界を示し、本研究では$k$-means に対して$\tilde\Omega(k)$という下界を証明した。また、$\ell_2$ ノルムの$k$-mediansに対して$\Omega(\log k)$の下界も提供する。 We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde O(\log k)$ competitive for $k$-medians with the $\ell_1$ norm and $\tilde O(k)$ competitive for $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with the $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde\Omega(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with the $\ell_2$ norm.	翻訳日:2021-07-06 02:48:07 公開日:2021-07-02
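To make the object in the abstract above concrete, the following sketch shows what a threshold decision tree clustering looks like and how its $k$-means cost is evaluated. It does not implement the paper's algorithm for choosing the thresholds; the tree encoding (nested tuples), the function names, and the toy data are assumptions made for illustration only.

```python
# A tree is either an int (leaf = cluster id) or a tuple
# (feature_index, threshold, left_subtree, right_subtree).
# This encoding is a hypothetical choice for the sketch.

def assign(tree, point):
    """Route a point to a leaf: go left when point[feature] <= threshold."""
    node = tree
    while not isinstance(node, int):
        feature, threshold, left, right = node
        node = left if point[feature] <= threshold else right
    return node


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - center[d]) ** 2 for p in members for d in range(dim))
    return cost


# Toy data with three well-separated groups (k = 3).
points = [(0, 0), (0, 1), (5, 0), (5, 1), (5, 9), (6, 9)]
# Root splits on feature 1 at 5.0; its left child splits on feature 0 at 2.5.
tree = (1, 5.0, (0, 2.5, 0, 1), 2)

labels = [assign(tree, p) for p in points]
cost = kmeans_cost(points, labels)
```

Each internal node tests a single feature against a single threshold, which is what makes the resulting clustering easy to explain; the competitive ratios in the abstract compare this cost with that of the best unconstrained clustering.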
# (参考訳) 彼は医師よりよく知っていると思っている: 事象の事実性判定のためのBERTは語用論で失敗する He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics ( http://arxiv.org/abs/2107.00807v1 ) ライセンス: CC BY 4.0	Nanjiang Jiang and Marie-Catherine de Marneffe	(参考訳) 様々な言語構造を含む既存のいくつかの英語データセットにおいて,BERTが事実性をどの程度うまく予測できるかを検討する。 BERTは、ほとんどのデータセットで強力なパフォーマンスを得るが、それは特定の事実性ラベルと相関する一般的な表層パターンを利用することによるものであり、語用論的推論が必要なインスタンスでは失敗する。ハイパフォーマンスが示唆するものとは対照的に、事実性予測のための堅牢なシステムには程遠いのです。 We investigate how well BERT performs on predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains a strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high performance suggests, we are still far from having a robust system for factuality prediction.	翻訳日:2021-07-06 02:19:28 公開日:2021-07-02
# (参考訳) セル平均に基づく双曲型および放物型偏微分方程式のニューラルネットワーク法 Cell-average based neural network method for hyperbolic and parabolic partial differential equations ( http://arxiv.org/abs/2107.00813v1 ) ライセンス: CC BY 4.0	Changxin Qiu, Jue Yan	(参考訳) 有限体積スキームをベースとしたセル平均ニューラルネットワーク手法を提案する。この方法は偏微分方程式の積分あるいは弱定式化に基づいている。単純なフィードフォワードネットワークは、2つの隣り合う時間ステップ間の解のセル平均の発展を学習するように訓練される。有限体積法に類似した1つのニューラルネットワーク法を一意に定める最適ネットワークパラメータセットを得るために、オフライン教師付きトレーニングを行う。トレーニングがうまく行えば、ネットワーク手法は有限体積スキームとして実装され、メッシュ依存となる。従来の数値法とは異なり,提案手法は陽的スキームのCFL制約から解放され,解の発展のために任意の時間ステップサイズに適応することができる。熱方程式では、1次収束が観測され、誤差は空間メッシュサイズと関連しているが、時間方向のメッシュサイズとは独立であることが観測される。セル平均ベースのニューラルネットワーク手法は、ほぼゼロの数値拡散で接触不連続性を鋭く発展させることができる。非線形双曲型保存則に対して、衝撃波と希薄波がよく捉えられる。 Motivated by the finite volume scheme, a cell-average based neural network method is proposed. The method is based on the integral or weak formulation of partial differential equations. A simple feedforward network is trained to learn the evolution of the solution's cell averages between two neighboring time steps. Offline supervised training is carried out to obtain the optimal network parameter set, which uniquely identifies one finite-volume-like neural network method. Once well trained, the network method is implemented as a finite volume scheme and is thus mesh dependent. Unlike traditional numerical methods, our method can be relieved from the CFL restriction of explicit schemes and can adapt to any time step size for solution evolution. For the heat equation, first-order convergence is observed, and the errors are related to the spatial mesh size but are observed to be independent of the time step size. The cell-average based neural network method can sharply evolve contact discontinuities with almost zero numerical diffusion introduced. Shock and rarefaction waves are well captured for nonlinear hyperbolic conservation laws.	翻訳日:2021-07-06 01:57:59 公開日:2021-07-02
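As background for the abstract above, the sketch below shows the kind of map the network is trained to represent: a one-step update from the cell averages at one time level to the cell averages at the next. It uses the classical explicit finite volume update for the heat equation $u_t = u_{xx}$ with periodic boundaries, not the paper's learned network; the function name and the toy grid are assumptions for illustration.

```python
def fv_heat_step(cell_avg, dt, dx):
    """One explicit finite volume step for u_t = u_xx on a periodic grid.

    Each new cell average is the old one plus the net diffusive flux
    through the two cell faces, scaled by dt / dx**2.
    """
    n = len(cell_avg)
    r = dt / dx ** 2
    return [
        cell_avg[j] + r * (cell_avg[(j + 1) % n] - 2.0 * cell_avg[j] + cell_avg[j - 1])
        for j in range(n)
    ]


# Toy initial data: all mass in the middle cell.
u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
u1 = fv_heat_step(u0, dt=0.1, dx=1.0)
```

This classical explicit update is only stable when `dt / dx**2 <= 1/2`, which is the CFL-type restriction the abstract says the learned method can be relieved from. In the learned method, a network would take the place of the hand-written stencil while keeping the same cell-average-in, cell-average-out interface.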
# (参考訳) 機械学習再現性に関する経験報告:実践者およびTensorFlow Model Gardenコントリビュータへのガイダンス An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors ( http://arxiv.org/abs/2107.00821v1 ) ライセンス: CC BY-SA 4.0	Vishnu Banna and Akhil Chinnakotla and Zhengxin Yan and Ani Vegesana and Naveen Vivek and Kruthi Krishnappa and Wenxin Jiang and Yung-Hsiang Lu and George K. Thiruvathukal and James C. Davis	(参考訳) 機械学習技術は、科学と工学の進歩の基本的なツールになりつつある。これらの手法は天文学やスパムフィルタリングと同じくらい多様な文脈で適用されている。しかし、これらの手法を正しく適用するには、注意深い工学が必要である。研究ベースの機械学習技術を実用的なものにするために必要なソフトウェアエンジニアリングプロセスには比較的注意が払われていない。テクノロジ企業はTensorFlowやPyTorchといった機械学習フレームワークを通じてエンジニアリングコミュニティを支援してきたが、これらのフレームワークで複雑な機械学習モデルを設計する方法の詳細は隠されている。エンジニアリングコミュニティ内でのベストプラクティスを促進するため、学術機関とGoogleは、TensorFlow Model Garden(TFMG)などのコミュニティロケーションで著名な機械学習モデルの模範的な実装を開発することを目的とした、機械学習モデルに関する特別研究グループ(SIGMODELS)の立ち上げに協力した。本報告の目的は、TFMGに含まれるのに適した品質で最先端の機械学習モデルを再現するプロセスを定義することである。論文分析からモデルリリースまで,各ステップについて詳細なエンジニアリングプロセスを定義します。我々は26人の学生からなるチームでYOLOモデルファミリの実装経験を報告し、開発したツールを共有し、その過程で学んだ教訓を説明する。 Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. 
To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.	翻訳日:2021-07-06 01:37:35 公開日:2021-07-02
# (参考訳) 線形二次平均場ゲーム学習のための探索ノイズ Exploration noise for learning linear-quadratic mean field games ( http://arxiv.org/abs/2107.00839v1 ) ライセンス: CC BY 4.0	François Delarue and Athanasios Vasileiadis	(参考訳) 本研究の目的は, 平均場ゲームの解を学習するための探索ノイズとして, 共通雑音が有効であることを示すことである。この概念は、共通雑音の適切な形が解の存在と一意性を回復することがすでに証明されている、トイモデルとしての線形二次モデルによって例示される。ここではさらに一歩進んで、同じ形の共通雑音が「fictitious play」と呼ばれる学習アルゴリズムの収束を強制しうることを、さらなるポテンシャル構造や単調構造を仮定せずに証明する。理論解析を支えるためにいくつかの数値例が提供されている。 The goal of this paper is to demonstrate that common noise may serve as an exploration noise for learning the solution of a mean field game. This concept is here exemplified through a toy linear-quadratic model, for which a suitable form of common noise has already been proven to restore existence and uniqueness. We here go one step further and prove that the same form of common noise may force the convergence of the learning algorithm called "fictitious play", and this without any further potential or monotone structure. Several numerical examples are provided in order to support our theoretical analysis.	翻訳日:2021-07-06 01:36:41 公開日:2021-07-02
# (参考訳) 多次元スペクトルデータに対する深層学習に基づく統計ノイズ低減 Deep learning-based statistical noise reduction for multidimensional spectral data ( http://arxiv.org/abs/2107.00844v1 ) ライセンス: CC BY 4.0	Younsik Kim, Dongjin Oh, Soonsang Huh, Dongjoon Song, Sunbeom Jeong, Junyoung Kwon, Minsoo Kim, Donghan Kim, Hanyoung Ryu, Jongkeun Jung, Wonshik Kyung, Byungmin Sohn, Suyoung Lee, Jounghoon Hyun, Yeonghoon Lee, Yeongkwan Kim and Changyoung Kim	(参考訳) 分光実験では、多次元位相空間におけるデータ取得は、カバーすべき大きな位相空間体積のため、長い取得時間を必要とする可能性がある。このような場合、データ取得に利用可能な制限時間は、多次元スペクトルデータを取得する実験において深刻な制約となる。本稿では,角度分解光電子分光(ARPES)を例として,ディープラーニングを知的手法として活用し,その制約を克服するノイズ除去手法を提案する。簡単に利用できるARPESデータとランダムに生成したトレーニングデータセットを用いることで、過度に適合することなく、ノイズ除去ニューラルネットワークをトレーニングすることに成功しました。ノイズ除去ニューラルネットワークは、本質的な情報を保存しながら、データ内のノイズを除去できる。このニューラルネットワークにより,取得時間が2桁短いデータに対しても,同程度の2階微分解析および線形状解析を行えることを示す。本手法の重要性は,統計雑音の影響を受けやすい多次元スペクトルデータに適用可能であることにある。 In spectroscopic experiments, data acquisition in multi-dimensional phase space may require long acquisition time, owing to the large phase space volume to be covered. In such cases, the limited time available for data acquisition can be a serious constraint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an example, we demonstrate a denoising method that utilizes deep learning as an intelligent way to overcome the constraint. With readily available ARPES data and random generation of the training data set, we successfully trained the denoising neural network without overfitting. The denoising neural network can remove the noise in the data while preserving its intrinsic information. We show that the denoising neural network allows us to perform a similar level of second-derivative and line shape analysis on data taken with two orders of magnitude less acquisition time. The importance of our method lies in its applicability to any multidimensional spectral data that are susceptible to statistical noise.	翻訳日:2021-07-06 01:35:29 公開日:2021-07-02
# (参考訳) モバイルアプリケーションにおけるK-means+強化学習に基づくユーザロールの発見と最適化手法 User Role Discovery and Optimization Method based on K-means + Reinforcement learning in Mobile Applications ( http://arxiv.org/abs/2107.00862v1 ) ライセンス: CC BY 4.0	Yuanbang Li	(参考訳) 携帯電話の普及により、ユーザーは自分の位置やアクティビティをいつでも、どこでも、チェックインデータとして共有できる。これらのデータはユーザーの特徴を反映している。長期的に安定し、ユーザ間で共有される特徴の集合は、ユーザロールとして抽象化できる。この役割はユーザの社会的背景、職業、生活習慣と密接に関連している。本研究の主な貢献は4つある。まず、チェックインデータの分析から、各ユーザに対する異なるビューからのユーザ特徴モデルを構築する。次に、K-meansアルゴリズムを用いてユーザ特徴からユーザロールを発見する。第3に,ユーザロールのクラスタリング効果を強化し,クラスタリング結果の安定性を向上させるため,強化学習アルゴリズムを提案する。最後に,本手法の有効性を検証する実験を行い,その有効性を示した。 With the widespread use of mobile phones, users can share their location and activity anytime, anywhere, in the form of check-in data. These data reflect user features. A set of features that are stable over the long term and shared among users can be abstracted as a user role. The role is closely related to the user's social background, occupation, and living habits. This study provides four main contributions. Firstly, user feature models from different views for each user are constructed from the analysis of check-in data. Secondly, the K-means algorithm is used to discover user roles from user features. Thirdly, a reinforcement learning algorithm is proposed to strengthen the clustering effect of user roles and improve the stability of the clustering result. Finally, experiments are used to verify the validity of the method, the results of which show the effectiveness of the method.	翻訳日:2021-07-06 01:23:19 公開日:2021-07-02
# (参考訳) 混合整数プログラムの原始的ヒューリスティックス学習 Learning Primal Heuristics for Mixed Integer Programs ( http://arxiv.org/abs/2107.00866v1 ) ライセンス: CC BY 4.0	Yunzhuang Shen, Yuan Sun, Andrew Eberhard, Xiaodong Li	(参考訳) 本稿では,機械学習技術を用いた混合整数プログラムのための新しい原始ヒューリスティックを提案する。混合整数プログラミングは組合せ最適化問題を定式化する一般的な手法である。解法の内部では、分岐境界アルゴリズム(B&B)の開始から双対性ギャップを狭め、B&B木を積極的に刈り取ることでその性能を大幅に向上させる、優れた実現可能な解を見つける上で、原始ヒューリスティックスが重要な役割を果たす。本稿では,機械学習を用いて有効原始ヒューリスティックスを自動学習できるかどうかを検討する。本稿では,最適化問題をグラフとして表現する新しい手法を提案し,既知の最適解を持つ解問題インスタンス上でグラフ畳み込みネットワークを訓練する。これにより、類似型の未解決問題インスタンスの最適解における決定変数の値を予測することができる。可変解の予測は,B&B法(PB-DFS)を用いた確率分岐法(Probabilistic Branching with guided Depth-first Search, PB-DFS)の新たな構成により,(ほぼ)最適解の探索を迅速に行う。実験の結果、この新しいヒューリスティックは、他の最先端の原始ヒューリスティックと比較して、解法プロセスのずっと早い段階でより優れた原始解を見出すことができた。 This paper proposes a novel primal heuristic for Mixed Integer Programs, by employing machine learning techniques. Mixed Integer Programming is a general technique for formulating combinatorial optimization problems. Inside a solver, primal heuristics play a critical role in finding good feasible solutions that enable one to tighten the duality gap from the outset of the Branch-and-Bound algorithm (B&B), greatly improving its performance by pruning the B&B tree aggressively. In this paper, we investigate whether effective primal heuristics can be automatically learned via machine learning. We propose a new method to represent an optimization problem as a graph, and train a Graph Convolutional Network on solved problem instances with known optimal solutions. This in turn can predict the values of decision variables in the optimal solution for an unseen problem instance of a similar type. The prediction of variable solutions is then leveraged by a novel configuration of the B&B method, Probabilistic Branching with guided Depth-first Search (PB-DFS) approach, aiming to find (near-)optimal solutions quickly. 
The experimental results show that this new heuristic can find better primal solutions at a much earlier stage of the solving process, compared to other state-of-the-art primal heuristics.	翻訳日:2021-07-06 01:09:40 公開日:2021-07-02
# (参考訳) 情報幾何の観点から見た依存ネットワークの再考 Reconsidering Dependency Networks from an Information Geometry Perspective ( http://arxiv.org/abs/2107.00871v1 ) ライセンス: CC BY-SA 4.0	Kazuya Takabatake, Shotaro Akaho	(参考訳) 依存ネットワーク(Heckerman et al., 2000)は、多数の変数を含むシステムに対する潜在的確率的グラフィカルモデルである。ベイズネットワークと同様に、依存ネットワークの構造は有向グラフで表現され、各ノードは条件付き確率テーブルを持つ。学習と推論は個々のノード上でローカルに実現されるため、多くの変数であっても計算は扱いやすいままである。しかし、依存ネットワークの学習分布は擬ギブスサンプリングと呼ばれるマルコフ連鎖の定常分布であり、閉形式表現を持たない。この技術的不利は依存ネットワークの開発を妨げている。本稿では,各ノードに対してある多様体を考える。そして、これらの多様体上の反復 m-射影として擬ギブスサンプリングを解釈することができる。この解釈は、擬ギブスサンプリングの定常分布が分布空間に存在する位置に関する理論的境界を与える。さらに、この解釈は最適化問題として構造およびパラメータ学習アルゴリズムを含む。さらに,ベイジアンネットワークと依存性を実験的に比較した。その結果,依存性ネットワークとベイズネットワークは,学習した分布の精度でほぼ同じ性能を示すことがわかった。その結果,依存性ネットワークはベイズネットワークよりもはるかに高速に学習できることがわかった。 Dependency networks (Heckerman et al., 2000) are potential probabilistic graphical models for systems comprising a large number of variables. Like Bayesian networks, the structure of a dependency network is represented by a directed graph, and each node has a conditional probability table. Learning and inference are realized locally on individual nodes; therefore, computation remains tractable even with a large number of variables. However, the dependency network's learned distribution is the stationary distribution of a Markov chain called pseudo-Gibbs sampling and has no closed-form expressions. This technical disadvantage has impeded the development of dependency networks. In this paper, we consider a certain manifold for each node. Then, we can interpret pseudo-Gibbs sampling as iterative m-projections onto these manifolds. This interpretation provides a theoretical bound for the location where the stationary distribution of pseudo-Gibbs sampling exists in distribution space. Furthermore, this interpretation involves structure and parameter learning algorithms as optimization problems. In addition, we compare dependency and Bayesian networks experimentally. 
The results demonstrate that the dependency network and the Bayesian network have roughly the same performance in terms of the accuracy of their learned distributions. The results also show that the dependency network can learn much faster than the Bayesian network.	翻訳日:2021-07-06 00:53:53 公開日:2021-07-02
# (参考訳) オンデマンドで軽量な知識グラフ生成 - DBpediaによるデモ On-Demand and Lightweight Knowledge Graph Generation -- a Demonstration with DBpedia ( http://arxiv.org/abs/2107.00873v1 ) ライセンス: CC BY 4.0	Malte Brockmeier, Yawen Liu, Sunita Pateer, Sven Hertling and Heiko Paulheim	(参考訳) 現代のDBpediaのような大規模知識グラフは、処理と処理に大量の計算リソースを必要とするデータセットである。さらに、リリースサイクルが長い場合が多いため、これらのグラフには時代遅れの情報が残されている。本稿では,DBpedia on Demand(DBpedia on Demand)を提案する。DBpediaのリソースを,グラフ全体の実体化や保存を必要とせずにオンデマンドで提供するシステムで,クエリ機能にも制限がある。 Modern large-scale knowledge graphs, such as DBpedia, are datasets which require large computational resources to serve and process. Moreover, they often have longer release cycles, which leads to outdated information in those graphs. In this paper, we present DBpedia on Demand -- a system which serves DBpedia resources on demand without the need to materialize and store the entire graph, and which even provides limited querying functionality.	翻訳日:2021-07-06 00:32:58 公開日:2021-07-02
# (参考訳) 軌道角運動量絡み合った光子による衝突のない集団確率決定 Conflict-free collective stochastic decision making by orbital angular momentum entangled photons ( http://arxiv.org/abs/2107.00877v1 ) ライセンス: CC BY 4.0	Takashi Amakasu, Nicolas Chauvet, Guillaume Bachelier, Serge Huant, Ryoichi Horisaki, Makoto Naruse	(参考訳) 近年、光学と計算の両方に関わる学際研究において、光の波動粒子双対性を利用して複数腕のバンディット問題を解決するシングルフォトンによる意思決定が実証されている。さらに、絡み合った光子に基づく意思決定は、プレイヤー間の決定の衝突を回避し、平等を確保しながら競合するマルチアームバンディット問題を解決した。しかし、これらの研究は光の偏光に基づいているため、利用可能な選択の数は2に制限され、2つの直交偏光状態に対応する。ここでは、軌道角運動量を光子の調整可能な自由度として利用することにより、競争上の意思決定状況を解決するためのスケーラブルな原理を提案する。さらに、Hong-Ou-Mandel効果を2つ以上の状態に拡張することにより、軌道角運動量を持つ絡み合った光子状態を生成することができる実験的な構成を確立する。提案手法がナッシュ均衡を実現するための従来の混合戦略よりも大きい理論的最大値をほぼ達成する三本腕バンディット問題に関する全報酬を数値的に検討する。これは、最良の武器を見つけるための探索段階でさえも、矛盾のない選択を達成する絡み合い特性のおかげである。 In recent cross-disciplinary studies involving both optics and computing, single-photon-based decision-making has been demonstrated by utilizing the wave-particle duality of light to solve multi-armed bandit problems. Furthermore, entangled-photon-based decision-making has managed to solve a competitive multi-armed bandit problem in such a way that conflicts of decisions among players are avoided while ensuring equality. However, as these studies are based on the polarization of light, the number of available choices is limited to two, corresponding to two orthogonal polarization states. Here we propose a scalable principle to solve competitive decision-making situations by using the orbital angular momentum as the tunable degree of freedom of photons, which theoretically allows an unlimited number of arms. Moreover, by extending the Hong-Ou-Mandel effect to more than two states, we theoretically establish an experimental configuration able to generate entangled photon states with orbital angular momentum and conditions that provide conflict-free selections at every turn. 
We numerically examine total rewards regarding three-armed bandit problems, for which the proposed strategy accomplishes almost the theoretical maximum, which is greater than a conventional mixed strategy intending to realize Nash equilibrium. This is thanks to the entanglement property that achieves no-conflict selections, even in the exploring phase to find the best arms.	翻訳日:2021-07-06 00:28:46 公開日:2021-07-02
# (参考訳) 解釈可能な協調グラフニューラルネットワークによるオンラインマルチエージェント予測 Online Multi-Agent Forecasting with Interpretable Collaborative Graph Neural Network ( http://arxiv.org/abs/2107.00894v1 ) ライセンス: CC BY 4.0	Maosen Li, Siheng Chen, Yanning Shen, Genjia Liu, Ivor W. Tsang, Ya Zhang	(参考訳) 本稿では,システム内の動的相互作用を利用して,複数エージェントの今後の状況を予測する。本稿では,複数の協調予測器からの予測をコラボレーティブグラフに従って集約するコラボレーティブ予測ユニット(copu)を提案する。各協調予測器は、他のエージェントの影響を考慮してエージェントの状態を予測するように訓練される。協調グラフのエッジ重みは、各予測器の重要性を反映している。協調グラフは、明示的な目的を最小化することで動機づけられる乗法的更新によってオンラインに調整される。この目的により、我々はまた、トレーニングとともに、CoPUが、後方の最高の協調予測器と同じようなパフォーマンスを達成することを示すために、後悔の分析を行う。この理論的解釈性は、我々の手法を他の多くのグラフネットワークと区別する。予測を段階的に洗練するために、複数のCoPUを積み重ねて協調グラフニューラルネットワークを形成する。オンラインの軌道予測,オンラインの人力予測,オンラインの交通速度予測の3つのタスクにおいて,本手法は平均28.6%,17.4%,21.0%の3タスクにおいて,最先端の作業よりも優れていた。 This paper considers predicting future statuses of multiple agents in an online fashion by exploiting dynamic interactions in the system. We propose a novel collaborative prediction unit (CoPU), which aggregates the predictions from multiple collaborative predictors according to a collaborative graph. Each collaborative predictor is trained to predict the status of an agent by considering the impact of another agent. The edge weights of the collaborative graph reflect the importance of each predictor. The collaborative graph is adjusted online by multiplicative update, which can be motivated by minimizing an explicit objective. With this objective, we also conduct regret analysis to indicate that, along with training, our CoPU achieves similar performance with the best individual collaborative predictor in hindsight. This theoretical interpretability distinguishes our method from many other graph networks. To progressively refine predictions, multiple CoPUs are stacked to form a collaborative graph neural network. 
Extensive experiments are conducted on three tasks: online simulated trajectory prediction, online human motion prediction and online traffic speed prediction, and our methods outperform state-of-the-art works on the three tasks by 28.6%, 17.4% and 21.0% on average, respectively.	翻訳日:2021-07-06 00:05:47 公開日:2021-07-02
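The abstract above describes aggregating several predictors with edge weights that are adjusted online by a multiplicative update. The sketch below shows a generic exponentiated-weights scheme of that flavor; it is not the paper's CoPU update or objective, and the function names, the squared-error loss, and the learning rate `eta` are assumptions chosen for illustration.

```python
import math


def aggregate(weights, predictions):
    """Weighted average of the individual predictions."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


def multiplicative_update(weights, predictions, target, eta=0.5):
    """Shrink each weight by exp(-eta * squared error) of its predictor."""
    return [
        w * math.exp(-eta * (p - target) ** 2)
        for w, p in zip(weights, predictions)
    ]


# Two hypothetical predictors; the first one is exactly right this round.
weights = [1.0, 1.0]
predictions = [1.0, 3.0]
target = 1.0

before = aggregate(weights, predictions)
weights = multiplicative_update(weights, predictions, target)
after = aggregate(weights, predictions)
```

After the update the accurate predictor carries more weight, so the aggregate moves toward the target. Updates of this family are the usual setting for regret bounds against the best single predictor in hindsight, which is the type of guarantee the abstract mentions.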
# (参考訳) 深部畳み込みニューラルネットワークの理論III:放射関数の近似 Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions ( http://arxiv.org/abs/2107.00896v1 ) ライセンス: CC BY 4.0	Tong Mao, Zhongjie Shi, and Ding-Xuan Zhou	(参考訳) 我々は、2つの畳み込み層、ダウンサンプリング演算子、完全に接続された層からなるディープニューラルネットワークのファミリーを考える。ネットワーク構造は、畳み込み層の数と完全に連結された層の幅を決定する2つの構造パラメータに依存する。近似関数が特徴多項式 $q$ と不定値関数 $f$ との合成形式 $f\circ q$ を取るとき、明示的な近似率を持つ近似理論を定式化する。特に、そのようなネットワークが、$\mathbb{r}^d$ からのデータの次元 $d$ が大きいとき、$q(x) =\|x\|^2$ で半径関数を近似するときに、完全連結な浅層ネットワークを上回ることが証明される。これは、特殊構造を持つ関数を近似する深層畳み込みニューラルネットワークの優越性に関する最初の厳密な証明を与える。そこで我々は, 回帰関数が$f\circ Q$である回帰フレームワークにおいて, そのようなディープネットワークを用いた経験的リスク最小化のための一般化解析を行う。複合情報や$q$ や $f$ の関数を使用しないネットワーク構造は、自動的に特徴を抽出し、構造パラメータをチューニングすることで回帰関数の複合的性質を利用することができます。本解析は,ネットワーク深度を最小にし,その後増加させる誤差境界を提供し,ネットワーク深さで観測されるトレードオフ現象を理論的に検証する。 We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the approximated function takes a composite form $f\circ Q$ with a feature polynomial $Q$ and a univariate function $f$. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with $Q(x) =\|x\|^2$, when the dimension $d$ of data from $\mathbb{R}^d$ is large. This gives the first rigorous proof for the superiority of deep convolutional neural networks in approximating functions with special structures. Then we carry out generalization analysis for empirical risk minimization with such a deep network in a regression framework with the regression function of the form $f\circ Q$. 
Our network structure which does not use any composite information or the functions $Q$ and $f$ can automatically extract features and make use of the composite nature of the regression function via tuning the structural parameters. Our analysis provides an error bound which decreases with the network depth to a minimum and then increases, verifying theoretically a trade-off phenomenon observed for network depths in many practical applications.	翻訳日:2021-07-05 23:35:31 公開日:2021-07-02
# (参考訳) OCR誤りのある歴史的テキストに対するデータ中心領域適応 Data Centric Domain Adaptation for Historical Text with OCR Errors ( http://arxiv.org/abs/2107.00927v1 ) ライセンス: CC BY 4.0	Luisa März, Stefan Schweter, Nina Poerner, Benjamin Roth and Hinrich Schütze	(参考訳) オランダ語とフランス語の歴史的データに基づいて、ドメイン内およびクロスドメインの固有表現認識(NER)のための新しい手法を提案する。クロスドメインの場合、コンテキスト化された文字列埋め込みを通じて教師なしのドメイン内データを統合することでドメインシフトに対処し、合成OCRエラーをソースドメインに注入することでOCRエラーに対処する、データ中心のドメイン適応を行う。任意の入力データにおいてOCR誤差を模倣する一般的な手法を提案する。私たちのクロスドメインとドメイン内の結果は、いくつかの強力なベースラインを上回り、最先端の結果を確立します。私たちは、フランス語とオランダ語のEuropeana NERコーパスの前処理済み版を公開します。 We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and we address OCR errors by injecting synthetic OCR errors into the source domain, thereby performing data-centric domain adaptation. We propose a general approach to imitate OCR errors in arbitrary input data. Our cross-domain as well as our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.	翻訳日:2021-07-05 23:34:22 公開日:2021-07-02
# Mixed Supervision Learning for Whole Slide Image Classification ( http://arxiv.org/abs/2107.00934v1 ) License: CC BY 4.0	Jiahui Li, Wen Chen, Xiaodi Huang, Zhiqiang Hu, Qi Duan, Hongsheng Li, Dimitris N. Metaxas, Shaoting Zhang	Weak supervision learning on classification labels has demonstrated high performance in various tasks. When a few pixel-level fine annotations are also affordable, it is natural to leverage both pixel-level (e.g., segmentation) and image-level (e.g., classification) annotations to further improve the performance. In computational pathology, however, such weak or mixed supervision learning is still a challenging task, since the high resolution of whole slide images makes it unattainable to perform end-to-end training of classification models. An alternative approach is to analyze such data by patch-based model training, i.e., using self-supervised learning to generate pixel-level pseudo labels for patches. However, such methods usually have model drifting issues, i.e., they are hard to converge, because the noise accumulates during the self-training process. To handle those problems, we propose a mixed supervision learning framework for super high-resolution images to effectively utilize their various labels (e.g., sufficient image-level coarse annotations and a few pixel-level fine labels). During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning and generate high-quality pixel-level pseudo labels. A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives. Three real-world datasets with a very large number of images (i.e., more than 10,000 whole slide images) and various types of labels are used to evaluate the effectiveness of mixed supervision learning. We reduced the false positive rate by around one third compared to state of the art while retaining 100% sensitivity, in the task of image-level classification.	Translated: 2021-07-05 23:22:50 Published: 2021-07-02
# Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks ( http://arxiv.org/abs/2107.00940v1 ) License: CC BY-SA 4.0	Suryanarayana Maddu, Dominik Sturm, Christian L. Müller, Ivo F. Sbalzarini	We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving an optimization problem over a weighted sum of data-fidelity and equation-fidelity objectives. Conflicts between objectives can arise from scale imbalances, heteroscedasticity in the data, stiffness of the physical equation, or from catastrophic interference during sequential training. We explain the training pathology arising from this and propose a simple yet effective inverse-Dirichlet weighting strategy to alleviate the issue. We compare with Sobolev training of neural networks, providing the baseline of analytically $\boldsymbol{\epsilon}$-optimal training. We demonstrate the effectiveness of inverse-Dirichlet weighting in various applications, including a multi-scale model of active turbulence, where we show orders of magnitude improvement in accuracy and convergence over conventional PINN training. For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.	Translated: 2021-07-05 22:58:27 Published: 2021-07-02
# From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques ( http://arxiv.org/abs/2107.00948v1 ) License: CC BY 4.0	Zhiyuan Wang, Haoyi Xiong, Jie Zhang, Sijia Yang, Mehdi Boukhechba, Laura E. Barnes, Daqing Zhang	Mobile Sensing Apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and provide timely intervention to promote health and well-being, such as mental health and chronic care. As the objectives of mobile sensing could be either (a) personalized medicine for individuals or (b) public health for populations, in this work we review the design of these mobile sensing apps, and propose to categorize the design of these apps/systems into two paradigms: (i) Personal Sensing and (ii) Crowd Sensing. While both sensing paradigms might incorporate common ubiquitous sensing technologies, such as wearable sensors, mobility monitoring, mobile data offloading, and/or cloud-based data analytics to collect and process sensing data from individuals, we present a novel taxonomy system with two major components that can specify and classify apps/systems from aspects of the life-cycle of mHealth Sensing: (1) Sensing Task Creation & Participation, (2) Health Surveillance & Data Collection, and (3) Data Analysis & Knowledge Discovery. With respect to different goals of the two paradigms, this work systematically reviews this field, and summarizes the design of typical apps/systems in the view of the configurations and interactions between these two components. In addition to summarization, the proposed taxonomy system also helps figure out the potential directions of mobile sensing for health from both personalized medicine and population health perspectives.	Translated: 2021-07-05 21:51:55 Published: 2021-07-02
# Embodiment and Computational Creativity ( http://arxiv.org/abs/2107.00949v1 ) License: CC BY 4.0	Christian Guckelsberger, Anna Kantosalo, Santiago Negrete-Yankelevich and Tapio Takala	We conjecture that creativity and the perception of creativity are, at least to some extent, shaped by embodiment. This makes embodiment highly relevant for Computational Creativity (CC) research, but existing research is scarce and the use of the concept highly ambiguous. We overcome this situation by means of a systematic review and a prescriptive analysis of publications at the International Conference on Computational Creativity. We adopt and extend an established typology of embodiment to resolve ambiguity through identifying and comparing different usages of the concept. We collect, contextualise and highlight opportunities and challenges in embracing embodiment in CC as a reference for research, and put forward important directions to further the embodied CC research programme.	Translated: 2021-07-05 20:34:30 Published: 2021-07-02
# Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons ( http://arxiv.org/abs/2107.00955v1 ) License: CC BY 4.0	Anastasia Zhukova, Felix Hamborg, Karsten Donnay and Bela Gipp	Unsupervised concept identification through clustering, i.e., identification of semantically related words and phrases, is a common approach to identify contextual primitives employed in various use cases, e.g., text dimension reduction (replacing words with concepts to reduce the vocabulary size), summarization, and named entity resolution. We demonstrate the first results of an unsupervised approach for the identification of groups of persons as actors extracted from a set of related articles. Specifically, the approach clusters mentions of groups of persons that act as non-named entity actors in the texts, e.g., "migrant families" = "asylum-seekers." Compared to our baseline, the approach keeps the mentions of the geopolitical entities separated, e.g., "Iran leaders" != "European leaders," and clusters (in)directly related mentions with diverse wording, e.g., "American officials" = "Trump Administration."	Translated: 2021-07-05 20:13:12 Published: 2021-07-02
# AÇAI: Ascent Similarity Caching with Approximate Indexes ( http://arxiv.org/abs/2107.00957v1 ) License: CC BY 4.0	Tareq Si Salem, Giovanni Neglia, Damiano Carra	Similarity search is a key operation in multimedia retrieval systems and recommender systems, and it will play an important role also for future machine learning and augmented reality applications. When these systems need to serve large objects with tight delay constraints, edge servers close to the end-user can operate as similarity caches to speed up the retrieval. In this paper we present AÇAI, a new similarity caching policy which improves on the state of the art by using (i) an (approximate) index for the whole catalog to decide which objects to serve locally and which to retrieve from the remote server, and (ii) a mirror ascent algorithm to update the set of local objects with strong guarantees even when the request process does not exhibit any statistical regularity.	Translated: 2021-07-05 20:01:44 Published: 2021-07-02
# ResIST: Layer-Wise Decomposition of ResNets for Distributed Training ( http://arxiv.org/abs/2107.00961v1 ) License: CC BY 4.0	Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis	We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the communication, memory, and time requirements of ResNet training to only a fraction of the requirements of previous methods. In comparison to common protocols like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in wall-clock training time, while being competitive with respect to model performance.	Translated: 2021-07-05 19:25:19 Published: 2021-07-02
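The random decomposition described in the ResIST abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: it treats the network as a list of residual-block indices and gives each block to exactly one worker per round, whereas the abstract does not say whether sub-ResNets are disjoint or how layers are selected. The function name `decompose_blocks` and the block and worker counts are hypothetical.

```python
import random


def decompose_blocks(num_blocks, num_workers, rng):
    """Randomly assign each residual-block index to exactly one worker.

    Assumption for illustration: sub-networks are disjoint. The abstract
    only states that the global ResNet is split at random into several
    shallow sub-ResNets.
    """
    indices = list(range(num_blocks))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin, then sort each share so that
    # every sub-network keeps the original block order.
    return [sorted(indices[w::num_workers]) for w in range(num_workers)]


rng = random.Random(0)
for round_id in range(2):
    # A fresh random decomposition is drawn at the start of every round.
    parts = decompose_blocks(num_blocks=12, num_workers=3, rng=rng)
    print(round_id, parts)
```

Each worker would then train only its own blocks for a few local iterations before the updates are merged back; that training and aggregation step is omitted here.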
# Evaluating the Usefulness of Unsupervised monitoring in Cultural Heritage Monuments ( http://arxiv.org/abs/2107.00964v1 ) License: CC BY-SA 4.0	Charalampos Zafeiropoulos, Ioannis N. Tzortzis, Ioannis Rallis, Eftychios Protopapadakis, Nikolaos Doulamis and Anastasios Doulamis	In this paper, we scrutinize the effectiveness of various clustering techniques, investigating their applicability in Cultural Heritage monitoring applications. In the context of this paper, we detect the level of decomposition and corrosion on the walls of Saint Nicholas fort in Rhodes utilizing hyperspectral images. A total of 6 different clustering approaches have been evaluated over a set of 14 different orthorectified hyperspectral images. The experimental setup in this study involves the K-means, Spectral, Meanshift, DBSCAN, Birch and Optics algorithms. For each of these techniques we evaluate its performance by the use of performance metrics such as the Calinski-Harabasz and Davies-Bouldin indexes and the Silhouette value. In this approach, we evaluate the outcomes of the clustering methods by comparing them with a set of annotated images which denotes the ground truth regarding the decomposition and/or corrosion area of the original images. The results show that a few clustering techniques applied on the given dataset achieved decent accuracy, precision, recall and f1 scores. Eventually, it was observed that the deterioration was detected quite accurately.	Translated: 2021-07-05 19:06:20 Published: 2021-07-02
# Parasitic Egg Detection and Classification in Low-cost Microscopic Images using Transfer Learning ( http://arxiv.org/abs/2107.00968v1 ) License: CC BY 4.0	Thanaphon Suwannaphong, Sawaphob Chavana, Sahapol Tongsom, Duangdao Palasuwan, Thanarat H. Chalidabhongse and Nantheera Anantrasirichai	Intestinal parasitic infection leads to several morbidities to humans worldwide, especially in tropical countries. The traditional diagnosis usually relies on manual analysis from microscopic images which is prone to human error due to morphological similarity of different parasitic eggs and abundance of impurities in a sample. Many studies have developed automatic systems for parasite egg detection to reduce human workload. However, they work with high quality microscopes, which unfortunately remain unaffordable in some rural areas. Our work thus exploits a benefit of a low-cost USB microscope. This instrument, however, provides poor-quality images due to its limited magnification (10x), causing difficulty in parasite detection and species classification. In this paper, we propose a CNN-based technique using a transfer learning strategy to enhance the efficiency of automatic parasite classification in poor-quality microscopic images. The patch-based technique with a sliding window is employed to search for the location of the eggs. Two networks, AlexNet and ResNet50, are examined with a trade-off between architecture size and classification performance. The results show that our proposed framework outperforms the state-of-the-art object recognition methods. Our system combined with final decision from an expert may improve the real faecal examination with low-cost microscopes.	Translated: 2021-07-05 18:58:10 Published: 2021-07-02
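The patch-based sliding-window search mentioned in the abstract above can be sketched as follows. The abstract gives no patch size or stride, so the values used here are purely illustrative, and `sliding_window_patches` is a hypothetical helper name. The sketch only enumerates patch positions; in the described pipeline each patch would then be passed to the CNN classifier.

```python
def sliding_window_patches(height, width, patch, stride):
    """Yield (top, left) corners of square patches that fit inside the image."""
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield top, left


# Toy image of 6 x 8 pixels, 4 x 4 patches, stride 2 (illustrative values).
corners = list(sliding_window_patches(height=6, width=8, patch=4, stride=2))
print(corners)
```

A smaller stride gives more overlapping patches and finer localisation at the cost of more classifier calls.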
# Sub-millisecond Video Synchronization of Multiple Android Smartphones ( http://arxiv.org/abs/2107.00987v1 ) License: CC BY-SA 4.0	Azat Akhmetyanov, Anastasiia Kornilova, Marsel Faizullin, David Pozo, Gonzalo Ferrer	This paper addresses the problem of building an affordable easy-to-setup synchronized multi-view camera system, which is in demand for many Computer Vision and Robotics applications in high-dynamic environments. In our work, we propose a solution to this problem: a publicly available Android application for synchronized video recording on multiple smartphones with sub-millisecond accuracy. We present a generalized mathematical model of timestamping for Android smartphones and prove its applicability on 47 different physical devices. Also, we estimate the time drift parameter for those smartphones, which is less than 1.2 milliseconds per minute for most of the considered devices, which makes a smartphone camera system a worthy analog of professional multi-view systems. Finally, we demonstrate the performance of the Android app on a camera system built from Android smartphones, both quantitatively, showing less than 300 microseconds synchronization error, and qualitatively, on a panorama stitching task.	Translated: 2021-07-05 18:48:40 Published: 2021-07-02
# Multimodal Representation for Neural Code Search ( http://arxiv.org/abs/2107.00992v1 ) License: CC BY 4.0	Jian Gu, Zimin Chen, Martin Monperrus	Semantic code search is about finding semantically relevant code snippets for a given natural language query. In the state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance of their representation in the shared vector space. In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build the multimodal representation for the code data. We conduct extensive experiments using a single corpus that is large-scale and multi-language: CodeSearchNet. Our results show that both our tree-serialized representations and multimodal learning model improve the performance of neural code search. Last, we define two intuitive quantification metrics oriented to the completeness of semantic and syntactic information of the code data.	Translated: 2021-07-05 18:41:15 Published: 2021-07-02
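To make the idea of serializing a syntax tree into a token sequence concrete, here is a generic sketch using Python's standard `ast` module. It is not the paper's method: the abstract does not describe its simplified AST or its serialization schemes, so the bracketed pre-order traversal below is an assumption chosen for illustration, and `serialize` is a hypothetical name.

```python
import ast


def serialize(node):
    """Pre-order traversal emitting node type names.

    Subtrees are wrapped in "(" and ")" so the tree shape can be recovered
    from the flat sequence; leaf nodes are emitted as a bare type name.
    """
    children = list(ast.iter_child_nodes(node))
    name = type(node).__name__
    if not children:
        return [name]
    tokens = ["(", name]
    for child in children:
        tokens.extend(serialize(child))
    tokens.append(")")
    return tokens


tokens = serialize(ast.parse("def add(a, b):\n    return a + b"))
print(" ".join(tokens))
```

A sequence like this could be fed to a sequence encoder alongside the raw code tokens, which is one plausible reading of a multimodal representation of code.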
# Magnification-independent Histopathological Image Classification with Similarity-based Multi-scale Embeddings ( http://arxiv.org/abs/2107.01063v1 ) License: CC BY 4.0	Yibao Sun, Xingru Huang, Yaqi Wang, Huiyu Zhou, Qianni Zhang	The classification of histopathological images is of great value in both cancer diagnosis and pathological studies. However, multiple reasons, such as variations caused by magnification factors and class imbalance, make it a challenging task where conventional methods that learn from image-label datasets perform unsatisfactorily in many cases. We observe that tumours of the same class often share common morphological patterns. To exploit this fact, we propose an approach that learns similarity-based multi-scale embeddings (SMSE) for magnification-independent histopathological image classification. In particular, a pair loss and a triplet loss are leveraged to learn similarity-based embeddings from image pairs or image triplets. The learned embeddings provide accurate measurements of similarities between images, which are regarded as a more effective form of representation for histopathological morphology than normal image features. Furthermore, in order to ensure the generated models are magnification-independent, images acquired at different magnification factors are simultaneously fed to networks during training for learning multi-scale embeddings. In addition to the SMSE, to eliminate the impact of class imbalance, instead of using the hard sample mining strategy that intuitively discards some easy samples, we introduce a new reinforced focal loss to simultaneously punish hard misclassified samples while suppressing easy well-classified samples. Experimental results show that the SMSE improves the performance for histopathological image classification tasks for both breast and liver cancers by a large margin compared to previous methods. In particular, the SMSE achieves the best performance on the BreakHis benchmark with an improvement ranging from 5% to 18% compared to previous methods using traditional features.	Translated: 2021-07-05 18:25:15 Published: 2021-07-02
# Unsupervised Spoken Utterance Classification ( http://arxiv.org/abs/2107.01068v1 ) License: CC0 1.0	Shahab Jalalvand and Srinivas Bangalore	An intelligent virtual assistant (IVA) enables effortless conversations in call routing through spoken utterance classification (SUC), which is a special form of spoken language understanding (SLU). Building a SUC system requires a large amount of supervised in-domain data that is not always available. In this paper, we introduce an unsupervised spoken utterance classification approach (USUC) that does not require any in-domain data except for the intent labels and a few paraphrases per intent. USUC consists of a KNN classifier (K=1) and a complex embedding model trained on a large amount of unsupervised customer service corpus. Among all embedding models, we demonstrate that ELMo works best for USUC. However, an ELMo model is too slow to be used at run-time for call routing. To resolve this issue, first, we compute the uni- and bi-gram embedding vectors offline and we build a lookup table of n-grams and their corresponding embedding vector. Then we use this table to compute sentence embedding vectors at run-time, along with back-off techniques for unseen n-grams. Experiments show that USUC outperforms the traditional utterance classification methods by reducing the classification error rate from 32.9% to 27.0% without requiring supervised data. Moreover, our lookup and back-off technique increases the processing speed from 16 utterances per second to 118 utterances per second.	Translated: 2021-07-05 18:06:16 Published: 2021-07-02
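The lookup-table idea in the abstract above (precomputed n-gram vectors, back-off for unseen n-grams, and a 1-nearest-neighbour decision over intent paraphrases) can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the toy two-dimensional vectors, the table contents, the intents, averaging as the way to combine n-gram vectors, and the rule of backing off from an unseen bigram to its unigrams and skipping unseen unigrams. The abstract does not specify any of these details.

```python
import math

# Hypothetical toy lookup table: n-gram -> embedding vector.
# In the described system these vectors would be computed offline.
TABLE = {
    "pay bill": [1.0, 0.0],
    "pay": [0.8, 0.2],
    "bill": [0.9, 0.1],
    "cancel": [0.0, 1.0],
    "service": [0.1, 0.9],
}
DIM = 2

# Hypothetical intents, each with a few paraphrases.
PARAPHRASES = {"billing": ["pay bill"], "cancellation": ["cancel service"]}


def embed(sentence):
    """Average n-gram vectors, preferring bigrams and backing off to unigrams."""
    words = sentence.lower().split()
    vectors, i = [], 0
    while i < len(words):
        bigram = " ".join(words[i:i + 2])
        if i + 1 < len(words) and bigram in TABLE:
            vectors.append(TABLE[bigram])
            i += 2
        else:
            # Back-off: use the unigram if known, otherwise skip the word.
            if words[i] in TABLE:
                vectors.append(TABLE[words[i]])
            i += 1
    if not vectors:
        return [0.0] * DIM
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def classify(utterance):
    """1-nearest-neighbour over all paraphrases of all intents."""
    query = embed(utterance)
    scored = [
        (cosine(query, embed(p)), intent)
        for intent, paraphrases in PARAPHRASES.items()
        for p in paraphrases
    ]
    return max(scored)[1]


print(classify("pay my bill"))
print(classify("please cancel"))
```

The speed benefit reported in the abstract comes from replacing a slow contextual model at run-time with table lookups like the ones in `embed`.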
# Comparison of end-to-end neural network architectures and data augmentation methods for automatic infant motility assessment using wearable sensors ( http://arxiv.org/abs/2107.01086v1 ) License: CC BY 4.0	Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen	Infant motility assessment using intelligent wearables is a promising new approach for assessment of infant neurophysiological development, one in which efficient signal analysis plays a central role. This study investigates the use of different end-to-end neural network architectures for processing infant motility data from wearable sensors. We focus on the performance and computational burden of alternative sensor encoder and time-series modelling modules and their combinations. In addition, we explore the benefits of data augmentation methods in ideal and non-ideal recording conditions. The experiments are conducted using a data-set of multi-sensor movement recordings from 7-month-old infants, as captured by a recently proposed smart jumpsuit for infant motility assessment. Our results indicate that the choice of the encoder module has a major impact on classifier performance. For sensor encoders, the best performance was obtained with parallel 2-dimensional convolutions for intra-sensor channel fusion with shared weights for all sensors. The results also indicate that a relatively compact feature representation is obtainable for within-sensor feature extraction without a drastic loss to classifier performance. Comparison of time-series models revealed that feed-forward dilated convolutions with residual and skip connections outperformed all RNN-based models in performance, training time, and training stability. The experiments also indicate that data augmentation improves model robustness in simulated packet loss or sensor dropout scenarios. In particular, signal- and sensor-dropout-based augmentation strategies provided considerable boosts to performance without negatively affecting the baseline performance. Overall the results provide tangible suggestions on how to optimize end-to-end neural network training for multi-channel movement sensor data.	Translated: 2021-07-05 17:59:44 Published: 2021-07-02
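Sensor-dropout augmentation, as named in the abstract above, can be illustrated with a minimal sketch. The details are assumptions: the abstract does not say how dropped sensors are represented or how often they are dropped, so this version replaces a dropped sensor's samples with zeros and always keeps at least one sensor. The sensor names and the function name `sensor_dropout` are hypothetical.

```python
import random


def sensor_dropout(recording, drop_prob, rng):
    """Return a copy of the recording with whole sensors zeroed at random.

    recording: dict mapping sensor name -> list of samples.
    At least one sensor is always kept (an assumption for this sketch).
    """
    names = list(recording)
    dropped = [n for n in names if rng.random() < drop_prob]
    if len(dropped) == len(names):
        # Never drop everything: restore one randomly chosen sensor.
        dropped.remove(rng.choice(dropped))
    return {
        n: [0.0] * len(recording[n]) if n in dropped else list(recording[n])
        for n in names
    }


recording = {
    "left_arm": [0.1, 0.2, 0.3],
    "right_arm": [0.4, 0.5, 0.6],
    "left_leg": [0.7, 0.8, 0.9],
    "right_leg": [1.0, 1.1, 1.2],
}
augmented = sensor_dropout(recording, drop_prob=0.25, rng=random.Random(0))
print(augmented)
```

Applying this to training batches exposes the model to missing-sensor inputs, which is the kind of non-ideal recording condition the abstract says augmentation helps with.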
# Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription ( http://arxiv.org/abs/2107.01091v1 ) License: CC BY 4.0	Nikita Pavlichenko, Ivan Stelmakh, Dmitry Ustalov	Domain-specific data is the crux of the successful transfer of machine learning systems from benchmarks to real life. Crowdsourcing has become one of the standard tools for cheap and time-efficient data collection for simple problems such as image classification, thanks in large part to advances in research on aggregation methods. However, the applicability of crowdsourcing to more complex tasks (e.g., speech recognition) remains limited due to the lack of principled aggregation methods for these modalities. The main obstacle towards designing advanced aggregation methods is the absence of training data, and in this work, we focus on bridging this gap in speech recognition. For this, we collect and release CrowdSpeech, the first publicly available large-scale dataset of crowdsourced audio transcriptions. Evaluation of existing aggregation methods on our data shows room for improvement, suggesting that our work may entail the design of better algorithms. At a higher level, we also contribute to the more general challenge of collecting high-quality datasets using crowdsourcing: we develop a principled pipeline for constructing datasets of crowdsourced audio transcriptions in any novel domain. We show its applicability on an under-resourced language by constructing VoxDIY, a counterpart of CrowdSpeech for the Russian language. We also release the code that allows a full replication of our data collection pipeline and share various insights on best practices of data collection via crowdsourcing.	Translated: 2021-07-05 17:41:08 Published: 2021-07-02
# Decision-Making Technology for Autonomous Vehicles: Learning-Based Methods, Applications and Future Outlook ( http://arxiv.org/abs/2107.01110v1 ) License: CC BY 4.0	Qi Liu, Xueyuan Li, Shihua Yuan, Zirui Li	Autonomous vehicles have great potential for application in both civil and military fields, and have become a focus of research with the rapid development of science and economy. This article presents a brief review of learning-based decision-making technology for autonomous vehicles, since it is significant for the safer and more efficient performance of autonomous vehicles. Firstly, the basic outline of decision-making technology is provided. Secondly, related works on learning-based decision-making methods for autonomous vehicles are reviewed in comparison with classical decision-making methods. In addition, applications of decision-making methods in existing autonomous vehicles are summarized. Finally, promising topics for future research on decision-making technology for autonomous vehicles are outlined.	Translated: 2021-07-05 17:26:04 Published: 2021-07-02
# (参考訳) 深部画像のスペクトルバイアスの測定と制御について On Measuring and Controlling the Spectral Bias of the Deep Image Prior ( http://arxiv.org/abs/2107.01125v1 ) ライセンス: CC BY 4.0	Zenglin Shi, Pascal Mettes, Subhransu Maji, and Cees G. M. Snoek	(参考訳) 深層画像は,1つの劣化画像だけを最適化することにより,ノイズ除去,塗装,超高解像度化などの逆画像問題に対処できることを示す。約束にもかかわらず、2つの制限がある。まず、ネットワークアーキテクチャの選択を超えて、どのように事前を制御することができるのかは不明だ。第二に、ピークに達するとパフォーマンスが劣化するので、最適化をいつ停止するかをoracleが決める必要がある。本稿では,これらの問題に対処するために,スペクトルバイアスの観点から先行した深部画像について検討する。周波数帯域対応尺度を導入することで、逆画像の深部画像先行は、低周波画像信号が高周波ノイズ信号よりも高速に学習される最適化中にスペクトルバイアスを示す。このピンポイントは、最適化が正しいタイミングで停止されたときに、劣化した画像をデノベートしたり、インペイントしたりできる理由を示している。そこで本研究では,性能劣化を防止し,最適化収束を高速化するために,深部画像のスペクトルバイアスを制御することを提案する。コンボリューション層とアップサンプリング層という,逆画像ネットワークの2つのコア層タイプで実現している。畳み込みに対するリプシッツ制御アプローチと、アップサンプリング層に対するガウス制御アプローチを提案する。さらに,過剰な計算を避けるために停止基準を導入する。ノイズ除去, 塗装, 超高分解能化実験により, 最適化中の性能劣化に苦しむことなく, オラクル基準を早期に停止させる必要性が軽減された。さらに,過剰な計算を避けるために停止基準を概説する。最後に,本手法は全タスクにおいて,現在の手法と比較して良好な復元結果が得られることを示す。 The deep image prior has demonstrated the remarkable ability that untrained networks can address inverse imaging problems, such as denoising, inpainting and super-resolution, by optimizing on just a single degraded image. Despite its promise, it suffers from two limitations. First, it remains unclear how one can control the prior beyond the choice of the network architecture. Second, it requires an oracle to determine when to stop the optimization as the performance degrades after reaching a peak. In this paper, we study the deep image prior from a spectral bias perspective to address these problems. By introducing a frequency-band correspondence measure, we observe that deep image priors for inverse imaging exhibit a spectral bias during optimization, where low-frequency image signals are learned faster and better than high-frequency noise signals. This pinpoints why degraded images can be denoised or inpainted when the optimization is stopped at the right time. 
Based on our observations, we propose to control the spectral bias in the deep image prior to prevent performance degradation and to speed up optimization convergence. We do so in the two core layer types of inverse imaging networks: the convolution layer and the upsampling layer. We present a Lipschitz-controlled approach for the convolution and a Gaussian-controlled approach for the upsampling layer. We further introduce a stopping criterion to avoid superfluous computation. The experiments on denoising, inpainting and super-resolution show that our method no longer suffers from performance degradation during optimization, relieving us from the need for an oracle criterion to stop early. We further outline a stopping criterion to avoid superfluous computation. Finally, we show that our approach obtains favorable restoration results compared to current approaches, across all tasks.	翻訳日:2021-07-05 17:09:21 公開日:2021-07-02
# (参考訳) Contrastive Fenchel-Legendre Optimization を用いた高度相互情報推定 Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization ( http://arxiv.org/abs/2107.01131v1 ) ライセンス: CC BY 4.0	Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao	(参考訳) InfoNCEとその変種の使用が成功し、機械学習におけるコントラスト変動相互情報(MI)推定器の利用が一般化した。優れた安定性を示す一方で、これらの推定値はコストのかかる大規模バッチトレーニングに依存しており、分散削減のために縛りのあるタイトさを犠牲にしている。これらの限界を克服するために、非正規化統計モデリングと凸最適化のレンズから一般的な変分mi境界の数学を再検討する。我々の研究は、一般的な変分MI境界を包含する新しい統一理論の枠組みをもたらすだけでなく、FLOと呼ばれる新しい、単純で強力な反トラストMI推定器にも繋がる。理論的には、FLO推定器は厳密であり、確率勾配降下下では確実に収束する。実証的に、我々のFLO推定器は前者の限界を克服し、より効率的に学習する。 FLOの有効性は、広範囲なベンチマークを用いて検証され、実際のMI推定におけるトレードオフも明らかにされる。 Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds from the lens of unnormalized statistical modeling and convex optimization. Our investigation not only yields a new unified theoretical framework encompassing popular variational MI bounds but also leads to a novel, simple, and powerful contrastive MI estimator named as FLO. Theoretically, we show that the FLO estimator is tight, and it provably converges under stochastic gradient descent. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.	翻訳日:2021-07-05 16:51:11 公開日:2021-07-02
# (参考訳) 4C: CAVのための計算・通信・制御の共同設計フレームワーク 4C: A Computation, Communication, and Control Co-Design Framework for CAVs ( http://arxiv.org/abs/2107.01142v1 ) ライセンス: CC0 1.0	Liangkai Liu, Shaoshan Liu, and Weisong Shi	(参考訳) コネクテッド・自動運転車(CAV)は、安全性と効率の面で有望であり、政府機関、産業、学界から多額の投資と関心を集めている。より多くのコンピューティングと通信リソースが利用可能であるため、車両とエッジサーバは、感知と知覚のために、Visual IoT(V-IoT)技術として知られる一連のカメラベースの視覚センサーを備えている。プログラム可能な通信、計算、制御を実現するために、多くの努力がなされている。しかし、それらは主にサイロモードで行われ、現実世界で挑戦的なシナリオを扱う応答性と効率を制限している。エンド・ツー・エンドの性能を向上させるために,将来のCAVはコミュニケーション,計算,制御の共設計を必要とする。本稿では,CAVのエンドツーエンド設計原則である4Cについて,通信・計算・制御を統一した共同設計フレームワークを提供することで,V-IoTシステムを拡張したビジョンを述べる。プログラマブルなコミュニケーション、細かな異種計算、そして4Cの効率的な車両制御により、CAVは重要なシナリオを処理し、エネルギー効率の良い自動運転を実現することができる。最後に,4Cフレームワークのビジョンを実現するための課題をいくつか提示する。 Connected and autonomous vehicles (CAVs) are promising due to their potential safety and efficiency benefits and have attracted massive investment and interest from government agencies, industry, and academia. With more computing and communication resources available, both vehicles and edge servers are equipped with a set of camera-based vision sensors, also known as Visual IoT (V-IoT) techniques, for sensing and perception. Tremendous efforts have been made to achieve programmable communication, computation, and control. However, they are conducted mainly in silo mode, limiting the responsiveness and efficiency of handling challenging scenarios in the real world. To improve the end-to-end performance, we envision that future CAVs require the co-design of communication, computation, and control. This paper presents our vision of the end-to-end design principle for CAVs, called 4C, which extends the V-IoT system by providing a unified communication, computation, and control co-design framework. With programmable communications, fine-grained heterogeneous computation, and efficient vehicle controls in 4C, CAVs can handle critical scenarios and achieve energy-efficient autonomous driving. 
Finally, we present several challenges to achieving the vision of the 4C framework.	翻訳日:2021-07-05 16:17:21 公開日:2021-07-02
# (参考訳) 協調型視覚ナビゲーション Collaborative Visual Navigation ( http://arxiv.org/abs/2107.01151v1 ) ライセンス: CC BY 4.0	Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, Liwei Wang	(参考訳) 人工知能の基本的な問題として、マルチエージェントシステム(MAS)は、主にマルチエージェント強化学習(MARL)技術によって急速に進歩している。しかしながら、従来のmarlの手法は主にグリッドワールドのようなゲーム環境にフォーカスしており、視覚的にリッチな環境でのmasの探索は少ないままである。このギャップを狭め,MASにおける知覚の重要な役割を強調するために,マルチエージェント視覚ナビゲーション(MAVN)のための大規模3次元データセットCollaVNを提案する。 collavnでは、複数のエージェントが協調してフォトリアリスティックな環境を渡り、ターゲットの場所に到達する。この問題をより一般的なものにするために、様々なMAVN変種を探索する。さらに,メモリ型通信フレームワークを提案する。各エージェントは、通信情報を永続的に記憶するプライベートな外部メモリを備える。これにより、エージェントは過去のコミュニケーション情報をよりよく利用し、より効率的なコラボレーションと堅牢な長期計画を可能にします。実験では,いくつかのベースラインと評価指標を設計した。また、異なるMAVNタスク設定に対して提案したMARLアプローチの有効性を実証的に検証した。 As a fundamental problem for Artificial Intelligence, multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques. However, previous MARL methods largely focused on grid-world like or game environments; MAS in visually rich environments has remained less explored. To narrow this gap and emphasize the crucial role of perception in MAS, we propose a large-scale 3D dataset, CollaVN, for multi-agent visual navigation (MAVN). In CollaVN, multiple agents are entailed to cooperatively navigate across photo-realistic environments to reach target locations. Diverse MAVN variants are explored to make our problem more general. Moreover, a memory-augmented communication framework is proposed. Each agent is equipped with a private, external memory to persistently store communication information. This allows agents to make better use of their past communication information, enabling more efficient collaboration and robust long-term planning. In our experiments, several baselines and evaluation metrics are designed. We also empirically verify the efficacy of our proposed MARL approach across different MAVN task settings.	翻訳日:2021-07-05 16:06:14 公開日:2021-07-02
# (参考訳) シンプルで、速く、より強く: 対照的な学習者に対して、log-Kの呪いを破る Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE ( http://arxiv.org/abs/2107.01152v1 ) ライセンス: CC BY 4.0	Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao	(参考訳) InfoNCEベースのコントラスト表現学習者(SimCLRなど)は近年大きく成功している。しかしながら、これらの対照的なスキームは、その効果が小さなバッチトレーニング(すなわち log-K の呪い、K はバッチサイズ)によって破壊されるため、リソース要求で悪名高い。本研究は,小さなバッチサイズでは,コントラスト学習者が失敗する理由を数学的に明らかにし,この問題を解決した,単純で非自明なコントラスト目標 FlatNCE を提案する。 InfoNCEとは異なり、FlatNCEはもはや、対照的な学習のための差別的な分類目標に明示的にアピールしていません。理論的には、FlatNCEはInfoNCEの数学的双対な定式化であり、したがってエネルギーモデリングに関する古典文学を橋渡ししていることを示している。この研究の意義は、コントラスト学習技術の強力な一般化と、コントラスト学習の監視と診断のための新しいツールの導入によってもたらされる。 CIFAR10、ImageNet、その他のデータセットに関する実証的な証拠で、私たちの主張を裏付けます。 InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel, simple, non-trivial contrastive objective named FlatNCE, which fixes this issue. Unlike InfoNCE, our FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging the classical literature on energy modeling; and empirically, we demonstrate that, with minimal modification of code, FlatNCE enables immediate performance boost independent of the subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques, and the introduction of new tools to monitor and diagnose contrastive training. 
We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.	翻訳日:2021-07-05 15:40:50 公開日:2021-07-02
# (参考訳) 勾配漏れ耐性フェデレート学習 Gradient-Leakage Resilient Federated Learning ( http://arxiv.org/abs/2107.01154v1 ) ライセンス: CC BY 4.0	Wenqi Wei, Ling Liu, Yanzhao Wu, Gong Su, Arun Iyengar	(参考訳) クライアントはデバイスに機密データを保持でき、ローカルトレーニングパラメータの更新のみをフェデレーションサーバと共有できるため、フェデレーション学習(FL)は、デフォルトのクライアントプライバシを備えた、新興の分散学習パラダイムである。しかし最近の研究では、FLの勾配リークがクライアントのトレーニングデータのプライバシーを損なう可能性があることが示されている。本稿では,Fed-CDPと呼ばれる、訓練サンプル単位のクライアント差分プライバシーによって、プライバシ保護フェデレーション学習に対する勾配リーク耐性アプローチを提案する。 3つのオリジナル・コントリビューションがある。まず,暗号化されたクライアントサーバ間通信においても,フェデレーション学習における3種類のクライアント勾配漏洩脅威を識別する。我々は、Fed-SDPと呼ばれる従来のサーバ協調型差分プライバシーアプローチが、トレーニングデータのプライバシーを保護するのに不十分な時期と理由を明確に述べる。第2に、サンプルベースのクライアント差分プライバシーアルゴリズムであるFed-CDPを導入し、$(\epsilon, \delta)$差分プライバシー保証によるFed-CDPの形式分析と、プライバシ会計の観点からFed-CDPとFed-SDPの形式比較を提供する。第三に、Fed-CDPによる差分プライバシー保証を提供するためのプライバシユーティリティトレードオフを正式に分析し、Fed-CDPの精度とレジリエンスをさらに向上させる動的減衰ノイズ注入ポリシーを提案する。 Fed-CDPとFed-CDP(decay)を5つのベンチマークデータセットに対して差分プライバシー保証と勾配リークレジリエンスの観点からFed-SDPと比較した。その結果、Fed-CDPアプローチは、クライアント勾配リークに対するレジリエンスの観点から従来のFed-SDPよりも優れており、フェデレート学習における競争精度が向上していることがわかった。 Federated learning (FL) is an emerging distributed learning paradigm with default client privacy because clients can keep sensitive data on their devices and only share local training parameter updates with the federated server. However, recent studies reveal that gradient leakages in FL may compromise the privacy of client training data. This paper presents a gradient leakage resilient approach to privacy-preserving federated learning with per training example-based client differential privacy, coined as Fed-CDP. It makes three original contributions. First, we identify three types of client gradient leakage threats in federated learning even with encrypted client-server communications. We articulate when and why the conventional server coordinated differential privacy approach, coined as Fed-SDP, is insufficient to protect the privacy of the training data. 
Second, we introduce Fed-CDP, the per example-based client differential privacy algorithm, and provide a formal analysis of Fed-CDP with the $(\epsilon, \delta)$ differential privacy guarantee, and a formal comparison between Fed-CDP and Fed-SDP in terms of privacy accounting. Third, we formally analyze the privacy-utility trade-off for providing differential privacy guarantee by Fed-CDP and present a dynamic decay noise-injection policy to further improve the accuracy and resiliency of Fed-CDP. We evaluate and compare Fed-CDP and Fed-CDP(decay) with Fed-SDP in terms of differential privacy guarantee and gradient leakage resilience over five benchmark datasets. The results show that the Fed-CDP approach outperforms conventional Fed-SDP in terms of resilience to client gradient leakages while offering competitive accuracy performance in federated learning.	翻訳日:2021-07-05 15:11:04 公開日:2021-07-02
# (参考訳) モーメントは確率的AUPRC最大化の収束を加速する Momentum Accelerates the Convergence of Stochastic AUPRC Maximization ( http://arxiv.org/abs/2107.01173v1 ) ライセンス: CC BY 4.0	Guanghui Wang, Ming Yang, Lijun Zhang, Tianbao Yang	(参考訳) 本稿では,不均衡な分類課題に対処するために広く用いられている精度リコール曲線下面積(AUPRC)の確率的最適化について検討する。 AUPRCの最大化にはいくつかの方法が提案されているが、収束保証付きAUPRCの確率的最適化は未開発領域のままである。最近の研究[42]では、平均精度のサロゲート損失を最大化することに基づく AUPRC に対する有望なアプローチを提案し、非凸目的の$O(1/\epsilon^5)$複雑性を証明した。本稿では, (i)$O(1/\epsilon^4)$の反復複雑性を向上した新しい確率運動量法を開発し, (ii)$O(1/\epsilon^4)$と同じ反復複雑性を持つ新しい確率適応手法のファミリーを設計し, 実際により高速な収束を享受することで, AUPRCの確率的最適化をさらに改善する。そこで本研究では,コンバージェンス改善に不可欠な2つの革新的手法を提案する。 (i) 個々のランキングスコアを追跡するバイアス付き推定器をランダムに座標的に更新する, (ii) 目標の勾配を追跡する確率的勾配推定器の上に運動量更新を用いる。様々なデータセットに対する実験により,提案アルゴリズムの有効性が示された。また、独立した興味として、提案された確率運動量と適応アルゴリズムは、2段階確率依存構成最適化問題にも適用できる。 In this paper, we study stochastic optimization of areas under precision-recall curves (AUPRC), which is widely used for combating imbalanced classification tasks. Although a few methods have been proposed for maximizing AUPRC, stochastic optimization of AUPRC with convergence guarantee remains an undeveloped territory. A recent work [42] has proposed a promising approach towards AUPRC based on maximizing a surrogate loss for the average precision, and proved an $O(1/\epsilon^5)$ complexity for finding an $\epsilon$-stationary solution of the non-convex objective. In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice. 
To this end, we propose two innovative techniques that are critical for improving the convergence: (i) the biased estimators for tracking individual ranking scores are updated in a randomized coordinate-wise manner; and (ii) a momentum update is used on top of the stochastic gradient estimator for tracking the gradient of the objective. Extensive experiments on various data sets demonstrate the effectiveness of the proposed algorithms. Of independent interest, the proposed stochastic momentum and adaptive algorithms are also applicable to a class of two-level stochastic dependent compositional optimization problems.	翻訳日:2021-07-05 14:45:19 公開日:2021-07-02
# (参考訳) AIタスクのための倫理シート Ethics Sheets for AI Tasks ( http://arxiv.org/abs/2107.01183v1 ) ライセンス: CC BY 4.0	Saif M. Mohammad	(参考訳) バイアスド・リシディズム・システムの使用や、脆弱なサブ人口に対する感情認識システムの大量テストなど、いくつかの顕著な出来事は、テクノロジーが既に疎外されている人々にとってより有害な結果をもたらすことを強調している。本稿では,個別のモデルやデータセットのレベルだけでなく,AIタスクのレベルにおいても倫理的考察を考察する。 AIタスクのための倫理シート(Ethics Sheets for AI Tasks)という,タスクの一般的なフレーム化方法や,データやメソッド,評価に関する選択に隠された仮定と倫理的考察の具体化を目的とした,そのような取り組みの新たな形式を紹介します。最後に、自動感情認識のための倫理表の例を挙げる。データセット用のData SheetsとAIシステムのModel Cardsとともに、Ethics Sheetsは、責任あるAIシステムの開発とデプロイを支援する。 Several high-profile events, such as the use of biased recidivism systems and mass testing of emotion recognition systems on vulnerable sub-populations, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. In this paper, I will make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. Finally, I will provide an example ethics sheet for automatic emotion recognition. Together with Data Sheets for datasets and Model Cards for AI systems, Ethics Sheets aid in the development and deployment of responsible AI systems.	翻訳日:2021-07-05 14:11:38 公開日:2021-07-02
# (参考訳) NTIRE 2021 Multi-modal Aerial View Object Classification Challenge NTIRE 2021 Multi-modal Aerial View Object Classification Challenge ( http://arxiv.org/abs/2107.01189v1 ) ライセンス: CC BY 4.0	Jerrick Liu, Nathan Inkawhich, Oliver Nina, Radu Timofte, Sahil Jain, Bob Lee, Yuru Duan, Wei Wei, Lei Zhang, Songzheng Xu, Yuxuan Sun, Jiaqi Tang, Xueli Geng, Mengru Ma, Gongzhe Li, Xueli Geng, Huanqia Cai, Chengxue Cai, Sol Cummings, Casian Miron, Alexandru Pasarica, Cheng-Yen Yang, Hung-Min Hsu, Jiarui Cai, Jie Mei, Chia-Ying Yeh, Jenq-Neng Hwang, Michael Xin, Zhongkai Shangguan, Zihe Zheng, Xu Yifei, Lehan Yang, Kele Xu, Min Feng	(参考訳) 本稿では,CVPR における NTIRE 2021 ワークショップと合わせて,MAVOC (Multi-modal Aerial View Object Classification) の最初の挑戦を紹介する。この課題は、EOとSAR画像を用いた2つの異なるトラックで構成されている。 EOとSARのセンサーには、それぞれ異なる利点と欠点がある。この競争の目的は、両方の感覚情報を相補的に利用する方法を分析することである。本コンペティションに提案した上位手法について論じ,その成果を盲点テストセットで評価する。我々の挑戦結果は、競技のトラック毎の現在のベースラインから15%以上精度が向上したことを示している。 In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. Both EO and SAR sensors possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition.	翻訳日:2021-07-05 13:59:55 公開日:2021-07-02
# (参考訳) コントラスト学習はいかに不完全か自己監督型ビデオ認識のためのイントラ・イントラ・ヴァリアントデュアル表現法 How Incomplete is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition ( http://arxiv.org/abs/2107.01194v1 ) ライセンス: CC BY 4.0	Lin Zhang, Qi She, Zhengyang Shen, Changhu Wang	(参考訳) 自己指導型表現学習に適用されるコントラスト学習は、深層モデルで復活している。本稿では,自己教師付きビデオ認識のための既存のコントラスト学習ベースのソリューションが,同一ビデオ内のクリップ内分散を無視しながら,分散符号化に重点を置いていることを見出した。そこで本研究では,各クリップの2つの表現を学習し,シャッフルランクのプリテキストタスクでイントラ分散を符号化し,時間的コヒーレントなコントラスト損失で相互分散を符号化する手法を提案する。実験の結果,本手法は相互および内部分散のバランスをとる上で重要な役割を担っており,複数のバックボーンとコントラスト学習フレームワーク上で一貫したパフォーマンス向上をもたらす。 SimCLR と統合して Kinetics-400 で事前訓練を行い,UCF101 と HMDB51 のテストセットでそれぞれ $\textbf{82.0\%}$ と $\textbf{51.2\%}$ の下流分類精度,UCF101 で $\textbf{46.1\%}$ の動画検索精度を達成した。 Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip which (i) encode intra-variance through a shuffle-rank pretext task; (ii) encode inter-variance through a temporal coherent contrastive loss. Experiment results show that our method plays an essential role in balancing inter and intra variances and brings consistent performance gains on multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves $\textbf{82.0\%}$ and $\textbf{51.2\%}$ downstream classification accuracy on UCF101 and HMDB51 test sets respectively and $\textbf{46.1\%}$ video retrieval accuracy on UCF101, outperforming both pretext-task based and contrastive learning based counterparts.	翻訳日:2021-07-05 13:37:09 公開日:2021-07-02
# アクショントランスフォーマー : 短時間行動認識のためのセルフアテンションモデル Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition ( http://arxiv.org/abs/2107.00606v2 ) ライセンス: Link先を確認	Vittorio Mazzia, Simone Angarano, Francesco Salvetti, Federico Angelini and Marcello Chiaberge	(参考訳) 純粋に注意に基づくディープニューラルネットワークは、設計者による最小限のアーキテクチャ優先に依存しているため、いくつかのドメインで成功を収めている。人間行動認識(har)では、注意機構は主に標準畳み込み層や再帰層の上に採用され、全体的な一般化能力が向上している。本研究では,畳み込み層,リカレント層,注意層を混合するより精巧なネットワークを一貫して上回る,単純で完全な自己完結型アーキテクチャであるaction transformer(act)を導入する。従来のヒューマンアクション認識研究に基づいて,計算とエネルギーの要求を制限するため,提案手法では2次元ポーズ表現を小さな時間窓上で活用し,高精度かつ効果的なリアルタイム性能を実現するための低レイテンシソリューションを提供する。さらに、リアルタイムな短時間の人行動認識のための正式なトレーニングと評価ベンチマークを構築するために、新しい大規模データセットであるMPOSE2021をオープンソース化した。 MPOSE2021の大規模実験は,提案手法と,それ以前のアーキテクチャソリューションにより,AcTモデルの有効性が証明され,今後のHAR研究の基盤となる。 Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborated networks that mix convolutional, recurrent, and attentive layers. In order to limit computational and energy requests, building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time short-time human action recognition. Extensive experimentation on MPOSE2021 with our proposed methodology and several previous architectural solutions proves the effectiveness of the AcT model and poses the base for future work on HAR.	
翻訳日:2021-07-05 13:07:08 公開日:2021-07-02
# 合成データは多目的追跡における関連知識学習の現実に匹敵する Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking ( http://arxiv.org/abs/2106.16100v2 ) ライセンス: Link先を確認	Yuchi Liu, Zhongdao Wang, Xiangxin Zhou and Liang Zheng	(参考訳) 同じアイデンティティのバウンディングボックスをビデオシーケンスでリンクすることを目的としたアソシエーションは、マルチオブジェクトトラッキング(mot)の中心的なコンポーネントである。パラメトリックネットワークなどのアソシエーションモジュールをトレーニングするために、実際のビデオデータが通常使用される。しかし、連続するビデオフレームで人物のトラックをアノテートすることは高価であり、そのような実際のデータは柔軟性がないため、追跡シナリオを変更するシステム性能w.r.tを評価する機会が限られている。本稿では,3次元合成データが実世界の映像を連想訓練に置き換えられるかどうかについて検討する。具体的には,MOTXと呼ばれる大規模合成データエンジンを導入し,カメラや物体の運動特性を実世界のデータセットに類似するように手動で設定する。実データと比較すると,合成データから得られる連想知識は,ドメイン適応手法を使わずに実世界のテストセットで非常によく似た性能が得られることを示す。私たちの興味深い観察には2つの要因がある。第一に、3Dエンジンは、カメラの動き、カメラの視界、物体の動きなどの動きをうまくシミュレートすることができ、シミュレートされたビデオは、効果的なモーション特徴を持つアソシエーションモジュールを提供することができる。第2に, 出現領域のギャップが連想知識の学習にほとんど影響を与えないことを示す実験結果が得られた。さらに、MOTXの強力なカスタマイズ能力により、MOTに対する運動要因の影響を定量的に評価することが可能となり、コミュニティに新たな洞察がもたらされる。 Association, aiming to link bounding boxes of the same identity in a video sequence, is a central component in multi-object tracking (MOT). To train association modules, e.g., parametric networks, real video data are usually used. However, annotating person tracks in consecutive video frames is expensive, and such real data, due to its inflexibility, offer us limited opportunities to evaluate the system performance w.r.t changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those in real-world datasets. We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaption techniques. Our intriguing observation is credited to two factors. 
First and foremost, 3D engines can well simulate motion factors such as camera movement, camera view and object movement, so that the simulated videos can provide association modules with effective motion features. Second, experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.	翻訳日:2021-07-05 13:06:47 公開日:2021-07-02
# SocialAI: 深層強化学習エージェントにおける社会認知能力のベンチマーク SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents ( http://arxiv.org/abs/2107.00956v1 ) ライセンス: Link先を確認	Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer	(参考訳) 人間との社会的相互作用に参加することができる、具体化された自律エージェントを構築することは、AIの主要な課題の1つだ。深層強化学習(Deep Reinforcement Learning, DRL)分野において、この目的は具体的言語使用に関する複数の研究を動機づけた。しかし、現在のアプローチでは、非常にシンプルで多様でない社会状況におけるコミュニケーションツールとしての言語に焦点が当てられている: 言語の「自然性」は、高い語彙サイズと可変性の概念に還元される。本稿では,人間レベルのAIを目指すためには,1)複雑で可変な社会的文脈における言語の使用,2)常に進化する社会世界におけるマルチモーダル環境における複雑な具体的コミュニケーションなど,より広範な社会スキルのセットが必要であることを論じる。認知科学の概念は、AIが人間のような知性に向けてロードマップを描き出すのにどう役立つかを説明します。最初のステップとして、現在の研究をより広範なソーシャルスキルのセットに拡大することを提案する。そこで我々は,他の(スクリプト制御の)ソーシャルエージェントを特徴とする複数のグリッドワールド環境を用いて,DRLエージェントの社会的スキル獲得を評価するベンチマークであるSocialAIを提案する。次に,最近のSOTA DRLアプローチの限界をSocialAI上で検証し,次の社会的エージェントへの重要なステップについて論じる。ビデオとコードはhttps://sites.google.com/view/socialaiで入手できる。 Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. 
To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai.	翻訳日:2021-07-05 13:05:53 公開日:2021-07-02
# ロバストな医用画像分割のための協調訓練と潜時空間データ増強 Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation ( http://arxiv.org/abs/2107.01079v1 ) ライセンス: Link先を確認	Chen Chen, Kerstin Hammernik, Cheng Ouyang, Chen Qin, Wenjia Bai, Daniel Rueckert	(参考訳) ディープラーニングベースのセグメンテーション手法は、例えばデプロイメント中に予期せぬデータ分散シフトに対して脆弱である。異なるスキャナー、予期しない画像アーティファクトなどによる画像の外観やコントラストの変化。本稿では,画像分割モデルの学習のための協調フレームワークと,実例生成のための潜在空間拡張手法を提案する。どちらの貢献も限られたデータでモデルの一般化と堅牢性を改善する。協調トレーニングフレームワークは、高速思考ネットワーク(FTN)と低速思考ネットワーク(STN)で構成されている。 FTNは、画像再構成とセグメンテーションタスクのための分離された画像特徴と形状特徴を学習する。 STNは、セグメンテーション補正と精錬のための形状前処理を学習する。 2つのネットワークは協調的に訓練されている。潜時空間増強は、チャネルワイドおよび空間ワイドの両方で分離された潜時空間をマスキングすることで、困難な訓練例を生成する。公開心画像データセットについて広範な実験を行った。訓練対象は1つのサイトから10名に過ぎず,強いベースライン法に比べ,クロスサイトセグメンテーション性能が向上し,様々な予期せぬ画像アーチファクトに対するロバスト性が向上した。特に、潜在空間データ拡張による協調訓練は、標準訓練法と比較して平均サイコロスコアで15%向上する。 Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment, e.g. change of image appearances or contrasts caused by different scanners, unexpected imaging artifacts etc. In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples. Both contributions improve model generalization and robustness with limited data. The cooperative training framework consists of a fast-thinking network (FTN) and a slow-thinking network (STN). The FTN learns decoupled image features and shape features for image reconstruction and segmentation tasks. The STN learns shape priors for segmentation correction and refinement. The two networks are trained in a cooperative manner. The latent space augmentation generates challenging examples for training by masking the decoupled latent space in both channel-wise and spatial-wise manners. We performed extensive experiments on public cardiac imaging datasets. 
Using only 10 subjects from a single site for training, we demonstrated improved cross-site segmentation performance and increased robustness against various unforeseen imaging artifacts compared to strong baseline methods. Particularly, cooperative training with latent space data augmentation yields 15% improvement in terms of average Dice score when compared to a standard training method.	翻訳日:2021-07-05 13:05:31 公開日:2021-07-02
# Deep Metric Learning 法の一般化性向上のための損失関数の組付け Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods ( http://arxiv.org/abs/2107.01130v1 ) ライセンス: Link先を確認	Davood Zabihzadeh	(参考訳) Deep Metric Learning (DML)は入力データから非線形セマンティック埋め込みを学び、類似したペアをまとめながら、異なるデータを互いに遠ざけ合う。この目的のために、様々な応用において有望な結果をもたらす様々な方法が過去10年間に提案されている。 DMLアルゴリズムの成功は、その損失関数に大きく依存する。しかし、損失関数は完全ではなく、最適な類似性の埋め込みのいくつかの側面のみを扱う。さらに、テスト段階における見えないカテゴリに対するDMLの一般化性は、既存の損失関数では考慮されない重要な問題である。これらの課題に対処するために,共有機能抽出器上に構築された異なる損失を組み合わせ,新しい手法を提案する。提案された損失の集合は、すべての損失と一致する特徴を抽出するディープモデルを強制する。選択された損失は多種多様であり,それぞれが最適セマンティック埋め込みの異なる側面を強調しているため,有効結合法は個々の損失に対して著しく改善され,目に見えないカテゴリをうまく一般化する。ここでは、損失関数の選択には制限がなく、我々のメソッドは既存の関数の任意のセットで動作する。さらに、各損失関数とその重みを、ハイパーパラメータを調整する必要なく、エンドツーエンドのパラダイムで最適化することもできる。従来のゼロショット学習(zsl)設定において,マシンビジョン領域から一般的なデータセットを評価する。その結果,本手法がすべてのデータセットにおいて,ベースラインの損失をはるかに上回っていることが明らかとなった。 Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeps dissimilar data away from each other. To this end, many different methods are proposed in the last decade with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect, and it deals only with some aspects of an optimal similarity embedding. Besides, the generalizability of the DML on unseen categories during the test stage is an important matter that is not considered by existing loss functions. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses enforces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well on unseen categories. 
Here, there is no limitation in choosing loss functions, and our methods can work with any set of existing ones. Besides, they can optimize each loss function as well as its weight in an end-to-end paradigm with no need to adjust any hyper-parameter. We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin in all datasets.	翻訳日:2021-07-05 13:05:11 公開日:2021-07-02
# 映像における視覚関係予測 Visual Relationship Forecasting in Videos ( http://arxiv.org/abs/2107.01181v1 ) ライセンス: Link先を確認	Li Mi, Yangjun Ou, Zhenzhong Chen	(参考訳) 現実世界のシナリオは、しばしば未知の未来のオブジェクトインタラクションの予測を必要とし、人間とエージェントの両方の意思決定プロセスを支援する。この課題に対処するため,視覚関係予測(Visual Relationship Forecasting: VRF)というタスクをビデオに提示し,視覚関係の予測を推論的に検討する。具体的には、既存のHフレームを伴う主語・目的語のペアが与えられた場合、VRFは視覚的証拠なしで次のTフレームに対する将来の相互作用を予測することを目的としている。 VRFタスクを評価するために,VRF-AGとVRF-VidORという2つのビデオデータセットを紹介した。これらの2つのデータセットは、それぞれ1923本と13447本のビデオクリップで13種類と35種類の視覚関係を密に注釈している。さらに、時空間グラフ畳み込みネットワークとトランスフォーマーによってオブジェクトレベルとフレームレベルの依存関係をキャプチャする新しいグラフ畳み込みトランスフォーマ(GCT)フレームワークを提案する。 VRF-AGデータセットとVRF-VidORデータセットの両方の実験結果から、GCTは視覚関係予測における最先端のシーケンスモデリング手法よりも優れていることが示された。 Real-world scenarios often require the anticipation of object interactions in unknown future, which would assist the decision-making process of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. Specifically, given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence. To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video. These two datasets densely annotate 13 and 35 visual relationships in 1923 and 13447 video clips, respectively. In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies by spatio-temporal Graph Convolution Network and Transformer. Experimental results on both VRF-AG and VRF-VidOR datasets demonstrate that GCT outperforms the state-of-the-art sequence modelling methods on visual relationship forecasting.	翻訳日:2021-07-05 13:04:51 公開日:2021-07-02
# Transformer-F: 普遍的な文表現の学習に有効なトランスフォーマーネットワーク Transformer-F: A Transformer network with effective methods for learning universal sentence representation ( http://arxiv.org/abs/2107.00653v1 ) ライセンス: Link先を確認	Yu Shi	(参考訳) Transformerモデルは、自然言語処理で文表現に広く使われている。しかし、以前のトランスフォーマーベースのモデルは、たいていの場合、限定的な意味を持ち、単に高レベルな意味抽象機能を抽出できる関数ワードに焦点を当てていた。本稿では,トランスフォーマーの性能向上のための2つの手法を提案する。注意度を相関係数と重みベクトルを乗じることで算出し,より実用的な意味を持つ単語の抽出に寄与した。重みベクトルは、音声部分の重要性に基づいて入力テキストシーケンスによって得られる。さらに,各層の特徴を融合させて文表現結果をより包括的かつ正確にする。実験では、3つの標準テキスト分類データセットに対するモデルtransformer-fの有効性を示す。実験の結果,提案モデルがベースラインモデルと比較してテキスト分類の性能を著しく向上させることがわかった。具体的には,簡単な作業でバニラ変圧器を5.28%向上させた。 The Transformer model is widely used in natural language processing for sentence representation. However, the previous Transformer-based models focus on function words that have limited meaning in most cases and could merely extract high-level semantic abstraction features. In this paper, two approaches are introduced to improve the performance of Transformers. We calculated the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract the words with more practical meaning. The weight vector is obtained by the input text sequence based on the importance of the part-of-speech. Furthermore, we fuse the features of each layer to make the sentence representation results more comprehensive and accurate. In experiments, we demonstrate the effectiveness of our model Transformer-F on three standard text classification datasets. Experimental results show that our proposed model significantly boosts the performance of text classification as compared to the baseline model. Specifically, we obtain a 5.28% relative improvement over the vanilla Transformer on the simple tasks.	翻訳日:2021-07-05 13:04:32 公開日:2021-07-02
# Video Captionsを用いたYouTube上の誤情報検出 Misinformation Detection on YouTube Using Video Captions ( http://arxiv.org/abs/2107.00941v1 ) ライセンス: Link先を確認	Raj Jagtap, Abhinav Kumar, Rahul Goel, Shakshi Sharma, Rajesh Sharma, Clint P. George	(参考訳) 何百万人もの人々がYouTube、Facebook、Twitterなどのマスメディアを利用している。これらのプラットフォームへのアクセシビリティのため、物語を確立し、プロパガンダを行い、誤情報を広めるためにしばしば使用される。本研究では,最新のNLP技術を用いて映像キャプション(字幕)から特徴を抽出する手法を提案する。提案手法を評価するために,動画を誤情報か否かを分類するために,公開アクセス可能なラベル付きデータセットを用いた。ビデオキャプションを探索する動機は、ビデオメタデータの分析にある。視聴回数、高評価、低評価、コメント数などの属性は、この情報では動画を区別することが難しいため、効果がない。提案モデルは,キャプションデータセットを用いて,動画を3クラス(誤情報,誤情報の反証,中立)に0.85から0.90のF1スコアで分類できる。誤情報クラスの関連性を強調するため,我々はこの分類問題を,誤情報とその他(誤情報の反証と中立)の2クラス分類として再定式化する。この実験では,提案モデルは0.92から0.95のF1スコア,0.78から0.90のAUC ROCで動画を分類できる。 Millions of people use platforms such as YouTube, Facebook, Twitter, and other mass media. Due to the accessibility of these platforms, they are often used to establish a narrative, conduct propaganda, and disseminate misinformation. This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles). To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not. The motivation behind exploring video captions stems from our analysis of videos metadata. Attributes such as the number of views, likes, dislikes, and comments are ineffective as videos are hard to differentiate using this information. Using caption dataset, the proposed models can classify videos among three classes (Misinformation, Debunking Misinformation, and Neutral) with 0.85 to 0.90 F1-score. To emphasize the relevance of the misinformation class, we re-formulate our classification problem as a two-class classification - Misinformation vs. others (Debunking Misinformation and Neutral). In our experiments, the proposed models can classify videos with 0.92 to 0.95 F1-score and 0.78 to 0.90 AUC ROC.	翻訳日:2021-07-05 13:03:58 公開日:2021-07-02
# r2d2:階層型言語モデルのための微分木に基づく再帰的トランスフォーマ R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling ( http://arxiv.org/abs/2107.00967v1 ) ライセンス: Link先を確認	Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo	(参考訳) 人間の言語理解は、階層的に結合できる抽象レベルの増加とともに、複数のレベルの粒度(例えば、単語、句、文)で機能する。しかし、スタック層を持つ既存の深層モデルは、いかなる階層的プロセスも明示的にモデル化しない。本稿では、構成過程をエミュレートするために、微分可能なCKYスタイルのバイナリツリーに基づく再帰変換モデルを提案する。我々は、双方向言語モデルの事前学習目標をこのアーキテクチャに拡張し、左右の抽象ノードから各単語を予測することを試みる。また,本手法を大規模化するために,合成ステップの線形数だけを符号化する効率的な伐採木誘導アルゴリズムを導入する。言語モデルと教師なし構文解析の実験結果は,提案手法の有効性を示している。 Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.	翻訳日:2021-07-05 13:03:38 公開日:2021-07-02
# コード混合BERTを用いたヒンディー語ツイートの言語識別 Language Identification of Hindi-English tweets using code-mixed BERT ( http://arxiv.org/abs/2107.01202v1 ) ライセンス: Link先を確認	Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim	(参考訳) 近年,ソーシャルメディアのテキストの言語識別は興味深い研究課題となっている。ソーシャルメディアのメッセージは、主に英語以外の国で混在している。文脈埋め込みの事前学習による事前知識は、下流タスクにおけるアート結果の状態を示している。近年、BERTのようなモデルでは、大量のラベルのないデータを使用することで、事前訓練された言語モデルは共通の言語表現を学習するのにさらに有益であることが示されている。本稿では,移動学習と細調整BERTモデルを用いたTwitter上での言語識別実験について述べる。この研究は、ヒンディー語-英語-ウルドゥー語のコード混合テキストのデータ収集を言語事前学習に用い、ヒンディー語-英語コード混合を後続の単語レベルの言語分類に用いている。その結果、コードミックスデータ上で事前学習された表現は、モノリンガルデータによるより良い結果をもたらすことがわかった。 Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly in code mixed in non-English speaking states. Prior knowledge by pre-training contextual embeddings have shown state of the art results for a range of downstream tasks. Recently, models such as BERT have shown that using a large amount of unlabeled data, the pretrained language models are even more beneficial for learning common language representations. Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper. The work utilizes a data collection of Hindi-English-Urdu codemixed text for language pre-training and Hindi-English codemixed for subsequent word-level language classification. The results show that the representations pre-trained over codemixed data produce better results by their monolingual counterpart.	翻訳日:2021-07-05 13:03:25 公開日:2021-07-02
# ブリッジングジェネリックと個人化フェデレーション学習について On Bridging Generic and Personalized Federated Learning ( http://arxiv.org/abs/2107.00778v1 ) ライセンス: Link先を確認	Hong-You Chen, Wei-Lun Chao	(参考訳) フェデレーション学習(federated learning)は、データにアクセスせずに複数のクライアントと共同でモデルをトレーニングできることを約束している。学習したモデルの一般的なパフォーマンス(サーバでの将来の使用のために)や、パーソナライズされたパフォーマンス(各クライアントのために)を優先すべきか? これら2つの競合しているように見える目標が,コミュニティを分割して一方に注目する一方で,本論文では,両アプローチが同時に可能であることを示す。具体的には,モデルの2つの義務を2つの予測タスクで明確に分離する,新しいフェデレーション学習フレームワークを提案する。一方で,非同一のクラス分布に対して頑健な損失のファミリーを導入することで,クライアントが相互に一貫した目標を持った汎用予測子をトレーニングできる。一方、パーソナライズされた予測器を軽量適応モジュールとして定式化し、汎用予測器上で各クライアントの経験的リスクを最小限に抑えることを学習する。 Federated Robust Decoupling Fed-RoDと名付けられたこの2つの余分な2つの予測フレームワークによって、学習モデルは、最先端の汎用的かつパーソナライズされたパフォーマンスを同時に達成することができ、基本的に2つのタスクをブリッジします。 Federated learning is promising for its ability to collaboratively train models with multiple clients without accessing their data, but vulnerable when clients' data distributions diverge from each other. This divergence further leads to a dilemma: "Should we prioritize the learned model's generic performance (for future use at the server) or its personalized performance (for each client)?" These two, seemingly competing goals have divided the community to focus on one or the other, yet in this paper we show that it is possible to approach both at the same time. Concretely, we propose a novel federated learning framework that explicitly decouples a model's dual duties with two prediction tasks. On the one hand, we introduce a family of losses that are robust to non-identical class distributions, enabling clients to train a generic predictor with a consistent objective across them. On the other hand, we formulate the personalized predictor as a lightweight adaptive module that is learned to minimize each client's empirical risk on top of the generic predictor. 
With this two-loss, two-predictor framework, which we name Federated Robust Decoupling (Fed-RoD), the learned model can simultaneously achieve state-of-the-art generic and personalized performance, essentially bridging the two tasks.	翻訳日:2021-07-05 13:02:56 公開日:2021-07-02
# 後方対応型予測更新:確率論的アプローチ Backward-Compatible Prediction Updates: A Probabilistic Approach ( http://arxiv.org/abs/2107.01057v1 ) ライセンス: Link先を確認	Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler	(参考訳) 機械学習システムが現実世界のアプリケーションに適合する場合、精度はいくつかの要件の1つに過ぎません。本稿では,事前学習済みで定期的に改善される最先端モデルが利用しやすくなっていることに由来する相補的視点について検討する。新しい改善されたモデルは速いペースで進化するが、下流のタスクはよりゆっくりと変化するか、一定である。正確な予測を維持したい、大きなラベルのないデータセットがあると仮定する。新しく、おそらくより優れたMLモデルが利用可能になるたびに、2つの問題に直面する。 (i) 予算が限られている場合、どのデータポイントが新しいモデルで再評価されるべきか? (ii) もし新しい予測が現在の予測と違うなら、更新すべきだろうか? 問題 (i) は計算コストであり、非常に大きなデータセットとモデルにとって重要である。問題 (ii) は予測の整合性を維持することであり、これは下流のアプリケーションに非常に関係がある。我々の要求は、負のフリップ、すなわち正しい予測を誤った予測に変えることを避けることである。本稿では,予測更新問題を定式化し,上記の問題に対する効率的な確率的アプローチを提案する。標準分類ベンチマークデータセットの広範な実験において,提案手法は後方互換性のある予測更新のための重要な指標に沿って,代替戦略よりも優れていることを示す。 When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly improving state-of-the-art models. While new improved models develop at a fast pace, downstream tasks vary more slowly or stay constant. Assume that we have a large unlabelled data set for which we want to maintain accurate predictions. Whenever a new and presumably better ML model becomes available, we encounter two problems: (i) given a limited budget, which data points should be re-evaluated using the new model?; and (ii) if the new predictions differ from the current ones, should we update? Problem (i) is about compute cost, which matters for very large data sets and models. Problem (ii) is about maintaining consistency of the predictions, which can be highly relevant for downstream applications; our demand is to avoid negative flips, i.e., changing correct to incorrect predictions. In this paper, we formalize the Prediction Update Problem and present an efficient probabilistic approach as answer to the above questions. 
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.	翻訳日:2021-07-05 13:02:32 公開日:2021-07-02
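The "negative flip" notion in the abstract above (a prediction that changes from correct to incorrect after a model update) can be illustrated with a small sketch. This is only an illustration of the metric, not the paper's probabilistic method: the function name `negative_flip_rate` and the toy labels are assumptions, and the sketch presumes ground-truth labels are available, which the unlabelled setting described in the abstract does not provide.

```python
def negative_flip_rate(y_true, old_pred, new_pred):
    """Fraction of samples whose prediction changes from correct to incorrect."""
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if not y_true:
        return 0.0
    flips = sum(
        1
        for t, o, n in zip(y_true, old_pred, new_pred)
        if o == t and n != t  # old model was right, new model is wrong
    )
    return flips / len(y_true)


# Toy example (made-up labels): the new model fixes the second sample
# but breaks the third one, so one of four samples is a negative flip.
y_true = [0, 1, 2, 1]
old_pred = [0, 0, 2, 1]
new_pred = [0, 1, 1, 1]
rate = negative_flip_rate(y_true, old_pred, new_pred)
```

Note that both models have the same accuracy on this toy data (three of four correct), yet the update still introduces a negative flip; that is the kind of inconsistency the abstract asks to avoid.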
# CHISEL: 深層学習による屋内局所化の精度向上 CHISEL: Compression-Aware High-Accuracy Embedded Indoor Localization with Deep Learning ( http://arxiv.org/abs/2107.01192v1 ) ライセンス: Link先を確認	Liping Wang, Saideep Tiku, Sudeep Pasricha	(参考訳) GPS技術は、私たちが屋外でローカライズし、ナビゲートする方法に革命をもたらした。しかし、建物内のGPS信号の受信が貧弱なため、屋内でのローカライゼーションには適さない。 WiFi指紋認証による屋内位置特定は、この需要を満たす最も有望な方法の1つだ。残念なことに、ドメイン内のほとんどの作業は、リソース制限された組み込みデバイスへのデプロイ可能性に関する課題を解決できない。そこで本研究では,組込みデバイスにおけるローカライズロバスト性を維持しつつ,その領域でよく知られた作業より優れる圧縮認識・高精度深層学習フレームワークCHISELを提案する。 GPS technology has revolutionized the way we localize and navigate outdoors. However, the poor reception of GPS signals in buildings makes it unsuitable for indoor localization. WiFi fingerprinting-based indoor localization is one of the most promising ways to meet this demand. Unfortunately, most work in the domain fails to resolve challenges associated with deployability on resource-limited embedded devices. In this work, we propose a compression-aware and high-accuracy deep learning framework called CHISEL that outperforms the best-known works in the area while maintaining localization robustness on embedded devices.	翻訳日:2021-07-05 13:02:16 公開日:2021-07-02
# 相対密度比推定のためのメタラーニング Meta-Learning for Relative Density-Ratio Estimation ( http://arxiv.org/abs/2107.00801v1 ) ライセンス: Link先を確認	Atsutoshi Kumagai and Tomoharu Iwata and Yasuhiro Fujiwara	(参考訳) 密度比と呼ばれる2つの確率密度の比率は、機械学習において重要な量である。特に、密度比の有界拡大である相対密度比は、その安定性から多くの注目を集めており、外乱検出やデータセット比較といった様々な用途で利用されてきた。相対密度比推定(DRE)の既存の方法は、両方の密度から多くのインスタンスを必要とする。しかし、実際には十分なインスタンスは利用できないことが多い。本稿では,関係データセットの知識を用いて,少数の事例から相対密度比を推定する,相対DREのメタラーニング手法を提案する。具体的には、いくつかのインスタンスからなる2つのデータセットを与えられた場合、ニューラルネットワークを用いてデータセットの情報を抽出し、相対dreに適したインスタンス埋め込みを得る。我々は,大域的最適解を閉形式解として得られる埋め込み空間上の線形モデルを用いて相対密度比をモデル化する。クローズドフォームソリューションはいくつかのインスタンスへの高速かつ効果的な適応を可能にし、その微分可能性により、相対的なDREに対するテストエラーが、少数のインスタンスに適応した後、明示的に最小化できるようにモデルを訓練することができる。提案手法の有効性を,相対的DRE,データセット比較,外乱検出の3つの問題を用いて実証的に実証した。 The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, a relative density-ratio, which is a bounded extension of the density-ratio, has received much attention due to its stability and has been used in various applications such as outlier detection and dataset comparison. Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities. However, sufficient instances are often unavailable in practice. In this paper, we propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets. Specifically, given two datasets that consist of a few instances, our model extracts the datasets' information by using neural networks and uses it to obtain instance embeddings appropriate for the relative DRE. We model the relative density-ratio by a linear model on the embedded space, whose global optimum solution can be obtained as a closed-form solution. 
The closed-form solution enables fast and effective adaptation to a few instances, and its differentiability enables us to train our model such that the expected test error for relative DRE can be explicitly minimized after adapting to a few instances. We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.	翻訳日:2021-07-05 13:01:36 公開日:2021-07-02
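To make the "bounded extension" remark concrete, here is a minimal sketch of the relative density-ratio as it is commonly defined in the density-ratio literature, r_alpha(x) = p(x) / (alpha p(x) + (1 - alpha) q(x)). The abstract itself does not spell out this formula, so treat the definition, the helper names, and the two toy Gaussian densities as assumptions; the sketch does not implement the meta-learning method.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def relative_density_ratio(p_x, q_x, alpha):
    """r_alpha(x) = p(x) / (alpha * p(x) + (1 - alpha) * q(x)), 0 <= alpha < 1."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return p_x / (alpha * p_x + (1.0 - alpha) * q_x)


# Toy densities (assumed for illustration): p = N(2, 1), q = N(0, 1).
x = 4.0
p_x = gaussian_pdf(x, mean=2.0, std=1.0)
q_x = gaussian_pdf(x, mean=0.0, std=1.0)

alpha = 0.5
plain_ratio = relative_density_ratio(p_x, q_x, alpha=0.0)  # ordinary p(x) / q(x)
relative_ratio = relative_density_ratio(p_x, q_x, alpha=alpha)
```

In the tail of q the ordinary ratio grows without bound, while for alpha > 0 the relative ratio can never exceed 1 / alpha, which is the stability property the abstract refers to.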
# 教師なし特徴選択のための少数ショット学習 Few-shot Learning for Unsupervised Feature Selection ( http://arxiv.org/abs/2107.00816v1 ) ライセンス: Link先を確認	Atsutoshi Kumagai and Tomoharu Iwata and Yasuhiro Fujiwara	(参考訳) そこで本稿では,ラベル付きデータに含まれる特徴のサブセットを選択するタスクである,教師なし特徴選択のための数ショット学習手法を提案する。既存のメソッドは通常、機能選択に多くのインスタンスを必要とする。しかし、実際には十分なインスタンスは利用できないことが多い。提案手法では,複数のソースタスクでラベルなしインスタンスをトレーニングすることにより,いくつかのラベルなしターゲットインスタンスが与えられた場合,対象タスクの関連機能のサブセットを選択できる。我々のモデルは特徴セレクタとデコーダで構成される。特徴セレクタは、いくつかの未ラベルのインスタンスを入力として取り込んだ関連する機能のサブセットを出力し、デコーダは選択したインスタンスから未表示のインスタンスのオリジナル機能を再構築することができる。特徴セレクタは、具体的ランダム変数を使用して、勾配降下による特徴を選択する。いくつかのラベルなしインスタンスからモデルにタスク固有の特性をエンコードするために、いくつかのラベルなしインスタンスを入力とする置換不変ニューラルネットワークを用いて具体的確率変数とデコーダをモデル化する。私たちのモデルは、ソースタスクのデータセットで計算されたいくつかのラベルなしインスタンスに対して、期待されるテスト再構成エラーを最小化することでトレーニングされます。提案手法が既存の特徴選択法より優れていることを示す。 We propose a few-shot learning method for unsupervised feature selection, which is a task to select a subset of relevant features in unlabeled data. Existing methods usually require many instances for feature selection. However, sufficient instances are often unavailable in practice. The proposed method can select a subset of relevant features in a target task given a few unlabeled target instances by training with unlabeled instances in multiple source tasks. Our model consists of a feature selector and decoder. The feature selector outputs a subset of relevant features taking a few unlabeled instances as input such that the decoder can reconstruct the original features of unseen instances from the selected ones. The feature selector uses the Concrete random variables to select features via gradient descent. To encode task-specific properties from a few unlabeled instances to the model, the Concrete random variables and decoder are modeled using permutation-invariant neural networks that take a few unlabeled instances as input. Our model is trained by minimizing the expected test reconstruction error given a few unlabeled instances that is calculated with datasets in source tasks. 
We experimentally demonstrate that the proposed method outperforms existing feature selection methods.	翻訳日:2021-07-05 13:01:16 公開日:2021-07-02
# 視覚モデルに基づく強化学習における因果発見の体系的評価 Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning ( http://arxiv.org/abs/2107.00848v1 ) ライセンス: Link先を確認	Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal	(参考訳) 観察から因果関係を誘導することは機械学習の古典的な問題である。ほとんどの因果関係の研究は、因果変数自体が観察されるという前提から始まる。しかし、ロボットのようなAIエージェントが環境を理解しようとする場合、観測可能な変数は画像中のピクセルのような低レベル変数のみである。適切に一般化するには、エージェントは高レベルの変数、特に因果変数に影響を受ける変数を誘導する必要がある。 aiと因果関係の中心的な目標は、抽象表現と因果構造の共同発見である。しかし,因果誘導を研究する既存の環境は,パラメトリックに操作できない複雑なタスク固有の因果グラフ(ノード数,スパーシティ,因果連鎖長など)を持っているため,この目的には適さない。本研究の目的は,高レベル変数の表現とそれらの間の因果構造を学ぶ研究を促進することである。これらの変数や構造を同定する手法を体系的に探索するために,我々はRL環境のベンチマークスイートを設計する。本研究は,様々な表現学習アルゴリズムを文献から評価し,モデルに構造とモジュラリティを明示的に組み込むことが,モデルに基づく強化学習における因果的帰納に役立つことを見出した。 Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. 
We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.	翻訳日:2021-07-05 13:00:57 公開日:2021-07-02
# DeformRS: ランダムな平滑化による入力変形の認証 DeformRS: Certifying Input Deformations with Randomized Smoothing ( http://arxiv.org/abs/2107.00996v1 ) ライセンス: Link先を確認	Motasem Alfarra, Adel Bibi, Naeemullah Khan, Philip H. S. Torr, and Bernard Ghanem	(参考訳) 深層ニューラルネットワークは、画素変位のベクトル場の形での入力変形や、平行移動、回転など他のパラメータ化された幾何学的変形に弱い。現在の入力変形認証法は、(i)大きな入力データセット上のディープネットワークにスケールしないか、(ii)特定の種類の変形(例えば回転のみ)しか認証できないかのいずれかである。一般ベクトル場およびパラメータ化変形の両方についてランダムな平滑化設定における認証を再構成し,それぞれDeformRS-VFとDeformRS-Parを提案する。我々の新しい定式化は、大きな入力データセット上の大きなネットワークにスケールする。例えば、DeformRS-Parは、平行移動、回転、スケーリング、アフィン変形、および離散コサイン変換基底によってパラメータ化された変形などの視覚的に整列した変形を含む、豊富な変形を認証する。 MNIST、CIFAR10、ImageNetの大規模な実験により、DeformRS-Parは、認証された精度で既存の最先端技術よりも優れていることが示された。例えば、ImageNetにおいて[-10,10]度の範囲の回転摂動に対する認証精度が6%向上した。 Deep neural networks are vulnerable to input deformations in the form of vector fields of pixel displacements and to other parameterized geometric deformations e.g. translations, rotations, etc. Current input deformation certification methods either (i) do not scale to deep networks on large input datasets, or (ii) can only certify a specific class of deformations, e.g. only rotations. We reformulate certification in randomized smoothing setting for both general vector field and parameterized deformations and propose DeformRS-VF and DeformRS-Par, respectively. Our new formulation scales to large networks on large input datasets. For instance, DeformRS-Par certifies rich deformations, covering translations, rotations, scaling, affine deformations, and other visually aligned deformations such as ones parameterized by Discrete-Cosine-Transform basis. Extensive experiments on MNIST, CIFAR10 and ImageNet show that DeformRS-Par outperforms existing state-of-the-art in certified accuracy, e.g. improved certified accuracy of 6% against perturbed rotations in the set [-10,10] degrees on ImageNet.	翻訳日:2021-07-05 13:00:32 公開日:2021-07-02
# 大規模画像を用いたメモリ効率の良いメタラーニング Memory Efficient Meta-Learning with Large Images ( http://arxiv.org/abs/2107.01105v1 ) ライセンス: Link先を確認	John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner	(参考訳) 少数ショット分類へのメタ学習のアプローチは、新しいタスクを学ぶのにほんの数回の最適化ステップや1回のフォワードパスを必要とするテスト時に計算効率が良いが、トレーニングにはメモリ集約性が高い。この制限は、最大1000枚の画像を含むタスク全体のサポートセットを最適化ステップが取られる前に処理しなければならないため生じます。大規模なイメージで提供されるパフォーマンス向上を活用するには、複数のgpuでメタリアナーを並列化するか、メモリ制約が適用できない場合のタスクとイメージサイズのトレードオフが必要となる。単一のgpu上で大きなイメージからなる大きなタスクのメタトレーニングを可能にする汎用およびメモリ効率のよいエピソディックトレーニングスキームであるliteを提案することで、両方のオプションを改善した。我々は,タスクの勾配を,タスクの訓練画像上の勾配の和に分解することができることを観察することによって達成した。これにより、タスク全体のトレーニングセットでフォワードパスを実行できるが、全勾配の偏りのない近似であるこれらの画像のランダムなサブセットのみをバックプロパゲーションすることで、大幅なメモリ節約を実現することができる。我々は、LITEを用いてメタラーナーのトレーニングを行い、実際のORBITベンチマークで新しい最先端の精度を示し、主要なメタラーナーと比較して挑戦的なVTAB+MDベンチマークの4つの部分のうち3つを示す。 LITEはまた、メタ学習者がトランスファーラーニングアプローチと競合することを可能にするが、テストタイムの計算コストのごく一部で、トランスファーラーニングが数ショットの分類に必要なすべてである、という最近の物語の対極として機能する。 Meta learning approaches to few-shot classification are computationally efficient at test time requiring just a few optimization steps or single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task's training images. 
This enables us to perform a forward pass on a task's entire training set but realize significant memory savings by back-propagating only a random subset of these images which we show is an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB+MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.	翻訳日:2021-07-05 13:00:12 公開日:2021-07-02
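The unbiasedness claim in the abstract above can be checked on a toy case. The sketch below assumes the task gradient is a plain sum of per-image gradients (here scalars, with made-up values) and that the subset estimate is rescaled by N / H, where N is the number of images and H the subset size; the rescaling factor and all names are illustrative assumptions, not details taken from the abstract. Averaging the estimate over every possible subset recovers the full gradient.

```python
from itertools import combinations


def subset_gradient_estimate(per_example_grads, subset):
    """Rescaled sum of the gradients of the chosen examples (scale N / H)."""
    n = len(per_example_grads)
    h = len(subset)
    return (n / h) * sum(per_example_grads[i] for i in subset)


# Made-up per-image gradient contributions for one task.
grads = [0.5, -1.0, 2.0, 0.25, 1.25]
full_gradient = sum(grads)

# Enumerate every subset of size H and average the resulting estimates.
H = 2
estimates = [
    subset_gradient_estimate(grads, subset)
    for subset in combinations(range(len(grads)), H)
]
mean_estimate = sum(estimates) / len(estimates)
```

Each individual estimate can be far from the full gradient; only the expectation over the random subset matches it, which is what "unbiased" means here. Back-propagating through H images instead of N is where the memory saving would come from.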
# ケースリレーショナルトランスフォーマー:命令フェッチのためのクロスモーダル言語生成モデル Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions ( http://arxiv.org/abs/2107.00789v1 ) ライセンス: Link先を確認	Motonari Kambara and Komei Sugiura	(参考訳) 家庭内サービスロボットのコミュニケーション能力を向上させるためのロボット工学の研究が数多く行われている。しかし、ほとんどの研究は、トレーニングデータセットが十分に大きくないため、最近のディープニューラルネットワークの進歩の恩恵を受けていない。本稿では,クロスモーダル言語生成モデルに基づくデータセットの強化を目的とする。画像から「青いフリップフロップを左下ボックスに移動させる」というようなフェッチング命令文を生成するケース関係変換器(CRT)を提案する。既存の方法とは異なり、CRTはTransformerを使用して画像内のオブジェクトの視覚的特徴と幾何学的特徴を統合する。 CRTはケースリレーショナルブロックのためにオブジェクトを処理することができる。比較実験と人的評価を行った。実験の結果,crtはベースライン法を上回った。 There have been many studies in robotics to improve the communication skills of domestic service robots. Most studies, however, have not fully benefited from recent advances in deep neural networks because the training datasets are not large enough. In this paper, our aim is to augment the datasets based on a crossmodal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses the Transformer to integrate the visual features and geometry features of objects in the image. The CRT can handle the objects because of the Case Relation Block. We conducted comparison experiments and a human evaluation. The experimental results show the CRT outperforms baseline methods.	翻訳日:2021-07-05 12:59:44 公開日:2021-07-02
# target-dependent uniter: 国内サービスロボットのためのトランスフォーマーベースのマルチモーダル言語理解モデル Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots ( http://arxiv.org/abs/2107.00811v1 ) ライセンス: Link先を確認	Shintaro Ishikawa and Komei Sugiura	(参考訳) 現在、国内サービスロボットは言語を通して自然に対話する能力が不十分である。これは、人間の指示を理解するのに様々な曖昧さや情報不足が複雑であるからである。既存手法では,オブジェクト間の関係を規定する参照表現は十分にモデル化されていない。本稿では,画像全体ではなく,画像内の関連領域に焦点をあてることで,対象オブジェクトと他のオブジェクトの関係を直接学習するターゲット依存型UNITERを提案する。本手法は汎用データセット上で事前学習可能なユニバーサベースのトランスフォーマの拡張である。対象候補を扱うための新しいアーキテクチャを導入することで、UNITERアプローチを拡張します。本モデルでは,2つの標準データセットに対して検証を行い,分類精度の点で,ターゲット依存型UNITERがベースライン法より優れていることを示す。 Currently, domestic service robots have an insufficient ability to interact naturally through language. This is because understanding human instructions is complicated by various ambiguities and missing information. In existing methods, the referring expressions that specify the relationships between objects are insufficiently modeled. In this paper, we propose Target-dependent UNITER, which learns the relationship between the target object and other objects directly by focusing on the relevant regions within an image, rather than the whole image. Our method is an extension of the UNITER-based Transformer that can be pretrained on general-purpose datasets. We extend the UNITER approach by introducing a new architecture for handling the target candidates. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.	翻訳日:2021-07-05 12:59:31 公開日:2021-07-02
# データセットからグラフを生成する学習による高速ニューラルネットワーク検索 Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets ( http://arxiv.org/abs/2107.00860v1 ) ライセンス: Link先を確認	Hayeon Lee, Eunyoung Hyung, Sung Ju Hwang	(参考訳) 最近のnas(neural architecture search)法の成功は、人間が設計したネットワークをほとんど上回るネットワークを出力していることを示したが、従来のnas法は、単一のタスク(データセット)に対するネットワークアーキテクチャの探索の最適化に主に取り組んできた。さらに、タスク固有の手法は、与えられたタスクごとにスクラッチからニューラルアーキテクチャを探索するので、時間と金銭の予算が限られている場合に問題となる大きな計算コストが発生する。本稿では,データセットと事前学習ネットワークからなるデータベース上で1度トレーニングし,新しいデータセットのためのニューラルネットワークを高速に検索できる効率的なNASフレームワークを提案する。提案したMetaD2A(Meta Dataset-to-Architecture)モデルは、アモータイズされたメタラーニングで学習したクロスモーダル潜在空間を介して、与えられたセット(データセット)からグラフ(アーキテクチャ)を確率的に生成することができる。さらに,目的とするデータセットを直接トレーニングすることなく,最適なアーキテクチャを推定し選択するメタパフォーマンス予測器を提案する。実験の結果,画像Net-1KのサブセットとNAS-Bench 201の検索空間からのアーキテクチャに基づいてメタ学習したモデルが,CIFAR-10やCIFAR-100を含む複数の未知のデータセットに平均33GPU秒で一般化できることが示されている。 mobilenetv3の検索空間でも、metad2aは転送可能なnasメソッドであるnsganetv2よりも5.5k倍高速で、同等の性能を持つ。 metad2aは、過去数年間に蓄積されたデータセットやアーキテクチャの豊富なデータベースからの知識を活用する方法だけでなく、rapid nasの新しい研究方向性を提案していると信じています。コードはhttps://github.com/HayeonLee/MetaD2Aで入手できる。 Despite the success of recent Neural Architecture Search (NAS) methods on various tasks which have shown to output networks that largely outperform human-designed networks, conventional NAS methods have mostly tackled the optimization of searching for the network architecture for a single task (dataset), which does not generalize well across multiple tasks (datasets). Moreover, since such task-specific methods search for a neural architecture from scratch for every given task, they incur a large computational cost, which is problematic when the time and monetary budget are limited. In this paper, we propose an efficient NAS framework that is trained once on a database consisting of datasets and pretrained networks and can rapidly search for a neural architecture for a novel dataset. 
The proposed MetaD2A (Meta Dataset-to-Architecture) model can stochastically generate graphs (architectures) from a given set (dataset) via a cross-modal latent space learned with amortized meta-learning. Moreover, we also propose a meta-performance predictor to estimate and select the best architecture without direct training on target datasets. The experimental results demonstrate that our model meta-learned on subsets of ImageNet-1K and architectures from NAS-Bench 201 search space successfully generalizes to multiple unseen datasets including CIFAR-10 and CIFAR-100, with an average search time of 33 GPU seconds. Even under MobileNetV3 search space, MetaD2A is 5.5K times faster than NSGANetV2, a transferable NAS method, with comparable performance. We believe that the MetaD2A proposes a new research direction for rapid NAS as well as ways to utilize the knowledge from rich databases of datasets and architectures accumulated over the past years. Code is available at https://github.com/HayeonLee/MetaD2A.	翻訳日:2021-07-05 12:58:57 公開日:2021-07-02
# DUKweb: UK Web Archive corpusのダイアクロニックな単語表現 DUKweb: Diachronic word representations from the UK Web Archive corpus ( http://arxiv.org/abs/2107.01076v1 ) ライセンス: Link先を確認	Adam Tsakalidis, Pierpaolo Basile, Marya Bazzi, Mihai Cucuringu and Barbara McGillivray	(参考訳) 語彙的意味変化(単語の意味と用法の変化を検出する)は、自然言語処理だけでなく、社会・文化研究においても重要な課題である。ダイアクロニック単語の埋め込み(意味を保存する単語の時間感受性ベクトル表現)がこのタスクの標準リソースとなっている。しかし、その世代に必要な重要な計算資源を考えると、ダイアクロニックな単語の埋め込みを科学界で利用できる資源はごくわずかである。本稿では,現代英語のダイアクロニック解析のための大規模リソースセットであるDUKwebについて述べる。 DUKweb は JISC UK Web Domain Dataset (1996-2013) から作成され、".uk" で終わるドメインにホストされたインターネットアーカイブからリソースを収集する非常に大規模なアーカイブである。 DUKwebは一連の単語共起行列と、JISC UK Web Domainデータセットに毎年2種類の単語埋め込みで構成されている。 dukwebの再利用可能性とその品質基準を,単語の意味変化検出を事例として示す。 Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in `.uk'. DUKweb consists of a series word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain dataset. We show the reuse potential of DUKweb and its quality standards via a case study on word meaning change detection.	翻訳日:2021-07-05 12:57:56 公開日:2021-07-02
# 知識グラフとコミュニティ認識感情を用いた新しい深層強化学習に基づくストック方向予測 A Novel Deep Reinforcement Learning Based Stock Direction Prediction using Knowledge Graph and Community Aware Sentiments ( http://arxiv.org/abs/2107.00931v1 ) ライセンス: Link先を確認	Anil Berk Altuner, Zeynep Hilal Kilimci	(参考訳) 株式市場の予測は投資家、研究者、アナリストにとって重要なトピックだ。あまりにも多くの要因に影響されているため、株式市場の予測は扱いにくい。本研究では,地域社会の感情と知識グラフを用いた株式の方向性予測のための深層強化学習手法に基づく新しい手法を提案する。この目的のために,まず接続関係を解析し,ユーザの社会的知識グラフを構築する。その後、関連するストックと感情分析の時系列分析と深い補強手法をブレンドする。トルコ版のトランスフォーマー(berturk)による双方向エンコーダ表現は、ユーザの感情分析に用いられ、深層q学習手法は、深層qネットワークを構築するために提案されたモデルの深層強化学習側に使用されている。このモデルの有効性を示すために、イスタンブール証券取引所のGaranti Bank(GARAN)、Akbank(AKBNK)、T\"urkiye \.I\c{s} Bankas{\i}(ISCTR)がケーススタディとして使用されている。実験の結果,提案手法は株式市場予測タスクにおいて顕著な結果を得た。 Stock market prediction has been an important topic for investors, researchers, and analysts. Because it is affected by too many factors, stock market prediction is a difficult task to handle. In this study, we propose a novel method that is based on deep reinforcement learning methodologies for the direction prediction of stocks using sentiments of community and knowledge graph. For this purpose, we firstly construct a social knowledge graph of users by analyzing relations between connections. After that, time series analysis of related stock and sentiment analysis is blended with deep reinforcement methodology. Turkish version of Bidirectional Encoder Representations from Transformers (BerTurk) is employed to analyze the sentiments of the users while deep Q-learning methodology is used for the deep reinforcement learning side of the proposed model to construct the deep Q network. In order to demonstrate the effectiveness of the proposed model, Garanti Bank (GARAN), Akbank (AKBNK), T\"urkiye \.I\c{s} Bankas{\i} (ISCTR) stocks in Istanbul Stock Exchange are used as a case study. Experiment results show that the proposed novel model achieves remarkable results for stock market prediction task.	翻訳日:2021-07-05 12:57:25 公開日:2021-07-02
# 決定木ヒューリスティックスは滑らかな設定でも失敗する可能性がある Decision tree heuristics can fail, even in the smoothed setting ( http://arxiv.org/abs/2107.00819v1 ) ライセンス: Link先を確認	Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan	(参考訳) 貪欲な決定木学習ヒューリスティックスは、機械学習の実践の主役であるが、その経験的成功に対する理論的正当化は、いまだ解明されていない。実際、それらがひどく失敗する単純な対象関数があることは長年知られている(Kearns and Mansour, STOC 1996)。 Brutzkus, Daniely, and Malach (COLT 2020) の最近の研究は、平滑化解析モデルを、この乖離を解消するための道筋として考察している。彼らは、平滑化設定において目標$f$が$k$-juntaである場合、これらのヒューリスティックスが深さ$k$の決定木仮説で$f$を学習できることを示した。彼らは、同じ保証がより一般に深さ$k$の決定木である目標に対しても成り立つと予想した。我々はこの予想に対する反例を与える。すなわち、深さ$k$の決定木であるターゲットを構築し、平滑化設定であっても、これらのヒューリスティックスは高い精度を達成する前に深さ$2^{\Omega(k)}$のツリーを構築することを示す。また、Brutzkusらの保証が不可知論的(agnostic)設定には拡張できないことも示す。すなわち、$k$-juntaに非常に近い目標であって、これらのヒューリスティックスが高い精度を達成する前に深さ$2^{\Omega(k)}$のツリーを構築するものが存在する。 Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets $f$ that are $k$-juntas, they showed that these heuristics successfully learn $f$ with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to $k$-juntas, for which these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy.	翻訳日:2021-07-05 12:57:00 公開日:2021-07-02
# 高次元非パラメトリック仮説検定のための一般化多変量符号 Generalized Multivariate Signs for Nonparametric Hypothesis Testing in High Dimensions ( http://arxiv.org/abs/2107.01103v1 ) ライセンス: Link先を確認	Subhabrata Majumdar, Snigdhansu Chatterjee	(参考訳) 特徴空間の次元がサンプルサイズよりはるかに大きい高次元のデータは、多くの統計応用において生じる。この文脈では、一般化された多変量記号変換を構築し、そのノルムによって分割されたベクトルとして定義される。ノルム関数の異なる選択に対して、変換されたベクトルはデータ分布の幾何学的特徴に適応する。このアイデアに基づいて、これらの一般化符号ベクトルを用いて、高次元データの平均ベクトルに対する1サンプルおよび2サンプルの試験手順を得る。これらのテストはカーネル内積を用いたu-統計に基づいており、禁止的な仮定は必要とせず、高速なランダム化ベースの実装に適応できる。複数のデータ設定の実験を通じて、一般的な符号を用いたテストは、名目上のタイプiエラー率を維持しつつ、既存のテストよりも高いパワーを示すことを示した。最後に、mnist と minnesota twin studies のゲノムデータに関するサンプルアプリケーションを提供する。 High-dimensional data, where the dimension of the feature space is much larger than sample size, arise in a number of statistical applications. In this context, we construct the generalized multivariate sign transformation, defined as a vector divided by its norm. For different choices of the norm function, the resulting transformed vector adapts to certain geometrical features of the data distribution. Building up on this idea, we obtain one-sample and two-sample testing procedures for mean vectors of high-dimensional data using these generalized sign vectors. These tests are based on U-statistics using kernel inner products, do not require prohibitive assumptions, and are amenable to a fast randomization-based implementation. Through experiments in a number of data settings, we show that tests using generalized signs display higher power than existing tests, while maintaining nominal type-I error rates. Finally, we provide example applications on the MNIST and Minnesota Twin Studies genomic data.	翻訳日:2021-07-05 12:56:22 公開日:2021-07-02
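The abstract above defines the generalized multivariate sign as a vector divided by its norm. The following is a minimal sketch of that transformation only, not the authors' testing procedure. The function names, the Euclidean norm as the default, and the convention of mapping the zero vector to zero are assumptions made for illustration; the paper considers other choices of norm function.

```python
import math

def euclidean_norm(x):
    """Ordinary L2 norm; one possible choice of norm function (assumed default)."""
    return math.sqrt(sum(v * v for v in x))

def generalized_sign(x, norm=euclidean_norm):
    """Return x divided by norm(x); the zero vector is mapped to zero by convention."""
    n = norm(x)
    if n == 0.0:
        return [0.0 for _ in x]
    return [v / n for v in x]

# With the Euclidean norm, every nonzero vector lands on the unit sphere.
sign = generalized_sign([3.0, 4.0])

# Swapping the norm changes the geometry, here the L1 norm as a second example.
l1_sign = generalized_sign([3.0, 4.0], norm=lambda x: sum(abs(v) for v in x))
```

Passing the norm as a parameter mirrors the abstract's point that different norm functions adapt the transformed vector to different geometric features of the data.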
# 物理インスパイアされたグラフニューラルネットワークによる組合せ最適化 Combinatorial Optimization with Physics-Inspired Graph Neural Networks ( http://arxiv.org/abs/2107.01188v1 ) ライセンス: Link先を確認	Martin J. A. Schuetz, J. Kyle Brubaker, Helmut G. Katzgraber	(参考訳) 組合せ最適化問題の解法としてグラフニューラルネットワークを用いる方法を示す。本手法は,最大カット,最小頂点被覆,最大独立集合,イジングスピングラスおよび多項式非拘束二元最適化問題の形式での高次一般化といった二次非拘束二元最適化問題の形式において,正準np-ハード問題に対して広く適用できる。グラフニューラルネットワークをトレーニングし、教師なし学習プロセスが完了すると、単純なプロジェクションを整数変数に適用する、微分可能な損失関数を生成するために、ハミルトン問題に緩和戦略を適用する。正準最大カットと最大独立集合問題に対する数値計算結果を用いて本手法を実証する。グラフニューラルネットワークオプティマイザが既存のソルバと同等かそれ以上の性能を発揮し、数百万の変数を持つ問題に対して最先端を超えてスケールすることができることが分かりました。 We demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to canonical NP-hard problems in the form of quadratic unconstrained binary optimization problems, such as maximum cut, minimum vertex cover, maximum independent set, as well as Ising spin glasses and higher-order generalizations thereof in the form of polynomial unconstrained binary optimization problems. We apply a relaxation strategy to the problem Hamiltonian to generate a differentiable loss function with which we train the graph neural network and apply a simple projection to integer variables once the unsupervised training process has completed. We showcase our approach with numerical results for the canonical maximum cut and maximum independent set problems. We find that the graph neural network optimizer performs on par or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables.	翻訳日:2021-07-05 12:56:07 公開日:2021-07-02
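The relax-then-project idea in the abstract above can be illustrated on maximum cut. This is a hedged sketch, not the authors' code: it uses one standard QUBO form of the Max-Cut Hamiltonian, and the soft assignments are hard-coded here, whereas in the paper they would be the outputs of a trained graph neural network. All names are hypothetical.

```python
def maxcut_hamiltonian(edges, x):
    """H(x) = sum over edges (i, j) of 2*x_i*x_j - x_i - x_j.

    For binary x this equals minus the number of cut edges, so minimizing H
    maximizes the cut. For x in [0, 1] the same expression is a differentiable
    relaxation that can serve as a loss.
    """
    return sum(2 * x[i] * x[j] - x[i] - x[j] for i, j in edges)

def project(soft, threshold=0.5):
    """Round relaxed values back to binary decisions (simple projection)."""
    return [1 if v >= threshold else 0 for v in soft]

# 4-cycle graph; its maximum cut contains all four edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Stand-in for network outputs (assumed values, not produced by a real model).
soft = [0.9, 0.2, 0.8, 0.1]

relaxed_loss = maxcut_hamiltonian(edges, soft)
assignment = project(soft)
cut_size = -maxcut_hamiltonian(edges, assignment)
```

The same function is evaluated twice on purpose: once on relaxed values as the training loss, and once on the projected binary assignment to read off the actual cut size.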
# マルチホップ機械読解のための不均一グラフ注意ネットワーク Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension ( http://arxiv.org/abs/2107.00841v1 ) ライセンス: Link先を確認	Feng Gao, Jian-Cheng Ni, Peng Gao, Zi-Li Zhou, Yan-Yan Li, Hamido Fujita	(参考訳) マルチホップ機械読解は自然言語処理において難しい課題であり、推論能力と説明可能性を必要とする。グラフ畳み込みネットワークに基づくスペクトルモデルは推論能力を与え、競争結果をもたらすが、その一部は人間の理解可能な方法で推論を分析するという課題に直面している。認知神経科学における祖母細胞の概念に触発されてcrnameと呼ばれる空間グラフ注目フレームワークが提案された。このモデルは、意味的特徴を多角表現に集約し、推論のための情報を自動的に集中または緩和するように設計されている。クエリの主題を手掛かりの出発点として、推論エンティティをブリッジポイントとして、潜在候補エンティティをおばあちゃんセルとして、手掛かりを候補エンティティとして考える。提案モデルでは, 推論グラフを可視化し, 2つのエンティティを接続するエッジの重要性と, 参照ノードと候補ノードの選択性を分析する。オープンドメインマルチホップ読解データセット WikiHop と Drug-drug Interactions データセット MedHop の公式評価は、我々のアプローチの有効性を証明し、分子生物学領域におけるモデルの適用可能性を示す。 Multi-hop machine reading comprehension is a challenging task in natural language processing, which requires more reasoning ability and explainability. Spectral models based on graph convolutional networks grant the inferring abilities and lead to competitive results, however, part of them still face the challenge of analyzing the reasoning in a human-understandable way. Inspired by the concept of the Grandmother Cells in cognitive neuroscience, a spatial graph attention framework named crname, imitating the procedure was proposed. This model is designed to assemble the semantic features in multi-angle representations and automatically concentrate or alleviate the information for reasoning. The name "crname" is a metaphor for the pattern of the model: regard the subjects of queries as the start points of clues, take the reasoning entities as bridge points, and consider the latent candidate entities as the grandmother cells, and the clues end up in candidate entities. The proposed model allows us to visualize the reasoning graph and analyze the importance of edges connecting two entities and the selectivity in the mention and candidate nodes, which can be easier to be comprehended empirically. 
The official evaluations on the open-domain multi-hop reading dataset WikiHop and the drug-drug interaction dataset MedHop demonstrate the validity of our approach and show the potential for applying the model in the molecular biology domain.	翻訳日:2021-07-05 12:55:11 公開日:2021-07-02
# 変圧器の学習トークンプルーニング Learned Token Pruning for Transformers ( http://arxiv.org/abs/2107.00910v1 ) ライセンス: Link先を確認	Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Joseph Hassoun, Kurt Keutzer	(参考訳) トランスフォーマーモデルのデプロイにおける大きな課題は、入力シーケンスの長さで2倍にスケールする禁止推論コストである。これにより、長いシーケンスを処理するのにトランスフォーマーを使うのが特に困難になる。そこで本研究では,データをトランスフォーマーの異なる層を通過する際に冗長なトークンを減少させる新しい学習トークンプルーニング(ltp)法を提案する。特に、LTPは、トレーニング中に学習した閾値未満の注意スコアでトークンをプルーネする。重要なことは、しきい値に基づく手法は、先行トークンプルーニング法で使用されるトップkトークン選択のようなアルゴリズム的に高価な操作を回避し、構造化プルーニングにつながることである。我々は,複数のグルータスクに対する我々のアプローチの性能を広範囲にテストし,学習しきい値に基づく手法が,従来のtop-kトークンベース手法を,同等のフラップで最大2%高い精度で一貫して上回ることを示した。さらに、我々の予備結果は、tesla t4 gpuとintel haswell cpuでそれぞれ1.4倍と1.9倍のスループット向上を示し、1%未満の精度低下(最大2.1倍のフロップス削減)でした。私たちのコードはPyTorchで開発され、オープンソース化されました。 A major challenge in deploying transformer models is their prohibitive inference cost, which quadratically scales with the input sequence length. This makes it especially difficult to use transformers for processing long sequences. To address this, we present a novel Learned Token Pruning (LTP) method that reduces redundant tokens as the data passes through the different layers of the transformer. In particular, LTP prunes tokens with an attention score below a threshold value, which is learned during training. Importantly, our threshold based method avoids algorithmically expensive operations such as top-k token selection which are used in prior token pruning methods, and also leads to structured pruning. We extensively test the performance of our approach on multiple GLUE tasks and show that our learned threshold based method consistently outperforms the prior state-of-the-art top-k token based method by up to ~2% higher accuracy with the same amount of FLOPs. Furthermore, our preliminary results show up to 1.4x and 1.9x throughput improvement on Tesla T4 GPU and Intel Haswell CPU, respectively, with less than 1% of accuracy drop (and up to 2.1x FLOPs reduction). Our code has been developed in PyTorch and has been open-sourced.	
翻訳日:2021-07-05 12:54:47 公開日:2021-07-02
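The contrast drawn in the abstract above, thresholding attention scores instead of selecting the top-k tokens, can be sketched as follows. This is an illustration under stated assumptions rather than the released LTP code: the score of a token is assumed to be the mean attention it receives over all heads and queries, and the threshold is a fixed constant here, whereas the paper learns it during training. Function names are hypothetical.

```python
def token_scores(attention):
    """attention[h][q][k] is the attention probability from query q to key k in head h.

    Returns, for each token k, the mean attention it receives (assumed scoring rule).
    """
    num_heads = len(attention)
    n = len(attention[0])
    return [
        sum(attention[h][q][k] for h in range(num_heads) for q in range(n))
        / (num_heads * n)
        for k in range(n)
    ]

def prune_by_threshold(scores, threshold):
    """Keep the indices of tokens whose score reaches the threshold.

    This is a single elementwise comparison, with no sorting or top-k selection.
    """
    return [i for i, s in enumerate(scores) if s >= threshold]

# One head, three tokens; each row is a softmax output and sums to 1.
attention = [[
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
]]

scores = token_scores(attention)
kept = prune_by_threshold(scores, threshold=0.2)
```

Because the number of surviving tokens is not fixed in advance, the kept sequence length adapts to each input, which is one practical difference from a top-k rule.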
# DRIFT:学術文献のダイアクロニック解析用ツールキット DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature ( http://arxiv.org/abs/2107.01198v1 ) ライセンス: Link先を確認	Abheesht Sharma, Gunjan Chhablani, Harshit Pandey, Rajaswa Patil	(参考訳) 本研究は,NLPコミュニティと研究コミュニティ全体を対象として,研究コーパスのダイアクロニック解析への応用について述べる。 DRIFTは、研究者が長年の研究動向や開発を追跡できるツールです。分析方法は、よく引用された研究成果と照合され、良い測定のためにいくつかの独自の方法が追加された。キーワード抽出、ワードクラウド、生産性による減少/停滞/成長傾向の予測、アクセラレーションプロットによるバイグラムの追跡、単語のセマンティックドリフトの検索、類似性によるトレンドの追跡などである。本ツールの有用性と有効性を示すため,本研究では,arxivリポジトリのcs.clコーパスをケーススタディとして,解析手法から推論を行う。ツールキットと関連するコードは以下の通りである。 In this work, we present to the NLP community, and to the wider research community as a whole, an application for the diachronic analysis of research corpora. We open source an easy-to-use tool coined: DRIFT, which allows researchers to track research trends and development over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure. Succinctly put, some of the analysis methods are: keyword extraction, word clouds, predicting declining/stagnant/growing trends using Productivity, tracking bi-grams using Acceleration plots, finding the Semantic Drift of words, tracking trends using similarity, etc. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods. The toolkit and the associated code are available here: https://github.com/rajaswa/DRIFT.	翻訳日:2021-07-05 12:54:25 公開日:2021-07-02
# 一般的なボードゲームの概念 General Board Game Concepts ( http://arxiv.org/abs/2107.01078v1 ) ライセンス: Link先を確認	\'Eric Piette, Matthew Stephenson, Dennis J.N.J. Soemers and Cameron Browne	(参考訳) 多くのゲームは、ルールやコントロール、プレーエリアなど、共通のアイデアや側面を共有していることが多い。しかし、ボードゲームにおける一般ゲームプレイング(GGP)の文脈では、この領域は未探索のままである。ゲームの概念を定式化するために,ゲームプレーヤやデザイナーが一般的に使用する用語に着想を得た。 Ludii General Game Systemを通じて、ゲーム自体、プレイされた動き、到達した状態など、さまざまなレベルの抽象化の概念を記述します。ゲームのludeme表現に関連する新しいggp機能は、多くの新しい研究ラインを開く。ハイパーエージェントセレクタの作成、ゲーム間のAI学習の転送、ゲーム用語を用いたAI技術の説明は、ゲームコンセプトを使用することで、すべて容易になる。ゲームコンセプトの恩恵を受けることができる他のアプリケーションとして、不完全な古代ゲームのためのもっともらしい再構成ルールの生成や、ボードゲームレコメンデータシステムの実装などが議論されている。 Many games often share common ideas or aspects between them, such as their rules, controls, or playing area. However, in the context of General Game Playing (GGP) for board games, this area remains under-explored. We propose to formalise the notion of "game concept", inspired by terms generally used by game players and designers. Through the Ludii General Game System, we describe concepts for several levels of abstraction, such as the game itself, the moves played, or the states reached. This new GGP feature associated with the ludeme representation of games opens many new lines of research. The creation of a hyper-agent selector, the transfer of AI learning between games, or explaining AI techniques using game terms, can all be facilitated by the use of game concepts. Other applications which can benefit from game concepts are also discussed, such as the generation of plausible reconstructed rules for incomplete ancient games, or the implementation of a board game recommender system.	翻訳日:2021-07-05 12:54:03 公開日:2021-07-02
# ファジィ推論を用いたファジィラフ集合の類似性計算と文類似性計算への応用 Computing Fuzzy Rough Set based Similarities with Fuzzy Inference and Its Application to Sentence Similarity Computations ( http://arxiv.org/abs/2107.01170v1 ) ライセンス: Link先を確認	Nidhika Yadav	(参考訳) ファジィラフ集合による解析において、2つのファジィ集合間の類似性を計算するためのいくつかの研究イニシアティブが提案されている。これらの手法は、下側類似度と上側類似度という2つの尺度をもたらす。しかし、ほとんどのアプリケーションでは、さらなる分析や結論の導出に役立つのは1つの値だけである。本稿では,ファジィ推論エンジンを用いて、ファジィラフ集合に基づく下側類似度と上側類似度を組み合わせる新しい手法を提案する。さらに,提案手法を文類似度の計算問題に適用し,SICK2014データセット上で評価した。 Several research initiatives have been proposed for computing similarity between two Fuzzy Sets in analysis through Fuzzy Rough Sets. These techniques yield two measures, viz. lower similarity and upper similarity, whereas in most applications only one value is useful for further analysis and for drawing conclusions. The aim of this paper is to propose a novel technique that combines Fuzzy Rough Set based lower similarity and upper similarity using a Fuzzy Inference Engine. Further, the proposed approach is applied to the problem of computing sentence similarity and has been evaluated on the SICK2014 dataset.	翻訳日:2021-07-05 12:53:49 公開日:2021-07-02
# utnet:医療用画像分割のためのハイブリッドトランスフォーマーアーキテクチャ UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation ( http://arxiv.org/abs/2107.00781v1 ) ライセンス: Link先を確認	Yunhe Gao, Mu Zhou, Dimitris Metaxas	(参考訳) トランスフォーマーアーキテクチャは多くの自然言語処理タスクで成功している。しかし、その医学的ビジョンへの応用はほとんど未解明のままである。本研究では,医用画像セグメンテーションを強化するために,自己意識を畳み込みニューラルネットワークに統合するシンプルなハイブリッドトランスフォーマーアーキテクチャUTNetを提案する。 UTNetはエンコーダとデコーダの両方に自己アテンションモジュールを適用し、最小限のオーバーヘッドで異なるスケールで長距離依存性をキャプチャする。そこで本研究では, 自己注意動作の複雑さを$O(n^2)$から$O(n)$に大幅に低減する, 相対的な位置符号化を伴う効率的な自己注意機構を提案する。エンコーダのスキップされた接続から細かな詳細を復元するために,新たな自己アテンションデコーダも提案されている。われわれのアプローチは、Transformerが視覚誘発バイアスを学ぶために大量のデータを必要とするジレンマに対処する。当社のハイブリッド層設計では,事前学習を必要とせずにTransformerを畳み込みネットワークに初期化する。我々は, UTNetをマルチラベル, マルチベンダ型心臓MRIコホートで評価した。 utnetは、最先端のアプローチに対して優れたセグメンテーション性能と堅牢性を示し、他の医療画像セグメンテーションをうまく一般化することを約束している。 Transformer architecture has emerged to be successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation. UTNet applies self-attention modules in both encoder and decoder for capturing long-range dependency at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism along with relative position encoding that reduces the complexity of self-attention operation significantly from $O(n^2)$ to approximate $O(n)$. A new self-attention decoder is also proposed to recover fine-grained details from the skipped connections in the encoder. Our approach addresses the dilemma that Transformer requires huge amounts of data to learn vision inductive bias. Our hybrid layer design allows the initialization of Transformer into convolutional networks without a need of pre-training. We have evaluated UTNet on the multi-label, multi-vendor cardiac magnetic resonance imaging cohort. 
UTNet demonstrates superior segmentation performance and robustness compared with state-of-the-art approaches, and holds promise for generalizing well to other medical image segmentation tasks.	翻訳日:2021-07-05 12:52:19 公開日:2021-07-02
# 偏光自己注意:高品質な画素ワイド回帰に向けて Polarized Self-Attention: Towards High-quality Pixel-wise Regression ( http://arxiv.org/abs/2107.00782v1 ) ライセンス: Link先を確認	Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang	(参考訳) ピクセル単位での回帰は、キーポイントのヒートマップやセグメンテーションマスクの推定など、コンピュータビジョンタスクにおいて最も一般的な問題である。これらの回帰問題は、特に低い計算オーバーヘッドで高分解能入力/出力の長距離依存性をモデル化し、高度に非線形なピクセル単位の意味論を推定する必要があるため、非常に困難である。ディープ畳み込みニューラルネットワーク(DCNN)の注意機構は、長距離依存の促進に人気があるが、非局所ブロックのような要素固有の注意は、学習に非常に複雑でノイズに敏感であり、単純化されたハイブリットのほとんどは、複数のタスクの間で最高の妥協点に達しようとしている。本稿では,高品質な画素ワイドレグレッションに向けた2つの重要な設計を取り入れた分極自己注意ブロックを提案する。(1)分極フィルタリング:チャネルと空間の注意計算において高い内部分解能を維持しつつ,入力テンソルを対応する次元に沿って完全に崩壊させる。 2)強化: 2次元ガウス分布(キーポイントヒートマップ)や2次元双対分布(バイナリセグメンテーションマスク)など,典型的な細粒度回帰の出力分布に直接適合する非線形性を構成する。 psaはチャネルのみのブランチと空間のみのブランチで表現能力を使い果たし、シーケンシャルレイアウトと並列レイアウトの間には限界的なメトリック差しかなかったようである。実験の結果、psaは標準ベースラインを2～4ドルのポイント増やし、2dポーズ推定とセマンティクスセグメンテーションベンチマークで1～2ドルのポイント増やすことが示されている。 Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are very challenging particularly because they require, at low computation overheads, modeling long-range dependencies on high-resolution inputs/outputs to estimate the highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks(DCNNs) has become popular for boosting long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most of simplified attention hybrids try to reach the best compromise among multiple types of tasks. In this paper, we present the Polarized Self-Attention(PSA) block that incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. 
(2) Enhancement: composing non-linearity that directly fits the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps), or the 2D binomial distribution (binary segmentation masks). PSA appears to have exhausted the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by $2-4$ points, and boosts state-of-the-art methods by $1-2$ points on 2D pose estimation and semantic segmentation benchmarks.	翻訳日:2021-07-05 12:52:00 公開日:2021-07-02
# MMF:階層画像分類のためのマルチタスク多構造融合 MMF: Multi-Task Multi-Structure Fusion for Hierarchical Image Classification ( http://arxiv.org/abs/2107.00808v1 ) ライセンス: Link先を確認	Xiaoni Li, Yucan Zhou, Yu Zhou, Weiping Wang	(参考訳) 階層的分類は、複数の粒度の予測を提供し、より良い誤りを促すことで複雑なタスクに重要である。ラベル構造が性能を決定すると、多くの既存手法が分類結果を促進するための優れたラベル構造を構築しようとする。本稿では,異なるラベル構造がカテゴリ認識に様々な事前知識を提供すると考えているので,それらを融合させることにより,階層的な分類結果の改善が期待できる。さらに,異なるラベル構造を統合するマルチタスク多構造融合モデルを提案する。 1つは共通のサブクラスを分類する伝統的な分類枝であり、もう1つは異なるラベル構造によって定義される異種スーパークラスを特定する責任がある。また,複数のラベル構造の効果に加えて,階層分類の精度向上と階層評価指標の調整のために,ディープモデルのアーキテクチャについても検討する。 cifar100 と car196 の実験結果から,任意のラベル構造を持つフラット分類器や階層分類器よりはるかに優れた結果が得られることがわかった。 Hierarchical classification is significant for complex tasks by providing multi-granular predictions and encouraging better mistakes. As the label structure decides its performance, many existing approaches attempt to construct an excellent label structure for promoting the classification results. In this paper, we consider that different label structures provide a variety of prior knowledge for category recognition, thus fusing them is helpful to achieve better hierarchical classification results. Furthermore, we propose a multi-task multi-structure fusion model to integrate different label structures. It contains two kinds of branches: one is the traditional classification branch to classify the common subclasses, the other is responsible for identifying the heterogeneous superclasses defined by different label structures. Besides the effect of multiple label structures, we also explore the architecture of the deep model for better hierachical classification and adjust the hierarchical evaluation metrics for multiple label structures. Experimental results on CIFAR100 and Car196 show that our method obtains significantly better results than using a flat classifier or a hierarchical classifier with any single label structure.	翻訳日:2021-07-05 12:51:27 公開日:2021-07-02
# 第1位 ug2+ challenge 2021 -- (semi-)supervised face detection in the low light condition 1st Place Solutions for UG2+ Challenge 2021 -- (Semi-)supervised Face detection in the low light condition ( http://arxiv.org/abs/2107.00818v1 ) ライセンス: Link先を確認	Pengcheng Wang, Lingqiao Ji, Zhilong Ji, Yuan Gao, Xiao Liu	(参考訳) 本稿では, CVPR 2021のUG2+チャレンジにおいて, 低光環境下での顔検出のための「TAL-ai」のソリューションを簡潔に紹介する。一般的な画像強調法と画像転送法を用いていくつかの実験を行い、より近い領域に低照度画像と正常画像を引き寄せた。そして、これらのデータをトレーニングに利用することで、よりよいパフォーマンスが得られることが観察されている。また、DetectoRS、Cascade-RCNN、Swin-transformerのような大きなバックボーンなど、一般的なオブジェクト検出フレームワークにも適応しています。最後に、いくつかのモデルをアンサンブルし、テストセットでmAP 74.89を達成して最終リーダーボードで1位となった。 In this technical report, we briefly introduce the solution of our team "TAL-ai" for (Semi-) supervised Face detection in the low light condition in UG2+ Challenge in CVPR 2021. By conducting several experiments with popular image enhancement methods and image transfer methods, we pulled the low-light images and the normal images into a closer domain. We observe that using these data for training achieves better performance. We also adapt several popular object detection frameworks, e.g., DetectoRS, Cascade-RCNN, and large backbones such as Swin-transformer. Finally, we ensemble several models, achieving an mAP of 74.89 on the testing set and ranking 1st on the final leaderboard.	翻訳日:2021-07-05 12:51:11 公開日:2021-07-02
# 変圧器を用いたクロスビュージオローカライズ Cross-view Geo-localization with Evolving Transformer ( http://arxiv.org/abs/2107.00842v1 ) ライセンス: Link先を確認	Hongji Yang, Xiufan Lu and Yingying Zhu	(参考訳) 本研究では,道路画像の地理空間的位置をジオタグ付き空中画像のデータベースとマッチングすることにより推定する,クロスビューなジオローカライゼーションの問題に対処する。クロスビューマッチングタスクは、視界の劇的な外観と幾何学的差異のため、非常に難しい。そこで本稿では,cnnが主流である既存の手法とは異なり,グローバル依存をモデル化するためにトランスフォーマの自己着脱特性を利用する新しいジオローカライズトランス(egotr)を考案し,クロスビュージオローカライズにおける視覚的あいまいさを著しく低減する。また,egotrが地上画像と空中画像の幾何学的配置を理解し対応するために,トランスフォーマーの位置符号化を利用する。幾何学的知識に強い仮定を課す最先端の手法と比較して、egotrはトレーニング目的を通じて柔軟に位置埋め込みを学び、従って多くの実世界のシナリオにおいてより実用的になる。トランスフォーマーはタスクに適していますが、そのバニラセルフアテンションメカニズムは各レイヤ内のイメージパッチ内で独立して相互作用し、レイヤ間の相関を見落としています。本稿では,学習表現の品質を向上させるための,単純かつ効果的な自己交叉注意機構を提案する。セルフクロスアテンション(self-cross attention)は、隣接するレイヤ間のグローバルな依存関係をモデル化する。その結果、提案した自己横断的注意はより安定したトレーニングをもたらし、一般化能力を改善し、ネットワークが深まるにつれて表現が進化し続けるように促す。広汎な実験により,我々のEgoTRは,標準的な,きめ細かな,また,クロスデータセットなジオローカライゼーションタスクにおいて,最先端の手法に対して良好に機能することを示した。 In this work, we address the problem of cross-view geo-localization, which estimates the geospatial location of a street view image by matching it with a database of geo-tagged aerial images. The cross-view matching task is extremely challenging due to drastic appearance and geometry differences across views. Unlike existing methods that predominantly fall back on CNN, here we devise a novel evolving geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in Transformer to model global dependencies, thus significantly decreasing visual ambiguities in cross-view geo-localization. We also exploit the positional encoding of Transformer to help the EgoTR understand and correspond geometric configurations between ground and aerial images. Compared to state-of-the-art methods that impose strong assumption on geometry knowledge, the EgoTR flexibly learns the positional embeddings through the training objective and hence becomes more practical in many real-world scenarios. 
Although Transformer is well suited to our task, its vanilla self-attention mechanism independently interacts within image patches in each layer, which overlooks correlations between layers. Instead, this paper proposes a simple yet effective self-cross attention mechanism to improve the quality of learned representations. The self-cross attention models global dependencies between adjacent layers, which relates image patches while modeling how features evolve in the previous layer. As a result, the proposed self-cross attention leads to more stable training, improves the generalization ability, and encourages representations to keep evolving as the network goes deeper. Extensive experiments demonstrate that our EgoTR performs favorably against state-of-the-art methods on standard, fine-grained and cross-dataset cross-view geo-localization tasks.	翻訳日:2021-07-05 12:50:56 公開日:2021-07-02
# MSN:軌道予測のためのマルチスタイルネットワーク MSN: Multi-Style Network for Trajectory Prediction ( http://arxiv.org/abs/2107.00932v1 ) ライセンス: Link先を確認	Conghao Wong, Beihao Xia, Qinmu Peng, Xinge You	(参考訳) 複雑な場面で様々なエージェントの将来の軌跡を予測することは不可欠だが困難である。エージェントの内部的な性格要因、近隣との対話的行動、周辺環境の影響のいずれもが、将来の行動スタイルに影響を及ぼす。つまり、同じ種類のエージェントであっても、行動の好みに大きな違いがあるということです。最近の研究はエージェントのマルチモーダル計画の研究に大きな進展をもたらしたが、それらの多くは依然として全てのエージェントに同じ予測戦略を適用しており、多数のエージェントの複数のスタイルを完全に示すことは困難である。本稿では,エージェントの嗜好スタイルを複数の隠れた行動カテゴリに適応的に分割し,各カテゴリの予測ネットワークを個別に訓練することにより,エージェントにすべてのスタイルの予測を同時に与えるマルチスタイルネットワーク(MSN)を提案する。実験により,我々の決定論的MSN-Dと生成MSN-Gは,最近の最先端手法よりも優れており,可視化結果のマルチスタイル特性が優れていることが示された。 It is essential but challenging to predict future trajectories of various agents in complex scenes. Agents' internal personality factors, the interactive behavior of their neighbors, and the influence of their surroundings all affect their future behavior styles. This means that even for the same physical type of agents, there are huge differences in their behavior preferences. Although recent works have made significant progress in studying agents' multi-modal planning, most of them still apply the same prediction strategy to all agents, which makes it difficult to fully capture the multiple styles of a large number of agents. In this paper, we propose the Multi-Style Network (MSN) to address this problem by adaptively dividing agents' preference styles into several hidden behavior categories and training each category's prediction network separately, thereby giving agents predictions in all styles simultaneously. Experiments demonstrate that our deterministic MSN-D and generative MSN-G outperform many recent state-of-the-art methods and show better multi-style characteristics in the visualized results.	翻訳日:2021-07-05 12:50:22 公開日:2021-07-02
# 心臓射出分画推定のための超音波ビデオトランスフォーマ Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation ( http://arxiv.org/abs/2107.00977v1 ) ライセンス: Link先を確認	Hadrien Reynaud, Athanasios Vlontzos, Benjamin Hou, Arian Beqiri, Paul Leeson, Bernhard Kainz	(参考訳) 心臓超音波画像は様々な心臓疾患の診断に用いられる。一般的な分析パイプラインは、専門医によるビデオフレームのマニュアル処理を含む。これは、オブザーバ内およびオブザーバ間の可変性に苦しむ。本稿では,残差オートエンコーダネットワークに基づく変圧器アーキテクチャとトークン分類に適したbertモデルを用いた超音波映像解析手法を提案する。これにより、任意の長さのビデオが処理できる。本研究では,エンドシストリクス(ES)とエンドダイアストリクス(ED)のフレーム検出と左室放出率の自動計算に本モデルを適用した。任意の長さの映像に対して,esの3.36フレームとedの7.17フレームの平均フレーム距離を達成する。我々のエンドツーエンドの学習可能なアプローチでは、ビデオあたり5.95のMAEと0.15秒で$R^2$の0.52で射出率を推定できる。コードとモデルはhttps://github.com/hreynaud/uvtで入手できる。 Cardiac ultrasound imaging is used to diagnose various heart diseases. Common analysis pipelines involve manual processing of the video frames by expert clinicians. This suffers from intra- and inter-observer variability. We propose a novel approach to ultrasound video analysis using a transformer architecture based on a Residual Auto-Encoder Network and a BERT model adapted for token classification. This enables videos of any length to be processed. We apply our model to the task of End-Systolic (ES) and End-Diastolic (ED) frame detection and the automated computation of the left ventricular ejection fraction. We achieve an average frame distance of 3.36 frames for the ES and 7.17 frames for the ED on videos of arbitrary length. Our end-to-end learnable approach can estimate the ejection fraction with a MAE of 5.95 and $R^2$ of 0.52 in 0.15s per video, showing that segmentation is not the only way to predict ejection fraction. Code and models are available at https://github.com/HReynaud/UVT.	翻訳日:2021-07-05 12:50:03 公開日:2021-07-02
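For readers unfamiliar with the quantity reported in the abstract above, the sketch below shows the standard clinical definition of left ventricular ejection fraction from end-diastolic and end-systolic volumes, plus the mean absolute error (MAE) metric. It is not the paper's model, which predicts the value from video; the function names and the example volumes are assumptions chosen for illustration.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100.

    edv_ml is the end-diastolic volume and esv_ml the end-systolic volume.
    """
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0

def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired predictions and references."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

# Illustrative volumes only, not data from the paper.
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)

# MAE between hypothetical predicted and reference ejection fractions.
mae = mean_absolute_error([55.0, 60.0, 40.0], [58.0, 57.0, 46.0])
```

Since ejection fraction is expressed in percent, an MAE computed this way is in percentage points.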
# 複雑な雑音下での教師なし単一画像超解像 Unsupervised Single Image Super-resolution Under Complex Noise ( http://arxiv.org/abs/2107.00986v1 ) ライセンス: Link先を確認	Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang and Deyu Meng	(参考訳) 単一画像超解像(SISR)の研究、特にディープニューラルネットワーク(DNN)を用いたものは近年大きな成功を収めているが、依然として2つの大きな制限を抱えている。第一に、実際の画像劣化は、通常不明であり、互いに非常に異なっており、一般的なSISRタスクを扱うために単一のモデルを訓練することは極めて困難である。第二に、現在の手法の多くは主に劣化のダウンサンプリング過程に焦点を当てているが、避けられないノイズ混入を無視または過小評価している。例えば、一般的に使用される独立同分布(i.i.d.)ガウス雑音分布は、実際の画像ノイズ(カメラセンサノイズなど)から大きく逸脱することが多く、実際のシナリオでは性能が制限される。これらの問題に対処するため,本論文では,未知の劣化を伴う一般的なSISRタスクに対処するモデルベースの教師なしSISR法を提案する。従来のi.i.d.ガウスノイズ仮定の代わりに、複雑な実雑音に適合するパッチベースの新しい非i.i.d.ノイズモデリング手法を提案する。さらに、DNNによりパラメータ化された深層ジェネレータを用いて、潜在変数を高解像度画像にマッピングし、従来のハイパーラプラシアン事前分布もこのジェネレータに組み込んで、画像勾配をさらに制約する。最後に、モデルを解くためにモンテカルロEMアルゴリズムを設計し、潜在変数とネットワークパラメータの両方に関して画像ジェネレータを更新する一般的な推論フレームワークを提供する。総合実験により、提案手法は、より小さなモデル(0.34M vs. 2.40M)かつより高速でありながら、現在の最先端(SotA)手法を明確に上回る(約1dB PSNR)ことが示された。 While the researches on single image super-resolution (SISR), especially equipped with deep neural networks (DNNs), have achieved tremendous successes recently, they still suffer from two major limitations. Firstly, the real image degradation is usually unknown and highly variant from one to another, making it extremely hard to train a single model to handle the general SISR task. Secondly, most of current methods mainly focus on the downsampling process of the degradation, but ignore or underestimate the inevitable noise contamination. For example, the commonly-used independent and identically distributed (i.i.d.) Gaussian noise distribution always largely deviates from the real image noise (e.g., camera sensor noise), which limits their performance in real scenarios. To address these issues, this paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations. Instead of the traditional i.i.d. Gaussian noise assumption, a novel patch-based non-i.i.d. noise modeling method is proposed to fit the complex real noise. Besides, a deep generator parameterized by a DNN is used to map the latent variable to the high-resolution image, and the conventional hyper-Laplacian prior is also elaborately embedded into such generator to further constrain the image gradients. Finally, a Monte Carlo EM algorithm is designed to solve our model, which provides a general inference framework to update the image generator both w.r.t. the latent variable and the network parameters. Comprehensive experiments demonstrate that the proposed method can evidently surpass the current state of the art (SotA) method (about 1dB PSNR) not only with a slighter model (0.34M vs. 2.40M) but also faster speed.	翻訳日:2021-07-05 12:49:46 公開日:2021-07-02
# 円形ハフ変換を用いた光点字認識 Optical Braille Recognition using Circular Hough Transform ( http://arxiv.org/abs/2107.00993v1 ) ライセンス: Link先を確認	Zeba Khanam and Atiya Usmani	(参考訳) 点字は視覚障害者に読み書きの権限を与えてきた。しかし同時に、点字以外のユーザーが点字のスクリプトを理解できないことによるギャップも生んでいる。このギャップにより、研究者は点字文書を自然言語に変換する光学点字認識技術を提案するようになった。この研究の主な動機は、盲目の学生の個人文書を翻訳することで、学術機関のコミュニケーションギャップを埋めることである。これはスマートフォンのカメラを使って点字文書をデジタル化する経済的かつ効果的な手法を提案している。任意の点字画像に対して、スキューネス、ノイズ、その他の抑止に不変なハフ変換に基づくドット検出機構が提案されている。検出されたドットは、距離ベースのクラスタリングアルゴリズムを使用して点字細胞にクラスタリングされる。続いて、各点字細胞の標準的な物理パラメータを、特徴抽出と自然言語文字の分類のために推定する。 54点字スクリプトのデータセットに対するこの手法の包括的な評価は、98.71%の精度で行われている。 Braille has empowered visually challenged community to read and write. But at the same time, it has created a gap due to widespread inability of non-Braille users to understand Braille scripts. This gap has fuelled researchers to propose Optical Braille Recognition techniques to convert Braille documents to natural language. The main motivation of this work is to cement the communication gap at academic institutions by translating personal documents of blind students. This has been accomplished by proposing an economical and effective technique which digitizes Braille documents using a smartphone camera. For any given Braille image, a dot detection mechanism based on Hough transform is proposed which is invariant to skewness, noise and other deterrents. The detected dots are then clustered into Braille cells using distance-based clustering algorithm. In succession, the standard physical parameters of each Braille cells are estimated for feature extraction and classification as natural language characters. The comprehensive evaluation of this technique on the proposed dataset of 54 Braille scripts has yielded into accuracy of 98.71%.	翻訳日:2021-07-05 12:49:14 公開日:2021-07-02
# ビデオセグメンテーションのためのディープラーニング技術に関する調査 A Survey on Deep Learning Technique for Video Segmentation ( http://arxiv.org/abs/2107.01153v1 ) ライセンス: Link先を確認	Wenguan Wang, Tianfei Zhou, Fatih Porikli, David Crandall, Luc Van Gool	(参考訳) ビデオセグメンテーション(ビデオセグメンテーション、ビデオフレームを複数のセグメントまたはオブジェクトに分割する)は、映画における視覚効果補助、自律運転におけるシーン理解、ビデオ会議における仮想背景生成など、幅広い実践的応用において重要な役割を果たしている。近年,コンピュータビジョンにおけるコネクショナリズムのルネサンスにより,映像セグメンテーションに特化し,魅力的なパフォーマンスを提供するディープラーニングベースのアプローチが数多く流入している。本調査では,各タスク設定,背景概念,認識されたニーズ,開発履歴,主な課題について,ビデオおよびビデオ意味セグメンテーションにおけるジェネリックオブジェクトセグメンテーション(未知のカテゴリの)という,この分野における2つの基本的な研究方針を総合的にレビューする。また,提案手法とデータセットについて,代表文献の詳細な概要を述べる。さらに,ベンチマークデータセットにおけるレビュー手法の定量的性能比較を行った。最終的に、この分野における未解決の未解決問題の集合を指摘し、さらなる研究の機会を提案する。 Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, e.g., visual effect assistance in movie, scene understanding in autonomous driving, and virtual background creation in video conferencing, to name a few. Recently, due to the renaissance of connectionism in computer vision, there has been an influx of numerous deep learning based approaches that have been dedicated to video segmentation and delivered compelling performance. In this survey, we comprehensively review two basic lines of research in this area, i.e., generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also provide a detailed overview of representative literature on both methods and datasets. Additionally, we present quantitative performance comparisons of the reviewed methods on benchmark datasets. At last, we point out a set of unsolved open issues in this field, and suggest possible opportunities for further research.	翻訳日:2021-07-05 12:49:01 公開日:2021-07-02
# HandVoxNet++:Voxel-based Neural Networksを用いた手形状と姿勢推定 HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks ( http://arxiv.org/abs/2107.01205v1 ) ライセンス: Link先を確認	Jameel Malik and Soshi Shimada and Ahmed Elhayek and Sk Aziz Ali and Christian Theobalt and Vladislav Golyanik and Didier Stricker	(参考訳) 単一深度マップからの3次元手形状とポーズ推定は多くのアプリケーションにおいて新しい挑戦的なコンピュータビジョン問題である。既存の方法では、2D畳み込みニューラルネットワークを介して手メッシュを直接回帰し、画像の視点歪みによるアーティファクトにつながる。既存の方法の限界に対処するため、handvoxnet++、すなわち3dおよびグラフ畳み込みを完全に教師付きで訓練したvoxelベースのディープネットワークを開発した。ネットワークへの入力はtsdf(truncated signed distance function)に基づく3次元voxelized-depth-mapである。 HandVoxNet++は2つの手形表現に依存している。 1つ目は、メッシュトポロジを保存せず、最も正確な表現である手形状の3Dボキセル化格子である。第2の表現は、メッシュトポロジーを保存する手表面である。両表現の利点を,新たなニューラルグラフ畳み込み型メッシュ登録 (gcn-meshreg) や,トレーニングデータに依存しない古典的なセグメント単位非剛性重力アプローチ (nrga++) と組み合わせて,手表面とボクセル化手の形状を整合させることで組み合わせる。 SynHand5M、deep-based HANDS19 Challenge、HO-3Dという3つの公開ベンチマークの広範な評価において、提案されたHandVoxNet++は最先端のパフォーマンスを達成する。 CVPR 2020で発表されたこれまでのアプローチのジャーナル拡張では、SynHand5MとHANDS19データセットでそれぞれ41.09%と13.7%のアライメント精度が得られた。 HANDS19チャレンジデータセット(Task 1: Depth-Based 3D Hand Pose Estimation)では,2020年8月の結果がポータルに提出された時点で,本手法が第1位となった。 3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. Existing methods addressing it directly regress hand meshes via 2D convolutional neural networks, which leads to artifacts due to perspective distortions in the images. To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner. The input to our network is a 3D voxelized-depth-map-based on the truncated signed distance function (TSDF). HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology and which is the most accurate representation. 
The second representation is the hand surface that preserves the mesh topology. We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or classical segment-wise Non-Rigid Gravitational Approach (NRGA++) which does not rely on training data. In extensive evaluations on three public benchmarks, i.e., SynHand5M, depth-based HANDS19 challenge and HO-3D, the proposed HandVoxNet++ achieves the state-of-the-art performance. In this journal extension of our previous approach presented at CVPR 2020, we gain 41.09% and 13.7% higher shape alignment accuracy on SynHand5M and HANDS19 datasets, respectively. Our method is ranked first on the HANDS19 challenge dataset (Task 1: Depth-Based 3D Hand Pose Estimation) at the moment of the submission of our results to the portal in August 2020.	翻訳日:2021-07-05 12:48:42 公開日:2021-07-02
# 入力の連結による深い二重降下の緩和 Mitigating deep double descent by concatenating inputs ( http://arxiv.org/abs/2107.00797v1 ) ライセンス: Link先を確認	John Chen, Qihan Wang, Anastasios Kyrillidis	(参考訳) 二重降下曲線はディープニューラルネットワークの最も興味深い特性の1つである。これは、古典的なバイアス分散曲線と現代のニューラルネットワークの振る舞いとを対比し、サンプル数がパラメータの数に近づくところで発生する。本研究では,ディープニューラルネットワーク設定における二重降下現象とサンプル数との関係について検討する。特に,サンプル数を人工的に増やすことで既存のデータセットを増強する構成を提案する。この構成は経験的にこの設定の二重降下曲線を緩和する。我々は, 深層二重降下に関する既存の研究を再現し, 我々の構成では過剰パラメータ化領域への滑らかな降下を観測した。これは、モデルサイズに関しても、エポック数に関しても起こる。 The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of parameters. In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting. In particular, we propose a construction which augments the existing dataset by artificially increasing the number of samples. This construction empirically mitigates the double descent curve in this setting. We reproduce existing work on deep double descent, and observe a smooth descent into the overparameterized region for our construction. This occurs both with respect to the model size, and with respect to the number of epochs.	翻訳日:2021-07-05 12:48:07 公開日:2021-07-02
# 異種情報集約によるオンライン地下鉄起点・終点予測 Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation ( http://arxiv.org/abs/2107.00946v1 ) ライセンス: Link先を確認	Lingbo Liu, Yuying Zhu, Guanbin Li, Ziyi Wu, Lei Bai, Mingzhi Mao, Liang Lin	(参考訳) 地下鉄の起点・終点予測は知的交通管理にとって重要かつ困難な課題であり、2種類の駅間乗客数、すなわちオリジン・デスティネーション(OD)乗客数とデスティネーション・オリジン(DO)乗客数を正確に予測することを目的としている。しかし、オンラインの地下鉄システムでは、過去の時間間隔の完全なOD行列を即座に得ることができず、従来の手法は限られた情報のみを用いて将来のODとDOの乗客数を別々に予測していた。本研究では、履歴データの異種情報(不完全なOD行列、未完了注文ベクトル、DO行列など)を十分に活用し、ODとDOの乗客数の変化パターンを共同で学習する、Heterogeneous Information Aggregation Machine (HIAM)と呼ばれる新しいニューラルネットワークモジュールを提案した。具体的には、ODモデリングブランチが未完了注文の潜在的目的地を明示的に推定して不完全なOD行列の情報を補完する一方、DOモデリングブランチはDO行列を入力として、DO乗客数の時空間分布を捉える。さらに、OD-DO間の因果関係と相関関係をモデル化するため、OD特徴とDO特徴の相互情報を伝播するDual Information Transformerを導入する。提案したHIAMに基づいて,将来のODおよびDOの乗客数を同時に予測する統合Seq2Seqネットワークを開発した。 2つの大規模ベンチマークで行った広範な実験は、オンライン地下鉄の起点・終点予測における本手法の有効性を示した。 Metro origin-destination prediction is a crucial yet challenging task for intelligent transportation management, which aims to accurately forecast two specific types of cross-station ridership, i.e., Origin-Destination (OD) one and Destination-Origin (DO) one. 
However, complete OD matrices of previous time intervals cannot be obtained immediately in online metro systems, and conventional methods only used limited information to forecast the future OD and DO ridership separately. In this work, we proposed a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership. Specifically, an OD modeling branch estimates the potential destinations of unfinished orders explicitly to complement the information of incomplete OD matrices, while a DO modeling branch takes DO matrices as input to capture the spatial-temporal distribution of DO ridership. Moreover, a Dual Information Transformer is introduced to propagate the mutual information among OD features and DO features for modeling the OD-DO causality and correlation. Based on the proposed HIAM, we develop a unified Seq2Seq network to forecast the future OD and DO ridership simultaneously. Extensive experiments conducted on two large-scale benchmarks demonstrate the effectiveness of our method for online metro origin-destination prediction.	翻訳日:2021-07-05 12:47:56 公開日:2021-07-02
# ニューラルネットワーク層代数:ディープラーニングにおけるキャパシティと圧縮を測定するフレームワーク Neural Network Layer Algebra: A Framework to Measure Capacity and Compression in Deep Learning ( http://arxiv.org/abs/2107.01081v1 ) ライセンス: Link先を確認	Alberto Badias and Ashis Banerjee	(参考訳) 本稿では,ニューラルネットワークの内在特性を測定するための新しい枠組みを提案する。畳み込みネットワークにフォーカスしながら、我々のフレームワークはどんなネットワークアーキテクチャにも外挿できる。特に,ネットワーク構造のみに依存し,トレーニングやテストデータに依存しない,キャパシティ(表現性に関連する)と圧縮の2つの特性を評価した。この目的のために、第1のメトリクスは、レイヤ複雑性と呼ばれ、任意のネットワーク層のアーキテクチャ上の複雑さを捉え、第2のメトリクスは、レイヤ固有のパワーと呼ばれ、ネットワークに沿ってデータを圧縮する方法を符号化する。メトリクスは、この論文で紹介された層代数の概念に基づいている。この概念は、グローバルなプロパティがネットワークトポロジに依存し、任意のニューラルネットワークの葉ノードを局所的な転送関数で近似できるという考えに基づいており、グローバルなメトリクスの簡単な計算を可能にしている。また,我々の測定値を用いて最先端アーキテクチャの特性を比較し,その特性を用いてベンチマークデータセットの分類精度を解析した。 We present a new framework to measure the intrinsic properties of (deep) neural networks. While we focus on convolutional networks, our framework can be extrapolated to any network architecture. In particular, we evaluate two network properties, namely, capacity (related to expressivity) and compression, both of which depend only on the network structure and are independent of the training and test data. To this end, we propose two metrics: the first one, called layer complexity, captures the architectural complexity of any network layer; and, the second one, called layer intrinsic power, encodes how data is compressed along the network. The metrics are based on the concept of layer algebra, which is also introduced in this paper. This concept is based on the idea that the global properties depend on the network topology, and the leaf nodes of any neural network can be approximated using local transfer functions, thereby, allowing a simple computation of the global metrics. We also compare the properties of the state-of-the art architectures using our metrics and use the properties to analyze the classification accuracy on benchmark datasets.	翻訳日:2021-07-05 12:47:31 公開日:2021-07-02
# brain over brawn -- ステレオカメラを使って、軌道を再構築してより高速なuavを検出し、追跡し、インターセプトする Brain over Brawn -- Using a Stereo Camera to Detect, Track and Intercept a Faster UAV by Reconstructing Its Trajectory ( http://arxiv.org/abs/2107.00962v1 ) ライセンス: Link先を確認	Antonella Bari\v{s}i\'c, Frano Petric, Stjepan Bogdan	(参考訳) 本稿では,MBZIRC2020 Challenge 1に触発された高速侵入型UAVのインターセプト手法について述べる。侵入者の軌道の形状の知識を活用することで、インターセプションポイントを計算することができる。ターゲット追跡は, YOLOv3 Tiny畳み込みニューラルネットワークによる画像処理と, ジンバル搭載型ZED Miniステレオカメラを用いた深度計算を併用した。我々は、ZED MiniからRGBと深度データを用いてターゲットの3次元位置を抽出し、ノイズを低減するためにヒストグラムに基づく処理を考案した。目標位置の3次元計測は、ベルヌーイの補題を用いて近似した図形形状軌跡の位置、向き、大きさを計算するために用いられる。近似が十分正確であると判断されると、測定と近似の間のハウスドルフ距離によって測定され、インターセプションポイントが算出され、インターセプションUAVがターゲットの経路に位置決めされる。提案手法はmbzircコンペティションで得られた経験に基づいて大幅に改善され,シミュレーションおよびフィールド実験により検証された。その結果, 標的UAVの動作に関する情報を抽出する効率的な視覚認識モジュールが, インターセプションの基盤として開発されたことを確認した。このシステムは、ほとんどのシミュレーション実験において、インターセプターよりも30%速いターゲットを追跡し、インターセプトすることができる。非構造環境でのテストでは、12の成果のうち9つが成功した。 The work presented in this paper demonstrates our approach to intercepting a faster intruder UAV, inspired by the MBZIRC2020 Challenge 1. By leveraging the knowledge of the shape of the intruder's trajectory we are able to calculate the interception point. Target tracking is based on image processing by a YOLOv3 Tiny convolutional neural network, combined with depth calculation using a gimbal-mounted ZED Mini stereo camera. We use RGB and depth data from ZED Mini to extract the 3D position of the target, for which we devise a histogram-of-depth based processing to reduce noise. Obtained 3D measurements of target's position are used to calculate the position, the orientation and the size of a figure-eight shaped trajectory, which we approximate using lemniscate of Bernoulli. Once the approximation is deemed sufficiently precise, measured by Hausdorff distance between measurements and the approximation, an interception point is calculated to position the intercepting UAV right on the path of the target. 
The proposed method, which has been significantly improved based on the experience gathered during the MBZIRC competition, has been validated in simulation and through field experiments. The results confirmed that an efficient visual perception module which extracts information related to the motion of the target UAV as a basis for the interception, has been developed. The system is able to track and intercept the target which is 30% faster than the interceptor in majority of simulation experiments. Tests in the unstructured environment yielded 9 out of 12 successful results.	翻訳日:2021-07-05 12:46:28 公開日:2021-07-02
# 認識論的議会の最適規模 The Optimal Size of an Epistemic Congress ( http://arxiv.org/abs/2107.01042v1 ) ライセンス: Link先を確認	Manon Revel, Tao Lin, Daniel Halpern	(参考訳) 代表制民主主義における議会の最適な規模を分析する。我々は、有権者が唯一の正解を持つ二値の問題について判断し、各有権者が$[0, 1]$内の能力レベルに応じて正しく投票するという認識論的な見方をとる。最良の専門家を選出して認識論的議会を構成できると仮定すると、最適な議会のサイズは人口規模に対して線形であるべきである。この結果は、上位の代表者が任意に高い確率で正確であることを許しても成り立つという点で注目に値する。実世界のデータを分析した結果、議会の実際の規模は、理論的な結果が示す最適なサイズよりもはるかに小さいことがわかった。最後に、最適でない規模の議会が、全有権者が投票する直接民主主義をなお上回る条件を分析する。 We analyze the optimal size of a congress in a representative democracy. We take an epistemic view where voters decide on a binary issue with one ground truth outcome, and each voter votes correctly according to their competence levels in $[0, 1]$. Assuming that we can sample the best experts to form an epistemic congress, we find that the optimal congress size should be linear in the population size. This result is striking because it holds even when allowing the top representatives to be accurate with arbitrarily high probabilities. We then analyze real world data, finding that the actual sizes of congresses are much smaller than the optimal size our theoretical results suggest. We conclude by analyzing under what conditions congresses of sub-optimal sizes would still outperform direct democracy, in which all voters vote.	翻訳日:2021-07-05 12:45:46 公開日:2021-07-02
# スパースランダムグラフにおけるオンラインマッチング:グリーディアルゴリズムの非漸近的性能 Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm ( http://arxiv.org/abs/2107.00995v1 ) ライセンス: Link先を確認	Nathan Noiry, Flore Sentenac, Vianney Perchet	(参考訳) 逐次予算配分問題により、頂点間の接続がi.d.ではなく、固定度分布(いわゆる構成モデル)を持つオンラインマッチング問題を調査する。偏微分方程式の明示的な系の解であるそれらの連続的な対応によって関連する確率的離散過程を近似することにより、最も単純なアルゴリズムであるgreedyの競合比を推定する。この手法は、問題のサイズが大きくなるにつれて任意に高い確率で、推定誤差の正確な境界を与える。特に、異なる構成モデル間の形式的な比較を可能にする。また、非常に驚くべきことに、GREEDYがRANKINGよりも優れたパフォーマンス保証が得られることを証明しています。 Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but they have fixed degree distributions -- the so-called configuration model. We estimate the competitive ratio of the simplest algorithm, GREEDY, by approximating some relevant stochastic discrete processes by their continuous counterparts, that are solutions of an explicit system of partial differential equations. This technique gives precise bounds on the estimation errors, with arbitrarily high probability as the problem size increases. In particular, it allows the formal comparison between different configuration models. We also prove that, quite surprisingly, GREEDY can have better performance guarantees than RANKING, another celebrated algorithm for online matching that usually outperforms the former.	翻訳日:2021-07-05 12:45:32 公開日:2021-07-02
# LensID:白内障手術ビデオにおけるレンズ不規則性検出を目的としたCNN-RNNベースのフレームワーク LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in Cataract Surgery Videos ( http://arxiv.org/abs/2107.00875v1 ) ライセンス: Link先を確認	Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schoeffmann	(参考訳) 白内障手術後の致命的な合併症は、視力の低下と眼外傷につながるレンズインプラントの脱臼である。この合併症のリスクを軽減するためには、手術中の危険因子を発見することが不可欠である。しかし、レンズ脱臼とその不審な危険因子との関係を多数のビデオを用いて検討することは、時間的拡張の手順である。そのため、外科医はより大規模で信頼性の高い研究を可能にするために、自動的なアプローチを要求する。本稿では,レンズの不規則性検出のための大きなステップとして,新しい枠組みを提案する。特に、(I)レンズ導入フェーズを認識するエンドツーエンドのリカレントニューラルネットワークを提案し、(II)インプラントフェーズ後にレンズと瞳孔を分割する新しいセマンティックセグメンテーションネットワークを提案する。位相認識結果から, 手術用位相認識手法の有効性が示された。さらに,セグメンテーション結果は,最先端の競合手法と比較して,セグメンテーションネットワークの有効性を確認した。 A critical complication after cataract surgery is the dislocation of the lens implant leading to vision deterioration and eye trauma. In order to reduce the risk of this complication, it is vital to discover the risk factors during the surgery. However, studying the relationship between lens dislocation and its suspicious risk factors using numerous videos is a time-extensive procedure. Hence, the surgeons demand an automatic approach to enable a larger-scale and, accordingly, more reliable study. In this paper, we propose a novel framework as the major step towards lens irregularity detection. In particular, we propose (I) an end-to-end recurrent neural network to recognize the lens-implantation phase and (II) a novel semantic segmentation network to segment the lens and pupil after the implantation phase. The phase recognition results reveal the effectiveness of the proposed surgical phase recognition approach. Moreover, the segmentation results confirm the proposed segmentation network's effectiveness compared to state-of-the-art rival approaches.	翻訳日:2021-07-05 12:45:09 公開日:2021-07-02
# HO-3D_v3: HO-3Dデータセットの手動アノテーションの精度向上 HO-3D_v3: Improving the Accuracy of Hand-Object Annotations of the HO-3D Dataset ( http://arxiv.org/abs/2107.00887v1 ) ライセンス: Link先を確認	Shreyas Hampali, Sayan Deb Sarkar, Vincent Lepetit	(参考訳) HO-3Dは、手とオブジェクトの3Dポーズにアノテートされた様々なハンドオブジェクトインタラクションシナリオの画像シーケンスを提供するデータセットで、元々HO-3D_v2として導入された。本論文で紹介した最適化手法「本注」を用いてアノテーションを自動生成した。 ho-3d_v3は、手とオブジェクトのポーズの両方に対してより正確なアノテーションを提供するので、手とオブジェクトの接触領域の見積もりがより良くなります。本稿では,HOnnotate法の改良について詳述し,HO-3D_v2とHO-3D_v3の精度を比較するための評価を行った。 ho-3d_v3は、手ポーズのho-3d_v2よりも4mm高い精度を示し、物体表面との接触領域も高い。 HO-3D is a dataset providing image sequences of various hand-object interaction scenarios annotated with the 3D pose of the hand and the object and was originally introduced as HO-3D_v2. The annotations were obtained automatically using an optimization method, 'HOnnotate', introduced in the original paper. HO-3D_v3 provides more accurate annotations for both the hand and object poses thus resulting in better estimates of contact regions between the hand and the object. In this report, we elaborate on the improvements to the HOnnotate method and provide evaluations to compare the accuracy of HO-3D_v2 and HO-3D_v3. HO-3D_v3 results in 4mm higher accuracy compared to HO-3D_v2 for hand poses while exhibiting higher contact regions with the object surface.	翻訳日:2021-07-05 12:44:53 公開日:2021-07-02
# 連続感情認識のための視聴覚注意融合 Audio-visual Attentive Fusion for Continuous Emotion Recognition ( http://arxiv.org/abs/2107.01175v1 ) ライセンス: Link先を確認	Su Zhang, Yi Ding, Ziquan Wei, Cuntai Guan	(参考訳) 本稿では,(1)事前訓練された2D-CNNとそれに続く時間畳み込みネットワーク(TCN)を含む視覚ブロック,(2)複数の並列TCNを含む聴覚ブロック,(3)視聴覚情報を結合するリーダ・フォロワー型注意融合ブロックからなる,視聴覚時空間深層ニューラルネットワークを提案する。広い履歴範囲をカバーするTCNにより、ベースラインや最先端の手法(36または48)よりもずっと大きなウィンドウ長(300)で時空間情報を利用することができる。融合ブロックは視覚モダリティを重視しつつ、ノイズの多い聴覚モダリティをモダリティ間注意機構を用いて活用する。データを最大限に活用し過学習を軽減するため、トレーニングおよび検証セット上で交差検証を行う。各フォールドの結果を統合するために、一致相関係数(CCC)センタリングを用いる。開発セットでは、得られたCCCはvalenceで0.410、arousalで0.661であり、対応するCCCがそれぞれ0.210、0.230であるベースライン手法を大きく上回る。コードはhttps://github.com/sucv/ABAW2で入手できる。 We propose an audio-visual spatial-temporal deep neural network with: (1) a visual block containing a pretrained 2D-CNN followed by a temporal convolutional network (TCN); (2) an aural block containing several parallel TCNs; and (3) a leader-follower attentive fusion block combining the audio-visual information. The TCN with large history coverage enables our model to exploit spatial-temporal information within a much larger window length (i.e., 300) than that from the baseline and state-of-the-art methods (i.e., 36 or 48). The fusion block emphasizes the visual modality while exploiting the noisy aural modality using the inter-modality attention mechanism. To make full use of the data and alleviate over-fitting, cross-validation is carried out on the training and validation set. The concordance correlation coefficient (CCC) centering is used to merge the results from each fold. On the development set, the achieved CCC is 0.410 for valence and 0.661 for arousal, which significantly outperforms the baseline method with the corresponding CCC of 0.210 and 0.230 for valence and arousal, respectively. The code is available at https://github.com/sucv/ABAW2.	翻訳日:2021-07-05 12:44:40 公開日:2021-07-02
# ペナル化条件勾配法の再検討 Screening for a Reweighted Penalized Conditional Gradient Method ( http://arxiv.org/abs/2107.01106v1 ) ライセンス: Link先を確認	Yifan Sun and Francis Bach	(参考訳) 条件勾配法(CGM)は大規模なスパース凸最適化において広く用いられ、構造化スパース正規化器の1イテレーション当たりの計算コストが低く、非ゼロの収集に対する欲求的なアプローチである。非凸正則化器用一般ペナリゼーションCGM(P-CGM)と非凸正則化器用再重み付きペナリゼーションCGM(RP-CGM)について,通常の凸制約をゲージインスパイアされたペナリティーに置き換えた。この一般化は、イテレーション当たりの複雑さを顕著に増やさない。有界イテレートや線探索を仮定せずに、各サブプロブレムのギャップを$O(1/t)$収束させ、静止点までの距離を測定する。我々はこれを凸の場合において安全であるスクリーニング規則と結合し、o(1/(\delta^2))$で真のサポートに収束する。非凸の場合、スクリーニング規則は有限個の反復において真の支持に収束するが、中間イテレートでは必ずしも安全ではない。本実験では, 本手法の整合性を検証し, 正則化器の凹凸を調整し, スクリーニング規則の適応性を調整した。 The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization, having a low per iteration computational cost for structured sparse regularizers and a greedy approach to collecting nonzeros. We explore the sparsity acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not increase the per-iteration complexity noticeably. Without assuming bounded iterates or using line search, we show $O(1/t)$ convergence of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/(\delta^2))$ where $\delta \geq 0$ measures how close the problem is to degeneracy. In the nonconvex case the screening rule converges to the true support in a finite number of iterations, but is not necessarily safe in the intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.	翻訳日:2021-07-05 12:43:08 公開日:2021-07-02
# 機械学習による道路粗さ推定 Road Roughness Estimation Using Machine Learning ( http://arxiv.org/abs/2107.01199v1 ) ライセンス: Link先を確認	Milena Bajic, Shahrzad M. Pour, Asmus Skar, Matteo Pettinari, Eyal Levenberg, Tommy Sonne Alstr{\o}m	(参考訳) 路面粗さは、乗客の安全と乗り心地の両方に影響を与えるため、インフラにとって非常に重要な道路条件である。道路は経時的に劣化するので、道路インフラの状況を正確に把握するために、道路粗さを継続的に監視する必要がある。本稿では,自動車の垂直加速度と速度を用いた道路粗さ予測のための機械学習パイプラインを提案する。我々は、線形回帰、ナイーブベイズ、k近傍法、ランダムフォレスト、サポートベクターマシン、多層パーセプトロンニューラルネットワークなどのよく知られた教師あり機械学習モデルを比較した。モデルは、時間領域と統計領域で計算される最適選択された特徴の集合に基づいて訓練される。その結果, 従来の乗用車に搭載された安価な車載センサの記録を用いて, 機械学習により道路の粗さを正確に予測できることがわかった。本研究は, 広域道路網の連続監視を可能にすることにより, この技術が今後の舗装状況監視に適していることを示す。 Road roughness is a very important road condition for the infrastructure, as the roughness affects both the safety and ride comfort of passengers. Roads deteriorate over time, which means the road roughness must be continuously monitored in order to have an accurate understanding of the condition of the road infrastructure. In this paper, we propose a machine learning pipeline for road roughness prediction using the vertical acceleration of the car and the car speed. We compared well-known supervised machine learning models such as linear regression, naive Bayes, k-nearest neighbor, random forest, support vector machine, and the multi-layer perceptron neural network. The models are trained on an optimally selected set of features computed in the temporal and statistical domain. The results demonstrate that machine learning methods can accurately predict road roughness, using the recordings of the low-cost in-vehicle sensors installed in conventional passenger cars. Our findings demonstrate that the technology is well suited to meet future pavement condition monitoring, by enabling continuous monitoring of a wide road network.	翻訳日:2021-07-05 12:42:43 公開日:2021-07-02
# アンサンブルモデリングと伝達学習によるロバスト薬物・標的相互作用予測に向けて Toward Robust Drug-Target Interaction Prediction via Ensemble Modeling and Transfer Learning ( http://arxiv.org/abs/2107.00719v1 ) ライセンス: Link先を確認	Po-Yu Kao, Shu-Min Kao, Nan-Lan Huang, Yen-Chu Lin	(参考訳) 薬物-標的相互作用(DTI)予測は薬物発見において重要な役割を担い、ディープラーニングアプローチはこの分野で最先端のパフォーマンスを達成した。本稿では,DTI予測のための深層学習モデル(EnsembleDLM)のアンサンブルを紹介する。 EnsembleDLMは化学物質やタンパク質の配列情報のみを使用し、複数のディープニューラルネットワークからの予測を集約する。このアプローチはオーバーフィッティングの機会を減らし、バイアスのない予測をもたらし、DavisとKIBAのデータセットで最先端のパフォーマンスを達成する。 EnsembleDLMは、新しいドメインにおけるテストデータの約2倍の量を用いて転送学習を行い、クロスドメインアプリケーションにおける最先端性能と適切なクロスドメインパフォーマンス(ピアソン相関係数とコンコータンス指数 > 0.8)を達成する。 Drug-target interaction (DTI) prediction plays a crucial role in drug discovery, and deep learning approaches have achieved state-of-the-art performance in this field. We introduce an ensemble of deep learning models (EnsembleDLM) for robust DTI prediction. EnsembleDLM only uses the sequence information of chemical compounds and proteins, and it aggregates the predictions from multiple deep neural networks. This approach reduces the chance of overfitting, yields an unbiased prediction, and achieves state-of-the-art performance in Davis and KIBA datasets. EnsembleDLM also reaches state-of-the-art performance in cross-domain applications and decent cross-domain performance (Pearson correlation coefficient and concordance index > 0.8) with transfer learning using approximately twice the amount of test data in the new domain.	翻訳日:2021-07-05 12:42:26 公開日:2021-07-02
# フィードバック型サイバーレジリエンスのための強化学習 Reinforcement Learning for Feedback-Enabled Cyber Resilience ( http://arxiv.org/abs/2107.00783v1 ) ライセンス: Link先を確認	Yunhan Huang, Linan Huang, Quanyan Zhu	(参考訳) デバイス数と接続の急速な増加は、攻撃面を拡大し、サイバーシステムを弱体化させている。攻撃者がますます高度でリソースに富むようになるにつれて、侵入検知、ファイアウォール、暗号化といった従来のサイバー保護に頼るだけでは、セキュアなサイバーシステムには不十分である。サイバーレジリエンスは、不適切な保護とレジリエンスメカニズムを補完する新しいセキュリティパラダイムを提供する。 CRM(Cyber-Resilient Mechanism)は、既知の、あるいはゼロデイの脅威に適応し、リアルタイムで不確実性に対処し、戦略的にサイバーシステムの重要な機能を維持する。フィードバックアーキテクチャはCRMのオンラインセンシング、推論、動作を可能にする上で重要な役割を担います。強化学習(Reinforcement Learning, RL)は、サイバーレジリエンスのためのフィードバックアーキテクチャを模倣する重要なアルゴリズムのクラスであり、CRMは攻撃者の事前知識に制限された攻撃に対して動的かつシーケンシャルな応答を提供することができる。本稿では,サイバーレジリエンスに関するRLに関する文献をレビューし,姿勢関連,情報関連,人為的脆弱性の3つの主要な脆弱性に対するサイバーレジリエントな防御について論じる。我々は,CRMの3つのアプリケーションドメインとして,移動目標防衛,サイバー詐欺,ヒューマンセキュリティ技術を導入し,その設計を詳述する。 RLテクニックにも脆弱性がある。本稿では、RLの主な脆弱性を説明し、攻撃が報酬、測定、アクチュエータを標的とするいくつかの攻撃モデルを示す。攻撃者はRLエージェントを騙して最小限の攻撃力で悪質なポリシーを学習し、RL対応システムに対する重大なセキュリティ上の懸念を示す。最後に、サイバーセキュリティとレジリエンスにおけるRLの今後の課題と、RLベースのCRMの新たな応用について論じる。 The rapid growth in the number of devices and their connectivity has enlarged the attack surface and weakened cyber systems. As attackers become increasingly sophisticated and resourceful, mere reliance on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to the known or zero-day threats and uncertainties in real-time and strategically responds to them to maintain the critical functions of the cyber systems. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation of the CRM. 
Reinforcement Learning (RL) is an important class of algorithms that epitomize the feedback architectures for cyber resiliency, allowing the CRM to provide dynamic and sequential responses to attacks with limited prior knowledge of the attacker. In this work, we review the literature on RL for cyber resiliency and discuss the cyber-resilient defenses against three major types of vulnerabilities, i.e., posture-related, information-related, and human-related vulnerabilities. We introduce moving target defense, defensive cyber deception, and assistive human security technologies as three application domains of CRMs to elaborate on their designs. The RL technique also has vulnerabilities itself. We explain the major vulnerabilities of RL and present several attack models in which the attacks target the rewards, the measurements, and the actuators. We show that the attacker can trick the RL agent into learning a nefarious policy with minimum attacking effort, which shows serious security concerns for RL-enabled systems. Finally, we discuss the future challenges of RL for cyber security and resiliency and emerging applications of RL-based CRMs.	翻訳日:2021-07-05 12:42:02 公開日:2021-07-02
# RL-NCS:非一様圧縮センシングのための強化学習に基づくデータ駆動アプローチ RL-NCS: Reinforcement learning based data-driven approach for nonuniform compressed sensing ( http://arxiv.org/abs/2107.00838v1 ) ライセンス: Link先を確認	Nazmul Karim, Alireza Zaeemzadeh, and Nazanin Rahnavard	(参考訳) 時間変化信号のための強化学習に基づく非一様圧縮センシング(NCS)フレームワークを導入する。 RL-NCSと呼ばれる提案手法は,信号のROI係数と非ROI係数の2つの係数群間のセンサエネルギーの最適かつ適応的な分布を通じて,信号回復性能を向上させることを目的としている。 ROIの係数は通常より重要であり、非ROI係数よりも高い精度で再構成する必要がある。このタスクを達成するために、ROIは2つの特定のアプローチを使用して各タイミングで予測される。これらのアプローチの1つは、予測のために長い短期記憶(LSTM)ネットワークを組み込んでいる。別のアプローチでは、次のステップROIを予測するために、以前のROI情報を使用します。探索探索法を用いて、qネットワークは測定行列を設計するための最適なアプローチを選択することを学ぶ。さらに,Q-network と LSTM ネットワークの効率的なトレーニングのために,結合損失関数を導入している。その結果,急速に変化する信号や測定回数の削減においても,提案手法の有効性が示唆された。 A reinforcement-learning-based non-uniform compressed sensing (NCS) framework for time-varying signals is introduced. The proposed scheme, referred to as RL-NCS, aims to boost the performance of signal recovery through an optimal and adaptive distribution of sensing energy among two groups of coefficients of the signal, referred to as the region of interest (ROI) coefficients and non-ROI coefficients. The coefficients in ROI usually have greater importance and need to be reconstructed with higher accuracy compared to non-ROI coefficients. In order to accomplish this task, the ROI is predicted at each time step using two specific approaches. One of these approaches incorporates a long short-term memory (LSTM) network for the prediction. The other approach employs the previous ROI information for predicting the next step ROI. Using the exploration-exploitation technique, a Q-network learns to choose the best approach for designing the measurement matrix. Furthermore, a joint loss function is introduced for the efficient training of the Q-network as well as the LSTM network. The result indicates a significant performance gain for our proposed method, even for rapidly varying signals and a reduced number of measurements.	翻訳日:2021-07-05 12:41:31 公開日:2021-07-02
# 適応侵入検知システムのためのセグメンテッドフェデレーション学習 Segmented Federated Learning for Adaptive Intrusion Detection System ( http://arxiv.org/abs/2107.00881v1 ) ライセンス: Link先を確認	Geet Shingi, Harsh Saglani, Preeti Jain	(参考訳) サイバー攻撃は大きな問題であり、組織に大きな経済的、評判の害をもたらす。しかし、様々な要因により、現在のネットワーク侵入検知システム(NIDS)は不十分であると思われる。 NIDSは、手作りのルールデータセットを通じてサイバー攻撃を特定する。機械学習とディープラーニングの最近の応用は、NIDSの膨大な努力を和らげてきたが、ネットワークデータのセキュリティは常に主要な関心事であった。しかし、セキュリティ問題に対処し、組織間の共有を可能にするために、フェデレートラーニング(FL)スキームが採用されている。現在のFLシステムは成功したが、ネットワークのデータ分散はFLのような単一のグローバルモデルに必ずしも適合しない。したがって、そのような場合、FLに単一の大域モデルを持つことは不可能である。本稿では,より効率的なNIDSのためのSegmented-Federated Learning(Segmented-FL)学習手法を提案する。 Segmented-FLアプローチでは、周期的な局所モデル評価を行い、その結果に基づいてセグメンテーションを行う。同様のネットワーク環境を同じグループに持ち込もうとしている。さらに、Segmented-FLシステムは、作業者が保持するデータサンプル数に基づいて、局所モデルパラメータの重み付け集約と結合して、さらなる性能向上を行う。 FLや標準データセットの集中型システムと比較して,システムの性能向上が図られ,様々なタスクにまたがってその技術を拡張することが強くなっています。このソリューションは、多様なネットワーク環境を共同で学び、個々のデータセットのプライバシーを保護したい組織に応用される。 Cyberattacks are a major issue and cause organizations great financial and reputational harm. However, due to various factors, the current network intrusion detection systems (NIDS) seem to be insufficient. Predominant NIDS identify cyberattacks through a handcrafted dataset of rules. Although the recent applications of machine learning and deep learning have alleviated the enormous effort in NIDS, the security of network data has always been a prime concern. However, to counter the security problem and enable sharing among organizations, a Federated Learning (FL) scheme is employed. Although the current FL systems have been successful, a network's data distribution does not always fit into a single global model as in FL. Thus, in such cases, having a single global model in FL is not feasible. In this paper, we propose a Segmented-Federated Learning (Segmented-FL) scheme for a more efficient NIDS. The Segmented-FL approach employs periodic local model evaluation, on the basis of which the segmentation occurs. 
We aim to place similar network environments in the same group. Further, the Segmented-FL system is coupled with a weighted aggregation of local model parameters, weighted by the number of data samples each worker possesses, to further improve performance. The improved performance of our system compared with FL and centralized systems on a standard dataset further validates it and makes a strong case for extending the technique to other tasks. The solution applies to organizations that want to learn collaboratively across diverse network environments while protecting the privacy of individual datasets.	翻訳日:2021-07-05 12:41:14 公開日:2021-07-02
# アクセント音声認識のための教師付きコントラスト学習 Supervised Contrastive Learning for Accented Speech Recognition ( http://arxiv.org/abs/2107.00921v1 ) ライセンス: Link先を確認	Tao Han, Hantao Huang, Ziang Yang, Wei Han	(参考訳) ニューラルネットワークに基づく音声認識システムは、アクセント付き音声、特に不慣れなアクセントによる性能劣化に悩まされる。本稿では,アクセント付き音声認識のための教師付きコントラスト学習フレームワークについて検討する。コントラスト学習のための異なる視点(類似の「陽性」データサンプル)を構築するため,ノイズ注入,スペクトログラム拡張,TTS-same-sentence生成を含む3つのデータ拡張手法について検討した。Common Voiceデータセットを用いた実験から, コントラスト学習は, データ拡張不変かつ発音不変な表現の構築に寄与し, ゼロショットとフルショットの両方において, 従来の共同学習法を著しく上回ることを示した。コントラスト学習は,合同訓練法と比較して,平均で3.66% (ゼロショット) と3.78% (フルショット) の精度向上が示された。 Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents. In this paper, we study the supervised contrastive learning framework for accented speech recognition. To build different views (similar "positive" data samples) for contrastive learning, three data augmentation techniques including noise injection, spectrogram augmentation and TTS-same-sentence generation are further investigated. From the experiments on the Common Voice dataset, we have shown that contrastive learning helps to build data-augmentation invariant and pronunciation invariant representations, which significantly outperforms traditional joint training methods in both zero-shot and full-shot settings. Experiments show that contrastive learning can improve accuracy by 3.66% (zero-shot) and 3.78% (full-shot) on average, compared to the joint training method.	翻訳日:2021-07-05 12:40:53 公開日:2021-07-02
# システム設計と運用のための伝達距離の実証計測 Empirically Measuring Transfer Distance for System Design and Operation ( http://arxiv.org/abs/2107.01184v1 ) ライセンス: Link先を確認	Tyler Cody, Stephen Adams, Peter A. Beling	(参考訳) 古典的な機械学習アプローチは非定常性に敏感である。転送学習は、あるシステムから別のシステムへの知識を共有することによって、非定常性に対処することができるが、機械の予測や防御といった分野においては、データは基本的に制限される。したがって、転送学習アルゴリズムには、学習すべき例がほとんどない。本稿では,これらのアルゴリズム学習の制約がシステム工学によって対処可能であることを示唆する。一般に伝達距離を定式化し,モデルの伝達可能性の実証的定量化におけるその利用を実証する。我々は, 転送可能な予測モデルを実現するために, 機械改造手順の設計における伝達距離の利用を検討する。また,コンピュータビジョンにおける操作性能予測における伝達距離の利用も検討する。実務者は、コンポーネント学習システムで直面する学習論的課題を考慮して、提示された方法論を使ってシステムの設計と運用を行うことができる。 Classical machine learning approaches are sensitive to non-stationarity. Transfer learning can address non-stationarity by sharing knowledge from one system to another; however, in areas like machine prognostics and defense, data is fundamentally limited. Therefore, transfer learning algorithms have few, if any, examples from which to learn. Herein, we suggest that these constraints on algorithmic learning can be addressed by systems engineering. We formally define transfer distance in general terms and demonstrate its use in empirically quantifying the transferability of models. We consider the use of transfer distance in the design of machine rebuild procedures to allow for transferable prognostic models. We also consider the use of transfer distance in predicting operational performance in computer vision. Practitioners can use the presented methodology to design and operate systems with consideration for the learning theoretic challenges faced by component learning systems.	翻訳日:2021-07-05 12:40:16 公開日:2021-07-02
# 転校学習のシステム理論 A Systems Theory of Transfer Learning ( http://arxiv.org/abs/2107.01196v1 ) ライセンス: Link先を確認	Tyler Cody, Peter A. Beling	(参考訳) 伝達学習のための既存のフレームワークは、システム理論の観点から不完全である。彼らはドメインとタスクの概念を強調し、構造と振舞いの概念を無視する。そうすることで、形式主義が彼らの枠組みの解明に果たすことができる範囲を制限できる。ここでは、転移学習を集合上の関係として定義し、その後、転移学習の一般的な性質を数学的構成として特徴づけるメサロヴィッチ系理論を用いる。既存のフレームワークを私たちの観点で解釈し、トランスファー可能性、転送粗さ、転送距離の概念を定義する既存のフレームワークを越えています。重要な点は、その形式化にもかかわらず、学習理論の詳細な数学や機械学習の解法を回避し、それらの考察を取り除かないことである。したがって、システム設計と分析のための厳格な基盤を提供する、転送学習をモデリングするための正式な汎用システムフレームワークを提供する。 Existing frameworks for transfer learning are incomplete from a systems theoretic perspective. They place emphasis on notions of domain and task, and neglect notions of structure and behavior. In doing so, they limit the extent to which formalism can be carried through into the elaboration of their frameworks. Herein, we use Mesarovician systems theory to define transfer learning as a relation on sets and subsequently characterize the general nature of transfer learning as a mathematical construct. We interpret existing frameworks in terms of ours and go beyond existing frameworks to define notions of transferability, transfer roughness, and transfer distance. Importantly, despite its formalism, our framework avoids the detailed mathematics of learning theory or machine learning solution methods without excluding their consideration. As such, we provide a formal, general systems framework for modeling transfer learning that offers a rigorous foundation for system design and analysis.	翻訳日:2021-07-05 12:40:03 公開日:2021-07-02
# Attentive Speaker Embedding を用いたマルチユーザボイスフィルターライト Multi-user VoiceFilter-Lite via Attentive Speaker Embedding ( http://arxiv.org/abs/2107.01201v1 ) ライセンス: Link先を確認	Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw	(参考訳) 本稿では、voicefilter-liteのような話者条件付き音声モデルが、任意の数の登録ユーザを1回のパスでサポートできるようにするソリューションを提案する。これは、複数の話者埋め込みにアテンション機構を用いて単一の注意埋め込みを計算し、モデルへのサイドインプットとして使用することによって実現される。マルチユーザ音声フィルタ-liteを実装し,(1)ストリーミング自動音声認識(asr)タスク,(2)テキスト非依存話者照合タスク,(3)asrが複数の登録ユーザからのキーフレーズを雑音環境下で検出しなければならないパーソナライズされたキーフレーズ検出タスクの3つのタスクについて評価した。提案実験では,最大4人の登録ユーザに対して,重複する音声が存在する場合の音声認識と話者照合の誤りを,他の音響条件下での性能に影響を与えずに大幅に低減できることを示す。この注意型話者埋め込みアプローチは、個人用VADやパーソナライズされたASRといった他の話者条件モデルにも容易に適用できる。 In this paper, we propose a solution to allow speaker conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated it for three tasks: (1) a streaming automatic speech recognition (ASR) task; (2) a text-independent speaker verification task; and (3) a personalized keyphrase detection task, where ASR has to detect keyphrases from multiple enrolled users in a noisy environment. Our experiments show that, with up to four enrolled users, multi-user VoiceFilter-Lite is able to significantly reduce speech recognition and speaker verification errors when there is overlapping speech, without affecting performance under other acoustic conditions. This attentive speaker embedding approach can also be easily applied to other speaker-conditioned models such as personal VAD and personalized ASR.	翻訳日:2021-07-05 12:39:50 公開日:2021-07-02
# ニューラルネットワークにおける広平面ミニマの構造の解明 Unveiling the structure of wide flat minima in neural networks ( http://arxiv.org/abs/2107.01163v1 ) ライセンス: Link先を確認	Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Gabriele Perugini, Riccardo Zecchina	(参考訳) ディープラーニングの成功は、科学全体にわたるニューラルネットワークの応用の可能性を明らかにし、基本的な理論的問題を開いた。特に、勾配法の単純な変種に基づく学習アルゴリズムが、非凸損失関数のほぼ最適最小値を見つけることができるという事実は、ニューラルネットワークの予期せぬ特徴であり、深く理解する必要がある。このようなアルゴリズムは、ノイズがあってもほぼ完璧にデータを適合させることができるが、予測能力は優れている。いくつかの実験結果は、アルゴリズムによって達成されたいわゆる極小の平坦性と一般化性能との再現可能な相関を示した。同時に、統計物理学の結果は、非凸ネットワークにおいて、多くの狭小極小が、より少ない幅の平らな極小と共存していることを示しており、これはよく一般化している。ここでは,高いマージン分類に対応するミニマの合体から,広い平坦なミニマが生まれることを示す。ゼロマージン解と比較して指数関数的に稀であるにもかかわらず、高マージンミニマは特定の領域に集中する傾向がある。これらのミニマは、より小さく、より小さな縁の他の解に囲まれており、長距離の溶液の密集領域につながる。また, モデルパラメータの数が異なるため, 平坦な最小値が出現し, アルゴリズムが解を見つけ始めるタイミングを推定する代替分析手法も提供する。 The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks which needs to be understood in depth. Such algorithms are able to fit the data almost perfectly, even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise from the coalescence of minima that correspond to high-margin classifications. Despite being exponentially rare compared to zero-margin solutions, high-margin minima tend to concentrate in particular regions. 
These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.	翻訳日:2021-07-05 12:39:32 公開日:2021-07-02
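
The Segmented-FL entry in the table above describes a weighted aggregation of local model parameters based on the number of data samples each worker holds. A minimal sketch of that idea follows, assuming each worker's model is flattened into one parameter vector; the function name `weighted_aggregate` and the example numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def weighted_aggregate(local_params, sample_counts):
    """Average per-worker parameter vectors, weighted by each worker's sample count."""
    params = np.asarray(local_params, dtype=float)   # shape: (workers, n_params)
    counts = np.asarray(sample_counts, dtype=float)  # shape: (workers,)
    if params.shape[0] != counts.shape[0]:
        raise ValueError("one sample count is needed per worker")
    if counts.sum() <= 0:
        raise ValueError("total sample count must be positive")
    weights = counts / counts.sum()                  # weights sum to 1
    return weights @ params                          # weighted mean over workers

# Hypothetical segment with three workers (illustrative numbers only)
segment_params = [[0.0, 2.0], [1.0, 0.0], [4.0, 4.0]]
segment_counts = [100, 300, 600]
aggregated = weighted_aggregate(segment_params, segment_counts)
```

In a segmented scheme, this aggregation would presumably be applied separately within each group of similar workers rather than once over all of them.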

Title

Authors

Abstract

論文公表日・翻訳日

# ランダム積公式に対する濃度

Concentration for random product formulas ( http://arxiv.org/abs/2008.11751v3 )

ライセンス: Link先を確認

Chi-Fang Chen, Hsin-Yuan Huang, Richard Kueng, Joel A. Tropp

(参考訳) 量子シミュレーションは、量子化学と物理学に広く応用されている。近年、量子シミュレーションを加速するためのランダム化手法の研究が始まっている。このうち、qDRIFTと呼ばれる単純で強力な手法は、平均量子チャネルが理想的な進化を近似するランダムな積公式を生成することが知られている。 qDRIFTは、鈴木公式とは対照的に、ハミルトニアンの項数に明示的に依存しないゲート数を達成する。本研究の目的は,qDRIFTが生成するランダムな積公式の単一実現を包括的に解析することで,このスピードアップの起源を理解することである。主な結果は、ランダム化された積公式の典型的な実現が、小さなダイヤモンドノルム誤差まで理想ユニタリ進化を近似することを示している。ゲートの複雑性は、既にハミルトニアンにおける項数とは独立であるが、ハミルトニアンにおける相互作用強度の和とシステムサイズに依存する。注目すべきは、任意だが固定された入力状態から始まる同じランダムな進化は、その入力状態に適したはるかに短い回路をもたらすことである。対照的に、決定論的設定では、そのような改善は通常、初期状態の知識を必要とする。証明はベクトルおよび行列マルチンゲールの濃度不等式に依存し、他のランダム化された積公式にも適用できる。我々の境界はある種の可換ハミルトニアンによって飽和している。

Quantum simulation has wide applications in quantum chemistry and physics. Recently, scientists have begun exploring the use of randomized methods for accelerating quantum simulation. Among them, a simple and powerful technique, called qDRIFT, is known to generate random product formulas for which the average quantum channel approximates the ideal evolution. qDRIFT achieves a gate count that does not explicitly depend on the number of terms in the Hamiltonian, which contrasts with Suzuki formulas. This work aims to understand the origin of this speed-up by comprehensively analyzing a single realization of the random product formula produced by qDRIFT. The main results prove that a typical realization of the randomized product formula approximates the ideal unitary evolution up to a small diamond-norm error. The gate complexity is already independent of the number of terms in the Hamiltonian, but it depends on the system size and the sum of the interaction strengths in the Hamiltonian. Remarkably, the same random evolution starting from an arbitrary, but fixed, input state yields a much shorter circuit suitable for that input state. In contrast, in deterministic settings, such an improvement usually requires initial state knowledge. The proofs depend on concentration inequalities for vector and matrix martingales, and the framework is applicable to other randomized product formulas. Our bounds are saturated by certain commuting Hamiltonians.

翻訳日:2023-05-04 21:30:30 公開日:2021-07-02
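
The abstract above analyzes a single realization of the random product formula produced by qDRIFT. As a rough illustration, the sketch below builds one such realization under the standard qDRIFT sampling rule: for a Hamiltonian written as a sum of terms with positive weights `h_j` and unit-norm Hermitian operators `H_j`, each step picks term `j` with probability `h_j / lam` and applies `exp(-i * lam * t / n_steps * H_j)`. The toy single-qubit Hamiltonian, the function names and the step count are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def expm_hermitian(h, theta):
    """Return exp(-i * theta * h) for a Hermitian matrix h via eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T

def qdrift_unitary(coeffs, terms, t, n_steps, rng):
    """Build one random product formula for H = sum_j coeffs[j] * terms[j]."""
    coeffs = np.asarray(coeffs, dtype=float)
    lam = coeffs.sum()                   # total interaction strength
    probs = coeffs / lam                 # sampling distribution over terms
    tau = lam * t / n_steps              # every sampled step uses the same angle
    u = np.eye(terms[0].shape[0], dtype=complex)
    for j in rng.choice(len(terms), size=n_steps, p=probs):
        u = expm_hermitian(terms[j], tau) @ u
    return u

# Toy example (hypothetical): H = 0.6 X + 0.4 Z on one qubit
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
coeffs, terms, t = [0.6, 0.4], [X, Z], 0.5
rng = np.random.default_rng(0)
u_random = qdrift_unitary(coeffs, terms, t, n_steps=2000, rng=rng)
u_exact = expm_hermitian(coeffs[0] * X + coeffs[1] * Z, t)
error = np.linalg.norm(u_random - u_exact, ord=2)  # spectral-norm distance
```

The sketch compares unitaries in spectral norm only; the diamond-norm and fixed-input-state guarantees discussed in the abstract are not reproduced here.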

# コヒーレント励起窒素イオン中の光子保持

Photon retention in coherently excited nitrogen ions ( http://arxiv.org/abs/2011.11926v2 )

ライセンス: Link先を確認

Jinping Yao, Luojia Wang, Jinming Chen, Yuexin Wan, Zhihao Zhang, Fangbo Zhang, Lingling Qiao, Shupeng Yu, Botao Fu, Zengxiu Zhao, Chengyin Wu, Vladislav V. Yakovlev, Luqi Yuan, Xianfeng Chen, Ya Cheng

(参考訳) 量子光学における量子コヒーレンス(quantum coherence)は、光情報処理と光操作の重要な部分である。多くの欠点にもかかわらず、アルカリ金属の蒸気は、便利な近赤外励起、強い双極子転移、長寿命コヒーレンスにより、量子光学においてワーキング媒体として使用される。そこで本研究では,800nmフェムト秒レーザーパルスを用いたコヒーレント励起分子窒素イオン(N2+)系において,光子保持と量子コヒーレンスへの再有効性を示す実験を行った。このような光子保持は、量子コヒーレンスによって促進され、数十ピコ秒の間直接測定不能なコヒーレント光子を放出し続けるが、2光子共振吸収により1580nm中心の時間遅延フェムト秒パルスによって読み出され、329.3nmの強い放射となる。本システムでは, 励起状態の個体群が, 非常に弱い再放出光子を伝達する役割を明らかにする。この新たな発見は、N2+における光情報ストレージの潜在的なプラットフォームとしてのコヒーレントな量子制御の性質を明らかにし、強磁場イオン化分子を用いた量子光学プラットフォームにおける基本的な相互作用のさらなる探索を容易にする。

Quantum coherence in quantum optics is an essential part of optical information processing and light manipulation. Alkali metal vapors, despite their numerous shortcomings, are traditionally used in quantum optics as a working medium due to convenient near-infrared excitation, strong dipole transitions and long-lived coherence. Here, we propose and experimentally demonstrate photon retention and subsequent re-emission with quantum coherence in a system of coherently excited molecular nitrogen ions (N2+), which are produced using a strong 800 nm femtosecond laser pulse. Such photon retention, facilitated by quantum coherence, keeps releasing coherent photons that cannot be measured directly for tens of picoseconds, but can be read out by a time-delayed femtosecond pulse centered at 1580 nm via two-photon resonant absorption, resulting in strong radiation at 329.3 nm. We reveal a pivotal role of the excited-state population in transmitting such extremely weak re-emitted photons in this system. This new finding unveils the nature of coherent quantum control in N2+ as a potential platform for optical information storage in the remote atmosphere, and facilitates further exploration of fundamental interactions in the quantum optical platform with strong-field ionized molecules.

翻訳日:2023-04-23 06:49:22 公開日:2021-07-02

# 5G大規模MIMOネットワークにおけるセル間干渉緩和のための強化学習支援ビームフォーミング

Reinforcement Learning Assisted Beamforming for Inter-cell Interference Mitigation in 5G Massive MIMO Networks ( http://arxiv.org/abs/2103.11782v2 )

ライセンス: Link先を確認

Aidong Yang, Xinlang Yue, Ye Ouyang

(参考訳) ビームフォーミングは、5Gの大規模マルチインプット・マルチプルアウトプット(MMIMO)通信において重要な技術であり、無線伝送路の性質、すなわち空気の性質により多くの障害を受ける。セル間干渉(ICI)は、周波数再利用技術による5G通信が直面する主な障害の1つである。本稿では,5GダウンリンクにおけるICI緩和のためのフルダイナミックビームフォーミングを支援する強化学習(RL)を提案する。提案アルゴリズムは、ICIを最小化するためにビームフォーミングとフルダイナミックQ-ラーニングを併用し、チャネル推定を行なわない低複雑さ手法を実現する。パフォーマンス分析は、他のアルゴリズムと比較して、SINR(signal-to-interference-plus-noise-ratio)と計算複雑性の観点からサービス改善の品質を示している。

Beamforming is an essential technology in 5G massive multiple-input multiple-output (MMIMO) communications, which are subject to many impairments due to the nature of the wireless transmission channel, i.e., the air. Inter-cell interference (ICI) is one of the main impairments faced by 5G communications due to frequency-reuse technologies. In this paper, we propose reinforcement learning (RL) assisted full dynamic beamforming for ICI mitigation in the 5G downlink. The proposed algorithm combines beamforming with full dynamic Q-learning to minimize the ICI, and results in a low-complexity method without channel estimation. Performance analysis shows the quality-of-service improvement in terms of signal-to-interference-plus-noise ratio (SINR) and computational complexity compared to other algorithms.

翻訳日:2023-04-13 19:40:38 公開日:2021-07-02
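
The abstract above combines beamforming with Q-learning but does not specify the state, action or reward design. The sketch below shows only the textbook tabular Q-learning update with epsilon-greedy selection. Treating states as discretized interference levels, actions as candidate beam indices and the reward as an SINR-like score is an assumption made for illustration, and all names and sizes are hypothetical.

```python
import numpy as np

def select_beam(q_table, state, epsilon, rng):
    """Epsilon-greedy choice of a beam index for the given state."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))  # explore: random beam
    return int(np.argmax(q_table[state]))           # exploit: best known beam

def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update in place and return the new value."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table[state, action]

# Hypothetical sizes: 3 discretized interference levels, 4 candidate beams
rng = np.random.default_rng(0)
q = np.zeros((3, 4))
beam = select_beam(q, state=0, epsilon=0.1, rng=rng)
new_value = q_update(q, state=0, action=beam, reward=1.0, next_state=2)
```

Because the table is updated from observed rewards alone, a scheme of this kind needs no explicit channel estimate, which is consistent with the low-complexity claim in the abstract.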

# 古典的および量子的状態における仕事と熱の測定

Measurement of work and heat in the classical and quantum regimes ( http://arxiv.org/abs/2102.01493v2 )

ライセンス: Link先を確認

Paolo Solinas, Mirko Amico and Nino Zanghì

(参考訳) 量子レベルで仕事や熱の概念を研究する研究分野は、量子システムにおける仕事や熱や内部エネルギーの変化を適切に定義し測定することが困難であることと、実験の欠如という2つの大きな欠点に苦しめられている。本稿では, 工学的環境と相互作用する2レベル量子系の放散熱, 作業, 内部エネルギー変動の完全な特徴について報告する。我々は、IBMQ量子コンピュータを用いて、分散環境で駆動システムのダイナミクスを実装する。実験データを用いて準確率分布関数を構築し, 散逸過程における作業量, 熱量, 内部エネルギー量の正しい平均値を復元する。興味深いことに, 環境結合強度を増大させることにより, 古典的極限の出現と解釈されるエネルギー交換過程の純粋量子特性の低減を観測した。これにより、現在のアプローチはエネルギー交換における量子効果を研究し、理解し、活用するための特権的なツールとなる。

Despite the increasing interest, the research field which studies the concepts of work and heat at quantum level has suffered from two main drawbacks: first, the difficulty to properly define and measure the work, heat and internal energy variation in a quantum system and, second, the lack of experiments. Here, we report a full characterization of the dissipated heat, work and internal energy variation in a two-level quantum system interacting with an engineered environment. We use the IBMQ quantum computer to implement the driven system's dynamics in a dissipative environment. The experimental data allow us to construct quasi-probability distribution functions from which we recover the correct averages of work, heat and internal energy variation in the dissipative processes. Interestingly, by increasing the environment coupling strength, we observe a reduction of the pure quantum features of the energy exchange processes that we interpret as the emergence of the classical limit. This makes the present approach a privileged tool to study, understand and exploit quantum effects in energy exchanges.

翻訳日:2023-04-13 00:39:22 公開日:2021-07-02

# ボース・アインシュタイン音響ブラックホールにおける二分極と三分極の絡み合い

Bipartite and tripartite entanglement in a Bose-Einstein acoustic black hole ( http://arxiv.org/abs/2102.06175v2 )

ライセンス: Link先を確認

Mathieu Isoard, Nadia Milazzo, Nicolas Pavloff, Olivier Giraud

(参考訳) ボース・アインシュタイン凝縮体の流れで実現される類似ブラックホールの量子絡み合いを調べる。この系は3モードガウス状態によって記述され、対応する共分散行列を0および有限温度で構成する。両分節および三分節の絡み合いについて検討し,その実験的観察について考察した。我々は、系のホーキング温度とグレーボディ係数を決定する新しい方法を示す類似のボース・アインシュタインブラックホールと同等の単純な光学装置を同定する。

We investigate quantum entanglement in an analogue black hole realized in the flow of a Bose-Einstein condensate. The system is described by a three-mode Gaussian state and we construct the corresponding covariance matrix at zero and finite temperature. We study associated bipartite and tripartite entanglement measures and discuss their experimental observation. We identify a simple optical setup equivalent to the analogue Bose-Einstein black hole which suggests a new way of determining the Hawking temperature and grey-body factor of the system.

翻訳日:2023-04-11 11:51:18 公開日:2021-07-02

# 開放型量子システムとしての神経系モデリング

Modeling the Nervous System as An Open Quantum System ( http://arxiv.org/abs/2104.09424v2 )

ライセンス: Link先を確認

Yu-Juan Sun and Wei-Min Zhang

(参考訳) 本稿では,ニューロンをシミュレートし,神経細胞の周囲を介し相互に相互作用する多ニューロン相互作用系のニューラルネットワークモデルを提案する。我々は、神経活動電位の電気回路環境から生じるあらゆる種類の振動モードの収集として、樹状突起、軸索、シナプス、および周囲のグリア細胞を含む神経細胞周囲を物理的にモデル化する。オープン量子システムのマスター方程式を用いて神経モデルのダイナミクスを解析し,ニューロンの集団行動について検討した。神経回路に刺激を施した後、ニューロン集団状態が活性化され、行動電位の挙動を示す。このモデルはランダムなニューロンとニューロンの相互作用を発生させ、神経系における情報伝達の過程を物理的に記述するのに適しており、神経系のダイナミクスを理解するための潜在的な経路となる可能性がある。

We propose a neural network model of a multi-neuron interacting system in which neurons interact with each other through the surroundings of neuronal cell bodies. We physically model the neuronal cell surroundings, including the dendrites, the axons and the synapses as well as the surrounding glial cells, as a collection of all kinds of oscillating modes arising from the electric circuit environment of neuronal action potentials. By analyzing the dynamics of this neural model through the master equation approach of open quantum systems, we investigate the collective behavior of neurons. After applying stimulations to the neural network, the neuronal collective state is activated and shows the action potential behavior. We find that this model can generate random neuron-neuron interactions and is suitable for physically describing the process of information transmission in the nervous system, which may pave a route toward understanding the dynamics of the nervous system.

翻訳日:2023-04-07 18:40:23 公開日:2021-07-02

# Lifshitzフェルミオンによる絡み合いエントロピー

Entanglement Entropy with Lifshitz Fermions ( http://arxiv.org/abs/2104.10913v3 )

ライセンス: Link先を確認

Dion Hartmann and Kevin Kavanagh and Stefan Vandoren

(参考訳) Lifshitzのスケーリング対称性を持つフェルミオンを調べ、1+1次元のエンタングルメントエントロピーをスケーリング指数$z$の関数として研究する。興味深いことに、基底状態では、絡み合いエントロピーは$z$の偶数に対して消えるが、奇数に対しては$z$とは独立であり、$z=1$の相対論的ケースと同値である。格子上での相関法とホログラフィック cMERA を用いた手法を用いてこれを示す。熱状態における絡み合いエントロピーは、格子相関法を用いてプロットする$z$と$T$のより詳細な関数である。 $z$ の偶奇への依存性は低温ではなお見られるが、高温または大きな$z$では洗い流される。

We investigate fermions with Lifshitz scaling symmetry and study their entanglement entropy in 1+1 dimensions as a function of the scaling exponent $z$. Remarkably, in the ground state the entanglement entropy vanishes for even values of $z$, whereas for odd values it is independent of $z$ and equal to the relativistic case with $z=1$. We show this using the correlation method on the lattice, and also using a holographic cMERA approach. The entanglement entropy in a thermal state is a more detailed function of $z$ and $T$ which we plot using the lattice correlation method. The dependence on the even- or oddness of $z$ still shows for small temperatures, but is washed out for large temperatures or large values of $z$.

翻訳日:2023-04-02 20:28:47 公開日:2021-07-02
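
The abstract above computes entanglement entropy with the correlation method on the lattice. For free fermions, that method obtains the entropy from the eigenvalues `nu` of the two-point correlation matrix restricted to the subsystem, as `S = -sum(nu * ln(nu) + (1 - nu) * ln(1 - nu))`. The sketch below applies it to a standard infinite nearest-neighbour hopping chain at half filling, used here only as a familiar reference case with a linear low-energy dispersion. It is not the Lifshitz lattice model of the paper, and the function names are hypothetical.

```python
import numpy as np

def block_correlation_matrix(block_size, k_fermi=np.pi / 2):
    """Ground-state correlations C_mn = sin(k_F (m - n)) / (pi (m - n)) on a block."""
    idx = np.arange(block_size)
    diff = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi x) / (pi x), which also gives the diagonal value k_F / pi
    return (k_fermi / np.pi) * np.sinc(k_fermi * diff / np.pi)

def entanglement_entropy(corr):
    """Entropy of a free-fermion block from its correlation-matrix eigenvalues."""
    nu = np.linalg.eigvalsh(corr)
    nu = nu[(nu > 1e-12) & (nu < 1 - 1e-12)]  # eigenvalues at 0 or 1 contribute nothing
    return float(-np.sum(nu * np.log(nu) + (1 - nu) * np.log(1 - nu)))

# Entropy of blocks of increasing length in the reference chain
entropies = [entanglement_entropy(block_correlation_matrix(n)) for n in (4, 16, 64)]
```

Studying other values of $z$ with this method would require the correlation matrix of the corresponding lattice Hamiltonian, which this sketch does not attempt to construct.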

# 量子光合成における量子フィードバック制御

Quantum feedback control in quantum photosynthesis ( http://arxiv.org/abs/2105.12128v3 )

ライセンス: Link先を確認

S.V. Kozyrev, A.N. Pechen

(参考訳) 相互作用する励起子とビブロンの系における量子フィードバック制御のモデルとしての量子光合成における電荷分離のモデルを紹介する。このアプローチにおける量子フィードバックは、デコヒーレンスを伴うランダウ・ツェナー遷移を記述する。このモデルは、量子光合成における電荷分離の過程における非可逆性を説明する。この量子制御モデルに対する直接遷移は1に近い確率を持ち、逆遷移は0に近い確率を持つ。これは量子ラチェットのモデルと見なすことができる。また、このモデルは遷移のボーア周波数と遷移に結合したビブロンのエネルギーの一致を説明する。

A model of charge separation in quantum photosynthesis is introduced as a model of quantum feedback control in a system of interacting excitons and vibrons. Quantum feedback in this approach describes the Landau--Zener transition with decoherence. The model explains irreversibility in the process of charge separation for quantum photosynthesis: direct transitions in this quantum control model have probabilities close to one, and reverse transitions have probabilities close to zero. This can be considered a model of a quantum ratchet. The model also explains the coincidence of the energy of the vibron paired to the transition with the Bohr frequency of the transition.

翻訳日:2023-03-29 20:40:35 公開日:2021-07-02

# 時間依存相対論的非摂動クーロン場における自然放射スペクトルのゲージ依存性

Gauge dependence of spontaneous radiation spectrum in a time-dependent relativistic non-perturbative Coulomb field ( http://arxiv.org/abs/2106.03429v3 )

ライセンス: Link先を確認

Xue-Nan Chen, Yu-hang Luo and Xiang-Song Chen

(参考訳) 我々は、相対論的荷電粒子のクラスターによって生成できる時間依存相対論的非摂動クーロン場を含むことをラムが指摘した「ゲージ選択」問題を拡張する。断熱条件が慎重に維持されている場合は、原子状態を定義する際に核クーロンポテンシャルの側にその場を含めなければならない。外部磁場近似をとると、この時間依存相対論的非摂動クーロン場に対するゲージ選択は、従来の方法では克服できず、一過性自発的放射スペクトルのゲージ依存性がかなり大きいことが判明した。我々は、一般的なクーロン、ローレンツ、マルチポーラゲージに対して、そのようなゲージ依存性が10MHz以上であるような単純な1次元電荷調和振動子を明示的に計算する。一般の見解とは対照的に、このゲージ依存は実際には災害ではなく、実際には利点である:相対論的境界状態問題は非常に複雑であり、完全な量子場法が不足しているため、外部場の近似は導出できず、したがって保証されない。しかし、実験データに適合することにより、常に有効外部場を定義することができ、これは特定のゲージのゲージポテンシャルとパラメータ化される可能性がある。この効果的な外部場は現象学的な用途だけでなく、ゲージ場の物理的意義にも光を当てた。

We extend the "gauge choice" problem Lamb noticed to include a time-dependent relativistic non-perturbative Coulomb field, which can be produced by a cluster of relativistic charged particles. If adiabatic conditions are carefully maintained, such a field must be included alongside the nuclear Coulomb potential when defining the atomic state. We reveal that when taking the external field approximation, the gauge choice for this time-dependent relativistic non-perturbative Coulomb field cannot be overcome by previous methods, and leads to considerable gauge dependence of the transient spontaneous radiation spectrum. We calculate explicitly with a simple one-dimensional charged harmonic oscillator that such a gauge dependence can be of a measurable magnitude of 10 MHz or larger for the commonly used Coulomb, Lorentz, and multipolar gauges. Contrary to the popular view, we explain that this gauge dependence is not really a disaster, but actually an advantage here: The relativistic bound-state problem is so complicated that a fully quantum-field method is still lacking, thus the external field approximation cannot be derived and hence is not guaranteed. However, by fitting to the experimental data, one may always define an effective external field, which may likely be parameterized with the gauge potential in a particular gauge. This effective external field would not only be of phenomenological use, but also shed light on the physical significance of the gauge field.

翻訳日:2023-03-27 09:16:24 公開日:2021-07-02

# 飽和付近の量子スピン解法:QS$^3$

Quantum spin solver near saturation: QS$^3$ ( http://arxiv.org/abs/2107.00872v1 )

ライセンス: Link先を確認

Hiroshi Ueda, Seiji Yunoki, Tokuro Shimokawa

(参考訳) QS$^{3}$ [kjúː-és-kjúːb] というプログラムパッケージを開発し、スピン-1/2 XXZ型量子スピンモデルを全偏極状態近傍の空間的均一・非一様格子上で解析し、希薄なハードコアボース系にマッピングする。 QS$^{3}$の全ての計算は、固有値問題、一点/二点スピン作用素の期待値、静的/動的スピン構造因子を含め、スピン配置のビット表現を使わずに、ダウンスピンの数$N_{\downarrow}$と並進対称性に関連付けられた波数$\boldsymbol{k}$で指定された対称性適合基底で実行される。これらの処理により、QS$^{3}$は1000以上のサイトと希薄な$N_{\downarrow}$を含む大規模量子システムをサポートすることができる。 $10\times10\times10$立方格子上の等方性ハイゼンベルクモデルの低エネルギー励起分散に対するQS$^{3}$のベンチマーク結果、$10\times10$平方格子上の等方性ハイゼンベルクモデルの静的および動的スピン構造因子、および固体物理学研究所(ISSP)に設置されたAMD EPYC 7702に基づくスーパーコンピュータ(Ohtaka)上でのOpenMP並列化効率を示す。理論的背景とQS$^{3}$のユーザインタフェースについても述べる。

We develop a program package named QS$^{3}$ [kjúː-és-kjúːb] based on the (thick-restart) Lanczos method for analyzing spin-1/2 XXZ-type quantum spin models on spatially uniform/non-uniform lattices near fully polarized states, which can be mapped to dilute hardcore Bose systems. All calculations in QS$^{3}$, including eigenvalue problems, expectation values for one/two-point spin operators, and static/dynamical spin structure factors, are performed in the symmetry-adapted bases specified by the number $N_{\downarrow}$ of down spins and the wave number $\boldsymbol{k}$ associated with the translational symmetry without using the bit representation for specifying spin configurations. Because of these treatments, QS$^{3}$ can support large-scale quantum systems containing more than 1000 sites with dilute $N_{\downarrow}$. We show the benchmark results of QS$^{3}$ for the low-energy excitation dispersion of the isotropic Heisenberg model on the $10\times10\times10$ cubic lattice, the static and dynamical spin structure factors of the isotropic Heisenberg model on the $10\times10$ square lattice, and the OpenMP parallelization efficiency on the supercomputer (Ohtaka) based on AMD EPYC 7702 installed at the Institute for Solid State Physics (ISSP). Theoretical backgrounds and the user interface of QS$^{3}$ are also described.

翻訳日:2023-03-23 18:51:19 公開日:2021-07-02
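
The abstract above states that QS$^{3}$ can handle more than 1000 sites when the number $N_{\downarrow}$ of down spins is small. A short counting sketch makes this plausible: with $N_{\downarrow}$ fixed, the number of spin configurations on $N$ sites is the binomial coefficient $\binom{N}{N_{\downarrow}}$ rather than $2^N$. The helper name `sector_dimension` is hypothetical, and the sketch counts configurations only; it does not reproduce the package's symmetry-adapted basis.

```python
from math import comb

def sector_dimension(n_sites, n_down):
    """Number of spin-1/2 configurations with exactly n_down down spins."""
    return comb(n_sites, n_down)

# Dilute sectors of a 1000-site lattice, e.g. the 10 x 10 x 10 cubic lattice
dim_two_down = sector_dimension(1000, 2)    # 499500 configurations
dim_three_down = sector_dimension(1000, 3)  # 166167000 configurations
full_dimension = 2 ** 1000                  # full Hilbert space, for comparison
```

Fixing the wave number $\boldsymbol{k}$ on a translationally invariant lattice shrinks each block further, by a factor of roughly the number of sites.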

# 室温導体による2つのレーザー冷却イオンのカップリング

Coupling two laser-cooled ions via a room-temperature conductor ( http://arxiv.org/abs/2107.00851v1 )

ライセンス: Link先を確認

Da An, Alberto M. Alonso, Clemens Matthiesen, and Hartmut Häffner

(参考訳) 分離距離620$\mu$mの2つの独立に捕捉されたイオンの運動間の結合を示す。イオン-イオン相互作用は、2つの表面トラップを接続する室温電気浮遊金属線を介して強化される。両イオンの共鳴運動を調整し、結合速度11Hzのエネルギーの流れを示す。量子コヒーレント結合はデバイス内の強表面電界ノイズによって妨げられる。イオン配線系は、室温導体を用いて、自由空間双極子-双極子カップリングによって達成可能な距離を超える距離における、独立に閉じ込められた電荷間の相互作用を仲介し、調整することができることを示す。この技術は、同調的に冷却したり、遠隔で閉じ込められた電荷を絡ませたり、異なる物理的システム間のカップリングを可能にするために用いられる。

We demonstrate coupling between the motions of two independently trapped ions with a separation distance of 620 $\mu$m. The ion-ion interaction is enhanced via a room-temperature electrically floating metallic wire which connects two surface traps. Tuning the motion of both ions into resonance, we show flow of energy with a coupling rate of 11 Hz. Quantum-coherent coupling is hindered by strong surface electric-field noise in our device. Our ion-wire-ion system demonstrates that room-temperature conductors can be used to mediate and tune interactions between independently trapped charges over distances beyond those achievable with free-space dipole-dipole coupling. This technology may be used to sympathetically cool or entangle remotely trapped charges and enable coupling between disparate physical systems.

翻訳日:2023-03-23 18:50:42 公開日:2021-07-02

# 量子計測仮定の微視的導出

A microscopic derivation of the quantum measurement postulates ( http://arxiv.org/abs/2107.00803v1 )

ライセンス: Link先を確認

Vyacheslav Lysov and Yasha Neiman

(参考訳) 19世紀中頃、力学と熱力学の両方の法則が知られ、どちらも基本的であった。これはボルツマンとギブスによって変更され、熱力学は、非常に大きなシステムに力学を適用し、それらの振る舞いに関する単純な統計的仮定をすることで、*由来*であることを示した。同様に、量子力学(QM)が最初に発見されたとき、波動関数の決定論的進化と確率論的測定プロセスの2つの仮定を必要とするように見えた。ここでも後者は前者から導かれる: 大規模システム(機器、観測者、環境)にユニタリ進化を適用し、それらの振る舞いに関する単純な仮定をすることで、量子測定のすべての特徴を導出することができる。私たちは、量子実験の単純で明示的なモデルを用いて、この主張を実証することにしました。

In the mid-19th century, both the laws of mechanics and thermodynamics were known, and both appeared fundamental. This was changed by Boltzmann and Gibbs, who showed that thermodynamics can be *derived*, by applying mechanics to very large systems, and making simple statistical assumptions about their behavior. Similarly, when Quantum Mechanics (QM) was first discovered, it appeared to require two sets of postulates: one about the deterministic evolution of wavefunctions, and another about the probabilistic measurement process. Here again, the latter is derivable from the former: by applying unitary evolution to large systems (apparatuses, observers and environment), and making simple assumptions about their behavior, one can derive all the features of quantum measurement. We set out to demonstrate this claim, using a simple and explicit model of a quantum experiment, which we hope will be clear and compelling to the average physicist.

翻訳日:2023-03-23 18:50:23 公開日:2021-07-02

# 連続変数を持つ量子場理論のための量子イマジナリー時間進化アルゴリズム

Quantum Imaginary Time Evolution Algorithm for Quantum Field Theories with Continuous Variables ( http://arxiv.org/abs/2107.00791v1 )

ライセンス: Link先を確認

Kübra Yeter-Aydeniz, Eleftherios Moschandreou, George Siopsis

(参考訳) 量子想像時間進化アルゴリズムの連続可変バージョンを用いて格子上の相互作用するスカラー量子場理論のエネルギーレベルと対応する固有状態を計算する。格子上の各点における場のシミュレーションには1つのqumodeのみが必要である。我々の量子アルゴリズムは非ガウス量子ゲートの使用を回避し、代わりに光子数演算子の固有状態に投影する検出器に依存する。 XanaduのStrawberry Fieldsシミュレーターを用いて、正確な計算結果と非常によく一致したエネルギーレベルの結果を得る。既存の技術で実現可能な実験的なセットアップを提案する。

We calculate the energy levels and corresponding eigenstates of an interacting scalar quantum field theory on a lattice using a continuous-variable version of the quantum imaginary time evolution algorithm. Only a single qumode is needed for the simulation of the field at each point on the lattice. Our quantum algorithm avoids the use of non-Gaussian quantum gates and relies, instead, on detectors projecting onto eigenstates of the photon-number operator. Using Xanadu's Strawberry Fields simulator, we obtain results on energy levels that are in very good agreement with results from exact calculations. We propose an experimental setup that can be realized with existing technology.

翻訳日:2023-03-23 18:50:05 公開日:2021-07-02

# 1つのモード間相互作用を有する非定常キャビティの厳密解

Exact solution of a non-stationary cavity with one intermode interaction ( http://arxiv.org/abs/2107.00785v1 )

ライセンス: Link先を確認

I. Ramos-Prieto, R. Román-Ancheyta, J. Récamier and H. M. Moya-Cessa

(参考訳) 非定常一次元空洞は、いわゆる動的カシミール効果の時間依存的かつ多モード有効ハミルトニアンによって記述することができる。共振器ミラーの1つに課される非断熱境界条件により、この効果は電磁場の真空変動から実際の光子の発生を予測する。このような光子生成はキャビティにおけるモードの数とその中間結合に依存する。ここでは代数的アプローチを用いて、実効ハミルトニアンをパラメタライズする任意の関数に対して、対応する時間依存シュリンガー方程式は空洞が1つの終端相互作用を持つときの正確な解を認めることを示す。 11個の指数関数の積として書かれた正確な時間発展演算子により、各モードの平均光子数、関連する観測可能数、進化した真空状態の統計特性が得られる。

A non-stationary one-dimensional cavity can be described by the time-dependent and multi-mode effective Hamiltonian of the so-called dynamical Casimir effect. Due to the non-adiabatic boundary conditions imposed on one of the cavity mirrors, this effect predicts the generation of real photons out of vacuum fluctuations of the electromagnetic field. Such photon generation strongly depends on the number of modes in the cavity and their intermode couplings. Here, by using an algebraic approach, we show that for any set of functions parameterizing the effective Hamiltonian, the corresponding time-dependent Schrödinger equation admits an exact solution when the cavity has one intermode interaction. With the exact time evolution operator, written as a product of eleven exponentials, we obtain the average photon number in each mode, a few relevant observables and some statistical properties for the evolved vacuum state.

翻訳日:2023-03-23 18:49:56 公開日:2021-07-02

# 二成分多モード状態に対するアインシュタイン・ポドルスキー・ローゼンの不確かさ限界

Einstein-Podolsky-Rosen uncertainty limits for bipartite multimode states ( http://arxiv.org/abs/2107.01058v1 )

ライセンス: Link先を確認

Paulina Marian and Tudor A. Marian

(参考訳) 量子系の多部状態に対する相関の証明と定量化は、量子情報理論において中心的な課題であるように見える。ここでは、連続変数のマルチモード状態の絡み合いとアインシュタイン-ポドルスキー-ローゼン(epr)ステアリングの両方のユニタリ量子力学的観点を与える。これはモードの正準二次作用素に対するハイゼンベルクの不確実性関係に由来する。適切なEPR様観測値の対のばらつきを用いて, 2-party $(N\, \text{vs} \,1)$-mode状態の相関について検討した。これらの非局所変数の不確かさの和は、下から局所の不確かさによって束縛され、分離可能な状態と、各一方通行不能な状態に対して異なる強化がなされる。これらの分散の最小の正規化和の分析は、両方の可能なステアリング方法において、$(N\, \text{vs} \,1)$-モード状態の分離性とEPR不安定性の必要条件をもたらす。状態と実行された測定値がガウス的である場合、これらの条件は正確には分離性と一方的不安定性の既知基準である。

Certification and quantification of correlations for multipartite states of quantum systems appear to be central tasks in quantum information theory. We give here a unitary quantum-mechanical perspective of both entanglement and Einstein-Podolsky-Rosen (EPR) steering of continuous-variable multimode states. This originates in the Heisenberg uncertainty relations for the canonical quadrature operators of the modes. Correlations of two-party $(N\, \text{vs} \,1)$-mode states are examined by using the variances of a pair of suitable EPR-like observables. It turns out that the uncertainty sum of these nonlocal variables is bounded from below by local uncertainties and is strengthened differently for separable states and for each class of one-way unsteerable states. The analysis of the minimal properly normalized sums of these variances yields necessary conditions of separability and EPR unsteerability of $(N\, \text{vs} \,1)$-mode states in both possible ways of steering. When the states and the performed measurements are Gaussian, these conditions are precisely the previously known criteria of separability and one-way unsteerability.

翻訳日:2023-03-23 18:45:21 公開日:2021-07-02

# FPGAによる量子鍵分布の大規模かつ高速なプライバシ増幅

Large-scale and High-speed Privacy Amplification for FPGA-based Quantum Key Distribution ( http://arxiv.org/abs/2107.01013v1 )

ライセンス: Link先を確認

Yan Bingze and Li Qiong and Mao Haokun

(参考訳) FPGAベースの量子鍵分布(QKD)システムはQKDシステムの重要なトレンドである。いくつかの利点、リアルタイム、低消費電力、高統合密度がある。プライバシアンプリフィケーションは、QKDのセキュリティを確保するために、QKDシステムにおいて不可欠な部分である。既存のFPGAベースのプライバシー増幅スキームには、これらのスキームのスループットと入力サイズ(最良のスキーム116Mbps@10^6)が他のプラットフォームよりもはるかに低い(最良のスキーム1Gbps@10^8)という欠点がある。本稿では,マルチ線形モジュラーハッシュモジュラー演算ハッシュ(MMH-MH)と数値理論変換(NTT)アルゴリズムを用いたFPGAベースのQKDのための新しいPAスキームを設計する。大規模かつ高速(LSHS)PAスキームと名付けられた新しいPAスキームは、乗算再利用可能なアーキテクチャと3つのキーユニットを設計し、性能を改善した。この方式はPAの入力サイズとスループットを桁違いに改善する。このスキームのスループットと入力サイズ(1gbps@10^8)は、他のプラットフォームと同等である。

The FPGA-based quantum key distribution (QKD) system is an important trend in QKD systems. It has several advantages: real-time operation, low power consumption, and high integration density. Privacy amplification (PA) is an essential part of a QKD system to ensure the security of QKD. Existing FPGA-based privacy amplification schemes have a disadvantage: their throughput and input size (the best scheme 116Mbps@10^6) are much lower than those on other platforms (the best scheme 1Gbps@10^8). This paper designs a new PA scheme for FPGA-based QKD with multilinear modular hash-modular arithmetic hash (MMH-MH) PA and the number-theoretic transform (NTT) algorithm. The new PA scheme, named the large-scale and high-speed (LSHS) PA scheme, designs a multiplication-reusable architecture and three key units to improve the performance. This scheme improves the input size and throughput of PA by more than an order of magnitude. The throughput and input size of this scheme (1Gbps@10^8) are comparable to those on other platforms.

翻訳日:2023-03-23 18:44:38 公開日:2021-07-02
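
The number-theoretic transform (NTT) named in the abstract above is the exact, modular-arithmetic counterpart of the FFT, and it is commonly used to compute long products and convolutions. The sketch below is only an illustration of that building block in software, not the paper's FPGA architecture or its MMH-MH hash. The modulus `MOD`, the primitive root `ROOT`, and the function names `ntt` and `multiply` are assumptions chosen for the example.

```python
MOD = 998244353  # NTT-friendly prime 119 * 2**23 + 1 (illustrative choice)
ROOT = 3         # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 NTT over the integers modulo MOD.

    len(values) must be a power of two (at most 2**23 for this modulus).
    """
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages of doubling length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse of the twiddle
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length *= 2
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def multiply(p, q):
    """Multiply two coefficient lists (results are reduced modulo MOD)."""
    out_len = len(p) + len(q) - 1
    size = 1
    while size < out_len:
        size *= 2
    fp = ntt(list(p) + [0] * (size - len(p)))
    fq = ntt(list(q) + [0] * (size - len(q)))
    product = ntt([x * y % MOD for x, y in zip(fp, fq)], invert=True)
    return product[:out_len]
```

For instance, `multiply([1, 2, 3], [4, 5, 6])` computes the coefficients of (1 + 2x + 3x^2)(4 + 5x + 6x^2) with O(n log n) modular operations instead of the O(n^2) of schoolbook multiplication, which is the property that makes the transform attractive for very large hash inputs.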

# 1自由度ハミルトン・サドルノード分岐の量子力学

Quantum dynamics of a one degree-of-freedom Hamiltonian saddle-node bifurcation ( http://arxiv.org/abs/2107.00979v1 )

ライセンス: Link先を確認

Wenyang Lyu, Shibabrat Naik, Stephen Wiggins

(参考訳) 本稿では、位相空間における平衡点のサドルノード分岐の正規形式である1次自由度ハミルトニアン(DOF)の量子力学について検討する。ハミルトニアンは運動エネルギーとポテンシャルエネルギーの和の形をしている。分岐パラメータはポテンシャルエネルギー関数にあり、ポテンシャルエネルギーに対するその影響はポテンシャル井戸の深さを変化させることである。主な焦点は、井戸の深さが量子力学に与える影響を評価することである。この評価は、時間に依存しないシュリンガー方程式のエネルギー固有値と固有ベクトル、位置座標に対する期待値と位置不確かさ、ウィグナー関数の計算によって行われる。

In this paper, we study the quantum dynamics of a one degree-of-freedom (DOF) Hamiltonian that is a normal form for a saddle-node bifurcation of equilibrium points in phase space. The Hamiltonian has the form of the sum of kinetic energy and potential energy. The bifurcation parameter is in the potential energy function and its effect on the potential energy is to vary the depth of the potential well. The main focus is to evaluate the effect of the depth of the well on the quantum dynamics. This evaluation is carried out through the computation of energy eigenvalues and eigenvectors of the time-independent Schrödinger equation, expectation values and position uncertainties for the position coordinate, and Wigner functions.

翻訳日:2023-03-23 18:44:22 公開日:2021-07-02

# 量子・古典宇宙論の時間と進化

Time and Evolution in Quantum and Classical Cosmology ( http://arxiv.org/abs/2107.00917v1 )

ライセンス: Link先を確認

Alexander Yu. Kamenshchik, Jeinny Nallely Perez Rodriguez and Tereza Vardanyan

(参考訳) 量子宇宙論における動的進化と時間の問題を分析する。我々は、量子作用素の期待値に対して古典的進化が再現されるような方法で、時間パラメータの役割を担える位相空間変数の選択の問題を強調する。我々は、時間変数と超ハミルトニアンの間のポアソン括弧が位相空間のすべてにおいて一元に等しい必要も十分もないことを示した。また、異なる内部時間間の切り替えの問題や、量子論のモンテビデオ解釈についても論じる。

We analyze the issue of dynamical evolution and time in quantum cosmology. We emphasize the problem of choice of phase space variables that can play the role of a time parameter in such a way that for expectation values of quantum operators the classical evolution is reproduced. We show that it is neither necessary nor sufficient for the Poisson bracket between the time variable and the super-Hamiltonian to be equal to unity in all of the phase space. We also discuss the question of switching between different internal times as well as the Montevideo interpretation of quantum theory.

翻訳日:2023-03-23 18:43:50 公開日:2021-07-02

# システム環境絡みの移動とテレポーテーション

Transfer and teleportation of system-environment entanglement ( http://arxiv.org/abs/2107.00895v1 )

ライセンス: Link先を確認

Tytus Harlender and Katarzyna Roszak

(参考訳) 環境を考慮した双方向テレポーテーションの研究を行っている。この環境は最初、テレポーテーションを補助するベル状態の純粋なデコヒーレンスを引き起こす。テレポーテーションが一方向に行われると、相関関係がqubit $c$のポストテレポーテーション状態へ転送され、結果として状態が非一貫性になる。他方では,新たなデコヒーレンス処理が起こらない場合には,キュービットの状態だけでなく,その環境との相関関係を単位忠実度でテレポートしていることが分かる。これらの過程はテルポーテーション中の測定結果に依存しず、古典相関と量子相関を区別しない。一方、第2のテレポーテーションステップがベル状態のデコヒーレンスによって先行している場合、状況はさらに複雑である。相関のテレポーテーションと転送は同時に発生し、異なる測定結果に対して異なるテレポーティング量子環境状態が得られる。これらの状態は、テレポートされたキュービットのコヒーレンス度が異なるが、テレポーテーションの最初の段階でベル状態-環境相互作用が絡み合う場合のみ、異なる量のキュービット環境絡み合いを持つことができる。極端な場合、テレポートされた量子ビット状態の1つは環境と絡み合うことができ、もう1つは分離可能である。

We study bidirectional teleportation while explicitly taking into account an environment. This environment initially causes pure dephasing decoherence of the Bell state which assists teleportation. We find that when teleportation is performed in one direction it is accompanied by a transfer of correlations into the post-teleportation state of qubit $C$, which results in decoherence of the state. In the other direction, if no new decoherence process occurs, we find that not only the state of the qubit but also its correlations with an environment are being teleported with unit fidelity. These processes do not depend on the measurement outcome during teleportation and do not differentiate between classical and quantum correlations. If, on the other hand, the second teleportation step is preceded by decoherence of the Bell state, then the situation is much more complicated. Teleportation and transfer of correlations occur simultaneously, yielding different teleported qubit-environment states for different measurement outcomes. These states can differ in the degree of coherence of the teleported qubit, but they can have different amounts of qubit-environment entanglement only for an entangling Bell-state-environment interaction in the first step of teleportation. In the extreme case, one of the teleported qubit states can be entangled with the environment while the other is separable.

翻訳日:2023-03-23 18:43:02 公開日:2021-07-02

# 安定化状態とグラフ状態の低減量子回路

Reduced quantum circuits for stabilizer states and graph states ( http://arxiv.org/abs/2107.00885v1 )

ライセンス: Link先を確認

Marc Bataille

(参考訳) まず,安定化回路を構成する部分群構造を考察し,本結果を用いて安定化回路の新たな正規形を提案する。この正規形式はクリフォード群における単純な共役規則を用いて誘導によって計算される。形状は CX-CZ-P-H-CZ-P-H で、CX (resp. CZ) は CNOT (resp. CZ) ゲートの層、P は位相ゲートの層、H はアダマールゲートの層を表す。次に、安定状態の正規形を考え、グラフ状態を実装する回路における2量子ビットゲート数を削減する方法を示す。最後に,本手法の実用性を示すため,古典計算機と量子コンピュータの数値実験を行った。論文に記載されているすべてのアルゴリズムは、GitHubで利用可能なLinuxコマンドとして、C言語で実装されている。

We start by studying the subgroup structures underlying stabilizer circuits and we use our results to propose a new normal form for stabilizer circuits. This normal form is computed by induction using simple conjugation rules in the Clifford group. It has shape CX-CZ-P-H-CZ-P-H, where CX (resp. CZ) denotes a layer of CNOT (resp. CZ) gates, P a layer of phase gates and H a layer of Hadamard gates. Then we consider a normal form for stabilizer states and we show how to reduce the two-qubit gate count in circuits implementing graph states. Finally, we carry out a few numerical tests on classical and quantum computers in order to show the practical utility of our methods. All the algorithms described in the paper are implemented in the C language as a Linux command available on GitHub.

翻訳日:2023-03-23 18:42:39 公開日:2021-07-02
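
As background for the graph states mentioned in the abstract above, the following minimal sketch lists the standard stabilizer generators of a graph state, K_v = X_v times Z on every neighbor of v, together with the textbook preparation circuit (a Hadamard on each qubit followed by one CZ per edge). It does not implement the paper's normal form or its gate-count reduction. The function names are hypothetical, and a simple graph without self-loops is assumed.

```python
def graph_state_stabilizers(num_qubits, edges):
    """Return the generators K_v = X_v * prod_{u ~ v} Z_u as Pauli strings."""
    neighbors = {v: set() for v in range(num_qubits)}
    for u, v in edges:  # assumes u != v (no self-loops)
        neighbors[u].add(v)
        neighbors[v].add(u)
    generators = []
    for v in range(num_qubits):
        paulis = ["I"] * num_qubits
        paulis[v] = "X"
        for u in neighbors[v]:
            paulis[u] = "Z"
        generators.append("".join(paulis))
    return generators


def commute(p, q):
    """Pauli strings commute iff they differ (non-trivially) on an even number of sites."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def preparation_circuit(num_qubits, edges):
    """Textbook circuit: H on every qubit, then one CZ gate per edge."""
    hadamards = [("H", q) for q in range(num_qubits)]
    entanglers = [("CZ", u, v) for u, v in edges]
    return hadamards + entanglers
```

In this textbook circuit the number of two-qubit gates equals the number of edges, which is the baseline that a reduced circuit would aim to improve on.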

# 最大エントロピー法による再構成密度行列の純状態への収束

Convergence of reconstructed density matrix to a pure state using maximal entropy approach ( http://arxiv.org/abs/2107.01191v1 )

ライセンス: Link先を確認

Rishabh Gupta, Sabre Kais and Raphael D. Levine

(参考訳) 様々な種類の量子システムの技術応用の研究において、過去10年間に印象的な進歩があった。 IBMのような業界の巨人が、2023年末までに1000量子ビットを超えるスケーラブルな量子デバイスに関するロードマップを公開し、これらのデバイス上で量子処理をテストするための効率的な検証技術も開発されている。量子状態のキャラクタリゼーションは、量子状態トモグラフィ(QST)と呼ばれるプロセスを通じて実験的に測定され、システムのサイズと指数関数的にスケールする。しかし、不完全測定を用いたQSTは、これらの量子技術の特徴、特に全ての平均測定が高忠実で利用できるわけではないノイズの多い中間規模量子(NISQ)デバイスの現在の性質に適している。本稿では,量子系の密度行列を任意の数の量子ビットに対して完全再構成するために,最大エントロピー形式を既知平均測定値のペアワイズ結合に適用することにより,qstの代替手法を提案する。このアプローチは、再構成された密度行列を純粋な状態に収束する場合の観測可能な完全な集合を知っているとき、対象状態の最良の推定を提供する。我々のゴールは、純粋状態の量子システムの実用的な推論を提供することで、その応用を実際の量子コンピュータにおける量子エラー軽減の分野に適用し、さらなる研究を予定している。

Impressive progress has been made in the past decade in the study of technological applications of varied types of quantum systems. With industry giants like IBM laying down their roadmap for scalable quantum devices with more than 1000 qubits by the end of 2023, efficient validation techniques are also being developed for testing quantum processing on these devices. The characterization of a quantum state is done by experimental measurements through the process called quantum state tomography (QST), which scales exponentially with the size of the system. However, QST performed using incomplete measurements is aptly suited for characterizing these quantum technologies, especially with the current nature of noisy intermediate-scale quantum (NISQ) devices where not all mean measurements are available with high fidelity. We propose an alternative approach to QST for the complete reconstruction of the density matrix of a quantum system in a pure state for any number of qubits by applying the maximal entropy formalism on the pairwise combinations of the known mean measurements. This approach provides the best estimate of the target state when we know the complete set of observables, which is the case of convergence of the reconstructed density matrix to a pure state. Our goal is to provide a practical inference of a quantum system in a pure state that can find its applications in the field of quantum error mitigation on a real quantum computer, which we intend to investigate further.

翻訳日:2023-03-23 18:36:32 公開日:2021-07-02

# 図形計算における量子多値決定図

Quantum Multiple-Valued Decision Diagrams in Graphical Calculi ( http://arxiv.org/abs/2107.01186v1 )

ライセンス: Link先を確認

Renaud Vilmart

(参考訳) zh計算のようなグラフィカル計算は、量子過程の研究と解析において強力なツールであり、量子回路や測定に基づく計算などの量子計算の他のモデルとリンクしている。量子過程を記述するためのややコンパクトだが体系的な方法は量子多重値決定図(QMDD)を使うことであり、量子回路の合成や検証にすでに使われている。本稿では,QMDDを等価なZH-ダイアグラム,逆変換に変換する方法を示し,QMDDの削減がZH-カルキュラスでどのように変換されるかを示す。

Graphical calculi such as the ZH-calculus are powerful tools in the study and analysis of quantum processes, with links to other models of quantum computation such as quantum circuits, measurement-based computing, etc. A somewhat compact but systematic way to describe a quantum process is through the use of quantum multiple-valued decision diagrams (QMDDs), which have already been used for the synthesis of quantum circuits as well as for verification. We show in this paper how to turn a QMDD into an equivalent ZH-diagram, and vice versa, and show how reducing a QMDD translates in the ZH-calculus, hence allowing tools from one formalism to be used in the other.

翻訳日:2023-03-23 18:36:12 公開日:2021-07-02

# 表面イオントラップの $1/\omega$ 電界雑音に上昇する実効性基板の吸着ダイナミクス

How Correlated Adsorbate Dynamics on Realistic Substrates Can Give Rise to $1/\omega$ Electric-Field Noise in Surface Ion Traps ( http://arxiv.org/abs/2107.01177v1 )

ライセンス: Link先を確認

Benjamin Foulon, Keith G. Ray, Chang-Eun Kim, Yuan Liu, Brenda M. Rubenstein, and Vincenzo Lordi

(参考訳) イオントラップは、スケーラブルな量子コンピューティングを実装する上で有望なアーキテクチャであるが、過度の"異常"加熱に悩まされ、その潜在能力の完全な実現を妨げている。この加熱はジョンソン-ナイキストノイズから予想されるよりも桁違いに大きいため、量子論理ゲートの非一貫性と忠実度を低下させるイオン運動を引き起こす。異常加熱の正確な起源は未解決の問題であるが、実験ではトラップ電極に吸着する可能性が示唆されている。異常加熱の多くのモデルが提案されているが、これらのモデルは0.1-10 MHzの周波数でイオントラップで観測される$1/\omega$電界ノイズスケーリングの原子論的起源を突き止めていない。本研究では,第一原理ポテンシャルによって記述された吸着剤の多層膜の運動によって生じるイオントラップ電界雑音の計算的研究を行う。このようにして、相関吸着運動が$1/\omega$ノイズの生成において決定的な役割を果たすことを示すとともに、一般的にイオントラップで使用されるMHz周波数での$1/\omega$スケーリングを引き起こす吸着パッチと多層交換の変換および回転運動を含む、候補の集合吸着運動を特定する。これらの結果は、複数の吸着系が、単純なものであっても、イオントラップで観測される$1/\omega$のノイズを発生させる一連の活性化運動を生じさせ、個々の吸着運動よりも集団的に低周波加熱を引き起こす可能性が高いことを示している。

Ion traps are promising architectures for implementing scalable quantum computing, but they suffer from excessive "anomalous" heating that prevents their full potential from being realized. This heating, which is orders of magnitude larger than that expected from Johnson-Nyquist noise, results in ion motion that leads to decoherence and reduced fidelity in quantum logic gates. The exact origin of anomalous heating is an open question, but experiments point to adsorbates on trap electrodes as a likely source. Many different models of anomalous heating have been proposed, but these models have yet to pinpoint the atomistic origin of the $1/\omega$ electric-field noise scaling observed experimentally in ion traps at frequencies between 0.1 and 10 MHz. In this work, we perform the first computational study of the ion trap electric-field noise produced by the motions of multiple monolayers of adsorbates described by first-principles potentials. In so doing, we show that correlated adsorbate motions play a definitive role in producing $1/\omega$ noise and identify candidate collective adsorbate motions, including translational and rotational motions of adsorbate patches and multilayer exchanges, that give rise to $1/\omega$ scaling at the MHz frequencies typically employed in ion traps. These results demonstrate that multi-adsorbate systems, even simple ones, can give rise to a set of activated motions that can produce the $1/\omega$ noise observed in ion traps and that collective, rather than individual, adsorbate motions are much more likely to give rise to low-frequency heating.

翻訳日:2023-03-23 18:35:59 公開日:2021-07-02

# NVスピンレジスタの初期化のための量子制御シーケンスの最適化

Optimization of a quantum control sequence for initializing an NV spin register ( http://arxiv.org/abs/2107.01116v1 )

ライセンス: Link先を確認

T. Chakraborty, J. Zhang and D. Suter

(参考訳) 多くの量子情報プロトコルの実装は、量子レジスタの効率的な初期化を必要とする。本報告では,ダイヤモンド中の窒素空孔(NV)中心に付随するハイブリッドスピンレジスタを初期化するための集団トラッププロトコルを最適化する。我々はNVの電子スピンと核スピンをマイクロ波、高周波、光パルスのシーケンスで分極することで量子レジスタを初期化する。我々は、光パルスの影響下での人口分布を説明するために、レート方程式モデルを用いる。このモデルは、部分量子状態トモグラフィーによって得られた実験データと比較される。スピン偏極をさらに高めるため,光パルスを最適化した再帰プロトコルを提案する。

Implementation of many quantum information protocols requires an efficient initialization of the quantum register. In the present report, we optimize a population trapping protocol for initializing a hybrid spin register associated with a single nitrogen vacancy (NV) center in diamond. We initialize the quantum register by polarizing the electronic and the nuclear spins of the NV with a sequence of microwave, radio-frequency and optical pulses. We use a rate equation model to explain the distribution of population under the effect of the optical pulses. The model is compared to the experimental data obtained by performing partial quantum state tomography. To further increase the spin polarization, we propose a recursive protocol with optimized optical pulses.

翻訳日:2023-03-23 18:35:15 公開日:2021-07-02

# マルコフのボゾン環境における進化の量子速度

Quantum speed of evolution in a Markovian bosonic environment ( http://arxiv.org/abs/2107.01075v1 )

ライセンス: Link先を確認

Paulina Marian and Tudor A. Marian

(参考訳) 本稿では,開連続変数系のマルコフ力学に関連する量子速度制限時間の明示的な評価を行う。具体的には,熱ボソニック貯留層に弱結合した量子放射場のキャビティモードの標準設定について検討する。場の状態の進化は、正確な解析解を持つことが知られている量子光学マスター方程式によって制御される。純粋な入力状態から始まり、初期状態と進化状態の違い、すなわち進化の忠実性と進化のヒルベルト・シュミット距離の2つの指標を用いている。前者は del Campo らによって導入され、彼らはマルコフ開系の進化に対して時間に依存しない速度制限を導出した。フィールドモードの任意の入力純状態を用いて、このフィールド貯留層設定について評価する。結果公式はコヒーレント状態とフォック状態に特殊化される。一方,我々は,上述の2つの進化指標を用いた代替手法を活用している。それらの変化速度は同じ上限を持ち、従って独自の時間依存量子速度制限を与える。ヒルベルト・シュミット計量で構築された量子速度制限時間は、忠実度に基づくものよりも厳密であることが判明した。応用例として,対応する進化状態の特性関数を用いて,コヒーレント状態とフォック状態の減衰について検討する。これら2つの入力状態のクラスについて、忠実度とヒルベルト・シュミット距離の両方の一般表現を求め、解析する。コヒーレント状態の場合、それらの共通速度限界と一対のアソシエイト制限時間に関する正確な公式を導出する。

We present explicit evaluations of quantum speed limit times pertinent to the Markovian dynamics of an open continuous-variable system. Specifically, we consider the standard setting of a cavity mode of the quantum radiation field weakly coupled to a thermal bosonic reservoir. The evolution of the field state is ruled by the quantum optical master equation, which is known to have an exact analytic solution. Starting from a pure input state, we employ two indicators of how different the initial and evolved states are, namely, the fidelity of evolution and the Hilbert-Schmidt distance of evolution. The former was introduced by del Campo et al., who derived a time-independent speed limit for the evolution of a Markovian open system. We evaluate it for this field-reservoir setting, with an arbitrary input pure state of the field mode. The resultant formula is then specialized to the coherent and Fock states. On the other hand, we exploit an alternative approach that employs both indicators of evolution mentioned above. Their rates of change have the same upper bound, and consequently provide a unique time-dependent quantum speed limit. It turns out that the associated quantum speed limit time built with the Hilbert-Schmidt metric is tighter than the fidelity-based one. As apposite applications, we investigate the damping of the coherent and Fock states by using the characteristic functions of the corresponding evolved states. General expressions of both the fidelity and the Hilbert-Schmidt distance of evolution are obtained and analyzed for these two classes of input states. In the case of a coherent state, we derive accurate formulas for their common speed limit and the pair of associated limit times.

翻訳日:2023-03-23 18:33:53 公開日:2021-07-02
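
The two indicators named in the abstract above can be made concrete in the simplest special case. The worked equations below assume a zero-temperature reservoir (the abstract treats the more general thermal case), work in the frame rotating at the cavity frequency, and take the fidelity to be the squared overlap; the damping rate $\gamma$ and the other symbols are notation chosen here, not taken from the paper.

```latex
% Zero-temperature damping of a cavity mode (assumed special case)
\frac{d\rho}{dt} = \frac{\gamma}{2}\left( 2 a \rho a^{\dagger}
    - a^{\dagger} a \rho - \rho a^{\dagger} a \right)

% A coherent input state stays coherent with a decaying amplitude
|\alpha\rangle \;\longmapsto\; |\alpha(t)\rangle ,
\qquad \alpha(t) = \alpha\, e^{-\gamma t/2}

% Fidelity of evolution, using |<alpha|beta>|^2 = exp(-|alpha - beta|^2)
F(t) = \left| \langle \alpha | \alpha(t) \rangle \right|^{2}
     = \exp\!\left[ -|\alpha|^{2} \left( 1 - e^{-\gamma t/2} \right)^{2} \right]

% Hilbert-Schmidt distance of evolution; both states are pure in this case
D_{\mathrm{HS}}^{2}(t) = \operatorname{Tr}\!\left[ \left( \rho(0) - \rho(t) \right)^{2} \right]
     = 2\left[ 1 - F(t) \right]
```

In this limit both indicators depend on time only through $|\alpha|^{2}(1 - e^{-\gamma t/2})^{2}$; at nonzero temperature the evolved state becomes mixed and these closed forms no longer apply.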

# ボース・アインシュタイン凝縮と準結晶

Bose-Einstein Condensation and quasicrystals ( http://arxiv.org/abs/2107.02901v1 )

ライセンス: Link先を確認

Moorad Alexanian and Vanik E. Mkrtchian

(参考訳) 相互作用するボース粒子を外部の局所ポテンシャルで検討する。外部準結晶ポテンシャルの大きなクラスではボース・アインシュタイン凝縮系は維持できないことが示されている。したがって、そのような準結晶ポテンシャルにおける空間次元 $d\leq 2$ では、ボース・アインシュタインの有限温度での凝縮によって超固体は不可能である。後者はまた、二次元フィボナッチタイリングについても真である。しかし、超固体は無限に長距離で非局所的な粒子間ポテンシャルからボース・アインシュタインによる$d\leq 2$で生じる。

We consider interacting Bose particles in an external local potential. It is shown that a large class of external quasicrystal potentials cannot sustain any type of Bose-Einstein condensates. Accordingly, at spatial dimensions $D\leq 2$ in such quasicrystal potentials a supersolid is not possible via Bose-Einstein condensates at finite temperatures. The latter also holds true for the two-dimensional Fibonacci tiling. However, supersolids do arise at $D\leq 2$ via Bose-Einstein condensates from infinitely long-range, nonlocal interparticle potentials.

翻訳日:2023-03-23 18:26:49 公開日:2021-07-02

# コンピュータ教育研究文学における経験主義と報告のノームの体系的文献レビュー

A Systematic Literature Review of Empiricism and Norms of Reporting in Computing Education Research Literature ( http://arxiv.org/abs/2107.01984v1 )

ライセンス: Link先を確認

Sarah Heckman and Jeffrey C. Carver and Mark Sherriff and Ahmed Al-Zubidy

(参考訳) コンピュータ教育研究(CER)は、コンピュータのスキルを習得する学生の増加をサポートするために重要である。知識を体系的に前進させるためには、出版物は複製、メタ分析、理論構築をサポートするのに十分クリアでなければならない。本研究の目的は,出版物が複製,メタアナリシス,理論構築をサポートする情報を含むか否かを特定することで,CER文学における経験主義の報告を特徴付けることである。 RQ1) CER会場の論文のどの割合に経験的評価があるか。 RQ2) 経験的評価の特徴は何か。 rq3) 経験的評価を持つ論文は報告基準(包含とキー情報のラベル付けの両方)に従うか? 2014年と2015年に、SIGCSE TS, ICER, ITiCSE, TOCE, CSEの5つの会場で427の論文を発表した。我々はcerempiricism assessment rubricを開発し,応用した。 80%以上の論文がある種の経験的評価をしていた。定量的評価手法が最も多かった。最も頻繁に報告されている論文は教育技術、カリキュラム、コミュニティ、ツールに関する介入に関するものである。介入と他のデータセットやベースラインとを何らかの形で比較した論文の分割があった。多くの論文は、適切に報告された研究目標、目標、研究質問、仮説、参加者の説明、研究設計、データ収集、妥当性への脅威を欠いていた。 CERの著者は文献に経験的な結果をもたらしているが、報告の規範がすべて満たされているわけではない。著者には、その作業に関する明確なラベル付き詳細を提供して、読者がレプリケーションやメタ分析に方法論と結果を使用することを推奨します。コミュニティが成長するにつれて、CERの報告は成熟して、次世代のコンピューティング学習者を支援するためのコンピューティング教育理論の確立に役立ちます。

Computing Education Research (CER) is critical for supporting the increasing number of students who need to learn computing skills. To systematically advance knowledge, publications must be clear enough to support replications, meta-analyses, and theory-building. The goal of this study is to characterize the reporting of empiricism in CER literature by identifying whether publications include information to support replications, meta-analyses, and theory building. The research questions are: RQ1) What percentage of papers in CER venues have empirical evaluation? RQ2) What are the characteristics of the empirical evaluation? RQ3) Do the papers with empirical evaluation follow reporting norms (both for inclusion and for labeling of key information)? We conducted an SLR of 427 papers published during 2014 and 2015 in five CER venues: SIGCSE TS, ICER, ITiCSE, TOCE, and CSE. We developed and applied the CER Empiricism Assessment Rubric. Over 80% of papers had some form of empirical evaluation. Quantitative evaluation methods were the most frequent. Papers most frequently reported results on interventions around pedagogical techniques, curriculum, community, or tools. There was a split in papers that had some type of comparison between an intervention and some other data set or baseline. Many papers lacked properly reported research objectives, goals, research questions, or hypotheses, description of participants, study design, data collection, and threats to validity. CER authors are contributing empirical results to the literature; however, not all norms for reporting are met. We encourage authors to provide clear, labeled details about their work so readers can use the methodologies and results for replications and meta-analyses. As our community grows, our reporting of CER should mature to help establish computing education theory to support the next generation of computing learners.

翻訳日:2023-03-23 18:26:35 公開日:2021-07-02

# ベイジアンアプローチを用いたQKDシステムの量子クロック同期

Qubit-based clock synchronization for QKD systems using a Bayesian approach ( http://arxiv.org/abs/2107.01304v1 )

ライセンス: Link先を確認

Roderick D. Cochran and Daniel J. Gauthier

(参考訳) 量子鍵分配(QKD)システムは、2人のユーザが証明可能な安全な鍵を交換する方法を提供する。ユーザーの時計の同期は、セキュアな鍵を蒸留する前に必須のステップである。量子ビットベースの同期プロトコルは、送信された量子状態を直接使用して同期を実現する。従来のqubitベースの同期プロトコルは、直接または間接にセキュアな鍵を犠牲にしており、既知のqubitベースの同期プロトコルはすべて、ユーザが公開する公開情報をすべて効率的に使用していない。本稿では,すべての公開情報を組み込んだベイズ確率アルゴリズムを導入し,セキュアな鍵を犠牲にすることなく,クロックオフセットを効率的に検出する。さらに、アルゴリズムの出力は確率であり、同期に対する信頼度を定量化することができる。実演目的のために,効率の良い3状態BB84の準備・測定プロトコルのシミュレーションを伴うモデルシステムを提案する。我々のアルゴリズムは、アリスの公表した基底と平均光子数選択とボブの測定結果との相関を利用して、確率論的に最も起こりそうなクロックオフセットを決定する。この例では、通信用ビン幅8e-4のダークカウント確率と受信平均光子数0.01をシミュレートする場合に、通信用ビン幅4,140で95%の同期信頼性が得られることが判明した。

Quantum key distribution (QKD) systems provide a method for two users to exchange a provably secure key. Synchronizing the users' clocks is an essential step before a secure key can be distilled. Qubit-based synchronization protocols directly use the transmitted quantum states to achieve synchronization and thus avoid the need for additional classical synchronization hardware. Previous qubit-based synchronization protocols sacrifice secure key either directly or indirectly, and all known qubit-based synchronization protocols do not efficiently use all publicly available information published by the users. Here, we introduce a Bayesian probabilistic algorithm that incorporates all published information to efficiently find the clock offset without sacrificing any secure key. Additionally, the output of the algorithm is a probability, which allows us to quantify our confidence in the synchronization. For demonstration purposes, we present a model system with accompanying simulations of an efficient three-state BB84 prepare-and-measure protocol with decoy states. We use our algorithm to exploit the correlations between Alice's published basis and mean photon number choices and Bob's measurement outcomes to probabilistically determine the most likely clock offset. We find that we can achieve a 95 percent synchronization confidence in only 4,140 communication bin widths, meaning we can tolerate clock drift approaching 1 part in 4,140 in this example when simulating this system with a dark count probability per communication bin width of 8e-4 and a received mean photon number of 0.01.

翻訳日:2023-03-23 18:26:03 公開日:2021-07-02
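
The idea in the abstract above, correlating Alice's published mean photon numbers with Bob's detections to infer the clock offset, can be illustrated with a toy Bayesian calculation. This is a minimal sketch under stated assumptions, not the authors' algorithm: it uses only intensity information (no basis information), a cyclic offset measured in whole bins, a uniform prior, a Poissonian click model, and deliberately large mean photon numbers so that a short record suffices. All function names and parameter values other than the dark count probability of 8e-4 are hypothetical.

```python
import math
import random


def click_probability(mu, dark=8e-4, efficiency=1.0):
    """Probability that a bin registers a click for mean photon number mu
    (assumed model: Poissonian source plus independent dark counts)."""
    return 1.0 - (1.0 - dark) * math.exp(-efficiency * mu)


def offset_posterior(published_mu, clicks, candidate_offsets):
    """Posterior over cyclic clock offsets, starting from a uniform prior."""
    n = len(clicks)
    log_like = []
    for k in candidate_offsets:
        total = 0.0
        for i, clicked in enumerate(clicks):
            p = click_probability(published_mu[(i - k) % n])
            total += math.log(p) if clicked else math.log1p(-p)
        log_like.append(total)
    # Normalize in a numerically stable way (subtract the peak first).
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    norm = sum(weights)
    return [w / norm for w in weights]


def simulate(n_bins=2000, true_offset=7, seed=0):
    """Synthetic record: Bob's bin i sees the pulse Alice sent true_offset bins earlier."""
    rng = random.Random(seed)
    published_mu = [rng.choice([0.1, 1.0, 3.0]) for _ in range(n_bins)]
    clicks = [
        rng.random() < click_probability(published_mu[(i - true_offset) % n_bins])
        for i in range(n_bins)
    ]
    return published_mu, clicks
```

A typical use is `mu, clicks = simulate()` followed by `posterior = offset_posterior(mu, clicks, range(20))`. Because the output is a normalized probability for each candidate offset, the largest entry can be read directly as a confidence in the synchronization, which mirrors the property highlighted in the abstract.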

# DiSH-trend:トレンドを考慮したインターベンションモデリングシミュレータ

DiSH-trend: Intervention Modeling Simulator That Accounts for Trend Influences ( http://arxiv.org/abs/2107.01302v1 )

ライセンス: Link先を確認

Stefan Andjelkovic and Natasa Miskov-Zivanov

(参考訳) 有向グラフのシミュレーションは、接続グラフが周期を含むシステムの力学を理解するための重要な方法である。 Discrete Stochastic Heterogeneous Simulator (DiSH) は、規制値を用いて規制要素の状態の更新を計算するシミュレーションツールの1つである。ここでは、要素制御のトレンドを考慮に入れた新しいシミュレーション手法であるDiSH-trendを提案する。本稿では,トレンドベースとレベルベースを組み合わせたハイブリッドレギュレーションとともに,トレンドベースのレギュレーションの特徴を示す。モデリング機能は、様々な機能を示す小さなおもちゃモデルで実証される。現実世界の能力はエチオピアのオロミア地方における食料不安のより大きなネットワークモデルで実証されている。モデルにトレンドベースのレギュレーションを加えると、モデリングの柔軟性が向上し、ハイブリッドレギュレーションは定性的な動的振る舞い予測を改善する。適切なデータがあれば、DiSH-trendは介入戦略を探求するための強力なツールになります。

Simulation on directed graphs is an important method for understanding the dynamics of systems whose connectivity graphs contain cycles. Discrete Stochastic Heterogeneous Simulator (DiSH) is one of the simulation tools with wide application, which uses regulator values to calculate state updates of regulated elements. Here we present a new simulation approach, DiSH-trend, which also takes into account the trends in regulating elements. We demonstrate the features of trend-based regulation, as well as hybrid regulation, which is a combination of the trend- and level-based approaches. The modeling capabilities are demonstrated on a small toy model, showcasing different functionalities. Real-world capabilities are demonstrated on a larger network model of food insecurity in the Ethiopian region Oromia. Adding trend-based regulation to models results in increased modeling flexibility, and hybrid regulation improves qualitative dynamic behavior prediction. With appropriate data, DiSH-trend becomes a powerful tool for exploring intervention strategies.

翻訳日:2023-03-23 18:25:37 公開日:2021-07-02

# 正方根特異点における非線形光学センサの異常精度

Exceptional precision of a nonlinear optical sensor at a square-root singularity ( http://arxiv.org/abs/2107.01291v1 )

ライセンス: Link先を確認

K. J. H. Peters and S. R. K. Rodriguez

(参考訳) 例外点(eps) --非エルミート線型系のスペクトル特異点 -- は、最近センシングに大きな関心を集めている。最初の提案と実験ではノイズを無視する感度の向上に焦点が当てられたが、その後の研究でノイズ環境におけるepセンサの問題点が明らかになった。本稿では,雑音下での特別なセンシングのための単一モードkerr非線形共振器を提案する。共振器の動的ヒステリシスに基づいて、EPに似た平方根特異点を示す信号を定義する。 epセンサとは対照的に,センサの信号対雑音比は測定速度とともに増加し,正方根特異性では精度が向上した。驚くべきことに、信号の平均化は素早く向上し、精度を低下させる。これらの非慣習的な特徴は、線形システムの制約を超えた高速で精密なセンシングの新たな機会を開く。光センシングに焦点をあてる一方で、我々のアプローチは他のヒステリックシステムにも拡張できる。

Exceptional points (EPs) -- spectral singularities of non-Hermitian linear systems -- have recently attracted great interest for sensing. While initial proposals and experiments focused on enhanced sensitivities neglecting noise, subsequent studies revealed issues with EP sensors in noisy environments. Here we propose a single-mode Kerr-nonlinear resonator for exceptional sensing in noisy environments. Based on the resonator's dynamic hysteresis, we define a signal that displays a square-root singularity akin to an EP. In contrast to EP sensors, our sensor has a signal-to-noise ratio that increases with the measurement speed, and a precision enhanced at the square-root singularity. Remarkably, averaging the signal can quickly enhance and then degrade the precision. These unconventional features open up new opportunities for fast and precise sensing beyond the constraints of linear systems. While we focus on optical sensing, our approach can be extended to other hysteretic systems.

翻訳日:2023-03-23 18:25:22 公開日:2021-07-02

# 三角形格子上の$U(1)$量子リンクモデルにおけるネマティック収束相:チップ上の文字列ダイナミクスの短期量子計算の可能性

Nematic Confined Phases in the $U(1)$ Quantum Link Model on a Triangular Lattice: An Opportunity for Near-Term Quantum Computations of String Dynamics on a Chip ( http://arxiv.org/abs/2107.01283v1 )

ライセンス: Link先を確認

D. Banerjee, S. Caspar, F.-J. Jiang, J.-H. Peng, and U.-J. Wiese

(参考訳) 三角格子上の$U(1)$量子リンクモデルは、2つの回転対称性を破るネマティック制限相を持つ。静電荷は、分極化された電気束を持つ個々のストランドからなる弦で接続される。 2つの相は、ほぼ正確な$SO(2)$対称性を持つ弱い1次相転移によって分離される。我々はチップ上に量子回路を構築し、非自明な弦力学の短期量子計算を容易にする。

The $U(1)$ quantum link model on the triangular lattice has two rotation-symmetry-breaking nematic confined phases. Static external charges are connected by confining strings consisting of individual strands with fractionalized electric flux. The two phases are separated by a weak first order phase transition with an emergent almost exact $SO(2)$ symmetry. We construct a quantum circuit on a chip to facilitate near-term quantum computations of the non-trivial string dynamics.

翻訳日:2023-03-23 18:25:09 公開日:2021-07-02

# 運動ロボットを用いた目標筋力分布 : 軌道と抵抗効果

Targeted Muscle Effort Distribution with Exercise Robots: Trajectory and Resistance Effects ( http://arxiv.org/abs/2107.01280v1 )

ライセンス: Link先を確認

Humberto De las Casas and Santino Bianco and Hanz Richter

(参考訳) 本研究の目的は,ロボット運動・リハビリテーション機械の筋力分布を軌道と抵抗設定に関連付けることである。筋活動における各筋の関与を表す筋活動分布を筋電図センサ(EMG)を用いて測定し,個々の筋の活性化を筋群全体の活性化で割った値として定義した。 4自由度ロボットとそのインピーダンス制御システムは、ユーザが機械の中立経路と抵抗に対して経路に従うように要求される高度な運動プロトコルを作成するために使用される。この研究では、ロボットはゼロエフォート円形経路を確立し、被験者は楕円軌道に従うように要求される。制御システムは、中性経路からのずれと被検者によるトルクとの間にユーザが定義した剛性を生成する。実験で使用された軌道と抵抗の設定は楕円の向きと剛性パラメータであった。これらのパラメータを複数組み合わせて筋力分布に及ぼす影響を測定した。人工知能ニューラルネットワーク(ANN)は、モデルのトレーニングにデータの一部を使用した。そして、残りのデータを用いてモデルの精度を評価した。その結果,モデルの精度は時間とともに低下することがわかった。これらの結果は、疲労に関連する可能性のある時間変化ダイナミクスの存在を示唆する長期推定のための筋力学の複雑さを示している。

The objective of this work is to relate muscle effort distributions to the trajectory and resistance settings of a robotic exercise and rehabilitation machine. Muscular effort distribution, representing the participation of each muscle in the training activity, was measured with electromyography (EMG) sensors and defined as the individual activation divided by the total muscle group activation. A four degrees-of-freedom robot and its impedance control system are used to create advanced exercise protocols whereby the user is asked to follow a path against the machine's neutral path and resistance. In this work, the robot establishes a zero-effort circular path, and the subject is asked to follow an elliptical trajectory. The control system produces a user-defined stiffness between the deviations from the neutral path and the torque applied by the subject. The trajectory and resistance settings used in the experiments were the orientation of the ellipse and a stiffness parameter. Multiple combinations of these parameters were used to measure their effects on the muscle effort distribution. An artificial neural network (ANN) used part of the data for training the model. Then, the accuracy of the model was evaluated using the rest of the data. The results show how the precision of the model is lost over time. These outcomes show the complexity of the muscle dynamics for long-term estimations, suggesting the existence of time-varying dynamics possibly associated with fatigue.
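
The effort-distribution definition in the abstract above (individual activation divided by total muscle-group activation) can be illustrated with a minimal Python sketch. The function name, the muscle names, and the activation values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def effort_distribution(activations):
    """Return each muscle's share of the total group activation.

    `activations` maps a muscle name to a non-negative activation level
    (for example a rectified, averaged EMG amplitude).
    """
    total = sum(activations.values())
    if total <= 0:
        raise ValueError("total activation must be positive")
    return {muscle: value / total for muscle, value in activations.items()}


# Hypothetical activation levels for one exercise repetition.
emg = {"biceps": 0.30, "triceps": 0.10, "deltoid": 0.10}
shares = effort_distribution(emg)
```

By construction the shares sum to one, so they can be compared across trials with different overall effort levels.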

翻訳日:2023-03-23 18:25:01 公開日:2021-07-02

# コヒーレント光吸収をもつ非対称ミラーの量子光学

The quantum optics of asymmetric mirrors with coherent light absorption ( http://arxiv.org/abs/2107.01279v1 )

ライセンス: Link先を確認

Benjamin Dawson, Nicholas Furtak-Wells, Thomas Mann, Gin Jose and Almut Beige

(参考訳) ミラー被覆界面近傍の量子化された電磁界の局所観測は、両側の媒体の性質に強く依存する。巨視的量子電磁力学では、この事実は観測者の位置と他の全ての空間的位置と光子周波数を関連付ける光グリーン関数の助けを借りて考慮される。ここでは,量子ミラー画像検出法 (Furtak-Wells et al., Phys. Rev. A 97, 043827 (2018)) の助けを借りて,より直感的な手法で局所場観測性を得る。電場作用素を正しく正規化するために、自発的原子崩壊率を反射面から遠く離れたそれぞれの自由空間値に簡易化することを要求する。ミラーコーティング界面は量子フォトニックデバイスのための共通の基本構成ブロックであるので,このアプローチは興味深い。

The local observables of the quantised electromagnetic field near a mirror-coated interface depend strongly on the properties of the media on both sides. In macroscopic quantum electrodynamics, this fact is taken into account with the help of optical Green's functions which correlate the position of an observer with all other spatial positions and photon frequencies. Here we present an alternative, more intuitive approach and obtain the local field observables with the help of a quantum mirror image detector method [Furtak-Wells et al., Phys. Rev. A 97, 043827 (2018)]. In order to correctly normalise electric field operators, we demand that spontaneous atomic decay rates simplify to their respective free space values far away from the reflecting surface. Our approach is interesting, since mirror-coated interfaces constitute a common basic building block for quantum photonic devices.

翻訳日:2023-03-23 18:24:41 公開日:2021-07-02

# アナログ量子アルゴリズムの挙動

Behavior of Analog Quantum Algorithms ( http://arxiv.org/abs/2107.01218v1 )

ライセンス: Link先を確認

Lucas T. Brady, Lucas Kocia, Przemyslaw Bienias, Aniruddha Bapat, Yaroslav Kharkov, Alexey V. Gorshkov

(参考訳) アナログ量子アルゴリズムはユニタリゲートではなくハミルトニアンによって定式化され、量子断熱計算、量子アニーリング、量子近似最適化アルゴリズム(qaoa)が含まれる。これらのアルゴリズムは、短期量子アプリケーションには有望な候補であるが、アニーリングスケジュールや変動パラメータによる微調整を必要とすることが多い。本研究では,これらのアナログアルゴリズム間の関係や,最適手順の近似となる限界について検討する。しかしながら,最適手順がスムーズな断熱処理にどのようにアプローチするかを,基底状態とダイアバティック遷移のコヒーレントなエラーキャンセルに影響を及ぼす最初の励起状態との相互作用から説明できる重畳発振パターンを用いて検討する。さらに、QAOAが各QAOA層の長さを振動パターンの周期に等しい長さでエミュレートするという数値的および解析的な証拠を提供する。さらに、QAOAバングの比率は、最適手順の滑らかで非振動部分によって決定される。最適手順の積公式展開の観点からこれらの現象について議論する。これらの議論により、異なるアナログアルゴリズムは、異なる極限と近似の下で最適なプロトコルをエミュレートできると結論付ける。最後に,論文の他の部分から得られた解析的および数値的洞察を用いて,最適なプロトコルを近似する新しいアルゴリズムを提案する。実際、数値的には、このアルゴリズムは標準的なQAOAおよびナイーブ量子アニール法よりも優れている。

Analog quantum algorithms are formulated in terms of Hamiltonians rather than unitary gates and include quantum adiabatic computing, quantum annealing, and the quantum approximate optimization algorithm (QAOA). These algorithms are promising candidates for near-term quantum applications, but they often require fine tuning via the annealing schedule or variational parameters. In this work, we explore connections between these analog algorithms, as well as limits in which they become approximations of the optimal procedure. Notably, we explore how the optimal procedure approaches a smooth adiabatic procedure but with a superposed oscillatory pattern that can be explained in terms of the interactions between the ground state and first excited state that effect the coherent error cancellation of diabatic transitions. Furthermore, we provide numeric and analytic evidence that QAOA emulates this optimal procedure with the length of each QAOA layer equal to the period of the oscillatory pattern. Additionally, the ratios of the QAOA bangs are determined by the smooth, non-oscillatory part of the optimal procedure. We provide arguments for these phenomena in terms of the product formula expansion of the optimal procedure. With these arguments, we conclude that different analog algorithms can emulate the optimal protocol under different limits and approximations. Finally, we present a new algorithm for better approximating the optimal protocol using the analytic and numeric insights from the rest of the paper. In practice, numerically, we find that this algorithm outperforms standard QAOA and naive quantum annealing procedures.

翻訳日:2023-03-23 18:24:14 公開日:2021-07-02

# 多重化光子数測定

Multiplexed photon number measurement ( http://arxiv.org/abs/2001.03217v3 )

ライセンス: Link先を確認

Antoine Essig, Quentin Ficheux, Théau Peronnin, Nathanaël Cottet, Raphaël Lescanne, Alain Sarlette, Pierre Rouchon, Zaki Leghtas, Benjamin Huard

(参考訳) 2段階のシステム – キュービット – がより大きなシステムのプローブとして使用される場合,システム状態に関する1つのイエスノー質問に自然に答えることになります。本稿では,マイクロ波共振器の光子数について,単一量子ビットではなく,多くの情報を連続計測により抽出する手法を提案する。周波数コムを反射する超伝導量子ビットから放出される蛍光を記録することにより、各フォック状態(0から8)に関する情報が独立な測定チャネルに同時に符号化される多重光子計数を実現することにより、原理実証実験を実現する。共振器の量子状態の直接ウィグナートモグラフィーは、測定のバックアクションと最適な情報抽出パラメータを証明している。本実験は、逐次量子測定を周波数領域で分離した同時連続測定に置き換えることで、量子メータの全ポテンシャルを解き明かす。

When a two-level system -- a qubit -- is used as a probe of a larger system, it naturally leads to answering a single yes-no question about the system state. Here we propose a method where a single qubit is able to extract, not a single, but many bits of information about the photon number of a microwave resonator using continuous measurement. We realize a proof-of-principle experiment by recording the fluorescence emitted by a superconducting qubit reflecting a frequency comb, thus implementing multiplexed photon counting where the information about each Fock state -- from 0 to 8 -- is simultaneously encoded in independent measurement channels. Direct Wigner tomography of the quantum state of the resonator evidences the back-action of the measurement as well as the optimal information extraction parameters. Our experiment unleashes the full potential of quantum meters by replacing sequential quantum measurements with simultaneous and continuous measurements separated in the frequency domain.

翻訳日:2023-01-13 05:32:52 公開日:2021-07-02

# Vine Copulaによる変量推論:ベイジアンコンピュータモデル校正のための効率的なアプローチ

Variational Inference with Vine Copulas: An efficient Approach for Bayesian Computer Model Calibration ( http://arxiv.org/abs/2003.12890v2 )

ライセンス: Link先を確認

Vojtech Kejzlar and Tapabrata Maiti

(参考訳) コンピュータアーキテクチャの進歩により、計算モデルの使用が増加し、核物理学や気候研究など多くの科学的応用において複雑な問題を解く。しかし、そのようなモデルのポテンシャルは計算コストが高く、その結果不確かさの定量化が不適当になる傾向があるため、しばしば妨げられる。さらに、通常はリアルタイム観測では校正されない。ガウス過程を持つ計算機モデルの校正のための変分ベイズ推定(vbi)に基づく計算効率の高いアルゴリズムを開発した。残念ながら、VBIの速度とスケーラビリティは、依存データによるキャリブレーションフレームワークに適用すると低下する。 VBIの効率性を維持するために,Vine copulas を用いてデータ間の依存構造に関する情報を境界分布から分離し,データ可能性のペアワイズ分解を行う。本稿では,提案手法の計算スケーラビリティに関する理論的および実証的な証拠と,提案アルゴリズムの効率的な実装に必要な詳細をすべて記述する。また,核結合エネルギーの液滴モデルのキャリブレーションを通じて実データを用いた実践者に対して,本手法がもたらす機会を実証する。

With the advancements of computer architectures, the use of computational models proliferates to solve complex problems in many scientific applications such as nuclear physics and climate research. However, the potential of such models is often hindered because they tend to be computationally expensive and consequently ill-fitting for uncertainty quantification. Furthermore, they are usually not calibrated with real-time observations. We develop a computationally efficient algorithm based on variational Bayes inference (VBI) for calibration of computer models with Gaussian processes. Unfortunately, the speed and scalability of VBI diminishes when applied to the calibration framework with dependent data. To preserve the efficiency of VBI, we adopt a pairwise decomposition of the data likelihood using vine copulas that separate the information on dependence structure in data from their marginal distributions. We provide both theoretical and empirical evidence for the computational scalability of our methodology and describe all the necessary details for an efficient implementation of the proposed algorithm. We also demonstrate the opportunities given by our method for practitioners on a real data example through calibration of the Liquid Drop Model of nuclear binding energies.

翻訳日:2022-12-19 00:03:15 公開日:2021-07-02

# ミラーレスミラー降下法:ミラー降下法の自然な導出

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent ( http://arxiv.org/abs/2004.01025v3 )

ライセンス: Link先を確認

Suriya Gunasekar, Blake Woodworth, Nathan Srebro

(参考訳) 我々は、ミラー降下の主変数のみによる導出を、計量テンソルがミラー降下ポテンシャルのヘッシアンであるリーマン多様体上の勾配流の「部分的」離散化として示す。我々は、この離散化を「完全な」前進オイラー離散化によって得られる自然勾配降下と対比する。この見解は、両手法の関係に光を当て、計量テンソルがヘッシアンでなく、したがって「双対」が存在しない場合でも、一般のリーマン幾何へミラー降下を一般化することを可能にする。

We present a primal only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential. We contrast this discretization to Natural Gradient Descent, which is obtained by a "full" forward Euler discretization. This view helps shed light on the relationship between the methods and allows generalizing Mirror Descent to general Riemannian geometries, even when the metric tensor is not a Hessian, and thus there is no "dual."
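
For readers unfamiliar with Mirror Descent, the following Python sketch shows the textbook update in its usual dual-space form, not the primal-only derivation the paper proposes. It assumes, purely for illustration, the entropy potential φ(x) = Σ x log x − x on the positive orthant, whose gradient is log x and whose inverse map is exp. The objective, step size, and iteration count are likewise hypothetical.

```python
import math


def mirror_descent(grad, x0, step=0.1, iters=500):
    """Mirror Descent with the (assumed) entropy potential on x > 0.

    Each iteration maps x to the dual space via grad(phi)(x) = log(x),
    takes a gradient step there, and maps back with exp.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        dual = [math.log(xi) - step * gi for xi, gi in zip(x, g)]
        x = [math.exp(d) for d in dual]
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - target||^2 with a positive target.
target = [1.0, 2.0]


def grad_f(x):
    return [xi - ti for xi, ti in zip(x, target)]


solution = mirror_descent(grad_f, [0.5, 0.5])
```

With this potential the update reduces to the multiplicative rule x ← x · exp(−step · ∇f(x)), which keeps every iterate strictly positive without an explicit projection.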

翻訳日:2022-12-17 09:54:20 公開日:2021-07-02

# 大規模で複雑なフライトネットワークを用いた航空会社の乗務員ペアリング最適化のための新しい列生成ヒューリスティック

A Novel Column Generation Heuristic for Airline Crew Pairing Optimization with Large-scale Complex Flight Networks ( http://arxiv.org/abs/2005.08636v4 )

ライセンス: Link先を確認

Divyam Aggarwal, Dhish Kumar Saxena, Saaju Pualose, Thomas Bäck, Michael Emmerich

(参考訳) 乗組員のペアリング最適化(CPO)は、乗組員の運用コストが燃料コストに次いで第2位であることから、航空会社のビジネスの存続に不可欠である。 CPOは、いくつかの法的制約を満たしながら、スケジュールされたすべてのフライトを最小コストでカバーする一連の飛行シーケンス(クルーペアリング)を作成することを目指している。最先端の手法は、基礎となる整数プログラミング問題を線形計画問題に緩和することに大きく依存しており、これはカラム生成(CG)技術によって解決される。しかし、航空会社の事業拡大に伴い、CPOは次元性の呪いに悩まされ、正確なCG実装は廃止され、ヒューリスティックベースのCG実装が必要とされる。しかし、文献では、複数の乗員基地やハブアンドスポークのサブネットワークを含む、非常に一般的な大規模な複雑な飛行ネットワークは、ほとんど調査されていない。本稿では,AirCROP(Airline Crew Pairing Optimizer)の社内開発を可能にする新しいCGヒューリスティックを提案する。ヒューリスティック/AirCROPの有効性は、実世界の大規模で複雑なネットワークインスタンスで4,200便以上のフライト、15の乗員基地、複数のハブ・アンド・スポーク・サブネットワーク(数十億以上のペアリング)でテストされている。特に,本論文では,ペアリングのランダムな探索,ドメイン知識の活用(最適解の特徴に基づく),アーカイビングによる過去の計算・探索の活用を中心に,提案したCGヒューリスティック(AirCROPフレームワーク全体ではない)に焦点をあてる。本論文は航空会社の文脈を持つが,提案したCGヒューリスティックは,ドメイン知識の活用による組合せ最適化問題への対処方法のテンプレートとして,様々な分野にまたがる幅広い応用を見出すことができる。

Crew Pairing Optimization (CPO) is critical for an airline's business viability, given that the crew operating cost is second only to the fuel cost. CPO aims at generating a set of flight sequences (crew pairings) to cover all scheduled flights, at minimum cost, while satisfying several legality constraints. The state-of-the-art heavily relies on relaxing the underlying Integer Programming Problem into a Linear Programming Problem, which in turn is solved through the Column Generation (CG) technique. However, with the alarmingly expanding airlines' operations, CPO is marred by the curse of dimensionality, rendering the exact CG-implementations obsolete, and necessitating the heuristic-based CG-implementations. Yet, in literature, the highly prevalent large-scale complex flight networks involving multiple crew bases and/or hub-and-spoke sub-networks largely remain uninvestigated. This paper proposes a novel CG heuristic, which has enabled the in-house development of an Airline Crew Pairing Optimizer (AirCROP). The efficacy of the heuristic/AirCROP has been tested on real-world, large-scale, complex network instances with over 4,200 flights, 15 crew bases, and multiple hub-and-spoke sub-networks (resulting in billion-plus possible pairings). Notably, this paper has a dedicated focus on the proposed CG heuristic (not the entire AirCROP framework) based on balancing random exploration of pairings; exploitation of domain knowledge (on optimal solution features); and utilization of the past computational & search effort through archiving. Though this paper has an airline context, the proposed CG heuristic may find wider applications across different domains, by serving as a template on how to utilize domain knowledge to better tackle combinatorial optimization problems.

翻訳日:2022-12-02 00:23:58 公開日:2021-07-02

# SHADOWCAST: 制御可能なグラフ生成

SHADOWCAST: Controllable Graph Generation ( http://arxiv.org/abs/2006.03774v4 )

ライセンス: Link先を確認

Wesley Joon-Wie Tann, Ee-Chien Chang, and Bryan Hooi

(参考訳) 生成過程におけるグラフ属性の制御として定式化された制御可能なグラフ生成問題を導入し,理解可能な構造を持つ所望のグラフを生成する。この生成プロセスを導くために透明で分かりやすいマルコフモデルを使用することで、生成したグラフを形作り、理解することができる。本稿では,元のグラフ固有の特性を維持しつつ,グラフ生成を制御可能な生成モデルである${\rm S{\small HADOW}C{\small AST}}$を提案する。提案モデルは条件付き敵対的生成ネットワークに基づいている。観察されたグラフとユーザ指定のマルコフモデルパラメータが与えられたとき、${\rm S{\small HADOW}C{\small AST}}$ は所望のグラフを生成する条件を制御する。 3つの実世界のネットワークデータセットに関する総合的な実験は、グラフ生成タスクにおける我々のモデルの競合性能を示す。さらに、グラフ構造が異なる仮説シナリオを生成するために、${\rm S{\small HADOW}C{\small AST}}$を指示することで、その効果的な制御性を示す。

We introduce the controllable graph generation problem, formulated as controlling graph attributes during the generative process to produce desired graphs with understandable structures. Using a transparent and straightforward Markov model to guide this generative process, practitioners can shape and understand the generated graphs. We propose ${\rm S{\small HADOW}C{\small AST}}$, a generative model capable of controlling graph generation while retaining the original graph's intrinsic properties. The proposed model is based on a conditional generative adversarial network. Given an observed graph and some user-specified Markov model parameters, ${\rm S{\small HADOW}C{\small AST}}$ controls the conditions to generate desired graphs. Comprehensive experiments on three real-world network datasets demonstrate our model's competitive performance in the graph generation task. Furthermore, we show its effective controllability by directing ${\rm S{\small HADOW}C{\small AST}}$ to generate hypothetical scenarios with different graph structures.

翻訳日:2022-11-24 20:56:15 公開日:2021-07-02

# シーケンス学習のための時間相関タスクスケジューリング

Temporally Correlated Task Scheduling for Sequence Learning ( http://arxiv.org/abs/2007.05290v2 )

ライセンス: Link先を確認

Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu

(参考訳) 近年、シーケンス学習は機械学習コミュニティから多くの研究の注目を集めている。多くのアプリケーションにおいて、シーケンス学習タスクは、通常、複数の時間的に相関した補助タスクと関連付けられている。例えば (i)同時機械翻訳では、異なるレイテンシで翻訳を行うことができる(つまり、翻訳の前に読み待ちする入力語数)。 (二)株価トレンド予測においては、将来日(例えば、明日、明日の翌日)の株価を予測することができる。これらの時間的相関タスクが互いに助け合うことは明らかだが、メインタスクの性能を高めるために複数の補助タスクをよりよく活用する方法について、非常に限定的な調査が行われている。本研究では,学習用補助タスクをモデル状態と現在のトレーニングデータに応じて適応的に選択できるシーケンス学習のための学習可能なスケジューラを提案する。メインタスクのスケジューラとモデルは、バイレベル最適化によって共同で訓練される。実験の結果,本手法は同時翻訳と株価トレンド予測の性能を著しく向上させることがわかった。

Sequence learning has attracted much research attention from the machine learning community in recent years. In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict. For example, (i) in simultaneous machine translation, one can conduct translation under different latency (i.e., how many input words to read/wait before translation); (ii) in stock trend forecasting, one can predict the price of a stock in different future days (e.g., tomorrow, the day after tomorrow). While it is clear that those temporally correlated tasks can help each other, there is a very limited exploration on how to better leverage multiple auxiliary tasks to boost the performance of the main task. In this work, we introduce a learnable scheduler to sequence learning, which can adaptively select auxiliary tasks for training depending on the model status and the current training data. The scheduler and the model for the main task are jointly trained through bi-level optimization. Experiments show that our method significantly improves the performance of simultaneous machine translation and stock trend forecasting.

翻訳日:2022-11-11 22:00:28 公開日:2021-07-02

# 創造産業における人工知能 : レビュー

Artificial Intelligence in the Creative Industries: A Review ( http://arxiv.org/abs/2007.12391v6 )

ライセンス: Link先を確認

Nantheera Anantrasirichai and David Bull

(参考訳) 本稿では,創造産業の文脈における人工知能(AI)技術と応用の現状を概観する。 AI、特に機械学習(ML)アルゴリズムの簡単な背景には、畳み込みニューラルネットワーク(CNN)、生成敵ネットワーク(GAN)、リカレントニューラルネットワーク(RNN)、深層強化学習(DRL)が含まれる。私たちはクリエイティブなアプリケーションを、AI技術の使用方法に関連する5つのグループに分類します。 i) コンテンツの作成、ii) 情報分析、iii) コンテンツの充実及び生産後のワークフロー、iv) 情報抽出及び強化、v) データ圧縮。我々は、これらの分野におけるこの急速に進歩する技術の成功と限界について批判的に検討する。創造的なツールとしてのAIの使用と、それ自体が創造者となる潜在能力とを、私たちはさらに区別しています。近い将来、機械学習ベースのAIは、創造性のためのツールや共同アシスタントとして広く採用されるでしょう。対照的に、AIが「創造者」であるような制約の少ない領域での機械学習の成功は、控えめなままである。 AI(あるいはその開発者)が、人間の創造と競合するオリジナルの創造物に対して受賞する可能性も、現代の技術に基づいて制限されている。それゆえ、創造的産業の文脈では、AIによる最大限の利益は、その焦点が人間中心であり、人間の創造性を置き換えるのではなく、強化するように設計された場所でもたらされる、と結論づける。

This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.

翻訳日:2022-11-07 05:56:23 公開日:2021-07-02

# 外れ値にロバストな推定:困難性、最小限の調整で済むアルゴリズム、および応用

Outlier-Robust Estimation: Hardness, Minimally Tuned Algorithms, and Applications ( http://arxiv.org/abs/2007.15109v3 )

ライセンス: Link先を確認

Pasquale Antonante, Vasileios Tzoumas, Heng Yang, Luca Carlone

(参考訳) ロボット工学と視覚の非線形推定は、通常、誤ったデータ関連付けや、信号処理や機械学習の手法による誤検出に起因する外れ値に悩まされている。本稿では,外れ値にロバストな推定のための2つの統一的な定式化,一般化最大コンセンサス(G-MC)と一般化打ち切り最小二乗(G-TLS)を導入し,基本的限界,実用的アルゴリズム,応用について検討する。我々の最初の貢献は、外れ値にロバストな推定が近似不可能であることの証明である: 最悪の場合、外れ値の集合を(近似的にでも)見つけることは、多項式時間よりも遅いアルゴリズム(特に、準多項式時間で実行されるアルゴリズム)でさえも不可能である。第2の貢献として,2つの汎用アルゴリズムをレビューし,拡張する。第1のAdaptive Trimming (ADAPT) は組合せ的であり、G-MCに適しており、第2のGraduated Non-Convexity (GNC) はホモトピー法に基づいており、G-TLSに適している。 ADAPT と GNC を、ユーザがインライアのノイズ統計の事前知識を持たず(あるいは、統計が時間とともに変化し)、インライアと外れ値を分離する合理的なしきい値(RANSAC でよく使われるもの)を推測できない場合に拡張する。インライアを外れ値から切り離す方法を動的に決定する、外れ値除去のための最初の最小調整アルゴリズムを提案する。第3の貢献は、メッシュ登録、画像に基づく物体検出(形状アライメント)、ポーズグラフ最適化といったロボット知覚問題に対するアルゴリズムの評価である。 ADAPTとGNCはリアルタイムで実行され、決定論的であり、RANSACより優れ、80-90%の外れ値まで頑健である。それらの最小調整版も、インライアのノイズ上界に頼らないにもかかわらず、最先端の手法と比べて遜色ない。

Nonlinear estimation in robotics and vision is typically plagued with outliers due to wrong data association, or to incorrect detections from signal processing and machine learning methods. This paper introduces two unifying formulations for outlier-robust estimation, Generalized Maximum Consensus (G-MC) and Generalized Truncated Least Squares (G-TLS), and investigates fundamental limits, practical algorithms, and applications. Our first contribution is a proof that outlier-robust estimation is inapproximable: in the worst case, it is impossible to (even approximately) find the set of outliers, even with slower-than-polynomial-time algorithms (particularly, algorithms running in quasi-polynomial time). As a second contribution, we review and extend two general-purpose algorithms. The first, Adaptive Trimming (ADAPT), is combinatorial, and is suitable for G-MC; the second, Graduated Non-Convexity (GNC), is based on homotopy methods, and is suitable for G-TLS. We extend ADAPT and GNC to the case where the user does not have prior knowledge of the inlier-noise statistics (or the statistics may vary over time) and is unable to guess a reasonable threshold to separate inliers from outliers (as the one commonly used in RANSAC). We propose the first minimally tuned algorithms for outlier rejection, that dynamically decide how to separate inliers from outliers. Our third contribution is an evaluation of the proposed algorithms on robot perception problems: mesh registration, image-based object detection (shape alignment), and pose graph optimization. ADAPT and GNC execute in real-time, are deterministic, outperform RANSAC, and are robust up to 80-90% outliers. Their minimally tuned versions also compare favorably with the state of the art, even though they do not rely on a noise bound for the inliers.

翻訳日:2022-11-05 20:53:27 公開日:2021-07-02

# ニューラルネットワークを用いた時系列分類のためのデータ拡張に関する実証的研究

An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks ( http://arxiv.org/abs/2007.15951v4 )

ライセンス: Link先を確認

Brian Kenji Iwana, Seiichi Uchida

(参考訳) 近年、深層ニューラルネットワークはパターン認識において多くの成功を収めている。この成功の一部は、一般化を促進するためにビッグデータに依存しているためである。しかし、時系列認識の分野では、多くのデータセットは非常に小さい。この問題に対処する1つの方法は、データ拡張の利用である。本稿では,時系列データ拡張手法とニューラルネットワークを用いた時系列分類への応用について検討する。本稿では,時系列データ拡張において,変換に基づく手法,パターン混合法,生成モデル,分解法を含む4つのファミリーを分類・概説する。さらに,6種類のニューラルネットワークを用いた128の時系列分類データセットにおいて,12の時系列データ拡張手法を実証的に評価した。その結果,各データ拡張手法の特徴,長所,短所,レコメンデーションを解析できた。この調査は、ニューラルネットワークアプリケーションのための時系列データ拡張の選択を支援することを目的としている。

In recent times, deep artificial neural networks have achieved many successes in pattern recognition. Part of this success can be attributed to the reliance on big data to increase generalization. However, in the field of time series recognition, many datasets are often very small. One method of addressing this problem is through the use of data augmentation. In this paper, we survey data augmentation techniques for time series and their application to time series classification with neural networks. We propose a taxonomy and outline the four families in time series data augmentation, including transformation-based methods, pattern mixing, generative models, and decomposition methods. Furthermore, we empirically evaluate 12 time series data augmentation methods on 128 time series classification datasets with six different types of neural networks. Through the results, we are able to analyze the characteristics, advantages and disadvantages, and recommendations of each data augmentation method. This survey aims to help in the selection of time series data augmentation for neural network applications.
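
As a rough illustration of the transformation-based family named in the abstract above, the following Python sketch shows two widely used augmentations, jittering (additive Gaussian noise) and magnitude scaling. The function names, default `sigma` values, and the sample series are assumptions made for this sketch and are not taken from the survey.

```python
import random


def jitter(series, sigma=0.03, rng=random):
    """Add independent Gaussian noise to every time step."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def scale(series, sigma=0.1, rng=random):
    """Multiply the whole series by one random factor drawn around 1."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in series]


# Hypothetical input series; a seeded generator keeps the run repeatable.
rng = random.Random(0)
original = [0.0, 1.0, 2.0, 1.0]
jittered = jitter(original, rng=rng)
scaled = scale(original, rng=rng)
```

Both transformations preserve the series length and the class label, which is why they can be applied on the fly while training a classifier.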

翻訳日:2022-11-04 05:52:56 公開日:2021-07-02

# 合理的攻撃者に対する未知の提示攻撃検出

Unknown Presentation Attack Detection against Rational Attackers ( http://arxiv.org/abs/2010.01592v2 )

ライセンス: Link先を確認

Ali Khodabakhsh, Zahid Akhtar

(参考訳) 過去10年間のプレゼンテーション攻撃検出とマルチメディア法医学の分野では目覚ましい進歩があったが、これらのシステムは実際の環境での攻撃には弱い。既存のソリューションの課題は、未知の攻撃の検出、敵対的な設定で実行する能力、最小限の学習、説明可能性である。本研究では,アタッカーと検出器の相互作用をモデル化するゲーム理論的な視点に依拠して,これらの限界にアプローチする。その結果、新しい最適化基準が提案され、実際の環境での性能を改善するための一連の要件が定義される。さらに,特定の攻撃種に偏らないジェネレータベースの特徴セットを用いて,新たな検出手法を提案する。既知の攻撃の性能をさらに最適化するために, カテゴリー的マージン最大化損失(c-marmax)という新たな損失関数が提案され, 最も強力な攻撃に対する性能が徐々に向上する。提案手法は、既知の攻撃と未知の攻撃の間でよりバランスの取れた性能を提供し、合理的な攻撃に対して、未知の攻撃検出ケースにおいて最先端のパフォーマンスを達成する。最後に,提案手法の数少ない学習可能性と,画素レベルの説明可能性について検討した。

Despite the impressive progress in the field of presentation attack detection and multimedia forensics over the last decade, these systems are still vulnerable to attacks in real-life settings. Some of the challenges for existing solutions are the detection of unknown attacks, the ability to perform in adversarial settings, few-shot learning, and explainability. In this study, these limitations are approached by reliance on a game-theoretic view for modeling the interactions between the attacker and the detector. Consequently, a new optimization criterion is proposed and a set of requirements are defined for improving the performance of these systems in real-life settings. Furthermore, a novel detection technique is proposed using generator-based feature sets that are not biased towards any specific attack species. To further optimize the performance on known attacks, a new loss function coined categorical margin maximization loss (C-marmax) is proposed which gradually improves the performance against the most powerful attack. The proposed approach provides a more balanced performance across known and unknown attacks and achieves state-of-the-art performance in known and unknown attack detection cases against rational attackers. Lastly, the few-shot learning potential of the proposed approach is studied as well as its ability to provide pixel-level explainability.

翻訳日:2022-10-11 03:32:07 公開日:2021-07-02

# シンボリック有限状態オートマトンの複雑性について

On the Complexity of Symbolic Finite-State Automata ( http://arxiv.org/abs/2011.05389v3 )

ライセンス: Link先を確認

Dana Fisman and Hadar Frenkel and Sandra Zilles

(参考訳) 我々は、SFAの手順(交叉、空性など)の複雑さを再考し、それらを象徴的オートマトン(状態の数、状態から出る遷移の最大数、最も複雑な遷移述語のサイズ)に適した尺度に従って分析する。我々は SFA の特殊形式である normalized SFA と neat SFA 、および monotonic な実効ブール代数上の SFA に注意を払う。

We revisit the complexity of procedures on SFAs (such as intersection, emptiness, etc.) and analyze them according to the measures we find suitable for symbolic automata: the number of states, the maximal number of transitions exiting a state, and the size of the most complex transition predicate. We pay attention to the special forms of SFAs: normalized SFAs and neat SFAs, as well as to SFAs over a monotonic effective Boolean algebra.

翻訳日:2022-09-27 08:35:27 公開日:2021-07-02

# 漸近的最適情報指向サンプリング

Asymptotically Optimal Information-Directed Sampling ( http://arxiv.org/abs/2011.05944v4 )

ライセンス: Link先を確認

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

(参考訳) 有限個の行動を持つ確率線形バンディットに対して、漸近的に最適であり、かつ有限時間で(ほぼ)最悪ケース最適となる、単純かつ効率的なアルゴリズムを導入する。このアプローチは、漸近的下界を定義する最適化問題に基づく情報利得の代理量を備えた、頻度論的な情報指向サンプリング(IDS)フレームワークに基づいている。我々の分析は、IDSが後悔(リグレット)と情報のトレードオフをどのようにバランスさせるかを明らかにし、最近提案された主双対法とIDSアルゴリズムの驚くべき関係を明らかにする。 IDS が有限時間では UCB と競合し、漸近領域では有意に優れうることを実験的に示す。

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time, and can be significantly better in the asymptotic regime.

翻訳日:2022-09-26 23:23:53 公開日:2021-07-02

# オンラインマッチング,フロー,ロードバランシングのための学習可能な,インスタンス・ロバスト予測

Learnable and Instance-Robust Predictions for Online Matching, Flows and Load Balancing ( http://arxiv.org/abs/2011.11743v2 )

ライセンス: Link先を確認

Thomas Lavastida, Benjamin Moseley, R. Ravi and Chenyang Xu

(参考訳) 本稿では,形式的に学習可能かつインスタンスロバストなアルゴリズムを,予測によって拡張するための新しいモデルを提案する。学習可能性により、予測は妥当な過去のデータから効率的に構築できる。インスタンスの堅牢性は、予測が問題入力の適度な変更に対して堅牢であることを保証する。インスタンスの堅牢性は、変更の関数としてのパフォーマンスのスムーズな低下を主張する。理想的には、パフォーマンスは最悪のケース境界よりも悪くはない。また、予測を客観的に比較することもできる。我々は,ネットワークフロー割当問題と制限割当メースパン最小化の予測を伴うオンラインアルゴリズムを設計する。両方の問題に対して、2つの重要な特性が確立されている: 以前のインスタンスの小さなサンプルから高品質の予測を学習することができ、これらの予測は、基礎となる問題インスタンスが変化したときにスムーズに劣化するエラーに頑健である。

We propose a new model for augmenting algorithms with predictions by requiring that they are formally learnable and instance robust. Learnability ensures that predictions can be efficiently constructed from a reasonable amount of past data. Instance robustness ensures that the prediction is robust to modest changes in the problem input, where the measure of the change may be problem specific. Instance robustness insists on a smooth degradation in performance as a function of the change. Ideally, the performance is never worse than worst-case bounds. This also allows predictions to be objectively compared. We design online algorithms with predictions for a network flow allocation problem and restricted assignment makespan minimization. For both problems, two key properties are established: high quality predictions can be learned from a small sample of prior instances and these predictions are robust to errors that smoothly degrade as the underlying problem instance changes.

翻訳日:2022-09-22 03:24:12 公開日:2021-07-02

# (参考訳) 深層学習フレームワークにおけるhfox抵抗メモリの弱いリセット過程のモデル化

Model of the Weak Reset Process in HfOx Resistive Memory for Deep Learning Frameworks ( http://arxiv.org/abs/2107.06064v1 )

ライセンス: CC BY 4.0

Atreya Majumdar, Marc Bocquet, Tifenn Hirtzlin, Axel Laborieux, Jacques-Olivier Klein, Etienne Nowak, Elisa Vianello, Jean-Michel Portal, Damien Querlioz

(参考訳) 現在のディープラーニングトレーニングアルゴリズムの実装は、メモリとロジックユニット間のデータ転送による電力消費である。酸化物ベースのRRAMは、インメモリコンピューティングを実装するための優れた候補である。その弱いRESETの仕組みは、耐久性が高いデバイスの抵抗を調整できるので、学習には特に魅力的だ。しかし、この体制における抵抗的な変化の挙動は多くの変動に悩まされており、特にディープラーニングをシミュレートするツールと互換性のある方法でモデリングすることは特に困難である。本研究では,酸化ハフニウムRRAMにおける弱いRESET過程のモデルを示し,このモデルをPyTorchディープラーニングフレームワークに統合する。我々のモデルは、ハイブリッドCMOS/RRAM技術の実験に基づいて、ノイズの進行挙動とデバイス間変動(D2D)の両方を再現する。我々はこのツールを用いて、MNIST手書き文字認識タスクとCIFAR-10オブジェクト分類タスクにバイナリニューラルネットワークを訓練する。トレーニングプロセスへの影響を理解し,D2Dの変動性が最も有害な側面であることを識別するために,デバイス不完全性のさまざまな側面をシミュレートする。このフレームワークは、他の種類の記憶において、最も劣化の原因となるデバイス欠陥を特定するのと同じ方法で使用することができ、その結果、デバイスを最適化してこれらの欠陥の影響を減らすことができる。

The implementation of current deep learning training algorithms is power-hungry, owing to data transfer between memory and logic units. Oxide-based RRAMs are outstanding candidates to implement in-memory computing, which is less power-intensive. Their weak RESET regime is particularly attractive for learning, as it allows tuning the resistance of the devices with remarkable endurance. However, the resistive change behavior in this regime suffers from many fluctuations and is particularly challenging to model, especially in a way compatible with tools used for simulating deep learning. In this work, we present a model of the weak RESET process in hafnium oxide RRAM and integrate this model within the PyTorch deep learning framework. Validated on experiments on a hybrid CMOS/RRAM technology, our model reproduces both the noisy progressive behavior and the device-to-device (D2D) variability. We use this tool to train Binarized Neural Networks for the MNIST handwritten digit recognition task and the CIFAR-10 object classification task. We simulate our model with and without various aspects of device imperfections to understand their impact on the training process and identify that the D2D variability is the most detrimental aspect. The framework can be used in the same manner for other types of memories to identify the device imperfections that cause the most degradation, which can, in turn, be used to optimize the devices to reduce the impact of these imperfections.

翻訳日:2021-07-18 16:58:37 公開日:2021-07-02

# マルチドメイン学習における距離移動と干渉

Disentangling Transfer and Interference in Multi-Domain Learning ( http://arxiv.org/abs/2107.05445v1 )

ライセンス: Link先を確認

Yipeng Zhang, Tyler L. Hayes, Christopher Kanan

(参考訳) 人間は、あるドメインから別のドメインに知識を移すことがとても得意で、新しいタスクを素早く学習できます。同様に、転送学習は事前学習を用いた多くのコンピュータビジョン問題において大きな成功を収めた。しかし、ネットワークが異なるデータセットで定義された複数のタスクを学習するマルチドメイン学習における転送の利点は十分に研究されていない。複数のドメインを学ぶことは有益か、あるいはネットワーク容量が限られているため、ドメイン同士が干渉する可能性がある。本研究では,マルチドメイン学習において,干渉や知識伝達が発生する条件を解明する。干渉と転送を分離する新しいメトリクスを提案し、実験プロトコルをセットアップする。さらに,ネットワークキャパシティ,タスクグループ化,動的損失重み付けが干渉の軽減と伝達の促進に果たす役割について検討する。我々は、CIFAR-100、MiniPlaces、Tiny-ImageNetデータセットでこの結果を示す。

Humans are incredibly good at transferring knowledge from one domain to another, enabling rapid learning of new tasks. Likewise, transfer learning has enabled enormous success in many computer vision problems using pretraining. However, the benefits of transfer in multi-domain learning, where a network learns multiple tasks defined by different datasets, have not been adequately studied. Learning multiple domains could be beneficial, or these domains could interfere with each other given limited network capacity. In this work, we decipher the conditions where interference and knowledge transfer occur in multi-domain learning. We propose new metrics disentangling interference and transfer and set up experimental protocols. We further examine the roles of network capacity, task grouping, and dynamic loss weighting in reducing interference and facilitating transfer. We demonstrate our findings on the CIFAR-100, MiniPlaces, and Tiny-ImageNet datasets.

翻訳日:2021-07-18 12:26:13 公開日:2021-07-02

# ニューラルネットワークを用いた因果構造学習のための逐次MDL

Prequential MDL for Causal Structure Learning with Neural Networks ( http://arxiv.org/abs/2107.05481v1 )

ライセンス: Link先を確認

Jorg Bornschein and Silvia Chiappa and Alan Malek and Rosemary Nan Ke

(参考訳) ベイジアンネットワークの構造と観測から因果関係を学習することは、科学と技術のいくつかの分野において共通の目標である。本稿では,適応性および過度にパラメータ化されたニューラルネットワークを用いて観測変数間の条件付き確率分布をモデル化した場合に,事前最小記述長原理(MDL)を用いてベイズネットワークの実用的なスコアリング関数を導出できることを示す。 MDL は Occam の Razor の具現化を表現し, 調整が必要な前処理やその他の正則化器を疎結合にすることなく, 可塑性および同相グラフ構造を得る。人工的および実世界のデータに競合する結果を実証する。スコアはしばしば変数間の強い非線形関係が存在する場合でも正しい構造を回復する。さらに, 分布シフト中の音源から観測を行った場合, 適応速度から因果構造を推定する最近の研究との関係についても考察した。

Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity-inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data. The score often recovers the correct structure even in the presence of strongly nonlinear relationships between variables, a scenario where prior approaches struggle and usually fail. Furthermore we discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift.
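To make the idea of a prequential code length concrete, here is a minimal sketch. It is not the paper's scoring function: the abstract uses neural networks as conditional models, whereas this example assumes a simple Laplace (add-one) predictor over a binary sequence, and the function name `prequential_code_length` is hypothetical. Each symbol is encoded with a model fitted only on the symbols seen before it, and the costs are summed.

```python
import math


def prequential_code_length(bits):
    """Prequential code length, in bits, of a binary sequence.

    Assumption: a Laplace (add-one) estimator stands in for the learned
    conditional model; the paper itself uses neural networks here.
    """
    ones = 0
    zeros = 0
    total = 0.0
    for b in bits:
        # Predict the next symbol from the counts observed so far.
        p_one = (ones + 1) / (ones + zeros + 2)
        p = p_one if b == 1 else 1.0 - p_one
        # Pay -log2(p) bits to encode the symbol, then update the model.
        total += -math.log2(p)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return total
```

A sequence that the predictor learns quickly (for example all ones) gets a shorter code than a mixed sequence of the same length, which is the sense in which the score rewards models that adapt fast.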

翻訳日:2021-07-18 12:25:43 公開日:2021-07-02

# 畳み込みニューラルバンド:ビジュアルアウェア広告のための確率的アルゴリズム

Convolutional Neural Bandit: Provable Algorithm for Visual-aware Advertising ( http://arxiv.org/abs/2107.07438v1 )

ライセンス: Link先を確認

Yikun Ban, Jingrui He

(参考訳) オンライン広告はウェブビジネスで広く使われている。画像表示は、顧客と対話する最も一般的なフォーマットの1つであると考えられている。コンテクチュアルなマルチアームバンディットは、レコメンデーション手順に存在する探索探索ジレンマを解決するための広告の適用に成功している。本稿では,視覚的広告にインスパイアされた畳み込みニューラルネットワーク(CNN)を用いて,探索のための上位信頼境界(UCB)とともに報酬関数を学習するコンテキスト的帯域幅アルゴリズムを提案する。また、ネットワークが過度にパラメータ化され、畳み込みニューラル・タンジェント・カーネル(CNTK)との強い接続が確立されたときに、ほぼ最適の後悔を$\tilde{\mathcal{O}}(\sqrt{T})$で証明する。最後に,提案手法の有効性を評価し,実世界画像データセット上でのucbベースのバンディットアルゴリズムよりも優れていることを示す。

Online advertising is ubiquitous in web business. Image display is considered one of the most commonly used formats for interacting with customers. Contextual multi-armed bandits have shown success in advertising applications by addressing the exploration-exploitation dilemma that exists in the recommendation procedure. Inspired by visual-aware advertising, in this paper we propose a contextual bandit algorithm in which a convolutional neural network (CNN) is used to learn the reward function, along with an upper confidence bound (UCB) for exploration. We also prove a near-optimal regret bound $\tilde{\mathcal{O}}(\sqrt{T})$ when the network is over-parameterized and establish strong connections with the convolutional neural tangent kernel (CNTK). Finally, we evaluate the empirical performance of the proposed algorithm and show that it outperforms other state-of-the-art UCB-based bandit algorithms on real-world image data sets.
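The upper-confidence-bound principle mentioned in the abstract can be illustrated with the classic, context-free UCB1 rule. This is a stand-in under stated assumptions: the paper's algorithm estimates rewards with a CNN and uses its own confidence bound, neither of which is reproduced here, and `ucb1_select` is a hypothetical name.

```python
import math


def ucb1_select(counts, means, t):
    """Pick an arm with the UCB1 rule (illustrative, context-free).

    counts[i] is how often arm i was played, means[i] its average reward,
    and t the total number of plays so far.
    """
    # Play every arm once before trusting any confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Index = empirical mean + exploration bonus that shrinks with n.
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus term favours rarely shown ads early on (exploration) and fades as an ad accumulates impressions, so the choice converges to the ad with the highest observed reward (exploitation).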

翻訳日:2021-07-18 12:25:14 公開日:2021-07-02

# thriceだけを書く: ドキュメントの作成、計算ノートブック、プレゼンテーションを1つのソースから作成

You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source ( http://arxiv.org/abs/2107.06639v1 )

ライセンス: Link先を確認

Kacper Sokol and Peter Flach

(参考訳) 学術的な取引では、原稿、プレゼンテーション、ポスター、計算ノートなど、異なるフォーマットで出版された同じコンテンツの複数のバリエーションをジャグリングする必要がある。 write-review--rebut--revise ライフサイクルに対応するバージョンを追跡する必要があると、別の複雑さが増す。本稿では,バージョン管理環境(gitなど)における単一ソースドキュメントの維持と,アカデミックで広く普及している出力フォーマットのコレクションを生成する機能の追加により,この負担を大幅に軽減することを提案する。この目的のために、Jupyterの科学計算エコシステムからさまざまなオープンソースツールを活用し、選択したソフトウェアエンジニアリング概念を運用する。概念実証ワークフローを提供し、jupyter book(オンラインドキュメント)、jupyter notebook(計算ナラティブ)、clearly.jsのスライドを単一のmarkdownソースファイルから作成します。 githubにホストされているこのアプローチは、変更追跡とバージョニングをサポートし、基盤となるコードイシュー管理インフラストラクチャに基づいた透過的なレビュープロセスもサポートする。私たちのワークフローの展示はhttps://so-cool.github.io/you-only-write-thrice/でプレビューできます。

Academic trade requires juggling multiple variants of the same content published in different formats: manuscripts, presentations, posters and computational notebooks. The need to track versions to accommodate the write--review--rebut--revise life-cycle adds another layer of complexity. We propose to significantly reduce this burden by maintaining a single source document in a version-controlled environment (such as git), adding functionality to generate a collection of output formats popular in academia. To this end, we utilise various open-source tools from the Jupyter scientific computing ecosystem and operationalise selected software engineering concepts. We offer a proof-of-concept workflow that composes Jupyter Book (an online document), Jupyter Notebook (a computational narrative) and reveal.js slides from a single markdown source file. Hosted on GitHub, our approach supports change tracking and versioning, as well as a transparent review process based on the underlying code issue management infrastructure. An exhibit of our workflow can be previewed at https://so-cool.github.io/you-only-write-thrice/.

翻訳日:2021-07-18 12:24:57 公開日:2021-07-02

# 量子ビットを話す方法

How to make qubits speak ( http://arxiv.org/abs/2107.06776v1 )

ライセンス: Link先を確認

Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi

(参考訳) これは量子コンピュータを話させ、それを量子ネイティブ、合成、意味認識の方法で行う物語である。最近,実際の量子コンピュータを用いて質問応答を行った。私たちは、何をしたかを説明し、これはすべて写真の観点から行われたことを強調し、関連する文献に多くのポインタを提供する。実際、自然言語以外にも、他の多くのことは、量子ネイティブで、構成的で、意味を意識した方法で実装することができる。また、実際に実行するためのガイダンスも提供しています。

This is a story about making quantum computers speak, and doing so in a quantum-native, compositional and meaning-aware manner. Recently we did question-answering with an actual quantum computer. We explain what we did, stress that this was all done in terms of pictures, and provide many pointers to the related literature. In fact, besides natural language, many other things can be implemented in a quantum-native, compositional and meaning-aware manner, and we provide the reader with some indications of that broader pictorial landscape, including our account on the notion of compositionality. We also provide some guidance for the actual execution, so that the reader can give it a go as well.

翻訳日:2021-07-18 12:23:59 公開日:2021-07-02

# 戦略的航空交通流管理のための気象シーンの圧縮表現

Compressive Representations of Weather Scenes for Strategic Air Traffic Flow Management ( http://arxiv.org/abs/2107.06394v1 )

ライセンス: Link先を確認

Sandip Roy

(参考訳) 戦略的航空交通フロー管理の目的を支援するため,高次元気象シーンデータの長期的表現について検討した。具体的には,航空関係の気象シーンが圧縮可能かどうかについて考察する。ここでは,metarデータ(気温,飛行カテゴリ,アメリカ大陸の可視性プロファイルを含む)から抽出した気象シーンの圧縮をグラフスペクトルベースで検討した。シーンは圧縮可能であり、シーンコンテンツの75-95%は基底ベクトルの0.5-4%でキャプチャされる。さらに、各シーンにおける支配的基底ベクトルは、気象の時変空間特性を識別し、圧縮された表現からの再構成を示す。最後に、戦略的TFM設計における圧縮表現の潜在的利用について概説する。

Terse representation of high-dimensional weather scene data is explored, in support of strategic air traffic flow management objectives. Specifically, we consider whether aviation-relevant weather scenes are compressible, in the sense that each scene admits a possibly-different sparse representation in a basis of interest. Here, compression of weather scenes extracted from METAR data (including temperature, flight categories, and visibility profiles for the contiguous United States) is examined, for the graph-spectral basis. The scenes are found to be compressible, with 75-95% of the scene content captured using 0.5-4% of the basis vectors. Further, the dominant basis vectors for each scene are seen to identify time-varying spatial characteristics of the weather, and reconstruction from the compressed representation is demonstrated. Finally, potential uses of the compressive representations in strategic TFM design are briefly scoped.
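The compressibility claim (most of the scene content captured by a small fraction of basis vectors) can be checked with a short calculation once a scene has been expanded in the chosen basis. The sketch below assumes that "scene content" means energy, i.e. the sum of squared expansion coefficients; the abstract does not define it, and the function name `num_coeffs_for_energy` is hypothetical.

```python
def num_coeffs_for_energy(coeffs, fraction):
    """Smallest number of basis coefficients capturing `fraction` of the energy.

    Assumption: content is measured as the sum of squared coefficients of the
    scene in an orthonormal basis (e.g. a graph-spectral basis).
    """
    # Rank coefficients by energy, largest first.
    energies = sorted((c * c for c in coeffs), reverse=True)
    total = sum(energies)
    if total == 0:
        return 0
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running >= fraction * total:
            return k
    return len(energies)
```

Dividing the returned count by `len(coeffs)` gives the share of basis vectors used, which is the quantity the abstract reports as 0.5-4%.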

翻訳日:2021-07-18 12:23:49 公開日:2021-07-02

# CHASE: セルレベル微分可能なニューラルネットワークによるロバストなビジュアルトラッキング

CHASE: Robust Visual Tracking via Cell-Level Differentiable Neural Architecture Search ( http://arxiv.org/abs/2107.03463v1 )

ライセンス: Link先を確認

Seyed Mojtaba Marvasti-Zadeh, Javad Khaghani, Li Cheng, Hossein Ghanei-Yakhdan, Shohreh Kasaei

(参考訳) 現在、強力なビジュアルオブジェクトトラッカーは、手作業で設計されたネットワークアーキテクチャで高品質なトラッキング結果を提供する、よく作られたモジュールに依存している。手動設計プロセスは、十分な事前経験、膨大な努力、直感、そしておそらく幸運を必要とするため、特に困難な障壁となる。一方,ニューラルネットワーク検索は,実現可能なネットワーク構造の自動探索問題に取り組むための有望な手法として,画像分割などの実用的応用において基盤となっている。本研究では,トラッキングモジュールのネットワーク設計を自動化し,オフライントレーニング中のトラッキングネットワークの目的にバックボーン機能を適用することを目的とした,セルレベルの差別化可能なアーキテクチャ探索機構を提案する。提案されたアプローチはシンプルで効率的であり、ネットワークを構築するために一連のモジュールを積み重ねる必要はない。我々の手法は既存のトラッカーに組み込むことが簡単であり、異なるアーキテクチャ検索手法と追跡対象を用いて実証的に検証されている。広範な実験評価の結果,5つのベンチマークにおいて優れた性能が得られた。一方、私たちの自動検索プロセスは、trackingnetデータセット上の第2(第1)のdartsメソッドに41時間(18時間)かかります。

A strong visual object tracker nowadays relies on its well-crafted modules, which typically consist of manually-designed network architectures to deliver high-quality tracking results. Not surprisingly, the manual design process becomes a particularly challenging barrier, as it demands sufficient prior experience, enormous effort, intuition and perhaps some good luck. Meanwhile, neural architecture search has been gaining ground in practical applications such as image segmentation, as a promising method for tackling the automated search of feasible network structures. In this work, we propose a novel cell-level differentiable architecture search mechanism to automate the network design of the tracking module, aiming to adapt backbone features to the objective of a tracking network during offline training. The proposed approach is simple and efficient, and does not need to stack a series of modules to construct a network. Our approach is easy to incorporate into existing trackers, which is empirically validated using different differentiable architecture search-based methods and tracking objectives. Extensive experimental evaluations demonstrate the superior performance of our approach over five commonly-used benchmarks. Meanwhile, our automated searching process takes 41 (18) hours for the second (first) order DARTS method on the TrackingNet dataset.

翻訳日:2021-07-11 11:37:54 公開日:2021-07-02

# Deep Mesh Prior: グラフ畳み込みネットワークを用いた教師なしメッシュ復元

Deep Mesh Prior: Unsupervised Mesh Restoration using Graph Convolutional Networks ( http://arxiv.org/abs/2107.02909v1 )

ライセンス: Link先を確認

Shota Hattori, Tatsuya Yatagawa, Yutaka Ohtake, Hiromasa Suzuki

(参考訳) 本稿では,教師なしの方法で自己相似性を学習することで,メッシュ復元問題,すなわち分節化と完了に対処する。そこで,本提案手法では,メッシュ上のグラフ畳み込みネットワークを用いて自己相似性を学習する。ネットワークは入力データとして単一の不完全なメッシュを取り、大規模なデータセットを使用してトレーニングされることなく、再構築されたメッシュを直接出力する。本手法では,プロセス全体がメッシュで動作するため,暗黙のフィールドなどの中間表現は使用しない。我々の教師なし手法は大規模データセットを用いた最先端手法と同等かそれ以上に機能することを示した。

This paper addresses mesh restoration problems, i.e., denoising and completion, by learning self-similarity in an unsupervised manner. For this purpose, the proposed method, which we refer to as Deep Mesh Prior, uses a graph convolutional network on meshes to learn the self-similarity. The network takes a single incomplete mesh as input data and directly outputs the reconstructed mesh without being trained using large-scale datasets. Our method does not use any intermediate representations such as an implicit field because the whole process works on a mesh. We demonstrate that our unsupervised method performs equally well or even better than the state-of-the-art methods using large-scale datasets.

翻訳日:2021-07-11 11:36:22 公開日:2021-07-02

# (参考訳) 機械学習の問題を解決する

Solving Machine Learning Problems ( http://arxiv.org/abs/2107.01238v1 )

ライセンス: CC BY 4.0

Sunny Tran, Pranav Krishna, Ishan Pakuwal, Prabhakar Kafle, Nikhil Singh, Jayson Lynch, Iddo Drori

(参考訳) 機械は機械学習を学べるのか? この研究は、機械学習モデルをトレーニングして、大学の学部レベルのコースから機械学習問題を解決する。我々は、MITの6.036のIntroduction to Machine Learningコースからコース演習、宿題、クイズ質問からなる新しいトレーニングセットを作成し、これらの質問に答えるために機械学習モデルをトレーニングします。本システムでは,MIT学生の平均93%に対して,オープン応答質問では96%,マルチチョイス質問では97%の総合的精度をリアルタイムで達成している。質問はコースで教えられた12のトピックすべてをカバーする。 i)基本的な機械学習原則、(ii)パーセプトロン、(iii)特徴抽出と選択、(iv)ロジスティック回帰、(v)回帰、(vi)ニューラルネットワーク、(vi)高度なニューラルネットワーク、(viii)畳み込みニューラルネットワーク、(ix)リカレントニューラルネットワーク、(x)ステートマシンとmdp、(xi)強化学習、(xii)決定木。本システムは,グラフとツリー表現を備えたエンコーダデコーダアーキテクチャ内でTransformerモデルを使用する。私たちのアプローチの重要な側面は、新しい例問題を生成するためのデータ提供スキームです。また、問題ヒントを生成するために機械学習モデルをトレーニングします。そこで,本システムは,トピック間の新たな質問を自動的に生成し,オープン応答質問と複数質問の両方に回答し,問題を分類し,問題ヒントを生成し,stem教育のためのaiの包含を押し上げる。

Can a machine learn Machine Learning? This work trains a machine learning model to solve machine learning problems from a University undergraduate level course. We generate a new training set of questions and answers consisting of course exercises, homework, and quiz questions from MIT's 6.036 Introduction to Machine Learning course and train a machine learning model to answer these questions. Our system demonstrates an overall accuracy of 96% for open-response questions and 97% for multiple-choice questions, compared with MIT students' average of 93%, achieving grade A performance in the course, all in real-time. Questions cover all 12 topics taught in the course, excluding coding questions or questions with images. Topics include: (i) basic machine learning principles; (ii) perceptrons; (iii) feature extraction and selection; (iv) logistic regression; (v) regression; (vi) neural networks; (vii) advanced neural networks; (viii) convolutional neural networks; (ix) recurrent neural networks; (x) state machines and MDPs; (xi) reinforcement learning; and (xii) decision trees. Our system uses Transformer models within an encoder-decoder architecture with graph and tree representations. An important aspect of our approach is a data-augmentation scheme for generating new example problems. We also train a machine learning model to generate problem hints. Thus, our system automatically generates new questions across topics, answers both open-response questions and multiple-choice questions, classifies problems, and generates problem hints, pushing the envelope of AI for STEM education.

翻訳日:2021-07-07 13:10:08 公開日:2021-07-02

# (参考訳) AutoMLサロゲートモデリング最適化のための機械学習パイプラインツールキットの設計

Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization ( http://arxiv.org/abs/2107.01253v1 )

ライセンス: CC BY 4.0

Paulito P. Palmes, Akihiro Kishimoto, Radu Marinescu, Parikshit Ram, Elizabeth Daly

(参考訳) 機械学習におけるパイプライン最適化問題は、パイプライン構造とそれらの要素のパラメータ適応の同時最適化を必要とする。これらの構造を表現するエレガントな方法を持つことは、最適化戦略の異なる選択とともに、パフォーマンスの管理と分析の複雑さを減らすのに役立ちます。これらの問題を念頭に,我々は,シンプルな表現を用いた複雑な機械学習パイプライン構造の作成と評価を容易にするAMLPツールキットを開発した。 AMLPを使って最適なパイプラインシグネチャを見つけ、それらをデータマイニングし、これらのデータマイニング機能を使って学習と予測を高速化します。我々は、AMLP計算時間5分未満で4時間の予算で他のAutoMLアプローチを上回り、AMLPのサロゲートモデルを用いた2段階パイプライン最適化を定式化した。

The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. Having an elegant way to express these structures can help lessen the complexity in the management and analysis of their performances together with the different choices of optimization strategies. With these issues in mind, we created the AMLP toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches given a 4-hour time budget while using less than 5 minutes of AMLP computation time.

翻訳日:2021-07-07 13:08:54 公開日:2021-07-02

# (参考訳) 注意モデルを用いた新しい災害画像データセットと特徴解析

A Novel Disaster Image Dataset and Characteristics Analysis using Attention Model ( http://arxiv.org/abs/2107.01284v1 )

ライセンス: CC BY 4.0

Fahim Faisal Niloy, Arif, Abu Bakar Siddik Nayem, Anis Sarker, Ovi Paul, M. Ashraful Amin, Amin Ahsan Ali, Moinul Islam Zaber, AKM Mahbubur Rahman

(参考訳) ディープラーニング技術の進歩により、他の分類技術よりも優れたシステムを開発することができた。しかし,実験システムの成功は,提案システムの学習に利用可能なデータの品質と多様性に依存している。本研究では, 火災, 水, 陸の3つの災害現場から収集した画像を含む比較的困難なデータセットを慎重に収集した。また,自然や人による災害や,戦争や事故による人的被害など,さまざまな被害インフラの画像も収集した。また,このような災害や被害の徴候のない画像を含む非損傷クラスに対する画像データも蓄積した。このデータセットには13,720の注釈付き画像があり、各画像は3人で注釈付けされている。また,200種類のテスト画像に対して,バウンディングボックスを手作業で付与した画像クラス情報を識別する。画像は、他の研究者が利用可能なさまざまなニュースポータル、ソーシャルメディア、標準データセットから収集される。 3層注意モデル(tlam)を訓練し、平均5つの折りたたみ検証精度95.88%を達成する。さらに、200個の未検出画像では、この精度は96.48%である。また,これらの実験画像に対して注意マップを作成し,比較し,注意モデルの特性について検討した。私たちのデータセットはhttps://niloy193.github.io/Disaster-Datasetで利用可能です。

The advancement of deep learning technology has enabled us to develop systems that outperform any other classification technique. However, the success of any empirical system depends on the quality and diversity of the data available to train it. In this research, we have carefully accumulated a relatively challenging dataset that contains images collected from various sources for three different disasters: fire, water and land. Besides this, we have also collected images of infrastructure damaged by natural or man-made calamities and of humans harmed by war or accidents. We have also accumulated image data for a class named non-damage that contains images with no such disaster or sign of damage in them. There are 13,720 manually annotated images in this dataset; each image is annotated by three individuals. We also provide discriminating image class information, annotated manually with bounding boxes, for a set of 200 test images. Images are collected from different news portals, social media, and standard datasets made available by other researchers. A three-layer attention model (TLAM) is trained, and an average five-fold validation accuracy of 95.88% is achieved. Moreover, on the 200 unseen test images this accuracy is 96.48%. We also generate and compare attention maps for these test images to determine the characteristics of the trained attention model. Our dataset is available at https://niloy193.github.io/Disaster-Dataset

翻訳日:2021-07-07 12:58:27 公開日:2021-07-02

# (参考訳) 2値分類と変更点検出のためのソルトベースサロゲート損失関数を用いたROC曲線の最適化

Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection ( http://arxiv.org/abs/2107.01285v1 )

ライセンス: CC BY 4.0

Jonathan Hillman and Toby Dylan Hocking

(参考訳) 受信者動作特性(roc)曲線は、バイナリ分類モデルの評価に有用な真正率と偽正率のプロットであるが、曲線(auc)下の領域が凸でないため、学習に使用するのが困難である。 ROC曲線は、変化点検出のような偽陽性と真正の確率を持つ他の問題にも用いられる。このより一般的な文脈では、ROC曲線はループ、高い準最適誤差率を持つ点、AUCが1より大きい点を持つことが示される。この観測は、AUCを最大化する代わりに、Min(FP,FN) に対して大きな値を持つ点を避ける AUC=1 の単調ROC曲線を求める。本研究では,AUMと呼ばれる新しいサロゲート損失関数(AUM, Area Under Min(FP, FN))を導出する凸緩和法を提案する。以前の損失関数はすべてのラベル付き例やペアの和に基づいているが、AUMはROC曲線上の点列上のソートと和を必要とする。勾配降下学習アルゴリズムでは,AUM方向微分を効率的に計算し,利用できることを示す。教師付きバイナリ分類と変更点検出問題に関する実証的研究では、新しいAUM最小化学習アルゴリズムがAUCを改良し、以前のベースラインと同等の速度をもたらすことを示した。

Receiver Operating Characteristic (ROC) curves are plots of true positive rate versus false positive rate which are useful for evaluating binary classification models, but difficult to use for learning since the Area Under the Curve (AUC) is non-convex. ROC curves can also be used in other problems that have false positive and true positive rates such as changepoint detection. We show that in this more general context, the ROC curve can have loops, points with highly sub-optimal error rates, and AUC greater than one. This observation motivates a new optimization objective: rather than maximizing the AUC, we would like a monotonic ROC curve with AUC=1 that avoids points with large values for Min(FP,FN). We propose a convex relaxation of this objective that results in a new surrogate loss function called the AUM, short for Area Under Min(FP, FN). Whereas previous loss functions are based on summing over all labeled examples or pairs, the AUM requires a sort and a sum over the sequence of points on the ROC curve. We show that AUM directional derivatives can be efficiently computed and used in a gradient descent learning algorithm. In our empirical study of supervised binary classification and changepoint detection problems, we show that our new AUM minimization learning algorithm results in improved AUC and comparable speed relative to previous baselines.
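For the plain binary classification case, the Area Under Min(FP, FN) can be computed with one sort and one pass, as the abstract describes. The sketch below is an illustration under assumptions, not the authors' implementation: it uses raw false positive and false negative counts rather than rates, treats an example as predicted positive when its score exceeds the threshold, and the function name `aum` is chosen here for convenience.

```python
def aum(scores, labels):
    """Area Under Min(FP, FN) for binary labels in {0, 1}.

    Assumption: FP and FN are counts, and an example is predicted positive
    when its score is above the threshold.
    """
    pairs = sorted(zip(scores, labels))
    # Threshold below every score: everything is predicted positive.
    fp = sum(1 for _, y in pairs if y == 0)
    fn = 0
    total = 0.0
    for i, (score, label) in enumerate(pairs):
        # Raising the threshold past this score flips it to "negative".
        if label == 1:
            fn += 1
        else:
            fp -= 1
        # min(FP, FN) is constant until the next score is reached.
        if i + 1 < len(pairs):
            total += min(fp, fn) * (pairs[i + 1][0] - score)
    return total
```

A perfectly ranked set of scores gives an AUM of zero, and the value grows with both the number of misordered examples and the size of the score gaps between them, which is what makes it usable as a surrogate loss.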

翻訳日:2021-07-07 12:46:27 公開日:2021-07-02

# (参考訳) Scarecrow: マシンテキストの精査のためのフレームワーク

Scarecrow: A Framework for Scrutinizing Machine Text ( http://arxiv.org/abs/2107.01294v1 )

ライセンス: CC BY 4.0

Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A.Smith, Yejin Choi

(参考訳) 現代のニューラルテキスト生成システムは、驚くほど流動的で文法的なテキストを生成することができる。初期の言語モデルは反復と構文上の誤りに苦しんだが、現代のモデルによる誤りはしばしば意味的、物語的、あるいは談話的失敗である。これらの複雑なエラータイプの研究を容易にするために、Scarecrowと呼ばれる新しい構造化されたクラウドソースエラーアノテーションスキーマを導入する。 Scarecrowで使用されるエラーカテゴリ(冗長性、コモンセンスエラー、不整合など)は、専門家分析とオントロジーのないクラウドアノテーションのパイロットラウンドを組み合わせて、実際のマシン生成テキストで見られるエラー現象をカバーするスキーマに到達することで特定された。我々は、Scarecrowを使って1.3kの人文と機械が生成する英語ニューステキストの13kのアノテーションを収集し、それぞれ41k以上のスパンにエラーカテゴリ、重大さ、自然言語の説明、先行スパン(関連する部分)をラベル付けした。我々は、GPT-2 Smallから最大のGPT-3まで、様々なパフォーマンスレベルを持つ最先端システムによって生成されたテキストのアノテーションを収集する。パラメータ数,トレーニングデータ,復号化技術など,詳細な解析のためのいくつかの因子を分離した。以上の結果から,これらの設定の相違点が期待できる。これらの結果から,現在および将来のテキスト生成システムの評価において,カカシアノテーションの価値が示された。私たちは完全なアノテーションツールキットとデータセットをhttps://yao-dou.github.io/scarecrow/でリリースしています。

Modern neural text generation systems can produce remarkably fluent and grammatical texts. While earlier language models suffered from repetition and syntactic errors, the errors made by contemporary models are often semantic, narrative, or discourse failures. To facilitate research of these complex error types, we introduce a new structured, crowdsourced error annotation schema called Scarecrow. The error categories used in Scarecrow -- such as redundancy, commonsense errors, and incoherence -- were identified by combining expert analysis with several pilot rounds of ontology-free crowd annotation to arrive at a schema which covers the error phenomena found in real machine-generated text. We use Scarecrow to collect 13k annotations of 1.3k human- and machine-generated paragraphs of English language news text, amounting to over 41k spans, each labeled with its error category, severity, a natural language explanation, and antecedent span (where relevant). We collect annotations for text generated by state-of-the-art systems with varying known performance levels, from GPT-2 Small through the largest GPT-3. We isolate several factors for detailed analysis, including parameter count, training data, and decoding technique. Our results show both expected and surprising differences across these settings. These findings demonstrate the value of Scarecrow annotations in the assessment of current and future text generation systems. We release our complete annotation toolkit and dataset at https://yao-dou.github.io/scarecrow/.

翻訳日:2021-07-07 12:21:44 公開日:2021-07-02

# (参考訳) ニューラルネットワークのサブスペースクラスタリングに基づく解析

Subspace Clustering Based Analysis of Neural Networks ( http://arxiv.org/abs/2107.01296v1 )

ライセンス: CC BY 4.0

Uday Singh Saini, Pravallika Devineni, Evangelos E. Papalexakis

(参考訳) ディープニューラルネットワークの潜在空間を分析するツールは、それらを理解するためのステップを提供する。本研究では,入力セット上で訓練されたニューラルネットワーク層の潜在構造から親和性グラフを学習することを目的として,スパース部分空間クラスタリング(ssc)の動機付けを行う。次に、コミュニティ検出のツールを使用して、入力に存在する構造を定量化する。これらの実験は、ネットワークの奥深くに進むにつれて、入力は同じクラスの他の入力と親和性が高まる傾向があることを示しています。次に,アフィニティグラフ間の層間比較を行うために,行列類似度尺度を利用する。そうすることで、我々はまず、トレーニング中のある層を最終状態と比較すると、ネットワークの層が浅いほど、より深い層よりも収束が早いことを実証する。ネットワークアーキテクチャ全体のペアワイズ分析を行う場合、ネットワークのサイズが大きくなるにつれて、各レイヤが隣のレイヤと適度に類似している状態から、ブロック内のレイヤが他のブロックのレイヤと高い類似度を持つ状態へと再編成されるのが観察される。最後に,ネットワークの最終畳み込み層の学習された親和性グラフを分析し,入力の局所的近傍がネットワークの分類にどのように影響するかを示す。

Tools to analyze the latent space of deep neural networks provide a step towards better understanding them. In this work, we motivate sparse subspace clustering (SSC) with an aim to learn affinity graphs from the latent structure of a given neural network layer trained over a set of inputs. We then use tools from community detection to quantify structures present in the input. These experiments reveal that as we go deeper in a network, inputs tend to have an increasing affinity to other inputs of the same class. Subsequently, we utilise matrix similarity measures to perform layer-wise comparisons between affinity graphs. In doing so we first demonstrate that, when comparing a given layer currently under training to its final state, the shallower layers of the network converge more quickly than the deeper layers. When performing a pairwise analysis of the entire network architecture, we observe that, as the network increases in size, it reorganises from a state where each layer is moderately similar to its neighbours to a state where layers within a block are more similar to each other than to layers in other blocks. Finally, we analyze the learned affinity graphs of the final convolutional layer of the network and demonstrate how an input's local neighbourhood affects its classification by the network.

翻訳日:2021-07-07 11:50:32 公開日:2021-07-02

# データ不確かさに基づく指紋の前処理

Data Uncertainty Guided Noise-aware Preprocessing Of Fingerprints ( http://arxiv.org/abs/2107.01248v1 )

ライセンス: Link先を確認

Indu Joshi and Ayush Utkarsh and Riya Kothari and Vinod K Kurmi and Antitza Dantcheva and Sumantra Dutta Roy and Prem Kumar Kalra

(参考訳) 良質な指紋に対する指紋認証システムの有効性は, 昔から確立されてきた。しかし, ノイズや品質の悪い指紋に対する標準指紋照合システムの性能は十分ではない。そこで本研究では,最先端の指紋前処理モデルを用いて,入力画像に存在する雑音を定量化し,背景雑音やリッジの明度が低い指紋領域を識別する手法を提案する。ノイズの定量化は、2つの折りたたみモデルに役立つ: まず、目的関数を特定の入力指紋のノイズに適応させ、その結果、ノイズや歪んだ指紋領域の堅牢な性能を達成する。第二に、入力指紋画像中のノイズの多い画素を示すノイズ分散マップを提供する。予測ノイズ分散マップは、入力画像に存在するノイズによる誤予測をエンドユーザが理解できるようにする。様々なアーキテクチャの選択と2つの指紋処理タスクにわたる13の公開指紋データベースの広範な実験評価は,提案手法の有効性を示している。

The effectiveness of fingerprint-based authentication systems on good quality fingerprints has long been established. However, the performance of standard fingerprint matching systems on noisy and poor quality fingerprints is far from satisfactory. Towards this, we propose a data uncertainty-based framework which enables state-of-the-art fingerprint preprocessing models to quantify the noise present in the input image and identify fingerprint regions with background noise and poor ridge clarity. Quantification of noise helps the model in two ways: firstly, it makes the objective function adaptive to the noise in a particular input fingerprint and consequently helps to achieve robust performance on noisy and distorted fingerprint regions. Secondly, it provides a noise variance map which indicates noisy pixels in the input fingerprint image. The predicted noise variance map enables end users to understand erroneous predictions due to noise present in the input image. Extensive experimental evaluation on 13 publicly available fingerprint databases, across different architectural choices and two fingerprint processing tasks, demonstrates the effectiveness of the proposed framework.

翻訳日:2021-07-06 15:22:57 公開日:2021-07-02

# リラックスした注意:エンドツーエンド自動音声認識の性能向上のための簡易手法

Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition ( http://arxiv.org/abs/2107.01275v1 )

ライセンス: Link先を確認

Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt

(参考訳) 近年,アテンションベースのエンコーダデコーダ(AED)モデルでは,複数タスクにわたるエンドツーエンド自動音声認識(ASR)の性能が向上している。本稿では,2行のコードで容易に実装できる訓練において,エンコーダ・デコーダの注意重みに対する一様分布の簡易な段階的注入である緩和注意の概念を紹介する。我々は,様々なAEDモデルアーキテクチャと,ウォールストリートジャーナル (WSJ) とリブリスペック (Librispeech) の2つの顕著なASRタスクにおける緩和された注意の効果について検討した。ゆるやかな注意で訓練されたトランスフォーマーは、外部言語モデルで復号する際に標準ベースラインモデルより一貫して優れていた。 wsjでは、単語誤り率3.65%のトランスフォーマ・エンド・ツー・エンド音声認識のベンチマークを新たに設定し、その性能(4.20%)を13.1%向上させた。受け入れられると、モデルはgithubで公開される。

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. To address overconfidence in such models, we introduce the concept of relaxed attention: a simple, gradual injection of a uniform distribution into the encoder-decoder attention weights during training, which is easily implemented with two lines of code. We investigate the effect of relaxed attention across different AED model architectures and two prominent ASR tasks, Wall Street Journal (WSJ) and Librispeech. We found that transformers trained with relaxed attention consistently outperform the standard baseline models during decoding with external language models. On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming the state of the art (4.20%) by 13.1% relative, while introducing only a single hyperparameter. Upon acceptance, models will be published on GitHub.
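The injection of a uniform distribution described in the abstract above can be sketched as a convex blend of each attention row with a uniform row. This is a minimal illustration, not the authors' code: the function name `relax_attention` and the parameter name `gamma` are hypothetical (the abstract only says there is a single hyperparameter), and any gradual scheduling of the blend over training is left out.

```python
def relax_attention(weights, gamma):
    """Blend every attention row with a uniform distribution over its positions.

    weights: list of rows, each row a list of attention weights that sums to 1.
    gamma:   hypothetical relaxation coefficient in [0, 1]; 0 leaves the rows unchanged.
    """
    relaxed = []
    for row in weights:
        uniform = 1.0 / len(row)  # uniform weight for a row with len(row) positions
        relaxed.append([(1.0 - gamma) * w + gamma * uniform for w in row])
    return relaxed


# Toy encoder-decoder attention: two decoder steps attending over three encoder frames.
attention = [[0.7, 0.2, 0.1], [0.0, 1.0, 0.0]]
relaxed = relax_attention(attention, gamma=0.25)
```

Because both the original row and the uniform row sum to one, every relaxed row is still a valid distribution. Under this sketch the blend would be applied during training only, which matches the abstract's wording.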

翻訳日:2021-07-06 15:12:16 公開日:2021-07-02

# Truncated Marginal Neural Ratio Estimation

Truncated Marginal Neural Ratio Estimation ( http://arxiv.org/abs/2107.01214v1 )

ライセンス: Link先を確認

Benjamin Kurt Miller, Alex Cole, Patrick Forré, Gilles Louppe, Christoph Weniger

(参考訳) パラメトリック確率シミュレータは科学においてユビキタスであり、しばしば高次元の入力パラメータと/または難易度を特徴とする。この文脈でベイズパラメータ推論を行うことは困難である。本稿では,シミュレーション効率と高速な実験後テスト性を備えたニューラルシミュレータに基づく推論アルゴリズムを提案する。提案手法は, 関節後部ではなく低次元縁後部を同時に推定し, インジケータ関数によって適切に切り替わる前の観察を目的としたシミュレーションを提案する。さらに, 局所的償却後を推定することにより, 推定結果のロバスト性に関する効率的な実証実験が可能となる。このようなテストは、実世界のアプリケーションにおける正当性チェックの推論において重要である。シミュレーションベース推論ベンチマークのマージン化版と,2つの複雑で狭い後方部について実験を行い,本アルゴリズムのシミュレーター効率と推定された後方値の品質について検討した。 github上の実装。

Parametric stochastic simulators are ubiquitous in science, often featuring high-dimensional input parameters and/or an intractable likelihood. Performing Bayesian parameter inference in this context can be challenging. We present a neural simulator-based inference algorithm which simultaneously offers simulation efficiency and fast empirical posterior testability, which is unique among modern algorithms. Our approach is simulation efficient by simultaneously estimating low-dimensional marginal posteriors instead of the joint posterior and by proposing simulations targeted to an observation of interest via a prior suitably truncated by an indicator function. Furthermore, by estimating a locally amortized posterior our algorithm enables efficient empirical tests of the robustness of the inference results. Such tests are important for sanity-checking inference in real-world applications, which do not feature a known ground truth. We perform experiments on a marginalized version of the simulation-based inference benchmark and two complex and narrow posteriors, highlighting the simulator efficiency of our algorithm as well as the quality of the estimated marginal posteriors. Implementation on GitHub.

翻訳日:2021-07-06 15:09:17 公開日:2021-07-02

# Visual Time Series Forecasting: イメージ駆動型アプローチ

Visual Time Series Forecasting: An Image-driven Approach ( http://arxiv.org/abs/2107.01273v1 )

ライセンス: Link先を確認

Naftali Cohen, Srijan Sood, Zhen Zeng, Tucker Balch, Manuela Veloso

(参考訳) 本研究では,時系列予測をコンピュータビジョンタスクとして扱う。入力データを画像としてキャプチャし,モデルをトレーニングして次の画像を生成する。このアプローチは、ポイントワイズ値とは対照的に分布を予測する。提案手法のロバスト性と品質を評価するため,様々なデータセットと複数の評価指標について検討する。実験の結果, 予測ツールは循環データには有効であるが, 株価などの不規則データには若干少ないことがわかった。重要な点は、画像に基づく評価メトリクスを使用する場合、arimaを含むさまざまなベースラインと、ディープラーニングアプローチの数値的変化を比較できる方法を見つけることです。

In this work, we address time-series forecasting as a computer vision task. We capture input data as an image and train a model to produce the subsequent image. This approach results in predicting distributions as opposed to pointwise values. To assess the robustness and quality of our approach, we examine various datasets and multiple evaluation metrics. Our experiments show that our forecasting tool is effective for cyclic data but somewhat less for irregular data such as stock prices. Importantly, when using image-based evaluation metrics, we find our method to outperform various baselines, including ARIMA, and a numerical variation of our deep learning approach.

翻訳日:2021-07-06 15:07:17 公開日:2021-07-02

# バリューファンクションギャップを超えて: エピソード強化学習のためのインスタンス依存レグレスト境界の改善

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning ( http://arxiv.org/abs/2107.01264v1 )

ライセンス: Link先を確認

Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

(参考訳) 有限エピソディックマルコフ決定過程における強化学習のためのギャップ依存的後悔境界の改善を提案する。以前の仕事と比較して、私たちの境界はギャップの代替定義に依存する。これらの定義は、好意的な後悔を達成するために、アルゴリズムが最適なポリシーによって達成されない状態において最適に振る舞う方法を学習する必要がないという洞察に基づいている。楽観的なアルゴリズムでは,より強い後悔境界を証明し,多数のMDPに対して新たな情報理論的下限を伴う。楽観的アルゴリズムは, 決定論的 MDP においても, 独特な最適政策がない限り, 情報理論の下限を達成できないことを示す。

We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs. Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.

翻訳日:2021-07-06 14:56:21 公開日:2021-07-02

# 力学系のための物理誘導深層学習:調査

Physics-Guided Deep Learning for Dynamical Systems: A survey ( http://arxiv.org/abs/2107.01272v1 )

ライセンス: Link先を確認

Rui Wang

(参考訳) 複雑な物理力学のモデリングは科学と工学の基本的な課題である。従来の物理モデルは解釈可能であるが、厳密な仮定に依存している。直接数値近似は通常計算集約であり、かなりの計算資源と専門知識を必要とする。ディープラーニング(DL)は、複雑なパターンを効率的に認識し、非線形力学をエミュレートするための新しい代替手段を提供するが、必ずしも物理系の規則に従わないし、異なるシステムにまたがってうまく一般化しない。このように、物理誘導型DLの研究が登場し、大きな進歩を遂げた。物理学に基づくモデリングと最先端のDLモデルの両方から、科学的な問題を解決することを目指している。本稿では,従来の物理知識や物理に基づくモデリングをDLに統合する手法について概説し,新たな可能性について論じる。

Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are interpretable but rely on rigid assumptions, and direct numerical approximation is usually computationally intensive, requiring significant computational resources and expertise. While deep learning (DL) provides novel alternatives for efficiently recognizing complex patterns and emulating nonlinear dynamics, DL models do not necessarily obey the governing laws of physical systems, nor do they generalize well across different systems. Thus, the study of physics-guided DL has emerged and made great progress. It aims to take the best from both physics-based modeling and state-of-the-art DL models to better solve scientific problems. In this paper, we provide a structured overview of existing methodologies for integrating prior physical knowledge or physics-based modeling into DL and discuss the emerging opportunities.

翻訳日:2021-07-06 14:56:10 公開日:2021-07-02

# 過パラメータ線形ネットワークによるオートエンコーダにおける暗黙の欲欲ランク学習

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks ( http://arxiv.org/abs/2107.01301v1 )

ライセンス: Link先を確認

Shih-Yu Sun, Vimal Thilak, Etai Littwin, Omid Saremi, Joshua M. Susskind

(参考訳) 勾配降下で訓練された深い線形ネットワークは、行列因子分解でよく研究されるように、低いランクの解を与える。本稿では,さらに一歩進めて,オートエンコーダにおける暗黙のランク正規化を解析する。オートエンコーダボトルネックにおける線形サブネットワークによって誘導される低ランク潜伏符号の欲望学習を示す。さらに,スペクトル先行および線形深度に対するトレーニング力学の感度を緩和するために,直交初期化と学習率調整法を提案する。合成データ上の線形オートエンコーダでは,本手法は定常的に基底潜在コードランクに収束する。非線形オートエンコーダでは,下流分類と画像サンプリングに最適な潜在ランクに収束する。

Deep linear networks trained with gradient descent yield low rank solutions, as is typically studied in matrix factorization. In this paper, we take a step further and analyze implicit rank regularization in autoencoders. We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck. We further propose orthogonal initialization and principled learning rate adjustment to mitigate sensitivity of training dynamics to spectral prior and linear depth. With linear autoencoders on synthetic data, our method converges stably to ground-truth latent code rank. With nonlinear autoencoders, our method converges to latent ranks optimal for downstream classification and image sampling.

翻訳日:2021-07-06 14:55:56 公開日:2021-07-02

# ヒューマノイドロボットの先天的遠隔操作

Prescient teleoperation of humanoid robots ( http://arxiv.org/abs/2107.01281v1 )

ライセンス: Link先を確認

Luigi Penco, Jean-Baptiste Mouret, Serena Ivaldi

(参考訳) 人間型ロボットは、操作者に視覚フィードバックを送りながら、ウェアラブルモーションキャプチャ装置を装備した操作者の動きを遠隔地で再現することができる。人間の動作(再ターゲット)をヒューマノイドロボットに転送する大きな進歩があったが、そのようなシステムが実際の応用に配備されるのを防ぐ主要な問題は、人間の入力とロボットからのフィードバックの間の通信遅延の存在である。これらの遅延を克服するために、ヒューマノイドロボットが実際にコマンドを受け取る前にコマンドを実行するシステムを導入し、視覚フィードバックがオペレータに同期するようにし、ロボットは過去にコマンドを実行した。そのためロボットは、過去の軌道で訓練され、最後に受信したコマンドで条件付けられた機械学習モデルをクエリすることで、将来のコマンドを継続的に予測する。私たちの実験では、オペレーターがヒューマノイドロボット(32度自由度)を、複数の身体操作タスクで最大2秒の確率的遅延で制御することに成功しました。

Humanoid robots could be versatile and intuitive human avatars that operate remotely in inaccessible places: the robot could reproduce in the remote location the movements of an operator equipped with a wearable motion capture device while sending visual feedback to the operator. While substantial progress has been made on transferring ("retargeting") human motions to humanoid robots, a major problem preventing the deployment of such systems in real applications is the presence of communication delays between the human input and the feedback from the robot: even a few hundred milliseconds of delay can irreversibly disturb the operator, let alone a few seconds. To overcome these delays, we introduce a system in which a humanoid robot executes commands before it actually receives them, so that the visual feedback appears to be synchronized to the operator, whereas the robot executed the commands in the past. To do so, the robot continuously predicts future commands by querying a machine learning model that is trained on past trajectories and conditioned on the last received commands. In our experiments, an operator was able to successfully control a humanoid robot (32 degrees of freedom) with stochastic delays up to 2 seconds in several whole-body manipulation tasks, including reaching different targets, picking up, and placing a box at distinct locations.

翻訳日:2021-07-06 14:46:31 公開日:2021-07-02

# 最適トランスポートを用いた機能コネクトーム間のデータ駆動マッピング

Data-driven mapping between functional connectomes using optimal transport ( http://arxiv.org/abs/2107.01303v1 )

ライセンス: Link先を確認

Javid Dadashkarimi and Amin Karbasi and Dustin Scheinost

(参考訳) 機能的磁気共鳴イメージングに由来する機能的コネクトームは、長い間脳の機能的構造を理解するために用いられてきた。それにもかかわらず、コネクトームは本質的にアトラスと結びついている。言い換えれば、あるアトラスから生成されたコネクトームは、別のアトラスから生成されたコネクトームと比べてスケールと解像度が異なる。コネクトームと導出結果を、追加の事前処理なしで異なるアトラス間でマッピングできることは、異なるアトラスを使用する研究間の解釈と一般化を改善する重要なステップである。ここでは、2つのアトラス間の最適マッピングを見つけるために、強力な数学的手法である最適輸送を用いる。このマッピングはコネクトームを再構築するために、あるアトラスから別のアトラスへの時系列変換に使用される。我々は、変換コネクトームと「金標準」コネクトーム(すなわち、アトラスから直接生成されたコネクトーム)を比較し、これらのコネクトームを異なるアトラスに基づく予測モデルに適用することにより、変換コネクトームの有用性を示す。これらのトランスフォーメーションコネクトームは,「金標準」コネクトームと著しく類似しており,脳行動関連における個人差を維持しており,本手法の有効性と下流解析における有用性を示している。全体として、我々のアプローチはコネクトームに基づく様々なアトラスの一般化を促進するための有望な道である。

Functional connectomes derived from functional magnetic resonance imaging have long been used to understand the functional organization of the brain. Nevertheless, a connectome is intrinsically linked to the atlas used to create it. In other words, a connectome generated from one atlas is different in scale and resolution compared to a connectome generated from another atlas. Being able to map connectomes and derived results between different atlases without additional pre-processing is a crucial step in improving interpretation and generalization between studies that use different atlases. Here, we use optimal transport, a powerful mathematical technique, to find an optimum mapping between two atlases. This mapping is then used to transform time series from one atlas to another in order to reconstruct a connectome. We validate our approach by comparing transformed connectomes against their "gold-standard" counterparts (i.e., connectomes generated directly from an atlas) and demonstrate the utility of transformed connectomes by applying these connectomes to predictive models based on a different atlas. We show that these transformed connectomes are significantly similar to their "gold-standard" counterparts and maintain individual differences in brain-behavior associations, demonstrating both the validity of our approach and its utility in downstream analyses. Overall, our approach is a promising avenue to increase the generalization of connectome-based results across different atlases.

翻訳日:2021-07-06 14:46:09 公開日:2021-07-02

# エンドツーエンド音声認識のための二重因果・非因果自己認識

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition ( http://arxiv.org/abs/2107.01269v1 )

ライセンス: Link先を確認

Niko Moritz, Takaaki Hori, Jonathan Le Roux

(参考訳) 注意に基づくエンドツーエンド自動音声認識(ASR)システムは、最近、多くのタスクに対する最先端の結果を実証している。しかし、セルフアテンションと注意に基づくエンコーダ・デコーダモデルの適用は、各単語が話された直後に認識されなければならないストリーミングasrでは依然として困難である。本稿では,dcn(d-causal self-attention)アーキテクチャについて述べる。このアーキテクチャは,制限された自己完結とは対照的に,ディープアーキテクチャで使用される場合の単一レイヤのルック・アヘッドを超えて,全体的なコンテキストが成長することを妨げている。 dcnは、ストリーミングトランスフォーマーとコンフォーメータアーキテクチャを用いたチャンクベースおよび制限付きセルフアテンションと比較され、チャンクベースの自己アテンションに比べて制限付き自己アテンションおよび競合型asr結果よりもasr性能が向上し、フレーム同期処理の利点を提供する。提案されたストリーミング・ツー・エンドのASRシステムは、注意を喚起し、LibriSpeech、HKUST、Switchboard ASRタスクの最先端の結果を得た。

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks. However, the application of self-attention and attention-based encoder-decoder models remains challenging for streaming ASR, where each word must be recognized shortly after it was spoken. In this work, we present the dual causal/non-causal self-attention (DCN) architecture, which, in contrast to restricted self-attention, prevents the overall context from growing beyond the look-ahead of a single layer when used in a deep architecture. DCN is compared to chunk-based and restricted self-attention using streaming transformer and conformer architectures, showing improved ASR performance over restricted self-attention and competitive ASR results compared to chunk-based self-attention, while providing the advantage of frame-synchronous processing. Combined with triggered attention, the proposed streaming end-to-end ASR systems obtained state-of-the-art results on the LibriSpeech, HKUST, and Switchboard ASR tasks.
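The context growth that the abstract attributes to restricted self-attention can be made concrete with a small sketch. It does not implement DCN itself, whose details are not given in the abstract. It only shows, under the assumption of a fixed per-layer look-ahead, how stacking layers accumulates future context. All function names here are hypothetical.

```python
def restricted_mask(frames, look_ahead):
    """mask[t][s] is True when output frame t may attend to input frame s."""
    return [[s <= t + look_ahead for s in range(frames)] for t in range(frames)]


def compose(outer, inner):
    """Dependencies of stacked layers: t depends on s if t attends to some u that depends on s."""
    n = len(outer)
    return [
        [any(outer[t][u] and inner[u][s] for u in range(n)) for s in range(n)]
        for t in range(n)
    ]


def effective_look_ahead(mask, t):
    """Number of future frames that frame t depends on."""
    return max(s for s, allowed in enumerate(mask[t]) if allowed) - t


layer = restricted_mask(frames=10, look_ahead=1)
stack = layer
for _ in range(2):  # compose two more layers, three in total
    stack = compose(layer, stack)
```

With a look-ahead of one frame per layer, `effective_look_ahead(layer, 0)` is 1 while `effective_look_ahead(stack, 0)` is 3, so the delay scales with depth. According to the abstract, DCN is designed to keep the overall look-ahead at that of a single layer.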

翻訳日:2021-07-06 14:41:08 公開日:2021-07-02

# (参考訳) o2d2: オーサシップ検証における決定不能な試行をキャプチャする分散検出装置

O2D2: Out-Of-Distribution Detector to Capture Undecidable Trials in Authorship Verification ( http://arxiv.org/abs/2106.15825v2 )

ライセンス: CC BY 4.0

Benedikt Boenninghoff, Robert M. Nickel, Dorothea Kolossa

(参考訳) pan 2021 authorship verification (av) challengeは、クロストピック/クローズドセットavタスクからクロストピック/オープンセットavタスクへ、ファンフィクションテキストのコレクションに移行した3年間の戦略の一部である。本稿では,2021年の課題に対処するために設計された,新しいハイブリッド型ニューラル確率的フレームワークを提案する。提案方式は,2020年度の入賞提案に基づいて,話題の変化に対する感性を大幅に低減し,不確実性対応層を用いてシステムの校正をさらに改善する更新を行った。当社のフレームワークには、非応答を定義するためのout-of-distribution detector(o2d2)も含まれています。提案システムは、PAN 2021 AVタスクに参加した他のシステムよりも優れていた。

The PAN 2021 authorship verification (AV) challenge is part of a three-year strategy, moving from a cross-topic/closed-set AV task to a cross-topic/open-set AV task over a collection of fanfiction texts. In this work, we present a novel hybrid neural-probabilistic framework that is designed to tackle the challenges of the 2021 task. Our system is based on our 2020 winning submission, with updates to significantly reduce sensitivities to topical variations and to further improve the system's calibration by means of an uncertainty-adaptation layer. Our framework additionally includes an out-of-distribution detector (O2D2) for defining non-responses. Our proposed system outperformed all other systems that participated in the PAN 2021 AV task.

翻訳日:2021-07-06 07:20:10 公開日:2021-07-02

# (参考訳) ニューラルボコーダによる話者検証のための対向サンプルの抽出

Spotting adversarial samples for speaker verification by neural vocoders ( http://arxiv.org/abs/2107.00309v2 )

ライセンス: CC0 1.0

Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-yi Lee

(参考訳) 生体認証の最も重要な技術の一つである自動話者認証(ASV)は、トランザクション認証やアクセス制御を含むセキュリティクリティカルなアプリケーションで広く採用されている。しかし、以前の研究では、ASVは最近出現した敵攻撃に対して深刻な脆弱性があることが示されている。本稿では,ASVの対立サンプルを見つけるために,ニューラルボコーダを用いる。ニューラルボコーダを用いてオーディオを再合成し、元のオーディオと再合成オーディオのASVスコアの違いが真と逆のサンプルの識別に良い指標であることを示す。この取り組みは、私たちの知る限り、ASVの敵対的サンプルを検出するための技術的方向性を最初に追求するものであり、そのため、比較のための確立された基準線が欠如している。その結果,検出基準としてGriffin-Limアルゴリズムを実装した。提案手法は,すべての設定において,すべてのベースラインを上回る効果的な検出性能を実現する。また,検出フレームワークで採用されているニューラルボコーダはデータセットに依存しないことを示す。私たちのコードは、将来的な比較作業のためにオープンソースにされます。

Automatic speaker verification (ASV), one of the most important technologies for biometric identification, has been widely adopted in security-critical applications, including transaction authentication and access control. However, previous work has shown that ASV is seriously vulnerable to recently emerged adversarial attacks, yet effective countermeasures against them are limited. In this paper, we adopt neural vocoders to spot adversarial samples for ASV. We use the neural vocoder to re-synthesize audio and find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples. This effort is, to the best of our knowledge, among the first to pursue such a technical direction for detecting adversarial samples for ASV, and hence there is a lack of established baselines for comparison. Consequently, we implement the Griffin-Lim algorithm as the detection baseline. The proposed approach achieves effective detection performance that outperforms all the baselines in all the settings. We also show that the neural vocoder adopted in the detection framework is dataset-independent. Our code will be made open source to support comparison in future work.
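The detection idea in the abstract above, comparing ASV scores before and after re-synthesis, can be sketched as follows. This is only an illustration: `asv_score` and `resynthesize` are placeholders for a real ASV system and a real neural vocoder, the toy stand-ins below are invented for the example, and the threshold rule assumes that a larger score change indicates an adversarial sample.

```python
def detection_score(audio, asv_score, resynthesize):
    """Absolute change in the ASV score after re-synthesizing the audio."""
    return abs(asv_score(audio) - asv_score(resynthesize(audio)))


def is_adversarial(audio, asv_score, resynthesize, threshold):
    """Flag the sample when re-synthesis changes the ASV score by more than the threshold."""
    return detection_score(audio, asv_score, resynthesize) > threshold


# Toy stand-ins (hypothetical): the "vocoder" snaps samples to a 0.5 grid,
# which removes small perturbations, and the "ASV score" is the sample sum.
def toy_resynthesize(audio):
    return [round(x * 2) / 2 for x in audio]


def toy_asv_score(audio):
    return sum(audio)


genuine = [0.5, -1.0, 1.5]
adversarial = [x + 0.2 for x in genuine]  # small additive perturbation

genuine_flag = is_adversarial(genuine, toy_asv_score, toy_resynthesize, threshold=0.1)
adversarial_flag = is_adversarial(adversarial, toy_asv_score, toy_resynthesize, threshold=0.1)
```

In this toy setup the genuine sample is unchanged by re-synthesis, so its score difference is zero, while the perturbed sample loses its perturbation and its score shifts. A real threshold would have to be calibrated on held-out genuine data.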

翻訳日:2021-07-06 06:55:07 公開日:2021-07-02

# (参考訳) multilingual central repository: wordnetsを開発するためのクロスリンガルフレームワーク

Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets ( http://arxiv.org/abs/2107.00333v2 )

ライセンス: CC BY 4.0

Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau

(参考訳) 言語処理には言語リソースが必要ですが、その構築にはコストがかかり、さまざまな分野の研究が必要で、常に更新が必要です。本稿では,バスク語,カタルーニャ語,英語,ガリシア語,ポルトガル語,スペイン語,および以下のオントロジー(ベースコンセプト,トップオントロジー,WordNetドメイン,Suggested Upper Merged Ontology)を含む多言語知識基盤であるMCR(Multilingual Central Repository)の開発に使用されるクロスリンガルフレームワークについて述べる。我々は、MCR、2017年の状態、および開発ツールについて紹介する。

Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas, and needs constant updating. In this paper, we describe the cross-lingual framework used for developing the Multilingual Central Repository (MCR), a multilingual knowledge base that includes wordnets of Basque, Catalan, English, Galician, Portuguese, and Spanish and the following ontologies: Base Concepts, Top Ontology, WordNet Domains, and Suggested Upper Merged Ontology. We present the story of MCR, its state in 2017, and the developed tools.

翻訳日:2021-07-06 06:18:11 公開日:2021-07-02

# (参考訳) ボトルネック付き無限広ニューラルネットワークにおける暗黙の加速と特徴学習

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks ( http://arxiv.org/abs/2107.00364v2 )

ライセンス: CC0 1.0

Etai Littwin, Omid Saremi, Shuangfei Zhai, Vimal Thilak, Hanlin Goh, Joshua M. Susskind, Greg Yang

(参考訳) 有限サイズのボトルネックを用いて無限大ニューラルネットワークの学習ダイナミクスを分析する。ニューラルネットワークカーネルの限界とは異なり、無限幅ネットワークにおけるボトルネックは、ボトルネック表現におけるデータ依存的特徴学習を遅くする。無限ネットワークにおける単一ボトルネックは、純粋に無限ネットワークと比較してトレーニングを劇的に加速し、全体的なパフォーマンスが向上することを示す。ボトルネックの加速度効果を理論的に理解できる無限大のディープリニアモデルと類似性を引き出すことで加速度現象を考察する。

We analyze the learning dynamics of infinitely wide neural networks with a finite-sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite-width network allows data-dependent feature learning in its bottleneck representation. We empirically show that a single bottleneck in infinite networks dramatically accelerates training when compared to purely infinite networks, with an improved overall performance. We discuss the acceleration phenomena by drawing similarities to infinitely wide deep linear models, where the acceleration effect of a bottleneck can be understood theoretically.

翻訳日:2021-07-06 06:07:37 公開日:2021-07-02

# (参考訳) CBNetV2:オブジェクト検出のための複合バックボーンネットワークアーキテクチャ

CBNetV2: A Composite Backbone Network Architecture for Object Detection ( http://arxiv.org/abs/2107.00420v2 )

ライセンス: CC0 1.0

Tingting Liang, Xiaojie Chu, Yudong Liu, Yongtao Wang, Zhi Tang, Wei Chu, Jingdong Chen, Haibing Ling

(参考訳) 現代のトップパフォーマンスオブジェクト検出器はバックボーンネットワークに大きく依存しており、その進歩はより効率的なネットワーク構造を探索することで一貫した性能向上をもたらす。しかし、新しいバックボーンを設計してimagenetで事前トレーニングするには大量の計算リソースが必要となり、より良い検出性能を得るのにコストがかかる。本稿では,既存のオープンソースの学習済みバックボーンの構成を組み込んだ新しいバックボーンネットワークCBNetV2を提案する。特にCBNetV2アーキテクチャは、複合接続を介して接続される複数の同一のバックボーンをグループ化する。また、CBNetベースの検出器のためのAssistant Supervisionによるより良いトレーニング戦略を提案する。 CBNetV2は追加の事前訓練がなければ、1段と2段の検出器を含むメインストリームの検出器とアンカーベースとアンカーフリーベースの検出器に組み込むことができ、COCOのベースライン上での性能は3.0%以上向上する。また、複合バックボーンは、手動ベースやNASベース、CNNベースやTransformerベースなど、トレーニング済みのより広いネットワークよりも効率的でリソースフレンドリであることを示す強力な証拠を提供する。特に、シングルモデルとシングルスケールのテストでは、HTC Dual-Swin-Bが58.6%のボックスAPと51.1%のマスクAPをCOCOテストデブで達成しています。これは最先端の結果(57.7%のボックスAPと50.2%のマスクAP)よりもはるかに優れています。

Modern top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through exploring more effective network structures. However, designing or searching for a new backbone and pre-training it on ImageNet may require a large number of computational resources, making it costly to obtain better detection performance. In this paper, we propose a novel backbone network, namely CBNetV2, by constructing compositions of existing open-sourced pre-trained backbones. In particular, CBNetV2 architecture groups multiple identical backbones, which are connected through composite connections. We also propose a better training strategy with the Assistant Supervision for CBNet-based detectors. Without additional pre-training, CBNetV2 can be integrated into mainstream detectors, including one-stage and two-stage detectors, as well as anchor-based and anchor-free-based ones, and significantly improve their performance by more than 3.0% AP over the baseline on COCO. Also, experiments provide strong evidence showing that composite backbones are more efficient and resource-friendly than pre-trained wider and deeper networks, including manual-based and NAS-based, as well as CNN-based and Transformer-based ones. Particularly, with single-model and single-scale testing, our HTC Dual-Swin-B achieves 58.6% box AP and 51.1% mask AP on COCO test-dev, which is significantly better than the state-of-the-art result (i.e., 57.7% box AP and 50.2% mask AP) achieved by a stronger baseline HTC++ with a larger backbone Swin-L. Code will be released at https://github.com/VDIGPKU/CBNetV2.

翻訳日:2021-07-06 05:46:21 公開日:2021-07-02

# (参考訳) 因果的神経結合:表現力、学習力、推論

The Causal Neural Connection: Expressiveness, Learnability, and Inference ( http://arxiv.org/abs/2107.00793v1 )

ライセンス: CC BY 4.0

Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim

(参考訳) 因果推論の中心的な要素の1つは構造因果モデル (Structure causal model, SCM) と呼ばれる対象であり、これは調査中のシステムのランダムな変動のメカニズムと外因性源の集合を表す(Pearl, 2000)。多くの種類のニューラルネットワークの重要な性質は、任意の関数を任意の精度で近似できる普遍近似性である。この性質から、ニューラルネットワークの集合がそのSCMによって生成されたデータに基づいてトレーニングすることで、任意のSCMを学習できると推測する誘惑があるかもしれない。本稿では,表現性や学習可能性の概念を否定することで,この現象は当てはまらないことを示す。具体的には、因果階層定理(Thm)を示す。データから学べるものの限界を記述するBareinboim et al., 2020)は、依然としてニューラルモデルに当てはまる。例えば、任意に複雑で表現力のあるニューラルネットは、観測データのみによる介入の効果を予測できない。この結果から,ニューラル因果モデル(NCM)と呼ばれる特殊なSCMを導入し,因果推論に必要な構造的制約をエンコードする新しいタイプの帰納バイアスを定式化する。この新たなモデルに基づいて、因果同定と推定として知られる文献に見られる2つの正準タスクの解決に焦点をあてる。ニューラルツールボックスを活用することで、データから因果効果を学習できるかどうかを判断するために必要なアルゴリズム(すなわち因果識別可能性)を開発し、識別性が保持されるたびにその効果を推定する(因果推定)。シミュレーションは提案手法を裏付ける。

One of the central elements of any causal inference is an object called a structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.

翻訳日:2021-07-06 02:49:28 公開日:2021-07-02

# (参考訳) 説明可能なk-mediansとk-meansの近似最適アルゴリズム

Near-optimal Algorithms for Explainable k-Medians and k-Means ( http://arxiv.org/abs/2107.00798v1 )

ライセンス: CC BY 4.0

Konstantin Makarychev, Liren Shan

(参考訳) 我々は,dasgupta,frost,moshkovitz,rashtchian~(icml 2020)が導入した説明可能な$k$-mediansと$k$-meansの問題を考える。この問題では、データを$k$クラスタに分割し、$k$-mediansや$k$-meansの目的を最小化する、‘emph{threshold decision tree’を見つけることが目的です。閾値木のすべての決定ノードは、1つの特徴に基づいてデータを2つのグループに分割するため、得られたクラスタリングは容易に解釈できる。我々は、$\tilde o(\log k)$が$k$-medians、$\ell_1$ norm、$\tilde o(k)$が$k$-meansと競合する問題に対する新しいアルゴリズムを提案する。これは Dasgupta et al (2020) による$O(k)$ と $O(k^2)$ の以前の保証よりも改善されている。また、$O(\log^{3/2} k)$$k$-medians with $\ell_2$ normに対して競合する新しいアルゴリズムも提供する。 dasgupta et al (2020) は$k$-medians に対して$\omega(\log k)$という下限を示し、本研究では$k$-means に対して$\tilde\omega(k)$という下限を証明した。また、$\ell_2$ normを持つ$k$-mediansに対して$\Omega(\log k)$の低い境界も提供する。

We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde O(\log k)$ competitive with $k$-medians with $\ell_1$ norm and $\tilde O(k)$ competitive with $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde\Omega(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with $\ell_2$ norm.
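To make the object in the abstract above concrete, the sketch below shows how a threshold decision tree induces a clustering and how its $k$-means cost can be evaluated. It does not implement the paper's algorithm for finding the tree. The tuple encoding of the tree and the names `assign` and `kmeans_cost` are hypothetical choices made for this example.

```python
def assign(tree, point):
    """Walk a threshold tree down to a leaf.

    Internal nodes are (feature, threshold, left, right); leaves are integer cluster ids.
    """
    node = tree
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if point[feature] <= threshold else right
    return node


def kmeans_cost(tree, points):
    """Sum of squared distances from each point to the mean of its tree-induced cluster."""
    clusters = {}
    for p in points:
        clusters.setdefault(assign(tree, p), []).append(p)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - center[d]) ** 2 for p in members for d in range(dim))
    return cost


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 5.0)]
# Root splits on feature 0 at 2.5; its right child splits on feature 1 at 3.0.
tree = (0, 2.5, 0, (1, 3.0, 1, 2))
labels = [assign(tree, p) for p in points]
cost = kmeans_cost(tree, points)
```

Each cluster here is described by at most two single-feature comparisons, which is what makes the clustering explainable. The competitive ratios in the abstract compare this kind of cost against the cost of an unconstrained optimal clustering.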

翻訳日:2021-07-06 02:48:07 公開日:2021-07-02

# (参考訳) 彼は医師より良いことを知っている: BERTは実用性に影響を及ぼす

He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics ( http://arxiv.org/abs/2107.00807v1 )

ライセンス: CC BY 4.0

Nanjiang Jiang and Marie-Catherine de Marneffe

(参考訳) 既存のいくつかの英語データセットにおいて,BERTが実存性を予測し,様々な言語構造を包含する方法について検討する。 BERTは、ほとんどのデータセットで強力なパフォーマンスを得るが、特定の事実ラベルと相関する一般的な表面パターンを利用することで、実用的推論が必要なインスタンスではフェールする。ハイパフォーマンスが示唆するものとは対照的に、事実性予測のための堅牢なシステムには程遠いのです。

We investigate how well BERT performs on predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains a strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high performance suggests, we are still far from having a robust system for factuality prediction.

翻訳日:2021-07-06 02:19:28 公開日:2021-07-02

# (参考訳) 細胞平均に基づく双曲型および放物型偏微分方程式のニューラルネットワーク法

Cell-average based neural network method for hyperbolic and parabolic partial differential equations ( http://arxiv.org/abs/2107.00813v1 )

ライセンス: CC BY 4.0

Changxin Qiu, Jue Yan

(参考訳) 有限体積スキームをベースとしたセル平均ニューラルネットワーク手法を提案する。この方法は偏微分方程式の積分あるいは弱定式化に基づいている。単純なフィードフォワードネットワークは、2つの隣り合う時間ステップ間のソリューション平均進化を学ぶことを余儀なくされる。ニューラルネットワーク法のような1つの有限体積を一意に識別する最適ネットワークパラメータセットを得るために、オフライン教師付きトレーニングを行う。トレーニングがうまく行えば、ネットワーク手法は有限体積スキームとして実装され、メッシュ依存となる。従来の数値法とは異なり,提案手法は明示的なスキーム CFL 制約から緩和することができ,解の進化のために任意の時間ステップサイズに適応することができる。熱方程式では、第1次収束が観測され、誤差はメッシュサイズと関連しているが、メッシュサイズとは独立に観測される。セル平均ベースのニューラルネットワーク手法は、ほぼゼロの数値拡散で接触不連続性を鋭く発展させることができる。衝撃波と希薄波は非線形双曲保存法則のためによく捕獲される。

Motivated by the finite volume scheme, a cell-average based neural network method is proposed. The method is based on the integral or weak formulation of partial differential equations. A simple feedforward network is forced to learn the evolution of the solution average between two neighboring time steps. Offline supervised training is carried out to obtain the optimal network parameter set, which uniquely identifies one finite-volume-like neural network method. Once well trained, the network method is implemented as a finite volume scheme and is thus mesh dependent. Unlike traditional numerical methods, our method can be relieved from the CFL restriction of explicit schemes and can adapt to any time step size for solution evolution. For the heat equation, first-order convergence is observed; the errors are related to the spatial mesh size but are observed to be independent of the time step size. The cell-average based neural network method can sharply evolve contact discontinuities with almost zero numerical diffusion introduced. Shock and rarefaction waves are well captured for nonlinear hyperbolic conservation laws.
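The abstract above describes a method whose structure is that of a finite volume scheme: a local rule maps a stencil of cell averages at one time step to a cell average at the next. The sketch below shows only that structure, with a classical explicit heat-equation update standing in for the trained network. The periodic boundary, the stencil radius, and the names `step` and `heat_stencil` are assumptions made for the example.

```python
def step(averages, update, radius=1):
    """Advance periodic cell averages by one time step using a local update rule.

    update: callable taking the 2 * radius + 1 neighboring averages and returning
            the new average of the center cell. In the method described above,
            a trained feedforward network would play this role.
    """
    n = len(averages)
    return [
        update([averages[(j + k) % n] for k in range(-radius, radius + 1)])
        for j in range(n)
    ]


def heat_stencil(stencil, ratio=0.25):
    """Classical explicit update for the heat equation; ratio stands for dt / dx**2."""
    left, center, right = stencil
    return center + ratio * (left - 2.0 * center + right)


averages = [0.0, 0.0, 1.0, 0.0, 0.0]
next_averages = step(averages, heat_stencil)
```

The classical stand-in is only stable for `ratio` up to 0.5, which is the kind of explicit-scheme restriction the abstract says the learned update can be relieved from. Replacing `heat_stencil` with a learned function leaves `step` unchanged.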

翻訳日:2021-07-06 01:57:59 公開日:2021-07-02

# (参考訳) 機械学習再現性に関する経験報告:実践者およびTensorFlow Model Gardenコントリビュータへのガイダンス

An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors ( http://arxiv.org/abs/2107.00821v1 )

ライセンス: CC BY-SA 4.0

Vishnu Banna and Akhil Chinnakotla and Zhengxin Yan and Ani Vegesana and Naveen Vivek and Kruthi Krishnappa and Wenxin Jiang and Yung-Hsiang Lu and George K. Thiruvathukal and James C. Davis

(参考訳) 機械学習技術は、科学と工学の進歩の基本的なツールになりつつある。これらの手法は天文学やスパムフィルタリングと同じくらい多様な文脈で適用されている。しかし、これらの手法を正しく適用するには、注意深い工学が必要である。研究ベースの機械学習技術を実用的なものにするために必要なソフトウェアエンジニアリングプロセスには比較的注意が払われていない。テクノロジ企業はTensorFlowやPyTorchといった機械学習フレームワークを通じてエンジニアリングコミュニティを支援してきたが、これらのフレームワークで複雑な機械学習モデルを設計する方法の詳細は隠されている。エンジニアリングコミュニティ内でのベストプラクティスを促進するため、学術機関とGoogleは、TensorFlow Model Garden(TFMG)などのコミュニティロケーションで著名な機械学習モデルの模範的な実装を開発することを目的とした、機械学習モデルに関する特別研究グループ(SIGMODELS)の立ち上げに協力した。本報告の目的は、TFMGに含まれるのに適した品質で最先端の機械学習モデルを再現するプロセスを定義することである。論文分析からモデルリリースまで,各ステップについて詳細なエンジニアリングプロセスを定義します。我々は26人の学生からなるチームでYOLOモデルファミリの実装経験を報告し、開発したツールを共有し、その過程で学んだ教訓を説明する。

Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.

翻訳日:2021-07-06 01:37:35 公開日:2021-07-02

# (参考訳) 線形4次平均場ゲーム学習のための探索ノイズ

Exploration noise for learning linear-quadratic mean field games ( http://arxiv.org/abs/2107.00839v1 )

ライセンス: CC BY 4.0

François Delarue and Athanasios Vasileiadis

(参考訳) 本研究の目的は, 平均フィールドゲームの解法を学ぶための探索ノイズとして, 共通雑音が有効であることを示すことである。この概念は、一般的な雑音の適切な形が、存在と特異性を復元することがすでに証明されている、おもちゃの線形四角形モデルによって実証されている。ここではさらに一歩進んで、同じ種類の共通雑音が「架空の遊び」と呼ばれる学習アルゴリズムの収束を招きかねないことを証明し、これはさらなるポテンシャルや単調な構造を伴わない。理論解析を支えるためにいくつかの数値例が提供されている。

The goal of this paper is to demonstrate that common noise may serve as an exploration noise for learning the solution of a mean field game. This concept is here exemplified through a toy linear-quadratic model, for which a suitable form of common noise has already been proven to restore existence and uniqueness. We here go one step further and prove that the same form of common noise may force the convergence of the learning algorithm called `fictitious play', and this without any further potential or monotone structure. Several numerical examples are provided in order to support our theoretical analysis.

翻訳日:2021-07-06 01:36:41 公開日:2021-07-02

# (参考訳) 多次元スペクトルデータに対する深層学習に基づく統計ノイズ低減

Deep learning-based statistical noise reduction for multidimensional spectral data ( http://arxiv.org/abs/2107.00844v1 )

ライセンス: CC BY 4.0

Younsik Kim, Dongjin Oh, Soonsang Huh, Dongjoon Song, Sunbeom Jeong, Junyoung Kwon, Minsoo Kim, Donghan Kim, Hanyoung Ryu, Jongkeun Jung, Wonshik Kyung, Byungmin Sohn, Suyoung Lee, Jounghoon Hyun, Yeonghoon Lee, Yeongkwan Kim and Changyoung Kim

(参考訳) 分光実験では、多次元位相空間におけるデータ取得は、カバーすべき大きな位相空間体積のため、長い取得時間を必要とする可能性がある。このような場合、データ取得に利用可能な制限時間は、多次元スペクトルデータを取得する実験において深刻な制約となる。本稿では,角度分解光電子分光(arpes)を例として,ディープラーニングを知的手法として活用し,その制約を克服する手法を提案する。簡単に利用できるARPESデータとトレーニングデータセットをランダムに生成することで、過度に適合することなく、ニューラルネットワークの雑音化をトレーニングすることに成功しました。消音ニューラルネットワークは、本質的な情報を保存しながら、データ内のノイズを除去できる。ニューラルネットは,2桁の取得時間で取得したデータに対して,類似した2次導出および線形状解析を行うことができることを示す。本手法の重要性は,統計雑音の影響を受けやすい多次元スペクトルデータに適用可能であることにある。

In spectroscopic experiments, data acquisition in multi-dimensional phase space may require a long acquisition time, owing to the large phase space volume to be covered. In such cases, the limited time available for data acquisition can be a serious constraint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an example, we demonstrate a denoising method that utilizes deep learning as an intelligent way to overcome the constraint. With readily available ARPES data and random generation of the training data set, we successfully trained the denoising neural network without overfitting. The denoising neural network can remove the noise in the data while preserving its intrinsic information. We show that the denoising neural network allows us to perform a similar level of second-derivative and line shape analysis on data taken with two orders of magnitude less acquisition time. The importance of our method lies in its applicability to any multidimensional spectral data that are susceptible to statistical noise.

翻訳日:2021-07-06 01:35:29 公開日:2021-07-02

# (参考訳) モバイルアプリケーションにおけるK平均+強化学習に基づくユーザの役割発見と最適化手法

User Role Discovery and Optimization Method based on K-means + Reinforcement learning in Mobile Applications ( http://arxiv.org/abs/2107.00862v1 )

ライセンス: CC BY 4.0

Yuanbang Li

(参考訳) 携帯電話の普及により、ユーザーは自分の位置やアクティビティをいつでも、どこでも、データのチェックインとして共有できる。これらのデータはユーザーの特徴を反映している。長期的な安定と、ユーザ共有機能のセットは、ユーザロールとして抽象化できる。この役割はユーザの社会的背景、職業、生活習慣と密接に関連している。本研究の主な貢献は4つある。まず、各ユーザに対する異なるビューからのユーザ特徴モデルを、データのチェックの分析から構築する。次に、K Meansアルゴリズムを用いてユーザ機能からユーザロールを検出する。第3に,ユーザロールのクラスタリング効果を強化し,クラスタリング結果の安定性を向上させるため,強化学習アルゴリズムを提案する。最後に,本手法の有効性を検証する実験を行い,その有効性を示した。

With the widespread use of mobile phones, users can share their location and activity anytime and anywhere in the form of check-in data. These data reflect user features. A set of long-term, stable features shared by users can be abstracted as a user role. A role is closely related to the user's social background, occupation, and living habits. This study makes four main contributions. First, user feature models from different views are constructed for each user from the analysis of check-in data. Second, the K-means algorithm is used to discover user roles from user features. Third, a reinforcement learning algorithm is proposed to strengthen the clustering effect of user roles and improve the stability of the clustering result. Finally, experiments are used to verify the validity of the method, and the results show its effectiveness.

翻訳日:2021-07-06 01:23:19 公開日:2021-07-02
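
The K-means step named in the abstract above can be sketched as follows. This is a generic Lloyd-style K-means in plain Python, not the authors' pipeline: the two-dimensional feature vectors are made up for illustration, and the deterministic farthest-first initialisation is an assumption chosen so the sketch gives the same result on every run. The reinforcement learning refinement from the abstract is not reproduced here.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k):
    """Deterministic initialisation: start at points[0], then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iters=100):
    """Return (labels, centers) after Lloyd iterations."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical per-user features, e.g. (weekday check-ins, night check-ins).
    users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    roles, role_centers = kmeans(users, k=2)
    print(roles, role_centers)
```

Each resulting cluster plays the part of one candidate user role; how the features are built from check-in data is left open here because the abstract does not specify it.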

# (参考訳) 混合整数プログラムの原始的ヒューリスティックス学習

Learning Primal Heuristics for Mixed Integer Programs ( http://arxiv.org/abs/2107.00866v1 )

ライセンス: CC BY 4.0

Yunzhuang Shen, Yuan Sun, Andrew Eberhard, Xiaodong Li

(参考訳) 本稿では,機械学習技術を用いた混合整数プログラムのための新しい原始ヒューリスティックを提案する。混合整数プログラミングは組合せ最適化問題を定式化する一般的な手法である。解法の内部では、分岐境界アルゴリズム(B&B)の開始から双対性ギャップを狭め、B&B木を積極的に刈り取ることでその性能を大幅に向上させる、優れた実現可能な解を見つける上で、原始ヒューリスティックスが重要な役割を果たす。本稿では,機械学習を用いて有効原始ヒューリスティックスを自動学習できるかどうかを検討する。本稿では,最適化問題をグラフとして表現する新しい手法を提案し,既知の最適解を持つ解問題インスタンス上でグラフ畳み込みネットワークを訓練する。これにより、類似型の未解決問題インスタンスの最適解における決定変数の値を予測することができる。可変解の予測は,B&B法(PB-DFS)を用いた確率分岐法(Probabilistic Branching with guided Depth-first Search, PB-DFS)の新たな構成により,(ほぼ)最適解の探索を迅速に行う。実験の結果、この新しいヒューリスティックは、他の最先端の原始ヒューリスティックと比較して、解法プロセスのずっと早い段階でより優れた原始解を見出すことができた。

This paper proposes a novel primal heuristic for Mixed Integer Programs, by employing machine learning techniques. Mixed Integer Programming is a general technique for formulating combinatorial optimization problems. Inside a solver, primal heuristics play a critical role in finding good feasible solutions that enable one to tighten the duality gap from the outset of the Branch-and-Bound algorithm (B&B), greatly improving its performance by pruning the B&B tree aggressively. In this paper, we investigate whether effective primal heuristics can be automatically learned via machine learning. We propose a new method to represent an optimization problem as a graph, and train a Graph Convolutional Network on solved problem instances with known optimal solutions. This in turn can predict the values of decision variables in the optimal solution for an unseen problem instance of a similar type. The prediction of variable solutions is then leveraged by a novel configuration of the B&B method, Probabilistic Branching with guided Depth-first Search (PB-DFS) approach, aiming to find (near-)optimal solutions quickly. The experimental results show that this new heuristic can find better primal solutions at a much earlier stage of the solving process, compared to other state-of-the-art primal heuristics.

翻訳日:2021-07-06 01:09:40 公開日:2021-07-02

# (参考訳) 情報幾何の観点から見た依存ネットワークの再考

Reconsidering Dependency Networks from an Information Geometry Perspective ( http://arxiv.org/abs/2107.00871v1 )

ライセンス: CC BY-SA 4.0

Kazuya Takabatake, Shotaro Akaho

(参考訳) 依存ネットワーク(Heckerman et al., 2000)は、多数の変数を含むシステムに対する潜在的確率的グラフィカルモデルである。ベイズネットワークと同様に、依存ネットワークの構造は有向グラフで表現され、各ノードは条件付き確率テーブルを持つ。学習と推論は個々のノード上でローカルに実現されるため、多くの変数であっても計算は扱いやすいままである。しかし、依存ネットワークの学習分布は擬ギブスサンプリングと呼ばれるマルコフ連鎖の定常分布であり、閉形式表現を持たない。この技術的不利は依存ネットワークの開発を妨げている。本稿では,各ノードに対してある多様体を考える。そして、これらの多様体上の反復 m-射影として擬ギブスサンプリングを解釈することができる。この解釈は、擬ギブスサンプリングの定常分布が分布空間に存在する位置に関する理論的境界を与える。さらに、この解釈は最適化問題として構造およびパラメータ学習アルゴリズムを含む。さらに,ベイジアンネットワークと依存性を実験的に比較した。その結果,依存性ネットワークとベイズネットワークは,学習した分布の精度でほぼ同じ性能を示すことがわかった。その結果,依存性ネットワークはベイズネットワークよりもはるかに高速に学習できることがわかった。

Dependency networks (Heckerman et al., 2000) are potential probabilistic graphical models for systems comprising a large number of variables. Like Bayesian networks, the structure of a dependency network is represented by a directed graph, and each node has a conditional probability table. Learning and inference are realized locally on individual nodes; therefore, computation remains tractable even with a large number of variables. However, the dependency network's learned distribution is the stationary distribution of a Markov chain called pseudo-Gibbs sampling and has no closed-form expressions. This technical disadvantage has impeded the development of dependency networks. In this paper, we consider a certain manifold for each node. Then, we can interpret pseudo-Gibbs sampling as iterative m-projections onto these manifolds. This interpretation provides a theoretical bound for the location where the stationary distribution of pseudo-Gibbs sampling exists in distribution space. Furthermore, this interpretation involves structure and parameter learning algorithms as optimization problems. In addition, we compare dependency and Bayesian networks experimentally. The results demonstrate that the dependency network and the Bayesian network have roughly the same performance in terms of the accuracy of their learned distributions. The results also show that the dependency network can learn much faster than the Bayesian network.

翻訳日:2021-07-06 00:53:53 公開日:2021-07-02
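
To make the sampling procedure in the abstract above concrete, here is a minimal sketch of pseudo-Gibbs sampling for a two-node dependency network with binary variables. The conditional probability tables are invented for illustration; nothing forces them to be consistent with a single joint distribution, which is why the stationary distribution has to be estimated by running the chain. The function name `pseudo_gibbs` is an assumption, not taken from the paper.

```python
import random


def pseudo_gibbs(p_x0_given_x1, p_x1_given_x0, sweeps, seed=0):
    """Estimate the stationary distribution of a two-variable pseudo-Gibbs chain.

    p_x0_given_x1[v] is P(x0 = 1 | x1 = v); p_x1_given_x0[v] is P(x1 = 1 | x0 = v).
    Each sweep resamples x0 and then x1 from its own node's table.
    """
    rng = random.Random(seed)
    x0, x1 = 0, 0
    counts = {}
    for _ in range(sweeps):
        x0 = 1 if rng.random() < p_x0_given_x1[x1] else 0
        x1 = 1 if rng.random() < p_x1_given_x0[x0] else 0
        counts[(x0, x1)] = counts.get((x0, x1), 0) + 1
    # Relative visit frequencies approximate the stationary distribution.
    return {state: c / sweeps for state, c in counts.items()}


if __name__ == "__main__":
    dist = pseudo_gibbs({0: 0.2, 1: 0.7}, {0: 0.3, 1: 0.9}, sweeps=20000)
    for state in sorted(dist):
        print(state, round(dist[state], 3))
```

For these particular tables the marginal of `x1` can be worked out by hand: one sweep maps `x1 = 0` to `x1 = 1` with probability 0.2 * 0.9 + 0.8 * 0.3 = 0.42 and keeps `x1 = 1` with probability 0.7 * 0.9 + 0.3 * 0.3 = 0.72, so the stationary probability of `x1 = 1` is 0.42 / (0.42 + 0.28) = 0.6.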

# (参考訳) オンデマンドで軽量な知識グラフ生成 - DBpediaによるデモ

On-Demand and Lightweight Knowledge Graph Generation -- a Demonstration with DBpedia ( http://arxiv.org/abs/2107.00873v1 )

ライセンス: CC BY 4.0

Malte Brockmeier, Yawen Liu, Sunita Pateer, Sven Hertling and Heiko Paulheim

(参考訳) 現代のDBpediaのような大規模知識グラフは、処理と処理に大量の計算リソースを必要とするデータセットである。さらに、リリースサイクルが長い場合が多いため、これらのグラフには時代遅れの情報が残されている。本稿では,DBpedia on Demand(DBpedia on Demand)を提案する。DBpediaのリソースを,グラフ全体の実体化や保存を必要とせずにオンデマンドで提供するシステムで,クエリ機能にも制限がある。

Modern large-scale knowledge graphs, such as DBpedia, are datasets which require large computational resources to serve and process. Moreover, they often have longer release cycles, which leads to outdated information in those graphs. In this paper, we present DBpedia on Demand -- a system which serves DBpedia resources on demand without the need to materialize and store the entire graph, and which even provides limited querying functionality.

翻訳日:2021-07-06 00:32:58 公開日:2021-07-02

# (参考訳) 軌道角運動量絡み合った光子による衝突のない集団確率決定

Conflict-free collective stochastic decision making by orbital angular momentum entangled photons ( http://arxiv.org/abs/2107.00877v1 )

ライセンス: CC BY 4.0

Takashi Amakasu, Nicolas Chauvet, Guillaume Bachelier, Serge Huant, Ryoichi Horisaki, Makoto Naruse

(参考訳) 近年、光学と計算の両方に関わる学際研究において、光の波動粒子双対性を利用して複数腕のバンディット問題を解決するシングルフォトンによる意思決定が実証されている。さらに、絡み合った光子に基づく意思決定は、プレイヤー間の決定の衝突を回避し、平等を確保しながら競合するマルチアームバンディット問題を解決した。しかし、これらの研究は光の偏光に基づいているため、利用可能な選択の数は2に制限され、2つの直交偏光状態に対応する。ここでは、軌道角運動量を光子の調整可能な自由度として利用することにより、競争上の意思決定状況を解決するためのスケーラブルな原理を提案する。さらに、Hong-Ou-Mandel効果を2つ以上の状態に拡張することにより、軌道角運動量を持つ絡み合った光子状態を生成することができる実験的な構成を確立する。提案手法がナッシュ均衡を実現するための従来の混合戦略よりも大きい理論的最大値をほぼ達成する三本腕バンディット問題に関する全報酬を数値的に検討する。これは、最良の武器を見つけるための探索段階でさえも、矛盾のない選択を達成する絡み合い特性のおかげである。

In recent cross-disciplinary studies involving both optics and computing, single-photon-based decision-making has been demonstrated by utilizing the wave-particle duality of light to solve multi-armed bandit problems. Furthermore, entangled-photon-based decision-making has managed to solve a competitive multi-armed bandit problem in such a way that conflicts of decisions among players are avoided while ensuring equality. However, as these studies are based on the polarization of light, the number of available choices is limited to two, corresponding to two orthogonal polarization states. Here we propose a scalable principle to solve competitive decision-making situations by using the orbital angular momentum as the tunable degree of freedom of photons, which theoretically allows an unlimited number of arms. Moreover, by extending the Hong-Ou-Mandel effect to more than two states, we theoretically establish an experimental configuration able to generate entangled photon states with orbital angular momentum and conditions that provide conflict-free selections at every turn. We numerically examine total rewards regarding three-armed bandit problems, for which the proposed strategy accomplishes almost the theoretical maximum, which is greater than a conventional mixed strategy intending to realize Nash equilibrium. This is thanks to the entanglement property that achieves no-conflict selections, even in the exploring phase to find the best arms.

翻訳日:2021-07-06 00:28:46 公開日:2021-07-02

# (参考訳) 解釈可能な協調グラフニューラルネットワークによるオンラインマルチエージェント予測

Online Multi-Agent Forecasting with Interpretable Collaborative Graph Neural Network ( http://arxiv.org/abs/2107.00894v1 )

ライセンス: CC BY 4.0

Maosen Li, Siheng Chen, Yanning Shen, Genjia Liu, Ivor W. Tsang, Ya Zhang

(参考訳) 本稿では,システム内の動的相互作用を利用して,複数エージェントの今後の状況を予測する。本稿では,複数の協調予測器からの予測をコラボレーティブグラフに従って集約するコラボレーティブ予測ユニット(copu)を提案する。各協調予測器は、他のエージェントの影響を考慮してエージェントの状態を予測するように訓練される。協調グラフのエッジ重みは、各予測器の重要性を反映している。協調グラフは、明示的な目的を最小化することで動機づけられる乗法的更新によってオンラインに調整される。この目的により、我々はまた、トレーニングとともに、CoPUが、後方の最高の協調予測器と同じようなパフォーマンスを達成することを示すために、後悔の分析を行う。この理論的解釈性は、我々の手法を他の多くのグラフネットワークと区別する。予測を段階的に洗練するために、複数のCoPUを積み重ねて協調グラフニューラルネットワークを形成する。オンラインの軌道予測,オンラインの人力予測,オンラインの交通速度予測の3つのタスクにおいて,本手法は平均28.6%,17.4%,21.0%の3タスクにおいて,最先端の作業よりも優れていた。

This paper considers predicting future statuses of multiple agents in an online fashion by exploiting dynamic interactions in the system. We propose a novel collaborative prediction unit (CoPU), which aggregates the predictions from multiple collaborative predictors according to a collaborative graph. Each collaborative predictor is trained to predict the status of an agent by considering the impact of another agent. The edge weights of the collaborative graph reflect the importance of each predictor. The collaborative graph is adjusted online by multiplicative update, which can be motivated by minimizing an explicit objective. With this objective, we also conduct regret analysis to indicate that, along with training, our CoPU achieves performance similar to that of the best individual collaborative predictor in hindsight. This theoretical interpretability distinguishes our method from many other graph networks. To progressively refine predictions, multiple CoPUs are stacked to form a collaborative graph neural network. Extensive experiments are conducted on three tasks: online simulated trajectory prediction, online human motion prediction and online traffic speed prediction, and our methods outperform state-of-the-art works on the three tasks by 28.6%, 17.4% and 21.0% on average, respectively.

翻訳日:2021-07-06 00:05:47 公開日:2021-07-02
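
The abstract above says the collaborative graph is adjusted online by a multiplicative update and that regret is measured against the best predictor in hindsight. A standard scheme with exactly that flavour is the exponential-weights (Hedge) update, sketched below. It is offered as an assumption about the general idea, not as the CoPU update rule itself; the learning rate `eta` and the toy predictors are illustrative.

```python
import math


def aggregate(predictions, weights):
    """Weighted combination of the individual predictors' outputs."""
    return sum(w * p for w, p in zip(weights, predictions))


def multiplicative_update(weights, losses, eta=0.5):
    """Shrink each weight by exp(-eta * loss), then renormalise to sum to 1."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Three toy predictors that always output the same value; the target is 2.0.
    predictions = [1.0, 2.0, 5.0]
    target = 2.0
    weights = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(20):
        losses = [(p - target) ** 2 for p in predictions]
        weights = multiplicative_update(weights, losses)
    print(weights, aggregate(predictions, weights))
```

Because weights only ever shrink in proportion to accumulated loss, the mass concentrates on the predictor with the smallest cumulative loss, which is what makes a comparison with the best predictor in hindsight natural.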

# (参考訳) 深部畳み込みニューラルネットワークの理論III:放射関数の近似

Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions ( http://arxiv.org/abs/2107.00896v1 )

ライセンス: CC BY 4.0

Tong Mao, Zhongjie Shi, and Ding-Xuan Zhou

(参考訳) 我々は、2つの畳み込み層、ダウンサンプリング演算子、完全に接続された層からなるディープニューラルネットワークのファミリーを考える。ネットワーク構造は、畳み込み層の数と完全に連結された層の幅を決定する2つの構造パラメータに依存する。近似関数が特徴多項式 $q$ と不定値関数 $f$ との合成形式 $f\circ q$ を取るとき、明示的な近似率を持つ近似理論を定式化する。特に、そのようなネットワークが、$\mathbb{r}^d$ からのデータの次元 $d$ が大きいとき、$q(x) =|x|^2$ で半径関数を近似するときに、完全連結な浅層ネットワークを上回ることが証明される。これは、特殊構造を持つ関数を近似する深層畳み込みニューラルネットワークの優越性に関する最初の厳密な証明を与える。そこで我々は, 回帰関数が$f\circ Q$である回帰フレームワークにおいて, そのようなディープネットワークを用いた経験的リスク最小化のための一般化解析を行う。複合情報や$q$ や $f$ の関数を使用しないネットワーク構造は、自動的に特徴を抽出し、構造パラメータをチューニングすることで回帰関数の複合的性質を利用することができます。本解析は,ネットワーク深度を最小にし,その後増加させる誤差境界を提供し,ネットワーク深さで観測されるトレードオフ現象を理論的に検証する。

We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the approximated function takes a composite form $f\circ Q$ with a feature polynomial $Q$ and a univariate function $f$. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with $Q(x) =|x|^2$, when the dimension $d$ of data from $\mathbb{R}^d$ is large. This gives the first rigorous proof for the superiority of deep convolutional neural networks in approximating functions with special structures. Then we carry out generalization analysis for empirical risk minimization with such a deep network in a regression framework with the regression function of the form $f\circ Q$. Our network structure which does not use any composite information or the functions $Q$ and $f$ can automatically extract features and make use of the composite nature of the regression function via tuning the structural parameters. Our analysis provides an error bound which decreases with the network depth to a minimum and then increases, verifying theoretically a trade-off phenomenon observed for network depths in many practical applications.

翻訳日:2021-07-05 23:35:31 公開日:2021-07-02
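
As a worked instance of the composite form in the abstract above, take the feature polynomial $Q(x) = |x|^2$. The target is then a radial function, which depends on the input only through its norm. The symbol $F$ and the Gaussian example are introduced here for illustration and are not taken from the paper.

```latex
F(x) = (f \circ Q)(x) = f\bigl(|x|^2\bigr)
     = f\bigl(x_1^2 + x_2^2 + \cdots + x_d^2\bigr), \qquad x \in \mathbb{R}^d,
\qquad \text{e.g. } f(t) = e^{-t} \;\Rightarrow\; F(x) = e^{-|x|^2}.
```

Although $F$ has $d$ input variables, all of its complexity beyond the fixed quadratic $Q$ sits in the univariate function $f$, which is the structure the approximation rates in the abstract exploit.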

# (参考訳) OCR誤りのある歴史的テキストに対するデータ中心領域適応

Data Centric Domain Adaptation for Historical Text with OCR Errors ( http://arxiv.org/abs/2107.00927v1 )

ライセンス: CC BY 4.0

Luisa März, Stefan Schweter, Nina Poerner, Benjamin Roth and Hinrich Schütze

(参考訳) オランダ語とフランス語の歴史的データに基づいて、ドメイン内およびドメイン間識別(NER)のための新しい手法を提案する。クロスドメインの場合、コンテキスト化された文字列埋め込みを通じて教師なしのドメインデータを統合することでドメインシフトに対処し、OCRエラーをソースドメインに注入し、データ中心のドメイン適応に対処する。任意の入力データにOCR誤差を模倣する一般的な手法を提案する。私たちのクロスドメインとドメイン内の結果は、いくつかの強力なベースラインを上回り、最先端の結果を確立します。私たちは、フランスとオランダのヨーロッパ・ナー・コーポラの事前処理版を公開します。

We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and we address OCR errors by injecting synthetic OCR errors into the source domain, a form of data-centric domain adaptation. We propose a general approach to imitate OCR errors in arbitrary input data. Our cross-domain as well as our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.

翻訳日:2021-07-05 23:34:22 公開日:2021-07-02
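
The idea of injecting synthetic OCR errors into clean text, mentioned in the abstract above, can be sketched with a simple character-confusion table. The table `CONFUSIONS`, the function name and the fixed substitution rate are all hypothetical; the abstract does not say how the authors model OCR noise, so this is only one plausible reading.

```python
import random

# Hypothetical look-alike substitutions of the kind OCR engines tend to make.
CONFUSIONS = {"e": "c", "l": "1", "o": "0", "s": "5", "m": "rn"}


def inject_ocr_noise(text, rate, seed=0, confusions=CONFUSIONS):
    """Replace each confusable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in confusions and rng.random() < rate:
            out.append(confusions[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(inject_ocr_noise("some clean modern sentence", rate=0.3))
```

Since no substitution introduces or removes whitespace, token boundaries are preserved, so token-level NER labels from the clean source text can be reused unchanged on the noised copy.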

# (参考訳) 全スライド画像分類のための混合監督学習

Mixed Supervision Learning for Whole Slide Image Classification ( http://arxiv.org/abs/2107.00934v1 )

ライセンス: CC BY 4.0

Jiahui Li, Wen Chen, Xiaodi Huang, Zhiqiang Hu, Qi Duan, Hongsheng Li, Dimitris N. Metaxas, Shaoting Zhang

(参考訳) 分類ラベルを用いた弱監督学習は,様々なタスクにおいて高い性能を示した。数ピクセルレベルのファインアノテーションも手頃な価格である場合、ピクセルレベルのアノテーション(セグメンテーションなど)と画像レベルのアノテーション(分類など)の両方を活用してパフォーマンスをさらに向上することは自然である。しかし、計算病理学では、スライド画像全体の高解像度化によって分類モデルのエンドツーエンドの訓練が不可能になるため、そのような弱さや混在した監視学習は依然として難しい課題である。別のアプローチとして、パッチベースのモデルトレーニング、すなわち、自己教師付き学習を用いてパッチのピクセルレベルの擬似ラベルを生成することで、そのようなデータを解析する方法がある。しかしながら、そのような手法は通常、自己学習過程中にノイズが蓄積されるため、収束しにくいモデルドリフト問題を持つ。これらの問題に対処するために,高解像度画像のための混合監視学習フレームワークを提案し,それらの様々なラベル(画像レベルの粗いアノテーションや画素レベルの微細なラベルなど)を効果的に活用する。パッチトレーニングの段階で、このフレームワークは粗いイメージレベルラベルを使用して、自己教師付き学習を洗練し、高品質のピクセルレベル擬似ラベルを生成することができる。ピクセルレベルの偽陽性と偽陰性を抑制するための包括的戦略が提案されている。大量の画像(スライド画像1万枚以上)を持つ実世界の3つのデータセットと、様々な種類のラベルを用いて、混合監視学習の有効性を評価する。画像レベルの分類作業において,100%の感度を維持しながら,術式と比較して偽陽性率を約3分の1削減した。

Weak supervision learning on classification labels has demonstrated high performance in various tasks. When a few pixel-level fine annotations are also affordable, it is natural to leverage both pixel-level (e.g., segmentation) and image-level (e.g., classification) annotations to further improve the performance. In computational pathology, however, such weak or mixed supervision learning is still a challenging task, since the high resolution of whole slide images makes end-to-end training of classification models unattainable. An alternative approach is to analyze such data by patch-based model training, i.e., using self-supervised learning to generate pixel-level pseudo labels for patches. However, such methods usually have model drifting issues, i.e., they are hard to converge, because noise accumulates during the self-training process. To handle these problems, we propose a mixed supervision learning framework for super high-resolution images to effectively utilize their various labels (e.g., sufficient image-level coarse annotations and a few pixel-level fine labels). During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning and generate high-quality pixel-level pseudo labels. A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives. Three real-world datasets with a very large number of images (i.e., more than 10,000 whole slide images) and various types of labels are used to evaluate the effectiveness of mixed supervision learning. In the task of image-level classification, we reduced the false positive rate by around one third compared to the state of the art while retaining 100% sensitivity.

翻訳日:2021-07-05 23:22:50 公開日:2021-07-02

# (参考訳) 逆ディリクレ重み付けによる物理情報ニューラルネットワークの信頼性向上

Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks ( http://arxiv.org/abs/2107.00940v1 )

ライセンス: CC BY-SA 4.0

Suryanarayana Maddu, Dominik Sturm, Christian L. Müller, Ivo F. Sbalzarini

(参考訳) 我々は、物理情報ニューラルネットワーク(PINN)のような深層ニューラルネットワークのトレーニング中に、スケール不均衡を伴うマルチスケールダイナミクスから生じる障害モードを特徴付け、治療する。 PINNは、物理方程式モデルのデータとのシームレスな統合を可能にする、一般的な機械学習テンプレートである。彼らのトレーニングは、データ忠実度と方程式忠実度目標の重み付け和による最適化問題を解決することにかかっている。目的間の衝突は、スケールの不均衡、データのヘテロシディスティック性、物理方程式の剛性、または逐次訓練中の破滅的な干渉によって生じる。このことから生じる訓練病理を説明し,この問題を軽減するための単純かつ効果的な逆ディリクレ重み付け戦略を提案する。ニューラルネットワークのソボレフトレーニングと比較し、分析的に$\boldsymbol{\epsilon}$-Optimalトレーニングのベースラインを提供する。本研究では,多スケールのアクティブ乱流モデルを含む様々な応用における逆ディリクレ重み付けの有効性を実証し,従来のピン訓練よりも精度と収束度が桁違いに向上することを示す。逐次トレーニングを用いた逆モデリングでは,逆ディリクレ重み付けがPINNを破滅的忘れから保護することがわかった。

We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving an optimization problem over a weighted sum of data-fidelity and equation-fidelity objectives. Conflicts between objectives can arise from scale imbalances, heteroscedasticity in the data, stiffness of the physical equation, or from catastrophic interference during sequential training. We explain the training pathology arising from this and propose a simple yet effective inverse-Dirichlet weighting strategy to alleviate the issue. We compare with Sobolev training of neural networks, providing the baseline of analytically $\boldsymbol{\epsilon}$-optimal training. We demonstrate the effectiveness of inverse-Dirichlet weighting in various applications, including a multi-scale model of active turbulence, where we show orders of magnitude improvement in accuracy and convergence over conventional PINN training. For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.

翻訳日:2021-07-05 22:58:27 公開日:2021-07-02
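
The abstract above does not state the weighting formula, so the following is only a sketch under a stated assumption: each objective's weight is taken inversely proportional to the spread (standard deviation) of its gradient entries, normalised so that the objective with the largest spread gets weight 1. The function name and the toy gradients are hypothetical, and the exact rule should be checked against the paper before use.

```python
import statistics


def variance_balanced_weights(grads_per_objective, eps=1e-12):
    """Weight each objective by (largest gradient std) / (its own gradient std).

    grads_per_objective[k] is a flat list of the gradient entries of objective k
    with respect to the network parameters.
    """
    stds = [statistics.pstdev(g) for g in grads_per_objective]
    largest = max(stds)
    # eps guards against division by zero for an objective with constant gradient.
    return [largest / max(s, eps) for s in stds]


if __name__ == "__main__":
    data_grad = [-1.0, 1.0, -1.0, 1.0]          # small-scale objective
    physics_grad = [-10.0, 10.0, -10.0, 10.0]   # dominant objective
    print(variance_balanced_weights([data_grad, physics_grad]))
```

Under this assumption the objective whose gradients are ten times smaller is up-weighted tenfold, so neither term dominates the parameter update purely because of its scale.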

# (参考訳) パーソナライズ医療から人口健康へ:mヘルスセンシング技術に関する調査

From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques ( http://arxiv.org/abs/2107.00948v1 )

ライセンス: CC BY 4.0

Zhiyuan Wang, Haoyi Xiong, Jie Zhang, Sijia Yang, Mehdi Boukhechba, Laura E. Barnes, Daqing Zhang

(参考訳) モバイルセンシングアプリは、個人から行動や健康関連の情報を収集し、メンタルヘルスや慢性ケアのような健康や健康を促進するためのタイムリーな介入を提供するための実用的なアプローチとして広く使われている。モバイルセンシングの目的は、(a) 個人向けの個別化医療と (b) 集団向けの公衆衛生のいずれかであり、本研究ではこれらのモバイルセンシングアプリの設計を概観し、これらのアプリやシステムの設計を (i) Personal Sensing と (ii) Crowd Sensing の2つのパラダイムに分類することを提案する。どちらのパラダイムも、ウェアラブルセンサ、モビリティモニタリング、モバイルデータオフロード、クラウドベースのデータ分析といった共通のユビキタスセンシング技術を取り入れて個人からセンシングデータを収集・処理しうるが、本研究では、mHealth Sensing のライフサイクルの観点からアプリやシステムを特定・分類できる、2つの主要コンポーネントからなる新しい分類体系を提示する: (1) Sensing Task Creation & Participation、(2) Health Surveillance & Data Collection、(3) Data Analysis & Knowledge Discovery。2つのパラダイムの異なる目標に関して、この研究はこの分野を体系的にレビューし、これらの2つのコンポーネント間の構成と相互作用の観点から、典型的なアプリ/システムの設計を要約します。要約に加えて, 個人化医療と人口健康の両面から, モバイルセンシングの健康への方向性を明らかにする上でも有効である。

Mobile sensing apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and to provide timely interventions that promote health and well-being, for example in mental health and chronic care. As the objective of mobile sensing can be either (a) personalized medicine for individuals or (b) public health for populations, in this work we review the design of these mobile sensing apps and propose to categorize the design of these apps/systems into two paradigms: (i) Personal Sensing and (ii) Crowd Sensing. While both sensing paradigms might incorporate common ubiquitous sensing technologies, such as wearable sensors, mobility monitoring, mobile data offloading, and/or cloud-based data analytics, to collect and process sensing data from individuals, we present a novel taxonomy system with two major components that can specify and classify apps/systems from aspects of the life-cycle of mHealth Sensing: (1) Sensing Task Creation & Participation, (2) Health Surveillance & Data Collection, and (3) Data Analysis & Knowledge Discovery. With respect to the different goals of the two paradigms, this work systematically reviews the field and summarizes the design of typical apps/systems in view of the configurations and interactions between these two components. In addition to summarization, the proposed taxonomy system also helps identify potential directions of mobile sensing for health from both personalized medicine and population health perspectives.

翻訳日:2021-07-05 21:51:55 公開日:2021-07-02

# (参考訳) 身体と計算創造性

Embodiment and Computational Creativity ( http://arxiv.org/abs/2107.00949v1 )

ライセンス: CC BY 4.0

Christian Guckelsberger, Anna Kantosalo, Santiago Negrete-Yankelevich and Tapio Takala

(参考訳) 創造性と創造性の認識は、少なくともある程度は具現化によって形成されると推測する。これは計算創造性(CC)研究に非常に関係があるが、既存の研究は乏しく、概念の使用は曖昧である。我々は,国際計算創造会議において,体系的なレビューと出版物の規範的分析により,この状況を克服した。我々は、概念の異なる使用法を識別し比較することで曖昧さを解決するために、確立した態様を取り入れ、拡張する。我々は,研究の参考として,CCの実施の機会と課題を収集し,文脈を整理し,強調し,具体化されたCC研究プログラムをさらに進めるために重要な方向性を示した。

We conjecture that creativity and the perception of creativity are, at least to some extent, shaped by embodiment. This makes embodiment highly relevant for Computational Creativity (CC) research, but existing research is scarce and the use of the concept highly ambiguous. We overcome this situation by means of a systematic review and a prescriptive analysis of publications at the International Conference on Computational Creativity. We adopt and extend an established typology of embodiment to resolve ambiguity through identifying and comparing different usages of the concept. We collect, contextualise and highlight opportunities and challenges in embracing embodiment in CC as a reference for research, and put forward important directions to further the embodied CC research programme.

翻訳日:2021-07-05 20:34:30 公開日:2021-07-02

# (参考訳) 人集団を参照する直接的及び間接的関連項の概念識別

Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons ( http://arxiv.org/abs/2107.00955v1 )

ライセンス: CC BY 4.0

Anastasia Zhukova, Felix Hamborg, Karsten Donnay and Bela Gipp

(参考訳) クラスタリングによる非教師なし概念識別(unsupervised concept identification)、すなわち意味的関連のある単語やフレーズの識別は、様々なユースケースで使用される文脈的プリミティブ(例えば、テキスト次元の縮小)、すなわち語彙のサイズ、要約、名前付きエンティティの解決を減らすために単語を概念に置き換える、という一般的なアプローチである。本稿では,関連記事から抽出した人物群をアクタとして識別するための教師なしアプローチの最初の結果を示す。具体的には、「移民家族」=「亡命者」など、無名の実体俳優として活動する人々の集団について言及している。私たちの基準と比較すると、このアプローチは「イランの指導者」と「ヨーロッパの指導者」と、「アメリカの役人」=「トランプ政権」といった様々な言葉で間接的に関連のある言及を分離した地政学的実体の言及を維持する。

Unsupervised concept identification through clustering, i.e., identification of semantically related words and phrases, is a common approach to identify contextual primitives employed in various use cases, e.g., text dimension reduction, i.e., replace words with the concepts to reduce the vocabulary size, summarization, and named entity resolution. We demonstrate the first results of an unsupervised approach for the identification of groups of persons as actors extracted from a set of related articles. Specifically, the approach clusters mentions of groups of persons that act as non-named entity actors in the texts, e.g., "migrant families" = "asylum-seekers." Compared to our baseline, the approach keeps the mentions of the geopolitical entities separated, e.g., "Iran leaders" != "European leaders," and clusters (in)directly related mentions with diverse wording, e.g., "American officials" = "Trump Administration."

翻訳日:2021-07-05 20:13:12 公開日:2021-07-02

# (参考訳) AÇAI: 近似インデックスを用いたアセント類似性キャッシュ

AÇAI: Ascent Similarity Caching with Approximate Indexes ( http://arxiv.org/abs/2107.00957v1 )

ライセンス: CC BY 4.0

Tareq Si Salem, Giovanni Neglia, Damiano Carra

(参考訳) 類似性検索はマルチメディア検索システムやレコメンダシステムにおいて重要な操作であり、将来の機械学習や拡張現実アプリケーションにおいても重要な役割を果たす。これらのシステムが大きなオブジェクトに厳しい遅延制約を課す必要がある場合、エンドユーザーに近いエッジサーバは類似性キャッシュとして動作し、検索を高速化することができる。本稿では、新しい類似性キャッシングポリシであるAÇAIを提案する。AÇAIは、(i)カタログ全体に対する(近似)インデックスを使用して、どのオブジェクトをローカルに提供し、どのオブジェクトをリモートサーバから取得するかを判断し、(ii)リクエストプロセスが統計的な規則性を示さない場合でも強い保証のもとでローカルオブジェクトの集合を更新するミラーアセントアルゴリズムを用いることで、最先端技術を改善する。

Similarity search is a key operation in multimedia retrieval systems and recommender systems, and it will play an important role also for future machine learning and augmented reality applications. When these systems need to serve large objects with tight delay constraints, edge servers close to the end-user can operate as similarity caches to speed up the retrieval. In this paper we present AÇAI, a new similarity caching policy which improves on the state of the art by using (i) an (approximate) index for the whole catalog to decide which objects to serve locally and which to retrieve from the remote server, and (ii) a mirror ascent algorithm to update the set of local objects with strong guarantees even when the request process does not exhibit any statistical regularity.

翻訳日:2021-07-05 20:01:44 公開日:2021-07-02

# (参考訳) ResIST: 分散トレーニングのためのResNetのレイヤワイズ分解

ResIST: Layer-Wise Decomposition of ResNets for Distributed Training ( http://arxiv.org/abs/2107.00961v1 )

ライセンス: CC BY 4.0

Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

(参考訳) 残差ネットワーク(ResNets)のための新しい分散トレーニングプロトコルである ResIST を提案する。ResIST は、グローバルResNetをランダムにいくつかの浅いサブResNetに分解し、複数のローカルイテレーションで個別に訓練し、更新を同期させ、グローバルモデルに集約する。次のラウンドでは、新しいサブResNetがランダムに生成され、プロセスが繰り返される。構成により、反復毎に ResIST はネットワークパラメータのほんの一部を各マシンに通信し、トレーニング中にフルモデルを使用することはない。したがって、ResIST は、ResNetトレーニングの通信、メモリ、時間要件を、以前のメソッドの要求のごく一部に減らす。データ並列トレーニングやローカルSGDによるデータ並列トレーニングのような一般的なプロトコルと比較すると、ResIST はモデルの性能に関して競合する一方で、壁時計のトレーニング時間が減少する。

We propose {\rm \texttt{ResIST}}, a novel distributed training protocol for Residual Networks (ResNets). {\rm \texttt{ResIST}} randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats. By construction, per iteration, {\rm \texttt{ResIST}} communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, {\rm \texttt{ResIST}} reduces the communication, memory, and time requirements of ResNet training to only a fraction of the requirements of previous methods. In comparison to common protocols like data-parallel training and data-parallel training with local SGD, {\rm \texttt{ResIST}} yields a decrease in wall-clock training time, while being competitive with respect to model performance.
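The decompose, train locally, aggregate cycle described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes each residual block goes to exactly one worker per round and represents a block's parameters by a single float.

```python
import random


def decompose(num_blocks, num_workers, rng):
    """Randomly assign each residual block to exactly one worker (an assumption)."""
    blocks = list(range(num_blocks))
    rng.shuffle(blocks)
    return [sorted(blocks[w::num_workers]) for w in range(num_workers)]


def aggregate(global_params, worker_updates):
    """Write trained blocks back, averaging when several workers hold the same block."""
    collected = {}
    for update in worker_updates:
        for block, value in update.items():
            collected.setdefault(block, []).append(value)
    merged = dict(global_params)
    for block, values in collected.items():
        merged[block] = sum(values) / len(values)
    return merged
```

Calling `decompose` again with fresh randomness at the start of each round gives the new sub-ResNets; blocks that no worker trained in a round keep their global values.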

Translated: 2021-07-05 19:25:19 Published: 2021-07-02

# Evaluating the Usefulness of Unsupervised monitoring in Cultural Heritage Monuments ( http://arxiv.org/abs/2107.00964v1 )

License: CC BY-SA 4.0

Charalampos Zafeiropoulos, Ioannis N. Tzortzis, Ioannis Rallis, Eftychios Protopapadakis, Nikolaos Doulamis and Anastasios Doulamis

In this paper, we scrutinize the effectiveness of various clustering techniques, investigating their applicability in Cultural Heritage monitoring applications. In the context of this paper, we detect the level of decomposition and corrosion on the walls of Saint Nicholas fort in Rhodes utilizing hyperspectral images. A total of 6 different clustering approaches have been evaluated over a set of 14 different orthorectified hyperspectral images. The experimental setup involves the K-means, Spectral, Meanshift, DBSCAN, Birch and Optics algorithms. For each of these techniques we evaluate its performance using metrics such as the Calinski-Harabasz and Davies-Bouldin indexes and the Silhouette value. We also evaluate the outcomes of the clustering methods by comparing them with a set of annotated images that provide the ground truth regarding the decomposition and/or corrosion areas of the original images. The results show that a few clustering techniques applied to the given dataset achieved decent accuracy, precision, recall and F1 scores. Overall, the deterioration was detected quite accurately.
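Of the internal metrics listed, the Silhouette value is the easiest to write out in full. The sketch below computes it for one-dimensional points with the standard library only. It is an illustration of the metric's definition, not the evaluation code used in the study, and it assumes at least two clusters.

```python
def silhouette(points, labels):
    """Mean silhouette value for 1-D points; singleton clusters score 0."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)            # mean distance inside own cluster
        b = min(                             # mean distance to the nearest other cluster
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters; negative values indicate points that sit closer to another cluster than to their own.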

Translated: 2021-07-05 19:06:20 Published: 2021-07-02

# Parasitic Egg Detection and Classification in Low-cost Microscopic Images using Transfer Learning ( http://arxiv.org/abs/2107.00968v1 )

License: CC BY 4.0

Thanaphon Suwannaphong, Sawaphob Chavana, Sahapol Tongsom, Duangdao Palasuwan, Thanarat H. Chalidabhongse and Nantheera Anantrasirichai

Intestinal parasitic infection leads to several morbidities in humans worldwide, especially in tropical countries. The traditional diagnosis usually relies on manual analysis of microscopic images, which is prone to human error due to the morphological similarity of different parasitic eggs and the abundance of impurities in a sample. Many studies have developed automatic systems for parasite egg detection to reduce human workload. However, they work with high-quality microscopes, which unfortunately remain unaffordable in some rural areas. Our work thus exploits the benefits of a low-cost USB microscope. This instrument, however, provides poor-quality images because of its limited magnification (10x), making parasite detection and species classification difficult. In this paper, we propose a CNN-based technique using a transfer learning strategy to enhance the efficiency of automatic parasite classification in poor-quality microscopic images. A patch-based technique with a sliding window is employed to search for the location of the eggs. Two networks, AlexNet and ResNet50, are examined with a trade-off between architecture size and classification performance. The results show that our proposed framework outperforms the state-of-the-art object recognition methods. Our system, combined with a final decision from an expert, may improve real faecal examination with low-cost microscopes.
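The patch-based search with a sliding window can be sketched as follows. The scoring function stands in for the CNN classifier, and the patch size and stride are illustrative values that the abstract does not state.

```python
def sliding_windows(height, width, patch, stride):
    """Yield (top, left) corners of patch x patch windows that fit inside the image."""
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield top, left


def best_patch(image, patch, stride, score_fn):
    """Return the corner of the window that score_fn rates highest.

    score_fn is a placeholder for a classifier's egg probability.
    """
    height, width = len(image), len(image[0])

    def crop(top, left):
        return [row[left:left + patch] for row in image[top:top + patch]]

    return max(sliding_windows(height, width, patch, stride),
               key=lambda corner: score_fn(crop(*corner)))
```

A smaller stride gives finer localization at the cost of more classifier calls, which is the usual trade-off in this kind of search.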

Translated: 2021-07-05 18:58:10 Published: 2021-07-02

# Sub-millisecond Video Synchronization of Multiple Android Smartphones ( http://arxiv.org/abs/2107.00987v1 )

License: CC BY-SA 4.0

Azat Akhmetyanov, Anastasiia Kornilova, Marsel Faizullin, David Pozo, Gonzalo Ferrer

This paper addresses the problem of building an affordable, easy-to-set-up synchronized multi-view camera system, which is in demand for many Computer Vision and Robotics applications in high-dynamic environments. We propose a solution to this problem: a publicly available Android application for synchronized video recording on multiple smartphones with sub-millisecond accuracy. We present a generalized mathematical model of timestamping for Android smartphones and prove its applicability on 47 different physical devices. We also estimate the time drift parameter for those smartphones, which is less than 1.2 milliseconds per minute for most of the considered devices, which makes a smartphone camera system a worthy analog of professional multi-view systems. Finally, we demonstrate the performance of the Android app on a camera system built from Android smartphones, both quantitatively, showing less than 300 microseconds of synchronization error, and qualitatively, on a panorama stitching task.
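The abstract does not spell out the timestamping model. A common simplification, assumed here purely for illustration, is a linear clock: reference time = offset + rate × local time, where the drift is the deviation of `rate` from 1. The sketch fits both parameters by least squares and converts the rate into the milliseconds-per-minute unit used above.

```python
def fit_clock(local_ts, reference_ts):
    """Least-squares fit of reference = offset + rate * local (times in seconds)."""
    n = len(local_ts)
    mean_l = sum(local_ts) / n
    mean_r = sum(reference_ts) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(local_ts, reference_ts))
    var = sum((l - mean_l) ** 2 for l in local_ts)
    rate = cov / var
    return mean_r - rate * mean_l, rate


def drift_ms_per_minute(rate):
    """Convert a clock rate into drift, in milliseconds gained per minute."""
    return (rate - 1.0) * 60.0 * 1000.0
```

Under this model a rate of 1.00002 corresponds to 1.2 ms of drift per minute, the bound quoted for most devices.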

Translated: 2021-07-05 18:48:40 Published: 2021-07-02

# Multimodal Representation for Neural Code Search ( http://arxiv.org/abs/2107.00992v1 )

License: CC BY 4.0

Jian Gu, Zimin Chen, Martin Monperrus

Semantic code search is about finding semantically relevant code snippets for a given natural language query. In state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance between their representations in a shared vector space. In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build a multimodal representation for the code data. We conduct extensive experiments using a single corpus that is large-scale and multi-language: CodeSearchNet. Our results show that both our tree-serialized representations and multimodal learning model improve the performance of neural code search. Finally, we define two intuitive quantification metrics oriented to the completeness of semantic and syntactic information of the code data.
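To make "tree serialization of a simplified AST" concrete, here is a minimal sketch that uses Python's built-in `ast` module. Keeping only node type names and emitting a bracketed pre-order string are assumptions chosen for illustration; the paper's own simplification rules and serialization formats are not given in the abstract.

```python
import ast


def serialize(node):
    """Bracketed pre-order serialization that keeps only node type names."""
    children = [serialize(child) for child in ast.iter_child_nodes(node)]
    name = type(node).__name__
    return f"({name} {' '.join(children)})" if children else name


if __name__ == "__main__":
    print(serialize(ast.parse("def add(a, b):\n    return a + b")))
```

The resulting token sequence can be fed to a sequence encoder alongside the raw code tokens, which is one way to obtain a second "modality" of the same snippet.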

Translated: 2021-07-05 18:41:15 Published: 2021-07-02

# Magnification-independent Histopathological Image Classification with Similarity-based Multi-scale Embeddings ( http://arxiv.org/abs/2107.01063v1 )

License: CC BY 4.0

Yibao Sun, Xingru Huang, Yaqi Wang, Huiyu Zhou, Qianni Zhang

The classification of histopathological images is of great value in both cancer diagnosis and pathological studies. However, multiple factors, such as variations caused by magnification factors and class imbalance, make it a challenging task in which conventional methods that learn from image-label datasets often perform unsatisfactorily. We observe that tumours of the same class often share common morphological patterns. To exploit this fact, we propose an approach that learns similarity-based multi-scale embeddings (SMSE) for magnification-independent histopathological image classification. In particular, a pair loss and a triplet loss are leveraged to learn similarity-based embeddings from image pairs or image triplets. The learned embeddings provide accurate measurements of similarities between images, which are regarded as a more effective form of representation for histopathological morphology than normal image features. Furthermore, in order to ensure the generated models are magnification-independent, images acquired at different magnification factors are simultaneously fed to networks during training for learning multi-scale embeddings. In addition to the SMSE, to eliminate the impact of class imbalance, instead of using the hard sample mining strategy that intuitively discards some easy samples, we introduce a new reinforced focal loss to simultaneously punish hard misclassified samples while suppressing easy well-classified samples. Experimental results show that the SMSE improves the performance for histopathological image classification tasks for both breast and liver cancers by a large margin compared to previous methods. In particular, the SMSE achieves the best performance on the BreakHis benchmark with an improvement ranging from 5% to 18% compared to previous methods using traditional features.
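For readers unfamiliar with the two loss families mentioned, the sketch below shows the textbook triplet loss and the standard focal loss for a single sample. The "reinforced" focal loss introduced in the paper is not specified in the abstract and is not reproduced here; the margin and `gamma` values are illustrative defaults.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss: cross-entropy down-weighted for confident predictions."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)
```

The triplet term is zero once the negative is farther from the anchor than the positive by at least the margin, and the focal factor `(1 - p_true) ** gamma` shrinks the contribution of samples that are already classified well.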

Translated: 2021-07-05 18:25:15 Published: 2021-07-02

# Unsupervised Spoken Utterance Classification ( http://arxiv.org/abs/2107.01068v1 )

License: CC0 1.0

Shahab Jalalvand and Srinivas Bangalore

An intelligent virtual assistant (IVA) enables effortless conversations in call routing through spoken utterance classification (SUC), which is a special form of spoken language understanding (SLU). Building a SUC system requires a large amount of supervised in-domain data that is not always available. In this paper, we introduce an unsupervised spoken utterance classification approach (USUC) that does not require any in-domain data except for the intent labels and a few paraphrases per intent. USUC consists of a KNN classifier (K=1) and a complex embedding model trained on a large unsupervised customer service corpus. Among all embedding models, we demonstrate that ELMo works best for USUC. However, an ELMo model is too slow to be used at run-time for call routing. To resolve this issue, we first compute the uni- and bi-gram embedding vectors offline and build a lookup table of n-grams and their corresponding embedding vectors. We then use this table to compute sentence embedding vectors at run-time, along with back-off techniques for unseen n-grams. Experiments show that USUC outperforms the traditional utterance classification methods by reducing the classification error rate from 32.9% to 27.0% without requiring supervised data. Moreover, our lookup and back-off technique increases the processing speed from 16 utterances per second to 118 utterances per second.
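The lookup-table idea can be sketched as below. It is a toy reconstruction under stated assumptions: sentence vectors are plain averages of n-gram vectors, an unseen bigram backs off to its known unigrams, and similarity is cosine. The table contents would come from an offline embedding model, and none of the names are from the paper.

```python
import math


def embed(sentence, table, dim):
    """Average n-gram vectors from a precomputed table, backing off to unigrams."""
    words = sentence.lower().split()
    grams = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)] or words
    vectors = []
    for gram in grams:
        if gram in table:
            vectors.append(table[gram])
        else:  # unseen bigram: fall back to whichever of its words are known
            vectors.extend(table[w] for w in gram.split() if w in table)
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u, v):
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def classify(utterance, paraphrases, table, dim):
    """1-nearest-neighbour over (paraphrase, intent) pairs."""
    query = embed(utterance, table, dim)
    best = max(paraphrases, key=lambda item: cosine(query, embed(item[0], table, dim)))
    return best[1]
```

Because `embed` only performs dictionary lookups and averaging, its run-time cost is independent of the size of the model that produced the table, which is the source of the speed-up the abstract reports.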

Translated: 2021-07-05 18:06:16 Published: 2021-07-02

# Comparison of end-to-end neural network architectures and data augmentation methods for automatic infant motility assessment using wearable sensors ( http://arxiv.org/abs/2107.01086v1 )

License: CC BY 4.0

Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen

Infant motility assessment using intelligent wearables is a promising new approach for assessing infant neurophysiological development, in which efficient signal analysis plays a central role. This study investigates the use of different end-to-end neural network architectures for processing infant motility data from wearable sensors. We focus on the performance and computational burden of alternative sensor encoder and time-series modelling modules and their combinations. In addition, we explore the benefits of data augmentation methods in ideal and non-ideal recording conditions. The experiments are conducted using a dataset of multi-sensor movement recordings from 7-month-old infants, as captured by a recently proposed smart jumpsuit for infant motility assessment. Our results indicate that the choice of the encoder module has a major impact on classifier performance. For sensor encoders, the best performance was obtained with parallel 2-dimensional convolutions for intra-sensor channel fusion with shared weights for all sensors. The results also indicate that a relatively compact feature representation is obtainable for within-sensor feature extraction without a drastic loss to classifier performance. Comparison of time-series models revealed that feed-forward dilated convolutions with residual and skip connections outperformed all RNN-based models in performance, training time, and training stability. The experiments also indicate that data augmentation improves model robustness in simulated packet loss or sensor dropout scenarios. In particular, signal- and sensor-dropout-based augmentation strategies provided considerable boosts to performance without negatively affecting the baseline performance. Overall, the results provide tangible suggestions on how to optimize end-to-end neural network training for multi-channel movement sensor data.
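Sensor-dropout augmentation can be illustrated with a short sketch. It assumes a recording stored as a mapping from sensor name to a list of samples and zeroes out whole sensors at random while always keeping at least one. The drop probability and the data layout are illustrative choices, not values from the study.

```python
import random


def sensor_dropout(recording, p_drop, rng):
    """Return a copy of the recording with randomly chosen sensors zeroed out."""
    names = sorted(recording)
    kept = [name for name in names if rng.random() >= p_drop]
    if not kept:
        kept = [rng.choice(names)]  # never drop every sensor
    return {
        name: list(recording[name]) if name in kept else [0.0] * len(recording[name])
        for name in names
    }
```

Applying this to each training batch exposes the classifier to the same kind of missing-sensor input it may meet when a wearable loses packets in the field.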

Translated: 2021-07-05 17:59:44 Published: 2021-07-02

# Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription ( http://arxiv.org/abs/2107.01091v1 )

License: CC BY 4.0

Nikita Pavlichenko, Ivan Stelmakh, Dmitry Ustalov

Domain-specific data is the crux of the successful transfer of machine learning systems from benchmarks to real life. Crowdsourcing has become one of the standard tools for cheap and time-efficient data collection for simple problems such as image classification, thanks in large part to advances in research on aggregation methods. However, the applicability of crowdsourcing to more complex tasks (e.g., speech recognition) remains limited due to the lack of principled aggregation methods for these modalities. The main obstacle to designing advanced aggregation methods is the absence of training data, and in this work, we focus on bridging this gap in speech recognition. To this end, we collect and release CrowdSpeech, the first publicly available large-scale dataset of crowdsourced audio transcriptions. Evaluation of existing aggregation methods on our data shows room for improvement, suggesting that our work may entail the design of better algorithms. At a higher level, we also contribute to the more general challenge of collecting high-quality datasets using crowdsourcing: we develop a principled pipeline for constructing datasets of crowdsourced audio transcriptions in any novel domain. We show its applicability to an under-resourced language by constructing VoxDIY, a counterpart of CrowdSpeech for the Russian language. We also release the code that allows a full replication of our data collection pipeline and share various insights on best practices of data collection via crowdsourcing.
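To show what "aggregation" means for transcriptions, here is one very simple baseline, offered as an assumption rather than as a method evaluated in the paper: pick the candidate with the smallest total word-level edit distance to all other candidates (the medoid).

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def medoid_transcription(candidates):
    """Return the transcription closest, in total word edits, to all the others."""
    tokens = [c.split() for c in candidates]
    totals = [sum(edit_distance(t, u) for u in tokens) for t in tokens]
    return candidates[totals.index(min(totals))]
```

A medoid can only return something a worker actually typed, so it cannot repair an error that every worker made in a different place, which is one reason more principled aggregation methods are of interest.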

Translated: 2021-07-05 17:41:08 Published: 2021-07-02

# Decision-Making Technology for Autonomous Vehicles: Learning-Based Methods, Applications and Future Outlook ( http://arxiv.org/abs/2107.01110v1 )

License: CC BY 4.0

Qi Liu, Xueyuan Li, Shihua Yuan, Zirui Li

Autonomous vehicles have great potential for applications in both civil and military fields, and have become a focus of research with the rapid development of science and the economy. This article presents a brief review of learning-based decision-making technology for autonomous vehicles, since it is essential for their safe and efficient operation. First, a basic outline of decision-making technology is provided. Second, related work on learning-based decision-making methods for autonomous vehicles is reviewed and compared with classical decision-making methods. In addition, applications of decision-making methods in existing autonomous vehicles are summarized. Finally, promising topics for future research on decision-making technology for autonomous vehicles are outlined.

Translated: 2021-07-05 17:26:04 Published: 2021-07-02

# On Measuring and Controlling the Spectral Bias of the Deep Image Prior ( http://arxiv.org/abs/2107.01125v1 )

License: CC BY 4.0

Zenglin Shi, Pascal Mettes, Subhransu Maji, and Cees G. M. Snoek

The deep image prior has demonstrated that untrained networks have a remarkable ability to address inverse imaging problems, such as denoising, inpainting and super-resolution, by optimizing on just a single degraded image. Despite its promise, it suffers from two limitations. First, it remains unclear how one can control the prior beyond the choice of the network architecture. Second, it requires an oracle to determine when to stop the optimization, as the performance degrades after reaching a peak. In this paper, we study the deep image prior from a spectral bias perspective to address these problems. By introducing a frequency-band correspondence measure, we observe that deep image priors for inverse imaging exhibit a spectral bias during optimization, where low-frequency image signals are learned faster and better than high-frequency noise signals. This pinpoints why degraded images can be denoised or inpainted when the optimization is stopped at the right time. Based on our observations, we propose to control the spectral bias in the deep image prior to prevent performance degradation and to speed up optimization convergence. We do so in the two core layer types of inverse imaging networks: the convolution layer and the upsampling layer. We present a Lipschitz-controlled approach for the convolution and a Gaussian-controlled approach for the upsampling layer. We further introduce a stopping criterion to avoid superfluous computation. The experiments on denoising, inpainting and super-resolution show that our method no longer suffers from performance degradation during optimization, relieving us from the need for an oracle criterion to stop early. Finally, we show that our approach obtains favorable restoration results compared to current approaches, across all tasks.
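How a Gaussian kernel in an upsampling layer can steer the frequency content of its output can be seen in one dimension. The sketch below is an assumption-laden illustration, not the paper's layer: it upsamples by inserting zeros and then filtering with a normalized Gaussian, so that a larger `sigma` attenuates high frequencies more strongly.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized, symmetric 1-D Gaussian weights on [-radius, radius]."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def upsample2x(signal, sigma, radius=3):
    """Insert a zero after every sample, then low-pass with a Gaussian.

    The factor 2 compensates for the energy removed by the inserted zeros.
    """
    stuffed = []
    for sample in signal:
        stuffed += [sample, 0.0]
    kernel = gaussian_kernel(sigma, radius)
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(stuffed):   # zero padding at the borders
                acc += w * stuffed[j]
        out.append(2.0 * acc)
    return out
```

Feeding a rapidly alternating signal through `upsample2x` with a small and a large `sigma` shows the effect: the wider kernel produces a much flatter output.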

Translated: 2021-07-05 17:09:21 Published: 2021-07-02

# Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization ( http://arxiv.org/abs/2107.01131v1 )

License: CC BY 4.0

Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao

Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds through the lens of unnormalized statistical modeling and convex optimization. Our investigation not only yields a new unified theoretical framework encompassing popular variational MI bounds but also leads to a novel, simple, and powerful contrastive MI estimator named FLO. Theoretically, we show that the FLO estimator is tight, and it provably converges under stochastic gradient descent. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
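The abstract does not state the FLO bound itself. What the name points to is Fenchel-Legendre (convex) duality, and the textbook instance for the logarithm is worth writing down, since it turns a logarithm of an expectation-like quantity into an optimization over an auxiliary variable. How exactly FLO applies it is not specified here, so the following is background rather than the paper's derivation.

```latex
% Fenchel-Legendre representation of the logarithm, valid for t > 0:
\[
  \log t \;=\; \min_{u \in \mathbb{R}} \bigl\{ e^{u}\, t - u - 1 \bigr\},
  \qquad \text{attained at } u^{\star} = -\log t .
\]
% Check: \frac{d}{du}\bigl(e^{u} t - u - 1\bigr) = e^{u} t - 1 = 0 gives u = -\log t,
% and substituting back yields 1 + \log t - 1 = \log t.
```

Because the right-hand side is linear in $t$ for fixed $u$, replacing $t$ by a sample average keeps the estimate unbiased for that $u$, which is the usual motivation for this kind of reformulation.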

Translated: 2021-07-05 16:51:11 Published: 2021-07-02

# 4C: A Computation, Communication, and Control Co-Design Framework for CAVs ( http://arxiv.org/abs/2107.01142v1 )

License: CC0 1.0

Liangkai Liu, Shaoshan Liu, and Weisong Shi

Connected and autonomous vehicles (CAVs) are promising due to their potential safety and efficiency benefits and have attracted massive investment and interest from government agencies, industry, and academia. With more computing and communication resources available, both vehicles and edge servers are equipped with a set of camera-based vision sensors, also known as Visual IoT (V-IoT) techniques, for sensing and perception. Tremendous efforts have been made to achieve programmable communication, computation, and control. However, they have been conducted mainly in silos, limiting the responsiveness and efficiency of handling challenging scenarios in the real world. To improve the end-to-end performance, we envision that future CAVs require the co-design of communication, computation, and control. This paper presents our vision of the end-to-end design principle for CAVs, called 4C, which extends the V-IoT system by providing a unified communication, computation, and control co-design framework. With programmable communications, fine-grained heterogeneous computation, and efficient vehicle controls in 4C, CAVs can handle critical scenarios and achieve energy-efficient autonomous driving. Finally, we present several challenges to achieving the vision of the 4C framework.

Translated: 2021-07-05 16:17:21 Published: 2021-07-02

# Collaborative Visual Navigation ( http://arxiv.org/abs/2107.01151v1 )

License: CC BY 4.0

Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, Liwei Wang

As a fundamental problem for Artificial Intelligence, multi-agent systems (MAS) are making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques. However, previous MARL methods largely focused on grid-world-like or game environments; MAS in visually rich environments has remained less explored. To narrow this gap and emphasize the crucial role of perception in MAS, we propose a large-scale 3D dataset, CollaVN, for multi-agent visual navigation (MAVN). In CollaVN, multiple agents are required to cooperatively navigate across photo-realistic environments to reach target locations. Diverse MAVN variants are explored to make our problem more general. Moreover, a memory-augmented communication framework is proposed. Each agent is equipped with a private, external memory to persistently store communication information. This allows agents to make better use of their past communication information, enabling more efficient collaboration and robust long-term planning. In our experiments, several baselines and evaluation metrics are designed. We also empirically verify the efficacy of our proposed MARL approach across different MAVN task settings.

Translated: 2021-07-05 16:06:14 Published: 2021-07-02

# Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE ( http://arxiv.org/abs/2107.01152v1 )

License: CC BY 4.0

Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao

InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel, simple, non-trivial contrastive objective named FlatNCE, which fixes this issue. Unlike InfoNCE, our FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging the classical literature on energy modeling; and empirically, we demonstrate that, with minimal modification of code, FlatNCE enables an immediate performance boost independent of the subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques, and the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.
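The "log-K curse" refers to a well-known property of the InfoNCE estimate: with K samples per batch it can never exceed log K, however good the critic is. The sketch below computes the standard InfoNCE estimate from a K × K score matrix so the ceiling can be observed directly. FlatNCE itself is not implemented here because the abstract does not give its formula.

```python
import math


def info_nce(scores):
    """InfoNCE estimate from scores[i][j] = critic(x_i, y_j).

    Positive pairs sit on the diagonal; the other entries act as negatives.
    """
    k = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        log_mean_exp = math.log(sum(math.exp(s) for s in row) / k)
        total += row[i] - log_mean_exp
    return total / k
```

Even for a critic that separates positives from negatives perfectly, the returned value saturates at log K, so a small batch caps how much mutual information the objective can register.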

Translated: 2021-07-05 15:40:50 Published: 2021-07-02

# Gradient-Leakage Resilient Federated Learning ( http://arxiv.org/abs/2107.01154v1 )

License: CC BY 4.0

Wenqi Wei, Ling Liu, Yanzhao Wu, Gong Su, Arun Iyengar

(参考訳) クライアントはデバイスに機密データを保持でき、ローカルトレーニングパラメータの更新のみをフェデレーションサーバと共有できるため、フェデレーション学習(FL)は、デフォルトのクライアントプライバシを備えた、新興の分散学習パラダイムである。しかし最近の研究では、FLの勾配リークがクライアントのトレーニングデータのプライバシーを損なう可能性があることが示されている。本稿では,feed-cdpと呼ばれるサンプルベースクライアントディファレンシャルプライバシのトレーニング毎に,プライバシ保存型フェデレーション学習に対する勾配漏洩耐性アプローチを提案する。 3つのオリジナル・コントリビューションがある。まず,暗号化されたクライアントサーバ間通信においても,フェデレーション学習における3種類のクライアント勾配漏洩脅威を識別する。我々は、従来のサーバがFed-SDPとよばれる差分プライバシーアプローチが、トレーニングデータのプライバシーを保護するのに不十分な時期と理由を明確に述べる。第2に、サンプルベースのクライアント差分プライバシーアルゴリズムであるFed-CDPを導入し、$(\epsilon, \delta)$差分プライバシー保証によるFed-CDPの形式分析と、プライバシ会計の観点からFed-CDPとFed-SDPの形式比較を提供する。第三に、Fed-CDPによる差分プライバシー保証を提供するためのプライバシユーティリティトレードオフを正式に分析し、Fed-CDPの精度とレジリエンスをさらに向上させる動的減衰ノイズ注入ポリシーを提案する。 Fed-CDPとFed-CDP(decay)を5つのベンチマークデータセットに対して差分プライバシー保証と勾配リークレジリエンスの観点からFed-SDPと比較した。その結果、Fed-CDPアプローチは、クライアント勾配リークに対するレジリエンスの観点から従来のFed-SDPよりも優れており、フェデレート学習における競争精度が向上していることがわかった。

Federated learning (FL) is an emerging distributed learning paradigm with default client privacy because clients can keep sensitive data on their devices and only share local training parameter updates with the federated server. However, recent studies reveal that gradient leakages in FL may compromise the privacy of client training data. This paper presents a gradient-leakage-resilient approach to privacy-preserving federated learning with per-training-example-based client differential privacy, coined as Fed-CDP. It makes three original contributions. First, we identify three types of client gradient leakage threats in federated learning even with encrypted client-server communications. We articulate when and why the conventional server-coordinated differential privacy approach, coined as Fed-SDP, is insufficient to protect the privacy of the training data. Second, we introduce Fed-CDP, the per-example-based client differential privacy algorithm, and provide a formal analysis of Fed-CDP with the $(\epsilon, \delta)$ differential privacy guarantee, and a formal comparison between Fed-CDP and Fed-SDP in terms of privacy accounting. Third, we formally analyze the privacy-utility trade-off for providing a differential privacy guarantee with Fed-CDP and present a dynamic decay noise-injection policy to further improve the accuracy and resiliency of Fed-CDP. We evaluate and compare Fed-CDP and Fed-CDP(decay) with Fed-SDP in terms of differential privacy guarantee and gradient leakage resilience over five benchmark datasets. The results show that the Fed-CDP approach outperforms conventional Fed-SDP in terms of resilience to client gradient leakages while offering competitive accuracy performance in federated learning.
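
The abstract describes per-example client-side differential privacy but does not spell out the algorithm. As a rough illustration of what per-example sanitization usually looks like, here is a generic DP-SGD-style sketch: each example's gradient is clipped to an L2 norm bound and Gaussian noise is added before averaging. It is an assumption that Fed-CDP resembles this; the names `clip_gradient`, `private_mean_gradient`, `clip_norm`, and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    This is the generic Gaussian mechanism on clipped gradients, shown only
    as an illustration; it is not taken from the Fed-CDP paper.
    """
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]
```

A call such as `private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))` would run on the client before any update is shared. A decaying `noise_multiplier` across rounds would be one way to picture the "dynamic decay" policy, though the abstract gives no schedule.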

翻訳日:2021-07-05 15:11:04 公開日:2021-07-02

# (参考訳) モーメントは確率的AUPRC最大化の収束を加速する

Momentum Accelerates the Convergence of Stochastic AUPRC Maximization ( http://arxiv.org/abs/2107.01173v1 )

ライセンス: CC BY 4.0

Guanghui Wang, Ming Yang, Lijun Zhang, Tianbao Yang

(参考訳) 本稿では,不均衡な分類課題に対処するために広く用いられている精度-再現率曲線下面積(AUPRC)の確率的最適化について検討する。 AUPRCの最大化にはいくつかの方法が提案されているが、収束保証付きAUPRCの確率的最適化は未開発領域のままである。最近の研究[42]では、平均精度のサロゲート損失を最大化することに基づく AUPRC に対する有望なアプローチを提案し、非凸目的の$\epsilon$-定常解を求めるための$O(1/\epsilon^5)$複雑性を証明した。本稿では, (i)$\epsilon$-定常解を求めるための反復複雑性を$O(1/\epsilon^4)$に改善した新しい確率運動量法を開発し, (ii)$O(1/\epsilon^4)$と同じ反復複雑性を持つ新しい確率適応手法のファミリーを設計し, 実際により高速な収束を享受することで, AUPRCの確率的最適化をさらに改善する。そこで本研究では,コンバージェンス改善に不可欠な2つの革新的手法を提案する。 (i) 個々のランキングスコアを追跡するバイアス付き推定器をランダムに座標的に更新する, (ii) 目標の勾配を追跡する確率的勾配推定器の上に運動量更新を用いる。様々なデータセットに対する実験により,提案アルゴリズムの有効性が示された。独立した興味として、提案された確率運動量と適応アルゴリズムは、2段階確率依存構成最適化問題のクラスにも適用できる。

In this paper, we study stochastic optimization of the area under the precision-recall curve (AUPRC), which is widely used for combating imbalanced classification tasks. Although a few methods have been proposed for maximizing AUPRC, stochastic optimization of AUPRC with convergence guarantees remains an undeveloped territory. A recent work [42] has proposed a promising approach towards AUPRC based on maximizing a surrogate loss for the average precision, and proved an $O(1/\epsilon^5)$ complexity for finding an $\epsilon$-stationary solution of the non-convex objective. In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice. To this end, we propose two innovative techniques that are critical for improving the convergence: (i) the biased estimators for tracking individual ranking scores are updated in a randomized coordinate-wise manner; and (ii) a momentum update is used on top of the stochastic gradient estimator for tracking the gradient of the objective. Extensive experiments on various data sets demonstrate the effectiveness of the proposed algorithms. Of independent interest, the proposed stochastic momentum and adaptive algorithms are also applicable to a class of two-level stochastic dependent compositional optimization problems.
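
The abstract refers to average precision, a standard estimator of AUPRC. The sketch below computes the plain, non-differentiable average precision from scores and binary labels; it is not the surrogate loss or the momentum method of the paper. The function name `average_precision` is hypothetical, and ties between scores are not treated specially.

```python
def average_precision(scores, labels):
    """Average precision: mean of precision@rank taken at each positive example.

    scores: predicted scores (higher means more likely positive)
    labels: binary ground-truth labels (1 = positive, 0 = negative)
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` items
    return precision_sum / hits if hits else 0.0
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`, the positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2 = 5/6. The dependence on ranks is what makes the metric hard to optimize stochastically and motivates surrogate losses.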

翻訳日:2021-07-05 14:45:19 公開日:2021-07-02

# (参考訳) AIタスクのための倫理シート

Ethics Sheets for AI Tasks ( http://arxiv.org/abs/2107.01183v1 )

ライセンス: CC BY 4.0

Saif M. Mohammad

(参考訳) バイアスド・リシディズム・システムの使用や、脆弱なサブ人口に対する感情認識システムの大量テストなど、いくつかの顕著な出来事は、テクノロジーが既に疎外されている人々にとってより有害な結果をもたらすことを強調している。本稿では,個別のモデルやデータセットのレベルだけでなく,AIタスクのレベルにおいても倫理的考察を考察する。 AIタスクのための倫理シート(Ethics Sheets for AI Tasks)という,タスクの一般的なフレーム化方法や,データやメソッド,評価に関する選択に隠された仮定と倫理的考察の具体化を目的とした,そのような取り組みの新たな形式を紹介します。最後に、自動感情認識のための倫理表の例を挙げる。データセット用のData SheetsとAIシステムのModel Cardsとともに、Ethics Sheetsは、責任あるAIシステムの開発とデプロイを支援する。

Several high-profile events, such as the use of biased recidivism systems and mass testing of emotion recognition systems on vulnerable sub-populations, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. In this paper, I will make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. Finally, I will provide an example ethics sheet for automatic emotion recognition. Together with Data Sheets for datasets and Model Cards for AI systems, Ethics Sheets aid in the development and deployment of responsible AI systems.

翻訳日:2021-07-05 14:11:38 公開日:2021-07-02

# (参考訳) NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

NTIRE 2021 Multi-modal Aerial View Object Classification Challenge ( http://arxiv.org/abs/2107.01189v1 )

ライセンス: CC BY 4.0

Jerrick Liu, Nathan Inkawhich, Oliver Nina, Radu Timofte, Sahil Jain, Bob Lee, Yuru Duan, Wei Wei, Lei Zhang, Songzheng Xu, Yuxuan Sun, Jiaqi Tang, Xueli Geng, Mengru Ma, Gongzhe Li, Xueli Geng, Huanqia Cai, Chengxue Cai, Sol Cummings, Casian Miron, Alexandru Pasarica, Cheng-Yen Yang, Hung-Min Hsu, Jiarui Cai, Jie Mei, Chia-Ying Yeh, Jenq-Neng Hwang, Michael Xin, Zhongkai Shangguan, Zihe Zheng, Xu Yifei, Lehan Yang, Kele Xu, Min Feng

(参考訳) 本稿では,CVPR における NTIRE 2021 ワークショップと合わせて,MAVOC (Multi-modal Aerial View Object Classification) の最初の挑戦を紹介する。この課題は、EOとSAR画像を用いた2つの異なるトラックで構成されている。 EOとSARのセンサーには、それぞれ異なる利点と欠点がある。この競争の目的は、両方の感覚情報を相補的に利用する方法を分析することである。本コンペティションに提案した上位手法について論じ,その成果を盲点テストセットで評価する。我々の挑戦結果は、競技のトラック毎の現在のベースラインから15%以上精度が向上したことを示している。

In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. Both EO and SAR sensors possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition.

翻訳日:2021-07-05 13:59:55 公開日:2021-07-02

# (参考訳) コントラスト学習はいかに不完全か自己監督型ビデオ認識のためのイントラ・イントラ・ヴァリアントデュアル表現法

How Incomplete is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition ( http://arxiv.org/abs/2107.01194v1 )

ライセンス: CC BY 4.0

Lin Zhang, Qi She, Zhengyang Shen, Changhu Wang

(参考訳) 自己指導型表現学習に適用されるコントラスト学習は、深層モデルで復活している。本稿では,自己教師付きビデオ認識のための既存のコントラスト学習ベースのソリューションが,同一ビデオ内のクリップに存在するイントラ分散を無視し,相互分散の符号化に重点を置いていることを見出した。そこで本研究では,各クリップの2つの表現を学習し,シャッフルランクのプリテキストタスクでイントラ分散を符号化し,時間的コヒーレントなコントラスト損失で相互分散を符号化する手法を提案する。実験の結果,本手法は相互および内部分散のバランスをとる上で重要な役割を担っており,複数のバックボーンとコントラスト学習フレームワーク上で一貫したパフォーマンス向上をもたらす。 SimCLR と統合して Kinetics-400 で事前訓練を行い,UCF101 と HMDB51 のテストセットでそれぞれ $\textbf{82.0\%}$ と $\textbf{51.2\%}$ の下流分類精度,UCF101 で $\textbf{46.1\%}$ の動画検索精度を達成した。

Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip which (i) encode intra-variance through a shuffle-rank pretext task; and (ii) encode inter-variance through a temporal coherent contrastive loss. Experimental results show that our method plays an essential role in balancing inter and intra variances and brings consistent performance gains on multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves $\textbf{82.0\%}$ and $\textbf{51.2\%}$ downstream classification accuracy on UCF101 and HMDB51 test sets respectively and $\textbf{46.1\%}$ video retrieval accuracy on UCF101, outperforming both pretext-task based and contrastive learning based counterparts.

翻訳日:2021-07-05 13:37:09 公開日:2021-07-02

# アクショントランスフォーマー : 短時間行動認識のためのセルフアテンションモデル

Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition ( http://arxiv.org/abs/2107.00606v2 )

ライセンス: Link先を確認

Vittorio Mazzia, Simone Angarano, Francesco Salvetti, Federico Angelini and Marcello Chiaberge

(参考訳) 純粋に注意に基づくディープニューラルネットワークは、設計者による最小限のアーキテクチャ優先に依存しているため、いくつかのドメインで成功を収めている。人間行動認識(har)では、注意機構は主に標準畳み込み層や再帰層の上に採用され、全体的な一般化能力が向上している。本研究では,畳み込み層,リカレント層,注意層を混合するより精巧なネットワークを一貫して上回る,単純で完全な自己完結型アーキテクチャであるaction transformer(act)を導入する。従来のヒューマンアクション認識研究に基づいて,計算とエネルギーの要求を制限するため,提案手法では2次元ポーズ表現を小さな時間窓上で活用し,高精度かつ効果的なリアルタイム性能を実現するための低レイテンシソリューションを提供する。さらに、リアルタイムな短時間の人行動認識のための正式なトレーニングと評価ベンチマークを構築するために、新しい大規模データセットであるMPOSE2021をオープンソース化した。 MPOSE2021の大規模実験は,提案手法と,それ以前のアーキテクチャソリューションにより,AcTモデルの有効性が証明され,今後のHAR研究の基盤となる。

Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent, and attentive layers. In order to limit computational and energy requirements, building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time short-time human action recognition. Extensive experimentation on MPOSE2021 with our proposed methodology and several previous architectural solutions proves the effectiveness of the AcT model and lays the foundation for future work on HAR.

翻訳日:2021-07-05 13:07:08 公開日:2021-07-02

# 合成データは多目的追跡における関連知識学習の現実に匹敵する

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking ( http://arxiv.org/abs/2106.16100v2 )

ライセンス: Link先を確認

Yuchi Liu, Zhongdao Wang, Xiangxin Zhou and Liang Zheng

(参考訳) 同じアイデンティティのバウンディングボックスをビデオシーケンスでリンクすることを目的としたアソシエーションは、マルチオブジェクトトラッキング(mot)の中心的なコンポーネントである。パラメトリックネットワークなどのアソシエーションモジュールをトレーニングするために、実際のビデオデータが通常使用される。しかし、連続するビデオフレームで人物のトラックをアノテートすることは高価であり、そのような実際のデータは柔軟性がないため、追跡シナリオを変更するシステム性能w.r.tを評価する機会が限られている。本稿では,3次元合成データが実世界の映像を連想訓練に置き換えられるかどうかについて検討する。具体的には,MOTXと呼ばれる大規模合成データエンジンを導入し,カメラや物体の運動特性を実世界のデータセットに類似するように手動で設定する。実データと比較すると,合成データから得られる連想知識は,ドメイン適応手法を使わずに実世界のテストセットで非常によく似た性能が得られることを示す。私たちの興味深い観察には2つの要因がある。第一に、3Dエンジンは、カメラの動き、カメラの視界、物体の動きなどの動きをうまくシミュレートすることができ、シミュレートされたビデオは、効果的なモーション特徴を持つアソシエーションモジュールを提供することができる。第2に, 出現領域のギャップが連想知識の学習にほとんど影響を与えないことを示す実験結果が得られた。さらに、MOTXの強力なカスタマイズ能力により、MOTに対する運動要因の影響を定量的に評価することが可能となり、コミュニティに新たな洞察がもたらされる。

Association, aiming to link bounding boxes of the same identity in a video sequence, is a central component in multi-object tracking (MOT). To train association modules, e.g., parametric networks, real video data are usually used. However, annotating person tracks in consecutive video frames is expensive, and such real data, due to their inflexibility, offer us limited opportunities to evaluate the system performance with respect to changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those in real-world datasets. We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques. Our intriguing observation is credited to two factors. First and foremost, 3D engines can simulate motion factors such as camera movement, camera view and object movement well, so that the simulated videos can provide association modules with effective motion features. Second, experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.

翻訳日:2021-07-05 13:06:47 公開日:2021-07-02

# SocialAI: 深層強化学習エージェントにおける社会認知能力のベンチマーク

SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents ( http://arxiv.org/abs/2107.00956v1 )

ライセンス: Link先を確認

Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

(参考訳) 人間との社会的相互作用に参加することができる、具体化された自律エージェントを構築することは、AIの主要な課題の1つだ。深層強化学習(Dep Reinforcement Learning, DRL)分野において、この目的は具体的言語使用に関する複数の研究を動機づけた。しかし、現在のアプローチでは、非常にシンプルで多様でない社会状況におけるコミュニケーションツールとしての言語に焦点が当てられている: 言語の「自然性」は、高い語彙サイズと可変性の概念に還元される。本稿では,人間レベルのAIを目指すためには,1)複雑で可変な社会的文脈における言語の使用,2)常に進化する社会世界におけるマルチモーダル環境における複雑な具体的コミュニケーションなど,より広範な社会スキルのセットが必要であることを論じる。認知科学の概念は、AIが人間のような知性に向けてロードマップを描き出すのにどう役立つかを説明します。最初のステップとして、現在の研究をより広範なソーシャルスキルのセットに拡大することを提案する。そこで我々は,他の(記述された)ソーシャルエージェントを特徴とする複数のグリッドワールド環境を用いて,DRLエージェントの社会的スキル獲得を評価するベンチマークであるSocialAIを提案する。次に,最近のsota drlアプローチの限界をsocialai上で検証し,次の社会的エージェントへの重要なステップについて論じる。ビデオとコードはhttps://sites.google.com/view/socialaiで入手できる。

Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai.

翻訳日:2021-07-05 13:05:53 公開日:2021-07-02

# ロバストな医用画像分割のための協調訓練と潜時空間データ増強

Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation ( http://arxiv.org/abs/2107.01079v1 )

ライセンス: Link先を確認

Chen Chen, Kerstin Hammernik, Cheng Ouyang, Chen Qin, Wenjia Bai, Daniel Rueckert

(参考訳) ディープラーニングベースのセグメンテーション手法は、例えばデプロイメント中に予期せぬデータ分散シフトに対して脆弱である。異なるスキャナー、予期しない画像アーティファクトなどによる画像の外観やコントラストの変化。本稿では,画像分割モデルの学習のための協調フレームワークと,実例生成のための潜在空間拡張手法を提案する。どちらの貢献も限られたデータでモデルの一般化と堅牢性を改善する。協調トレーニングフレームワークは、高速思考ネットワーク(FTN)と低速思考ネットワーク(STN)で構成されている。 FTNは、画像再構成とセグメンテーションタスクのための分離された画像特徴と形状特徴を学習する。 STNは、セグメンテーション補正と精錬のための形状前処理を学習する。 2つのネットワークは協調的に訓練されている。潜時空間増強は、チャネルワイドおよび空間ワイドの両方で分離された潜時空間をマスキングすることで、困難な訓練例を生成する。公開心画像データセットについて広範な実験を行った。訓練対象は1つのサイトから10名に過ぎず,強いベースライン法に比べ,クロスサイトセグメンテーション性能が向上し,様々な予期せぬ画像アーチファクトに対するロバスト性が向上した。特に、潜在空間データ拡張による協調訓練は、標準訓練法と比較して平均サイコロスコアで15%向上する。

Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment, e.g., changes in image appearance or contrast caused by different scanners, unexpected imaging artifacts, etc. In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples. Both contributions improve model generalization and robustness with limited data. The cooperative training framework consists of a fast-thinking network (FTN) and a slow-thinking network (STN). The FTN learns decoupled image features and shape features for image reconstruction and segmentation tasks. The STN learns shape priors for segmentation correction and refinement. The two networks are trained in a cooperative manner. The latent space augmentation generates challenging examples for training by masking the decoupled latent space in both channel-wise and spatial-wise manners. We performed extensive experiments on public cardiac imaging datasets. Using only 10 subjects from a single site for training, we demonstrated improved cross-site segmentation performance and increased robustness against various unforeseen imaging artifacts compared to strong baseline methods. In particular, cooperative training with latent space data augmentation yields a 15% improvement in terms of average Dice score when compared to a standard training method.
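
The reported improvement is measured by the Dice score. As a reminder of what that metric computes, here is a minimal sketch for binary masks flattened to lists of 0/1 values. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration, not details from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0
```

For example, `dice_score([1, 1, 0, 0], [1, 0, 1, 0])` has one overlapping pixel out of two in each mask, giving 2 * 1 / 4 = 0.5.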

翻訳日:2021-07-05 13:05:31 公開日:2021-07-02

# Deep Metric Learning 法の一般化性向上のための損失関数の組付け

Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods ( http://arxiv.org/abs/2107.01130v1 )

ライセンス: Link先を確認

Davood Zabihzadeh

(参考訳) Deep Metric Learning (DML)は入力データから非線形セマンティック埋め込みを学び、類似したペアをまとめながら、異なるデータを互いに遠ざけ合う。この目的のために、様々な応用において有望な結果をもたらす様々な方法が過去10年間に提案されている。 DMLアルゴリズムの成功は、その損失関数に大きく依存する。しかし、損失関数は完全ではなく、最適な類似性の埋め込みのいくつかの側面のみを扱う。さらに、テスト段階における見えないカテゴリに対するDMLの一般化性は、既存の損失関数では考慮されない重要な問題である。これらの課題に対処するために,共有機能抽出器上に構築された異なる損失を組み合わせ,新しい手法を提案する。提案された損失の集合は、すべての損失と一致する特徴を抽出するディープモデルを強制する。選択された損失は多種多様であり,それぞれが最適セマンティック埋め込みの異なる側面を強調しているため,有効結合法は個々の損失に対して著しく改善され,目に見えないカテゴリをうまく一般化する。ここでは、損失関数の選択には制限がなく、我々のメソッドは既存の関数の任意のセットで動作する。さらに、各損失関数とその重みを、ハイパーパラメータを調整する必要なく、エンドツーエンドのパラダイムで最適化することもできる。従来のゼロショット学習(zsl)設定において,マシンビジョン領域から一般的なデータセットを評価する。その結果,本手法がすべてのデータセットにおいて,ベースラインの損失をはるかに上回っていることが明らかとなった。

Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeping dissimilar data away from each other. To this end, many different methods have been proposed over the last decade with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect, and each deals only with some aspects of an optimal similarity embedding. Besides, the generalizability of DML to unseen categories during the test stage is an important matter that is not considered by existing loss functions. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses forces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well to unseen categories. Here, there is no limitation in choosing loss functions, and our methods can work with any set of existing ones. Besides, they can optimize each loss function as well as its weight in an end-to-end paradigm with no need to adjust any hyper-parameter. We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin in all datasets.

翻訳日:2021-07-05 13:05:11 公開日:2021-07-02

# 映像における視覚関係予測

Visual Relationship Forecasting in Videos ( http://arxiv.org/abs/2107.01181v1 )

ライセンス: Link先を確認

Li Mi, Yangjun Ou, Zhenzhong Chen

(参考訳) 現実世界のシナリオは、しばしば未知の未来のオブジェクトインタラクションの予測を必要とし、人間とエージェントの両方の意思決定プロセスを支援する。この課題に対処するため,視覚関係予測(Visual Relationship Forecasting: VRF)というタスクをビデオに提示し,視覚関係の予測を推論的に検討する。具体的には、既存のHフレームを持つ主語-目的語ペアが与えられた場合、VRFは視覚的証拠なしで次のTフレームに対する将来の相互作用を予測することを目的としている。 VRFタスクを評価するために,VRF-AGとVRF-VidORという2つのビデオデータセットを紹介した。これらの2つのデータセットは、それぞれ1923本と13447本のビデオクリップで13と35の視覚関係を密に注釈している。さらに、時空間グラフ畳み込みネットワークとトランスフォーマーによってオブジェクトレベルとフレームレベルの依存関係をキャプチャする新しいグラフ畳み込みトランスフォーマ(GCT)フレームワークを提案する。 VRF-AGデータセットとVRF-VidORデータセットの両方の実験結果から、GCTは視覚関係予測における最先端のシーケンスモデリング手法よりも優れていることが示された。

Real-world scenarios often require the anticipation of object interactions in the unknown future, which would assist the decision-making process of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. Specifically, given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence. To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video. These two datasets densely annotate 13 and 35 visual relationships in 1923 and 13447 video clips, respectively. In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies by a spatio-temporal Graph Convolution Network and a Transformer. Experimental results on both VRF-AG and VRF-VidOR datasets demonstrate that GCT outperforms the state-of-the-art sequence modelling methods on visual relationship forecasting.

翻訳日:2021-07-05 13:04:51 公開日:2021-07-02

# Transformer-F: 普遍的な文表現の学習に有効なトランスフォーマーネットワーク

Transformer-F: A Transformer network with effective methods for learning universal sentence representation ( http://arxiv.org/abs/2107.00653v1 )

ライセンス: Link先を確認

Yu Shi

(参考訳) Transformerモデルは、自然言語処理で文表現に広く使われている。しかし、以前のトランスフォーマーベースのモデルは、たいていの場合、限定的な意味を持ち、単に高レベルな意味抽象機能を抽出できる関数ワードに焦点を当てていた。本稿では,トランスフォーマーの性能向上のための2つの手法を提案する。注意度を相関係数と重みベクトルを乗じることで算出し,より実用的な意味を持つ単語の抽出に寄与した。重みベクトルは、音声部分の重要性に基づいて入力テキストシーケンスによって得られる。さらに,各層の特徴を融合させて文表現結果をより包括的かつ正確にする。実験では、3つの標準テキスト分類データセットに対するモデルtransformer-fの有効性を示す。実験の結果,提案モデルがベースラインモデルと比較してテキスト分類の性能を著しく向上させることがわかった。具体的には,簡単な作業でバニラ変圧器を5.28%向上させた。

The Transformer model is widely used in natural language processing for sentence representation. However, the previous Transformer-based models focus on function words that have limited meaning in most cases and could merely extract high-level semantic abstraction features. In this paper, two approaches are introduced to improve the performance of Transformers. We calculated the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract the words with more practical meaning. The weight vector is obtained by the input text sequence based on the importance of the part-of-speech. Furthermore, we fuse the features of each layer to make the sentence representation results more comprehensive and accurate. In experiments, we demonstrate the effectiveness of our model Transformer-F on three standard text classification datasets. Experimental results show that our proposed model significantly boosts the performance of text classification as compared to the baseline model. Specifically, we obtain a 5.28% relative improvement over the vanilla Transformer on the simple tasks.

翻訳日:2021-07-05 13:04:32 公開日:2021-07-02

# Video Captionsを用いたYouTube上の誤情報検出

Misinformation Detection on YouTube Using Video Captions ( http://arxiv.org/abs/2107.00941v1 )

ライセンス: Link先を確認

Raj Jagtap, Abhinav Kumar, Rahul Goel, Shakshi Sharma, Rajesh Sharma, Clint P. George

(参考訳) 何百万人もの人々がYouTube、Facebook、Twitterなどのマスメディアを利用している。これらのプラットフォームへのアクセシビリティのため、物語を確立し、プロパガンダを行い、誤情報を広めるためにしばしば使用される。本研究では,最新のNLP技術を用いて映像キャプション(字幕)から特徴を抽出する手法を提案する。提案手法を評価するために,動画を誤情報か否かを分類するために,公開アクセス可能なラベル付きデータセットを用いた。ビデオキャプションを探索する動機は、ビデオメタデータの分析にある。ビュー数、いいね!、嫌い、コメントなどの属性は、ビデオがこの情報を使って区別することが難しいため、効果がない。提案手法では,キャプションデータセットを用いて動画を3クラス(誤情報,誤情報の反証,中立)に0.85から0.90のF1-scoreで分類できる。誤情報クラスの関連性を強調するため,我々はこの分類問題を,誤情報と他者(誤情報の反証と中立)の2類分類として再定式化する。実験では,提案モデルは0.92から0.95のF1-score,0.78から0.90のAUC ROCで動画を分類できる。

Millions of people use platforms such as YouTube, Facebook, Twitter, and other mass media. Due to the accessibility of these platforms, they are often used to establish a narrative, conduct propaganda, and disseminate misinformation. This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles). To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not. The motivation behind exploring video captions stems from our analysis of video metadata. Attributes such as the number of views, likes, dislikes, and comments are ineffective as videos are hard to differentiate using this information. Using the caption dataset, the proposed models can classify videos among three classes (Misinformation, Debunking Misinformation, and Neutral) with 0.85 to 0.90 F1-score. To emphasize the relevance of the misinformation class, we re-formulate our classification problem as a two-class classification: Misinformation vs. others (Debunking Misinformation and Neutral). In our experiments, the proposed models can classify videos with 0.92 to 0.95 F1-score and 0.78 to 0.90 AUC ROC.

翻訳日:2021-07-05 13:03:58 公開日:2021-07-02

# r2d2:階層型言語モデルのための微分木に基づく再帰的トランスフォーマ

R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling ( http://arxiv.org/abs/2107.00967v1 )

ライセンス: Link先を確認

Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo

(参考訳) 人間の言語理解は、階層的に結合できる抽象レベルの増加とともに、複数のレベルの粒度(例えば、単語、句、文)で機能する。しかし、スタック層を持つ既存の深層モデルは、いかなる階層的プロセスも明示的にモデル化しない。本稿では、構成過程をエミュレートするために、微分可能なCKYスタイルのバイナリツリーに基づく再帰変換モデルを提案する。我々は、双方向言語モデルの事前学習目標をこのアーキテクチャに拡張し、左右の抽象ノードから各単語を予測することを試みる。また,本手法を大規模化するために,合成ステップの線形数だけを符号化する効率的な伐採木誘導アルゴリズムを導入する。言語モデルと教師なし構文解析の実験結果は,提案手法の有効性を示している。

Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.

翻訳日:2021-07-05 13:03:38 公開日:2021-07-02

# コード混合BERTを用いたヒンディー語ツイートの言語識別

Language Identification of Hindi-English tweets using code-mixed BERT ( http://arxiv.org/abs/2107.01202v1 )

ライセンス: Link先を確認

Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim

(参考訳) 近年,ソーシャルメディアのテキストの言語識別は興味深い研究課題となっている。ソーシャルメディアのメッセージは、主に英語以外の国で混在している。文脈埋め込みの事前学習による事前知識は、下流タスクにおけるアート結果の状態を示している。近年、BERTのようなモデルでは、大量のラベルのないデータを使用することで、事前訓練された言語モデルは共通の言語表現を学習するのにさらに有益であることが示されている。本稿では,移動学習と細調整BERTモデルを用いたTwitter上での言語識別実験について述べる。この研究は、ヒンディー語-英語-ウルドゥー語のコード混合テキストのデータ収集を言語事前学習に用い、ヒンディー語-英語コード混合を後続の単語レベルの言語分類に用いている。その結果、コードミックスデータ上で事前学習された表現は、モノリンガルデータによるより良い結果をもたらすことがわかった。

Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly code-mixed in non-English-speaking states. Prior knowledge from pre-training contextual embeddings has shown state-of-the-art results for a range of downstream tasks. Recently, models such as BERT have shown that, using a large amount of unlabeled data, pretrained language models are even more beneficial for learning common language representations. Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper. The work utilizes a data collection of Hindi-English-Urdu code-mixed text for language pre-training and Hindi-English code-mixed text for subsequent word-level language classification. The results show that the representations pre-trained over code-mixed data produce better results than their monolingual counterpart.

翻訳日:2021-07-05 13:03:25 公開日:2021-07-02

# ブリッジングジェネリックと個人化フェデレーション学習について

On Bridging Generic and Personalized Federated Learning ( http://arxiv.org/abs/2107.00778v1 )

ライセンス: Link先を確認

Hong-You Chen, Wei-Lun Chao

(参考訳) フェデレーション学習(federated learning)は、データにアクセスせずに複数のクライアントと共同でモデルをトレーニングできることを約束している。学習したモデルの一般的なパフォーマンス(サーバでの将来の使用のために)や、パーソナライズされたパフォーマンス(各クライアントのために)を優先すべきか? これら2つの競合しているように見える目標が,コミュニティを分割して一方に注目する一方で,本論文では,両アプローチが同時に可能であることを示す。具体的には,モデルの2つの義務を2つの予測タスクで明確に分離する,新しいフェデレーション学習フレームワークを提案する。一方で,非同一のクラス分布に対して頑健な損失のファミリーを導入することで,クライアントが相互に一貫した目標を持った汎用予測子をトレーニングできる。一方、パーソナライズされた予測器を軽量適応モジュールとして定式化し、汎用予測器上で各クライアントの経験的リスクを最小限に抑えることを学習する。 Federated Robust Decoupling Fed-RoDと名付けられたこの2つの余分な2つの予測フレームワークによって、学習モデルは、最先端の汎用的かつパーソナライズされたパフォーマンスを同時に達成することができ、基本的に2つのタスクをブリッジします。

Federated learning is promising for its ability to collaboratively train models with multiple clients without accessing their data, but vulnerable when clients' data distributions diverge from each other. This divergence further leads to a dilemma: "Should we prioritize the learned model's generic performance (for future use at the server) or its personalized performance (for each client)?" These two seemingly competing goals have divided the community to focus on one or the other, yet in this paper we show that it is possible to approach both at the same time. Concretely, we propose a novel federated learning framework that explicitly decouples a model's dual duties with two prediction tasks. On the one hand, we introduce a family of losses that are robust to non-identical class distributions, enabling clients to train a generic predictor with a consistent objective across them. On the other hand, we formulate the personalized predictor as a lightweight adaptive module that is learned to minimize each client's empirical risk on top of the generic predictor. With this two-loss, two-predictor framework, which we name Federated Robust Decoupling (Fed-RoD), the learned model can simultaneously achieve state-of-the-art generic and personalized performance, essentially bridging the two tasks.

翻訳日:2021-07-05 13:02:56 公開日:2021-07-02

# 後方対応型予測更新:確率論的アプローチ

Backward-Compatible Prediction Updates: A Probabilistic Approach ( http://arxiv.org/abs/2107.01057v1 )

ライセンス: Link先を確認

Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler

(参考訳) 機械学習システムが現実世界のアプリケーションに適合する場合、精度はいくつかの要件の1つに過ぎません。本稿では,事前学習および定期的な最先端モデルの改善による相補的視点について検討する。新しい改善されたモデルは速いペースで進化するが、下流のタスクはよりゆっくりと変化するか、一定である。正確な予測を維持したいという、大きなラベルのないデータセットがあると仮定する。 i) 予算が限られている場合、どのデータポイントが新しいモデルで再評価されるべきか? と (ii) もし新しい予測が現在の予測と違うなら、更新すべきだろうか? 問題 (i) は計算コストであり、非常に大きなデータセットとモデルにとって重要である。問題 (ii) は予測の整合性を維持することであり、これは下流のアプリケーションに非常に関係がある。本稿では,予測更新問題を定式化し,上記の問題に対する効率的な確率的アプローチを提案する。標準分類ベンチマークデータセットの広範な実験において,提案手法は後方互換性のある予測更新のための重要な指標に沿って,代替戦略よりも優れていることを示す。

When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly improving state-of-the-art models. While new improved models develop at a fast pace, downstream tasks vary more slowly or stay constant. Assume that we have a large unlabelled data set for which we want to maintain accurate predictions. Whenever a new and presumably better ML model becomes available, we encounter two problems: (i) given a limited budget, which data points should be re-evaluated using the new model?; and (ii) if the new predictions differ from the current ones, should we update? Problem (i) is about compute cost, which matters for very large data sets and models. Problem (ii) is about maintaining consistency of the predictions, which can be highly relevant for downstream applications; our demand is to avoid negative flips, i.e., changing correct to incorrect predictions. In this paper, we formalize the Prediction Update Problem and present an efficient probabilistic approach as an answer to the above questions. In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.
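
The "negative flip" notion in the abstract above can be made concrete with a small sketch. The function name `negative_flip_rate` and the toy labels are illustrative assumptions, not taken from the paper; the paper's probabilistic update strategy is not reproduced here.

```python
def negative_flip_rate(old_preds, new_preds, labels):
    """Fraction of points whose prediction changes from correct (old) to incorrect (new)."""
    if not (len(old_preds) == len(new_preds) == len(labels)):
        raise ValueError("all inputs must have the same length")
    if not labels:
        return 0.0
    flips = sum(
        1
        for old, new, y in zip(old_preds, new_preds, labels)
        if old == y and new != y  # correct before, wrong after
    )
    return flips / len(labels)


# Toy data (hypothetical): one of five points is a negative flip (index 1).
labels = [0, 1, 1, 0, 2]
old_preds = [0, 1, 0, 0, 1]
new_preds = [0, 0, 1, 0, 2]
rate = negative_flip_rate(old_preds, new_preds, labels)
```

Note that the new model here is more accurate overall (4 of 5 correct versus 3 of 5) and still introduces a negative flip, which is the situation the abstract is concerned with.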

翻訳日:2021-07-05 13:02:32 公開日:2021-07-02

# CHISEL: 深層学習による屋内局所化の精度向上

CHISEL: Compression-Aware High-Accuracy Embedded Indoor Localization with Deep Learning ( http://arxiv.org/abs/2107.01192v1 )

ライセンス: Link先を確認

Liping Wang, Saideep Tiku, Sudeep Pasricha

(参考訳) GPS技術は、私たちが屋外でローカライズし、ナビゲートする方法に革命をもたらした。しかし、建物内のGPS信号の受信が貧弱なため、屋内でのローカライゼーションには適さない。 WiFi指紋認証による屋内位置特定は、この需要を満たす最も有望な方法の1つだ。残念なことに、ドメイン内のほとんどの作業は、リソース制限された組み込みデバイスへのデプロイ可能性に関する課題を解決できない。そこで本研究では,組込みデバイスにおけるローカライズロバスト性を維持しつつ,その領域でよく知られた作業より優れる圧縮認識・高精度深層学習フレームワークCHISELを提案する。

GPS technology has revolutionized the way we localize and navigate outdoors. However, the poor reception of GPS signals in buildings makes it unsuitable for indoor localization. WiFi fingerprinting-based indoor localization is one of the most promising ways to meet this demand. Unfortunately, most work in the domain fails to resolve challenges associated with deployability on resource-limited embedded devices. In this work, we propose a compression-aware and high-accuracy deep learning framework called CHISEL that outperforms the best-known works in the area while maintaining localization robustness on embedded devices.

翻訳日:2021-07-05 13:02:16 公開日:2021-07-02

# 相対密度比推定のためのメタラーニング

Meta-Learning for Relative Density-Ratio Estimation ( http://arxiv.org/abs/2107.00801v1 )

ライセンス: Link先を確認

Atsutoshi Kumagai and Tomoharu Iwata and Yasuhiro Fujiwara

(参考訳) 密度比と呼ばれる2つの確率密度の比率は、機械学習において重要な量である。特に、密度比の有界拡大である相対密度比は、その安定性から多くの注目を集めており、外乱検出やデータセット比較といった様々な用途で利用されてきた。相対密度比推定(DRE)の既存の方法は、両方の密度から多くのインスタンスを必要とする。しかし、実際には十分なインスタンスは利用できないことが多い。本稿では,関係データセットの知識を用いて,少数の事例から相対密度比を推定する,相対DREのメタラーニング手法を提案する。具体的には、いくつかのインスタンスからなる2つのデータセットを与えられた場合、ニューラルネットワークを用いてデータセットの情報を抽出し、相対dreに適したインスタンス埋め込みを得る。我々は,大域的最適解を閉形式解として得られる埋め込み空間上の線形モデルを用いて相対密度比をモデル化する。クローズドフォームソリューションはいくつかのインスタンスへの高速かつ効果的な適応を可能にし、その微分可能性により、相対的なDREに対するテストエラーが、少数のインスタンスに適応した後、明示的に最小化できるようにモデルを訓練することができる。提案手法の有効性を,相対的DRE,データセット比較,外乱検出の3つの問題を用いて実証的に実証した。

The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, a relative density-ratio, which is a bounded extension of the density-ratio, has received much attention due to its stability and has been used in various applications such as outlier detection and dataset comparison. Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities. However, sufficient instances are often unavailable in practice. In this paper, we propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets. Specifically, given two datasets that consist of a few instances, our model extracts the datasets' information by using neural networks and uses it to obtain instance embeddings appropriate for the relative DRE. We model the relative density-ratio by a linear model on the embedded space, whose global optimum solution can be obtained as a closed-form solution. The closed-form solution enables fast and effective adaptation to a few instances, and its differentiability enables us to train our model such that the expected test error for relative DRE can be explicitly minimized after adapting to a few instances. We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.
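
As a rough illustration of why the relative density-ratio is called a bounded extension, the sketch below uses the standard definition r_alpha(x) = p(x) / (alpha p(x) + (1 - alpha) q(x)), which is at most 1/alpha for alpha > 0 and reduces to p(x)/q(x) at alpha = 0. The Gaussian densities and the value of `alpha` are assumptions chosen for the example; the meta-learning model itself is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def relative_density_ratio(p_x, q_x, alpha):
    """Relative density-ratio p / (alpha * p + (1 - alpha) * q), with 0 <= alpha < 1."""
    return p_x / (alpha * p_x + (1.0 - alpha) * q_x)


alpha = 0.5
xs = [-4.0, -1.0, 0.0, 1.0, 4.0]
ratios = [
    relative_density_ratio(gaussian_pdf(x, 0.0, 1.0), gaussian_pdf(x, 1.0, 0.5), alpha)
    for x in xs
]
# Every value stays at or below 1 / alpha, even where q(x) is nearly zero.
upper_bound = 1.0 / alpha
```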

翻訳日:2021-07-05 13:01:36 公開日:2021-07-02

# 教師なし特徴選択のための少数ショット学習

Few-shot Learning for Unsupervised Feature Selection ( http://arxiv.org/abs/2107.00816v1 )

ライセンス: Link先を確認

Atsutoshi Kumagai and Tomoharu Iwata and Yasuhiro Fujiwara

(参考訳) そこで本稿では,ラベル付きデータに含まれる特徴のサブセットを選択するタスクである,教師なし特徴選択のための数ショット学習手法を提案する。既存のメソッドは通常、機能選択に多くのインスタンスを必要とする。しかし、実際には十分なインスタンスは利用できないことが多い。提案手法では,複数のソースタスクでラベルなしインスタンスをトレーニングすることにより,いくつかのラベルなしターゲットインスタンスが与えられた場合,対象タスクの関連機能のサブセットを選択できる。我々のモデルは特徴セレクタとデコーダで構成される。特徴セレクタは、いくつかの未ラベルのインスタンスを入力として取り込んだ関連する機能のサブセットを出力し、デコーダは選択したインスタンスから未表示のインスタンスのオリジナル機能を再構築することができる。特徴セレクタは、具体的ランダム変数を使用して、勾配降下による特徴を選択する。いくつかのラベルなしインスタンスからモデルにタスク固有の特性をエンコードするために、いくつかのラベルなしインスタンスを入力とする置換不変ニューラルネットワークを用いて具体的確率変数とデコーダをモデル化する。私たちのモデルは、ソースタスクのデータセットで計算されたいくつかのラベルなしインスタンスに対して、期待されるテスト再構成エラーを最小化することでトレーニングされます。提案手法が既存の特徴選択法より優れていることを示す。

We propose a few-shot learning method for unsupervised feature selection, which is a task to select a subset of relevant features in unlabeled data. Existing methods usually require many instances for feature selection. However, sufficient instances are often unavailable in practice. The proposed method can select a subset of relevant features in a target task given a few unlabeled target instances by training with unlabeled instances in multiple source tasks. Our model consists of a feature selector and decoder. The feature selector outputs a subset of relevant features taking a few unlabeled instances as input such that the decoder can reconstruct the original features of unseen instances from the selected ones. The feature selector uses the Concrete random variables to select features via gradient descent. To encode task-specific properties from a few unlabeled instances to the model, the Concrete random variables and decoder are modeled using permutation-invariant neural networks that take a few unlabeled instances as input. Our model is trained by minimizing the expected test reconstruction error given a few unlabeled instances that is calculated with datasets in source tasks. We experimentally demonstrate that the proposed method outperforms existing feature selection methods.

翻訳日:2021-07-05 13:01:16 公開日:2021-07-02

# 視覚モデルに基づく強化学習における因果発見の体系的評価

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning ( http://arxiv.org/abs/2107.00848v1 )

ライセンス: Link先を確認

Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

(参考訳) 観察から因果関係を誘導することは機械学習の古典的な問題である。ほとんどの因果関係の研究は、因果変数自体が観察されるという前提から始まる。しかし、ロボットのようなAIエージェントが環境を理解しようとする場合、観測可能な変数は画像中のピクセルのような低レベル変数のみである。適切に一般化するには、エージェントは高レベルの変数、特に因果変数に影響を受ける変数を誘導する必要がある。 aiと因果関係の中心的な目標は、抽象表現と因果構造の共同発見である。しかし,因果誘導を研究する既存の環境は,パラメトリックに操作できない複雑なタスク固有の因果グラフ(ノード数,スパーシティ,因果連鎖長など)を持っているため,この目的には適さない。本研究の目的は,高レベル変数の表現とそれらの間の因果構造を学ぶ研究を促進することである。これらの変数や構造を同定する手法を体系的に探索するために,我々はRL環境のベンチマークスイートを設計する。本研究は,様々な表現学習アルゴリズムを文献から評価し,モデルに構造とモジュラリティを明示的に組み込むことが,モデルに基づく強化学習における因果的帰納に役立つことを見出した。

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

翻訳日:2021-07-05 13:00:57 公開日:2021-07-02

# DeformRS: ランダムな平滑化による入力変形の認証

DeformRS: Certifying Input Deformations with Randomized Smoothing ( http://arxiv.org/abs/2107.00996v1 )

ライセンス: Link先を確認

Motasem Alfarra, Adel Bibi, Naeemullah Khan, Philip H. S. Torr, and Bernard Ghanem

(参考訳) 深層ニューラルネットワークは、画素変位のベクトル場の形での入力変形や、他のパラメータ化された幾何学的変形に弱い。翻訳、回転など現在の入力変形認証法は、(i)大きな入力データセット上のディープネットワークにスケールしないか、(ii)特定の種類の変形を認証できないかのいずれかである。回転だけだ一般ベクトル場およびパラメータ化変形のランダムな平滑化設定における認証を再構成し,DeformRS-VFとDeformRS-Parを提案する。我々の新しい定式化は、大きな入力データセット上の大きなネットワークにスケールする。例えば、DeformRS-Parは豊富な変形、翻訳、回転、スケーリング、アフィン変形、その他の視覚的に整列した変形、例えば離散コサイン変換によってパラメータ化された変形を認証する。 MNIST、CIFAR10、ImageNetの大規模な実験により、DeformRS-Parは、認証された精度で既存の最先端技術よりも優れていることが示された。 imagenetの[10,10]度での摂動回転に対する認証精度が6%向上した。

Deep neural networks are vulnerable to input deformations in the form of vector fields of pixel displacements and to other parameterized geometric deformations, e.g. translations, rotations, etc. Current input deformation certification methods either (i) do not scale to deep networks on large input datasets, or (ii) can only certify a specific class of deformations, e.g. only rotations. We reformulate certification in the randomized smoothing setting for both general vector field and parameterized deformations and propose DeformRS-VF and DeformRS-Par, respectively. Our new formulation scales to large networks on large input datasets. For instance, DeformRS-Par certifies rich deformations, covering translations, rotations, scaling, affine deformations, and other visually aligned deformations such as ones parameterized by a Discrete-Cosine-Transform basis. Extensive experiments on MNIST, CIFAR10 and ImageNet show that DeformRS-Par outperforms the existing state of the art in certified accuracy, e.g. an improved certified accuracy of 6% against perturbed rotations in the set [-10,10] degrees on ImageNet.

翻訳日:2021-07-05 13:00:32 公開日:2021-07-02

# 大規模画像を用いたメモリ効率の良いメタラーニング

Memory Efficient Meta-Learning with Large Images ( http://arxiv.org/abs/2107.01105v1 )

ライセンス: Link先を確認

John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner

(参考訳) 少数ショット分類へのメタ学習のアプローチは、新しいタスクを学ぶのにほんの数回の最適化ステップや1回のフォワードパスを必要とするテスト時に計算効率が良いが、トレーニングにはメモリ集約性が高い。この制限は、最大1000枚の画像を含むタスク全体のサポートセットを最適化ステップが取られる前に処理しなければならないため生じます。大規模なイメージで提供されるパフォーマンス向上を活用するには、複数のgpuでメタリアナーを並列化するか、メモリ制約が適用できない場合のタスクとイメージサイズのトレードオフが必要となる。単一のgpu上で大きなイメージからなる大きなタスクのメタトレーニングを可能にする汎用およびメモリ効率のよいエピソディックトレーニングスキームであるliteを提案することで、両方のオプションを改善した。我々は,タスクの勾配を,タスクの訓練画像上の勾配の和に分解することができることを観察することによって達成した。これにより、タスク全体のトレーニングセットでフォワードパスを実行できるが、全勾配の偏りのない近似であるこれらの画像のランダムなサブセットのみをバックプロパゲーションすることで、大幅なメモリ節約を実現することができる。我々は、LITEを用いてメタラーナーのトレーニングを行い、実際のORBITベンチマークで新しい最先端の精度を示し、主要なメタラーナーと比較して挑戦的なVTAB+MDベンチマークの4つの部分のうち3つを示す。 LITEはまた、メタ学習者がトランスファーラーニングアプローチと競合することを可能にするが、テストタイムの計算コストのごく一部で、トランスファーラーニングが数ショットの分類に必要なすべてである、という最近の物語の対極として機能する。

Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task's training images. This enables us to perform a forward pass on a task's entire training set but realize significant memory savings by back-propagating only a random subset of these images, which we show is an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB+MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.
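
The unbiasedness claim can be checked on a toy problem. The sketch below is only an illustration of the general idea that a sum of per-example gradients can be estimated from a random subset rescaled by N/H; the quadratic loss, the scaling factor, and all names are assumptions for the example and do not reproduce the LITE training scheme or its forward pass over the full support set.

```python
from itertools import combinations


def per_example_grads(w, xs):
    """Gradient of the toy loss (w - x)^2 with respect to w, one value per example."""
    return [2.0 * (w - x) for x in xs]


def subset_estimate(grads, subset):
    """Estimate the full gradient sum from a subset, rescaled by N / H."""
    n, h = len(grads), len(subset)
    return (n / h) * sum(grads[i] for i in subset)


xs = [1.0, 2.0, 4.0, 7.0]
grads = per_example_grads(0.5, xs)
full_gradient = sum(grads)

# Average the estimator over every possible subset of size 2.
estimates = [subset_estimate(grads, s) for s in combinations(range(len(xs)), 2)]
mean_estimate = sum(estimates) / len(estimates)
```

Averaged over all equally likely subsets, the rescaled estimate equals the full gradient, which is what "unbiased" means here; any single subset can still differ from it.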

翻訳日:2021-07-05 13:00:12 公開日:2021-07-02

# ケースリレーショナルトランスフォーマー:命令フェッチのためのクロスモーダル言語生成モデル

Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions ( http://arxiv.org/abs/2107.00789v1 )

ライセンス: Link先を確認

Motonari Kambara and Komei Sugiura

(参考訳) 家庭内サービスロボットのコミュニケーション能力を向上させるためのロボット工学の研究が数多く行われている。しかし、ほとんどの研究は、トレーニングデータセットが十分に大きくないため、最近のディープニューラルネットワークの進歩の恩恵を受けていない。本稿では,クロスモーダル言語生成モデルに基づくデータセットの強化を目的とする。画像から「青いフリップフロップを左下ボックスに移動させる」というようなフェッチング命令文を生成するケース関係変換器(CRT)を提案する。既存の方法とは異なり、CRTはTransformerを使用して画像内のオブジェクトの視覚的特徴と幾何学的特徴を統合する。 CRTはケースリレーショナルブロックのためにオブジェクトを処理することができる。比較実験と人的評価を行った。実験の結果,crtはベースライン法を上回った。

There have been many studies in robotics to improve the communication skills of domestic service robots. Most studies, however, have not fully benefited from recent advances in deep neural networks because the training datasets are not large enough. In this paper, our aim is to augment the datasets based on a crossmodal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses the Transformer to integrate the visual features and geometry features of objects in the image. The CRT can handle the objects because of the Case Relation Block. We conducted comparison experiments and a human evaluation. The experimental results show the CRT outperforms baseline methods.

翻訳日:2021-07-05 12:59:44 公開日:2021-07-02

# target-dependent uniter: 国内サービスロボットのためのトランスフォーマーベースのマルチモーダル言語理解モデル

Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots ( http://arxiv.org/abs/2107.00811v1 )

ライセンス: Link先を確認

Shintaro Ishikawa and Komei Sugiura

(参考訳) 現在、国内サービスロボットは言語を通して自然に対話する能力が不十分である。これは、人間の指示を理解するのに様々な曖昧さや情報不足が複雑であるからである。既存手法では,オブジェクト間の関係を規定する参照表現は十分にモデル化されていない。本稿では,画像全体ではなく,画像内の関連領域に焦点をあてることで,対象オブジェクトと他のオブジェクトの関係を直接学習するターゲット依存型UNITERを提案する。本手法は汎用データセット上で事前学習可能なユニバーサベースのトランスフォーマの拡張である。対象候補を扱うための新しいアーキテクチャを導入することで、UNITERアプローチを拡張します。本モデルでは,2つの標準データセットに対して検証を行い,分類精度の点で,ターゲット依存型UNITERがベースライン法より優れていることを示す。

Currently, domestic service robots have an insufficient ability to interact naturally through language. This is because understanding human instructions is complicated by various ambiguities and missing information. In existing methods, the referring expressions that specify the relationships between objects are insufficiently modeled. In this paper, we propose Target-dependent UNITER, which learns the relationship between the target object and other objects directly by focusing on the relevant regions within an image, rather than the whole image. Our method is an extension of the UNITER-based Transformer that can be pretrained on general-purpose datasets. We extend the UNITER approach by introducing a new architecture for handling the target candidates. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.

翻訳日:2021-07-05 12:59:31 公開日:2021-07-02

# データセットからグラフを生成する学習による高速ニューラルネットワーク検索

Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets ( http://arxiv.org/abs/2107.00860v1 )

ライセンス: Link先を確認

Hayeon Lee, Eunyoung Hyung, Sung Ju Hwang

(参考訳) 最近のnas(neural architecture search)法の成功は、人間が設計したネットワークをほとんど上回るネットワークを出力していることを示したが、従来のnas法は、単一のタスク(データセット)に対するネットワークアーキテクチャの探索の最適化に主に取り組んできた。さらに、タスク固有の手法は、与えられたタスクごとにスクラッチからニューラルアーキテクチャを探索するので、時間と金銭の予算が限られている場合に問題となる大きな計算コストが発生する。本稿では,データセットと事前学習ネットワークからなるデータベース上で1度トレーニングし,新しいデータセットのためのニューラルネットワークを高速に検索できる効率的なNASフレームワークを提案する。提案したMetaD2A(Meta Dataset-to-Architecture)モデルは、アモータイズされたメタラーニングで学習したクロスモーダル潜在空間を介して、与えられたセット(データセット)からグラフ(アーキテクチャ)を確率的に生成することができる。さらに,目的とするデータセットを直接トレーニングすることなく,最適なアーキテクチャを推定し選択するメタパフォーマンス予測器を提案する。実験の結果,画像Net-1KのサブセットとNAS-Bench 201の検索空間からのアーキテクチャに基づいてメタ学習したモデルが,CIFAR-10やCIFAR-100を含む複数の未知のデータセットに平均33GPU秒で一般化できることが示されている。 mobilenetv3の検索空間でも、metad2aは転送可能なnasメソッドであるnsganetv2よりも5.5k倍高速で、同等の性能を持つ。 metad2aは、過去数年間に蓄積されたデータセットやアーキテクチャの豊富なデータベースからの知識を活用する方法だけでなく、rapid nasの新しい研究方向性を提案していると信じています。コードはhttps://github.com/HayeonLee/MetaD2Aで入手できる。

Despite the success of recent Neural Architecture Search (NAS) methods on various tasks, which have been shown to output networks that largely outperform human-designed networks, conventional NAS methods have mostly tackled the optimization of searching for the network architecture for a single task (dataset), which does not generalize well across multiple tasks (datasets). Moreover, since such task-specific methods search for a neural architecture from scratch for every given task, they incur a large computational cost, which is problematic when the time and monetary budget are limited. In this paper, we propose an efficient NAS framework that is trained once on a database consisting of datasets and pretrained networks and can rapidly search for a neural architecture for a novel dataset. The proposed MetaD2A (Meta Dataset-to-Architecture) model can stochastically generate graphs (architectures) from a given set (dataset) via a cross-modal latent space learned with amortized meta-learning. Moreover, we also propose a meta-performance predictor to estimate and select the best architecture without direct training on target datasets. The experimental results demonstrate that our model meta-learned on subsets of ImageNet-1K and architectures from NAS-Bench 201 search space successfully generalizes to multiple unseen datasets including CIFAR-10 and CIFAR-100, with an average search time of 33 GPU seconds. Even under MobileNetV3 search space, MetaD2A is 5.5K times faster than NSGANetV2, a transferable NAS method, with comparable performance. We believe that MetaD2A proposes a new research direction for rapid NAS as well as ways to utilize the knowledge from rich databases of datasets and architectures accumulated over the past years. Code is available at https://github.com/HayeonLee/MetaD2A.

翻訳日:2021-07-05 12:58:57 公開日:2021-07-02

# DUKweb: UK Web Archive corpusのダイアクロニックな単語表現

DUKweb: Diachronic word representations from the UK Web Archive corpus ( http://arxiv.org/abs/2107.01076v1 )

ライセンス: Link先を確認

Adam Tsakalidis, Pierpaolo Basile, Marya Bazzi, Mihai Cucuringu and Barbara McGillivray

(参考訳) 語彙的意味変化(単語の意味と用法の変化を検出する)は、自然言語処理だけでなく、社会・文化研究においても重要な課題である。ダイアクロニック単語の埋め込み(意味を保存する単語の時間感受性ベクトル表現)がこのタスクの標準リソースとなっている。しかし、その世代に必要な重要な計算資源を考えると、ダイアクロニックな単語の埋め込みを科学界で利用できる資源はごくわずかである。本稿では,現代英語のダイアクロニック解析のための大規模リソースセットであるDUKwebについて述べる。 DUKweb は JISC UK Web Domain Dataset (1996-2013) から作成され、".uk" で終わるドメインにホストされたインターネットアーカイブからリソースを収集する非常に大規模なアーカイブである。 DUKwebは一連の単語共起行列と、JISC UK Web Domainデータセットに毎年2種類の単語埋め込みで構成されている。 dukwebの再利用可能性とその品質基準を,単語の意味変化検出を事例として示す。

Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in '.uk'. DUKweb consists of a series of word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain dataset. We show the reuse potential of DUKweb and its quality standards via a case study on word meaning change detection.

翻訳日:2021-07-05 12:57:56 公開日:2021-07-02

# 知識グラフとコミュニティ認識感情を用いた新しい深層強化学習に基づくストック方向予測

A Novel Deep Reinforcement Learning Based Stock Direction Prediction using Knowledge Graph and Community Aware Sentiments ( http://arxiv.org/abs/2107.00931v1 )

ライセンス: Link先を確認

Anil Berk Altuner, Zeynep Hilal Kilimci

(参考訳) 株式市場の予測は投資家、研究者、アナリストにとって重要なトピックだ。あまりにも多くの要因に影響されているため、株式市場の予測は扱いにくい。本研究では,地域社会の感情と知識グラフを用いた株式の方向性予測のための深層強化学習手法に基づく新しい手法を提案する。この目的のために,まず接続関係を解析し,ユーザの社会的知識グラフを構築する。その後、関連するストックと感情分析の時系列分析と深い補強手法をブレンドする。トルコ版のトランスフォーマー(berturk)による双方向エンコーダ表現は、ユーザの感情分析に用いられ、深層q学習手法は、深層qネットワークを構築するために提案されたモデルの深層強化学習側に使用されている。このモデルの有効性を示すために、イスタンブール証券取引所のGaranti Bank(GARAN)、Akbank(AKBNK)、Türkiye İş Bankası(ISCTR)がケーススタディとして使用されている。実験の結果,提案手法は株式市場予測タスクにおいて顕著な結果を得た。

Stock market prediction has been an important topic for investors, researchers, and analysts. Because it is affected by too many factors, stock market prediction is a difficult task to handle. In this study, we propose a novel method that is based on deep reinforcement learning methodologies for the direction prediction of stocks using community sentiments and a knowledge graph. For this purpose, we first construct a social knowledge graph of users by analyzing relations between connections. After that, time series analysis of the related stock and sentiment analysis are blended with a deep reinforcement methodology. The Turkish version of Bidirectional Encoder Representations from Transformers (BerTurk) is employed to analyze the sentiments of the users, while a deep Q-learning methodology is used for the deep reinforcement learning side of the proposed model to construct the deep Q network. In order to demonstrate the effectiveness of the proposed model, Garanti Bank (GARAN), Akbank (AKBNK), and Türkiye İş Bankası (ISCTR) stocks in Istanbul Stock Exchange are used as a case study. Experimental results show that the proposed novel model achieves remarkable results for the stock market prediction task.

翻訳日:2021-07-05 12:57:25 公開日:2021-07-02

# 決定木ヒューリスティックスは滑らかな設定でも失敗する可能性がある

Decision tree heuristics can fail, even in the smoothed setting ( http://arxiv.org/abs/2107.00819v1 )

ライセンス: Link先を確認

Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

(参考訳) 貪欲な決定木学習ヒューリスティックスは、機械学習の実践の主役であるが、その経験的成功に対する理論的正当化は、いまだ解明されていない。実際、それらがひどく失敗する単純な対象関数があることは長年知られている(Kearns and Mansour, STOC 1996)。 Brutzkus, Daniely, and Malach (COLT 2020) の最近の研究は、スムーズな解析モデルを、この切断を解決するための道のりとして考えている。平滑化設定の中で、目標の$f$は$k$-juntasであり、これらのヒューリスティックは$f$を深さ$k$決定木仮説で学べることを示した。彼らは、同じ保証がより一般に深さが$k$決定木である目標に対して成り立つと推測した。我々は、深さ$k$決定木であるターゲットを構築し、滑らかな設定であっても、これらのヒューリスティックスは高い精度を達成する前に深さ$2^{\Omega(k)}$のツリーを構築することを示す。また、brutzkusらによる保証も示している。目標が$k$-juntasに非常に近い場合、これらのヒューリスティックスは高い精度を達成する前に深さ$2^{\Omega(k)}$のツリーを構築する。

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets $f$ that are $k$-juntas, they showed that these heuristics successfully learn $f$ with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to $k$-juntas, for which these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy.

翻訳日:2021-07-05 12:57:00 公開日:2021-07-02

# 高次元非パラメトリック仮説検定のための一般化多変量符号

Generalized Multivariate Signs for Nonparametric Hypothesis Testing in High Dimensions ( http://arxiv.org/abs/2107.01103v1 )

ライセンス: Link先を確認

Subhabrata Majumdar, Snigdhansu Chatterjee

(参考訳) 特徴空間の次元がサンプルサイズよりはるかに大きい高次元のデータは、多くの統計応用において生じる。この文脈では、一般化された多変量記号変換を構築し、そのノルムによって分割されたベクトルとして定義される。ノルム関数の異なる選択に対して、変換されたベクトルはデータ分布の幾何学的特徴に適応する。このアイデアに基づいて、これらの一般化符号ベクトルを用いて、高次元データの平均ベクトルに対する1サンプルおよび2サンプルの試験手順を得る。これらのテストはカーネル内積を用いたu-統計に基づいており、禁止的な仮定は必要とせず、高速なランダム化ベースの実装に適応できる。複数のデータ設定の実験を通じて、一般的な符号を用いたテストは、名目上のタイプiエラー率を維持しつつ、既存のテストよりも高いパワーを示すことを示した。最後に、mnist と minnesota twin studies のゲノムデータに関するサンプルアプリケーションを提供する。

High-dimensional data, where the dimension of the feature space is much larger than the sample size, arise in a number of statistical applications. In this context, we construct the generalized multivariate sign transformation, defined as a vector divided by its norm. For different choices of the norm function, the resulting transformed vector adapts to certain geometrical features of the data distribution. Building on this idea, we obtain one-sample and two-sample testing procedures for mean vectors of high-dimensional data using these generalized sign vectors. These tests are based on U-statistics using kernel inner products, do not require prohibitive assumptions, and are amenable to a fast randomization-based implementation. Through experiments in a number of data settings, we show that tests using generalized signs display higher power than existing tests, while maintaining nominal type-I error rates. Finally, we provide example applications on the MNIST and Minnesota Twin Studies genomic data.
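
The transformation described above, a vector divided by its norm, can be sketched in a few lines. The two norms shown and the convention of mapping the zero vector to zero are assumptions made for this example; the abstract does not say which norm functions the paper uses, and the U-statistic tests are not reproduced.

```python
import math


def l2_norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))


def l1_norm(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def generalized_sign(x, norm):
    """Return x divided by norm(x); the zero vector maps to zero (a convention assumed here)."""
    r = norm(x)
    if r == 0.0:
        return [0.0] * len(x)
    return [v / r for v in x]


x = [3.0, -4.0]
sign_l2 = generalized_sign(x, l2_norm)  # direction on the unit circle
sign_l1 = generalized_sign(x, l1_norm)  # direction on the L1 unit ball
```

With the Euclidean norm this is the usual spatial sign; swapping in another norm changes which unit ball the data are projected onto.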

翻訳日:2021-07-05 12:56:22 公開日:2021-07-02

# 物理インスパイアされたグラフニューラルネットワークによる組合せ最適化

Combinatorial Optimization with Physics-Inspired Graph Neural Networks ( http://arxiv.org/abs/2107.01188v1 )

ライセンス: Link先を確認

Martin J. A. Schuetz, J. Kyle Brubaker, Helmut G. Katzgraber

(参考訳) 組合せ最適化問題の解法としてグラフニューラルネットワークを用いる方法を示す。本手法は,最大カット,最小頂点被覆,最大独立集合,イジングスピングラスおよび多項式非拘束二元最適化問題の形式での高次一般化といった二次非拘束二元最適化問題の形式において,正準np-ハード問題に対して広く適用できる。グラフニューラルネットワークをトレーニングし、教師なし学習プロセスが完了すると、単純なプロジェクションを整数変数に適用する、微分可能な損失関数を生成するために、ハミルトン問題に緩和戦略を適用する。正準最大カットと最大独立集合問題に対する数値計算結果を用いて本手法を実証する。グラフニューラルネットワークオプティマイザが既存のソルバと同等かそれ以上の性能を発揮し、数百万の変数を持つ問題に対して最先端を超えてスケールすることができることが分かりました。

We demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to canonical NP-hard problems in the form of quadratic unconstrained binary optimization problems, such as maximum cut, minimum vertex cover, maximum independent set, as well as Ising spin glasses and higher-order generalizations thereof in the form of polynomial unconstrained binary optimization problems. We apply a relaxation strategy to the problem Hamiltonian to generate a differentiable loss function with which we train the graph neural network and apply a simple projection to integer variables once the unsupervised training process has completed. We showcase our approach with numerical results for the canonical maximum cut and maximum independent set problems. We find that the graph neural network optimizer performs on par with or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables.
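
A minimal sketch of the relax-then-project idea for maximum cut follows. It uses the common QUBO form in which each edge (i, j) contributes 2 x_i x_j - x_i - x_j, so that the Hamiltonian equals minus the cut size on binary assignments. The soft assignments `p` are hard-coded stand-ins for network outputs and the 0.5 rounding threshold is an assumption; no graph neural network is trained here.

```python
def relaxed_maxcut_loss(edges, p):
    """Max-cut QUBO Hamiltonian evaluated on soft assignments p[i] in [0, 1]."""
    return sum(2.0 * p[i] * p[j] - p[i] - p[j] for i, j in edges)


def project(p, threshold=0.5):
    """Round soft assignments to binary variables."""
    return [1 if v >= threshold else 0 for v in p]


def cut_size(edges, x):
    """Number of edges whose endpoints fall on different sides of the partition."""
    return sum(1 for i, j in edges if x[i] != x[j])


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
p = [0.9, 0.1, 0.8, 0.2]                  # hypothetical soft outputs
x = project(p)
loss_soft = relaxed_maxcut_loss(edges, p)
loss_binary = relaxed_maxcut_loss(edges, x)
```

On binary inputs the loss is exactly the negated cut size, so lowering the relaxed loss and then rounding is a heuristic way to look for large cuts.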

翻訳日:2021-07-05 12:56:07 公開日:2021-07-02

# マルチホップ機械読解のための不均一グラフ注意ネットワーク

Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension ( http://arxiv.org/abs/2107.00841v1 )

ライセンス: Link先を確認

Feng Gao, Jian-Cheng Ni, Peng Gao, Zi-Li Zhou, Yan-Yan Li, Hamido Fujita

(参考訳) マルチホップ機械読解は自然言語処理において難しい課題であり、推論能力と説明可能性を必要とする。グラフ畳み込みネットワークに基づくスペクトルモデルは推論能力を与え、競争結果をもたらすが、その一部は人間の理解可能な方法で推論を分析するという課題に直面している。認知神経科学における祖母細胞の概念に触発されてcrnameと呼ばれる空間グラフ注目フレームワークが提案された。このモデルは、意味的特徴を多角表現に集約し、推論のための情報を自動的に集中または緩和するように設計されている。クエリの主題を手掛かりの出発点として、推論エンティティをブリッジポイントとして、潜在候補エンティティをおばあちゃんセルとして、手掛かりを候補エンティティとして考える。提案モデルでは, 推論グラフを可視化し, 2つのエンティティを接続するエッジの重要性と, 参照ノードと候補ノードの選択性を分析する。オープンドメインマルチホップ読解データセット WikiHop と Drug-drug Interactions データセット MedHop の公式評価は、我々のアプローチの有効性を証明し、分子生物学領域におけるモデルの適用可能性を示す。

Multi-hop machine reading comprehension is a challenging task in natural language processing, which requires more reasoning ability and explainability. Spectral models based on graph convolutional networks grant inferring abilities and lead to competitive results; however, some of them still face the challenge of analyzing the reasoning in a human-understandable way. Inspired by the concept of Grandmother Cells in cognitive neuroscience, a spatial graph attention framework named crname, which imitates the procedure, is proposed. This model is designed to assemble the semantic features in multi-angle representations and automatically concentrate or alleviate the information for reasoning. The name "crname" is a metaphor for the pattern of the model: regard the subjects of queries as the start points of clues, take the reasoning entities as bridge points, and consider the latent candidate entities as the grandmother cells, and the clues end up in candidate entities. The proposed model allows us to visualize the reasoning graph and analyze the importance of edges connecting two entities and the selectivity in the mention and candidate nodes, which makes the reasoning easier to comprehend empirically. The official evaluations on the open-domain multi-hop reading dataset WikiHop and the drug-drug interactions dataset MedHop prove the validity of our approach and show the possibility of applying the model in the molecular biology domain.

翻訳日:2021-07-05 12:55:11 公開日:2021-07-02

# 変圧器の学習トークンプルーニング

Learned Token Pruning for Transformers ( http://arxiv.org/abs/2107.00910v1 )

ライセンス: Link先を確認

Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Joseph Hassoun, Kurt Keutzer

(参考訳) トランスフォーマーモデルのデプロイにおける大きな課題は、入力シーケンス長の2乗に比例して増大する、非常に高い推論コストである。これにより、長いシーケンスを処理するのにトランスフォーマーを使うのが特に困難になる。そこで本研究では、データがトランスフォーマーの各層を通過する際に冗長なトークンを削減する新しい学習トークンプルーニング(LTP)法を提案する。特に、LTPは、注意スコアがトレーニング中に学習されたしきい値を下回るトークンをプルーニングする。重要なことは、しきい値に基づく手法は、先行トークンプルーニング法で使用されるtop-kトークン選択のようなアルゴリズム的に高価な操作を回避し、構造化プルーニングにつながることである。我々は、複数のGLUEタスクで本手法の性能を広範囲にテストし、学習しきい値に基づく手法が、従来の最先端のtop-kトークンベース手法を、同じFLOPsで最大約2%高い精度で一貫して上回ることを示した。さらに、予備的な結果では、精度低下1%未満(FLOPsは最大2.1倍削減)で、Tesla T4 GPUとIntel Haswell CPUにおいてそれぞれ最大1.4倍と1.9倍のスループット向上が得られた。私たちのコードはPyTorchで開発され、オープンソース化されている。

A major challenge in deploying transformer models is their prohibitive inference cost, which quadratically scales with the input sequence length. This makes it especially difficult to use transformers for processing long sequences. To address this, we present a novel Learned Token Pruning (LTP) method that reduces redundant tokens as the data passes through the different layers of the transformer. In particular, LTP prunes tokens with an attention score below a threshold value, which is learned during training. Importantly, our threshold based method avoids algorithmically expensive operations such as top-k token selection which are used in prior token pruning methods, and also leads to structured pruning. We extensively test the performance of our approach on multiple GLUE tasks and show that our learned threshold based method consistently outperforms the prior state-of-the-art top-k token based method by up to ~2% higher accuracy with the same amount of FLOPs. Furthermore, our preliminary results show up to 1.4x and 1.9x throughput improvement on Tesla T4 GPU and Intel Haswell CPU, respectively, with less than 1% of accuracy drop (and up to 2.1x FLOPs reduction). Our code has been developed in PyTorch and has been open-sourced.
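The contrast between threshold-based and top-k token selection described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the importance score is taken to be the mean attention a token receives over all heads and queries, the threshold is a fixed constant rather than a learned per-layer parameter, and the function names (`token_importance`, `prune_by_threshold`, `prune_top_k`) are hypothetical. It is not the authors' PyTorch implementation.

```python
from typing import List, Sequence


def token_importance(attn: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Mean attention each token receives.

    attn[h][i][j] is the attention that query token i pays to key token j
    in head h. This scoring rule is an assumption made for illustration.
    """
    n_heads = len(attn)
    n_tokens = len(attn[0])
    scores = [0.0] * n_tokens
    for head in attn:
        for row in head:
            for j, weight in enumerate(row):
                scores[j] += weight
    return [s / (n_heads * n_tokens) for s in scores]


def prune_by_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Keep tokens whose score reaches the threshold: one comparison per token."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def prune_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Keep the k highest-scoring tokens: requires a sort or selection step."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # One head, three tokens; each row is a softmax-normalised attention row.
    attn = [[[0.6, 0.3, 0.1],
             [0.5, 0.4, 0.1],
             [0.7, 0.2, 0.1]]]
    scores = token_importance(attn)
    print(prune_by_threshold(scores, threshold=0.2))
    print(prune_top_k(scores, k=2))
```

With a threshold, the number of surviving tokens adapts to the input, whereas top-k fixes that number in advance and needs a ranking step for every sequence.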

翻訳日:2021-07-05 12:54:47 公開日:2021-07-02

# DRIFT:学術文献のダイアクロニック解析用ツールキット

DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature ( http://arxiv.org/abs/2107.01198v1 )

ライセンス: Link先を確認

Abheesht Sharma, Gunjan Chhablani, Harshit Pandey, Rajaswa Patil

(参考訳) 本研究は,NLPコミュニティと研究コミュニティ全体を対象として,研究コーパスのダイアクロニック解析への応用について述べる。 DRIFTは、研究者が長年の研究動向や開発を追跡できるツールです。分析方法は、よく引用された研究成果と照合され、良い測定のためにいくつかの独自の方法が追加された。キーワード抽出、ワードクラウド、生産性による減少/停滞/成長傾向の予測、アクセラレーションプロットによるバイグラムの追跡、単語のセマンティックドリフトの検索、類似性によるトレンドの追跡などである。本ツールの有用性と有効性を示すため,本研究では,arxivリポジトリのcs.clコーパスをケーススタディとして,解析手法から推論を行う。ツールキットと関連するコードは以下の通りである。

In this work, we present to the NLP community, and to the wider research community as a whole, an application for the diachronic analysis of research corpora. We open source an easy-to-use tool coined: DRIFT, which allows researchers to track research trends and development over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure. Succinctly put, some of the analysis methods are: keyword extraction, word clouds, predicting declining/stagnant/growing trends using Productivity, tracking bi-grams using Acceleration plots, finding the Semantic Drift of words, tracking trends using similarity, etc. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods. The toolkit and the associated code are available here: https://github.com/rajaswa/DRIFT.

翻訳日:2021-07-05 12:54:25 公開日:2021-07-02

# 一般的なボードゲームの概念

General Board Game Concepts ( http://arxiv.org/abs/2107.01078v1 )

ライセンス: Link先を確認

Éric Piette, Matthew Stephenson, Dennis J.N.J. Soemers and Cameron Browne

(参考訳) 多くのゲームは、ルールやコントロール、プレーエリアなど、共通のアイデアや側面を共有していることが多い。しかし、ボードゲームにおける一般ゲームプレイング(GGP)の文脈では、この領域は未探索のままである。ゲームの概念を定式化するために,ゲームプレーヤやデザイナーが一般的に使用する用語に着想を得た。 Ludii General Game Systemを通じて、ゲーム自体、プレイされた動き、到達した状態など、さまざまなレベルの抽象化の概念を記述します。ゲームのludeme表現に関連する新しいggp機能は、多くの新しい研究ラインを開く。ハイパーエージェントセレクタの作成、ゲーム間のAI学習の転送、ゲーム用語を用いたAI技術の説明は、ゲームコンセプトを使用することで、すべて容易になる。ゲームコンセプトの恩恵を受けることができる他のアプリケーションとして、不完全な古代ゲームのためのもっともらしい再構成ルールの生成や、ボードゲームレコメンデータシステムの実装などが議論されている。

Many games often share common ideas or aspects between them, such as their rules, controls, or playing area. However, in the context of General Game Playing (GGP) for board games, this area remains under-explored. We propose to formalise the notion of "game concept", inspired by terms generally used by game players and designers. Through the Ludii General Game System, we describe concepts for several levels of abstraction, such as the game itself, the moves played, or the states reached. This new GGP feature associated with the ludeme representation of games opens many new lines of research. The creation of a hyper-agent selector, the transfer of AI learning between games, or explaining AI techniques using game terms, can all be facilitated by the use of game concepts. Other applications which can benefit from game concepts are also discussed, such as the generation of plausible reconstructed rules for incomplete ancient games, or the implementation of a board game recommender system.

翻訳日:2021-07-05 12:54:03 公開日:2021-07-02

# ファジィ推論を用いたファジィラフ集合の類似性計算と文類似性計算への応用

Computing Fuzzy Rough Set based Similarities with Fuzzy Inference and Its Application to Sentence Similarity Computations ( http://arxiv.org/abs/2107.01170v1 )

ライセンス: Link先を確認

Nidhika Yadav

(参考訳) ファジィ粗集合による解析において、2つのファジィ集合間の類似性を計算するためのいくつかの研究イニシアティブが提案されている。これらの手法は2つの方法をもたらす。低い相似性と高い相似性。ほとんどのアプリケーションでは、1つのエンティティだけがさらなる分析や結論の導出に役立ちます。本稿では,ファジィ推論エンジンを用いたファジィラフセットに基づく低類似度と上類似度を組み合わせた新しい手法を提案する。さらに,提案手法を問題計算文の類似性に適用し,SICK2014データセット上で評価した。

Several research initiatives have been proposed for computing similarity between two Fuzzy Sets in analysis through Fuzzy Rough Sets. These techniques yield two measures, viz. lower similarity and upper similarity. In most applications, however, only one quantity is useful for further analysis and for drawing conclusions. The aim of this paper is to propose a novel technique to combine Fuzzy Rough Set based lower similarity and upper similarity using a Fuzzy Inference Engine. Further, the proposed approach is applied to the problem of computing sentence similarity and has been evaluated on the SICK2014 dataset.

翻訳日:2021-07-05 12:53:49 公開日:2021-07-02

# utnet:医療用画像分割のためのハイブリッドトランスフォーマーアーキテクチャ

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation ( http://arxiv.org/abs/2107.00781v1 )

ライセンス: Link先を確認

Yunhe Gao, Mu Zhou, Dimitris Metaxas

(参考訳) トランスフォーマーアーキテクチャは多くの自然言語処理タスクで成功している。しかし、その医学的ビジョンへの応用はほとんど未解明のままである。本研究では,医用画像セグメンテーションを強化するために,自己意識を畳み込みニューラルネットワークに統合するシンプルなハイブリッドトランスフォーマーアーキテクチャUTNetを提案する。 UTNetはエンコーダとデコーダの両方に自己アテンションモジュールを適用し、最小限のオーバーヘッドで異なるスケールで長距離依存性をキャプチャする。そこで本研究では, 自己注意動作の複雑さを$O(n^2)$から$O(n)$に大幅に低減する, 相対的な位置符号化を伴う効率的な自己注意機構を提案する。エンコーダのスキップされた接続から細かな詳細を復元するために,新たな自己アテンションデコーダも提案されている。われわれのアプローチは、Transformerが視覚誘発バイアスを学ぶために大量のデータを必要とするジレンマに対処する。当社のハイブリッド層設計では,事前学習を必要とせずにTransformerを畳み込みネットワークに初期化する。我々は, UTNetをマルチラベル, マルチベンダ型心臓MRIコホートで評価した。 utnetは、最先端のアプローチに対して優れたセグメンテーション性能と堅牢性を示し、他の医療画像セグメンテーションをうまく一般化することを約束している。

Transformer architecture has emerged to be successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation. UTNet applies self-attention modules in both encoder and decoder for capturing long-range dependency at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism along with relative position encoding that reduces the complexity of self-attention operation significantly from $O(n^2)$ to approximate $O(n)$. A new self-attention decoder is also proposed to recover fine-grained details from the skipped connections in the encoder. Our approach addresses the dilemma that Transformer requires huge amounts of data to learn vision inductive bias. Our hybrid layer design allows the initialization of Transformer into convolutional networks without a need of pre-training. We have evaluated UTNet on the multi-label, multi-vendor cardiac magnetic resonance imaging cohort. UTNet demonstrates superior segmentation performance and robustness against the state-of-the-art approaches, holding the promise to generalize well on other medical image segmentations.

翻訳日:2021-07-05 12:52:19 公開日:2021-07-02

# 偏光自己注意:高品質な画素ワイド回帰に向けて

Polarized Self-Attention: Towards High-quality Pixel-wise Regression ( http://arxiv.org/abs/2107.00782v1 )

ライセンス: Link先を確認

Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang

(参考訳) ピクセル単位での回帰は、キーポイントのヒートマップやセグメンテーションマスクの推定など、きめ細かなコンピュータビジョンタスクにおいておそらく最も一般的な問題である。これらの回帰問題は、特に低い計算オーバーヘッドで高分解能入力/出力の長距離依存性をモデル化し、高度に非線形なピクセル単位の意味論を推定する必要があるため、非常に困難である。ディープ畳み込みニューラルネットワーク(DCNN)の注意機構は、長距離依存の促進に人気があるが、非局所ブロックのような要素固有の注意は、学習に非常に複雑でノイズに敏感であり、単純化されたハイブリッドのほとんどは、複数種類のタスクの間で最良の妥協点に達しようとしている。本稿では,高品質な画素ワイドレグレッションに向けた2つの重要な設計を取り入れた分極自己注意(PSA)ブロックを提案する。(1)分極フィルタリング:チャネルと空間の注意計算において高い内部分解能を維持しつつ,入力テンソルを対応する次元に沿って完全に崩壊させる。(2)強化: 2次元ガウス分布(キーポイントヒートマップ)や2次元二項分布(バイナリセグメンテーションマスク)など,典型的な細粒度回帰の出力分布に直接適合する非線形性を構成する。PSAはチャネルのみのブランチと空間のみのブランチで表現能力を使い果たしているようであり、シーケンシャルレイアウトと並列レイアウトの間にはわずかなメトリック差しかない。実験の結果、PSAは2Dポーズ推定とセマンティックセグメンテーションのベンチマークにおいて、標準ベースラインを2～4ポイント、最先端手法を1～2ポイント向上させることが示されている。

Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are very challenging, particularly because they require modeling long-range dependencies on high-resolution inputs/outputs at low computational overhead in order to estimate highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for boosting long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids try to reach the best compromise among multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing non-linearity that directly fits the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D Binomial distribution (binary segmentation masks). PSA appears to have exhausted the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by 2 to 4 points, and boosts state-of-the-art methods by 1 to 2 points, on 2D pose estimation and semantic segmentation benchmarks.

翻訳日:2021-07-05 12:52:00 公開日:2021-07-02

# MMF:階層画像分類のためのマルチタスク多構造融合

MMF: Multi-Task Multi-Structure Fusion for Hierarchical Image Classification ( http://arxiv.org/abs/2107.00808v1 )

ライセンス: Link先を確認

Xiaoni Li, Yucan Zhou, Yu Zhou, Weiping Wang

(参考訳) 階層的分類は、複数の粒度の予測を提供し、より良い誤りを促すことで複雑なタスクに重要である。ラベル構造が性能を決定すると、多くの既存手法が分類結果を促進するための優れたラベル構造を構築しようとする。本稿では,異なるラベル構造がカテゴリ認識に様々な事前知識を提供すると考えているので,それらを融合させることにより,階層的な分類結果の改善が期待できる。さらに,異なるラベル構造を統合するマルチタスク多構造融合モデルを提案する。 1つは共通のサブクラスを分類する伝統的な分類枝であり、もう1つは異なるラベル構造によって定義される異種スーパークラスを特定する責任がある。また,複数のラベル構造の効果に加えて,階層分類の精度向上と階層評価指標の調整のために,ディープモデルのアーキテクチャについても検討する。 cifar100 と car196 の実験結果から,任意のラベル構造を持つフラット分類器や階層分類器よりはるかに優れた結果が得られることがわかった。

Hierarchical classification is significant for complex tasks by providing multi-granular predictions and encouraging better mistakes. As the label structure decides its performance, many existing approaches attempt to construct an excellent label structure for promoting the classification results. In this paper, we consider that different label structures provide a variety of prior knowledge for category recognition, thus fusing them is helpful to achieve better hierarchical classification results. Furthermore, we propose a multi-task multi-structure fusion model to integrate different label structures. It contains two kinds of branches: one is the traditional classification branch to classify the common subclasses, the other is responsible for identifying the heterogeneous superclasses defined by different label structures. Besides the effect of multiple label structures, we also explore the architecture of the deep model for better hierarchical classification and adjust the hierarchical evaluation metrics for multiple label structures. Experimental results on CIFAR100 and Car196 show that our method obtains significantly better results than using a flat classifier or a hierarchical classifier with any single label structure.

翻訳日:2021-07-05 12:51:27 公開日:2021-07-02

# 第1位 ug2+ challenge 2021 -- (semi-)supervised face detection in the low light condition

1st Place Solutions for UG2+ Challenge 2021 -- (Semi-)supervised Face detection in the low light condition ( http://arxiv.org/abs/2107.00818v1 )

ライセンス: Link先を確認

Pengcheng Wang, Lingqiao Ji, Zhilong Ji, Yuan Gao, Xiao Liu

(参考訳) 本稿では, CVPR 2021のUG2+チャレンジにおいて, 低光環境下での(半)教師付き顔検出のための我々のチーム「TAL-ai」のソリューションを簡潔に紹介する。一般的な画像強調法と画像転送法を用いていくつかの実験を行い、低照度画像と通常画像をより近い領域に引き寄せた。そして、これらのデータをトレーニングに利用することで、よりよいパフォーマンスが得られることが観察された。また、DetectoRS、Cascade-RCNNなどの一般的なオブジェクト検出フレームワークや、Swin-transformerのような大きなバックボーンも採用した。最後に、複数のモデルをアンサンブルし、テストセットでmAP 74.89を達成して、最終リーダーボードで1位となった。

In this technical report, we briefly introduce the solution of our team "TAL-ai" for (Semi-)supervised Face detection in the low light condition in the UG2+ Challenge at CVPR 2021. By conducting several experiments with popular image enhancement methods and image transfer methods, we pulled the low-light images and the normal images into a closer domain. We observe that training on these data achieves better performance. We also adapt several popular object detection frameworks, e.g., DetectoRS and Cascade-RCNN, and large backbones such as Swin-transformer. Finally, we ensemble several models, which achieved mAP 74.89 on the testing set, ranking 1st on the final leaderboard.

翻訳日:2021-07-05 12:51:11 公開日:2021-07-02

# 進化するTransformerを用いたクロスビュージオローカライゼーション

Cross-view Geo-localization with Evolving Transformer ( http://arxiv.org/abs/2107.00842v1 )

ライセンス: Link先を確認

Hongji Yang, Xiufan Lu and Yingying Zhu

(参考訳) 本研究では,道路画像の地理空間的位置をジオタグ付き空中画像のデータベースとマッチングすることにより推定する,クロスビューなジオローカライゼーションの問題に対処する。クロスビューマッチングタスクは、視界の劇的な外観と幾何学的差異のため、非常に難しい。そこで本稿では,cnnが主流である既存の手法とは異なり,グローバル依存をモデル化するためにトランスフォーマの自己着脱特性を利用する新しいジオローカライズトランス(egotr)を考案し,クロスビュージオローカライズにおける視覚的あいまいさを著しく低減する。また,egotrが地上画像と空中画像の幾何学的配置を理解し対応するために,トランスフォーマーの位置符号化を利用する。幾何学的知識に強い仮定を課す最先端の手法と比較して、egotrはトレーニング目的を通じて柔軟に位置埋め込みを学び、従って多くの実世界のシナリオにおいてより実用的になる。トランスフォーマーはタスクに適していますが、そのバニラセルフアテンションメカニズムは各レイヤ内のイメージパッチ内で独立して相互作用し、レイヤ間の相関を見落としています。本稿では,学習表現の品質を向上させるための,単純かつ効果的な自己交叉注意機構を提案する。セルフクロスアテンション(self-cross attention)は、隣接するレイヤ間のグローバルな依存関係をモデル化する。その結果、提案した自己横断的注意はより安定したトレーニングをもたらし、一般化能力を改善し、ネットワークが深まるにつれて表現が進化し続けるように促す。広汎な実験により,我々のEgoTRは,標準的な,きめ細かな,また,クロスデータセットなジオローカライゼーションタスクにおいて,最先端の手法に対して良好に機能することを示した。

In this work, we address the problem of cross-view geo-localization, which estimates the geospatial location of a street view image by matching it with a database of geo-tagged aerial images. The cross-view matching task is extremely challenging due to drastic appearance and geometry differences across views. Unlike existing methods that predominantly rely on CNNs, here we devise a novel evolving geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in Transformer to model global dependencies, thus significantly decreasing visual ambiguities in cross-view geo-localization. We also exploit the positional encoding of Transformer to help the EgoTR understand and match geometric configurations between ground and aerial images. Compared to state-of-the-art methods that impose strong assumptions on geometry knowledge, the EgoTR flexibly learns the positional embeddings through the training objective and hence becomes more practical in many real-world scenarios. Although Transformer is well suited to our task, its vanilla self-attention mechanism operates independently on the image patches within each layer, which overlooks correlations between layers. Instead, this paper proposes a simple yet effective self-cross attention mechanism to improve the quality of learned representations. The self-cross attention models global dependencies between adjacent layers, relating image patches while modeling how features evolve in the previous layer. As a result, the proposed self-cross attention leads to more stable training, improves the generalization ability and encourages representations to keep evolving as the network goes deeper. Extensive experiments demonstrate that our EgoTR performs favorably against state-of-the-art methods on standard, fine-grained and cross-dataset cross-view geo-localization tasks.

翻訳日:2021-07-05 12:50:56 公開日:2021-07-02

# MSN:軌道予測のためのマルチスタイルネットワーク

MSN: Multi-Style Network for Trajectory Prediction ( http://arxiv.org/abs/2107.00932v1 )

ライセンス: Link先を確認

Conghao Wong, Beihao Xia, Qinmu Peng, Xinge You

(参考訳) 複雑な場面で様々なエージェントの将来の軌跡を予測することは不可欠だが困難である。エージェントの内部的性格要因、近隣の対話的行動、周辺環境の影響にかかわらず、将来の行動スタイルに影響を及ぼすであろう。つまり、同じ種類のエージェントであっても、行動の好みに大きな違いがあるということです。最近の研究はエージェントのマルチモーダル計画の研究に大きな進展をもたらしたが、それらの多くは依然として全てのエージェントに同じ予測戦略を適用しており、巨大なエージェントの複数のスタイルを完全に示すことは困難である。本稿では,エージェントの嗜好スタイルを複数の隠れた行動カテゴリに適応的に分割し,各カテゴリの予測ネットワークを個別に訓練することにより,エージェントに同時に予測スタイルを与えるマルチスタイルネットワーク(msn)を提案する。実験により,我々の決定論的MSN-Dと生成MSN-Gは,最近の最先端手法よりも優れており,可視化結果のマルチスタイル特性が優れていることが示された。

It is essential but challenging to predict the future trajectories of various agents in complex scenes. Internal personality factors of agents, the interactive behavior of their neighbors, and the influence of the surroundings all have an impact on their future behavior styles. This means that even among agents of the same physical type, there are huge differences in behavior preferences. Although recent works have made significant progress in studying agents' multi-modal plannings, most of them still apply the same prediction strategy to all agents, which makes it difficult for them to fully capture the multiple styles of a vast range of agents. In this paper, we propose the Multi-Style Network (MSN) to address this problem by adaptively dividing agents' preference styles into several hidden behavior categories and training each category's prediction network separately, thereby giving agents predictions in all styles simultaneously. Experiments demonstrate that our deterministic MSN-D and generative MSN-G outperform many recent state-of-the-art methods and show better multi-style characteristics in the visualized results.

翻訳日:2021-07-05 12:50:22 公開日:2021-07-02

# 心臓駆出率推定のための超音波ビデオトランスフォーマ

Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation ( http://arxiv.org/abs/2107.00977v1 )

ライセンス: Link先を確認

Hadrien Reynaud, Athanasios Vlontzos, Benjamin Hou, Arian Beqiri, Paul Leeson, Bernhard Kainz

(参考訳) 心臓超音波画像は様々な心臓疾患の診断に用いられる。一般的な分析パイプラインは、専門医によるビデオフレームの手作業での処理を含む。これは、観察者内および観察者間のばらつきに悩まされる。本稿では、残差オートエンコーダネットワークとトークン分類に適応させたBERTモデルに基づくTransformerアーキテクチャを用いた、超音波映像解析の新しい手法を提案する。これにより、任意の長さのビデオを処理できる。本研究では、収縮末期(ES)および拡張末期(ED)のフレーム検出と、左室駆出率の自動計算に本モデルを適用した。任意の長さの映像に対して、ESで3.36フレーム、EDで7.17フレームの平均フレーム距離を達成する。我々のエンドツーエンドで学習可能なアプローチは、ビデオあたり0.15秒で、MAE 5.95、$R^2$ 0.52で駆出率を推定でき、セグメンテーションが駆出率を予測する唯一の方法ではないことを示している。コードとモデルはhttps://github.com/HReynaud/UVTで入手できる。

Cardiac ultrasound imaging is used to diagnose various heart diseases. Common analysis pipelines involve manual processing of the video frames by expert clinicians. This suffers from intra- and inter-observer variability. We propose a novel approach to ultrasound video analysis using a transformer architecture based on a Residual Auto-Encoder Network and a BERT model adapted for token classification. This enables videos of any length to be processed. We apply our model to the task of End-Systolic (ES) and End-Diastolic (ED) frame detection and the automated computation of the left ventricular ejection fraction. We achieve an average frame distance of 3.36 frames for the ES and 7.17 frames for the ED on videos of arbitrary length. Our end-to-end learnable approach can estimate the ejection fraction with a MAE of 5.95 and $R^2$ of 0.52 in 0.15s per video, showing that segmentation is not the only way to predict ejection fraction. Code and models are available at https://github.com/HReynaud/UVT.
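For readers unfamiliar with the quantities reported in this abstract, the following Python sketch spells out the standard definitions: the clinical ejection fraction computed from end-diastolic and end-systolic volumes, the mean absolute error (which also gives an average frame distance when applied to frame indices), and the coefficient of determination $R^2$. The function names and the sample numbers are hypothetical, and the sketch assumes the paper uses these textbook definitions; the model itself predicts the ejection fraction directly from video and is not reproduced here.

```python
from typing import Sequence


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference; on frame indices this is a mean frame distance."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    print(ejection_fraction(edv_ml=120.0, esv_ml=48.0))
    print(mean_absolute_error([60.0, 50.0], [55.0, 52.0]))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```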

翻訳日:2021-07-05 12:50:03 公開日:2021-07-02

# 複雑な雑音下での教師なし単一画像超解像

Unsupervised Single Image Super-resolution Under Complex Noise ( http://arxiv.org/abs/2107.00986v1 )

ライセンス: Link先を確認

Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang and Deyu Meng

(参考訳) 単一画像超解像(SISR)の研究は、特にディープニューラルネットワーク(DNN)を用いたものが近年大きな成功を収めているが、依然として2つの大きな制限を抱えている。第一に、実際の画像劣化は通常未知であり、画像ごとに大きく異なるため、一般的なSISRタスクを扱う単一のモデルを訓練することは極めて困難である。第二に、現在の手法の多くは主に劣化のダウンサンプリング過程に焦点を当てており、避けられないノイズ汚染を無視または過小評価している。例えば、一般的に用いられる独立同分布(i.i.d.)のガウスノイズ分布は、実際の画像ノイズ(カメラセンサノイズなど)から常に大きく逸脱しており、実際のシナリオでの性能を制限する。これらの問題に対処するため、本論文では、未知の劣化を伴う一般的なSISRタスクを扱うモデルベースの教師なしSISR法を提案する。従来のi.i.d.ガウスノイズ仮定の代わりに、複雑な実ノイズに適合するパッチベースの新しい非i.i.d.ノイズモデリング手法を提案する。さらに、DNNによりパラメータ化された深層ジェネレータを用いて潜在変数を高解像度画像にマッピングし、従来のハイパーラプラシアン事前分布もこのジェネレータに埋め込んで、画像勾配をさらに制約する。最後に、モデルを解くためにモンテカルロEMアルゴリズムを設計し、潜在変数とネットワークパラメータの両方に関して画像ジェネレータを更新する一般的な推論フレームワークを提供する。総合的な実験により、提案手法は、より軽量なモデル(0.34M vs. 2.40M)かつより高速でありながら、現在の最先端(SotA)手法を明確に上回る(約1dB PSNR)ことが示された。

While research on single image super-resolution (SISR), especially with deep neural networks (DNNs), has achieved tremendous success recently, it still suffers from two major limitations. Firstly, the real image degradation is usually unknown and varies greatly from one image to another, making it extremely hard to train a single model to handle the general SISR task. Secondly, most current methods mainly focus on the downsampling process of the degradation, but ignore or underestimate the inevitable noise contamination. For example, the commonly used independent and identically distributed (i.i.d.) Gaussian noise distribution always deviates largely from the real image noise (e.g., camera sensor noise), which limits their performance in real scenarios. To address these issues, this paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations. Instead of the traditional i.i.d. Gaussian noise assumption, a novel patch-based non-i.i.d. noise modeling method is proposed to fit the complex real noise. Besides, a deep generator parameterized by a DNN is used to map the latent variable to the high-resolution image, and the conventional hyper-Laplacian prior is also elaborately embedded into such a generator to further constrain the image gradients. Finally, a Monte Carlo EM algorithm is designed to solve our model, which provides a general inference framework to update the image generator w.r.t. both the latent variable and the network parameters. Comprehensive experiments demonstrate that the proposed method clearly surpasses the current state-of-the-art (SotA) method (by about 1dB PSNR), with not only a lighter model (0.34M vs. 2.40M) but also a faster speed.

翻訳日:2021-07-05 12:49:46 公開日:2021-07-02

# 円形ハフ変換を用いた光点字認識

Optical Braille Recognition using Circular Hough Transform ( http://arxiv.org/abs/2107.00993v1 )

ライセンス: Link先を確認

Zeba Khanam and Atiya Usmani

(参考訳) 点字は視覚障害者に読み書きの権限を与えてきた。しかし同時に、点字以外のユーザーが点字のスクリプトを理解できないことによるギャップも生んでいる。このギャップにより、研究者は点字文書を自然言語に変換する光学点字認識技術を提案するようになった。この研究の主な動機は、盲目の学生の個人文書を翻訳することで、学術機関のコミュニケーションギャップを埋めることである。これはスマートフォンのカメラを使って点字文書をデジタル化する経済的かつ効果的な手法を提案している。任意の点字画像に対して、スキューネス、ノイズ、その他の抑止に不変なハフ変換に基づくドット検出機構が提案されている。検出されたドットは、距離ベースのクラスタリングアルゴリズムを使用して点字細胞にクラスタリングされる。続いて、各点字細胞の標準的な物理パラメータを、特徴抽出と自然言語文字の分類のために推定する。 54点字スクリプトのデータセットに対するこの手法の包括的な評価は、98.71%の精度で行われている。

Braille has empowered the visually challenged community to read and write. At the same time, however, it has created a gap due to the widespread inability of non-Braille users to understand Braille scripts. This gap has prompted researchers to propose Optical Braille Recognition techniques that convert Braille documents to natural language. The main motivation of this work is to bridge the communication gap at academic institutions by translating the personal documents of blind students. This is accomplished by proposing an economical and effective technique that digitizes Braille documents using a smartphone camera. For any given Braille image, a dot detection mechanism based on the Hough transform is proposed, which is invariant to skewness, noise and other deterrents. The detected dots are then clustered into Braille cells using a distance-based clustering algorithm. Subsequently, the standard physical parameters of each Braille cell are estimated for feature extraction and classification as natural language characters. A comprehensive evaluation of this technique on the proposed dataset of 54 Braille scripts yielded an accuracy of 98.71%.
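The later stages of the pipeline in this abstract (grouping detected dots into cells, then classifying each cell as a character) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' method: dot centres are taken as already detected (the Hough transform step is omitted), the text is a single deskewed line whose first cell starts at the origin, grouping uses a fixed grid rather than the paper's distance-based clustering, the pitches of 2.5 and 6.0 units are illustrative defaults, and only the letters a to e are mapped. All function and constant names are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

# Standard six-dot numbering: dots 1-3 run down the left column, 4-6 down the right.
BRAILLE_LETTERS = {
    frozenset({1}): "a",
    frozenset({1, 2}): "b",
    frozenset({1, 4}): "c",
    frozenset({1, 4, 5}): "d",
    frozenset({1, 5}): "e",
}


def group_dots_into_cells(
    dots: Iterable[Tuple[float, float]],
    dot_pitch: float = 2.5,   # assumed spacing between dots inside a cell
    cell_pitch: float = 6.0,  # assumed spacing between neighbouring cells
) -> Dict[int, Set[int]]:
    """Assign each (x, y) dot centre to a cell index and a dot number 1-6."""
    cells: Dict[int, Set[int]] = {}
    for x, y in dots:
        # Shift by half a dot pitch so small noise around the left column
        # does not push a dot into the previous cell.
        cell = math.floor((x + dot_pitch / 2) / cell_pitch)
        col = round((x - cell * cell_pitch) / dot_pitch)  # 0 = left, 1 = right
        row = round(y / dot_pitch)                        # 0, 1 or 2
        cells.setdefault(cell, set()).add(row + 1 + 3 * col)
    return cells


def decode_line(dots: Iterable[Tuple[float, float]]) -> str:
    """Decode one line of dots; unknown patterns become '?'."""
    cells = group_dots_into_cells(dots)
    return "".join(
        BRAILLE_LETTERS.get(frozenset(cells[i]), "?") for i in sorted(cells)
    )


if __name__ == "__main__":
    # Slightly noisy dot centres spelling three cells.
    dots = [(0.1, -0.1), (0.0, 2.4),               # dots 1, 2
            (6.1, 0.0),                            # dot 1
            (12.0, 0.1), (14.5, 0.0), (14.4, 2.6)] # dots 1, 4, 5
    print(decode_line(dots))
```

A grid is enough here because the cell geometry is assumed known; a real scan would first have to estimate the pitches and handle blank cells, which is where a distance-based clustering step earns its place.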

翻訳日:2021-07-05 12:49:14 公開日:2021-07-02

# ビデオセグメンテーションのためのディープラーニング技術に関する調査

A Survey on Deep Learning Technique for Video Segmentation ( http://arxiv.org/abs/2107.01153v1 )

ライセンス: Link先を確認

Wenguan Wang, Tianfei Zhou, Fatih Porikli, David Crandall, Luc Van Gool

(参考訳) ビデオセグメンテーション(ビデオセグメンテーション、ビデオフレームを複数のセグメントまたはオブジェクトに分割する)は、映画における視覚効果補助、自律運転におけるシーン理解、ビデオ会議における仮想背景生成など、幅広い実践的応用において重要な役割を果たしている。近年,コンピュータビジョンにおけるコネクショナリズムのルネサンスにより,映像セグメンテーションに特化し,魅力的なパフォーマンスを提供するディープラーニングベースのアプローチが数多く流入している。本調査では,各タスク設定,背景概念,認識されたニーズ,開発履歴,主な課題について,ビデオおよびビデオ意味セグメンテーションにおけるジェネリックオブジェクトセグメンテーション(未知のカテゴリの)という,この分野における2つの基本的な研究方針を総合的にレビューする。また,提案手法とデータセットについて,代表文献の詳細な概要を述べる。さらに,ベンチマークデータセットにおけるレビュー手法の定量的性能比較を行った。最終的に、この分野における未解決の未解決問題の集合を指摘し、さらなる研究の機会を提案する。

Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, e.g., visual effect assistance in movie, scene understanding in autonomous driving, and virtual background creation in video conferencing, to name a few. Recently, due to the renaissance of connectionism in computer vision, there has been an influx of numerous deep learning based approaches that have been dedicated to video segmentation and delivered compelling performance. In this survey, we comprehensively review two basic lines of research in this area, i.e., generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also provide a detailed overview of representative literature on both methods and datasets. Additionally, we present quantitative performance comparisons of the reviewed methods on benchmark datasets. At last, we point out a set of unsolved open issues in this field, and suggest possible opportunities for further research.

翻訳日:2021-07-05 12:49:01 公開日:2021-07-02

# HandVoxNet++:Voxel-based Neural Networksを用いた手形状と姿勢推定

HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks ( http://arxiv.org/abs/2107.01205v1 )

ライセンス: Link先を確認

Jameel Malik and Soshi Shimada and Ahmed Elhayek and Sk Aziz Ali and Christian Theobalt and Vladislav Golyanik and Didier Stricker

(参考訳) 単一深度マップからの3次元手形状とポーズ推定は多くのアプリケーションにおいて新しい挑戦的なコンピュータビジョン問題である。既存の方法では、2D畳み込みニューラルネットワークを介して手メッシュを直接回帰し、画像の視点歪みによるアーティファクトにつながる。既存の方法の限界に対処するため、handvoxnet++、すなわち3dおよびグラフ畳み込みを完全に教師付きで訓練したvoxelベースのディープネットワークを開発した。ネットワークへの入力はtsdf(truncated signed distance function)に基づく3次元voxelized-depth-mapである。 HandVoxNet++は2つの手形表現に依存している。 1つ目は、メッシュトポロジを保存せず、最も正確な表現である手形状の3Dボキセル化格子である。第2の表現は、メッシュトポロジーを保存する手表面である。両表現の利点を,新たなニューラルグラフ畳み込み型メッシュ登録 (gcn-meshreg) や,トレーニングデータに依存しない古典的なセグメント単位非剛性重力アプローチ (nrga++) と組み合わせて,手表面とボクセル化手の形状を整合させることで組み合わせる。 SynHand5M、deep-based HANDS19 Challenge、HO-3Dという3つの公開ベンチマークの広範な評価において、提案されたHandVoxNet++は最先端のパフォーマンスを達成する。 CVPR 2020で発表されたこれまでのアプローチのジャーナル拡張では、SynHand5MとHANDS19データセットでそれぞれ41.09%と13.7%のアライメント精度が得られた。 HANDS19チャレンジデータセット(Task 1: Depth-Based 3D Hand Pose Estimation)では,2020年8月の結果がポータルに提出された時点で,本手法が第1位となった。

3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. Existing methods addressing it directly regress hand meshes via 2D convolutional neural networks, which leads to artifacts due to perspective distortions in the images. To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner. The input to our network is a 3D voxelized-depth-map-based on the truncated signed distance function (TSDF). HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology and which is the most accurate representation. The second representation is the hand surface that preserves the mesh topology. We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or classical segment-wise Non-Rigid Gravitational Approach (NRGA++) which does not rely on training data. In extensive evaluations on three public benchmarks, i.e., SynHand5M, depth-based HANDS19 challenge and HO-3D, the proposed HandVoxNet++ achieves the state-of-the-art performance. In this journal extension of our previous approach presented at CVPR 2020, we gain 41.09% and 13.7% higher shape alignment accuracy on SynHand5M and HANDS19 datasets, respectively. Our method is ranked first on the HANDS19 challenge dataset (Task 1: Depth-Based 3D Hand Pose Estimation) at the moment of the submission of our results to the portal in August 2020.

翻訳日:2021-07-05 12:48:42 公開日:2021-07-02

# 入力の連結による深い二重降下の緩和

Mitigating deep double descent by concatenating inputs ( http://arxiv.org/abs/2107.00797v1 )

ライセンス: Link先を確認

John Chen, Qihan Wang, Anastasios Kyrillidis

(参考訳) 二重降下曲線はディープニューラルネットワークの最も興味深い特性の1つである。これは、古典的なバイアス分散曲線と現代のニューラルネットワークの振る舞いとを対比し、サンプル数がパラメータの数に近づくところで発生する。本研究では,深部ニューラルネットワーク設定における二重降下現象とサンプル数との関係について検討する。特に,サンプル数を人工的に増やすことで既存のデータセットを増強する構造を提案する。この構成は経験的にこの設定の二重降下曲線を緩和する。我々は、深層二重降下に関する既存の研究を再現し、提案する構成では過パラメータ領域への滑らかな降下が観測されることを確認した。これは、モデルサイズに関しても、エポック数に関しても起こる。

The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of parameters. In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting. In particular, we propose a construction which augments the existing dataset by artificially increasing the number of samples. This construction empirically mitigates the double descent curve in this setting. We reproduce existing work on deep double descent, and observe a smooth descent into the overparameterized region for our construction. This occurs both with respect to the model size and with respect to the number of epochs.

翻訳日:2021-07-05 12:48:07 公開日:2021-07-02

# 異種情報集約によるオンライン地下鉄起終点予測

Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation ( http://arxiv.org/abs/2107.00946v1 )

ライセンス: Link先を確認

Lingbo Liu, Yuying Zhu, Guanbin Li, Ziyi Wu, Lei Bai, Mingzhi Mao, Liang Lin

(参考訳) 地下鉄の起終点(OD)予測は、インテリジェントな交通管理にとって極めて重要かつ困難な課題であり、2種類の駅間乗客数、すなわち起点-終点(OD)乗客数と終点-起点(DO)乗客数を正確に予測することを目的としている。しかし、オンライン地下鉄システムでは、過去の時間間隔の完全なOD行列を即座に得ることはできず、従来の手法は限られた情報のみを用いて将来のODおよびDOの乗客数を個別に予測していた。本研究では、異種情報集約マシン(HIAM)と呼ばれる新しいニューラルネットワークモジュールを提案し、履歴データの異種情報(不完全なOD行列、未完了注文ベクトル、DO行列など)を十分に活用して、ODおよびDO乗客数の変化パターンを共同で学習する。具体的には、ODモデリングブランチが未完了注文の潜在的な目的地を明示的に推定して不完全なOD行列の情報を補完する一方、DOモデリングブランチはDO行列を入力として、DO乗客数の時空間分布を捉える。さらに、OD-DO間の因果関係と相関関係をモデル化するために、OD特徴とDO特徴の間で相互の情報を伝播するDual Information Transformerを導入する。提案したHIAMに基づいて、将来のODおよびDO乗客数を同時に予測する統合Seq2Seqネットワークを開発した。2つの大規模ベンチマークで行った広範な実験により、オンライン地下鉄の起終点予測における本手法の有効性が示された。

Metro origin-destination prediction is a crucial yet challenging task for intelligent transportation management, which aims to accurately forecast two specific types of cross-station ridership, i.e., Origin-Destination (OD) ridership and Destination-Origin (DO) ridership. However, complete OD matrices of previous time intervals cannot be obtained immediately in online metro systems, and conventional methods use only limited information to forecast the future OD and DO ridership separately. In this work, we propose a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership. Specifically, an OD modeling branch estimates the potential destinations of unfinished orders explicitly to complement the information of incomplete OD matrices, while a DO modeling branch takes DO matrices as input to capture the spatial-temporal distribution of DO ridership. Moreover, a Dual Information Transformer is introduced to propagate the mutual information among OD features and DO features for modeling the OD-DO causality and correlation. Based on the proposed HIAM, we develop a unified Seq2Seq network to forecast the future OD and DO ridership simultaneously. Extensive experiments conducted on two large-scale benchmarks demonstrate the effectiveness of our method for online metro origin-destination prediction.

翻訳日:2021-07-05 12:47:56 公開日:2021-07-02

# ニューラルネットワーク層代数:ディープラーニングにおけるキャパシティと圧縮を測定するフレームワーク

Neural Network Layer Algebra: A Framework to Measure Capacity and Compression in Deep Learning ( http://arxiv.org/abs/2107.01081v1 )

ライセンス: Link先を確認

Alberto Badias and Ashis Banerjee

(参考訳) 本稿では,ニューラルネットワークの内在特性を測定するための新しい枠組みを提案する。畳み込みネットワークにフォーカスしながら、我々のフレームワークはどんなネットワークアーキテクチャにも外挿できる。特に,ネットワーク構造のみに依存し,トレーニングやテストデータに依存しない,キャパシティ(表現性に関連する)と圧縮の2つの特性を評価した。この目的のために、第1のメトリクスは、レイヤ複雑性と呼ばれ、任意のネットワーク層のアーキテクチャ上の複雑さを捉え、第2のメトリクスは、レイヤ固有のパワーと呼ばれ、ネットワークに沿ってデータを圧縮する方法を符号化する。メトリクスは、この論文で紹介された層代数の概念に基づいている。この概念は、グローバルなプロパティがネットワークトポロジに依存し、任意のニューラルネットワークの葉ノードを局所的な転送関数で近似できるという考えに基づいており、グローバルなメトリクスの簡単な計算を可能にしている。また,我々の測定値を用いて最先端アーキテクチャの特性を比較し,その特性を用いてベンチマークデータセットの分類精度を解析した。

We present a new framework to measure the intrinsic properties of (deep) neural networks. While we focus on convolutional networks, our framework can be extrapolated to any network architecture. In particular, we evaluate two network properties, namely, capacity (related to expressivity) and compression, both of which depend only on the network structure and are independent of the training and test data. To this end, we propose two metrics: the first one, called layer complexity, captures the architectural complexity of any network layer; and the second one, called layer intrinsic power, encodes how data is compressed along the network. The metrics are based on the concept of layer algebra, which is also introduced in this paper. This concept is based on the idea that the global properties depend on the network topology, and the leaf nodes of any neural network can be approximated using local transfer functions, thereby allowing a simple computation of the global metrics. We also compare the properties of state-of-the-art architectures using our metrics and use the properties to analyze the classification accuracy on benchmark datasets.

翻訳日:2021-07-05 12:47:31 公開日:2021-07-02

# brain over brawn -- ステレオカメラを使って、軌道を再構築してより高速なuavを検出し、追跡し、インターセプトする

Brain over Brawn -- Using a Stereo Camera to Detect, Track and Intercept a Faster UAV by Reconstructing Its Trajectory ( http://arxiv.org/abs/2107.00962v1 )

ライセンス: Link先を確認

Antonella Bari\v{s}i\'c, Frano Petric, Stjepan Bogdan

(参考訳) 本稿では,MBZIRC2020 Challenge 1に触発された高速侵入型UAVのインターセプト手法について述べる。侵入者の軌道の形状の知識を活用することで、インターセプションポイントを計算することができる。ターゲット追跡は, YOLOv3 Tiny畳み込みニューラルネットワークによる画像処理と, ジンバル搭載型ZED Miniステレオカメラを用いた深度計算を併用した。我々は、ZED MiniからRGBと深度データを用いてターゲットの3次元位置を抽出し、ノイズを低減するためにヒストグラムに基づく処理を考案した。目標位置の3次元計測は、ベルヌーイの補題を用いて近似した図形形状軌跡の位置、向き、大きさを計算するために用いられる。近似が十分正確であると判断されると、測定と近似の間のハウスドルフ距離によって測定され、インターセプションポイントが算出され、インターセプションUAVがターゲットの経路に位置決めされる。提案手法はmbzircコンペティションで得られた経験に基づいて大幅に改善され,シミュレーションおよびフィールド実験により検証された。その結果, 標的UAVの動作に関する情報を抽出する効率的な視覚認識モジュールが, インターセプションの基盤として開発されたことを確認した。このシステムは、ほとんどのシミュレーション実験において、インターセプターよりも30%速いターゲットを追跡し、インターセプトすることができる。非構造環境でのテストでは、12の成果のうち9つが成功した。

The work presented in this paper demonstrates our approach to intercepting a faster intruder UAV, inspired by the MBZIRC2020 Challenge 1. By leveraging knowledge of the shape of the intruder's trajectory, we are able to calculate the interception point. Target tracking is based on image processing by a YOLOv3 Tiny convolutional neural network, combined with depth calculation using a gimbal-mounted ZED Mini stereo camera. We use RGB and depth data from the ZED Mini to extract the 3D position of the target, for which we devise histogram-of-depth based processing to reduce noise. The obtained 3D measurements of the target's position are used to calculate the position, orientation, and size of a figure-eight shaped trajectory, which we approximate using a lemniscate of Bernoulli. Once the approximation is deemed sufficiently precise, as measured by the Hausdorff distance between the measurements and the approximation, an interception point is calculated to position the intercepting UAV right on the path of the target. The proposed method, which has been significantly improved based on the experience gathered during the MBZIRC competition, has been validated in simulation and through field experiments. The results confirm that an efficient visual perception module, which extracts information related to the motion of the target UAV as a basis for the interception, has been developed. The system is able to track and intercept a target that is 30% faster than the interceptor in the majority of simulation experiments. Tests in an unstructured environment yielded 9 successful results out of 12.
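The two geometric ingredients named in this abstract, a lemniscate of Bernoulli and the Hausdorff distance, can be illustrated with a minimal sketch. It is not the authors' implementation: it works in 2D rather than 3D, ignores the position and orientation of the curve, and the function names `lemniscate_points` and `hausdorff_distance` are hypothetical.

```python
import math

def lemniscate_points(a, n):
    """Sample n points on a lemniscate of Bernoulli with half-width a.

    Uses the standard parametrisation
    x = a*cos(t) / (1 + sin(t)^2), y = a*sin(t)*cos(t) / (1 + sin(t)^2).
    """
    points = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        denom = 1.0 + math.sin(t) ** 2
        points.append((a * math.cos(t) / denom,
                       a * math.sin(t) * math.cos(t) / denom))
    return points

def hausdorff_distance(set_a, set_b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(p_set, q_set):
        # Largest distance from a point in p_set to its nearest point in q_set.
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(set_a, set_b), directed(set_b, set_a))
```

A fit could then be accepted once `hausdorff_distance(measurements, lemniscate_points(a, n))` drops below a chosen threshold; the threshold itself is not given in the abstract.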

翻訳日:2021-07-05 12:46:28 公開日:2021-07-02

# The Optimal Size of an Epistemic Congress

The Optimal Size of an Epistemic Congress ( http://arxiv.org/abs/2107.01042v1 )

ライセンス: Link先を確認

Manon Revel, Tao Lin, Daniel Halpern

(参考訳) 代表制民主主義における議会の最適な規模を分析する。我々は、有権者が一つの根拠となる真理結果で二項問題を判断し、各投票者が彼らの能力レベルに応じて正確に[0, 1]$. 最善の専門家をサンプリングして認識論的会議を構成できると仮定すると、最適な議会のサイズは人口規模で線形であるべきである。この結果は、トップの代表者が任意に高い確率で正確であることを許すとしても、持続する。実世界のデータを分析した結果、議会の実際の規模は、理論的な結果が示す最適なサイズよりもはるかに小さいことがわかった。我々は、極小の議会が直接民主主義を上回り、全ての有権者が投票する状況を分析して結論付けた。

We analyze the optimal size of a congress in a representative democracy. We take an epistemic view where voters decide on a binary issue with one ground truth outcome, and each voter votes correctly according to their competence levels in $[0, 1]$. Assuming that we can sample the best experts to form an epistemic congress, we find that the optimal congress size should be linear in the population size. This result is striking because it holds even when allowing the top representatives to be accurate with arbitrarily high probabilities. We then analyze real world data, finding that the actual sizes of congresses are much smaller than the optimal size our theoretical results suggest. We conclude by analyzing under what conditions congresses of sub-optimal sizes would still outperform direct democracy, in which all voters vote.
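The epistemic view described here builds on the classical question of how often a majority vote is correct. The sketch below covers only the simplest case, assuming every voter has the same competence `p` and votes independently; the paper's model allows heterogeneous competence levels, so this is an illustration of the setting rather than of the paper's result. The function name is hypothetical.

```python
from math import comb

def majority_correct_probability(n, p):
    """Probability that a strict majority of n independent voters is correct.

    Each voter is correct with the same probability p; n must be odd so
    that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    # Sum the binomial probabilities of having more than n/2 correct votes.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))
```

For `p > 0.5` this probability grows with `n`, which is the intuition behind asking how large a congress of competent voters should be.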

翻訳日:2021-07-05 12:45:46 公開日:2021-07-02

# スパースランダムグラフにおけるオンラインマッチング:グリーディアルゴリズムの非漸近的性能

Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm ( http://arxiv.org/abs/2107.00995v1 )

ライセンス: Link先を確認

Nathan Noiry, Flore Sentenac, Vianney Perchet

(参考訳) 逐次予算配分問題により、頂点間の接続がi.d.ではなく、固定度分布(いわゆる構成モデル)を持つオンラインマッチング問題を調査する。偏微分方程式の明示的な系の解であるそれらの連続的な対応によって関連する確率的離散過程を近似することにより、最も単純なアルゴリズムであるgreedyの競合比を推定する。この手法は、問題のサイズが大きくなるにつれて任意に高い確率で、推定誤差の正確な境界を与える。特に、異なる構成モデル間の形式的な比較を可能にする。また、非常に驚くべきことに、GREEDYがRANKINGよりも優れたパフォーマンス保証が得られることを証明しています。

Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but they have fixed degree distributions -- the so-called configuration model. We estimate the competitive ratio of the simplest algorithm, GREEDY, by approximating some relevant stochastic discrete processes by their continuous counterparts, that are solutions of an explicit system of partial differential equations. This technique gives precise bounds on the estimation errors, with arbitrarily high probability as the problem size increases. In particular, it allows the formal comparison between different configuration models. We also prove that, quite surprisingly, GREEDY can have better performance guarantees than RANKING, another celebrated algorithm for online matching that usually outperforms the former.
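As a reminder of what GREEDY does in online bipartite matching, here is a minimal sketch. It assumes that each arriving vertex reveals its list of offline neighbours and is matched to the first free one in that list; the tie-breaking rule is an assumption, and the random graph model analysed in the paper is not reproduced.

```python
def greedy_online_matching(arrivals):
    """Match each online vertex, in arrival order, to a free offline neighbour.

    arrivals[i] is the list of offline vertices adjacent to online vertex i.
    Returns a list of (online_vertex, offline_vertex) pairs.
    """
    matched_offline = set()
    matching = []
    for online_vertex, neighbours in enumerate(arrivals):
        for offline_vertex in neighbours:
            if offline_vertex not in matched_offline:
                # The decision is irrevocable: GREEDY never revisits a match.
                matched_offline.add(offline_vertex)
                matching.append((online_vertex, offline_vertex))
                break
    return matching
```

On the input `[[0, 1], [0]]` this returns a single pair, although a matching of size two exists, which is the kind of loss that the competitive ratio quantifies.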

翻訳日:2021-07-05 12:45:32 公開日:2021-07-02

# LensID:白内障手術ビデオにおけるレンズ不規則性検出を目的としたCNN-RNNベースのフレームワーク

LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in Cataract Surgery Videos ( http://arxiv.org/abs/2107.00875v1 )

ライセンス: Link先を確認

Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schoeffmann

(参考訳) 白内障手術後の致命的な合併症は、視力の低下と眼外傷につながるレンズインプラントの脱臼である。この合併症のリスクを軽減するためには、手術中の危険因子を発見することが不可欠である。しかし、レンズ脱臼とその不審な危険因子との関係を多数のビデオを用いて検討することは、時間的拡張の手順である。そのため、外科医はより大規模で信頼性の高い研究を可能にするために、自動的なアプローチを要求する。本稿では,レンズの不規則性検出のための大きなステップとして,新しい枠組みを提案する。特に、(I)レンズ導入フェーズを認識するエンドツーエンドのリカレントニューラルネットワークを提案し、(II)インプラントフェーズ後にレンズと瞳孔を分割する新しいセマンティックセグメンテーションネットワークを提案する。位相認識結果から, 手術用位相認識手法の有効性が示された。さらに,セグメンテーション結果は,最先端の競合手法と比較して,セグメンテーションネットワークの有効性を確認した。

A critical complication after cataract surgery is the dislocation of the lens implant, leading to vision deterioration and eye trauma. In order to reduce the risk of this complication, it is vital to discover the risk factors during the surgery. However, studying the relationship between lens dislocation and its suspected risk factors using numerous videos is a time-consuming procedure. Hence, surgeons demand an automatic approach to enable a larger-scale and, accordingly, more reliable study. In this paper, we propose a novel framework as the major step towards lens irregularity detection. In particular, we propose (I) an end-to-end recurrent neural network to recognize the lens-implantation phase and (II) a novel semantic segmentation network to segment the lens and pupil after the implantation phase. The phase recognition results reveal the effectiveness of the proposed surgical phase recognition approach. Moreover, the segmentation results confirm the proposed segmentation network's effectiveness compared to state-of-the-art rival approaches.

翻訳日:2021-07-05 12:45:09 公開日:2021-07-02

# HO-3D_v3: HO-3Dデータセットの手動アノテーションの精度向上

HO-3D_v3: Improving the Accuracy of Hand-Object Annotations of the HO-3D Dataset ( http://arxiv.org/abs/2107.00887v1 )

ライセンス: Link先を確認

Shreyas Hampali, Sayan Deb Sarkar, Vincent Lepetit

(参考訳) HO-3Dは、手とオブジェクトの3Dポーズにアノテートされた様々なハンドオブジェクトインタラクションシナリオの画像シーケンスを提供するデータセットで、元々HO-3D_v2として導入された。本論文で紹介した最適化手法「本注」を用いてアノテーションを自動生成した。 ho-3d_v3は、手とオブジェクトのポーズの両方に対してより正確なアノテーションを提供するので、手とオブジェクトの接触領域の見積もりがより良くなります。本稿では,HOnnotate法の改良について詳述し,HO-3D_v2とHO-3D_v3の精度を比較するための評価を行った。 ho-3d_v3は、手ポーズのho-3d_v2よりも4mm高い精度を示し、物体表面との接触領域も高い。

HO-3D is a dataset providing image sequences of various hand-object interaction scenarios annotated with the 3D pose of the hand and the object and was originally introduced as HO-3D_v2. The annotations were obtained automatically using an optimization method, 'HOnnotate', introduced in the original paper. HO-3D_v3 provides more accurate annotations for both the hand and object poses thus resulting in better estimates of contact regions between the hand and the object. In this report, we elaborate on the improvements to the HOnnotate method and provide evaluations to compare the accuracy of HO-3D_v2 and HO-3D_v3. HO-3D_v3 results in 4mm higher accuracy compared to HO-3D_v2 for hand poses while exhibiting higher contact regions with the object surface.

翻訳日:2021-07-05 12:44:53 公開日:2021-07-02

# 連続感情認識のための視聴覚・視聴覚融合

Audio-visual Attentive Fusion for Continuous Emotion Recognition ( http://arxiv.org/abs/2107.01175v1 )

ライセンス: Link先を確認

Su Zhang, Yi Ding, Ziquan Wei, Cuntai Guan

(参考訳) 本稿では,(1)事前訓練された2d-cnnと時間畳み込みネットワーク(tcn)を含む視覚ブロック,(2)複数の並列tcnを含むオーラルブロック,(3)音声・視覚情報を結合したリーダ・フォロー・アテンション・フュージョンブロックという,視聴覚・時空間深層ニューラルネットワークを提案する。大規模な履歴カバレッジを持つttnは、ベースラインや最先端の手法(36または48)よりもずっと大きなウィンドウ長(つまり300)で空間-時間情報を利用することができる。融合ブロックは視覚モダリティを強調しつつ、ノイズのオーラルモダリティを相互モダリティ注意機構を用いて活用する。データの完全活用と過度な適合を軽減するため、トレーニングおよび検証セット上でクロスバリデーションを行う。コンコータンス相関係数(CCC)中心は、各折り目から結果をマージするために用いられる。現像セットでは、得られたcccはvalence 0.410、arousal 0.661であり、対応するcccはvalence 0.210、arousal 0.230である。コードはhttps://github.com/sucv/abaw2で入手できる。

We propose an audio-visual spatial-temporal deep neural network with: (1) a visual block containing a pretrained 2D-CNN followed by a temporal convolutional network (TCN); (2) an aural block containing several parallel TCNs; and (3) a leader-follower attentive fusion block combining the audio-visual information. The TCN with large history coverage enables our model to exploit spatial-temporal information within a much larger window length (i.e., 300) than that of the baseline and state-of-the-art methods (i.e., 36 or 48). The fusion block emphasizes the visual modality while exploiting the noisy aural modality using the inter-modality attention mechanism. To make full use of the data and alleviate over-fitting, cross-validation is carried out on the training and validation set. The concordance correlation coefficient (CCC) centering is used to merge the results from each fold. On the development set, the achieved CCC is 0.410 for valence and 0.661 for arousal, which significantly outperforms the baseline method with the corresponding CCC of 0.210 and 0.230 for valence and arousal, respectively. The code is available at https://github.com/sucv/ABAW2.
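The concordance correlation coefficient reported above has a standard closed form, $\mathrm{CCC} = 2\sigma_{xy} / (\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2)$. A minimal sketch using population variances follows; the function name is hypothetical and this is not taken from the linked repository.

```python
def concordance_correlation(x, y):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Unlike Pearson correlation, a shift in the mean lowers the score.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Predictions that are perfectly correlated with the labels but offset by a constant therefore score below 1, which is why CCC is a common metric for continuous valence and arousal estimation.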

翻訳日:2021-07-05 12:44:40 公開日:2021-07-02

# ペナル化条件勾配法の再検討

Screening for a Reweighted Penalized Conditional Gradient Method ( http://arxiv.org/abs/2107.01106v1 )

ライセンス: Link先を確認

Yifan Sun and Francis Bach

(参考訳) 条件勾配法(CGM)は大規模なスパース凸最適化において広く用いられ、構造化スパース正規化器の1イテレーション当たりの計算コストが低く、非ゼロの収集に対する欲求的なアプローチである。非凸正則化器用一般ペナリゼーションCGM(P-CGM)と非凸正則化器用再重み付きペナリゼーションCGM(RP-CGM)について,通常の凸制約をゲージインスパイアされたペナリティーに置き換えた。この一般化は、イテレーション当たりの複雑さを顕著に増やさない。有界イテレートや線探索を仮定せずに、各サブプロブレムのギャップを$O(1/t)$収束させ、静止点までの距離を測定する。我々はこれを凸の場合において安全であるスクリーニング規則と結合し、o(1/(\delta^2))$で真のサポートに収束する。非凸の場合、スクリーニング規則は有限個の反復において真の支持に収束するが、中間イテレートでは必ずしも安全ではない。本実験では, 本手法の整合性を検証し, 正則化器の凹凸を調整し, スクリーニング規則の適応性を調整した。

The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization, having a low per iteration computational cost for structured sparse regularizers and a greedy approach to collecting nonzeros. We explore the sparsity acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not increase the per-iteration complexity noticeably. Without assuming bounded iterates or using line search, we show $O(1/t)$ convergence of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/(\delta^2))$ where $\delta \geq 0$ measures how close the problem is to degeneracy. In the nonconvex case the screening rule converges to the true support in a finite number of iterations, but is not necessarily safe in the intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.
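For orientation, the classical constrained conditional gradient (Frank-Wolfe) method that the paper generalizes can be sketched as below. This is not P-CGM or RP-CGM: it keeps the usual convex constraint (an $\ell_1$ ball) instead of a penalty, and the objective $\tfrac{1}{2}\lVert x - b\rVert^2$, the step size $2/(t+2)$, and the function name are illustrative assumptions.

```python
def frank_wolfe_l1(b, radius, iterations):
    """Minimise 0.5 * ||x - b||^2 over the l1 ball of the given radius."""
    x = [0.0] * len(b)
    for t in range(iterations):
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Linear minimisation oracle: the best vertex of the l1 ball puts all
        # mass on the coordinate with the largest absolute gradient.
        i = max(range(len(grad)), key=lambda j: abs(grad[j]))
        vertex = [0.0] * len(b)
        vertex[i] = -radius if grad[i] > 0 else radius
        step = 2.0 / (t + 2)
        # Convex combination keeps the iterate inside the ball.
        x = [(1.0 - step) * xi + step * vi for xi, vi in zip(x, vertex)]
    return x
```

Each iteration adds at most one new nonzero coordinate, which is the greedy sparsity behaviour the abstract refers to.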

翻訳日:2021-07-05 12:43:08 公開日:2021-07-02

# 機械学習による道路粗さ推定

Road Roughness Estimation Using Machine Learning ( http://arxiv.org/abs/2107.01199v1 )

ライセンス: Link先を確認

Milena Bajic, Shahrzad M. Pour, Asmus Skar, Matteo Pettinari, Eyal Levenberg, Tommy Sonne Alstr{\o}m

(参考訳) 路面粗さは、乗客の安全と乗り心地の両方に影響を与えるため、インフラにとって非常に重要な道路条件である。道路は経時的に劣化するので、道路インフラの状況を正確に把握するために、道路粗さを継続的に監視する必要がある。本稿では,自動車の垂直加速度と速度を用いた道路粗さ予測のための機械学習パイプラインを提案する。我々は、線形回帰、ナイーブベイズ、k-アネレスト隣人、ランダムフォレスト、サポートベクターマシン、マルチ層パーセプトロンニューラルネットワークなどのよく知られた機械学習モデルを比較した。モデルは、時間領域と統計領域で計算される最適選択された特徴の集合に基づいて訓練される。その結果, 従来の乗用車に搭載された車載センサの費用対効果を用いて, 機械学習により道路の粗さを正確に予測できることがわかった。本研究は, 広域道路網の連続監視を可能にすることにより, 今後の舗装状況監視に適していることを示す。

Road roughness is a very important road condition for the infrastructure, as the roughness affects both the safety and ride comfort of passengers. Roads deteriorate over time, which means that road roughness must be continuously monitored in order to have an accurate understanding of the condition of the road infrastructure. In this paper, we propose a machine learning pipeline for road roughness prediction using the vertical acceleration of the car and the car speed. We compared well-known supervised machine learning models such as linear regression, naive Bayes, k-nearest neighbor, random forest, support vector machine, and the multi-layer perceptron neural network. The models are trained on an optimally selected set of features computed in the temporal and statistical domain. The results demonstrate that machine learning methods can accurately predict road roughness, using the recordings of the affordable in-vehicle sensors installed in conventional passenger cars. Our findings demonstrate that the technology is well suited to meet future pavement condition monitoring needs, by enabling continuous monitoring of a wide road network.

翻訳日:2021-07-05 12:42:43 公開日:2021-07-02

# アンサンブルモデリングと伝達学習によるロバスト薬物・標的相互作用予測に向けて

Toward Robust Drug-Target Interaction Prediction via Ensemble Modeling and Transfer Learning ( http://arxiv.org/abs/2107.00719v1 )

ライセンス: Link先を確認

Po-Yu Kao, Shu-Min Kao, Nan-Lan Huang, Yen-Chu Lin

(参考訳) 薬物-標的相互作用(DTI)予測は薬物発見において重要な役割を担い、ディープラーニングアプローチはこの分野で最先端のパフォーマンスを達成した。本稿では,DTI予測のための深層学習モデル(EnsembleDLM)のアンサンブルを紹介する。 EnsembleDLMは化学物質やタンパク質の配列情報のみを使用し、複数のディープニューラルネットワークからの予測を集約する。このアプローチはオーバーフィッティングの機会を減らし、バイアスのない予測をもたらし、DavisとKIBAのデータセットで最先端のパフォーマンスを達成する。 EnsembleDLMは、新しいドメインにおけるテストデータの約2倍の量を用いて転送学習を行い、クロスドメインアプリケーションにおける最先端性能と適切なクロスドメインパフォーマンス(ピアソン相関係数とコンコータンス指数 > 0.8)を達成する。

Drug-target interaction (DTI) prediction plays a crucial role in drug discovery, and deep learning approaches have achieved state-of-the-art performance in this field. We introduce an ensemble of deep learning models (EnsembleDLM) for robust DTI prediction. EnsembleDLM only uses the sequence information of chemical compounds and proteins, and it aggregates the predictions from multiple deep neural networks. This approach reduces the chance of overfitting, yields an unbiased prediction, and achieves state-of-the-art performance in Davis and KIBA datasets. EnsembleDLM also reaches state-of-the-art performance in cross-domain applications and decent cross-domain performance (Pearson correlation coefficient and concordance index > 0.8) with transfer learning using approximately twice the amount of test data in the new domain.
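Two ideas in this abstract are easy to make concrete: aggregating the predictions of several models, and scoring them with the concordance index. The sketch below assumes simple averaging as the aggregation rule, which the abstract does not confirm, and uses the usual pairwise definition of the concordance index with tied predictions counted as half. Function names are hypothetical.

```python
from itertools import combinations

def ensemble_mean(predictions_per_model):
    """Average the per-sample predictions of several models."""
    n_models = len(predictions_per_model)
    return [sum(sample) / n_models for sample in zip(*predictions_per_model)]

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # pairs with equal true values are not comparable
        comparable += 1
        true_diff = y_true[i] - y_true[j]
        pred_diff = y_pred[i] - y_pred[j]
        if pred_diff == 0:
            concordant += 0.5
        elif (true_diff > 0) == (pred_diff > 0):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of binding affinities.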

翻訳日:2021-07-05 12:42:26 公開日:2021-07-02

# フィードバック型サイバーレジリエンスのための強化学習

Reinforcement Learning for Feedback-Enabled Cyber Resilience ( http://arxiv.org/abs/2107.00783v1 )

ライセンス: Link先を確認

Yunhan Huang, Linan Huang, Quanyan Zhu

(参考訳) デバイス数と接続の急速な増加は、攻撃面を拡大し、サイバーシステムを弱体化させている。攻撃者がますます高度でリソースに富むようになるにつれて、侵入検知、ファイアウォール、暗号化といった従来のサイバー保護に頼るだけでは、セキュアなサイバーシステムには不十分である。サイバーレジリエンスは、不適切な保護とレジリエンスメカニズムを補完する新しいセキュリティパラダイムを提供する。 CRM(Cyber-Resilient Mechanism)は、既知の、あるいはゼロデイの脅威に適応し、リアルタイムで不確実性に対処し、戦略的にサイバーシステムの重要な機能を維持する。フィードバックアーキテクチャはCRMのオンラインセンシング、推論、動作を可能にする上で重要な役割を担います。強化学習(Reinforcement Learning, RL)は、サイバーレジリエンスのためのフィードバックアーキテクチャを模倣する重要なアルゴリズムのクラスであり、CRMは攻撃者の事前知識に制限された攻撃に対して動的かつシーケンシャルな応答を提供することができる。本稿では,サイバーレジリエンスに関するRLに関する文献をレビューし,姿勢関連,情報関連,人為的脆弱性の3つの主要な脆弱性に対するサイバーレジリエントな防御について論じる。我々は,CRMの3つのアプリケーションドメインとして,移動目標防衛,サイバー詐欺,ヒューマンセキュリティ技術を導入し,その設計を詳述する。 RLテクニックにも脆弱性がある。本稿では、RLの主な脆弱性を説明し、攻撃が報酬、測定、アクチュエータを標的とするいくつかの攻撃モデルを示す。攻撃者はRLエージェントを騙して最小限の攻撃力で悪質なポリシーを学習し、RL対応システムに対する重大なセキュリティ上の懸念を示す。最後に、サイバーセキュリティとレジリエンスにおけるRLの今後の課題と、RLベースのCRMの新たな応用について論じる。

The rapid growth in the number of devices and their connectivity has enlarged the attack surface and weakened cyber systems. As attackers become increasingly sophisticated and resourceful, mere reliance on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to the known or zero-day threats and uncertainties in real-time and strategically responds to them to maintain the critical functions of the cyber systems. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation of the CRM. Reinforcement Learning (RL) is an important class of algorithms that epitomize the feedback architectures for cyber resiliency, allowing the CRM to provide dynamic and sequential responses to attacks with limited prior knowledge of the attacker. In this work, we review the literature on RL for cyber resiliency and discuss the cyber-resilient defenses against three major types of vulnerabilities, i.e., posture-related, information-related, and human-related vulnerabilities. We introduce moving target defense, defensive cyber deception, and assistive human security technologies as three application domains of CRMs to elaborate on their designs. The RL technique also has vulnerabilities itself. We explain the major vulnerabilities of RL and present several attack models in which the attacks target the rewards, the measurements, and the actuators. We show that the attacker can trick the RL agent into learning a nefarious policy with minimum attacking effort, which shows serious security concerns for RL-enabled systems. Finally, we discuss the future challenges of RL for cyber security and resiliency and emerging applications of RL-based CRMs.

翻訳日:2021-07-05 12:42:02 公開日:2021-07-02

# RL-NCS:非一様圧縮センシングのための強化学習に基づくデータ駆動アプローチ

RL-NCS: Reinforcement learning based data-driven approach for nonuniform compressed sensing ( http://arxiv.org/abs/2107.00838v1 )

ライセンス: Link先を確認

Nazmul Karim, Alireza Zaeemzadeh, and Nazanin Rahnavard

(参考訳) 時間変化信号のための強化学習に基づく非一様圧縮センシング(NCS)フレームワークを導入する。 RL-NCSと呼ばれる提案手法は,信号のROI係数と非ROI係数の2つの係数群間のセンサエネルギーの最適かつ適応的な分布を通じて,信号回復性能を向上させることを目的としている。 ROIの係数は通常より重要であり、非ROI係数よりも高い精度で再構成する必要がある。このタスクを達成するために、ROIは2つの特定のアプローチを使用して各タイミングで予測される。これらのアプローチの1つは、予測のために長い短期記憶(LSTM)ネットワークを組み込んでいる。別のアプローチでは、次のステップROIを予測するために、以前のROI情報を使用します。探索探索法を用いて、qネットワークは測定行列を設計するための最適なアプローチを選択することを学ぶ。さらに,Q-network と LSTM ネットワークの効率的なトレーニングのために,結合損失関数を導入している。その結果,急速に変化する信号や測定回数の削減においても,提案手法の有効性が示唆された。

A reinforcement-learning-based non-uniform compressed sensing (NCS) framework for time-varying signals is introduced. The proposed scheme, referred to as RL-NCS, aims to boost the performance of signal recovery through an optimal and adaptive distribution of sensing energy among two groups of coefficients of the signal, referred to as the region of interest (ROI) coefficients and non-ROI coefficients. The coefficients in ROI usually have greater importance and need to be reconstructed with higher accuracy compared to non-ROI coefficients. In order to accomplish this task, the ROI is predicted at each time step using two specific approaches. One of these approaches incorporates a long short-term memory (LSTM) network for the prediction. The other approach employs the previous ROI information for predicting the next step ROI. Using the exploration-exploitation technique, a Q-network learns to choose the best approach for designing the measurement matrix. Furthermore, a joint loss function is introduced for the efficient training of the Q-network as well as the LSTM network. The result indicates a significant performance gain for our proposed method, even for rapidly varying signals and a reduced number of measurements.

翻訳日:2021-07-05 12:41:31 公開日:2021-07-02

# 適応侵入検知システムのためのセグメンテッドフェデレーション学習

Segmented Federated Learning for Adaptive Intrusion Detection System ( http://arxiv.org/abs/2107.00881v1 )

ライセンス: Link先を確認

Geet Shingi, Harsh Saglani, Preeti Jain

(参考訳) サイバー攻撃は大きな問題であり、組織に大きな経済的、評判の害をもたらす。しかし、様々な要因により、現在のネットワーク侵入検知システム(nids)は不十分であると思われる。 NIDSは、手作りのルールデータセットを通じてサイバー攻撃を特定する。機械学習とディープラーニングの最近の応用は、nidsの膨大な努力を和らげてきたが、ネットワークデータのセキュリティは常に主要な関心事であった。しかし、セキュリティ問題に遭遇し、組織間の共有を可能にするために、フェデレートラーニング(FL)スキームが採用されている。現在のFLシステムは成功したが、ネットワークのデータ分散はFLのような単一のグローバルモデルに必ずしも適合しない。したがって、そのような場合、fl に単一の大域モデルを持つことは不可能である。本稿では,より効率的なNIDSのためのSegmented-Federated Learning(Segmented-FL)学習手法を提案する。 segmented-flアプローチでは、セグメンテーションの発生状況に基づいて周期的局所モデル評価を行う。同様のネットワーク環境を同じグループに持ち込もうとしている。さらに、Segmented-FLシステムは、作業者が保持するデータサンプル数に基づいて、局所モデルパラメータの重み付け集約と結合して、さらなる性能向上を行う。 FLや標準データセットの集中型システムと比較して,システムの性能向上が図られ,様々なタスクにまたがってその技術を拡張することが強くなっています。このソリューションは、多様なネットワーク環境を共同で学び、個々のデータセットのプライバシーを保護したい組織に応用される。

Cyberattacks are a major issue and cause organizations great financial and reputational harm. However, due to various factors, current network intrusion detection systems (NIDS) seem to be insufficient. Predominant NIDS identify cyberattacks through a handcrafted dataset of rules. Although recent applications of machine learning and deep learning have alleviated the enormous effort in NIDS, the security of network data has always been a prime concern. To counter the security problem and enable sharing among organizations, a Federated Learning (FL) scheme is employed. Although current FL systems have been successful, a network's data distribution does not always fit into a single global model as in FL. Thus, in such cases, having a single global model in FL is not feasible. In this paper, we propose a Segmented-Federated Learning (Segmented-FL) scheme for a more efficient NIDS. The Segmented-FL approach employs periodic local model evaluation, based on which the segmentation occurs. We aim to bring similar network environments into the same group. Further, the Segmented-FL system is coupled with a weighted aggregation of local model parameters, based on the number of data samples a worker possesses, to further augment the performance. The improved performance of our system compared to FL and centralized systems on a standard dataset further validates our system and makes a strong case for extending our technique across various tasks. The solution finds its application in organizations that want to collaboratively learn on diverse network environments and protect the privacy of individual datasets.
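The weighted aggregation step mentioned in this abstract can be sketched as a sample-count-weighted average of local parameter vectors, in the style of federated averaging. This is an illustration under that assumption only: the segmentation of workers into groups is not shown, parameters are flattened into plain lists, and the function name is hypothetical.

```python
def weighted_average_parameters(local_parameters, sample_counts):
    """Average local parameter vectors, weighting each by its worker's sample count.

    local_parameters: list of equal-length lists of floats, one per worker.
    sample_counts: number of training samples held by each worker.
    """
    total_samples = sum(sample_counts)
    dimension = len(local_parameters[0])
    return [
        sum(count * params[j]
            for params, count in zip(local_parameters, sample_counts)) / total_samples
        for j in range(dimension)
    ]
```

In a segmented setting, one would presumably apply such an average separately within each group of similar workers rather than once across all of them.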

翻訳日:2021-07-05 12:41:14 公開日:2021-07-02

# アクセント音声認識のための教師付きコントラスト学習

Supervised Contrastive Learning for Accented Speech Recognition ( http://arxiv.org/abs/2107.00921v1 )

ライセンス: Link先を確認

Tao Han, Hantao Huang, Ziang Yang, Wei Han

(参考訳) ニューラルネットワークに基づく音声認識システムは、アクセント付き音声、特に不慣れなアクセントによる性能劣化に悩まされる。本稿では,アクセント付き音声認識のための教師付きコントラスト学習フレームワークについて検討する。コントラスト学習のための異なる視点(類似の「陽性」データサンプル)を構築するため,ノイズ注入,分光法,TS-Same-same-sence生成を含む3つのデータ拡張手法について検討した。共通音声データセットを用いた実験から, コントラスト学習は, ゼロショットとフルショットの両方において, 従来の共同学習法を著しく上回るデータ提示不変量および発音不変量表現の構築に寄与することを示した。コントラスト学習は,合同訓練法と比較して,平均で3.66% (ゼロショット) と3.78% (フルショット) の精度向上が示された。

Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents. In this paper, we study the supervised contrastive learning framework for accented speech recognition. To build different views (similar "positive" data samples) for contrastive learning, three data augmentation techniques, including noise injection, spectrogram augmentation, and TTS-same-sentence generation, are further investigated. From the experiments on the Common Voice dataset, we show that contrastive learning helps to build data-augmentation invariant and pronunciation invariant representations, and that it significantly outperforms traditional joint training methods in both zero-shot and full-shot settings. Experiments show that contrastive learning can improve accuracy by 3.66% (zero-shot) and 3.78% (full-shot) on average, compared to the joint training method.
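A supervised contrastive objective pulls together embeddings that share a label and pushes apart the rest. The sketch below follows the commonly used form of the supervised contrastive loss over L2-normalised embeddings; whether the paper uses exactly this form, and which temperature it uses, are assumptions. The function name is hypothetical.

```python
import math

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over all anchors that have a positive."""
    normalised = []
    for vector in embeddings:
        norm = math.sqrt(sum(c * c for c in vector))
        normalised.append([c / norm for c in vector])
    n = len(normalised)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {a: sum(u * v for u, v in zip(normalised[i], normalised[a])) / temperature
                for a in range(n) if a != i}
        # Log-sum-exp over all other samples, stabilised by the maximum.
        max_sim = max(sims.values())
        log_denominator = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("no anchor has a positive sample")
    return total / anchors
```

In the setting of the abstract, the positives for an utterance would be its augmented views (noise-injected, spectrogram-augmented, or TTS-generated versions of the same sentence).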

翻訳日:2021-07-05 12:40:53 公開日:2021-07-02

# システム設計と運用のための伝達距離の実証計測

Empirically Measuring Transfer Distance for System Design and Operation ( http://arxiv.org/abs/2107.01184v1 )

ライセンス: Link先を確認

Tyler Cody, Stephen Adams, Peter A. Beling

(参考訳) 古典的な機械学習アプローチは非定常性に敏感である。転送学習は、あるシステムから別のシステムへの知識を共有することによって、非定常性に対処することができるが、機械の予測や防御といった分野においては、データは基本的に制限される。したがって、転送学習アルゴリズムには、学習すべき例がほとんどない。本稿では,これらのアルゴリズム学習の制約がシステム工学によって対処可能であることを示唆する。一般に移動距離を定式化し,モデルの伝達可能性の実証的定量化におけるその利用を実証する。我々は, 転置可能な予測モデルを実現するために, 機械改造手順の設計における移動距離の利用を検討する。また,コンピュータビジョンにおける操作性能予測における転送距離の利用も検討する。経験者は、コンポーネント学習システムで直面する学習論的課題を考慮して、提示された方法論を使ってシステムの設計と運用を行うことができる。

Classical machine learning approaches are sensitive to non-stationarity. Transfer learning can address non-stationarity by sharing knowledge from one system to another; however, in areas like machine prognostics and defense, data is fundamentally limited. Therefore, transfer learning algorithms have few, if any, examples from which to learn. Herein, we suggest that these constraints on algorithmic learning can be addressed by systems engineering. We formally define transfer distance in general terms and demonstrate its use in empirically quantifying the transferability of models. We consider the use of transfer distance in the design of machine rebuild procedures to allow for transferable prognostic models. We also consider the use of transfer distance in predicting operational performance in computer vision. Practitioners can use the presented methodology to design and operate systems with consideration for the learning theoretic challenges faced by component learning systems.

翻訳日:2021-07-05 12:40:16 公開日:2021-07-02

# 転校学習のシステム理論

A Systems Theory of Transfer Learning ( http://arxiv.org/abs/2107.01196v1 )

ライセンス: Link先を確認

Tyler Cody, Peter A. Beling

(参考訳) 伝達学習のための既存のフレームワークは、システム理論の観点から不完全である。彼らはドメインとタスクの概念を強調し、構造と振舞いの概念を無視する。そうすることで、形式主義が彼らの枠組みの解明に果たすことができる範囲を制限できる。ここでは、転移学習を集合上の関係として定義し、その後、転移学習の一般的な性質を数学的構成として特徴づけるメサロヴィッチ系理論を用いる。既存のフレームワークを私たちの観点で解釈し、トランスファー可能性、転送粗さ、転送距離の概念を定義する既存のフレームワークを越えています。重要な点は、その形式化にもかかわらず、学習理論の詳細な数学や機械学習の解法を回避し、それらの考察を取り除かないことである。したがって、システム設計と分析のための厳格な基盤を提供する、転送学習をモデリングするための正式な汎用システムフレームワークを提供する。

Existing frameworks for transfer learning are incomplete from a systems theoretic perspective. They place emphasis on notions of domain and task, and neglect notions of structure and behavior. In doing so, they limit the extent to which formalism can be carried through into the elaboration of their frameworks. Herein, we use Mesarovician systems theory to define transfer learning as a relation on sets and subsequently characterize the general nature of transfer learning as a mathematical construct. We interpret existing frameworks in terms of ours and go beyond existing frameworks to define notions of transferability, transfer roughness, and transfer distance. Importantly, despite its formalism, our framework avoids the detailed mathematics of learning theory or machine learning solution methods without excluding their consideration. As such, we provide a formal, general systems framework for modeling transfer learning that offers a rigorous foundation for system design and analysis.

翻訳日:2021-07-05 12:40:03 公開日:2021-07-02

# Attentive Speaker Embedding を用いたマルチユーザボイスフィルターライト

Multi-user VoiceFilter-Lite via Attentive Speaker Embedding ( http://arxiv.org/abs/2107.01201v1 )

ライセンス: Link先を確認

Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw

In this paper, we propose a solution to allow speaker-conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated it for three tasks: (1) a streaming automatic speech recognition (ASR) task; (2) a text-independent speaker verification task; and (3) a personalized keyphrase detection task, where ASR has to detect keyphrases from multiple enrolled users in a noisy environment. Our experiments show that, with up to four enrolled users, multi-user VoiceFilter-Lite is able to significantly reduce speech recognition and speaker verification errors when there is overlapping speech, without affecting performance under other acoustic conditions. This attentive speaker embedding approach can also be easily applied to other speaker-conditioned models such as personal VAD and personalized ASR.
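As a rough illustration of how several speaker embeddings can be collapsed into a single attentive embedding, the sketch below uses scaled dot-product attention. The scoring function, the `query` vector (imagined here as something derived from the current audio frame), and the function names are all assumptions made for illustration; the abstract does not specify how the attention weights are computed.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_embedding(query: Sequence[float],
                        speaker_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Collapse N enrolled-speaker embeddings into one vector.

    Scores are scaled dot products between the query and each embedding.
    This scoring rule is an assumption, not necessarily the paper's.
    """
    if not speaker_embeddings:
        raise ValueError("at least one enrolled speaker is required")
    dim = len(query)
    scores = [
        sum(q * e for q, e in zip(query, emb)) / math.sqrt(dim)
        for emb in speaker_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of the embeddings, one output value per dimension.
    return [
        sum(w * emb[i] for w, emb in zip(weights, speaker_embeddings))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical enrolled users with toy 2-dimensional embeddings.
    enrolled = [[1.0, 0.0], [0.0, 1.0]]
    # A query aligned with the first user pulls the result toward that user.
    print(attentive_embedding([10.0, 0.0], enrolled))
```

Because the output has the same dimension as a single speaker embedding, a model that already accepts one embedding as a side input could in principle consume it unchanged, whatever the number of enrolled users.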

Translated: 2021-07-05 12:39:50 Published: 2021-07-02

# Unveiling the structure of wide flat minima in neural networks ( http://arxiv.org/abs/2107.01163v1 )

License: check the linked page

Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Gabriele Perugini, Riccardo Zecchina

The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks which needs to be understood in depth. Such algorithms are able to fit the data almost perfectly, even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise from the coalescence of minima that correspond to high-margin classifications. Despite being exponentially rare compared to zero-margin solutions, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.
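To make the notion of "flatness" more concrete, the sketch below estimates how much a loss rises when the weights are randomly perturbed around a minimum. This perturbation-based proxy, the function name `flatness_proxy`, and the two toy quadratic losses are assumptions chosen for illustration; they are not the analytical method of the paper.

```python
import random
from typing import Callable, Sequence


def flatness_proxy(loss: Callable[[Sequence[float]], float],
                   weights: Sequence[float],
                   sigma: float = 0.1,
                   samples: int = 200,
                   seed: int = 0) -> float:
    """Average loss increase under Gaussian weight perturbations.

    Smaller values mean the loss changes little around `weights`,
    i.e. the minimum is flatter under this assumed proxy.
    """
    rng = random.Random(seed)
    base = loss(weights)
    total = 0.0
    for _ in range(samples):
        perturbed = [w + rng.gauss(0.0, sigma) for w in weights]
        total += loss(perturbed) - base
    return total / samples


def wide_loss(w: Sequence[float]) -> float:
    """Toy loss with a wide, shallow minimum at the origin."""
    return 0.1 * sum(x * x for x in w)


def narrow_loss(w: Sequence[float]) -> float:
    """Toy loss with a narrow, steep minimum at the origin."""
    return 10.0 * sum(x * x for x in w)


if __name__ == "__main__":
    origin = [0.0, 0.0]
    print("wide:  ", flatness_proxy(wide_loss, origin))
    print("narrow:", flatness_proxy(narrow_loss, origin))
```

With the same seed both calls see identical perturbations, so the narrow minimum reports a larger average loss increase than the wide one.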

Translated: 2021-07-05 12:39:32 Published: 2021-07-02

PDF registration status (publication date: 20210702)