Fugu-MT: arXiv Paper Translations

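This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.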

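The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).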

These are the papers with a publication date of 20210314.

Title	Authors	Abstract	Publication date / translation date
# Uniform continuity bounds for characteristics of multipartite quantum systems ( http://arxiv.org/abs/2007.00417v2 ) License: see the linked page	M.E.Shirokov	We consider universal methods for obtaining (uniform) continuity bounds for characteristics of multipartite quantum systems. We pay special attention to infinite-dimensional multipartite quantum systems under energy constraints. By these methods we obtain continuity bounds for several important characteristics of a multipartite quantum state: the quantum (conditional) mutual information, the squashed entanglement, the c-squashed entanglement and the conditional entanglement of mutual information. The continuity bounds for the multipartite quantum mutual information are asymptotically tight for large dimension/energy. The obtained results are used to prove the asymptotic continuity of the $n$-partite squashed entanglement, c-squashed entanglement and the conditional entanglement of mutual information under energy constraints.	Translated: 2023-05-11 23:10:30 Published: 2021-03-14
# Experimental implementation of non-Clifford interleaved randomized benchmarking with a controlled-S gate ( http://arxiv.org/abs/2007.08532v2 ) License: see the linked page	Shelly Garion, Naoki Kanazawa, Haggai Landa, David C. McKay, Sarah Sheldon, Andrew W. Cross, Christopher J. Wood	Hardware-efficient transpilation of quantum circuits to a quantum device's native gateset is essential for the execution of quantum algorithms on noisy quantum computers. Typical quantum devices utilize a gateset with a single two-qubit Clifford entangling gate per pair of coupled qubits; however, in some applications access to a non-Clifford two-qubit gate can result in more optimal circuit decompositions and also allows more flexibility in optimizing over noise. We demonstrate calibration of a low-error non-Clifford Controlled-$\frac{\pi}{2}$ phase (CS) gate on a cloud-based IBM Quantum computer using the Qiskit Pulse framework. To measure the gate error of the calibrated CS gate we perform non-Clifford CNOT-Dihedral interleaved randomized benchmarking. We are able to obtain a gate error of $5.9(7) \times 10^{-3}$ at a gate length of 263 ns, which is close to the coherence limit of the associated qubits, and lower than the error of the backend's standard calibrated CNOT gate.	Translated: 2023-05-09 07:03:02 Published: 2021-03-14
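As a rough illustration of how an interleaved randomized benchmarking experiment turns fitted decay rates into a gate error, the sketch below applies the standard interleaved-RB estimate $r = \frac{d-1}{d}\,(1 - \alpha_{\mathrm{int}}/\alpha_{\mathrm{ref}})$ with $d = 2^{n}$. It assumes a single effective decay parameter for the reference and for the interleaved sequence; the CNOT-Dihedral procedure used in the paper fits its own decay model, which is not reproduced here. The function name and the numerical decay values are hypothetical.

```python
def interleaved_rb_gate_error(alpha_ref: float, alpha_int: float, n_qubits: int = 2) -> float:
    """Estimate the error of an interleaved gate from two RB decay parameters.

    alpha_ref: decay parameter of the reference (non-interleaved) sequences.
    alpha_int: decay parameter of the sequences with the target gate interleaved.
    """
    if not 0.0 < alpha_ref <= 1.0:
        raise ValueError("alpha_ref must lie in (0, 1]")
    d = 2 ** n_qubits  # Hilbert-space dimension of the benchmarked qubits
    return (d - 1) / d * (1.0 - alpha_int / alpha_ref)


# Hypothetical decay parameters, chosen only to illustrate the arithmetic.
example_error = interleaved_rb_gate_error(alpha_ref=0.980, alpha_int=0.980 * 0.992)
```

With these made-up inputs the ratio of decay parameters is 0.992, so the estimate comes out at about $6 \times 10^{-3}$, the same order of magnitude as the error quoted in the abstract.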
# Return probability and self-similarity of the Riesz walk ( http://arxiv.org/abs/2010.04518v3 ) License: see the linked page	Ryota Hanaoka, Norio Konno	The quantum walk is a counterpart of the random walk. The 2-state quantum walk in one dimension can be determined by a measure on the unit circle in the complex plane. For singular continuous measures, results on the corresponding quantum walk are limited. In this situation, we focus on a quantum walk, called the Riesz walk, given by the Riesz measure, which is one of the famous singular continuous measures. The present paper is devoted to the return probability of the Riesz walk. Furthermore, we present some conjectures on the self-similarity of the walk.	Translated: 2023-04-29 13:29:55 Published: 2021-03-14
# Initial Design of a W-band Superconducting Kinetic Inductance Qubit (Kineticon) ( http://arxiv.org/abs/2012.08654v3 ) License: see the linked page	Farzad B. Faramarzi, Peter K. Day, Jacob Glasby, Sasha Sypkens, Marco Colangelo, Ralph Chamberlin, Mohammad Mirhosseini, Kevin Schmidt, Karl K. Berggren, Philip Mauskopf	Superconducting qubits are widely used in quantum computing research and industry. We describe a superconducting kinetic inductance qubit (and introduce the term Kineticon to describe it) operating at W-band frequencies with a nonlinear nanowire section that provides the anharmonicity required for two distinct quantum energy states. Operating the qubits at higher frequencies may relax the dilution refrigerator temperature requirements for these devices and pave the path for multiplexing a large number of qubits. Millimeter-wave operation requires superconductors with relatively high $T_c$, which implies a high gap frequency, $2\Delta/h$, beyond which photons break Cooper pairs. For example, NbTiN with $T_c = 15\,\text{K}$ has a gap frequency near 1.4 THz, which is much higher than that of aluminum (90 GHz), allowing for operation throughout the millimeter-wave band. Here we describe a design and simulation of a W-band Kineticon qubit embedded in a 3-D cavity. We perform classical electromagnetic calculations of the resulting field distributions.	Translated: 2023-04-20 18:46:46 Published: 2021-03-14
# Quantum query complexity with matrix-vector products ( http://arxiv.org/abs/2102.11349v2 ) License: see the linked page	Andrew M. Childs, Shih-Han Hung, Tongyang Li	We study quantum algorithms that learn properties of a matrix using queries that return its action on an input vector. We show that for various problems, including computing the trace, determinant, or rank of a matrix or solving a linear system that it specifies, quantum computers do not provide an asymptotic speedup over classical computation. On the other hand, we show that for some problems, such as computing the parities of rows or columns or deciding if there are two identical rows or columns, quantum computers provide exponential speedup. We demonstrate this by showing equivalence between models that provide matrix-vector products, vector-matrix products, and vector-matrix-vector products, whereas the power of these models can vary significantly for classical computation.	Translated: 2023-04-10 05:31:54 Published: 2021-03-14
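To make the query model in this abstract concrete, the sketch below shows a classical matrix-vector oracle and the straightforward way to recover the trace with $n$ queries, one per standard basis vector. It is only an illustration of what "a query that returns the action on an input vector" means; the oracle wrapper, its query counter, and the function names are hypothetical, and nothing here reflects the quantum algorithms or lower bounds of the paper.

```python
def make_matvec_oracle(matrix):
    """Wrap a matrix so it can only be accessed through products A @ v."""
    stats = {"queries": 0}

    def matvec(v):
        stats["queries"] += 1  # count every oracle call
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]

    return matvec, stats


def trace_from_matvec(matvec, n):
    """Compute tr(A) = sum_i e_i^T A e_i using n matrix-vector queries."""
    total = 0
    for i in range(n):
        e_i = [0] * n
        e_i[i] = 1                 # i-th standard basis vector
        total += matvec(e_i)[i]    # i-th entry of A e_i is A[i][i]
    return total


oracle, stats = make_matvec_oracle([[1, 2], [3, 4]])
trace = trace_from_matvec(oracle, 2)
```

Here the trace of the 2×2 example is recovered with exactly two oracle calls, which is the classical baseline against which a quantum speedup would be measured.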
# Improvement of phase sensitivity in SU(1,1) interferometer via a Kerr nonlinear ( http://arxiv.org/abs/2103.07844v1 ) License: see the linked page	Shoukang Chang, Wei Ye, Huan Zhang, Liyun Hu, Jiehui Huang, and Sanqiu Liu	We propose a theoretical scheme to enhance the phase sensitivity by introducing a Kerr nonlinear phase shift into the traditional SU(1,1) interferometer with a coherent state input and homodyne detection. We investigate the realistic effects of photon losses on phase sensitivity and quantum Fisher information. The results show that, compared with the linear phase shift in the SU(1,1) interferometer, the Kerr nonlinear case can not only enhance the phase sensitivity and quantum Fisher information, but also significantly suppress the effect of photon losses. We also observe that, for the same accessible parameters, internal losses have a greater influence on the phase sensitivity than external ones. Interestingly, owing to the introduction of the nonlinear phase element, our scheme shows an obvious advantage in using low-cost input resources to obtain higher phase sensitivity and larger quantum Fisher information.	Translated: 2023-04-08 04:35:04 Published: 2021-03-14
# Lower and upper bounds of quantum battery power in multiple central spin systems ( http://arxiv.org/abs/2103.07828v1 ) License: see the linked page	Li Peng, Wen-Bin He, Stefano Chesi, Hai-Qing Lin, Xi-Wen Guan	We study the energy transfer process in quantum battery systems consisting of multiple central spins and bath spins. Here with "quantum battery" we refer to the central spins, whereas the bath serves as the "charger". For the single central-spin battery, we analytically derive the time evolutions of the energy transfer and the charging power with an arbitrary number of bath spins. For the case of multiple central spins in the battery, we find the scaling-law relation between the maximum power $P_{max}$ and the number of central spins $N_B$. It approximately satisfies a scaling-law relation $P_{max}\propto N_{B}^{\alpha}$, where the scaling exponent $\alpha$ varies with the bath spin number $N$ from the lower bound $\alpha =1/2$ to the upper bound $\alpha =3/2$. The lower and upper bounds correspond to the limits $N\to 1$ and $N\gg N_B$, respectively. In the thermodynamic limit, by applying the Holstein-Primakoff (H-P) transformation, we rigorously prove that the upper bound is $P_{max}=0.72 B A \sqrt{N} N_{B}^{3/2}$, which shows the same advantage in scaling as a recent charging protocol based on the Tavis-Cummings model. Here $B$ and $A$ are the external magnetic field and the coupling constant between the battery and the charger.	Translated: 2023-04-08 04:34:52 Published: 2021-03-14
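The scaling relation $P_{max}\propto N_{B}^{\alpha}$ quoted in this abstract can be read off from data by a straight-line fit on a log-log plot. The sketch below generates values from the quoted upper-bound formula $P_{max}=0.72\,B A \sqrt{N}\, N_{B}^{3/2}$ and recovers the exponent by least squares. The parameter values ($B = A = 1$, $N = 1000$) and the function names are hypothetical, chosen only so that $N \gg N_B$ holds; no data from the paper is used.

```python
import math


def upper_bound_power(B, A, N, N_B):
    """Upper-bound maximum power quoted in the abstract (regime N >> N_B)."""
    return 0.72 * B * A * math.sqrt(N) * N_B ** 1.5


def fit_scaling_exponent(n_b_values, powers):
    """Least-squares slope of log(P) versus log(N_B), i.e. the exponent alpha."""
    xs = [math.log(n) for n in n_b_values]
    ys = [math.log(p) for p in powers]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


n_b_values = [1, 2, 4, 8, 16]
powers = [upper_bound_power(B=1.0, A=1.0, N=1000, N_B=n) for n in n_b_values]
alpha = fit_scaling_exponent(n_b_values, powers)  # close to 3/2 by construction
```

Applied to measured or simulated powers instead of the closed-form values, the same fit would give an exponent somewhere between the two bounds stated above.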
# Operator transpose within normal ordering and its applications for quantifying entanglement ( http://arxiv.org/abs/2103.07821v1 ) License: see the linked page	Liyun Hu, Luping Zhang, Xiaoting Chen, Wei Ye, Qin Guo, and Hongyi Fan	Partial transpose is an important operation for quantifying entanglement. Here we study the (partial) transpose of arbitrary single-mode (two-mode) operators. Using the Fock-basis expansion, it is found that the transpose of an arbitrary operator can be obtained by replacing $a^{\dagger}$ ($a$) by $a$ ($a^{\dagger}$), rather than by a c-number, within the normal ordering form. The transposes of the displacement operator and the Wigner operator are studied, from which relations between the density operator and the transposed density operator are constructed for the Wigner function, the characteristic function, and average values such as the covariance matrix. These observations can be further extended to multi-mode cases. As applications, the partial transpose of the two-mode squeezed operator and the entanglement of the two-mode squeezed vacuum through a laser channel are considered.	Translated: 2023-04-08 04:34:19 Published: 2021-03-14
# Coherently delocalized states in dipole interacting Rydberg ensembles: the role of internal degeneracies ( http://arxiv.org/abs/2103.07990v1 ) License: see the linked page	Ghassan Abumwis, Christopher W. Wächtler, Matthew T. Eiles, Alexander Eisfeld	We investigate the effect of degenerate atomic states on the exciton delocalization of dipole-dipole interacting Rydberg assemblies. Using a frozen gas and regular one-, two-, and three-dimensional lattice arrangements as examples, we see that degeneracies can enhance the delocalization compared to the situation when there is no degeneracy. Using the Zeeman splitting provided by a magnetic field, we controllably lift the degeneracy to study in detail the transition between degenerate and non-degenerate regimes.	Translated: 2023-04-08 04:29:09 Published: 2021-03-14
# Effects of surface treatments on flux tunable transmon qubits ( http://arxiv.org/abs/2103.07970v1 ) License: see the linked page	M. Mergenthaler, C. Müller, M. Ganzhorn, S. Paredes, P. Müller, G. Salis, V. P. Adiga, M. Brink, M. Sandberg, J. B. Hertzberg, S. Filipp, A. Fuhrer	One of the main limitations in state-of-the-art solid-state quantum processors is qubit decoherence and relaxation due to noise in their local environment. For the field to advance towards full fault-tolerant quantum computing, a better understanding of the underlying microscopic noise sources is therefore needed. Adsorbates on surfaces, impurities at interfaces and material defects have been identified as sources of noise and dissipation in solid-state quantum devices. Here, we use an ultra-high vacuum package to study the impact of vacuum loading, UV-light exposure and ion irradiation treatments on coherence and slow parameter fluctuations of flux tunable superconducting transmon qubits. We analyse the effects of each of these surface treatments by comparing averages over many individual qubits and measurements before and after treatment. The treatments studied do not significantly impact the relaxation rate $\Gamma_1$ and the echo dephasing rate $\Gamma_2^\textrm{e}$, except for Ne ion bombardment, which reduces $\Gamma_1$. In contrast, flux noise parameters are improved by removing magnetic adsorbates from the chip surfaces with UV-light and NH$_3$ treatments. Additionally, we demonstrate that SF$_6$ ion bombardment can be used to adjust qubit frequencies in-situ and post fabrication without affecting qubit coherence at the sweet spot.	Translated: 2023-04-08 04:28:29 Published: 2021-03-14
# Coherence and entanglement of inherently long-lived spin pairs in diamond ( http://arxiv.org/abs/2103.07961v1 ) License: see the linked page	H. P. Bartling, M. H. Abobeih, B. Pingault, M. J. Degen, S. J. H. Loenen, C. E. Bradley, J. Randall, M. Markham, D. J. Twitchen, and T. H. Taminiau	Understanding and protecting the coherence of individual quantum systems is a central challenge in quantum science and technology. Over the last decades, a rich variety of methods to extend coherence have been developed. A complementary approach is to look for naturally occurring systems that are inherently protected against decoherence. Here, we show that pairs of identical nuclear spins in solids form intrinsically long-lived quantum systems. We study three carbon-13 pairs in diamond and realize high-fidelity measurements of their quantum states using a single NV center in their vicinity. We then reveal that the spin pairs are robust to external perturbations due to a unique combination of three phenomena: a clock transition, a decoherence-free subspace, and a variant on motional narrowing. The resulting inhomogeneous dephasing time is $T_2^* = 1.9(3)$ minutes, the longest reported for individually controlled qubits. Finally, we develop complete control and realize an entangled state between two spin-pair qubits through projective parity measurements. These long-lived qubits are abundantly present in diamond and other solids, and provide new opportunities for quantum sensing, quantum information processing, and quantum networks.	Translated: 2023-04-08 04:28:02 Published: 2021-03-14
# Geometric Manipulation of a Decoherence-Free Subspace in Atomic Ensembles ( http://arxiv.org/abs/2103.07907v1 ) License: see the linked page	Dongni Chen, Si Luo, Ying-Dan Wang, Stefano Chesi, and Mahn-Soo Choi	We consider an ensemble of atoms with $\Lambda$-type level structure trapped in a single-mode cavity, and propose a geometric scheme of coherent manipulation of quantum states on the subspace of zero-energy states within the quantum Zeno subspace of the system. We find that the particular subspace inherits the decoherence-free nature of the quantum Zeno subspace and features a symmetry-protected degeneracy, fulfilling all the conditions for a universal scheme of arbitrary unitary operations on it.	Translated: 2023-04-08 04:26:27 Published: 2021-03-14
# Transverse spin dynamics in the anisotropic Heisenberg model realized with ultracold atoms ( http://arxiv.org/abs/2103.07866v1 ) License: see the linked page	Paul Niklas Jepsen, Wen Wei Ho, Jesse Amato-Grill, Ivana Dimitrova, Eugene Demler, Wolfgang Ketterle	In Heisenberg models with exchange anisotropy, transverse spin components are not conserved and can decay not only by transport, but also by dephasing. Here we utilize ultracold atoms to simulate the dynamics of 1D Heisenberg spin chains, and observe fast, local spin decay controlled by the anisotropy. Additionally, we directly observe an effective magnetic field created by superexchange, which causes an inhomogeneous decay mechanism due to variations of lattice depth between chains, as well as dephasing within each chain due to the twofold reduction of the effective magnetic field at the edges of the chains and due to fluctuations of the effective magnetic field in the presence of mobile holes. The latter is a new coupling mechanism between holes and magnons. All these dephasing mechanisms, corroborated by extensive numerical simulations, have not been observed before with ultracold atoms and illustrate basic properties of the underlying Hubbard model.	Translated: 2023-04-08 04:26:07 Published: 2021-03-14
# Continuous normalizing flows on manifolds ( http://arxiv.org/abs/2104.14959v1 ) License: see the linked page	Luca Falorsi	Normalizing flows are a powerful technique for obtaining reparameterizable samples from complex multimodal distributions. Unfortunately, current approaches are only available for the most basic geometries and fall short when the underlying space has a nontrivial topology, limiting their applicability for most real-world data. Using fundamental ideas from differential geometry and geometric control theory, we describe how the recently introduced Neural ODEs and continuous normalizing flows can be extended to arbitrary smooth manifolds. We propose a general methodology for parameterizing vector fields on these spaces and demonstrate how gradient-based learning can be performed. Additionally, we provide a scalable unbiased estimator for the divergence in this generalized setting. Experiments on a diverse selection of spaces empirically showcase the defined framework's ability to obtain reparameterizable samples from complex distributions.	Translated: 2023-04-08 04:20:23 Published: 2021-03-14
# The transformation process from in-campus classes into online classes due to the COVID-19 situation -- the case of higher education institutions in Kosovo ( http://arxiv.org/abs/2104.03896v1 ) License: see the linked page	Ereza Baftiu and Krenare Pireva Nuci	The COVID-19 pandemic has changed traditional teaching globally. In the Kosovo context, universities have found the transition from in-class teaching to online classes quite challenging. This study investigates the transformation process from in-campus classes to online classes from the technical perspective within five Higher Education Institutions (HEI) in Kosovo. The data was collected using qualitative methods and its analysis followed the 3C Lichtman approach. The results show that each of the universities followed a different approach, using either their limited on-premises infrastructure or additional cloud infrastructure.	Translated: 2023-04-08 04:20:10 Published: 2021-03-14
# Relativity and Radiation Balance for the Classical Hydrogen Atom in Classical Electromagnetic Zero-Point Radiation ( http://arxiv.org/abs/2103.09084v1 ) License: see the linked page	Timothy H. Boyer	Here we review the understanding of the classical hydrogen atom in classical electromagnetic zero-point radiation, and emphasize the importance of special relativity. The crucial missing ingredient in earlier calculational attempts (both numerical and analytic) is the use of valid approximations to the full relativistic analysis. It is pointed out that the nonrelativistic time Fourier expansion coefficients given by Landau and Lifshitz are in error as the electromagnetic description of a charged particle in a Coulomb potential, and, because of this error, Marshall and Claverie's conclusion regarding the failure of radiation balance is invalid. Rather, using Marshall and Claverie's calculations, but restricted to lowest nonvanishing order in the orbital eccentricity (where the nonrelativistic orbit is a valid approximation to the fully relativistic electromagnetic orbit), radiation balance for classical electromagnetic zero-point radiation is shown to hold at the fundamental frequencies and associated first overtones.	Translated: 2023-04-08 04:19:59 Published: 2021-03-14
# Decoherent Quench Dynamics across Quantum Phase Transitions ( http://arxiv.org/abs/2103.08068v1 ) License: see the linked page	Wei-Ting Kuo, Daniel Arovas, Smitha Vishveshwara, Yi-Zhuang You	We present a formulation for investigating quench dynamics across quantum phase transitions in the presence of decoherence. We formulate decoherent dynamics induced by continuous quantum non-demolition measurements of the instantaneous Hamiltonian. We generalize the well-studied universal Kibble-Zurek behavior for linear temporal drive across the critical point. We identify a strong decoherence regime wherein the decoherence time is shorter than the standard correlation time, which varies as the inverse gap above the ground state. In this regime, we find that the freeze-out time $\bar{t}\sim\tau^{{2\nu z}/({1+2\nu z})}$ for when the system falls out of equilibrium and the associated freeze-out length $\bar{\xi}\sim\tau^{\nu/({1+2\nu z})}$ show power-law scaling with respect to the quench rate $1/\tau$, where the exponents depend on the correlation length exponent $\nu$ and the dynamical exponent $z$ associated with the transition. The universal exponents differ from those of standard Kibble-Zurek scaling. We explicitly demonstrate this scaling behavior in the instance of a topological transition in a Chern insulator system. We show that the freeze-out time scale can be probed from the relaxation of the Hall conductivity. Furthermore, on introducing disorder to break translational invariance, we demonstrate how quenching results in regions of imbalanced excitation density characterized by an emergent length scale which also shows universal scaling. We perform numerical simulations to confirm our analytical predictions and corroborate the scaling arguments that we postulate as universal to a host of systems.	Translated: 2023-04-08 04:19:38 Published: 2021-03-14
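To see how the exponents quoted in this abstract differ from the textbook case, the sketch below evaluates both sets for given $\nu$ and $z$. The strong-decoherence exponents are taken directly from the abstract; the comparison values use the standard Kibble-Zurek forms $\bar{t}\sim\tau^{\nu z/(1+\nu z)}$ and $\bar{\xi}\sim\tau^{\nu/(1+\nu z)}$, which are supplied here as background rather than quoted from the source. Function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def standard_kz_exponents(nu, z):
    """Powers of tau for (freeze-out time, freeze-out length), standard Kibble-Zurek."""
    nu, z = Fraction(nu), Fraction(z)
    return nu * z / (1 + nu * z), nu / (1 + nu * z)


def decoherent_exponents(nu, z):
    """Powers of tau in the strong-decoherence regime, as quoted in the abstract."""
    nu, z = Fraction(nu), Fraction(z)
    return 2 * nu * z / (1 + 2 * nu * z), nu / (1 + 2 * nu * z)


# Illustrative choice nu = z = 1 (not a value taken from the paper).
standard_time, standard_length = standard_kz_exponents(1, 1)
decoherent_time, decoherent_length = decoherent_exponents(1, 1)
```

For this illustrative choice the freeze-out time exponent rises from 1/2 to 2/3 while the freeze-out length exponent falls from 1/2 to 1/3, which shows in a simple case how the two scalings separate.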
# Solvable Schrodinger Equations of Shape Invariant Potentials with Superpotential $W(x,A,B)=A\tanh 3px-B\coth px$ ( http://arxiv.org/abs/2103.08066v1 ) License: see the linked page	Jamal Benbourenane	We propose a new, exactly solvable Schrödinger equation. The partner potential is given by $V=-Bp\operatorname{csch}^{2}(px)-9p(B+p)\operatorname{sech}^{2}(3px)+\left(B\coth(px)-3(B+p)\tanh(3px)\right)^{2}$, obtained using the supersymmetric method with the shape invariance property, with superpotential $W(x,A,B)=A\tanh 3px-B\coth px$. We derive the exact solutions of this family of Schrödinger equations, with the eigenvalues given by $E_{n}^{(-)}=(A-B)^{2}-(A-B-4np)^{2}$; the corresponding eigenfunctions are determined exactly and in closed form. Schrödinger equations, and Sturm-Liouville equations in general, are challenging to solve in closed form, and only a few solvable cases are known. Therefore, in a strict mathematical sense, discovering new solvable equations is essential to understanding what underlies these elusive solutions. This result has potential applications in nuclear physics, chemistry, and other fields of science.	Translated: 2023-04-08 04:19:03 Published: 2021-03-14
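Expanding the difference of squares in the eigenvalue formula quoted above gives a form in which the dependence on the level index $n$ is explicit. This is only a direct algebraic rewriting of the stated expression; no assumption is made here about the admissible range of $n$ or of the parameters $A$, $B$, $p$.

```latex
\begin{aligned}
E_{n}^{(-)} &= (A-B)^{2} - (A-B-4np)^{2} \\
            &= 8np\,(A-B) - 16\,n^{2}p^{2} \\
            &= 8np\,(A-B-2np),
\end{aligned}
\qquad\text{so that}\qquad E_{0}^{(-)} = 0 .
```

The vanishing of the lowest level follows immediately from setting $n=0$ in the quoted formula.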
# Toward a Union-Find decoder for quantum LDPC codes ( http://arxiv.org/abs/2103.08049v1 ) License: see the linked page	Nicolas Delfosse, Vivien Londe and Michael Beverland	Quantum LDPC codes are a promising direction for low overhead quantum computing. In this paper, we propose a generalization of the Union-Find decoder as a decoder for quantum LDPC codes. We prove that this decoder corrects all errors with weight up to $A n^{\alpha}$ for some $A, \alpha > 0$ for different classes of quantum LDPC codes such as toric codes and hyperbolic codes in any dimension $D \geq 3$ and quantum expander codes. To prove this result, we introduce a notion of covering radius which measures the spread of an error from its syndrome. We believe this notion could find application beyond the decoding problem. We also perform numerical simulations, which show that our Union-Find decoder outperforms the belief propagation decoder in the low error rate regime in the case of a quantum LDPC code with length 3600.	Translated: 2023-04-08 04:18:12 Published: 2021-03-14
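The decoder named in this abstract takes its name from the union-find (disjoint-set) data structure, which tracks which elements belong to the same cluster as clusters are merged. The sketch below is a minimal version of that data structure only, with path compression and union by size; it is not the decoder from the paper, and the class and method names are hypothetical.

```python
class DisjointSet:
    """Union-find over the integers 0..n-1 with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.size = [1] * n           # cluster size, valid at root entries

    def find(self, x):
        """Return the root of the cluster containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        """Merge the clusters of a and b; return False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a  # attach the smaller tree to the larger
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return True
```

In a decoding setting the elements would presumably stand for syndrome or qubit nodes and `union` would be called as clusters grow into each other, but that mapping is an assumption about usage and is not spelled out in the abstract.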
# Quantum-Entropy Physics ( http://arxiv.org/abs/2103.07996v1 ) License: see the linked page	Davi Geiger and Zvi M. Kedem	All the laws of physics are time-reversible. A time arrow emerges only when ensembles of classical particles are treated probabilistically, outside of physics laws, and the entropy and the second law of thermodynamics are introduced. In quantum physics, no mechanism for a time arrow has been proposed despite its intrinsic probabilistic nature. In consequence, one cannot explain why an electron in an excited state will "spontaneously" transition into a ground state as a photon is created and emitted, instead of continuing in its reversible unitary evolution. To address such phenomena, we introduce an entropy for quantum physics, which leads to the emergence of a time arrow. The entropy is a measure of randomness over the degrees of freedom of a quantum state. It is dimensionless; it is a relativistic scalar; it is invariant under coordinate transformations of position and momentum that maintain conjugate properties and under CPT transformations; and its minimum is positive due to the uncertainty principle. To explain why some quantum physical processes cannot take place even though they obey conservation laws, we partition the set of all evolutions of an initial state into four blocks, based on whether the entropy is (i) increasing but not a constant, (ii) decreasing but not a constant, (iii) a constant, (iv) oscillating. We propose a law that in quantum physics entropy (weakly) increases over time. Thus, evolutions in set (ii) are disallowed, and evolutions in set (iv) are barred from completing an oscillation period by instantaneously transitioning to a new state. This law for quantum physics limits physical scenarios beyond conservation laws, providing causality reasoning by defining a time arrow.	Translated: 2023-04-08 04:17:16 Published: 2021-03-14
# privacynet:マルチ属性の顔プライバシーのための半敵ネットワーク PrivacyNet: Semi-Adversarial Networks for Multi-attribute Face Privacy ( http://arxiv.org/abs/2001.00561v3 ) ライセンス: Link先を確認	Vahid Mirjalili, Sebastian Raschka, Arun Ross	(参考訳) 近年,人物の顔画像から年齢,性別,人種などのソフトバイオメトリックな属性を高精度に推定する可能性が確立されている。しかし、特に生体認証のために収集された顔画像が、人の同意なしに属性分析に使用される場合、プライバシーの懸念が高まる。この問題に対処するために,画像摂動法を用いて顔画像にソフトバイオメトリックプライバシを付与する手法を開発した。画像の摂動はganベースの半敵ネットワーク(san)(privacynet)を使用して行われ、入力された顔画像がマッチングのために顔マッチング器で使用できるが、属性分類器では確実に使用できないように修正される。さらに、privacynetでは、入力された顔画像(例えば、年齢と人種)に難読化されなければならない特定の属性を選択でき、他の種類の属性(例えば、性別)を抽出することができる。複数の顔マッチング器、複数の年齢/性別/人種分類器、および複数の顔データセットを用いた大規模な実験は、複数の顔および属性分類器にまたがる多属性プライバシー向上手法の一般化可能性を示す。 Recent research has established the possibility of deducing soft-biometric attributes such as age, gender and race from an individual's face image with high accuracy. However, this raises privacy concerns, especially when face images collected for biometric recognition purposes are used for attribute analysis without the person's consent. To address this problem, we develop a technique for imparting soft biometric privacy to face images via an image perturbation methodology. The image perturbation is undertaken using a GAN-based Semi-Adversarial Network (SAN) - referred to as PrivacyNet - that modifies an input face image such that it can be used by a face matcher for matching purposes but cannot be reliably used by an attribute classifier. Further, PrivacyNet allows a person to choose specific attributes that have to be obfuscated in the input face images (e.g., age and race), while allowing for other types of attributes to be extracted (e.g., gender). Extensive experiments using multiple face matchers, multiple age/gender/race classifiers, and multiple face datasets demonstrate the generalizability of the proposed multi-attribute privacy enhancing method across multiple face and attribute classifiers.	翻訳日:2023-01-16 04:14:10 公開日:2021-03-14
# 連続制御のためのdeep radial-basis値関数 Deep Radial-Basis Value Functions for Continuous Control ( http://arxiv.org/abs/2002.01883v2 ) ライセンス: Link先を確認	Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman	(参考訳) 強化学習(RL)の中核となる操作は、学習値関数に対して最適な行動を見つけることである。この操作は、学習値関数が連続的なアクションを入力として取る場合、しばしば難しい。本稿では,放射基底関数(RBF)の出力層を持つディープネットワークを用いて学習した値関数について紹介する。深部RBVFに対する作用値の最大値は、容易に正確に近似できることを示す。さらに、深いRBVFは、普遍関数近似をサポートするため、真の値関数を表現できる。エージェントに深いRBVFを付与することにより、標準的なDQNアルゴリズムを連続制御に拡張する。 RBF-DQNと呼ばれる結果のエージェントは、値関数のみのベースラインを著しく上回り、最先端のアクター批判アルゴリズムと競合することを示す。 A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can represent any true value function owing to their support for universal function approximation. We extend the standard DQN algorithm to continuous control by endowing the agent with a deep RBVF. We show that the resultant agent, called RBF-DQN, significantly outperforms value-function-only baselines, and is competitive with state-of-the-art actor-critic algorithms.	翻訳日:2023-01-03 20:43:27 公開日:2021-03-14
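The claim that the maximum action-value of a deep RBVF is easy to approximate can be illustrated with a small sketch. It assumes a normalized RBF output layer of the form Q(s, a) = Σ_i w_i(a) v_i / Σ_i w_i(a) with w_i(a) = exp(-β‖a - a_i‖); this kernel form, the parameter `beta`, and the function names are assumptions for illustration, and the centroids and values are fixed lists here rather than outputs of a state-dependent deep network.

```python
import math


def rbf_q_value(action, centroids, values, beta):
    """Normalized RBF layer: a softmax-weighted average of centroid values.

    Hypothetical stand-in for Q(s, a) at one fixed state s.
    """
    logits = [-beta * math.dist(action, c) for c in centroids]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def approx_max_action(centroids, values, beta):
    """Approximate argmax over actions by evaluating Q only at the centroids."""
    return max(centroids, key=lambda c: rbf_q_value(c, centroids, values, beta))


# Toy usage: three centroid actions in a 2-D action space.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 3.0, 2.0]
best = approx_max_action(centroids, values, beta=50.0)
```

Because the output is a weighted average of the centroid values, Q can never exceed the largest value, and for a sharp kernel (large `beta`) Q at a centroid is close to that centroid's value. Searching over the finite set of centroids is therefore a cheap proxy for the continuous maximization.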
# アンダーディスプレイカメラの画像復元 Image Restoration for Under-Display Camera ( http://arxiv.org/abs/2003.04857v2 ) ライセンス: Link先を確認	Yuqian Zhou, David Ren, Neil Emerton, Sehoon Lim, Timothy Large	(参考訳) フルスクリーンデバイスの新しいトレンドは、カメラをスクリーンの後ろに置くことを奨励する。ベゼルを外し、画面下にカメラを集中させると、ディスプレイとボディの比率が大きくなり、ビデオチャットではアイコンタクトが強化されるが、画像の劣化も引き起こす。本稿では,新しい実世界の単一画像復元問題として,新たに定義されたudc(under-display camera)に着目した。まず4k Transparent OLED(T-OLED)とPentile OLED(P-OLED)を使って、その劣化を理解するための光学系を分析します。第2に、実対データ取得を容易にするモニタカメライメージングシステム(MCIS)と、表示パターンとカメラ計測のみからポイントスプレッド関数(PSF)とUDCデータを生成するモデルベースデータ合成パイプラインを設計する。最後に,デコンボリューションに基づくパイプラインと学習に基づく手法を用いて,複雑な劣化を解消する。我々のモデルはリアルタイムの高品質な復元を実証する。提案手法と結果は,UDCの有望な研究価値と方向性を明らかにする。 The new trend of full-screen devices encourages us to position a camera behind a screen. Removing the bezel and centralizing the camera under the screen brings larger display-to-body ratio and enhances eye contact in video chat, but also causes image degradation. In this paper, we focus on a newly-defined Under-Display Camera (UDC), as a novel real-world single image restoration problem. First, we take a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED) and analyze their optical systems to understand the degradation. Second, we design a Monitor-Camera Imaging System (MCIS) for easier real pair data acquisition, and a model-based data synthesizing pipeline to generate Point Spread Function (PSF) and UDC data only from display pattern and camera measurements. Finally, we resolve the complicated degradation using deconvolution-based pipeline and learning-based methods. Our model demonstrates a real-time high-quality restoration. The presented methods and results reveal the promising research values and directions of UDC.	翻訳日:2022-12-24 21:20:22 公開日:2021-03-14
# 物体検出のための動的スケールトレーニング Dynamic Scale Training for Object Detection ( http://arxiv.org/abs/2004.12432v2 ) ライセンス: Link先を確認	Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, and Jiaya Jia	(参考訳) 本稿では,オブジェクト検出におけるスケール変動問題を軽減するための動的スケールトレーニングパラダイム(DST)を提案する。画像ピラミッドやマルチスケールトレーニングといったこれまでの戦略は、モデル最適化のためのスケール不変データを準備することを目的としていた。しかし,こうしたデータ準備の手順は後続の最適化プロセスを考慮しておらず,そのためスケール変動の処理能力が制限される。代わりに、我々のパラダイムでは、最適化プロセスからのフィードバック情報を使用して、データ準備を動的にガイドします。提案手法は驚くほど単純であるが,従来の手法を上回る大きな改善(MS COCOデータセットで平均精度2%以上)を得る。実験により,提案手法のスケール変動処理に対する有効性を示した。また、さまざまなバックボーン、ベンチマーク、およびインスタンスセグメンテーションのようなダウンストリームタスクにも一般化できる。推論オーバーヘッドを導入せず、一般的な検出設定のための無料ランチとして機能する。さらに、高速収束による効率的なトレーニングも容易である。コードとモデルはgithub.com/yukang2017/Stitcherで入手できる。 We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection. Previous strategies like image pyramid, multi-scale training, and their variants aim at preparing scale-invariant data for model optimization. However, the preparation procedure is unaware of the following optimization process, which restricts their capability in handling the scale variation. Instead, in our paradigm, we use feedback information from the optimization process to dynamically guide the data preparation. The proposed method is surprisingly simple yet obtains significant gains (2%+ Average Precision on MS COCO dataset), outperforming previous methods. Experimental results demonstrate the efficacy of our proposed DST method towards scale variation handling. It could also generalize to various backbones, benchmarks, and other challenging downstream tasks like instance segmentation. It does not introduce inference overhead and could serve as a free lunch for general detection configurations. Besides, it also facilitates efficient training due to fast convergence. Code and models are available at github.com/yukang2017/Stitcher.	翻訳日:2022-12-09 13:34:48 公開日:2021-03-14
# Bullseye Polytope: トランスファービリティを改善したスケーラブルなクリーンラベル中毒攻撃 Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability ( http://arxiv.org/abs/2005.00191v3 ) ライセンス: Link先を確認	Hojjat Aghakhani, Dongyu Meng, Yu-Xiang Wang, Christopher Kruegel, and Giovanni Vigna	(参考訳) ニューラルネットワークのセキュリティに対する最近の懸念の源泉は、トレーニングデータセットに正しくラベル付けされた毒サンプルを注入するクリーンラベルデータセット中毒攻撃の出現である。これらの毒のサンプルは人間の観察者にとって正しいように見えるが、推論中に標的の誤分類を引き起こす悪意のある特徴を含んでいる。そこで我々は,移動学習に対するスケーラブルで移動可能なクリーンラベル中毒攻撃を提案し,特徴空間内のターゲット画像に近い中心に毒画像を生成する。我々の攻撃であるBullseye Polytopeは、現在の最先端技術の攻撃成功率を26.75%向上させ、攻撃速度を12倍に向上させた。我々はさらにブルジー・ポリトープをより実用的な攻撃モデルに拡張し、毒サンプルを作成する際に同じ物体(例えば、異なる角度から)の複数の画像を含める。この拡張により、余分な毒のサンプルを使わずに、16%以上の画像(同じオブジェクト)のアタック転送性が向上する。 A recent source of concern for the security of neural networks is the emergence of clean-label dataset poisoning attacks, wherein correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to the human observer, they contain malicious characteristics that trigger a targeted misclassification during inference. We propose a scalable and transferable clean-label poisoning attack against transfer learning, which creates poison images with their center close to the target image in the feature space. Our attack, Bullseye Polytope, improves the attack success rate of the current state-of-the-art by 26.75% in end-to-end transfer learning, while increasing attack speed by a factor of 12. We further extend Bullseye Polytope to a more practical attack model by including multiple images of the same object (e.g., from different angles) when crafting the poison samples. We demonstrate that this extension improves attack transferability by over 16% to unseen images (of the same object) without using extra poison samples.	翻訳日:2022-12-07 23:18:57 公開日:2021-03-14
# human in events: 複雑なイベントにおける人間中心のビデオ分析のための大規模ベンチマーク Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events ( http://arxiv.org/abs/2005.04490v5 ) ライセンス: Link先を確認	Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Rui Qian, Tao Wang, Ning Xu, Hongkai Xiong, Guo-Jun Qi, Nicu Sebe	(参考訳) 現代のスマートシティの発展とともに、人間中心のビデオ分析は、現実の場面で多様な複雑なイベントを分析するという課題に直面している。複雑な出来事は、密集した群衆、異常、集団行動に関連する。しかしながら、既存のビデオデータセットの規模によって制限されているため、このような複雑なイベントにおけるパフォーマンスを報告している人的分析アプローチはほとんどない。そこで本研究では,人間の動作やポーズ,行動,特に群衆や複雑なイベントにおける動作を理解するために,ヒューマン・イン・イベント (human-in-events) や hieve (human-centric video analysis in complex events) という,新たな大規模データセットを提案する。複雑なイベントにおけるアクションインスタンスの最大数 (>56k) であるポーズ数 (>1M) と、長い時間(平均軌道長は >480 フレーム)続くトラジェクトリの最大数 (the most number of trajectories) を含む。このデータセットに基づいて,より強力な2次元ポーズ特徴の学習を導くために,行動情報の潜在性を活用したポーズ推定ベースラインの強化を提案する。提案手法は,HiEveデータセット上の既存のポーズ推定パイプラインの性能を向上させることができることを示す。さらに,最近の映像分析手法とベースライン手法のベンチマーク実験を行い,HiEveが人間中心のビデオ解析の挑戦的データセットであることを実証した。データセットは、人間中心の分析と複雑な事象の理解における最先端技術の開発を前進させることを期待している。データセットはhttp://humaninevents.orgで利用可能である。 Along with the development of modern smart cities, human-centric video analysis has been encountering the challenge of analyzing diverse and complex events in real scenes. A complex event relates to dense crowds, anomalous, or collective behaviors. However, limited by the scale of existing video datasets, few human analysis approaches have reported their performance on such complex events. To this end, we present a new large-scale dataset, named Human-in-Events or HiEve (Human-centric video analysis in complex Events), for the understanding of human motions, poses, and actions in a variety of realistic events, especially in crowd and complex events. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of trajectories lasting for longer time (with an average trajectory length of >480 frames). 
Based on this dataset, we present an enhanced pose estimation baseline by utilizing the potential of action information to guide the learning of more powerful 2D pose features. We demonstrate that the proposed method is able to boost the performance of existing pose estimation pipelines on our HiEve dataset. Furthermore, we conduct extensive experiments to benchmark recent video analysis approaches together with our baseline methods, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect that the dataset will advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at http://humaninevents.org	翻訳日:2022-12-05 07:11:55 公開日:2021-03-14
# 5* 射影変換を用いた知識グラフ埋め込み 5* Knowledge Graph Embeddings with Projective Transformations ( http://arxiv.org/abs/2006.04986v2 ) ライセンス: Link先を確認	Mojtaba Nayyeri, Sahar Vahdati, Can Aykul, Jens Lehmann	(参考訳) 知識グラフ埋め込みモデルを用いたリンク予測が知識グラフ補完の一般的なアプローチとなっている。このようなモデルは、エッジを介してノードをベクトル空間にマッピングし、リンクの可能性を測定する変換関数を用いる。個々のノードをマッピングする際に、サブグラフの構造も変換される。ユークリッド幾何学で設計された埋め込みモデルの多くは、通常、単一の変換タイプ(多くは並進または回転)のみをサポートし、これは隣接する部分グラフの違いが小さいグラフの学習に適している。しかし、多重関係的知識グラフは近隣に複数の部分グラフ構造(例えば、パスとループ構造の組み合わせ)を含むことが多く、現在の埋め込みモデルではうまく捉えられていない。この問題に対処するために,複数の同時変換(具体的には反転、鏡映、並進、回転、相似拡大)をサポートする射影幾何学における新しいKGEモデル(5E)を提案する。このモデルはいくつかの好ましい理論的性質を持ち、既存のアプローチを包含する。また、最も広く使われているリンク予測ベンチマークでそれらを上回っている。 Performing link prediction using knowledge graph embedding models has become a popular approach for knowledge graph completion. Such models employ a transformation function that maps nodes via edges into a vector space in order to measure the likelihood of the links. While mapping the individual nodes, the structure of subgraphs is also transformed. Most of the embedding models designed in Euclidean geometry usually support a single transformation type - often translation or rotation, which is suitable for learning on graphs with small differences in neighboring subgraphs. However, multi-relational knowledge graphs often include multiple sub-graph structures in a neighborhood (e.g. combinations of path and loop structures), which current embedding models do not capture well. To tackle this problem, we propose a novel KGE model (5E) in projective geometry, which supports multiple simultaneous transformations - specifically inversion, reflection, translation, rotation, and homothety. The model has several favorable theoretical properties and subsumes the existing approaches. It outperforms them on the most widely used link prediction benchmarks.	翻訳日:2022-11-24 00:24:53 公開日:2021-03-14
# Dance Revolution: カリキュラム学習による音楽による長期ダンス生成 Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning ( http://arxiv.org/abs/2006.06119v7 ) ライセンス: Link先を確認	Ruozi Huang, Huang Hu, Wei Wu, Kei Sawada, Mi Zhang and Daxin Jiang	(参考訳) 音楽に合わせて踊ることは、古代から人間の生来の能力の1つである。しかし、機械学習の研究では、音楽からダンスの動きを合成することは難しい問題である。近年,リカレントニューラルネットワーク(RNN)のような自己回帰モデルを用いて,ヒトの動作系列が合成されている。このようなアプローチは、ニューラルネットワークにフィードバックされる予測誤差の蓄積のため、しばしば短い系列しか生成できない。この問題は長い動作系列の生成においてさらに深刻になる。また、スタイル、リズム、ビートの観点からのダンスと音楽の一貫性は、モデリングの段階ではまだ考慮されていない。本稿では,音楽条件付きダンス生成を系列対系列の学習問題として定式化し,新しいseq2seqアーキテクチャを考案し,音楽特徴の長いシーケンスを効率的に処理し,音楽とダンスの微妙な対応を捉える。さらに,長い動作系列の生成における自己回帰モデルの誤差蓄積を緩和する新しいカリキュラム学習戦略を提案する。この戦略は、直前の正解の動きを用いる完全誘導型の教師強制方式から、主に生成した動きを用いる誘導の少ない自己回帰方式へと、学習過程をゆるやかに変化させる。大規模な実験により、我々のアプローチは、自動評価指標と人的評価において、既存の最先端手法よりも大幅に優れていることが示された。また、提案されたアプローチの優れたパフォーマンスを示すデモビデオを https://www.youtube.com/watch?v=lmE20MEheZ8 で公開している。 Dancing to music has been one of humans' innate abilities since ancient times. In machine learning research, however, synthesizing dance movements from music is a challenging problem. Recently, researchers synthesize human motion sequences through autoregressive models like recurrent neural network (RNN). Such an approach often generates short sequences due to an accumulation of prediction errors that are fed back into the neural network. This problem becomes even more severe in the long motion sequence generation. Besides, the consistency between dance and music in terms of style, rhythm and beat is yet to be taken into account during modeling. In this paper, we formalize the music-conditioned dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance. 
Furthermore, we propose a novel curriculum learning strategy to alleviate error accumulation of autoregressive models in long motion sequence generation, which gently changes the training process from a fully guided teacher-forcing scheme using the previous ground-truth movements, towards a less guided autoregressive scheme mostly using the generated movements instead. Extensive experiments show that our approach significantly outperforms the existing state-of-the-art methods on automatic metrics and human evaluation. We also make a demo video to demonstrate the superior performance of our proposed approach at https://www.youtube.com/watch?v=lmE20MEheZ8.	翻訳日:2022-11-22 13:22:35 公開日:2021-03-14
# 微分的にプライベートな確率座標降下 Differentially Private Stochastic Coordinate Descent ( http://arxiv.org/abs/2006.07272v4 ) ライセンス: Link先を確認	Georgios Damaskinos, Celestine Mendler-D\"unner, Rachid Guerraoui, Nikolaos Papandreou, Thomas Parnell	(参考訳) 本稿では,確率座標降下アルゴリズムを微分プライベートにするという課題に挑戦する。従来の勾配降下アルゴリズムでは、更新が1つのモデルベクトル上で動作し、このベクトルにノイズを加えることで個人に関する重要な情報を隠蔽するが、確率座標降下はトレーニング中に補助情報をメモリに保持することに大きく依存する。この補助情報は、さらなるプライバシー漏洩をもたらし、この作業で対処される大きな課題を提起する。独立雑音付加の下では、補助情報の整合性は期待通りに保たれるという知見により、DP-SCDは、最初の微分プライベート確率座標降下アルゴリズムである。提案手法を理論的に解析し,コーディネート更新の分離と並列化が有用であると主張している。経験的側面では、一般的な確率勾配降下代替(DP-SGD)に対して、チューニングを著しく少なくして競合性能を示す。 In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on keeping auxiliary information in memory during training. This auxiliary information provides an additional privacy leak and poses the major challenge addressed in this work. Driven by the insight that under independent noise addition, the consistency of the auxiliary information holds in expectation, we present DP-SCD, the first differentially private stochastic coordinate descent algorithm. We analyze our new method theoretically and argue that decoupling and parallelizing coordinate updates is essential for its utility. On the empirical side we demonstrate competitive performance against the popular stochastic gradient descent alternative (DP-SGD) while requiring significantly less tuning.	翻訳日:2022-11-22 03:42:32 公開日:2021-03-14
# コミュニケーション効率の良い分散学習における誤りフィードバックのより良い代替策 A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning ( http://arxiv.org/abs/2006.11077v2 ) ライセンス: Link先を確認	Samuel Horv\'ath and Peter Richt\'arik	(参考訳) 現代の大規模機械学習アプリケーションは、分散コンピューティングシステムに実装するために確率最適化アルゴリズムを必要とする。このようなシステムの重要なボトルネックは、確率勾配のような情報をワーカー間で交換するための通信オーバーヘッドである。この問題を解決するために提案された多くのテクニックの中で、最も成功したものの1つが、エラーフィードバック(EF)による圧縮通信のフレームワークである。EFは、Top-$K$のような不偏でない収縮圧縮機によって生じる誤差に対処できる唯一の既知の手法である。本稿では, 収縮圧縮機を扱うための, 理論的にも実用的にもEFより優れた新しい代替案を提案する。特に,任意の収縮圧縮機を誘導不偏圧縮機に変換可能な構成を提案する。この変換の後、不偏圧縮機で動作する既存の方法を適用することができる。我々のアプローチは、メモリ要求の削減、通信複雑性の保証の改善、仮定の削減など、EFよりも大幅な改善をもたらすことを示す。さらに,ノード上の任意の分布に従う部分的参加を伴うフェデレーション学習に結果を拡張し,そのメリットを実証する。理論的結果を検証する数値実験を数回行った。 Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across the workers, such as stochastic gradients. Among the many techniques proposed to remedy this issue, one of the most successful is the framework of compressed communication with error feedback (EF). EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$. In this paper, we propose a new and theoretically and practically better alternative to EF for dealing with contractive compressors. In particular, we propose a construction which can transform any contractive compressor into an induced unbiased compressor. Following this transformation, existing methods able to work with unbiased compressors can be applied. We show that our approach leads to vast improvements over EF, including reduced memory requirements, better communication complexity guarantees and fewer assumptions. We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits thereof. 
We perform several numerical experiments which validate our theoretical findings.	翻訳日:2022-11-19 03:57:56 公開日:2021-03-14
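The idea of turning a contractive compressor into an unbiased one can be sketched in a few lines. The abstract does not spell out the construction, so the form used here, C(x) = C1(x) + C2(x - C1(x)) with C1 contractive (Top-K) and C2 unbiased (random-k sparsification with rescaling), is an assumption, as are all function names. The point of the sketch is only that compressing the residual with an unbiased compressor cancels the bias of C1 in expectation.

```python
import random


def top_k(x, k):
    """Contractive but biased compressor: keep the k largest-magnitude entries."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def rand_k(x, k, rng):
    """Unbiased compressor: keep k uniformly random entries, rescaled by d / k."""
    d = len(x)
    out = [0.0] * d
    for i in rng.sample(range(d), k):
        out[i] = x[i] * d / k
    return out


def induced_compressor(x, k, rng):
    """Assumed construction: C(x) = C1(x) + C2(x - C1(x)).

    Since E[C2(r)] = r for the residual r = x - C1(x), E[C(x)] = x.
    """
    c1 = top_k(x, k)
    residual = [a - b for a, b in zip(x, c1)]
    c2 = rand_k(residual, k, rng)
    return [a + b for a, b in zip(c1, c2)]


def empirical_mean(x, k, trials, seed=0):
    """Average many compressed copies of x to check unbiasedness numerically."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(trials):
        y = induced_compressor(x, k, rng)
        acc = [a + b for a, b in zip(acc, y)]
    return [a / trials for a in acc]
```

In this toy version each message carries at most 2k nonzero entries (k from each compressor), which is the price paid for removing the bias without keeping an error-feedback memory vector on each worker.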
# BERTの事前トレーニングに役立つオンザフライのノート取り Taking Notes on the Fly Helps BERT Pre-training ( http://arxiv.org/abs/2008.01466v2 ) ライセンス: Link先を確認	Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu	(参考訳) 教師なし言語の事前学習をより効率的でリソース集約の少ないものにする方法は、NLPにおいて重要な研究方向である。本稿では,より優れたデータ利用を提供することにより,言語事前学習手法の効率化に焦点をあてる。言語データコーパスでは、単語はヘビーテール分布に従うことが知られている。単語のかなりの割合はわずか数回しか現れず、まれな単語の埋め込みは通常最適化が不十分である。このような埋め込みはセマンティックな信号が不十分であるため、データの利用効率が低下し、モデル全体の事前学習が遅くなる可能性がある。この問題を軽減するため,我々は,まれな単語が次に出現したときにモデルが理解できるように,事前学習中にまれな単語のメモをその場で取るTNF(Taking Notes on the Fly)を提案する。具体的には、TNFはノート辞書を保持し、文中にまれな単語が出現したときに、その文脈情報をメモとして保存する。トレーニング中に同じまれな単語が再び出現すると、前もって保存したメモ情報を使用して、現在の文の意味情報を補強することができる。これにより、TNFは、文中のまれな単語によって引き起こされる意味情報の不足を補うために文をまたいだ情報を用いるため、より良いデータ利用を提供する。 BERTとELECTRAの両方にTNFを実装し,その効率性と有効性を確認した。実験の結果、同じ性能に到達するまでのTNFのトレーニング時間は、バックボーンの事前トレーニングモデルよりも$60\%$短いことがわかった。同じイテレーション数でトレーニングされた場合、TNFは、ダウンストリームタスクの大部分と平均GLUEスコアで、バックボーンメソッドよりも優れている。ソースコードは補足資料に添付されている。 How to make unsupervised language pre-training more efficient and less resource-intensive is an important research direction in NLP. In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization. It is well-known that in language data corpus, words follow a heavy-tail distribution. A large proportion of words appear only very few times and the embeddings of rare words are usually poorly optimized. We argue that such embeddings carry inadequate semantic signals, which could make the data utilization inefficient and slow down the pre-training of the entire model. To mitigate this problem, we propose Taking Notes on the Fly (TNF), which takes notes for rare words on the fly during pre-training to help the model understand them when they occur next time. Specifically, TNF maintains a note dictionary and saves a rare word's contextual information in it as notes when the rare word occurs in a sentence. 
When the same rare word occurs again during training, the note information saved beforehand can be employed to enhance the semantics of the current sentence. By doing so, TNF provides better data utilization since cross-sentence information is employed to cover the inadequate semantics caused by rare words in the sentences. We implement TNF on both BERT and ELECTRA to check its efficiency and effectiveness. Experimental results show that TNF's training time is $60\%$ less than its backbone pre-training models when reaching the same performance. When trained with the same number of iterations, TNF outperforms its backbone methods on most of downstream tasks and the average GLUE score. Source code is attached in the supplementary material.	翻訳日:2022-11-03 00:14:52 公開日:2021-03-14
# Coupled Oscillatory Recurrent Neural Network (coRNN): 長期間の依存関係を学習するための正確で(勾配)安定なアーキテクチャ Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies ( http://arxiv.org/abs/2010.00951v2 ) ライセンス: Link先を確認	T. Konstantin Rusch, Siddhartha Mishra	(参考訳) 脳の機能部分などに見られる生体ニューロンの回路は、結合振動子のネットワークとしてモデル化することができる。状態変数(の勾配)を有界に保ちながら豊かな出力を表現できるこれらのシステムの能力に着想を得て、リカレントニューラルネットワークのための新しいアーキテクチャを提案する。提案するRNNは、制御された非線形振動子のネットワークをモデル化する2階常微分方程式系の時間離散化に基づく。我々は隠れ状態の勾配に対する厳密な境界を証明し、これによりこのRNNでは勾配爆発・勾配消失問題が緩和される。実験により、提案したRNNは、様々なベンチマークにおいて最先端技術に匹敵する性能を示し、複雑なシーケンシャルデータを処理するための安定かつ正確なRNNを提供するアーキテクチャの可能性を示した。 Circuits of biological neurons, such as in the functional parts of the brain, can be modeled as networks of coupled oscillators. Inspired by the ability of these systems to express a rich set of outputs while keeping (gradients of) state variables bounded, we propose a novel architecture for recurrent neural networks. Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations, modeling networks of controlled nonlinear oscillators. We prove precise bounds on the gradients of the hidden states, leading to the mitigation of the exploding and vanishing gradient problem for this RNN. Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks, demonstrating the potential of this architecture to provide stable and accurate RNNs for processing complex sequential data.	翻訳日:2022-10-12 00:14:26 公開日:2021-03-14
# ALFWorld:インタラクティブ学習のためのテキストと身体環境の調整 ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ( http://arxiv.org/abs/2010.03768v2 ) ライセンス: Link先を確認	Mohit Shridhar, Xingdi Yuan, Marc-Alexandre C\^ot\'e, Yonatan Bisk, Adam Trischler, Matthew Hausknecht	(参考訳) キッチンの冷蔵庫にリンゴを洗うといった単純な要求があれば、人間はアクションシーケンスを想像し、成功率、確率、効率を、筋肉を動かすことなく評価することで、純粋に抽象的な言葉で説明できる。問題のあるキッチンを見たら、そのシーンに合うように抽象的な計画を更新できる。エージェントは同じ能力を必要とするが、既存の作業は抽象的な推論と具体的実行の両方に必要なインフラを提供していない。この制限には、エージェントがTextWorld(C\^ot\'e et al., 2018)で抽象的テキストベースのポリシーを学習し、ALFREDベンチマーク(Shridhar et al., 2020)の目標をリッチなビジュアル環境で実行できるようにするシミュレータALFWorldを導入することで対処する。 ALFWorldは、TextWorldで学んだ抽象的な知識が、具体的で視覚的に根ざしたアクションに直接対応する新しいBUTLERエージェントの作成を可能にする。実験的に示すように、これは視覚的に接地された環境でのトレーニングよりも優れたエージェントの一般化を促進する。バトラーのシンプルでモジュラーな設計要素は、研究者がパイプラインのすべての部分(言語理解、計画、ナビゲーション、視覚シーン理解)を改善するためのモデルに集中できる問題である。 Given a simple request like Put a washed apple in the kitchen fridge, humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text based policies in TextWorld (C\^ot\'e et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. 
BUTLER's simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, and visual scene understanding).	翻訳日:2022-10-09 10:57:20 公開日:2021-03-14
# 近似量子状態を用いた期待値の推定 Estimating expectation values using approximate quantum states ( http://arxiv.org/abs/2011.04754v3 ) ライセンス: Link先を確認	Marco Paini, Amir Kalev, Dan Padilha, and Brendan Ruck	(参考訳) $N$-qubit状態の近似的な記述を導入する。この記述は、任意の可観測量の期待値を推定するのに十分な情報を含んでおり、その精度は、可観測量の適切に定義された半ノルムと、システムの同一準備の回数 $M$ の平方根との比によって上から抑えられ、$N$ への明示的な依存はない。本稿では,量子状態の準備に加えて,単一量子ビット回転とそれに続く単一量子ビット測定のみを必要とする、状態の近似記述を構成するための操作手順について述べる。この手順に従うと、結果として得られる状態の記述の基数は $3MN$ として増大することを示す。Rigettiの量子プロセッサユニット上で、ランダムな状態とランダムな可観測量に対して12, 16, 25量子ビットで提案手法を検証し、実験誤差にもかかわらず、理論との非常に良い一致を見出した。 We introduce an approximate description of an $N$-qubit state, which contains sufficient information to estimate the expectation value of any observable to a precision that is upper bounded by the ratio of a suitably-defined seminorm of the observable to the square root of the number of the system's identical preparations $M$, with no explicit dependence on $N$. We describe an operational procedure for constructing the approximate description of the state that requires, besides the quantum state preparation, only single-qubit rotations followed by single-qubit measurements. We show that following this procedure, the cardinality of the resulting description of the state grows as $3MN$. We test the proposed method on Rigetti's quantum processor unit with 12, 16 and 25 qubits for random states and random observables, and find an excellent agreement with the theory, despite experimental errors.	翻訳日:2022-09-28 02:19:18 公開日:2021-03-14
# キャプションを用いた開語彙オブジェクト検出 Open-Vocabulary Object Detection Using Captions ( http://arxiv.org/abs/2011.10678v2 ) ライセンス: Link先を確認	Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang	(参考訳) オブジェクト検出におけるディープニューラルネットワークの精度は極めて高いが、監視要件のためにトレーニングやスケールにコストがかかる。特に、より多くのオブジェクトカテゴリを学ぶには、一般的に比例的にボックスアノテーションが必要である。弱い教師付きおよびゼロショット学習技術は、少ない監督でより多くのカテゴリに対象検出器をスケールするために研究されてきたが、教師付きモデルほど成功せず、広く採用されていない。本稿では,対象検出問題の新たな定式化,すなわちオープンボキャブラリー物体検出法について述べる。本稿では,限定された対象カテゴリに対するバウンディングボックスアノテーションと,より広い範囲のオブジェクトをカバーするイメージキャプチャペアを用いて,より低コストで物体検出を行う新しい手法を提案する。提案手法は,学習中に境界ボックスアノテーションが提供されないオブジェクトを,ゼロショットアプローチよりもはるかに高い精度で検出・ローカライズできることを示す。一方、境界ボックスアノテーションを持つオブジェクトは、教師付きメソッドと同じくらい正確に検出することができる。そこで我々は,スケーラブルな物体検出のための新しい技術を確立した。 Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to supervision requirements. Particularly, learning more object categories typically requires proportionally more bounding box annotations. Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models. In this paper, we put forth a novel formulation of the object detection problem, namely open-vocabulary object detection, which is more general, more practical, and more effective than weakly supervised and zero-shot approaches. We propose a new method to train object detectors using bounding box annotations for a limited set of object categories, as well as image-caption pairs that cover a larger variety of objects at a significantly lower cost. We show that the proposed method can detect and localize objects for which no bounding box annotation is provided during training, at a significantly higher accuracy than zero-shot approaches. Meanwhile, objects with bounding box annotation can be detected almost as accurately as supervised methods, which is significantly better than weakly supervised baselines. 
Accordingly, we establish a new state of the art for scalable object detection.	翻訳日:2022-09-23 05:05:24 公開日:2021-03-14
# RaP-Net: 屋内ローカライゼーションのためのロバスト特徴抽出のための領域的および点的重み付けネットワーク RaP-Net: A Region-wise and Point-wise Weighting Network to Extract Robust Features for Indoor Localization ( http://arxiv.org/abs/2012.00234v2 ) ライセンス: Link先を確認	Dongjiang Li, Jinyu Miao, Xuesong Shi, Yuxin Tian, Qiwei Long, Tianyu Cai, Ping Guo, Hongfei Yu, Wei Yang, Haosong Yue, Qi Wei, Fei Qiao	(参考訳) 特徴抽出は視覚的ローカライゼーションにおいて重要な役割を果たす。動的オブジェクトや反復領域上の信頼性の低い特徴は、ロバストな特徴マッチングを妨げ、屋内でのローカライゼーションを大きく困難にする。このような問題を克服するために,領域単位の不変性と点単位の信頼性を同時に予測し、その両方を考慮して特徴を抽出する新しいネットワークであるRaP-Netを提案する。また、提案するネットワークをトレーニングするために、OpenLORIS-Locationという新しいデータセットも導入する。データセットには93箇所の屋内地点から得た1553枚の屋内画像が含まれている。同じ場所の画像間の様々な外観変化が含まれており、典型的な屋内シーンにおける不変性を学ぶのに役立つ。実験の結果,OpenLORIS-Locationデータセットでトレーニングしたrap-netは,特徴マッチングタスクにおいて優れた性能を達成し,屋内ローカライゼーションにおいて最先端の特徴アルゴリズムを著しく上回ることが示された。 RaP-Netのコードとデータセットはhttps://github.com/ivipsourcecode/RaP-Netで公開されている。 Feature extraction plays an important role in visual localization. Unreliable features on dynamic objects or repetitive regions will disturb robust feature matching and thus challenge indoor localization greatly. To conquer such an issue, we propose a novel network, RaP-Net, to simultaneously predict region-wise invariability and point-wise reliability, and then extract features by considering both of them. We also introduce a new dataset, named OpenLORIS-Location, to train the proposed network. The dataset contains 1553 indoor images from 93 indoor locations. Various appearance changes between images of the same location are included and they can help to learn the invariability in typical indoor scenes. Experimental results show that the proposed RaP-Net trained with the OpenLORIS-Location dataset achieves an excellent performance in the feature matching task and significantly outperforms state-of-the-art feature algorithms in indoor localization. The RaP-Net code and dataset are available at https://github.com/ivipsourcecode/RaP-Net.	翻訳日:2021-05-30 19:36:50 公開日:2021-03-14
# (参考訳) テンソルブロックモデルにおける厳密なクラスタリング:統計的最適性と計算限界 Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit ( http://arxiv.org/abs/2012.09996v2 ) ライセンス: CC BY 4.0	Rungang Han, Yuetian Luo, Miaoyan Wang, and Anru R. Zhang	(参考訳) 高次クラスタリングは、神経画像、ゲノム、およびソーシャルネットワーク研究で一般的に発生するマルチウェイデータセットにおける不均一なサブ構造を特定することを目的としている。この問題の非凸性と不連続性は、統計と計算の両方において大きな課題を生じさせる。本稿では,テンソルブロックモデルにおける高次クラスタリングのためのテンソルブロックモデルと計算効率のよい方法である high-order Lloyd algorithm (HLloyd) と high-order spectral clustering (HSC) を提案する。提案手法の収束が確立され,提案手法が妥当な仮定のもとに正確なクラスタリングを実現することを示す。また、3つの異なる信号対雑音比に基づく高次クラスタリングにおける統計的計算的トレードオフの完全な特性を示す。最後に,合成データと実データの両方について広範な実験を行い,提案手法のメリットを示す。 High-order clustering aims to identify heterogeneous substructure in multiway datasets that arise commonly in neuroimaging, genomics, and social network studies. The non-convex and discontinuous nature of the problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and the computationally efficient methods, high-order Lloyd algorithm (HLloyd) and high-order spectral clustering (HSC), for high-order clustering in the tensor block model. The convergence of the proposed procedure is established, and we show that our method achieves exact clustering under reasonable assumptions. We also give the complete characterization for the statistical-computational trade-off in high-order clustering based on three different signal-to-noise ratio regimes. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.	翻訳日:2021-05-02 04:34:56 公開日:2021-03-14
# 混合整数線形最適化による順序付き対実説明 Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization ( http://arxiv.org/abs/2012.11782v2 ) ライセンス: Link先を確認	Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike, Kento Uemura, Hiroki Arimura	(参考訳) 機械学習モデルのポストホックな説明法は意思決定を支援するために広く用いられている。一般的な方法の1つは、Actionable Recourse とも呼ばれる Counterfactual Explanation (CE) であり、予測結果を変える特徴の摂動ベクトルをユーザに提供する。摂動ベクトルが与えられると、ユーザはそれを望ましい決定結果を得るための「作用」として解釈することができる。しかし実際には、摂動ベクトルのみを示すことは、ユーザがアクションを実行するには不十分であることが多い。その理由は、因果関係のような機能間で非対称な相互作用がある場合、アクションの総コストは機能変更の順序に依存することが期待されるためである。したがって、実用的なCE法は摂動ベクトルに加えて、特徴の変化の適切な順序を提供する必要がある。そこで本研究では,OrdCE (Ordered Counterfactual Explanation) と呼ばれる新しいフレームワークを提案する。本稿では,アクションの対と順序を特徴的相互作用に基づいて評価する新しい目的関数を提案する。最適ペアを抽出するために,目的関数を用いた混合整数線形最適化手法を提案する。実データセットの数値実験により,非順序CE法と比較したOrdCEの有効性を示した。 Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One of the popular methods is Counterfactual Explanation (CE), also known as Actionable Recourse, which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining one's desired decision result. In practice, however, showing only a perturbation vector is often insufficient for users to execute the action. The reason is that if there is an asymmetric interaction among features, such as causality, the total cost of the action is expected to depend on the order of changing features. Therefore, practical CE methods are required to provide an appropriate order of changing features in addition to a perturbation vector. For this purpose, we propose a new framework called Ordered Counterfactual Explanation (OrdCE). We introduce a new objective function that evaluates a pair of an action and an order based on feature interaction. To extract an optimal pair, we propose a mixed-integer linear optimization approach with our objective function. Numerical experiments on real datasets demonstrated the effectiveness of our OrdCE in comparison with unordered CE methods.	翻訳日:2021-04-26 07:42:16 公開日:2021-03-14
# (参考訳) DeepStyle:短いテキストのオーサシップ属性のためのユーザスタイルの埋め込み DeepStyle: User Style Embedding for Authorship Attribution of Short Texts ( http://arxiv.org/abs/2103.11798v1 ) ライセンス: CC BY 4.0	Zhiqiang Hu, Roy Ka-Wei Lee, Lei Wang, Ee-Peng Lim and Bo Dai	(参考訳) 著者帰属(英: Authorship Attribution、AA)は、あるテキストの所有者を見つけるタスクであり、多くのアプリケーションにおいて重要かつ広く研究されている研究トピックである。近年の研究では、深層学習がAAタスクの精度を大幅に向上させることが示されている。それにもかかわらず、提案された手法のほとんどは、単一のタイプの機能(例えば、ワードバイグラム)を使用してユーザー投稿を表現し、タスクに対処するためのテキスト分類アプローチを採用する。さらに、これらの手法はAA結果の非常に限定的な説明性を提供する。本稿では,ユーザの敬遠した文体表現を学習する新しい組込み型フレームワークであるdeepstyleを提案することで,これらの制限に対処する。 TwitterとWeiboの2つの実世界のデータセットについて広範な実験を行った。実験の結果,DeepStyleはAAタスクにおける最先端のベースラインよりも優れていた。 Authorship attribution (AA), which is the task of finding the owner of a given text, is an important and widely studied research topic with many applications. Recent works have shown that deep learning methods could achieve significant accuracy improvement for the AA task. Nevertheless, most of these proposed methods represent user posts using a single type of feature (e.g., word bi-grams) and adopt a text classification approach to address the task. Furthermore, these methods offer very limited explainability of the AA results. In this paper, we address these limitations by proposing DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles. We conduct extensive experiments on two real-world datasets from Twitter and Weibo. Our experiment results show that DeepStyle outperforms the state-of-the-art baselines on the AA task.	翻訳日:2021-04-05 03:10:45 公開日:2021-03-14
# (参考訳) DeepHate: 多面的テキスト表現によるヘイトスピーチ検出 DeepHate: Hate Speech Detection via Multi-Faceted Text Representations ( http://arxiv.org/abs/2103.11799v1 ) ライセンス: CC BY 4.0	Rui Cao, Roy Ka-Wei Lee and Tuan-Anh Hoang	(参考訳) オンラインヘイトスピーチは、オンライン社会の結束性を損なう重要な問題であり、私たちの社会における公衆の安全を懸念することさえある。この問題に触発された研究者たちは、オンラインソーシャルプラットフォームにおけるヘイトスピーチを自動的に検出する、多くの伝統的な機械学習とディープラーニング手法を開発した。しかし、これらの手法のほとんどは、単語の頻度や単語の埋め込みなど、単一の型テキストの特徴しか考慮していない。このようなアプローチは、ヘイトスピーチ検出を改善するために使用できる他の豊富なテキスト情報を無視している。本稿では,オンラインソーシャルプラットフォームにおけるヘイトスピーチを検出するために,単語の埋め込み,感情,話題情報などの多面的テキスト表現を組み合わせた新しいディープラーニングモデルDeepHateを提案する。大規模な実験を行い、3つの公開現実データセット上でDeepHateを評価する。実験の結果,DeepHateはヘイトスピーチ検出タスクにおける最先端のベースラインよりも優れていた。また、オンラインソーシャルプラットフォームでヘイトスピーチを検出するのに最適なサルエント機能に関する洞察を提供するために、ケーススタディを実施します。 Online hate speech is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have developed many traditional machine learning and deep learning methods to detect hate speech in online social platforms automatically. However, most of these methods have only considered single type textual feature, e.g., term frequency, or using word embeddings. Such approaches neglect the other rich textual information that could be utilized to improve hate speech detection. In this paper, we propose DeepHate, a novel deep learning model that combines multi-faceted text representations such as word embeddings, sentiments, and topical information, to detect hate speech in online social platforms. We conduct extensive experiments and evaluate DeepHate on three large publicly available real-world datasets. Our experiment results show that DeepHate outperforms the state-of-the-art baselines on the hate speech detection task. We also perform case studies to provide insights into the salient features that best aid in detecting hate speech in online social platforms.	翻訳日:2021-04-05 03:03:27 公開日:2021-03-14
# (参考訳) AngryBERT:ヘイトスピーチ検出のための共同学習目標と感情 AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection ( http://arxiv.org/abs/2103.11800v1 ) ライセンス: CC BY 4.0	Md Rabiul Awal, Rui Cao, Roy Ka-Wei Lee, Sandra Mitrovic	(参考訳) ソーシャルメディアにおけるヘイトスピーチの自動検出は、最近データマイニングと自然言語処理コミュニティで大きな注目を集めている課題である。しかし、既存の手法の多くは、不均衡でしばしばヘイトフルコンテンツのトレーニングサンプルを欠く、注釈付きヘイトスピーチデータセットに大きく依存する教師付きアプローチを採用している。本稿では,新たなマルチタスク学習ベースモデルであるangrybertを提案し,感情分類とターゲット識別を併用したヘイトスピーチ検出を副次的なタスクとして提案する。 3つの一般的なヘイトスピーチ検出データセットを補完する大規模な実験を行った。実験の結果、AngryBERTは最先端のシングルタスク学習とマルチタスク学習のベースラインを上回っていることがわかった。我々は,AngryBERTモデルの強みと特徴を実証的に検証するためにアブレーション研究とケーススタディを行い,その二次課題がヘイトスピーチの検出を改善することを示す。 Automated hate speech detection in social media is a challenging task that has recently gained significant traction in the data mining and Natural Language Processing community. However, most of the existing methods adopt a supervised approach that depended heavily on the annotated hate speech datasets, which are imbalanced and often lack training samples for hateful content. This paper addresses the research gaps by proposing a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks. We conduct extensive experiments to augment three commonly-used hate speech detection datasets. Our experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines. We conduct ablation studies and case studies to empirically examine the strengths and characteristics of our AngryBERT model and show that the secondary tasks are able to improve hate speech detection.	翻訳日:2021-04-05 02:48:24 公開日:2021-03-14
# 静的から動的予測へ:複数の環境要因に基づく山火事リスク評価 From Static to Dynamic Prediction: Wildfire Risk Assessment Based on Multiple Environmental Factors ( http://arxiv.org/abs/2103.10901v1 ) ライセンス: Link先を確認	Tanqiu Jiang, Sidhant K. Bendre, Hanjia Lyu, Jiebo Luo	(参考訳) ワイルドファイアはアメリカ合衆国西海岸で頻繁に起こる最大の災害の1つである。近年、山火事の強度と頻度の増加の原因を理解するために多くの努力がなされている。本研究では,人口密度,正規化差植生指数(ndvi),パーマー干ばつ重大度指数(pdsi),樹木の枯死率,樹木の枯死率,標高など多岐にわたる環境データを用いて,カリフォルニア州の森林火災リスクの高い地域を解析・評価するための静的・動的予測モデルを提案する。さらに,様々な要因の影響をよりよく理解し,予防的行動に知らせることにも焦点を当てる。我々のモデルと結果を検証するために、カリフォルニアの土地を緯度と経度で0.1°$\times$0.1°sの4,242のグリッドに分割し、空間的および時間的条件に基づいて各グリッドのリスクを計算する。対物分析を行うことで、高リスク山火事の減少に対するいくつかの方法がもたらす影響を明らかにする。本研究は、これらの環境データが利用可能であるような様々な地域において、山火事のリスクを推定、監視、軽減する可能性を秘めている。 Wildfire is one of the biggest disasters that frequently occurs on the west coast of the United States. Many efforts have been made to understand the causes of the increases in wildfire intensity and frequency in recent years. In this work, we propose static and dynamic prediction models to analyze and assess the areas with high wildfire risks in California by utilizing a multitude of environmental data including population density, Normalized Difference Vegetation Index (NDVI), Palmer Drought Severity Index (PDSI), tree mortality area, tree mortality number, and altitude. Moreover, we focus on a better understanding of the impacts of different factors so as to inform preventive actions. To validate our models and findings, we divide the land of California into 4,242 grids of 0.1 degrees $\times$ 0.1 degrees in latitude and longitude, and compute the risk of each grid based on spatial and temporal conditions. By performing counterfactual analysis, we uncover the effects of several possible methods on reducing the number of high risk wildfires. Taken together, our study has the potential to estimate, monitor, and reduce the risks of wildfires across diverse areas provided that such environment data is available.	翻訳日:2021-04-05 01:05:44 公開日:2021-03-14
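The gridding step mentioned in the wildfire entry above (dividing the land into cells of 0.1 degrees by 0.1 degrees in latitude and longitude) can be sketched roughly as follows. This is only an illustration: the function name `grid_cell`, the origin coordinates (32.0, -125.0), and the (row, col) convention are assumptions made here and are not taken from the paper.

```python
import math

CELL_SIZE_DEG = 0.1  # cell size quoted in the abstract: 0.1 x 0.1 degrees


def grid_cell(lat, lon, lat_origin=32.0, lon_origin=-125.0, size=CELL_SIZE_DEG):
    """Return (row, col) of the grid cell that contains the point (lat, lon).

    lat_origin / lon_origin are hypothetical south-west corner coordinates.
    Rounding before floor() guards against float noise on cell boundaries.
    """
    row = math.floor(round((lat - lat_origin) / size, 9))
    col = math.floor(round((lon - lon_origin) / size, 9))
    return row, col


# Example: a point near Los Angeles mapped to its cell index.
los_angeles_cell = grid_cell(34.05, -118.25)
```

Per-cell risk values could then be stored in a dictionary keyed by the (row, col) tuple; how the risk itself is computed is not specified in the abstract and is left out here.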
# Fruit Flyは言葉の埋め込みを学べる? Can a Fruit Fly Learn Word Embeddings? ( http://arxiv.org/abs/2101.06887v2 ) ライセンス: Link先を確認	Yuchen Liang, Chaitanya K. Ryali, Benjamin Hoover, Leopold Grinberg, Saket Navlakha, Mohammed J. Zaki, Dmitry Krotov	(参考訳) ショウジョウバエ脳のキノコ体は神経科学において最も研究されているシステムの一つである。核となるのはケニオン細胞の集団であり、複数の感覚様相から入力を受ける。これらの細胞は前対の側方ニューロンによって抑制され、入力のスパースな高次元表現となる。本研究では,このネットワークモチーフの数学的形式化について検討し,自然言語処理(NLP)タスクである非構造化テキストのコーパスにおいて,単語とその文脈間の相関構造を学習する。このネットワークは単語の意味表現を学習でき、静的および文脈依存の単語埋め込みを生成することができる。単語埋め込みに高密度表現を用いる従来の方法(BERT, GloVeなど)とは異なり、我々のアルゴリズムは単語の意味と文脈をスパースバイナリハッシュコードの形で符号化する。学習した表現の質は、単語類似性分析、単語センスの曖昧さ、文書分類に基づいて評価される。また,fruit fly networkモチーフはnlpの既存の手法に匹敵する性能を実現するだけでなく,計算資源のほんの一部(短いトレーニング時間と少ないメモリフットプリント)しか使用できないことを示した。 The mushroom body of the fruit fly brain is one of the best studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse high dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. 
It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint).	翻訳日:2021-03-27 06:07:47 公開日:2021-03-14
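As a rough illustration of the sparse binary hash codes mentioned in the fruit fly entry above, the sketch below projects a dense vector onto many units and keeps only the k most active ones. It is a minimal sketch under stated assumptions: the weights here are random rather than learned as in the paper, the top-k selection is a simple stand-in for the inhibition described in the abstract, and the name `sparse_binary_hash` and all sizes are hypothetical.

```python
import random


def sparse_binary_hash(x, weights, k):
    """Map a dense vector x to a binary code with exactly k active bits.

    weights: list of rows, one per output unit (a hypothetical projection).
    The k units with the largest activation are set to 1, all others to 0.
    """
    activations = [sum(w * v for w, v in zip(row, x)) for row in weights]
    # Indices of the k largest activations (winner-take-all selection).
    winners = sorted(range(len(activations)),
                     key=lambda i: activations[i], reverse=True)[:k]
    code = [0] * len(weights)
    for i in winners:
        code[i] = 1
    return code


rng = random.Random(0)
dim_in, dim_out, k = 20, 200, 10
# Random stand-in weights; the paper learns its weights from text instead.
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim_in)]
code = sparse_binary_hash(x, weights, k)
```

Because the code is binary and sparse, it can be stored as a short list of active indices, which is one plausible reason for the small memory footprint the abstract reports.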
# 大規模な調査では、深層学習モデルは欠落データ計算に優れているか? 経験的比較からの証拠 Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison ( http://arxiv.org/abs/2103.09316v1 ) ライセンス: Link先を確認	Zhenhua Wang, Olanrewaju Akande, Jason Poulos and Fan Li	(参考訳) 多重計算(Multiple imputation、MI)は、サンプル調査における非応答性に起因する欠落データを扱うための最先端の手法である。連鎖方程式(MICE)による多重計算は最も広く使われているMI法であるが、理論的な基礎が欠如しており、計算集約的である。近年, 深層学習モデルに基づくMI手法が開発され, 小規模な研究が進められている。しかし,MICEと比較した場合,特に大規模調査では,現実的な環境下での性能を体系的に評価する研究が限られている。本稿では,実測データに基づくシミュレーションの一般的なフレームワークと,MI手法を比較するための性能指標について述べる。本研究では,アメリカコミュニティ調査データに基づく広範囲なシミュレーションを行い,分類木を用いたマウス,ランダム林を用いたマウス,生成的逆インプテーションネットワーク,デノージングオートエンコーダを用いた複数インプテーションの4つの機械学習手法の繰り返しサンプリング特性を比較した。深層学習に基づくMI手法は,計算時間の観点からはMICEが支配的であるが,分類木を用いたMICEは,偏差,平均二乗誤差,範囲の現実的な設定において,常に深層学習のMI手法よりも優れる。 Multiple imputation (MI) is the state-of-the-art approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is the most widely used MI method, but it lacks theoretical foundation and is computationally intensive. Recently, MI methods based on deep learning models have been developed with encouraging results in small studies. However, there has been limited research on systematically evaluating their performance in realistic settings comparing to MICE, particularly in large-scale surveys. This paper provides a general framework for using simulations based on real survey data and several performance metrics to compare MI methods. We conduct extensive simulation studies based on the American Community Survey data to compare repeated sampling properties of four machine learning based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation network, and multiple imputation using denoising autoencoders. 
We find the deep learning based MI methods dominate MICE in terms of computational time; however, MICE with classification trees consistently outperforms the deep learning MI methods in terms of bias, mean squared error, and coverage under a range of realistic settings.	翻訳日:2021-03-18 13:03:29 公開日:2021-03-14
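The three repeated-sampling criteria named in the entry above (bias, mean squared error, and coverage) can be computed along the following lines. This is a generic sketch with made-up numbers, not the authors' evaluation code; the function name and the toy inputs are assumptions.

```python
def repeated_sampling_metrics(estimates, intervals, truth):
    """Summarise estimates from repeated simulated samples.

    estimates: point estimates, one per simulated sample
    intervals: (lower, upper) interval estimates, one per simulated sample
    truth:     the known population value used to generate the samples
    """
    n = len(estimates)
    bias = sum(estimates) / n - truth
    mse = sum((e - truth) ** 2 for e in estimates) / n
    # Coverage: share of intervals that contain the true value.
    coverage = sum(1 for lo, hi in intervals if lo <= truth <= hi) / n
    return bias, mse, coverage


# Toy inputs, chosen only so the arithmetic is easy to follow.
estimates = [0.5, 0.25, 0.75, 0.5]
intervals = [(0.25, 0.75), (0.0, 0.375), (0.5, 1.0), (0.375, 0.625)]
bias, mse, coverage = repeated_sampling_metrics(estimates, intervals, truth=0.5)
```

With these toy inputs the estimates average to the true value, so the bias is zero, while three of the four intervals contain the truth.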
# (参考訳) 入射モデルに対するベイズ実験設計のためのハイブリッド勾配法 A Hybrid Gradient Method to Designing Bayesian Experiments for Implicit Models ( http://arxiv.org/abs/2103.08594v1 ) ライセンス: CC BY 4.0	Jiaxin Zhang, Sirui Bi, Guannan Zhang	(参考訳) ベイズ実験設計(BED)は,収集したデータから収集した情報を最大化する実験を設計することを目的としている。最適設計は通常、データとモデルパラメータ間の相互情報(MI)を最大化することで達成される。例えば、抽出可能なデータ分布を持つ暗黙のモデルを持つMIの分析式が利用できない場合、最近、MIのニューラルネットワークに基づく下界が提案され、下界を最大化するために勾配上昇法が用いられた。しかしながら、2020年のkleinegesseらによるアプローチでは、設計変数に対するmi下界の勾配を計算するためにパスワイズサンプリングパスが必要であり、そのようなパスワイズサンプリングパスは通常、暗黙のモデルではアクセスできない。本研究では,変分mi推定器と進化戦略(es)の最近の進歩とブラックボックス確率勾配上昇(sga)を組み合わせて,mi下界を最大化するハイブリッド勾配手法を提案する。これにより、経路勾配をサンプリングすることなく、暗黙のモデルに対して統一的なスケーラブルな手順で設計プロセスを実現できる。提案手法は,高次元設計空間における暗黙的モデルに対するBEDのスケーラビリティを著しく向上することを示す。 Bayesian experimental design (BED) aims at designing an experiment to maximize the information gathering from the collected data. The optimal design is usually achieved by maximizing the mutual information (MI) between the data and the model parameters. When the analytical expression of the MI is unavailable, e.g., having implicit models with intractable data distributions, a neural network-based lower bound of the MI was recently proposed and a gradient ascent method was used to maximize the lower bound. However, the approach in Kleinegesse et al., 2020 requires a pathwise sampling path to compute the gradient of the MI lower bound with respect to the design variables, and such a pathwise sampling path is usually inaccessible for implicit models. In this work, we propose a hybrid gradient approach that leverages recent advances in variational MI estimator and evolution strategies (ES) combined with black-box stochastic gradient ascent (SGA) to maximize the MI lower bound. This allows the design process to be achieved through a unified scalable procedure for implicit models without sampling path gradients. Several experiments demonstrate that our approach significantly improves the scalability of BED for implicit models in high-dimensional design space.	
翻訳日:2021-03-18 01:33:00 公開日:2021-03-14
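The evolution strategies (ES) idea referred to in the entry above, estimating a gradient from function evaluations alone and using it for stochastic gradient ascent, can be sketched as follows. The objective here is a toy quadratic standing in for a black-box function; it is not a mutual information lower bound, and the function names, step size, and sample counts are all illustrative assumptions.

```python
import random


def es_gradient(f, theta, sigma, n_pairs, rng):
    """Antithetic ES estimate of df/dtheta for a scalar parameter theta.

    Only evaluations of f are used; no derivative of f is required.
    """
    total = 0.0
    for _ in range(n_pairs):
        eps = rng.gauss(0.0, 1.0)
        total += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return total / (2.0 * sigma * n_pairs)


def toy_objective(theta):
    # Stand-in for a black-box objective; maximised at theta = 3.
    return -(theta - 3.0) ** 2


rng = random.Random(0)
theta = 0.0
for _ in range(200):
    # Gradient ascent step using the ES estimate.
    theta += 0.05 * es_gradient(toy_objective, theta, sigma=0.1, n_pairs=20, rng=rng)
```

In this toy setting the iterate moves toward the maximiser at 3; for a real design problem the objective would be replaced by an estimated MI lower bound, which is outside the scope of this sketch.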
# 厳密なスパース直交辞書学習 Exact Sparse Orthogonal Dictionary Learning ( http://arxiv.org/abs/2103.09085v1 ) ライセンス: Link先を確認	Kai Liu, Yongjian Zhao, Hua Wang	(参考訳) 過去10年間、入力画像からの辞書の学習は、画像処理と圧縮センシングにおいて最も研究の注目を集めるトピックの1つとなっている。既存の辞書学習法の多くは、K-SVD法のような過剰完全辞書を考慮しており、相互不整合が高く、認識に悪影響を及ぼす可能性がある。一方、スパースコードは、通常、$\ell_0$または$\ell_1$-normのペナルティを追加することで最適化されるが、厳格なスパース性保証はない。本稿では,厳密なスパース符号とグローバルシーケンス収束保証付き直交辞書を得られる直交辞書学習モデルを提案する。本手法は, 辞書ベースの学習手法に比べて, 高い評価結果が得られること, 高い計算効率の利点が期待できることがわかった。 Over the past decade, learning a dictionary from input images for sparse modeling has been one of the topics which receive most research attention in image processing and compressed sensing. Most existing dictionary learning methods consider an over-complete dictionary, such as the K-SVD method, which may result in high mutual incoherence and therefore has a negative impact in recognition. On the other side, the sparse codes are usually optimized by adding the $\ell_0$ or $\ell_1$-norm penalty, but with no strict sparsity guarantee. In this paper, we propose an orthogonal dictionary learning model which can obtain strictly sparse codes and orthogonal dictionary with global sequence convergence guarantee. We find that our method can result in better denoising results than over-complete dictionary based learning methods, and has the additional advantage of high computation efficiency.	翻訳日:2021-03-17 13:19:03 公開日:2021-03-14
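One reason an orthogonal dictionary pairs naturally with strictly sparse codes is a standard linear algebra fact: when D has orthonormal columns, the best s-sparse code for a signal x is obtained by keeping the s largest-magnitude entries of Dᵀx. The sketch below illustrates that fact only; it is not claimed to be the algorithm of the paper listed above, and the function name and the 2x2 example are assumptions.

```python
import math


def exact_sparse_code(x, dictionary, s):
    """Best s-sparse code of x for a square dictionary with orthonormal columns.

    dictionary: list of rows of the matrix D.
    Because D is orthogonal, ||x - D z|| equals ||D^T x - z||, so keeping the
    s largest-magnitude entries of D^T x minimises the reconstruction error.
    """
    n = len(dictionary)
    # coeffs = D^T x
    coeffs = [sum(dictionary[i][j] * x[i] for i in range(n)) for j in range(n)]
    keep = sorted(range(n), key=lambda j: abs(coeffs[j]), reverse=True)[:s]
    return [coeffs[j] if j in keep else 0.0 for j in range(n)]


r = 1.0 / math.sqrt(2.0)
D = [[r, r], [r, -r]]  # 2x2 orthonormal example dictionary
z = exact_sparse_code([3.0, 1.0], D, s=1)
```

The returned code has at most s non-zero entries by construction, which is the kind of strict sparsity guarantee that a norm penalty alone does not provide.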
# (参考訳) 自律運転における表現学習によるレーダカメラ融合 Radar Camera Fusion via Representation Learning in Autonomous Driving ( http://arxiv.org/abs/2103.07825v1 ) ライセンス: CC BY 4.0	Xu Dong, Binnan Zhuang, Yunxiang Mao, Langechuan Liu	(参考訳) レーダーとカメラは成熟し、コスト効率が高く、堅牢なセンサーであり、大量生産された自動運転システムの認識スタックで広く利用されている。その相補的な性質のため、レーダー検出(レーダーピン)とカメラ認識(2dバウンディングボックス)からの出力は通常融合され、最良の知覚結果を生成する。レーダーカメラ融合の成功の鍵は、正確なデータ関連付けです。レーダ・カメラ・アソシエーションの課題は、運転シーンの複雑さ、レーダ測定のノイズとスパースの性質、および2次元境界ボックスからの深さのあいまいさに起因する。従来のルールに基づくアソシエーション手法は、難解なシナリオやコーナーケースの障害でパフォーマンスが低下するおそれがある。本研究では,rad-camアソシエーションを深層表現学習を通じて解決し,機能レベルのインタラクションとグローバル推論を検討する。具体的には,不完全なラベル付けの難しさを克服し,人間の批判的推論を強制するために,損失サンプリング機構と革新的な順序的損失をデザインする。規則に基づくアルゴリズムによって生成された雑音ラベルを用いて学習したにもかかわらず,提案手法は92.2%のf1スコアを達成し,これは規則に基づく教師よりも11.6%高い。さらに,このデータ駆動方式は,コーナーケースマイニングによる継続的改善にも有効だ。 Radars and cameras are mature, cost-effective, and robust sensors and have been widely used in the perception stack of mass-produced autonomous driving systems. Due to their complementary properties, outputs from radar detection (radar pins) and camera perception (2D bounding boxes) are usually fused to generate the best perception results. The key to successful radar-camera fusion is accurate data association. The challenges in radar-camera association can be attributed to the complexity of driving scenes, the noisy and sparse nature of radar measurements, and the depth ambiguity from 2D bounding boxes. Traditional rule-based association methods are susceptible to performance degradation in challenging scenarios and failure in corner cases. In this study, we propose to address rad-cam association via deep representation learning, to explore feature-level interaction and global reasoning. Concretely, we design a loss sampling mechanism and an innovative ordinal loss to overcome the difficulty of imperfect labeling and to enforce critical human reasoning. 
Despite being trained with noisy labels generated by a rule-based algorithm, our proposed method achieves a performance of 92.2% F1 score, which is 11.6% higher than the rule-based teacher. Moreover, this data-driven method also lends itself to continuous improvement via corner case mining.	翻訳日:2021-03-17 06:35:56 公開日:2021-03-14
# (参考訳) 文脈的確率推定のための文レベルのノイズコントラスト推定による単語レベルの言語モデル学習 Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation ( http://arxiv.org/abs/2103.07875v1 ) ライセンス: CC BY 4.0	Heewoong Park, Sukhyun Cho, Jonghun Park	(参考訳) 文や単語列の確率分布を推測することは自然言語処理の重要なプロセスである。単語系列の結合確率を計算するために単語レベル言語モデル(lms)が広く採用されているが、文確率推定(spe)に十分な長さの文脈を捉えるのが困難である。これを解決するために、近年の研究では、リカレントニューラルネットワーク(RNN)を用いた文レベルノイズコントラスト推定(NCE)を用いたトレーニング手法を導入している。本研究では,前文の条件文確率を推定することを目的とした文脈的SPEの拡張を試みる。提案されたNCEは、前のテキストとは無関係にネガティブな文をサンプリングするため、訓練されたモデルは、より一貫性のある文により高い確率を与える。本手法を単純な単語レベルのRNN LMに適用し,ネットワークアーキテクチャではなく文レベルのNCEトレーニングの効果に着目した。推定の質は,人間と自動生成した質問を含む複数項目のクローゼ型質問に対して評価した。実験結果は,提案手法が単語レベルRNN LMのSPE品質を改善することを示した。 Inferring the probability distribution of sentences or word sequences is a key process in natural language processing. While word-level language models (LMs) have been widely adopted for computing the joint probabilities of word sequences, they have difficulty in capturing a context long enough for sentence probability estimation (SPE). To overcome this, recent studies introduced training methods using sentence-level noise-contrastive estimation (NCE) with recurrent neural networks (RNNs). In this work, we attempt to extend it for contextual SPE, which aims to estimate a conditional sentence probability given a previous text. The proposed NCE samples negative sentences independently of a previous text so that the trained model gives higher probabilities to the sentences that are more consistent with the context. We apply our method to a simple word-level RNN LM to focus on the effect of the sentence-level NCE training rather than on the network architecture. The quality of estimation was evaluated against multiple-choice cloze-style questions including both human and automatically generated questions. The experimental results show that the proposed method improved the SPE quality for the word-level RNN LM.	翻訳日:2021-03-17 06:23:03 公開日:2021-03-14
# (参考訳) R-GSN:異種グラフのためのリレーショナルグラフ類似ネットワーク R-GSN: The Relation-based Graph Similar Network for Heterogeneous Graph ( http://arxiv.org/abs/2103.07877v1 ) ライセンス: CC BY 4.0	Xinliang Wu and Mengying Jiang and Guizhong Liu	(参考訳) 不均一グラフは、実生活で広く存在するデータ構造の一種です。今日では、異種グラフ上のグラフニューラルネットワークの研究がますます盛んになっている。既存の異種グラフニューラルネットワークアルゴリズムには主に2つの考え方があり、1つはメタパスに基づくもので、もう1つはそうではない。メタパスに基づくアイデアは、しばしば手作業による事前処理を必要とするが、同時に大規模グラフの拡張は困難である。本稿では, メタパスを必要としない一般異種メッセージパッシングパラダイムを提案し, R-GSNを設計し, ベースラインのR-GCNに比べて大幅に改善した。実験により,我々のR-GSNアルゴリズムはogbn-mag大規模不均一グラフデータセット上での最先端の性能を実現することが示された。 A heterogeneous graph is a kind of data structure that exists widely in real life. Research on graph neural networks for heterogeneous graphs has become increasingly popular. Existing heterogeneous graph neural network algorithms mainly follow two ideas: one is based on meta-paths and the other is not. The meta-path-based idea often requires a lot of manual preprocessing and is difficult to extend to large-scale graphs. In this paper, we propose a general heterogeneous message passing paradigm and design R-GSN, which does not need meta-paths and is much improved compared to the baseline R-GCN. Experiments show that our R-GSN algorithm achieves state-of-the-art performance on the ogbn-mag large-scale heterogeneous graph dataset.	翻訳日:2021-03-17 06:07:48 公開日:2021-03-14
# (参考訳) 携帯型拡張現実モバイルを用いた動的物体復元のためのマルチビューデータキャプチャ Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles ( http://arxiv.org/abs/2103.07883v1 ) ライセンス: CC BY 4.0	M. Bortolon, L. Bazzanella, F. Poiesi	(参考訳) 動的オブジェクト3D再構成に適した複数の携帯端末からほぼ同期のフレームストリームをキャプチャするシステムを提案する。各モバイルは、そのポーズを推定するために同時にローカライズとマッピングを実行し、無線通信チャネルを使用して同期トリガーを送受信する。我々のシステムは、分散トリガ戦略とエッジまたはクラウドにデプロイ可能なデータリレーアーキテクチャを用いて、フレームとモバイルのポーズをリアルタイムで収集することができる。 3次元骨格とボリュームリコンストラクションに利用することで,本システムの有効性を示す。我々のトリガー戦略は、NTPベースの同期アプローチと同等のパフォーマンスを達成するが、アプリケーションのニーズに応じてオンラインで調整できるため、より高い柔軟性を提供する。屋外でスポーツ活動を行う俳優を録画する6つのハンドヘルド拡張現実モバイルを含む、挑戦的な新しいデータセット、すなわち4DMを作成しました。システムを4DM上で検証し、その強みと限界を分析し、モジュールと代替モジュールを比較します。 We propose a system to capture nearly-synchronous frame streams from multiple and moving handheld mobiles that is suitable for dynamic object 3D reconstruction. Each mobile executes Simultaneous Localisation and Mapping on-board to estimate its pose, and uses a wireless communication channel to send or receive synchronisation triggers. Our system can harvest frames and mobile poses in real time using a decentralised triggering strategy and a data-relay architecture that can be deployed either at the Edge or in the Cloud. We show the effectiveness of our system by employing it for 3D skeleton and volumetric reconstructions. Our triggering strategy achieves equal performance to that of an NTP-based synchronisation approach, but offers higher flexibility, as it can be adjusted online based on application needs. We created a challenging new dataset, namely 4DM, that involves six handheld augmented reality mobiles recording an actor performing sports actions outdoors. We validate our system on 4DM, analyse its strengths and limitations, and compare its modules with alternative ones.	翻訳日:2021-03-17 05:54:44 公開日:2021-03-14
# (参考訳) 標準平面の分類のための原理的超音波データ拡張 Principled Ultrasound Data Augmentation for Classification of Standard Planes ( http://arxiv.org/abs/2103.07895v1 ) ライセンス: CC BY 4.0	Lok Hin Lee and Yuan Gao and J. Alison Noble	(参考訳) 大きな学習能力を持つディープラーニングモデルは、しばしば医療画像データセットに適合する。これは、医療データ取得やラベル付けで生じるかなりの時間と費用のために、トレーニングセットが比較的小さいためである。したがって、データ拡張はトレーニングデータの可用性を拡大し、一般化を促進するためにしばしば用いられる。しかし、拡張戦略はしばしば正当化なしでアドホックに選択される。本稿では,モデル分類性能の向上を目的とした拡張ポリシー探索手法を提案する。我々は,医療画像解析によく用いられる追加の変換を補完ポリシー検索に含め,その性能を評価する。さらに,非線形混合サンプルデータ拡張戦略を含むように拡張ポリシー検索を拡張した。本研究では、超音波標準平面分類におけるナイーブデータ増強戦略よりも平均F1スコアが7.0%向上し、医学的画像モデルトレーニングのための原則的データ増強が超音波標準平面検出の大幅な改善につながることを示した。得られた超音波画像の表現は、よりよくクラスタ化され、最適化されたデータ拡張で定義される。 Deep learning models with large learning capacities often overfit to medical imaging datasets. This is because training sets are often relatively small due to the significant time and financial costs incurred in medical data acquisition and labelling. Data augmentation is therefore often used to expand the availability of training data and to increase generalization. However, augmentation strategies are often chosen on an ad-hoc basis without justification. In this paper, we present an augmentation policy search method with the goal of improving model classification performance. We include in the augmentation policy search additional transformations that are often used in medical image analysis and evaluate their performance. In addition, we extend the augmentation policy search to include non-linear mixed-example data augmentation strategies. Using these learned policies, we show that principled data augmentation for medical image model training can lead to significant improvements in ultrasound standard plane detection, with an average F1-score improvement of 7.0% overall over naive data augmentation strategies in ultrasound fetal standard plane classification. We find that the learned representations of ultrasound images are better clustered and defined with optimized data augmentation.	翻訳日:2021-03-17 05:33:46 公開日:2021-03-14
# (参考訳) バングラ手書き文字認識と生成 Bangla Handwritten Digit Recognition and Generation ( http://arxiv.org/abs/2103.07905v1 ) ライセンス: CC BY 4.0	Md Fahim Sikder	(参考訳) 手書き数字や数値認識は、パターン認識の分野では古典的な問題の一つであり、近年のコンピュータリソースの幅広い可用性のために、大きな進歩を遂げています。英語、アラビア語、中国語、日本語手書きのスクリプトですでに豊富な作品が行われています。バングラでの作業もいくつか行われたが、開発の余地がある。そこで本論文では,BHANDデータセット上で99.44%の検証精度を達成し,AlexnetとInception V3アーキテクチャを上回ったアーキテクチャを実装した。数値認識以外にも、デジタル生成は研究者の注目を集めている分野でもあるが、特にバングラについての研究はあまり行われていない。本論文では,Bangla手書き数字を生成するためにSemi-Supervised Generative Adversarial Network(SGAN)を適用し,Bangla桁の生成に成功した。 Handwritten digit or numeral recognition is one of the classical problems in the area of pattern recognition and has seen tremendous advancement because of the recent wide availability of computing resources. Plenty of work has already been done on English, Arabic, Chinese, and Japanese handwritten scripts. Some work on Bangla has also been done, but there is room for development. From that angle, in this paper, an architecture has been implemented which achieved a validation accuracy of 99.44% on the BHAND dataset and outperforms the AlexNet and Inception V3 architectures. Besides digit recognition, digit generation is another field which has recently caught the attention of researchers, though not many works have been done in this field, especially on Bangla. In this paper, a Semi-Supervised Generative Adversarial Network (SGAN) has been applied to generate Bangla handwritten numerals, and it successfully generated Bangla digits.	翻訳日:2021-03-17 05:24:11 公開日:2021-03-14
# (参考訳) 自然言語処理における再現性研究の体系的レビュー A Systematic Review of Reproducibility Research in Natural Language Processing ( http://arxiv.org/abs/2103.07929v1 ) ライセンス: CC BY 4.0	Anya Belz, Shubham Agarwal, Anastasia Shimorina, Ehud Reiter	(参考訳) 科学における再現性危機と呼ばれることの背景から、NLPの分野はますます興味を持ち、その成果の再現性に精通してきている。過去数年間、この地域では様々な新しいイニシアチブやイベント、活発な研究が行われてきた。しかし、再現性がどのように定義され、測定され、対処されるべきかについて、この分野は合意に達するには程遠い。この重点的貢献により、NLPの再現性に関する現在の作業のスナップショット、相違点と類似点の記述、共通分母へのポインタの提供を、可能な限り広角かつ近距離に行うことを目指しています。 Against the background of what has been termed a reproducibility crisis in science, the NLP field is becoming increasingly interested in, and conscientious about, the reproducibility of its results. The past few years have seen an impressive range of new initiatives, events and active research in the area. However, the field is far from reaching a consensus about how reproducibility should be defined, measured and addressed, with diversity of views currently increasing rather than converging. With this focused contribution, we aim to provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP, delineating differences and similarities, and providing pointers to common denominators.	翻訳日:2021-03-17 05:15:09 公開日:2021-03-14
# (参考訳) Gym-ANM: 電力配電システムにおけるアクティブネットワーク管理タスクのための強化学習環境 Gym-ANM: Reinforcement Learning Environments for Active Network Management Tasks in Electricity Distribution Systems ( http://arxiv.org/abs/2103.07932v1 ) ライセンス: CC BY 4.0	Robin Henry and Damien Ernst	(参考訳) 配電ネットワークのアクティブネットワーク管理(ANM)には、多くの複雑な確率的逐次最適化問題が含まれる。これらの問題は、再生可能エネルギーと分散ストレージを将来の電力網に統合するために解決する必要がある。本稿では、電力配電ネットワークにおけるANMタスクをモデル化する強化学習(RL)環境を設計するためのフレームワークであるGym-ANMを紹介する。これらの環境は、そのようなシステムの基盤となるダイナミクスに関する広範な知識を必要としない電力ネットワークの管理におけるrl研究の新しい場を提供する。この作業に加えて、ANMの共通の課題を強調するために設計された入門玩具環境ANM6-Easyの実装をリリースしています。また、モデル予測制御(MPC)手法と比較して、最先端のRLアルゴリズムはANM6-Easy上で既に優れた性能が得られることを示す。最後に, (a) 分布ネットワークトポロジーとパラメータ, (b) 観測空間, (c) システムに存在する確率過程のモデル化, (d) 報酬信号に影響を及ぼす一連のハイパーパラメータについて異なる新しい体育環境を作成するためのガイドラインを提供する。 gym-anmはhttps://github.com/robinhenry/gym-anmからダウンロードできる。 Active network management (ANM) of electricity distribution networks include many complex stochastic sequential optimization problems. These problems need to be solved for integrating renewable energies and distributed storage into future electrical grids. In this work, we introduce Gym-ANM, a framework for designing reinforcement learning (RL) environments that model ANM tasks in electricity distribution networks. These environments provide new playgrounds for RL research in the management of electricity networks that do not require an extensive knowledge of the underlying dynamics of such systems. Along with this work, we are releasing an implementation of an introductory toy-environment, ANM6-Easy, designed to emphasize common challenges in ANM. We also show that state-of-the-art RL algorithms can already achieve good performance on ANM6-Easy when compared against a model predictive control (MPC) approach. 
Finally, we provide guidelines to create new Gym-ANM environments differing in terms of (a) the distribution network topology and parameters, (b) the observation space, (c) the modelling of the stochastic processes present in the system, and (d) a set of hyperparameters influencing the reward signal. Gym-ANM can be downloaded at https://github.com/robinhenry/gym-anm.	翻訳日:2021-03-17 04:55:40 公開日:2021-03-14
# (参考訳) サンプルタスク実行から針挿入を学ぶ Learning needle insertion from sample task executions ( http://arxiv.org/abs/2103.07938v1 ) ライセンス: CC BY 4.0	Amir Ghalamzan-E	(参考訳) ロボット作業、例えばロボット縫合の自動化は非常に複雑で時間がかかる。自律的にタスクを実行するためのタスクモデルを学ぶことは、技術、ロボット手術、より広いコミュニティのためにアクセス可能にする貴重なことです。ロボット手術のデータを簡単に記録でき、収集したデータを使ってタスクモデルを学ぶことができる。これにより、外科医がロボット操作を監督したり、ツールの低レベル制御の代わりに高レベルのコマンドを与えることができるロボット手術の時間とコストが削減されます。腕1が軟組織に針を挿入し、腕2が軟組織を積極的に操作し、所望の出口と実際の出口が同一であることを保証する2本の腕を持つ軟組織に針を挿入するデータセットを提案する。これは、組織をアクティブに操作することなく縫合することは縫合に失敗する可能性があるため、縫合が縫合に適用される力に耐えるだけの十分な組織を縫合することが出来ないため、実際の手術において重要である。 3対のステレオカメラで記録された60の治験を含む針挿入データセットを提案する。さらに, t 以降の段階でロボットの望ましい状態を予測するDeep-Robot Learning from Demonstrations(デモからの深層ロボット学習)を, 過去のステップのビデオ(すなわち, t での最適動作)から見て紹介する。 n ステップタイム履歴 N はタスクの実行のメモリタイムウィンドウです。実験結果は,提案する深層モデルアーキテクチャが既存手法を上回っていることを示す。ソリューションはまだ実際のロボットにデプロイする準備が整っていないが、結果は実際のロボットを展開するための将来の開発の可能性を示している。 Automating a robotic task, e.g., robotic suturing can be very complex and time-consuming. Learning a task model to autonomously perform the task is invaluable making the technology, robotic surgery, accessible for a wider community. The data of robotic surgery can be easily logged where the collected data can be used to learn task models. This will result in reduced time and cost of robotic surgery in which a surgeon can supervise the robot operation or give high-level commands instead of low-level control of the tools. We present a data-set of needle insertion in soft tissue with two arms where Arm 1 inserts the needle into the tissue and Arm 2 actively manipulate the soft tissue to ensure the desired and actual exit points are the same. This is important in real-surgery because suturing without active manipulation of tissue may yield failure of the suturing as the stitch may not grip enough tissue to resist the force applied for the suturing. We present a needle insertion dataset including 60 successful trials recorded by 3 pair of stereo cameras. 
Moreover, we present Deep-robot Learning from Demonstrations, which predicts the desired state of the robot at the time step after t (the state that the optimal action taken at t yields) from the video of the past N time steps of the task execution, where N is the memory time window. The experimental results show that our proposed deep model architecture outperforms the existing methods. Although the solution is not yet ready to be deployed on a real robot, the results indicate the possibility of future development for real robot deployment.	翻訳日:2021-03-17 04:54:27 公開日:2021-03-14
# (参考訳) MLベースのシステムのためのソフトウェアアーキテクチャ - 既存のものと、その先にあるもの Software Architecture for ML-based Systems: What Exists and What Lies Ahead ( http://arxiv.org/abs/2103.07950v1 ) ライセンス: CC BY 4.0	Henry Muccini and Karthik Vaidhyanathan	(参考訳) 機械学習(ML)の利用の増加と、現代のソフトウェアアーキテクチャの課題が組み合わさって、MLベースのシステムのためのソフトウェアアーキテクチャ、MLベースのソフトウェアシステムを開発するためのアーキテクチャ技術開発に焦点を当てたソフトウェアアーキテクチャのためのソフトウェアアーキテクチャ、そして、従来のソフトウェアシステムを構築するためのML技術の開発に焦点を当てたソフトウェアアーキテクチャのためのMLの2つの広い研究領域が生まれた。本研究では、MLベースのソフトウェアシステムを設計する現在のシナリオに存在するさまざまなアーキテクチャプラクティスを強調することを目的として、スペクトルの以前の側面に焦点を当てる。 MLベースのソフトウェアシステムを設計するための標準的なプラクティスセットをより適切に定義するために、MLとソフトウェア実践者の双方の注意を必要とするソフトウェアアーキテクチャの4つの重要な領域を特定します。私たちは、イタリア最大の博物館のひとつでキューイングの課題を解決するmlベースのソフトウェアシステムを構築した経験から、これらの領域を基盤としています。 The increasing usage of machine learning (ML) coupled with the software architectural challenges of the modern era has resulted in two broad research areas: i) software architecture for ML-based systems, which focuses on developing architectural techniques for better developing ML-based software systems, and ii) ML for software architectures, which focuses on developing ML techniques to better architect traditional software systems. In this work, we focus on the former side of the spectrum with a goal to highlight the different architecting practices that exist in the current scenario for architecting ML-based software systems. We identify four key areas of software architecture that need the attention of both the ML and software practitioners to better define a standard set of practices for architecting ML-based software systems. We base these areas in light of our experience in architecting an ML-based software system for solving queuing challenges in one of the largest museums in Italy.	翻訳日:2021-03-17 04:39:46 公開日:2021-03-14
# (参考訳) アクティブダイナミカルプロスペクション:パスフィンディング時の感覚制御のための粒子フィルタリングとしてのメンタルシミュレーションのモデル化 Active Dynamical Prospection: Modeling Mental Simulation as Particle Filtering for Sensorimotor Control during Pathfinding ( http://arxiv.org/abs/2103.07966v1 ) ライセンス: CC BY-SA 4.0	Jeremy Gordon and John Chuang	(参考訳) 共通の課題に直面した時に人間が何をするか – どこに行きたいかは分かっていますが、そこに着く最善の方法がまだ分かっていません。これは空間的ナビゲーションやパスフィニングにおいてエージェントが引き起こす問題であり、その解決策は一般により抽象的な計画領域に関するヒントを与えるかもしれない。本研究では,パスファインディング行動の連続的,明示的な探索的パラダイムをモデル化する。私たちのタスクでは、参加者(およびエージェント)は、部分的に観察可能な環境で視覚的な探索とナビゲーションの両方を調整しなければなりません。 1)オンライン実験として実施した新しいパスファインディングパラダイムにおける81名の被験者の行動データの解析,2) 粒子フィルタリングとしてナビゲーション中の予測的メンタルシミュレーションをモデル化する提案,3) 計算エージェントにおける提案のインスタンス化,の3つの主成分がある。我々のモデルであるActive Dynamical Prospectionでは、マップの解法率、経路選択、試行期間の類似パターンと、人間の参加者のデータと比較した場合の注意行動(集約レベルと個人レベルの両方)が示される。また,最初の移動前の遠近的注意と遅延(予測シミュレーションの潜在的な相関関係)がタスク性能の予測であることを見出した。 What do humans do when confronted with a common challenge: we know where we want to go but we are not yet sure the best way to get there, or even if we can. This is the problem posed to agents during spatial navigation and pathfinding, and its solution may give us clues about the more abstract domain of planning in general. In this work, we model pathfinding behavior in a continuous, explicitly exploratory paradigm. In our task, participants (and agents) must coordinate both visual exploration and navigation within a partially observable environment. Our contribution has three primary components: 1) an analysis of behavioral data from 81 human participants in a novel pathfinding paradigm conducted as an online experiment, 2) a proposal to model prospective mental simulation during navigation as particle filtering, and 3) an instantiation of this proposal in a computational agent. 
We show that our model, Active Dynamical Prospection, demonstrates similar patterns of map solution rate, path selection, and trial duration, as well as attentional behavior (at both aggregate and individual levels) when compared with data from human participants. We also find that both distal attention and delay prior to first move (both potential correlates of prospective simulation) are predictive of task performance.	翻訳日:2021-03-17 04:21:13 公開日:2021-03-14
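As a rough illustration of the particle-filtering idea named in the abstract above, the sketch below runs a generic bootstrap particle filter on a one-dimensional position. It is not the authors' Active Dynamical Prospection agent: the additive motion model, the Gaussian observation likelihood, and every name and parameter (`particle_filter_step`, `estimate`, `motion_noise`, `obs_noise`) are assumptions made for this example.

```python
import math
import random


def particle_filter_step(particles, control, observation,
                         motion_noise=0.1, obs_noise=0.5, rng=random):
    """One predict-weight-resample step of a bootstrap particle filter (1-D state)."""
    # Predict: move every particle by the control input plus Gaussian noise.
    predicted = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: Gaussian likelihood of the observation given each predicted state.
    weights = [math.exp(-0.5 * ((observation - p) / obs_noise) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Posterior-mean estimate of the hidden state."""
    return sum(particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
    true_pos = 0.0
    for _ in range(20):
        true_pos += 1.0  # the hidden state moves one unit per step
        obs = true_pos + rng.gauss(0.0, 0.5)
        particles = particle_filter_step(particles, 1.0, obs, rng=rng)
    print("estimate:", round(estimate(particles), 2), "true:", true_pos)
```

Each particle can be read as one simulated hypothesis about where the agent is; resampling concentrates the simulation budget on hypotheses that agree with what was observed.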
# (参考訳) CrossoverScheduler: クロスオーバーマナーで複数の分散トレーニングアプリケーションをオーバーラップする CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner ( http://arxiv.org/abs/2103.07974v1 ) ライセンス: CC BY 4.0	Cheng Luo, Lei Qu, Youshan Miao, Peng Cheng, Yongqiang Xiong	(参考訳) 分散ディープラーニングのワークロードには、GPUクラスタ上のスループット集約型トレーニングタスクが含まれる。分散確率勾配降下法(Distributed Stochastic Gradient Descent, SGD)は、後方伝播後の通信遅延を大幅に増大させる。本稿では,分散トレーニングアプリケーションの通信サイクルを,パイプライン通信と計算を通じて他のアプリケーションで満たすアルゴリズムであるCrossoverSchedulerを提案する。 CrossoverSchedulerでは、収束率とネットワーク精度を犠牲にすることなく、分散トレーニングの実行性能を著しく向上させることができる。我々は、複数の分散ディープラーニングアプリケーションが同じGPUを交互にタイムシェアできるクロスオーバー同期を導入することで実現した。 CrossoverSchedulerのプロトタイプはHorovodと構築および統合されています。さまざまな分散タスクの実験から、CrossoverSchedulerはImageNetデータセット上の画像分類タスクの20%のスピードアップを実現している。 Distributed deep learning workloads include throughput-intensive training tasks on GPU clusters, where Distributed Stochastic Gradient Descent (SGD) incurs significant communication delays after backward propagation, forcing workers to wait for gradient synchronization via a centralized parameter server or directly among decentralized workers. We present CrossoverScheduler, an algorithm that enables communication cycles of a distributed training application to be filled by other applications through pipelining communication and computation. With CrossoverScheduler, the running performance of distributed training can be significantly improved without sacrificing convergence rate and network accuracy. We achieve this by introducing Crossover Synchronization, which allows multiple distributed deep learning applications to time-share the same GPU alternately. The prototype of CrossoverScheduler is built and integrated with Horovod. Experiments on a variety of distributed tasks show that CrossoverScheduler achieves a 20% speedup for image classification tasks on the ImageNet dataset.	翻訳日:2021-03-17 04:07:58 公開日:2021-03-14
# (参考訳) 暗黙モデルを用いたベイズ実験設計のためのスケーラブルグラデーションフリー手法 A Scalable Gradient-Free Method for Bayesian Experimental Design with Implicit Models ( http://arxiv.org/abs/2103.08026v1 ) ライセンス: CC BY 4.0	Jiaxin Zhang, Sirui Bi, Guannan Zhang	(参考訳) ベイズ実験設計(BED)は、情報収集を最大化する設計をどのように選択するかという問いに答えるものである。尤度は扱いにくいがサンプリングは可能な暗黙的モデルでは、従来のBED法では、後方分布を効率的に推定し、データとパラメータ間の相互情報(MI)を最大化するのが困難である。最近の研究では、これらの問題に対処するためにMIの低い境界を最大化するグラデーションアセンションの使用を提案しました。しかし、この手法では設計変数に関してMI下限の経路勾配を計算するためにサンプリングパスが必要であり、そのような経路勾配は通常暗黙のモデルでは到達できない。本論文では, 確率的近似勾配上昇の最近の進歩を有効かつ堅牢なBEDのための平滑な変動MI推定器に組み込んだ新しい手法を提案する。経路勾配の必要がなければ,本手法は暗黙的モデルに対して近似的な勾配を持つ統一的な手順で設計プロセスを実現することができる。いくつかの実験により,本手法はベースライン法より優れ,高次元問題におけるBEDのスケーラビリティが著しく向上することが示された。 Bayesian experimental design (BED) addresses the question of how to choose designs that maximize information gathering. For implicit models, where the likelihood is intractable but sampling is possible, conventional BED methods have difficulties in efficiently estimating the posterior distribution and maximizing the mutual information (MI) between data and parameters. Recent work proposed the use of gradient ascent to maximize a lower bound on MI to deal with these issues. However, the approach requires a sampling path to compute the pathwise gradient of the MI lower bound with respect to the design variables, and such a pathwise gradient is usually inaccessible for implicit models. In this paper, we propose a novel approach that leverages recent advances in stochastic approximate gradient ascent incorporated with a smoothed variational MI estimator for efficient and robust BED. Without the necessity of pathwise gradients, our approach allows the design process to be achieved through a unified procedure with an approximate gradient for implicit models. Several experiments show that our approach outperforms baseline methods, and significantly improves the scalability of BED in high-dimensional problems.	翻訳日:2021-03-17 04:01:53 公開日:2021-03-14
# (参考訳) 低リソースニューラルネットワーク翻訳のためのクラウドソーシングフレーズベースのトークン化:Fon言語の場合 Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language ( http://arxiv.org/abs/2103.08052v1 ) ライセンス: CC BY 4.0	Bonaventure F. P. Dossou and Chris C. Emezue	(参考訳) 非常に低リソースで形態的に豊かなアフリカの先住民言語に対する効果的なニューラルネットワーク翻訳(NMT)モデルの構築は、オープンな課題である。利用可能なリソースを見つけるという問題に加えて、多くの作業が前処理とトークン化に費やされます。最近の研究では、標準のトークン化方法がアフリカ言語の文法的、ダイアクリティカル、トーン特性を適切に扱うとは限らないことが示されています。トレーニングサンプルの可用性が極めて低いことに加えて、信頼性の高いNMTモデルの生産を妨げている。本稿では,fon言語を事例研究として,標準トークン化法を再検討し,人間主導のスーパーワードトークン化戦略であるword-expressions-based (web)トークン化を導入する。さらに、Fon-France-Fon翻訳タスクのトークン化戦略を他の人と比較します。 Building effective neural machine translation (NMT) models for very low-resourced and morphologically rich African indigenous languages is an open challenge. Besides the issue of finding available resources for them, a lot of work is put into preprocessing and tokenization. Recent studies have shown that standard tokenization methods do not always adequately deal with the grammatical, diacritical, and tonal properties of some African languages. That, coupled with the extremely low availability of training samples, hinders the production of reliable NMT models. In this paper, using Fon language as a case study, we revisit standard tokenization methods and introduce Word-Expressions-Based (WEB) tokenization, a human-involved super-words tokenization strategy to create a better representative vocabulary for training. Furthermore, we compare our tokenization strategy to others on the Fon-French and French-Fon translation tasks.	翻訳日:2021-03-17 03:38:33 公開日:2021-03-14
# (参考訳) RecSim NG:Recommenderエコシステムの原則的不確実性モデリングを目指して RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems ( http://arxiv.org/abs/2103.08057v1 ) ライセンス: CC BY 4.0	Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier	(参考訳) ユーザとのマルチターンインタラクションを最適化し、レコメンダエコシステムにおけるさまざまなエージェント(ユーザ、コンテンツプロバイダ、ベンダなど)のインタラクションをモデル化するレコメンダシステムの開発は、近年注目を集めている。このようなレコメンダーのためのモデルとアルゴリズムの開発とトレーニングは、静的データセットを使用することで特に困難になる可能性があります。そこで我々は,マルチエージェントレコメンダシステムのシミュレーションのための確率的プラットフォームであるrecsim ngを開発した。 RecSim NGはEdward2とTensorFlowで実装されたスケーラブルでモジュール化された差別化可能なシミュレータである。エージェントビヘイビア仕様のための強力で汎用的な確率的プログラム言語、自動微分とトレースによる確率的推論と潜在変数モデル学習のためのツール、アクセラレーションされたハードウェア上でシミュレーションを実行するTensorFlowベースのランタイムを提供する。 RecSim NGについて説明するとともに、RecSim NGが研究者と実践者の両方にとって、レコメンダシステムのための新しいアルゴリズムを容易に開発し、訓練するための簡単なユースケースの小さなセットによって補完される、レコメンダエコシステムの透明で構成可能なエンドツーエンドモデルの作成にどのように使用できるかを説明している。 The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem have drawn increasing attention in recent years. Developing and training models and algorithms for such recommenders can be especially difficult using static datasets, which often fail to offer the types of counterfactual predictions needed to evaluate policies over extended horizons. To address this, we develop RecSim NG, a probabilistic platform for the simulation of multi-agent recommender systems. RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification; tools for probabilistic inference and latent-variable model learning, backed by automatic differentiation and tracing; and a TensorFlow-based runtime for running simulations on accelerated hardware. 
We describe RecSim NG and illustrate how it can be used to create transparent, configurable, end-to-end models of a recommender ecosystem, complemented by a small set of simple use cases that demonstrate how RecSim NG can help both researchers and practitioners easily develop and train novel algorithms for recommender systems.	翻訳日:2021-03-17 03:22:11 公開日:2021-03-14
# (参考訳) Versailles-FPデータセット:古代の壁検出 Versailles-FP dataset: Wall Detection in Ancient ( http://arxiv.org/abs/2103.08064v1 ) ライセンス: CC BY 4.0	Wassim Swaileh, Dimitrios Kotzinos, Suman Ghosh, Michel Jordan, Son Vu, and Yaguan Qian	(参考訳) 歴史的建造物の床計画へのアクセスは、建築の進化と歴史を理解するために必要である。このような知識ベースは、かつて建物の一部であったさまざまな出来事、人物、事実の間のつながりを確立することで、歴史の再構築にも役立ちます。 2次元の計画は空間全体を捉えないため、3Dモデリングはこれらのユニークなアーカイブの読影に新たな光を放ち、記念碑の古代国家を理解するための大きな視点を開く。建物や記念碑の3Dモデルの最初のステップは、フロアプランにおける壁検出であり、本稿では、17世紀から18世紀にかけてのヴェルサイユ宮殿の、壁のグラウンドトゥルース付き画像からなる新しくユニークなVersailles-FPデータセットを紹介する。データセットの壁マスクは、多方向ステアブルフィルタに基づく自動アプローチによって生成される。生成された壁面は手作業で検証され修正される。我々は最新のデータ集合における壁マスク生成のアプローチを検証する。最後に、壁検出のためのUネットベースの畳み込みフレームワークを提案する。本手法は,完全接続型ネットワークベースアプローチを上回る最先端の結果を実現する。 Access to historical monuments' floor plans over a time period is necessary to understand the architectural evolution and history. Such knowledge bases also help to rebuild the history by establishing connections between different events, persons and facts that were once part of the buildings. Since the two-dimensional plans do not capture the entire space, 3D modeling sheds new light on the reading of these unique archives and thus opens up great perspectives for understanding the ancient states of the monument. Since the first step in the building's or monument's 3D model is the wall detection in the floor plan, we introduce in this paper the new and unique Versailles FP dataset of wall groundtruthed images of the Versailles Palace dated between the 17th and 18th centuries. The dataset's wall masks are generated using an automatic approach based on multi-directional steerable filters. The generated wall masks are then validated and corrected manually. We validate our approach of wall mask generation on state-of-the-art modern datasets. Finally we propose a U-Net-based convolutional framework for wall detection. Our method achieves state-of-the-art results, surpassing a fully connected network based approach.	翻訳日:2021-03-17 02:51:48 公開日:2021-03-14
# (参考訳) ゼロショット創発通信のための準等価ディスカバリ Quasi-Equivalence Discovery for Zero-Shot Emergent Communication ( http://arxiv.org/abs/2103.08067v1 ) ライセンス: CC BY 4.0	Kalesha Bullard, Douwe Kiela, Joelle Pineau, Jakob Foerster	(参考訳) 効果的なコミュニケーションはマルチエージェント環境での情報交換を可能にする重要なスキルであり、創発的コミュニケーションは活気ある研究分野であり、個別の安価トークチャネルを含む共通的な設定である。定義上、これらの設定には任意の情報エンコーディングが含まれており、通常、学習したプロトコルがトレーニングパートナーを超えて一般化することを許さない。対照的に、本研究では、ゼロショットコーディネーション(ZSC)を可能にする新しい問題設定と準等価ディスカバリー(QED)アルゴリズム、すなわち独立に訓練されたエージェントに一般化できるプロトコルを発見することを提案する。現実世界の問題設定にはしばしば高価な通信チャネルが含まれており、例えばロボットは四肢を物理的に動かさなければならない。これらの2つの要因が,エージェントが意図を伝えるためにメッセージのエネルギーコストを使用するレファレンシャルゲームにおいて,ユニークなZSCポリシーをもたらすことを示す。 Other-Playは最近、最適なZSCポリシーを学ぶために導入されたが、問題の対称性に事前アクセスする必要がある。代わりに、QEDはこの設定における対称性を反復的に発見し、最適なZSCポリシーに収束する。 Effective communication is an important skill for enabling information exchange in multi-agent settings and emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. Since, by definition, these settings involve arbitrary encoding of information, typically they do not allow for the learned protocols to generalize beyond training partners. In contrast, in this work, we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents. Real world problem settings often contain costly communication channels, e.g., robots have to physically move their limbs, and a non-uniform distribution over intents. We show that these two factors lead to unique optimal ZSC policies in referential games, where agents use the energy cost of the messages to communicate intent. Other-Play was recently introduced for learning optimal ZSC policies, but requires prior access to the symmetries of the problem. Instead, QED can iteratively discover the symmetries in this setting and converges to the optimal ZSC policy.	翻訳日:2021-03-17 02:39:17 公開日:2021-03-14
# マルチGANモデルを用いたクレーム検証 Claim Verification using a Multi-GAN based Model ( http://arxiv.org/abs/2103.08001v1 ) ライセンス: Link先を確認	Amartya Hatua, Arjun Mukherjee and Rakesh M. Verma	(参考訳) 本稿では,複数のGANモデルを用いたクレーム検証について述べる。提案モデルは3組のジェネレータと判別器から構成される。生成器と識別器のペアは、支持および反論されたクレームおよびクレームラベルの合成データを生成する責任があります。提案モデルに関する理論的議論は、モデルの平衡状態を検証するために提供される。提案モデルはFEVERデータセットに適用され、入力テキストデータには事前学習された言語モデルが使用される。合成されたデータは、モデルが最先端モデルや他の標準分類器よりも優れた性能を発揮するのに役立つ情報を与える。 This article describes research on claim verification carried out using a multiple GAN-based model. The proposed model consists of three pairs of generators and discriminators. The generator and discriminator pairs are responsible for generating synthetic data for supported and refuted claims and claim labels. A theoretical discussion about the proposed model is provided to validate the equilibrium state of the model. The proposed model is applied to the FEVER dataset, and a pre-trained language model is used for the input text data. The synthetically generated data provides information that helps the model to perform better than state-of-the-art models and other standard classifiers.	翻訳日:2021-03-16 14:32:49 公開日:2021-03-14
# 3次元シーン理解のためのモンテカルロシーン検索 Monte Carlo Scene Search for 3D Scene Understanding ( http://arxiv.org/abs/2103.07969v1 ) ライセンス: Link先を確認	Shreyas Hampali, Sinisa Stekovic, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit	(参考訳) トレーニングデータの必要性を低減するために、一般的なAIアルゴリズムを3Dシーン理解にどのように使用できるかを検討します。より正確には、ノイズの多いRGB-Dスキャンからオブジェクトと部屋レイアウトを検索するためのモンテカルロ木探索(MCTS)アルゴリズムの修正を提案する。 MCTSはゲームプレイングアルゴリズムとして開発されたが、複雑な知覚問題にも使用できることを示す。簡単に調整できるハイパーパラメータは少なく、一般的な損失を最適化できる。 rgb-dデータに基づいて,物体の後方確率と室内配置仮説を最適化する。これにより、現在の解をレンダリングしてRGB-D観測と比較することにより、解空間を探索する分析バイシンセシスアプローチがもたらされる。この探索をより効率的に行うために,標準MCTSのツリー構築・探索方針の簡易な変更を提案する。 ScanNetデータセットに対する我々のアプローチを実証する。我々のメソッドは、特にレイアウト上の手動アノテーションよりも優れた設定を検索することが多い。 We explore how a general AI algorithm can be used for 3D scene understanding in order to reduce the need for training data. More exactly, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show it can also be used for complex perception problems. It has few easy-to-tune hyperparameters and can optimise general losses. We use it to optimise the posterior probability of objects and room layout hypotheses given the RGB-D data. This results in an analysis-by-synthesis approach that explores the solution space by rendering the current solution and comparing it to the RGB-D observations. To perform this exploration even more efficiently, we propose simple changes to the standard MCTS' tree construction and exploration policy. We demonstrate our approach on the ScanNet dataset. Our method often retrieves configurations that are better than some manual annotations especially on layouts.	翻訳日:2021-03-16 14:32:40 公開日:2021-03-14
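For readers unfamiliar with Monte Carlo Tree Search, the following is a minimal sketch of plain UCT (selection by UCB1, expansion, random rollout, backpropagation) on a toy problem. It is standard textbook MCTS, not the modified search proposed in the abstract above; the toy `score` function, the `Node` class, and the exploration constant `c=1.4` are assumptions for illustration. In a scene-understanding setting, the score would instead compare a rendered hypothesis with the RGB-D observations.

```python
import math
import random


class Node:
    """A search-tree node holding the sequence of choices made so far."""

    def __init__(self, state, parent=None):
        self.state = state      # tuple of actions taken from the root
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.total = 0.0        # sum of rollout scores seen through this node

    def ucb1(self, c=1.4):
        # Mean score plus an exploration bonus that shrinks with visits.
        mean = self.total / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts(score, actions, depth, iterations=500, rng=random):
    """Return the most-visited action sequence of length up to `depth`."""
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection and expansion: descend by UCB1 until a node has an untried action.
        while len(node.state) < depth:
            if len(node.children) < len(actions):
                action = actions[len(node.children)]
                node.children[action] = Node(node.state + (action,), node)
                node = node.children[action]
                break
            node = max(node.children.values(), key=Node.ucb1)
        # Simulation: complete the sequence with random actions and score it.
        state = node.state
        while len(state) < depth:
            state += (rng.choice(actions),)
        reward = score(state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the result by following the most-visited children.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.visits)
    return node.state


if __name__ == "__main__":
    target = (1, 0, 1)
    toy_score = lambda s: sum(a == b for a, b in zip(s, target)) / len(target)
    print(mcts(toy_score, [0, 1], depth=3, iterations=2000, rng=random.Random(0)))
```

Every child is visited once on creation, so `ucb1` is only ever called on nodes with at least one visit.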
# 残差説明に基づく新しい解釈可能な教師なし異常検出法 A new interpretable unsupervised anomaly detection method based on residual explanation ( http://arxiv.org/abs/2103.07953v1 ) ライセンス: Link先を確認	David F. N. Oliveira, Lucio F. Vismari, Alexandre M. Nascimento, Jorge R. de Almeida Jr, Paulo S. Cugnasca, Joao B. Camargo Jr, Leandro Almeida, Rafael Gripp, Marcelo Neves	(参考訳) 難しい問題に対処するために複雑なパターンをモデリングする際の優れたパフォーマンスにもかかわらず、Deep Learning(DL)メソッドのブラックボックスの性質は、現実のクリティカルドメインにおけるアプリケーションに制限を課している。ブラックボックスの決定に対する人間の推論を可能にする円滑な方法の欠如は、予期せぬ出来事に対する予防措置を妨げ、破滅的な結果をもたらす可能性がある。ブラックボックスモデルの不明瞭さに取り組むため、解釈性はdlベースのシステムにおいて基本的な要件となり、モデルの振る舞いを理解する方法を提供することで、信頼と知識を活用した。現在のホットなトピックですが、教師なしDLベースの異常検出(AD)モデルにおける現在の解釈可能性メソッドの既存の制限を克服するには、さらなる進歩が必要です。オートエンコーダ(AE)は、ADアプリケーションのための教師なしDLベースのコアであり、クラス内で最高のパフォーマンスを達成する。しかし、この結果を得るためのハイブリッドな側面(ネットワーク外での追加計算を必要とする)のため、AEベースのADに適用できるのは非依存の解釈可能な方法のみである。これらの非依存メソッドは、多数のパラメータを処理するのに計算的に高価である。本稿では,大規模システムにおけるAEベースのADの限界に対処する新しい解釈可能性手法であるRXP(Residual eXPlainer)を提案する。実装の単純化、計算コストの低減、および再構成された入力機能の偏差解析によって説明が得られる決定論的な振る舞いが際立っています。実鉄道路線のデータを用いた実験において,提案手法はSHAPよりも優れた性能を示し,大規模クリティカルシステムにおける意思決定を支援する可能性を実証した。 Despite the superior performance in modeling complex patterns to address challenging problems, the black-box nature of Deep Learning (DL) methods imposes limitations on their application in real-world critical domains. The lack of a smooth manner for enabling human reasoning about the black-box decisions hinders any preventive action against unexpected events, which may lead to catastrophic consequences. To tackle the unclearness from black-box models, interpretability became a fundamental requirement in DL-based systems, leveraging trust and knowledge by providing ways to understand the model's behavior. Although a current hot topic, further advances are still needed to overcome the existing limitations of the current interpretability methods in unsupervised DL-based models for Anomaly Detection (AD). 
Autoencoders (AE) are the core of unsupervised DL-based AD applications, achieving best-in-class performance. However, because they obtain their results in a hybrid way (requiring additional calculations outside the network), only model-agnostic interpretability methods can be applied to AE-based AD. These agnostic methods are computationally expensive when a large number of parameters must be processed. In this paper we present RXP (Residual eXPlainer), a new interpretability method that addresses these limitations for AE-based AD in large-scale systems. It stands out for its implementation simplicity, low computational cost and deterministic behavior: explanations are obtained through deviation analysis of the reconstructed input features. In an experiment using data from a real heavy-haul railway line, the proposed method achieved superior performance compared to SHAP, demonstrating its potential to support decision making in large-scale critical systems.	翻訳日:2021-03-16 14:28:20 公開日:2021-03-14
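The idea of explaining an autoencoder's anomaly flag through the deviation between each input feature and its reconstruction can be sketched as follows. This is only a simplified reading of the abstract, not the authors' RXP implementation: the function name `residual_explanation`, the use of absolute residuals, and the hypothetical feature names in the usage example are all assumptions.

```python
def residual_explanation(x, x_hat, feature_names, top_k=3):
    """Rank input features by absolute reconstruction residual |x - x_hat|.

    x      -- original input features (sequence of floats)
    x_hat  -- the autoencoder's reconstruction of x (same length)
    Returns up to top_k tuples (feature name, residual, share of total residual).
    """
    if not (len(x) == len(x_hat) == len(feature_names)):
        raise ValueError("x, x_hat and feature_names must have equal length")
    residuals = [abs(a - b) for a, b in zip(x, x_hat)]
    total = sum(residuals)
    ranked = sorted(zip(feature_names, residuals),
                    key=lambda pair: pair[1], reverse=True)
    # Report each feature's share of the total residual as its contribution.
    return [(name, r, (r / total if total > 0 else 0.0))
            for name, r in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical sensor readings and their reconstruction.
    names = ["speed", "brake_pressure", "axle_temp"]
    x = [1.0, 2.0, 3.0]
    x_hat = [1.0, 2.5, 1.0]
    for name, residual, share in residual_explanation(x, x_hat, names):
        print(f"{name}: residual={residual:.2f}, share={share:.0%}")
```

Because the ranking is a pure function of the input and its reconstruction, the same sample always yields the same explanation, which matches the deterministic behavior the abstract emphasizes.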
# ニューラルネットワークにおける補間前の損失挙動 Pre-interpolation loss behaviour in neural networks ( http://arxiv.org/abs/2103.07986v1 ) ライセンス: Link先を確認	Arthur E. W. Venter and Marthinus W. Theunissen and Marelie H. Davel	(参考訳) ニューラルネットワークを分類器としてトレーニングする場合、同じデータセット上の全体的な分類精度を維持または改善しながら、平均テスト損失の増加を観察することが一般的です。この現象の普遍性にも拘わらず、よく研究されておらず、境界の正しい分類の増加によってしばしば軽視される。本稿では,この現象が実際に試験試料の処理方法の違いの結果であることを示す実験的検討を行う。本質的に: テスト損失は全体として増加するのではなく、少数のサンプルについてのみ増加する。大きい表現容量は他のための極度な増加の費用でテストサンプルの大多数のための損失を減らすことを可能にします。この効果は主に、正しく処理されたサンプルの特徴に関連するパラメータ値の増加に起因すると考えられる。本研究は,ディープニューラルネットワークの共通行動の実用的理解に寄与する。また、この作業がネットワーク最適化と一般化に果たす影響についても議論する。 When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.	翻訳日:2021-03-16 14:27:56 publication placeholder
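The claim that average test loss can rise while accuracy stays the same is easy to check with a small worked example. The probabilities below are made up for illustration and do not come from the paper; the helper names `mean_cross_entropy` and `accuracy` are likewise assumptions. Each list holds the probability a binary classifier assigns to the true class of five test samples.

```python
import math


def mean_cross_entropy(p_true):
    """Mean negative log-likelihood, given the probability of the true class."""
    return sum(-math.log(p) for p in p_true) / len(p_true)


def accuracy(p_true, threshold=0.5):
    """Binary case: a sample is correct when its true class gets more than 0.5."""
    return sum(p > threshold for p in p_true) / len(p_true)


if __name__ == "__main__":
    early = [0.7, 0.7, 0.7, 0.7, 0.4]       # four correct, one mildly wrong
    late = [0.99, 0.99, 0.99, 0.99, 0.001]  # same four correct, one confidently wrong
    for name, probs in (("early", early), ("late", late)):
        print(name, "accuracy:", accuracy(probs),
              "mean loss:", round(mean_cross_entropy(probs), 3))
```

Four of the five per-sample losses fall between the two snapshots, yet the single confidently wrong sample dominates the mean, which is the pattern the abstract describes.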
# ブロック型抽象構文木分割によるコード要約の改善 Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting ( http://arxiv.org/abs/2103.07845v1 ) ライセンス: Link先を確認	Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu	(参考訳) 自動コード要約は、ソフトウェア開発者を手動コメントの重荷から解放し、ソフトウェア開発とメンテナンスに利益をもたらします。ソースコードの構文構造を表現した抽象構文木(AST)がコード要約の生成をガイドするために組み込まれている。しかし、既存のASTベースのメソッドは、トレーニングの難しさに悩まされ、不十分なコード要約を生成する。本稿では、ASTのリッチツリー形式の構文構造をフルに活用し、コード要約を改善するBlock-wise Abstract Syntax Tree Splitting法(略してBASTS)を提案する。 BASTSは、コントロールフローグラフの支配木にあるブロックに基づいてメソッドのコードを分割し、各コード分割に対して分割ASTを生成します。各スプリットASTはTree-LSTMによってモデル化され、プリトレーニング戦略を使用してローカルの非線形シンタックスエンコーディングをキャプチャする。学習したシンタックスエンコーディングはコードエンコーディングと組み合わせ、Transformerにフィードバックされ、高品質のコードサマリを生成します。ベンチマークに関する総合的な実験は、BASTSが様々な評価指標で最先端のアプローチを著しく上回っていることを実証している。再現性を促進するために、私たちの実装はhttps://github.com/XMUDM/BASTSで入手できます。 Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries. However, existing AST based methods suffer from the difficulty of training and generate inadequate code summaries. In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs, for improving code summarization. BASTS splits the code of a method based on the blocks in the dominator tree of the Control Flow Graph, and generates a split AST for each code split. Each split AST is then modeled by a Tree-LSTM using a pre-training strategy to capture local non-linear syntax encoding. The learned syntax encoding is combined with code encoding, and fed into Transformer to generate high-quality code summaries. 
Comprehensive experiments on benchmarks have demonstrated that BASTS significantly outperforms state-of-the-art approaches in terms of various evaluation metrics. To facilitate reproducibility, our implementation is available at https://github.com/XMUDM/BASTS.	翻訳日:2021-03-16 14:27:29 公開日:2021-03-14
# 連続処理の因果効果学習のためのVCNetと機能目標正規化 VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments ( http://arxiv.org/abs/2103.07861v1 ) ライセンス: Link先を確認	Lizhen Nie, Mao Ye, Qiang Liu, Dan Nicolae	(参考訳) 連続処理による観測データの増大に動機付けられ, 平均線量応答曲線(ADRF)を推定する問題について検討した。利用可能なパラメトリック手法はモデル空間において制限されており、ニューラルネットワークを利用して連続的な処理をブロックに分割し、それぞれのブロックに別々のヘッドを使用することでモデル表現性を向上しようとする以前の試みは、実際には不連続ADRFを生成する。したがって、ADRFを推定するためにニューラルネットワークの構造とトレーニングをどのように適応させるかという問題はまだ開いていません。本稿は2つの重要な貢献を述べる。まず,予測されたADRFの連続性を保ちつつ,モデル表現性を向上させる新しい可変係数ニューラルネットワーク(VCNet)を提案する。第二に、有限サンプル性能を改善するために、ターゲット正規化を一般化し、ADRF曲線全体の二重に堅牢な推定値を得る。 Motivated by the rising abundance of observational data with continuous treatments, we investigate the problem of estimating the average dose-response curve (ADRF). Available parametric methods are limited in their model space, and previous attempts in leveraging neural network to enhance model expressiveness relied on partitioning continuous treatment into blocks and using separate heads for each block; this however produces in practice discontinuous ADRFs. Therefore, the question of how to adapt the structure and training of neural network to estimate ADRFs remains open. This paper makes two important contributions. First, we propose a novel varying coefficient neural network (VCNet) that improves model expressiveness while preserving continuity of the estimated ADRF. Second, to improve finite sample performance, we generalize targeted regularization to obtain a doubly robust estimator of the whole ADRF curve.	翻訳日:2021-03-16 14:25:44 公開日:2021-03-14
# Von Mises-Fisher楕円分布 Von Mises-Fisher Elliptical Distribution ( http://arxiv.org/abs/2103.07948v1 ) ライセンス: Link先を確認	Shengxi Li, Danilo Mandic	(参考訳) 現代の確率的学習システムの大きなクラスは対称分布を仮定しているが、実世界のデータは歪分布に従う傾向にあり、したがって対称分布を通じて適切にモデル化されるとは限らない。この問題に対処するため、楕円分布は対称分布の一般化にますます使われており、近位楕円分布のさらなる改善が注目されている。しかし、既存のアプローチは見積もりが難しいか、複雑で抽象的な表現を持っている。そこで本研究では,vMF(Von-Mises-Fisher)分布を用いて,スキュー楕円分布の明確かつ簡便な確率表現を提案する。これは、非対称学習システムに対処できるだけでなく、歪んだ分布を一般化するための物理的に意味のある方法を提供するためにも示される。厳密さのために、私達の拡張は対称同等と重要で望ましい特性を共有することが証明されます。また,提案するvmf分布は,理論上および実例を通じて,生成が容易であり,推定が安定であることを示す。 A large class of modern probabilistic learning systems assumes symmetric distributions, however, real-world data tend to obey skewed distributions and are thus not always adequately modelled through symmetric distributions. To address this issue, elliptical distributions are increasingly used to generalise symmetric distributions, and further improvements to skewed elliptical distributions have recently attracted much attention. However, existing approaches are either hard to estimate or have complicated and abstract representations. To this end, we propose to employ the von-Mises-Fisher (vMF) distribution to obtain an explicit and simple probability representation of the skewed elliptical distribution. This is shown not only to allow us to deal with non-symmetric learning systems, but also to provide a physically meaningful way of generalising skewed distributions. For rigour, our extension is proved to share important and desirable properties with its symmetric counterpart. We also demonstrate that the proposed vMF distribution is both easy to generate and stable to estimate, both theoretically and through examples.	翻訳日:2021-03-16 14:25:28 公開日:2021-03-14
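For reference, the standard von Mises-Fisher density on the unit sphere in $\mathbb{R}^p$ is given below. This is the textbook form only; how the paper combines it with an elliptical distribution to obtain skewness is not stated in the abstract and is not reproduced here. The symbols $\mu$ (mean direction), $\kappa$ (concentration) and $I_\nu$ (modified Bessel function of the first kind) follow the usual convention and are not taken from the source.

```latex
f_p(x;\mu,\kappa) = C_p(\kappa)\,\exp\!\left(\kappa\,\mu^{\top}x\right),
\qquad \lVert x\rVert = \lVert \mu\rVert = 1,\quad \kappa \ge 0,
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)} .
```

At $\kappa = 0$ the density reduces (as a limit) to the uniform distribution on the sphere, and larger $\kappa$ concentrates the mass around $\mu$.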
# Learning One Representation to Optimize All Rewards ( http://arxiv.org/abs/2103.07945v1 ) License: see link	Ahmed Touati and Yann Ollivier	We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state). The optimal policy for that reward is directly obtained from these representations, with no planning. The unsupervised FB loss is well-principled: if training is perfect, the policies obtained are provably optimal for any reward function. With imperfect training, the sub-optimality is proportional to the unsupervised approximation error. The FB representation learns long-range relationships between states and actions, via a predictive occupancy map, without having to synthesize states as in model-based approaches. This is a step towards learning controllable agents in arbitrary black-box stochastic environments. This approach compares well to goal-oriented RL algorithms on discrete and continuous mazes, pixel-based MsPacman, and the FetchReach virtual robot arm. We also illustrate how the agent can immediately adapt to new tasks beyond goal-oriented RL.	Translated: 2021-03-16 14:20:34 Published: 2021-03-14
# SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels ( http://arxiv.org/abs/2103.07829v1 ) License: see link	Chenliang Li, Ming Yan, Haiyang Xu, Fuli Luo, Wei Wang, Bin Bi, Songfang Huang	Vision-language pre-training (VLP) on large-scale image-text pairs has recently witnessed rapid progress for learning cross-modal representations. Existing pre-training methods either directly concatenate image representation and text representation at a feature level as input to a single-stream Transformer, or use a two-stream cross-modal Transformer to align the image-text representation at a high-level semantic space. In real-world image-text data, we observe that it is easy for some of the image-text pairs to align simple semantics on both modalities, while others may be related after higher-level abstraction. Therefore, in this paper, we propose a new pre-training method SemVLP, which jointly aligns both the low-level and high-level semantics between image and text representations. The model is pre-trained iteratively in two prevalent fashions: single-stream pre-training to align at a fine-grained feature level and two-stream pre-training to align high-level semantics, by employing a shared Transformer network with a pluggable cross-modal attention module. An extensive set of experiments has been conducted on four well-established vision-language understanding tasks to demonstrate the effectiveness of the proposed SemVLP in aligning cross-modal representations towards different semantic granularities.	Translated: 2021-03-16 14:15:39 Published: 2021-03-14
# A `Sourceful' Twist: Emoji Prediction Based on Sentiment, Hashtags and Application Source ( http://arxiv.org/abs/2103.07833v1 ) License: see link	Pranav Venkit, Zeba Karishma, Chi-Yang Hsu, Rahul Katiki, Kenneth Huang, Shomir Wilson, Patrick Dudas	We widely use emojis in social networking to heighten, mitigate or negate the sentiment of the text. Emoji suggestions already exist in many cross-platform applications, but an emoji is predicted based solely on a few prominent words instead of an understanding of the subject and substance of the text. Through this paper, we showcase the importance of using Twitter features to help the model understand the sentiment involved and hence to predict the most suitable emoji for the text. Hashtags and application sources (e.g., Android) are two features that we found to be important yet underused in emoji prediction and in Twitter sentiment analysis as a whole. To address this shortcoming and to further understand emoji behavioral patterns, we propose a more balanced dataset by crawling additional Twitter data, including timestamp, hashtags, and application source acting as additional attributes to the tweet. Our data analysis and neural network model performance evaluations show that using hashtags and application sources as features allows the model to encode different information and is effective in emoji prediction.	Translated: 2021-03-16 14:15:14 Published: 2021-03-14
# Investigating Value of Curriculum Reinforcement Learning in Autonomous Driving Under Diverse Road and Weather Conditions ( http://arxiv.org/abs/2103.07903v1 ) License: see link	Anil Ozturk, Mustafa Burak Gunel, Resul Dagdanov, Mirac Ekim Vural, Ferhat Yurdakul, Melih Dal, Nazim Kemal Ure	Applications of reinforcement learning (RL) are popular in autonomous driving tasks. That being said, tuning the performance of an RL agent and guaranteeing generalization across a variety of driving scenarios is still largely an open problem. In particular, getting good performance on complex road and weather conditions requires exhaustive tuning and computation time. Curriculum RL, which focuses on solving simpler automation tasks in order to transfer knowledge to complex tasks, is attracting attention in the RL community. The main contribution of this paper is a systematic study for investigating the value of curriculum reinforcement learning in autonomous driving applications. For this purpose, we set up several different driving scenarios in a realistic driving simulator, with varying road complexity and weather conditions. Next, we train and evaluate the performance of RL agents on different sequences of task combinations and curricula. Results show that curriculum RL can yield significant gains in complex driving tasks, both in terms of driving performance and sample complexity. Results also demonstrate that different curricula might enable different benefits, which hints at future research directions for automated curriculum training.	Translated: 2021-03-16 14:13:58 Published: 2021-03-14
# Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding ( http://arxiv.org/abs/2103.07838v1 ) License: see link	Xin Wen and Zhizhong Han and Yan-Pei Cao and Pengfei Wan and Wen Zheng and Yu-Shen Liu	In this paper, we present a novel unpaired point cloud completion network, named Cycle4Completion, to infer the complete geometries from a partial 3D object. Previous unpaired completion methods merely focus on the learning of geometric correspondence from incomplete shapes to complete shapes, and ignore the learning in the reverse direction, which makes them suffer from low completion accuracy due to the limited 3D shape understanding ability. To address this problem, we propose two simultaneous cycle transformations between the latent spaces of complete shapes and incomplete ones. The insight of cycle transformation is to encourage networks to understand 3D shapes by learning to generate complete or incomplete shapes from their complementary ones. Specifically, the first cycle transforms shapes from the incomplete domain to the complete domain, and then projects them back to the incomplete domain. This process learns the geometric characteristics of complete shapes, and maintains the shape consistency between the complete prediction and the incomplete input. Similarly, the inverse cycle transformation goes from the complete domain to the incomplete domain, and then back to the complete domain, to learn the characteristics of incomplete shapes. We provide a comprehensive evaluation in experiments, which shows that our model with the learned bidirectional geometry correspondence outperforms state-of-the-art unpaired completion methods.	Translated: 2021-03-16 14:11:39 Published: 2021-03-14
# Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis ( http://arxiv.org/abs/2103.07854v1 ) License: see link	Jianhua Sun, Yuxuan Li, Hao-Shu Fang, Cewu Lu	Multimodal prediction results are essential for the trajectory forecasting task, as there is no single correct answer for the future. Previous frameworks can be divided into three categories: regression, generation and classification frameworks. However, these frameworks have weaknesses in different aspects so that they cannot model the multimodal prediction task comprehensively. In this paper, we present a novel insight along with a brand-new prediction framework by formulating multimodal prediction into three steps: modality clustering, classification and synthesis, and address the shortcomings of earlier frameworks. Exhaustive experiments on popular benchmarks have demonstrated that our proposed method surpasses state-of-the-art works even without introducing social and map information. Specifically, we achieve 19.2% and 20.8% improvement on ADE and FDE respectively on the ETH/UCY dataset. Our code will be made publicly available.	Translated: 2021-03-16 14:11:15 Published: 2021-03-14
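The ADE and FDE figures quoted in the entry above are the usual displacement metrics for trajectory forecasting. A minimal sketch of how they are commonly computed follows. The best-of-K reduction for multimodal predictors is a widespread convention and an assumption here, since the abstract does not state the evaluation protocol, and the function names are illustrative.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred and truth are equal-length sequences of (x, y) points.
    ADE is the mean Euclidean error over all time steps;
    FDE is the Euclidean error at the final time step.
    """
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

def best_of_k(preds, truth):
    """Minimum ADE and minimum FDE over K candidate trajectories."""
    errors = [displacement_errors(p, truth) for p in preds]
    return min(e[0] for e in errors), min(e[1] for e in errors)
```

Note that under best-of-K the minimum ADE and minimum FDE may come from different candidates.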
# Learning a Proposal Classifier for Multiple Object Tracking ( http://arxiv.org/abs/2103.07889v1 ) License: see link	Peng Dai and Renliang Weng and Wongun Choi and Changshui Zhang and Zhangping He and Wei Ding	The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. However, it is not trivial to solve the data-association problem in an end-to-end fashion. In this paper, we propose a novel proposal-based learnable framework, which models MOT as a proposal generation, proposal scoring and trajectory inference paradigm on an affinity graph. This framework is similar to the two-stage object detector Faster RCNN, and can solve the MOT problem in a data-driven way. For proposal generation, we propose an iterative graph clustering method to reduce the computational cost while maintaining the quality of the generated proposals. For proposal scoring, we deploy a trainable graph-convolutional-network (GCN) to learn the structural patterns of the generated proposals and rank them according to the estimated quality scores. For trajectory inference, a simple deoverlapping strategy is adopted to generate tracking output while complying with the constraint that no detection can be assigned to more than one track. We experimentally demonstrate that the proposed method achieves a clear performance improvement in both MOTA and IDF1 with respect to the previous state of the art on two public benchmarks. Our code is available at https://github.com/daip13/LPC_MOT.git.	Translated: 2021-03-16 14:11:00 Published: 2021-03-14
# DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network ( http://arxiv.org/abs/2103.07893v1 ) License: see link	Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li	Conditional generative adversarial networks (cGANs) aim at synthesizing diverse images given the input conditions and latent codes, but unfortunately, they usually suffer from the issue of mode collapse. To solve this issue, previous works mainly focused on encouraging the correlation between the latent codes and their generated images, while ignoring the relations between images generated from various latent codes. The recent MSGAN tried to encourage the diversity of the generated images but only considered "negative" relations between the image pairs. In this paper, we propose a novel DivCo framework to properly constrain both "positive" and "negative" relations between the generated images specified in the latent space. To the best of our knowledge, this is the first attempt to use contrastive learning for diverse conditional image synthesis. A novel latent-augmented contrastive loss is introduced, which encourages images generated from adjacent latent codes to be similar and those generated from distinct latent codes to be dissimilar. The proposed latent-augmented contrastive loss is well compatible with various cGAN architectures. Extensive experiments demonstrate that the proposed DivCo can produce more diverse images than state-of-the-art methods without sacrificing visual quality in multiple unpaired and paired image generation tasks.	Translated: 2021-03-16 14:10:42 Published: 2021-03-14
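To make the "similar for adjacent latent codes, dissimilar for distinct ones" idea in the entry above concrete, here is a generic InfoNCE-style contrastive loss on feature vectors. This is a textbook formulation offered as a sketch; whether the latent-augmented loss in the paper takes exactly this form is an assumption, as are the function names, the cosine similarity, and the default temperature.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(query, positive, negatives, temperature=0.1):
    """Negative log-softmax of the positive pair against all candidates.

    query:     feature of an image generated from latent code z
    positive:  feature of an image from a latent code adjacent to z
    negatives: features of images from latent codes far from z
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the query is closer to its positive than to every negative, and grows when a negative is the closest candidate.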
# Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images ( http://arxiv.org/abs/2103.07894v1 ) License: see link	Haolin Liu, Anran Lin, Xiaoguang Han, Lei Yang, Yizhou Yu, Shuguang Cui	Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in single-view RGBD images, where the referred objects are often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in the 3D scenes, we propose a bottom-up approach to gradually aggregate context-aware information, effectively addressing the challenge posed by the partial geometry. Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGBD image. Then our approach conducts adaptive feature learning based on the heatmap and performs object-level matching with another visio-linguistic fusion to finally ground the referred object. We evaluate the proposed method by comparing it with state-of-the-art methods on both the RGBD images extracted from the ScanRefer dataset and our newly collected SUNRefer dataset. Experiments show that our method outperforms the previous methods by a large margin (by 11.2% and 15.6% Acc@0.5) on both datasets.	Translated: 2021-03-16 14:10:20 Published: 2021-03-14
# Towards Generalizable and Robust Face Manipulation Detection via Bag-of-local-feature ( http://arxiv.org/abs/2103.07915v1 ) License: see link	Changtao Miao, Qi Chu, Weihai Li, Tao Gong, Wanyi Zhuang and Nenghai Yu	Over the past several years, in order to solve the problem of malicious abuse of facial manipulation technology, face manipulation detection technology has received considerable attention and achieved remarkable progress. However, most existing methods have very poor generalization ability and robustness. In this paper, we propose a novel method for face manipulation detection, which can improve the generalization ability and robustness via bag-of-local-feature. Specifically, we extend Transformers using a bag-of-feature approach to encode inter-patch relationships, allowing them to learn local forgery features without any explicit supervision. Extensive experiments demonstrate that our method can outperform competing state-of-the-art methods on FaceForensics++, Celeb-DF and DeeperForensics-1.0 datasets.	Translated: 2021-03-16 14:10:00 Published: 2021-03-14
# Semi-Supervised Video Deraining with Dynamic Rain Generator ( http://arxiv.org/abs/2103.07939v1 ) License: see link	Zongsheng Yue, Jianwen Xie, Qian Zhao, Deyu Meng	While deep learning (DL)-based video deraining methods have achieved significant success recently, they still have two major drawbacks. Firstly, most of them do not sufficiently model the characteristics of rain layers of rainy videos. In fact, the rain layers exhibit strong physical properties (e.g., direction, scale and thickness) in the spatial dimension and natural continuities in the temporal dimension, and thus can be generally modelled by the spatial-temporal process in statistics. Secondly, current DL-based methods depend heavily on labeled synthetic training data, whose rain types always deviate from those in unlabeled real data. Such a gap between synthetic and real data sets leads to poor performance when applying them in real scenarios. To address these issues, this paper proposes a new semi-supervised video deraining method, in which a dynamic rain generator is employed to fit the rain layer, expecting to better depict its insightful characteristics. Specifically, such a dynamic generator consists of one emission model and one transition model to simultaneously encode the spatially physical structure and temporally continuous changes of rain streaks, respectively, both of which are parameterized as deep neural networks (DNNs). Furthermore, different prior formats are designed for the labeled synthetic and unlabeled real data, so as to fully exploit the common knowledge underlying them. Last but not least, we also design a Monte Carlo EM algorithm to solve this model. Extensive experiments are conducted to verify the superiority of the proposed semi-supervised deraining model.	Translated: 2021-03-16 14:09:48 Published: 2021-03-14
# Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion ( http://arxiv.org/abs/2103.07941v1 ) License: see link	Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang	We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-$k$ filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code to facilitate future research.	Translated: 2021-03-16 14:09:23 Published: 2021-03-14
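The top-$k$ filtering mentioned in the entry above can be pictured as restricting an attention-style memory read to the k best-matching memory entries. The sketch below shows that general idea for a single query; the function names, the list-based data layout, and the use of a plain softmax over the kept scores are assumptions for illustration, not details taken from the paper.

```python
import math

def top_k_softmax(scores, k):
    """Softmax over only the k largest scores; all other weights are zero."""
    k = min(k, len(scores))
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

def read_memory(scores, values, k):
    """Weighted sum of memory value vectors using top-k filtered weights.

    scores: affinity of one query location to each memory entry
    values: one feature vector per memory entry
    """
    weights = top_k_softmax(scores, k)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

Dropping the low-affinity entries keeps many weak, noisy matches from diluting the read-out as the memory grows.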
# TransFG: A Transformer Architecture for Fine-grained Recognition ( http://arxiv.org/abs/2103.07976v1 ) License: see link	Ju He, Jieneng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille	Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences. Recent works mainly tackle this problem by focusing on how to locate the most discriminative image regions and rely on them to improve the capability of networks to capture subtle variances. Most of these works achieve this by using an RPN module to propose bounding boxes and re-using the backbone network to extract features of selected boxes. Recently, the vision transformer (ViT) has shown strong performance in the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. The strength of the attention link can be intuitively considered as an indicator of the importance of tokens. In this work, we propose a novel transformer-based framework TransFG where we integrate all raw attention weights of the transformer into an attention map for guiding the network to effectively and accurately select discriminative image patches and compute their relations. A duplicate loss is introduced to encourage multiple attention heads to focus on different regions. In addition, a contrastive loss is applied to further enlarge the distance between feature representations of similar sub-classes. We demonstrate the value of TransFG by conducting experiments on five popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, NABirds and iNat2017, where we achieve state-of-the-art performance. Qualitative results are presented for better understanding of our model.	Translated: 2021-03-16 14:09:01 Published: 2021-03-14
# Mean Field Game GAN ( http://arxiv.org/abs/2103.07855v1 ) License: see link	Shaojun Ma, Haomin Zhou, Hongyuan Zha	We propose a novel GAN (generative adversarial network) framework based on mean field games (MFGs). To be specific, we utilize the Hopf formula in density space to rewrite MFGs as a primal-dual problem so that we are able to train the model via neural networks and samples. Our model is flexible due to the freedom of choosing various functionals within the Hopf formula. Moreover, our formulation mathematically avoids the Lipschitz-1 constraint. The correctness and efficiency of our method are validated through several experiments.	Translated: 2021-03-16 14:03:09 Published: 2021-03-14
# Offline Reinforcement Learning with Fisher Divergence Critic Regularization ( http://arxiv.org/abs/2103.08050v1 ) License: see link	Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum	Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor-critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic). On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.	Translated: 2021-03-16 14:03:02 Published: 2021-03-14
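The critic parameterization described in the entry above (log behavior policy plus a learned offset, with a gradient penalty on the offset) can be sketched for a toy case with a one-dimensional action and a fixed state. The Gaussian behavior policy, the finite-difference gradient, and all function names are assumptions made for illustration; the paper's networks and exact penalty are not reproduced here.

```python
import math

def log_gaussian(a, mean, std):
    """Log-density of a 1-D Gaussian, standing in for log mu(a | s)."""
    return -0.5 * ((a - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def critic(offset, a, mean, std):
    """Q(s, a) = O(s, a) + log mu(a | s) for a fixed state s."""
    return offset(a) + log_gaussian(a, mean, std)

def gradient_penalty(offset, a, eps=1e-5):
    """Squared derivative of the offset with respect to the action (central difference)."""
    grad = (offset(a + eps) - offset(a - eps)) / (2.0 * eps)
    return grad * grad
```

Because the log-behavior term already penalises unlikely actions, only the offset needs regularising: a flat offset leaves the critic's action preferences identical to the data's.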
# Use of static surrogates in hyperparameter optimization ( http://arxiv.org/abs/2103.07963v1 ) License: see link	Dounia Lakhmiri and Sébastien Le Digabel	Optimizing the hyperparameters and architecture of a neural network is a long yet necessary phase in the development of any new application. This resource-intensive process can benefit from strategies designed to quickly discard low-quality configurations and focus on more promising candidates. This work aims at enhancing HyperNOMAD, a library that adapts a direct search derivative-free optimization algorithm to tune both the architecture and the training of a neural network simultaneously, by targeting two key steps of its execution and exploiting cheap approximations in the form of static surrogates to trigger the early stopping of the evaluation of a configuration and the ranking of pools of candidates. These additions to HyperNOMAD are shown to reduce its resource consumption without harming the quality of the proposed solutions.	Translated: 2021-03-16 13:58:54 Published: 2021-03-14
# Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation ( http://arxiv.org/abs/2103.08022v1 ) License: see link	Naoki Yokoyama, Sehoon Ha, Dhruv Batra	We present Success weighted by Completion Time (SCT), a new metric for evaluating navigation performance for mobile robots. Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent takes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics. In contrast, SCT explicitly takes the agent's dynamics model into consideration, and aims to accurately capture how well the agent has approximated the fastest navigation behavior afforded by its dynamics. While several embodied navigation works use point-turn dynamics, we focus on unicycle-cart dynamics for our agent, which better exemplifies the dynamics model of popular mobile robotics platforms (e.g., LoCoBot, TurtleBot, Fetch). We also present RRT-Unicycle, an algorithm for unicycle dynamics that estimates the fastest collision-free path and completion time from a starting pose to a goal location in an environment containing obstacles. We experiment with deep reinforcement learning and reward shaping to train and compare the navigation performance of agents with different dynamics models. In evaluating these agents, we show that in contrast to SPL, SCT is able to capture the advantages in navigation speed a unicycle model has over a simpler point-turn model of dynamics. Lastly, we show that we can successfully deploy our trained models and algorithms outside of simulation in the real world. We embody our agents in a real robot to navigate an apartment, and show that they can generalize in a zero-shot manner.	Translated: 2021-03-16 13:58:38 Published: 2021-03-14
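The relationship between SPL and SCT in the entry above can be sketched with one helper. SPL is commonly defined as the mean over episodes of S * l / max(p, l), with success flag S, shortest path length l, and agent path length p. Reading SCT as the same ratio with the fastest achievable time and the agent's completion time in place of the path lengths is an assumption based on the abstract, and the function name is illustrative.

```python
def success_weighted(successes, optimal, actual):
    """Mean of S_i * optimal_i / max(actual_i, optimal_i) over episodes.

    SPL: optimal = shortest path lengths,      actual = agent path lengths.
    SCT: optimal = fastest achievable times,   actual = agent completion times
         (assumed form, mirroring SPL).
    """
    total = 0.0
    for s, best, used in zip(successes, optimal, actual):
        total += s * best / max(used, best)
    return total / len(successes)
```

Because the ratio is capped at 1 and failed episodes score 0, the metric lies in [0, 1]; under the SCT reading, an agent that follows the shortest path slowly is penalised, which SPL would not capture.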
# SaNet: Scale-aware neural Network for Parsing Multiple Spatial Resolution Aerial Images ( http://arxiv.org/abs/2103.07935v1 ) License: see link	Libo Wang (School of Remote Sensing and Information Engineering, Wuhan University, China)	Assigning categorical information to the geospatial objects of aerial images at the pixel level is a basic task in urban scene understanding. However, the large differences between remote sensing sensors mean that aerial images are acquired at multiple spatial resolutions (MSR), which brings two issues: increased scale variation of geospatial objects, and loss of informative features as spatial resolution drops. To address the two issues, we propose a novel scale-aware neural network (SaNet) for parsing MSR aerial images. To cope with the imbalanced segmentation quality between larger and smaller objects caused by the scale variation, the SaNet deploys a densely connected feature network (DCFPN) module to capture quality multi-scale context with large receptive fields. To alleviate the informative feature loss, an SFR module is incorporated into the network to learn scale-invariant features with spatial relation enhancement. Extensive experimental results on the ISPRS Vaihingen 2D Dataset and ISPRS Potsdam 2D Dataset demonstrate the outstanding cross-resolution segmentation ability of the proposed SaNet compared to other state-of-the-art networks.	Translated: 2021-03-16 13:55:01 Published: 2021-03-14
# Progressive residual learning for single image dehazing ( http://arxiv.org/abs/2103.07973v1 ) License: see link	Yudong Liang, Bin Wang, Jiaying Liu, Deyu Li, Yuhua Qian and Wenqi Ren	Recent physical model-free dehazing methods have achieved state-of-the-art performance. However, without the guidance of physical models, their performance degrades rapidly when applied to real scenarios because data are unavailable or insufficient. On the other hand, physical model-based methods have better interpretability but suffer from multi-objective optimization of parameters, which may lead to sub-optimal dehazing results. In this paper, a progressive residual learning strategy is proposed to combine the physical model-free dehazing process with reformulated scattering model-based dehazing operations, which enjoys the merits of dehazing methods in both categories. Specifically, the global atmosphere light and transmission maps are interactively optimized with the aid of accurate residual information and preliminary dehazed restorations from the initial physical model-free dehazing process. The proposed method performs favorably against the state-of-the-art methods on public dehazing benchmarks, with better model interpretability and adaptivity for complex hazy data.	Translated: 2021-03-16 13:54:39 Published: 2021-03-14
# COVID-19 Infection Localization and Severity Grading from Chest X-ray Images ( http://arxiv.org/abs/2103.07985v1 ) License: see link	Anas M. Tahir, Muhammad E. H. Chowdhury, Amith Khandakar, Tawsifur Rahman, Yazan Qiblawey, Uzair Khurshid, Serkan Kiranyaz, Nabil Ibtehaz, M Shohel Rahman, Somaya Al-Madeed, Khaled Hameed, Tahir Hamid, Sakib Mahmud, Maymouna Ezeddin	Coronavirus disease 2019 (COVID-19) has been the main agenda of the whole world since it came into sight in December 2019, as it has significantly affected the world economy and healthcare system. Given the effects of COVID-19 on pulmonary tissues, chest radiographic imaging has become a necessity for screening and monitoring the disease. Numerous studies have proposed Deep Learning approaches for the automatic diagnosis of COVID-19. Although these methods achieved astonishing performance in detection, they have used limited chest X-ray (CXR) repositories for evaluation, usually with only a few hundred COVID-19 CXR images. Such data scarcity prevents reliable evaluation and carries a risk of overfitting. In addition, most studies showed no or limited capability in infection localization and severity grading of COVID-19 pneumonia. In this study, we address this urgent need by proposing a systematic and unified approach for lung segmentation and COVID-19 localization with infection quantification from CXR images. To accomplish this, we have constructed the largest benchmark dataset with 33,920 CXR images, including 11,956 COVID-19 samples, where the annotation of ground-truth lung segmentation masks is performed on CXRs by a novel human-machine collaborative approach. An extensive set of experiments was performed using the state-of-the-art segmentation networks U-Net, U-Net++, and Feature Pyramid Networks (FPN). The developed network, after an extensive iterative process, reached superior performance for lung region segmentation, with an Intersection over Union (IoU) of 96.11% and a Dice Similarity Coefficient (DSC) of 97.99%. Furthermore, COVID-19 infections of various shapes and types were reliably localized with 83.05% IoU and 88.21% DSC. Finally, the proposed approach achieved outstanding COVID-19 detection performance, with both sensitivity and specificity values above 99%.	Translated: 2021-03-16 13:54:23 Published: 2021-03-14
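The two segmentation scores quoted in this abstract, Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC), can be computed from a predicted and a ground-truth binary mask as in the minimal sketch below. The function name and the tiny masks are hypothetical and only illustrate the standard definitions; they are not taken from the paper's pipeline.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 2-D binary mask: 1 = foreground, 0 = background


def iou_and_dice(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) for two equally sized binary masks."""
    intersection = 0
    pred_area = 0
    truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p & t
            pred_area += p
            truth_area += t
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty: treat them as a perfect match.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


# Hypothetical 2x3 masks that overlap in two pixels.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

iou, dice = iou_and_dice(predicted, reference)
print(f"IoU={iou:.3f} Dice={dice:.3f}")
```

For a single pair of masks the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; averages over many images need not satisfy this identity exactly.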
# Deep Tiling: Texture Tile Synthesis Using a Deep Learning Approach ( http://arxiv.org/abs/2103.07992v1 ) License: see link	Vasilis Toulatzis, Ioannis Fudos	Texturing is a fundamental process in computer graphics. Texture is leveraged to enhance the visualization outcome for a 3D scene. In many cases a texture image cannot cover a large 3D model surface because of its small resolution. Conventional techniques like repeating, mirror repeating or clamp to edge do not yield visually acceptable results. Deep learning based texture synthesis has proven to be very effective in such cases. All deep texture synthesis methods that try to create larger resolution textures are limited in terms of GPU memory resources. In this paper, we propose a novel approach to example-based texture synthesis that uses a robust deep learning process to create tiles of arbitrary resolution that resemble the structural components of an input texture. In this manner, our method is, first, much less memory limited, because a new texture tile of small size is synthesized and merged with the original texture, and, second, able to easily produce missing parts of a large texture.	Translated: 2021-03-16 13:53:51 Published: 2021-03-14
# Simulation Studies on Deep Reinforcement Learning for Building Control with Human Interaction ( http://arxiv.org/abs/2103.07919v1 ) License: see link	Donghwan Lee, Niao He, Seungjae Lee, Panagiota Karava, Jianghai Hu	The building sector consumes the largest share of energy in the world, and there has been considerable research interest in energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper aims at assessing the potential of RL in building climate control problems with occupant interaction. We apply a recent RL approach, called DDPG (deep deterministic policy gradient), to continuous building control tasks and assess its performance with simulation studies in terms of its ability to handle (a) partial state observability due to sensor limitations; (b) complex stochastic systems with high-dimensional state spaces, which are jointly continuous and discrete; (c) uncertainties due to ambient weather conditions, occupants' behavior, and comfort feelings. In particular, the partial observability and uncertainty due to occupant interaction significantly complicate the control problem. In simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.	Translated: 2021-03-16 13:51:42 Published: 2021-03-14
# Modelling Behavioural Diversity for Learning in Open-Ended Games ( http://arxiv.org/abs/2103.07927v1 ) License: see link	Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Jun Wang	Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the gamescape, the convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.	Translated: 2021-03-16 13:51:24 Published: 2021-03-14
# Membership Inference Attacks on Machine Learning: A Survey ( http://arxiv.org/abs/2103.07853v1 ) License: see link	Hongsheng Hu and Zoran Salcic and Gillian Dobbie and Xuyun Zhang	A membership inference attack aims to identify whether or not a data sample was used to train a machine learning model. It can raise severe privacy risks, as the membership can reveal an individual's sensitive information. For example, identifying an individual's participation in a hospital's health analytics training set reveals that this individual was once a patient in that hospital. Membership inference attacks have been shown to be effective on various machine learning models, such as classification models, generative models, and sequence-to-sequence models. Meanwhile, many methods have been proposed to defend against such privacy attacks. Although membership inference is an emerging and rapidly growing research area, there is no comprehensive survey on this topic yet. In this paper, we bridge this important gap in the membership inference attack literature. We present the first comprehensive survey of membership inference attacks. We summarize and categorize existing membership inference attacks and defenses and explicitly present how to implement attacks in various settings. In addition, we discuss why membership inference attacks work and summarize the benchmark datasets to facilitate comparison and ensure fairness of future work. Finally, we propose several possible directions for future research and possible applications relying on the reviewed works.	Translated: 2021-03-16 13:49:01 Published: 2021-03-14
# BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks ( http://arxiv.org/abs/2103.08031v1 ) License: see link	Manoj Rohit Vemparala, Alexander Frickenstein, Nael Fasfous, Lukas Frickenstein, Qi Zhao, Sabine Kuhn, Daniel Ehrhardt, Yuankai Wu, Christian Unger, Naveen Shankar Nagaraja, Walter Stechele	Deploying convolutional neural networks (CNNs) for embedded applications presents many challenges in balancing resource efficiency and task-related accuracy. These two aspects have been well researched in the field of CNN compression. In real-world applications, a third important aspect comes into play, namely the robustness of the CNN. In this paper, we thoroughly study the robustness of uncompressed, distilled, pruned and binarized neural networks against white-box and black-box adversarial attacks (FGSM, PGD, C&W, DeepFool, LocalSearch and GenAttack). These new insights facilitate defensive training schemes or reactive filtering methods, where the attack is detected and the input is discarded and/or cleaned. Experimental results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks (BNNs) such as XNOR-Net and ABC-Net, trained on the CIFAR-10 and ImageNet datasets. We present evaluation methods to simplify the comparison between CNNs under different attack schemes using loss/accuracy levels, stress-strain graphs, box plots and class activation mapping (CAM). Our analysis reveals susceptible behavior of uncompressed and pruned CNNs against all kinds of attacks. The distilled models exhibit their strength against all white-box attacks with the exception of C&W. Furthermore, binary neural networks exhibit resilient behavior compared to their baselines and other compressed variants.	Translated: 2021-03-16 13:48:43 Published: 2021-03-14
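Of the attacks listed in this abstract, FGSM (the fast gradient sign method) is the simplest to write down: the input is moved by a fixed step `epsilon` in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a toy logistic-regression classifier rather than a CNN, so that the gradient can be written in closed form; the weights, input and step size are hypothetical and unrelated to the paper's experiments.

```python
import math
from typing import List


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy loss for predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sign(value: float) -> int:
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(weights: List[float], bias: float, x: List[float], y: int,
         epsilon: float) -> List[float]:
    """One FGSM step: x_adv = x + epsilon * sign(dLoss/dx)."""
    p = predict(weights, bias, x)
    # For logistic regression with cross-entropy, dLoss/dx_i = (p - y) * w_i.
    gradient = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical model and input.
weights, bias = [2.0, -1.0], 0.0
x_clean, label = [1.0, 0.5], 1

x_adv = fgsm(weights, bias, x_clean, label, epsilon=0.5)
loss_clean = cross_entropy(predict(weights, bias, x_clean), label)
loss_adv = cross_entropy(predict(weights, bias, x_adv), label)
print(f"clean loss={loss_clean:.3f} adversarial loss={loss_adv:.3f}")
```

For a deep network the closed-form gradient would be replaced by automatic differentiation, but the update rule stays the same.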
# Diagrammatic Differentiation for Quantum Machine Learning ( http://arxiv.org/abs/2103.07960v1 ) License: see link	Alexis Toumi, Richie Yeung, Giovanni de Felice	We introduce diagrammatic differentiation for tensor calculus by generalising the dual number construction from rigs to monoidal categories. Applying this to ZX diagrams, we show how to calculate diagrammatically the gradient of a linear map with respect to a phase parameter. For diagrams of parametrised quantum circuits, we get the well-known parameter-shift rule at the basis of many variational quantum algorithms. We then extend our method to the automatic differentiation of hybrid classical-quantum circuits, using diagrams with bubbles to encode arbitrary non-linear operators. Moreover, diagrammatic differentiation comes with an open-source implementation in DisCoPy, the Python library for monoidal categories. Diagrammatic gradients of classical-quantum circuits can then be simplified using the PyZX library and executed on quantum hardware via the tket compiler. This opens the door to many practical applications harnessing both the structure of string diagrams and the computational power of quantum machine learning.	Translated: 2021-03-16 13:45:06 Published: 2021-03-14
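The parameter-shift rule mentioned in this abstract can be checked numerically on the smallest possible case. The sketch below simulates a single qubit prepared in |0⟩, rotated by RX(θ) and measured in the Z basis, so that the expectation value is cos θ, and compares the shifted-evaluation gradient with the analytic derivative −sin θ. It is plain Python written for illustration; it does not use the DisCoPy, PyZX or tket APIs, and the function names are hypothetical.

```python
import math
from typing import Callable


def expectation_z(theta: float) -> float:
    """<Z> after applying RX(theta) to |0>, via the two state amplitudes."""
    amplitude_0 = complex(math.cos(theta / 2), 0.0)
    amplitude_1 = complex(0.0, -math.sin(theta / 2))
    return abs(amplitude_0) ** 2 - abs(amplitude_1) ** 2


def parameter_shift_gradient(f: Callable[[float], float], theta: float) -> float:
    """Gradient from two evaluations shifted by +pi/2 and -pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


theta = 0.3
shifted = parameter_shift_gradient(expectation_z, theta)
analytic = -math.sin(theta)
print(f"parameter-shift={shifted:.6f} analytic={analytic:.6f}")
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative for this kind of gate, which is why the rule can be used on hardware that only returns expectation values.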
# Transient growth of accelerated first-order methods for strongly convex optimization problems ( http://arxiv.org/abs/2103.08017v1 ) License: see link	Hesameddin Mohammadi, Samantha Samuelson, Mihailo R. Jovanović	Optimization algorithms are increasingly being used in applications with limited time budgets. In many real-time and embedded scenarios, only a few iterations can be performed and traditional convergence metrics cannot be used to evaluate performance in these non-asymptotic regimes. In this paper, we examine the transient behavior of accelerated first-order optimization algorithms. For quadratic optimization problems, we employ tools from linear systems theory to show that transient growth arises from the presence of non-normal dynamics. We identify the existence of modes that yield an algebraic growth in early iterations and quantify the transient excursion from the optimal solution caused by these modes. For strongly convex smooth optimization problems, we utilize the theory of integral quadratic constraints to establish an upper bound on the magnitude of the transient response of Nesterov's accelerated method. We show that both the Euclidean distance between the optimization variable and the global minimizer and the rise time to the transient peak are proportional to the square root of the condition number of the problem. Finally, for problems with large condition numbers, we demonstrate tightness of the bounds that we derive up to constant factors.	Translated: 2021-03-16 13:43:09 Published: 2021-03-14
# A Modified Batch Intrinsic Plasticity Method for Pre-training the Random Coefficients of Extreme Learning Machines ( http://arxiv.org/abs/2103.08042v1 ) License: see link	Suchuan Dong, Zongwei Li	In extreme learning machines (ELM) the hidden-layer coefficients are randomly set and fixed, while the output-layer coefficients of the neural network are computed by a least squares method. The randomly assigned coefficients in ELM are known to influence its performance and accuracy significantly. In this paper we present a modified batch intrinsic plasticity (modBIP) method for pre-training the random coefficients in ELM neural networks. The current method is devised based on the same principle as the batch intrinsic plasticity (BIP) method, namely, enhancing the information transmission in every node of the neural network. It differs from BIP in two prominent aspects. First, modBIP does not involve the activation function in its algorithm, and it can be applied with any activation function in the neural network. In contrast, BIP employs the inverse of the activation function in its construction, and requires the activation function to be invertible (or monotonic). The modBIP method can work with the often-used non-monotonic activation functions (e.g. Gaussian, swish, Gaussian error linear unit, and radial-basis type functions), with which BIP breaks down. Second, modBIP generates target samples on random intervals with a minimum size, which leads to highly accurate computation results when combined with ELM. The combined ELM/modBIP method is markedly more accurate than ELM/BIP in numerical simulations. Ample numerical experiments are presented with shallow and deep neural networks for function approximation and for boundary/initial value problems with partial differential equations. They demonstrate that the combined ELM/modBIP method produces highly accurate simulation results, and that its accuracy is insensitive to the random-coefficient initializations in the neural network. This is in sharp contrast with the ELM results without pre-training of the random coefficients.	Translated: 2021-03-16 13:42:52 Published: 2021-03-14
# Deep Graph Matching under Quadratic Constraint ( http://arxiv.org/abs/2103.06643v2 ) License: CC BY 4.0	Quankai Gao, Fudong Wang, Nan Xue, Jin-Gang Yu, Gui-Song Xia	Recently, deep learning based methods have demonstrated promising results on the graph matching problem by relying on the descriptive capability of deep features extracted on graph nodes. However, one main limitation of existing deep graph matching (DGM) methods is that they ignore explicit constraints on graph structure, which may lead the model to be trapped in a local minimum during training. In this paper, we propose to explicitly formulate pairwise graph structures as a quadratic constraint incorporated into the DGM framework. The quadratic constraint minimizes the pairwise structural discrepancy between graphs, which can reduce the ambiguities brought by using only the extracted CNN features. Moreover, we present a differentiable implementation of the quadratic constrained optimization such that it is compatible with unconstrained deep learning optimizers. To give more precise and proper supervision, a well-designed false matching loss against class imbalance is proposed, which can better penalize false negatives and false positives with less overfitting. Exhaustive experiments demonstrate that our method achieves competitive performance on real-world datasets.	Translated: 2021-03-16 13:27:21 Published: 2021-03-14
# Understanding the origin of information-seeking exploration in probabilistic objectives for control ( http://arxiv.org/abs/2103.06859v2 ) License: see link	Beren Millidge, Alexander Tschantz, Anil Seth, Christopher Buckley	The exploration-exploitation trade-off is central to the description of adaptive behaviour in fields ranging from machine learning, to biology, to economics. While many approaches have been taken, one approach to solving this trade-off has been to equip or propose that agents possess an intrinsic 'exploratory drive', which is often implemented in terms of maximizing the agent's information gain about the world, an approach which has been widely studied in machine learning and cognitive science. In this paper we mathematically investigate the nature and meaning of such approaches and demonstrate that this combination of utility maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives we call divergence objectives. We propose a dichotomy in the objective functions underlying adaptive behaviour between evidence objectives, which correspond to well-known reward or utility maximizing objectives in the literature, and divergence objectives, which instead seek to minimize the divergence between the agent's expected and desired futures, and argue that this new class of divergence objectives could form the mathematical foundation for a much richer understanding of the exploratory components of adaptive and intelligent action, beyond simply greedy utility maximization.	Translated: 2021-03-16 11:55:54 Published: 2021-03-14

# Uniform continuity bounds for characteristics of multipartite quantum systems ( http://arxiv.org/abs/2007.00417v2 )

License: see link

M.E.Shirokov

We consider universal methods for obtaining (uniform) continuity bounds for characteristics of multipartite quantum systems. We pay special attention to infinite-dimensional multipartite quantum systems under energy constraints. By these methods we obtain continuity bounds for several important characteristics of a multipartite quantum state: the quantum (conditional) mutual information, the squashed entanglement, the c-squashed entanglement and the conditional entanglement of mutual information. The continuity bounds for the multipartite quantum mutual information are asymptotically tight for large dimension/energy. The obtained results are used to prove the asymptotic continuity of the $n$-partite squashed entanglement, c-squashed entanglement and the conditional entanglement of mutual information under energy constraints.

Translated: 2023-05-11 23:10:30 Published: 2021-03-14

# Experimental implementation of non-Clifford interleaved randomized benchmarking with a controlled-S gate ( http://arxiv.org/abs/2007.08532v2 )

License: see link

Shelly Garion, Naoki Kanazawa, Haggai Landa, David C. McKay, Sarah Sheldon, Andrew W. Cross, Christopher J. Wood

Hardware-efficient transpilation of quantum circuits to a quantum device's native gate set is essential for the execution of quantum algorithms on noisy quantum computers. Typical quantum devices utilize a gate set with a single two-qubit Clifford entangling gate per pair of coupled qubits. However, in some applications access to a non-Clifford two-qubit gate can result in more optimal circuit decompositions and also allows more flexibility in optimizing over noise. We demonstrate calibration of a low-error non-Clifford Controlled-$\frac{\pi}{2}$ phase (CS) gate on a cloud-based IBM Quantum computer using the Qiskit Pulse framework. To measure the gate error of the calibrated CS gate we perform non-Clifford CNOT-Dihedral interleaved randomized benchmarking. We are able to obtain a gate error of $5.9(7) \times 10^{-3}$ at a gate length of 263 ns, which is close to the coherence limit of the associated qubits, and a lower error than the backend's standard calibrated CNOT gate.

Translated: 2023-05-09 07:03:02 Published: 2021-03-14

# Return probability and self-similarity of the Riesz walk ( http://arxiv.org/abs/2010.04518v3 )

License: see link

Ryota Hanaoka, Norio Konno

The quantum walk is a counterpart of the random walk. The 2-state quantum walk in one dimension can be determined by a measure on the unit circle in the complex plane. For singular continuous measures, results on the corresponding quantum walk are limited. In this situation, we focus on a quantum walk, called the Riesz walk, given by the Riesz measure, which is one of the famous singular continuous measures. The present paper is devoted to the return probability of the Riesz walk. Furthermore, we present some conjectures on the self-similarity of the walk.

Translated: 2023-04-29 13:29:55 Published: 2021-03-14

# wバンド超電導インダクタンス量子ビット(キネチコン)の初期設計

Initial Design of a W-band Superconducting Kinetic Inductance Qubit (Kineticon) ( http://arxiv.org/abs/2012.08654v3 )

ライセンス: Link先を確認

Farzad B. Faramarzi, Peter K. Day, Jacob Glasby, Sasha Sypkens, Marco Colangelo, Ralph Chamberlin, Mohammad Mirhosseini, Kevin Schmidt, Karl K. Berggren, Philip Mauskopf

(参考訳) 超伝導量子ビットは量子コンピューティングの研究や産業で広く使われている。本稿では、2つの異なる量子エネルギー状態に必要な非調和性を提供する非線形ナノワイヤセクションでWバンド周波数で動作する超伝導運動インダクタンス量子ビットについて述べる。キュービットを高い周波数で動作させることは、これらの装置の希釈冷凍機温度要件を緩和し、多数のキュービットを多重化する経路を舗装する。ミリ波動作には比較的高い$T_c$の超伝導体が必要であり、これは高いギャップ周波数、$2\Delta/h$であり、光子がクーパー対を破る。例えば、$T_c = 15\,\text{K}$のNbTiNは1.4 THz付近のギャップ周波数を持ち、アルミニウム(90GHz)よりもはるかに高く、ミリ波帯全体の動作を可能にする。ここでは3次元キャビティに埋め込まれたWバンドキネティコン量子ビットの設計とシミュレーションについて述べる。得られた電界分布の古典的電磁計算を行う。

Superconducting qubits are widely used in quantum computing research and industry. We describe a superconducting kinetic inductance qubit (and introduce the term Kineticon to describe it) operating at W-band frequencies with a nonlinear nanowire section that provides the anharmonicity required for two distinct quantum energy states. Operating the qubits at higher frequencies may relax the dilution refrigerator temperature requirements for these devices and pave the way for multiplexing a large number of qubits. Millimeter-wave operation requires superconductors with relatively high $T_c$, which implies a high gap frequency, $2\Delta/h$, beyond which photons break Cooper pairs. For example, NbTiN with $T_c =15\,\text{K}$ has a gap frequency near 1.4 THz, which is much higher than that of aluminum (90 GHz), allowing for operation throughout the millimeter-wave band. Here we describe a design and simulation of a W-band Kineticon qubit embedded in a 3-D cavity. We perform classical electromagnetic calculations of the resulting field distributions.
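The relation between $T_c$ and the gap frequency $2\Delta/h$ can be checked with a short calculation. The sketch below assumes the weak-coupling BCS ratio $\Delta \approx 1.764\,k_B T_c$ and an aluminum $T_c$ of about 1.2 K; neither value is stated in the abstract. Under these assumptions aluminum comes out at roughly 88 GHz, in line with the quoted 90 GHz, while $T_c = 15$ K gives roughly 1.1 THz. The 1.4 THz quoted for NbTiN would then correspond to a larger gap ratio than the weak-coupling value.

```python
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
BCS_RATIO = 1.764     # weak-coupling BCS value of Delta / (k_B * T_c), an assumption


def gap_frequency_hz(t_c_kelvin: float, ratio: float = BCS_RATIO) -> float:
    """Return the gap frequency 2*Delta/h for a superconductor with critical temperature T_c."""
    delta = ratio * K_B * t_c_kelvin  # superconducting gap in joules
    return 2.0 * delta / H


# Aluminum T_c of 1.2 K is an assumed textbook value; 15 K is the NbTiN value from the abstract.
for name, t_c in [("Al", 1.2), ("NbTiN", 15.0)]:
    print(f"{name}: {gap_frequency_hz(t_c) / 1e9:.0f} GHz")
```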

翻訳日:2023-04-20 18:46:46 公開日:2021-03-14

# 行列ベクトル積による量子クエリ複雑性

Quantum query complexity with matrix-vector products ( http://arxiv.org/abs/2102.11349v2 )

ライセンス: Link先を確認

Andrew M. Childs, Shih-Han Hung, Tongyang Li

(参考訳) 入力ベクトル上での動作を返すクエリを用いて,行列の性質を学習する量子アルゴリズムについて検討する。行列のトレース、行列式、ランクの計算や線形系の解法など様々な問題に対して、量子コンピュータは古典計算よりも漸近的な高速化を提供していないことを示す。一方で,行や列のパリティの計算や,同一の行や列が2つあるかどうかの判断といった問題に対して,量子コンピュータは指数関数的なスピードアップを提供する。我々は、行列ベクトル積、ベクトル行列積、ベクトル行列ベクトル積を提供するモデル間の等価性を示すことによって、これを実証する。

We study quantum algorithms that learn properties of a matrix using queries that return its action on an input vector. We show that for various problems, including computing the trace, determinant, or rank of a matrix or solving a linear system that it specifies, quantum computers do not provide an asymptotic speedup over classical computation. On the other hand, we show that for some problems, such as computing the parities of rows or columns or deciding if there are two identical rows or columns, quantum computers provide exponential speedup. We demonstrate this by showing equivalence between models that provide matrix-vector products, vector-matrix products, and vector-matrix-vector products, whereas the power of these models can vary significantly for classical computation.
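To make the query model concrete, here is a minimal classical sketch of a matrix-vector oracle and the straightforward way to compute the trace with one query per standard basis vector. The class and function names are hypothetical, and the sketch shows only the classical access model, not any algorithm from the paper.

```python
class MatVecOracle:
    """Hides a square matrix and exposes it only through matrix-vector products."""

    def __init__(self, matrix):
        self._matrix = [list(row) for row in matrix]
        self.n = len(self._matrix)
        self.queries = 0  # number of matrix-vector products requested so far

    def query(self, x):
        """Return M @ x and count the query."""
        self.queries += 1
        return [sum(a * b for a, b in zip(row, x)) for row in self._matrix]


def trace_via_queries(oracle):
    """Compute tr(M) by querying each standard basis vector e_i and reading entry i."""
    total = 0
    for i in range(oracle.n):
        e_i = [0] * oracle.n
        e_i[i] = 1
        total += oracle.query(e_i)[i]
    return total


oracle = MatVecOracle([[1, 2], [3, 4]])
print(trace_via_queries(oracle), "computed with", oracle.queries, "queries")
```

Counting queries on the oracle object is the design point here: query complexity results compare algorithms by this counter, not by arithmetic cost.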

翻訳日:2023-04-10 05:31:54 公開日:2021-03-14

# SU(1,1)干渉計のKerr非線形による位相感度の向上

Improvement of phase sensitivity in SU(1,1) interferometer via a Kerr nonlinear ( http://arxiv.org/abs/2103.07844v1 )

ライセンス: Link先を確認

Shoukang Chang, Wei Ye, Huan Zhang, Liyun Hu, Jiehui Huang, and Sanqiu Liu

(参考訳) 本稿では,コヒーレント状態入力とホモダイン検出を併用した従来のSU(1,1)干渉計にKerr非線形位相シフトを導入することにより,位相感度を向上させる理論的手法を提案する。位相感度および量子フィッシャー情報に対する光子損失の現実的影響について検討する。その結果,SU(1,1)干渉計の線形位相シフトと比較して,Kerr非線形ケースは位相感度と量子フィッシャー情報を高めるだけでなく,光子損失を著しく抑制できることがわかった。また,同じアクセス可能なパラメータにおいて,内部損失が外部のパラメータよりも位相感度に与える影響も観察した。非線形位相要素の導入により,より高い位相感度とより大きな量子フィッシャー情報を得るための,低コストな入力資源の明らかな利点を示すことが興味深い。

We propose a theoretical scheme to enhance the phase sensitivity by introducing a Kerr nonlinear phase shift into the traditional SU(1,1) interferometer with a coherent state input and homodyne detection. We investigate the realistic effects of photon losses on phase sensitivity and quantum Fisher information. The results show that compared with the linear phase shift in SU(1,1) interferometer, the Kerr nonlinear case can not only enhance the phase sensitivity and quantum Fisher information, but also significantly suppress the photon losses. We also observe that at the same accessible parameters, internal losses have a greater influence on the phase sensitivity than the external ones. It is interesting that, our scheme shows an obvious advantage of low-cost input resources to obtain higher phase sensitivity and larger quantum Fisher information due to the introduction of nonlinear phase element.

翻訳日:2023-04-08 04:35:04 公開日:2021-03-14

# 複数の中心スピン系における量子バッテリパワーの下限と上限

Lower and upper bounds of quantum battery power in multiple central spin systems ( http://arxiv.org/abs/2103.07828v1 )

ライセンス: Link先を確認

Li Peng, Wen-Bin He, Stefano Chesi, Hai-Qing Lin, Xi-Wen Guan

(参考訳) 複数の中心スピンと入浴スピンからなる量子電池システムにおけるエネルギー伝達過程について検討した。ここでは「量子電池」は中心のスピンを指し、風呂は「充電器」として機能する。単一中心スピン電池については、任意の数の浴スピンでエネルギー移動とチャージパワーの時間変化を解析的に導出する。電池内の複数の中心スピンの場合、最大パワー $p_{max}$ と中心スピン $n_b$ の間のスケーリング則の関係を見出す。スケーリング法則関係$P_{max}\propto N_{B}^{\alpha}$を満たすが、スケーリング指数$\alpha$は、下界$\alpha =1/2$から上界$\alpha =3/2$までバススピン数$N$によって変化する。下限と上限はそれぞれ$N\to 1$と$N\gg N_B$に対応する。熱力学的極限において、ホルシュタイン・プリマコフ変換(H-P)を適用することにより、上界が$P_{max}=0.72 B A \sqrt{N} N_{B}^{3/2}$であることが厳密に証明される。ここで$B$と$A $は外部磁場であり、バッテリーと充電器の間の結合定数である。

We study the energy transfer process in quantum battery systems consisting of multiple central spins and bath spins. Here "quantum battery" refers to the central spins, whereas the bath serves as the "charger". For the single central-spin battery, we analytically derive the time evolution of the energy transfer and the charging power with an arbitrary number of bath spins. For the case of multiple central spins in the battery, we find the scaling-law relation between the maximum power $P_{max}$ and the number of central spins $N_B$. It approximately satisfies a scaling-law relation $P_{max}\propto N_{B}^{\alpha}$, where the scaling exponent $\alpha$ varies with the bath spin number $N$ from the lower bound $\alpha =1/2$ to the upper bound $\alpha =3/2$. The lower and upper bounds correspond to the limits $N\to 1$ and $N\gg N_B$, respectively. In the thermodynamic limit, by applying the Holstein-Primakoff (H-P) transformation, we rigorously prove that the upper bound is $P_{max}=0.72 B A \sqrt{N} N_{B}^{3/2}$, which shows the same advantage in scaling as a recent charging protocol based on the Tavis-Cummings model. Here $B$ and $A$ are the external magnetic field and the coupling constant between the battery and the charger, respectively.
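A scaling exponent such as $\alpha$ in $P_{max}\propto N_{B}^{\alpha}$ is typically read off as the slope of a log-log fit. The sketch below shows that procedure on synthetic data generated from the upper-bound formula with $B = A = N = 1$; the data and the function name are illustrative assumptions, not results from the paper.

```python
import math


def fit_scaling_exponent(n_b_values, p_max_values):
    """Least-squares slope of log(P_max) against log(N_B), i.e. the exponent alpha."""
    xs = [math.log(n) for n in n_b_values]
    ys = [math.log(p) for p in p_max_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Synthetic data from P_max = 0.72 * N_B**1.5 (B = A = N = 1 assumed for illustration).
n_b = [1, 2, 4, 8, 16]
p_max = [0.72 * n ** 1.5 for n in n_b]
print(f"fitted alpha = {fit_scaling_exponent(n_b, p_max):.3f}")
```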

翻訳日:2023-04-08 04:34:52 公開日:2021-03-14

# オペレーターの正規順序変換とエンタングルメントの定量化への応用

Operator transpose within normal ordering and its applications for quantifying entanglement ( http://arxiv.org/abs/2103.07821v1 )

ライセンス: Link先を確認

Liyun Hu, Luping Zhang, Xiaoting Chen, Wei Ye, Qin Guo, and Hongyi Fan

(参考訳) 部分転置は絡み合いを定量化する重要な演算であり、ここでは任意の単(二モード)作用素の(部分的な)転置を研究する。 fock-basis展開を用いて、任意の作用素の転置作用素は通常の順序形式で c-数の代わりに a^{{\dag}}(a) を a(a^{{\dag}}) に置き換えることで得られることが判明した。変位演算子とウィグナー演算子の変換について検討し, 密度演算子と遷移密度演算子との間には, ウィグナー関数, 特性関数, 共分散行列などの平均値の関係が形成される。これらの観測はマルチモードの場合にも拡張できる。応用例として, 2モードスクイーズドオペレータの部分的転置とレーザチャネルを介した2モードスクイーズド真空の絡み合いを考える。

Partial transpose is an important operation for quantifying entanglement. Here we study the (partial) transpose of arbitrary single-mode (two-mode) operators. Using the Fock-basis expansion, it is found that the transpose of an arbitrary operator can be obtained by replacing $a^{\dagger}$ ($a$) with $a$ ($a^{\dagger}$), rather than with a c-number, within the normal ordering form. The transposes of the displacement operator and the Wigner operator are studied, from which relations between the density operator and the transposed density operator are constructed for the Wigner function, the characteristic function, and average values such as the covariance matrix. These observations can be further extended to multi-mode cases. As applications, the partial transpose of the two-mode squeezed operator and the entanglement of a two-mode squeezed vacuum through a laser channel are considered.

翻訳日:2023-04-08 04:34:19 公開日:2021-03-14

# 配位子相互作用するリドベルグアンサンブルにおけるコヒーレント非局在状態:内部縮退の役割

Coherently delocalized states in dipole interacting Rydberg ensembles: the role of internal degeneracies ( http://arxiv.org/abs/2103.07990v1 )

ライセンス: Link先を確認

Ghassan Abumwis, Christopher W. Wächtler, Matthew T. Eiles, Alexander Eisfeld

(参考訳) 双極子-双極子相互作用リドバーグ集合体の励起子非局在化に及ぼす縮退原子状態の影響について検討した。凍結ガスと正則 1-, 2-, 3-次元格子配置を例にとると, 縮退しない状況と比較して, 縮退が非局在化を促進することが分かる。磁場によって提供されるゼーマン分裂を用いて、縮退性を持ち上げ、縮退状態と非縮退状態の遷移を詳細に研究する。

We investigate the effect of degenerate atomic states on the exciton delocalization of dipole-dipole interacting Rydberg assemblies. Using a frozen gas and regular one-, two-, and three-dimensional lattice arrangements as examples, we see that degeneracies can enhance the delocalization compared to the situation when there is no degeneracy. Using the Zeeman splitting provided by a magnetic field, we controllably lift the degeneracy to study in detail the transition between degenerate and non-degenerate regimes.

翻訳日:2023-04-08 04:29:09 公開日:2021-03-14

# フラックス可変トランスモン量子ビットに対する表面処理の効果

Effects of surface treatments on flux tunable transmon qubits ( http://arxiv.org/abs/2103.07970v1 )

ライセンス: Link先を確認

M. Mergenthaler, C. Müller, M. Ganzhorn, S. Paredes, P. Müller, G. Salis, V. P. Adiga, M. Brink, M. Sandberg, J. B. Hertzberg, S. Filipp, A. Fuhrer

(参考訳) 最先端のソリッドステート量子プロセッサの主な制限の1つは、局所環境におけるノイズによるクビットデコヒーレンスと緩和である。完全なフォールトトレラント量子コンピューティングに進むためには、基礎となる微視的ノイズ源をよりよく理解する必要がある。表面への吸着、界面の不純物、材料欠陥は固体量子デバイスにおけるノイズと消散の源として同定されている。ここでは,超高真空パッケージを用いて,真空負荷,紫外線露光,イオン照射処理がフラックス調整可能な超伝導トランスモン量子ビットのコヒーレンスおよび遅いパラメータ変動に与える影響を調べた。本研究では, 各表面処理の効果を, 多くのキュービットの平均値と処理前後の測定値を比較して分析する。検討した処理は緩和レート$\Gamma_1$とエコー減圧レート$\Gamma_2^\textrm{e}$に大きく影響しないが、Neイオン照射は$\Gamma_1$を減少させる。対照的に、紫外線およびnh$_3$処理によりチップ表面から磁性吸着物を除去することにより、フラックスノイズパラメータが改善される。さらに,sf$_6$のイオン照射により,スイートスポットにおけるqubitコヒーレンスに影響を与えることなく,その場および後製造中のqubit周波数を調整できることを実証した。

One of the main limitations of state-of-the-art solid-state quantum processors is qubit decoherence and relaxation due to noise in their local environment. For the field to advance towards full fault-tolerant quantum computing, a better understanding of the underlying microscopic noise sources is therefore needed. Adsorbates on surfaces, impurities at interfaces and material defects have been identified as sources of noise and dissipation in solid-state quantum devices. Here, we use an ultra-high vacuum package to study the impact of vacuum loading, UV-light exposure and ion irradiation treatments on coherence and slow parameter fluctuations of flux tunable superconducting transmon qubits. We analyse the effects of each of these surface treatments by comparing averages over many individual qubits and measurements before and after treatment. The treatments studied do not significantly impact the relaxation rate $\Gamma_1$ and the echo dephasing rate $\Gamma_2^\textrm{e}$, except for Ne ion bombardment which reduces $\Gamma_1$. In contrast, flux noise parameters are improved by removing magnetic adsorbates from the chip surfaces with UV-light and NH$_3$ treatments. Additionally, we demonstrate that SF$_6$ ion bombardment can be used to adjust qubit frequencies in-situ and post fabrication without affecting qubit coherence at the sweet spot.

翻訳日:2023-04-08 04:28:29 公開日:2021-03-14

# ダイヤモンド中の自然寿命スピン対のコヒーレンスと絡み合い

Coherence and entanglement of inherently long-lived spin pairs in diamond ( http://arxiv.org/abs/2103.07961v1 )

ライセンス: Link先を確認

H. P. Bartling, M. H. Abobeih, B. Pingault, M. J. Degen, S. J. H. Loenen, C. E. Bradley, J. Randall, M. Markham, D. J. Twitchen, and T. H. Taminiau

(参考訳) 個々の量子システムの一貫性を理解し保護することは、量子科学とテクノロジーにおける中心的な課題である。過去数十年にわたり、コヒーレンスを拡張するための様々な方法が開発されてきた。補完的なアプローチは、本質的にデコヒーレンスから保護される自然に存在するシステムを探すことである。ここでは、固体中の同一核スピンの対が本質的に長寿命の量子系を形成することを示す。ダイヤモンド中の炭素13対を3つ研究し、その近傍の単一のNV中心を用いて量子状態の高忠実度測定を実現する。次に、スピン対は、時計遷移、非コヒーレンスな部分空間、運動的狭小化の変種という3つの現象のユニークな組み合わせにより、外部摂動に対して堅牢であることを明らかにする。結果として生じる不均質な強調時間は$t_2^* = 1.9(3)$ minutesであり、個別に制御された量子ビットでは最長である。最後に、完全な制御を開発し、射影パリティ測定により2つのスピンペア量子ビット間の絡み合い状態を実現する。これらの長寿命量子ビットはダイヤモンドやその他の固体に多く存在し、量子センシング、量子情報処理、量子ネットワークの新たな機会を提供する。

Understanding and protecting the coherence of individual quantum systems is a central challenge in quantum science and technology. Over the last decades, a rich variety of methods to extend coherence have been developed. A complementary approach is to look for naturally occurring systems that are inherently protected against decoherence. Here, we show that pairs of identical nuclear spins in solids form intrinsically long-lived quantum systems. We study three carbon-13 pairs in diamond and realize high-fidelity measurements of their quantum states using a single NV center in their vicinity. We then reveal that the spin pairs are robust to external perturbations due to a unique combination of three phenomena: a clock transition, a decoherence-free subspace, and a variant on motional narrowing. The resulting inhomogeneous dephasing time is $T_2^* = 1.9(3)$ minutes, the longest reported for individually controlled qubits. Finally, we develop complete control and realize an entangled state between two spin-pair qubits through projective parity measurements. These long-lived qubits are abundantly present in diamond and other solids, and provide new opportunities for quantum sensing, quantum information processing, and quantum networks.

翻訳日:2023-04-08 04:28:02 公開日:2021-03-14

# 原子集合におけるデコヒーレンスフリー部分空間の幾何学的操作

Geometric Manipulation of a Decoherence-Free Subspace in Atomic Ensembles ( http://arxiv.org/abs/2103.07907v1 )

ライセンス: Link先を確認

Dongni Chen, Si Luo, Ying-Dan Wang, Stefano Chesi, and Mahn-Soo Choi

(参考訳) 単一モードキャビティに閉じ込められた$\Lambda$型構造を持つ原子のアンサンブルを考察し、系の量子ゼノ部分空間内のゼロエネルギー状態の部分空間上の量子状態のコヒーレントな操作の幾何学的スキームを提案する。特定の部分空間は量子ゼノ部分空間の非コヒーレンスな性質を継承しており、対称性に保護された退化性を持ち、任意のユニタリ操作の普遍的なスキームのすべての条件を満たす。

We consider an ensemble of atoms with $\Lambda$-type level structure trapped in a single-mode cavity, and propose a geometric scheme of coherent manipulation of quantum states on the subspace of zero-energy states within the quantum Zeno subspace of the system. We find that the particular subspace inherits the decoherence-free nature of the quantum Zeno subspace and features a symmetry-protected degeneracy, fulfilling all the conditions for a universal scheme of arbitrary unitary operations on it.

翻訳日:2023-04-08 04:26:27 公開日:2021-03-14

# 超低温原子で実現した異方性ハイゼンベルク模型における横スピンダイナミクス

Transverse spin dynamics in the anisotropic Heisenberg model realized with ultracold atoms ( http://arxiv.org/abs/2103.07866v1 )

ライセンス: Link先を確認

Paul Niklas Jepsen, Wen Wei Ho, Jesse Amato-Grill, Ivana Dimitrova, Eugene Demler, Wolfgang Ketterle

(参考訳) 交換異方性を持つハイゼンベルクモデルでは、横スピン成分は保存されておらず、輸送によっても減衰しうる。ここでは、超低温原子を用いて1次元ハイゼンベルクスピン鎖のダイナミクスをシミュレーションし、異方性によって制御される高速で局所的なスピン崩壊を観測する。さらに, チェーン間の格子深さのばらつきや, チェーンの端面における有効磁場の2倍の減少, 移動孔の存在下での有効磁場のゆらぎによる各チェーン内での劣化などにより, 不均一な崩壊機構を生じる超交換により生じる有効磁場を直接観察する。後者は、穴とマグノンの間の新しい結合機構である。広範な数値シミュレーションによって裏付けられたこれら全てのデファスメント機構は、超低温原子では観測されておらず、基礎となるハバード模型の基本的な性質を示している。

In Heisenberg models with exchange anisotropy, transverse spin components are not conserved and can decay not only by transport, but also by dephasing. Here we utilize ultracold atoms to simulate the dynamics of 1D Heisenberg spin chains, and observe fast, local spin decay controlled by the anisotropy. Additionally, we directly observe an effective magnetic field created by superexchange which causes an inhomogeneous decay mechanism due to variations of lattice depth between chains, as well as dephasing within each chain due to the twofold reduction of the effective magnetic field at the edges of the chains and due to fluctuations of the effective magnetic field in the presence of mobile holes. The latter is a new coupling mechanism between holes and magnons. All these dephasing mechanisms, corroborated by extensive numerical simulations, have not been observed before with ultracold atoms and illustrate basic properties of the underlying Hubbard model.

翻訳日:2023-04-08 04:26:07 公開日:2021-03-14

# 多様体上の連続正規化フロー

Continuous normalizing flows on manifolds ( http://arxiv.org/abs/2104.14959v1 )

ライセンス: Link先を確認

Luca Falorsi

(参考訳) 正規化フローは、複雑なマルチモーダル分布から再パラメータ化可能なサンプルを得るための強力な技術である。残念なことに、現在のアプローチは最も基本的なジオメトリでのみ利用可能であり、基礎となる空間が非自明なトポロジを持つ場合に不足する。微分幾何学と幾何学制御理論の基本的な考え方を用いて、最近導入されたニューラルオドと連続正規化フローを任意の滑らかな多様体に拡張できる方法について述べる。本稿では,これらの空間上のベクトル場をパラメータ化する一般的な手法を提案し,勾配に基づく学習を行う方法を示す。さらに、この一般化された環境での発散に対するスケーラブルな非バイアス推定器を提供する。多様な空間の選択に関する実験では、複雑な分布から再パラメータ可能なサンプルを得るためのフレームワークの能力が実証的に示される。

Normalizing flows are a powerful technique for obtaining reparameterizable samples from complex multimodal distributions. Unfortunately, current approaches are only available for the most basic geometries and fall short when the underlying space has a nontrivial topology, limiting their applicability for most real-world data. Using fundamental ideas from differential geometry and geometric control theory, we describe how the recently introduced Neural ODEs and continuous normalizing flows can be extended to arbitrary smooth manifolds. We propose a general methodology for parameterizing vector fields on these spaces and demonstrate how gradient-based learning can be performed. Additionally, we provide a scalable unbiased estimator for the divergence in this generalized setting. Experiments on a diverse selection of spaces empirically showcase the defined framework's ability to obtain reparameterizable samples from complex distributions.

翻訳日:2023-04-08 04:20:23 公開日:2021-03-14

# 新型コロナウイルス(covid-19)による学内授業からオンライン授業への転換過程 - コソボの高等教育機関を事例として-

The transformation process from in-campus classes into online classes due to the COVID-19 situation -- the case of higher education institutions in Kosovo ( http://arxiv.org/abs/2104.03896v1 )

ライセンス: Link先を確認

Ereza Baftiu and Krenare Pireva Nuci

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは、世界中の伝統的な教育の面で変化をもたらした。コソバの文脈では、大学は授業からオンライン授業への移行を非常に困難にしている。本研究では,コソボの5つの高等教育機関(HEI)の技術的観点から,インキャンプクラスからオンラインクラスへの転換過程について検討した。データは定性的手法で収集され、3c lichtmanアプローチに従って解析された。その結果,各大学は,インフラの限定化やクラウドインフラの追加により,異なるアプローチを採っていることがわかった。

The COVID-19 pandemic has caused changes to traditional teaching globally. In the Kosovo context, universities have found the transition from in-class teaching to online classes quite challenging. This study investigates the transformation process from in-campus classes to online classes from the technical perspective within five Higher Education Institutions (HEI) in Kosovo. The data were collected using qualitative methods, and the analysis followed the 3C Lichtman approach. The results show that each of the universities followed a different approach, either using its limited on-premises infrastructure or using additional cloud infrastructure.

翻訳日:2023-04-08 04:20:10 公開日:2021-03-14

# 古典的電磁ゼロ点放射における水素原子の相対性と放射バランス

Relativity and Radiation Balance for the Classical Hydrogen Atom in Classical Electromagnetic Zero-Point Radiation ( http://arxiv.org/abs/2103.09084v1 )

ライセンス: Link先を確認

Timothy H. Boyer

(参考訳) ここでは、古典的電磁ゼロ点放射における古典的水素原子の理解を概観し、特殊相対性理論の重要性を強調する。初期の計算の試みにおける重要な欠落成分(数値と解析の両方)は、完全な相対論的解析に有効な近似を用いることである。ランダウとリフシッツが与えた非相対論的時間フーリエ展開係数は、クーロンポテンシャルにおける荷電粒子の電磁的記述として誤りであり、この誤差のため、マーシャルとクラヴェリーの放射平衡の失敗に関する結論は無効であると指摘されている。むしろ、マーシャルとクレーヴェリーの計算を用いるが、軌道偏心性(非相対論的軌道が完全相対論的電磁軌道の有効な近似である場合)において、古典的な電磁ゼロ点放射の放射バランスは基本周波数と関連する第1のオーバートンで保つことが示されている。

Here we review the understanding of the classical hydrogen atom in classical electromagnetic zero-point radiation, and emphasize the importance of special relativity. The crucial missing ingredient in earlier calculational attempts (both numerical and analytic) is the use of valid approximations to the full relativistic analysis. It is pointed out that the nonrelativistic time Fourier expansion coefficients given by Landau and Lifshitz are in error as the electromagnetic description of a charged particle in a Coulomb potential, and, because of this error, Marshall and Claverie's conclusion regarding the failure of radiation balance is invalid. Rather, using Marshall and Claverie's calculations, but restricted to lowest nonvanishing order in the orbital eccentricity (where the nonrelativistic orbit is a valid approximation to the fully relativistic electromagnetic orbit) radiation balance for classical electromagnetic zero-point radiation is shown to hold at the fundamental frequencies and associated first overtones.

翻訳日:2023-04-08 04:19:59 公開日:2021-03-14

# 量子相転移における非コヒーレントクエンチダイナミクス

Decoherent Quench Dynamics across Quantum Phase Transitions ( http://arxiv.org/abs/2103.08068v1 )

ライセンス: Link先を確認

Wei-Ting Kuo, Daniel Arovas, Smitha Vishveshwara, Yi-Zhuang You

(参考訳) 本稿では,デコヒーレンスの存在下での量子相転移のクエンチダイナミクスを調べるための定式化について述べる。即時ハミルトニアンの連続量子非破壊測定によって引き起こされるデコヒーレントダイナミクスを定式化する。臨界点を横断する線形時間駆動に対するよく研究された普遍的キブル・ズレーク挙動を一般化する。基底状態上の逆ギャップとして変化する標準相関時間よりもデコヒーレンス時間が短い強いデコヒーレンス構造を特定する。この方法では、システムが平衡から外れ、関連するフリーズアウト長さが$\bar{\xi}\sim\tau^{\nu/({1+2\nu z})}$ となる場合のフリーズアウト時間$\bar{t}\sim\tau^{{2\nu z}/({1+2\nu z})}$ がクエンチレート($1/\tau$)に関してパワーロースケーリングを示す。普遍指数は標準的なKibble-Zurekスケールと異なる。我々は,チャーン絶縁体系における位相遷移の場合に,このスケーリング挙動を明示的に示す。本研究では,ホール導電率の緩和から凍結時間スケールを推定できることを示す。さらに、翻訳不変性を損なう障害の出現について、創発的長スケールが特徴とする不均衡励起密度の領域での焼成結果が普遍的スケーリングを示すことを示す。解析的予測を検証し,システムのホストに普遍的と仮定するスケーリング引数を相関付けるため,数値シミュレーションを行う。

We present a formulation for investigating quench dynamics across quantum phase transitions in the presence of decoherence. We formulate decoherent dynamics induced by continuous quantum non-demolition measurements of the instantaneous Hamiltonian. We generalize the well-studied universal Kibble-Zurek behavior for linear temporal drive across the critical point. We identify a strong decoherence regime wherein the decoherence time is shorter than the standard correlation time, which varies as the inverse gap above the groundstate. In this regime, we find that the freeze-out time $\bar{t}\sim\tau^{{2\nu z}/({1+2\nu z})}$ for when the system falls out of equilibrium and the associated freeze-out length $\bar{\xi}\sim\tau^{\nu/({1+2\nu z})}$ show power-law scaling with respect to the quench rate $1/\tau$, where the exponents depend on the correlation length exponent $\nu$ and the dynamical exponent $z$ associated with the transition. The universal exponents differ from those of standard Kibble-Zurek scaling. We explicitly demonstrate this scaling behavior in the instance of a topological transition in a Chern insulator system. We show that the freeze-out time scale can be probed from the relaxation of the Hall conductivity. Furthermore, on introducing disorder to break translational invariance, we demonstrate how quenching results in regions of imbalanced excitation density characterized by an emergent length scale which also shows universal scaling. We perform numerical simulations to confirm our analytical predictions and corroborate the scaling arguments that we postulate as universal to a host of systems.
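The difference between the strong-decoherence exponents quoted above and ordinary Kibble-Zurek scaling is easy to tabulate. The sketch below evaluates both sets of exponents exactly with rational arithmetic. The standard exponents $\nu z/(1+\nu z)$ and $\nu/(1+\nu z)$ are the textbook Kibble-Zurek forms, included here for comparison as an assumption (the abstract does not state them), and $\nu = z = 1$ is an illustrative choice.

```python
from fractions import Fraction


def decoherent_exponents(nu, z):
    """Exponents of tau for the freeze-out time and length in the strong-decoherence regime."""
    nu, z = Fraction(nu), Fraction(z)
    denom = 1 + 2 * nu * z
    return 2 * nu * z / denom, nu / denom


def standard_kz_exponents(nu, z):
    """Textbook Kibble-Zurek exponents of tau for the freeze-out time and length."""
    nu, z = Fraction(nu), Fraction(z)
    denom = 1 + nu * z
    return nu * z / denom, nu / denom


# Illustrative choice nu = z = 1.
t_dec, xi_dec = decoherent_exponents(1, 1)
t_kz, xi_kz = standard_kz_exponents(1, 1)
print(f"decoherent: t ~ tau^{t_dec}, xi ~ tau^{xi_dec}")
print(f"standard:   t ~ tau^{t_kz}, xi ~ tau^{xi_kz}")
```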

翻訳日:2023-04-08 04:19:38 公開日:2021-03-14

# 超ポテンシャル$W(x,A,B)=A\tanh 3px-B\coth px$ を持つ形状不変ポテンシャルの可解シュロディンガー方程式

Solvable Schrodinger Equations of Shape Invariant Potentials with Superpotential $W(x,A,B)=A\tanh 3px-B\coth px$ ( http://arxiv.org/abs/2103.08066v1 )

ライセンス: Link先を確認

Jamal Benbourenane

(参考訳) 我々は、新しい、正確に解けるSchrödinger方程式を提案する。ポテンシャルパートナーは \[ V=-Bp\operatorname{csch}^{2}(px)-9p(B+p)\operatorname{sech}^{2}(3px)+\left(B\coth(px)-3(B+p)\tanh(3px)\right)^{2} \] で与えられる。超ポテンシャル $W(x,A,B)=A\tanh 3px-B\coth px$ を持つ形状不変性を持つ超対称法を用いて得られる。 $E_{n}^{(-)}=(A-B)^{2}-(A-B-4np)^{2}$ で与えられる固有値を持ち、対応する固有函数は正確に閉形式で決定される。 Schrödinger方程式とSturm-Liouville方程式は一般に閉形式で解くのが難しく、そのいくつかしか知られていない。したがって、厳密な数学的意味では、新しい可解方程式の発見は、解の基盤を理解する上で不可欠である。この結果は核物理学や化学、その他の科学分野にも応用できる可能性がある。

We propose a new, exactly solvable Schrödinger equation. The partner potential is given by \[ V=-Bp\operatorname{csch}^{2}(px)-9p(B+p)\operatorname{sech}^{2}(3px)+\left(B\coth(px)-3(B+p)\tanh(3px)\right)^{2}, \] obtained using the supersymmetric method with the shape invariance property, with superpotential $W(x,A,B)=A\tanh 3px-B\coth px$. We derive the exact solutions of this family of Schrödinger equations in full, with the eigenvalues given by $E_{n}^{(-)}=(A-B)^{2}-(A-B-4np)^{2}$ and the corresponding eigenfunctions determined exactly and in closed form. Schrödinger equations, and Sturm-Liouville equations in general, are challenging to solve in closed form, and only a few solvable cases are known. Therefore, in a strict mathematical sense, discovering new solvable equations is essential for understanding the underpinnings of these elusive solutions. This result has potential applications in nuclear physics, chemistry, and other fields of science.

翻訳日:2023-04-08 04:19:03 公開日:2021-03-14

# 量子ldpc符号の結合探索デコーダに向けて

Toward a Union-Find decoder for quantum LDPC codes ( http://arxiv.org/abs/2103.08049v1 )

ライセンス: Link先を確認

Nicolas Delfosse, Vivien Londe and Michael Beverland

(参考訳) 量子LDPC符号は低オーバーヘッド量子コンピューティングにとって有望な方向である。本稿では,量子LDPC符号のデコーダとしてUnion-Findデコーダの一般化を提案する。このデコーダは、任意の次元 $D \geq 3$ のトーリック符号や双曲符号や量子展開符号などの量子LDPC符号の異なるクラスに対して、ある $A, \alpha > 0$ に対して、$An^{\alpha}$ までの重みで全ての誤差を補正する。この結果を証明するために,その症候群からの誤差の拡散を測定する被覆半径の概念を導入する。この概念はデコード問題を超えて応用できると考えている。また,Union-Findデコーダは,長さ3600の量子LDPC符号の場合,低誤り率条件下での信念伝搬デコーダよりも優れていることを示す数値シミュレーションを行った。

Quantum LDPC codes are a promising direction for low overhead quantum computing. In this paper, we propose a generalization of the Union-Find decoder as a decoder for quantum LDPC codes. We prove that this decoder corrects all errors with weight up to $An^{\alpha}$ for some $A, \alpha > 0$ for different classes of quantum LDPC codes such as toric codes and hyperbolic codes in any dimension $D \geq 3$ and quantum expander codes. To prove this result, we introduce a notion of covering radius which measures the spread of an error from its syndrome. We believe this notion could find application beyond the decoding problem. We also perform numerical simulations, which show that our Union-Find decoder outperforms the belief propagation decoder in the low error rate regime in the case of a quantum LDPC code with length 3600.
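The decoder takes its name from the union-find (disjoint-set) data structure, which tracks which items belong to the same cluster as clusters merge. The sketch below shows only that generic data structure, with path compression and union by size. It is not the decoder itself, and the class and method names are illustrative.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, n: int):
        self.parent = list(range(n))  # each element starts as its own root
        self.size = [1] * n           # cluster size, valid at root indices

    def find(self, x: int) -> int:
        """Return the root of the cluster containing x, compressing the path."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a: int, b: int) -> int:
        """Merge the clusters of a and b and return the root of the merged cluster."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return root_a
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a  # attach the smaller cluster under the larger
        self.size[root_a] += self.size[root_b]
        return root_a


uf = UnionFind(5)
uf.union(0, 1)
uf.union(3, 4)
print(uf.find(0) == uf.find(1), uf.find(1) == uf.find(3))
```

In a decoding setting the elements would stand for vertices of the decoding graph and merges would happen as clusters around syndrome bits grow into each other; that mapping is left out here.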

翻訳日:2023-04-08 04:18:12 公開日:2021-03-14

# 量子エントロピー物理

Quantum-Entropy Physics ( http://arxiv.org/abs/2103.07996v1 )

ライセンス: Link先を確認

Davi Geiger and Zvi M. Kedem

(参考訳) 物理学の法則はすべて可逆である。古典粒子のアンサンブルが確率論的に扱われるときにのみ時間矢印が出現し、エントロピーと熱力学の第二法則が導入される。量子物理学では、固有確率性にもかかわらず時間矢印のメカニズムは提案されていない。結果として、励起状態にある電子が、可逆的なユニタリ進化を続けるのではなく、光子が生成され放出されるにつれて「自発的に」基底状態に遷移する理由を説明できない。このような現象に対処するために、時間矢印の出現を誘発する量子物理学のエントロピーを導入する。エントロピー(entropy)は、量子状態の自由度に対するランダム性の尺度である。これは無次元であり、相対論的スカラーであり、位置と運動量の座標変換の下では不変であり、共役性を維持し、CPT変換の下では不変である。保存法則に従っても量子物理過程が起こらない理由を解明するために、エントロピーの有無に基づいて初期状態のすべての進化の集合を4ブロックに分割する。 (i)増加するが一定ではない (ii)減少するが一定ではない。 (iii)定数 (4)振動する。量子物理学におけるエントロピー(weakly)は時間とともに増加するという法則を提案する。したがって、集合の進化は、 (ii)不許可であり、集合における進化 (iv)は、瞬時に新しい状態に移行することにより、発振期間の終了を阻止する。この量子物理学の法則は、保存法則を超えた物理シナリオを制限し、時間矢印を定義することで因果推論を提供する。

All the laws of physics are time-reversible. Time arrow emerges only when ensembles of classical particles are treated probabilistically, outside of physics laws, and the entropy and the second law of thermodynamics are introduced. In quantum physics, no mechanism for a time arrow has been proposed despite its intrinsic probabilistic nature. In consequence, one cannot explain why an electron in an excited state will "spontaneously" transition into a ground state as a photon is created and emitted, instead of continuing in its reversible unitary evolution. To address such phenomena, we introduce an entropy for quantum physics, which will conduce to the emergence of a time arrow. The entropy is a measure of randomness over the degrees of freedom of a quantum state. It is dimensionless; it is a relativistic scalar, it is invariant under coordinate transformation of position and momentum that maintain conjugate properties and under CPT transformations; and its minimum is positive due to the uncertainty principle. To excogitate why some quantum physical processes cannot take place even though they obey conservation laws, we partition the set of all evolutions of an initial state into four blocks, based on whether the entropy is (i) increasing but not a constant, (ii) decreasing but not a constant, (iii) a constant, (iv) oscillating. We propose a law that in quantum physics entropy (weakly) increases over time. Thus, evolutions in the set (ii) are disallowed, and evolutions in set (iv) are barred from completing an oscillation period by instantaneously transitioning to a new state. This law for quantum physics limits physical scenarios beyond conservation laws, providing causality reasoning by defining a time arrow.

翻訳日:2023-04-08 04:17:16 公開日:2021-03-14

# privacynet:マルチ属性の顔プライバシーのための半敵ネットワーク

PrivacyNet: Semi-Adversarial Networks for Multi-attribute Face Privacy ( http://arxiv.org/abs/2001.00561v3 )

ライセンス: Link先を確認

Vahid Mirjalili, Sebastian Raschka, Arun Ross

(参考訳) 近年,人物の顔画像から年齢,性別,人種などのソフトバイオメトリックな属性を高精度に推定する可能性が確立されている。しかし、特に生体認証のために収集された顔画像が、人の同意なしに属性分析に使用される場合、プライバシーの懸念が高まる。この問題に対処するために,画像摂動法を用いて顔画像にソフトバイオメトリックプライバシを付与する手法を開発した。画像の摂動はganベースの半敵ネットワーク(san)(privacynet)を使用して行われ、入力された顔画像がマッチングのために顔マッチング器で使用できるが、属性分類器では確実に使用できないように修正される。さらに、privacynetでは、入力された顔画像(例えば、年齢と人種)に難読化されなければならない特定の属性を選択でき、他の種類の属性(例えば、性別)を抽出することができる。複数の顔マッチング器、複数の年齢/性別/人種分類器、および複数の顔データセットを用いた大規模な実験は、複数の顔および属性分類器にまたがる多属性プライバシー向上手法の一般化可能性を示す。

Recent research has established the possibility of deducing soft-biometric attributes such as age, gender and race from an individual's face image with high accuracy. However, this raises privacy concerns, especially when face images collected for biometric recognition purposes are used for attribute analysis without the person's consent. To address this problem, we develop a technique for imparting soft biometric privacy to face images via an image perturbation methodology. The image perturbation is undertaken using a GAN-based Semi-Adversarial Network (SAN) - referred to as PrivacyNet - that modifies an input face image such that it can be used by a face matcher for matching purposes but cannot be reliably used by an attribute classifier. Further, PrivacyNet allows a person to choose specific attributes that have to be obfuscated in the input face images (e.g., age and race), while allowing for other types of attributes to be extracted (e.g., gender). Extensive experiments using multiple face matchers, multiple age/gender/race classifiers, and multiple face datasets demonstrate the generalizability of the proposed multi-attribute privacy enhancing method across multiple face and attribute classifiers.

翻訳日:2023-01-16 04:14:10 公開日:2021-03-14

# 連続制御のためのdeep radial-basis値関数

Deep Radial-Basis Value Functions for Continuous Control ( http://arxiv.org/abs/2002.01883v2 )

ライセンス: Link先を確認

Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

(参考訳) 強化学習(RL)の中核となる操作は、学習値関数に対して最適な行動を見つけることである。この操作は、学習値関数が連続的なアクションを入力として取る場合、しばしば難しい。本稿では,放射基底関数(RBF)の出力層を持つディープネットワークを用いて学習した値関数について紹介する。深部RBVFに対する作用値の最大値は、容易に正確に近似できることを示す。さらに、深いRBVFは、普遍関数近似をサポートするため、真の値関数を表現できる。エージェントに深いRBVFを付与することにより、標準的なDQNアルゴリズムを連続制御に拡張する。 RBF-DQNと呼ばれる結果のエージェントは、値関数のみのベースラインを著しく上回り、最先端のアクター批判アルゴリズムと競合することを示す。

A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can represent any true value function owing to their support for universal function approximation. We extend the standard DQN algorithm to continuous control by endowing the agent with a deep RBVF. We show that the resultant agent, called RBF-DQN, significantly outperforms value-function-only baselines, and is competitive with state-of-the-art actor-critic algorithms.
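The claim that the maximum action-value is easy to approximate can be illustrated with a small sketch. It assumes one common normalized RBF form, $Q(s,a) = \sum_i w_i v_i / \sum_j w_j$ with $w_i = \exp(-\beta \lVert a - a_i \rVert)$, where the centroids $a_i$ and values $v_i$ would be network outputs for the state $s$. The paper's exact parameterization may differ, and the fixed numbers below simply stand in for those outputs.

```python
import math


def rbf_q_value(action, centroids, values, beta):
    """Normalized RBF action-value: a weighted average of centroid values."""
    weights = [math.exp(-beta * math.dist(action, c)) for c in centroids]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def approx_max_q(centroids, values, beta):
    """Approximate max over actions by evaluating Q only at the centroid locations."""
    return max(rbf_q_value(c, centroids, values, beta) for c in centroids)


# Hypothetical centroid locations and values for a single state.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 3.0, 2.0]
for beta in (1.0, 50.0):
    print(f"beta={beta}: approx max Q = {approx_max_q(centroids, values, beta):.4f}")
```

Because Q is a weighted average of the values, it can never exceed the largest value, and as beta grows the value at each centroid approaches that centroid's own value, so checking the centroids becomes an increasingly accurate stand-in for the true maximum.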

翻訳日:2023-01-03 20:43:27 公開日:2021-03-14

# アンダーディスプレイカメラの画像復元

Image Restoration for Under-Display Camera ( http://arxiv.org/abs/2003.04857v2 )

ライセンス: Link先を確認

Yuqian Zhou, David Ren, Neil Emerton, Sehoon Lim, Timothy Large

(参考訳) フルスクリーンデバイスの新しいトレンドは、カメラをスクリーンの後ろに置くことを奨励する。ベゼルを取り除き、カメラを画面下の中央に配置すると、画面占有率(display-to-body ratio)が大きくなり、ビデオチャットではアイコンタクトが強化されるが、画像の劣化も引き起こす。本稿では,新しい実世界の単一画像復元問題として,新たに定義されたUnder-Display Camera(UDC)に着目する。まず、4k Transparent OLED(T-OLED)とスマートフォンのPentile OLED(P-OLED)を取り上げ、その光学系を分析して劣化を理解する。第2に、実ペアデータの取得を容易にするモニタカメライメージングシステム(MCIS)と、表示パターンとカメラ計測のみからポイントスプレッド関数(PSF)とUDCデータを生成するモデルベースデータ合成パイプラインを設計する。最後に,デコンボリューションに基づくパイプラインと学習に基づく手法を用いて,複雑な劣化を解消する。我々のモデルはリアルタイムの高品質な復元を実証する。提案手法と結果は,UDCの有望な研究価値と方向性を明らかにする。

The new trend of full-screen devices encourages us to position a camera behind a screen. Removing the bezel and centralizing the camera under the screen brings a larger display-to-body ratio and enhances eye contact in video chat, but also causes image degradation. In this paper, we focus on a newly-defined Under-Display Camera (UDC), as a novel real-world single image restoration problem. First, we take a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED) and analyze their optical systems to understand the degradation. Second, we design a Monitor-Camera Imaging System (MCIS) for easier real pair data acquisition, and a model-based data synthesizing pipeline to generate Point Spread Function (PSF) and UDC data only from display pattern and camera measurements. Finally, we resolve the complicated degradation using a deconvolution-based pipeline and learning-based methods. Our model demonstrates a real-time high-quality restoration. The presented methods and results reveal the promising research values and directions of UDC.

翻訳日:2022-12-24 21:20:22 公開日:2021-03-14

# 物体検出のための動的スケールトレーニング

Dynamic Scale Training for Object Detection ( http://arxiv.org/abs/2004.12432v2 )

ライセンス: Link先を確認

Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, and Jiaya Jia

(参考訳) 本稿では,オブジェクト検出におけるスケール変動問題を軽減するための動的スケールトレーニングパラダイム(DST)を提案する。画像ピラミッドやマルチスケールトレーニング、およびそれらの変種といったこれまでの戦略は、モデル最適化のためのスケール不変データを準備することを目的としていた。しかし, その準備手順は後続の最適化プロセスを考慮しておらず, これがスケール変動の処理能力を制限している。代わりに、我々のパラダイムでは、最適化プロセスからのフィードバック情報を使用して、データ準備を動的にガイドする。提案手法は驚くほど単純であるが,大きな改善(MS COCOデータセットで平均精度2%以上)を達成し,従来の手法を上回る。実験により,提案するDST手法のスケール変動処理に対する有効性を示した。また、さまざまなバックボーン、ベンチマーク、およびインスタンスセグメンテーションのような他の困難なダウンストリームタスクにも一般化できる。推論オーバーヘッドを導入せず、一般的な検出設定のためのフリーランチとして機能する。さらに、高速収束による効率的なトレーニングも容易にする。コードとモデルはgithub.com/yukang2017/Stitcherで入手できる。

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection. Previous strategies like image pyramid, multi-scale training, and their variants aim at preparing scale-invariant data for model optimization. However, the preparation procedure is unaware of the subsequent optimization process, which restricts their capability in handling the scale variation. Instead, in our paradigm, we use feedback information from the optimization process to dynamically guide the data preparation. The proposed method is surprisingly simple yet obtains significant gains (2%+ Average Precision on MS COCO dataset), outperforming previous methods. Experimental results demonstrate the efficacy of our proposed DST method towards scale variation handling. It could also generalize to various backbones, benchmarks, and other challenging downstream tasks like instance segmentation. It does not introduce inference overhead and could serve as a free lunch for general detection configurations. Besides, it also facilitates efficient training due to fast convergence. Code and models are available at github.com/yukang2017/Stitcher.

翻訳日:2022-12-09 13:34:48 公開日:2021-03-14

# Bullseye Polytope: トランスファービリティを改善したスケーラブルなクリーンラベル中毒攻撃

Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability ( http://arxiv.org/abs/2005.00191v3 )

ライセンス: Link先を確認

Hojjat Aghakhani, Dongyu Meng, Yu-Xiang Wang, Christopher Kruegel, and Giovanni Vigna

(参考訳) ニューラルネットワークのセキュリティに対する最近の懸念の源泉は、トレーニングデータセットに正しくラベル付けされた毒サンプルを注入するクリーンラベルデータセット中毒攻撃の出現である。これらの毒サンプルは人間の観察者には正当なものに見えるが、推論中に標的の誤分類を引き起こす悪意のある特徴を含んでいる。そこで我々は,転移学習に対するスケーラブルで転移可能なクリーンラベル中毒攻撃を提案する。この攻撃は、特徴空間において中心がターゲット画像に近くなるように毒画像を生成する。我々の攻撃であるBullseye Polytopeは、エンドツーエンドの転移学習において現在の最先端技術の攻撃成功率を26.75%向上させ、攻撃速度を12倍に向上させた。さらに、毒サンプルを作成する際に同じ物体の複数の画像(例えば、異なる角度からの画像)を含めることで、Bullseye Polytopeをより実用的な攻撃モデルに拡張する。この拡張により、余分な毒サンプルを使わずに、(同じ物体の)未知の画像に対する攻撃の転移性が16%以上向上することを示す。

A recent source of concern for the security of neural networks is the emergence of clean-label dataset poisoning attacks, wherein correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to the human observer, they contain malicious characteristics that trigger a targeted misclassification during inference. We propose a scalable and transferable clean-label poisoning attack against transfer learning, which creates poison images with their center close to the target image in the feature space. Our attack, Bullseye Polytope, improves the attack success rate of the current state-of-the-art by 26.75% in end-to-end transfer learning, while increasing attack speed by a factor of 12. We further extend Bullseye Polytope to a more practical attack model by including multiple images of the same object (e.g., from different angles) when crafting the poison samples. We demonstrate that this extension improves attack transferability by over 16% to unseen images (of the same object) without using extra poison samples.

翻訳日:2022-12-07 23:18:57 公開日:2021-03-14

# human in events: 複雑なイベントにおける人間中心のビデオ分析のための大規模ベンチマーク

Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events ( http://arxiv.org/abs/2005.04490v5 )

ライセンス: Link先を確認

Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Rui Qian, Tao Wang, Ning Xu, Hongkai Xiong, Guo-Jun Qi, Nicu Sebe

(参考訳) 現代のスマートシティの発展とともに、人間中心のビデオ分析は、現実の場面で多様かつ複雑なイベントを分析するという課題に直面している。複雑なイベントは、密集した群衆や、異常な行動、集団行動に関連する。しかしながら、既存のビデオデータセットの規模によって制限されているため、このような複雑なイベントにおける性能を報告している人間分析手法はほとんどない。そこで本研究では,さまざまな現実的なイベント、特に群衆や複雑なイベントにおける人間の動き、ポーズ、行動を理解するために,Human-in-Events、すなわちHiEve(Human-centric video analysis in complex Events)という新たな大規模データセットを提案する。このデータセットは、記録的な数のポーズ(>1M)、複雑なイベントにおける最大数のアクションインスタンス(>56k)、および長時間続く軌跡(平均軌跡長 >480 フレーム)を最大級の数だけ含む。このデータセットに基づいて,行動情報の潜在力を活用してより強力な2次元ポーズ特徴の学習を導く、強化されたポーズ推定ベースラインを提案する。提案手法は,HiEveデータセット上の既存のポーズ推定パイプラインの性能を向上させることができることを示す。さらに,最近の映像分析手法とベースライン手法のベンチマーク実験を行い,HiEveが人間中心のビデオ解析の挑戦的データセットであることを実証した。このデータセットが、人間中心の分析と複雑なイベントの理解における最先端技術の開発を前進させることを期待している。データセットは http://humaninevents.org で利用可能である。

Along with the development of modern smart cities, human-centric video analysis has been encountering the challenge of analyzing diverse and complex events in real scenes. A complex event relates to dense crowds or to anomalous or collective behaviors. However, limited by the scale of existing video datasets, few human analysis approaches have reported their performance on such complex events. To this end, we present a new large-scale dataset, named Human-in-Events or HiEve (Human-centric video analysis in complex Events), for the understanding of human motions, poses, and actions in a variety of realistic events, especially in crowd and complex events. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of trajectories lasting for longer time (with an average trajectory length of >480 frames). Based on this dataset, we present an enhanced pose estimation baseline by utilizing the potential of action information to guide the learning of more powerful 2D pose features. We demonstrate that the proposed method is able to boost the performance of existing pose estimation pipelines on our HiEve dataset. Furthermore, we conduct extensive experiments to benchmark recent video analysis approaches together with our baseline methods, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect that the dataset will advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at http://humaninevents.org

翻訳日:2022-12-05 07:11:55 公開日:2021-03-14

# 5* 射影変換を用いた知識グラフ埋め込み

5* Knowledge Graph Embeddings with Projective Transformations ( http://arxiv.org/abs/2006.04986v2 )

ライセンス: Link先を確認

Mojtaba Nayyeri, Sahar Vahdati, Can Aykul, Jens Lehmann

(参考訳) 知識グラフ埋め込みモデルを用いたリンク予測が知識グラフ補完の一般的なアプローチとなっている。このようなモデルは、リンクの尤もらしさを測定するために、エッジを介してノードをベクトル空間にマッピングする変換関数を用いる。個々のノードをマッピングする際には、サブグラフの構造も変換される。ユークリッド幾何学で設計された埋め込みモデルの多くは、通常、単一の変換タイプ(多くの場合、平行移動または回転)のみをサポートし、これは隣接する部分グラフ間の違いが小さいグラフの学習に適している。しかし、多関係知識グラフは近傍に複数の部分グラフ構造(例えば、パス構造とループ構造の組み合わせ)を含むことが多く、現在の埋め込みモデルではうまく捉えられていない。この問題に対処するために,複数の同時変換、具体的には反転、鏡映、平行移動、回転、相似拡大(homothety)をサポートする、射影幾何学における新しいKGEモデル(5*E)を提案する。このモデルはいくつかの好ましい理論的性質を持ち、既存のアプローチを包含する。また、最も広く使われているリンク予測ベンチマークにおいて既存のアプローチを上回っている。

Performing link prediction using knowledge graph embedding models has become a popular approach for knowledge graph completion. Such models employ a transformation function that maps nodes via edges into a vector space in order to measure the likelihood of the links. While mapping the individual nodes, the structure of subgraphs is also transformed. Most of the embedding models designed in Euclidean geometry usually support a single transformation type - often translation or rotation, which is suitable for learning on graphs with small differences in neighboring subgraphs. However, multi-relational knowledge graphs often include multiple sub-graph structures in a neighborhood (e.g. combinations of path and loop structures), which current embedding models do not capture well. To tackle this problem, we propose a novel KGE model (5*E) in projective geometry, which supports multiple simultaneous transformations - specifically inversion, reflection, translation, rotation, and homothety. The model has several favorable theoretical properties and subsumes the existing approaches. It outperforms them on the most widely used link prediction benchmarks.
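How a single projective map can carry several elementary transformations at once can be seen with a Möbius transformation of the complex plane. This is a sketch of that standard mathematical fact only; whether 5*E parameterizes relations in exactly this way is an assumption here, and the coefficient names `a, b, c, d` are illustrative.

```python
def mobius(z, a, b, c, d):
    """Möbius (projective) map z -> (a z + b) / (c z + d), with a d - b c != 0."""
    return (a * z + b) / (c * z + d)

def mobius_by_steps(z, a, b, c, d):
    """The same map written as a chain of elementary transformations (c != 0)."""
    z = z + d / c                        # translation
    z = 1 / z                            # inversion (combined with a reflection)
    z = ((b * c - a * d) / c**2) * z     # homothety and rotation (complex scaling)
    return z + a / c                     # translation

# Arbitrary illustrative point and coefficients.
z = 0.3 + 0.7j
params = (2 + 1j, -1j, 1 - 0.5j, 3)
direct = mobius(z, *params)
stepwise = mobius_by_steps(z, *params)
```

The two results agree up to floating-point rounding, which is why one set of four complex coefficients is enough to express translation, rotation, homothety, inversion, and reflection together.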

翻訳日:2022-11-24 00:24:53 公開日:2021-03-14

# dance revolution: カリキュラム学習による音楽による長期ダンス生成

Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning ( http://arxiv.org/abs/2006.06119v7 )

ライセンス: Link先を確認

Ruozi Huang, Huang Hu, Wei Wu, Kei Sawada, Mi Zhang and Daxin Jiang

(参考訳) 音楽に合わせて踊ることは、古代から人間の生来の能力の1つである。しかし、機械学習の研究では、音楽からダンスの動きを合成することは難しい問題である。近年,リカレントニューラルネットワーク(RNN)のような自己回帰モデルを用いて,人間の動作系列が合成されている。このようなアプローチは、ニューラルネットワークにフィードバックされる予測誤差の蓄積によって、しばしば短いシーケンスしか生成できない。この問題は長い動作系列の生成においてさらに深刻になる。また、スタイル、リズム、ビートの観点からのダンスと音楽の一貫性は、モデリングの段階ではまだ考慮されていない。本稿では,音楽条件付きダンス生成を系列変換(sequence-to-sequence)学習問題として定式化し,音楽特徴の長いシーケンスを効率的に処理し,音楽とダンスのきめ細かい対応を捉える新しいseq2seqアーキテクチャを考案する。さらに,長い動作系列の生成における自己回帰モデルの誤差蓄積を緩和する新しいカリキュラム学習戦略を提案する。この戦略は、直前の正解(ground-truth)動作を用いる完全誘導型の教師強制(teacher-forcing)方式から,主に生成した動作を用いる誘導の少ない自己回帰方式へと,学習過程を緩やかに変化させる。大規模な実験により、我々のアプローチは、自動評価指標と人手評価において、既存の最先端技術よりも大幅に優れていることが示された。また、提案手法の優れた性能を示すデモビデオを https://www.youtube.com/watch?v=lmE20MEheZ8 で公開している。

Dancing to music has been one of humans' innate abilities since ancient times. In machine learning research, however, synthesizing dance movements from music is a challenging problem. Recently, researchers have synthesized human motion sequences through autoregressive models such as recurrent neural networks (RNNs). Such an approach often generates short sequences due to an accumulation of prediction errors that are fed back into the neural network. This problem becomes even more severe in long motion sequence generation. Besides, the consistency between dance and music in terms of style, rhythm and beat is yet to be taken into account during modeling. In this paper, we formalize the music-conditioned dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance. Furthermore, we propose a novel curriculum learning strategy to alleviate error accumulation of autoregressive models in long motion sequence generation, which gently changes the training process from a fully guided teacher-forcing scheme using the previous ground-truth movements, towards a less guided autoregressive scheme mostly using the generated movements instead. Extensive experiments show that our approach significantly outperforms the existing state-of-the-art methods on automatic metrics and human evaluation. We also make a demo video to demonstrate the superior performance of our proposed approach at https://www.youtube.com/watch?v=lmE20MEheZ8.
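The shift from teacher forcing toward autoregressive training can be sketched as a schedule over epochs. The linear decay, the `floor` value, and the per-step random choice below are assumptions made for illustration; the abstract does not specify the schedule the authors use.

```python
import random

def teacher_forcing_ratio(epoch, total_epochs, floor=0.1):
    """Fraction of decoder steps fed with ground truth; decays linearly to `floor`."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return max(floor, 1.0 - progress)

def pick_decoder_inputs(ground_truth, generated, ratio, rng):
    """Per step, feed the ground-truth move with probability `ratio`, else the generated one."""
    return [gt if rng.random() < ratio else gen
            for gt, gen in zip(ground_truth, generated)]

# Hypothetical motion tokens for one short sequence.
truth = ["t0", "t1", "t2", "t3"]
model_out = ["g0", "g1", "g2", "g3"]
early = pick_decoder_inputs(truth, model_out, teacher_forcing_ratio(0, 100), random.Random(0))
late_ratio = teacher_forcing_ratio(100, 100)
```

Early in training every input is a ground-truth movement, and by the end most inputs are the model's own outputs, so the model is gradually exposed to the errors it will meet at generation time.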

翻訳日:2022-11-22 13:22:35 公開日:2021-03-14

# 差分プライベートな確率的座標降下

Differentially Private Stochastic Coordinate Descent ( http://arxiv.org/abs/2006.07272v4 )

ライセンス: Link先を確認

Georgios Damaskinos, Celestine Mendler-Dünner, Rachid Guerraoui, Nikolaos Papandreou, Thomas Parnell

(参考訳) 本稿では,確率的座標降下アルゴリズムを差分プライベートにするという課題に取り組む。従来の勾配降下アルゴリズムでは、更新が1つのモデルベクトル上で動作し、このベクトルに制御されたノイズを加えるだけで個人に関する重要な情報を隠蔽できるのに対し、確率的座標降下はトレーニング中に補助情報をメモリに保持することに大きく依存する。この補助情報は、さらなるプライバシー漏洩をもたらし、本研究で対処する主要な課題となる。独立なノイズ付加の下では補助情報の整合性が期待値において保たれるという知見に基づき、最初の差分プライベートな確率的座標降下アルゴリズムであるDP-SCDを提案する。提案手法を理論的に解析し,座標更新の分離と並列化がその有用性に不可欠であることを論じる。経験的側面では、一般的な確率的勾配降下による代替手法(DP-SGD)に対して、必要なチューニングを大幅に減らしつつ競合する性能を示す。

In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on keeping auxiliary information in memory during training. This auxiliary information provides an additional privacy leak and poses the major challenge addressed in this work. Driven by the insight that under independent noise addition, the consistency of the auxiliary information holds in expectation, we present DP-SCD, the first differentially private stochastic coordinate descent algorithm. We analyze our new method theoretically and argue that decoupling and parallelizing coordinate updates is essential for its utility. On the empirical side we demonstrate competitive performance against the popular stochastic gradient descent alternative (DP-SGD) while requiring significantly less tuning.
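The role of the auxiliary information can be made concrete with a toy coordinate descent for ridge regression that keeps a running vector `v` tracking `X @ w`. This is a sketch under strong assumptions: the ridge objective, the noise scale `sigma`, and all names are illustrative. The noise is not calibrated and there is no clipping or privacy accounting, so this code gives no differential privacy guarantee and is not the DP-SCD algorithm itself.

```python
import numpy as np

def noisy_scd_ridge(X, y, lam, sigma, epochs, rng):
    """Coordinate descent on (1/2n)||Xw - y||^2 + (lam/2)||w||^2 with noisy updates."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(n)  # auxiliary vector, meant to track X @ w
    for _ in range(epochs):
        for j in rng.permutation(d):
            xj = X[:, j]
            grad_j = xj @ (v - y) / n + lam * w[j]
            delta = -grad_j / (xj @ xj / n + lam)  # exact minimizer along coordinate j
            # Independent zero-mean noise on each copy of the update:
            # v == X @ w then holds in expectation rather than exactly.
            w[j] += delta + sigma * rng.standard_normal()
            v += (delta + sigma * rng.standard_normal()) * xj
    return w, v

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
w0, v0 = noisy_scd_ridge(X, y, lam=0.1, sigma=0.0, epochs=200, rng=rng)
```

With `sigma=0.0` the auxiliary vector matches `X @ w` exactly and the iterates reach the closed-form ridge solution; with `sigma > 0` the two copies drift apart on any single run, which is the consistency issue the abstract refers to.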

翻訳日:2022-11-22 03:42:32 公開日:2021-03-14

# コミュニケーション効率の良い分散学習における誤りフィードバックの代替策

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning ( http://arxiv.org/abs/2006.11077v2 )

ライセンス: Link先を確認

Samuel Horváth and Peter Richtárik

(参考訳) 現代の大規模機械学習アプリケーションは、確率的最適化アルゴリズムを分散コンピューティングシステム上に実装することを必要とする。このようなシステムの重要なボトルネックは、確率勾配のような情報をワーカー間で交換するための通信オーバーヘッドである。この問題を解決するために提案された多くのテクニックの中で、最も成功したものの一つは、エラーフィードバック(EF)による圧縮通信のフレームワークである。EFは、Top-$K$のような不偏でない縮小型(contractive)圧縮器によって引き起こされるエラーに対処できる、唯一の既知の手法であり続けている。本稿では, 縮小型圧縮器を扱うための、理論的にも実用的にもより優れたEFの代替案を提案する。特に,任意の縮小型圧縮器を誘導不偏圧縮器に変換できる構成を提案する。この変換の後は、不偏圧縮器で動作する既存の手法を適用できる。我々のアプローチは、メモリ要求の削減、通信複雑性の保証の改善、仮定の削減など、EFよりも大幅な改善をもたらすことを示す。さらに,ノード上の任意の分布に従う部分的参加を伴うフェデレーション学習に結果を拡張し,そのメリットを実証する。理論的結果を検証する数値実験をいくつか行った。

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across the workers, such as stochastic gradients. Among the many techniques proposed to remedy this issue, one of the most successful is the framework of compressed communication with error feedback (EF). EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$. In this paper, we propose a new and theoretically and practically better alternative to EF for dealing with contractive compressors. In particular, we propose a construction which can transform any contractive compressor into an induced unbiased compressor. Following this transformation, existing methods able to work with unbiased compressors can be applied. We show that our approach leads to vast improvements over EF, including reduced memory requirements, better communication complexity guarantees and fewer assumptions. We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits thereof. We perform several numerical experiments which validate our theoretical findings.
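One construction consistent with this description is to send the biased compression of a vector plus an unbiased compression of what the first compressor left out. The sketch below assumes Top-K as the contractive compressor and a rescaled Rand-K as the unbiased one; the function names and the choice of Rand-K are illustrative assumptions, not details given in the abstract.

```python
import numpy as np

def top_k(x, k):
    """Contractive, biased compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def rand_k(x, k, rng):
    """Unbiased compressor: keep k random entries, rescaled by d / k."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def induced(x, k1, k2, rng):
    """Top-K output plus an unbiased compression of the Top-K residual."""
    c1 = top_k(x, k1)
    return c1 + rand_k(x - c1, k2, rng)

rng = np.random.default_rng(0)
x = np.array([3.0, -1.0, 0.5, 2.0])
mean_estimate = np.mean([induced(x, 1, 1, rng) for _ in range(20000)], axis=0)
```

Since the second term has expectation `x - top_k(x)`, the sum has expectation `x`, and the empirical mean above approaches `x` as the number of samples grows, whereas `top_k(x, 1)` alone always drops three of the four entries.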

翻訳日:2022-11-19 03:57:56 公開日:2021-03-14

# オンザフライでノートを取ることがBERTの事前学習に役立つ

Taking Notes on the Fly Helps BERT Pre-training ( http://arxiv.org/abs/2008.01466v2 )

ライセンス: Link先を確認

Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

(参考訳) 教師なし言語事前学習をより効率的でリソース消費の少ないものにする方法は、NLPにおいて重要な研究方向である。本稿では,より優れたデータ利用を提供することにより,言語事前学習手法の効率化に焦点をあてる。言語データコーパスでは、単語はヘビーテール分布に従うことが知られている。単語のかなりの割合はわずか数回しか現れず、まれな単語の埋め込みは通常最適化が不十分である。このような埋め込みは十分な意味的信号を持たないため、データの利用効率が低下し、モデル全体の事前学習が遅くなる可能性があると我々は主張する。この問題を軽減するため,我々は,事前学習中にまれな単語のノートをその場で取り,それらが次に出現したときにモデルが理解できるようにするTaking Notes on the Fly(TNF)を提案する。具体的には、TNFはノート辞書を保持し、文中にまれな単語が出現したときに、その単語の文脈情報をノートとして保存する。トレーニング中に同じまれな単語が再び出現すると、前もって保存したノート情報を使用して、現在の文の意味を強化できる。これにより、TNFは、文中のまれな単語によって引き起こされる不十分な意味を補うために文をまたぐ情報を用いるため、より良いデータ利用を提供する。BERTとELECTRAの両方にTNFを実装し,その効率性と有効性を確認した。実験の結果、同じ性能に到達する場合、TNFのトレーニング時間はバックボーンの事前学習モデルよりも$60\%$短いことがわかった。同じイテレーション数でトレーニングした場合、TNFは、ダウンストリームタスクの大部分と平均GLUEスコアで、バックボーン手法よりも優れている。ソースコードは補足資料に添付されている。

How to make unsupervised language pre-training more efficient and less resource-intensive is an important research direction in NLP. In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization. It is well-known that in a language data corpus, words follow a heavy-tail distribution. A large proportion of words appear only very few times and the embeddings of rare words are usually poorly optimized. We argue that such embeddings carry inadequate semantic signals, which could make the data utilization inefficient and slow down the pre-training of the entire model. To mitigate this problem, we propose Taking Notes on the Fly (TNF), which takes notes for rare words on the fly during pre-training to help the model understand them when they occur next time. Specifically, TNF maintains a note dictionary and saves a rare word's contextual information in it as notes when the rare word occurs in a sentence. When the same rare word occurs again during training, the note information saved beforehand can be employed to enhance the semantics of the current sentence. By doing so, TNF provides better data utilization since cross-sentence information is employed to cover the inadequate semantics caused by rare words in the sentences. We implement TNF on both BERT and ELECTRA to check its efficiency and effectiveness. Experimental results show that TNF's training time is $60\%$ less than its backbone pre-training models when reaching the same performance. When trained with the same number of iterations, TNF outperforms its backbone methods on most downstream tasks and the average GLUE score. Source code is attached in the supplementary material.
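The note dictionary described above can be sketched as a small key-value store of vectors. The moving-average update, the `decay` and `mix` weights, and the class and method names are assumptions chosen for illustration; the abstract only says that contextual information is saved per rare word and reused when the word recurs.

```python
import numpy as np

class NoteDictionary:
    """Keeps one running 'note' vector per rare word (hypothetical helper)."""

    def __init__(self, decay=0.9, mix=0.5):
        self.decay = decay  # how much of the old note survives an update
        self.mix = mix      # weight of the note when enhancing an embedding
        self.notes = {}

    def update(self, word, context_vector):
        """Save or refresh the note from the word's current sentence context."""
        old = self.notes.get(word)
        if old is None:
            self.notes[word] = context_vector.copy()
        else:
            self.notes[word] = self.decay * old + (1 - self.decay) * context_vector

    def enhance(self, word, embedding):
        """Blend the stored note into the word's input embedding, if one exists."""
        note = self.notes.get(word)
        if note is None:
            return embedding
        return (1 - self.mix) * embedding + self.mix * note

notes = NoteDictionary()
emb = np.array([1.0, 0.0])
before = notes.enhance("rareword", emb)        # no note yet: embedding unchanged
notes.update("rareword", np.array([0.0, 2.0]))
after = notes.enhance("rareword", emb)         # blended with the saved note
```

The second lookup returns a vector that carries context from the earlier sentence, which is the cross-sentence signal the abstract credits for better data utilization.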

翻訳日:2022-11-03 00:14:52 公開日:2021-03-14

# Coupled Oscillatory Recurrent Neural Network (coRNN): 長期間の依存関係を学習するための正確で(勾配が)安定したアーキテクチャ

Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies ( http://arxiv.org/abs/2010.00951v2 )

ライセンス: Link先を確認

T. Konstantin Rusch, Siddhartha Mishra

(参考訳) 脳の機能的な部位などに見られる生体ニューロンの回路は、結合振動子のネットワークとしてモデル化することができる。状態変数(の勾配)を有界に保ちながら豊かな出力を表現できるこれらのシステムの能力に着想を得て、リカレントニューラルネットワークのための新しいアーキテクチャを提案する。提案するRNNは,制御された非線形振動子のネットワークをモデル化する2階常微分方程式系の時間離散化に基づく。我々は隠れ状態の勾配に対する正確な境界を証明し、これによりこのRNNにおける勾配爆発・勾配消失問題が緩和される。実験により、提案したRNNは、様々なベンチマークにおいて最先端技術に匹敵する性能を示し、複雑なシーケンシャルデータを処理するための安定かつ正確なRNNを提供するこのアーキテクチャの可能性を示した。

Circuits of biological neurons, such as in the functional parts of the brain, can be modeled as networks of coupled oscillators. Inspired by the ability of these systems to express a rich set of outputs while keeping (gradients of) state variables bounded, we propose a novel architecture for recurrent neural networks. Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations, modeling networks of controlled nonlinear oscillators. We prove precise bounds on the gradients of the hidden states, leading to the mitigation of the exploding and vanishing gradient problem for this RNN. Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks, demonstrating the potential of this architecture to provide stable and accurate RNNs for processing complex sequential data.
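A recurrent cell built from a discretized second-order ODE can be sketched as follows. The specific system assumed here is a damped, driven oscillator network y'' = tanh(W y + Wz y' + V u + b) - gamma y - eps y', stepped with a simple explicit scheme; both the equation and the scheme are assumptions for illustration, since the abstract does not state them, and all parameter names are hypothetical.

```python
import numpy as np

def cornn_step(y, z, u, W, Wz, V, b, dt, gamma, eps):
    """One time step: z is the velocity y', updated first, then the state y."""
    z_new = z + dt * (np.tanh(W @ y + Wz @ z + V @ u + b) - gamma * y - eps * z)
    y_new = y + dt * z_new
    return y_new, z_new

# Small random instance: 4 hidden units, 2 input features, 100 time steps.
rng = np.random.default_rng(0)
hidden, inputs = 4, 2
W = rng.standard_normal((hidden, hidden))
Wz = rng.standard_normal((hidden, hidden))
V = rng.standard_normal((hidden, inputs))
b = rng.standard_normal(hidden)

y = np.zeros(hidden)
z = np.zeros(hidden)
for _ in range(100):
    u = rng.standard_normal(inputs)
    y, z = cornn_step(y, z, u, W, Wz, V, b, dt=0.05, gamma=1.0, eps=1.0)
```

The forcing term is bounded by the `tanh`, and the `gamma` and `eps` terms act as a restoring force and damping, which is the intuition behind hidden states that neither blow up nor collapse over long sequences.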

翻訳日:2022-10-12 00:14:26 公開日:2021-03-14

# ALFWorld:インタラクティブ学習のためのテキストと身体環境の調整

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ( http://arxiv.org/abs/2010.03768v2 )

ライセンス: Link先を確認

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

(参考訳) 「洗ったリンゴをキッチンの冷蔵庫に入れる」といった単純な要求が与えられると、人間は筋肉ひとつ動かすことなく、アクションシーケンスを想像し、その成功の可能性、典型性、効率を評価することで、純粋に抽象的な言葉で推論できる。問題のキッチンを見れば、そのシーンに合うように抽象的な計画を更新できる。身体性を持つエージェントも同じ能力を必要とするが、既存の研究は抽象的な推論と具体的な実行の両方に必要なインフラをまだ提供していない。この制限に対処するため、エージェントがTextWorld(Côté et al., 2018)で抽象的なテキストベースのポリシーを学習し、その後ALFREDベンチマーク(Shridhar et al., 2020)の目標をリッチなビジュアル環境で実行できるようにするシミュレータALFWorldを導入する。ALFWorldは、TextWorldで学んだ抽象的な知識が、具体的で視覚的に接地されたアクションに直接対応する新しいBUTLERエージェントの作成を可能にする。実験的に示すように、これは視覚的に接地された環境のみでのトレーニングよりも優れたエージェントの一般化を促進する。BUTLERのシンプルなモジュラー設計は問題を分解し、研究者がパイプラインのすべての部分(言語理解、計画、ナビゲーション、視覚シーン理解)を改善するためのモデルに集中できるようにする。

Given a simple request like "Put a washed apple in the kitchen fridge", humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text-based policies in TextWorld (Côté et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. BUTLER's simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, and visual scene understanding).

翻訳日:2022-10-09 10:57:20 公開日:2021-03-14

# 近似量子状態を用いた期待値の推定

Estimating expectation values using approximate quantum states ( http://arxiv.org/abs/2011.04754v3 )

ライセンス: Link先を確認

Marco Paini, Amir Kalev, Dan Padilha, and Brendan Ruck

(参考訳) $N$量子ビット状態の近似的な記述を導入する。この記述は、任意の可観測量の期待値を、適切に定義された可観測量の半ノルムと、系の同一準備の回数$M$の平方根との比によって上から抑えられる精度で推定するのに十分な情報を含んでおり、その精度は$N$に明示的に依存しない。量子状態の準備に加えて、単一量子ビット回転とそれに続く単一量子ビット測定のみを必要とする、状態の近似記述を構築するための操作手順について述べる。この手順に従うと、得られる状態の記述の濃度(cardinality)は$3MN$として増加することを示す。Rigettiの量子プロセッサユニット上で、12、16、25量子ビットを用いてランダムな状態とランダムな可観測量に対して提案手法を検証し、実験誤差にもかかわらず、理論との優れた一致を見出した。

We introduce an approximate description of an $N$-qubit state, which contains sufficient information to estimate the expectation value of any observable to a precision that is upper bounded by the ratio of a suitably-defined seminorm of the observable to the square root of the number of the system's identical preparations $M$, with no explicit dependence on $N$. We describe an operational procedure for constructing the approximate description of the state that requires, besides the quantum state preparation, only single-qubit rotations followed by single-qubit measurements. We show that following this procedure, the cardinality of the resulting description of the state grows as $3MN$. We test the proposed method on Rigetti's quantum processor unit with 12, 16 and 25 qubits for random states and random observables, and find an excellent agreement with the theory, despite experimental errors.

翻訳日:2022-09-28 02:19:18 公開日:2021-03-14

# キャプションを用いた開語彙オブジェクト検出

Open-Vocabulary Object Detection Using Captions ( http://arxiv.org/abs/2011.10678v2 )

ライセンス: Link先を確認

Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang

(参考訳) オブジェクト検出におけるディープニューラルネットワークの精度は極めて高いが、監視要件のためにトレーニングやスケールにコストがかかる。特に、より多くのオブジェクトカテゴリを学ぶには、一般的にそれに比例して多くのバウンディングボックスアノテーションが必要である。弱教師付き学習およびゼロショット学習の技術は、少ない監督でより多くのカテゴリに物体検出器をスケールするために研究されてきたが、教師付きモデルほど成功しておらず、広く採用されてもいない。本稿では,物体検出問題の新たな定式化,すなわちオープンボキャブラリ物体検出を提案する。これは弱教師付きおよびゼロショットのアプローチよりも一般的で、実用的で、効果的である。本稿では,限定されたオブジェクトカテゴリ集合に対するバウンディングボックスアノテーションと,より多様なオブジェクトをはるかに低いコストでカバーする画像・キャプションのペアを用いて,物体検出器を訓練する新しい手法を提案する。提案手法は,学習中にバウンディングボックスアノテーションが提供されないオブジェクトを,ゼロショットアプローチよりもはるかに高い精度で検出・ローカライズできることを示す。一方、バウンディングボックスアノテーションを持つオブジェクトは、教師付き手法とほぼ同じくらい正確に検出でき、これは弱教師付きベースラインよりも大幅に優れている。これにより,スケーラブルな物体検出のための新しい最先端技術を確立した。

Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to supervision requirements. Particularly, learning more object categories typically requires proportionally more bounding box annotations. Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models. In this paper, we put forth a novel formulation of the object detection problem, namely open-vocabulary object detection, which is more general, more practical, and more effective than weakly supervised and zero-shot approaches. We propose a new method to train object detectors using bounding box annotations for a limited set of object categories, as well as image-caption pairs that cover a larger variety of objects at a significantly lower cost. We show that the proposed method can detect and localize objects for which no bounding box annotation is provided during training, at a significantly higher accuracy than zero-shot approaches. Meanwhile, objects with bounding box annotation can be detected almost as accurately as supervised methods, which is significantly better than weakly supervised baselines. Accordingly, we establish a new state of the art for scalable object detection.

翻訳日:2022-09-23 05:05:24 公開日:2021-03-14

# RaP-Net: 屋内ローカライゼーションのためのロバスト特徴抽出のための領域的および点的重み付けネットワーク

RaP-Net: A Region-wise and Point-wise Weighting Network to Extract Robust Features for Indoor Localization ( http://arxiv.org/abs/2012.00234v2 )

ライセンス: Link先を確認

Dongjiang Li, Jinyu Miao, Xuesong Shi, Yuxin Tian, Qiwei Long, Tianyu Cai, Ping Guo, Hongfei Yu, Wei Yang, Haosong Yue, Qi Wei, Fei Qiao

(参考訳) 特徴抽出は視覚的ローカライゼーションにおいて重要な役割を果たす。動的オブジェクトや反復領域上の信頼性の低い特徴は、ロバストな特徴マッチングを妨げ、屋内でのローカライゼーションを非常に困難にする。このような問題を克服するために,領域単位の不変性と点単位の信頼性を同時に予測し,その両方を考慮して特徴を抽出する新しいネットワークであるRaP-Netを提案する。また、提案するネットワークをトレーニングするために、OpenLORIS-Locationという新しいデータセットも導入する。データセットには93箇所の屋内ロケーションから得た1553枚の屋内画像が含まれている。同じ場所の画像間の様々な外観変化が含まれており、典型的な屋内シーンにおける不変性を学ぶのに役立つ。実験の結果,OpenLORIS-Locationデータセットでトレーニングした提案RaP-Netは,特徴マッチングタスクにおいて優れた性能を達成し,屋内ローカライゼーションにおいて最先端の特徴アルゴリズムを著しく上回ることが示された。RaP-Netのコードとデータセットは https://github.com/ivipsourcecode/RaP-Net で公開されている。

Feature extraction plays an important role in visual localization. Unreliable features on dynamic objects or repetitive regions disturb robust feature matching and thus make indoor localization considerably harder. To overcome this issue, we propose a novel network, RaP-Net, to simultaneously predict region-wise invariability and point-wise reliability, and then extract features by considering both of them. We also introduce a new dataset, named OpenLORIS-Location, to train the proposed network. The dataset contains 1553 indoor images from 93 indoor locations. Various appearance changes between images of the same location are included and they can help to learn the invariability in typical indoor scenes. Experimental results show that the proposed RaP-Net trained with the OpenLORIS-Location dataset achieves an excellent performance in the feature matching task and significantly outperforms state-of-the-art feature algorithms in indoor localization. The RaP-Net code and dataset are available at https://github.com/ivipsourcecode/RaP-Net.

翻訳日:2021-05-30 19:36:50 公開日:2021-03-14

# (参考訳) テンソルブロックモデルにおける厳密なクラスタリング:統計的最適性と計算限界

Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit ( http://arxiv.org/abs/2012.09996v2 )

ライセンス: CC BY 4.0

Rungang Han, Yuetian Luo, Miaoyan Wang, and Anru R. Zhang

(参考訳) 高次クラスタリングは、神経画像、ゲノミクス、およびソーシャルネットワーク研究で一般的に発生するマルチウェイデータセットにおける不均一な部分構造を特定することを目的としている。この問題の非凸性と不連続性は、統計と計算の両方において大きな課題を生じさせる。本稿では,テンソルブロックモデルと、テンソルブロックモデルにおける高次クラスタリングのための計算効率のよい手法である high-order Lloyd algorithm (HLloyd) および high-order spectral clustering (HSC) を提案する。提案手法の収束が確立され,提案手法が妥当な仮定のもとで厳密なクラスタリングを実現することを示す。また、3つの異なる信号対雑音比の領域に基づいて、高次クラスタリングにおける統計と計算のトレードオフの完全な特徴付けを与える。最後に,合成データと実データの両方について広範な実験を行い,提案手法のメリットを示す。

High-order clustering aims to identify heterogeneous substructure in multiway datasets that arise commonly in neuroimaging, genomics, and social network studies. The non-convex and discontinuous nature of the problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and computationally efficient methods, the high-order Lloyd algorithm (HLloyd) and high-order spectral clustering (HSC), for high-order clustering in the tensor block model. The convergence of the proposed procedure is established, and we show that our method achieves exact clustering under reasonable assumptions. We also give the complete characterization for the statistical-computational trade-off in high-order clustering based on three different signal-to-noise ratio regimes. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.

翻訳日:2021-05-02 04:34:56 公開日:2021-03-14

# 混合整数線形最適化による順序付き対実説明

Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization ( http://arxiv.org/abs/2012.11782v2 )

ライセンス: Link先を確認

Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike, Kento Uemura, Hiroki Arimura

(参考訳) 機械学習モデルのポストホックな説明法は意思決定を支援するために広く用いられている。一般的な方法の1つは、Actionable Recourseとも呼ばれるCounterfactual Explanation (CE) であり、予測結果を変える特徴の摂動ベクトルをユーザに提供する。摂動ベクトルが与えられると、ユーザはそれを望ましい決定結果を得るための「アクション」として解釈することができる。しかし実際には、摂動ベクトルのみを示すことは、ユーザがアクションを実行するには不十分であることが多い。その理由は、因果関係のような非対称な相互作用が特徴間にある場合、アクションの総コストは特徴を変更する順序に依存すると考えられるためである。したがって、実用的なCE法は摂動ベクトルに加えて、特徴を変更する適切な順序を提供する必要がある。そこで本研究では,Ordered Counterfactual Explanation (OrdCE) と呼ばれる新しいフレームワークを提案する。本稿では,特徴間の相互作用に基づいてアクションと順序の対を評価する新しい目的関数を導入する。最適な対を抽出するために,この目的関数を用いた混合整数線形最適化手法を提案する。実データセットの数値実験により,順序なしのCE法と比較したOrdCEの有効性を示した。

Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One of the popular methods is Counterfactual Explanation (CE), also known as Actionable Recourse, which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining one's desired decision result. In practice, however, showing only a perturbation vector is often insufficient for users to execute the action. The reason is that if there is an asymmetric interaction among features, such as causality, the total cost of the action is expected to depend on the order of changing features. Therefore, practical CE methods are required to provide an appropriate order of changing features in addition to a perturbation vector. For this purpose, we propose a new framework called Ordered Counterfactual Explanation (OrdCE). We introduce a new objective function that evaluates a pair of an action and an order based on feature interaction. To extract an optimal pair, we propose a mixed-integer linear optimization approach with our objective function. Numerical experiments on real datasets demonstrated the effectiveness of our OrdCE in comparison with unordered CE methods.

翻訳日:2021-04-26 07:42:16 公開日:2021-03-14

# (参考訳) DeepStyle:短いテキストのオーサシップ属性のためのユーザスタイルの埋め込み

DeepStyle: User Style Embedding for Authorship Attribution of Short Texts ( http://arxiv.org/abs/2103.11798v1 )

ライセンス: CC BY 4.0

Zhiqiang Hu, Roy Ka-Wei Lee, Lei Wang, Ee-Peng Lim and Bo Dai

(参考訳) 著者帰属(英: Authorship Attribution、AA)は、あるテキストの著者を見つけるタスクであり、多くのアプリケーションにおいて重要かつ広く研究されている研究トピックである。近年の研究では、深層学習がAAタスクの精度を大幅に向上させることが示されている。それにもかかわらず、提案された手法のほとんどは、単一のタイプの特徴(例えば、単語バイグラム)を使用してユーザー投稿を表現し、タスクに対処するためにテキスト分類アプローチを採用する。さらに、これらの手法はAA結果について非常に限定的な説明性しか提供しない。本稿では,ユーザの顕著な文体の表現を学習する新しい埋め込みベースのフレームワークであるDeepStyleを提案することで,これらの制限に対処する。TwitterとWeiboの2つの実世界のデータセットについて広範な実験を行った。実験の結果,DeepStyleはAAタスクにおける最先端のベースラインよりも優れていた。

Authorship attribution (AA), which is the task of finding the owner of a given text, is an important and widely studied research topic with many applications. Recent works have shown that deep learning methods could achieve significant accuracy improvement for the AA task. Nevertheless, most of these proposed methods represent user posts using a single type of feature (e.g., word bi-grams) and adopt a text classification approach to address the task. Furthermore, these methods offer very limited explainability of the AA results. In this paper, we address these limitations by proposing DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles. We conduct extensive experiments on two real-world datasets from Twitter and Weibo. Our experiment results show that DeepStyle outperforms the state-of-the-art baselines on the AA task.

翻訳日:2021-04-05 03:10:45 公開日:2021-03-14

# (参考訳) DeepHate: 多面的テキスト表現によるヘイトスピーチ検出

DeepHate: Hate Speech Detection via Multi-Faceted Text Representations ( http://arxiv.org/abs/2103.11799v1 )

ライセンス: CC BY 4.0

Rui Cao, Roy Ka-Wei Lee and Tuan-Anh Hoang

(参考訳) オンラインヘイトスピーチは、オンライン社会の結束性を損なう重要な問題であり、私たちの社会における公衆の安全を懸念することさえある。この問題に触発された研究者たちは、オンラインソーシャルプラットフォームにおけるヘイトスピーチを自動的に検出する、多くの伝統的な機械学習とディープラーニング手法を開発した。しかし、これらの手法のほとんどは、単語の頻度や単語の埋め込みなど、単一の型テキストの特徴しか考慮していない。このようなアプローチは、ヘイトスピーチ検出を改善するために使用できる他の豊富なテキスト情報を無視している。本稿では,オンラインソーシャルプラットフォームにおけるヘイトスピーチを検出するために,単語の埋め込み,感情,話題情報などの多面的テキスト表現を組み合わせた新しいディープラーニングモデルDeepHateを提案する。大規模な実験を行い、3つの公開現実データセット上でDeepHateを評価する。実験の結果,DeepHateはヘイトスピーチ検出タスクにおける最先端のベースラインよりも優れていた。また、オンラインソーシャルプラットフォームでヘイトスピーチを検出するのに最適なサルエント機能に関する洞察を提供するために、ケーススタディを実施します。

Online hate speech is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have developed many traditional machine learning and deep learning methods to detect hate speech in online social platforms automatically. However, most of these methods have only considered single type textual feature, e.g., term frequency, or using word embeddings. Such approaches neglect the other rich textual information that could be utilized to improve hate speech detection. In this paper, we propose DeepHate, a novel deep learning model that combines multi-faceted text representations such as word embeddings, sentiments, and topical information, to detect hate speech in online social platforms. We conduct extensive experiments and evaluate DeepHate on three large publicly available real-world datasets. Our experiment results show that DeepHate outperforms the state-of-the-art baselines on the hate speech detection task. We also perform case studies to provide insights into the salient features that best aid in detecting hate speech in online social platforms.

翻訳日:2021-04-05 03:03:27 公開日:2021-03-14

# (参考訳) AngryBERT:ヘイトスピーチ検出のための共同学習目標と感情

AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection ( http://arxiv.org/abs/2103.11800v1 )

ライセンス: CC BY 4.0

Md Rabiul Awal, Rui Cao, Roy Ka-Wei Lee, Sandra Mitrovic

(参考訳) ソーシャルメディアにおけるヘイトスピーチの自動検出は、最近データマイニングと自然言語処理コミュニティで大きな注目を集めている課題である。しかし、既存の手法の多くは、不均衡でしばしばヘイトフルコンテンツのトレーニングサンプルを欠く、注釈付きヘイトスピーチデータセットに大きく依存する教師付きアプローチを採用している。本稿では,新たなマルチタスク学習ベースモデルであるangrybertを提案し,感情分類とターゲット識別を併用したヘイトスピーチ検出を副次的なタスクとして提案する。 3つの一般的なヘイトスピーチ検出データセットを補完する大規模な実験を行った。実験の結果、AngryBERTは最先端のシングルタスク学習とマルチタスク学習のベースラインを上回っていることがわかった。我々は,AngryBERTモデルの強みと特徴を実証的に検証するためにアブレーション研究とケーススタディを行い,その二次課題がヘイトスピーチの検出を改善することを示す。

Automated hate speech detection in social media is a challenging task that has recently gained significant traction in the data mining and Natural Language Processing community. However, most of the existing methods adopt a supervised approach that depended heavily on the annotated hate speech datasets, which are imbalanced and often lack training samples for hateful content. This paper addresses the research gaps by proposing a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks. We conduct extensive experiments to augment three commonly-used hate speech detection datasets. Our experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines. We conduct ablation studies and case studies to empirically examine the strengths and characteristics of our AngryBERT model and show that the secondary tasks are able to improve hate speech detection.

翻訳日:2021-04-05 02:48:24 公開日:2021-03-14

# 静的から動的予測へ:複数の環境要因に基づく山火事リスク評価

From Static to Dynamic Prediction: Wildfire Risk Assessment Based on Multiple Environmental Factors ( http://arxiv.org/abs/2103.10901v1 )

ライセンス: Link先を確認

Tanqiu Jiang, Sidhant K. Bendre, Hanjia Lyu, Jiebo Luo

(参考訳) 山火事はアメリカ合衆国西海岸で頻繁に起こる最大の災害の1つである。近年、山火事の強度と頻度の増加の原因を理解するために多くの努力がなされている。本研究では,人口密度,正規化差植生指数(NDVI),パーマー干ばつ重大度指数(PDSI),樹木枯死面積,樹木枯死本数,標高など多岐にわたる環境データを用いて,カリフォルニア州の山火事リスクの高い地域を解析・評価するための静的・動的予測モデルを提案する。さらに,様々な要因の影響をよりよく理解し,予防的行動に役立てることにも焦点を当てる。我々のモデルと結果を検証するために、カリフォルニアの土地を緯度と経度で0.1°$\times$0.1°の4,242のグリッドに分割し、空間的および時間的条件に基づいて各グリッドのリスクを計算する。反実仮想分析を行うことで、高リスクの山火事の件数を減らすためのいくつかの方法がもたらす効果を明らかにする。本研究は、これらの環境データが利用可能であれば、様々な地域において山火事のリスクを推定、監視、軽減できる可能性を秘めている。

Wildfire is one of the biggest disasters that frequently occurs on the west coast of the United States. Many efforts have been made to understand the causes of the increases in wildfire intensity and frequency in recent years. In this work, we propose static and dynamic prediction models to analyze and assess the areas with high wildfire risks in California by utilizing a multitude of environmental data including population density, Normalized Difference Vegetation Index (NDVI), Palmer Drought Severity Index (PDSI), tree mortality area, tree mortality number, and altitude. Moreover, we focus on a better understanding of the impacts of different factors so as to inform preventive actions. To validate our models and findings, we divide the land of California into 4,242 grids of 0.1 degrees $\times$ 0.1 degrees in latitude and longitude, and compute the risk of each grid based on spatial and temporal conditions. By performing counterfactual analysis, we uncover the effects of several possible methods on reducing the number of high risk wildfires. Taken together, our study has the potential to estimate, monitor, and reduce the risks of wildfires across diverse areas provided that such environment data is available.

翻訳日:2021-04-05 01:05:44 公開日:2021-03-14

# Fruit Flyは言葉の埋め込みを学べる?

Can a Fruit Fly Learn Word Embeddings? ( http://arxiv.org/abs/2101.06887v2 )

ライセンス: Link先を確認

Yuchen Liang, Chaitanya K. Ryali, Benjamin Hoover, Leopold Grinberg, Saket Navlakha, Mohammed J. Zaki, Dmitry Krotov

(参考訳) ショウジョウバエ脳のキノコ体は神経科学において最も研究されているシステムの一つである。核となるのはケニオン細胞の集団であり、複数の感覚様相から入力を受ける。これらの細胞は前対の側方ニューロンによって抑制され、入力のスパースな高次元表現となる。本研究では,このネットワークモチーフの数学的形式化について検討し,自然言語処理(NLP)タスクである非構造化テキストのコーパスにおいて,単語とその文脈間の相関構造を学習する。このネットワークは単語の意味表現を学習でき、静的および文脈依存の単語埋め込みを生成することができる。単語埋め込みに高密度表現を用いる従来の方法(BERT, GloVeなど)とは異なり、我々のアルゴリズムは単語の意味と文脈をスパースバイナリハッシュコードの形で符号化する。学習した表現の質は、単語類似性分析、単語センスの曖昧さ、文書分類に基づいて評価される。また,fruit fly networkモチーフはnlpの既存の手法に匹敵する性能を実現するだけでなく,計算資源のほんの一部(短いトレーニング時間と少ないメモリフットプリント)しか使用できないことを示した。

The mushroom body of the fruit fly brain is one of the best studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse high dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint).
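The sparse binary hash codes described above can be pictured with a small winner-take-all sketch. This is only an illustration under stated assumptions, not the authors' trained network: the projection weights here are random rather than learned, and the names `fly_hash`, `weights`, and `k` are hypothetical. Each unit stands in for a Kenyon cell, and keeping only the `k` most active units mimics the inhibition attributed to the anterior paired lateral neuron.

```python
import random


def fly_hash(x, weights, k):
    """Return a sparse binary code with 1s at the k most activated units."""
    # Project the input into a higher-dimensional space (one row per unit).
    activations = [sum(w * v for w, v in zip(row, x)) for row in weights]
    # Winner-take-all: keep only the indices of the k largest activations.
    winners = sorted(range(len(activations)),
                     key=lambda i: activations[i],
                     reverse=True)[:k]
    code = [0] * len(activations)
    for i in winners:
        code[i] = 1
    return code


# Toy example with random (not learned) weights; all sizes are assumptions.
rng = random.Random(0)
n_inputs, n_units, k = 8, 32, 4
weights = [[rng.random() for _ in range(n_inputs)] for _ in range(n_units)]
x = [rng.random() for _ in range(n_inputs)]
code = fly_hash(x, weights, k)
```

Because only `k` of the `n_units` bits are set, such codes can be stored and compared cheaply, which is consistent with the small memory footprint the abstract reports.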

翻訳日:2021-03-27 06:07:47 公開日:2021-03-14

# 大規模な調査では、深層学習モデルは欠落データ計算に優れているか? 経験的比較からの証拠

Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison ( http://arxiv.org/abs/2103.09316v1 )

ライセンス: Link先を確認

Zhenhua Wang, Olanrewaju Akande, Jason Poulos and Fan Li

(参考訳) 多重代入(Multiple imputation、MI)は、標本調査における無回答に起因する欠測データを扱うための最先端の手法である。連鎖方程式による多重代入(MICE)は最も広く使われているMI法であるが、理論的な基礎を欠いており、計算負荷が高い。近年, 深層学習モデルに基づくMI手法が開発され, 小規模な研究では有望な結果が得られている。しかし,特に大規模調査において,現実的な設定での性能をMICEと比較して体系的に評価した研究は限られている。本稿では,実際の調査データに基づくシミュレーションと複数の性能指標を用いてMI手法を比較するための一般的なフレームワークを示す。本研究では,American Community Surveyのデータに基づく広範なシミュレーションを行い,分類木を用いたMICE,ランダムフォレストを用いたMICE,generative adversarial imputation network,デノイジングオートエンコーダを用いた多重代入という4つの機械学習ベースのMI手法の反復サンプリング特性を比較した。計算時間の観点では深層学習に基づくMI手法がMICEより優れているが,バイアス,平均二乗誤差,被覆率の観点では,さまざまな現実的な設定において,分類木を用いたMICEが深層学習のMI手法を一貫して上回る。

Multiple imputation (MI) is the state-of-the-art approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is the most widely used MI method, but it lacks theoretical foundation and is computationally intensive. Recently, MI methods based on deep learning models have been developed with encouraging results in small studies. However, there has been limited research on systematically evaluating their performance in realistic settings comparing to MICE, particularly in large-scale surveys. This paper provides a general framework for using simulations based on real survey data and several performance metrics to compare MI methods. We conduct extensive simulation studies based on the American Community Survey data to compare repeated sampling properties of four machine learning based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation network, and multiple imputation using denoising autoencoders. We find the deep learning based MI methods dominate MICE in terms of computational time; however, MICE with classification trees consistently outperforms the deep learning MI methods in terms of bias, mean squared error, and coverage under a range of realistic settings.
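The comparison above rests on three repeated-sampling metrics: bias, mean squared error, and coverage. A minimal sketch of how they might be computed from simulation output follows. The function name `summarize`, the toy numbers, and the representation of each interval as a `(low, high)` pair are illustrative assumptions and are not taken from the paper.

```python
def summarize(estimates, intervals, truth):
    """Summarize repeated-sampling performance of an estimator.

    estimates: one point estimate per simulated sample
    intervals: one (low, high) confidence interval per simulated sample
    truth:     the known population value used in the simulation
    """
    n = len(estimates)
    # Bias: average estimate minus the true value.
    bias = sum(estimates) / n - truth
    # Mean squared error around the true value.
    mse = sum((e - truth) ** 2 for e in estimates) / n
    # Coverage: share of intervals that contain the true value.
    coverage = sum(low <= truth <= high for low, high in intervals) / n
    return bias, mse, coverage


# Toy numbers, purely illustrative.
estimates = [0.48, 0.52, 0.55, 0.45]
intervals = [(0.40, 0.56), (0.44, 0.60), (0.51, 0.59), (0.37, 0.53)]
bias, mse, coverage = summarize(estimates, intervals, truth=0.50)
```

In a real study each entry would come from imputing one simulated sample and then analysing the completed data, so the three numbers describe the imputation method rather than a single dataset.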

翻訳日:2021-03-18 13:03:29 公開日:2021-03-14

# (参考訳) 入射モデルに対するベイズ実験設計のためのハイブリッド勾配法

A Hybrid Gradient Method to Designing Bayesian Experiments for Implicit Models ( http://arxiv.org/abs/2103.08594v1 )

ライセンス: CC BY 4.0

Jiaxin Zhang, Sirui Bi, Guannan Zhang

(参考訳) ベイズ実験設計(BED)は,収集したデータから得られる情報を最大化する実験を設計することを目的としている。最適設計は通常、データとモデルパラメータ間の相互情報量(MI)を最大化することで得られる。例えばデータ分布が扱いにくい(intractable)暗黙的モデルのように、MIの解析的表現が利用できない場合については、最近、ニューラルネットワークに基づくMIの下界が提案され、その下界を最大化するために勾配上昇法が用いられた。しかしながら、Kleinegesse et al., 2020 のアプローチでは、設計変数に対するMI下界の勾配を計算するためにパスワイズなサンプリング経路が必要であり、そのような経路は通常、暗黙的モデルでは利用できない。本研究では,変分MI推定器と進化戦略(ES)の最近の進歩をブラックボックス確率的勾配上昇法(SGA)と組み合わせて,MI下界を最大化するハイブリッド勾配手法を提案する。これにより、サンプリング経路の勾配を用いることなく、暗黙的モデルに対して統一的でスケーラブルな手順で設計プロセスを実現できる。いくつかの実験により,提案手法が高次元設計空間における暗黙的モデルに対するBEDのスケーラビリティを大幅に向上させることを示す。

Bayesian experimental design (BED) aims at designing an experiment to maximize the information gathering from the collected data. The optimal design is usually achieved by maximizing the mutual information (MI) between the data and the model parameters. When the analytical expression of the MI is unavailable, e.g., having implicit models with intractable data distributions, a neural network-based lower bound of the MI was recently proposed and a gradient ascent method was used to maximize the lower bound. However, the approach in Kleinegesse et al., 2020 requires a pathwise sampling path to compute the gradient of the MI lower bound with respect to the design variables, and such a pathwise sampling path is usually inaccessible for implicit models. In this work, we propose a hybrid gradient approach that leverages recent advances in variational MI estimator and evolution strategies (ES) combined with black-box stochastic gradient ascent (SGA) to maximize the MI lower bound. This allows the design process to be achieved through a unified scalable procedure for implicit models without sampling path gradients. Several experiments demonstrate that our approach significantly improves the scalability of BED for implicit models in high-dimensional design space.
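To make the role of evolution strategies concrete, the sketch below estimates a gradient from objective evaluations alone and uses it for gradient ascent. It is a generic antithetic ES estimator under stated assumptions, not the paper's hybrid method: `toy_objective` is a hypothetical stand-in for an MI lower bound, the design is one-dimensional, and the step size, noise scale, and sample count are arbitrary choices.

```python
import random


def es_gradient(objective, design, sigma, n_pairs, rng):
    """Antithetic evolution-strategies estimate of d objective / d design."""
    total = 0.0
    for _ in range(n_pairs):
        eps = rng.gauss(0.0, 1.0)
        # Evaluate the black-box objective at mirrored perturbations.
        diff = objective(design + sigma * eps) - objective(design - sigma * eps)
        total += diff * eps
    return total / (2.0 * sigma * n_pairs)


def toy_objective(design):
    # Hypothetical stand-in for an MI lower bound; maximised at design = 2.
    return -(design - 2.0) ** 2


rng = random.Random(0)
design = -1.0
for _ in range(100):
    # Gradient ascent using only function evaluations, no pathwise gradients.
    design += 0.1 * es_gradient(toy_objective, design,
                                sigma=0.1, n_pairs=20, rng=rng)
```

The estimator never differentiates through the simulator, which is why this family of methods is attractive when the sampling path of an implicit model is inaccessible.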

翻訳日:2021-03-18 01:33:00 公開日:2021-03-14

# 厳密なスパース直交辞書学習

Exact Sparse Orthogonal Dictionary Learning ( http://arxiv.org/abs/2103.09085v1 )

ライセンス: Link先を確認

Kai Liu, Yongjian Zhao, Hua Wang

(参考訳) 過去10年間、入力画像からの辞書の学習は、画像処理と圧縮センシングにおいて最も研究の注目を集めるトピックの1つとなっている。既存の辞書学習法の多くは、K-SVD法のような過剰完全辞書を考慮しており、相互不整合が高く、認識に悪影響を及ぼす可能性がある。一方、スパースコードは、通常、$\ell_0$または$\ell_1$-normのペナルティを追加することで最適化されるが、厳格なスパース性保証はない。本稿では,厳密なスパース符号とグローバルシーケンス収束保証付き直交辞書を得られる直交辞書学習モデルを提案する。本手法は, 辞書ベースの学習手法に比べて, 高い評価結果が得られること, 高い計算効率の利点が期待できることがわかった。

Over the past decade, learning a dictionary from input images for sparse modeling has been one of the topics which receive most research attention in image processing and compressed sensing. Most existing dictionary learning methods consider an over-complete dictionary, such as the K-SVD method, which may result in high mutual incoherence and therefore has a negative impact in recognition. On the other side, the sparse codes are usually optimized by adding the $\ell_0$ or $\ell_1$-norm penalty, but with no strict sparsity guarantee. In this paper, we propose an orthogonal dictionary learning model which can obtain strictly sparse codes and orthogonal dictionary with global sequence convergence guarantee. We find that our method can result in better denoising results than over-complete dictionary based learning methods, and has the additional advantage of high computation efficiency.
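One reason an orthogonal dictionary makes strict sparsity tractable is that the best k-sparse code has a closed form: project the signal onto the atoms and keep the k coefficients of largest magnitude. The sketch below shows only this coding step for a fixed, square orthogonal dictionary. It is an illustration under those assumptions, not the authors' learning algorithm, and the names `sparse_code`, `D`, and `k` are hypothetical.

```python
import math


def sparse_code(dictionary, x, k):
    """Best k-sparse code of x in a square orthogonal dictionary.

    dictionary[i][j] is entry i of atom j (atoms are the columns).
    """
    n = len(x)
    # Analysis coefficients c = D^T x.
    coeffs = [sum(dictionary[i][j] * x[i] for i in range(n)) for j in range(n)]
    # Hard thresholding: keep the k coefficients of largest magnitude.
    keep = sorted(range(n), key=lambda j: abs(coeffs[j]), reverse=True)[:k]
    return [coeffs[j] if j in keep else 0.0 for j in range(n)]


# A 2-D rotation by 45 degrees is an orthogonal dictionary.
theta = math.pi / 4
D = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
x = [1.0, 1.0]
code = sparse_code(D, x, k=1)
```

Here the signal lies along the first atom, so a single nonzero coefficient represents it, and the code has exactly `k` nonzeros by construction rather than through a penalty term.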

翻訳日:2021-03-17 13:19:03 公開日:2021-03-14

# (参考訳) 自律運転における表現学習によるレーダカメラ融合

Radar Camera Fusion via Representation Learning in Autonomous Driving ( http://arxiv.org/abs/2103.07825v1 )

ライセンス: CC BY 4.0

Xu Dong, Binnan Zhuang, Yunxiang Mao, Langechuan Liu

(参考訳) レーダーとカメラは成熟し、コスト効率が高く、堅牢なセンサーであり、大量生産された自動運転システムの認識スタックで広く利用されている。その相補的な性質のため、レーダー検出(レーダーピン)とカメラ認識(2dバウンディングボックス)からの出力は通常融合され、最良の知覚結果を生成する。レーダーカメラ融合の成功の鍵は、正確なデータ関連付けです。レーダ・カメラ・アソシエーションの課題は、運転シーンの複雑さ、レーダ測定のノイズとスパースの性質、および2次元境界ボックスからの深さのあいまいさに起因する。従来のルールに基づくアソシエーション手法は、難解なシナリオやコーナーケースの障害でパフォーマンスが低下するおそれがある。本研究では,rad-camアソシエーションを深層表現学習を通じて解決し,機能レベルのインタラクションとグローバル推論を検討する。具体的には,不完全なラベル付けの難しさを克服し,人間の批判的推論を強制するために,損失サンプリング機構と革新的な順序的損失をデザインする。規則に基づくアルゴリズムによって生成された雑音ラベルを用いて学習したにもかかわらず,提案手法は92.2%のf1スコアを達成し,これは規則に基づく教師よりも11.6%高い。さらに,このデータ駆動方式は,コーナーケースマイニングによる継続的改善にも有効だ。

Radars and cameras are mature, cost-effective, and robust sensors and have been widely used in the perception stack of mass-produced autonomous driving systems. Due to their complementary properties, outputs from radar detection (radar pins) and camera perception (2D bounding boxes) are usually fused to generate the best perception results. The key to successful radar-camera fusion is accurate data association. The challenges in radar-camera association can be attributed to the complexity of driving scenes, the noisy and sparse nature of radar measurements, and the depth ambiguity from 2D bounding boxes. Traditional rule-based association methods are susceptible to performance degradation in challenging scenarios and failure in corner cases. In this study, we propose to address rad-cam association via deep representation learning, to explore feature-level interaction and global reasoning. Concretely, we design a loss sampling mechanism and an innovative ordinal loss to overcome the difficulty of imperfect labeling and to enforce critical human reasoning. Despite being trained with noisy labels generated by a rule-based algorithm, our proposed method achieves a performance of 92.2% F1 score, which is 11.6% higher than the rule-based teacher. Moreover, this data-driven method also lends itself to continuous improvement via corner case mining.

翻訳日:2021-03-17 06:35:56 公開日:2021-03-14

# (参考訳) 文脈的確率推定のための文レベルのノイズコントラスト推定による単語レベルの言語モデル学習

Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation ( http://arxiv.org/abs/2103.07875v1 )

ライセンス: CC BY 4.0

Heewoong Park, Sukhyun Cho, Jonghun Park

(参考訳) 文や単語列の確率分布を推測することは自然言語処理の重要なプロセスである。単語系列の結合確率を計算するために単語レベル言語モデル(lms)が広く採用されているが、文確率推定(spe)に十分な長さの文脈を捉えるのが困難である。これを解決するために、近年の研究では、リカレントニューラルネットワーク(RNN)を用いた文レベルノイズコントラスト推定(NCE)を用いたトレーニング手法を導入している。本研究では,前文の条件文確率を推定することを目的とした文脈的SPEの拡張を試みる。提案されたNCEは、前のテキストとは無関係にネガティブな文をサンプリングするため、訓練されたモデルは、より一貫性のある文により高い確率を与える。本手法を単純な単語レベルのRNN LMに適用し,ネットワークアーキテクチャではなく文レベルのNCEトレーニングの効果に着目した。推定の質は,人間と自動生成した質問を含む複数項目のクローゼ型質問に対して評価した。実験結果は,提案手法が単語レベルRNN LMのSPE品質を改善することを示した。

Inferring the probability distribution of sentences or word sequences is a key process in natural language processing. While word-level language models (LMs) have been widely adopted for computing the joint probabilities of word sequences, they have difficulty in capturing a context long enough for sentence probability estimation (SPE). To overcome this, recent studies introduced training methods using sentence-level noise-contrastive estimation (NCE) with recurrent neural networks (RNNs). In this work, we attempt to extend it for contextual SPE, which aims to estimate a conditional sentence probability given a previous text. The proposed NCE samples negative sentences independently of a previous text so that the trained model gives higher probabilities to the sentences that are more consistent with the context. We apply our method to a simple word-level RNN LM to focus on the effect of the sentence-level NCE training rather than on the network architecture. The quality of estimation was evaluated against multiple-choice cloze-style questions including both human and automatically generated questions. The experimental results show that the proposed method improved the SPE quality for the word-level RNN LM.

翻訳日:2021-03-17 06:23:03 公開日:2021-03-14

# (参考訳) R-GSN:異種グラフのためのリレーショナルグラフ類似ネットワーク

R-GSN: The Relation-based Graph Similar Network for Heterogeneous Graph ( http://arxiv.org/abs/2103.07877v1 )

ライセンス: CC BY 4.0

Xinliang Wu and Mengying Jiang and Guizhong Liu

(参考訳) 不均一グラフは、実生活で広く存在するデータ構造の一種です。今日では、異種グラフ上のグラフニューラルネットワークの研究がますます盛んになっている。既存の異種グラフニューラルネットワークアルゴリズムは主にメタパスをベースとしており、もう1つはそうではない。メタパスに基づくアイデアは、しばしば手作業による事前処理を必要とするが、同時に大規模グラフの拡張は困難である。本稿では, メタパスを必要としない一般異種メッセージパッシングパラダイムを提案し, R-GSNを設計し, ベースラインのR-GCNに比べて大幅に改善した。実験により,我々のR-GSNアルゴリズムはogbn-mag大規模不均一グラフデータセット上での最先端の性能を実現することが示された。

Heterogeneous graph is a kind of data structure widely existing in real life. Nowadays, the research of graph neural network on heterogeneous graph has become more and more popular. The existing heterogeneous graph neural network algorithms mainly have two ideas, one is based on meta-path and the other is not. The idea based on meta-path often requires a lot of manual preprocessing, at the same time it is difficult to extend to large scale graphs. In this paper, we proposed the general heterogeneous message passing paradigm and designed R-GSN that does not need meta-path, which is much improved compared to the baseline R-GCN. Experiments have shown that our R-GSN algorithm achieves the state-of-the-art performance on the ogbn-mag large scale heterogeneous graph dataset.

翻訳日:2021-03-17 06:07:48 公開日:2021-03-14

# (参考訳) 携帯型拡張現実モバイルを用いた動的物体復元のためのマルチビューデータキャプチャ

Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles ( http://arxiv.org/abs/2103.07883v1 )

ライセンス: CC BY 4.0

M. Bortolon, L. Bazzanella, F. Poiesi

(参考訳) 動的オブジェクト3D再構成に適した複数の携帯端末からほぼ同期のフレームストリームをキャプチャするシステムを提案する。各モバイルは、そのポーズを推定するために同時にローカライズとマッピングを実行し、無線通信チャネルを使用して同期トリガーを送受信する。我々のシステムは、分散トリガ戦略とエッジまたはクラウドにデプロイ可能なデータリレーアーキテクチャを用いて、フレームとモバイルのポーズをリアルタイムで収集することができる。 3次元骨格とボリュームリコンストラクションに利用することで,本システムの有効性を示す。我々のトリガー戦略は、NTPベースの同期アプローチと同等のパフォーマンスを達成するが、アプリケーションのニーズに応じてオンラインで調整できるため、より高い柔軟性を提供する。屋外でスポーツ活動を行う俳優を録画する6つのハンドヘルド拡張現実モバイルを含む、挑戦的な新しいデータセット、すなわち4DMを作成しました。システムを4DM上で検証し、その強みと限界を分析し、モジュールと代替モジュールを比較します。

We propose a system to capture nearly-synchronous frame streams from multiple and moving handheld mobiles that is suitable for dynamic object 3D reconstruction. Each mobile executes Simultaneous Localisation and Mapping on-board to estimate its pose, and uses a wireless communication channel to send or receive synchronisation triggers. Our system can harvest frames and mobile poses in real time using a decentralised triggering strategy and a data-relay architecture that can be deployed either at the Edge or in the Cloud. We show the effectiveness of our system by employing it for 3D skeleton and volumetric reconstructions. Our triggering strategy achieves equal performance to that of an NTP-based synchronisation approach, but offers higher flexibility, as it can be adjusted online based on application needs. We created a challenging new dataset, namely 4DM, that involves six handheld augmented reality mobiles recording an actor performing sports actions outdoors. We validate our system on 4DM, analyse its strengths and limitations, and compare its modules with alternative ones.

翻訳日:2021-03-17 05:54:44 公開日:2021-03-14

# (参考訳) 標準平面の分類のための原理的超音波データ拡張

Principled Ultrasound Data Augmentation for Classification of Standard Planes ( http://arxiv.org/abs/2103.07895v1 )

ライセンス: CC BY 4.0

Lok Hin Lee and Yuan Gao and J. Alison Noble

(参考訳) 大きな学習能力を持つディープラーニングモデルは、医療画像データセットに対してしばしば過学習する。これは、医療データの取得やラベル付けにかなりの時間と費用がかかるため、トレーニングセットが比較的小さいことが多いためである。したがって、データ拡張はトレーニングデータの量を増やし、汎化を高めるためにしばしば用いられる。しかし、拡張戦略は正当な根拠なくアドホックに選択されることが多い。本稿では,モデルの分類性能の向上を目的とした拡張ポリシー探索手法を提案する。拡張ポリシー探索には,医療画像解析でよく用いられる変換を追加し,その性能を評価する。さらに,非線形な混合サンプルデータ拡張戦略を含むように拡張ポリシー探索を拡張した。学習したポリシーを用いることで、医用画像モデルの学習における原理的なデータ拡張が超音波標準平面検出の大幅な改善につながることを示し、胎児超音波標準平面分類において、ナイーブなデータ拡張戦略と比べて平均F1スコアが全体で7.0%向上した。最適化されたデータ拡張により、学習された超音波画像の表現はより良くクラスタ化され、より明確になることがわかった。

Deep learning models with large learning capacities often overfit to medical imaging datasets. This is because training sets are often relatively small due to the significant time and financial costs incurred in medical data acquisition and labelling. Data augmentation is therefore often used to expand the availability of training data and to increase generalization. However, augmentation strategies are often chosen on an ad-hoc basis without justification. In this paper, we present an augmentation policy search method with the goal of improving model classification performance. We include in the augmentation policy search additional transformations that are often used in medical image analysis and evaluate their performance. In addition, we extend the augmentation policy search to include non-linear mixed-example data augmentation strategies. Using these learned policies, we show that principled data augmentation for medical image model training can lead to significant improvements in ultrasound standard plane detection, with an average F1-score improvement of 7.0% overall over naive data augmentation strategies in ultrasound fetal standard plane classification. We find that the learned representations of ultrasound images are better clustered and defined with optimized data augmentation.

翻訳日:2021-03-17 05:33:46 公開日:2021-03-14

# (参考訳) バングラ手書き文字認識と生成

Bangla Handwritten Digit Recognition and Generation ( http://arxiv.org/abs/2103.07905v1 )

ライセンス: CC BY 4.0

Md Fahim Sikder

(参考訳) 手書き数字や数値認識は、パターン認識の分野では古典的な問題の一つであり、近年のコンピュータリソースの幅広い可用性のために、大きな進歩を遂げています。英語、アラビア語、中国語、日本語手書きのスクリプトですでに豊富な作品が行われています。バングラでの作業もいくつか行われたが、開発の余地がある。そこで本論文では,BHANDデータセット上で99.44%の検証精度を達成し,AlexnetとInception V3アーキテクチャを上回ったアーキテクチャを実装した。数値認識以外にも、デジタル生成は研究者の注目を集めている分野でもあるが、特にバングラについての研究はあまり行われていない。本論文では,Bangla手書き数字を生成するためにSemi-supvised Generative Adversarial Network(SGAN)を適用し,Bangla桁の生成に成功した。

Handwritten digit or numeral recognition is one of the classical issues in the area of pattern recognition and has seen tremendous advancement because of the recent wide availability of computing resources. Plentiful work has already been done on English, Arabic, Chinese, and Japanese handwritten scripts. Some work on Bangla has also been done, but there is room for development. From that angle, in this paper, an architecture has been implemented which achieves a validation accuracy of 99.44% on the BHAND dataset and outperforms the AlexNet and Inception V3 architectures. Besides digit recognition, digit generation is another field which has recently caught the attention of researchers, though not much work has been done in this field, especially on Bangla. In this paper, a Semi-Supervised Generative Adversarial Network (SGAN) has been applied to generate Bangla handwritten numerals, and it successfully generated Bangla digits.

翻訳日:2021-03-17 05:24:11 公開日:2021-03-14

# (参考訳) 自然言語処理における再現性研究の体系的レビュー

A Systematic Review of Reproducibility Research in Natural Language Processing ( http://arxiv.org/abs/2103.07929v1 )

ライセンス: CC BY 4.0

Anya Belz, Shubham Agarwal, Anastasia Shimorina, Ehud Reiter

(参考訳) 科学における再現性危機と呼ばれることの背景から、NLPの分野はますます興味を持ち、その成果の再現性に精通してきている。過去数年間、この地域では様々な新しいイニシアチブやイベント、活発な研究が行われてきた。しかし、再現性がどのように定義され、測定され、対処されるべきかについて、この分野は合意に達するには程遠い。この重点的貢献により、NLPの再現性に関する現在の作業のスナップショット、相違点と類似点の記述、共通分母へのポインタの提供を、可能な限り広角かつ近距離に行うことを目指しています。

Against the background of what has been termed a reproducibility crisis in science, the NLP field is becoming increasingly interested in, and conscientious about, the reproducibility of its results. The past few years have seen an impressive range of new initiatives, events and active research in the area. However, the field is far from reaching a consensus about how reproducibility should be defined, measured and addressed, with diversity of views currently increasing rather than converging. With this focused contribution, we aim to provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP, delineating differences and similarities, and providing pointers to common denominators.

翻訳日:2021-03-17 05:15:09 公開日:2021-03-14

# (参考訳) Gym-ANM: 電力配電システムにおけるアクティブネットワーク管理タスクのための強化学習環境

Gym-ANM: Reinforcement Learning Environments for Active Network Management Tasks in Electricity Distribution Systems ( http://arxiv.org/abs/2103.07932v1 )

ライセンス: CC BY 4.0

Robin Henry and Damien Ernst

(参考訳) 配電ネットワークのアクティブネットワーク管理(ANM)には、多くの複雑な確率的逐次最適化問題が含まれる。これらの問題は、再生可能エネルギーと分散型蓄電を将来の電力網に統合するために解決する必要がある。本稿では、配電ネットワークにおけるANMタスクをモデル化する強化学習(RL)環境を設計するためのフレームワークであるGym-ANMを紹介する。これらの環境は、そのようなシステムの基盤となるダイナミクスに関する広範な知識を必要としない、電力ネットワーク管理におけるRL研究の新しい場を提供する。本研究とあわせて、ANMに共通する課題を強調するために設計された入門的なトイ環境ANM6-Easyの実装を公開する。また、モデル予測制御(MPC)手法と比較して、最先端のRLアルゴリズムがANM6-Easy上ですでに良好な性能を達成できることを示す。最後に, (a) 配電ネットワークのトポロジーとパラメータ, (b) 観測空間, (c) システムに存在する確率過程のモデル化, (d) 報酬信号に影響を及ぼす一連のハイパーパラメータが異なる新しいGym-ANM環境を作成するためのガイドラインを提供する。Gym-ANMは https://github.com/robinhenry/gym-anm からダウンロードできる。

Active network management (ANM) of electricity distribution networks include many complex stochastic sequential optimization problems. These problems need to be solved for integrating renewable energies and distributed storage into future electrical grids. In this work, we introduce Gym-ANM, a framework for designing reinforcement learning (RL) environments that model ANM tasks in electricity distribution networks. These environments provide new playgrounds for RL research in the management of electricity networks that do not require an extensive knowledge of the underlying dynamics of such systems. Along with this work, we are releasing an implementation of an introductory toy-environment, ANM6-Easy, designed to emphasize common challenges in ANM. We also show that state-of-the-art RL algorithms can already achieve good performance on ANM6-Easy when compared against a model predictive control (MPC) approach. Finally, we provide guidelines to create new Gym-ANM environments differing in terms of (a) the distribution network topology and parameters, (b) the observation space, (c) the modelling of the stochastic processes present in the system, and (d) a set of hyperparameters influencing the reward signal. Gym-ANM can be downloaded at https://github.com/robinhenry/gym-anm.

翻訳日:2021-03-17 04:55:40 公開日:2021-03-14

# (参考訳) サンプルタスク実行から針挿入を学ぶ

Learning needle insertion from sample task executions ( http://arxiv.org/abs/2103.07938v1 )

ライセンス: CC BY 4.0

Amir Ghalamzan-E

(参考訳) ロボット作業、例えばロボット縫合の自動化は非常に複雑で時間がかかる。自律的にタスクを実行するためのタスクモデルを学ぶことは、技術、ロボット手術、より広いコミュニティのためにアクセス可能にする貴重なことです。ロボット手術のデータを簡単に記録でき、収集したデータを使ってタスクモデルを学ぶことができる。これにより、外科医がロボット操作を監督したり、ツールの低レベル制御の代わりに高レベルのコマンドを与えることができるロボット手術の時間とコストが削減されます。腕1が軟組織に針を挿入し、腕2が軟組織を積極的に操作し、所望の出口と実際の出口が同一であることを保証する2本の腕を持つ軟組織に針を挿入するデータセットを提案する。これは、組織をアクティブに操作することなく縫合することは縫合に失敗する可能性があるため、縫合が縫合に適用される力に耐えるだけの十分な組織を縫合することが出来ないため、実際の手術において重要である。 3対のステレオカメラで記録された60の治験を含む針挿入データセットを提案する。さらに, t 以降の段階でロボットの望ましい状態を予測するDeep-Robot Learning from Demonstrations(デモからの深層ロボット学習)を, 過去のステップのビデオ(すなわち, t での最適動作)から見て紹介する。 n ステップタイム履歴 N はタスクの実行のメモリタイムウィンドウです。実験結果は,提案する深層モデルアーキテクチャが既存手法を上回っていることを示す。ソリューションはまだ実際のロボットにデプロイする準備が整っていないが、結果は実際のロボットを展開するための将来の開発の可能性を示している。

Automating a robotic task, e.g., robotic suturing, can be very complex and time-consuming. Learning a task model to autonomously perform the task is invaluable for making the technology, robotic surgery, accessible to a wider community. The data of robotic surgery can be easily logged, and the collected data can be used to learn task models. This will reduce the time and cost of robotic surgery, in which a surgeon can supervise the robot operation or give high-level commands instead of low-level control of the tools. We present a dataset of needle insertion in soft tissue with two arms, where Arm 1 inserts the needle into the tissue and Arm 2 actively manipulates the soft tissue to ensure the desired and actual exit points are the same. This is important in real surgery because suturing without active manipulation of tissue may cause the suturing to fail, as the stitch may not grip enough tissue to resist the force applied for the suturing. We present a needle insertion dataset including 60 successful trials recorded by 3 pairs of stereo cameras. Moreover, we present Deep-robot Learning from Demonstrations, which predicts the desired state of the robot at the time step after t (which the optimal action taken at t yields) by looking at the video of the past time steps of the task execution, i.e., an N-step time history where N is the memory time window. The experimental results illustrate that our proposed deep model architecture outperforms the existing methods. Although the solution is not yet ready to be deployed on a real robot, the results indicate the possibility of future development for real robot deployment.

翻訳日:2021-03-17 04:54:27 公開日:2021-03-14

# (参考訳) MLベースのシステムのためのソフトウェアアーキテクチャ - 既存のものと、その先にあるもの

Software Architecture for ML-based Systems: What Exists and What Lies Ahead ( http://arxiv.org/abs/2103.07950v1 )

ライセンス: CC BY 4.0

Henry Muccini and Karthik Vaidhyanathan

(参考訳) 機械学習(ML)の利用の増加と、現代のソフトウェアアーキテクチャ上の課題が相まって、2つの広い研究領域が生まれた。i) MLベースのソフトウェアシステムをより良く開発するためのアーキテクチャ技術の開発に焦点を当てる「MLベースのシステムのためのソフトウェアアーキテクチャ」と、ii) 従来のソフトウェアシステムをより良く設計するためのML技術の開発に焦点を当てる「ソフトウェアアーキテクチャのためのML」である。本研究では前者に焦点を当て、MLベースのソフトウェアシステムの設計において現在存在するさまざまなアーキテクチャプラクティスを明らかにすることを目的とする。MLベースのソフトウェアシステムを設計するための標準的なプラクティスセットをより適切に定義するために、MLとソフトウェアの実践者の双方の注意を必要とする、ソフトウェアアーキテクチャの4つの重要な領域を特定する。これらの領域は、イタリア最大級の博物館の1つで待ち行列の課題を解決するMLベースのソフトウェアシステムを設計した我々の経験に基づいている。

The increasing usage of machine learning (ML) coupled with the software architectural challenges of the modern era has resulted in two broad research areas: i) software architecture for ML-based systems, which focuses on developing architectural techniques for better developing ML-based software systems, and ii) ML for software architectures, which focuses on developing ML techniques to better architect traditional software systems. In this work, we focus on the former side of the spectrum with a goal to highlight the different architecting practices that exist in the current scenario for architecting ML-based software systems. We identify four key areas of software architecture that need the attention of both the ML and software practitioners to better define a standard set of practices for architecting ML-based software systems. We base these areas in light of our experience in architecting an ML-based software system for solving queuing challenges in one of the largest museums in Italy.

翻訳日:2021-03-17 04:39:46 公開日:2021-03-14

# (参考訳) アクティブダイナミカルプロスペクション:パスフィンディング時の感覚制御のための粒子フィルタリングとしてのメンタルシミュレーションのモデル化

Active Dynamical Prospection: Modeling Mental Simulation as Particle Filtering for Sensorimotor Control during Pathfinding ( http://arxiv.org/abs/2103.07966v1 )

ライセンス: CC BY-SA 4.0

Jeremy Gordon and John Chuang

(参考訳) 共通の課題に直面した時に人間が何をするか – どこに行きたいかは分かっていますが、そこに着く最善の方法がまだ分かっていません。これは空間的ナビゲーションやパスフィニングにおいてエージェントが引き起こす問題であり、その解決策は一般により抽象的な計画領域に関するヒントを与えるかもしれない。本研究では,パスファインディング行動の連続的,明示的な探索的パラダイムをモデル化する。私たちのタスクでは、参加者(およびエージェント)は、部分的に観察可能な環境で視覚的な探索とナビゲーションの両方を調整しなければなりません。 1)オンライン実験として実施した新しいパスファインディングパラダイムにおける81名の被験者の行動データの解析,2) 粒子フィルタリングとしてナビゲーション中の予測的メンタルシミュレーションをモデル化する提案,3) 計算エージェントにおける提案のインスタンス化,の3つの主成分がある。我々のモデルであるActive Dynamical Prospectionでは、マップの解法率、経路選択、試行期間の類似パターンと、人間の参加者のデータと比較した場合の注意行動(集約レベルと個人レベルの両方)が示される。また,最初の移動前の遠近的注意と遅延(予測シミュレーションの潜在的な相関関係)がタスク性能の予測であることを見出した。

What do humans do when confronted with a common challenge: we know where we want to go but we are not yet sure the best way to get there, or even if we can. This is the problem posed to agents during spatial navigation and pathfinding, and its solution may give us clues about the more abstract domain of planning in general. In this work, we model pathfinding behavior in a continuous, explicitly exploratory paradigm. In our task, participants (and agents) must coordinate both visual exploration and navigation within a partially observable environment. Our contribution has three primary components: 1) an analysis of behavioral data from 81 human participants in a novel pathfinding paradigm conducted as an online experiment, 2) a proposal to model prospective mental simulation during navigation as particle filtering, and 3) an instantiation of this proposal in a computational agent. We show that our model, Active Dynamical Prospection, demonstrates similar patterns of map solution rate, path selection, and trial duration, as well as attentional behavior (at both aggregate and individual levels) when compared with data from human participants. We also find that both distal attention and delay prior to first move (both potential correlates of prospective simulation) are predictive of task performance.

翻訳日:2021-03-17 04:21:13 公開日:2021-03-14
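
The abstract above proposes modeling prospective mental simulation as particle filtering. As a rough illustration of what that technique involves, here is a minimal sketch of one generic bootstrap particle filter step (predict, weight, resample) on a 1-D toy problem. It is not the authors' agent: the function name, the Gaussian motion and observation models, and all parameter values are assumptions made only for this example.

```python
import math
import random

def particle_filter_step(particles, observation, rng, motion_noise=0.5, obs_noise=1.0):
    """One predict-weight-resample step of a bootstrap particle filter (1-D toy)."""
    # Predict: push each hypothesis through a noisy motion model (assumed Gaussian).
    predicted = [p + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: score each hypothesis by an assumed Gaussian observation likelihood.
    weights = [math.exp(-0.5 * ((observation - p) / obs_noise) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        # Every weight underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new population in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

rng = random.Random(0)
# Start from a broad prior over positions and repeatedly observe the value 3.0.
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, observation=3.0, rng=rng)
estimate = sum(particles) / len(particles)
```

After a few steps the particle population concentrates around the observed value, which is the sense in which a set of simulated hypotheses can track a belief over a partially observed state.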

# (参考訳) CrossoverScheduler: クロスオーバーマナーで複数の分散トレーニングアプリケーションをオーバーラップする

CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner ( http://arxiv.org/abs/2103.07974v1 )

ライセンス: CC BY 4.0

Cheng Luo, Lei Qu, Youshan Miao, Peng Cheng, Yongqiang Xiong

(参考訳) 分散ディープラーニングのワークロードには、GPUクラスタ上のスループット集約型トレーニングタスクが含まれる。分散確率的勾配降下法(Distributed Stochastic Gradient Descent, SGD)は、後方伝播後に大きな通信遅延を引き起こし、集中型パラメータサーバ経由または分散ワーカー間での勾配同期をワーカーに待たせる。本稿では、通信と計算のパイプライン化によって、分散トレーニングアプリケーションの通信サイクルを他のアプリケーションで埋めることを可能にするアルゴリズムであるCrossoverSchedulerを提案する。CrossoverSchedulerでは、収束率とネットワーク精度を犠牲にすることなく、分散トレーニングの実行性能を大幅に向上させることができる。我々は、複数の分散ディープラーニングアプリケーションが同じGPUを交互にタイムシェアできるクロスオーバー同期を導入することでこれを実現した。CrossoverSchedulerのプロトタイプは構築され、Horovodと統合されている。さまざまな分散タスクの実験から、CrossoverSchedulerはImageNetデータセット上の画像分類タスクで20%のスピードアップを実現することが示された。

Distributed deep learning workloads include throughput-intensive training tasks on GPU clusters, where Distributed Stochastic Gradient Descent (SGD) incurs significant communication delays after backward propagation and forces workers to wait for gradient synchronization, either via a centralized parameter server or directly among decentralized workers. We present CrossoverScheduler, an algorithm that enables communication cycles of a distributed training application to be filled by other applications through pipelining communication and computation. With CrossoverScheduler, the running performance of distributed training can be significantly improved without sacrificing convergence rate and network accuracy. We achieve this by introducing Crossover Synchronization, which allows multiple distributed deep learning applications to time-share the same GPU alternately. The prototype of CrossoverScheduler is built and integrated with Horovod. Experiments on a variety of distributed tasks show that CrossoverScheduler achieves a 20% speedup for image classification tasks on the ImageNet dataset.

翻訳日:2021-03-17 04:07:58 公開日:2021-03-14

# (参考訳) 暗黙モデルを用いたベイズ実験設計のためのスケーラブルグラデーションフリー手法

A Scalable Gradient-Free Method for Bayesian Experimental Design with Implicit Models ( http://arxiv.org/abs/2103.08026v1 )

ライセンス: CC BY 4.0

Jiaxin Zhang, Sirui Bi, Guannan Zhang

(参考訳) ベイズ実験設計(BED)は、情報収集を最大化する設計をどのように選択するかという問いに答えるものである。尤度は計算困難だがサンプリングは可能な暗黙的モデルでは、従来のBED法は、事後分布を効率的に推定し、データとパラメータ間の相互情報量(MI)を最大化することが困難である。最近の研究では、これらの問題に対処するために、勾配上昇法を用いてMIの下界を最大化することが提案された。しかし、この手法では設計変数に関するMI下界の経路勾配を計算するためにサンプリングパスが必要であり、そのような経路勾配は通常、暗黙的モデルでは得られない。本論文では、確率的近似勾配上昇法の最近の進歩を平滑化された変分MI推定器と組み合わせ、効率的かつ頑健なBEDを実現する新しい手法を提案する。経路勾配を必要としないため、本手法は暗黙的モデルに対して近似勾配を用いた統一的な手順で設計プロセスを実現できる。いくつかの実験により、本手法はベースライン法より優れ、高次元問題におけるBEDのスケーラビリティを大幅に向上させることが示された。

Bayesian experimental design (BED) addresses the question of how to choose designs that maximize information gathering. For implicit models, where the likelihood is intractable but sampling is possible, conventional BED methods have difficulties in efficiently estimating the posterior distribution and maximizing the mutual information (MI) between data and parameters. Recent work proposed the use of gradient ascent to maximize a lower bound on MI to deal with these issues. However, the approach requires a sampling path to compute the pathwise gradient of the MI lower bound with respect to the design variables, and such a pathwise gradient is usually inaccessible for implicit models. In this paper, we propose a novel approach that combines recent advances in stochastic approximate gradient ascent with a smoothed variational MI estimator for efficient and robust BED. Because it does not need pathwise gradients, our approach allows the design process to be carried out through a unified procedure with an approximate gradient for implicit models. Several experiments show that our approach outperforms baseline methods and significantly improves the scalability of BED in high-dimensional problems.

翻訳日:2021-03-17 04:01:53 公開日:2021-03-14

# (参考訳) 低リソースニューラルネットワーク翻訳のためのクラウドソーシングフレーズベースのトークン化:Fon言語の場合

Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language ( http://arxiv.org/abs/2103.08052v1 )

ライセンス: CC BY 4.0

Bonaventure F. P. Dossou and Chris C. Emezue

(参考訳) 非常に低リソースで形態的に豊かなアフリカの先住民言語に対する効果的なニューラルネットワーク翻訳(NMT)モデルの構築は、オープンな課題である。利用可能なリソースを見つけるという問題に加えて、多くの作業が前処理とトークン化に費やされる。最近の研究では、標準のトークン化方法が一部のアフリカ言語の文法的特性、ダイアクリティカルマーク、声調の特性を必ずしも適切に扱えないことが示されている。これは、トレーニングサンプルの可用性が極めて低いことと相まって、信頼性の高いNMTモデルの構築を妨げている。本稿では、Fon言語を事例研究として、標準トークン化法を再検討し、トレーニングのためのより代表的な語彙を作成する、人間が関与するスーパーワードトークン化戦略であるWord-Expressions-Based (WEB)トークン化を導入する。さらに、Fon-FrenchおよびFrench-Fon翻訳タスクにおいて、我々のトークン化戦略を他の手法と比較する。

Building effective neural machine translation (NMT) models for very low-resourced and morphologically rich African indigenous languages is an open challenge. Besides the issue of finding available resources for them, a lot of work is put into preprocessing and tokenization. Recent studies have shown that standard tokenization methods do not always adequately deal with the grammatical, diacritical, and tonal properties of some African languages. That, coupled with the extremely low availability of training samples, hinders the production of reliable NMT models. In this paper, using Fon language as a case study, we revisit standard tokenization methods and introduce Word-Expressions-Based (WEB) tokenization, a human-involved super-words tokenization strategy to create a better representative vocabulary for training. Furthermore, we compare our tokenization strategy to others on the Fon-French and French-Fon translation tasks.

翻訳日:2021-03-17 03:38:33 公開日:2021-03-14
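
The abstract above describes a phrase-level ("super-words") tokenization in which humans curate the expressions that should stay together. As a rough sketch of how such a curated phrase list might be applied at tokenization time, the following greedy longest-match tokenizer keeps listed multi-word expressions as single tokens. This is an assumption for illustration only, not the authors' WEB procedure, and the English phrases stand in for real Fon expressions, which are not given in the abstract.

```python
def phrase_tokenize(sentence, phrases, max_len=4):
    """Greedy longest-match tokenization against a curated phrase list."""
    words = sentence.split()
    phrase_set = {tuple(p.split()) for p in phrases}
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; a single word always matches.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in phrase_set:
                tokens.append(" ".join(candidate))
                i += n
                break
    return tokens

# Hypothetical curated expressions (placeholders, not Fon data).
curated = ["good morning", "my friend"]
tokens = phrase_tokenize("good morning my friend", curated)
fallback = phrase_tokenize("good evening", curated)
```

Words that are not part of any curated expression fall back to single-word tokens, so the phrase list only ever merges tokens and never drops them.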

# (参考訳) RecSim NG:Recommenderエコシステムの原則的不確実性モデリングを目指して

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems ( http://arxiv.org/abs/2103.08057v1 )

ライセンス: CC BY 4.0

Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier

(参考訳) ユーザとのマルチターンインタラクションを最適化し、レコメンダエコシステムにおけるさまざまなエージェント(ユーザ、コンテンツプロバイダ、ベンダなど)のインタラクションをモデル化するレコメンダシステムの開発は、近年注目を集めている。このようなレコメンダーのためのモデルとアルゴリズムの開発とトレーニングは、静的データセットを使用することで特に困難になる可能性があります。そこで我々は,マルチエージェントレコメンダシステムのシミュレーションのための確率的プラットフォームであるrecsim ngを開発した。 RecSim NGはEdward2とTensorFlowで実装されたスケーラブルでモジュール化された差別化可能なシミュレータである。エージェントビヘイビア仕様のための強力で汎用的な確率的プログラム言語、自動微分とトレースによる確率的推論と潜在変数モデル学習のためのツール、アクセラレーションされたハードウェア上でシミュレーションを実行するTensorFlowベースのランタイムを提供する。 RecSim NGについて説明するとともに、RecSim NGが研究者と実践者の両方にとって、レコメンダシステムのための新しいアルゴリズムを容易に開発し、訓練するための簡単なユースケースの小さなセットによって補完される、レコメンダエコシステムの透明で構成可能なエンドツーエンドモデルの作成にどのように使用できるかを説明している。

The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem have drawn increasing attention in recent years. Developing and training models and algorithms for such recommenders can be especially difficult using static datasets, which often fail to offer the types of counterfactual predictions needed to evaluate policies over extended horizons. To address this, we develop RecSim NG, a probabilistic platform for the simulation of multi-agent recommender systems. RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification; tools for probabilistic inference and latent-variable model learning, backed by automatic differentiation and tracing; and a TensorFlow-based runtime for running simulations on accelerated hardware. We describe RecSim NG and illustrate how it can be used to create transparent, configurable, end-to-end models of a recommender ecosystem, complemented by a small set of simple use cases that demonstrate how RecSim NG can help both researchers and practitioners easily develop and train novel algorithms for recommender systems.

翻訳日:2021-03-17 03:22:11 公開日:2021-03-14

# (参考訳) Versailles-FPデータセット:古代の壁検出

Versailles-FP dataset: Wall Detection in Ancient ( http://arxiv.org/abs/2103.08064v1 )

ライセンス: CC BY 4.0

Wassim Swaileh, Dimitrios Kotzinos, Suman Ghosh, Michel Jordan, Son Vu, and Yaguan Qian

(参考訳) 歴史的建造物の床計画へのアクセスは、建築の進化と歴史を理解するために必要である。このような知識ベースは、かつて建物の一部であったさまざまな出来事、人物、事実の間のつながりを確立することで、歴史の再構築にも役立ちます。 2次元の計画は空間全体を捉えないため、3Dモデリングはこれらのユニークなアーカイブの読影に新たな光を放ち、記念碑の古代国家を理解するための大きな視点を開く。建物や記念碑の3Dモデルの最初のステップは、フロアプランにおける壁検出であり、本稿では、17世紀から18世紀にかけてのヴェルサイユ宮殿の、新しい独特で独特な壁面のFPデータセットを紹介する。データセットの壁マスクは、多方向ステアブルフィルタに基づく自動アプローチによって生成される。生成された壁面は手作業で検証され修正される。我々は最新のデータ集合における壁マスク生成のアプローチを検証する。最後に、壁検出のためのUネットベースの畳み込みフレームワークを提案する。本手法は,完全接続型ネットワークベースアプローチを超越した技術結果を実現する。

Access to historical monuments' floor plans over a time period is necessary to understand their architectural evolution and history. Such knowledge bases also help to rebuild history by establishing connections between the different events, persons, and facts that were once part of the buildings. Since two-dimensional plans do not capture the entire space, 3D modeling sheds new light on the reading of these unique archives and thus opens up great perspectives for understanding the ancient states of the monument. Since the first step in building a 3D model of a building or monument is wall detection in the floor plan, we introduce in this paper the new and unique Versailles FP dataset of wall ground-truthed images of the Versailles Palace dated between the 17th and 18th centuries. The dataset's wall masks are generated using an automatic approach based on multi-directional steerable filters. The generated wall masks are then validated and corrected manually. We validate our approach to wall mask generation on state-of-the-art modern datasets. Finally, we propose a U-Net-based convolutional framework for wall detection. Our method achieves state-of-the-art results, surpassing a fully connected network based approach.

翻訳日:2021-03-17 02:51:48 公開日:2021-03-14

# (参考訳) ゼロショット創発通信のための準等価ディスカバリ

Quasi-Equivalence Discovery for Zero-Shot Emergent Communication ( http://arxiv.org/abs/2103.08067v1 )

ライセンス: CC BY 4.0

Kalesha Bullard, Douwe Kiela, Joelle Pineau, Jakob Foerster

(参考訳) 効果的なコミュニケーションはマルチエージェント環境での情報交換を可能にする重要なスキルであり、創発的コミュニケーションは活気ある研究分野であり、個別の安価トークチャネルを含む共通的な設定である。定義上、これらの設定には任意の情報エンコーディングが含まれており、通常、学習したプロトコルがトレーニングパートナーを超えて一般化することを許さない。対照的に、本研究では、ゼロショットコーディネーション(ZSC)を可能にする新しい問題設定と準等価ディスカバリー(QED)アルゴリズム、すなわち独立に訓練されたエージェントに一般化できるプロトコルを発見することを提案する。現実世界の問題設定にはしばしば高価な通信チャネルが含まれており、例えばロボットは四肢を物理的に動かさなければならない。これらの2つの要因が,エージェントが意図を伝えるためにメッセージのエネルギーコストを使用するレファレンシャルゲームにおいて,ユニークなzscポリシーをもたらすことを示す。 Other-Playは最近、最適なZSCポリシーを学ぶために導入されたが、問題の対称性に事前アクセスする必要がある。代わりに、qedはこの設定における対称性を反復的に発見し、最適なzscポリシーに収束する。

Effective communication is an important skill for enabling information exchange in multi-agent settings, and emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. Since, by definition, these settings involve arbitrary encoding of information, they typically do not allow the learned protocols to generalize beyond training partners. In contrast, in this work, we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents. Real-world problem settings often contain costly communication channels, e.g., robots have to physically move their limbs, and a non-uniform distribution over intents. We show that these two factors lead to unique optimal ZSC policies in referential games, where agents use the energy cost of the messages to communicate intent. Other-Play was recently introduced for learning optimal ZSC policies, but requires prior access to the symmetries of the problem. Instead, QED iteratively discovers the symmetries in this setting and converges to the optimal ZSC policy.

翻訳日:2021-03-17 02:39:17 公開日:2021-03-14

# マルチGANモデルを用いたクレーム検証

Claim Verification using a Multi-GAN based Model ( http://arxiv.org/abs/2103.08001v1 )

ライセンス: Link先を確認

Amartya Hatua, Arjun Mukherjee and Rakesh M. Verma

(参考訳) 本稿では,複数のGANモデルを用いたクレーム検証について述べる。提案モデルは3組のジェネレータと判別器から構成される。生成器と識別器のペアは、支持および反論されたクレームおよびクレームラベルの合成データを生成する責任があります。提案モデルに関する理論的議論は、モデルの平衡状態を検証するために提供される。提案モデルはフィーバーデータセットに適用され、入力テキストデータには事前学習された言語モデルが使用される。合成されたデータは、モデルが技術モデルや他の標準分類器の状態よりも優れた性能を発揮するのに役立つ情報を得るのに役立つ。

This article describes research on claim verification carried out using a multiple GAN-based model. The proposed model consists of three pairs of generators and discriminators. The generator and discriminator pairs are responsible for generating synthetic data for supported and refuted claims and claim labels. A theoretical discussion about the proposed model is provided to validate the equilibrium state of the model. The proposed model is applied to the FEVER dataset, and a pre-trained language model is used for the input text data. The synthetically generated data helps to gain information which helps the model to perform better than state of the art models and other standard classifiers.

翻訳日:2021-03-16 14:32:49 公開日:2021-03-14

# 3次元シーン理解のためのモンテカルロシーン検索

Monte Carlo Scene Search for 3D Scene Understanding ( http://arxiv.org/abs/2103.07969v1 )

ライセンス: Link先を確認

Shreyas Hampali, Sinisa Stekovic, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit

(参考訳) トレーニングデータの必要性を低減するために、一般的なAIアルゴリズムを3Dシーン理解にどのように使用できるかを検討します。より正確には、ノイズの多いRGB-Dスキャンからオブジェクトと部屋レイアウトを検索するためのモンテカルロ木探索(MCTS)アルゴリズムの修正を提案する。 MCTSはゲームプレイングアルゴリズムとして開発されたが、複雑な知覚問題にも使用できることを示す。簡単に調整できるハイパーパラメータは少なく、一般的な損失を最適化できる。 rgb-dデータに基づいて,物体の後方確率と室内配置仮説を最適化する。これにより、現在の解をレンダリングしてRGB-D観測と比較することにより、解空間を探索する分析バイシンセシスアプローチがもたらされる。この探索をより効率的に行うために,標準MCTSのツリー構築・探索方針の簡易な変更を提案する。 ScanNetデータセットに対する我々のアプローチを実証する。我々のメソッドは、特にレイアウト上の手動アノテーションよりも優れた設定を検索することが多い。

We explore how a general AI algorithm can be used for 3D scene understanding in order to reduce the need for training data. More exactly, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show it can also be used for complex perception problems. It has few easy-to-tune hyperparameters and can optimise general losses. We use it to optimise the posterior probability of objects and room layout hypotheses given the RGB-D data. This results in an analysis-by-synthesis approach that explores the solution space by rendering the current solution and comparing it to the RGB-D observations. To perform this exploration even more efficiently, we propose simple changes to the standard MCTS' tree construction and exploration policy. We demonstrate our approach on the ScanNet dataset. Our method often retrieves configurations that are better than some manual annotations especially on layouts.

翻訳日:2021-03-16 14:32:40 公開日:2021-03-14

# 残差説明に基づく新しい解釈可能な教師なし異常検出法

A new interpretable unsupervised anomaly detection method based on residual explanation ( http://arxiv.org/abs/2103.07953v1 )

ライセンス: Link先を確認

David F. N. Oliveira, Lucio F. Vismari, Alexandre M. Nascimento, Jorge R. de Almeida Jr, Paulo S. Cugnasca, Joao B. Camargo Jr, Leandro Almeida, Rafael Gripp, Marcelo Neves

(参考訳) 難しい問題に対処するために複雑なパターンをモデリングする際の優れたパフォーマンスにもかかわらず、Deep Learning(DL)メソッドのブラックボックスの性質は、現実のクリティカルドメインにおけるアプリケーションに制限を課している。ブラックボックスの決定に対する人間の推論を可能にする円滑な方法の欠如は、予期せぬ出来事に対する予防措置を妨げ、破滅的な結果をもたらす可能性がある。ブラックボックスモデルの不明瞭さに取り組むため、解釈性はdlベースのシステムにおいて基本的な要件となり、モデルの振る舞いを理解する方法を提供することで、信頼と知識を活用した。現在のホットなトピックですが、監視されていないDLベースの異常検出モデル(AD)における現在の解釈可能性メソッドの既存の制限を克服するには、さらなる進歩が必要です。オートエンコーダ(AE)は、ADアプリケーションのための教師なしDLベースのコアであり、クラス内で最高のパフォーマンスを達成する。しかし、この結果を得るためのハイブリッドな側面(ネットワーク外での追加計算を必要とする)のため、AEベースのADに適用できるのは非依存の解釈可能な方法のみである。これらの非依存メソッドは、多数のパラメータを処理するのに計算的に高価である。本稿では,大規模システムにおけるAEベースのADの限界に対処する新しい解釈可能性手法であるRXP(Residual eXPlainer)を提案する。実装の単純化、計算コストの低減、および再構成された入力機能の偏差解析によって説明が得られる決定論的な振る舞いが際立っています。実鉄道路線のデータを用いた実験において,提案手法はSHAPよりも優れた性能を示し,大規模クリティカルシステムにおける意思決定を支援する可能性を実証した。

Despite their superior performance in modeling complex patterns to address challenging problems, the black-box nature of Deep Learning (DL) methods imposes limitations on their application in real-world critical domains. The lack of a smooth way of enabling human reasoning about black-box decisions hinders any preventive action against unexpected events, which may lead to catastrophic consequences. To tackle the opacity of black-box models, interpretability has become a fundamental requirement in DL-based systems, building trust and knowledge by providing ways to understand the model's behavior. Although this is currently a hot topic, further advances are still needed to overcome the limitations of current interpretability methods in unsupervised DL-based models for Anomaly Detection (AD). Autoencoders (AE) are the core of unsupervised DL-based AD applications, achieving best-in-class performance. However, because of their hybrid way of obtaining results (requiring additional calculations outside the network), only agnostic interpretability methods can be applied to AE-based AD. These agnostic methods are computationally expensive when processing a large number of parameters. In this paper we present RXP (Residual eXPlainer), a new interpretability method that addresses the limitations of AE-based AD in large-scale systems. It stands out for its implementation simplicity, low computational cost, and deterministic behavior, in which explanations are obtained through deviation analysis of the reconstructed input features. In an experiment using data from a real heavy-haul railway line, the proposed method achieved superior performance compared to SHAP, demonstrating its potential to support decision making in large-scale critical systems.

翻訳日:2021-03-16 14:28:20 公開日:2021-03-14
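
The abstract above says RXP obtains explanations through deviation analysis of the reconstructed input features. A minimal sketch of that general idea, assuming the explanation is a ranking of features by absolute reconstruction residual, might look as follows. The function, the feature names, and the numbers are hypothetical placeholders; the abstract does not specify the exact scoring used by RXP.

```python
def residual_explanation(x, x_reconstructed, feature_names):
    """Rank features by absolute reconstruction residual, largest first.

    Returns (name, residual, share_of_total_residual) tuples.
    """
    residuals = [abs(a - b) for a, b in zip(x, x_reconstructed)]
    total = sum(residuals) or 1.0  # avoid division by zero for a perfect reconstruction
    ranked = sorted(zip(feature_names, residuals), key=lambda item: item[1], reverse=True)
    return [(name, r, r / total) for name, r in ranked]

# Hypothetical sample flagged as anomalous, and its autoencoder reconstruction.
names = ["speed", "pressure", "temperature"]
sample = [0.90, 0.10, 0.50]
reconstruction = [0.20, 0.10, 0.45]
explanation = residual_explanation(sample, reconstruction, names)
top_feature = explanation[0][0]
```

Because the ranking is a pure function of the input and its reconstruction, it is deterministic and costs one pass over the features, which matches the low-cost, deterministic character the abstract claims for this family of explanations.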

# ニューラルネットワークにおける補間前の損失挙動

Pre-interpolation loss behaviour in neural networks ( http://arxiv.org/abs/2103.07986v1 )

ライセンス: Link先を確認

Arthur E. W. Venter and Marthinus W. Theunissen and Marelie H. Davel

(参考訳) ニューラルネットワークを分類器としてトレーニングする場合、同じデータセット上の全体的な分類精度を維持または改善しながら、平均テスト損失の増加を観察することが一般的です。この現象の普遍性にも拘わらず、よく研究されておらず、境界の正しい分類の増加によってしばしば軽視される。本稿では,この現象が実際に試験試料の処理方法の違いの結果であることを示す実験的検討を行う。本質的に: テスト損失は全体として増加しませんが、少数のサンプルのためにだけ。大きい表現容量は他のための極度な増加の費用でテストサンプルの大多数のための損失を減らすことを可能にします。この効果は主に、正しく処理されたサンプルの特徴に関連するパラメータ値の増加に起因すると考えられる。本研究は,ディープニューラルネットワークの共通行動の実用的理解に寄与する。また、この作業がネットワーク最適化と一般化に果たす影響についても議論する。

When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.

翻訳日:2021-03-16 14:27:56 公開日:2021-03-14
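
The abstract above argues that average test loss can rise while accuracy holds because a small minority of samples incurs extreme losses. A small worked example with made-up numbers shows how this arithmetic plays out for cross-entropy loss. The probabilities below are illustrative assumptions for a binary classifier, not data from the paper.

```python
import math

def mean_loss(p_true):
    """Mean cross-entropy, given the probability assigned to the true class."""
    return sum(-math.log(p) for p in p_true) / len(p_true)

def accuracy(p_true):
    """Binary case: a sample is correct when its true class gets more than 0.5."""
    return sum(p > 0.5 for p in p_true) / len(p_true)

# Hypothetical true-class probabilities for 10 test samples at two training stages.
early = [0.6] * 9 + [0.4]     # modest confidence everywhere, one sample wrong
late = [0.99] * 9 + [0.001]   # nine samples very confident, one confidently wrong

early_loss, late_loss = mean_loss(early), mean_loss(late)
early_acc, late_acc = accuracy(early), accuracy(late)
# Count the samples whose individual loss went down between the two stages.
improved = sum(-math.log(l) < -math.log(e) for e, l in zip(early, late))
```

Here nine of the ten per-sample losses decrease and accuracy stays at 0.9, yet the mean loss increases because the single confidently wrong sample dominates the average.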

# ブロック型抽象構文木分割によるコード要約の改善

Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting ( http://arxiv.org/abs/2103.07845v1 )

ライセンス: Link先を確認

Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu

(参考訳) 自動コード要約は、ソフトウェア開発者を手動コメントの重荷から解放し、ソフトウェア開発とメンテナンスに利益をもたらします。ソースコードの構文構造を表現した抽象構文木(AST)がコード要約の生成をガイドするために組み込まれている。しかし、既存のASTベースのメソッドは、トレーニングの難しさに悩まされ、不十分なコード要約を生成する。本稿では、ASTのリッチツリー形式の構文構造をフルに活用し、コード要約を改善するBlock-wise Abstract Syntax Tree Splitting法(略してBASTS)を提案する。 BASTSは、コントロールフローグラフの支配木にあるブロックに基づいてメソッドのコードを分割し、各コード分割に対して分割ASTを生成します。各スプリットASTはTree-LSTMによってモデル化され、プリトレーニング戦略を使用してローカルの非線形シンタックスエンコーディングをキャプチャする。学習したシンタックスエンコーディングはコードエンコーディングと組み合わせ、Transformerにフィードバックされ、高品質のコードサマリを生成します。ベンチマークに関する総合的な実験は、BASTSが様々な評価指標で最先端のアプローチを著しく上回っていることを実証している。再現性を促進するために、私たちの実装はhttps://github.com/XMUDM/BASTSで入手できます。

Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries. However, existing AST based methods suffer from the difficulty of training and generate inadequate code summaries. In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs, for improving code summarization. BASTS splits the code of a method based on the blocks in the dominator tree of the Control Flow Graph, and generates a split AST for each code split. Each split AST is then modeled by a Tree-LSTM using a pre-training strategy to capture local non-linear syntax encoding. The learned syntax encoding is combined with code encoding, and fed into Transformer to generate high-quality code summaries. Comprehensive experiments on benchmarks have demonstrated that BASTS significantly outperforms state-of-the-art approaches in terms of various evaluation metrics. To facilitate reproducibility, our implementation is available at https://github.com/XMUDM/BASTS.

翻訳日:2021-03-16 14:27:29 公開日:2021-03-14

# 連続処理の因果効果学習のためのVCNetと機能目標正規化

VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments ( http://arxiv.org/abs/2103.07861v1 )

ライセンス: Link先を確認

Lizhen Nie, Mao Ye, Qiang Liu, Dan Nicolae

(参考訳) 連続処理による観測データの増大に動機付けられ, 平均線量応答曲線(ADRF)を推定する問題について検討した。利用可能なパラメトリック手法はモデル空間において制限されており、ニューラルネットワークを利用して連続的な処理をブロックに分割し、それぞれのブロックに別々のヘッドを使用することでモデル表現性を向上しようとする以前の試みは、実際には不連続ADRFを生成する。したがって、ADRFを推定するためにニューラルネットワークの構造とトレーニングをどのように適応させるかという問題はまだ開いていません。本稿は2つの重要な貢献を述べる。まず,予測されたADRFの連続性を保ちつつ,モデル表現性を向上させる新しい可変係数ニューラルネットワーク(VCNet)を提案する。第二に、有限サンプル性能を改善するために、ターゲット正規化を一般化し、ADRF曲線全体の二重に堅牢な推定値を得る。

Motivated by the rising abundance of observational data with continuous treatments, we investigate the problem of estimating the average dose-response curve (ADRF). Available parametric methods are limited in their model space, and previous attempts in leveraging neural network to enhance model expressiveness relied on partitioning continuous treatment into blocks and using separate heads for each block; this however produces in practice discontinuous ADRFs. Therefore, the question of how to adapt the structure and training of neural network to estimate ADRFs remains open. This paper makes two important contributions. First, we propose a novel varying coefficient neural network (VCNet) that improves model expressiveness while preserving continuity of the estimated ADRF. Second, to improve finite sample performance, we generalize targeted regularization to obtain a doubly robust estimator of the whole ADRF curve.

翻訳日:2021-03-16 14:25:44 公開日:2021-03-14
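
The abstract above describes a varying coefficient network whose estimated dose-response curve stays continuous in the treatment. One way to picture this, as a sketch under assumptions, is a linear head whose weights are smooth functions of the treatment value t built from a spline-style basis. The truncated-power basis, the knot positions, and the coefficient values below are choices made for this example, not details taken from the abstract.

```python
def basis(t, knots=(0.33, 0.66)):
    """Truncated-power basis: 1, t, t^2, and (t - knot)_+^2 for each knot."""
    return [1.0, t, t * t] + [max(t - k, 0.0) ** 2 for k in knots]

def varying_weights(t, coeffs):
    """Each weight is a basis expansion in t, so it changes continuously with t."""
    phi = basis(t)
    return [sum(a * p for a, p in zip(row, phi)) for row in coeffs]

def predict(features, t, coeffs):
    """Linear head whose weights depend on the continuous treatment t."""
    weights = varying_weights(t, coeffs)
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical coefficients: one row of basis coefficients per input feature.
coeffs = [
    [0.5, 1.0, -0.5, 2.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0, -3.0],
]
features = [1.0, 2.0]
at_zero = predict(features, 0.0, coeffs)
# Evaluate just before and just after a knot to check there is no jump.
gap_at_knot = abs(predict(features, 0.33, coeffs) - predict(features, 0.33 + 1e-6, coeffs))
```

In contrast to splitting the treatment range into blocks with a separate head per block, there is no boundary at which the prediction can jump, since every weight is a continuous function of t.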

# Von Mises-Fisher楕円分布

Von Mises-Fisher Elliptical Distribution ( http://arxiv.org/abs/2103.07948v1 )

ライセンス: Link先を確認

Shengxi Li, Danilo Mandic

(参考訳) 現代の確率的学習システムの大きなクラスは対称分布を仮定しているが、実世界のデータは歪分布に従う傾向にあり、したがって対称分布を通じて適切にモデル化されるとは限らない。この問題に対処するため、楕円分布は対称分布の一般化にますます使われており、近位楕円分布のさらなる改善が注目されている。しかし、既存のアプローチは見積もりが難しいか、複雑で抽象的な表現を持っている。そこで本研究では,vMF(Von-Mises-Fisher)分布を用いて,スキュー楕円分布の明確かつ簡便な確率表現を提案する。これは、非対称学習システムに対処できるだけでなく、歪んだ分布を一般化するための物理的に意味のある方法を提供するためにも示される。厳密さのために、私達の拡張は対称同等と重要で望ましい特性を共有することが証明されます。また,提案するvmf分布は,理論上および実例を通じて,生成が容易であり,推定が安定であることを示す。

A large class of modern probabilistic learning systems assumes symmetric distributions, however, real-world data tend to obey skewed distributions and are thus not always adequately modelled through symmetric distributions. To address this issue, elliptical distributions are increasingly used to generalise symmetric distributions, and further improvements to skewed elliptical distributions have recently attracted much attention. However, existing approaches are either hard to estimate or have complicated and abstract representations. To this end, we propose to employ the von-Mises-Fisher (vMF) distribution to obtain an explicit and simple probability representation of the skewed elliptical distribution. This is shown not only to allow us to deal with non-symmetric learning systems, but also to provide a physically meaningful way of generalising skewed distributions. For rigour, our extension is proved to share important and desirable properties with its symmetric counterpart. We also demonstrate that the proposed vMF distribution is both easy to generate and stable to estimate, both theoretically and through examples.

翻訳日:2021-03-16 14:25:28 公開日:2021-03-14

# すべての報酬を最適化するための1つの表現を学ぶ

Learning One Representation to Optimize All Rewards ( http://arxiv.org/abs/2103.07945v1 )

ライセンス: Link先を確認

Ahmed Touati and Yann Ollivier

(参考訳) 我々は,報酬のないマルコフ決定プロセスのダイナミクスのフォワードバックワード(fb)表現を紹介する。後尾に指定された報酬に対して、明確な準最適ポリシーを提供する。教師なしのフェーズでは,既成の深層学習法と時間差学習(TD)を用いて,環境との報酬のないインタラクションを用いて2つの表現を学習する。試験段階では、報酬表現は、観察または明示的な報酬記述(例えば、目標状態)から推定される。その報酬の最適方針は、これらの表現から直接得られるが、計画はない。教師なしのFB損失は十分に優先されます:トレーニングが完璧であれば、得られたポリシーはどんな報酬機能にも最適です。不完全なトレーニングでは、副最適性は教師なし近似誤差に比例する。 FB表現は、モデルベースのアプローチのように状態を合成することなく、予測占有マップを介して状態と行動の間の長距離関係を学習する。これは任意のブラックボックス確率環境で制御可能なエージェントを学ぶためのステップである。このアプローチは、離散迷路および連続迷路上の目標指向RLアルゴリズム、ピクセルベースのMsPacman、およびFetchReach仮想ロボットアームとよく比較します。また、エージェントが目標指向RLを超える新しいタスクに即座に適応する方法も説明します。

We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state). The optimal policy for that reward is directly obtained from these representations, with no planning. The unsupervised FB loss is well-principled: if training is perfect, the policies obtained are provably optimal for any reward function. With imperfect training, the sub-optimality is proportional to the unsupervised approximation error. The FB representation learns long-range relationships between states and actions, via a predictive occupancy map, without having to synthesize states as in model-based approaches. This is a step towards learning controllable agents in arbitrary black-box stochastic environments. This approach compares well to goal-oriented RL algorithms on discrete and continuous mazes, pixel-based MsPacman, and the FetchReach virtual robot arm. We also illustrate how the agent can immediately adapt to new tasks beyond goal-oriented RL.

翻訳日:2021-03-16 14:20:34 公開日:2021-03-14
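
The abstract above says that, at test time, a reward representation is estimated and the policy is then read off directly from the learned representations with no planning. A minimal sketch of that test-time step is given below, assuming the reward embedding is the average of r(s) B(s) over states and the policy picks the action maximizing the inner product of F(s, a, z) with z. These formulas are not spelled out in the abstract and are labeled here as assumptions; the tables F and B are hand-written toys rather than learned networks.

```python
def reward_embedding(rewards, B):
    """z = mean over states of r(s) * B(s); a uniform state distribution is assumed."""
    dim = len(next(iter(B.values())))
    z = [0.0] * dim
    for state, r in rewards.items():
        for i in range(dim):
            z[i] += r * B[state][i] / len(rewards)
    return z

def greedy_action(state, actions, F, z):
    """Pick the action maximizing the inner product F(state, action, z) . z."""
    def q_value(action):
        return sum(f * zi for f, zi in zip(F(state, action, z), z))
    return max(actions, key=q_value)

# Toy backward representation: one embedding per state (hypothetical values).
B = {"left": [1.0, 0.0], "right": [0.0, 1.0]}

def F(state, action, z):
    # Toy forward representation that ignores state and z (hypothetical).
    return [1.0, 0.0] if action == "go_left" else [0.0, 1.0]

# A reward specified only at test time: being in the "right" state is rewarded.
z = reward_embedding({"left": 0.0, "right": 1.0}, B)
action = greedy_action("left", ["go_left", "go_right"], F, z)
```

Changing the reward only changes z, so a new task is handled by recomputing one vector and taking an argmax, with no further learning in this sketch.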

# SemVLP:複数のレベルでセマンティクスをアライメントするビジョン言語前訓練

SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels ( http://arxiv.org/abs/2103.07829v1 )

ライセンス: Link先を確認

Chenliang Li, Ming Yan, Haiyang Xu, Fuli Luo, Wei Wang, Bin Bi, Songfang Huang

(参考訳) 大規模画像テキストペア上での視覚言語事前学習(VLP)は、最近、クロスモーダル表現の学習の急速な進歩を目撃している。既存の事前学習手法は、単一ストリームトランスフォーマーへの入力として機能レベルで画像表現とテキスト表現を直接結合するか、2ストリームのクロスモーダルトランスフォーマーを使用して、画像テキスト表現を高レベルなセマンティック空間で整列させる。実世界の画像テキストデータでは、画像とテキストのペアが両方のモダリティに単純なセマンティクスをアライメントするのは容易である。そこで本稿では,画像とテキスト表現の低レベルと高レベルのセマンティクスを協調的に調整する,新しい事前学習手法SemVLPを提案する。モデルは2つの一般的な方法で事前訓練される: 単一ストリームの事前訓練きめ細かい特徴レベルでの調整および2ストリームの事前訓練ハイレベルセマンティクスの整合 ; プラグ可能なクロスモーダルアテンションモジュールを備えた共有トランスフォーマーネットワークを利用する。提案したSemVLPの有効性を実証するために、4つのよく確立された視覚言語理解タスクについて、多岐にわたる実験を行った。

Vision-language pre-training (VLP) on large-scale image-text pairs has recently witnessed rapid progress for learning cross-modal representations. Existing pre-training methods either directly concatenate image representation and text representation at a feature level as input to a single-stream Transformer, or use a two-stream cross-modal Transformer to align the image-text representation at a high-level semantic space. In real-world image-text data, we observe that it is easy for some of the image-text pairs to align simple semantics on both modalities, while others may be related after higher-level abstraction. Therefore, in this paper, we propose a new pre-training method SemVLP, which jointly aligns both the low-level and high-level semantics between image and text representations. The model is pre-trained iteratively with two prevalent fashions: single-stream pre-training to align at a fine-grained feature level and two-stream pre-training to align high-level semantics, by employing a shared Transformer network with a pluggable cross-modal attention module. An extensive set of experiments have been conducted on four well-established vision-language understanding tasks to demonstrate the effectiveness of the proposed SemVLP in aligning cross-modal representations towards different semantic granularities.

翻訳日:2021-03-16 14:15:39 公開日:2021-03-14

# 「ソースフル」なツイスト:感情、ハッシュタグ、およびアプリケーションソースに基づく絵文字予測

A `Sourceful' Twist: Emoji Prediction Based on Sentiment, Hashtags and Application Source ( http://arxiv.org/abs/2103.07833v1 )

ライセンス: Link先を確認

Pranav Venkit, Zeba Karishma, Chi-Yang Hsu, Rahul Katiki, Kenneth Huang, Shomir Wilson, Patrick Dudas

(参考訳) 私たちはソーシャルネットワークにおいて、テキストの感情を強めたり、和らげたり、打ち消したりするために絵文字を広く使っている。絵文字の提案機能はすでに多くのクロスプラットフォームアプリケーションに存在するが、絵文字はテキストの主題や内容を理解した上でではなく、少数の目立つ単語のみに基づいて予測されている。本論文では、モデルが関連する感情を理解し、テキストに最も適した絵文字を予測できるようにするために、Twitterの特徴量を利用することの重要性を示す。ハッシュタグと、Androidなどのアプリケーションソースは、重要でありながら絵文字予測やTwitterの感情分析全般であまり活用されていないことが分かった2つの特徴量である。この欠点に対処し、絵文字の行動パターンをさらに理解するために、タイムスタンプ、ハッシュタグ、アプリケーションソースといったツイートの追加属性を含むTwitterデータを追加でクロールし、よりバランスのとれたデータセットを提案する。データ分析とニューラルネットワークモデルの性能評価から、ハッシュタグとアプリケーションソースを特徴量として用いることで異なる情報をエンコードでき、絵文字予測に有効であることが示された。

We widely use emojis in social networking to heighten, mitigate or negate the sentiment of the text. Emoji suggestions already exist in many cross-platform applications, but an emoji is predicted solely based on a few prominent words instead of on an understanding of the subject and substance of the text. Through this paper, we showcase the importance of using Twitter features to help the model understand the sentiment involved and hence to predict the most suitable emoji for the text. Hashtags and application sources such as Android are two features which we found to be important yet underused in emoji prediction and Twitter sentiment analysis on the whole. To address this shortcoming and to further understand emoji behavioral patterns, we propose a more balanced dataset by crawling additional Twitter data, including timestamp, hashtags, and application source acting as additional attributes to the tweet. Our data analysis and neural network model performance evaluations show that using hashtags and application sources as features allows the model to encode different information and is effective in emoji prediction.

翻訳日:2021-03-16 14:15:14 公開日:2021-03-14

# 道路・気象条件の異なる自動運転におけるカリキュラム強化学習の価値の検討

Investigating Value of Curriculum Reinforcement Learning in Autonomous Driving Under Diverse Road and Weather Conditions ( http://arxiv.org/abs/2103.07903v1 )

ライセンス: Link先を確認

Anil Ozturk, Mustafa Burak Gunel, Resul Dagdanov, Mirac Ekim Vural, Ferhat Yurdakul, Melih Dal, Nazim Kemal Ure

(参考訳) 強化学習(RL)の応用は自動運転タスクで人気がある。とはいえ、RLエージェントのパフォーマンスをチューニングし、さまざまな運転シナリオで一般化のパフォーマンスを保証することは、依然として大きな問題です。特に、複雑な道路や気象条件で優れた性能を得るには、徹底的なチューニングと計算時間が必要である。複雑なタスクに知識を移すため、簡単な自動化タスクの解決に重点を置くカリキュラムRLは、RLコミュニティで注目を集めている。本論文の主な貢献は、自動運転アプリケーションにおけるカリキュラム強化学習の価値を調査するための体系的研究である。本研究の目的は,道路の複雑度や気象条件の異なる実走行シミュレータにおいて,複数の異なる運転シナリオをセットアップすることである。次に、タスクの組み合わせとカリキュラムの異なるシーケンスでRLエージェントの性能を訓練し、評価する。その結果、カリキュラムRLは、運転性能とサンプルの複雑さの両方の観点から、複雑な運転タスクで有意な利益を得ることができます。結果は、異なるカリキュラムが異なるメリットをもたらす可能性があることも示しており、これは自動カリキュラムトレーニングの今後の研究方向性を示唆している。

Applications of reinforcement learning (RL) are popular in autonomous driving tasks. That being said, tuning the performance of an RL agent and guaranteeing the generalization performance across a variety of different driving scenarios is still largely an open problem. In particular, getting good performance on complex road and weather conditions requires exhaustive tuning and computation time. Curriculum RL, which focuses on solving simpler automation tasks in order to transfer knowledge to complex tasks, is attracting attention in the RL community. The main contribution of this paper is a systematic study for investigating the value of curriculum reinforcement learning in autonomous driving applications. For this purpose, we set up several different driving scenarios in a realistic driving simulator, with varying road complexity and weather conditions. Next, we train and evaluate the performance of RL agents on different sequences of task combinations and curricula. Results show that curriculum RL can yield significant gains in complex driving tasks, both in terms of driving performance and sample complexity. Results also demonstrate that different curricula might enable different benefits, which hints at future research directions for automated curriculum training.

翻訳日:2021-03-16 14:13:58 公開日:2021-03-14

# Cycle4Completion:Missing Region Codingを用いたCycle Transformationによる不対点クラウド補完

Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding ( http://arxiv.org/abs/2103.07838v1 )

ライセンス: Link先を確認

Xin Wen and Zhizhong Han and Yan-Pei Cao and Pengfei Wan and Wen Zheng and Yu-Shen Liu

(参考訳) 本稿では,部分3dオブジェクトから全測地線を推定するcycle4completionという,新しい非ペアレッド点クラウド補完ネットワークを提案する。従来未完成な完成法は、不完全な形状から完全な形状への幾何学的対応の学習にのみ焦点を合わせ、逆方向の学習を無視することで、3次元形状理解能力の制限による完成精度の低下を招いた。そこで本研究では, 完全形状の潜在空間と不完全空間の2つの周期変換を提案する。サイクル変換の洞察は、ネットワークが相補的な形状から完全または不完全な形状を生成するように学習することで、3d形状を理解するよう促進することである。具体的には、最初のサイクルは不完全ドメインから完全ドメインへ形を変換し、その後不完全ドメインに投影する。このプロセスは完全形状の幾何学的特徴を学習し、完全予測と不完全入力の間の形状整合性を維持する。同様に、逆サイクル変換は完全なドメインから不完全なドメインへ始まり、不完全なシェイプの特徴を学ぶために完全なドメインに戻ります。実験の包括的評価を行い、学習した双方向形状対応モデルが、最先端の非ペアリング補完法よりも優れていることを示す。

In this paper, we present a novel unpaired point cloud completion network, named Cycle4Completion, to infer the complete geometries from a partial 3D object. Previous unpaired completion methods merely focus on the learning of geometric correspondence from incomplete shapes to complete shapes, and ignore the learning in the reverse direction, which makes them suffer from low completion accuracy due to the limited 3D shape understanding ability. To address this problem, we propose two simultaneous cycle transformations between the latent spaces of complete shapes and incomplete ones. The insight of cycle transformation is to promote networks to understand 3D shapes by learning to generate complete or incomplete shapes from their complementary ones. Specifically, the first cycle transforms shapes from incomplete domain to complete domain, and then projects them back to the incomplete domain. This process learns the geometric characteristic of complete shapes, and maintains the shape consistency between the complete prediction and the incomplete input. Similarly, the inverse cycle transformation starts from complete domain to incomplete domain, and goes back to complete domain to learn the characteristic of incomplete shapes. We provide a comprehensive evaluation in experiments, which shows that our model with the learned bidirectional geometry correspondence outperforms state-of-the-art unpaired completion methods.

翻訳日:2021-03-16 14:11:39 公開日:2021-03-14

# マルチモーダル軌道予測のための3つのステップ:モダリティクラスタリング、分類、合成

Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis ( http://arxiv.org/abs/2103.07854v1 )

ライセンス: Link先を確認

Jianhua Sun, Yuxuan Li, Hao-Shu Fang, Cewu Lu

(参考訳) 軌道予測タスクには,未来に対する正しい答えが1つもないため,マルチモーダル予測結果が不可欠である。以前のフレームワークは、回帰、生成、分類の3つのカテゴリに分けられる。しかし、これらのフレームワークは異なる側面に弱点があり、マルチモーダル予測タスクを包括的にモデル化できない。本稿では,マルチモーダル予測を3つのステップ(モダリティクラスタリング,分類と合成)に定式化し,それ以前のフレームワークの欠点に対処することにより,新しい予測フレームワークとともに新しい洞察を提案する。提案手法は,社会情報や地図情報を導入することなく,最先端の手法を超越することを示した。具体的には、ETH/UCYデータセットでADEとFDEをそれぞれ19.2%、20.8%改善する。私たちのコードは公開されます。

Multimodal prediction results are essential for trajectory forecasting task as there is no single correct answer for the future. Previous frameworks can be divided into three categories: regression, generation and classification frameworks. However, these frameworks have weaknesses in different aspects so that they cannot model the multimodal prediction task comprehensively. In this paper, we present a novel insight along with a brand-new prediction framework by formulating multimodal prediction into three steps: modality clustering, classification and synthesis, and address the shortcomings of earlier frameworks. Exhaustive experiments on popular benchmarks have demonstrated that our proposed method surpasses state-of-the-art works even without introducing social and map information. Specifically, we achieve 19.2% and 20.8% improvement on ADE and FDE respectively on ETH/UCY dataset. Our code will be made publicly available.

翻訳日:2021-03-16 14:11:15 公開日:2021-03-14
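
The ADE and FDE figures quoted in the entry above are displacement errors between predicted and ground-truth trajectories. The following is a minimal sketch of how these two metrics are commonly computed, assuming trajectories are arrays of 2D positions; the function names and the best-of-K variant are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and final displacement error for one trajectory.

    pred, gt: array-likes of shape (T, 2) holding T future positions.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dists = np.linalg.norm(pred - gt, axis=-1)  # Euclidean error at each step
    return float(dists.mean()), float(dists[-1])

def min_ade_fde(preds, gt):
    """Best-of-K variant often used for multimodal predictors (an assumption here).

    preds: array-like of shape (K, T, 2), one trajectory per predicted mode.
    """
    preds = np.asarray(preds, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dists = np.linalg.norm(preds - gt[None], axis=-1)  # shape (K, T)
    return float(dists.mean(axis=1).min()), float(dists[:, -1].min())
```

ADE averages the per-step error over the whole horizon, while FDE looks only at the last step, so a method can improve one without improving the other.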

# 複数オブジェクト追跡のための提案分類器の学習

Learning a Proposal Classifier for Multiple Object Tracking ( http://arxiv.org/abs/2103.07889v1 )

ライセンス: Link先を確認

Peng Dai and Renliang Weng and Wongun Choi and Changshui Zhang and Zhangping He and Wei Ding

(参考訳) マルチオブジェクトトラッキング(MOT)の最近のトレンドは、トラッキングパフォーマンスを高めるためにディープラーニングを活用することに向かっています。しかし、データ結合問題をエンドツーエンドで解くことは自明ではない。本稿では,MOTを提案生成,提案スコアリング,トラジェクティブ推論パラダイムとしてアフィニティグラフ上にモデル化した,提案に基づく学習可能なフレームワークを提案する。このフレームワークは、2段階のオブジェクト検出器Faster RCNNに似ており、データ駆動の方法でMOT問題を解決することができる。提案生成のために,生成した提案の品質を維持しながら計算コストを削減するための反復グラフクラスタリング手法を提案する。提案手法は,提案する提案の構造パターンを学習し,評価された品質スコアに従ってランク付けするために,トレーニング可能なグラフ畳み込みネットワーク(GCN)をデプロイする。軌道推論では、複数のトラックに検出を割り当てることができないという制約に従いながら、追跡出力を生成するためのシンプルなオーバーラップ戦略を採用しています。提案手法は,従来の2つの公開ベンチマークにおいて,MOTAとIDF1の両性能改善を実現することを実験的に実証した。コードは \url{https://github.com/daip13/LPC_MOT.git} で入手できます。

The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. However, it is not trivial to solve the data-association problem in an end-to-end fashion. In this paper, we propose a novel proposal-based learnable framework, which models MOT as a proposal generation, proposal scoring and trajectory inference paradigm on an affinity graph. This framework is similar to the two-stage object detector Faster RCNN, and can solve the MOT problem in a data-driven way. For proposal generation, we propose an iterative graph clustering method to reduce the computational cost while maintaining the quality of the generated proposals. For proposal scoring, we deploy a trainable graph-convolutional-network (GCN) to learn the structural patterns of the generated proposals and rank them according to the estimated quality scores. For trajectory inference, a simple deoverlapping strategy is adopted to generate tracking output while complying with the constraints that no detection can be assigned to more than one track. We experimentally demonstrate that the proposed method achieves a clear performance improvement in both MOTA and IDF1 with respect to previous state-of-the-art on two public benchmarks. Our code is available at \url{https://github.com/daip13/LPC_MOT.git}.

翻訳日:2021-03-16 14:11:00 公開日:2021-03-14
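
The trajectory-inference step in the entry above requires that no detection be assigned to more than one track. One simple way to satisfy that constraint is a greedy pass over proposals ranked by score, sketched below. This is an illustrative assumption about what a deoverlapping strategy can look like, not the authors' implementation; `deoverlap` and the `(score, detection_ids)` proposal format are hypothetical.

```python
def deoverlap(proposals):
    """Greedy deoverlapping sketch.

    proposals: list of (score, detection_ids) pairs, where detection_ids is
    an iterable of hashable detection identifiers.
    Returns a list of tracks (sorted lists of detection ids) such that every
    detection appears in at most one track.
    """
    used = set()
    tracks = []
    # Visit proposals from the highest to the lowest quality score.
    for score, dets in sorted(proposals, key=lambda p: p[0], reverse=True):
        remaining = set(dets) - used  # drop detections already claimed
        if remaining:
            tracks.append(sorted(remaining))
            used |= remaining
    return tracks
```

For example, with proposals `(0.9, {1, 2, 3})`, `(0.8, {3, 4})` and `(0.5, {1, 2})`, the first proposal is kept whole, the second keeps only detection 4, and the third is discarded.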

# DivCo: Contrastive Generative Adversarial Networkによる多様な条件付き画像合成

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network ( http://arxiv.org/abs/2103.07893v1 )

ライセンス: Link先を確認

Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

(参考訳) conditional generative adversarial networks (cgans) は、入力条件と潜在コードから様々なイメージを合成することを目標としているが、残念ながらモード崩壊の問題に苦しむ。この問題を解決するため、従来の研究は主に、様々な潜在コードから生成された画像間の関係を無視しながら、潜在コードと生成画像の相関関係を奨励することに焦点を当てていた。最近のMSGANは生成した画像の多様性を奨励しようとしたが、画像ペア間の"負の関係"のみを考慮していた。本稿では,潜在空間で指定された生成画像間の「正」と「負」の関係を適切に制約する新しいDivCoフレームワークを提案する。私たちの知る限りでは、これは様々な条件付き画像合成にコントラスト学習を使用する最初の試みです。隣接する潜時符号から生成された画像と、異なる潜時符号から生成された画像とが類似することを奨励する、新規な潜時拡張コントラスト損失が導入される。提案された遅発性コントラスト損失は、様々なcGANアーキテクチャとよく互換性がある。広範な実験により、提案されたDivCoは、複数の無対およびペアの画像生成タスクで視覚的品質を犠牲にすることなく、最先端の方法よりも多様な画像を生成することができることが実証された。

Conditional generative adversarial networks (cGANs) aim at synthesizing diverse images given the input conditions and latent codes, but unfortunately, they usually suffer from the issue of mode collapse. To solve this issue, previous works mainly focused on encouraging the correlation between the latent codes and their generated images, while ignoring the relations between images generated from various latent codes. The recent MSGAN tried to encourage the diversity of the generated images but only considered "negative" relations between the image pairs. In this paper, we propose a novel DivCo framework to properly constrain both "positive" and "negative" relations between the generated images specified in the latent space. To the best of our knowledge, this is the first attempt to use contrastive learning for diverse conditional image synthesis. A novel latent-augmented contrastive loss is introduced, which encourages images generated from adjacent latent codes to be similar and those generated from distinct latent codes to be dissimilar. The proposed latent-augmented contrastive loss is compatible with various cGAN architectures. Extensive experiments demonstrate that the proposed DivCo can produce more diverse images than state-of-the-art methods without sacrificing visual quality in multiple unpaired and paired image generation tasks.

翻訳日:2021-03-16 14:10:42 公開日:2021-03-14
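
The contrastive idea in the entry above (pull together images generated from nearby latent codes, push apart images generated from distant ones) can be illustrated with a generic InfoNCE-style loss over feature vectors. This is a minimal sketch under assumptions: the function name, the cosine similarity, and the temperature `tau` are illustrative choices, and the paper's exact loss may differ.

```python
import numpy as np

def info_nce(query, positive, negatives, tau=1.0):
    """InfoNCE-style contrastive loss on feature vectors.

    query:     (D,) feature of an image generated from latent code z
    positive:  (D,) feature of an image generated from a code close to z
    negatives: (N, D) features of images generated from codes far from z
    tau:       temperature (hypothetical default)
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = unit(np.asarray(query, dtype=float))
    pos = unit(np.asarray(positive, dtype=float))
    negs = unit(np.asarray(negatives, dtype=float))

    # Cosine similarities; index 0 is the positive pair.
    logits = np.concatenate([[q @ pos], negs @ q]) / tau
    logits -= logits.max()  # numerical stability
    # Negative log-probability of picking the positive among all candidates.
    return float(-logits[0] + np.log(np.exp(logits).sum()))
```

The loss is small when the query is closest to its positive and grows when a negative is more similar to the query than the positive is.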

# Refer-it-in-RGBD:RGBD画像における3次元視覚グラウンドのボトムアップアプローチ

Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images ( http://arxiv.org/abs/2103.07894v1 )

ライセンス: Link先を確認

Haolin Liu, Anran Lin, Xiaoguang Han, Lei Yang, Yizhou Yu, Shuguang Cui

(参考訳) RGBD画像における接地参照表現は新たな分野である。本稿では,参照する物体が閉塞により部分的にスキャンされる場合が多い単視点rgbd画像における3次元視覚グランド化の新たな課題を提案する。 3Dシーンに接地するためのオブジェクト提案を直接生成する従来の作業とは対照的に,コンテキスト認識情報を段階的に集約するボトムアップ手法を提案し,部分幾何学による課題に効果的に対処する。我々のアプローチは、まず言語と視覚機能をボトムレベルに融合させ、rgbdイメージ内の関連領域を粗くローカライズするヒートマップを生成する。次に、ヒートマップに基づく適応的特徴学習を行い、他のビジオ言語融合とオブジェクトレベルのマッチングを行い、最後に参照したオブジェクトを接地する。提案手法は,ScanReferデータセットから抽出したRGBD画像と新たに収集したSUNReferデータセットとを比較して評価する。実験では、両データセットの以前の手法(11.2%と15.6%のAcc@0.5)を上回った。

Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in single-view RGBD images where the referred objects are often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in the 3D scenes, we propose a bottom-up approach to gradually aggregate context-aware information, effectively addressing the challenge posed by the partial geometry. Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGBD image. Then our approach conducts adaptive feature learning based on the heatmap and performs object-level matching with another visio-linguistic fusion to finally ground the referred object. We evaluate the proposed method by comparing it with the state-of-the-art methods on both the RGBD images extracted from the ScanRefer dataset and our newly collected SUNRefer dataset. Experiments show that our method outperforms the previous methods by a large margin (by 11.2% and 15.6% Acc@0.5) on both datasets.

翻訳日:2021-03-16 14:10:20 公開日:2021-03-14

# Bag-of-local-featureによる汎化可能でロバストな顔操作検出に向けて

Towards Generalizable and Robust Face Manipulation Detection via Bag-of-local-feature ( http://arxiv.org/abs/2103.07915v1 )

ライセンス: Link先を確認

Changtao Miao, Qi Chu, Weihai Li, Tao Gong, Wanyi Zhuang and Nenghai Yu

(参考訳) 過去数年間、顔操作技術の悪質な虐待の問題を解決するために、顔操作検出技術はかなりの注目を集め、顕著な進歩を達成しました。しかし、既存の手法の多くは一般化能力と堅牢性が非常に貧弱である。本稿では,局所的特徴量による一般化とロバスト性を向上させるための新しい顔操作検出法を提案する。具体的には、パッチ間関係をエンコードするためにbag-of-featureアプローチを使ってトランスフォーマーを拡張し、明示的な監督なしにローカルな偽造機能を学ぶことができる。広範な実験では、FaceForensics++、Celeb-DF、DeeperForensics-1.0データセットの競合する最先端のメソッドを上回ります。

Over the past several years, in order to solve the problem of malicious abuse of facial manipulation technology, face manipulation detection technology has obtained considerable attention and achieved remarkable progress. However, most existing methods generalize poorly and lack robustness. In this paper, we propose a novel method for face manipulation detection, which can improve the generalization ability and robustness via bag-of-local-feature. Specifically, we extend Transformers using a bag-of-feature approach to encode inter-patch relationships, allowing them to learn local forgery features without any explicit supervision. Extensive experiments demonstrate that our method can outperform competing state-of-the-art methods on the FaceForensics++, Celeb-DF and DeeperForensics-1.0 datasets.

翻訳日:2021-03-16 14:10:00 公開日:2021-03-14

# 動的雨生成器を用いた半教師ありビデオ雨除去

Semi-Supervised Video Deraining with Dynamic Rain Generator ( http://arxiv.org/abs/2103.07939v1 )

ライセンス: Link先を確認

Zongsheng Yue, Jianwen Xie, Qian Zhao, Deyu Meng

(参考訳) 近年,深層学習(DL)に基づくビデオデライニング手法は大きな成功を収めているが,大きな欠点は2つある。第一に、雨の層の特徴を十分にモデル化していないものが多い。実際、雨の層は空間次元の強い物理的性質(例えば、方向、スケールおよび厚さ)および時間次元の自然な連続性を示し、統計学の空間時間過程によって一般にモデル化することができる。第二に、現在のdlベースの手法はラベル付き合成トレーニングデータに真剣に依存しており、雨種は常にラベルなしの実データと切り離されている。このような合成データセットと実際のデータセットのギャップは、実際のシナリオに適用する際のパフォーマンスの低下につながります。そこで本論文では,雨の層に適応する動的雨発生器を用いて,その洞察力のある特性をよりよく表現する,新しい半監視型ビデオ脱雨法を提案する。具体的には、1つの放出モデルと1つの遷移モデルからなり、それぞれ深層ニューラルネットワーク(DNN)としてパラメータ化される雨のストリークの空間的物理的構造と時間的連続的な変化を同時に符号化する。さらに、ラベル付き合成およびラベルなしの実データに対して、それらの基礎となる共通知識を十分に活用するために、異なる先行フォーマットが設計されている。最後に、我々はまた、このモデルを解決するためにモンテカルロEMアルゴリズムを設計します。提案した半教師付きデラライニングモデルの優位性を検証するため, 大規模実験を行った。

While deep learning (DL)-based video deraining methods have achieved significant success recently, they still have two major drawbacks. Firstly, most of them do not sufficiently model the characteristics of the rain layers of rainy videos. In fact, the rain layers exhibit strong physical properties (e.g., direction, scale and thickness) in the spatial dimension and natural continuities in the temporal dimension, and thus can be generally modelled by a spatial-temporal process in statistics. Secondly, current DL-based methods depend heavily on labeled synthetic training data, whose rain types always deviate from those in unlabeled real data. Such a gap between synthetic and real data sets leads to poor performance when applying them in real scenarios. To address these issues, this paper proposes a new semi-supervised video deraining method, in which a dynamic rain generator is employed to fit the rain layer, expecting to better depict its insightful characteristics. Specifically, such a dynamic generator consists of one emission model and one transition model to simultaneously encode the spatially physical structure and temporally continuous changes of rain streaks, respectively, both of which are parameterized as deep neural networks (DNNs). Furthermore, different prior formats are designed for the labeled synthetic and unlabeled real data, so as to fully exploit the common knowledge underlying them. Last but not least, we also design a Monte Carlo EM algorithm to solve this model. Extensive experiments are conducted to verify the superiority of the proposed semi-supervised deraining model.

翻訳日:2021-03-16 14:09:48 公開日:2021-03-14

# Modular Interactive Video Object Segmentation:Interaction-to-Mask, Propagation and difference-Aware Fusion

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion ( http://arxiv.org/abs/2103.07941v1 )

ライセンス: Link先を確認

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

(参考訳) マスク間相互作用とマスク伝搬を分離し,より汎用性と優れた性能を実現するモジュール型対話型VOS(MiVOS)フレームワークを提案する。個別にトレーニングされたインタラクションモジュールは,ユーザインタラクションをオブジェクトマスクに変換して,時空間メモリを読み取るための新しいトップ-k$フィルタ戦略を用いて,伝搬モジュールによって時間的に伝搬する。ユーザの意図を効果的に考慮するために、時空メモリを用いてターゲットフレームに整列した各インタラクションの前後にマスクを適切に融合する方法を学ぶための新しい差分認識モジュールが提案される。 DAVIS上でのユーザインタラクション(例えば、スクリブル、クリック)の定性的および定量的に評価し、この手法が現在の最先端のアルゴリズムを上回る一方で、フレームインタラクションを少なくし、さまざまなタイプのユーザーインタラクションを一般化する利点があることを示した。ソースコードに付随する4.8Mフレームのピクセル精度のセグメンテーションによる大規模合成VOSデータセットを提供し、今後の研究を促進しています。

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-$k$ filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code to facilitate future research.

翻訳日:2021-03-16 14:09:23 公開日:2021-03-14
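
The top-$k$ filtering mentioned in the entry above can be pictured as an attention read in which each query location only attends to its $k$ best-matching memory entries. The sketch below is an assumption-laden illustration, not the MiVOS code: the function name, the array shapes, and the softmax over the retained scores are all hypothetical choices.

```python
import numpy as np

def topk_memory_read(affinity, values, k):
    """Read from a memory using only the k strongest matches per query.

    affinity: (M, Q) similarity between M memory entries and Q query locations
    values:   (M, C) value vectors stored in memory
    k:        number of memory entries kept per query (k >= 1)
    Returns an array of shape (Q, C).
    """
    affinity = np.asarray(affinity, dtype=float)
    values = np.asarray(values, dtype=float)
    m = affinity.shape[0]
    k = min(k, m)

    # Row indices of the k largest affinities in every query column.
    top_idx = np.argpartition(-affinity, k - 1, axis=0)[:k]      # (k, Q)
    top_val = np.take_along_axis(affinity, top_idx, axis=0)      # (k, Q)

    # Softmax restricted to the retained entries.
    w = np.exp(top_val - top_val.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)

    # Scatter the weights back; all filtered-out entries keep weight 0.
    weights = np.zeros_like(affinity)
    np.put_along_axis(weights, top_idx, w, axis=0)
    return weights.T @ values
```

Compared with a softmax over all memory entries, this keeps weakly matching entries from contributing at all, which is one plausible reading of why such filtering reduces noise in the read-out.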

# TransFG:微細粒度認識のためのトランスフォーマーアーキテクチャ

TransFG: A Transformer Architecture for Fine-grained Recognition ( http://arxiv.org/abs/2103.07976v1 )

ライセンス: Link先を確認

Ju He, Jieneng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille

(参考訳) サブカテゴリからオブジェクトを認識することを目的とした細粒度視覚分類(FGVC)は、本質的に微妙なクラス間差のため、非常に難しい課題である。近年の研究では、最も差別的な画像領域の特定に焦点をあて、ネットワークの微妙なばらつきを捉える能力の向上にそれらを活用することで、この問題に対処している。これらの作業のほとんどは、RPNモジュールを使用してバウンディングボックスを提案し、バックボーンネットワークを使用して選択されたボックスの特徴を抽出します。近年,視覚変換器 (ViT) は従来の分類課題において高い性能を示した。トランスの自己アテンション機構は、すべてのパッチトークンを分類トークンにリンクする。注意リンクの強さはトークンの重要性の指標として直感的に考えることができる。本研究では,トランスの生の注意重みをすべて注意マップに統合して,識別画像パッチを効率的かつ正確に選択し,それらの関係を計算する,新しいトランスフォーマーベースのフレームワークであるTransFGを提案する。重複損失は、異なる地域に焦点を当てるために複数の注意を喚起するために導入されます。さらに、類似サブクラスの特徴表現間の距離をさらに拡大するために、コントラスト損失が適用される。我々は、cub-200-2011、stanford cars、stanford dogs、nabirds、inat2017の5つの人気のあるきめ細かいベンチマーク実験を行い、transfgの価値を実証した。モデルをよりよく理解するための定性的な結果が提示される。

Fine-grained visual classification (FGVC) which aims at recognizing objects from subcategories is a very challenging task due to the inherently subtle inter-class differences. Recent works mainly tackle this problem by focusing on how to locate the most discriminative image regions and rely on them to improve the capability of networks to capture subtle variances. Most of these works achieve this by using an RPN module to propose bounding boxes and re-use the backbone network to extract features of selected boxes. Recently, vision transformer (ViT) shows its strong performance in the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. The strength of the attention link can be intuitively considered as an indicator of the importance of tokens. In this work, we propose a novel transformer-based framework TransFG where we integrate all raw attention weights of the transformer into an attention map for guiding the network to effectively and accurately select discriminative image patches and compute their relations. A duplicate loss is introduced to encourage multiple attention heads to focus on different regions. In addition, a contrastive loss is applied to further enlarge the distance between feature representations of similar sub-classes. We demonstrate the value of TransFG by conducting experiments on five popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, NABirds and iNat2017 where we achieve state-of-the-art performance. Qualitative results are presented for better understanding of our model.

翻訳日:2021-03-16 14:09:01 公開日:2021-03-14
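
The entry above describes integrating the raw attention weights of all transformer layers into one attention map and using it to pick discriminative patches. A minimal sketch of that idea follows, assuming the layers are combined by matrix multiplication and that each head selects the patch its class token attends to most; both choices, along with the function name, are assumptions for illustration and not a statement of the paper's exact procedure.

```python
import numpy as np

def select_tokens(attn_layers):
    """Pick one patch token per attention head from accumulated attention.

    attn_layers: list of arrays of shape (H, N, N), one per layer, where
                 row i holds the attention of token i over all N tokens and
                 token 0 is the classification token.
    Returns a list of H token indices (each in 1..N-1).
    """
    joint = np.asarray(attn_layers[0], dtype=float)
    for layer in attn_layers[1:]:
        # Compose attention across layers (batched over heads).
        joint = np.asarray(layer, dtype=float) @ joint
    # Accumulated attention from the class token to every patch token.
    cls_to_patches = joint[:, 0, 1:]                 # (H, N-1)
    # +1 converts a patch position back to a token index.
    return (cls_to_patches.argmax(axis=-1) + 1).tolist()
```

The selected indices can then be used to gather the corresponding patch embeddings for a final classification layer.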

# 平均フィールドゲームGAN

Mean Field Game GAN ( http://arxiv.org/abs/2103.07855v1 )

ライセンス: Link先を確認

Shaojun Ma, Haomin Zhou, Hongyuan Zha

(参考訳) 新規な平均フィールドゲーム (MFGs) ベースのGAN (generation adversarial network) フレームワークを提案する。具体的には、密度空間における Hopf 式を用いて MFG を主双対問題として書き換え、ニューラルネットワークやサンプルを通じてモデルを訓練できるようにします。私たちのモデルは、ホップ式内の様々な機能を選択する自由のために柔軟です。さらに、私達の公式は数学的にLipschitz-1の制約を避けます。本手法の正確性と効率は,いくつかの実験により検証された。

We propose a novel mean field games (MFGs) based GAN (generative adversarial network) framework. To be specific, we utilize the Hopf formula in density space to rewrite MFGs as a primal-dual problem so that we are able to train the model via neural networks and samples. Our model is flexible due to the freedom of choosing various functionals within the Hopf formula. Moreover, our formulation mathematically avoids the Lipschitz-1 constraint. The correctness and efficiency of our method are validated through several experiments.

翻訳日:2021-03-16 14:03:09 公開日:2021-03-14

# fisher divergence critic regularizationを用いたオフライン強化学習

Offline Reinforcement Learning with Fisher Divergence Critic Regularization ( http://arxiv.org/abs/2103.08050v1 )

ライセンス: Link先を確認

Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

(参考訳) オフライン強化学習(RL)に対する現代の多くのアプローチは、通常、オフラインデータからポリシーのばらつきを測定するペナルティを持つモデルフリーアクター批評家アルゴリズムを増強する行動規則化を利用している。本研究では,オフラインデータを生成するログビヘイビア・ポリティ(log-behavior-policy)と,ニューラルネットワークを用いて学習可能な状態アクション値オフセット項をパラメータ化して,学習方針がデータに近づき続けることを奨励する代替手法を提案する。動作の正規化は、オフセット期間の適切な正規化に対応します。本稿では,オフセット項に勾配ペナルティ正規化器を用い,フィッシャーの発散正規化と等価性を実証し,スコアマッチングと生成エネルギーに基づくモデル文献との関連性を提案する。そこで,このアルゴリズムをfisher-brc (behavior regularized critic) と呼ぶ。標準のオフラインRLベンチマークでは、Fisher-BRCはパフォーマンスの向上と既存の最先端のメソッドよりも迅速な収束を実現します。

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic). On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.

翻訳日:2021-03-16 14:03:02 公開日:2021-03-14
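
The critic parameterization described in the entry above can be written compactly. The notation below is an assumption made for illustration ($\mu$ for the behavior policy, $O_\theta$ for the learned offset, $\lambda$ for the penalty weight, $\mathcal{D}$ for the offline dataset, $\pi$ for the learned policy); the abstract itself only states that the critic is the log-behavior-policy plus an offset and that a gradient penalty is applied to the offset.

$$
Q_\theta(s, a) = O_\theta(s, a) + \log \mu(a \mid s)
$$

$$
\mathcal{R}(\theta) = \lambda \, \mathbb{E}_{s \sim \mathcal{D},\; a \sim \pi(\cdot \mid s)} \left[ \left\lVert \nabla_a O_\theta(s, a) \right\rVert^2 \right]
$$

Under this reading, $\mathcal{R}(\theta)$ would be added to the usual critic loss, and the choice of the distribution over actions $a$ in the expectation is an assumption here rather than something the abstract specifies.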

# ハイパーパラメータ最適化における静的サーロゲートの利用

Use of static surrogates in hyperparameter optimization ( http://arxiv.org/abs/2103.07963v1 )

ライセンス: Link先を確認

Dounia Lakhmiri and S\'ebastien Le Digabel

(参考訳) ニューラルネットワークのハイパーパラメータとアーキテクチャを最適化することは、新しいアプリケーションの開発において長く必要不可欠なフェーズです。この消費プロセスは、低品質な構成を迅速に破棄し、より有望な候補に集中するように設計された戦略の策定の恩恵を受けることができる。本研究の目的は,ニューラルネットワークのアーキテクチャとトレーニングを同時にチューニングするために,直接探索微分自由最適化アルゴリズムを適用したライブラリであるHyperNOMADを,実行の2つのキーステップを目標とし,静的サロゲートの形で安価な近似を利用して,構成の評価と候補プールのランク付けを早期に停止させることである。これらのHyperNOMADへの追加は、提案したソリューションの品質を損なうことなく、リソース消費を改善することが示されている。

Optimizing the hyperparameters and architecture of a neural network is a long yet necessary phase in the development of any new application. This costly process can benefit from the elaboration of strategies designed to quickly discard low-quality configurations and focus on more promising candidates. This work aims at enhancing HyperNOMAD, a library that adapts a direct search derivative-free optimization algorithm to tune both the architecture and the training of a neural network simultaneously, by targeting two key steps of its execution and exploiting cheap approximations in the form of static surrogates to trigger the early stopping of the evaluation of a configuration and the ranking of pools of candidates. These additions to HyperNOMAD are shown to improve its resource consumption without harming the quality of the proposed solutions.

翻訳日:2021-03-16 13:58:54 公開日:2021-03-14

# 完了時間による成功度:身体的ナビゲーションのためのダイナミクスを考慮した評価基準

Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation ( http://arxiv.org/abs/2103.08022v1 )

ライセンス: Link先を確認

Naoki Yokoyama, Sehoon Ha, Dhruv Batra

(参考訳) 移動ロボットのナビゲーション性能を評価するための新しい指標であるSuccess weighted by Completion Time(SCT)を提案する。ナビゲーションに関するいくつかの関連研究は、エージェントが目標地点までにたどる経路を評価する主要な方法としてSuccess weighted by Path Length(SPL)を用いてきたが、SPLは複雑なダイナミクスを持つエージェントを適切に評価する能力に限界がある。対照的に、SCTはエージェントのダイナミクスモデルを明示的に考慮し、そのダイナミクスのもとで可能な最速のナビゲーション動作にエージェントがどれだけ近づけたかを正確に捉えることを目的としている。いくつかの身体性ナビゲーション研究はポイントターンダイナミクスを用いているが、我々は一般的な移動ロボットプラットフォーム(LoCoBot、TurtleBot、Fetchなど)のダイナミクスモデルをよりよく表す一輪車(unicycle-cart)ダイナミクスに焦点を当てる。また、障害物を含む環境において、開始姿勢から目標位置までの衝突のない最速経路と完了時間を推定する一輪車ダイナミクス向けのアルゴリズムRRT*-Unicycleを提案する。深層強化学習と報酬形成を用いて、異なるダイナミクスモデルを持つエージェントを訓練し、そのナビゲーション性能を比較する。これらのエージェントの評価において、SPLとは対照的に、SCTは一輪車モデルがより単純なポイントターンモデルに対して持つナビゲーション速度の優位性を捉えられることを示す。最後に、訓練したモデルとアルゴリズムをシミュレーションの外の実世界にうまく展開できることを示す。エージェントを実際のロボットに搭載してアパート内をナビゲートさせ、ゼロショットで汎化できることを示す。

We present Success weighted by Completion Time (SCT), a new metric for evaluating navigation performance for mobile robots. Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent makes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics. In contrast, SCT explicitly takes the agent's dynamics model into consideration, and aims to accurately capture how well the agent has approximated the fastest navigation behavior afforded by its dynamics. While several embodied navigation works use point-turn dynamics, we focus on unicycle-cart dynamics for our agent, which better exemplifies the dynamics model of popular mobile robotics platforms (e.g., LoCoBot, TurtleBot, Fetch, etc.). We also present RRT*-Unicycle, an algorithm for unicycle dynamics that estimates the fastest collision-free path and completion time from a starting pose to a goal location in an environment containing obstacles. We experiment with deep reinforcement learning and reward shaping to train and compare the navigation performance of agents with different dynamics models. In evaluating these agents, we show that in contrast to SPL, SCT is able to capture the advantages in navigation speed a unicycle model has over a simpler point-turn model of dynamics. Lastly, we show that we can successfully deploy our trained models and algorithms outside of simulation in the real world. We embody our agents in a real robot to navigate an apartment, and show that they can generalize in a zero-shot manner.

翻訳日:2021-03-16 13:58:38 公開日:2021-03-14
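
The relationship between SPL and SCT in the entry above can be made concrete with a short sketch. SPL weights each successful episode by the ratio of the shortest path length to the path length the agent actually travelled; the sketch assumes SCT has the same form with lengths replaced by times (the fastest achievable completion time versus the agent's completion time). The helper names are hypothetical, and all optimal values are assumed to be positive.

```python
def success_weighted(successes, optimal, actual):
    """Mean of S_i * optimal_i / max(actual_i, optimal_i) over episodes.

    successes: 0/1 success indicator per episode
    optimal:   best achievable cost per episode (assumed > 0)
    actual:    cost the agent actually incurred per episode
    """
    total = 0.0
    for s, best, used in zip(successes, optimal, actual):
        total += s * best / max(used, best)
    return total / len(successes)

def spl(successes, shortest_lengths, agent_lengths):
    """Success weighted by Path Length."""
    return success_weighted(successes, shortest_lengths, agent_lengths)

def sct(successes, fastest_times, agent_times):
    """Success weighted by Completion Time (same form, times instead of lengths)."""
    return success_weighted(successes, fastest_times, agent_times)
```

With this form, an agent that reaches the goal along the shortest path but drives slowly scores a perfect SPL and a lower SCT, which is the kind of difference the metric is meant to expose.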

# SaNet: 複数空間分解能の航空画像解析のためのスケール対応ニューラルネットワーク

SaNet: Scale-aware neural Network for Parsing Multiple Spatial Resolution Aerial Images ( http://arxiv.org/abs/2103.07935v1 )

ライセンス: Link先を確認

Libo Wang (School of Remote Sensing and Information Engineering Wuhan University, China)

(参考訳) 画像の地理空間を画素レベルで分類情報で指定することは都市景観理解の基本的な課題である。しかし、リモートセンシングセンサーの巨大な違いにより、複数の空間分解能(MSR)で空撮された画像は、地理的空間オブジェクトのスケール変動の増加と空間分解能が低下するにつれて情報的特徴の損失という2つの問題を引き起こします。そこで本研究では,MSR空中画像解析のためのスケールアウェアニューラルネットワーク (SaNet) を提案する。スケール変動に起因する大小のオブジェクト間の不均衡なセグメンテーション品質に対応するため、SaNetは高密度接続機能ネットワーク(DCFPN)モジュールをデプロイし、大きな受信フィールドを持つ品質のマルチスケールコンテキストをキャプチャする。情報的特徴損失を軽減するため、SFRモジュールをネットワークに組み込み、空間的関係強化を伴うスケール不変の特徴を学習する。 ISPRS Vaihingen 2DデータセットとISPRS Potsdam 2Dデータセットに関する広範な実験結果は、提案されたSaNetの他の最先端のネットワークと比較して優れたクロス解像度セグメンテーション能力を示しています。

Assigning categorical information to the geospatial objects of aerial images at the pixel level is a basic task in urban scene understanding. However, the huge difference in remote sensing sensors yields acquired aerial images in multiple spatial resolutions (MSR), which brings two issues: the increased scale variation of geospatial objects and informative feature loss as spatial resolution drops. To address the two issues, we propose a novel scale-aware neural network (SaNet) for parsing MSR aerial images. To cope with the imbalanced segmentation quality between larger and smaller objects caused by the scale variation, the SaNet deploys a densely connected feature network (DCFPN) module to capture high-quality multi-scale context with large receptive fields. To alleviate the informative feature loss, an SFR module is incorporated into the network to learn scale-invariant features with spatial relation enhancement. Extensive experimental results on the ISPRS Vaihingen 2D Dataset and ISPRS Potsdam 2D Dataset demonstrate the outstanding cross-resolution segmentation ability of the proposed SaNet compared to other state-of-the-art networks.

翻訳日:2021-03-16 13:55:01 公開日:2021-03-14

# 単一画像デハジングのためのプログレッシブ残差学習

Progressive residual learning for single image dehazing ( http://arxiv.org/abs/2103.07973v1 )

ライセンス: Link先を確認

Yudong Liang, Bin Wang, Jiaying Liu, Deyu Li, Yuhua Qian and Wenqi Ren

(参考訳) 最近の物理モデルフリーのデハジング手法は最先端のパフォーマンスを達成している。しかし,物理モデルの指導がなければ,データ不足やデータ不足のため,実際のシナリオに適用すると性能は急速に低下する。一方、物理モデルに基づく手法はより解釈性が高いが、パラメータの多目的最適化に苦しむため、準最適脱ハージング結果につながる可能性がある。本稿では, 物理的モデルフリー脱ハージングプロセスと, 両カテゴリにおける脱ハージング手法のメリットを享受する改良型散乱モデルベース脱ハージング操作を組み合わせ, 段階的残留学習戦略を提案する。特に、地球大気の光と透過地図は、初期物理モデルフリーの消泡過程から正確な残差情報と予備的な消泡修復の助けを借りてインタラクティブに最適化されている。提案手法は,パブリックデヘイジングベンチマークにおける最先端手法に対して,複雑なヘイジングデータに対するモデル解釈性と適応性に優れる。

Recent physical model-free dehazing methods have achieved state-of-the-art performance. However, without the guidance of physical models, their performance degrades rapidly when applied to real scenarios because of unavailable or insufficient data. On the other hand, physical model-based methods have better interpretability but suffer from multi-objective optimization of parameters, which may lead to sub-optimal dehazing results. In this paper, a progressive residual learning strategy is proposed that combines the physical model-free dehazing process with reformulated scattering model-based dehazing operations and thus enjoys the merits of both categories of dehazing methods. Specifically, the global atmosphere light and transmission maps are interactively optimized with the aid of accurate residual information and preliminary dehazed restorations from the initial physical model-free dehazing process. The proposed method performs favorably against state-of-the-art methods on public dehazing benchmarks, with better model interpretability and adaptivity for complex hazy data.

翻訳日:2021-03-16 13:54:39 公開日:2021-03-14

# 胸部X線画像からのCOVID-19感染の局在と重症度

COVID-19 Infection Localization and Severity Grading from Chest X-ray Images ( http://arxiv.org/abs/2103.07985v1 )

ライセンス: Link先を確認

Anas M. Tahir, Muhammad E. H. Chowdhury, Amith Khandakar, Tawsifur Rahman, Yazan Qiblawey, Uzair Khurshid, Serkan Kiranyaz, Nabil Ibtehaz, M Shohel Rahman, Somaya Al-Madeed, Khaled Hameed, Tahir Hamid, Sakib Mahmud, Maymouna Ezeddin

(参考訳) コロナウイルス感染症2019(COVID-19)は、2019年12月に世界経済と医療システムに大きな影響を与えたことから、世界中の主要な課題となっている。肺組織に対するcovid-19の影響を考えると、胸部x線撮影は疾患のスクリーニングと監視に不可欠である。多くの研究が、COVID-19の自動診断のためのディープラーニングアプローチを提案している。これらの手法は検出性能に驚くべきものとなったが、通常は数百のCXR画像のみを含む限られた胸部X線レポジトリ(CXR)を用いて評価を行っている。したがって、そのようなデータ不足は、オーバーフィッティングの可能性による信頼性の高い評価を妨げている。さらに、ほとんどの研究では、COVID-19肺炎の感染局在および重症度格付けの能力が示されませんでした。本研究では,CXR画像からの感染定量化による肺分画とCOVID-19の局在の体系的,統一的なアプローチを提案することにより,この緊急ニーズに対処する。これを実現するため,我々は,新しい人間-機械協調アプローチにより,cxr上で地対肺分割マスクのアノテーションを行う11,956個のcovid-19サンプルを含む33,920個のcxr画像を含む,最大のベンチマークデータセットを構築した。最先端セグメンテーションネットワーク、U-Net、U-Net++、Feature Pyramid Networks (FPN) を用いて広範な実験を行った。開発されたネットワークは、広範な反復プロセスを経て、96.11%のインターセクションオーバーユニオン(IoU)と97.99%のダイス類似係数(DSC)で肺領域セグメンテーションの優れた性能を達成しました。さらに、様々な形や種類のCOVID-19感染症が83.05%のIoUと88.21%のDSCで確実に局在した。最後に、提案されたアプローチは、99%を超える感度と特異性の両方で優れたCOVID-19検出性能を達成しました。

Coronavirus disease 2019 (COVID-19) has been the main agenda of the whole world, since it came into sight in December 2019 as it has significantly affected the world economy and healthcare system. Given the effects of COVID-19 on pulmonary tissues, chest radiographic imaging has become a necessity for screening and monitoring the disease. Numerous studies have proposed Deep Learning approaches for the automatic diagnosis of COVID-19. Although these methods achieved astonishing performance in detection, they have used limited chest X-ray (CXR) repositories for evaluation, usually with a few hundred COVID-19 CXR images only. Thus, such data scarcity prevents reliable evaluation with the potential of overfitting. In addition, most studies showed no or limited capability in infection localization and severity grading of COVID-19 pneumonia. In this study, we address this urgent need by proposing a systematic and unified approach for lung segmentation and COVID-19 localization with infection quantification from CXR images. To accomplish this, we have constructed the largest benchmark dataset with 33,920 CXR images, including 11,956 COVID-19 samples, where the annotation of ground-truth lung segmentation masks is performed on CXRs by a novel human-machine collaborative approach. An extensive set of experiments was performed using the state-of-the-art segmentation networks, U-Net, U-Net++, and Feature Pyramid Networks (FPN). The developed network, after an extensive iterative process, reached a superior performance for lung region segmentation with Intersection over Union (IoU) of 96.11% and Dice Similarity Coefficient (DSC) of 97.99%. Furthermore, COVID-19 infections of various shapes and types were reliably localized with 83.05% IoU and 88.21% DSC. Finally, the proposed approach has achieved an outstanding COVID-19 detection performance with both sensitivity and specificity values above 99%.
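The two segmentation metrics quoted in this abstract, Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC), can be illustrated with a minimal sketch. The function name `iou_and_dice` and the toy masks are assumptions made for illustration only; this is not the authors' evaluation code, and how they average over images is not stated here.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks given as equal-sized nested lists of 0/1."""
    inter = union = pred_sum = truth_sum = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            p, t = bool(p), bool(t)
            inter += p and t      # pixel is foreground in both masks
            union += p or t       # pixel is foreground in at least one mask
            pred_sum += p
            truth_sum += t
    if union == 0:
        # Both masks empty: treating this as perfect agreement is a convention.
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_sum + truth_sum)


# Toy 2x3 masks (hypothetical data).
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
iou, dice = iou_and_dice(pred, truth)
print(f"IoU={iou:.3f}, DSC={dice:.3f}")  # IoU=0.500, DSC=0.667
```

For a single pair of masks the two metrics are linked by DSC = 2·IoU / (1 + IoU), which is why DSC is never smaller than IoU.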

翻訳日:2021-03-16 13:54:23 公開日:2021-03-14

# Deep Tiling: ディープラーニングアプローチを用いたテクスチャタイル合成

Deep Tiling: Texture Tile Synthesis Using a Deep Learning Approach ( http://arxiv.org/abs/2103.07992v1 )

ライセンス: Link先を確認

Vasilis Toulatzis, Ioannis Fudos

(参考訳) テクスチャはコンピュータグラフィックスの基本的なプロセスである。テクスチャを利用して、3Dシーンの可視化結果を強化する。多くの場合、テクスチャ画像は解像度が小さいため、大きな3dモデル表面を覆うことができない。リピート、ミラーリピート、またはエッジへのクランプなどの従来の技術は、視覚的に許容できる結果をもたらしません。深層学習に基づくテクスチャ合成はそのような場合に非常に有効であることが証明されている。より大きな解像度のテクスチャを作ろうとするディープテクスチャ合成手法はすべて、gpuメモリリソースの面で制限されている。本稿では,入力テクスチャの構造的構成要素に類似した任意の解像度のタイルを作成するために,頑健な深層学習プロセスを用いたサンプルベーステクスチャ合成手法を提案する。このようにして、小サイズの新しいテクスチャタイルを合成して元のテクスチャとマージし、第2に、大きなテクスチャの欠落部分を容易に生成できるという事実から、第一にメモリの少ない方法である。

Texturing is a fundamental process in computer graphics. Texture is leveraged to enhance the visualization outcome for a 3D scene. In many cases a texture image cannot cover a large 3D model surface because of its small resolution. Conventional techniques like repeating, mirror repeating or clamp to edge do not yield visually acceptable results. Deep learning based texture synthesis has proven to be very effective in such cases. All deep texture synthesis methods trying to create larger resolution textures are limited in terms of GPU memory resources. In this paper, we propose a novel approach to example-based texture synthesis by using a robust deep learning process for creating tiles of arbitrary resolutions that resemble the structural components of an input texture. In this manner, our method is firstly much less memory limited owing to the fact that a new texture tile of small size is synthesized and merged with the original texture and secondly can easily produce missing parts of a large texture.
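The conventional techniques named in this abstract (repeating, mirror repeating, clamp to edge) are coordinate-wrapping rules. The sketch below shows them for a one-dimensional texel index; the function names are hypothetical, and this only illustrates the baselines, not the proposed deep tiling method.

```python
def wrap_repeat(i, n):
    """Repeat: the texture of width n is tiled periodically."""
    return i % n


def wrap_mirror(i, n):
    """Mirror repeat: every second tile is flipped, so tile borders match."""
    period = 2 * n
    j = i % period
    return j if j < n else period - 1 - j


def wrap_clamp(i, n):
    """Clamp to edge: indices outside the texture reuse the border texel."""
    return max(0, min(n - 1, i))


texture = ["a", "b", "c"]  # a tiny 1D "texture" with three texels
for wrap in (wrap_repeat, wrap_mirror, wrap_clamp):
    row = "".join(texture[wrap(i, len(texture))] for i in range(-3, 9))
    print(f"{wrap.__name__:12s} {row}")
```

All three rules only reuse existing texels, which is consistent with the abstract's remark that they do not yield visually acceptable results on large surfaces.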

翻訳日:2021-03-16 13:53:51 公開日:2021-03-14

# 人的相互作用による建物制御のための深層強化学習のシミュレーション研究

Simulation Studies on Deep Reinforcement Learning for Building Control with Human Interaction ( http://arxiv.org/abs/2103.07919v1 )

ライセンス: Link先を確認

Donghwan Lee, Niao He, Seungjae Lee, Panagiota Karava, Jianghai Hu

(参考訳) 建築部門は世界最大のエネルギーを消費しており、建物のエネルギー消費と快適管理にかなりの研究関心が寄せられています。近年の強化学習 (RL) の進展に触発されて, 気候制御問題構築におけるRLの可能性を評価することを目的とした。本研究では,連続建物制御タスクに対してddpg(deep deterministic policy gradient)と呼ばれる最近のrlアプローチを適用し,センサ制限による部分的状態観測可能性の処理能力,(b)連続的かつ離散的な高次元状態空間を有する複雑な確率システム,(c)環境条件による不確実性,居住者の行動,快適感についてシミュレーション研究を行い,その性能を評価する。特に、占有者間相互作用による部分的可観測性と不確実性は、制御問題を著しく複雑化する。シミュレーション研究を通じて、DDPGが学んだポリシーは、合理的な性能と計算的トラクタビリティを示す。

The building sector is the largest energy consumer in the world, and there has been considerable research interest in the energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper aims to assess the potential of RL in building climate control problems with occupant interaction. We apply a recent RL approach, called DDPG (deep deterministic policy gradient), to continuous building control tasks and assess its performance with simulation studies in terms of its ability to handle (a) partial state observability due to sensor limitations; (b) complex stochastic systems with high-dimensional state spaces, which are jointly continuous and discrete; (c) uncertainties due to ambient weather conditions, occupants' behavior, and comfort feelings. In particular, the partial observability and uncertainty due to occupant interaction significantly complicate the control problem. Through simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.

翻訳日:2021-03-16 13:51:42 公開日:2021-03-14

# オープンエンディングゲームにおける学習行動多様性のモデル化

Modelling Behavioural Diversity for Learning in Open-Ended Games ( http://arxiv.org/abs/2103.07927v1 )

ライセンス: Link先を確認

Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Jun Wang

(参考訳) 行動多様性の促進は、戦略サイクルが存在する非推移的ダイナミクスでゲームを解決するために重要であり、一貫した勝者は存在しない(例えば、Rock-Paper-Scissors)。しかし、多様性を定義し、多様性を意識した学習ダイナミクスを構築するための厳格な処理が欠けています。本研究では,ゲームにおける行動の多様性を幾何学的に解釈し,\emph{determinantal point processes} (DPP) に基づく新しい多様性指標を導入する。多様性指標を最適応答力学に組み込むことで,正規形式ゲームやオープンエンドゲームを解決するために,\emph{diverse fictitious play} と \emph{diverse policy-space response oracle} を開発した。多様なベストレスポンスのユニークさと、2プレイヤーゲームにおけるアルゴリズムの収束性を証明する。重要なのは、DPPベースの多様性メトリックを最大化することで、エージェントの戦略の混合にまたがる凸ポリトープである \emph{gamescape} を拡大できることである。多様性を意識した解法を検証するために、強い非推移性を示す数十のゲームをテストする。提案手法は, 有効かつ多様な戦略を見出すことにより, 最先端の解法よりもはるかに低いエクスプロイザビリティを実現することを示唆している。

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
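As background for the DPP-based metric: a determinantal point process assigns a set of items a probability proportional to the determinant of the corresponding kernel (Gram) submatrix, so sets of dissimilar items score higher. The sketch below shows that generic property for two items only. The vectors stand in for hypothetical payoff vectors of two strategies, and the code is not the specific diversity metric defined in the paper.

```python
def gram_determinant(u, v):
    """Determinant of the 2x2 Gram matrix [[u.u, u.v], [v.u, v.v]].

    It equals the squared area of the parallelogram spanned by u and v,
    so it shrinks towards zero as the two vectors become parallel.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return dot(u, u) * dot(v, v) - dot(u, v) ** 2


# Two dissimilar strategies versus two nearly identical ones (toy vectors).
orthogonal = gram_determinant([1.0, 0.0], [0.0, 1.0])
near_parallel = gram_determinant([1.0, 0.0], [0.99, 0.1])
print(orthogonal, near_parallel)
```

Under these assumptions the orthogonal pair yields a determinant of 1 and the near-parallel pair roughly 0.01, which is the sense in which a determinant rewards diversity.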

翻訳日:2021-03-16 13:51:24 公開日:2021-03-14

# 機械学習に対するメンバシップ推論の攻撃:調査

Membership Inference Attacks on Machine Learning: A Survey ( http://arxiv.org/abs/2103.07853v1 )

ライセンス: Link先を確認

Hongsheng Hu and Zoran Salcic and Gillian Dobbie and Xuyun Zhang

(参考訳) メンバシップ推論攻撃は、データサンプルがマシンラーニングモデルのトレーニングに使用されたかどうかを識別することを目的としている。会員が個人の機密情報を開示できるため、深刻なプライバシーリスクを引き起こす可能性があります。例えば、病院の健康分析トレーニングセットに参加している個人を特定すると、この個人はかつてその病院の患者だったことが判明します。メンバシップ推論攻撃は、分類モデル、生成モデル、シーケンスツーシーケンスモデルなど、さまざまな機械学習モデルに有効であることが示されている。一方、このようなプライバシー攻撃を防御する多くの方法が提案されている。メンバーシップ推論攻撃は、急速に成長している研究分野であるが、このトピックに関する包括的調査はまだない。本稿では,会員推定攻撃文学におけるこの重要なギャップを橋渡しする。会員の推論攻撃に関する最初の包括的な調査を紹介します。既存のメンバーシップ推論攻撃と防御をまとめて分類し、さまざまな設定で攻撃を実装する方法を明確に示します。さらに、メンバシップ推論攻撃が機能する理由を議論し、ベンチマークデータセットを要約して、比較を促進し、将来の作業の公平性を保証する。最後に,今後の研究の方向性と,レビューによる応用の可能性について提案する。

A membership inference attack aims to identify whether or not a data sample was used to train a machine learning model. It can raise severe privacy risks, as membership can reveal an individual's sensitive information. For example, identifying an individual's participation in a hospital's health analytics training set reveals that this individual was once a patient in that hospital. Membership inference attacks have been shown to be effective on various machine learning models, such as classification models, generative models, and sequence-to-sequence models. Meanwhile, many methods have been proposed to defend against such privacy attacks. Although membership inference is an emerging and rapidly growing research area, there is no comprehensive survey on this topic yet. In this paper, we bridge this important gap in the membership inference attack literature. We present the first comprehensive survey of membership inference attacks. We summarize and categorize existing membership inference attacks and defenses and explicitly present how to implement attacks in various settings. In addition, we discuss why membership inference attacks work and summarize the benchmark datasets to facilitate comparison and ensure fairness of future work. Finally, we propose several possible directions for future research and possible applications relying on the reviewed works.
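One simple way to picture a membership inference attack is a loss threshold: models tend to fit their training samples better, so an unusually low loss on a sample hints that it was a training member. The sketch below assumes the attacker can query the probability the model assigns to the sample's true class; the function names and the threshold value are hypothetical, and this is only one baseline among the attacks such a survey covers.

```python
import math


def cross_entropy(prob_true_class):
    """Cross-entropy loss of one sample, from the probability of its true class."""
    return -math.log(max(prob_true_class, 1e-12))  # floor avoids log(0)


def loss_threshold_attack(prob_true_class, threshold):
    """Guess 'member' when the model's loss on the sample is below the threshold."""
    return cross_entropy(prob_true_class) < threshold


# Hypothetical model outputs: a confidently fitted sample and an uncertain one.
print(loss_threshold_attack(0.99, threshold=0.5))  # True  (loss is about 0.01)
print(loss_threshold_attack(0.40, threshold=0.5))  # False (loss is about 0.92)
```

In practice the threshold would have to be calibrated, for example on data whose membership status is known; that step is omitted here.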

翻訳日:2021-03-16 13:49:01 公開日:2021-03-14

# BreakingBED -- 敵対攻撃によるバイナリと効率的なディープニューラルネットワークの破壊

BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks ( http://arxiv.org/abs/2103.08031v1 )

ライセンス: Link先を確認

Manoj Rohit Vemparala, Alexander Frickenstein, Nael Fasfous, Lukas Frickenstein, Qi Zhao, Sabine Kuhn, Daniel Ehrhardt, Yuankai Wu, Christian Unger, Naveen Shankar Nagaraja, Walter Stechele

(参考訳) 組み込みアプリケーション向けの畳み込みニューラルネットワーク(CNN)の展開は、リソース効率とタスク関連精度のバランスをとる上で多くの課題である。これらの2つの側面はCNN圧縮の分野でよく研究されています。現実世界のアプリケーションでは、第3の重要な側面、すなわちcnnの堅牢性が果たされる。本論文では、ホワイトボックスとブラックボックスの敵対攻撃(FGSM、PGD、C&W、DeepFool、LocalSearch、GenAttack)に対する非圧縮、蒸留、粉砕およびバイナライズニューラルネットワークの堅牢性を徹底的に研究する。これらの新たな洞察は、攻撃を検知し、入力を破棄または/または浄化する防御訓練スキームや反応性フィルタリング手法を促進する。 CIFAR-10およびImageNetデータセットをトレーニングした蒸留CNN、エージェントベース最先端プルーニングモデル、XNOR-NetやABC-Netなどのバイナライズニューラルネットワーク(BNN)の実験結果を示す。損失/精度レベル, 応力-ひずみグラフ, ボックスプロット, クラスアクティベーションマッピング (CAM) を用いて, CNN の比較を簡略化する手法を提案する。解析の結果,非圧縮cnnおよびプルーニングcnnのあらゆる種類の攻撃に対する感受性が明らかになった。蒸留されたモデルは、C&Wを除いて全ての白い箱攻撃に対する強さを示す。さらに、バイナリニューラルネットワークは、ベースラインや他の圧縮変形と比較して回復力のある挙動を示す。

Deploying convolutional neural networks (CNNs) for embedded applications presents many challenges in balancing resource-efficiency and task-related accuracy. These two aspects have been well-researched in the field of CNN compression. In real-world applications, a third important aspect comes into play, namely the robustness of the CNN. In this paper, we thoroughly study the robustness of uncompressed, distilled, pruned and binarized neural networks against white-box and black-box adversarial attacks (FGSM, PGD, C&W, DeepFool, LocalSearch and GenAttack). These new insights facilitate defensive training schemes or reactive filtering methods, where the attack is detected and the input is discarded and/or cleaned. Experimental results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks (BNNs) such as XNOR-Net and ABC-Net, trained on CIFAR-10 and ImageNet datasets. We present evaluation methods to simplify the comparison between CNNs under different attack schemes using loss/accuracy levels, stress-strain graphs, box-plots and class activation mapping (CAM). Our analysis reveals susceptible behavior of uncompressed and pruned CNNs against all kinds of attacks. The distilled models exhibit their strength against all white box attacks with an exception of C&W. Furthermore, binary neural networks exhibit resilient behavior compared to their baselines and other compressed variants.

翻訳日:2021-03-16 13:48:43 公開日:2021-03-14

# 量子機械学習のための図式微分

Diagrammatic Differentiation for Quantum Machine Learning ( http://arxiv.org/abs/2103.07960v1 )

ライセンス: Link先を確認

Alexis Toumi, Richie Yeung, Giovanni de Felice

(参考訳) リグからモノイド圏への双数構造の一般化によるテンソル計算のダイアグラム的微分について紹介する。これをZXダイアグラムに適用し、位相パラメータに関して線形写像の勾配を図式的に計算する方法を示す。パラメトリス量子回路の図では、多くの変分量子アルゴリズムに基づいてよく知られたパラメータシフト規則が得られる。次に、任意の非線形演算子を符号化するバブル付きダイアグラムを用いて、ハイブリッド古典量子回路の自動微分に拡張する。さらに、ダイアグラムの差別化には、Monoidalカテゴリ用のPythonライブラリであるDisCoPyのオープンソース実装が付属している。古典量子回路の図式勾配はpyzxライブラリを使って単純化され、tketコンパイラを介して量子ハードウェア上で実行される。このことは、文字列図の構造と量子機械学習の計算能力の両方を活用する多くの実用的な応用への扉を開く。

We introduce diagrammatic differentiation for tensor calculus by generalising the dual number construction from rigs to monoidal categories. Applying this to ZX diagrams, we show how to calculate diagrammatically the gradient of a linear map with respect to a phase parameter. For diagrams of parametrised quantum circuits, we get the well-known parameter-shift rule at the basis of many variational quantum algorithms. We then extend our method to the automatic differentiation of hybrid classical-quantum circuits, using diagrams with bubbles to encode arbitrary non-linear operators. Moreover, diagrammatic differentiation comes with an open-source implementation in DisCoPy, the Python library for monoidal categories. Diagrammatic gradients of classical-quantum circuits can then be simplified using the PyZX library and executed on quantum hardware via the tket compiler. This opens the door to many practical applications harnessing both the structure of string diagrams and the computational power of quantum machine learning.
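The parameter-shift rule mentioned in this abstract can be checked numerically on the smallest possible circuit. The sketch below simulates a single qubit by hand: it prepares RX(θ)|0⟩, measures the expectation of Z (which equals cos θ), and recovers the derivative from two shifted evaluations. It uses only the Python standard library and does not call DisCoPy, PyZX, or tket; the function names are assumptions for illustration.

```python
import math


def expectation_z(theta):
    """<Z> after applying RX(theta) to |0>.

    RX(theta)|0> = cos(theta/2)|0> - i sin(theta/2)|1>, so <Z> = cos(theta).
    """
    amp0 = complex(math.cos(theta / 2), 0.0)
    amp1 = complex(0.0, -math.sin(theta / 2))
    return abs(amp0) ** 2 - abs(amp1) ** 2


def parameter_shift_gradient(f, theta):
    """Exact gradient from two evaluations shifted by +pi/2 and -pi/2.

    Valid for gates of the form exp(-i * theta * P / 2) with P a Pauli operator.
    """
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2


theta = 0.7
print(parameter_shift_gradient(expectation_z, theta))  # parameter-shift estimate
print(-math.sin(theta))                                # analytic derivative of cos
```

Unlike a finite difference, the two shifted evaluations give the derivative exactly (up to floating-point error), which is why the rule suits gradients estimated on quantum hardware.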

翻訳日:2021-03-16 13:45:06 公開日:2021-03-14

# 強凸最適化問題に対する加速一階法の過渡成長

Transient growth of accelerated first-order methods for strongly convex optimization problems ( http://arxiv.org/abs/2103.08017v1 )

ライセンス: Link先を確認

Hesameddin Mohammadi, Samantha Samuelson, Mihailo R. Jovanovi\'c

(参考訳) 最適化アルゴリズムは、限られた時間予算のアプリケーションでますます使われています。多くのリアルタイムおよび組み込みシナリオでは、ほんの数回のイテレーションしか実行できず、伝統的な収束メトリクスはこれらの非漸近的なシステムのパフォーマンスを評価するために使用できない。本稿では,高速化第一次最適化アルゴリズムの過渡挙動について検討する。二次最適化問題に対しては、線形系理論のツールを用いて、非正規ダイナミクスの存在から過渡的成長が生じることを示す。初期のイテレーションで代数的成長をもたらすモードの存在を同定し、これらのモードによって引き起こされる最適解からの過渡的エクスカージョンを定量化する。強凸滑らかな最適化問題に対して, 積分二次制約の理論を応用し, ネステロフ加速法の過渡応答の大きさの上限を定式化する。最適化変数と大域最小化器の間のユークリッド距離と過渡ピークへの上昇時間の両方が問題の条件数の平方根に比例していることを示す。最後に,条件数が大きい問題に対して,定数係数まで導出する境界の厳密性を示す。

Optimization algorithms are increasingly being used in applications with limited time budgets. In many real-time and embedded scenarios, only a few iterations can be performed and traditional convergence metrics cannot be used to evaluate performance in these non-asymptotic regimes. In this paper, we examine the transient behavior of accelerated first-order optimization algorithms. For quadratic optimization problems, we employ tools from linear systems theory to show that transient growth arises from the presence of non-normal dynamics. We identify the existence of modes that yield an algebraic growth in early iterations and quantify the transient excursion from the optimal solution caused by these modes. For strongly convex smooth optimization problems, we utilize the theory of integral quadratic constraints to establish an upper bound on the magnitude of the transient response of Nesterov's accelerated method. We show that both the Euclidean distance between the optimization variable and the global minimizer and the rise time to the transient peak are proportional to the square root of the condition number of the problem. Finally, for problems with large condition numbers, we demonstrate tightness of the bounds that we derive up to constant factors.
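The transient growth described here can be reproduced on one scalar mode. The sketch below runs Nesterov's accelerated method with the standard strongly convex parameters (step size 1/L, momentum (√κ − 1)/(√κ + 1)) on the slowest mode of a quadratic, with m = 1 and L = κ. The initial condition x₋₁ = 0, x₀ = 1 is an assumption chosen to excite the growth; it is not taken from the paper.

```python
import math


def nesterov_mode_trajectory(kappa, steps, x_prev=0.0, x_curr=1.0):
    """Iterates of Nesterov's method along the eigenvalue-m mode (m = 1, L = kappa).

    For this mode the recurrence has the double root rho = 1 - 1/sqrt(kappa),
    so with x_prev = 0 and x_curr = 1 the iterates are x_k = (1 + k) * rho**k.
    """
    m, big_l = 1.0, float(kappa)
    alpha = 1.0 / big_l
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    xs = [x_curr]
    for _ in range(steps):
        y = x_curr + beta * (x_curr - x_prev)  # momentum (extrapolation) step
        x_next = y - alpha * m * y             # gradient step at y, grad f(y) = m * y
        x_prev, x_curr = x_curr, x_next
        xs.append(x_curr)
    return xs


xs = nesterov_mode_trajectory(kappa=100, steps=40)
peak = max(xs)
print(f"peak distance {peak:.3f} at iteration {xs.index(peak)}")
```

With κ = 100 the distance to the minimizer (which is 0) rises from 1 to about 3.87 around iteration 8 or 9 before it decays. The factor (1 + k) is the algebraic growth, and both the peak and the rise time scale like √κ in this toy setting.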

翻訳日:2021-03-16 13:43:09 公開日:2021-03-14

# エクストリームラーニングマシンのランダム係数の事前学習のための修正バッチ内在可塑性手法

A Modified Batch Intrinsic Plasticity Method for Pre-training the Random Coefficients of Extreme Learning Machines ( http://arxiv.org/abs/2103.08042v1 )

ライセンス: Link先を確認

Suchuan Dong, Zongwei Li

(参考訳) 極端な学習機械(ELM)では、隠れ層係数はランダムに設定され固定され、ニューラルネットワークの出力層係数は最小二乗法で計算される。 ELMのランダム割り当て係数は、その性能と精度に大きく影響することが知られています。本稿では,ELMニューラルネットワークの乱数係数を前訓練するための修正バッチ内在可塑性(modBIP)法を提案する。本手法は,ニューラルネットワークの各ノードにおける情報伝達を強化することにより,バッチ固有可塑性(BIP)法と同じ原理に基づいて考案されている。 BIPとは2つの点で異なる。第一に、modBIPはそのアルゴリズムでアクティベーション関数を含まず、ニューラルネットワークの任意のアクティベーション関数に適用することができる。対照的に、BIPはその構成において活性化関数の逆を使い、活性化関数は可逆性(あるいは単調性)を必要とする。 modBIP法は、BIPが破綻する、よく使われる非単調活性化関数(例えば Gaussian, swish, Gaussian error linear unit, radial-basis 型関数)でも動作する。第2に、modBIPは最小サイズのランダム間隔でターゲットサンプルを生成し、ELMと組み合わせると高精度な計算結果が得られる。 ELM/modBIP法は数値シミュレーションにおいてELM/BIP法よりも著しく精度が高い。関数近似のための浅層および深層ニューラルネットワークと偏微分方程式を用いた境界/初期値問題について, 数値実験を行った。 ELM/modBIP法を組み合わせることで高精度なシミュレーション結果が得られ,その精度はニューラルネットワークのランダム係数初期化に不感であることが実証された。これは、ランダム係数の事前学習を行わないELM結果と鋭い対比である。

In extreme learning machines (ELM) the hidden-layer coefficients are randomly set and fixed, while the output-layer coefficients of the neural network are computed by a least squares method. The randomly-assigned coefficients in ELM are known to influence its performance and accuracy significantly. In this paper we present a modified batch intrinsic plasticity (modBIP) method for pre-training the random coefficients in the ELM neural networks. The current method is devised based on the same principle as the batch intrinsic plasticity (BIP) method, namely, by enhancing the information transmission in every node of the neural network. It differs from BIP in two prominent aspects. First, modBIP does not involve the activation function in its algorithm, and it can be applied with any activation function in the neural network. In contrast, BIP employs the inverse of the activation function in its construction, and requires the activation function to be invertible (or monotonic). The modBIP method can work with the often-used non-monotonic activation functions (e.g. Gaussian, swish, Gaussian error linear unit, and radial-basis type functions), with which BIP breaks down. Second, modBIP generates target samples on random intervals with a minimum size, which leads to highly accurate computation results when combined with ELM. The combined ELM/modBIP method is markedly more accurate than ELM/BIP in numerical simulations. Ample numerical experiments are presented with shallow and deep neural networks for function approximation and boundary/initial value problems with partial differential equations. They demonstrate that the combined ELM/modBIP method produces highly accurate simulation results, and that its accuracy is insensitive to the random-coefficient initializations in the neural network. This is in sharp contrast with the ELM results without pre-training of the random coefficients.
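The basic ELM recipe in the first sentence of this abstract (random, fixed hidden-layer coefficients; output-layer coefficients from least squares) can be sketched in plain Python. Everything specific below is an assumption for illustration: the tanh activation, the uniform sampling range `scale`, the small ridge term that keeps the normal equations well conditioned, and the helper names. The modBIP pre-training step itself is not shown.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train_elm(xs, ys, hidden=10, scale=3.0, ridge=1e-6, seed=0):
    """Fit a one-hidden-layer ELM to scalar data and return a predictor."""
    rng = random.Random(seed)
    # Hidden-layer coefficients: drawn once at random, then kept fixed.
    w = [rng.uniform(-scale, scale) for _ in range(hidden)]
    b = [rng.uniform(-scale, scale) for _ in range(hidden)]
    feats = [[math.tanh(w[j] * x + b[j]) for j in range(hidden)] for x in xs]
    # Output-layer coefficients: ridge-regularised least squares (normal equations).
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(hidden)] for i in range(hidden)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(hidden)]
    beta = solve_linear(gram, rhs)

    def predict(x):
        return sum(beta[j] * math.tanh(w[j] * x + b[j]) for j in range(hidden))

    return predict


xs = [i / 49 for i in range(50)]
ys = [math.sin(2 * math.pi * x) for x in xs]
predict = train_elm(xs, ys)
mse = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE: {mse:.3e}")
```

Changing `seed` or `scale` changes the random hidden layer and therefore the fit. That sensitivity to the random coefficients is what the abstract says pre-training is meant to reduce.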

翻訳日:2021-03-16 13:42:52 公開日:2021-03-14

# (参考訳) 二次拘束下での深部グラフマッチング

Deep Graph Matching under Quadratic Constraint ( http://arxiv.org/abs/2103.06643v2 )

ライセンス: CC BY 4.0

Quankai Gao, Fudong Wang, Nan Xue, Jin-Gang Yu, Gui-Song Xia

(参考訳) 近年,グラフノード上で抽出された深層特徴の記述能力に依拠して,グラフマッチング問題に対して有望な結果が得られている。しかし、既存のディープグラフマッチング(DGM)メソッドの主な制限の1つは、グラフ構造の明示的な制約の無知であり、トレーニング中にモデルが局所的な最小値に閉じ込められる可能性がある。本稿では, DGM フレームワークに組み込んだ対方グラフ構造を, \textbf{quadratic constraint} として明示的に定式化する。二次制約はグラフ間の対構造的な相違を最小限に抑え、抽出したCNN特徴のみを用いて得られるあいまいさを軽減できる。さらに,2次制約付き最適化に対して,制約のないディープラーニングオプティマイザと互換性があるような,微分可能な実装を提案する。より正確かつ適切な監視を行うために、クラス不均衡に対する適切に設計された偽マッチング損失が提案され、過度に適合しない偽陰性や偽陽性をよりよく罰できる。実験により,本手法は実世界のデータセット上での競合性能を示す。

Recently, deep learning based methods have demonstrated promising results on the graph matching problem by relying on the descriptive capability of deep features extracted on graph nodes. However, one main limitation of existing deep graph matching (DGM) methods lies in their neglect of explicit constraints on graph structure, which may cause the model to become trapped in a local minimum during training. In this paper, we propose to explicitly formulate pairwise graph structures as a \textbf{quadratic constraint} incorporated into the DGM framework. The quadratic constraint minimizes the pairwise structural discrepancy between graphs, which can reduce the ambiguities that arise when only the extracted CNN features are used. Moreover, we present a differentiable implementation of the quadratic constrained optimization such that it is compatible with unconstrained deep learning optimizers. To give more precise and proper supervision, a well-designed false matching loss against class imbalance is proposed, which can better penalize false negatives and false positives with less overfitting. Exhaustive experiments demonstrate that our method achieves competitive performance on real-world datasets.

翻訳日:2021-03-16 13:27:21 公開日:2021-03-14

# 確率的制御目的における情報探索の起源の理解

Understanding the origin of information-seeking exploration in probabilistic objectives for control ( http://arxiv.org/abs/2103.06859v2 )

ライセンス: Link先を確認

Beren Millidge, Alexander Tschantz, Anil Seth, Christopher Buckley

(参考訳) 探索と活用のトレードオフは、機械学習から生物学、経済学まで幅広い分野における適応行動の記述の中心である。多くのアプローチが取られているが、このトレードオフを解決するための1つのアプローチは、エージェントが固有の「探索駆動」を持つと仮定することであり、これは多くの場合、世界に関するエージェントの情報獲得の最大化として実装され、機械学習や認知科学で広く研究されている。本稿では,このような手法の性質と意味を数学的に検討し,効用の最大化と情報探索行動の組合せが,発散目的(divergence objectives)と呼ぶ全く異なるクラスの目的の最小化から生じることを実証する。適応行動の基礎となる目的関数について,文献でよく知られた報酬または効用の最大化目的に対応する \emph{evidence} 目的と,エージェントの期待する未来と望ましい未来の間の発散を最小化しようとする \emph{divergence} 目的との二分法を提案する。そして,この新しいクラスの発散目的が,単なる貪欲な効用最大化を超えて,適応的かつ知的な行動の探索的要素をより豊かに理解するための数学的基礎となりうることを論じる。

The exploration-exploitation trade-off is central to the description of adaptive behaviour in fields ranging from machine learning to biology to economics. While many approaches have been taken, one approach to solving this trade-off has been to equip agents with, or propose that they possess, an intrinsic 'exploratory drive', which is often implemented in terms of maximizing the agent's information gain about the world; this approach has been widely studied in machine learning and cognitive science. In this paper we mathematically investigate the nature and meaning of such approaches and demonstrate that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives, which we call divergence objectives. We propose a dichotomy in the objective functions underlying adaptive behaviour between \emph{evidence} objectives, which correspond to well-known reward or utility maximizing objectives in the literature, and \emph{divergence} objectives, which instead seek to minimize the divergence between the agent's expected and desired futures. We argue that this new class of divergence objectives could form the mathematical foundation for a much richer understanding of the exploratory components of adaptive and intelligent action, beyond simply greedy utility maximization.

翻訳日:2021-03-16 11:55:54 公開日:2021-03-14

PDF登録状況（公開日: 20210314）