Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20210824となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 離散時間量子ウォークを用いた閉グラフ上のマルチキュービット量子コンピューティング Multi-qubit quantum computing using discrete-time quantum walks on closed graphs ( http://arxiv.org/abs/2004.05956v2 ) ライセンス: Link先を確認	Prateek Chawla, Shivani Singh, Aman Agarwal, Sarvesh Srinivasan, C. M. Chandrashekar	(参考訳) 普遍量子計算は、連続時間と離散時間の両方の量子ウォークを用いて実現することができる。本稿では,単一粒子離散時間量子ウォークに基づくマルチキュービット計算タスクを実現するバージョンを提案する。このスキームのスケーラビリティは、閉格子形式のウォーク操作の集合を用いて、マルチ量子ビット系上の量子ゲートの普遍的な集合を実装することで証明される。また、グローバーのアルゴリズム、量子フーリエ変換、量子位相推定アルゴリズムを実装できる、実験的に実現可能なウォーク演算のセットも提示する。エラー検出と修正の基本的な実装も提示する。このスキームの空間的および時間的複雑さの分析は、量子ウォーク進化操作の実装がシステム固有の特徴であるシステムにおける量子ウォークに基づく量子計算モデルの利点を強調している。 Universal quantum computation can be realised using both continuous-time and discrete-time quantum walks. We present a version based on single particle discrete-time quantum walk to realize multi-qubit computation tasks. The scalability of the scheme is demonstrated by using a set of walk operations on a closed lattice form to implement the universal set of quantum gates on multi-qubit system. We also present a set of experimentally realizable walk operations that can implement Grover's algorithm, quantum Fourier transformation and quantum phase estimation algorithms. An elementary implementation of error detection and correction is also presented. Analysis of space and time complexity of the scheme highlights the advantages of quantum walk based model for quantum computation on systems where implementation of quantum walk evolution operations is an inherent feature of the system.	翻訳日:2023-05-24 11:30:59 公開日:2021-08-24
# 古典イジングハミルトンの低エネルギー状態の量子インスパイア探索法 Quantum-inspired search method for low-energy states of classical Ising Hamiltonians ( http://arxiv.org/abs/2010.00180v2 ) ライセンス: Link先を確認	Hiroshi Ueda, Yuichi Otsuka and Seiji Yunoki	(参考訳) 2体完全連結なランダムイジング相互作用とランダムな局所磁場からなる古典的ハミルトニアンの低エネルギー状態を求める量子インスパイアされた数値計算法を開発した。この方法では、元のイジングハミルトニアンに可換でない無限小量子相互作用を導入し、クリロフ部分空間法に触発された直積状態を繰り返し生成および切断し、元の古典イジングハミルトニアンの低エネルギー状態を得る。計算コストは、無限小量子相互作用(例えば、一体または二体相互作用)の形式と、導入される無限小相互作用項の数、異なる初期状態、および反復中に保持される低エネルギー状態によって制御される。ここでは、異なるサイト上で作用するパウリ$X$作用素の無限小量子相互作用対積と、ランダムなイジング・ハミルトニアン(英語版)(Ising Hamiltonian)へのオンサイトであるパウリ$X$演算子(英語版)の数値コストが1イテレーションあたり$O(N^3)$であることを示す。ランダムなイジング・ハミルトニアンに対して最大600ドルのランダムなカップリング実現の120のインスタンスを検討し、各インスタンスの120の最低エネルギー状態を求める。本稿では,ランダムイジング・ハミルトニアン・スケールの基底状態の探索のために,量子インスパイアされた手法による解の時間-解法と異なる初期状態の観点からの並列化が,約$n^5$ for $n$ から$600 となることを見出した。また, ランダムイジングハミルトニアンの低エネルギー領域におけるアンサンブル平均基底状態, 第一励起エネルギー, アンサンブル平均状態数などの基礎物性についても検討した。 We develop a quantum-inspired numerical procedure for searching low-energy states of a classical Hamiltonian composed of two-body fully-connected random Ising interactions and a random local longitudinal magnetic field. In this method, we introduce infinitesimal quantum interactions that do not commute with the original Ising Hamiltonian, and repeatedly generate and truncate direct product states, inspired by the Krylov subspace method, to obtain the low-energy states of the original classical Ising Hamiltonian. The computational cost is controlled by the form of infinitesimal quantum interactions (e.g., one-body or two-body interactions) and the numbers of infinitesimal interaction terms introduced, different initial states considered, and low-energy states kept during the iteration. For a demonstrate of the method, here we introduce as the infinitesimal quantum interactions pair products of Pauli $X$ operators acting on different sites and on-site Pauli $X$ operators into the random Ising Hamiltonian, in which the numerical cost is $O(N^3)$ per iteration with the system size $N$. We consider 120 instances of the random coupling realizations for the random Ising Hamiltonian with $N$ up to 600 and search the 120 lowest-energy states for each instance. We find that the time-to-solution by the quantum-inspired method proposed here, with parallelization in terms of the different initial states, for searching the ground state of the random Ising Hamiltonian scales approximately as $N^5$ for $N$ up to 600. We also examine the basic physical properties such as the ensemble-averaged ground-state and first-excited energies and the ensemble-averaged number of states in the low-energy region of the random Ising Hamiltonian.	翻訳日:2023-04-30 12:15:38 公開日:2021-08-24
# 縮小密度行列からの複雑性:カオスの新しい診断法 Complexity from the Reduced Density Matrix: a new Diagnostic for Chaos ( http://arxiv.org/abs/2011.04705v2 ) ライセンス: Link先を確認	Arpan Bhattacharyya, S. Shajidul Haque and Eugene H. Kim	(参考訳) 多粒子量子系におけるカオスを特徴づける回路複雑性について検討する。このプロセスでは、複雑性を利用してオープン量子システムを分析する。本研究では,異なる種類の量子回路を探索することにより,密度行列の低減に基づく複雑性から量子カオスの新しい診断法を提案する。共振器の1つまたは両方を反転させる2つの結合振動子のおもちゃモデルに関する明示的な計算により、複雑性の進化がカオスの診断の可能性を示す。 We investigate circuit complexity to characterize chaos in multiparticle quantum systems. In the process, we take a stride to analyze open quantum systems by using complexity. We propose a new diagnostic of quantum chaos from complexity based on the reduced density matrix by exploring different types of quantum circuits. Through explicit calculations on a toy model of two coupled harmonic oscillators, where one or both of the oscillators are inverted, we demonstrate that the evolution of complexity is a possible diagnostic of chaos.	翻訳日:2023-04-24 21:08:09 公開日:2021-08-24
# ドープ量子井戸における電子基底状態の共振器誘起変化の測定に関する理論的提案 Theoretical proposals to measure resonator-induced modifications of the electronic ground-state in doped quantum wells ( http://arxiv.org/abs/2012.09458v3 ) ライセンス: Link先を確認	Yuan Wang and Simone De Liberato	(参考訳) 近年の非摂動型光-物質結合の物理学への関心は、相互作用エネルギーが素粒子に匹敵する固体キャビティ量子力学装置の開発につながった。このような状況下では、結合系の基底状態は相互作用依存となり、多くの調査の対象であったにもかかわらずまだ観測されていない仮想励起の集団を含むと予測される。本稿では,量子井戸における仮想電子励起が基底状態電荷分布をどのように変化させるかを調査し,そのキャビティ誘起摂動を測定する2つの方法を提案する。最初のアプローチは、局所的な欠陥状態を用いた量子井戸の特定の位置における電子集団の分光マッピングに基づいている。第二のアプローチは代わりにケルビンプローブのフォトニック等価性を利用して量子井戸の平均変化分布を測定する。両方の効果は、現在または近未来の技術で観察できる。その結果,地中電子特性の空洞誘起変調の実証への道筋が得られた。 Recent interest in the physics of non-perturbative light-matter coupling led to the development of solid-state cavity quantum electrodynamics setups in which the interaction energies are comparable with the bare ones. In such a regime the ground state of the coupled system becomes interaction-dependent and is predicted to contain a population of virtual excitations which, notwithstanding having been object of many investigations, remain still unobserved. In this paper we investigate how virtual electronic excitations in quantum wells modify the ground-state charge distribution, and propose two methods to measure such a cavity-induced perturbation. The first approach we consider is based on spectroscopic mapping of the electronic population at a specific location in the quantum well using localised defect states. The second approach exploits instead the photonic equivalent of a Kelvin probe to measure the average change distribution across the quantum well. We find both effects observable with present-day or near-future technology. Our results thus provide a route toward a demonstration of cavity-induced modulation of ground-state electronic properties.	翻訳日:2023-04-20 08:46:16 公開日:2021-08-24
# 一次元量子系における緩和の運命の不決定性 Undecidability of the fate of relaxation in one-dimensional quantum systems ( http://arxiv.org/abs/2012.13890v2 ) ライセンス: Link先を確認	Naoto Shiraishi and Keiji Matsumoto	(参考訳) 孤立量子多体系における緩和ダイナミクスについて検討する。緩和後の可観測物の定常値は、この定常値が平衡値と一致する緩和現象であるため、量子熱化の分野での研究のトピックである。したがって、量子多体系における定常値の計算は重要な問題と見なされる。しかし、量子多体系の定常値は計算不可能であることが証明される。より正確には、定常値が与えられた値の近傍にあるかどうかが決定不能な問題であることを示す。我々の決定不能な結果は、最接近相互作用のある1次元シフト不変量系、我々の初期状態から1つのサイト上の状態の積状態、そして1つのボディオブザーバブルのシフトサムに観測可能な場合に、まだ満足できる。この結果は、与えられた量子多体系における熱化の有無を決定する一般的な定理や手続きがないことを明確に示している。 We investigate the relaxation dynamics in an isolated quantum many-body system. The stationary value of an observable after relaxation is a topic of researches in the field of quantum thermalization, since thermalization is a relaxation phenomena where this stationary value coincides with the equilibrium value. Therefore, computing the stationary value in quantum many-body systems is regarded as an important problem. We, however, prove that the stationary value in quantum many-body systems is incomputable. More precisely, we show that whether the stationary value is in the vicinity of a given value or not is an undecidable problem. Our undecidable result is still satisfied when we restrict our system to a one-dimensional shift-invariant system with nearest-neighbor interaction, our initial state to a product state of a state on a single site, and our observable to a shift-sum of a one-body observable. This result clearly shows that there is no general theorem or procedure to decide the presence or absence of thermalization in a given quantum many-body system.	翻訳日:2023-04-19 04:07:44 公開日:2021-08-24
# 量子熱化における不確定性 Undecidability in quantum thermalization ( http://arxiv.org/abs/2012.13889v2 ) ライセンス: Link先を確認	Naoto Shiraishi and Keiji Matsumoto	(参考訳) 孤立量子多体系における熱化の研究は、統計力学の発達の時代までさかのぼる長い歴史がある。自然界の多くの量子多体系は熱化と見なされるが、一部は熱平衡に達することはない。中心的な問題は、ある系が以前に対処されたが解決されていない熱分解するかどうかを明らかにすることである。ここでは、この問題は決定不能であることを示す。結果として生じる不確定性は、システムが最も近い隣り合う相互作用を持つ一次元シフト不変系に制限され、初期状態が固定積状態であるときにさえ適用される。我々は、可逆的普遍的チューリングマシンのダイナミクスをコードするハミルトンのファミリーを構築し、チューリングマシンが停止するかどうかによって緩和過程の運命がかなり変化する。以上の結果から,任意のハミルトニアンにおける熱化の有無を決定する一般定理,アルゴリズム,系統的手続きは存在しないことが示唆された。 The investigation of thermalization in isolated quantum many-body systems has a long history, dating back to the time of developing statistical mechanics. Most quantum many-body systems in nature are considered to thermalize, while some never achieve thermal equilibrium. The central problem is to clarify whether a given system thermalizes, which has been addressed previously, but not resolved. Here, we show that this problem is undecidable. The resulting undecidability even applies when the system is restricted to one-dimensional shift-invariant systems with nearest-neighbour interaction, and the initial state is a fixed product state. We construct a family of Hamiltonians encoding dynamics of a reversible universal Turing machine, where the fate of a relaxation process changes considerably depending on whether the Turing machine halts. Our result indicates that there is no general theorem, algorithm, or systematic procedure determining the presence or absence of thermalization in any given Hamiltonian.	翻訳日:2023-04-19 04:07:28 公開日:2021-08-24
# 平衡から遠い系の普遍的量子揺らぎ-散逸関係 Universal Quantum Fluctuation-Dissipation Relation for Systems Far From Equilibrium ( http://arxiv.org/abs/2101.11827v2 ) ライセンス: Link先を確認	Zhedong Zhang, Xuanhua Wang, Jin Wang	(参考訳) 平衡状態から遠方への緩和に伴うゆらぎは、幅広いスケールの様々なシステムにとって基本的な関心事である。近年の分光などの技術の進歩により、メソスコピック系のゆらぎを、平衡から遠ざかる量子系を駆動する際に緩和過程と関連づけて測定する可能性が生まれている。詳細平衡条件に反する量子マルコフ過程に対する一般非平衡変動散逸定理(FDT)を提案する。ゆらぎとは別に、緩和は、平衡状態から遠方にある量子カール流束によって支配される余分な相関を伴う。このような寄与は熱平衡のために消滅し、従来のFDTが回収される。最終的に分子接合に非平衡FDTを適用し、光透過スペクトルの詳細な平衡破壊効果を解明する。本研究は摂動系および近平衡系におけるゆらぎ-散逸関係の利点と限界を示し、摂動系および近平衡系における量子熱力学の研究に広く興味を持ち、量子熱力学の研究にも幅広い関心を寄せている。 Fluctuations associated with relaxations in far-from-equilibrium regime is of fundamental interest for a large variety of systems within broad scales. Recent advances in techniques such as spectroscopy have generated the possibility for measuring the fluctuations of the mesoscopic systems in connection to the relaxation processes when driving the underlying quantum systems far from equilibrium. We present a general nonequilibrium Fluctuation-Dissipation Theorem (FDT) for quantum Markovian processes where the detailed-balance condition is violated. Apart from the fluctuations, the relaxation involves extra correlation that is governed by the quantum curl flux emerged in the far-from-equilibrium regime. Such a contribution vanishes for the thermal equilibrium, so that the conventional FDT is recovered. We finally apply the nonequilibrium FDT to the molecular junctions, elaborating the detailed-balance-breaking effects on the optical transmission spectrum. Our results have the advantage of and exceed the scope of the fluctuation-dissipation relation in the perturbative and near equilibrium regimes, and is of broad interest for the study of quantum thermodynamics.ation in the perturbative and near equilibrium regimes, and is of broad interest for the study of quantum thermodynamics.	翻訳日:2023-04-13 12:08:22 公開日:2021-08-24
# 高速反転による量子ルーティング Quantum routing with fast reversals ( http://arxiv.org/abs/2103.03264v2 ) ライセンス: Link先を確認	Aniruddha Bapat, Andrew M. Childs, Alexey V. Gorshkov, Samuel King, Eddie Schoute, Hrishee Shastri	(参考訳) 本稿では、相互作用制約下で量子ビットの任意の置換を実装する手法を提案する。提案プロトコルは,経路に沿ったキュービットの順序を高速に逆転する従来の手法を利用する。 n$ の経路上の近距離-neighbor相互作用を考えると、量子ルーティング時間が(1-\epsilon)n$ 以上であるような一定の $\epsilon \approx 0.034$ が存在するが、スワップベースのプロトコルは少なくとも $n-1$ である。これは、スワップベースのルーティング方法に対する最初の既知の量子アドバンテージであり、グリッドのような現実的なアーキテクチャに対する量子ルーティング時間を改善する。さらに,本アルゴリズムはランダムな置換に対する期待値が2n/3$の量子ルーティング時間に接近していることを示し,スワップベースのプロトコルは漸近的に時間$n$を求める。さらに、k \le n$ qubits をルートするスパース置換を考え、経路上では最大$n/3 + o(k^2)$、半径 $r$ の一般グラフでは最大$r/3 + o(k^2)$ の量子ルーティング時間を持つアルゴリズムを与える。 We present methods for implementing arbitrary permutations of qubits under interaction constraints. Our protocols make use of previous methods for rapidly reversing the order of qubits along a path. Given nearest-neighbor interactions on a path of length $n$, we show that there exists a constant $\epsilon \approx 0.034$ such that the quantum routing time is at most $(1-\epsilon)n$, whereas any swap-based protocol needs at least time $n-1$. This represents the first known quantum advantage over swap-based routing methods and also gives improved quantum routing times for realistic architectures such as grids. Furthermore, we show that our algorithm approaches a quantum routing time of $2n/3$ in expectation for uniformly random permutations, whereas swap-based protocols require time $n$ asymptotically. Additionally, we consider sparse permutations that route $k \le n$ qubits and give algorithms with quantum routing time at most $n/3 + O(k^2)$ on paths and at most $2r/3 + O(k^2)$ on general graphs with radius $r$.	翻訳日:2023-04-09 02:20:25 公開日:2021-08-24
# 対称性を通した四体不識別性の特徴 Characterizing four-body indistinguishability via symmetries ( http://arxiv.org/abs/2103.04600v2 ) ライセンス: Link先を確認	Alexander M. Minke, Andreas Buchleitner, Christoph Dittel	(参考訳) 混合状態において調製された内部自由度によって部分的に識別可能な4つの同一のボソニック粒子またはフェルミイオン粒子の識別不能性を特徴付ける方法を示す。これは、その外的(動的)自由度に作用する高度に対称なユニタリに従えば、そのカウント統計によって達成される。純粋な内部状態に対しては、粒子の集合相に関する情報をさらに抽出し、最終的には複素共役までの完全な多粒子密度作用素を実験的に再構築することができる。 We show how to characterize the indistinguishability of up to four identical, bosonic or fermionic particles, which are rendered partially distinguishable through their internal degrees of freedom prepared in mixed states. This is accomplished via their counting statistics when subjected to a highly symmetric unitary acting upon their external (i.e., dynamical) degrees of freedom. For pure internal states, we further extract information on the particles' collective phases, which ultimately allows for an experimental reconstruction of the full many-particle density operator up to complex conjugation.	翻訳日:2023-04-08 18:21:50 公開日:2021-08-24
# 浮遊系の熱・機械的変化の非平衡制御 Nonequilibrium control of thermal and mechanical changes in a levitated system ( http://arxiv.org/abs/2103.10898v2 ) ライセンス: Link先を確認	Markus Rademacher, Michael Konopik, Maxime Debiossac, David Grass, Eric Lutz, Nikolai Kiesel	(参考訳) ゆらぎ定理は、小さな非平衡系の熱力学の第2法則の基本的な拡張である。作業と熱は同様にエネルギー交換の重要な形態であるが、変動関係は機械と熱の同時変化の一般的な状況について実験的に評価されていない。熱駆動は機械駆動よりも一般的に遅く、より実現が難しい。ここでは, フィードバック冷却技術を用いて, 平衡時間よりも1桁高速に浮遊する微小粒子の高速かつ制御された温度変化を実現する。機械制御と熱制御を組み合わせることで, 線形応答理論の範囲を超えて, 両寄与を考慮したゆらぎ定理の有効性を検証した。この結果から, 機械的および熱的変化を同時に行う顕微鏡システムにおいて, 一般の遠方平衡過程の解明が可能となった。 Fluctuation theorems are fundamental extensions of the second law of thermodynamics for small nonequilibrium systems. While work and heat are equally important forms of energy exchange, fluctuation relations have not been experimentally assessed for the generic situation of simultaneous mechanical and thermal changes. Thermal driving is indeed generally slow and more difficult to realize than mechanical driving. Here, we use feedback cooling techniques to implement fast and controlled temperature variations of an underdamped levitated microparticle that are one order of magnitude faster than the equilibration time. Combining mechanical and thermal control, we verify the validity of a fluctuation theorem that accounts for both contributions, well beyond the range of linear response theory. Our results allow the investigation of general far-from-equilibrium processes in microscopic systems that involve fast mechanical and thermal changes at the same time.	翻訳日:2023-04-07 10:54:28 公開日:2021-08-24
# 分解領域におけるイオンアンサンブルの高分解能光学分光による陽子電子質量比 Proton-electron mass ratio by high-resolution optical spectroscopy of ion ensembles in the resolved-carrier regime ( http://arxiv.org/abs/2103.11741v2 ) ライセンス: Link先を確認	I. V. Kortunov, S. Alighanbari, M. G. Hansen, G. S. Giri, V. I. Korobov and S. Schiller	(参考訳) 気体相の光学分光は、原子と分子の構造とそれらの外部磁場との相互作用を解明するための重要なツールである。線解像度は通常、粒子の熱運動による一階ドップラー拡大と励起ビームによる短い通過時間の組合せによって制限される。閉じ込められた粒子の場合、適切なレーザー冷却技術は強い閉じ込め(ラム・ディッケ状態、LDR)をもたらし、これらの効果を伴わない光学分光に繋がる。非レーザー冷却型分光イオンでは、これは1つまたは2つの原子イオンと1つのレーザー可溶性原子イオン[1,2]をトラップする場合にのみ達成されている。ここでは, ドップラーとトランジットを含まない1光子光学分光法が, 中赤外放射によるイオンのアンサンブルにより容易に得られることを示す。本手法を分子イオン上で実証する。我々は、数千個のレーザー冷却原子イオンからなるクーロンクラスター内に約100個の水素分子イオン(HD$^{+}$)をトラップし、基本振動遷移のレーザー分光を行う。遷移周波数は3.3$\times$10$^{-12}$の最低不確かさで決定された。応用例として, 精密な ab initio 計算と測定振動周波数を一致させて, 陽子電子質量比を求める。 Optical spectroscopy in the gas phase is a key tool to elucidate the structure of atoms and molecules and of their interaction with external fields. The line resolution is usually limited by a combination of first-order Doppler broadening due to particle thermal motion and of a short transit time through the excitation beam. For trapped particles, suitable laser cooling techniques can lead to strong confinement (Lamb-Dicke regime, LDR) and thus to optical spectroscopy free of these effects. For non-laser coolable spectroscopy ions, this has so far only been achieved when trapping one or two atomic ions, together with a single laser-coolable atomic ion [1,2]. Here we show that one-photon optical spectroscopy free of Doppler and transit broadening can also be obtained with more easily prepared ensembles of ions, if performed with mid-infrared radiation. We demonstrate the method on molecular ions. We trap approximately 100 molecular hydrogen ions (HD$^{+}$) within a Coulomb cluster of a few thousand laser-cooled atomic ions and perform laser spectroscopy of the fundamental vibrational transition. Transition frequencies were determined with lowest uncertainty of 3.3$\times$10$^{-12}$ fractionally. As an application, we determine the proton-electron mass ratio by matching a precise ab initio calculation with the measured vibrational frequency.	翻訳日:2023-04-07 04:35:37 公開日:2021-08-24
# 散逸結合縮退光パラメトリック発振器における絡み合った猫状態の生成と検出 Generating and detecting entangled cat states in dissipatively coupled degenerate optical parametric oscillators ( http://arxiv.org/abs/2103.16090v2 ) ライセンス: Link先を確認	Zheng-Yang Zhou, Clemens Gneiting, J. Q. You, and Franco Nori	(参考訳) 非ガウス連続変数状態は、量子理論の基礎と創発的量子技術の両方において中心的な役割を果たす。特に「キャット状態」、すなわち2成分のマクロな量子重ね合わせは、量子コヒーレンスをアクセス可能な方法で具現化し、基本的なテストや量子情報タスクにも利用できる。縮退した光パラメトリック発振器は自然に単一モードの猫状態を生成できるため、その実現と活用に有望なプラットフォームとなる。縮退した光パラメトリック振動子間の散逸結合は、これを2モードの絡み合った猫状態、すなわち2モードの絡み合った猫状態へと拡張する。単一光子損失を克服することは、縮退した光パラメトリック発振器において十分に純粋な単一モードの猫状態を実現するための大きな課題である一方で、そのような散逸結合下での2つのモードの絡み合った猫状態の生成は、追加のハードルなしで達成できることを示す。 2つの散逸結合縮退光パラメトリック発振器において、一過性2モード絡み合い状態を生成するためのパラメータレジームを数値的に検討した。キャット状態の絡み合いを認証するために, キャット状態の絡み合いを現実的な条件下で確実に検出できる, 調整された分散型絡み合い基準を用いる。 Non-Gaussian continuous variable states play a central role both in the foundations of quantum theory and for emergent quantum technologies. In particular, "cat states", i.e., two-component macroscopic quantum superpositions, embody quantum coherence in an accessible way and can be harnessed for fundamental tests and quantum information tasks alike. Degenerate optical parametric oscillators can naturally produce single-mode cat states and thus represent a promising platform for their realization and harnessing. We show that a dissipative coupling between degenerate optical parametric oscillators extends this to two-mode entangled cat states, i.e., two-mode entangled cat states are naturally produced under such dissipative coupling. While overcoming single-photon loss still represents a major challenge towards the realization of sufficiently pure single-mode cat states in degenerate optical parametric oscillators, we show that the generation of two-mode entangled cat states under such dissipative coupling can then be achieved without additional hurdles. We numerically explore the parameter regime for the successful generation of transient two-mode entangled cat states in two dissipatively coupled degenerate optical parametric oscillators. To certify the cat-state entanglement, we employ a tailored, variance-based entanglement criterion, which can robustly detect cat-state entanglement under realistic conditions.	翻訳日:2023-04-06 03:49:17 公開日:2021-08-24
# 非対称一般化量子ラビ模型に対する隠れ対称性作用素 Hidden symmetry operators for asymmetric generalised quantum Rabi models ( http://arxiv.org/abs/2104.14164v2 ) ライセンス: Link先を確認	Xilin Lu, Zi-Min Li, Vladimir V. Mangazeev and Murray T. Batchelor	(参考訳) 非対称量子ラビモデル(aqrm)の隠れた$\mathbb{z}_2$対称性は、最近基礎となる対称性作用素の体系的構成を通じて明らかにされている。 AQRMの結果に基づいて、AQRM関連モデルの対称性演算子の一般的な形に対するアンサッツを提案する。このアンザッツを適用して、異方性AQRM、非対称ラビスタークモデル(ARSM)、および異方性ARSMの3つのモデルに対する対称性作用素を得る。 The hidden $\mathbb{Z}_2$ symmetry of the asymmetric quantum Rabi model (AQRM) has recently been revealed via a systematic construction of the underlying symmetry operator. Based on the AQRM result, we propose an ansatz for the general form of the symmetry operators for AQRM-related models. Applying this ansatz we obtain the symmetry operator for three models: the anisotropic AQRM, the asymmetric Rabi-Stark model (ARSM) and the anisotropic ARSM.	翻訳日:2023-04-02 02:18:53 公開日:2021-08-24
# 衝突型電池充電における量子スピードアップ Quantum speed-up in collisional battery charging ( http://arxiv.org/abs/2105.01863v2 ) ライセンス: Link先を確認	Stella Seah, Mart\'i Perarnau-Llobet, G\'eraldine Haack, Nicolas Brunner, Stefan Nimmrichter	(参考訳) 同一の非平衡量子ビット単位による量子電池の充電に関する衝突モデルを提案する。単位がエネルギー固有状態の混合で作成されると、電池のエネルギーゲインは古典的なランダムウォークによって説明され、平均エネルギーと分散は時間とともに線形に成長する。逆に、量子コヒーレンスを含む量子ビットでは、バッテリに干渉効果が蓄積され、量子ランダムウォークを思い起こさせるようなエネルギー分布がより速く広がる。これは、地上で初期化されたバッテリーの高速で効率的な充電に利用できる。具体的には、コヒーレントプロトコルは、任意の非コヒーレント戦略よりも高い充電パワーが得られることを示し、単一のバッテリのレベルで量子スピードアップを実証する。最後に, エルゴトロピーの概念を用いて, 電池から抽出可能な作業量を特徴付ける。 We present a collision model for the charging of a quantum battery by identical nonequilibrium qubit units. When the units are prepared in a mixture of energy eigenstates, the energy gain in the battery can be described by a classical random walk, where both average energy and variance grow linearly with time. Conversely, when the qubits contain quantum coherence, interference effects buildup in the battery and lead to a faster spreading of the energy distribution, reminiscent of a quantum random walk. This can be exploited for faster and more efficient charging of a battery initialized in the ground state. Specifically, we show that coherent protocols can yield higher charging power than any possible incoherent strategy, demonstrating a quantum speed-up at the level of a single battery. Finally, we characterize the amount of extractable work from the battery through the notion of ergotropy.	翻訳日:2023-04-01 13:24:17 公開日:2021-08-24
# 日々のアルゴリズム監査:有害なアルゴリズム行動に直面する日常生活者の力を理解する Everyday algorithm auditing: Understanding the power of everyday users in surfacing harmful algorithmic behaviors ( http://arxiv.org/abs/2105.02980v2 ) ライセンス: Link先を確認	Hong Shen, Alicia DeVos, Motahhare Eslami, Kenneth Holstein	(参考訳) 研究機関は、偏見と有害な行動に対するアルゴリズムシステム監査のための公式なアプローチを提案している。正式な監査アプローチは大きな影響を与えてきたが、システムのデプロイが完了すると、日常使用のコンテキストでのみ問題が発生するため、大きな盲点に陥りがちである。近年,アルゴリズムシステムの日常的使用者が,これらのシステムとの日常的な相互作用の中で遭遇する有害な行為を検知し,意識を高めるケースが増えている。しかし、これまでのところ、このボトムアップでユーザ主導の監査プロセスにはほとんど学術的な注意が払われていない。本稿では,アルゴリズムシステムとの日々のインタラクションを通じて問題のある機械の動作を検出し,理解し,問合せを行うプロセスである,日々のアルゴリズム監査の概念を提案し,検討する。我々は,ユーザのアルゴリズムに関する知識によらず,より中央集権的な監査形態による検出を不要とする,問題のあるマシン動作を克服する上で,日常的なユーザは強力である,と論じる。我々は、日常的なアルゴリズム監査の現実的な事例を分析し、これらの事例から将来のプラットフォームや監査行動を促進するツールの設計の教訓を導き出す。最後に,形式的監査アプローチと,アルゴリズムシステムの日常的利用において生じる有機的監査行動とのギャップを埋めるために,先行する作業について論じる。 A growing body of literature has proposed formal approaches to audit algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been greatly impactful, they often suffer major blindspots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems detect and raise awareness about harmful behaviors that they encounter in the course of their everyday interactions with these systems. However, to date little academic attention has been granted to these bottom-up, user-driven auditing processes. In this paper, we propose and explore the concept of everyday algorithm auditing, a process in which users detect, understand, and interrogate problematic machine behaviors via their day-to-day interactions with algorithmic systems. We argue that everyday users are powerful in surfacing problematic machine behaviors that may elude detection via more centrally-organized forms of auditing, regardless of users' knowledge about the underlying algorithms. We analyze several real-world cases of everyday algorithm auditing, drawing lessons from these cases for the design of future platforms and tools that facilitate such auditing behaviors. Finally, we discuss work that lies ahead, toward bridging the gaps between formal auditing approaches and the organic auditing behaviors that emerge in everyday use of algorithmic systems.	翻訳日:2023-04-01 07:32:04 公開日:2021-08-24
# 数個のコピーから量子多体システムを学ぶ Learning quantum many-body systems from a few copies ( http://arxiv.org/abs/2107.03333v2 ) ライセンス: Link先を確認	Cambyse Rouz\'e, Daniel Stilck Fran\c{c}a	(参考訳) 量子状態の物理特性を測定から推定することは、量子科学における最も基本的なタスクの1つである。本研究では,与えられた局所性の準局所可観測値の期待値から,系の大きさに多変量的に増大する多数のサンプルから相対誤差までを推定し,対象の可観測値の局所性を多項式的に推定することのできる状態の条件を同定する。これはいくつかのレジームにおいて既知のトモグラフィ法よりも指数関数的に改善される。我々は、量子状態、すなわち最大エントロピー法を量子最適輸送と古典影の新たな分野のツールと組み合わせることで、最も確立された手法の1つを達成する。我々は、この条件が相関のある種の減衰を示す全ての状態に対して成り立つと仮定し、いくつかの部分集合に対してそれを確立する。これらは、任意のハイパーグラフ上の局所通勤ハミルトニアンの1次元熱および高温ギブス状態や浅い回路の出力など、広く研究されている状態のクラスを含む。さらに,独立利害のサンプル複雑性を超えて,最大エントロピー法の改善を示す。これらは、多体状態の共分散行列の条件数に関する新しいバウンダリと同様に、ポストプロセッシングを効率的に実行することが可能なレギュレーションの同定を含む。 Estimating physical properties of quantum states from measurements is one of the most fundamental tasks in quantum science. In this work, we identify conditions on states under which it is possible to infer the expectation values of all quasi-local observables of a given locality up to a relative error from a number of samples that grows polylogarithmically with the system's size and polynomially on the locality of the target observables. This constitutes an exponential improvement over known tomography methods in some regimes. We achieve our results by combining one of the most well-established techniques to learn quantum states, namely the maximum entropy method, with tools from the emerging fields of quantum optimal transport and classical shadows. We conjecture that our condition holds for all states exhibiting some form of decay of correlations and establish it for several subsets thereof. These include widely studied classes of states such as one-dimensional thermal and high-temperature Gibbs states of local commuting Hamiltonians on arbitrary hypergraphs or outputs of shallow circuits. Moreover, we show improvements of the maximum entropy method beyond the sample complexity of independent interest. These include identifying regimes in which it is possible to perform the postprocessing efficiently as well as novel bounds on the condition number of covariance matrices of many-body states.	翻訳日:2023-03-23 04:14:28 公開日:2021-08-24
# フラクタル量子セルオートマトンからの非自明なリアプノフスペクトル Non-trivial Lyapunov spectrum from fractal quantum cellular automata ( http://arxiv.org/abs/2107.12191v2 ) ライセンス: Link先を確認	David Berenstein, Brian Kent	(参考訳) すべてのクリフォードセルオートマトンを含むクリフォードセルオートマトンの集合は、格子の各部位に2k$次元トーラス位相空間を持つ格子系の量子化によって生じる。ダイナミクスはトーラス変数の線型写像であり、また局所的でもある。さらにシンプレクティック構造も保持する。これらは、追加の形式変数の集合に整数係数を持つローラン多項式の成分を持つ2k\times 2k$行列によって分類される。これらのことは、量子代数の生成子の進化におけるフラクタルな振る舞いをもたらす。フラクタルな振る舞いは、元の線形力学系の非自明なリャプノフ指数をもたらす。この証明はこれらの行列の特性多項式のフーリエ解析を用いる。 A generalized set of Clifford cellular automata, which includes all Clifford cellular automata, result from the quantization of a lattice system where on each site of the lattice one has a $2k$-dimensional torus phase space. The dynamics is a linear map in the torus variables and it is also local: the evolution depends only on variables in some region around the original lattice site. Moreover it preserves the symplectic structure. These are classified by $2k\times 2k$ matrices with entries in Laurent polynomials with integer coefficients in a set of additional formal variables. These can lead to fractal behavior in the evolution of the generators of the quantum algebra. Fractal behavior leads to non-trivial Lyapunov exponents of the original linear dynamical system. The proof uses Fourier analysis on the characteristic polynomial of these matrices.	翻訳日:2023-03-20 21:29:27 公開日:2021-08-24
# 因果非分離プロセスにおける情報交換 Information Exchange in Causally Nonseparable Processes ( http://arxiv.org/abs/2108.07270v2 ) ライセンス: Link先を確認	Gianluca Francica	(参考訳) プロセスマトリックスフレームワークは、2つのパーティのシステムに対して、因果非分離構造の存在を予測する。情報交換を特徴とし,両当事者の総エントロピーが非分離性の尺度として作用することを示す。 For a system of two parties, the process matrix framework predicts the existence of causally nonseparable structures. We characterize the information exchanged, showing that the total entropy of the two parties acts as a measure for the nonseparability.	翻訳日:2023-03-18 07:20:42 公開日:2021-08-24
# 空間時間多重グリーンベルガー・ホルン・ザイリンガー(GHZ)測定を用いた量子ネットワークにおける距離非依存な絡み合い生成 Distance-Independent Entanglement Generation in a Quantum Network using Space-Time Multiplexed Greenberger-Horne-Zeilinger (GHZ) Measurements ( http://arxiv.org/abs/2108.09352v2 ) ライセンス: Link先を確認	Ashlesha Patil, Joshua I. Jacobson, Emily van Milligen, Don Towsley, Saikat Guha	(参考訳) リンクをうまく作成する量子ネットワークでは、隣り合うリピータノード間の共有ベル状態は、各タイムスロット内の確率$p$で、成功確率$q<1$、エンドからエンドのエンタングルメント生成速度は、マルチパスルーティングにもかかわらず、消費者間の距離とともに指数関数的に低下する。リピータが確率$q$で成功するGHZ基底で多重量子射影測定を行うことができれば、ある$(p,q)$領域で距離が変化しないが、指数関数的に外部に崩壊する。距離独立率が発生するこの領域は、新しいパーコレーション問題の超臨界領域である。我々は、このGHZプロトコルを拡張し、時間多重ブロック長$k$を組み込む。 k$が増加するにつれて、超臨界領域は拡大する。与えられた$(p,q)$の場合、絡み合い率は最初は$k$で増加し、超臨界領域の中で十分高い$k$で1/k$ GHZ状態として崩壊する。平均$\mu$ で指数関数的に分布するメモリコヒーレンス時間が組み込まれている場合、$k$ を増加させることで超臨界領域が無期限に増加することはない。最後に、スペース分割多重化、すなわち、上記プロトコルを最大$d$の切断されたネットワークリージョンで独立に実行することにより、$d$がネットワークのノード次数である場合、上記ランダム化されたローカルリンク状態プロトコルが超過できないスロットレート当たり1ghzの状態を超過することができる。 $(p,q)$が増加すると、1スロットあたり$d$GHZ状態の究極のミンカットエンタングルメント生成容量にアプローチすることができる。 In a quantum network that successfully creates links, shared Bell states between neighboring repeater nodes, with probability $p$ in each time slot, and performs Bell State Measurements at nodes with success probability $q<1$, the end to end entanglement generation rate drops exponentially with the distance between consumers, despite multi-path routing. If repeaters can perform multi-qubit projective measurements in the GHZ basis that succeed with probability $q$, the rate does not change with distance in a certain $(p,q)$ region, but decays exponentially outside. This region where the distance independent rate occurs is the supercritical region of a new percolation problem. We extend this GHZ protocol to incorporate a time-multiplexing blocklength $k$, the number of time slots over which a repeater can mix-and-match successful links to perform fusion on. As $k$ increases, the supercritical region expands. For a given $(p,q)$, the entanglement rate initially increases with $k$, and once inside the supercritical region for a high enough $k$, it decays as $1/k$ GHZ states per time slot. When memory coherence time exponentially distributed with mean $\mu$ is incorporated, it is seen that increasing $k$ does not indefinitely increase the supercritical region; it has a hard $\mu$ dependent limit. Finally, we find that incorporating space-division multiplexing, i.e., running the above protocol independently in up to $d$ disconnected network regions, where $d$ is the network's node degree, one can go beyond the 1 GHZ state per time slot rate that the above randomized local link-state protocol cannot surpass. As $(p,q)$ increases, one can approach the ultimate min-cut entanglement generation capacity of $d$ GHZ states per slot.	翻訳日:2023-03-17 22:52:53 公開日:2021-08-24
# ユーザエンゲージメントのためのモバイルヘルス設計 : 社会技術的アプローチの重要性 Designing Mobile Health for User Engagement: The Importance of Socio-Technical Approach ( http://arxiv.org/abs/2108.09786v2 ) ライセンス: Link先を確認	Tochukwu Ikwunne, Lucy Hederman and P.J. Wall	(参考訳) グローバル・サウスにおけるモバイルヘルス(mHealth)の有効性に対するユーザエンゲージメントの重要性にもかかわらず、そのような介入の多くはユーザエンゲージメント属性を含まない。これは、社会技術的側面が設計、開発、実装においてしばしば考慮されないためである。また,mHealthのユーザ中心設計プロセスにおいて社会技術的要因が果たす役割については,文献上はほとんど議論されていない。本研究は、mHealthデザインとユーザエンゲージメントに対するテクノ中心のアプローチと、ユーザ中心のデザインに既存の普遍的なフレームワークに依存しているアプローチが、グローバル・サウスのほとんどのmHealthプロジェクトが維持できない結果に、効果がないことを証明している。本研究は, ユーザエンゲージメントに対する態度を探るため, mHealthデザイナや開発者と半構造化インタビューを行ったシエラレオネのプロジェクトについて検討する。ユーザエンゲージメントの障壁とファシリテータは、技術的あるいは社会技術的に識別され、分類された。調査の結果,社会的要因を考慮せずに技術中心のアプローチを採用することは,ユーザのエンゲージメントに悪影響を及ぼす可能性が示唆された。そこで本研究では,mHealthにユーザエンゲージメント属性をより効果的に組み込むための新しい設計フレームワークを提案する。 Despite the significance of user engagement for efficacy of mobile health (mHealth) in the Global South, many such interventions do not include user-engaging attributes. This is because socio-technical aspects are frequently not considered during the design, development, and implementation, stages of such initiatives. In addition, there is little discussion in the literature about the role socio-technical factors play in user-centered design processes for mHealth. This research posits consideration of socio-technical factors is required as techno-centric approaches to mHealth design and user engagement, as well as those relying on existing universal frameworks for user-centered design, have proven to be ineffective with the result that most mHealth projects in the Global South fail to sustain. This research examines projects in Sierra Leone where semi-structured interviews were conducted with mHealth designers and developers in order to explore their attitudes towards user engagement in this case. Barriers and facilitators to user engagement were identified and classified as either technical or socio-technical. Findings from the study indicate that adoption of a techno-centric approach without consideration of socio-technical factors can negatively affect user's engagement. Based on these findings, we propose to develop a new design framework for more effective inclusion of user-engaging attributes in mHealth.	翻訳日:2023-03-17 18:45:14 公開日:2021-08-24
# ハイブリッド電荷センサ単電子トランジスタとCMOS回路のシミュレーション Simulations of hybrid charge-sensing single-electron-transistors and CMOS circuits ( http://arxiv.org/abs/2108.10467v1 ) ライセンス: Link先を確認	Tetsufumi Tanamoto and Keiji Ono	(参考訳) 単一電子トランジスタ(SET)は、量子計算など多くの分野で電荷センサとして広く使われている。一般に、SETの信号は相補的金属酸化物半導体(CMOS)デバイスよりも小さく、増幅回路の多くはこれらの信号を拡大するために必要である。 1つの小さな出力を増幅する代わりに、理論上はSETの1つが参照として使用されるようなペアのSETの増幅を考える。従来のSPICE(Simulation Program with Integrated Circuit Emphasis)回路シミュレータを用いて,SETとCMOSデバイスの2段階増幅過程をシミュレートする。 CMOS回路へのSETのペア実装により、SETからCMOS回路への直接信号転送により、SETの統合がより実現可能となる。 Single-electron transistors (SETs) have been extensively used as charge sensors in many areas such as quantum computations. In general, the signals of SETs are smaller than those of complementary metal-oxide semiconductor (CMOS) devices, and many amplifying circuits are required to enlarge these signals. Instead of amplifying a single small output, we theoretically consider the amplification of pairs of SETs, such that one of the SETs is used as a reference. We simulate the two-stage amplification process of SETs and CMOS devices using a conventional SPICE (Simulation Program with Integrated Circuit Emphasis) circuit simulator. Implementing the pairs of SETs into CMOS circuits makes the integration of SETs more feasible because of direct signal transfer from the SET to the CMOS circuits.	翻訳日:2023-03-17 07:52:42 公開日:2021-08-24
# 曲線空間時間における2つの検出器間の平衡および非平衡量子相関 Equilibrium and nonequilibrium quantum correlations between two detectors in curved space time ( http://arxiv.org/abs/2108.10454v1 ) ライセンス: Link先を確認	He Wang and Jin Wang	(参考訳) 2量子系(カーブラックホールの地平線付近)で符号化された平衡および非平衡量子情報相関について検討した。質量と角運動量,さらに局所曲率や加速度が2量子ビット間の量子相関の挙動に及ぼす影響について検討した。 2つの量子ビットの量子情報は時空構造にエンコードされていることを示す。非平衡の場合、非平衡は相関にも寄与する。 We investigate the equilibrium and nonequilibrium quantum information correlations encoded in two-qubit system (near the horizon of a Kerr black hole). We study the impact of mass and the angular momentum, and further the local curvature or accelerations on the behaviors of the quantum correlations between two qubits. We show the quantum information of two qubits is encoded in the space time structure. In nonequilibrium case, the nonequilibrium can also contribute to the correlations.	翻訳日:2023-03-17 07:51:59 公開日:2021-08-24
# 複素化された Poincar\'e 群の普遍被覆の既約ユニタリ表現のレビューと具体的な記述 Review and concrete description of the irreducible unitary representations of the universal cover of the complexified Poincar\'e group ( http://arxiv.org/abs/2108.10726v1 ) ライセンス: Link先を確認	Luigi Borasi	(参考訳) 我々は、既約ユニタリ表現を $\mathbb{C}^4\rtimes\mathbf{Spin}(4,\mathbb{C})$, すなわち、複素化されたポアンカル群 $\mathbb{C}^4\rtimes\mathbf{SO}(4,\mathbb{C})$ の普遍被覆の教育的表現を与える。これらの表現は1967年にロフマンによって初めて研究された。我々は、この文脈で関連する一般的なウィグナー・マッキー理論の事実とともに、彼の結果の現代的な定式化を提供する。さらに、これらの表現を実現するための異なる方法について議論し、非ゼロの「複素質量」の場合、より明示的な実現の詳細な構成を与える。この明示的な実現は、古典的なウィグナーの場合の $\mathbb{R}^4\rtimes\mathbf{Spin}^0(1,3)$ と平行して拡張する。我々の分析は、フェルミオン理論のユークリッド的定式化への関心が動機である。 We give a pedagogical presentation of the irreducible unitary representations of $\mathbb{C}^4\rtimes\mathbf{Spin}(4,\mathbb{C})$, that is, of the universal cover of the complexified Poincar\'e group $\mathbb{C}^4\rtimes\mathbf{SO}(4,\mathbb{C})$. These representations were first investigated by Roffman in 1967. We provide a modern formulation of his results together with some facts from the general Wigner-Mackey theory which are relevant in this context. Moreover, we discuss different ways to realize these representations and, in the case of a non-zero "complex mass", we give a detailed construction of a more explicit realization. This explicit realization parallels and extends the one used in the classical Wigner case of $\mathbb{R}^4\rtimes\mathbf{Spin}^0(1,3)$. Our analysis is motivated by the interest in the Euclidean formulation of Fermionic theories.	翻訳日:2023-03-17 07:48:47 公開日:2021-08-24
# BTZブラックホールの絡み合いパターンのクラスター代数的記述 Cluster algebraic description of entanglement patterns for the BTZ black hole ( http://arxiv.org/abs/2108.10638v1 ) ライセンス: Link先を確認	Bercel Boldis and P\'eter L\'evay	(参考訳) 高温限界における静的BTZブラックホールと双対な2次元共形場理論の熱状態について検討する。静的BTZスライスの境界を$N$サブシステムに分割した後、熱状態の絡み合いパターンを符号化する基盤となる$C_{N-1}$クラスタ代数が存在することを示す。また、固定された$N$に対してそのようなパターンを幾何学的にカプセル化するポリトープがシクロヘドロン${\mathcal C}_{N-1}$であることを示す。あるいは、これらの絡み合いのパターンは、Zamorodchikov $Y$-system of $C_{N-1}$ typeという用語で測地学(キネマティック空間)の空間に表せる。このような$y$-システムの境界条件は、btzブラックホールのエントロピーを特徴としている。 We study the thermal state of a two dimensional conformal field theory which is dual to the static BTZ black hole in the high temperature limit. After partitioning the boundary of the static BTZ slice into $N$ subsystems we show that there is an underlying $C_{N-1}$ cluster algebra encoding entanglement patterns of the thermal state. We also demonstrate that the polytope encapsulating such patterns in a geometric manner for a fixed $N$ is the cyclohedron ${\mathcal C}_{N-1}$. Alternatively these patterns of entanglement can be represented in the space of geodesics (kinematic space) in terms of a Zamolodchikov $Y$-system of $C_{N-1}$ type. The boundary condition for such an $Y$-system is featuring the entropy of the BTZ black hole.	翻訳日:2023-03-17 07:47:36 公開日:2021-08-24
# 非可換輸送距離空間上の量子チャネルのリッチ曲率 Ricci curvature of quantum channels on non-commutative transportation metric spaces ( http://arxiv.org/abs/2108.10609v1 ) ライセンス: Link先を確認	Li Gao and Cambyse Rouz\'e	(参考訳) Ollivierの研究に続いて、状態空間上の非可換計量の縮約として、量子チャネルの粗いリッチ曲率を導入する。これらの指標は[N]の精神における非可換輸送コストとして定義される。ゴズランとc. l\'{e}onard。 2006] は、文献の異なる量子ワッサースタイン距離に対する統一的なアプローチを与える。粗リッチ曲率下限とその双対勾配推定は、適切な仮定の下では、ポアンカルの不等式(スペクトルギャップ)および輸送コストの不等式を意味する。干渉関係を用いて、ギブスサンプル、ボソニックおよびフェルミオンビームスプリッターおよびn-量子ビット上のパウリチャネルの粗リッチ曲率の正の有界を得る。 Following Ollivier's work, we introduce the coarse Ricci curvature of a quantum channel as the contraction of non-commutative metrics on the state space. These metrics are defined as a non-commutative transportation cost in the spirit of [N. Gozlan and C. L\'{e}onard. 2006], which gives a unified approach to different quantum Wasserstein distances in the literature. We prove that the coarse Ricci curvature lower bound and its dual gradient estimate, under suitable assumptions, imply the Poincar\'{e} inequality (spectral gap) as well as transportation cost inequalities. Using intertwining relations, we obtain positive bounds on the coarse Ricci curvature of Gibbs samplers, Bosonic and Fermionic beam-splitters as well as Pauli channels on n-qubits.	翻訳日:2023-03-17 07:47:12 公開日:2021-08-24
# 量子誤差補正のための効率的な診断法 Efficient diagnostics for quantum error correction ( http://arxiv.org/abs/2108.10830v1 ) ライセンス: Link先を確認	Pavithran Iyer, Aditya Jain, Stephen D. Bartlett and Joseph Emerson	(参考訳) フォールトトレラント量子コンピューティングは、リソースのオーバーヘッドを正確に見積もる必要があるが、ゲート忠実度やダイヤモンド距離といった標準メトリクスは、論理性能の予測に乏しいことが示されている。本稿では,pauliエラー再構成に基づくスケーラブルな実験手法を提案する。数値的なエビデンスから,本手法は,限られたデータであっても,様々な誤差モデルに対する標準誤差測定値に基づいて予測を著しく上回ることを示す。本稿では,この手法が誤り訂正スキームの選択にどのように役立つかを説明する。 Fault-tolerant quantum computing will require accurate estimates of the resource overhead, but standard metrics such as gate fidelity and diamond distance have been shown to be poor predictors of logical performance. We present a scalable experimental approach based on Pauli error reconstruction to predict the performance of concatenated codes. Numerical evidence demonstrates that our method significantly outperforms predictions based on standard error metrics for various error models, even with limited data. We illustrate how this method assists in the selection of error correction schemes.	翻訳日:2023-03-17 07:38:57 公開日:2021-08-24
# 量子コンピューティング応用のためのFermilabにおける大規模ミリケルビンプラットフォーム A large millikelvin platform at Fermilab for quantum computing applications ( http://arxiv.org/abs/2108.10816v1 ) ライセンス: Link先を確認	Matthew Hollister, Ram Dhuley and Grzegorz Tatkowski	(参考訳) 大きなmk冷却プラットフォームの必要性は、量子コンピューティングプラットフォームにおいて、ますます多くの極低温量子ビットをホストしたいという願望に支えられている。我々は,国立量子イニシアティブの下でエネルギー省から資金提供を受けたフェルミラボの超伝導量子材料・システムセンターの一環として,ミリケルビン温度を2m×1.5m程度の実験量で到達可能な極低温プラットフォームを開発している。このプラットフォームは超伝導高周波加速器キャビティ技術に基づく3次元量子ビットアーキテクチャをホストすることを目的としている。本稿では,プラットフォームの基本設計と期待する性能パラメータについて述べる。 The need for larger mK cooling platforms is being driven by the desire to host ever growing numbers of cryogenic qubits in quantum computing platforms. As part of the Superconducting Quantum Materials and Systems Center at Fermilab funded through the Department of Energy under the National Quantum Initiative, we are developing a cryogenic platform capable of reaching millikelvin temperatures in an experimental volume of 2 meters diameter by approximately 1.5 meters in height. The platform is intended to host a three-dimensional qubit architecture based on superconducting radiofrequency accelerator cavity technologies. This paper describes the baseline design of the platform, along with the expected key performance parameters.	翻訳日:2023-03-17 07:38:47 公開日:2021-08-24
# 非半単純tqftからの擬エルミートレビン-ウェンモデル Pseudo-Hermitian Levin-Wen models from non-semisimple TQFTs ( http://arxiv.org/abs/2108.10798v1 ) ライセンス: Link先を確認	Nathan Geer, Aaron D. Lauda, Bertrand Patureau-Mirand, Joshua Sussan	(参考訳) 完全可解な擬エルミート型2次元スピンハミルトニアンの大きなクラスを構築する。これらの系の基底状態はシステムの空間的トポロジーにのみ依存する。トラエフ・ビロモデルを一般化した非半単純tqftを用いて,表面上の基底状態系を表面に割り当てられた値で同定する。非自明な例は、量子パラメータがユニタリの根に特殊化される量子sl(2)の表現の非半単純部分圏から生じる。 We construct large classes of exactly solvable pseudo-Hermitian 2D spin Hamiltonians. The ground states of these systems depend only on the spatial topology of the system. We identify the ground state system on a surface with the value assigned to the surface by a non-semisimple TQFT generalizing the Turaev-Viro model. A non-trivial example arises from a non-semisimple subcategory of representations of quantum sl(2) where the quantum parameter is specialized to a root of unity.	翻訳日:2023-03-17 07:38:24 公開日:2021-08-24
# マルコフ浴と量子雪崩 Markovian baths and quantum avalanches ( http://arxiv.org/abs/2108.10796v1 ) ライセンス: Link先を確認	Dries Sels	(参考訳) 本稿では,多体局所化相と熱介在物の安定性に関する数値的な結果について述べる。この作業は、Morningstarらによる最近の提案を単純化する。 [arXiv:2107.05642]およびマルコフ浴に摂動的に結合する小さな乱れたスピン鎖の研究。正準不定形ハイゼンベルク鎖の雪崩安定性に対する臨界障害はW>20を超えた。アンダーソン絶縁体とは対照的に、雪崩しきい値はシステムサイズとかなりずれており、研究体制の飽和の証拠はない。私は、結果は多体局所化フェーズの欠如によって最も容易に説明できると主張する。 In this work I will discuss some numerical results on the stability of the many-body localized phase to thermal inclusions. The work simplifies a recent proposal by Morningstar et al. [arXiv:2107.05642] and studies small disordered spin chains which are perturbatively coupled to a Markovian bath. The critical disorder for avalanche stability of the canonical disordered Heisenberg chain is shown to exceed W>20. In stark contrast to the Anderson insulator, the avalanche threshold drifts considerably with system size, with no evidence of saturation in the studied regime. I will argue that the results are most easily explained by the absence of a many-body localized phase.	翻訳日:2023-03-17 07:38:16 公開日:2021-08-24
# シリコンスピン量子ビットの低劣化・ロバストマイクロマグネット設計 Low dephasing and robust micromagnet designs for silicon spin qubits ( http://arxiv.org/abs/2108.10769v1 ) ライセンス: Link先を確認	N. I. Dumoulin Stuyck, F. A. Mohiyaddin, R. Li, M. Heyns, B. Govoreanu, and I. P. Radu	(参考訳) シリコン量子ビットでの電子スピン操作を可能にするためにマイクロマグネットが登場し、99:9%以上の単一量子ビットゲートフィデリティを実現している。しかし、これらのマイクロマグネットは歪んだ磁場勾配を量子ビットに応用し、スピン状態は電場ノイズの影響を受けやすく、コヒーレンス時間を制限している。ここでは、量子ビットの劣化を最小限に抑えつつ、高速な量子ビット制御とアドレス可能性を実現するマグネットの設計について述べる。具体的には、磁場勾配による劣化を最小限に抑え、量子ドットに対する磁気次元と位置を設計、最適化する。この設計によるマイクロマグネットによるデフェスレートは、最先端の実装よりも最大3桁低いため、長いコヒーレンス時間を可能にする。この設計は製造誤差に対して堅牢であり、様々なシリコン量子ビットデバイスジオメトリと組み合わせることで、コヒーレンス制限因子の探索と新しいアップスケーリングアプローチを可能にする。 Using micromagnets to enable electron spin manipulation in silicon qubits has emerged as a very popular method, enabling single-qubit gate fidelities larger than 99:9%. However, these micromagnets also apply stray magnetic field gradients onto the qubits, making the spin states susceptible to electric field noise and limiting their coherence times. We describe here a magnet design that minimizes qubit dephasing, while allowing for fast qubit control and addressability. Specifically, we design and optimize magnet dimensions and position relative to the quantum dots, minimizing dephasing from magnetic field gradients. The micromagnet-induced dephasing rates with this design are up to 3-orders of magnitude lower than state-of-the-art implementations, allowing for long coherence times. This design is robust against fabrication errors, and can be combined with a wide variety of silicon qubit device geometries, thereby allowing exploration of coherence limiting factors and novel upscaling approaches.	翻訳日:2023-03-17 07:37:06 公開日:2021-08-24
# 識別可能な量子エミッタの光絡み合い Optical Entanglement of Distinguishable Quantum Emitters ( http://arxiv.org/abs/2108.10928v1 ) ライセンス: Link先を確認	David Levonian, Ralf Riedinger, Bartholomeus Machielse, Erik Knall, Mihir Bhaskar, Can Knaut, Rivka Bekenstein, Hongkun Park, Marko Loncar, Mikhail Lukin	(参考訳) 固体量子エミッターは、長寿命スピン記憶、高忠実度局所演算、長距離絡み合いのための光接続により量子ネットワークの実現に有望な候補である。しかし、局所環境の違いにより、固体エミッタは通常、異なる遷移周波数を特徴とし、任意のエミッタ対間の光的に媒介する絡み合いを作るのが困難である。本稿では,多くの直線幅で分離された光遷移を持つエミッタの接合方法を提案する。本手法では, 電子光学変調器を用いて, 一対のスピン量子ビットのパリティ測定を行うことができる。 7.4GHzの光遷移を持つダイヤモンドナノフォトニックキャビティを用いた2つのシリコン空洞を用いたプロトコルを実験的に実証した。識別可能なエミッタで作業することで、個別の量子ビットアドレッシングと読み出しが可能となり、コロケーションと空間分離エミッタの並列制御と絡み合いが可能になり、量子情報処理システムのスケールアップに向けた重要なステップとなる。 Solid-state quantum emitters are promising candidates for the realization of quantum networks, owing to their long-lived spin memories, high-fidelity local operations, and optical connectivity for long-range entanglement. However, due to differences in local environment, solid-state emitters typically feature a range of distinct transition frequencies, which makes it challenging to create optically mediated entanglement between arbitrary emitter pairs. We propose and demonstrate an efficient method for entangling emitters with optical transitions separated by many linewidths. In our approach, electro-optic modulators enable a single photon to herald a parity measurement on a pair of spin qubits. We experimentally demonstrate the protocol using two silicon-vacancy center sin a diamond nanophotonic cavity, with optical transitions separated by 7.4 GHz. Working with distinguishable emitters allows for individual qubit addressing and readout, enabling parallel control and entanglement of both co-located and spatially separated emitters, a key step towards scaling up quantum information processing systems	翻訳日:2023-03-17 07:30:48 公開日:2021-08-24
# 物質波導波路QEDにおけるマルチバンドおよびアレイ効果 Multiband and array effects in matter-wave-based waveguide QED ( http://arxiv.org/abs/2108.11759v1 ) ライセンス: Link先を確認	Alfonso Lanuza, Joonhyuk Kwon, Youngshin Kim and Dominik Schneble	(参考訳) 原子性物質波の自然放出に関する最近の実験は、導波路に結合した量子エミッタの挙動に新しい窓を開く。本稿では、導波路の帯域分散関係を近似することなく、理論上このシステムを研究するための無限積に基づくアプローチを開発する。本研究では, 1, 複数, 無限個の量子エミッタの1次元配列のシステムを解くとともに, 実験との比較を行った。このことは崩壊スペクトルの詳細な特性を導き、対流境界状態の族、超放射と異なるマルコフ放射を増強するための新しいメカニズム、物質-波分極の出現へと繋がる。 Recent experiments on spontaneous emission of atomic matter waves open a new window into the behavior of quantum emitters coupled to a waveguide. Here we develop an approach based on infinite products to study this system theoretically, without the need to approximate the band dispersion relation of the waveguide. We solve the system for a one-dimensional array of one, multiple and an infinite number of quantum emitters and compare with the experiments. This leads to a detailed characterization of the decay spectrum, with a family of in-gap bound states, new mechanisms for enhanced Markovian emission different from superradiance, and the emergence of matter-wave polaritons.	翻訳日:2023-03-17 07:21:12 公開日:2021-08-24
# GIAOs上の複雑な2電子積分のコレスキー分解:強磁場下での大分子に対する効率的なMP2計算 Cholesky decomposition of complex two-electron integrals over GIAOs: Efficient MP2 computations for large molecules in strong magnetic fields ( http://arxiv.org/abs/2108.11370v1 ) ライセンス: Link先を確認	Simon Blaschke and Stella Stopkowicz	(参考訳) 大規模量子化学計算では、電子反発積分(ERI)テンソルがメモリとディスク空間のボトルネックとなる。外部有限磁場を用いると、置換対称性が減少し、複雑な積分や波動関数パラメータを扱う必要があるため、この問題はさらに顕著になる。この問題を緩和する一つの方法は、ゲージを含む原子軌道上の複素エリスにコレスキー分解(cd)を適用することである。厳密でロバストなエラー制御を維持しつつ、選択されたベースセットから線形依存する製品密度を選択的に捨てて好適な圧縮率を確立する。このエラー制御は、事前定義された補助基底セットに依存する密度フィッティングのような概念的に類似した方法よりも大きな利点となる。有限体 (ff) Hartree-Fock と ff 2次 M{\o}ller Plesset 摂動理論の枠組みにおける CD の利用を実装した。本研究は,CD圧縮速度が有限磁場の存在下での計算において特に有用であることを示す。 FF-CD-MP2方式は、2000以上の基底関数を持つ系の適切な時間間隔で強磁場下での相関処理を可能にする。 In large-scale quantum-chemical calculations the electron-repulsion integral (ERI) tensor rapidly becomes the bottleneck in terms of memory and disk space. When an external finite magnetic field is employed, this problem becomes even more pronounced because of the reduced permutational symmetry and the need to work with complex integrals and wave-function parameters. One way to alleviate the problem is to employ a Cholesky decomposition (CD) to the complex ERIs over gauge-including atomic orbitals. The CD scheme establishes favourable compression rates by selectively discarding linearly dependent product densities from the chosen basis set while maintaining a rigorous and robust error control. This error control constitutes the main advantage over conceptually similar methods such as density fitting which rely on employing pre-defined auxiliary basis sets. We implemented the use of the CD in the framework of finite-field (ff) Hartree-Fock and ff second-order M{\o}ller Plesset perturbation theory. Our work demonstrates that the CD compression rates are particularly beneficial in calculations in the presence of a finite magnetic field. The ff-CD-MP2 scheme enables the correlated treatment of systems with more than 2000 basis functions in strong magnetic fields within a reasonable time span.	翻訳日:2023-03-17 07:20:59 公開日:2021-08-24
# 時間ではなく1つの空間座標を扱うハミルトン的な形式主義 A Hamiltonian-like formalism that treats one spatial coordinate -- rather than time -- differently ( http://arxiv.org/abs/2108.11330v1 ) ライセンス: Link先を確認	Sivapalan Chelvaniththilan	(参考訳) クナトゥム場理論(QFT)のハミルトン形式とラグランジュ形式は同値である。しかし、ローレンツ不変性はラグランジュ形式論において明確に見ることができるが、ハミルトニアン形式ではそれほど明確ではない。これは、時間はハミルトニアン形式論の空間座標とは少し異なる扱いを受けるからである。本稿では、(作用素と状態ベクトルを持つ)ハミルトニアン形式と同様に、空間座標の2つと等しい足場で時間を扱う別の形式を考案できるかどうかを考察するが、3つ目の形式は異なる扱いをされるが、時間は通常ハミルトニアン形式である。 The Hamiltonian and Lagrangian formalisms of Qunatum Field Theory (QFT) are equivalent. But while Lorentz invariance can be clearly seen in the Lagrangian formalism, it is not so explicit in the Hamiltonian one. This is because time is treated a little differently from the spatial coordinates in the Hamiltonian formalism. In this paper, I explore whether it is possible to devise another formalism that is just like the Hamiltonian one (with operators and state vectors) but which treats time on an equal footing with two of the spatial coordinates, while the third one is treated differently, the way time is in the usual Hamiltonian formalism.	翻訳日:2023-03-17 07:20:38 公開日:2021-08-24
# 全シリコン300mm集積プロセスにおける均一スピン量子デバイス Uniform Spin Qubit Devices in an All-Silicon 300 mm Integrated Process ( http://arxiv.org/abs/2108.11317v1 ) ライセンス: Link先を確認	N. I. Dumoulin Stuyck, R. Li, C. Godfrin, A. Elsayed, S. Kubicek, J. Jussot, B. T. Chan, F. A. Mohiyaddin, M. Shehata, G. Simion, Y. Canvel, L. Goux, M. Heyns, B. Govoreanu, and I. P. Radu	(参考訳) 電子スピン量子ビットの大きな配列は、製造とデバイス均一性を大幅に改善する必要がある。ここでは300KからmKまで優れた量子ビットデバイス均一性とチューニング性を示す。これは、重なり合う多結晶シリコン系ゲートスタックを「オールシリコン」とリソグラフィ的に柔軟な300mm流に組み込むことで、初めて達成される。低次si/sio$_2$は、10kホール移動度が1.5 \cdot 10^4$ $cm^2$/vsである。電荷ノイズが低い(3.6$\mu$eV/$\sqrt{\mathrm{Hz}}$ at 1 Hz)よく制御されたセンサーは、最後の電子まで電荷を感知するために用いられる。約20年間(2-100GHz)にわたって優れた再現可能な相互結合制御を実証した。スピン操作と単発スピン読み出しを行い,約150$\mu$eVの谷分割エネルギーを抽出した。これらの低順で均一な量子ビットデバイスと300mmのfab統合は、大規模量子プロセッサへの高速なスケールアップの道を開く。 Larger arrays of electron spin qubits require radical improvements in fabrication and device uniformity. Here we demonstrate excellent qubit device uniformity and tunability from 300K down to mK temperatures. This is achieved, for the first time, by integrating an overlapping polycrystalline silicon-based gate stack in an 'all-Silicon' and lithographically flexible 300mm flow. Low-disorder Si/SiO$_2$ is proved by a 10K Hall mobility of $1.5 \cdot 10^4$ $cm^2$/Vs. Well-controlled sensors with low charge noise (3.6 $\mu$eV/$\sqrt{\mathrm{Hz}}$ at 1 Hz) are used for charge sensing down to the last electron. We demonstrate excellent and reproducible interdot coupling control over nearly 2 decades (2-100 GHz). We show spin manipulation and single-shot spin readout, extracting a valley splitting energy of around 150 $\mu$eV. These low-disorder, uniform qubit devices and 300mm fab integration pave the way for fast scale-up to large quantum processors.	翻訳日:2023-03-17 07:20:27 公開日:2021-08-24
# 均等:制御摂動の注入による量子アニーラーの忠実性の向上 EQUAL: Improving the Fidelity of Quantum Annealers by Injecting Controlled Perturbations ( http://arxiv.org/abs/2108.10964v1 ) ライセンス: Link先を確認	Ramin Ayanzadeh, Poulami Das, Swamit S. Tannu and Moinuddin Qureshi	(参考訳) 量子コンピューティング (quantum computing) は、量子力学特性を用いて計算困難問題を高速化する情報処理パラダイムである。有望だが、既存のゲートベースの量子コンピュータは数十キュービットしかなく、ほとんどのアプリケーションでは十分ではない。一方、数千の量子ビットを持つ既存のQAは、いくつかのドメイン固有の最適化問題を解く可能性がある。 QAは単一命令マシンであり、プログラムを実行するために、ハミルトニアンにキャストされ、ハードウェアに埋め込まれ、単一の量子マシン命令(QMI)が実行される。残念なことに、ハードウェアのノイズと欠陥は、QMIが数千のトライアルで実行されているとしても、QAのサブ最適化ソリューションをもたらす。 QAのプログラム可能性の制限は、ユーザが全てのトライアルで同じQMIを実行することを意味する。この実験はすべて、実行中に同様のノイズプロファイルを経験し、体系的なバイアスをもたらす。我々は,系統的バイアスが最適解につながり,より多くの試行の実行や既存の誤り緩和スキームを用いることで軽減できないことを観察する。この課題に対処するために、EQUAL(Ensemble Quantum Annealing)を提案する。 EQUALは、制御された摂動をプログラムQMIに追加することにより、QMIのアンサンブルを生成する。 QMIのアンサンブルは、QA上で実行されると、全てのトライアルで同じバイアスに遭遇することを避けて、ソリューションの品質を向上させる。 2041-qubit D-Wave QAを用いて評価したところ、EQUALは平均14%(最大26%)でベースラインと理想の差を橋渡しし、追加の試行は不要であった。 EQUALは既存のエラー軽減スキームと組み合わせて、ベースラインとイデアルの違いを平均で55%(最大68%)橋渡しすることができる。 Quantum computing is an information processing paradigm that uses quantum-mechanical properties to speedup computationally hard problems. Although promising, existing gate-based quantum computers consist of only a few dozen qubits and are not large enough for most applications. On the other hand, existing QAs with few thousand of qubits have the potential to solve some domain-specific optimization problems. QAs are single instruction machines and to execute a program, the problem is cast to a Hamiltonian, embedded on the hardware, and a single quantum machine instruction (QMI) is run. Unfortunately, noise and imperfections in hardware result in sub-optimal solutions on QAs even if the QMI is run for thousands of trials. The limited programmability of QAs mean that the user executes the same QMI for all trials. This subjects all trials to a similar noise profile throughout the execution, resulting in a systematic bias. We observe that systematic bias leads to sub-optimal solutions and cannot be alleviated by executing more trials or using existing error-mitigation schemes. To address this challenge, we propose EQUAL (Ensemble Quantum Annealing). EQUAL generates an ensemble of QMIs by adding controlled perturbations to the program QMI. When executed on the QA, the ensemble of QMIs steers the program away from encountering the same bias during all trials and thus, improves the quality of solutions. Our evaluations using the 2041-qubit D-Wave QA show that EQUAL bridges the difference between the baseline and the ideal by an average of 14% (and up to 26%), without requiring any additional trials. EQUAL can be combined with existing error mitigation schemes to further bridge the difference between the baseline and ideal by an average of 55% (and up to 68%).	翻訳日:2023-03-17 07:20:11 公開日:2021-08-24
# 量子崩壊の形式的側面 Formal Aspects of Quantum Decay ( http://arxiv.org/abs/2108.10957v1 ) ライセンス: Link先を確認	D. F. Ram\'irez Jim\'enez and N. G. Kelkar	(参考訳) 不安定状態の生存確率の計算のためのフォック・クリロフ形式は、状態の密度に関する数学的制約に特に注意を払って再検討され、フーリエ変換は生存振幅を与える。純粋な指数的生存振幅に対応する状態の密度を構築することは不可能であることを示す。彼の生存確率 $p(t)$ と状態密度の自己相関関数はコサインフーリエ変換の対であることが示されている。この結果はウィナー・ヒンチンの定理の特別な場合であり、$P(t)$ を時間の偶関数とし、それによって状態の密度が大きなエネルギーで消えるフォームファクタを含むように強制する。振動数の関数として$P(t)$,$n$を表現し、非指数性から指数性への移行領域の小さな時間における部分的特徴と、大きな時間におけるパワーローの崩壊への指数性について論じる。短時間の遷移は、生存確率が1つの振動を完了した時に起こる。発振の数は共鳴状態の性質に依存し、不安定状態の進化の完全な記述は各領域における発振の数の限界を決定することによって提供される。 The Fock-Krylov formalism for the calculation of survival probabilities of unstable states is revisited paying particular attention to the mathematical constraints on the density of states, the Fourier transform of which gives the survival amplitude. We show that it is not possible to construct a density of states corresponding to a purely exponential survival amplitude. he survival probability $P(t)$ and the autocorrelation function of the density of states are shown to form a pair of cosine Fourier transforms. This result is a particular case of the Wiener Khinchin theorem and forces $P(t)$ to be an even function of time which in turn forces the density of states to contain a form factor which vanishes at large energies. Subtle features of the transition regions from the non-exponential to the exponential at small times and the exponential to the power law decay at large times are discussed by expressing $P(t)$ as a function of the number of oscillations, $n$, performed by it. The transition at short times is shown to occur when the survival probability has completed one oscillation. The number of oscillations depend on the properties of the resonant state and a complete description of the evolution of the unstable state is provided by determining the limits on the number of oscillations in each region.	翻訳日:2023-03-17 07:19:39 公開日:2021-08-24
# オープン量子ロータ:相関と物理電流を接続する Open Quantum Rotors: Connecting Correlations and Physical Currents ( http://arxiv.org/abs/2108.10955v1 ) ライセンス: Link先を確認	Ricardo Puebla, Alberto Imparato, Alessio Belenchia, Mauro Paternostro	(参考訳) 我々は、温度の異なる一連の熱浴と相互作用する量子ロータの有限な一次元鎖を考える。ローター間の相互作用をキラルにすると、そのようなシステムは自律的な熱モーターとして振る舞う。このような動的応答は、熱力学的極限における系の基底状態が量子相転移を示すハミルトンパラメータの範囲で強く発音される。このようなワークポイントは、システムの状態内の大きな量子コヒーレンスと多部量子相関と関連付けられている。このことは、そのような量子自律モーターの最適動作機構が最大量子性の一つであることを示唆している。 We consider a finite one-dimensional chain of quantum rotors interacting with a set of thermal baths at different temperatures. When the interaction between the rotors is made chiral, such a system behaves as an autonomous thermal motor, converting heat currents into non-vanishing rotational ones. Such a dynamical response is strongly pronounced in the range of the Hamiltonian parameters for which the ground state of the system in the thermodynamic limit exhibits a quantum phase transition. Such working points are associated with large quantum coherence and multipartite quantum correlations within the state of the system. This suggests that the optimal operating regime of such quantum autonomous motor is one of maximal quantumness.	翻訳日:2023-03-17 07:19:19 公開日:2021-08-24
# キャブライディング時の通勤者の快適性に及ぼす運転行動の影響:ドライバーレーティングの新しい視点に向けて Impact of Driving Behavior on Commuter's Comfort during Cab Rides: Towards a New Perspective of Driver Rating ( http://arxiv.org/abs/2108.10944v1 ) ライセンス: Link先を確認	Rohit Verma, Sugandh Pargal, Debasree Das, Tanusree Parbat, Sai Shankar Kambalapalli, Bivas Mitra, and Sandip Chakraborty	(参考訳) タクシーの通勤の快適さは、ドライバーのレーティングや、uberやlyftのような配車会社の評価に影響する。既存の研究では、通勤者の快適性はパーソナライズされたレベルで異なるだけでなく、同じ通勤者に対して異なる旅行で異なる認識を受けることが示されている。さらに、運転行動や運転環境など、快適感に影響を及ぼす要因がいくつかある。運転行動の影響による通勤者の快適感を自動的に抽出することは、通勤者の満足度を満足させるのに役立つドライバーへのタイムリーなフィードバックに不可欠である。これを踏まえて、通常このようなタクシーに乗る通勤者約200人を調査し、タクシーの乗り心地に影響を及ぼす一連の特徴を得た。次に、通勤者からスマートフォンセンサデータを収集し、そのデータから空間時系列特徴を抽出し、運転に関して5ポイントスケールで通勤者の快適さのレベルを算出するシステム ridergoを開発した。 Ridergoは階層的時間記憶モデルに基づくアプローチを用いて特徴分布の異常を観測し、マルチタスク学習に基づくニューラルネットワークモデルを訓練し、パーソナライズされたレベルで通勤者の快適なレベルを得る。モデルはまた、通勤者に対して、利用可能なデータセットに新しいデータポイントを追加するようにインテリジェントにクエリし、定期的なトレーニングよりも自分自身を改善する。被験者30名を対象にRidergoの評価を行った結果,運転が快適感に影響を及ぼす場合,効率のよい快適度が得られた。 Commuter comfort in cab rides affects driver rating as well as the reputation of ride-hailing firms like Uber/Lyft. Existing research has revealed that commuter comfort not only varies at a personalized level but also is perceived differently on different trips for the same commuter. Furthermore, there are several factors, including driving behavior and driving environment, affecting the perception of comfort. Automatically extracting the perceived comfort level of a commuter due to the impact of the driving behavior is crucial for a timely feedback to the drivers, which can help them to meet the commuter's satisfaction. In light of this, we surveyed around 200 commuters who usually take such cab rides and obtained a set of features that impact comfort during cab rides. Following this, we develop a system Ridergo which collects smartphone sensor data from a commuter, extracts the spatial time series feature from the data, and then computes the level of commuter comfort on a five-point scale with respect to the driving. Ridergo uses a Hierarchical Temporal Memory model-based approach to observe anomalies in the feature distribution and then trains a Multi-task learning-based neural network model to obtain the comfort level of the commuter at a personalized level. The model also intelligently queries the commuter to add new data points to the available dataset and, in turn, improve itself over periodic training. Evaluation of Ridergo on 30 participants shows that the system could provide efficient comfort score with high accuracy when the driving impacts the perceived comfort.	翻訳日:2023-03-17 07:19:11 公開日:2021-08-24
# ソーシャルメディア上のフェイクニュース拡散者の心理・動機要因によるプロファイリング Profiling Fake News Spreaders on Social Media through Psychological and Motivational Factors ( http://arxiv.org/abs/2108.10942v1 ) ライセンス: Link先を確認	Mansooreh Karami, Tahora H. Nazer, Huan Liu	(参考訳) 過去10年間のフェイクニュースの台頭は、選挙に関する意見の揺さぶりから、パンデミックの間に不確実性を生み出すまで、数多くの結果をもたらした。偽ニュースに対処するために開発されたほとんどの方法は、偽ニュースコンテンツや、それを生成する悪意のあるアクターに焦点を当てている。しかし、偽ニュースのバイラル性は、それを広めるユーザーに大きく依存している。これらのユーザーに対する深い理解は、偽ニュースを拡散する可能性のあるユーザーを特定するためのフレームワークの開発に寄与することができる。本研究では,ソーシャルメディア上でのフェイクニューススプレッシャーの特徴と動機要因について,心理学的理論や行動学的研究から考察した。次に、フェイクニューススプレッドラーが他のユーザーと異なる特徴を示すことができるかどうかを判定する一連の実験を行う。さらに,本実験における偽ニュース拡散器の特性が,実際のソーシャルメディア環境における偽ニュース拡散器の検出に応用できるかどうかを検証して検討した。 The rise of fake news in the past decade has brought with it a host of consequences, from swaying opinions on elections to generating uncertainty during a pandemic. A majority of methods developed to combat disinformation either focus on fake news content or malicious actors who generate it. However, the virality of fake news is largely dependent upon the users who propagate it. A deeper understanding of these users can contribute to the development of a framework for identifying users who are likely to spread fake news. In this work, we study the characteristics and motivational factors of fake news spreaders on social media with input from psychological theories and behavioral studies. We then perform a series of experiments to determine if fake news spreaders can be found to exhibit different characteristics than other users. Further, we investigate our findings by testing whether the characteristics we observe amongst fake news spreaders in our experiments can be applied to the detection of fake news spreaders in a real social media environment.	翻訳日:2023-03-17 07:18:44 公開日:2021-08-24
# 低ランクサドルフリーニュートン:確率的非凸最適化のためのスケーラブルな方法 Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization ( http://arxiv.org/abs/2002.02881v3 ) ライセンス: Link先を確認	Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas	(参考訳) 現代のディープラーニングでは、大規模データセットと一般化特性から、高度にサブサンプル化された確率近似(SA)法が平均近似(SAA)法より好まれている。加えて、ヘッセン人の形成と分解のコストが認識されているため、これらの問題には二階法が用いられない。この研究において、ニュートン法をSA体制に拡張する動機付けを行い、低階近似を好んでヘッセンを形成することを避けるため、スケーラブルな低階サドルフリーニュートン法(LRSFN)を用いることを主張した。さらにLRSFNは、不確定領域から素早く脱出し、より良い最適化ソリューションを実現する。 SA設定では、反復的な更新は確率的ノイズに支配され、手法の安定性が鍵となる。我々は, 連続時間安定性解析フレームワークを導入し, ニュートン法に対する確率的誤差を悪条件のヘッシアンによって大きく増幅できることを示す。 LRSFN法はこの安定性問題をレバンス・マルカールト減衰によって緩和する。しかし、一般に解析は、決定論的問題とは異なり、確率的ヘッセン情報と勾配情報を持つ二階法は小さなステップを踏む必要があることを示している。数値計算の結果,LRSFNは他の手法が抱える問題のある不確定領域から逃れることが可能であり,制限的なステップ長条件下であっても,等価な計算作業の一般化性の観点から,大規模深層学習タスクにおいて一般的な一階法よりも優れていることがわかった。 In modern deep learning, highly subsampled stochastic approximation (SA) methods are preferred to sample average approximation (SAA) methods because of large data sets as well as generalization properties. Additionally, due to perceived costs of forming and factorizing Hessians, second order methods are not used for these problems. In this work we motivate the extension of Newton methods to the SA regime, and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation. Additionally, LRSFN can facilitate fast escape from indefinite regions leading to better optimization solutions. In the SA setting, iterative updates are dominated by stochastic noise, and stability of the method is key. We introduce a continuous time stability analysis framework, and use it to demonstrate that stochastic errors for Newton methods can be greatly amplified by ill-conditioned Hessians. The LRSFN method mitigates this stability issue via Levenberg-Marquardt damping. However, generally the analysis shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that LRSFN can escape indefinite regions that other methods have issues with; and even under restrictive step length conditions, LRSFN can outperform popular first order methods on large scale deep learning tasks in terms of generalizability for equivalent computational work.	翻訳日:2023-01-03 05:20:07 公開日:2021-08-24
# 微分可能ファジィ論理演算子の解析 Analyzing Differentiable Fuzzy Logic Operators ( http://arxiv.org/abs/2002.06100v2 ) ライセンス: Link先を確認	Emile van Krieken, Erman Acar, Frank van Harmelen	(参考訳) AIコミュニティは、これらのアプローチの強みと弱みが相補的であるとしばしば主張されるため、象徴的アプローチとニューラルアプローチの組み合わせに注意を向けている。最近の文献のトレンドは、ファジィ論理の演算子を用いる弱い教師付き学習技術である。特に、このような論理に記述された事前の背景知識を用いて、ラベル付きでノイズの多いデータからニューラルネットワークのトレーニングを支援する。ニューラルネットワークを用いて論理記号を解釈することにより、この背景知識を通常の損失関数に追加することができる。我々は,ファジィ論理文からの論理演算子の大規模な集合が,微分可能な学習環境でどのように振る舞うかを,形式的かつ実証的に研究する。これらの演算子の多くは、最もよく知られたものを含めて、この設定には非常に適していないことが分かりました。さらなる発見は、これらのファジィ論理における含意の扱いを懸念し、前者によって駆動される勾配とそれに伴う含意の強い不均衡を示す。さらに,この現象に取り組むために,新たなファジィ・インジェクション(sgmoidal implications)のファミリーを導入する。最後に,半教師付き学習に微分可能なファジィ論理を用いることが可能であることを実証的に示し,運用者が実際にどのように振る舞うかを比較する。教師付きベースラインよりも最大の性能向上を達成するためには、学習において良好に機能するが、通常の論理法則を満たさない論理演算子の非標準的な組み合わせに頼る必要がある。 The AI community is increasingly putting its attention towards combining symbolic and neural approaches, as it is often argued that the strengths and weaknesses of these approaches are complementary. One recent trend in the literature are weakly supervised learning techniques that employ operators from fuzzy logics. In particular, these use prior background knowledge described in such logics to help the training of a neural network from unlabeled and noisy data. By interpreting logical symbols using neural networks, this background knowledge can be added to regular loss functions, hence making reasoning a part of learning. We study, both formally and empirically, how a large collection of logical operators from the fuzzy logic literature behave in a differentiable learning setting. We find that many of these operators, including some of the most well-known, are highly unsuitable in this setting. A further finding concerns the treatment of implication in these fuzzy logics, and shows a strong imbalance between gradients driven by the antecedent and the consequent of the implication. Furthermore, we introduce a new family of fuzzy implications (called sigmoidal implications) to tackle this phenomenon. Finally, we empirically show that it is possible to use Differentiable Fuzzy Logics for semi-supervised learning, and compare how different operators behave in practice. We find that, to achieve the largest performance improvement over a supervised baseline, we have to resort to non-standard combinations of logical operators which perform well in learning, but no longer satisfy the usual logical laws.	翻訳日:2023-01-01 04:22:54 公開日:2021-08-24
# Triangle-Net: ポイントクラウド学習におけるロバストネスを目指して Triangle-Net: Towards Robustness in Point Cloud Learning ( http://arxiv.org/abs/2003.00856v2 ) ライセンス: Link先を確認	Chenxi Xiao and Juan Wachs	(参考訳) 3次元オブジェクト認識は、自動運転車やサービスロボット、監視ドローンといった多くのコンピュータビジョンシステムにとって、非構造環境でより効果的に動作するための重要な能力になりつつある。これらのリアルタイムシステムは、様々なサンプリング解像度、ノイズ測定、無拘束ポーズ構成にロバストな効果的な分類方法を必要とする。これまでの研究では、ポイントのスパーシティ、回転、位置固有分散がポイントクラウドに基づく分類技術の性能を著しく低下させる可能性があることが示されている。しかし、どちらも多因子分散や顕著な分散に対して十分に堅牢ではない。そこで本研究では, 回転, 位置シフト, スケーリングに対する不変性を同時に実現し, 点間隔に頑健な3次元分類手法を提案する。この目的のために,提案したニューラルネットワークでエンドツーエンドに学習し,頑健な3Dオブジェクトの潜在表現を得ることのできる点雲グラフ構造を利用する新機能を導入する。このような潜在表現は,点がばらばらである場合,オブジェクト分類や検索タスクの性能を著しく向上させることができる。さらに, 任意のSO(3)回転下では, 16点のみのスパース点雲を用いて, ModelNet 40分類タスクにおいて, ポイントネットと3DmFVをそれぞれ35.0%, 28.1%上回った。 Three dimensional (3D) object recognition is becoming a key desired capability for many computer vision systems such as autonomous vehicles, service robots and surveillance drones to operate more effectively in unstructured environments. These real-time systems require effective classification methods that are robust to various sampling resolutions, noisy measurements, and unconstrained pose configurations. Previous research has shown that points' sparsity, rotation and positional inherent variance can lead to a significant drop in the performance of point cloud based classification techniques. However, neither of them is sufficiently robust to multifactorial variance and significant sparsity. In this regard, we propose a novel approach for 3D classification that can simultaneously achieve invariance towards rotation, positional shift, scaling, and is robust to point sparsity. To this end, we introduce a new feature that utilizes graph structure of point clouds, which can be learned end-to-end with our proposed neural network to acquire a robust latent representation of the 3D object. We show that such latent representations can significantly improve the performance of object classification and retrieval tasks when points are sparse. Further, we show that our approach outperforms PointNet and 3DmFV by 35.0% and 28.1% respectively in ModelNet 40 classification tasks using sparse point clouds of only 16 points under arbitrary SO(3) rotation.	翻訳日:2022-12-28 07:20:17 公開日:2021-08-24
# 人工知能における価値学習に応用した動的認知 Dynamic Cognition Applied to Value Learning in Artificial Intelligence ( http://arxiv.org/abs/2005.05538v6 ) ライセンス: Link先を確認	Nythamar de Oliveira and Nicholas Kluge Corr\^ea	(参考訳) 人工知能(AI)開発の専門家は、インテリジェントシステムとエージェントの開発の進歩が、我々の社会における重要な領域を形作ると予測している。しかし、そのような進歩が慎重さで行われなければ、それは人類にとって否定的な結果をもたらす可能性がある。このため、この分野の何人かの研究者は、堅牢で有益で安全な人工知能の概念を開発しようとしている。現在、AI研究の分野におけるいくつかのオープンな問題は、インテリジェントエージェントの望ましくない振る舞いを避けることの難しさと、そのようなシステムが何をするかを規定することによるものである。直交論で論じられているように、aiが単に知性のために道徳的な好みを発達させることは期待できないという事実を考えると、人工知能エージェントが人間の価値観に合致する価値を持っていることは最も重要である。おそらくこの難しさは、表現的認知手法を用いて、目的、価値、目的を表現している問題に対処する方法に由来する。この問題の解決策は、ドレフュスが提唱した動的認知的アプローチであり、その現象論的哲学は、世界にいる人間の経験は象徴的あるいは接続主義的な認知的手法では表現できないことを擁護している。この問題に対する可能なアプローチは、SED(situated embodied dynamics)のような理論モデルを使用して、AIにおける価値学習問題に対処することだ。 Experts in Artificial Intelligence (AI) development predict that advances in the development of intelligent systems and agents will reshape vital areas in our society. Nevertheless, if such an advance isn't done with prudence, it can result in negative outcomes for humanity. For this reason, several researchers in the area are trying to develop a robust, beneficial, and safe concept of artificial intelligence. Currently, several of the open problems in the field of AI research arise from the difficulty of avoiding unwanted behaviors of intelligent agents, and at the same time specifying what we want such systems to do. It is of utmost importance that artificial intelligent agents have their values aligned with human values, given the fact that we cannot expect an AI to develop our moral preferences simply because of its intelligence, as discussed in the Orthogonality Thesis. Perhaps this difficulty comes from the way we are addressing the problem of expressing objectives, values, and ends, using representational cognitive methods. A solution to this problem would be the dynamic cognitive approach proposed by Dreyfus, whose phenomenological philosophy defends that the human experience of being-in-the-world cannot be represented by the symbolic or connectionist cognitive methods. A possible approach to this problem would be to use theoretical models such as SED (situated embodied dynamics) to address the values learning problem in AI.	翻訳日:2022-12-03 19:08:57 公開日:2021-08-24
# 粗いラベルを用いた弱教師付き表現学習 Weakly Supervised Representation Learning with Coarse Labels ( http://arxiv.org/abs/2005.09681v3 ) ライセンス: Link先を確認	Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu	(参考訳) データ収集のための計算能力と技術の開発により、ディープラーニングは、ビジュアルベンチマークデータセット上の既存のアルゴリズムよりも優れた性能を示す。深層学習のメカニズムの研究に多くの努力が注がれている。重要な観察の1つは、ディープラーニングが原材料から直接タスク依存の方法で識別パターンを学習できることである。そのため、深層学習により得られた表現は手作りの特徴を著しく上回る。しかし、現実のアプリケーションでは、オンラインショッピングでのビジュアル検索のようなタスク固有のラベルを収集するには高価すぎる。これらのタスク固有のラベルの可用性が限られているのに対して、粗いクラスラベルはずっと手頃だが、それらから学んだ表現はターゲットタスクに最適である。この課題を軽減するために,粗いラベルのみを利用できる場合に,対象タスクのきめ細かいパターンを学習するアルゴリズムを提案する。さらに重要なのは、理論的保証を提供することです。実世界のデータセットに対する大規模な実験により,提案手法は,粗いクラス情報のみをトレーニングに利用できる場合に,対象タスク上での学習表現の性能を著しく向上させることができることを示した。コードは \url{https://github.com/idstcv/CoIns} で入手できる。 With the development of computational power and techniques for data collection, deep learning demonstrates a superior performance over most existing algorithms on visual benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn the discriminative patterns from raw materials directly in a task-dependent manner. Therefore, the representations obtained by deep learning outperform hand-crafted features significantly. However, for some real-world applications, it is too expensive to collect the task-specific labels, such as visual search in online shopping. Compared to the limited availability of these task-specific labels, their coarse-class labels are much more affordable, but representations learned from them can be suboptimal for the target task. To mitigate this challenge, we propose an algorithm to learn the fine-grained patterns for the target task, when only its coarse-class labels are available. More importantly, we provide a theoretical guarantee for this. Extensive experiments on real-world data sets demonstrate that the proposed method can significantly improve the performance of learned representations on the target task, when only coarse-class information is available for training. Code is available at \url{https://github.com/idstcv/CoIns}.	翻訳日:2022-12-01 14:16:03 公開日:2021-08-24
# Cumulant GAN Cumulant GAN ( http://arxiv.org/abs/2006.06625v3 ) ライセンス: Link先を確認	Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis, Yannis Stylianou and Markos Katsoulakis	(参考訳) 本稿では,より深い理論的理解と基礎的最適化問題に対する安定性と性能の向上を目的とした,gans(generative adversarial network)訓練のための新しい損失関数を提案する。新たな損失関数は、\emph{cumulant gan} を生成する累積生成関数に基づいている。最近派生した変分公式に依拠して、対応する最適化問題は r{\'e}nyi 分岐最小化に相当し、gan 損失の(部分的に)統一的な視点を提供する: r{\'e}nyi ファミリーは、kullback-leibler divergence (kld)、reverse kld、helinger distance、$\chi^2$-divergence を含む。 Wasserstein GANは累積GANのメンバーでもある。安定性の面では、線形判別器、ガウス分布および標準勾配降下上昇アルゴリズムに対する累積GANのナッシュ平衡への線形収束を厳密に証明する。最後に,Wasserstein GANに対して画像生成がより堅牢であることが実験的に証明され,より弱い判別器と強い判別器の両方を考慮すると,開始点とFr'echet開始距離の両方で大幅に改善される。 In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions giving rise to \emph{Cumulant GAN}. Relying on a recently-derived variational formula, we show that the corresponding optimization problem is equivalent to R{\'e}nyi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the R{\'e}nyi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance and $\chi^2$-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score and Fr\'echet inception distance when both weaker and stronger discriminators are considered.	翻訳日:2022-11-22 13:15:21 公開日:2021-08-24
# パサデナ:知覚的に認識し、敵対的妄想攻撃 Pasadena: Perceptually Aware and Stealthy Adversarial Denoise Attack ( http://arxiv.org/abs/2007.07097v3 ) ライセンス: Link先を確認	Yupeng Cheng, Qing Guo, Felix Juefei-Xu, Wei Feng, Shang-Wei Lin, Weisi Lin, Yang Liu	(参考訳) 画像デノイジングは、低品質の撮像センサ、不安定な画像伝送プロセス、あるいは低い光条件により、マルチメディアデバイスで撮影された画像に広く存在する自然ノイズを除去することができる。近年の研究では、画像の雑音化は、例えば画像分類のような高レベルな視覚タスクに効果があることも判明している。本研究では,この常識に挑戦し,画像のデノイジングが最先端のディープニューラルネットワーク(dnn)を騙し,画質を高めることができるかどうかという,まったく新しい問題を探求する。この目的のために,敵の攻撃の観点からこの問題を研究するための最初の試みを開始し,敵の妄想攻撃を提案する。まず、マルチメディアデバイスに広くデプロイされたイメージデノイジングモジュール内に攻撃をステルスに埋め込む新しいタスクを、画像のポスト処理操作として特定し、視覚的な画像品質と愚かなdnnを同時に向上させます。第2に,この課題を画像フィルタリングのカーネル予測問題として定式化し,効果的なノイズ除去と逆アタックを同時に行うために,逆ノイズのないカーネルを生成できる逆検出型カーネル予測を提案する。第三に、攻撃がより効果的になりうるセマンティック関連脆弱性領域を特定するために、適応的な知覚領域ローカライゼーションを実装している。本稿では,提案手法をPasadena (Perceptually Aware and Stealthy Adversarial DeNoise Attack) と命名し,NeurIPS'17逆競合データセット(CVPR2021-AIC-VI:unrestricted adversarial attacks on ImageNet,etc)で検証した。包括的評価と分析により,本手法は偏執だけでなく,最先端攻撃に対する成功率や伝達性も著しく向上することが示された。 Image denoising can remove natural noise that widely exists in images captured by multimedia devices due to low-quality imaging sensors, unstable image transmission processes, or low light conditions. Recent works also find that image denoising benefits the high-level vision tasks, e.g., image classification. In this work, we try to challenge this common sense and explore a totally new problem, i.e., whether the image denoising can be given the capability of fooling the state-of-the-art deep neural networks (DNNs) while enhancing the image quality. To this end, we initiate the very first attempt to study this problem from the perspective of adversarial attack and propose the adversarial denoise attack. More specifically, our main contributions are three-fold: First, we identify a new task that stealthily embeds attacks inside the image denoising module widely deployed in multimedia devices as an image post-processing operation to simultaneously enhance the visual image quality and fool DNNs. Second, we formulate this new task as a kernel prediction problem for image filtering and propose the adversarial-denoising kernel prediction that can produce adversarial-noiseless kernels for effective denoising and adversarial attacking simultaneously. Third, we implement an adaptive perceptual region localization to identify semantic-related vulnerability regions with which the attack can be more effective while not doing too much harm to the denoising. We name the proposed method as Pasadena (Perceptually Aware and Stealthy Adversarial DENoise Attack) and validate our method on the NeurIPS'17 adversarial competition dataset, CVPR2021-AIC-VI: unrestricted adversarial attacks on ImageNet,etc. The comprehensive evaluation and analysis demonstrate that our method not only realizes denoising but also achieves a significantly higher success rate and transferability over state-of-the-art attacks.	翻訳日:2022-11-10 15:18:07 公開日:2021-08-24
# 神経形制御 Neuromorphic Control ( http://arxiv.org/abs/2011.04441v2 ) ライセンス: Link先を確認	Luka Ribar, Rodolphe Sepulchre	(参考訳) ニューロモルフィックエンジニアリング(Neuromorphic Engineering)は、ニューラルネットワークの生物学的組織からインスピレーションを得て、コンピューティング、センシング、アクティベーションのための新しい技術を開発することを目的とした、急速に発展する分野である。このようなシステムのユニークな性質は、新しい信号処理と制御パラダイムを要求する。本稿では、異なる時間スケールで作用する正負のフィードバックループと正のフィードバックループからなる興奮性神経系の混合フィードバック組織について紹介する。生物学的神経調節の原理は、混合フィードバック系をニューロモルフィズム的に設計し制御するための方法論を示唆している。提案する設計は、生体ニューロンの組織化を反映し、ニューロモルフィックな電子回路のハードウェアコンポーネントを利用する基本回路要素の並列相互接続からなる。相互接続構造は、入力出力整形問題として神経制御を再構成する単純な制御手法によって、ニューロモルフィックシステムを提供する。神経制御のポテンシャルは、混合フィードバック原理のスケーラビリティを示唆する基本的なネットワークの例に示される。 Neuromorphic engineering is a rapidly developing field that aims to take inspiration from the biological organization of neural systems to develop novel technology for computing, sensing, and actuating. The unique properties of such systems call for new signal processing and control paradigms. The article introduces the mixed feedback organization of excitable neuronal systems, consisting of interlocked positive and negative feedback loops acting in distinct timescales. The principles of biological neuromodulation suggest a methodology for designing and controlling mixed-feedback systems neuromorphically. The proposed design consists of a parallel interconnection of elementary circuit elements that mirrors the organization of biological neurons and utilizes the hardware components of neuromorphic electronic circuits. The interconnection structure endows the neuromorphic systems with a simple control methodology that reframes the neuronal control as an input-output shaping problem. The potential of neuronal control is illustrated on elementary network examples that suggest the scalability of the mixed-feedback principles.	翻訳日:2022-09-28 01:18:49 公開日:2021-08-24
# ランダムウォークを用いたビデオ中の物体検出のための自己教師型学習システム A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs ( http://arxiv.org/abs/2011.05459v3 ) ライセンス: Link先を確認	Juntao Tan, Changkyu Song, Abdeslam Boularias	(参考訳) 本稿では,画像中の物体の新規かつ未発見のカテゴリを検出するための学習用自己教師付きシステムを提案する。提案システムは,様々なオブジェクトを含むシーンの未ラベル映像を入力として受信する。ビデオのフレームは深度情報を使ってオブジェクトに分割され、各ビデオに沿ってセグメントが追跡される。その後、システムは重み付きグラフを構築し、それらを含むオブジェクト間の類似性に基づいてシーケンスを接続する。オブジェクトの2つのシーケンス間の類似性は、オブジェクトの視点を整列するために2つのシーケンス内のフレームを自動的に並べ替えた後、一般的な視覚的特徴を用いて測定される。このグラフは、ランダムウォークを実行することによって、類似の異なる例のトリプレットをサンプリングするために使用される。三重項の例は最終的に、汎用的な視覚特徴を低次元多様体に投影するシアムニューラルネットワークのトレーニングに使用される。 YCB-Video、CORe50、RGBD-Objectの3つの公開データセットの実験は、予測された低次元特徴が未知のオブジェクトを新しいカテゴリにクラスタリングする精度を改善し、最近の非教師なしクラスタリング技術より優れていることを示している。 This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images. The proposed system receives as input several unlabeled videos of scenes containing various objects. The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video. The system then constructs a weighted graph that connects sequences based on the similarities between the objects that they contain. The similarity between two sequences of objects is measured by using generic visual features, after automatically re-arranging the frames in the two sequences to align the viewpoints of the objects. The graph is used to sample triplets of similar and dissimilar examples by performing random walks. The triplet examples are finally used to train a siamese neural network that projects the generic visual features into a low-dimensional manifold. Experiments on three public datasets, YCB-Video, CORe50 and RGBD-Object, show that the projected low-dimensional features improve the accuracy of clustering unknown objects into novel categories, and outperform several recent unsupervised clustering techniques.	翻訳日:2022-09-27 08:06:22 公開日:2021-08-24
# 単段連続ジェスチャー認識のためのマルチモーダル融合 Multi-modal Fusion for Single-Stage Continuous Gesture Recognition ( http://arxiv.org/abs/2011.04945v2 ) ライセンス: Link先を確認	Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes	(参考訳) ジェスチャー認識は、ロボット工学や人間と機械の相互作用を含む、無数の現実世界の応用が研究されている分野である。現在のジェスチャー認識法は孤立したジェスチャーを認識することに重点を置いており、既存の連続ジェスチャー認識法は、検出と分類に独立したモデルを必要とする2段階のアプローチに限られている。対照的に,複数のジェスチャを1つのモデルで検出・分類可能なtemporal multi-modal fusion(tmmf)と呼ばれる単段連続ジェスチャ認識フレームワークを導入する。このアプローチは、ジェスチャーと非ジェスチャーの自然な遷移を、個々のジェスチャーを検出するための前処理のセグメンテーションステップなしで学習する。これを実現するために,マルチモーダルな入力から流れる重要な情報の統合をサポートし,任意のモードにスケーラブルなマルチモーダル融合機構を提案する。さらに,ユニモーダル・フィーチャー・マッピング(ufm)とマルチモーダル・フィーチャー・マッピング(mfm)モデルを提案し,それぞれユニモーダル・フィーチャーと融合したマルチモーダル・フィーチャーをマッピングする。そこで,本研究では,実感と予測の円滑な一致を促す中点に基づく損失関数を提案し,モデルの自然なジェスチャー遷移の学習を支援する。本稿では,可変長の入力ビデオを処理し,EgoGesture,IPN hand,ChaLearn LAP Continuous Gesture Dataset (ConGD) という3つの課題データセットで最先端の処理を行うフレームワークの有用性を示す。さらに, アブレーション実験により, 提案手法の異なる成分の重要性が示された。 Gesture recognition is a much studied research area which has myriad real-world applications including robotics and human-machine interaction. Current gesture recognition methods have focused on recognising isolated gestures, and existing continuous gesture recognition methods are limited to two-stage approaches where independent models are required for detection and classification, with the performance of the latter being constrained by detection performance. In contrast, we introduce a single-stage continuous gesture recognition framework, called Temporal Multi-Modal Fusion (TMMF), that can detect and classify multiple gestures in a video via a single model. This approach learns the natural transitions between gestures and non-gestures without the need for a pre-processing segmentation step to detect individual gestures. To achieve this, we introduce a multi-modal fusion mechanism to support the integration of important information that flows from multi-modal inputs, and is scalable to any number of modes. Additionally, we propose Unimodal Feature Mapping (UFM) and Multi-modal Feature Mapping (MFM) models to map uni-modal features and the fused multi-modal features respectively. To further enhance performance, we propose a mid-point based loss function that encourages smooth alignment between the ground truth and the prediction, helping the model to learn natural gesture transitions. We demonstrate the utility of our proposed framework, which can handle variable-length input videos, and outperforms the state-of-the-art on three challenging datasets: EgoGesture, IPN hand, and ChaLearn LAP Continuous Gesture Dataset (ConGD). Furthermore, ablation experiments show the importance of different components of the proposed framework.	翻訳日:2022-09-27 07:41:59 公開日:2021-08-24
# シンボル空間による解釈可能な視覚推論 Interpretable Visual Reasoning via Induced Symbolic Space ( http://arxiv.org/abs/2011.11603v2 ) ライセンス: Link先を確認	Zhonghao Wang, Kai Wang, Mo Yu, Jinjun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi	(参考訳) 視覚的推論における概念誘導の問題、すなわち、画像に関連付けられた質問応答対から概念とその階層的関係を同定し、帰納的シンボリック概念空間に取り組むことによって解釈可能なモデルを実現する。そこで我々はまず,オブジェクト指向視覚特徴を用いた視覚的推論タスクを実行するために,オブジェクト指向合成注意モデル(OCCAM)という新しいフレームワークを設計する。次に,対象の視覚的特徴と質問語間の注意パターンから手がかりを用いて,対象と関係の概念を誘導する手法を考案する。最後に, OCCAMを誘導記号空間に表現したオブジェクトに付与することにより, 高い解釈可能性を実現する。我々のモデル設計は、まずオブジェクトと関係の概念を予測し、次に予測された概念を視覚的特徴空間に投影することで、構成的推論モジュールが正常に処理できるようにする。 CLEVRとGQAデータセットの実験は以下のとおりである。 1)OCCAMは,人為的な機能プログラムを使わずに新たな技術を実現する。 2) OCCAMが視覚的特徴や誘導記号的概念空間で表現されたオブジェクト上でのオンパーパフォーマンスを達成できる限り,我々の誘導概念は正確かつ十分である。 We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images; and achieve an interpretable model via working on the induced symbolic concept space. To this end, we first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. Then, we come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by imposing OCCAM on the objects represented in the induced symbolic concept space. Our model design makes this an easy adaption via first predicting the concepts of objects and relations and then projecting the predicted concepts back to the visual feature space so the compositional reasoning module can process normally. Experiments on the CLEVR and GQA datasets demonstrate: 1) our OCCAM achieves a new state of the art without human-annotated functional programs; 2) our induced concepts are both accurate and sufficient as OCCAM achieves an on-par performance on objects represented either in visual features or in the induced symbolic concept space.	翻訳日:2022-09-22 01:11:01 公開日:2021-08-24
# (参考訳) 線形回帰と整数計画に基づく高分子の推算法 A Method for Inferring Polymers Based on Linear Regression and Integer Programming ( http://arxiv.org/abs/2109.02628v1 ) ライセンス: CC BY 4.0	Ryota Ido, Shengjuan Cao, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Hiroshi Nagamochi and Tatsuya Akutsu	(参考訳) 近年, 人工ニューラルネットワークと混合整数線形計画法を用いて, 望ましい化学特性を持つ化合物の分子構造を設計するための新しい枠組みが提案されている。本稿では, この枠組みに基づく新しいポリマー推定法を設計する。そこで本研究では, ポリマーをモノマーとして表現する新しい方法を紹介し, ポリマーの構造を特徴とする新しいディスクリプタを定義する。また,フレームワーク内で予測関数を構築するためのビルディングブロックとして線形回帰を用いる。計算実験の結果, 線形回帰で構築した予測関数がよく機能するポリマーの化学特性の集合が明らかとなった。また, 提案手法は, 最大50個の非水素原子を有するポリマーをモノマー形式で推算できることを示した。 A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In this paper, we design a new method for inferring a polymer based on the framework. For this, we introduce a new way of representing a polymer as a form of monomer and define new descriptors that feature the structure of polymers. We also use linear regression as a building block of constructing a prediction function in the framework. The results of our computational experiments reveal a set of chemical properties on polymers to which a prediction function constructed with linear regression performs well. We also observe that the proposed method can infer polymers with up to 50 non-hydrogen atoms in a monomer form.	翻訳日:2021-09-12 12:06:11 公開日:2021-08-24
# Webスケールアプリケーションのためのバイナリコードベースのハッシュ埋め込み Binary Code based Hash Embedding for Web-scale Applications ( http://arxiv.org/abs/2109.02471v1 ) ライセンス: Link先を確認	Bencheng Yan, Pengjie Wang, Jinquan Liu, Wei Lin, Kuang-Chih Lee, Jian Xu and Bo Zheng	(参考訳) 現在、ディープラーニングモデルはレコメンダシステムやオンライン広告といったウェブスケールのアプリケーションに広く採用されている。これらのアプリケーションでは、分類的特徴の埋め込み学習がディープラーニングモデルの成功に不可欠である。これらのモデルでは、各カテゴリの特徴値に学習や最適化が可能なユニークな埋め込みベクトルが割り当てられている。この方法はカテゴリの特徴をうまく捉え、優れた性能を約束するが、特にウェブスケールのアプリケーションの場合、埋め込みテーブルを保存するのに膨大なメモリコストがかかる。このような大きなメモリコストは、edrmの有効性とユーザビリティを著しく阻害する。本稿では,性能を損なうことなく,埋め込みテーブルのサイズを任意のスケールで縮小できるバイナリコードベースのハッシュ埋め込み手法を提案する。実験評価の結果,本手法では組込みテーブルサイズが従来のテーブルサイズよりも1000$\times$小さい場合でも,99\%の性能を達成できることがわかった。 Nowadays, deep learning models are widely adopted in web-scale applications such as recommender systems, and online advertising. In these applications, embedding learning of categorical features is crucial to the success of deep learning models. In these models, a standard method is that each categorical feature value is assigned a unique embedding vector which can be learned and optimized. Although this method can well capture the characteristics of the categorical features and promise good performance, it can incur a huge memory cost to store the embedding table, especially for those web-scale applications. Such a huge memory cost significantly holds back the effectiveness and usability of EDRMs. In this paper, we propose a binary code based hash embedding method which allows the size of the embedding table to be reduced in arbitrary scale without compromising too much performance. Experimental evaluation results show that one can still achieve 99\% performance even if the embedding table size is reduced 1000$\times$ smaller than the original one with our proposed method.	翻訳日:2021-09-12 10:54:22 公開日:2021-08-24
# (参考訳) UAVと移動体マッピング車からの観測を統合した時空間-スペクトル-角観測モデルによる都市マッピングの改善 Spatio-temporal-spectral-angular observation model that integrates observations from UAV and mobile mapping vehicle for better urban mapping ( http://arxiv.org/abs/2109.00900v1 ) ライセンス: CC BY 4.0	Zhenfeng Shao, Gui Cheng, Deren Li, Xiao Huang, Zhipeng Lu, Jian Liu	(参考訳) 複雑な都市シーンでは、1つのセンサーからの観察は避けられないほど観察の空白をもたらし、包括的な方法で都市オブジェクトを記述できない。本稿では,UAVおよび移動体地図車両プラットフォームからの観測を統合し,空中と地上の両方からの協調観測操作を実現するために,時空間・角度観測モデルを提案する。複雑な都市景観のマルチ角度データを効果的に取得するマルチソースリモートセンシングデータ取得システムを開発した。多元データ融合は、咬合による不足データ問題を解決し、複雑な都市シーンにおけるホログラフィック空間および時間情報の正確かつ迅速かつ完全な収集を実現する。我々は,中国長慶市バイシャタウンで実験を行い,UAVと移動体地図からマルチセンサ,マルチ角データを得た。まず、UAVからポイントクラウドを抽出し、UAVとモバイルマッピング車両のポイントクラウドを統合しました。統合された結果は,UAVと移動体地図車両点群の特徴を組み合わせ,提案した共同データ取得プラットフォームの実践性および時空間-スペクトル-角観測モデルの有効性を確認した。 uavまたはモバイルマッピング車両単独での観測と比較すると、統合システムは総合的な都市モニタリングに向けた効果的なデータ取得ソリューションを提供する。 In a complex urban scene, observation from a single sensor unavoidably leads to voids in observations, failing to describe urban objects in a comprehensive manner. In this paper, we propose a spatio-temporal-spectral-angular observation model to integrate observations from UAV and mobile mapping vehicle platform, realizing a joint, coordinated observation operation from both air and ground. We develop a multi-source remote sensing data acquisition system to effectively acquire multi-angle data of complex urban scenes. Multi-source data fusion solves the missing data problem caused by occlusion and achieves accurate, rapid, and complete collection of holographic spatial and temporal information in complex urban scenes. We carried out an experiment on Baisha Town, Chongqing, China and obtained multi-sensor, multi-angle data from UAV and mobile mapping vehicle. We first extracted the point cloud from UAV and then integrated the UAV and mobile mapping vehicle point cloud. The integrated results combined both the characteristic of UAV and mobile mapping vehicle point cloud, confirming the practicability of the proposed joint data acquisition platform and the effectiveness of spatio-temporal-spectral-angular observation model. Compared with the observation from UAV or mobile mapping vehicle alone, the integrated system provides an effective data acquisition solution towards comprehensive urban monitoring.	翻訳日:2021-09-05 09:57:25 公開日:2021-08-24
# (参考訳) DQLEL:エネルギー最適化LoS/NLoS UWBノード選択のための深いQラーニング DQLEL: Deep Q-Learning for Energy-Optimized LoS/NLoS UWB Node Selection ( http://arxiv.org/abs/2108.13157v1 ) ライセンス: CC BY 4.0	Zohreh Hajiakhondi-Meybodi, Arash Mohammadi, Ming Hou, Konstantinos N. Plataniotis	(参考訳) モノのインターネット(IoT)の最近の進歩は、信頼性、正確、エネルギー効率の高い屋内ナビゲーション/ローカライゼーションシステムを提供することを目的として、屋内位置決めへの関心が高まっている。 UWB(Ultra Wide Band)技術は、上記の要件を満たすための候補として浮上している。 UWB技術は、広帯域を用いた屋内位置決めの精度を高めることができるが、その効率的な実装には大きな課題がある。一方、位置決めにおける高精度化は、Non Line of Sight (NLoS) リンクの識別/緩和に依存し、ローカライゼーションフレームワークの複雑さが著しく増大する。一方、UWBビーコンは電池寿命が限られており、特に戦略的な位置にある特定のビーコンの実際の状況では問題となる。これらの課題に対処するため,UWBビーコンの残バッテリ寿命のバランスを維持しつつ,複雑なNLoS緩和手法を使わずに位置精度を向上させるための効率的なノード選択フレームワークを提案する。モバイルユーザは、DQLEL(Deep Q-Learning Energy-Optimized LoS/NLoS)UWBノード選択フレームワークを参照して、Arival(TDoA)フレームワークの2次元時間差に基づいて、UWBビーコンの最適ペアを決定するために自律的に訓練される。提案するDQLELフレームワークの有効性を,リンク条件,UWBビーコンの残電池寿命のずれ,位置誤差,累積報酬の観点から評価した。シミュレーション結果に基づいて,提案するdqlelフレームワークは,上記の側面をはるかに上回っている。 Recent advancements in Internet of Things (IoTs) have brought about a surge of interest in indoor positioning for the purpose of providing reliable, accurate, and energy-efficient indoor navigation/localization systems. Ultra Wide Band (UWB) technology has been emerged as a potential candidate to satisfy the aforementioned requirements. Although UWB technology can enhance the accuracy of indoor positioning due to the use of a wide-frequency spectrum, there are key challenges ahead for its efficient implementation. On the one hand, achieving high precision in positioning relies on the identification/mitigation Non Line of Sight (NLoS) links, leading to a significant increase in the complexity of the localization framework. On the other hand, UWB beacons have a limited battery life, which is especially problematic in practical circumstances with certain beacons located in strategic positions. To address these challenges, we introduce an efficient node selection framework to enhance the location accuracy without using complex NLoS mitigation methods, while maintaining a balance between the remaining battery life of UWB beacons. Referred to as the Deep Q-Learning Energy-optimized LoS/NLoS (DQLEL) UWB node selection framework, the mobile user is autonomously trained to determine the optimal pair of UWB beacons to be localized based on the 2-D Time Difference of Arrival (TDoA) framework. The effectiveness of the proposed DQLEL framework is evaluated in terms of the link condition, the deviation of the remaining battery life of UWB beacons, location error, and cumulative rewards. Based on the simulation results, the proposed DQLEL framework significantly outperformed its counterparts across the aforementioned aspects.	翻訳日:2021-09-05 09:46:57 公開日:2021-08-24
# 適応マスク双生児層による効率的・効率的な埋め込み学習 Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer ( http://arxiv.org/abs/2108.11513v1 ) ライセンス: Link先を確認	Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian Xu and Bo Zheng	(参考訳) 分類的特徴に対する学習の埋め込みは、深層学習に基づくレコメンデーションモデル(DLRM)にとって重要である。各特徴値は、埋め込み学習プロセスを介して埋め込みベクトルにマッピングされる。従来の方法では、同じ特徴フィールドからすべての特徴値に固定および均一な埋め込みサイズを設定する。しかし、そのような構成は学習を組み込むのに最適であるだけでなく、メモリのコストもかかる。ルールベースまたはニューラルアーキテクチャサーチ(NAS)ベースのこれらの問題を解決する既存の方法は、ヒューマンデザインやネットワークトレーニングに広範な努力を必要とする。また、サイズ選択やウォームスタートベースのアプリケーションでは柔軟性がない。本稿では,新しい,効果的な埋め込みサイズ選択手法を提案する。具体的には,標準組込み層の裏側に適応マッシュドツインベース層(amtl)を設計した。 AMTLは、埋め込みベクトルごとに望ましくない次元をマスクするマスクベクトルを生成する。マスクベクトルは次元の選択に柔軟性をもたらし、提案した層は訓練されていないDLRMに簡単に追加できる。広範な実験評価により、提案手法は全てのベンチマークタスクにおける競合ベースラインよりも優れており、またメモリ効率も高く、パフォーマンス指標を妥協することなく60\%のメモリ使用率を節約できることを示した。 Embedding learning for categorical features is crucial for the deep learning-based recommendation models (DLRMs). Each feature value is mapped to an embedding vector via an embedding learning process. Conventional methods configure a fixed and uniform embedding size to all feature values from the same feature field. However, such a configuration is not only sub-optimal for embedding learning but also memory costly. Existing methods that attempt to resolve these problems, either rule-based or neural architecture search (NAS)-based, need extensive efforts on the human design or network training. They are also not flexible in embedding size selection or in warm-start-based applications. In this paper, we propose a novel and effective embedding size selection scheme. Specifically, we design an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector to mask the undesired dimensions for each embedding vector. The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs. Extensive experimental evaluations show that the proposed scheme outperforms competitive baselines on all the benchmark tasks, and is also memory-efficient, saving 60\% memory usage without compromising any performance metrics.	翻訳日:2021-08-27 14:15:34 公開日:2021-08-24
# (参考訳) 高コントラストイメージングのための強化学習による自己最適化適応光学制御 Self-optimizing adaptive optics control with Reinforcement Learning for high-contrast imaging ( http://arxiv.org/abs/2108.11332v1 ) ライセンス: CC BY 4.0	Rico Landman, Sebastiaan Y. Haffert, Vikram M. Radhakrishnan, Christoph U. Keller	(参考訳) 現在および将来の高コントラスト撮像装置は、外惑星を直接撮像するために必要なコントラストに到達するために、極端適応光学系(XAO)を必要とする。制御ループの遅延による望遠鏡振動と時間誤差は、これらのシステムの性能を制限する。これらの効果を減らす一つの方法は予測制御を使用することである。本稿では,モデルフリーの強化学習を用いて,閉ループ予測制御のためのリカレントニューラルネットワークコントローラの最適化について述べる。まず,シミュレーションと実験室構成におけるチップティルト制御のアプローチを検証する。その結果, このアルゴリズムは最適ゲイン積分器と比較して, 振動を効果的に緩和し, パワーロー入力乱流の残差を低減できることがわかった。また,制御則のオンライン更新を必要とせずにランダム振動を最小化できることを示す。次に,本アルゴリズムは高次変形可能なミラーの制御にも適用可能であることを示す。我々は, 定常乱流下での小さな分離において, 制御器が2桁の等級改善を両立できることを実証する。さらに,制御則のオンライン更新を必要とせず,異なる風速や方向に対して比較して,桁違いに改善が見られた。 Current and future high-contrast imaging instruments require extreme adaptive optics (XAO) systems to reach contrasts necessary to directly image exoplanets. Telescope vibrations and the temporal error induced by the latency of the control loop limit the performance of these systems. One way to reduce these effects is to use predictive control. We describe how model-free Reinforcement Learning can be used to optimize a Recurrent Neural Network controller for closed-loop predictive control. First, we verify our proposed approach for tip-tilt control in simulations and a lab setup. The results show that this algorithm can effectively learn to mitigate vibrations and reduce the residuals for power-law input turbulence as compared to an optimal gain integrator. We also show that the controller can learn to minimize random vibrations without requiring online updating of the control law. Next, we show in simulations that our algorithm can also be applied to the control of a high-order deformable mirror. We demonstrate that our controller can provide two orders of magnitude improvement in contrast at small separations under stationary turbulence. Furthermore, we show more than an order of magnitude improvement in contrast for different wind velocities and directions without requiring online updating of the control law.	翻訳日:2021-08-27 00:14:49 公開日:2021-08-24
# (参考訳) 付加雑音モデルによる因果同定における騒音レベルの影響 The Effect of Noise Level on Causal Identification with Additive Noise Models ( http://arxiv.org/abs/2108.11320v1 ) ライセンス: CC BY 4.0	Benjamin Kap	(参考訳) 近年,因果推論や因果学習の分野で多くの研究が行われている。モデルにおける因果効果対を同定するために多くの手法が開発され、因果関係の方向を決定するために観測実世界データにうまく適用されている。これらの手法の多くは、矛盾、サイクル、選択バイアスなどの仮定を単純化する必要がある。しかし、両変数の状況では因果発見の問題はまだ難しい。このような手法の1つのクラスは、二変量の場合も扱えるようにしており、加法ノイズモデル(ANMs)に基づいている。残念ながら、これらの方法の1つの側面は、これまであまり注目されていない: 異なるノイズレベルが、それらの方法が因果関係の方向性を特定する能力に与える影響である。この研究は、実証的研究の助けを借りて、このギャップを埋めることを目的としている。本研究では, x が 2 変数 x, y のジョイント分布を与えられた場合,x が y または y を原因とするか否かを決定する必要のある因果発見問題の最も基本的な形式である双変量の場合を検討した。さらに、加算ノイズのレベルが1%から10000%に徐々に変化するようなanmの徹底的な範囲でテストされた、条件付き分散を用いた \textit{regression with subsequent independence test} と \textit{identification using conditional variances} の2つの特定の方法が選択されている(後者は修正されている)。さらに、本研究の実験では、線形および非線形の anms と同様に、いくつかの異なる種類の分布を考察する。実験の結果、これらの手法はノイズのレベルによっては真の因果方向を捉えることができないことが示された。 In recent years a lot of research has been conducted within the area of causal inference and causal learning. Many methods have been developed to identify the cause-effect pairs in models and have been successfully applied to observational real-world data in order to determine the direction of causal relationships. Many of these methods require simplifying assumptions, such as absence of confounding, cycles, and selection bias. Yet in bivariate situations causal discovery problems remain challenging. One class of such methods, that also allows tackling the bivariate case, is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has not received much attention until now: what is the impact of different noise levels on the ability of these methods to identify the direction of the causal relationship. This work aims to bridge this gap with the help of an empirical study. For this work, we considered bivariate cases, which is the most elementary form of a causal discovery problem where one needs to decide whether X causes Y or Y causes X, given joint distributions of two variables X, Y. Furthermore, two specific methods have been selected, \textit{Regression with Subsequent Independence Test} and \textit{Identification using Conditional Variances}, which have been tested with an exhaustive range of ANMs where the additive noises' levels gradually change from 1% to 10000% of the causes' noise level (the latter remains fixed). Additionally, the experiments in this work consider several different types of distributions as well as linear and non-linear ANMs. The results of the experiments show that these methods can fail to capture the true causal direction for some levels of noise.	翻訳日:2021-08-26 23:47:58 公開日:2021-08-24
# (参考訳) 構造的相互作用を考慮した解釈可能な非パラメトリック付加モデルによるセンササーベイ応答率予測 Predicting Census Survey Response Rates via Interpretable Nonparametric Additive Models with Structured Interactions ( http://arxiv.org/abs/2108.11328v1 ) ライセンス: CC BY 4.0	Shibal Ibrahim, Rahul Mazumder, Peter Radchenko, Emanuel Ben-David	(参考訳) 調査回答率の正確かつ解釈可能な予測は,運用の観点から重要である。アメリカ合衆国国勢調査局のよく知られたroam申請は、米国の国勢調査計画データベースデータに基づいて訓練された原則に基づく統計モデルを使用して、調査の難しい地域を特定する。初期のクラウドソーシングコンペティションでは、回帰ツリーのアンサンブルが調査応答率の予測に最高の性能をもたらしたが、限定的な解釈可能性のため、対応するモデルは対象に適用できなかった。本稿では,調査における応答率を高精度に予測する新しい解釈可能な統計手法を提案する。我々は,$\ell_0$-regularization による対関係を持つ疎非パラメトリック加法モデルと,解釈性を高める階層構造変種について検討した。強力な方法論的基盤にもかかわらず、そのようなモデルは計算的に困難であり、これらのモデルを学習するための新しいスケーラブルなアルゴリズムを提示します。また,提案した推定器の非漸近誤差境界も確立した。米国国勢調査計画データベースに基づく実験は、我々の手法が、人口の異なるセグメントに対して実行可能な解釈可能性を可能にする高品質な予測モデルに繋がることを示している。興味深いことに,我々の手法は,勾配向上とフィードフォワードニューラルネットワークに基づく最先端のブラックボックス機械学習手法に予測性能を損なうことなく,解釈可能性を大幅に向上させる。 pythonのコード実装はhttps://github.com/ShibalIbrahim/Additive-Models-with-Structured-Interactionsで公開されています。 Accurate and interpretable prediction of survey response rates is important from an operational standpoint. The US Census Bureau's well-known ROAM application uses principled statistical models trained on the US Census Planning Database data to identify hard-to-survey areas. An earlier crowdsourcing competition revealed that an ensemble of regression trees led to the best performance in predicting survey response rates; however, the corresponding models could not be adopted for the intended application due to limited interpretability. In this paper, we present new interpretable statistical methods to predict, with high accuracy, response rates in surveys. We study sparse nonparametric additive models with pairwise interactions via $\ell_0$-regularization, as well as hierarchically structured variants that provide enhanced interpretability. Despite strong methodological underpinnings, such models can be computationally challenging -- we present new scalable algorithms for learning these models. We also establish novel non-asymptotic error bounds for the proposed estimators. Experiments based on the US Census Planning Database demonstrate that our methods lead to high-quality predictive models that permit actionable interpretability for different segments of the population. Interestingly, our methods provide significant gains in interpretability without losing in predictive performance to state-of-the-art black-box machine learning methods based on gradient boosting and feedforward neural networks. Our code implementation in python is available at https://github.com/ShibalIbrahim/Additive-Models-with-Structured-Interactions.	翻訳日:2021-08-26 23:40:35 公開日:2021-08-24
# (参考訳) ggnb:gaussian naive bayes intrusion detection system for can bus GGNB: Graph-Based Gaussian Naive Bayes Intrusion Detection System for CAN Bus ( http://arxiv.org/abs/2108.10908v1 ) ライセンス: CC BY 4.0	Riadul Islam, Maloy K. Devnath, Manar D. Samad, and Syed Md Jaffrey Al Kadry	(参考訳) 国家道路交通安全局(nhtsa)は、自動車システムのサイバーセキュリティは他の情報システムのセキュリティよりも重要であると特定した。研究者はすでに、制御エリアネットワーク(CAN)を用いた臨界車両電子制御ユニット(ECU)に対する遠隔攻撃を実証している。さらに、既存の侵入検知システム(IDS)は特定の種類の攻撃に対処することをしばしば提案する。可能な限り短時間で広範囲の攻撃を識別できる一般化可能なIDSは、攻撃固有のIDSよりも実用的価値が高い。本稿では,グラフ特性とページランク関連の特徴を活用し,新しい"textbf g}raph-based {\textbf g}aussian {\textbf n}aive {\textbf b}ayes (ggnb)侵入検出アルゴリズムを提案する。実際の生CANデータセット~\cite{Lee:2017}上のGGNBは99.61\%、99.83\%、96.79\%、96.20\%の検知精度で、それぞれDoS、ファジィ、スプーフィング、リプレイ、混合攻撃を行う。また、OpelAstraデータセット~\cite{Guillaume:2019}を用いて、提案手法はそれぞれ、DoS、診断、ファジングCANID、ファジングペイロード、リプレイ、サスペンション、混合攻撃を考慮した100\%、99.85\%、99.92\%、99.92\%、99.92\%、99.75\%、99.57\%の検出精度を有する。 GGNBベースの方法論では、同じアプリケーションで使用されるSVM分類器と比較して、それぞれ239\times$と135\times$低いトレーニング時間とテスト時間が必要です。 Xilinx Zybo Z7フィールドプログラマブルゲートアレイ(FPGA)ボードを使用して提案されたGGNBは、従来のNNアーキテクチャよりも5.7 \times$、5.9 \times$、5.1 \times$、および3.6 \times$のスライス、LUT、フリップフロップ、DSPユニットを必要とする。 The national highway traffic safety administration (NHTSA) identified cybersecurity of the automobile systems are more critical than the security of other information systems. Researchers already demonstrated remote attacks on critical vehicular electronic control units (ECUs) using controller area network (CAN). Besides, existing intrusion detection systems (IDSs) often propose to tackle a specific type of attack, which may leave a system vulnerable to numerous other types of attacks. A generalizable IDS that can identify a wide range of attacks within the shortest possible time has more practical value than attack-specific IDSs, which is not a trivial task to accomplish. In this paper we propose a novel {\textbf g}raph-based {\textbf G}aussian {\textbf n}aive {\textbf B}ayes (GGNB) intrusion detection algorithm by leveraging graph properties and PageRank-related features. The GGNB on the real rawCAN data set~\cite{Lee:2017} yields 99.61\%, 99.83\%, 96.79\%, and 96.20\% detection accuracy for denial of service (DoS), fuzzy, spoofing, replay, mixed attacks, respectively. Also, using OpelAstra data set~\cite{Guillaume:2019}, the proposed methodology has 100\%, 99.85\%, 99.92\%, 100\%, 99.92\%, 97.75\% and 99.57\% detection accuracy considering DoS, diagnostic, fuzzing CAN ID, fuzzing payload, replay, suspension, and mixed attacks, respectively. The GGNB-based methodology requires about $239\times$ and $135\times$ lower training and tests times, respectively, compared to the SVM classifier used in the same application. Using Xilinx Zybo Z7 field-programmable gate array (FPGA) board, the proposed GGNB requires $5.7 \times$, $5.9 \times$, $5.1 \times$, and $3.6 \times$ fewer slices, LUTs, flip-flops, and DSP units, respectively, than conventional NN architecture.	翻訳日:2021-08-26 23:39:20 公開日:2021-08-24
# (参考訳) テキストクラスタリングのためのハイブリッドマルチソース機能融合 Hybrid Multisource Feature Fusion for the Text Clustering ( http://arxiv.org/abs/2108.10926v1 ) ライセンス: CC BY 4.0	Jiaxuan Chen and Shenglin Gui	(参考訳) テキストクラスタリング技術は教師なしテキストマイニング手法であり、膨大な量のテキスト文書をグループに分割するのに使われる。テキストクラスタリングアルゴリズムは教師付き手法よりも優れたパフォーマンスを実現するのが難しく、クラスタリング性能は選択したテキスト機能に依存することが報告されている。現在、テキスト特徴生成アルゴリズムにはさまざまな種類があり、それぞれがvsmや分散単語埋め込みといった特定の側面からテキスト特徴を抽出するため、コーパスから可能な限り完全な機能を得る新しい方法を求めることが、クラスタリング効果を強化する鍵となっている。本稿では,マルチモデルの特徴表現,相互類似性行列,特徴融合という3つの要素からなるハイブリッド多元特徴融合(hmff)フレームワークを提案する。そこでは,各特徴点の相互類似性行列を構築し,相互類似性行列から相互類似性行列を融合し,次元を小さくしてhmff特徴を生成することにより,入力サンプルをグループに分割するk-meansクラスタリングアルゴリズムを構成できる。実験の結果、HMFFフレームワークは11の公開ベンチマークデータセットのうち7つの公開アルゴリズムよりも優れており、残りの4つのベンチマークデータセットでも主要なパフォーマンスを示している。最終的に、HMFFフレームワークと、野生のCOVID-19データセット上の競合相手と、未知のクラスタ数を比較した。 The text clustering technique is an unsupervised text mining method which are used to partition a huge amount of text documents into groups. It has been reported that text clustering algorithms are hard to achieve better performance than supervised methods and their clustering performance is highly dependent on the picked text features. Currently, there are many different types of text feature generation algorithms, each of which extracts text features from some specific aspects, such as VSM and distributed word embedding, thus seeking a new way of obtaining features as complete as possible from the corpus is the key to enhance the clustering effects. In this paper, we present a hybrid multisource feature fusion (HMFF) framework comprising three components, feature representation of multimodel, mutual similarity matrices and feature fusion, in which we construct mutual similarity matrices for each feature source and fuse discriminative features from mutual similarity matrices by reducing dimensionality to generate HMFF features, then k-means clustering algorithm could be configured to partition input samples into groups. The experimental tests show our HMFF framework outperforms other recently published algorithms on 7 of 11 public benchmark datasets and has the leading performance on the rest 4 benchmark datasets as well. At last, we compare HMFF framework with those competitors on a COVID-19 dataset from the wild with the unknown cluster count, which shows the clusters generated by HMFF framework partition those similar samples much closer.	翻訳日:2021-08-26 23:20:24 公開日:2021-08-24
# (参考訳) SLIVARの現状: ロボット、人間とロボットのインタラクション、そして(音声)対話システムにとって、次は何か? The State of SLIVAR: What's next for robots, human-robot interaction, and (spoken) dialogue systems? ( http://arxiv.org/abs/2108.10931v1 ) ライセンス: CC BY-SA 4.0	Casey Kennington	(参考訳) 我々は,ロボット工学,人間ロボットインタラクション,音声対話システム研究の重要交差点におけるオープンな疑問を議論するために,最近のワークショップとセミナーの報告結果とレコメンデーションを合成した。この拡大する研究分野の目標は、人々がより効果的で自然にロボットとコミュニケーションできるようにすることだ。ネットワークと議論の機会を具体的かつ潜在的に資金提供可能なプロジェクトに向けて推進するため、私たちは関係者に対して、将来の仮想的および対面的な議論やワークショップに参加することを検討するよう促します。 We synthesize the reported results and recommendations of recent workshops and seminars that convened to discuss open questions within the important intersection of robotics, human-robot interaction, and spoken dialogue systems research. The goal of this growing area of research interest is to enable people to more effectively and naturally communicate with robots. To carry forward opportunities networking and discussion towards concrete, potentially fundable projects, we encourage interested parties to consider participating in future virtual and in-person discussions and workshops.	翻訳日:2021-08-26 23:03:52 公開日:2021-08-24
# (参考訳) SNコンピュータサイエンス:タミル語によるYouTubeコメントと投稿の攻撃的言語識別を目指す SN Computer Science: Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts ( http://arxiv.org/abs/2108.10939v1 ) ライセンス: CC BY 4.0	Charangan Vasantharajan and Uthayasanker Thayasivam	(参考訳) ソーシャルメディアプラットフォームにおける攻撃的言語検出は、ここ数年で活発な研究分野となっている。非ネイティブな英語圏では、ソーシャルメディアのユーザーは投稿や記事にコードミキシングされたテキストを使うことが多い。これは、攻撃的なコンテンツ識別タスクにいくつかの課題をもたらし、Tamilで利用可能なリソースが少ないことを考えると、タスクはずっと難しくなります。本研究は,複数の深層学習モデルを用いて広範な実験を行い,YouTube上の攻撃的コンテンツを検出する。本稿では,BERT, DistilBERT, XLM-RoBERTaなどの多言語トランスフォーマネットワークを微調整し, アンサンブルすることで, より優れた結果を得るための, 選択的翻訳・翻訳手法の新規かつ柔軟なアプローチを提案する。実験の結果, ULMFiTが最適モデルであることが確認された。最高のパフォーマンスモデルは、 Distil-BERT や XLM-RoBERTa などの一般的なトランスファー学習モデルやハイブリッドディープラーニングモデルの代わりに、このタミル符号混合データセットの ULMFiT と mBERTBiLSTM であった。提案されたモデルulmfitとmbertbilstmは良好な結果をもたらし、低リソース言語における効果的な攻撃的音声識別を約束している。 Offensive Language detection in social media platforms has been an active field of research over the past years. In non-native English spoken countries, social media users mostly use a code-mixed form of text in their posts/comments. This poses several challenges in the offensive content identification tasks, and considering the low resources available for Tamil, the task becomes much harder. The current study presents extensive experiments using multiple deep learning, and transfer learning models to detect offensive content on YouTube. We propose a novel and flexible approach of selective translation and transliteration techniques to reap better results from fine-tuning and ensembling multilingual transformer networks like BERT, Distil- BERT, and XLM-RoBERTa. The experimental results showed that ULMFiT is the best model for this task. The best performing models were ULMFiT and mBERTBiLSTM for this Tamil code-mix dataset instead of more popular transfer learning models such as Distil- BERT and XLM-RoBERTa and hybrid deep learning models. The proposed model ULMFiT and mBERTBiLSTM yielded good results and are promising for effective offensive speech identification in low-resourced languages.	翻訳日:2021-08-26 22:55:59 公開日:2021-08-24
# (参考訳) 7Tにおける定量的R1マッピングにおける走査間運動アーチファクトの補正 Correcting inter-scan motion artefacts in quantitative R1 mapping at 7T ( http://arxiv.org/abs/2108.10943v1 ) ライセンス: CC BY 4.0	Ya\"el Balbastre, Ali Aghaeifar, Nad\`ege Corbin, Mikael Brudfors, John Ashburner, Martina F. Callaghan	(参考訳) 目的: スキャン間運動は、$R_1$推定における重大なエラー源であり、$B_1$フィールドがより不均一な7Tで増加することが期待できる。確立された補正方式は、ボディコイル参照を必要とするため、7Tに変換されない。ここでは,確立した手法に勝る代替案を2つ紹介する。相対感度を計算するため、ボディコイル画像を必要としない。理論: 提案手法はコイル結合等級画像を用いて相対的なコイル感度を求める。第1の方法は、単純な比で相対感度を効率よく計算し、第2の方法はより洗練された生成モデルを適用する。方法:$R_1$マップは可変フリップ角(VFA)アプローチを用いて計算された。複数のデータセットが3tと7tで取得され、vfaボリュームの取得間を行き来した。 R_1$の地図は、提案された補正と(3Tで)以前に確立された補正スキームで構築された。結果: 3tでは,提案手法がベースライン法を上回った。また, 走査間運動アーチファクトも7Tで減少した。しかし、再現性は、位置特異的な送信電界効果も取り入れた場合にのみ、非運動条件に収束した。結論:提案手法はR_1$マップのスキャン間動作補正を簡略化し,典型的にはボディコイルが利用できない3Tと7Tの両方に適用可能である。すべてのメソッドのオープンソースコードは公開されています。 Purpose: Inter-scan motion is a substantial source of error in $R_1$ estimation, and can be expected to increase at 7T where $B_1$ fields are more inhomogeneous. The established correction scheme does not translate to 7T since it requires a body coil reference. Here we introduce two alternatives that outperform the established method. Since they compute relative sensitivities they do not require body coil images. Theory: The proposed methods use coil-combined magnitude images to obtain the relative coil sensitivities. The first method efficiently computes the relative sensitivities via a simple ratio; the second by fitting a more sophisticated generative model. Methods: $R_1$ maps were computed using the variable flip angle (VFA) approach. Multiple datasets were acquired at 3T and 7T, with and without motion between the acquisition of the VFA volumes. $R_1$ maps were constructed without correction, with the proposed corrections, and (at 3T) with the previously established correction scheme. Results: At 3T, the proposed methods outperform the baseline method. Inter-scan motion artefacts were also reduced at 7T. However, reproducibility only converged on that of the no motion condition if position-specific transmit field effects were also incorporated. Conclusion: The proposed methods simplify inter-scan motion correction of $R_1$ maps and are applicable at both 3T and 7T, where a body coil is typically not available. The open-source code for all methods is made publicly available.	翻訳日:2021-08-26 22:37:04 公開日:2021-08-24
# (参考訳) フィールドガイドによるゼロショット学習 Field-Guide-Inspired Zero-Shot Learning ( http://arxiv.org/abs/2108.10967v1 ) ライセンス: CC BY 4.0	Utkarsh Mall, Bharath Hariharan, and Kavita Bala	(参考訳) 現代の認識システムは、精度を達成するために大量の監督を必要とする。新しいドメインに適応するには、専門家からのかなりのデータが必要である。ゼロショット学習は、新しいカテゴリの注釈付き属性セットを必要とする。新しいカテゴリの属性の完全なセットをアノテートすることは、デプロイにおいて退屈で高価なタスクであることが証明されます。これは、認識ドメインがエキスパートドメインである場合に特に当てはまる。そこで我々は,学習者がクラスを定義する最も有用な属性を対話的に求める,ゼロショットアノテーションに対するフィールドガイド型アプローチを提案する。我々は,CUB,SUN,AWA2などの属性アノテーションを用いた分類ベンチマークにおいて,本手法の有効性を検証し,アノテーション数を大幅に減らし,完全アノテーションを用いたモデルの性能を実現することを示す。専門家の時間は重要なので、実際のデプロイにはアノテーションのコストを削減できる。 Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category. Annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment. This is especially the case when the recognition domain is an expert domain. We introduce a new field-guide-inspired approach to zero-shot annotation where the learner model interactively asks for the most useful attributes that define a class. We evaluate our method on classification benchmarks with attribute annotations like CUB, SUN, and AWA2 and show that our model achieves the performance of a model with full annotations at the cost of a significantly fewer number of annotations. Since the time of experts is precious, decreasing annotation cost can be very valuable for real-world deployment.	翻訳日:2021-08-26 22:26:09 公開日:2021-08-24
# (参考訳) 実世界single view 3dリコンストラクションのためのドメイン適応 Domain Adaptation for Real-World Single View 3D Reconstruction ( http://arxiv.org/abs/2108.10972v1 ) ライセンス: CC BY 4.0	Brandon Leung, Siddharth Singh, Arik Horodniceanu	(参考訳) 深層学習に基づくオブジェクト再構成アルゴリズムは、古典的手法よりも著しく改善されている。しかし、トレーニングデータとテストデータが異なる分布を持つ場合、教師付き学習ベース手法は性能が良くない。実際、現在のほとんどの研究は、合成されたShapeNetデータセットに満足できるパフォーマンスを保っていますが、実際の画像で提示すると劇的に失敗します。この問題に対処するために、教師なし領域適応は、ラベル付き合成ソースドメインからの転送知識を使用し、ラベル付き実ターゲットドメインの分類器を学ぶことができる。実領域におけるsingle view 3dリコンストラクションの課題に取り組むため,我々は,mmd(maximum mean discrepancy)損失,深海サンゴ,およびdann(domain adversarial neural network)に触発された様々なドメイン適応手法を実験した。これらの結果から,本手法では3dモデルでは対象領域データは教師なしであるが,クラスラベルでは教師なしであるという事実を生かした新しいアーキテクチャを提案する。 pix2voxと呼ばれる最近のネットワークからフレームワークをベースとしています。結果は、shapenetをソースドメインとして、object dataset domain suite(odds)データセットをターゲットとして、real world multiview、multidomain imageデータセットとして、shapenetで実行される。 ODDSのドメインは困難であり、ドメインギャップサイズの概念を評価することができる。このデータセットを用いたマルチビュー再構築文献では,この結果が初めてである。 Deep learning-based object reconstruction algorithms have shown remarkable improvements over classical methods. However, supervised learning based methods perform poorly when the training data and the test data have different distributions. Indeed, most current works perform satisfactorily on the synthetic ShapeNet dataset, but dramatically fail in when presented with real world images. To address this issue, unsupervised domain adaptation can be used transfer knowledge from the labeled synthetic source domain and learn a classifier for the unlabeled real target domain. To tackle this challenge of single view 3D reconstruction in the real domain, we experiment with a variety of domain adaptation techniques inspired by the maximum mean discrepancy (MMD) loss, Deep CORAL, and the domain adversarial neural network (DANN). From these findings, we additionally propose a novel architecture which takes advantage of the fact that in this setting, target domain data is unsupervised with regards to the 3D model but supervised for class labels. We base our framework off a recent network called pix2vox. Results are performed with ShapeNet as the source domain and domains within the Object Dataset Domain Suite (ODDS) dataset as the target, which is a real world multiview, multidomain image dataset. The domains in ODDS vary in difficulty, allowing us to assess notions of domain gap size. Our results are the first in the multiview reconstruction literature using this dataset.	翻訳日:2021-08-26 22:24:59 公開日:2021-08-24
# (参考訳) BERTエンコーディングと文レベル言語モデルを用いた文順序付け Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering ( http://arxiv.org/abs/2108.10986v1 ) ライセンス: CC BY 4.0	Melika Golestani, Seyedeh Zahra Razavi, Zeinab Borhanifard, Farnaz Tahmasebian, and Hesham Faili	(参考訳) 事象の論理列の発見は、自然言語理解の基盤の1つである。イベントのシーケンスを学ぶ一つのアプローチは、コヒーレントなテキストで文の順序を研究することである。文の順序付けは、検索に基づく質問回答、文書要約、ストーリーテリング、テキスト生成、対話システムなど、さまざまなタスクに適用できる。さらに、シャッフル文の順序を学習することで、テキストコヒーレンスをモデル化することを学ぶことができる。これまでの研究は、RNN、LSTM、BiLSTMアーキテクチャを使ってテキスト言語モデルを学習してきた。しかし、これらのネットワークは注意機構の欠如により性能が悪くなっている。本稿では,短い記事のコーパスにおける文順序付けアルゴリズムを提案する。提案手法では,アテンション機構を用いて文の依存関係をキャプチャするUniversal Transformer (UT) に基づく言語モデルを用いる。提案手法は,約100万件の短い人造ストーリーのコーパスであるROCStoriesデータセットにおけるPMR(Perfect Match Ratio)スコアの点から,過去の最先端技術を改善する。提案するモデルには,Sentence Encoder,Language Model,Sentence Arrangement with Brute Force Searchの3つのコンポーネントが含まれている。第1成分は、ROCStoriesデータに基づいて微調整されたSBERT-WK事前学習モデルを用いて文埋め込みを生成する。そして、ユニバーサルトランスフォーマーネットワークが文レベル言語モデルを生成する。復号化のために、ネットワークは、現在の文の次の文として候補文を生成する。我々はコサイン類似性をスコア関数として使用し、他の文をシャッフルセットに埋め込んだ候補にスコアを割り当てる。次に、連続した文のペア間の類似度の総和を最大化するためにブルートフォース探索を用いる。 Discovering the logical sequence of events is one of the cornerstones in Natural Language Understanding. One approach to learn the sequence of events is to study the order of sentences in a coherent text. Sentence ordering can be applied in various tasks such as retrieval-based Question Answering, document summarization, storytelling, text generation, and dialogue systems. Furthermore, we can learn to model text coherence by learning how to order a set of shuffled sentences. Previous research has relied on RNN, LSTM, and BiLSTM architecture for learning text language models. However, these networks have performed poorly due to the lack of attention mechanisms. We propose an algorithm for sentence ordering in a corpus of short stories. Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism. Our method improves the previous state-of-the-art in terms of Perfect Match Ratio (PMR) score in the ROCStories dataset, a corpus of nearly 100K short human-made stories. The proposed model includes three components: Sentence Encoder, Language Model, and Sentence Arrangement with Brute Force Search. The first component generates sentence embeddings using SBERT-WK pre-trained model fine-tuned on the ROCStories data. Then a Universal Transformer network generates a sentence-level language model. For decoding, the network generates a candidate sentence as the following sentence of the current sentence. We use cosine similarity as a scoring function to assign scores to the candidate embedding and the embeddings of other sentences in the shuffled set. Then a Brute Force Search is employed to maximize the sum of similarities between pairs of consecutive sentences.	翻訳日:2021-08-26 22:15:54 公開日:2021-08-24
# (参考訳) OOWL500: ワイルドなデータセットコレクションバイアスを克服する OOWL500: Overcoming Dataset Collection Bias in the Wild ( http://arxiv.org/abs/2108.10992v1 ) ライセンス: CC BY 4.0	Brandon Leung, Chih-Hui Ho, Amir Persekian, David Orozco, Yen Chang, Erik Sandstrom, Bo Liu, Nuno Vasconcelos	(参考訳) 画像データセットがオンラインで「野生」に集められたという仮説は、例えばバイアスのあるオブジェクト認識を生成できる。プロの撮影や特定の角度を好み、研究されている。新たな"研究室内"データ収集インフラストラクチャは、オブジェクトを回りながら画像をキャプチャするドローンで構成されている。重要なことに、この設定と自然なカメラによる制御は、飛行に固有の多くのバイアスを軽減する。安価で容易に複製できる性質は、ビジョンコミュニティによるスケーラブルなデータ収集の取り組みにつながる可能性もあります。このプロシージャの有用性は、fLight (OOWL)で達成されたオブジェクトのデータセットを作成することで実証される。 OOWL500 には 500 個のオブジェクトの 1220,000 イメージが含まれており,クラス単位のクラス数とオブジェクト数を考慮すれば,最大規模の "ラボ内" イメージデータセットである。さらに、オブジェクト認識に関するいくつかの新しい洞察を可能にした。まず,カメラの揺らぎやポーズなどのセマンティック特性の観点から,画像摂動を定義できる新たな対角攻撃戦略を提案する。実際、実験の結果、ImageNetには相当量のポーズとプロの写真バイアスがあることがわかった。第二に、ImageNetのような野生のデータセットとOOWL500のような実験データとの増大は、これらのバイアスを著しく減少させ、一般化を改善するオブジェクト認識に繋がることを示すために使われる。第三に、データセットはデータセット収集の"ベストプロシージャ"に関する質問の研究に使用される。合成画像によるデータ拡張は,野生のデータセットでバイアスを排除するには十分ではなく,カメラの揺動とポーズの多様性が従来考えられていたよりもオブジェクト認識の堅牢性において重要な役割を果たすことが明らかとなった。 The hypothesis that image datasets gathered online "in the wild" can produce biased object recognizers, e.g. preferring professional photography or certain viewing angles, is studied. A new "in the lab" data collection infrastructure is proposed consisting of a drone which captures images as it circles around objects. Crucially, the control provided by this setup and the natural camera shake inherent to flight mitigate many biases. It's inexpensive and easily replicable nature may also potentially lead to a scalable data collection effort by the vision community. The procedure's usefulness is demonstrated by creating a dataset of Objects Obtained With fLight (OOWL). Denoted as OOWL500, it contains 120,000 images of 500 objects and is the largest "in the lab" image dataset available when both number of classes and objects per class are considered. Furthermore, it has enabled several of new insights on object recognition. First, a novel adversarial attack strategy is proposed, where image perturbations are defined in terms of semantic properties such as camera shake and pose. Indeed, experiments have shown that ImageNet has considerable amounts of pose and professional photography bias. Second, it is used to show that the augmentation of in the wild datasets, such as ImageNet, with in the lab data, such as OOWL500, can significantly decrease these biases, leading to object recognizers of improved generalization. Third, the dataset is used to study questions on "best procedures" for dataset collection. It is revealed that data augmentation with synthetic images does not suffice to eliminate in the wild datasets biases, and that camera shake and pose diversity play a more important role in object recognition robustness than previously thought.	翻訳日:2021-08-26 22:04:17 公開日:2021-08-24
# SimVLM: Weak Supervisionでトレーニングするシンプルなビジュアル言語モデル SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ( http://arxiv.org/abs/2108.10904v1 ) ライセンス: Link先を確認	Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao	(参考訳) 視覚表現とテキスト表現の結合モデリングの最近の進歩により、視覚言語前訓練(vlp)は多くのマルチモーダル下流タスクで印象的なパフォーマンスを達成している。しかし、クリーンな画像キャプションや地域ラベルを含む高価なアノテーションの要求は、既存のアプローチのスケーラビリティを制限し、複数のデータセット固有の目的を導入することで事前学習手順を複雑化する。本研究では,これらの制約を緩和し,SimVLM(Simple Visual Language Model)という最小限の事前学習フレームワークを提案する。従来の作業と異なり、SimVLMは大規模な弱監視を利用してトレーニングの複雑さを減らし、単一のプレフィックス言語モデリング目的でエンドツーエンドにトレーニングされる。 VQA(+3.74% vqa-score)、NLVR2(+1.17%精度)、SNLI-VE(+1.37%精度)、画像キャプションタスク(+10.1%平均CIDErスコア)など、様々な差別的で生成的な視覚言語ベンチマークにおいて、結果として得られたモデルは、以前の事前学習方法よりも大幅に優れ、新しい最先端の成果が得られる。さらに、SimVLMは強力な一般化と伝達能力を獲得し、オープンな視覚的質問応答やモダリティ間移動を含むゼロショット動作を可能にすることを実証する。 With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks. However, the requirement for expensive annotations including clean image captions and regional labels limits the scalability of existing approaches, and complicates the pretraining procedure with the introduction of multiple dataset-specific objectives. In this work, we relax these constraints and present a minimalist pretraining framework, named Simple Visual Language Model (SimVLM). Unlike prior work, SimVLM reduces the training complexity by exploiting large-scale weak supervision, and is trained end-to-end with a single prefix language modeling objective. Without utilizing extra data or task-specific customization, the resulting model significantly outperforms previous pretraining methods and achieves new state-of-the-art results on a wide range of discriminative and generative vision-language benchmarks, including VQA (+3.74% vqa-score), NLVR2 (+1.17% accuracy), SNLI-VE (+1.37% accuracy) and image captioning tasks (+10.1% average CIDEr score). Furthermore, we demonstrate that SimVLM acquires strong generalization and transfer ability, enabling zero-shot behavior including open-ended visual question answering and cross-modality transfer.	翻訳日:2021-08-26 13:10:52 公開日:2021-08-24
# データプログラミングを用いたポインタラベルのないラベル学習より強大な単語 The Word is Mightier than the Label Learning without Pointillistic Labels using Data Programming ( http://arxiv.org/abs/2108.10921v1 ) ライセンス: Link先を確認	Chufan Gao and Mononito Goswami	(参考訳) ほとんどの高度な教師付き機械学習(ML)モデルは、大量のポイントバイポイントラベル付きトレーニング例に依存している。大量のデータをハンドラベリングすることは面倒で、高価で、エラーを起こしやすい。近年、競争力のあるエンドモデル分類器を作成するために、弱い監督源の多種多様な利用を調査している研究もある。本稿では,弱い監督に関する最近の研究,特にデータプログラミング(dp)フレームワークについて調査する。 DPは、潜在的なノイズのあるヒューリスティックのセットを入力として、ヒューリスティックの確率的グラフィカルモデルを用いて、データセットの各データポイントにノイズ付き確率ラベルを割り当てる。 DPの背後にある数学の基礎を解析し、2つの実世界のテキスト分類タスクに適用してそのパワーを実証する。さらに,従来データスパース設定で適用されてきた点的アクティブおよび半教師付き学習手法とdpを比較した。 Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data may be tedious, expensive, and error-prone. Recently, some studies have explored the use of diverse sources of weak supervision to produce competitive end model classifiers. In this paper, we survey recent work on weak supervision, and in particular, we investigate the Data Programming (DP) framework. Taking a set of potentially noisy heuristics as input, DP assigns denoised probabilistic labels to each data point in a dataset using a probabilistic graphical model of heuristics. We analyze the math fundamentals behind DP and demonstrate the power of it by applying it on two real-world text classification tasks. Furthermore, we compare DP with pointillistic active and semi-supervised learning techniques traditionally applied in data-sparse settings.	翻訳日:2021-08-26 13:10:14 公開日:2021-08-24
# Bias Mitigated Learning from Differentially Private Synthetic Data: a Cautionary Tale Bias Mitigated Learning from Differentially Private Synthetic Data: A Cautionary Tale ( http://arxiv.org/abs/2108.10934v1 ) ライセンス: Link先を確認	Sahra Ghalebikesabi, Harrison Wilde, Jack Jewson, Arnaud Doucet, Sebastian Vollmer, Chris Holmes	(参考訳) プライバシ保護機械学習への関心が高まり、未公開の実データから合成プライベートデータを生成する新しいモデルが生まれた。しかし、プライバシ保存のメカニズムは、予測モデルや推論の学習のような下流タスクに大きな影響を与える結果合成データにアーティファクトを導入する。特に、合成データ分布が実データ分布の不整合推定であるため、バイアスはすべての解析に影響を及ぼす可能性がある。本研究では, 差動合成データ生成モデルに適用可能な民営化確率比を用いたバイアス緩和手法を提案する。大規模実証評価を通じて, バイアス緩和は, 一般の合成データに対して, 単純かつ効果的なプライバシー準拠の強化をもたらすことを示した。しかし, 偏差補正後においても, 予測や推測などのタスクにおいて, 合成プライベートデータ生成器の有用性に重要な課題が残されている。 Increasing interest in privacy-preserving machine learning has led to new models for synthetic private data generation from undisclosed real data. However, mechanisms of privacy preservation introduce artifacts in the resulting synthetic data that have a significant impact on downstream tasks such as learning predictive models or inference. In particular, bias can affect all analyses as the synthetic data distribution is an inconsistent estimate of the real-data distribution. We propose several bias mitigation strategies using privatized likelihood ratios that have general applicability to differentially private synthetic data generative models. Through large-scale empirical evaluation, we show that bias mitigation provides simple and effective privacy-compliant augmentation for general applications of synthetic data. However, the work highlights that even after bias correction significant challenges remain on the usefulness of synthetic private data generators for tasks such as prediction and inference.	翻訳日:2021-08-26 13:05:33 公開日:2021-08-24
# 先行プローブを用いたエンティティ曖昧性のロバスト性評価:エンティティオーバーシャドーイングの場合 Robustness Evaluation of Entity Disambiguation Using Prior Probes:the Case of Entity Overshadowing ( http://arxiv.org/abs/2108.10949v1 ) ライセンス: Link先を確認	Vera Provatorova, Svitlana Vakulenko, Samarth Bhargav, Evangelos Kanoulas	(参考訳) エンティティの曖昧さ (ED) はエンティティリンク(EL)の最終段階であり、候補となるエンティティが出現するコンテキストに応じてリランクされる。 elのモデルのトレーニングと評価のためのすべてのデータセットは、ニュース記事やツイートのような便利なサンプルで構成されており、より頻繁に発生するエンティティに対するエンティティ分布の以前の確率バイアスを広めている。このようなデータセット上でのELシステムの性能は,事前学習だけで高い精度のスコアを得ることができるため,過大評価されている。より適切な評価ベンチマークとして,エンティティ参照に注釈を付けた16Kの短いテキストスニペットを含むShadowLinkデータセットを導入する。我々はShadowLinkベンチマークで人気のあるELシステムの性能を評価し報告する。その結果, 評価対象のELシステムにおいて, 既往の確率バイアスとエンティティのオーバーシャドーイングの影響を実証し, 共通エンティティの精度に有意な差が認められた。 Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was previously shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effects of prior probability bias and entity overshadowing.	翻訳日:2021-08-26 13:03:01 公開日:2021-08-24
# ピクセル近傍法による肌色分割のための有効画素分割法 An Effective Pixel-Wise Approach for Skin Colour Segmentation Using Pixel Neighbourhood Technique ( http://arxiv.org/abs/2108.10971v1 ) ライセンス: Link先を確認	Tejas Dastane, Varun Rao, Kartik Shenoy, Devendra Vyavaharkar	(参考訳) 本稿では,カラーレンジのしきい値化などの既存技術が直面する限界を克服する新しい肌色分割手法を提案する。肌の色セグメンテーションは、様々な肌の色と周囲の照明条件に影響され、多くの技術で肌の色セグメンテーションに繋がる。隣接する画素に基づいて,任意のピクセルを皮膚または非皮膚に分類する2段階のPixel Neighbourhood手法を提案する。第1ステップは、深部ニューラルネットワークモデルに画素のHSV値を渡すことにより、各画素が皮膚である確率を算出する。次のステップでは、隣接するピクセルの確率を用いて、皮膚にあるピクセルの類似性を計算する。この技術は既存の技術よりも肌色セグメンテーションが優れている。 This paper presents a novel technique for skin colour segmentation that overcomes the limitations faced by existing techniques such as Colour Range Thresholding. Skin colour segmentation is affected by the varied skin colours and surrounding lighting conditions, leading to poorskin segmentation for many techniques. We propose a new two stage Pixel Neighbourhood technique that classifies any pixel as skin or non-skin based on its neighbourhood pixels. The first step calculates the probability of each pixel being skin by passing HSV values of the pixel to a Deep Neural Network model. In the next step, it calculates the likeliness of pixel being skin using these probabilities of neighbouring pixels. This technique performs skin colour segmentation better than the existing techniques.	翻訳日:2021-08-26 13:02:29 公開日:2021-08-24
# リアルタイムインド手話(ISL)認識 Real-time Indian Sign Language (ISL) Recognition ( http://arxiv.org/abs/2108.10970v1 ) ライセンス: Link先を確認	Kartik Shenoy, Tejas Dastane, Varun Rao, Devendra Vyavaharkar	(参考訳) 本稿では,グリッド型特徴量を用いて,インド手話(ISL)からのポーズやジェスチャーをリアルタイムで認識するシステムを提案する。このシステムは聴覚障害と言語障害のコミュニケーションギャップと社会の他の部分との橋渡しを試みている。既存のソリューションは比較的低い精度を提供するか、リアルタイムに動作しない。このシステムは両方のパラメーターに良い結果を与える。 33のポーズとISLからのジェスチャーを識別できる。 Sign Languageはスマートフォンカメラからキャプチャされ、そのフレームは処理のためにリモートサーバに送信される。外部ハードウェア(手袋やMicrosoft Kinectセンサーなど)の使用は避けられ、ユーザーフレンドリーになる。顔検出、物体の安定化、肌の色分割などの技術は、手の検出や追跡に使われている。さらに、画像は、特徴ベクトルの形で手のポーズを表すグリッドベースの特徴抽出技術によりさらに処理される。ハンドポーズはk-nearest neighborsアルゴリズムで分類される。一方、ジェスチャー分類では、ISLで定義された12の事前選択されたジェスチャーに対応する隠れマルコフモデルチェーンに、動作と中間手ポーズ観察シーケンスが供給される。この手法を用いることで、静的手ポーズの精度は99.7%、ジェスチャー認識の精度は97.23%となる。 This paper presents a system which can recognise hand poses & gestures from the Indian Sign Language (ISL) in real-time using grid-based features. This system attempts to bridge the communication gap between the hearing and speech impaired and the rest of the society. The existing solutions either provide relatively low accuracy or do not work in real-time. This system provides good results on both the parameters. It can identify 33 hand poses and some gestures from the ISL. Sign Language is captured from a smartphone camera and its frames are transmitted to a remote server for processing. The use of any external hardware (such as gloves or the Microsoft Kinect sensor) is avoided, making it user-friendly. Techniques such as Face detection, Object stabilisation and Skin Colour Segmentation are used for hand detection and tracking. The image is further subjected to a Grid-based Feature Extraction technique which represents the hand's pose in the form of a Feature Vector. Hand poses are then classified using the k-Nearest Neighbours algorithm. On the other hand, for gesture classification, the motion and intermediate hand poses observation sequences are fed to Hidden Markov Model chains corresponding to the 12 pre-selected gestures defined in ISL. Using this methodology, the system is able to achieve an accuracy of 99.7% for static hand poses, and an accuracy of 97.23% for gesture recognition.	翻訳日:2021-08-26 12:56:12 公開日:2021-08-24
# NeRP: 少ないサンプリング画像再構成のためのプリエンベディングによる暗黙的ニューラル表現学習 NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction ( http://arxiv.org/abs/2108.10991v1 ) ライセンス: Link先を確認	Liyue Shen, John Pauly, Lei Xing	(参考訳) 画像再構成は、サンプリングされたセンサ計測に基づく計算画像の逆問題である。少量のサンプル画像再構成は、限られた測定のために追加の課題を引き起こす。本研究では,事前埋め込み(NeRP)を用いた暗黙的ニューラルネットワーク表現学習手法を提案する。従来の深層学習に基づく画像再構成手法とは根本的に異なり、nerpは画像内の内部情報を事前に活用し、比較的サンプルの少ない測定値の物理を活用して未知の被写体の表現を生成する。以前の画像と少量のサンプルデータを除いて、NeRPを訓練するために大規模なデータは必要ない。また,NeRPはCTやMRIなどの様々な画像モダリティに一般化する一般的な手法であることを示す。また,NeRPは腫瘍進展を評価するのに必要な,微妙ながら重要な画像変化をしっかりと捉えることができることを示した。 Image reconstruction is an inverse problem that solves for a computational image based on sampled sensor measurement. Sparsely sampled image reconstruction poses addition challenges due to limited measurements. In this work, we propose an implicit Neural Representation learning methodology with Prior embedding (NeRP) to reconstruct a computational image from sparsely sampled measurements. The method differs fundamentally from previous deep learning-based image reconstruction approaches in that NeRP exploits the internal information in an image prior, and the physics of the sparsely sampled measurements to produce a representation of the unknown subject. No large-scale data is required to train the NeRP except for a prior image and sparsely sampled measurements. In addition, we demonstrate that NeRP is a general methodology that generalizes to different imaging modalities such as CT and MRI. We also show that NeRP can robustly capture the subtle yet significant image changes required for assessing tumor progression.	翻訳日:2021-08-26 12:55:56 公開日:2021-08-24
# ガウス分布の間のエントロピーGromov-Wasserstein Entropic Gromov-Wasserstein between Gaussian Distributions ( http://arxiv.org/abs/2108.10961v1 ) ライセンス: Link先を確認	Khang Le and Dung Le and Huy Nguyen and Dat Do and Tung Pham and Nhat Ho	(参考訳) 我々はトロピック・グロモフ・ワッサーシュタインとその次元の異なるガウス分布の間の不均衡バージョンについて研究した。計量が内積であるとき、内積gromov-wasserstein (igw) は、エントロピーigwとその非平衡変異の最適輸送計画が(非平衡な)ガウス分布であることを示す。フォン・ノイマンのトレース不等式の適用により、これらのガウス分布の間のエントロピー IGW に対する閉形式式を得る。最後に、複数のガウス分布のエントロピー内積gromov-wasserstein barycenterを考える。エントロピー正則化パラメータが小さい場合、バリセンタがガウス分布であることを証明する。さらに,重心の共分散行列に対する閉形式表現も導出する。 We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions. Via an application of von Neumann's trace inequality, we obtain closed-form expressions for the entropic IGW between these Gaussian distributions. Finally, we consider an entropic inner product Gromov-Wasserstein barycenter of multiple Gaussian distributions. We prove that the barycenter is Gaussian distribution when the entropic regularization parameter is small. We further derive closed-form expressions for the covariance matrix of the barycenter.	翻訳日:2021-08-26 12:53:01 公開日:2021-08-24
# オンライン辞書学習に基づく電力系統の故障・サイバー攻撃検出 Online Dictionary Learning Based Fault and Cyber Attack Detection for Power Systems ( http://arxiv.org/abs/2108.10990v1 ) ライセンス: Link先を確認	Gabriel Intriago, Yu Zhang	(参考訳) 新興の広域監視システム(wams)は、電力網の状況把握に大きな改善をもたらした。しかし、新たに導入されたシステムは、通常の物理的障害に変装する可能性のあるサイバー攻撃のリスクを高める可能性がある。本稿では、ストリームデータマイニング分類器(Hoeffding Adaptive Tree)と半教師付き学習技術を利用して、通常のシステム摂動からサイバー攻撃を正確に識別することで、イベントや侵入検知の問題に対処する。まず,提案手法はラベルなしデータから高レベルな特徴を学習することで辞書を構築する。次に、ラベル付きデータを学習辞書原子のスパース線形結合として表現する。我々は、これらのスパースコードを利用して、オンライン分類器と効率的な変更検出器を訓練する。我々は,産業制御システムによるサイバー攻撃データセットを用いた数値実験を行った。ショートサーキット障害、ラインメンテナンス、リモートトリッピングコマンドインジェクション、リレー設定変更、偽データインジェクションの5つのシナリオを検討した。データは改良されたIEEE 9バスシステムに基づいて生成される。シミュレーションの結果,提案手法は最先端手法よりも優れていることがわかった。 The emerging wide area monitoring systems (WAMS) have brought significant improvements in electric grids' situational awareness. However, the newly introduced system can potentially increase the risk of cyber-attacks, which may be disguised as normal physical disturbances. This paper deals with the event and intrusion detection problem by leveraging a stream data mining classifier (Hoeffding adaptive tree) with semi-supervised learning techniques to distinguish cyber-attacks from regular system perturbations accurately. First, our proposed approach builds a dictionary by learning higher-level features from unlabeled data. Then, the labeled data are represented as sparse linear combinations of learned dictionary atoms. We capitalize on those sparse codes to train the online classifier along with efficient change detectors. We conduct numerical experiments with industrial control systems cyber-attack datasets. We consider five different scenarios: short-circuit faults, line maintenance, remote tripping command injection, relay setting change, as well as false data injection. The data are generated based on a modified IEEE 9-bus system. Simulation results show that our proposed approach outperforms the state-of-the-art method.	翻訳日:2021-08-26 12:52:49 公開日:2021-08-24
# (参考訳) ディープラーニングの敵対的ロバスト性:理論・アルゴリズム・応用 Adversarial Robustness of Deep Learning: Theory, Algorithms, and Applications ( http://arxiv.org/abs/2108.10451v1 ) ライセンス: CC BY 4.0	Wenjie Ruan and Xinping Yi and Xiaowei Huang	(参考訳) 本チュートリアルは, 各種深層学習モデルの脆弱性を, 逆例として評価するための, 最新の手法をよく構築したレビューとして紹介することを目的としている。このチュートリアルは特に、ディープニューラルネットワーク(DNN)の敵攻撃と堅牢性検証における最先端技術を強調している。深層学習モデルのロバスト性を改善するための効果的な対策についても紹介する。我々は、この新たな方向性に関する総合的な全体像を提供し、安全-クリティカルなデータ分析アプリケーションにおける堅牢なディープラーニングモデルの設計の緊急性と重要性をコミュニティに認識させ、最終的にはエンドユーザがディープラーニング分類器を信頼できるようにする。また、深層学習の敵意的堅牢性に関する潜在的研究の方向性と、信頼性の高い深層学習に基づくデータ分析システムとアプリケーションを実現するための潜在的な利点を要約する。 This tutorial aims to introduce the fundamentals of adversarial robustness of deep learning, presenting a well-structured review of up-to-date techniques to assess the vulnerability of various types of deep learning models to adversarial examples. This tutorial will particularly highlight state-of-the-art techniques in adversarial attacks and robustness verification of deep neural networks (DNNs). We will also introduce some effective countermeasures to improve the robustness of deep learning models, with a particular focus on adversarial training. We aim to provide a comprehensive overall picture about this emerging direction and enable the community to be aware of the urgency and importance of designing robust deep learning models in safety-critical data analytical applications, ultimately enabling the end-users to trust deep learning classifiers. We will also summarize potential research directions concerning the adversarial robustness of deep learning, and its potential benefits to enable accountable and trustworthy deep learning-based data analytical systems and applications.	翻訳日:2021-08-25 21:06:28 公開日:2021-08-24
# (参考訳) Deep Survival Dose Response Function (DeepSDRF)による確率的治療勧告 Stochastic Treatment Recommendation with Deep Survival Dose Response Function (DeepSDRF) ( http://arxiv.org/abs/2108.10453v1 ) ライセンス: CC BY 4.0	Jie Zhu, Blanca Gallego	(参考訳) 我々は,deep survival dose response function (deepsdrf) と呼ばれる臨床生存率データを用いて,確率的治療推奨問題の一般的な定式化を提案する。すなわち,未観測の要因(共同設立者)が観察された治療と時間と時間の両方に影響を及ぼす履歴データから,条件平均線量応答(CADR)関数を学習する問題を考える。 DeepSDRFから推定される治療効果により,説明的洞察を用いた推薦アルゴリズムの開発が可能となる。ランダム検索と強化学習を併用した2つの推奨手法を比較し,同様の結果を得た。我々は,DeepSDRFとそれに対応する勧告を広範囲なシミュレーション研究と2つの実験データベースで検証した: 1)臨床実践研究データリンク(CPRD)と2)eICU研究所(eRI)データベース。我々の知る限りでは、共同設立者が医学的文脈における観察データによる確率的治療効果を考慮に入れたのはこれが初めてである。 We propose a general formulation for stochastic treatment recommendation problems in settings with clinical survival data, which we call the Deep Survival Dose Response Function (DeepSDRF). That is, we consider the problem of learning the conditional average dose response (CADR) function solely from historical data in which unobserved factors (confounders) affect both observed treatment and time-to-event outcomes. The estimated treatment effect from DeepSDRF enables us to develop recommender algorithms with explanatory insights. We compared two recommender approaches based on random search and reinforcement learning and found similar performance in terms of patient outcome. We tested the DeepSDRF and the corresponding recommender on extensive simulation studies and two empirical databases: 1) the Clinical Practice Research Datalink (CPRD) and 2) the eICU Research Institute (eRI) database. To the best of our knowledge, this is the first time that confounders are taken into consideration for addressing the stochastic treatment effect with observational data in a medical context.	翻訳日:2021-08-25 20:55:39 公開日:2021-08-24
# (参考訳) Isaac Gym: ロボット学習のための高性能GPUベースの物理シミュレーション Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning ( http://arxiv.org/abs/2108.10470v1 ) ライセンス: CC BY 4.0	Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, Gavriel State	(参考訳) Isaac Gymは、GPU上でさまざまなロボットタスクのポリシーをトレーニングする、高性能な学習プラットフォームを提供する。物理シミュレーションとニューラルネットワークポリシのトレーニングはどちらもgpu上にあり、物理バッファからpytorchテンソルに直接データを渡すことで、cpuボトルネックを乗り越えることなく通信する。これにより、ニューラルネットワークにcpuベースのシミュレータとgpuを使用する従来のrlトレーニングに比べて、1つのgpu上で複雑なロボットタスクのトレーニング時間が1～2桁向上した。結果は \url{https://sites.google.com/view/isaacgym-nvidia} でホストされ、isaac gymは \url{https://developer.nvidia.com/isaac-gym} でダウンロードできる。 Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 1-2 orders of magnitude improvements compared to conventional RL training that uses a CPU based simulator and GPU for neural networks. We host the results and videos at \url{https://sites.google.com/view/isaacgym-nvidia} and isaac gym can be download at \url{https://developer.nvidia.com/isaac-gym}.	翻訳日:2021-08-25 20:31:14 公開日:2021-08-24
# (参考訳) 修正FSSDとモデル圧縮に基づく小型物体検出 Small Object Detection Based on Modified FSSD and Model Compression ( http://arxiv.org/abs/2108.10503v1 ) ライセンス: CC BY 4.0	Qingcai Wang, Hao Zhang, Xianggong Hong, and Qinqin Zhou	(参考訳) 小物体の分解能は比較的低く, 抽出が困難であり, 既存の物体検出法では小物体を効果的に検出することができず, 検出速度や安定性は低い。そこで本研究では,FSSDに基づく小型物体検出アルゴリズムを提案する。まず、異なるレイヤの特徴に含まれる意味情報を異なるスケールオブジェクトの検出に用いることができ、また、特徴融合法を改善して、小さなオブジェクトに有益なより多くの情報を得ることができ、第2に、ニューラルネットワークのトレーニングを加速し、モデルをスパースにするためにバッチ正規化層を導入し、最後に、スケール係数によってモデルを切断して対応する圧縮モデルを得る。実験の結果、アルゴリズムの平均精度(mAP)はPASCAL VOCで80.4%、速度はGTX1080tiで59.5 FPSに達することが示された。刈り取り後、圧縮されたモデルは79.9% mAP、79.5 FPSの速度で到達できる。 MS COCOでは、最良の検出精度(APs)は12.1%であり、全体的な検出精度はIoUが0.5のとき49.8%である。このアルゴリズムは、小型物体の検出精度を向上させるだけでなく、検出速度を大幅に向上させ、速度と精度のバランスをとることができる。 Small objects have relatively low resolution, the unobvious visual features which are difficult to be extracted, so the existing object detection methods cannot effectively detect small objects, and the detection speed and stability are poor. Thus, this paper proposes a small object detection algorithm based on FSSD, meanwhile, in order to reduce the computational cost and storage space, pruning is carried out to achieve model compression. Firstly, the semantic information contained in the features of different layers can be used to detect different scale objects, and the feature fusion method is improved to obtain more information beneficial to small objects; secondly, batch normalization layer is introduced to accelerate the training of neural network and make the model sparse; finally, the model is pruned by scaling factor to get the corresponding compressed model. The experimental results show that the average accuracy (mAP) of the algorithm can reach 80.4% on PASCAL VOC and the speed is 59.5 FPS on GTX1080ti. After pruning, the compressed model can reach 79.9% mAP, and 79.5 FPS in detection speed. On MS COCO, the best detection accuracy (APs) is 12.1%, and the overall detection accuracy is 49.8% AP when IoU is 0.5. The algorithm can not only improve the detection accuracy of small objects, but also greatly improves the detection speed, which reaches a balance between speed and accuracy.	翻訳日:2021-08-25 20:28:39 公開日:2021-08-24
# (参考訳) 組込みシステムにおける実時間単眼人間深度推定とセグメント化 Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems ( http://arxiv.org/abs/2108.10506v1 ) ライセンス: CC BY 4.0	Shan An, Fangru Zhou, Mei Yang, Haogang Zhu, Changhong Fu, and Konstantinos A. Tsintotas	(参考訳) 移動歩行者に対する衝突回避のためにシーンの深さを推定することはロボット分野において重要かつ根本的な問題である。本稿では,室内環境における人体深度推定とセグメンテーションの迅速かつ高精度なネットワークアーキテクチャを提案し,単眼カメラを主認識モジュールとした資源制約型プラットフォーム(バッテリ駆動空中・マイクロ空・地上車両を含む)への適用を目指している。エンコーダ・デコーダ構造に従って,提案手法は深さ予測と意味セグメンテーションの2つの分岐からなる。さらに,ネットワーク構造最適化を用いて前方推定速度を改善する。 3つの自己生成データセットに対する試験的な実験は、パイプラインがリアルタイムに実行可能であることを証明し、同等の精度を維持しながら、現代の最先端フレームワーク(TensorRTを備えたNVIDIA Jetson Nano GPUで毎秒114.6フレーム)よりも高いフレームレートを達成する。 Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field. This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments, aiming to applications for resource-constrained platforms (including battery-powered aerial, micro-aerial, and ground vehicles) with a monocular camera being the primary perception module. Following the encoder-decoder structure, the proposed framework consists of two branches, one for depth prediction and another for semantic segmentation. Moreover, network structure optimization is employed to improve its forward inference speed. Exhaustive experiments on three self-generated datasets prove our pipeline's capability to execute in real-time, achieving higher frame rates than contemporary state-of-the-art frameworks (114.6 frames per second on an NVIDIA Jetson Nano GPU with TensorRT) while maintaining comparable accuracy.	翻訳日:2021-08-25 20:17:17 公開日:2021-08-24
# (参考訳) ARShoe:スマートフォンのリアルタイム拡張現実シューオンシステム ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones ( http://arxiv.org/abs/2108.10515v1 ) ライセンス: CC BY 4.0	Shan An, Guangfu Che, Jinghao Guo, Haogang Zhu, Junjie Ye, Fangru Zhou, Zhaoqi Zhu, Dong Wei, Aishan Liu, Wei Zhang	(参考訳) 仮想トライオン技術により、ユーザーは拡張現実を使ってさまざまなファッションアイテムを試すことができ、便利なオンラインショッピング体験を提供する。しかし、以前の作品の多くは衣服の仮想試着に焦点を合わせ、靴の試着を無視している。そこで本研究では,スマートフォン用のリアルタイム拡張現実バーチャル靴試着システム,ARShoeを提案する。具体的には、ポーズ推定とセグメンテーションを同時に実現するために、新しいマルチブランチネットワークを採用する。試着中にリアルな3Dシューズモデル閉塞を発生させるソリューションを提示する。円滑で安定な試行効果を達成するため,本研究は新たな安定化法をさらに発展させる。さらに, トレーニングと評価のために, 仮想シューズ試用タスク関連ラベルをアノテーションで付加した, 初の大規模フットベンチマークを構築した。新たに構築したベンチマーク実験では,ARShoeの満足度が実証された。スマートフォンの実用化試験では,提案手法のリアルタイム性能と安定化が検証された。 Virtual try-on technology enables users to try various fashion items using augmented reality and provides a convenient online shopping experience. However, most previous works focus on the virtual try-on for clothes while neglecting that for shoes, which is also a promising task. To this concern, this work proposes a real-time augmented reality virtual shoe try-on system for smartphones, namely ARShoe. Specifically, ARShoe adopts a novel multi-branch network to realize pose estimation and segmentation simultaneously. A solution to generate realistic 3D shoe model occlusion during the try-on process is presented. To achieve a smooth and stable try-on effect, this work further develop a novel stabilization method. Moreover, for training and evaluation, we construct the very first large-scale foot benchmark with multiple virtual shoe try-on task-related labels annotated. Exhaustive experiments on our newly constructed benchmark demonstrate the satisfying performance of ARShoe. Practical tests on common smartphones validate the real-time performance and stabilization of the proposed approach.	翻訳日:2021-08-25 20:01:28 公開日:2021-08-24
# (参考訳) ラベル割り当て蒸留による物体検出の改善 Improving Object Detection by Label Assignment Distillation ( http://arxiv.org/abs/2108.10520v1 ) ライセンス: CC BY 4.0	Chuong H. Nguyen, Thuy C. Nguyen, Tuan N. Tang, Nam L.H. Phan	(参考訳) オブジェクト検出におけるラベル割り当ては、画像内のサンプルされた領域に前景または背景のターゲットを割り当てることを目的としている。画像分類のラベル付けとは異なり、この問題はオブジェクトの境界ボックスのために適切に定義されていない。本稿では,蒸留の観点から問題を考察し,ラベル割り当て蒸留(LAD)と呼ぶ。最初のモチベーションは非常に単純で、教師ネットワークを使って生徒のラベルを生成します。これは、教師の予測を直接の目標(ソフトラベル)として使うか、または教師が動的に割り当てるハードラベル(LAD)を通して達成できる。実験の結果, (i)LADはソフトラベルよりも有効であるが, 相補的であることがわかった。 (ii)ladを使用すると、より小さな教師はより大きな生徒を著しく改善できるが、ソフトラベルはできない。次に,2つのネットワークがスクラッチから同時に学習し,教師と学生の役割を動的に交換するコラーニングLADを紹介する。 PAA-ResNet50を教師として使うことで、PAA-ResNet101とPAA-ResNeXt101の検出器を、COCOテストデブセットで46ドル、47.5ドルに改善できます。強力な教師であるPAA-SwinBでは、PAA-ResNet50を1倍のスケジュールトレーニングで43.9ドル、PAA-ResNet101を47.9ドルに改善し、現在の手法を大きく上回っている。ソースコードとチェックポイントはhttps://github.com/cybercore-co-ltd/colad_paperで公開します。 Label assignment in object detection aims to assign targets, foreground or background, to sampled regions in an image. Unlike labeling for image classification, this problem is not well defined due to the object's bounding box. In this paper, we investigate the problem from a perspective of distillation, hence we call Label Assignment Distillation (LAD). Our initial motivation is very simple, we use a teacher network to generate labels for the student. This can be achieved in two ways: either using the teacher's prediction as the direct targets (soft label), or through the hard labels dynamically assigned by the teacher (LAD). Our experiments reveal that: (i) LAD is more effective than soft-label, but they are complementary. (ii) Using LAD, a smaller teacher can also improve a larger student significantly, while soft-label can't. We then introduce Co-learning LAD, in which two networks simultaneously learn from scratch and the role of teacher and student are dynamically interchanged. Using PAA-ResNet50 as a teacher, our LAD techniques can improve detectors PAA-ResNet101 and PAA-ResNeXt101 to $46 \rm AP$ and $47.5\rm AP$ on the COCO test-dev set. With a strong teacher PAA-SwinB, we improve the PAA-ResNet50 to $43.9\rm AP$ with only \1x schedule training, and PAA-ResNet101 to $47.9\rm AP$, significantly surpassing the current methods. Our source code and checkpoints will be released at https://github.com/cybercore-co-ltd/CoLAD_paper.	翻訳日:2021-08-25 19:47:11 公開日:2021-08-24
# (参考訳) 複数物体追跡と軌道予測のための共同学習アーキテクチャ Joint Learning Architecture for Multiple Object Tracking and Trajectory Forecasting ( http://arxiv.org/abs/2108.10543v1 ) ライセンス: CC BY 4.0	Oluwafunmilola Kesa, Olly Styles, Victor Sanchez	(参考訳) 本稿では,複数物体追跡(MOT)と軌跡予測のための共同学習アーキテクチャ(JLA)を提案する。動き予測は、境界ボックスの形で予測を洗練させる技術MOT法のいくつかの状態において広く用いられている。通常、カルマンフィルタは、トラッカーが現在のフレーム内のオブジェクトの位置を正確に予測するのに役立つ短期的な推定を提供する。しかし、カルマンフィルタに基づくアプローチは非線形軌跡を予測できない。追跡軌道予測モデルと予測軌道予測モデルの共同学習を行い,カルマンフィルタのような線形運動予測手法に代えて,短期運動推定のための予測軌道予測法を提案する。我々はMOTChallengeベンチマークでJLAを評価した。評価の結果、JLAは短期動作予測に優れており、FairMOTと比較して、MOT16、MOT17、MOT20データセットのIDスイッチを33%、31%、および47%削減している。 This paper introduces a joint learning architecture (JLA) for multiple object tracking (MOT) and trajectory forecasting in which the goal is to predict objects' current and future trajectories simultaneously. Motion prediction is widely used in several state of the art MOT methods to refine predictions in the form of bounding boxes. Typically, a Kalman Filter provides short-term estimations to help trackers correctly predict objects' locations in the current frame. However, the Kalman Filter-based approaches cannot predict non-linear trajectories. We propose to jointly train a tracking and trajectory forecasting model and use the predicted trajectory forecasts for short-term motion estimates in lieu of linear motion prediction methods such as the Kalman filter. We evaluate our JLA on the MOTChallenge benchmark. Evaluations result show that JLA performs better for short-term motion prediction and reduces ID switches by 33%, 31%, and 47% in the MOT16, MOT17, and MOT20 datasets, respectively, in comparison to FairMOT.	翻訳日:2021-08-25 19:28:20 公開日:2021-08-24
# (参考訳) 凍結組織からの歯肉質の組織学的診断を容易にする再生的逆行性アプローチ A generative adversarial approach to facilitate archival-quality histopathologic diagnoses from frozen tissue sections ( http://arxiv.org/abs/2108.10550v1 ) ライセンス: CC BY 4.0	Kianoush Falahkheirkhah, Tao Guo, Michael Hwang, Pheroze Tamboli, Christopher G Wood, Jose A Karam, Kanishka Sircar, and Rohit Bhargava	(参考訳) 病理組織学を含む臨床診断および研究において、ホルマリン固定パラフィン(FFPE)組織は、その超画質にほぼ普遍的に好まれる。しかし、組織処理時間(24時間以上)は意思決定を遅らせる可能性がある。対照的に、フレッシュフリーズ(ff)処理(1時間未満)は迅速な情報が得られるが、クリアリングの欠如、形態的変形、頻繁なアーティファクトにより診断精度は最適ではない。ここでは、人工知能を使ってこのギャップを埋める。患者40名から分離した98対の腎サンプルから生成逆数ネットワーク(GAN)を用いて,FFPE様画像,仮想FFPEをFFPEから合成した。 5人の病理医が盲検検査の結果を評価した。仮想FFPEデータの画質は高く評価され、実際のFFPE画像とよく似ていることが示された。仮想ffpe画像における疾患の臨床的評価は, ff画像と比較して, 観察者間一致が高かった。ほぼ瞬時に生成された仮想FFPE画像は、情報への時間を短縮するだけでなく、余分なコストと労力なしで通常のFFPE画像からより正確な診断を容易にすることができる。 In clinical diagnostics and research involving histopathology, formalin fixed paraffin embedded (FFPE) tissue is almost universally favored for its superb image quality. However, tissue processing time (more than 24 hours) can slow decision-making. In contrast, fresh frozen (FF) processing (less than 1 hour) can yield rapid information but diagnostic accuracy is suboptimal due to lack of clearing, morphologic deformation and more frequent artifacts. Here, we bridge this gap using artificial intelligence. We synthesize FFPE-like images ,virtual FFPE, from FF images using a generative adversarial network (GAN) from 98 paired kidney samples derived from 40 patients. Five board-certified pathologists evaluated the results in a blinded test. Image quality of the virtual FFPE data was assessed to be high and showed a close resemblance to real FFPE images. Clinical assessments of disease on the virtual FFPE images showed a higher inter-observer agreement compared to FF images. The nearly instantaneously generated virtual FFPE images can not only reduce time to information but can facilitate more precise diagnosis from routine FF images without extraneous costs and effort.	翻訳日:2021-08-25 19:13:22 公開日:2021-08-24
# (参考訳) イベントカメラからの高密度光流れ Dense Optical Flow from Event Cameras ( http://arxiv.org/abs/2108.10552v1 ) ライセンス: CC BY-SA 4.0	Mathias Gehrig and Mario Millh\"ausler and Daniel Gehrig and Davide Scaramuzza	(参考訳) イベントカメラからの高密度光フロー推定に特徴相関と逐次処理を導入することを提案する。現代のフレームベース光フロー法は特徴相関から計算したマッチングコストに大きく依存している。対照的に、マッチングコストを明示的に計算するイベントカメラの光学フロー法は存在しない。代わりに、イベントを用いた学習ベースのアプローチは、通常はU-Netアーキテクチャを利用して光学フローをわずかに見積もる。我々の重要な発見は、相関関数の導入は、畳み込み層のみに依存する従来の方法と比較して、結果を著しく改善するということです。提案手法は,最先端技術と比較して高密度光流を計算し,終点誤差をMVSECで23%削減する。また,イベントカメラ用にこれまでに開発された光学フロー法はすべて,最大流量10ピクセルの非常に小さな変位場を持つデータセット上で評価されている。この観測に基づいて,最大210ピクセルの変位場と3倍の解像度のカメラ分解能を示す,新しい実世界のデータセットを導入する。提案手法は,このデータセットの終端点誤差を66%低減する。 We propose to incorporate feature correlation and sequential processing into dense optical flow estimation from event cameras. Modern frame-based optical flow methods heavily rely on matching costs computed from feature correlation. In contrast, there exists no optical flow method for event cameras that explicitly computes matching costs. Instead, learning-based approaches using events usually resort to the U-Net architecture to estimate optical flow sparsely. Our key finding is that the introduction of correlation features significantly improves results compared to previous methods that solely rely on convolution layers. Compared to the state-of-the-art, our proposed approach computes dense optical flow and reduces the end-point error by 23% on MVSEC. Furthermore, we show that all existing optical flow methods developed so far for event cameras have been evaluated on datasets with very small displacement fields with a maximum flow magnitude of 10 pixels. Based on this observation, we introduce a new real-world dataset that exhibits displacement fields with magnitudes up to 210 pixels and 3 times higher camera resolution. Our proposed approach reduces the end-point error on this dataset by 66%.	翻訳日:2021-08-25 19:02:27 公開日:2021-08-24
# (参考訳) 野獣のタミング:ニューラルな会話モデルを制御する学習 Taming the Beast: Learning to Control Neural Conversational Models ( http://arxiv.org/abs/2108.10561v1 ) ライセンス: CC BY 4.0	Andrea Madotto	(参考訳) 本論文は,タスク指向とチャットの両シナリオにおいて,深層学習に基づく,エンドツーエンドで生成的な対話システムの制御可能性について考察する。特に,スタイルや話題の制御や対話スキルの継続的な付加・結合など,生成対話システム制御のさまざまな側面について検討する。最初の対話システムが商用化されてから30年が経ち、これらのシステムの基本的なアーキテクチャは、自然言語理解(NLU)、対話状態追跡(DST)、対話マネージャ(DM)、自然言語生成(NLG)という4つのパイプライン化された基本コンポーネントで、ほとんど変わっていない。モジュール化システムの重要なコンポーネントである対話マネージャは、応答内容とスタイルを制御する。このモジュールは通常規則でプログラムされ、高度に制御可能で容易に拡張できるように設計されている。強力な「深層学習」アーキテクチャの出現に伴い、システム全体の性能を最適化し、訓練を簡素化するエンドツーエンド生成対話システムが提案されている。しかし、これらのシステムはモジュール化された対話マネージャができる限り容易に制御・拡張できない。これは、通常、大きな事前学習された言語モデル(gpt-2など)である単一のニューラルネットワークが使用されているため、望ましい属性(スタイル、トピックなど)を外科的に変更することは困難である。さらに重要なことに、制御不能な対話システムは攻撃的、さらには有害な反応を引き起こす可能性がある。そこで本論文では,タスク指向およびチャットシナリオにおけるエンドツーエンド生成対話システムの制御可能な手法について検討する。 1)chit-chatモデルのスタイルと話題の制御方法,2)タスク指向対話システムの継続的な制御と拡張方法,3)マルチスキル対話モデルの構成と制御方法について述べる。 This thesis investigates the controllability of deep learning-based, end-to-end, generative dialogue systems in both task-oriented and chit-chat scenarios. In particular, we study the different aspects of controlling generative dialogue systems, including controlling styles and topics and continuously adding and combining dialogue skills. In the three decades since the first dialogue system was commercialized, the basic architecture of such systems has remained substantially unchanged, consisting of four pipelined basic components, namely, natural language understanding (NLU), dialogue state tracking (DST), a dialogue manager (DM) and natural language generation (NLG). The dialogue manager, which is the critical component of the modularized system, controls the response content and style. This module is usually programmed by rules and is designed to be highly controllable and easily extendable. With the emergence of powerful "deep learning" architectures, end-to-end generative dialogue systems have been proposed to optimize overall system performance and simplify training. However, these systems cannot be easily controlled and extended as the modularized dialogue manager can. This is because a single neural system is used, which is usually a large pre-trained language model (e.g., GPT-2), and thus it is hard to surgically change desirable attributes (e.g., style, topics, etc.). More importantly, uncontrollable dialogue systems can generate offensive and even toxic responses. Therefore, in this thesis, we study controllable methods for end-to-end generative dialogue systems in task-oriented and chit-chat scenarios. Throughout the chapters, we describe 1) how to control the style and topics of chit-chat models, 2) how to continuously control and extend task-oriented dialogue systems, and 3) how to compose and control multi-skill dialogue models.	翻訳日:2021-08-25 18:50:05 公開日:2021-08-24
# (参考訳) 残差学習に基づくデュアルオートエンコーダモデルを用いた医用画像圧縮 Lossy Medical Image Compression using Residual Learning-based Dual Autoencoder Model ( http://arxiv.org/abs/2108.10579v1 ) ライセンス: CC BY 4.0	Dipti Mishra, Satish Kumar Singh, Rajat Kumar Singh	(参考訳) 本研究では,マラリアrbc細胞画像パッチを圧縮するための2段階オートエンコーダベースの圧縮機・デコンプレッサーフレームワークを提案する。病気の診断に使用される医療画像は、数十ギガバイトほどの大きさで、非常に巨大です。提案した残差ベースデュアルオートエンコーダネットワークは,デコンプレッサモジュールを通じて元のイメージを再構成するユニークな特徴を抽出するために訓練される。 2つの潜在空間表現(第1は原画像、第2は残留画像)は、最終原画像の再構築に使用される。色-SSIMは、減圧後の細胞画像のクロミナンス部の品質チェックにのみ使用されている。実験の結果,提案手法は,PSNR,Color SSIM,MS-SSIMにおいて,医用画像の他のニューラルネットワーク圧縮技術よりも約35%,10%,5%優れていた。このアルゴリズムは、JPEG-LS、JP2K-LM、CALIC、最近のニューラルネットワークアプローチよりも76%、78%、75%、および74%のビット保存を大幅に改善し、圧縮圧縮技術として優れている。 In this work, we propose a two-stage autoencoder based compressor-decompressor framework for compressing malaria RBC cell image patches. We know that the medical images used for disease diagnosis are around multiple gigabytes size, which is quite huge. The proposed residual-based dual autoencoder network is trained to extract the unique features which are then used to reconstruct the original image through the decompressor module. The two latent space representations (first for the original image and second for the residual image) are used to rebuild the final original image. Color-SSIM has been exclusively used to check the quality of the chrominance part of the cell images after decompression. The empirical results indicate that the proposed work outperformed other neural network related compression technique for medical images by approximately 35%, 10% and 5% in PSNR, Color SSIM and MS-SSIM respectively. The algorithm exhibits a significant improvement in bit savings of 76%, 78%, 75% & 74% over JPEG-LS, JP2K-LM, CALIC and recent neural network approach respectively, making it a good compression-decompression technique.	翻訳日:2021-08-25 18:48:47 公開日:2021-08-24
# (参考訳) ポーランド国境警備隊における刑事文書の検出 Detection of Criminal Texts for the Polish State Border Guard ( http://arxiv.org/abs/2108.10580v1 ) ライセンス: CC BY 4.0	Artur Nowakowski, Krzysztof Jassem	(参考訳) 本稿では,インターネット上に現れるポーランドの犯罪テキストの検出について述べる。非平衡・雑音データの効率的な分類のための最善の設定を探索する実験を行った。ポーランド語をベースとしたトランスフォーマー言語モデルを用いて,我々のモデルを微調整した結果,最高の性能が得られた。検出タスクでは,注釈付きインターネットスニペットの大規模なコーパスをトレーニングデータとして収集した。このデータセットを共有し、Goitoプラットフォームをベンチマークとして、犯罪テキストを検出するための新しいタスクを作成します。 This paper describes research on the detection of Polish criminal texts appearing on the Internet. We carried out experiments to find the best available setup for the efficient classification of unbalanced and noisy data. The best performance was achieved when our model was fine-tuned on a pre-trained Polish-based transformer language model. For the detection task, a large corpus of annotated Internet snippets was collected as training data. We share this dataset and create a new task for the detection of criminal texts using the Gonito platform as the benchmark.	翻訳日:2021-08-25 18:40:14 公開日:2021-08-24
# (参考訳) コンピュータ支援整形外科手術における咬合・ロバスト視覚マーカーレス骨追跡 Occlusion-robust Visual Markerless Bone Tracking for Computer-Assisted Orthopaedic Surgery ( http://arxiv.org/abs/2108.10608v1 ) ライセンス: CC BY 4.0	Xue Hu, Anh Nguyen, Ferdinando Rodriguez y Baena	(参考訳) 従来のコンピュータ支援整形外科ナビゲーションシステムは、患者のポーズのための専用の光学マーカーの追跡に依存しているため、手術のワークフローはより侵襲的で退屈で高価である。視覚的追跡は, マーカーレス, 無努力で標的解剖を測定するために最近提案されているが, 術中介入による実世界の閉塞下では失敗する。さらに、そのような手法はハードウェア固有のものであり、外科的応用には十分ではない。本稿では,咬合に対して頑健なrgb-dセンシングに基づくマーカーレストラッキング手法を提案する。我々は、動的領域の予測とロバストな3Dポイントクラウドセグメンテーションを特徴とする新しいセグメンテーションネットワークを設計する。また,オクルージョン・インスタンスを用いた大規模トレーニングデータ収集にはコストがかかるため,ネットワークトレーニングのための合成RGB-D画像の作成方法も提案する。実験結果から,提案手法は近年の最先端手法よりも,特に閉塞が存在する場合において高い性能を示すことが示された。さらに,本手法は,ネットワーク再トレーニングを必要とせず,キャダバを含む新しいカメラや新たなターゲットモデルによく応用できる。提案手法は,高品質な商用RGB-Dカメラを用いて,モデル膝における1-2デグレスと2-4mmの精度を実現し,臨床応用の基準を満たしている。 Conventional computer-assisted orthopaedic navigation systems rely on the tracking of dedicated optical markers for patient poses, which makes the surgical workflow more invasive, tedious, and expensive. Visual tracking has recently been proposed to measure the target anatomy in a markerless and effortless way, but the existing methods fail under real-world occlusion caused by intraoperative interventions. Furthermore, such methods are hardware-specific and not accurate enough for surgical applications. In this paper, we propose a RGB-D sensing-based markerless tracking method that is robust against occlusion. We design a new segmentation network that features dynamic region-of-interest prediction and robust 3D point cloud segmentation. As it is expensive to collect large-scale training data with occlusion instances, we also propose a new method to create synthetic RGB-D images for network training. Experimental results show that our proposed markerless tracking method outperforms recent state-of-the-art approaches by a large margin, especially when an occlusion exists. Furthermore, our method generalises well to new cameras and new target models, including a cadaver, without the need for network retraining. In practice, by using a high-quality commercial RGB-D camera, our proposed visual tracking method achieves an accuracy of 1-2 degress and 2-4 mm on a model knee, which meets the standard for clinical applications.	翻訳日:2021-08-25 18:31:43 公開日:2021-08-24
# (参考訳) ProtoMIL: ファイングレード・インタプリタビリティのためのプロトタイプ部分を用いた複数インスタンス学習 ProtoMIL: Multiple Instance Learning with Prototypical Parts for Fine-Grained Interpretability ( http://arxiv.org/abs/2108.10612v1 ) ライセンス: CC BY 4.0	Dawid Rymarczyk and Aneta Kaczy\'nska and Jaros{\l}aw Kraus and Adam Pardyl and Bartosz Zieli\'nski	(参考訳) マルチインスタンス学習(mil:multiple instance learning)は、多くの現実の機械学習アプリケーションで人気を集めている。しかしながら、ミルを説明するための対応する取り組みは遅れており、通常は特定の予測に不可欠なバッグのインスタンスを提示することに限られる。本稿では,視覚プロトタイプで動作するケースベース推論プロセスに触発された自己説明可能なMIL手法であるProtoMILを導入することにより,このギャップを埋める。 ProtoMILは、オブジェクト記述に原型的特徴を組み込むことにより、モデル精度と細粒度解釈可能性に前例のない結合を行い、5つのMILデータセットで実験を行った。 Multiple Instance Learning (MIL) gains popularity in many real-life machine learning applications due to its weakly supervised nature. However, the corresponding effort on explaining MIL lags behind, and it is usually limited to presenting instances of a bag that are crucial for a particular prediction. In this paper, we fill this gap by introducing ProtoMIL, a novel self-explainable MIL method inspired by the case-based reasoning process that operates on visual prototypes. Thanks to incorporating prototypical features into objects description, ProtoMIL unprecedentedly joins the model accuracy and fine-grained interpretability, which we present with the experiments on five recognized MIL datasets.	翻訳日:2021-08-25 18:12:05 公開日:2021-08-24
# (参考訳) 不均一Telcoセルデータの外部位置復元 Outdoor Position Recovery from HeterogeneousTelco Cellular Data ( http://arxiv.org/abs/2108.10613v1 ) ライセンス: CC BY 4.0	Yige Zhang, Weixiong Rao, Kun Zhang and Lei Chen	(参考訳) 近年、通信(テルコ)セルラーネットワークによって生成された前例のない量のデータを目撃している。例えば、モバイルデバイスと通信ネットワーク間の接続状態(例えば受信信号強度)を報告するために計測記録(mrs)が生成される。 MRデータは、人間の移動分析、都市計画、交通予測のための屋外モバイルデバイスのローカライズに広く利用されている。隠れマルコフモデル(hmm)のような一階系列モデルを用いた既存の仕事は、低ローカライズエラーの基盤となるモビリティパターンにおける時空間的局所性を捉えようとする。 HMMアプローチは通常、基盤となるモバイルデバイスの安定したモビリティパターンを前提としている。しかし、実際のMRデータセットは、基礎となるモバイルデバイスの混合輸送モードとMRサンプルに関連する位置の不均一な分布により、異種移動パターンを示す。したがって、既存のソリューションはこれらの不均質なモビリティパターンを処理できない。本研究では,マルチタスク学習に基づく深層ニューラルネットワーク(DNN)フレームワークであるPRNet+を提案する。フレームワークの動作を確認するため、PRNet+は特徴抽出モジュールを開発し、異種MRサンプルから局所的、短期的、長期的時空間的局所性を正確に学習する。上海の3つの代表的な地域で収集された8つのデータセットの大規模な評価は、PRNet+が最先端のデータを著しく上回ることを示している。 Recent years have witnessed unprecedented amounts of data generated by telecommunication (Telco) cellular networks. For example, measurement records (MRs) are generated to report the connection states between mobile devices and Telco networks, e.g., received signal strength. MR data have been widely used to localize outdoor mobile devices for human mobility analysis, urban planning, and traffic forecasting. Existing works using first-order sequence models such as the Hidden Markov Model (HMM) attempt to capture spatio-temporal locality in underlying mobility patterns for lower localization errors. The HMM approaches typically assume stable mobility patterns of the underlying mobile devices. Yet real MR datasets exhibit heterogeneous mobility patterns due to mixed transportation modes of the underlying mobile devices and uneven distribution of the positions associated with MR samples. Thus, the existing solutions cannot handle these heterogeneous mobility patterns. we propose a multi-task learning-based deep neural network (DNN) framework, namely PRNet+, to incorporate outdoor position recovery and transportation mode detection. To make sure the framework work, PRNet+ develops a feature extraction module to precisely learn local-, short- and long-term spatio-temporal locality from heterogeneous MR samples. Extensive evaluation on eight datasets collected at three representative areas in Shanghai indicates that PRNet+ greatly outperforms state-of-the-arts.	翻訳日:2021-08-25 17:56:21 公開日:2021-08-24
# (参考訳) 画像なし単一画素セグメンテーション Image-free single-pixel segmentation ( http://arxiv.org/abs/2108.10617v1 ) ライセンス: CC BY 4.0	Haiyan Liu, Liheng Bian, Jun Zhang	(参考訳) 既存のセグメンテーション技術は、セグメンテーションを実行するために入力として高忠実度画像を必要とする。セグメンテーションの結果は、取得した画像よりもはるかに少ないエッジ情報の大部分を含んでいるため、スループットギャップはハードウェアとソフトウェアの両方の無駄につながる。本稿では,画像のない単一画素セグメンテーション手法について報告する。この技術は、構造化照明と単画素検出を組み合わせて、シーンのセグメンテーション情報を効率よくサンプリングし、圧縮された1次元計測に多重化する。照明パターンは、後続のレコンストラクションニューラルネットワークと共に最適化され、シングルピクセルの測定からセグメンテーションマップを直接推定する。エンドツーエンドのエンコーディング・アンド・デコーディング学習フレームワークは、対応するネットワークで最適化された照明を可能にし、高い獲得効率とセグメンテーション効率を提供する。シミュレーションと実験の結果から、正確なセグメンテーションが2次元の少ない入力データで達成できることが確認された。サンプリング比1%の場合、ディス係数は80%以上、画素精度は96%以上となる。我々は,この画像のないセグメンテーション技術が,UAVや無人航空機など,リアルタイムセンシングを必要とする様々な資源制限されたプラットフォームに広く応用できると考えている。 The existing segmentation techniques require high-fidelity images as input to perform semantic segmentation. Since the segmentation results contain most of edge information that is much less than the acquired images, the throughput gap leads to both hardware and software waste. In this letter, we report an image-free single-pixel segmentation technique. The technique combines structured illumination and single-pixel detection together, to efficiently samples and multiplexes scene's segmentation information into compressed one-dimensional measurements. The illumination patterns are optimized together with the subsequent reconstruction neural network, which directly infers segmentation maps from the single-pixel measurements. The end-to-end encoding-and-decoding learning framework enables optimized illumination with corresponding network, which provides both high acquisition and segmentation efficiency. Both simulation and experimental results validate that accurate segmentation can be achieved using two-order-of-magnitude less input data. When the sampling ratio is 1%, the Dice coefficient reaches above 80% and the pixel accuracy reaches above 96%. We envision that this image-free segmentation technique can be widely applied in various resource-limited platforms such as UAV and unmanned vehicle that require real-time sensing.	翻訳日:2021-08-25 17:23:31 公開日:2021-08-24
# (参考訳) adversarial bertを用いた弱い教師付きクロスプラットフォームティーンエージャー検出 Weakly Supervised Cross-platform Teenager Detection with Adversarial BERT ( http://arxiv.org/abs/2108.10619v1 ) ライセンス: CC BY 4.0	Peiling Yi and Arkaitz Zubiaga	(参考訳) ティーンエイジャー検出は、ソーシャルメディアにおける年齢検出タスクの重要な事例であり、十代のユーザーをネガティブな影響から保護することを目的としている。ティーンエイジャー検出タスクはラベル付きデータの不足に苦しめられ、ソーシャルメディアプラットフォーム間でうまく機能する能力が悪化する。プラットフォーム上でラベル付きデータが利用できない環境でのティーンエイジャー検出のさらなる研究のために,Adversarial BERTに基づく新しいクロスプラットフォームフレームワークを提案する。私たちのフレームワークは、ソースプラットフォームから限られた量のラベル付きインスタンスで動作でき、ターゲットプラットフォームからラベル付きデータがなく、ソースからターゲットのソーシャルメディアに知識を転送できます。我々は4つの公開データセットを実験し、クロスプラットフォームのティーンエイジャー検出タスクにおいて、我々のフレームワークが競合するベースラインモデルを大幅に改善できることを示す結果を得た。 Teenager detection is an important case of the age detection task in social media, which aims to detect teenage users to protect them from negative influences. The teenager detection task suffers from the scarcity of labelled data, which exacerbates the ability to perform well across social media platforms. To further research in teenager detection in settings where no labelled data is available for a platform, we propose a novel cross-platform framework based on Adversarial BERT. Our framework can operate with a limited amount of labelled instances from the source platform and with no labelled data from the target platform, transferring knowledge from the source to the target social media. We experiment on four publicly available datasets, obtaining results demonstrating that our framework can significantly improve over competitive baseline models on the cross-platform teenager detection task.	翻訳日:2021-08-25 17:16:03 公開日:2021-08-24
# (参考訳) モデル埋め込み距離を用いたディープニューラルネットワークの分布外例検出 Out-of-Distribution Example Detection in Deep Neural Networks using Distance to Modelled Embedding ( http://arxiv.org/abs/2108.10673v1 ) ライセンス: CC BY 4.0	Rickard Sj\"ogren and Johan Trygg	(参考訳) 安全クリティカルなシステムにおけるディープラーニングの採用は、モデルがデプロイされた後、ディープニューラルネットワークが理解できないことを理解する必要性を高める。ディープニューラルネットワークの振る舞いは、いわゆるアウト・オブ・ディストリビューションの例では定義されていない。つまり、トレーニングセット以外のディストリビューションからの例です。予測時間中に分布外サンプルを検出する手法がいくつか提案されているが、これらの手法はニューラルネットワークアーキテクチャ、ニューラルネットワークのトレーニング方法、パフォーマンス上のオーバーヘッド、あるいは分布外サンプルの性質が事前に分かっていると仮定するのいずれかを制約している。予測時間における分布外例の検出に使用するDIME(Distance to Modelled Embedding)を提案する。線形超平面として特徴空間に埋め込まれたトレーニングセットを近似することにより、単純で教師なし、高性能で計算効率の良い手法を導出する。 DIMEにより、アーキテクチャやトレーニングを変更することなく、ニューラルネットワークモデルに配布外サンプルの予測時間検出を追加できます。実験では,DIMEをアドオンとして使用することにより,予測中の分布外例を効率よく検出し,より汎用性が高く,計算オーバーヘッドも無視できることを示した。 Adoption of deep learning in safety-critical systems raise the need for understanding what deep neural networks do not understand after models have been deployed. The behaviour of deep neural networks is undefined for so called out-of-distribution examples. That is, examples from another distribution than the training set. Several methodologies to detect out-of-distribution examples during prediction-time have been proposed, but these methodologies constrain either neural network architecture, how the neural network is trained, suffer from performance overhead, or assume that the nature of out-of-distribution examples are known a priori. We present Distance to Modelled Embedding (DIME) that we use to detect out-of-distribution examples during prediction time. By approximating the training set embedding into feature space as a linear hyperplane, we derive a simple, unsupervised, highly performant and computationally efficient method. DIME allows us to add prediction-time detection of out-of-distribution examples to neural network models without altering architecture or training while imposing minimal constraints on when it is applicable. In our experiments, we demonstrate that by using DIME as an add-on after training, we efficiently detect out-of-distribution examples during prediction and match state-of-the-art methods while being more versatile and introducing negligible computational overhead.	翻訳日:2021-08-25 17:06:24 公開日:2021-08-24
# (参考訳) MCUa:乳がん組織像分類のためのマルチレベルコンテキストとダイナミックディープアンサンブル MCUa: Multi-level Context and Uncertainty aware Dynamic Deep Ensemble for Breast Cancer Histology Image Classification ( http://arxiv.org/abs/2108.10709v1 ) ライセンス: CC BY 4.0	Zakaria Senousy, Mohammed M. Abdelsamea, Mohamed Medhat Gaber, Moloud Abdar, U Rajendra Acharya, Abbas Khosravi, and Saeid Nahavandi	(参考訳) 乳腺組織像の分類は乳がんの早期診断において重要なステップである。乳腺病理診断では,CNN (Convolutional Neural Networks) がDigitalized histology slidesを用いて大きな成功を収めた。しかし, 大規模デジタル化標本の高視認性と文脈情報の欠如により, 組織分類は依然として困難である。本稿では,マルチレベルコンテキストと不確実性認識(mcua)動的ディープラーニングアンサンブルモデルと呼ばれる新しいcnnを提案する。mcuaモデルは複数のマルチレベルコンテキスト認識モデルからなり,画像パッチ間の空間依存性を階層的に学習する。 MCUamodelhasは、不確実な定量化成分を用いて、マルチレベルの文脈情報に対する高感度を利用して、新しいダイナミックアンサンブルモデルを実現し、乳がん組織像データセットで98.11%の精度を達成した。実験の結果, 最先端の組織分類モデルと比較して, 提案法の有効性が高かった。 Breast histology image classification is a crucial step in the early diagnosis of breast cancer. In breast pathological diagnosis, Convolutional Neural Networks (CNNs) have demonstrated great success using digitized histology slides. However, tissue classification is still challenging due to the high visual variability of the large-sized digitized samples and the lack of contextual information. In this paper, we propose a novel CNN, called Multi-level Context and Uncertainty aware (MCUa) dynamic deep learning ensemble model.MCUamodel consists of several multi-level context-aware models to learn the spatial dependency between image patches in a layer-wise fashion. It exploits the high sensitivity to the multi-level contextual information using an uncertainty quantification component to accomplish a novel dynamic ensemble model.MCUamodelhas achieved a high accuracy of 98.11% on a breast cancer histology image dataset. Experimental results show the superior effectiveness of the proposed solution compared to the state-of-the-art histology classification models.	翻訳日:2021-08-25 16:47:41 公開日:2021-08-24
# (参考訳) メディアパイプハンドを用いたペン紡ぐ手の動き解析 Pen Spinning Hand Movement Analysis Using MediaPipe Hands ( http://arxiv.org/abs/2108.10716v1 ) ライセンス: CC BY 4.0	Tung-Lin Wu, Taishi Senda	(参考訳) MediaPipe Hands と OpenCV を用いたペン回転時の手の動きに関するデータ取得に挑戦した。本研究の目的は,ペン回転競技の性能を客観的に評価するシステムを構築することである。競争における実行、滑らかさ、制御の評価は非常に困難であり、しばしば主観性を伴う。そこで本稿では,客観的数値を用いて評価を完全自動化することを目的とした。不確かさは依然としてMediaPipeの骨格認識に存在し、鮮やかな色の背景では認識が難しい傾向にある。しかし,プログラムの彩度や輝度を変化させることで,認識精度を向上させることができた。さらに、明るさの自動検出と調整も可能になった。対象数値を用いてペン回転の評価を体系化する次のステップとして,手の動きを採用した。各フレームにおける手の座標の標準偏差とL2ノルムを計算することにより,手の動きの上下を可視化することができた。手の動きの結果は非常に正確で、目標に向かって大きな一歩だと感じています。将来的には、ペン紡績の仕上がりを完全に自動化していきたいと考えています。 We challenged to get data about hand movement in pen spinning using MediaPipe Hands and OpenCV. The purpose is to create a system that can be used to objectively evaluate the performance of pen spinning competitions. Evaluation of execution, smoothness, and control in competitions are quite difficult and often with subjectivity. Therefore, we aimed to fully automate the process by using objective numerical values for evaluation. Uncertainty still exists in MediaPipe's skeletal recognition, and it tends to be more difficult to recognize in brightly colored backgrounds. However, we could improve the recognition accuracy by changing the saturation and brightness in the program. Furthermore, automatic detection and adjustment of brightness is now possible. As the next step to systematize the evaluation of pen spinning using objective numerical values, we adopted "hand movements". We were able to visualize the ups and downs of the hand movements by calculating the standard deviation and L2 norm of the hand's coordinates in each frame. The results of hand movements are quite accurate, and we feel that it is a big step toward our goal. In the future, we would like to make great efforts to fully automate the grading of pen spinning.	翻訳日:2021-08-25 15:58:37 公開日:2021-08-24
# (参考訳) グラフニューラルネットワーク: 手法,応用,機会 Graph Neural Networks: Methods, Applications, and Opportunities ( http://arxiv.org/abs/2108.10733v1 ) ライセンス: CC BY 4.0	Lilapati Waikhom and Ripon Patgiri	(参考訳) 過去10年ほどで、私たちは機械学習分野を再活性化するディープラーニングを見てきた。コンピュータビジョン、音声認識、自然言語処理などの分野における多くの問題を解決し、最先端のパフォーマンスを実現している。データは一般にこれらの領域のユークリッド空間で表される。他の様々な領域は非ユークリッド空間で、グラフは理想的な表現である。グラフは、様々なエンティティ間の依存関係と相互関係を表現するのに適している。伝統的に、グラフのハンドクラフト機能は、この複雑なデータ表現から様々なタスクに必要な推論を提供することができない。近年,データベースタスクのグラフ化に深層学習の様々な進歩が取り入れられている。本稿では、各学習環境におけるグラフニューラルネットワーク(GNN)の総合的な調査:教師なし、教師なし、半教師なし、自己教師付き学習。グラフベースの学習環境の分類学は、与えられた学習環境に落下するメソッドの論理的区分を備える。各学習タスクに対するアプローチは、理論と経験的観点の両方から分析される。さらに、GNN構築のための一般的なアーキテクチャガイドラインを提供する。さまざまなアプリケーションやベンチマークデータセットも提供されており、GNNの一般適用性に疑問が残るオープンな課題もある。 In the last decade or so, we have witnessed deep learning reinvigorating the machine learning field. It has solved many problems in the domains of computer vision, speech recognition, natural language processing, and various other tasks with state-of-the-art performance. The data is generally represented in the Euclidean space in these domains. Various other domains conform to non-Euclidean space, for which graph is an ideal representation. Graphs are suitable for representing the dependencies and interrelationships between various entities. Traditionally, handcrafted features for graphs are incapable of providing the necessary inference for various tasks from this complex data representation. Recently, there is an emergence of employing various advances in deep learning to graph data-based tasks. This article provides a comprehensive survey of graph neural networks (GNNs) in each learning setting: supervised, unsupervised, semi-supervised, and self-supervised learning. Taxonomy of each graph based learning setting is provided with logical divisions of methods falling in the given learning setting. The approaches for each learning task are analyzed from both theoretical as well as empirical standpoints. Further, we provide general architecture guidelines for building GNNs. Various applications and benchmark datasets are also provided, along with open challenges still plaguing the general applicability of GNNs.	翻訳日:2021-08-25 15:52:41 公開日:2021-08-24
# (参考訳) DeepPanoContext: ホロスティックなシーンコンテキストグラフと関係に基づく最適化によるパノラマ3次元シーン理解 DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization ( http://arxiv.org/abs/2108.10743v1 ) ライセンス: CC BY 4.0	Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang	(参考訳) パノラマ画像は、通常の視点画像に比べて、自然にリッチなシーンコンテキスト情報をエンコードする視野がはるかに大きいが、従来のシーン理解手法ではうまく利用されていない。本論文では,パノラマ映像から各物体の3次元空間配置と形状,ポーズ,位置,意味カテゴリーを復元する新しいパノラマ3次元シーン理解手法を提案する。リッチなコンテキスト情報を十分に活用するために,オブジェクトとルームレイアウトの関係を予測するための新しいグラフニューラルネットワークベースのコンテキストモデルと,高度に設計された対象関数をオンザフライで最適化する微分可能な関係ベースの最適化モジュールを設計した。既存のデータが不完全な地上の真実か、過度に単純化されたシーンであることを認識し、部屋のレイアウトや家具配置の多様性に優れた、パノラマ3Dシーン理解のためのリアルな画像品質を備えた新しい合成データセットを提示する。実験により,従来のパノラマシーン理解法よりも,幾何学的精度と物体配置の両面で優れることを示した。コードはhttps://chengzhag.github.io/publication/dpcで入手できる。 Panorama images have a much larger field-of-view thus naturally encode enriched scene context information compared to standard perspective images, which however is not well exploited in the previous scene understanding methods. In this paper, we propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image. In order to fully utilize the rich context information, we design a novel graph neural network based context model to predict the relationship among objects and room layout, and a differentiable relationship-based optimization module to optimize object arrangement with well-designed objective functions on-the-fly. Realizing the existing data are either with incomplete ground truth or overly-simplified scene, we present a new synthetic dataset with good diversity in room layout and furniture placement, and realistic image quality for total panoramic 3D scene understanding. Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement. Code is available at https://chengzhag.github.io/publication/dpc.	翻訳日:2021-08-25 15:51:47 公開日:2021-08-24
# (参考訳) 持続可能な開発目標を達成するための解釈可能なディープラーニングモデル Interpretable deep-learning models to help achieve the Sustainable Development Goals ( http://arxiv.org/abs/2108.10744v1 ) ライセンス: CC BY 4.0	Ricardo Vinuesa, Beril Sirmacek	(参考訳) 我々は、解釈可能な人工知能(AI)モデルに対する私たちの洞察と、それが倫理的AIシステムの開発の文脈においていかに不可欠であるか、そして持続可能な開発目標(SDG)に準拠したデータ駆動ソリューションについて議論する。本稿では,インダクティブバイアスによって得られた記号モデルなどを通じて,ディープラーニング手法から真に解釈可能なモデルを抽出する可能性を強調し,aiの持続可能な発展を保証する。 We discuss our insights into interpretable artificial-intelligence (AI) models, and how they are essential in the context of developing ethical AI systems, as well as data-driven solutions compliant with the Sustainable Development Goals (SDGs). We highlight the potential of extracting truly-interpretable models from deep-learning methods, for instance via symbolic models obtained through inductive biases, to ensure a sustainable development of AI.	翻訳日:2021-08-25 15:50:42 公開日:2021-08-24
# (参考訳) 人工生成メタデータを用いた表からの関係抽出 Relation Extraction from Tables using Artificially Generated Metadata ( http://arxiv.org/abs/2108.10750v1 ) ライセンス: CC BY 4.0	Gaurav singh, Siffi Singh, Joshua Wong, Amir Saffari	(参考訳) テーブルからの関係抽出(RE)は、列のペア間の関係を識別するタスクである。一般的に、このタスクのREモデルはトレーニングのためにラベル付きテーブルを必要とする。幸いなことに、ラベル付きテーブルは知識グラフ(KG)から人工的に生成することもできるため、手作業によるアノテーションよりもはるかにコストが低い。しかし、これらのテーブルは実際のテーブルと比較して1つの欠点があり、コラムヘッドやキャプションといった関連するメタデータが欠けている。これは、合成テーブルがメタデータを格納しないKGから生成されるためである。残念ながら、メタデータはテーブルからのreに対する強力なシグナルを提供することができる。この問題に対処するため,合成表のメタデータを人工的に生成する手法を提案する。次に、人工メタデータを入力として使用するREモデルを実験する。実験の結果,F1スコアの9\%-45\%が絶対的に2つの表付きデータセットで改善されることがわかった。 Relation Extraction (RE) from tables is the task of identifying relations between pairs of columns. Generally, RE models for this task require labelled tables for training. Luckily, labelled tables can also be generated artificially from a Knowledge Graph (KG), which makes the cost to acquire them much lower in comparison to manual annotations. However, these tables have one drawback compared to real tables, which is that they lack associated metadata, such as column-headers, captions, etc. This is because synthetic tables are created out of KGs that do not store such metadata. Unfortunately, metadata can provide strong signals for RE from tables. To address this issue, we propose methods to artificially create some of this metadata for synthetic tables. We then experiment with a RE model that uses artificial metadata as input. Our empirical results show that this leads to an improvement of 9\%-45\% in F1 score, in absolute terms, over 2 tabular datasets.	翻訳日:2021-08-25 15:46:07 公開日:2021-08-24
# (参考訳) 単語を超えて:潜在ディリクレ割当モデルのコロケーショントークン化 More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models ( http://arxiv.org/abs/2108.10755v1 ) ライセンス: CC BY 4.0	Jin Cheevaprawatdomrong, Alexandra Schofield, Attapol T. Rutherford	(参考訳) 伝統的に、LDA (Latent Dirichlet Allocation) は文書の集合の中で単語を取り込み、単語文書の共起を使ってその潜在トピックを発見する。しかし、中国語やタイ語などの単語境界をマークせずに、言語で最高の結果を達成する方法は不明である。本稿では,PearsonのChi-squared test, t-statistics, Word Pair Encoding (WPE)を用いて,LDAモデルの入力としてトークンを生成する。 Chi-squared、t、WPEトークンーはウィキペディアのテキストで訓練され、複合名詞、固有名詞、複合イベント動詞などのグループ化すべき単語を探す。本稿では,モデルの語彙が異なる設定において,クラスタリング品質を測定するための新しい指標を提案する。このメトリックやその他の確立されたメトリクスに基づいて、マージトークンでトレーニングされたトピックは、これらの未マージモデルよりも明確で一貫性があり、トピックの識別に効果的であるトピックキーを生成する。 Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. However, it is unclear how to achieve the best results for languages without marked word boundaries such as Chinese and Thai. Here, we explore the use of Pearson's chi-squared test, t-statistics, and Word Pair Encoding (WPE) to produce tokens as input to the LDA model. The Chi-squared, t, and WPE tokenizers are trained on Wikipedia text to look for words that should be grouped together, such as compound nouns, proper nouns, and complex event verbs. We propose a new metric for measuring the clustering quality in settings where the vocabularies of the models differ. Based on this metric and other established metrics, we show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.	翻訳日:2021-08-25 15:40:28 公開日:2021-08-24
# (参考訳) DU-GAN:低用量CT復調用デュアルドメインU-Netディスクリミネータを用いた生成対向ネットワーク DU-GAN: Generative Adversarial Networks with Dual-Domain U-Net Based Discriminators for Low-Dose CT Denoising ( http://arxiv.org/abs/2108.10772v1 ) ライセンス: CC BY 4.0	Zhizhong Huang, Junping Zhang, Yi Zhang, Hongming Shan	(参考訳) LDCTは、CT関連X線による患者の健康リスクから、医療画像分野で大きな注目を集めている。しかし、放射線線量の減少は再構成画像の品質を低下させ、結果として診断性能を損なう。 LDCT画像の品質向上のために,様々なディープラーニング技術が導入されている。 GANをベースとした denoising 法は、通常、追加の分類網、すなわち、追加の分類網を利用する。識別器は、識別された画像と通常のドーズ画像の最も差別的な違いを学習し、それに従って復調モデルを正規化し、大域的な構造や局所的な詳細に焦点を当てることが多い。本稿では,LDCTデノナイジングモデルを改善するために,GANフレームワークにおけるU-Netに基づく差別化手法であるDU-GANを提案し,画像領域と勾配領域の両方におけるデノナイジング画像の局所的差と局所的差を学習する。このようなU-Netベースの識別器の利点は、U-Netの出力を通じて1ピクセル当たりのフィードバックを提供するだけでなく、U-Netの中間層を通した意味レベルでのグローバル構造に焦点を合わせることができることである。画像領域における敵対的トレーニングに加えて、画像勾配領域に別のu-netベースの判別器を適用し、光子飢餓によるアーティファクトの軽減と分断されたct画像のエッジの強化を図る。さらに、カットミックス技術により、u-netベースの判別器の画素単位の出力に対して、放射線科医に信頼度マップを提供し、その不確かさを可視化し、ldctに基づくスクリーニングおよび診断を容易にする。シミュレーションおよび実世界のデータセットに関する広範な実験は、最近公開された方法よりも質的かつ定量的に優れた性能を示している。 LDCT has drawn major attention in the medical imaging field due to the potential health risks of CT-associated X-ray radiation to patients. Reducing the radiation dose, however, decreases the quality of the reconstructed images, which consequently compromises the diagnostic performance. Various deep learning techniques have been introduced to improve the image quality of LDCT images through denoising. GANs-based denoising methods usually leverage an additional classification network, i.e. discriminator, to learn the most discriminate difference between the denoised and normal-dose images and, hence, regularize the denoising model accordingly; it often focuses either on the global structure or local details. To better regularize the LDCT denoising model, this paper proposes a novel method, termed DU-GAN, which leverages U-Net based discriminators in the GANs framework to learn both global and local difference between the denoised and normal-dose images in both image and gradient domains. The merit of such a U-Net based discriminator is that it can not only provide the per-pixel feedback to the denoising network through the outputs of the U-Net but also focus on the global structure in a semantic level through the middle layer of the U-Net. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edge of the denoised CT images. Furthermore, the CutMix technique enables the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map to visualize the uncertainty of the denoised results, facilitating the LDCT-based screening and diagnosis. Extensive experiments on the simulated and real-world datasets demonstrate superior performance over recently published methods both qualitatively and quantitatively.	翻訳日:2021-08-25 15:32:56 公開日:2021-08-24
# (参考訳) greenformers:低ランク近似による変圧器モデルの計算とメモリ効率の向上 Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation ( http://arxiv.org/abs/2108.10808v1 ) ライセンス: CC BY-SA 4.0	Samuel Cahyawijaya	(参考訳) 本稿では,最近注目されている変圧器モデルの低ランク近似手法によるモデル効率を向上させるためのモデル効率法集 greenformers を提案する。ディープラーニングモデルの開発傾向は、より複雑で大きなモデルをもたらす傾向にある。これはより良く正確な予測につながるが、大量のgpuリソースで数週間のトレーニングを必要とするため、結果として得られるモデルはさらにコストがかかる。特に、トランスフォーマーベースのモデルのサイズと計算コストは、2017年のデビュー以来、2021年初頭に約1億のパラメータから約1.6兆のパラメータへと大幅に増加しています。この計算的な空腹モデルもまた環境にかなりのコストをもたらし、カーボンフットプリントの脅威レベルにまで達する。これらのモデルのいくつかは非常に巨大なので、GPUクラスタなしでモデルを実行することさえ不可能です。グリーンフォーマーは低ランク近似アプローチを適用して変圧器モデルのモデル効率を向上させる。具体的には,低ランク変圧器と呼ばれる変圧器モデルの効率を向上させるための低ランク分解手法を提案する。さらに、我々のモデルをLinformerと呼ばれる既存の低ランク分解手法と比較する。この分析に基づき、低ランクトランスフォーマモデルは短系列(<=512)入力データの処理における時間およびメモリ効率を向上させるのに適し、リンフォーマモデルは長系列入力データの処理効率を向上させるのに適している(>>512)。また,低ランクトランスフォーマは,モデルサイズが大幅に削減されるため,デバイス上でのデプロイメントに適していることを示す。さらに、既存のBERTベースモデルにLRTを適用することで、そのようなモデルを開発するための計算、経済、環境コストを、当初のコストの30%以上削減できると見積もっている。 In this thesis, we introduce Greenformers, a collection of model efficiency methods to improve the model efficiency of the recently renowned transformer models with a low-rank approximation approach. The development trend of deep learning models tends to results in a more complex and larger model. Although it leads to a better and more accurate prediction, the resulting model becomes even more costly, as it requires weeks of training with a huge amount of GPU resources. Particularly, the size and computational cost of transformer-based models have increased tremendously since its first debut in 2017 from ~100 million parameters up to ~1.6 trillion parameters in early 2021. This computationally hungry model also incurs a substantial cost to the environment and even reaches an alarming level of carbon footprint. Some of these models are so massive that it is even impossible to run the model without a GPU cluster. Greenformers improve the model efficiency of transformer models by applying low-rank approximation approaches. Specifically, we propose a low-rank factorization approach to improve the efficiency of the transformer model called Low-Rank Transformer. We further compare our model with an existing low-rank factorization approach called Linformer. Based on our analysis, the Low-Rank Transformer model is suitable for improving both the time and memory efficiency in processing short-sequence (<= 512) input data, while the Linformer model is suitable for improving the efficiency in processing long-sequence input data (>= 512). We also show that Low-Rank Transformer is more suitable for on-device deployment, as it significantly reduces the model size. Additionally, we estimate that applying LRT to the existing BERT-base model can significantly reduce the computational, economical, and environmental costs for developing such models by more than 30% of its original costs.	翻訳日:2021-08-25 15:11:35 公開日:2021-08-24
# (参考訳) All-in-Focus Supervision による教師なし奥行きのブリッジ Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision ( http://arxiv.org/abs/2108.10843v1 ) ライセンス: CC BY 4.0	Ning-Hsu Wang, Ren Wang, Yu-Lun Liu, Yu-Hao Huang, Yu-Lin Chang, Chia-Ping Chen and Kevin Jou	(参考訳) 奥行き推定はコンピュータビジョンにおいて長く続く重要なタスクである。以前の研究のほとんどは、入力画像から深度を推定し、実世界のアプリケーションでは一般的でないオールインフォーカス(AiF)であると仮定している。一方、デフォーカスのぼかしを考慮に入れ、深度推定のための別の手がかりと考える作品もいくつかある。本稿では,焦点位置の異なる画像群(焦点スタックとして知られる)から深度マップだけでなくaif画像も推定する手法を提案する。深度とAiF推定の関係を生かした共有アーキテクチャを設計する。その結果、提案手法は、地上の真理深度で指導的に訓練するか、AiF画像を監視信号として訓練することができる。種々の実験において,本手法は定量的かつ定性的に最先端の手法より優れ,推論時間の効率も高いことを示す。 Depth estimation is a long-lasting yet important task in computer vision. Most of the previous works try to estimate depth from input images and assume images are all-in-focus (AiF), which is less common in real-world applications. On the other hand, a few works take defocus blur into account and consider it as another cue for depth estimation. In this paper, we propose a method to estimate not only a depth map but an AiF image from a set of images with different focus positions (known as a focal stack). We design a shared architecture to exploit the relationship between depth and AiF estimation. As a result, the proposed method can be trained either supervisedly with ground truth depth, or \emph{unsupervisedly} with AiF images as supervisory signals. We show in various experiments that our method outperforms the state-of-the-art methods both quantitatively and qualitatively, and also has higher efficiency in inference time.	翻訳日:2021-08-25 15:10:21 公開日:2021-08-24
# (参考訳) 計算病理学のための四分木画像表現 A QuadTree Image Representation for Computational Pathology ( http://arxiv.org/abs/2108.10873v1 ) ライセンス: CC BY 4.0	Rob Jewsbury, Abhir Bhalerao, Nasir Rajpoot	(参考訳) 計算病理学の分野は、病理画像の重大さからコンピュータビジョンアルゴリズムに多くの課題を呈している。病理組織像は大きく、画像タイルやパッチに分割する必要があるため、現代の畳み込みニューラルネットワーク(cnns)がそれらを処理できる。本稿では,quadtreesを用いて計算病理画像の解釈可能な画像表現を生成する手法と,これらの表現を精度の高い下流分類に利用するパイプラインを提案する。我々の知る限りでは、これは病理画像データにクワッドツリーを使用する最初の試みである。現在広く採用されている組織マスクパッチ抽出法と同程度の精度で, 38%以上少ないデータを用いて, 良好な結果が得られることを示した。 The field of computational pathology presents many challenges for computer vision algorithms due to the sheer size of pathology images. Histopathology images are large and need to be split up into image tiles or patches so modern convolutional neural networks (CNNs) can process them. In this work, we present a method to generate an interpretable image representation of computational pathology images using quadtrees and a pipeline to use these representations for highly accurate downstream classification. To the best of our knowledge, this is the first attempt to use quadtrees for pathology image data. We show it is highly accurate, able to achieve as good results as the currently widely adopted tissue mask patch extraction methods all while using over 38% less data.	翻訳日:2021-08-25 14:43:09 公開日:2021-08-24
# キャリブレーションバックプロジェクション層を用いた教師なし奥行き完了 Unsupervised Depth Completion with Calibrated Backprojection Layers ( http://arxiv.org/abs/2108.10531v1 ) ライセンス: Link先を確認	Alex Wong and Stefano Soatto	(参考訳) 本研究では,画像と疎点雲から深い深さを推定するディープニューラルネットワークアーキテクチャを提案する。カメラの固有キャリブレーションパラメータとともに、lidarまたは他のレンジセンサから得られるビデオストリームと対応する同期スパースポイントクラウドを用いてトレーニングする。推定時には、トレーニングに使用するカメラとは異なるカメラのキャリブレーションが、スパースポイントクラウドと1つの画像とともに、ネットワークへの入力として供給される。キャリブレーションバックプロジェクション層は、キャリブレーションマトリックスと深度特徴記述子を用いて画像の各画素を3次元空間にバックプロジェクションする。得られた3次元位置符号化は、画像記述子と前層出力とを連結してエンコーダの次の層に入力する。デコーダはスキップ接続を利用して深度マップを生成する。結果として得られる校正されたバックプロジェクションネットワーク(kbnet)は、測光再プロジェクションエラーを最小化することで、監視なしで訓練される。 kbnetは一般的な正規化ではなく、トレーニングセットに基づく深さ値の欠落を暗示する。我々はKBNetを公開深度補完ベンチマークでテストし、同じカメラをトレーニングやテストに使用する場合、アートの状態を30%、屋外の8%で上回ります。テストカメラが異なる場合、改善率は62%に達する。 https://github.com/alexklwong/calibrated-backprojection-network.com/。 We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud. It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera. At inference time, the calibration of the camera, which can be different than the one used for training, is fed as an input to the network along with the sparse point cloud and a single image. A Calibrated Backprojection Layer backprojects each pixel in the image to three-dimensional space using the calibration matrix and a depth feature descriptor. The resulting 3D positional encoding is concatenated with the image descriptor and the previous layer output to yield the input to the next layer of the encoder. A decoder, exploiting skip-connections, produces a dense depth map. The resulting Calibrated Backprojection Network, or KBNet, is trained without supervision by minimizing the photometric reprojection error. KBNet imputes missing depth value based on the training set, rather than on generic regularization. We test KBNet on public depth completion benchmarks, where it outperforms the state of the art by 30% indoor and 8% outdoor when the same camera is used for training and testing. When the test camera is different, the improvement reaches 62%. Code available at: https://github.com/alexklwong/calibrated-backprojection-network.	翻訳日:2021-08-25 14:29:46 公開日:2021-08-24
# 微粒化エンティティタイピングのためのPrompt-Learning Prompt-Learning for Fine-Grained Entity Typing ( http://arxiv.org/abs/2108.10604v1 ) ライセンス: Link先を確認	Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li, Hong-Gee Kim	(参考訳) 特定のタスクに事前学習言語モデル(PLM)をチューニングするための効果的なアプローチとして、プロンプトラーニングが研究者から注目を集めている。 textit{cloze} スタイルの言語は PLM の多義的な知識を刺激し、自然言語推論、感情分類、知識探索といった一連の NLP タスクにおいて有望な結果が得られる。本研究では,細粒度エンティティタイピングにおけるプロンプトラーニングの適用について,全教師あり,少数ショット,ゼロショットのシナリオで検討する。まず、エンティティ指向の言語処理器とテンプレートを構築し、マスク付き言語モデリングを行うことにより、シンプルで効果的な学習パイプラインを構築する。さらに,ゼロショット体制に取り組むために,素早い学習において分布レベルの最適化を行い,エンティティの情報を自動要約する自己教師型戦略を提案する。教師付き、少数ショット、ゼロショット設定下での3つのきめ細かいエンティティタイピングベンチマーク(最大86クラス)の大規模な実験は、特にトレーニングデータが不十分な場合、プロンプト学習手法が微調整ベースラインを大幅に上回っていることを示している。 As an effective approach to tune pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using \textit{cloze}-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on three fine-grained entity typing benchmarks (with up to 86 classes) under fully supervised, few-shot and zero-shot settings show that prompt-learning methods significantly outperform fine-tuning baselines, especially when the training data is insufficient.	翻訳日:2021-08-25 14:29:24 公開日:2021-08-24
# ソーシャルメディアにおける道徳に基づくアサーションと相同性--英語と日本語の文化的比較 Morality-based Assertion and Homophily on Social Media: A Cultural Comparison between English and Japanese Languages ( http://arxiv.org/abs/2108.10643v1 ) ライセンス: Link先を確認	Maneet Singh, Rishemjit Kaur, Akiko Matsuo, S.R.S. Iyengar and Kazutoshi Sasahara	(参考訳) 道徳心理学は道徳的アイデンティティ、評価、感情を扱う分野である。これまでの仕事は道徳的発展と文化の役割に大きく焦点を合わせてきた。言語が文化の本質的な要素であることを知るため,日本語利用者と英語利用者の道徳行動を比較するために,ソーシャルメディアプラットフォームであるTwitterを用いた。ケア、フェアネス、イングループ、オーソリティ、純粋性の5つの基本的道徳的基盤と関連する感情的価値を、英語と日本語のつぶやきと比較する。日本のユーザーのツイートは、フェアネス、イングループ、純粋さが比較的高かった。道徳に関わる感情に関しては、イングランドのツイートは全ての道徳的な側面に対してよりポジティブな感情を表した。ソーシャルメディア上で利用者をつなぐ上での道徳的類似性を考慮して,提案手法を用いて異なる道徳的次元に関するホモフィリーを定量化した。英語のケア、権威、純粋さ、日本語のイングループはTwitter上でホモフィリーを描写している。本研究は,英語および日本語話者の道徳行動に関する文化的差異を明らかにするものである。 Moral psychology is a domain that deals with moral identity, appraisals and emotions. Previous work has greatly focused on moral development and the associated role of culture. Knowing that language is an inherent element of a culture, we used the social media platform Twitter for comparing the moral behaviors of Japanese users with English users. The five basic moral foundations i.e., Care, Fairness, Ingroup, Authority and Purity, along with the associated emotional valence are compared for English and Japanese tweets. The tweets from Japanese users depicted relatively higher Fairness, Ingroup and Purity. As far as emotions related to morality are concerned, the English tweets expressed more positive emotions for all moral dimensions. Considering the role of moral similarities in connecting users on social media, we quantified homophily concerning different moral dimensions using our proposed method. The moral dimensions Care, Authority and Purity for English and Ingroup for Japanese depicted homophily on Twitter. Overall, our study uncovers the underlying cultural differences with respect to moral behavior in English and Japanese speaking users.	翻訳日:2021-08-25 14:29:01 公開日:2021-08-24
# llvip: ローライトビジョンのための可視赤外ペアデータセット LLVIP: A Visible-infrared Paired Dataset for Low-light Vision ( http://arxiv.org/abs/2108.10831v1 ) ライセンス: Link先を確認	Xinyu Jia, Chuang Zhu, Minzhen Li, Wenqi Tang, Wenli Zhou	(参考訳) 画像の融合や歩行者検出、低照度での画像から画像への変換といった様々な視覚課題において、有効な対象領域の欠如は極めて困難である。この場合、赤外線と可視画像を組み合わせて、詳細な情報と効果的なターゲット領域の両方を提供することができる。本稿では,低照度ビジョンのための可視赤外ペアデータセットLLVIPを提案する。このデータセットには33672枚の画像、または16836枚のペアが含まれており、そのほとんどは非常に暗いシーンで撮影され、すべての画像は時間と空間で厳密に整列している。データセットの歩行者はラベルが付けられています。データセットを他の可視赤外データセットと比較し,画像融合,歩行者検出,画像から画像への変換など,一般的なビジュアルアルゴリズムの性能評価を行った。実験結果は,画像情報に対する融合の相補的効果を示し,超低照度条件下での3つの視覚課題の既存のアルゴリズムの欠如を見出した。 LLVIPデータセットは,低照度アプリケーションにおける画像融合,歩行者検出,画像から画像への変換を促進することによって,コンピュータビジョンのコミュニティに寄与すると考えている。データセットはhttps://bupt-ai-cz.github.io/llvipでリリースされる。 It is very challenging for various visual tasks such as image fusion, pedestrian detection and image-to-image translation in low light conditions due to the loss of effective target areas. In this case, infrared and visible images can be used together to provide both rich detail information and effective target areas. In this paper, we present LLVIP, a visible-infrared paired dataset for low-light vision. This dataset contains 33672 images, or 16836 pairs, most of which were taken at very dark scenes, and all of the images are strictly aligned in time and space. Pedestrians in the dataset are labeled. We compare the dataset with other visible-infrared datasets and evaluate the performance of some popular visual algorithms including image fusion, pedestrian detection and image-to-image translation on the dataset. The experimental results demonstrate the complementary effect of fusion on image information, and find the deficiency of existing algorithms of the three visual tasks in very low-light conditions. We believe the LLVIP dataset will contribute to the community of computer vision by promoting image fusion, pedestrian detection and image-to-image translation in very low-light applications. The dataset is being released in https://bupt-ai-cz.github.io/LLVIP.	翻訳日:2021-08-25 14:28:24 公開日:2021-08-24
# マルチソースドメイン適応のためのメタ自己学習:ベンチマーク Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark ( http://arxiv.org/abs/2108.10840v1 ) ライセンス: Link先を確認	Shuhao Qiu, Chuang Zhu, Wenli Zhou	(参考訳) 近年、深層学習に基づく手法がコンピュータビジョンの分野で有望な結果を示している。しかし、一般的なディープラーニングモデルは大量のラベル付きデータを必要とするため、収集とラベル付けに手間がかかる。さらに、トレーニングデータとテストデータの間のドメインシフトによって、モデルは破壊される可能性があります。テキスト認識はコンピュータビジョンにおいて広く研究されている分野であり、フォントの多様性と複雑な背景により上記の問題に苦しめられている。本稿では,テキスト認識問題に着目し,これらの問題に対して3つの貢献を行う。まず、500万以上の画像を持つ5つの異なるドメインを含む、テキスト認識のためのマルチソースドメイン適応データセットを収集します。次に,メタ自己学習手法とメタ学習パラダイムを組み合わせたメタ自己学習手法を提案する。第3に,ベンチマークを提供するためにデータセット上で広範な実験を行い,本手法の有効性を示す。私たちの仕事とデータセットのコードは、すぐにhttps://bupt-ai-cz.github.io/meta-selflearning/で入手できる。 In recent years, deep learning-based methods have shown promising results in computer vision area. However, a common deep learning model requires a large amount of labeled data, which is labor-intensive to collect and label. What's more, the model can be ruined due to the domain shift between training data and testing data. Text recognition is a broadly studied field in computer vision and suffers from the same problems noted above due to the diversity of fonts and complicated backgrounds. In this paper, we focus on the text recognition problem and mainly make three contributions toward these problems. First, we collect a multi-source domain adaptation dataset for text recognition, including five different domains with over five million images, which is the first multi-domain text recognition dataset to our best knowledge. Secondly, we propose a new method called Meta Self-Learning, which combines the self-learning method with the meta-learning paradigm and achieves a better recognition result under the scene of multi-domain adaptation. Thirdly, extensive experiments are conducted on the dataset to provide a benchmark and also show the effectiveness of our method. The code of our work and dataset are available soon at https://bupt-ai-cz.github.io/Meta-SelfLearning/.	翻訳日:2021-08-25 14:28:06 公開日:2021-08-24
# リカレントニューラルネットワークトランスデューサにおける露出バイアスの低減 Reducing Exposure Bias in Training Recurrent Neural Network Transducers ( http://arxiv.org/abs/2108.10803v1 ) ライセンス: Link先を確認	Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske	(参考訳) リカレントニューラルネットワークトランスデューサ(rnnts)を典型的最大度基準を用いて訓練すると、予測ネットワークは基底真理ラベル配列のみに基づいて訓練される。これにより、モデルがエラーを含むラベルシーケンスを扱う必要がある場合、露出バイアスとして知られる推論中にミスマッチが発生する。本稿では,自動音声認識(ASR)のためのRNNTモデルの一般化を改善するために,トレーニングにおける露出バイアスを低減するアプローチを検討する。予測ネットワークに対するラベル保存入力摂動を導入する。入力トークンシーケンスは、追加のトークン言語モデルに基づいてスイッチアウトとスケジュールサンプリングを使用して摂動される。 300時間のswitchboardデータセットで実施された実験は、その効果を示している。露光バイアスを低減することで、高性能RNNT ASRモデルの精度をさらに向上し、300時間Switchboardデータセットの最先端結果を得ることができることを示す。 When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, when the model must deal with label sequences containing errors. In this paper we investigate approaches to reducing exposure bias in training to improve the generalization of RNNT models for automatic speech recognition (ASR). A label-preserving input perturbation to the prediction network is introduced. The input token sequences are perturbed using SwitchOut and scheduled sampling based on an additional token language model. Experiments conducted on the 300-hour Switchboard dataset demonstrate their effectiveness. By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.	翻訳日:2021-08-25 14:27:50 公開日:2021-08-24
# 欠落モダリティをもつマルチモーダル学習における最大確率推定 Maximum Likelihood Estimation for Multimodal Learning with Missing Modality ( http://arxiv.org/abs/2108.10513v1 ) ライセンス: Link先を確認	Fei Ma, Xiangxiang Xu, Shao-Lun Huang, Lin Zhang	(参考訳) マルチモーダル学習は多くのシナリオで大きな成功を収めた。一元学習と比較して、異なるモダリティからの情報を効果的に組み合わせて学習タスクの性能を向上させることができる。実際、マルチモーダルデータはセンサーの故障やデータ伝送エラーといった様々な理由により、モダリティを欠いている可能性がある。以前の研究では、モダリティを許容するデータの情報は十分に活用されていない。この問題に対処するために,最大推定値に基づく効率的な手法を提案し,その知識をモダリティ欠落データに組み込む。具体的には、モーダリティ完全データと理論的に最適であるモーダリティ完全データの条件分布を特徴付ける可能性関数を設計する。さらに,ソフトマックス関数の一般化形式を開発し,最大推定値をエンドツーエンドに効果的に実装する。このようなトレーニング戦略は,アルゴリズムの計算可能性を保証する。最後に,実世界のマルチモーダルデータセットに関する一連の実験を行う。トレーニングデータの95%がモダリティを欠いている場合でも,提案手法の有効性を示す。 Multimodal learning has achieved great successes in many scenarios. Compared with unimodal learning, it can effectively combine the information from different modalities to improve the performance of learning tasks. In reality, the multimodal data may have missing modalities due to various reasons, such as sensor failure and data transmission error. In previous works, the information of the modality-missing data has not been well exploited. To address this problem, we propose an efficient approach based on maximum likelihood estimation to incorporate the knowledge in the modality-missing data. Specifically, we design a likelihood function to characterize the conditional distribution of the modality-complete data and the modality-missing data, which is theoretically optimal. Moreover, we develop a generalized form of the softmax function to effectively implement maximum likelihood estimation in an end-to-end manner. Such training strategy guarantees the computability of our algorithm capably. Finally, we conduct a series of experiments on real-world multimodal datasets. Our results demonstrate the effectiveness of the proposed approach, even when 95% of the training data has missing modality.	翻訳日:2021-08-25 14:27:36 公開日:2021-08-24
# より深いグラフニューラルネットワークのトレーニングのためのトリックのバグ:包括的なベンチマーク研究 Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study ( http://arxiv.org/abs/2108.10521v1 ) ライセンス: Link先を確認	Tianlong Chen, Kaixiong Zhou, Keyu Duan, Wenqing Zheng, Peihao Wang, Xia Hu, Zhangyang Wang	(参考訳) ディープグラフニューラルネットワーク(GNN)のトレーニングは非常に難しい。勾配の消失や過度な適合といった深層アーキテクチャのトレーニングの標準点に加えて、深層GNNのトレーニングは過度なスムーシングや情報スカッシングなどの影響を受けており、大規模なグラフに対する潜在的なパワーを制限している。様々な種類のスキップ接続、グラフ正規化、ランダムなドロップなど、これらの制限に対処するための多くの取り組みが提案されているが、そのようなアーキテクチャをトレーニングするために必要な「トリック」から、深いGNNアーキテクチャがもたらす利点を解消することは困難である。さらに、公正で一貫した実験的な設定を持つ標準ベンチマークの欠如は、新しいメカニズムの有効性を調べる上でほぼ不可能である。これらの観点から、我々は、深層GNNの「トリック」を評価するための最初の公正かつ再現可能なベンチマークを示す。既存のアプローチを分類し,そのハイパーパラメータ感度を調査し,基本構成を統一する。総合的な評価は、最近の大規模Open Graph Benchmark(OGB)を含む、数十のグラフデータセット上で実施される。相乗的研究に基づいて,複数の代表的なグラフデータセットにわたる深層gcnの新たな最先端結果を達成するための,優れたトレーニングトリックのコンボを見出した。我々は,初期接続,アイデンティティマッピング,グループ正規化,バッチ正規化といった有機的な組み合わせが,大規模データセットにおいて最も理想的な性能を持つことを示す。実験はまた、いくつかのトリックを組み合わせたりスケールアップしたりする際に、いくつかの"サプライズ"を明らかにする。すべてのコードはhttps://github.com/VITA-Group/Deep_GCN_Benchmarkingで入手できる。 Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights in training deep architectures such as vanishing gradients and overfitting, the training of deep GNNs also uniquely suffers from over-smoothing, information squashing, and so on, which limits their potential power on large-scale graphs. Although numerous efforts are proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle the advantages brought by a deep GNN architecture from those "tricks" necessary to train such an architecture. Moreover, the lack of a standardized benchmark with fair and consistent experimental settings poses an almost insurmountable obstacle to gauging the effectiveness of new mechanisms. In view of those, we present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs. We categorize existing approaches, investigate their hyperparameter sensitivity, and unify the basic configuration. Comprehensive evaluations are then conducted on tens of representative graph datasets including the recent large-scale Open Graph Benchmark (OGB), with diverse deep GNN backbones. Based on synergistic studies, we discover the combo of superior training tricks, that lead us to attain the new state-of-the-art results for deep GCNs, across multiple representative graph datasets. We demonstrate that an organic combo of initial connection, identity mapping, group and batch normalization has the most ideal performance on large datasets. Experiments also reveal a number of "surprises" when combining or scaling up some of the tricks. All codes are available at https://github.com/VITA-Group/Deep_GCN_Benchmarking.	翻訳日:2021-08-25 14:27:21 公開日:2021-08-24
# 深層強化学習における効果的な探索のためのエントロピー・アウェアモデル初期化 Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning ( http://arxiv.org/abs/2108.10533v1 ) ライセンス: Link先を確認	Sooyoung Jang and Hyung-Il Kim	(参考訳) 深層学習における探索の促進は重要な問題である。初期エントロピーの影響について検討し,特に初期エントロピーの影響について検討した。 1) 初期エントロピーの低さは学習失敗の確率を増加させ, 2) この初期エントロピーは探索を阻害する低い値に向かって偏っている。本研究から着想を得たエントロピー対応モデル初期化は,効率的な探索のためのシンプルかつ強力な学習戦略である。提案する学習戦略は,学習失敗を著しく軽減し,実験によるパフォーマンス,安定性,学習速度を向上させる。 Encouraging exploration is a critical issue in deep reinforcement learning. We investigate the effect of initial entropy that significantly influences the exploration, especially at the earlier stage. Our main observations are as follows: 1) low initial entropy increases the probability of learning failure, and 2) this initial entropy is biased towards a low value that inhibits exploration. Inspired by the investigations, we devise entropy-aware model initialization, a simple yet powerful learning strategy for effective exploration. We show that the devised learning strategy significantly reduces learning failures and enhances performance, stability, and learning speed through experiments.	翻訳日:2021-08-25 14:26:53 公開日:2021-08-24
# sigmoidF1:マルチラベル分類のための平滑なF1スコアサロゲート損失 sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification ( http://arxiv.org/abs/2108.10566v1 ) ライセンス: Link先を確認	Gabriel B\'en\'edict, Vincent Koops, Daan Odijk, Maarten de Rijke	(参考訳) マルチクラスマルチラベル分類(multiclass multilabel classification)は、予測を通じて複数のラベルをサンプルに帰属させるタスクである。現在のモデルでは、既存の損失関数(シグモイド、クロスエントロピー、ロジスティックなど)を使用できるように、そのマルチラベル設定を複数のバイナリ分類またはマルチクラス分類に縮小する。実験的に、これらの手法は異なるメトリクス(F1スコア、リコール、精度など)で優れたパフォーマンスを達成することが報告されている。理論的には、多ラベル分類の削減は例ごとに異なるラベル数の予測には適せず、根底にある損失は性能指標の遠距離推定である。我々は損失関数sigmoidF1を提案する。これは f1 のスコアの近似であり、 (i) は確率的勾配降下に対して滑らかで扱いやすい、 (ii) 自然にマルチラベル計量に近似し、 (iii) ラベルの傾向とラベル数を推定する。より一般に、任意の混乱行列計量は滑らかな代理で定式化できることを示す。提案した損失関数を,テキストと画像の異なるデータセットで評価し,多ラベル分類評価の複雑さを考慮に入れた。実験では、SigmoidF1損失を最先端の学習前ニューラルネットワークMobileNetV2とDistilBERTにアタッチした分類ヘッドに埋め込んだ。実験の結果,SigmoidF1は4つのデータセットと複数のメトリクスで他の損失関数よりも優れていた。これらの結果から,訓練時間における損失関数としての推論時間指標の有効性と,マルチラベル分類などの非自明な分類問題への可能性を示した。 Multiclass multilabel classification refers to the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of that multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). Empirically, these methods have been reported to achieve good performance on different metrics (F1 score, Recall, Precision, etc.). Theoretically though, the multilabel classification reductions does not accommodate for the prediction of varying numbers of labels per example and the underlying losses are distant estimates of the performance metrics. We propose a loss function, sigmoidF1. It is an approximation of the F1 score that (I) is smooth and tractable for stochastic gradient descent, (II) naturally approximates a multilabel metric, (III) estimates label propensities and label counts. More generally, we show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on different text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. In our experiments, we embed the sigmoidF1 loss in a classification head that is attached to state-of-the-art efficient pretrained neural networks MobileNetV2 and DistilBERT. Our experiments show that sigmoidF1 outperforms other loss functions on four datasets and several metrics. These results show the effectiveness of using inference-time metrics as loss function at training time in general and their potential on non-trivial classification problems like multilabel classification.	翻訳日:2021-08-25 14:26:11 公開日:2021-08-24
# 技術・基本・テキストデータを用いたS&P 500株価予測 S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data ( http://arxiv.org/abs/2108.10826v1 ) ライセンス: Link先を確認	Shan Zhong and David B. Hitchcock	(参考訳) 我々は、株価予測に使用される一般的な予測モデルと新しい予測モデルの両方を要約し、S&P株価予測に技術的指標、基本特性、テキストベースの感情データと組み合わせた。 S&P 500指数方向予測における66.18%の精度と、個々の株式方向予測における62.09%の精度は、ランダムフォレストやLSTMといった異なる機械学習モデルと最先端のアンサンブルモデルを組み合わせて達成された。 2000年1月1日から2019年12月31日までに、現在および元S&P500の大型企業から発行されている518の異なる普通株に関する週毎の歴史的価格、財務報告、およびニュース情報が含まれています。本研究のイノベーションは,金融ニュース項目の感情を分類・推定するために深層言語モデルを活用すること,変数と株価の異なる組み合わせを含む異なるモデルを融合して予測を行うこと,異なる株間でのデータを用いて時系列で機械学習モデルの不十分なデータ問題を克服すること,などである。 We summarized both common and novel predictive models used for stock price prediction and combined them with technical indices, fundamental characteristics and text-based sentiment data to predict S&P stock prices. A 66.18% accuracy in S&P 500 index directional prediction and 62.09% accuracy in individual stock directional prediction was achieved by combining different machine learning models such as Random Forest and LSTM together into state-of-the-art ensemble models. The data we use contains weekly historical prices, finance reports, and text information from news items associated with 518 different common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. Our study's innovation includes utilizing deep language models to categorize and infer financial news item sentiment; fusing different models containing different combinations of variables and stocks to jointly make predictions; and overcoming the insufficient data problem for machine learning models in time series by using data across different stocks.	翻訳日:2021-08-25 14:25:43 公開日:2021-08-24
# 物理学を応用した深層学習:システム信頼性評価のための有望な手法 Physics-Informed Deep Learning: A Promising Technique for System Reliability Assessment ( http://arxiv.org/abs/2108.10828v1 ) ライセンス: Link先を確認	Taotao Zhou, Enrique Lopez Droguett, Ali Mosleh	(参考訳) 信頼性と安全性のコミュニティにおけるシステム診断と健康管理のためのディープラーニングに基づく予測モデルに関する研究が注目されている。しかし,システム信頼性評価における深層学習の利用に関する研究は限られている。本稿では,近年の物理インフォームド深層学習の進歩を利用して,このギャップを埋め,深層学習とシステム信頼性評価の新たなインターフェースを探求することを目的とする。特に,物理学を対象とする深層学習の文脈におけるフレームシステム信頼性評価のアプローチを提示し,不確実性定量化とシステム信頼性評価に組み込んだ計測データのための物理を対象とする生成的逆ネットワークの可能性について考察する。提案手法はデュアルプロセッサ計算システムを含む3つの数値例によって実証された。この結果は,計算課題を緩和し,測定データと数理モデルを組み合わせてシステム信頼性評価を行う物理情報深層学習の可能性を示している。 Considerable research has been devoted to deep learning-based predictive models for system prognostics and health management in the reliability and safety community. However, there is limited study on the utilization of deep learning for system reliability assessment. This paper aims to bridge this gap and explore this new interface between deep learning and system reliability assessment by exploiting the recent advances of physics-informed deep learning. Particularly, we present an approach to frame system reliability assessment in the context of physics-informed deep learning and discuss the potential value of physics-informed generative adversarial networks for the uncertainty quantification and measurement data incorporation in system reliability assessment. The proposed approach is demonstrated by three numerical examples involving a dual-processor computing system. The results indicate the potential value of physics-informed deep learning to alleviate computational challenges and combine measurement data and mathematical models for system reliability assessment.	翻訳日:2021-08-25 14:25:24 公開日:2021-08-24
# imGHUM: 人間の3次元形状とArticulated Poseの生成モデル imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose ( http://arxiv.org/abs/2108.10842v1 ) ライセンス: Link先を確認	Thiemo Alldieck, Hongyi Xu, Cristian Sminchisescu	(参考訳) 本稿では,3次元形状と構音ポーズの包括的生成モデルであるimghumについて,符号付き距離関数として表現する。従来の作業とは対照的に、全人体をゼロレベルセットの関数として暗黙的にモデル化し、明示的なテンプレートメッシュを使用しない。本稿では,人間のポーズ,形状,意味に関する詳細な暗黙的生成モデルを,最先端のメッシュモデルと同等に学習することのできる,新しいネットワークアーキテクチャと学習パラダイムを提案する。本モデルでは,手の動きや表情を含む調音ポーズ,形状変化の幅広いスペクトル,任意の解像度や空間的位置でクエリできるなど,人間のモデルに望ましい詳細を特徴付ける。さらに,本モデルでは,異なる形状のインスタンス間の対応性を簡単に確立し,従来の暗黙的表現による対処が困難なアプリケーションを実現するために,空間意味論を付加した。広範な実験において,モデル精度と現在の研究課題への適用性を示す。 We present imGHUM, the first holistic generative model of 3D human shape and articulated pose, represented as a signed distance function. In contrast to prior work, we model the full human body implicitly as a function zero-level-set and without the use of an explicit template mesh. We propose a novel network architecture and a learning paradigm, which make it possible to learn a detailed implicit generative model of human pose, shape, and semantics, on par with state-of-the-art mesh-based models. Our model features desired detail for human models, such as articulated pose including hand motion and facial expressions, a broad spectrum of shape variations, and can be queried at arbitrary resolutions and spatial locations. Additionally, our model has attached spatial semantics making it straightforward to establish correspondences between different shape instances, thus enabling applications that are difficult to tackle using classical implicit representations. In extensive experiments, we demonstrate the model accuracy and its applicability to current research problems.	翻訳日:2021-08-25 14:25:08 公開日:2021-08-24
# ReFINE: Random RangE Finder for Network Embedding REFINE: Random RangE FInder for Network Embedding ( http://arxiv.org/abs/2108.10703v1 ) ライセンス: Link先を確認	Hao Zhu, Piotr Koniusz	(参考訳) ノードの低次元ベクトル表現を学習するネットワーク埋め込み手法は近年,注目されている。行列分解に基づく埋め込みは有効であるが、固有分解ステップのため計算コストがかかることが多い。本稿では,ランダムレンジファインダに基づくネットワーク埋め込み(refine)アルゴリズムを提案する。このアルゴリズムは1スレッドで30秒以内に100万のノード(youtube)に埋め込むことができる。 REFINEはProNEよりも10倍高速で、LINE、DeepWalk、Node2Vec、GraRep、およびHopeといった他のメソッドよりも10-400倍高速である。まず,ネットワーク埋め込みアプローチをスキップグラムモデルとして定式化するが,直交制約により行列分解問題に再構成する。ランダム化 tSVD (truncated SVD) を他の手法として使用する代わりに、ランダム化 QR 分解を用いてノード表現を高速に取得する。さらに,ネットワーク拡張のための簡易だが効率的なスペクトルフィルタを設計し,ノード表現のための高次情報を得る。実験の結果、ノード分類のための異なるサイズ(数千から数百万のノード/エッジ)のデータセットで精錬が非常に効率的であり、優れた性能を享受できることがわかった。 Network embedding approaches have recently attracted considerable interest as they learn low-dimensional vector representations of nodes. Embeddings based on the matrix factorization are effective but they are usually computationally expensive due to the eigen-decomposition step. In this paper, we propose a Random RangE FInder based Network Embedding (REFINE) algorithm, which can perform embedding on one million of nodes (YouTube) within 30 seconds in a single thread. REFINE is 10x faster than ProNE, which is 10-400x faster than other methods such as LINE, DeepWalk, Node2Vec, GraRep, and Hope. Firstly, we formulate our network embedding approach as a skip-gram model, but with an orthogonal constraint, and we reformulate it into the matrix factorization problem. Instead of using randomized tSVD (truncated SVD) as other methods, we employ the Randomized Blocked QR decomposition to obtain the node representation fast. Moreover, we design a simple but efficient spectral filter for network enhancement to obtain higher-order information for node representation. Experimental results prove that REFINE is very efficient on datasets of different sizes (from thousand to million of nodes/edges) for node classification, while enjoying a good performance.	翻訳日:2021-08-25 14:24:21 公開日:2021-08-24
# デジタルヘルスにおけるプライバシー保護型オープンイノベーションのための連合学習 Federated Learning for Privacy-Preserving Open Innovation Future on Digital Health ( http://arxiv.org/abs/2108.10761v1 ) ライセンス: Link先を確認	Guodong Long, Tao Shen, Yue Tan, Leah Gerrard, Allison Clarke, Jing Jiang	(参考訳) プライバシー保護は、人工知能(AI)に関する倫理的な問題である。フェデレーション学習は、データに直接アクセスすることなく、ユーザや組織間で共有モデルを学ぶための、新しい機械学習パラダイムである。プライバシー保護を提供する次世代AIモデルトレーニングフレームワークになり得るため、デジタルヘルスと医療情報学の将来に幅広い影響を及ぼす可能性がある。医療業界におけるオープンなイノベーションフレームワーク、すなわちオープンヘルスの実現は、パートナー組織や研究コミュニティと次世代の共同フレームワークを構築することによって、医療関連組織のイノベーションと創造性を高めることである。特に、このゲームを変えるコラボレーティブフレームワークは、プライバシー保護を伴う多様なデータからの知識共有を提供する。この章では、AIのサポートにより、フェデレーション学習がオープンヘルスエコシステムの開発を可能にする方法について論じる。既存のフェデレーション学習の課題と解決策について論じる。 Privacy protection is an ethical issue with broad concern in Artificial Intelligence (AI). Federated learning is a new machine learning paradigm to learn a shared model across users or organisations without direct access to the data. It has great potential to be the next-general AI model training framework that offers privacy protection and therefore has broad implications for the future of digital health and healthcare informatics. Implementing an open innovation framework in the healthcare industry, namely open health, is to enhance innovation and creative capability of health-related organisations by building a next-generation collaborative framework with partner organisations and the research community. In particular, this game-changing collaborative framework offers knowledge sharing from diverse data with a privacy-preserving. This chapter will discuss how federated learning can enable the development of an open health ecosystem with the support of AI. Existing challenges and solutions for federated learning will be discussed.	翻訳日:2021-08-25 14:23:58 公開日:2021-08-24
# 階段の特徴:階層構造が深層学習をいかに導くか The staircase property: How hierarchical structure can guide deep learning ( http://arxiv.org/abs/2108.10573v1 ) ライセンス: Link先を確認	Emmanuel Abbe, Enric Boix-Adsera, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj	(参考訳) 本稿では,深層ニューラルネットワークが階層的に学習できるデータ分布の構造特性を明らかにする。ブール超キューブ上の関数の「階段」特性を定義し、高階フーリエ係数がチェーンの増加に伴う低階フーリエ係数から到達可能であることを仮定する。この性質を満たす関数は、正規ニューラルネットワークの層状確率座標降下(英語版)(layerwise stochastic coordinate descend)を用いて多項式時間で学習できることを証明している。解析により,そのような階段関数やニューラルネットワークに対して,勾配に基づくアルゴリズムは,ネットワーク深度に沿った低次特徴を優雅に組み合わせることで,高次特徴を学習することを示した。さらに,より標準的なResNetアーキテクチャにより,階段関数が学習可能であることを示す実験により,理論的結果を裏付ける。 sqやpacアルゴリズムをエミュレートできる一般的な多項式サイズネットワークとは対照的に、この理論と実験の結果は、階段特性が通常のネットワーク上での勾配ベース学習の能力を理解する上で役割を担っているという事実を裏付けている。 This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which posits that high-order Fourier coefficients are reachable from lower-order Fourier coefficients along increasing chains. We prove that functions satisfying this property can be learned in polynomial time using layerwise stochastic coordinate descent on regular neural networks -- a class of network architectures and initializations that have homogeneity properties. Our analysis shows that for such staircase functions and neural networks, the gradient-based algorithm learns high-level features by greedily combining lower-level features along the depth of the network. We further back our theoretical results with experiments showing that staircase functions are also learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the fact that staircase properties have a role to play in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any SQ or PAC algorithms as recently shown.	翻訳日:2021-08-25 14:23:44 公開日:2021-08-24
# GrADE:時間依存型非線形偏微分方程式に対するグラフベースデータ駆動解法 GrADE: A graph based data-driven solver for time-dependent nonlinear partial differential equations ( http://arxiv.org/abs/2108.10639v1 ) ライセンス: Link先を確認	Yash Kumar and Souvik Chakraborty	(参考訳) 物理世界は物理学の法則によって支配され、しばしば非線形偏微分方程式(PDE)の形で表される。残念ながら、PDEの解は非自明であり、しばしばかなりの計算時間を必要とする。近年の人工知能と機械学習の分野での進歩により、ニューラルネットワークを用いたPDEのソリューションが、大きな潜在能力を持つドメインとして登場した。しかし、この分野の開発のほとんどは、完全に接続されたニューラルネットワーク(FNN)または畳み込みニューラルネットワーク(CNN)に基づいている。 FNNは計算的に非効率であり、ネットワークパラメータの数が巨大になる可能性があるが、CNNは通常のグリッドと単純なドメインを必要とする。本稿では,時間依存非線形pdesを解くためのグラフ注意微分方程式(グレード)と呼ばれる新しい枠組みを提案する。提案するアプローチは、FNN、グラフニューラルネットワークと、最近開発されたNeural ODEフレームワークを結合する。第一の考え方は、空間領域をモデル化するためのグラフニューラルネットワークと、時間領域をモデル化するためのニューラルODEである。注意機構は重要な入力/特徴を特定し、より多くの重み付けを割り当て、提案するフレームワークの性能を高める。一方、ニューラルODEはメモリコストを一定に抑え、速度の数値的精度の取引を可能にする。また,提案するアーキテクチャをより少ない時間で精度良く訓練するための効果的な手法として,深度改善を提案する。提案手法の有効性を1次元および2次元バーガーズ方程式を用いて示す。その結果、PDEのモデリングにおける提案フレームワークの能力と、再トレーニングを必要とせず、より大きなドメインへの拡張性を示した。 The physical world is governed by the laws of physics, often represented in form of nonlinear partial differential equations (PDEs). Unfortunately, solution of PDEs is non-trivial and often involves significant computational time. With recent developments in the field of artificial intelligence and machine learning, the solution of PDEs using neural network has emerged as a domain with huge potential. However, most of the developments in this field are based on either fully connected neural networks (FNN) or convolutional neural networks (CNN). While FNN is computationally inefficient as the number of network parameters can be potentially huge, CNN necessitates regular grid and simpler domain. In this work, we propose a novel framework referred to as the Graph Attention Differential Equation (GrADE) for solving time dependent nonlinear PDEs. The proposed approach couples FNN, graph neural network, and recently developed Neural ODE framework. The primary idea is to use graph neural network for modeling the spatial domain, and Neural ODE for modeling the temporal domain. The attention mechanism identifies important inputs/features and assign more weightage to the same; this enhances the performance of the proposed framework. Neural ODE, on the other hand, results in constant memory cost and allows trading of numerical precision for speed. We also propose depth refinement as an effective technique for training the proposed architecture in lesser time with better accuracy. The effectiveness of the proposed framework is illustrated using 1D and 2D Burgers' equations. Results obtained illustrate the capability of the proposed framework in modeling PDE and its scalability to larger domains without the need for retraining.	翻訳日:2021-08-25 14:23:27 公開日:2021-08-24
# マルチスケール進行統計モデルを用いたロスレス画像圧縮 Lossless Image Compression Using a Multi-Scale Progressive Statistical Model ( http://arxiv.org/abs/2108.10551v1 ) ライセンス: Link先を確認	Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela	(参考訳) ロスレス画像圧縮は、情報損失を許さない場合、画像記憶と伝送にとって重要な技術である。ディープラーニング技術の急速な発展に伴い、この分野ではより高い圧縮率を達成するためにディープニューラルネットワークが使用されている。画素単位の自己回帰統計モデルに基づく手法は優れた性能を示した。しかし、シーケンシャルな処理方法は、これらの方法が実際に使用されるのを防ぐ。近年,この制限に対処するために,マルチスケール自己回帰モデルが提案されている。マルチスケールアプローチは並列コンピューティングシステムを効率的に利用し、実用的なシステムを構築することができる。しかし、これらの手法は速度と引き換えに圧縮性能を犠牲にする。本稿では,画素ワイド・アプローチとマルチスケール・アプローチを利用するマルチスケール・プログレッシブ・統計モデルを提案する。我々は,画素の処理順序を容易に調整できるフレキシブルな機構を開発した。提案手法は,推定速度を劇的に低下させることなく,2つの大きなベンチマークデータセットに対して,最先端のロスレス画像圧縮法を著しく向上させる。 Lossless image compression is an important technique for image storage and transmission when information loss is not allowed. With the fast development of deep learning techniques, deep neural networks have been used in this field to achieve a higher compression rate. Methods based on pixel-wise autoregressive statistical models have shown good performance. However, the sequential processing way prevents these methods to be used in practice. Recently, multi-scale autoregressive models have been proposed to address this limitation. Multi-scale approaches can use parallel computing systems efficiently and build practical systems. Nevertheless, these approaches sacrifice compression performance in exchange for speed. In this paper, we propose a multi-scale progressive statistical model that takes advantage of the pixel-wise approach and the multi-scale approach. We developed a flexible mechanism where the processing order of the pixels can be adjusted easily. Our proposed method outperforms the state-of-the-art lossless image compression methods on two large benchmark datasets by a significant margin without degrading the inference speed dramatically.	翻訳日:2021-08-25 14:22:52 公開日:2021-08-24
# 多言語モデルの方が優れているか? トランスフォーマーによるチェコ感覚の向上 Are the Multilingual Models Better? Improving Czech Sentiment with Transformers ( http://arxiv.org/abs/2108.10640v1 ) ライセンス: Link先を確認	Pavel P\v{r}ib\'a\v{n}, Josef Steinberger	(参考訳) 本稿では,トランスフォーマーモデルとその多言語バージョンを用いたチェコ語感情の向上を目指す。より具体的には、3つの感情極性データセットに基づくチェコ語の極性検出の課題について検討する。 5つの多言語モデルと3つの単言語モデルを用いて微調整および実験を行った。単言語モデルと多言語モデルのパフォーマンスを比較し、繰り返しニューラルネットワークに基づく従来のアプローチと比較する。さらに、多言語モデルとその知識を英語からチェコ語へ(そしてその逆も)ゼロショットのクロスリンガル分類で伝達する能力をテストする。実験により,巨大多言語モデルが単言語モデルの性能を克服できることを示した。彼らはまた、訓練データなしで他の言語の極性を検出することができ、最先端のモノリンガル訓練モデルと比較してパフォーマンスは4.4 %以下である。さらに,3つのデータセットについて,最新の結果を得た。 In this paper, we aim at improving Czech sentiment with transformer-based models and their multilingual versions. More concretely, we study the task of polarity detection for the Czech language on three sentiment polarity datasets. We fine-tune and perform experiments with five multilingual and three monolingual models. We compare the monolingual and multilingual models' performance, including comparison with the older approach based on recurrent neural networks. Furthermore, we test the multilingual models and their ability to transfer knowledge from English to Czech (and vice versa) with zero-shot cross-lingual classification. Our experiments show that the huge multilingual models can overcome the performance of the monolingual models. They are also able to detect polarity in another language without any training data, with performance not worse than 4.4 % compared to state-of-the-art monolingual trained models. Moreover, we achieved new state-of-the-art results on all three datasets.	翻訳日:2021-08-25 14:22:24 公開日:2021-08-24
# インテント検出のための密度ベース動的カリキュラム学習 Density-Based Dynamic Curriculum Learning for Intent Detection ( http://arxiv.org/abs/2108.10674v1 ) ライセンス: Link先を確認	Yantao Gong, Cao Liu, Jiazhen Yuan, Fan Yang, Xunliang Cai, Guanglu Wan, Jiansong Chen, Ruiyao Niu and Houfeng Wang	(参考訳) 事前訓練された言語モデルは、意図検出タスクにおいて顕著なパフォーマンスを達成した。しかしながら、各サンプルに同じ重みを割り当てることによって、単純なサンプルの過剰フィットと複雑なサンプルの学習の失敗に苦しむことになる。この問題に対処するために,密度に基づく動的カリキュラム学習モデルを提案する。本モデルは固有ベクトルの密度に応じてサンプルの難易度を定義する。このようにして、全てのサンプルの固有ベクトルの全体分布を同時に活用する。次に,様々な難易度のサンプルに注意を払い,学習過程におけるサンプルの割合を変化させる動的カリキュラム学習戦略を適用した。以上の操作を通じて、単純なサンプルを十分に訓練し、複雑なサンプルを増強する。 3つのオープンデータセットの実験により、提案した密度に基づくアルゴリズムが、単純かつ複雑なサンプルを著しく区別できることが確認された。さらに,本モデルでは,強いベースラインよりも明らかに改善されている。 Pre-trained language models have achieved noticeable performance on the intent detection task. However, due to assigning an identical weight to each sample, they suffer from the overfitting of simple samples and the failure to learn complex samples well. To handle this problem, we propose a density-based dynamic curriculum learning model. Our model defines the sample's difficulty level according to their eigenvectors' density. In this way, we exploit the overall distribution of all samples' eigenvectors simultaneously. Then we apply a dynamic curriculum learning strategy, which pays distinct attention to samples of various difficulty levels and alters the proportion of samples during the training process. Through the above operation, simple samples are well-trained, and complex samples are enhanced. Experiments on three open datasets verify that the proposed density-based algorithm can distinguish simple and complex samples significantly. Besides, our model obtains obvious improvement over the strong baselines.	翻訳日:2021-08-25 14:22:10 公開日:2021-08-24
# 微細診断システムを用いた小児呼吸器疾患の同定 Identification of Pediatric Respiratory Diseases Using Fine-grained Diagnosis System ( http://arxiv.org/abs/2108.10818v1 ) ライセンス: Link先を確認	Gang Yu, Zhongzhi Yu, Yemin Shi, Yingshuo Wang, Xiaoqing Liu, Zheming Li, Yonggen Zhao, Fenglei Sun, Yizhou Yu, Qiang Shu	(参考訳) 喘息、気管支炎、肺炎、上気道感染症(RTI)などの呼吸器疾患は、クリニックで最も一般的な疾患である。これらの疾患の症状の類似性は、患者の到着時に迅速に診断することを妨げる。小児科では, 症状の表現能力が限られているため, 正確な診断は困難である。これは、医療画像装置の欠如と医師の限られた経験が、類似した疾患の区別の困難さをさらに増す、一次病院で悪化する。本報告では, 小児の細粒度診断補助システムについて, 入院時に臨床ノートのみを用いて, 迅速かつ正確な診断を行うように提案する。提案システムは,検査結果の構造化段階と疾患同定段階の2段階からなる。第1段階は臨床ノートから関連する数値を抽出して検査結果を構造化し、疾患識別段階はテキスト形式の臨床記録および第1段階から得られた構造化データに基づく診断を提供する。適応的特徴注入や多モード注意融合といった手法を導入し, ヒューズデータとテキストデータを融合する, 新たな深層学習アルゴリズムを開発した。深層学習モデルのトレーニングには12000人以上の呼吸器疾患患者の臨床ノートを使用し,訓練モデルの性能評価には約1800人の非重複患者からの臨床ノートを使用した。肺炎、RTI、気管支炎、喘息の平均精度(AP)はそれぞれ0.878、0.857、0.714、0.825であり、平均AP(mAP)は0.819である。 Respiratory diseases, including asthma, bronchitis, pneumonia, and upper respiratory tract infection (RTI), are among the most common diseases in clinics. The similarities among the symptoms of these diseases precludes prompt diagnosis upon the patients' arrival. In pediatrics, the patients' limited ability in expressing their situation makes precise diagnosis even harder. This becomes worse in primary hospitals, where the lack of medical imaging devices and the doctors' limited experience further increase the difficulty of distinguishing among similar diseases. In this paper, a pediatric fine-grained diagnosis-assistant system is proposed to provide prompt and precise diagnosis using solely clinical notes upon admission, which would assist clinicians without changing the diagnostic process. The proposed system consists of two stages: a test result structuralization stage and a disease identification stage. The first stage structuralizes test results by extracting relevant numerical values from clinical notes, and the disease identification stage provides a diagnosis based on text-form clinical notes and the structured data obtained from the first stage. A novel deep learning algorithm was developed for the disease identification stage, where techniques including adaptive feature infusion and multi-modal attentive fusion were introduced to fuse structured and text data together. Clinical notes from over 12000 patients with respiratory diseases were used to train a deep learning model, and clinical notes from a non-overlapping set of about 1800 patients were used to evaluate the performance of the trained model. The average precisions (AP) for pneumonia, RTI, bronchitis and asthma are 0.878, 0.857, 0.714, and 0.825, respectively, achieving a mean AP (mAP) of 0.819.	翻訳日:2021-08-25 14:21:42 公開日:2021-08-24
# ParamCrop:ビデオコントラスト学習のためのパラメトリックキュービッククロップ ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning ( http://arxiv.org/abs/2108.10501v1 ) ライセンス: Link先を確認	Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Ji, Nong Sang	(参考訳) コントラスト学習の中心的な考え方は、異なるインスタンスを区別し、同じインスタンスの異なるビューを同じ表現を共有するように強制することである。自明な解を避けるために、拡張は異なるビューを生成する上で重要な役割を担い、その中ではランダムなトリミングがモデルが強く一般化された表現を学ぶのに有効であることが示される。一般的なランダムな作物操作は、トレーニングプロセスに沿って統計的に一致した2つのビューの違いを保っている。本研究では,学習者表現の質を高めるために,学習過程に沿った2つの拡張ビュー間の差異を適応的に制御する手法を提案する。具体的には、3次元アフィン変換によりビデオから3次元立方体を自動的に収穫する、ビデオコントラスト学習のためのパラメトリック立方体収穫操作であるParamCropを提案する。 ParamCropは、対向目的を用いてビデオバックボーンと同時に訓練され、データから最適な収穫戦略を学ぶ。 2つの拡張ビュー間の中心距離とIoUは、ParamCropによって適応的に制御され、トレーニング過程に沿った相違点の学習は、強い表現を学ぶ上で有益であることを示す。広範囲にわたるアブレーション研究は、複数のコントラスト学習フレームワークとビデオバックボーンに対するParamCropの有効性を示す。 ParamCropでは,HMDB51およびUCF101データセットの最先端性能を改善した。 The central idea of contrastive learning is to discriminate between different instances and force different views of the same instance to share the same representation. To avoid trivial solutions, augmentation plays an important role in generating different views, among which random cropping is shown to be effective for the model to learn a strong and generalized representation. Commonly used random crop operation keeps the difference between two views statistically consistent along the training process. In this work, we challenge this convention by showing that adaptively controlling the disparity between two augmented views along the training process enhances the quality of the learnt representation. Specifically, we present a parametric cubic cropping operation, ParamCrop, for video contrastive learning, which automatically crops a 3D cubic from the video by differentiable 3D affine transformations. ParamCrop is trained simultaneously with the video backbone using an adversarial objective and learns an optimal cropping strategy from the data. The visualizations show that the center distance and the IoU between two augmented views are adaptively controlled by ParamCrop and the learned change in the disparity along the training process is beneficial to learning a strong representation. Extensive ablation studies demonstrate the effectiveness of the proposed ParamCrop on multiple contrastive learning frameworks and video backbones. With ParamCrop, we improve the state-of-the-art performance on both HMDB51 and UCF101 datasets.	翻訳日:2021-08-25 14:20:28 公開日:2021-08-24
# ShapeConv: 室内RGB-Dセマンティックセグメンテーションのための形状認識型畳み込み層 ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation ( http://arxiv.org/abs/2108.10528v1 ) ライセンス: Link先を確認	Jinming Cao, Hanchao Leng, Dani Lischinski, Danny Cohen-Or, Changhe Tu, Yangyan Li	(参考訳) RGB-Dセマンティックセグメンテーションはここ数年で注目を集めている。既存の方法は、主にRGBと深度の特徴を消費するために同質の畳み込み演算子を使用し、固有の違いを無視している。実際、RGB値は投影された画像空間の測光的外観特性を捉え、深度特徴は局所幾何学の形状とそれの基底(場所)をより広い文脈でエンコードする。ベースと比較すると、形状はおそらくより固有であり、セマンティクスとより強く結びついているので、セグメンテーションの精度にとってより重要となる。この観察に触発された形状認識畳み込み層(shapeconv)を用いて深度特徴を処理し,まず深さ特徴を形状成分と基底成分に分解し,次に学習可能な重みを2つ導入してそれぞれ独立に連携させ,最終的にこれら2成分の再重み付け結合に畳み込みを適用する。 shapeconvはモデルに依存しず、ほとんどのcnnに簡単に統合でき、セマンティクスセグメンテーションのためにバニラ畳み込み層を置き換えることができる。屋内RGB-Dセマンティックセマンティックセグメンテーションベンチマーク(NYU-Dv2(-13,-40)、SUN RGB-D、SID)の大規模な実験は、5つのポピュラーなアーキテクチャで採用する際のShapeConvの有効性を実証している。さらに、計算やメモリ増加を推論フェーズに導入することなく、shapeconvによるcnnの性能を向上させる。理由は、ShapeConvにおける形状と基成分のバランスをとる学習ウェイトが、推論フェーズにおいて定数となり、次の畳み込みに融合し、バニラ畳み込み層を持つものと同一のネットワークとなるからである。 RGB-D semantic segmentation has attracted increasing attention over the past few years. Existing methods mostly employ homogeneous convolution operators to consume the RGB and depth features, ignoring their intrinsic differences. In fact, the RGB values capture the photometric appearance properties in the projected image space, while the depth feature encodes both the shape of a local geometry as well as the base (whereabout) of it in a larger context. Compared with the base, the shape probably is more inherent and has a stronger connection to the semantics, and thus is more critical for segmentation accuracy. Inspired by this observation, we introduce a Shape-aware Convolutional layer (ShapeConv) for processing the depth feature, where the depth feature is firstly decomposed into a shape-component and a base-component, next two learnable weights are introduced to cooperate with them independently, and finally a convolution is applied on the re-weighted combination of these two components. ShapeConv is model-agnostic and can be easily integrated into most CNNs to replace vanilla convolutional layers for semantic segmentation. Extensive experiments on three challenging indoor RGB-D semantic segmentation benchmarks, i.e., NYU-Dv2(-13,-40), SUN RGB-D, and SID, demonstrate the effectiveness of our ShapeConv when employing it over five popular architectures. Moreover, the performance of CNNs with ShapeConv is boosted without introducing any computation and memory increase in the inference phase. The reason is that the learnt weights for balancing the importance between the shape and base components in ShapeConv become constants in the inference phase, and thus can be fused into the following convolution, resulting in a network that is identical to one with vanilla convolutional layers.	翻訳日:2021-08-25 14:20:05 公開日:2021-08-24
# 人物再識別のメリットを享受する人物検索 Making Person Search Enjoy the Merits of Person Re-identification ( http://arxiv.org/abs/2108.10536v1 ) ライセンス: Link先を確認	Chuang Liu, Hua Yang, Qin Zhou and Shibao Zheng	(参考訳) 人物検索は、人物再識別(Re-ID)の拡張タスクである。しかし,既存の1段階の人物探索作業の多くは,人物検出とRe-IDの統合による1段階の人物探索性能向上のために,既存の高度なRe-IDモデルをどのように活用するかを研究していない。この問題に対処するため,教師誘導型分散ネットワーク(TDN)という,より高速で強力なワンステップの人物検索フレームワークを提案し,既存のRe-ID研究のメリットを享受する。提案するtdnは,高度な人物再識別知識を人物検索モデルに転送することにより,人物検索性能を大幅に向上させることができる。提案するtdnでは,リid教師モデルからワンステップパーソンサーチモデルへの知識伝達を改善するため,2つのサブタスクを部分的に分離して,強力なワンステップパーソンサーチベースフレームワークを設計する。さらに,Re-IDモデルとワンステップの人物探索モデル間の入力形式の違いによるスケールギャップを橋渡しする知識伝達ブリッジモジュールを提案する。テスト中は、パノラマ画像の文脈情報を利用してより良い検索を行うためのコンテキストパーソンのランク付け戦略をさらに提案する。 2つの公開人検索データセットの実験により,提案手法の有効性が示された。 Person search is an extended task of person re-identification (Re-ID). However, most existing one-step person search works have not studied how to employ existing advanced Re-ID models to boost the one-step person search performance due to the integration of person detection and Re-ID. To address this issue, we propose a faster and stronger one-step person search framework, the Teacher-guided Disentangling Networks (TDN), to make the one-step person search enjoy the merits of the existing Re-ID researches. The proposed TDN can significantly boost the person search performance by transferring the advanced person Re-ID knowledge to the person search model. In the proposed TDN, for better knowledge transfer from the Re-ID teacher model to the one-step person search model, we design a strong one-step person search base framework by partially disentangling the two subtasks. Besides, we propose a Knowledge Transfer Bridge module to bridge the scale gap caused by different input formats between the Re-ID model and one-step person search model. During testing, we further propose the Ranking with Context Persons strategy to exploit the context information in panoramic images for better retrieval. Experiments on two public person search datasets demonstrate the favorable performance of the proposed method.	翻訳日:2021-08-25 14:19:28 公開日:2021-08-24
# StyleAugment: 事前定義されたテクスチャのないスタイル拡張によるテクスチャ非バイアス表現の学習 StyleAugment: Learning Texture De-biased Representations by Style Augmentation without Pre-defined Textures ( http://arxiv.org/abs/2108.10549v1 ) ライセンス: Link先を確認	Sanghyuk Chun, Song Park	(参考訳) 最近の強力な視覚分類器はテクスチャに偏り、形状情報はモデルによって見過ごされている。 Stylized ImageNetと呼ばれるアートスタイルのトランスファー手法を用いて、トレーニング画像を増強する簡単な試みは、テクスチャバイアスを低減することができる。しかし、Stylized ImageNetアプローチには、忠実度と多様性の2つの欠点がある。まず、生成した画像は、自然画像や芸術絵画に見合う重要な意味的ギャップのため、画質が低い。また、Stylized ImageNetトレーニングサンプルはトレーニング前に事前計算されるため、各サンプルの多様性が欠如している。ミニバッチからスタイルを拡張したStyleAugmentを提案する。 styleaugmentは事前定義されたスタイル参照に依存しないが、参照のためのmini-batch内の自然画像によってオンザフライで拡張イメージを生成する。そのため、StyleAugmentでは、各画像に対する豊富なコンバウンディングキューをオンザフライで観察すると同時に、拡張されたイメージは芸術的なスタイルの転送画像よりもリアルである。我々は,画像NetデータセットにおけるStyleAugmentの有効性を,テクスチャデバイアス精度,汚濁堅牢性,自然対向サンプル,閉塞堅牢性などのロバスト性ベンチマークを用いて検証した。 StyleAugmentは従来の教師なしデバイアス法や最先端データ拡張法よりも優れた一般化性能を示す。 Recent powerful vision classifiers are biased towards textures, while shape information is overlooked by the models. A simple attempt by augmenting training images using the artistic style transfer method, called Stylized ImageNet, can reduce the texture bias. However, Stylized ImageNet approach has two drawbacks in fidelity and diversity. First, the generated images show low image quality due to the significant semantic gap betweeen natural images and artistic paintings. Also, Stylized ImageNet training samples are pre-computed before training, resulting in showing the lack of diversity for each sample. We propose a StyleAugment by augmenting styles from the mini-batch. StyleAugment does not rely on the pre-defined style references, but generates augmented images on-the-fly by natural images in the mini-batch for the references. Hence, StyleAugment let the model observe abundant confounding cues for each image by on-the-fly the augmentation strategy, while the augmented images are more realistic than artistic style transferred images. We validate the effectiveness of StyleAugment in the ImageNet dataset with robustness benchmarks, such as texture de-biased accuracy, corruption robustness, natural adversarial samples, and occlusion robustness. StyleAugment shows better generalization performances than previous unsupervised de-biasing methods and state-of-the-art data augmentation methods in our experiments.	翻訳日:2021-08-25 14:19:06 公開日:2021-08-24
# 画像キャプションと視覚的質問応答のための自動パシングネットワーク Auto-Parsing Network for Image Captioning and Visual Question Answering ( http://arxiv.org/abs/2108.10568v1 ) ライセンス: Link先を確認	Xu Yang and Chongyang Gao and Hanwang Zhang and Jianfei Cai	(参考訳) 本稿では,トランスフォーマーに基づく視覚言語システムの有効性を向上させるために,入力データの隠れ木構造を発見し,活用するための自動パーシングネットワークを提案する。具体的には、各自己注意層における注意操作によってパラメータ化された確率的グラフモデル(PGM)を課し、スパース仮定を組み込む。我々はこのPGMを用いて、入力シーケンスをいくつかのクラスタにソフトに分割し、各クラスタを内部エンティティの親として扱う。これらの制約された自己アテンション層を積み重ねることで、下位層のクラスタは新しいシーケンスに構成され、上位層のPGMはこのシーケンスをさらにセグメンテーションする。反復的に、スパースツリーを暗黙的に解析することができ、このツリーの階層的な知識は変換された埋め込みに組み込まれ、ターゲットの視覚言語タスクの解決に使用できる。具体的には、我々のAPNがTransformerベースのネットワークを2つの主要な視覚言語タスクであるCaptioningとVisual Question Answeringで強化できることを示します。また、PGM確率に基づく解析アルゴリズムを開発し、推論中に入力の隠れ構造が何であるかを知ることができる。 We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems. Specifically, we impose a Probabilistic Graphical Model (PGM) parameterized by the attention operations on each self-attention layer to incorporate sparse assumption. We use this PGM to softly segment an input sequence into a few clusters where each cluster can be treated as the parent of the inside entities. By stacking these PGM constrained self-attention layers, the clusters in a lower layer compose into a new sequence, and the PGM in a higher layer will further segment this sequence. Iteratively, a sparse tree can be implicitly parsed, and this tree's hierarchical knowledge is incorporated into the transformed embeddings, which can be used for solving the target vision-language tasks. Specifically, we showcase that our APN can strengthen Transformer based networks in two major vision-language tasks: Captioning and Visual Question Answering. Also, a PGM probability-based parsing algorithm is developed by which we can discover what the hidden structure of input is during the inference.	翻訳日:2021-08-25 14:18:39 公開日:2021-08-24
# 映像グラウンディングのためのサポートセットベースクロススーパービジョン Support-Set Based Cross-Supervision for Video Grounding ( http://arxiv.org/abs/2108.10576v1 ) ライセンス: Link先を確認	Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao	(参考訳) 現在のビデオグラウンドディングのアプローチでは、ビデオテキスト関係をキャプチャする複雑なアーキテクチャが提案されており、目覚ましい改善が達成されている。しかし、実際にはアーキテクチャ設計のみで複雑なマルチモーダル関係を学習することは困難である。本稿では,新たなSupport-set Based Cross-Supervision (Sscs) モジュールを提案する。提案するSscsモジュールは、識別的コントラスト目的と生成的キャプション目的の2つの主要成分を含む。対照的な目的は、対照的な学習によって効果的な表現を学ぶことであり、キャプション目的は、テキストによって教師される強力なビデオエンコーダを訓練することができる。接地時間と背景時間の両方で視覚的実体が共存しているため、相互排他的学習はビデオの接地には適さない。本稿では,映像全体から視覚情報を収集し,エンティティの相互排除を解消するサポートセットの概念を用いて,クロススーパービジョンを強化することでこの問題に対処する。元の目的と組み合わせることで、Sscsは既存のアプローチに対するマルチモーダル関係モデリングの能力を高めることができる。我々は,3つの挑戦的データセット上でSscsを広範囲に評価し,特にCharades-STA上のR1@0.5の6.35%において,最先端の手法を大きなマージンで改善できることを示す。 Current approaches for video grounding propose kinds of complex architectures to capture the video-text relations, and have achieved impressive improvements. However, it is hard to learn the complicated multi-modal relations by only architecture designing in fact. In this paper, we introduce a novel Support-set Based Cross-Supervision (Sscs) module which can improve existing methods during training phase without extra inference cost. The proposed Sscs module contains two main components, i.e., discriminative contrastive objective and generative caption objective. The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts. Due to the co-existence of some visual entities in both ground-truth and background intervals, i.e., mutual exclusion, naively contrastive learning is unsuitable to video grounding. We address the problem by boosting the cross-supervision with the support-set concept, which collects visual information from the whole video and eliminates the mutual exclusion of entities. Combined with the original objectives, Sscs can enhance the abilities of multi-modal relation modeling for existing approaches. We extensively evaluate Sscs on three challenging datasets, and show that our method can improve current state-of-the-art methods by large margins, especially 6.35% in terms of R1@0.5 on Charades-STA.	翻訳日:2021-08-25 14:18:21 公開日:2021-08-24
# 畳み込み単位最適化によるバッチホワイトニングの一般化 Improving Generalization of Batch Whitening by Convolutional Unit Optimization ( http://arxiv.org/abs/2108.10629v1 ) ライセンス: Link先を確認	Yooshin Cho, Hanbyel Cho, Youngsoo Kim, Junmo Kim	(参考訳) バッチホワイトニング(Batch Whitening)は、入力特徴をゼロ平均(Centering)と単位分散(Scaling)に変換し、チャネル間の線形相関(Decorrelation)を取り除くことにより、トレーニングを加速し、安定化する技術である。バッチ正規化を経験的に最適化した一般的な構造では、正規化層は畳み込みとアクティベーション関数の間に現れる。バッチホワイトニングの研究の後、同じ構造をそれ以上解析することなく採用し、線形層の入力がホワイト化されることを前提にバッチホワイト化も分析された。このギャップを埋めるため,我々はこの理論に沿った新しい畳み込みユニットを提案し,本手法は一般にバッチ・ホワイトニングの性能を向上させる。さらに,特徴のランクと相関を調査することで,元の畳み込みユニットの非効率性を示す。本手法は市販のホワイトニングモジュールを用いるため,最先端のホワイトニングモジュールであるイテレーティブ正規化(IterNorm)を用いて,CIFAR-10,CIFAR-100,CUB-200-2011,Stanford Dogs,ImageNetの5つの画像分類データセットにおいて,大幅な性能向上を実現している。特に,大きな学習率,グループサイズ,イテレーション数を用いることで,ホワイトニングの安定性と性能が向上することを確認した。 Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling), and by removing linear correlation between channels (Decorrelation). In commonly used structures, which are empirically optimized with Batch Normalization, the normalization layer appears between convolution and activation function. Following Batch Whitening studies have employed the same structure without further analysis; even Batch Whitening was analyzed on the premise that the input of a linear layer is whitened. To bridge the gap, we propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening. Moreover, we show the inefficacy of the original Convolutional Unit by investigating rank and correlation of features. As our method is employable off-the-shelf whitening modules, we use Iterative Normalization (IterNorm), the state-of-the-art whitening module, and obtain significantly improved performance on five image classification datasets: CIFAR-10, CIFAR-100, CUB-200-2011, Stanford Dogs, and ImageNet. Notably, we verify that our method improves stability and performance of whitening when using large learning rate, group size, and iteration number.	翻訳日:2021-08-25 14:17:56 公開日:2021-08-24
# レーダー・カメラ融合による全速度レーダリターン Full-Velocity Radar Returns by Radar-Camera Fusion ( http://arxiv.org/abs/2108.10637v1 ) ライセンス: Link先を確認	Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, Praveen Narayanan	(参考訳) ドップラーレーダーの特徴は、レーダー点の半径方向の速度を測定することである。しかし, 物体速度推定と動的シーンにおけるレーダスイープの時間的統合の欠如は, 物体速度推定を損なう。本稿では,レーダを融合したカメラがレーダに相補的な情報を提供することを認識し,カメラ画像からの対応する光フローを用いてドップラー帰還の点方向全速度推定を行う。さらに,レーダーとカメラの対応を推定するニューラルネットワークを用いて,レーダのリターンとカメラ画像の関連付け問題に対処する。 nuScenesデータセットの実験結果は,提案手法の有効性を検証し,レーダ点の速度推定および蓄積における最先端の精度向上を示す。 A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points. However, the missing tangential velocity component hampers object velocity estimation as well as temporal integration of radar sweeps in dynamic scenes. Recognizing that fusing camera with radar provides complementary information to radar, in this paper we present a closed-form solution for the point-wise, full-velocity estimate of Doppler returns using the corresponding optical flow from camera images. Additionally, we address the association problem between radar returns and camera images with a neural network that is trained to estimate radar-camera correspondences. Experimental results on the nuScenes dataset verify the validity of the method and show significant improvements over the state-of-the-art in velocity estimation and accumulation of radar points.	翻訳日:2021-08-25 14:17:30 公開日:2021-08-24
# 教師なし視覚表現学習のための時間的知識整合性 Temporal Knowledge Consistency for Unsupervised Visual Representation Learning ( http://arxiv.org/abs/2108.10668v1 ) ライセンス: Link先を確認	Weixin Feng, Yuanjiang Wang, Lihua Ma, Ye Yuan, Chi Zhang	(参考訳) インスタンス識別パラダイムは教師なし学習において支配的になっている。教師が生徒の指導信号として組込みの知識を提供するという、教師中心の枠組みを常に採用している。生徒は、教師の見解とインスタンスの空間的一貫性を強制することによって意味のある表現を学ぶ。しかし、教師の出力は、異なる訓練段階において同じ事例で劇的に変化し、予期せぬノイズが引き起こされ、矛盾した目的によって壊滅的な忘れが引き起こされる。本稿では、まずインスタンスの時間的一貫性を現在のインスタンス識別パラダイムに統合し、時間的知識一貫性(TKC)という新しい強力なアルゴリズムを提案する。具体的には,tkcは時間的教師の知識を動的に整理し,学習例の時間的一貫性を重視した有用な情報を適応的に選択する。実験結果から、TKCは線形評価プロトコル上でResNetとAlexNetの両方の視覚表現を学習し、下流タスクにうまく転送できることがわかった。すべての実験から,本手法の有効性と一般化が示唆された。 The instance discrimination paradigm has become dominant in unsupervised learning. It always adopts a teacher-student framework, in which the teacher provides embedded knowledge as a supervision signal for the student. The student learns meaningful representations by enforcing instance spatial consistency with the views from the teacher. However, the outputs of the teacher can vary dramatically on the same instance during different training stages, introducing unexpected noise and leading to catastrophic forgetting caused by inconsistent objectives. In this paper, we first integrate instance temporal consistency into current instance discrimination paradigms, and propose a novel and strong algorithm named Temporal Knowledge Consistency (TKC). Specifically, our TKC dynamically ensembles the knowledge of temporal teachers and adaptively selects useful information according to its importance to learning instance temporal consistency. Experimental result shows that TKC can learn better visual representations on both ResNet and AlexNet on linear evaluation protocol while transfer well to downstream tasks. All experiments suggest the good effectiveness and generalization of our method.	翻訳日:2021-08-25 14:17:14 公開日:2021-08-24
# ビデオ・サリエンシ予測のための時空間自己注意ネットワーク Spatio-Temporal Self-Attention Network for Video Saliency Prediction ( http://arxiv.org/abs/2108.10696v1 ) ライセンス: Link先を確認	Ziqiang Wang, Zhi Liu, Gongyang Li, Tianhong Zhang, Lihua Xu, Jijun Wang	(参考訳) 3次元畳み込みニューラルネットワークは,コンピュータビジョンにおける映像タスクにおいて有望な結果を達成している。しかし、3D畳み込みは、カーネルサイズに応じて固定された局所時空にのみ視覚表現をエンコードするが、人間の注意は常にビデオの異なる時間における関係的な視覚特徴に惹かれる。この制限を克服するために,複数のstsaモジュールを異なる3次元畳み込みバックボーンのレベルに配置し,異なる時間ステップの時空間特徴間の長距離関係を直接捉える,ビデオ・サリエンシ予測のための新たな時空間自己着型3dネットワーク(stsanet)を提案する。さらに,semantic と spatio-temporal 部分空間における文脈知覚とマルチレベル特徴を統合するための注目型マルチスケール融合(amsf)モジュールを提案する。 DHF1K, Hollywood-2, UCF, DIEMベンチマークで得られた結果から, 提案したモデルに比較して, 提案モデルの有効性が明らかとなった。 3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction that is explored in this paper. However, 3D convolution encodes visual representation merely on fixed local spacetime according to its kernel size, while human attention is always attracted by relational visual features at different time of a video. To overcome this limitation, we propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction, in which multiple Spatio-Temporal Self-Attention (STSA) modules are employed at different levels of 3D convolutional backbone to directly capture long-range relations between spatio-temporal features of different time steps. Besides, we propose an Attentional Multi-Scale Fusion (AMSF) module to integrate multi-level features with the perception of context in semantic and spatio-temporal subspaces. Extensive experiments demonstrate the contributions of key components of our method, and the results on DHF1K, Hollywood-2, UCF, and DIEM benchmark datasets clearly prove the superiority of the proposed model compared with all state-of-the-art models.	翻訳日:2021-08-25 14:16:57 公開日:2021-08-24
# PocketNet: ニューラルネットワーク検索とマルチステップ知識蒸留を用いた極軽量顔認識ネットワーク PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation ( http://arxiv.org/abs/2108.10710v1 ) ライセンス: Link先を確認	Fadi Boutros, Patrick Siebke, Marcel Klemt, Naser Damer, Florian Kirchbuchner, Arjan Kuijper	(参考訳) ディープニューラルネットワークは、顔認識の主流となっている。しかし、組み込みデバイスやメモリフットプリントの少ないアプリケーションシナリオに非常に多くのパラメータを含むモデルをデプロイすることは困難である。本研究では,極めて軽量かつ高精度な顔認識ソリューションを提案する。我々はニューラルアーキテクチャ検索を用いて、新しい顔認識モデル、すなわちPocketNetを開発した。また,多段階の知識蒸留という知識蒸留に基づく新しい学習パラダイムを提示することにより,コンパクトモデルの検証性能を向上させることを提案する。我々は,IJB-B,IJB-C,MegaFaceなどの大規模評価ベンチマークを含む9つのベンチマークにおいて,最近のコンパクト顔認識モデルとの比較実験を行った。 pocketnetsは、同じレベルのモデルコンパクト性を考慮して、9つのメインストリームベンチマークで最先端の顔認識性能を一貫して向上させてきた。 0.92mのパラメータを持つネットワークpocketnets-128は、4m以上のパラメータを含む最近のsotaコンパクトモデルと非常に競争力のある結果を得た。トレーニングコードと事前トレーニングされたモデルはhttps://github.com/fdbtrs/PocketNet.comで公開されている。 Deep neural networks have rapidly become the mainstream method for face recognition. However, deploying such models that contain an extremely large number of parameters to embedded devices or in application scenarios with limited memory footprint is challenging. In this work, we present an extremely lightweight and accurate face recognition solution. We utilize neural architecture search to develop a new family of face recognition models, namely PocketNet. We also propose to enhance the verification performance of the compact model by presenting a novel training paradigm based on knowledge distillation, namely the multi-step knowledge distillation. We present an extensive experimental evaluation and comparisons with the recent compact face recognition models on nine different benchmarks including large-scale evaluation benchmarks such as IJB-B, IJB-C, and MegaFace. PocketNets have consistently advanced the state-of-the-art (SOTA) face recognition performance on nine mainstream benchmarks when considering the same level of model compactness. With 0.92M parameters, our smallest network PocketNetS-128 achieved very competitive results to recent SOTA compacted models that contain more than 4M parameters. Training codes and pre-trained models are publicly released https://github.com/fdbtrs/PocketNet.	翻訳日:2021-08-25 14:16:34 公開日:2021-08-24
# 平衡物体検出のための残差予測整合性 Reconcile Prediction Consistency for Balanced Object Detection ( http://arxiv.org/abs/2108.10809v1 ) ライセンス: Link先を確認	Keyang Wang, Lei Zhang	(参考訳) 分類と回帰は物体検出器の2つの柱である。ほとんどのCNNベースの検出器では、これらの2つの柱は独立に最適化されている。それらの間の直接的な相互作用がなければ、分類損失と回帰損失は、トレーニングフェーズの最適方向に対して同期的に最適化できない。これにより、特に不規則な形状や咬合対象において、高い分類スコア、低い局在精度、低い分類スコア、高い局在精度を有する不整合予測が多数生じ、nms後の既存の検出器の検出性能を著しく損なうことが明らかとなる。平衡物体検出のための予測整合性を改善するために,分類枝と局所化枝の最適化を調和させる高調波損失を提案する。調和損失により、これらの2つの分枝は、訓練中に相互に監督し、促進し、推論フェーズにおいてトップ分類とローカライゼーションの共起度の高い一貫した予測を生成することができる。さらに, トレーニング段階において, ローカライゼーション損失が外れ値に支配されるのを防止するため, 異なるIoUレベルの試料の局所化損失の重みを調和させるために, ハーモニックIoU損失を提案する。 PASCAL VOCとMS COCOのベンチマークに関する総合的な実験により,既存の物体検出装置の最先端精度向上に向けたモデルの有効性と有効性を示した。 Classification and regression are two pillars of object detectors. In most CNN-based detectors, these two pillars are optimized independently. Without direct interactions between them, the classification loss and the regression loss can not be optimized synchronously toward the optimal direction in the training phase. This clearly leads to lots of inconsistent predictions with high classification score but low localization accuracy or low classification score but high localization accuracy in the inference phase, especially for the objects of irregular shape and occlusion, which severely hurts the detection performance of existing detectors after NMS. To reconcile prediction consistency for balanced object detection, we propose a Harmonic loss to harmonize the optimization of classification branch and localization branch. The Harmonic loss enables these two branches to supervise and promote each other during training, thereby producing consistent predictions with high co-occurrence of top classification and localization in the inference phase. Furthermore, in order to prevent the localization loss from being dominated by outliers during training phase, a Harmonic IoU loss is proposed to harmonize the weight of the localization loss of different IoU-level samples. Comprehensive experiments on benchmarks PASCAL VOC and MS COCO demonstrate the generality and effectiveness of our model for facilitating existing object detectors to state-of-the-art accuracy.	翻訳日:2021-08-25 14:16:17 公開日:2021-08-24
# 正しい方法でチューニングする:ソフト近傍密度によるドメイン適応の教師なし検証 Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density ( http://arxiv.org/abs/2108.10860v1 ) ライセンス: Link先を確認	Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, and Kate Saenko	(参考訳) unsupervised domain adaptation (uda) メソッドはラベルなしのターゲットドメインの一般化を劇的に改善することができる。しかし, 最適ハイパーパラメータ選択は, 高精度化と負の伝達回避に不可欠である。教師なし適応手法を現実的に検証するにはどうすればいいのか? まず、既存の基準を実証的に分析し、ハイパーパラメータのチューニングにあまり効果がないことを示す。直感的には、訓練されたソース分類器は、近くにある同じクラスのターゲットサンプルを埋め込んで、特徴空間に密集した近傍を形成するべきである。この仮定に基づいて,点間の類似度分布のエントロピーを計算し,ソフトな近傍の密度を測定する,教師なし検証基準を提案する。画像分類とセマンティックセグメンテーションモデルの両方において、ハイパーパラメータとトレーニングイテレーションの数をチューニングすることが可能です。この論文で使われたコードは、 \url{https://github.com/VisionLearningGroup/SND} で入手できる。 Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains. However, optimal hyper-parameter selection is critical to achieving high accuracy and avoiding negative transfer. Supervised hyper-parameter validation is not possible without labeled target data, which raises the question: How can we validate unsupervised adaptation techniques in a realistic way? We first empirically analyze existing criteria and demonstrate that they are not very effective for tuning hyper-parameters. Intuitively, a well-trained source classifier should embed target samples of the same class nearby, forming dense neighborhoods in feature space. Based on this assumption, we propose a novel unsupervised validation criterion that measures the density of soft neighborhoods by computing the entropy of the similarity distribution between points. Our criterion is simpler than competing validation methods, yet more effective; it can tune hyper-parameters and the number of training iterations in both image classification and semantic segmentation models. The code used for the paper will be available at \url{https://github.com/VisionLearningGroup/SND}.	翻訳日:2021-08-25 14:15:51 公開日:2021-08-24
# DROID-SLAM:モノクラー、ステレオ、RGB-DカメラのためのディープビジュアルSLAM DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras ( http://arxiv.org/abs/2108.10869v1 ) ライセンス: Link先を確認	Zachary Teed and Jia Deng	(参考訳) ディープラーニングベースのSLAMシステムであるDROID-SLAMを紹介する。 DROID-SLAMは、Dense Bundle Adjustment層を通して、カメラポーズと画素幅の繰り返し更新で構成される。 DROID-SLAMは正確で、以前の作業よりも大幅に改善され、ロバストで、壊滅的な失敗が著しく少ない。単眼ビデオのトレーニングにもかかわらず、ステレオやRGB-Dビデオを利用してテスト時にパフォーマンスを向上させることができる。オープンソースコードのURLはhttps://github.com/princeton-vl/DROID-SLAMです。 We introduce DROID-SLAM, a new deep learning based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer. DROID-SLAM is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures. Despite training on monocular video, it can leverage stereo or RGB-D video to achieve improved performance at test time. The URL to our open source code is https://github.com/princeton-vl/DROID-SLAM.	翻訳日:2021-08-25 14:15:35 公開日:2021-08-24
# ソーシャル・アウェア・軌道予測モデルは本当にソーシャル・アウェアなのか? Are socially-aware trajectory prediction models really socially-aware? ( http://arxiv.org/abs/2108.10879v1 ) ライセンス: Link先を確認	Saeed Saadatnejad, Mohammadhossein Bahari, Pedram Khorsandi, Mohammad Saneian, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi	(参考訳) 私たちの分野は最近、ニューラルネットワークベースの軌道予測器の武器レースを目撃しました。これらの予測器は、自律走行や歩行者流シミュレーションなどの多くの応用の核心にあるが、敵の堅牢性は慎重に研究されていない。本稿では,衝突回避の観点から予測モデルの社会的理解を評価するために,社会的対応による攻撃を提案する。攻撃は小さいが慎重に作られた摂動であり、予測を失敗させる。技術的には、我々は衝突を出力の失敗モードと定義し、攻撃を誘導するためのハードおよびソフトアテンション機構を提案する。我々の攻撃のおかげで、私たちは現在のモデルの社会的理解の限界に光を当てた。近年の軌道予測モデルにおいて,本手法の強みを示す。最後に,最先端のモデルの社会的理解を高めるために,我々の攻撃を活用できることを示す。コードはオンラインで入手できる。 https://s-attack.github.io/ Our field has recently witnessed an arms race of neural network-based trajectory predictors. While these predictors are at the core of many applications such as autonomous navigation or pedestrian flow simulations, their adversarial robustness has not been carefully studied. In this paper, we introduce a socially-attended attack to assess the social understanding of prediction models in terms of collision avoidance. An attack is a small yet carefully-crafted perturbations to fail predictors. Technically, we define collision as a failure mode of the output, and propose hard- and soft-attention mechanisms to guide our attack. Thanks to our attack, we shed light on the limitations of the current models in terms of their social understanding. We demonstrate the strengths of our method on the recent trajectory prediction models. Finally, we show that our attack can be employed to increase the social understanding of state-of-the-art models. The code is available online: https://s-attack.github.io/	翻訳日:2021-08-25 14:15:26 公開日:2021-08-24
# Adaptation-Agnostic Meta-Training Adaptation-Agnostic Meta-Training ( http://arxiv.org/abs/2108.10557v1 ) ライセンス: Link先を確認	Jiaxin Chen, Li-Ming Zhan, Xiao-Ming Wu, Fu-Lai Chung	(参考訳) 多くのメタ学習アルゴリズムは、内部タスク適応時にタスク固有の予測器が学習され、メタ更新時にメタパラメータが更新されるという意味で、インターリーブプロセスに定式化することができる。通常のメタトレーニング戦略は、メタパラメータを最適化するために、内部タスク適応手順を区別する必要がある。これにより、内部タスクアルゴリズムを解析的に解決すべきという制約が生じる。この制約の下では、解析解を持つ単純なアルゴリズムのみが、モデル表現性を制限する内部タスクアルゴリズムとして適用できる。制限を緩和するために,適応非依存なメタトレーニング戦略を提案する。提案手法に従い,より強力なアルゴリズム(例えば,異なるアルゴリズムのアンサンブル)をインナータスクアルゴリズムとして適用することで,一般的なベースラインと比較して優れた性能を実現する。ソースコードはhttps://github.com/jiaxinchen666/AdaptationAgnosticMetaLearningで入手できる。 Many meta-learning algorithms can be formulated into an interleaved process, in the sense that task-specific predictors are learned during inner-task adaptation and meta-parameters are updated during meta-update. The normal meta-training strategy needs to differentiate through the inner-task adaptation procedure to optimize the meta-parameters. This leads to a constraint that the inner-task algorithms should be solved analytically. Under this constraint, only simple algorithms with analytical solutions can be applied as the inner-task algorithms, limiting the model expressiveness. To lift the limitation, we propose an adaptation-agnostic meta-training strategy. Following our proposed strategy, we can apply stronger algorithms (e.g., an ensemble of different types of algorithms) as the inner-task algorithm to achieve superior performance comparing with popular baselines. The source code is available at https://github.com/jiaxinchen666/AdaptationAgnosticMetaLearning.	翻訳日:2021-08-25 14:14:57 公開日:2021-08-24
# グラフ分類のためのポーリングアーキテクチャ検索 Pooling Architecture Search for Graph Classification ( http://arxiv.org/abs/2108.10587v1 ) ライセンス: Link先を確認	Lanning Wei, Huan Zhao, Quanming Yao, Zhiqiang He	(参考訳) グラフ分類は化学やバイオインフォマティクスなどの多くの分野において重要な問題であり、グラフニューラルネットワーク(gnn)は最先端(sota)法である。 GNNは、近傍のアグリゲーションスキームに基づいてノードレベルの表現を学習し、グラフレベルの表現を得るために、既存のGNNモデルのアグリゲーション操作後にプール法を適用し、粗い粒度のグラフを生成する。しかし、グラフ分類の高度に多様な応用により、既存のプーリング法の性能は異なるグラフによって異なる。言い換えれば、ほとんどのケースでうまく機能するようにユニバーサルプーリングアーキテクチャを設計することは難しい問題であり、現実世界のアプリケーションではデータ固有のプーリングメソッドが要求される。そこで本研究では,ニューラルネットワークを用いてグラフ分類のための適応的プーリングアーキテクチャを探索する手法を提案する。まず、アグリゲーション、プール、リードアウト、マージの4つのモジュールからなる統一されたフレームワークを設計しました。この枠組みに基づいて、人間設計アーキテクチャに人気のある操作を組み込むことにより、新しい検索空間を設計する。そして, 効率的な探索を可能にするために, 探索空間を連続的に緩和する粗粒化戦略を提案し, 微分可能な探索法を適用できる。 3つのドメインから6つの実世界のデータセットに関する広範囲な実験を行い,提案手法の有効性と有効性を示す。 Graph classification is an important problem with applications across many domains, like chemistry and bioinformatics, for which graph neural networks (GNNs) have been state-of-the-art (SOTA) methods. GNNs are designed to learn node-level representation based on neighborhood aggregation schemes, and to obtain graph-level representation, pooling methods are applied after the aggregation operation in existing GNN models to generate coarse-grained graphs. However,due to highly diverse applications of graph classification, and the performance of existing pooling methods vary on different graphs. In other words, it is a challenging problem to design a universal pooling architecture to perform well in most cases, leading to a demand for data-specific pooling methods in real-world applications. To address this problem, we propose to use neural architecture search (NAS) to search for adaptive pooling architectures for graph classification. Firstly we designed a unified framework consisting of four modules: Aggregation, Pooling, Readout, and Merge, which can cover existing human-designed pooling methods for graph classification. Based on this framework, a novel search space is designed by incorporating popular operations in human-designed architectures. Then to enable efficient search, a coarsening strategy is proposed to continuously relax the search space, thus a differentiable search method can be adopted. Extensive experiments on six real-world datasets from three domains are conducted, and the results demonstrate the effectiveness and efficiency of the proposed framework.	翻訳日:2021-08-25 14:14:42 公開日:2021-08-24
# DeepSleepNet-Lite:不確かさ推定による簡易型自動睡眠ステージスコアモデル DeepSleepNet-Lite: A Simplified Automatic Sleep Stage Scoring Model with Uncertainty Estimates ( http://arxiv.org/abs/2108.10600v1 ) ライセンス: Link先を確認	Luigi Fiorillo, Paolo Favaro, and Francesca Dalia Faraci	(参考訳) ディープラーニングは最新の自動睡眠スコアリングアルゴリズムで広く利用されている。その人気は、優れたパフォーマンスと生信号を直接処理し、データから特徴を学習する能力に起因している。既存のスコアリングアルゴリズムの多くは、大量のトレーニングパラメータと入力中の長い時間シーケンス(最大12分)のために、非常に計算に要求されるアーキテクチャを利用する。これらのアーキテクチャのうち、モデルの不確実性の推定を提供するものはごくわずかである。本研究では,90秒のEEG入力シーケンスのみを処理する簡易軽量スコアリングアーキテクチャであるDeepSleepNet-Liteを提案する。睡眠スコアリングにおいて,モンテカルロドロップアウト手法を初めて活用し,アーキテクチャの性能向上と不確定なインスタンスの検出に活用した。オープンソースのSleep-EDF拡張データベースから単一チャネルのEEG Fpz-Czで評価を行う。 DeepSleepNet-Liteは、既存の最先端アーキテクチャと比較して、性能が若干低いが、全体的な精度ではマクロF1スコアとコーエンのカッパ(Sleep-EDF v1-2013 +/30mins:84.0%, 78.0%, 0.78; on Sleep-EDF v2-2018 +/30mins: 80.3%, 75.2%, 0.73)である。モンテカルロドロップアウトは不確定な予測の推定を可能にする。不確実なインスタンスを拒絶することで、このモデルはデータベースの両バージョンでより高いパフォーマンスを達成する(Sleep-EDF v1-2013 +/-30mins: 86.1.0%, 79.6%, 0.81; on Sleep-EDF v2-2018 +/-30mins: 82.3%, 76.7%, 0.76)。より軽い睡眠スコアリングアプローチは、リアルタイムで睡眠分析を行うためのスコアリングアルゴリズムの応用への道を開く。 Deep learning is widely used in the most recent automatic sleep scoring algorithms. Its popularity stems from its excellent performance and from its ability to directly process raw signals and to learn feature from the data. Most of the existing scoring algorithms exploit very computationally demanding architectures, due to their high number of training parameters, and process lengthy time sequences in input (up to 12 minutes). Only few of these architectures provide an estimate of the model uncertainty. In this study we propose DeepSleepNet-Lite, a simplified and lightweight scoring architecture, processing only 90-seconds EEG input sequences. We exploit, for the first time in sleep scoring, the Monte Carlo dropout technique to enhance the performance of the architecture and to also detect the uncertain instances. The evaluation is performed on a single-channel EEG Fpz-Cz from the open source Sleep-EDF expanded database. DeepSleepNet-Lite achieves slightly lower performance, if not on par, compared to the existing state-of-the-art architectures, in overall accuracy, macro F1-score and Cohen's kappa (on Sleep-EDF v1-2013 +/-30mins: 84.0%, 78.0%, 0.78; on Sleep-EDF v2-2018 +/-30mins: 80.3%, 75.2%, 0.73). Monte Carlo dropout enables the estimate of the uncertain predictions. By rejecting the uncertain instances, the model achieves higher performance on both versions of the database (on Sleep-EDF v1-2013 +/-30mins: 86.1.0%, 79.6%, 0.81; on Sleep-EDF v2-2018 +/-30mins: 82.3%, 76.7%, 0.76). Our lighter sleep scoring approach paves the way to the application of scoring algorithms for sleep analysis in real-time.	翻訳日:2021-08-25 14:14:17 公開日:2021-08-24
# 一般化ページランクを用いた適応的・解釈可能なグラフ畳み込みネットワーク Adaptive and Interpretable Graph Convolution Networks Using Generalized Pagerank ( http://arxiv.org/abs/2108.10636v1 ) ライセンス: Link先を確認	Kishan Wimalawarne and Taiji Suzuki	(参考訳) 深層gcnモデルにおける適応層間グラフ畳み込みについて検討する。我々は、GCNIIネットワークの各層で一般化されたページランクを学習し、適応的な畳み込みを誘導するAdaGPRを提案する。 AdaGPR の一般化は正規化隣接行列の固有値スペクトルの多項式によって一般化されたページランク係数の順に有界であることが示される。一般化境界の解析により、オーバースムーシングは正規化隣接行列の高次による畳み込みとモデルの深さの両方に依存することが分かる。我々は,ベンチマーク実データを用いたノード分類の評価を行い,既存のグラフ畳み込みネットワークに比べてadagprは精度が向上し,オーバースムーシングに対するロバスト性が示された。さらに、レイヤーワイズ一般化ページランクの係数の解析により、モデル解釈を可能にする各レイヤにおける畳み込みを質的に理解できることを示す。 We investigate adaptive layer-wise graph convolution in deep GCN models. We propose AdaGPR to learn generalized Pageranks at each layer of a GCNII network to induce adaptive convolution. We show that the generalization bound for AdaGPR is bounded by a polynomial of the eigenvalue spectrum of the normalized adjacency matrix in the order of the number of generalized Pagerank coefficients. By analysing the generalization bounds we show that oversmoothing depends on both the convolutions by the higher orders of the normalized adjacency matrix and the depth of the model. We performed evaluations on node-classification using benchmark real data and show that AdaGPR provides improved accuracies compared to existing graph convolution networks while demonstrating robustness against oversmoothing. Further, we demonstrate that analysis of coefficients of layer-wise generalized Pageranks allows us to qualitatively understand convolution at each layer enabling model interpretations.	翻訳日:2021-08-25 14:13:35 公開日:2021-08-24
# シンボリック回帰におけるトレーニングデータ削減のためのデータ集約 Data Aggregation for Reducing Training Data in Symbolic Regression ( http://arxiv.org/abs/2108.10660v1 ) ライセンス: Link先を確認	Lukas Kammerer, Gabriel Kronberger, Michael Kommenda	(参考訳) データの量が増えると、遺伝的プログラミングによるシンボリック回帰のような計算量の多い機械学習技術がますます非現実的になる。本研究は,学習データを削減する手法と遺伝的プログラミングのランタイムについて述べる。データは、実際の機械学習アルゴリズムを実行する前に、前処理ステップに集約される。 K平均クラスタリングとデータビンニングはデータアグリゲーションに使われ、最も単純なデータリダクション法としてランダムサンプリングと比較される。実世界の4つのデータセットにおいて,学習における高速化と学習モデルへの影響を分析し,各手法の精度を検証した。遺伝的プログラミングの性能は、ランダムな森林と線形回帰と比較される。その結果、k平均とランダムサンプリングは、データサイズに比例するスピードアップの一方で、元のデータの30%に削減された場合、テスト精度が極めて低下することが示された。逆にバインディングは、非常に高いテストエラーのモデルにつながる。 The growing volume of data makes the use of computationally intense machine learning techniques such as symbolic regression with genetic programming more and more impractical. This work discusses methods to reduce the training data and thereby also the runtime of genetic programming. The data is aggregated in a preprocessing step before running the actual machine learning algorithm. K-means clustering and data binning is used for data aggregation and compared with random sampling as the simplest data reduction method. We analyze the achieved speed-up in training and the effects on the trained models test accuracy for every method on four real-world data sets. The performance of genetic programming is compared with random forests and linear regression. It is shown, that k-means and random sampling lead to very small loss in test accuracy when the data is reduced down to only 30% of the original data, while the speed-up is proportional to the size of the data set. Binning on the contrary, leads to models with very high test error.	翻訳日:2021-08-25 14:13:21 公開日:2021-08-24
# エネルギー時系列予測-従来モデルおよび機械学習モデルの分析および経験的評価 Energy time series forecasting-Analytical and empirical assessment of conventional and machine learning models ( http://arxiv.org/abs/2108.10663v1 ) ライセンス: Link先を確認	Hala Hamdoun, Alaa Sagheer and Hassan Youness	(参考訳) エネルギー時系列予測(tsf)問題を解く従来の手法の候補として,機械学習手法が文献に採用されている。近年,人工知能分野において,幅広い応用において驚くべき性能を発揮する深層学習手法が出現している。しかし、そのエネルギーのtsf問題を解決するための性能に関する証拠は、正確さと計算の要求の観点からは、乏しい。エネルギーTSF問題を扱うレビュー記事の大部分は体系的なレビューであるが、エネルギーTSF問題に対する質的かつ定量的な研究は文献ではまだ行われていない。本論文の目的は2つであり、まず、従来の機械学習と深層学習を総合的に分析し、様々なエネルギー的TSF問題の解法として活用することである。第2に,実世界の3つのデータセットを用いて,選択した手法の実証評価を行う。家庭問題における電力消費問題, 天然ガス問題, 電力消費に関するこれらのデータセットは, 最初の2つの問題は不定形tsfであり, 3つ目の問題は多変量tsfである。従来型と機械学習の両競技者に比較して, 深層学習法は, 精度と予測地平線を著しく改善した。平均時において、計算の要求は他の競争相手よりも顕著に大きい。論文は最終的に、エネルギー予測領域におけるさらなる研究の基盤として、多くの課題、研究の方向性、研究コミュニティへの勧告を特定する。 Machine learning methods have been adopted in the literature as contenders to conventional methods to solve the energy time series forecasting (TSF) problems. Recently, deep learning methods have been emerged in the artificial intelligence field attaining astonishing performance in a wide range of applications. Yet, the evidence about their performance in to solve the energy TSF problems, in terms of accuracy and computational requirements, is scanty. Most of the review articles that handle the energy TSF problem are systematic reviews, however, a qualitative and quantitative study for the energy TSF problem is not yet available in the literature. The purpose of this paper is twofold, first it provides a comprehensive analytical assessment for conventional,machine learning, and deep learning methods that can be utilized to solve various energy TSF problems. Second, the paper carries out an empirical assessment for many selected methods through three real-world datasets. These datasets related to electrical energy consumption problem, natural gas problem, and electric power consumption of an individual household problem.The first two problems are univariate TSF and the third problem is a multivariate TSF. Com-pared to both conventional and machine learning contenders, the deep learning methods attain a significant improvement in terms of accuracy and forecasting horizons examined. In the mean-time, their computational requirements are notably greater than other contenders. Eventually,the paper identifies a number of challenges, potential research directions, and recommendations to the research community may serve as a basis for further research in the energy forecasting domain.	翻訳日:2021-08-25 14:13:07 公開日:2021-08-24
# 電力予測に着目した回帰問題に対する適応的説明型連続学習フレームワーク Adaptive Explainable Continual Learning Framework for Regression Problems with Focus on Power Forecasts ( http://arxiv.org/abs/2108.10781v1 ) ライセンス: Link先を確認	Yujiang He	(参考訳) 従来のディープラーニング技術と比較して、連続学習はディープニューラルネットワークを継続的に適応的に学習することを可能にする。ディープニューラルネットワークは、新しいタスクを学習し、アプリケーションのデータ量が増加し続けるにつれて、古いタスクから得られた知識を克服しなければならない。本稿では,この文脈における潜在的な課題を説明するために,2つの連続学習シナリオを提案する。さらに、回帰タスクの継続的な学習に短いCLeaRフレームワークに関するこれまでの研究に基づいて、モデルが自分自身を拡張し、データを連続的に学習できるように、さらに開発を進めていく予定です。研究トピックは関連するが、継続的なディープラーニングアルゴリズムの開発、データストリームにおける非定常検出戦略、説明可能で可視化可能な人工知能などに限定されない。さらに、フレームワークとアルゴリズム関連のハイパーパラメータをアプリケーションで動的に更新する必要がある。実世界のアプリケーションから収集した発電および消費データに基づいて予測実験を行う。一連の総合的な評価指標と視覚化ツールは、実験結果の分析に役立つ。提案されたフレームワークは、他の絶えず変化するシナリオに一般的に適用されることが期待される。 Compared with traditional deep learning techniques, continual learning enables deep neural networks to learn continually and adaptively. Deep neural networks have to learn new tasks and overcome forgetting the knowledge obtained from the old tasks as the amount of data keeps increasing in applications. In this article, two continual learning scenarios will be proposed to describe the potential challenges in this context. Besides, based on our previous work regarding the CLeaR framework, which is short for continual learning for regression tasks, the work will be further developed to enable models to extend themselves and learn data successively. Research topics are related but not limited to developing continual deep learning algorithms, strategies for non-stationarity detection in data streams, explainable and visualizable artificial intelligence, etc. Moreover, the framework- and algorithm-related hyperparameters should be dynamically updated in applications. Forecasting experiments will be conducted based on power generation and consumption data collected from real-world applications. A series of comprehensive evaluation metrics and visualization tools can help analyze the experimental results. The proposed framework is expected to be generally applied to other constantly changing scenarios.	翻訳日:2021-08-25 14:12:45 公開日:2021-08-24
# 効率的な理論推論のためのグラフコントラスト事前学習 Graph Contrastive Pre-training for Effective Theorem Reasoning ( http://arxiv.org/abs/2108.10821v1 ) ライセンス: Link先を確認	Zhaoyu Li, Binghong Chen, Xujie Si	(参考訳) インタラクティブな定理証明は困難で退屈なプロセスであり、人間の専門家からの非自明な専門知識と詳細な低レベルな指示(または戦術)を必要とする。戦術予測はこのプロセスを自動化する自然な方法です。既存の手法は、人間の専門家による証明からディープニューラルネットワーク(DNN)に基づくモデルを学ぶことによって、戦術予測に関する有望な結果を示す。本稿では,定理証明のための表現学習の改善に焦点を絞った新しい拡張であるニューロタクティクスを提案する。 NeuroTacticは、グラフニューラルネットワーク(GNN)を利用して、定理と前提を表現し、事前学習にグラフコントラスト学習を適用する。定理の表現学習が戦術予測に不可欠であることを実証する。他の方法と比較して、NeuroTacticはCoqGymデータセット上で最先端のパフォーマンスを達成する。 Interactive theorem proving is a challenging and tedious process, which requires non-trivial expertise and detailed low-level instructions (or tactics) from human experts. Tactic prediction is a natural way to automate this process. Existing methods show promising results on tactic prediction by learning a deep neural network (DNN) based model from proofs written by human experts. In this paper, we propose NeuroTactic, a novel extension with a special focus on improving the representation learning for theorem proving. NeuroTactic leverages graph neural networks (GNNs) to represent the theorems and premises, and applies graph contrastive learning for pre-training. We demonstrate that the representation learning of theorems is essential to predict tactics. Compared with other methods, NeuroTactic achieves state-of-the-art performance on the CoqGym dataset.	翻訳日:2021-08-25 14:12:29 公開日:2021-08-24
# CMML:コールドスタート勧告のためのコンテキスト変調メタ学習 CMML: Contextual Modulation Meta Learning for Cold-Start Recommendation ( http://arxiv.org/abs/2108.10511v1 ) ライセンス: Link先を確認	Xidong Feng, Chen Chen, Dong Li, Mengchen Zhao, Jianye Hao, Jun Wang	(参考訳) 実践的なレコメンデータシステムは、過去におけるユーザ・イテム間のインタラクションが不十分である場合、コールドスタートの問題を経験します。メタ学習、特に勾配に基づく学習は、モデルの初期パラメータを学習することでこの問題に対処し、限られたデータ例から特定のタスクへの迅速な適応を可能にする。性能が大幅に向上したにもかかわらず、主な産業展開との非互換性と、インナーループ勾配操作による計算負荷という2つの重大な問題に悩まされる。これら2つの問題は,実用的なレコメンデーションシステムでは適用が困難である。メタ学習フレームワークの利点を享受し、これらの問題を緩和するために、文脈変調メタ学習(cmml)と呼ばれる推奨フレームワークを提案する。 CMMLは完全なフィードフォワード操作で構成されており、計算効率が良く、主要な産業展開と完全に互換性がある。 CMMLは、特定のタスクを表現するためにコンテキストエンコーダを生成するコンテキストエンコーダ、タスクレベルのコンテキストで特定のユーザオブジェクトの特徴を集約するハイブリッドコンテキストジェネレータ、そして、効率的に適応するためにレコメンデーションモデルを変調できるコンテキスト変調ネットワークを含む3つのコンポーネントから構成される。本手法は,様々な実世界のデータセット上でのシナリオ固有のコールドスタート設定とユーザ固有のコールドスタート設定の両方に対して検証し,より高い計算効率とより優れた解釈性を備えた勾配法でCMMLが同等あるいはそれ以上の性能を達成可能であることを示す。 Practical recommender systems experience a cold-start problem when observed user-item interactions in the history are insufficient. Meta learning, especially gradient based one, can be adopted to tackle this problem by learning initial parameters of the model and thus allowing fast adaptation to a specific task from limited data examples. Though with significant performance improvement, it commonly suffers from two critical issues: the non-compatibility with mainstream industrial deployment and the heavy computational burdens, both due to the inner-loop gradient operation. These two issues make them hard to be applied in practical recommender systems. To enjoy the benefits of meta learning framework and mitigate these problems, we propose a recommendation framework called Contextual Modulation Meta Learning (CMML). CMML is composed of fully feed-forward operations so it is computationally efficient and completely compatible with the mainstream industrial deployment. CMML consists of three components, including a context encoder that can generate context embedding to represent a specific task, a hybrid context generator that aggregates specific user-item features with task-level context, and a contextual modulation network, which can modulate the recommendation model to adapt effectively. We validate our approach on both scenario-specific and user-specific cold-start setting on various real-world datasets, showing CMML can achieve comparable or even better performance with gradient based methods yet with much higher computational efficiency and better interpretability.	翻訳日:2021-08-25 14:12:03 公開日:2021-08-24
# autoencoder-based semantic novelty detection: towards dependable ai-based systems Autoencoder-based Semantic Novelty Detection: Towards Dependable AI-based Systems ( http://arxiv.org/abs/2108.10851v1 ) ライセンス: Link先を確認	Andreas Rausch, Azarmidokht Motamedi Sedeh, Meng Zhang	(参考訳) 無人タクシーのような多くの自律システムは、安全上重要な機能を果たす。自律システムは、特に環境認識のために人工知能(AI)技術を採用している。エンジニアはAIベースの自律システムを完全にテストしたり、正式に検証することはできない。 aiベースのシステムの精度は、トレーニングデータの品質に依存する。これにより、訓練に使用するデータと何らかの点で異なる新規検出データが、システム開発及び運用の安全対策となる。本稿では, 意味的オートエンコーダトポロジーのためのアーキテクチャガイドラインと, 意味的エラー計算をノベルティ基準として, オートエンコーダに基づく意味的ノベルティ検出のための新しいアーキテクチャを提案する。このような意味的新規性検出は、偽陰性を最小化することにより、文献から知られているオートエンコーダに基づく新規性検出アプローチよりも優れていることを実証する。 Many autonomous systems, such as driverless taxis, perform safety critical functions. Autonomous systems employ artificial intelligence (AI) techniques, specifically for the environment perception. Engineers cannot completely test or formally verify AI-based autonomous systems. The accuracy of AI-based systems depends on the quality of training data. Thus, novelty detection - identifying data that differ in some respect from the data used for training - becomes a safety measure for system development and operation. In this paper, we propose a new architecture for autoencoder-based semantic novelty detection with two innovations: architectural guidelines for a semantic autoencoder topology and a semantic error calculation as novelty criteria. We demonstrate that such a semantic novelty detection outperforms autoencoder-based novelty detection approaches known from literature by minimizing false negatives.	翻訳日:2021-08-25 14:11:37 公開日:2021-08-24
# 高次MOTスケーラブル化:リフテッド不整合経路の効率的な近似解法 Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths ( http://arxiv.org/abs/2108.10606v1 ) ライセンス: Link先を確認	Andrea Hornakova, Timo Kaiser, Paul Swoboda, Michal Rolinek, Bodo Rosenhahn, Roberto Henschel	(参考訳) 本稿では,複数物体追跡(MOT)のための自然だがNPハードなモデルであるリフトド・ディスジョイント・パス問題(LDP)に対する効率的な近似メッセージパッシング法を提案する。私たちのトラッカーは、長いMOTシーケンスから来る非常に大きなインスタンスにスケールします。近似解法により,ソリューションの品質を犠牲にすることなくMOT15/16/17ベンチマークを処理でき,そのサイズと複雑さから,現在まで LDP ソルバには及ばないMOT20を解くことができる。これら4つの標準MOTベンチマークにおいて、最適 LDP ソルバに基づくトラッカーを含む最先端の手法と同等あるいは同等の性能を達成する。 We present an efficient approximate message passing solver for the lifted disjoint paths problem (LDP), a natural but NP-hard model for multiple object tracking (MOT). Our tracker scales to very large instances that come from long and crowded MOT sequences. Our approximate solver enables us to process the MOT15/16/17 benchmarks without sacrificing solution quality and allows for solving MOT20, which has been out of reach up to now for LDP solvers due to its size and complexity. On all these four standard MOT benchmarks we achieve performance comparable or better than current state-of-the-art methods including a tracker based on an optimal LDP solver.	翻訳日:2021-08-25 14:11:06 公開日:2021-08-24
# 近距離切削車両からの噴霧のベンチマーク A Benchmark for Spray from Nearby Cutting Vehicles ( http://arxiv.org/abs/2108.10800v1 ) ライセンス: Link先を確認	Stefanie Walz, Mario Bijelic, Florian Kraus, Werner Ritter, Martin Simon, Igor Doric	(参考訳) 現在の運転支援システムと自律運転スタックは、明確に定義された環境条件と地理フェンスで囲まれた領域に限られている。悪天候下での運転安全を高めるためには、自動運転と運転支援システムの適用範囲を広げる必要がある。この開発を可能にするために、期待される歪みを定量化するために再現可能なベンチマーク手法が必要である。本発表では,噴霧による乱れの検査方法について述べる。噴霧による乱れを評価するための評価スキームとともに、軽量で構成可能な新しい噴霧装置を導入する。この分析は、自動車用RGBカメラと2種類のLiDARシステム、およびYOLOv3とPV-RCNNに基づく下流検出アルゴリズムをカバーする。密閉車両の一般的なシナリオでは、歪みが最大4秒までの知覚スタックに深刻な影響を与えており、スプレーの影響をベンチマークする必要性が示されている。 Current driver assistance systems and autonomous driving stacks are limited to well-defined environment conditions and geo fenced areas. To increase driving safety in adverse weather conditions, broadening the application spectrum of autonomous driving and driver assistance systems is necessary. In order to enable this development, reproducible benchmarking methods are required to quantify the expected distortions. In this publication, a testing methodology for disturbances from spray is presented. It introduces a novel lightweight and configurable spray setup alongside an evaluation scheme to assess the disturbances caused by spray. The analysis covers an automotive RGB camera and two different LiDAR systems, as well as downstream detection algorithms based on YOLOv3 and PV-RCNN. In a common scenario of a closely cutting vehicle, it is visible that the distortions are severely affecting the perception stack up to four seconds showing the necessity of benchmarking the influences of spray.	翻訳日:2021-08-25 14:10:53 公開日:2021-08-24
# 偏光イメージングによる複合積層板の欠陥自動検出のための次世代認識システム Next-generation perception system for automated defects detection in composite laminates via polarized computational imaging ( http://arxiv.org/abs/2108.10819v1 ) ライセンス: Link先を確認	Yuqi Ding, Jinwei Ye, Corina Barbalata, James Oubre, Chandler Lemoine, Jacob Agostinho, Genevieve Palardy	(参考訳) トリミングやサンディングを含む風力タービンブレードのような大型複合部品の仕上げ作業には、複数の作業員と部品の再配置が必要となる。複合材料製造業界では、製造部品の形状が不整合であり、作業完了は人間の判断と経験に基づくため、そのようなプロセスの自動化は困難である。動的で不確実な環境で仕上げ作業を行うことができる移動ロボットシステムを実装することで、品質が向上し製造コストが低下する。与えられたタスクを完了させるためには、協調ロボットチームは環境を適切に理解し、製造部品の異常を検出する必要がある。本稿では,複合積層板の欠陥を識別する偏極型画像処理システムの初期実装と実演について述べる。ポラリメトリック画像は表面マイクロジオメトリと非常に関係があるため、従来のカラー画像では見えない表面欠陥を検出するのに使うことができる。提案した視覚システムは, ガラス繊維および炭素繊維積層体の欠陥タイプと表面特性(ピンホール, ヴォイド, 引っかき傷, 樹脂フラッシュなど)の同定に成功している。 Finishing operations on large-scale composite components like wind turbine blades, including trimming and sanding, often require multiple workers and part repositioning. In the composites manufacturing industry, automation of such processes is challenging, as manufactured part geometry may be inconsistent and task completion is based on human judgment and experience. Implementing a mobile, collaborative robotic system capable of performing finishing tasks in dynamic and uncertain environments would improve quality and lower manufacturing costs. To complete the given tasks, the collaborative robotic team must properly understand the environment and detect irregularities in the manufactured parts. In this paper, we describe the initial implementation and demonstration of a polarized computational imaging system to identify defects in composite laminates. As the polarimetric images are highly relevant to the surface micro-geometry, they can be used to detect surface defects that are not visible in conventional color images. The proposed vision system successfully identifies defect types and surface characteristics (e.g., pinholes, voids, scratches, resin flash) for different glass fiber and carbon fiber laminates.	翻訳日:2021-08-25 14:10:42 公開日:2021-08-24
# 効率的な長期記憶を有する量子適応エージェント Quantum adaptive agents with efficient long-term memories ( http://arxiv.org/abs/2108.10876v1 ) ライセンス: Link先を確認	Thomas J. Elliott, Mile Gu, Andrew J. P. Garner, Jayne Thompson	(参考訳) 適応システムの成功の中心は、環境からの信号を解釈し、それに応じて反応する能力である。このようなエージェントは、ますます複雑な戦略を実行することができると、通常より良く機能する。エージェントが過去の経験から思い出さなければならない情報が多ければ多いほど、必要なメモリが増えます。本稿では,量子情報処理が可能なエージェントのパワーについて検討する。我々は、量子エージェントがメモリ圧縮の利点を最大化するために採用する必要がある最も一般的な形式を明らかにし、そのメモリ状態を体系的にエンコーディングする手段を提供する。これらのエンコーディングは,メモリ最小の旧来のエージェントと比較して,過去のイベントに関する情報を保存しなければならない場合,非常に有利なスケーリングの利点を示す。 Central to the success of adaptive systems is their ability to interpret signals from their environment and respond accordingly -- they act as agents interacting with their surroundings. Such agents typically perform better when able to execute increasingly complex strategies. This comes with a cost: the more information the agent must recall from its past experiences, the more memory it will need. Here we investigate the power of agents capable of quantum information processing. We uncover the most general form a quantum agent need adopt to maximise memory compression advantages, and provide a systematic means of encoding their memory states. We show these encodings can exhibit extremely favourable scaling advantages relative to memory-minimal classical agents when information must be retained about events increasingly far into the past.	翻訳日:2021-08-25 14:09:52 公開日:2021-08-24
# 反論可能な推奨事項 Counterfactual Explainable Recommendation ( http://arxiv.org/abs/2108.10539v1 ) ライセンス: Link先を確認	Juntao Tan, Shuyuan Xu, Yingqiang Ge, Yunqi Li, Xu Chen, Yongfeng Zhang	(参考訳) ユーザやシステム設計者がより理解と意思決定を容易にするために説明を提供することで、説明可能な推奨は重要な研究課題となっている。本稿では,説明可能な推薦のための因果推論から反事実推論の考察を取り入れた,反事実説明可能な推薦(カウンタ)を提案する。 counterは、説明の複雑さと強みを定式化することができ、モデル決定のための単純(低複雑さ)かつ効果的な(高強度)説明を求めるために、反事実学習フレームワークを採用している。技術的には、各ユーザーに推奨される各項目について、カウンタ最適化問題を定式化し、項目の側面に最小限の変更を発生させ、反事実項目の推奨決定を逆転させる反事実項目を作成する。これらの変更は、なぜオリジナルの項目が推奨されるのかの説明である。反事実的な説明は、ユーザとシステムデザイナの両方がよりよいモデルデバッグのために役立ちます。この作業のもうひとつの貢献は、説明可能な推奨の評価である。幸いなことに、反実的な説明は標準的な定量的評価に非常に適している。説明の質を評価するために,ユーザの視点から2種類の評価指標を設計する。ユーザがそのアイテムを好む理由) と、モデルの観点から見た他のもの(すなわち、) なぜそのアイテムがモデルによって推奨されるのか) 提案手法をブラックボックスレコメンデータシステムに適用し,実世界の5つのデータセット上で生成した説明を評価する。その結果,本モデルは最先端のレコメンデーションモデルよりも正確かつ効果的に説明できることがわかった。 By providing explanations for users and system designers to facilitate better understanding and decision making, explainable recommendation has been an important research problem. In this paper, we propose Counterfactual Explainable Recommendation (CountER), which takes the insights of counterfactual reasoning from causal inference for explainable recommendation. CountER is able to formulate the complexity and the strength of explanations, and it adopts a counterfactual learning framework to seek simple (low complexity) and effective (high strength) explanations for the model decision. Technically, for each item recommended to each user, CountER formulates a joint optimization problem to generate minimal changes on the item aspects so as to create a counterfactual item, such that the recommendation decision on the counterfactual item is reversed. These altered aspects constitute the explanation of why the original item is recommended. The counterfactual explanation helps both the users for better understanding and the system designers for better model debugging. Another contribution of the work is the evaluation of explainable recommendation, which has been a challenging task. Fortunately, counterfactual explanations are very suitable for standard quantitative evaluation. To measure the explanation quality, we design two types of evaluation metrics, one from user's perspective (i.e. why the user likes the item), and the other from model's perspective (i.e. why the item is recommended by the model). We apply our counterfactual learning algorithm on a black-box recommender system and evaluate the generated explanations on five real-world datasets. Results show that our model generates more accurate and effective explanations than state-of-the-art explainable recommendation models.	翻訳日:2021-08-25 14:08:49 公開日:2021-08-24
# フェデレーション学習におけるユーザ貢献度データフリー評価 Data-Free Evaluation of User Contributions in Federated Learning ( http://arxiv.org/abs/2108.10623v1 ) ライセンス: Link先を確認	Hongtao Lv, Zhenzhe Zheng, Tie Luo, Fan Wu, Shaojie Tang, Lifeng Hua, Rongfei Jia, Chengfei Lv	(参考訳) Federated Learning (FL)は、モバイルデバイス上の機械学習モデルを、各デバイスのプライベートデータとコンピューティングリソースを使用して分散的にトレーニングする。重要な問題は,(1)モデルトレーニングにおけるユーザの努力を適切なインセンティブで補償し,(2)悪意のある低品質ユーザの検出と削除を可能にするために,個々のユーザの貢献を評価することである。最先端のソリューションは評価目的のために代表的なテストデータセットを必要とするが、そのようなデータセットはしばしば利用できず、合成も困難である。本稿では,テストデータセットを使わずにflにおけるユーザの貢献度を評価するピア予測の考え方に基づいて,ペアワイズ相関合意(pca)と呼ばれる手法を提案する。 pcaはユーザーがアップロードしたモデルパラメータの統計相関を用いてこれを達成する。次に,(1)Fed-PCAと呼ばれる新しいフェデレーション学習アルゴリズム,(2)真性を保証する新たなインセンティブメカニズムを設計に適用する。 MNISTデータセットと大規模産業製品レコメンデーションデータセットを用いてPCAとFed-PCAの性能を評価する。その結果、我々のFed-PCAは標準のFedAvgアルゴリズムや他のベースライン手法を精度良く上回り、同時にPCAはユーザーが真実に振る舞うことを効果的に動機づけることを示した。 Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources. A critical issues is to evaluate individual users' contributions so that (1) users' effort in model training can be compensated with proper incentives and (2) malicious and low-quality users can be detected and removed. The state-of-the-art solutions require a representative test dataset for the evaluation purpose, but such a dataset is often unavailable and hard to synthesize. In this paper, we propose a method called Pairwise Correlated Agreement (PCA) based on the idea of peer prediction to evaluate user contribution in FL without a test dataset. PCA achieves this using the statistical correlation of the model parameters uploaded by users. We then apply PCA to designing (1) a new federated learning algorithm called Fed-PCA, and (2) a new incentive mechanism that guarantees truthfulness. We evaluate the performance of PCA and Fed-PCA using the MNIST dataset and a large industrial product recommendation dataset. The results demonstrate that our Fed-PCA outperforms the canonical FedAvg algorithm and other baseline methods in accuracy, and at the same time, PCA effectively incentivizes users to behave truthfully.	翻訳日:2021-08-25 14:08:25 公開日:2021-08-24
# シンボリック回帰における遺伝的操作の有効性について On the Effectiveness of Genetic Operations in Symbolic Regression ( http://arxiv.org/abs/2108.10661v1 ) ライセンス: Link先を確認	Bogdan Burlacu, Michael Affenzeller, Michael Kommenda	(参考訳) 本稿では,遺伝的プログラミング(GP)の進化的ダイナミクスを遺伝情報,多様性尺度,親から子への適合度変化に関する情報を用いて解析する手法について述べる。個体構造における遺伝子の出自を同定する新たなサブツリー追跡手法を導入し, 個体群における最良解の進化に寄与しているのは, ごく少数の祖先個体のみであることを示す。 This paper describes a methodology for analyzing the evolutionary dynamics of genetic programming (GP) using genealogical information, diversity measures and information about the fitness variation from parent to offspring. We introduce a new subtree tracing approach for identifying the origins of genes in the structure of individuals, and we show that only a small fraction of ancestor individuals are responsible for the evolvement of the best solutions in the population.	翻訳日:2021-08-25 14:08:02 公開日:2021-08-24
# オープンバンキングのための連合学習 Federated Learning for Open Banking ( http://arxiv.org/abs/2108.10749v1 ) ライセンス: Link先を確認	Guodong Long, Yue Tan, Jing Jiang, Chengqi Zhang	(参考訳) オープンバンキングは、個々の顧客が自分の銀行データを所有することを可能にし、データマーケットプレースと金融サービスの新たなエコシステムの促進に対する基本的なサポートを提供する。近い将来,連合学習を用いた金融分野におけるデータ所有の分散化が期待できる。これは、分散型トレーニング方法でインテリジェントなモデルを学習できるジャストインタイム技術である。フェデレーション学習の最も魅力的な側面は、プライベートデータを収集することなく、モデルトレーニングを集中型サーバと分散ノードに分解する能力である。この種の分解学習フレームワークは、ユーザのプライバシと機密データを保護する大きな可能性を秘めている。したがって、連合学習は、オープンバンキングデータ市場と自然に結合する。この章では、オープンバンキングの文脈で連合学習を適用する際の課題について論じ、それに対応するソリューションも検討されている。 Open banking enables individual customers to own their banking data, which provides fundamental support for the boosting of a new ecosystem of data marketplaces and financial services. In the near future, it is foreseeable to have decentralized data ownership in the finance sector using federated learning. This is a just-in-time technology that can learn intelligent models in a decentralized training manner. The most attractive aspect of federated learning is its ability to decompose model training into a centralized server and distributed nodes without collecting private data. This kind of decomposed learning framework has great potential to protect users' privacy and sensitive data. Therefore, federated learning combines naturally with an open banking data marketplaces. This chapter will discuss the possible challenges for applying federated learning in the context of open banking, and the corresponding solutions have been explored as well.	翻訳日:2021-08-25 14:07:54 公開日:2021-08-24
# リプシッツ誘導体を用いた一変量関数の大域的最適化の回帰解析 Regret Analysis of Global Optimization in Univariate Functions with Lipschitz Derivatives ( http://arxiv.org/abs/2108.10859v1 ) ライセンス: Link先を確認	Kaan Gokcesu, Hakan Gokcesu	(参考訳) 本研究では,不定損失関数における大域的最適化の問題について検討し,一般的な下限アルゴリズム(例えばpiyavskii-shubertアルゴリズム)の後悔を分析する。任意の時間に$T$(これは最高の見積とグローバルオプティマイザの間の損失の差である)という広く利用可能な単純な後悔の代わりに、累積的後悔をその時点まで調査する。適切な下限アルゴリズムを用いることで、異なる関数のクラスに対して満足のいく累積後悔境界を実現できることを示す。パラメータ $L$ を持つリプシッツ連続函数に対して、累積後悔は$O(L\log T)$であることを示す。パラメータ $H$ を持つ滑らかなリプシッツ函数に対して、累積後悔は $O(H)$ であることを示す。また、リプシッツ連続函数と滑らか函数の両方を個別にカバーするより広範な関数のクラスについて解析的に結果を拡張する。 In this work, we study the problem of global optimization in univariate loss functions, where we analyze the regret of the popular lower bounding algorithms (e.g., Piyavskii-Shubert algorithm). For any given time $T$, instead of the widely available simple regret (which is the difference of the losses between the best estimation up to $T$ and the global optimizer), we study the cumulative regret up to that time. With a suitable lower bounding algorithm, we show that it is possible to achieve satisfactory cumulative regret bounds for different classes of functions. For Lipschitz continuous functions with the parameter $L$, we show that the cumulative regret is $O(L\log T)$. For Lipschitz smooth functions with the parameter $H$, we show that the cumulative regret is $O(H)$. We also analytically extend our results for a broader class of functions that covers both the Lipschitz continuous and smooth functions individually.	翻訳日:2021-08-25 14:07:39 公開日:2021-08-24
# 深信号FBSDEアルゴリズム Deep Signature FBSDE Algorithm ( http://arxiv.org/abs/2108.10504v1 ) ライセンス: Link先を確認	Qi Feng, Man Luo, Zhaoyu Zhang	(参考訳) 本研究では,前向き確率微分方程式 (FBSDEs) を状態と経路に依存する特徴を持つディープシグネチャ/log-signature FBSDEアルゴリズムを提案する。ニューラルネット(RNN)モデルにディープシグネチャ/ログ-シグネチャ変換を組み込むことで,トレーニング時間を短縮し,精度を向上し,既存の文献の手法と比較して時間的地平線を延長する。さらに,パラメータ偏微分方程式 (PDE) や経路依存PDE (PPDE) に関連付けられた,高周波データを含む状態と経路依存オプションの価格設定,モデルあいまいさ,確率ゲームなど,幅広い応用に適用することができる。最後に, ディープシグネチャ/log-signature FBSDEアルゴリズムの収束解析を導出する。 We propose a deep signature/log-signature FBSDE algorithm to solve forward-backward stochastic differential equations (FBSDEs) with state and path dependent features. By incorporating the deep signature/log-signature transformation into the recurrent neural network (RNN) model, our algorithm shortens the training time, improves the accuracy, and extends the time horizon comparing to methods in the existing literature. Moreover, our algorithms can be applied to a wide range of applications such as state and path dependent option pricing involving high-frequency data, model ambiguity, and stochastic games, which are linked to parabolic partial differential equations (PDEs), and path-dependent PDEs (PPDEs). Lastly, we also derive the convergence analysis of the deep signature/log-signature FBSDE algorithm.	翻訳日:2021-08-25 14:06:43 公開日:2021-08-24
# 最小囲み球を用いた第4種最適後方精度不確かさトレードオフの不確かさ定量化 Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball ( http://arxiv.org/abs/2108.10517v1 ) ライセンス: Link先を確認	Hamed Hamze Bajgiran and Pau Batlle Franch and Houman Owhadi and Clint Scovel and Mahdy Shirdel and Michael Stanley and Peyman Tavallali	(参考訳) 不確実量化(UQ)には基本的に3種類のアプローチがある: (A) 頑健な最適化、(B) ベイズ的、(C) 決定論。 a) は頑健であるが、正確さとデータの同化に関しては不利である。 (b)前もって必要であり、一般的に脆く、後方推定は遅くなる。 C)は最適な事前の同定につながるが、その近似は次元の呪いに悩まされ、リスクの概念はデータの分布に関して平均化されるものである。我々は, (a), (b), (c) と仮説検定のハイブリッドである4番目の種類を紹介する。これは、サンプルの$x$を観察した後、(1)相対的可能性を通して可能性領域を定義し、(2)その領域でミンマックスゲームを行い、最適推定器とそのリスクを定義する。得られた方法は、(a)データを測定した後に最適な先行性を特定し、(b)リスクの概念は後部であり、(b)最適な推定値の判定とそのリスクは、関心地図の量(次元の呪いの対象ではなく、高速である)に基づいて、確率領域の画像の最小囲い球の計算に還元することができる。この方法は、観測データ(相対可能性)の希少性に仮定された下界として作用する$[0,1]$のパラメータによって特徴づけられる。このパラメータが1ドルに近い場合、この方法は、信頼度が低いUQ推定値で最大推定値の周りに集中した後続分布を生成する。このパラメータが0$に近い場合、この方法は信頼度の高いuq推定値を持つ最大リスク後方分布を生成する。精度不確実性トレードオフのナビゲートに加えて,データ同化に伴うロバスト性-正確性トレードオフをナビゲートすることでベイズ推論の脆性に対処する手法を提案する。 There are essentially three kinds of approaches to Uncertainty Quantification (UQ): (A) robust optimization, (B) Bayesian, (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior, it is generally brittle and posterior estimations can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a 4th kind which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample $x$, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one, (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity of interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $ [0,1]$ acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate with tight but low confidence UQ estimates. When that parameter is near $0$, the method produces a maximal risk posterior distribution with high confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.	翻訳日:2021-08-25 14:06:29 公開日:2021-08-24
# 1対多: 深層学習による重力波探索 From One to Many: A Deep Learning Coincident Gravitational-Wave Search ( http://arxiv.org/abs/2108.10715v1 ) ライセンス: Link先を確認	Marlin B. Sch\"afer (1 and 2), Alexander H. Nitz (1 and 2) ((1) Max-Planck-Institut f\"ur Gravitationsphysik (Albert-Einstein-Institut), (2) Leibniz Universit\"at Hannover)	(参考訳) コンパクト2元源の合体による重力波は、地球結合検出器によって日常的に観測されている。最も敏感な探索アルゴリズムは、多くの異なる計算済みの重力波形を検出器データと組み合わせ、異なる検出器間の一致を探索する。機械学習は、計算コストを削減し、より複雑な信号をターゲットとする探索アルゴリズムを構築するための代替手法として検討されている。本研究では、単一検出器からの非スピン性二元ブラックホールデータに基づいてトレーニングされたニューラルネットワークを用いて、二元ブラックホール融合による重力波の2検出器探索を構築する。ネットワークは2つの観測所のデータに独立して適用され、2つの観測所間で一致したイベントをチェックする。これにより、独立検出器データを時間シフトすることで、大量のバックグラウンドデータの効率的な分析が可能になる。単一検出器の場合、ネットワークは感度マッチングされたフィルタリングの91.5\%$を維持するが、この数は2つの観測値に対して83.9\%$となる。ネットワークが検出器内の信号一貫性をチェックするために、両方の検出器からのデータを直接操作する単純なネットワークセットを構築します。これらの単純な2検出器ネットワークはいずれも、検出器のデータに個別にネットワークを適用し、時間的偶然を検索するよりも感度を向上させることができない。 Gravitational waves from the coalescence of compact-binary sources are now routinely observed by Earth bound detectors. The most sensitive search algorithms convolve many different pre-calculated gravitational waveforms with the detector data and look for coincident matches between different detectors. Machine learning is being explored as an alternative approach to building a search algorithm that has the prospect to reduce computational costs and target more complex signals. In this work we construct a two-detector search for gravitational waves from binary black hole mergers using neural networks trained on non-spinning binary black hole data from a single detector. The network is applied to the data from both observatories independently and we check for events coincident in time between the two. This enables the efficient analysis of large quantities of background data by time-shifting the independent detector data. We find that while for a single detector the network retains $91.5\%$ of the sensitivity matched filtering can achieve, this number drops to $83.9\%$ for two observatories. To enable the network to check for signal consistency in the detectors, we then construct a set of simple networks that operate directly on data from both detectors. We find that none of these simple two-detector networks are capable of improving the sensitivity over applying networks individually to the data from the detectors and searching for time coincidences.	翻訳日:2021-08-25 14:05:36 公開日:2021-08-24
# 適応群lassoニューラルネットワークモデル : 少数の変数と時間依存データの関数について Adaptive Group Lasso Neural Network Models for Functions of Few Variables and Time-Dependent Data ( http://arxiv.org/abs/2108.10825v1 ) ライセンス: Link先を確認	Lam Si Tung Ho and Giang Tran	(参考訳) 本稿では,動的システムから入力データが生成され,対象関数が少数のアクティブ変数や変数の線形結合に依存する高次元関数近似のための適応群lasso深層ニューラルネットワークを提案する。対象関数をディープニューラルネットワークで近似し,対象関数の制約を表現するために,適切な隠れ層の重みに対して適応群lasso制約を強制する。実験により,提案手法は,スパース辞書行列法,グループラッソペナルティの有無のニューラルネットワークなど,最近の最先端手法よりも優れていることが示された。 In this paper, we propose an adaptive group Lasso deep neural network for high-dimensional function approximation where input data are generated from a dynamical system and the target function depends on few active variables or few linear combinations of variables. We approximate the target function by a deep neural network and enforce an adaptive group Lasso constraint to the weights of a suitable hidden layer in order to represent the constraint on the target function. Our empirical studies show that the proposed method outperforms recent state-of-the-art methods including the sparse dictionary matrix method, neural networks with or without group Lasso penalty.	翻訳日:2021-08-25 14:05:14 公開日:2021-08-24
# (参考訳) SERF:log-Softplus ERrorActivation Functionを用いたディープニューラルネットワークのより良いトレーニングを目指して SERF: Towards better training of deep neural networks using log-Softplus ERror activation Function ( http://arxiv.org/abs/2108.09598v2 ) ライセンス: CC BY 4.0	Sayan Nag, Mayukh Bhattacharyya	(参考訳) アクティベーション機能は、トレーニングダイナミクスとニューラルネットワークのパフォーマンスを決定する上で重要な役割を果たす。シンプルで有効であるにもかかわらず広く採用されているアクティベーション関数 ReLU には、Dying ReLU 問題を含むいくつかの欠点がある。そこで本研究では,自然界において自己正規化され,非単調であるサーフと呼ばれる新しい活性化関数を提案する。 Mishと同様に、SerfもSwishファミリーに属している。コンピュータビジョン(画像分類とオブジェクト検出)と自然言語処理(機械翻訳、感情分類、マルチモーダル・エンテーメント)の様々な実験に基づいて、SerfはReLU(ベースライン)とSwishとMishを含む他のアクティベーション機能を大きく上回っており、より深いアーキテクチャに顕著な差がある。アブレーション研究により、serfベースのアーキテクチャは様々なシナリオにおいてswishやmishよりも優れた性能を示し、様々な深さ、複雑さ、最適化、学習率、バッチサイズ、初期化器、ドロップアウト率でserfの有効性と互換性を検証する。最後に,SwishとSerfの数学的関係について検討し,よりスムーズかつ高速に勾配を最適化する正規化効果を提供するSerfの第1微分のプレコンディショナー関数の影響を示す。 Activation functions play a pivotal role in determining the training dynamics and neural network performance. The widely adopted activation function ReLU despite being simple and effective has few disadvantages including the Dying ReLU problem. In order to tackle such problems, we propose a novel activation function called Serf which is self-regularized and nonmonotonic in nature. Like Mish, Serf also belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification and multimodal entailment) tasks with different state-of-the-art architectures, it is observed that Serf vastly outperforms ReLU (baseline) and other activation functions including both Swish and Mish, with a markedly bigger margin on deeper architectures. Ablation studies further demonstrate that Serf based architectures perform better than those of Swish and Mish in varying scenarios, validating the effectiveness and compatibility of Serf with varying depth, complexity, optimizers, learning rates, batch sizes, initializers and dropout rates. Finally, we investigate the mathematical relation between Swish and Serf, thereby showing the impact of preconditioner function ingrained in the first derivative of Serf which provides a regularization effect making gradients smoother and optimization faster.	翻訳日:2021-08-25 11:52:32 公開日:2021-08-24
# (参考訳) 側面:構造対応インスタンス深度推定を用いたセンタベースステレオ3d検出器 SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation ( http://arxiv.org/abs/2108.09663v2 ) ライセンス: CC BY 4.0	Xidong Peng, Xinge Zhu, Tai Wang, and Yuexin Ma	(参考訳) 3D検出は環境認識において不可欠である。一般的に使用されるLiDARセンサーのコストが高いため、ステレオビジョンに基づく3D検出は経済的に効果的だが、近年は注目を集めている。 2次元画像に基づくこれらのアプローチでは、正確な深度情報が3次元検出の鍵となり、既存の手法のほとんどは、深度推定の予備段階に頼っている。それらは主にグローバルな深度に焦点を合わせ、この特定のタスク、すなわち空間と局所性における深度情報の性質を無視する。そこで本研究では, ステレオ画像を用いた立体画像によるアンカーフリー3D検出手法を提案し, 各オブジェクトのRoIsからコストボリュームを構成することで, インスタンスレベルの深度情報を探索する。局所的なコスト量の情報のスパース性から,さらに,マッチングの重み付けと構造認識の注意を導入し,奥行き情報の集中化を図る。 KITTIデータセットで行った実験から,本手法は深度マップの監督のない既存手法と比較して最先端の性能を実現することが示された。 3D detection plays an indispensable role in environment perception. Due to the high cost of commonly used LiDAR sensor, stereo vision based 3D detection, as an economical yet effective setting, attracts more attention recently. For these approaches based on 2D images, accurate depth information is the key to achieve 3D detection, and most existing methods resort to a preliminary stage for depth estimation. They mainly focus on the global depth and neglect the property of depth information in this specific task, namely, sparsity and locality, where exactly accurate depth is only needed for these 3D bounding boxes. Motivated by this finding, we propose a stereo-image based anchor-free 3D detection method, called structure-aware stereo 3D detector (termed as SIDE), where we explore the instance-level depth information via constructing the cost volume from RoIs of each object. Due to the information sparsity of local cost volume, we further introduce match reweighting and structure-aware attention, to make the depth information more concentrated. Experiments conducted on the KITTI dataset show that our method achieves the state-of-the-art performance compared to existing methods without depth map supervision.	翻訳日:2021-08-25 11:40:12 公開日:2021-08-24
# (参考訳) 回帰のための効率的なガウス神経プロセス Efficient Gaussian Neural Processes for Regression ( http://arxiv.org/abs/2108.09676v2 ) ライセンス: CC BY 4.0	Stratis Markou, James Requeima, Wessel Bruinsma, Richard Turner	(参考訳) Conditional Neural Processs (CNP; Garnelo et al., 2018) は、よく校正された予測を生成し、テスト時に高速な推論を可能にし、単純な最大精度の手順でトレーニングできる、魅力的なメタラーニングモデルのファミリーである。 CNPの制限は、出力の依存性をモデル化できないことである。これにより予測性能が著しく低下し、コヒーレント関数サンプルの描画が不可能になるため、下流アプリケーションや意思決定におけるCNPの適用性が制限される。ニューラルプロセス(nps; garnelo et al., 2018)は、潜在変数を使用してこの問題を緩和し、出力依存性をモデル化するが、近似推論による困難をもたらす。最近の代替案 (Bruinsma et al.,2021) はFullConvGNPと呼ばれ、予測の依存性をモデル化し、正確な最大形でトレーニング可能である。残念ながらFullConvGNPは高価な2次元畳み込みに依存しており、1次元のデータしか適用できない。本研究では,出力依存性をモデル化する別の手法を提案する。この手法は,最大確率トレーニングにも応用できるが,fullconvgnpと異なり,2次元データと3次元データにスケールできる。提案手法は合成実験において良好な性能を示す。 Conditional Neural Processes (CNP; Garnelo et al., 2018) are an attractive family of meta-learning models which produce well-calibrated predictions, enable fast inference at test time, and are trainable via a simple maximum likelihood procedure. A limitation of CNPs is their inability to model dependencies in the outputs. This significantly hurts predictive performance and renders it impossible to draw coherent function samples, which limits the applicability of CNPs in down-stream applications and decision making. Neural Processes (NPs; Garnelo et al., 2018) attempt to alleviate this issue by using latent variables, relying on these to model output dependencies, but introduces difficulties stemming from approximate inference. One recent alternative (Bruinsma et al.,2021), which we refer to as the FullConvGNP, models dependencies in the predictions while still being trainable via exact maximum-likelihood. Unfortunately, the FullConvGNP relies on expensive 2D-dimensional convolutions, which limit its applicability to only one-dimensional data. In this work, we present an alternative way to model output dependencies which also lends itself maximum likelihood training but, unlike the FullConvGNP, can be scaled to two- and three-dimensional data. The proposed models exhibit good performance in synthetic experiments.	翻訳日:2021-08-25 11:25:44 公開日:2021-08-24
# (参考訳) 構成可能な3dシーンレイアウトによるリアル画像合成 Realistic Image Synthesis with Configurable 3D Scene Layouts ( http://arxiv.org/abs/2108.10031v2 ) ライセンス: CC BY 4.0	Jaebong Jeong, Janghun Jo, Jingdong Wang, Sunghyun Cho, Jaesik Park	(参考訳) 最近の条件付き画像合成手法は高品質な合成画像を提供する。しかし、オブジェクトの位置や向きなどの画像内容の正確な調整は依然として困難であり、合成画像は幾何学的に無効な内容を持つことが多い。 3次元幾何学的な側面から合成画像のリッチな制御性を実現するために,構成可能な3次元シーンレイアウトに基づくリアルな画像合成手法を提案する。提案手法はセマンティックなクラスラベルを持つ3Dシーンを入力として、入力された3Dシーンの色値を合成する3Dシーン描画ネットワークを訓練する。トレーニング済みのペイントネットワークでは、入力された3dシーンの写実的なイメージをレンダリングして操作することができる。絵画ネットワークを3Dカラー監視なしで訓練するために,市販の2Dセマンティック画像合成手法を利用する。実験では,本手法が幾何学的正しい構造をもつ画像を生成し,視点や物体のポーズの変化や絵画スタイルの操作といった幾何学的操作をサポートすることを示す。 Recent conditional image synthesis approaches provide high-quality synthesized images. However, it is still challenging to accurately adjust image contents such as the positions and orientations of objects, and synthesized images often have geometrically invalid contents. To provide users with rich controllability on synthesized images in the aspect of 3D geometry, we propose a novel approach to realistic-looking image synthesis based on a configurable 3D scene layout. Our approach takes a 3D scene with semantic class labels as input and trains a 3D scene painting network that synthesizes color values for the input 3D scene. With the trained painting network, realistic-looking images for the input 3D scene can be rendered and manipulated. To train the painting network without 3D color supervision, we exploit an off-the-shelf 2D semantic image synthesis method. In experiments, we show that our approach produces images with geometrically correct structures and supports geometric manipulation such as the change of the viewpoint and object poses as well as manipulation of the painting style.	翻訳日:2021-08-25 11:14:35 公開日:2021-08-24
# 深層ニューラルネットワークによる微生物コロニー検出法 -比較解析- Deep neural networks approach to microbial colony detection -- a comparative analysis ( http://arxiv.org/abs/2108.10103v2 ) ライセンス: Link先を確認	Sylwia Majchrowska, Jaros{\l}aw Paw{\l}owski, Natalia Czerep, Aleksander G\'orecki, Jakub Kuci\'nski, and Tomasz Golan	(参考訳) 微生物コロニーの計数は微生物学の基本的な課題であり、多くの産業分野に応用されている。それにもかかわらず、人工知能を用いた自動微生物計数に関する最近の研究は、統一された方法論の欠如と大規模なデータセットの可用性のため、ほとんど比較できない。最近導入されたagarデータセットは、第2のニーズへの答えだが、研究はまだ不十分である。この問題に対処するため,AGARデータセット上での3つのよく知られたディープラーニング手法,すなわち2段階,1段階,トランスフォーマーに基づくニューラルネットワークの性能を比較した。得られた結果は将来の実験のベンチマークとして機能するかもしれない。 Counting microbial colonies is a fundamental task in microbiology and has many applications in numerous industry branches. Despite this, current studies towards automatic microbial counting using artificial intelligence are hardly comparable due to the lack of unified methodology and the availability of large datasets. The recently introduced AGAR dataset is the answer to the second need, but the research carried out is still not exhaustive. To tackle this problem, we compared the performance of three well-known deep learning approaches for object detection on the AGAR dataset, namely two-stage, one-stage and transformer based neural networks. The achieved results may serve as a benchmark for future experiments.	翻訳日:2021-08-25 10:58:57 公開日:2021-08-24

Title

Authors

Abstract

論文公表日・翻訳日

# 離散時間量子ウォークを用いた閉グラフ上のマルチキュービット量子コンピューティング

Multi-qubit quantum computing using discrete-time quantum walks on closed graphs ( http://arxiv.org/abs/2004.05956v2 )

ライセンス: Link先を確認

Prateek Chawla, Shivani Singh, Aman Agarwal, Sarvesh Srinivasan, C. M. Chandrashekar

(参考訳) 普遍量子計算は、連続時間と離散時間の両方の量子ウォークを用いて実現することができる。本稿では,単一粒子離散時間量子ウォークに基づくマルチキュービット計算タスクを実現するバージョンを提案する。このスキームのスケーラビリティは、閉格子形式のウォーク操作の集合を用いて、マルチ量子ビット系上の量子ゲートの普遍的な集合を実装することで証明される。また、グローバーのアルゴリズム、量子フーリエ変換、量子位相推定アルゴリズムを実装できる、実験的に実現可能なウォーク演算のセットも提示する。エラー検出と修正の基本的な実装も提示する。このスキームの空間的および時間的複雑さの分析は、量子ウォーク進化操作の実装がシステム固有の特徴であるシステムにおける量子ウォークに基づく量子計算モデルの利点を強調している。

Universal quantum computation can be realised using both continuous-time and discrete-time quantum walks. We present a version based on single particle discrete-time quantum walk to realize multi-qubit computation tasks. The scalability of the scheme is demonstrated by using a set of walk operations on a closed lattice form to implement the universal set of quantum gates on multi-qubit system. We also present a set of experimentally realizable walk operations that can implement Grover's algorithm, quantum Fourier transformation and quantum phase estimation algorithms. An elementary implementation of error detection and correction is also presented. Analysis of space and time complexity of the scheme highlights the advantages of quantum walk based model for quantum computation on systems where implementation of quantum walk evolution operations is an inherent feature of the system.

翻訳日:2023-05-24 11:30:59 公開日:2021-08-24

# 古典イジングハミルトンの低エネルギー状態の量子インスパイア探索法

Quantum-inspired search method for low-energy states of classical Ising Hamiltonians ( http://arxiv.org/abs/2010.00180v2 )

ライセンス: Link先を確認

Hiroshi Ueda, Yuichi Otsuka and Seiji Yunoki

(参考訳) 2体完全連結なランダムイジング相互作用とランダムな局所磁場からなる古典的ハミルトニアンの低エネルギー状態を求める量子インスパイアされた数値計算法を開発した。この方法では、元のイジングハミルトニアンに可換でない無限小量子相互作用を導入し、クリロフ部分空間法に触発された直積状態を繰り返し生成および切断し、元の古典イジングハミルトニアンの低エネルギー状態を得る。計算コストは、無限小量子相互作用(例えば、一体または二体相互作用)の形式と、導入される無限小相互作用項の数、異なる初期状態、および反復中に保持される低エネルギー状態によって制御される。ここでは、異なるサイト上で作用するパウリ$X$作用素の無限小量子相互作用対積と、ランダムなイジング・ハミルトニアン(英語版)(Ising Hamiltonian)へのオンサイトであるパウリ$X$演算子(英語版)の数値コストが1イテレーションあたり$O(N^3)$であることを示す。ランダムなイジング・ハミルトニアンに対して最大600ドルのランダムなカップリング実現の120のインスタンスを検討し、各インスタンスの120の最低エネルギー状態を求める。本稿では,ランダムイジング・ハミルトニアン・スケールの基底状態の探索のために,量子インスパイアされた手法による解の時間-解法と異なる初期状態の観点からの並列化が,約$n^5$ for $n$ から$600 となることを見出した。また, ランダムイジングハミルトニアンの低エネルギー領域におけるアンサンブル平均基底状態, 第一励起エネルギー, アンサンブル平均状態数などの基礎物性についても検討した。

We develop a quantum-inspired numerical procedure for searching low-energy states of a classical Hamiltonian composed of two-body fully-connected random Ising interactions and a random local longitudinal magnetic field. In this method, we introduce infinitesimal quantum interactions that do not commute with the original Ising Hamiltonian, and repeatedly generate and truncate direct product states, inspired by the Krylov subspace method, to obtain the low-energy states of the original classical Ising Hamiltonian. The computational cost is controlled by the form of infinitesimal quantum interactions (e.g., one-body or two-body interactions) and the numbers of infinitesimal interaction terms introduced, different initial states considered, and low-energy states kept during the iteration. For a demonstrate of the method, here we introduce as the infinitesimal quantum interactions pair products of Pauli $X$ operators acting on different sites and on-site Pauli $X$ operators into the random Ising Hamiltonian, in which the numerical cost is $O(N^3)$ per iteration with the system size $N$. We consider 120 instances of the random coupling realizations for the random Ising Hamiltonian with $N$ up to 600 and search the 120 lowest-energy states for each instance. We find that the time-to-solution by the quantum-inspired method proposed here, with parallelization in terms of the different initial states, for searching the ground state of the random Ising Hamiltonian scales approximately as $N^5$ for $N$ up to 600. We also examine the basic physical properties such as the ensemble-averaged ground-state and first-excited energies and the ensemble-averaged number of states in the low-energy region of the random Ising Hamiltonian.

翻訳日:2023-04-30 12:15:38 公開日:2021-08-24

# 縮小密度行列からの複雑性:カオスの新しい診断法

Complexity from the Reduced Density Matrix: a new Diagnostic for Chaos ( http://arxiv.org/abs/2011.04705v2 )

ライセンス: Link先を確認

Arpan Bhattacharyya, S. Shajidul Haque and Eugene H. Kim

(参考訳) 多粒子量子系におけるカオスを特徴づける回路複雑性について検討する。このプロセスでは、複雑性を利用してオープン量子システムを分析する。本研究では,異なる種類の量子回路を探索することにより,密度行列の低減に基づく複雑性から量子カオスの新しい診断法を提案する。共振器の1つまたは両方を反転させる2つの結合振動子のおもちゃモデルに関する明示的な計算により、複雑性の進化がカオスの診断の可能性を示す。

We investigate circuit complexity to characterize chaos in multiparticle quantum systems. In the process, we take a stride to analyze open quantum systems by using complexity. We propose a new diagnostic of quantum chaos from complexity based on the reduced density matrix by exploring different types of quantum circuits. Through explicit calculations on a toy model of two coupled harmonic oscillators, where one or both of the oscillators are inverted, we demonstrate that the evolution of complexity is a possible diagnostic of chaos.

翻訳日:2023-04-24 21:08:09 公開日:2021-08-24

# ドープ量子井戸における電子基底状態の共振器誘起変化の測定に関する理論的提案

Theoretical proposals to measure resonator-induced modifications of the electronic ground-state in doped quantum wells ( http://arxiv.org/abs/2012.09458v3 )

ライセンス: Link先を確認

Yuan Wang and Simone De Liberato

(参考訳) 近年の非摂動型光-物質結合の物理学への関心は、相互作用エネルギーが素粒子に匹敵する固体キャビティ量子力学装置の開発につながった。このような状況下では、結合系の基底状態は相互作用依存となり、多くの調査の対象であったにもかかわらずまだ観測されていない仮想励起の集団を含むと予測される。本稿では,量子井戸における仮想電子励起が基底状態電荷分布をどのように変化させるかを調査し,そのキャビティ誘起摂動を測定する2つの方法を提案する。最初のアプローチは、局所的な欠陥状態を用いた量子井戸の特定の位置における電子集団の分光マッピングに基づいている。第二のアプローチは代わりにケルビンプローブのフォトニック等価性を利用して量子井戸の平均変化分布を測定する。両方の効果は、現在または近未来の技術で観察できる。その結果,地中電子特性の空洞誘起変調の実証への道筋が得られた。

Recent interest in the physics of non-perturbative light-matter coupling led to the development of solid-state cavity quantum electrodynamics setups in which the interaction energies are comparable with the bare ones. In such a regime the ground state of the coupled system becomes interaction-dependent and is predicted to contain a population of virtual excitations which, notwithstanding having been object of many investigations, remain still unobserved. In this paper we investigate how virtual electronic excitations in quantum wells modify the ground-state charge distribution, and propose two methods to measure such a cavity-induced perturbation. The first approach we consider is based on spectroscopic mapping of the electronic population at a specific location in the quantum well using localised defect states. The second approach exploits instead the photonic equivalent of a Kelvin probe to measure the average change distribution across the quantum well. We find both effects observable with present-day or near-future technology. Our results thus provide a route toward a demonstration of cavity-induced modulation of ground-state electronic properties.

翻訳日:2023-04-20 08:46:16 公開日:2021-08-24

# 一次元量子系における緩和の運命の不決定性

Undecidability of the fate of relaxation in one-dimensional quantum systems ( http://arxiv.org/abs/2012.13890v2 )

ライセンス: Link先を確認

Naoto Shiraishi and Keiji Matsumoto

(参考訳) 孤立量子多体系における緩和ダイナミクスについて検討する。緩和後の可観測物の定常値は、この定常値が平衡値と一致する緩和現象であるため、量子熱化の分野での研究のトピックである。したがって、量子多体系における定常値の計算は重要な問題と見なされる。しかし、量子多体系の定常値は計算不可能であることが証明される。より正確には、定常値が与えられた値の近傍にあるかどうかが決定不能な問題であることを示す。我々の決定不能な結果は、最接近相互作用のある1次元シフト不変量系、我々の初期状態から1つのサイト上の状態の積状態、そして1つのボディオブザーバブルのシフトサムに観測可能な場合に、まだ満足できる。この結果は、与えられた量子多体系における熱化の有無を決定する一般的な定理や手続きがないことを明確に示している。

We investigate the relaxation dynamics in an isolated quantum many-body system. The stationary value of an observable after relaxation is a topic of researches in the field of quantum thermalization, since thermalization is a relaxation phenomena where this stationary value coincides with the equilibrium value. Therefore, computing the stationary value in quantum many-body systems is regarded as an important problem. We, however, prove that the stationary value in quantum many-body systems is incomputable. More precisely, we show that whether the stationary value is in the vicinity of a given value or not is an undecidable problem. Our undecidable result is still satisfied when we restrict our system to a one-dimensional shift-invariant system with nearest-neighbor interaction, our initial state to a product state of a state on a single site, and our observable to a shift-sum of a one-body observable. This result clearly shows that there is no general theorem or procedure to decide the presence or absence of thermalization in a given quantum many-body system.

翻訳日:2023-04-19 04:07:44 公開日:2021-08-24

# 量子熱化における不確定性

Undecidability in quantum thermalization ( http://arxiv.org/abs/2012.13889v2 )

ライセンス: Link先を確認

Naoto Shiraishi and Keiji Matsumoto

(参考訳) 孤立量子多体系における熱化の研究は、統計力学の発達の時代までさかのぼる長い歴史がある。自然界の多くの量子多体系は熱化と見なされるが、一部は熱平衡に達することはない。中心的な問題は、ある系が以前に対処されたが解決されていない熱分解するかどうかを明らかにすることである。ここでは、この問題は決定不能であることを示す。結果として生じる不確定性は、システムが最も近い隣り合う相互作用を持つ一次元シフト不変系に制限され、初期状態が固定積状態であるときにさえ適用される。我々は、可逆的普遍的チューリングマシンのダイナミクスをコードするハミルトンのファミリーを構築し、チューリングマシンが停止するかどうかによって緩和過程の運命がかなり変化する。以上の結果から,任意のハミルトニアンにおける熱化の有無を決定する一般定理,アルゴリズム,系統的手続きは存在しないことが示唆された。

The investigation of thermalization in isolated quantum many-body systems has a long history, dating back to the time of developing statistical mechanics. Most quantum many-body systems in nature are considered to thermalize, while some never achieve thermal equilibrium. The central problem is to clarify whether a given system thermalizes, which has been addressed previously, but not resolved. Here, we show that this problem is undecidable. The resulting undecidability even applies when the system is restricted to one-dimensional shift-invariant systems with nearest-neighbour interaction, and the initial state is a fixed product state. We construct a family of Hamiltonians encoding dynamics of a reversible universal Turing machine, where the fate of a relaxation process changes considerably depending on whether the Turing machine halts. Our result indicates that there is no general theorem, algorithm, or systematic procedure determining the presence or absence of thermalization in any given Hamiltonian.

翻訳日:2023-04-19 04:07:28 公開日:2021-08-24

# 平衡から遠い系の普遍的量子揺らぎ-散逸関係

Universal Quantum Fluctuation-Dissipation Relation for Systems Far From Equilibrium ( http://arxiv.org/abs/2101.11827v2 )

ライセンス: Link先を確認

Zhedong Zhang, Xuanhua Wang, Jin Wang

(参考訳) 平衡状態から遠方への緩和に伴うゆらぎは、幅広いスケールの様々なシステムにとって基本的な関心事である。近年の分光などの技術の進歩により、メソスコピック系のゆらぎを、平衡から遠ざかる量子系を駆動する際に緩和過程と関連づけて測定する可能性が生まれている。詳細平衡条件に反する量子マルコフ過程に対する一般非平衡変動散逸定理(FDT)を提案する。ゆらぎとは別に、緩和は、平衡状態から遠方にある量子カール流束によって支配される余分な相関を伴う。このような寄与は熱平衡のために消滅し、従来のFDTが回収される。最終的に分子接合に非平衡FDTを適用し、光透過スペクトルの詳細な平衡破壊効果を解明する。本研究は摂動系および近平衡系におけるゆらぎ-散逸関係の利点と限界を示し、摂動系および近平衡系における量子熱力学の研究に広く興味を持ち、量子熱力学の研究にも幅広い関心を寄せている。

Fluctuations associated with relaxations in far-from-equilibrium regime is of fundamental interest for a large variety of systems within broad scales. Recent advances in techniques such as spectroscopy have generated the possibility for measuring the fluctuations of the mesoscopic systems in connection to the relaxation processes when driving the underlying quantum systems far from equilibrium. We present a general nonequilibrium Fluctuation-Dissipation Theorem (FDT) for quantum Markovian processes where the detailed-balance condition is violated. Apart from the fluctuations, the relaxation involves extra correlation that is governed by the quantum curl flux emerged in the far-from-equilibrium regime. Such a contribution vanishes for the thermal equilibrium, so that the conventional FDT is recovered. We finally apply the nonequilibrium FDT to the molecular junctions, elaborating the detailed-balance-breaking effects on the optical transmission spectrum. Our results have the advantage of and exceed the scope of the fluctuation-dissipation relation in the perturbative and near equilibrium regimes, and is of broad interest for the study of quantum thermodynamics.ation in the perturbative and near equilibrium regimes, and is of broad interest for the study of quantum thermodynamics.

翻訳日:2023-04-13 12:08:22 公開日:2021-08-24

# 高速反転による量子ルーティング

Quantum routing with fast reversals ( http://arxiv.org/abs/2103.03264v2 )

ライセンス: Link先を確認

Aniruddha Bapat, Andrew M. Childs, Alexey V. Gorshkov, Samuel King, Eddie Schoute, Hrishee Shastri

(参考訳) 本稿では、相互作用制約下で量子ビットの任意の置換を実装する手法を提案する。提案プロトコルは,経路に沿ったキュービットの順序を高速に逆転する従来の手法を利用する。 n$ の経路上の近距離-neighbor相互作用を考えると、量子ルーティング時間が(1-\epsilon)n$ 以上であるような一定の $\epsilon \approx 0.034$ が存在するが、スワップベースのプロトコルは少なくとも $n-1$ である。これは、スワップベースのルーティング方法に対する最初の既知の量子アドバンテージであり、グリッドのような現実的なアーキテクチャに対する量子ルーティング時間を改善する。さらに,本アルゴリズムはランダムな置換に対する期待値が2n/3$の量子ルーティング時間に接近していることを示し,スワップベースのプロトコルは漸近的に時間$n$を求める。さらに、k \le n$ qubits をルートするスパース置換を考え、経路上では最大$n/3 + o(k^2)$、半径 $r$ の一般グラフでは最大$r/3 + o(k^2)$ の量子ルーティング時間を持つアルゴリズムを与える。

We present methods for implementing arbitrary permutations of qubits under interaction constraints. Our protocols make use of previous methods for rapidly reversing the order of qubits along a path. Given nearest-neighbor interactions on a path of length $n$, we show that there exists a constant $\epsilon \approx 0.034$ such that the quantum routing time is at most $(1-\epsilon)n$, whereas any swap-based protocol needs at least time $n-1$. This represents the first known quantum advantage over swap-based routing methods and also gives improved quantum routing times for realistic architectures such as grids. Furthermore, we show that our algorithm approaches a quantum routing time of $2n/3$ in expectation for uniformly random permutations, whereas swap-based protocols require time $n$ asymptotically. Additionally, we consider sparse permutations that route $k \le n$ qubits and give algorithms with quantum routing time at most $n/3 + O(k^2)$ on paths and at most $2r/3 + O(k^2)$ on general graphs with radius $r$.

翻訳日:2023-04-09 02:20:25 公開日:2021-08-24

# 対称性を通した四体不識別性の特徴

Characterizing four-body indistinguishability via symmetries ( http://arxiv.org/abs/2103.04600v2 )

ライセンス: Link先を確認

Alexander M. Minke, Andreas Buchleitner, Christoph Dittel

(参考訳) 混合状態において調製された内部自由度によって部分的に識別可能な4つの同一のボソニック粒子またはフェルミイオン粒子の識別不能性を特徴付ける方法を示す。これは、その外的(動的)自由度に作用する高度に対称なユニタリに従えば、そのカウント統計によって達成される。純粋な内部状態に対しては、粒子の集合相に関する情報をさらに抽出し、最終的には複素共役までの完全な多粒子密度作用素を実験的に再構築することができる。

We show how to characterize the indistinguishability of up to four identical, bosonic or fermionic particles, which are rendered partially distinguishable through their internal degrees of freedom prepared in mixed states. This is accomplished via their counting statistics when subjected to a highly symmetric unitary acting upon their external (i.e., dynamical) degrees of freedom. For pure internal states, we further extract information on the particles' collective phases, which ultimately allows for an experimental reconstruction of the full many-particle density operator up to complex conjugation.

翻訳日:2023-04-08 18:21:50 公開日:2021-08-24

# 浮遊系の熱・機械的変化の非平衡制御

Nonequilibrium control of thermal and mechanical changes in a levitated system ( http://arxiv.org/abs/2103.10898v2 )

ライセンス: Link先を確認

Markus Rademacher, Michael Konopik, Maxime Debiossac, David Grass, Eric Lutz, Nikolai Kiesel

(参考訳) ゆらぎ定理は、小さな非平衡系の熱力学の第2法則の基本的な拡張である。作業と熱は同様にエネルギー交換の重要な形態であるが、変動関係は機械と熱の同時変化の一般的な状況について実験的に評価されていない。熱駆動は機械駆動よりも一般的に遅く、より実現が難しい。ここでは, フィードバック冷却技術を用いて, 平衡時間よりも1桁高速に浮遊する微小粒子の高速かつ制御された温度変化を実現する。機械制御と熱制御を組み合わせることで, 線形応答理論の範囲を超えて, 両寄与を考慮したゆらぎ定理の有効性を検証した。この結果から, 機械的および熱的変化を同時に行う顕微鏡システムにおいて, 一般の遠方平衡過程の解明が可能となった。

Fluctuation theorems are fundamental extensions of the second law of thermodynamics for small nonequilibrium systems. While work and heat are equally important forms of energy exchange, fluctuation relations have not been experimentally assessed for the generic situation of simultaneous mechanical and thermal changes. Thermal driving is indeed generally slow and more difficult to realize than mechanical driving. Here, we use feedback cooling techniques to implement fast and controlled temperature variations of an underdamped levitated microparticle that are one order of magnitude faster than the equilibration time. Combining mechanical and thermal control, we verify the validity of a fluctuation theorem that accounts for both contributions, well beyond the range of linear response theory. Our results allow the investigation of general far-from-equilibrium processes in microscopic systems that involve fast mechanical and thermal changes at the same time.

翻訳日:2023-04-07 10:54:28 公開日:2021-08-24

# 分解領域におけるイオンアンサンブルの高分解能光学分光による陽子電子質量比

Proton-electron mass ratio by high-resolution optical spectroscopy of ion ensembles in the resolved-carrier regime ( http://arxiv.org/abs/2103.11741v2 )

ライセンス: Link先を確認

I. V. Kortunov, S. Alighanbari, M. G. Hansen, G. S. Giri, V. I. Korobov and S. Schiller

(参考訳) 気体相の光学分光は、原子と分子の構造とそれらの外部磁場との相互作用を解明するための重要なツールである。線解像度は通常、粒子の熱運動による一階ドップラー拡大と励起ビームによる短い通過時間の組合せによって制限される。閉じ込められた粒子の場合、適切なレーザー冷却技術は強い閉じ込め(ラム・ディッケ状態、LDR)をもたらし、これらの効果を伴わない光学分光に繋がる。非レーザー冷却型分光イオンでは、これは1つまたは2つの原子イオンと1つのレーザー可溶性原子イオン[1,2]をトラップする場合にのみ達成されている。ここでは, ドップラーとトランジットを含まない1光子光学分光法が, 中赤外放射によるイオンのアンサンブルにより容易に得られることを示す。本手法を分子イオン上で実証する。我々は、数千個のレーザー冷却原子イオンからなるクーロンクラスター内に約100個の水素分子イオン(HD$^{+}$)をトラップし、基本振動遷移のレーザー分光を行う。遷移周波数は3.3$\times$10$^{-12}$の最低不確かさで決定された。応用例として, 精密な ab initio 計算と測定振動周波数を一致させて, 陽子電子質量比を求める。

Optical spectroscopy in the gas phase is a key tool to elucidate the structure of atoms and molecules and of their interaction with external fields. The line resolution is usually limited by a combination of first-order Doppler broadening due to particle thermal motion and of a short transit time through the excitation beam. For trapped particles, suitable laser cooling techniques can lead to strong confinement (Lamb-Dicke regime, LDR) and thus to optical spectroscopy free of these effects. For non-laser coolable spectroscopy ions, this has so far only been achieved when trapping one or two atomic ions, together with a single laser-coolable atomic ion [1,2]. Here we show that one-photon optical spectroscopy free of Doppler and transit broadening can also be obtained with more easily prepared ensembles of ions, if performed with mid-infrared radiation. We demonstrate the method on molecular ions. We trap approximately 100 molecular hydrogen ions (HD$^{+}$) within a Coulomb cluster of a few thousand laser-cooled atomic ions and perform laser spectroscopy of the fundamental vibrational transition. Transition frequencies were determined with lowest uncertainty of 3.3$\times$10$^{-12}$ fractionally. As an application, we determine the proton-electron mass ratio by matching a precise ab initio calculation with the measured vibrational frequency.

翻訳日:2023-04-07 04:35:37 公開日:2021-08-24

# 散逸結合縮退光パラメトリック発振器における絡み合った猫状態の生成と検出

Generating and detecting entangled cat states in dissipatively coupled degenerate optical parametric oscillators ( http://arxiv.org/abs/2103.16090v2 )

ライセンス: Link先を確認

Zheng-Yang Zhou, Clemens Gneiting, J. Q. You, and Franco Nori

(参考訳) 非ガウス連続変数状態は、量子理論の基礎と創発的量子技術の両方において中心的な役割を果たす。特に「キャット状態」、すなわち2成分のマクロな量子重ね合わせは、量子コヒーレンスをアクセス可能な方法で具現化し、基本的なテストや量子情報タスクにも利用できる。縮退した光パラメトリック発振器は自然に単一モードの猫状態を生成できるため、その実現と活用に有望なプラットフォームとなる。縮退した光パラメトリック振動子間の散逸結合は、これを2モードの絡み合った猫状態、すなわち2モードの絡み合った猫状態へと拡張する。単一光子損失を克服することは、縮退した光パラメトリック発振器において十分に純粋な単一モードの猫状態を実現するための大きな課題である一方で、そのような散逸結合下での2つのモードの絡み合った猫状態の生成は、追加のハードルなしで達成できることを示す。 2つの散逸結合縮退光パラメトリック発振器において、一過性2モード絡み合い状態を生成するためのパラメータレジームを数値的に検討した。キャット状態の絡み合いを認証するために, キャット状態の絡み合いを現実的な条件下で確実に検出できる, 調整された分散型絡み合い基準を用いる。

Non-Gaussian continuous variable states play a central role both in the foundations of quantum theory and for emergent quantum technologies. In particular, "cat states", i.e., two-component macroscopic quantum superpositions, embody quantum coherence in an accessible way and can be harnessed for fundamental tests and quantum information tasks alike. Degenerate optical parametric oscillators can naturally produce single-mode cat states and thus represent a promising platform for their realization and harnessing. We show that a dissipative coupling between degenerate optical parametric oscillators extends this to two-mode entangled cat states, i.e., two-mode entangled cat states are naturally produced under such dissipative coupling. While overcoming single-photon loss still represents a major challenge towards the realization of sufficiently pure single-mode cat states in degenerate optical parametric oscillators, we show that the generation of two-mode entangled cat states under such dissipative coupling can then be achieved without additional hurdles. We numerically explore the parameter regime for the successful generation of transient two-mode entangled cat states in two dissipatively coupled degenerate optical parametric oscillators. To certify the cat-state entanglement, we employ a tailored, variance-based entanglement criterion, which can robustly detect cat-state entanglement under realistic conditions.

翻訳日:2023-04-06 03:49:17 公開日:2021-08-24

# 非対称一般化量子ラビ模型に対する隠れ対称性作用素

Hidden symmetry operators for asymmetric generalised quantum Rabi models ( http://arxiv.org/abs/2104.14164v2 )

ライセンス: Link先を確認

Xilin Lu, Zi-Min Li, Vladimir V. Mangazeev and Murray T. Batchelor

(参考訳) 非対称量子ラビモデル(aqrm)の隠れた$\mathbb{z}_2$対称性は、最近基礎となる対称性作用素の体系的構成を通じて明らかにされている。 AQRMの結果に基づいて、AQRM関連モデルの対称性演算子の一般的な形に対するアンサッツを提案する。このアンザッツを適用して、異方性AQRM、非対称ラビスタークモデル(ARSM)、および異方性ARSMの3つのモデルに対する対称性作用素を得る。

The hidden $\mathbb{Z}_2$ symmetry of the asymmetric quantum Rabi model (AQRM) has recently been revealed via a systematic construction of the underlying symmetry operator. Based on the AQRM result, we propose an ansatz for the general form of the symmetry operators for AQRM-related models. Applying this ansatz we obtain the symmetry operator for three models: the anisotropic AQRM, the asymmetric Rabi-Stark model (ARSM) and the anisotropic ARSM.

翻訳日:2023-04-02 02:18:53 公開日:2021-08-24

# 衝突型電池充電における量子スピードアップ

Quantum speed-up in collisional battery charging ( http://arxiv.org/abs/2105.01863v2 )

ライセンス: Link先を確認

Stella Seah, Mart\'i Perarnau-Llobet, G\'eraldine Haack, Nicolas Brunner, Stefan Nimmrichter

(参考訳) 同一の非平衡量子ビット単位による量子電池の充電に関する衝突モデルを提案する。単位がエネルギー固有状態の混合で作成されると、電池のエネルギーゲインは古典的なランダムウォークによって説明され、平均エネルギーと分散は時間とともに線形に成長する。逆に、量子コヒーレンスを含む量子ビットでは、バッテリに干渉効果が蓄積され、量子ランダムウォークを思い起こさせるようなエネルギー分布がより速く広がる。これは、地上で初期化されたバッテリーの高速で効率的な充電に利用できる。具体的には、コヒーレントプロトコルは、任意の非コヒーレント戦略よりも高い充電パワーが得られることを示し、単一のバッテリのレベルで量子スピードアップを実証する。最後に, エルゴトロピーの概念を用いて, 電池から抽出可能な作業量を特徴付ける。

We present a collision model for the charging of a quantum battery by identical nonequilibrium qubit units. When the units are prepared in a mixture of energy eigenstates, the energy gain in the battery can be described by a classical random walk, where both average energy and variance grow linearly with time. Conversely, when the qubits contain quantum coherence, interference effects buildup in the battery and lead to a faster spreading of the energy distribution, reminiscent of a quantum random walk. This can be exploited for faster and more efficient charging of a battery initialized in the ground state. Specifically, we show that coherent protocols can yield higher charging power than any possible incoherent strategy, demonstrating a quantum speed-up at the level of a single battery. Finally, we characterize the amount of extractable work from the battery through the notion of ergotropy.

翻訳日:2023-04-01 13:24:17 公開日:2021-08-24

# 日々のアルゴリズム監査:有害なアルゴリズム行動に直面する日常生活者の力を理解する

Everyday algorithm auditing: Understanding the power of everyday users in surfacing harmful algorithmic behaviors ( http://arxiv.org/abs/2105.02980v2 )

ライセンス: Link先を確認

Hong Shen, Alicia DeVos, Motahhare Eslami, Kenneth Holstein

(参考訳) 研究機関は、偏見と有害な行動に対するアルゴリズムシステム監査のための公式なアプローチを提案している。正式な監査アプローチは大きな影響を与えてきたが、システムのデプロイが完了すると、日常使用のコンテキストでのみ問題が発生するため、大きな盲点に陥りがちである。近年,アルゴリズムシステムの日常的使用者が,これらのシステムとの日常的な相互作用の中で遭遇する有害な行為を検知し,意識を高めるケースが増えている。しかし、これまでのところ、このボトムアップでユーザ主導の監査プロセスにはほとんど学術的な注意が払われていない。本稿では,アルゴリズムシステムとの日々のインタラクションを通じて問題のある機械の動作を検出し,理解し,問合せを行うプロセスである,日々のアルゴリズム監査の概念を提案し,検討する。我々は,ユーザのアルゴリズムに関する知識によらず,より中央集権的な監査形態による検出を不要とする,問題のあるマシン動作を克服する上で,日常的なユーザは強力である,と論じる。我々は、日常的なアルゴリズム監査の現実的な事例を分析し、これらの事例から将来のプラットフォームや監査行動を促進するツールの設計の教訓を導き出す。最後に,形式的監査アプローチと,アルゴリズムシステムの日常的利用において生じる有機的監査行動とのギャップを埋めるために,先行する作業について論じる。

A growing body of literature has proposed formal approaches to audit algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been greatly impactful, they often suffer major blindspots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems detect and raise awareness about harmful behaviors that they encounter in the course of their everyday interactions with these systems. However, to date little academic attention has been granted to these bottom-up, user-driven auditing processes. In this paper, we propose and explore the concept of everyday algorithm auditing, a process in which users detect, understand, and interrogate problematic machine behaviors via their day-to-day interactions with algorithmic systems. We argue that everyday users are powerful in surfacing problematic machine behaviors that may elude detection via more centrally-organized forms of auditing, regardless of users' knowledge about the underlying algorithms. We analyze several real-world cases of everyday algorithm auditing, drawing lessons from these cases for the design of future platforms and tools that facilitate such auditing behaviors. Finally, we discuss work that lies ahead, toward bridging the gaps between formal auditing approaches and the organic auditing behaviors that emerge in everyday use of algorithmic systems.

翻訳日:2023-04-01 07:32:04 公開日:2021-08-24

# 数個のコピーから量子多体システムを学ぶ

Learning quantum many-body systems from a few copies ( http://arxiv.org/abs/2107.03333v2 )

ライセンス: Link先を確認

Cambyse Rouz\'e, Daniel Stilck Fran\c{c}a

(参考訳) 量子状態の物理特性を測定から推定することは、量子科学における最も基本的なタスクの1つである。本研究では,与えられた局所性の準局所可観測値の期待値から,系の大きさに多変量的に増大する多数のサンプルから相対誤差までを推定し,対象の可観測値の局所性を多項式的に推定することのできる状態の条件を同定する。これはいくつかのレジームにおいて既知のトモグラフィ法よりも指数関数的に改善される。我々は、量子状態、すなわち最大エントロピー法を量子最適輸送と古典影の新たな分野のツールと組み合わせることで、最も確立された手法の1つを達成する。我々は、この条件が相関のある種の減衰を示す全ての状態に対して成り立つと仮定し、いくつかの部分集合に対してそれを確立する。これらは、任意のハイパーグラフ上の局所通勤ハミルトニアンの1次元熱および高温ギブス状態や浅い回路の出力など、広く研究されている状態のクラスを含む。さらに,独立利害のサンプル複雑性を超えて,最大エントロピー法の改善を示す。これらは、多体状態の共分散行列の条件数に関する新しいバウンダリと同様に、ポストプロセッシングを効率的に実行することが可能なレギュレーションの同定を含む。

Estimating physical properties of quantum states from measurements is one of the most fundamental tasks in quantum science. In this work, we identify conditions on states under which it is possible to infer the expectation values of all quasi-local observables of a given locality up to a relative error from a number of samples that grows polylogarithmically with the system's size and polynomially on the locality of the target observables. This constitutes an exponential improvement over known tomography methods in some regimes. We achieve our results by combining one of the most well-established techniques to learn quantum states, namely the maximum entropy method, with tools from the emerging fields of quantum optimal transport and classical shadows. We conjecture that our condition holds for all states exhibiting some form of decay of correlations and establish it for several subsets thereof. These include widely studied classes of states such as one-dimensional thermal and high-temperature Gibbs states of local commuting Hamiltonians on arbitrary hypergraphs or outputs of shallow circuits. Moreover, we show improvements of the maximum entropy method beyond the sample complexity of independent interest. These include identifying regimes in which it is possible to perform the postprocessing efficiently as well as novel bounds on the condition number of covariance matrices of many-body states.

翻訳日:2023-03-23 04:14:28 公開日:2021-08-24

# フラクタル量子セルオートマトンからの非自明なリアプノフスペクトル

Non-trivial Lyapunov spectrum from fractal quantum cellular automata ( http://arxiv.org/abs/2107.12191v2 )

ライセンス: Link先を確認

David Berenstein, Brian Kent

(参考訳) すべてのクリフォードセルオートマトンを含むクリフォードセルオートマトンの集合は、格子の各部位に2k$次元トーラス位相空間を持つ格子系の量子化によって生じる。ダイナミクスはトーラス変数の線型写像であり、また局所的でもある。さらにシンプレクティック構造も保持する。これらは、追加の形式変数の集合に整数係数を持つローラン多項式の成分を持つ2k\times 2k$行列によって分類される。これらのことは、量子代数の生成子の進化におけるフラクタルな振る舞いをもたらす。フラクタルな振る舞いは、元の線形力学系の非自明なリャプノフ指数をもたらす。この証明はこれらの行列の特性多項式のフーリエ解析を用いる。

A generalized set of Clifford cellular automata, which includes all Clifford cellular automata, result from the quantization of a lattice system where on each site of the lattice one has a $2k$-dimensional torus phase space. The dynamics is a linear map in the torus variables and it is also local: the evolution depends only on variables in some region around the original lattice site. Moreover it preserves the symplectic structure. These are classified by $2k\times 2k$ matrices with entries in Laurent polynomials with integer coefficients in a set of additional formal variables. These can lead to fractal behavior in the evolution of the generators of the quantum algebra. Fractal behavior leads to non-trivial Lyapunov exponents of the original linear dynamical system. The proof uses Fourier analysis on the characteristic polynomial of these matrices.

翻訳日:2023-03-20 21:29:27 公開日:2021-08-24

# 因果非分離プロセスにおける情報交換

Information Exchange in Causally Nonseparable Processes ( http://arxiv.org/abs/2108.07270v2 )

ライセンス: Link先を確認

Gianluca Francica

(参考訳) プロセスマトリックスフレームワークは、2つのパーティのシステムに対して、因果非分離構造の存在を予測する。情報交換を特徴とし,両当事者の総エントロピーが非分離性の尺度として作用することを示す。

For a system of two parties, the process matrix framework predicts the existence of causally nonseparable structures. We characterize the information exchanged, showing that the total entropy of the two parties acts as a measure for the nonseparability.

翻訳日:2023-03-18 07:20:42 公開日:2021-08-24

# 空間時間多重グリーンベルガー・ホルン・ザイリンガー(GHZ)測定を用いた量子ネットワークにおける距離非依存な絡み合い生成

Distance-Independent Entanglement Generation in a Quantum Network using Space-Time Multiplexed Greenberger-Horne-Zeilinger (GHZ) Measurements ( http://arxiv.org/abs/2108.09352v2 )

ライセンス: Link先を確認

Ashlesha Patil, Joshua I. Jacobson, Emily van Milligen, Don Towsley, Saikat Guha

(参考訳) リンクをうまく作成する量子ネットワークでは、隣り合うリピータノード間の共有ベル状態は、各タイムスロット内の確率$p$で、成功確率$q<1$、エンドからエンドのエンタングルメント生成速度は、マルチパスルーティングにもかかわらず、消費者間の距離とともに指数関数的に低下する。リピータが確率$q$で成功するGHZ基底で多重量子射影測定を行うことができれば、ある$(p,q)$領域で距離が変化しないが、指数関数的に外部に崩壊する。距離独立率が発生するこの領域は、新しいパーコレーション問題の超臨界領域である。我々は、このGHZプロトコルを拡張し、時間多重ブロック長$k$を組み込む。 k$が増加するにつれて、超臨界領域は拡大する。与えられた$(p,q)$の場合、絡み合い率は最初は$k$で増加し、超臨界領域の中で十分高い$k$で1/k$ GHZ状態として崩壊する。平均$\mu$ で指数関数的に分布するメモリコヒーレンス時間が組み込まれている場合、$k$ を増加させることで超臨界領域が無期限に増加することはない。最後に、スペース分割多重化、すなわち、上記プロトコルを最大$d$の切断されたネットワークリージョンで独立に実行することにより、$d$がネットワークのノード次数である場合、上記ランダム化されたローカルリンク状態プロトコルが超過できないスロットレート当たり1ghzの状態を超過することができる。 $(p,q)$が増加すると、1スロットあたり$d$GHZ状態の究極のミンカットエンタングルメント生成容量にアプローチすることができる。

In a quantum network that successfully creates links, shared Bell states between neighboring repeater nodes, with probability $p$ in each time slot, and performs Bell State Measurements at nodes with success probability $q<1$, the end to end entanglement generation rate drops exponentially with the distance between consumers, despite multi-path routing. If repeaters can perform multi-qubit projective measurements in the GHZ basis that succeed with probability $q$, the rate does not change with distance in a certain $(p,q)$ region, but decays exponentially outside. This region where the distance independent rate occurs is the supercritical region of a new percolation problem. We extend this GHZ protocol to incorporate a time-multiplexing blocklength $k$, the number of time slots over which a repeater can mix-and-match successful links to perform fusion on. As $k$ increases, the supercritical region expands. For a given $(p,q)$, the entanglement rate initially increases with $k$, and once inside the supercritical region for a high enough $k$, it decays as $1/k$ GHZ states per time slot. When memory coherence time exponentially distributed with mean $\mu$ is incorporated, it is seen that increasing $k$ does not indefinitely increase the supercritical region; it has a hard $\mu$ dependent limit. Finally, we find that incorporating space-division multiplexing, i.e., running the above protocol independently in up to $d$ disconnected network regions, where $d$ is the network's node degree, one can go beyond the 1 GHZ state per time slot rate that the above randomized local link-state protocol cannot surpass. As $(p,q)$ increases, one can approach the ultimate min-cut entanglement generation capacity of $d$ GHZ states per slot.

翻訳日:2023-03-17 22:52:53 公開日:2021-08-24

# ユーザエンゲージメントのためのモバイルヘルス設計 : 社会技術的アプローチの重要性

Designing Mobile Health for User Engagement: The Importance of Socio-Technical Approach ( http://arxiv.org/abs/2108.09786v2 )

ライセンス: Link先を確認

Tochukwu Ikwunne, Lucy Hederman and P.J. Wall

(参考訳) グローバル・サウスにおけるモバイルヘルス(mHealth)の有効性に対するユーザエンゲージメントの重要性にもかかわらず、そのような介入の多くはユーザエンゲージメント属性を含まない。これは、社会技術的側面が設計、開発、実装においてしばしば考慮されないためである。また,mHealthのユーザ中心設計プロセスにおいて社会技術的要因が果たす役割については,文献上はほとんど議論されていない。本研究は、mHealthデザインとユーザエンゲージメントに対するテクノ中心のアプローチと、ユーザ中心のデザインに既存の普遍的なフレームワークに依存しているアプローチが、グローバル・サウスのほとんどのmHealthプロジェクトが維持できない結果に、効果がないことを証明している。本研究は, ユーザエンゲージメントに対する態度を探るため, mHealthデザイナや開発者と半構造化インタビューを行ったシエラレオネのプロジェクトについて検討する。ユーザエンゲージメントの障壁とファシリテータは、技術的あるいは社会技術的に識別され、分類された。調査の結果,社会的要因を考慮せずに技術中心のアプローチを採用することは,ユーザのエンゲージメントに悪影響を及ぼす可能性が示唆された。そこで本研究では,mHealthにユーザエンゲージメント属性をより効果的に組み込むための新しい設計フレームワークを提案する。

Despite the significance of user engagement for efficacy of mobile health (mHealth) in the Global South, many such interventions do not include user-engaging attributes. This is because socio-technical aspects are frequently not considered during the design, development, and implementation, stages of such initiatives. In addition, there is little discussion in the literature about the role socio-technical factors play in user-centered design processes for mHealth. This research posits consideration of socio-technical factors is required as techno-centric approaches to mHealth design and user engagement, as well as those relying on existing universal frameworks for user-centered design, have proven to be ineffective with the result that most mHealth projects in the Global South fail to sustain. This research examines projects in Sierra Leone where semi-structured interviews were conducted with mHealth designers and developers in order to explore their attitudes towards user engagement in this case. Barriers and facilitators to user engagement were identified and classified as either technical or socio-technical. Findings from the study indicate that adoption of a techno-centric approach without consideration of socio-technical factors can negatively affect user's engagement. Based on these findings, we propose to develop a new design framework for more effective inclusion of user-engaging attributes in mHealth.

翻訳日:2023-03-17 18:45:14 公開日:2021-08-24

# ハイブリッド電荷センサ単電子トランジスタとCMOS回路のシミュレーション

Simulations of hybrid charge-sensing single-electron-transistors and CMOS circuits ( http://arxiv.org/abs/2108.10467v1 )

ライセンス: Link先を確認

Tetsufumi Tanamoto and Keiji Ono

(参考訳) 単一電子トランジスタ(SET)は、量子計算など多くの分野で電荷センサとして広く使われている。一般に、SETの信号は相補的金属酸化物半導体(CMOS)デバイスよりも小さく、増幅回路の多くはこれらの信号を拡大するために必要である。 1つの小さな出力を増幅する代わりに、理論上はSETの1つが参照として使用されるようなペアのSETの増幅を考える。従来のSPICE(Simulation Program with Integrated Circuit Emphasis)回路シミュレータを用いて,SETとCMOSデバイスの2段階増幅過程をシミュレートする。 CMOS回路へのSETのペア実装により、SETからCMOS回路への直接信号転送により、SETの統合がより実現可能となる。

Single-electron transistors (SETs) have been extensively used as charge sensors in many areas such as quantum computations. In general, the signals of SETs are smaller than those of complementary metal-oxide semiconductor (CMOS) devices, and many amplifying circuits are required to enlarge these signals. Instead of amplifying a single small output, we theoretically consider the amplification of pairs of SETs, such that one of the SETs is used as a reference. We simulate the two-stage amplification process of SETs and CMOS devices using a conventional SPICE (Simulation Program with Integrated Circuit Emphasis) circuit simulator. Implementing the pairs of SETs into CMOS circuits makes the integration of SETs more feasible because of direct signal transfer from the SET to the CMOS circuits.

翻訳日:2023-03-17 07:52:42 公開日:2021-08-24

# 曲線空間時間における2つの検出器間の平衡および非平衡量子相関

Equilibrium and nonequilibrium quantum correlations between two detectors in curved space time ( http://arxiv.org/abs/2108.10454v1 )

ライセンス: Link先を確認

He Wang and Jin Wang

(参考訳) 2量子系(カーブラックホールの地平線付近)で符号化された平衡および非平衡量子情報相関について検討した。質量と角運動量,さらに局所曲率や加速度が2量子ビット間の量子相関の挙動に及ぼす影響について検討した。 2つの量子ビットの量子情報は時空構造にエンコードされていることを示す。非平衡の場合、非平衡は相関にも寄与する。

We investigate the equilibrium and nonequilibrium quantum information correlations encoded in two-qubit system (near the horizon of a Kerr black hole). We study the impact of mass and the angular momentum, and further the local curvature or accelerations on the behaviors of the quantum correlations between two qubits. We show the quantum information of two qubits is encoded in the space time structure. In nonequilibrium case, the nonequilibrium can also contribute to the correlations.

翻訳日:2023-03-17 07:51:59 公開日:2021-08-24

# 複素化された Poincar\'e 群の普遍被覆の既約ユニタリ表現のレビューと具体的な記述

Review and concrete description of the irreducible unitary representations of the universal cover of the complexified Poincar\'e group ( http://arxiv.org/abs/2108.10726v1 )

ライセンス: Link先を確認

Luigi Borasi

(参考訳) 我々は、既約ユニタリ表現を $\mathbb{C}^4\rtimes\mathbf{Spin}(4,\mathbb{C})$, すなわち、複素化されたポアンカル群 $\mathbb{C}^4\rtimes\mathbf{SO}(4,\mathbb{C})$ の普遍被覆の教育的表現を与える。これらの表現は1967年にロフマンによって初めて研究された。我々は、この文脈で関連する一般的なウィグナー・マッキー理論の事実とともに、彼の結果の現代的な定式化を提供する。さらに、これらの表現を実現するための異なる方法について議論し、非ゼロの「複素質量」の場合、より明示的な実現の詳細な構成を与える。この明示的な実現は、古典的なウィグナーの場合の $\mathbb{R}^4\rtimes\mathbf{Spin}^0(1,3)$ と平行して拡張する。我々の分析は、フェルミオン理論のユークリッド的定式化への関心が動機である。

We give a pedagogical presentation of the irreducible unitary representations of $\mathbb{C}^4\rtimes\mathbf{Spin}(4,\mathbb{C})$, that is, of the universal cover of the complexified Poincar\'e group $\mathbb{C}^4\rtimes\mathbf{SO}(4,\mathbb{C})$. These representations were first investigated by Roffman in 1967. We provide a modern formulation of his results together with some facts from the general Wigner-Mackey theory which are relevant in this context. Moreover, we discuss different ways to realize these representations and, in the case of a non-zero "complex mass", we give a detailed construction of a more explicit realization. This explicit realization parallels and extends the one used in the classical Wigner case of $\mathbb{R}^4\rtimes\mathbf{Spin}^0(1,3)$. Our analysis is motivated by the interest in the Euclidean formulation of Fermionic theories.

翻訳日:2023-03-17 07:48:47 公開日:2021-08-24

# BTZブラックホールの絡み合いパターンのクラスター代数的記述

Cluster algebraic description of entanglement patterns for the BTZ black hole ( http://arxiv.org/abs/2108.10638v1 )

ライセンス: Link先を確認

Bercel Boldis and P\'eter L\'evay

(参考訳) 高温限界における静的BTZブラックホールと双対な2次元共形場理論の熱状態について検討する。静的BTZスライスの境界を$N$サブシステムに分割した後、熱状態の絡み合いパターンを符号化する基盤となる$C_{N-1}$クラスタ代数が存在することを示す。また、固定された$N$に対してそのようなパターンを幾何学的にカプセル化するポリトープがシクロヘドロン${\mathcal C}_{N-1}$であることを示す。あるいは、これらの絡み合いのパターンは、Zamorodchikov $Y$-system of $C_{N-1}$ typeという用語で測地学(キネマティック空間)の空間に表せる。このような$y$-システムの境界条件は、btzブラックホールのエントロピーを特徴としている。

We study the thermal state of a two dimensional conformal field theory which is dual to the static BTZ black hole in the high temperature limit. After partitioning the boundary of the static BTZ slice into $N$ subsystems we show that there is an underlying $C_{N-1}$ cluster algebra encoding entanglement patterns of the thermal state. We also demonstrate that the polytope encapsulating such patterns in a geometric manner for a fixed $N$ is the cyclohedron ${\mathcal C}_{N-1}$. Alternatively these patterns of entanglement can be represented in the space of geodesics (kinematic space) in terms of a Zamolodchikov $Y$-system of $C_{N-1}$ type. The boundary condition for such an $Y$-system is featuring the entropy of the BTZ black hole.

翻訳日:2023-03-17 07:47:36 公開日:2021-08-24

# 非可換輸送距離空間上の量子チャネルのリッチ曲率

Ricci curvature of quantum channels on non-commutative transportation metric spaces ( http://arxiv.org/abs/2108.10609v1 )

ライセンス: Link先を確認

Li Gao and Cambyse Rouz\'e

(参考訳) Ollivierの研究に続いて、状態空間上の非可換計量の縮約として、量子チャネルの粗いリッチ曲率を導入する。これらの指標は[N]の精神における非可換輸送コストとして定義される。ゴズランとc. l\'{e}onard。 2006] は、文献の異なる量子ワッサースタイン距離に対する統一的なアプローチを与える。粗リッチ曲率下限とその双対勾配推定は、適切な仮定の下では、ポアンカルの不等式(スペクトルギャップ)および輸送コストの不等式を意味する。干渉関係を用いて、ギブスサンプル、ボソニックおよびフェルミオンビームスプリッターおよびn-量子ビット上のパウリチャネルの粗リッチ曲率の正の有界を得る。

Following Ollivier's work, we introduce the coarse Ricci curvature of a quantum channel as the contraction of non-commutative metrics on the state space. These metrics are defined as a non-commutative transportation cost in the spirit of [N. Gozlan and C. L\'{e}onard. 2006], which gives a unified approach to different quantum Wasserstein distances in the literature. We prove that the coarse Ricci curvature lower bound and its dual gradient estimate, under suitable assumptions, imply the Poincar\'{e} inequality (spectral gap) as well as transportation cost inequalities. Using intertwining relations, we obtain positive bounds on the coarse Ricci curvature of Gibbs samplers, Bosonic and Fermionic beam-splitters as well as Pauli channels on n-qubits.

翻訳日:2023-03-17 07:47:12 公開日:2021-08-24

# 量子誤差補正のための効率的な診断法

Efficient diagnostics for quantum error correction ( http://arxiv.org/abs/2108.10830v1 )

ライセンス: Link先を確認

Pavithran Iyer, Aditya Jain, Stephen D. Bartlett and Joseph Emerson

(参考訳) フォールトトレラント量子コンピューティングは、リソースのオーバーヘッドを正確に見積もる必要があるが、ゲート忠実度やダイヤモンド距離といった標準メトリクスは、論理性能の予測に乏しいことが示されている。本稿では,pauliエラー再構成に基づくスケーラブルな実験手法を提案する。数値的なエビデンスから,本手法は,限られたデータであっても,様々な誤差モデルに対する標準誤差測定値に基づいて予測を著しく上回ることを示す。本稿では,この手法が誤り訂正スキームの選択にどのように役立つかを説明する。

Fault-tolerant quantum computing will require accurate estimates of the resource overhead, but standard metrics such as gate fidelity and diamond distance have been shown to be poor predictors of logical performance. We present a scalable experimental approach based on Pauli error reconstruction to predict the performance of concatenated codes. Numerical evidence demonstrates that our method significantly outperforms predictions based on standard error metrics for various error models, even with limited data. We illustrate how this method assists in the selection of error correction schemes.

翻訳日:2023-03-17 07:38:57 公開日:2021-08-24

# 量子コンピューティング応用のためのFermilabにおける大規模ミリケルビンプラットフォーム

A large millikelvin platform at Fermilab for quantum computing applications ( http://arxiv.org/abs/2108.10816v1 )

ライセンス: Link先を確認

Matthew Hollister, Ram Dhuley and Grzegorz Tatkowski

(参考訳) 大きなmk冷却プラットフォームの必要性は、量子コンピューティングプラットフォームにおいて、ますます多くの極低温量子ビットをホストしたいという願望に支えられている。我々は,国立量子イニシアティブの下でエネルギー省から資金提供を受けたフェルミラボの超伝導量子材料・システムセンターの一環として,ミリケルビン温度を2m×1.5m程度の実験量で到達可能な極低温プラットフォームを開発している。このプラットフォームは超伝導高周波加速器キャビティ技術に基づく3次元量子ビットアーキテクチャをホストすることを目的としている。本稿では,プラットフォームの基本設計と期待する性能パラメータについて述べる。

The need for larger mK cooling platforms is being driven by the desire to host ever growing numbers of cryogenic qubits in quantum computing platforms. As part of the Superconducting Quantum Materials and Systems Center at Fermilab funded through the Department of Energy under the National Quantum Initiative, we are developing a cryogenic platform capable of reaching millikelvin temperatures in an experimental volume of 2 meters diameter by approximately 1.5 meters in height. The platform is intended to host a three-dimensional qubit architecture based on superconducting radiofrequency accelerator cavity technologies. This paper describes the baseline design of the platform, along with the expected key performance parameters.

翻訳日:2023-03-17 07:38:47 公開日:2021-08-24

# 非半単純tqftからの擬エルミートレビン-ウェンモデル

Pseudo-Hermitian Levin-Wen models from non-semisimple TQFTs ( http://arxiv.org/abs/2108.10798v1 )

ライセンス: Link先を確認

Nathan Geer, Aaron D. Lauda, Bertrand Patureau-Mirand, Joshua Sussan

(参考訳) 完全可解な擬エルミート型2次元スピンハミルトニアンの大きなクラスを構築する。これらの系の基底状態はシステムの空間的トポロジーにのみ依存する。トラエフ・ビロモデルを一般化した非半単純tqftを用いて,表面上の基底状態系を表面に割り当てられた値で同定する。非自明な例は、量子パラメータがユニタリの根に特殊化される量子sl(2)の表現の非半単純部分圏から生じる。

We construct large classes of exactly solvable pseudo-Hermitian 2D spin Hamiltonians. The ground states of these systems depend only on the spatial topology of the system. We identify the ground state system on a surface with the value assigned to the surface by a non-semisimple TQFT generalizing the Turaev-Viro model. A non-trivial example arises from a non-semisimple subcategory of representations of quantum sl(2) where the quantum parameter is specialized to a root of unity.

翻訳日:2023-03-17 07:38:24 公開日:2021-08-24

# マルコフ浴と量子雪崩

Markovian baths and quantum avalanches ( http://arxiv.org/abs/2108.10796v1 )

ライセンス: Link先を確認

Dries Sels

(参考訳) 本稿では,多体局所化相と熱介在物の安定性に関する数値的な結果について述べる。この作業は、Morningstarらによる最近の提案を単純化する。 [arXiv:2107.05642]およびマルコフ浴に摂動的に結合する小さな乱れたスピン鎖の研究。正準不定形ハイゼンベルク鎖の雪崩安定性に対する臨界障害はW>20を超えた。アンダーソン絶縁体とは対照的に、雪崩しきい値はシステムサイズとかなりずれており、研究体制の飽和の証拠はない。私は、結果は多体局所化フェーズの欠如によって最も容易に説明できると主張する。

In this work I will discuss some numerical results on the stability of the many-body localized phase to thermal inclusions. The work simplifies a recent proposal by Morningstar et al. [arXiv:2107.05642] and studies small disordered spin chains which are perturbatively coupled to a Markovian bath. The critical disorder for avalanche stability of the canonical disordered Heisenberg chain is shown to exceed W>20. In stark contrast to the Anderson insulator, the avalanche threshold drifts considerably with system size, with no evidence of saturation in the studied regime. I will argue that the results are most easily explained by the absence of a many-body localized phase.

翻訳日:2023-03-17 07:38:16 公開日:2021-08-24

# シリコンスピン量子ビットの低劣化・ロバストマイクロマグネット設計

Low dephasing and robust micromagnet designs for silicon spin qubits ( http://arxiv.org/abs/2108.10769v1 )

ライセンス: Link先を確認

N. I. Dumoulin Stuyck, F. A. Mohiyaddin, R. Li, M. Heyns, B. Govoreanu, and I. P. Radu

(参考訳) シリコン量子ビットでの電子スピン操作を可能にするためにマイクロマグネットが登場し、99:9%以上の単一量子ビットゲートフィデリティを実現している。しかし、これらのマイクロマグネットは歪んだ磁場勾配を量子ビットに応用し、スピン状態は電場ノイズの影響を受けやすく、コヒーレンス時間を制限している。ここでは、量子ビットの劣化を最小限に抑えつつ、高速な量子ビット制御とアドレス可能性を実現するマグネットの設計について述べる。具体的には、磁場勾配による劣化を最小限に抑え、量子ドットに対する磁気次元と位置を設計、最適化する。この設計によるマイクロマグネットによるデフェスレートは、最先端の実装よりも最大3桁低いため、長いコヒーレンス時間を可能にする。この設計は製造誤差に対して堅牢であり、様々なシリコン量子ビットデバイスジオメトリと組み合わせることで、コヒーレンス制限因子の探索と新しいアップスケーリングアプローチを可能にする。

Using micromagnets to enable electron spin manipulation in silicon qubits has emerged as a very popular method, enabling single-qubit gate fidelities larger than 99:9%. However, these micromagnets also apply stray magnetic field gradients onto the qubits, making the spin states susceptible to electric field noise and limiting their coherence times. We describe here a magnet design that minimizes qubit dephasing, while allowing for fast qubit control and addressability. Specifically, we design and optimize magnet dimensions and position relative to the quantum dots, minimizing dephasing from magnetic field gradients. The micromagnet-induced dephasing rates with this design are up to 3-orders of magnitude lower than state-of-the-art implementations, allowing for long coherence times. This design is robust against fabrication errors, and can be combined with a wide variety of silicon qubit device geometries, thereby allowing exploration of coherence limiting factors and novel upscaling approaches.

翻訳日:2023-03-17 07:37:06 公開日:2021-08-24

# 識別可能な量子エミッタの光絡み合い

Optical Entanglement of Distinguishable Quantum Emitters ( http://arxiv.org/abs/2108.10928v1 )

ライセンス: Link先を確認

David Levonian, Ralf Riedinger, Bartholomeus Machielse, Erik Knall, Mihir Bhaskar, Can Knaut, Rivka Bekenstein, Hongkun Park, Marko Loncar, Mikhail Lukin

(参考訳) 固体量子エミッターは、長寿命スピン記憶、高忠実度局所演算、長距離絡み合いのための光接続により量子ネットワークの実現に有望な候補である。しかし、局所環境の違いにより、固体エミッタは通常、異なる遷移周波数を特徴とし、任意のエミッタ対間の光的に媒介する絡み合いを作るのが困難である。本稿では,多くの直線幅で分離された光遷移を持つエミッタの接合方法を提案する。本手法では, 電子光学変調器を用いて, 一対のスピン量子ビットのパリティ測定を行うことができる。 7.4GHzの光遷移を持つダイヤモンドナノフォトニックキャビティを用いた2つのシリコン空洞を用いたプロトコルを実験的に実証した。識別可能なエミッタで作業することで、個別の量子ビットアドレッシングと読み出しが可能となり、コロケーションと空間分離エミッタの並列制御と絡み合いが可能になり、量子情報処理システムのスケールアップに向けた重要なステップとなる。

Solid-state quantum emitters are promising candidates for the realization of quantum networks, owing to their long-lived spin memories, high-fidelity local operations, and optical connectivity for long-range entanglement. However, due to differences in local environment, solid-state emitters typically feature a range of distinct transition frequencies, which makes it challenging to create optically mediated entanglement between arbitrary emitter pairs. We propose and demonstrate an efficient method for entangling emitters with optical transitions separated by many linewidths. In our approach, electro-optic modulators enable a single photon to herald a parity measurement on a pair of spin qubits. We experimentally demonstrate the protocol using two silicon-vacancy center sin a diamond nanophotonic cavity, with optical transitions separated by 7.4 GHz. Working with distinguishable emitters allows for individual qubit addressing and readout, enabling parallel control and entanglement of both co-located and spatially separated emitters, a key step towards scaling up quantum information processing systems

翻訳日:2023-03-17 07:30:48 公開日:2021-08-24

# 物質波導波路QEDにおけるマルチバンドおよびアレイ効果

Multiband and array effects in matter-wave-based waveguide QED ( http://arxiv.org/abs/2108.11759v1 )

ライセンス: Link先を確認

Alfonso Lanuza, Joonhyuk Kwon, Youngshin Kim and Dominik Schneble

(参考訳) 原子性物質波の自然放出に関する最近の実験は、導波路に結合した量子エミッタの挙動に新しい窓を開く。本稿では、導波路の帯域分散関係を近似することなく、理論上このシステムを研究するための無限積に基づくアプローチを開発する。本研究では, 1, 複数, 無限個の量子エミッタの1次元配列のシステムを解くとともに, 実験との比較を行った。このことは崩壊スペクトルの詳細な特性を導き、対流境界状態の族、超放射と異なるマルコフ放射を増強するための新しいメカニズム、物質-波分極の出現へと繋がる。

Recent experiments on spontaneous emission of atomic matter waves open a new window into the behavior of quantum emitters coupled to a waveguide. Here we develop an approach based on infinite products to study this system theoretically, without the need to approximate the band dispersion relation of the waveguide. We solve the system for a one-dimensional array of one, multiple and an infinite number of quantum emitters and compare with the experiments. This leads to a detailed characterization of the decay spectrum, with a family of in-gap bound states, new mechanisms for enhanced Markovian emission different from superradiance, and the emergence of matter-wave polaritons.

翻訳日:2023-03-17 07:21:12 公開日:2021-08-24

# GIAOs上の複雑な2電子積分のコレスキー分解:強磁場下での大分子に対する効率的なMP2計算

Cholesky decomposition of complex two-electron integrals over GIAOs: Efficient MP2 computations for large molecules in strong magnetic fields ( http://arxiv.org/abs/2108.11370v1 )

ライセンス: Link先を確認

Simon Blaschke and Stella Stopkowicz

(参考訳) 大規模量子化学計算では、電子反発積分(ERI)テンソルがメモリとディスク空間のボトルネックとなる。外部有限磁場を用いると、置換対称性が減少し、複雑な積分や波動関数パラメータを扱う必要があるため、この問題はさらに顕著になる。この問題を緩和する一つの方法は、ゲージを含む原子軌道上の複素エリスにコレスキー分解(cd)を適用することである。厳密でロバストなエラー制御を維持しつつ、選択されたベースセットから線形依存する製品密度を選択的に捨てて好適な圧縮率を確立する。このエラー制御は、事前定義された補助基底セットに依存する密度フィッティングのような概念的に類似した方法よりも大きな利点となる。有限体 (ff) Hartree-Fock と ff 2次 M{\o}ller Plesset 摂動理論の枠組みにおける CD の利用を実装した。本研究は,CD圧縮速度が有限磁場の存在下での計算において特に有用であることを示す。 FF-CD-MP2方式は、2000以上の基底関数を持つ系の適切な時間間隔で強磁場下での相関処理を可能にする。

In large-scale quantum-chemical calculations the electron-repulsion integral (ERI) tensor rapidly becomes the bottleneck in terms of memory and disk space. When an external finite magnetic field is employed, this problem becomes even more pronounced because of the reduced permutational symmetry and the need to work with complex integrals and wave-function parameters. One way to alleviate the problem is to employ a Cholesky decomposition (CD) to the complex ERIs over gauge-including atomic orbitals. The CD scheme establishes favourable compression rates by selectively discarding linearly dependent product densities from the chosen basis set while maintaining a rigorous and robust error control. This error control constitutes the main advantage over conceptually similar methods such as density fitting which rely on employing pre-defined auxiliary basis sets. We implemented the use of the CD in the framework of finite-field (ff) Hartree-Fock and ff second-order M{\o}ller Plesset perturbation theory. Our work demonstrates that the CD compression rates are particularly beneficial in calculations in the presence of a finite magnetic field. The ff-CD-MP2 scheme enables the correlated treatment of systems with more than 2000 basis functions in strong magnetic fields within a reasonable time span.

翻訳日:2023-03-17 07:20:59 公開日:2021-08-24

# 時間ではなく1つの空間座標を扱うハミルトン的な形式主義

A Hamiltonian-like formalism that treats one spatial coordinate -- rather than time -- differently ( http://arxiv.org/abs/2108.11330v1 )

ライセンス: Link先を確認

Sivapalan Chelvaniththilan

(参考訳) クナトゥム場理論(QFT)のハミルトン形式とラグランジュ形式は同値である。しかし、ローレンツ不変性はラグランジュ形式論において明確に見ることができるが、ハミルトニアン形式ではそれほど明確ではない。これは、時間はハミルトニアン形式論の空間座標とは少し異なる扱いを受けるからである。本稿では、(作用素と状態ベクトルを持つ)ハミルトニアン形式と同様に、空間座標の2つと等しい足場で時間を扱う別の形式を考案できるかどうかを考察するが、3つ目の形式は異なる扱いをされるが、時間は通常ハミルトニアン形式である。

The Hamiltonian and Lagrangian formalisms of Qunatum Field Theory (QFT) are equivalent. But while Lorentz invariance can be clearly seen in the Lagrangian formalism, it is not so explicit in the Hamiltonian one. This is because time is treated a little differently from the spatial coordinates in the Hamiltonian formalism. In this paper, I explore whether it is possible to devise another formalism that is just like the Hamiltonian one (with operators and state vectors) but which treats time on an equal footing with two of the spatial coordinates, while the third one is treated differently, the way time is in the usual Hamiltonian formalism.

翻訳日:2023-03-17 07:20:38 公開日:2021-08-24

# 全シリコン300mm集積プロセスにおける均一スピン量子デバイス

Uniform Spin Qubit Devices in an All-Silicon 300 mm Integrated Process ( http://arxiv.org/abs/2108.11317v1 )

ライセンス: Link先を確認

N. I. Dumoulin Stuyck, R. Li, C. Godfrin, A. Elsayed, S. Kubicek, J. Jussot, B. T. Chan, F. A. Mohiyaddin, M. Shehata, G. Simion, Y. Canvel, L. Goux, M. Heyns, B. Govoreanu, and I. P. Radu

(参考訳) 電子スピン量子ビットの大きな配列は、製造とデバイス均一性を大幅に改善する必要がある。ここでは300KからmKまで優れた量子ビットデバイス均一性とチューニング性を示す。これは、重なり合う多結晶シリコン系ゲートスタックを「オールシリコン」とリソグラフィ的に柔軟な300mm流に組み込むことで、初めて達成される。低次si/sio$_2$は、10kホール移動度が1.5 \cdot 10^4$ $cm^2$/vsである。電荷ノイズが低い(3.6$\mu$eV/$\sqrt{\mathrm{Hz}}$ at 1 Hz)よく制御されたセンサーは、最後の電子まで電荷を感知するために用いられる。約20年間(2-100GHz)にわたって優れた再現可能な相互結合制御を実証した。スピン操作と単発スピン読み出しを行い,約150$\mu$eVの谷分割エネルギーを抽出した。これらの低順で均一な量子ビットデバイスと300mmのfab統合は、大規模量子プロセッサへの高速なスケールアップの道を開く。

Larger arrays of electron spin qubits require radical improvements in fabrication and device uniformity. Here we demonstrate excellent qubit device uniformity and tunability from 300K down to mK temperatures. This is achieved, for the first time, by integrating an overlapping polycrystalline silicon-based gate stack in an 'all-Silicon' and lithographically flexible 300mm flow. Low-disorder Si/SiO$_2$ is proved by a 10K Hall mobility of $1.5 \cdot 10^4$ $cm^2$/Vs. Well-controlled sensors with low charge noise (3.6 $\mu$eV/$\sqrt{\mathrm{Hz}}$ at 1 Hz) are used for charge sensing down to the last electron. We demonstrate excellent and reproducible interdot coupling control over nearly 2 decades (2-100 GHz). We show spin manipulation and single-shot spin readout, extracting a valley splitting energy of around 150 $\mu$eV. These low-disorder, uniform qubit devices and 300mm fab integration pave the way for fast scale-up to large quantum processors.

翻訳日:2023-03-17 07:20:27 公開日:2021-08-24

# 均等:制御摂動の注入による量子アニーラーの忠実性の向上

EQUAL: Improving the Fidelity of Quantum Annealers by Injecting Controlled Perturbations ( http://arxiv.org/abs/2108.10964v1 )

ライセンス: Link先を確認

Ramin Ayanzadeh, Poulami Das, Swamit S. Tannu and Moinuddin Qureshi

(参考訳) 量子コンピューティング (quantum computing) は、量子力学特性を用いて計算困難問題を高速化する情報処理パラダイムである。有望だが、既存のゲートベースの量子コンピュータは数十キュービットしかなく、ほとんどのアプリケーションでは十分ではない。一方、数千の量子ビットを持つ既存のQAは、いくつかのドメイン固有の最適化問題を解く可能性がある。 QAは単一命令マシンであり、プログラムを実行するために、ハミルトニアンにキャストされ、ハードウェアに埋め込まれ、単一の量子マシン命令(QMI)が実行される。残念なことに、ハードウェアのノイズと欠陥は、QMIが数千のトライアルで実行されているとしても、QAのサブ最適化ソリューションをもたらす。 QAのプログラム可能性の制限は、ユーザが全てのトライアルで同じQMIを実行することを意味する。この実験はすべて、実行中に同様のノイズプロファイルを経験し、体系的なバイアスをもたらす。我々は,系統的バイアスが最適解につながり,より多くの試行の実行や既存の誤り緩和スキームを用いることで軽減できないことを観察する。この課題に対処するために、EQUAL(Ensemble Quantum Annealing)を提案する。 EQUALは、制御された摂動をプログラムQMIに追加することにより、QMIのアンサンブルを生成する。 QMIのアンサンブルは、QA上で実行されると、全てのトライアルで同じバイアスに遭遇することを避けて、ソリューションの品質を向上させる。 2041-qubit D-Wave QAを用いて評価したところ、EQUALは平均14%(最大26%)でベースラインと理想の差を橋渡しし、追加の試行は不要であった。 EQUALは既存のエラー軽減スキームと組み合わせて、ベースラインとイデアルの違いを平均で55%(最大68%)橋渡しすることができる。

Quantum computing is an information processing paradigm that uses quantum-mechanical properties to speedup computationally hard problems. Although promising, existing gate-based quantum computers consist of only a few dozen qubits and are not large enough for most applications. On the other hand, existing QAs with few thousand of qubits have the potential to solve some domain-specific optimization problems. QAs are single instruction machines and to execute a program, the problem is cast to a Hamiltonian, embedded on the hardware, and a single quantum machine instruction (QMI) is run. Unfortunately, noise and imperfections in hardware result in sub-optimal solutions on QAs even if the QMI is run for thousands of trials. The limited programmability of QAs mean that the user executes the same QMI for all trials. This subjects all trials to a similar noise profile throughout the execution, resulting in a systematic bias. We observe that systematic bias leads to sub-optimal solutions and cannot be alleviated by executing more trials or using existing error-mitigation schemes. To address this challenge, we propose EQUAL (Ensemble Quantum Annealing). EQUAL generates an ensemble of QMIs by adding controlled perturbations to the program QMI. When executed on the QA, the ensemble of QMIs steers the program away from encountering the same bias during all trials and thus, improves the quality of solutions. Our evaluations using the 2041-qubit D-Wave QA show that EQUAL bridges the difference between the baseline and the ideal by an average of 14% (and up to 26%), without requiring any additional trials. EQUAL can be combined with existing error mitigation schemes to further bridge the difference between the baseline and ideal by an average of 55% (and up to 68%).

翻訳日:2023-03-17 07:20:11 公開日:2021-08-24

# 量子崩壊の形式的側面

Formal Aspects of Quantum Decay ( http://arxiv.org/abs/2108.10957v1 )

ライセンス: Link先を確認

D. F. Ram\'irez Jim\'enez and N. G. Kelkar

(参考訳) 不安定状態の生存確率の計算のためのフォック・クリロフ形式は、状態の密度に関する数学的制約に特に注意を払って再検討され、フーリエ変換は生存振幅を与える。純粋な指数的生存振幅に対応する状態の密度を構築することは不可能であることを示す。彼の生存確率 $p(t)$ と状態密度の自己相関関数はコサインフーリエ変換の対であることが示されている。この結果はウィナー・ヒンチンの定理の特別な場合であり、$P(t)$ を時間の偶関数とし、それによって状態の密度が大きなエネルギーで消えるフォームファクタを含むように強制する。振動数の関数として$P(t)$,$n$を表現し、非指数性から指数性への移行領域の小さな時間における部分的特徴と、大きな時間におけるパワーローの崩壊への指数性について論じる。短時間の遷移は、生存確率が1つの振動を完了した時に起こる。発振の数は共鳴状態の性質に依存し、不安定状態の進化の完全な記述は各領域における発振の数の限界を決定することによって提供される。

The Fock-Krylov formalism for the calculation of survival probabilities of unstable states is revisited paying particular attention to the mathematical constraints on the density of states, the Fourier transform of which gives the survival amplitude. We show that it is not possible to construct a density of states corresponding to a purely exponential survival amplitude. he survival probability $P(t)$ and the autocorrelation function of the density of states are shown to form a pair of cosine Fourier transforms. This result is a particular case of the Wiener Khinchin theorem and forces $P(t)$ to be an even function of time which in turn forces the density of states to contain a form factor which vanishes at large energies. Subtle features of the transition regions from the non-exponential to the exponential at small times and the exponential to the power law decay at large times are discussed by expressing $P(t)$ as a function of the number of oscillations, $n$, performed by it. The transition at short times is shown to occur when the survival probability has completed one oscillation. The number of oscillations depend on the properties of the resonant state and a complete description of the evolution of the unstable state is provided by determining the limits on the number of oscillations in each region.

翻訳日:2023-03-17 07:19:39 公開日:2021-08-24

# オープン量子ロータ:相関と物理電流を接続する

Open Quantum Rotors: Connecting Correlations and Physical Currents ( http://arxiv.org/abs/2108.10955v1 )

ライセンス: Link先を確認

Ricardo Puebla, Alberto Imparato, Alessio Belenchia, Mauro Paternostro

(参考訳) 我々は、温度の異なる一連の熱浴と相互作用する量子ロータの有限な一次元鎖を考える。ローター間の相互作用をキラルにすると、そのようなシステムは自律的な熱モーターとして振る舞う。このような動的応答は、熱力学的極限における系の基底状態が量子相転移を示すハミルトンパラメータの範囲で強く発音される。このようなワークポイントは、システムの状態内の大きな量子コヒーレンスと多部量子相関と関連付けられている。このことは、そのような量子自律モーターの最適動作機構が最大量子性の一つであることを示唆している。

We consider a finite one-dimensional chain of quantum rotors interacting with a set of thermal baths at different temperatures. When the interaction between the rotors is made chiral, such a system behaves as an autonomous thermal motor, converting heat currents into non-vanishing rotational ones. Such a dynamical response is strongly pronounced in the range of the Hamiltonian parameters for which the ground state of the system in the thermodynamic limit exhibits a quantum phase transition. Such working points are associated with large quantum coherence and multipartite quantum correlations within the state of the system. This suggests that the optimal operating regime of such quantum autonomous motor is one of maximal quantumness.

翻訳日:2023-03-17 07:19:19 公開日:2021-08-24

# キャブライディング時の通勤者の快適性に及ぼす運転行動の影響:ドライバーレーティングの新しい視点に向けて

Impact of Driving Behavior on Commuter's Comfort during Cab Rides: Towards a New Perspective of Driver Rating ( http://arxiv.org/abs/2108.10944v1 )

ライセンス: Link先を確認

Rohit Verma, Sugandh Pargal, Debasree Das, Tanusree Parbat, Sai Shankar Kambalapalli, Bivas Mitra, and Sandip Chakraborty

(参考訳) タクシーの通勤の快適さは、ドライバーのレーティングや、uberやlyftのような配車会社の評価に影響する。既存の研究では、通勤者の快適性はパーソナライズされたレベルで異なるだけでなく、同じ通勤者に対して異なる旅行で異なる認識を受けることが示されている。さらに、運転行動や運転環境など、快適感に影響を及ぼす要因がいくつかある。運転行動の影響による通勤者の快適感を自動的に抽出することは、通勤者の満足度を満足させるのに役立つドライバーへのタイムリーなフィードバックに不可欠である。これを踏まえて、通常このようなタクシーに乗る通勤者約200人を調査し、タクシーの乗り心地に影響を及ぼす一連の特徴を得た。次に、通勤者からスマートフォンセンサデータを収集し、そのデータから空間時系列特徴を抽出し、運転に関して5ポイントスケールで通勤者の快適さのレベルを算出するシステム ridergoを開発した。 Ridergoは階層的時間記憶モデルに基づくアプローチを用いて特徴分布の異常を観測し、マルチタスク学習に基づくニューラルネットワークモデルを訓練し、パーソナライズされたレベルで通勤者の快適なレベルを得る。モデルはまた、通勤者に対して、利用可能なデータセットに新しいデータポイントを追加するようにインテリジェントにクエリし、定期的なトレーニングよりも自分自身を改善する。被験者30名を対象にRidergoの評価を行った結果,運転が快適感に影響を及ぼす場合,効率のよい快適度が得られた。

Commuter comfort in cab rides affects driver rating as well as the reputation of ride-hailing firms like Uber/Lyft. Existing research has revealed that commuter comfort not only varies at a personalized level but also is perceived differently on different trips for the same commuter. Furthermore, there are several factors, including driving behavior and driving environment, affecting the perception of comfort. Automatically extracting the perceived comfort level of a commuter due to the impact of the driving behavior is crucial for a timely feedback to the drivers, which can help them to meet the commuter's satisfaction. In light of this, we surveyed around 200 commuters who usually take such cab rides and obtained a set of features that impact comfort during cab rides. Following this, we develop a system Ridergo which collects smartphone sensor data from a commuter, extracts the spatial time series feature from the data, and then computes the level of commuter comfort on a five-point scale with respect to the driving. Ridergo uses a Hierarchical Temporal Memory model-based approach to observe anomalies in the feature distribution and then trains a Multi-task learning-based neural network model to obtain the comfort level of the commuter at a personalized level. The model also intelligently queries the commuter to add new data points to the available dataset and, in turn, improve itself over periodic training. Evaluation of Ridergo on 30 participants shows that the system could provide efficient comfort score with high accuracy when the driving impacts the perceived comfort.

翻訳日:2023-03-17 07:19:11 公開日:2021-08-24

# ソーシャルメディア上のフェイクニュース拡散者の心理・動機要因によるプロファイリング

Profiling Fake News Spreaders on Social Media through Psychological and Motivational Factors ( http://arxiv.org/abs/2108.10942v1 )

ライセンス: Link先を確認

Mansooreh Karami, Tahora H. Nazer, Huan Liu

(参考訳) 過去10年間のフェイクニュースの台頭は、選挙に関する意見の揺さぶりから、パンデミックの間に不確実性を生み出すまで、数多くの結果をもたらした。偽ニュースに対処するために開発されたほとんどの方法は、偽ニュースコンテンツや、それを生成する悪意のあるアクターに焦点を当てている。しかし、偽ニュースのバイラル性は、それを広めるユーザーに大きく依存している。これらのユーザーに対する深い理解は、偽ニュースを拡散する可能性のあるユーザーを特定するためのフレームワークの開発に寄与することができる。本研究では,ソーシャルメディア上でのフェイクニューススプレッシャーの特徴と動機要因について,心理学的理論や行動学的研究から考察した。次に、フェイクニューススプレッドラーが他のユーザーと異なる特徴を示すことができるかどうかを判定する一連の実験を行う。さらに,本実験における偽ニュース拡散器の特性が,実際のソーシャルメディア環境における偽ニュース拡散器の検出に応用できるかどうかを検証して検討した。

The rise of fake news in the past decade has brought with it a host of consequences, from swaying opinions on elections to generating uncertainty during a pandemic. A majority of methods developed to combat disinformation either focus on fake news content or malicious actors who generate it. However, the virality of fake news is largely dependent upon the users who propagate it. A deeper understanding of these users can contribute to the development of a framework for identifying users who are likely to spread fake news. In this work, we study the characteristics and motivational factors of fake news spreaders on social media with input from psychological theories and behavioral studies. We then perform a series of experiments to determine if fake news spreaders can be found to exhibit different characteristics than other users. Further, we investigate our findings by testing whether the characteristics we observe amongst fake news spreaders in our experiments can be applied to the detection of fake news spreaders in a real social media environment.

翻訳日:2023-03-17 07:18:44 公開日:2021-08-24

# 低ランクサドルフリーニュートン:確率的非凸最適化のためのスケーラブルな方法

Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization ( http://arxiv.org/abs/2002.02881v3 )

ライセンス: Link先を確認

Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas

(参考訳) 現代のディープラーニングでは、大規模データセットと一般化特性から、高度にサブサンプル化された確率近似(SA)法が平均近似(SAA)法より好まれている。加えて、ヘッセン人の形成と分解のコストが認識されているため、これらの問題には二階法が用いられない。この研究において、ニュートン法をSA体制に拡張する動機付けを行い、低階近似を好んでヘッセンを形成することを避けるため、スケーラブルな低階サドルフリーニュートン法(LRSFN)を用いることを主張した。さらにLRSFNは、不確定領域から素早く脱出し、より良い最適化ソリューションを実現する。 SA設定では、反復的な更新は確率的ノイズに支配され、手法の安定性が鍵となる。我々は, 連続時間安定性解析フレームワークを導入し, ニュートン法に対する確率的誤差を悪条件のヘッシアンによって大きく増幅できることを示す。 LRSFN法はこの安定性問題をレバンス・マルカールト減衰によって緩和する。しかし、一般に解析は、決定論的問題とは異なり、確率的ヘッセン情報と勾配情報を持つ二階法は小さなステップを踏む必要があることを示している。数値計算の結果,LRSFNは他の手法が抱える問題のある不確定領域から逃れることが可能であり,制限的なステップ長条件下であっても,等価な計算作業の一般化性の観点から,大規模深層学習タスクにおいて一般的な一階法よりも優れていることがわかった。

In modern deep learning, highly subsampled stochastic approximation (SA) methods are preferred to sample average approximation (SAA) methods because of large data sets as well as generalization properties. Additionally, due to perceived costs of forming and factorizing Hessians, second order methods are not used for these problems. In this work we motivate the extension of Newton methods to the SA regime, and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation. Additionally, LRSFN can facilitate fast escape from indefinite regions leading to better optimization solutions. In the SA setting, iterative updates are dominated by stochastic noise, and stability of the method is key. We introduce a continuous time stability analysis framework, and use it to demonstrate that stochastic errors for Newton methods can be greatly amplified by ill-conditioned Hessians. The LRSFN method mitigates this stability issue via Levenberg-Marquardt damping. However, generally the analysis shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that LRSFN can escape indefinite regions that other methods have issues with; and even under restrictive step length conditions, LRSFN can outperform popular first order methods on large scale deep learning tasks in terms of generalizability for equivalent computational work.

翻訳日:2023-01-03 05:20:07 公開日:2021-08-24

# 微分可能ファジィ論理演算子の解析

Analyzing Differentiable Fuzzy Logic Operators ( http://arxiv.org/abs/2002.06100v2 )

ライセンス: Link先を確認

Emile van Krieken, Erman Acar, Frank van Harmelen

(参考訳) AIコミュニティは、これらのアプローチの強みと弱みが相補的であるとしばしば主張されるため、象徴的アプローチとニューラルアプローチの組み合わせに注意を向けている。最近の文献のトレンドは、ファジィ論理の演算子を用いる弱い教師付き学習技術である。特に、このような論理に記述された事前の背景知識を用いて、ラベル付きでノイズの多いデータからニューラルネットワークのトレーニングを支援する。ニューラルネットワークを用いて論理記号を解釈することにより、この背景知識を通常の損失関数に追加することができる。我々は,ファジィ論理文からの論理演算子の大規模な集合が,微分可能な学習環境でどのように振る舞うかを,形式的かつ実証的に研究する。これらの演算子の多くは、最もよく知られたものを含めて、この設定には非常に適していないことが分かりました。さらなる発見は、これらのファジィ論理における含意の扱いを懸念し、前者によって駆動される勾配とそれに伴う含意の強い不均衡を示す。さらに,この現象に取り組むために,新たなファジィ・インジェクション(sgmoidal implications)のファミリーを導入する。最後に,半教師付き学習に微分可能なファジィ論理を用いることが可能であることを実証的に示し,運用者が実際にどのように振る舞うかを比較する。教師付きベースラインよりも最大の性能向上を達成するためには、学習において良好に機能するが、通常の論理法則を満たさない論理演算子の非標準的な組み合わせに頼る必要がある。

The AI community is increasingly putting its attention towards combining symbolic and neural approaches, as it is often argued that the strengths and weaknesses of these approaches are complementary. One recent trend in the literature are weakly supervised learning techniques that employ operators from fuzzy logics. In particular, these use prior background knowledge described in such logics to help the training of a neural network from unlabeled and noisy data. By interpreting logical symbols using neural networks, this background knowledge can be added to regular loss functions, hence making reasoning a part of learning. We study, both formally and empirically, how a large collection of logical operators from the fuzzy logic literature behave in a differentiable learning setting. We find that many of these operators, including some of the most well-known, are highly unsuitable in this setting. A further finding concerns the treatment of implication in these fuzzy logics, and shows a strong imbalance between gradients driven by the antecedent and the consequent of the implication. Furthermore, we introduce a new family of fuzzy implications (called sigmoidal implications) to tackle this phenomenon. Finally, we empirically show that it is possible to use Differentiable Fuzzy Logics for semi-supervised learning, and compare how different operators behave in practice. We find that, to achieve the largest performance improvement over a supervised baseline, we have to resort to non-standard combinations of logical operators which perform well in learning, but no longer satisfy the usual logical laws.

翻訳日:2023-01-01 04:22:54 公開日:2021-08-24

# Triangle-Net: ポイントクラウド学習におけるロバストネスを目指して

Triangle-Net: Towards Robustness in Point Cloud Learning ( http://arxiv.org/abs/2003.00856v2 )

ライセンス: Link先を確認

Chenxi Xiao and Juan Wachs

(参考訳) 3次元オブジェクト認識は、自動運転車やサービスロボット、監視ドローンといった多くのコンピュータビジョンシステムにとって、非構造環境でより効果的に動作するための重要な能力になりつつある。これらのリアルタイムシステムは、様々なサンプリング解像度、ノイズ測定、無拘束ポーズ構成にロバストな効果的な分類方法を必要とする。これまでの研究では、ポイントのスパーシティ、回転、位置固有分散がポイントクラウドに基づく分類技術の性能を著しく低下させる可能性があることが示されている。しかし、どちらも多因子分散や顕著な分散に対して十分に堅牢ではない。そこで本研究では, 回転, 位置シフト, スケーリングに対する不変性を同時に実現し, 点間隔に頑健な3次元分類手法を提案する。この目的のために,提案したニューラルネットワークでエンドツーエンドに学習し,頑健な3Dオブジェクトの潜在表現を得ることのできる点雲グラフ構造を利用する新機能を導入する。このような潜在表現は,点がばらばらである場合,オブジェクト分類や検索タスクの性能を著しく向上させることができる。さらに, 任意のSO(3)回転下では, 16点のみのスパース点雲を用いて, ModelNet 40分類タスクにおいて, ポイントネットと3DmFVをそれぞれ35.0%, 28.1%上回った。

Three dimensional (3D) object recognition is becoming a key desired capability for many computer vision systems such as autonomous vehicles, service robots and surveillance drones to operate more effectively in unstructured environments. These real-time systems require effective classification methods that are robust to various sampling resolutions, noisy measurements, and unconstrained pose configurations. Previous research has shown that points' sparsity, rotation and positional inherent variance can lead to a significant drop in the performance of point cloud based classification techniques. However, neither of them is sufficiently robust to multifactorial variance and significant sparsity. In this regard, we propose a novel approach for 3D classification that can simultaneously achieve invariance towards rotation, positional shift, scaling, and is robust to point sparsity. To this end, we introduce a new feature that utilizes graph structure of point clouds, which can be learned end-to-end with our proposed neural network to acquire a robust latent representation of the 3D object. We show that such latent representations can significantly improve the performance of object classification and retrieval tasks when points are sparse. Further, we show that our approach outperforms PointNet and 3DmFV by 35.0% and 28.1% respectively in ModelNet 40 classification tasks using sparse point clouds of only 16 points under arbitrary SO(3) rotation.

翻訳日:2022-12-28 07:20:17 公開日:2021-08-24

# 人工知能における価値学習に応用した動的認知

Dynamic Cognition Applied to Value Learning in Artificial Intelligence ( http://arxiv.org/abs/2005.05538v6 )

ライセンス: Link先を確認

Nythamar de Oliveira and Nicholas Kluge Corr\^ea

(参考訳) 人工知能(AI)開発の専門家は、インテリジェントシステムとエージェントの開発の進歩が、我々の社会における重要な領域を形作ると予測している。しかし、そのような進歩が慎重さで行われなければ、それは人類にとって否定的な結果をもたらす可能性がある。このため、この分野の何人かの研究者は、堅牢で有益で安全な人工知能の概念を開発しようとしている。現在、AI研究の分野におけるいくつかのオープンな問題は、インテリジェントエージェントの望ましくない振る舞いを避けることの難しさと、そのようなシステムが何をするかを規定することによるものである。直交論で論じられているように、aiが単に知性のために道徳的な好みを発達させることは期待できないという事実を考えると、人工知能エージェントが人間の価値観に合致する価値を持っていることは最も重要である。おそらくこの難しさは、表現的認知手法を用いて、目的、価値、目的を表現している問題に対処する方法に由来する。この問題の解決策は、ドレフュスが提唱した動的認知的アプローチであり、その現象論的哲学は、世界にいる人間の経験は象徴的あるいは接続主義的な認知的手法では表現できないことを擁護している。この問題に対する可能なアプローチは、SED(situated embodied dynamics)のような理論モデルを使用して、AIにおける価値学習問題に対処することだ。

Experts in Artificial Intelligence (AI) development predict that advances in the development of intelligent systems and agents will reshape vital areas in our society. Nevertheless, if such an advance isn't done with prudence, it can result in negative outcomes for humanity. For this reason, several researchers in the area are trying to develop a robust, beneficial, and safe concept of artificial intelligence. Currently, several of the open problems in the field of AI research arise from the difficulty of avoiding unwanted behaviors of intelligent agents, and at the same time specifying what we want such systems to do. It is of utmost importance that artificial intelligent agents have their values aligned with human values, given the fact that we cannot expect an AI to develop our moral preferences simply because of its intelligence, as discussed in the Orthogonality Thesis. Perhaps this difficulty comes from the way we are addressing the problem of expressing objectives, values, and ends, using representational cognitive methods. A solution to this problem would be the dynamic cognitive approach proposed by Dreyfus, whose phenomenological philosophy defends that the human experience of being-in-the-world cannot be represented by the symbolic or connectionist cognitive methods. A possible approach to this problem would be to use theoretical models such as SED (situated embodied dynamics) to address the values learning problem in AI.

翻訳日:2022-12-03 19:08:57 公開日:2021-08-24

# 粗いラベルを用いた弱教師付き表現学習

Weakly Supervised Representation Learning with Coarse Labels ( http://arxiv.org/abs/2005.09681v3 )

ライセンス: Link先を確認

Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu

(参考訳) データ収集のための計算能力と技術の開発により、ディープラーニングは、ビジュアルベンチマークデータセット上の既存のアルゴリズムよりも優れた性能を示す。深層学習のメカニズムの研究に多くの努力が注がれている。重要な観察の1つは、ディープラーニングが原材料から直接タスク依存の方法で識別パターンを学習できることである。そのため、深層学習により得られた表現は手作りの特徴を著しく上回る。しかし、現実のアプリケーションでは、オンラインショッピングでのビジュアル検索のようなタスク固有のラベルを収集するには高価すぎる。これらのタスク固有のラベルの可用性が限られているのに対して、粗いクラスラベルはずっと手頃だが、それらから学んだ表現はターゲットタスクに最適である。この課題を軽減するために,粗いラベルのみを利用できる場合に,対象タスクのきめ細かいパターンを学習するアルゴリズムを提案する。さらに重要なのは、理論的保証を提供することです。実世界のデータセットに対する大規模な実験により,提案手法は,粗いクラス情報のみをトレーニングに利用できる場合に,対象タスク上での学習表現の性能を著しく向上させることができることを示した。コードは \url{https://github.com/idstcv/CoIns} で入手できる。

With the development of computational power and techniques for data collection, deep learning demonstrates a superior performance over most existing algorithms on visual benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn the discriminative patterns from raw materials directly in a task-dependent manner. Therefore, the representations obtained by deep learning outperform hand-crafted features significantly. However, for some real-world applications, it is too expensive to collect the task-specific labels, such as visual search in online shopping. Compared to the limited availability of these task-specific labels, their coarse-class labels are much more affordable, but representations learned from them can be suboptimal for the target task. To mitigate this challenge, we propose an algorithm to learn the fine-grained patterns for the target task, when only its coarse-class labels are available. More importantly, we provide a theoretical guarantee for this. Extensive experiments on real-world data sets demonstrate that the proposed method can significantly improve the performance of learned representations on the target task, when only coarse-class information is available for training. Code is available at \url{https://github.com/idstcv/CoIns}.

翻訳日:2022-12-01 14:16:03 公開日:2021-08-24

# Cumulant GAN

Cumulant GAN ( http://arxiv.org/abs/2006.06625v3 )

ライセンス: Link先を確認

Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis, Yannis Stylianou and Markos Katsoulakis

(参考訳) 本稿では,より深い理論的理解と基礎的最適化問題に対する安定性と性能の向上を目的とした,gans(generative adversarial network)訓練のための新しい損失関数を提案する。新たな損失関数は、\emph{cumulant gan} を生成する累積生成関数に基づいている。最近派生した変分公式に依拠して、対応する最適化問題は r{\'e}nyi 分岐最小化に相当し、gan 損失の(部分的に)統一的な視点を提供する: r{\'e}nyi ファミリーは、kullback-leibler divergence (kld)、reverse kld、helinger distance、$\chi^2$-divergence を含む。 Wasserstein GANは累積GANのメンバーでもある。安定性の面では、線形判別器、ガウス分布および標準勾配降下上昇アルゴリズムに対する累積GANのナッシュ平衡への線形収束を厳密に証明する。最後に,Wasserstein GANに対して画像生成がより堅牢であることが実験的に証明され,より弱い判別器と強い判別器の両方を考慮すると,開始点とFr'echet開始距離の両方で大幅に改善される。

In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions giving rise to \emph{Cumulant GAN}. Relying on a recently-derived variational formula, we show that the corresponding optimization problem is equivalent to R{\'e}nyi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the R{\'e}nyi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance and $\chi^2$-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score and Fr\'echet inception distance when both weaker and stronger discriminators are considered.

翻訳日:2022-11-22 13:15:21 公開日:2021-08-24

# パサデナ:知覚的に認識し、敵対的妄想攻撃

Pasadena: Perceptually Aware and Stealthy Adversarial Denoise Attack ( http://arxiv.org/abs/2007.07097v3 )

ライセンス: Link先を確認

Yupeng Cheng, Qing Guo, Felix Juefei-Xu, Wei Feng, Shang-Wei Lin, Weisi Lin, Yang Liu

(参考訳) 画像デノイジングは、低品質の撮像センサ、不安定な画像伝送プロセス、あるいは低い光条件により、マルチメディアデバイスで撮影された画像に広く存在する自然ノイズを除去することができる。近年の研究では、画像の雑音化は、例えば画像分類のような高レベルな視覚タスクに効果があることも判明している。本研究では,この常識に挑戦し,画像のデノイジングが最先端のディープニューラルネットワーク(dnn)を騙し,画質を高めることができるかどうかという,まったく新しい問題を探求する。この目的のために,敵の攻撃の観点からこの問題を研究するための最初の試みを開始し,敵の妄想攻撃を提案する。まず、マルチメディアデバイスに広くデプロイされたイメージデノイジングモジュール内に攻撃をステルスに埋め込む新しいタスクを、画像のポスト処理操作として特定し、視覚的な画像品質と愚かなdnnを同時に向上させます。第2に,この課題を画像フィルタリングのカーネル予測問題として定式化し,効果的なノイズ除去と逆アタックを同時に行うために,逆ノイズのないカーネルを生成できる逆検出型カーネル予測を提案する。第三に、攻撃がより効果的になりうるセマンティック関連脆弱性領域を特定するために、適応的な知覚領域ローカライゼーションを実装している。本稿では,提案手法をPasadena (Perceptually Aware and Stealthy Adversarial DeNoise Attack) と命名し,NeurIPS'17逆競合データセット(CVPR2021-AIC-VI:unrestricted adversarial attacks on ImageNet,etc)で検証した。包括的評価と分析により,本手法は偏執だけでなく,最先端攻撃に対する成功率や伝達性も著しく向上することが示された。

Image denoising can remove natural noise that widely exists in images captured by multimedia devices due to low-quality imaging sensors, unstable image transmission processes, or low light conditions. Recent works also find that image denoising benefits the high-level vision tasks, e.g., image classification. In this work, we try to challenge this common sense and explore a totally new problem, i.e., whether the image denoising can be given the capability of fooling the state-of-the-art deep neural networks (DNNs) while enhancing the image quality. To this end, we initiate the very first attempt to study this problem from the perspective of adversarial attack and propose the adversarial denoise attack. More specifically, our main contributions are three-fold: First, we identify a new task that stealthily embeds attacks inside the image denoising module widely deployed in multimedia devices as an image post-processing operation to simultaneously enhance the visual image quality and fool DNNs. Second, we formulate this new task as a kernel prediction problem for image filtering and propose the adversarial-denoising kernel prediction that can produce adversarial-noiseless kernels for effective denoising and adversarial attacking simultaneously. Third, we implement an adaptive perceptual region localization to identify semantic-related vulnerability regions with which the attack can be more effective while not doing too much harm to the denoising. We name the proposed method as Pasadena (Perceptually Aware and Stealthy Adversarial DENoise Attack) and validate our method on the NeurIPS'17 adversarial competition dataset, CVPR2021-AIC-VI: unrestricted adversarial attacks on ImageNet,etc. The comprehensive evaluation and analysis demonstrate that our method not only realizes denoising but also achieves a significantly higher success rate and transferability over state-of-the-art attacks.

翻訳日:2022-11-10 15:18:07 公開日:2021-08-24

# 神経形制御

Neuromorphic Control ( http://arxiv.org/abs/2011.04441v2 )

ライセンス: Link先を確認

Luka Ribar, Rodolphe Sepulchre

(参考訳) ニューロモルフィックエンジニアリング(Neuromorphic Engineering)は、ニューラルネットワークの生物学的組織からインスピレーションを得て、コンピューティング、センシング、アクティベーションのための新しい技術を開発することを目的とした、急速に発展する分野である。このようなシステムのユニークな性質は、新しい信号処理と制御パラダイムを要求する。本稿では、異なる時間スケールで作用する正負のフィードバックループと正のフィードバックループからなる興奮性神経系の混合フィードバック組織について紹介する。生物学的神経調節の原理は、混合フィードバック系をニューロモルフィズム的に設計し制御するための方法論を示唆している。提案する設計は、生体ニューロンの組織化を反映し、ニューロモルフィックな電子回路のハードウェアコンポーネントを利用する基本回路要素の並列相互接続からなる。相互接続構造は、入力出力整形問題として神経制御を再構成する単純な制御手法によって、ニューロモルフィックシステムを提供する。神経制御のポテンシャルは、混合フィードバック原理のスケーラビリティを示唆する基本的なネットワークの例に示される。

Neuromorphic engineering is a rapidly developing field that aims to take inspiration from the biological organization of neural systems to develop novel technology for computing, sensing, and actuating. The unique properties of such systems call for new signal processing and control paradigms. The article introduces the mixed feedback organization of excitable neuronal systems, consisting of interlocked positive and negative feedback loops acting in distinct timescales. The principles of biological neuromodulation suggest a methodology for designing and controlling mixed-feedback systems neuromorphically. The proposed design consists of a parallel interconnection of elementary circuit elements that mirrors the organization of biological neurons and utilizes the hardware components of neuromorphic electronic circuits. The interconnection structure endows the neuromorphic systems with a simple control methodology that reframes the neuronal control as an input-output shaping problem. The potential of neuronal control is illustrated on elementary network examples that suggest the scalability of the mixed-feedback principles.

翻訳日:2022-09-28 01:18:49 公開日:2021-08-24

# ランダムウォークを用いたビデオ中の物体検出のための自己教師型学習システム

A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs ( http://arxiv.org/abs/2011.05459v3 )

ライセンス: Link先を確認

Juntao Tan, Changkyu Song, Abdeslam Boularias

(参考訳) 本稿では,画像中の物体の新規かつ未発見のカテゴリを検出するための学習用自己教師付きシステムを提案する。提案システムは,様々なオブジェクトを含むシーンの未ラベル映像を入力として受信する。ビデオのフレームは深度情報を使ってオブジェクトに分割され、各ビデオに沿ってセグメントが追跡される。その後、システムは重み付きグラフを構築し、それらを含むオブジェクト間の類似性に基づいてシーケンスを接続する。オブジェクトの2つのシーケンス間の類似性は、オブジェクトの視点を整列するために2つのシーケンス内のフレームを自動的に並べ替えた後、一般的な視覚的特徴を用いて測定される。このグラフは、ランダムウォークを実行することによって、類似の異なる例のトリプレットをサンプリングするために使用される。三重項の例は最終的に、汎用的な視覚特徴を低次元多様体に投影するシアムニューラルネットワークのトレーニングに使用される。 YCB-Video、CORe50、RGBD-Objectの3つの公開データセットの実験は、予測された低次元特徴が未知のオブジェクトを新しいカテゴリにクラスタリングする精度を改善し、最近の非教師なしクラスタリング技術より優れていることを示している。

This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images. The proposed system receives as input several unlabeled videos of scenes containing various objects. The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video. The system then constructs a weighted graph that connects sequences based on the similarities between the objects that they contain. The similarity between two sequences of objects is measured by using generic visual features, after automatically re-arranging the frames in the two sequences to align the viewpoints of the objects. The graph is used to sample triplets of similar and dissimilar examples by performing random walks. The triplet examples are finally used to train a siamese neural network that projects the generic visual features into a low-dimensional manifold. Experiments on three public datasets, YCB-Video, CORe50 and RGBD-Object, show that the projected low-dimensional features improve the accuracy of clustering unknown objects into novel categories, and outperform several recent unsupervised clustering techniques.

翻訳日:2022-09-27 08:06:22 公開日:2021-08-24

# 単段連続ジェスチャー認識のためのマルチモーダル融合

Multi-modal Fusion for Single-Stage Continuous Gesture Recognition ( http://arxiv.org/abs/2011.04945v2 )

ライセンス: Link先を確認

Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes

(参考訳) ジェスチャー認識は、ロボット工学や人間と機械の相互作用を含む、無数の現実世界の応用が研究されている分野である。現在のジェスチャー認識法は孤立したジェスチャーを認識することに重点を置いており、既存の連続ジェスチャー認識法は、検出と分類に独立したモデルを必要とする2段階のアプローチに限られている。対照的に,複数のジェスチャを1つのモデルで検出・分類可能なtemporal multi-modal fusion(tmmf)と呼ばれる単段連続ジェスチャ認識フレームワークを導入する。このアプローチは、ジェスチャーと非ジェスチャーの自然な遷移を、個々のジェスチャーを検出するための前処理のセグメンテーションステップなしで学習する。これを実現するために,マルチモーダルな入力から流れる重要な情報の統合をサポートし,任意のモードにスケーラブルなマルチモーダル融合機構を提案する。さらに,ユニモーダル・フィーチャー・マッピング(ufm)とマルチモーダル・フィーチャー・マッピング(mfm)モデルを提案し,それぞれユニモーダル・フィーチャーと融合したマルチモーダル・フィーチャーをマッピングする。そこで,本研究では,実感と予測の円滑な一致を促す中点に基づく損失関数を提案し,モデルの自然なジェスチャー遷移の学習を支援する。本稿では,可変長の入力ビデオを処理し,EgoGesture,IPN hand,ChaLearn LAP Continuous Gesture Dataset (ConGD) という3つの課題データセットで最先端の処理を行うフレームワークの有用性を示す。さらに, アブレーション実験により, 提案手法の異なる成分の重要性が示された。

Gesture recognition is a much studied research area which has myriad real-world applications including robotics and human-machine interaction. Current gesture recognition methods have focused on recognising isolated gestures, and existing continuous gesture recognition methods are limited to two-stage approaches where independent models are required for detection and classification, with the performance of the latter being constrained by detection performance. In contrast, we introduce a single-stage continuous gesture recognition framework, called Temporal Multi-Modal Fusion (TMMF), that can detect and classify multiple gestures in a video via a single model. This approach learns the natural transitions between gestures and non-gestures without the need for a pre-processing segmentation step to detect individual gestures. To achieve this, we introduce a multi-modal fusion mechanism to support the integration of important information that flows from multi-modal inputs, and is scalable to any number of modes. Additionally, we propose Unimodal Feature Mapping (UFM) and Multi-modal Feature Mapping (MFM) models to map uni-modal features and the fused multi-modal features respectively. To further enhance performance, we propose a mid-point based loss function that encourages smooth alignment between the ground truth and the prediction, helping the model to learn natural gesture transitions. We demonstrate the utility of our proposed framework, which can handle variable-length input videos, and outperforms the state-of-the-art on three challenging datasets: EgoGesture, IPN hand, and ChaLearn LAP Continuous Gesture Dataset (ConGD). Furthermore, ablation experiments show the importance of different components of the proposed framework.

翻訳日:2022-09-27 07:41:59 公開日:2021-08-24

# シンボル空間による解釈可能な視覚推論

Interpretable Visual Reasoning via Induced Symbolic Space ( http://arxiv.org/abs/2011.11603v2 )

ライセンス: Link先を確認

Zhonghao Wang, Kai Wang, Mo Yu, Jinjun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi

(参考訳) 視覚的推論における概念誘導の問題、すなわち、画像に関連付けられた質問応答対から概念とその階層的関係を同定し、帰納的シンボリック概念空間に取り組むことによって解釈可能なモデルを実現する。そこで我々はまず,オブジェクト指向視覚特徴を用いた視覚的推論タスクを実行するために,オブジェクト指向合成注意モデル(OCCAM)という新しいフレームワークを設計する。次に,対象の視覚的特徴と質問語間の注意パターンから手がかりを用いて,対象と関係の概念を誘導する手法を考案する。最後に, OCCAMを誘導記号空間に表現したオブジェクトに付与することにより, 高い解釈可能性を実現する。我々のモデル設計は、まずオブジェクトと関係の概念を予測し、次に予測された概念を視覚的特徴空間に投影することで、構成的推論モジュールが正常に処理できるようにする。 CLEVRとGQAデータセットの実験は以下のとおりである。 1)OCCAMは,人為的な機能プログラムを使わずに新たな技術を実現する。 2) OCCAMが視覚的特徴や誘導記号的概念空間で表現されたオブジェクト上でのオンパーパフォーマンスを達成できる限り,我々の誘導概念は正確かつ十分である。

We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images; and achieve an interpretable model via working on the induced symbolic concept space. To this end, we first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. Then, we come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by imposing OCCAM on the objects represented in the induced symbolic concept space. Our model design makes this an easy adaption via first predicting the concepts of objects and relations and then projecting the predicted concepts back to the visual feature space so the compositional reasoning module can process normally. Experiments on the CLEVR and GQA datasets demonstrate: 1) our OCCAM achieves a new state of the art without human-annotated functional programs; 2) our induced concepts are both accurate and sufficient as OCCAM achieves an on-par performance on objects represented either in visual features or in the induced symbolic concept space.

翻訳日:2022-09-22 01:11:01 公開日:2021-08-24

# (参考訳) 線形回帰と整数計画に基づく高分子の推算法

A Method for Inferring Polymers Based on Linear Regression and Integer Programming ( http://arxiv.org/abs/2109.02628v1 )

ライセンス: CC BY 4.0

Ryota Ido, Shengjuan Cao, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Hiroshi Nagamochi and Tatsuya Akutsu

(参考訳) 近年, 人工ニューラルネットワークと混合整数線形計画法を用いて, 望ましい化学特性を持つ化合物の分子構造を設計するための新しい枠組みが提案されている。本稿では, この枠組みに基づく新しいポリマー推定法を設計する。そこで本研究では, ポリマーをモノマーとして表現する新しい方法を紹介し, ポリマーの構造を特徴とする新しいディスクリプタを定義する。また,フレームワーク内で予測関数を構築するためのビルディングブロックとして線形回帰を用いる。計算実験の結果, 線形回帰で構築した予測関数がよく機能するポリマーの化学特性の集合が明らかとなった。また, 提案手法は, 最大50個の非水素原子を有するポリマーをモノマー形式で推算できることを示した。

A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In this paper, we design a new method for inferring a polymer based on the framework. For this, we introduce a new way of representing a polymer as a form of monomer and define new descriptors that feature the structure of polymers. We also use linear regression as a building block of constructing a prediction function in the framework. The results of our computational experiments reveal a set of chemical properties on polymers to which a prediction function constructed with linear regression performs well. We also observe that the proposed method can infer polymers with up to 50 non-hydrogen atoms in a monomer form.

翻訳日:2021-09-12 12:06:11 公開日:2021-08-24

# Webスケールアプリケーションのためのバイナリコードベースのハッシュ埋め込み

Binary Code based Hash Embedding for Web-scale Applications ( http://arxiv.org/abs/2109.02471v1 )

ライセンス: Link先を確認

Bencheng Yan, Pengjie Wang, Jinquan Liu, Wei Lin, Kuang-Chih Lee, Jian Xu and Bo Zheng

(参考訳) 現在、ディープラーニングモデルはレコメンダシステムやオンライン広告といったウェブスケールのアプリケーションに広く採用されている。これらのアプリケーションでは、分類的特徴の埋め込み学習がディープラーニングモデルの成功に不可欠である。これらのモデルでは、各カテゴリの特徴値に学習や最適化が可能なユニークな埋め込みベクトルが割り当てられている。この方法はカテゴリの特徴をうまく捉え、優れた性能を約束するが、特にウェブスケールのアプリケーションの場合、埋め込みテーブルを保存するのに膨大なメモリコストがかかる。このような大きなメモリコストは、edrmの有効性とユーザビリティを著しく阻害する。本稿では,性能を損なうことなく,埋め込みテーブルのサイズを任意のスケールで縮小できるバイナリコードベースのハッシュ埋め込み手法を提案する。実験評価の結果,本手法では組込みテーブルサイズが従来のテーブルサイズよりも1000$\times$小さい場合でも,99\%の性能を達成できることがわかった。

Nowadays, deep learning models are widely adopted in web-scale applications such as recommender systems, and online advertising. In these applications, embedding learning of categorical features is crucial to the success of deep learning models. In these models, a standard method is that each categorical feature value is assigned a unique embedding vector which can be learned and optimized. Although this method can well capture the characteristics of the categorical features and promise good performance, it can incur a huge memory cost to store the embedding table, especially for those web-scale applications. Such a huge memory cost significantly holds back the effectiveness and usability of EDRMs. In this paper, we propose a binary code based hash embedding method which allows the size of the embedding table to be reduced in arbitrary scale without compromising too much performance. Experimental evaluation results show that one can still achieve 99\% performance even if the embedding table size is reduced 1000$\times$ smaller than the original one with our proposed method.

翻訳日:2021-09-12 10:54:22 公開日:2021-08-24

# (参考訳) UAVと移動体マッピング車からの観測を統合した時空間-スペクトル-角観測モデルによる都市マッピングの改善

Spatio-temporal-spectral-angular observation model that integrates observations from UAV and mobile mapping vehicle for better urban mapping ( http://arxiv.org/abs/2109.00900v1 )

ライセンス: CC BY 4.0

Zhenfeng Shao, Gui Cheng, Deren Li, Xiao Huang, Zhipeng Lu, Jian Liu

(参考訳) 複雑な都市シーンでは、1つのセンサーからの観察は避けられないほど観察の空白をもたらし、包括的な方法で都市オブジェクトを記述できない。本稿では,UAVおよび移動体地図車両プラットフォームからの観測を統合し,空中と地上の両方からの協調観測操作を実現するために,時空間・角度観測モデルを提案する。複雑な都市景観のマルチ角度データを効果的に取得するマルチソースリモートセンシングデータ取得システムを開発した。多元データ融合は、咬合による不足データ問題を解決し、複雑な都市シーンにおけるホログラフィック空間および時間情報の正確かつ迅速かつ完全な収集を実現する。我々は,中国長慶市バイシャタウンで実験を行い,UAVと移動体地図からマルチセンサ,マルチ角データを得た。まず、UAVからポイントクラウドを抽出し、UAVとモバイルマッピング車両のポイントクラウドを統合しました。統合された結果は,UAVと移動体地図車両点群の特徴を組み合わせ,提案した共同データ取得プラットフォームの実践性および時空間-スペクトル-角観測モデルの有効性を確認した。 uavまたはモバイルマッピング車両単独での観測と比較すると、統合システムは総合的な都市モニタリングに向けた効果的なデータ取得ソリューションを提供する。

In a complex urban scene, observation from a single sensor unavoidably leads to voids in observations, failing to describe urban objects in a comprehensive manner. In this paper, we propose a spatio-temporal-spectral-angular observation model to integrate observations from UAV and mobile mapping vehicle platform, realizing a joint, coordinated observation operation from both air and ground. We develop a multi-source remote sensing data acquisition system to effectively acquire multi-angle data of complex urban scenes. Multi-source data fusion solves the missing data problem caused by occlusion and achieves accurate, rapid, and complete collection of holographic spatial and temporal information in complex urban scenes. We carried out an experiment on Baisha Town, Chongqing, China and obtained multi-sensor, multi-angle data from UAV and mobile mapping vehicle. We first extracted the point cloud from UAV and then integrated the UAV and mobile mapping vehicle point cloud. The integrated results combined both the characteristic of UAV and mobile mapping vehicle point cloud, confirming the practicability of the proposed joint data acquisition platform and the effectiveness of spatio-temporal-spectral-angular observation model. Compared with the observation from UAV or mobile mapping vehicle alone, the integrated system provides an effective data acquisition solution towards comprehensive urban monitoring.

翻訳日:2021-09-05 09:57:25 公開日:2021-08-24

# (参考訳) DQLEL:エネルギー最適化LoS/NLoS UWBノード選択のための深いQラーニング

DQLEL: Deep Q-Learning for Energy-Optimized LoS/NLoS UWB Node Selection ( http://arxiv.org/abs/2108.13157v1 )

ライセンス: CC BY 4.0

Zohreh Hajiakhondi-Meybodi, Arash Mohammadi, Ming Hou, Konstantinos N. Plataniotis

(参考訳) モノのインターネット(IoT)の最近の進歩は、信頼性、正確、エネルギー効率の高い屋内ナビゲーション/ローカライゼーションシステムを提供することを目的として、屋内位置決めへの関心が高まっている。 UWB(Ultra Wide Band)技術は、上記の要件を満たすための候補として浮上している。 UWB技術は、広帯域を用いた屋内位置決めの精度を高めることができるが、その効率的な実装には大きな課題がある。一方、位置決めにおける高精度化は、Non Line of Sight (NLoS) リンクの識別/緩和に依存し、ローカライゼーションフレームワークの複雑さが著しく増大する。一方、UWBビーコンは電池寿命が限られており、特に戦略的な位置にある特定のビーコンの実際の状況では問題となる。これらの課題に対処するため,UWBビーコンの残バッテリ寿命のバランスを維持しつつ,複雑なNLoS緩和手法を使わずに位置精度を向上させるための効率的なノード選択フレームワークを提案する。モバイルユーザは、DQLEL(Deep Q-Learning Energy-Optimized LoS/NLoS)UWBノード選択フレームワークを参照して、Arival(TDoA)フレームワークの2次元時間差に基づいて、UWBビーコンの最適ペアを決定するために自律的に訓練される。提案するDQLELフレームワークの有効性を,リンク条件,UWBビーコンの残電池寿命のずれ,位置誤差,累積報酬の観点から評価した。シミュレーション結果に基づいて,提案するdqlelフレームワークは,上記の側面をはるかに上回っている。

Recent advancements in Internet of Things (IoTs) have brought about a surge of interest in indoor positioning for the purpose of providing reliable, accurate, and energy-efficient indoor navigation/localization systems. Ultra Wide Band (UWB) technology has been emerged as a potential candidate to satisfy the aforementioned requirements. Although UWB technology can enhance the accuracy of indoor positioning due to the use of a wide-frequency spectrum, there are key challenges ahead for its efficient implementation. On the one hand, achieving high precision in positioning relies on the identification/mitigation Non Line of Sight (NLoS) links, leading to a significant increase in the complexity of the localization framework. On the other hand, UWB beacons have a limited battery life, which is especially problematic in practical circumstances with certain beacons located in strategic positions. To address these challenges, we introduce an efficient node selection framework to enhance the location accuracy without using complex NLoS mitigation methods, while maintaining a balance between the remaining battery life of UWB beacons. Referred to as the Deep Q-Learning Energy-optimized LoS/NLoS (DQLEL) UWB node selection framework, the mobile user is autonomously trained to determine the optimal pair of UWB beacons to be localized based on the 2-D Time Difference of Arrival (TDoA) framework. The effectiveness of the proposed DQLEL framework is evaluated in terms of the link condition, the deviation of the remaining battery life of UWB beacons, location error, and cumulative rewards. Based on the simulation results, the proposed DQLEL framework significantly outperformed its counterparts across the aforementioned aspects.

翻訳日:2021-09-05 09:46:57 公開日:2021-08-24

# 適応マスク双生児層による効率的・効率的な埋め込み学習

Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer ( http://arxiv.org/abs/2108.11513v1 )

ライセンス: Link先を確認

Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian Xu and Bo Zheng

(参考訳) 分類的特徴に対する学習の埋め込みは、深層学習に基づくレコメンデーションモデル(DLRM)にとって重要である。各特徴値は、埋め込み学習プロセスを介して埋め込みベクトルにマッピングされる。従来の方法では、同じ特徴フィールドからすべての特徴値に固定および均一な埋め込みサイズを設定する。しかし、そのような構成は学習を組み込むのに最適であるだけでなく、メモリのコストもかかる。ルールベースまたはニューラルアーキテクチャサーチ(NAS)ベースのこれらの問題を解決する既存の方法は、ヒューマンデザインやネットワークトレーニングに広範な努力を必要とする。また、サイズ選択やウォームスタートベースのアプリケーションでは柔軟性がない。本稿では,新しい,効果的な埋め込みサイズ選択手法を提案する。具体的には,標準組込み層の裏側に適応マッシュドツインベース層(amtl)を設計した。 AMTLは、埋め込みベクトルごとに望ましくない次元をマスクするマスクベクトルを生成する。マスクベクトルは次元の選択に柔軟性をもたらし、提案した層は訓練されていないDLRMに簡単に追加できる。広範な実験評価により、提案手法は全てのベンチマークタスクにおける競合ベースラインよりも優れており、またメモリ効率も高く、パフォーマンス指標を妥協することなく60\%のメモリ使用率を節約できることを示した。

Embedding learning for categorical features is crucial for the deep learning-based recommendation models (DLRMs). Each feature value is mapped to an embedding vector via an embedding learning process. Conventional methods configure a fixed and uniform embedding size to all feature values from the same feature field. However, such a configuration is not only sub-optimal for embedding learning but also memory costly. Existing methods that attempt to resolve these problems, either rule-based or neural architecture search (NAS)-based, need extensive efforts on the human design or network training. They are also not flexible in embedding size selection or in warm-start-based applications. In this paper, we propose a novel and effective embedding size selection scheme. Specifically, we design an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector to mask the undesired dimensions for each embedding vector. The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs. Extensive experimental evaluations show that the proposed scheme outperforms competitive baselines on all the benchmark tasks, and is also memory-efficient, saving 60\% memory usage without compromising any performance metrics.

翻訳日:2021-08-27 14:15:34 公開日:2021-08-24

# (参考訳) 高コントラストイメージングのための強化学習による自己最適化適応光学制御

Self-optimizing adaptive optics control with Reinforcement Learning for high-contrast imaging ( http://arxiv.org/abs/2108.11332v1 )

ライセンス: CC BY 4.0

Rico Landman, Sebastiaan Y. Haffert, Vikram M. Radhakrishnan, Christoph U. Keller

(参考訳) 現在および将来の高コントラスト撮像装置は、外惑星を直接撮像するために必要なコントラストに到達するために、極端適応光学系(XAO)を必要とする。制御ループの遅延による望遠鏡振動と時間誤差は、これらのシステムの性能を制限する。これらの効果を減らす一つの方法は予測制御を使用することである。本稿では,モデルフリーの強化学習を用いて,閉ループ予測制御のためのリカレントニューラルネットワークコントローラの最適化について述べる。まず,シミュレーションと実験室構成におけるチップティルト制御のアプローチを検証する。その結果, このアルゴリズムは最適ゲイン積分器と比較して, 振動を効果的に緩和し, パワーロー入力乱流の残差を低減できることがわかった。また,制御則のオンライン更新を必要とせずにランダム振動を最小化できることを示す。次に,本アルゴリズムは高次変形可能なミラーの制御にも適用可能であることを示す。我々は, 定常乱流下での小さな分離において, 制御器が2桁の等級改善を両立できることを実証する。さらに,制御則のオンライン更新を必要とせず,異なる風速や方向に対して比較して,桁違いに改善が見られた。

Current and future high-contrast imaging instruments require extreme adaptive optics (XAO) systems to reach contrasts necessary to directly image exoplanets. Telescope vibrations and the temporal error induced by the latency of the control loop limit the performance of these systems. One way to reduce these effects is to use predictive control. We describe how model-free Reinforcement Learning can be used to optimize a Recurrent Neural Network controller for closed-loop predictive control. First, we verify our proposed approach for tip-tilt control in simulations and a lab setup. The results show that this algorithm can effectively learn to mitigate vibrations and reduce the residuals for power-law input turbulence as compared to an optimal gain integrator. We also show that the controller can learn to minimize random vibrations without requiring online updating of the control law. Next, we show in simulations that our algorithm can also be applied to the control of a high-order deformable mirror. We demonstrate that our controller can provide two orders of magnitude improvement in contrast at small separations under stationary turbulence. Furthermore, we show more than an order of magnitude improvement in contrast for different wind velocities and directions without requiring online updating of the control law.

翻訳日:2021-08-27 00:14:49 公開日:2021-08-24

# (参考訳) 付加雑音モデルによる因果同定における騒音レベルの影響

The Effect of Noise Level on Causal Identification with Additive Noise Models ( http://arxiv.org/abs/2108.11320v1 )

ライセンス: CC BY 4.0

Benjamin Kap

(参考訳) 近年,因果推論や因果学習の分野で多くの研究が行われている。モデルにおける因果効果対を同定するために多くの手法が開発され、因果関係の方向を決定するために観測実世界データにうまく適用されている。これらの手法の多くは、矛盾、サイクル、選択バイアスなどの仮定を単純化する必要がある。しかし、両変数の状況では因果発見の問題はまだ難しい。このような手法の1つのクラスは、二変量の場合も扱えるようにしており、加法ノイズモデル(ANMs)に基づいている。残念ながら、これらの方法の1つの側面は、これまであまり注目されていない: 異なるノイズレベルが、それらの方法が因果関係の方向性を特定する能力に与える影響である。この研究は、実証的研究の助けを借りて、このギャップを埋めることを目的としている。本研究では, x が 2 変数 x, y のジョイント分布を与えられた場合,x が y または y を原因とするか否かを決定する必要のある因果発見問題の最も基本的な形式である双変量の場合を検討した。さらに、加算ノイズのレベルが1%から10000%に徐々に変化するようなanmの徹底的な範囲でテストされた、条件付き分散を用いた \textit{regression with subsequent independence test} と \textit{identification using conditional variances} の2つの特定の方法が選択されている(後者は修正されている)。さらに、本研究の実験では、線形および非線形の anms と同様に、いくつかの異なる種類の分布を考察する。実験の結果、これらの手法はノイズのレベルによっては真の因果方向を捉えることができないことが示された。

In recent years a lot of research has been conducted within the area of causal inference and causal learning. Many methods have been developed to identify the cause-effect pairs in models and have been successfully applied to observational real-world data in order to determine the direction of causal relationships. Many of these methods require simplifying assumptions, such as absence of confounding, cycles, and selection bias. Yet in bivariate situations causal discovery problems remain challenging. One class of such methods, that also allows tackling the bivariate case, is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has not received much attention until now: what is the impact of different noise levels on the ability of these methods to identify the direction of the causal relationship. This work aims to bridge this gap with the help of an empirical study. For this work, we considered bivariate cases, which is the most elementary form of a causal discovery problem where one needs to decide whether X causes Y or Y causes X, given joint distributions of two variables X, Y. Furthermore, two specific methods have been selected, \textit{Regression with Subsequent Independence Test} and \textit{Identification using Conditional Variances}, which have been tested with an exhaustive range of ANMs where the additive noises' levels gradually change from 1% to 10000% of the causes' noise level (the latter remains fixed). Additionally, the experiments in this work consider several different types of distributions as well as linear and non-linear ANMs. The results of the experiments show that these methods can fail to capture the true causal direction for some levels of noise.

翻訳日:2021-08-26 23:47:58 公開日:2021-08-24

# (参考訳) 構造的相互作用を考慮した解釈可能な非パラメトリック付加モデルによるセンササーベイ応答率予測

Predicting Census Survey Response Rates via Interpretable Nonparametric Additive Models with Structured Interactions ( http://arxiv.org/abs/2108.11328v1 )

ライセンス: CC BY 4.0

Shibal Ibrahim, Rahul Mazumder, Peter Radchenko, Emanuel Ben-David

(参考訳) 調査回答率の正確かつ解釈可能な予測は,運用の観点から重要である。アメリカ合衆国国勢調査局のよく知られたroam申請は、米国の国勢調査計画データベースデータに基づいて訓練された原則に基づく統計モデルを使用して、調査の難しい地域を特定する。初期のクラウドソーシングコンペティションでは、回帰ツリーのアンサンブルが調査応答率の予測に最高の性能をもたらしたが、限定的な解釈可能性のため、対応するモデルは対象に適用できなかった。本稿では,調査における応答率を高精度に予測する新しい解釈可能な統計手法を提案する。我々は,$\ell_0$-regularization による対関係を持つ疎非パラメトリック加法モデルと,解釈性を高める階層構造変種について検討した。強力な方法論的基盤にもかかわらず、そのようなモデルは計算的に困難であり、これらのモデルを学習するための新しいスケーラブルなアルゴリズムを提示します。また,提案した推定器の非漸近誤差境界も確立した。米国国勢調査計画データベースに基づく実験は、我々の手法が、人口の異なるセグメントに対して実行可能な解釈可能性を可能にする高品質な予測モデルに繋がることを示している。興味深いことに,我々の手法は,勾配向上とフィードフォワードニューラルネットワークに基づく最先端のブラックボックス機械学習手法に予測性能を損なうことなく,解釈可能性を大幅に向上させる。 pythonのコード実装はhttps://github.com/ShibalIbrahim/Additive-Models-with-Structured-Interactionsで公開されています。

Accurate and interpretable prediction of survey response rates is important from an operational standpoint. The US Census Bureau's well-known ROAM application uses principled statistical models trained on the US Census Planning Database data to identify hard-to-survey areas. An earlier crowdsourcing competition revealed that an ensemble of regression trees led to the best performance in predicting survey response rates; however, the corresponding models could not be adopted for the intended application due to limited interpretability. In this paper, we present new interpretable statistical methods to predict, with high accuracy, response rates in surveys. We study sparse nonparametric additive models with pairwise interactions via $\ell_0$-regularization, as well as hierarchically structured variants that provide enhanced interpretability. Despite strong methodological underpinnings, such models can be computationally challenging -- we present new scalable algorithms for learning these models. We also establish novel non-asymptotic error bounds for the proposed estimators. Experiments based on the US Census Planning Database demonstrate that our methods lead to high-quality predictive models that permit actionable interpretability for different segments of the population. Interestingly, our methods provide significant gains in interpretability without losing in predictive performance to state-of-the-art black-box machine learning methods based on gradient boosting and feedforward neural networks. Our code implementation in python is available at https://github.com/ShibalIbrahim/Additive-Models-with-Structured-Interactions.

翻訳日:2021-08-26 23:40:35 公開日:2021-08-24

# (参考訳) ggnb:gaussian naive bayes intrusion detection system for can bus

GGNB: Graph-Based Gaussian Naive Bayes Intrusion Detection System for CAN Bus ( http://arxiv.org/abs/2108.10908v1 )

ライセンス: CC BY 4.0

Riadul Islam, Maloy K. Devnath, Manar D. Samad, and Syed Md Jaffrey Al Kadry

(参考訳) 国家道路交通安全局(nhtsa)は、自動車システムのサイバーセキュリティは他の情報システムのセキュリティよりも重要であると特定した。研究者はすでに、制御エリアネットワーク(CAN)を用いた臨界車両電子制御ユニット(ECU)に対する遠隔攻撃を実証している。さらに、既存の侵入検知システム(IDS)は特定の種類の攻撃に対処することをしばしば提案する。可能な限り短時間で広範囲の攻撃を識別できる一般化可能なIDSは、攻撃固有のIDSよりも実用的価値が高い。本稿では,グラフ特性とページランク関連の特徴を活用し,新しい"textbf g}raph-based {\textbf g}aussian {\textbf n}aive {\textbf b}ayes (ggnb)侵入検出アルゴリズムを提案する。実際の生CANデータセット~\cite{Lee:2017}上のGGNBは99.61\%、99.83\%、96.79\%、96.20\%の検知精度で、それぞれDoS、ファジィ、スプーフィング、リプレイ、混合攻撃を行う。また、OpelAstraデータセット~\cite{Guillaume:2019}を用いて、提案手法はそれぞれ、DoS、診断、ファジングCANID、ファジングペイロード、リプレイ、サスペンション、混合攻撃を考慮した100\%、99.85\%、99.92\%、99.92\%、99.92\%、99.75\%、99.57\%の検出精度を有する。 GGNBベースの方法論では、同じアプリケーションで使用されるSVM分類器と比較して、それぞれ239\times$と135\times$低いトレーニング時間とテスト時間が必要です。 Xilinx Zybo Z7フィールドプログラマブルゲートアレイ(FPGA)ボードを使用して提案されたGGNBは、従来のNNアーキテクチャよりも5.7 \times$、5.9 \times$、5.1 \times$、および3.6 \times$のスライス、LUT、フリップフロップ、DSPユニットを必要とする。

The national highway traffic safety administration (NHTSA) identified cybersecurity of the automobile systems are more critical than the security of other information systems. Researchers already demonstrated remote attacks on critical vehicular electronic control units (ECUs) using controller area network (CAN). Besides, existing intrusion detection systems (IDSs) often propose to tackle a specific type of attack, which may leave a system vulnerable to numerous other types of attacks. A generalizable IDS that can identify a wide range of attacks within the shortest possible time has more practical value than attack-specific IDSs, which is not a trivial task to accomplish. In this paper we propose a novel {\textbf g}raph-based {\textbf G}aussian {\textbf n}aive {\textbf B}ayes (GGNB) intrusion detection algorithm by leveraging graph properties and PageRank-related features. The GGNB on the real rawCAN data set~\cite{Lee:2017} yields 99.61\%, 99.83\%, 96.79\%, and 96.20\% detection accuracy for denial of service (DoS), fuzzy, spoofing, replay, mixed attacks, respectively. Also, using OpelAstra data set~\cite{Guillaume:2019}, the proposed methodology has 100\%, 99.85\%, 99.92\%, 100\%, 99.92\%, 97.75\% and 99.57\% detection accuracy considering DoS, diagnostic, fuzzing CAN ID, fuzzing payload, replay, suspension, and mixed attacks, respectively. The GGNB-based methodology requires about $239\times$ and $135\times$ lower training and tests times, respectively, compared to the SVM classifier used in the same application. Using Xilinx Zybo Z7 field-programmable gate array (FPGA) board, the proposed GGNB requires $5.7 \times$, $5.9 \times$, $5.1 \times$, and $3.6 \times$ fewer slices, LUTs, flip-flops, and DSP units, respectively, than conventional NN architecture.

翻訳日:2021-08-26 23:39:20 公開日:2021-08-24

# (参考訳) テキストクラスタリングのためのハイブリッドマルチソース機能融合

Hybrid Multisource Feature Fusion for the Text Clustering ( http://arxiv.org/abs/2108.10926v1 )

ライセンス: CC BY 4.0

Jiaxuan Chen and Shenglin Gui

(参考訳) テキストクラスタリング技術は教師なしテキストマイニング手法であり、膨大な量のテキスト文書をグループに分割するのに使われる。テキストクラスタリングアルゴリズムは教師付き手法よりも優れたパフォーマンスを実現するのが難しく、クラスタリング性能は選択したテキスト機能に依存することが報告されている。現在、テキスト特徴生成アルゴリズムにはさまざまな種類があり、それぞれがvsmや分散単語埋め込みといった特定の側面からテキスト特徴を抽出するため、コーパスから可能な限り完全な機能を得る新しい方法を求めることが、クラスタリング効果を強化する鍵となっている。本稿では,マルチモデルの特徴表現,相互類似性行列,特徴融合という3つの要素からなるハイブリッド多元特徴融合(hmff)フレームワークを提案する。そこでは,各特徴点の相互類似性行列を構築し,相互類似性行列から相互類似性行列を融合し,次元を小さくしてhmff特徴を生成することにより,入力サンプルをグループに分割するk-meansクラスタリングアルゴリズムを構成できる。実験の結果、HMFFフレームワークは11の公開ベンチマークデータセットのうち7つの公開アルゴリズムよりも優れており、残りの4つのベンチマークデータセットでも主要なパフォーマンスを示している。最終的に、HMFFフレームワークと、野生のCOVID-19データセット上の競合相手と、未知のクラスタ数を比較した。

The text clustering technique is an unsupervised text mining method which are used to partition a huge amount of text documents into groups. It has been reported that text clustering algorithms are hard to achieve better performance than supervised methods and their clustering performance is highly dependent on the picked text features. Currently, there are many different types of text feature generation algorithms, each of which extracts text features from some specific aspects, such as VSM and distributed word embedding, thus seeking a new way of obtaining features as complete as possible from the corpus is the key to enhance the clustering effects. In this paper, we present a hybrid multisource feature fusion (HMFF) framework comprising three components, feature representation of multimodel, mutual similarity matrices and feature fusion, in which we construct mutual similarity matrices for each feature source and fuse discriminative features from mutual similarity matrices by reducing dimensionality to generate HMFF features, then k-means clustering algorithm could be configured to partition input samples into groups. The experimental tests show our HMFF framework outperforms other recently published algorithms on 7 of 11 public benchmark datasets and has the leading performance on the rest 4 benchmark datasets as well. At last, we compare HMFF framework with those competitors on a COVID-19 dataset from the wild with the unknown cluster count, which shows the clusters generated by HMFF framework partition those similar samples much closer.

翻訳日:2021-08-26 23:20:24 公開日:2021-08-24

# (参考訳) SLIVARの現状: ロボット、人間とロボットのインタラクション、そして(音声)対話システムにとって、次は何か?

The State of SLIVAR: What's next for robots, human-robot interaction, and (spoken) dialogue systems? ( http://arxiv.org/abs/2108.10931v1 )

ライセンス: CC BY-SA 4.0

Casey Kennington

(参考訳) 我々は,ロボット工学,人間ロボットインタラクション,音声対話システム研究の重要交差点におけるオープンな疑問を議論するために,最近のワークショップとセミナーの報告結果とレコメンデーションを合成した。この拡大する研究分野の目標は、人々がより効果的で自然にロボットとコミュニケーションできるようにすることだ。ネットワークと議論の機会を具体的かつ潜在的に資金提供可能なプロジェクトに向けて推進するため、私たちは関係者に対して、将来の仮想的および対面的な議論やワークショップに参加することを検討するよう促します。

We synthesize the reported results and recommendations of recent workshops and seminars that convened to discuss open questions within the important intersection of robotics, human-robot interaction, and spoken dialogue systems research. The goal of this growing area of research interest is to enable people to more effectively and naturally communicate with robots. To carry forward opportunities networking and discussion towards concrete, potentially fundable projects, we encourage interested parties to consider participating in future virtual and in-person discussions and workshops.

翻訳日:2021-08-26 23:03:52 公開日:2021-08-24

# (参考訳) SNコンピュータサイエンス:タミル語によるYouTubeコメントと投稿の攻撃的言語識別を目指す

SN Computer Science: Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts ( http://arxiv.org/abs/2108.10939v1 )

ライセンス: CC BY 4.0

Charangan Vasantharajan and Uthayasanker Thayasivam

(参考訳) ソーシャルメディアプラットフォームにおける攻撃的言語検出は、ここ数年で活発な研究分野となっている。非ネイティブな英語圏では、ソーシャルメディアのユーザーは投稿や記事にコードミキシングされたテキストを使うことが多い。これは、攻撃的なコンテンツ識別タスクにいくつかの課題をもたらし、Tamilで利用可能なリソースが少ないことを考えると、タスクはずっと難しくなります。本研究は,複数の深層学習モデルを用いて広範な実験を行い,YouTube上の攻撃的コンテンツを検出する。本稿では,BERT, DistilBERT, XLM-RoBERTaなどの多言語トランスフォーマネットワークを微調整し, アンサンブルすることで, より優れた結果を得るための, 選択的翻訳・翻訳手法の新規かつ柔軟なアプローチを提案する。実験の結果, ULMFiTが最適モデルであることが確認された。最高のパフォーマンスモデルは、 Distil-BERT や XLM-RoBERTa などの一般的なトランスファー学習モデルやハイブリッドディープラーニングモデルの代わりに、このタミル符号混合データセットの ULMFiT と mBERTBiLSTM であった。提案されたモデルulmfitとmbertbilstmは良好な結果をもたらし、低リソース言語における効果的な攻撃的音声識別を約束している。

Offensive Language detection in social media platforms has been an active field of research over the past years. In non-native English spoken countries, social media users mostly use a code-mixed form of text in their posts/comments. This poses several challenges in the offensive content identification tasks, and considering the low resources available for Tamil, the task becomes much harder. The current study presents extensive experiments using multiple deep learning, and transfer learning models to detect offensive content on YouTube. We propose a novel and flexible approach of selective translation and transliteration techniques to reap better results from fine-tuning and ensembling multilingual transformer networks like BERT, Distil- BERT, and XLM-RoBERTa. The experimental results showed that ULMFiT is the best model for this task. The best performing models were ULMFiT and mBERTBiLSTM for this Tamil code-mix dataset instead of more popular transfer learning models such as Distil- BERT and XLM-RoBERTa and hybrid deep learning models. The proposed model ULMFiT and mBERTBiLSTM yielded good results and are promising for effective offensive speech identification in low-resourced languages.

翻訳日:2021-08-26 22:55:59 公開日:2021-08-24

# (参考訳) 7Tにおける定量的R1マッピングにおける走査間運動アーチファクトの補正

Correcting inter-scan motion artefacts in quantitative R1 mapping at 7T ( http://arxiv.org/abs/2108.10943v1 )

ライセンス: CC BY 4.0

Ya\"el Balbastre, Ali Aghaeifar, Nad\`ege Corbin, Mikael Brudfors, John Ashburner, Martina F. Callaghan

(参考訳) 目的: スキャン間運動は、$R_1$推定における重大なエラー源であり、$B_1$フィールドがより不均一な7Tで増加することが期待できる。確立された補正方式は、ボディコイル参照を必要とするため、7Tに変換されない。ここでは,確立した手法に勝る代替案を2つ紹介する。相対感度を計算するため、ボディコイル画像を必要としない。理論: 提案手法はコイル結合等級画像を用いて相対的なコイル感度を求める。第1の方法は、単純な比で相対感度を効率よく計算し、第2の方法はより洗練された生成モデルを適用する。方法:$R_1$マップは可変フリップ角(VFA)アプローチを用いて計算された。複数のデータセットが3tと7tで取得され、vfaボリュームの取得間を行き来した。 R_1$の地図は、提案された補正と(3Tで)以前に確立された補正スキームで構築された。結果: 3tでは,提案手法がベースライン法を上回った。また, 走査間運動アーチファクトも7Tで減少した。しかし、再現性は、位置特異的な送信電界効果も取り入れた場合にのみ、非運動条件に収束した。結論:提案手法はR_1$マップのスキャン間動作補正を簡略化し,典型的にはボディコイルが利用できない3Tと7Tの両方に適用可能である。すべてのメソッドのオープンソースコードは公開されています。

Purpose: Inter-scan motion is a substantial source of error in $R_1$ estimation, and can be expected to increase at 7T where $B_1$ fields are more inhomogeneous. The established correction scheme does not translate to 7T since it requires a body coil reference. Here we introduce two alternatives that outperform the established method. Since they compute relative sensitivities they do not require body coil images. Theory: The proposed methods use coil-combined magnitude images to obtain the relative coil sensitivities. The first method efficiently computes the relative sensitivities via a simple ratio; the second by fitting a more sophisticated generative model. Methods: $R_1$ maps were computed using the variable flip angle (VFA) approach. Multiple datasets were acquired at 3T and 7T, with and without motion between the acquisition of the VFA volumes. $R_1$ maps were constructed without correction, with the proposed corrections, and (at 3T) with the previously established correction scheme. Results: At 3T, the proposed methods outperform the baseline method. Inter-scan motion artefacts were also reduced at 7T. However, reproducibility only converged on that of the no motion condition if position-specific transmit field effects were also incorporated. Conclusion: The proposed methods simplify inter-scan motion correction of $R_1$ maps and are applicable at both 3T and 7T, where a body coil is typically not available. The open-source code for all methods is made publicly available.

翻訳日:2021-08-26 22:37:04 公開日:2021-08-24

# (参考訳) フィールドガイドによるゼロショット学習

Field-Guide-Inspired Zero-Shot Learning ( http://arxiv.org/abs/2108.10967v1 )

ライセンス: CC BY 4.0

Utkarsh Mall, Bharath Hariharan, and Kavita Bala

(参考訳) 現代の認識システムは、精度を達成するために大量の監督を必要とする。新しいドメインに適応するには、専門家からのかなりのデータが必要である。ゼロショット学習は、新しいカテゴリの注釈付き属性セットを必要とする。新しいカテゴリの属性の完全なセットをアノテートすることは、デプロイにおいて退屈で高価なタスクであることが証明されます。これは、認識ドメインがエキスパートドメインである場合に特に当てはまる。そこで我々は,学習者がクラスを定義する最も有用な属性を対話的に求める,ゼロショットアノテーションに対するフィールドガイド型アプローチを提案する。我々は,CUB,SUN,AWA2などの属性アノテーションを用いた分類ベンチマークにおいて,本手法の有効性を検証し,アノテーション数を大幅に減らし,完全アノテーションを用いたモデルの性能を実現することを示す。専門家の時間は重要なので、実際のデプロイにはアノテーションのコストを削減できる。

Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category. Annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment. This is especially the case when the recognition domain is an expert domain. We introduce a new field-guide-inspired approach to zero-shot annotation where the learner model interactively asks for the most useful attributes that define a class. We evaluate our method on classification benchmarks with attribute annotations like CUB, SUN, and AWA2 and show that our model achieves the performance of a model with full annotations at the cost of a significantly fewer number of annotations. Since the time of experts is precious, decreasing annotation cost can be very valuable for real-world deployment.

翻訳日:2021-08-26 22:26:09 公開日:2021-08-24

# (参考訳) 実世界single view 3dリコンストラクションのためのドメイン適応

Domain Adaptation for Real-World Single View 3D Reconstruction ( http://arxiv.org/abs/2108.10972v1 )

ライセンス: CC BY 4.0

Brandon Leung, Siddharth Singh, Arik Horodniceanu

(参考訳) 深層学習に基づくオブジェクト再構成アルゴリズムは、古典的手法よりも著しく改善されている。しかし、トレーニングデータとテストデータが異なる分布を持つ場合、教師付き学習ベース手法は性能が良くない。実際、現在のほとんどの研究は、合成されたShapeNetデータセットに満足できるパフォーマンスを保っていますが、実際の画像で提示すると劇的に失敗します。この問題に対処するために、教師なし領域適応は、ラベル付き合成ソースドメインからの転送知識を使用し、ラベル付き実ターゲットドメインの分類器を学ぶことができる。実領域におけるsingle view 3dリコンストラクションの課題に取り組むため,我々は,mmd(maximum mean discrepancy)損失,深海サンゴ,およびdann(domain adversarial neural network)に触発された様々なドメイン適応手法を実験した。これらの結果から,本手法では3dモデルでは対象領域データは教師なしであるが,クラスラベルでは教師なしであるという事実を生かした新しいアーキテクチャを提案する。 pix2voxと呼ばれる最近のネットワークからフレームワークをベースとしています。結果は、shapenetをソースドメインとして、object dataset domain suite(odds)データセットをターゲットとして、real world multiview、multidomain imageデータセットとして、shapenetで実行される。 ODDSのドメインは困難であり、ドメインギャップサイズの概念を評価することができる。このデータセットを用いたマルチビュー再構築文献では,この結果が初めてである。

Deep learning-based object reconstruction algorithms have shown remarkable improvements over classical methods. However, supervised learning based methods perform poorly when the training data and the test data have different distributions. Indeed, most current works perform satisfactorily on the synthetic ShapeNet dataset, but dramatically fail in when presented with real world images. To address this issue, unsupervised domain adaptation can be used transfer knowledge from the labeled synthetic source domain and learn a classifier for the unlabeled real target domain. To tackle this challenge of single view 3D reconstruction in the real domain, we experiment with a variety of domain adaptation techniques inspired by the maximum mean discrepancy (MMD) loss, Deep CORAL, and the domain adversarial neural network (DANN). From these findings, we additionally propose a novel architecture which takes advantage of the fact that in this setting, target domain data is unsupervised with regards to the 3D model but supervised for class labels. We base our framework off a recent network called pix2vox. Results are performed with ShapeNet as the source domain and domains within the Object Dataset Domain Suite (ODDS) dataset as the target, which is a real world multiview, multidomain image dataset. The domains in ODDS vary in difficulty, allowing us to assess notions of domain gap size. Our results are the first in the multiview reconstruction literature using this dataset.

翻訳日:2021-08-26 22:24:59 公開日:2021-08-24

# (参考訳) BERTエンコーディングと文レベル言語モデルを用いた文順序付け

Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering ( http://arxiv.org/abs/2108.10986v1 )

ライセンス: CC BY 4.0

Melika Golestani, Seyedeh Zahra Razavi, Zeinab Borhanifard, Farnaz Tahmasebian, and Hesham Faili

(参考訳) 事象の論理列の発見は、自然言語理解の基盤の1つである。イベントのシーケンスを学ぶ一つのアプローチは、コヒーレントなテキストで文の順序を研究することである。文の順序付けは、検索に基づく質問回答、文書要約、ストーリーテリング、テキスト生成、対話システムなど、さまざまなタスクに適用できる。さらに、シャッフル文の順序を学習することで、テキストコヒーレンスをモデル化することを学ぶことができる。これまでの研究は、RNN、LSTM、BiLSTMアーキテクチャを使ってテキスト言語モデルを学習してきた。しかし、これらのネットワークは注意機構の欠如により性能が悪くなっている。本稿では,短い記事のコーパスにおける文順序付けアルゴリズムを提案する。提案手法では,アテンション機構を用いて文の依存関係をキャプチャするUniversal Transformer (UT) に基づく言語モデルを用いる。提案手法は,約100万件の短い人造ストーリーのコーパスであるROCStoriesデータセットにおけるPMR(Perfect Match Ratio)スコアの点から,過去の最先端技術を改善する。提案するモデルには,Sentence Encoder,Language Model,Sentence Arrangement with Brute Force Searchの3つのコンポーネントが含まれている。第1成分は、ROCStoriesデータに基づいて微調整されたSBERT-WK事前学習モデルを用いて文埋め込みを生成する。そして、ユニバーサルトランスフォーマーネットワークが文レベル言語モデルを生成する。復号化のために、ネットワークは、現在の文の次の文として候補文を生成する。我々はコサイン類似性をスコア関数として使用し、他の文をシャッフルセットに埋め込んだ候補にスコアを割り当てる。次に、連続した文のペア間の類似度の総和を最大化するためにブルートフォース探索を用いる。

Discovering the logical sequence of events is one of the cornerstones in Natural Language Understanding. One approach to learn the sequence of events is to study the order of sentences in a coherent text. Sentence ordering can be applied in various tasks such as retrieval-based Question Answering, document summarization, storytelling, text generation, and dialogue systems. Furthermore, we can learn to model text coherence by learning how to order a set of shuffled sentences. Previous research has relied on RNN, LSTM, and BiLSTM architecture for learning text language models. However, these networks have performed poorly due to the lack of attention mechanisms. We propose an algorithm for sentence ordering in a corpus of short stories. Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism. Our method improves the previous state-of-the-art in terms of Perfect Match Ratio (PMR) score in the ROCStories dataset, a corpus of nearly 100K short human-made stories. The proposed model includes three components: Sentence Encoder, Language Model, and Sentence Arrangement with Brute Force Search. The first component generates sentence embeddings using SBERT-WK pre-trained model fine-tuned on the ROCStories data. Then a Universal Transformer network generates a sentence-level language model. For decoding, the network generates a candidate sentence as the following sentence of the current sentence. We use cosine similarity as a scoring function to assign scores to the candidate embedding and the embeddings of other sentences in the shuffled set. Then a Brute Force Search is employed to maximize the sum of similarities between pairs of consecutive sentences.

翻訳日:2021-08-26 22:15:54 公開日:2021-08-24

# (参考訳) OOWL500: ワイルドなデータセットコレクションバイアスを克服する

OOWL500: Overcoming Dataset Collection Bias in the Wild ( http://arxiv.org/abs/2108.10992v1 )

ライセンス: CC BY 4.0

Brandon Leung, Chih-Hui Ho, Amir Persekian, David Orozco, Yen Chang, Erik Sandstrom, Bo Liu, Nuno Vasconcelos

(参考訳) 画像データセットがオンラインで「野生」に集められたという仮説は、例えばバイアスのあるオブジェクト認識を生成できる。プロの撮影や特定の角度を好み、研究されている。新たな"研究室内"データ収集インフラストラクチャは、オブジェクトを回りながら画像をキャプチャするドローンで構成されている。重要なことに、この設定と自然なカメラによる制御は、飛行に固有の多くのバイアスを軽減する。安価で容易に複製できる性質は、ビジョンコミュニティによるスケーラブルなデータ収集の取り組みにつながる可能性もあります。このプロシージャの有用性は、fLight (OOWL)で達成されたオブジェクトのデータセットを作成することで実証される。 OOWL500 には 500 個のオブジェクトの 1220,000 イメージが含まれており,クラス単位のクラス数とオブジェクト数を考慮すれば,最大規模の "ラボ内" イメージデータセットである。さらに、オブジェクト認識に関するいくつかの新しい洞察を可能にした。まず,カメラの揺らぎやポーズなどのセマンティック特性の観点から,画像摂動を定義できる新たな対角攻撃戦略を提案する。実際、実験の結果、ImageNetには相当量のポーズとプロの写真バイアスがあることがわかった。第二に、ImageNetのような野生のデータセットとOOWL500のような実験データとの増大は、これらのバイアスを著しく減少させ、一般化を改善するオブジェクト認識に繋がることを示すために使われる。第三に、データセットはデータセット収集の"ベストプロシージャ"に関する質問の研究に使用される。合成画像によるデータ拡張は,野生のデータセットでバイアスを排除するには十分ではなく,カメラの揺動とポーズの多様性が従来考えられていたよりもオブジェクト認識の堅牢性において重要な役割を果たすことが明らかとなった。

The hypothesis that image datasets gathered online "in the wild" can produce biased object recognizers, e.g. preferring professional photography or certain viewing angles, is studied. A new "in the lab" data collection infrastructure is proposed consisting of a drone which captures images as it circles around objects. Crucially, the control provided by this setup and the natural camera shake inherent to flight mitigate many biases. It's inexpensive and easily replicable nature may also potentially lead to a scalable data collection effort by the vision community. The procedure's usefulness is demonstrated by creating a dataset of Objects Obtained With fLight (OOWL). Denoted as OOWL500, it contains 120,000 images of 500 objects and is the largest "in the lab" image dataset available when both number of classes and objects per class are considered. Furthermore, it has enabled several of new insights on object recognition. First, a novel adversarial attack strategy is proposed, where image perturbations are defined in terms of semantic properties such as camera shake and pose. Indeed, experiments have shown that ImageNet has considerable amounts of pose and professional photography bias. Second, it is used to show that the augmentation of in the wild datasets, such as ImageNet, with in the lab data, such as OOWL500, can significantly decrease these biases, leading to object recognizers of improved generalization. Third, the dataset is used to study questions on "best procedures" for dataset collection. It is revealed that data augmentation with synthetic images does not suffice to eliminate in the wild datasets biases, and that camera shake and pose diversity play a more important role in object recognition robustness than previously thought.

翻訳日:2021-08-26 22:04:17 公開日:2021-08-24

# SimVLM: Weak Supervisionでトレーニングするシンプルなビジュアル言語モデル

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ( http://arxiv.org/abs/2108.10904v1 )

ライセンス: Link先を確認

Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

(参考訳) 視覚表現とテキスト表現の結合モデリングの最近の進歩により、視覚言語前訓練(vlp)は多くのマルチモーダル下流タスクで印象的なパフォーマンスを達成している。しかし、クリーンな画像キャプションや地域ラベルを含む高価なアノテーションの要求は、既存のアプローチのスケーラビリティを制限し、複数のデータセット固有の目的を導入することで事前学習手順を複雑化する。本研究では,これらの制約を緩和し,SimVLM(Simple Visual Language Model)という最小限の事前学習フレームワークを提案する。従来の作業と異なり、SimVLMは大規模な弱監視を利用してトレーニングの複雑さを減らし、単一のプレフィックス言語モデリング目的でエンドツーエンドにトレーニングされる。 VQA(+3.74% vqa-score)、NLVR2(+1.17%精度)、SNLI-VE(+1.37%精度)、画像キャプションタスク(+10.1%平均CIDErスコア)など、様々な差別的で生成的な視覚言語ベンチマークにおいて、結果として得られたモデルは、以前の事前学習方法よりも大幅に優れ、新しい最先端の成果が得られる。さらに、SimVLMは強力な一般化と伝達能力を獲得し、オープンな視覚的質問応答やモダリティ間移動を含むゼロショット動作を可能にすることを実証する。

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks. However, the requirement for expensive annotations including clean image captions and regional labels limits the scalability of existing approaches, and complicates the pretraining procedure with the introduction of multiple dataset-specific objectives. In this work, we relax these constraints and present a minimalist pretraining framework, named Simple Visual Language Model (SimVLM). Unlike prior work, SimVLM reduces the training complexity by exploiting large-scale weak supervision, and is trained end-to-end with a single prefix language modeling objective. Without utilizing extra data or task-specific customization, the resulting model significantly outperforms previous pretraining methods and achieves new state-of-the-art results on a wide range of discriminative and generative vision-language benchmarks, including VQA (+3.74% vqa-score), NLVR2 (+1.17% accuracy), SNLI-VE (+1.37% accuracy) and image captioning tasks (+10.1% average CIDEr score). Furthermore, we demonstrate that SimVLM acquires strong generalization and transfer ability, enabling zero-shot behavior including open-ended visual question answering and cross-modality transfer.

翻訳日:2021-08-26 13:10:52 公開日:2021-08-24

# データプログラミングを用いたポインタラベルのないラベル学習より強大な単語

The Word is Mightier than the Label Learning without Pointillistic Labels using Data Programming ( http://arxiv.org/abs/2108.10921v1 )

ライセンス: Link先を確認

Chufan Gao and Mononito Goswami

(参考訳) ほとんどの高度な教師付き機械学習(ML)モデルは、大量のポイントバイポイントラベル付きトレーニング例に依存している。大量のデータをハンドラベリングすることは面倒で、高価で、エラーを起こしやすい。近年、競争力のあるエンドモデル分類器を作成するために、弱い監督源の多種多様な利用を調査している研究もある。本稿では,弱い監督に関する最近の研究,特にデータプログラミング(dp)フレームワークについて調査する。 DPは、潜在的なノイズのあるヒューリスティックのセットを入力として、ヒューリスティックの確率的グラフィカルモデルを用いて、データセットの各データポイントにノイズ付き確率ラベルを割り当てる。 DPの背後にある数学の基礎を解析し、2つの実世界のテキスト分類タスクに適用してそのパワーを実証する。さらに,従来データスパース設定で適用されてきた点的アクティブおよび半教師付き学習手法とdpを比較した。

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data may be tedious, expensive, and error-prone. Recently, some studies have explored the use of diverse sources of weak supervision to produce competitive end model classifiers. In this paper, we survey recent work on weak supervision, and in particular, we investigate the Data Programming (DP) framework. Taking a set of potentially noisy heuristics as input, DP assigns denoised probabilistic labels to each data point in a dataset using a probabilistic graphical model of heuristics. We analyze the math fundamentals behind DP and demonstrate the power of it by applying it on two real-world text classification tasks. Furthermore, we compare DP with pointillistic active and semi-supervised learning techniques traditionally applied in data-sparse settings.

翻訳日:2021-08-26 13:10:14 公開日:2021-08-24

# Bias Mitigated Learning from Differentially Private Synthetic Data: a Cautionary Tale

Bias Mitigated Learning from Differentially Private Synthetic Data: A Cautionary Tale ( http://arxiv.org/abs/2108.10934v1 )

ライセンス: Link先を確認

Sahra Ghalebikesabi, Harrison Wilde, Jack Jewson, Arnaud Doucet, Sebastian Vollmer, Chris Holmes

(参考訳) プライバシ保護機械学習への関心が高まり、未公開の実データから合成プライベートデータを生成する新しいモデルが生まれた。しかし、プライバシ保存のメカニズムは、予測モデルや推論の学習のような下流タスクに大きな影響を与える結果合成データにアーティファクトを導入する。特に、合成データ分布が実データ分布の不整合推定であるため、バイアスはすべての解析に影響を及ぼす可能性がある。本研究では, 差動合成データ生成モデルに適用可能な民営化確率比を用いたバイアス緩和手法を提案する。大規模実証評価を通じて, バイアス緩和は, 一般の合成データに対して, 単純かつ効果的なプライバシー準拠の強化をもたらすことを示した。しかし, 偏差補正後においても, 予測や推測などのタスクにおいて, 合成プライベートデータ生成器の有用性に重要な課題が残されている。

Increasing interest in privacy-preserving machine learning has led to new models for synthetic private data generation from undisclosed real data. However, mechanisms of privacy preservation introduce artifacts in the resulting synthetic data that have a significant impact on downstream tasks such as learning predictive models or inference. In particular, bias can affect all analyses as the synthetic data distribution is an inconsistent estimate of the real-data distribution. We propose several bias mitigation strategies using privatized likelihood ratios that have general applicability to differentially private synthetic data generative models. Through large-scale empirical evaluation, we show that bias mitigation provides simple and effective privacy-compliant augmentation for general applications of synthetic data. However, the work highlights that even after bias correction significant challenges remain on the usefulness of synthetic private data generators for tasks such as prediction and inference.

翻訳日:2021-08-26 13:05:33 公開日:2021-08-24

# 先行プローブを用いたエンティティ曖昧性のロバスト性評価:エンティティオーバーシャドーイングの場合

Robustness Evaluation of Entity Disambiguation Using Prior Probes:the Case of Entity Overshadowing ( http://arxiv.org/abs/2108.10949v1 )

ライセンス: Link先を確認

Vera Provatorova, Svitlana Vakulenko, Samarth Bhargav, Evangelos Kanoulas

(参考訳) エンティティの曖昧さ (ED) はエンティティリンク(EL)の最終段階であり、候補となるエンティティが出現するコンテキストに応じてリランクされる。 elのモデルのトレーニングと評価のためのすべてのデータセットは、ニュース記事やツイートのような便利なサンプルで構成されており、より頻繁に発生するエンティティに対するエンティティ分布の以前の確率バイアスを広めている。このようなデータセット上でのELシステムの性能は,事前学習だけで高い精度のスコアを得ることができるため,過大評価されている。より適切な評価ベンチマークとして,エンティティ参照に注釈を付けた16Kの短いテキストスニペットを含むShadowLinkデータセットを導入する。我々はShadowLinkベンチマークで人気のあるELシステムの性能を評価し報告する。その結果, 評価対象のELシステムにおいて, 既往の確率バイアスとエンティティのオーバーシャドーイングの影響を実証し, 共通エンティティの精度に有意な差が認められた。

Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was previously shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effects of prior probability bias and entity overshadowing.

翻訳日:2021-08-26 13:03:01 公開日:2021-08-24

# ピクセル近傍法による肌色分割のための有効画素分割法

An Effective Pixel-Wise Approach for Skin Colour Segmentation Using Pixel Neighbourhood Technique ( http://arxiv.org/abs/2108.10971v1 )

ライセンス: Link先を確認

Tejas Dastane, Varun Rao, Kartik Shenoy, Devendra Vyavaharkar

(参考訳) 本稿では,カラーレンジのしきい値化などの既存技術が直面する限界を克服する新しい肌色分割手法を提案する。肌の色セグメンテーションは、様々な肌の色と周囲の照明条件に影響され、多くの技術で肌の色セグメンテーションに繋がる。隣接する画素に基づいて,任意のピクセルを皮膚または非皮膚に分類する2段階のPixel Neighbourhood手法を提案する。第1ステップは、深部ニューラルネットワークモデルに画素のHSV値を渡すことにより、各画素が皮膚である確率を算出する。次のステップでは、隣接するピクセルの確率を用いて、皮膚にあるピクセルの類似性を計算する。この技術は既存の技術よりも肌色セグメンテーションが優れている。

This paper presents a novel technique for skin colour segmentation that overcomes the limitations faced by existing techniques such as Colour Range Thresholding. Skin colour segmentation is affected by the varied skin colours and surrounding lighting conditions, leading to poorskin segmentation for many techniques. We propose a new two stage Pixel Neighbourhood technique that classifies any pixel as skin or non-skin based on its neighbourhood pixels. The first step calculates the probability of each pixel being skin by passing HSV values of the pixel to a Deep Neural Network model. In the next step, it calculates the likeliness of pixel being skin using these probabilities of neighbouring pixels. This technique performs skin colour segmentation better than the existing techniques.

翻訳日:2021-08-26 13:02:29 公開日:2021-08-24

# リアルタイムインド手話(ISL)認識

Real-time Indian Sign Language (ISL) Recognition ( http://arxiv.org/abs/2108.10970v1 )

ライセンス: Link先を確認

Kartik Shenoy, Tejas Dastane, Varun Rao, Devendra Vyavaharkar

(参考訳) 本稿では,グリッド型特徴量を用いて,インド手話(ISL)からのポーズやジェスチャーをリアルタイムで認識するシステムを提案する。このシステムは聴覚障害と言語障害のコミュニケーションギャップと社会の他の部分との橋渡しを試みている。既存のソリューションは比較的低い精度を提供するか、リアルタイムに動作しない。このシステムは両方のパラメーターに良い結果を与える。 33のポーズとISLからのジェスチャーを識別できる。 Sign Languageはスマートフォンカメラからキャプチャされ、そのフレームは処理のためにリモートサーバに送信される。外部ハードウェア(手袋やMicrosoft Kinectセンサーなど)の使用は避けられ、ユーザーフレンドリーになる。顔検出、物体の安定化、肌の色分割などの技術は、手の検出や追跡に使われている。さらに、画像は、特徴ベクトルの形で手のポーズを表すグリッドベースの特徴抽出技術によりさらに処理される。ハンドポーズはk-nearest neighborsアルゴリズムで分類される。一方、ジェスチャー分類では、ISLで定義された12の事前選択されたジェスチャーに対応する隠れマルコフモデルチェーンに、動作と中間手ポーズ観察シーケンスが供給される。この手法を用いることで、静的手ポーズの精度は99.7%、ジェスチャー認識の精度は97.23%となる。

This paper presents a system which can recognise hand poses & gestures from the Indian Sign Language (ISL) in real-time using grid-based features. This system attempts to bridge the communication gap between the hearing and speech impaired and the rest of the society. The existing solutions either provide relatively low accuracy or do not work in real-time. This system provides good results on both the parameters. It can identify 33 hand poses and some gestures from the ISL. Sign Language is captured from a smartphone camera and its frames are transmitted to a remote server for processing. The use of any external hardware (such as gloves or the Microsoft Kinect sensor) is avoided, making it user-friendly. Techniques such as Face detection, Object stabilisation and Skin Colour Segmentation are used for hand detection and tracking. The image is further subjected to a Grid-based Feature Extraction technique which represents the hand's pose in the form of a Feature Vector. Hand poses are then classified using the k-Nearest Neighbours algorithm. On the other hand, for gesture classification, the motion and intermediate hand poses observation sequences are fed to Hidden Markov Model chains corresponding to the 12 pre-selected gestures defined in ISL. Using this methodology, the system is able to achieve an accuracy of 99.7% for static hand poses, and an accuracy of 97.23% for gesture recognition.

翻訳日:2021-08-26 12:56:12 公開日:2021-08-24

# NeRP: 少ないサンプリング画像再構成のためのプリエンベディングによる暗黙的ニューラル表現学習

NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction ( http://arxiv.org/abs/2108.10991v1 )

ライセンス: Link先を確認

Liyue Shen, John Pauly, Lei Xing

(参考訳) 画像再構成は、サンプリングされたセンサ計測に基づく計算画像の逆問題である。少量のサンプル画像再構成は、限られた測定のために追加の課題を引き起こす。本研究では,事前埋め込み(NeRP)を用いた暗黙的ニューラルネットワーク表現学習手法を提案する。従来の深層学習に基づく画像再構成手法とは根本的に異なり、nerpは画像内の内部情報を事前に活用し、比較的サンプルの少ない測定値の物理を活用して未知の被写体の表現を生成する。以前の画像と少量のサンプルデータを除いて、NeRPを訓練するために大規模なデータは必要ない。また,NeRPはCTやMRIなどの様々な画像モダリティに一般化する一般的な手法であることを示す。また,NeRPは腫瘍進展を評価するのに必要な,微妙ながら重要な画像変化をしっかりと捉えることができることを示した。

Image reconstruction is an inverse problem that solves for a computational image based on sampled sensor measurement. Sparsely sampled image reconstruction poses addition challenges due to limited measurements. In this work, we propose an implicit Neural Representation learning methodology with Prior embedding (NeRP) to reconstruct a computational image from sparsely sampled measurements. The method differs fundamentally from previous deep learning-based image reconstruction approaches in that NeRP exploits the internal information in an image prior, and the physics of the sparsely sampled measurements to produce a representation of the unknown subject. No large-scale data is required to train the NeRP except for a prior image and sparsely sampled measurements. In addition, we demonstrate that NeRP is a general methodology that generalizes to different imaging modalities such as CT and MRI. We also show that NeRP can robustly capture the subtle yet significant image changes required for assessing tumor progression.

翻訳日:2021-08-26 12:55:56 公開日:2021-08-24

# ガウス分布の間のエントロピーGromov-Wasserstein

Entropic Gromov-Wasserstein between Gaussian Distributions ( http://arxiv.org/abs/2108.10961v1 )

ライセンス: Link先を確認

Khang Le and Dung Le and Huy Nguyen and Dat Do and Tung Pham and Nhat Ho

(参考訳) 我々はトロピック・グロモフ・ワッサーシュタインとその次元の異なるガウス分布の間の不均衡バージョンについて研究した。計量が内積であるとき、内積gromov-wasserstein (igw) は、エントロピーigwとその非平衡変異の最適輸送計画が(非平衡な)ガウス分布であることを示す。フォン・ノイマンのトレース不等式の適用により、これらのガウス分布の間のエントロピー IGW に対する閉形式式を得る。最後に、複数のガウス分布のエントロピー内積gromov-wasserstein barycenterを考える。エントロピー正則化パラメータが小さい場合、バリセンタがガウス分布であることを証明する。さらに,重心の共分散行列に対する閉形式表現も導出する。

We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions. Via an application of von Neumann's trace inequality, we obtain closed-form expressions for the entropic IGW between these Gaussian distributions. Finally, we consider an entropic inner product Gromov-Wasserstein barycenter of multiple Gaussian distributions. We prove that the barycenter is Gaussian distribution when the entropic regularization parameter is small. We further derive closed-form expressions for the covariance matrix of the barycenter.

翻訳日:2021-08-26 12:53:01 公開日:2021-08-24

# オンライン辞書学習に基づく電力系統の故障・サイバー攻撃検出

Online Dictionary Learning Based Fault and Cyber Attack Detection for Power Systems ( http://arxiv.org/abs/2108.10990v1 )

ライセンス: Link先を確認

Gabriel Intriago, Yu Zhang

(参考訳) 新興の広域監視システム(wams)は、電力網の状況把握に大きな改善をもたらした。しかし、新たに導入されたシステムは、通常の物理的障害に変装する可能性のあるサイバー攻撃のリスクを高める可能性がある。本稿では、ストリームデータマイニング分類器(Hoeffding Adaptive Tree)と半教師付き学習技術を利用して、通常のシステム摂動からサイバー攻撃を正確に識別することで、イベントや侵入検知の問題に対処する。まず,提案手法はラベルなしデータから高レベルな特徴を学習することで辞書を構築する。次に、ラベル付きデータを学習辞書原子のスパース線形結合として表現する。我々は、これらのスパースコードを利用して、オンライン分類器と効率的な変更検出器を訓練する。我々は,産業制御システムによるサイバー攻撃データセットを用いた数値実験を行った。ショートサーキット障害、ラインメンテナンス、リモートトリッピングコマンドインジェクション、リレー設定変更、偽データインジェクションの5つのシナリオを検討した。データは改良されたIEEE 9バスシステムに基づいて生成される。シミュレーションの結果,提案手法は最先端手法よりも優れていることがわかった。

The emerging wide area monitoring systems (WAMS) have brought significant improvements in electric grids' situational awareness. However, the newly introduced system can potentially increase the risk of cyber-attacks, which may be disguised as normal physical disturbances. This paper deals with the event and intrusion detection problem by leveraging a stream data mining classifier (Hoeffding adaptive tree) with semi-supervised learning techniques to distinguish cyber-attacks from regular system perturbations accurately. First, our proposed approach builds a dictionary by learning higher-level features from unlabeled data. Then, the labeled data are represented as sparse linear combinations of learned dictionary atoms. We capitalize on those sparse codes to train the online classifier along with efficient change detectors. We conduct numerical experiments with industrial control systems cyber-attack datasets. We consider five different scenarios: short-circuit faults, line maintenance, remote tripping command injection, relay setting change, as well as false data injection. The data are generated based on a modified IEEE 9-bus system. Simulation results show that our proposed approach outperforms the state-of-the-art method.

翻訳日:2021-08-26 12:52:49 公開日:2021-08-24

# (参考訳) ディープラーニングの敵対的ロバスト性:理論・アルゴリズム・応用

Adversarial Robustness of Deep Learning: Theory, Algorithms, and Applications ( http://arxiv.org/abs/2108.10451v1 )

ライセンス: CC BY 4.0

Wenjie Ruan and Xinping Yi and Xiaowei Huang

(参考訳) 本チュートリアルは, 各種深層学習モデルの脆弱性を, 逆例として評価するための, 最新の手法をよく構築したレビューとして紹介することを目的としている。このチュートリアルは特に、ディープニューラルネットワーク(DNN)の敵攻撃と堅牢性検証における最先端技術を強調している。深層学習モデルのロバスト性を改善するための効果的な対策についても紹介する。我々は、この新たな方向性に関する総合的な全体像を提供し、安全-クリティカルなデータ分析アプリケーションにおける堅牢なディープラーニングモデルの設計の緊急性と重要性をコミュニティに認識させ、最終的にはエンドユーザがディープラーニング分類器を信頼できるようにする。また、深層学習の敵意的堅牢性に関する潜在的研究の方向性と、信頼性の高い深層学習に基づくデータ分析システムとアプリケーションを実現するための潜在的な利点を要約する。

This tutorial aims to introduce the fundamentals of adversarial robustness of deep learning, presenting a well-structured review of up-to-date techniques to assess the vulnerability of various types of deep learning models to adversarial examples. This tutorial will particularly highlight state-of-the-art techniques in adversarial attacks and robustness verification of deep neural networks (DNNs). We will also introduce some effective countermeasures to improve the robustness of deep learning models, with a particular focus on adversarial training. We aim to provide a comprehensive overall picture about this emerging direction and enable the community to be aware of the urgency and importance of designing robust deep learning models in safety-critical data analytical applications, ultimately enabling the end-users to trust deep learning classifiers. We will also summarize potential research directions concerning the adversarial robustness of deep learning, and its potential benefits to enable accountable and trustworthy deep learning-based data analytical systems and applications.

翻訳日:2021-08-25 21:06:28 公開日:2021-08-24

# (参考訳) Deep Survival Dose Response Function (DeepSDRF)による確率的治療勧告

Stochastic Treatment Recommendation with Deep Survival Dose Response Function (DeepSDRF) ( http://arxiv.org/abs/2108.10453v1 )

ライセンス: CC BY 4.0

Jie Zhu, Blanca Gallego

(参考訳) 我々は,deep survival dose response function (deepsdrf) と呼ばれる臨床生存率データを用いて,確率的治療推奨問題の一般的な定式化を提案する。すなわち,未観測の要因(共同設立者)が観察された治療と時間と時間の両方に影響を及ぼす履歴データから,条件平均線量応答(CADR)関数を学習する問題を考える。 DeepSDRFから推定される治療効果により,説明的洞察を用いた推薦アルゴリズムの開発が可能となる。ランダム検索と強化学習を併用した2つの推奨手法を比較し,同様の結果を得た。我々は,DeepSDRFとそれに対応する勧告を広範囲なシミュレーション研究と2つの実験データベースで検証した: 1)臨床実践研究データリンク(CPRD)と2)eICU研究所(eRI)データベース。我々の知る限りでは、共同設立者が医学的文脈における観察データによる確率的治療効果を考慮に入れたのはこれが初めてである。

We propose a general formulation for stochastic treatment recommendation problems in settings with clinical survival data, which we call the Deep Survival Dose Response Function (DeepSDRF). That is, we consider the problem of learning the conditional average dose response (CADR) function solely from historical data in which unobserved factors (confounders) affect both observed treatment and time-to-event outcomes. The estimated treatment effect from DeepSDRF enables us to develop recommender algorithms with explanatory insights. We compared two recommender approaches based on random search and reinforcement learning and found similar performance in terms of patient outcome. We tested the DeepSDRF and the corresponding recommender on extensive simulation studies and two empirical databases: 1) the Clinical Practice Research Datalink (CPRD) and 2) the eICU Research Institute (eRI) database. To the best of our knowledge, this is the first time that confounders are taken into consideration for addressing the stochastic treatment effect with observational data in a medical context.

翻訳日:2021-08-25 20:55:39 公開日:2021-08-24

# (参考訳) Isaac Gym: ロボット学習のための高性能GPUベースの物理シミュレーション

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning ( http://arxiv.org/abs/2108.10470v1 )

ライセンス: CC BY 4.0

Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, Gavriel State

(参考訳) Isaac Gymは、GPU上でさまざまなロボットタスクのポリシーをトレーニングする、高性能な学習プラットフォームを提供する。物理シミュレーションとニューラルネットワークポリシのトレーニングはどちらもgpu上にあり、物理バッファからpytorchテンソルに直接データを渡すことで、cpuボトルネックを乗り越えることなく通信する。これにより、ニューラルネットワークにcpuベースのシミュレータとgpuを使用する従来のrlトレーニングに比べて、1つのgpu上で複雑なロボットタスクのトレーニング時間が1～2桁向上した。結果は \url{https://sites.google.com/view/isaacgym-nvidia} でホストされ、isaac gymは \url{https://developer.nvidia.com/isaac-gym} でダウンロードできる。

Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 1-2 orders of magnitude improvements compared to conventional RL training that uses a CPU based simulator and GPU for neural networks. We host the results and videos at \url{https://sites.google.com/view/isaacgym-nvidia} and isaac gym can be download at \url{https://developer.nvidia.com/isaac-gym}.

翻訳日:2021-08-25 20:31:14 公開日:2021-08-24

# (参考訳) 修正FSSDとモデル圧縮に基づく小型物体検出

Small Object Detection Based on Modified FSSD and Model Compression ( http://arxiv.org/abs/2108.10503v1 )

ライセンス: CC BY 4.0

Qingcai Wang, Hao Zhang, Xianggong Hong, and Qinqin Zhou

(参考訳) 小物体の分解能は比較的低く, 抽出が困難であり, 既存の物体検出法では小物体を効果的に検出することができず, 検出速度や安定性は低い。そこで本研究では,FSSDに基づく小型物体検出アルゴリズムを提案する。まず、異なるレイヤの特徴に含まれる意味情報を異なるスケールオブジェクトの検出に用いることができ、また、特徴融合法を改善して、小さなオブジェクトに有益なより多くの情報を得ることができ、第2に、ニューラルネットワークのトレーニングを加速し、モデルをスパースにするためにバッチ正規化層を導入し、最後に、スケール係数によってモデルを切断して対応する圧縮モデルを得る。実験の結果、アルゴリズムの平均精度(mAP)はPASCAL VOCで80.4%、速度はGTX1080tiで59.5 FPSに達することが示された。刈り取り後、圧縮されたモデルは79.9% mAP、79.5 FPSの速度で到達できる。 MS COCOでは、最良の検出精度(APs)は12.1%であり、全体的な検出精度はIoUが0.5のとき49.8%である。このアルゴリズムは、小型物体の検出精度を向上させるだけでなく、検出速度を大幅に向上させ、速度と精度のバランスをとることができる。

Small objects have relatively low resolution, the unobvious visual features which are difficult to be extracted, so the existing object detection methods cannot effectively detect small objects, and the detection speed and stability are poor. Thus, this paper proposes a small object detection algorithm based on FSSD, meanwhile, in order to reduce the computational cost and storage space, pruning is carried out to achieve model compression. Firstly, the semantic information contained in the features of different layers can be used to detect different scale objects, and the feature fusion method is improved to obtain more information beneficial to small objects; secondly, batch normalization layer is introduced to accelerate the training of neural network and make the model sparse; finally, the model is pruned by scaling factor to get the corresponding compressed model. The experimental results show that the average accuracy (mAP) of the algorithm can reach 80.4% on PASCAL VOC and the speed is 59.5 FPS on GTX1080ti. After pruning, the compressed model can reach 79.9% mAP, and 79.5 FPS in detection speed. On MS COCO, the best detection accuracy (APs) is 12.1%, and the overall detection accuracy is 49.8% AP when IoU is 0.5. The algorithm can not only improve the detection accuracy of small objects, but also greatly improves the detection speed, which reaches a balance between speed and accuracy.

翻訳日:2021-08-25 20:28:39 公開日:2021-08-24

# (参考訳) 組込みシステムにおける実時間単眼人間深度推定とセグメント化

Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems ( http://arxiv.org/abs/2108.10506v1 )

ライセンス: CC BY 4.0

Shan An, Fangru Zhou, Mei Yang, Haogang Zhu, Changhong Fu, and Konstantinos A. Tsintotas

(参考訳) 移動歩行者に対する衝突回避のためにシーンの深さを推定することはロボット分野において重要かつ根本的な問題である。本稿では,室内環境における人体深度推定とセグメンテーションの迅速かつ高精度なネットワークアーキテクチャを提案し,単眼カメラを主認識モジュールとした資源制約型プラットフォーム(バッテリ駆動空中・マイクロ空・地上車両を含む)への適用を目指している。エンコーダ・デコーダ構造に従って,提案手法は深さ予測と意味セグメンテーションの2つの分岐からなる。さらに,ネットワーク構造最適化を用いて前方推定速度を改善する。 3つの自己生成データセットに対する試験的な実験は、パイプラインがリアルタイムに実行可能であることを証明し、同等の精度を維持しながら、現代の最先端フレームワーク(TensorRTを備えたNVIDIA Jetson Nano GPUで毎秒114.6フレーム)よりも高いフレームレートを達成する。

Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field. This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments, aiming to applications for resource-constrained platforms (including battery-powered aerial, micro-aerial, and ground vehicles) with a monocular camera being the primary perception module. Following the encoder-decoder structure, the proposed framework consists of two branches, one for depth prediction and another for semantic segmentation. Moreover, network structure optimization is employed to improve its forward inference speed. Exhaustive experiments on three self-generated datasets prove our pipeline's capability to execute in real-time, achieving higher frame rates than contemporary state-of-the-art frameworks (114.6 frames per second on an NVIDIA Jetson Nano GPU with TensorRT) while maintaining comparable accuracy.

翻訳日:2021-08-25 20:17:17 公開日:2021-08-24

# (参考訳) ARShoe:スマートフォンのリアルタイム拡張現実シューオンシステム

ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones ( http://arxiv.org/abs/2108.10515v1 )

ライセンス: CC BY 4.0

Shan An, Guangfu Che, Jinghao Guo, Haogang Zhu, Junjie Ye, Fangru Zhou, Zhaoqi Zhu, Dong Wei, Aishan Liu, Wei Zhang

(参考訳) 仮想トライオン技術により、ユーザーは拡張現実を使ってさまざまなファッションアイテムを試すことができ、便利なオンラインショッピング体験を提供する。しかし、以前の作品の多くは衣服の仮想試着に焦点を合わせ、靴の試着を無視している。そこで本研究では,スマートフォン用のリアルタイム拡張現実バーチャル靴試着システム,ARShoeを提案する。具体的には、ポーズ推定とセグメンテーションを同時に実現するために、新しいマルチブランチネットワークを採用する。試着中にリアルな3Dシューズモデル閉塞を発生させるソリューションを提示する。円滑で安定な試行効果を達成するため,本研究は新たな安定化法をさらに発展させる。さらに, トレーニングと評価のために, 仮想シューズ試用タスク関連ラベルをアノテーションで付加した, 初の大規模フットベンチマークを構築した。新たに構築したベンチマーク実験では,ARShoeの満足度が実証された。スマートフォンの実用化試験では,提案手法のリアルタイム性能と安定化が検証された。

Virtual try-on technology enables users to try various fashion items using augmented reality and provides a convenient online shopping experience. However, most previous works focus on the virtual try-on for clothes while neglecting that for shoes, which is also a promising task. To this concern, this work proposes a real-time augmented reality virtual shoe try-on system for smartphones, namely ARShoe. Specifically, ARShoe adopts a novel multi-branch network to realize pose estimation and segmentation simultaneously. A solution to generate realistic 3D shoe model occlusion during the try-on process is presented. To achieve a smooth and stable try-on effect, this work further develop a novel stabilization method. Moreover, for training and evaluation, we construct the very first large-scale foot benchmark with multiple virtual shoe try-on task-related labels annotated. Exhaustive experiments on our newly constructed benchmark demonstrate the satisfying performance of ARShoe. Practical tests on common smartphones validate the real-time performance and stabilization of the proposed approach.

翻訳日:2021-08-25 20:01:28 公開日:2021-08-24

# (参考訳) ラベル割り当て蒸留による物体検出の改善

Improving Object Detection by Label Assignment Distillation ( http://arxiv.org/abs/2108.10520v1 )

ライセンス: CC BY 4.0

Chuong H. Nguyen, Thuy C. Nguyen, Tuan N. Tang, Nam L.H. Phan

(参考訳) オブジェクト検出におけるラベル割り当ては、画像内のサンプルされた領域に前景または背景のターゲットを割り当てることを目的としている。画像分類のラベル付けとは異なり、この問題はオブジェクトの境界ボックスのために適切に定義されていない。本稿では,蒸留の観点から問題を考察し,ラベル割り当て蒸留(LAD)と呼ぶ。最初のモチベーションは非常に単純で、教師ネットワークを使って生徒のラベルを生成します。これは、教師の予測を直接の目標(ソフトラベル)として使うか、または教師が動的に割り当てるハードラベル(LAD)を通して達成できる。実験の結果, (i)LADはソフトラベルよりも有効であるが, 相補的であることがわかった。 (ii)ladを使用すると、より小さな教師はより大きな生徒を著しく改善できるが、ソフトラベルはできない。次に,2つのネットワークがスクラッチから同時に学習し,教師と学生の役割を動的に交換するコラーニングLADを紹介する。 PAA-ResNet50を教師として使うことで、PAA-ResNet101とPAA-ResNeXt101の検出器を、COCOテストデブセットで46ドル、47.5ドルに改善できます。強力な教師であるPAA-SwinBでは、PAA-ResNet50を1倍のスケジュールトレーニングで43.9ドル、PAA-ResNet101を47.9ドルに改善し、現在の手法を大きく上回っている。ソースコードとチェックポイントはhttps://github.com/cybercore-co-ltd/colad_paperで公開します。

Label assignment in object detection aims to assign targets, foreground or background, to sampled regions in an image. Unlike labeling for image classification, this problem is not well defined due to the object's bounding box. In this paper, we investigate the problem from a perspective of distillation, hence we call Label Assignment Distillation (LAD). Our initial motivation is very simple, we use a teacher network to generate labels for the student. This can be achieved in two ways: either using the teacher's prediction as the direct targets (soft label), or through the hard labels dynamically assigned by the teacher (LAD). Our experiments reveal that: (i) LAD is more effective than soft-label, but they are complementary. (ii) Using LAD, a smaller teacher can also improve a larger student significantly, while soft-label can't. We then introduce Co-learning LAD, in which two networks simultaneously learn from scratch and the role of teacher and student are dynamically interchanged. Using PAA-ResNet50 as a teacher, our LAD techniques can improve detectors PAA-ResNet101 and PAA-ResNeXt101 to $46 \rm AP$ and $47.5\rm AP$ on the COCO test-dev set. With a strong teacher PAA-SwinB, we improve the PAA-ResNet50 to $43.9\rm AP$ with only \1x schedule training, and PAA-ResNet101 to $47.9\rm AP$, significantly surpassing the current methods. Our source code and checkpoints will be released at https://github.com/cybercore-co-ltd/CoLAD_paper.

翻訳日:2021-08-25 19:47:11 公開日:2021-08-24

# (参考訳) 複数物体追跡と軌道予測のための共同学習アーキテクチャ

Joint Learning Architecture for Multiple Object Tracking and Trajectory Forecasting ( http://arxiv.org/abs/2108.10543v1 )

ライセンス: CC BY 4.0

Oluwafunmilola Kesa, Olly Styles, Victor Sanchez

(参考訳) 本稿では,複数物体追跡(MOT)と軌跡予測のための共同学習アーキテクチャ(JLA)を提案する。動き予測は、境界ボックスの形で予測を洗練させる技術MOT法のいくつかの状態において広く用いられている。通常、カルマンフィルタは、トラッカーが現在のフレーム内のオブジェクトの位置を正確に予測するのに役立つ短期的な推定を提供する。しかし、カルマンフィルタに基づくアプローチは非線形軌跡を予測できない。追跡軌道予測モデルと予測軌道予測モデルの共同学習を行い,カルマンフィルタのような線形運動予測手法に代えて,短期運動推定のための予測軌道予測法を提案する。我々はMOTChallengeベンチマークでJLAを評価した。評価の結果、JLAは短期動作予測に優れており、FairMOTと比較して、MOT16、MOT17、MOT20データセットのIDスイッチを33%、31%、および47%削減している。

This paper introduces a joint learning architecture (JLA) for multiple object tracking (MOT) and trajectory forecasting in which the goal is to predict objects' current and future trajectories simultaneously. Motion prediction is widely used in several state of the art MOT methods to refine predictions in the form of bounding boxes. Typically, a Kalman Filter provides short-term estimations to help trackers correctly predict objects' locations in the current frame. However, the Kalman Filter-based approaches cannot predict non-linear trajectories. We propose to jointly train a tracking and trajectory forecasting model and use the predicted trajectory forecasts for short-term motion estimates in lieu of linear motion prediction methods such as the Kalman filter. We evaluate our JLA on the MOTChallenge benchmark. Evaluations result show that JLA performs better for short-term motion prediction and reduces ID switches by 33%, 31%, and 47% in the MOT16, MOT17, and MOT20 datasets, respectively, in comparison to FairMOT.

翻訳日:2021-08-25 19:28:20 公開日:2021-08-24

# (参考訳) 凍結組織からの歯肉質の組織学的診断を容易にする再生的逆行性アプローチ

A generative adversarial approach to facilitate archival-quality histopathologic diagnoses from frozen tissue sections ( http://arxiv.org/abs/2108.10550v1 )

ライセンス: CC BY 4.0

Kianoush Falahkheirkhah, Tao Guo, Michael Hwang, Pheroze Tamboli, Christopher G Wood, Jose A Karam, Kanishka Sircar, and Rohit Bhargava

(参考訳) 病理組織学を含む臨床診断および研究において、ホルマリン固定パラフィン(FFPE)組織は、その超画質にほぼ普遍的に好まれる。しかし、組織処理時間(24時間以上)は意思決定を遅らせる可能性がある。対照的に、フレッシュフリーズ(ff)処理(1時間未満)は迅速な情報が得られるが、クリアリングの欠如、形態的変形、頻繁なアーティファクトにより診断精度は最適ではない。ここでは、人工知能を使ってこのギャップを埋める。患者40名から分離した98対の腎サンプルから生成逆数ネットワーク(GAN)を用いて,FFPE様画像,仮想FFPEをFFPEから合成した。 5人の病理医が盲検検査の結果を評価した。仮想FFPEデータの画質は高く評価され、実際のFFPE画像とよく似ていることが示された。仮想ffpe画像における疾患の臨床的評価は, ff画像と比較して, 観察者間一致が高かった。ほぼ瞬時に生成された仮想FFPE画像は、情報への時間を短縮するだけでなく、余分なコストと労力なしで通常のFFPE画像からより正確な診断を容易にすることができる。

In clinical diagnostics and research involving histopathology, formalin fixed paraffin embedded (FFPE) tissue is almost universally favored for its superb image quality. However, tissue processing time (more than 24 hours) can slow decision-making. In contrast, fresh frozen (FF) processing (less than 1 hour) can yield rapid information but diagnostic accuracy is suboptimal due to lack of clearing, morphologic deformation and more frequent artifacts. Here, we bridge this gap using artificial intelligence. We synthesize FFPE-like images ,virtual FFPE, from FF images using a generative adversarial network (GAN) from 98 paired kidney samples derived from 40 patients. Five board-certified pathologists evaluated the results in a blinded test. Image quality of the virtual FFPE data was assessed to be high and showed a close resemblance to real FFPE images. Clinical assessments of disease on the virtual FFPE images showed a higher inter-observer agreement compared to FF images. The nearly instantaneously generated virtual FFPE images can not only reduce time to information but can facilitate more precise diagnosis from routine FF images without extraneous costs and effort.

翻訳日:2021-08-25 19:13:22 公開日:2021-08-24

# (参考訳) イベントカメラからの高密度光流れ

Dense Optical Flow from Event Cameras ( http://arxiv.org/abs/2108.10552v1 )

ライセンス: CC BY-SA 4.0

Mathias Gehrig and Mario Millh\"ausler and Daniel Gehrig and Davide Scaramuzza

(参考訳) イベントカメラからの高密度光フロー推定に特徴相関と逐次処理を導入することを提案する。現代のフレームベース光フロー法は特徴相関から計算したマッチングコストに大きく依存している。対照的に、マッチングコストを明示的に計算するイベントカメラの光学フロー法は存在しない。代わりに、イベントを用いた学習ベースのアプローチは、通常はU-Netアーキテクチャを利用して光学フローをわずかに見積もる。我々の重要な発見は、相関関数の導入は、畳み込み層のみに依存する従来の方法と比較して、結果を著しく改善するということです。提案手法は,最先端技術と比較して高密度光流を計算し,終点誤差をMVSECで23%削減する。また,イベントカメラ用にこれまでに開発された光学フロー法はすべて,最大流量10ピクセルの非常に小さな変位場を持つデータセット上で評価されている。この観測に基づいて,最大210ピクセルの変位場と3倍の解像度のカメラ分解能を示す,新しい実世界のデータセットを導入する。提案手法は,このデータセットの終端点誤差を66%低減する。

We propose to incorporate feature correlation and sequential processing into dense optical flow estimation from event cameras. Modern frame-based optical flow methods heavily rely on matching costs computed from feature correlation. In contrast, there exists no optical flow method for event cameras that explicitly computes matching costs. Instead, learning-based approaches using events usually resort to the U-Net architecture to estimate optical flow sparsely. Our key finding is that the introduction of correlation features significantly improves results compared to previous methods that solely rely on convolution layers. Compared to the state-of-the-art, our proposed approach computes dense optical flow and reduces the end-point error by 23% on MVSEC. Furthermore, we show that all existing optical flow methods developed so far for event cameras have been evaluated on datasets with very small displacement fields with a maximum flow magnitude of 10 pixels. Based on this observation, we introduce a new real-world dataset that exhibits displacement fields with magnitudes up to 210 pixels and 3 times higher camera resolution. Our proposed approach reduces the end-point error on this dataset by 66%.

翻訳日:2021-08-25 19:02:27 公開日:2021-08-24

# (参考訳) 野獣のタミング:ニューラルな会話モデルを制御する学習

Taming the Beast: Learning to Control Neural Conversational Models ( http://arxiv.org/abs/2108.10561v1 )

ライセンス: CC BY 4.0

Andrea Madotto

(参考訳) 本論文は,タスク指向とチャットの両シナリオにおいて,深層学習に基づく,エンドツーエンドで生成的な対話システムの制御可能性について考察する。特に,スタイルや話題の制御や対話スキルの継続的な付加・結合など,生成対話システム制御のさまざまな側面について検討する。最初の対話システムが商用化されてから30年が経ち、これらのシステムの基本的なアーキテクチャは、自然言語理解(NLU)、対話状態追跡(DST)、対話マネージャ(DM)、自然言語生成(NLG)という4つのパイプライン化された基本コンポーネントで、ほとんど変わっていない。モジュール化システムの重要なコンポーネントである対話マネージャは、応答内容とスタイルを制御する。このモジュールは通常規則でプログラムされ、高度に制御可能で容易に拡張できるように設計されている。強力な「深層学習」アーキテクチャの出現に伴い、システム全体の性能を最適化し、訓練を簡素化するエンドツーエンド生成対話システムが提案されている。しかし、これらのシステムはモジュール化された対話マネージャができる限り容易に制御・拡張できない。これは、通常、大きな事前学習された言語モデル(gpt-2など)である単一のニューラルネットワークが使用されているため、望ましい属性(スタイル、トピックなど)を外科的に変更することは困難である。さらに重要なことに、制御不能な対話システムは攻撃的、さらには有害な反応を引き起こす可能性がある。そこで本論文では,タスク指向およびチャットシナリオにおけるエンドツーエンド生成対話システムの制御可能な手法について検討する。 1)chit-chatモデルのスタイルと話題の制御方法,2)タスク指向対話システムの継続的な制御と拡張方法,3)マルチスキル対話モデルの構成と制御方法について述べる。

This thesis investigates the controllability of deep learning-based, end-to-end, generative dialogue systems in both task-oriented and chit-chat scenarios. In particular, we study the different aspects of controlling generative dialogue systems, including controlling styles and topics and continuously adding and combining dialogue skills. In the three decades since the first dialogue system was commercialized, the basic architecture of such systems has remained substantially unchanged, consisting of four pipelined basic components, namely, natural language understanding (NLU), dialogue state tracking (DST), a dialogue manager (DM) and natural language generation (NLG). The dialogue manager, which is the critical component of the modularized system, controls the response content and style. This module is usually programmed by rules and is designed to be highly controllable and easily extendable. With the emergence of powerful "deep learning" architectures, end-to-end generative dialogue systems have been proposed to optimize overall system performance and simplify training. However, these systems cannot be easily controlled and extended as the modularized dialogue manager can. This is because a single neural system is used, which is usually a large pre-trained language model (e.g., GPT-2), and thus it is hard to surgically change desirable attributes (e.g., style, topics, etc.). More importantly, uncontrollable dialogue systems can generate offensive and even toxic responses. Therefore, in this thesis, we study controllable methods for end-to-end generative dialogue systems in task-oriented and chit-chat scenarios. Throughout the chapters, we describe 1) how to control the style and topics of chit-chat models, 2) how to continuously control and extend task-oriented dialogue systems, and 3) how to compose and control multi-skill dialogue models.

翻訳日:2021-08-25 18:50:05 公開日:2021-08-24

# (参考訳) 残差学習に基づくデュアルオートエンコーダモデルを用いた医用画像圧縮

Lossy Medical Image Compression using Residual Learning-based Dual Autoencoder Model ( http://arxiv.org/abs/2108.10579v1 )

ライセンス: CC BY 4.0

Dipti Mishra, Satish Kumar Singh, Rajat Kumar Singh

(参考訳) 本研究では,マラリアrbc細胞画像パッチを圧縮するための2段階オートエンコーダベースの圧縮機・デコンプレッサーフレームワークを提案する。病気の診断に使用される医療画像は、数十ギガバイトほどの大きさで、非常に巨大です。提案した残差ベースデュアルオートエンコーダネットワークは,デコンプレッサモジュールを通じて元のイメージを再構成するユニークな特徴を抽出するために訓練される。 2つの潜在空間表現(第1は原画像、第2は残留画像)は、最終原画像の再構築に使用される。色-SSIMは、減圧後の細胞画像のクロミナンス部の品質チェックにのみ使用されている。実験の結果,提案手法は,PSNR,Color SSIM,MS-SSIMにおいて,医用画像の他のニューラルネットワーク圧縮技術よりも約35%,10%,5%優れていた。このアルゴリズムは、JPEG-LS、JP2K-LM、CALIC、最近のニューラルネットワークアプローチよりも76%、78%、75%、および74%のビット保存を大幅に改善し、圧縮圧縮技術として優れている。

In this work, we propose a two-stage autoencoder based compressor-decompressor framework for compressing malaria RBC cell image patches. We know that the medical images used for disease diagnosis are around multiple gigabytes size, which is quite huge. The proposed residual-based dual autoencoder network is trained to extract the unique features which are then used to reconstruct the original image through the decompressor module. The two latent space representations (first for the original image and second for the residual image) are used to rebuild the final original image. Color-SSIM has been exclusively used to check the quality of the chrominance part of the cell images after decompression. The empirical results indicate that the proposed work outperformed other neural network related compression technique for medical images by approximately 35%, 10% and 5% in PSNR, Color SSIM and MS-SSIM respectively. The algorithm exhibits a significant improvement in bit savings of 76%, 78%, 75% & 74% over JPEG-LS, JP2K-LM, CALIC and recent neural network approach respectively, making it a good compression-decompression technique.

翻訳日:2021-08-25 18:48:47 公開日:2021-08-24

# (参考訳) ポーランド国境警備隊における刑事文書の検出

Detection of Criminal Texts for the Polish State Border Guard ( http://arxiv.org/abs/2108.10580v1 )

ライセンス: CC BY 4.0

Artur Nowakowski, Krzysztof Jassem

(参考訳) 本稿では,インターネット上に現れるポーランドの犯罪テキストの検出について述べる。非平衡・雑音データの効率的な分類のための最善の設定を探索する実験を行った。ポーランド語をベースとしたトランスフォーマー言語モデルを用いて,我々のモデルを微調整した結果,最高の性能が得られた。検出タスクでは,注釈付きインターネットスニペットの大規模なコーパスをトレーニングデータとして収集した。このデータセットを共有し、Goitoプラットフォームをベンチマークとして、犯罪テキストを検出するための新しいタスクを作成します。

This paper describes research on the detection of Polish criminal texts appearing on the Internet. We carried out experiments to find the best available setup for the efficient classification of unbalanced and noisy data. The best performance was achieved when our model was fine-tuned on a pre-trained Polish-based transformer language model. For the detection task, a large corpus of annotated Internet snippets was collected as training data. We share this dataset and create a new task for the detection of criminal texts using the Gonito platform as the benchmark.

翻訳日:2021-08-25 18:40:14 公開日:2021-08-24

# (参考訳) コンピュータ支援整形外科手術における咬合・ロバスト視覚マーカーレス骨追跡

Occlusion-robust Visual Markerless Bone Tracking for Computer-Assisted Orthopaedic Surgery ( http://arxiv.org/abs/2108.10608v1 )

ライセンス: CC BY 4.0

Xue Hu, Anh Nguyen, Ferdinando Rodriguez y Baena

(参考訳) 従来のコンピュータ支援整形外科ナビゲーションシステムは、患者のポーズのための専用の光学マーカーの追跡に依存しているため、手術のワークフローはより侵襲的で退屈で高価である。視覚的追跡は, マーカーレス, 無努力で標的解剖を測定するために最近提案されているが, 術中介入による実世界の閉塞下では失敗する。さらに、そのような手法はハードウェア固有のものであり、外科的応用には十分ではない。本稿では,咬合に対して頑健なrgb-dセンシングに基づくマーカーレストラッキング手法を提案する。我々は、動的領域の予測とロバストな3Dポイントクラウドセグメンテーションを特徴とする新しいセグメンテーションネットワークを設計する。また,オクルージョン・インスタンスを用いた大規模トレーニングデータ収集にはコストがかかるため,ネットワークトレーニングのための合成RGB-D画像の作成方法も提案する。実験結果から,提案手法は近年の最先端手法よりも,特に閉塞が存在する場合において高い性能を示すことが示された。さらに,本手法は,ネットワーク再トレーニングを必要とせず,キャダバを含む新しいカメラや新たなターゲットモデルによく応用できる。提案手法は,高品質な商用RGB-Dカメラを用いて,モデル膝における1-2デグレスと2-4mmの精度を実現し,臨床応用の基準を満たしている。

Conventional computer-assisted orthopaedic navigation systems rely on the tracking of dedicated optical markers for patient poses, which makes the surgical workflow more invasive, tedious, and expensive. Visual tracking has recently been proposed to measure the target anatomy in a markerless and effortless way, but the existing methods fail under real-world occlusion caused by intraoperative interventions. Furthermore, such methods are hardware-specific and not accurate enough for surgical applications. In this paper, we propose a RGB-D sensing-based markerless tracking method that is robust against occlusion. We design a new segmentation network that features dynamic region-of-interest prediction and robust 3D point cloud segmentation. As it is expensive to collect large-scale training data with occlusion instances, we also propose a new method to create synthetic RGB-D images for network training. Experimental results show that our proposed markerless tracking method outperforms recent state-of-the-art approaches by a large margin, especially when an occlusion exists. Furthermore, our method generalises well to new cameras and new target models, including a cadaver, without the need for network retraining. In practice, by using a high-quality commercial RGB-D camera, our proposed visual tracking method achieves an accuracy of 1-2 degress and 2-4 mm on a model knee, which meets the standard for clinical applications.

翻訳日:2021-08-25 18:31:43 公開日:2021-08-24

# (参考訳) ProtoMIL: ファイングレード・インタプリタビリティのためのプロトタイプ部分を用いた複数インスタンス学習

ProtoMIL: Multiple Instance Learning with Prototypical Parts for Fine-Grained Interpretability ( http://arxiv.org/abs/2108.10612v1 )

ライセンス: CC BY 4.0

Dawid Rymarczyk and Aneta Kaczy\'nska and Jaros{\l}aw Kraus and Adam Pardyl and Bartosz Zieli\'nski

(参考訳) マルチインスタンス学習(mil:multiple instance learning)は、多くの現実の機械学習アプリケーションで人気を集めている。しかしながら、ミルを説明するための対応する取り組みは遅れており、通常は特定の予測に不可欠なバッグのインスタンスを提示することに限られる。本稿では,視覚プロトタイプで動作するケースベース推論プロセスに触発された自己説明可能なMIL手法であるProtoMILを導入することにより,このギャップを埋める。 ProtoMILは、オブジェクト記述に原型的特徴を組み込むことにより、モデル精度と細粒度解釈可能性に前例のない結合を行い、5つのMILデータセットで実験を行った。

Multiple Instance Learning (MIL) gains popularity in many real-life machine learning applications due to its weakly supervised nature. However, the corresponding effort on explaining MIL lags behind, and it is usually limited to presenting instances of a bag that are crucial for a particular prediction. In this paper, we fill this gap by introducing ProtoMIL, a novel self-explainable MIL method inspired by the case-based reasoning process that operates on visual prototypes. Thanks to incorporating prototypical features into objects description, ProtoMIL unprecedentedly joins the model accuracy and fine-grained interpretability, which we present with the experiments on five recognized MIL datasets.

翻訳日:2021-08-25 18:12:05 公開日:2021-08-24

# (参考訳) 不均一Telcoセルデータの外部位置復元

Outdoor Position Recovery from HeterogeneousTelco Cellular Data ( http://arxiv.org/abs/2108.10613v1 )

ライセンス: CC BY 4.0

Yige Zhang, Weixiong Rao, Kun Zhang and Lei Chen

(参考訳) 近年、通信(テルコ)セルラーネットワークによって生成された前例のない量のデータを目撃している。例えば、モバイルデバイスと通信ネットワーク間の接続状態(例えば受信信号強度)を報告するために計測記録(mrs)が生成される。 MRデータは、人間の移動分析、都市計画、交通予測のための屋外モバイルデバイスのローカライズに広く利用されている。隠れマルコフモデル(hmm)のような一階系列モデルを用いた既存の仕事は、低ローカライズエラーの基盤となるモビリティパターンにおける時空間的局所性を捉えようとする。 HMMアプローチは通常、基盤となるモバイルデバイスの安定したモビリティパターンを前提としている。しかし、実際のMRデータセットは、基礎となるモバイルデバイスの混合輸送モードとMRサンプルに関連する位置の不均一な分布により、異種移動パターンを示す。したがって、既存のソリューションはこれらの不均質なモビリティパターンを処理できない。本研究では,マルチタスク学習に基づく深層ニューラルネットワーク(DNN)フレームワークであるPRNet+を提案する。フレームワークの動作を確認するため、PRNet+は特徴抽出モジュールを開発し、異種MRサンプルから局所的、短期的、長期的時空間的局所性を正確に学習する。上海の3つの代表的な地域で収集された8つのデータセットの大規模な評価は、PRNet+が最先端のデータを著しく上回ることを示している。

Recent years have witnessed unprecedented amounts of data generated by telecommunication (Telco) cellular networks. For example, measurement records (MRs) are generated to report the connection states between mobile devices and Telco networks, e.g., received signal strength. MR data have been widely used to localize outdoor mobile devices for human mobility analysis, urban planning, and traffic forecasting. Existing works using first-order sequence models such as the Hidden Markov Model (HMM) attempt to capture spatio-temporal locality in underlying mobility patterns for lower localization errors. The HMM approaches typically assume stable mobility patterns of the underlying mobile devices. Yet real MR datasets exhibit heterogeneous mobility patterns due to mixed transportation modes of the underlying mobile devices and uneven distribution of the positions associated with MR samples. Thus, the existing solutions cannot handle these heterogeneous mobility patterns. we propose a multi-task learning-based deep neural network (DNN) framework, namely PRNet+, to incorporate outdoor position recovery and transportation mode detection. To make sure the framework work, PRNet+ develops a feature extraction module to precisely learn local-, short- and long-term spatio-temporal locality from heterogeneous MR samples. Extensive evaluation on eight datasets collected at three representative areas in Shanghai indicates that PRNet+ greatly outperforms state-of-the-arts.

翻訳日:2021-08-25 17:56:21 公開日:2021-08-24

# (参考訳) 画像なし単一画素セグメンテーション

Image-free single-pixel segmentation ( http://arxiv.org/abs/2108.10617v1 )

ライセンス: CC BY 4.0

Haiyan Liu, Liheng Bian, Jun Zhang

(参考訳) 既存のセグメンテーション技術は、セグメンテーションを実行するために入力として高忠実度画像を必要とする。セグメンテーションの結果は、取得した画像よりもはるかに少ないエッジ情報の大部分を含んでいるため、スループットギャップはハードウェアとソフトウェアの両方の無駄につながる。本稿では,画像のない単一画素セグメンテーション手法について報告する。この技術は、構造化照明と単画素検出を組み合わせて、シーンのセグメンテーション情報を効率よくサンプリングし、圧縮された1次元計測に多重化する。照明パターンは、後続のレコンストラクションニューラルネットワークと共に最適化され、シングルピクセルの測定からセグメンテーションマップを直接推定する。エンドツーエンドのエンコーディング・アンド・デコーディング学習フレームワークは、対応するネットワークで最適化された照明を可能にし、高い獲得効率とセグメンテーション効率を提供する。シミュレーションと実験の結果から、正確なセグメンテーションが2次元の少ない入力データで達成できることが確認された。サンプリング比1%の場合、ディス係数は80%以上、画素精度は96%以上となる。我々は,この画像のないセグメンテーション技術が,UAVや無人航空機など,リアルタイムセンシングを必要とする様々な資源制限されたプラットフォームに広く応用できると考えている。

The existing segmentation techniques require high-fidelity images as input to perform semantic segmentation. Since the segmentation results contain most of edge information that is much less than the acquired images, the throughput gap leads to both hardware and software waste. In this letter, we report an image-free single-pixel segmentation technique. The technique combines structured illumination and single-pixel detection together, to efficiently samples and multiplexes scene's segmentation information into compressed one-dimensional measurements. The illumination patterns are optimized together with the subsequent reconstruction neural network, which directly infers segmentation maps from the single-pixel measurements. The end-to-end encoding-and-decoding learning framework enables optimized illumination with corresponding network, which provides both high acquisition and segmentation efficiency. Both simulation and experimental results validate that accurate segmentation can be achieved using two-order-of-magnitude less input data. When the sampling ratio is 1%, the Dice coefficient reaches above 80% and the pixel accuracy reaches above 96%. We envision that this image-free segmentation technique can be widely applied in various resource-limited platforms such as UAV and unmanned vehicle that require real-time sensing.

翻訳日:2021-08-25 17:23:31 公開日:2021-08-24

# (参考訳) adversarial bertを用いた弱い教師付きクロスプラットフォームティーンエージャー検出

Weakly Supervised Cross-platform Teenager Detection with Adversarial BERT ( http://arxiv.org/abs/2108.10619v1 )

ライセンス: CC BY 4.0

Peiling Yi and Arkaitz Zubiaga

(参考訳) ティーンエイジャー検出は、ソーシャルメディアにおける年齢検出タスクの重要な事例であり、十代のユーザーをネガティブな影響から保護することを目的としている。ティーンエイジャー検出タスクはラベル付きデータの不足に苦しめられ、ソーシャルメディアプラットフォーム間でうまく機能する能力が悪化する。プラットフォーム上でラベル付きデータが利用できない環境でのティーンエイジャー検出のさらなる研究のために,Adversarial BERTに基づく新しいクロスプラットフォームフレームワークを提案する。私たちのフレームワークは、ソースプラットフォームから限られた量のラベル付きインスタンスで動作でき、ターゲットプラットフォームからラベル付きデータがなく、ソースからターゲットのソーシャルメディアに知識を転送できます。我々は4つの公開データセットを実験し、クロスプラットフォームのティーンエイジャー検出タスクにおいて、我々のフレームワークが競合するベースラインモデルを大幅に改善できることを示す結果を得た。

Teenager detection is an important case of the age detection task in social media, which aims to detect teenage users to protect them from negative influences. The teenager detection task suffers from the scarcity of labelled data, which exacerbates the ability to perform well across social media platforms. To further research in teenager detection in settings where no labelled data is available for a platform, we propose a novel cross-platform framework based on Adversarial BERT. Our framework can operate with a limited amount of labelled instances from the source platform and with no labelled data from the target platform, transferring knowledge from the source to the target social media. We experiment on four publicly available datasets, obtaining results demonstrating that our framework can significantly improve over competitive baseline models on the cross-platform teenager detection task.

翻訳日:2021-08-25 17:16:03 公開日:2021-08-24

# (参考訳) モデル埋め込み距離を用いたディープニューラルネットワークの分布外例検出

Out-of-Distribution Example Detection in Deep Neural Networks using Distance to Modelled Embedding ( http://arxiv.org/abs/2108.10673v1 )

ライセンス: CC BY 4.0

Rickard Sj\"ogren and Johan Trygg

(参考訳) 安全クリティカルなシステムにおけるディープラーニングの採用は、モデルがデプロイされた後、ディープニューラルネットワークが理解できないことを理解する必要性を高める。ディープニューラルネットワークの振る舞いは、いわゆるアウト・オブ・ディストリビューションの例では定義されていない。つまり、トレーニングセット以外のディストリビューションからの例です。予測時間中に分布外サンプルを検出する手法がいくつか提案されているが、これらの手法はニューラルネットワークアーキテクチャ、ニューラルネットワークのトレーニング方法、パフォーマンス上のオーバーヘッド、あるいは分布外サンプルの性質が事前に分かっていると仮定するのいずれかを制約している。予測時間における分布外例の検出に使用するDIME(Distance to Modelled Embedding)を提案する。線形超平面として特徴空間に埋め込まれたトレーニングセットを近似することにより、単純で教師なし、高性能で計算効率の良い手法を導出する。 DIMEにより、アーキテクチャやトレーニングを変更することなく、ニューラルネットワークモデルに配布外サンプルの予測時間検出を追加できます。実験では,DIMEをアドオンとして使用することにより,予測中の分布外例を効率よく検出し,より汎用性が高く,計算オーバーヘッドも無視できることを示した。

Adoption of deep learning in safety-critical systems raise the need for understanding what deep neural networks do not understand after models have been deployed. The behaviour of deep neural networks is undefined for so called out-of-distribution examples. That is, examples from another distribution than the training set. Several methodologies to detect out-of-distribution examples during prediction-time have been proposed, but these methodologies constrain either neural network architecture, how the neural network is trained, suffer from performance overhead, or assume that the nature of out-of-distribution examples are known a priori. We present Distance to Modelled Embedding (DIME) that we use to detect out-of-distribution examples during prediction time. By approximating the training set embedding into feature space as a linear hyperplane, we derive a simple, unsupervised, highly performant and computationally efficient method. DIME allows us to add prediction-time detection of out-of-distribution examples to neural network models without altering architecture or training while imposing minimal constraints on when it is applicable. In our experiments, we demonstrate that by using DIME as an add-on after training, we efficiently detect out-of-distribution examples during prediction and match state-of-the-art methods while being more versatile and introducing negligible computational overhead.

翻訳日:2021-08-25 17:06:24 公開日:2021-08-24

# (参考訳) MCUa:乳がん組織像分類のためのマルチレベルコンテキストとダイナミックディープアンサンブル

MCUa: Multi-level Context and Uncertainty aware Dynamic Deep Ensemble for Breast Cancer Histology Image Classification ( http://arxiv.org/abs/2108.10709v1 )

ライセンス: CC BY 4.0

Zakaria Senousy, Mohammed M. Abdelsamea, Mohamed Medhat Gaber, Moloud Abdar, U Rajendra Acharya, Abbas Khosravi, and Saeid Nahavandi

(参考訳) 乳腺組織像の分類は乳がんの早期診断において重要なステップである。乳腺病理診断では,CNN (Convolutional Neural Networks) がDigitalized histology slidesを用いて大きな成功を収めた。しかし, 大規模デジタル化標本の高視認性と文脈情報の欠如により, 組織分類は依然として困難である。本稿では,マルチレベルコンテキストと不確実性認識(mcua)動的ディープラーニングアンサンブルモデルと呼ばれる新しいcnnを提案する。mcuaモデルは複数のマルチレベルコンテキスト認識モデルからなり,画像パッチ間の空間依存性を階層的に学習する。 MCUamodelhasは、不確実な定量化成分を用いて、マルチレベルの文脈情報に対する高感度を利用して、新しいダイナミックアンサンブルモデルを実現し、乳がん組織像データセットで98.11%の精度を達成した。実験の結果, 最先端の組織分類モデルと比較して, 提案法の有効性が高かった。

Breast histology image classification is a crucial step in the early diagnosis of breast cancer. In breast pathological diagnosis, Convolutional Neural Networks (CNNs) have demonstrated great success using digitized histology slides. However, tissue classification is still challenging due to the high visual variability of the large-sized digitized samples and the lack of contextual information. In this paper, we propose a novel CNN, called Multi-level Context and Uncertainty aware (MCUa) dynamic deep learning ensemble model.MCUamodel consists of several multi-level context-aware models to learn the spatial dependency between image patches in a layer-wise fashion. It exploits the high sensitivity to the multi-level contextual information using an uncertainty quantification component to accomplish a novel dynamic ensemble model.MCUamodelhas achieved a high accuracy of 98.11% on a breast cancer histology image dataset. Experimental results show the superior effectiveness of the proposed solution compared to the state-of-the-art histology classification models.

翻訳日:2021-08-25 16:47:41 公開日:2021-08-24

# (参考訳) メディアパイプハンドを用いたペン紡ぐ手の動き解析

Pen Spinning Hand Movement Analysis Using MediaPipe Hands ( http://arxiv.org/abs/2108.10716v1 )

ライセンス: CC BY 4.0

Tung-Lin Wu, Taishi Senda

(参考訳) MediaPipe Hands と OpenCV を用いたペン回転時の手の動きに関するデータ取得に挑戦した。本研究の目的は,ペン回転競技の性能を客観的に評価するシステムを構築することである。競争における実行、滑らかさ、制御の評価は非常に困難であり、しばしば主観性を伴う。そこで本稿では,客観的数値を用いて評価を完全自動化することを目的とした。不確かさは依然としてMediaPipeの骨格認識に存在し、鮮やかな色の背景では認識が難しい傾向にある。しかし,プログラムの彩度や輝度を変化させることで,認識精度を向上させることができた。さらに、明るさの自動検出と調整も可能になった。対象数値を用いてペン回転の評価を体系化する次のステップとして,手の動きを採用した。各フレームにおける手の座標の標準偏差とL2ノルムを計算することにより,手の動きの上下を可視化することができた。手の動きの結果は非常に正確で、目標に向かって大きな一歩だと感じています。将来的には、ペン紡績の仕上がりを完全に自動化していきたいと考えています。

We challenged to get data about hand movement in pen spinning using MediaPipe Hands and OpenCV. The purpose is to create a system that can be used to objectively evaluate the performance of pen spinning competitions. Evaluation of execution, smoothness, and control in competitions are quite difficult and often with subjectivity. Therefore, we aimed to fully automate the process by using objective numerical values for evaluation. Uncertainty still exists in MediaPipe's skeletal recognition, and it tends to be more difficult to recognize in brightly colored backgrounds. However, we could improve the recognition accuracy by changing the saturation and brightness in the program. Furthermore, automatic detection and adjustment of brightness is now possible. As the next step to systematize the evaluation of pen spinning using objective numerical values, we adopted "hand movements". We were able to visualize the ups and downs of the hand movements by calculating the standard deviation and L2 norm of the hand's coordinates in each frame. The results of hand movements are quite accurate, and we feel that it is a big step toward our goal. In the future, we would like to make great efforts to fully automate the grading of pen spinning.

翻訳日:2021-08-25 15:58:37 公開日:2021-08-24

# (参考訳) グラフニューラルネットワーク: 手法,応用,機会

Graph Neural Networks: Methods, Applications, and Opportunities ( http://arxiv.org/abs/2108.10733v1 )

ライセンス: CC BY 4.0

Lilapati Waikhom and Ripon Patgiri

(参考訳) 過去10年ほどで、私たちは機械学習分野を再活性化するディープラーニングを見てきた。コンピュータビジョン、音声認識、自然言語処理などの分野における多くの問題を解決し、最先端のパフォーマンスを実現している。データは一般にこれらの領域のユークリッド空間で表される。他の様々な領域は非ユークリッド空間で、グラフは理想的な表現である。グラフは、様々なエンティティ間の依存関係と相互関係を表現するのに適している。伝統的に、グラフのハンドクラフト機能は、この複雑なデータ表現から様々なタスクに必要な推論を提供することができない。近年,データベースタスクのグラフ化に深層学習の様々な進歩が取り入れられている。本稿では、各学習環境におけるグラフニューラルネットワーク(GNN)の総合的な調査:教師なし、教師なし、半教師なし、自己教師付き学習。グラフベースの学習環境の分類学は、与えられた学習環境に落下するメソッドの論理的区分を備える。各学習タスクに対するアプローチは、理論と経験的観点の両方から分析される。さらに、GNN構築のための一般的なアーキテクチャガイドラインを提供する。さまざまなアプリケーションやベンチマークデータセットも提供されており、GNNの一般適用性に疑問が残るオープンな課題もある。

In the last decade or so, we have witnessed deep learning reinvigorating the machine learning field. It has solved many problems in the domains of computer vision, speech recognition, natural language processing, and various other tasks with state-of-the-art performance. The data is generally represented in the Euclidean space in these domains. Various other domains conform to non-Euclidean space, for which graph is an ideal representation. Graphs are suitable for representing the dependencies and interrelationships between various entities. Traditionally, handcrafted features for graphs are incapable of providing the necessary inference for various tasks from this complex data representation. Recently, there is an emergence of employing various advances in deep learning to graph data-based tasks. This article provides a comprehensive survey of graph neural networks (GNNs) in each learning setting: supervised, unsupervised, semi-supervised, and self-supervised learning. Taxonomy of each graph based learning setting is provided with logical divisions of methods falling in the given learning setting. The approaches for each learning task are analyzed from both theoretical as well as empirical standpoints. Further, we provide general architecture guidelines for building GNNs. Various applications and benchmark datasets are also provided, along with open challenges still plaguing the general applicability of GNNs.

翻訳日:2021-08-25 15:52:41 公開日:2021-08-24

# (参考訳) DeepPanoContext: ホロスティックなシーンコンテキストグラフと関係に基づく最適化によるパノラマ3次元シーン理解

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization ( http://arxiv.org/abs/2108.10743v1 )

ライセンス: CC BY 4.0

Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang

(参考訳) パノラマ画像は、通常の視点画像に比べて、自然にリッチなシーンコンテキスト情報をエンコードする視野がはるかに大きいが、従来のシーン理解手法ではうまく利用されていない。本論文では,パノラマ映像から各物体の3次元空間配置と形状,ポーズ,位置,意味カテゴリーを復元する新しいパノラマ3次元シーン理解手法を提案する。リッチなコンテキスト情報を十分に活用するために,オブジェクトとルームレイアウトの関係を予測するための新しいグラフニューラルネットワークベースのコンテキストモデルと,高度に設計された対象関数をオンザフライで最適化する微分可能な関係ベースの最適化モジュールを設計した。既存のデータが不完全な地上の真実か、過度に単純化されたシーンであることを認識し、部屋のレイアウトや家具配置の多様性に優れた、パノラマ3Dシーン理解のためのリアルな画像品質を備えた新しい合成データセットを提示する。実験により,従来のパノラマシーン理解法よりも,幾何学的精度と物体配置の両面で優れることを示した。コードはhttps://chengzhag.github.io/publication/dpcで入手できる。

Panorama images have a much larger field-of-view thus naturally encode enriched scene context information compared to standard perspective images, which however is not well exploited in the previous scene understanding methods. In this paper, we propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image. In order to fully utilize the rich context information, we design a novel graph neural network based context model to predict the relationship among objects and room layout, and a differentiable relationship-based optimization module to optimize object arrangement with well-designed objective functions on-the-fly. Realizing the existing data are either with incomplete ground truth or overly-simplified scene, we present a new synthetic dataset with good diversity in room layout and furniture placement, and realistic image quality for total panoramic 3D scene understanding. Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement. Code is available at https://chengzhag.github.io/publication/dpc.

翻訳日:2021-08-25 15:51:47 公開日:2021-08-24

# (参考訳) 持続可能な開発目標を達成するための解釈可能なディープラーニングモデル

Interpretable deep-learning models to help achieve the Sustainable Development Goals ( http://arxiv.org/abs/2108.10744v1 )

ライセンス: CC BY 4.0

Ricardo Vinuesa, Beril Sirmacek

(参考訳) 我々は、解釈可能な人工知能(AI)モデルに対する私たちの洞察と、それが倫理的AIシステムの開発の文脈においていかに不可欠であるか、そして持続可能な開発目標(SDG)に準拠したデータ駆動ソリューションについて議論する。本稿では,インダクティブバイアスによって得られた記号モデルなどを通じて,ディープラーニング手法から真に解釈可能なモデルを抽出する可能性を強調し,aiの持続可能な発展を保証する。

We discuss our insights into interpretable artificial-intelligence (AI) models, and how they are essential in the context of developing ethical AI systems, as well as data-driven solutions compliant with the Sustainable Development Goals (SDGs). We highlight the potential of extracting truly-interpretable models from deep-learning methods, for instance via symbolic models obtained through inductive biases, to ensure a sustainable development of AI.

翻訳日:2021-08-25 15:50:42 公開日:2021-08-24

# (参考訳) 人工生成メタデータを用いた表からの関係抽出

Relation Extraction from Tables using Artificially Generated Metadata ( http://arxiv.org/abs/2108.10750v1 )

ライセンス: CC BY 4.0

Gaurav singh, Siffi Singh, Joshua Wong, Amir Saffari

(参考訳) テーブルからの関係抽出(RE)は、列のペア間の関係を識別するタスクである。一般的に、このタスクのREモデルはトレーニングのためにラベル付きテーブルを必要とする。幸いなことに、ラベル付きテーブルは知識グラフ(KG)から人工的に生成することもできるため、手作業によるアノテーションよりもはるかにコストが低い。しかし、これらのテーブルは実際のテーブルと比較して1つの欠点があり、コラムヘッドやキャプションといった関連するメタデータが欠けている。これは、合成テーブルがメタデータを格納しないKGから生成されるためである。残念ながら、メタデータはテーブルからのreに対する強力なシグナルを提供することができる。この問題に対処するため,合成表のメタデータを人工的に生成する手法を提案する。次に、人工メタデータを入力として使用するREモデルを実験する。実験の結果,F1スコアの9\%-45\%が絶対的に2つの表付きデータセットで改善されることがわかった。

Relation Extraction (RE) from tables is the task of identifying relations between pairs of columns. Generally, RE models for this task require labelled tables for training. Luckily, labelled tables can also be generated artificially from a Knowledge Graph (KG), which makes the cost to acquire them much lower in comparison to manual annotations. However, these tables have one drawback compared to real tables, which is that they lack associated metadata, such as column-headers, captions, etc. This is because synthetic tables are created out of KGs that do not store such metadata. Unfortunately, metadata can provide strong signals for RE from tables. To address this issue, we propose methods to artificially create some of this metadata for synthetic tables. We then experiment with a RE model that uses artificial metadata as input. Our empirical results show that this leads to an improvement of 9\%-45\% in F1 score, in absolute terms, over 2 tabular datasets.

翻訳日:2021-08-25 15:46:07 公開日:2021-08-24

# (参考訳) 単語を超えて:潜在ディリクレ割当モデルのコロケーショントークン化

More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models ( http://arxiv.org/abs/2108.10755v1 )

ライセンス: CC BY 4.0

Jin Cheevaprawatdomrong, Alexandra Schofield, Attapol T. Rutherford

(参考訳) 伝統的に、LDA (Latent Dirichlet Allocation) は文書の集合の中で単語を取り込み、単語文書の共起を使ってその潜在トピックを発見する。しかし、中国語やタイ語などの単語境界をマークせずに、言語で最高の結果を達成する方法は不明である。本稿では,PearsonのChi-squared test, t-statistics, Word Pair Encoding (WPE)を用いて,LDAモデルの入力としてトークンを生成する。 Chi-squared、t、WPEトークンーはウィキペディアのテキストで訓練され、複合名詞、固有名詞、複合イベント動詞などのグループ化すべき単語を探す。本稿では,モデルの語彙が異なる設定において,クラスタリング品質を測定するための新しい指標を提案する。このメトリックやその他の確立されたメトリクスに基づいて、マージトークンでトレーニングされたトピックは、これらの未マージモデルよりも明確で一貫性があり、トピックの識別に効果的であるトピックキーを生成する。

Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. However, it is unclear how to achieve the best results for languages without marked word boundaries such as Chinese and Thai. Here, we explore the use of Pearson's chi-squared test, t-statistics, and Word Pair Encoding (WPE) to produce tokens as input to the LDA model. The Chi-squared, t, and WPE tokenizers are trained on Wikipedia text to look for words that should be grouped together, such as compound nouns, proper nouns, and complex event verbs. We propose a new metric for measuring the clustering quality in settings where the vocabularies of the models differ. Based on this metric and other established metrics, we show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.

翻訳日:2021-08-25 15:40:28 公開日:2021-08-24

# (参考訳) DU-GAN:低用量CT復調用デュアルドメインU-Netディスクリミネータを用いた生成対向ネットワーク

DU-GAN: Generative Adversarial Networks with Dual-Domain U-Net Based Discriminators for Low-Dose CT Denoising ( http://arxiv.org/abs/2108.10772v1 )

ライセンス: CC BY 4.0

Zhizhong Huang, Junping Zhang, Yi Zhang, Hongming Shan

(参考訳) LDCTは、CT関連X線による患者の健康リスクから、医療画像分野で大きな注目を集めている。しかし、放射線線量の減少は再構成画像の品質を低下させ、結果として診断性能を損なう。 LDCT画像の品質向上のために,様々なディープラーニング技術が導入されている。 GANをベースとした denoising 法は、通常、追加の分類網、すなわち、追加の分類網を利用する。識別器は、識別された画像と通常のドーズ画像の最も差別的な違いを学習し、それに従って復調モデルを正規化し、大域的な構造や局所的な詳細に焦点を当てることが多い。本稿では,LDCTデノナイジングモデルを改善するために,GANフレームワークにおけるU-Netに基づく差別化手法であるDU-GANを提案し,画像領域と勾配領域の両方におけるデノナイジング画像の局所的差と局所的差を学習する。このようなU-Netベースの識別器の利点は、U-Netの出力を通じて1ピクセル当たりのフィードバックを提供するだけでなく、U-Netの中間層を通した意味レベルでのグローバル構造に焦点を合わせることができることである。画像領域における敵対的トレーニングに加えて、画像勾配領域に別のu-netベースの判別器を適用し、光子飢餓によるアーティファクトの軽減と分断されたct画像のエッジの強化を図る。さらに、カットミックス技術により、u-netベースの判別器の画素単位の出力に対して、放射線科医に信頼度マップを提供し、その不確かさを可視化し、ldctに基づくスクリーニングおよび診断を容易にする。シミュレーションおよび実世界のデータセットに関する広範な実験は、最近公開された方法よりも質的かつ定量的に優れた性能を示している。

LDCT has drawn major attention in the medical imaging field due to the potential health risks of CT-associated X-ray radiation to patients. Reducing the radiation dose, however, decreases the quality of the reconstructed images, which consequently compromises the diagnostic performance. Various deep learning techniques have been introduced to improve the image quality of LDCT images through denoising. GANs-based denoising methods usually leverage an additional classification network, i.e. discriminator, to learn the most discriminate difference between the denoised and normal-dose images and, hence, regularize the denoising model accordingly; it often focuses either on the global structure or local details. To better regularize the LDCT denoising model, this paper proposes a novel method, termed DU-GAN, which leverages U-Net based discriminators in the GANs framework to learn both global and local difference between the denoised and normal-dose images in both image and gradient domains. The merit of such a U-Net based discriminator is that it can not only provide the per-pixel feedback to the denoising network through the outputs of the U-Net but also focus on the global structure in a semantic level through the middle layer of the U-Net. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edge of the denoised CT images. Furthermore, the CutMix technique enables the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map to visualize the uncertainty of the denoised results, facilitating the LDCT-based screening and diagnosis. Extensive experiments on the simulated and real-world datasets demonstrate superior performance over recently published methods both qualitatively and quantitatively.

翻訳日:2021-08-25 15:32:56 公開日:2021-08-24

# (参考訳) greenformers:低ランク近似による変圧器モデルの計算とメモリ効率の向上

Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation ( http://arxiv.org/abs/2108.10808v1 )

ライセンス: CC BY-SA 4.0

Samuel Cahyawijaya

(参考訳) 本稿では,最近注目されている変圧器モデルの低ランク近似手法によるモデル効率を向上させるためのモデル効率法集 greenformers を提案する。ディープラーニングモデルの開発傾向は、より複雑で大きなモデルをもたらす傾向にある。これはより良く正確な予測につながるが、大量のgpuリソースで数週間のトレーニングを必要とするため、結果として得られるモデルはさらにコストがかかる。特に、トランスフォーマーベースのモデルのサイズと計算コストは、2017年のデビュー以来、2021年初頭に約1億のパラメータから約1.6兆のパラメータへと大幅に増加しています。この計算的な空腹モデルもまた環境にかなりのコストをもたらし、カーボンフットプリントの脅威レベルにまで達する。これらのモデルのいくつかは非常に巨大なので、GPUクラスタなしでモデルを実行することさえ不可能です。グリーンフォーマーは低ランク近似アプローチを適用して変圧器モデルのモデル効率を向上させる。具体的には,低ランク変圧器と呼ばれる変圧器モデルの効率を向上させるための低ランク分解手法を提案する。さらに、我々のモデルをLinformerと呼ばれる既存の低ランク分解手法と比較する。この分析に基づき、低ランクトランスフォーマモデルは短系列(<=512)入力データの処理における時間およびメモリ効率を向上させるのに適し、リンフォーマモデルは長系列入力データの処理効率を向上させるのに適している(>>512)。また,低ランクトランスフォーマは,モデルサイズが大幅に削減されるため,デバイス上でのデプロイメントに適していることを示す。さらに、既存のBERTベースモデルにLRTを適用することで、そのようなモデルを開発するための計算、経済、環境コストを、当初のコストの30%以上削減できると見積もっている。

In this thesis, we introduce Greenformers, a collection of model efficiency methods to improve the model efficiency of the recently renowned transformer models with a low-rank approximation approach. The development trend of deep learning models tends to results in a more complex and larger model. Although it leads to a better and more accurate prediction, the resulting model becomes even more costly, as it requires weeks of training with a huge amount of GPU resources. Particularly, the size and computational cost of transformer-based models have increased tremendously since its first debut in 2017 from ~100 million parameters up to ~1.6 trillion parameters in early 2021. This computationally hungry model also incurs a substantial cost to the environment and even reaches an alarming level of carbon footprint. Some of these models are so massive that it is even impossible to run the model without a GPU cluster. Greenformers improve the model efficiency of transformer models by applying low-rank approximation approaches. Specifically, we propose a low-rank factorization approach to improve the efficiency of the transformer model called Low-Rank Transformer. We further compare our model with an existing low-rank factorization approach called Linformer. Based on our analysis, the Low-Rank Transformer model is suitable for improving both the time and memory efficiency in processing short-sequence (<= 512) input data, while the Linformer model is suitable for improving the efficiency in processing long-sequence input data (>= 512). We also show that Low-Rank Transformer is more suitable for on-device deployment, as it significantly reduces the model size. Additionally, we estimate that applying LRT to the existing BERT-base model can significantly reduce the computational, economical, and environmental costs for developing such models by more than 30% of its original costs.

翻訳日:2021-08-25 15:11:35 公開日:2021-08-24

# (参考訳) All-in-Focus Supervision による教師なし奥行きのブリッジ

Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision ( http://arxiv.org/abs/2108.10843v1 )

ライセンス: CC BY 4.0

Ning-Hsu Wang, Ren Wang, Yu-Lun Liu, Yu-Hao Huang, Yu-Lin Chang, Chia-Ping Chen and Kevin Jou

(参考訳) 奥行き推定はコンピュータビジョンにおいて長く続く重要なタスクである。以前の研究のほとんどは、入力画像から深度を推定し、実世界のアプリケーションでは一般的でないオールインフォーカス(AiF)であると仮定している。一方、デフォーカスのぼかしを考慮に入れ、深度推定のための別の手がかりと考える作品もいくつかある。本稿では,焦点位置の異なる画像群(焦点スタックとして知られる)から深度マップだけでなくaif画像も推定する手法を提案する。深度とAiF推定の関係を生かした共有アーキテクチャを設計する。その結果、提案手法は、地上の真理深度で指導的に訓練するか、AiF画像を監視信号として訓練することができる。種々の実験において,本手法は定量的かつ定性的に最先端の手法より優れ,推論時間の効率も高いことを示す。

Depth estimation is a long-lasting yet important task in computer vision. Most of the previous works try to estimate depth from input images and assume images are all-in-focus (AiF), which is less common in real-world applications. On the other hand, a few works take defocus blur into account and consider it as another cue for depth estimation. In this paper, we propose a method to estimate not only a depth map but an AiF image from a set of images with different focus positions (known as a focal stack). We design a shared architecture to exploit the relationship between depth and AiF estimation. As a result, the proposed method can be trained either supervisedly with ground truth depth, or \emph{unsupervisedly} with AiF images as supervisory signals. We show in various experiments that our method outperforms the state-of-the-art methods both quantitatively and qualitatively, and also has higher efficiency in inference time.

翻訳日:2021-08-25 15:10:21 公開日:2021-08-24

# (参考訳) 計算病理学のための四分木画像表現

A QuadTree Image Representation for Computational Pathology ( http://arxiv.org/abs/2108.10873v1 )

ライセンス: CC BY 4.0

Rob Jewsbury, Abhir Bhalerao, Nasir Rajpoot

(参考訳) 計算病理学の分野は、病理画像の重大さからコンピュータビジョンアルゴリズムに多くの課題を呈している。病理組織像は大きく、画像タイルやパッチに分割する必要があるため、現代の畳み込みニューラルネットワーク(cnns)がそれらを処理できる。本稿では,quadtreesを用いて計算病理画像の解釈可能な画像表現を生成する手法と,これらの表現を精度の高い下流分類に利用するパイプラインを提案する。我々の知る限りでは、これは病理画像データにクワッドツリーを使用する最初の試みである。現在広く採用されている組織マスクパッチ抽出法と同程度の精度で, 38%以上少ないデータを用いて, 良好な結果が得られることを示した。

The field of computational pathology presents many challenges for computer vision algorithms due to the sheer size of pathology images. Histopathology images are large and need to be split up into image tiles or patches so modern convolutional neural networks (CNNs) can process them. In this work, we present a method to generate an interpretable image representation of computational pathology images using quadtrees and a pipeline to use these representations for highly accurate downstream classification. To the best of our knowledge, this is the first attempt to use quadtrees for pathology image data. We show it is highly accurate, able to achieve as good results as the currently widely adopted tissue mask patch extraction methods all while using over 38% less data.

翻訳日:2021-08-25 14:43:09 公開日:2021-08-24

# キャリブレーションバックプロジェクション層を用いた教師なし奥行き完了

Unsupervised Depth Completion with Calibrated Backprojection Layers ( http://arxiv.org/abs/2108.10531v1 )

ライセンス: Link先を確認

Alex Wong and Stefano Soatto

(参考訳) 本研究では,画像と疎点雲から深い深さを推定するディープニューラルネットワークアーキテクチャを提案する。カメラの固有キャリブレーションパラメータとともに、lidarまたは他のレンジセンサから得られるビデオストリームと対応する同期スパースポイントクラウドを用いてトレーニングする。推定時には、トレーニングに使用するカメラとは異なるカメラのキャリブレーションが、スパースポイントクラウドと1つの画像とともに、ネットワークへの入力として供給される。キャリブレーションバックプロジェクション層は、キャリブレーションマトリックスと深度特徴記述子を用いて画像の各画素を3次元空間にバックプロジェクションする。得られた3次元位置符号化は、画像記述子と前層出力とを連結してエンコーダの次の層に入力する。デコーダはスキップ接続を利用して深度マップを生成する。結果として得られる校正されたバックプロジェクションネットワーク(kbnet)は、測光再プロジェクションエラーを最小化することで、監視なしで訓練される。 kbnetは一般的な正規化ではなく、トレーニングセットに基づく深さ値の欠落を暗示する。我々はKBNetを公開深度補完ベンチマークでテストし、同じカメラをトレーニングやテストに使用する場合、アートの状態を30%、屋外の8%で上回ります。テストカメラが異なる場合、改善率は62%に達する。 https://github.com/alexklwong/calibrated-backprojection-network.com/。

We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud. It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera. At inference time, the calibration of the camera, which can be different than the one used for training, is fed as an input to the network along with the sparse point cloud and a single image. A Calibrated Backprojection Layer backprojects each pixel in the image to three-dimensional space using the calibration matrix and a depth feature descriptor. The resulting 3D positional encoding is concatenated with the image descriptor and the previous layer output to yield the input to the next layer of the encoder. A decoder, exploiting skip-connections, produces a dense depth map. The resulting Calibrated Backprojection Network, or KBNet, is trained without supervision by minimizing the photometric reprojection error. KBNet imputes missing depth value based on the training set, rather than on generic regularization. We test KBNet on public depth completion benchmarks, where it outperforms the state of the art by 30% indoor and 8% outdoor when the same camera is used for training and testing. When the test camera is different, the improvement reaches 62%. Code available at: https://github.com/alexklwong/calibrated-backprojection-network.

翻訳日:2021-08-25 14:29:46 公開日:2021-08-24

# 微粒化エンティティタイピングのためのPrompt-Learning

Prompt-Learning for Fine-Grained Entity Typing ( http://arxiv.org/abs/2108.10604v1 )

ライセンス: Link先を確認

Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li, Hong-Gee Kim

(参考訳) 特定のタスクに事前学習言語モデル(PLM)をチューニングするための効果的なアプローチとして、プロンプトラーニングが研究者から注目を集めている。 textit{cloze} スタイルの言語は PLM の多義的な知識を刺激し、自然言語推論、感情分類、知識探索といった一連の NLP タスクにおいて有望な結果が得られる。本研究では,細粒度エンティティタイピングにおけるプロンプトラーニングの適用について,全教師あり,少数ショット,ゼロショットのシナリオで検討する。まず、エンティティ指向の言語処理器とテンプレートを構築し、マスク付き言語モデリングを行うことにより、シンプルで効果的な学習パイプラインを構築する。さらに,ゼロショット体制に取り組むために,素早い学習において分布レベルの最適化を行い,エンティティの情報を自動要約する自己教師型戦略を提案する。教師付き、少数ショット、ゼロショット設定下での3つのきめ細かいエンティティタイピングベンチマーク(最大86クラス)の大規模な実験は、特にトレーニングデータが不十分な場合、プロンプト学習手法が微調整ベースラインを大幅に上回っていることを示している。

As an effective approach to tune pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using \textit{cloze}-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on three fine-grained entity typing benchmarks (with up to 86 classes) under fully supervised, few-shot and zero-shot settings show that prompt-learning methods significantly outperform fine-tuning baselines, especially when the training data is insufficient.

翻訳日:2021-08-25 14:29:24 公開日:2021-08-24

# ソーシャルメディアにおける道徳に基づくアサーションと相同性--英語と日本語の文化的比較

Morality-based Assertion and Homophily on Social Media: A Cultural Comparison between English and Japanese Languages ( http://arxiv.org/abs/2108.10643v1 )

ライセンス: Link先を確認

Maneet Singh, Rishemjit Kaur, Akiko Matsuo, S.R.S. Iyengar and Kazutoshi Sasahara

(参考訳) 道徳心理学は道徳的アイデンティティ、評価、感情を扱う分野である。これまでの仕事は道徳的発展と文化の役割に大きく焦点を合わせてきた。言語が文化の本質的な要素であることを知るため,日本語利用者と英語利用者の道徳行動を比較するために,ソーシャルメディアプラットフォームであるTwitterを用いた。ケア、フェアネス、イングループ、オーソリティ、純粋性の5つの基本的道徳的基盤と関連する感情的価値を、英語と日本語のつぶやきと比較する。日本のユーザーのツイートは、フェアネス、イングループ、純粋さが比較的高かった。道徳に関わる感情に関しては、イングランドのツイートは全ての道徳的な側面に対してよりポジティブな感情を表した。ソーシャルメディア上で利用者をつなぐ上での道徳的類似性を考慮して,提案手法を用いて異なる道徳的次元に関するホモフィリーを定量化した。英語のケア、権威、純粋さ、日本語のイングループはTwitter上でホモフィリーを描写している。本研究は,英語および日本語話者の道徳行動に関する文化的差異を明らかにするものである。

Moral psychology is a domain that deals with moral identity, appraisals and emotions. Previous work has greatly focused on moral development and the associated role of culture. Knowing that language is an inherent element of a culture, we used the social media platform Twitter for comparing the moral behaviors of Japanese users with English users. The five basic moral foundations i.e., Care, Fairness, Ingroup, Authority and Purity, along with the associated emotional valence are compared for English and Japanese tweets. The tweets from Japanese users depicted relatively higher Fairness, Ingroup and Purity. As far as emotions related to morality are concerned, the English tweets expressed more positive emotions for all moral dimensions. Considering the role of moral similarities in connecting users on social media, we quantified homophily concerning different moral dimensions using our proposed method. The moral dimensions Care, Authority and Purity for English and Ingroup for Japanese depicted homophily on Twitter. Overall, our study uncovers the underlying cultural differences with respect to moral behavior in English and Japanese speaking users.

翻訳日:2021-08-25 14:29:01 公開日:2021-08-24

# llvip: ローライトビジョンのための可視赤外ペアデータセット

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision ( http://arxiv.org/abs/2108.10831v1 )

ライセンス: Link先を確認

Xinyu Jia, Chuang Zhu, Minzhen Li, Wenqi Tang, Wenli Zhou

(参考訳) 画像の融合や歩行者検出、低照度での画像から画像への変換といった様々な視覚課題において、有効な対象領域の欠如は極めて困難である。この場合、赤外線と可視画像を組み合わせて、詳細な情報と効果的なターゲット領域の両方を提供することができる。本稿では,低照度ビジョンのための可視赤外ペアデータセットLLVIPを提案する。このデータセットには33672枚の画像、または16836枚のペアが含まれており、そのほとんどは非常に暗いシーンで撮影され、すべての画像は時間と空間で厳密に整列している。データセットの歩行者はラベルが付けられています。データセットを他の可視赤外データセットと比較し,画像融合,歩行者検出,画像から画像への変換など,一般的なビジュアルアルゴリズムの性能評価を行った。実験結果は,画像情報に対する融合の相補的効果を示し,超低照度条件下での3つの視覚課題の既存のアルゴリズムの欠如を見出した。 LLVIPデータセットは,低照度アプリケーションにおける画像融合,歩行者検出,画像から画像への変換を促進することによって,コンピュータビジョンのコミュニティに寄与すると考えている。データセットはhttps://bupt-ai-cz.github.io/llvipでリリースされる。

It is very challenging for various visual tasks such as image fusion, pedestrian detection and image-to-image translation in low light conditions due to the loss of effective target areas. In this case, infrared and visible images can be used together to provide both rich detail information and effective target areas. In this paper, we present LLVIP, a visible-infrared paired dataset for low-light vision. This dataset contains 33672 images, or 16836 pairs, most of which were taken at very dark scenes, and all of the images are strictly aligned in time and space. Pedestrians in the dataset are labeled. We compare the dataset with other visible-infrared datasets and evaluate the performance of some popular visual algorithms including image fusion, pedestrian detection and image-to-image translation on the dataset. The experimental results demonstrate the complementary effect of fusion on image information, and find the deficiency of existing algorithms of the three visual tasks in very low-light conditions. We believe the LLVIP dataset will contribute to the community of computer vision by promoting image fusion, pedestrian detection and image-to-image translation in very low-light applications. The dataset is being released in https://bupt-ai-cz.github.io/LLVIP.

翻訳日:2021-08-25 14:28:24 公開日:2021-08-24

# マルチソースドメイン適応のためのメタ自己学習:ベンチマーク

Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark ( http://arxiv.org/abs/2108.10840v1 )

ライセンス: Link先を確認

Shuhao Qiu, Chuang Zhu, Wenli Zhou

(参考訳) 近年、深層学習に基づく手法がコンピュータビジョンの分野で有望な結果を示している。しかし、一般的なディープラーニングモデルは大量のラベル付きデータを必要とするため、収集とラベル付けに手間がかかる。さらに、トレーニングデータとテストデータの間のドメインシフトによって、モデルは破壊される可能性があります。テキスト認識はコンピュータビジョンにおいて広く研究されている分野であり、フォントの多様性と複雑な背景により上記の問題に苦しめられている。本稿では,テキスト認識問題に着目し,これらの問題に対して3つの貢献を行う。まず、500万以上の画像を持つ5つの異なるドメインを含む、テキスト認識のためのマルチソースドメイン適応データセットを収集します。次に,メタ自己学習手法とメタ学習パラダイムを組み合わせたメタ自己学習手法を提案する。第3に,ベンチマークを提供するためにデータセット上で広範な実験を行い,本手法の有効性を示す。私たちの仕事とデータセットのコードは、すぐにhttps://bupt-ai-cz.github.io/meta-selflearning/で入手できる。

In recent years, deep learning-based methods have shown promising results in computer vision area. However, a common deep learning model requires a large amount of labeled data, which is labor-intensive to collect and label. What's more, the model can be ruined due to the domain shift between training data and testing data. Text recognition is a broadly studied field in computer vision and suffers from the same problems noted above due to the diversity of fonts and complicated backgrounds. In this paper, we focus on the text recognition problem and mainly make three contributions toward these problems. First, we collect a multi-source domain adaptation dataset for text recognition, including five different domains with over five million images, which is the first multi-domain text recognition dataset to our best knowledge. Secondly, we propose a new method called Meta Self-Learning, which combines the self-learning method with the meta-learning paradigm and achieves a better recognition result under the scene of multi-domain adaptation. Thirdly, extensive experiments are conducted on the dataset to provide a benchmark and also show the effectiveness of our method. The code of our work and dataset are available soon at https://bupt-ai-cz.github.io/Meta-SelfLearning/.

翻訳日:2021-08-25 14:28:06 公開日:2021-08-24

# リカレントニューラルネットワークトランスデューサにおける露出バイアスの低減

Reducing Exposure Bias in Training Recurrent Neural Network Transducers ( http://arxiv.org/abs/2108.10803v1 )

ライセンス: Link先を確認

Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske

(参考訳) リカレントニューラルネットワークトランスデューサ(rnnts)を典型的最大度基準を用いて訓練すると、予測ネットワークは基底真理ラベル配列のみに基づいて訓練される。これにより、モデルがエラーを含むラベルシーケンスを扱う必要がある場合、露出バイアスとして知られる推論中にミスマッチが発生する。本稿では,自動音声認識(ASR)のためのRNNTモデルの一般化を改善するために,トレーニングにおける露出バイアスを低減するアプローチを検討する。予測ネットワークに対するラベル保存入力摂動を導入する。入力トークンシーケンスは、追加のトークン言語モデルに基づいてスイッチアウトとスケジュールサンプリングを使用して摂動される。 300時間のswitchboardデータセットで実施された実験は、その効果を示している。露光バイアスを低減することで、高性能RNNT ASRモデルの精度をさらに向上し、300時間Switchboardデータセットの最先端結果を得ることができることを示す。

When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, when the model must deal with label sequences containing errors. In this paper we investigate approaches to reducing exposure bias in training to improve the generalization of RNNT models for automatic speech recognition (ASR). A label-preserving input perturbation to the prediction network is introduced. The input token sequences are perturbed using SwitchOut and scheduled sampling based on an additional token language model. Experiments conducted on the 300-hour Switchboard dataset demonstrate their effectiveness. By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.

翻訳日:2021-08-25 14:27:50 公開日:2021-08-24

# 欠落モダリティをもつマルチモーダル学習における最大確率推定

Maximum Likelihood Estimation for Multimodal Learning with Missing Modality ( http://arxiv.org/abs/2108.10513v1 )

ライセンス: Link先を確認

Fei Ma, Xiangxiang Xu, Shao-Lun Huang, Lin Zhang

(参考訳) マルチモーダル学習は多くのシナリオで大きな成功を収めた。一元学習と比較して、異なるモダリティからの情報を効果的に組み合わせて学習タスクの性能を向上させることができる。実際、マルチモーダルデータはセンサーの故障やデータ伝送エラーといった様々な理由により、モダリティを欠いている可能性がある。以前の研究では、モダリティを許容するデータの情報は十分に活用されていない。この問題に対処するために,最大推定値に基づく効率的な手法を提案し,その知識をモダリティ欠落データに組み込む。具体的には、モーダリティ完全データと理論的に最適であるモーダリティ完全データの条件分布を特徴付ける可能性関数を設計する。さらに,ソフトマックス関数の一般化形式を開発し,最大推定値をエンドツーエンドに効果的に実装する。このようなトレーニング戦略は,アルゴリズムの計算可能性を保証する。最後に,実世界のマルチモーダルデータセットに関する一連の実験を行う。トレーニングデータの95%がモダリティを欠いている場合でも,提案手法の有効性を示す。

Multimodal learning has achieved great successes in many scenarios. Compared with unimodal learning, it can effectively combine the information from different modalities to improve the performance of learning tasks. In reality, the multimodal data may have missing modalities due to various reasons, such as sensor failure and data transmission error. In previous works, the information of the modality-missing data has not been well exploited. To address this problem, we propose an efficient approach based on maximum likelihood estimation to incorporate the knowledge in the modality-missing data. Specifically, we design a likelihood function to characterize the conditional distribution of the modality-complete data and the modality-missing data, which is theoretically optimal. Moreover, we develop a generalized form of the softmax function to effectively implement maximum likelihood estimation in an end-to-end manner. Such training strategy guarantees the computability of our algorithm capably. Finally, we conduct a series of experiments on real-world multimodal datasets. Our results demonstrate the effectiveness of the proposed approach, even when 95% of the training data has missing modality.

翻訳日:2021-08-25 14:27:36 公開日:2021-08-24

# より深いグラフニューラルネットワークのトレーニングのためのトリックのバグ:包括的なベンチマーク研究

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study ( http://arxiv.org/abs/2108.10521v1 )

ライセンス: Link先を確認

Tianlong Chen, Kaixiong Zhou, Keyu Duan, Wenqing Zheng, Peihao Wang, Xia Hu, Zhangyang Wang

(参考訳) ディープグラフニューラルネットワーク(GNN)のトレーニングは非常に難しい。勾配の消失や過度な適合といった深層アーキテクチャのトレーニングの標準点に加えて、深層GNNのトレーニングは過度なスムーシングや情報スカッシングなどの影響を受けており、大規模なグラフに対する潜在的なパワーを制限している。様々な種類のスキップ接続、グラフ正規化、ランダムなドロップなど、これらの制限に対処するための多くの取り組みが提案されているが、そのようなアーキテクチャをトレーニングするために必要な「トリック」から、深いGNNアーキテクチャがもたらす利点を解消することは困難である。さらに、公正で一貫した実験的な設定を持つ標準ベンチマークの欠如は、新しいメカニズムの有効性を調べる上でほぼ不可能である。これらの観点から、我々は、深層GNNの「トリック」を評価するための最初の公正かつ再現可能なベンチマークを示す。既存のアプローチを分類し,そのハイパーパラメータ感度を調査し,基本構成を統一する。総合的な評価は、最近の大規模Open Graph Benchmark(OGB)を含む、数十のグラフデータセット上で実施される。相乗的研究に基づいて,複数の代表的なグラフデータセットにわたる深層gcnの新たな最先端結果を達成するための,優れたトレーニングトリックのコンボを見出した。我々は,初期接続,アイデンティティマッピング,グループ正規化,バッチ正規化といった有機的な組み合わせが,大規模データセットにおいて最も理想的な性能を持つことを示す。実験はまた、いくつかのトリックを組み合わせたりスケールアップしたりする際に、いくつかの"サプライズ"を明らかにする。すべてのコードはhttps://github.com/VITA-Group/Deep_GCN_Benchmarkingで入手できる。

Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights in training deep architectures such as vanishing gradients and overfitting, the training of deep GNNs also uniquely suffers from over-smoothing, information squashing, and so on, which limits their potential power on large-scale graphs. Although numerous efforts are proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle the advantages brought by a deep GNN architecture from those "tricks" necessary to train such an architecture. Moreover, the lack of a standardized benchmark with fair and consistent experimental settings poses an almost insurmountable obstacle to gauging the effectiveness of new mechanisms. In view of those, we present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs. We categorize existing approaches, investigate their hyperparameter sensitivity, and unify the basic configuration. Comprehensive evaluations are then conducted on tens of representative graph datasets including the recent large-scale Open Graph Benchmark (OGB), with diverse deep GNN backbones. Based on synergistic studies, we discover the combo of superior training tricks, that lead us to attain the new state-of-the-art results for deep GCNs, across multiple representative graph datasets. We demonstrate that an organic combo of initial connection, identity mapping, group and batch normalization has the most ideal performance on large datasets. Experiments also reveal a number of "surprises" when combining or scaling up some of the tricks. All codes are available at https://github.com/VITA-Group/Deep_GCN_Benchmarking.

翻訳日:2021-08-25 14:27:21 公開日:2021-08-24

# 深層強化学習における効果的な探索のためのエントロピー・アウェアモデル初期化

Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning ( http://arxiv.org/abs/2108.10533v1 )

ライセンス: Link先を確認

Sooyoung Jang and Hyung-Il Kim

(参考訳) 深層学習における探索の促進は重要な問題である。初期エントロピーの影響について検討し,特に初期エントロピーの影響について検討した。 1) 初期エントロピーの低さは学習失敗の確率を増加させ, 2) この初期エントロピーは探索を阻害する低い値に向かって偏っている。本研究から着想を得たエントロピー対応モデル初期化は,効率的な探索のためのシンプルかつ強力な学習戦略である。提案する学習戦略は,学習失敗を著しく軽減し,実験によるパフォーマンス,安定性,学習速度を向上させる。

Encouraging exploration is a critical issue in deep reinforcement learning. We investigate the effect of initial entropy that significantly influences the exploration, especially at the earlier stage. Our main observations are as follows: 1) low initial entropy increases the probability of learning failure, and 2) this initial entropy is biased towards a low value that inhibits exploration. Inspired by the investigations, we devise entropy-aware model initialization, a simple yet powerful learning strategy for effective exploration. We show that the devised learning strategy significantly reduces learning failures and enhances performance, stability, and learning speed through experiments.

翻訳日:2021-08-25 14:26:53 公開日:2021-08-24

# sigmoidF1:マルチラベル分類のための平滑なF1スコアサロゲート損失

sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification ( http://arxiv.org/abs/2108.10566v1 )

ライセンス: Link先を確認

Gabriel B\'en\'edict, Vincent Koops, Daan Odijk, Maarten de Rijke

(参考訳) マルチクラスマルチラベル分類(multiclass multilabel classification)は、予測を通じて複数のラベルをサンプルに帰属させるタスクである。現在のモデルでは、既存の損失関数(シグモイド、クロスエントロピー、ロジスティックなど)を使用できるように、そのマルチラベル設定を複数のバイナリ分類またはマルチクラス分類に縮小する。実験的に、これらの手法は異なるメトリクス(F1スコア、リコール、精度など)で優れたパフォーマンスを達成することが報告されている。理論的には、多ラベル分類の削減は例ごとに異なるラベル数の予測には適せず、根底にある損失は性能指標の遠距離推定である。我々は損失関数sigmoidF1を提案する。これは f1 のスコアの近似であり、 (i) は確率的勾配降下に対して滑らかで扱いやすい、 (ii) 自然にマルチラベル計量に近似し、 (iii) ラベルの傾向とラベル数を推定する。より一般に、任意の混乱行列計量は滑らかな代理で定式化できることを示す。提案した損失関数を,テキストと画像の異なるデータセットで評価し,多ラベル分類評価の複雑さを考慮に入れた。実験では、SigmoidF1損失を最先端の学習前ニューラルネットワークMobileNetV2とDistilBERTにアタッチした分類ヘッドに埋め込んだ。実験の結果,SigmoidF1は4つのデータセットと複数のメトリクスで他の損失関数よりも優れていた。これらの結果から,訓練時間における損失関数としての推論時間指標の有効性と,マルチラベル分類などの非自明な分類問題への可能性を示した。

Multiclass multilabel classification refers to the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of that multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). Empirically, these methods have been reported to achieve good performance on different metrics (F1 score, Recall, Precision, etc.). Theoretically though, the multilabel classification reductions does not accommodate for the prediction of varying numbers of labels per example and the underlying losses are distant estimates of the performance metrics. We propose a loss function, sigmoidF1. It is an approximation of the F1 score that (I) is smooth and tractable for stochastic gradient descent, (II) naturally approximates a multilabel metric, (III) estimates label propensities and label counts. More generally, we show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on different text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. In our experiments, we embed the sigmoidF1 loss in a classification head that is attached to state-of-the-art efficient pretrained neural networks MobileNetV2 and DistilBERT. Our experiments show that sigmoidF1 outperforms other loss functions on four datasets and several metrics. These results show the effectiveness of using inference-time metrics as loss function at training time in general and their potential on non-trivial classification problems like multilabel classification.

翻訳日:2021-08-25 14:26:11 公開日:2021-08-24

# 技術・基本・テキストデータを用いたS&P 500株価予測

S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data ( http://arxiv.org/abs/2108.10826v1 )

ライセンス: Link先を確認

Shan Zhong and David B. Hitchcock

(参考訳) 我々は、株価予測に使用される一般的な予測モデルと新しい予測モデルの両方を要約し、S&P株価予測に技術的指標、基本特性、テキストベースの感情データと組み合わせた。 S&P 500指数方向予測における66.18%の精度と、個々の株式方向予測における62.09%の精度は、ランダムフォレストやLSTMといった異なる機械学習モデルと最先端のアンサンブルモデルを組み合わせて達成された。 2000年1月1日から2019年12月31日までに、現在および元S&P500の大型企業から発行されている518の異なる普通株に関する週毎の歴史的価格、財務報告、およびニュース情報が含まれています。本研究のイノベーションは,金融ニュース項目の感情を分類・推定するために深層言語モデルを活用すること,変数と株価の異なる組み合わせを含む異なるモデルを融合して予測を行うこと,異なる株間でのデータを用いて時系列で機械学習モデルの不十分なデータ問題を克服すること,などである。

We summarized both common and novel predictive models used for stock price prediction and combined them with technical indices, fundamental characteristics and text-based sentiment data to predict S&P stock prices. A 66.18% accuracy in S&P 500 index directional prediction and 62.09% accuracy in individual stock directional prediction was achieved by combining different machine learning models such as Random Forest and LSTM together into state-of-the-art ensemble models. The data we use contains weekly historical prices, finance reports, and text information from news items associated with 518 different common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. Our study's innovation includes utilizing deep language models to categorize and infer financial news item sentiment; fusing different models containing different combinations of variables and stocks to jointly make predictions; and overcoming the insufficient data problem for machine learning models in time series by using data across different stocks.

翻訳日:2021-08-25 14:25:43 公開日:2021-08-24

# 物理学を応用した深層学習:システム信頼性評価のための有望な手法

Physics-Informed Deep Learning: A Promising Technique for System Reliability Assessment ( http://arxiv.org/abs/2108.10828v1 )

ライセンス: Link先を確認

Taotao Zhou, Enrique Lopez Droguett, Ali Mosleh

(参考訳) 信頼性と安全性のコミュニティにおけるシステム診断と健康管理のためのディープラーニングに基づく予測モデルに関する研究が注目されている。しかし,システム信頼性評価における深層学習の利用に関する研究は限られている。本稿では,近年の物理インフォームド深層学習の進歩を利用して,このギャップを埋め,深層学習とシステム信頼性評価の新たなインターフェースを探求することを目的とする。特に,物理学を対象とする深層学習の文脈におけるフレームシステム信頼性評価のアプローチを提示し,不確実性定量化とシステム信頼性評価に組み込んだ計測データのための物理を対象とする生成的逆ネットワークの可能性について考察する。提案手法はデュアルプロセッサ計算システムを含む3つの数値例によって実証された。この結果は,計算課題を緩和し,測定データと数理モデルを組み合わせてシステム信頼性評価を行う物理情報深層学習の可能性を示している。

Considerable research has been devoted to deep learning-based predictive models for system prognostics and health management in the reliability and safety community. However, there is limited study on the utilization of deep learning for system reliability assessment. This paper aims to bridge this gap and explore this new interface between deep learning and system reliability assessment by exploiting the recent advances of physics-informed deep learning. Particularly, we present an approach to frame system reliability assessment in the context of physics-informed deep learning and discuss the potential value of physics-informed generative adversarial networks for the uncertainty quantification and measurement data incorporation in system reliability assessment. The proposed approach is demonstrated by three numerical examples involving a dual-processor computing system. The results indicate the potential value of physics-informed deep learning to alleviate computational challenges and combine measurement data and mathematical models for system reliability assessment.

翻訳日:2021-08-25 14:25:24 公開日:2021-08-24

# imGHUM: 人間の3次元形状とArticulated Poseの生成モデル

imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose ( http://arxiv.org/abs/2108.10842v1 )

ライセンス: Link先を確認

Thiemo Alldieck, Hongyi Xu, Cristian Sminchisescu

(参考訳) 本稿では,3次元形状と構音ポーズの包括的生成モデルであるimghumについて,符号付き距離関数として表現する。従来の作業とは対照的に、全人体をゼロレベルセットの関数として暗黙的にモデル化し、明示的なテンプレートメッシュを使用しない。本稿では,人間のポーズ,形状,意味に関する詳細な暗黙的生成モデルを,最先端のメッシュモデルと同等に学習することのできる,新しいネットワークアーキテクチャと学習パラダイムを提案する。本モデルでは,手の動きや表情を含む調音ポーズ,形状変化の幅広いスペクトル,任意の解像度や空間的位置でクエリできるなど,人間のモデルに望ましい詳細を特徴付ける。さらに,本モデルでは,異なる形状のインスタンス間の対応性を簡単に確立し,従来の暗黙的表現による対処が困難なアプリケーションを実現するために,空間意味論を付加した。広範な実験において,モデル精度と現在の研究課題への適用性を示す。

We present imGHUM, the first holistic generative model of 3D human shape and articulated pose, represented as a signed distance function. In contrast to prior work, we model the full human body implicitly as a function zero-level-set and without the use of an explicit template mesh. We propose a novel network architecture and a learning paradigm, which make it possible to learn a detailed implicit generative model of human pose, shape, and semantics, on par with state-of-the-art mesh-based models. Our model features desired detail for human models, such as articulated pose including hand motion and facial expressions, a broad spectrum of shape variations, and can be queried at arbitrary resolutions and spatial locations. Additionally, our model has attached spatial semantics making it straightforward to establish correspondences between different shape instances, thus enabling applications that are difficult to tackle using classical implicit representations. In extensive experiments, we demonstrate the model accuracy and its applicability to current research problems.

翻訳日:2021-08-25 14:25:08 公開日:2021-08-24

# ReFINE: Random RangE Finder for Network Embedding

REFINE: Random RangE FInder for Network Embedding ( http://arxiv.org/abs/2108.10703v1 )

ライセンス: Link先を確認

Hao Zhu, Piotr Koniusz

(参考訳) ノードの低次元ベクトル表現を学習するネットワーク埋め込み手法は近年,注目されている。行列分解に基づく埋め込みは有効であるが、固有分解ステップのため計算コストがかかることが多い。本稿では,ランダムレンジファインダに基づくネットワーク埋め込み(refine)アルゴリズムを提案する。このアルゴリズムは1スレッドで30秒以内に100万のノード(youtube)に埋め込むことができる。 REFINEはProNEよりも10倍高速で、LINE、DeepWalk、Node2Vec、GraRep、およびHopeといった他のメソッドよりも10-400倍高速である。まず,ネットワーク埋め込みアプローチをスキップグラムモデルとして定式化するが,直交制約により行列分解問題に再構成する。ランダム化 tSVD (truncated SVD) を他の手法として使用する代わりに、ランダム化 QR 分解を用いてノード表現を高速に取得する。さらに,ネットワーク拡張のための簡易だが効率的なスペクトルフィルタを設計し,ノード表現のための高次情報を得る。実験の結果、ノード分類のための異なるサイズ(数千から数百万のノード/エッジ)のデータセットで精錬が非常に効率的であり、優れた性能を享受できることがわかった。

Network embedding approaches have recently attracted considerable interest as they learn low-dimensional vector representations of nodes. Embeddings based on the matrix factorization are effective but they are usually computationally expensive due to the eigen-decomposition step. In this paper, we propose a Random RangE FInder based Network Embedding (REFINE) algorithm, which can perform embedding on one million of nodes (YouTube) within 30 seconds in a single thread. REFINE is 10x faster than ProNE, which is 10-400x faster than other methods such as LINE, DeepWalk, Node2Vec, GraRep, and Hope. Firstly, we formulate our network embedding approach as a skip-gram model, but with an orthogonal constraint, and we reformulate it into the matrix factorization problem. Instead of using randomized tSVD (truncated SVD) as other methods, we employ the Randomized Blocked QR decomposition to obtain the node representation fast. Moreover, we design a simple but efficient spectral filter for network enhancement to obtain higher-order information for node representation. Experimental results prove that REFINE is very efficient on datasets of different sizes (from thousand to million of nodes/edges) for node classification, while enjoying a good performance.

翻訳日:2021-08-25 14:24:21 公開日:2021-08-24

# デジタルヘルスにおけるプライバシー保護型オープンイノベーションのための連合学習

Federated Learning for Privacy-Preserving Open Innovation Future on Digital Health ( http://arxiv.org/abs/2108.10761v1 )

ライセンス: Link先を確認

Guodong Long, Tao Shen, Yue Tan, Leah Gerrard, Allison Clarke, Jing Jiang

(参考訳) プライバシー保護は、人工知能(AI)に関する倫理的な問題である。フェデレーション学習は、データに直接アクセスすることなく、ユーザや組織間で共有モデルを学ぶための、新しい機械学習パラダイムである。プライバシー保護を提供する次世代AIモデルトレーニングフレームワークになり得るため、デジタルヘルスと医療情報学の将来に幅広い影響を及ぼす可能性がある。医療業界におけるオープンなイノベーションフレームワーク、すなわちオープンヘルスの実現は、パートナー組織や研究コミュニティと次世代の共同フレームワークを構築することによって、医療関連組織のイノベーションと創造性を高めることである。特に、このゲームを変えるコラボレーティブフレームワークは、プライバシー保護を伴う多様なデータからの知識共有を提供する。この章では、AIのサポートにより、フェデレーション学習がオープンヘルスエコシステムの開発を可能にする方法について論じる。既存のフェデレーション学習の課題と解決策について論じる。

Privacy protection is an ethical issue with broad concern in Artificial Intelligence (AI). Federated learning is a new machine learning paradigm to learn a shared model across users or organisations without direct access to the data. It has great potential to be the next-general AI model training framework that offers privacy protection and therefore has broad implications for the future of digital health and healthcare informatics. Implementing an open innovation framework in the healthcare industry, namely open health, is to enhance innovation and creative capability of health-related organisations by building a next-generation collaborative framework with partner organisations and the research community. In particular, this game-changing collaborative framework offers knowledge sharing from diverse data with a privacy-preserving. This chapter will discuss how federated learning can enable the development of an open health ecosystem with the support of AI. Existing challenges and solutions for federated learning will be discussed.

翻訳日:2021-08-25 14:23:58 公開日:2021-08-24

# 階段の特徴:階層構造が深層学習をいかに導くか

The staircase property: How hierarchical structure can guide deep learning ( http://arxiv.org/abs/2108.10573v1 )

ライセンス: Link先を確認

Emmanuel Abbe, Enric Boix-Adsera, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj

(参考訳) 本稿では,深層ニューラルネットワークが階層的に学習できるデータ分布の構造特性を明らかにする。ブール超キューブ上の関数の「階段」特性を定義し、高階フーリエ係数がチェーンの増加に伴う低階フーリエ係数から到達可能であることを仮定する。この性質を満たす関数は、正規ニューラルネットワークの層状確率座標降下(英語版)(layerwise stochastic coordinate descend)を用いて多項式時間で学習できることを証明している。解析により,そのような階段関数やニューラルネットワークに対して,勾配に基づくアルゴリズムは,ネットワーク深度に沿った低次特徴を優雅に組み合わせることで,高次特徴を学習することを示した。さらに,より標準的なResNetアーキテクチャにより,階段関数が学習可能であることを示す実験により,理論的結果を裏付ける。 sqやpacアルゴリズムをエミュレートできる一般的な多項式サイズネットワークとは対照的に、この理論と実験の結果は、階段特性が通常のネットワーク上での勾配ベース学習の能力を理解する上で役割を担っているという事実を裏付けている。

This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which posits that high-order Fourier coefficients are reachable from lower-order Fourier coefficients along increasing chains. We prove that functions satisfying this property can be learned in polynomial time using layerwise stochastic coordinate descent on regular neural networks -- a class of network architectures and initializations that have homogeneity properties. Our analysis shows that for such staircase functions and neural networks, the gradient-based algorithm learns high-level features by greedily combining lower-level features along the depth of the network. We further back our theoretical results with experiments showing that staircase functions are also learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the fact that staircase properties have a role to play in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any SQ or PAC algorithms as recently shown.

翻訳日:2021-08-25 14:23:44 公開日:2021-08-24

# GrADE:時間依存型非線形偏微分方程式に対するグラフベースデータ駆動解法

GrADE: A graph based data-driven solver for time-dependent nonlinear partial differential equations ( http://arxiv.org/abs/2108.10639v1 )

ライセンス: Link先を確認

Yash Kumar and Souvik Chakraborty

(参考訳) 物理世界は物理学の法則によって支配され、しばしば非線形偏微分方程式(PDE)の形で表される。残念ながら、PDEの解は非自明であり、しばしばかなりの計算時間を必要とする。近年の人工知能と機械学習の分野での進歩により、ニューラルネットワークを用いたPDEのソリューションが、大きな潜在能力を持つドメインとして登場した。しかし、この分野の開発のほとんどは、完全に接続されたニューラルネットワーク(FNN)または畳み込みニューラルネットワーク(CNN)に基づいている。 FNNは計算的に非効率であり、ネットワークパラメータの数が巨大になる可能性があるが、CNNは通常のグリッドと単純なドメインを必要とする。本稿では,時間依存非線形pdesを解くためのグラフ注意微分方程式(グレード)と呼ばれる新しい枠組みを提案する。提案するアプローチは、FNN、グラフニューラルネットワークと、最近開発されたNeural ODEフレームワークを結合する。第一の考え方は、空間領域をモデル化するためのグラフニューラルネットワークと、時間領域をモデル化するためのニューラルODEである。注意機構は重要な入力/特徴を特定し、より多くの重み付けを割り当て、提案するフレームワークの性能を高める。一方、ニューラルODEはメモリコストを一定に抑え、速度の数値的精度の取引を可能にする。また,提案するアーキテクチャをより少ない時間で精度良く訓練するための効果的な手法として,深度改善を提案する。提案手法の有効性を1次元および2次元バーガーズ方程式を用いて示す。その結果、PDEのモデリングにおける提案フレームワークの能力と、再トレーニングを必要とせず、より大きなドメインへの拡張性を示した。

The physical world is governed by the laws of physics, often represented in form of nonlinear partial differential equations (PDEs). Unfortunately, solution of PDEs is non-trivial and often involves significant computational time. With recent developments in the field of artificial intelligence and machine learning, the solution of PDEs using neural network has emerged as a domain with huge potential. However, most of the developments in this field are based on either fully connected neural networks (FNN) or convolutional neural networks (CNN). While FNN is computationally inefficient as the number of network parameters can be potentially huge, CNN necessitates regular grid and simpler domain. In this work, we propose a novel framework referred to as the Graph Attention Differential Equation (GrADE) for solving time dependent nonlinear PDEs. The proposed approach couples FNN, graph neural network, and recently developed Neural ODE framework. The primary idea is to use graph neural network for modeling the spatial domain, and Neural ODE for modeling the temporal domain. The attention mechanism identifies important inputs/features and assign more weightage to the same; this enhances the performance of the proposed framework. Neural ODE, on the other hand, results in constant memory cost and allows trading of numerical precision for speed. We also propose depth refinement as an effective technique for training the proposed architecture in lesser time with better accuracy. The effectiveness of the proposed framework is illustrated using 1D and 2D Burgers' equations. Results obtained illustrate the capability of the proposed framework in modeling PDE and its scalability to larger domains without the need for retraining.

翻訳日:2021-08-25 14:23:27 公開日:2021-08-24

# マルチスケール進行統計モデルを用いたロスレス画像圧縮

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model ( http://arxiv.org/abs/2108.10551v1 )

ライセンス: Link先を確認

Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela

(参考訳) ロスレス画像圧縮は、情報損失を許さない場合、画像記憶と伝送にとって重要な技術である。ディープラーニング技術の急速な発展に伴い、この分野ではより高い圧縮率を達成するためにディープニューラルネットワークが使用されている。画素単位の自己回帰統計モデルに基づく手法は優れた性能を示した。しかし、シーケンシャルな処理方法は、これらの方法が実際に使用されるのを防ぐ。近年,この制限に対処するために,マルチスケール自己回帰モデルが提案されている。マルチスケールアプローチは並列コンピューティングシステムを効率的に利用し、実用的なシステムを構築することができる。しかし、これらの手法は速度と引き換えに圧縮性能を犠牲にする。本稿では,画素ワイド・アプローチとマルチスケール・アプローチを利用するマルチスケール・プログレッシブ・統計モデルを提案する。我々は,画素の処理順序を容易に調整できるフレキシブルな機構を開発した。提案手法は,推定速度を劇的に低下させることなく,2つの大きなベンチマークデータセットに対して,最先端のロスレス画像圧縮法を著しく向上させる。

Lossless image compression is an important technique for image storage and transmission when information loss is not allowed. With the fast development of deep learning techniques, deep neural networks have been used in this field to achieve a higher compression rate. Methods based on pixel-wise autoregressive statistical models have shown good performance. However, the sequential processing way prevents these methods to be used in practice. Recently, multi-scale autoregressive models have been proposed to address this limitation. Multi-scale approaches can use parallel computing systems efficiently and build practical systems. Nevertheless, these approaches sacrifice compression performance in exchange for speed. In this paper, we propose a multi-scale progressive statistical model that takes advantage of the pixel-wise approach and the multi-scale approach. We developed a flexible mechanism where the processing order of the pixels can be adjusted easily. Our proposed method outperforms the state-of-the-art lossless image compression methods on two large benchmark datasets by a significant margin without degrading the inference speed dramatically.

翻訳日:2021-08-25 14:22:52 公開日:2021-08-24

# 多言語モデルの方が優れているか? トランスフォーマーによるチェコ感覚の向上

Are the Multilingual Models Better? Improving Czech Sentiment with Transformers ( http://arxiv.org/abs/2108.10640v1 )

ライセンス: Link先を確認

Pavel P\v{r}ib\'a\v{n}, Josef Steinberger

(参考訳) 本稿では,トランスフォーマーモデルとその多言語バージョンを用いたチェコ語感情の向上を目指す。より具体的には、3つの感情極性データセットに基づくチェコ語の極性検出の課題について検討する。 5つの多言語モデルと3つの単言語モデルを用いて微調整および実験を行った。単言語モデルと多言語モデルのパフォーマンスを比較し、繰り返しニューラルネットワークに基づく従来のアプローチと比較する。さらに、多言語モデルとその知識を英語からチェコ語へ(そしてその逆も)ゼロショットのクロスリンガル分類で伝達する能力をテストする。実験により,巨大多言語モデルが単言語モデルの性能を克服できることを示した。彼らはまた、訓練データなしで他の言語の極性を検出することができ、最先端のモノリンガル訓練モデルと比較してパフォーマンスは4.4 %以下である。さらに,3つのデータセットについて,最新の結果を得た。

In this paper, we aim at improving Czech sentiment with transformer-based models and their multilingual versions. More concretely, we study the task of polarity detection for the Czech language on three sentiment polarity datasets. We fine-tune and perform experiments with five multilingual and three monolingual models. We compare the monolingual and multilingual models' performance, including comparison with the older approach based on recurrent neural networks. Furthermore, we test the multilingual models and their ability to transfer knowledge from English to Czech (and vice versa) with zero-shot cross-lingual classification. Our experiments show that the huge multilingual models can overcome the performance of the monolingual models. They are also able to detect polarity in another language without any training data, with performance not worse than 4.4 % compared to state-of-the-art monolingual trained models. Moreover, we achieved new state-of-the-art results on all three datasets.

翻訳日:2021-08-25 14:22:24 公開日:2021-08-24

# インテント検出のための密度ベース動的カリキュラム学習

Density-Based Dynamic Curriculum Learning for Intent Detection ( http://arxiv.org/abs/2108.10674v1 )

ライセンス: Link先を確認

Yantao Gong, Cao Liu, Jiazhen Yuan, Fan Yang, Xunliang Cai, Guanglu Wan, Jiansong Chen, Ruiyao Niu and Houfeng Wang

(参考訳) 事前訓練された言語モデルは、意図検出タスクにおいて顕著なパフォーマンスを達成した。しかしながら、各サンプルに同じ重みを割り当てることによって、単純なサンプルの過剰フィットと複雑なサンプルの学習の失敗に苦しむことになる。この問題に対処するために,密度に基づく動的カリキュラム学習モデルを提案する。本モデルは固有ベクトルの密度に応じてサンプルの難易度を定義する。このようにして、全てのサンプルの固有ベクトルの全体分布を同時に活用する。次に,様々な難易度のサンプルに注意を払い,学習過程におけるサンプルの割合を変化させる動的カリキュラム学習戦略を適用した。以上の操作を通じて、単純なサンプルを十分に訓練し、複雑なサンプルを増強する。 3つのオープンデータセットの実験により、提案した密度に基づくアルゴリズムが、単純かつ複雑なサンプルを著しく区別できることが確認された。さらに,本モデルでは,強いベースラインよりも明らかに改善されている。

Pre-trained language models have achieved noticeable performance on the intent detection task. However, due to assigning an identical weight to each sample, they suffer from the overfitting of simple samples and the failure to learn complex samples well. To handle this problem, we propose a density-based dynamic curriculum learning model. Our model defines the sample's difficulty level according to their eigenvectors' density. In this way, we exploit the overall distribution of all samples' eigenvectors simultaneously. Then we apply a dynamic curriculum learning strategy, which pays distinct attention to samples of various difficulty levels and alters the proportion of samples during the training process. Through the above operation, simple samples are well-trained, and complex samples are enhanced. Experiments on three open datasets verify that the proposed density-based algorithm can distinguish simple and complex samples significantly. Besides, our model obtains obvious improvement over the strong baselines.

翻訳日:2021-08-25 14:22:10 公開日:2021-08-24

# 微細診断システムを用いた小児呼吸器疾患の同定

Identification of Pediatric Respiratory Diseases Using Fine-grained Diagnosis System ( http://arxiv.org/abs/2108.10818v1 )

ライセンス: Link先を確認

Gang Yu, Zhongzhi Yu, Yemin Shi, Yingshuo Wang, Xiaoqing Liu, Zheming Li, Yonggen Zhao, Fenglei Sun, Yizhou Yu, Qiang Shu

(参考訳) 喘息、気管支炎、肺炎、上気道感染症(RTI)などの呼吸器疾患は、クリニックで最も一般的な疾患である。これらの疾患の症状の類似性は、患者の到着時に迅速に診断することを妨げる。小児科では, 症状の表現能力が限られているため, 正確な診断は困難である。これは、医療画像装置の欠如と医師の限られた経験が、類似した疾患の区別の困難さをさらに増す、一次病院で悪化する。本報告では, 小児の細粒度診断補助システムについて, 入院時に臨床ノートのみを用いて, 迅速かつ正確な診断を行うように提案する。提案システムは,検査結果の構造化段階と疾患同定段階の2段階からなる。第1段階は臨床ノートから関連する数値を抽出して検査結果を構造化し、疾患識別段階はテキスト形式の臨床記録および第1段階から得られた構造化データに基づく診断を提供する。適応的特徴注入や多モード注意融合といった手法を導入し, ヒューズデータとテキストデータを融合する, 新たな深層学習アルゴリズムを開発した。深層学習モデルのトレーニングには12000人以上の呼吸器疾患患者の臨床ノートを使用し,訓練モデルの性能評価には約1800人の非重複患者からの臨床ノートを使用した。肺炎、RTI、気管支炎、喘息の平均精度(AP)はそれぞれ0.878、0.857、0.714、0.825であり、平均AP(mAP)は0.819である。

Respiratory diseases, including asthma, bronchitis, pneumonia, and upper respiratory tract infection (RTI), are among the most common diseases in clinics. The similarities among the symptoms of these diseases precludes prompt diagnosis upon the patients' arrival. In pediatrics, the patients' limited ability in expressing their situation makes precise diagnosis even harder. This becomes worse in primary hospitals, where the lack of medical imaging devices and the doctors' limited experience further increase the difficulty of distinguishing among similar diseases. In this paper, a pediatric fine-grained diagnosis-assistant system is proposed to provide prompt and precise diagnosis using solely clinical notes upon admission, which would assist clinicians without changing the diagnostic process. The proposed system consists of two stages: a test result structuralization stage and a disease identification stage. The first stage structuralizes test results by extracting relevant numerical values from clinical notes, and the disease identification stage provides a diagnosis based on text-form clinical notes and the structured data obtained from the first stage. A novel deep learning algorithm was developed for the disease identification stage, where techniques including adaptive feature infusion and multi-modal attentive fusion were introduced to fuse structured and text data together. Clinical notes from over 12000 patients with respiratory diseases were used to train a deep learning model, and clinical notes from a non-overlapping set of about 1800 patients were used to evaluate the performance of the trained model. The average precisions (AP) for pneumonia, RTI, bronchitis and asthma are 0.878, 0.857, 0.714, and 0.825, respectively, achieving a mean AP (mAP) of 0.819.

翻訳日:2021-08-25 14:21:42 公開日:2021-08-24

# ParamCrop:ビデオコントラスト学習のためのパラメトリックキュービッククロップ

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning ( http://arxiv.org/abs/2108.10501v1 )

ライセンス: Link先を確認

Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Ji, Nong Sang

(参考訳) コントラスト学習の中心的な考え方は、異なるインスタンスを区別し、同じインスタンスの異なるビューを同じ表現を共有するように強制することである。自明な解を避けるために、拡張は異なるビューを生成する上で重要な役割を担い、その中ではランダムなトリミングがモデルが強く一般化された表現を学ぶのに有効であることが示される。一般的なランダムな作物操作は、トレーニングプロセスに沿って統計的に一致した2つのビューの違いを保っている。本研究では,学習者表現の質を高めるために,学習過程に沿った2つの拡張ビュー間の差異を適応的に制御する手法を提案する。具体的には、3次元アフィン変換によりビデオから3次元立方体を自動的に収穫する、ビデオコントラスト学習のためのパラメトリック立方体収穫操作であるParamCropを提案する。 ParamCropは、対向目的を用いてビデオバックボーンと同時に訓練され、データから最適な収穫戦略を学ぶ。 2つの拡張ビュー間の中心距離とIoUは、ParamCropによって適応的に制御され、トレーニング過程に沿った相違点の学習は、強い表現を学ぶ上で有益であることを示す。広範囲にわたるアブレーション研究は、複数のコントラスト学習フレームワークとビデオバックボーンに対するParamCropの有効性を示す。 ParamCropでは,HMDB51およびUCF101データセットの最先端性能を改善した。

The central idea of contrastive learning is to discriminate between different instances and force different views of the same instance to share the same representation. To avoid trivial solutions, augmentation plays an important role in generating different views, among which random cropping is shown to be effective for the model to learn a strong and generalized representation. Commonly used random crop operation keeps the difference between two views statistically consistent along the training process. In this work, we challenge this convention by showing that adaptively controlling the disparity between two augmented views along the training process enhances the quality of the learnt representation. Specifically, we present a parametric cubic cropping operation, ParamCrop, for video contrastive learning, which automatically crops a 3D cubic from the video by differentiable 3D affine transformations. ParamCrop is trained simultaneously with the video backbone using an adversarial objective and learns an optimal cropping strategy from the data. The visualizations show that the center distance and the IoU between two augmented views are adaptively controlled by ParamCrop and the learned change in the disparity along the training process is beneficial to learning a strong representation. Extensive ablation studies demonstrate the effectiveness of the proposed ParamCrop on multiple contrastive learning frameworks and video backbones. With ParamCrop, we improve the state-of-the-art performance on both HMDB51 and UCF101 datasets.

翻訳日:2021-08-25 14:20:28 公開日:2021-08-24

# ShapeConv: 室内RGB-Dセマンティックセグメンテーションのための形状認識型畳み込み層

ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation ( http://arxiv.org/abs/2108.10528v1 )

ライセンス: Link先を確認

Jinming Cao, Hanchao Leng, Dani Lischinski, Danny Cohen-Or, Changhe Tu, Yangyan Li

(参考訳) RGB-Dセマンティックセグメンテーションはここ数年で注目を集めている。既存の方法は、主にRGBと深度の特徴を消費するために同質の畳み込み演算子を使用し、固有の違いを無視している。実際、RGB値は投影された画像空間の測光的外観特性を捉え、深度特徴は局所幾何学の形状とそれの基底(場所)をより広い文脈でエンコードする。ベースと比較すると、形状はおそらくより固有であり、セマンティクスとより強く結びついているので、セグメンテーションの精度にとってより重要となる。この観察に触発された形状認識畳み込み層(shapeconv)を用いて深度特徴を処理し,まず深さ特徴を形状成分と基底成分に分解し,次に学習可能な重みを2つ導入してそれぞれ独立に連携させ,最終的にこれら2成分の再重み付け結合に畳み込みを適用する。 shapeconvはモデルに依存しず、ほとんどのcnnに簡単に統合でき、セマンティクスセグメンテーションのためにバニラ畳み込み層を置き換えることができる。屋内RGB-Dセマンティックセマンティックセグメンテーションベンチマーク(NYU-Dv2(-13,-40)、SUN RGB-D、SID)の大規模な実験は、5つのポピュラーなアーキテクチャで採用する際のShapeConvの有効性を実証している。さらに、計算やメモリ増加を推論フェーズに導入することなく、shapeconvによるcnnの性能を向上させる。理由は、ShapeConvにおける形状と基成分のバランスをとる学習ウェイトが、推論フェーズにおいて定数となり、次の畳み込みに融合し、バニラ畳み込み層を持つものと同一のネットワークとなるからである。

RGB-D semantic segmentation has attracted increasing attention over the past few years. Existing methods mostly employ homogeneous convolution operators to consume the RGB and depth features, ignoring their intrinsic differences. In fact, the RGB values capture the photometric appearance properties in the projected image space, while the depth feature encodes both the shape of a local geometry as well as the base (whereabout) of it in a larger context. Compared with the base, the shape probably is more inherent and has a stronger connection to the semantics, and thus is more critical for segmentation accuracy. Inspired by this observation, we introduce a Shape-aware Convolutional layer (ShapeConv) for processing the depth feature, where the depth feature is firstly decomposed into a shape-component and a base-component, next two learnable weights are introduced to cooperate with them independently, and finally a convolution is applied on the re-weighted combination of these two components. ShapeConv is model-agnostic and can be easily integrated into most CNNs to replace vanilla convolutional layers for semantic segmentation. Extensive experiments on three challenging indoor RGB-D semantic segmentation benchmarks, i.e., NYU-Dv2(-13,-40), SUN RGB-D, and SID, demonstrate the effectiveness of our ShapeConv when employing it over five popular architectures. Moreover, the performance of CNNs with ShapeConv is boosted without introducing any computation and memory increase in the inference phase. The reason is that the learnt weights for balancing the importance between the shape and base components in ShapeConv become constants in the inference phase, and thus can be fused into the following convolution, resulting in a network that is identical to one with vanilla convolutional layers.

翻訳日:2021-08-25 14:20:05 公開日:2021-08-24

# 人物再識別のメリットを享受する人物検索

Making Person Search Enjoy the Merits of Person Re-identification ( http://arxiv.org/abs/2108.10536v1 )

ライセンス: Link先を確認

Chuang Liu, Hua Yang, Qin Zhou and Shibao Zheng

(参考訳) 人物検索は、人物再識別(Re-ID)の拡張タスクである。しかし,既存の1段階の人物探索作業の多くは,人物検出とRe-IDの統合による1段階の人物探索性能向上のために,既存の高度なRe-IDモデルをどのように活用するかを研究していない。この問題に対処するため,教師誘導型分散ネットワーク(TDN)という,より高速で強力なワンステップの人物検索フレームワークを提案し,既存のRe-ID研究のメリットを享受する。提案するtdnは,高度な人物再識別知識を人物検索モデルに転送することにより,人物検索性能を大幅に向上させることができる。提案するtdnでは,リid教師モデルからワンステップパーソンサーチモデルへの知識伝達を改善するため,2つのサブタスクを部分的に分離して,強力なワンステップパーソンサーチベースフレームワークを設計する。さらに,Re-IDモデルとワンステップの人物探索モデル間の入力形式の違いによるスケールギャップを橋渡しする知識伝達ブリッジモジュールを提案する。テスト中は、パノラマ画像の文脈情報を利用してより良い検索を行うためのコンテキストパーソンのランク付け戦略をさらに提案する。 2つの公開人検索データセットの実験により,提案手法の有効性が示された。

Person search is an extended task of person re-identification (Re-ID). However, most existing one-step person search works have not studied how to employ existing advanced Re-ID models to boost the one-step person search performance due to the integration of person detection and Re-ID. To address this issue, we propose a faster and stronger one-step person search framework, the Teacher-guided Disentangling Networks (TDN), to make the one-step person search enjoy the merits of the existing Re-ID researches. The proposed TDN can significantly boost the person search performance by transferring the advanced person Re-ID knowledge to the person search model. In the proposed TDN, for better knowledge transfer from the Re-ID teacher model to the one-step person search model, we design a strong one-step person search base framework by partially disentangling the two subtasks. Besides, we propose a Knowledge Transfer Bridge module to bridge the scale gap caused by different input formats between the Re-ID model and one-step person search model. During testing, we further propose the Ranking with Context Persons strategy to exploit the context information in panoramic images for better retrieval. Experiments on two public person search datasets demonstrate the favorable performance of the proposed method.

翻訳日:2021-08-25 14:19:28 公開日:2021-08-24

# StyleAugment: 事前定義されたテクスチャのないスタイル拡張によるテクスチャ非バイアス表現の学習

StyleAugment: Learning Texture De-biased Representations by Style Augmentation without Pre-defined Textures ( http://arxiv.org/abs/2108.10549v1 )

ライセンス: Link先を確認

Sanghyuk Chun, Song Park

(参考訳) 最近の強力な視覚分類器はテクスチャに偏り、形状情報はモデルによって見過ごされている。 Stylized ImageNetと呼ばれるアートスタイルのトランスファー手法を用いて、トレーニング画像を増強する簡単な試みは、テクスチャバイアスを低減することができる。しかし、Stylized ImageNetアプローチには、忠実度と多様性の2つの欠点がある。まず、生成した画像は、自然画像や芸術絵画に見合う重要な意味的ギャップのため、画質が低い。また、Stylized ImageNetトレーニングサンプルはトレーニング前に事前計算されるため、各サンプルの多様性が欠如している。ミニバッチからスタイルを拡張したStyleAugmentを提案する。 styleaugmentは事前定義されたスタイル参照に依存しないが、参照のためのmini-batch内の自然画像によってオンザフライで拡張イメージを生成する。そのため、StyleAugmentでは、各画像に対する豊富なコンバウンディングキューをオンザフライで観察すると同時に、拡張されたイメージは芸術的なスタイルの転送画像よりもリアルである。我々は,画像NetデータセットにおけるStyleAugmentの有効性を,テクスチャデバイアス精度,汚濁堅牢性,自然対向サンプル,閉塞堅牢性などのロバスト性ベンチマークを用いて検証した。 StyleAugmentは従来の教師なしデバイアス法や最先端データ拡張法よりも優れた一般化性能を示す。

Recent powerful vision classifiers are biased towards textures, while shape information is overlooked by the models. A simple attempt by augmenting training images using the artistic style transfer method, called Stylized ImageNet, can reduce the texture bias. However, Stylized ImageNet approach has two drawbacks in fidelity and diversity. First, the generated images show low image quality due to the significant semantic gap betweeen natural images and artistic paintings. Also, Stylized ImageNet training samples are pre-computed before training, resulting in showing the lack of diversity for each sample. We propose a StyleAugment by augmenting styles from the mini-batch. StyleAugment does not rely on the pre-defined style references, but generates augmented images on-the-fly by natural images in the mini-batch for the references. Hence, StyleAugment let the model observe abundant confounding cues for each image by on-the-fly the augmentation strategy, while the augmented images are more realistic than artistic style transferred images. We validate the effectiveness of StyleAugment in the ImageNet dataset with robustness benchmarks, such as texture de-biased accuracy, corruption robustness, natural adversarial samples, and occlusion robustness. StyleAugment shows better generalization performances than previous unsupervised de-biasing methods and state-of-the-art data augmentation methods in our experiments.

翻訳日:2021-08-25 14:19:06 公開日:2021-08-24

# 画像キャプションと視覚的質問応答のための自動パシングネットワーク

Auto-Parsing Network for Image Captioning and Visual Question Answering ( http://arxiv.org/abs/2108.10568v1 )

ライセンス: Link先を確認

Xu Yang and Chongyang Gao and Hanwang Zhang and Jianfei Cai

(参考訳) 本稿では,トランスフォーマーに基づく視覚言語システムの有効性を向上させるために,入力データの隠れ木構造を発見し,活用するための自動パーシングネットワークを提案する。具体的には、各自己注意層における注意操作によってパラメータ化された確率的グラフモデル(PGM)を課し、スパース仮定を組み込む。我々はこのPGMを用いて、入力シーケンスをいくつかのクラスタにソフトに分割し、各クラスタを内部エンティティの親として扱う。これらの制約された自己アテンション層を積み重ねることで、下位層のクラスタは新しいシーケンスに構成され、上位層のPGMはこのシーケンスをさらにセグメンテーションする。反復的に、スパースツリーを暗黙的に解析することができ、このツリーの階層的な知識は変換された埋め込みに組み込まれ、ターゲットの視覚言語タスクの解決に使用できる。具体的には、我々のAPNがTransformerベースのネットワークを2つの主要な視覚言語タスクであるCaptioningとVisual Question Answeringで強化できることを示します。また、PGM確率に基づく解析アルゴリズムを開発し、推論中に入力の隠れ構造が何であるかを知ることができる。

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems. Specifically, we impose a Probabilistic Graphical Model (PGM) parameterized by the attention operations on each self-attention layer to incorporate sparse assumption. We use this PGM to softly segment an input sequence into a few clusters where each cluster can be treated as the parent of the inside entities. By stacking these PGM constrained self-attention layers, the clusters in a lower layer compose into a new sequence, and the PGM in a higher layer will further segment this sequence. Iteratively, a sparse tree can be implicitly parsed, and this tree's hierarchical knowledge is incorporated into the transformed embeddings, which can be used for solving the target vision-language tasks. Specifically, we showcase that our APN can strengthen Transformer based networks in two major vision-language tasks: Captioning and Visual Question Answering. Also, a PGM probability-based parsing algorithm is developed by which we can discover what the hidden structure of input is during the inference.

翻訳日:2021-08-25 14:18:39 公開日:2021-08-24

# 映像グラウンディングのためのサポートセットベースクロススーパービジョン

Support-Set Based Cross-Supervision for Video Grounding ( http://arxiv.org/abs/2108.10576v1 )

ライセンス: Link先を確認

Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao

(参考訳) 現在のビデオグラウンドディングのアプローチでは、ビデオテキスト関係をキャプチャする複雑なアーキテクチャが提案されており、目覚ましい改善が達成されている。しかし、実際にはアーキテクチャ設計のみで複雑なマルチモーダル関係を学習することは困難である。本稿では,新たなSupport-set Based Cross-Supervision (Sscs) モジュールを提案する。提案するSscsモジュールは、識別的コントラスト目的と生成的キャプション目的の2つの主要成分を含む。対照的な目的は、対照的な学習によって効果的な表現を学ぶことであり、キャプション目的は、テキストによって教師される強力なビデオエンコーダを訓練することができる。接地時間と背景時間の両方で視覚的実体が共存しているため、相互排他的学習はビデオの接地には適さない。本稿では,映像全体から視覚情報を収集し,エンティティの相互排除を解消するサポートセットの概念を用いて,クロススーパービジョンを強化することでこの問題に対処する。元の目的と組み合わせることで、Sscsは既存のアプローチに対するマルチモーダル関係モデリングの能力を高めることができる。我々は,3つの挑戦的データセット上でSscsを広範囲に評価し,特にCharades-STA上のR1@0.5の6.35%において,最先端の手法を大きなマージンで改善できることを示す。

Current approaches for video grounding propose kinds of complex architectures to capture the video-text relations, and have achieved impressive improvements. However, it is hard to learn the complicated multi-modal relations by only architecture designing in fact. In this paper, we introduce a novel Support-set Based Cross-Supervision (Sscs) module which can improve existing methods during training phase without extra inference cost. The proposed Sscs module contains two main components, i.e., discriminative contrastive objective and generative caption objective. The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts. Due to the co-existence of some visual entities in both ground-truth and background intervals, i.e., mutual exclusion, naively contrastive learning is unsuitable to video grounding. We address the problem by boosting the cross-supervision with the support-set concept, which collects visual information from the whole video and eliminates the mutual exclusion of entities. Combined with the original objectives, Sscs can enhance the abilities of multi-modal relation modeling for existing approaches. We extensively evaluate Sscs on three challenging datasets, and show that our method can improve current state-of-the-art methods by large margins, especially 6.35% in terms of R1@0.5 on Charades-STA.

翻訳日:2021-08-25 14:18:21 公開日:2021-08-24

# 畳み込み単位最適化によるバッチホワイトニングの一般化

Improving Generalization of Batch Whitening by Convolutional Unit Optimization ( http://arxiv.org/abs/2108.10629v1 )

ライセンス: Link先を確認

Yooshin Cho, Hanbyel Cho, Youngsoo Kim, Junmo Kim

(参考訳) バッチホワイトニング(Batch Whitening)は、入力特徴をゼロ平均(Centering)と単位分散(Scaling)に変換し、チャネル間の線形相関(Decorrelation)を取り除くことにより、トレーニングを加速し、安定化する技術である。バッチ正規化を経験的に最適化した一般的な構造では、正規化層は畳み込みとアクティベーション関数の間に現れる。バッチホワイトニングの研究の後、同じ構造をそれ以上解析することなく採用し、線形層の入力がホワイト化されることを前提にバッチホワイト化も分析された。このギャップを埋めるため,我々はこの理論に沿った新しい畳み込みユニットを提案し,本手法は一般にバッチ・ホワイトニングの性能を向上させる。さらに,特徴のランクと相関を調査することで,元の畳み込みユニットの非効率性を示す。本手法は市販のホワイトニングモジュールを用いるため,最先端のホワイトニングモジュールであるイテレーティブ正規化(IterNorm)を用いて,CIFAR-10,CIFAR-100,CUB-200-2011,Stanford Dogs,ImageNetの5つの画像分類データセットにおいて,大幅な性能向上を実現している。特に,大きな学習率,グループサイズ,イテレーション数を用いることで,ホワイトニングの安定性と性能が向上することを確認した。

Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling), and by removing linear correlation between channels (Decorrelation). In commonly used structures, which are empirically optimized with Batch Normalization, the normalization layer appears between convolution and activation function. Following Batch Whitening studies have employed the same structure without further analysis; even Batch Whitening was analyzed on the premise that the input of a linear layer is whitened. To bridge the gap, we propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening. Moreover, we show the inefficacy of the original Convolutional Unit by investigating rank and correlation of features. As our method is employable off-the-shelf whitening modules, we use Iterative Normalization (IterNorm), the state-of-the-art whitening module, and obtain significantly improved performance on five image classification datasets: CIFAR-10, CIFAR-100, CUB-200-2011, Stanford Dogs, and ImageNet. Notably, we verify that our method improves stability and performance of whitening when using large learning rate, group size, and iteration number.

翻訳日:2021-08-25 14:17:56 公開日:2021-08-24

# レーダー・カメラ融合による全速度レーダリターン

Full-Velocity Radar Returns by Radar-Camera Fusion ( http://arxiv.org/abs/2108.10637v1 )

ライセンス: Link先を確認

Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, Praveen Narayanan

(参考訳) ドップラーレーダーの特徴は、レーダー点の半径方向の速度を測定することである。しかし, 物体速度推定と動的シーンにおけるレーダスイープの時間的統合の欠如は, 物体速度推定を損なう。本稿では,レーダを融合したカメラがレーダに相補的な情報を提供することを認識し,カメラ画像からの対応する光フローを用いてドップラー帰還の点方向全速度推定を行う。さらに,レーダーとカメラの対応を推定するニューラルネットワークを用いて,レーダのリターンとカメラ画像の関連付け問題に対処する。 nuScenesデータセットの実験結果は,提案手法の有効性を検証し,レーダ点の速度推定および蓄積における最先端の精度向上を示す。

A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points. However, the missing tangential velocity component hampers object velocity estimation as well as temporal integration of radar sweeps in dynamic scenes. Recognizing that fusing camera with radar provides complementary information to radar, in this paper we present a closed-form solution for the point-wise, full-velocity estimate of Doppler returns using the corresponding optical flow from camera images. Additionally, we address the association problem between radar returns and camera images with a neural network that is trained to estimate radar-camera correspondences. Experimental results on the nuScenes dataset verify the validity of the method and show significant improvements over the state-of-the-art in velocity estimation and accumulation of radar points.

翻訳日:2021-08-25 14:17:30 公開日:2021-08-24

# 教師なし視覚表現学習のための時間的知識整合性

Temporal Knowledge Consistency for Unsupervised Visual Representation Learning ( http://arxiv.org/abs/2108.10668v1 )

ライセンス: Link先を確認

Weixin Feng, Yuanjiang Wang, Lihua Ma, Ye Yuan, Chi Zhang

(参考訳) インスタンス識別パラダイムは教師なし学習において支配的になっている。教師が生徒の指導信号として組込みの知識を提供するという、教師中心の枠組みを常に採用している。生徒は、教師の見解とインスタンスの空間的一貫性を強制することによって意味のある表現を学ぶ。しかし、教師の出力は、異なる訓練段階において同じ事例で劇的に変化し、予期せぬノイズが引き起こされ、矛盾した目的によって壊滅的な忘れが引き起こされる。本稿では、まずインスタンスの時間的一貫性を現在のインスタンス識別パラダイムに統合し、時間的知識一貫性(TKC)という新しい強力なアルゴリズムを提案する。具体的には,tkcは時間的教師の知識を動的に整理し,学習例の時間的一貫性を重視した有用な情報を適応的に選択する。実験結果から、TKCは線形評価プロトコル上でResNetとAlexNetの両方の視覚表現を学習し、下流タスクにうまく転送できることがわかった。すべての実験から,本手法の有効性と一般化が示唆された。

The instance discrimination paradigm has become dominant in unsupervised learning. It always adopts a teacher-student framework, in which the teacher provides embedded knowledge as a supervision signal for the student. The student learns meaningful representations by enforcing instance spatial consistency with the views from the teacher. However, the outputs of the teacher can vary dramatically on the same instance during different training stages, introducing unexpected noise and leading to catastrophic forgetting caused by inconsistent objectives. In this paper, we first integrate instance temporal consistency into current instance discrimination paradigms, and propose a novel and strong algorithm named Temporal Knowledge Consistency (TKC). Specifically, our TKC dynamically ensembles the knowledge of temporal teachers and adaptively selects useful information according to its importance to learning instance temporal consistency. Experimental result shows that TKC can learn better visual representations on both ResNet and AlexNet on linear evaluation protocol while transfer well to downstream tasks. All experiments suggest the good effectiveness and generalization of our method.

翻訳日:2021-08-25 14:17:14 公開日:2021-08-24

# ビデオ・サリエンシ予測のための時空間自己注意ネットワーク

Spatio-Temporal Self-Attention Network for Video Saliency Prediction ( http://arxiv.org/abs/2108.10696v1 )

ライセンス: Link先を確認

Ziqiang Wang, Zhi Liu, Gongyang Li, Tianhong Zhang, Lihua Xu, Jijun Wang

(参考訳) 3次元畳み込みニューラルネットワークは,コンピュータビジョンにおける映像タスクにおいて有望な結果を達成している。しかし、3D畳み込みは、カーネルサイズに応じて固定された局所時空にのみ視覚表現をエンコードするが、人間の注意は常にビデオの異なる時間における関係的な視覚特徴に惹かれる。この制限を克服するために,複数のstsaモジュールを異なる3次元畳み込みバックボーンのレベルに配置し,異なる時間ステップの時空間特徴間の長距離関係を直接捉える,ビデオ・サリエンシ予測のための新たな時空間自己着型3dネットワーク(stsanet)を提案する。さらに,semantic と spatio-temporal 部分空間における文脈知覚とマルチレベル特徴を統合するための注目型マルチスケール融合(amsf)モジュールを提案する。 DHF1K, Hollywood-2, UCF, DIEMベンチマークで得られた結果から, 提案したモデルに比較して, 提案モデルの有効性が明らかとなった。

3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction that is explored in this paper. However, 3D convolution encodes visual representation merely on fixed local spacetime according to its kernel size, while human attention is always attracted by relational visual features at different time of a video. To overcome this limitation, we propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction, in which multiple Spatio-Temporal Self-Attention (STSA) modules are employed at different levels of 3D convolutional backbone to directly capture long-range relations between spatio-temporal features of different time steps. Besides, we propose an Attentional Multi-Scale Fusion (AMSF) module to integrate multi-level features with the perception of context in semantic and spatio-temporal subspaces. Extensive experiments demonstrate the contributions of key components of our method, and the results on DHF1K, Hollywood-2, UCF, and DIEM benchmark datasets clearly prove the superiority of the proposed model compared with all state-of-the-art models.

翻訳日:2021-08-25 14:16:57 公開日:2021-08-24

# PocketNet: ニューラルネットワーク検索とマルチステップ知識蒸留を用いた極軽量顔認識ネットワーク

PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation ( http://arxiv.org/abs/2108.10710v1 )

ライセンス: Link先を確認

Fadi Boutros, Patrick Siebke, Marcel Klemt, Naser Damer, Florian Kirchbuchner, Arjan Kuijper

(参考訳) ディープニューラルネットワークは、顔認識の主流となっている。しかし、組み込みデバイスやメモリフットプリントの少ないアプリケーションシナリオに非常に多くのパラメータを含むモデルをデプロイすることは困難である。本研究では,極めて軽量かつ高精度な顔認識ソリューションを提案する。我々はニューラルアーキテクチャ検索を用いて、新しい顔認識モデル、すなわちPocketNetを開発した。また,多段階の知識蒸留という知識蒸留に基づく新しい学習パラダイムを提示することにより,コンパクトモデルの検証性能を向上させることを提案する。我々は,IJB-B,IJB-C,MegaFaceなどの大規模評価ベンチマークを含む9つのベンチマークにおいて,最近のコンパクト顔認識モデルとの比較実験を行った。 pocketnetsは、同じレベルのモデルコンパクト性を考慮して、9つのメインストリームベンチマークで最先端の顔認識性能を一貫して向上させてきた。 0.92mのパラメータを持つネットワークpocketnets-128は、4m以上のパラメータを含む最近のsotaコンパクトモデルと非常に競争力のある結果を得た。トレーニングコードと事前トレーニングされたモデルはhttps://github.com/fdbtrs/PocketNet.comで公開されている。

Deep neural networks have rapidly become the mainstream method for face recognition. However, deploying such models that contain an extremely large number of parameters to embedded devices or in application scenarios with limited memory footprint is challenging. In this work, we present an extremely lightweight and accurate face recognition solution. We utilize neural architecture search to develop a new family of face recognition models, namely PocketNet. We also propose to enhance the verification performance of the compact model by presenting a novel training paradigm based on knowledge distillation, namely the multi-step knowledge distillation. We present an extensive experimental evaluation and comparisons with the recent compact face recognition models on nine different benchmarks including large-scale evaluation benchmarks such as IJB-B, IJB-C, and MegaFace. PocketNets have consistently advanced the state-of-the-art (SOTA) face recognition performance on nine mainstream benchmarks when considering the same level of model compactness. With 0.92M parameters, our smallest network PocketNetS-128 achieved very competitive results to recent SOTA compacted models that contain more than 4M parameters. Training codes and pre-trained models are publicly released https://github.com/fdbtrs/PocketNet.

翻訳日:2021-08-25 14:16:34 公開日:2021-08-24

# 平衡物体検出のための残差予測整合性

Reconcile Prediction Consistency for Balanced Object Detection ( http://arxiv.org/abs/2108.10809v1 )

ライセンス: Link先を確認

Keyang Wang, Lei Zhang

(参考訳) 分類と回帰は物体検出器の2つの柱である。ほとんどのCNNベースの検出器では、これらの2つの柱は独立に最適化されている。それらの間の直接的な相互作用がなければ、分類損失と回帰損失は、トレーニングフェーズの最適方向に対して同期的に最適化できない。これにより、特に不規則な形状や咬合対象において、高い分類スコア、低い局在精度、低い分類スコア、高い局在精度を有する不整合予測が多数生じ、nms後の既存の検出器の検出性能を著しく損なうことが明らかとなる。平衡物体検出のための予測整合性を改善するために,分類枝と局所化枝の最適化を調和させる高調波損失を提案する。調和損失により、これらの2つの分枝は、訓練中に相互に監督し、促進し、推論フェーズにおいてトップ分類とローカライゼーションの共起度の高い一貫した予測を生成することができる。さらに, トレーニング段階において, ローカライゼーション損失が外れ値に支配されるのを防止するため, 異なるIoUレベルの試料の局所化損失の重みを調和させるために, ハーモニックIoU損失を提案する。 PASCAL VOCとMS COCOのベンチマークに関する総合的な実験により,既存の物体検出装置の最先端精度向上に向けたモデルの有効性と有効性を示した。

Classification and regression are two pillars of object detectors. In most CNN-based detectors, these two pillars are optimized independently. Without direct interactions between them, the classification loss and the regression loss can not be optimized synchronously toward the optimal direction in the training phase. This clearly leads to lots of inconsistent predictions with high classification score but low localization accuracy or low classification score but high localization accuracy in the inference phase, especially for the objects of irregular shape and occlusion, which severely hurts the detection performance of existing detectors after NMS. To reconcile prediction consistency for balanced object detection, we propose a Harmonic loss to harmonize the optimization of classification branch and localization branch. The Harmonic loss enables these two branches to supervise and promote each other during training, thereby producing consistent predictions with high co-occurrence of top classification and localization in the inference phase. Furthermore, in order to prevent the localization loss from being dominated by outliers during training phase, a Harmonic IoU loss is proposed to harmonize the weight of the localization loss of different IoU-level samples. Comprehensive experiments on benchmarks PASCAL VOC and MS COCO demonstrate the generality and effectiveness of our model for facilitating existing object detectors to state-of-the-art accuracy.

翻訳日:2021-08-25 14:16:17 公開日:2021-08-24

# 正しい方法でチューニングする:ソフト近傍密度によるドメイン適応の教師なし検証

Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density ( http://arxiv.org/abs/2108.10860v1 )

ライセンス: Link先を確認

Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, and Kate Saenko

(参考訳) unsupervised domain adaptation (uda) メソッドはラベルなしのターゲットドメインの一般化を劇的に改善することができる。しかし, 最適ハイパーパラメータ選択は, 高精度化と負の伝達回避に不可欠である。教師なし適応手法を現実的に検証するにはどうすればいいのか? まず、既存の基準を実証的に分析し、ハイパーパラメータのチューニングにあまり効果がないことを示す。直感的には、訓練されたソース分類器は、近くにある同じクラスのターゲットサンプルを埋め込んで、特徴空間に密集した近傍を形成するべきである。この仮定に基づいて,点間の類似度分布のエントロピーを計算し,ソフトな近傍の密度を測定する,教師なし検証基準を提案する。画像分類とセマンティックセグメンテーションモデルの両方において、ハイパーパラメータとトレーニングイテレーションの数をチューニングすることが可能です。この論文で使われたコードは、 \url{https://github.com/VisionLearningGroup/SND} で入手できる。

Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains. However, optimal hyper-parameter selection is critical to achieving high accuracy and avoiding negative transfer. Supervised hyper-parameter validation is not possible without labeled target data, which raises the question: How can we validate unsupervised adaptation techniques in a realistic way? We first empirically analyze existing criteria and demonstrate that they are not very effective for tuning hyper-parameters. Intuitively, a well-trained source classifier should embed target samples of the same class nearby, forming dense neighborhoods in feature space. Based on this assumption, we propose a novel unsupervised validation criterion that measures the density of soft neighborhoods by computing the entropy of the similarity distribution between points. Our criterion is simpler than competing validation methods, yet more effective; it can tune hyper-parameters and the number of training iterations in both image classification and semantic segmentation models. The code used for the paper will be available at \url{https://github.com/VisionLearningGroup/SND}.

翻訳日:2021-08-25 14:15:51 公開日:2021-08-24

# DROID-SLAM:モノクラー、ステレオ、RGB-DカメラのためのディープビジュアルSLAM

DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras ( http://arxiv.org/abs/2108.10869v1 )

ライセンス: Link先を確認

Zachary Teed and Jia Deng

(参考訳) ディープラーニングベースのSLAMシステムであるDROID-SLAMを紹介する。 DROID-SLAMは、Dense Bundle Adjustment層を通して、カメラポーズと画素幅の繰り返し更新で構成される。 DROID-SLAMは正確で、以前の作業よりも大幅に改善され、ロバストで、壊滅的な失敗が著しく少ない。単眼ビデオのトレーニングにもかかわらず、ステレオやRGB-Dビデオを利用してテスト時にパフォーマンスを向上させることができる。オープンソースコードのURLはhttps://github.com/princeton-vl/DROID-SLAMです。

We introduce DROID-SLAM, a new deep learning based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer. DROID-SLAM is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures. Despite training on monocular video, it can leverage stereo or RGB-D video to achieve improved performance at test time. The URL to our open source code is https://github.com/princeton-vl/DROID-SLAM.

翻訳日:2021-08-25 14:15:35 公開日:2021-08-24

# ソーシャル・アウェア・軌道予測モデルは本当にソーシャル・アウェアなのか?

Are socially-aware trajectory prediction models really socially-aware? ( http://arxiv.org/abs/2108.10879v1 )

ライセンス: Link先を確認

Saeed Saadatnejad, Mohammadhossein Bahari, Pedram Khorsandi, Mohammad Saneian, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi

(参考訳) 私たちの分野は最近、ニューラルネットワークベースの軌道予測器の武器レースを目撃しました。これらの予測器は、自律走行や歩行者流シミュレーションなどの多くの応用の核心にあるが、敵の堅牢性は慎重に研究されていない。本稿では,衝突回避の観点から予測モデルの社会的理解を評価するために,社会的対応による攻撃を提案する。攻撃は小さいが慎重に作られた摂動であり、予測を失敗させる。技術的には、我々は衝突を出力の失敗モードと定義し、攻撃を誘導するためのハードおよびソフトアテンション機構を提案する。我々の攻撃のおかげで、私たちは現在のモデルの社会的理解の限界に光を当てた。近年の軌道予測モデルにおいて,本手法の強みを示す。最後に,最先端のモデルの社会的理解を高めるために,我々の攻撃を活用できることを示す。コードはオンラインで入手できる。 https://s-attack.github.io/

Our field has recently witnessed an arms race of neural network-based trajectory predictors. While these predictors are at the core of many applications such as autonomous navigation or pedestrian flow simulations, their adversarial robustness has not been carefully studied. In this paper, we introduce a socially-attended attack to assess the social understanding of prediction models in terms of collision avoidance. An attack is a small yet carefully-crafted perturbations to fail predictors. Technically, we define collision as a failure mode of the output, and propose hard- and soft-attention mechanisms to guide our attack. Thanks to our attack, we shed light on the limitations of the current models in terms of their social understanding. We demonstrate the strengths of our method on the recent trajectory prediction models. Finally, we show that our attack can be employed to increase the social understanding of state-of-the-art models. The code is available online: https://s-attack.github.io/

翻訳日:2021-08-25 14:15:26 公開日:2021-08-24

# Adaptation-Agnostic Meta-Training

Adaptation-Agnostic Meta-Training ( http://arxiv.org/abs/2108.10557v1 )

ライセンス: Link先を確認

Jiaxin Chen, Li-Ming Zhan, Xiao-Ming Wu, Fu-Lai Chung

(参考訳) 多くのメタ学習アルゴリズムは、内部タスク適応時にタスク固有の予測器が学習され、メタ更新時にメタパラメータが更新されるという意味で、インターリーブプロセスに定式化することができる。通常のメタトレーニング戦略は、メタパラメータを最適化するために、内部タスク適応手順を区別する必要がある。これにより、内部タスクアルゴリズムを解析的に解決すべきという制約が生じる。この制約の下では、解析解を持つ単純なアルゴリズムのみが、モデル表現性を制限する内部タスクアルゴリズムとして適用できる。制限を緩和するために,適応非依存なメタトレーニング戦略を提案する。提案手法に従い,より強力なアルゴリズム(例えば,異なるアルゴリズムのアンサンブル)をインナータスクアルゴリズムとして適用することで,一般的なベースラインと比較して優れた性能を実現する。ソースコードはhttps://github.com/jiaxinchen666/AdaptationAgnosticMetaLearningで入手できる。

Many meta-learning algorithms can be formulated into an interleaved process, in the sense that task-specific predictors are learned during inner-task adaptation and meta-parameters are updated during meta-update. The normal meta-training strategy needs to differentiate through the inner-task adaptation procedure to optimize the meta-parameters. This leads to a constraint that the inner-task algorithms should be solved analytically. Under this constraint, only simple algorithms with analytical solutions can be applied as the inner-task algorithms, limiting the model expressiveness. To lift the limitation, we propose an adaptation-agnostic meta-training strategy. Following our proposed strategy, we can apply stronger algorithms (e.g., an ensemble of different types of algorithms) as the inner-task algorithm to achieve superior performance comparing with popular baselines. The source code is available at https://github.com/jiaxinchen666/AdaptationAgnosticMetaLearning.

翻訳日:2021-08-25 14:14:57 公開日:2021-08-24

# グラフ分類のためのポーリングアーキテクチャ検索

Pooling Architecture Search for Graph Classification ( http://arxiv.org/abs/2108.10587v1 )

ライセンス: Link先を確認

Lanning Wei, Huan Zhao, Quanming Yao, Zhiqiang He

(参考訳) グラフ分類は化学やバイオインフォマティクスなどの多くの分野において重要な問題であり、グラフニューラルネットワーク(gnn)は最先端(sota)法である。 GNNは、近傍のアグリゲーションスキームに基づいてノードレベルの表現を学習し、グラフレベルの表現を得るために、既存のGNNモデルのアグリゲーション操作後にプール法を適用し、粗い粒度のグラフを生成する。しかし、グラフ分類の高度に多様な応用により、既存のプーリング法の性能は異なるグラフによって異なる。言い換えれば、ほとんどのケースでうまく機能するようにユニバーサルプーリングアーキテクチャを設計することは難しい問題であり、現実世界のアプリケーションではデータ固有のプーリングメソッドが要求される。そこで本研究では,ニューラルネットワークを用いてグラフ分類のための適応的プーリングアーキテクチャを探索する手法を提案する。まず、アグリゲーション、プール、リードアウト、マージの4つのモジュールからなる統一されたフレームワークを設計しました。この枠組みに基づいて、人間設計アーキテクチャに人気のある操作を組み込むことにより、新しい検索空間を設計する。そして, 効率的な探索を可能にするために, 探索空間を連続的に緩和する粗粒化戦略を提案し, 微分可能な探索法を適用できる。 3つのドメインから6つの実世界のデータセットに関する広範囲な実験を行い,提案手法の有効性と有効性を示す。

Graph classification is an important problem with applications across many domains, like chemistry and bioinformatics, for which graph neural networks (GNNs) have been state-of-the-art (SOTA) methods. GNNs are designed to learn node-level representation based on neighborhood aggregation schemes, and to obtain graph-level representation, pooling methods are applied after the aggregation operation in existing GNN models to generate coarse-grained graphs. However,due to highly diverse applications of graph classification, and the performance of existing pooling methods vary on different graphs. In other words, it is a challenging problem to design a universal pooling architecture to perform well in most cases, leading to a demand for data-specific pooling methods in real-world applications. To address this problem, we propose to use neural architecture search (NAS) to search for adaptive pooling architectures for graph classification. Firstly we designed a unified framework consisting of four modules: Aggregation, Pooling, Readout, and Merge, which can cover existing human-designed pooling methods for graph classification. Based on this framework, a novel search space is designed by incorporating popular operations in human-designed architectures. Then to enable efficient search, a coarsening strategy is proposed to continuously relax the search space, thus a differentiable search method can be adopted. Extensive experiments on six real-world datasets from three domains are conducted, and the results demonstrate the effectiveness and efficiency of the proposed framework.

翻訳日:2021-08-25 14:14:42 公開日:2021-08-24

# DeepSleepNet-Lite:不確かさ推定による簡易型自動睡眠ステージスコアモデル

DeepSleepNet-Lite: A Simplified Automatic Sleep Stage Scoring Model with Uncertainty Estimates ( http://arxiv.org/abs/2108.10600v1 )

ライセンス: Link先を確認

Luigi Fiorillo, Paolo Favaro, and Francesca Dalia Faraci

(参考訳) ディープラーニングは最新の自動睡眠スコアリングアルゴリズムで広く利用されている。その人気は、優れたパフォーマンスと生信号を直接処理し、データから特徴を学習する能力に起因している。既存のスコアリングアルゴリズムの多くは、大量のトレーニングパラメータと入力中の長い時間シーケンス(最大12分)のために、非常に計算に要求されるアーキテクチャを利用する。これらのアーキテクチャのうち、モデルの不確実性の推定を提供するものはごくわずかである。本研究では,90秒のEEG入力シーケンスのみを処理する簡易軽量スコアリングアーキテクチャであるDeepSleepNet-Liteを提案する。睡眠スコアリングにおいて,モンテカルロドロップアウト手法を初めて活用し,アーキテクチャの性能向上と不確定なインスタンスの検出に活用した。オープンソースのSleep-EDF拡張データベースから単一チャネルのEEG Fpz-Czで評価を行う。 DeepSleepNet-Liteは、既存の最先端アーキテクチャと比較して、性能が若干低いが、全体的な精度ではマクロF1スコアとコーエンのカッパ(Sleep-EDF v1-2013 +/30mins:84.0%, 78.0%, 0.78; on Sleep-EDF v2-2018 +/30mins: 80.3%, 75.2%, 0.73)である。モンテカルロドロップアウトは不確定な予測の推定を可能にする。不確実なインスタンスを拒絶することで、このモデルはデータベースの両バージョンでより高いパフォーマンスを達成する(Sleep-EDF v1-2013 +/-30mins: 86.1.0%, 79.6%, 0.81; on Sleep-EDF v2-2018 +/-30mins: 82.3%, 76.7%, 0.76)。より軽い睡眠スコアリングアプローチは、リアルタイムで睡眠分析を行うためのスコアリングアルゴリズムの応用への道を開く。

Deep learning is widely used in the most recent automatic sleep scoring algorithms. Its popularity stems from its excellent performance and from its ability to directly process raw signals and to learn feature from the data. Most of the existing scoring algorithms exploit very computationally demanding architectures, due to their high number of training parameters, and process lengthy time sequences in input (up to 12 minutes). Only few of these architectures provide an estimate of the model uncertainty. In this study we propose DeepSleepNet-Lite, a simplified and lightweight scoring architecture, processing only 90-seconds EEG input sequences. We exploit, for the first time in sleep scoring, the Monte Carlo dropout technique to enhance the performance of the architecture and to also detect the uncertain instances. The evaluation is performed on a single-channel EEG Fpz-Cz from the open source Sleep-EDF expanded database. DeepSleepNet-Lite achieves slightly lower performance, if not on par, compared to the existing state-of-the-art architectures, in overall accuracy, macro F1-score and Cohen's kappa (on Sleep-EDF v1-2013 +/-30mins: 84.0%, 78.0%, 0.78; on Sleep-EDF v2-2018 +/-30mins: 80.3%, 75.2%, 0.73). Monte Carlo dropout enables the estimate of the uncertain predictions. By rejecting the uncertain instances, the model achieves higher performance on both versions of the database (on Sleep-EDF v1-2013 +/-30mins: 86.1.0%, 79.6%, 0.81; on Sleep-EDF v2-2018 +/-30mins: 82.3%, 76.7%, 0.76). Our lighter sleep scoring approach paves the way to the application of scoring algorithms for sleep analysis in real-time.

翻訳日:2021-08-25 14:14:17 公開日:2021-08-24

# 一般化ページランクを用いた適応的・解釈可能なグラフ畳み込みネットワーク

Adaptive and Interpretable Graph Convolution Networks Using Generalized Pagerank ( http://arxiv.org/abs/2108.10636v1 )

ライセンス: Link先を確認

Kishan Wimalawarne and Taiji Suzuki

(参考訳) 深層gcnモデルにおける適応層間グラフ畳み込みについて検討する。我々は、GCNIIネットワークの各層で一般化されたページランクを学習し、適応的な畳み込みを誘導するAdaGPRを提案する。 AdaGPR の一般化は正規化隣接行列の固有値スペクトルの多項式によって一般化されたページランク係数の順に有界であることが示される。一般化境界の解析により、オーバースムーシングは正規化隣接行列の高次による畳み込みとモデルの深さの両方に依存することが分かる。我々は,ベンチマーク実データを用いたノード分類の評価を行い,既存のグラフ畳み込みネットワークに比べてadagprは精度が向上し,オーバースムーシングに対するロバスト性が示された。さらに、レイヤーワイズ一般化ページランクの係数の解析により、モデル解釈を可能にする各レイヤにおける畳み込みを質的に理解できることを示す。

We investigate adaptive layer-wise graph convolution in deep GCN models. We propose AdaGPR to learn generalized Pageranks at each layer of a GCNII network to induce adaptive convolution. We show that the generalization bound for AdaGPR is bounded by a polynomial of the eigenvalue spectrum of the normalized adjacency matrix in the order of the number of generalized Pagerank coefficients. By analysing the generalization bounds we show that oversmoothing depends on both the convolutions by the higher orders of the normalized adjacency matrix and the depth of the model. We performed evaluations on node-classification using benchmark real data and show that AdaGPR provides improved accuracies compared to existing graph convolution networks while demonstrating robustness against oversmoothing. Further, we demonstrate that analysis of coefficients of layer-wise generalized Pageranks allows us to qualitatively understand convolution at each layer enabling model interpretations.

翻訳日:2021-08-25 14:13:35 公開日:2021-08-24

# シンボリック回帰におけるトレーニングデータ削減のためのデータ集約

Data Aggregation for Reducing Training Data in Symbolic Regression ( http://arxiv.org/abs/2108.10660v1 )

ライセンス: Link先を確認

Lukas Kammerer, Gabriel Kronberger, Michael Kommenda

(参考訳) データの量が増えると、遺伝的プログラミングによるシンボリック回帰のような計算量の多い機械学習技術がますます非現実的になる。本研究は,学習データを削減する手法と遺伝的プログラミングのランタイムについて述べる。データは、実際の機械学習アルゴリズムを実行する前に、前処理ステップに集約される。 K平均クラスタリングとデータビンニングはデータアグリゲーションに使われ、最も単純なデータリダクション法としてランダムサンプリングと比較される。実世界の4つのデータセットにおいて,学習における高速化と学習モデルへの影響を分析し,各手法の精度を検証した。遺伝的プログラミングの性能は、ランダムな森林と線形回帰と比較される。その結果、k平均とランダムサンプリングは、データサイズに比例するスピードアップの一方で、元のデータの30%に削減された場合、テスト精度が極めて低下することが示された。逆にバインディングは、非常に高いテストエラーのモデルにつながる。

The growing volume of data makes the use of computationally intense machine learning techniques such as symbolic regression with genetic programming more and more impractical. This work discusses methods to reduce the training data and thereby also the runtime of genetic programming. The data is aggregated in a preprocessing step before running the actual machine learning algorithm. K-means clustering and data binning is used for data aggregation and compared with random sampling as the simplest data reduction method. We analyze the achieved speed-up in training and the effects on the trained models test accuracy for every method on four real-world data sets. The performance of genetic programming is compared with random forests and linear regression. It is shown, that k-means and random sampling lead to very small loss in test accuracy when the data is reduced down to only 30% of the original data, while the speed-up is proportional to the size of the data set. Binning on the contrary, leads to models with very high test error.

翻訳日:2021-08-25 14:13:21 公開日:2021-08-24

# エネルギー時系列予測-従来モデルおよび機械学習モデルの分析および経験的評価

Energy time series forecasting-Analytical and empirical assessment of conventional and machine learning models ( http://arxiv.org/abs/2108.10663v1 )

ライセンス: Link先を確認

Hala Hamdoun, Alaa Sagheer and Hassan Youness

(参考訳) エネルギー時系列予測(tsf)問題を解く従来の手法の候補として,機械学習手法が文献に採用されている。近年,人工知能分野において,幅広い応用において驚くべき性能を発揮する深層学習手法が出現している。しかし、そのエネルギーのtsf問題を解決するための性能に関する証拠は、正確さと計算の要求の観点からは、乏しい。エネルギーTSF問題を扱うレビュー記事の大部分は体系的なレビューであるが、エネルギーTSF問題に対する質的かつ定量的な研究は文献ではまだ行われていない。本論文の目的は2つであり、まず、従来の機械学習と深層学習を総合的に分析し、様々なエネルギー的TSF問題の解法として活用することである。第2に,実世界の3つのデータセットを用いて,選択した手法の実証評価を行う。家庭問題における電力消費問題, 天然ガス問題, 電力消費に関するこれらのデータセットは, 最初の2つの問題は不定形tsfであり, 3つ目の問題は多変量tsfである。従来型と機械学習の両競技者に比較して, 深層学習法は, 精度と予測地平線を著しく改善した。平均時において、計算の要求は他の競争相手よりも顕著に大きい。論文は最終的に、エネルギー予測領域におけるさらなる研究の基盤として、多くの課題、研究の方向性、研究コミュニティへの勧告を特定する。

Machine learning methods have been adopted in the literature as contenders to conventional methods to solve the energy time series forecasting (TSF) problems. Recently, deep learning methods have been emerged in the artificial intelligence field attaining astonishing performance in a wide range of applications. Yet, the evidence about their performance in to solve the energy TSF problems, in terms of accuracy and computational requirements, is scanty. Most of the review articles that handle the energy TSF problem are systematic reviews, however, a qualitative and quantitative study for the energy TSF problem is not yet available in the literature. The purpose of this paper is twofold, first it provides a comprehensive analytical assessment for conventional,machine learning, and deep learning methods that can be utilized to solve various energy TSF problems. Second, the paper carries out an empirical assessment for many selected methods through three real-world datasets. These datasets related to electrical energy consumption problem, natural gas problem, and electric power consumption of an individual household problem.The first two problems are univariate TSF and the third problem is a multivariate TSF. Com-pared to both conventional and machine learning contenders, the deep learning methods attain a significant improvement in terms of accuracy and forecasting horizons examined. In the mean-time, their computational requirements are notably greater than other contenders. Eventually,the paper identifies a number of challenges, potential research directions, and recommendations to the research community may serve as a basis for further research in the energy forecasting domain.

翻訳日:2021-08-25 14:13:07 公開日:2021-08-24

# 電力予測に着目した回帰問題に対する適応的説明型連続学習フレームワーク

Adaptive Explainable Continual Learning Framework for Regression Problems with Focus on Power Forecasts ( http://arxiv.org/abs/2108.10781v1 )

ライセンス: Link先を確認

Yujiang He

(参考訳) 従来のディープラーニング技術と比較して、連続学習はディープニューラルネットワークを継続的に適応的に学習することを可能にする。ディープニューラルネットワークは、新しいタスクを学習し、アプリケーションのデータ量が増加し続けるにつれて、古いタスクから得られた知識を克服しなければならない。本稿では,この文脈における潜在的な課題を説明するために,2つの連続学習シナリオを提案する。さらに、回帰タスクの継続的な学習に短いCLeaRフレームワークに関するこれまでの研究に基づいて、モデルが自分自身を拡張し、データを連続的に学習できるように、さらに開発を進めていく予定です。研究トピックは関連するが、継続的なディープラーニングアルゴリズムの開発、データストリームにおける非定常検出戦略、説明可能で可視化可能な人工知能などに限定されない。さらに、フレームワークとアルゴリズム関連のハイパーパラメータをアプリケーションで動的に更新する必要がある。実世界のアプリケーションから収集した発電および消費データに基づいて予測実験を行う。一連の総合的な評価指標と視覚化ツールは、実験結果の分析に役立つ。提案されたフレームワークは、他の絶えず変化するシナリオに一般的に適用されることが期待される。

Compared with traditional deep learning techniques, continual learning enables deep neural networks to learn continually and adaptively. Deep neural networks have to learn new tasks and overcome forgetting the knowledge obtained from the old tasks as the amount of data keeps increasing in applications. In this article, two continual learning scenarios will be proposed to describe the potential challenges in this context. Besides, based on our previous work regarding the CLeaR framework, which is short for continual learning for regression tasks, the work will be further developed to enable models to extend themselves and learn data successively. Research topics are related but not limited to developing continual deep learning algorithms, strategies for non-stationarity detection in data streams, explainable and visualizable artificial intelligence, etc. Moreover, the framework- and algorithm-related hyperparameters should be dynamically updated in applications. Forecasting experiments will be conducted based on power generation and consumption data collected from real-world applications. A series of comprehensive evaluation metrics and visualization tools can help analyze the experimental results. The proposed framework is expected to be generally applied to other constantly changing scenarios.

翻訳日:2021-08-25 14:12:45 公開日:2021-08-24

# 効率的な理論推論のためのグラフコントラスト事前学習

Graph Contrastive Pre-training for Effective Theorem Reasoning ( http://arxiv.org/abs/2108.10821v1 )

ライセンス: Link先を確認

Zhaoyu Li, Binghong Chen, Xujie Si

(参考訳) インタラクティブな定理証明は困難で退屈なプロセスであり、人間の専門家からの非自明な専門知識と詳細な低レベルな指示(または戦術)を必要とする。戦術予測はこのプロセスを自動化する自然な方法です。既存の手法は、人間の専門家による証明からディープニューラルネットワーク(DNN)に基づくモデルを学ぶことによって、戦術予測に関する有望な結果を示す。本稿では,定理証明のための表現学習の改善に焦点を絞った新しい拡張であるニューロタクティクスを提案する。 NeuroTacticは、グラフニューラルネットワーク(GNN)を利用して、定理と前提を表現し、事前学習にグラフコントラスト学習を適用する。定理の表現学習が戦術予測に不可欠であることを実証する。他の方法と比較して、NeuroTacticはCoqGymデータセット上で最先端のパフォーマンスを達成する。

Interactive theorem proving is a challenging and tedious process, which requires non-trivial expertise and detailed low-level instructions (or tactics) from human experts. Tactic prediction is a natural way to automate this process. Existing methods show promising results on tactic prediction by learning a deep neural network (DNN) based model from proofs written by human experts. In this paper, we propose NeuroTactic, a novel extension with a special focus on improving the representation learning for theorem proving. NeuroTactic leverages graph neural networks (GNNs) to represent the theorems and premises, and applies graph contrastive learning for pre-training. We demonstrate that the representation learning of theorems is essential to predict tactics. Compared with other methods, NeuroTactic achieves state-of-the-art performance on the CoqGym dataset.

翻訳日:2021-08-25 14:12:29 公開日:2021-08-24

# CMML:コールドスタート勧告のためのコンテキスト変調メタ学習

CMML: Contextual Modulation Meta Learning for Cold-Start Recommendation ( http://arxiv.org/abs/2108.10511v1 )

ライセンス: Link先を確認

Xidong Feng, Chen Chen, Dong Li, Mengchen Zhao, Jianye Hao, Jun Wang

(参考訳) 実践的なレコメンデータシステムは、過去におけるユーザ・イテム間のインタラクションが不十分である場合、コールドスタートの問題を経験します。メタ学習、特に勾配に基づく学習は、モデルの初期パラメータを学習することでこの問題に対処し、限られたデータ例から特定のタスクへの迅速な適応を可能にする。性能が大幅に向上したにもかかわらず、主な産業展開との非互換性と、インナーループ勾配操作による計算負荷という2つの重大な問題に悩まされる。これら2つの問題は,実用的なレコメンデーションシステムでは適用が困難である。メタ学習フレームワークの利点を享受し、これらの問題を緩和するために、文脈変調メタ学習(cmml)と呼ばれる推奨フレームワークを提案する。 CMMLは完全なフィードフォワード操作で構成されており、計算効率が良く、主要な産業展開と完全に互換性がある。 CMMLは、特定のタスクを表現するためにコンテキストエンコーダを生成するコンテキストエンコーダ、タスクレベルのコンテキストで特定のユーザオブジェクトの特徴を集約するハイブリッドコンテキストジェネレータ、そして、効率的に適応するためにレコメンデーションモデルを変調できるコンテキスト変調ネットワークを含む3つのコンポーネントから構成される。本手法は,様々な実世界のデータセット上でのシナリオ固有のコールドスタート設定とユーザ固有のコールドスタート設定の両方に対して検証し,より高い計算効率とより優れた解釈性を備えた勾配法でCMMLが同等あるいはそれ以上の性能を達成可能であることを示す。

Practical recommender systems experience a cold-start problem when observed user-item interactions in the history are insufficient. Meta learning, especially gradient based one, can be adopted to tackle this problem by learning initial parameters of the model and thus allowing fast adaptation to a specific task from limited data examples. Though with significant performance improvement, it commonly suffers from two critical issues: the non-compatibility with mainstream industrial deployment and the heavy computational burdens, both due to the inner-loop gradient operation. These two issues make them hard to be applied in practical recommender systems. To enjoy the benefits of meta learning framework and mitigate these problems, we propose a recommendation framework called Contextual Modulation Meta Learning (CMML). CMML is composed of fully feed-forward operations so it is computationally efficient and completely compatible with the mainstream industrial deployment. CMML consists of three components, including a context encoder that can generate context embedding to represent a specific task, a hybrid context generator that aggregates specific user-item features with task-level context, and a contextual modulation network, which can modulate the recommendation model to adapt effectively. We validate our approach on both scenario-specific and user-specific cold-start setting on various real-world datasets, showing CMML can achieve comparable or even better performance with gradient based methods yet with much higher computational efficiency and better interpretability.

翻訳日:2021-08-25 14:12:03 公開日:2021-08-24

# autoencoder-based semantic novelty detection: towards dependable ai-based systems

Autoencoder-based Semantic Novelty Detection: Towards Dependable AI-based Systems ( http://arxiv.org/abs/2108.10851v1 )

ライセンス: Link先を確認

Andreas Rausch, Azarmidokht Motamedi Sedeh, Meng Zhang

(参考訳) 無人タクシーのような多くの自律システムは、安全上重要な機能を果たす。自律システムは、特に環境認識のために人工知能(AI)技術を採用している。エンジニアはAIベースの自律システムを完全にテストしたり、正式に検証することはできない。 aiベースのシステムの精度は、トレーニングデータの品質に依存する。これにより、訓練に使用するデータと何らかの点で異なる新規検出データが、システム開発及び運用の安全対策となる。本稿では, 意味的オートエンコーダトポロジーのためのアーキテクチャガイドラインと, 意味的エラー計算をノベルティ基準として, オートエンコーダに基づく意味的ノベルティ検出のための新しいアーキテクチャを提案する。このような意味的新規性検出は、偽陰性を最小化することにより、文献から知られているオートエンコーダに基づく新規性検出アプローチよりも優れていることを実証する。

Many autonomous systems, such as driverless taxis, perform safety critical functions. Autonomous systems employ artificial intelligence (AI) techniques, specifically for the environment perception. Engineers cannot completely test or formally verify AI-based autonomous systems. The accuracy of AI-based systems depends on the quality of training data. Thus, novelty detection - identifying data that differ in some respect from the data used for training - becomes a safety measure for system development and operation. In this paper, we propose a new architecture for autoencoder-based semantic novelty detection with two innovations: architectural guidelines for a semantic autoencoder topology and a semantic error calculation as novelty criteria. We demonstrate that such a semantic novelty detection outperforms autoencoder-based novelty detection approaches known from literature by minimizing false negatives.

翻訳日:2021-08-25 14:11:37 公開日:2021-08-24

# 高次MOTスケーラブル化:リフテッド不整合経路の効率的な近似解法

Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths ( http://arxiv.org/abs/2108.10606v1 )

ライセンス: Link先を確認

Andrea Hornakova, Timo Kaiser, Paul Swoboda, Michal Rolinek, Bodo Rosenhahn, Roberto Henschel

(参考訳) 本稿では,複数物体追跡(MOT)のための自然だがNPハードなモデルであるリフトド・ディスジョイント・パス問題(LDP)に対する効率的な近似メッセージパッシング法を提案する。私たちのトラッカーは、長いMOTシーケンスから来る非常に大きなインスタンスにスケールします。近似解法により,ソリューションの品質を犠牲にすることなくMOT15/16/17ベンチマークを処理でき,そのサイズと複雑さから,現在まで LDP ソルバには及ばないMOT20を解くことができる。これら4つの標準MOTベンチマークにおいて、最適 LDP ソルバに基づくトラッカーを含む最先端の手法と同等あるいは同等の性能を達成する。

We present an efficient approximate message passing solver for the lifted disjoint paths problem (LDP), a natural but NP-hard model for multiple object tracking (MOT). Our tracker scales to very large instances that come from long and crowded MOT sequences. Our approximate solver enables us to process the MOT15/16/17 benchmarks without sacrificing solution quality and allows for solving MOT20, which has been out of reach up to now for LDP solvers due to its size and complexity. On all these four standard MOT benchmarks we achieve performance comparable or better than current state-of-the-art methods including a tracker based on an optimal LDP solver.

翻訳日:2021-08-25 14:11:06 公開日:2021-08-24

# 近距離切削車両からの噴霧のベンチマーク

A Benchmark for Spray from Nearby Cutting Vehicles ( http://arxiv.org/abs/2108.10800v1 )

ライセンス: Link先を確認

Stefanie Walz, Mario Bijelic, Florian Kraus, Werner Ritter, Martin Simon, Igor Doric

(参考訳) 現在の運転支援システムと自律運転スタックは、明確に定義された環境条件と地理フェンスで囲まれた領域に限られている。悪天候下での運転安全を高めるためには、自動運転と運転支援システムの適用範囲を広げる必要がある。この開発を可能にするために、期待される歪みを定量化するために再現可能なベンチマーク手法が必要である。本発表では,噴霧による乱れの検査方法について述べる。噴霧による乱れを評価するための評価スキームとともに、軽量で構成可能な新しい噴霧装置を導入する。この分析は、自動車用RGBカメラと2種類のLiDARシステム、およびYOLOv3とPV-RCNNに基づく下流検出アルゴリズムをカバーする。密閉車両の一般的なシナリオでは、歪みが最大4秒までの知覚スタックに深刻な影響を与えており、スプレーの影響をベンチマークする必要性が示されている。

Current driver assistance systems and autonomous driving stacks are limited to well-defined environment conditions and geo fenced areas. To increase driving safety in adverse weather conditions, broadening the application spectrum of autonomous driving and driver assistance systems is necessary. In order to enable this development, reproducible benchmarking methods are required to quantify the expected distortions. In this publication, a testing methodology for disturbances from spray is presented. It introduces a novel lightweight and configurable spray setup alongside an evaluation scheme to assess the disturbances caused by spray. The analysis covers an automotive RGB camera and two different LiDAR systems, as well as downstream detection algorithms based on YOLOv3 and PV-RCNN. In a common scenario of a closely cutting vehicle, it is visible that the distortions are severely affecting the perception stack up to four seconds showing the necessity of benchmarking the influences of spray.

翻訳日:2021-08-25 14:10:53 公開日:2021-08-24

# 偏光イメージングによる複合積層板の欠陥自動検出のための次世代認識システム

Next-generation perception system for automated defects detection in composite laminates via polarized computational imaging ( http://arxiv.org/abs/2108.10819v1 )

ライセンス: Link先を確認

Yuqi Ding, Jinwei Ye, Corina Barbalata, James Oubre, Chandler Lemoine, Jacob Agostinho, Genevieve Palardy

(参考訳) トリミングやサンディングを含む風力タービンブレードのような大型複合部品の仕上げ作業には、複数の作業員と部品の再配置が必要となる。複合材料製造業界では、製造部品の形状が不整合であり、作業完了は人間の判断と経験に基づくため、そのようなプロセスの自動化は困難である。動的で不確実な環境で仕上げ作業を行うことができる移動ロボットシステムを実装することで、品質が向上し製造コストが低下する。与えられたタスクを完了させるためには、協調ロボットチームは環境を適切に理解し、製造部品の異常を検出する必要がある。本稿では,複合積層板の欠陥を識別する偏極型画像処理システムの初期実装と実演について述べる。ポラリメトリック画像は表面マイクロジオメトリと非常に関係があるため、従来のカラー画像では見えない表面欠陥を検出するのに使うことができる。提案した視覚システムは, ガラス繊維および炭素繊維積層体の欠陥タイプと表面特性(ピンホール, ヴォイド, 引っかき傷, 樹脂フラッシュなど)の同定に成功している。

Finishing operations on large-scale composite components like wind turbine blades, including trimming and sanding, often require multiple workers and part repositioning. In the composites manufacturing industry, automation of such processes is challenging, as manufactured part geometry may be inconsistent and task completion is based on human judgment and experience. Implementing a mobile, collaborative robotic system capable of performing finishing tasks in dynamic and uncertain environments would improve quality and lower manufacturing costs. To complete the given tasks, the collaborative robotic team must properly understand the environment and detect irregularities in the manufactured parts. In this paper, we describe the initial implementation and demonstration of a polarized computational imaging system to identify defects in composite laminates. As the polarimetric images are highly relevant to the surface micro-geometry, they can be used to detect surface defects that are not visible in conventional color images. The proposed vision system successfully identifies defect types and surface characteristics (e.g., pinholes, voids, scratches, resin flash) for different glass fiber and carbon fiber laminates.

翻訳日:2021-08-25 14:10:42 公開日:2021-08-24

# 効率的な長期記憶を有する量子適応エージェント

Quantum adaptive agents with efficient long-term memories ( http://arxiv.org/abs/2108.10876v1 )

ライセンス: Link先を確認

Thomas J. Elliott, Mile Gu, Andrew J. P. Garner, Jayne Thompson

(参考訳) 適応システムの成功の中心は、環境からの信号を解釈し、それに応じて反応する能力である。このようなエージェントは、ますます複雑な戦略を実行することができると、通常より良く機能する。エージェントが過去の経験から思い出さなければならない情報が多ければ多いほど、必要なメモリが増えます。本稿では,量子情報処理が可能なエージェントのパワーについて検討する。我々は、量子エージェントがメモリ圧縮の利点を最大化するために採用する必要がある最も一般的な形式を明らかにし、そのメモリ状態を体系的にエンコーディングする手段を提供する。これらのエンコーディングは,メモリ最小の旧来のエージェントと比較して,過去のイベントに関する情報を保存しなければならない場合,非常に有利なスケーリングの利点を示す。

Central to the success of adaptive systems is their ability to interpret signals from their environment and respond accordingly -- they act as agents interacting with their surroundings. Such agents typically perform better when able to execute increasingly complex strategies. This comes with a cost: the more information the agent must recall from its past experiences, the more memory it will need. Here we investigate the power of agents capable of quantum information processing. We uncover the most general form a quantum agent need adopt to maximise memory compression advantages, and provide a systematic means of encoding their memory states. We show these encodings can exhibit extremely favourable scaling advantages relative to memory-minimal classical agents when information must be retained about events increasingly far into the past.

翻訳日:2021-08-25 14:09:52 公開日:2021-08-24

# 反論可能な推奨事項

Counterfactual Explainable Recommendation ( http://arxiv.org/abs/2108.10539v1 )

ライセンス: Link先を確認

Juntao Tan, Shuyuan Xu, Yingqiang Ge, Yunqi Li, Xu Chen, Yongfeng Zhang

(参考訳) ユーザやシステム設計者がより理解と意思決定を容易にするために説明を提供することで、説明可能な推奨は重要な研究課題となっている。本稿では,説明可能な推薦のための因果推論から反事実推論の考察を取り入れた,反事実説明可能な推薦(カウンタ)を提案する。 counterは、説明の複雑さと強みを定式化することができ、モデル決定のための単純(低複雑さ)かつ効果的な(高強度)説明を求めるために、反事実学習フレームワークを採用している。技術的には、各ユーザーに推奨される各項目について、カウンタ最適化問題を定式化し、項目の側面に最小限の変更を発生させ、反事実項目の推奨決定を逆転させる反事実項目を作成する。これらの変更は、なぜオリジナルの項目が推奨されるのかの説明である。反事実的な説明は、ユーザとシステムデザイナの両方がよりよいモデルデバッグのために役立ちます。この作業のもうひとつの貢献は、説明可能な推奨の評価である。幸いなことに、反実的な説明は標準的な定量的評価に非常に適している。説明の質を評価するために,ユーザの視点から2種類の評価指標を設計する。ユーザがそのアイテムを好む理由) と、モデルの観点から見た他のもの(すなわち、) なぜそのアイテムがモデルによって推奨されるのか) 提案手法をブラックボックスレコメンデータシステムに適用し,実世界の5つのデータセット上で生成した説明を評価する。その結果,本モデルは最先端のレコメンデーションモデルよりも正確かつ効果的に説明できることがわかった。

By providing explanations for users and system designers to facilitate better understanding and decision making, explainable recommendation has been an important research problem. In this paper, we propose Counterfactual Explainable Recommendation (CountER), which takes the insights of counterfactual reasoning from causal inference for explainable recommendation. CountER is able to formulate the complexity and the strength of explanations, and it adopts a counterfactual learning framework to seek simple (low complexity) and effective (high strength) explanations for the model decision. Technically, for each item recommended to each user, CountER formulates a joint optimization problem to generate minimal changes on the item aspects so as to create a counterfactual item, such that the recommendation decision on the counterfactual item is reversed. These altered aspects constitute the explanation of why the original item is recommended. The counterfactual explanation helps both the users for better understanding and the system designers for better model debugging. Another contribution of the work is the evaluation of explainable recommendation, which has been a challenging task. Fortunately, counterfactual explanations are very suitable for standard quantitative evaluation. To measure the explanation quality, we design two types of evaluation metrics, one from user's perspective (i.e. why the user likes the item), and the other from model's perspective (i.e. why the item is recommended by the model). We apply our counterfactual learning algorithm on a black-box recommender system and evaluate the generated explanations on five real-world datasets. Results show that our model generates more accurate and effective explanations than state-of-the-art explainable recommendation models.

翻訳日:2021-08-25 14:08:49 公開日:2021-08-24

# フェデレーション学習におけるユーザ貢献度データフリー評価

Data-Free Evaluation of User Contributions in Federated Learning ( http://arxiv.org/abs/2108.10623v1 )

ライセンス: Link先を確認

Hongtao Lv, Zhenzhe Zheng, Tie Luo, Fan Wu, Shaojie Tang, Lifeng Hua, Rongfei Jia, Chengfei Lv

(参考訳) Federated Learning (FL)は、モバイルデバイス上の機械学習モデルを、各デバイスのプライベートデータとコンピューティングリソースを使用して分散的にトレーニングする。重要な問題は,(1)モデルトレーニングにおけるユーザの努力を適切なインセンティブで補償し,(2)悪意のある低品質ユーザの検出と削除を可能にするために,個々のユーザの貢献を評価することである。最先端のソリューションは評価目的のために代表的なテストデータセットを必要とするが、そのようなデータセットはしばしば利用できず、合成も困難である。本稿では,テストデータセットを使わずにflにおけるユーザの貢献度を評価するピア予測の考え方に基づいて,ペアワイズ相関合意(pca)と呼ばれる手法を提案する。 pcaはユーザーがアップロードしたモデルパラメータの統計相関を用いてこれを達成する。次に,(1)Fed-PCAと呼ばれる新しいフェデレーション学習アルゴリズム,(2)真性を保証する新たなインセンティブメカニズムを設計に適用する。 MNISTデータセットと大規模産業製品レコメンデーションデータセットを用いてPCAとFed-PCAの性能を評価する。その結果、我々のFed-PCAは標準のFedAvgアルゴリズムや他のベースライン手法を精度良く上回り、同時にPCAはユーザーが真実に振る舞うことを効果的に動機づけることを示した。

Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources. A critical issues is to evaluate individual users' contributions so that (1) users' effort in model training can be compensated with proper incentives and (2) malicious and low-quality users can be detected and removed. The state-of-the-art solutions require a representative test dataset for the evaluation purpose, but such a dataset is often unavailable and hard to synthesize. In this paper, we propose a method called Pairwise Correlated Agreement (PCA) based on the idea of peer prediction to evaluate user contribution in FL without a test dataset. PCA achieves this using the statistical correlation of the model parameters uploaded by users. We then apply PCA to designing (1) a new federated learning algorithm called Fed-PCA, and (2) a new incentive mechanism that guarantees truthfulness. We evaluate the performance of PCA and Fed-PCA using the MNIST dataset and a large industrial product recommendation dataset. The results demonstrate that our Fed-PCA outperforms the canonical FedAvg algorithm and other baseline methods in accuracy, and at the same time, PCA effectively incentivizes users to behave truthfully.

翻訳日:2021-08-25 14:08:25 公開日:2021-08-24

# シンボリック回帰における遺伝的操作の有効性について

On the Effectiveness of Genetic Operations in Symbolic Regression ( http://arxiv.org/abs/2108.10661v1 )

ライセンス: Link先を確認

Bogdan Burlacu, Michael Affenzeller, Michael Kommenda

(参考訳) 本稿では,遺伝的プログラミング(GP)の進化的ダイナミクスを遺伝情報,多様性尺度,親から子への適合度変化に関する情報を用いて解析する手法について述べる。個体構造における遺伝子の出自を同定する新たなサブツリー追跡手法を導入し, 個体群における最良解の進化に寄与しているのは, ごく少数の祖先個体のみであることを示す。

This paper describes a methodology for analyzing the evolutionary dynamics of genetic programming (GP) using genealogical information, diversity measures and information about the fitness variation from parent to offspring. We introduce a new subtree tracing approach for identifying the origins of genes in the structure of individuals, and we show that only a small fraction of ancestor individuals are responsible for the evolvement of the best solutions in the population.

翻訳日:2021-08-25 14:08:02 公開日:2021-08-24

# オープンバンキングのための連合学習

Federated Learning for Open Banking ( http://arxiv.org/abs/2108.10749v1 )

ライセンス: Link先を確認

Guodong Long, Yue Tan, Jing Jiang, Chengqi Zhang

(参考訳) オープンバンキングは、個々の顧客が自分の銀行データを所有することを可能にし、データマーケットプレースと金融サービスの新たなエコシステムの促進に対する基本的なサポートを提供する。近い将来,連合学習を用いた金融分野におけるデータ所有の分散化が期待できる。これは、分散型トレーニング方法でインテリジェントなモデルを学習できるジャストインタイム技術である。フェデレーション学習の最も魅力的な側面は、プライベートデータを収集することなく、モデルトレーニングを集中型サーバと分散ノードに分解する能力である。この種の分解学習フレームワークは、ユーザのプライバシと機密データを保護する大きな可能性を秘めている。したがって、連合学習は、オープンバンキングデータ市場と自然に結合する。この章では、オープンバンキングの文脈で連合学習を適用する際の課題について論じ、それに対応するソリューションも検討されている。

Open banking enables individual customers to own their banking data, which provides fundamental support for the boosting of a new ecosystem of data marketplaces and financial services. In the near future, it is foreseeable to have decentralized data ownership in the finance sector using federated learning. This is a just-in-time technology that can learn intelligent models in a decentralized training manner. The most attractive aspect of federated learning is its ability to decompose model training into a centralized server and distributed nodes without collecting private data. This kind of decomposed learning framework has great potential to protect users' privacy and sensitive data. Therefore, federated learning combines naturally with an open banking data marketplaces. This chapter will discuss the possible challenges for applying federated learning in the context of open banking, and the corresponding solutions have been explored as well.

翻訳日:2021-08-25 14:07:54 公開日:2021-08-24

# リプシッツ誘導体を用いた一変量関数の大域的最適化の回帰解析

Regret Analysis of Global Optimization in Univariate Functions with Lipschitz Derivatives ( http://arxiv.org/abs/2108.10859v1 )

ライセンス: Link先を確認

Kaan Gokcesu, Hakan Gokcesu

(参考訳) 本研究では,不定損失関数における大域的最適化の問題について検討し,一般的な下限アルゴリズム(例えばpiyavskii-shubertアルゴリズム)の後悔を分析する。任意の時間に$T$(これは最高の見積とグローバルオプティマイザの間の損失の差である)という広く利用可能な単純な後悔の代わりに、累積的後悔をその時点まで調査する。適切な下限アルゴリズムを用いることで、異なる関数のクラスに対して満足のいく累積後悔境界を実現できることを示す。パラメータ $L$ を持つリプシッツ連続函数に対して、累積後悔は$O(L\log T)$であることを示す。パラメータ $H$ を持つ滑らかなリプシッツ函数に対して、累積後悔は $O(H)$ であることを示す。また、リプシッツ連続函数と滑らか函数の両方を個別にカバーするより広範な関数のクラスについて解析的に結果を拡張する。

In this work, we study the problem of global optimization in univariate loss functions, where we analyze the regret of the popular lower bounding algorithms (e.g., Piyavskii-Shubert algorithm). For any given time $T$, instead of the widely available simple regret (which is the difference of the losses between the best estimation up to $T$ and the global optimizer), we study the cumulative regret up to that time. With a suitable lower bounding algorithm, we show that it is possible to achieve satisfactory cumulative regret bounds for different classes of functions. For Lipschitz continuous functions with the parameter $L$, we show that the cumulative regret is $O(L\log T)$. For Lipschitz smooth functions with the parameter $H$, we show that the cumulative regret is $O(H)$. We also analytically extend our results for a broader class of functions that covers both the Lipschitz continuous and smooth functions individually.

翻訳日:2021-08-25 14:07:39 公開日:2021-08-24

# 深信号FBSDEアルゴリズム

Deep Signature FBSDE Algorithm ( http://arxiv.org/abs/2108.10504v1 )

ライセンス: Link先を確認

Qi Feng, Man Luo, Zhaoyu Zhang

(参考訳) 本研究では,前向き確率微分方程式 (FBSDEs) を状態と経路に依存する特徴を持つディープシグネチャ/log-signature FBSDEアルゴリズムを提案する。ニューラルネット(RNN)モデルにディープシグネチャ/ログ-シグネチャ変換を組み込むことで,トレーニング時間を短縮し,精度を向上し,既存の文献の手法と比較して時間的地平線を延長する。さらに,パラメータ偏微分方程式 (PDE) や経路依存PDE (PPDE) に関連付けられた,高周波データを含む状態と経路依存オプションの価格設定,モデルあいまいさ,確率ゲームなど,幅広い応用に適用することができる。最後に, ディープシグネチャ/log-signature FBSDEアルゴリズムの収束解析を導出する。

We propose a deep signature/log-signature FBSDE algorithm to solve forward-backward stochastic differential equations (FBSDEs) with state and path dependent features. By incorporating the deep signature/log-signature transformation into the recurrent neural network (RNN) model, our algorithm shortens the training time, improves the accuracy, and extends the time horizon comparing to methods in the existing literature. Moreover, our algorithms can be applied to a wide range of applications such as state and path dependent option pricing involving high-frequency data, model ambiguity, and stochastic games, which are linked to parabolic partial differential equations (PDEs), and path-dependent PDEs (PPDEs). Lastly, we also derive the convergence analysis of the deep signature/log-signature FBSDE algorithm.

翻訳日:2021-08-25 14:06:43 公開日:2021-08-24

# 最小囲み球を用いた第4種最適後方精度不確かさトレードオフの不確かさ定量化

Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball ( http://arxiv.org/abs/2108.10517v1 )

ライセンス: Link先を確認

Hamed Hamze Bajgiran and Pau Batlle Franch and Houman Owhadi and Clint Scovel and Mahdy Shirdel and Michael Stanley and Peyman Tavallali

(参考訳) 不確実量化(UQ)には基本的に3種類のアプローチがある: (A) 頑健な最適化、(B) ベイズ的、(C) 決定論。 a) は頑健であるが、正確さとデータの同化に関しては不利である。 (b)前もって必要であり、一般的に脆く、後方推定は遅くなる。 C)は最適な事前の同定につながるが、その近似は次元の呪いに悩まされ、リスクの概念はデータの分布に関して平均化されるものである。我々は, (a), (b), (c) と仮説検定のハイブリッドである4番目の種類を紹介する。これは、サンプルの$x$を観察した後、(1)相対的可能性を通して可能性領域を定義し、(2)その領域でミンマックスゲームを行い、最適推定器とそのリスクを定義する。得られた方法は、(a)データを測定した後に最適な先行性を特定し、(b)リスクの概念は後部であり、(b)最適な推定値の判定とそのリスクは、関心地図の量(次元の呪いの対象ではなく、高速である)に基づいて、確率領域の画像の最小囲い球の計算に還元することができる。この方法は、観測データ(相対可能性)の希少性に仮定された下界として作用する$[0,1]$のパラメータによって特徴づけられる。このパラメータが1ドルに近い場合、この方法は、信頼度が低いUQ推定値で最大推定値の周りに集中した後続分布を生成する。このパラメータが0$に近い場合、この方法は信頼度の高いuq推定値を持つ最大リスク後方分布を生成する。精度不確実性トレードオフのナビゲートに加えて,データ同化に伴うロバスト性-正確性トレードオフをナビゲートすることでベイズ推論の脆性に対処する手法を提案する。

There are essentially three kinds of approaches to Uncertainty Quantification (UQ): (A) robust optimization, (B) Bayesian, (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior, it is generally brittle and posterior estimations can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a 4th kind which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample $x$, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one, (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity of interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $ [0,1]$ acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate with tight but low confidence UQ estimates. When that parameter is near $0$, the method produces a maximal risk posterior distribution with high confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.

翻訳日:2021-08-25 14:06:29 公開日:2021-08-24

# 1対多: 深層学習による重力波探索

From One to Many: A Deep Learning Coincident Gravitational-Wave Search ( http://arxiv.org/abs/2108.10715v1 )

ライセンス: Link先を確認

Marlin B. Sch\"afer (1 and 2), Alexander H. Nitz (1 and 2) ((1) Max-Planck-Institut f\"ur Gravitationsphysik (Albert-Einstein-Institut), (2) Leibniz Universit\"at Hannover)

(参考訳) コンパクト2元源の合体による重力波は、地球結合検出器によって日常的に観測されている。最も敏感な探索アルゴリズムは、多くの異なる計算済みの重力波形を検出器データと組み合わせ、異なる検出器間の一致を探索する。機械学習は、計算コストを削減し、より複雑な信号をターゲットとする探索アルゴリズムを構築するための代替手法として検討されている。本研究では、単一検出器からの非スピン性二元ブラックホールデータに基づいてトレーニングされたニューラルネットワークを用いて、二元ブラックホール融合による重力波の2検出器探索を構築する。ネットワークは2つの観測所のデータに独立して適用され、2つの観測所間で一致したイベントをチェックする。これにより、独立検出器データを時間シフトすることで、大量のバックグラウンドデータの効率的な分析が可能になる。単一検出器の場合、ネットワークは感度マッチングされたフィルタリングの91.5\%$を維持するが、この数は2つの観測値に対して83.9\%$となる。ネットワークが検出器内の信号一貫性をチェックするために、両方の検出器からのデータを直接操作する単純なネットワークセットを構築します。これらの単純な2検出器ネットワークはいずれも、検出器のデータに個別にネットワークを適用し、時間的偶然を検索するよりも感度を向上させることができない。

Gravitational waves from the coalescence of compact-binary sources are now routinely observed by Earth bound detectors. The most sensitive search algorithms convolve many different pre-calculated gravitational waveforms with the detector data and look for coincident matches between different detectors. Machine learning is being explored as an alternative approach to building a search algorithm that has the prospect to reduce computational costs and target more complex signals. In this work we construct a two-detector search for gravitational waves from binary black hole mergers using neural networks trained on non-spinning binary black hole data from a single detector. The network is applied to the data from both observatories independently and we check for events coincident in time between the two. This enables the efficient analysis of large quantities of background data by time-shifting the independent detector data. We find that while for a single detector the network retains $91.5\%$ of the sensitivity matched filtering can achieve, this number drops to $83.9\%$ for two observatories. To enable the network to check for signal consistency in the detectors, we then construct a set of simple networks that operate directly on data from both detectors. We find that none of these simple two-detector networks are capable of improving the sensitivity over applying networks individually to the data from the detectors and searching for time coincidences.

翻訳日:2021-08-25 14:05:36 公開日:2021-08-24

# 適応群lassoニューラルネットワークモデル : 少数の変数と時間依存データの関数について

Adaptive Group Lasso Neural Network Models for Functions of Few Variables and Time-Dependent Data ( http://arxiv.org/abs/2108.10825v1 )

ライセンス: Link先を確認

Lam Si Tung Ho and Giang Tran

(参考訳) 本稿では,動的システムから入力データが生成され,対象関数が少数のアクティブ変数や変数の線形結合に依存する高次元関数近似のための適応群lasso深層ニューラルネットワークを提案する。対象関数をディープニューラルネットワークで近似し,対象関数の制約を表現するために,適切な隠れ層の重みに対して適応群lasso制約を強制する。実験により,提案手法は,スパース辞書行列法,グループラッソペナルティの有無のニューラルネットワークなど,最近の最先端手法よりも優れていることが示された。

In this paper, we propose an adaptive group Lasso deep neural network for high-dimensional function approximation where input data are generated from a dynamical system and the target function depends on few active variables or few linear combinations of variables. We approximate the target function by a deep neural network and enforce an adaptive group Lasso constraint to the weights of a suitable hidden layer in order to represent the constraint on the target function. Our empirical studies show that the proposed method outperforms recent state-of-the-art methods including the sparse dictionary matrix method, neural networks with or without group Lasso penalty.

翻訳日:2021-08-25 14:05:14 公開日:2021-08-24

# (参考訳) SERF:log-Softplus ERrorActivation Functionを用いたディープニューラルネットワークのより良いトレーニングを目指して

SERF: Towards better training of deep neural networks using log-Softplus ERror activation Function ( http://arxiv.org/abs/2108.09598v2 )

ライセンス: CC BY 4.0

Sayan Nag, Mayukh Bhattacharyya

(参考訳) アクティベーション機能は、トレーニングダイナミクスとニューラルネットワークのパフォーマンスを決定する上で重要な役割を果たす。シンプルで有効であるにもかかわらず広く採用されているアクティベーション関数 ReLU には、Dying ReLU 問題を含むいくつかの欠点がある。そこで本研究では,自然界において自己正規化され,非単調であるサーフと呼ばれる新しい活性化関数を提案する。 Mishと同様に、SerfもSwishファミリーに属している。コンピュータビジョン(画像分類とオブジェクト検出)と自然言語処理(機械翻訳、感情分類、マルチモーダル・エンテーメント)の様々な実験に基づいて、SerfはReLU(ベースライン)とSwishとMishを含む他のアクティベーション機能を大きく上回っており、より深いアーキテクチャに顕著な差がある。アブレーション研究により、serfベースのアーキテクチャは様々なシナリオにおいてswishやmishよりも優れた性能を示し、様々な深さ、複雑さ、最適化、学習率、バッチサイズ、初期化器、ドロップアウト率でserfの有効性と互換性を検証する。最後に,SwishとSerfの数学的関係について検討し,よりスムーズかつ高速に勾配を最適化する正規化効果を提供するSerfの第1微分のプレコンディショナー関数の影響を示す。

Activation functions play a pivotal role in determining the training dynamics and neural network performance. The widely adopted activation function ReLU despite being simple and effective has few disadvantages including the Dying ReLU problem. In order to tackle such problems, we propose a novel activation function called Serf which is self-regularized and nonmonotonic in nature. Like Mish, Serf also belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification and multimodal entailment) tasks with different state-of-the-art architectures, it is observed that Serf vastly outperforms ReLU (baseline) and other activation functions including both Swish and Mish, with a markedly bigger margin on deeper architectures. Ablation studies further demonstrate that Serf based architectures perform better than those of Swish and Mish in varying scenarios, validating the effectiveness and compatibility of Serf with varying depth, complexity, optimizers, learning rates, batch sizes, initializers and dropout rates. Finally, we investigate the mathematical relation between Swish and Serf, thereby showing the impact of preconditioner function ingrained in the first derivative of Serf which provides a regularization effect making gradients smoother and optimization faster.

翻訳日:2021-08-25 11:52:32 公開日:2021-08-24

# (参考訳) 側面:構造対応インスタンス深度推定を用いたセンタベースステレオ3d検出器

SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation ( http://arxiv.org/abs/2108.09663v2 )

ライセンス: CC BY 4.0

Xidong Peng, Xinge Zhu, Tai Wang, and Yuexin Ma

(参考訳) 3D検出は環境認識において不可欠である。一般的に使用されるLiDARセンサーのコストが高いため、ステレオビジョンに基づく3D検出は経済的に効果的だが、近年は注目を集めている。 2次元画像に基づくこれらのアプローチでは、正確な深度情報が3次元検出の鍵となり、既存の手法のほとんどは、深度推定の予備段階に頼っている。それらは主にグローバルな深度に焦点を合わせ、この特定のタスク、すなわち空間と局所性における深度情報の性質を無視する。そこで本研究では, ステレオ画像を用いた立体画像によるアンカーフリー3D検出手法を提案し, 各オブジェクトのRoIsからコストボリュームを構成することで, インスタンスレベルの深度情報を探索する。局所的なコスト量の情報のスパース性から,さらに,マッチングの重み付けと構造認識の注意を導入し,奥行き情報の集中化を図る。 KITTIデータセットで行った実験から,本手法は深度マップの監督のない既存手法と比較して最先端の性能を実現することが示された。

3D detection plays an indispensable role in environment perception. Due to the high cost of commonly used LiDAR sensor, stereo vision based 3D detection, as an economical yet effective setting, attracts more attention recently. For these approaches based on 2D images, accurate depth information is the key to achieve 3D detection, and most existing methods resort to a preliminary stage for depth estimation. They mainly focus on the global depth and neglect the property of depth information in this specific task, namely, sparsity and locality, where exactly accurate depth is only needed for these 3D bounding boxes. Motivated by this finding, we propose a stereo-image based anchor-free 3D detection method, called structure-aware stereo 3D detector (termed as SIDE), where we explore the instance-level depth information via constructing the cost volume from RoIs of each object. Due to the information sparsity of local cost volume, we further introduce match reweighting and structure-aware attention, to make the depth information more concentrated. Experiments conducted on the KITTI dataset show that our method achieves the state-of-the-art performance compared to existing methods without depth map supervision.

翻訳日:2021-08-25 11:40:12 公開日:2021-08-24

# (参考訳) 回帰のための効率的なガウス神経プロセス

Efficient Gaussian Neural Processes for Regression ( http://arxiv.org/abs/2108.09676v2 )

ライセンス: CC BY 4.0

Stratis Markou, James Requeima, Wessel Bruinsma, Richard Turner

(参考訳) Conditional Neural Processs (CNP; Garnelo et al., 2018) は、よく校正された予測を生成し、テスト時に高速な推論を可能にし、単純な最大精度の手順でトレーニングできる、魅力的なメタラーニングモデルのファミリーである。 CNPの制限は、出力の依存性をモデル化できないことである。これにより予測性能が著しく低下し、コヒーレント関数サンプルの描画が不可能になるため、下流アプリケーションや意思決定におけるCNPの適用性が制限される。ニューラルプロセス(nps; garnelo et al., 2018)は、潜在変数を使用してこの問題を緩和し、出力依存性をモデル化するが、近似推論による困難をもたらす。最近の代替案 (Bruinsma et al.,2021) はFullConvGNPと呼ばれ、予測の依存性をモデル化し、正確な最大形でトレーニング可能である。残念ながらFullConvGNPは高価な2次元畳み込みに依存しており、1次元のデータしか適用できない。本研究では,出力依存性をモデル化する別の手法を提案する。この手法は,最大確率トレーニングにも応用できるが,fullconvgnpと異なり,2次元データと3次元データにスケールできる。提案手法は合成実験において良好な性能を示す。

Conditional Neural Processes (CNP; Garnelo et al., 2018) are an attractive family of meta-learning models which produce well-calibrated predictions, enable fast inference at test time, and are trainable via a simple maximum likelihood procedure. A limitation of CNPs is their inability to model dependencies in the outputs. This significantly hurts predictive performance and renders it impossible to draw coherent function samples, which limits the applicability of CNPs in down-stream applications and decision making. Neural Processes (NPs; Garnelo et al., 2018) attempt to alleviate this issue by using latent variables, relying on these to model output dependencies, but introduces difficulties stemming from approximate inference. One recent alternative (Bruinsma et al.,2021), which we refer to as the FullConvGNP, models dependencies in the predictions while still being trainable via exact maximum-likelihood. Unfortunately, the FullConvGNP relies on expensive 2D-dimensional convolutions, which limit its applicability to only one-dimensional data. In this work, we present an alternative way to model output dependencies which also lends itself maximum likelihood training but, unlike the FullConvGNP, can be scaled to two- and three-dimensional data. The proposed models exhibit good performance in synthetic experiments.

翻訳日:2021-08-25 11:25:44 公開日:2021-08-24

# (参考訳) 構成可能な3dシーンレイアウトによるリアル画像合成

Realistic Image Synthesis with Configurable 3D Scene Layouts ( http://arxiv.org/abs/2108.10031v2 )

ライセンス: CC BY 4.0

Jaebong Jeong, Janghun Jo, Jingdong Wang, Sunghyun Cho, Jaesik Park

(参考訳) 最近の条件付き画像合成手法は高品質な合成画像を提供する。しかし、オブジェクトの位置や向きなどの画像内容の正確な調整は依然として困難であり、合成画像は幾何学的に無効な内容を持つことが多い。 3次元幾何学的な側面から合成画像のリッチな制御性を実現するために,構成可能な3次元シーンレイアウトに基づくリアルな画像合成手法を提案する。提案手法はセマンティックなクラスラベルを持つ3Dシーンを入力として、入力された3Dシーンの色値を合成する3Dシーン描画ネットワークを訓練する。トレーニング済みのペイントネットワークでは、入力された3dシーンの写実的なイメージをレンダリングして操作することができる。絵画ネットワークを3Dカラー監視なしで訓練するために,市販の2Dセマンティック画像合成手法を利用する。実験では,本手法が幾何学的正しい構造をもつ画像を生成し,視点や物体のポーズの変化や絵画スタイルの操作といった幾何学的操作をサポートすることを示す。

Recent conditional image synthesis approaches provide high-quality synthesized images. However, it is still challenging to accurately adjust image contents such as the positions and orientations of objects, and synthesized images often have geometrically invalid contents. To provide users with rich controllability on synthesized images in the aspect of 3D geometry, we propose a novel approach to realistic-looking image synthesis based on a configurable 3D scene layout. Our approach takes a 3D scene with semantic class labels as input and trains a 3D scene painting network that synthesizes color values for the input 3D scene. With the trained painting network, realistic-looking images for the input 3D scene can be rendered and manipulated. To train the painting network without 3D color supervision, we exploit an off-the-shelf 2D semantic image synthesis method. In experiments, we show that our approach produces images with geometrically correct structures and supports geometric manipulation such as the change of the viewpoint and object poses as well as manipulation of the painting style.

翻訳日:2021-08-25 11:14:35 公開日:2021-08-24

# 深層ニューラルネットワークによる微生物コロニー検出法 -比較解析-

Deep neural networks approach to microbial colony detection -- a comparative analysis ( http://arxiv.org/abs/2108.10103v2 )

ライセンス: Link先を確認

Sylwia Majchrowska, Jaros{\l}aw Paw{\l}owski, Natalia Czerep, Aleksander G\'orecki, Jakub Kuci\'nski, and Tomasz Golan

(参考訳) 微生物コロニーの計数は微生物学の基本的な課題であり、多くの産業分野に応用されている。それにもかかわらず、人工知能を用いた自動微生物計数に関する最近の研究は、統一された方法論の欠如と大規模なデータセットの可用性のため、ほとんど比較できない。最近導入されたagarデータセットは、第2のニーズへの答えだが、研究はまだ不十分である。この問題に対処するため,AGARデータセット上での3つのよく知られたディープラーニング手法,すなわち2段階,1段階,トランスフォーマーに基づくニューラルネットワークの性能を比較した。得られた結果は将来の実験のベンチマークとして機能するかもしれない。

Counting microbial colonies is a fundamental task in microbiology and has many applications in numerous industry branches. Despite this, current studies towards automatic microbial counting using artificial intelligence are hardly comparable due to the lack of unified methodology and the availability of large datasets. The recently introduced AGAR dataset is the answer to the second need, but the research carried out is still not exhaustive. To tackle this problem, we compared the performance of three well-known deep learning approaches for object detection on the AGAR dataset, namely two-stage, one-stage and transformer based neural networks. The achieved results may serve as a benchmark for future experiments.

翻訳日:2021-08-25 10:58:57 公開日:2021-08-24

PDF登録状況（公開日: 20210824）