Fugu-MT: Translations of arXiv Papers

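This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translations are produced with Fugu-Machine Translator.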

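The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no liability whatsoever for any outcome arising from the use of this site (including all information and translations).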

The papers listed here have a publication date of 2021-12-24.

Title	Authors	Abstract	Publication date / translation date
# On the mathematical structure of quantum models of computation based on Hamiltonian minimisation ( http://arxiv.org/abs/2009.10088v2 ) License: check the linked page	Jacob Biamonte	Determining properties of ground states of spin Hamiltonians remains a topic of central relevance connecting disciplines of mathematical, theoretical and applied physics. In the last few decades, ground state properties of physical systems have been increasingly considered as computational resources. This thesis develops parts of the mathematical apparatus to create (program) ground states relevant for quantum and classical computation. The core findings presented in this thesis (now over a decade old) include that (i) logic operations (gates) can be embedded into the low-energy sector of Ising spins, whereas three (and higher) body Ising interaction terms can be mimicked through the minimisation of 2- and 1-body Ising terms yet require the introduction of slack spins; (ii) perturbation theory gadgets enable the emulation of interactions not present in a given Hamiltonian, e.g. $YY$ interactions can be realized from $ZZ$, $XX$. The thesis contains a result from 2007 showing that physically relevant two-body model Hamiltonians have a QMA-hard ground state energy decision problem. Merged with other results, this established that these models provide a universal resource for ground state quantum computation. More recent findings include the proof that an idealised version of the contemporary variational approach to quantum algorithms enables a universal model of quantum computation. Other related results are also presented as they relate to ground state quantum computation and the minimisation of Hamiltonians by quantum circuits. The topics covered include: Ising model reductions, stochastic versus quantum processes on graphs, quantum gates and circuits as tensor networks, variational quantum algorithms and Hamiltonian gadgets.	Translation date: 2023-05-01 09:14:35 Publication date: 2021-12-24
# Operating a passive on-chip superconducting circulator: device control and quasiparticle effects ( http://arxiv.org/abs/2103.02759v2 ) License: check the linked page	Dat Thanh Le, Clemens Muller, Rohit Navarathna, Arkady Fedorov, and T. M. Stace	Microwave circulators play an important role in quantum technology based on superconducting circuits. The conventional circulator design, which employs ferrite materials, is bulky and involves strong magnetic fields, rendering it unsuitable for integration on superconducting chips. One promising design for an on-chip superconducting circulator is based on a passive Josephson-junction ring. In this paper, we consider two operational issues for such a device: circuit tuning and the effects of quasiparticle tunneling. We compute the scattering matrix using adiabatic elimination and derive the parameter constraints to achieve optimal circulation. We then numerically optimize the circulator performance over the full set of external control parameters, including gate voltages and flux bias, to demonstrate that this multi-dimensional optimization converges quickly to find optimal working points. We also consider the possibility of quasiparticle tunneling in the circulator ring and how it affects signal circulation. Our results form the basis for practical operation of a passive on-chip superconducting circulator made from a ring of Josephson junctions.	Translation date: 2023-04-09 07:43:21 Publication date: 2021-12-24
# Floquet Engineering of Lie Algebraic Quantum Systems ( http://arxiv.org/abs/2103.15923v2 ) License: check the linked page	Jayendra N. Bandyopadhyay and Juzar Thingna	We propose a 'Floquet engineering' formalism to systematically design a periodic driving protocol in order to stroboscopically realize the desired system starting from a given static Hamiltonian. The formalism is applicable to quantum systems which have an underlying closed Lie-algebraic structure, for example, solid-state systems with noninteracting particles moving on a lattice or its variant described by ultra-cold atoms moving on an optical lattice. Unlike previous attempts at Floquet engineering, our method produces the desired Floquet Hamiltonian at any driving frequency and is not restricted to the fast or slow driving regimes. The approach is based on the Wei-Norman ansatz, which was originally proposed to construct a time-evolution operator for any arbitrary driving. Here, we apply this ansatz to the micro-motion dynamics, defined within one period of the driving, and obtain the driving protocol by fixing the gauge of the micro-motion. To illustrate our idea, we use a two-band system, or systems consisting of two sub-lattices, as a testbed. In particular, we focus on engineering the cross-stitched lattice model, which has been a paradigmatic flat-band model.	Translation date: 2023-04-06 05:40:31 Publication date: 2021-12-24
# Weak and ultrastrong coupling limits of the quantum mean force Gibbs state ( http://arxiv.org/abs/2104.12606v2 ) License: check the linked page	J. D. Cresser, J. Anders	The Gibbs state is widely taken to be the equilibrium state of a system in contact with an environment at temperature $T$. However, non-negligible interactions between system and environment can give rise to an altered state. Here we derive general expressions for this mean force Gibbs state, valid for any system that interacts with a bosonic reservoir. First, we derive the state in the weak coupling limit and find that, in general, it maintains coherences with respect to the bare system Hamiltonian. Second, we develop a new expansion method suited to investigate the ultrastrong coupling regime. This allows us to derive the explicit form for the mean force Gibbs state, and we find that it becomes diagonal in the basis set by the system-reservoir interaction instead of the system Hamiltonian. Several examples are discussed, including a single qubit, a three-level V-system and two coupled qubits, all interacting with bosonic reservoirs. The results shed light on the presence of coherences in the strong coupling regime, and provide key tools for nanoscale thermodynamics investigations.	Translation date: 2023-04-02 09:01:46 Publication date: 2021-12-24
# Implementation of field two-way quantum synchronization of distant clocks across a 7 km deployed fiber link ( http://arxiv.org/abs/2109.00784v2 ) License: check the linked page	Runai Quan, Huibo Hong, Wenxiang Xue, Honglei Quan, Wenyu Zhao, Xiao Xiang, Yuting Liu, Mingtao Cao, Tao Liu, Shougang Zhang, Ruifang Dong	Two-way quantum clock synchronization has been shown to provide not only femtosecond-level synchronization capability but also security against symmetric delay attacks, thus becoming a prospective method to compare and synchronize distant clocks with both enhanced precision and security. In this letter, a field test of two-way quantum synchronization between an H-maser and a Rb clock linked by a 7 km-long deployed fiber was implemented. Limited by the frequency stability of the Rb clock, the achieved time stability at 30 s was measured as 32 ps. By applying a fiber-optic microwave frequency transfer technology, the stability was improved by more than one order of magnitude to 1.9 ps, even though the number of acquired photon pairs was only 1440 in 30 s due to the low sampling rate of the utilized coincidence measurement system. Such an implementation demonstrates the high practicability of the two-way quantum clock synchronization method for promoting field applications.	Translation date: 2023-03-16 08:43:12 Publication date: 2021-12-24
# Characterization of the continuous transition from atomic to molecular shape in the three-body Coulomb system ( http://arxiv.org/abs/2109.04542v2 ) License: check the linked page	Laura D. Salas, Barbara Zamora-Yusti, and Julio C. Arce	We present an alternative, univocal characterization of the continuous transition from atomic to molecular shape in the Coulomb system constituted by two identical particles and a third particle with the opposite charge, as the mass ratio of the particles varies. Applying a marginal-conditional exact factorization to a variationally optimized wavefunction, we construct a nonadiabatic potential energy surface for the relative motion between the single particle and each of the identical particles in the ground state. The transition is revealed through the evolution with the mass ratio of the topography of this surface and of the shapes of the associated marginal and conditional distributions. Our approach unifies and extends to the nonadiabatic regime the Born-Oppenheimer and charge-distribution pictures of molecular shape.	Translation date: 2023-03-15 18:07:58 Publication date: 2021-12-24
# Tensor network simulation of the (1+1)-dimensional $O(3)$ nonlinear $\sigma$-model with $\theta=\pi$ term ( http://arxiv.org/abs/2109.11324v2 ) License: check the linked page	Wei Tang, X. C. Xie, Lei Wang, Hong-Hao Tu	We perform a tensor network simulation of the (1+1)-dimensional $O(3)$ nonlinear $\sigma$-model with $\theta=\pi$ term. Within the Hamiltonian formulation, this field theory emerges as the finite-temperature partition function of a modified quantum rotor model decorated with magnetic monopoles. Using the monopole harmonics basis, we derive the matrix representation for this modified quantum rotor model, which enables tensor network simulations. We employ our recently developed continuous matrix product operator method [Tang et al., Phys. Rev. Lett. 125, 170604 (2020)] to study the finite-temperature properties of this model and reveal its massless nature. The central charge as a function of the coupling constant is directly extracted in our calculations and compared with field theory predictions.	Translation date: 2023-03-13 23:13:24 Publication date: 2021-12-24
# A non-Hermitian optical atomic mirror ( http://arxiv.org/abs/2110.10070v2 ) License: check the linked page	Yi-Cheng Wang, Jhih-Shih You, H. H. Jen	Explorations of symmetry and topology have led to important breakthroughs in quantum optics, but much richer behaviors arise from the non-Hermitian nature of light-matter interactions. A high-reflectivity, non-Hermitian optical mirror can be realized by a two-dimensional subwavelength array of neutral atoms near the cooperative resonance associated with the collective dipole modes. Here we show that exceptional points develop from a nondefective degeneracy by lowering the crystal symmetry of a square atomic lattice, and dispersive bulk Fermi arcs that originate from exceptional points are truncated by the light cone. We also find that, although the dipole-dipole interaction is reciprocal, a geometry-dependent non-Hermitian skin effect emerges. Furthermore, skin modes localized at a boundary show a scale-free behavior that stems from the long-range interaction and whose mechanism goes beyond the framework of non-Bloch band theory. Our work opens the door to the study of the interplay among non-Hermiticity, topology, and long-range interaction.	Translation date: 2023-03-11 02:01:18 Publication date: 2021-12-24
# Room-temperature on-chip generation of heralded single photons with switchable orbital angular momentum ( http://arxiv.org/abs/2111.05594v2 ) License: check the linked page	Shan Zhang, Xue Feng, Wei Zhang, Kaiyu Cui, Fang Liu, and Yidong Huang	In quantum optics, orbital angular momentum (OAM) is very promising for achieving high-dimensional quantum states because its eigenvalues are infinite and discrete, quantized by the topological charge l. Here, a heralded single-photon source with switchable OAM modes is proposed and demonstrated on a silicon chip. At room temperature, heralded single photons with 11 OAM modes (l=2~6, -6~-1) have been successfully generated and switched through the thermo-optical effect. We believe that such an integrated quantum source with multiple OAM modes, operating at room temperature, would provide a practical platform for high-dimensional quantum information processing. Moreover, our proposed architecture can also be extended to other material systems to further improve the performance of the OAM quantum source.	Translation date: 2023-03-08 12:17:56 Publication date: 2021-12-24
# Qudit surface code and hypermap code ( http://arxiv.org/abs/2112.01752v2 ) License: check the linked page	Zihan Lei	In this article, we define a homological quantum code in arbitrary qudit dimension $D\geq{2}$ by directly defining CSS operators on a 2-complex $\Sigma$. When the 2-complex comes from a surface, we get a qudit surface code. Then we prove that the dimension of the code we defined always equals the size of the first homology group of $\Sigma$. Next, we extend the hypermap-homology quantum code proposed by Martin Leslie to the qudit case, and for every such hypermap code, we construct an abstract 2-complex whose homological quantum code, as just defined, equals it.	Translation date: 2023-03-06 00:16:08 Publication date: 2021-12-24
# Short composite rotation robust against two common systematic errors ( http://arxiv.org/abs/2112.12945v1 ) License: check the linked page	Shingo Kukita, Haruki Kiya, and Yasushi Kondo	Systematic errors hinder precise quantum control. Pulse length errors (PLEs) and off-resonance errors (OREs) are typical systematic errors that are encountered during one-qubit control. A composite pulse (CP) can help compensate for the effects of systematic errors during quantum operation. Several CPs that are robust against either PLE or ORE have been identified. However, few attempts have been made to construct CPs that are robust against both errors (bi-robust). We develop a novel bi-robust CP for one-qubit operations by modifying a PLE robust CP, which exhibits a shorter operation time than that of previously developed bi-robust CPs.	Translation date: 2023-03-03 09:16:47 Publication date: 2021-12-24
# Analytical solutions of the Schrodinger equation for the one-dimensional hydrogen molecular ion ( http://arxiv.org/abs/2112.13135v1 ) License: check the linked page	Stavros Theodorakis	We present analytical solutions of the Schrodinger equation for the one-dimensional hydrogen molecular ion. In particular, we present closed form expressions for the electronic energy curves of this system that correspond to the ground state and the first excited state. Our results agree with numerical solutions obtained before.	Translation date: 2023-03-03 09:15:53 Publication date: 2021-12-24
# Overview of Quantum Key Distribution Technique within IPsec Architecture ( http://arxiv.org/abs/2112.13105v1 ) License: check the linked page	Emir Dervisevic, Miralem Mehic	Quantum Key Distribution (QKD) is an approach for establishing symmetrical binary keys between distant users in an information-theoretically secure way. In this paper we provide an overview of existing solutions that integrate QKD within the most popular architecture for establishing secure communications in modern IP (Internet Protocol) networks - IPsec (Internet Protocol security). The provided overview can be used to further design the integration of QKD within the IPsec architecture striving for a standardized solution.	Translation date: 2023-03-03 09:15:49 Publication date: 2021-12-24
# High extraction efficiency source of photon pairs based on a quantum dot embedded in a broadband micropillar cavity ( http://arxiv.org/abs/2112.13074v1 ) License: check the linked page	Laia Ginés, Magdalena Moczała-Dusanowska, David Dlaka, Radim Hošák, Junior R. Gonzales-Ureta, Miroslav Ježek, Edmund Harbord, Ruth Oulton, Sven Höfling, Andrew B. Young, Christian Schneider, Ana Predojević	The generation of photon pairs in single quantum dots is based on a process that is, in its nature, deterministic. However, an efficient extraction of these photon pairs from a high-index semiconductor host material requires engineering of the photonic environment. We report on a micropillar-based device featuring an extraction efficiency of 69.4(10)% that is achieved by harnessing a broadband operation suitable for extraction of photon pairs emitted from a single quantum dot. In contrast to approaches that rely solely on Purcell enhancement to improve the extraction efficiency, our solution exploits a suppression of the emission into modes other than the cavity mode. Our technological implementation requires modest fabrication effort, enabling higher device yields that can be scaled up to meet the growing needs of quantum technologies. Furthermore, the design of the device can be further optimized to allow for an extraction efficiency of 85%.	Translation date: 2023-03-03 09:15:33 Publication date: 2021-12-24
# The intrinsic (trap-free) transistors based on perovskite single crystals with self-passivated surfaces ( http://arxiv.org/abs/2112.13056v1 ) License: check the linked page	V. Bruevich, L. Kasaei, S. Rangan, H. Hijazi, Z. Zhang, T. Emge, E. Andrei, R. A. Bartynski, L. C. Feldman, V. Podzorov	Lead-halide perovskites have emerged as novel semiconducting materials suitable for a variety of optoelectronic applications. However, fabrication of reliable perovskite field-effect transistors (FETs), the devices necessary for the fundamental and applied research on charge transport properties of this class of materials, has proven challenging. Here we demonstrate high-performance perovskite FETs based on epitaxial, single crystalline thin films of cesium lead bromide (CsPbBr3). An improved vapor-phase epitaxy has allowed growing truly large-area, atomically flat films of this perovskite with excellent structural and surface properties. FETs based on these CsPbBr3 films exhibit textbook transistor characteristics, with a very low hysteresis and high intrinsic charge carrier mobility. Availability of such high-performance devices has allowed the study of the Hall effect in perovskite FETs for the first time. Our magneto-transport measurements show that the charge carrier mobility of CsPbBr3 FETs increases on cooling, from ~30 cm$^2$ V$^{-1}$ s$^{-1}$ at room temperature to ~250 cm$^2$ V$^{-1}$ s$^{-1}$ at 50 K, exhibiting a band transport mostly limited by phonon scattering. The epitaxial growth and FET fabrication methodologies described here can be naturally extended to other perovskites, including the hybrid ones, thus representing a technological leap forward, overcoming the performance bottleneck in research on perovskite FETs.	Translation date: 2023-03-03 09:15:18 Publication date: 2021-12-24
# Simulating macroscopic quantum correlations in linear networks ( http://arxiv.org/abs/2112.13014v1 ) License: check the linked page	A. Dellios, Peter D. Drummond, Bogdan Opanchuk, Run Yan Teh, and Margaret D. Reid	Many developing quantum technologies make use of quantum networks of different types. Even linear quantum networks are nontrivial, as the output photon distributions can be exponentially complex. Despite this, they can still be computationally simulated. The methods used are transformations into equivalent phase-space representations, which can then be treated probabilistically. This provides an exceptionally useful tool for the prediction and validation of experimental results, including decoherence. As well as experiments in Gaussian boson sampling, which are intended to demonstrate quantum computational advantage, these methods are applicable to other types of entangled linear quantum networks as well. This paper provides a tutorial and review of work in this area, to explain quantum phase-space techniques using the positive-P and Wigner distributions.	Translation date: 2023-03-03 09:13:52 Publication date: 2021-12-24
# Weyl Geometry and Quantum Corrections ( http://arxiv.org/abs/2112.12964v1 ) License: check the linked page	Sijo K. Joseph	Recent research in the geometric formulation of quantum theory has implied that Weyl Geometry can be used to merge quantum theory and general relativity consistently as classical field theories. In the Weyl Geometric framework, it seems that both quantum theory and gravity can merge consistently, once quantum theory is geometrized. The extended differential geometry can modify the quantum mechanical results into a more general nonlinear framework. The author shows how the extended differential geometry modifies the known quantum equations and also Maxwell's electromagnetic equations.	Translation date: 2023-03-03 09:13:03 Publication date: 2021-12-24
# Guesswork with Quantum Side Information ( http://arxiv.org/abs/2001.03598v3 ) License: check the linked page	Eric P. Hanson, Vishal Katariya, Nilanjana Datta, Mark M. Wilde	What is the minimum number of guesses needed on average to correctly guess a realization of a random variable? The answer to this question led to the introduction of the notion of a quantity called guesswork by Massey in 1994, which can be viewed as an alternate security criterion to entropy. In this paper, we consider the guesswork in the presence of quantum side information, and show that a general sequential guessing strategy is equivalent to performing a single measurement and choosing a guessing strategy from the outcome. We use this result to deduce entropic one-shot and asymptotic bounds on the guesswork in the presence of quantum side information, and to formulate a semi-definite program (SDP) to calculate the quantity. We evaluate the guesswork for a simple example involving the BB84 states, both numerically and analytically, and prove a continuity result that certifies the security of slightly imperfect key states when the guesswork is used as the security criterion.	Translation date: 2023-01-12 23:20:59 Publication date: 2021-12-24
# Trees, forests, and impurity-based variable importance ( http://arxiv.org/abs/2001.04295v3 ) License: check the linked page	Erwan Scornet (CMAP)	Tree ensemble methods such as random forests [Breiman, 2001] are very popular to handle high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making problems, settling for the best predictive procedures may not be reasonable since enlightened decisions require an in-depth comprehension of the algorithm prediction process. Unfortunately, random forests are not intrinsically interpretable since their prediction results from averaging several hundreds of decision trees. A classic approach to gain knowledge on this so-called black-box algorithm is to compute variable importances, which are employed to assess the predictive impact of each input variable. Variable importances are then used to rank or select variables and thus play a great role in data analysis. Nevertheless, there is no justification to use random forest variable importances in such a way: we do not even know what these quantities estimate. In this paper, we analyze one of the two well-known random forest variable importances, the Mean Decrease Impurity (MDI). We prove that if input variables are independent and in absence of interactions, MDI provides a variance decomposition of the output, where the contribution of each variable is clearly identified. We also study models exhibiting dependence between input variables or interaction, for which the variable importance is intrinsically ill-defined. Our analysis shows that there may exist some benefits to using a forest compared to a single tree.	Translation date: 2023-01-11 23:52:24 Publication date: 2021-12-24
# 短・雑音テキストストリームのためのグラフ畳み込みトピックモデル A Graph Convolutional Topic Model for Short and Noisy Text Streams ( http://arxiv.org/abs/2003.06112v4 ) ライセンス: Link先を確認	Ngo Van Linh, Tran Xuan Bach and Khoat Than	(参考訳) データストリームから隠れたトピックを学ぶことは、必然的に必要だが、コンセプトドリフトや、短くて騒がしいデータといった難しい問題を引き起こした。トピックモデルを強化するために事前知識を使用することは、これらの課題に対処する潜在的な解決策の1つです。人的知識(Wordnetなど)や事前訓練されたモデル(Word2vecなど)から派生した事前知識は、トピックモデルがよりうまく機能するのに非常に有用である。しかし、データが継続的に無限に届くストリーミング環境では、既存の研究はこれらのリソースを効果的に活用することに限定されている。特に意味のある単語関係を含む知識グラフは無視される。本稿では,知識グラフを効果的に活用することを目的として,グラフ畳み込みネットワーク(gcn)をトピックモデルに統合する新しいグラフ畳み込みトピックモデル(gctm)と,データストリームに対してネットワークとトピックモデルを同時に学習する学習方法を提案する。各ミニバッチでは,外部ナレッジグラフを活用できるだけでなく,外部ナレッジグラフと古いナレッジのバランスをとることができ,新しいデータでうまく機能する。我々は,人間の知識グラフ(Wordnet)と事前学習した単語埋め込み(Word2vec)から構築したグラフ(Word2vec)を用いて,提案手法の評価を行う。提案手法は,確率的予測尺度とトピックコヒーレンスの観点から,最先端のベースラインよりもはるかに優れた性能が得られることを示す。特に,本手法は,短いテキストやコンセプトドリフトを扱う場合にも有効である。 GCTMの実装は \url{https://github.com/bachtranxuan/GCTM.git} で利用可能である。 Learning hidden topics from data streams has become absolutely necessary but posed challenging problems such as concept drift as well as short and noisy data. Using prior knowledge to enrich a topic model is one of potential solutions to cope with these challenges. Prior knowledge that is derived from human knowledge (e.g. Wordnet) or a pre-trained model (e.g. Word2vec) is very valuable and useful to help topic models work better. However, in a streaming environment where data arrives continually and infinitely, existing studies are limited to exploiting these resources effectively. Especially, a knowledge graph, that contains meaningful word relations, is ignored. In this paper, to aim at exploiting a knowledge graph effectively, we propose a novel graph convolutional topic model (GCTM) which integrates graph convolutional networks (GCN) into a topic model and a learning method which learns the networks and the topic model simultaneously for data streams. 
In each minibatch, our method can not only exploit an external knowledge graph but also balance the external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both a human knowledge graph (Wordnet) and a graph built from pre-trained word embeddings (Word2vec). The experimental results show that our method achieves significantly better performance than state-of-the-art baselines in terms of a probabilistic predictive measure and topic coherence. In particular, our method works well when dealing with short texts as well as concept drift. The implementation of GCTM is available at https://github.com/bachtranxuan/GCTM.git.	翻訳日:2022-12-24 01:04:52 公開日:2021-12-24
# ソース知識とターゲット関連性を同時に伝達する非教師なしドメイン適応 Unsupervised Domain Adaptation Through Transferring both the Source-Knowledge and Target-Relatedness Simultaneously ( http://arxiv.org/abs/2003.08051v3 ) ライセンス: Link先を確認	Qing Tian, Yanan Zhu, Chuang Ma, Meng Cao	(参考訳) 教師なしドメイン適応(Unsupervised domain adaptation, UDA)は、機械学習とパターン認識の分野における新たな研究トピックであり、ソースドメインから知識を伝達することで、ラベルなしのターゲットドメインの学習を支援することを目的としている。 Unsupervised domain adaptation (UDA) is an emerging research topic in the field of machine learning and pattern recognition, which aims to help learning on an unlabeled target domain by transferring knowledge from the source domain.	翻訳日:2022-12-22 09:31:08 公開日:2021-12-24
# ランダムドット製品グラフの連続推論のための1次元部分多様体の学習 Learning 1-Dimensional Submanifolds for Subsequent Inference on Random Dot Product Graphs ( http://arxiv.org/abs/2004.07348v6 ) ライセンス: Link先を確認	Michael W. Trosset, Mingyue Gao, Minh Tang, Carey E. Priebe	(参考訳) ランダムドット積グラフ(RDPG)は、潜伏ユークリッド空間における頂点が位置に対応するネットワークの生成モデルであり、潜伏位置の点積によってエッジ確率が決定される。潜在空間の未知の1$次元部分多様体から潜在位置をランダムにサンプリングするRDPGを考察する。原則として、制限された推論、すなわち、部分多様体の構造を利用する手順は、制限されていない推論よりも効果的であるべきであるが、部分多様体が未知のときに制限された推論を実行する方法が明確でない。多様体学習の手法は、制限された推論の利点を実現するのに十分十分に未知の部分多様体を学習するために使うことができる。説明するために、我々は、小さな頂点のコミュニティのFr\'{e}chet手段に関する1ドルおよび2ドルサンプル仮説を、潜在構造を推論するために、完全な頂点集合を用いてテストした。本研究では,推定潜在位置から構築された近傍グラフ上の最短経路距離を用いて,未知の1$次元部分多様体上の弧長を推定する等マップ法を多様体学習に適用するテスト統計を提案する。従来のイソマプの応用とは異なり、推定された潜在位置は興味のサブ多様体には属さない。我々は、isomap の既存の収束結果をこの設定に拡張し、それらを用いて、補助頂点の数が増えるにつれて、部分多様体が知られているとき、テストのパワーは対応するテストのパワーに収束することを示す。最後に,本手法をショウジョウバエ幼生のキノコのコネクトームを研究する際に生じる推論問題に適用する。不定値学習多様体検定(英語版)(univariate learnt manifold test)は(p<0.05$)を拒絶するが、多変量環境空間検定(英語版)(multivariate ambient space test)は(p\gg0.05$)ではない。 A random dot product graph (RDPG) is a generative model for networks in which vertices correspond to positions in a latent Euclidean space and edge probabilities are determined by the dot products of the latent positions. We consider RDPGs for which the latent positions are randomly sampled from an unknown $1$-dimensional submanifold of the latent space. In principle, restricted inference, i.e., procedures that exploit the structure of the submanifold, should be more effective than unrestricted inference; however, it is not clear how to conduct restricted inference when the submanifold is unknown. We submit that techniques for manifold learning can be used to learn the unknown submanifold well enough to realize benefit from restricted inference. To illustrate, we test $1$- and $2$-sample hypotheses about the Fr\'{e}chet means of small communities of vertices, using the complete set of vertices to infer latent structure. 
We propose test statistics that deploy the Isomap procedure for manifold learning, using shortest path distances on neighborhood graphs constructed from estimated latent positions to estimate arc lengths on the unknown $1$-dimensional submanifold. Unlike conventional applications of Isomap, the estimated latent positions do not lie on the submanifold of interest. We extend existing convergence results for Isomap to this setting and use them to demonstrate that, as the number of auxiliary vertices increases, the power of our test converges to the power of the corresponding test when the submanifold is known. Finally, we apply our methods to an inference problem that arises in studying the connectome of the Drosophila larval mushroom body. The univariate learnt manifold test rejects ($p<0.05$), while the multivariate ambient space test does not ($p\gg0.05$), illustrating the value of identifying and exploiting low-dimensional structure for subsequent inference.	翻訳日:2022-12-13 02:54:21 公開日:2021-12-24
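The arc-length estimation idea in the abstract above (shortest path distances on a neighborhood graph standing in for distances along a one-dimensional submanifold) can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the authors' implementation: the points are noise-free samples from a half circle, whereas the paper works with estimated latent positions that lie off the submanifold, and the helper names `knn_graph` and `shortest_path_length` are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; returns math.inf if target is unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


# Assumed toy data: 50 points on a half circle of radius 1 (true arc length pi).
n = 50
points = [
    (math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
    for i in range(n)
]
graph = knn_graph(points, k=2)
arc_estimate = shortest_path_length(graph, 0, n - 1)  # distance along the curve
chord = math.dist(points[0], points[-1])  # straight-line (ambient) distance
```

In this toy setting the graph distance between the two endpoints is close to the arc length, while the ambient Euclidean distance is the diameter of the circle, which is the kind of gap a restricted test can exploit.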
# 定常時間系列からの推論のための学習因子グラフ Learned Factor Graphs for Inference from Stationary Time Sequences ( http://arxiv.org/abs/2006.03258v4 ) ライセンス: Link先を確認	Nir Shlezinger, Nariman Farsad, Yonina C. Eldar, and Andrea J. Goldsmith	(参考訳) 時系列からの推論法の設計は、伝統的に、潜在希望シーケンスと観測されたシーケンスの関係を記述する統計モデルに依存してきた。モデルに基づくアルゴリズムの幅広い系統が導出され、基礎となる分布を表す因子グラフ上の再帰的計算を用いて制御可能な複雑性で推論を行う。別のモデルに依存しないアプローチでは、機械学習(ML)手法を用いる。本稿では,モデルベースアルゴリズムとデータ駆動型MLツールを組み合わせた定常時間列のフレームワークを提案する。提案手法では、完全な推論タスクではなく、時間系列の分布を記述する因子グラフの特定の成分を別々に学習するためにニューラルネットワークが開発された。この分布の定常特性を利用することで、結果のアプローチを時間的変化の列に適用することができる。学習された因子グラフは、小さなトレーニングセットを使用してトレーニング可能なコンパクトニューラルネットワークを使用して実現することができる。本稿では,ラベル付きデータから和生成スキームを実装することを学習し,異なる長さのシーケンスに適用可能な,学習した定常因子グラフに基づく推論アルゴリズムを提案する。提案する学習因子グラフは,sleep-edfデータセットを用いた睡眠ステージ検出や未知チャネルを用いたデジタル通信におけるシンボル検出のために,小さなトレーニングセットから正確な推論を行うことができることを示す。 The design of methods for inference from time sequences has traditionally relied on statistical models that describe the relation between a latent desired sequence and the observed one. A broad family of model-based algorithms has been derived to carry out inference at controllable complexity using recursive computations over the factor graph representing the underlying distribution. An alternative model-agnostic approach utilizes machine learning (ML) methods. Here we propose a framework that combines model-based algorithms and data-driven ML tools for stationary time sequences. In the proposed approach, neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence, rather than the complete inference task. By exploiting stationary properties of this distribution, the resulting approach can be applied to sequences of varying temporal duration. Learned factor graphs can be realized using compact neural networks that are trainable using small training sets, or alternatively, be used to improve upon existing deep inference systems. 
We present an inference algorithm based on learned stationary factor graphs, which learns to implement the sum-product scheme from labeled data, and can be applied to sequences of different lengths. Our experimental results demonstrate the ability of the proposed learned factor graphs to learn to carry out accurate inference from small training sets for sleep stage detection using the Sleep-EDF dataset, as well as for symbol detection in digital communications with unknown channels.	翻訳日:2022-11-25 03:26:18 公開日:2021-12-24
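To make the sum-product step in the abstract above concrete, here is a minimal sketch of forward-backward message passing on a stationary chain. It is a generic textbook version, not the authors' code: the function name `sum_product_chain` is hypothetical, and the per-step likelihood table is an assumed stand-in for whatever a learned component (for example a small neural network) would output. Because the same transition table is reused at every step, the routine applies to sequences of any length.

```python
def sum_product_chain(prior, transition, likelihoods):
    """Posterior state marginals for a stationary hidden Markov chain.

    prior[s]          : probability of state s at time 0
    transition[r][s]  : probability of moving from state r to state s
    likelihoods[t][s] : p(observation at time t | state s)
    """
    n_states = len(prior)
    n_steps = len(likelihoods)

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Forward messages (normalised at each step for numerical stability).
    fwd = [normalise([prior[s] * likelihoods[0][s] for s in range(n_states)])]
    for t in range(1, n_steps):
        fwd.append(normalise([
            likelihoods[t][s]
            * sum(fwd[-1][r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]))

    # Backward messages.
    bwd = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        bwd[t] = normalise([
            sum(
                transition[s][r] * likelihoods[t + 1][r] * bwd[t + 1][r]
                for r in range(n_states)
            )
            for s in range(n_states)
        ])

    # Marginals are the normalised product of forward and backward messages.
    return [
        normalise([fwd[t][s] * bwd[t][s] for s in range(n_states)])
        for t in range(n_steps)
    ]


# Assumed toy model: two states that tend to persist.
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
marginals = sum_product_chain(
    prior, transition, [[0.8, 0.2], [0.8, 0.2], [0.3, 0.7]]
)
single_step = sum_product_chain(prior, transition, [[0.8, 0.2]])
```

In a learned factor graph, only the likelihood (or factor) values would come from a trained model; the message-passing recursion itself stays as above.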
# 可逆ニューラルネットワークにおける爆発的逆の理解と緩和 Understanding and Mitigating Exploding Inverses in Invertible Neural Networks ( http://arxiv.org/abs/2006.09347v2 ) ライセンス: Link先を確認	Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, J\"orn-Henrik Jacobsen	(参考訳) Invertible Neural Network (INN) は、生成モデルの設計、メモリ節約勾配計算の実装、逆問題の解決に使われている。本研究は,よく使われる INN アーキテクチャが爆発する逆数に悩まされ,数値的に非可逆になる傾向があることを示す。 In-out-of-distribution(OOD)データにおける変数変更の非適用性、メモリセービングバックプロップの誤勾配、正規化フローモデルからのサンプリング不能など、広範囲にわたるINNユースケースの障害を明らかにする。さらに、共通アーキテクチャの原子構造ブロックの双Lipschitz特性を導出する。 INNの安定性に関するこれらの洞察は、これらの障害を治療する方法を提供します。メモリ節約バックプロップのように局所可逆性が十分であるタスクに対しては、柔軟で効率的な正規化子を提案する。 OODデータに正規化フローを適用するなど、グローバルな可逆性が必要な問題に対しては、安定したINNビルディングブロックを設計することの重要性を示す。 Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems. In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of the change-of-variables formula on in- and out-of-distribution (OOD) data, incorrect gradients for memory-saving backprop, and the inability to sample from normalizing flow models. We further derive bi-Lipschitz properties of atomic building blocks of common architectures. These insights into the stability of INNs then provide ways forward to remedy these failures. For tasks where local invertibility is sufficient, like memory-saving backprop, we propose a flexible and efficient regularizer. For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.	翻訳日:2022-11-20 19:37:04 公開日:2021-12-24
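A small numerical experiment can illustrate what "numerically non-invertible" means in the abstract above. The sketch below uses an affine coupling transform as an assumed example of an INN building block; in a real network the log-scale and shift would be functions of the pass-through input, while here they are fixed constants chosen for illustration, and all function names are hypothetical.

```python
import math


def coupling_forward(x1, x2, log_scale, shift):
    """Affine coupling: x1 passes through, x2 is scaled and shifted."""
    return x1, x2 * math.exp(log_scale) + shift


def coupling_inverse(y1, y2, log_scale, shift):
    """Analytical inverse of coupling_forward."""
    return y1, (y2 - shift) * math.exp(-log_scale)


def reconstruction_error(log_scale, x2=0.3, shift=1.0):
    """Round-trip error |inverse(forward(x2)) - x2| in double precision."""
    _, y2 = coupling_forward(0.0, x2, log_scale, shift)
    _, x2_rec = coupling_inverse(0.0, y2, log_scale, shift)
    return abs(x2_rec - x2)


# A mildly contractive forward pass inverts accurately ...
mild_error = reconstruction_error(log_scale=-1.0)
# ... but a strongly contractive one loses x2 to rounding, and the inverse,
# whose Lipschitz constant is exp(40), cannot recover it.
severe_error = reconstruction_error(log_scale=-40.0)
```

The map is invertible on paper in both cases; the failure in the second case comes purely from finite precision combined with a very large inverse Lipschitz constant, which is why bounding bi-Lipschitz constants is a natural stability criterion.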
# グラフニューラルネットワークのアーキテクチャ的意味 Architectural Implications of Graph Neural Networks ( http://arxiv.org/abs/2009.00804v2 ) ライセンス: Link先を確認	Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造を操作するディープラーニングモデルの新たなラインである。多くのグラフ関連のタスクで高い精度を達成しているため、ますます人気が高まっている。しかしながら、GNNはシステムやアーキテクチャのコミュニティにおいて、多層パーセプトロンや畳み込みニューラルネットワークのようなそれと同等のものほどよく理解されていない。この作業は、コミュニティにGNNを導入しようとします。 GCNの特性のみを示す以前の作業とは対照的に、一般的なGNN記述フレームワークに基づいて、GNNワークロードの品種の大部分をカバーしています。 2つの広く使われているライブラリの上にモデルを構築することで、汎用およびアプリケーション固有のアーキテクチャに関する推論ステージでのgnn計算を特徴付け、gnnのシステムおよびアーキテクチャ研究をさらに促進できることを期待します。 Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures. They are becoming increasingly popular due to the high accuracy they achieve in many graph-related tasks. However, GNNs are not as well understood in the system and architecture community as counterparts such as multi-layer perceptrons and convolutional neural networks. This work tries to introduce GNNs to our community. In contrast to prior work that only presents characterizations of GCNs, our work covers a large portion of the varieties of GNN workloads based on a general GNN description framework. By constructing the models on top of two widely-used libraries, we characterize GNN computation at the inference stage concerning general-purpose and application-specific architectures, and we hope our work can foster more system and architecture research for GNNs.	翻訳日:2022-10-22 20:04:00 公開日:2021-12-24
# リフレクティブデコーディング:オフザシェルフ言語モデルによる一方向生成を超えて Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models ( http://arxiv.org/abs/2010.08566v4 ) ライセンス: Link先を確認	Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena Hwang, Yejin Choi	(参考訳) 一般公開され、大きな事前訓練された言語モデル(LM)は、顕著な品質のテキストを生成するが、左から右へしか連続しない。結果として、それらは、タスク固有の監督を必要とするパラフレージングやテキストインフィルなどの一方向の仮定を破る生成タスクに即座には適用されない。本稿では,一方向LMの非順序タスクへの直接適用を可能にする新しい非教師付きアルゴリズムであるリフレクティブデコーディングを提案する。 2段階のアプローチでは、監督やパラレルコーパスは必要ありません。まず、文脈化ステップにおいて、私たちはLMを使用して、入力をまとめてキャプチャする過去と将来のコンテキストのアンサンブルを生成します(例えば、パラフレーズのソース文)。第2に、リフレクションステップでは、これらの「コンテキストアンサンブル」を条件とし、それらと互換性のある出力を生成する。包括的実証実験の結果、反射的復号法はパラフレージングと帰納的テキスト埋入の両方において強い教師なしベースラインを上回り、教師なしメソッドと教師なしメソッドのギャップを著しく狭めることが示された。反射復号は、人的評価を含む様々な指標で複数の教師付きベースラインを超える。 Publicly available, large pretrained language models (LMs) generate text with remarkable quality, but only sequentially from left to right. As a result, they are not immediately applicable to generation tasks that break the unidirectional assumption, such as paraphrasing or text-infilling, necessitating task-specific supervision. In this paper, we present Reflective Decoding, a novel unsupervised algorithm that allows for direct application of unidirectional LMs to non-sequential tasks. Our 2-step approach requires no supervision or even parallel corpora, only two off-the-shelf pretrained LMs in opposite directions: forward and backward. First, in the contextualization step, we use LMs to generate ensembles of past and future contexts which collectively capture the input (e.g. the source sentence for paraphrasing). Second, in the reflection step, we condition on these "context ensembles", generating outputs that are compatible with them. Comprehensive empirical results demonstrate that Reflective Decoding outperforms strong unsupervised baselines on both paraphrasing and abductive text infilling, significantly narrowing the gap between unsupervised and supervised methods. 
Reflective Decoding surpasses multiple supervised baselines on various metrics including human evaluation.	翻訳日:2022-10-06 21:05:28 公開日:2021-12-24
# メタ強化学習による動的チャネルアクセス Dynamic Channel Access via Meta-Reinforcement Learning ( http://arxiv.org/abs/2201.09075v1 ) ライセンス: Link先を確認	Ziyang Lu and M. Cenk Gursoy	(参考訳) 本稿では,メタ強化学習による動的無線環境におけるチャネルアクセス問題に対処する。 spectrumは、特にネットワーク内のデバイス数の増加に伴い、無線通信において不足しているリソースである。近年,深部強化学習(DRL)の成功に触発されて,DRLを介して無線リソース割り当て問題に対処する研究が盛んに行われている。しかし、DRLアルゴリズムのトレーニングには、通常、特定のタスクごとに環境から収集された大量のデータが必要である。本研究では,これらの課題に対処するために,モデル非依存型メタラーニング(MAML)の手法を取り入れたメタDRLフレームワークを提案する。提案手法では,類似するチャネル選択タスクに対して共通初期化を訓練する。初期化から、同じ分布から引き出された異なるタスクに適応するためには、わずかに勾配降下が要求される。シミュレーション結果による性能改善を実証する。 In this paper, we address the channel access problem in a dynamic wireless environment via meta-reinforcement learning. Spectrum is a scarce resource in wireless communications, especially with the dramatic increase in the number of devices in networks. Recently, inspired by the success of deep reinforcement learning (DRL), extensive studies have been conducted in addressing wireless resource allocation problems via DRL. However, training DRL algorithms usually requires a massive amount of data collected from the environment for each specific task, and a well-trained model may fail if there is a small variation in the environment. In this work, in order to address these challenges, we propose a meta-DRL framework that incorporates the method of Model-Agnostic Meta-Learning (MAML). In the proposed framework, we train a common initialization for similar channel selection tasks. From the initialization, we show that only a few gradient descent steps are required for adapting to different tasks drawn from the same distribution. We demonstrate the performance improvements via simulation results.	翻訳日:2022-01-30 11:52:26 公開日:2021-12-24
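The idea of training a common initialization that adapts in a few gradient steps can be illustrated on a deliberately tiny problem. The sketch below is a toy, supervised stand-in and not the paper's channel-access setup: each "task" is an assumed one-dimensional quadratic loss (theta - c)^2 with its own target c, the MAML meta-gradient is written out by hand through one inner step, and the function names are hypothetical.

```python
def maml_meta_init(task_targets, inner_lr=0.1, outer_lr=0.1, steps=500):
    """Learn an initialisation theta that adapts well after one inner step.

    Each task has loss (theta - c)^2 for its own target c.
    """
    theta = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_targets:
            # Inner adaptation: one gradient step on the task loss.
            adapted = theta - inner_lr * 2.0 * (theta - c)
            # Outer gradient of (adapted - c)^2 with respect to theta,
            # differentiating through the inner step (chain rule).
            d_adapted_d_theta = 1.0 - 2.0 * inner_lr
            meta_grad += 2.0 * (adapted - c) * d_adapted_d_theta
        theta -= outer_lr * meta_grad / len(task_targets)
    return theta


def adapt(theta, c, inner_lr=0.1, n_steps=3):
    """Fine-tune an initialisation on one task with a few gradient steps."""
    for _ in range(n_steps):
        theta -= inner_lr * 2.0 * (theta - c)
    return theta


meta_theta = maml_meta_init([1.0, 2.0, 3.0])
adapted_theta = adapt(meta_theta, c=3.0)
```

For these symmetric quadratic tasks the learned initialization sits at the mean of the task targets, from which each task is reachable in a few steps; in a DRL setting the scalar would be replaced by network weights and the loss by a policy or value objective.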
# ラーニング・ツー・ランク蒸留を用いた効率的な組合せ最適化モデル An Efficient Combinatorial Optimization Model Using Learning-to-Rank Distillation ( http://arxiv.org/abs/2201.00695v1 ) ライセンス: Link先を確認	Honguk Woo, Hyunsung Lee, Sangwoo Cho	(参考訳) 近年,複合最適化問題(COP)の解法として深部強化学習(RL)が実現可能であることが証明されている。本手法は情報検索の分野で研究されている。いくつかのCOPは入力項目の優先順位付けとして定式化できるが、情報検索でよく見られるように、COPの深部RLにどのように学習から階級への技法を組み込むかは、完全には解明されていない。本稿では、COPのRLにより得られる高性能なランク付けポリシーを非定位単純モデルに蒸留し、低遅延COPソルバを実現するための、学習からランクへの蒸留に基づくCOPフレームワークを提案する。具体的には、近似されたランキング蒸留を用いて、勾配降下によるスコアベースランキングモデルを学習可能にする。さらに,効率的なシーケンスサンプリングを用いて,遅延の少ない推論性能を向上させる。このフレームワークを用いて,蒸留モデルがそれぞれの高性能RLに匹敵する性能を得るだけでなく,数倍高速な推算を行うことを示した。優先度に基づくタスクスケジューリングや多次元knapsackなど,複数のCOPを用いてフレームワークの評価を行い,推論遅延と性能の観点からフレームワークの利点を実証した。 Recently, deep reinforcement learning (RL) has proven its feasibility in solving combinatorial optimization problems (COPs). The learning-to-rank techniques have been studied in the field of information retrieval. While several COPs can be formulated as the prioritization of input items, as is common in the information retrieval, it has not been fully explored how the learning-to-rank techniques can be incorporated into deep RL for COPs. In this paper, we present the learning-to-rank distillation-based COP framework, where a high-performance ranking policy obtained by RL for a COP can be distilled into a non-iterative, simple model, thereby achieving a low-latency COP solver. Specifically, we employ the approximated ranking distillation to render a score-based ranking model learnable via gradient descent. Furthermore, we use the efficient sequence sampling to improve the inference performance with a limited delay. With the framework, we demonstrate that a distilled model not only achieves comparable performance to its respective, high-performance RL, but also provides several times faster inferences. 
We evaluate the framework with several COPs such as priority-based task scheduling and multidimensional knapsack, demonstrating the benefits of the framework in terms of inference latency and performance.	翻訳日:2022-01-09 13:30:51 公開日:2021-12-24
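One reason ranking needs an approximation before it can be trained by gradient descent is that the rank of an item is a step function of the scores. A common smooth surrogate replaces each pairwise comparison with a sigmoid; the sketch below shows that surrogate only as an assumed illustration, and it is not claimed to be the approximated ranking distillation used in the paper. The function names and the temperature value are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_ranks(scores, temperature=0.01):
    """Smooth approximation of descending ranks (1 = highest score).

    rank_i is approximated by 1 + sum over j != i of
    sigmoid((s_j - s_i) / temperature), which approaches the true rank
    as temperature goes to 0 and is differentiable in the scores.
    """
    ranks = []
    for i, s_i in enumerate(scores):
        rank = 1.0 + sum(
            sigmoid((s_j - s_i) / temperature)
            for j, s_j in enumerate(scores)
            if j != i
        )
        ranks.append(rank)
    return ranks


ranks = soft_ranks([3.0, 1.0, 2.0])
```

A student scoring model could then be trained so that its soft ranks match the ranking produced by a teacher policy; a larger temperature gives smoother gradients at the cost of a looser approximation.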
# 状態選択アルゴリズムと状態フルネットワークプロトコルのファジング性能への影響 State Selection Algorithms and Their Impact on The Performance of Stateful Network Protocol Fuzzing ( http://arxiv.org/abs/2112.15498v1 ) ライセンス: Link先を確認	Dongge Liu, Van-Thuan Pham, Gidon Ernst, Toby Murray, and Benjamin I.P. Rubinstein	(参考訳) ネットワークプロトコルの実装のステートフル性は、ファジングを含むテストと検証のテクニックにユニークな課題をもたらす。ステートフルなファズナーは状態モデルを利用して状態空間を分割し、テスト生成プロセスを支援する。すべての状態が等しく重要で、ファジングキャンペーンには時間制限があるわけではないので、ファジングは進歩的な状態を他の状態よりも優先する効果的な状態選択アルゴリズムを必要とする。いくつかの状態選択アルゴリズムが提案されているが、異なるプラットフォーム上で個別に実装され評価され、決定的な結果を得るのが困難である。本研究では,ネットワークサーバ用最先端ファジタであるAFLNetと同一のファジリングプラットフォーム上で,広範な状態選択アルゴリズムの評価を行う。このアルゴリズムセットには、AFLNetとAFLNetLegionと呼ばれる新しい原理のアルゴリズムがサポートされている。 ProFuzzBenchベンチマークの実験結果から, (i) AFLNetの既存の状態選択アルゴリズムは、非常によく似たコードカバレッジを実現する。 (ii) aflnetlegionは、選択されたケーススタディでこれらのアルゴリズムを明らかに上回っているが (iii)全体的な改善は無意味である。これらは予想外だが興味深い発見だ。この問題を特定し、今後の研究機会を開く可能性のある洞察を共有します。 The statefulness property of network protocol implementations poses a unique challenge for testing and verification techniques, including Fuzzing. Stateful fuzzers tackle this challenge by leveraging state models to partition the state space and assist the test generation process. Since not all states are equally important and fuzzing campaigns have time limits, fuzzers need effective state selection algorithms to prioritize progressive states over others. Several state selection algorithms have been proposed but they were implemented and evaluated separately on different platforms, making it hard to achieve conclusive findings. In this work, we evaluate an extensive set of state selection algorithms on the same fuzzing platform that is AFLNet, a state-of-the-art fuzzer for network servers. The algorithm set includes existing ones supported by AFLNet and our novel and principled algorithm called AFLNetLegion. 
The experimental results on the ProFuzzBench benchmark show that (i) the existing state selection algorithms of AFLNet achieve very similar code coverage, (ii) AFLNetLegion clearly outperforms these algorithms in selected case studies, but (iii) the overall improvement appears insignificant. These are unexpected yet interesting findings. We identify problems and share insights that could open opportunities for future research on this topic.	翻訳日:2022-01-09 13:29:14 公開日:2021-12-24
# (参考訳) CARLAに実装された自律走行車両の遮蔽型検証・検証フレームワーク Intersection focused Situation Coverage-based Verification and Validation Framework for Autonomous Vehicles Implemented in CARLA ( http://arxiv.org/abs/2112.14706v1 ) ライセンス: CC BY 4.0	Zaid Tahir, Rob Alexander	(参考訳) 自動運転車(avs:autonomous vehicle)は、自動運転ソフトウェアにおけるエラーが大きな損失につながる可能性があるため、安全クリティカルなドメインで運用される。統計的には、AVsオペレーショナルデザインドメイン(ODD)の一部である道路交差点は、最も高い事故率を持っている。したがって,道路交差点の限界に対するAVの試験と道路交差点の安全確保が重要であり,本論文の焦点となる。本稿では,CARLA というオープンソースのAVシミュレータで開発された AV の検証・検証・安全性保証のための状況カバレッジ(SitCov) AV-testing フレームワークを提案する。 sitcov av-testing frameworkは、avsの安全性保証のための自動テストスイート生成のための状況カバレッジ基準を使用して、異なる環境および交差点構成条件下での道路交差点における車両間相互作用に焦点を当てている。我々は、交叉状況のオントロジーを開発し、それを用いて状況超空間、すなわちそのオントロジーから生じる全ての可能な状況の空間を生成する。 SitCov AVテストフレームワークの評価のために,エゴAVで複数の障害を発生させ,状況カバレッジとランダムな状況生成を比較した。両方の生成手法が、同じ数のシード断層をトリガーしていることがわかりましたが、カバレッジベースの生成は、エゴAVの自律運転アルゴリズムの弱点、特にエッジケースにおいて、より多くを教えてくれます。私たちのコードはオンラインで公開されており、誰でも私たちのSitCov AV-testingフレームワークを使って、それを使って、さらにその上に構築することができます。本稿では,V&Vの領域とAV開発への貢献を理論的観点からだけでなく,オープンソースのソフトウェアコントリビューションや,V&VやAV開発のためのフレキシブル・エフェクトなツールのリリースの観点からも目指す。 Autonomous Vehicles (AVs), i.e., self-driving cars, operate in a safety-critical domain, since errors in the autonomous driving software can lead to huge losses. Statistically, road intersections, which are a part of the AVs' operational design domain (ODD), have some of the highest accident rates. Hence, testing AVs to the limits on road intersections and assuring their safety on road intersections is pertinent, and thus the focus of this paper. We present a situation coverage-based (SitCov) AV-testing framework for the verification and validation (V&V) and safety assurance of AVs, developed in an open-source AV simulator named CARLA. The SitCov AV-testing framework focuses on vehicle-to-vehicle interaction on a road intersection under different environmental and intersection configuration situations, using situation coverage criteria for automatic test suite generation for safety assurance of AVs. 
We have developed an ontology for intersection situations and used it to generate a situation hyperspace, i.e., the space of all possible situations arising from that ontology. For the evaluation of our SitCov AV-testing framework, we have seeded multiple faults in our ego AV and compared situation coverage-based and random situation generation. We have found that both generation methodologies trigger around the same number of seeded faults, but the situation coverage-based generation tells us a lot more about the weaknesses of the autonomous driving algorithm of our ego AV, especially in edge cases. Our code is publicly available online; anyone can use our SitCov AV-testing framework or build further on top of it. This paper aims to contribute to the domain of V&V and development of AVs, not only from a theoretical point of view, but also through an open-source software contribution and the release of a flexible and effective tool for V&V and development of AVs.	翻訳日:2022-01-02 09:01:35 公開日:2021-12-24
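The notion of a situation hyperspace generated from an ontology, and the contrast between coverage-based and random generation, can be sketched as follows. The ontology dimensions and values here are invented for illustration and are not taken from the paper or from CARLA; the function names are likewise hypothetical.

```python
import itertools
import random

# Hypothetical miniature ontology: each key is a situation dimension.
ONTOLOGY = {
    "weather": ["clear", "rain", "fog"],
    "time_of_day": ["day", "night"],
    "other_vehicle_approach": ["left", "right", "opposite"],
}


def situation_hyperspace(ontology):
    """Enumerate every combination of values across all dimensions."""
    keys = list(ontology)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(ontology[k] for k in keys))
    ]


def coverage(suite, hyperspace):
    """Fraction of distinct situations in the hyperspace hit by a test suite."""
    seen = {tuple(sorted(s.items())) for s in suite}
    return len(seen) / len(hyperspace)


hyperspace = situation_hyperspace(ONTOLOGY)

# Coverage-based generation: visit each situation exactly once.
coverage_suite = list(hyperspace)

# Random generation with the same test budget (sampling with replacement).
rng = random.Random(0)
random_suite = [rng.choice(hyperspace) for _ in range(len(hyperspace))]

coverage_based_score = coverage(coverage_suite, hyperspace)
random_score = coverage(random_suite, hyperspace)
```

With an equal budget, random sampling with replacement typically revisits some situations and leaves others untested, which is one way to read the abstract's observation that coverage-based generation is more informative about edge cases.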
# (参考訳) プロファイルなしのプロファイル誘導最適化:機械学習アプローチ Profile Guided Optimization without Profiles: A Machine Learning Approach ( http://arxiv.org/abs/2112.14679v1 ) ライセンス: CC BY 4.0	Nadav Rotem, Chris Cummins	(参考訳) プロファイル誘導最適化は、動的動作に基づくコンパイラの最適化能力を改善する効果的な手法であるが、プロファイルデータの収集は高価で面倒であり、定期的な更新が必要である。本稿では,プロファイルを最適化せずにコンパイルされたプログラムの性能を向上させる分岐確率を推定する新しい統計手法を提案する。分岐確率情報を有するバイナリの大規模なコーパスから収集した情報を用いて,オフライントレーニングを行う。学習されたモデルは、コンパイラーが正規の未入力プログラムの分岐確率を予測するために使われ、コンパイラは最適化決定を知らせるために使用できる。我々の技術はLLVMに直接統合され、既存の人間工学のコンパイラヒューリスティックスを補完します。本手法をベンチマークスイートで評価し,プロファイル情報無しでコンパイルした場合の利点を示す。デプロイメントでは,プロファイリングの実行を必要とせず,コンパイル時間に何の影響も与えない。 Profile guided optimization is an effective technique for improving the optimization ability of compilers based on dynamic behavior, but collecting profile data is expensive, cumbersome, and requires regular updating to remain fresh. We present a novel statistical approach to inferring branch probabilities that improves the performance of programs that are compiled without profile guided optimizations. We perform offline training using information that is collected from a large corpus of binaries that have branch probabilities information. The learned model is used by the compiler to predict the branch probabilities of regular uninstrumented programs, which the compiler can then use to inform optimization decisions. We integrate our technique directly in LLVM, supplementing the existing human-engineered compiler heuristics. We evaluate our technique on a suite of benchmarks, demonstrating some gains over compiling without profile information. In deployment, our technique requires no profiling runs and has negligible effect on compilation time.	翻訳日:2022-01-02 08:40:00 公開日:2021-12-24
# (参考訳) 深層強化学習によるレーン変更決定- Lane Change Decision-Making through Deep Reinforcement Learning ( http://arxiv.org/abs/2112.14705v1 ) ライセンス: CC BY-SA 4.0	Mukesh Ghimire, Malobika Roy Choudhury, Guna Sekhar Sai Harsha Lagudu	(参考訳) 交通環境の複雑さとボラティリティのため、自動運転における意思決定は極めて難しい問題である。このプロジェクトでは、Deep Q-Networkとルールベースの制約を使ってレーン変更の意思決定を行います。安全かつ効率的な車線変更挙動は、高レベル側方意思決定と低レベルルールに基づく軌道監視を組み合わせることで得られる。エージェントは、100エピソードのトレーニングを経て、現実世界のような大都市シミュレーターで適切な車線変更操作を行うことが期待されている。その結果,ルールベースDQNはDQN法よりも優れた性能を示した。規則に基づくDQNは、0.8の安全率、平均速度47MPHを達成する Due to the complexity and volatility of the traffic environment, decision-making in autonomous driving is a significantly hard problem. In this project, we use a Deep Q-Network, along with rule-based constraints, to make lane-changing decisions. A safe and efficient lane change behavior may be obtained by combining high-level lateral decision-making with low-level rule-based trajectory monitoring. The agent is anticipated to perform appropriate lane-change maneuvers in a real-world-like Udacity simulator after training for a total of 100 episodes. The results show that the rule-based DQN performs better than the DQN method. The rule-based DQN achieves a safety rate of 0.8 and an average speed of 47 MPH.	翻訳日:2022-01-02 08:26:57 公開日:2021-12-24
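One simple way to combine learned Q-values with rule-based constraints is to mask out actions that a safety rule forbids before taking the greedy action. The sketch below shows that pattern only as an assumption about how such a combination might look; the abstract does not specify the rules, so the gap threshold, the action names, and the function names are all hypothetical.

```python
ACTIONS = ["keep_lane", "change_left", "change_right"]


def allowed_actions(left_gap_m, right_gap_m, min_gap_m=10.0):
    """Hypothetical safety rule: a lane change needs a minimum free gap."""
    allowed = ["keep_lane"]  # staying in lane is always permitted here
    if left_gap_m >= min_gap_m:
        allowed.append("change_left")
    if right_gap_m >= min_gap_m:
        allowed.append("change_right")
    return allowed


def select_action(q_values, left_gap_m, right_gap_m):
    """Greedy action over Q-values, restricted to rule-compliant actions."""
    candidates = allowed_actions(left_gap_m, right_gap_m)
    return max(candidates, key=lambda action: q_values[action])


# Assumed Q-values, as they might come out of a trained network.
q_values = {"keep_lane": 0.1, "change_left": 0.9, "change_right": 0.5}

blocked_left = select_action(q_values, left_gap_m=3.0, right_gap_m=25.0)
both_free = select_action(q_values, left_gap_m=30.0, right_gap_m=25.0)
boxed_in = select_action(q_values, left_gap_m=1.0, right_gap_m=1.0)
```

The network still ranks the actions, but the rule layer has the final say, so an unsafe action is never executed even if its Q-value is the highest.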
# シフトウインドウセルフアテンションによる原料品質検出 Raw Produce Quality Detection with Shifted Window Self-Attention ( http://arxiv.org/abs/2112.13845v1 ) ライセンス: Link先を確認	Oh Joon Kwon, Byungsoo Kim, Youngduck Choi	(参考訳) 気候変動の加速と人口の急増により、世界の食料不安全は今後数十年で悪化すると予想されている。この静脈では、食品生産のあらゆるレベルで非効率を取り除くことが重要である。ディープラーニングの最近の進歩は、そのような非効率性を減らすのに役立つが、その応用はまだ業界全体で主流になっておらず、大規模な経済コストを誘導している。この点において、RPQD(Raw Produce Quality Detection)タスクにCNN(Convolutional Neural Networks)などの最新の技術が適用されている。一方、Transformerが他のモダリティのビジョンで成功したことで、RPQDのTransformerベースのモデルよりも優れたパフォーマンスが期待できるようになりました。本研究では,近年の最先端swin(shifted windows)トランスフォーマーについて,ウインドウ内とウインドウ間の両方で自己接触を計算した。 Swin Transformerを4種類のRPQD画像データセット上のCNNモデルと比較し、それぞれが果物、野菜、魚、豚肉、牛肉といった異なる種類の原料を含む。 swin transformerは、優れた性能や競争力を実現するだけでなく、データと計算効率も向上し、現実の環境での実際のデプロイメントに理想的です。私たちの知る限りでは、これはrpqdタスクに関する最初の大規模な実証研究であり、今後の作業でさらに注目されることを期待しています。 Global food insecurity is expected to worsen in the coming decades with the accelerated rate of climate change and the rapidly increasing population. In this vein, it is important to remove inefficiencies at every level of food production. The recent advances in deep learning can help reduce such inefficiencies, yet their application has not yet become mainstream throughout the industry, inducing economic costs at a massive scale. To this end, modern techniques such as CNNs (Convolutional Neural Networks) have been applied to RPQD (Raw Produce Quality Detection) tasks. On the other hand, the successful debut of the Transformer in vision, among other modalities, led us to expect better performance from Transformer-based models in RPQD. In this work, we exclusively investigate the recent state-of-the-art Swin (Shifted Windows) Transformer, which computes self-attention in both intra- and inter-window fashion. We compare Swin Transformer against CNN models on four RPQD image datasets, each containing different kinds of raw produce: fruits and vegetables, fish, pork, and beef. 
We observe that Swin Transformer not only achieves better or competitive performance but also is data- and compute-efficient, making it ideal for actual deployment in real-world settings. To the best of our knowledge, this is the first large-scale empirical study on the RPQD task, which we hope will gain more attention in future work.	翻訳日:2022-01-02 08:18:35 公開日:2021-12-24
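The "shifted window" mechanism mentioned in the abstract above can be seen on a tiny grid: attention is computed inside fixed windows, and in alternating layers the feature map is cyclically shifted so that the windows straddle the previous window boundaries. The sketch below only shows the partitioning step on a 4x4 grid of integers, with hypothetical helper names; it omits the attention computation itself and the masking that a real implementation applies to windows that wrap around the border.

```python
def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % n_rows][(c + shift) % n_cols] for c in range(n_cols)]
        for r in range(n_rows)
    ]


def partition_windows(grid, window):
    """Split the grid into non-overlapping window x window blocks (flattened)."""
    windows = []
    for r0 in range(0, len(grid), window):
        for c0 in range(0, len(grid[0]), window):
            windows.append([
                grid[r][c]
                for r in range(r0, r0 + window)
                for c in range(c0, c0 + window)
            ])
    return windows


# 4x4 grid whose entries are their own row-major indices 0..15.
grid = [[4 * r + c for c in range(4)] for r in range(4)]

regular = partition_windows(grid, 2)
# Shift by half the window size before partitioning.
shifted = partition_windows(cyclic_shift(grid, 1), 2)
```

Here the first regular window holds cells 0, 1, 4, 5, while the first shifted window holds cells 5, 6, 9, 10, one from each of the four regular windows, which is how information crosses window boundaries between layers.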
# BMPQ:スクラッチからのDNNのビット勾配感度駆動混合精度量子化 BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch ( http://arxiv.org/abs/2112.13843v1 ) ライセンス: Link先を確認	Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram	(参考訳) 混合精度量子化を持つ大規模DNNは、高い分類性能を維持しながら超高圧縮を実現することができる。しかし、最適化プロセスの指針となる正確なメトリックを見つけることの難しさから、これらの手法は32ビット浮動小数点 (FP-32) ベースラインと比較して大きなパフォーマンスを犠牲にするか、事前訓練されたベースラインの可用性を必要とする計算的かつ反復的なトレーニングポリシーに依存している。この問題に対処するため,BMPQはビット勾配を用いて層感度を分析し,混合精度の量子化モデルを生成する訓練手法である。 BMPQは単一のトレーニングイテレーションを必要とするが、トレーニング済みのベースラインは必要ない。整数線形プログラム(ILP)を使用して、ハードウェア予算の固定の下で、トレーニング中にレイヤーの精度を動的に調整する。 BMPQの有効性を評価するため,CIFAR-10,CIFAR-100,Tiny-ImageNetデータセット上でVGG16,ResNet18を用いて広範囲に実験を行った。ベースラインのFP-32モデルと比較して、BMPQは15.4倍少ないパラメータビットを持つモデルの精度は無視できる。 sota "during training" と比較すると,cifar-10,cifar-100,tiny-imagenetでは2.1倍,2.2倍,2.9倍小さく,精度は最大14.54%向上した。 Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these methods either sacrifice significant performance compared to the 32-bit floating-point (FP-32) baseline or rely on a compute-expensive, iterative training policy that requires the availability of a pre-trained baseline. To address this issue, this paper presents BMPQ, a training method that uses bit gradients to analyze layer sensitivities and yield mixed-precision quantized models. BMPQ requires a single training iteration but does not need a pre-trained baseline. It uses an integer linear program (ILP) to dynamically adjust the precision of layers during training, subject to a fixed hardware budget. To evaluate the efficacy of BMPQ, we conduct extensive experiments with VGG16 and ResNet18 on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Compared to the baseline FP-32 models, BMPQ can yield models that have 15.4x fewer parameter bits with a negligible drop in accuracy. 
Compared to the SOTA "during training" mixed-precision training scheme, our models are 2.1x, 2.2x, and 2.9x smaller on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively, with an improved accuracy of up to 14.54%.	翻訳日:2022-01-02 08:18:10 公開日:2021-12-24
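The abstract above describes choosing per-layer precisions with an integer linear program under a fixed hardware budget. The toy below shows what such a program can look like, solved by brute-force enumeration rather than an ILP solver. The objective (sum of sensitivity times bit-width) is an assumption made for illustration, since the abstract does not state the exact formulation, and the function name, bit-width choices, and numbers are hypothetical.

```python
import itertools


def allocate_bits(param_counts, sensitivities, budget_bits, choices=(2, 4, 8)):
    """Pick a bit-width per layer to maximise sum(sensitivity * bits)
    subject to sum(param_count * bits) <= budget_bits.

    Exhaustive search; only practical for a handful of layers.
    Returns None if no assignment fits the budget.
    """
    best, best_value = None, float("-inf")
    for bits in itertools.product(choices, repeat=len(param_counts)):
        cost = sum(p * b for p, b in zip(param_counts, bits))
        if cost > budget_bits:
            continue  # violates the hardware budget
        value = sum(s * b for s, b in zip(sensitivities, bits))
        if value > best_value:
            best, best_value = bits, value
    return best


# Assumed two-layer model: a small, sensitive layer and a large, robust one.
assignment = allocate_bits(
    param_counts=[100, 1000], sensitivities=[5.0, 1.0], budget_bits=2800
)
infeasible = allocate_bits(
    param_counts=[100, 1000], sensitivities=[5.0, 1.0], budget_bits=100
)
```

In this toy instance the small but sensitive layer receives the highest precision and the large layer the lowest, which is the qualitative behaviour one would expect from sensitivity-driven allocation.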
# 微分可能ゲームにおける多様性のためのリアプノフ指数 Lyapunov Exponents for Diversity in Differentiable Games ( http://arxiv.org/abs/2112.14570v1 ) ライセンス: Link先を確認	Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster	(参考訳) ridge rider (rr) は、hessian ("ridges") の固有ベクトルに従うことによって最適化問題の多様な解を求めるアルゴリズムである。 RRは保守的な勾配系(すなわち単一損失関数を含む設定)のために設計されており、サドルで分岐する。この概念を非保存的多エージェント勾配系に一般化し,任意の分岐点を求めるための一般化リッジライダー(grr)法を提案する。力学系の分野から機械を活用し,提案手法の理論的動機付けを行う。興味のある高次元問題に洞察を与えながら,新たな現象を可視化できる新しい玩具問題を構築した。最後に, 反復囚人のジレンマと, 生成的敵ネットワークを含む関連する機械学習問題において, 多様な解を求めることにより, 提案手法を実証的に評価した。 Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method - denoted Generalized Ridge Rider (GRR) - for finding arbitrary bifurcation points. We give theoretical motivation for our method by leveraging machinery from the field of dynamical systems. We construct novel toy problems where we can visualize new phenomena while giving insight into high-dimensional problems of interest. Finally, we empirically evaluate our method by finding diverse solutions in the iterated prisoners' dilemma and relevant machine learning problems including generative adversarial networks.	翻訳日:2022-01-02 08:17:28 公開日:2021-12-24
# (参考訳) 報酬を再発見する:強化学習の視点 Rediscovering Affordance: A Reinforcement Learning Perspective ( http://arxiv.org/abs/2112.12886v1 ) ライセンス: CC BY 4.0	Yi-Chi Liao, Kashyap Todi, Aditya Acharya, Antti Keurulainen, Andrew Howes, Antti Oulasvirta	(参考訳) Affordanceは、オブジェクトによって許される可能性のあるアクションの認識を指す。その人間とコンピュータの相互作用との関連性にもかかわらず、既存の理論では、余裕形成の基盤となるメカニズムは説明されていない。本稿では,認知科学における強化学習理論に基づく補償形成の積分理論を提案する。鍵となる前提は、ユーザーは、強化信号(成功/失敗)が存在する場合の経験を通して、期待できる運動行動を知覚に関連付けることを学ぶことである。彼らはまた、行動(例えば「回転する」というダイヤル)を分類することを学び、価格について名前を付け、推論する能力を与える。新たなウィジェットに遭遇すると、これらのアクションを一般化する能力は、余裕を知覚する能力を決定する。この理論を仮想ロボットモデルに実装し,対話型ウィジェットタスクにおけるアフォーダンスの人間的適応を実証する。その予測は、人間のデータの動向と一致しているが、人間はより早くアフォーマンスを適応できるため、追加のメカニズムの存在を示唆する。 Affordance refers to the perception of possible actions allowed by an object. Despite its relevance to human-computer interaction, no existing theory explains the mechanisms that underpin affordance-formation; that is, how affordances are discovered and adapted via interaction. We propose an integrative theory of affordance-formation based on the theory of reinforcement learning in cognitive sciences. The key assumption is that users learn to associate promising motor actions to percepts via experience when reinforcement signals (success/failure) are present. They also learn to categorize actions (e.g., ``rotating'' a dial), giving them the ability to name and reason about affordance. Upon encountering novel widgets, their ability to generalize these actions determines their ability to perceive affordances. We implement this theory in a virtual robot model, which demonstrates human-like adaptation of affordance in interactive widgets tasks. While its predictions align with trends in human data, humans are able to adapt affordances faster, suggesting the existence of additional mechanisms.	翻訳日:2021-12-29 18:09:19 公開日:2021-12-24
# (参考訳) 高次元行列値データに対する最適な変数クラスタリング Optimal Variable Clustering for High-Dimensional Matrix Valued Data ( http://arxiv.org/abs/2112.12909v1 ) ライセンス: CC BY 4.0	Inbeom Lee, Siyi Deng, Yang Ning	(参考訳) 行列値データは多くのアプリケーションでますます普及している。このタイプのデータに対する既存のクラスタリング手法のほとんどは、平均モデルに合わせて調整されており、特に高次元の設定において非常に有益となりうる特徴の依存構造を考慮していない。クラスタリングのための依存構造から情報を抽出するために,行と列のクラスタを表す未知のメンバシップ行列を用いて,行列形式で配置された特徴に対する新しい潜在変数モデルを提案する。このモデルでは、重み付き共分散行列の差分を非類似度尺度として用いた階層的クラスタリングアルゴリズムのクラスをさらに提案する。理論上,温和な条件下では,高次元環境でのクラスタリング一貫性を実現する。この一貫性の結果は、重み付き共分散行列の幅広いクラスを持つアルゴリズムに対して成立するが、この結果の条件は重みの選択に依存する。この重みがアルゴリズムの理論的性能にどのように影響するかを調べるため、潜在変数モデルに基づいてクラスタリングのためのミニマックス下限を確立する。これらの結果から, この重みを用いることで, クラスター分離計量の大きさの観点で, アルゴリズムがミニマックスレート最適となることを保証できるという意味で, 最適重みを同定する。また,最適重み付きアルゴリズムの実用的実装についても論じる。最後に,本アルゴリズムの有限サンプル性能を評価するためのシミュレーション研究を行い,その手法をゲノムデータセットに適用する。 Matrix-valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model. 
Given these results, we identify the optimal weight in the sense that using this weight guarantees that our algorithm is minimax rate-optimal in terms of the magnitude of some cluster separation metric. The practical implementation of our algorithm with the optimal weight is also discussed. Finally, we conduct simulation studies to evaluate the finite sample performance of our algorithm and apply the method to a genomic dataset.	翻訳日:2021-12-29 17:45:00 公開日:2021-12-24
# (参考訳) 高精度3次元ポーズと形状推定のための複数初期化最適化ネットワーク Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation ( http://arxiv.org/abs/2112.12917v1 ) ライセンス: CC BY 4.0	Zhiwei Liu, Xiangyu Zhu, Lu Yang, Xiang Yan, Ming Tang, Zhen Lei, Guibo Zhu, Xuetao Feng, Yan Wang, Jinqiao Wang	(参考訳) 単眼のRGB画像からの3D人間のポーズと形状復元は難しい課題である。既存の学習に基づく手法は、例えば2Dと3Dのジョイント位置といった弱い監督信号に大きく依存している。しかし、これらの弱い監督ラベルには2Dから3Dの曖昧さがあるので、そのようなラベルで訓練すると、ネットワークは局所的な最適条件で立ち往生しやすい。本稿では,複数の初期化を最適化することで,曖昧さを低減する。具体的には,マルチイニシャライズ最適化ネットワーク(MION)と呼ばれる3段階フレームワークを提案する。第1段階では,入力サンプルの2次元キーポイントに適合する粗い3次元再構成候補を戦略的に選択する。各粗い再構成は初期化と見なすことができ、1つの最適化分岐につながる。第2段階では, メッシュ改良トランスフォーマー (MRT) を設計し, 自己注意機構を用いて粗い再構成結果をそれぞれ洗練する。最後に,RGB画像の視覚的証拠が与えられた3次元再構成と一致するかどうかを評価することで,複数の候補から最高の結果を得るために,一貫性推定ネットワーク(CEN)を提案する。実験により、当社のマルチ初期化最適化ネットワークは、既存の3Dメッシュベースのメソッドを複数の公開ベンチマークで上回ります。 3D human pose and shape recovery from a monocular RGB image is a challenging task. Existing learning-based methods depend heavily on weak supervision signals, e.g., 2D and 3D joint locations, due to the lack of in-the-wild paired 3D supervision. However, given the 2D-to-3D ambiguities that exist in these weak supervision labels, the network easily gets stuck in local optima when trained with such labels. In this paper, we reduce the ambiguity by optimizing multiple initializations. Specifically, we propose a three-stage framework named Multi-Initialization Optimization Network (MION). In the first stage, we strategically select different coarse 3D reconstruction candidates that are compatible with the 2D keypoints of the input sample. Each coarse reconstruction can be regarded as an initialization that leads to one optimization branch. In the second stage, we design a mesh refinement transformer (MRT) to refine each coarse reconstruction result via a self-attention mechanism. 
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction. Experiments demonstrate that our Multi-Initialization Optimization Network outperforms existing 3D mesh-based methods on multiple public benchmarks.	翻訳日:2021-12-29 17:43:44 公開日:2021-12-24
# (参考訳) nvBench:クロスドメイン自然言語と可視化タスクのための大規模合成データセット nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task ( http://arxiv.org/abs/2112.12926v1 ) ライセンス: CC BY 4.0	Yuyu Luo, Jiawei Tang, Guoliang Li	(参考訳) 自然言語(NL)クエリを対応する視覚化(VIS)に変換するNL2VISは、商用の視覚化ベンダーと学術研究者の両方で注目を集めている。過去数年間、高度なディープラーニングベースのモデルは、多くの自然言語処理(NLP)タスクにおいて人間のような能力を達成した。しかし、大きな障害は、多くの(NL、VIS)ペアを持つベンチマークの欠如である。 105ドメイン上の750テーブルから25,750(NL, VIS)のペアを含む,最初の大規模NL2VISベンチマークであるnvBenchを,クロスドメインNL2VISタスクをサポートするために,(NL, SQL)ベンチマークから合成した。 nvBenchの品質は、23人の専門家と300人以上のクラウドワーカーによって広く検証されている。 nvBenchを用いて訓練したディープラーニングモデルは、nvBenchがNL2VISの分野を推し進められることを示している。 NL2VIS, which translates natural language (NL) queries to corresponding visualizations (VIS), has attracted more and more attention from both commercial visualization vendors and academic researchers. In the last few years, advanced deep learning-based models have achieved human-like abilities in many natural language processing (NLP) tasks, which suggests that deep learning-based techniques are a good choice to push the field of NL2VIS. However, a major obstacle is the lack of benchmarks with many (NL, VIS) pairs. We present nvBench, the first large-scale NL2VIS benchmark, containing 25,750 (NL, VIS) pairs from 750 tables over 105 domains, synthesized from (NL, SQL) benchmarks to support the cross-domain NL2VIS task. The quality of nvBench has been extensively validated by 23 experts and 300+ crowd workers. Deep learning-based models trained using nvBench demonstrate that nvBench can push the field of NL2VIS.	翻訳日:2021-12-29 17:28:06 公開日:2021-12-24
# (参考訳) 癌患者の計算表現型と死亡予測のための制約付きテンソル因子分解 Constrained tensor factorization for computational phenotyping and mortality prediction in patients with cancer ( http://arxiv.org/abs/2112.12933v1 ) ライセンス: CC BY 4.0	Francisco Y Cai, Chengsheng Mao, Yuan Luo	(参考訳) 背景: 米国で電子健康記録(EHR)の採用が増加していることで、計算可能なデータの宝庫が生まれ、機械学習が有用な洞察を抽出するために応用されている。 EHRデータは、行列の3次元アナログ(テンソル)として表現され、計算表現型として解釈できる2次元因子に分解される。方法:2000年から2015年までノースウェスタン医科大学データウェアハウスにおける乳がん,前立腺がん,大腸癌,肺癌患者の計算表現型を導出し,コホート死亡率を予測するために,制約テンソル因子化を適用した。本実験では,因子化アルゴリズムにおける教師付き項の使用,医療指標によるテンソル共起のフィルタリング,および因子化過程における健康の社会的決定因子(SDOH)共変量の追加について検討した。得られた計算表現型を定性的に評価し,曲線下面積(AUC)統計量に基づく5年間の死亡予測能力を評価した。結果: 医学的指標によるフィルタリングにより, より簡潔で解釈可能な表現型が得られた。死亡予測性能(AUC)は、実験条件やがんの種類によって異なっていた(乳がん: 0.623 - 0.694, 前立腺: 0.603 - 0.750, 大腸: 0.523 - 0.641, 肺: 0.517 - 0.623)。一般に、教師付き項とSDOH共変量の導入により予測性能が向上した。結論: がん患者のスパースEHRデータに適用された制約テンソル因子化は, 5年間の死亡を予測できる計算表現型を発見することができる。因子化アルゴリズムにSDOH変数を組み込むことは、予測性能を向上させるための簡単で効果的な方法である。 Background: The increasing adoption of electronic health records (EHR) across the US has created troves of computable data, to which machine learning methods have been applied to extract useful insights. EHR data, represented as a three-dimensional analogue of a matrix (tensor), is decomposed into two-dimensional factors that can be interpreted as computational phenotypes. Methods: We apply constrained tensor factorization to derive computational phenotypes and predict mortality in cohorts of patients with breast, prostate, colorectal, or lung cancer in the Northwestern Medicine Enterprise Data Warehouse from 2000 to 2015. In our experiments, we examined using a supervised term in the factorization algorithm, filtering tensor co-occurrences by medical indication, and incorporating additional social determinants of health (SDOH) covariates in the factorization process. We evaluated the resulting computational phenotypes qualitatively and by assessing their ability to predict five-year mortality using the area under the curve (AUC) statistic. 
Results: Filtering by medical indication led to more concise and interpretable phenotypes. Mortality prediction performance (AUC) varied under the different experimental conditions and by cancer type (breast: 0.623 - 0.694, prostate: 0.603 - 0.750, colorectal: 0.523 - 0.641, and lung: 0.517 - 0.623). Generally, prediction performance improved with the use of a supervised term and the incorporation of SDOH covariates. Conclusion: Constrained tensor factorization, applied to sparse EHR data of patients with cancer, can discover computational phenotypes predictive of five-year mortality. The incorporation of SDOH variables into the factorization algorithm is an easy-to-implement and effective way to improve prediction performance.	翻訳日:2021-12-29 17:15:08 公開日:2021-12-24
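The AUC statistic used above to score five-year mortality prediction can be computed directly from predicted risks and observed outcomes. The sketch below is a minimal illustration, assuming the ROC AUC is meant (the abstract says only "area under the curve") and using its pairwise-ranking definition. The function name and the toy scores are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """Probability that a random positive case is scored above a random negative one.

    scores: predicted risks (higher means death within five years is judged more likely).
    labels: 1 for an observed death, 0 otherwise.
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with made-up scores (not results from the paper).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the ranges reported per cancer type.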
# (参考訳) ドメイン特化語埋め込みとトピックモデリングを用いた科学出版の分析 Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling ( http://arxiv.org/abs/2112.12940v1 ) ライセンス: CC BY 4.0	Trisha Singhal, Junhua Liu, Lucienne T.M. Blessing, Kwan Hui Lim	(参考訳) 科学の世界は急速に変化しており、新しい技術が開発され、新しい傾向が出現している。本稿では,学術出版物の科学的分析を行うための枠組みを提案する。このフレームワークは、単語埋め込みやトピックモデリングなど、自然言語処理の様々な技術を採用し、組み合わせている。単語埋め込みはドメイン固有語の意味的意味を捉えるために使われる。本稿では,様々な研究分野において,一般の意味的意味とドメイン固有語を学習できる2つの新しい科学論文の埋め込み,すなわちPUB-GとPUB-Wを提案する。その後、これらの大規模研究分野における研究トピックのクラスターを特定するためにトピックモデリングが使用される。 2つの研究領域から1995年から2020年までの2つのカンファレンスと2つのジャーナルからなる出版データセットを収集した。 PUB-G と PUB-W の埋め込みは,トピックコヒーレンスに基づく ~0.18-1.03 のマージンの他のベースライン埋め込みに比べて優れていることを示す。 The scientific world is changing at a rapid pace, with new technology being developed and new trends being set at an increasing frequency. This paper presents a framework for conducting scientific analyses of academic publications, which is crucial to monitor research trends and identify potential innovations. This framework adopts and combines various techniques of Natural Language Processing, such as word embedding and topic modelling. Word embedding is used to capture semantic meanings of domain-specific words. We propose two novel scientific publication embeddings, i.e., PUB-G and PUB-W, which are capable of learning semantic meanings of general as well as domain-specific words in various research fields. Thereafter, topic modelling is used to identify clusters of research topics within these larger research fields. We curated a publication dataset consisting of two conferences and two journals from 1995 to 2020 from two research domains. Experimental results show that our PUB-G and PUB-W embeddings are superior in comparison to other baseline embeddings by a margin of ~0.18-1.03 based on topic coherence.	翻訳日:2021-12-29 17:02:59 公開日:2021-12-24
# (参考訳) 生体像分割における深層アンサンブル Deep ensembles in bioimage segmentation ( http://arxiv.org/abs/2112.12955v1 ) ライセンス: CC BY 4.0	Loris Nanni, Daniela Cuza, Alessandra Lumini, Andrea Loreggia and Sheryl Brahnam	(参考訳) セマンティックセグメンテーションは、画像の各ピクセルを、利用可能なすべてのラベルの集合から選択された特定のラベルに割り当てることで分類する。ここ数年、このようなタスクに多くの注意が向けられた。多くのコンピュータビジョン研究者は、画像のセマンティクスと低レベルの表現を学習できるモデルを開発するためにオートエンコーダ構造を適用しようとした。オートエンコーダアーキテクチャでは、入力が与えられた場合、エンコーダは、デコーダが元のデータを再構成するために使用する入力の低次元表現を計算する。本研究では,畳み込みニューラルネットワーク(CNN)のアンサンブルを提案する。アンサンブル法では、多くの異なるモデルが訓練され、分類に使用され、アンサンブルは単一分類器の出力を集約する。このアプローチは、システム全体のパフォーマンスを改善するために、さまざまな分類器の違いを活用する。単一分類器間の多様性は、異なる損失関数を用いて強制される。特に,ダイスと構造類似性指数の組み合わせによる新たな損失関数を提案する。提案するアンサンブルは,DeepLabV3+とHarDNet環境を用いて,異なるバックボーンネットワークを組み合わせることで実現されている。この提案はポリープと皮膚のセグメンテーションの2つの実世界のシナリオに関する広範な経験的評価を通じて評価される。すべてのコードは https://github.com/LorisNanni で公開されている。 Semantic segmentation consists of classifying each pixel of an image by assigning it to a specific label chosen from a set of all the available ones. During the last few years, a lot of attention has shifted to this kind of task. Many computer vision researchers have tried to apply autoencoder structures to develop models that can learn the semantics of the image as well as a low-level representation of it. In an autoencoder architecture, given an input, an encoder computes a low-dimensional representation of the input that is then used by a decoder to reconstruct the original data. In this work, we propose an ensemble of convolutional neural networks (CNNs). In ensemble methods, many different models are trained and then used for classification; the ensemble aggregates the outputs of the single classifiers. The approach leverages differences among the various classifiers to improve the performance of the whole system. Diversity among the single classifiers is enforced by using different loss functions. In particular, we present a new loss function that results from the combination of Dice and Structural Similarity Index. 
The proposed ensemble is implemented by combining different backbone networks using the DeepLabV3+ and HarDNet environment. The proposal is assessed through an extensive empirical evaluation on two real-world scenarios: polyp and skin segmentation. All the code is available online at https://github.com/LorisNanni.	翻訳日:2021-12-29 16:46:44 公開日:2021-12-24
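The ensemble idea in this abstract can be sketched in a few lines. This is a minimal illustration, assuming that each member network outputs per-pixel foreground probabilities and that the ensemble fuses them by simple averaging (the abstract does not state the fusion rule). The soft Dice loss shown is the standard formulation; the structural-similarity term that the authors combine with it is omitted, and all function names and values are hypothetical.

```python
def average_ensemble(prob_maps):
    """Fuse per-pixel probabilities from several models by averaging (assumed rule)."""
    n_models = len(prob_maps)
    return [sum(pixel) / n_models for pixel in zip(*prob_maps)]


def soft_dice_loss(probs, target, eps=1e-7):
    """1 - Dice coefficient between predicted probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# Three hypothetical models scoring a flattened 4-pixel image.
model_outputs = [
    [0.9, 0.2, 0.7, 0.1],
    [0.8, 0.4, 0.6, 0.0],
    [1.0, 0.3, 0.8, 0.2],
]
mask = [1, 0, 1, 0]
fused = average_ensemble(model_outputs)
loss = soft_dice_loss(fused, mask)
```

Training each member with a different loss, as the abstract describes, would change how the individual probability maps are produced, not how they are fused here.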
# (参考訳) 分岐モデル空間におけるサポートベクトルマシンの最適モデル平均化 Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces ( http://arxiv.org/abs/2112.12961v1 ) ライセンス: CC BY 4.0	Chaoxia Yuan, Chao Ying, Zhou Yu, Fang Fang	(参考訳) サポートベクトルマシン(SVM)は多くの分野で大きな成功を収めた強力な分類手法である。その性能は冗長な共変量によって著しく損なわれるため、高次元共変量を持つSVMではモデル選択技術が広く用いられている。モデル選択の代替として、過去数十年でモデル平均化の領域で顕著な進歩が見られた。しかし、svmでは頻繁なモデル平均化手法は考慮されなかった。本研究は, このギャップを埋めることを目的として, クロスバリデーションにより最適重みを選択するSVMの頻繁なモデル平均化手順を提案する。サンプルサイズの指数関数的な速度で共変数の数が発散した場合でも、ヒンジ損失と最小損失の比率が1に収束するという意味で、提案手法の漸近的最適性を示す。また、モデル平均化に関する洞察を提供する収束率も導き出します。パラメータ選択をチューニングする面倒だが重要なタスクを必要とするSVMのモデル選択法と比較して、モデル平均化法はタスクを回避し、実証研究において有望な性能を示す。 Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method was considered for SVM. This work aims to fill the gap and to propose a frequentist model averaging procedure for SVM which selects the optimal weight by cross validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate which provides more insights to model averaging. Compared to model selection methods of SVM which require a tedious but critical task of tuning parameter selection, the model averaging method avoids the task and shows promising performances in the empirical studies.	翻訳日:2021-12-29 16:25:26 公開日:2021-12-24
# (参考訳) 地球システム科学のための機械学習(ESS):南アジアにおける調査・現状・今後の方向性 Machine learning for Earth System Science (ESS): A survey, status and future directions for South Asia ( http://arxiv.org/abs/2112.12966v1 ) ライセンス: CC BY 4.0	Manmeet Singh, Bipin Kumar, Rajib Chattopadhyay, K Amarjyothi, Anup K Sutar, Sukanta Roy, Suryachandra A Rao, Ravi S. Nanjundiah	(参考訳) この調査は、機械学習アルゴリズムを適用する地球システム科学の現在の問題に焦点を当てている。これは、以前の研究の概要、インドの地球科学省における進行中の作業、そしていくつかの重要な地球科学問題へのMLアルゴリズムの将来の応用の概要を提供する。本研究では,地球システム科学(ESS)における機械学習に対するGartnerのハイプサイクルと,機械学習に関連する多次元領域のマインドマップについて,これまでの研究との比較を行った。我々は主に、大気、海洋、地震学、生物圏を含む地球科学の重要な要素に焦点を当て、統計的なダウンスケーリングや予測問題へのAI/ML応用をカバーする。 This survey focuses on the current problems in Earth systems science where machine learning algorithms can be applied. It provides an overview of previous work, ongoing work at the Ministry of Earth Sciences, Gov. of India, and future applications of ML algorithms to some significant earth science problems. We provide a comparison of previous work with this survey, a mind map of multidimensional areas related to machine learning and a Gartner's hype cycle for machine learning in Earth system science (ESS). We mainly focus on the critical components in Earth Sciences, including atmospheric, Ocean, Seismology, and biosphere, and cover AI/ML applications to statistical downscaling and forecasting problems.	翻訳日:2021-12-29 15:47:36 公開日:2021-12-24
# (参考訳) SGTR: Transformer を用いたエンドツーエンドのシーングラフ生成 SGTR: End-to-end Scene Graph Generation with Transformer ( http://arxiv.org/abs/2112.12970v1 ) ライセンス: CC BY 4.0	Rongjie Li, Songyang Zhang, Xuming He	(参考訳) シーングラフ生成(SGG)は、複雑な構成特性のため、難しい視覚的理解課題である。これまでのほとんどの作業では、ボトムアップの2段階あるいはポイントベースの1段階アプローチを採用していました。本研究では、上記の問題に対処する新しいSGG法を提案し、この課題を二部グラフ構築問題として定式化する。そこで我々は,まずエンティティと述語の提案集合を生成し,その後に有向エッジを推論して関係三重項を形成するトランスフォーマティブベースのエンドツーエンドフレームワークを開発した。特に,関係の構成的性質を活用するために,構造的述語生成器に基づく新しいエンティティ対応述語表現を開発する。さらに,エンティティ認識構造に基づいて,二部的なシーングラフの接続を推測するグラフ合成モジュールを設計し,シーングラフをエンドツーエンドで生成できるようにした。広範な実験結果から,我々の設計は,既存の手法のほとんどを上回って,高い推論効率を享受し,2つの難解なベンチマークにおいて,最先端あるいは同等のパフォーマンスを達成できることがわかった。当社のモデルがTransformerベースのシーングラフ生成の強力なベースラインになることを期待しています。 Scene Graph Generation (SGG) remains a challenging visual understanding task due to its complex compositional property. Most previous works adopt a bottom-up two-stage or a point-based one-stage approach, which often suffers from overhead time complexity or sub-optimal design assumption. In this work, we propose a novel SGG method to address the aforementioned issues, which formulates the task as a bipartite graph construction problem. To solve the problem, we develop a transformer-based end-to-end framework that first generates the entity and predicate proposal set, followed by inferring directed edges to form the relation triplets. In particular, we develop a new entity-aware predicate representation based on a structural predicate generator to leverage the compositional property of relationships. Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner. Extensive experimental results show that our design is able to achieve the state-of-the-art or comparable performance on two challenging benchmarks, surpassing most of the existing approaches and enjoying higher efficiency in inference. 
We hope our model can serve as a strong baseline for Transformer-based scene graph generation.	翻訳日:2021-12-29 15:32:20 公開日:2021-12-24
# (参考訳) 重要度重み付けは補間分類器と相容れないか? Is Importance Weighting Incompatible with Interpolating Classifiers? ( http://arxiv.org/abs/2112.12986v1 ) ライセンス: CC BY 4.0	Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto	(参考訳) 重要度重み付けは分布シフトを扱う古典的なテクニックである。しかし、以前の研究は、重要度重みが過剰パラメータ化されたニューラルネットワークにほとんど影響しないことを示す強力な経験的、理論的証拠を示している。重要度重み付けは過剰パラメータ化ニューラルネットワークのトレーニングと真に相容れないのか? 私たちの論文はこれに否と答える。重要度重み付けが失敗するのは過剰パラメータ化のためではなく、ロジスティック損失やクロスエントロピー損失のような指数的な裾を持つ損失を使用するためであることを示す。対策として,過剰パラメータ化モデルの分布シフトの補正において,多項式的な裾を持つ損失が重要度重み付けの効果を回復することを示す。過剰パラメータ化線形モデルにおける重要度重み付き多項式裾損失に対する勾配降下の挙動を特徴付けるとともに,ラベルシフト設定における多項式裾損失の利点を理論的に示す。驚くべきことに、我々の理論は古典的な不偏重要度重みを指数化して得られる重みを用いることで性能が向上することを示している。最後に,部分集団シフトとラベルシフトのデータセットを用いたニューラルネットワーク実験により,本解析の実用的価値を示す。再重み付けすると、我々の損失関数はテスト精度で最大9%、再重み付けクロスエントロピーを上回る。我々の損失関数はまた、分布シフトを補正するためのよく調整された最先端の手法に匹敵する、あるいはそれを超えるテスト精度を与える。 Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of the overparameterization, but instead, as a result of using exponentially-tailed losses like the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of using polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights that are obtained by exponentiating the classical unbiased importance weights can improve performance. 
Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. Our loss function also gives test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.	翻訳日:2021-12-29 15:09:13 公開日:2021-12-24
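To make the contrast between the two loss families concrete, the sketch below computes classical label-shift importance weights and an importance-weighted risk. It is an illustration under stated assumptions: the weights use the standard ratio w(y) = p_test(y) / p_train(y), and `poly_tailed_loss` is a hypothetical polynomially-tailed loss chosen for simplicity, not the loss defined in the paper. All names and numbers are made up.

```python
import math
from collections import Counter


def label_shift_weights(train_labels, test_prior):
    """Classical importance weights w(y) = p_test(y) / p_train(y) for label shift."""
    counts = Counter(train_labels)
    n = len(train_labels)
    return {y: test_prior[y] / (counts[y] / n) for y in counts}


def logistic_loss(margin):
    """Exponentially-tailed: behaves like exp(-margin) for large margins."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # Algebraically identical form that avoids overflow for very negative margins.
    return -margin + math.log1p(math.exp(margin))


def poly_tailed_loss(margin, alpha=1.0):
    """Illustrative polynomially-tailed loss (hypothetical form, not the paper's).

    Decays like margin ** (-alpha) for large positive margins and is linear
    for negative margins; the two branches agree in value and slope at 0.
    """
    if margin >= 0:
        return (1.0 + margin) ** (-alpha)
    return 1.0 - alpha * margin


def weighted_risk(margins, labels, weights, loss):
    """Importance-weighted average loss over the training set."""
    return sum(weights[y] * loss(m) for m, y in zip(margins, labels)) / len(margins)


# Made-up training labels (class 1 is rare) and a balanced test prior.
train_labels = [0, 0, 0, 1]
weights = label_shift_weights(train_labels, {0: 0.5, 1: 0.5})
risk = weighted_risk([2.0, 1.5, 3.0, 0.5], train_labels, weights, poly_tailed_loss)
```

Evaluating both losses at a large margin shows how much faster the logistic loss vanishes than the polynomial one. That comparison only illustrates the difference in tail behaviour; it does not reproduce the paper's analysis.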
# (参考訳) iSeg3D:インタラクティブな3D形状分割ツール iSeg3D: An Interactive 3D Shape Segmentation Tool ( http://arxiv.org/abs/2112.12988v1 ) ライセンス: CC BY 4.0	Sucheng Qian, Liu Liu, Wenqiang Xu, Cewu Lu	(参考訳) 大規模データセットは3次元形状理解において優れた特徴を学習するために不可欠だが、ディープラーニングトレーニングを満足できるデータセットはごくわずかである。主な理由の1つは、ポリゴンやスクリブルを使ったポイントごとのセマンティックラベルの注釈付けが退屈で非効率であることである。 3次元形状のセグメンテーションアノテーションを容易にするために,iSegというアノテーションツールを提案する。最小限のクリック(10回未満)で満足のいくセグメンテーション結果を得ることができる。我々の観察では、ほとんどのオブジェクトは有限原始形状の合成と見なすことができ、構築された原始構成形状データ上でiSeg3Dモデルを訓練し、幾何学的事前知識を自己教師された方法で学習する。人間の相互作用を考えると、学習した知識は任意の形状の部品を分割するのに使用することができ、正のクリックはプリミティブを意味的な部分に関連付けるのに役立ち、負のクリックは過剰分割を避けることができる。さらに、オンラインのヒューマン・イン・ループ・ファインチューニングモジュールも提供し、より少ないクリックでセグメンテーションを行えるようにしています。 PartNet形状分割におけるiSeg3Dの有効性を示す実験を行った。データとコードは公開される予定だ。 A large-scale dataset is essential for learning good features in 3D shape understanding, but there are only a few datasets that can satisfy deep learning training. One of the major reasons is that current tools for annotating per-point semantic labels using polygons or scribbles are tedious and inefficient. To facilitate segmentation annotations in 3D shapes, we propose an effective annotation tool for 3D shapes, named iSeg. It obtains a satisfactory segmentation result with minimal human clicks (< 10). We observe that most objects can be considered compositions of finite primitive shapes, and we train the iSeg3D model on our primitive-composed shape data to learn geometric prior knowledge in a self-supervised manner. Given human interactions, the learned knowledge can be used to segment parts on arbitrary shapes, in which positive clicks help associate the primitives into the semantic parts and negative clicks can avoid over-segmentation. We also provide an online human-in-the-loop fine-tuning module that enables the model to perform better segmentation with fewer clicks. Experiments demonstrate the effectiveness of iSeg3D on PartNet shape segmentation. Data and code will be made publicly available.	翻訳日:2021-12-29 15:07:53 公開日:2021-12-24
# (参考訳) ドメイン対応連続ゼロショット学習 Domain-Aware Continual Zero-Shot Learning ( http://arxiv.org/abs/2112.12989v1 ) ライセンス: CC BY 4.0	Kai Yi, Mohamed Elhoseiny	(参考訳) ドメイン認識型連続ゼロショット学習(DACZSL)は、目に見えないカテゴリのイメージを逐次認識するタスクである。我々は、DomainNetデータセット上にDACZSLを作成し、一連のタスクに分割し、トレーニング中に目に見えないドメインにクラスを段階的に提供し、目に見えないクラスと目に見えないクラスの両方に対して評価を行う。また、DACZSL設定に適応した最先端のベースラインモデルよりも優れた新しいDomain-Invariant CZSL Network(DIN)を提案する。我々は、グローバル共有ネットワークに加えて、小さなタスク毎プライベートネットワークで、以前のタスクから得た知識を省くための構造ベースのアプローチを採用する。プライベートネットワークがドメインとタスク固有の表現を捉えるように促すため、我々は、我々のグローバルネットワークのタスク不変およびドメイン不変をすべてのタスクにわたって可能にするために、新しい敵の知識の絡み合い設定でモデルを訓練します。提案手法では,クラスレベルのテキスト表現を改善するために,クラスレベルでの学習可能なプロンプトも学習し,サイド情報を表現して,将来の未確認クラスのゼロショット予測を可能にする。私たちのコードとベンチマークは公開される予定だ。 We introduce Domain Aware Continual Zero-Shot Learning (DACZSL), the task of visually recognizing images of unseen categories in unseen domains sequentially. We created DACZSL on top of the DomainNet dataset by dividing it into a sequence of tasks, where classes are incrementally provided on seen domains during training and evaluation is conducted on unseen domains for both seen and unseen classes. We also proposed a novel Domain-Invariant CZSL Network (DIN), which outperforms state-of-the-art baseline models that we adapted to DACZSL setting. We adopt a structure-based approach to alleviate forgetting knowledge from previous tasks with a small per-task private network in addition to a global shared network. To encourage the private network to capture the domain and task-specific representation, we train our model with a novel adversarial knowledge disentanglement setting to make our global network task-invariant and domain-invariant over all the tasks. Our method also learns a class-wise learnable prompt to obtain better class-level text representation, which is used to represent side information to enable zero-shot prediction of future unseen classes. Our code and benchmarks will be made publicly available.	翻訳日:2021-12-29 14:55:08 公開日:2021-12-24
# (参考訳) Toeplitzの最小二乗問題,高速アルゴリズム,ビッグデータ Toeplitz Least Squares Problems, Fast Algorithms and Big Data ( http://arxiv.org/abs/2112.12994v1 ) ライセンス: CC BY 4.0	Ali Eshragh, Oliver Di Pietro and Michael A. Saunders	(参考訳) 時系列解析では、自己回帰モデルに適合する場合は、Toeplitz の通常の最小二乗問題を何度も解いて適切なモデルを見つけなければならない。最近の2つのアルゴリズム(lsarと反復半減法)はランダム化数値線形代数学(randnla)技術を適用し、大きな時系列データに自己回帰モデルを適用している。本研究では,これら2つの近似アルゴリズムの品質を大規模合成データと実世界データで比較検討した。両方のアルゴリズムは合成データセットに匹敵する結果を示すが、実世界の時系列データに適用するとLSARアルゴリズムはより堅牢であるように見える。 randnlaはビッグデータ時系列の文脈において有効であると結論づける。 In time series analysis, when fitting an autoregressive model, one must solve a Toeplitz ordinary least squares problem numerous times to find an appropriate model, which can severely affect computational times with large data sets. Two recent algorithms (LSAR and Repeated Halving) have applied randomized numerical linear algebra (RandNLA) techniques to fitting an autoregressive model to big time-series data. We investigate and compare the quality of these two approximation algorithms on large-scale synthetic and real-world data. While both algorithms display comparable results for synthetic datasets, the LSAR algorithm appears to be more robust when applied to real-world time series data. We conclude that RandNLA is effective in the context of big-data time series.	翻訳日:2021-12-29 14:33:01 公開日:2021-12-24
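The Toeplitz least squares problem mentioned above arises because fitting an autoregressive model of order p regresses each value on its p predecessors, so the design matrix is built from shifted copies of the same series. The sketch below shows the exact (non-randomized) baseline via the normal equations; it does not implement LSAR or Repeated Halving, and the function names and the toy series are hypothetical.

```python
def lagged_rows(series, order):
    """Rows [x[t-1], ..., x[t-order]] for t = order .. len(series) - 1.

    Stacked, these rows form a Toeplitz matrix: entry (i, j) depends only on i - j.
    """
    return [[series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ar(series, order):
    """Ordinary least squares AR(order) coefficients via the normal equations."""
    rows = lagged_rows(series, order)
    targets = series[order:]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
            for i in range(order)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve_linear(gram, rhs)


# Noise-free AR(2) toy series: x[t] = 0.5 * x[t-1] - 0.3 * x[t-2].
series = [1.0, 2.0]
for _ in range(28):
    series.append(0.5 * series[-1] - 0.3 * series[-2])
coefficients = fit_ar(series, 2)
```

Model selection repeats this fit for many candidate orders, so the cost grows with the series length each time. Randomized methods such as those compared in the paper target that repeated cost.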
# (参考訳) 自己監督型単眼深度推定のためのチャネルワイズアテンションに基づくネットワーク Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation ( http://arxiv.org/abs/2112.13047v1 ) ライセンス: CC BY 4.0	Jiaxing Yan, Hong Zhao, Penghui Bu, YuSheng Jin	(参考訳) 自己教師付き学習は単眼深度推定に非常に有望な結果を示した。シーン構造と局所的詳細はどちらも高品質な深さ推定のための重要な手がかりである。最近の研究は、シーン構造の明示的なモデリングの欠如と詳細情報の適切なハンドリングに苦しめられ、その結果、パフォーマンスボトルネックとぼやけたアーティファクトが予測結果に生じている。本稿では,チャネルワイドアテンションに基づく深さ推定ネットワーク(CADepth-Net)を提案する。 1) この構造認識モジュールは, 長距離依存を捕捉し, チャネル次元における識別的特徴を集約する自己認識機構を用いて, シーン構造の認識を明示的に強化し, より優れたシーン理解とリッチな特徴表現を得る。 2) 細部強調モジュールは、チャンネルワイドの特徴マップを再分類し、情報的特徴を選択的に強調し、重要な局所的詳細情報を強調し、異なるレベルの特徴をより効率的に融合させ、より正確でシャープな深度予測を実現する。さらに,本手法の有効性を検証し,KITTIベンチマークとMake3Dデータセットの最先端結果が得られたことを示す。 Self-supervised learning has shown very promising results for monocular depth estimation. Scene structure and local details both are significant clues for high-quality depth estimation. Recent works suffer from the lack of explicit modeling of scene structure and proper handling of details information, which leads to a performance bottleneck and blurry artefacts in predicted results. In this paper, we propose the Channel-wise Attention-based Depth Estimation Network (CADepth-Net) with two effective contributions: 1) The structure perception module employs the self-attention mechanism to capture long-range dependencies and aggregates discriminative features in channel dimensions, explicitly enhances the perception of scene structure, obtains the better scene understanding and rich feature representation. 2) The detail emphasis module re-calibrates channel-wise feature maps and selectively emphasizes the informative features, aiming to highlight crucial local details information and fuse different level features more efficiently, resulting in more precise and sharper depth prediction. Furthermore, the extensive experiments validate the effectiveness of our method and show that our model achieves the state-of-the-art results on the KITTI benchmark and Make3D datasets.	
翻訳日:2021-12-29 14:13:44 公開日:2021-12-24
# (参考訳) 高速スケーラブルHDRデゴーストリングのための自己ゲートメモリリカレントネットワーク Self-Gated Memory Recurrent Network for Efficient Scalable HDR Deghosting ( http://arxiv.org/abs/2112.13050v1 ) ライセンス: CC BY 4.0	K. Ram Prabhakar, Susmit Agrawal, R. Venkatesh Babu	(参考訳) 任意の長さの動的シーケンスを融合する新しいリカレントネットワーク型hdrデガホスト方式を提案する。提案手法は畳み込み型および再帰型アーキテクチャを用いて視覚的にゴーストフリーなhdr画像を生成する。我々は,標準lstmセルよりも少ないパラメータを持ち,高速な実行時間を有する新しいリカレントセルアーキテクチャ,すなわち自己制御メモリ(sgm)セルを導入する。 sgmセルでは、ゲートを流れる情報の流れは、ゲートの出力に自身の関数を乗じることで制御される。さらに、2つのSGMセルを双方向設定で使用し、出力品質を向上する。提案手法は,既存のhdrデガホスト法と比較して,3つの公開データセットを定量的に分離すると同時に,可変長入力シーケンスを再トレーニングすることなく融合する拡張性を実現する。広範なアブレーションにより,提案手法における個々の成分の重要性を実証する。コードはhttps://val.cds.iisc.ac.in/hdr/hdrrnn/index.htmlで入手できる。 We propose a novel recurrent network-based HDR deghosting method for fusing arbitrary length dynamic sequences. The proposed method uses convolutional and recurrent architectures to generate visually pleasing, ghosting-free HDR images. We introduce a new recurrent cell architecture, namely Self-Gated Memory (SGM) cell, that outperforms the standard LSTM cell while containing fewer parameters and having faster running times. In the SGM cell, the information flow through a gate is controlled by multiplying the gate's output by a function of itself. Additionally, we use two SGM cells in a bidirectional setting to improve output quality. The proposed approach achieves state-of-the-art performance compared to existing HDR deghosting methods quantitatively across three publicly available datasets while simultaneously achieving scalability to fuse variable-length input sequence without necessitating re-training. Through extensive ablations, we demonstrate the importance of individual components in our proposed approach. The code is available at https://val.cds.iisc.ac.in/HDR/HDRRNN/index.html.	翻訳日:2021-12-29 13:54:52 公開日:2021-12-24
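The self-gating idea described for the SGM cell, where a gate's output is multiplied by a function of itself, can be illustrated in isolation. The sketch below assumes a sigmoid as that function and a plain affine map as the gate's transformation; neither choice is confirmed by the abstract, and this is not the authors' cell. All names and numbers are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, inputs):
    """Plain affine map standing in for the gate's learned transformation."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def self_gate(gate_output, gate_fn=sigmoid):
    """Scale each gate output by a function of that same output."""
    return [g * gate_fn(g) for g in gate_output]


# Hypothetical 2-unit gate over a 3-dimensional input.
gate_weights = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]
gate_bias = [0.0, 0.1]
features = [1.0, 2.0, -1.0]
gate_output = linear(gate_weights, gate_bias, features)
gated = self_gate(gate_output)
```

Because the gating signal is derived from the gate's own output, no second weight matrix is needed to produce it. That is consistent with, though not a derivation of, the abstract's claim of fewer parameters than a standard LSTM cell.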
# (参考訳) Tri-Transformer Hawkesプロセス: 3つの頭は1つより優れている Tri-Transformer Hawkes Process: Three Heads are better than one ( http://arxiv.org/abs/2112.13058v1 ) ライセンス: CC BY 4.0	Zhi-yan Song, Jian-wei Liu, Lu-ning Zhang, and Ya-nan Han	(参考訳) 私たちが遭遇する現実世界のデータのほとんどは非同期イベントシーケンスであり、過去数十年は、ソーシャルネットワーク、電子医療記録、金融取引の分野への様々なポイントプロセスの実装が特徴である。初めは、複雑な系列における異なる事象間の自己トリガーおよび相互トリガーパターンを同時にシミュレートできるホークス過程とその変種が一般的であり、ニューラルネットワークの進歩とともに、ニューラルホークスプロセスが次々と提案され、徐々に研究ホットスポットとなっている。変圧器ホークスプロセス (THP) の提案は大幅に性能が向上し, 変圧器に基づくニューラルホークスプロセスの新たなアップサージが開始された。しかし、THPは非同期イベントシーケンスにおける発生時間やイベントの種類に関する情報を完全に利用していない。単にイベントタイプ変換のエンコーディングと、ソースエンコーディングに時間変換のロケーションエンコーディングを追加するだけである。同時に、単一の変換器から構築された学習者は、避けられない学習バイアスをもたらす。これらの問題を緩和するため,我々は,イベントと時間情報をドット積注意に補助情報として付加し,新たなマルチヘッド注意を形成するトリトランスフォーマホークスプロセス(Tri-THP)モデルを提案する。 Tri-THPの有効性は、実世界と合成データの双方でよく設計された実験によって示されている。 Most of the real-world data we encounter are asynchronous event sequences, so the last decades have been characterized by the application of various point processes to the fields of social networks, electronic medical records, and financial transactions. At the beginning, the Hawkes process and its variants, which can simultaneously simulate the self-triggering and mutual-triggering patterns between different events in complex sequences in a clear and quantitative way, were more popular. Later on, with the advances in neural networks, neural Hawkes processes were proposed one after another and gradually became a research hotspot. The proposal of the transformer Hawkes process (THP) brought a large performance improvement, setting off a new upsurge of transformer-based neural Hawkes processes. However, THP does not make full use of the information on the occurrence time and type of event in the asynchronous event sequence. It simply adds the encoding of event type conversion and the location encoding of time conversion to the source encoding. At the same time, a learner built from a single transformer will result in an inescapable learning bias. 
In order to mitigate these problems, we propose a tri-transformer Hawkes process (Tri-THP) model, in which the event and time information are added to the dot-product attention as auxiliary information to form a new multihead attention. The effectiveness of the Tri-THP is proved by a series of well-designed experiments on both real world and synthetic data.	翻訳日:2021-12-29 13:27:44 公開日:2021-12-24
# (参考訳) Virtuoso: SOCのリアルタイムチューニングのためのビデオベースのインテリジェンス Virtuoso: Video-based Intelligence for real-time tuning on SOCs ( http://arxiv.org/abs/2112.13076v1 ) ライセンス: CC BY 4.0	Jayoung Lee, PengCheng Wang, Ran Xu, Venkat Dasari, Noah Weston, Yin Li, Saurabh Bagchi, and Somali Chaterji	(参考訳) 画像分類や物体検出などのコンピュータビジョンタスクを組み込みデバイスやモバイルデバイスに最適化するために,効率的な適応型コンピュータビジョンシステムが提案されている。これらのソリューションは、非常に最近のもので、近似ノブを持つ適応システムを設計することで、モデル(ディープニューラルネットワーク、DNN)またはシステムを最適化することに焦点を当てている。最近の試みにもかかわらず、既存のソリューションには2つの大きな欠点がある。第一に、システムはどのモデルを実行するかを決定する間、モデルのエネルギー消費を考慮しない。第2に、他の共同居住者のワークロードのため、デバイス上での競合の現実的なシナリオを考慮していない。本研究では,高効率で適応的な映像物体検出システムvirtuosoを提案する。基盤となるvirtuosoは、精度・エネルギー・レイテンシ軸の異なる操作点で動作するマルチブランチ実行カーネルと、ユーザ要求を満たすために最適な実行ブランチを選択する軽量ランタイムスケジューラである。 Virtuosoと同等に比較するために、Faster R-CNN (FRCNN)、YOLO v3、SSD、EfficientDet、SELSA、MEGA、REPP、FastAdapt、およびFRCNN+、YOLO+、SSD+、EfficientDet+の社内適応版を含む15の最先端または広く使用されているプロトコルをベンチマークした。この包括的なベンチマークにより、virtuosoは上記のプロトコルをすべて上回っており、nvidia jetsonモバイルgpuのあらゆる効率レベルで精度のフロンティアをリードしている。具体的には、Virtuosoの精度は63.9%に達し、これは一般的なオブジェクト検出モデルよりも10%以上高く、FRCNNは51.1%、YOLOは49.5%である。 Efficient and adaptive computer vision systems have been proposed to make computer vision tasks, such as image classification and object detection, optimized for embedded or mobile devices. These solutions, quite recent in their origin, focus on optimizing the model (a deep neural network, DNN) or the system by designing an adaptive system with approximation knobs. In spite of several recent efforts, we show that existing solutions suffer from two major drawbacks. First, the system does not consider energy consumption of the models while making a decision on which model to run. Second, the evaluation does not consider the practical scenario of contention on the device, due to other co-resident workloads. In this work, we propose an efficient and adaptive video object detection system, Virtuoso, which is jointly optimized for accuracy, energy efficiency, and latency. 
Underlying Virtuoso is a multi-branch execution kernel that is capable of running at different operating points in the accuracy-energy-latency axes, and a lightweight runtime scheduler to select the best fit execution branch to satisfy the user requirement. To fairly compare with Virtuoso, we benchmark 15 state-of-the-art or widely used protocols, including Faster R-CNN (FRCNN), YOLO v3, SSD, EfficientDet, SELSA, MEGA, REPP, FastAdapt, and our in-house adaptive variants of FRCNN+, YOLO+, SSD+, and EfficientDet+ (our variants have enhanced efficiency for mobiles). With this comprehensive benchmark, Virtuoso has shown superiority to all the above protocols, leading the accuracy frontier at every efficiency level on NVIDIA Jetson mobile GPUs. Specifically, Virtuoso has achieved an accuracy of 63.9%, which is more than 10% higher than some of the popular object detection models, FRCNN at 51.1%, and YOLO at 49.5%.	翻訳日:2021-12-29 13:16:45 公開日:2021-12-24
# (参考訳) 非対向低光画像強調のための可逆ネットワーク Invertible Network for Unpaired Low-light Image Enhancement ( http://arxiv.org/abs/2112.13107v1 ) ライセンス: CC BY 4.0	Jize Zhang, Haolin Wang, Xiaohe Wu, Wangmeng Zuo	(参考訳) 既存の低照度画像強調手法では、2つのCNNジェネレータを別々に配置し、拡張と分解を行う2方向GANフレームワークが好まれる。しかし、そのようなデータ駆動モデルは、低照度と通常の光画像間の変換の固有の特性を無視し、不安定なトレーニングとアーティファクトをもたらす。そこで本研究では,可逆ネットワークを利用してフォワードプロセスにおける低光度画像の強調を行い,非ペア学習で逆光を劣化させる手法を提案する。生成された実画像は、敵対的学習のための識別器に送られる。敵の損失に加えて、トレーニングの安定性を確保し、より詳細な画像を保存するために様々な損失関数を設計する。特に、過剰露光問題を緩和するために可逆性損失を導入する。さらに,低照度画像に対するプログレッシブ自己誘導強調処理を提案し,SOTAに対して良好な性能を示す。 Existing unpaired low-light image enhancement approaches prefer to employ the two-way GAN framework, in which two CNN generators are deployed for enhancement and degradation separately. However, such data-driven models ignore the inherent characteristics of transformation between the low and normal light images, leading to unstable training and artifacts. Here, we propose to leverage the invertible network to enhance low-light image in forward process and degrade the normal-light one inversely with unpaired learning. The generated and real images are then fed into discriminators for adversarial learning. In addition to the adversarial loss, we design various loss functions to ensure the stability of training and preserve more image details. Particularly, a reversibility loss is introduced to alleviate the over-exposure problem. Moreover, we present a progressive self-guided enhancement process for low-light images and achieve favorable performance against the SOTAs.	翻訳日:2021-12-29 12:43:43 公開日:2021-12-24
# (参考訳) リニア関数近似による加速・インスタンス最適政策評価 Accelerated and instance-optimal policy evaluation with linear function approximation ( http://arxiv.org/abs/2112.13109v1 ) ライセンス: CC BY 4.0	Tianjiao Li, Guanghui Lan and Ashwin Pananjady	(参考訳) 本稿では,線形関数近似による政策評価の問題と,高い最適性を保証するアルゴリズムを提示する。まず、この問題における決定的誤差と確率的誤差の両方に基づくベースラインを確立する下界の証明から始める。特に,トランジションカーネルの定常分布に付随するインスタンス依存ノルムにおいて,決定的誤差に起因するオラクルの複雑性を低く証明し,局所漸近ミニマックス機構を用いて,確率的誤差に起因したインスタンス依存的な下界を観測モデルで証明する。既存のアルゴリズムは、これらの下界の少なくとも1つと一致しない: 説明するために、時間差学習の分散還元型を分析し、特にオラクルの複雑性下界を達成することができないことを示す。この問題に対処するため,我々は,下限と下限の両方を同時に一致させ,インスタンス最適化の強い概念を実現する,分散低減高速時間差アルゴリズム(vrftd)を開発した。最後に、vrftdアルゴリズムをマルコフ観測による設定に拡張し、連鎖の混合時間に比例する乗算係数まで設定したi.i.d.の設定と一致するインスタンス依存収束結果を提供する。最適性の理論的保証は数値実験によって裏付けられる。 We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results that match those in the i.i.d. 
setting up to a multiplicative factor that is proportional to the mixing time of the chain. Our theoretical guarantees of optimality are corroborated by numerical experiments.	翻訳日:2021-12-29 12:40:27 公開日:2021-12-24
# (参考訳) MRI由来の正規化フロー事前分布を用いた超音波スペックル抑制とノイズ除去 Ultrasound Speckle Suppression and Denoising using MRI-derived Normalizing Flow Priors ( http://arxiv.org/abs/2112.13110v1 ) ライセンス: CC BY-SA 4.0	Vincent van de Schaft and Ruud J.G. van Sloun	(参考訳) 超音波検査は安価で広くアクセス可能でコンパクトな医用イメージングソリューションを提供する。しかし、CTやMRIなどの他の画像モダリティと比較して、超音波画像はサブ波長散乱のランダムな干渉に起因する強いスペックルノイズに悩まされている。これにより超音波画像の品質が低下し、解釈が困難になる。本稿では,高画質MRI画像から学習した深層生成事前分布を用いた最大事後確率(MAP)推定に基づく,教師なしの超音波スペックル低減・画像ノイズ除去法を提案する。組織反射率の生成事前分布をモデル化するために,近年さまざまな応用において信号の事前分布のモデル化に非常に有効であることが示されている正規化フローを利用する。一般化を容易にするため,事前分布を分解し,NYU fastMRI(完全サンプリング)データセットのパッチでフローモデルを学習する。この事前分布は反復的なノイズ除去スキームにおける推論に用いられる。まず,ノイズを含むMRIデータ(事前分布のドメインシフトなし)で学習した事前分布の有用性を検証し,次にPICMUSとCUBDLのデータセットから得られたシミュレーション画像とin-vivo超音波画像の両方で性能を評価した。その結果,本手法は他の(教師なし)超音波ノイズ除去法(NLMおよびOBNLM)よりも定量的にも定性的にも優れていた。 Ultrasonography offers an inexpensive, widely accessible and compact medical imaging solution. However, compared to other imaging modalities such as CT and MRI, ultrasound images notoriously suffer from strong speckle noise, which originates from the random interference of sub-wavelength scattering. This deteriorates ultrasound image quality and makes interpretation challenging. Here we propose a new unsupervised ultrasound speckle reduction and image denoising method based on maximum-a-posteriori estimation with deep generative priors that are learned from high-quality MRI images. To model the generative tissue reflectivity prior, we exploit normalizing flows, which in recent years have been shown to be very powerful in modeling signal priors across a variety of applications. To facilitate generalization, we factorize the prior and train our flow model on patches from the NYU fastMRI (fully-sampled) dataset. This prior is then used for inference in an iterative denoising scheme. We first validate the utility of our learned priors on noisy MRI data (no prior domain shift), and then turn to evaluating performance on both simulated and in-vivo ultrasound images from the PICMUS and CUBDL datasets. The results show that the method outperforms other (unsupervised) ultrasound denoising methods (NLM and OBNLM) both quantitatively and qualitatively.	翻訳日:2021-12-29 12:39:09 公開日:2021-12-24
# (参考訳) 劣化によるDNA配列データの品質測定 Measuring Quality of DNA Sequence Data via Degradation ( http://arxiv.org/abs/2112.13111v1 ) ライセンス: CC BY 4.0	Alan F. Karr, Jason Hauzel, Adam A. Porter, Marcel Schaefer	(参考訳) 本稿では,ゲノムデータの品質評価のための新しいパラダイムを提案し,その有効性を定量的に評価する。その理論的根拠は、初期品質が高いほど、ゲノムが脆弱になり、分解の影響が大きくなることである。我々は, この現象がユビキタスであり, 劣化の定量化が多目的に利用できることを示す。データ品質に関して問題となる可能性のある外れ値の特定に重点を置いていますが、真の異常である場合や、データベースを変換しようとする場合さえあります。 We propose and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.	翻訳日:2021-12-29 12:22:36 公開日:2021-12-24
# (参考訳) ゲノムのマルコフ構造の外れ値同定とリード分類への応用 Application of Markov Structure of Genomes to Outlier Identification and Read Classification ( http://arxiv.org/abs/2112.13117v1 ) ライセンス: CC BY 4.0	Alan F. Karr, Jason Hauzel, Adam A. Porter, Marcel Schaefer	(参考訳) 本稿では,連続する塩基の3つ組の分布によって規定される2次マルコフ過程としてのゲノム構造を,2つのバイオインフォマティクス問題,すなわちゲノムデータベースにおける外れ値の同定とメタゲノミクスにおけるリード分類に,実際のコロナウイルスおよびアデノウイルスのデータを用いて応用する。 In this paper we apply the structure of genomes as second-order Markov processes specified by the distributions of successive triplets of bases to two bioinformatics problems: identification of outliers in genome databases and read classification in metagenomics, using real coronavirus and adenovirus data.	翻訳日:2021-12-29 12:09:26 公開日:2021-12-24
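To make the phrase "second-order Markov processes specified by the distributions of successive triplets of bases" concrete, here is a minimal sketch of how such a description could be estimated from a sequence. It is an illustration only, not the authors' code: the function names `triplet_distribution` and `transition_probs` and the toy sequence are assumptions introduced here.

```python
from collections import Counter, defaultdict


def triplet_distribution(seq):
    """Empirical distribution of successive (overlapping) base triplets."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {triplet: c / total for triplet, c in counts.items()}


def transition_probs(seq):
    """Second-order transitions: P(next base | previous two bases)."""
    seq = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(seq) - 2):
        counts[seq[i:i + 2]][seq[i + 2]] += 1
    return {
        context: {base: c / sum(nxt.values()) for base, c in nxt.items()}
        for context, nxt in counts.items()
    }


# Hypothetical toy sequence, far shorter than any real genome.
toy = "ACGTACGTAC"
dist = triplet_distribution(toy)
probs = transition_probs(toy)
```

Two sequences can then be compared through any distance between their triplet distributions; which distance the paper uses is not stated in the abstract, so none is assumed here.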
# (参考訳) グラフとグラフの類似度を学習するためのニューラルネットワークフレームワーク A Neural Framework for Learning Subgraph and Graph Similarity Measures ( http://arxiv.org/abs/2112.13143v1 ) ライセンス: CC BY 4.0	Rishabh Ranjan, Siddharth Grover, Sourav Medya, Venkatesan Chakaravarthy, Yogish Sabharwal, Sayan Ranu	(参考訳) グラフ解析において、グラフ類似性探索は基本的な演算子である。このフレームワークでは、クエリグラフとグラフデータベースが与えられた場合、クエリに構造的に類似したデータベースグラフのサブグラフを特定することが目的である。サブグラフ編集距離(sed)は、サブグラフの類似性の最も表現力のある尺度の1つである。本研究では,グラフペアの学習セットとそのSED値からSEDを学習する問題について検討する。そこで我々は,SEDを連想させる構造を持つ埋め込み空間を学習するNEUROSEDと呼ばれる新しいシアムグラフニューラルネットワークを設計する。 NEUROSEDは特殊に製作された帰納バイアスの助けを借りて、高い精度を実現するだけでなく、予測されたSEDが真のSEDと同様に三角形の不等式を満たすことを保証する。この設計はグラフ編集距離(GED)をモデル化するのに十分一般的なものであり、予測されたGED空間が真のGED空間のようにメートル法であることを保証している。 SEDとGEDの両方において、実際のグラフデータセットに関する大規模な実験により、NEUROSEDは最先端のベースラインの約18倍の速度でRMSEの約2倍の速度で達成されていることが証明された。さらに、ペア独立な埋め込みと理論的性質のため、neurosedはグラフやサブグラフの検索を約3桁高速化できる。 Subgraph similarity search is a fundamental operator in graph analysis. In this framework, given a query graph and a graph database, the goal is to identify subgraphs of the database graphs that are structurally similar to the query. Subgraph edit distance (SED) is one of the most expressive measures for subgraph similarity. In this work, we study the problem of learning SED from a training set of graph pairs and their SED values. Towards that end, we design a novel siamese graph neural network called NEUROSED, which learns an embedding space with a rich structure reminiscent of SED. With the help of a specially crafted inductive bias, NEUROSED not only enables high accuracy but also ensures that the predicted SED, like true SED, satisfies triangle inequality. The design is generic enough to also model graph edit distance (GED), while ensuring that the predicted GED space is metric, like the true GED space. 
Extensive experiments on real graph datasets, for both SED and GED, establish that NEUROSED achieves approximately 2 times lower RMSE than the state of the art and is approximately 18 times faster than the fastest baseline. Further, owing to its pair-independent embeddings and theoretical properties, NEUROSED allows approximately 3 orders of magnitude faster retrieval of graphs and subgraphs.	翻訳日:2021-12-29 11:55:40 公開日:2021-12-24
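The abstract states that the predicted SED satisfies the triangle inequality. As a hedged illustration of how an asymmetric distance on embeddings can have that property, the sketch below uses a one-sided (hinge) Euclidean distance. This specific form, the name `hinge_distance`, and the random vectors standing in for graph embeddings are assumptions made here; the abstract does not confirm that NEUROSED uses exactly this function.

```python
import math
import random


def hinge_distance(z_query, z_target):
    """One-sided distance: only coordinates where the query exceeds the target count."""
    return math.sqrt(sum(max(0.0, q - t) ** 2 for q, t in zip(z_query, z_target)))


def check_triangle(trials=1000, dim=4, seed=0):
    """Empirically check d(a, c) <= d(a, b) + d(b, c) on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = ([rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3))
        if hinge_distance(a, c) > hinge_distance(a, b) + hinge_distance(b, c) + 1e-9:
            return False
    return True
```

The distance is zero whenever every coordinate of the query embedding is at most the matching coordinate of the target, which mirrors the asymmetry of subgraph matching: a query contained in the target costs nothing, while the reverse direction generally does.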
# (参考訳) SoK:音声処理システムのセキュリティに関する研究 SoK: A Study of the Security on Voice Processing Systems ( http://arxiv.org/abs/2112.13144v1 ) ライセンス: CC BY 4.0	Robert Chang, Logan Kuo, Arthur Liu, and Nader Sehatbakhsh	(参考訳) 音声処理システム(vps)の使用は、商用音声認識デバイスや主要なテキスト対音声ソフトウェアといったアプリケーションへの依存が高まり、日々の日常生活で普及し続けているため、これらのシステムに対する攻撃はますます複雑で、多様で、絶えず進化している。 VPSのユースケースが急速に新しいスペースと目的に成長するにつれ、プライバシーに関する潜在的な影響はますます危険になっている。さらに、空襲の数の増加と実用性の増加により、システム障害はずっと起こり得るものになっている。本稿では,音声処理システムにおけるユニークな攻撃の配置を識別し,分類する。長年にわたり研究は、システムの故障やサービスの否定をもたらす特殊な標的のない攻撃から、敵によって制御される結果を強制するより汎用的な攻撃へと移行してきた。現在の最も頻繁に使用されている機械学習システムと、現代の音声処理システムの中核であるディープニューラルネットワークは、セキュリティよりもパフォーマンスとスケーラビリティを重視して構築されている。したがって,我々は音声処理環境の発達を再評価し,今後の発展と理論的改善を提案するために,現在の攻撃・防御の状況を特定することが重要である。 As the use of Voice Processing Systems (VPS) continues to become more prevalent in our daily lives through the increased reliance on applications such as commercial voice recognition devices as well as major text-to-speech software, the attacks on these systems are increasingly complex, varied, and constantly evolving. With the use cases for VPS rapidly growing into new spaces and purposes, the potential consequences regarding privacy are increasingly more dangerous. In addition, the growing number and increased practicality of over-the-air attacks have made system failures much more probable. In this paper, we will identify and classify an arrangement of unique attacks on voice processing systems. Over the years research has been moving from specialized, untargeted attacks that result in the malfunction of systems and the denial of services to more general, targeted attacks that can force an outcome controlled by an adversary. The current and most frequently used machine learning systems and deep neural networks, which are at the core of modern voice processing systems, were built with a focus on performance and scalability rather than security. 
Therefore, it is critical for us to reassess the developing voice processing landscape and to identify the state of current attacks and defenses so that we may suggest future developments and theoretical improvements.	翻訳日:2021-12-29 11:27:11 公開日:2021-12-24
# (参考訳) 順方向・逆方向離散周期ラドン変換の高速かつスケーラブルな計算法 Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform ( http://arxiv.org/abs/2112.13149v1 ) ライセンス: CC BY 4.0	Cesar Carranza, Daniel Llamocca, and Marios Pattichis	(参考訳) 離散周期ラドン変換(DPRT)は、投影からの画像再構成を含むアプリケーションで広く使われている。この原稿では、以下に基づいて順方向および逆方向のDPRTを計算するための高速でスケーラブルなアプローチを紹介している。 (i) 固定小数点加算器ツリーの並列配列 (ii) 加算器ツリーの入力データを選択する際に外部メモリコンポーネントにアクセスする必要をなくすための循環シフトレジスタ (iii) 提案するアーキテクチャを利用可能なリソースに適合させることができる、画像ブロックに基づくDPRT計算のアプローチ (iv) 入力画像のサイズに依存しない1または数回のクロックサイクルで計算される高速な転置。結果として、$N\times N$ の画像($N$ は素数)の場合、提案手法はクロックサイクル当たり最大 $N^{2}$ 回の加算を計算することができる。従来のアプローチと比較して、スケーラブルなアプローチは、さまざまな量の計算リソースに対して既知の最速の実装を提供する。例えば、$251\times 251$ の画像では、シストリック実装で必要とされるよりも約 $25\%$ 少ないフリップフロップで、スケーラブルなDPRTは36倍高速に計算できる。最も高速な場合として、DPRTとその逆をそれぞれ $2N+\left\lceil \log_{2}N\right\rceil+1$ および $2N+3\left\lceil \log_{2}N\right\rceil+B+2$ サイクルで計算できる最適化アーキテクチャを導入する。ここで $B$ は各入力画素を表すビット数である。一方、スケーラブルなDPRTアプローチでは、シストリック実装よりも多くの1ビット加算が必要であり、速度と追加の1ビット加算との間のトレードオフを提供する。提案したDPRTアーキテクチャはすべてVHDLで実装され、FPGA実装を用いて検証された。 The Discrete Periodic Radon Transform (DPRT) has been extensively used in applications that involve image reconstructions from projections. This manuscript introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: (i) a parallel array of fixed-point adder trees, (ii) circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees, (iii) an image block-based approach to DPRT computation that can fit the proposed architecture to available resources, and (iv) fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an $N\times N$ image ($N$ prime), the proposed approach can compute up to $N^{2}$ additions per clock cycle. Compared to previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a $251\times 251$ image, the scalable DPRT is computed 36 times faster while using approximately $25\%$ fewer flip-flops than a systolic implementation requires. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just $2N+\left\lceil \log_{2}N\right\rceil+1$ and $2N+3\left\lceil \log_{2}N\right\rceil+B+2$ cycles respectively, where $B$ is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than the systolic implementation and provides a trade-off between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHDL and validated using an FPGA implementation.	翻訳日:2021-12-29 11:11:58 公開日:2021-12-24
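For readers unfamiliar with the transform itself, the following is a plain software reference for the forward and inverse DPRT of an $N\times N$ image with $N$ prime, using one common definition: $R(m,d)=\sum_{i} f(i,\langle d+mi\rangle_N)$ for $0\le m<N$, plus row sums as projection $m=N$. It is a direct $O(N^3)$ sketch for checking results, not the paper's hardware architecture; the function names `dprt` and `idprt` are assumptions, and whether the paper indexes projections in exactly this way is not confirmed by the abstract.

```python
def dprt(img):
    """Forward DPRT of an N x N integer image (N prime): N+1 projections of length N."""
    n = len(img)
    # Projections m = 0..N-1: sum along periodic lines with slope m.
    proj = [
        [sum(img[i][(d + m * i) % n] for i in range(n)) for d in range(n)]
        for m in range(n)
    ]
    # Projection m = N: plain row sums.
    proj.append([sum(img[d]) for d in range(n)])
    return proj


def idprt(proj):
    """Inverse DPRT; exact for integer images because N is prime."""
    n = len(proj) - 1
    total = sum(proj[0])  # every projection sums to the total image sum
    return [
        [
            (sum(proj[m][(j - m * i) % n] for m in range(n)) - total + proj[n][i]) // n
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Only additions, one subtraction, and a single division by $N$ per pixel are needed, which is consistent with the adder-tree emphasis of the abstract.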
# (参考訳) スケーラブルアーキテクチャを用いた高速2次元畳み込みと相互相関 Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures ( http://arxiv.org/abs/2112.13150v1 ) ライセンス: CC BY 4.0	Cesar Carranza, Daniel Llamocca, and Marios Pattichis	(参考訳) この原稿は、高速でスケーラブルなアーキテクチャと、畳み込みと相互相関を計算するための関連するアルゴリズムを記述している。基本的な考え方は、2次元の畳み込みとクロス相関を変換領域内の1次元の畳み込みとクロス相関の集合にマッピングすることである。これは、一般的なカーネルに離散周期ラドン変換(DPRT)を使用し、低ランクカーネルにSVD-LU分解を使用することで達成される。このアプローチではスケーラブルなアーキテクチャを使用し、最新のFPGAやZynq-SOCデバイスに組み込める。利用可能なリソースの種類によっては、$P\times P$ blocks、$O(P)$ clock cycles to $O(P^2)$ clock cyclesで2D畳み込みと相互相関を計算することができる。したがって、パフォーマンスと必要な数とリソースの種類との間にトレードオフがある。本稿では,最新のプログラマブルデバイス(Virtex-7とZynq-SOC)を用いて提案アーキテクチャの実装を行う。必要なリソースの量と種類に基づいて,提案手法が現在の手法を大きく上回ることを示す。 The manuscript describes fast and scalable architectures and associated algorithms for computing convolutions and cross-correlations. The basic idea is to map 2D convolutions and cross-correlations to a collection of 1D convolutions and cross-correlations in the transform domain. This is accomplished through the use of the Discrete Periodic Radon Transform (DPRT) for general kernels and the use of SVD-LU decompositions for low-rank kernels. The approach uses scalable architectures that can be fitted into modern FPGA and Zynq-SOC devices. Based on different types of available resources, for $P\times P$ blocks, 2D convolutions and cross-correlations can be computed in just $O(P)$ clock cycles up to $O(P^2)$ clock cycles. Thus, there is a trade-off between performance and required numbers and types of resources. We provide implementations of the proposed architectures using modern programmable devices (Virtex-7 and Zynq-SOC). Based on the amounts and types of required resources, we show that the proposed approaches significantly outperform current methods.	翻訳日:2021-12-29 10:40:30 公開日:2021-12-24
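The idea of mapping a 2D convolution to 1D convolutions is easiest to see for a rank-1 kernel, which factors into a column vector times a row vector. The sketch below checks that case in plain Python. It illustrates the general principle only: the paper's SVD-LU decomposition for low-rank kernels and its DPRT route for general kernels are not reproduced, and the helper names are assumptions.

```python
def conv1d(x, k):
    """Full 1D convolution of two sequences."""
    out = [0] * (len(x) + len(k) - 1)
    for i, xi in enumerate(x):
        for a, ka in enumerate(k):
            out[i + a] += xi * ka
    return out


def conv2d(img, ker):
    """Direct full 2D convolution, used here as the reference result."""
    h, w, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    out = [[0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += img[i][j] * ker[a][b]
    return out


def separable_conv2d(img, col, row):
    """Convolve with the rank-1 kernel col * row^T using only 1D convolutions."""
    rows_done = [conv1d(r, row) for r in img]                 # 1D pass along rows
    cols_done = [conv1d(list(c), col) for c in zip(*rows_done)]  # 1D pass along columns
    return [list(r) for r in zip(*cols_done)]                 # transpose back
```

A kernel of rank $r$ is a sum of $r$ such outer products, so it can be handled by summing $r$ separable passes.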
# ニューラルインターコネクトとダンピング割り当てを用いた全エネルギーシェーピング --パッシビリティに基づく制御 Total Energy Shaping with Neural Interconnection and Damping Assignment -- Passivity Based Control ( http://arxiv.org/abs/2112.12999v1 ) ライセンス: Link先を確認	Santiago Sanchez-Escalonilla, Rodolfo Reyes-Baez, Bayu Jayawardhana	(参考訳) 本研究では、ニューラルネットワーク(NN)の普遍的近似特性を利用して、ポート-ハミルトン(pH)フレームワークにおける完全作動機械系の相互接続と減衰割り当て(IDA)制御を設計する。そこで我々は、IDA-PBC法を、偏微分マッチング方程式を解く教師付き学習問題に変換し、平衡割当やリャプノフ安定性条件を満たす。この結果の主な結果は、学習アルゴリズムの出力が通過率とリャプノフ安定性の観点から明確な制御論的解釈を持つことである。提案する制御設計手法は, 数値シミュレーションにより自由度1, 2自由度機械システムに対して検証された。 In this work we exploit the universal approximation property of Neural Networks (NNs) to design interconnection and damping assignment (IDA) passivity-based control (PBC) schemes for fully-actuated mechanical systems in the port-Hamiltonian (pH) framework. To that end, we transform the IDA-PBC method into a supervised learning problem that solves the partial differential matching equations, and fulfills equilibrium assignment and Lyapunov stability conditions. A main consequence of this, is that the output of the learning algorithm has a clear control-theoretic interpretation in terms of passivity and Lyapunov stability. The proposed control design methodology is validated for mechanical systems of one and two degrees-of-freedom via numerical simulations.	翻訳日:2021-12-28 17:50:28 公開日:2021-12-24
# 深部暗黙的場を用いた点雲からのコンパクト建築モデルの構築 Reconstructing Compact Building Models from Point Clouds Using Deep Implicit Fields ( http://arxiv.org/abs/2112.13142v1 ) ライセンス: Link先を確認	Zhaiyu Chen, Seyran Khademi, Hugo Ledoux, Liangliang Nan	(参考訳) 3次元建築モデルは、多くの現実世界の応用においてますます重要な役割を果たす一方、建物のコンパクトな表現は未解決の問題である。本稿では,点雲からコンパクト・水密・多角形建築モデルを再構築するための新しい枠組みを提案する。私たちのフレームワークは3つのコンポーネントで構成されています。 a) 細胞複合体は、候補集合として多面体埋め込みを提供する適応空間分割によって生成される。 b) 暗黙的場は、占有率推定の構築を容易にする深層ニューラルネットワークによって学習される。 (c)マルコフ確率場を定式化し、組合せ最適化により建物の外面を抽出する。形状再構成, 表面近似, 幾何単純化における最先端手法と評価, 比較を行った。人工的および実世界のポイントクラウドにおける実験では、ニューラルネットワークによる戦略により、忠実性、コンパクト性、計算効率において、高品質なビルディングモデルが得られることが示されています。提案手法は, ノイズに対する頑健さと測定の不十分さを示し, 合成スキャンから実世界の計測まで, 直接的に一般化することができる。 Three-dimensional (3D) building models play an increasingly pivotal role in many real-world applications while obtaining a compact representation of buildings remains an open problem. In this paper, we present a novel framework for reconstructing compact, watertight, polygonal building models from point clouds. Our framework comprises three components: (a) a cell complex is generated via adaptive space partitioning that provides a polyhedral embedding as the candidate set; (b) an implicit field is learned by a deep neural network that facilitates building occupancy estimation; (c) a Markov random field is formulated to extract the outer surface of a building via combinatorial optimization. We evaluate and compare our method with state-of-the-art methods in shape reconstruction, surface approximation, and geometry simplification. Experiments on both synthetic and real-world point clouds have demonstrated that, with our neural-guided strategy, high-quality building models can be obtained with significant advantages in fidelity, compactness, and computational efficiency. Our method shows robustness to noise and insufficient measurements, and it can directly generalize from synthetic scans to real-world measurements.	翻訳日:2021-12-28 17:42:40 公開日:2021-12-24
# 汚染ガウスモデルにおけるロバスト推定のためのトラクタブルおよび準最適逆アルゴリズム Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models ( http://arxiv.org/abs/2112.12919v1 ) ライセンス: Link先を確認	Ziyue Wang, Zhiqiang Tan	(参考訳) フーバーの汚染ガウスモデルの下での位置と分散行列の同時推定の問題を考える。まず, 人口レベルでの最小$f$-divergence推定を非パラメトリック判別器を用いた生成逆数法に対応して検討し, 最小距離推定のロバスト性と同様に, 頑健な推定につながる$f$-divergencesの条件を確立する。より重要なことは、単純なスプライン判別器を用いた扱いやすい逆アルゴリズムを開発し、現在のジェネレータに与えられた凹型目的関数を最大化することで判別器パラメータを完全に更新できるように、入れ子最適化によって実装できる。提案手法は,$f$-divergenceと使用したペナルティに応じて,最小値の最適値またはほぼ最適値を達成する。本稿では,古典的ロバスト推定法,ペアワイズ法,ニューラルネットワーク判別法に対する提案手法の利点を示すシミュレーション手法を提案する。 Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model. First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator and establish conditions on $f$-divergences which lead to robust estimation, similarly to robustness of minimum distance estimation. More importantly, we develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization such that the discriminator parameters can be fully updated by maximizing a concave objective function given the current generator. The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used. We present simulation studies to demonstrate advantages of the proposed methods over classic robust estimators, pairwise methods, and a generative adversarial method with neural network discriminators.	翻訳日:2021-12-28 17:37:32 公開日:2021-12-24
# ディープフィードフォワードReLUニューラルネットワークのパラメータ同定可能性 Parameter identifiability of a deep feedforward ReLU neural network ( http://arxiv.org/abs/2112.12982v1 ) ライセンス: Link先を確認	Joachim Bona-Pellissier (IMT), Fran\c{c}ois Bachoc (IMT), Fran\c{c}ois Malgouyres (IMT)	(参考訳) 入力空間のサブセットにおける関数の知識のおかげで、ニューラルネットワークのパラメータ重みとバイアスを回復する可能性は、状況、呪い、祝福によっても可能となる。一方、パラメータを復元することで、より良い敵攻撃が可能になり、ネットワーク構築に使用されるデータセットから機密情報を開示することもできる。一方、ネットワークのパラメータが復元可能であれば、潜在空間の特徴を解釈できることをユーザに保証する。また、ネットワークの性能に関する正式な保証を得るための基盤も提供する。したがって、パラメータを識別できるネットワークとパラメータを識別できないネットワークを特徴付けることが重要である。本稿では、入力空間のサブセットに実装する関数から、ネットワークのパラメータが一意に同定されたモジュロ置換と正の再スケーリングを持つディープ完全接続フィードフォワードReLUニューラルネットワークに条件セットを提供する。 The possibility for one to recover the parameters-weights and biases-of a neural network thanks to the knowledge of its function on a subset of the input space can be, depending on the situation, a curse or a blessing. On one hand, recovering the parameters allows for better adversarial attacks and could also disclose sensitive information from the dataset used to construct the network. On the other hand, if the parameters of a network can be recovered, it guarantees the user that the features in the latent spaces can be interpreted. It also provides foundations to obtain formal guarantees on the performances of the network. It is therefore important to characterize the networks whose parameters can be identified and those whose parameters cannot. In this article, we provide a set of conditions on a deep fully-connected feedforward ReLU neural network under which the parameters of the network are uniquely identified-modulo permutation and positive rescaling-from the function it implements on a subset of the input space.	翻訳日:2021-12-28 17:37:13 公開日:2021-12-24
# 基礎疾患と新型コロナウイルス感受性との関連性に関する機械学習解析 A machine learning analysis of the relationship between some underlying medical conditions and COVID-19 susceptibility ( http://arxiv.org/abs/2112.12901v1 ) ライセンス: Link先を確認	Mostafa Rezapour, Colin A. Varady	(参考訳) 過去数年間、新型コロナウイルス(covid-19)は米国に住むすべての国民の日常生活に大きな影響を与え、気づかれずにはいられないいくつかの致命的な健康リスクを課してきた。米国の社会にcovid-19が与える恐怖と危険が高まる中で、個人が利用するための恒久的な治療として、いくつかのワクチンやブースターが作成されている。本稿では,米国内の複数の州において,新型コロナウイルスワクチンとブースターの関連とコロナウイルスの総感染者数について検討する。また,本研究は,いくつかの病原体とcovid-19の関連について述べる。本稿では,これらの関係を効果的に議論するために,統計的テストと機械学習手法を用いて分析と議論を行う。さらに, 教育的達成, 人種, およびcovid-19との関係と, 基礎疾患, ワクチン接種率, およびcovid-19総症例数, 死亡数との関連性について考察した。 For the past couple years, the Coronavirus, commonly known as COVID-19, has significantly affected the daily lives of all citizens residing in the United States by imposing several, fatal health risks that cannot go unnoticed. In response to the growing fear and danger COVID-19 inflicts upon societies in the USA, several vaccines and boosters have been created as a permanent remedy for individuals to take advantage of. In this paper, we investigate the relationship between the COVID-19 vaccines and boosters and the total case count for the Coronavirus across multiple states in the USA. Additionally, this paper discusses the relationship between several, selected underlying health conditions with COVID-19. To discuss these relationships effectively, this paper will utilize statistical tests and machine learning methods for analysis and discussion purposes. Furthermore, this paper reflects upon conclusions made about the relationship between educational attainment, race, and COVID-19 and the possible connections that can be established with underlying health conditions, vaccination rates, and COVID-19 total case and death counts.	翻訳日:2021-12-28 17:36:33 公開日:2021-12-24
# 機械学習を用いた心電図信号の心室頻拍検出と分類モデル Supraventricular Tachycardia Detection and Classification Model of ECG signal Using Machine Learning ( http://arxiv.org/abs/2112.12953v1 ) ライセンス: Link先を確認	Pampa Howladar, Manodipan Sahoo	(参考訳) 心電図(ECG)信号の研究は、心電図プロセスが非侵襲的で使用が容易であるため、心臓疾患の診断に不可欠である。本研究は, ノイズのフィルタリング, 心電図特性のユニークな収集, および重症度に応じて異なる型を分類する自動学習分類モデルを含む, 数段階からなる上室性不整脈予測モデルを提案する。我々は,ノイズを低減し,抽出前の機能をよりよく決定するために,信号の消音・消音を行う。その後,必要な特徴抽出の一部として1つのrピーク検出法とq-s検出法を提案する。これらの特徴に対応する次のパラメータが計算される。これらの特徴を活かして,異なるタイプの上室頻拍を分類できる機械学習に基づく分類モデルを開発した。上室頻拍不整脈における決定木モデルが最も効率的な機械学習モデルであることが示唆された。すべての機械学習モデルの中で、このモデルは上室頻拍の重要な信号誤分類を最も効率的に低減する。実験の結果, 良好な改善が得られ, 97%の精度で提案手法の有効性が示された。 Investigation on the electrocardiogram (ECG) signals is an essential way to diagnose heart disease since the ECG process is noninvasive and easy to use. This work presents a supraventricular arrhythmia prediction model consisting of a few stages, including filtering of noise, a unique collection of ECG characteristics, and automated learning classifying model to classify distinct types, depending on their severity. We de-trend and de-noise a signal to reduce noise to better determine functionality before extractions are performed. After that, we present one R-peak detection method and Q-S detection method as a part of necessary feature extraction. Next parameters are computed that correspond to these features. Using these characteristics, we have developed a classification model based on machine learning that can successfully categorize different types of supraventricular tachycardia. Our findings suggest that decision-tree-based models are the most efficient machine learning models for supraventricular tachycardia arrhythmia. Among all the machine learning models, this model most efficiently lowers the crucial signal misclassification of supraventricular tachycardia. Experimental results indicate satisfactory improvements and demonstrate a superior efficiency of the proposed approach with 97% accuracy.	
翻訳日:2021-12-28 17:36:16 公開日:2021-12-24
# 機械学習による心電図信号の効率的な心室頻拍検出モデル Machine Learning-based Efficient Ventricular Tachycardia Detection Model of ECG Signal ( http://arxiv.org/abs/2112.12956v1 ) ライセンス: Link先を確認	Pampa Howladar, Manodipan Sahoo	(参考訳) 心不全の一次診断と解析では、心電図信号が重要な役割を果たす。本稿では, ノイズフィルタリングを用いた心室頻拍不整脈の予測モデル, 心電図特徴のユニークなセット, 機械学習に基づく分類モデルを提案する。信号特徴抽出に先立ち,特徴を適切に検出するためのノイズを除去するために信号の消音・消音を行う。その後、必要な特徴を抽出し、これらの特徴に関連する必要なパラメータを測定する。これらのパラメータを用いて、異なるタイプの心室頻拍不整脈を効率的に分類できる機械学習アプローチを用いて、効率的なマルチクラス分類モデルを作成した。以上の結果から,ロジスティック回帰モデルと決定木モデルが最も効率的な心室頻拍検出モデルであることが示唆された。心臓疾患を診断し,患者のケアを見つけるためには,早期かつ信頼性の高い不整脈の診断が必要である。提案手法の実装により,心室頻拍に関連する臨界信号の誤分類を極めて効率的に低減する問題に対処する。実験結果から,提案したアルゴリズムに対する高いレジリエンスを示した。この支援により、医師は患者のこのタイプの不整脈を早期に評価し、適切なタイミングで適切な判断をすることができる。 In primary diagnosis and analysis of heart defects, an ECG signal plays a significant role. This paper presents a model for the prediction of ventricular tachycardia arrhythmia using noise filtering, a unique set of ECG features, and a machine learning-based classifier model. Before signal feature extraction, we detrend and denoise the signal to eliminate the noise for detecting features properly. After that necessary features have been extracted and necessary parameters related to these features are measured. Using these parameters, we prepared one efficient multiclass classifier model using a machine learning approach that can classify different types of ventricular tachycardia arrhythmias efficiently. Our results indicate that Logistic regression and Decision tree-based models are the most efficient machine learning models for detecting ventricular tachycardia arrhythmia. In order to diagnose heart diseases and find care for a patient, an early, reliable diagnosis of different types of arrhythmia is necessary. By implementing our proposed method, this work deals with the problem of reducing the misclassification of the critical signal related to ventricular tachycardia very efficiently. 
Experimental findings demonstrate satisfactory improvements and high resilience of the proposed algorithm. With this assistance, doctors can assess this type of arrhythmia in a patient early and make the right decision at the proper time.	翻訳日:2021-12-28 17:35:29 公開日:2021-12-24
# 時系列のコンパクト辞書表現を用いた誤差有界近似時系列接合 Error-bounded Approximate Time Series Joins using Compact Dictionary Representations of Time Series ( http://arxiv.org/abs/2112.12965v1 ) ライセンス: Link先を確認	Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Wei Zhang, Eamonn Keogh	(参考訳) matrix profileは、時系列データの類似性結合機能を提供する効果的なデータマイニングツールである。行列プロファイルのユーザは、相似性結合(自己結合)を用いて自身で時系列を結合するか、相似性結合を用いて別の時系列と結合することができる。いずれかのタイプの結合を呼び出すことで、マトリクスプロファイルはデータの保存された構造と異常な構造の両方を発見するのに役立つ。 5年前の行列プロファイルの導入以来、近似結合による計算の高速化に複数の取り組みがなされてきたが、これらの取り組みの大部分は自己結合にのみ焦点をあてている。本研究では,時系列のコンパクトな"ディクショナリ"表現を作成することにより,誤差有界保証を伴う近似時系列間類似性結合を効率的に実行可能であることを示す。元の時系列ではなく辞書表現を用いることで、異常マイニングシステムのスループットを少なくとも20倍向上させることができるが、基本的に精度は低下しない。副次的な効果として、辞書は時系列を意味的に意味のある方法で要約し、直感的で実行可能な洞察を提供する。医学や交通の分野における辞書に基づく時系列間類似性の有用性を実証する。 The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using intra-similarity join (i.e., self-join) or join a time series with another time series using inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both conserved and anomalous structures in the data. Since the introduction of the matrix profile five years ago, multiple efforts have been made to speed up the computation with approximate joins; however, the majority of these efforts only focus on self-joins. In this work, we show that it is possible to efficiently perform approximate inter-time series similarity joins with error bounded guarantees by creating a compact "dictionary" representation of time series. Using the dictionary representation instead of the original time series, we are able to improve the throughput of an anomaly mining system by at least 20X, with essentially no decrease in accuracy. 
As a side effect, the dictionaries also summarize the time series in a semantically meaningful way and can provide intuitive and actionable insights. We demonstrate the utility of our dictionary-based inter-time series similarity joins on domains as diverse as medicine and transportation.	翻訳日:2021-12-28 17:35:10 公開日:2021-12-24
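The inter-similarity join described in this entry can be illustrated with a minimal, exact (brute-force) sketch. It is not the dictionary-based approximate method of the paper, which the abstract does not specify in detail; the function names `znorm` and `ab_join` and the toy series are assumptions chosen for illustration only.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]

def ab_join(series_a, series_b, m):
    """For each length-m subsequence of A, return the z-normalized Euclidean
    distance to its nearest neighbour among the subsequences of B."""
    subs_b = [znorm(series_b[j:j + m]) for j in range(len(series_b) - m + 1)]
    profile = []
    for i in range(len(series_a) - m + 1):
        query = znorm(series_a[i:i + m])
        profile.append(min(math.dist(query, sb) for sb in subs_b))
    return profile

# Toy data (hypothetical): A contains a spike (9) that never occurs in B.
a = [0, 1, 3, 1, 0, 9, 0, 1, 3, 1]
b = [5, 5, 0, 1, 3, 1, 0, 5, 5]
profile = ab_join(a, b, m=4)
print(profile)
```

Low values in the resulting profile mark structure that A shares with B (conserved patterns), while high values mark subsequences of A with no close match in B (candidate anomalies). The brute-force cost of this sketch is what approximate joins aim to reduce.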
# DP-UTIL: 機械学習における差分プライバシーの総合的ユーティリティ分析 DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning ( http://arxiv.org/abs/2112.12998v1 ) ライセンス: Link先を確認	Ismat Jarin and Birhanu Eshete	(参考訳) 差分プライバシー(DP)は、プライバシー漏洩の定量化を推理する厳格な形式主義として登場した。機械学習(ML)では、DPはトレーニング例の推論/開示を制限するために使用されている。以前の作業では、MLパイプライン全体にわたってDPを活用し、独立して、勾配の摂動のようなメカニズムに重点を置いていた。本稿では,入力摂動,客観的摂動,勾配摂動,出力摂動,予測摂動に着目した,mlパイプライン全体のdpの総合的ユーティリティ解析フレームワークdp-utilを提案する。プライバシに敏感なデータに対するMLタスクが与えられた場合、DP-UTILは、モデルユーティリティ損失、プライバシリーク、真に明らかになったトレーニングサンプルの数で測定された、これらの5つの摂動領域におけるDPの影響に関する総合的な比較分析を可能にする。我々は,視覚,医療,財務データセットの分類タスクよりもDP-UTILを評価するために,2つの代表的な学習アルゴリズム(論理回帰とディープニューラルネットワーク)を事例スタディアタックとして利用した。結果のハイライトの1つは、予測摂動が一貫してすべてのデータセットにわたるすべてのモデルで最も低いユーティリティ損失を達成していることです。ロジスティック回帰モデルでは、客観摂動は他の摂動法と比較して低いプライバシー漏洩をもたらす。ディープニューラルネットワークの場合、勾配の摂動はプライバシリークを低くする。さらに,本研究の結果から,プライバシリークが増大するにつれて,より多くのメンバーサンプルが発見されたことが示唆された。以上の結果から,どの摂動メカニズムを使用するべきかを判断するためには,最適化手法(凸対非凸),摂動機構,クラス数,プライバシ予算のダイナミクスを検討する必要があることが示唆された。 Differential Privacy (DP) has emerged as a rigorous formalism to reason about quantifiable privacy leakage. In machine learning (ML), DP has been employed to limit inference/disclosure of training examples. Prior work leveraged DP across the ML pipeline, albeit in isolation, often focusing on mechanisms such as gradient perturbation. In this paper, we present, DP-UTIL, a holistic utility analysis framework of DP across the ML pipeline with focus on input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. Given an ML task on privacy-sensitive data, DP-UTIL enables a ML privacy practitioner perform holistic comparative analysis on the impact of DP in these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples. 
We evaluate DP-UTIL over classification tasks on vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and deep neural network) against a membership inference attack as a case-study attack. One of the highlights of our results is that prediction perturbation consistently achieves the lowest utility loss on all models across all datasets. In logistic regression models, objective perturbation results in the lowest privacy leakage compared to other perturbation techniques. For deep neural networks, gradient perturbation results in the lowest privacy leakage. Moreover, our results on truly revealed records suggest that as privacy leakage increases, a differentially private model reveals more member samples. Overall, our findings suggest that to make informed decisions as to which perturbation mechanism to use, an ML privacy practitioner needs to examine the dynamics between optimization techniques (convex vs. non-convex), perturbation mechanisms, number of classes, and privacy budget.	翻訳日:2021-12-28 17:34:51 公開日:2021-12-24
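As a rough illustration of one of the five perturbation spots named in this entry, the sketch below applies the generic Laplace mechanism to the weights of an already trained model (output perturbation). It is not taken from DP-UTIL; the helper names, the example weights, and the L1 sensitivity value of 1.0 are all assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_output(weights, l1_sensitivity, epsilon, rng):
    """Output perturbation: add Laplace noise with scale l1_sensitivity / epsilon
    to every trained weight before the model is released."""
    scale = l1_sensitivity / epsilon
    return [w + laplace_noise(scale, rng) for w in weights]

def mean_abs_noise(l1_sensitivity, epsilon, trials, seed=0):
    """Average noise magnitude, to show how the privacy budget drives utility loss."""
    rng = random.Random(seed)
    scale = l1_sensitivity / epsilon
    return sum(abs(laplace_noise(scale, rng)) for _ in range(trials)) / trials

weights = [0.5, -1.2, 2.0]  # hypothetical trained weights
private = perturb_output(weights, l1_sensitivity=1.0, epsilon=1.0, rng=random.Random(0))
tight_budget = mean_abs_noise(1.0, epsilon=0.1, trials=2000)
loose_budget = mean_abs_noise(1.0, epsilon=10.0, trials=2000)
print(private, tight_budget, loose_budget)
```

A smaller privacy budget (epsilon) yields larger noise and therefore a larger utility loss, which is the trade-off that the entry's comparison across perturbation spots measures.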
# 解析クエリ処理のための微調整データ構造 Fine-Tuning Data Structures for Analytical Query Processing ( http://arxiv.org/abs/2112.13099v1 ) ライセンス: Link先を確認	Amir Shaikhha, Marios Kelepeshis, Mahdi Ghorbani	(参考訳) 分析ワークロードの効率的な計算を支援するために,データ構造を自動的に選択するフレームワークを提案する。私たちの貢献は2倍です。まず,古典結合やgroupjoin,データベース内機械学習エンジンなど,さまざまなクエリ処理パラダイムの背後にあるアルゴリズムを表現可能な,新しい低レベル中間言語を提案する。この言語は辞書の概念に基づいて設計されており、低レベルの実装をより細かく選択することができる。次に、機械学習とプログラム推論を組み合わせることで、代替実装のコストモデルを自動的に推論する。辞書コストモデルは、所定のハードウェアアーキテクチャ上の辞書操作のプロファイリングデータセット上で訓練された回帰モデルを用いて学習される。プログラムコストモデルは静的プログラム解析を用いて推定される。実験の結果,マイクロベンチマークにおける訓練コストモデルの有効性が示された。さらに、我々のフレームワークが生成したコードの性能は、最先端の分析クエリエンジンと最近のデータベース内機械学習フレームワークに匹敵するか、同等であることを示す。 We introduce a framework for automatically choosing data structures to support efficient computation of analytical workloads. Our contributions are twofold. First, we introduce a novel low-level intermediate language that can express the algorithms behind various query processing paradigms such as classical joins, groupjoin, and in-database machine learning engines. This language is designed around the notion of dictionaries, and allows for a more fine-grained choice of its low-level implementation. Second, the cost model for alternative implementations is automatically inferred by combining machine learning and program reasoning. The dictionary cost model is learned using a regression model trained over the profiling dataset of dictionary operations on a given hardware architecture. The program cost model is inferred using static program analysis. Our experimental results show the effectiveness of the trained cost model on micro benchmarks. Furthermore, we show that the performance of the code generated by our framework either outperforms or is on par with the state-of-the-art analytical query engines and a recent in-database machine learning framework.	翻訳日:2021-12-28 17:34:17 公開日:2021-12-24
# リチウムイオン電池の物理モデルと機械学習の統合 Integrating Physics-Based Modeling with Machine Learning for Lithium-Ion Batteries ( http://arxiv.org/abs/2112.12979v1 ) ライセンス: Link先を確認	Hao Tu, Scott Moura, Yebin Wang, Huazhen Fang	(参考訳) リチウムイオン電池(libs)の数学的モデリングは、高度な電池管理における主要な課題である。本稿では,LiBの高精度モデリングを実現するために,物理モデルと機械学習を統合する2つの新しいフレームワークを提案する。フレームワークの特徴は、物理モデルの状態情報の機械学習モデルに通知することで、物理モデルと機械学習の深い統合を可能にすることである。これらの枠組みに基づき、電気化学モデルと等価回路モデルとをそれぞれフィードフォワードニューラルネットワークと組み合わせて、一連のハイブリッドモデルを構築する。ハイブリッドモデルは構造的に比較的類似しており、広範なシミュレーションや実験で示されているように、幅広いCレートでかなりの予測精度を提供できる。この研究は、老化と認識のハイブリッドモデリングをさらに拡大し、健康状態を意識して予測するハイブリッドモデルの設計へと繋がる。実験により、モデルはLiBのサイクルライフサイクルを通して高い予測精度を持つことが示された。 Mathematical modeling of lithium-ion batteries (LiBs) is a primary challenge in advanced battery management. This paper proposes two new frameworks to integrate a physics-based model with machine learning to achieve high-precision modeling for LiBs. The frameworks are characterized by informing the machine learning model of the state information of the physical model, enabling a deep integration between physics and machine learning. Based on the frameworks, a series of hybrid models are constructed, through combining an electrochemical model and an equivalent circuit model, respectively, with a feedforward neural network. The hybrid models are relatively parsimonious in structure and can provide considerable predictive accuracy under a broad range of C-rates, as shown by extensive simulations and experiments. The study further expands to conduct aging-aware hybrid modeling, leading to the design of a hybrid model conscious of the state-of-health to make prediction. Experiments show that the model has high predictive accuracy throughout a LiB's cycle life.	翻訳日:2021-12-28 17:27:46 公開日:2021-12-24
# Dyson-Schwinger方程式の自律的数値解析継続のための機械学習パイプライン A machine learning pipeline for autonomous numerical analytic continuation of Dyson-Schwinger equations ( http://arxiv.org/abs/2112.13011v1 ) ライセンス: Link先を確認	Andreas Windisch, Thomas Gallien, Christopher Schwarzlmueller	(参考訳) ダイソン=シュウィンガー方程式(Dyson-Schwinger equations, DSEs)は、場の量子論においてn点関数を表現する非摂動的な方法である。例えば、ユークリッド空間やランダウゲージで働くと、クォークプロパゲータのダイソン=シュウィンガー方程式を実および複素領域で研究することができる。これらの方程式を複素領域で解くことを目指すとき、つまり複素外部モータに対して、超球面座標で表されるループ運動量の複素平面における半径成分の積分輪郭を変形しなければならない。これは自己エネルギーループの積分における極と分岐切断を避けるために行う必要がある。ダイソン=シュウィンガー方程式の性質はそうであるので、それらは自己一貫性のある方法で解かなければならないので、反復ステップ毎に積分の解析的性質を解析することはできない。本稿では,コンピュータビジョン(cv)へのディープラーニング(dl)アプローチに基づく機械学習パイプラインと,反復ステップ毎に数値積分の極と分岐を検知し,これらの障害を回避する適切な積分輪郭変形を提案することで,この問題を自律的に解決できる深層強化学習(drl)を提案する。我々はこれらのタスク、すなわち、棒と枝の切断検出と輪郭変形の両方の原理の証明をスケッチする。 Dyson-Schwinger equations (DSEs) are a non-perturbative way to express n-point functions in quantum field theory. Working in Euclidean space and in Landau gauge, for example, one can study the quark propagator Dyson-Schwinger equation in the real and complex domain, given that a suitable and tractable truncation has been found. When aiming for solving these equations in the complex domain, that is, for complex external momenta, one has to deform the integration contour of the radial component in the complex plane of the loop momentum expressed in hyper-spherical coordinates. This has to be done in order to avoid poles and branch cuts in the integrand of the self-energy loop. Since the nature of Dyson-Schwinger equations is such, that they have to be solved in a self-consistent way, one cannot analyze the analytic properties of the integrand after every iteration step, as this would not be feasible. 
In these proceedings, we suggest a machine learning pipeline based on deep learning (DL) approaches to computer vision (CV), as well as deep reinforcement learning (DRL), that could solve this problem autonomously by detecting poles and branch cuts in the numerical integrand after every iteration step and by suggesting suitable integration contour deformations that avoid these obstructions. We sketch out a proof of principle for both of these tasks, that is, the pole and branch cut detection, as well as the contour deformation.	翻訳日:2021-12-28 17:27:33 公開日:2021-12-24
# 非侵襲的胎児心電図 : モデル,技術,アルゴリズム Noninvasive Fetal Electrocardiography: Models, Technologies and Algorithms ( http://arxiv.org/abs/2112.13021v1 ) ライセンス: Link先を確認	Reza Sameni	(参考訳) 胎児心電図(fECG)は1900年代初頭に母体腹部から初めて記録された。過去50年間、最も先進的な電子工学技術と信号処理アルゴリズムは、非侵襲的な胎児心電図を胎児の心臓モニタリングのための信頼できる技術に変換するために用いられてきた。本章では,非侵襲的母体腹部記録からのfECGのモデリング,抽出,解析のために開発された主要な信号処理技術について概説し,相互に詳細に比較する。章の主な話題は以下のとおりである。 1)信号処理の観点からのfECGの電気生理学 2)母体表面から得られたfECGの母体体積伝導媒体の数学的モデルと波形モデル 3) 信号取得要件 4)適応フィルタや半盲音源分離技術を含むfECGノイズと干渉キャンセルのためのモデルに基づく手法 5) 少数のチャンネルから胎児の運動追跡とオンラインfECG抽出のアルゴリズムが進歩した。 The fetal electrocardiogram (fECG) was first recorded from the maternal abdominal surface in the early 1900s. During the past fifty years, the most advanced electronics technologies and signal processing algorithms have been used to convert noninvasive fetal electrocardiography into a reliable technology for fetal cardiac monitoring. In this chapter, the major signal processing techniques, which have been developed for the modeling, extraction and analysis of the fECG from noninvasive maternal abdominal recordings are reviewed and compared with one another in detail. The major topics of the chapter include: 1) the electrophysiology of the fECG from the signal processing viewpoint, 2) the mathematical model of the maternal volume conduction media and the waveform models of the fECG acquired from body surface leads, 3) the signal acquisition requirements, 4) model-based techniques for fECG noise and interference cancellation, including adaptive filters and semi-blind source separation techniques, and 5) recent algorithmic advances for fetal motion tracking and online fECG extraction from few number of channels.	翻訳日:2021-12-28 17:27:06 公開日:2021-12-24
# クライアント分散低減による圧縮連合学習の高速化 Faster Rates for Compressed Federated Learning with Client-Variance Reduction ( http://arxiv.org/abs/2112.13097v1 ) ライセンス: Link先を確認	Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richt\'arik	(参考訳) 分散学習および連合学習アプリケーションの通信ボトルネックにより、通信圧縮を用いたアルゴリズムが注目され、実際に広く使われている。さらに、不均一なクライアントの総数が非常に多く、各通信ラウンドでサーバがすべてのクライアントと通信できないため、連合学習にはクライアント分散が存在する。本稿では,この2つの問題に対して,圧縮およびクライアント分散低減手法を提案する。具体的には、COFIGとFRECONを導入し、クライアント分散化による通信圧縮をうまく楽しむ。 COFIGの総通信ラウンドは$O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$である。さらに、FRECONは非凸環境でCOFIGよりも早く収束し、$O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$通信ラウンドに収束する。凸設定では、COFIG は通信ラウンド $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ に収束する。結論として、cofigとfreconはどちらもすべてのクライアントと通信する必要がなく、凸および非凸のフェデレーション学習の第一または第二の収束結果を提供するが、以前の作業では完全なクライアント通信が必要か(実用的ではない)、より悪い収束結果を得る必要がある。 Due to the communication bottleneck in distributed and federated learning applications, algorithms using communication compression have attracted significant attention and are widely used in practice. Moreover, there exists client-variance in federated learning due to the total number of heterogeneous clients is usually very large and the server is unable to communicate with all clients in each communication round. In this paper, we address these two issues together by proposing compressed and client-variance reduced methods. Concretely, we introduce COFIG and FRECON, which successfully enjoy communication compression with client-variance reduction. The total communication round of COFIG is $O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$ in the nonconvex setting, where $N$ is the total number of clients, $S$ is the number of communicated clients in each round, $\epsilon$ is the convergence error, and $\omega$ is the parameter for the compression operator. 
Besides, our FRECON can converge faster than COFIG in the nonconvex setting, and it converges with $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ communication rounds. In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which is also the first convergence result for compression schemes that do not communicate with all the clients in each round. In sum, neither COFIG nor FRECON needs to communicate with all the clients, and both provide first or faster convergence results for convex and nonconvex federated learning, while previous works either require full client communication (thus not practical) or obtain worse convergence results.	翻訳日:2021-12-28 17:26:51 公開日:2021-12-24
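To make the comparison between the two nonconvex bounds quoted in this entry concrete, one can divide the COFIG bound by the FRECON bound. This is simple arithmetic on the stated rates, not a result taken from the paper, and the constants hidden by the $O(\cdot)$ notation are ignored:

$$
\frac{\dfrac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\dfrac{(1+\omega)N^{2/3}}{S\epsilon^2}}{\dfrac{(1+\omega)\sqrt{N}}{S\epsilon^2}}
= \sqrt{1+\omega} + N^{1/6}.
$$

For example, with $N = 10^6$ clients the second term alone gives $N^{1/6} = 10$, so under these bounds FRECON would need roughly an order of magnitude fewer communication rounds than COFIG, with a further gain of $\sqrt{1+\omega}$ when the compression is aggressive.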
# 説明可能な人工知能による脳機能発達の理解 : 課題と展望 Towards Understanding Human Functional Brain Development with Explainable Artificial Intelligence: Challenges and Perspectives ( http://arxiv.org/abs/2112.12910v1 ) ライセンス: Link先を確認	Mehrin Kiani, Javier Andreu-Perez, Hani Hagras, Silvia Rigato, and Maria Laura Filippetti	(参考訳) 過去数十年間、人間の脳の発達を調べるためにますます採用されている非侵襲的な神経画像技術が著しく進歩してきた。しかし、これらの改善は必ずしも、機能的脳発達のメカニズムを説明することができる、より洗練されたデータ分析尺度に従わなかった。例えば、単変量(脳の単一領域)から多変量(脳の複数領域)への変化分析パラダイムは、異なる脳領域間の相互作用の調査を可能にするために重要である。しかし、発達する脳領域間の相互作用に光を当てる多変量解析の可能性にもかかわらず、人工知能(AI)技術を適用して分析を説明不能にする。本研究の目的は,現在最先端のAI技術が機能的脳発達にどのような影響を及ぼすかを理解することである。さらに、発達認知神経科学(DCN)フレームワークによって定義された脳発達のプロセスに基づいて、どのAI技術が学習を説明するかのレビューも実施されている。この研究は、eXplainable AI(XAI)がDCNフレームワークによって仮説された機能的脳開発を調査するための実行可能な方法を提供するかもしれないことも示唆している。 The last decades have seen significant advancements in non-invasive neuroimaging technologies that have been increasingly adopted to examine human brain development. However, these improvements have not necessarily been followed by more sophisticated data analysis measures that are able to explain the mechanisms underlying functional brain development. For example, the shift from univariate (single area in the brain) to multivariate (multiple areas in brain) analysis paradigms is of significance as it allows investigations into the interactions between different brain regions. However, despite the potential of multivariate analysis to shed light on the interactions between developing brain regions, artificial intelligence (AI) techniques applied render the analysis non-explainable. The purpose of this paper is to understand the extent to which current state-of-the-art AI techniques can inform functional brain development. In addition, a review of which AI techniques are more likely to explain their learning based on the processes of brain development as defined by developmental cognitive neuroscience (DCN) frameworks is also undertaken. 
This work also proposes that eXplainable AI (XAI) may provide viable methods to investigate functional brain development as hypothesised by DCN frameworks.	翻訳日:2021-12-28 16:51:06 公開日:2021-12-24
# 小さなニューラルネットワークと小さなトレーニングセットを駆使した深部神経進化の研究:MRI脳系列分類へのサンプル応用 Deep Neuroevolution Squeezes More out of Small Neural Networks and Small Training Sets: Sample Application to MRI Brain Sequence Classification ( http://arxiv.org/abs/2112.12990v1 ) ライセンス: Link先を確認	Joseph N Stember, Hrithwik Shalu	(参考訳) 目的:Deep Neuroevolution (DNE)は、小さなニューラルネットワークと小さなトレーニングセットでうまく機能する放射線学人工知能(AI)を提供することを約束している。我々は、MRI脳シークエンス分類へのプループ・オブ・プリンシプルの適用を通して、この可能性を実現することを目指している。方法】T1,T1ポストコントラスト,T2-FLAIR,T2-FLAIRの4つのシークエンス/重み付けで20例のトレーニングセットを解析した。我々は、比較的小さな畳み込みニューラルネットワーク(cnn)のパラメータを次のように訓練した。次に,CNNトレーニングセットの精度を測定し,後者を適合度評価指標とした。最も適した児童CNNが同定された。私たちは彼らの突然変異を親CNNに組み込んだ。この選択的に変異した親は次世代の親cnnとなった。私たちは約5万世代にわたってこのプロセスを繰り返しました。結果: DNEは単調収束を100%トレーニングセット精度で達成した。 dneはまた、100%テストセット精度に単調に収束した。結論: DNEは小さなトレーニングセットと小さなCNNで完全な精度を達成することができる。特に、深層強化学習と組み合わせると、dneは放射線学aiを学習能力においてより人間らしくする探求の道を開くかもしれない。 DNEは、新しいタスクや新しいイメージタイプに適応できる、放射線医学のAIアルゴリズムの、予想されるメタラーニング体制の重要な構成要素であるかもしれない。 Purpose: Deep Neuroevolution (DNE) holds the promise of providing radiology artificial intelligence (AI) that performs well with small neural networks and small training sets. We seek to realize this potential via a proof-of-principle application to MRI brain sequence classification. Methods: We analyzed a training set of 20 patients, each with four sequences/weightings: T1, T1 post-contrast, T2, and T2-FLAIR. We trained the parameters of a relatively small convolutional neural network (CNN) as follows: First, we randomly mutated the CNN weights. We then measured the CNN training set accuracy, using the latter as the fitness evaluation metric. The fittest child CNNs were identified. We incorporated their mutations into the parent CNN. This selectively mutated parent became the next generation's parent CNN. We repeated this process for approximately 50,000 generations. Results: DNE achieved monotonic convergence to 100% training set accuracy. DNE also converged monotonically to 100% testing set accuracy. 
Conclusions: DNE can achieve perfect accuracy with small training sets and small CNNs. Particularly when combined with Deep Reinforcement Learning, DNE may provide a path forward in the quest to make radiology AI more human-like in its ability to learn. DNE may very well turn out to be a key component of the much-anticipated meta-learning regime of radiology AI algorithms that can adapt to new tasks and new image types, similar to human radiologists.	翻訳日:2021-12-28 16:50:44 公開日:2021-12-24
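The mutate, evaluate, select, and merge loop described in the Methods of this entry can be sketched on a toy problem. The sketch below replaces the CNN with a three-parameter linear classifier and uses synthetic 2D points instead of MRI sequences. All names, hyperparameters, and the rule of keeping a new parent only when it is no worse (added here to mirror the reported monotonic convergence) are assumptions, not the authors' implementation.

```python
import random

def accuracy(weights, data):
    """Fraction of (x, y, label) samples classified correctly by a linear rule."""
    w1, w2, b = weights
    hits = sum((w1 * x + w2 * y + b > 0) == label for x, y, label in data)
    return hits / len(data)

def evolve(data, generations=200, children=20, top_k=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    parent = [0.0, 0.0, 0.0]
    history = [accuracy(parent, data)]
    for _ in range(generations):
        # 1. Randomly mutate the parent weights to create children.
        mutations = [[rng.gauss(0.0, sigma) for _ in parent] for _ in range(children)]
        scored = []
        for mut in mutations:
            child = [p + d for p, d in zip(parent, mut)]
            # 2. Fitness is the training-set accuracy.
            scored.append((accuracy(child, data), mut))
        # 3. Identify the fittest children.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        best = [mut for _, mut in scored[:top_k]]
        # 4. Fold their averaged mutations into the parent.
        step = [sum(col) / top_k for col in zip(*best)]
        candidate = [p + s for p, s in zip(parent, step)]
        # Assumption: accept the new parent only if it is no worse than the old one.
        if accuracy(candidate, data) >= history[-1]:
            parent = candidate
        history.append(accuracy(parent, data))
    return parent, history

# Hypothetical training set of 20 points, labelled by the sign of x + y.
data_rng = random.Random(1)
data = []
for _ in range(20):
    x, y = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    data.append((x, y, x + y > 0))

weights, history = evolve(data)
print(history[0], history[-1])
```

No gradients are computed anywhere in the loop, which is the property that lets this family of methods use a non-differentiable quantity such as accuracy directly as the fitness signal.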
# ドップラー速度に基づく移動物体のクラスタリングと速度推定アルゴリズム Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects ( http://arxiv.org/abs/2112.12984v1 ) ライセンス: Link先を確認	Mian Guo, Kai Zhong, Xiaozhi Wang	(参考訳) 本研究では,FMCW LiDARの特性に基づき,高精度かつ単一スキャンでリアルタイムに運動状態検出と速度推定を行う,ドップラー速度に基づくクラスタリング・速度推定アルゴリズムを提案する。我々は同じ物体上でドップラー速度の連続性を証明した。この原理に基づき,領域拡大クラスタリングアルゴリズムを用いて移動物体と静止背景の区別を実現する。得られた静止背景を用いて最小二乗法によりFMCW LiDARの速度を推定する。次に,推定したLiDAR速度とクラスタリングにより得られた移動物体のドップラー速度を用いて,移動物体の速度を推定する。リアルタイム処理を確保するために,適切な最小二乗パラメータを設定した。一方、このアルゴリズムの有効性を検証するため、自動走行シミュレーションプラットフォームCARLA上でFMCW LiDARモデルを作成し、データを生成する。その結果,Ryzen 3600X CPUの演算能力で毎秒少なくとも450万点を処理し,150個の移動物体の速度を推定でき,運動状態検出精度は99%以上,速度推定精度は0.1m/sであった。 We propose a Doppler velocity-based clustering and velocity estimation algorithm based on the characteristics of FMCW LiDAR, which achieves highly accurate, single-scan, and real-time motion state detection and velocity estimation. We prove the continuity of the Doppler velocity on the same object. Based on this principle, we distinguish moving objects from the stationary background via a region-growing clustering algorithm. The obtained stationary background is then used to estimate the velocity of the FMCW LiDAR by the least-squares method. Then we estimate the velocity of the moving objects using the estimated LiDAR velocity and the Doppler velocity of moving objects obtained by clustering. To ensure real-time processing, we set appropriate least-squares parameters. Meanwhile, to verify the effectiveness of the algorithm, we create an FMCW LiDAR model on the autonomous driving simulation platform CARLA to generate data. The results show that our algorithm can process at least 4.5 million points and estimate the velocity of 150 moving objects per second using the computing power of a Ryzen 3600X CPU, with a motion state detection accuracy of over 99% and an estimated velocity accuracy of 0.1 m/s.	翻訳日:2021-12-28 16:47:05 公開日:2021-12-24
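The two estimation steps in this entry (sensor velocity from the stationary background by least squares, then compensation of the Doppler velocity of a moving object) can be sketched in 2D as follows. The measurement model, the sign convention (positive Doppler means the point recedes from the sensor), and all names and numbers are assumptions for illustration. The clustering step and the paper's actual least-squares parameters are not reproduced.

```python
import math

def estimate_ego_velocity(angles, doppler):
    """Least-squares 2D sensor velocity from Doppler readings on static points.

    Assumed model for a static point seen at bearing a:
        doppler = -(cos(a) * vx + sin(a) * vy)
    """
    sxx = sxy = syy = bx = by = 0.0
    for a, r in zip(angles, doppler):
        cx, cy = math.cos(a), math.sin(a)
        sxx += cx * cx
        sxy += cx * cy
        syy += cy * cy
        bx += -cx * r
        by += -cy * r
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sxx * syy - sxy * sxy
    vx = (bx * syy - sxy * by) / det
    vy = (sxx * by - sxy * bx) / det
    return vx, vy

def object_radial_velocity(angle, doppler, ego):
    """Remove the sensor's own motion from a Doppler reading on a moving object,
    leaving the object's radial velocity in the world frame."""
    return doppler + math.cos(angle) * ego[0] + math.sin(angle) * ego[1]

# Synthetic check: sensor moving at (10.0, 0.5) m/s past five static points.
true_ego = (10.0, 0.5)
angles = [-0.6, -0.3, 0.0, 0.3, 0.6]
static_doppler = [-(math.cos(a) * true_ego[0] + math.sin(a) * true_ego[1]) for a in angles]
ego = estimate_ego_velocity(angles, static_doppler)

# A moving object at bearing 0.2 rad with world velocity (-3.0, 0.0) m/s.
obj_angle, obj_vel = 0.2, (-3.0, 0.0)
obj_doppler = (math.cos(obj_angle) * (obj_vel[0] - true_ego[0])
               + math.sin(obj_angle) * (obj_vel[1] - true_ego[1]))
radial = object_radial_velocity(obj_angle, obj_doppler, ego)
print(ego, radial)
```

A single Doppler reading only constrains the velocity component along the line of sight, so this sketch recovers the radial velocity of the object rather than its full velocity vector.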
# US-GAN:表情合成における究極のスキップ接続の重要性について US-GAN: On the importance of Ultimate Skip Connection for Facial Expression Synthesis ( http://arxiv.org/abs/2112.13002v1 ) ライセンス: Link先を確認	Arbish Akram and Nazar Khan	(参考訳) 近年の研究では、顔表情合成のための多領域の画像間変換において顕著な結果が示されている。これらの手法は有効であるが, 自然な結果を得るには多数のラベル付きサンプルが必要である。より小さなデータセットで訓練すると、性能が大幅に低下する。この制限に対処するため,本研究では,より小さなデータセットを用いて自然な表情を合成する,小型で効果的な手法US-GANを提案する。提案手法は,符号化層,単一の残差ブロック,復号層,および入力画像と出力画像を結ぶ究極のスキップ接続から構成される。パラメータ数は最先端の表情合成法の3分の1である。実験により,提案手法の定量的,定性的な有効性を示した。また,より大きな最先端モデルでは復元できない入力顔画像の豊かな顔の詳細と全体的な色の詳細を復元するのに,究極のスキップ接続で十分であることを示す。 Recent studies have shown impressive results in multi-domain image-to-image translation for facial expression synthesis. While effective, these methods require a large number of labelled samples for plausible results. Their performance significantly degrades when we train them on smaller datasets. To address this limitation, in this work, we present US-GAN, a smaller yet effective method for synthesizing plausible expressions by employing notably smaller datasets. The proposed method comprises encoding layers, a single residual block, decoding layers, and an ultimate skip connection that links the input image to the output image. It has three times fewer parameters than state-of-the-art facial expression synthesis methods. Experimental results demonstrate the quantitative and qualitative effectiveness of our proposed method. In addition, we also show that an ultimate skip connection is sufficient for recovering rich facial and overall color details of the input face image that a larger state-of-the-art model fails to recover.	翻訳日:2021-12-28 16:46:46 公開日:2021-12-24
# 航行可能な地域への接地言語命令 Grounding Linguistic Commands to Navigable Regions ( http://arxiv.org/abs/2112.13031v1 ) ライセンス: Link先を確認	Nivedita Rufus, Kanishk Jain, Unni Krishnan R Nair, Vineet Gandhi, K Madhava Krishna	(参考訳) 人間は「黄色いセダンの隣の公園」のような言語コマンドを熱心に理解し、車両が走行すべき道路のどの地域を直感的に知ることができる。この能力を自動運転車に拡張することは、人間の指示に応えて行動する完全自律型エージェントを作るための次のステップだ。そこで本研究では,ナビゲーション可能な地域 (RNR) の参照という新たな課題,すなわち言語命令に基づくナビゲーションに対する関心領域の接地について提案する。 RNRは参照イメージセグメンテーション(RIS)とは違い、ナビゲーション可能な領域を接地するのではなく、自然言語表現によって参照されるオブジェクトを接地することに焦点を当てている。例えば、「黄色いセダンの隣の駐車場」というコマンドは、RISが参照するセダンを分割することを目的としており、RNRが提案する駐車エリアを道路上に分割することを目的としている。既存のtalk2carデータセットを言語コマンドで記述された領域のセグメンテーションマスクで拡張する,新たなデータセットであるtalk2car-regsegを紹介する。データセットの実用性を評価するために、簡潔なmanoeuvre指向のコマンドで別々のテストスプリットが提供されます。提案するデータセットを新しいトランスフォーマーベースのアーキテクチャを用いてベンチマークする。複数の評価基準において,広範なアブレーションを行い,ベースラインよりも優れた性能を示す。 RNR出力に基づく下流経路プランナが提案手法の有効性を確認した。 Humans have a natural ability to effortlessly comprehend linguistic commands such as "park next to the yellow sedan" and instinctively know which region of the road the vehicle should navigate. Extending this ability to autonomous vehicles is the next step towards creating fully autonomous agents that respond and act according to human commands. To this end, we propose the novel task of Referring Navigable Regions (RNR), i.e., grounding regions of interest for navigation based on the linguistic command. RNR is different from Referring Image Segmentation (RIS), which focuses on grounding an object referred to by the natural language expression instead of grounding a navigable region. For example, for a command "park next to the yellow sedan," RIS will aim to segment the referred sedan, and RNR aims to segment the suggested parking region on the road. We introduce a new dataset, Talk2Car-RegSeg, which extends the existing Talk2car dataset with segmentation masks for the regions described by the linguistic commands. 
A separate test split with concise manoeuvre-oriented commands is provided to assess the practicality of our dataset. We benchmark the proposed dataset using a novel transformer-based architecture. We present extensive ablations and show superior performance over baselines on multiple evaluation metrics. A downstream path planner generating trajectories based on RNR outputs confirms the efficacy of the proposed framework.	翻訳日:2021-12-28 16:46:33 公開日:2021-12-24
# 汎用wasserstein dice loss, test-time augmentation, and transformers for the brats 2021 challenge Generalized Wasserstein Dice Loss, Test-time Augmentation, and Transformers for the BraTS 2021 challenge ( http://arxiv.org/abs/2112.13054v1 ) ライセンス: Link先を確認	Lucas Fidon, Suprosanna Shit, Ivan Ezhov, Johannes C. Paetzold, S\'ebastien Ourselin, Tom Vercauteren	(参考訳) 多重磁気共鳴イメージング(MRI)による脳腫瘍のセグメント化は、医療画像計算において難しい課題である。主な課題は、様々なスキャナーとイメージングプロトコルへの一般化性にある。本稿では,予測時間を増やすことなくモデルロバスト性を高める戦略を検討する。この目的に向けて、異なる損失、オプティマイザ、および列車価データ分割を用いて訓練されたモデルから堅牢なアンサンブルを見つけることを検討する。重要なことは、U-Netアーキテクチャのボトルネックにトランスフォーマーが組み込まれていることである。ボトルネック内のトランスフォーマーは、平均でu-netのベースラインよりもわずかに悪いが、一般的なwasserstein dice損失は一貫して優れた結果をもたらす。さらに,高速かつロバストな推論のために,効率的なテスト時間拡張戦略を採用する。テストタイム増強を伴う7つの3次元U-Netの最終的なアンサンブルは、BraTS 2021テストデータセットで評価すると平均89.4%、平均ハウスドルフ95%距離10.0mmとなる。私たちのコードとトレーニングされたモデルはhttps://github.com/LucasFidon/TRABIT_BraTS2021で公開されています。 Brain tumor segmentation from multiple Magnetic Resonance Imaging (MRI) modalities is a challenging task in medical image computation. The main challenges lie in the generalizability to a variety of scanners and imaging protocols. In this paper, we explore strategies to increase model robustness without increasing inference time. Towards this aim, we explore finding a robust ensemble from models trained using different losses, optimizers, and train-validation data split. Importantly, we explore the inclusion of a transformer in the bottleneck of the U-Net architecture. While we find transformer in the bottleneck performs slightly worse than the baseline U-Net in average, the generalized Wasserstein Dice loss consistently produces superior results. Further, we adopt an efficient test time augmentation strategy for faster and robust inference. Our final ensemble of seven 3D U-Nets with test-time augmentation produces an average dice score of 89.4% and an average Hausdorff 95% distance of 10.0 mm when evaluated on the BraTS 2021 testing dataset. 
Our code and trained models are publicly available at https://github.com/LucasFidon/TRABIT_BraTS2021.	翻訳日:2021-12-28 16:46:11 公開日:2021-12-24
# 変革的な臨床エビデンスと漸進的な臨床エビデンスの識別:要約文と引用文のテキスト特徴を用いた臨床研究の分類器 Distinguishing Transformative from Incremental Clinical Evidence: A Classifier of Clinical Research using Textual features from Abstracts and Citing Sentences ( http://arxiv.org/abs/2112.12996v1 ) ライセンス: Link先を確認	Xuanyu Shi, Jian Du	(参考訳) 臨床研究や臨床意思決定においては、ある研究が特定の疾患管理における現行の標準治療を変えるのか、それとも支持するにとどまるのかを知ることが重要である。このような変化をもたらす研究を変革的(transformative)、支持にとどまる研究を漸進的(incremental)と定義する。通常、そのようなタスクを人間が完了するには膨大な量のドメインの専門知識と時間が必要である。Faculty Opinionsは、ある研究が確立された研究に異議を唱えるのか、それとも確認するにとどまるのかについて、よく注釈付けされたコーパスを提供している。本研究では, 変革的な臨床エビデンスと漸進的な臨床エビデンスを区別する機械学習手法を提案する。Faculty Opinionsの専門家が推奨しラベル付けした臨床研究を訓練セットとし、要約と2年間の引用文の両方からテキストを収集した。ランダムフォレストを分類器とし、引用文を特徴として用いた場合に最良の性能が得られ、平均AUCは0.755 (0.705-0.875) であった。その結果,変革的な研究は,要約文とは異なり,引用文に典型的な言語パターンを持つことが示された。我々は,確立された主張に異議を唱える臨床エビデンスと,それを確認するにとどまる臨床エビデンスとを識別するための効率的なツールを,臨床医や研究者に提供する。 In clinical research and clinical decision-making, it is important to know if a study changes or only supports the current standards of care for specific disease management. We define a study that changes them as transformative and one that only supports them as incremental research. It usually requires a huge amount of domain expertise and time for humans to finish such tasks. Faculty Opinions provides us with a well-annotated corpus on whether a study challenges or only confirms established research. In this study, a machine learning approach is proposed to distinguish transformative from incremental clinical evidence. Texts from both abstracts and a 2-year window of citing sentences are collected for a training set of clinical studies recommended and labeled by Faculty Opinions experts. We achieve the best performance, with an average AUC of 0.755 (0.705-0.875), using Random Forest as the classifier and citing sentences as the feature. The results show that transformative research has typical language patterns in citing sentences, unlike in abstract sentences. We provide an efficient tool for clinicians and researchers to identify clinical evidence that challenges or only confirms established claims.	翻訳日:2021-12-28 16:32:49 公開日:2021-12-24
# TSAXのトレンド TSAX is Trending ( http://arxiv.org/abs/2112.12912v1 ) ライセンス: Link先を確認	Muhammad Marwan Muhammad Fuad	(参考訳) 時系列データはユビキタスであり、複数の領域に多くの応用があるため、時系列マイニングはデータマイニングの重要な分野である。時系列採掘の主な課題は分類である。時系列表現法は時系列分類や他の時系列マイニングタスクにおいて重要な役割を果たしている。時系列データの最も一般的な表現方法の1つは、シンボリックアグリゲート近似(SAX)である。その人気の背後にある秘密は、シンプルさと効率だ。しかしsaxには、トレンド情報を表現できないという大きな欠点がある。 SAXがトレンド情報を取得するためのいくつかの方法が提案されているが、これは複雑な処理、前処理、後処理の手順を犠牲にしている。本稿では,SAXに最小限の複雑さを与えるだけで,時系列分類における性能を大幅に向上させる,Trending SAX (TSAX) と呼ばれる新しいSAXを提案する。これは50のデータセットで実験的に検証される。その結果,SAXと比較して39データセットの分類誤差が小さいため,本手法の優れた性能を示した。 Time series mining is an important branch of data mining, as time series data is ubiquitous and has many applications in several domains. The main task in time series mining is classification. Time series representation methods play an important role in time series classification and other time series mining tasks. One of the most popular representation methods of time series data is the Symbolic Aggregate approXimation (SAX). The secret behind its popularity is its simplicity and efficiency. SAX has however one major drawback, which is its inability to represent trend information. Several methods have been proposed to enable SAX to capture trend information, but this comes at the expense of complex processing, preprocessing, or post-processing procedures. In this paper we present a new modification of SAX that we call Trending SAX (TSAX), which only adds minimal complexity to SAX, but substantially improves its performance in time series classification. This is validated experimentally on 50 datasets. The results show the superior performance of our method, as it gives a smaller classification error on 39 datasets compared with SAX.	翻訳日:2021-12-28 16:31:45 公開日:2021-12-24
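The drawback of classic SAX mentioned in this entry, its inability to represent trend information, can be seen in a minimal sketch of the standard SAX pipeline (z-normalization, piecewise aggregate approximation, discretization with Gaussian breakpoints). The function name, the four-letter alphabet, and the toy series are assumptions. The trend encoding that TSAX adds is not described in the abstract and is therefore not sketched here.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet="abcd"):
    """Classic SAX word for a series whose length is divisible by n_segments."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series] if sigma > 0 else [0.0] * len(series)
    # Piecewise Aggregate Approximation: one mean per equal-length segment.
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    # Breakpoints split the standard normal distribution into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)

print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))

# Two series with the same segment means but opposite trends inside each segment.
rising = [1, 3, 5, 7]
falling = [3, 1, 7, 5]
print(sax(rising, n_segments=2), sax(falling, n_segments=2))
```

Because each segment is reduced to its mean, the rising and falling series receive the same SAX word even though they move in opposite directions within every segment. That lost information is what a trend-aware variant has to add back.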
# 周期的再構成による絡み合い Disentanglement by Cyclic Reconstruction ( http://arxiv.org/abs/2112.12980v1 ) ライセンス: Link先を確認	David Bertoin, Emmanuel Rachelson (DMIA)	(参考訳) ディープニューラルネットワークは、データから意味のある特徴を自動的に抽出する能力を示している。しかし、教師付き学習では、トレーニングに使用されるデータセット特有の情報が、手元のタスクとは無関係であり、抽出された表現にエンコードされる可能性がある。この残りの情報はドメイン固有のバイアスをもたらし、一般化性能を弱める。本研究では,その情報をタスク関連表現とその補完的文脈表現に分割することを提案する。提案手法は, 逆特徴予測器と循環再構成を組み合わせることで, これら2つの表現を単一領域教師ありの場合に分離する手法である。次に、この手法を教師なし領域適応問題に適用し、ソースとターゲットドメインの両方で実行可能なモデルを訓練する。特に,トレーニングラベルの欠如にもかかわらず,対象領域のゆがみを促進する手法を提案する。これにより、両方のドメインからタスク固有の情報を分離し、共通の表現に投影することができる。タスク固有の表現は、ソースドメインからターゲットドメインに取得した知識の効率的な転送を可能にする。単一ドメインの場合、情報検索タスクにおける表現の質と、強化されたタスク固有の表現によって引き起こされる一般化の利点を示す。次に,いくつかの古典的ドメイン適応ベンチマークで提案手法を検証し,ドメイン適応における絡み合いの利点を説明する。 Deep neural networks have demonstrated their ability to automatically extract meaningful features from data. However, in supervised learning, information specific to the dataset used for training, but irrelevant to the task at hand, may remain encoded in the extracted representations. This remaining information introduces a domain-specific bias, weakening the generalization performance. In this work, we propose splitting the information into a task-related representation and its complementary context representation. We propose an original method, combining adversarial feature predictors and cyclic reconstruction, to disentangle these two representations in the single-domain supervised case. We then adapt this method to the unsupervised domain adaptation problem, consisting of training a model capable of performing on both a source and a target domain. In particular, our method promotes disentanglement in the target domain, despite the absence of training labels. This enables the isolation of task-specific information from both domains and a projection into a common representation. The task-specific representation allows efficient transfer of knowledge acquired from the source domain to the target domain. 
In the single-domain case, we demonstrate the quality of our representations on information retrieval tasks and the generalization benefits induced by sharpened task-specific representations. We then validate the proposed method on several classical domain adaptation benchmarks and illustrate the benefits of disentanglement for domain adaptation.	翻訳日:2021-12-28 16:29:43 公開日:2021-12-24
# 教師なしドメイン適応再同定のための擬似ラベル作成におけるベストプラクティスの形式的アプローチ A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification ( http://arxiv.org/abs/2112.12887v1 ) ライセンス: Link先を確認	Fabian Dubourvieux, Romaric Audigier, Angélique Loesch, Samia Ainouz, Stéphane Canu	(参考訳) Unsupervised Domain Adaptive (UDA) Re-Identification (re-ID) に最適なパフォーマンスで対処するためには、擬似ラベルの使用が一般的である。実際、このアプローチのファミリはいくつかのUDA re-ID固有のフレームワークを生み出しました。これらの研究において、uda re-idのパフォーマンスを改善するための研究方向は様々であり、主に直観と実験に基づいている: 擬似ラベルを精錬し、擬似ラベルにおけるエラーの影響を低減させる。あらゆる擬似ラベルメソッドに実装できる一般的な優れたプラクティスから、そのパフォーマンスを一貫して向上させるのは難しいかもしれません。この課題に対処するために、擬似ラベル UDA re-ID に関する新たな理論的考察を提案する。貢献は3つあります。 (i) 擬似ラベル型UDA re-IDの新たな理論枠組みで、UDA re-ID性能に関する新たな一般学習上界を通じて定式化される。 (ii)疑似ラベル付けの一般的な実践は,提案する理論枠組みの解釈から直接導き出され,目標の再識別性能が向上する。 (iii) 課題のある人物と車両のクロスデータセット・リIDタスクに対する広範囲な実験により, 様々な最先端手法に対する一貫した性能向上と, グッドプラクティスの様々な提案がなされた。 The use of pseudo-labels prevails in order to tackle Unsupervised Domain Adaptive (UDA) Re-Identification (re-ID) with the best performance. Indeed, this family of approaches has given rise to several UDA re-ID specific frameworks, which are effective. In these works, research directions to improve Pseudo-Labeling UDA re-ID performance are varied and mostly based on intuition and experiments: refining pseudo-labels, reducing the impact of errors in pseudo-labels... It can be hard to deduce from them general good practices, which can be implemented in any Pseudo-Labeling method, to consistently improve its performance. To address this key question, a new theoretical view on Pseudo-Labeling UDA re-ID is proposed. The contributions are threefold: (i) A novel theoretical framework for Pseudo-Labeling UDA re-ID, formalized through a new general learning upper-bound on the UDA re-ID performance. (ii) General good practices for Pseudo-Labeling, directly deduced from the interpretation of the proposed theoretical framework, in order to improve the target re-ID performance. 
(iii) Extensive experiments on challenging person and vehicle cross-dataset re-ID tasks, showing consistent performance improvements for various state-of-the-art methods and various proposed implementations of good practices.	翻訳日:2021-12-28 16:03:21 公開日:2021-12-24
# 非条件モデルを用いたクラスタ誘導画像合成 Cluster-guided Image Synthesis with Unconditional Models ( http://arxiv.org/abs/2112.12911v1 ) ライセンス: Link先を確認	Markos Georgopoulos, James Oldfield, Grigorios G Chrysos, Yannis Panagakis	(参考訳) GAN(Generative Adversarial Networks)は、画像生成における最先端の原動力である。高解像度フォトリアリスティック画像を合成する能力はあるものの、異なる粒度のオンデマンドコンディショニングでコンテンツを生成することは課題である。この課題は通常、巨大なデータセットに興味のある属性をアノテートすることで解決される。したがって、教師なし生成モデルの生成プロセスに制御を導入することが不可欠である。本研究では,教師なし方式でよく訓練されたGANを活用して,制御可能な画像生成に焦点を当てる。この目的のために、生成元の中間層の表現空間は、意味的に意味のある属性(例えば、髪の色とポーズ)に基づいてデータを分離する多数のクラスタを形成する。クラスタ割り当てを条件付けすることで、提案手法は生成された画像の意味クラスを制御することができる。提案手法は,Implicit Maximum Likelihood Estimation (IMLE)による各クラスタからのサンプリングを可能にする。顔(CelebA-HQとFFHQ)、動物(Imagenet)、オブジェクト(LSUN)に対するアプローチの有効性を,異なる事前学習生成モデルを用いて示す。その結果,顔の性別,ポーズ,ヘアスタイルなどの属性による条件画像生成,およびさまざまな対象のクラスにおけるさまざまな特徴が明らかになった。 Generative Adversarial Networks (GANs) are the driving force behind the state-of-the-art in image generation. Despite their ability to synthesize high-resolution photo-realistic images, generating content with on-demand conditioning of different granularity remains a challenge. This challenge is usually tackled by annotating massive datasets with the attributes of interest, a laborious task that is not always a viable option. Therefore, it is vital to introduce control into the generation process of unsupervised generative models. In this work, we focus on controllable image generation by leveraging GANs that are well-trained in an unsupervised fashion. To this end, we discover that the representation space of intermediate layers of the generator forms a number of clusters that separate the data according to semantically meaningful attributes (e.g., hair color and pose). By conditioning on the cluster assignments, the proposed method is able to control the semantic class of the generated image. Our approach enables sampling from each cluster by Implicit Maximum Likelihood Estimation (IMLE). 
We showcase the efficacy of our approach on faces (CelebA-HQ and FFHQ), animals (Imagenet) and objects (LSUN) using different pre-trained generative models. The results highlight the ability of our approach to condition image generation on attributes like gender, pose and hair style on faces, as well as a variety of features on different object classes.	翻訳日:2021-12-28 16:02:57 公開日:2021-12-24
# すべてのボクセルが等しくない:ポイント・ボクセルの視点からのセマンティックシーンの完成 Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective ( http://arxiv.org/abs/2112.12925v1 ) ライセンス: Link先を確認	Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng	(参考訳) 本稿では,3dシーンの意味的・占有的表現を予測するための有用なタスクであるセマンティック・シーン・コンプリート(ssc)を再検討する。このタスクの多くのメソッドは、常に局所的なシーン構造を維持するためのボキセル化シーン表現に基づいている。しかしながら、目に見えない空ボクセルが存在するため、ネットワークがより深くなると、これらの手法は常に重い計算冗長性に苦しむため、完成品質が制限される。このジレンマに対処するために,本課題に対する新しい点-ボクセルアグリゲーションネットワークを提案する。まず,これら見えない空のボクセルを除去し,そのシーンから意味情報を効率よく捉えるために,深い点ストリームを採用することにより,ボクセル化シーンを点雲に転送する。一方、2つの3次元畳み込み層のみを含む軽量ボクセルストリームは、ボクセル化されたシーンの局所構造を保存する。さらに、ボクセルストリームからポイントストリームに構造の詳細を融合する異方性ボクセルアグリゲーション演算子と、ポイントストリームにおけるアップサンプリングプロセスを意味ラベルによって強化する意味認識伝播モジュールを設計した。入力として深度画像しか持たない2つのベンチマークにおいて,我々のモデルが最先端をはるかに上回ることを示す。 We revisit Semantic Scene Completion (SSC), a useful task to predict the semantic and occupancy representation of 3D scenes, in this paper. A number of methods for this task are always based on voxelized scene representations for keeping local scene structure. However, due to the existence of visible empty voxels, these methods always suffer from heavy computation redundancy when the network goes deeper, and thus limit the completion quality. To address this dilemma, we propose our novel point-voxel aggregation network for this task. Firstly, we transfer the voxelized scenes to point clouds by removing these visible empty voxels and adopt a deep point stream to capture semantic information from the scene efficiently. Meanwhile, a light-weight voxel stream containing only two 3D convolution layers preserves local structures of the voxelized scenes. Furthermore, we design an anisotropic voxel aggregation operator to fuse the structure details from the voxel stream into the point stream, and a semantic-aware propagation module to enhance the up-sampling process in the point stream by semantic labels. 
We demonstrate that our model surpasses state-of-the-arts on two benchmarks by a large margin, with only depth images as the input.	翻訳日:2021-12-28 16:02:32 公開日:2021-12-24
# 一般化ゼロショット分類のための学習型クロスモーダル表現 Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification ( http://arxiv.org/abs/2112.12927v1 ) ライセンス: Link先を確認	Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin	(参考訳) クロスモーダルオートエンコーダの潜伏空間を整列させて一般的な潜伏埋め込みを学習することは、一般化ゼロショット分類(GZSC)の効果的な戦略である。しかし、粒度の細かいインスタンス単位のアノテーションが欠如しているため、多様化した画像の視覚表現と固定属性の意味表現との相違により、ドメインシフトの問題に悩まされる。本稿では,GZSCのためのアラインド・クロスモーダル表現(ACMR)を学習する,革新的なオートエンコーダネットワークを提案する。具体的には,学習した分類器によって導かれる潜在部分空間上でのクロスモーダル潜在特徴のアライメントを強化するための新しいビジョン・セマンティクスアライメント(vsa)法を提案する。さらに,潜伏変数の識別能力を高めるとともに,潜伏変数が崩壊する可能性を低減するための新しい情報拡張モジュール(IEM)を提案する。公開データセットに関する広範囲な実験により,本手法の最先端性能が実証された。 Learning a common latent embedding by aligning the latent spaces of cross-modal autoencoders is an effective strategy for Generalized Zero-Shot Classification (GZSC). However, due to the lack of fine-grained instance-wise annotations, it still easily suffers from the domain shift problem caused by the discrepancy between the visual representation of diversified images and the semantic representation of fixed attributes. In this paper, we propose an innovative autoencoder network that learns Aligned Cross-Modal Representations (dubbed ACMR) for GZSC. Specifically, we propose a novel Vision-Semantic Alignment (VSA) method to strengthen the alignment of cross-modal latent features on the latent subspaces guided by a learned classifier. In addition, we propose a novel Information Enhancement Module (IEM) to reduce the possibility of latent-variable collapse while encouraging the discriminative ability of latent variables. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.	翻訳日:2021-12-28 16:01:43 公開日:2021-12-24
# 意味セグメンテーションのためのリアルタイムグローバルアテンションネットワーク Realtime Global Attention Network for Semantic Segmentation ( http://arxiv.org/abs/2112.12939v1 ) ライセンス: Link先を確認	Xi Mo, Xiangyu Chen	(参考訳) 本稿では,セマンティックセグメンテーションの課題に対して,エンドツーエンドのグローバルアテンションニューラルネットワーク(RGANet)を提案する。自己注意パラダイムによって展開される符号化戦略とは違って,提案するグローバルアテンションモジュールは,奥行きの畳み込みやアフィン変換を通じてグローバルアテンションを符号化する。これらのグローバルアテンションモジュールを階層アーキテクチャに統合することは、高い推論性能を維持する。さらに,非凸,広く散在する地盤トラス領域の負の効果を軽減するため,改良された評価指標であるMGRIDを提案する。セマンティックセグメンテーションのための最先端アーキテクチャに関する広範な実験の結果は、ロボット単眼視覚に対する提案手法の先進的な性能を示している。 In this paper, we proposed an end-to-end realtime global attention neural network (RGANet) for the challenging task of semantic segmentation. Different from the encoding strategy deployed by self-attention paradigms, the proposed global attention module encodes global attention via depth-wise convolution and affine transformations. The integration of these global attention modules into a hierarchy architecture maintains high inferential performance. In addition, an improved evaluation metric, namely MGRID, is proposed to alleviate the negative effect of non-convex, widely scattered ground-truth areas. Results from extensive experiments on state-of-the-art architectures for semantic segmentation manifest the leading performance of proposed approaches for robotic monocular visual perception.	翻訳日:2021-12-28 16:01:26 公開日:2021-12-24
# 入射ニューラル表現によるRGB画像からの連続スペクトル再構成 Continuous Spectral Reconstruction from RGB Images via Implicit Neural Representation ( http://arxiv.org/abs/2112.13003v1 ) ライセンス: Link先を確認	Ruikang Xu, Mingde Yao, Chang Chen, Lizhi Wang, Zhiwei Xiong	(参考訳) 既存のスペクトル再構成法は通常、RGB画像から多くのスペクトル帯域への離散写像を学ぶ。しかし、このモデリング戦略はスペクトルシグネチャの連続性を無視している。本稿では,新しい連続スペクトル表現を導入することにより,この限界を解消するためのニューラルスペクトル再構成(nesr)を提案する。この目的のために、暗黙の関数の概念を採用し、ニューラルネットワークを用いたパラメータ化実施を行う。具体的には,まずバックボーンネットワークを用いてRGB入力の空間的特徴を抽出する。本研究では,スペクトルプロファイル補間(spi)モジュールとニューラル・アテンション・マッピング(nam)モジュール(nam)モジュールを考案し,空間スペクトル相関がより良い表現に関わっている深い特徴を強調する。次に、サンプルスペクトルバンドの数を連続的な暗黙関数の座標と見なして、深い特徴からスペクトル強度への投影を学習する。広範な実験により、nesrのベースライン法に対する再構成精度の差が示される。さらにnesrは、任意の数のスペクトル帯域を目標出力として有効にすることで、スペクトル再構成の柔軟性を拡張する。 Existing methods for spectral reconstruction usually learn a discrete mapping from RGB images to a number of spectral bands. However, this modeling strategy ignores the continuous nature of spectral signature. In this paper, we propose Neural Spectral Reconstruction (NeSR) to lift this limitation, by introducing a novel continuous spectral representation. To this end, we embrace the concept of implicit function and implement a parameterized embodiment with a neural network. Specifically, we first adopt a backbone network to extract spatial features of RGB inputs. Based on it, we devise Spectral Profile Interpolation (SPI) module and Neural Attention Mapping (NAM) module to enrich deep features, where the spatial-spectral correlation is involved for a better representation. Then, we view the number of sampled spectral bands as the coordinate of continuous implicit function, so as to learn the projection from deep features to spectral intensities. Extensive experiments demonstrate the distinct advantage of NeSR in reconstruction accuracy over baseline methods. Moreover, NeSR extends the flexibility of spectral reconstruction by enabling an arbitrary number of spectral bands as the target output.	翻訳日:2021-12-28 16:01:14 公開日:2021-12-24
# ベンチマーク歩行者オドメトリ:brown pedestrian odometry dataset (bpod) Benchmarking Pedestrian Odometry: The Brown Pedestrian Odometry Dataset (BPOD) ( http://arxiv.org/abs/2112.13018v1 ) ライセンス: Link先を確認	David Charatan, Hongyi Fan, Benjamin Kimia	(参考訳) 頭部装着歩行者設定における視覚計測アルゴリズムのベンチマークのためのBrown Pedestrian Odometry Dataset(BPOD)を提案する。このデータセットは、ブラウン大学のキャンパスの様々な屋内および屋外の12箇所で、グローバルおよびローリングシャッターステレオカメラを用いて撮影された。既存のデータセットと比較すると、BPODは画像のぼやけや自転を多く含んでいる。歩行者の経路に沿って設置されたスティックオンマーカーから地中軌道を生成し、第三者ビデオを用いて歩行者の位置を文書化する。 BPOD上での直接的・特徴的・学習型VO法の性能評価を行った。以上の結果から,歩行者軌跡の把握には重要な開発が必要であることが示唆された。データセットへのリンクはこちら。 https://doi.org/10.26300/c1n7-7p93 We present the Brown Pedestrian Odometry Dataset (BPOD) for benchmarking visual odometry algorithms in head-mounted pedestrian settings. This dataset was captured using synchronized global and rolling shutter stereo cameras in 12 diverse indoor and outdoor locations on Brown University's campus. Compared to existing datasets, BPOD contains more image blur and self-rotation, which are common in pedestrian odometry but rare elsewhere. Ground-truth trajectories are generated from stick-on markers placed along the pedestrian's path, and the pedestrian's position is documented using a third-person video. We evaluate the performance of representative direct, feature-based, and learning-based VO methods on BPOD. Our results show that significant development is needed to successfully capture pedestrian trajectories. The link to the dataset is here: https://doi.org/10.26300/c1n7-7p93	翻訳日:2021-12-28 16:00:56 公開日:2021-12-24
# SimViT:スライディングウィンドウを備えたシンプルな視覚変換器 SimViT: Exploring a Simple Vision Transformer with sliding windows ( http://arxiv.org/abs/2112.13085v1 ) ライセンス: Link先を確認	Gang Li, Di Xu, Xing Cheng, Lingyu Si, Changwen Zheng	(参考訳) 視覚変換器は多くの視覚タスクにおいてバックボーンモデルとして優れた性能を発揮しているが、そのほとんどは画像やウィンドウ内の全てのトークンのグローバルな関係を捉えることを目的としており、2D構造におけるパッチ間の固有の空間的および局所的相関を乱す。本稿では、空間構造と局所情報を視覚変換器に組み込むための、SimViTというシンプルな視覚変換器を提案する。具体的には,従来のマルチヘッド・セルフ・アテンションの代わりに,MCSA(Multi-head Central Self-Attention)を導入した。スライディングウィンドウの導入は、空間構造のキャプチャを容易にする。一方、SimViTは複数の層から複数の階層的特徴を抽出し、密集予測を行う。広範な実験により、simvitは様々な画像処理タスクの汎用バックボーンモデルとして効果的かつ効率的であることが示されている。特に我々のSimViT-Microは、ImageNet-1kデータセットで71.1%の精度を達成するために3.3Mパラメータしか必要としていない。私たちのコードはhttps://github.com/ucasligang/simvitで利用可能です。 Although vision Transformers have achieved excellent performance as backbone models in many vision tasks, most of them intend to capture global relations of all tokens in an image or a window, which disrupts the inherent spatial and local correlations between patches in 2D structure. In this paper, we introduce a simple vision Transformer named SimViT, to incorporate spatial structure and local information into the vision Transformers. Specifically, we introduce Multi-head Central Self-Attention(MCSA) instead of conventional Multi-head Self-Attention to capture highly local relations. The introduction of sliding windows facilitates the capture of spatial structure. Meanwhile, SimViT extracts multi-scale hierarchical features from different layers for dense prediction tasks. Extensive experiments show the SimViT is effective and efficient as a general-purpose backbone model for various image processing tasks. Especially, our SimViT-Micro only needs 3.3M parameters to achieve 71.1% top-1 accuracy on ImageNet-1k dataset, which is the smallest size vision Transformer model by now. Our code will be available in https://github.com/ucasligang/SimViT.	翻訳日:2021-12-28 16:00:43 公開日:2021-12-24
# CatchBackdoor:差動ファズリングによる臨界トロイの木馬神経経路同定によるバックドアテスト CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing ( http://arxiv.org/abs/2112.13064v1 ) ライセンス: Link先を確認	Haibo Jin, Ruoxi Chen, Jinyin Chen, Yao Cheng, Chong Fu, Ting Wang, Yue Yu, and Zhaoyan Ming	(参考訳) 現実世界のアプリケーションにおけるディープニューラルネットワーク(DNN)の成功は、豊富な事前学習モデルの恩恵を受けている。しかし、バックドアで事前訓練されたモデルは下流dnnの配備に重大な脅威をもたらす可能性がある。既存のDNNテスト手法は主に、敵の設定で誤ったコーナーケースの振る舞いを見つけるために設計されているが、強力なトロイの木馬によるバックドアの発見には失敗した。トロジャンネットワークの挙動を観察すると、それらは以前の研究で提案されたように単一の妥協ニューロンによって反映されるだけでなく、複数のニューロンの活性化強度と周波数における臨界神経経路に起因していることがわかる。この作業はDNNのバックドアテストを公式化し、CatchBackdoorフレームワークを提案する。少数の良性例からの臨界ニューロンの微分ファジングにより、トロイの木馬の経路、特に重要な経路を特定し、同定された経路の臨界ニューロンをシミュレートしてバックドアテストの例を生成する。大規模な実験は、既存の方法よりも高い検出性能を持つCatchBackdoorの優位性を実証している。 catchbackdoorは、既存の方法では検出できない、ステルスブレンドとアダプティブアタックによってバックドアを検知する。さらに,モデルゾウにおけるモデルバックドアの可能性を明らかにする実験を行った。 The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, the backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner case behaviors in adversarial settings but fail to discover the backdoors crafted by strong trojan attacks. Observing the trojan network behaviors shows that they are not just reflected by a single compromised neuron as proposed by previous work but attributed to the critical neural paths in the activation intensity and frequency of multiple neurons. This work formulates the DNN backdoor testing and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the trojan paths and particularly the critical ones, and generate backdoor testing examples by simulating the critical neurons in the identified paths. 
Extensive experiments demonstrate the superiority of CatchBackdoor, with higher detection performance than existing methods. CatchBackdoor works better on detecting backdoors by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal the potential backdoors of models in Model Zoo.	翻訳日:2021-12-28 15:16:03 公開日:2021-12-24
# deepgantt:バック散乱ネットワークのためのスケーラブルなディープラーニングスケジューラ DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks ( http://arxiv.org/abs/2112.12985v1 ) ライセンス: Link先を確認	Daniel F. Perez-Ramirez, Carlos Perez-Penichet, Nicolas Tsiftes, Thiemo Voigt, Dejan Kostic, Magnus Boman	(参考訳) 最近のバックスキャッター通信技術は、無修正のコモディティ無線デバイスと直接通信しながらバッテリーなしで動作できる超低電力無線デバイスを可能にする。コモディティデバイスは、電池のないノードが環境からエネルギーを集めながら通信し、センサ、計算、通信タスクを行うために必要な無修正キャリアを提供するのに協力する。非変調キャリアの最適プロビジョニングは、NPハード組合せ最適化問題であるため、ネットワークのサイズを制限する。その結果、以前の作品はキャリアの最適化を完全に無視するか、あるいは準最適ヒューリスティックに頼り、貴重なエネルギーとスペクトル資源を浪費した。本稿では,無線通信機器と相互運用する電池フリーデバイスのためのディープラーニングスケジューラDeepGANTTを紹介する。 DeepGANTTはグラフニューラルネットワークを利用して、この問題に固有の可変入力と出力サイズを克服する。我々は,制約最適化解法から得られる比較的小さいサイズの最適スケジュールで,ディープラーニングスケジューラを訓練する。 DeepGANTTは、慎重に設計されたヒューリスティックなソリューションよりも優れているだけでなく、トレーニングされた問題サイズで最適なスケジューラの約3%で性能を発揮する。最後に、DeepGANTTはトレーニングで使用される最大値の4倍以上の問題を一般化し、最適なスケジューラのスケーラビリティの限界を破り、より効率的な後方散乱ネットワークを実現する。 Recent backscatter communication techniques enable ultra low power wireless devices that operate without batteries while interoperating directly with unmodified commodity wireless devices. Commodity devices cooperate in providing the unmodulated carrier that the battery-free nodes need to communicate while collecting energy from their environment to perform sensing, computation, and communication tasks. The optimal provision of the unmodulated carrier limits the size of the network because it is an NP-hard combinatorial optimization problem. Consequently, previous works either ignore carrier optimization altogether or resort to suboptimal heuristics, wasting valuable energy and spectral resources. We present DeepGANTT, a deep learning scheduler for battery-free devices interoperating with wireless commodity ones. DeepGANTT leverages graph neural networks to overcome variable input and output size challenges inherent to this problem. We train our deep learning scheduler with optimal schedules of relatively small size obtained from a constraint optimization solver. 
DeepGANTT not only outperforms a carefully crafted heuristic solution but also performs within ~3% of the optimal scheduler on trained problem sizes. Finally, DeepGANTT generalizes to problems more than four times larger than the maximum used for training, therefore breaking the scalability limitations of the optimal scheduler and paving the way for more efficient backscatter networks.	翻訳日:2021-12-28 15:13:02 公開日:2021-12-24
# パーソナライズタスクにおける状態空間クラスタリングの無理な効率性について On the Unreasonable Efficiency of State Space Clustering in Personalization Tasks ( http://arxiv.org/abs/2112.13141v1 ) ライセンス: Link先を確認	Anton Dereventsov, Ranga Raju Vatsavai, Clayton Webster	(参考訳) 本研究では,複雑な報酬信号を用いてパーソナライズタスクを解決するための強化学習(rl)手法を検討する。特に,ネットワークアーキテクチャや最適化アルゴリズムの従来の選択に加えて,単純な$k$-meansアルゴリズムを用いた状態空間クラスタリングを基本としたアプローチである。数値例は異なるrl手順の効率を示し、この手法がエージェントの学習能力を加速し、エージェントの性能を制限しないことを示すために用いられる。 In this effort we consider a reinforcement learning (RL) technique for solving personalization tasks with complex reward signals. In particular, our approach is based on state space clustering with the use of a simplistic $k$-means algorithm as well as conventional choices of the network architectures and optimization algorithms. Numerical examples demonstrate the efficiency of different RL procedures and are used to illustrate that this technique accelerates the agent's ability to learn and does not restrict the agent's performance.	翻訳日:2021-12-28 15:12:41 公開日:2021-12-24
# NIP: 対人攻撃に対するニューロンレベルの逆摂動 NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks ( http://arxiv.org/abs/2112.13060v1 ) ライセンス: Link先を確認	Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Yue Yu and Shouling Ji	(参考訳) ディープラーニングモデルは前例のない成功を収めたものの、敵攻撃に対する脆弱性は、特にセキュリティクリティカルなドメインにデプロイする場合、注目を集めている。この課題に対処するために、反応性と積極的な戦略を含む多くの防衛戦略が提案されている。画像特徴空間の観点からは、特徴の変化により満足な結果に到達できないものもある。さらに、モデルによって学習された特徴は、分類結果に直接関連しない。それらと異なり、防御法は基本的に内部モデルと異なり、攻撃前後のニューロンの挙動を調査する。我々は、攻撃が正しいラベルに最も寄与するニューロンを劇的に変化させることでモデルに誤解をもたらすことを観察した。そこで我々は、ニューロンの影響の概念を導入し、ニューロンを前部、中部、尾部に分割する。そこで本研究では,神経レベル逆摂動法(NIP)を提案する。前方のニューロンを強化し、尾部のニューロンを弱めることで、NIPは高い良性を維持しながらほぼ全ての敵の摂動を排除できる。さらに、適応性(特に大きいもの)による摂動の大きさの違いにも対処できる。 3つのデータセットと6つのモデルで包括的な実験を行った結果、nipは11の敵の攻撃に対して最先端のベースラインよりも優れていた。さらに,ニューロンの活性化と可視化による解釈可能な証明を提供し,理解を深める。 Although deep learning models have achieved unprecedented success, their vulnerabilities towards adversarial attacks have attracted increasing attention, especially when deployed in security-critical domains. To address the challenge, numerous defense strategies, including reactive and proactive ones, have been proposed for robustness improvement. From the perspective of image feature space, some of them cannot reach satisfying results due to the shift of features. Besides, features learned by models are not directly related to classification results. Different from them, We consider defense method essentially from model inside and investigated the neuron behaviors before and after attacks. We observed that attacks mislead the model by dramatically changing the neurons that contribute most and least to the correct label. Motivated by it, we introduce the concept of neuron influence and further divide neurons into front, middle and tail part. Based on it, we propose neuron-level inverse perturbation(NIP), the first neuron-level reactive defense method against adversarial attacks. 
By strengthening front neurons and weakening those in the tail part, NIP can eliminate nearly all adversarial perturbations while still maintaining high benign accuracy. Besides, it can cope with different sizes of perturbations via adaptivity, especially larger ones. Comprehensive experiments conducted on three datasets and six models show that NIP outperforms the state-of-the-art baselines against eleven adversarial attacks. We further provide interpretable proofs via neuron activation and visualization for better understanding.	翻訳日:2021-12-28 14:47:03 公開日:2021-12-24
# テキストスタックのスポイラー:トランスフォーマーはどの程度役に立つのか? Spoiler in a Textstack: How Much Can Transformers Help? ( http://arxiv.org/abs/2112.12913v1 ) ライセンス: Link先を確認	Anna Wróblewska, Paweł Rzepiński, Sylwia Sysko-Romańczuk	(参考訳) 本稿では,レビューにおけるスポイラー検出に関する研究について述べる。本稿では、利用可能なテキストベースのモデルタスクを微調整し、整理する手法について、最新のディープラーニングの成果とモデルの結果を解釈する手法について述べる。これまで、スポイラー研究は文献にはほとんど記述されていない。我々は,アノテート付きスポイラを備えた2つのオープンデータセット上で,転送学習アプローチと異なる最新のトランスフォーマーアーキテクチャをテストした(roc aucはtv tropes moviesデータセットで81%,goodreadsデータセットは88%)。また、データを収集し、きめ細かいアノテーションで新しいデータセットを組み立てました。そこで我々は,モデルの信頼性を評価し,その結果を説明するために,解釈可能性技術と尺度を用いた。 This paper presents our research regarding spoiler detection in reviews. In this use case, we describe the method of fine-tuning and organizing the available text-based model tasks with the latest deep learning achievements and techniques to interpret the models' results. Until now, spoiler research has been rarely described in the literature. We tested the transfer learning approach and different latest transformer architectures on two open datasets with annotated spoilers (ROC AUC above 81% on TV Tropes Movies dataset, and Goodreads dataset above 88%). We also collected data and assembled a new dataset with fine-grained annotations. To that end, we employed interpretability techniques and measures to assess the models' reliability and explain their results.	翻訳日:2021-12-28 14:46:09 公開日:2021-12-24
# モノトーン増分量子化解を用いた確率学習方程式 Stochastic Learning Equation using Monotone Increasing Resolution of Quantization ( http://arxiv.org/abs/2112.13006v1 ) ライセンス: Link先を確認	Jinwuk Seok, Jeong-Si Kim	(参考訳) 本稿では,提案アルゴリズムの量子化と確率解析の解法を単調に拡張した量子化学習方程式を提案する。密度分布と均一分布の量子化誤差に対するホワイトノイズ仮説によれば、量子化誤差をホワイトノイズとみなすことができる。このことから,単調に量子化分解能が増加する学習方程式は分布の観点として弱く収束することを示した。本稿では,対象関数のヘシアン制約のような局所収束特性の代わりに,リプシッツ条件を満たす領域に対して,大域的最適化が可能であることを示す。 In this paper, we propose a quantized learning equation with a monotone increasing resolution of quantization and stochastic analysis for the proposed algorithm. According to the white noise hypothesis for the quantization error with dense and uniform distribution, we can regard the quantization error as i.i.d. white noise. Based on this, we show that the learning equation with monotonically increasing quantization resolution converges weakly as the distribution viewpoint. The analysis of this paper shows that global optimization is possible for a domain that satisfies the Lipschitz condition instead of local convergence properties such as the Hessian constraint of the objective function.	翻訳日:2021-12-28 14:44:51 公開日:2021-12-24
# 検証セットのないdart: 限界可能性の最適化 DARTS without a Validation Set: Optimizing the Marginal Likelihood ( http://arxiv.org/abs/2112.13023v1 ) ライセンス: Link先を確認	Miroslav Fil, Binxin Ru, Clare Lyle, Yarin Gal	(参考訳) neural architecture search (nas)の成功は、歴史的に、過剰な計算要求によって制限されている。 DARTSのような現代のウェイトシェアリングNASメソッドは、1桁のGPU日で検索を終了できるが、共有ウェイトから最終最高のアーキテクチャを抽出することは信頼性が低いことで知られている。ベイズ限度解釈を用いた最近開発された一般化推定器であるtraining-speed-estimate (tse)は、以前はdartの勾配に基づく最適化の検証損失の代わりに使用されてきた。これによりDARTSのスキップ接続崩壊が防止され、NASBench-201と元のDARTS検索スペースの性能が大幅に向上する。各種DARTS診断を適用し,検証セットを使用しない異常な動作を示すことで,これらの結果を拡張した。さらに,本実験は,操作選択に比べて文献上ではあまり注目されていないにもかかわらず,探索性能に強い負の影響を与えるdartの深さギャップとトポロジ選択の具体例を与える。 The success of neural architecture search (NAS) has historically been limited by excessive compute requirements. While modern weight-sharing NAS methods such as DARTS are able to finish the search in single-digit GPU days, extracting the final best architecture from the shared weights is notoriously unreliable. Training-Speed-Estimate (TSE), a recently developed generalization estimator with a Bayesian marginal likelihood interpretation, has previously been used in place of the validation loss for gradient-based optimization in DARTS. This prevents the DARTS skip connection collapse, which significantly improves performance on NASBench-201 and the original DARTS search space. We extend those results by applying various DARTS diagnostics and show several unusual behaviors arising from not using a validation set. Furthermore, our experiments yield concrete examples of the depth gap and topology selection in DARTS having a strongly negative impact on the search performance despite generally receiving limited attention in the literature compared to the operations selection.	翻訳日:2021-12-28 14:43:15 公開日:2021-12-24
# bi型不均質グラフ学習のための2重階層型アテンションネットワーク Dual Hierarchical Attention Networks for Bi-typed Heterogeneous Graph Learning ( http://arxiv.org/abs/2112.13078v1 ) ライセンス: Link先を確認	Yu Zhao, Shaopeng Wei, Huaming Du, Xingyan Chen, Qing Li, Fuzhen Zhuang, Ji Liu and Gang Kou	(参考訳) バイタイプ不均一グラフは多くの実世界のシナリオに適用される。しかし、以前の異種グラフ学習研究は、通常、そのような異種グラフの双タイプ実体間の複雑な相互作用を無視する。本稿では,クラス間およびクラス間階層的階層的アテンションネットワークを用いて,二型不均質グラフ上での包括的ノード表現を学習する新しい二重階層的アテンションネットワーク(dhan)を提案する。具体的には、クラス内の注意は、同じタイプの隣人からノード表現を学ぶことを目的としており、クラス間の注意は、異なる種類の隣人からノード表現を集約することができる。したがって、デュアルアテンション操作により、DHANはノード内隣接情報だけでなく、二型不均一グラフ内のクラス間隣接情報も十分に活用することができる。両種異種学習ノードの包括的表現におけるDHANの有効性を十分に確認する各種課題に関する実験結果 Bi-typed heterogeneous graphs are applied in many real-world scenarios. However, previous heterogeneous graph learning studies usually ignore the complex interactions among the bi-typed entities in such heterogeneous graphs. To address this issue, in this paper we propose a novel Dual Hierarchical Attention Networks (DHAN) to learn comprehensive node representations on the bi-typed heterogeneous graphs with intra-class and inter-class hierarchical attention networks. Specifically, the intra-class attention aims to learn the node representations from its same type of neighbors, while inter-class attention is able to aggregate node representations from its different types of neighbors. Hence, the dual attention operations enable DHAN to sufficiently leverage not only the node intra-class neighboring information but also the inter-class neighboring information in the bi-typed heterogeneous graph. Experimental results on various tasks against the state-of-the-arts sufficiently confirm the capability of DHAN in learning node comprehensive representations on the bi-typed heterogeneous graphs.	翻訳日:2021-12-28 14:42:56 公開日:2021-12-24
# 解釈型強化学習に関する調査研究 A Survey on Interpretable Reinforcement Learning ( http://arxiv.org/abs/2112.13112v1 ) ライセンス: Link先を確認	Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao and Wulong Liu	(参考訳) 深層強化学習は、逐次的な意思決定問題に対して有望な機械学習アプローチとなっているが、自律運転や医療アプリケーションといった高度な領域では十分に成熟していない。そのような状況下では、例えば、学習されたポリシーは解釈可能で、配置前に検査される(例えば、安全性と検証可能性のために)必要がある。本調査は強化学習(RL)における高い解釈可能性を実現するための様々なアプローチの概要を提供する。その目的のために、解釈可能性(モデルの特性として)と説明可能性(プロキシの介入によるポストホック操作として)を区別し、それらをrlの文脈で以前の概念を強調して議論する。特に、解釈可能なRLは、解釈可能な入力、解釈可能な(遷移/回帰)モデル、解釈可能な意思決定など、異なる側面を受け入れることができると論じる。このスキームに基づいて,過去10年間の論文を中心に,解釈可能なRLに関する最近の研究の要約と分析を行った。また,いくつかの研究分野を簡潔に議論し,有望な研究の方向性を指摘する。 Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such contexts, a learned policy needs for instance to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that aim, we distinguish interpretability (as a property of a model) and explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL with an emphasis on papers published in the past 10 years. We also discuss briefly some related research areas and point to some potential promising research directions.	翻訳日:2021-12-28 14:42:41 公開日:2021-12-24
# Gaussian Process Bandits with Aggregated Feedback ( http://arxiv.org/abs/2112.13029v1 ) License: see link	Mengyan Zhang, Russell Tsuchida, Cheng Soon Ong	We consider the continuum-armed bandit problem, under a novel setting of recommending the best arms within a fixed budget under aggregated feedback. This is motivated by applications where the precise rewards are impossible or expensive to obtain, while an aggregated reward or feedback, such as the average over a subset, is available. We constrain the set of reward functions by assuming that they are from a Gaussian Process and propose the Gaussian Process Optimistic Optimisation (GPOO) algorithm. We adaptively construct a tree with nodes as subsets of the arm space, where the feedback is the aggregated reward of representatives of a node. We propose a new simple regret notion with respect to aggregated feedback on the recommended arms. We provide theoretical analysis for the proposed algorithm, and recover single point feedback as a special case. We illustrate GPOO and compare it with related algorithms on simulated data.	Translated: 2021-12-28 14:39:11 Published: 2021-12-24
# Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition ( http://arxiv.org/abs/2112.12916v1 ) License: see link	Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du	Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model. This ignores the 2D spatial context of visual semantics within and between character instances, so these methods do not generalize well to arbitrary-shape scene text. To address this issue, we make the first attempt to perform textual reasoning based on visual semantics in this paper. Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity. Then, these subgraphs are sequentially connected by their root nodes and merged into a complete graph. Based on this graph, we devise a graph convolutional network for textual reasoning (GTR) by supervising it with a cross-entropy loss. GTR can be easily plugged into representative STR models to improve their performance owing to better textual reasoning. Specifically, we construct our model, namely S-GTR, by paralleling GTR to the language model in a segmentation-based STR baseline, which can effectively exploit the visual-linguistic complementarity via mutual learning. S-GTR sets new state-of-the-art on six challenging STR benchmarks and generalizes well to multi-linguistic datasets. Code is available at https://github.com/adeline-cs/GTR.	Translated: 2021-12-28 14:17:47 Published: 2021-12-24
# Counterfactual Memorization in Neural Language Models ( http://arxiv.org/abs/2112.12938v1 ) License: see link	Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini	Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data. As models continue to scale up in parameters, training data, and compute, understanding memorization in language models is both important from a learning-theoretical point of view and practically crucial in real-world applications. An open question in previous studies of memorization in language models is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing "common" memorization such as familiar phrases, public knowledge or templated texts. In this paper, we provide a principled perspective inspired by a taxonomy of human memory in Psychology. From this perspective, we formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We further estimate the influence of each training example on the validation set and on generated texts, and show that this can provide direct evidence of the source of memorization at test time.	Translated: 2021-12-28 13:57:50 Published: 2021-12-24
# Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection ( http://arxiv.org/abs/2112.13082v1 ) License: see link	Jiahe Fan, Mohammud J. Bocus, Brett Hosking, Rigen Wu, Yanan Liu, Sergey Vityazev, Rui Fan	This paper presents a novel pothole detection approach based on single-modal semantic segmentation. It first extracts visual features from input images using a convolutional neural network. A channel attention module then reweighs the channel features to enhance the consistency of different feature maps. Subsequently, we employ an atrous spatial pyramid pooling module (comprising atrous convolutions in series, with progressive rates of dilation) to integrate the spatial context information. This helps better distinguish between potholes and undamaged road areas. Finally, the feature maps in the adjacent layers are fused using our proposed multi-scale feature fusion module. This further reduces the semantic gap between different feature channel layers. Extensive experiments were carried out on the Pothole-600 dataset to demonstrate the effectiveness of our proposed method. The quantitative comparisons suggest that our method achieves state-of-the-art (SoTA) performance on both RGB images and transformed disparity images, outperforming three SoTA single-modal semantic segmentation networks.	Translated: 2021-12-28 13:57:33 Published: 2021-12-24
# Does MAML Only Work via Feature Re-use? A Data Centric Perspective ( http://arxiv.org/abs/2112.13137v1 ) License: see link	Brando Miranda, Yu-Xiong Wang and Sanmi Koyejo	Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks. Furthermore, other work has strongly suggested that Model Agnostic Meta-Learning (MAML) also works via this same method - by learning a good embedding. These observations highlight our lack of understanding of what meta-learning algorithms are doing and when they work. In this work, we provide empirical results that shed some light on how meta-learned MAML representations function. In particular, we identify three interesting properties: 1) In contrast to previous work, we show that it is possible to define a family of synthetic benchmarks that result in a low degree of feature re-use - suggesting that current few-shot learning benchmarks might not have the properties needed for the success of meta-learning algorithms; 2) meta-overfitting occurs when the number of classes (or concepts) is finite, and this issue disappears once the task has an unbounded number of concepts (e.g., online learning); 3) more adaptation at meta-test time with MAML does not necessarily result in a significant representation change or even an improvement in meta-test performance - even when training on our proposed synthetic benchmarks. We also suggest that to understand meta-learning algorithms better, we must go beyond tracking only absolute performance and, in addition, formally quantify the degree of meta-learning and track both metrics together. Reporting results in future work this way will help us identify the sources of meta-overfitting more accurately and help us design more flexible meta-learning algorithms that learn beyond fixed feature re-use. Finally, we conjecture the core challenge of re-thinking meta-learning is in the design of few-shot learning data sets and benchmarks - rather than in the algorithms, as suggested by previous work.	Translated: 2021-12-28 13:57:13 Published: 2021-12-24
# The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence ( http://arxiv.org/abs/2112.13121v1 ) License: see link	Brando Miranda, Yu-Xiong Wang and Sanmi Koyejo	It has been recently observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks. This raises important questions about when and how meta-learning algorithms should be deployed. In this paper, we make a first step in clarifying these questions by first formulating a computable metric for a few-shot learning benchmark that we hypothesize is predictive of whether meta-learning solutions will succeed or not. We name this metric the diversity coefficient of a few-shot learning benchmark. Using the diversity coefficient, we show that the MiniImagenet benchmark has zero diversity - according to twenty-four different ways to compute the diversity. We proceed to show that when making a fair comparison between MAML-learned solutions and transfer learning, both have identical meta-test accuracy. This suggests that transfer learning fails to outperform MAML - contrary to what previous work suggests. Together, these two facts provide the first test of whether diversity correlates with meta-learning success and therefore show that a diversity coefficient of zero correlates with a high similarity between transfer learning and MAML-learned solutions - especially at meta-test time. We therefore conjecture meta-learned solutions have the same meta-test performance as transfer learning when the diversity coefficient is zero.	Translated: 2021-12-28 13:56:31 Published: 2021-12-24
# Dual Path Structural Contrastive Embeddings for Learning Novel Objects ( http://arxiv.org/abs/2112.12359v2 ) License: CC BY 4.0	Bingbin Li, Elvis Han Cui, Yanan Li, Donghui Wang, Weng Kee Wong	Learning novel classes from very few labeled samples has attracted increasing attention in machine learning areas. Recent research on either meta-learning based or transfer-learning based paradigms demonstrates that gaining information on a good feature space can be an effective solution to achieve favorable performance on few-shot tasks. In this paper, we propose a simple but effective paradigm that decouples the tasks of learning feature representations and classifiers and only learns the feature embedding architecture from base classes via the typical transfer-learning training strategy. To maintain both the generalization ability across base and novel classes and discrimination ability within each class, we propose a dual path feature learning scheme that effectively combines structural similarity with contrastive feature construction. In this way, both inner-class alignment and inter-class uniformity can be well balanced, resulting in improved performance. Experiments on three popular benchmarks show that when incorporated with a simple prototype based classifier, our method can still achieve promising results for both standard and generalized few-shot problems in either an inductive or transductive inference setting.	Translated: 2021-12-28 13:44:32 Published: 2021-12-24
# Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization ( http://arxiv.org/abs/2112.12376v2 ) License: CC BY 4.0	Yihua Zhang, Guanhua Zhang, Prashant Khanduri, Mingyi Hong, Shiyu Chang, Sijia Liu	Adversarial training (AT) has become a widely recognized defense mechanism to improve the robustness of deep neural networks against adversarial attacks. It solves a min-max optimization problem, where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the min-max nature makes AT computationally intensive and thus difficult to scale. Meanwhile, the FAST-AT algorithm, and in fact many recent algorithms that improve AT, simplify the min-max based AT by replacing its maximization step with the simple one-shot gradient sign based attack generation step. Although easy to implement, FAST-AT lacks theoretical guarantees, and its practical performance can be unsatisfactory, suffering from robustness catastrophic overfitting when training with strong adversaries. In this paper, we propose to design FAST-AT from the perspective of bi-level optimization (BLO). We first make the key observation that the most commonly-used algorithmic specification of FAST-AT is equivalent to using some gradient descent-type algorithm to solve a bi-level problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm performance. Based on the above observation, we propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Fast Bi-level AT (FAST-BAT). FAST-BAT is capable of defending against sign-based projected gradient descent (PGD) attacks without calling any gradient sign method or explicit robust regularization. Furthermore, we empirically show that our method outperforms state-of-the-art FAST-AT baselines, by achieving superior model robustness without inducing robustness catastrophic overfitting, or suffering from any loss of standard accuracy.	Translated: 2021-12-28 13:14:35 Published: 2021-12-24
# LaTr: Layout-Aware Transformer for Scene-Text VQA ( http://arxiv.org/abs/2112.12494v2 ) License: CC BY 4.0	Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha	We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact of each modality, and reveal the importance of the language module, especially when enriched with layout information. Accounting for this, we propose a single objective pre-training scheme that requires only text and spatial cues. We show that applying this pre-training scheme on scanned documents has certain advantages over using natural images, despite the domain gap. Scanned documents are easy to procure, text-dense and have a variety of layouts, helping the model learn various spatial cues (e.g. left-of, below etc.) by tying together language and layout information. Compared to existing approaches, our method performs vocabulary-free decoding and, as shown, generalizes well beyond the training vocabulary. We further demonstrate that LaTr improves robustness towards OCR errors, a common reason for failure cases in STVQA. In addition, by leveraging a vision transformer, we eliminate the need for an external object detector. LaTr outperforms state-of-the-art STVQA methods on multiple datasets. In particular, +7.6% on TextVQA, +10.8% on ST-VQA and +4.0% on OCR-VQA (all absolute accuracy numbers).	Translated: 2021-12-28 12:45:31 Published: 2021-12-24
# Optimal learning of high-dimensional classification problems using deep neural networks ( http://arxiv.org/abs/2112.12555v2 ) License: CC BY 4.0	Philipp Petersen, Felix Voigtlaender	We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.	Translated: 2021-12-28 12:18:51 Published: 2021-12-24
# BANMo: Building Animatable 3D Neural Models from Many Casual Videos ( http://arxiv.org/abs/2112.12761v2 ) License: see link	Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo	Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos in a differentiable rendering framework. While the use of many videos provides more coverage of camera views and object articulations, they introduce significant challenges in establishing correspondence across scenes with different backgrounds, illumination conditions, etc. Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that allow for differentiable and invertible articulated deformations. When combined with canonical embeddings, such models allow us to establish dense correspondences across videos that can be self-supervised with cycle consistency. On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior works for humans and animals, with the ability to render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io.	Translated: 2021-12-28 12:16:40 Published: 2021-12-24

# On the mathematical structure of quantum models of computation based on Hamiltonian minimisation ( http://arxiv.org/abs/2009.10088v2 )

License: see link

Jacob Biamonte

Determining properties of ground states of spin Hamiltonians remains a topic of central relevance connecting disciplines of mathematical, theoretical and applied physics. In the last few decades, ground state properties of physical systems have been increasingly considered as computational resources. This thesis develops parts of the mathematical apparatus to create (program) ground states relevant for quantum and classical computation. The core findings presented in this thesis (now over a decade old) include that (i) logic operations (gates) can be embedded into the low-energy sector of Ising spins, whereas three- (and higher-) body Ising interaction terms can be mimicked through the minimisation of 2- and 1-body Ising terms yet require the introduction of slack spins; (ii) perturbation theory gadgets enable the emulation of interactions not present in a given Hamiltonian, e.g. $YY$ interactions can be realized from $ZZ$ and $XX$ interactions. The thesis also contains a result from 2007 showing that physically relevant two-body model Hamiltonians have a QMA-hard ground state energy decision problem. Merged with other results, this established that these models provide a universal resource for ground state quantum computation. More recent findings include the proof that an idealised version of the contemporary variational approach to quantum algorithms enables a universal model of quantum computation. Other related results are also presented as they relate to ground state quantum computation and the minimisation of Hamiltonians by quantum circuits. The topics covered include: Ising model reductions, stochastic versus quantum processes on graphs, quantum gates and circuits as tensor networks, variational quantum algorithms and Hamiltonian gadgets.

Translated: 2023-05-01 09:14:35 Published: 2021-12-24

# Operating a passive on-chip superconducting circulator: device control and quasiparticle effects ( http://arxiv.org/abs/2103.02759v2 )

License: see link

Dat Thanh Le, Clemens Muller, Rohit Navarathna, Arkady Fedorov, and T. M. Stace

Microwave circulators play an important role in quantum technology based on superconducting circuits. The conventional circulator design, which employs ferrite materials, is bulky and involves strong magnetic fields, rendering it unsuitable for integration on superconducting chips. One promising design for an on-chip superconducting circulator is based on a passive Josephson-junction ring. In this paper, we consider two operational issues for such a device: circuit tuning and the effects of quasiparticle tunneling. We compute the scattering matrix using adiabatic elimination and derive the parameter constraints to achieve optimal circulation. We then numerically optimize the circulator performance over the full set of external control parameters, including gate voltages and flux bias, to demonstrate that this multi-dimensional optimization converges quickly to find optimal working points. We also consider the possibility of quasiparticle tunneling in the circulator ring and how it affects signal circulation. Our results form the basis for practical operation of a passive on-chip superconducting circulator made from a ring of Josephson junctions.

Translated: 2023-04-09 07:43:21 Published: 2021-12-24

# Floquet Engineering of Lie Algebraic Quantum Systems ( http://arxiv.org/abs/2103.15923v2 )

License: see link

Jayendra N. Bandyopadhyay and Juzar Thingna

We propose a `Floquet engineering' formalism to systematically design a periodic driving protocol in order to stroboscopically realize the desired system starting from a given static Hamiltonian. The formalism is applicable to quantum systems which have an underlying closed Lie-algebraic structure, for example, solid-state systems with noninteracting particles moving on a lattice, or their variant realized by ultra-cold atoms moving on an optical lattice. Unlike previous attempts at Floquet engineering, our method produces the desired Floquet Hamiltonian at any driving frequency and is not restricted to the fast or slow driving regimes. The approach is based on the Wei-Norman ansatz, which was originally proposed to construct a time-evolution operator for arbitrary driving. Here, we apply this ansatz to the micro-motion dynamics, defined within one period of the driving, and obtain the driving protocol by fixing the gauge of the micro-motion. To illustrate our idea, we use a two-band system, or a system consisting of two sub-lattices, as a testbed. In particular, we focus on engineering the cross-stitched lattice model, which has been a paradigmatic flat-band model.

Translated: 2023-04-06 05:40:31 Published: 2021-12-24
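
For orientation, the two ingredients named in this abstract can be written in their standard textbook form. This is a sketch only: the symbols $a_k$, $g_k$, $X_k$, $P(t)$ and $H_F$ are notational assumptions introduced here and are not taken from the paper.

```latex
% Wei-Norman ansatz: for a Hamiltonian built from generators X_k of a
% closed (finite-dimensional) Lie algebra, the propagator is written as an
% ordered product of exponentials with scalar functions g_k(t).
\begin{align}
  H(t) &= \sum_{k=1}^{n} a_k(t)\, X_k ,
  &
  U(t) &= \prod_{k=1}^{n} e^{\, g_k(t)\, X_k } , \quad g_k(0) = 0 .
\end{align}
% Floquet decomposition for H(t + T) = H(t): a periodic micro-motion
% operator P(t) times evolution under a static Floquet Hamiltonian H_F.
\begin{align}
  U(t) &= P(t)\, e^{-i H_F t / \hbar} , \quad P(t + T) = P(t) , \quad P(0) = \mathbb{1} ,
  &
  U(nT) &= e^{-i H_F\, nT / \hbar} .
\end{align}
```

The last identity is what "stroboscopically realize" refers to: sampled at integer multiples of the driving period $T$, the driven system evolves as if governed by the static $H_F$.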

# Weak and ultrastrong coupling limits of the quantum mean force Gibbs state ( http://arxiv.org/abs/2104.12606v2 )

License: see link

J. D. Cresser, J. Anders

The Gibbs state is widely taken to be the equilibrium state of a system in contact with an environment at temperature $T$. However, non-negligible interactions between system and environment can give rise to an altered state. Here we derive general expressions for this mean force Gibbs state, valid for any system that interacts with a bosonic reservoir. First, we derive the state in the weak coupling limit and find that, in general, it maintains coherences with respect to the bare system Hamiltonian. Second, we develop a new expansion method suited to investigate the ultrastrong coupling regime. This allows us to derive the explicit form for the mean force Gibbs state, and we find that it becomes diagonal in the basis set by the system-reservoir interaction instead of the system Hamiltonian. Several examples are discussed including a single qubit, a three-level V-system and two coupled qubits all interacting with bosonic reservoirs. The results shed light on the presence of coherences in the strong coupling regime, and provide key tools for nanoscale thermodynamics investigations.

Translated: 2023-04-02 09:01:46 Published: 2021-12-24

# 7km展開ファイバリンクを用いた遠方クロックの場二方向量子同期の実現

Implementation of field two-way quantum synchronization of distant clocks across a 7 km deployed fiber link ( http://arxiv.org/abs/2109.00784v2 )

ライセンス: Link先を確認

Runai Quan, Huibo Hong, Wenxiang Xue, Honglei Quan, Wenyu Zhao, Xiao Xiang, Yuting Liu, Mingtao Cao, Tao Liu, Shougang Zhang, Ruifang Dong

(参考訳) 両方向の量子クロック同期はフェムト秒レベルの同期機能だけでなく、対称遅延攻撃に対するセキュリティも提供することが示されており、これにより、遠隔クロックを精度とセキュリティの両面で比較・同期する方法として期待できる。本稿では,HメーサとRbクロック間の2方向量子同期のフィールドテストを行い,7kmの展開ファイバでリンクした。 rbクロックの周波数安定性に制限され,30 sでの時間安定性を32 psで測定した。光ファイバマイクロ波周波数伝達技術を適用して, 得られた光子対の数が30秒で1440個に過ぎなかったにもかかわらず, 1マグニチュード以上の安定性が1.9psに向上した。このような実装は、フィールド応用を促進するための双方向量子クロック同期法の高実用性を示す。

Two-way quantum clock synchronization has been shown to provide not only femtosecond-level synchronization capability but also security against symmetric delay attacks, making it a promising method to compare and synchronize distant clocks with both enhanced precision and security. In this letter, a field test of two-way quantum synchronization between an H-maser and a Rb clock linked by a 7 km-long deployed fiber was implemented. Limited by the frequency stability of the Rb clock, the achieved time stability at 30 s was measured as 32 ps. By applying a fiber-optic microwave frequency transfer technology, the stability was improved by more than one order of magnitude to 1.9 ps, even though the number of acquired photon pairs was only 1440 in 30 s due to the low sampling rate of the coincidence measurement system used. This implementation demonstrates the high practicability of the two-way quantum clock synchronization method for field applications.

翻訳日:2023-03-16 08:43:12 公開日:2021-12-24

# 三体クーロン系における原子から分子への連続的転移のキャラクタリゼーション

Characterization of the continuous transition from atomic to molecular shape in the three-body Coulomb system ( http://arxiv.org/abs/2109.04542v2 )

ライセンス: Link先を確認

Laura D. Salas, Barbara Zamora-Yusti, and Julio C. Arce

(参考訳) 粒子の質量比が変化するため, 2つの同一粒子と反対電荷の第3粒子からなるクーロン系において, 原子から分子形状への連続遷移を, 代替的, 不定形的にキャラクタリゼーションする。変動に最適化された波動関数に境界条件の正確な分解を適用することにより、単一粒子と同一粒子の相対運動に対する非断熱ポテンシャル曲面を基底状態に構築する。遷移は、そのような表面の地形の質量比と関連する境界分布と条件分布の形状との進化を通して明らかにされる。本手法は, 分子形状のボルン-オッペンハイマーと電荷分配画像の統合と拡張を行う。

We present an alternative, univocal characterization of the continuous transition from atomic to molecular shape in the Coulomb system constituted by two identical particles and a third particle with the opposite charge, as the mass ratio of the particles varies. Applying a marginal-conditional exact factorization to a variationally optimized wavefunction, we construct a nonadiabatic potential energy surface for the relative motion between the single particle and each of the identical particles in the ground state. The transition is revealed through the evolution with the mass ratio of the topography of such surface and of the shapes of the associated marginal and conditional distributions. Our approach unifies and extends to the nonadiabatic regime the Born-Oppenheimer and charge-distribution pictures of molecular shape.

翻訳日:2023-03-15 18:07:58 公開日:2021-12-24

# (1+1)次元$O(3)$非線形$\sigma$-モデルと$\theta=\pi$項のテンソルネットワークシミュレーション

Tensor network simulation of the (1+1)-dimensional $O(3)$ nonlinear $\sigma$-model with $\theta=\pi$ term ( http://arxiv.org/abs/2109.11324v2 )

ライセンス: Link先を確認

Wei Tang, X. C. Xie, Lei Wang, Hong-Hao Tu

(参考訳) (1+1)次元の$O(3)$非線形な$\sigma$-modelと$\theta=\pi$のテンソルネットワークシミュレーションを行う。ハミルトンの定式化の中で、この場の理論は磁気モノポールで装飾された量子ローターモデルの有限温度分割関数として現れる。単極高調波基底を用いて、この修正量子回転子モデルの行列表現を導出し、テンソルネットワークシミュレーションを可能にする。我々は,最近開発した連続行列積作用素法[Tang et al., Phys. Rev. Lett. 125, 170604 (2020)]を用いて有限温度特性の研究を行い,無質量性を明らかにする。結合定数の関数としての中心電荷は計算で直接抽出され、場理論の予測と比較される。

We perform a tensor network simulation of the (1+1)-dimensional $O(3)$ nonlinear $\sigma$-model with $\theta=\pi$ term. Within the Hamiltonian formulation, this field theory emerges as the finite-temperature partition function of a modified quantum rotor model decorated with magnetic monopoles. Using the monopole harmonics basis, we derive the matrix representation for this modified quantum rotor model, which enables tensor network simulations. We employ our recently developed continuous matrix product operator method [Tang et al., Phys. Rev. Lett. 125, 170604 (2020)] to study the finite-temperature properties of this model and reveal its massless nature. The central charge as a function of the coupling constant is directly extracted in our calculations and compared with field theory predictions.

翻訳日:2023-03-13 23:13:24 公開日:2021-12-24

# 非エルミート光原子鏡

A non-Hermitian optical atomic mirror ( http://arxiv.org/abs/2110.10070v2 )

ライセンス: Link先を確認

Yi-Cheng Wang, Jhih-Shih You, H. H. Jen

(参考訳) 対称性とトポロジーの研究は量子光学において重要なブレークスルーをもたらしたが、よりリッチな振る舞いは光間相互作用の非エルミート的性質から生じる。高反射率非エルミタン光学鏡は、集合双極子モードに付随する共振器近傍の2次元の中性原子のサブ波長アレイによって実現することができる。ここでは、二乗原子格子の結晶対称性を低くすることで、例外点が非欠陥縮退から発展し、例外点から生じる分散バルクフェルミ弧が光円錐によって切り離されることを示す。双極子-双極子相互作用は相反するが、幾何学に依存しない非エルミート皮膚効果が出現する。さらに、境界に局在したスキンモードは、長距離相互作用に由来するスケールフリーな振る舞いを示し、そのメカニズムは非ブロッホバンド理論の枠組みを越えている。我々の研究は、非単純性、トポロジー、長距離相互作用の間の相互作用の研究の扉を開く。

Explorations of symmetry and topology have led to important breakthroughs in quantum optics, but much richer behaviors arise from the non-Hermitian nature of light-matter interactions. A high-reflectivity, non-Hermitian optical mirror can be realized by a two-dimensional subwavelength array of neutral atoms near the cooperative resonance associated with the collective dipole modes. Here we show that exceptional points develop from a nondefective degeneracy by lowering the crystal symmetry of a square atomic lattice, and dispersive bulk Fermi arcs that originate from exceptional points are truncated by the light cone. We also find that, although the dipole-dipole interaction is reciprocal, a geometry-dependent non-Hermitian skin effect emerges. Furthermore, skin modes localized at a boundary show a scale-free behavior that stems from the long-range interaction and whose mechanism goes beyond the framework of non-Bloch band theory. Our work opens the door to the study of the interplay among non-Hermiticity, topology, and long-range interaction.

翻訳日:2023-03-11 02:01:18 公開日:2021-12-24

# スイッチング可能な軌道角運動量を有するヘラルド単一光子の室温オンチップ生成

Room-temperature on-chip generation of heralded single photons with switchable orbital angular momentum ( http://arxiv.org/abs/2111.05594v2 )

ライセンス: Link先を確認

Shan Zhang, Xue Feng, Wei Zhang, Kaiyu Cui, Fang Liu, and Yidong Huang

(参考訳) 量子光学において、軌道角運動量(oam)は、l の位相電荷によって量子化される無限および離散固有値の性質から、高次元量子状態を達成することを非常に有望である。ここでは、スイッチング可能なOAMモードを持つ一光子光源を提案し、シリコンチップ上で実証した。室温では、11のoamモード(l=2~6,-6〜-1)を持つヘラルド単光子が熱光学効果により生成・切り替えに成功した。我々は、複数のOAMモードを持ち、室温で動作する統合量子源は、高次元量子情報処理のための実用的なプラットフォームを提供すると考えている。さらに,提案アーキテクチャは,OAM量子源の性能向上のために,他の物質システムにも拡張可能である。

In quantum optics, orbital angular momentum (OAM) is very promising for achieving high-dimensional quantum states because it has an infinite set of discrete eigenvalues, quantized by the topological charge l. Here, a heralded single-photon source with switchable OAM modes is proposed and demonstrated on a silicon chip. At room temperature, heralded single photons with 11 OAM modes (l = 2 to 6 and -6 to -1) have been successfully generated and switched through the thermo-optical effect. We believe that such an integrated quantum source, with multiple OAM modes and operating at room temperature, would provide a practical platform for high-dimensional quantum information processing. Moreover, our proposed architecture can also be extended to other material systems to further improve the performance of the OAM quantum source.

翻訳日:2023-03-08 12:17:56 公開日:2021-12-24

# qudit表面コードとハイパーマップコード

Qudit surface code and hypermap code ( http://arxiv.org/abs/2112.01752v2 )

ライセンス: Link先を確認

Zihan Lei

(参考訳) 本稿では、ホモロジー量子コードを任意のqudit次元$D\geq{2}$で定義し、2-複素$\Sigma$上でCSS演算子を直接定義する。 2-コンプレックスが曲面から来ると、qudit曲面コードが得られる。次に、定義したコードの次元が常に $\sigma$ の最初のホモロジー群のサイズに等しいことを証明する。次に、martin leslie が提案したハイパーマップホモロジー量子コードを qudit のケースに拡張し、そのようなすべてのハイパーマップコードに対して、我々が定義したホモロジー量子コードがそれと等しくなるような抽象的2-複体を構築した。

In this article, we define a homological quantum code in arbitrary qudit dimension $D\geq{2}$ by directly defining CSS operators on a 2-complex $\Sigma$. When the 2-complex comes from a surface, we get a qudit surface code. We then prove that the dimension of the code we define always equals the size of the first homology group of $\Sigma$. Next, we extend the hypermap-homology quantum code proposed by Martin Leslie to the qudit case, and for every such hypermap code we construct an abstract 2-complex whose homological quantum code, as defined here, equals it.

翻訳日:2023-03-06 00:16:08 公開日:2021-12-24

# 2つの一般的な系統誤差に対してロバストな短い複合回転

Short composite rotation robust against two common systematic errors ( http://arxiv.org/abs/2112.12945v1 )

ライセンス: Link先を確認

Shingo Kukita, Haruki Kiya, and Yasushi Kondo

(参考訳) 系統誤差は正確な量子制御を妨げる。パルス長誤差 (PLE) とオフ共振誤差 (ORE) は1量子ビット制御で発生する典型的な系統誤差である。複合パルス(CP)は、量子演算中の系統誤差の影響を補償するのに役立つ。 PLEまたはOREのいずれかに対してロバストなCPはいくつか同定されている。しかし、両方の誤差に対してロバスト(bi-robust)なCPを構築する試みはほとんど行われていない。本研究では、PLEにロバストなCPを改良することで、従来開発されたバイロバストCPよりも動作時間の短い、1量子ビット操作のための新しいバイロバストCPを開発した。

Systematic errors hinder precise quantum control. Pulse length errors (PLEs) and off-resonance errors (OREs) are typical systematic errors that are encountered during one-qubit control. A composite pulse (CP) can help compensate for the effects of systematic errors during quantum operation. Several CPs that are robust against either PLE or ORE have been identified. However, few attempts have been made to construct CPs that are robust against both errors (bi-robust). We develop a novel bi-robust CP for one-qubit operations by modifying a PLE robust CP, which exhibits a shorter operation time than that of previously developed bi-robust CPs.

翻訳日:2023-03-03 09:16:47 公開日:2021-12-24

# 1次元水素分子イオンに対するシュロディンガー方程式の解析解

Analytical solutions of the Schrodinger equation for the one-dimensional hydrogen molecular ion ( http://arxiv.org/abs/2112.13135v1 )

ライセンス: Link先を確認

Stavros Theodorakis

(参考訳) 1次元水素分子イオンに対するシュロディンガー方程式の解析解を提案する。特に、この系の電子エネルギー曲線に対して、基底状態と第1励起状態に対応する閉形式表現を示す。我々の結果は以前得られた数値解と一致している。

We present analytical solutions of the Schrodinger equation for the one-dimensional hydrogen molecular ion. In particular, we present closed form expressions for the electronic energy curves of this system that correspond to the ground state and the first excited state. Our results agree with numerical solutions obtained before.

翻訳日:2023-03-03 09:15:53 公開日:2021-12-24

# IPsecアーキテクチャにおける量子鍵分配技術の概要

Overview of Quantum Key Distribution Technique within IPsec Architecture ( http://arxiv.org/abs/2112.13105v1 )

ライセンス: Link先を確認

Emir Dervisevic, Miralem Mehic

(参考訳) qkd(quantum key distribution)は、情報理論上安全な方法で遠隔ユーザ間で対称なバイナリキーを確立するためのアプローチである。本稿では、最新のIP(Internet Protocol)ネットワークにおけるセキュアな通信を確立するために、QKDを最もポピュラーなアーキテクチャに統合する既存のソリューションの概要について述べる。提供される概要は、標準化されたソリューションを目指すIPsecアーキテクチャにおけるQKDの統合をさらに設計するために使用することができる。

Quantum Key Distribution (QKD) is an approach for establishing symmetrical binary keys between distant users in an information-theoretically secure way. In this paper we provide an overview of existing solutions that integrate QKD within the most popular architecture for establishing secure communications in modern IP (Internet Protocol) networks - IPsec (Internet Protocol security). The provided overview can be used to further design the integration of QKD within the IPsec architecture striving for a standardized solution.

翻訳日:2023-03-03 09:15:49 公開日:2021-12-24

# 広帯域マイクロピラー空洞に埋め込まれた量子ドットに基づく光子対の高抽出効率源

High extraction efficiency source of photon pairs based on a quantum dot embedded in a broadband micropillar cavity ( http://arxiv.org/abs/2112.13074v1 )

ライセンス: Link先を確認

Laia Ginés, Magdalena Moczała-Dusanowska, David Dlaka, Radim Hošák, Junior R. Gonzales-Ureta, Miroslav Ježek, Edmund Harbord, Ruth Oulton, Sven Höfling, Andrew B. Young, Christian Schneider, Ana Predojević

(参考訳) 単一量子ドットにおける光子対の生成は、その性質上決定論的な過程に基づいている。しかし、高インデックス半導体ホスト材料からこれらの光子対を効率的に抽出するには、フォトニック環境の工学が必要である。単一量子ドットから放出される光子対の抽出に適した広帯域演算を用いて、69.4(10)$\%$の抽出効率を特徴とするマイクロピラーデバイスについて報告する。抽出効率の向上を実現するため,Purcellの強化にのみ依存するアプローチに対して,本手法はキャビティモード以外のモードへの排出抑制を利用する。当社の技術実装では、量子技術の増大するニーズに合わせてスケールアップ可能な、より高いデバイス収率を実現するための、控えめな製造努力が必要です。さらに、デバイスの設計をさらに最適化して、85$\%$の抽出効率を実現することができる。

The generation of photon pairs in single quantum dots is based on a process that is, by its nature, deterministic. However, efficient extraction of these photon pairs from a high-index semiconductor host material requires engineering of the photonic environment. We report on a micropillar-based device featuring an extraction efficiency of 69.4(10)% that is achieved by harnessing a broadband operation suitable for extraction of photon pairs emitted from a single quantum dot. In contrast to approaches that rely solely on Purcell enhancement to increase the extraction efficiency, our solution exploits a suppression of the emission into modes other than the cavity mode. Our technological implementation requires modest fabrication effort, enabling higher device yields that can be scaled up to meet the growing needs of quantum technologies. Furthermore, the design of the device can be further optimized to allow for an extraction efficiency of 85%.

翻訳日:2023-03-03 09:15:33 公開日:2021-12-24

# 自己受動面を有するペロブスカイト単結晶に基づく固有(トラップフリー)トランジスタ

The intrinsic (trap-free) transistors based on perovskite single crystals with self-passivated surfaces ( http://arxiv.org/abs/2112.13056v1 )

ライセンス: Link先を確認

V. Bruevich, L. Kasaei, S. Rangan, H. Hijazi, Z. Zhang, T. Emge, E. Andrei, R. A. Bartynski, L. C. Feldman, V. Podzorov

(参考訳) 鉛ハロゲン化ペロブスカイトは様々な光電子応用に適した新しい半導体材料として登場した。しかし、このタイプの材料の電荷輸送特性に関する基礎的および応用研究に必要なデバイスである信頼性ペロブスカイト電界効果トランジスタ(FET)の製造は困難であることが証明されている。ここでは,セシウム鉛臭化物(cspbbr3)のエピタキシャル単結晶薄膜に基づく高性能ペロブスカイトfetを示す。気相エピタキシーの改善により、このペロブスカイトの真に大きな原子平らな膜を、優れた構造と表面特性で成長させることができる。これらのCsPbBr3膜に基づくFETは、非常に低いヒステリシスと高い固有の電荷キャリアモビリティを有する教科書トランジスタ特性を示す。このような高性能デバイスが利用可能になったことで、ペロブスカイトFETにおけるホール効果が初めて研究された。 CsPbBr3 FETの荷電担体移動度は, 室温で約30 cm2V-1s-1から50 Kで約250 cm2V-1s-1に増加し, 主にフォノン散乱による帯域輸送が制限された。ここで説明されるエピタキシャル成長とFET製造法は、ハイブリッドを含む他のペロブスカイトにも自然に拡張可能であり、ペロブスカイトFETの研究における性能ボトルネックを克服する技術的な進歩を表している。

Lead-halide perovskites have emerged as novel semiconducting materials suitable for a variety of optoelectronic applications. However, fabrication of reliable perovskite field-effect transistors (FETs), the devices necessary for fundamental and applied research on the charge transport properties of this class of materials, has proven challenging. Here we demonstrate high-performance perovskite FETs based on epitaxial, single-crystalline thin films of cesium lead bromide (CsPbBr3). An improved vapor-phase epitaxy has allowed growing truly large-area, atomically flat films of this perovskite with excellent structural and surface properties. FETs based on these CsPbBr3 films exhibit textbook transistor characteristics, with very low hysteresis and high intrinsic charge carrier mobility. Availability of such high-performance devices has allowed the study of the Hall effect in perovskite FETs for the first time. Our magneto-transport measurements show that the charge carrier mobility of CsPbBr3 FETs increases on cooling, from ~30 cm$^2$V$^{-1}$s$^{-1}$ at room temperature to ~250 cm$^2$V$^{-1}$s$^{-1}$ at 50 K, exhibiting band transport mostly limited by phonon scattering. The epitaxial growth and FET fabrication methodologies described here can be naturally extended to other perovskites, including hybrid ones, thus representing a technological leap forward that overcomes the performance bottleneck in research on perovskite FETs.

翻訳日:2023-03-03 09:15:18 公開日:2021-12-24

# 線形ネットワークにおけるマクロ量子相関のシミュレーション

Simulating macroscopic quantum correlations in linear networks ( http://arxiv.org/abs/2112.13014v1 )

ライセンス: Link先を確認

A. Dellios, Peter D. Drummond, Bogdan Opanchuk, Run Yan Teh, and Margaret D. Reid

(参考訳) 多くの発展型量子技術は異なるタイプの量子ネットワークを利用する。線形量子ネットワークでさえ非自明であり、出力光子分布は指数関数的に複雑である。しかし、それでも計算シミュレーションは可能である。使用される方法は等価位相空間表現への変換であり、確率的に扱うことができる。これはデコヒーレンスを含む実験結果の予測と検証に非常に有用なツールを提供する。量子計算上の優位性を示すことを意図したガウスボソンサンプリングの実験と同様に、これらの手法は他の種類の絡み合った線形量子ネットワークにも適用できる。本稿では、この領域における研究のチュートリアルとレビューを行い、正のP分布とウィグナー分布を用いて量子位相空間技術を説明する。

Many developing quantum technologies make use of quantum networks of different types. Even linear quantum networks are nontrivial, as the output photon distributions can be exponentially complex. Despite this, they can still be computationally simulated. The methods used are transformations into equivalent phase-space representations, which can then be treated probabilistically. This provides an exceptionally useful tool for the prediction and validation of experimental results, including decoherence. As well as experiments in Gaussian boson sampling, which are intended to demonstrate quantum computational advantage, these methods are applicable to other types of entangled linear quantum networks as well. This paper provides a tutorial and review of work in this area, to explain quantum phase-space techniques using the positive-P and Wigner distributions.

翻訳日:2023-03-03 09:13:52 公開日:2021-12-24

# ワイル幾何学と量子補正

Weyl Geometry and Quantum Corrections ( http://arxiv.org/abs/2112.12964v1 )

ライセンス: Link先を確認

Sijo K. Joseph

(参考訳) 量子論の幾何学的定式化に関する最近の研究は、ワイル幾何学が量子論と一般相対性理論を古典的場の理論として一貫して融合するために用いられることを示唆している。ワイル幾何学の枠組みでは、量子論と重力は、量子論がジオメトリゼーションされると、一貫して融合できるようである。拡張微分幾何学は量子力学的結果をより一般的な非線形の枠組みに修正することができる。著者は、拡張微分幾何学が既知の量子方程式やマクスウェルの電磁方程式の修正をどのように修正するかを示している。

Recent research on the geometric formulation of quantum theory suggests that Weyl geometry can be used to merge quantum theory and general relativity consistently as classical field theories. In the Weyl-geometric framework, it seems that quantum theory and gravity can be merged consistently once quantum theory is geometrized. The extended differential geometry can modify the quantum mechanical results into a more general nonlinear framework. The author shows how the extended differential geometry modifies the known quantum equations as well as Maxwell's electromagnetic equations.

翻訳日:2023-03-03 09:13:03 公開日:2021-12-24

# 量子サイド情報を用いた推測作業

Guesswork with Quantum Side Information ( http://arxiv.org/abs/2001.03598v3 )

ライセンス: Link先を確認

Eric P. Hanson, Vishal Katariya, Nilanjana Datta, Mark M. Wilde

(参考訳) 確率変数の実現を正しく推測するには、平均で最小の推測数が必要なのか? この疑問に対する答えは、1994年にマッシーによる推測という量の概念を導入し、これはエントロピーに対する代替のセキュリティ基準と見なすことができる。本稿では,量子側情報の存在下での推測について考察し,一般的な逐次推定戦略が単一測定を行い,その結果から推測戦略を選択することと等価であることを示す。この結果を用いて、量子側情報の存在下での推測上のエントロピー的なワンショットと漸近境界を推定し、半定値プログラム(SDP)を定式化し、その量を計算する。 bb84状態を含む単純な例について,数値的および解析的に推算し,その推算をセキュリティ基準として用いた場合,若干不完全な鍵状態の安全性を検証した連続性を証明する。

What is the minimum number of guesses needed on average to correctly guess a realization of a random variable? The answer to this question led to the introduction of the notion of a quantity called guesswork by Massey in 1994, which can be viewed as an alternate security criterion to entropy. In this paper, we consider the guesswork in the presence of quantum side information, and show that a general sequential guessing strategy is equivalent to performing a single measurement and choosing a guessing strategy from the outcome. We use this result to deduce entropic one-shot and asymptotic bounds on the guesswork in the presence of quantum side information, and to formulate a semi-definite program (SDP) to calculate the quantity. We evaluate the guesswork for a simple example involving the BB84 states, both numerically and analytically, and prove a continuity result that certifies the security of slightly imperfect key states when the guesswork is used as the security criterion.

翻訳日:2023-01-12 23:20:59 公開日:2021-12-24
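
As a rough illustration of the quantity discussed in the guesswork entry above, the sketch below computes the classical guesswork with no side information: the guesser tries outcomes in order of decreasing probability, so the expected number of guesses is the sum of rank times probability. The function name `guesswork` and the example distributions are illustrative assumptions, not taken from the paper, and the quantum side-information case (which the abstract treats through measurements and a semi-definite program) is not covered here.

```python
def guesswork(probs):
    """Expected number of guesses when guessing in order of decreasing probability.

    `probs` is a list of outcome probabilities that should sum to 1.
    This is the classical quantity only; no side information is modelled.
    """
    # The optimal strategy guesses the most likely outcome first.
    ordered = sorted(probs, reverse=True)
    # The outcome at rank k (1-based) costs k guesses and occurs with probability p.
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    print(guesswork([0.25, 0.5, 0.25]))        # 1*0.5 + 2*0.25 + 3*0.25 = 1.75
    print(guesswork([0.25, 0.25, 0.25, 0.25]))  # (1 + 2 + 3 + 4) / 4 = 2.5
```

A uniform distribution gives the largest guesswork for a fixed alphabet size, which is one reason the quantity can serve as a security criterion alongside entropy.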

# 木、森、そして不純物に基づく変数の重要性

Trees, forests, and impurity-based variable importance ( http://arxiv.org/abs/2001.04295v3 )

ライセンス: Link先を確認

Erwan Scornet (CMAP)

(参考訳) ランダムフォレスト(breiman, 2001])のようなツリーアンサンブル手法は、高い次元の表データを扱うのに非常に人気がある。しかし、機械学習が意思決定問題に使用される場合、アルゴリズム予測プロセスの深い理解を必要とするため、最良の予測手順の解決は合理的ではないかもしれない。不幸なことに、ランダムな森林は数百の決定木を平均して予測した結果、本質的に解釈できない。このいわゆるブラックボックスアルゴリズムの知識を得る古典的なアプローチは、各入力変数の予測的影響を評価するために使用される変数の重要性を計算することである。可変重要度は変数のランク付けや選択に使用され、データ分析において大きな役割を果たす。それにもかかわらず、そのような方法でランダムな森林変数の重要さを使うのは正当化されていない。本稿では,2つのよく知られたランダムな森林変動の重要性である平均減少不純物(MDI)を分析する。入力変数が独立で相互作用がない場合、MDIは各変数の寄与が明確に識別される出力の分散分解を提供する。また,入力変数や相互作用の依存性を示すモデルについても検討した。分析の結果,単木に比べて森林の利用にメリットがある可能性が示唆された。

Tree ensemble methods such as random forests [Breiman, 2001] are very popular for handling high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making problems, settling for the best predictive procedures may not be reasonable, since enlightened decisions require an in-depth comprehension of the algorithm's prediction process. Unfortunately, random forests are not intrinsically interpretable, since their prediction results from averaging several hundred decision trees. A classic approach to gaining knowledge about this so-called black-box algorithm is to compute variable importances, which are employed to assess the predictive impact of each input variable. Variable importances are then used to rank or select variables and thus play a great role in data analysis. Nevertheless, there is no justification for using random forest variable importances in such a way: we do not even know what these quantities estimate. In this paper, we analyze one of the two well-known random forest variable importances, the Mean Decrease Impurity (MDI). We prove that if input variables are independent and in the absence of interactions, MDI provides a variance decomposition of the output, where the contribution of each variable is clearly identified. We also study models exhibiting dependence between input variables or interaction, for which the variable importance is intrinsically ill-defined. Our analysis shows that there may be some benefits to using a forest compared to a single tree.

翻訳日:2023-01-11 23:52:24 公開日:2021-12-24
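
To make the Mean Decrease Impurity in the entry above more concrete, here is a minimal sketch of its basic building block, assuming a regression setting with variance as the impurity: the impurity decrease produced by a single split. In a full tree, the MDI of a variable accumulates such decreases over all nodes that split on that variable, each weighted by the fraction of samples reaching the node, and a forest averages the result over its trees. The helper names and the toy data are hypothetical and are not taken from the paper.

```python
def variance(ys):
    """Population variance of a non-empty list of responses (the node impurity)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def impurity_decrease(xs, ys, threshold):
    """Variance reduction obtained by splitting on `x <= threshold`.

    Returns 0.0 when the split leaves one side empty.
    """
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(ys)
    # Parent impurity minus the size-weighted impurities of the two children.
    return (variance(ys)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))


if __name__ == "__main__":
    ys = [0.0, 0.0, 1.0, 1.0]
    informative = [0, 0, 1, 1]    # perfectly separates the two response values
    uninformative = [0, 1, 0, 1]  # carries no information about the response
    print(impurity_decrease(informative, ys, 0.5))
    print(impurity_decrease(uninformative, ys, 0.5))
```

On this toy data the informative variable removes all of the output variance at the root, while the uninformative one removes none, which is the kind of per-variable attribution the abstract's variance decomposition formalizes for independent inputs without interactions.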

# 短・雑音テキストストリームのためのグラフ畳み込みトピックモデル

A Graph Convolutional Topic Model for Short and Noisy Text Streams ( http://arxiv.org/abs/2003.06112v4 )

ライセンス: Link先を確認

Ngo Van Linh, Tran Xuan Bach and Khoat Than

(参考訳) データストリームから隠れたトピックを学ぶことは、必然的に必要だが、コンセプトドリフトや、短くて騒がしいデータといった難しい問題を引き起こした。トピックモデルを強化するために事前知識を使用することは、これらの課題に対処する潜在的な解決策の1つです。人的知識(Wordnetなど)や事前訓練されたモデル(Word2vecなど)から派生した事前知識は、トピックモデルがよりうまく機能するのに非常に有用である。しかし、データが継続的に無限に届くストリーミング環境では、既存の研究はこれらのリソースを効果的に活用することに限定されている。特に意味のある単語関係を含む知識グラフは無視される。本稿では,知識グラフを効果的に活用することを目的として,グラフ畳み込みネットワーク(gcn)をトピックモデルに統合する新しいグラフ畳み込みトピックモデル(gctm)と,データストリームに対してネットワークとトピックモデルを同時に学習する学習方法を提案する。各ミニバッチでは,外部ナレッジグラフを活用できるだけでなく,外部ナレッジグラフと古いナレッジのバランスをとることができ,新しいデータでうまく機能する。我々は,人間の知識グラフ(Wordnet)と事前学習した単語埋め込み(Word2vec)から構築したグラフ(Word2vec)を用いて,提案手法の評価を行う。提案手法は,確率的予測尺度とトピックコヒーレンスの観点から,最先端のベースラインよりもはるかに優れた性能が得られることを示す。特に,本手法は,短いテキストやコンセプトドリフトを扱う場合にも有効である。 GCTMの実装は \url{https://github.com/bachtranxuan/GCTM.git} で利用可能である。

Learning hidden topics from data streams has become absolutely necessary but poses challenging problems such as concept drift as well as short and noisy data. Using prior knowledge to enrich a topic model is one of the potential solutions for coping with these challenges. Prior knowledge derived from human knowledge (e.g. Wordnet) or a pre-trained model (e.g. Word2vec) is very valuable in helping topic models work better. However, in a streaming environment where data arrives continually and infinitely, existing studies are limited in their ability to exploit these resources effectively. In particular, knowledge graphs, which contain meaningful word relations, are ignored. In this paper, aiming to exploit a knowledge graph effectively, we propose a novel graph convolutional topic model (GCTM), which integrates graph convolutional networks (GCN) into a topic model, and a learning method that learns the networks and the topic model simultaneously for data streams. In each minibatch, our method not only can exploit an external knowledge graph but also can balance the external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both a human knowledge graph (Wordnet) and a graph built from pre-trained word embeddings (Word2vec). The experimental results show that our method achieves significantly better performance than state-of-the-art baselines in terms of probabilistic predictive measure and topic coherence. In particular, our method works well when dealing with short texts as well as concept drift. The implementation of GCTM is available at https://github.com/bachtranxuan/GCTM.git.

翻訳日:2022-12-24 01:04:52 公開日:2021-12-24

# ソース知識とターゲット関連性を同時に伝達する非教師なしドメイン適応

Unsupervised Domain Adaptation Through Transferring both the Source-Knowledge and Target-Relatedness Simultaneously ( http://arxiv.org/abs/2003.08051v3 )

ライセンス: Link先を確認

Qing Tian, Yanan Zhu, Chuang Ma, Meng Cao

(参考訳) 教師なしドメイン適応(Unsupervised domain adapt, UDA)は、機械学習とパターン認識の分野における新たな研究トピックであり、ソースドメインから知識を伝達することで、ラベルなしのターゲットドメインの学習を支援することを目的としている。

Unsupervised domain adaptation (UDA) is an emerging research topic in the field of machine learning and pattern recognition, which aims to help the learning of unlabeled target domain by transferring knowledge from the source domain.

翻訳日:2022-12-22 09:31:08 公開日:2021-12-24

# ランダムドット製品グラフの連続推論のための1次元部分多様体の学習

Learning 1-Dimensional Submanifolds for Subsequent Inference on Random Dot Product Graphs ( http://arxiv.org/abs/2004.07348v6 )

ライセンス: Link先を確認

Michael W. Trosset, Mingyue Gao, Minh Tang, Carey E. Priebe

(参考訳) ランダムドット積グラフ(RDPG)は、潜伏ユークリッド空間における頂点が位置に対応するネットワークの生成モデルであり、潜伏位置の点積によってエッジ確率が決定される。潜在空間の未知の1$次元部分多様体から潜在位置をランダムにサンプリングするRDPGを考察する。原則として、制限された推論、すなわち、部分多様体の構造を利用する手順は、制限されていない推論よりも効果的であるべきであるが、部分多様体が未知のときに制限された推論を実行する方法が明確でない。多様体学習の手法は、制限された推論の利点を実現するのに十分十分に未知の部分多様体を学習するために使うことができる。説明するために、我々は、小さな頂点のコミュニティのFr\'{e}chet手段に関する1ドルおよび2ドルサンプル仮説を、潜在構造を推論するために、完全な頂点集合を用いてテストした。本研究では,推定潜在位置から構築された近傍グラフ上の最短経路距離を用いて,未知の1$次元部分多様体上の弧長を推定する等マップ法を多様体学習に適用するテスト統計を提案する。従来のイソマプの応用とは異なり、推定された潜在位置は興味のサブ多様体には属さない。我々は、isomap の既存の収束結果をこの設定に拡張し、それらを用いて、補助頂点の数が増えるにつれて、部分多様体が知られているとき、テストのパワーは対応するテストのパワーに収束することを示す。最後に,本手法をショウジョウバエ幼生のキノコのコネクトームを研究する際に生じる推論問題に適用する。不定値学習多様体検定(英語版)(univariate learnt manifold test)は(p<0.05$)を拒絶するが、多変量環境空間検定(英語版)(multivariate ambient space test)は(p\gg0.05$)ではない。

A random dot product graph (RDPG) is a generative model for networks in which vertices correspond to positions in a latent Euclidean space and edge probabilities are determined by the dot products of the latent positions. We consider RDPGs for which the latent positions are randomly sampled from an unknown $1$-dimensional submanifold of the latent space. In principle, restricted inference, i.e., procedures that exploit the structure of the submanifold, should be more effective than unrestricted inference; however, it is not clear how to conduct restricted inference when the submanifold is unknown. We submit that techniques for manifold learning can be used to learn the unknown submanifold well enough to realize the benefit of restricted inference. To illustrate, we test $1$- and $2$-sample hypotheses about the Fréchet means of small communities of vertices, using the complete set of vertices to infer latent structure. We propose test statistics that deploy the Isomap procedure for manifold learning, using shortest path distances on neighborhood graphs constructed from estimated latent positions to estimate arc lengths on the unknown $1$-dimensional submanifold. Unlike conventional applications of Isomap, the estimated latent positions do not lie on the submanifold of interest. We extend existing convergence results for Isomap to this setting and use them to demonstrate that, as the number of auxiliary vertices increases, the power of our test converges to the power of the corresponding test when the submanifold is known. Finally, we apply our methods to an inference problem that arises in studying the connectome of the Drosophila larval mushroom body. The univariate learnt manifold test rejects ($p<0.05$), while the multivariate ambient space test does not ($p\gg0.05$), illustrating the value of identifying and exploiting low-dimensional structure for subsequent inference.

翻訳日:2022-12-13 02:54:21 公開日:2021-12-24
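
The generative model described in the random dot product graph entry above can be sketched as follows. This is only an illustration under stated assumptions: latent positions are drawn on the Hardy-Weinberg curve $(t^2, 2t(1-t), (1-t)^2)$, chosen here as a convenient 1-dimensional submanifold because its points are probability vectors, so every pairwise dot product lies in [0, 1]. The function names are hypothetical, and none of the paper's inference procedure (Isomap, hypothesis tests) is implemented.

```python
import random


def hardy_weinberg(t):
    """Point on the Hardy-Weinberg curve for t in [0, 1] (a 1-D curve in 3-D space)."""
    return (t * t, 2 * t * (1 - t), (1 - t) * (1 - t))


def sample_rdpg(n, seed=0):
    """Sample latent positions on the curve and an undirected RDPG adjacency matrix.

    Each edge {i, j} is present independently with probability <x_i, x_j>.
    """
    rng = random.Random(seed)
    latent = [hardy_weinberg(rng.random()) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability is the dot product of the two latent positions.
            p = sum(a * b for a, b in zip(latent[i], latent[j]))
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # symmetric, no self-loops
    return latent, adj


if __name__ == "__main__":
    latent, adj = sample_rdpg(10, seed=1)
    for row in adj:
        print(row)
```

In practice the latent positions are not observed; they would have to be estimated from the adjacency matrix before any manifold learning step, which is the setting the abstract addresses.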

# 定常時間系列からの推論のための学習因子グラフ

Learned Factor Graphs for Inference from Stationary Time Sequences ( http://arxiv.org/abs/2006.03258v4 )

ライセンス: Link先を確認

Nir Shlezinger, Nariman Farsad, Yonina C. Eldar, and Andrea J. Goldsmith

(参考訳) 時系列からの推論法の設計は、伝統的に、潜在希望シーケンスと観測されたシーケンスの関係を記述する統計モデルに依存してきた。モデルに基づくアルゴリズムの幅広い系統が導出され、基礎となる分布を表す因子グラフ上の再帰的計算を用いて制御可能な複雑性で推論を行う。別のモデルに依存しないアプローチでは、機械学習(ML)手法を用いる。本稿では,モデルベースアルゴリズムとデータ駆動型MLツールを組み合わせた定常時間列のフレームワークを提案する。提案手法では、完全な推論タスクではなく、時間系列の分布を記述する因子グラフの特定の成分を別々に学習するためにニューラルネットワークが開発された。この分布の定常特性を利用することで、結果のアプローチを時間的変化の列に適用することができる。学習された因子グラフは、小さなトレーニングセットを使用してトレーニング可能なコンパクトニューラルネットワークを使用して実現することができる。本稿では,ラベル付きデータから和生成スキームを実装することを学習し,異なる長さのシーケンスに適用可能な,学習した定常因子グラフに基づく推論アルゴリズムを提案する。提案する学習因子グラフは,sleep-edfデータセットを用いた睡眠ステージ検出や未知チャネルを用いたデジタル通信におけるシンボル検出のために,小さなトレーニングセットから正確な推論を行うことができることを示す。

The design of methods for inference from time sequences has traditionally relied on statistical models that describe the relation between a latent desired sequence and the observed one. A broad family of model-based algorithms has been derived to carry out inference at controllable complexity using recursive computations over the factor graph representing the underlying distribution. An alternative model-agnostic approach utilizes machine learning (ML) methods. Here we propose a framework that combines model-based algorithms and data-driven ML tools for stationary time sequences. In the proposed approach, neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence, rather than the complete inference task. By exploiting stationary properties of this distribution, the resulting approach can be applied to sequences of varying temporal duration. Learned factor graphs can be realized using compact neural networks that are trainable using small training sets, or alternatively, be used to improve upon existing deep inference systems. We present an inference algorithm based on learned stationary factor graphs, which learns to implement the sum-product scheme from labeled data, and can be applied to sequences of different lengths. Our experimental results demonstrate the ability of the proposed learned factor graphs to learn to carry out accurate inference from small training sets for sleep stage detection using the Sleep-EDF dataset, as well as for symbol detection in digital communications with unknown channels.

翻訳日:2022-11-25 03:26:18 公開日:2021-12-24

# 可逆ニューラルネットワークにおける爆発的逆の理解と緩和

Understanding and Mitigating Exploding Inverses in Invertible Neural Networks ( http://arxiv.org/abs/2006.09347v2 )

ライセンス: Link先を確認

Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, Jörn-Henrik Jacobsen

Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems. In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of the change-of-variables formula on in- and out-of-distribution (OOD) data, incorrect gradients for memory-saving backprop, and the inability to sample from normalizing flow models. We further derive bi-Lipschitz properties of atomic building blocks of common architectures. These insights into the stability of INNs then provide ways forward to remedy these failures. For tasks where local invertibility is sufficient, like memory-saving backprop, we propose a flexible and efficient regularizer. For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.

Translated: 2022-11-20 19:37:04 Published: 2021-12-24
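The "exploding inverse" failure described in the abstract above can be illustrated with a single affine coupling step. This is a minimal sketch, not the authors' code: the constant `log_scale` and `shift` stand in for the outputs of a learned network (an assumption made for brevity), and the function names are hypothetical. When the forward map contracts strongly, the inverse has a very large Lipschitz constant and the input can no longer be recovered in floating point, even though the map is invertible on paper.

```python
import math

def coupling_forward(x1, x2, log_scale, shift):
    """Affine coupling: x1 passes through, x2 is scaled and shifted."""
    return x1, x2 * math.exp(log_scale) + shift

def coupling_inverse(y1, y2, log_scale, shift):
    """Analytical inverse of coupling_forward."""
    return y1, (y2 - shift) * math.exp(-log_scale)

def reconstruction_error(x2, log_scale, shift=1.0):
    """Absolute error of inverse(forward(x)) on the transformed half."""
    _, y2 = coupling_forward(0.0, x2, log_scale, shift)
    _, x2_rec = coupling_inverse(0.0, y2, log_scale, shift)
    return abs(x2 - x2_rec)

# Mild contraction: the round trip is accurate to rounding error.
err_mild = reconstruction_error(1.0, log_scale=-1.0)
# Extreme contraction: x2 * exp(-40) is absorbed by the shift in float64,
# so the inverse (which multiplies by exp(40)) cannot recover x2.
err_extreme = reconstruction_error(1.0, log_scale=-40.0)
print(err_mild, err_extreme)
```

Bounding the scale term (or otherwise controlling the bi-Lipschitz constants of each block) is one way to avoid this regime, in line with the abstract's emphasis on stable building blocks.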

# Architectural Implications of Graph Neural Networks

http://arxiv.org/abs/2009.00804v2

License: see linked page

Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo

Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures. They are becoming more and more popular due to the high accuracy they achieve in many graph-related tasks. However, GNNs are not as well understood in the system and architecture community as their counterparts such as multi-layer perceptrons and convolutional neural networks. This work tries to introduce GNNs to our community. In contrast to prior work that only presents characterizations of GCNs, our work covers a large portion of the varieties of GNN workloads based on a general GNN description framework. By constructing the models on top of two widely-used libraries, we characterize the GNN computation at the inference stage concerning general-purpose and application-specific architectures and hope our work can foster more system and architecture research for GNNs.

Translated: 2022-10-22 20:04:00 Published: 2021-12-24

# Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models

http://arxiv.org/abs/2010.08566v4

License: see linked page

Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena Hwang, Yejin Choi

Publicly available, large pretrained language models (LMs) generate text with remarkable quality, but only sequentially from left to right. As a result, they are not immediately applicable to generation tasks that break the unidirectional assumption, such as paraphrasing or text-infilling, necessitating task-specific supervision. In this paper, we present Reflective Decoding, a novel unsupervised algorithm that allows for direct application of unidirectional LMs to non-sequential tasks. Our 2-step approach requires no supervision or even parallel corpora, only two off-the-shelf pretrained LMs in opposite directions: forward and backward. First, in the contextualization step, we use LMs to generate ensembles of past and future contexts which collectively capture the input (e.g. the source sentence for paraphrasing). Second, in the reflection step, we condition on these "context ensembles", generating outputs that are compatible with them. Comprehensive empirical results demonstrate that Reflective Decoding outperforms strong unsupervised baselines on both paraphrasing and abductive text infilling, significantly narrowing the gap between unsupervised and supervised methods. Reflective Decoding surpasses multiple supervised baselines on various metrics including human evaluation.

Translated: 2022-10-06 21:05:28 Published: 2021-12-24

# Dynamic Channel Access via Meta-Reinforcement Learning

http://arxiv.org/abs/2201.09075v1

License: see linked page

Ziyang Lu and M. Cenk Gursoy

In this paper, we address the channel access problem in a dynamic wireless environment via meta-reinforcement learning. Spectrum is a scarce resource in wireless communications, especially with the dramatic increase in the number of devices in networks. Recently, inspired by the success of deep reinforcement learning (DRL), extensive studies have been conducted in addressing wireless resource allocation problems via DRL. However, training DRL algorithms usually requires a massive amount of data collected from the environment for each specific task, and the well-trained model may fail if there is a small variation in the environment. In this work, in order to address these challenges, we propose a meta-DRL framework that incorporates the method of Model-Agnostic Meta-Learning (MAML). In the proposed framework, we train a common initialization for similar channel selection tasks. From the initialization, we show that only a few gradient descent steps are required for adapting to different tasks drawn from the same distribution. We demonstrate the performance improvements via simulation results.

Translated: 2022-01-30 11:52:26 Published: 2021-12-24
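The idea of training a common initialization that adapts in a few gradient steps can be sketched on a toy problem. The example below is an assumption-laden stand-in, not the paper's channel-selection environment: each "task" is a one-dimensional quadratic loss `(theta - center)**2`, and the function names and hyperparameters are hypothetical. It shows the MAML-style structure of an inner adaptation step followed by an outer update that differentiates through that step.

```python
def inner_adapt(theta, center, lr=0.1, steps=1):
    """A few gradient steps on the task loss (theta - center)**2."""
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - center)
    return theta

def meta_train(task_centers, theta=0.0, inner_lr=0.1, meta_lr=0.1, iterations=500):
    """Learn an initialization whose one-step adaptation works well on all tasks."""
    for _ in range(iterations):
        meta_grad = 0.0
        for center in task_centers:
            adapted = inner_adapt(theta, center, inner_lr)
            # Chain rule through the inner step:
            # d(adapted)/d(theta) = 1 - 2 * inner_lr for this quadratic loss.
            meta_grad += 2.0 * (adapted - center) * (1.0 - 2.0 * inner_lr)
        theta -= meta_lr * meta_grad / len(task_centers)
    return theta

centers = [1.0, 2.0, 3.0]          # toy task distribution (assumed)
theta0 = meta_train(centers)        # learned common initialization

# Adapting to one task from the learned initialization vs. from scratch.
from_meta = inner_adapt(theta0, 3.0, steps=3)
from_zero = inner_adapt(0.0, 3.0, steps=3)
print(theta0, abs(from_meta - 3.0), abs(from_zero - 3.0))
```

For this symmetric toy distribution the learned initialization settles at the mean of the task centers, so the same number of adaptation steps lands closer to any individual task than starting from zero.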

# An Efficient Combinatorial Optimization Model Using Learning-to-Rank Distillation

http://arxiv.org/abs/2201.00695v1

License: see linked page

Honguk Woo, Hyunsung Lee, Sangwoo Cho

Recently, deep reinforcement learning (RL) has proven its feasibility in solving combinatorial optimization problems (COPs). Learning-to-rank techniques have been studied in the field of information retrieval. While several COPs can be formulated as the prioritization of input items, as is common in information retrieval, it has not been fully explored how learning-to-rank techniques can be incorporated into deep RL for COPs. In this paper, we present the learning-to-rank distillation-based COP framework, where a high-performance ranking policy obtained by RL for a COP can be distilled into a non-iterative, simple model, thereby achieving a low-latency COP solver. Specifically, we employ approximated ranking distillation to render a score-based ranking model learnable via gradient descent. Furthermore, we use efficient sequence sampling to improve the inference performance with a limited delay. With the framework, we demonstrate that a distilled model not only achieves comparable performance to its respective, high-performance RL, but also provides several times faster inference. We evaluate the framework with several COPs such as priority-based task scheduling and multidimensional knapsack, demonstrating the benefits of the framework in terms of inference latency and performance.

Translated: 2022-01-09 13:30:51 Published: 2021-12-24

# State Selection Algorithms and Their Impact on The Performance of Stateful Network Protocol Fuzzing

http://arxiv.org/abs/2112.15498v1

License: see linked page

Dongge Liu, Van-Thuan Pham, Gidon Ernst, Toby Murray, and Benjamin I.P. Rubinstein

The statefulness property of network protocol implementations poses a unique challenge for testing and verification techniques, including fuzzing. Stateful fuzzers tackle this challenge by leveraging state models to partition the state space and assist the test generation process. Since not all states are equally important and fuzzing campaigns have time limits, fuzzers need effective state selection algorithms to prioritize progressive states over others. Several state selection algorithms have been proposed, but they were implemented and evaluated separately on different platforms, making it hard to achieve conclusive findings. In this work, we evaluate an extensive set of state selection algorithms on the same fuzzing platform, AFLNet, a state-of-the-art fuzzer for network servers. The algorithm set includes existing ones supported by AFLNet and our novel and principled algorithm called AFLNetLegion. The experimental results on the ProFuzzBench benchmark show that (i) the existing state selection algorithms of AFLNet achieve very similar code coverage, (ii) AFLNetLegion clearly outperforms these algorithms in selected case studies, but (iii) the overall improvement appears insignificant. These are unexpected yet interesting findings. We identify problems and share insights that could open opportunities for future research on this topic.

Translated: 2022-01-09 13:29:14 Published: 2021-12-24

# Intersection focused Situation Coverage-based Verification and Validation Framework for Autonomous Vehicles Implemented in CARLA

http://arxiv.org/abs/2112.14706v1

License: CC BY 4.0

Zaid Tahir, Rob Alexander

Autonomous vehicles (AVs), i.e., self-driving cars, operate in a safety-critical domain, since errors in the autonomous driving software can lead to huge losses. Statistically, road intersections, which are a part of the AV's operational design domain (ODD), have some of the highest accident rates. Hence, testing AVs to the limits on road intersections and assuring their safety on road intersections is pertinent, and thus the focus of this paper. We present a situation coverage-based (SitCov) AV-testing framework for the verification and validation (V&V) and safety assurance of AVs, developed in an open-source AV simulator named CARLA. The SitCov AV-testing framework focuses on vehicle-to-vehicle interaction on a road intersection under different environmental and intersection configuration situations, using situation coverage criteria for automatic test suite generation for safety assurance of AVs. We have developed an ontology for intersection situations, and used it to generate a situation hyperspace, i.e., the space of all possible situations arising from that ontology. For the evaluation of our SitCov AV-testing framework, we have seeded multiple faults in our ego AV, and compared situation coverage-based and random situation generation. We have found that both generation methodologies trigger around the same number of seeded faults, but the situation coverage-based generation tells us a lot more about the weaknesses of the autonomous driving algorithm of our ego AV, especially in edge cases. Our code is publicly available online; anyone can use our SitCov AV-testing framework or build further on top of it. This paper aims to contribute to the domain of V&V and development of AVs, not only from a theoretical point of view, but also from the viewpoint of an open-source software contribution and the release of a flexible and effective tool for V&V and development of AVs.

Translated: 2022-01-02 09:01:35 Published: 2021-12-24
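The notions of a situation hyperspace and of coverage-based versus random generation in the abstract above can be sketched as follows. This is an illustrative sketch only: the ontology attributes, the "least-tested first" selection rule, and all function names are assumptions, not the SitCov implementation or CARLA API.

```python
import itertools
import random

# Hypothetical, much-reduced ontology of intersection situations.
ONTOLOGY = {
    "weather": ["clear", "rain", "fog"],
    "time_of_day": ["day", "night"],
    "other_vehicle_approach": ["left", "right", "opposite"],
}

def situation_hyperspace(ontology):
    """All combinations of attribute values, i.e. every possible situation."""
    keys = sorted(ontology)
    return [dict(zip(keys, values))
            for values in itertools.product(*(ontology[k] for k in keys))]

def coverage_based_generation(space, n_tests):
    """Always pick a situation that has been tested least often so far."""
    counts = [0] * len(space)
    chosen = []
    for _ in range(n_tests):
        idx = counts.index(min(counts))
        counts[idx] += 1
        chosen.append(space[idx])
    return chosen

def random_generation(space, n_tests, seed=0):
    """Baseline: sample situations uniformly at random (with replacement)."""
    rng = random.Random(seed)
    return [rng.choice(space) for _ in range(n_tests)]

def coverage(space, tests):
    """Fraction of the hyperspace exercised by a test suite."""
    seen = {tuple(sorted(t.items())) for t in tests}
    return len(seen) / len(space)

space = situation_hyperspace(ONTOLOGY)
cov_guided = coverage(space, coverage_based_generation(space, len(space)))
cov_random = coverage(space, random_generation(space, len(space)))
print(len(space), cov_guided, cov_random)
```

With a budget equal to the size of the hyperspace, the coverage-guided suite visits every situation once, whereas uniform sampling with replacement generally repeats some situations and misses others.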

# Profile Guided Optimization without Profiles: A Machine Learning Approach

http://arxiv.org/abs/2112.14679v1

License: CC BY 4.0

Nadav Rotem, Chris Cummins

Profile guided optimization is an effective technique for improving the optimization ability of compilers based on dynamic behavior, but collecting profile data is expensive, cumbersome, and requires regular updating to remain fresh. We present a novel statistical approach to inferring branch probabilities that improves the performance of programs that are compiled without profile guided optimizations. We perform offline training using information that is collected from a large corpus of binaries that have branch probability information. The learned model is used by the compiler to predict the branch probabilities of regular uninstrumented programs, which the compiler can then use to inform optimization decisions. We integrate our technique directly in LLVM, supplementing the existing human-engineered compiler heuristics. We evaluate our technique on a suite of benchmarks, demonstrating some gains over compiling without profile information. In deployment, our technique requires no profiling runs and has negligible effect on compilation time.

Translated: 2022-01-02 08:40:00 Published: 2021-12-24

# Lane Change Decision-Making through Deep Reinforcement Learning

http://arxiv.org/abs/2112.14705v1

License: CC BY-SA 4.0

Mukesh Ghimire, Malobika Roy Choudhury, Guna Sekhar Sai Harsha Lagudu

Due to the complexity and volatility of the traffic environment, decision-making in autonomous driving is a significantly hard problem. In this project, we use a Deep Q-Network, along with rule-based constraints, to make lane-changing decisions. A safe and efficient lane change behavior may be obtained by combining high-level lateral decision-making with low-level rule-based trajectory monitoring. The agent is anticipated to perform appropriate lane-change maneuvers in a real-world-like Udacity simulator after training it for a total of 100 episodes. The results show that the rule-based DQN performs better than the DQN method. The rule-based DQN achieves a safety rate of 0.8 and an average speed of 47 MPH.

Translated: 2022-01-02 08:26:57 Published: 2021-12-24
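One simple way to combine learned Q-values with rule-based constraints is to mask out actions that violate a safety rule before taking the greedy action. The sketch below assumes that form of combination; the abstract does not specify how the rules are applied, and the action names, the gap-based rule, and the threshold are hypothetical.

```python
ACTIONS = ["keep_lane", "change_left", "change_right"]

def is_safe(action, gaps, min_gap=10.0):
    """Rule-based check: a lane change needs enough free space in the target lane."""
    if action == "change_left":
        return gaps["left"] >= min_gap
    if action == "change_right":
        return gaps["right"] >= min_gap
    return True  # keeping the current lane is always allowed in this sketch

def select_action(q_values, gaps):
    """Greedy action over the Q-values, restricted to rule-approved actions."""
    safe_actions = [a for a in ACTIONS if is_safe(a, gaps)]
    return max(safe_actions, key=lambda a: q_values[a])

# Q-values as a trained network might output them (made-up numbers).
q = {"keep_lane": 0.2, "change_left": 0.9, "change_right": 0.5}

blocked_left = select_action(q, {"left": 4.0, "right": 25.0})
free_left = select_action(q, {"left": 30.0, "right": 25.0})
print(blocked_left, free_left)
```

When the left lane is blocked, the highest-valued action is vetoed and the agent falls back to the best remaining safe action; otherwise the plain greedy DQN choice is used.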

# Raw Produce Quality Detection with Shifted Window Self-Attention

http://arxiv.org/abs/2112.13845v1

License: see linked page

Oh Joon Kwon, Byungsoo Kim, Youngduck Choi

Global food insecurity is expected to worsen in the coming decades with the accelerated rate of climate change and the rapidly increasing population. In this vein, it is important to remove inefficiencies at every level of food production. The recent advances in deep learning can help reduce such inefficiencies, yet their application has not yet become mainstream throughout the industry, inducing economic costs at a massive scale. To this point, modern techniques such as CNNs (Convolutional Neural Networks) have been applied to RPQD (Raw Produce Quality Detection) tasks. On the other hand, the Transformer's successful debut in vision, among other modalities, led us to expect better performance from Transformer-based models in RPQD. In this work, we exclusively investigate the recent state-of-the-art Swin (Shifted Windows) Transformer, which computes self-attention in both intra- and inter-window fashion. We compare Swin Transformer against CNN models on four RPQD image datasets, each containing different kinds of raw produce: fruits and vegetables, fish, pork, and beef. We observe that Swin Transformer not only achieves better or competitive performance but also is data- and compute-efficient, making it ideal for actual deployment in real-world settings. To the best of our knowledge, this is the first large-scale empirical study on the RPQD task, which we hope will gain more attention in future works.

Translated: 2022-01-02 08:18:35 Published: 2021-12-24
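The intra- and inter-window behaviour mentioned in the abstract comes from alternating a regular window partition with a cyclically shifted one. The sketch below shows only that partitioning step on a small grid of token indices, in plain Python; it omits the attention computation and masking, and the helper names are illustrative rather than taken from any Swin implementation.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]

def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks (flattened)."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[r + dr][c + dc] for dr in range(window) for dc in range(window)]
        for r in range(0, h, window)
        for c in range(0, w, window)
    ]

# A 4x4 feature map whose entries are just token indices 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

regular = window_partition(grid, 2)                   # attention stays inside each block
shifted = window_partition(cyclic_shift(grid, 1), 2)  # blocks now straddle old borders
print(regular[0], shifted[0])
```

The first shifted window groups tokens 5, 6, 9 and 10, which belong to four different regular windows, so attending within shifted windows lets information flow between the regular ones.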

# BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch

http://arxiv.org/abs/2112.13843v1

License: see linked page

Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram

Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these methods either sacrifice significant performance compared to the 32-bit floating-point (FP-32) baseline or rely on a compute-expensive, iterative training policy that requires the availability of a pre-trained baseline. To address this issue, this paper presents BMPQ, a training method that uses bit gradients to analyze layer sensitivities and yield mixed-precision quantized models. BMPQ requires a single training iteration but does not need a pre-trained baseline. It uses an integer linear program (ILP) to dynamically adjust the precision of layers during training, subject to a fixed hardware budget. To evaluate the efficacy of BMPQ, we conduct extensive experiments with VGG16 and ResNet18 on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Compared to the baseline FP-32 models, BMPQ can yield models that have 15.4x fewer parameter bits with a negligible drop in accuracy. Compared to the SOTA "during training" mixed-precision training scheme, our models are 2.1x, 2.2x, and 2.9x smaller on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively, with an improved accuracy of up to 14.54%.

Translated: 2022-01-02 08:18:10 Published: 2021-12-24
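The budget-constrained precision assignment can be pictured as a small integer program. The sketch below is a hedged illustration, not BMPQ's formulation: the objective (sum of sensitivity times bit-width), the candidate bit-widths, and the numbers are all assumptions, and exhaustive search replaces a real ILP solver because the toy instance is tiny.

```python
import itertools

def allocate_bits(sensitivities, param_counts, budget_bits, choices=(2, 4, 8)):
    """Pick one bit-width per layer to maximise sensitivity-weighted precision
    while keeping the total parameter bits within the budget."""
    best, best_score = None, float("-inf")
    for bits in itertools.product(choices, repeat=len(param_counts)):
        cost = sum(b * n for b, n in zip(bits, param_counts))
        if cost > budget_bits:
            continue  # violates the hardware budget
        score = sum(s * b for s, b in zip(sensitivities, bits))
        if score > best_score:
            best, best_score = bits, score
    return best

# Made-up per-layer sensitivities and parameter counts for three layers.
sensitivities = [0.9, 0.1, 0.5]
param_counts = [100, 1000, 200]

assignment = allocate_bits(sensitivities, param_counts, budget_bits=4000)
print(assignment)
```

In this toy instance the large, insensitive middle layer is pushed to the lowest precision, which frees budget for the small, sensitive first layer to keep 8 bits.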

# Lyapunov Exponents for Diversity in Differentiable Games

http://arxiv.org/abs/2112.14570v1

License: see linked page

Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster

Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method - denoted Generalized Ridge Rider (GRR) - for finding arbitrary bifurcation points. We give theoretical motivation for our method by leveraging machinery from the field of dynamical systems. We construct novel toy problems where we can visualize new phenomena while giving insight into high-dimensional problems of interest. Finally, we empirically evaluate our method by finding diverse solutions in the iterated prisoners' dilemma and relevant machine learning problems including generative adversarial networks.

Translated: 2022-01-02 08:17:28 Published: 2021-12-24
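Branching at a saddle along a Hessian eigenvector, as described for the single-loss case above, can be sketched on a two-dimensional toy loss. This is a simplification under stated assumptions: the loss `f(x, y) = x**2 - y**2 + 0.25 * y**4` is invented for illustration, and instead of continuously following the ridge the sketch takes one small step along the negative-curvature eigenvector in each direction and then runs plain gradient descent.

```python
import math

def grad(x, y):
    """Gradient of the toy loss f(x, y) = x**2 - y**2 + 0.25 * y**4."""
    return 2.0 * x, -2.0 * y + y ** 3

def hessian(x, y):
    """Hessian of the same toy loss."""
    return [[2.0, 0.0], [0.0, -2.0 + 3.0 * y ** 2]]

def negative_curvature_direction(h):
    """Smallest eigenvalue and unit eigenvector of a symmetric 2x2 matrix."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    lam = 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b ** 2)
    if b == 0.0:
        v = (1.0, 0.0) if a <= c else (0.0, 1.0)
    else:
        v = (lam - c, b)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)

def descend(x, y, lr=0.05, steps=2000):
    """Plain gradient descent from a starting point."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# The origin is a saddle: the gradient vanishes and one eigenvalue is negative.
lam, ridge = negative_curvature_direction(hessian(0.0, 0.0))

# Branch in both directions along the ridge, then descend from each branch.
solutions = [descend(s * 0.1 * ridge[0], s * 0.1 * ridge[1]) for s in (+1, -1)]
print(lam, solutions)
```

The two branches end in two different minima of the same loss, which is the kind of solution diversity the method is after; the generalized variant in the abstract targets multi-agent systems where no single loss exists.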

# Rediscovering Affordance: A Reinforcement Learning Perspective

http://arxiv.org/abs/2112.12886v1

License: CC BY 4.0

Yi-Chi Liao, Kashyap Todi, Aditya Acharya, Antti Keurulainen, Andrew Howes, Antti Oulasvirta

Affordance refers to the perception of possible actions allowed by an object. Despite its relevance to human-computer interaction, no existing theory explains the mechanisms that underpin affordance-formation; that is, how affordances are discovered and adapted via interaction. We propose an integrative theory of affordance-formation based on the theory of reinforcement learning in cognitive sciences. The key assumption is that users learn to associate promising motor actions to percepts via experience when reinforcement signals (success/failure) are present. They also learn to categorize actions (e.g., "rotating" a dial), giving them the ability to name and reason about affordance. Upon encountering novel widgets, their ability to generalize these actions determines their ability to perceive affordances. We implement this theory in a virtual robot model, which demonstrates human-like adaptation of affordance in interactive widget tasks. While its predictions align with trends in human data, humans are able to adapt affordances faster, suggesting the existence of additional mechanisms.

Translated: 2021-12-29 18:09:19 Published: 2021-12-24

# Optimal Variable Clustering for High-Dimensional Matrix Valued Data

http://arxiv.org/abs/2112.12909v1

License: CC BY 4.0

Inbeom Lee, Siyi Deng, Yang Ning

Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal in terms of the magnitude of some cluster separation metric. The practical implementation of our algorithm with the optimal weight is also discussed. Finally, we conduct simulation studies to evaluate the finite sample performance of our algorithm and apply the method to a genomic dataset.

Translated: 2021-12-29 17:45:00 Published: 2021-12-24

# Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation

http://arxiv.org/abs/2112.12917v1

License: CC BY 4.0

Zhiwei Liu, Xiangyu Zhu, Lu Yang, Xiang Yan, Ming Tang, Zhen Lei, Guibo Zhu, Xuetao Feng, Yan Wang, Jinqiao Wang

3D human pose and shape recovery from a monocular RGB image is a challenging task. Existing learning-based methods depend heavily on weak supervision signals, e.g. 2D and 3D joint locations, due to the lack of in-the-wild paired 3D supervision. However, considering the 2D-to-3D ambiguities existing in these weak supervision labels, the network easily gets stuck in local optima when trained with such labels. In this paper, we reduce the ambiguity by optimizing multiple initializations. Specifically, we propose a three-stage framework named Multi-Initialization Optimization Network (MION). In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample. Each coarse reconstruction can be regarded as an initialization that leads to one optimization branch. In the second stage, we design a mesh refinement transformer (MRT) to refine each coarse reconstruction result via a self-attention mechanism. Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction. Experiments demonstrate that our Multi-Initialization Optimization Network outperforms existing 3D mesh-based methods on multiple public benchmarks.

Translated: 2021-12-29 17:43:44 Published: 2021-12-24

# nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task

http://arxiv.org/abs/2112.12926v1

License: CC BY 4.0

Yuyu Luo, Jiawei Tang, Guoliang Li

NL2VIS - which translates natural language (NL) queries to corresponding visualizations (VIS) - has attracted more and more attention from both commercial visualization vendors and academic researchers. In the last few years, advanced deep learning-based models have achieved human-like abilities in many natural language processing (NLP) tasks, which suggests that deep learning-based techniques are a good choice for advancing the field of NL2VIS. However, a major obstacle is the lack of benchmarks with many (NL, VIS) pairs. We present nvBench, the first large-scale NL2VIS benchmark, containing 25,750 (NL, VIS) pairs from 750 tables over 105 domains, synthesized from (NL, SQL) benchmarks to support the cross-domain NL2VIS task. The quality of nvBench has been extensively validated by 23 experts and 300+ crowd workers. Deep learning-based models trained using nvBench demonstrate that nvBench can push the field of NL2VIS.

Translated: 2021-12-29 17:28:06 Published: 2021-12-24

# Constrained tensor factorization for computational phenotyping and mortality prediction in patients with cancer

http://arxiv.org/abs/2112.12933v1

License: CC BY 4.0

Francisco Y Cai, Chengsheng Mao, Yuan Luo

(参考訳) 背景: 米国で電子健康記録(EHR)の採用が増加していることで、計算可能なデータのトロブが生まれ、機械学習が有用な洞察を抽出するために応用されている。 EHRデータは、行列(テンソル)の3次元アナログとして表現され、計算表現型として解釈できる2次元因子に分解される。方法:2000年から2015年までノースウェスタン医科大学データウェアハウスにおける乳がん,前立腺がん,大腸癌,肺癌患者の計算表現型を導出し,コホート死亡率を予測するために,制約テンソル因子化を適用した。本実験では,因子化アルゴリズムにおける教師付き用語の使用,医療指標によるテンソル共起のフィルタリング,および因子化過程における社会決定因子(SDOH)の添加について検討した。得られた計算表現型を定性的に評価し,曲線(AUC)統計に基づく5年間の死亡予測能力を評価した。結果: 医学的指標によるフィルタリングにより, より簡潔で解釈可能な表現型が得られた。死亡予測性能(auc)は、実験条件やがんの種類によって異なっていた(例: 0.623 - 0.694, 前立腺: 0.603 - 0.750, 大腸: 0.523 - 0.641, 肺: 0.517 - 0.623)。一般に、教師付き項とSDOH共変量の導入により予測性能が向上した。結論: がん患者のスパースEHRデータに適用された制約テンソル因子化は, 5年間の死亡を予測できる計算表現型を発見することができる。因子化アルゴリズムにSDOH変数を組み込むことは、予測性能を向上させるための簡単で効果的な方法である。

Background: The increasing adoption of electronic health records (EHR) across the US has created troves of computable data, to which machine learning methods have been applied to extract useful insights. EHR data, represented as a three-dimensional analogue of a matrix (tensor), is decomposed into two-dimensional factors that can be interpreted as computational phenotypes. Methods: We apply constrained tensor factorization to derive computational phenotypes and predict mortality in cohorts of patients with breast, prostate, colorectal, or lung cancer in the Northwestern Medicine Enterprise Data Warehouse from 2000 to 2015. In our experiments, we examined using a supervised term in the factorization algorithm, filtering tensor co-occurrences by medical indication, and incorporating additional social determinants of health (SDOH) covariates in the factorization process. We evaluated the resulting computational phenotypes qualitatively and by assessing their ability to predict five-year mortality using the area under the curve (AUC) statistic. Results: Filtering by medical indication led to more concise and interpretable phenotypes. Mortality prediction performance (AUC) varied under the different experimental conditions and by cancer type (breast: 0.623 - 0.694, prostate: 0.603 - 0.750, colorectal: 0.523 - 0.641, and lung: 0.517 - 0.623). Generally, prediction performance improved with the use of a supervised term and the incorporation of SDOH covariates. Conclusion: Constrained tensor factorization, applied to sparse EHR data of patients with cancer, can discover computational phenotypes predictive of five-year mortality. The incorporation of SDOH variables into the factorization algorithm is an easy-to-implement and effective way to improve prediction performance.

Translated: 2021-12-29 17:15:08 Published: 2021-12-24

# Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling ( http://arxiv.org/abs/2112.12940v1 )

License: CC BY 4.0

Trisha Singhal, Junhua Liu, Lucienne T.M. Blessing, Kwan Hui Lim

The scientific world is changing at a rapid pace, with new technology being developed and new trends being set at an increasing frequency. This paper presents a framework for conducting scientific analyses of academic publications, which is crucial for monitoring research trends and identifying potential innovations. The framework adopts and combines various Natural Language Processing techniques, such as word embedding and topic modelling. Word embedding is used to capture the semantic meanings of domain-specific words. We propose two novel scientific publication embeddings, PUB-G and PUB-W, which are capable of learning the semantic meanings of general as well as domain-specific words in various research fields. Thereafter, topic modelling is used to identify clusters of research topics within these larger research fields. We curated a publication dataset consisting of two conferences and two journals from 1995 to 2020 from two research domains. Experimental results show that our PUB-G and PUB-W embeddings outperform other baseline embeddings by a margin of ~0.18-1.03 in topic coherence.

Translated: 2021-12-29 17:02:59 Published: 2021-12-24

# Deep ensembles in bioimage segmentation ( http://arxiv.org/abs/2112.12955v1 )

License: CC BY 4.0

Loris Nanni, Daniela Cuza, Alessandra Lumini, Andrea Loreggia and Sheryl Brahnam

Semantic segmentation consists of classifying each pixel of an image by assigning it a specific label chosen from the set of all available labels. During the last few years, a lot of attention has shifted to this kind of task. Many computer vision researchers have tried to apply autoencoder structures to develop models that can learn the semantics of the image as well as a low-level representation of it. In an autoencoder architecture, given an input, an encoder computes a low-dimensional representation of the input that is then used by a decoder to reconstruct the original data. In this work, we propose an ensemble of convolutional neural networks (CNNs). In ensemble methods, many different models are trained and then used for classification, and the ensemble aggregates the outputs of the single classifiers. The approach leverages the differences among the classifiers to improve the performance of the whole system. Diversity among the single classifiers is enforced by using different loss functions. In particular, we present a new loss function that results from the combination of Dice and Structural Similarity Index. The proposed ensemble is implemented by combining different backbone networks using the DeepLabV3+ and HarDNet environment. The proposal is evaluated through an extensive empirical evaluation on two real-world scenarios: polyp and skin segmentation. All the code is available online at https://github.com/LorisNanni.
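Two ingredients named in the abstract above, a Dice-based loss and the aggregation of several classifiers' outputs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the combined Dice and Structural Similarity loss, so only a standard soft Dice term is shown, and plain averaging is assumed as the aggregation rule. The names `dice_loss` and `ensemble_average` are hypothetical.

```python
def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss for flat lists of per-pixel probabilities and 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def ensemble_average(predictions):
    """Average the per-pixel outputs of several models (assumed aggregation rule)."""
    n_models = len(predictions)
    return [sum(pixel) / n_models for pixel in zip(*predictions)]


# Toy 4-pixel mask and two hypothetical model outputs.
mask = [1, 1, 0, 0]
model_a = [0.9, 0.6, 0.2, 0.1]
model_b = [0.7, 0.8, 0.4, 0.1]
fused = ensemble_average([model_a, model_b])
print(fused, dice_loss(fused, mask))
```

A perfect prediction gives a loss near zero and a fully wrong one a loss near one, which is why Dice is a common choice when foreground pixels are rare.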

Translated: 2021-12-29 16:46:44 Published: 2021-12-24

# Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces ( http://arxiv.org/abs/2112.12961v1 )

License: CC BY 4.0

Chaoxia Yuan, Chao Ying, Zhou Yu, Fang Fang

The support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVMs with high-dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method has been considered for SVMs. This work aims to fill the gap by proposing a frequentist model averaging procedure for SVMs that selects the optimal weights by cross-validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate, which provides more insight into model averaging. Compared to model selection methods for SVMs, which require the tedious but critical task of tuning parameter selection, the model averaging method avoids that task and shows promising performance in the empirical studies.

Translated: 2021-12-29 16:25:26 Published: 2021-12-24

# Machine learning for Earth System Science (ESS): A survey, status and future directions for South Asia ( http://arxiv.org/abs/2112.12966v1 )

License: CC BY 4.0

Manmeet Singh, Bipin Kumar, Rajib Chattopadhyay, K Amarjyothi, Anup K Sutar, Sukanta Roy, Suryachandra A Rao, Ravi S. Nanjundiah

This survey focuses on the current problems in Earth system science where machine learning algorithms can be applied. It provides an overview of previous work, ongoing work at the Ministry of Earth Sciences, Government of India, and future applications of ML algorithms to some significant Earth science problems. We provide a comparison of previous work with this survey, a mind map of multidimensional areas related to machine learning, and a Gartner hype cycle for machine learning in Earth system science (ESS). We mainly focus on the critical components of the Earth sciences, including the atmosphere, ocean, seismology, and biosphere, and cover AI/ML applications to statistical downscaling and forecasting problems.

Translated: 2021-12-29 15:47:36 Published: 2021-12-24

# SGTR: End-to-end Scene Graph Generation with Transformer ( http://arxiv.org/abs/2112.12970v1 )

License: CC BY 4.0

Rongjie Li, Songyang Zhang, Xuming He

Scene Graph Generation (SGG) remains a challenging visual understanding task due to its complex compositional property. Most previous works adopt a bottom-up two-stage or a point-based one-stage approach, which often suffers from high time complexity or sub-optimal design assumptions. In this work, we propose a novel SGG method to address these issues, which formulates the task as a bipartite graph construction problem. To solve the problem, we develop a transformer-based end-to-end framework that first generates the entity and predicate proposal sets and then infers directed edges to form the relation triplets. In particular, we develop a new entity-aware predicate representation based on a structural predicate generator to leverage the compositional property of relationships. Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner. Extensive experimental results show that our design achieves state-of-the-art or comparable performance on two challenging benchmarks, surpassing most of the existing approaches while enjoying higher efficiency in inference. We hope our model can serve as a strong baseline for Transformer-based scene graph generation.

Translated: 2021-12-29 15:32:20 Published: 2021-12-24

# Is Importance Weighting Incompatible with Interpolating Classifiers? ( http://arxiv.org/abs/2112.12986v1 )

License: CC BY 4.0

Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of the overparameterization, but instead as a result of using exponentially-tailed losses like the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance-weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of using polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights obtained by exponentiating the classical unbiased importance weights can improve performance. Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. Our loss function also gives test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.
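The contrast between exponentially-tailed and polynomially-tailed losses can be made concrete with a small sketch. The abstract does not give the paper's loss function, so `poly_tailed_loss` below is a hypothetical stand-in chosen only because it decays like a power of the margin; `weighted_risk` and the parameter `alpha` are likewise assumptions of this sketch.

```python
import math


def logistic_loss(margin):
    """Exponentially-tailed: decays like exp(-margin) for large margins."""
    return math.log1p(math.exp(-margin))


def poly_tailed_loss(margin, alpha=1.0):
    """Illustrative polynomially-tailed loss: decays like margin ** (-alpha)."""
    if margin >= 0.0:
        return 1.0 / (1.0 + margin) ** alpha
    # Linear extension for negative margins; matches value and slope at 0.
    return 1.0 - alpha * margin


def weighted_risk(loss, margins, weights):
    """Importance-weighted empirical risk: mean of w_i * loss(margin_i)."""
    return sum(w * loss(m) for m, w in zip(margins, weights)) / len(margins)


# Once every training point is fit with a large margin, the logistic loss is
# vanishingly small, while the polynomial tail still leaves a visible loss
# for the importance weights to rescale.
for margin in (1.0, 5.0, 20.0):
    print(margin, logistic_loss(margin), poly_tailed_loss(margin))
```

The printed values show the gap between the two tails growing by orders of magnitude with the margin, which gives an intuition for the abstract's claim rather than a reproduction of its analysis.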

Translated: 2021-12-29 15:09:13 Published: 2021-12-24

# iSeg3D: An Interactive 3D Shape Segmentation Tool ( http://arxiv.org/abs/2112.12988v1 )

License: CC BY 4.0

Sucheng Qian, Liu Liu, Wenqiang Xu, Cewu Lu

A large-scale dataset is essential for learning good features in 3D shape understanding, but only a few datasets are large enough for deep learning training. One of the major reasons is that current tools for annotating per-point semantic labels using polygons or scribbles are tedious and inefficient. To facilitate segmentation annotation of 3D shapes, we propose an effective annotation tool, named iSeg3D. It can obtain a satisfactory segmentation result with minimal human clicks (< 10). We observe that most objects can be considered as compositions of a finite set of primitive shapes, and we train the iSeg3D model on our primitive-composed shape data to learn geometric prior knowledge in a self-supervised manner. Given human interactions, the learned knowledge can be used to segment parts of arbitrary shapes: positive clicks help associate the primitives into semantic parts, and negative clicks can avoid over-segmentation. We also provide an online human-in-the-loop fine-tuning module that enables the model to perform better segmentation with fewer clicks. Experiments demonstrate the effectiveness of iSeg3D on PartNet shape segmentation. Data and code will be made publicly available.

Translated: 2021-12-29 15:07:53 Published: 2021-12-24

# Domain-Aware Continual Zero-Shot Learning ( http://arxiv.org/abs/2112.12989v1 )

License: CC BY 4.0

Kai Yi, Mohamed Elhoseiny

We introduce Domain-Aware Continual Zero-Shot Learning (DACZSL), the task of visually recognizing images of unseen categories in unseen domains sequentially. We created DACZSL on top of the DomainNet dataset by dividing it into a sequence of tasks, where classes are incrementally provided on seen domains during training and evaluation is conducted on unseen domains for both seen and unseen classes. We also propose a novel Domain-Invariant CZSL Network (DIN), which outperforms state-of-the-art baseline models that we adapted to the DACZSL setting. We adopt a structure-based approach to alleviate forgetting of knowledge from previous tasks, with a small per-task private network in addition to a global shared network. To encourage the private network to capture the domain- and task-specific representation, we train our model with a novel adversarial knowledge disentanglement setting that makes our global network task-invariant and domain-invariant over all the tasks. Our method also learns a class-wise learnable prompt to obtain a better class-level text representation, which is used to represent side information and enable zero-shot prediction of future unseen classes. Our code and benchmarks will be made publicly available.

Translated: 2021-12-29 14:55:08 Published: 2021-12-24

# Toeplitz Least Squares Problems, Fast Algorithms and Big Data ( http://arxiv.org/abs/2112.12994v1 )

License: CC BY 4.0

Ali Eshragh, Oliver Di Pietro and Michael A. Saunders

In time series analysis, when fitting an autoregressive model, one must solve a Toeplitz ordinary least squares problem numerous times to find an appropriate model, which can severely affect computational times with large data sets. Two recent algorithms (LSAR and Repeated Halving) have applied randomized numerical linear algebra (RandNLA) techniques to fitting an autoregressive model to big time-series data. We investigate and compare the quality of these two approximation algorithms on large-scale synthetic and real-world data. While both algorithms display comparable results for synthetic datasets, the LSAR algorithm appears to be more robust when applied to real-world time series data. We conclude that RandNLA is effective in the context of big-data time series.
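To see where the Toeplitz least squares problem in the abstract above comes from, the sketch below fits an AR(p) model exactly by ordinary least squares. The design matrix of lagged values has constant diagonals, which is the Toeplitz structure. This is the plain, non-randomized baseline and not LSAR or Repeated Halving; the helper names `lagged_design`, `solve_linear`, and `fit_ar` are assumptions of this sketch.

```python
def lagged_design(series, p):
    """Rows [x[t-1], ..., x[t-p]] and targets x[t] for t = p .. n-1."""
    rows = [[series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    return rows, targets


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(series, p):
    """OLS fit of AR(p) via the normal equations X^T X phi = X^T y."""
    rows, targets = lagged_design(series, p)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear(xtx, xty)


# Noise-free AR(2) series x[t] = 0.5 x[t-1] - 0.3 x[t-2], so the fit
# should recover coefficients close to (0.5, -0.3).
series = [1.0, 2.0]
for _ in range(10):
    series.append(0.5 * series[-1] - 0.3 * series[-2])
phi = fit_ar(series, 2)
print(phi)
```

Forming the full design matrix costs time proportional to the series length times the order, and model selection repeats the fit for many orders, which is the cost the randomized methods in the abstract aim to reduce.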

Translated: 2021-12-29 14:33:01 Published: 2021-12-24

# Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation ( http://arxiv.org/abs/2112.13047v1 )

License: CC BY 4.0

Jiaxing Yan, Hong Zhao, Penghui Bu, YuSheng Jin

Self-supervised learning has shown very promising results for monocular depth estimation. Scene structure and local details are both significant clues for high-quality depth estimation. Recent works suffer from the lack of explicit modeling of scene structure and of proper handling of detail information, which leads to a performance bottleneck and blurry artefacts in the predicted results. In this paper, we propose the Channel-wise Attention-based Depth Estimation Network (CADepth-Net) with two effective contributions: 1) The structure perception module employs the self-attention mechanism to capture long-range dependencies and aggregates discriminative features in the channel dimension; it explicitly enhances the perception of scene structure and obtains better scene understanding and a richer feature representation. 2) The detail emphasis module re-calibrates channel-wise feature maps and selectively emphasizes the informative features, aiming to highlight crucial local detail information and fuse features from different levels more efficiently, resulting in more precise and sharper depth prediction. Furthermore, extensive experiments validate the effectiveness of our method and show that our model achieves state-of-the-art results on the KITTI benchmark and the Make3D dataset.

Translated: 2021-12-29 14:13:44 Published: 2021-12-24

# Self-Gated Memory Recurrent Network for Efficient Scalable HDR Deghosting ( http://arxiv.org/abs/2112.13050v1 )

License: CC BY 4.0

K. Ram Prabhakar, Susmit Agrawal, R. Venkatesh Babu

We propose a novel recurrent network-based HDR deghosting method for fusing dynamic sequences of arbitrary length. The proposed method uses convolutional and recurrent architectures to generate visually pleasing, ghosting-free HDR images. We introduce a new recurrent cell architecture, the Self-Gated Memory (SGM) cell, that outperforms the standard LSTM cell while containing fewer parameters and having faster running times. In the SGM cell, the information flow through a gate is controlled by multiplying the gate's output by a function of itself. Additionally, we use two SGM cells in a bidirectional setting to improve output quality. The proposed approach achieves state-of-the-art performance compared to existing HDR deghosting methods quantitatively across three publicly available datasets, while also scaling to fuse variable-length input sequences without re-training. Through extensive ablations, we demonstrate the importance of the individual components of our proposed approach. The code is available at https://val.cds.iisc.ac.in/HDR/HDRRNN/index.html.

Translated: 2021-12-29 13:54:52 Published: 2021-12-24

# Tri-Transformer Hawkes Process: Three Heads are better than one ( http://arxiv.org/abs/2112.13058v1 )

License: CC BY 4.0

Zhi-yan Song, Jian-wei Liu, Lu-ning Zhang, and Ya-nan Han

Most of the real-world data we encounter are asynchronous event sequences, so the last decades have been characterized by the application of various point processes to fields such as social networks, electronic medical records, and financial transactions. At first, the Hawkes process and its variants, which can simultaneously model the self-triggering and mutual-triggering patterns between different events in complex sequences in a clear and quantitative way, were the more popular choice. Later on, with the advances in neural networks, neural Hawkes processes were proposed one after another and gradually became a research hotspot. The transformer Hawkes process (THP) brought a large performance improvement, setting off a new surge of transformer-based neural Hawkes processes. However, THP does not make full use of the information about the occurrence time and type of events in the asynchronous event sequence. It simply adds the encoding of event type conversion and the location encoding of time conversion to the source encoding. At the same time, a learner built from a single transformer will result in an inescapable learning bias. To mitigate these problems, we propose a tri-transformer Hawkes process (Tri-THP) model, in which the event and time information are added to the dot-product attention as auxiliary information to form a new multi-head attention. The effectiveness of Tri-THP is demonstrated by a series of well-designed experiments on both real-world and synthetic data.

Translated: 2021-12-29 13:27:44 Published: 2021-12-24

# Virtuoso: Video-based Intelligence for real-time tuning on SOCs ( http://arxiv.org/abs/2112.13076v1 )

License: CC BY 4.0

Jayoung Lee, PengCheng Wang, Ran Xu, Venkat Dasari, Noah Weston, Yin Li, Saurabh Bagchi, and Somali Chaterji

Efficient and adaptive computer vision systems have been proposed to optimize computer vision tasks, such as image classification and object detection, for embedded or mobile devices. These solutions, quite recent in their origin, focus on optimizing the model (a deep neural network, DNN) or the system by designing an adaptive system with approximation knobs. In spite of several recent efforts, we show that existing solutions suffer from two major drawbacks. First, the system does not consider the energy consumption of the models when deciding which model to run. Second, the evaluation does not consider the practical scenario of contention on the device due to other co-resident workloads. In this work, we propose an efficient and adaptive video object detection system, Virtuoso, which is jointly optimized for accuracy, energy efficiency, and latency. Underlying Virtuoso is a multi-branch execution kernel that is capable of running at different operating points along the accuracy-energy-latency axes, and a lightweight runtime scheduler that selects the best-fit execution branch to satisfy the user requirement. To compare fairly with Virtuoso, we benchmark 15 state-of-the-art or widely used protocols, including Faster R-CNN (FRCNN), YOLO v3, SSD, EfficientDet, SELSA, MEGA, REPP, FastAdapt, and our in-house adaptive variants FRCNN+, YOLO+, SSD+, and EfficientDet+ (our variants have enhanced efficiency for mobiles). In this comprehensive benchmark, Virtuoso outperforms all the above protocols, leading the accuracy frontier at every efficiency level on NVIDIA Jetson mobile GPUs. Specifically, Virtuoso achieves an accuracy of 63.9%, which is more than 10% higher than some of the popular object detection models: FRCNN at 51.1% and YOLO at 49.5%.

Translated: 2021-12-29 13:16:45 Published: 2021-12-24

# Invertible Network for Unpaired Low-light Image Enhancement ( http://arxiv.org/abs/2112.13107v1 )

License: CC BY 4.0

Jize Zhang, Haolin Wang, Xiaohe Wu, Wangmeng Zuo

Existing unpaired low-light image enhancement approaches prefer to employ the two-way GAN framework, in which two CNN generators are deployed separately for enhancement and degradation. However, such data-driven models ignore the inherent characteristics of the transformation between low-light and normal-light images, leading to unstable training and artifacts. Here, we propose to leverage an invertible network to enhance the low-light image in the forward process and degrade the normal-light one in the inverse process, with unpaired learning. The generated and real images are then fed into discriminators for adversarial learning. In addition to the adversarial loss, we design various loss functions to ensure the stability of training and preserve more image details. In particular, a reversibility loss is introduced to alleviate the over-exposure problem. Moreover, we present a progressive self-guided enhancement process for low-light images and achieve favorable performance against state-of-the-art methods.

Translated: 2021-12-29 12:43:43 Published: 2021-12-24

# Accelerated and instance-optimal policy evaluation with linear function approximation ( http://arxiv.org/abs/2112.13109v1 )

License: CC BY 4.0

Tianjiao Li, Guanghui Lan and Ashwin Pananjady

We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: to illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results that match those in the i.i.d. setting up to a multiplicative factor that is proportional to the mixing time of the chain. Our theoretical guarantees of optimality are corroborated by numerical experiments.
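For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of ordinary TD(0) policy evaluation with linear function approximation. It is not VRFTD and has neither acceleration nor variance reduction. The deterministic two-state chain, the one-hot features, and the function name `td0_linear` are assumptions chosen so that the result is reproducible.

```python
def td0_linear(features, transitions, gamma, step_size, num_steps, start_state=0):
    """TD(0) with linear value estimates v(s) = theta . features[s].

    transitions[s] is a (next_state, reward) pair, i.e. a deterministic
    chain, used here only to keep the sketch reproducible.
    """
    dim = len(features[0])
    theta = [0.0] * dim
    state = start_state
    for _ in range(num_steps):
        next_state, reward = transitions[state]
        v = sum(t * f for t, f in zip(theta, features[state]))
        v_next = sum(t * f for t, f in zip(theta, features[next_state]))
        td_error = reward + gamma * v_next - v
        theta = [t + step_size * td_error * f
                 for t, f in zip(theta, features[state])]
        state = next_state
    return theta


# Two-state cycle 0 -> 1 -> 0 with rewards 1 and 0 and one-hot features.
# Solving v0 = 1 + 0.5 * v1 and v1 = 0.5 * v0 gives v0 = 4/3 and v1 = 2/3.
features = [[1.0, 0.0], [0.0, 1.0]]
transitions = [(1, 1.0), (0, 0.0)]
theta = td0_linear(features, transitions, gamma=0.5, step_size=0.5, num_steps=400)
print(theta)
```

With stochastic transitions the same update is driven by sampled rewards and next states, and the noise in those samples is the stochastic error that the lower bounds in the abstract quantify.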

Translated: 2021-12-29 12:40:27 Published: 2021-12-24

# Ultrasound Speckle Suppression and Denoising using MRI-derived Normalizing Flow Priors ( http://arxiv.org/abs/2112.13110v1 )

License: CC BY-SA 4.0

Vincent van de Schaft and Ruud J.G. van Sloun

Ultrasonography offers an inexpensive, widely accessible and compact medical imaging solution. However, compared to other imaging modalities such as CT and MRI, ultrasound images notoriously suffer from strong speckle noise, which originates from the random interference of sub-wavelength scattering. This deteriorates ultrasound image quality and makes interpretation challenging. Here we propose a new unsupervised ultrasound speckle reduction and image denoising method based on maximum-a-posteriori estimation with deep generative priors that are learned from high-quality MRI images. To model the generative tissue reflectivity prior, we exploit normalizing flows, which in recent years have been shown to be very powerful in modeling signal priors across a variety of applications. To facilitate generalization, we factorize the prior and train our flow model on patches from the NYU fastMRI (fully-sampled) dataset. This prior is then used for inference in an iterative denoising scheme. We first validate the utility of our learned priors on noisy MRI data (no prior domain shift), and then turn to evaluating performance on both simulated and in-vivo ultrasound images from the PICMUS and CUBDL datasets. The results show that the method outperforms other (unsupervised) ultrasound denoising methods (NLM and OBNLM) both quantitatively and qualitatively.

Translation date: 2021-12-29 12:39:09 Publication date: 2021-12-24

# (参考訳) 劣化によるDNA配列データの品質測定

Measuring Quality of DNA Sequence Data via Degradation ( http://arxiv.org/abs/2112.13111v1 )

License: CC BY 4.0

Alan F. Karr, Jason Hauzel, Adam A. Porter, Marcel Schaefer

(参考訳) 本稿では,ゲノムデータの品質評価のための新しいパラダイムを提案し,その有効性を定量的に評価する。その理論的根拠は、初期品質が高いほど、ゲノムが脆弱になり、分解の影響が大きくなることである。我々は, この現象がユビキタスであり, 劣化の定量化が多目的に利用できることを示す。データ品質に関して問題となる可能性のある外れ値の特定に重点を置いていますが、真の異常である場合や、データベースを変換しようとする場合さえあります。

We propose and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.

Translation date: 2021-12-29 12:22:36 Publication date: 2021-12-24

# (参考訳) ゲノムのマルコフ構造の異常識別と読み出し分類への応用

Application of Markov Structure of Genomes to Outlier Identification and Read Classification ( http://arxiv.org/abs/2112.13117v1 )

License: CC BY 4.0

Alan F. Karr, Jason Hauzel, Adam A. Porter, Marcel Schaefer

(参考訳) 本稿では,2つのバイオインフォマティクス問題,すなわち,ゲノムデータベースにおける異常点の同定と,実際のウイルスとアデノウイルスのデータを用いたメダゲノミクスにおける分類の2次マルコフ過程として,ゲノムの構造を応用する。

In this paper we apply the structure of genomes as second-order Markov processes specified by the distributions of successive triplets of bases to two bioinformatics problems: identification of outliers in genome databases and read classification in metagenomics, using real coronavirus and adenovirus data.

Translation date: 2021-12-29 12:09:26 Publication date: 2021-12-24
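
As a rough illustration of the triplet-based description in the abstract above, the sketch below counts successive (overlapping) base triplets in a sequence and derives second-order transition probabilities, that is, P(next base | previous two bases). The function names and the toy sequence are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict


def triplet_counts(seq):
    """Count every successive (overlapping) triplet of bases in seq."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def second_order_transitions(seq):
    """Estimate P(next base | previous two bases) from triplet counts."""
    counts = triplet_counts(seq)
    totals = defaultdict(int)
    for triplet, n in counts.items():
        totals[triplet[:2]] += n  # how often each two-base context is followed by a base
    return {t: n / totals[t[:2]] for t, n in counts.items()}


toy = "ACGTACGA"  # hypothetical toy sequence
probs = second_order_transitions(toy)
```

Comparing such triplet distributions between genomes (for example with a distance between the two probability tables) is one plausible way to flag outliers, though the paper's actual statistic is not given in the abstract.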

# (参考訳) グラフとグラフの類似度を学習するためのニューラルネットワークフレームワーク

A Neural Framework for Learning Subgraph and Graph Similarity Measures ( http://arxiv.org/abs/2112.13143v1 )

License: CC BY 4.0

Rishabh Ranjan, Siddharth Grover, Sourav Medya, Venkatesan Chakaravarthy, Yogish Sabharwal, Sayan Ranu

(参考訳) グラフ解析において、グラフ類似性探索は基本的な演算子である。このフレームワークでは、クエリグラフとグラフデータベースが与えられた場合、クエリに構造的に類似したデータベースグラフのサブグラフを特定することが目的である。サブグラフ編集距離(sed)は、サブグラフの類似性の最も表現力のある尺度の1つである。本研究では,グラフペアの学習セットとそのSED値からSEDを学習する問題について検討する。そこで我々は,SEDを連想させる構造を持つ埋め込み空間を学習するNEUROSEDと呼ばれる新しいシアムグラフニューラルネットワークを設計する。 NEUROSEDは特殊に製作された帰納バイアスの助けを借りて、高い精度を実現するだけでなく、予測されたSEDが真のSEDと同様に三角形の不等式を満たすことを保証する。この設計はグラフ編集距離(GED)をモデル化するのに十分一般的なものであり、予測されたGED空間が真のGED空間のようにメートル法であることを保証している。 SEDとGEDの両方において、実際のグラフデータセットに関する大規模な実験により、NEUROSEDは最先端のベースラインの約18倍の速度でRMSEの約2倍の速度で達成されていることが証明された。さらに、ペア独立な埋め込みと理論的性質のため、neurosedはグラフやサブグラフの検索を約3桁高速化できる。

Subgraph similarity search is a fundamental operator in graph analysis. In this framework, given a query graph and a graph database, the goal is to identify subgraphs of the database graphs that are structurally similar to the query. Subgraph edit distance (SED) is one of the most expressive measures for subgraph similarity. In this work, we study the problem of learning SED from a training set of graph pairs and their SED values. Towards that end, we design a novel siamese graph neural network called NEUROSED, which learns an embedding space with a rich structure reminiscent of SED. With the help of a specially crafted inductive bias, NEUROSED not only enables high accuracy but also ensures that the predicted SED, like true SED, satisfies triangle inequality. The design is generic enough to also model graph edit distance (GED), while ensuring that the predicted GED space is metric, like the true GED space. Extensive experiments on real graph datasets, for both SED and GED, establish that NEUROSED achieves approximately 2 times lower RMSE than the state of the art and is approximately 18 times faster than the fastest baseline. Further, owing to its pair-independent embeddings and theoretical properties, NEUROSED allows approximately 3 orders of magnitude faster retrieval of graphs and subgraphs.

Translation date: 2021-12-29 11:55:40 Publication date: 2021-12-24
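
The abstract above says the predicted SED satisfies the triangle inequality but does not state the scoring function. Purely as a hedged illustration, the sketch below uses one asymmetric function with that property: the Euclidean norm of the elementwise positive part of the difference between two embeddings. Choosing this function is an assumption made here, and the random vectors stand in for learned embeddings.

```python
import itertools
import math
import random


def one_sided_distance(a, b):
    """Euclidean norm of the positive part of (a - b), taken elementwise."""
    return math.sqrt(sum(max(0.0, x - y) ** 2 for x, y in zip(a, b)))


def violates_triangle(points, tol=1e-9):
    """Return True if any ordered triple breaks d(a, c) <= d(a, b) + d(b, c)."""
    for a, b, c in itertools.permutations(points, 3):
        direct = one_sided_distance(a, c)
        detour = one_sided_distance(a, b) + one_sided_distance(b, c)
        if direct > detour + tol:
            return True
    return False


rng = random.Random(0)
# Hypothetical stand-ins for graph embeddings.
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
print(violates_triangle(embeddings))
```

The inequality holds because the positive part is subadditive elementwise and the Euclidean norm is monotone on nonnegative vectors. Because each embedding is computed independently of the other graph in the pair, embeddings can be precomputed once per database graph.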

# (参考訳) SoK:音声処理システムのセキュリティに関する研究

SoK: A Study of the Security on Voice Processing Systems ( http://arxiv.org/abs/2112.13144v1 )

License: CC BY 4.0

Robert Chang, Logan Kuo, Arthur Liu, and Nader Sehatbakhsh

(参考訳) 音声処理システム(vps)の使用は、商用音声認識デバイスや主要なテキスト対音声ソフトウェアといったアプリケーションへの依存が高まり、日々の日常生活で普及し続けているため、これらのシステムに対する攻撃はますます複雑で、多様で、絶えず進化している。 VPSのユースケースが急速に新しいスペースと目的に成長するにつれ、プライバシーに関する潜在的な影響はますます危険になっている。さらに、空襲の数の増加と実用性の増加により、システム障害はずっと起こり得るものになっている。本稿では,音声処理システムにおけるユニークな攻撃の配置を識別し,分類する。長年にわたり研究は、システムの故障やサービスの否定をもたらす特殊な標的のない攻撃から、敵によって制御される結果を強制するより汎用的な攻撃へと移行してきた。現在の最も頻繁に使用されている機械学習システムと、現代の音声処理システムの中核であるディープニューラルネットワークは、セキュリティよりもパフォーマンスとスケーラビリティを重視して構築されている。したがって,我々は音声処理環境の発達を再評価し,今後の発展と理論的改善を提案するために,現在の攻撃・防御の状況を特定することが重要である。

As the use of Voice Processing Systems (VPS) continues to become more prevalent in our daily lives through the increased reliance on applications such as commercial voice recognition devices as well as major text-to-speech software, the attacks on these systems are increasingly complex, varied, and constantly evolving. With the use cases for VPS rapidly growing into new spaces and purposes, the potential consequences regarding privacy are increasingly dangerous. In addition, the growing number and increased practicality of over-the-air attacks have made system failures much more probable. In this paper, we will identify and classify an arrangement of unique attacks on voice processing systems. Over the years, research has been moving from specialized, untargeted attacks that result in the malfunction of systems and the denial of services to more general, targeted attacks that can force an outcome controlled by an adversary. The current and most frequently used machine learning systems and deep neural networks, which are at the core of modern voice processing systems, were built with a focus on performance and scalability rather than security. Therefore, it is critical for us to reassess the developing voice processing landscape and to identify the state of current attacks and defenses so that we may suggest future developments and theoretical improvements.

Translation date: 2021-12-29 11:27:11 Publication date: 2021-12-24

# (参考訳) 前・逆離散周期ラドン変換の高速かつスケーラブルな計算法

Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform ( http://arxiv.org/abs/2112.13149v1 )

License: CC BY 4.0

Cesar Carranza, Daniel Llamocca, and Marios Pattichis

(参考訳) 離散周期ラドン変換(DPRT)は、投影からの画像再構成を含むアプリケーションで広く使われている。この原稿では、以下の方法に基づいた前方および逆DPRTを計算するための高速でスケーラブルなアプローチを紹介している。 (i)固定点加算木の並列配列 (ii) 加算器ツリーの入力データを選択する際に外部メモリコンポーネントにアクセスする必要をなくすための円形シフトレジスタ。 (iii)提案するアーキテクチャを利用可能なリソースに適合させるDPRT計算に対する画像ブロックに基づくアプローチ (iv)入力画像のサイズに依存しない1または数回のクロックサイクルで計算される高速なトランスポジション。結果として、$N\times N$ 画像($N$ は素数)の場合、提案手法はクロックサイクル当たりの$N^{2}$加算を計算することができる。従来のアプローチと比較して、スケーラブルなアプローチは、さまざまな計算リソースに対して最も高速な実装を提供する。例えば、$251\times 251$の画像では、systolicの実装で必要とされるよりも約$25\%$少ないflip-flopsで、スケーラブルなDPRTは36倍高速に計算できる。最も高速な場合、DPRTとその逆をそれぞれ$2N+\left\lceil \log_{2}N\right\rceil+1$と$2N+3\left\lceil \log_{2}N\right\rceil+B+2$サイクルで計算できる最適化アーキテクチャを導入します。一方、拡張性のあるDPRTアプローチでは、systolic実装よりも1ビットの追加が必要であり、スピードと1ビットの追加の間のトレードオフを提供する。提案したDPRTアーキテクチャはすべてVHDLで実装され、FPGA実装を用いて検証された。

The Discrete Periodic Radon Transform (DPRT) has been extensively used in applications that involve image reconstructions from projections. This manuscript introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: (i) a parallel array of fixed-point adder trees, (ii) circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees, (iii) an image block-based approach to DPRT computation that can fit the proposed architecture to available resources, and (iv) fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an $N\times N$ image ($N$ prime), the proposed approach can compute up to $N^{2}$ additions per clock cycle. Compared to previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a $251\times 251$ image, with approximately $25\%$ fewer flip-flops than required for a systolic implementation, the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just $2N+\left\lceil \log_{2}N\right\rceil+1$ and $2N+3\left\lceil \log_{2}N\right\rceil+B+2$ cycles respectively, where $B$ is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than the systolic implementation and provides a trade-off between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHDL and validated using an FPGA implementation.

Translation date: 2021-12-29 11:11:58 Publication date: 2021-12-24
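
To make the transform in the abstract above concrete, here is a minimal sequential sketch of a forward DPRT for an $N\times N$ image with $N$ prime. It assumes one common definition, $R(m,d)=\sum_{i} f(i,\langle d+mi\rangle_N)$ for $m=0,\dots,N-1$ plus one extra projection of row sums; the paper's exact index convention may differ. It is plain software and does not model the parallel adder-tree hardware.

```python
def dprt(image):
    """Forward DPRT of an N x N image (N prime), giving N + 1 projections."""
    n = len(image)
    # Projections m = 0..N-1: sum the pixels on the line j = (d + m*i) mod N.
    rows = [[sum(image[i][(d + m * i) % n] for i in range(n)) for d in range(n)]
            for m in range(n)]
    # Final projection m = N: plain sum over each row of the image.
    rows.append([sum(image[d]) for d in range(n)])
    return rows


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
projections = dprt(image)
```

Every projection sums to the total of the image, because for fixed $m$ and $i$ the map $d \mapsto (d+mi) \bmod N$ visits each column exactly once. The whole transform needs on the order of $N^3$ additions, which is the work the proposed architectures spread over parallel adder trees.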

# (参考訳) スケーラブルアーキテクチャを用いた高速2次元畳み込みと相互相関

Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures ( http://arxiv.org/abs/2112.13150v1 )

License: CC BY 4.0

Cesar Carranza, Daniel Llamocca, and Marios Pattichis

(参考訳) この原稿は、高速でスケーラブルなアーキテクチャと、畳み込みと相互相関を計算するための関連するアルゴリズムを記述している。基本的な考え方は、2次元の畳み込みとクロス相関を変換領域内の1次元の畳み込みとクロス相関の集合にマッピングすることである。これは、一般的なカーネルに離散周期ラドン変換(DPRT)を使用し、低ランクカーネルにSVD-LU分解を使用することで達成される。このアプローチではスケーラブルなアーキテクチャを使用し、最新のFPGAやZynq-SOCデバイスに組み込める。利用可能なリソースの種類によっては、$P\times P$ blocks、$O(P)$ clock cycles to $O(P^2)$ clock cyclesで2D畳み込みと相互相関を計算することができる。したがって、パフォーマンスと必要な数とリソースの種類との間にトレードオフがある。本稿では,最新のプログラマブルデバイス(Virtex-7とZynq-SOC)を用いて提案アーキテクチャの実装を行う。必要なリソースの量と種類に基づいて,提案手法が現在の手法を大きく上回ることを示す。

The manuscript describes fast and scalable architectures and associated algorithms for computing convolutions and cross-correlations. The basic idea is to map 2D convolutions and cross-correlations to a collection of 1D convolutions and cross-correlations in the transform domain. This is accomplished through the use of the Discrete Periodic Radon Transform (DPRT) for general kernels and the use of SVD-LU decompositions for low-rank kernels. The approach uses scalable architectures that can be fitted into modern FPGA and Zynq-SOC devices. Based on different types of available resources, for $P\times P$ blocks, 2D convolutions and cross-correlations can be computed in just $O(P)$ clock cycles up to $O(P^2)$ clock cycles. Thus, there is a trade-off between performance and required numbers and types of resources. We provide implementations of the proposed architectures using modern programmable devices (Virtex-7 and Zynq-SOC). Based on the amounts and types of required resources, we show that the proposed approaches significantly outperform current methods.

Translation date: 2021-12-29 10:40:30 Publication date: 2021-12-24
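
The idea in the abstract above of mapping a 2D convolution to 1D convolutions is easiest to see for a rank-1 kernel $K = u v^{T}$. The sketch below checks that case only: convolving every row with $v$ and then every column with $u$ matches the direct 2D convolution. A rank-$r$ kernel is a sum of $r$ such terms. The function names and toy data are assumptions, and the paper's SVD-LU and DPRT constructions are not reproduced here.

```python
def conv1d(signal, kernel):
    """Full 1D convolution of two lists."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def conv2d(image, kernel):
    """Direct full 2D convolution."""
    rows = len(image) + len(kernel) - 1
    cols = len(image[0]) + len(kernel[0]) - 1
    out = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(image):
        for j, x in enumerate(row):
            for p, krow in enumerate(kernel):
                for q, k in enumerate(krow):
                    out[i + p][j + q] += x * k
    return out


def conv2d_rank1(image, u, v):
    """2D convolution with the rank-1 kernel u v^T using only 1D convolutions."""
    along_rows = [conv1d(row, v) for row in image]  # convolve each row with v
    columns = [conv1d(list(col), u) for col in zip(*along_rows)]  # each column with u
    return [list(row) for row in zip(*columns)]  # transpose back to row-major


image = [[1, 2, 0], [3, 1, 4], [0, 2, 5]]
u, v = [1, 2, 1], [1, 0, -1]
kernel = [[a * b for b in v] for a in u]  # outer product u v^T
```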

# ニューラルインターコネクトとダンピング割り当てを用いた全エネルギーシェーピング --パッシビリティに基づく制御

Total Energy Shaping with Neural Interconnection and Damping Assignment -- Passivity Based Control ( http://arxiv.org/abs/2112.12999v1 )

License: see the linked page

Santiago Sanchez-Escalonilla, Rodolfo Reyes-Baez, Bayu Jayawardhana

(参考訳) 本研究では、ニューラルネットワーク(NN)の普遍的近似特性を利用して、ポート-ハミルトン(pH)フレームワークにおける完全作動機械系の相互接続と減衰割り当て(IDA)制御を設計する。そこで我々は、IDA-PBC法を、偏微分マッチング方程式を解く教師付き学習問題に変換し、平衡割当やリャプノフ安定性条件を満たす。この結果の主な結果は、学習アルゴリズムの出力が通過率とリャプノフ安定性の観点から明確な制御論的解釈を持つことである。提案する制御設計手法は, 数値シミュレーションにより自由度1, 2自由度機械システムに対して検証された。

In this work we exploit the universal approximation property of Neural Networks (NNs) to design interconnection and damping assignment (IDA) passivity-based control (PBC) schemes for fully-actuated mechanical systems in the port-Hamiltonian (pH) framework. To that end, we transform the IDA-PBC method into a supervised learning problem that solves the partial differential matching equations, and fulfills equilibrium assignment and Lyapunov stability conditions. A main consequence of this is that the output of the learning algorithm has a clear control-theoretic interpretation in terms of passivity and Lyapunov stability. The proposed control design methodology is validated for mechanical systems of one and two degrees-of-freedom via numerical simulations.

Translation date: 2021-12-28 17:50:28 Publication date: 2021-12-24

# 深部暗黙的場を用いた点雲からのコンパクト建築モデルの構築

Reconstructing Compact Building Models from Point Clouds Using Deep Implicit Fields ( http://arxiv.org/abs/2112.13142v1 )

License: see the linked page

Zhaiyu Chen, Seyran Khademi, Hugo Ledoux, Liangliang Nan

(参考訳) 3次元建築モデルは、多くの現実世界の応用においてますます重要な役割を果たす一方、建物のコンパクトな表現は未解決の問題である。本稿では,点雲からコンパクト・水密・多角形建築モデルを再構築するための新しい枠組みを提案する。私たちのフレームワークは3つのコンポーネントで構成されています。 a) 細胞複合体は、候補集合として多面体埋め込みを提供する適応空間分割によって生成される。 b) 暗黙的場は、占有率推定の構築を容易にする深層ニューラルネットワークによって学習される。 (c)マルコフ確率場を定式化し、組合せ最適化により建物の外面を抽出する。形状再構成, 表面近似, 幾何単純化における最先端手法と評価, 比較を行った。人工的および実世界のポイントクラウドにおける実験では、ニューラルネットワークによる戦略により、忠実性、コンパクト性、計算効率において、高品質なビルディングモデルが得られることが示されています。提案手法は, ノイズに対する頑健さと測定の不十分さを示し, 合成スキャンから実世界の計測まで, 直接的に一般化することができる。

Three-dimensional (3D) building models play an increasingly pivotal role in many real-world applications while obtaining a compact representation of buildings remains an open problem. In this paper, we present a novel framework for reconstructing compact, watertight, polygonal building models from point clouds. Our framework comprises three components: (a) a cell complex is generated via adaptive space partitioning that provides a polyhedral embedding as the candidate set; (b) an implicit field is learned by a deep neural network that facilitates building occupancy estimation; (c) a Markov random field is formulated to extract the outer surface of a building via combinatorial optimization. We evaluate and compare our method with state-of-the-art methods in shape reconstruction, surface approximation, and geometry simplification. Experiments on both synthetic and real-world point clouds have demonstrated that, with our neural-guided strategy, high-quality building models can be obtained with significant advantages in fidelity, compactness, and computational efficiency. Our method shows robustness to noise and insufficient measurements, and it can directly generalize from synthetic scans to real-world measurements.

Translation date: 2021-12-28 17:42:40 Publication date: 2021-12-24

# 汚染ガウスモデルにおけるロバスト推定のためのトラクタブルおよび準最適逆アルゴリズム

Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models ( http://arxiv.org/abs/2112.12919v1 )

License: see the linked page

Ziyue Wang, Zhiqiang Tan

(参考訳) フーバーの汚染ガウスモデルの下での位置と分散行列の同時推定の問題を考える。まず, 人口レベルでの最小$f$-divergence推定を非パラメトリック判別器を用いた生成逆数法に対応して検討し, 最小距離推定のロバスト性と同様に, 頑健な推定につながる$f$-divergencesの条件を確立する。より重要なことは、単純なスプライン判別器を用いた扱いやすい逆アルゴリズムを開発し、現在のジェネレータに与えられた凹型目的関数を最大化することで判別器パラメータを完全に更新できるように、入れ子最適化によって実装できる。提案手法は,$f$-divergenceと使用したペナルティに応じて,最小値の最適値またはほぼ最適値を達成する。本稿では,古典的ロバスト推定法,ペアワイズ法,ニューラルネットワーク判別法に対する提案手法の利点を示すシミュレーション手法を提案する。

Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model. First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator and establish conditions on $f$-divergences which lead to robust estimation, similarly to robustness of minimum distance estimation. More importantly, we develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization such that the discriminator parameters can be fully updated by maximizing a concave objective function given the current generator. The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used. We present simulation studies to demonstrate advantages of the proposed methods over classic robust estimators, pairwise methods, and a generative adversarial method with neural network discriminators.

Translation date: 2021-12-28 17:37:32 Publication date: 2021-12-24
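
For readers unfamiliar with Huber's contamination model mentioned in the abstract above, the sketch below draws data from $(1-\epsilon)\,N(0,1)+\epsilon\,Q$ and compares the sample mean with the sample median. It uses the one-dimensional case, the contaminating distribution $Q=N(50,1)$, and the median as a classic robust estimator; all three are assumptions chosen for illustration, not the paper's adversarial method.

```python
import random
import statistics


def sample_contaminated(n, eps, rng, loc=0.0, scale=1.0, outlier_loc=50.0):
    """Draw n points from (1 - eps) * N(loc, scale^2) + eps * N(outlier_loc, 1)."""
    return [rng.gauss(outlier_loc, 1.0) if rng.random() < eps else rng.gauss(loc, scale)
            for _ in range(n)]


rng = random.Random(0)
data = sample_contaminated(5000, 0.1, rng)
mean_estimate = statistics.fmean(data)     # pulled towards the contamination
median_estimate = statistics.median(data)  # stays close to the true location 0
```

With 10% contamination located at 50, the mean lands near 5 while the median stays within a fraction of a standard deviation of the true location, which is the kind of robustness the contaminated-model literature formalizes.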

# ディープフィードフォワードReLUニューラルネットワークのパラメータ同定可能性

Parameter identifiability of a deep feedforward ReLU neural network ( http://arxiv.org/abs/2112.12982v1 )

License: see the linked page

Joachim Bona-Pellissier (IMT), François Bachoc (IMT), François Malgouyres (IMT)

(参考訳) 入力空間のサブセットにおける関数の知識のおかげで、ニューラルネットワークのパラメータ重みとバイアスを回復する可能性は、状況、呪い、祝福によっても可能となる。一方、パラメータを復元することで、より良い敵攻撃が可能になり、ネットワーク構築に使用されるデータセットから機密情報を開示することもできる。一方、ネットワークのパラメータが復元可能であれば、潜在空間の特徴を解釈できることをユーザに保証する。また、ネットワークの性能に関する正式な保証を得るための基盤も提供する。したがって、パラメータを識別できるネットワークとパラメータを識別できないネットワークを特徴付けることが重要である。本稿では、入力空間のサブセットに実装する関数から、ネットワークのパラメータが一意に同定されたモジュロ置換と正の再スケーリングを持つディープ完全接続フィードフォワードReLUニューラルネットワークに条件セットを提供する。

The possibility for one to recover the parameters (weights and biases) of a neural network thanks to the knowledge of its function on a subset of the input space can be, depending on the situation, a curse or a blessing. On one hand, recovering the parameters allows for better adversarial attacks and could also disclose sensitive information from the dataset used to construct the network. On the other hand, if the parameters of a network can be recovered, it guarantees the user that the features in the latent spaces can be interpreted. It also provides foundations to obtain formal guarantees on the performance of the network. It is therefore important to characterize the networks whose parameters can be identified and those whose parameters cannot. In this article, we provide a set of conditions on a deep fully-connected feedforward ReLU neural network under which the parameters of the network are uniquely identified (modulo permutation and positive rescaling) from the function it implements on a subset of the input space.

Translation date: 2021-12-28 17:37:13 Publication date: 2021-12-24
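
The phrase "modulo permutation and positive rescaling" in the abstract above refers to parameter changes that never alter the function a ReLU network computes. The sketch below demonstrates both on a tiny one-hidden-layer network; the weights and inputs are arbitrary values chosen for illustration.

```python
def relu(x):
    return max(0.0, x)


def network(x, w1, b1, w2, b2):
    """One-hidden-layer ReLU network with scalar input and scalar output."""
    hidden = [relu(w * x + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


w1, b1, w2, b2 = [1.0, -2.0], [0.5, 1.0], [3.0, 4.0], 0.25

# Positive rescaling: multiply neuron 0's incoming weight and bias by lam,
# and divide its outgoing weight by lam. relu(lam * z) == lam * relu(z) for lam > 0.
lam = 4.0
w1_s, b1_s, w2_s = [lam * w1[0], w1[1]], [lam * b1[0], b1[1]], [w2[0] / lam, w2[1]]

# Permutation: swap the two hidden neurons together with their outgoing weights.
w1_p, b1_p, w2_p = w1[::-1], b1[::-1], w2[::-1]

inputs = [-2.0, -0.5, 0.0, 0.75, 3.0]
outputs = [network(x, w1, b1, w2, b2) for x in inputs]
```

Since these two operations always leave the outputs unchanged, identifiability can only ever hold up to them, which is why the result is stated in that form.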

# 基礎疾患と新型コロナウイルス感受性との関連性に関する機械学習解析

A machine learning analysis of the relationship between some underlying medical conditions and COVID-19 susceptibility ( http://arxiv.org/abs/2112.12901v1 )

License: see the linked page

Mostafa Rezapour, Colin A. Varady

(参考訳) 過去数年間、新型コロナウイルス(covid-19)は米国に住むすべての国民の日常生活に大きな影響を与え、気づかれずにはいられないいくつかの致命的な健康リスクを課してきた。米国の社会にcovid-19が与える恐怖と危険が高まる中で、個人が利用するための恒久的な治療として、いくつかのワクチンやブースターが作成されている。本稿では,米国内の複数の州において,新型コロナウイルスワクチンとブースターの関連とコロナウイルスの総感染者数について検討する。また,本研究は,いくつかの病原体とcovid-19の関連について述べる。本稿では,これらの関係を効果的に議論するために,統計的テストと機械学習手法を用いて分析と議論を行う。さらに, 教育的達成, 人種, およびcovid-19との関係と, 基礎疾患, ワクチン接種率, およびcovid-19総症例数, 死亡数との関連性について考察した。

For the past couple of years, the Coronavirus, commonly known as COVID-19, has significantly affected the daily lives of all citizens residing in the United States by imposing several fatal health risks that cannot go unnoticed. In response to the growing fear and danger COVID-19 inflicts upon societies in the USA, several vaccines and boosters have been created as a permanent remedy for individuals to take advantage of. In this paper, we investigate the relationship between the COVID-19 vaccines and boosters and the total case count for the Coronavirus across multiple states in the USA. Additionally, this paper discusses the relationship between several selected underlying health conditions and COVID-19. To discuss these relationships effectively, this paper utilizes statistical tests and machine learning methods for analysis and discussion purposes. Furthermore, this paper reflects upon conclusions made about the relationship between educational attainment, race, and COVID-19 and the possible connections that can be established with underlying health conditions, vaccination rates, and COVID-19 total case and death counts.

Translation date: 2021-12-28 17:36:33 Publication date: 2021-12-24

# 機械学習を用いた心電図信号の心室頻拍検出と分類モデル

Supraventricular Tachycardia Detection and Classification Model of ECG signal Using Machine Learning ( http://arxiv.org/abs/2112.12953v1 )

License: see the linked page

Pampa Howladar, Manodipan Sahoo

(参考訳) 心電図(ECG)信号の研究は、心電図プロセスが非侵襲的で使用が容易であるため、心臓疾患の診断に不可欠である。本研究は, ノイズのフィルタリング, 心電図特性のユニークな収集, および重症度に応じて異なる型を分類する自動学習分類モデルを含む, 数段階からなる上室性不整脈予測モデルを提案する。我々は,ノイズを低減し,抽出前の機能をよりよく決定するために,信号の消音・消音を行う。その後,必要な特徴抽出の一部として1つのrピーク検出法とq-s検出法を提案する。これらの特徴に対応する次のパラメータが計算される。これらの特徴を活かして,異なるタイプの上室頻拍を分類できる機械学習に基づく分類モデルを開発した。上室頻拍不整脈における決定木モデルが最も効率的な機械学習モデルであることが示唆された。すべての機械学習モデルの中で、このモデルは上室頻拍の重要な信号誤分類を最も効率的に低減する。実験の結果, 良好な改善が得られ, 97%の精度で提案手法の有効性が示された。

Investigation of electrocardiogram (ECG) signals is an essential way to diagnose heart disease, since the ECG process is noninvasive and easy to use. This work presents a supraventricular arrhythmia prediction model consisting of a few stages, including noise filtering, a unique collection of ECG characteristics, and an automated learning model that classifies distinct types depending on their severity. We de-trend and de-noise the signal to reduce noise and to better determine functionality before feature extraction is performed. After that, we present one R-peak detection method and one Q-S detection method as part of the necessary feature extraction. Next, the parameters that correspond to these features are computed. Using these characteristics, we have developed a classification model based on machine learning that can successfully categorize different types of supraventricular tachycardia. Our findings suggest that decision-tree-based models are the most efficient machine learning models for supraventricular tachycardia arrhythmia. Among all the machine learning models, this model most efficiently lowers the crucial signal misclassification of supraventricular tachycardia. Experimental results indicate satisfactory improvements and demonstrate a superior efficiency of the proposed approach with 97% accuracy.

Translation date: 2021-12-28 17:36:16 Publication date: 2021-12-24

# 機械学習による心電図信号の効率的な心室頻拍検出モデル

Machine Learning-based Efficient Ventricular Tachycardia Detection Model of ECG Signal ( http://arxiv.org/abs/2112.12956v1 )

License: see the linked page

Pampa Howladar, Manodipan Sahoo

(参考訳) 心不全の一次診断と解析では、心電図信号が重要な役割を果たす。本稿では, ノイズフィルタリングを用いた心室頻拍不整脈の予測モデル, 心電図特徴のユニークなセット, 機械学習に基づく分類モデルを提案する。信号特徴抽出に先立ち,特徴を適切に検出するためのノイズを除去するために信号の消音・消音を行う。その後、必要な特徴を抽出し、これらの特徴に関連する必要なパラメータを測定する。これらのパラメータを用いて、異なるタイプの心室頻拍不整脈を効率的に分類できる機械学習アプローチを用いて、効率的なマルチクラス分類モデルを作成した。以上の結果から,ロジスティック回帰モデルと決定木モデルが最も効率的な心室頻拍検出モデルであることが示唆された。心臓疾患を診断し,患者のケアを見つけるためには,早期かつ信頼性の高い不整脈の診断が必要である。提案手法の実装により,心室頻拍に関連する臨界信号の誤分類を極めて効率的に低減する問題に対処する。実験結果から,提案したアルゴリズムに対する高いレジリエンスを示した。この支援により、医師は患者のこのタイプの不整脈を早期に評価し、適切なタイミングで適切な判断をすることができる。

In the primary diagnosis and analysis of heart defects, an ECG signal plays a significant role. This paper presents a model for the prediction of ventricular tachycardia arrhythmia using noise filtering, a unique set of ECG features, and a machine learning-based classifier model. Before signal feature extraction, we detrend and denoise the signal to eliminate the noise so that features can be detected properly. After that, the necessary features are extracted and the parameters related to these features are measured. Using these parameters, we prepared an efficient multiclass classifier model using a machine learning approach that can classify different types of ventricular tachycardia arrhythmias efficiently. Our results indicate that logistic regression and decision tree-based models are the most efficient machine learning models for detecting ventricular tachycardia arrhythmia. In order to diagnose heart diseases and find care for a patient, an early, reliable diagnosis of different types of arrhythmia is necessary. By implementing our proposed method, this work deals with the problem of reducing the misclassification of the critical signal related to ventricular tachycardia very efficiently. Experimental findings demonstrate satisfactory enhancements and high resilience of the proposed algorithm. With this assistance, doctors can assess this type of arrhythmia in a patient early and make the right decision at the proper time.

Translation date: 2021-12-28 17:35:29 Publication date: 2021-12-24

# 時系列のコンパクト辞書表現を用いた誤差有界近似時系列接合

Error-bounded Approximate Time Series Joins using Compact Dictionary Representations of Time Series ( http://arxiv.org/abs/2112.12965v1 )

License: see the linked page

Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Wei Zhang, Eamonn Keogh

(参考訳) matrix profileは、時系列データの類似性結合機能を提供する効果的なデータマイニングツールである。行列プロファイルのユーザは、相似性結合(自己結合)を用いて自身で時系列を結合するか、相似性結合を用いて別の時系列と結合することができる。いずれかのタイプの結合を呼び出すことで、マトリクスプロファイルはデータの保存された構造と異常な構造の両方を発見するのに役立つ。 5年前の行列プロファイルの導入以来、近似結合による計算の高速化に複数の取り組みがなされてきたが、これらの取り組みの大部分は自己結合にのみ焦点をあてている。本研究では,時系列のコンパクトな"ディクショナリ"表現を作成することにより,誤差有界保証を伴う近似時系列間類似性結合を効率的に実行可能であることを示す。元の時系列ではなく辞書表現を用いることで、異常マイニングシステムのスループットを少なくとも20倍向上させることができるが、基本的に精度は低下しない。副次的な効果として、辞書は時系列を意味的に意味のある方法で要約し、直感的で実行可能な洞察を提供する。医学や交通の分野における辞書に基づく時系列間類似性の有用性を実証する。

The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using intra-similarity join (i.e., self-join) or join a time series with another time series using inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both conserved and anomalous structures in the data. Since the introduction of the matrix profile five years ago, multiple efforts have been made to speed up the computation with approximate joins; however, the majority of these efforts only focus on self-joins. In this work, we show that it is possible to efficiently perform approximate inter-time series similarity joins with error bounded guarantees by creating a compact "dictionary" representation of time series. Using the dictionary representation instead of the original time series, we are able to improve the throughput of an anomaly mining system by at least 20X, with essentially no decrease in accuracy. As a side effect, the dictionaries also summarize the time series in a semantically meaningful way and can provide intuitive and actionable insights. We demonstrate the utility of our dictionary-based inter-time series similarity joins on domains as diverse as medicine and transportation.

Translation date: 2021-12-28 17:35:10 Publication date: 2021-12-24
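
To show what the inter-similarity join in the abstract above computes, here is a brute-force sketch: for each length-`m` subsequence of one series, it finds the distance to the nearest subsequence in another series. The z-normalized Euclidean distance, the function names, and the toy series are assumptions made here. This is the exact quadratic-time join, not the dictionary-based approximate method the paper proposes.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    return [0.0] * len(window) if std == 0 else [(x - mean) / std for x in window]


def subsequences(series, m):
    """All z-normalized sliding windows of length m."""
    return [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]


def join_profile(a, b, m):
    """For each length-m subsequence of a, the distance to its nearest neighbor in b."""
    subs_b = subsequences(b, m)
    return [min(math.dist(sa, sb) for sb in subs_b) for sa in subsequences(a, m)]


reference = [0, 1, 2, 1, 0, 1, 2, 1, 0]  # hypothetical "normal" behaviour
query = [0, 1, 2, 1, 0, 5, 0, 1, 2]      # same pattern with a spike at index 5
profile = join_profile(query, reference, 4)
anomaly_index = max(range(len(profile)), key=profile.__getitem__)
```

Low profile values mark conserved structure and high values mark anomalies. A compact dictionary would replace `reference` with a small set of representative subsequences, so that far fewer distances are needed per query window.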

# DP-UTIL: 機械学習における差分プライバシーの総合的ユーティリティ分析

DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning ( http://arxiv.org/abs/2112.12998v1 )

License: see the linked page

Ismat Jarin and Birhanu Eshete

(参考訳) 差分プライバシー(DP)は、プライバシー漏洩の定量化を推理する厳格な形式主義として登場した。機械学習(ML)では、DPはトレーニング例の推論/開示を制限するために使用されている。以前の作業では、MLパイプライン全体にわたってDPを活用し、独立して、勾配の摂動のようなメカニズムに重点を置いていた。本稿では,入力摂動,客観的摂動,勾配摂動,出力摂動,予測摂動に着目した,mlパイプライン全体のdpの総合的ユーティリティ解析フレームワークdp-utilを提案する。プライバシに敏感なデータに対するMLタスクが与えられた場合、DP-UTILは、モデルユーティリティ損失、プライバシリーク、真に明らかになったトレーニングサンプルの数で測定された、これらの5つの摂動領域におけるDPの影響に関する総合的な比較分析を可能にする。我々は,視覚,医療,財務データセットの分類タスクよりもDP-UTILを評価するために,2つの代表的な学習アルゴリズム(論理回帰とディープニューラルネットワーク)を事例スタディアタックとして利用した。結果のハイライトの1つは、予測摂動が一貫してすべてのデータセットにわたるすべてのモデルで最も低いユーティリティ損失を達成していることです。ロジスティック回帰モデルでは、客観摂動は他の摂動法と比較して低いプライバシー漏洩をもたらす。ディープニューラルネットワークの場合、勾配の摂動はプライバシリークを低くする。さらに,本研究の結果から,プライバシリークが増大するにつれて,より多くのメンバーサンプルが発見されたことが示唆された。以上の結果から,どの摂動メカニズムを使用するべきかを判断するためには,最適化手法(凸対非凸),摂動機構,クラス数,プライバシ予算のダイナミクスを検討する必要があることが示唆された。

Differential Privacy (DP) has emerged as a rigorous formalism to reason about quantifiable privacy leakage. In machine learning (ML), DP has been employed to limit inference/disclosure of training examples. Prior work leveraged DP across the ML pipeline, albeit in isolation, often focusing on mechanisms such as gradient perturbation. In this paper, we present DP-UTIL, a holistic utility analysis framework of DP across the ML pipeline with a focus on input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. Given an ML task on privacy-sensitive data, DP-UTIL enables an ML privacy practitioner to perform holistic comparative analysis on the impact of DP in these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples. We evaluate DP-UTIL over classification tasks on vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and deep neural network) against membership inference attack as a case study attack. One of the highlights of our results is that prediction perturbation consistently achieves the lowest utility loss on all models across all datasets. In logistic regression models, objective perturbation results in the lowest privacy leakage compared to other perturbation techniques. For deep neural networks, gradient perturbation results in the lowest privacy leakage. Moreover, our results on truly revealed records suggest that as privacy leakage increases, a differentially private model reveals more member samples. Overall, our findings suggest that to make informed decisions as to which perturbation mechanism to use, an ML privacy practitioner needs to examine the dynamics between optimization techniques (convex vs. non-convex), perturbation mechanisms, number of classes, and privacy budget.

Translation date: 2021-12-28 17:34:51 Publication date: 2021-12-24
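
As a small, hedged illustration of what perturbing an output under a privacy budget means in the abstract above, the sketch below releases a clipped mean with Laplace noise scaled to sensitivity divided by epsilon. The Laplace mechanism is a standard DP primitive, but using it here on a mean is an assumption for illustration. The abstract does not specify DP-UTIL's mechanisms, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of values clipped to [lower, upper] with Laplace noise."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
values = [0.2, 0.4, 0.6, 0.8]
noisy = private_mean(values, 0.0, 1.0, epsilon=1.0, rng=rng)
```

A smaller `epsilon` (tighter privacy budget) means larger noise and therefore larger utility loss, which is the trade-off the framework measures at each perturbation spot.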

# 解析クエリ処理のための微調整データ構造

Fine-Tuning Data Structures for Analytical Query Processing ( http://arxiv.org/abs/2112.13099v1 )

License: see the linked page

Amir Shaikhha, Marios Kelepeshis, Mahdi Ghorbani

(参考訳) 分析ワークロードの効率的な計算を支援するために,データ構造を自動的に選択するフレームワークを提案する。私たちの貢献は2倍です。まず,古典結合やgroupjoin,データベース内機械学習エンジンなど,さまざまなクエリ処理パラダイムの背後にあるアルゴリズムを表現可能な,新しい低レベル中間言語を提案する。この言語は辞書の概念に基づいて設計されており、低レベルの実装をより細かく選択することができる。次に、機械学習とプログラム推論を組み合わせることで、代替実装のコストモデルを自動的に推論する。辞書コストモデルは、所定のハードウェアアーキテクチャ上の辞書操作のプロファイリングデータセット上で訓練された回帰モデルを用いて学習される。プログラムコストモデルは静的プログラム解析を用いて推定される。実験の結果,マイクロベンチマークにおける訓練コストモデルの有効性が示された。さらに、我々のフレームワークが生成したコードの性能は、最先端の分析クエリエンジンと最近のデータベース内機械学習フレームワークに匹敵するか、同等であることを示す。

We introduce a framework for automatically choosing data structures to support efficient computation of analytical workloads. Our contributions are twofold. First, we introduce a novel low-level intermediate language that can express the algorithms behind various query processing paradigms such as classical joins, groupjoin, and in-database machine learning engines. This language is designed around the notion of dictionaries, and allows for a more fine-grained choice of its low-level implementation. Second, the cost model for alternative implementations is automatically inferred by combining machine learning and program reasoning. The dictionary cost model is learned using a regression model trained over the profiling dataset of dictionary operations on a given hardware architecture. The program cost model is inferred using static program analysis. Our experimental results show the effectiveness of the trained cost model on micro benchmarks. Furthermore, we show that the performance of the code generated by our framework either outperforms or is on par with the state-of-the-art analytical query engines and a recent in-database machine learning framework.

Translation date: 2021-12-28 17:34:17 Publication date: 2021-12-24

# リチウムイオン電池の物理モデルと機械学習の統合

Integrating Physics-Based Modeling with Machine Learning for Lithium-Ion Batteries ( http://arxiv.org/abs/2112.12979v1 )

License: see the linked page

Hao Tu, Scott Moura, Yebin Wang, Huazhen Fang

(参考訳) リチウムイオン電池(libs)の数学的モデリングは、高度な電池管理における主要な課題である。本稿では,LiBの高精度モデリングを実現するために,物理モデルと機械学習を統合する2つの新しいフレームワークを提案する。フレームワークの特徴は、物理モデルの状態情報の機械学習モデルに通知することで、物理モデルと機械学習の深い統合を可能にすることである。これらの枠組みに基づき、電気化学モデルと等価回路モデルとをそれぞれフィードフォワードニューラルネットワークと組み合わせて、一連のハイブリッドモデルを構築する。ハイブリッドモデルは構造的に比較的類似しており、広範なシミュレーションや実験で示されているように、幅広いCレートでかなりの予測精度を提供できる。この研究は、老化と認識のハイブリッドモデリングをさらに拡大し、健康状態を意識して予測するハイブリッドモデルの設計へと繋がる。実験により、モデルはLiBのサイクルライフサイクルを通して高い予測精度を持つことが示された。

Mathematical modeling of lithium-ion batteries (LiBs) is a primary challenge in advanced battery management. This paper proposes two new frameworks to integrate a physics-based model with machine learning to achieve high-precision modeling for LiBs. The frameworks are characterized by informing the machine learning model of the state information of the physical model, enabling a deep integration between physics and machine learning. Based on the frameworks, a series of hybrid models are constructed by combining an electrochemical model and an equivalent circuit model, respectively, with a feedforward neural network. The hybrid models are relatively parsimonious in structure and can provide considerable predictive accuracy under a broad range of C-rates, as shown by extensive simulations and experiments. The study further expands to conduct aging-aware hybrid modeling, leading to the design of a hybrid model that is conscious of the state-of-health when making predictions. Experiments show that the model has high predictive accuracy throughout a LiB's cycle life.

Translated: 2021-12-28 17:27:46 Published: 2021-12-24

# A machine learning pipeline for autonomous numerical analytic continuation of Dyson-Schwinger equations ( http://arxiv.org/abs/2112.13011v1 )

License: see the linked page

Andreas Windisch, Thomas Gallien, Christopher Schwarzlmueller

Dyson-Schwinger equations (DSEs) are a non-perturbative way to express n-point functions in quantum field theory. Working in Euclidean space and in Landau gauge, for example, one can study the quark propagator Dyson-Schwinger equation in the real and complex domain, given that a suitable and tractable truncation has been found. When aiming to solve these equations in the complex domain, that is, for complex external momenta, one has to deform the integration contour of the radial component in the complex plane of the loop momentum expressed in hyper-spherical coordinates. This has to be done in order to avoid poles and branch cuts in the integrand of the self-energy loop. Because Dyson-Schwinger equations have to be solved in a self-consistent way, analyzing the analytic properties of the integrand after every iteration step is not feasible. In these proceedings, we suggest a machine learning pipeline based on deep learning (DL) approaches to computer vision (CV), as well as deep reinforcement learning (DRL), that could solve this problem autonomously by detecting poles and branch cuts in the numerical integrand after every iteration step and by suggesting suitable integration contour deformations that avoid these obstructions. We sketch out a proof of principle for both of these tasks, that is, the pole and branch cut detection, as well as the contour deformation.

Translated: 2021-12-28 17:27:33 Published: 2021-12-24

# Noninvasive Fetal Electrocardiography: Models, Technologies and Algorithms ( http://arxiv.org/abs/2112.13021v1 )

License: see the linked page

Reza Sameni

The fetal electrocardiogram (fECG) was first recorded from the maternal abdominal surface in the early 1900s. During the past fifty years, the most advanced electronics technologies and signal processing algorithms have been used to convert noninvasive fetal electrocardiography into a reliable technology for fetal cardiac monitoring. In this chapter, the major signal processing techniques that have been developed for the modeling, extraction and analysis of the fECG from noninvasive maternal abdominal recordings are reviewed and compared with one another in detail. The major topics of the chapter include: 1) the electrophysiology of the fECG from the signal processing viewpoint, 2) the mathematical model of the maternal volume conduction media and the waveform models of the fECG acquired from body surface leads, 3) the signal acquisition requirements, 4) model-based techniques for fECG noise and interference cancellation, including adaptive filters and semi-blind source separation techniques, and 5) recent algorithmic advances for fetal motion tracking and online fECG extraction from a small number of channels.
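Adaptive filtering for interference cancellation, named among the chapter topics above, can be illustrated with a basic least-mean-squares (LMS) noise canceller. This is only a sketch under assumptions: the chapter does not say which adaptive filter it uses, a separate maternal reference channel is assumed to be available, and the tap count and step size are arbitrary illustrative values.

```python
def lms_cancel(primary, reference, n_taps=4, mu=0.002):
    """Remove the part of `primary` that is predictable from `reference`.

    primary   : abdominal recording (fetal signal plus maternal interference)
    reference : maternal-only channel (assumed available)
    Returns the residual (fetal estimate) and the final filter weights.
    """
    weights = [0.0] * n_taps
    residual = []
    for n in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * xi for w, xi in zip(weights, x))
        error = primary[n] - estimate
        # LMS update: step along the instantaneous error gradient.
        weights = [w + 2.0 * mu * error * xi for w, xi in zip(weights, x)]
        residual.append(error)
    return residual, weights
```

Because the fetal component is not predictable from the maternal reference, it remains in the residual once the weights have converged, while the maternal component is largely removed.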

Translated: 2021-12-28 17:27:06 Published: 2021-12-24

# Faster Rates for Compressed Federated Learning with Client-Variance Reduction ( http://arxiv.org/abs/2112.13097v1 )

License: see the linked page

Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Due to the communication bottleneck in distributed and federated learning applications, algorithms using communication compression have attracted significant attention and are widely used in practice. Moreover, client-variance exists in federated learning because the total number of heterogeneous clients is usually very large and the server is unable to communicate with all clients in each communication round. In this paper, we address these two issues together by proposing compressed and client-variance reduced methods. Concretely, we introduce COFIG and FRECON, which successfully combine communication compression with client-variance reduction. The total number of communication rounds of COFIG is $O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$ in the nonconvex setting, where $N$ is the total number of clients, $S$ is the number of communicated clients in each round, $\epsilon$ is the convergence error, and $\omega$ is the parameter for the compression operator. Besides, our FRECON can converge faster than COFIG in the nonconvex setting, and it converges with $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ communication rounds. In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which is also the first convergence result for compression schemes that do not communicate with all the clients in each round. In sum, both COFIG and FRECON do not need to communicate with all the clients and provide first/faster convergence results for convex and nonconvex federated learning, while previous works either require full client communication (thus not practical) or obtain worse convergence results.
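To make the compression parameter $\omega$ concrete, the sketch below implements one common unbiased compressor, random-k sparsification. It is an assumption that this operator fits the paper's setting; the abstract only states that $\omega$ parameterizes the compression operator. For rand-k on a vector of dimension $d$, the compressor $C$ satisfies $E[C(x)] = x$ and $E\|C(x) - x\|^2 = \omega\|x\|^2$ with $\omega = d/k - 1$.

```python
import random


def rand_k(x, k, rng):
    """Keep k randomly chosen coordinates of x, scaled by d/k; zero the rest.

    The d/k scaling makes the compressor unbiased: E[rand_k(x)] = x.
    """
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in keep else 0.0 for i, v in enumerate(x)]


def omega_rand_k(d, k):
    """Variance parameter of rand-k: E||C(x) - x||^2 = omega * ||x||^2."""
    return d / k - 1.0
```

A smaller `k` sends fewer coordinates per round but increases $\omega$, which enters the round complexities above through the factor $(1+\omega)$.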

Translated: 2021-12-28 17:26:51 Published: 2021-12-24

# Towards Understanding Human Functional Brain Development with Explainable Artificial Intelligence: Challenges and Perspectives ( http://arxiv.org/abs/2112.12910v1 )

License: see the linked page

Mehrin Kiani, Javier Andreu-Perez, Hani Hagras, Silvia Rigato, and Maria Laura Filippetti

The last decades have seen significant advancements in non-invasive neuroimaging technologies that have been increasingly adopted to examine human brain development. However, these improvements have not necessarily been followed by more sophisticated data analysis measures that are able to explain the mechanisms underlying functional brain development. For example, the shift from univariate (single area in the brain) to multivariate (multiple areas in the brain) analysis paradigms is of significance as it allows investigations into the interactions between different brain regions. However, despite the potential of multivariate analysis to shed light on the interactions between developing brain regions, the artificial intelligence (AI) techniques applied render the analysis non-explainable. The purpose of this paper is to understand the extent to which current state-of-the-art AI techniques can inform functional brain development. In addition, a review of which AI techniques are more likely to explain their learning based on the processes of brain development as defined by developmental cognitive neuroscience (DCN) frameworks is also undertaken. This work also proposes that eXplainable AI (XAI) may provide viable methods to investigate functional brain development as hypothesised by DCN frameworks.

Translated: 2021-12-28 16:51:06 Published: 2021-12-24

# Deep Neuroevolution Squeezes More out of Small Neural Networks and Small Training Sets: Sample Application to MRI Brain Sequence Classification ( http://arxiv.org/abs/2112.12990v1 )

License: see the linked page

Joseph N Stember, Hrithwik Shalu

Purpose: Deep Neuroevolution (DNE) holds the promise of providing radiology artificial intelligence (AI) that performs well with small neural networks and small training sets. We seek to realize this potential via a proof-of-principle application to MRI brain sequence classification. Methods: We analyzed a training set of 20 patients, each with four sequences/weightings: T1, T1 post-contrast, T2, and T2-FLAIR. We trained the parameters of a relatively small convolutional neural network (CNN) as follows: First, we randomly mutated the CNN weights. We then measured the CNN training set accuracy, using the latter as the fitness evaluation metric. The fittest child CNNs were identified. We incorporated their mutations into the parent CNN. This selectively mutated parent became the next generation's parent CNN. We repeated this process for approximately 50,000 generations. Results: DNE achieved monotonic convergence to 100% training set accuracy. DNE also converged monotonically to 100% testing set accuracy. Conclusions: DNE can achieve perfect accuracy with small training sets and small CNNs. Particularly when combined with Deep Reinforcement Learning, DNE may provide a path forward in the quest to make radiology AI more human-like in its ability to learn. DNE may very well turn out to be a key component of the much-anticipated meta-learning regime of radiology AI algorithms that can adapt to new tasks and new image types, similar to human radiologists.
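The mutate, evaluate, and select loop described in the Methods can be sketched on a toy problem. Several things here are assumptions for illustration only: a linear classifier stands in for the CNN, the fittest children's mutations are incorporated by averaging them, and the guard that rejects an update lowering training accuracy is added to keep the toy run monotonic. The abstract does not specify how mutations were combined.

```python
import random


def accuracy(weights, data):
    """Fraction of (features, label) pairs classified correctly."""
    correct = 0
    for features, label in data:
        score = sum(w * f for w, f in zip(weights, features))
        correct += int((score > 0) == label)
    return correct / len(data)


def evolve(data, n_weights, generations=200, n_children=20,
           n_fittest=5, sigma=0.1, seed=0):
    """Evolve weights using training accuracy as the fitness metric."""
    rng = random.Random(seed)
    parent = [0.0] * n_weights
    for _ in range(generations):
        scored = []
        for _ in range(n_children):
            # Each child is the parent plus a random Gaussian mutation.
            mutation = [rng.gauss(0.0, sigma) for _ in range(n_weights)]
            child = [p + m for p, m in zip(parent, mutation)]
            scored.append((accuracy(child, data), mutation))
        scored.sort(key=lambda item: item[0], reverse=True)
        fittest = scored[:n_fittest]
        # Incorporate the fittest children's mutations (averaged: assumed).
        merged = [sum(m[i] for _, m in fittest) / n_fittest
                  for i in range(n_weights)]
        candidate = [p + m for p, m in zip(parent, merged)]
        # Keep the update only if it does not lower fitness (assumed guard).
        if accuracy(candidate, data) >= accuracy(parent, data):
            parent = candidate
    return parent
```

No gradients are computed anywhere, which is the property that makes this family of methods applicable when the fitness metric is not differentiable.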

Translated: 2021-12-28 16:50:44 Published: 2021-12-24

# Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects ( http://arxiv.org/abs/2112.12984v1 )

License: see the linked page

Mian Guo, Kai Zhong, Xiaozhi Wang

We propose a Doppler velocity-based clustering and velocity estimation algorithm, built on the characteristics of FMCW LiDAR, that achieves highly accurate, single-scan, real-time motion state detection and velocity estimation. We prove the continuity of the Doppler velocity on the same object. Based on this principle, we distinguish moving objects from the stationary background via a region-growing clustering algorithm. The resulting stationary background is used to estimate the velocity of the FMCW LiDAR by the least-squares method. We then estimate the velocity of the moving objects using the estimated LiDAR velocity and the Doppler velocity of the moving objects obtained by clustering. To ensure real-time processing, we set appropriate least-squares parameters. To verify the effectiveness of the algorithm, we create an FMCW LiDAR model on the autonomous driving simulation platform CARLA to generate data. The results show that our algorithm can process at least 4.5 million points and estimate the velocity of 150 moving objects per second on a Ryzen 3600x CPU, with a motion state detection accuracy of over 99% and an estimated velocity accuracy of 0.1 m/s.
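The least-squares step for the sensor's own velocity can be sketched in two dimensions. The sign convention is an assumption: a point's Doppler (radial) velocity is taken as positive when it moves away from the sensor, so a stationary point seen along unit direction $d$ from a sensor moving with velocity $v$ measures $r = -d \cdot v$. The region-growing clustering from the paper is not reproduced; the stationary points are assumed to be given.

```python
def estimate_ego_velocity(directions, radial_speeds):
    """Least-squares sensor velocity from stationary-point Doppler readings.

    Minimizes sum_i (d_i . v + r_i)^2 by solving the 2x2 normal equations
    (sum_i d_i d_i^T) v = -sum_i d_i r_i.
    """
    sxx = sum(dx * dx for dx, _ in directions)
    sxy = sum(dx * dy for dx, dy in directions)
    syy = sum(dy * dy for _, dy in directions)
    bx = -sum(dx * r for (dx, _), r in zip(directions, radial_speeds))
    by = -sum(dy * r for (_, dy), r in zip(directions, radial_speeds))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("directions do not constrain both velocity components")
    vx = (syy * bx - sxy * by) / det
    vy = (sxx * by - sxy * bx) / det
    return vx, vy


def compensated_radial_speed(direction, radial_speed, ego_velocity):
    """Radial speed of a point relative to the ground (ego motion removed)."""
    dx, dy = direction
    return radial_speed + dx * ego_velocity[0] + dy * ego_velocity[1]
```

After compensation, stationary points have a radial speed near zero, while a moving object keeps the component of its own velocity along the line of sight.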

Translated: 2021-12-28 16:47:05 Published: 2021-12-24

# US-GAN: On the importance of Ultimate Skip Connection for Facial Expression Synthesis ( http://arxiv.org/abs/2112.13002v1 )

License: see the linked page

Arbish Akram and Nazar Khan

Recent studies have shown impressive results in multi-domain image-to-image translation for facial expression synthesis. While effective, these methods require a large number of labelled samples for plausible results. Their performance degrades significantly when they are trained on smaller datasets. To address this limitation, we present US-GAN, a smaller and effective method for synthesizing plausible expressions from notably smaller datasets. The proposed method comprises encoding layers, a single residual block, decoding layers, and an ultimate skip connection that links the input image to the output image. It has three times fewer parameters than state-of-the-art facial expression synthesis methods. Experimental results demonstrate the quantitative and qualitative effectiveness of our proposed method. In addition, we show that an ultimate skip connection is sufficient for recovering rich facial and overall color details of the input face image that a larger state-of-the-art model fails to recover.

Translated: 2021-12-28 16:46:46 Published: 2021-12-24

# Grounding Linguistic Commands to Navigable Regions ( http://arxiv.org/abs/2112.13031v1 )

License: see the linked page

Nivedita Rufus, Kanishk Jain, Unni Krishnan R Nair, Vineet Gandhi, K Madhava Krishna

Humans have a natural ability to effortlessly comprehend linguistic commands such as "park next to the yellow sedan" and instinctively know which region of the road the vehicle should navigate. Extending this ability to autonomous vehicles is the next step towards creating fully autonomous agents that respond and act according to human commands. To this end, we propose the novel task of Referring Navigable Regions (RNR), i.e., grounding regions of interest for navigation based on the linguistic command. RNR is different from Referring Image Segmentation (RIS), which focuses on grounding an object referred to by the natural language expression instead of grounding a navigable region. For example, for the command "park next to the yellow sedan," RIS will aim to segment the referred sedan, whereas RNR aims to segment the suggested parking region on the road. We introduce a new dataset, Talk2Car-RegSeg, which extends the existing Talk2Car dataset with segmentation masks for the regions described by the linguistic commands. A separate test split with concise manoeuvre-oriented commands is provided to assess the practicality of our dataset. We benchmark the proposed dataset using a novel transformer-based architecture. We present extensive ablations and show superior performance over baselines on multiple evaluation metrics. A downstream path planner generating trajectories based on RNR outputs confirms the efficacy of the proposed framework.

Translated: 2021-12-28 16:46:33 Published: 2021-12-24

# Generalized Wasserstein Dice Loss, Test-time Augmentation, and Transformers for the BraTS 2021 challenge ( http://arxiv.org/abs/2112.13054v1 )

License: see the linked page

Lucas Fidon, Suprosanna Shit, Ivan Ezhov, Johannes C. Paetzold, Sébastien Ourselin, Tom Vercauteren

Brain tumor segmentation from multiple Magnetic Resonance Imaging (MRI) modalities is a challenging task in medical image computation. The main challenges lie in the generalizability to a variety of scanners and imaging protocols. In this paper, we explore strategies to increase model robustness without increasing inference time. Towards this aim, we explore finding a robust ensemble from models trained using different losses, optimizers, and train-validation data splits. Importantly, we explore the inclusion of a transformer in the bottleneck of the U-Net architecture. While we find that the transformer in the bottleneck performs slightly worse than the baseline U-Net on average, the generalized Wasserstein Dice loss consistently produces superior results. Further, we adopt an efficient test-time augmentation strategy for faster and robust inference. Our final ensemble of seven 3D U-Nets with test-time augmentation produces an average Dice score of 89.4% and an average Hausdorff 95% distance of 10.0 mm when evaluated on the BraTS 2021 testing dataset. Our code and trained models are publicly available at https://github.com/LucasFidon/TRABIT_BraTS2021.
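Two of the ingredients named above, the Dice score and test-time augmentation, can be illustrated on small 2D masks. This is a simplified sketch and not the code from the linked repository: the abstract does not say which augmentations are used, so a single left-right flip is assumed, `model` is a hypothetical callable that maps an image to per-pixel probabilities, and scoring two empty masks as 1.0 is a convention chosen here.

```python
def dice_score(pred, target):
    """Dice overlap 2|A n B| / (|A| + |B|) between two binary 2D masks."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(p) + sum(t)
    # Convention (assumed): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def flip_lr(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def predict_with_flip_tta(model, image):
    """Average the model's output on the image and on its mirrored copy.

    The mirrored prediction is flipped back before averaging so that both
    predictions refer to the same pixel positions.
    """
    direct = model(image)
    mirrored = flip_lr(model(flip_lr(image)))
    return [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(direct, mirrored)]
```

Averaging over flipped copies costs one extra forward pass per augmentation, which is why the choice of augmentations trades robustness against inference time.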

Translated: 2021-12-28 16:46:11 Published: 2021-12-24

# Distinguishing Transformative from Incremental Clinical Evidence: A Classifier of Clinical Research using Textual features from Abstracts and Citing Sentences ( http://arxiv.org/abs/2112.12996v1 )

License: see the linked page

Xuanyu Shi, Jian Du

In clinical research and clinical decision-making, it is important to know whether a study changes or only supports the current standards of care for specific disease management. We define research that brings such a change as transformative and research that only supports the current standards as incremental. Making this distinction usually requires a huge amount of domain expertise and time from humans. Faculty Opinions provides us with a well-annotated corpus on whether a study challenges or only confirms established research. In this study, a machine learning approach is proposed to distinguish transformative from incremental clinical evidence. The texts from both abstracts and a 2-year window of citing sentences are collected for a training set of clinical studies recommended and labeled by Faculty Opinions experts. We achieve the best performance, with an average AUC of 0.755 (0.705-0.875), using Random Forest as the classifier and citing sentences as the feature. The results show that transformative research has typical language patterns in citing sentences, unlike in abstract sentences. We provide an efficient tool that helps clinicians and researchers identify clinical evidence that challenges or only confirms established claims.

Translated: 2021-12-28 16:32:49 Published: 2021-12-24

# TSAX is Trending ( http://arxiv.org/abs/2112.12912v1 )

License: see the linked page

Muhammad Marwan Muhammad Fuad

Time series mining is an important branch of data mining, as time series data is ubiquitous and has many applications in several domains. The main task in time series mining is classification. Time series representation methods play an important role in time series classification and other time series mining tasks. One of the most popular representation methods of time series data is the Symbolic Aggregate approXimation (SAX). The secret behind its popularity is its simplicity and efficiency. However, SAX has one major drawback: its inability to represent trend information. Several methods have been proposed to enable SAX to capture trend information, but this comes at the expense of complex processing, preprocessing, or post-processing procedures. In this paper we present a new modification of SAX that we call Trending SAX (TSAX), which adds only minimal complexity to SAX but substantially improves its performance in time series classification. This is validated experimentally on 50 datasets. The results show the superior performance of our method, as it gives a smaller classification error than SAX on 39 datasets.
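The drawback mentioned above, that SAX cannot represent trend information, is easy to see with a small implementation of plain SAX. The sketch below covers only the original SAX (z-normalization, piecewise aggregate approximation, and discretization with equiprobable Gaussian breakpoints); the TSAX modification itself is not described in the abstract and is not reproduced. The function name and the requirement that the series length be divisible by the number of segments are choices made for this sketch.

```python
import bisect
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Return the SAX word of `series` as a string of lowercase letters."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("constant series cannot be z-normalized")
    z = [(v - mu) / sigma for v in series]
    # Piecewise aggregate approximation: the mean of each segment.
    seg_len = len(series) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints that split the standard normal into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(j / alphabet_size)
                   for j in range(1, alphabet_size)]
    return "".join(chr(ord("a") + bisect.bisect_left(breakpoints, v))
                   for v in paa)
```

For example, `[1, 2, 3, 4, 8, 7, 6, 5]` rises and then falls, while `[4, 3, 2, 1, 5, 6, 7, 8]` falls and then rises, yet with two segments both have the same segment means and therefore the same SAX word. Only the segment means survive the transformation, so the direction of change inside a segment is lost.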

Translated: 2021-12-28 16:31:45 Published: 2021-12-24

# Disentanglement by Cyclic Reconstruction ( http://arxiv.org/abs/2112.12980v1 )

License: see the linked page

David Bertoin, Emmanuel Rachelson (DMIA)

Deep neural networks have demonstrated their ability to automatically extract meaningful features from data. However, in supervised learning, information specific to the dataset used for training, but irrelevant to the task at hand, may remain encoded in the extracted representations. This remaining information introduces a domain-specific bias, weakening the generalization performance. In this work, we propose splitting the information into a task-related representation and its complementary context representation. We propose an original method, combining adversarial feature predictors and cyclic reconstruction, to disentangle these two representations in the single-domain supervised case. We then adapt this method to the unsupervised domain adaptation problem, which consists of training a model capable of performing on both a source and a target domain. In particular, our method promotes disentanglement in the target domain, despite the absence of training labels. This enables the isolation of task-specific information from both domains and a projection into a common representation. The task-specific representation allows efficient transfer of knowledge acquired from the source domain to the target domain. In the single-domain case, we demonstrate the quality of our representations on information retrieval tasks and the generalization benefits induced by sharpened task-specific representations. We then validate the proposed method on several classical domain adaptation benchmarks and illustrate the benefits of disentanglement for domain adaptation.

Translated: 2021-12-28 16:29:43 Published: 2021-12-24

# A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification ( http://arxiv.org/abs/2112.12887v1 )

License: see the linked page

Fabian Dubourvieux, Romaric Audigier, Angélique Loesch, Samia Ainouz, Stéphane Canu

The use of pseudo-labels prevails as the best-performing way to tackle Unsupervised Domain Adaptive (UDA) Re-Identification (re-ID). Indeed, this family of approaches has given rise to several effective UDA re-ID specific frameworks. In these works, research directions to improve Pseudo-Labeling UDA re-ID performance are varied and mostly based on intuition and experiments: refining pseudo-labels, reducing the impact of errors in pseudo-labels, and so on. It can be hard to deduce from them general good practices that can be implemented in any Pseudo-Labeling method to consistently improve its performance. To address this key question, a new theoretical view on Pseudo-Labeling UDA re-ID is proposed. The contributions are threefold: (i) A novel theoretical framework for Pseudo-Labeling UDA re-ID, formalized through a new general learning upper-bound on the UDA re-ID performance. (ii) General good practices for Pseudo-Labeling, directly deduced from the interpretation of the proposed theoretical framework, in order to improve the target re-ID performance. (iii) Extensive experiments on challenging person and vehicle cross-dataset re-ID tasks, showing consistent performance improvements for various state-of-the-art methods and various proposed implementations of good practices.

Translated: 2021-12-28 16:03:21 Published: 2021-12-24

# Cluster-guided Image Synthesis with Unconditional Models ( http://arxiv.org/abs/2112.12911v1 )

License: see the linked page

Markos Georgopoulos, James Oldfield, Grigorios G Chrysos, Yannis Panagakis

Generative Adversarial Networks (GANs) are the driving force behind the state-of-the-art in image generation. Despite their ability to synthesize high-resolution photo-realistic images, generating content with on-demand conditioning of different granularity remains a challenge. This challenge is usually tackled by annotating massive datasets with the attributes of interest, a laborious task that is not always a viable option. Therefore, it is vital to introduce control into the generation process of unsupervised generative models. In this work, we focus on controllable image generation by leveraging GANs that are well-trained in an unsupervised fashion. To this end, we discover that the representation space of intermediate layers of the generator forms a number of clusters that separate the data according to semantically meaningful attributes (e.g., hair color and pose). By conditioning on the cluster assignments, the proposed method is able to control the semantic class of the generated image. Our approach enables sampling from each cluster by Implicit Maximum Likelihood Estimation (IMLE). We showcase the efficacy of our approach on faces (CelebA-HQ and FFHQ), animals (ImageNet) and objects (LSUN) using different pre-trained generative models. The results highlight the ability of our approach to condition image generation on attributes like gender, pose and hair style on faces, as well as a variety of features on different object classes.

Translated: 2021-12-28 16:02:57 Published: 2021-12-24

# Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective ( http://arxiv.org/abs/2112.12925v1 )

License: see the linked page

Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng

In this paper, we revisit Semantic Scene Completion (SSC), a useful task that predicts the semantic and occupancy representation of 3D scenes. Many methods for this task are based on voxelized scene representations in order to keep the local scene structure. However, due to the existence of visible empty voxels, these methods suffer from heavy computation redundancy when the network goes deeper, which limits the completion quality. To address this dilemma, we propose a novel point-voxel aggregation network for this task. First, we convert the voxelized scenes to point clouds by removing these visible empty voxels and adopt a deep point stream to capture semantic information from the scene efficiently. Meanwhile, a light-weight voxel stream containing only two 3D convolution layers preserves local structures of the voxelized scenes. Furthermore, we design an anisotropic voxel aggregation operator to fuse the structure details from the voxel stream into the point stream, and a semantic-aware propagation module to enhance the up-sampling process in the point stream by semantic labels. We demonstrate that our model surpasses the state of the art on two benchmarks by a large margin, with only depth images as the input.
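The first step described above, turning a voxel grid into a point cloud by dropping visible empty voxels, can be sketched in a few lines. The three-state voxel encoding and the nested-list grid layout are hypothetical and chosen only for illustration; the abstract does not specify the data format.

```python
# Hypothetical voxel states for this sketch.
VISIBLE_EMPTY, OCCUPIED, OCCLUDED = 0, 1, 2


def voxels_to_points(grid):
    """Convert a dense grid[x][y][z] of states into a sparse point list.

    Visible empty voxels carry nothing to complete, so they are dropped;
    every remaining voxel becomes one point (x, y, z, state).
    """
    return [(x, y, z, state)
            for x, plane in enumerate(grid)
            for y, column in enumerate(plane)
            for z, state in enumerate(column)
            if state != VISIBLE_EMPTY]
```

The deeper, more expensive stream then only has to process the remaining points, while the dense grid is still available to a shallow voxel stream.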

翻訳日:2021-12-28 16:02:32 公開日:2021-12-24

# 一般化ゼロショット分類のための学習型クロスモーダル表現

Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification ( http://arxiv.org/abs/2112.12927v1 )

ライセンス: Link先を確認

Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin

(参考訳) クロスモーダルオートエンコーダの潜伏空間を整列させて一般的な潜伏埋め込みを学習することは、一般化ゼロショット分類(GZSC)の効果的な戦略である。しかし、粒度の細かいインスタンス単位のアノテーションが欠如しているため、多様化した画像の視覚表現と固定属性の意味表現との相違により、ドメインシフトの問題に悩まされる。本稿では,GZSCのためのアラインド・クロスモーダル表現(ACMR)を学習する,革新的なオートエンコーダネットワークを提案する。具体的には,学習した分類器によって導かれる潜在部分空間上でのクロスモーダル潜在特徴のアライメントを強化するための新しいビジョン・セマンティクスアライメント(vsa)法を提案する。さらに,潜伏変数の識別能力を高めるとともに,潜伏変数が崩壊する可能性を低減するための新しい情報拡張モジュール(IEM)を提案する。公開データセットに関する広範囲な実験により,本手法の最先端性能が実証された。

Learning a common latent embedding by aligning the latent spaces of cross-modal autoencoders is an effective strategy for Generalized Zero-Shot Classification (GZSC). However, due to the lack of fine-grained instance-wise annotations, this strategy still easily suffers from the domain shift problem caused by the discrepancy between the visual representation of diversified images and the semantic representation of fixed attributes. In this paper, we propose an innovative autoencoder network that learns Aligned Cross-Modal Representations (dubbed ACMR) for GZSC. Specifically, we propose a novel Vision-Semantic Alignment (VSA) method to strengthen the alignment of cross-modal latent features on the latent subspaces guided by a learned classifier. In addition, we propose a novel Information Enhancement Module (IEM) to reduce the possibility of latent-variable collapse while encouraging the discriminative ability of the latent variables. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.

翻訳日:2021-12-28 16:01:43 公開日:2021-12-24

# 意味セグメンテーションのためのリアルタイムグローバルアテンションネットワーク

Realtime Global Attention Network for Semantic Segmentation ( http://arxiv.org/abs/2112.12939v1 )

ライセンス: Link先を確認

Xi Mo, Xiangyu Chen

(参考訳) 本稿では,セマンティックセグメンテーションの課題に対して,エンドツーエンドのグローバルアテンションニューラルネットワーク(RGANet)を提案する。自己注意パラダイムによって展開される符号化戦略とは違って,提案するグローバルアテンションモジュールは,奥行きの畳み込みやアフィン変換を通じてグローバルアテンションを符号化する。これらのグローバルアテンションモジュールを階層アーキテクチャに統合することは、高い推論性能を維持する。さらに,非凸,広く散在する地盤トラス領域の負の効果を軽減するため,改良された評価指標であるMGRIDを提案する。セマンティックセグメンテーションのための最先端アーキテクチャに関する広範な実験の結果は、ロボット単眼視覚に対する提案手法の先進的な性能を示している。

In this paper, we propose an end-to-end realtime global attention neural network (RGANet) for the challenging task of semantic segmentation. Unlike the encoding strategy deployed by self-attention paradigms, the proposed global attention module encodes global attention via depth-wise convolution and affine transformations. Integrating these global attention modules into a hierarchical architecture maintains high inference performance. In addition, an improved evaluation metric, namely MGRID, is proposed to alleviate the negative effect of non-convex, widely scattered ground-truth areas. Results from extensive experiments on state-of-the-art architectures for semantic segmentation demonstrate the leading performance of the proposed approaches for robotic monocular visual perception.

翻訳日:2021-12-28 16:01:26 公開日:2021-12-24

# 入射ニューラル表現によるRGB画像からの連続スペクトル再構成

Continuous Spectral Reconstruction from RGB Images via Implicit Neural Representation ( http://arxiv.org/abs/2112.13003v1 )

ライセンス: Link先を確認

Ruikang Xu, Mingde Yao, Chang Chen, Lizhi Wang, Zhiwei Xiong

(参考訳) 既存のスペクトル再構成法は通常、RGB画像から多くのスペクトル帯域への離散写像を学ぶ。しかし、このモデリング戦略はスペクトルシグネチャの連続性を無視している。本稿では,新しい連続スペクトル表現を導入することにより,この限界を解消するためのニューラルスペクトル再構成(nesr)を提案する。この目的のために、暗黙の関数の概念を採用し、ニューラルネットワークを用いたパラメータ化実施を行う。具体的には,まずバックボーンネットワークを用いてRGB入力の空間的特徴を抽出する。本研究では,スペクトルプロファイル補間(spi)モジュールとニューラル・アテンション・マッピング(nam)モジュール(nam)モジュールを考案し,空間スペクトル相関がより良い表現に関わっている深い特徴を強調する。次に、サンプルスペクトルバンドの数を連続的な暗黙関数の座標と見なして、深い特徴からスペクトル強度への投影を学習する。広範な実験により、nesrのベースライン法に対する再構成精度の差が示される。さらにnesrは、任意の数のスペクトル帯域を目標出力として有効にすることで、スペクトル再構成の柔軟性を拡張する。

Existing methods for spectral reconstruction usually learn a discrete mapping from RGB images to a number of spectral bands. However, this modeling strategy ignores the continuous nature of the spectral signature. In this paper, we propose Neural Spectral Reconstruction (NeSR) to lift this limitation by introducing a novel continuous spectral representation. To this end, we embrace the concept of implicit functions and implement a parameterized embodiment with a neural network. Specifically, we first adopt a backbone network to extract spatial features of RGB inputs. Building on it, we devise a Spectral Profile Interpolation (SPI) module and a Neural Attention Mapping (NAM) module to enrich deep features, where the spatial-spectral correlation is involved for a better representation. Then, we view the number of sampled spectral bands as the coordinate of a continuous implicit function, so as to learn the projection from deep features to spectral intensities. Extensive experiments demonstrate the distinct advantage of NeSR in reconstruction accuracy over baseline methods. Moreover, NeSR extends the flexibility of spectral reconstruction by enabling an arbitrary number of spectral bands as the target output.

翻訳日:2021-12-28 16:01:14 公開日:2021-12-24

# ベンチマーク歩行者オドメトリ:brown pedestrian odometry dataset (bpod)

Benchmarking Pedestrian Odometry: The Brown Pedestrian Odometry Dataset (BPOD) ( http://arxiv.org/abs/2112.13018v1 )

ライセンス: Link先を確認

David Charatan, Hongyi Fan, Benjamin Kimia

(参考訳) 頭部装着歩行者設定における視覚計測アルゴリズムのベンチマークのためのBrown Pedestrian Odometry Dataset(BPOD)を提案する。このデータセットは、ブラウン大学のキャンパスの様々な屋内および屋外の12箇所で、グローバルおよびローリングシャッターステレオカメラを用いて撮影された。既存のデータセットと比較すると、BPODは画像のぼやけや自転を多く含んでいる。歩行者の経路に沿って設置されたスティックオンマーカーから地中軌道を生成し、第三者ビデオを用いて歩行者の位置を文書化する。 BPOD上での直接的・特徴的・学習型VO法の性能評価を行った。以上の結果から,歩行者軌跡の把握には重要な開発が必要であることが示唆された。データセットへのリンクはこちら。 https://doi.org/10.26300/c1n7-7p93

We present the Brown Pedestrian Odometry Dataset (BPOD) for benchmarking visual odometry algorithms in head-mounted pedestrian settings. This dataset was captured using synchronized global and rolling shutter stereo cameras in 12 diverse indoor and outdoor locations on Brown University's campus. Compared to existing datasets, BPOD contains more image blur and self-rotation, which are common in pedestrian odometry but rare elsewhere. Ground-truth trajectories are generated from stick-on markers placed along the pedestrian's path, and the pedestrian's position is documented using a third-person video. We evaluate the performance of representative direct, feature-based, and learning-based VO methods on BPOD. Our results show that significant development is needed to successfully capture pedestrian trajectories. The link to the dataset is here: https://doi.org/10.26300/c1n7-7p93

翻訳日:2021-12-28 16:00:56 公開日:2021-12-24

# SimViT:スライディングウィンドウを備えたシンプルな視覚変換器

SimViT: Exploring a Simple Vision Transformer with sliding windows ( http://arxiv.org/abs/2112.13085v1 )

ライセンス: Link先を確認

Gang Li, Di Xu, Xing Cheng, Lingyu Si, Changwen Zheng

(参考訳) 視覚変換器は多くの視覚タスクにおいてバックボーンモデルとして優れた性能を発揮しているが、そのほとんどは画像やウィンドウ内の全てのトークンのグローバルな関係を捉えることを目的としており、2D構造におけるパッチ間の固有の空間的および局所的相関を乱す。本稿では、空間構造と局所情報を視覚変換器に組み込むための、SimViTというシンプルな視覚変換器を提案する。具体的には,従来のマルチヘッド・セルフ・アテンションの代わりに,MCSA(Multi-head Central Self-Attention)を導入した。スライディングウィンドウの導入は、空間構造のキャプチャを容易にする。一方、SimViTは複数の層から複数の階層的特徴を抽出し、密集予測を行う。広範な実験により、simvitは様々な画像処理タスクの汎用バックボーンモデルとして効果的かつ効率的であることが示されている。特に我々のSimViT-Microは、ImageNet-1kデータセットで71.1%の精度を達成するために3.3Mパラメータしか必要としていない。私たちのコードはhttps://github.com/ucasligang/simvitで利用可能です。

Although vision Transformers have achieved excellent performance as backbone models in many vision tasks, most of them aim to capture global relations of all tokens in an image or a window, which disrupts the inherent spatial and local correlations between patches in the 2D structure. In this paper, we introduce a simple vision Transformer named SimViT to incorporate spatial structure and local information into vision Transformers. Specifically, we introduce Multi-head Central Self-Attention (MCSA) instead of conventional Multi-head Self-Attention to capture highly local relations. The introduction of sliding windows facilitates the capture of spatial structure. Meanwhile, SimViT extracts multi-scale hierarchical features from different layers for dense prediction tasks. Extensive experiments show that SimViT is effective and efficient as a general-purpose backbone model for various image processing tasks. In particular, our SimViT-Micro needs only 3.3M parameters to achieve 71.1% top-1 accuracy on the ImageNet-1k dataset, making it the smallest vision Transformer model to date. Our code will be available at https://github.com/ucasligang/SimViT.

翻訳日:2021-12-28 16:00:43 公開日:2021-12-24

# CatchBackdoor:差動ファズリングによる臨界トロイの木馬神経経路同定によるバックドアテスト

CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing ( http://arxiv.org/abs/2112.13064v1 )

ライセンス: Link先を確認

Haibo Jin, Ruoxi Chen, Jinyin Chen, Yao Cheng, Chong Fu, Ting Wang, Yue Yu, and Zhaoyan Ming

(参考訳) 現実世界のアプリケーションにおけるディープニューラルネットワーク(DNN)の成功は、豊富な事前学習モデルの恩恵を受けている。しかし、バックドアで事前訓練されたモデルは下流dnnの配備に重大な脅威をもたらす可能性がある。既存のDNNテスト手法は主に、敵の設定で誤ったコーナーケースの振る舞いを見つけるために設計されているが、強力なトロイの木馬によるバックドアの発見には失敗した。トロジャンネットワークの挙動を観察すると、それらは以前の研究で提案されたように単一の妥協ニューロンによって反映されるだけでなく、複数のニューロンの活性化強度と周波数における臨界神経経路に起因していることがわかる。この作業はDNNのバックドアテストを公式化し、CatchBackdoorフレームワークを提案する。少数の良性例からの臨界ニューロンの微分ファジングにより、トロイの木馬の経路、特に重要な経路を特定し、同定された経路の臨界ニューロンをシミュレートしてバックドアテストの例を生成する。大規模な実験は、既存の方法よりも高い検出性能を持つCatchBackdoorの優位性を実証している。 catchbackdoorは、既存の方法では検出できない、ステルスブレンドとアダプティブアタックによってバックドアを検知する。さらに,モデルゾウにおけるモデルバックドアの可能性を明らかにする実験を行った。

The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, the backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner case behaviors in adversarial settings but fail to discover the backdoors crafted by strong trojan attacks. Observing the trojan network behaviors shows that they are not just reflected by a single compromised neuron as proposed by previous work but attributed to the critical neural paths in the activation intensity and frequency of multiple neurons. This work formulates the DNN backdoor testing and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the trojan paths and particularly the critical ones, and generate backdoor testing examples by simulating the critical neurons in the identified paths. Extensive experiments demonstrate the superiority of CatchBackdoor, with higher detection performance than existing methods. CatchBackdoor works better on detecting backdoors by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal the potential backdoors of models in Model Zoo.

翻訳日:2021-12-28 15:16:03 公開日:2021-12-24

# deepgantt:バック散乱ネットワークのためのスケーラブルなディープラーニングスケジューラ

DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks ( http://arxiv.org/abs/2112.12985v1 )

ライセンス: Link先を確認

Daniel F. Perez-Ramirez, Carlos Perez-Penichet, Nicolas Tsiftes, Thiemo Voigt, Dejan Kostic, Magnus Boman

(参考訳) 最近のバックスキャッター通信技術は、無修正のコモディティ無線デバイスと直接通信しながらバッテリーなしで動作できる超低電力無線デバイスを可能にする。コモディティデバイスは、電池のないノードが環境からエネルギーを集めながら通信し、センサ、計算、通信タスクを行うために必要な無修正キャリアを提供するのに協力する。非変調キャリアの最適プロビジョニングは、NPハード組合せ最適化問題であるため、ネットワークのサイズを制限する。その結果、以前の作品はキャリアの最適化を完全に無視するか、あるいは準最適ヒューリスティックに頼り、貴重なエネルギーとスペクトル資源を浪費した。本稿では,無線通信機器と相互運用する電池フリーデバイスのためのディープラーニングスケジューラDeepGANTTを紹介する。 DeepGANTTはグラフニューラルネットワークを利用して、この問題に固有の可変入力と出力サイズを克服する。我々は,制約最適化解法から得られる比較的小さいサイズの最適スケジュールで,ディープラーニングスケジューラを訓練する。 DeepGANTTは、慎重に設計されたヒューリスティックなソリューションよりも優れているだけでなく、トレーニングされた問題サイズで最適なスケジューラの約3%で性能を発揮する。最後に、DeepGANTTはトレーニングで使用される最大値の4倍以上の問題を一般化し、最適なスケジューラのスケーラビリティの限界を破り、より効率的な後方散乱ネットワークを実現する。

Recent backscatter communication techniques enable ultra low power wireless devices that operate without batteries while interoperating directly with unmodified commodity wireless devices. Commodity devices cooperate in providing the unmodulated carrier that the battery-free nodes need to communicate while collecting energy from their environment to perform sensing, computation, and communication tasks. The optimal provision of the unmodulated carrier limits the size of the network because it is an NP-hard combinatorial optimization problem. Consequently, previous works either ignore carrier optimization altogether or resort to suboptimal heuristics, wasting valuable energy and spectral resources. We present DeepGANTT, a deep learning scheduler for battery-free devices interoperating with wireless commodity ones. DeepGANTT leverages graph neural networks to overcome variable input and output size challenges inherent to this problem. We train our deep learning scheduler with optimal schedules of relatively small size obtained from a constraint optimization solver. DeepGANTT not only outperforms a carefully crafted heuristic solution but also performs within ~3% of the optimal scheduler on trained problem sizes. Finally, DeepGANTT generalizes to problems more than four times larger than the maximum used for training, therefore breaking the scalability limitations of the optimal scheduler and paving the way for more efficient backscatter networks.

翻訳日:2021-12-28 15:13:02 公開日:2021-12-24

# パーソナライズタスクにおける状態空間クラスタリングの無理な効率性について

On the Unreasonable Efficiency of State Space Clustering in Personalization Tasks ( http://arxiv.org/abs/2112.13141v1 )

ライセンス: Link先を確認

Anton Dereventsov, Ranga Raju Vatsavai, Clayton Webster

(参考訳) 本研究では,複雑な報酬信号を用いてパーソナライズタスクを解決するための強化学習(rl)手法を検討する。特に,ネットワークアーキテクチャや最適化アルゴリズムの従来の選択に加えて,単純な$k$-meansアルゴリズムを用いた状態空間クラスタリングを基本としたアプローチである。数値例は異なるrl手順の効率を示し、この手法がエージェントの学習能力を加速し、エージェントの性能を制限しないことを示すために用いられる。

In this effort we consider a reinforcement learning (RL) technique for solving personalization tasks with complex reward signals. In particular, our approach is based on state space clustering with the use of a simplistic $k$-means algorithm as well as conventional choices of the network architectures and optimization algorithms. Numerical examples demonstrate the efficiency of different RL procedures and are used to illustrate that this technique accelerates the agent's ability to learn and does not restrict the agent's performance.
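The state-space clustering step mentioned in the abstract can be sketched with a plain $k$-means routine. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the deterministic initialisation from the first `k` states, and the toy two-dimensional states are all hypothetical. The cluster index returned for each state could then stand in for the raw state as a discrete observation for an RL agent.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two state vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(states: List[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Cluster state vectors with Lloyd's algorithm.

    Centroids start at the first k states (a deterministic choice made
    for this sketch; random or k-means++ seeding is more common).
    """
    centroids = [list(s) for s in states[:k]]
    assignment = [0] * len(states)
    for _ in range(iterations):
        # Assignment step: each state goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in states
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [s for s, a in zip(states, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


toy_states = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
toy_centroids, toy_clusters = kmeans(toy_states, k=2)
```

With the toy states above, the two well-separated groups end up in clusters 0 and 1, so a five-element continuous state set collapses to two discrete states.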

翻訳日:2021-12-28 15:12:41 公開日:2021-12-24

# NIP: 対人攻撃に対するニューロンレベルの逆摂動

NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks ( http://arxiv.org/abs/2112.13060v1 )

ライセンス: Link先を確認

Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Yue Yu and Shouling Ji

(参考訳) ディープラーニングモデルは前例のない成功を収めたものの、敵攻撃に対する脆弱性は、特にセキュリティクリティカルなドメインにデプロイする場合、注目を集めている。この課題に対処するために、反応性と積極的な戦略を含む多くの防衛戦略が提案されている。画像特徴空間の観点からは、特徴の変化により満足な結果に到達できないものもある。さらに、モデルによって学習された特徴は、分類結果に直接関連しない。それらと異なり、防御法は基本的に内部モデルと異なり、攻撃前後のニューロンの挙動を調査する。我々は、攻撃が正しいラベルに最も寄与するニューロンを劇的に変化させることでモデルに誤解をもたらすことを観察した。そこで我々は、ニューロンの影響の概念を導入し、ニューロンを前部、中部、尾部に分割する。そこで本研究では,神経レベル逆摂動法(NIP)を提案する。前方のニューロンを強化し、尾部のニューロンを弱めることで、NIPは高い良性を維持しながらほぼ全ての敵の摂動を排除できる。さらに、適応性(特に大きいもの)による摂動の大きさの違いにも対処できる。 3つのデータセットと6つのモデルで包括的な実験を行った結果、nipは11の敵の攻撃に対して最先端のベースラインよりも優れていた。さらに,ニューロンの活性化と可視化による解釈可能な証明を提供し,理解を深める。

Although deep learning models have achieved unprecedented success, their vulnerabilities towards adversarial attacks have attracted increasing attention, especially when deployed in security-critical domains. To address the challenge, numerous defense strategies, including reactive and proactive ones, have been proposed for robustness improvement. From the perspective of image feature space, some of them cannot reach satisfying results due to the shift of features. Besides, features learned by models are not directly related to classification results. In contrast, we approach defense from inside the model and investigate neuron behaviors before and after attacks. We observe that attacks mislead the model by dramatically changing the neurons that contribute most and least to the correct label. Motivated by this, we introduce the concept of neuron influence and further divide neurons into front, middle and tail parts. Based on this division, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks. By strengthening front neurons and weakening those in the tail part, NIP can eliminate nearly all adversarial perturbations while still maintaining high benign accuracy. Besides, it can cope with different sizes of perturbations via adaptivity, especially larger ones. Comprehensive experiments conducted on three datasets and six models show that NIP outperforms the state-of-the-art baselines against eleven adversarial attacks. We further provide interpretable proofs via neuron activation and visualization for better understanding.

翻訳日:2021-12-28 14:47:03 公開日:2021-12-24

# テキストスタックのスポイラー:トランスフォーマーはどの程度役に立つのか?

Spoiler in a Textstack: How Much Can Transformers Help? ( http://arxiv.org/abs/2112.12913v1 )

ライセンス: Link先を確認

Anna Wróblewska, Paweł Rzepiński, Sylwia Sysko-Romańczuk

(参考訳) 本稿では,レビューにおけるスポイラー検出に関する研究について述べる。本稿では、利用可能なテキストベースのモデルタスクを微調整し、整理する手法について、最新のディープラーニングの成果とモデルの結果を解釈する手法について述べる。これまで、スポイラー研究は文献にはほとんど記述されていない。我々は,アノテート付きスポイラを備えた2つのオープンデータセット上で,転送学習アプローチと異なる最新のトランスフォーマーアーキテクチャをテストした(ROC AUCはTV Tropes Moviesデータセットで81%以上,Goodreadsデータセットで88%以上)。また、データを収集し、きめ細かいアノテーションで新しいデータセットを組み立てました。そこで我々は,モデルの信頼性を評価し,その結果を説明するために,解釈可能性技術と尺度を用いた。

This paper presents our research regarding spoiler detection in reviews. In this use case, we describe the method of fine-tuning and organizing the available text-based model tasks with the latest deep learning achievements and techniques to interpret the models' results. Until now, spoiler research has been rarely described in the literature. We tested the transfer learning approach and different latest transformer architectures on two open datasets with annotated spoilers (ROC AUC above 81% on the TV Tropes Movies dataset, and above 88% on the Goodreads dataset). We also collected data and assembled a new dataset with fine-grained annotations. To that end, we employed interpretability techniques and measures to assess the models' reliability and explain their results.

翻訳日:2021-12-28 14:46:09 公開日:2021-12-24

# モノトーン増分量子化解を用いた確率学習方程式

Stochastic Learning Equation using Monotone Increasing Resolution of Quantization ( http://arxiv.org/abs/2112.13006v1 )

ライセンス: Link先を確認

Jinwuk Seok, Jeong-Si Kim

(参考訳) 本稿では,提案アルゴリズムの量子化と確率解析の解法を単調に拡張した量子化学習方程式を提案する。密度分布と均一分布の量子化誤差に対するホワイトノイズ仮説によれば、量子化誤差をホワイトノイズとみなすことができる。このことから,単調に量子化分解能が増加する学習方程式は分布の観点として弱く収束することを示した。本稿では,対象関数のヘシアン制約のような局所収束特性の代わりに,リプシッツ条件を満たす領域に対して,大域的最適化が可能であることを示す。

In this paper, we propose a quantized learning equation with a monotonically increasing quantization resolution, together with a stochastic analysis of the proposed algorithm. According to the white noise hypothesis for a quantization error with dense and uniform distribution, we can regard the quantization error as i.i.d. white noise. Based on this, we show that the learning equation with monotonically increasing quantization resolution converges weakly, that is, in the sense of distributions. The analysis in this paper shows that global optimization is possible for a domain that satisfies the Lipschitz condition, instead of relying on local convergence properties such as a Hessian constraint on the objective function.
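A quantized update with a monotonically increasing resolution can be sketched as follows. The abstract does not give the learning equation itself, so everything here is an assumption for illustration: the uniform quantizer `quantize`, the doubling schedule `resolution_at`, and the toy quadratic objective are hypothetical choices, and the sketch is plain quantized gradient descent rather than the authors' algorithm.

```python
def quantize(value: float, resolution: int) -> float:
    """Snap value to the nearest multiple of 1 / resolution.

    Python's round() resolves exact ties to the even integer; either
    way the quantization error is at most 0.5 / resolution.
    """
    return round(value * resolution) / resolution


def resolution_at(step: int) -> int:
    """Hypothetical schedule: the resolution doubles at every step."""
    return 2 ** step


def quantized_descent(target: float, steps: int = 20, lr: float = 0.25) -> float:
    """Minimise (w - target)^2 with a quantized update at each step."""
    w = 0.0
    for step in range(1, steps + 1):
        gradient = 2.0 * (w - target)
        # The update is quantized on a grid that gets finer over time.
        w = quantize(w - lr * gradient, resolution_at(step))
    return w


final_w = quantized_descent(target=0.7391)
```

Because the grid spacing shrinks as the step count grows, the quantization error injected at each step shrinks too, so the iterate can approach the minimiser instead of stalling on a coarse grid.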

翻訳日:2021-12-28 14:44:51 公開日:2021-12-24

# 検証セットのないdart: 限界可能性の最適化

DARTS without a Validation Set: Optimizing the Marginal Likelihood ( http://arxiv.org/abs/2112.13023v1 )

ライセンス: Link先を確認

Miroslav Fil, Binxin Ru, Clare Lyle, Yarin Gal

(参考訳) neural architecture search (nas)の成功は、歴史的に、過剰な計算要求によって制限されている。 DARTSのような現代のウェイトシェアリングNASメソッドは、1桁のGPU日で検索を終了できるが、共有ウェイトから最終最高のアーキテクチャを抽出することは信頼性が低いことで知られている。ベイズ限度解釈を用いた最近開発された一般化推定器であるtraining-speed-estimate (tse)は、以前はdartの勾配に基づく最適化の検証損失の代わりに使用されてきた。これによりDARTSのスキップ接続崩壊が防止され、NASBench-201と元のDARTS検索スペースの性能が大幅に向上する。各種DARTS診断を適用し,検証セットを使用しない異常な動作を示すことで,これらの結果を拡張した。さらに,本実験は,操作選択に比べて文献上ではあまり注目されていないにもかかわらず,探索性能に強い負の影響を与えるdartの深さギャップとトポロジ選択の具体例を与える。

The success of neural architecture search (NAS) has historically been limited by excessive compute requirements. While modern weight-sharing NAS methods such as DARTS are able to finish the search in single-digit GPU days, extracting the final best architecture from the shared weights is notoriously unreliable. Training-Speed-Estimate (TSE), a recently developed generalization estimator with a Bayesian marginal likelihood interpretation, has previously been used in place of the validation loss for gradient-based optimization in DARTS. This prevents the DARTS skip connection collapse, which significantly improves performance on NASBench-201 and the original DARTS search space. We extend those results by applying various DARTS diagnostics and show several unusual behaviors arising from not using a validation set. Furthermore, our experiments yield concrete examples of the depth gap and topology selection in DARTS having a strongly negative impact on the search performance despite generally receiving limited attention in the literature compared to the operations selection.

翻訳日:2021-12-28 14:43:15 公開日:2021-12-24

# bi型不均質グラフ学習のための2重階層型アテンションネットワーク

Dual Hierarchical Attention Networks for Bi-typed Heterogeneous Graph Learning ( http://arxiv.org/abs/2112.13078v1 )

ライセンス: Link先を確認

Yu Zhao, Shaopeng Wei, Huaming Du, Xingyan Chen, Qing Li, Fuzhen Zhuang, Ji Liu and Gang Kou

(参考訳) バイタイプ不均一グラフは多くの実世界のシナリオに適用される。しかし、以前の異種グラフ学習研究は、通常、そのような異種グラフの双タイプ実体間の複雑な相互作用を無視する。本稿では,クラス間およびクラス間階層的階層的アテンションネットワークを用いて,二型不均質グラフ上での包括的ノード表現を学習する新しい二重階層的アテンションネットワーク(dhan)を提案する。具体的には、クラス内の注意は、同じタイプの隣人からノード表現を学ぶことを目的としており、クラス間の注意は、異なる種類の隣人からノード表現を集約することができる。したがって、デュアルアテンション操作により、DHANはノード内隣接情報だけでなく、二型不均一グラフ内のクラス間隣接情報も十分に活用することができる。両種異種学習ノードの包括的表現におけるDHANの有効性を十分に確認する各種課題に関する実験結果

Bi-typed heterogeneous graphs are applied in many real-world scenarios. However, previous heterogeneous graph learning studies usually ignore the complex interactions among the bi-typed entities in such heterogeneous graphs. To address this issue, in this paper we propose a novel Dual Hierarchical Attention Networks (DHAN) to learn comprehensive node representations on the bi-typed heterogeneous graphs with intra-class and inter-class hierarchical attention networks. Specifically, the intra-class attention aims to learn the node representations from its same type of neighbors, while inter-class attention is able to aggregate node representations from its different types of neighbors. Hence, the dual attention operations enable DHAN to sufficiently leverage not only the node intra-class neighboring information but also the inter-class neighboring information in the bi-typed heterogeneous graph. Experimental results on various tasks against the state-of-the-arts sufficiently confirm the capability of DHAN in learning node comprehensive representations on the bi-typed heterogeneous

翻訳日:2021-12-28 14:42:56 公開日:2021-12-24

# 解釈型強化学習に関する調査研究

A Survey on Interpretable Reinforcement Learning ( http://arxiv.org/abs/2112.13112v1 )

ライセンス: Link先を確認

Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao and Wulong Liu

(参考訳) 深層強化学習は、逐次的な意思決定問題に対して有望な機械学習アプローチとなっているが、自律運転や医療アプリケーションといった高度な領域では十分に成熟していない。そのような状況下では、例えば、学習されたポリシーは解釈可能で、配置前に検査される(例えば、安全性と検証可能性のために)必要がある。本調査は強化学習(RL)における高い解釈可能性を実現するための様々なアプローチの概要を提供する。その目的のために、解釈可能性(モデルの特性として)と説明可能性(プロキシの介入によるポストホック操作として)を区別し、それらをrlの文脈で以前の概念を強調して議論する。特に、解釈可能なRLは、解釈可能な入力、解釈可能な(遷移/回帰)モデル、解釈可能な意思決定など、異なる側面を受け入れることができると論じる。このスキームに基づいて,過去10年間の論文を中心に,解釈可能なRLに関する最近の研究の要約と分析を行った。また,いくつかの研究分野を簡潔に議論し,有望な研究の方向性を指摘する。

Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such contexts, a learned policy needs for instance to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that aim, we distinguish interpretability (as a property of a model) and explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL with an emphasis on papers published in the past 10 years. We also discuss briefly some related research areas and point to some potential promising research directions.

翻訳日:2021-12-28 14:42:41 公開日:2021-12-24

# 集合フィードバックをもつガウス過程帯域

Gaussian Process Bandits with Aggregated Feedback ( http://arxiv.org/abs/2112.13029v1 )

ライセンス: Link先を確認

Mengyan Zhang, Russell Tsuchida, Cheng Soon Ong

(参考訳) 我々は,固定予算内で最高の武器を総括的フィードバックの下で推薦するという新しい設定の下で,連続武装バンディット問題を考える。これは、正確な報酬を得るのが不可能または高価であるアプリケーションによって動機付けられ、サブセットを超える平均のような集約された報酬やフィードバックが利用可能である。報奨関数の集合はガウス過程からのものであると仮定して制約し、ガウス過程最適化最適化(GPOO)アルゴリズムを提案する。ノードをアーム空間のサブセットとする木を適応的に構築し、フィードバックがノードの代表者の報酬の集合である。我々は,推奨する腕に対するフィードバックの集約に関して,新たな単純な後悔概念を提案する。本稿では,提案アルゴリズムの理論的解析を行い,特別な場合として単一点フィードバックを復元する。 GPOOを例示し、シミュレーションデータの関連アルゴリズムと比較する。

We consider the continuum-armed bandits problem, under a novel setting of recommending the best arms within a fixed budget under aggregated feedback. This is motivated by applications where the precise rewards are impossible or expensive to obtain, while an aggregated reward or feedback, such as the average over a subset, is available. We constrain the set of reward functions by assuming that they are from a Gaussian Process and propose the Gaussian Process Optimistic Optimisation (GPOO) algorithm. We adaptively construct a tree with nodes as subsets of the arm space, where the feedback is the aggregated reward of representatives of a node. We propose a new simple regret notion with respect to aggregated feedback on the recommended arms. We provide theoretical analysis for the proposed algorithm, and recover single point feedback as a special case. We illustrate GPOO and compare it with related algorithms on simulated data.

翻訳日:2021-12-28 14:39:11 公開日:2021-12-24

# シーンのテキスト認識に優れたテキスト推論を可能にするビジュアルセマンティクス

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition ( http://arxiv.org/abs/2112.12916v1 )

ライセンス: Link先を確認

Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du

(参考訳) 既存のシーンテキスト認識(str)手法は、典型的には言語モデルを使用して、視覚認識(vr)モデルによって予測される1d文字系列の結合確率を最適化する。この問題に対処するため,本論文では,視覚意味論に基づくテキスト推論を初めて試みる。技術的には、vrモデルによって予測される文字分割マップを考えると、各インスタンスにサブグラフを構築し、ノードがその中のピクセルを表し、ノード間のエッジはその空間的類似性に基づいて追加される。その後、これらの部分グラフはルートノードによって順次接続され、完全なグラフにマージされる。このグラフに基づいて,テキスト推論(GTR)のためのグラフ畳み込みネットワークを考案し,これをクロスエントロピー損失で監視する。 GTRは、テキスト推論の改善によるパフォーマンス向上のために、代表STRモデルに簡単にプラグインできる。具体的には,セグメンテーションベースのSTRベースラインでGTRを言語モデルに並列化することで,S-GTRというモデルを構築し,相互学習による視覚言語的相補性を効果的に活用する。 S-GTRは6つのSTRベンチマークに新しい最先端をセットし、多言語データセットに最適化する。コードはhttps://github.com/adeline-cs/GTRで入手できる。

Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model. This ignores the 2D spatial context of visual semantics within and between character instances, so these methods do not generalize well to arbitrarily shaped scene text. To address this issue, we make the first attempt to perform textual reasoning based on visual semantics in this paper. Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity. Then, these subgraphs are sequentially connected by their root nodes and merged into a complete graph. Based on this graph, we devise a graph convolutional network for textual reasoning (GTR) by supervising it with a cross-entropy loss. GTR can be easily plugged into representative STR models to improve their performance owing to better textual reasoning. Specifically, we construct our model, namely S-GTR, by paralleling GTR to the language model in a segmentation-based STR baseline, which can effectively exploit the visual-linguistic complementarity via mutual learning. S-GTR sets a new state of the art on six challenging STR benchmarks and generalizes well to multilingual datasets. Code is available at https://github.com/adeline-cs/GTR.

翻訳日:2021-12-28 14:17:47 公開日:2021-12-24

# ニューラル言語モデルにおける反事実記憶

Counterfactual Memorization in Neural Language Models ( http://arxiv.org/abs/2112.12938v1 )

ライセンス: Link先を確認

Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini

(参考訳) NLPの様々なタスクで広く使用されている現代のニューラル言語モデルは、トレーニングデータから機密情報を記憶するリスクがある。モデルがパラメータ、トレーニングデータ、計算でスケールアップを続けるにつれ、言語モデルにおける記憶の理解は、学習理論の観点から重要であると同時に、現実のアプリケーションにおいても実用上不可欠である。言語モデルにおける暗記に関する以前の研究における未解決の問題は、「一般的な」暗記をフィルターする方法である。実際、ほとんどの記憶基準はトレーニングセットの出現回数と強く相関しており、慣れ親しんだフレーズや公的な知識、テンプレート化されたテキストなどの「一般的な」記憶を捉えている。本稿では,心理学における人間の記憶の分類から着想を得た原則的視点を提供する。この観点から、トレーニング中に特定の文書が省略された場合、モデルの予測がどのように変化するかを特徴付ける反事実記憶の概念を定式化する。標準テキストデータセットにおける反事実的に記憶されたトレーニング例を同定し,検討する。さらに、各トレーニング例が検証セットと生成されたテキストに与える影響を推定し、これがテスト時の記憶源の直接的な証拠となることを示す。

Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data. As models continue to scale up in parameters, training data, and compute, understanding memorization in language models is both important from a learning-theoretical point of view, and is practically crucial in real world applications. An open question in previous studies of memorization in language models is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing "common" memorization such as familiar phrases, public knowledge or templated texts. In this paper, we provide a principled perspective inspired by a taxonomy of human memory in Psychology. From this perspective, we formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We further estimate the influence of each training example on the validation set and on generated texts, and show that this can provide direct evidence of the source of memorization at test time.
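The notion of counterfactual memorization, how a model's predictions on a document change when that document is omitted from training, can be illustrated with a small estimator. This is a sketch under assumptions: it supposes that several models have been trained on different subsets of the data and that a per-example score (for instance accuracy on that example) is available for each model. The `Run` layout, the function name, and the toy numbers are hypothetical and do not come from the paper.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

# One run = (ids of the examples used for training, score per example).
Run = Tuple[Set[str], Dict[str, float]]


def counterfactual_memorization(example_id: str, runs: List[Run]) -> float:
    """Mean score on the example when it was trained on, minus the mean
    score on the same example when it was left out of training."""
    seen = [scores[example_id] for subset, scores in runs if example_id in subset]
    unseen = [scores[example_id] for subset, scores in runs if example_id not in subset]
    if not seen or not unseen:
        raise ValueError("need runs both with and without the example")
    return mean(seen) - mean(unseen)


# Toy data: "rare" is only predicted well when trained on; "common" is
# predicted well either way (e.g. a templated or widely repeated text).
toy_runs: List[Run] = [
    ({"rare", "common"}, {"rare": 0.9, "common": 0.8}),
    ({"rare"}, {"rare": 0.9, "common": 0.7}),
    ({"common"}, {"rare": 0.1, "common": 0.8}),
    (set(), {"rare": 0.1, "common": 0.8}),
]
rare_score = counterfactual_memorization("rare", toy_runs)
common_score = counterfactual_memorization("common", toy_runs)
```

In the toy data the "rare" document gets a large score because the models only do well on it after seeing it, while the "common" document scores near zero, which is the separation from "common" memorization that the abstract asks for.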

翻訳日:2021-12-28 13:57:50 公開日:2021-12-24

# マルチスケール機能融合:道路ポットホール検出のためのセマンティックセグメンテーションの学習

Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection ( http://arxiv.org/abs/2112.13082v1 )

ライセンス: Link先を確認

Jiahe Fan, Mohammud J. Bocus, Brett Hosking, Rigen Wu, Yanan Liu, Sergey Vityazev, Rui Fan

(参考訳) 本稿では,単一モーダル意味セグメンテーションに基づく新しいポットホール検出手法を提案する。まず、畳み込みニューラルネットワークを用いて入力画像から視覚的特徴を抽出する。次に、チャネルアテンションモジュールが、異なる特徴マップの一貫性を高めるためにチャネル特徴を再重み付けする。続いて,空間的コンテキスト情報を統合するために,アトラス空間ピラミッドプーリングモジュール(拡張率を段階的に高めたアトラス畳み込みを直列に並べたもの)を用いる。これにより、ポットホールと無傷道路の区別が容易になる。最後に, 提案したマルチスケール機能融合モジュールを用いて, 隣接層内の特徴マップを融合する。これにより、異なる特徴チャネル層間のセマンティクスギャップはさらに低減される。提案手法の有効性を実証するため,Pothole-600データセットを用いて実験を行った。定量的比較により,本手法はRGB画像と変換された視差画像の両方において最先端(SoTA)性能を実現し,3つのSoTA単一モーダルセマンティックセグメンテーションネットワークを上回った。

This paper presents a novel pothole detection approach based on single-modal semantic segmentation. It first extracts visual features from input images using a convolutional neural network. A channel attention module then reweighs the channel features to enhance the consistency of different feature maps. Subsequently, we employ an atrous spatial pyramid pooling module (comprising atrous convolutions in series, with progressive rates of dilation) to integrate the spatial context information. This helps better distinguish between potholes and undamaged road areas. Finally, the feature maps in the adjacent layers are fused using our proposed multi-scale feature fusion module. This further reduces the semantic gap between different feature channel layers. Extensive experiments were carried out on the Pothole-600 dataset to demonstrate the effectiveness of our proposed method. The quantitative comparisons suggest that our method achieves state-of-the-art (SoTA) performance on both RGB images and transformed disparity images, outperforming three SoTA single-modal semantic segmentation networks.
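The phrase "atrous convolutions in series, with progressive rates of dilation" can be made concrete with a one-dimensional sketch. This is a simplification under assumptions: the paper works on 2D feature maps inside a neural network, whereas the hypothetical helpers below use plain Python lists, a fixed kernel, and no padding, purely to show how stacking dilated convolutions widens the receptive field.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Valid (unpadded) 1D convolution whose taps are `dilation` apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def atrous_series(signal: Sequence[float], kernel: Sequence[float],
                  dilations: Sequence[int]) -> List[float]:
    """Apply dilated convolutions one after another (in series)."""
    out = list(signal)
    for dilation in dilations:
        out = dilated_conv1d(out, kernel, dilation)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


single = dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2)
stacked = atrous_series(list(range(16)), [1, 1, 1], dilations=[1, 2, 4])
field = receptive_field(3, [1, 2, 4])
```

Three size-3 kernels with dilation rates 1, 2 and 4 already cover 15 input samples per output, which is how a small number of layers can gather wide spatial context.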

Translated: 2021-12-28 13:57:33, published: 2021-12-24

# Does MAML Only Work via Feature Re-use? A Data Centric Perspective ( http://arxiv.org/abs/2112.13137v1 )

License: see the linked page

Brando Miranda, Yu-Xiong Wang and Sanmi Koyejo

Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks. Furthermore, other work has strongly suggested that Model Agnostic Meta-Learning (MAML) also works via this same method - by learning a good embedding. These observations highlight our lack of understanding of what meta-learning algorithms are doing and when they work. In this work, we provide empirical results that shed some light on how meta-learned MAML representations function. In particular, we identify three interesting properties: 1) In contrast to previous work, we show that it is possible to define a family of synthetic benchmarks that result in a low degree of feature re-use - suggesting that current few-shot learning benchmarks might not have the properties needed for the success of meta-learning algorithms; 2) meta-overfitting occurs when the number of classes (or concepts) is finite, and this issue disappears once the task has an unbounded number of concepts (e.g., online learning); 3) more adaptation at meta-test time with MAML does not necessarily result in a significant representation change or even an improvement in meta-test performance - even when training on our proposed synthetic benchmarks. We also suggest that to understand meta-learning algorithms better, we must go beyond tracking only absolute performance and, in addition, formally quantify the degree of meta-learning and track both metrics together. Reporting results in future work this way will help us identify the sources of meta-overfitting more accurately and help us design more flexible meta-learning algorithms that learn beyond fixed feature re-use. Finally, we conjecture that the core challenge of re-thinking meta-learning is in the design of few-shot learning data sets and benchmarks - rather than in the algorithms, as suggested by previous work.
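For readers unfamiliar with MAML, its two nested updates (adapt to each task, then update the shared initialization through that adaptation) can be shown on a deliberately tiny problem. The sketch below assumes scalar tasks with loss `(theta - c)**2`, one inner gradient step, and the same loss for adaptation and evaluation. The function name, task centers, and learning rates are hypothetical and unrelated to the paper's experiments.

```python
def maml_scalar(task_centers, theta=0.0, inner_lr=0.1, outer_lr=0.1, steps=200):
    """MAML with one inner step on tasks whose loss is (theta - c)**2."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_centers:
            # Inner loop: one gradient step on this task's loss.
            adapted = theta - inner_lr * 2.0 * (theta - c)
            # Outer gradient of (adapted - c)**2 with respect to theta,
            # using d(adapted)/d(theta) = 1 - 2 * inner_lr (chain rule).
            meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)
        # Outer loop: move the shared initialization.
        theta -= outer_lr * meta_grad / len(task_centers)
    return theta

print(maml_scalar([-1.0, 3.0]))
```

On this toy family the learned initialization settles at the mean of the task centers, the point from which one adaptation step helps every task equally.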

Translated: 2021-12-28 13:57:13, published: 2021-12-24

# The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence ( http://arxiv.org/abs/2112.13121v1 )

License: see the linked page

Brando Miranda, Yu-Xiong Wang and Sanmi Koyejo

It has been recently observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks. This raises important questions about when and how meta-learning algorithms should be deployed. In this paper, we make a first step in clarifying these questions by first formulating a computable metric for a few-shot learning benchmark that we hypothesize is predictive of whether meta-learning solutions will succeed or not. We name this metric the diversity coefficient of a few-shot learning benchmark. Using the diversity coefficient, we show that the MiniImagenet benchmark has zero diversity - according to twenty-four different ways to compute the diversity. We proceed to show that when making a fair comparison between MAML-learned solutions and transfer learning, both have identical meta-test accuracy. This suggests that transfer learning fails to outperform MAML - contrary to what previous work suggests. Together, these two facts provide the first test of whether diversity correlates with meta-learning success and therefore show that a diversity coefficient of zero correlates with a high similarity between transfer learning and MAML-learned solutions - especially at meta-test time. We therefore conjecture that meta-learned solutions have the same meta-test performance as transfer learning when the diversity coefficient is zero.

Translated: 2021-12-28 13:56:31, published: 2021-12-24

# Dual Path Structural Contrastive Embeddings for Learning Novel Objects ( http://arxiv.org/abs/2112.12359v2 )

License: CC BY 4.0

Bingbin Li, Elvis Han Cui, Yanan Li, Donghui Wang, Weng Kee Wong

Learning novel classes from very few labeled samples has attracted increasing attention in machine learning. Recent research on both meta-learning based and transfer-learning based paradigms demonstrates that gaining information on a good feature space can be an effective solution to achieve favorable performance on few-shot tasks. In this paper, we propose a simple but effective paradigm that decouples the tasks of learning feature representations and classifiers and only learns the feature embedding architecture from base classes via the typical transfer-learning training strategy. To maintain both the generalization ability across base and novel classes and discrimination ability within each class, we propose a dual path feature learning scheme that effectively combines structural similarity with contrastive feature construction. In this way, both inner-class alignment and inter-class uniformity can be well balanced, resulting in improved performance. Experiments on three popular benchmarks show that when incorporated with a simple prototype based classifier, our method can still achieve promising results for both standard and generalized few-shot problems in either an inductive or transductive inference setting.
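A prototype based classifier of the kind mentioned above typically represents each class by the mean of its support embeddings and assigns a query to the nearest mean. The sketch below shows that generic recipe with Euclidean distance. The embeddings are made-up 2D points rather than learned features, and the helper names are hypothetical; the paper's exact classifier is not specified in the abstract.

```python
import math

def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy two-way, two-shot episode.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["a", "a", "b", "b"]
prototypes = class_prototypes(support, support_labels)
print(prototypes)
print(classify([1.0, 1.0], prototypes), classify([4.0, 5.0], prototypes))
```

Because the classifier has no trainable parameters of its own, its accuracy depends entirely on the quality of the embedding space, which is the point the abstract makes about decoupling features from classifiers.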

Translated: 2021-12-28 13:44:32, published: 2021-12-24

# Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization ( http://arxiv.org/abs/2112.12376v2 )

License: CC BY 4.0

Yihua Zhang, Guanhua Zhang, Prashant Khanduri, Mingyi Hong, Shiyu Chang, Sijia Liu

Adversarial training (AT) has become a widely recognized defense mechanism to improve the robustness of deep neural networks against adversarial attacks. It solves a min-max optimization problem, where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the min-max nature makes AT computationally intensive and thus difficult to scale. Meanwhile, the FAST-AT algorithm, and in fact many recent algorithms that improve AT, simplify the min-max based AT by replacing its maximization step with the simple one-shot gradient sign based attack generation step. Although easy to implement, FAST-AT lacks theoretical guarantees, and its practical performance can be unsatisfactory, suffering from robustness catastrophic overfitting when training with strong adversaries. In this paper, we propose to design FAST-AT from the perspective of bi-level optimization (BLO). We first make the key observation that the most commonly used algorithmic specification of FAST-AT is equivalent to using some gradient descent-type algorithm to solve a bi-level problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm performance. Based on the above observation, we propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Fast Bi-level AT (FAST-BAT). FAST-BAT is capable of defending against sign-based projected gradient descent (PGD) attacks without calling any gradient sign method or explicit robust regularization. Furthermore, we empirically show that our method outperforms state-of-the-art FAST-AT baselines, by achieving superior model robustness without inducing robustness catastrophic overfitting, or suffering from any loss of standard accuracy.
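The one-shot gradient sign attack that FAST-AT substitutes for the full maximization step perturbs each input coordinate by a fixed budget in the direction that increases the loss. The sketch below shows that step for a logistic-regression model, where the input gradient of the cross-entropy loss has the closed form `(p - y) * w`. The model, weights, and budget are hypothetical stand-ins for a deep network, and this is the sign-based baseline step, not FAST-BAT.

```python
import math

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_logistic(x, y, w, b, eps):
    """One gradient-sign step on the cross-entropy loss of a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))        # predicted probability of class 1
    grad = [(p - y) * wi for wi in w]     # d(loss)/d(x) in closed form
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1
x_adv = fgsm_logistic(x, y, w, b, eps=0.1)
print(x_adv)
```

The `sign` call is the discrete operation that, according to the abstract, makes the behavior of FAST-AT hard to analyze.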

Translated: 2021-12-28 13:14:35, published: 2021-12-24

# LaTr: Layout-Aware Transformer for Scene-Text VQA ( http://arxiv.org/abs/2112.12494v2 )

License: CC BY 4.0

Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact of each modality, and reveal the importance of the language module, especially when enriched with layout information. Accounting for this, we propose a single objective pre-training scheme that requires only text and spatial cues. We show that applying this pre-training scheme on scanned documents has certain advantages over using natural images, despite the domain gap. Scanned documents are easy to procure, text-dense and have a variety of layouts, helping the model learn various spatial cues (e.g., left-of, below, etc.) by tying together language and layout information. Compared to existing approaches, our method performs vocabulary-free decoding and, as shown, generalizes well beyond the training vocabulary. We further demonstrate that LaTr improves robustness towards OCR errors, a common reason for failure cases in STVQA. In addition, by leveraging a vision transformer, we eliminate the need for an external object detector. LaTr outperforms state-of-the-art STVQA methods on multiple datasets. In particular, it gains +7.6% on TextVQA, +10.8% on ST-VQA and +4.0% on OCR-VQA (all absolute accuracy numbers).

Translated: 2021-12-28 12:45:31, published: 2021-12-24

# Optimal learning of high-dimensional classification problems using deep neural networks ( http://arxiv.org/abs/2112.12555v2 )

License: CC BY 4.0

Philipp Petersen, Felix Voigtlaender

We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.

Translated: 2021-12-28 12:18:51, published: 2021-12-24

# BANMo: Building Animatable 3D Neural Models from Many Casual Videos ( http://arxiv.org/abs/2112.12761v2 )

License: see the linked page

Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

Prior work on articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos in a differentiable rendering framework. While the use of many videos provides more coverage of camera views and object articulations, they introduce significant challenges in establishing correspondence across scenes with different backgrounds, illumination conditions, etc. Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that allow for differentiable and invertible articulated deformations. When combined with canonical embeddings, such models allow us to establish dense correspondences across videos that can be self-supervised with cycle consistency. On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior works for humans and animals, with the ability to render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io.
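The classic blend skinning named as the first school of thought moves each surface point by a weighted average of the rigid transforms of nearby bones. The sketch below illustrates that classic linear form in 2D with hand-set weights. It is not BANMo's neural blend skinning, where the weights and transforms are learned; the helper names and the two toy bones are assumptions.

```python
import math

def rigid_2d(angle, tx, ty):
    """Build a 2D rigid transform: rotate by `angle`, then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    def apply(point):
        x, y = point
        return (c * x - s * y + tx, s * x + c * y + ty)
    return apply

def blend_skinning(point, weights, bone_transforms):
    """Linear blend skinning: weighted average of the point moved by each bone."""
    moved = [transform(point) for transform in bone_transforms]
    return (
        sum(w * m[0] for w, m in zip(weights, moved)),
        sum(w * m[1] for w, m in zip(weights, moved)),
    )

bones = [rigid_2d(0.0, 0.0, 0.0), rigid_2d(0.0, 2.0, 0.0)]  # fixed bone, shifted bone
print(blend_skinning((1.0, 1.0), [0.5, 0.5], bones))
```

A point weighted equally between a fixed bone and a bone shifted by two units moves one unit, which is the smooth interpolation that makes skinning weights animatable.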

Translated: 2021-12-28 12:16:40, published: 2021-12-24

PDF registration status (published: 20211224)