Fugu-MT: arXiv Paper Translations

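This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.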

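The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).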

Papers listed here have the publication date 20211101.

Title	Authors	Abstract	Publication date / Translation date
# Cooperation and dependencies in multipartite systems ( http://arxiv.org/abs/2003.12489v3 ) License: see linked page	Waldemar Klobus, Marek Miller, Mahasweta Pandit, Ray Ganardi, Lukas Knips, Jan Dziewior, Jasmin Meinecke, Harald Weinfurter, Wieslaw Laskowski, Tomasz Paterek	We propose an information-theoretic quantifier for the advantage gained from cooperation that captures the degree of dependency between subsystems of a global system. The quantifier is distinct from measures of multipartite correlations despite sharing many properties with them. It is directly computable for classical as well as quantum systems and reduces to comparing the respective conditional mutual information between any two subsystems. As an example, we show the benefits of using the new quantifier for symmetric quantum secret sharing. We also prove an inequality characterizing the lack of monotonicity of conditional mutual information under local operations and provide an intuitive understanding of it. This underlines the distinction between the multipartite dependence measure introduced here and multipartite correlations.	Translated: 2023-05-27 18:23:16 Published: 2021-11-01
# Simplifying the design of multilevel thermal machines using virtual qubits ( http://arxiv.org/abs/2009.03832v3 ) License: see linked page	Ayaka Usui, Wolfgang Niedenzu, Marcus Huber	Quantum thermodynamics often deals with the dynamics of small quantum machines interfacing with a large and complex environment. Virtual qubits, collisional models and reset master equations have become highly useful tools for predicting the qualitative behaviour of two-dimensional target systems coupled to few-qubit machines and a thermal environment. While few successes in matching the simplified model parameters for all possible physical systems are known, the qualitative predictions still allow for a general design of quantum machines irrespective of the implementation. We generalise these tools by introducing multiple competing virtual qubits for modelling multi-dimensional systems coupled to larger and more complex machines. By simulating the full physical dynamics for targets with three dimensions, we uncover general properties of reset models that can be used as 'dials' to correctly predict the qualitative features of physical changes in a realistic setup and thus design autonomous quantum machines beyond a few qubits. We then present a general analytic solution of the reset model for arbitrary-dimensional systems coupled to multi-qubit machines. Finally, we showcase an improved three-level laser as an exemplary application of our results.	Translated: 2023-05-03 05:04:55 Published: 2021-11-01
# Dynamical quantum Cherenkov transition of fast impurities in quantum liquids ( http://arxiv.org/abs/2101.00030v2 ) License: see linked page	Kushal Seetharam, Yulia Shchadilova, Fabian Grusdt, Mikhail B. Zvonarev, Eugene Demler	The challenge of understanding the dynamics of a mobile impurity in an interacting quantum many-body medium comes from the necessity of including entanglement between the impurity and excited states of the environment in a wide range of energy scales. In this paper, we investigate the motion of a finite-mass impurity injected into a three-dimensional quantum Bose fluid as it starts shedding Bogoliubov excitations. We uncover a transition in the dynamics as the impurity's velocity crosses a critical value which depends on the strength of the interaction between the impurity and bosons as well as the impurity's recoil energy. We find that in injection experiments, the two regimes differ not only in the character of the impurity velocity abatement, but also exhibit qualitative differences in the Loschmidt echo, density ripples excited in the BEC, and momentum distribution of scattered bosonic particles. The transition is a manifestation of a dynamical quantum Cherenkov effect, and should be experimentally observable with ultracold atoms using Ramsey interferometry, RF spectroscopy, absorption imaging, and time-of-flight imaging.	Translated: 2023-04-18 05:31:39 Published: 2021-11-01
# Computing Cliques and Cavities in Networks ( http://arxiv.org/abs/2101.00536v3 ) License: see linked page	Dinghua Shi, Zhifeng Chen, Xiang Sun, Qinghua Chen, Chuang Ma, Yang Lou, Guanrong Chen	Complex networks contain complete subgraphs such as nodes, edges, triangles, etc., referred to as simplices and cliques of different orders. Notably, cavities consisting of higher-order cliques play an important role in brain functions. Since searching for maximum cliques is an NP-complete problem, we use k-core decomposition to determine the computability of a given network. For a computable network, we design a search method with an implementable algorithm for finding cliques of different orders, obtaining also the Euler characteristic number. Then, we compute the Betti numbers by using the ranks of boundary matrices of adjacent cliques. Furthermore, we design an optimized algorithm for finding cavities of different orders. Finally, we apply the algorithm to the neuronal network of C. elegans with data from one typical dataset, and find all of its cliques and some cavities of different orders, providing a basis for further mathematical analysis and computation of its structure and function.	Translated: 2023-04-18 00:13:42 Published: 2021-11-01
# A Survey on Extraction of Causal Relations from Natural Language Text ( http://arxiv.org/abs/2101.06426v2 ) License: see linked page	Jie Yang, Soyeon Caren Han, Josiah Poon	As an essential component of human cognition, cause-effect relations appear frequently in text, and curating cause-effect relations from text helps in building causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each method has its advantages and weaknesses. For example, knowledge-based methods are understandable but require extensive manual domain knowledge and have poor cross-domain applicability. Statistical machine learning methods are more automated because of natural language processing (NLP) toolkits. However, feature engineering is labor-intensive, and toolkits may lead to error propagation. In the past few years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid increase in computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We first introduce the primary forms of causality addressed in causality extraction: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and modeling assessment methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight existing open challenges with their potential directions.	Translated: 2023-04-15 01:02:37 Published: 2021-11-01
# On solving classes of positive-definite quantum linear systems with quadratically improved runtime in the condition number ( http://arxiv.org/abs/2101.11868v3 ) License: see linked page	Davide Orsucci, Vedran Dunjko	Quantum algorithms for solving the Quantum Linear System (QLS) problem are among the most investigated quantum algorithms of recent times, with potential applications including the solution of computationally intractable differential equations and speed-ups in machine learning. A fundamental parameter governing the efficiency of QLS solvers is $\kappa$, the condition number of the coefficient matrix $A$, as it has been known since the inception of the QLS problem that for worst-case instances the runtime scales at least linearly in $\kappa$ [Harrow, Hassidim and Lloyd, PRL 103, 150502 (2009)]. However, for the case of positive-definite matrices classical algorithms can solve linear systems with a runtime scaling as $\sqrt{\kappa}$, a quadratic improvement compared to the indefinite case. It is then natural to ask whether QLS solvers may hold an analogous improvement. In this work we answer the question in the negative, showing that solving a QLS entails a runtime linear in $\kappa$ also when $A$ is positive definite. We then identify broad classes of positive-definite QLS where this lower bound can be circumvented and present two new quantum algorithms featuring a quadratic speed-up in $\kappa$: the first is based on efficiently implementing a matrix-block-encoding of $A^{-1}$, the second constructs a decomposition of the form $A = L L^\dagger$ to precondition the system. These methods are widely applicable and both allow one to efficiently solve BQP-complete problems.	Translated: 2023-04-13 11:58:17 Published: 2021-11-01
# Effective Theory for the Measurement-Induced Phase Transition of Dirac Fermions ( http://arxiv.org/abs/2102.08381v5 ) License: see linked page	M. Buchhold, Y. Minoguchi, A. Altland, S. Diehl	A wave function exposed to measurements undergoes pure state dynamics, with deterministic unitary and probabilistic measurement-induced state updates, defining a quantum trajectory. For many-particle systems, the competition of these different elements of dynamics can give rise to a scenario similar to quantum phase transitions. To access it despite the randomness of single quantum trajectories, we construct an $n$-replica Keldysh field theory for the ensemble average of the $n$-th moment of the trajectory projector. A key finding is that this field theory decouples into one set of degrees of freedom that heats up indefinitely, while $n-1$ others can be cast into the form of pure state evolutions generated by an effective non-Hermitian Hamiltonian. This decoupling is exact for free theories, and useful for interacting ones. In particular, we study locally measured Dirac fermions in $(1+1)$ dimensions, which can be bosonized to a monitored interacting Luttinger liquid at long wavelengths. For this model, the non-Hermitian Hamiltonian corresponds to a quantum sine-Gordon model with complex coefficients. A renormalization group analysis reveals a gapless critical phase with logarithmic entanglement entropy growth, and a gapped area law phase, separated by a Berezinskii-Kosterlitz-Thouless transition. The physical picture emerging here is a pinning of the trajectory wave function into eigenstates of the measurement operators upon increasing the monitoring rate.	Translated: 2023-04-11 00:14:12 Published: 2021-11-01
# Jordan-Wigner transformation and qubits with nontrivial exchange rule ( http://arxiv.org/abs/2103.04629v3 ) License: see linked page	Alexander Yu. Vlasov	Well-known (spinless) fermionic qubits may need more subtle consideration in comparison with usual (spinful) fermions. In a model with local fermionic modes, formally only the 'occupied' states |1> could be relevant for antisymmetry with respect to particle interchange, whereas the 'vacuum' state |0> is not. Introducing an exchange rule for such fermionic qubits indexed by some 'positions' may look questionable due to the general super-selection principle. However, a consistent algebraic construction of such 'super-indexed' qubits is presented in this work. The method considered is related to the construction of super-spaces, but it differs in some respects from the standard definition of supersymmetry sometimes used for generalizations of the qubit model.	Translated: 2023-04-08 18:22:21 Published: 2021-11-01
# Encoding strongly-correlated many-boson wavefunctions on a photonic quantum computer: application to the attractive Bose-Hubbard model ( http://arxiv.org/abs/2103.15021v4 ) License: see linked page	Saad Yalouz, Bruno Senjean, Filippo Miatto, Vedran Dunjko	Variational quantum algorithms (VQA) are considered some of the most promising methods to determine the properties of complex strongly correlated quantum many-body systems, especially from the perspective of devices available in the near term. In this context, the development of efficient quantum circuit ansatze to encode a many-body wavefunction is one of the keys for the success of a VQA. Great efforts have been invested to study the potential of current quantum devices to encode the eigenstates of fermionic systems, but little is known about the encoding of bosonic systems. In this work, we investigate the encoding of the ground state of the (simple but rich) attractive Bose-Hubbard model using a Continuous-Variable (CV) photonic-based quantum circuit. We introduce two different ansatz architectures and demonstrate that the proposed continuous variable quantum circuits can efficiently encode (with a fidelity higher than 99%) the strongly correlated many-boson wavefunction with just a few layers, in all many-body regimes and for different numbers of bosons and initial states. Beyond the study of the suitability of the ansatz to approximate the ground states of many-boson systems, we also perform initial evaluations of the use of the ansatz in a variational quantum eigensolver algorithm to find the ground state through energy minimization. To this end we also introduce a scheme to measure the Hamiltonian energy in an experimental system, and study the effect of sampling noise.	Translated: 2023-04-06 08:10:37 Published: 2021-11-01
# Super-Heisenberg scaling in Hamiltonian parameter estimation in the long-range Kitaev chain ( http://arxiv.org/abs/2104.07120v2 ) License: see linked page	Jing Yang, Shengshi Pang, Adolfo del Campo, Andrew N. Jordan	In quantum metrology, nonlinear many-body interactions can enhance the precision of Hamiltonian parameter estimation to surpass the Heisenberg scaling. Here, we consider the estimation of the interaction strength in linear systems with long-range interactions. Using the Kitaev chain as a case study, we establish a transition from the Heisenberg to super-Heisenberg scaling in the quantum Fisher information by varying the interaction range. We further show that quantum control can improve the prefactor of the quantum Fisher information. Our results explore the advantage of optimal quantum control and long-range interactions in many-body quantum metrology.	Translated: 2023-04-03 20:54:17 Published: 2021-11-01
# Entanglement Measures in a Nonequilibrium Steady State: Exact Results in One Dimension ( http://arxiv.org/abs/2105.00740v2 ) License: see linked page	Shachar Fraenkel, Moshe Goldstein	Entanglement plays a prominent role in the study of condensed matter many-body systems: entanglement measures not only quantify the possible use of these systems in quantum information protocols, but also shed light on their physics. However, exact analytical results remain scarce, especially for systems out of equilibrium. In this work we examine a paradigmatic one-dimensional fermionic system that consists of a uniform tight-binding chain with an arbitrary scattering region near its center, which is subject to a DC bias voltage at zero temperature. The system is thus held in a current-carrying nonequilibrium steady state, which can nevertheless be described by a pure quantum state. Using a generalization of the Fisher-Hartwig conjecture, we present an exact calculation of the bipartite entanglement entropy of a subsystem with its complement, and show that the scaling of entanglement with the length of the subsystem is highly unusual, containing both a volume-law linear term and a logarithmic term. The linear term is related to imperfect transmission due to scattering, and provides a generalization of the Levitov-Lesovik full counting statistics formula. The logarithmic term arises from the Fermi discontinuities in the distribution function. Our analysis also produces an exact expression for the particle-number-resolved entanglement. We find that although to leading order entanglement equipartition applies, the first term breaking it grows with the size of the subsystem, a novel behavior not observed in previously studied systems. We apply our general results to a concrete model of a tight-binding chain with a single impurity site, and show that the analytical expressions are in good agreement with numerical calculations. The analytical results are further generalized to accommodate the case of multiple scattering regions.	Translated: 2023-04-01 17:59:43 Published: 2021-11-01
# Group theoretic approach to many-body scar states in fermionic lattice models ( http://arxiv.org/abs/2106.10300v3 ) License: see linked page	Kiryl Pakrouski, Preethi N. Pallegar, Fedor K. Popov, Igor R. Klebanov	It has been shown [arXiv:2007.00845] that three families of highly symmetric states are many-body scars for any spin-1/2 fermionic Hamiltonian of the form $H_0+OT$, where $T$ is a generator of an appropriate Lie group. One of these families consists of the well-known $\eta$-pairing states. In addition to having the usual properties of scars, these families of states are insensitive to electromagnetic noise and have advantages for storing and processing quantum information. In this paper we show that a number of well-known coupling terms, such as the Hubbard and the Heisenberg interactions, and the Hamiltonians containing them, are of the required form and support these states as scars without fine-tuning. The explicit $H_0+OT$ decomposition for a number of the most commonly used models, including topological ones, is provided. To facilitate possible experimental implementations, we discuss the conditions for the low-energy subspace of these models to be comprised solely of scars. Further, we write down all the generators $T$ that can be used as building blocks for designing new models with scars, most interestingly including the spin-orbit coupled hopping and superconducting pairing terms. We expand this framework to non-Hermitian open systems and demonstrate that for them the scar subspace continues to undergo coherent time evolution and exhibit "revivals". A full numerical study of an extended 2D $tJU$ model explicitly illustrates the novel properties of the invariant scars and supports our findings.	Translated: 2023-03-26 08:05:57 Published: 2021-11-01
# Near-Field Terahertz Nanoscopy of Coplanar Microwave Resonators ( http://arxiv.org/abs/2106.12907v2 ) License: see linked page	Xiao Guo, Xin He, Zach Degnan, Bogdan C. Donose, Karl Bertling, Arkady Fedorov, Aleksandar D. Rakić, Peter Jacobson	Superconducting quantum circuits are one of the leading quantum computing platforms. To advance superconducting quantum computing to a point of practical importance, it is critical to identify and address material imperfections that lead to decoherence. Here, we use terahertz Scanning Near-field Optical Microscopy (SNOM) to probe the local dielectric properties and carrier concentrations of wet-etched aluminum resonators on silicon, one of the most characteristic components of superconducting quantum processors. Using a recently developed vector calibration technique, we extract the THz permittivity from spectroscopy in proximity to the microwave feedline. Fitting the extracted permittivity to the Drude model, we find that silicon in the etched channel has a carrier concentration greater than buffer oxide etched silicon and we explore post-processing methods to reduce the carrier concentrations. Our results show that near-field THz investigations can be applied to quantitatively evaluate and identify potential loss channels in quantum devices.	Translated: 2023-03-25 16:20:35 Published: 2021-11-01
# Hebbian learning with gradients: Hebbian convolutional neural networks with modern deep learning frameworks ( http://arxiv.org/abs/2107.01729v2 ) License: see linked page	Thomas Miconi	Deep learning networks generally use non-biological learning methods. By contrast, networks based on more biologically plausible learning, such as Hebbian learning, show comparatively poor performance and difficulties of implementation. Here we show that Hebbian learning in hierarchical, convolutional neural networks can be implemented almost trivially with modern deep learning frameworks, by using specific losses whose gradients produce exactly the desired Hebbian updates. We provide expressions whose gradients exactly implement a plain Hebbian rule (dw ~= xy), Grossberg's instar rule (dw ~= y(x-w)), and Oja's rule (dw ~= y(x-yw)). As an application, we build Hebbian convolutional multi-layer networks for object recognition. We observe that higher layers of such networks tend to learn large, simple features (Gabor-like filters and blobs), explaining the previously reported decrease in decoding performance over successive layers. To combat this tendency, we introduce interventions (denser activations with sparse plasticity, pruning of connections between layers) which result in sparser learned features, massively increase performance, and allow information to increase over successive layers. We hypothesize that more advanced techniques (dynamic stimuli, trace learning, feedback connections, etc.), together with the massive computational boost offered by modern deep learning frameworks, could greatly improve the performance and biological relevance of multi-layer Hebbian networks.	Translated: 2023-03-23 11:21:25 Published: 2021-11-01
# Simple sufficient condition for subspace to be completely or genuinely entangled ( http://arxiv.org/abs/2107.07530v2 ) License: see linked page	Maciej Demianowicz, Grzegorz Rajchel-Mieldzioć, Remigiusz Augusiak	We introduce a simple sufficient criterion, which allows one to tell whether a subspace of a bipartite or multipartite Hilbert space is entangled. The main ingredient of our criterion is a bound on the minimal entanglement of a subspace in terms of the entanglement of vectors spanning that subspace, expressed for geometrical measures of entanglement. The criterion is applicable to both completely and genuinely entangled subspaces. We explore its usefulness in several important scenarios. Further, an entanglement criterion for mixed states following directly from the condition is stated. As an auxiliary result we provide a formula for the generalized geometric measure of entanglement of the $d$-level Dicke states.	Translated: 2023-03-22 05:15:23 Published: 2021-11-01
# Fluctuations in Extractable Work and Bounds on the Charging Power of Quantum Batteries ( http://arxiv.org/abs/2107.08620v2 ) License: see linked page	Shang-Yung Wang	Motivated by a recent disagreement about the claim that fluctuations in the free energy operator bound the charging power of a quantum battery, we present a critical analysis of the original derivation. The analysis shows that the above claim holds for neither closed- nor open-system dynamics. Our results indicate that the free energy operator is not a consistent quantifying operator for the work content of a charging quantum battery.	Translated: 2023-03-21 21:26:30 Published: 2021-11-01
# Non-linear regime for enhanced performance of an Aharonov-Bohm heat engine ( http://arxiv.org/abs/2107.13222v2 ) License: see linked page	Géraldine Haack, Francesco Giazotto	Thermal transport and quantum thermodynamics at the nanoscale are garnering increasing attention, in particular in the context of quantum technologies. Experiments relevant for quantum technology are expected to be performed in the non-linear regime. In this work, we build on previous results derived in the linear response regime for the performance of an Aharonov-Bohm (AB) interferometer operated as a heat engine. In the non-linear regime, we demonstrate the tunability, large efficiency and thermopower that this mesoscopic quantum machine can achieve, confirming the exciting perspectives that this AB ring offers for developing efficient thermal machines in the fully quantum regime.	Translated: 2023-03-20 17:19:49 Published: 2021-11-01
# Waiting-times statistics in boundary driven free fermion chains ( http://arxiv.org/abs/2108.11850v3 ) License: see linked page	Gabriel T. Landi	We study the waiting-time distributions (WTDs) of quantum chains coupled to two Lindblad baths, one at each end. Our focus is on free fermion chains, where we derive closed-form expressions in terms of single-particle matrices, allowing one to study arbitrarily large chain sizes. In doing so, we also derive formulas for 2-point correlation functions involving non-Hermitian propagators.	Translated: 2023-03-17 03:07:23 Published: 2021-11-01
# 量子誤差緩和のための実践的枠組み A Practical Framework for Quantum Error Mitigation ( http://arxiv.org/abs/2110.05389v2 ) ライセンス: Link先を確認	Zhenyu Cai	(参考訳) 量子エラー軽減は、近い将来、量子機械の実用化において重要な役割を果たすことが期待されている。したがって、多くの量子エラー緩和スキームをコヒーレントなフレームワークの下で提案し、その基礎となる接続を強調し、実用的性能のガイダンスを提供することが重要である。本稿では,現在最先端の量子エラー緩和方式のほとんどを含む線形量子エラー緩和という一般的なフレームワークを構築する。フレームワーク内では、量子エラー緩和は、ノイズ状態からエラー緩和状態を抽出するものと見なすことができる。抽出率と呼ばれる抽出に成功した誤り軽減状態の割合は、与えられた緩和スキームのコスト効果を示す。この枠組みを用いることで,様々な緩和手法における忠実度向上,サンプリングオーバーヘッド,抽出率の導出と比較が可能となる。フレームワークによって提供される構造、洞察、直観は、新しいスキームのさらなる発展の基盤となりうる。 Quantum error mitigation is expected to play a crucial role in the practical applications of quantum machines for the foreseeable future. Thus it is important to put the numerous quantum error mitigation schemes proposed under a coherent framework that can highlight their underlying connections while providing guidance for their practical performance. In this article, we construct a general framework named linear quantum error mitigation that includes most of the state-of-the-art quantum error mitigation schemes. Within the framework, quantum error mitigation can be effectively viewed as extracting the error-mitigated state out of the noisy state. The fraction of error-mitigated state that is successfully extracted, called extraction rate, will indicate the cost-effectiveness of the given mitigation scheme. Using the framework, we can derive and compare the fidelity boost, sampling overhead and extraction rate across various mitigation schemes under practical assumptions. The structure, insights and intuitions provided by the framework can serve as a basis for further developments of new schemes.	翻訳日:2023-03-11 19:09:37 公開日:2021-11-01
# 可観測量の不確かさ領域と状態非依存の不確定性関係 Uncertainty regions of observables and state-independent uncertainty relations ( http://arxiv.org/abs/2110.14134v2 ) ライセンス: Link先を確認	Lin Zhang and Shunlong Luo and Shao-Ming Fei and Junde Wu	(参考訳) 可観測量の分散または偏差の和に対する最適な状態非依存の下界は、不確定性で制限される領域に到達する実験が増えていることから重要である。本稿では、純粋化状態上の一様ハール測度から誘導されるランダム量子状態における、2つ以上の量子可観測量のタプルによって形成される不確かさ領域を決定することにより、分散または偏差のタイトな不確定性関係を計算するための枠組みを提案する。これらの不確かさ領域の解析式から、2つ、3つ、および任意個の可観測量の分散または偏差の和が満たす状態非依存の不確定性不等式を示し、そこから二部系および三部系に対する実験的に扱いやすいエンタングルメント検出基準を導出する。 The optimal state-independent lower bounds for the sum of variances or deviations of observables are of significance for the growing number of experiments that reach the uncertainty limited regime. We present a framework for computing the tight uncertainty relations of variance or deviation via determining the uncertainty regions, which are formed by the tuples of two or more quantum observables in random quantum states induced from the uniform Haar measure on the purified states. From the analytical formulae of these uncertainty regions, we present state-independent uncertainty inequalities satisfied by the sum of variances or deviations of two, three and arbitrarily many observables, from which experimentally friendly entanglement detection criteria are derived for bipartite and tripartite systems.	翻訳日:2023-03-10 03:33:50 公開日:2021-11-01
# 文脈性、微調整、目的論的説明 Contextuality, Fine-Tuning, and Teleological Explanation ( http://arxiv.org/abs/2110.15898v2 ) ライセンス: Link先を確認	Emily Adlam	(参考訳) 私は、文脈性には何か問題があるという直観の源泉に関するさまざまな提案を評価し、最終的に、文脈性は微調整の観点から考えるのが最適であると結論づける。次に、量子力学の他の微調整問題と同様に、この振る舞いは物理学の目的論的特徴の現れとして理解できると論じる。最後に、文脈性の分析に用いられてきたいくつかの形式的な数学的枠組みについて論じ、それらの結果が科学的実在論者によってどのように解釈されるべきかを考察する。この議論の過程で、私はいくつかの新しい数学的結果を得る。すなわち、準備の文脈性が微調整の一形態であることを示し、測定の文脈性が、閉じた因果ループを禁じる大域的な制約に訴えることで説明できることを示し、さらに、古典的な存在論的モデルと認識的制限から負の確率がどのように生じるかを示す。 I assess various proposals for the source of the intuition that there is something problematic about contextuality, ultimately concluding that contextuality is best thought of in terms of fine-tuning. I then argue that as with other fine-tuning problems in quantum mechanics, this behaviour can be understood as a manifestation of teleological features of physics. Finally I discuss several formal mathematical frameworks that have been used to analyse contextuality and consider how their results should be interpreted by scientific realists. In the course of this discussion I obtain several new mathematical results - I demonstrate that preparation contextuality is a form of fine-tuning, I show that measurement contextuality can be explained by appeal to a global constraint forbidding closed causal loops, and I demonstrate how negative probabilities can arise from a classical ontological model together with an epistemic restriction.	翻訳日:2023-03-09 22:51:06 公開日:2021-11-01
# パス干渉による超伝導量子状態読み出しの改善 Improved superconducting qubit state readout by path interference ( http://arxiv.org/abs/2111.00736v1 ) ライセンス: Link先を確認	Zhiling Wang, Zenghui Bao, Yukai Wu, Yan Li, Cheng Ma, Tianqi Cai, Yipu Song, Hongyi Zhang, Luming Duan	(参考訳) 高忠実なシングルショット量子ビット状態の読み出しは多くの量子情報処理プロトコルに必須である。超伝導量子回路では、量子ビット状態は通常、透過または反射からマイクロ波空洞の分散周波数シフトを検出することによって決定される。本稿では,伝送信号と反射信号の間の構成的干渉を利用して,量子状態の読み出しを最適化し,より解決された状態の識別と量子状態の読み出し精度の向上を実証する。簡便かつ便利な手法として、空洞光子状態の識別に基づく他のクビット状態読み出し手法と組み合わせることで、クビット状態読み出しをさらに改善することができる。 High fidelity single shot qubit state readout is essential for many quantum information processing protocols. In superconducting quantum circuit, the qubit state is usually determined by detecting the dispersive frequency shift of a microwave cavity from either transmission or reflection. In this paper, we demonstrate the use of constructive interference between the transmitted and reflected signal to optimize the qubit state readout, with which we find a better resolved state discrimination and an improved qubit readout fidelity. As a simple and convenient approach, our scheme can be combined with other qubit readout methods based on the discrimination of cavity photon states to further improve the qubit state readout.	翻訳日:2023-03-09 17:15:04 公開日:2021-11-01
# 誤情報曝露後のtwitterユーザーの行動変化の分析 Analyzing Behavioral Changes of Twitter Users After Exposure to Misinformation ( http://arxiv.org/abs/2111.00700v1 ) ライセンス: Link先を確認	Yichen Wang, Richard Han, Tamara Lehman, Qin Lv, and Shivakant Mishra	(参考訳) 近年、ソーシャルメディアプラットフォームは誤情報を広めるために利用されてきた。オンライン誤報はユーザの信念に影響を与え、偏光のような社会的影響に結びついている。本研究は,誤報が特定のユーザの行動に与える影響に着目し,誤報に晒された後,一般のTwitterユーザが行動を変えたかどうかを理解することを目的とする。露出したユーザーの前後の行動を比較して、投稿したツイートの頻度やツイートの感情に重大な変化が生じたかどうかを判断する。以上の結果から,利用者の行動に統計学的に有意な変化がみられた。言語距離分析により,露光前の露出ユーザとベースラインユーザとの間には,すでに違いが見られた。また,マルチ露光群と極端変化群という2つの特定のユーザグループの特徴について検討した。最後に,誤報ツイートの出現後のユーザの行動の変化が,フォロワー数や投稿者のフォロワー数によって異なるかどうかを調査し,その行動変化がすべて類似していることを示す。 Social media platforms have been exploited to disseminate misinformation in recent years. The widespread online misinformation has been shown to affect users' beliefs and is connected to social impact such as polarization. In this work, we focus on misinformation's impact on specific user behavior and aim to understand whether general Twitter users changed their behavior after being exposed to misinformation. We compare the before and after behavior of exposed users to determine whether the frequency of the tweets they posted, or the sentiment of their tweets underwent any significant change. Our results indicate that users overall exhibited statistically significant changes in behavior across some of these metrics. Through language distance analysis, we show that exposed users were already different from baseline users before the exposure. We also study the characteristics of two specific user groups, multi-exposure and extreme change groups, which were potentially highly impacted. Finally, we study if the changes in the behavior of the users after exposure to misinformation tweets vary based on the number of their followers or the number of followers of the tweet authors, and find that their behavioral changes are all similar.	翻訳日:2023-03-09 17:14:34 公開日:2021-11-01
# 打ち切りノイマン級数による量子誤差の緩和 Mitigating Quantum Errors via Truncated Neumann Series ( http://arxiv.org/abs/2111.00691v1 ) ライセンス: Link先を確認	Kun Wang, Yu-Ao Chen, and Xin Wang	(参考訳) 量子ハードウェア上の量子ゲートと測定は、必然的にハードウェアの不完全性の影響を受け、それが量子エラーを引き起こす。このような避けられないエラーの軽減は、量子ハードウェアの能力をよりよく探求するために不可欠である。本稿では、打ち切りノイマン級数を用いて、量子期待値の計算における量子ゲート誤差と測定誤差を軽減できる統一フレームワークを提案する。基本的な考え方は、量子デバイスの逐次適用によって生じる異なる次数の量子エラーを、慎重に選んだ係数で線形結合することでエラーの逆を近似し、量子エラーの効果を打ち消すことである。注目すべきことに、推定誤差は打ち切り次数に対して指数関数的に減衰し、量子デバイスのノイズ耐性が中程度である限り、発生する誤差軽減オーバーヘッドはシステムサイズに依存しない。異なる量子エラーに対してこのフレームワークを数値的にテストし、計算精度が大幅に向上することを確認した。我々のフレームワークにはいくつかの重要な利点がある。量子ゲート誤差と測定誤差を統一的に軽減し、いかなるエラー構造も仮定せず、量子エラーを完全に特徴づけるためのトモグラフィー手順も必要とせず、そして最も重要なことにスケーラブルである。これらの利点により、我々の量子エラー軽減フレームワークは効率的かつ実用的なものとなり、近未来の量子デバイスが量子アプリケーションを提供する能力を拡張する。 Quantum gates and measurements on quantum hardware are inevitably subject to hardware imperfections that lead to quantum errors. Mitigating such unavoidable errors is crucial to explore the power of quantum hardware better. In this paper, we propose a unified framework that can mitigate quantum gate and measurement errors in computing quantum expectation values utilizing the truncated Neumann series. The essential idea is to cancel the effect of quantum error by approximating its inverse via linearly combining quantum errors of different orders produced by sequential applications of the quantum devices with carefully chosen coefficients. Remarkably, the estimation error decays exponentially in the truncated order, and the incurred error mitigation overhead is independent of the system size, as long as the noise resistance of the quantum device is moderate. We numerically test this framework for different quantum errors and find that the computation accuracy is substantially improved. Our framework possesses several vital advantages: it mitigates quantum gate and measurement errors in a unified manner, it neither assumes any error structure nor requires the tomography procedure to completely characterize the quantum errors, and most importantly, it is scalable. These advantages empower our quantum error mitigation framework to be efficient and practical and extend the ability of near-term quantum devices to deliver quantum applications.	翻訳日:2023-03-09 17:14:15 公開日:2021-11-01
# 非ガウス資源を用いた連続可変量子テレポーテーションの強化 Enhancing Continuous Variable Quantum Teleportation using Non-Gaussian Resources ( http://arxiv.org/abs/2111.00672v1 ) ライセンス: Link先を確認	Eduardo Villaseñor and Robert Malaney	(参考訳) 連続可変(CV)非ガウス資源は、CVベースの量子通信とCVベースの計算のための量子エラー補正の実現において基礎となる。本研究では、ノイズのあるチャネルを通じたコヒーレント状態およびスクイーズド状態の伝送という文脈で、CV非ガウス状態を量子テレポーテーションの資源状態として用いることを検討する。さまざまな非ガウス資源状態を検討し、各資源で達成される状態テレポーテーションの忠実度を計算する。結果は、非ガウス状態の使用が、CVテレポーテーションで従来用いられてきた資源であるガウス型2モードスクイーズド真空状態と比較して大きな優位性を持つことを示している。ファイバーベースの量子通信では、特定の非ガウス状態を用いることで、量子テレポーテーションの到達距離が約40%増加する。衛星から地上への量子通信では、Micius衛星に相当する開口構成において、量子テレポーテーションが可能な距離が700kmから1200km超に伸びる。これらの結果は、地上と宇宙の両方のネットワークにおける実用的で実現可能な量子通信の性能が大幅に向上することを示している。 Continuous Variable (CV) non-Gaussian resources are fundamental in the realization of quantum error correction for CV-based quantum communications and CV-based computing. In this work, we investigate the use of CV non-Gaussian states as quantum teleportation resource states in the context of the transmission of coherent and squeezed states through noisy channels. We consider an array of different non-Gaussian resource states, and compute the fidelity of state teleportation achieved for each resource. Our results show that the use of non-Gaussian states presents a significant advantage compared to the traditional resource adopted for CV teleportation: the Gaussian two-mode squeezed vacuum state. In fiber-based quantum communications, the range of quantum teleportation is increased by approximately 40% via the use of certain non-Gaussian states. In satellite-to-ground quantum communications, for aperture configurations consistent with the Micius satellite, the viable range of quantum teleportation is increased from 700 km to over 1200 km. These results represent a significant increase in the performance of pragmatic and realizable quantum communications in both terrestrial and space-based networks.	翻訳日:2023-03-09 17:13:54 公開日:2021-11-01
# ボルン則の確立における秩序的軌道とカオス的軌道の役割 The role of chaotic and ordered trajectories in establishing Born's rule ( http://arxiv.org/abs/2111.00846v1 ) ライセンス: Link先を確認	Athanasios C. Tzemos and George Contopoulos	(参考訳) 2つのエンタングルしたボーム的量子ビットについて、その初期準備がボルン則を満たす(あるいは満たさない)場合の秩序的およびカオス的な軌道を、さまざまなエンタングルメント量に対して詳細に研究する。エンタングルメントの任意の非零値に対して秩序的軌道とカオス的軌道が共存し、エンタングルメントが減少するにつれて秩序的軌道の割合が増加する。エンタングルメントがゼロおよび最大という極端な場合には、それぞれ秩序的軌道のみ、カオス的軌道のみが存在する。このモデルのカオス的軌道は、任意のエンタングルメント値に対してエルゴード的である。すなわち、その点の極限分布は初期条件に依存しない。したがって、ボルン則が動的に確立されるか否かを決めるのは、秩序的軌道とカオス的軌道の比率である。 We study in detail the trajectories, ordered and chaotic, of two entangled Bohmian qubits when their initial preparation satisfies (or not) Born's rule for various amounts of quantum entanglement. For any non zero value of entanglement ordered and chaotic trajectories coexist and the proportion of ordered trajectories increases with the decrease of the entanglement. In the extreme cases of zero and maximum entanglement we have only ordered and chaotic trajectories correspondingly. The chaotic trajectories of this model are ergodic, for any given value of entanglement, namely the limiting distribution of their points does not depend on their initial conditions. Consequently it is the ratio between ordered and chaotic trajectories which is responsible for the dynamical establishment (or not) of Born's rule.	翻訳日:2023-03-09 17:09:31 公開日:2021-11-01
# 多体局在化は反復量子最適化を可能にする Many-body localization enables iterative quantum optimization ( http://arxiv.org/abs/2111.00842v1 ) ライセンス: Link先を確認	Hanteng Wang, Hsiu-Chung Yeh, Alex Kamenev	(参考訳) 我々は,反復量子プロトコルを提案し,ガラス状エネルギー環境を用いて最適化問題を解く。これは多体局在遷移の三臨界点付近の周期的サイクリングに基づいている。これにより、各反復が局所エネルギーの最小値を求める非指数的に小さな確率に導くことが保証される。もう1つの重要な要素は、サイクルパラメータを現在達成されている最適状態("参照"状態)に調整し、より深い最小値が見つかるとリセットすることである。三項臨界点の位置が分かっていれば、このアルゴリズムは多項式時間で任意の精度で絶対最小値に近づくことができることを示す。 We suggest an iterative quantum protocol, allowing to solve optimization problems with a glassy energy landscape. It is based on a periodic cycling around the tricritical point of the many-body localization transition. This ensures that each iteration leads to a non-exponentially small probability to find a lower local energy minimum. The other key ingredient is to tailor the cycle parameters to a currently achieved optimal state (the "reference" state) and to reset them once a deeper minimum is found. We show that, if the position of the tricritical point is known, the algorithm allows to approach the absolute minimum with any given precision in a polynomial time.	翻訳日:2023-03-09 17:09:16 公開日:2021-11-01
# 挑戦的だが機会に満ちている: 小学校におけるプログラミングに対する教師の視点 Challenging but Full of Opportunities: Teachers' Perspectives on Programming in Primary Schools ( http://arxiv.org/abs/2111.00799v1 ) ライセンス: Link先を確認	Luisa Greifenstein, Isabella Graßl, Gordon Fraser	(参考訳) 学校カリキュラムに計算論的思考が広く取り入れられたことで、教師は小学校の段階から子供たちにプログラミングを教える必要がある。これは最近の動きであるため、小学校教師はプログラミングの最善の教え方について十分な準備ができていない可能性があり、なぜ教えなければならないのかを十分に理解していない可能性もある。これらの疑問をより深く理解するために、実践経験から得られた知見と、養成課程にある教員の予想とを対比する。小学校でプログラミングを教えた経験のある教師200名と養成課程の教員97名を対象に調査を行い、プログラミングを教える際の課題、子供たちがプログラミングを学ぶことで生じる機会、およびその両方に実践で対処するための戦略を明らかにする。多くの課題や機会は正しく予想されているが、いくつかの不一致も見られ、それらは小学校でのプログラミング教育に向けて小学校教師をよりよく準備するための、教員養成カリキュラムの改訂に役立てることができる。 The widespread establishment of computational thinking in school curricula requires teachers to introduce children to programming already at primary school level. As this is a recent development, primary school teachers may neither be adequately prepared for how to best teach programming, nor may they be fully aware why they have to do so. In order to gain a better understanding of these questions, we contrast insights taken from practical experiences with the anticipations of teachers in training. By surveying 200 teachers who have taught programming at primary schools and 97 teachers in training, we identify relevant challenges when teaching programming, opportunities that arise when children learn programming, and strategies how to address both of these in practice. While many challenges and opportunities are correctly anticipated, we find several disagreements that can inform revisions of the curricula in teaching studies to better prepare primary school teachers for teaching programming at primary schools.	翻訳日:2023-03-09 17:08:46 公開日:2021-11-01
# ソフトコアポテンシャルをもつ位置不規則イジングスピンのダイナミクス Dynamics of position disordered Ising spins with a soft-core potential ( http://arxiv.org/abs/2111.00779v1 ) ライセンス: Link先を確認	Canzhu Tan, Xiaodong Lin, Yabing Zhou, Y. H. Jiang, Matthias Weidemüller, Bing Zhu	(参考訳) ソフトコア2体相互作用ポテンシャル $\propto1/[1+(r/R_c)^\alpha]$ ($\alpha\ge d$) の下で、$d$ 次元の均質分布およびガウス分布にランダムに配置されたイジングスピンの磁化緩和を理論的に研究する。ここで $r$ はスピン間距離、$R_c$ はソフトコア半径である。ダイナミクスは、すべてのスピンが横方向に偏極した状態から始まる。均質な場合には熱力学極限で解析式が導かれ、これは $\propto\exp(-t^2)$ として始まり、長時間では指数 $\beta=d/\alpha$ の引き伸ばされた指数関数則に漸近的に従う。その間では、振幅が減衰する振動挙動が観察される。ガウス型試料の場合、系の乱れの度合いは、平均スピン間距離 $l_\rho$ を用いた比 $l_\rho/R_c$ によって制御でき、磁化ダイナミクスを数値的に調べる。 $l_\rho/R_c\ll1$ の極限では、スピンの位置の乱れにもかかわらず、全磁化に対してコヒーレントな多体ダイナミクスが回復する。逆の極限 $l_\rho/R_c\gg1$ では、磁化の初期の速い減衰の後、遅い時間に均質な場合と同様のダイナミクスが現れる。 $d=3, \alpha=6$ の漸近的な時間発展に対して引き伸ばし指数 $\beta\approx0.18$ が得られ、これは均質な場合($\beta=0.5$)とは異なる。 We theoretically study magnetization relaxation of Ising spins distributed randomly in a $d$-dimensional homogeneous and Gaussian profile under a soft-core two-body interaction potential $\propto1/[1+(r/R_c)^\alpha]$ ($\alpha\ge d$), where $r$ is the inter-spin distance and $R_c$ is the soft-core radius. The dynamics starts with all spins polarized in the transverse direction. In the homogeneous case, an analytic expression is derived at the thermodynamic limit, which starts as $\propto\exp(-t^2)$ and follows a stretched-exponential law asymptotically at long time with an exponent $\beta=d/\alpha$. In between an oscillating behaviour is observed with a damping amplitude. For Gaussian samples, the degree of disorder in the system can be controlled by the ratio $l_\rho/R_c$ with $l_\rho$ the mean inter-spin distance and the magnetization dynamics is investigated numerically. In the limit of $l_\rho/R_c\ll1$, a coherent many-body dynamics is recovered for the total magnetization despite the position disorder of spins. In the opposite limit of $l_\rho/R_c\gg1$, a similar dynamics as that in the homogeneous case emerges at later time after an initial fast decay of the magnetization. We obtain a stretched exponent of $\beta\approx0.18$ for the asymptotic evolution with $d=3, \alpha=6$, which is different from that in the homogeneous case ($\beta=0.5$).	翻訳日:2023-03-09 17:07:38 公開日:2021-11-01
# 線形方程式系に対する行列合同のQUBO定式化への応用について On the application of matrix congruence to QUBO formulations for systems of linear equations ( http://arxiv.org/abs/2111.00747v1 ) ライセンス: Link先を確認	Sun Woo Park, Hyunju Lee, Byung Chun Kim, Youngho Woo, and Kyungtaek Jun	(参考訳) 量子コンピューティングアルゴリズムの最近の研究は、計算モデルの強化に寄与する可能性がある量子コンピュータの特徴の発掘に焦点を当てている。量子アニール法は線形方程式系の2次非制約バイナリ最適化(QUBO)を効果的に並列化する。本稿では,実対称行列と対角行列の合同性を生かして,これらの定式化を単純化する。さらに、QRやSVD分解などの古典的アルゴリズムよりも優れた性能を持つQUBOモデルの計算性能を示す。 Recent studies on quantum computing algorithms focus on excavating features of quantum computers which have potential for contributing to computational model enhancements. Among various approaches, quantum annealing methods effectively parallelize quadratic unconstrained binary optimization (QUBO) formulations of systems of linear equations. In this paper, we simplify these formulations by exploiting congruence of real symmetric matrices to diagonal matrices. We further exhibit computational merits of the proposed QUBO models, which can outperform classical algorithms such as QR and SVD decomposition.	翻訳日:2023-03-09 17:07:06 公開日:2021-11-01
# 1 Tに迫る、薄膜Al/AlO$_x$/Alジョセフソン接合を用いた3Dトランスモンの磁場耐性 Magnetic-field resilience of 3D transmons with thin-film Al/AlO$_x$/Al Josephson junctions approaching 1 T ( http://arxiv.org/abs/2111.01115v1 ) ライセンス: Link先を確認	J. Krause, C. Dickel, E. Vaal, M. Vielmetter, J. Feng, R. Bounds, G. Catelani, J. M. Fink, Yoichi Ando	(参考訳) 磁場耐性を持つ超伝導回路は、センシング応用や、スピン量子ビットまたはトポロジカル量子ビットと電気機械素子を含むハイブリッド量子計算アーキテクチャ、さらに磁束ノイズや準粒子損失の研究を可能にする。薄膜3Dアルミニウムトランスモンのスペクトルおよびコヒーレンス時間に対する、最大1 Tの面内磁場の影響を調べる。強磁場の影響を受けない銅キャビティを用いることで、トランスモンに対する磁場効果のみを調べることができる。同一キャビティ内で冷却された単一接合トランスモンとSQUIDトランスモンのデータを提示する。予想通り、超伝導ギャップの抑制と幾何学的なフラウンホーファー様の寄与により、トランスモン周波数は磁場の増加とともに低下する。それにもかかわらず、薄膜トランスモンは強い磁場耐性を示す。どちらのトランスモンも少なくとも 0.65 T までマイクロ秒のコヒーレンスを示し、$T_1$ は測定可能な範囲全体で 1 $\mathrm{\mu}$s を上回る。SQUID分光は、磁石の限界である 1 T まで実行可能である。薄膜アルミニウムジョセフソン接合は、高磁場領域の超伝導回路に適したハードウェアであると結論づける。 Magnetic-field-resilient superconducting circuits enable sensing applications and hybrid quantum-computing architectures involving spin or topological qubits and electro-mechanical elements, as well as studying flux noise and quasiparticle loss. We investigate the effect of in-plane magnetic fields up to 1 T on the spectrum and coherence times of thin-film 3D aluminum transmons. Using a copper cavity, unaffected by strong magnetic fields, we can solely probe the magnetic-field effect on the transmons. We present data on a single-junction and a SQUID transmon, that were cooled down in the same cavity. As expected, transmon frequencies decrease with increasing fields, due to a suppression of the superconducting gap and a geometric Fraunhofer-like contribution. Nevertheless, the thin-film transmons show strong magnetic-field resilience: both transmons display microsecond coherence up to at least 0.65 T, and $T_1$ remains above 1 $\mathrm{\mu}$s over the entire measurable range. SQUID spectroscopy is feasible up to 1 T, the limit of our magnet. We conclude that thin-film aluminum Josephson junctions are a suitable hardware for superconducting circuits in the high-magnetic-field regime.	翻訳日:2023-03-09 17:01:24 公開日:2021-11-01
# 分子集合体の吸収スペクトルのシミュレーション:確率的純粋状態の階層によるアプローチ Simulation of absorption spectra of molecular aggregates: a Hierarchy of Stochastic Pure States approach ( http://arxiv.org/abs/2111.01089v1 ) ライセンス: Link先を確認	Lipeng Chen, Doran I. G. Bennett and Alexander Eisfeld	(参考訳) 電子励起と振動自由度の間に強く構造を持った結合がある分子集合体に対する分光学的観測量のシミュレーションは、重要であるが難しい課題である。純粋状態の階層(HOPS)は、局所的な確率的軌道に基づく形式的に厳密な解を提供する。大きな集合体における吸収スペクトルのシミュレーションにHOPSの局在性を利用するには、正規化された軌道による定式化が必要である。ここでは、ケット状態とブラ状態が異なる電子ヒルベルト空間で伝播する、正規化されたダイアディック方程式を提示する。この研究は、吸収スペクトルのシミュレーションに適応的HOPS法を適用し、さらに電場との相互作用に関して摂動的な非線形分光の定式化へと進むための扉を開く。 The simulation of spectroscopic observables for molecular aggregates with strong and structured coupling of electronic excitation to vibrational degrees of freedom is an important but challenging task. The hierarchy of pure states (HOPS) provides a formally exact solution based on local, stochastic trajectories. Exploiting the localization of HOPS for the simulation of absorption spectra in large aggregates requires a formulation in terms of normalized trajectories. Here we provide a normalized dyadic equation where the ket- and bra-states are propagated in different electronic Hilbert spaces. This work opens the door to apply adaptive HOPS methods for the simulation of absorption spectra and also to a formulation for non-linear spectroscopy that is perturbative with respect to interactions with the electric field.	翻訳日:2023-03-09 17:01:01 公開日:2021-11-01
# 光吸収ターゲットを用いた量子照明 Quantum illumination with a light absorbing target ( http://arxiv.org/abs/2111.01069v1 ) ライセンス: Link先を確認	Rivu Gupta, Saptarshi Roy, Tamoghna Das, Aditi Sen De	(参考訳) 量子照明(QI)プロトコルでは、通常は部分的に反射するビームスプリッターによってモデル化されるターゲットの存在を検出する。我々は、目標が落下する光の一部を吸収した場合のqiの性能を分析し、シナリオをより現実的なものにする。本稿では,これらの特徴を持つ対象をモデル化し,チャーノフ境界(CB)の観点から量子領域における検出可能性を検討する。アイドラーフリーのセットアップでは、QIのコヒーレント状態を使用し、2モード圧縮真空(TMSV)状態がシグナ-イドラー方式で使用される。いずれの場合においても,吸収量の増加とともにcbの低下により検出効率が向上したことを報告する。興味深いことに,吸収の存在下では,より熱的背景が高効率でターゲット検出に繋がる可能性がある。さらに, 有限吸収量においても量子アドバンテージは持続することを示した。しかし, tmsv が提供する量子アドバンテージは, 吸収によって単調に減少し, 高吸収下では消滅的に小さくなることがわかった。また,コヒーレント状態とTMSV状態の両方の最適性(イドラーフリー,信号イドラー)を低い反射率と吸収の限界で示す。 In a quantum illumination (QI) protocol, the task is to detect the presence of the target which is typically modelled by a partially reflecting beam splitter. We analyze the performance of QI when the target absorbs part of the light that falls on it, thereby making the scenario more realistic. We present an optical setup that models a target with these characteristics and explore its detectability in the quantum domain in terms of the Chernoff bound (CB). For an idler-free setup, we use the coherent state for QI while the two mode squeezed vacuum (TMSV) state is employed in the signal-idler scheme. In both the cases, we report an absorption-induced enhancement of the detection efficiency indicated by a lowering of CB with increasing amounts of absorption. Interestingly, we show that in the presence of absorption, a more intense thermal background can lead to target detection with enhanced efficiency. Moreover, we observe that the quantum advantage persists even for finite amounts of absorption. However, we find that the quantum advantage offered by TMSV decreases monotonically with absorption, and becomes vanishingly small in the high absorption regime. 
We also demonstrate the optimality of both the coherent and the TMSV states in their respective setups (idler-free and signal-idler) in the limit of low reflectivity and absorption.	翻訳日:2023-03-09 17:00:28 公開日:2021-11-01
# 多モード連続可変絡み合い分布におけるクロストーク補償 Cross talk compensation in multimode continuous-variable entanglement distribution ( http://arxiv.org/abs/2111.00948v1 ) ライセンス: Link先を確認	Olena Kovalenko, Vladyslav C. Usenko and Radim Filip	(参考訳) 2モード圧縮状態は、連続変数およびハイブリッド量子情報プロトコルを遠隔で使用するスケーラブルでロバストな絡み合いリソースである。平行な類似チャネルを伝播する2モード圧縮状態の多モード分布における線形クロストークの効果を考察する。まず, 分布ガウスの絡み合いの劣化を低減するため, チャネル内への最初の2モードスクイージングは, クロストークの存在下で既に最適化されるべきであることを示す。第2に,チャネル透過率がすべてのモードに対して同じであればクロストークを完全に補償できる,絡み合いを使用する前に,モード間の相対位相と受信側における線形結合の同時最適化を提案する。どちらのモードでも同様の透過率値を持つ現実的なチャネルの場合、クロストークはいまだにほとんど補償される。モード干渉に依存する手法は、別のペアの測定とフィードフォワード制御を用いて、1組のモードにおける絡み合い局在の代替手法を克服する。我々の理論的結果は、クロストークによるスケーラブルな量子ネットワークにおけるマルチモード連続可変フォトニック絡み合いのより効率的な利用への道を開いた。 Two-mode squeezed states are scalable and robust entanglement resources for continuous-variable and hybrid quantum information protocols at a distance. We consider the effect of a linear cross talk in the multimode distribution of two-mode squeezed states propagating through parallel similar channels. First, to reduce degradation of the distributed Gaussian entanglement, we show that the initial two-mode squeezing entering the channel should be optimized already in the presence of a small cross talk. Second, we suggest simultaneous optimization of relative phase between the modes and their linear coupling on a receiver side prior to the use of entanglement, which can fully compensate the cross talk once the channel transmittance is the same for all the modes. For the realistic channels with similar transmittance values for either of the modes, the cross talk can be still largely compensated. This method relying on the mode interference overcomes an alternative method of entanglement localization in one pair of modes using measurement on another pair and feed-forward control. Our theoretical results pave the way to more efficient use of multimode continuous-variable photonic entanglement in scalable quantum networks with cross talk.	翻訳日:2023-03-09 16:59:16 公開日:2021-11-01
# アフシャールの二重スリット実験のシミュレーション Simulation of Afshar's Double Slit Experiment ( http://arxiv.org/abs/2111.01220v1 ) ライセンス: Link先を確認	Bret Gergely and Herman Batelaan	(参考訳) Shahriar S. Afshar は2007年に修正した二重スリットの実験は相補性 [1] に反すると主張した。彼は標準の二重スリット実験を2回修正した。まず、スリットとスクリーンの間に、干渉最小限の位置に配置されるワイヤーグリッドを追加する。第2の修正は、ワイヤーグリッドのすぐ後に収束レンズを配置することである。この考え方は、ワイヤグリッドは干渉ミニマ(波状行動)の存在を意味し、一方レンズは、どの方向の情報(粒子状行動)を同時に得ることができる。より最近では、John G. Cramer [2] は、この実験は量子力学のトランザクショナル解釈(TIQM)を加速させたと主張した。彼の主張はTIQMを支持するボーアの相補性を精査している。本実験は, 量子力学の経路積分定式化を用いたシミュレーションにより解析し, エングルト, グリーンバーグ, ヤシン(E-G-Y) [4, 5] の波動粒子双対関係に一致することを示す。量子力学的解釈のためのテストベッドを提供するためのafsharの実験の使用は限られていると結論づけた。 Shahriar S. Afshar claimed that his 2007 modified version of the double-slit experiment violates complementarity [1]. He makes two modifications to the standard double-slit experiment. First, he adds a wire grid that is placed in between the slits and the screen at locations of interference minima. The second modification is to place a converging lens just after the wire grid. The idea is that the wire grid implies the existence of interference minima(wave-like behavior), while the lens can simultaneously obtain which-way information (particle-like behavior). More recently, John G. Cramer [2] argued that the experiment bolstered the Transactional Interpretation of Quantum mechanics (TIQM). His argument scrutinizes Bohr's complementarity in favor of TIQM. We analyze this experiment by simulation using the path integral formulation of quantum mechanics [3] and find that it agrees with the wave particle duality relation given by Englert, Greenberg and Yasin (E-G-Y) [4, 5]. We conclude that the use of Afshar's experiment to provide a testbed for quantum mechanical interpretations is limited.	翻訳日:2023-03-09 16:50:31 公開日:2021-11-01
# 局所的同時状態判別 Local simultaneous state discrimination ( http://arxiv.org/abs/2111.01209v1 ) ライセンス: Link先を確認	Christian Majenz, Maris Ozols, Christian Schaffner, Mehrdad Tahmasbi	(参考訳) 量子状態判別は、量子情報理論において研究される最も基本的な問題の1つである。応用範囲はチャネル符号化から計測学や暗号まで多岐にわたる。本稿では、この課題の新しい変種である局所同時状態判別(LSSD)を導入する。判別問題のこれまでの分散型の変種では、共同の回答を導くために当事者間の何らかの通信が常に許されていたが、LSSDの当事者は通信できず、同時に正しく回答しなければならない。この同時性は、例えば古典的状態の場合でも、問題が非分散の識別タスクに自明に帰着しないことを意味する。この問題はそれ自体興味深いが、量子暗号においても現れる。問題を導入した後、いくつかの特徴付けの結果を与える。具体的には、次のことを示す例を与える。 i) 局所判別の最適戦略は、古典的状態の場合でさえ、LSSDの最適戦略と一致するとは限らない。 ii) 追加のエンタングルした資源はLSSDの最適成功確率を高めうる。 iii) 量子より強い非シグナリング資源は、エンタングルメントを用いる戦略と比べて、場合によってはより高い成功確率を可能にする。最後に、(古典的な)3者LSSDにおける最適戦略の発見がNP困難であることを示す。 Quantum state discrimination is one of the most fundamental problems studied in quantum information theory. Applications range from channel coding to metrology and cryptography. In this work, we introduce a new variant of this task: Local Simultaneous State Discrimination (LSSD). While previous distributed variants of the discrimination problem always allowed some communication between the parties to come up with a joint answer, the parties in LSSD cannot communicate and have to simultaneously answer correctly. This simultaneity implies, e.g., that for classical states, the problem does not trivialize to a non-distributed distinguishing task. While interesting in its own right, this problem also arises in quantum cryptography. After introducing the problem, we give a number of characterization results. We give examples showing that i) the optimal strategy for local discrimination need not coincide with the optimal strategy for LSSD, even for classical states, ii) an additional entangled resource can increase the optimal success probability in LSSD, and iii) stronger-than-quantum non-signalling resources can allow for a higher success probability in some cases, compared to strategies using entanglement. Finally, we show that finding the optimal strategy in (classical) 3-party LSSD is NP-hard.	翻訳日:2023-03-09 16:49:51 公開日:2021-11-01
# 量子場からの相関の収穫を妨害する Sabotaging the harvesting of correlations from quantum fields ( http://arxiv.org/abs/2111.01191v1 ) ライセンス: Link先を確認	Abhisek Sahu, Irene Melgarejo-Lermas and Eduardo Martín-Martínez	(参考訳) 本研究では、量子場に結合した二者間の古典的および量子的相関の非摂動的な収穫について研究する。まず、時間的に局所的に量子場と結合する任意の数の2準位系を持つシナリオを考える。次に、2つのターゲット検出器(アリスとボブ)が場との相互作用を通じて相関を獲得する能力に対する、追加の検出器(侵入者)の存在の影響を調べる。この非摂動領域におけるさまざまな相関測度の収穫を解析し、たった1つの侵入者でも、どちらか一方の因果的過去に作用することで、アリスとボブの間のあらゆる相関収穫を完全に妨害できることを示す。具体的には、侵入者が場と相互作用することで、場そのものが一方の当事者をエントロピーで「氾濫」させられることを示す。これによりアリスとボブはいかなる相関も獲得できなくなる。さらに、この種の攻撃は防御できないことを示す。 We study the non-perturbative harvesting of classical and quantum correlations between two parties coupled to a quantum field. First, we consider a scenario with an arbitrary number of two-level systems that couple to a quantum field locally in time. Then, we study the impact of the presence of additional detectors (interlopers) on the ability for two target detectors (Alice and Bob) to acquire correlations through their interaction with the field. We analyze the harvesting of different correlation measures in this non-perturbative regime and we demonstrate that even a single interloper can completely sabotage all correlation harvesting between Alice and Bob by acting on the causal past of one of them. Specifically, we show that the interloper is able to interact with the field so that the field itself `floods' one of the parties with entropy. This prevents Alice and Bob from acquiring any correlations. Furthermore, we show that this kind of attack cannot be defended against.	翻訳日:2023-03-09 16:49:22 公開日:2021-11-01
# ゲージ理論の量子熱化:カオス、乱流、普遍性 Quantum thermalization of gauge theories: chaos, turbulence and universality ( http://arxiv.org/abs/2111.01155v1 ) ライセンス: Link先を確認	Niklas Mueller, Torsten V. Zache, Robert Ott	(参考訳) 本講演では, 2+1時空次元における$\mathbf{Z}_2$格子ゲージ理論のリアルタイム熱化ダイナミクスについて述べる。古典的な熱化は一般にカオス的挙動、乱流、普遍性と関連付けられるが、量子力学系においてこれらの現象がどのように現れるかは明らかではない。しかし、エンタングルメント構造というレンズを通して見ると、量子熱化は特徴的な段階を経て進行し、カオス、乱流、普遍性といった古典的現象に著しく類似した現象を示すことが分かる。 In this talk, we discuss real-time thermalization dynamics of $\mathbf{Z}_2$ Lattice Gauge Theory in 2+1 spacetime dimensions. While classical thermalization is commonly associated with chaotic behavior, turbulence and universality, the manifestation of these phenomena in quantum mechanical systems is not clear. However, when viewed through the lens of Entanglement Structure, we find that quantum thermalization proceeds in characteristic stages and reveals phenomena remarkably similar to their classical counterparts: chaos, turbulence and universality.	翻訳日:2023-03-09 16:48:50 公開日:2021-11-01
# Kuwahara et al., Intensity Interference in a Coherent Spin-Polarized Electron Beam, Phys. Rev. Lett. 126, 125501 (2021) へのコメント Comment on Kuwahara et al., Intensity Interference in a Coherent Spin-Polarized Electron Beam, Phys. Rev. Lett. 126, 125501 (2021) ( http://arxiv.org/abs/2111.02890v1 ) ライセンス: Link先を確認	Herman Batelaan, Sam Keramati and T. J. Gay	(参考訳) Kuwaharaら[1]がハンブリー・ブラウン=ツイス型の電子アンチバンチングのディップ(同論文の図3)を観測したという主張は、電子源の放出率が光の偏光に依存することによって説明できる可能性がある。彼らのGaAs/GaAsP試料のひずみは一軸性であり、光電子放出における線形二色性は最大で15%[7]程度になりうると予想され、これは報告された0.1%の効果よりはるかに大きい。円偏光についても同様の懸念がある。 The claim that Kuwahara et al. [1] have reported the observation of a Hanbury Brown-Twiss electron antibunching dip (their Fig. 3) could possibly be explained as an electron source emission rate dependency on the light polarization. Strain on their GaAs/GaAsP sample is uniaxial, and one would expect a linear dichroism in the photoemission possibly as large as 15% [7] - much larger than the 0.1% reported effect. The same concern exist for circular polarized light.	翻訳日:2023-03-09 16:40:34 公開日:2021-11-01
# 完全セキュアな分散スーパーデンス符号化: 最適性の絡み合い要件 Absolutely Secure Distributed Superdense Coding: Entanglement Requirement for Optimality ( http://arxiv.org/abs/2111.01563v1 ) ライセンス: Link先を確認	Sagnik Dutta, Asmita Banerjee, Prasanta K. Panigrahi	(参考訳) 超高密度符号化(superdense coding)は、量子チャネルを介して古典情報を安全に通信するための資源としてエンタングルメントを用いる。超高密度符号化法は、その容量がホレボ限界に達するとき最適である。最適性のためには、アリスとボブの二分割にわたって最大エンタングルメントが必要であるが、絶対的な多体エンタングルメントも真の多体エンタングルメントも必要ではないことを示す。偶数ビットまたは奇数ビットのいずれかの情報しか送信できない従来の方式とは異なり、真の多体エンタングル状態であるGHZ状態を用いて任意のビット数の情報を送る一般化された高密度符号化プロトコルを示す。異なるパウリ演算子の固有基底で表現すると、GHZ状態は固有のパリティパターンによって特徴付けられ、これによりプロトコルの絶対的な安全性を保証するセキュリティチェック手法を定式化できる。本手法は,資源となる情報が空間的に離れた複数の当事者に分散しているシナリオにも同様に適用できることを示す。最後に、ボブに送る量子ビット数を最適化することで、多対一の当事者間における絶対安全な一方向量子通信を完全に記述する分散高密度符号化法を構築する。 Superdense coding uses entanglement as a resource to communicate classical information securely through quantum channels. A superdense coding method is optimal when its capacity reaches Holevo bound. We show that for optimality, maximal entanglement is a necessity across the bipartition of Alice and Bob, but neither absolute nor genuine multipartite entanglement is required. Unlike the previous schemes, which can transmit either even or odd bits of information, we have demonstrated a generalized dense coding protocol using the genuine multipartite entangled GHZ state to send arbitrary information bits. Expressed in the eigenbasis of different Pauli operators, GHZ state is characterized by a unique parity pattern which enables us to formulate a security checking technique to ensure absolute security of the protocol. We show this method to be equally applicable in a scenario, where the resource information is distributed among spatially separated parties. Finally, optimizing the number of qubit(s) sent to Bob, we construct a distributed dense coding method, which completely depicts absolutely secure one way quantum communication between many to one party.	翻訳日:2023-03-09 16:40:22 公開日:2021-11-01
# 多安定キャビティマグノニックシステムを用いた長期記憶と三元論理ゲート Long-Time Memory and Ternary Logic Gate Using a Multistable Cavity Magnonic System ( http://arxiv.org/abs/2111.01558v1 ) ライセンス: Link先を確認	Rui-Chang Shen, Yi-Pu Wang, Jie Li, Shi-Yao Zhu, G. S. Agarwal, and J. Q. You	(参考訳) 多安定性は動的システムの特筆すべき非線形特性であり、メモリやスイッチの実装に利用できる。ここでは、Kerr非線形性を持つ3モードキャビティマグノニック系において三重安定性を実験的に実現する。三重安定領域の3つの安定状態は、特定の駆動条件下でのキャビティマグノンポラリトンの周波数シフトの安定解に対応する。系がどの安定状態にとどまるかは、系が経験した履歴に依存し、この状態を履歴情報の保存に利用できることが分かった。我々の実験では、メモリ時間は5.11秒に達する。さらに, この多安定ハイブリッド系を用いて, 良好なオンオフ特性を持つ3値論理ゲートを実証する。我々の新しい発見は、キャビティマグノニクスに基づく情報の保存と処理への道を開く。 Multistability is an extraordinary nonlinear property of dynamical systems and can be explored to implement memory and switches. Here we experimentally realize the tristability in a three-mode cavity magnonic system with Kerr nonlinearity. The three stable states in the tristable region correspond to the stable solutions of the frequency shift of the cavity magnon polariton under specific driving conditions. We find that the system staying in which stable state depends on the history experienced by the system, and this state can be harnessed to store the history information. In our experiment, the memory time can reach as long as 5.11 s. Moreover, we demonstrate the ternary logic gate with good on-off characteristics using this multistable hybrid system. Our new findings pave a way towards cavity magnonics-based information storage and processing.	翻訳日:2023-03-09 16:40:02 公開日:2021-11-01
# 熱画像における物体検出のための教師なし画像生成拡張適応 Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images ( http://arxiv.org/abs/2002.06770v3 ) ライセンス: Link先を確認	Peng Liu, Fuyu Li, Wanyi Li	(参考訳) 熱画像における物体検出は重要なコンピュータビジョンタスクであり、無人車両、ロボット工学、監視、暗視など多くの応用がある。ディープラーニングに基づく検出器は大きな進歩を遂げているが、通常は大量のラベル付きトレーニングデータを必要とする。しかし, 熱画像における物体検出のためのラベル付きデータは乏しく, 収集に費用がかかる。大量のラベル付き可視画像をいかに活用し、熱画像領域に適応させるかが、解決すべき課題である。本稿では,熱画像における物体検出のための教師なし画像生成強化適応法を提案する。可視領域と熱領域とのギャップを低減するため、提案手法は、対象画像に類似し、かつ可視ソース領域のアノテーション情報を保持した擬似熱画像を生成する。画像生成は、CycleGANに基づく画像間変換と強度反転変換からなる。生成された擬似熱画像は、新たなソースドメインとして使用される。その上で、既製のDomain Adaptive Faster RCNNを用いて、生成された中間領域と熱ターゲット領域とのギャップを低減する。実験により,提案手法の有効性と優位性を示す。 Object detection in thermal images is an important computer vision task and has many applications such as unmanned vehicles, robotics, surveillance and night vision. Deep learning based detectors have achieved major progress, which usually need large amount of labelled training data. However, labelled data for object detection in thermal images is scarce and expensive to collect. How to take advantage of the large number labelled visible images and adapt them into thermal image domain, is expected to solve. This paper proposes an unsupervised image-generation enhanced adaptation method for object detection in thermal images. To reduce the gap between visible domain and thermal domain, the proposed method manages to generate simulated fake thermal images that are similar to the target images, and preserves the annotation information of the visible source domain. The image generation includes a CycleGAN based image-to-image translation and an intensity inversion transformation. Generated fake thermal images are used as renewed source domain. And then the off-the-shelf Domain Adaptive Faster RCNN is utilized to reduce the gap between generated intermediate domain and the thermal target domain. Experiments demonstrate the effectiveness and superiority of the proposed method.	翻訳日:2022-12-31 13:01:07 公開日:2021-11-01
# SWAG:スパースラーニングのためのラッパー手法 SWAG: A Wrapper Method for Sparse Learning ( http://arxiv.org/abs/2006.12837v2 ) ライセンス: Link先を確認	Roberto Molinari, Gaetan Bakalli, Stéphane Guerrier, Cesare Miglioli, Samuel Orso, Mucyo Karemera, Olivier Scaillet	(参考訳) 機械学習の手法やアルゴリズムの大部分は予測性能を最優先するが、これは必ずしもユーザの優先事項と一致するとは限らない。多くの場合、工学から遺伝学までさまざまな分野の実務者や研究者は、特に、すべての属性が利用できるとは限らない状況において、結果の解釈可能性と再現性を必要としている。そのため、機械学習アルゴリズムの出力をより解釈しやすくするとともに、(予測性能の観点で)「等価な」学習器のライブラリを提供し、ユーザが属性の利用可能性に基づいて選択して、検証や予測・診断の目的に利用できるようにする必要がある。そこで本研究では,スクリーニングとラッパーのアプローチを組み合わせた手順を提案する。この手順は、ユーザが指定した学習方法に基づいて属性空間を貪欲に探索し、データ収集と保存のコストが低い疎な学習器のライブラリを見つける。この新しい方法は、(i) 容易に解釈できる属性の低次元ネットワークを提供し、(ii) 同等の予測力を持つ強力な学習器を定義する属性の組み合わせの多様性に基づいて、結果の潜在的な再現性を高める。我々はこのアルゴリズムを "Sparse Wrapper AlGorithm" (SWAG) と呼ぶ。 The majority of machine learning methods and algorithms give high priority to prediction performance which may not always correspond to the priority of the users. In many cases, practitioners and researchers in different fields, going from engineering to genetics, require interpretability and replicability of the results especially in settings where, for example, not all attributes may be available to them. As a consequence, there is the need to make the outputs of machine learning algorithms more interpretable and to deliver a library of "equivalent" learners (in terms of prediction performance) that users can select based on attribute availability in order to test and/or make use of these learners for predictive/diagnostic purposes. To address these needs, we propose to study a procedure that combines screening and wrapper approaches which, based on a user-specified learning method, greedily explores the attribute space to find a library of sparse learners with consequent low data collection and storage costs. 
This new method (i) delivers a low-dimensional network of attributes that can be easily interpreted and (ii) increases the potential replicability of results based on the diversity of attribute combinations defining strong learners with equivalent predictive power. We call this algorithm "Sparse Wrapper AlGorithm" (SWAG).	翻訳日:2022-11-17 21:58:41 公開日:2021-11-01
# 闇に光をもたらす - 統一されたフレームワーク下での知識グラフ埋め込みモデルの大規模評価 Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework ( http://arxiv.org/abs/2006.13365v5 ) ライセンス: Link先を確認	Mehdi Ali, Max Berrendorf, Charles Tapley Hoyt, Laurent Vermue, Mikhail Galkin, Sahand Sharifzadeh, Asja Fischer, Volker Tresp, Jens Lehmann	(参考訳) 近年発表された知識グラフ埋め込みモデルは、実装、トレーニング、評価が不均一であるため、公正かつ徹底的な比較が困難になっている。既発表の結果の再現性を評価するため,PyKEENソフトウェアパッケージに21の相互作用モデルを再実装し,評価した。本稿では、報告されたハイパーパラメータで再現できた結果、別のハイパーパラメータでのみ再現できた結果、全く再現できなかった結果を概説し、その理由についての知見を与える。次に、4つのデータセットで大規模ベンチマークを行い、数千件の実験と24,804 GPU時間の計算を行った。ベストプラクティス、各モデルの最良の設定、そして既発表の最良の設定に対して改善できる点について得られた知見を示す。我々の結果は、モデルの性能はモデルアーキテクチャだけで決まるのではなく、モデルアーキテクチャ、トレーニング手法、損失関数、および逆関係の明示的なモデリングの組み合わせが重要であることを明らかにする。いくつかのアーキテクチャは、慎重に設定すれば最先端技術と競合する結果を得られることを示す。コード、実験設定、結果、分析はhttps://github.com/pykeen/pykeenとhttps://github.com/pykeen/benchmarkingで利用可能である。 The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. In order to assess the reproducibility of previously published results, we re-implemented and evaluated 21 interaction models in the PyKEEN software package. Here, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all as well as provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. 
Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model's performances, and not only determined by the model architecture. We provide evidence that several architectures can obtain results competitive to the state-of-the-art when configured carefully. We have made all code, experimental configurations, results, and analyses that lead to our interpretations available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking	翻訳日:2022-11-17 21:32:37 公開日:2021-11-01
# ガウス過程と適応的離散化を用いたパレートアクティブラーニング Pareto Active Learning with Gaussian Processes and Adaptive Discretization ( http://arxiv.org/abs/2006.14061v2 ) ライセンス: Link先を確認	Andi Nika, Kerem Bozgan, Sepehr Elahi, Çağın Ararat, Cem Tekin	(参考訳) ガウス過程 (GP) からサンプリングされたベクトル値目的関数 $\boldsymbol{f}$ を最適化する問題を考える。ここで、GPの添字集合は、設計からなる性質の良いコンパクト距離空間 $({\cal X},d)$ である。$\boldsymbol{f}$ は事前には未知であり、設計 $x$ において $\boldsymbol{f}$ を評価すると $\boldsymbol{f}(x)$ のノイズを含む観測値が得られると仮定する。${\cal X}$ の濃度が大きい場合、全探索によるパレート最適設計の同定は実行不可能であるため、GPからサンプリングされた関数の滑らかさと $({\cal X},d)$ の構造を利用して高速に学習するアルゴリズム、Adaptive $\boldsymbol{\epsilon}$-PAL を提案する。本質的に、Adaptive $\boldsymbol{\epsilon}$-PAL は木に基づく適応的離散化手法を用いて、できるだけ少ない評価回数で $\boldsymbol{\epsilon}$-正確なパレート設計集合を同定する。$\boldsymbol{\epsilon}$-正確なパレート集合同定のサンプル複雑性について、情報型と計量次元型の両方の境界を与える。また,複数のベンチマークデータセットにおいて、本アルゴリズムが他のパレート集合同定手法よりも優れていることを実験的に示す。 We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a well-behaved, compact metric space $({\cal X},d)$ of designs. We assume that $\boldsymbol{f}$ is not known beforehand and that evaluating $\boldsymbol{f}$ at design $x$ results in a noisy observation of $\boldsymbol{f}(x)$. Since identifying the Pareto optimal designs via exhaustive search is infeasible when the cardinality of ${\cal X}$ is large, we propose an algorithm, called Adaptive $\boldsymbol{\epsilon}$-PAL, that exploits the smoothness of the GP-sampled function and the structure of $({\cal X},d)$ to learn fast. In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs in as few evaluations as possible. We provide both information-type and metric dimension-type bounds on the sample complexity of $\boldsymbol{\epsilon}$-accurate Pareto set identification. 
We also experimentally show that our algorithm outperforms other Pareto set identification methods on several benchmark datasets.	翻訳日:2022-11-17 10:07:31 公開日:2021-11-01
# すべての障害モードが等しく作られているわけではない: 説明可能な(誤)分類のためのディープニューラルネットワークのトレーニング Not all Failure Modes are Created Equal: Training Deep Neural Networks for Explicable (Mis)Classification ( http://arxiv.org/abs/2006.14841v2 ) ライセンス: Link先を確認	Alberto Olmo, Sailik Sengupta, Subbarao Kambhampati	(参考訳) ディープニューラルネットワークは、画像分類タスクにおいてしばしば脆弱であり、入力を誤分類することが知られている。これらの誤分類は避けられないかもしれないが、すべての障害モードを同等とみなすことはできない。特定の誤分類(例えば、犬の画像を飛行機に分類する)は、人間を困惑させ、システムに対する人間の信頼を失わせる可能性がある。さらに悪いことに、こうした誤り(例えば、人が霊長類に誤分類される)は、忌まわしい社会的影響をもたらしうる。そこで本研究では,説明不能な誤りを減らすことを目的とする。この課題に対処するために、まず、どのクラスが意味的に近く、どのクラスが遠いかに関する人間の期待($M^h$)を捉えたクラスレベルのセマンティクスを得る方法について議論する。CIFAR-10, CIFAR-100, ImageNetなどの一般的な画像ベンチマークでは, 被験者実験、または公開されている人手で整備された知識ベースを活用すれば, クラスレベルのセマンティクスが容易に得られることを示す。第二に,重み付き損失関数(WLF)を用いて,説明不能性の重みに応じて誤分類にペナルティを課す手法を提案する。最後に,提案手法を用いて既存の分類器を訓練(あるいは微調整)すると,(1)同等のtop-1精度を持ち,(2)分布内および分布外(OOD)の両方のテストデータにおいてより説明可能な障害モードを示し,(3)既存の研究に比べて追加の人手ラベルの収集に要するコストが大幅に少ないディープニューラルネットワークが得られることを示す。 Deep Neural Networks are often brittle on image classification tasks and known to misclassify inputs. While these misclassifications may be inevitable, all failure modes cannot be considered equal. Certain misclassifications (eg. classifying the image of a dog to an airplane) can perplex humans and result in the loss of human trust in the system. Even worse, these errors (eg. a person misclassified as a primate) can have odious societal impacts. Thus, in this work, we aim to reduce inexplicable errors. To address this challenge, we first discuss methods to obtain the class-level semantics that capture the human's expectation ($M^h$) regarding which classes are semantically close {\em vs.} ones that are far away. We show that for popular image benchmarks (like CIFAR-10, CIFAR-100, ImageNet), class-level semantics can be readily obtained by leveraging either human subject studies or publicly available human-curated knowledge bases. Second, we propose the use of Weighted Loss Functions (WLFs) to penalize misclassifications by the weight of their inexplicability. 
Finally, we show that training (or fine-tuning) existing classifiers with the proposed methods lead to Deep Neural Networks that have (1) comparable top-1 accuracy, (2) more explicable failure modes on both in-distribution and out-of-distribution (OOD) test data, and (3) incur significantly less cost in the gathering of additional human labels compared to existing works.	翻訳日:2022-11-16 20:46:45 公開日:2021-11-01
# 自閉症スペクトラム障害の神経画像診断とリハビリテーションのための深層学習 Deep Learning for Neuroimaging-based Diagnosis and Rehabilitation of Autism Spectrum Disorder: A Review ( http://arxiv.org/abs/2007.01285v4 ) ライセンス: Link先を確認	Marjane Khodatars, Afshin Shoeibi, Delaram Sadeghi, Navid Ghassemi, Mahboobeh Jafari, Parisa Moridian, Ali Khadem, Roohallah Alizadehsani, Assef Zare, Yinan Kong, Abbas Khosravi, Saeid Nahavandi, Sadiq Hussain, U. Rajendra Acharya, Michael Berk	(参考訳) 自閉症スペクトラム障害(ASD)の正確な診断と効果的なリハビリテーションが本疾患の管理に不可欠である。人工知能(AI)技術は、医師が自動診断とリハビリテーションの手順を適用するのを助ける。 AI技術は、従来の機械学習(ML)アプローチとディープラーニング(DL)技術で構成される。従来のml法は様々な特徴抽出と分類技術を用いるが、dlでは特徴抽出と分類のプロセスは知的かつ統合的に達成される。 ASDの診断のためのDL法は神経画像に基づくアプローチに焦点を当てている。神経イメージング技術は、ASD診断に有用な非侵襲性疾患マーカーである。構造的および機能的ニューロイメージング技術は、医師に脳の構造(解剖学と構造的接続)と機能(活動と機能的接続)に関する重要な情報を提供する。脳の複雑な構造と機能のため、DLのような強力なAI技術を活用することなく、神経画像データを用いたASD診断のための最適な手順を提案することは困難である。本稿では,ASDを識別するためのDLネットワークを用いた研究について述べる。 DLネットワークを利用したASD患者を支援するためのリハビリテーションツールも評価した。最後に,ASDの自動検出と修復において重要な課題を提示し,今後の課題を提案する。 Accurate diagnosis of Autism Spectrum Disorder (ASD) followed by effective rehabilitation is essential for the management of this disorder. Artificial intelligence (AI) techniques can aid physicians to apply automatic diagnosis and rehabilitation procedures. AI techniques comprise traditional machine learning (ML) approaches and deep learning (DL) techniques. Conventional ML methods employ various feature extraction and classification techniques, but in DL, the process of feature extraction and classification is accomplished intelligently and integrally. DL methods for diagnosis of ASD have been focused on neuroimaging-based approaches. Neuroimaging techniques are non-invasive disease markers potentially useful for ASD diagnosis. Structural and functional neuroimaging techniques provide physicians substantial information about the structure (anatomy and structural connectivity) and function (activity and functional connectivity) of the brain. 
Due to the intricate structure and function of the brain, proposing optimum procedures for ASD diagnosis with neuroimaging data without exploiting powerful AI techniques like DL may be challenging. In this paper, studies conducted with the aid of DL networks to distinguish ASD are investigated. Rehabilitation tools provided for supporting ASD patients utilizing DL networks are also assessed. Finally, we will present important challenges in the automated detection and rehabilitation of ASD and propose some future works.	翻訳日:2022-11-14 14:02:43 公開日:2021-11-01
# 不確実性推定を用いた変分オートエンコーダによる分布外サンプルの検出 Detecting Out-of-distribution Samples via Variational Auto-encoder with Reliable Uncertainty Estimation ( http://arxiv.org/abs/2007.08128v3 ) ライセンス: Link先を確認	Xuming Ran, Mingkun Xu, Lingrui Mei, Qi Xu, Quanying Liu	(参考訳) 変分オートエンコーダ(VAE)は、ディープニューラルネットワークアーキテクチャとベイズ法に由来する豊かな表現能力を持つ、影響力のある生成モデルである。しかしながら、VAEモデルには、分布外(OOD)入力に対して、分布内(ID)入力よりも高い尤度を割り当ててしまうという弱点がある。この問題に対処するため、OOD入力を深く理解するには、信頼できる不確実性推定が重要であると考えられる。本研究では,VAEのエンコーダに統合可能な改良型ノイズコントラスティブ事前分布(INCP)を提案し、これをINCPVAEと呼ぶ。INCPはスケーラブルかつ学習可能で、VAEと互換性があり、不確実性推定に関するINCPの利点も取り入れている。各種データセットに対する実験により,本モデルは標準のVAEと比較してOODデータの不確実性推定に優れ,異常検出タスクにおいて頑健であることが示された。INCPVAEモデルは、OOD入力に対する信頼できる不確実性推定を与え、VAEモデルにおけるOOD問題を解決する。 Variational autoencoders (VAEs) are influential generative models with rich representation capabilities from the deep neural network architecture and Bayesian method. However, VAE models have a weakness that assign a higher likelihood to out-of-distribution (OOD) inputs than in-distribution (ID) inputs. To address this problem, a reliable uncertainty estimation is considered to be critical for in-depth understanding of OOD inputs. In this study, we propose an improved noise contrastive prior (INCP) to be able to integrate into the encoder of VAEs, called INCPVAE. INCP is scalable, trainable and compatible with VAEs, and it also adopts the merits from the INCP for uncertainty estimation. Experiments on various datasets demonstrate that compared to the standard VAEs, our model is superior in uncertainty estimation for the OOD data and is robust in anomaly detection tasks. The INCPVAE model obtains reliable uncertainty estimation for OOD inputs and solves the OOD problem in VAE models.	翻訳日:2022-11-09 22:04:06 公開日:2021-11-01
# クロスドメイン少数ショット認識のための中間レベルパターンの再検討 Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition ( http://arxiv.org/abs/2008.03128v4 ) ライセンス: Link先を確認	Yixiong Zou, Shanghang Zhang, JianPeng Yu, Yonghong Tian, José M. F. Moura	(参考訳) 既存のfew-shot学習(FSL)手法は通常、ベースクラスと新規クラスが同じドメインに属する(ドメイン内設定)と仮定する。しかし、実際には、一部の特殊なドメインでは、ベースクラスを構築するのに十分なトレーニングサンプルを集めることが不可能な場合がある。この問題を解決するために, 一般ドメインのベースクラスから特殊ドメインの新規クラスへ知識を転送するクロスドメインFSL (CDFSL) がごく最近提案された。既存のCDFSL研究は主に近いドメイン間の転送に焦点を当てており、遠いドメイン間の転送はほとんど検討されていない。しかし、実世界の応用ではどのような新規クラスも現れうるため、遠いドメイン間の転送は実用上必要であり、かつより困難である。本稿では,新規クラスがベースクラスから遠いドメインにあるという、CDFSLの難しいサブセットを検討する。そのために、より転送しやすいにもかかわらず主流のFSL研究では十分に探究されていない中間レベルの特徴を再検討する。中間レベルの特徴の識別性を高めるために,中間レベルの特徴が各サンプルの識別的な情報を学習するよう促す残差予測タスクを提案する。特に、このメカニズムはドメイン内FSLや近いドメインでのCDFSLにも有益である。したがって、同じトレーニングフレームワークの下で、クロスドメインFSLとドメイン内FSLのそれぞれに対して2種類の特徴を提供する。2つの挑戦的な医療データセットを含む6つの公開データセットにおける両方の設定での実験により、我々の理論的根拠が検証され、最先端の性能が示された。コードは公開される予定である。 Existing few-shot learning (FSL) methods usually assume base classes and novel classes are from the same domain (in-domain setting). However, in practice, it may be infeasible to collect sufficient training samples for some special domains to construct base classes. To solve this problem, cross-domain FSL (CDFSL) is proposed very recently to transfer knowledge from general-domain base classes to special-domain novel classes. Existing CDFSL works mostly focus on transferring between near domains, while rarely consider transferring between distant domains, which is in practical need as any novel classes could appear in real-world applications, and is even more challenging. In this paper, we study a challenging subset of CDFSL where the novel classes are in distant domains from base classes, by revisiting the mid-level features, which are more transferable yet under-explored in main stream FSL work. To boost the discriminability of mid-level features, we propose a residual-prediction task to encourage mid-level features to learn discriminative information of each sample. 
Notably, such mechanism also benefits the in-domain FSL and CDFSL in near domains. Therefore, we provide two types of features for both cross- and in-domain FSL respectively, under the same training framework. Experiments under both settings on six public datasets, including two challenging medical datasets, validate the our rationale and demonstrate state-of-the-art performance. Code will be released.	翻訳日:2022-11-02 01:40:12 公開日:2021-11-01
# Berrut Approximated Coded Computing: 多項式コンピューティング以外のストラグラー耐性 Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing ( http://arxiv.org/abs/2009.08327v3 ) ライセンス: Link先を確認	Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali	(参考訳) 大規模データセットで複雑なモデルを訓練するために分散学習を用いる際の大きな課題の1つは、ストラグラー(遅延ノード)の影響に対処することである。その解決策として,計算タスクに効率的に冗長性を付加する符号化計算が近年提案されている。この技術では、データセットにまたがって符号化を行い、符号化されたデータ上で計算を行うことで、一定サイズのワーカーノードの任意の部分集合の結果だけで最終結果を復元できるようにする。これらのアプローチの主な課題は、(1) 多項式関数の計算に限定されること、(2) 待つ必要のあるサーバの部分集合のサイズが、データセットのサイズとモデルの複雑さ(多項式の次数)の積に応じて増大し、非現実的に大きくなりうること、(3) 実数上の計算に対して数値的に安定でないこと、である。本稿では,多項式関数の計算に限定されない代替手法として,Berrut近似符号化計算(BACC)を提案する。さらに、マスターノードは、利用可能なワーカーノードの任意の部分集合の結果を用いて、最終結果を近似的に計算できる。この近似手法は、低い計算量で数値的に安定であることが証明されている。また,近似の精度を理論的に確立し、分散学習問題などの異なる設定でのシミュレーション結果により検証する。特に、BACCをサーバクラスタ上でのディープニューラルネットワークの訓練に用いたところ、収束速度の点で繰り返し計算(繰り返し符号化)を上回った。 One of the major challenges in using distributed learning to train complicated models with large data sets is to deal with stragglers effect. As a solution, coded computation has been recently proposed to efficiently add redundancy to the computation tasks. In this technique, coding is used across data sets, and computation is done over coded data, such that the results of an arbitrary subset of worker nodes with a certain size are enough to recover the final results. The major challenges with those approaches are (1) they are limited to polynomial function computations, (2) the size of the subset of servers that we need to wait for grows with the multiplication of the size of the data set and the model complexity (the degree of the polynomial), which can be prohibitively large, (3) they are not numerically stable for computation over real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC), as an alternative approach, which is not limited to polynomial function computation. 
In addition, the master node can approximately calculate the final results, using the outcomes of any arbitrary subset of available worker nodes. The approximation approach is proven to be numerically stable with low computational complexity. In addition, the accuracy of the approximation is established theoretically and verified by simulation results in different settings such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, which outperforms repetitive computation (repetition coding) in terms of the rate of convergence.	翻訳日:2022-10-17 12:25:32 公開日:2021-11-01
# 小データを用いた幼児ポーズ推定のための不変表現学習 Invariant Representation Learning for Infant Pose Estimation with Small Data ( http://arxiv.org/abs/2010.06100v5 ) ライセンス: Link先を確認	Xiaofei Huang, Nihang Fu, Shuangjun Liu, Sarah Ostadabbas	(参考訳) 幼児の運動分析は、幼児期の発達研究において極めて重要なトピックである。しかしながら、人間のポーズ推定の応用はますます広がっているものの、大規模な成人ポーズデータセットで訓練されたモデルは、体の比率やポーズの多様性が大きく異なるため、幼児のポーズの推定にはほとんど成功していない。さらに、プライバシとセキュリティへの配慮から、頑健なモデルをゼロから訓練するのに必要な十分な幼児ポーズデータの入手が妨げられている。この問題に対処するため、本稿では、(1) 少数ながら多様な実幼児画像と、生成した合成幼児ポーズとを組み合わせたハイブリッド合成・実幼児ポーズ(SyRIP)データセットを構築して公開し、(2) 隣接ドメインである成人ポーズと合成幼児画像の知識を、微調整されたドメイン適応幼児ポーズ(FiDIP)推定モデルへ転送できる多段階不変表現学習戦略を提示する。アブレーション研究では,同一のネットワーク構造の下で、SyRIPデータセットで訓練したモデルが、他に唯一公開されている幼児ポーズデータセットで訓練したモデルよりも顕著に改善することを示した。複雑さの異なるポーズ推定バックボーンネットワークと統合した場合、FiDIPはそれらのモデルを微調整したものよりも一貫して高い性能を示す。最先端のDarkPoseモデルを用いた我々の最良の幼児ポーズ推定器の1つは、93.6の平均適合率(mAP)を示す。 Infant motion analysis is a topic with critical importance in early childhood development studies. However, while the applications of human pose estimation have become more and more broad, models trained on large-scale adult pose datasets are barely successful in estimating infant poses due to the significant differences in their body ratio and the versatility of their poses. Moreover, the privacy and security considerations hinder the availability of adequate infant pose data required for training of a robust model from scratch. To address this problem, this paper presents (1) building and publicly releasing a hybrid synthetic and real infant pose (SyRIP) dataset with small yet diverse real infant images as well as generated synthetic infant poses and (2) a multi-stage invariant representation learning strategy that could transfer the knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose (FiDIP) estimation model. In our ablation study, with identical network structure, models trained on SyRIP dataset show noticeable improvement over the ones trained on the only other public infant pose datasets. 
Integrated with pose estimation backbone networks with varying complexity, FiDIP performs consistently better than the fine-tuned versions of those models. One of our best infant pose estimation performers on the state-of-the-art DarkPose model shows mean average precision (mAP) of 93.6.	翻訳日:2022-10-07 23:58:23 公開日:2021-11-01
# 言語間一般化改善のための多言語BERTにおける言語の手がかりの探索 Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization ( http://arxiv.org/abs/2010.10041v4 ) ライセンス: Link先を確認	Chi-Liang Liu and Tsung-Yuan Hsu and Yung-Sung Chuang and Chung-Yi Li and Hung-yi Lee	(参考訳) 多言語BERT (m-BERT) のトークン埋め込みには、言語情報と意味情報の両方が含まれている。我々は、ある言語のトークンの埋め込みを単純に平均するだけで、その言語の表現が得られることを見出した。この言語表現を用いてトークン埋め込みを操作することで多言語BERTの出力言語を制御し、教師なしトークン翻訳を実現する。さらに、この観察に基づき、m-BERTの言語間能力を改善する、計算コストが低く効果的なアプローチを提案する。 Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information. We find that the representation of a language can be obtained by simply averaging the embeddings of the tokens of the language. Given this language representation, we control the output languages of multilingual BERT by manipulating the token embeddings, thus achieving unsupervised token translation. We further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT based on this observation.	翻訳日:2022-10-05 05:52:23 公開日:2021-11-01
# 教師なし多肢選択式質問応答:基礎知識から学び始める Unsupervised Multiple Choices Question Answering: Start Learning from Basic Knowledge ( http://arxiv.org/abs/2010.11003v2 ) ライセンス: Link先を確認	Chi-Liang Liu and Hung-yi Lee	(参考訳) 本稿では,ほぼ教師なしの多肢選択式質問応答(MCQA)の可能性について検討する。ごく基本的な知識を出発点として、MCQAモデルは、ある選択肢が他の選択肢よりも正解である確率が高いことを知る。この情報は非常にノイズが多いものの、MCQAモデルの訓練を導く。提案手法は RACE においてベースライン手法を上回り、MC500 では一部の教師あり学習手法にも匹敵することが示された。 In this paper, we study the possibility of almost unsupervised Multiple Choices Question Answering (MCQA). Starting from very basic knowledge, MCQA model knows that some choices have higher probabilities of being correct than the others. The information, though very noisy, guides the training of an MCQA model. The proposed method is shown to outperform the baseline approaches on RACE and even comparable with some supervised learning approaches on MC500.	翻訳日:2022-10-04 22:49:39 公開日:2021-11-01
# 分布外予測のための因果意味表現の学習 Learning Causal Semantic Representation for Out-of-Distribution Prediction ( http://arxiv.org/abs/2011.01681v5 ) ライセンス: Link先を確認	Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu	(参考訳) 従来の教師あり学習法、特に深層学習法は、分布外(OOD)の例に敏感であることが分かっている。その主な原因は、出力を引き起こすのは意味的要因のみであるにもかかわらず、学習された表現がドメイン固有の相関のために意味的要因と変動要因を混合してしまうことにある。この問題に対処するために,因果的推論に基づいて2つの要因を個別にモデル化する因果意味生成モデル(CSG)を提案し,一般的かつ困難な設定である単一のトレーニングドメインからのOOD予測手法を開発する。これらの手法は因果不変性の原理に基づいており、効率的な学習と容易な予測の両方のために変分ベイズにおいて新しい設計を取り入れている。理論的には、ある条件下で、CSGがトレーニングデータへの適合によって意味的要因を同定できること、そしてこの意味的同定がOOD汎化誤差の有界性と適応の成功を保証することを証明する。実証実験では、一般的なベースラインよりもOOD性能が向上することを示す。 Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG) based on a causal reasoning so that the two factors are modeled separately, and develop methods for OOD prediction from a single training domain, which is common and challenging. The methods are based on the causal invariance principle, with a novel design in variational Bayes for both efficient learning and easy prediction. Theoretically, we prove that under certain conditions, CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error and the success of adaptation. Empirical study shows improved OOD performance over prevailing baselines.	翻訳日:2022-09-30 03:41:39 公開日:2021-11-01
# 量子畳み込みニューラルネットワークにおける不毛な台地の不在 Absence of Barren Plateaus in Quantum Convolutional Neural Networks ( http://arxiv.org/abs/2011.02966v2 ) ライセンス: Link先を確認	Arthur Pesah, M. Cerezo, Samson Wang, Tyler Volkoff, Andrew T. Sornborger, Patrick J. Coles	(参考訳) 量子ニューラルネットワーク(QNN)は、量子データを効率的に分析できる可能性から期待を集めている。しかし、多くのQNNアーキテクチャにおいて、不毛な台地(barren plateau)と呼ばれる指数関数的に消失する勾配が存在することが、この期待に水を差している。近年、量子畳み込みニューラルネットワーク(QCNN)が提案された。これは、関連するデータ特徴に関する情報を保持しながら量子ビット数を減らす畳み込み層とプーリング層の列からなる。本研究では,QCNNアーキテクチャにおけるパラメータの勾配スケーリングを厳密に解析する。勾配の分散は多項式的な速さより速くは消失しないことが分かり、これはQCNNが不毛な台地を示さないことを意味する。これは、ランダムに初期化されたQCNNの訓練可能性に対する解析的な保証を与え、他の多くのQNNアーキテクチャとは異なり、QCNNがランダム初期化の下で訓練可能であることを際立たせる。結果を導出するために,Haar分布に従うユニタリに関する期待値を解析するグラフベースの新しい手法を導入する。この手法は他の文脈でも有用であろう。最後に,解析結果を検証するために数値シミュレーションを行う。 Quantum neural networks (QNNs) have generated excitement around the possibility of efficiently analyzing quantum data. But this excitement has been tempered by the existence of exponentially vanishing gradients, known as barren plateau landscapes, for many QNN architectures. Recently, Quantum Convolutional Neural Networks (QCNNs) have been proposed, involving a sequence of convolutional and pooling layers that reduce the number of qubits while preserving information about relevant data features. In this work we rigorously analyze the gradient scaling for the parameters in the QCNN architecture. We find that the variance of the gradient vanishes no faster than polynomially, implying that QCNNs do not exhibit barren plateaus. This provides an analytical guarantee for the trainability of randomly initialized QCNNs, which highlights QCNNs as being trainable under random initialization unlike many other QNN architectures. To derive our results we introduce a novel graph-based method to analyze expectation values over Haar-distributed unitaries, which will likely be useful in other contexts. Finally, we perform numerical simulations to verify our analytical results.	翻訳日:2022-09-29 11:48:56 公開日:2021-11-01
# 高次ボロノイ図を用いた$k$-Nearest近傍分類器の逆例 Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams ( http://arxiv.org/abs/2011.09719v2 ) ライセンス: Link先を確認	Chawin Sitawarin, Evgenios M. Kornaropoulos, Dawn Song, David Wagner	(参考訳) 逆例は機械学習モデルにおいて広く研究されている現象である。注目を集めているのはニューラルネットワークだが、他の実用的なモデルもこの問題に悩まされている。そこで本研究では,最小ノルムの逆数例を求めるために,$k$-nearest 近傍分類の逆数ロバスト性を評価するアルゴリズムを提案する。従来の提案と異なり,与えられた入力点から外へ拡大する探索を行うことにより幾何学的アプローチをとる。高いレベルでは、探索半径は、入力点と異なる分類を行う細胞を見つけるまで、近くのボロノイ細胞へと拡大する。アルゴリズムを大規模な$k$にスケールするために、様々なデータセットにおいて、ベースラインと比較してより少ないノルムで摂動を求める近似ステップを導入する。さらに、我々のアプローチが競合より優れているデータセットの構造特性を分析する。 Adversarial examples are a widely studied phenomenon in machine learning models. While most of the attention has been focused on neural networks, other practical models also suffer from this issue. In this work, we propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification, i.e., finding a minimum-norm adversarial example. Diverging from previous proposals, we take a geometric approach by performing a search that expands outwards from a given input point. On a high level, the search radius expands to the nearby Voronoi cells until we find a cell that classifies differently from the input point. To scale the algorithm to a large $k$, we introduce approximation steps that find perturbations with smaller norm, compared to the baselines, in a variety of datasets. Furthermore, we analyze the structural properties of a dataset where our approach outperforms the competition.	翻訳日:2022-09-23 20:16:56 公開日:2021-11-01
# データ効率の高い電波銀河の分類 Data-Efficient Classification of Radio Galaxies ( http://arxiv.org/abs/2011.13311v2 ) ライセンス: Link先を確認	Ashwin Samudre, Lijo George, Mahak Bansal, Yogesh Wadadekar	(参考訳) 電波銀河からの連続放出は、一般的にFRI、FRII、ベント、コンプレックスなどの異なる形態分類に分類される。本稿では,小規模データセット($\sim 2000$ sample)を用いた深層学習法を用いて,形態学に基づく電波銀河分類の課題について検討する。本研究では, サイクリック学習率や識別学習などの高度な技術を用いた事前訓練DenseNetモデルを用いて, ツインネットワークに基づく数ショット学習手法を適用し, モデルを高速に学習する。我々は、ベント型銀河とFRII型銀河の最大の混同源である最高の性能モデルを用いて、92\%以上の分類精度を達成する。私たちの結果は、小さながキュレーションされたデータセットに焦点を合わせることで、ニューラルネットワークのトレーニングにベストプラクティスを使うことが、よい結果をもたらすことを示しています。自動分類技術は、近日中に数十万個の新しい電波銀河を検出すると期待されている次世代の電波望遠鏡による調査に欠かせない。 The continuum emission from radio galaxies can be generally classified into different morphological classes such as FRI, FRII, Bent, or Compact. In this paper, we explore the task of radio galaxy classification based on morphology using deep learning methods with a focus on using a small scale dataset ($\sim 2000$ samples). We apply few-shot learning techniques based on Twin Networks and transfer learning techniques using a pre-trained DenseNet model with advanced techniques like cyclical learning rate and discriminative learning to train the model rapidly. We achieve a classification accuracy of over 92\% using our best performing model with the biggest source of confusion being between Bent and FRII type galaxies. Our results show that focusing on a small but curated dataset along with the use of best practices to train the neural network can lead to good results. Automated classification techniques will be crucial for upcoming surveys with next generation radio telescopes which are expected to detect hundreds of thousands of new radio galaxies in the near future.	翻訳日:2022-09-20 12:35:12 公開日:2021-11-01
# (参考訳) perspeechnorm: ペルシャ語の音声処理正規化ツールキット PerSpeechNorm: A Persian Toolkit for Speech Processing Normalization ( http://arxiv.org/abs/2111.03470v1 ) ライセンス: CC BY 4.0	Romina Oji, Seyedeh Fatemeh Razavi, Sajjad Abdi Dehsorkh, Alireza Hariri, Hadi Asheri, Reshad Hosseini	(参考訳) 一般に、音声処理モデルは音響モデルとともに言語モデルで構成される。言語モデルの複雑さとバリエーションに関わらず、クリーニング、正規化、トークン化という3つの重要な前処理ステップが言語モデルで必要である。上述のステップの中で、正規化ステップは、純粋なテキストアプリケーションで統一されたフォーマットに不可欠である。しかし、音声処理モジュールの組み込み言語モデルでは、正規化は形式統一に限定されない。さらに、読みやすいシンボル、番号等を、どのように発音するかに変換する必要がある。音声処理モジュールに組み込み言語モデルのためのペルシア正規化ツールキットは存在しないので,本論文では,音声処理におけるテキスト処理のためのオープンソース正規化ツールキットを提案する。簡潔に言えば、記号(普通通貨、#、@、urlなど)、数字(日付、時間、電話番号、国定コードなど)といった異なる読みやすいペルシア語のテキストを考える。他のペルシア語テキスト正規化ツールとの比較は、音声処理における提案手法の優位性を示している。また,提案した関数の1つ(文分離)に対するモデルの性能を,HAZMやParsivarといった他の共通自然言語ライブラリと比較すると,提案手法の適切な性能を示す。さらに,ペルシャ語ウィキペディアデータの評価により,提案手法の適切な性能が確認された。 In general, speech processing models consist of a language model along with an acoustic model. Regardless of the language model's complexity and variants, three critical pre-processing steps are needed in language models: cleaning, normalization, and tokenization. Among mentioned steps, the normalization step is so essential to format unification in pure textual applications. However, for embedded language models in speech processing modules, normalization is not limited to format unification. Moreover, it has to convert each readable symbol, number, etc., to how they are pronounced. To the best of our knowledge, there is no Persian normalization toolkits for embedded language models in speech processing modules, So in this paper, we propose an open-source normalization toolkit for text processing in speech applications. Briefly, we consider different readable Persian text like symbols (common currencies, #, @, URL, etc.), numbers (date, time, phone number, national code, etc.), and so on. Comparison with other available Persian textual normalization tools indicates the superiority of the proposed method in speech processing. 
Also, comparing the model's performance on one of the proposed functions (sentence separation) with other common natural language libraries such as HAZM and Parsivar indicates that the proposed method performs well. In addition, an evaluation on some Persian Wikipedia data confirms this performance.	翻訳日:2021-11-14 15:44:28 公開日:2021-11-01
# 整数計画のための大近傍探索ポリシーの学習 Learning Large Neighborhood Search Policy for Integer Programming ( http://arxiv.org/abs/2111.03466v1 ) ライセンス: Link先を確認	Yaoxin Wu, Wen Song, Zhiguang Cao and Jie Zhang	(参考訳) 本稿では,整数計画 (IP) のための大近傍探索 (LNS) ポリシーを学習する深層強化学習 (RL) 手法を提案する。 RLポリシーは破壊(destroy)演算子として訓練され、各ステップで変数のサブセットを選択する。選択されたサブセットは、修復(repair)演算子であるIPソルバによって再最適化される。しかし、変数サブセットの数は組合せ的に増大するため、典型的なRLアルゴリズムを直接適用することはできない。この課題に取り組むために、すべてのサブセットを各変数に対する二値決定に分解して表現する。次に,各変数のポリシーを並列に学習するニューラルネットワークを設計し,カスタマイズしたアクタークリティックアルゴリズムで訓練する。提案手法を4つの代表的なIP問題で評価する。その結果、SCIPよりもはるかに短い時間でより良い解を見つけることができ、同じ実行時間で他のLNSベースラインを大幅に上回ることが示された。さらに、これらの利点は、ポリシーをより大きな問題に一般化した場合にも顕著に持続する。また、Gurobiを用いた追加実験により、同じ制限時間内でこの最先端の商用ソルバを上回ることができることも明らかになった。 We propose a deep reinforcement learning (RL) method to learn a large neighborhood search (LNS) policy for integer programming (IP). The RL policy is trained as the destroy operator to select a subset of variables at each step, which is reoptimized by an IP solver as the repair operator. However, the combinatorial number of variable subsets prevents direct application of typical RL algorithms. To tackle this challenge, we represent all subsets by factorizing them into binary decisions on each variable. We then design a neural network to learn policies for each variable in parallel, trained by a customized actor-critic algorithm. We evaluate the proposed method on four representative IP problems. Results show that it can find better solutions than SCIP in much less time, and significantly outperform other LNS baselines with the same runtime. Moreover, these advantages notably persist when the policies generalize to larger problems. Further experiments with Gurobi also reveal that our method can outperform this state-of-the-art commercial solver within the same time limit.	翻訳日:2021-11-14 15:12:53 公開日:2021-11-01
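The destroy/repair loop described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy problem is a small 0/1 knapsack, the destroy operator picks variables uniformly at random (standing in for the learned RL policy), and the repair operator is brute-force enumeration (standing in for an IP solver such as SCIP or Gurobi). The function names `repair` and `lns` are hypothetical and do not come from the paper.

```python
import itertools
import random


def objective(x, values):
    """Total value of the selected items."""
    return sum(v for v, xi in zip(values, x) if xi)


def feasible(x, weights, capacity):
    """Knapsack constraint: total selected weight must fit the capacity."""
    return sum(w for w, xi in zip(weights, x) if xi) <= capacity


def repair(x, free, values, weights, capacity):
    """Re-optimize only the variables in `free`; all others stay fixed.

    Brute force stands in for the IP solver used as the repair operator.
    The incumbent `x` is kept unless a strictly better feasible point exists.
    """
    best = list(x)
    for bits in itertools.product((0, 1), repeat=len(free)):
        cand = list(x)
        for i, b in zip(free, bits):
            cand[i] = b
        if feasible(cand, weights, capacity) and objective(cand, values) > objective(best, values):
            best = cand
    return best


def lns(values, weights, capacity, subset_size=3, steps=50, seed=0):
    """Large neighborhood search with a random destroy operator (assumption)."""
    rng = random.Random(seed)
    x = [0] * len(values)  # the empty knapsack is feasible for non-negative weights
    for _ in range(steps):
        free = rng.sample(range(len(values)), subset_size)  # destroy: choose variables to free
        x = repair(x, free, values, weights, capacity)      # repair: re-optimize them
    return x
```

Because the repair step never accepts a worse or infeasible point, the objective is non-decreasing across iterations. A learned destroy policy would replace the `rng.sample` call with per-variable binary decisions, as the abstract describes.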
# RADAMS:IDoS攻撃に対するレジリエントで適応的なアラートと注意管理戦略 RADAMS: Resilient and Adaptive Alert and Attention Management Strategy against Informational Denial-of-Service (IDoS) Attacks ( http://arxiv.org/abs/2111.03463v1 ) ライセンス: Link先を確認	Linan Huang and Quanyan Zhu	(参考訳) 人間の注意欠陥を利用した攻撃は、サイバーセキュリティに深刻な脅威をもたらしている。本研究では,人間の操作を過負荷にし,実際の攻撃を隠蔽するために大量のフェント攻撃を発生させるIDoS攻撃という,新たなタイプのアクティブアタック攻撃を特定し,正式に定義する。人間の要因(例えば、専門知識、ストレス、効率のレベル)と経験的結果(例えば、ヤークス・ドッドソンの法則とサンクコスト誤認)を組み込んで、オペレータの注意力のダイナミクスとその意思決定プロセスとそのリアルタイムの警告監視と検査をモデル化します。そこで我々は,警告の可観測性に基づいて警告を選択的に強調するResilient and Adaptive Data-driven alert and Attention Management Strategy (RADAMS)を開発した。 RADAMSは強化学習を使用して、様々な人間のオペレータ向けにカスタマイズされた、転送可能な設計を実現し、IDoS攻撃を進化させる。統合モデリングと理論的分析は、製品原則(Product Principle of Attention, PPoA)、基本的限界、重要な人的・経済的要因間のトレードオフにつながる。実験結果は,提案手法がデフォルト戦略を上回り,最大20%のidosリスクを低減できることを示した。さらに、この戦略はコスト、攻撃頻度、人的注意力の多様さに耐性がある。我々は,注意リスク等価性,攻撃者のジレンマ,半真正銘の最適攻撃戦略などの興味深い現象を認識した。 Attacks exploiting human attentional vulnerability have posed severe threats to cybersecurity. In this work, we identify and formally define a new type of proactive attentional attacks called Informational Denial-of-Service (IDoS) attacks that generate a large volume of feint attacks to overload human operators and hide real attacks among feints. We incorporate human factors (e.g., levels of expertise, stress, and efficiency) and empirical results (e.g., the Yerkes-Dodson law and the sunk cost fallacy) to model the operators' attention dynamics and their decision-making processes along with the real-time alert monitoring and inspection. To assist human operators in timely and accurately dismissing the feints and escalating the real attacks, we develop a Resilient and Adaptive Data-driven alert and Attention Management Strategy (RADAMS) that de-emphasizes alerts selectively based on the alerts' observable features. RADAMS uses reinforcement learning to achieve a customized and transferable design for various human operators and evolving IDoS attacks. 
The integrated modeling and theoretical analysis lead to the Product Principle of Attention (PPoA), fundamental limits, and the tradeoff among crucial human and economic factors. Experimental results corroborate that the proposed strategy outperforms the default strategy and can reduce the IDoS risk by as much as 20%. Besides, the strategy is resilient to large variations of costs, attack frequencies, and human attention capacities. We have recognized interesting phenomena such as attentional risk equivalency, attacker's dilemma, and the half-truth optimal attack strategy.	翻訳日:2021-11-14 15:12:11 公開日:2021-11-01
# 深層学習による繊維強化複合材料の応力場予測 Stress field prediction in fiber-reinforced composite materials using a deep learning approach ( http://arxiv.org/abs/2111.05271v1 ) ライセンス: Link先を確認	Anindya Bhaduri, Ashwini Gupta, Lori Graham-Brady	(参考訳) 計算応力解析は材料システム設計における重要なステップである。有限要素法 (FEM) は複雑な材料系の応力解析を行う標準的な手法である。ストレス分析を加速する方法は、femをデータ駆動機械学習ベースのストレス分析アプローチに置き換えることである。本研究では, 繊維強化マトリックス複合材料システムについて考察し, 深層学習ツールを用いて応力場予測のためのFEM手法の代替手法を提案する。まず, 空間構成の異なる繊維の固定数の複合材料系に対する応力場マップの予測を試みた。具体的には,複合材料中の繊維の空間配置と対応するフォン・ミセス応力場とのマッピングを試みた。これは畳み込みニューラルネットワーク(CNN)、特にU-Netアーキテクチャを使用して、トレーニングデータと同じ数のファイバーを持つシステムの真のストレスマップを使用して達成される。 u-netはエンコーダ・デコーダネットワークであり,本研究では複合材料イメージを入力として入力画像と同じ大きさの応力場画像を出力する。トレーニングサンプルの異なる初期化を行い,少数のトレーニングサンプルに対する予測精度の感度を求めることにより,ロバスト性解析を行う。複合材料系の繊維数が同じ体積率で増加すると、その形状を正確に表現するためには、より微細な有限要素メッシュ離散化が必要である。これにより計算コストが増大する。そこで, 本研究の目的は, 比較的安価な繊維数が少ない系の真の応力マップからの情報を用いて, 空間構成の異なる繊維数が多いシステムの応力場を予測することである。 Computational stress analysis is an important step in the design of material systems. Finite element method (FEM) is a standard approach of performing stress analysis of complex material systems. A way to accelerate stress analysis is to replace FEM with a data-driven machine learning based stress analysis approach. In this study, we consider a fiber-reinforced matrix composite material system and we use deep learning tools to find an alternative to the FEM approach for stress field prediction. We first try to predict stress field maps for composite material systems of fixed number of fibers with varying spatial configurations. Specifically, we try to find a mapping between the spatial arrangement of the fibers in the composite material and the corresponding von Mises stress field. This is achieved by using a convolutional neural network (CNN), specifically a U-Net architecture, using true stress maps of systems with same number of fibers as training data. 
U-Net is an encoder-decoder network which in this study takes the composite material image as input and outputs a stress field image of the same size as the input image. We perform a robustness analysis by taking different initializations of the training samples to find the sensitivity of the prediction accuracy to the small number of training samples. When the number of fibers in the composite material system is increased for the same volume fraction, a finer finite element mesh discretization is required to represent the geometry accurately. This leads to an increase in the computational cost. Thus, the secondary goal here is to predict the stress field for systems with a larger number of fibers with varying spatial configurations using information from the true stress maps of relatively cheaper systems with fewer fibers.	翻訳日:2021-11-14 15:10:47 公開日:2021-11-01
# テクノロジーの世代交代:コンピュータ科学と神経外科、そしてVRのユースケース Generational Frameshifts in Technology: Computer Science and Neurosurgery, The VR Use Case ( http://arxiv.org/abs/2110.15719v2 ) ライセンス: Link先を確認	Samuel R. Browd, Maya Sharma, Chetan Sharma	(参考訳) 私たちは、神経外科の実践を変えるために協力的に集結する技術が合流する、歴史上のユニークな瞬間にいます。これらの技術変革は、神経外科の術中パフォーマンス向上ツールや方法の改善、非同期神経外科訓練とシミュレーションのためのスケーラブルなソリューション、および、品質評価、請求書作成、結果測定、外科的ベストプラクティスの普及などの基本的変化を可能にする手術データの広範囲にわたる集約を含む、全面的に導入される。手術の詳細を把握し,手術の各部位を解析しながら,より安全かつ効率的に手術を行う能力は,当科の領域とすべての外科専門分野に全く新しい画期的な展開をもたらす。手術室内の全てのコンポーネントのデジタル化により、コンピュータや計算科学の様々な分野を活用して、位置に関係なく高品質な神経外科治療のケアと提供を改善する新たな洞察を得ることができる。神経外科の民主化は進行中であり、現代の世界のこれらのツールの開発、抽出、導入によって推進されるでしょう。仮想現実(virtual reality)は、消費者が直面するテクノロジーが、産業や医療において明確な役割を担っていることを示す良い例であり、人間の能力と相互作用をスケールするための新しいパラダイムを作る様々なコンピュータサイエンス技術の融合の顕著な例である。著者らは、近い将来のオペレーティングルームを実現するために必要な、無数の計算科学とデータ科学を紹介、紹介するテクノロジエコシステムについて説明している。 We are at a unique moment in history where there is a confluence of technologies which will synergistically come together to transform the practice of neurosurgery. These technological transformations will be all-encompassing, including improved tools and methods for intraoperative performance of neurosurgery, scalable solutions for asynchronous neurosurgical training and simulation, as well as broad aggregation of operative data allowing fundamental changes in quality assessment, billing, outcome measures, and dissemination of surgical best practices. The ability to perform surgery more safely and more efficiently while capturing the operative details and parsing each component of the operation will open an entirely new epoch advancing our field and all surgical specialties. The digitization of all components within the operating room will allow us to leverage the various fields within computer and computational science to obtain new insights that will improve care and delivery of the highest quality neurosurgery regardless of location. 
The democratization of neurosurgery is at hand and will be driven by our development, extraction, and adoption of these tools of the modern world. Virtual reality provides a good example of how consumer-facing technologies are finding a clear role in industry and medicine and serves as a notable example of the confluence of various computer science technologies creating a novel paradigm for scaling human ability and interactions. The authors describe the technology ecosystem that has come and highlight a myriad of computational and data sciences that will be necessary to enable the operating room of the near future.	翻訳日:2021-11-07 12:02:41 公開日:2021-11-01
# タスクガイドグラフ変換による弱教師付き概念マップ生成 Weakly Supervised Concept Map Generation through Task-Guided Graph Translation ( http://arxiv.org/abs/2110.15720v2 ) ライセンス: Link先を確認	Jiaying Lu, Xiangjue Dong, Carl Yang	(参考訳) 近年、自由テキストから知識を適切に構造化した要約を提供することの利点から、概念地図生成技術の急速な発展を目撃している。従来の教師なしメソッドはタスク指向のコンセプトマップを生成しないが、深層生成モデルは大量のトレーニングデータを必要とする。本稿では,GT-D2G(Graph Translation based Document-To-Graph)を提案する。汎用NLPパイプラインを利用して意味豊かな初期グラフを導出し,文書ラベルの弱い管理下でより簡潔な構造に翻訳する。これらの概念マップの品質と解釈性は,3つの実世界のコーパス上での人間による評価によって検証され,文書ラベルの不足による制御実験において,下流作業におけるそれらの有用性はさらに実証された。 Recent years have witnessed the rapid development of concept map generation techniques due to their advantages in providing well-structured summarization of knowledge from free texts. Traditional unsupervised methods do not generate task-oriented concept maps, whereas deep generative models require large amounts of training data. In this work, we present GT-D2G (Graph Translation based Document-To-Graph), an automatic concept map generation framework that leverages generalized NLP pipelines to derive semantic-rich initial graphs, and translates them into more concise structures under the weak supervision of document labels. The quality and interpretability of such concept maps are validated through human evaluation on three real-world corpora, and their utility in the downstream task is further demonstrated in the controlled experiments with scarce document labels.	翻訳日:2021-11-07 11:38:29 公開日:2021-11-01
# (参考訳) 医用画像解析のためのディープニューラルネットワークの透明性:解釈可能性の検討 Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods ( http://arxiv.org/abs/2111.02398v1 ) ライセンス: CC BY 4.0	Zohaib Salahuddin, Henry C Woodruff, Avishek Chatterjee and Philippe Lambin	(参考訳) 人工知能は、診断と治療決定のための多くの臨床応用に有用な助けとして登場した。ディープニューラルネットワークは、利用可能なデータと計算能力の急速な増加により、多くのタスクで臨床医と同等あるいは優れたパフォーマンスを示している。信頼できるAIの原則に従うためには、AIシステムは透明性、堅牢、公正、そして説明責任を保証することが不可欠である。現在のディープニューラルソリューションは、意思決定プロセスに関する詳細の理解が欠如しているため、ブラックボックスと呼ばれる。したがって、日常的な臨床ワークフローに組み込む前に、ディープニューラルネットワークの解釈可能性を確保する必要がある。本総説では, 医用画像解析用深層学習モデルの理解に用いられてきた9種類の解釈可能性手法を, 生成した説明書の種類と技術的類似性に基づいて, 体系的キーワード検索と専門知識を用いて同定した。さらに,様々な解釈方法によって得られた説明を評価するための進歩について報告する。最後に, 医用画像解析における深部ニューラルネットワークの解釈可能性に関する限界, 解釈可能性手法と今後の方向性について考察する。 Artificial Intelligence has emerged as a useful aid in numerous clinical applications for diagnosis and treatment decisions. Deep neural networks have shown same or better performance than clinicians in many tasks owing to the rapid increase in the available data and computational power. In order to conform to the principles of trustworthy AI, it is essential that the AI system be transparent, robust, fair and ensure accountability. Current deep neural solutions are referred to as black-boxes due to a lack of understanding of the specifics concerning the decision making process. Therefore, there is a need to ensure interpretability of deep neural networks before they can be incorporated in the routine clinical workflow. In this narrative review, we utilized systematic keyword searches and domain expertise to identify nine different types of interpretability methods that have been used for understanding deep learning models for medical image analysis applications based on the type of generated explanations and technical similarities. Furthermore, we report the progress made towards evaluating the explanations produced by various interpretability methods. 
Finally, we discuss limitations, provide guidelines for using interpretability methods, and outline future directions concerning the interpretability of deep neural networks for medical image analysis.	翻訳日:2021-11-06 06:31:53 公開日:2021-11-01
# (参考訳) スクラッチから同時に刈り取った構造と重みを学習する:注意に基づくアプローチ Learning Pruned Structure and Weights Simultaneously from Scratch: an Attention based Approach ( http://arxiv.org/abs/2111.02399v1 ) ライセンス: CC BY 4.0	Qisheng He, Ming Dong, Loren Schwiebert, Weisong Shi	(参考訳) ディープラーニングモデルには通常、数百万のトレーニング可能なウェイトが含まれているため、ストレージスペースの削減とランタイム効率の向上という、より効率的なネットワーク構造に対する需要が高まっている。プルーニングは最も人気のあるネットワーク圧縮技術の一つである。本稿では,非構造化プルーニングパイプライン,注意に基づく同時スパース構造と重み学習(ASWL)を提案する。従来のチャネルワイドやウェイトワイドアテンション機構とは異なり、ASWLは各層に対する層ワイドアテンションによるプルーニング比を計算する効率的なアルゴリズムを提案し、密集ネットワークとスパースネットワークの重みをランダムに初期化した重みから同時に学習するように追跡する。 MNIST, Cifar10, ImageNet を用いた実験により, ASWL は最先端のネットワークプルーニング手法と比較して, 精度, プルーニング率, 動作効率で優れたプルーニング結果が得られることを示した。 As a deep learning model typically contains millions of trainable weights, there has been a growing demand for a more efficient network structure with reduced storage space and improved run-time efficiency. Pruning is one of the most popular network compression techniques. In this paper, we propose a novel unstructured pruning pipeline, Attention-based Simultaneous sparse structure and Weight Learning (ASWL). Unlike traditional channel-wise or weight-wise attention mechanism, ASWL proposed an efficient algorithm to calculate the pruning ratio through layer-wise attention for each layer, and both weights for the dense network and the sparse network are tracked so that the pruned structure is simultaneously learned from randomly initialized weights. Our experiments on MNIST, Cifar10, and ImageNet show that ASWL achieves superior pruning results in terms of accuracy, pruning ratio and operating efficiency when compared with state-of-the-art network pruning methods.	翻訳日:2021-11-06 05:36:19 公開日:2021-11-01
# (参考訳) 医用画像分類のための深層AUC最大化: 課題と機会 Deep AUC Maximization for Medical Image Classification: Challenges and Opportunities ( http://arxiv.org/abs/2111.02400v1 ) ライセンス: CC BY 4.0	Tianbao Yang	(参考訳) この拡張要旨では,AUC最大化による新たな深層学習手法(Deep AUC Maximization,略称DAM)が医用画像分類にもたらす機会と課題について論じる。 AUC(ROC曲線下面積)は医用画像分類の標準的な性能指標であるため、AUCを直接最適化することで、従来の損失関数(例えばクロスエントロピー損失)を最小化するよりも優れた性能のディープニューラルネットワークを学習できる可能性がある。近年,大規模な医用画像分類に深層AUC最大化を用いる傾向が現れている。本稿では,(i) DAMのための確率的非凸最適化アルゴリズムがもたらした進歩,(ii) 様々な医用画像分類問題における有望な結果,に焦点を当てて最近の成果を論じる。次に、特徴学習、大規模最適化、信頼できるAIモデルの学習という3つの観点から、医用画像分類におけるDAMの課題と機会について論じる。 In this extended abstract, we will present and discuss opportunities and challenges brought about by a new deep learning method based on AUC maximization (Deep AUC Maximization, or DAM) for medical image classification. Since AUC (the area under the ROC curve) is a standard performance measure for medical image classification, directly optimizing AUC could achieve better performance for learning a deep neural network than minimizing a traditional loss function (e.g., cross-entropy loss). Recently, a trend has emerged of using deep AUC maximization for large-scale medical image classification. In this paper, we will discuss these recent results by highlighting (i) the advancements brought by stochastic non-convex optimization algorithms for DAM; (ii) the promising results on various medical image classification problems. Then, we will discuss challenges and opportunities of DAM for medical image classification from three perspectives: feature learning, large-scale optimization, and learning trustworthy AI models.	翻訳日:2021-11-06 05:21:44 公開日:2021-11-01
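To make concrete what "directly optimizing AUC" refers to, the sketch below computes the empirical AUC as the fraction of correctly ranked positive/negative pairs, plus one generic differentiable stand-in for it. The pairwise squared-hinge surrogate and the `margin` parameter are illustrative assumptions; the abstract does not state which surrogate objective DAM uses, so this should not be read as the paper's loss.

```python
def _split(scores, labels):
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pairwise_squared_hinge(scores, labels, margin=1.0):
    """A generic pairwise surrogate for 1 - AUC (assumed form, not the paper's).

    Each pair is penalized when the positive score does not exceed the
    negative score by at least `margin`.
    """
    pos, neg = _split(scores, labels)
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))
```

Unlike cross-entropy, both quantities depend only on how positives are ranked relative to negatives, which is why AUC-oriented objectives are often considered for imbalanced medical data. The quadratic pair enumeration here is only for clarity and would not scale to large datasets.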
# (参考訳) 物理インフォームド機械学習を用いたCFD問題の数値近似 Numerical Approximation in CFD Problems Using Physics Informed Machine Learning ( http://arxiv.org/abs/2111.02987v1 ) ライセンス: CC BY 4.0	Siddharth Rout, Vikas Dwivedi, Balaji Srinivasan	(参考訳) この論文は、幅広いCFD問題に普遍的に使用でき、かつ計算コストと実行時間の小さい代替近似法を見つけるための様々な手法に焦点を当てている。この中心的な目標に対する有用性を評価するために、機械学習の分野の様々な手法を検討した。定常移流拡散問題(steady advection diffusion problem)を、ある手法がどの程度の複雑さまで解を与えられるかを理解するためのテストケースとして用いた。最終的には、計算済みデータによる訓練なしに微分方程式を解くことができる物理インフォームド機械学習手法に焦点を当てる。 I.E. LagarisらおよびM. Raissiらによる既存の代表的手法を詳しく調査した。これらの既存手法は移流支配の問題を解くことができない。そこで,移流支配問題を解くために,分散物理インフォームドニューラルネットワーク (DPINN) と呼ぶ物理インフォームド手法を提案する。この手法は、領域を分割し、他の物理ベースの制約を平均二乗損失項として導入することで、従来手法の柔軟性と能力を高める。様々な実験を行い、この手法のエンドツーエンドの可能性を探る。また、調整可能な各種パラメータに対する手法の振る舞いを理解するために、パラメトリックスタディも行う。この手法を、定常移流拡散問題と非定常矩形パルス問題で試験し、非常に正確な結果を得た。 Extreme Learning Machine (ELM) は、調整可能なパラメータを犠牲にした非常に高速なニューラルネットワークアルゴリズムである。提案モデルのELMに基づく変種を移流拡散問題で検証する。 ELMは複雑な最適化を単純化し、手法が非反復的であるため、解は一度の計算で得られる。 ELMベースの変種は単純なDPINN法よりもうまく機能するようである。将来の様々な発展の余地は、論文全体を通して示唆されている。 The thesis focuses on various techniques to find an alternate approximation method that could be universally used for a wide range of CFD problems but with low computational cost and low runtime. Various techniques have been explored within the field of machine learning to gauge the utility in fulfilling the core ambition. A steady advection-diffusion problem has been used as the test case to understand the level of complexity up to which a method can provide a solution. Ultimately, the focus is on physics-informed machine learning techniques where solving differential equations is possible without any training with computed data. The prevalent methods by I.E. Lagaris et al. and M. Raissi et al. are explored thoroughly. The prevalent methods cannot solve advection dominant problems. A physics-informed method, called the Distributed Physics Informed Neural Network (DPINN), is proposed to solve advection dominant problems.
It increases the flexibility and capability of older methods by splitting the domain and introducing other physics-based constraints as mean squared loss terms. Various experiments are done to explore the end-to-end possibilities with the method. A parametric study is also done to understand the behavior of the method with respect to different tunable parameters. The method is tested on steady advection-diffusion problems and unsteady square pulse problems. Very accurate results are recorded. Extreme learning machine (ELM) is a very fast neural network algorithm at the cost of tunable parameters. The ELM-based variant of the proposed model is tested on the advection-diffusion problem. ELM makes the complex optimization simpler, and since the method is non-iterative, the solution is recorded in a single shot. The ELM-based variant seems to work better than the simple DPINN method. Scope for various future developments is hinted at throughout the thesis.	翻訳日:2021-11-06 05:12:28 公開日:2021-11-01
# 順・逆PDE問題に対する勾配強化物理インフォームドニューラルネットワーク Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems ( http://arxiv.org/abs/2111.02801v1 ) ライセンス: Link先を確認	Jeremy Yu, Lu Lu, Xuhui Meng, George Em Karniadakis	(参考訳) ディープラーニングは、物理インフォームドニューラルネットワーク(PINN)を通じて偏微分方程式(PDE)を解くための有効なツールであることが示されている。 PINNはPDE残差をニューラルネットワークの損失関数に埋め込むもので、様々な順問題および逆問題のPDEの解法に用いられて成功を収めている。しかし、第一世代のPINNの欠点の1つは、訓練点が多くても精度が限られることが多い点である。そこで本研究では, PINNの精度と訓練効率を向上させるために, 勾配強化物理インフォームドニューラルネットワーク(gPINN)を提案する。 gPINNはPDE残差の勾配情報を利用し、その勾配を損失関数に埋め込む。我々はgPINNを広範囲にテストし、順問題と逆問題の両方のPDEにおいてgPINNの有効性を実証した。数値結果から,gPINNはより少ない訓練点でPINNよりも優れた性能を示した。さらに,gPINNと,訓練中に訓練点の分布を適応的に改善する手法である残差ベース適応的細分化(residual-based adaptive refinement, RAR)を組み合わせることで,特に解が急勾配を持つPDEにおいて,gPINNの性能をさらに向上させた。 Deep learning has been shown to be an effective tool in solving partial differential equations (PDEs) through physics-informed neural networks (PINNs). PINNs embed the PDE residual into the loss function of the neural network, and have been successfully employed to solve diverse forward and inverse PDE problems. However, one disadvantage of the first generation of PINNs is that they usually have limited accuracy even with many training points. Here, we propose a new method, gradient-enhanced physics-informed neural networks (gPINNs), for improving the accuracy and training efficiency of PINNs. gPINNs leverage gradient information of the PDE residual and embed the gradient into the loss function. We tested gPINNs extensively and demonstrated the effectiveness of gPINNs in both forward and inverse PDE problems. Our numerical results show that gPINN performs better than PINN with fewer training points. Furthermore, we combined gPINN with the method of residual-based adaptive refinement (RAR), a method for improving the distribution of training points adaptively during training, to further improve the performance of gPINN, especially in PDEs with solutions that have steep gradients.	翻訳日:2021-11-05 15:45:12 公開日:2021-11-01
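The idea of embedding the gradient of the PDE residual into the loss can be sketched on a toy problem. The setup is entirely an assumption for illustration: the equation is the ODE du/dx = 2x, a quadratic polynomial with coefficients `(c0, c1, c2)` stands in for the neural network so that derivatives can be written by hand, and the loss form "residual term plus `weight` times residual-gradient term" is one reading of the abstract, not the paper's exact formulation.

```python
def residual(coeffs, x):
    """Residual r(x) = u'(x) - 2x for the trial function u = c0 + c1*x + c2*x^2."""
    _c0, c1, c2 = coeffs
    return (c1 + 2.0 * c2 * x) - 2.0 * x


def residual_gradient(coeffs, x):
    """Derivative of the residual, dr/dx = u''(x) - 2 (constant for a quadratic)."""
    _c0, _c1, c2 = coeffs
    return 2.0 * c2 - 2.0


def pinn_loss(coeffs, xs):
    """Plain PINN-style loss: mean squared residual over the training points."""
    return sum(residual(coeffs, x) ** 2 for x in xs) / len(xs)


def gpinn_loss(coeffs, xs, weight=0.1):
    """Gradient-enhanced loss: residual term plus weighted residual-gradient term."""
    grad_term = sum(residual_gradient(coeffs, x) ** 2 for x in xs) / len(xs)
    return pinn_loss(coeffs, xs) + weight * grad_term
```

With the single training point x = 0.5, the wrong candidate u = x has zero residual, so the plain loss cannot tell it apart from the true solution u = x^2, while the gradient term still penalizes it. This is one intuition for why gradient information can help when training points are few. In a real PINN the derivatives would come from automatic differentiation rather than hand-written formulas.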
# 6gセンシングのための自己教師付き無線視覚表現学習 Self-Supervised Radio-Visual Representation Learning for 6G Sensing ( http://arxiv.org/abs/2111.02887v1 ) ライセンス: Link先を確認	Mohammed Alloulah, Akash Deep Singh, Maximilian Arnold	(参考訳) 将来の6Gセルネットワークでは、共同通信およびセンシングプロトコルにより、ネットワークは環境を認識でき、統一された通信知覚基盤の上に多くの新しいアプリケーションの扉を開く。しかし、センシングシーンの粗い無線表現の解釈は困難であり、これらの創発的システムの可能性を妨げる。無線と視覚を組み合わせることで、人間の介入を最小限に抑える無線のみのセンシングモデルを自動的に学習する。私たちは、何百万もの未解決のデータポイントをフィードできる無線センシングモデルを構築したいと考えています。そこで我々は,近年の自己教師型学習の進歩を活用し,新たなラベルのない無線-視覚協調学習手法を定式化した。本手法は,共通線形分類ベンチマークに従って実装・評価し,質的・定量的な性能指標を報告する。本評価では, 下流センシングデモンストラクタに対して, ラジオ・ビジュアル・セルフ・スーパービジョンで学習した表現が良好に動作し, ラベル付きデータが少ない場合, 完全に教師付き表現よりも優れることを示す。これは、自己教師付き学習が将来のスケーラブルな無線センシングシステムにとって重要な実現可能性を示している。 In future 6G cellular networks, a joint communication and sensing protocol will allow the network to perceive the environment, opening the door for many new applications atop a unified communication-perception infrastructure. However, interpreting the sparse radio representation of sensing scenes is challenging, which hinders the potential of these emergent systems. We propose to combine radio and vision to automatically learn a radio-only sensing model with minimal human intervention. We want to build a radio sensing model that can feed on millions of uncurated data points. To this end, we leverage recent advances in self-supervised learning and formulate a new label-free radio-visual co-learning scheme, whereby vision trains radio via cross-modal mutual information. We implement and evaluate our scheme according to the common linear classification benchmark, and report qualitative and quantitative performance metrics. In our evaluation, the representation learnt by radio-visual self-supervision works well for a downstream sensing demonstrator, and outperforms its fully-supervised counterpart when less labelled data is used. This indicates that self-supervised learning could be an important enabler for future scalable radio sensing systems.	
翻訳日:2021-11-05 15:44:25 公開日:2021-11-01
# 機械学習に基づく分子フラグメントリンクのためのデカップリング座標 Decoupled coordinates for machine learning-based molecular fragment linking ( http://arxiv.org/abs/2111.02930v1 ) ライセンス: Link先を確認	Markus Fleck and Noah Weber and Christopher Trummer	(参考訳) 機械学習に基づく分子フラグメントリンクの最近の進歩は、生成プロセスにリンクすべきフラグメントの相対的配向を示す構造情報を伝えることの重要性を示している。しかし、そのような構造情報は完全な相対座標系の形ではまだ提供されていない。結合長、結合角、ねじれ角の分離集合の数学的詳細を精巧化し、座標系が完備であることが示されている。生成したリンカーの品質に対する重要な影響を数値的に示す。異なる種類の自由度における信頼性情報量について検討した。アブレーション研究と情報理論的解析を行う。提案した利点は、リンカ設計における標準的グッドプラクティスとして、完全かつ分離された相対座標系の適用を示唆している。 Recent developments in machine-learning based molecular fragment linking have demonstrated the importance of informing the generation process with structural information specifying the relative orientation of the fragments to be linked. However, such structural information has not yet been provided in the form of a complete relative coordinate system. Mathematical details for a decoupled set of bond lengths, bond angles and torsion angles are elaborated and the coordinate system is demonstrated to be complete. Significant impact on the quality of the generated linkers is demonstrated numerically. The amount of reliable information within the different types of degrees of freedom is investigated. Ablation studies and an information-theoretical analysis are performed. The presented benefits suggest the application of a complete and decoupled relative coordinate system as a standard good practice in linker design.	翻訳日:2021-11-05 15:43:50 公開日:2021-11-01
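The three kinds of internal coordinates named in this abstract (bond lengths, bond angles, torsion angles) can be computed from Cartesian atom positions as in the sketch below. This is a generic textbook calculation under assumed conventions (angles in radians, torsion sign following the common atan2 formula); it is not the paper's specific construction of a complete decoupled coordinate system, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p0, p1):
    """Distance between two bonded atoms."""
    return _norm(_sub(p1, p0))


def bond_angle(p0, p1, p2):
    """Angle at p1 between the bonds p1-p0 and p1-p2, in radians."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against rounding


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle about the p1-p2 bond, in radians within [-pi, pi]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)
```

Internal coordinates of this kind are invariant to rigid translation and rotation of the whole molecule, which is one reason they are a natural way to describe the relative orientation of two fragments.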
# (参考訳) 全スライドイメージングのための深層学習に基づくマルチインスタンス学習における依存関係の考慮 Accounting for Dependencies in Deep Learning Based Multiple Instance Learning for Whole Slide Imaging ( http://arxiv.org/abs/2111.01556v1 ) ライセンス: CC BY 4.0	Andriy Myronenko, Ziyue Xu, Dong Yang, Holger Roth, Daguang Xu	(参考訳) 多重インスタンス学習(MIL)は、全スライド画像(WSI)を分類するための重要なアルゴリズムである。組織学のWSIは数十億ピクセルに及ぶことがあり、膨大な計算上およびアノテーション上の課題を生む。通常、このような画像は一連のパッチ(インスタンスのバッグ)に分割され、バッグレベルのクラスラベルのみが与えられる。ディープラーニングに基づくMIL手法は、畳み込みニューラルネットワーク(CNN)を用いてインスタンス特徴を計算する。提案手法もディープラーニングに基づいており、次の2つの貢献を含む。第一に、自己注意型のTransformerブロックを埋め込んでインスタンス間の依存関係を捉えることで、訓練中にインスタンス間の依存関係を明示的に考慮することを提案する。例えば、腫瘍のグレードはWSI内の異なる場所にあるいくつかの特定のパターンの存在に依存することがあり、パッチ間の依存関係を考慮する必要がある。第二に、インスタンス擬似ラベルに基づくインスタンス単位の損失関数を提案する。提案アルゴリズムを複数のベースライン手法と比較し、11K枚以上の画像を含む公開WSIデータセットとして最大のPANDA challengeデータセットで評価して、最先端の結果を示す。 Multiple instance learning (MIL) is a key algorithm for classification of whole slide images (WSI). Histology WSIs can have billions of pixels, which create enormous computational and annotation challenges. Typically, such images are divided into a set of patches (a bag of instances), where only bag-level class labels are provided. Deep learning based MIL methods calculate instance features using a convolutional neural network (CNN). Our proposed approach is also deep learning based, with the following two contributions: Firstly, we propose to explicitly account for dependencies between instances during training by embedding self-attention Transformer blocks to capture dependencies between instances. For example, a tumor grade may depend on the presence of several particular patterns at different locations in WSI, which requires accounting for dependencies between patches. Secondly, we propose an instance-wise loss function based on instance pseudo-labels. We compare the proposed algorithm to multiple baseline methods, evaluate it on the PANDA challenge dataset, the largest publicly available WSI dataset with over 11K images, and demonstrate state-of-the-art results.	翻訳日:2021-11-04 02:28:00 公開日:2021-11-01
# (参考訳) 階層化深層学習による包括的かつ臨床的に正確な頭頸部リスク臓器の輪郭描出:大規模多施設研究 Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study ( http://arxiv.org/abs/2111.01544v1 ) ライセンス: CC BY 4.0	Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yanping Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, Jing Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai Jin, Tsung-Ying Ho	(参考訳) 放射線治療後の合併症を軽減するためには、リスク臓器(OAR)の正確なセグメンテーションが重要である。コンセンサスガイドラインは頭頸部(H&N)領域で40以上のOARを推奨しているが、この作業には膨大な労力がかかるため、ほとんどの施設は、より少数のOARのみを輪郭描出し、他のOARに関連する線量分布を無視するという大幅に簡略化されたプロトコルを選択している。本稿では、42個のH&N OARの包括的な集合を正確に輪郭描出するために、深層学習を用いた新しい自動化された非常に効果的な階層化OARセグメンテーション(SOARS)システムを提案する。 SOARSは42個のOARをアンカー、中レベル、小型・困難の各サブカテゴリに階層化し、ニューラルアーキテクチャ探索(NAS)の原理により各カテゴリに特化したニューラルネットワークアーキテクチャを導出する。内部の1施設の176名の学習用患者を用いてSOARSモデルを構築し、6つの異なる施設の外部患者1327名で独立に評価した。各施設での評価において、Diceスコアで他の最先端手法を一貫して少なくとも3〜5%上回った(他の指標では最大36%の相対誤差低減)。さらに重要なことに、大規模なマルチユーザ研究により、SOARSの予測の98%は直接臨床で受け入れるためにごくわずかな修正しか必要としないか、修正不要であり(放射線腫瘍医の作業負荷を90%削減)、そのセグメンテーション精度と線量精度はユーザ間変動と同等かそれより小さい範囲にあることが示された。これらの結果は、H&N癌の放射線治療ワークフローにおけるOAR輪郭描出プロセスに対するSOARSの高い臨床的適用性を裏付けており、効率、包括性、品質が向上する。 Accurate organ at risk (OAR) segmentation is critical for reducing post-treatment complications of radiotherapy. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region; however, due to the predictably prohibitive labor cost of this task, most institutions choose a substantially simplified protocol by delineating a smaller subset of OARs and neglecting the dose distributions associated with other OARs. In this work we propose a novel, automated and highly effective stratified OAR segmentation (SOARS) system using deep learning to precisely delineate a comprehensive set of 42 H&N OARs. 
SOARS stratifies 42 OARs into anchor, mid-level, and small & hard subcategories, with specifically derived neural network architectures for each category by neural architecture search (NAS) principles. We built SOARS models using 176 training patients in an internal institution and independently evaluated on 1327 external patients across six different institutions. It consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score for each institutional evaluation (up to 36% relative error reduction in other metrics). More importantly, extensive multi-user studies evidently demonstrated that 98% of the SOARS predictions need only very minor or no revisions for direct clinical acceptance (saving 90% of radiation oncologists' workload), and their segmentation and dosimetric accuracy are within or smaller than the inter-user variation. These findings confirmed the strong clinical applicability of SOARS for the OAR delineation process in H&N cancer radiotherapy workflows, with improved efficiency, comprehensiveness, and quality.	翻訳日:2021-11-04 02:18:48 公開日:2021-11-01
# (参考訳) 若年者のための皮質下構造セグメンテーションデータベース Sub-cortical structure segmentation database for young population ( http://arxiv.org/abs/2111.01561v1 ) ライセンス: CC BY 4.0	Jayanthi Sivaswamy, Alphin J Thottupattu, Mythri V, Raghav Mehta, R Sheelakumari, Chandrasekharan Kesavadas	(参考訳) MRIスキャンからの皮質下構造のセグメンテーションは、多くの神経学的診断において関心を持たれている。これは手間のかかる作業であるため、機械学習、特に深層学習(DL)手法が研究されている。脳の構造的複雑さのため、皮質下構造セグメンテーションのための優れたDLベースのソリューションを開発するには、大規模で高品質なセグメンテーションデータセットが必要である。これに向けて、14の皮質下構造を手動で輪郭描出した、1.5テスラのT1 MRIスキャン114件のセットを公開する。データセットのスキャンは健康な若年(21〜30歳)の被験者(男性58名、女性56名)から取得され、すべての構造は経験豊富な放射線科の専門家によって手動で輪郭描出されている。このデータセットを用いてセグメンテーション実験を行い、深層学習手法により正確な結果が得られることを示した。 Segmentation of sub-cortical structures from MRI scans is of interest in many neurological diagnoses. Since this is a laborious task, machine learning and specifically deep learning (DL) methods have been explored. The structural complexity of the brain demands a large, high quality segmentation dataset to develop good DL-based solutions for sub-cortical structure segmentation. Towards this, we are releasing a set of 114 1.5 Tesla T1 MRI scans with manual delineations for 14 sub-cortical structures. The scans in the dataset were acquired from healthy young (21-30 years) subjects (58 male and 56 female), and all the structures are manually delineated by experienced radiology experts. Segmentation experiments have been conducted with this dataset, and results demonstrate that accurate results can be obtained with deep-learning methods.	翻訳日:2021-11-04 02:16:52 公開日:2021-11-01
# (参考訳) 頭頸部放射線治療における臓器輪郭描出のためのベイズモデルの比較 Comparing Bayesian Models for Organ Contouring in Head and Neck Radiotherapy ( http://arxiv.org/abs/2111.01134v1 ) ライセンス: CC BY 4.0	Prerak Mody, Nicolas Chaves-de-Plaza, Klaus Hildebrandt, Rene van Egmond, Huib de Ridder, Marius Staring	(参考訳) 放射線治療における臓器輪郭描出のための深層学習モデルは臨床利用の段階に近づいているが、予測された輪郭の自動品質評価(QA)のためのツールは現在ほとんど存在しない。ベイズモデルとそれに伴う不確実性を用いることで、不正確な予測を検出するプロセスを自動化できる可能性がある。本研究では、自動輪郭描出のための2つのベイズモデルDropOutとFlipOutを、定量的尺度である期待校正誤差(ECE)と、定性的尺度である領域ベースの精度対不確実性(R-AvU)グラフを用いて調査する。モデルが信頼に足るとみなされるには、ECEが低くなければならないことはよく理解されている。しかしQAの文脈では、モデルは不正確な領域では高い不確実性を、正確な領域では低い不確実性を持つことも求められる。このような振る舞いは、専門家ユーザの視覚的注意を潜在的に不正確な領域へ向け、QAプロセスの高速化につながる可能性がある。 R-AvUグラフを用いて、正確な領域と不正確な領域における異なるモデルの挙動を定性的に比較する。 MICCAI2015 Head and Neck Segmentation ChallengeとDeepMindTCIA CTデータセットにおいて、DropOut-DICE、DropOut-CE(Cross Entropy)、FlipOut-CEの3つのモデルを用いて実験を行った。定量的な結果では、DropOut-DICEのECEが最も高く、DropOut-CEとFlipOut-CEのECEが最も低かった。 DropOut-CEとFlipOut-CEの違いをよりよく理解するためにR-AvUグラフを用いたところ、FlipOut-CEはDropOut-CEよりも不正確な領域における不確実性のカバレッジが優れていることが示された。このような定量的指標と定性的指標の組み合わせは、臨床現場でQAツールとして展開できるモデルの選択に役立つ新しいアプローチを探るものである。 Deep learning models for organ contouring in radiotherapy are poised for clinical usage, but currently, there exist few tools for automated quality assessment (QA) of the predicted contours. Using Bayesian models and their associated uncertainty, one can potentially automate the process of detecting inaccurate predictions. We investigate two Bayesian models for auto-contouring, DropOut and FlipOut, using a quantitative measure - expected calibration error (ECE) and a qualitative measure - region-based accuracy-vs-uncertainty (R-AvU) graphs. It is well understood that a model should have low ECE to be considered trustworthy. However, in a QA context, a model should also have high uncertainty in inaccurate regions and low uncertainty in accurate regions. Such behaviour could direct visual attention of expert users to potentially inaccurate regions, leading to a speed-up in the QA process. 
Using R-AvU graphs, we qualitatively compare the behaviour of different models in accurate and inaccurate regions. Experiments are conducted on the MICCAI2015 Head and Neck Segmentation Challenge and on the DeepMindTCIA CT dataset using three models: DropOut-DICE, DropOut-CE (Cross Entropy) and FlipOut-CE. Quantitative results show that DropOut-DICE has the highest ECE, while DropOut-CE and FlipOut-CE have the lowest ECE. To better understand the difference between DropOut-CE and FlipOut-CE, we use the R-AvU graph which shows that FlipOut-CE has better uncertainty coverage in inaccurate regions than DropOut-CE. Such a combination of quantitative and qualitative metrics explores a new approach that helps to select which model can be deployed as a QA tool in clinical settings.	翻訳日:2021-11-04 02:03:06 公開日:2021-11-01
# (参考訳) 全脳認知デコーディングにおける深層転移学習の評価 Evaluating deep transfer learning for whole-brain cognitive decoding ( http://arxiv.org/abs/2111.01562v1 ) ライセンス: CC BY 4.0	Armin W. Thomas and Ulman Lindenberger and Wojciech Samek and Klaus-Robert Müller	(参考訳) 多くの分野の研究で、転移学習(TL)はサンプル数の少ないデータセットにおける深層学習(DL)モデルの性能向上に適していることが示されている。この経験的な成功は、機能的神経画像データを用いた認知デコーディング解析へのTLの適用に対する関心を呼び起こした。本稿では、全脳の機能的磁気共鳴画像(fMRI)データから認知状態(顔や家の画像を見ているなど)をデコードするDLモデルへのTLの適用を体系的に評価する。まず、大規模な公開fMRIデータセットで2つのDLアーキテクチャを事前学習し、その後、独立した実験タスクと完全に独立したデータセットで性能を評価する。事前学習されたモデルは、事前学習されていないモデル変種と比べて一貫して高いデコーディング精度を達成し、概して必要な学習時間とデータも少なく、事前学習の利点を明確に裏付けている。これらの利点は、事前学習されたモデルが新しいデータで学習する際に、学習済みの特徴の多くを再利用できることから生じることを示し、事前学習の利点をもたらすメカニズムについてより深い洞察を与える。しかし、事前学習されたモデルのデコーディング判断を解釈する際には、DLモデルによる全脳認知デコーディングに微妙な課題があることも明らかにする。これらのモデルは、個々の認知状態を識別するために、予期しない直感に反する方法でfMRIデータを利用することを学習していたためである。 Research in many fields has shown that transfer learning (TL) is well-suited to improve the performance of deep learning (DL) models in datasets with small numbers of samples. This empirical success has triggered interest in the application of TL to cognitive decoding analyses with functional neuroimaging data. Here, we systematically evaluate TL for the application of DL models to the decoding of cognitive states (e.g., viewing images of faces or houses) from whole-brain functional Magnetic Resonance Imaging (fMRI) data. We first pre-train two DL architectures on a large, public fMRI dataset and subsequently evaluate their performance in an independent experimental task and a fully independent dataset. The pre-trained models consistently achieve higher decoding accuracies and generally require less training time and data than model variants that were not pre-trained, clearly underlining the benefits of pre-training. We demonstrate that these benefits arise from the ability of the pre-trained models to reuse many of their learned features when training with new data, providing deeper insights into the mechanisms giving rise to the benefits of pre-training. 
Yet, we also surface nuanced challenges for whole-brain cognitive decoding with DL models when interpreting the decoding decisions of the pre-trained models, as these have learned to utilize the fMRI data in unforeseen and counterintuitive ways to identify individual cognitive states.	翻訳日:2021-11-04 01:56:42 公開日:2021-11-01
# (参考訳) ニューラルネットワークの学習ダイナミクスの局所性の検討 Investigating the locality of neural network training dynamics ( http://arxiv.org/abs/2111.01166v1 ) ライセンス: CC BY 4.0	Soham Dan, Phanideep Gampa and Anirbit Mukherjee	(参考訳) 深層学習の理論における基本的な探求は、学習アルゴリズムが重み空間においてたどる軌跡の性質を理解することである。ごく最近に見出されたそのような性質の1つが「局所弾性」($S_{\rm rel}$)であり、サンプリングされたデータ点が別のデータ点での予測に与える影響の伝播を定量化する。本研究では、新しい理論的知見と、この性質に関するより注意深い実証的証拠を様々な設定で提供することにより、局所弾性の包括的研究を行う。まず、分類設定に特化して、元の$S_{\rm rel}$の考え方に対する新しい定義を提案する。 SVHN、CIFAR-10、CIFAR-100で学習する最先端ニューラルネットワークを用いた実験により、新しい$S_{\rm rel}$が、サンプリングされたデータと同じクラス内の予測を変化させやすいという重み更新の性質をどのように検出するかを示す。次に、回帰を行うニューラルネットワークの例を通じて、元の$S_{\rm rel}$が2相の振る舞いを示すことを実証する。すなわち、学習は$S_{\rm rel}$が急速に変化する初期の弾性相と、$S_{\rm rel}$が大きいままである最終的な非弾性相を経て進行する。最後に、元の$S_{\rm rel}$関数の閉形式の式が得られる勾配フローによる学習の例を複数示す。これらの導出した式のプロットを調べることで、回帰設定で実験的に検出された$S_{\rm rel}$の性質のいくつかを理論的に示す。 A fundamental quest in the theory of deep-learning is to understand the properties of the trajectories in the weight space that a learning algorithm takes. One such property that has very recently been isolated is that of "local elasticity" ($S_{\rm rel}$), which quantifies the propagation of influence of a sampled data point on the prediction at another data point. In this work, we perform a comprehensive study of local elasticity by providing new theoretical insights and more careful empirical evidence of this property in a variety of settings. Firstly, specific to the classification setting, we suggest a new definition of the original idea of $S_{\rm rel}$. Via experiments on state-of-the-art neural networks training on SVHN, CIFAR-10 and CIFAR-100 we demonstrate how our new $S_{\rm rel}$ detects the property of the weight updates preferring to make changes in predictions within the same class of the sampled data. 
Next, we demonstrate via examples of neural nets doing regression that the original $S_{\rm rel}$ reveals a two-phase behaviour: their training proceeds via an initial elastic phase when $S_{\rm rel}$ changes rapidly and an eventual inelastic phase when $S_{\rm rel}$ remains large. Lastly, we give multiple examples of learning via gradient flows for which one can get a closed-form expression of the original $S_{\rm rel}$ function. By studying the plots of these derived formulas we give a theoretical demonstration of some of the experimentally detected properties of $S_{\rm rel}$ in the regression setting.	翻訳日:2021-11-04 01:55:29 公開日:2021-11-01
# (参考訳) 組合せ空間上のベイズ最適化のための潜在空間と構造化カーネルの組み合わせ Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces ( http://arxiv.org/abs/2111.01186v1 ) ライセンス: CC BY 4.0	Aryan Deshwal and Janardhan Rao Doppa	(参考訳) 高価なブラックボックス関数評価を用いて組合せ空間(系列、木、グラフなど)上で最適化する問題を考える。例えば、物理的な実験室実験を用いて創薬のための分子を最適化する場合である。ベイズ最適化(BO)は、学習された代理モデルに導かれて有用性の高い入力を賢く選択することで、このような問題を解く効率的な枠組みである。組合せ空間に対する最近のBOアプローチの1つは、深層生成モデル(DGM)を用いて構造の潜在表現を学習することで、連続空間上のBOに帰着させるものである。連続空間から選択された入力は離散構造に復号され、関数評価が行われる。しかし、潜在空間上の代理モデルはDGMが学習した情報のみを使用しており、この情報は対象のブラックボックス関数を近似するのに望ましい帰納バイアスを持たない可能性がある。この欠点を克服するため、本論文ではLADDERと呼ばれる原理的なアプローチを提案する。鍵となる考え方は、より良い代理モデリングのために、復号された構造からの構造情報を学習された潜在空間表現と明示的に統合する新しい構造結合カーネルを定義することである。実世界のベンチマークでの実験により、LADDERは潜在空間上のBO手法を大幅に上回り、最先端手法と同等以上の性能を示すことがわかった。 We consider the problem of optimizing over combinatorial spaces (e.g., sequences, trees, and graphs) using expensive black-box function evaluations. An example is optimizing molecules for drug design using physical lab experiments. Bayesian optimization (BO) is an efficient framework for solving such problems by intelligently selecting the inputs with high utility guided by a learned surrogate model. A recent BO approach for combinatorial spaces is through a reduction to BO over continuous spaces by learning a latent representation of structures using deep generative models (DGMs). The selected input from the continuous space is decoded into a discrete structure for performing function evaluation. However, the surrogate model over the latent space only uses the information learned by the DGM, which may not have the desired inductive bias to approximate the target black-box function. To overcome this drawback, this paper proposes a principled approach referred to as LADDER. 
The key idea is to define a novel structure-coupled kernel that explicitly integrates the structural information from decoded structures with the learned latent space representation for better surrogate modeling. Our experiments on real-world benchmarks show that LADDER significantly improves over the BO over latent space method, and performs better than or similarly to state-of-the-art methods.	翻訳日:2021-11-04 00:37:47 公開日:2021-11-01
# (参考訳) サステインペダル付きピアノ音楽の生成学習 Learning To Generate Piano Music With Sustain Pedals ( http://arxiv.org/abs/2111.01216v1 ) ライセンス: CC BY 4.0	Joann Ching and Yi-Hsuan Yang	(参考訳) 近年、音楽情報検索コミュニティにおいて、音声信号からピアノのペダルを検出する研究への関心が高まっている。しかし、我々の知る限り、記号音楽の最近の生成モデルがピアノのペダルを考慮することはほとんどない。本研究では、Kongらが提案した採譜モデルを用いて、AILabs1k7データセットのピアノ演奏の音声録音からペダル情報を取得し、Hsiaoらが提案したCompound Word Transformerを修正して、ペダル関連トークンを他の音楽トークンとともに生成するTransformerデコーダを構築する。本研究は推定されたサステインペダル情報を学習データとして用いているが、結果はさらなる改善の見込みと、ピアノ演奏生成タスクにサステインペダルを取り入れることの重要性を示している。 Recent years have witnessed a growing interest in research related to the detection of piano pedals from audio signals in the music information retrieval community. However, to the best of our knowledge, recent generative models for symbolic music have rarely taken piano pedals into account. In this work, we employ the transcription model proposed by Kong et al. to get pedal information from the audio recordings of piano performance in the AILabs1k7 dataset, and then modify the Compound Word Transformer proposed by Hsiao et al. to build a Transformer decoder that generates pedal-related tokens along with other musical tokens. While the work is done by using inferred sustain pedal information as training data, the result shows hope for further improvement and the importance of involving the sustain pedal in piano performance generation tasks.	翻訳日:2021-11-04 00:19:16 公開日:2021-11-01
# (参考訳) 空中計算によるロバスト連合学習 Robust Federated Learning via Over-The-Air Computation ( http://arxiv.org/abs/2111.01221v1 ) ライセンス: CC BY 4.0	Houssem Sifaou and Geoffrey Ye Li	(参考訳) 本稿では,ビザンチン攻撃に対する空中フェデレート学習のロバスト性について検討する。モデル更新の単純な平均化は、悪意のあるクライアントのローカルモデル更新のランダムあるいは意図的な修正に対して、学習タスクを脆弱にする。本稿では,このような攻撃に対して,フェデレート学習のためのオーバー・ザ・エア計算の利点を保ちながら,ロバストな伝達と集約の枠組みを提案する。提案した堅牢な連合学習では、参加するクライアントをランダムにグループに分割し、各グループに送信時間スロットを割り当てる。パラメータサーバは、ロバスト集約技術を用いて異なるグループの結果を集約し、別のトレーニングラウンドのためにクライアントに結果を送信する。また,提案アルゴリズムの収束性も解析する。数値シミュレーションは、ビザンツ攻撃に対する提案されたアプローチの堅牢性を確認する。 This paper investigates the robustness of over-the-air federated learning to Byzantine attacks. The simple averaging of the model updates via over-the-air computation makes the learning task vulnerable to random or intended modifications of the local model updates of some malicious clients. We propose a robust transmission and aggregation framework to such attacks while preserving the benefits of over-the-air computation for federated learning. For the proposed robust federated learning, the participating clients are randomly divided into groups and a transmission time slot is allocated to each group. The parameter server aggregates the results of the different groups using a robust aggregation technique and conveys the result to the clients for another training round. We also analyze the convergence of the proposed algorithm. Numerical simulations confirm the robustness of the proposed approach to Byzantine attacks.	翻訳日:2021-11-04 00:15:16 公開日:2021-11-01
# (参考訳) スパース連続注意のためのカーネル変形指数型分布族 Kernel Deformed Exponential Families for Sparse Continuous Attention ( http://arxiv.org/abs/2111.01222v1 ) ライセンス: CC BY 4.0	Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg	(参考訳) 注意機構は、確率の重みに関してデータ表現の期待値をとる。これにより、重要な特徴に焦点を当てた要約統計量が得られる。最近、(Martins et al. 2020, 2021)は、指数型分布族および変形指数型分布族からの単峰性の注意密度に着目した連続注意機構を提案した。後者はスパースな台を持つ。(Farinhas et al. 2021)はこれを拡張し、密な台を持つ柔軟なクラスであるガウス混合注意密度を用いた。本稿では、これを2つの一般的で柔軟なクラス、すなわちカーネル指数型分布族と、そのスパース版である新しいカーネル変形指数型分布族に拡張する。理論面では、カーネル指数型分布族と変形指数型分布族の両方について新たな存在結果を示し、変形版がカーネル指数型分布族と同様の近似能力を持つことを示す。実験により、カーネル変形指数型分布族はデータ領域の複数のコンパクトな領域に注意を向けられることが示された。 Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, (Martins et al. 2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families: the latter has sparse support. (Farinhas et al. 2021) extended this to use Gaussian mixture attention densities, which are a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families. Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.	翻訳日:2021-11-04 00:04:43 公開日:2021-11-01
# (参考訳) 大規模事前学習型言語モデルによる自然言語処理の最近の進歩:調査 Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey ( http://arxiv.org/abs/2111.01243v1 ) ライセンス: CC BY 4.0	Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, and Dan Roth	(参考訳) BERTのような、トレーニング済みのトランスフォーマーベースの大規模言語モデルは、自然言語処理(NLP)の分野を大きく変えた。本稿では,これらの大規模言語モデルを用いたNLPタスクの事前学習,微調整,プロンプト,テキスト生成といった手法を用いた最近の研究について述べる。また,事前学習した言語モデルを用いて学習補助やその他の目的のためのデータを生成する手法を提案する。我々は,今後の研究の限界と方向性に関する議論を締めくくっている。 Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. We also present approaches that use pre-trained language models to generate data for training augmentation or other purposes. We conclude with discussions on limitations and suggested directions for future research.	翻訳日:2021-11-03 23:39:17 公開日:2021-11-01
# (参考訳) 単一画像からのアイ・イン・ハンドカメラキャリブレーションの学習 Learning Eye-in-Hand Camera Calibration from a Single Image ( http://arxiv.org/abs/2111.01245v1 ) ライセンス: CC BY 4.0	Eugene Valassakis, Kamil Dreczkowski, Edward Johns	(参考訳) アイ・イン・ハンドカメラのキャリブレーションはロボット工学の基本的かつ長期にわたる問題である。本稿では,この問題を解決するための学習的手法を1つのRGB画像からオンライン化し,モデルを完全に合成データでトレーニングする。画像から外部行列を直接予測する1つの直接回帰モデルと、2次元キーポイントを回帰してPnPを使用する1つの疎対応モデルと、回帰深度とセグメンテーションマップを用いてICPのポーズ推定を可能にする1つの密対応モデルである。実験では,これらの手法を相互に評価し,確立された古典的手法に対して評価し,直接回帰が他の手法に勝る驚くべき結果を見出した。 Eye-in-hand camera calibration is a fundamental and long-studied problem in robotics. We present a study on using learning-based methods for solving this problem online from a single RGB image, whilst training our models with entirely synthetic data. We study three main approaches: one direct regression model that directly predicts the extrinsic matrix from an image, one sparse correspondence model that regresses 2D keypoints and then uses PnP, and one dense correspondence model that uses regressed depth and segmentation maps to enable ICP pose estimation. In our experiments, we benchmark these methods against each other and against well-established classical methods, to find the surprising result that direct regression outperforms other approaches, and we perform noise-sensitivity analysis to gain further insights into these results.	翻訳日:2021-11-03 23:38:23 公開日:2021-11-01
# (参考訳) ヤコビアンスイッチング線形力学系を用いたリカレントニューラルネットワークのリバースエンジニアリング Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems ( http://arxiv.org/abs/2111.01256v1 ) ライセンス: CC BY 4.0	Jimmy T.H. Smith, Scott W. Linderman, David Sussillo	(参考訳) リカレントニューラルネットワーク(RNN)は時系列データを処理するための強力なモデルであるが、その動作の仕組みを理解することは依然として難しい。この理解を深めることは、機械学習と神経科学の両コミュニティにとって大きな関心事である。学習済みRNNを固定点の周りで線形化してリバースエンジニアリングする枠組みは洞察を与えてきたが、このアプローチには大きな課題がある。それには、RNNのダイナミクスを調べる際にどの固定点の周りで展開するかを選ぶ難しさや、線形化したダイナミクスで非線形ダイナミクスを再構成する際の誤差の蓄積が含まれる。本稿では、RNNを新しいスイッチング線形力学系(SLDS)の定式化と共同学習することで、これらの制約を克服する新しいモデルを提案する。共同学習されたRNNの1次テイラー級数展開と、RNNの固定点を選び出すように学習された補助関数が、SLDSのダイナミクスを支配する。得られるのは、RNNをよく近似する学習済みのSLDS変種、状態空間の各点に対して固定点を生成できる補助関数、そして可能な限り1次の項が計算を担うようにダイナミクスが正則化された学習済みの非線形RNNである。このモデルは学習後の固定点最適化を不要にし、状態空間の任意の点においてSLDSの学習されたダイナミクスを曖昧さなく調べることを可能にする。また、スイッチ間でパラメータを共有しながら、SLDSモデルをスイッチング点の連続多様体へと一般化する。RNNのリバースエンジニアリングに関する先行研究に関連する2つの合成タスクで、モデルの有用性を検証する。さらに、LFADSのようなより複雑なアーキテクチャにおいても本モデルをそのまま組み込んで利用できることを示し、このLFADSハイブリッドを非ヒト霊長類の運動系からの単一試行のスパイク活動の解析に適用する。 Recurrent neural networks (RNNs) are powerful models for processing time-series data, but it remains challenging to understand how they function. Improving this understanding is of substantial interest to both the machine learning and neuroscience communities. The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges. These include difficulty choosing which fixed point to expand around when studying RNN dynamics and error accumulation when reconstructing the nonlinear dynamics with the linearized dynamics. We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation. A first-order Taylor series expansion of the co-trained RNN and an auxiliary function trained to pick out the RNN's fixed points govern the SLDS dynamics. 
The results are a trained SLDS variant that closely approximates the RNN, an auxiliary function that can produce a fixed point for each point in state-space, and a trained nonlinear RNN whose dynamics have been regularized such that its first-order terms perform the computation, if possible. This model removes the post-training fixed point optimization and allows us to unambiguously study the learned dynamics of the SLDS at any point in state-space. It also generalizes SLDS models to continuous manifolds of switching points while sharing parameters across switches. We validate the utility of the model on two synthetic tasks relevant to previous work reverse engineering RNNs. We then show that our model can be used as a drop-in in more complex architectures, such as LFADS, and apply this LFADS hybrid to analyze single-trial spiking activity from the motor system of a non-human primate.	翻訳日:2021-11-03 23:14:49 公開日:2021-11-01
# (参考訳) 累積自己回帰自己注意による脳動態 Brain dynamics via Cumulative Auto-Regressive Self-Attention ( http://arxiv.org/abs/2111.01271v1 ) ライセンス: CC BY 4.0	Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis	(参考訳) 多変量の動的プロセスは、個々の時系列を表すコンポーネント間の重み付き接続グラフによって直感的に記述できることが多い。このグラフをピアソン相関行列として単純に表現するだけでも、脳画像の文献で示されているように、有益で予測力を持ちうる。しかしながら、強力なグラフニューラルネットワーク(GNN)は同様の設定でより良い性能を示すはずだという共通の期待がある。本研究では、深いGNNよりもかなり浅いにもかかわらず、脳画像アプリケーションにおいて予測精度でそれらを上回るモデルを提案する。本モデルは、個々の時系列の自己回帰構造を学習し、学習した表現間の有向接続グラフを自己注意機構によりエンドツーエンドで推定する。患者と対照群を分類する分類器としてモデルを教師あり学習することで、有向接続グラフを生成し、各被験者について予測に寄与する時系列の構成要素を強調するモデルが得られる。統合失調症患者と対照群を分類する機能的神経画像データセットで結果を示す。 Multivariate dynamical processes can often be intuitively described by a weighted connectivity graph between components representing each individual time-series. Even a simple representation of this graph as a Pearson correlation matrix may be informative and predictive as demonstrated in the brain imaging literature. However, there is a consensus expectation that powerful graph neural networks (GNNs) should perform better in similar settings. In this work, we present a model that is considerably shallower than deep GNNs, yet outperforms them in predictive accuracy in a brain imaging application. Our model learns the autoregressive structure of individual time series and estimates directed connectivity graphs between the learned representations via a self-attention mechanism in an end-to-end fashion. The supervised training of the model as a classifier between patients and controls results in a model that generates directed connectivity graphs and highlights the components of the time-series that are predictive for each subject. We demonstrate our results on a functional neuroimaging dataset classifying schizophrenia patients and controls.	翻訳日:2021-11-03 22:50:44 公開日:2021-11-01
# (参考訳) マルチネットワークInfoMax:グラフ畳み込みネットワークを含む事前学習手法 Multi network InfoMax: A pre-training method involving graph convolutional networks ( http://arxiv.org/abs/2111.01276v1 ) ライセンス: CC BY 4.0	Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis	(参考訳) データから特徴的な特徴とそれらの関係を発見することは、分類などの様々なタスクに不可欠な価値ある知識を明らかにするのに役立つ。ニューロイメージングでは、これらの特徴は脳疾患の理解、分類、そして場合によっては予防に役立つ可能性がある。高性能な過剰パラメータ化深層学習(DL)モデルのモデルイントロスペクションは、これらの特徴や関係を見つけるのに役立ちうる。しかし、DLモデルが高い性能水準を達成するには多数のラベル付き学習サンプル($n$)が必要であり、多くの分野ではそれがほとんど得られない。本稿では、入力サンプルの2つの高レベル埋め込み間の相互情報量を最大化することに基づく、グラフ畳み込み/ニューラルネットワーク(GCN/GNN)を含む事前学習手法を提案する。最近提案された事前学習手法の多くは、アーキテクチャを構成しうる多数のネットワークのうち1つだけを事前学習する。ほとんどすべてのDLモデルは複数のネットワークのアンサンブルであるため、本手法ではモデルの2つの異なるネットワーク(畳み込みネットワークとグラフネットワーク)から高レベル埋め込みを取得する。学習された高レベルのグラフ潜在表現は、下流のグラフ分類タスクの性能を高め、大量のラベル付きデータサンプルの必要性を回避する。本手法を、被験者を健常対照(HC)群と統合失調症(SZ)群に分類する神経画像データセットに適用する。実験の結果、事前学習したモデルは事前学習していないモデルを大きく上回り、同等の性能を得るのに必要なデータは$50\%$少ないことがわかった。 Discovering distinct features and their relations from data can help us uncover valuable knowledge crucial for various tasks, e.g., classification. In neuroimaging, these features could help to understand, classify, and possibly prevent brain disorders. Model introspection of highly performant overparameterized deep learning (DL) models could help find these features and relations. However, to achieve a high performance level, DL models require numerous labeled training samples ($n$), which are rarely available in many fields. This paper presents a pre-training method involving graph convolutional/neural networks (GCNs/GNNs), based on maximizing mutual information between two high-level embeddings of an input sample. Many of the recently proposed pre-training methods pre-train one of many possible networks of an architecture. Since almost every DL model is an ensemble of multiple networks, we take our high-level embeddings from two different networks of a model (a convolutional and a graph network). 
The learned high-level graph latent representations help increase performance for downstream graph classification tasks and bypass the need for a high number of labeled data samples. We apply our method to a neuroimaging dataset for classifying subjects into healthy control (HC) and schizophrenia (SZ) groups. Our experiments show that the pre-trained model significantly outperforms the non-pre-trained model and requires $50\%$ less data for similar performance.	翻訳日:2021-11-03 22:42:51 公開日:2021-11-01
# PointNu-Net: 臨床の現場データにおける多組織の組織学的核の同時セグメンテーションと分類 PointNu-Net: Simultaneous Multi-tissue Histology Nuclei Segmentation and Classification in the Clinical Wild ( http://arxiv.org/abs/2111.01557v1 ) ライセンス: Link先を確認	Kai Yao and Kaizhu Huang and Jie Sun and Amir Hussain and Curran Jude	(参考訳) 核の自動セグメンテーションと分類は、デジタル病理学において重要な役割を果たす。しかし、従来の研究の多くは多様性が限られた小規模なデータに基づいており、実際の下流タスクでは結果が疑わしい、あるいは誤解を招くものになっている。本稿では、「臨床の現場(clinical wild)」のデータを扱える、信頼性が高く頑健な手法の構築を目指す。具体的には、ヘマトキシリン・エオジン(H&E)染色された病理組織データから核を同時に検出、セグメンテーション、分類する新しい手法を検討・設計し、最近の最大規模のデータセットであるPanNukeを用いて評価する。各核の検出と分類を、各核の中心点を決定する新しい意味的キーポイント推定問題として扱う。次に、動的インスタンスセグメンテーションを用いて、核の中心点に対応するクラス非依存のマスクを得る。 同時に解くべき2つの困難なタスクを分離することで、本手法はクラスを考慮した検出とクラス非依存のセグメンテーションの恩恵を受け、性能が大幅に向上する。提案手法は19種類の組織タイプにわたる核のセグメンテーションと分類で優れた性能を示し、新たなベンチマーク結果を達成した。 Automatic nuclei segmentation and classification plays a vital role in digital pathology. However, previous works are mostly built on data with limited diversity and small sizes, making the results questionable or misleading in actual downstream tasks. In this paper, we aim to build a reliable and robust method capable of dealing with data from the 'clinical wild'. Specifically, we study and design a new method to simultaneously detect, segment, and classify nuclei from Haematoxylin and Eosin (H&E) stained histopathology data, and evaluate our approach using the recent largest dataset: PanNuke. We address the detection and classification of each nucleus as a novel semantic keypoint estimation problem to determine the center point of each nucleus. Next, the corresponding class-agnostic masks for nuclei center points are obtained using dynamic instance segmentation. By decoupling two simultaneous challenging tasks, our method can benefit from class-aware detection and class-agnostic segmentation, thus leading to a significant performance boost. 
We demonstrate the superior performance of our proposed approach for nuclei segmentation and classification across 19 different tissue types, delivering new benchmark results.	翻訳日:2021-11-03 15:09:28 公開日:2021-11-01
# 時系列・計量経済・機械学習・深層学習モデルを用いた株価予測 Stock Price Prediction Using Time Series, Econometric, Machine Learning, and Deep Learning Models ( http://arxiv.org/abs/2111.01137v1 ) ライセンス: Link先を確認	Ananda Chatterjee, Hrisav Bhowmick, and Jaydip Sen	(参考訳) 長年にわたり、研究者は株価予測のための信頼性が高く正確な予測モデルの開発に取り組んできた。文献によれば、予測モデルが正しく設計され洗練されていれば、将来の株価を綿密かつ忠実に推定できる。本稿では、株価予測のための時系列モデル、計量経済モデル、および各種の学習ベースモデルを示す。 2004年1月から2019年12月までのInfosys、ICICI、SUN PHARMAのデータを用いてモデルの学習とテストを行い、どのセクターでどのモデルが最も優れているかを調べた。本稿では、1つの時系列モデル(Holt-Winters指数平滑法)、1つの計量経済モデル(ARIMA)、2つの機械学習モデル(Random ForestとMARS)、2つの深層学習ベースモデル(単純なRNNとLSTM)を扱う。 MARSは最良の機械学習モデルであり、LSTMは最良の深層学習モデルであることが示された。しかし全体としては、IT(Infosysデータ)、銀行(ICICIデータ)、ヘルスケア(SUN PHARMAデータ)の3セクターすべてにおいて、MARSが販売予測において最良の性能のモデルであることが示された。 For a long time, researchers have been developing a reliable and accurate predictive model for stock price prediction. According to the literature, if predictive models are correctly designed and refined, they can painstakingly and faithfully estimate future stock values. This paper demonstrates a set of time series, econometric, and various learning-based models for stock price prediction. The data of Infosys, ICICI, and SUN PHARMA from the period of January 2004 to December 2019 was used here for training and testing the models to know which model performs best in which sector. One time series model (Holt-Winters Exponential Smoothing), one econometric model (ARIMA), two machine learning models (Random Forest and MARS), and two deep learning-based models (simple RNN and LSTM) have been included in this paper. MARS has been proved to be the best performing machine learning model, while LSTM has proved to be the best performing deep learning model. But overall, for all three sectors - IT (on Infosys data), Banking (on ICICI data), and Health (on SUN PHARMA data), MARS has proved to be the best performing model in sales forecasting.	翻訳日:2021-11-03 15:09:05 公開日:2021-11-01
# OPF-Learn:AC Optimal Power Flowデータセット作成のためのオープンソースフレームワーク OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets ( http://arxiv.org/abs/2111.01228v1 ) ライセンス: Link先を確認	Trager Joswig-Jones, Ahmed S. Zamzam, Kyri Baker	(参考訳) 再生可能発電のレベルの増加は、不確実性を管理するためにac最適電力フロー(ac opf)のためのデータ駆動アプローチへの関心が高まっているが、規律化されたデータセットの作成とベンチマークの欠如は、文献におけるアプローチ間の有用な比較を禁止している。信頼性を高めるために、モデルは幅広い操作条件で確実に解を予測できなければならない。本稿では、juliaとpython用のopf-learnパッケージを開発し、ac opf実現可能領域の幅広いスペクトルにまたがる代表データセットを作成するために計算効率の良い手法を用いている。負荷プロファイルは、AC OPF可能なセットを含む凸集合から一様にサンプリングされる。検出された各不実現点について、凸集合は緩和された定式化の特性を用いて、不実現性証明を用いて縮小される。このフレームワークは、文献に見られる従来のテクニックよりも、実現可能なスペース全体を代表するデータセットを生成し、機械学習モデルのパフォーマンスを向上させる。 Increasing levels of renewable generation motivate a growing interest in data-driven approaches for AC optimal power flow (AC OPF) to manage uncertainty; however, a lack of disciplined dataset creation and benchmarking prohibits useful comparison among approaches in the literature. To instill confidence, models must be able to reliably predict solutions across a wide range of operating conditions. This paper develops the OPF-Learn package for Julia and Python, which uses a computationally efficient approach to create representative datasets that span a wide spectrum of the AC OPF feasible region. Load profiles are uniformly sampled from a convex set that contains the AC OPF feasible set. For each infeasible point found, the convex set is reduced using infeasibility certificates, found by using properties of a relaxed formulation. The framework is shown to generate datasets that are more representative of the entire feasible space versus traditional techniques seen in the literature, improving machine learning model performance.	翻訳日:2021-11-03 15:08:41 公開日:2021-11-01
# マルチスケールモデリングのためのマルチ解像度X線マイクロCT画像の深部学習 Deep learning of multi-resolution X-Ray micro-CT images for multi-scale modelling ( http://arxiv.org/abs/2111.01270v1 ) ライセンス: Link先を確認	Samuel J. Jackson and Yufu Niu and Sojwal Manoorkar and Peyman Mostaghimi and Ryan T. Armstrong	(参考訳) マルチスケール多孔質系のキャラクタリゼーション、解析、モデル開発を制限するX線マイクロ計算トモグラフィーには、視野と解像度のトレードオフがある。本稿では,これらのトレードオフを克服するために,edsr畳み込みニューラルネットワークを3次元拡張し,低解像度データから大規模空間スケールの高分解能データを生成する。ベントハイマー岩盤試料からの対高分解能(hr, 2$\mu$m)と低分解能(lr, 6$\mu$m)の画像データを用いてネットワークを訓練する。トレーニングサンプルから得られたlrとhrデータと、異なるマイクロ構造を持つ別のサンプルは、テキスト分析、セグメンテーション動作、およびpnm(pore-network model)多相流シミュレーションなど、さまざまなメトリクスでネットワークを検証するために使用される。検証されたEDSRネットワークは、長さ6-7cmの各コアサンプルに対して、約1000の高解像度REVサブボリューム画像を生成する。各サブボリュームは、PNMから予測される異なる石油物理特性を持ち、各サンプルの3次元連続体スケールモデルを作成するために結合される。低キャピラリー数での乾燥不能な流れは, 実験圧力と3次元飽和度を1:1で直接比較し, 一定範囲の分数流でシミュレートした。 edsr生成モデルは, 細孔径分布の広い流れ場において, 異質性の存在下での実験的挙動を予測できるベースlrモデルよりも精度が高い。モデルは通常、実験的な再現性と3桁の相対透過性の範囲内で飽和を予測するのに正確である。実証されたワークフローは、キャリブレーションなしで完全に予測され、真にマルチスケールの異種システムにおけるフローをイメージし、シミュレートし、分析する可能性を開く。 There are inherent field-of-view and resolution trade-offs in X-Ray micro-computed tomography imaging, which limit the characterization, analysis and model development of multi-scale porous systems. In this paper, we overcome these tradeoffs by developing a 3D Enhanced Deep Super Resolution (EDSR) convolutional neural network to create enhanced, high-resolution data over large spatial scales from low-resolution data. Paired high-resolution (HR, 2$\mu$m) and low resolution (LR, 6$\mu$m) image data from a Bentheimer rock sample are used to train the network. Unseen LR and HR data from the training sample, and another sample with a distinct micro-structure, are used to validate the network with various metrics: textual analysis, segmentation behaviour and pore-network model (PNM) multiphase flow simulations. 
The validated EDSR network is used to generate ~1000 high-resolution REV subvolume images for each full core sample of length 6-7cm (total image sizes are ~6000x6000x32000 voxels). Each subvolume has distinct petrophysical properties predicted from PNMs, which are combined to create a 3D continuum-scale model of each sample. Drainage immiscible flow at low capillary number is simulated across a range of fractional flows and compared directly to experimental pressures and 3D saturations on a 1:1 basis. The EDSR-generated model is more accurate than the base LR model at predicting experimental behaviour in the presence of heterogeneities, especially in flow regimes where a wide distribution of pore sizes is encountered. The models are generally accurate at predicting saturations to within the experimental repeatability and relative permeability across three orders of magnitude. The demonstrated workflow is fully predictive, without calibration, and opens up the possibility to image, simulate and analyse flow in truly multi-scale heterogeneous systems that are otherwise intractable.	翻訳日:2021-11-03 15:08:23 公開日:2021-11-01
# ネスト力学系としてのディープニューラルネットワーク Deep neural networks as nested dynamical systems ( http://arxiv.org/abs/2111.01297v1 ) ライセンス: Link先を確認	David I. Spivak, Timothy Hosgood	(参考訳) ディープニューラルネットワークの「ニューロン」は、脳のニューロン(または神経細胞、混乱を避けるために)に対応するべきである。しかし、私たちは、このアナロジーは型チェックさえしていないと主張しています。ヘッビアン・ラーニングを「一緒に発火するセル」というわずかにグリムの要約と一致して、この記事ではアナロジーが異なるべきという主張を述べる。深層ニューラルネットワークの"neuron"は重みの変化を管理しているため、それらは脳内のシナプスに似ています。神経細胞が単なるワイヤー以上のように見えるという直感は、まさに正しいものであり、本記事では、正確なカテゴリー理論の類似によって正当化される。全体としては、「ニューロン」を引用に残したり、人工ニューロンと呼ぶことで、人工ニューロンと神経細胞を同等にすることの誤りを強調し続ける。まず、深層ニューラルネットワークを、非常に制限された相互作用パターンを持つネストされた動的システムとみなす方法を説明し、次に、エンジニアリング全体を通して有用だが状況の変化に適応できない、より一般的な動的システムに対する相互作用を説明する。前述のように、類推はどちらも埋め込まれた数学的形式主義によって我々に強制される。それらは制御理論のように複雑な相互作用を持つが、ディープニューラルネットワークのような状況に適応する。 There is an analogy that is often made between deep neural networks and actual brains, suggested by the nomenclature itself: the "neurons" in deep neural networks should correspond to neurons (or nerve cells, to avoid confusion) in the brain. We claim, however, that this analogy doesn't even type check: it is structurally flawed. In agreement with the slightly glib summary of Hebbian learning as "cells that fire together wire together", this article makes the case that the analogy should be different. Since the "neurons" in deep neural networks are managing the changing weights, they are more akin to the synapses in the brain; instead, it is the wires in deep neural networks that are more like nerve cells, in that they are what cause the information to flow. An intuition that nerve cells seem like more than mere wires is exactly right, and is justified by a precise category-theoretic analogy which we will explore in this article. Throughout, we will continue to highlight the error in equating artificial neurons with nerve cells by leaving "neuron" in quotes or by calling them artificial neurons. 
We will first explain how to view deep neural networks as nested dynamical systems with a very restricted sort of interaction pattern, and then explain a more general sort of interaction for dynamical systems that is useful throughout engineering, but which fails to adapt to changing circumstances. As mentioned, an analogy is then forced upon us by the mathematical formalism in which they are both embedded. We call the resulting encompassing generalization deeply interacting learning systems: they have complex interaction as in control theory, but adaptation to circumstances as in deep neural networks.	翻訳日:2021-11-03 15:07:49 公開日:2021-11-01
# 証明可能な効率的で簡潔で正確な説明 Provably efficient, succinct, and precise explanations ( http://arxiv.org/abs/2111.01576v1 ) ライセンス: Link先を確認	Guy Blanc, Jane Lange, and Li-Yang Tan	(参考訳) 任意のブラックボックスモデル$f$の予測を説明する問題を考える。 $f$ へのクエリアクセスとインスタンス $x$ が与えられたとき、組み合わせることで$f(x)$ を本質的に決定する、$x$ の特徴量の小さな集合を出力する。我々は、返される説明の簡潔さと正確さについて証明可能な保証を持つ効率的なアルゴリズムを設計する。以前のアルゴリズムは、効率的だがそのような保証がないか、保証は達成するが非効率であるかのいずれかであった。決定木を暗黙的に学習する問題との関連を通じてアルゴリズムを得る。この学習タスクの暗黙的な性質は、$f$の複雑さが扱えないほど大きな代理決定木を必要とする場合でも、効率的なアルゴリズムを可能にする。学習理論,局所計算アルゴリズム,複雑性理論の手法を組み合わせることで,暗黙的な学習問題を解決する。「暗黙的な学習による説明」というアプローチは,ポストホックな説明のためのこれまで別個であった2つの方法,すなわちグローバルな説明とローカルな説明の要素を共有しており,両者の利点を享受していると主張する。 We consider the problem of explaining the predictions of an arbitrary blackbox model $f$: given query access to $f$ and an instance $x$, output a small set of $x$'s features that in conjunction essentially determines $f(x)$. We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lacked such guarantees, or achieved such guarantees but were inefficient. We obtain our algorithm via a connection to the problem of implicitly learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of $f$ necessitates an intractably large surrogate decision tree. We solve the implicit learning problem by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of "explaining by implicit learning" shares elements of two previously disparate methods for post-hoc explanations, global and local explanations, and we make the case that it enjoys advantages of both.	翻訳日:2021-11-03 15:07:01 公開日:2021-11-01
# 音声データセットにおける1回だけ聴く(yoho)アルゴリズムの頑健性評価 Evaluating robustness of You Only Hear Once(YOHO) Algorithm on noisy audios in the VOICe Dataset ( http://arxiv.org/abs/2111.01205v1 ) ライセンス: Link先を確認	Soham Tiwari, Kshitiz Lakhotia, Manjunath Mulimani	(参考訳) マシンリスニングにおける音響イベント検出(sed)は、オーディオファイル内の異なる音を識別し、オーディオ内の特定の音イベントの開始と終了時刻を識別する。 SEDは、音声監視、音声認識、文脈に基づくインデックス作成やマルチメディアデータベース内のデータの検索など、様々な用途で使用されている。しかし、現実のシナリオでは、様々なソースからのオーディオは、干渉するノイズや外乱をほとんど持たない。本稿では,ノイズの多い音声データに対して,You Only Hear Once (YOHO)アルゴリズムの性能を検証した。コンピュータビジョンにおけるYou Only Look Once (YOLO)アルゴリズムにインスパイアされたYOHOアルゴリズムは、Music Speech Detection Dataset、TUT Sound Event、Urban-SEDデータセットなど、さまざまな最先端アルゴリズムのパフォーマンスを、低い推論時間でマッチングすることができる。本稿では,音量比(snr)の異なる音声ファイルを含む音声データセットにおけるyohoアルゴリズムの性能について検討する。 YOHOはVOICeデータセットの論文で報告された最高のパフォーマンスのSEDアルゴリズムを上回ったり、少なくとも一致させることができる。 Sound event detection (SED) in machine listening entails identifying the different sounds in an audio file and identifying the start and end time of a particular sound event in the audio. SED finds use in various applications such as audio surveillance, speech recognition, and context-based indexing and retrieval of data in a multimedia database. However, in real-life scenarios, the audios from various sources are seldom devoid of any interfering noise or disturbance. In this paper, we test the performance of the You Only Hear Once (YOHO) algorithm on noisy audio data. Inspired by the You Only Look Once (YOLO) algorithm in computer vision, the YOHO algorithm can match the performance of the various state-of-the-art algorithms on datasets such as Music Speech Detection Dataset, TUT Sound Event, and Urban-SED datasets but at lower inference times. In this paper, we explore the performance of the YOHO algorithm on the VOICe dataset containing audio files with noise at different sound-to-noise ratios (SNR). YOHO could outperform or at least match the best performing SED algorithms reported in the VOICe dataset paper and make inferences in less time.	
翻訳日:2021-11-03 14:43:48 公開日:2021-11-01
# 生成するな - Sinkhorn発散を用いた差分プライベート生成モデルのトレーニング Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence ( http://arxiv.org/abs/2111.01177v1 ) ライセンス: Link先を確認	Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis	(参考訳) 大量のデータに基づいてトレーニングされた機械学習モデルは、いくつかの分野でブレークスルーにつながっているが、データへのアクセスが制限されているため、プライバシに敏感なドメインへのデプロイメントは制限されている。プライベートデータに対するプライバシの制約でトレーニングされた生成モデルはこの課題を回避し、代わりにプライベートデータへの間接アクセスを提供する。本稿では,プライベートデータから差分プライバシーを持つデータ分布を学習するための,新しい最適トランスポートベース生成法であるDP-Sinkhornを提案する。 DP-Sinkhornは、モデルとデータの間の正確な最適移動距離に対する計算効率のよい近似であるSinkhorn divergenceを差分プライベートな方法で最小化し、勾配推定のバイアス分散トレードオフを制御する新しい手法を使用する。主に生成的敵ネットワークに基づく差分プライベート生成モデルを訓練するための既存のアプローチとは異なり、我々は、特にプライバシー制約によるノイズの存在下では最適化が困難な敵対的目的に頼らない。したがって、DP-Sinkhornは訓練や展開が容易である。実験では,複数の画像モデリングベンチマークにおける最先端の改良を行い,情報量の多いRGB画像の差分プライベート合成を示す。プロジェクトページ:https://nv-tlabs.github.io/DP-Sinkhorn Although machine learning models trained on massive data have led to breakthroughs in several areas, their deployment in privacy-sensitive domains remains limited due to restricted access to data. Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead. We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. DP-Sinkhorn minimizes the Sinkhorn divergence, a computationally efficient approximation to the exact optimal transport distance, between the model and data in a differentially private manner and uses a novel technique for controlling the bias-variance trade-off of gradient estimates. Unlike existing approaches for training differentially private generative models, which are mostly based on generative adversarial networks, we do not rely on adversarial objectives, which are notoriously difficult to optimize, especially in the presence of noise imposed by privacy constraints. Hence, DP-Sinkhorn is easy to train and deploy. 
Experimentally, we improve upon the state of the art on multiple image modeling benchmarks and show differentially private synthesis of informative RGB images. Project page: https://nv-tlabs.github.io/DP-Sinkhorn.	翻訳日:2021-11-03 14:38:45 公開日:2021-11-01
# DAGに基づく分散フェデレーション学習によるインプシットモデル特殊化 Implicit Model Specialization through DAG-based Decentralized Federated Learning ( http://arxiv.org/abs/2111.01257v1 ) ライセンス: Link先を確認	Jossekin Beilharz, Bjarne Pfitzner, Robert Schmid, Paul Geppert, Bernd Arnrich, and Andreas Polze	(参考訳) フェデレートされた学習により、分散クライアントのグループは、プライベートデータ上で共通の機械学習モデルをトレーニングできる。モデル更新の交換は、中央のエンティティまたは分散型の方法で、例えばブロックチェーンによって管理される。しかし、すべてのクライアント間の強い一般化により、これらのアプローチは非独立かつ同一の分散(非iid)データには適さない。モデル更新の有向非巡回グラフ(DAG)に基づくフェデレーション学習における分散化とパーソナライズへの統一的なアプローチを提案する。単一のグローバルモデルをトレーニングする代わりに、クライアントはローカルデータに特化して、各データの類似性に依存する他のクライアントからのモデル更新を使用する。この特殊化は、DAGベースの通信とモデル更新の選択から暗黙的に現れる。このように、データのサブセットに焦点を当てた特殊なモデルの進化を可能にすることで、集中型あるいはブロックチェーンベースのセットアップでのフェデレーション学習よりも、非IIDデータをカバーできるのです。私たちの知る限りでは、提案するソリューションは、完全に分散した連合学習において、パーソナライゼーションと有毒な堅牢性を統合する最初の方法です。評価の結果,3つのデータセット上でのモデル更新のDAGに基づく通信から,モデルの特殊化が直接現れることがわかった。さらに,フェデレート平均化と比較してモデル精度が安定し,クライアント間のばらつきも小さくなった。 Federated learning allows a group of distributed clients to train a common machine learning model on private data. The exchange of model updates is managed either by a central entity or in a decentralized way, e.g. by a blockchain. However, the strong generalization across all clients makes these approaches unsuited for non-independent and identically distributed (non-IID) data. We propose a unified approach to decentralization and personalization in federated learning that is based on a directed acyclic graph (DAG) of model updates. Instead of training a single global model, clients specialize on their local data while using the model updates from other clients dependent on the similarity of their respective data. This specialization implicitly emerges from the DAG-based communication and selection of model updates. Thus, we enable the evolution of specialized models, which focus on a subset of the data and therefore cover non-IID data better than federated learning in a centralized or blockchain-based setup. 
To the best of our knowledge, the proposed solution is the first to unite personalization and poisoning robustness in fully decentralized federated learning. Our evaluation shows that the specialization of models emerges directly from the DAG-based communication of model updates on three different datasets. Furthermore, we show stable model accuracy and less variance across clients when compared to federated averaging.	翻訳日:2021-11-03 14:38:22 公開日:2021-11-01
# 潜在状態と変更点検出のためのネットワーククラスタリング Network Clustering for Latent State and Changepoint Detection ( http://arxiv.org/abs/2111.01273v1 ) ライセンス: Link先を確認	Madeline Navarro and Genevera I. Allen and Michael Weylandt	(参考訳) ネットワークモデルは、幅広い構造化データソースを分析するための強力で柔軟なフレームワークを提供する。しかし、多くの状況において、基礎となる現象の異なる側面を捉えたり、時間とともに変化する振る舞いを捉えるために複数のネットワークを構築することができる。このような環境では、共通構造のパターンを特定するために、関連するネットワークをクラスタリングすることがしばしば有用である。本稿では,ネットワーククラスタリングの課題に対する凸アプローチを提案する。提案手法では,コンベックス融合ペナルティを用いてスムースなツリー状クラスタ構造を誘導し,クラスタ数を事前に選択する必要がなくなる。コンベックスネットワーククラスタリングのための効率的なアルゴリズムを提案し,その有効性を合成例で示す。 Network models provide a powerful and flexible framework for analyzing a wide range of structured data sources. In many situations of interest, however, multiple networks can be constructed to capture different aspects of an underlying phenomenon or to capture changing behavior over time. In such settings, it is often useful to cluster together related networks in attempt to identify patterns of common structure. In this paper, we propose a convex approach for the task of network clustering. Our approach uses a convex fusion penalty to induce a smoothly-varying tree-like cluster structure, eliminating the need to select the number of clusters a priori. We provide an efficient algorithm for convex network clustering and demonstrate its effectiveness on synthetic examples.	翻訳日:2021-11-03 14:37:55 公開日:2021-11-01
# トランスペアレント、解釈可能、パーシモニアス、シミュラブル(tips)機械学習モデルを用いたcovid-19パンデミックダイナミクスのモデル化 : システム思考とシステム識別の観点からのケーススタディ Modelling COVID-19 Pandemic Dynamics Using Transparent, Interpretable, Parsimonious and Simulatable (TIPS) Machine Learning Models: A Case Study from Systems Thinking and System Identification Perspectives ( http://arxiv.org/abs/2111.01763v1 ) ライセンス: Link先を確認	Hua-Liang Wei and S.A. Billings	(参考訳) 新型コロナウイルス(covid-19)の流行以降、新型コロナウイルスの感染拡大をシミュレートし、研究するために、感染したウイルス(sir)と感染したウイルス(seir)モデル(sir)を多く使っている文献に、パンデミックのダイナミクスに関する天文学的な出版物が数多く登場している。 SIRとSEIRは、通常の微分方程式(ODE)の初期値問題(IVP)のクラスである連続時間モデルである。回帰や機械学習などの離散時間モデルも新型コロナウイルスのパンデミックデータ(例:感染症の予測)の分析に応用されているが、これらの手法のほとんどは、事前知識に基づいて事前選択された少数の入力変数を含む単純化されたモデルを使用するか、非常に複雑なモデル(例:ディープラーニング)を使用する。再現数(R番号)、感染事例、死亡など、時間差や時間遅れの関係の調査に焦点をあてた研究は比較的少なく、システム思考と動的視点からパンデミックが広まるのを分析している。本研究は, システム工学とシステム同定を用いて, 透明, 解釈可能, パーシモニアス, シミュレーション可能(tips)な動的機械学習モデルを構築することを提案する。 TIPSモデルは、よく知られたNARMAX(Nonlinear AutoRegressive moving Average with eXogenous inputs)モデルに基づいて開発されており、COVID-19パンデミックのダイナミクスをよりよく理解することができる。英国の新型コロナウイルス(covid-19)データに関するケーススタディが実施された。提案手法と新たな知見は、新型コロナウイルスの感染拡大のダイナミクスをよりよく理解するために有用である。 Since the outbreak of COVID-19, an astronomical number of publications on the pandemic dynamics appeared in the literature, of which many use the susceptible infected removed (SIR) and susceptible exposed infected removed (SEIR) models, or their variants, to simulate and study the spread of the coronavirus. SIR and SEIR are continuous-time models which are a class of initial value problems (IVPs) of ordinary differential equations (ODEs). Discrete-time models such as regression and machine learning have also been applied to analyze COVID-19 pandemic data (e.g. predicting infection cases), but most of these methods use simplified models involving a small number of input variables pre-selected based on a priori knowledge, or use very complicated models (e.g. 
deep learning), purely focusing on certain prediction purposes and paying little attention to the model interpretability. There have been relatively few studies that investigate the inherent time-lagged or time-delayed relationships, e.g. between the reproduction number (R number), infection cases, and deaths, and that analyze the pandemic spread from a systems thinking and dynamic perspective. The present study, for the first time, proposes using a systems engineering and system identification approach to build transparent, interpretable, parsimonious and simulatable (TIPS) dynamic machine learning models, establishing links between the R number, the infection cases and deaths caused by COVID-19. The TIPS models are developed based on the well-known NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous inputs) model, which can help better understand the COVID-19 pandemic dynamics. A case study on the UK COVID-19 data is carried out, and new findings are detailed. The proposed method and the associated new findings are useful for better understanding the spread dynamics of the COVID-19 pandemic.	翻訳日:2021-11-03 14:33:28 公開日:2021-11-01
# 位相層による二次分散関数有向非巡回グラフの効率的な学習 Efficient Learning of Quadratic Variance Function Directed Acyclic Graphs via Topological Layers ( http://arxiv.org/abs/2111.01560v1 ) ライセンス: Link先を確認	Wei Zhou and Xin He and Wei Zhong and Junhui Wang	(参考訳) 有向非巡回グラフ(DAG)モデルは、多くのアプリケーション領域における確率変数間の因果関係を表現するために広く用いられている。本稿では,親が与えられた各ノードの条件分散が条件平均の二次関数である非ガウス型DAGモデルの特殊クラスについて述べる。このような非ガウス型DAGモデルのクラスは、かなり柔軟であり、ポアソン、二項、幾何、指数、ガンマを含む多くの一般的な分布を特別な場合として含む。学習を容易にするために,トポロジカル層の概念を導入し,効率的なDAG学習アルゴリズムを開発した。まず、階層的な方法でトポロジカル層を再構築し、次に異なる層のノード間の有向エッジを復元する。これは既存のほとんどのアルゴリズムよりもはるかに少ない計算コストで済む。その利点は、多くのシミュレーション例や、NBAプレーヤーの統計データとAlibabaが収集した化粧品販売データを含む2つの実生活データセットへの応用でも示されている。 Directed acyclic graph (DAG) models are widely used to represent causal relationships among random variables in many application domains. This paper studies a special class of non-Gaussian DAG models, where the conditional variance of each node given its parents is a quadratic function of its conditional mean. Such a class of non-Gaussian DAG models is fairly flexible and admits many popular distributions as special cases, including Poisson, Binomial, Geometric, Exponential, and Gamma. To facilitate learning, we introduce a novel concept of topological layers, and develop an efficient DAG learning algorithm. It first reconstructs the topological layers in a hierarchical fashion and then recovers the directed edges between nodes in different layers, which requires much less computational cost than most existing algorithms in the literature. Its advantage is also demonstrated in a number of simulated examples, as well as its applications to two real-life datasets, including NBA player statistics data and cosmetic sales data collected by Alibaba.	翻訳日:2021-11-03 14:22:44 公開日:2021-11-01
# Arch-Net: アーキテクチャに依存しないモデル展開のためのモデル蒸留 Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ( http://arxiv.org/abs/2111.01135v1 ) ライセンス: Link先を確認	Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou	(参考訳) ディープニューラルネットワークの計算能力の膨大な要求は、現実世界のアプリケーションにとって大きなハードルとなる。最近のアプリケーション固有集積回路(ASIC)チップは、ニューラルネットワークアクセラレーション専用のハードウェアをサポートする。しかし、ASICは開発に数年を要し、ニューラルアーキテクチャ研究の最新の開発によって必然的に追い越されている。例えば、トランスフォーマーネットワークは、多くの人気のあるチップにネイティブサポートを持っていないため、デプロイが難しい。本稿では,ASICのほとんどのアーキテクチャで効率的にサポートできるオペレータのみからなるニューラルネットワークであるArch-Netを提案する。 Arch-Netが生成されると、Layer Normalization や Embedding Layersのようなあまり一般的でないネットワーク構成は、ラベルのないBlockwise Model Distillationを通じて徐々に排除され、同時にサブ8ビット量子化を実行してパフォーマンスを最大化する。機械翻訳と画像分類タスクの実証結果から,最新のニューラルアーキテクチャを高速に動作し同等の精度を持つArch-Netに変換し,複数の量産型ASICチップにデプロイできることを確認した。コードはhttps://github.com/megvii-research/Arch-Netで入手できる。 The vast computational power required by Deep Neural Networks is a major hurdle to their real-world applications. Many recent Application Specific Integrated Circuit (ASIC) chips feature dedicated hardware support for Neural Network Acceleration. However, as ASICs take multiple years to develop, they are inevitably out-paced by the latest development in Neural Architecture Research. For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy. In this paper, we propose Arch-Net, a family of Neural Networks made up of only operators efficiently supported across most architectures of ASICs. When an Arch-Net is produced, less common network constructs, like Layer Normalization and Embedding Layers, are eliminated in a progressive manner through label-free Blockwise Model Distillation, while performing sub-eight bit quantization at the same time to maximize performance. 
Empirical results on machine translation and image classification tasks confirm that we can transform the latest Neural Architectures into fast-running and equally accurate Arch-Nets, ready for deployment on multiple mass-produced ASIC chips. The code will be available at https://github.com/megvii-research/Arch-Net.	翻訳日:2021-11-03 14:21:36 公開日:2021-11-01
# グラフベーススーパービジョンを用いたシーケンストランスダクション Sequence Transduction with Graph-based Supervision ( http://arxiv.org/abs/2111.01272v1 ) ライセンス: Link先を確認	Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux	(参考訳) リカレントニューラルネットワークトランスデューサ(RNN-T)の目標は、生産のための今日の最高の自動音声認識(ASR)システムを構築する上で大きな役割を果たす。接続性時間分類(CTC)の目的と同様に、RNN-T損失は、一組のアライメントをどのように生成してフルサムトレーニングのための格子を形成するかを定義する特定のルールを使用する。しかし、これらのルールが最適であり、最高のASR結果をもたらすかどうかはまだ不明である。本研究では,ラベルのグラフ表現を受け入れるためにRNN-T損失を一般化する新たなトランスデューサ目的関数を提案する。 CTCのような格子を持つトランスデューサベースのASRは、標準のRNN-Tよりも優れた結果が得られると同時に、厳密な単調なアライメントを確保し、復号処理の最適化を可能にすることを実証する。例えば、提案したCTCライクなトランスデューサシステムは、同等のRNN-Tベースのシステムに対する4.8%の改善に対応する、LibriSpeechの他のテスト条件に対する単語誤り率5.9%を達成する。 The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if these rules are optimal and do lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, for example for restricting alignments or studying different transition rules. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results compared to standard RNN-T, while also ensuring a strictly monotonic alignment, which will allow better optimization of the decoding procedure. For example, the proposed CTC-like transducer system achieves a word error rate of 5.9% for the test-other condition of LibriSpeech, corresponding to an improvement of 4.8% relative to an equivalent RNN-T based system.	翻訳日:2021-11-03 14:19:25 公開日:2021-11-01
# 大規模デジタル実験における機械学習を用いた因果分節解析の枠組み A framework for causal segmentation analysis with machine learning in large-scale digital experiments ( http://arxiv.org/abs/2111.01223v1 ) ライセンス: Link先を確認	Nima S. Hejazi, Wenjing Zheng, Sathya Anand	(参考訳) 本稿では,大規模デジタル実験において,ユーザサブグループ間の治療効果の差分を明らかにすることを目的とした,因果セグメント発見のためのエンドツーエンド方法論フレームワークを提案する。因果推論と非半パラメトリック統計の最近の進展に基づき,(1)サブグループ固有の治療効果に基づく候補治療の利益となるユーザセグメントの発見,(2)予測されたセグメント固有の利益や損害に基づいて,学習者の治療アームに動的に割り当てたユニットの因果影響の評価,の2つの目的を統一した。提案手法はモデル非依存で、最先端機械学習アルゴリズムを推定手順に組み込むことができ、ランダム化a/bテストや準実験に適用できる。オープンソースのRパッケージ実装であるSherlockが導入されている。 We present an end-to-end methodological framework for causal segment discovery that aims to uncover differential impacts of treatments across subgroups of users in large-scale digital experiments. Building on recent developments in causal inference and non/semi-parametric statistics, our approach unifies two objectives: (1) the discovery of user segments that stand to benefit from a candidate treatment based on subgroup-specific treatment effects, and (2) the evaluation of causal impacts of dynamically assigning units to a study's treatment arm based on their predicted segment-specific benefit or harm. Our proposal is model-agnostic, capable of incorporating state-of-the-art machine learning algorithms into the estimation procedure, and is applicable in randomized A/B tests and quasi-experiments. An open source R package implementation, sherlock, is introduced.	翻訳日:2021-11-03 14:17:02 公開日:2021-11-01
# 意図しない選択:持続的資格化率格差と介入 Unintended Selection: Persistent Qualification Rate Disparities and Interventions ( http://arxiv.org/abs/2111.01201v1 ) ライセンス: Link先を確認	Reilly Raab, Yang Liu	(参考訳) 機械学習におけるグループレベルの格差のダイナミクスを現実的かつ公平にモデル化することは、いまだ未解決の問題である。特に、人為的に分けられた人々のグループの間に固有の違いを仮定せず、孤立した部分集団の不均等な初期条件に訴えることで格差を内生的に説明するモデルが望まれる。本論文では、各エージェントは、資格(例えばローンに対する)を表す「真の」二値ラベル$Y$に基づく実数値の特徴量$X$(例えばクレジットスコア)を持つ。各エージェントは、(1) $X$を観測するベイズ最適な機械学習分類器から二値分類ラベル$\hat{Y}$(例えばローン承認)を受け取ることと、(2) 自身が属する孤立したエージェントグループ$G$内で成功した戦略(例えば昇給を求める)を模倣して資格$Y$を更新することを交互に行う。我々は、異なるグループ間での資格率$\Pr(Y=1)$の格差と、この格差が、全体の母集団で繰り返し再訓練されるベイズ最適分類器の列のもとでどのように変化するかを考える。我々は,模倣過程のクラスから導かれるレプリケータ方程式を用いて,各部分集団(グループ)の資格率の進化をモデル化する。分類器の均一配置による非自明な平衡状態の組では,初期資格密度以外のすべての面においてグループが同一であっても,部分集団間の資格率の差が無期限に持続することを示す。次に,一般に提案されているフェアネス介入が,グループレベルの資格率格差を永久に排除できる新しいフィードバック制御機構とともに,この力学系に与える影響をシミュレーションする。最後に、モデルと発見の限界について議論し、将来の研究の可能性を概説する。 Realistically -- and equitably -- modeling the dynamics of group-level disparities in machine learning remains an open problem. In particular, we desire models that do not suppose inherent differences between artificial groups of people -- but rather endogenize disparities by appeal to unequal initial conditions of insular subpopulations. In this paper, agents each have a real-valued feature $X$ (e.g., credit score) informed by a "true" binary label $Y$ representing qualification (e.g., for a loan). Each agent alternately (1) receives a binary classification label $\hat{Y}$ (e.g., loan approval) from a Bayes-optimal machine learning classifier observing $X$ and (2) may update their qualification $Y$ by imitating successful strategies (e.g., seek a raise) within an isolated group $G$ of agents to which they belong. We consider the disparity of qualification rates $\Pr(Y=1)$ between different groups and how this disparity changes subject to a sequence of Bayes-optimal classifiers repeatedly retrained on the global population. 
We model the evolving qualification rates of each subpopulation (group) using the replicator equation, which derives from a class of imitation processes. We show that differences in qualification rates between subpopulations can persist indefinitely for a set of non-trivial equilibrium states due to uniformed classifier deployments, even when groups are identical in all aspects except initial qualification densities. We next simulate the effects of commonly proposed fairness interventions on this dynamical system along with a new feedback control mechanism capable of permanently eliminating group-level qualification rate disparities. We conclude by discussing the limitations of our model and findings and by outlining potential future work.	翻訳日:2021-11-03 14:13:31 公開日:2021-11-01
# Minimax Optimization: Convex-submodular の場合 Minimax Optimization: The Case of Convex-Submodular ( http://arxiv.org/abs/2111.01262v1 ) ライセンス: Link先を確認	Arman Adibi, Aryan Mokhtari, Hamed Hassani	(参考訳) ミニマックス最適化は、機械学習、ゲーム理論、制御理論における様々な応用に取り組む上で中心的な役割を果たしてきた。これまでの文献では、例えば凸凹ミニマックス最適化(convex-concave minimax optimization)など、連続領域におけるそのような問題の研究に重点を置いてきた。それでも、ミニマックス問題は連続領域を超えて連続離散領域や完全離散領域にまで拡張される。本稿では、最小化がユークリッド空間に属する連続変数上で行われ、最大化が与えられた基底集合の部分集合上で行われる混合連続離散ミニマックス問題について研究する。連続変数に関して目的が凸であり、離散変数に関して部分モジュラーであるような凸-部分モジュラーミニマックス問題のクラスを導入する。このような問題は機械学習アプリケーションに頻繁に現れるが、アルゴリズム的および理論的観点からそれらに対処する方法についてはほとんど分かっていない。このような問題に対して、まず、サドル点の取得は任意の近似に対して困難であることを示し、従って(近似)最適性の新たな概念を導入する。次に, 凸および単調サブモジュラーミニマックス問題を解くためのいくつかのアルゴリズム手法を提案し, その収束率, 計算複雑性, および我々の最適性の概念に基づく最終解の質を特徴付ける。提案アルゴリズムは反復的であり、離散最適化と連続最適化の両方のツールを組み合わせる。最後に,本手法の有効性を示す数値実験を行った。 Minimax optimization has been central in addressing various applications in machine learning, game theory, and control theory. Prior literature has thus far mainly focused on studying such problems in the continuous domain, e.g., convex-concave minimax optimization is now understood to a significant extent. Nevertheless, minimax problems extend far beyond the continuous domain to mixed continuous-discrete domains or even fully discrete domains. In this paper, we study mixed continuous-discrete minimax problems where the minimization is over a continuous variable belonging to Euclidean space and the maximization is over subsets of a given ground set. We introduce the class of convex-submodular minimax problems, where the objective is convex with respect to the continuous variable and submodular with respect to the discrete variable. Even though such problems appear frequently in machine learning applications, little is known about how to address them from algorithmic and theoretical perspectives. For such problems, we first show that obtaining saddle points is hard up to any approximation, and thus introduce new notions of (near-) optimality. 
We then provide several algorithmic procedures for solving convex and monotone-submodular minimax problems and characterize their convergence rates, computational complexity, and quality of the final solution according to our notions of optimality. Our proposed algorithms are iterative and combine tools from both discrete and continuous optimization. Finally, we provide numerical experiments to showcase the effectiveness of our proposed methods.	翻訳日:2021-11-03 14:13:00 公開日:2021-11-01
# switch point biased self-training: code-switchingのための事前学習モデルの再提案 Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching ( http://arxiv.org/abs/2111.01231v1 ) ライセンス: Link先を確認	Parul Chopra, Sai Krishna Rallabandi, Alan W Black, Khyathi Raghavi Chandu	(参考訳) 多言語コミュニティにおけるコミュニケーションの容易さによるユビキタスな現象であるcode-switching (cs)は、言語処理における未熟な問題である。この背景にある主な理由は、(1)事前訓練された大規模多言語モデルを活用するための最小限の努力、(2)注釈付きデータの欠如である。 CSにおける多言語モデルの低性能の区別は、スイッチポイントにつながる言語の文内混合である。まず 4 つの異なる言語ペアに POS と NER という2 つのシーケンスラベリングタスクをベンチマークし,問題を特定し,その中の最高のパフォーマンスモデルである char-BERT を選択する (addressing (1))。次に,未注釈データ(アドレッシング(2))を活用することで,スイッチポイントバイアスを用いて既存の事前学習モデルを再利用する自己学習手法を提案する。両タスクにおける2つの異なる言語ペアの全体的な性能を維持しながら,スイッチポイント性能のギャップを小さくすることで,我々のアプローチが両タスクで良好に動作することを示す。私たちのコードは、https://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training.comで利用可能です。 Code-switching (CS), a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities still remains an understudied problem in language processing. The primary reasons behind this are: (1) minimal efforts in leveraging large pretrained multilingual models, and (2) the lack of annotated data. The distinguishing case of low performance of multilingual models in CS is the intra-sentence mixing of languages leading to switch points. We first benchmark two sequence labeling tasks -- POS and NER on 4 different language pairs with a suite of pretrained models to identify the problems and select the best performing model, char-BERT, among them (addressing (1)). We then propose a self training method to repurpose the existing pretrained models using a switch-point bias by leveraging unannotated data (addressing (2)). We finally demonstrate that our approach performs well on both tasks by reducing the gap between the switch point performance while retaining the overall performance on two distinct language pairs in both the tasks. 
Our code is available here: https://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training.	翻訳日:2021-11-03 14:12:37 公開日:2021-11-01
# 映像理解モデルのための勾配周波数変調 Gradient Frequency Modulation for Visually Explaining Video Understanding Models ( http://arxiv.org/abs/2111.01215v1 ) ライセンス: Link先を確認	Xinmiao Lin, Wentao Bao, Matthew Wright, Yu Kong	(参考訳) 多くのアプリケーションでは、なぜ機械学習モデルが意思決定を行うのかを理解することが不可欠であるが、これは最先端のニューラルネットワークのブラックボックスの性質によって阻害されている。このため、ビデオ理解の分野を含む深層学習における説明可能性に注目が集まっている。映像データの時間的次元から,映像行動認識モデルを説明する主な課題は,既存の文献では無視されている時空間的に一貫した視覚説明を作ることである。本稿では,映像理解モデルの意思決定を説明するために,周波数ベース極値摂動(f-ep)を提案する。摂動法によって与えられる説明は、空間的・時間的にともにノイズと非スムースであるため、離散コサイン変換(dct)を用いてニューラルネットワークモデルから勾配写像の周波数を変調する。実験では,f-ep がモデルの意思決定をより忠実に表現する時空間的一貫性のある説明を提供することを示す。 In many applications, it is essential to understand why a machine learning model makes the decisions it does, but this is inhibited by the black-box nature of state-of-the-art neural networks. Because of this, increasing attention has been paid to explainability in deep learning, including in the area of video understanding. Due to the temporal dimension of video data, the main challenge of explaining a video action recognition model is to produce spatiotemporally consistent visual explanations, which has been ignored in the existing literature. In this paper, we propose Frequency-based Extremal Perturbation (F-EP) to explain a video understanding model's decisions. Because the explanations given by perturbation methods are noisy and non-smooth both spatially and temporally, we propose to modulate the frequencies of gradient maps from the neural network model with a Discrete Cosine Transform (DCT). We show in a range of experiments that F-EP provides more spatiotemporally consistent explanations that more faithfully represent the model's decisions compared to the existing state-of-the-art methods.	翻訳日:2021-11-03 14:10:08 公開日:2021-11-01
# 前もってのニューラルシーンフロー Neural Scene Flow Prior ( http://arxiv.org/abs/2111.01253v1 ) ライセンス: Link先を確認	Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey	(参考訳) ディープラーニング革命以前、多くの知覚アルゴリズムは実行時最適化と強力な事前/正規化ペナルティに基づいていた。コンピュータビジョンにおけるこの主な例は光学とシーンフローである。教師付き学習は、明示的な正規化の必要性をほとんど取り除いた。代わりに、大量のラベル付きデータを使用して事前統計をキャプチャするが、多くの問題に対して必ずしも容易に利用できない。ニューラルネットワークの学習には最適化が使用されるが、このネットワークの重みは実行時に凍結される。その結果、これらの学習ソリューションはドメイン固有であり、他の統計的に異なるシナリオによく当てはまらない。本稿では,実行時最適化と強い正規化に大きく依存するシーンフロー問題を再考する。ここでの中心的なイノベーションは、ニューラルネットワークのアーキテクチャを新しいタイプの暗黙正則化器として使用する、前もってニューラルネットワークのシーンフローを含めることである。学習ベースのシーンフローメソッドとは異なり、最適化は実行時に発生し、我々のアプローチではオフラインのデータセットは必要とせず、自律運転のような新しい環境へのデプロイに最適である。我々は、マルチレイヤパーセプトロン(MLP)のみをベースとしたアーキテクチャをシーンフローとして使用できることを示します。我々の手法は、シーンフローベンチマークで競争力のある結果を得ることができます。また、神経前兆の暗黙的かつ連続的なシーンフロー表現は、ポイント雲の列にまたがる密集した長期対応を推定することができる。濃密な動き情報は、動きベクトルを統合することにより、時間を通して点を伝播できるシーンフローフィールドによって表現される。我々は,lidar点雲の列を蓄積することで,このような能力を示す。 Before the deep learning revolution, many perception algorithms were based on runtime optimization in conjunction with a strong prior/regularization penalty. A prime example of this in computer vision is optical and scene flow. Supervised learning has largely displaced the need for explicit regularization. Instead, they rely on large amounts of labeled data to capture prior statistics, which are not always readily available for many problems. Although optimization is employed to learn the neural network, the weights of this network are frozen at runtime. As a result, these learning solutions are domain-specific and do not generalize well to other statistically different scenarios. This paper revisits the scene flow problem that relies predominantly on runtime optimization and strong regularization. A central innovation here is the inclusion of a neural scene flow prior, which uses the architecture of neural networks as a new type of implicit regularizer. 
Unlike learning-based scene flow methods, optimization occurs at runtime, and our approach needs no offline datasets -- making it ideal for deployment in new environments such as autonomous driving. We show that an architecture based exclusively on multilayer perceptrons (MLPs) can be used as a scene flow prior. Our method attains competitive -- if not better -- results on scene flow benchmarks. Also, our neural prior's implicit and continuous scene flow representation allows us to estimate dense long-term correspondences across a sequence of point clouds. The dense motion information is represented by scene flow fields where points can be propagated through time by integrating motion vectors. We demonstrate such a capability by accumulating a sequence of lidar point clouds.	翻訳日:2021-11-03 14:09:52 公開日:2021-11-01
# 運動境界と咬合の協調検出 Joint Detection of Motion Boundaries and Occlusions ( http://arxiv.org/abs/2111.01261v1 ) ライセンス: Link先を確認	Hannah Halin Kim, Shuzhi Yu, Carlo Tomasi	(参考訳) 本研究では,映像中の運動境界(mbs)と咬合領域(occ)を同時検出する畳み込みニューラルネットワークmonetを提案する。検出は、光の流れがMBに沿って不連続であり、Occでは定義されていないため困難である。 2つの時間方向を同時に推論するため、2つのフレーム間の推定マップを直接ワープする。フレーム間の外観ミスマッチは、しばしばMBやOccに近づきやすいため、1フレーム内の各特徴に対して、検索範囲内の特徴と一致した最小差を記録するコストブロックを構築する。このコストブロックは2次元であり、フロー分析で使われる4次元のコストボリュームよりもはるかに安価である。コストブロック機能はエンコーダで計算され、MBとOccの推定はデコーダで計算される。デコーダ層を細粒度に配置することで性能が向上することがわかった。 MONetは、SintelとFlyingChairsOccベンチマークの両方のタスクにおいて、細かな調整をすることなく、従来の技術よりも優れている。 We propose MONet, a convolutional neural network that jointly detects motion boundaries (MBs) and occlusion regions (Occs) in video both forward and backward in time. Detection is difficult because optical flow is discontinuous along MBs and undefined in Occs, while many flow estimators assume smoothness and a flow defined everywhere. To reason in the two time directions simultaneously, we direct-warp the estimated maps between the two frames. Since appearance mismatches between frames often signal vicinity to MBs or Occs, we construct a cost block that for each feature in one frame records the lowest discrepancy with matching features in a search range. This cost block is two-dimensional, and much less expensive than the four-dimensional cost volumes used in flow analysis. Cost-block features are computed by an encoder, and MB and Occ estimates are computed by a decoder. We found that arranging decoder layers fine-to-coarse, rather than coarse-to-fine, improves performance. MONet outperforms the prior state of the art for both tasks on the Sintel and FlyingChairsOcc benchmarks without any fine-tuning on them.	翻訳日:2021-11-03 14:09:26 公開日:2021-11-01
# クロスモーダルビデオ検索のためのマスキングモード Masking Modalities for Cross-modal Video Retrieval ( http://arxiv.org/abs/2111.01300v1 ) ライセンス: Link先を確認	Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid	(参考訳) 大規模アンラベリングデータセットの事前トレーニングでは、コンピュータビジョンと自然言語処理の分野で顕著なパフォーマンス向上が見られた。大規模ビデオデータセットの出現を考えると、ビデオエンコーダを事前訓練するための一般的な戦略は、付随する音声を弱い監督力として使うことである。しかし、音声は事前学習を監督するために使用されるため、ビデオエンコーダには見られず、そのモダリティを処理することを学ばない。音声言語における豊富な手がかりを活用できない現在の事前学習手法の欠点に対処した。提案手法は,ビデオモダリティの全てを監督,すなわち外見,音,書き起こし音声として利用して,ビデオエンコーダの事前訓練を行うことである。入力の全体モダリティを隠蔽し、他の2つのモダリティを使って予測する。これにより、それぞれのモダリティが他の人とコラボレーションすることを奨励し、私たちのビデオエンコーダは、音声と同様に外観や音声を処理することを学びます。 How2R, YouCook2, Condensed Moviesデータセット上で, ビデオ検索のための"モダリティマスキング"事前学習手法の優れた性能を示す。 Pre-training on large scale unlabelled datasets has shown impressive performance improvements in the fields of computer vision and natural language processing. Given the advent of large-scale instructional video datasets, a common strategy for pre-training video encoders is to use the accompanying speech as weak supervision. However, as speech is used to supervise the pre-training, it is never seen by the video encoder, which does not learn to process that modality. We address this drawback of current pre-training methods, which fail to exploit the rich cues in spoken language. Our proposal is to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech. We mask an entire modality in the input and predict it using the other two modalities. This encourages each modality to collaborate with the others, and our video encoder learns to process appearance and audio as well as speech. We show the superior performance of our "modality masking" pre-training approach for video retrieval on the How2R, YouCook2 and Condensed Movies datasets.	翻訳日:2021-11-03 14:09:09 公開日:2021-11-01
# プロンプトレベルEMA非応答予測のためのトランスフォーマ Transformers for prompt-level EMA non-response prediction ( http://arxiv.org/abs/2111.01193v1 ) ライセンス: Link先を確認	Supriya Nagesh, Alexander Moreno, Stephanie M. Carpenter, Jamie Yap, Soujanya Chatterjee, Steven Lloyd Lizotte, Neng Wan, Santosh Kumar, Cho Lam, David W. Wetter, Inbal Nahum-Shani, James M. Rehg	(参考訳) eco momentary assessments (emas)は、モバイルヘルス(mhealth)研究や治療プログラムの参加者から現在の認知状態、影響、行動、環境要因を測定する上で重要な心理学的データソースである。参加者がEMAプロンプトに反応しない非応答は、内因性問題である。非応答を正確に予測できる能力は、EMAのデリバリを改善し、コンプライアンスの介入を開発するために利用できる。先行研究は、非応答を予測するために古典的な機械学習モデルを探求した。しかし、ますます大規模なEMAデータセットが利用可能になるにつれて、他の分野で有効であったディープラーニングモデルを活用する可能性がある。近年,NLPなどの領域ではトランスフォーマーモデルの性能が向上している。この研究は、EMAデータ分析におけるトランスフォーマーの利用を初めて探求したものである。 EMAデータにトランスフォーマーを適用する際の3つの重要な問題に対処する。 1.入力表現,入力表現 2. 時間情報のエンコーディング 3. 下流予測タスク性能向上のための事前学習の有用性トランスモデルは0.77の非応答予測AUCを実現し、従来のMLやLSTMベースのディープラーニングモデルよりも大幅に優れている。我々は,40K EMAサンプルのコーパスで学習した予測モデルを研究コミュニティに無償で提供し,将来的なトランスフォーマーベースのEMA分析作業の開発を促進する。 Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors from participants in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be utilized to improve EMA delivery and develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is the potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: 1. Input representation, 2. encoding temporal information, 3. 
utility of pre-training for improving downstream prediction task performance. The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models. We will make our predictive model, trained on a corpus of 40K EMA samples, freely available to the research community in order to facilitate the development of future transformer-based EMA analysis work.	翻訳日:2021-11-03 14:04:15 公開日:2021-11-01
# 時系列生成のためのSig-Wasserstein GANs Sig-Wasserstein GANs for Time Series Generation ( http://arxiv.org/abs/2111.01207v1 ) ライセンス: Link先を確認	Hao Ni, Lukasz Szpruch, Marc Sabate-Vidales, Baoren Xiao, Magnus Wiese, Shujian Liao	(参考訳) 合成データ(Synthetic data)は、AI機械学習パイプラインの開発とデプロイを著しく加速する新興技術である。本研究では,連続時間確率モデルと新たに提案された$w_1$メトリックを組み合わせることで,高忠実度時系列生成器sigwganを開発した。前者は確率微分方程式に基づくLogsig-RNNモデルであり、後者は時系列によって誘導される測度を特徴づける普遍的および原理的な数学的特徴に由来する。 SigWGAN は計算的に挑戦する GAN min-max 問題を高忠実度サンプルを生成しながら教師あり学習に変換することができる。一般的な量的リスクモデルと経験的金融データから得られた合成データから,提案モデルを検証する。コードはhttps://github.com/sigcgans/sig-wasserstein-gans.gitで入手できる。 Synthetic data is an emerging technology that can significantly accelerate the development and deployment of AI machine learning pipelines. In this work, we develop high-fidelity time-series generators, the SigWGAN, by combining continuous-time stochastic models with the newly proposed signature $W_1$ metric. The former are the Logsig-RNN models based on the stochastic differential equations, whereas the latter originates from the universal and principled mathematical features to characterize the measure induced by time series. SigWGAN allows turning computationally challenging GAN min-max problem into supervised learning while generating high fidelity samples. We validate the proposed model on both synthetic data generated by popular quantitative risk models and empirical financial data. Codes are available at https://github.com/SigCGANs/Sig-Wasserstein-GANs.git.	翻訳日:2021-11-03 14:03:55 公開日:2021-11-01
# サーバグレードハードウェアなしのヒューマンレベル制御 Human-Level Control without Server-Grade Hardware ( http://arxiv.org/abs/2111.01264v1 ) ライセンス: Link先を確認	Brett Daley and Christopher Amato	(参考訳) ディープQネットワーク(DQN)は強化学習の大きなマイルストーンであり、人間レベルの制御ポリシーが報酬の最大化を通じて生の視覚入力から直接学習できることを初めて実証した。導入から何年も経っても、dqnは多くのイノベーションが後継手法に採用されているため、dqnは研究コミュニティと非常に関係がある。それでも、ハードウェアの大幅な進歩にもかかわらず、DQNの最初のAtari 2600実験は完全な複製に費用がかかるままであった。これは、最先端のハードウェアや大規模なクラウドコンピューティングリソースにアクセスできない研究者にとって、大きな障壁となる。そこで本研究では,CPU-GPUデスクトップシステムを最大限活用するために設計された,並列かつ同期化された新しい実行フレームワークを活用したDQN実装を提案する。 NVIDIA GeForce GTX 1080 GPUを1つだけで実装することで、2億フレームのAtari実験のトレーニング時間を25時間から9時間に短縮します。我々の論文で紹介されたアイデアは、多くのオフポリシー深層強化学習法に一般化されるべきである。 Deep Q-Network (DQN) marked a major milestone for reinforcement learning, demonstrating for the first time that human-level control policies could be learned directly from raw visual inputs via reward maximization. Even years after its introduction, DQN remains highly relevant to the research community since many of its innovations have been adopted by successor methods. Nevertheless, despite significant hardware advances in the interim, DQN's original Atari 2600 experiments remain costly to replicate in full. This poses an immense barrier to researchers who cannot afford state-of-the-art hardware or lack access to large-scale cloud computing resources. To facilitate improved access to deep reinforcement learning research, we introduce a DQN implementation that leverages a novel concurrent and synchronized execution framework designed to maximally utilize a heterogeneous CPU-GPU desktop system. With just one NVIDIA GeForce GTX 1080 GPU, our implementation reduces the training time of a 200-million-frame Atari experiment from 25 hours to just 9 hours. The ideas introduced in our paper should be generalizable to a large number of off-policy deep reinforcement learning methods.	翻訳日:2021-11-03 14:03:42 公開日:2021-11-01
# 車両グリッド統合を考慮した電気自動車充電ステーションの運転学習 Learning to Operate an Electric Vehicle Charging Station Considering Vehicle-grid Integration ( http://arxiv.org/abs/2111.01294v1 ) ライセンス: Link先を確認	Zuzhao Ye, Yuanqi Gao, Nanpeng Yu	(参考訳) 電気自動車(EV)の急速な普及により、EV充電ステーションの広範な設置が求められている。充電ステーションの収益性を最大化するために、充電と電気グリッドサービスの両方を提供するインテリジェントコントローラが大いに求められている。しかし,EVの到着時間や充電要求が不確実であることから,最適な充電スケジュールを決定することは困難である。本稿では、充電ステーションの利益を最大化するために、新しい集中型アロケーションと分散実行(CADE)強化学習(RL)フレームワークを提案する。集中配置プロセスでは、EVは待機スポットまたは充電スポットに割り当てられる。分散実行プロセスでは、各充電器は共有リプレイメモリからアクション値関数を学習しながら、独自の充電/放電決定を行う。このCADEフレームワークはRLアルゴリズムのスケーラビリティとサンプル効率を大幅に改善する。数値計算により,提案したCADEフレームワークは計算効率が高く,拡張性も高く,ベースラインモデル予測制御(MPC)よりも優れていた。また,強化学習エージェントの内部動作を説明するために,学習した行動値関数の詳細な分析を行う。 The rapid adoption of electric vehicles (EVs) calls for the widespread installation of EV charging stations. To maximize the profitability of charging stations, intelligent controllers that provide both charging and electric grid services are in great need. However, it is challenging to determine the optimal charging schedule due to the uncertain arrival time and charging demands of EVs. In this paper, we propose a novel centralized allocation and decentralized execution (CADE) reinforcement learning (RL) framework to maximize the charging station's profit. In the centralized allocation process, EVs are allocated to either the waiting or charging spots. In the decentralized execution process, each charger makes its own charging/discharging decision while learning the action-value functions from a shared replay memory. This CADE framework significantly improves the scalability and sample efficiency of the RL algorithm. Numerical results show that the proposed CADE framework is both computationally efficient and scalable, and significantly outperforms the baseline model predictive control (MPC). We also provide an in-depth analysis of the learned action-value function to explain the inner working of the reinforcement learning agent.	
翻訳日:2021-11-03 14:02:01 公開日:2021-11-01
# ハードウェアを意識したニューラルアーキテクチャ検索のためのプロキシデバイス One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search ( http://arxiv.org/abs/2111.01203v1 ) ライセンス: Link先を確認	Bingqian Lu and Jianyi Yang and Weiwen Jiang and Yiyu Shi and Shaolei Ren	(参考訳) 畳み込みニューラルネットワーク(cnns)は、視覚ベースの自律運転やビデオコンテンツ分析など、多くの現実のアプリケーションで使われている。様々なターゲットデバイスでcnn推論を実行するには、ハードウェアアウェアニューラルアーキテクチャ検索(nas)が不可欠である。効率的なハードウェア対応NASの重要な要件は、異なるアーキテクチャをランク付けするための推論レイテンシの高速評価である。ターゲットデバイス毎の遅延予測器の構築は、技術状況において一般的に使用されているが、非常に多様なデバイスの存在下でスケーラビリティに欠ける、非常に時間を要するプロセスである。本研究では,レイテンシのモノトニック性(monotonicity)を活用することでスケーラビリティの課題に対処します。強いレイテンシのモノトニック性が存在する場合、最適性を損なうことなく、新しいターゲットデバイス上で1つのプロキシデバイスを検索したアーキテクチャを再利用できる。強い遅延単調性がない場合、遅延単調性を大幅に向上させる効率的なプロキシ適応手法を提案する。最後に、我々は、MobileNet-V2、MobileNet-V3、NAS-Bench-201、ProxylessNAS、FBNetなど、複数の主要な検索空間上で異なるプラットフォームで実験を行い、アプローチを検証する。我々の結果は、ひとつのプロキシデバイスを使用することで、デバイス毎のNASとほぼ同じPareto-Optimalアーキテクチャを見つけることができ、各デバイス用の遅延予測器を構築することの禁止コストを回避することができることを強調している。 Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity -- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. 
In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.	翻訳日:2021-11-03 12:48:47 公開日:2021-11-01
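The notion of latency monotonicity described above, that architectures rank similarly by latency on different devices, can be checked with a rank correlation. The following is a minimal sketch assuming Spearman's rank correlation as the measure. The function names and the latency numbers in the usage note are hypothetical, and nothing here is taken from the authors' code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(latencies_a, latencies_b):
    """Spearman rank correlation between two equally long latency lists."""
    ranks_a = average_ranks(latencies_a)
    ranks_b = average_ranks(latencies_b)
    n = len(ranks_a)
    mean_a = sum(ranks_a) / n
    mean_b = sum(ranks_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    return cov / (var_a * var_b) ** 0.5
```

For example, with hypothetical per-architecture latencies `proxy = [10.0, 12.5, 20.0, 31.0]` and `target = [3.1, 4.0, 7.5, 9.9]`, the two devices order the architectures identically, so the correlation is at its maximum. A value well below 1 would suggest that a proxy device alone is not a safe substitute without some adaptation step.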
# ASMDD:アラビア語音声誤発音検出データセット ASMDD: Arabic Speech Mispronunciation Detection Dataset ( http://arxiv.org/abs/2111.01136v1 ) ライセンス: Link先を確認	Salah A. Aly, Abdelrahman Salah, Hesham M. Eraqi	(参考訳) エジプト語対話におけるアラビア語の誤発音検出の最大のデータセットを紹介する。データセットは、アラビア語で最も頻繁に使われる上位100語を表す注釈付きオーディオファイルで構成されており、100人のエジプト人の子供(2歳から8歳)が発音している。データセットは、専門家リスナーによるセグメント発音誤り検出に基づいて収集、注釈付けされる。 The largest dataset for Arabic speech mispronunciation detection in Egyptian dialogues is introduced. The dataset is composed of annotated audio files representing the top 100 words that are most frequently used in the Arabic language, pronounced by 100 Egyptian children (aged between 2 and 8 years old). The dataset is collected and annotated for segmental pronunciation error detection by expert listeners.	翻訳日:2021-11-03 12:44:25 公開日:2021-11-01
# ディープラーニングを用いたツイートの因果関係の同定--2017-2021年の糖尿病関連ツイートを事例として Identifying causal associations in tweets using deep learning: Use case on diabetes-related tweets from 2017-2021 ( http://arxiv.org/abs/2111.01225v1 ) ライセンス: Link先を確認	Adrian Ahne (1 and 2), Vivek Khetan (3), Xavier Tannier (4), Md Imbessat Hassan Rizvi (5), Thomas Czernichow (2), Francisco Orchard (2), Charline Bour (6), Andrew Fano (3), Guy Fagherazzi (6) ((1) Paris-Saclay University, UVSQ, Inserm, Gustave Roussy, Exposome and Heredity team, CESP, F-94805, Villejuif, France, (2) Epiconcept, Paris, France, (3) Accenture Labs, San Francisco, USA, (4) Sorbonne University, Inserm, University Sorbonne Paris Nord, Laboratoire d'Informatique Medicale et d'Ingenierie des Connaissances pour la e-Sante, LIMICS, Paris, France, (5) Indian Institute of Science, Bengaluru, India, (6) Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg)	(参考訳) 目的: 糖尿病関連ツイートにおける明示的・暗黙的な因果関係を抽出し, 因果性の観点から, 糖尿病オンラインコミュニティ内で共有されている意見, 感情, 観察をよりよく理解するためのツールを提供する。資料と方法:2017年4月から2021年1月の間に、3000万以上の英語の糖尿病関連ツイートが収集された。ディープラーニングと自然言語処理は、個人的および感情的なコンテンツのツイートに焦点を当てるために適用された。 cause-effect-tweetデータセットが手動でラベル付けされ、トレーニングに使用される 1) 因果関係を含む因果関係文を検出するための微調整Bertweetモデル 2) BERTをベースとしたCRFモデルを用いて, 因果関係を抽出した。原因と影響は半教師付きアプローチでクラスター化され、インタラクティブな因果効果ネットワークで可視化された。結果: 不均衡データセットでは68%のリコールで因果文が検出された。 BERTをベースとしたCRFモデルは68%のマクロリコールで原因効果検出のための細調整BERTモデルより優れていた。これにより96,676件の大義関連判決が下された。ディアベテス」は中央クラスタとして同定され、「死」と「インスリン」が続く。インスリン価格関連原因は、しばしば「死」と関連づけられた。結論: 因果文を検出し, 明示的, 暗黙的, 単語的および多語的原因とそれに対応する効果を, BERTベースのアーキテクチャを活用し, 原因効果ネットワークとして可視化した糖尿病関連ツイートで表す新しい手法を開発した。実生活における因果関係を抽出し,ソーシャルメディアデータから報告した患者報告の結果は,糖尿病研究において有用な補完的情報源となる。 Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect associations in patient-reported, diabetes-related tweets and provide a tool to better understand opinion, feelings and observations shared within the diabetes online community 
from a causality perspective. Materials and Methods: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect-tweet dataset was manually labeled and used to train 1) a fine-tuned BERTweet model to detect causal sentences containing a causal association and 2) a CRF model with BERT-based features to extract possible cause-effect associations. Causes and effects were clustered in a semi-supervised approach and visualised in an interactive cause-effect network. Results: Causal sentences were detected with a recall of 68% in an imbalanced dataset. A CRF model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection with a macro recall of 68%. This led to 96,676 sentences with cause-effect associations. "Diabetes" was identified as the central cluster followed by "Death" and "Insulin". Insulin-pricing-related causes were frequently associated with "Death". Conclusions: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single- and multi-word causes and their corresponding effects as expressed in diabetes-related tweets, leveraging BERT-based architectures and visualised as a cause-effect network. Extracting causal associations on real-life, patient-reported outcomes in social media data provides a useful complementary source of information in diabetes research.	翻訳日:2021-11-03 12:44:19 公開日:2021-11-01
# 不確実なコスト関数を持つユーザのための低コストアルゴリズムリコース Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions ( http://arxiv.org/abs/2111.01235v1 ) ライセンス: Link先を確認	Prateek Yadav, Peter Hase, Mohit Bansal	(参考訳) 機械学習モデル決定に影響を及ぼす人々のアルゴリズム的会話を特定するという問題は近年注目を集めている。最近の作業は、ユーザ満足度と直接リンクされる、ユーザによるコストのモデルである。しかし、すべてのユーザー間で共有される単一のグローバルなコスト関数を想定している。これは、ユーザが機能変更に伴う異なるコストと機能に対して行動する意思について異なる好みを持っている場合、非現実的な仮定である。本研究では,ユーザ固有のコスト関数の概念を形式化し,ユーザのための行動可能なリコースを識別する新しい手法を提案する。デフォルトでは,ユーザのコスト関数はrecourseメソッドから隠されていると仮定するが,我々のフレームワークでは,ユーザの好みやコスト関数を部分的にあるいは完全に指定することができる。提案する目的関数である「期待最小コスト(EMC)」は,(1)利用者に選択肢のセットを提示する場合,利用者が採用できる少なくとも1つの低コストソリューションが存在することが重要であり,(2)利用者の真のコスト関数を知らない場合には,まず可算コスト関数をサンプリングし,ユーザの満足度を概算し,期待するユーザにとって良好なコストを実現するためのセットを見つけることができる。我々は,新しい離散最適化アルゴリズムであるコスト最適化局所探索 (cols) を用いてemcを最適化する。ユーザコストをシミュレーションした人気のある実世界のデータセットの実験的評価により,強いベースライン法と比較して25.89ポイントのユーザを満足できることがわかった。また, 標準的公平度指標を用いて, 比較した手法よりも, 集団間でより公平なソリューションを提供できることを示すとともに, コスト関数分布の誤特定にロバストな手法であることを検証した。 The problem of identifying algorithmic recourse for people affected by machine learning model decisions has received much attention recently. Some recent works model user-incurred cost, which is directly linked to user satisfaction. But they assume a single global cost function that is shared across all users. This is an unrealistic assumption when users have dissimilar preferences about their willingness to act upon a feature and different costs associated with changing that feature. In this work, we formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users. By default, we assume that users' cost functions are hidden from the recourse method, though our framework allows users to partially or completely specify their preferences or cost function. 
We propose an objective function, Expected Minimum Cost (EMC), based on two key ideas: (1) when presenting a set of options to a user, it is vital that there is at least one low-cost solution the user could adopt; (2) when we do not know the user's true cost function, we can approximately optimize for user satisfaction by first sampling plausible cost functions, then finding a set that achieves a good cost for the user in expectation. We optimize EMC with a novel discrete optimization algorithm, Cost-Optimized Local Search (COLS), which is guaranteed to improve the recourse set quality over iterations. Experimental evaluation on popular real-world datasets with simulated user costs demonstrates that our method satisfies up to 25.89 percentage points more users compared to strong baseline methods. Using standard fairness metrics, we also show that our method can provide more fair solutions across demographic groups than comparable methods, and we verify that our method is robust to misspecification of the cost function distribution.	翻訳日:2021-11-03 12:43:52 公開日:2021-11-01
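The Expected Minimum Cost idea above can be read as a Monte Carlo estimate: sample plausible cost functions, take the cheapest option in the recourse set under each one, and average. The sketch below assumes, purely for illustration, that a cost function is a vector of per-feature weights and that an action's cost is the weighted sum of absolute feature changes. These modelling choices and all names are hypothetical and are not taken from the paper.

```python
def recourse_cost(weights, action):
    """Cost of one recourse action under one sampled cost function.

    weights: per-feature cost weights (a hypothetical cost model).
    action:  per-feature changes the user would have to make.
    """
    return sum(w * abs(delta) for w, delta in zip(weights, action))


def expected_minimum_cost(recourse_set, sampled_weights):
    """Average, over sampled cost functions, of the cheapest option's cost."""
    total = 0.0
    for weights in sampled_weights:
        total += min(recourse_cost(weights, action) for action in recourse_set)
    return total / len(sampled_weights)
```

As a small worked case, take the options `[(1.0, 0.0), (0.0, 1.0)]` and the sampled weights `[(1.0, 3.0), (4.0, 2.0)]`. The cheapest option costs 1.0 under the first sample and 2.0 under the second, so the estimate is 1.5. Offering only the first option would give (1.0 + 4.0) / 2 = 2.5, which illustrates why a diverse set helps when the user's true costs are unknown.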
# HRViT:マルチスケール高分解能ビジョントランス HRViT: Multi-Scale High-Resolution Vision Transformer ( http://arxiv.org/abs/2111.01236v1 ) ライセンス: Link先を確認	Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan	(参考訳) ビジョントランスフォーマー (vits) は、コンピュータビジョンタスクの優れた性能で多くの注目を集めている。単一スケールの低分解能表現の限界に対処するため、事前の作業では、階層構造を持つ高分解能密度予測タスクにViTを適用してピラミッドの特徴を生成する。しかし、その分類のようなシーケンシャルトポロジーを考えると、マルチスケールの表現学習はまだViTでは未探索である。意味的にリッチで空間的に精度の高いマルチスケール表現を学習する能力を高めるために,我々は高解像度のマルチブランチアーキテクチャをHRViTと呼ばれる視覚変換器と効率的に統合し,高密度予測タスクのParetoを新たなレベルに押し上げる。我々は、異種分岐設計を探求し、線形層における冗長性を低減し、モデル性能とハードウェア効率のバランスをとるためにモデル非線形性を強化した。提案したHRViTは、ADE20K上の50.20% mIoUと、セマンティックセグメンテーションタスクのためのCityscapes上の83.16% mIoUを達成し、最先端のMiTとCSWinを平均1.78 mIoUの改善、28%のパラメータ削減、21%のFLOPs還元を達成し、HRViTを強力な視覚バックボーンとしての可能性を示している。 Vision transformers (ViTs) have attracted much attention for their superior performance on computer vision tasks. To address their limitations of single-scale low-resolution representations, prior work adapts ViTs to high-resolution dense prediction tasks with hierarchical architectures to generate pyramid features. However, multi-scale representation learning is still under-explored on ViTs, given their classification-like sequential topology. To enhance ViTs with more capability to learn semantically-rich and spatially-precise multi-scale representations, in this work, we present an efficient integration of high-resolution multi-branch architectures with vision transformers, dubbed HRViT, pushing the Pareto front of dense prediction tasks to a new level. We explore heterogeneous branch design, reduce the redundancy in linear layers, and augment the model nonlinearity to balance the model performance and hardware efficiency. 
The proposed HRViT achieves 50.20% mIoU on ADE20K and 83.16% mIoU on Cityscapes for semantic segmentation tasks, surpassing state-of-the-art MiT and CSWin with an average of +1.78 mIoU improvement, 28% parameter reduction, and 21% FLOPs reduction, demonstrating the potential of HRViT as strong vision backbones.	翻訳日:2021-11-03 12:43:22 公開日:2021-11-01
# 顔のランドマーク検出における合成画像による人口バイアスの発見 Using Synthetic Images To Uncover Population Biases In Facial Landmarks Detection ( http://arxiv.org/abs/2111.01683v1 ) ライセンス: Link先を確認	Ran Shadmi, Jonathan Laserson, Gil Elbaz	(参考訳) トレーニングされたモデルのパフォーマンスを分析して弱点を特定するには、テスト用のデータの一部を脇に置いておく必要がある。テストセットは、ターゲット集団のすべての関連するサブグループに対して統計的に重要なバイアスを検出するのに十分な大きさでなければならない。この要件は、特に大量のデータを必要とするアプリケーションでは、満足しにくい場合があります。合成テストセットを生成することで,この問題を克服することを提案する。我々は、顔のランドマーク検出タスクを使用して、実際のデータセットで観察されるすべてのバイアスが、慎重に設計された合成データセットで見られることを示し、提案を検証する。これは、合成テストセットがモデルの弱点を効率的に検出し、量や多様性の観点から実テストセットの限界を克服できることを示している。 In order to analyze a trained model's performance and identify its weak spots, one has to set aside a portion of the data for testing. The test set has to be large enough to detect statistically significant biases with respect to all the relevant sub-groups in the target population. This requirement may be difficult to satisfy, especially in data-hungry applications. We propose to overcome this difficulty by generating a synthetic test set. We use the face landmarks detection task to validate our proposal by showing that all the biases observed on real datasets are also seen on a carefully designed synthetic dataset. This shows that synthetic test sets can efficiently detect a model's weak spots and overcome limitations of real test sets in terms of quantity and/or diversity.	翻訳日:2021-11-03 12:41:01 公開日:2021-11-01
# (参考訳) VSEC:ベトナムのSpelling Correctionのためのトランスフォーマーベースモデル VSEC: Transformer-based Model for Vietnamese Spelling Correction ( http://arxiv.org/abs/2111.00640v1 ) ライセンス: CC BY 4.0	Dinh-Truong Do, Ha Thanh Nguyen, Thang Ngoc Bui, Dinh Hieu Vo	(参考訳) スペル誤り訂正は、自然言語処理における長い歴史を持つトピックの1つである。これまでの研究は目覚ましい成果を上げたが、依然として課題は残っている。ベトナム語では、タスクの最先端の方法は、隣接する音節から音節の文脈を推測する。しかし、2つの(あるいはそれ以上の)綴りミスが互いに近くにある場合、モデルはコンテキストを失う可能性があるため、この手法の精度は満足できない。本稿では,ベトナム語の綴り誤りを訂正する新しい手法を提案する。深層学習モデルを用いてミスタイプエラーとミススペルエラーの問題に取り組む。特に埋め込み層はバイトペア符号化技術によって駆動される。 Transformerアーキテクチャに基づくシーケンスモデルとシーケンスモデルにより、我々のアプローチは、同じ問題に関する以前の研究とは異なるものになる。実験では,スペルエラーをランダムに導入した大規模な合成データセットを用いてモデルを訓練する。提案手法の性能を現実的なデータセットを用いて検証する。このデータセットは、9,341の異なるベトナム語文に11,202の人造ミススペルを含んでいる。実験の結果, 誤りの86.8%を検出し81.5%を訂正でき, 最先端手法をそれぞれ5.6%, 2.2%上回った。 Spelling error correction is one of the topics with a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the context if two (or more) spelling mistakes stand near each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle the problems of mistyped errors and misspelled errors by using a deep learning model. The embedding layer, in particular, is powered by the byte pair encoding technique. The sequence to sequence model based on the Transformer architecture makes our approach different from the previous works on the same problem. In the experiment, we train the model with a large synthetic dataset, into which spelling errors are randomly introduced. We test the performance of the proposed method using a realistic dataset. This dataset contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences.
The experimental results show that our method achieves encouraging performance, with 86.8% of errors detected and 81.5% of errors corrected, improving on the state-of-the-art approach by 5.6% and 2.2%, respectively.	翻訳日:2021-11-03 04:15:22 公開日:2021-11-01
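The VSEC abstract says the model is trained on a large synthetic dataset into which spelling errors are introduced at random. The abstract does not say which error types or rates are used, so the following is only a minimal sketch of the general idea: the three edit operations, the `error_rate` parameter, and the function names `inject_spelling_errors` and `make_training_pairs` are assumptions made for illustration, not the authors' procedure.

```python
import random


def inject_spelling_errors(sentence: str, error_rate: float, rng: random.Random) -> str:
    """Corrupt roughly `error_rate` of the words with one random character edit.

    The edit types (swap, delete, duplicate) are illustrative assumptions.
    """
    corrupted = []
    for word in sentence.split():
        if len(word) > 1 and rng.random() < error_rate:
            i = rng.randrange(len(word) - 1)
            op = rng.choice(("swap", "delete", "duplicate"))
            if op == "swap":
                # Transpose two adjacent characters.
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            elif op == "delete":
                # Drop one character.
                word = word[:i] + word[i + 1:]
            else:
                # Type one character twice.
                word = word[:i] + word[i] + word[i:]
        corrupted.append(word)
    return " ".join(corrupted)


def make_training_pairs(sentences, error_rate, seed):
    """Return (noisy, clean) pairs suitable for a sequence-to-sequence corrector."""
    rng = random.Random(seed)
    return [(inject_spelling_errors(s, error_rate, rng), s) for s in sentences]


if __name__ == "__main__":
    for noisy, clean in make_training_pairs(["tôi đang học tiếng Việt"], 0.5, seed=0):
        print(noisy, "->", clean)
```

Seeding the generator keeps the corrupted corpus reproducible, which helps when comparing models trained on the same synthetic data.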
# (参考訳) trivoc:ロバストなポイントクラウド登録のための効率的な投票ベースのコンセンサス最大化 TriVoC: Efficient Voting-based Consensus Maximization for Robust Point Cloud Registration with Extreme Outlier Ratios ( http://arxiv.org/abs/2111.00657v1 ) ライセンス: CC0 1.0	Lei Sun, Lu Deng	(参考訳) 対応ベースの点雲登録は、ロボットの知覚とコンピュータビジョンの基盤であり、2点雲を対応づける最良の剛性変換を推定することを目指している。しかし、3次元キーポイントマッチングアプローチのロバスト性が限られているため、おそらく大きな数のアウトリアーは対応式の中に存在しがちであり、ロバストな登録手法が必須となる。残念なことに、既存のロバストな手法は、高または極端な外れ値比に直面した場合に、それぞれ独自の制限(高い計算コストや限られたロバスト性)を持つ。本稿では, 頑健な登録問題に対して, TriVoC (Triple-layered Voting with Consensus maximization) という, 高速, 決定論的, 確固な解決法を提案する。最小の3点セットの選択を3つの連続したレイヤに分解し,各レイヤにおいて,ペアワイズ等長制約に基づいて効率的な投票・対応ソートフレームワークを設計する。このように、3点集合は、ソートされたシーケンスに従って縮小された対応集合から独立して選択することができ、計算コストを大幅に下げる一方、確率的終了条件を満たす限り、最大のコンセンサスセット(最終イリヤセット)を達成するための強い保証を提供する。様々な実験によって、我々の解法トライボックは最大99%の外れ値に対して堅牢であり、極端な外れ値比でも精度が高く、時間効率が高く、実世界のアプリケーションでも実用的であり、他の最先端の競合よりもパフォーマンスが優れていることが示された。 Correspondence-based point cloud registration is a cornerstone in robotics perception and computer vision, which seeks to estimate the best rigid transformation aligning two point clouds from the putative correspondences. However, due to the limited robustness of 3D keypoint matching approaches, outliers, probably in large numbers, are prone to exist among the correspondences, which makes robust registration methods imperative. Unfortunately, existing robust methods have their own limitations (e.g. high computational cost or limited robustness) when facing high or extreme outlier ratios, probably unsuitable for practical use. In this paper, we present a novel, fast, deterministic and guaranteed robust solver, named TriVoC (Triple-layered Voting with Consensus maximization), for the robust registration problem. We decompose the selecting of the minimal 3-point sets into 3 consecutive layers, and in each layer we design an efficient voting and correspondence sorting framework on the basis of the pairwise equal-length constraint. 
In this manner, the 3-point sets can be selected independently from the reduced correspondence sets according to the sorted sequence, which can significantly lower the computational cost and meanwhile provide a strong guarantee to achieve the largest consensus set (as the final inlier set) as long as a probabilistic termination condition is fulfilled. Varied experiments show that our solver TriVoC is robust against up to 99% outliers, highly accurate, time-efficient even with extreme outlier ratios, and also practical for real-world applications, showing performance superior to other state-of-the-art competitors.	翻訳日:2021-11-03 04:06:59 公開日:2021-11-01
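The TriVoC abstract builds its voting and sorting on the pairwise equal-length constraint: a rigid transformation preserves distances, so two correct correspondences must have the same point-to-point distance in both clouds. The sketch below shows only that voting primitive, not the triple-layered selection or the termination condition; the function names and the tolerance `tol` are assumptions made for illustration.

```python
import math


def equal_length_votes(src, dst, tol):
    """Count, for each correspondence i, how many others j are length-consistent.

    Correspondence i pairs src[i] with dst[i]. Pair (i, j) is consistent when
    | ||src[i] - src[j]|| - ||dst[i] - dst[j]|| | <= tol.
    """
    n = len(src)
    votes = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(math.dist(src[i], src[j]) - math.dist(dst[i], dst[j]))
            if gap <= tol:
                votes[i] += 1
                votes[j] += 1
    return votes


def sort_by_votes(votes):
    """Return correspondence indices ordered from most to fewest votes."""
    return sorted(range(len(votes)), key=lambda i: votes[i], reverse=True)


if __name__ == "__main__":
    # Four inliers (rotated 90 degrees about z, then shifted by (5, 5, 5))
    # followed by two arbitrary outliers.
    src = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1), (2, 0, 1)]
    dst = [(5, 5, 5), (5, 6, 5), (3, 5, 5), (5, 5, 8), (40, 0, 0), (0, -30, 7)]
    votes = equal_length_votes(src, dst, tol=1e-6)
    print(votes, sort_by_votes(votes))
```

Correspondences that collect many votes are more likely to be inliers, so sampling minimal sets from the top of the sorted sequence reduces the number of hypotheses that must be tried.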
# (参考訳) 新しい特徴的ヒト外観データセットを用いた人間および機械の顔検出の評価 Evaluation of Human and Machine Face Detection using a Novel Distinctive Human Appearance Dataset ( http://arxiv.org/abs/2111.00660v1 ) ライセンス: CC BY 4.0	Necdet Gurkan and Jordan W. Suchow	(参考訳) 顔検出はコンピュータビジョンの分野で長年の課題であり、究極の目標は、制約のない環境で人間の顔を正確にローカライズすることである。ポーズ、解像度、照明、オクルージョン、ビューポイント \cite{merler2019diversity} といった要因が組み合わさっているため、これらのシステムの正確性には大きな技術的ハードルがある。しかし、機械学習の最近の進歩により、顔検出システムは驚くほど精度が向上し、主にデータ駆動のディープラーニングモデルである \cite{wang2017 detectioning} に基づいている。奨励的ではあるが、配備システムの顔検出性能と社会的責任を制限する重要な側面は、人間の外見に固有の多様性である。あらゆる人間の外観は、その遺産、アイデンティティ、経験、自己表現の目に見える表現など、個人に特有の何かを反映している。しかし, 顔の大きさや形状, 肌の色, 体調, 身体の装飾などの違いに直面すると, 顔検出システムの性能に疑問がある。この目的に向けて,表情を低頻度で表現し,顔のデータセットでアンサンプリングされる傾向の強い特徴的人間出現データセットを収集した。そして,これらの画像中の顔を検出する能力について,最先端の顔検出モデルの評価を行った。評価結果は,顔検出アルゴリズムがこれらの多様な外観によく適応していないことを示す。現在の顔検出モデルの評価と特徴付けは、より公平で正確な顔検出システムの構築に向けた研究と開発を加速する。 Face detection is a long-standing challenge in the field of computer vision, with the ultimate goal being to accurately localize human faces in an unconstrained environment. There are significant technical hurdles in making these systems accurate due to confounding factors related to pose, image resolution, illumination, occlusion, and viewpoint \cite{merler2019diversity}. That being said, with recent developments in machine learning, face-detection systems have achieved extraordinary accuracy, largely built on data-driven deep-learning models \cite{wang2017detecting}. Though encouraging, a critical aspect that limits face-detection performance and social responsibility of deployed systems is the inherent diversity of human appearance. Every human appearance reflects something unique about a person, including their heritage, identity, experiences, and visible manifestations of self-expression. However, there are questions about how well face-detection systems perform when faced with varying face size and shape, skin color, body modification, and body ornamentation. 
Towards this goal, we collected the Distinctive Human Appearance dataset, an image set that represents appearances with low frequency and that tend to be undersampled in face datasets. Then, we evaluated current state-of-the-art face-detection models in their ability to detect faces in these images. The evaluation results show that face-detection algorithms do not generalize well to these diverse appearances. Evaluating and characterizing the state of current face-detection models will accelerate research and development towards creating fairer and more accurate face-detection systems.	翻訳日:2021-11-03 03:30:21 公開日:2021-11-01
# (参考訳) 勧告の比較解説 Comparative Explanations of Recommendations ( http://arxiv.org/abs/2111.00670v1 ) ライセンス: CC BY 4.0	Aobo Yang, Nan Wang, Renqin Cai, Hongbo Deng, Hongning Wang	(参考訳) レコメンデーションは基本的に比較(またはランク付け)のプロセスであるため、ある項目が他の項目よりも優れていると信じている理由、すなわち推奨項目に関する比較説明をユーザーに示す必要がある。理想的には、説明を読むと、ユーザーはシステムと同じ項目のランキングに到達すべきである。残念ながら、このような比較説明にはほとんど研究の注意が払われていない。本研究では,レコメンダシステムからランク付けされた項目群間の相対的な比較を説明するための抽出・再定義アーキテクチャを開発した。推奨項目ごとに、まず関連レビューから1つの文を抽出し、参照項目の集合に対して最適な比較を行う。そして、この抽出文を生成モデルを介して対象ユーザに対してさらに調音し、その項目を推奨する理由をよりよく説明する。我々はBLEUに基づく新しい説明品質指標を設計し、汎用コンテンツの生成を避けるために抽出・精錬部品のエンドツーエンドトレーニングを指導する。 2つの大規模レコメンデーションベンチマークデータセットに対する広範囲なオフライン評価と、最先端のレコメンデーションアルゴリズムに対する真面目なユーザ調査は、比較説明の必要性とソリューションの有効性を示している。 As recommendation is essentially a comparative (or ranking) process, a good explanation should illustrate to users why an item is believed to be better than another, i.e., comparative explanations about the recommended items. Ideally, after reading the explanations, a user should reach the same ranking of items as the system's. Unfortunately, little research attention has yet been paid on such comparative explanations. In this work, we develop an extract-and-refine architecture to explain the relative comparisons among a set of ranked items from a recommender system. For each recommended item, we first extract one sentence from its associated reviews that best suits the desired comparison against a set of reference items. Then this extracted sentence is further articulated with respect to the target user through a generative model to better explain why the item is recommended. We design a new explanation quality metric based on BLEU to guide the end-to-end training of the extraction and refinement components, which avoids generation of generic content. 
Extensive offline evaluations on two large recommendation benchmark datasets and serious user studies against an array of state-of-the-art explainable recommendation algorithms demonstrate the necessity of comparative explanations and the effectiveness of our solution.	翻訳日:2021-11-03 03:17:14 公開日:2021-11-01
# (参考訳) 特徴豊かさを有する蒸留物体検出器 Distilling Object Detectors with Feature Richness ( http://arxiv.org/abs/2111.00674v1 ) ライセンス: CC BY 4.0	Zhixing Du, Rui Zhang, Ming Chang, Xishan Zhang, Shaoli Liu, Tianshi Chen, Tianshi Chen	(参考訳) 近年、大規模深層モデルが大きな成功を収めているが、計算の複雑さと巨大なストレージ要件により、リソース制限のあるデバイスにデプロイすることが大きな課題となっている。モデル圧縮・加速法として、知識蒸留は教師検出器から暗黒知識を伝達することにより、小型モデルの性能を効果的に向上させる。しかし、既存の蒸留法に基づく検出法のほとんどは、主に2つの制限がある境界ボックス付近の特徴を模倣している。まず、バウンディングボックスの外にある有益な機能を無視する。第二に、これらの手法は教師検出器によって背景と見なされるいくつかの特徴を模倣する。以上の課題に対処するため,蒸留時の一般化検出性を向上する重要な特徴を選択するために,FRS(Feature-Richness Score)法を提案する。提案手法は,境界ボックスの外にある重要な特徴を効果的に検索し,境界ボックス内の有害な特徴を取り除く。本手法は,アンカーベース,アンカーフリー両検出器において優れた性能を示す。例えば、resnet-50のretinanetはcoco2017データセットのマップで39.7%に達し、resnet-101ベースの教師検出器38.9%を0.8%上回っている。 In recent years, large-scale deep models have achieved great success, but the huge computational complexity and massive storage requirements make it a great challenge to deploy them in resource-limited devices. As a model compression and acceleration method, knowledge distillation effectively improves the performance of small models by transferring the dark knowledge from the teacher detector. However, most of the existing distillation-based detection methods mainly imitating features near bounding boxes, which suffer from two limitations. First, they ignore the beneficial features outside the bounding boxes. Second, these methods imitate some features which are mistakenly regarded as the background by the teacher detector. To address the above issues, we propose a novel Feature-Richness Score (FRS) method to choose important features that improve generalized detectability during distilling. The proposed method effectively retrieves the important features outside the bounding boxes and removes the detrimental features within the bounding boxes. Extensive experiments show that our methods achieve excellent performance on both anchor-based and anchor-free detectors. 
For example, RetinaNet with ResNet-50 achieves 39.7% mAP on the COCO2017 dataset, surpassing even the ResNet-101-based teacher detector (38.9%) by 0.8%.	翻訳日:2021-11-03 02:59:47 公開日:2021-11-01
# (参考訳) 球面埋め込みの領域適応 Domain-adaptation of spherical embeddings ( http://arxiv.org/abs/2111.00677v1 ) ライセンス: CC BY 4.0	Mihalis Gongolidis, Jeremy Minton, Ronin Wu, Valentin Stauber, Jason Hoelscher-Obermaier and Viktor Botev	(参考訳) 特定のドメインの言語に汎用的な埋め込みを更新する埋め込みモデルのドメイン適応は、効果的なモデルをスクラッチからトレーニングするのに不十分なデータを持つドメインにとって実証済みのテクニックである。化学出版物はそのような分野の1つであり、科学用語と過剰な用語が一般的な言語モデルのパフォーマンスを阻害する。近年の arXiv:1911.01196 で提案されている球面埋め込みモデル (JoSE) は,多次元単位球上での訓練において,単語と文書の埋め込みを共同で学習する。しかし、トレーニング中のグローバル回転による非収束は、ドメイン適応を妨げている。本研究では,埋め込み空間のグローバルなローテーションに対応する手法を開発し,ドメイン固有トレーニング中に単語や文書を更新する手法を提案する。 2つの新しい文書分類データセットがgeneral and chemistry scientific journalsから照合され、提案された更新トレーニング戦略とベンチマークモデルを比較する。当社の戦略は、word2vecに似たレベルまでドメイン適応のパフォーマンスコストを削減できることを示します。 Domain adaptation of embedding models, updating a generic embedding to the language of a specific domain, is a proven technique for domains that have insufficient data to train an effective model from scratch. Chemistry publications is one such domain, where scientific jargon and overloaded terminology inhibit the performance of a general language model. The recent spherical embedding model (JoSE) proposed in arXiv:1911.01196 jointly learns word and document embeddings during training on the multi-dimensional unit sphere, which performs well for document classification and word correlation tasks. But, we show a non-convergence caused by global rotations during its training prevents it from domain adaptation. In this work, we develop methods to counter the global rotation of the embedding space and propose strategies to update words and documents during domain specific training. Two new document classification data-sets are collated from general and chemistry scientific journals to compare the proposed update training strategies with benchmark models. We show that our strategies are able to reduce the performance cost of domain adaptation to a level similar to Word2Vec.	翻訳日:2021-11-03 02:44:10 公開日:2021-11-01
# (参考訳) Discourse Comprehension: 文接続を表現するための質問応答フレームワーク Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections ( http://arxiv.org/abs/2111.00701v1 ) ライセンス: CC BY 4.0	Wei-Jen Ko, Cutter Dalton, Mark Simmons, Eliza Fisher, Greg Durrett, Junyi Jessy Li	(参考訳) 単純なファクトイドの質問応答を通じてテキスト理解が大幅に進歩してきたが、談話のより包括的な理解は依然として大きな課題である。テキストを読みながら批判的に反省する人は、好奇心に駆られ、しばしば公然とした質問を提起し、内容の深い理解を反映し、答えるには複雑な推論を必要とする。この種の談話理解のためのモデルを構築して評価する上での重要な課題は、注釈付きデータの欠如である。本稿では,ニュース文書の理解を目的としたスケーラブルなデータ収集を実現するための新しいパラダイムを提案する。得られたコーパスであるDCQA(Discourse Comprehension by Question Answering)は、607の英語文書からなる22,430の質問回答ペアで構成されている。 DCQAは、言論と文間のセマンティックリンクの両方を自由形式のオープンエンドの質問形式でキャプチャする。 INQUISITIVEデータセットからの質問に注釈を付けた評価セットでは、DCQAがオープンな質問に答えるための貴重な監視を提供することを示す。さらに,既存の質問応答資源を活用した事前学習手法を設計,合成データを用いて不可解な質問に適応する。 While there has been substantial progress in text comprehension through simple factoid question answering, more holistic comprehension of a discourse still presents a major challenge. Someone critically reflecting on a text as they read it will pose curiosity-driven, often open-ended questions, which reflect deep understanding of the content and require complex reasoning to answer. A key challenge in building and evaluating models for this type of discourse comprehension is the lack of annotated data, especially since finding answers to such questions (which may not be answered at all) requires high cognitive load for annotators over long documents. This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents, viewing these questions through the lens of discourse. The resulting corpus, DCQA (Discourse Comprehension by Question Answering), consists of 22,430 question-answer pairs across 607 English documents. DCQA captures both discourse and semantic links between sentences in the form of free-form, open-ended questions. 
On an evaluation set that we annotated on questions from the INQUISITIVE dataset, we show that DCQA provides valuable supervision for answering open-ended questions. We additionally design pre-training methods utilizing existing question-answering resources, and use synthetic data to accommodate unanswerable questions.	翻訳日:2021-11-03 02:41:12 公開日:2021-11-01
# (参考訳) アウトラインとファイリング: ナレッジグラフ上の複雑な質問に答える階層的クエリグラフ生成 Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph ( http://arxiv.org/abs/2111.00732v1 ) ライセンス: CC BY 4.0	Yongrui Chen, Huiying Li, Guilin Qi, Tianxing Wu, and Tenggou Wang	(参考訳) クエリグラフの構築は、自然言語の質問に答えるナレッジグラフ上で、正確な実行可能なSPARQLを構築することを目的としている。最近のアプローチはnnベースのクエリグラフのランキングでうまく機能しているが、複雑なsparql構文、ランキングのための巨大な検索スペース、ローカルなあいまいさを伴うノイズの多いクエリグラフという3つの新しい課題がある。本稿ではこれらの課題に対処する。当初、一般的な複雑なSPARQL構文を頂点とエッジからなるサブグラフとみなし、それらを適応するための新しい統一クエリグラフ文法を提案する。次に,問合せグラフを構築するための新しい二段階アプローチを提案する。第1段階では、最上位の$k$関連インスタンス(エンティティ、関係など)は、候補インスタンスとして単純な戦略によって収集される。第2段階では、グラフ生成モデルが階層生成を行う。まず、頂点とエッジが空のスロットであるグラフ構造を概説し、次に適切なインスタンスをスロットに埋め、クエリグラフを完成させる。このアプローチでは,クエリグラフ全体の耐え難い検索空間を手頃なサブスペースに分解する一方で,グローバル構造情報を活用して局所曖昧性を排除する。実験結果から,本手法は最も難しいKGQAベンチマークの最先端性を大幅に向上し,複雑な問題に対して優れた性能を示すことが示された。 Query graph building aims to build correct executable SPARQL over the knowledge graph for answering natural language questions. Although recent approaches perform well by NN-based query graph ranking, more complex questions bring three new challenges: complicated SPARQL syntax, huge search space for ranking, and noisy query graphs with local ambiguity. This paper handles these challenges. Initially, we regard common complicated SPARQL syntax as the sub-graphs comprising of vertices and edges and propose a new unified query graph grammar to adapt them. Subsequently, we propose a new two-stage approach to build query graphs. In the first stage, the top-$k$ related instances (entities, relations, etc.) are collected by simple strategies, as the candidate instances. In the second stage, a graph generation model performs hierarchical generation. It first outlines a graph structure whose vertices and edges are empty slots, and then fills the appropriate instances into the slots, thereby completing the query graph. 
Our approach decomposes the intractable search space of entire query graphs into affordable sub-spaces of operations, while leveraging global structural information to eliminate local ambiguity. The experimental results demonstrate that our approach greatly improves on the state of the art on the hardest KGQA benchmarks and performs well on complex questions.	翻訳日:2021-11-03 02:24:56 公開日:2021-11-01
# (参考訳) 信念を広める集団からのロバストな深層学習 Robust Deep Learning from Crowds with Belief Propagation ( http://arxiv.org/abs/2111.00734v1 ) ライセンス: CC BY 4.0	Hoyoung Kim, Seunghyuk Cho, Dongwoo Kim, Jungseul Ok	(参考訳) クラウドソーシングシステムにより、クラウドワーカーから騒がしいラベルを収集できます。ワーカとタスク間のローカル依存関係を表すグラフィカルモデルは、ノイズの多い回答から真のラベルを推論する原則的な方法を提供する。しかし、多くの場合、真のラベルではなく、クラウドソースされたデータセットから直接見えないデータに取り組んでいる予測モデルが必要です。真のラベルを推論し、同時に予測モデルを学習するために、ニューラルネットワークがタスク特徴から真のラベルを生成する新しいデータ生成プロセスを提案する。変動推論と深層学習を交互に交互に行うEMフレームワークを考案し,真のラベルを推測し,ニューラルネットワークを更新する。合成データと実データを用いた実験結果から,信念伝達に基づくemアルゴリズムは頑健であることが分かる。一業務の特徴の腐敗二前任のマルチモーダル又はミスマッチ労働者、及び三多くの業務に騒音を提出するスパマーは少ない。 Crowdsourcing systems enable us to collect noisy labels from crowd workers. A graphical model representing local dependencies between workers and tasks provides a principled way of reasoning over the true labels from the noisy answers. However, one needs a predictive model working on unseen data directly from crowdsourced datasets instead of the true labels in many cases. To infer true labels and learn a predictive model simultaneously, we propose a new data-generating process, where a neural network generates the true labels from task features. We devise an EM framework alternating variational inference and deep learning to infer the true labels and to update the neural network, respectively. Experimental results with synthetic and real datasets show a belief-propagation-based EM algorithm is robust to i) corruption in task features, ii) multi-modal or mismatched worker prior, and iii) few spammers submitting noises to many tasks.	翻訳日:2021-11-03 01:50:52 公開日:2021-11-01
# (参考訳) URIR:知識グラフに基づくユーザRNNエンコーダとアイテムエンコーダの推薦アルゴリズム URIR: Recommendation algorithm of user RNN encoder and item encoder based on knowledge graph ( http://arxiv.org/abs/2111.00739v1 ) ライセンス: CC BY 4.0	Na zhao, Zhen Long, Zhi-Dan Zhao, Jian Wang	(参考訳) 情報量が多いため,ユーザが興味を持っているものを見つけることは困難である。ユーザ体験を改善するため,音楽レコメンデーションや映画レコメンデーション,オンラインショッピングなどのシナリオで広く利用されている。近年,知識グラフ(KG)はレコメンデーションシステムの性能向上に有効なツールであることが証明されている。しかし、レコメンデーションにナレッジグラフを適用する際の大きな課題は、ナレッジグラフを使ってより良いユーザコードやアイテムコードを取得する方法である。そこで本研究では,知識グラフ(URIR)に基づくユーザリカレントニューラルネットワーク(RNN)エンコーダとアイテムエンコーダ推薦アルゴリズムを提案する。本研究は,アイテムの表現ベクトルを生成するために高レベルな隣接情報をキャプチャしてアイテムをエンコードし,ユーザの表現ベクトルを生成するためにrnnおよびアイテムの表現ベクトルを適用し,ユーザの表現ベクトルおよびアイテムの表現ベクトルに対して内部積演算を行い,アイテムとのインタラクションの確率を得る。 3つの実世界のデータセットに関する数値実験により、URIRはAUC、Precision、Recall、MRRなどの指標における最先端アルゴリズムよりも優れた性能を示している。これは、urirがナレッジグラフを効果的に使用して、より良いユーザコードとアイテムコードを取得し、よりよい推奨結果を得ることができることを意味する。 Due to a large amount of information, it is difficult for users to find what they are interested in among the many choices. In order to improve users' experience, recommendation systems have been widely used in music recommendations, movie recommendations, online shopping, and other scenarios. Recently, Knowledge Graph (KG) has been proven to be an effective tool to improve the performance of recommendation systems. However, a huge challenge in applying knowledge graphs for recommendation is how to use knowledge graphs to obtain better user codes and item codes. In response to this problem, this research proposes a user Recurrent Neural Network (RNN) encoder and item encoder recommendation algorithm based on Knowledge Graph (URIR). This study encodes items by capturing high-level neighbor information to generate items' representation vectors and applies an RNN and items' representation vectors to encode users to generate users' representation vectors, and then perform inner product operation on users' representation vectors and items' representation vectors to get probabilities of users interaction with items. 
Numerical experiments on three real-world datasets demonstrate that URIR outperforms state-of-the-art algorithms on metrics such as AUC, Precision, Recall, and MRR. This implies that URIR can effectively use the knowledge graph to obtain better user codes and item codes, thereby obtaining better recommendation results.	翻訳日:2021-11-03 01:28:17 公開日:2021-11-01
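The final scoring step described in the URIR abstract, an inner product between a user representation vector and an item representation vector that yields an interaction probability, can be sketched as follows. The abstract does not say how the inner product is mapped to a probability; the sigmoid used here is an assumption, and the vectors are made-up numbers rather than outputs of the paper's encoders.

```python
import math


def interaction_probability(user_vec, item_vec):
    """Inner product of the two vectors, squashed to (0, 1) by a sigmoid (assumed)."""
    score = sum(u * v for u, v in zip(user_vec, item_vec))
    # Numerically stable sigmoid for both signs of the score.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def rank_items(user_vec, item_vecs):
    """Return item ids sorted by predicted interaction probability, best first."""
    return sorted(
        item_vecs,
        key=lambda item_id: interaction_probability(user_vec, item_vecs[item_id]),
        reverse=True,
    )


if __name__ == "__main__":
    user = [0.5, -1.0, 2.0]
    items = {"a": [1.0, 0.0, 1.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 0.0]}
    print(rank_items(user, items))
```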
# (参考訳) 3次元脳腫瘍MRIのセマンティックセグメンテーションにおける冗長性の削減 Redundancy Reduction in Semantic Segmentation of 3D Brain Tumor MRIs ( http://arxiv.org/abs/2111.00742v1 ) ライセンス: CC BY 4.0	Md Mahfuzur Rahman Siddiquee, Andriy Myronenko	(参考訳) multimodal brain tumor segmentation challenge (brats) 2021では、さらに大きなデータセットが提供され、疾患の分析と治療計画に必要な脳腫瘍セグメンテーション手法の協力と研究を容易にする。 BraTS 2021の大規模なデータセットサイズと現代的なGPUの出現は、データから腫瘍表現を学ぶためのディープラーニングベースのアプローチによりよい機会を提供する。本研究では,エンコーダ・デコーダに基づくセグメンテーションネットワークを維持しつつ,摂動下での冗長性を最小限に抑えるネットワークトレーニングプロセスの改良に焦点をあてた。訓練済みのネットワークの集合に対して、信頼性に基づくアンサンブル技術を導入し、パフォーマンスをさらに向上する。本手法をBraTS 2021検証ボード上で評価し, 造影腫瘍コア, 腫瘍コア, 腫瘍全体に対してそれぞれ平均ダイス0.8600, 0.8868, 0.9265を得た。私たちのチーム(NVAUTO)の応募は、ETとTCのスコアで首位、WTのスコアで上位10チーム以内だった。 Another year of the multimodal brain tumor segmentation challenge (BraTS) 2021 provides an even larger dataset to facilitate collaboration and research of brain tumor segmentation methods, which are necessary for disease analysis and treatment planning. The large dataset size of BraTS 2021 and the advent of modern GPUs provide a better opportunity for deep-learning based approaches to learn tumor representation from the data. In this work, we maintained an encoder-decoder based segmentation network, but focused on a modification of the network training process that minimizes redundancy under perturbations. Given a set of trained networks, we further introduce a confidence-based ensembling technique to improve the performance. We evaluated the method on the BraTS 2021 validation board, and achieved 0.8600, 0.8868 and 0.9265 average dice for enhanced tumor core, tumor core and whole tumor, respectively. Our team (NVAUTO) submission was the top performing in terms of ET and TC scores and within the top 10 performing teams in terms of WT scores.	翻訳日:2021-11-03 01:16:53 公開日:2021-11-01
# (参考訳) 品質評価データセットを効率的に生成する新しいツール A New Tool for Efficiently Generating Quality Estimation Datasets ( http://arxiv.org/abs/2111.00767v1 ) ライセンス: CC BY 4.0	Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim	(参考訳) 品質推定(QE)トレーニングのためのデータの構築は費用がかかり、かなりの人的労力を要する。本研究では、qeを実行しながらデータ中心のアプローチに注目し、入力として単言語または並列コーパスのみを受信してqeデータセットを生成する完全自動擬似qeデータセット生成ツールを提案する。これにより、データ拡張または複数の言語ペアにQEの適用性を活用するように促すことにより、QE性能が向上する。さらに、このツールがコミュニティにQEデータセットを開発するための新しい安価な方法を提供すると考えているので、ユーザフレンドリーなQEデータセット生成ツールを公開するつもりです。 Building of data for quality estimation (QE) training is expensive and requires significant human labor. In this study, we focus on a data-centric approach while performing QE, and subsequently propose a fully automatic pseudo-QE dataset generation tool that generates QE datasets by receiving only monolingual or parallel corpus as the input. Consequently, the QE performance is enhanced either by data augmentation or by encouraging multiple language pairs to exploit the applicability of QE. Further, we intend to publicly release this user friendly QE dataset generation tool as we believe this tool provides a new, inexpensive method to the community for developing QE datasets.	翻訳日:2021-11-03 01:09:47 公開日:2021-11-01
# (参考訳) AdaPool: 情報保持ダウンサンプリングのための指数適応型プール AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling ( http://arxiv.org/abs/2111.00772v1 ) ライセンス: CC BY 4.0	Alexandros Stergiou and Ronald Poppe	(参考訳) プール層は畳み込みニューラルネットワーク(cnns)の重要な構成要素であり、計算オーバーヘッドを削減し、畳み込み操作の受容野を増加させる。彼らは入力ボリュームによく似たサンプル化されたボリュームを作成し、理想的には計算とメモリ効率の両立を目指している。両方の要件を共同で満たすことは困難である。この目的のために、適応的で指数関数的に重み付けされたプール法である $\textit{adaPool}$ を提案する。提案手法では,dice-sorensen係数の指数値と指数最大値に基づく2組のプーリングカーネルのパラメータ化融合を用いる。 adaPoolの重要な性質は、その双方向性である。一般的なプーリング法とは対照的に、ウェイトはダウンサンプリングされたアクティベーションマップをアップサンプルするために使うことができる。このメソッドを $\textit{adaUnPool}$ とします。 adaPoolは画像やビデオの分類やオブジェクト検出など,さまざまなタスクを通じて,ディテールの保存性の向上を実証する。次に,画像および映像フレームの超解像とフレーム補間タスクにおけるadaunpoolの評価を行う。ベンチマークには、新しい高品質で高フレームレートのビデオデータセットである$\textit{Inter4K}$を導入する。組み合わせた実験により、adaPoolはタスクやバックボーンアーキテクチャにまたがる優れた結果を体系的に達成し、微妙な計算とメモリオーバーヘッドを発生させることを示した。 Pooling layers are essential building blocks of Convolutional Neural Networks (CNNs) that reduce computational overhead and increase the receptive fields of proceeding convolutional operations. They aim to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. It is a challenge to meet both requirements jointly. To this end, we propose an adaptive and exponentially weighted pooling method named $\textit{adaPool}$. Our proposed method uses a parameterized fusion of two sets of pooling kernels that are based on the exponent of the Dice-Sorensen coefficient and the exponential maximum, respectively. A key property of adaPool is its bidirectional nature. In contrast to common pooling methods, weights can be used to upsample a downsampled activation map. We term this method $\textit{adaUnPool}$. We demonstrate how adaPool improves the preservation of detail through a range of tasks including image and video classification and object detection. We then evaluate adaUnPool on image and video frame super-resolution and frame interpolation tasks. 
For benchmarking, we introduce $\textit{Inter4K}$, a novel high-quality, high frame-rate video dataset. Our combined experiments demonstrate that adaPool systematically achieves better results across tasks and backbone architectures, while introducing a minor additional computational and memory overhead.	翻訳日:2021-11-03 01:05:43 公開日:2021-11-01
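The AdaPool abstract describes a parameterized fusion of two pooling kernels, one weighted by the exponent of the Dice-Sorensen coefficient and one by the exponential maximum. The sketch below illustrates that fusion on a single pooling region of scalar activations. It is a simplification under stated assumptions: scalar activations instead of channel vectors, a fixed `beta` instead of a learned one, and a Dice-Sorensen value of 1.0 when both operands are zero. The function names are illustrative and not taken from the authors' code.

```python
import math


def _softmax(values):
    """Exponential weights that sum to one (shifted by the max for stability)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def em_pool(region):
    """Exponential-maximum pooling: activations weighted by their own softmax."""
    weights = _softmax(region)
    return sum(w * a for w, a in zip(weights, region))


def edscw_pool(region):
    """Pooling weighted by exp(Dice-Sorensen similarity to the region mean)."""
    mean = sum(region) / len(region)

    def dsc(a):
        denom = a * a + mean * mean
        # Assumption: two zeros are treated as identical (similarity 1.0).
        return 2.0 * a * mean / denom if denom else 1.0

    weights = _softmax([dsc(a) for a in region])
    return sum(w * a for w, a in zip(weights, region))


def ada_pool(region, beta):
    """Fuse the two kernels; beta in [0, 1] would be learned in a real network."""
    return beta * edscw_pool(region) + (1.0 - beta) * em_pool(region)


if __name__ == "__main__":
    region = [0.0, 1.0, 2.0, 3.0]  # one flattened 2x2 pooling window
    print(em_pool(region), edscw_pool(region), ada_pool(region, beta=0.5))
```

With `beta = 0` the result leans fully on the largest activations, and with `beta = 1` it favors activations close to the regional mean, which is the trade-off the fusion parameter is meant to balance.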
# (参考訳) AIを利用した支払いシステムのためのスマートルーティングソリューション An AI-powered Smart Routing Solution for Payment Systems ( http://arxiv.org/abs/2111.00783v1 ) ライセンス: CC0 1.0	Ramya Bygari, Aayush Gupta, Shashwat Raghuvanshi, Aakanksha Bapna, Birendra Sahu	(参考訳) 現在のデジタル化時代には、オンライン決済システムがかなりの関心を集めている。支払いシステムの効率性の向上は、ビジネスの収益に大きな影響を与えるため重要である。ゲートウェイは、すべてのトランザクションがルーティングされる支払いシステムの不可欠なコンポーネントである。オンライン決済システムでは、支払い処理は価格、方法、リスクチェックなど様々な設定によってこれらのゲートウェイと統合される。これらの構成を端末と呼ぶ。各ゲートウェイには複数の端末が関連付けられる。支払いトランザクションを最良の端末にルーティングすることは、支払いトランザクションが成功する確率を高めるために不可欠である。機械学習(ML)と人工知能(AI)技術は、過去のパフォーマンスと様々な支払い関連属性に基づいて、最適な端末を正確に予測するために使用することができる。我々は静的モジュールと動的モジュールからなるパイプラインを考案した。静的モジュールは、静的ルールとゲートウェイのダウンタイムを予測するロジスティック回帰モデルを使用して、端末の初期フィルタリングを行う。その後、動的モジュールは成功率、支払い属性、タイムラグなどに基づいて多くの新しい特徴を計算し、端末動作を正確にモデル化する。これらの特徴を適応時間減衰率アルゴリズムを用いてリアルタイムにフィードバックループを用いて更新し、ランダムフォレスト分類器に渡して端末毎の成功確率を予測する。このパイプラインは現在razorpayで運用中であり、数百万のトランザクションをリアルタイムにルーティングし、すべての支払い方法(クレジットカード、デビットカード、upi、ネットバンキング)で成功率を4-6\%向上させている。これにより、当社の決済システムはパフォーマンス低下に対する耐性が向上し、ユーザエクスペリエンスが向上し、商人への信頼が増し、ビジネスの収益が向上しました。 In the current era of digitization, online payment systems are attracting considerable interest. Improving the efficiency of a payment system is important since it has a substantial impact on revenues for businesses. A gateway is an integral component of a payment system through which every transaction is routed. In an online payment system, payment processors integrate with these gateways by means of various configurations such as pricing, methods, risk checks, etc. These configurations are called terminals. Each gateway can have multiple terminals associated with it. Routing a payment transaction through the best terminal is crucial to increase the probability of a payment transaction being successful. Machine learning (ML) and artificial intelligence (AI) techniques can be used to accurately predict the best terminals based on their previous performance and various payment-related attributes. 
We have devised a pipeline consisting of static and dynamic modules. The static module does the initial filtering of the terminals using static rules and a logistic regression model that predicts gateway downtimes. Subsequently, the dynamic module computes many novel features based on success rate, payment attributes, time lag, etc. to model the terminal behaviour accurately. These features are updated in real time through a feedback loop using an adaptive time decay rate algorithm, and are passed to a random forest classifier to predict the success probability for every terminal. This pipeline is currently in production at Razorpay, routing millions of transactions in real time, and has given a 4-6% improvement in success rate across all payment methods (credit card, debit card, UPI, net banking). This has made our payment system more resilient to performance drops, which has improved the user experience, instilled more trust in the merchants, and boosted the revenue of the business.	翻訳日:2021-11-03 00:41:23 公開日:2021-11-01
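One of the dynamic features named in the payment-routing abstract is a success rate that is updated with a time decay. The abstract does not give the decay rule, and the deployed algorithm adapts its decay rate, so the sketch below is only an illustration with a fixed exponential half-life. The names `decayed_success_rate` and `pick_terminal`, the `half_life` parameter, and the `prior` fallback are assumptions; the random forest stage is not shown.

```python
def decayed_success_rate(events, now, half_life, prior=0.5):
    """Success rate in which an event's weight halves every `half_life` time units.

    `events` is an iterable of (timestamp, succeeded) pairs for one terminal.
    `prior` is returned when the terminal has no history (assumed default).
    """
    weighted_success = 0.0
    total_weight = 0.0
    for timestamp, succeeded in events:
        weight = 0.5 ** ((now - timestamp) / half_life)
        total_weight += weight
        if succeeded:
            weighted_success += weight
    return weighted_success / total_weight if total_weight else prior


def pick_terminal(history, now, half_life):
    """Return the terminal id with the highest time-decayed success rate."""
    return max(
        history,
        key=lambda terminal: decayed_success_rate(history[terminal], now, half_life),
    )


if __name__ == "__main__":
    history = {
        # Same raw success rate (50%), but A failed recently and B recovered.
        "A": [(0, True), (10, True), (90, False), (100, False)],
        "B": [(0, False), (10, False), (90, True), (100, True)],
    }
    print(pick_terminal(history, now=100, half_life=20))
```

Weighting recent outcomes more heavily lets the router react to a gateway that has just started failing, which a plain historical average would hide.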
# (参考訳) 大規模製品ネットワークにおける動的価格と需要学習 : PAC-Bayesianアプローチ Dynamic Pricing and Demand Learning on a Large Network of Products: A PAC-Bayesian Approach ( http://arxiv.org/abs/2111.00790v1 ) ライセンス: CC BY 4.0	Bora Keskin, David Simchi-Levi, Prem Talwai	(参考訳) 私たちは、T$の期間でN$製品の大規模なネットワークを提供する売り手を考えています。販売者は、製品の線形需要モデルのパラメータを知らず、販売観察に基づいて需要モデルを学ぶために製品価格を動的に調整することができる。売り手は、その疑似レグレット、すなわち、基盤となる需要モデルを知っている透視能力者に対する期待収益損失を最小化することを目指している。我々は,製品ネットワークの様々な接続特性を特徴付けるために,製品間の需要関係のばらばらな集合を考える。特に,(1)ネットワーク上の接続数を制限する$l_0$ sparsity,(2)クロスプロダクト価格感受性の大きさを制約する対角的スパーシティ,(3)ネットワークノード上の類似度メトリックの漸近的減衰を制約するスペクトルスパーシティという新たな概念の3つの異なるスパーシティフレームワークについて検討した。我々は,不確実性とpac-bayesianアプローチの楽観性を組み合わせた動的価格学習政策を提案し,この方針がn$と$t$の漸近的最適性能を達成することを示す。また,スペクトル・非対角性の場合,ネットワークが密集している場合でも,売り手は疑似レグレット線形をn$で得ることができることを示した。 We consider a seller offering a large network of $N$ products over a time horizon of $T$ periods. The seller does not know the parameters of the products' linear demand model, and can dynamically adjust product prices to learn the demand model based on sales observations. The seller aims to minimize its pseudo-regret, i.e., the expected revenue loss relative to a clairvoyant who knows the underlying demand model. We consider a sparse set of demand relationships between products to characterize various connectivity properties of the product network. In particular, we study three different sparsity frameworks: (1) $L_0$ sparsity, which constrains the number of connections in the network, and (2) off-diagonal sparsity, which constrains the magnitude of cross-product price sensitivities, and (3) a new notion of spectral sparsity, which constrains the asymptotic decay of a similarity metric on network nodes. We propose a dynamic pricing-and-learning policy that combines the optimism-in-the-face-of-uncertainty and PAC-Bayesian approaches, and show that this policy achieves asymptotically optimal performance in terms of $N$ and $T$. 
We also show that in the case of spectral and off-diagonal sparsity, the seller can have a pseudo-regret linear in $N$, even when the network is dense.	翻訳日:2021-11-03 00:31:57 公開日:2021-11-01
# (参考訳) 局所シナプス塑性によるイベントベース時空間特徴記述子学習:コンピュータビジョンの生物学的現実的視点 Learning Event-based Spatio-Temporal Feature Descriptors via Local Synaptic Plasticity: A Biologically-realistic Perspective of Computer Vision ( http://arxiv.org/abs/2111.00791v1 ) ライセンス: CC BY 4.0	Ali Safa, Hichem Sahli, Andr\'e Bourdoux, Ilja Ocket, Francky Catthoor, Georges Gielen	(参考訳) 視覚野で経験的に観察されるように,スパイクタイミング依存塑性学習(STDP)を用いたスパイク皮質アンサンブルを最適化した理論を提案する。提案手法を用いて,N-MNIST,CIFAR10-DVS,IBM DVS128ジェスチャデータセットでそれぞれ評価するイベントベースカメラのための,完全接続型,畳み込み型,アクションベースの機能記述器のクラスを構築した。 CIFAR10-DVSでは,従来のイベントベースの特徴記述子 (+8%) と比較して, 精度が向上した。最新のSTDPシステムに比べて精度が大幅に向上した(N-MNISTでは+10%、IBM DVS128 Gestureでは+7.74%)。ニューロモルフィックエッジデバイスにおける超低消費電力学習に加えて、私たちの研究は、生物学的に現実的で最適化に基づく皮質視覚の理論への道を開くのに役立ちます。 We present an optimization-based theory describing spiking cortical ensembles equipped with Spike-Timing-Dependent Plasticity (STDP) learning, as empirically observed in the visual cortex. Using our methods, we build a class of fully-connected, convolutional and action-based feature descriptors for event-based camera that we respectively assess on N-MNIST, challenging CIFAR10-DVS and on the IBM DVS128 gesture dataset. We report significant accuracy improvements compared to conventional state-of-the-art event-based feature descriptors (+8% on CIFAR10-DVS). We report large improvements in accuracy compared to state-of-the-art STDP-based systems (+10% on N-MNIST, +7.74% on IBM DVS128 Gesture). In addition to ultra-low-power learning in neuromorphic edge devices, our work helps paving the way towards a biologically-realistic, optimization-based theory of cortical vision.	翻訳日:2021-11-03 00:30:49 公開日:2021-11-01
# Statistical quantification of confounding bias in predictive modelling ( http://arxiv.org/abs/2111.00814v1 ) License: CC BY 4.0	Tamas Spisak	The lack of non-parametric statistical tests for confounding bias significantly hampers the development of robust, valid and generalizable predictive models in many fields of research. Here I propose the partial and full confounder tests, which, for a given confounder variable, probe the null hypotheses of unconfounded and fully confounded models, respectively. The tests provide a strict control for Type I errors and high statistical power, even for non-normally and non-linearly dependent predictions, often seen in machine learning. Applying the proposed tests on models trained on functional brain connectivity data from the Human Connectome Project and the Autism Brain Imaging Data Exchange dataset reveals confounders that were previously unreported or found to be hard to correct for with state-of-the-art confound mitigation approaches. The tests, implemented in the package mlconfound (https://mlconfound.readthedocs.io), can aid the assessment and improvement of the generalizability and neurobiological validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers.	Translation date: 2021-11-03 00:12:48 Publication date: 2021-11-01
# Towards Reformulating Essence Specifications for Robustness ( http://arxiv.org/abs/2111.00821v1 ) License: CC BY-SA 4.0	Özgür Akgün, Alan M. Frisch, Ian P. Gent, Christopher Jefferson, Ian Miguel, Peter Nightingale, András Z. Salamon	The Essence language allows a user to specify a constraint problem at a level of abstraction above that at which constraint modelling decisions are made. Essence specifications are refined into constraint models using the Conjure automated modelling tool, which employs a suite of refinement rules. However, Essence is a rich language in which there are many equivalent ways to specify a given problem. A user may therefore omit the use of domain attributes or abstract types, resulting in fewer refinement rules being applicable and therefore a reduced set of output models from which to select. This paper addresses the problem of recovering this information automatically to increase the robustness of the quality of the output constraint models in the face of variation in the input Essence specification. We present reformulation rules that can change the type of a decision variable or add attributes that shrink its domain. We demonstrate the efficacy of this approach in terms of the quantity and quality of models Conjure can produce from the transformed specification compared with the original.	Translation date: 2021-11-02 23:47:04 Publication date: 2021-11-01
# Teaching Fairness, Accountability, Confidentiality, and Transparency in Artificial Intelligence through the Lens of Reproducibility ( http://arxiv.org/abs/2111.00826v1 ) License: CC BY 4.0	Ana Lucic, Maurits Bleeker, Sami Jullien, Samarth Bhargav, Maarten de Rijke	In this work we explain the setup for a technical, graduate-level course on Fairness, Accountability, Confidentiality and Transparency in Artificial Intelligence (FACT-AI) at the University of Amsterdam, which teaches FACT-AI concepts through the lens of reproducibility. The focal point of the course is a group project based on reproducing existing FACT-AI algorithms from top AI conferences, and writing a report about their experiences. In the first iteration of the course, we created an open source repository with the code implementations from the group projects. In the second iteration, we encouraged students to submit their group projects to the Machine Learning Reproducibility Challenge, which resulted in 9 reports from our course being accepted to the challenge. We reflect on our experience teaching the course over two academic years, where one year coincided with a global pandemic, and propose guidelines for teaching FACT-AI through reproducibility in graduate-level AI programs. We hope this can be a useful resource for instructors to set up similar courses at their universities in the future.	Translation date: 2021-11-02 23:37:47 Publication date: 2021-11-01
# Deep Learning Transformer Architecture for Named Entity Recognition on Low Resourced Languages: State of the art results ( http://arxiv.org/abs/2111.00830v1 ) License: CC BY 4.0	Ridewaan Hanslo	This paper reports on the evaluation of Deep Learning (DL) transformer architecture models for Named-Entity Recognition (NER) on ten low-resourced South African (SA) languages. In addition, these DL transformer models were compared to other Neural Network and Machine Learning (ML) NER models. The findings show that transformer models significantly improve performance when applying discrete fine-tuning parameters per language. Furthermore, fine-tuned transformer models outperform other neural network and machine learning models with NER on the low-resourced SA languages. For example, the transformer models generated the highest F-scores for six of the ten SA languages, including the highest average F-score surpassing the Conditional Random Fields ML model. Additional research could evaluate the more recent transformer architecture models on other Natural Language Processing tasks and applications, such as phrase chunking, machine translation, and part-of-speech tagging.	Translation date: 2021-11-02 23:21:37 Publication date: 2021-11-01
# Swift sky localization of gravitational waves using deep learning seeded importance sampling ( http://arxiv.org/abs/2111.00833v1 ) License: CC BY 4.0	Alex Kolmus, Grégory Baltus, Justin Janquart, Twan van Laarhoven, Sarah Caudill, and Tom Heskes	Fast, highly accurate, and reliable inference of the sky origin of gravitational waves would enable real-time multi-messenger astronomy. Current Bayesian inference methodologies, although highly accurate and reliable, are slow. Deep learning models have shown themselves to be accurate and extremely fast for inference tasks on gravitational waves, but their output is inherently questionable due to the black-box nature of neural networks. In this work, we join Bayesian inference and deep learning by applying importance sampling on an approximate posterior generated by a multi-headed convolutional neural network. The neural network parametrizes Von Mises-Fisher and Gaussian distributions for the sky coordinates and two masses for given simulated gravitational wave injections in the LIGO and Virgo detectors. We generate skymaps for unseen gravitational-wave events that highly resemble predictions generated using Bayesian inference in a few minutes. Furthermore, we can detect poor predictions from the neural network, and quickly flag them.	Translation date: 2021-11-02 23:10:35 Publication date: 2021-11-01
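The idea of reweighting samples from an approximate posterior, and flagging cases where the approximation is poor, can be sketched in one dimension. This is a toy illustration, not the authors' pipeline: the Gaussian proposal stands in for a network output, the target density is hypothetical, and the effective-sample-size check is one common diagnostic that may differ from the paper's own criterion.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def importance_sample(log_target, prop_mean, prop_std, n, rng):
    """Draw n proposal samples and return (samples, self-normalized weights)."""
    samples = [rng.gauss(prop_mean, prop_std) for _ in range(n)]
    log_w = [log_target(x) - log_normal_pdf(x, prop_mean, prop_std) for x in samples]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return samples, [wi / total for wi in w]


def effective_sample_fraction(weights):
    """Kish effective sample size divided by the number of samples."""
    return 1.0 / (len(weights) * sum(w * w for w in weights))


def log_target(x):
    # Hypothetical unnormalized posterior: Gaussian with mean 1.0 and std 0.5.
    return -0.5 * ((x - 1.0) / 0.5) ** 2


rng = random.Random(0)

# A proposal close to the target: most samples carry useful weight.
xs, ws = importance_sample(log_target, 0.9, 0.7, 5000, rng)
good_mean = sum(x * w for x, w in zip(xs, ws))
good_frac = effective_sample_fraction(ws)

# A proposal far from the target: a handful of samples dominate the weights.
_, ws_bad = importance_sample(log_target, 4.0, 0.5, 5000, rng)
bad_frac = effective_sample_fraction(ws_bad)
```

A low effective sample fraction signals that the proposal missed the bulk of the target, which is the kind of situation one would want to flag before trusting the reweighted estimate.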
# Simulating Realistic MRI variations to Improve Deep Learning model and visual explanations using GradCAM ( http://arxiv.org/abs/2111.00837v1 ) License: CC BY 4.0	Muhammad Ilyas Patel, Shrey Singla, Razeem Ahmad Ali Mattathodi, Sumit Sharma, Deepam Gautam, Srinivasa Rao Kundeti	In the medical field, landmark detection in MRI plays an important role in reducing medical technician efforts in tasks like scan planning, image registration, etc. First, 88 landmarks spread across the brain anatomy in the three respective views -- sagittal, coronal, and axial -- are manually annotated. Guidelines from expert clinical technicians are then taken sub-anatomy-wise, for better localization of the existing landmarks, in order to identify and locate the important atlas landmarks even in oblique scans. To overcome limited data availability, we implement realistic data augmentation to generate synthetic 3D volumetric data. We use a modified HighRes3DNet model to solve the brain MRI volumetric landmark detection problem. In order to visually explain our trained model on unseen data, and discern a stronger model from a weaker model, we implement Gradient-weighted Class Activation Mapping (Grad-CAM), which produces a coarse localization map highlighting the regions the model is focusing on. Our experiments show that the proposed method shows favorable results, and the overall pipeline can be extended to a variable number of landmarks and other anatomies.	Translation date: 2021-11-02 22:54:26 Publication date: 2021-11-01
# Large-Scale Deep Learning Optimizations: A Comprehensive Survey ( http://arxiv.org/abs/2111.00856v1 ) License: CC BY 4.0	Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You	Deep learning has achieved promising results on a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, we generally spend longer training time on more computation and communication. In this survey, we aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency. We investigate the algorithms that are most commonly used for optimization, elaborate on the debatable topic of the generalization gap that arises in large-batch training, and review the SOTA strategies for addressing the communication overhead and reducing the memory footprint.	Translation date: 2021-11-02 22:39:47 Publication date: 2021-11-01
# PCA-based Multi Task Learning: a Random Matrix Approach ( http://arxiv.org/abs/2111.00924v1 ) License: CC BY 4.0	Malik Tiomoko, Romain Couillet and Frédéric Pascal	The article proposes and theoretically analyses a computationally efficient multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes \cite{barshan2011supervised,bair2006prediction}. The analysis reveals that (i) by default learning may dramatically fail by suffering from negative transfer, but that (ii) simple counter-measures on data labels avert negative transfer and necessarily result in improved performances. Supporting experiments on synthetic and real data benchmarks show that the proposed method achieves comparable performance with state-of-the-art MTL methods but at a significantly reduced computational cost.	Translation date: 2021-11-02 22:38:56 Publication date: 2021-11-01
# Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification ( http://arxiv.org/abs/2111.00928v1 ) License: CC BY 4.0	Zhenyu Wang, Yali Li, Ye Guo, Shengjin Wang	Semi-supervised learning aims to leverage a large amount of unlabeled data for performance boosting. Existing works primarily focus on image classification. In this paper, we delve into semi-supervised learning for object detection, where labeled data are more labor-intensive to collect. Current methods are easily distracted by noisy regions generated by pseudo labels. To combat the noisy labeling, we propose noise-resistant semi-supervised learning by quantifying the region uncertainty. We first investigate the adverse effects brought by different forms of noise associated with pseudo labels. Then we propose to quantify the uncertainty of regions by identifying the noise-resistant properties of regions over different strengths. By importing the region uncertainty quantification and promoting multipeak probability distribution output, we introduce uncertainty into training and further achieve noise-resistant learning. Experiments on both PASCAL VOC and MS COCO demonstrate the extraordinary performance of our method.	Translation date: 2021-11-02 22:21:38 Publication date: 2021-11-01
# Bounds all around: training energy-based models with bidirectional bounds ( http://arxiv.org/abs/2111.00929v1 ) License: CC BY 4.0	Cong Geng, Jia Wang, Zhiyong Gao, Jes Frellsen, Søren Hauberg	Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game. We link one bound to a gradient penalty that stabilizes training, thereby providing grounding for best engineering practice. To evaluate the bounds we develop a new and efficient estimator of the Jacobi-determinant of the EBM generator. We demonstrate that these developments significantly stabilize training and yield high-quality density estimation and sample generation.	Translation date: 2021-11-02 22:08:52 Publication date: 2021-11-01
# Nested Multiple Instance Learning with Attention Mechanisms ( http://arxiv.org/abs/2111.00947v1 ) License: CC BY 4.0	Saul Fuster, Trygve Eftestøl, Kjersti Engan	Multiple instance learning (MIL) is a type of weakly supervised learning where multiple instances of data with unknown labels are sorted into bags. Since knowledge about the individual instances is incomplete, labels are assigned to the bags containing the instances. While this method fits diverse applications where labelled data is scarce, it lacks depth for solving more complex scenarios where associations between sets of instances have to be made, like finding relevant regions of interest in an image or detecting events in a set of time-series signals. Nested MIL considers labelled bags within bags, where only the outermost bag is labelled and inner-bags and instances are represented as latent labels. In addition, we propose using an attention mechanism to add interpretability, providing awareness into the impact of each instance to the weak bag label. Experiments in classical image datasets show that our proposed model provides high accuracy performance as well as spotting relevant instances on image regions.	Translation date: 2021-11-02 21:49:09 Publication date: 2021-11-01
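To make the "bags within bags" idea concrete, the sketch below pools instances into inner-bag vectors and inner-bag vectors into one outer-bag vector, returning the attention weights at both levels so that influential instances can be inspected. It is only an illustration under stated assumptions: the dot-product scoring and the fixed `query` vector are hypothetical stand-ins for parameters that a trained model would learn, and the abstract does not specify the authors' exact attention form.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Return (pooled vector, attention weights) for one bag of vectors."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def nested_pool(outer_bag, inner_query, outer_query):
    """Pool instances into inner-bag vectors, then inner bags into one vector."""
    inner = [attention_pool(bag, inner_query) for bag in outer_bag]
    inner_vectors = [pooled for pooled, _ in inner]
    outer_vector, outer_weights = attention_pool(inner_vectors, outer_query)
    return outer_vector, outer_weights, [weights for _, weights in inner]


# Hypothetical outer bag with two inner bags of 2-D instance features.
outer_bag = [
    [[0.0, 1.0], [0.1, 0.9]],
    [[3.0, 0.0], [0.2, 1.0], [0.0, 0.8]],
]
query = [1.0, 0.0]  # assumed fixed here; learned in a real model
bag_vector, outer_w, inner_w = nested_pool(outer_bag, query, query)
```

In this toy input the first instance of the second inner bag receives most of the attention, and the second inner bag dominates the outer-level weights.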
# gtfs2vec -- Learning GTFS Embeddings for comparing Public Transport Offer in Microregions ( http://arxiv.org/abs/2111.00960v1 ) License: CC BY-SA 4.0	Piotr Gramacki, Szymon Woźniak, Piotr Szymański	We selected 48 European cities and gathered their public transport timetables in the GTFS format. We utilized Uber's H3 spatial index to divide each city into hexagonal micro-regions. Based on the timetable data we created certain features describing the quantity and variety of public transport availability in each region. Next, we trained an auto-associative deep neural network to embed each of the regions. Having such prepared representations, we then used a hierarchical clustering approach to identify similar regions. To do so, we utilized an agglomerative clustering algorithm with a Euclidean distance between regions and Ward's method to minimize in-cluster variance. Finally, we analyzed the obtained clusters at different levels to identify some number of clusters that qualitatively describe public transport availability. We showed that our typology matches the characteristics of analyzed cities and allows successful searching for areas with similar public transport schedule characteristics.	Translation date: 2021-11-02 21:38:11 Publication date: 2021-11-01
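The clustering step named in the abstract above (agglomerative clustering with Euclidean distance and Ward's criterion) can be sketched in plain Python. This naive version is for illustration only and is not the authors' code; the two-dimensional `embeddings` are made-up stand-ins for learned region vectors, and a real analysis would more likely rely on an established library implementation.

```python
def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering using Ward's merge criterion."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                sq_dist = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Increase in within-cluster sum of squares if i and j merge.
                cost = len(mi) * len(mj) / (len(mi) + len(mj)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        total = len(mi) + len(mj)
        centroid = [(len(mi) * a + len(mj) * b) / total for a, b in zip(ci, cj)]
        merged = (mi + mj, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical 2-D region embeddings: two tight groups and one outlier.
embeddings = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0],
    [9.0, 0.0],
]
labels = ward_clustering(embeddings, 3)
```

Stopping the merges at different cluster counts corresponds to reading the hierarchy at different levels, which is how the abstract describes analysing the resulting typology.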
# Robustness of deep learning algorithms in astronomy -- galaxy morphology studies ( http://arxiv.org/abs/2111.00961v1 ) License: CC BY 4.0	A. Ćiprijanović, D. Kafkes, G. N. Perdue, K. Pedro, G. Snyder, F. J. Sánchez, S. Madireddy, S. Wild, B. Nord	Deep learning models are being increasingly adopted in a wide array of scientific domains, especially to handle the high dimensionality and volume of scientific data. However, these models tend to be brittle due to their complexity and overparametrization, especially to the inadvertent adversarial perturbations that can appear due to common image processing such as compression or blurring that are often seen with real scientific data. It is crucial to understand this brittleness and develop models robust to these adversarial perturbations. To this end, we study the effect of observational noise from the exposure time, as well as the worst case scenario of a one-pixel attack as a proxy for compression or telescope errors, on the performance of ResNet18 trained to distinguish between galaxies of different morphologies in LSST mock data. We also explore how domain adaptation techniques can help improve model robustness in case of this type of naturally occurring attacks and help scientists build more trustworthy and stable models.	Translation date: 2021-11-02 21:27:51 Publication date: 2021-11-01
# iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder ( http://arxiv.org/abs/2111.00965v1 ) License: CC BY 4.0	Shifeng Zhang, Ning Kang, Tom Ryder and Zhenguo Li	It was estimated that the world produced $59$ ZB ($5.9 \times 10^{13}$ GB) of data in 2020, resulting in the enormous costs of both data storage and transmission. Fortunately, recent advances in deep generative models have spearheaded a new class of so-called "neural compression" algorithms, which significantly outperform traditional codecs in terms of compression ratio. Unfortunately, the application of neural compression garners little commercial interest due to its limited bandwidth; therefore, developing highly efficient frameworks is of critical practical importance. In this paper, we discuss lossless compression using normalizing flows which have demonstrated a great capacity for achieving high compression ratios. As such, we introduce iFlow, a new method for achieving efficient lossless compression. We first propose Modular Scale Transform (MST) and a novel family of numerically invertible flow transformations based on MST. Then we introduce the Uniform Base Conversion System (UBCS), a fast uniform-distribution codec incorporated into iFlow, enabling efficient compression. iFlow achieves state-of-the-art compression ratios and is $5\times$ quicker than other high-performance schemes. Furthermore, the techniques presented in this paper can be used to accelerate coding time for a broad class of flow-based algorithms.	Translation date: 2021-11-02 21:18:10 Publication date: 2021-11-01
# Hex2vec -- Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags ( http://arxiv.org/abs/2111.00970v1 ) License: CC BY-SA 4.0	Szymon Woźniak, Piotr Szymański	Representation learning of spatial and geographic data is a rapidly developing field which allows for similarity detection between areas and high-quality inference using deep neural networks. Past approaches however concentrated on embedding raster imagery (maps, street or satellite photos), mobility data or road networks. In this paper we propose the first approach to learning vector representations of OpenStreetMap regions with respect to urban functions and land-use in a micro-region grid. We identify a subset of OSM tags related to major characteristics of land-use, building and urban region functions, types of water, green or other natural areas. Through manual verification of tagging quality, we selected 36 cities for training region representations. Uber's H3 index was used to divide the cities into hexagons, and OSM tags were aggregated for each hexagon. We propose the hex2vec method based on the Skip-gram model with negative sampling. The resulting vector representations showcase semantic structures of the map characteristics, similar to ones found in vector-based language models. We also present insights from region similarity detection in six Polish cities and propose a region typology obtained through agglomerative clustering.	Translation date: 2021-11-02 20:21:21 Publication date: 2021-11-01
# A transfer learning based approach for pronunciation scoring ( http://arxiv.org/abs/2111.00976v1 ) License: CC BY 4.0	Marcelo Sancinetti, Jazmin Vidal, Cyntia Bonomi, Luciana Ferrer	Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the challenge that datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. We analyze the effect of several design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final system is 20% better than the GOP system on EpaDB, a database for pronunciation scoring research, for a cost function that prioritizes low rates of unnecessary corrections.	Translation date: 2021-11-02 20:08:05 Publication date: 2021-11-01
# Partial-Adaptive Submodular Maximization ( http://arxiv.org/abs/2111.00986v1 ) License: CC BY 4.0	Shaojie Tang and Jing Yuan	The goal of a typical adaptive sequential decision making problem is to design an interactive policy that selects a group of items sequentially, based on some partial observations, to maximize the expected utility. It has been shown that the utility functions of many real-world applications, including pool-based active learning and adaptive influence maximization, satisfy the property of adaptive submodularity. However, most existing studies on adaptive submodular maximization focus on the fully adaptive setting, i.e., one must wait for the feedback from all past selections before making the next selection. Although this approach can take full advantage of feedback from the past to make informed decisions, it may take a longer time to complete the selection process as compared with the non-adaptive solution where all selections are made in advance before any observations take place. In this paper, we explore the problem of partial-adaptive submodular maximization where one is allowed to make multiple selections in a batch simultaneously and observe their realizations together. Our approach enjoys the benefits of adaptivity while reducing the time spent on waiting for the observations from past selections. To the best of our knowledge, no results are known for partial-adaptive policies for the non-monotone adaptive submodular maximization problem. We study this problem under both cardinality and knapsack constraints, and develop effective and efficient solutions for both cases. We also analyze the batch query complexity, i.e., the number of batches a policy takes to complete the selection process, of our policy under some additional assumptions.	Translation date: 2021-11-02 19:54:29 Publication date: 2021-11-01
# Sign-to-Speech Model for Sign Language Understanding: A Case Study of Nigerian Sign Language ( http://arxiv.org/abs/2111.00995v1 ) License: CC BY 4.0	Steven Kolawole, Opeyemi Osakuade, Nayan Saxena, Babatunde Kazeem Olorisade	Through this paper, we seek to reduce the communication barrier between the hearing-impaired community and the larger society who are usually not familiar with sign language in the sub-Saharan region of Africa with the largest occurrences of hearing disability cases, while using Nigeria as a case study. The dataset is a pioneer dataset for the Nigerian Sign Language and was created in collaboration with relevant stakeholders. We pre-processed the data in readiness for two different object detection models and a classification model and employed diverse evaluation metrics to gauge model performance on sign-language-to-text conversion tasks. Finally, we convert the predicted sign texts to speech and deploy the best performing model in a lightweight application that works in real time and achieves impressive results converting sign words/phrases to text and, subsequently, into speech.	Translation date: 2021-11-02 19:31:34 Publication date: 2021-11-01
# With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition ( http://arxiv.org/abs/2111.01024v1 ) License: CC BY 4.0	Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen	In egocentric videos, actions occur in quick succession. We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance. To incorporate the temporal context, we propose a transformer-based multimodal model that ingests video and audio as input modalities, with an explicit language model providing action sequence context to enhance the predictions. We test our approach on EPIC-KITCHENS and EGTEA datasets reporting state-of-the-art performance. Our ablations showcase the advantage of utilising temporal context as well as incorporating audio input modality and language model to rescore predictions. Code and models at: https://github.com/ekazakos/MTCN.	Translation date: 2021-11-02 19:25:34 Publication date: 2021-11-01
# (参考訳) 材料科学・化学のための解釈・説明可能な機械学習 Interpretable and Explainable Machine Learning for Materials Science and Chemistry ( http://arxiv.org/abs/2111.01037v1 ) ライセンス: CC BY 4.0	Felipe Oviedo, Juan Lavista Ferres, Tonio Buonassisi, Keith Butler	(参考訳) 材料科学におけるデータ駆動アプローチの普及は、科学的発見を成功させるための機械学習モデルの真の可能性を実現するための、エキサイティングな初期段階にあるが、それらは純粋に予測能力を超えた性質を持つ必要がある。モデルの予測と内部動作は、人間の専門家によるある程度の説明可能性を提供し、潜在的なモデル問題や制限の特定を可能にし、モデル予測への信頼を築き、科学的洞察につながる予期せぬ相関を明らかにするべきである。本稿では,材料科学・化学における解釈可能性・説明可能性技術の応用を概説し,これらの技術が科学研究の成果をどう改善するかを論じる。 While the uptake of data-driven approaches for materials science is at an exciting, early stage, to realise the true potential of machine learning models for successful scientific discovery, they must have qualities beyond purely predictive power. The predictions and inner workings of models should provide a certain degree of explainability by human experts, permitting the identification of potential model issues or limitations, building trust on model predictions and unveiling unexpected correlations that may lead to scientific insights. In this work, we summarize applications of interpretability and explainability techniques for materials science and chemistry and discuss how these techniques can improve the outcome of scientific studies.	翻訳日:2021-11-02 19:02:58 公開日:2021-11-01
# (参考訳) カオス力学系における同化の学習 Learning to Assimilate in Chaotic Dynamical Systems ( http://arxiv.org/abs/2111.01058v1 ) ライセンス: CC BY 4.0	Michael McCabe and Jed Brown	(参考訳) カオスシステムにおけるシミュレーションに基づく予測の精度は、予測の初期化時のシステム状態の高品質な推定に大きく依存する。データ同化法は、雑音、不完全観測、システム力学の数値モデルを体系的に組み合わせて、これらの初期条件を推測するために用いられる。我々は, 基底真理データを必要とせず, 雑音の連続した観測から力学系を同化する学習フレームワークであるアモータライズド・アシミレーションを導入する。我々は,自己教師付き記述から,微分可能なシミュレーションを用いて動的システム設定へ強力な結果を拡張することにより,フレームワークのモチベーションを高める。複数のベンチマークシステムにまたがる実験結果から,広く利用されているデータ同化手法に対するアプローチの有効性が示唆された。 The accuracy of simulation-based forecasting in chaotic systems is heavily dependent on high-quality estimates of the system state at the time the forecast is initialized. Data assimilation methods are used to infer these initial conditions by systematically combining noisy, incomplete observations and numerical models of system dynamics to produce effective estimation schemes. We introduce amortized assimilation, a framework for learning to assimilate in dynamical systems from sequences of noisy observations with no need for ground truth data. We motivate the framework by extending powerful results from self-supervised denoising to the dynamical systems setting through the use of differentiable simulation. Experimental results across several benchmark systems highlight the improved effectiveness of our approach over widely-used data assimilation methods.	翻訳日:2021-11-02 18:42:14 公開日:2021-11-01
# (参考訳) ZeBRA:ゼロデータに基づく繰り返しビットフリップ攻撃によるニューラルネットワークの高精度破壊 ZeBRA: Precisely Destroying Neural Networks with Zero-Data Based Repeated Bit Flip Attack ( http://arxiv.org/abs/2111.01080v1 ) ライセンス: CC BY 4.0	Dahoon Park, Kon-Woo Kwon, Sunghoon Im, Jaeha Kung	(参考訳) 本稿では,自己攻撃データセットを合成することにより,ディープニューラルネットワーク(DNN)を高精度に破壊するゼロデータ型反復ビットフリップ攻撃(ZeBRA)を提案する。対向重み攻撃に関する多くの先行研究は、重みパラメータだけでなく、攻撃対象の脆弱なビットを探索する際のトレーニングやテストデータセットも必要である。本研究では,被害者のDNNモデルにおけるバッチ正規化層の統計を利用して,蒸留対象データと呼ばれる攻撃データセットを合成する。蒸留したターゲットデータを備えたZeBRAアルゴリズムは、トレーニングやテストデータセットにアクセスせずに、モデルの脆弱なビットを検索できる。したがって本手法は,敵対的重み攻撃をDNNのセキュリティにとってより致命的なものにする。実験の結果,従来の攻撃法と比較して,DNNの破壊に要するビットフリップ数は平均で2.0x (CIFAR-10) と1.6x (ImageNet) 少ないことがわかった。コードは https://github.com/pdh930105/ZeBRA で入手できる。 In this paper, we present Zero-data Based Repeated bit flip Attack (ZeBRA) that precisely destroys deep neural networks (DNNs) by synthesizing its own attack datasets. Many prior works on adversarial weight attack require not only the weight parameters, but also the training or test dataset when searching for the vulnerable bits to attack. We propose to synthesize the attack dataset, named distilled target data, by utilizing the statistics of batch normalization layers in the victim DNN model. Equipped with the distilled target data, our ZeBRA algorithm can search for vulnerable bits in the model without accessing the training or test dataset. Thus, our approach makes the adversarial weight attack more fatal to the security of DNNs. Our experimental results show that, on average, 2.0x (CIFAR-10) and 1.6x (ImageNet) fewer bit flips are required to destroy DNNs compared to the previous attack method. Our code is available at https://github.com/pdh930105/ZeBRA.	翻訳日:2021-11-02 18:19:11 公開日:2021-11-01
# マルチメディア推薦のためのコントラストモダリティ融合による潜在構造マイニング Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation ( http://arxiv.org/abs/2111.00678v1 ) ライセンス: Link先を確認	Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Mengqi Zhang, Shu Wu, Liang Wang	(参考訳) 近年,マルチメディアリコメンデーションへの関心が高まっている。マルチモーダルコンテンツを用いたアイテムの対話性を予測することを目的としている。これまでの研究は、サイド情報を含むマルチモーダル機能によるユーザ・テーマインタラクションのモデリングに焦点を当てている。しかし、この方式はマルチメディアレコメンデーションには適していない。まず、協調的なアイテム-アイテム間の関係のみが、高次アイテム-ユーザ-アイテム間の共起によって暗黙的にモデル化される。これらのマルチモーダルコンテンツに基づく潜在的セマンティック・アイテム・イテム構造は、より優れたアイテム表現を学習し、候補項目を包括的に発見するための推奨モデルを支援するのに有用である。第2に, 細粒度マルチモーダル核融合を無視する先行研究である。複数モードにアクセスできることで、豊富な情報を取得することができるが、線形結合や過去の作業における連結による単純な粗粒融合は、内容情報や項目の関係を十分に理解するには不十分である、と我々は論じ、このために、contRastive mOdality fusion method (MICRO) を用いた潜伏構造を提案する。具体化するために,各モダリティの項目間関係を学習する新しいモダリティ対応構造学習モジュールを考案した。学習したモダリティ対応アイテムの関係に基づき、モダリティ対応アイテム表現にアイテム親和性を明示的に注入するグラフ畳み込みを行う。そして,マルチモーダルな特徴を融合する新しいコントラスト手法を設計する。これらの強化された項目表現は、より正確な推奨を行うために既存の協調フィルタリングメソッドにプラグインすることができる。実世界のデータセットに関する広範な実験は、最先端のベースラインよりも優れた方法を示している。 Recent years have witnessed growing interests in multimedia recommendation, which aims to predict whether a user will interact with an item with multimodal contents. Previous studies focus on modeling user-item interactions with multimodal features included as side information. However, this scheme is not well-designed for multimedia recommendation. Firstly, only collaborative item-item relationships are implicitly modeled through high-order item-user-item co-occurrences. We argue that the latent semantic item-item structures underlying these multimodal contents could be beneficial for learning better item representations and assist the recommender models to comprehensively discover candidate items. Secondly, previous studies disregard the fine-grained multimodal fusion. 
Although having access to multiple modalities might allow us to capture rich information, we argue that the simple coarse-grained fusion by linear combination or concatenation in previous work is insufficient to fully understand content information and item relationships. To this end, we propose a latent structure MIning with ContRastive mOdality fusion method (MICRO for brevity). To be specific, we devise a novel modality-aware structure learning module, which learns item-item relationships for each modality. Based on the learned modality-aware latent item relationships, we perform graph convolutions that explicitly inject item affinities to modality-aware item representations. Then, we design a novel contrastive method to fuse multimodal features. These enriched item representations can be plugged into existing collaborative filtering methods to make more accurate recommendations. Extensive experiments on real-world datasets demonstrate the superiority of our method over state-of-the-art baselines.	翻訳日:2021-11-02 18:01:58 公開日:2021-11-01
# 階層型情報構造を用いた分散協調強化学習 Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure ( http://arxiv.org/abs/2111.00781v1 ) ライセンス: Link先を確認	Hsu Kao, Chen-Yu Wei, Vijay Subramanian	(参考訳) 情報非対称性のため,マルチエージェント強化学習(MARL)の問題は困難である。この課題を克服するために、既存の手法では高いレベルの調整やエージェント間のコミュニケーションを必要とすることが多い。我々は、アプリケーションに生じる階層的な情報構造を持つ2エージェントマルチアームバンディット(MAB)とマルコフ決定プロセス(MDP)について検討し、協調や通信を必要としないよりシンプルで効率的なアルゴリズムを提案する。この構造では、各ステップで "leader" がまず彼女のアクションを選択し、その後 "follower" がリーダーのアクションを観察した後、彼のアクションを決定する。 2つのエージェントは、共同アクションに依存する同じ報酬(およびMDP設定における同じ状態遷移)を観察する。バンディット設定では,ギャップ非依存で最適に近い$\widetilde{\mathcal{O}}(\sqrt{ABT})$のリグレットと,ギャップ依存で最適に近い$\mathcal{O}(\log(T))$のリグレットを達成する階層的バンディットアルゴリズムを提案する。ここで$A$と$B$はそれぞれリーダーとフォロワーのアクション数,$T$はステップ数である。我々はさらに,複数のフォロワを持つケースと深い階層を持つケースにまで拡張し,いずれにおいても最適に近いリグレット上界を得る。 MDP設定では,$\widetilde{\mathcal{O}}(\sqrt{H^7S^2ABT})$のリグレットを得る。ただし$H$は1エピソードあたりのステップ数,$S$は状態数,$T$はエピソード数である。これは、$A$、$B$、および$T$に関して既存の下限と一致する。 Multi-agent reinforcement learning (MARL) problems are challenging due to information asymmetry. To overcome this challenge, existing methods often require a high level of coordination or communication between the agents. We consider two-agent multi-armed bandits (MABs) and Markov decision processes (MDPs) with a hierarchical information structure arising in applications, which we exploit to propose simpler and more efficient algorithms that require no coordination or communication. In the structure, in each step the "leader" chooses her action first, and then the "follower" decides his action after observing the leader's action. The two agents observe the same reward (and the same state transition in the MDP setting) that depends on their joint action.
For the bandit setting, we propose a hierarchical bandit algorithm that achieves a near-optimal gap-independent regret of $\widetilde{\mathcal{O}}(\sqrt{ABT})$ and a near-optimal gap-dependent regret of $\mathcal{O}(\log(T))$, where $A$ and $B$ are the numbers of actions of the leader and the follower, respectively, and $T$ is the number of steps. We further extend the algorithm to the case of multiple followers and to the case of a deep hierarchy, obtaining near-optimal regret bounds in both. For the MDP setting, we obtain $\widetilde{\mathcal{O}}(\sqrt{H^7S^2ABT})$ regret, where $H$ is the number of steps per episode, $S$ is the number of states, and $T$ is the number of episodes. This matches the existing lower bound in terms of $A$, $B$, and $T$.	翻訳日:2021-11-02 18:01:33 公開日:2021-11-01
# 画像としてのプログラムの符号化:ソースコードの視覚的表現の評価 Encoding Program as Image: Evaluating Visual Representation of Source Code ( http://arxiv.org/abs/2111.01097v1 ) ライセンス: Link先を確認	Md Rafiqul Islam Rabin, Mohammad Amin Alipour	(参考訳) ニューラルネットワークの入力ベクトルにソースコードをエンコードするいくつかのアプローチがある。これらのアプローチは、入力プログラムの様々な構文的特徴と意味的特徴をエンコーディングに含めようとしている。本稿では,入力プログラムのスナップショットに基づくソースコードの新しい表現であるCode2Snapshotについて検討する。この表現のいくつかのバリエーションを評価し、その性能を入力プログラムの豊かな構文的特徴と意味的特徴を利用した最先端表現と比較する。コード要約タスクにおけるCode2Snapshotの実用性に関する予備的な研究は、入力プログラムの単純なスナップショットが最先端表現に匹敵する性能を持つことを示唆している。興味深いことに、入力プログラムを難読化してもCode2Snapshotの性能への影響はわずかであり、いくつかのタスクでは、ニューラルモデルが入力プログラムの構造のみに依存して高い性能を発揮しうることが示唆される。 There are several approaches for encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs. We evaluate several variations of this representation and compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization task suggests that simple snapshots of input programs have comparable performance to the state-of-the-art representations. Interestingly, obscuring the input programs has an insignificant impact on Code2Snapshot's performance, suggesting that, for some tasks, neural models may provide high performance by relying merely on the structure of input programs.	翻訳日:2021-11-02 18:00:38 公開日:2021-11-01
# (参考訳) FaceScape: 1次元顔再構成のための3次元顔データセットとベンチマーク FaceScape: 3D Facial Dataset and Benchmark for Single-View 3D Face Reconstruction ( http://arxiv.org/abs/2111.01082v1 ) ライセンス: CC BY 4.0	Hao Zhu, Haotian Yang, Longwei Guo, Yidi Zhang, Yanru Wang, Mingkai Huang, Qiu Shen, Ruigang Yang, Xun Cao	(参考訳) 本稿では,大規模な3次元顔データセット,FaceScape,およびそれに対応するベンチマークについて述べる。 FaceScapeデータをトレーニングすることにより、単一の画像入力から精巧な3次元顔モデルを予測する新しいアルゴリズムを提案する。 FaceScapeデータセットは18,760のテクスチャ付き3D顔を提供する。 3Dモデルは、位相的に均一になるように処理される細孔レベルの顔形状を含んでいる。これらの微細な3次元顔モデルは、粗い形状と詳細な幾何学のための変位マップの3次元形態モデルとして表現することができる。大規模かつ高精度なデータセットを活用して、深層ニューラルネットワークを用いて表現固有の動的詳細を学習する新しいアルゴリズムが提案されている。学習された関係は、単一の画像入力から3次元顔予測システムの基礎となる。従来の方法とは異なり、予測した3dモデルは、異なる表現の下で高度に詳細な幾何学を組み込むことができる。また、FaceScapeデータを用いて、最新の単一視点顔再構成手法の評価を行う。精度はカメラのポーズと焦点距離の寸法で報告され分析され、忠実で包括的な評価が得られ、新たな課題が浮かび上がっている。前例のないデータセット、ベンチマーク、コードは、研究目的で一般に公開された。 In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and the corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. FaceScape dataset provides 18,760 textured 3D faces, captured from 938 subjects and each with 20 specific expressions. The 3D models contain the pore-level facial geometry that is also processed to be topologically uniformed. These fine 3D facial models can be represented as a 3D morphable model for rough shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn the expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Different than the previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. 
We also use FaceScape data to generate the in-the-wild and in-the-lab benchmark to evaluate recent methods of single-view face reconstruction. The accuracy is reported and analyzed on the dimensions of camera pose and focal length, which provides a faithful and comprehensive evaluation and reveals new challenges. The unprecedented dataset, benchmark, and code have been released to the public for research purpose.	翻訳日:2021-11-02 17:57:24 公開日:2021-11-01
# To Talk or To Work: モバイルエッジデバイス上での効果的なフェデレーション学習の遅延 To Talk or to Work: Delay Efficient Federated Learning over Mobile Edge Devices ( http://arxiv.org/abs/2111.00637v1 ) ライセンス: Link先を確認	Pavana Prakash, Jiahao Ding, Maoqiang Wu, Minglei Shu, Rong Yu, and Miao Pan	(参考訳) 新たな分散機械学習パラダイムであるフェデレーション・ラーニング(fl)は、エッジコンピューティングと融合し、モバイルエッジデバイス上で新たなアプリケーションを持つ有望な分野である。 FLでは、モバイルデバイスは、モデル更新だけを共有することで、中央サーバの調整の下で、自身のデータに基づいてモデルをトレーニングするため、トレーニングデータをプライベートに保持する。しかし、データの中心的な可用性がなければ、計算ノードは収束を達成するためにしばしばモデル更新を伝える必要がある。したがって、ローカルモデル更新を作成するためのローカルな計算時間と、それらをサーバに送受信するのに要する時間とが、全体の時間を遅らせることになる。さらに、信頼性の低いネットワーク接続は、これらの更新の効率的な通信を妨げる可能性がある。そこで本稿では,モデルが収束するために必要な通信ラウンドと計算時間(計算時間と通信時間の両方)を削減する遅延効率のfl機構を提案する。遅延に寄与する様々なパラメータの影響を探求し,無線通信(話)と局所計算(作業)のトレードオフのバランスを図る。総合時間との関係を最適化問題として定式化し,広範なシミュレーションによるアプローチの有効性を示す。 Federated learning (FL), an emerging distributed machine learning paradigm, in conflux with edge computing is a promising area with novel applications over mobile edge devices. In FL, since mobile devices collaborate to train a model based on their own data under the coordination of a central server by sharing just the model updates, training data is maintained private. However, without the central availability of data, computing nodes need to communicate the model updates often to attain convergence. Hence, the local computation time to create local model updates along with the time taken for transmitting them to and from the server result in a delay in the overall time. Furthermore, unreliable network connections may obstruct an efficient communication of these updates. To address these, in this paper, we propose a delay-efficient FL mechanism that reduces the overall time (consisting of both the computation and communication latencies) and communication rounds required for the model to converge. Exploring the impact of various parameters contributing to delay, we seek to balance the trade-off between wireless communication (to talk) and local computation (to work). 
We formulate a relation with overall time as an optimization problem and demonstrate the efficacy of our approach through extensive simulations.	翻訳日:2021-11-02 17:30:07 公開日:2021-11-01
# GCNear: ニアメモリ処理による効率的なGCNトレーニングのためのハイブリッドアーキテクチャ GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing ( http://arxiv.org/abs/2111.00680v1 ) ライセンス: Link先を確認	Zhe Zhou and Cong Li and Xuechao Wei and Guangyu Sun	(参考訳) 近年,グラフ畳み込みネットワーク (GCN) は非ユークリッドグラフデータを解析するための最先端のアルゴリズムとなっている。しかし、特に大きなグラフで効率的なGCNトレーニングを実現することは困難である。理由は多岐にわたる。 1)GCNトレーニングは、かなりのメモリフットプリントを発生させる。大規模なグラフ上のフルバッチトレーニングは、バックプロパゲーションのために中間データをバッファするために数百から数千ギガバイトのメモリを必要とする。 2)GCNトレーニングには、メモリ集約的なデータ削減と計算集約的な特徴/勾配更新操作の両方が含まれる。このような異質な性質は、現在のCPU/GPUプラットフォームに挑戦する。 3) グラフの不規則性と複雑なトレーニングデータフローは,GCN訓練システムの効率向上の難しさを一層高める。本稿では,これらの課題に対処するためのハイブリッドアーキテクチャであるGCNearを提案する。具体的には、GCNearはDIMMベースのメモリシステムを採用し、容易にスケールできるメモリ容量を提供する。ヘテロジニアスの性質に合わせて、GCNトレーニング操作をメモリ集約的なReduce操作と計算集約的なUpdate操作に分類する。次に、高集積ローカル帯域幅をフル活用して、Reduce操作をオン・DIMM NMEにオフロードする。Update操作の処理には、十分な計算能力を持つCAEを採用している。さらに,GCNタスクの不規則性に対処し,GCNearの性能を改善するための最適化手法を提案する。また,GCNearのスケーラビリティを評価するためのマルチGCNearシステムを提案する。 Recently, Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data. However, it is challenging to realize efficient GCN training, especially on large graphs. The reasons are manifold: 1) GCN training incurs a substantial memory footprint. Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data reduction and computation-intensive features/gradients update operations. Such a heterogeneous nature challenges current CPU/GPU platforms. 3) The irregularity of graphs and the complex training dataflow jointly increase the difficulty of improving a GCN training system's efficiency. This paper presents GCNear, a hybrid architecture to tackle these challenges. Specifically, GCNear adopts a DIMM-based memory system to provide easy-to-scale memory capacity.
To match the heterogeneous nature, we categorize GCN training operations as memory-intensive Reduce and computation-intensive Update operations. We then offload Reduce operations to on-DIMM NMEs, making full use of the high aggregated local bandwidth. We adopt a CAE with sufficient computation capacity to process Update operations. We further propose several optimization strategies to deal with the irregularity of GCN tasks and improve GCNear's performance. We also propose a Multi-GCNear system to evaluate the scalability of GCNear.	翻訳日:2021-11-02 17:29:47 公開日:2021-11-01
# オンラインの公正な学習がランク付けへ Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank ( http://arxiv.org/abs/2111.00735v1 ) ライセンス: Link先を確認	Yiling Jia, Hongning Wang	(参考訳) オンライン・ラーニング・ト・ランク(OL2R)は、オフラインの教師付きランキング・モデル学習に必要な高価なレバレッジ・ラベリングを避けるという利点により、近年大きな研究関心を集めている。そのような解は未知数(例えば、故意に選択された結果をトップの位置に提示する)を探索し、関連性の推定を改善する。しかしこれは、OL2Rの期間中に異なる種類のアイテムが異なる治療を受ける可能性があるという、ランキングフェアネスに関する懸念を引き起こす。しかし、既存の公正ランキングソリューションは、通常、結果の妥当性の知識や、OL2Rの設定と矛盾するパフォーマンスローダを必要とするため、公正性を保証するために直接適用することはできない。本稿では,ol2rにおける集団曝露によって定義される公平性を達成するための汎用フレームワークを提案する。鍵となる考え方は、公正性制御、関連学習、オンラインランキング品質のための探索と搾取を校正することである。特に、モデルが関連性フィードバックの結果の集合を探索する場合、ランダムな置換のサブセット内で探索を限定し、フィードバックが不偏である間にグループ間の公平性が維持される。理論的には、このような戦略は、公平性を得るためのOL2Rの後悔に最小の歪みをもたらす。既存のフェアOL2Rソリューションと比較して,提案ソリューションの有効性を示すために,ベンチマークデータセットをランク付けする2つの公開学習において,大規模な実証分析を行う。 Online learning to rank (OL2R) has attracted great research interests in recent years, thanks to its advantages in avoiding expensive relevance labeling as required in offline supervised ranking model learning. Such a solution explores the unknowns (e.g., intentionally present selected results on top positions) to improve its relevance estimation. This however triggers concerns on its ranking fairness: different groups of items might receive differential treatments during the course of OL2R. But existing fair ranking solutions usually require the knowledge of result relevance or a performing ranker beforehand, which contradicts with the setting of OL2R and thus cannot be directly applied to guarantee fairness. In this work, we propose a general framework to achieve fairness defined by group exposure in OL2R. The key idea is to calibrate exploration and exploitation for fairness control, relevance learning and online ranking quality. In particular, when the model is exploring a set of results for relevance feedback, we confine the exploration within a subset of random permutations, where fairness across groups is maintained while the feedback is still unbiased. 
Theoretically we prove such a strategy introduces minimum distortion in OL2R's regret to obtain fairness. Extensive empirical analysis is performed on two public learning to rank benchmark datasets to demonstrate the effectiveness of the proposed solution compared to existing fair OL2R solutions.	翻訳日:2021-11-02 17:29:29 公開日:2021-11-01
# 無差別の毒殺攻撃はショートカットだ Indiscriminate Poisoning Attacks Are Shortcuts ( http://arxiv.org/abs/2111.00898v1 ) ライセンス: Link先を確認	Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu	(参考訳) トレーニングデータに知覚不可能な摂動を加えてトレーニングモデルのテストエラーを最大化する無差別なデータ中毒攻撃は、不正なデータの使用を防止できると考えられるため、トレンドとなっている。本研究では,これらの摂動が原理的に働く理由について考察する。高度な中毒攻撃の摂動は、対応するサンプルの標的ラベルを割り当てた場合にはほぼ線形分離可能であり、そのため学習目的に対するショートカットとして機能する。この重要な集団的性質はこれまで明らかにされていなかった。さらに,線形分離性が実際に中毒攻撃の主因であることの検証も行う。線形分離可能なデータを摂動として合成し、そのような合成摂動が故意に作られた攻撃と同じくらい強力であることを示す。我々の発見は、深層学習が、知覚不可能な規模で通常の特徴と混在している場合でもショートカットに大きく依存するため、ショートカット学習の問題が従来考えられていたよりも深刻であることを示唆している。この発見は、事前訓練された特徴抽出器がこれらの中毒攻撃を効果的に無効にすることも示唆している。 Indiscriminate data poisoning attacks, which add imperceptible perturbations to training data to maximize the test error of trained models, have become a trendy topic because they are thought to be capable of preventing unauthorized use of data. In this work, we investigate why these perturbations work in principle. We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned with the target labels of the corresponding samples, which hence can work as shortcuts for the learning objective. This important population property has not been unveiled before. Moreover, we further verify that linear separability is indeed the workhorse for poisoning attacks. We synthesize linearly separable data as perturbations and show that such synthetic perturbations are as powerful as the deliberately crafted attacks. Our finding suggests that the shortcut learning problem is more serious than previously believed as deep learning heavily relies on shortcuts even if they are of an imperceptible scale and mixed together with the normal features. This finding also suggests that pre-trained feature extractors would disable these poisoning attacks effectively.	翻訳日:2021-11-02 17:29:05 公開日:2021-11-01
# ランダム化シミュレーションによるロボット学習 Robot Learning from Randomized Simulations: A Review ( http://arxiv.org/abs/2111.00956v1 ) ライセンス: Link先を確認	Fabio Muratore, Fabio Ramos, Greg Turk, Wenhao Yu, Michael Gienger and Jan Peters	(参考訳) ディープラーニングの台頭は、大量のデータを必要とする方法を好むロボット研究のパラダイムシフトを引き起こしている。このようなデータセットを物理プラットフォーム上で生成するのは違法にコストがかかる。したがって、最先端のアプローチは、データ生成が高速かつ安価であるシミュレーションで学習し、その知識を実際のロボット(sim-to-real)に転送する。現実的になりつつあるにもかかわらず、すべてのシミュレーターはモデルに基づいて構築されているため、必然的に不完全である。これは、ロボットの制御ポリシーを学習し、シミュレーションと現実のミスマッチを克服するためにシミュレータをどのように修正できるかという問題を引き起こし、しばしば「現実のギャップ」と呼ばれる。本稿では, ランダム化シミュレーションから学習する手法である「ドメインランダム化」と呼ばれる手法に着目し, ロボット工学におけるシム・トゥ・リアル研究の総合的なレビューを行う。 The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. It is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the 'reality gap'. We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named 'domain randomization' which is a method for learning from randomized simulations.	翻訳日:2021-11-02 17:28:45 公開日:2021-11-01
# STORM+:非凸最適化のためのモーメント付き完全適応SGD STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization ( http://arxiv.org/abs/2111.01040v1 ) ライセンス: Link先を確認	Kfir Y. Levy, Ali Kavis, Volkan Cevher	(参考訳) 本研究では,目的が滑らかな損失関数に対する期待値である確率的非凸最適化問題を調査し,近似定常点を求めることを目的とする。このような問題に対処する最も一般的なアプローチは分散還元法であり、これはこの場合の下限に合致する密接な収束率を得ることでも知られている。それにもかかわらず、これらの技術は適切に選択された「メガバッチサイズ」と連動してアンカーポイントを注意深く維持する必要がある。これにより、実用性を弱める超パラメータチューニング問題が発生する。近年, [Cutkosky and Orabona, 2019] は, アンカーポイントや大規模なバッチサイズの使用を避けるために再帰運動量を利用することができ, この設定に最適なレートが得られることを示した。しかし、STORMと呼ばれるそれらの手法は、滑らかさの知識と勾配ノルムの上界に大きく依存している。本研究では,パラメータフリーで大規模なバッチサイズを必要としない新しい手法STORM+を提案し,近似定常点を求めるために最適な$O(1/T^{1/3})$レートを達成する。我々の研究は、学習率と運動量パラメータを適応的に設定する新しいアプローチとともに、STORMアルゴリズムに基づいている。 In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require a careful maintenance of anchor points in conjunction with appropriately selected "mega-batchsizes". This leads to a challenging hyperparameter tuning problem, that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batchsizes, and still obtain the optimal rate for this setting. Yet, their method called STORM crucially relies on the knowledge of the smoothness, as well as a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch-sizes, and obtains the optimal $O(1/T^{1/3})$ rate for finding an approximate stationary point.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.	翻訳日:2021-11-02 17:27:40 公開日:2021-11-01
# 資源効率のよい連合学習 Resource-Efficient Federated Learning ( http://arxiv.org/abs/2111.01108v1 ) ライセンス: Link先を確認	Ahmed M. Abdelmoniem and Atal Narayan Sahu and Marco Canini and Suhaib A. Fahmy	(参考訳) フェデレーション学習(fl)は,ローカルデータを用いた学習者による分散トレーニングを可能にし,プライバシの強化とコミュニケーションの削減を実現する。しかし、デプロイメントスケールとしてデータ分散の不均一性、デバイス機能、アクセシビリティに関する多くの課題が提示され、モデル収束とバイアスの両方に影響を与える可能性がある。既存のflスキームは、公平性を改善するためにランダムな参加者選択を用いるが、これはリソースの効率の悪い使用とより低い品質のトレーニングをもたらす可能性がある。本研究では,FLにおける資源効率の課題を系統的に解決し,知的受像者選択のメリットと,混在する参加者からの更新を取り入れた。我々は、これらの要因がいかにリソース効率を向上させるかを示しながら、トレーニングされたモデル品質も改善する。 Federated Learning (FL) enables distributed training by learners using local data, thereby enhancing privacy and reducing communication. However, it presents numerous challenges relating to the heterogeneity of the data distribution, device capabilities, and participant availability as deployments scale, which can impact both model convergence and bias. Existing FL schemes use random participant selection to improve fairness; however, this can result in inefficient use of resources and lower quality training. In this work, we systematically address the question of resource efficiency in FL, showing the benefits of intelligent participant selection, and incorporation of updates from straggling participants. We demonstrate how these factors enable resource efficiency while also improving trained model quality.	翻訳日:2021-11-02 17:27:19 公開日:2021-11-01
# (参考訳) 磁気共鳴画像の画質指標とニューラルネットワークのセグメンテーション精度の相関 Correlation between image quality metrics of magnetic resonance images and the neural network segmentation accuracy ( http://arxiv.org/abs/2111.01093v1 ) ライセンス: CC BY 4.0	Rajarajeswari Muthusivarajan, Adrian Celaya, Joshua P. Yung, Satish Viswanath, Daniel S. Marcus, Caroline Chung, David Fuentes	Deep neural networks with multilevel connections process input data in complex ways to learn the information. A network's learning efficiency depends not only on the complex neural network architecture but also on the input training images. Medical image segmentation with deep neural networks for skull stripping or tumor segmentation from magnetic resonance images enables learning both global and local features of the images. Though medical images are collected in a controlled environment, there may be artifacts or equipment-based variance that cause inherent bias in the input set. In this study, we investigated the correlation between the image quality metrics (IQM) of MR images and the neural network segmentation accuracy. To this end, we used the 3D DenseNet architecture and trained the network on the same input while applying different methodologies to select the training data set based on the IQM values. The difference in segmentation accuracy between models trained on random inputs and models trained on IQM-based inputs sheds light on the role of image quality metrics in segmentation accuracy. By using the image quality metrics to choose the training inputs, we may further tune the learning efficiency of the network and the segmentation accuracy.	翻訳日:2021-11-02 17:24:17 公開日:2021-11-01
# 情報価値におけるマルチマーケットモノポリーと非凹凸の拡大 Expanding Multi-Market Monopoly and Nonconcavity in the Value of Information ( http://arxiv.org/abs/2111.00839v1 ) ライセンス: Link先を確認	Stefan Behringer	(参考訳) 本稿では,複数の相互接続市場においてランダムに増加する需要に直面した価格設定独占の具体的設定におけるベイズ的逆問題について検討する。 Kalman-Bucy-Stratonovichフィルタを用いた完全動的離散モデルで信号のモノポリスへの値を調べると、信号の分散において非モノトニックであることが分かる。古典的な情報の価値の静的な設定では、この関係は凸や凹凸であるが、常に単調である。非単調性の存在は、系の外因性成長速度に大きく依存する。 In this paper I investigate a Bayesian inverse problem in the specific setting of a price setting monopolist facing a randomly growing demand in multiple possibly interconnected markets. Investigating the Value of Information of a signal to the monopolist in a fully dynamic discrete model employing the Kalman-Bucy-Stratonovich filter, we find that it may be non-monotonic in the variance of the signal. In the classical static settings of the Value of Information literature this relationship may be convex or concave, but is always monotonic. The existence of the non-monotonicity depends critically on the exogenous growth rate of the system.	翻訳日:2021-11-02 17:12:15 公開日:2021-11-01
# (参考訳) aiへのステークホルダーの参加:"多種多様な利害関係者の追加と混乱"を超えて Stakeholder Participation in AI: Beyond "Add Diverse Stakeholders and Stir" ( http://arxiv.org/abs/2111.01122v1 ) ライセンス: CC BY 4.0	Fernando Delgado, Stephen Yang, Michael Madaio, Qian Yang	(参考訳) HCIとAI研究には、AIシステムの設計は、AIに影響されるステークホルダーを関与させ、強化する必要があるという意見の一致が増えている。しかし、ai設計に利害関係者が参加すべき方法は明らかではない。このワークショップペーパーは、既存の文献の参加に関する合成と、最近公開された調査およびAI研究者や実践者に対する数十の半構造化されたインタビューを通じて、現在のプラクティスの実証分析を通じて、AI設計における「参加的転換」を掘り下げることを目的としている。本稿は,AI設計における参加実践の現代的景観を詳述した経験的知見の集合を,我々の文献合成と経験的研究に基づいて分析するための概念的枠組みを提案する。これらの発見は、PD of AIがAI、HCI、その他の研究コミュニティをどのように前進させるべきか、より原則化された議論をブートストラップに役立てることができる。 There is a growing consensus in HCI and AI research that the design of AI systems needs to engage and empower stakeholders who will be affected by AI. However, the manner in which stakeholders should participate in AI design is unclear. This workshop paper aims to ground what we dub a 'participatory turn' in AI design by synthesizing existing literature on participation and through empirical analysis of its current practices via a survey of recent published research and a dozen semi-structured interviews with AI researchers and practitioners. Based on our literature synthesis and empirical research, this paper presents a conceptual framework for analyzing participatory approaches to AI design and articulates a set of empirical findings that in ensemble detail out the contemporary landscape of participatory practice in AI design. These findings can help bootstrap a more principled discussion on how PD of AI should move forward across AI, HCI, and other research communities.	翻訳日:2021-11-02 17:05:07 公開日:2021-11-01
# マルチエージェント知覚のための蒸留コラボレーショングラフの学習 Learning Distilled Collaboration Graph for Multi-Agent Perception ( http://arxiv.org/abs/2111.00643v1 ) ライセンス: Link先を確認	Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, Wenjun Zhang	(参考訳) マルチエージェント知覚のためのパフォーマンス・バンド幅トレードオフを改善するために,訓練可能,ポーズ認識,エージェント間の適応協調をモデル化する新しい蒸留コラボレーショングラフ(DiscoGraph)を提案する。私たちの重要な斬新さは2つの側面にある。まず,知識蒸留によるDiscoGraphの学習を行う教師・生徒フレームワークを提案する。教師モデルは、全体視点入力と早期に協調し、生徒モデルは、単視点入力との中間的な協調に基づいている。本枠組みは,教師モデルの対応に合うように,生徒モデルにおけるコラボレーション後の特徴マップを制約することにより,DiscoGraphを訓練する。次に、行列値のエッジウェイトをDiscoGraphで提案する。このような行列において、各要素は特定の空間領域におけるエージェント間注意を反映し、エージェントが情報領域を適応的に強調することができる。推論中は、蒸留されたコラボレーションネットワーク(DiscoNet)という名前の生徒モデルのみを使用すればよい。教師・生徒フレームワークのおかげで、共有されたDiscoNetを持つ複数のエージェントが、総合的な視点を持つ仮説的な教師モデルのパフォーマンスに協力的に近づくことができる。 CARLAとSUMOの協調シミュレーションを用いて合成した大規模マルチエージェント認識データセットであるV2X-Sim 1.0で本手法の有効性を検証した。マルチエージェント3Dオブジェクト検出における定量的および定性的実験により、DiscoNetは最先端の協調認識法よりも優れた性能帯域トレードオフを達成できただけでなく、より簡単な設計の根拠ももたらした。私たちのコードはhttps://github.com/ai4ce/DiscoNetで公開されています。 To promote better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs an early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match the correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model named as the distilled collaboration network (DiscoNet). 
Attributed to the teacher-student framework, multiple agents with the shared DiscoNet could collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale. Our code is available on https://github.com/ai4ce/DiscoNet.	翻訳日:2021-11-02 16:53:23 公開日:2021-11-01
# ロバスト最適輸送による正確な点雲登録 Accurate Point Cloud Registration with Robust Optimal Transport ( http://arxiv.org/abs/2111.00648v1 ) ライセンス: Link先を確認	Zhengyang Shen, Jean Feydy, Peirong Liu, Ariel Hernán Curiale, Ruben San Jose Estepar, Raul San Jose Estepar, Marc Niethammer	(参考訳) 本研究は, 形状整合に対するロバスト最適輸送(OT)の利用について検討する。具体的には、最近のOTソルバは、ポイントクラウド登録のための最適化と深層学習の両方を改良し、安価な計算コストで精度を高めていることを示す。本稿は、現代のOT理論の実践的な概要から始まる。次に、このフレームワークを形状マッチングに使用する際の主な困難に対する解決策を提供する。最後に, 部分形状の剛性登録, KITTIデータセットのシーンフロー推定, 吸気と呼気の間の肺血管木の非パラメトリック登録など, 幅広い課題に対する輸送強化登録モデルの性能について紹介する。 OTベースの手法は,KITTIと肺登録課題において,精度とスケーラビリティの両面で最先端の結果を得る。 PVT1010もリリースしました。これは、高密度にサンプリングされた点を持つ1010対の肺血管樹の新しい公開データセットです。このデータセットは、非常に複雑な形状と変形を持つポイントクラウド登録アルゴリズムの難しいユースケースを提供する。我々の研究は、ロバストOTが幅広い登録モデルの高速な事前調整と微調整を可能にし、コンピュータビジョンツールボックスのための新しいキーメソッドを提供することを示した。私たちのコードとデータセットは、https://github.com/uncbiag/robot からオンラインで利用できます。 This work investigates the use of robust optimal transport (OT) for shape matching. Specifically, we show that recent OT solvers improve both optimization-based and deep learning methods for point cloud registration, boosting accuracy at an affordable computational cost. This manuscript starts with a practical overview of modern OT theory. We then provide solutions to the main difficulties in using this framework for shape matching. Finally, we showcase the performance of transport-enhanced registration models on a wide range of challenging tasks: rigid registration for partial shapes; scene flow estimation on the Kitti dataset; and nonparametric registration of lung vascular trees between inspiration and expiration. Our OT-based methods achieve state-of-the-art results on Kitti and for the challenging lung registration task, both in terms of accuracy and scalability. We also release PVT1010, a new public dataset of 1,010 pairs of lung vascular trees with densely sampled points. This dataset provides a challenging use case for point cloud registration algorithms with highly complex shapes and deformations. 
Our work demonstrates that robust OT enables fast pre-alignment and fine-tuning for a wide range of registration models, thereby providing a new key method for the computer vision toolbox. Our code and dataset are available online at: https://github.com/uncbiag/robot.	翻訳日:2021-11-02 16:52:56 公開日:2021-11-01
# 画像弁別における自己検証 Self-Verification in Image Denoising ( http://arxiv.org/abs/2111.00666v1 ) ライセンス: Link先を確認	Huangxing Lin, Yihong Zhuang, Delu Zeng, Yue Huang, Xinghao Ding, John Paisley	(参考訳) 画像の雑音化のための自己検証と呼ばれる新しい正規化を考案する。この正規化は、従来の事前定義ではなく、ネットワークが学習した深い画像を用いて定式化される。具体的には、ネットワークの出力を ``re-noising'' の後再び denoise する ``prior'' として扱う。再演された画像と先行画像の比較は、ネットワークの演奏能力の自己検証として解釈することができる。自己検証は,画像復元に必要な低レベルな画像統計をネットワークが捉えることを促す。また,この自己検証正規化に基づき,クリーンな画像が見られなくても,ネットワークが無声化を学べることを示す。この学習戦略は自己教師型であり,自己検証画像認知(SVID)と呼ぶ。 SVIDは学習に基づく手法と従来のモデルに基づく認知的手法の混合と見なすことができ、ネットワークの出力を用いて正規化を適応的に定式化する。観測された劣化データのみを用いて,様々な認知タスクへのSVIDの適用について述べる。教師付きCNNに近いデノイングパフォーマンスを実現することができる。 We devise a new regularization, called self-verification, for image denoising. This regularization is formulated using a deep image prior learned by the network, rather than a traditional predefined prior. Specifically, we treat the output of the network as a ``prior'' that we denoise again after ``re-noising''. The comparison between the again denoised image and its prior can be interpreted as a self-verification of the network's denoising ability. We demonstrate that self-verification encourages the network to capture low-level image statistics needed to restore the image. Based on this self-verification regularization, we further show that the network can learn to denoise even if it has not seen any clean images. This learning strategy is self-supervised, and we refer to it as Self-Verification Image Denoising (SVID). SVID can be seen as a mixture of learning-based methods and traditional model-based denoising methods, in which regularization is adaptively formulated using the output of the network. We show the application of SVID to various denoising tasks using only observed corrupted data. It can achieve the denoising performance close to supervised CNNs.	翻訳日:2021-11-02 16:52:35 公開日:2021-11-01
# OctField: 3Dモデリングのための階層的暗黙関数 OctField: Hierarchical Implicit Functions for 3D Modeling ( http://arxiv.org/abs/2111.01067v1 ) ライセンス: Link先を確認	Jia-Heng Tang, Weikai Chen, Jie Yang, Bo Wang, Songrun Liu, Bo Yang, Lin Gao	(参考訳) 局所化暗黙関数の最近の進歩により、ニューラル暗黙表現は大きなシーンにスケーラブルになった。しかし、これらのアプローチによって使われる3次元空間の正則な分割は、表面占有のスパース性と幾何学的詳細の様々な粒度を考慮に入れない。その結果、メモリフットプリントは入力ボリュームとともに3乗で増大し、中程度の密度の分解でも計算コストが法外になる。そこで本研究では,複雑な曲面を低メモリ・低計算量で高精度に符号化できる,3次元曲面の学習可能な階層的暗黙表現であるOctFieldを提案する。我々のアプローチの鍵は、3Dシーンの適応分解であり、興味のある面の周囲にのみ局所的な暗黙関数を分散させる。この目的を達成するために、表面占有と部分幾何学の豊かさに応じて3次元空間を適応的に分割する階層的octree構造を導入する。 octreeは離散的かつ非微分可能であるため、さらに、octreeセルの分割を確率的プロセスとしてモデル化し、octree構造と表面形状の両方を微分可能な形で再帰的に符号化・復号する新しい階層ネットワークを提案する。形状モデリングおよび再構成タスクにおけるOctFieldの価値を示し、代替手法よりも優れていることを示す。 Recent advances in localized implicit functions have enabled neural implicit representation to be scalable to large scenes. However, the regular subdivision of 3D space employed by these approaches fails to take into account the sparsity of the surface occupancy and the varying granularities of geometric details. As a result, its memory footprint grows cubically with the input volume, leading to a prohibitive computational cost even at a moderately dense decomposition. In this work, we present a learnable hierarchical implicit representation for 3D surfaces, coded OctField, that allows high-precision encoding of intricate surfaces with low memory and computational budget. The key to our approach is an adaptive decomposition of 3D scenes that only distributes local implicit functions around the surface of interest. We achieve this goal by introducing a hierarchical octree structure to adaptively subdivide the 3D space according to the surface occupancy and the richness of part geometry. As octree is discrete and non-differentiable, we further propose a novel hierarchical network that models the subdivision of octree cells as a probabilistic process and recursively encodes and decodes both octree structure and surface geometry in a differentiable manner. 
We demonstrate the value of OctField for a range of shape modeling and reconstruction tasks, showing superiority over alternative approaches.	翻訳日:2021-11-02 16:50:02 公開日:2021-11-01
# RefineGAN: 精度の高いピッチと強度応答を持つグラウンド真理よりも優れた波形を普遍的に生成する RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses ( http://arxiv.org/abs/2111.00962v1 ) ライセンス: Link先を確認	Shengyuan Xu, Wenxiao Zhao, Jing Guo	(参考訳) GAN(Generative Adversarial Network)に基づく高忠実度波形生成へのアプローチの多くは、その性能向上のために識別器に大きく依存している。しかし、このGAN法の過剰使用は、生成過程に大きな不確実性をもたらし、しばしばピッチと強度のミスマッチを引き起こし、歌声合成(SVS)のような敏感なケースでは致命的である。この問題に対処するため,実時間より高速な生成機能を備えた高忠実なニューラルボコーダであるRefineGANを提案し,ロバスト性,ピッチと強度の精度,フルバンドオーディオ生成に着目した。学習過程の安定化と神経ボコーダのロバスト性を維持するために,マルチスケールのスペクトログラムに基づく損失関数を用いたピッチ誘導型洗練アーキテクチャを用いた。この方法で生成された音声は、グラウンドトゥルース音声と比較した場合、主観的テストにおいて優れた性能を示す。この結果から, 話者や録音手順に由来する欠陥を除去することにより, 波形再構成時の忠実度も向上することが示された。さらに、ある特定の種類のデータに基づいて訓練されたモデルが、全く見えない言語と目に見えない話者で同じように機能することを示した。生成されたサンプルペアはhttps://timedomain-tech.github.io/refinegan/で提供される。 Most GAN (Generative Adversarial Network)-based approaches towards high-fidelity waveform generation heavily rely on discriminators to improve their performance. However, the over-use of this GAN method introduces much uncertainty into the generation process and often results in mismatches of pitch and intensity, which is fatal when it comes to sensitive use cases such as singing voice synthesis (SVS). To address this problem, we propose RefineGAN, a high-fidelity neural vocoder with faster-than-real-time generation capability, and focused on the robustness, pitch and intensity accuracy, and full-band audio generation. We employed a pitch-guided refine architecture with a multi-scale spectrogram-based loss function to help stabilize the training process and maintain the robustness of the neural vocoder while using the GAN-based training method. Audio generated using this method shows a better performance in subjective tests when compared with the ground-truth audio. This result shows that the fidelity is even improved during the waveform reconstruction by eliminating defects produced by the speaker and the recording procedure. 
Moreover, a further study shows that models trained on a specified type of data can perform on totally unseen language and unseen speaker identically well. Generated sample pairs are provided on https://timedomain-tech.github.io/refinegan/.	翻訳日:2021-11-02 16:48:53 公開日:2021-11-01
# (参考訳) オープンワールドサンプリングによる不均衡なシードデータのコントラスト学習の改善 Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling ( http://arxiv.org/abs/2111.01004v1 ) ライセンス: CC BY 4.0	Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang	(参考訳) 対照的な学習アプローチは、ターゲットクラスのラベルがほとんどない視覚表現の学習において大きな成功を収めた。これは、インターネット規模の外部ソースからのラベルなしイメージをより多く取り入れて、そのパフォーマンスを向上させるという、キュレートされた"シード"ベンチマークを超えてそれらをスケールアップする可能性を示すものだ。しかし、実際には、ラベルのないデータが増えると、より大きなモデルサイズとより長いトレーニングが必要になるため、より多くの計算資源が必要となる。さらに、オープンワールドのラベルなしデータは、通常、暗黙のロングテールなクラスまたは属性分布に従い、その多くはターゲットクラスに属しない。したがって、ラベルのないデータをすべて盲目的に活用すれば、データの不均衡と注意散逸の問題を招く可能性がある。このことは、関連するクラスに対して一般化可能でバランスの取れた多様な表現を学ぶために、外部ソースからラベルのないデータを戦略的に選択する原則的なアプローチを模索する動機となっている。本研究では,(1)無作為データ拡張によるサンプルの実証的コントラスト損失期待(ECLE)のソートによるテールクラスからのサンプルのサンプリングを促進するテールネス,(2)学習を妨げかねない分布外の外れ値を拒否する近接性,(3)サンプルの集合における多様性を保証するダイバーシティの3つの簡単な原則に従う,MAK(Model-Aware K-center)と呼ばれるオープンワールドなラベルなしデータサンプリングフレームワークを提案する。実験では,ImageNet-100-LT(ラベルなし)をシードデータセットと2つの"ノイズ"外部データソースとして使用することにより,MAKは,フルショット設定と少数ショット設定の線形分類器評価により,学習した機能の全体的な表現品質とクラスバランス性の両方を一貫して改善できることを示した。コードは以下で公開されている。 https://github.com/VITA-Group/MAK Contrastive learning approaches have achieved great success in learning visual representations with few labels of the target classes. That implies a tantalizing possibility of scaling them up beyond a curated "seed" benchmark, to incorporating more unlabeled images from the internet-scale external sources to enhance its performance. However, in practice, larger amount of unlabeled data will require more computing resources due to the bigger model size and longer training needed. Moreover, open-world unlabeled data usually follows an implicit long-tail class or attribute distribution, many of which also do not belong to the target classes. Blindly leveraging all unlabeled data hence can lead to the data imbalance as well as distraction issues. 
This motivates us to seek a principled approach to strategically select unlabeled data from an external source, in order to learn generalizable, balanced and diverse representations for relevant classes. In this work, we present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK), which follows three simple principles: (1) tailness, which encourages sampling of examples from tail classes, by sorting the empirical contrastive loss expectation (ECLE) of samples over random data augmentations; (2) proximity, which rejects the out-of-distribution outliers that may distract training; and (3) diversity, which ensures diversity in the set of sampled examples. Empirically, using ImageNet-100-LT (without labels) as the seed dataset and two "noisy" external data sources, we demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features, as evaluated via linear classifier evaluation on full-shot and few-shot settings. The code is available at: https://github.com/VITA-Group/MAK	翻訳日:2021-11-02 16:46:11 公開日:2021-11-01
# 構造情報が鍵:3次元物体検出における自己注意型RoI特徴エクストラクタ Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection ( http://arxiv.org/abs/2111.00931v1 ) ライセンス: Link先を確認	Diankun Zhang, Zhijie Zheng, Xueting Bi, Xiaojun Liu	(参考訳) RoIのすべての特徴がグリッドピクセルから得られる2Dオブジェクト検出とは異なり、3Dポイントクラウドオブジェクト検出のRoI特徴抽出はより多様である。本稿ではまず,2つの最先端モデルPV-RCNNとVoxel-RCNNの構造と性能の違いを比較し,解析する。そして,2つのモデル間の性能差は点情報からではなく,構造情報から生じることがわかった。ボクセルの特徴は、点雲にダウンサンプリングする代わりに量子化を行うので、点雲全体のほぼ完全な情報を含むことができるため、より多くの構造的な情報を含んでいる。ボクセルの特徴の強い構造情報は、正確な位置情報を持っていなくても、我々の実験で検出器の性能を高める。そこで我々は,構造情報が3次元物体検出の鍵であると提案する。以上の結論に基づき、3次元提案から抽出した特徴の構造情報を強化する自己注意型RoI特徴抽出器(SARFE)を提案する。 SARFEは既存の3D検出器で簡単に使用できるプラグアンドプレイモジュールである。我々のSARFEは、KITTIデータセットとWaymo Openデータセットの両方で評価されます。新たに導入されたSARFEにより、リアルタイム能力を維持しながら、KITTIデータセットのサイクリストクラスにおいて最先端の3D検出器の性能を大幅に向上する。 Unlike 2D object detection where all RoI features come from grid pixels, the RoI feature extraction of 3D point cloud object detection is more diverse. In this paper, we first compare and analyze the differences in structure and performance between the two state-of-the-art models PV-RCNN and Voxel-RCNN. Then, we find that the performance gap between the two models does not come from point information, but structural information. The voxel features contain more structural information because they do quantization instead of downsampling to point cloud so that they can contain basically the complete information of the whole point cloud. The stronger structural information in voxel features makes the detector have higher performance in our experiments even if the voxel features don't have accurate location information. Then, we propose that structural information is the key to 3D object detection. Based on the above conclusion, we propose a Self-Attention RoI Feature Extractor (SARFE) to enhance structural information of the feature extracted from 3D proposals. SARFE is a plug-and-play module that can be easily used on existing 3D detectors. 
Our SARFE is evaluated on both KITTI dataset and Waymo Open dataset. With the newly introduced SARFE, we improve the performance of the state-of-the-art 3D detectors by a large margin in cyclist on KITTI dataset while keeping real-time capability.	翻訳日:2021-11-02 16:30:54 公開日:2021-11-01
# 3次元人体姿勢推定のための高次インプリシットフェアリングネットワーク Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation ( http://arxiv.org/abs/2111.00950v1 ) ライセンス: Link先を確認	Jianning Quan and A. Ben Hamza	(参考訳) 人間の3Dポーズを推定することは、主に人体の関節の複雑さ、閉塞、照明条件の変動のために、難しい課題であることが証明されている。本稿では,2次元から3次元のポーズ推定のための初期残差接続を持つ高次グラフ畳み込みフレームワークを提案する。ノード特徴の集約にマルチホップ近傍を用いることにより,身体関節間の長距離依存性を捉えることができる。さらに,ネットワークアーキテクチャにおいて設計により統合された残差接続を活用し,ネットワーク深度が増大するにつれて,学習した特徴表現が入力層の初期特徴から重要な情報を保持することを保証する。 2つの標準ベンチマークで行った実験とアブレーション研究は、我々のモデルの有効性を示し、3次元ポーズ推定のための強力なベースライン法よりも優れた性能を実現した。 Estimating a 3D human pose has proven to be a challenging task, primarily because of the complexity of the human body joints, occlusions, and variability in lighting conditions. In this paper, we introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation. Using multi-hop neighborhoods for node feature aggregation, our model is able to capture the long-range dependencies between body joints. Moreover, our approach leverages residual connections, which are integrated by design in our network architecture, ensuring that the learned feature representations retain important information from the initial features of the input layer as the network depth increases. Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model, achieving superior performance over strong baseline methods for 3D human pose estimation.	翻訳日:2021-11-02 16:30:33 公開日:2021-11-01
# ウェアラブルカメラと多モード融合による人間軌道予測 Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion ( http://arxiv.org/abs/2111.00993v1 ) ライセンス: Link先を確認	Jianing Qiu, Lipeng Chen, Xiao Gu, Frank P.-W. Lo, Ya-Yen Tsai, Jiankai Sun, Jiaqi Liu and Benny Lo	(参考訳) 本稿では,密集空間における自我中心型カメラ装着者(自我者)の軌跡予測の問題に対処する。現実世界を歩き回るさまざまなカメラの装着者のデータから得られた軌道予測能力は、視覚障害者のナビゲーション支援や、移動ロボットにおける人間のナビゲーション行動のシミュレーション、人間とロボットのインタラクションの改善に移すことができる。この目的のために、カメラを装着した混雑した空間を航行する人々の実際の軌跡を含む、新しいエゴセントリックな人間の軌道予測データセットを構築し、豊かな文脈データを抽出した。我々は,カメラ装着者の過去の軌跡,近所の人々の過去の軌跡,シーンの意味やシーンの深さなどの環境を予測するために,3つの異なるモダリティを抽出し,活用する。複数のモードを融合する新しいカスケードクロスアテンション機構を組み込んだトランスフォーマベースのエンコーダ・デコーダニューラルネットワークモデルは、カメラ装着者の将来の軌道を予測するために設計されている。実験により,エゴセントリックな人軌道予測において,本モデルが最先端の手法より優れていることが示された。 In this paper, we address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces. The trajectory forecasting ability learned from the data of different camera wearers walking around in the real world can be transferred to assist visually impaired people in navigation, as well as to instill human navigation behaviours in mobile robots, enabling better human-robot interactions. To this end, a novel egocentric human trajectory forecasting dataset was constructed, containing real trajectories of people navigating in crowded spaces wearing a camera, as well as extracted rich contextual data. We extract and utilize three different modalities to forecast the trajectory of the camera wearer, i.e., his/her past trajectory, the past trajectories of nearby people, and the environment such as the scene semantics or the depth of the scene. A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism that fuses multiple modalities, has been designed to predict the future trajectory of the camera wearer. 
Extensive experiments have been conducted, and the results have shown that our model outperforms the state-of-the-art methods in egocentric human trajectory forecasting.	翻訳日:2021-11-02 16:30:21 公開日:2021-11-01
# Render In-between: 動作補間のための動き誘導型ビデオ合成 Render In-between: Motion Guided Video Synthesis for Action Interpolation ( http://arxiv.org/abs/2111.01029v1 ) ライセンス: Link先を確認	Hsuan-I Ho, Xu Chen, Jie Song, Otmar Hilliges	(参考訳) 人間のアクティビティのビデオのアップサンプリングは、ゲームからエンターテイメント、スポーツ放送に至るまで、多くの潜在的なアプリケーションがある、興味深いが難しい課題だ。この環境でビデオフレームを合成することの主な難しさは、人間の動きの非常に複雑で非線形な性質と、身体の複雑な外観とテクスチャに起因する。本稿では,現実的な人間の動きと外観を創出できる動き誘導型フレームアップサンプリングフレームワークを提案する。大規模モーションキャプチャデータセット(AMASS)を利用して、フレーム間の非線形骨格運動を推定する新しいモーションモデルを訓練する。高フレームレートのポーズ予測は、ニューラルレンダリングパイプラインがフルフレーム出力を生成するために使用し、ポーズと背景の一貫性を考慮している。私たちのパイプラインでは、低フレームレートビデオと非ペアの人間のモーションデータしか必要とせず、トレーニングのために高フレームレートビデオは必要ありません。さらに,この課題に対して,人間の活動の高品質かつ高フレームレートのビデオからなる最初の評価データセットを寄贈する。現状の映像補間技術と比較すると, 本手法はより高い品質と精度の中間フレームを生成し, これは画素レベルの指標, 分布指標, 比較ユーザ評価における最先端の結果から明らかである。私たちのコードと収集したデータセットはhttps://git.io/Render-In-Betweenで利用可能です。 Upsampling videos of human activity is an interesting yet challenging task with many potential applications ranging from gaming to entertainment and sports broadcasting. The main difficulty in synthesizing video frames in this setting stems from the highly complex and non-linear nature of human motion and the complex appearance and texture of the body. We propose to address these issues in a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance. A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset (AMASS). The high-frame-rate pose predictions are then used by a neural rendering pipeline to produce the full-frame output, taking the pose and background consistency into consideration. Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training. Furthermore, we contribute the first evaluation dataset that consists of high-quality and high-frame-rate videos of human activities for this task. 
Compared with state-of-the-art video interpolation techniques, our method produces in-between frames with better quality and accuracy, which is evident by state-of-the-art results on pixel-level, distributional metrics and comparative user evaluations. Our code and the collected dataset are available at https://git.io/Render-In-Between.	翻訳日:2021-11-02 16:29:58 公開日:2021-11-01
# 英語の発音誤り検出と診断のための非自己回帰的エンドツーエンドニューラルモデリングの検討 Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis ( http://arxiv.org/abs/2111.00844v1 ) ライセンス: Link先を確認	Hsin-Wei Wang, Bi-Cheng Yan, Hsuan-Sheng Chiu, Yung-Chang Hsu, Berlin Chen	(参考訳) エンド・ツー・エンド(E2E)ニューラル・モデリングは、コンピュータ支援言語訓練(CAPT)システム開発における主要なアプローチの一つとして登場し、従来の発音スコアリングに基づく手法に匹敵する性能を示している。しかし、CAPTの現在のE2Eニューラルメソッドは、少なくとも2つの重要な課題に直面している。一方、E2E法のほとんどは、左から右へのビームサーチで自己回帰的に動作し、L2学習者の発音を書き取る。しかし、これは推論の速度が非常に遅くなり、必然的に実用を妨げます。一方、E2Eニューラルメソッドは通常大量のデータを必要とし、非ネイティブなトレーニングデータが不足すると、発音誤りの検出と診断(MD&D)に対する効果が低下することがしばしばある。そこで我々は,非自己回帰(NAR)E2Eニューラルモデリングを利用した新しいMD&D手法を提案し,従来のE2Eニューラル手法と同等の性能を維持しつつ,推論時間を劇的に高速化した。さらに,本手法のNAR E2Eモデル上に積み重ねた発音モデリングネットワークを設計・開発し,MD&Dの有効性をさらに向上する。 L2-ARCTIC英語データセットを用いた実験は,いくつかの最上位のE2Eモデルや,DNN-HMM音響モデル上に構築された代表的な発音スコアリングに基づく手法との比較において,本手法の実現可能性を裏付けているようである。 End-to-end (E2E) neural modeling has emerged as one predominant school of thought to develop computer-assisted language training (CAPT) systems, showing competitive performance to conventional pronunciation-scoring based methods. However, current E2E neural methods for CAPT are faced with at least two pivotal challenges. On one hand, most of the E2E methods operate in an autoregressive manner with left-to-right beam search to dictate the pronunciations of L2 learners. This however leads to very slow inference speed, which inevitably hinders their practical use. On the other hand, E2E neural methods are normally data greedy and meanwhile an insufficient amount of nonnative training data would often reduce their efficacy on mispronunciation detection and diagnosis (MD&D). In response, we put forward a novel MD&D method that leverages non-autoregressive (NAR) E2E neural modeling to dramatically speed up the inference time while maintaining performance in line with the conventional E2E neural methods. 
In addition, we design and develop a pronunciation modeling network stacked on top of the NAR E2E models of our method to further boost the effectiveness of MD&D. Empirical experiments conducted on the L2-ARCTIC English dataset seem to validate the feasibility of our method, in comparison to some top-of-the-line E2E models and an iconic pronunciation-scoring based method built on a DNN-HMM acoustic model.	翻訳日:2021-11-02 16:29:35 公開日:2021-11-01
# 擬球面コントラスト発散 Pseudo-Spherical Contrastive Divergence ( http://arxiv.org/abs/2111.00780v1 ) ライセンス: Link先を確認	Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon	(参考訳) エネルギーベースモデル(EBM)は柔軟な分布パラメトリゼーションを提供する。しかし、難解な分割関数のため、通常、最大確率推定のために対比的発散を通じて訓練される。本稿では,ESMの最大確率学習を一般化するための擬球面コントラスト分散(PS-CD)を提案する。 ps-cdは、難解な分割関数の計算を回避し、対照的な発散を含む一般化された学習目的のファミリーを提供する、厳密に適切な均質なスコアリングルールのファミリーの最大化に由来する。さらにPS-CDでは,計算コストや変動最小値の最適化を伴わずに,多様な学習目標を柔軟に選択することができる。提案手法の理論的解析と合成データと一般的な画像データセットの両方に関する広範な実験により、ps-cdの有効性とモデリングの柔軟性が示され、データの汚染に対する堅牢性が示され、最大精度と$f$-ebmsよりも優れていることが示された。 Energy-based models (EBMs) offer flexible distribution parametrization. However, due to the intractable partition function, they are typically trained via contrastive divergence for maximum likelihood estimation. In this paper, we propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of EBMs. PS-CD is derived from the maximization of a family of strictly proper homogeneous scoring rules, which avoids the computation of the intractable partition function and provides a generalized family of learning objectives that include contrastive divergence as a special case. Moreover, PS-CD allows us to flexibly choose various learning objectives to train EBMs without additional computational cost or variational minimax optimization. Theoretical analysis on the proposed method and extensive experiments on both synthetic data and commonly used image datasets demonstrate the effectiveness and modeling flexibility of PS-CD, as well as its robustness to data contamination, thus showing its superiority over maximum likelihood and $f$-EBMs.	翻訳日:2021-11-02 16:22:41 公開日:2021-11-01
# Back to Basics:IMPによる効率的なネットワーク圧縮 Back to Basics: Efficient Network Compression via IMP ( http://arxiv.org/abs/2111.00843v1 ) ライセンス: Link先を確認	Max Zimmer, Christoph Spiegel, Sebastian Pokutta	(参考訳) ネットワークプルーニング(Network pruning)は、推論時の性能劣化をほとんど伴わずに、ディープニューラルネットワークを効果的に圧縮するために広く用いられる手法である。イテレーティブ・マグニチュード・プルーニング(IMP)は、ネットワークプルーニングの最も確立されたアプローチの1つであり、いくつかの反復的なトレーニングとプルーニングステップで構成されており、ネットワークのパフォーマンスのかなりの部分がプルーニング後に失われ、その後の再トレーニングフェーズで回復される。ベンチマーク参照として一般的に使用されるが、次のようにしばしば主張される。 a) 訓練段階にスパース化を取り入れないことにより、準最適状態に達すること。 b) グローバル選択基準が最適な層ごとのプルーニング率を適切に決定できないこと。 c) その反復的な性質のために、遅く競争力がないこと。最近提案された再訓練技術に照らして,IMPとpruning-during-trainingアルゴリズムを比較し,選択基準の修正案を評価し,実際に必要とされるイテレーション数と総トレーニング時間について検討することで,これらの主張を厳密かつ一貫した実験により検証した。その結果,再学習にSLRを用いたIMPは,計算オーバーヘッドなし,あるいはわずかなオーバーヘッドで最先端のpruning-during-trainingアプローチを上回りうること,大域的な大きさ選択基準はより複雑なアプローチとほぼ競合すること,そしてIMPのスパース性対性能のトレードオフの大部分を達成するのに実際に必要な再学習エポックはごくわずかであることがわかった。我々の目標は、基本的なIMPが既に、より複雑でパラメータの多いアプローチに匹敵する、あるいはさらに優れた、最先端のプルーニング結果を提供できることを示し、また、将来の研究のためのより現実的で容易に実現可能なベースラインを確立することである。 Network pruning is a widely used technique for effectively compressing Deep Neural Networks with little to no degradation in performance during inference. Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning, consisting of several iterative training and pruning steps, where a significant amount of the network's performance is lost after pruning and then recovered in the subsequent retraining phase. While commonly used as a benchmark reference, it is often argued that a) it reaches suboptimal states by not incorporating sparsification into the training phase, b) its global selection criterion fails to properly determine optimal layer-wise pruning rates and c) its iterative nature makes it slow and non-competitive. 
In light of recently proposed retraining techniques, we investigate these claims through rigorous and consistent experiments where we compare IMP to pruning-during-training algorithms, evaluate proposed modifications of its selection criterion and study the number of iterations and total training time actually required. We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches without or with only little computational overhead, that the global magnitude selection criterion is largely competitive with more complex approaches and that only few retraining epochs are needed in practice to achieve most of the sparsity-vs.-performance tradeoff of IMP. Our goals are both to demonstrate that basic IMP can already provide state-of-the-art pruning results on par with or even outperforming more complex or heavily parameterized approaches and also to establish a more realistic yet easily realisable baseline for future research.	翻訳日:2021-11-02 16:22:24 公開日:2021-11-01
# 機械学習による作物収量最適化 Machine Learning aided Crop Yield Optimization ( http://arxiv.org/abs/2111.00963v1 ) ライセンス: Link先を確認	Chace Ashcraft, Kiran Karra	(参考訳) 本稿では,openai体育館インタフェースを用いた作物シミュレーション環境を提案し,現代的深層強化学習(drl)アルゴリズムを用いて収率を最適化する。 DRLアルゴリズムは,水や肥料の使用量などの制約要因を最小化しつつ,収穫量の最適化を支援する新しい政策やアプローチの発見に有用であることを示す。我々は、このハイブリッドプラントモデルとデータ駆動アプローチにより、作物収量を最適化する新しい戦略が、人口増加と気候変動による世界の食料需要に対応するのに役立つことを示唆する。 We present a crop simulation environment with an OpenAI Gym interface, and apply modern deep reinforcement learning (DRL) algorithms to optimize yield. We empirically show that DRL algorithms may be useful in discovering new policies and approaches to help optimize crop yield, while simultaneously minimizing constraining factors such as water and fertilizer usage. We propose that this hybrid plant modeling and data-driven approach for discovering new strategies to optimize crop yield may help address upcoming global food demands due to population expansion and climate change.	翻訳日:2021-11-02 16:20:47 公開日:2021-11-01
# PDE-READ:ディープラーニングを用いた人間可読部分微分方程式探索 PDE-READ: Human-readable Partial Differential Equation Discovery using Deep Learning ( http://arxiv.org/abs/2111.00998v1 ) ライセンス: Link先を確認	Robert Stephany, Christopher Earls	(参考訳) PDE発見は、複雑な物理系の予測モデルを明らかにすることを約束するが、測定がまばらでノイズの多い場合には困難である。本稿では,2つの有理ニューラルネットワークと原理的スパース回帰アルゴリズムを用いて,システムの応答を支配する隠れたダイナミクスを同定する新しい手法を提案する。第1のネットワークはシステム応答関数を、第2のネットワークはシステムの進化を駆動する隠れPDEを学習する。次に,パラメータフリーなスパース回帰アルゴリズムを用いて,隠れたPDEの可読な形式を第2ネットワークから抽出する。我々はPDE-READと呼ばれるオープンソースライブラリにアプローチを実装した。提案手法は, 熱, バーガース, コルテヴェーグ・ド・ブリーズ方程式を顕著な整合性で同定する。提案手法は空間と雑音の両方に対して前例のない頑健であり,実世界の観測データに適用可能であることを示す。 PDE discovery shows promise for uncovering predictive models for complex physical systems but has difficulty when measurements are sparse and noisy. We introduce a new approach for PDE discovery that uses two Rational Neural Networks and a principled sparse regression algorithm to identify the hidden dynamics that govern a system's response. The first network learns the system response function, while the second learns a hidden PDE which drives the system's evolution. We then use a parameter-free sparse regression algorithm to extract a human-readable form of the hidden PDE from the second network. We implement our approach in an open-source library called PDE-READ. Our approach successfully identifies the Heat, Burgers, and Korteweg-De Vries equations with remarkable consistency. We demonstrate that our approach is unprecedentedly robust to both sparsity and noise and is, therefore, applicable to real-world observational data.	翻訳日:2021-11-02 16:20:37 公開日:2021-11-01
# 分散原理は、ドロップアウトがフラットなミニマを見つける理由を説明する A variance principle explains why dropout finds flatter minima ( http://arxiv.org/abs/2111.01022v1 ) ライセンス: Link先を確認	Zhongwang Zhang, Hanxu Zhou, Zhi-Qin John Xu	(参考訳) ドロップアウトはディープラーニングにおいて大きな成功をおさめたが、高次元パラメータ空間における優れた一般化解を見つけるのにどのように役立つかはほとんど分かっていない。本研究では,ドロップアウトによる学習では,標準的な勾配降下訓練と比較して,より平坦な極小を持つニューラルネットワークが得られることを示す。さらに, ドロップアウトがより平坦な極小を見つけるメカニズムについて実験を通じて検討した。ノイズの分散は損失景観のより鋭い方向で大きくなるという「分散原理(Variance Principle)」を提案する。既存の研究によると、SGDは分散原理を満たしており、これによりトレーニングはより平坦な極小へ導かれる。我々の研究は、ドロップアウトによるノイズも分散原理を満たすことを示し、これがドロップアウトがより平坦な極小を見つける理由を説明する。一般論として, 分散原理は, トレーニングをより平坦な極小へ導き優れた一般化を得る, ドロップアウトとSGDとの重要な類似性である,と指摘する。 Although dropout has achieved great success in deep learning, little is known about how it helps the training find a good generalization solution in the high-dimensional parameter space. In this work, we show that the training with dropout finds the neural network with a flatter minimum compared with standard gradient descent training. We further study the underlying mechanism of why dropout finds flatter minima through experiments. We propose a Variance Principle that the variance of a noise is larger at the sharper direction of the loss landscape. Existing works show that SGD satisfies the variance principle, which leads the training to flatter minima. Our work shows that the noise induced by the dropout also satisfies the variance principle that explains why dropout finds flatter minima. In general, our work points out that the variance principle is an important similarity between dropout and SGD that leads the training to find flatter minima and obtain good generalization.	翻訳日:2021-11-02 16:20:23 公開日:2021-11-01
# FedFm:エッジノードにおける障害軽減のためのロバストなフェデレーション学習アプローチを目指して FedFm: Towards a Robust Federated Learning Approach For Fault Mitigation at the Edge Nodes ( http://arxiv.org/abs/2111.01074v1 ) ライセンス: Link先を確認	Manupriya Gupta, Pavas Goyal, Rohit Verma, Rajeev Shorey, Huzur Saran	(参考訳) フェデレーション学習は、"データからモデルへのsend"から"モデルからデータへのsend"へと変化します。エッジエコシステムで使用すると、さまざまな手段でデータを収集し、異なるネットワークチャネルを介して接続する多数の異種エッジデバイスがトレーニングプロセスに関与します。このようなエコシステムにおけるエッジデバイスの障害は、デバイス障害やネットワーク上の問題によるものだ。本稿では、まず、FLモデルにおけるエッジデバイス数の影響を分析し、モデルに寄与する最適なデバイス数を選択するための戦略を提供する。選択したデバイスが失敗した場合,エッジエコシステムがどのように振る舞うかを観察し,堅牢なフェデレート学習技術を保証するための緩和戦略を提供する。 Federated Learning deviates from the norm of "send data to model" to "send model to data". When used in an edge ecosystem, numerous heterogeneous edge devices collecting data through different means and connected through different network channels get involved in the training process. Failure of edge devices in such an ecosystem due to device fault or network issues is highly likely. In this paper, we first analyse the impact of the number of edge devices on an FL model and provide a strategy to select an optimal number of devices that would contribute to the model. We observe how the edge ecosystem behaves when the selected devices fail and provide a mitigation strategy to ensure a robust Federated Learning technique.	翻訳日:2021-11-02 16:20:05 公開日:2021-11-01
# SmartSplit: スマートフォン環境におけるCNN分割のためのレイテンシ・エネルギー・メモリ最適化 SmartSplit: Latency-Energy-Memory Optimisation for CNN Splitting on Smartphone Environment ( http://arxiv.org/abs/2111.01077v1 ) ライセンス: Link先を確認	Ishan Prakash, Aniruddh Bansal, Rohit Verma, Rajeev Shorey	(参考訳) スマートフォン業界では、すべての処理をユーザに近づけて、プライバシの懸念に対処する必要性から、人工知能が中心的存在となっている。複数のAIアプリケーションで使用されている畳み込みニューラルネットワーク(CNN)は、非常にリソースと計算集約性が高い。次世代スマートフォンはAI対応チップを備えているが、多くのアプリケーションがスマートフォン上で同時に実行されるため、最小限のメモリとエネルギー利用が不可欠である。これを踏まえると、処理の一部をクラウドサーバにオフロードすることで、スマートフォンのワークロードを最適化することは、研究の重要な方向である。本稿では,スマートフォンとクラウドサーバ間でCNNを分割する可能性について,エンドツーエンドのレイテンシ,メモリ利用,エネルギー消費を最適化する多目的最適化問題を定式化することによって分析する。我々は、最適化問題を解決するために、決定分析に基づくアプローチを組み合わせた遺伝的アルゴリズムであるSmartSplitを設計した。実験では複数のCNNモデルを用いて,スマートフォンとクラウドサーバのCNN分割が実現可能であることを示す。提案されたアプローチであるSmartSplitは、他の最先端のアプローチよりも優れている。 Artificial Intelligence has now taken centre stage in the smartphone industry owing to the need of bringing all processing close to the user and addressing privacy concerns. Convolutional Neural Networks (CNNs), which are used by several AI applications, are highly resource and computation intensive. Although new generation smartphones come with AI-enabled chips, minimal memory and energy utilisation is essential as many applications are run concurrently on a smartphone. In light of this, optimising the workload on the smartphone by offloading a part of the processing to a cloud server is an important direction of research. In this paper, we analyse the feasibility of splitting CNNs between a smartphone and a cloud server by formulating a multi-objective optimisation problem that optimises the end-to-end latency, memory utilisation, and energy consumption. We design SmartSplit, a Genetic Algorithm with a decision analysis based approach, to solve the optimisation problem. Our experiments run with multiple CNN models show that splitting a CNN between a smartphone and a cloud server is feasible. The proposed approach, SmartSplit, fares better when compared to other state-of-the-art approaches.	翻訳日:2021-11-02 16:19:54 公開日:2021-11-01
# winogradの最小フィルタリングに基づく高速畳み込み:導入と開発 Fast Convolution based on Winograd Minimum Filtering: Introduction and Development ( http://arxiv.org/abs/2111.00977v1 ) ライセンス: Link先を確認	Gan Tong and Libo Huang	(参考訳) 畳み込みニューラルネットワーク(CNN)は様々な分野で広く使われており、重要な役割を果たしている。畳み込み演算子は畳み込みニューラルネットワークの基本的なコンポーネントであり、ネットワークトレーニングと推論の最も時間を要する部分でもある。近年、FFTやWinogradなどの高速な畳み込みアルゴリズムが提案されている。このうち、ウィノグラードの畳み込みは畳み込みにおける乗算演算を著しく減少させ、FFT畳み込みよりもメモリ空間を小さくする。したがって、ウィノグラード畳み込みは数年のうちに高速畳み込み実装の最初の選択肢となった。現在、畳み込みアルゴリズムの体系的な概要は存在しない。本稿は、このギャップを埋め、フォローアップ研究者に詳細なリファレンスを提供することを目的としている。本稿では,アルゴリズム拡張,アルゴリズム最適化,実装,アプリケーションという3つの側面からウィノグラード畳み込みの開発を要約し,最終的に将来的な方向性について簡単な展望を述べる。 Convolutional Neural Network (CNN) has been widely used in various fields and played an important role. Convolution operators are the fundamental component of convolutional neural networks, and it is also the most time-consuming part of network training and inference. In recent years, researchers have proposed several fast convolution algorithms including FFT and Winograd. Among them, Winograd convolution significantly reduces the multiplication operations in convolution, and it also takes up less memory space than FFT convolution. Therefore, Winograd convolution has quickly become the first choice for fast convolution implementation within a few years. At present, there is no systematic summary of the convolution algorithm. This article aims to fill this gap and provide detailed references for follow-up researchers. This article summarizes the development of Winograd convolution from the three aspects of algorithm expansion, algorithm optimization, implementation, and application, and finally makes a simple outlook on the possible future directions.	翻訳日:2021-11-02 16:16:48 公開日:2021-11-01
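A minimal sketch of the multiplication saving mentioned in the abstract above, using the standard one-dimensional Winograd minimal filtering case F(2,3): two outputs of a 3-tap filter are computed with 4 multiplications instead of 6. The function names `direct_f23` and `winograd_f23` are illustrative assumptions and are not taken from the survey.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, computed directly.

    Uses 6 multiplications (3 per output).
    """
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3), using 4 multiplications."""
    # Filter transform: depends only on g, so it can be precomputed
    # once per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions only) followed by the 4 multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]
```

In practice this 1D form is usually nested into 2D tiles, and the extra additions and the divisions by 2 in the filter transform are the price paid for the reduced multiplication count.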
# (参考訳) 混合確率推定とPU学習 : 最近のアプローチ Mixture Proportion Estimation and PU Learning: A Modern Approach ( http://arxiv.org/abs/2111.00980v1 ) ライセンス: CC BY 4.0	Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton	(参考訳) 正の例と(正のクラスと負のクラスの両方から)ラベルされていない例のみを考えると、正確な正の逆負の分類器を推定することを期待できる。形式的には、このタスクは2つのサブタスクに分けられる。 (i)混合比率推定(mpe) --非ラベルデータ中の正の例の比率を決定する。 (ii)pu-learning -このような推定を行い、所望の正負の分類法を学習する。残念ながら、両方の問題の古典的な方法は高次元の設定で分解される。一方、最近提案されたヒューリスティックスは理論的コヒーレンスを欠き、ハイパーパラメータチューニングに依存する。本稿では,pu-learningの単純な目的であるbest bin estimation (bbe) (mpe) とconditional value ignoring risk (cvir) の2つの簡単な手法を提案する。どちらの手法も経験的に従来の手法を支配しており、BBEでは、正の例の小さな部分集合をきれいに分離するためにモデルを訓練できるたびに保持する形式的な保証を確立する。最終アルゴリズム(TED)$^n$は2つの手順を交互に行い、混合比推定器と分類器の両方を著しく改善する。 Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm (TED)$^n$, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier	翻訳日:2021-11-02 16:14:23 公開日:2021-11-01
# 2次元解剖学的ランドマーク検出のための特徴集約・精緻化ネットワーク Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection ( http://arxiv.org/abs/2111.00659v1 ) ライセンス: Link先を確認	Yueyuan Ao and Hong Wu	(参考訳) 解剖学的ランドマークの位置特定は臨床診断、治療計画、研究に不可欠である。本稿では,解剖学的ランドマークの自動検出のための,FARNet(feature aggregation and refinement network)という新しいディープネットワークを提案する。医療領域における限られたトレーニングデータの問題を緩和するため,本ネットワークでは,自然画像で事前学習したディープネットワークをバックボーンネットワークとして採用し,いくつかの人気ネットワークを比較した。我々のFARNetには、マルチスケール特徴融合のためのマルチスケール特徴集約モジュールと、高解像度ヒートマップ回帰のための特徴精緻化モジュールが含まれている。粗から細への(coarse-to-fine)教師信号が2つのモジュールに適用され、エンドツーエンドのトレーニングが促進される。さらに,高精度ヒートマップ回帰のための指数重み付き中心損失という新しい損失関数を提案し,ランドマーク近傍の画素の損失に着目し,遠方からの損失を抑制する。本研究のネットワークは,頭部X線写真,手指X線写真,脊椎X線写真を含む,公開されている3つの解剖学的ランドマーク検出データセットで評価され,3つのデータセットすべてで最先端の性能を達成している。コードは https://github.com/JuvenileInWind/FARNet で公開されている。 Localization of anatomical landmarks is essential for clinical diagnosis, treatment planning, and research. In this paper, we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks. To alleviate the problem of limited training data in the medical domain, our network adopts a deep network pre-trained on natural images as the backbone network and several popular networks have been compared. Our FARNet also includes a multi-scale feature aggregation module for multi-scale feature fusion and a feature refinement module for high-resolution heatmap regression. Coarse-to-fine supervisions are applied to the two modules to facilitate the end-to-end training. We further propose a novel loss function named Exponential Weighted Center loss for accurate heatmap regression, which focuses on the losses from the pixels near landmarks and suppresses the ones from far away. Our network has been evaluated on three publicly available anatomical landmark detection datasets, including cephalometric radiographs, hand radiographs, and spine radiographs, and achieves state-of-the-art performance on all three datasets. Code is available at: https://github.com/JuvenileInWind/FARNet	翻訳日:2021-11-02 15:36:58 公開日:2021-11-01
# 反復ロバスト変換同期の学習 Learning Iterative Robust Transformation Synchronization ( http://arxiv.org/abs/2111.00728v1 ) ライセンス: Link先を確認	Zi Jian Yew and Gim Hee Lee	(参考訳) 変換同期は、与えられた対関係運動の集合から絶対変換を回復する問題である。その有用性にもかかわらず、ノイズや外向きの相対運動の影響や、解析的にモデル化し、高い忠実度で抑制することの難しさにより、この問題は依然として困難である。本研究では,ロバストな損失関数を手作りすることを避け,グラフニューラルネットワーク(gnns)を用いて変換同期を学習する方法を提案する。複雑なマルチステージパイプラインを使用する以前の作業とは異なり、各ステップは、接空間におけるインクリメンタルな更新を予測することによって、前回のイテレーションからの絶対的なステップを洗練する、単一の重み共有メッセージパッシング層で構成される反復的アプローチを採用している。外れ値の影響を減らすために、メッセージは集約の前に重み付けされる。我々の反復的アプローチは明示的な初期化ステップの必要性を軽減し、アイデンティティの初期ポーズとうまく機能する。提案手法は単純ではあるが,SO(3) と SE(3) の同時同期実験により,既存の手工・学習同期手法に対して良好に動作することを示す。 Transformation Synchronization is the problem of recovering absolute transformations from a given set of pairwise relative motions. Despite its usefulness, the problem remains challenging due to the influences from noisy and outlier relative motions, and the difficulty to model analytically and suppress them with high fidelity. In this work, we avoid handcrafting robust loss functions, and propose to use graph neural networks (GNNs) to learn transformation synchronization. Unlike previous works which use complicated multi-stage pipelines, we use an iterative approach where each step consists of a single weight-shared message passing layer that refines the absolute poses from the previous iteration by predicting an incremental update in the tangent space. To reduce the influence of outliers, the messages are weighted before aggregation. Our iterative approach alleviates the need for an explicit initialization step and performs well with identity initial poses. Although our approach is simple, we show that it performs favorably against existing handcrafted and learned synchronization methods through experiments on both SO(3) and SE(3) synchronization.	翻訳日:2021-11-02 15:36:38 公開日:2021-11-01
# バイアス修正モジュールによる局所表現の改善を用いたFew-shot学習 Few-shot learning with improved local representations via bias rectify module ( http://arxiv.org/abs/2111.00754v1 ) ライセンス: Link先を確認	Chao Dong, Qi Ye, Wenchao Meng, Kaixiang Yang	(参考訳) メトリック学習に基づく最近のアプローチは、few-shot学習において大きな進歩を遂げている。しかし、それらのほとんどは画像レベルの表現方法に限られており、クラス内のバリエーションや空間的知識を適切に扱えないため、望ましくない性能にとどまる。本稿では,特徴表現の構造に存在する空間情報を十分に活用するために,Deep Bias Rectify Network(DBRN)を提案する。まず,クラス内変異による悪影響を軽減するためにバイアス修正モジュールを用いる。バイアス修正モジュールは、異なる重みを与えることで、分類に対してより識別的な特徴に焦点を合わせることができる。トレーニングデータを完全に活用するために,サポートセットから生成されたプロトタイプをより代表的なものにするためのプロトタイプ拡張機構を設計する。提案手法の有効性を検証するため,広く使われているfew-shot分類ベンチマークで広範な実験を行い,本手法が最先端手法を上回ることを示した。 Recent approaches based on metric learning have achieved great progress in few-shot learning. However, most of them are limited to image-level representation manners, which fail to properly deal with the intra-class variations and spatial knowledge and thus produce undesirable performance. In this paper, we propose a Deep Bias Rectify Network (DBRN) to fully exploit the spatial information that exists in the structure of the feature representations. We first employ a bias rectify module to alleviate the adverse impact caused by the intra-class variations. The bias rectify module is able to focus on the features that are more discriminative for classification by assigning them different weights. To make full use of the training data, we design a prototype augment mechanism that makes the prototypes generated from the support set more representative. To validate the effectiveness of our method, we conducted extensive experiments on various popular few-shot classification benchmarks, on which our method outperforms state-of-the-art methods.	翻訳日:2021-11-02 15:36:20 公開日:2021-11-01
# 衝突認識因子による相互作用手の単眼3次元再構成 Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements ( http://arxiv.org/abs/2111.00763v1 ) ライセンス: Link先を確認	Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy	(参考訳) 3dインタラクションハンドレコンストラクションは、人間と機械の相互作用と人間の行動を理解するのに不可欠である。この分野での以前の作業では、深度画像のような補助入力に依存するか、単眼のrgb画像を使用する場合のみ片手で処理できる。シングルハンド法は、両手間の相互作用を明示的にモデル化できないため、密接に相互作用する手に適用すると、衝突した手メッシュを生成する傾向がある。本稿では,単眼単一rgb画像から3次元インタラクションハンドを再構築する最初の試みを行う。高精度な3dポーズと最小限の衝突を伴う3dハンドメッシュを生成できる。これは2段階のフレームワークによって実現されている。具体的には、第1段階は畳み込みニューラルネットワークを採用し、衝突を許容するがポーズ精度の高いハンドメッシュを奨励する粗い予測を生成する。第2段階は、3dポーズの正確性を維持しながら、一連の因子化された改良を通じて衝突を段階的に緩和する。効率と精度のトレードオフを考慮し, 分解改質の可能性を慎重に検討する。 interhand2.6m のような大規模データセットにおける広範囲な量的・質的結果が提案手法の有効性を示している。 3D interacting hand reconstruction is essential to facilitate human-machine interaction and human behaviors understanding. Previous works in this field either rely on auxiliary inputs such as depth images or they can only handle a single hand if monocular single RGB images are used. Single-hand methods tend to generate collided hand meshes, when applied to closely interacting hands, since they cannot model the interactions between two hands explicitly. In this paper, we make the first attempt to reconstruct 3D interacting hands from monocular single RGB images. Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions. This is made possible via a two-stage framework. Specifically, the first stage adopts a convolutional neural network to generate coarse predictions that tolerate collisions but encourage pose-accurate hand meshes. The second stage progressively ameliorates the collisions through a series of factorized refinements while retaining the preciseness of 3D poses. We carefully investigate potential implementations for the factorized refinement, considering the trade-off between efficiency and accuracy. 
Extensive quantitative and qualitative results on large-scale datasets such as InterHand2.6M demonstrate the effectiveness of the proposed approach.	翻訳日:2021-11-02 15:36:03 公開日:2021-11-01
# 注意的特徴集約による高密度予測 Dense Prediction with Attentive Feature Aggregation ( http://arxiv.org/abs/2111.00770v1 ) ライセンス: Link先を確認	Yung-Hsu Yang, Thomas E. Huang, Samuel Rota Bul\`o, Peter Kontschieder, Fisher Yu	(参考訳) 異なる層にまたがる特徴から情報を集約することは、密集予測モデルに不可欠な操作である。限定的な表現性にもかかわらず、機能結合は集約操作の選択を支配する。本稿では,AFA(Attentive Feature Aggregation)を導入し,より表現力のある非線形操作で異なるネットワーク層を融合させる。 AFAは、層活性化の重み付き平均を計算するために空間的注意とチャネル的注意の両方を利用する。ニューラルボリュームレンダリングに触発されて、AFAをスケールスペースレンダリング(SSR)で拡張し、マルチスケール予測の後期融合を行う。 afaは、既存のネットワーク設計の幅広い範囲に適用できる。 cityscapes、bdd100k、mapillary vistasなどのセマンティックセグメンテーションベンチマークに対して、無視できる計算量とパラメータオーバーヘッドで、一貫性と大幅な改善が得られました。特に、AFAは、Cityscapes上でのDeep Layer Aggregation(DLA)モデルの性能を約6%向上させる。実験結果から,afa はセグメンテーションマップを段階的に洗練し,境界詳細を改善することを学び,bsds500 および nyudv2 における境界検出ベンチマークの最新の結果を得ることができた。コードとビデオのリソースはhttp://vis.xyz/pub/dla-afaで入手できる。 Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering (SSR) to perform late fusion of multi-scale predictions. AFA is applicable to a wide range of existing network designs. Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead. In particular, AFA improves the performance of the Deep Layer Aggregation (DLA) model by nearly 6% mIoU on Cityscapes. 
Our experimental analyses show that AFA learns to progressively refine segmentation maps and to improve boundary details, leading to new state-of-the-art results on boundary detection benchmarks on BSDS500 and NYUDv2. Code and video resources are available at http://vis.xyz/pub/dla-afa.	翻訳日:2021-11-02 15:34:46 公開日:2021-11-01
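To make the phrase "weighted average of the layer activations" in the abstract above concrete, here is a heavily simplified sketch: two same-sized feature maps are blended position by position with a sigmoid attention weight. This is an illustration under stated assumptions, not the AFA architecture; the function names, the flat-list representation of a feature map, and the idea of passing precomputed attention logits are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attentive_average(shallow, deep, logits):
    """Blend two feature maps with per-position attention weights.

    shallow, deep: flattened feature maps of equal length.
    logits: attention logits, one per position (hypothetical input;
            in a real network these would be predicted by a small
            attention branch from the features themselves).
    """
    if not (len(shallow) == len(deep) == len(logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, d, z in zip(shallow, deep, logits):
        a = sigmoid(z)                  # weight given to the shallow feature
        fused.append(a * s + (1.0 - a) * d)  # convex combination
    return fused
```

Unlike plain concatenation or summation, the mixing weight here varies per position, which is the non-linear, input-dependent behaviour the abstract contrasts with feature concatenation.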
# PP-ShiTu:実用軽量画像認識システム PP-ShiTu: A Practical Lightweight Image Recognition System ( http://arxiv.org/abs/2111.00775v1 ) ライセンス: Link先を確認	Shengyu Wei, Ruoyu Guo, Cheng Cui, Bin Lu, Shuilong Dong, Tingquan Gao, Yuning Du, Ying Zhou, Xueying Lyu, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma	(参考訳) 近年,画像認識アプリケーションの開発が急速に進んでいる。顔認識、歩行者と車両の再識別、ランドマーク検索、製品認識など、さまざまな分野で多くの研究と技術が登場している。本稿では,主物体検出,特徴抽出,ベクトル探索の3つのモジュールからなる実用軽量画像認識システムPP-ShiTuを提案する。メトリクス学習,ディープハッシュ,知識蒸留,モデル量子化といった一般的な戦略を導入し,精度と推論速度を向上させる。上記の戦略では、PP-ShiTuは混合データセットでトレーニングされたモデルのセットで、さまざまなシナリオでうまく機能する。異なるデータセットとベンチマークの実験により、システムは画像認識の異なる領域で広く有効であることが示された。上記のモデルはすべてオープンソースで、コードはGitHubリポジトリPaddleClas on PaddlePaddleで公開されている。 In recent years, image recognition applications have developed rapidly. A large number of studies and techniques have emerged in different fields, such as face recognition, pedestrian and vehicle re-identification, landmark retrieval, and product recognition. In this paper, we propose a practical lightweight image recognition system, named PP-ShiTu, consisting of the following 3 modules, mainbody detection, feature extraction and vector search. We introduce popular strategies including metric learning, deep hash, knowledge distillation and model quantization to improve accuracy and inference speed. With strategies above, PP-ShiTu works well in different scenarios with a set of models trained on a mixed dataset. Experiments on different datasets and benchmarks show that the system is widely effective in different domains of image recognition. All the above mentioned models are open-sourced and the code is available in the GitHub repository PaddleClas on PaddlePaddle.	翻訳日:2021-11-02 15:34:24 公開日:2021-11-01
# 凸形状を持つ測地線モデル Geodesic Models with Convexity Shape Prior ( http://arxiv.org/abs/2111.00794v1 ) ライセンス: Link先を確認	Da Chen and Jean-Marie Mirebeau and Minglei Shu and Xuecheng Tai and Laurent D. Cohen	(参考訳) アイコナール方程式に基づく最小測地モデルは、様々な画像セグメンテーションのシナリオで適切な解を見つけることができる。既存の測地線に基づくセグメンテーションアプローチは、通常、測地線曲線を計算するためにユークリッド曲線の長さや曲率ペナル化長さなどの幾何正規化項と共に画像特徴を利用する。本稿では, より複雑な問題として, 凸形状を持つ曲率ペナル化された測地線経路を求める。我々は、平面曲線を高次元の向きに依存した空間にマッピングできる配向リフト戦略に依存する新しい測地モデルを確立する。凸形状は、特定の曲率の制約をコードする局所測地線メトリクスの構築のための制約となる。そして、向き上げ空間における測地距離とそれに対応する閉測地路を、最先端のハミルトン高速マーチング法により効率的に計算することができる。さらに,提案する測地線モデルをアクティブな輪郭に適用することにより,凸形状の利点と曲率ペナリゼーションを保った効率的なインタラクティブ画像分割アルゴリズムを実現する。 The minimal geodesic models based on the Eikonal equations are capable of finding suitable solutions in various image segmentation scenarios. Existing geodesic-based segmentation approaches usually exploit image features in conjunction with geometric regularization terms, such as Euclidean curve length or curvature-penalized length, for computing geodesic curves. In this paper, we take into account a more complicated problem: finding curvature-penalized geodesic paths with a convexity shape prior. We establish new geodesic models relying on the strategy of orientation-lifting, by which a planar curve can be mapped to a high-dimensional orientation-dependent space. The convexity shape prior serves as a constraint for the construction of local geodesic metrics encoding a particular curvature constraint. Then the geodesic distances and the corresponding closed geodesic paths in the orientation-lifted space can be efficiently computed through the state-of-the-art Hamiltonian fast marching method. In addition, we apply the proposed geodesic models to the active contours, leading to efficient interactive image segmentation algorithms that preserve the advantages of the convexity shape prior and curvature penalization.	翻訳日:2021-11-02 15:34:11 公開日:2021-11-01
# LSTA-Net:スケルトンに基づく行動認識のための長期時空間アグリゲーションネットワーク LSTA-Net: Long short-term Spatio-Temporal Aggregation Network for Skeleton-based Action Recognition ( http://arxiv.org/abs/2111.00823v1 ) ライセンス: Link先を確認	Tailin Chen, Shidong Wang, Desen Zhou, Yu Guan	(参考訳) 様々な時空間依存のモデル化は、スケルトンシーケンスにおける人間の行動を認識する鍵となる。既存の手法の多くは、ダイナミックジョイントの依存関係を引き出すためにトラバーサルルールやグラフトポロジの設計に過度に依存しており、これは遠方かつ重要なジョイントの関連性を反映していない。さらに, 局所的な運用により, 重要な長距離時間情報については, 既存の作品ではよく調べられていない。この問題に対処するため,我々はlsta-netを提案する。このネットワークは,長期・短距離の依存関係を時空間的に効果的に捉えることができる。我々は,空間的特徴の集約と時間的特徴の集約を交互に行う純粋因子化アーキテクチャにモデルを考案する。特徴集約効果を改善するため、チャネルワイドアテンション機構も設計・採用されている。 3つの公開ベンチマークデータセットで広範な実験を行い,提案手法は空間領域と時間領域における長短距離依存性の両方を捉えることができ,他の最先端手法よりも高い結果が得られることが示唆された。コードはhttps://github.com/tailin1009/lsta-net。 Modelling various spatio-temporal dependencies is the key to recognising human actions in skeleton sequences. Most existing methods excessively relied on the design of traversal rules or graph topologies to draw the dependencies of the dynamic joints, which is inadequate to reflect the relationships of the distant yet important joints. Furthermore, due to the locally adopted operations, the important long-range temporal information is therefore not well explored in existing works. To address this issue, in this work we propose LSTA-Net: a novel Long short-term Spatio-Temporal Aggregation Network, which can effectively capture the long/short-range dependencies in a spatio-temporal manner. We devise our model into a pure factorised architecture which can alternately perform spatial feature aggregation and temporal feature aggregation. To improve the feature aggregation effect, a channel-wise attention mechanism is also designed and employed. Extensive experiments were conducted on three public benchmark datasets, and the results suggest that our approach can capture both long-and-short range dependencies in the space and time domain, yielding higher results than other state-of-the-art methods. 
Code available at https://github.com/tailin1009/LSTA-Net.	翻訳日:2021-11-02 15:33:54 公開日:2021-11-01
# 破損不変人物再同定のためのベンチマーク Benchmarks for Corruption Invariant Person Re-identification ( http://arxiv.org/abs/2111.00880v1 ) ライセンス: Link先を確認	Minghui Chen, Zhiqiang Wang, Feng Zheng	(参考訳) 安全クリティカルなアプリケーションに人物再同定(ReID)モデルをデプロイする場合、さまざまな画像破損に対するモデルの堅牢性を理解することが重要となる。しかし、人物ReIDの現在の評価では、クリーンなデータセットでの性能のみを考慮し、さまざまな破損シナリオの画像を無視している。本研究では,破損不変表現を学習するための6つのReIDベンチマークを包括的に構築する。 ReIDの分野では,Market-1501,CUHK03,MSMT17,RegDB,SYSU-MM01を含む単一モダリティおよびクロスモダリティのデータセットにおいて,破損不変学習の徹底的な研究を初めて行う。最近の21種類のReID手法のロバスト性を再現・検証した結果,以下の知見を得た。 1) Transformerベースのモデルは,CNNベースのモデルに比べて破損画像に対して頑健である。 2) ランダム消去(一般的なデータ拡張手法)の確率を高めると,モデルの破損に対する堅牢性が損なわれる。 3) クロスデータセットの汎化性能は,破損に対する堅牢性の向上とともに改善する。上記の観察を分析することにより,多様な破損に対する堅牢性を向上させる,単一モダリティおよびクロスモダリティReIDデータセット上の強力なベースラインを提案する。私たちのコードはhttps://github.com/MinghuiChen43/CIL-ReIDで公開されています。 When deploying a person re-identification (ReID) model in safety-critical applications, it is pivotal to understand the robustness of the model against a diverse array of image corruptions. However, current evaluations of person ReID only consider the performance on clean datasets and ignore images in various corrupted scenarios. In this work, we comprehensively establish six ReID benchmarks for learning corruption invariant representation. In the field of ReID, we are the first to conduct an exhaustive study on corruption invariant learning in single- and cross-modality datasets, including Market-1501, CUHK03, MSMT17, RegDB, SYSU-MM01. After reproducing and examining the robustness performance of 21 recent ReID methods, we make the following observations: 1) transformer-based models are more robust towards corrupted images, compared with CNN-based models, 2) increasing the probability of random erasing (a commonly used augmentation method) hurts model corruption robustness, 3) cross-dataset generalization improves as corruption robustness increases. By analyzing the above observations, we propose a strong baseline on both single- and cross-modality ReID datasets which achieves improved robustness against diverse corruptions. Our code is available at https://github.com/MinghuiChen43/CIL-ReID.	翻訳日:2021-11-02 15:33:34 公開日:2021-11-01
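The second observation in the abstract above concerns the probability parameter of random erasing. As a rough illustration of what that parameter controls, the sketch below overwrites a random rectangle of an image with probability `p`. It is a simplified variant written for this listing, not the benchmark's code: the commonly used formulation samples the erased area and aspect ratio and often fills with random values, whereas `max_frac` and the constant `fill` here are assumptions made to keep the example short.

```python
import random


def random_erase(image, p=0.5, max_frac=0.5, fill=0.0, rng=random):
    """Return a copy of `image` with one random rectangle overwritten.

    image: list of rows (H x W) of numbers.
    p: probability that erasing is applied at all.
    max_frac: largest rectangle side, as a fraction of the image side
              (hypothetical simplification of the usual area/aspect sampling).
    fill: value written into the erased region.
    rng: object providing random() and randint(), e.g. random.Random(seed).
    """
    out = [row[:] for row in image]  # never modify the caller's image
    if rng.random() >= p:
        return out                   # augmentation skipped this time

    h, w = len(out), len(out[0])
    eh = rng.randint(1, max(1, int(h * max_frac)))  # erased height
    ew = rng.randint(1, max(1, int(w * max_frac)))  # erased width
    top = rng.randint(0, h - eh)
    left = rng.randint(0, w - ew)

    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r][c] = fill
    return out
```

Raising `p` means a larger share of training images lose a patch; the abstract reports that, for the methods it examined, doing so reduced robustness to corruptions.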
# リテラリートイデータセットを用いた階層画像分類 Hierarchical Image Classification with A Literally Toy Dataset ( http://arxiv.org/abs/2111.00892v1 ) ライセンス: Link先を確認	Long He, Dandan Song, Liang Zheng	(参考訳) 画像分類における教師なし領域適応(UDA)は依然として大きな課題である。既存のUDAイメージデータセットでは、クラスは通常フラットな方法で整理され、平易な分類器を訓練することができる。しかし、いくつかのシナリオでは、フラットなカテゴリはいくつかのベースクラスに由来する。例えば、セキセイインコ(budgie)は鳥クラスに属する。階層的画像分類として,クラスが上述の特徴を持ち,フラットクラスとベースクラスが階層的に分類される分類タスクを定義する。直感的には、このような階層構造を活用することは、階層的なイメージ分類に役立つ。例えば,混同しやすい2つのクラスがまったく異なるベースクラスに属する場合がある。本稿では,ラベル階層から学習した特徴を融合させて分類性能を向上させる。具体的には,階層ラベルとUDA技術を用いて特徴抽出器を訓練し,入力画像の複数の特徴を出力する。それらの特徴は、最後に最も細かい粒度のクラスを予測するために連結される。この研究はLego-15という新しいデータセットで行われる。 Lego-15のデータセットは、レゴブロックの合成画像と実画像からなり、15クラスのブロックを含む。各クラスは粗いレベルラベルと中間レベルラベルに由来する。例えば、"85080"クラスは、bricks(粗レベル)とbricks round(中間レベル)に関連付けられている。このデータセットにおいて,本手法が階層画像分類におけるUDAのベースラインに対して一貫した改善をもたらすことを示す。大規模なアブレーションと変種研究は、新しいデータセットと調査アルゴリズムに関する洞察を提供する。 Unsupervised domain adaptation (UDA) in image classification remains a big challenge. In existing UDA image datasets, classes are usually organized in a flattened way, where a plain classifier can be trained. Yet in some scenarios, the flat categories originate from some base classes. For example, budgies belong to the class bird. We define the classification task where classes have characteristics above and the flat classes and the base classes are organized hierarchically as hierarchical image classification. Intuitively, leveraging such hierarchical structure will benefit hierarchical image classification, e.g., two easily confusing classes may belong to entirely different base classes. In this paper, we improve the performance of classification by fusing features learned from a hierarchy of labels. Specifically, we train feature extractors supervised by hierarchical labels and with UDA technology, which will output multiple features for an input image. The features are subsequently concatenated to predict the finest-grained class. This study is conducted with a new dataset named Lego-15. Consisting of synthetic images and real images of the Lego bricks, the Lego-15 dataset contains 15 classes of bricks. Each class originates from a coarse-level label and a middle-level label. For example, class "85080" is associated with bricks (coarse) and bricks round (middle). In this dataset, we demonstrate that our method brings about consistent improvement over the baseline in UDA in hierarchical image classification. Extensive ablation and variant studies provide insights into the new dataset and the investigated algorithm.	翻訳日:2021-11-02 15:33:15 公開日:2021-11-01
# PP-PicoDet: モバイルデバイスのリアルタイムオブジェクト検出器 PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices ( http://arxiv.org/abs/2111.00902v1 ) ライセンス: Link先を確認	Guanghua Yu, Qinyao Chang, Wenyu Lv, Chang Xu, Cheng Cui, Wei Ji, Qingqing Dang, Kaipeng Deng, Guanzhong Wang, Yuning Du, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma	(参考訳) 精度と効率のトレードオフは、オブジェクト検出において難しい問題である。本稿では,オブジェクト検出のための重要な最適化とニューラルネットワークアーキテクチャの選択を研究し,精度と効率を向上させることを目的とする。軽量物体検出モデルにおけるアンカーフリー戦略の適用性について検討する。我々は,バックボーン構造を強化し,首の軽量構造を設計し,ネットワークの特徴抽出能力を向上させる。ラベル割り当て戦略と損失関数を改善し,トレーニングをより安定かつ効率的にする。これらの最適化により, PP-PicoDetと呼ばれる, モバイル機器の物体検出性能に優れたリアルタイム物体検出ファミリを新たに構築する。我々のモデルは、他の一般的なモデルと比べて精度とレイテンシのトレードオフが良くなります。わずか0.99MパラメータのPicoDet-Sは30.6%のmAPを実現しており、これはmAPの絶対4.8%の改善であり、YOLOX-Nanoと比較してモバイルCPUの推論遅延を55%削減している。入力サイズが320のとき、モバイルARM CPU上で123 FPS(Paddle Liteを使用した150 FPS)に達する。わずか3.3MパラメータのPicoDet-Lは40.9%のmAPを達成するが、これは絶対3.7%の改善であり、YOLOv5sよりも44%高速である。図1に示すように、私たちのモデルは軽量オブジェクト検出の最先端の結果をはるかに上回っています。コードと事前学習されたモデルはhttps://github.com/paddlepaddle/paddledetectionで入手できる。 The better accuracy and efficiency trade-off has been a challenging problem in object detection. In this work, we are dedicated to studying key optimizations and neural network architecture choices for object detection to improve accuracy and efficiency. We investigate the applicability of the anchor-free strategy on lightweight object detection models. We enhance the backbone structure and design the lightweight structure of the neck, which improves the feature extraction ability of the network. We improve label assignment strategy and loss function to make training more stable and efficient. Through these optimizations, we create a new family of real-time object detectors, named PP-PicoDet, which achieves superior performance on object detection for mobile devices. Our models achieve better trade-offs between accuracy and latency compared to other popular models. 
PicoDet-S with only 0.99M parameters achieves 30.6% mAP, which is an absolute 4.8% improvement in mAP while reducing mobile CPU inference latency by 55% compared to YOLOX-Nano, and is an absolute 7.1% improvement in mAP compared to NanoDet. It reaches 123 FPS (150 FPS using Paddle Lite) on mobile ARM CPU when the input size is 320. PicoDet-L with only 3.3M parameters achieves 40.9% mAP, which is an absolute 3.7% improvement in mAP and 44% faster than YOLOv5s. As shown in Figure 1, our models far outperform the state-of-the-art results for lightweight object detection. Code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.	翻訳日:2021-11-02 15:32:56 公開日:2021-11-01
# DFCANet:クロスドメイン虹彩提示攻撃検出のためのDense Feature Calibration-Attention Guided Network DFCANet: Dense Feature Calibration-Attention Guided Network for Cross Domain Iris Presentation Attack Detection ( http://arxiv.org/abs/2111.00919v1 ) ライセンス: Link先を確認	Gaurav Jaswal, Aman Verma, Sumantra Dutta Roy, Raghavendra Ramachandra	(参考訳) 虹彩提示攻撃検出(IPAD)は、広く利用されている虹彩認識システムにおいて個人の身元を保護するために不可欠である。しかし、既存のIPADアルゴリズムは、制約のない環境での撮影と、真正(bona fide)サンプルと攻撃サンプルの間の高い視覚的相関のため、未知のシナリオやクロスドメインのシナリオにうまく一般化しない。虹彩画像の複雑なテクスチャおよび形態パターンにおけるこれらの類似性は、さらなる性能劣化に寄与する。そこで本稿では,これらの欠点を解消するために,局所的に分布する虹彩パターンを大域的に位置するパターンで校正するDFCANet(Dense Feature Calibration and Attention Guided Network)を提案する。 DFCANetは特徴校正畳み込みと残差学習の利点を活かし、ドメイン固有の虹彩特徴表現を生成する。校正された特徴マップのいくつかのチャネルは、より顕著な情報を含んでいるため、チャネルアテンション機構を通じてチャネルを横断する識別的特徴学習を行う。提案モデルにとっての課題をより難しくするため,DFCANetをセグメンテーションも正規化もしていない眼周囲の虹彩画像上で動作させる。困難なクロスドメインおよびドメイン内シナリオに対する広範囲な実験は、一貫して優れた結果を示している。 DFCANetは最先端の手法と比較して,ベンチマークであるIIITD CLI, IIIT CSD, NDCLD13データベースで大幅な性能向上を達成した。さらに,分離した(disentangled)虹彩データの特性とデータ不足を克服するために,新しいインクリメンタル学習ベースの手法を導入した。また,ソフトレンズを攻撃カテゴリに含め,様々なクロスドメインプロトコルで評価するという難しいシナリオも扱う。コードは公開される予定だ。 Iris presentation attack detection (IPAD) is essential for securing personal identity in widely used iris recognition systems. However, the existing IPAD algorithms do not generalize well to unseen and cross-domain scenarios because of capture in unconstrained environments and high visual correlation amongst bonafide and attack samples. These similarities in intricate textural and morphological patterns of iris ocular images contribute further to performance degradation. To alleviate these shortcomings, this paper proposes DFCANet: Dense Feature Calibration and Attention Guided Network which calibrates the locally spread iris patterns with the globally located ones. Leveraging the advantages of feature calibration convolution and residual learning, DFCANet generates domain-specific iris feature representations. Since some channels in the calibrated feature maps contain more prominent information, we capitalize on discriminative feature learning across the channels through the channel attention mechanism. In order to intensify the challenge for our proposed model, we make DFCANet operate over non-segmented and non-normalized ocular iris images. Extensive experimentation conducted over challenging cross-domain and intra-domain scenarios shows consistently superior results. Compared to state-of-the-art methods, DFCANet achieves significant gains in performance on the benchmark IIITD CLI, IIIT CSD and NDCLD13 databases. Further, a novel incremental learning-based methodology has been introduced so as to overcome disentangled iris-data characteristics and data scarcity. This paper also pursues the challenging scenario that considers soft-lens under the attack category with evaluation performed under various cross-domain protocols. The code will be made publicly available.	翻訳日:2021-11-02 15:32:32 公開日:2021-11-01
# (参考訳) 深層学習に適合する論理規則: 船型分類の新しいアプローチ Logic Rules Meet Deep Learning: A Novel Approach for Ship Type Classification ( http://arxiv.org/abs/2111.01042v1 ) ライセンス: CC BY 4.0	Manolis Pitsikalis, Thanh-Toan Do, Alexei Lisitsa and Shan Luo	(参考訳) 海運産業は、国際貿易と経済の重要な要素であるが、法律の遵守と安全性を確保するためには、監視が必要である。本稿では, 自動識別システムから送信された船舶データと, 船舶画像とを組み合わせた, 船舶型分類モデルを提案する。我々のアプローチの主な構成要素は、より高速なR-CNNディープニューラルネットワークとIF-THENルールを備えたニューロファジィシステムである。実世界のデータを用いてモデルを評価し、この組み合わせの利点を示しながら、他の手法と比較する。その結果,本モデルでは,ブラックボックスのアプローチとは対照的に説明可能性のレベルを維持しつつ,検討した次のベストモデルと比較して,予測スコアを最大15.4%向上させることができることがわかった。 The shipping industry is an important component of global trade and the economy; however, to ensure law compliance and safety, it needs to be monitored. In this paper, we present a novel Ship Type classification model that combines vessel-transmitted data from the Automatic Identification System with vessel imagery. The main components of our approach are the Faster R-CNN Deep Neural Network and a Neuro-Fuzzy system with IF-THEN rules. We evaluate our model using real world data, showcase the advantages of this combination, and compare it with other methods. Results show that our model can increase prediction scores by up to 15.4% when compared with the next best model we considered, while also maintaining a level of explainability as opposed to common black box approaches.	翻訳日:2021-11-02 15:31:05 公開日:2021-11-01
# 都市知識グラフによる知識駆動サイト選択 Knowledge-driven Site Selection via Urban Knowledge Graph ( http://arxiv.org/abs/2111.00787v1 ) ライセンス: Link先を確認	Yu Liu, Jingtao Ding, Yong Li	(参考訳) サイト選択は、ビジネスの成功にとって重要な新しい店舗の最適な場所を決定する。特に、多元都市データを用いた人工知能の幅広い応用は、インテリジェントなサイト選択を有望にする。しかし、既存のデータ駆動手法は機能工学に大きく依存しており、ビジネスの一般化と複雑な関係モデリングの問題に直面している。ジレンマを除去するために,本研究では知識グラフ(KG)からアイデアを借用し,KnowSiteの略であるサイト選択のための知識駆動モデルを提案する。具体的には, 蒸留知識と高次意味論に動機づけられ, まず, 都市の重要要素と意味関係を捉えた都市kg (urbankg) を構築した。 UrbanKGに基づいて, サイト決定のためのエンコーダ・デコーダ構造に入力される意味表現の事前学習手法を採用する。マルチリレーショナルメッセージパッシングとリレーショナルパスに基づくアテンション機構を開発したKnowSiteは,各種ビジネスとサイト選択基準との関係を明らかにする。 2つのデータセットに対する大規模な実験により、KnowSiteは、有効性と説明可能性の両方で代表ベースラインを上回っていることが示された。 Site selection determines optimal locations for new stores, which is of crucial importance to business success. Especially, the wide application of artificial intelligence with multi-source urban data makes intelligent site selection promising. However, existing data-driven methods heavily rely on feature engineering, facing the issues of business generalization and complex relationship modeling. To get rid of the dilemma, in this work, we borrow ideas from knowledge graph (KG), and propose a knowledge-driven model for site selection, short for KnowSite. Specifically, motivated by distilled knowledge and rich semantics in KG, we firstly construct an urban KG (UrbanKG) with cities' key elements and semantic relationships captured. Based on UrbanKG, we employ pre-training techniques for semantic representations, which are fed into an encoder-decoder structure for site decisions. With multi-relational message passing and relation path-based attention mechanism developed, KnowSite successfully reveals the relationship between various businesses and site selection criteria. Extensive experiments on two datasets demonstrate that KnowSite outperforms representative baselines with both effectiveness and explainability achieved.	翻訳日:2021-11-02 15:18:36 公開日:2021-11-01
# 五目(ごもく):ゲームとプレイヤーワインの分析 Gomoku: analysis of the game and of the player Wine ( http://arxiv.org/abs/2111.01016v1 ) ライセンス: Link先を確認	Lorenzo Piazzo, Michele Scarpiniti and Enzo Baccarelli	(参考訳) 五目(ごもく)は、古典的なボードゲームで、新しい人工知能(AI)技術を試すのに理想的に適している。本報告では,新たなゴモクプレイヤーの作成を希望する開発者を支援することを目的として,既存のゲームよりも広く,より深いゲームコンセプトと戦略の分析を行う。また,人工的プレーヤの一般構造について論じた上で,インターネット上で自由に利用でき,現代的プレーヤの組織化方法の優れた例である,ワインという名の強い五目プレーヤを提示・分析した。 Gomoku, also known as five in a row, is a classical board game, ideally suited for quickly testing novel Artificial Intelligence (AI) techniques. With the aim of facilitating a developer willing to write a new Gomoku player, in this report we present an analysis of the main game concepts and strategies, which is wider and deeper than existing ones. Moreover, after discussing the general structure of an artificial player, we present and analyse a strong Gomoku player, named Wine, the code of which is freely available on the Internet and which is an excellent example of how a modern player is organised.	翻訳日:2021-11-02 15:18:17 公開日:2021-11-01
# (参考訳) FREGAN : ビデオのフレームレート向上のための生成的敵ネットワークの応用 FREGAN : an application of generative adversarial networks in enhancing the frame rate of videos ( http://arxiv.org/abs/2111.01105v1 ) ライセンス: CC BY 4.0	Rishik Mishra, Neeraj Gupta, Nitya Shukla	(参考訳) デジタルビデオは個々のフレームの集合であり、シーンが各フレームのタイムスライスを利用した映像をストリーミングする。高いリフレッシュレートと高いフレームレートは、すべてのハイテクアプリケーションの要求である。ビデオのアクショントラッキングは簡単になり、リフレッシュレートが高いため、ゲームアプリケーションでは動きがスムーズになる。画面に表示されるフレーム間の時間が少なくなるため、より高速なレスポンスを提供する。 fregan(frame rate enhancement generative adversarial network)モデルが提案されており、過去のフレームのシーケンスに基づいてビデオシーケンスの将来フレームを予測する。本稿では,ganモデルについて検討し,ビデオのフレームレート向上のためにfreganを提案する。提案手法では,フーバー損失を損失関数として用いた。超解像に優れた結果をもたらし,フレームレート向上の応用において,その性能を再現しようと試みた。標準データセット(ucf101およびrfree500)における提案モデルの有効性を検証した。実験の結果,提案モデルはピーク信号対雑音比(psnr)34.94,構造類似度指数(ssim)0.95であった。 A digital video is a collection of individual frames, while streaming the video the scene utilized the time slice for each frame. High refresh rate and high frame rate is the demand of all high technology applications. The action tracking in videos becomes easier and motion becomes smoother in gaming applications due to the high refresh rate. It provides a faster response because of less time in between each frame that is displayed on the screen. FREGAN (Frame Rate Enhancement Generative Adversarial Network) model has been proposed, which predicts future frames of a video sequence based on a sequence of past frames. In this paper, we investigated the GAN model and proposed FREGAN for the enhancement of frame rate in videos. We have utilized Huber loss as a loss function in the proposed FREGAN. It provided excellent results in super-resolution and we have tried to reciprocate that performance in the application of frame rate enhancement. We have validated the effectiveness of the proposed model on the standard datasets (UCF101 and RFree500). 
The experimental outcomes illustrate that the proposed model has a Peak signal-to-noise ratio (PSNR) of 34.94 and a Structural Similarity Index (SSIM) of 0.95.	翻訳日:2021-11-02 15:12:45 公開日:2021-11-01
# 弱ショット学習のためのインフルエンシャル・プロトタイプネットワーク : 皮膚科における事例研究 Influential Prototypical Networks for Few Shot Learning: A Dermatological Case Study ( http://arxiv.org/abs/2111.00698v1 ) ライセンス: Link先を確認	Ranjana Roy Chowdhury, Deepti R. Bathula	(参考訳) プロトタイプネットワーク(PN)は単純だが効果的なショットラーニング戦略である。ユークリッド距離を計算して各クラスの原型表現に分類する,メートル法に基づくメタラーニング手法である。従来のpn属性はすべてのサンプルに等しく重要であり、各クラスに属するサポートサンプル埋め込みを平均化することによってプロトタイプを生成する。そこで本研究では, 支持試料分布への影響に対応する試料に重みを付与するPNの新たなバージョンを提案する。試料を含む試料分布の平均埋込量と試料を除いた最大平均差(mmd)に基づいて試料の影響重みを算出する。提案するipnetの包括的評価は,3種類のベンチマークdermatological datasetにおける他のpnsとの比較により行った。 ipnetは、3つのデータセットと様々なn-way、k-shot分類タスクで魅力的な結果を持つすべてのベースラインモデルを上回る。クロスドメイン適応実験からの発見により、IPNetの堅牢性と一般化性がさらに確立される。 Prototypical network (PN) is a simple yet effective few shot learning strategy. It is a metric-based meta-learning technique where classification is performed by computing Euclidean distances to prototypical representations of each class. Conventional PN attributes equal importance to all samples and generates prototypes by simply averaging the support sample embeddings belonging to each class. In this work, we propose a novel version of PN that attributes weights to support samples corresponding to their influence on the support sample distribution. Influence weights of samples are calculated based on maximum mean discrepancy (MMD) between the mean embeddings of sample distributions including and excluding the sample. Comprehensive evaluation of our proposed influential PN (IPNet) is performed by comparing its performance with other baseline PNs on three different benchmark dermatological datasets. IPNet outperforms all baseline models with compelling results across all three datasets and various N-way, K-shot classification tasks. Findings from cross-domain adaptation experiments further establish the robustness and generalizability of IPNet.	翻訳日:2021-11-02 15:02:30 公開日:2021-11-01
# 単一項目ファッションレコメンデーション:クロスドメインレコメンデーションに向けて Single-Item Fashion Recommender: Towards Cross-Domain Recommendations ( http://arxiv.org/abs/2111.00758v1 ) ライセンス: Link先を確認	Seyed Omid Mohammadi, Hossein Bodaghi, Ahmad Kalhor (University of Tehran, College of Engineering, School of Electrical and Computer Engineering, Tehran, Iran)	(参考訳) 現在、レコメンダシステムと検索エンジンはファッションeコマースにおいて不可欠な役割を担っている。それでも、多くの課題があり、この研究はいくつかの課題に取り組みます。この記事ではまず,並列ニューラルネットワークを用いて1つのファッションアイテムショップイメージを入力として,店舗で利用可能な類似アイテムをリストアップしてショップ内レコメンデーションを行う,コンテンツベースのファッションレコメンデーションシステムを提案する。次に、ユーザの好みに基づいて結果をパーソナライズするように、同じ構造が強化される。この研究は、ドメイン外のクエリに対してより堅牢なシステムを実現するバックグラウンド拡張技術を導入し、カタログショップイメージのトレーニングセットのみを使用して、ストリート・ツー・ショップのレコメンデーションを可能にする。さらに,本論文の最後の貢献は,客観的人間得点と呼ばれるレコメンデーションタスクのための新しい評価指標である。この方法は、人間のスコアラーの主観評価から解釈可能で比較可能なスコアを生成する、完全にカスタマイズ可能なフレームワークである。 Nowadays, recommender systems and search engines play an integral role in fashion e-commerce. Still, many challenges lie ahead, and this study tries to tackle some. This article first suggests a content-based fashion recommender system that uses a parallel neural network to take a single fashion item shop image as input and make in-shop recommendations by listing similar items available in the store. Next, the same structure is enhanced to personalize the results based on user preferences. This work then introduces a background augmentation technique that makes the system more robust to out-of-domain queries, enabling it to make street-to-shop recommendations using only a training set of catalog shop images. Moreover, the last contribution of this paper is a new evaluation metric for recommendation tasks called objective-guided human score. This method is an entirely customizable framework that produces interpretable, comparable scores from subjective evaluations of human scorers.	翻訳日:2021-11-02 15:02:15 公開日:2021-11-01
# 線形時間不変系の安全学習 Safe Learning of Linear Time-Invariant Systems ( http://arxiv.org/abs/2111.00631v1 ) ライセンス: Link先を確認	Farhad Farokhi, Alex S. Leong, Mohammad Zamani, Iman Shames	(参考訳) 離散時間線形時間不変システムの同時学習と制御における安全性を検討する。利用状態の測定回数に基づいて,システムの学習モデルに基づく厳密な信頼性境界を提供する。これらの境界は、潜在的に時間的制約のある最適化問題によってシステムへの制御入力を変更するために使用される。安全性に制約のある最適化が実現可能な解決策が存在する場合, 安全セットを最小限の確率で退避させることが証明できる。この最適化問題は、学習中のモデルの不確実性を考慮した安全制約を厳格化することにより、より計算に優しい形式に再構成される。学習モデルの信頼性が向上するにつれて、締め付けは減少する。最終的に、励起の持続下では、より多くの測定値が収集されるにつれて、締め付けは無視される。 We consider safety in simultaneous learning and control of discrete-time linear time-invariant systems. We provide rigorous confidence bounds on the learned model of the system based on the number of utilized state measurements. These bounds are used to modify control inputs to the system via an optimization problem with potentially time-varying safety constraints. We prove that the state can only exit the safe set with small probability, provided a feasible solution to the safety-constrained optimization exists. This optimization problem is then reformulated in a more computationally-friendly format by tightening the safety constraints to account for model uncertainty during learning. The tightening decreases as the confidence in the learned model improves. We finally prove that, under persistence of excitation, the tightening becomes negligible as more measurements are gathered.	翻訳日:2021-11-02 15:00:13 公開日:2021-11-01
# SADGA: テキスト間SQLのための構造対応デュアルグラフ集約ネットワーク SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL ( http://arxiv.org/abs/2111.00653v1 ) ライセンス: Link先を確認	Ruichu Cai, Jinjie Yuan, Boyan Xu, Zhifeng Hao	(参考訳) 質問の自然言語をSQLクエリに翻訳することを目的としたText-to-SQLタスクが最近注目を集めている。 Text-to-SQLの最も難しい問題のひとつは、トレーニング済みモデルを未知のデータベーススキーマに一般化する方法である。鍵は一般化可能性にある (i)質問をモデル化する符号化方法とデータベーススキーマ (ii)問題中の単語とデータベーススキーマのテーブル/カラム間のマッピングを学ぶための質問・スキーマリンク手法。上述の2つの課題に着目して、クロスドメインテキスト・トゥ・SQLのための構造対応デュアルグラフ集約ネットワーク(SADGA)を提案する。 SADGAでは、自然言語問題とデータベーススキーマの両方に統一的な符号化モデルを提供するために、グラフ構造を採用する。提案する統一モデリングに基づいて,構造認識型アグリゲーション手法をさらに考案し,質問文とスキーマグラフのマッピングを学習する。本手法は,Global Graph Linking,Local Graph Linking,Dual-Graph Aggregation Mechanismを特徴とする。私たちは提案のパフォーマンスを実証的に研究するだけでなく、テキストからSQLへのベンチマークであるSpiderの書き込み時の3位を達成しました。 The Text-to-SQL task, aiming to translate the natural language of the questions into SQL queries, has drawn much attention recently. One of the most challenging problems of Text-to-SQL is how to generalize the trained model to the unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method to model the question and the database schema and (ii) the question-schema linking method to learn the mapping between words in the question and tables/columns in the database schema. Focusing on the above two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt the graph structure to provide a unified encoding model for both the natural language question and database schema. Based on the proposed unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question-graph and schema-graph. The structure-aware aggregation method is featured with Global Graph Linking, Local Graph Linking, and Dual-Graph Aggregation Mechanism. 
We not only study the performance of our proposal empirically but also achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.	翻訳日:2021-11-02 14:58:50 公開日:2021-11-01
# Adapterによる教師なしドメイン適応 Unsupervised Domain Adaptation with Adapter ( http://arxiv.org/abs/2111.00667v1 ) ライセンス: Link先を確認	Rongsheng Zhang, Yinhe Zheng, Xiaoxi Mao, Minlie Huang	(参考訳) 事前学習言語モデル(PrLM)を用いた教師なしドメイン適応(UDA)は、これらの事前学習モデルが様々なドメインから学んだ一般的な知識を組み込んでいるため、有望な結果を得た。しかし、小さなドメイン固有のコーパス上でprlmの全てのパラメータを微調整することは、学習されたジェネリック知識を歪め、また各ドメインに微調整されたprlm全体を配置するコストも高くなる。本稿では,教師なしドメイン適応のためのアダプタベースの微調整手法について検討する。具体的には、いくつかのトレーニング可能なアダプタモジュールをPrLMに挿入し、元のPrLMのパラメータを微調整時に固定することで、組み込みの汎用知識を保存する。これらのアダプタをmix-domainコーパスを使ってトレーニングするためにdomain-fusionスキームが導入された。 2つのベンチマークデータセットに関する詳細な実験を行い,提案手法が異なるタスク,データセットサイズ,ドメイン類似性において有効であることを示す。 Unsupervised domain adaptation (UDA) with pre-trained language models (PrLM) has achieved promising results since these pre-trained models embed generic knowledge learned from various domains. However, fine-tuning all the parameters of the PrLM on a small domain-specific corpus distorts the learned generic knowledge, and it is also expensive to deploy a whole fine-tuned PrLM for each domain. This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation. Specifically, several trainable adapter modules are inserted in a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM at fine-tuning. A domain-fusion scheme is introduced to train these adapters using a mix-domain corpus to better capture transferable features. Elaborate experiments on two benchmark datasets are carried out, and the results demonstrate that our approach is effective with different tasks, dataset sizes, and domain similarities.	翻訳日:2021-11-02 14:57:24 公開日:2021-11-01
# 非対格動詞と非能格動詞の教師なし発見 Unsupervised Discovery of Unaccusative and Unergative Verbs ( http://arxiv.org/abs/2111.00808v1 ) ライセンス: Link先を確認	Sharid Loáiciga, Luca Bevacqua, Christian Hardmeier	(参考訳) 英語の非能格動詞と非対格動詞を教師なしで検出する手法を提案する。これらのカテゴリにより、動詞の意味的役割を知らずに、使役・起動交替に関与している動詞を識別できる。この方法は、候補動詞の自動詞文の変種を生成し、言語モデルをプロービングすることに基づく。アノテーション付きリソースに依存しないというメリットも加わり,同様のアプローチと同等の結果を得た。 We present an unsupervised method to detect English unergative and unaccusative verbs. These categories allow us to identify verbs participating in the causative-inchoative alternation without knowing the semantic roles of the verb. The method is based on the generation of intransitive sentence variants of candidate verbs and probing a language model. We obtained results on par with similar approaches, with the added benefit of not relying on annotated resources.	翻訳日:2021-11-02 14:57:06 公開日:2021-11-01
# スパン抽出のためのラベル知識を用いた拡張言語表現 Enhanced Language Representation with Label Knowledge for Span Extraction ( http://arxiv.org/abs/2111.00884v1 ) ライセンス: Link先を確認	Pan Yang, Xin Cong, Zhenyun Sun, Xingwu Liu	(参考訳) プレーンテキストからテキストスパン(単語やフレーズなど)を抽出することを目的としたスパン抽出は、インフォメーション抽出の基本的なプロセスである。近年の研究では,スパン抽出タスクを質問応答問題(QA形式化)に形式化し,最先端のパフォーマンスを実現することで,テキスト表現を強化するラベル知識を導入している。しかし、QA形式化はラベルの知識を完全に活用せず、トレーニング/推論の効率が低い。これらの問題に対処するために,ラベル知識を統合する新しいパラダイムを導入し,ラベル知識をテキスト表現に明示的に効率的に統合する新しいモデルを提案する。具体的には、テキストとラベルアノテーションを独立してエンコードし、ラベルの知識をテキスト表現に統合する。我々は,フラットNER,ネストNER,イベント検出の3つの典型的なスパン抽出タスクについて広範な実験を行った。経験的な結果は 1) 提案手法は4つのベンチマークで最先端の性能を実現する。 2) トレーニング時間と推論時間は,qa形式化パラダイムと比較して,平均で76%,77%削減されている。コードとデータは https://github.com/Akeepers/LEAR から入手できます。 Span extraction, aiming to extract text spans (such as words or phrases) from plain texts, is a fundamental process in Information Extraction. Recent works introduce the label knowledge to enhance the text representation by formalizing the span extraction task into a question answering problem (QA Formalization), which achieves state-of-the-art performance. However, QA Formalization does not fully exploit the label knowledge and suffers from low efficiency in training/inference. To address those problems, we introduce a new paradigm to integrate label knowledge and further propose a novel model to explicitly and efficiently integrate label knowledge into text representations. Specifically, it encodes texts and label annotations independently and then integrates label knowledge into text representation with an elaborately designed semantics fusion module. We conduct extensive experiments on three typical span extraction tasks: flat NER, nested NER, and event detection. The empirical results show that 1) our method achieves state-of-the-art performance on four benchmarks, and 2) reduces training time and inference time by 76% and 77% on average, respectively, compared with the QA Formalization paradigm.
Our code and data are available at https://github.com/Akeepers/LEAR.	翻訳日:2021-11-02 14:57:01 公開日:2021-11-01
# トランスフォーマーモデルを用いた言語間ヘイトスピーチ検出 Cross-lingual Hate Speech Detection using Transformer Models ( http://arxiv.org/abs/2111.00981v1 ) ライセンス: Link先を確認	Teodor Tița, Arkaitz Zubiaga	(参考訳) 言語横断設定におけるヘイトスピーチ検出は、中規模および大規模オンラインプラットフォームにおいて最も関心を寄せる分野である。この問題をグローバルな規模で適切に解決できないことは、道徳的に疑わしい現実の出来事、人間の死、そして憎悪そのものの永続性につながる。本稿では, 英語からフランス語, その逆, および各言語単独での言語間学習により, この重要なソーシャルデータサイエンス課題に対する微調整済み多言語トランスフォーマーモデル(mBERT, XLM-RoBERTa)の能力を示し, 反復的改善と比較誤差分析に関する節も含む。 Hate speech detection within a cross-lingual setting represents a paramount area of interest for all medium and large-scale online platforms. Failing to properly address this issue on a global scale has already led over time to morally questionable real-life events, human deaths, and the perpetuation of hate itself. This paper illustrates the capabilities of fine-tuned altered multi-lingual Transformer models (mBERT, XLM-RoBERTa) regarding this crucial social data science task with cross-lingual training from English to French, vice-versa and each language on its own, including sections about iterative improvement and comparative error analysis.	翻訳日:2021-11-02 14:56:44 公開日:2021-11-01
# (参考訳) cGANの分類と非分類の統一的視点 A Unified View of cGANs with and without Classifiers ( http://arxiv.org/abs/2111.01035v1 ) ライセンス: CC BY 4.0	Si-An Chen, Chun-Liang Li, Hsuan-Tien Lin	(参考訳) Conditional Generative Adversarial Networks (cGANs) は、クラス条件分布からサンプリングできる暗黙の生成モデルである。既存のcGANは幅広い異なる識別器の設計と訓練目的に基づいている。初期の作業で一般的な設計の一つは、正しい分類器が間違ったクラスで生成されたサンプルを除去するのに役立つと仮定して、トレーニング中に分類器を含めることである。しかし、cGANの分類子を含むと、容易に分類できるサンプルだけを生成する副作用が生じることが多い。近年、いくつかの代表的cGANは、分類器を使わずに最先端の性能に到達することを避けている。何らかの形で、分類器がより良いcganを設計するために復活できるかどうかは不明だ。本研究では,cGANを改善するために,分類器を適切に活用できることを実証する。まず、結合確率分布の分解を用いて、cGANの目標を接続し、統一的なフレームワークとして分類する。このフレームワークは、分布をパラメータ化するための古典的なエネルギーモデルとともに、cGANに対する分類器の使用を原則的に正当化する。 ACGAN(英語版)、ProjGAN(英語版)、ContraGAN(英語版)などの一般的なcGAN変種を、異なるレベルの近似を持つ特別なケースとして説明している。実験の結果,提案したフレームワークにインスパイアされた設計は,複数のベンチマークデータセット,特に最も困難なImageNetにおいて,最先端のcGANよりも優れていた。コードはhttps://github.com/sian-chen/PyTorch-ECGANで公開されている。 Conditional Generative Adversarial Networks (cGANs) are implicit generative models which allow to sample from class-conditional distributions. Existing cGANs are based on a wide range of different discriminator designs and training objectives. One popular design in earlier works is to include a classifier during training with the assumption that good classifiers can help eliminate samples generated with wrong classes. Nevertheless, including classifiers in cGANs often comes with a side effect of only generating easy-to-classify samples. Recently, some representative cGANs avoid the shortcoming and reach state-of-the-art performance without having classifiers. Somehow it remains unanswered whether the classifiers can be resurrected to design better cGANs. In this work, we demonstrate that classifiers can be properly leveraged to improve cGANs. We start by using the decomposition of the joint probability distribution to connect the goals of cGANs and classification as a unified framework. 
The framework, along with a classic energy model to parameterize distributions, justifies the use of classifiers for cGANs in a principled manner. It explains several popular cGAN variants, such as ACGAN, ProjGAN, and ContraGAN, as special cases with different levels of approximations, which provides a unified view and brings new insights to understanding cGANs. Experimental results demonstrate that the design inspired by the proposed framework outperforms state-of-the-art cGANs on multiple benchmark datasets, especially on the most challenging ImageNet. The code is available at https://github.com/sian-chen/PyTorch-ECGAN.	翻訳日:2021-11-02 14:54:43 公開日:2021-11-01
# RMNet: ネットワークから残差接続を等価に取り除く RMNet: Equivalently Removing Residual Connection from Networks ( http://arxiv.org/abs/2111.00687v1 ) ライセンス: Link先を確認	Fanxu Meng, Hao Cheng, Jiaxin Zhuang, Ke Li, Xing Sun	(参考訳) 残差接続は、非常に深いニューラルネットワークのトレーニングを可能にするが、マルチブランチトポロジーのため、オンライン推論には適さない。これにより、多くの研究者が推論時の残差接続を伴わずにDNNの設計に取り組むことができる。例えば、RepVGGはデプロイ時にマルチブランチトポロジをVGGライクな(シングルブランチ)モデルに再パラメータ化し、ネットワークが比較的浅い場合に優れたパフォーマンスを示す。しかし、RepVGGはResNetをVGGに等価に変換することはできない。なぜなら再パラメータ化法は線形ブロックにのみ適用でき、非線形層(ReLU)は、特に深いネットワークにおいて限られた表現能力をもたらす残差接続の外に置く必要があるからである。本稿では,この問題を解決し,resblock上でのrm(reserving and merge)操作により,バニラ網の残差接続を同値に除去することを提案する。具体的には、RM操作により、入力特徴マップがブロックを通り抜けて情報を保存し、ブロックの最後に全ての情報をマージすることで、元の出力を変更することなく残余接続を除去することができる。プラグインとしてrm操作は基本的に3つの利点がある。 1)その実装により、高比ネットワークプルーニングに自然に親しみやすい。 2) RepVGG の深さ制限を破るのに役立つ。 3) ResNet や RepVGG と比較して,RMNet の精度向上を実現している。 rmオペレーションのイデオロギーは、将来、コミュニティのモデル設計に関する多くの洞察を刺激できると信じています。コードは https://github.com/fxmeng/RMNet で公開されている。 Although residual connection enables training very deep neural networks, it is not friendly for online inference due to its multi-branch topology. This encourages many researchers to work on designing DNNs without residual connections at inference. For example, RepVGG re-parameterizes multi-branch topology to a VGG-like (single-branch) model when deploying, showing great performance when the network is relatively shallow. However, RepVGG cannot transform ResNet to VGG equivalently because re-parameterizing methods can only be applied to linear blocks and the non-linear layers (ReLU) have to be put outside of the residual connection which results in limited representation ability, especially for deeper networks. In this paper, we aim to remedy this problem and propose to remove the residual connection in a vanilla ResNet equivalently by a reserving and merging (RM) operation on ResBlock.
Specifically, the RM operation allows input feature maps to pass through the block while reserving their information and merges all the information at the end of each block, which can remove residual connections without changing the original output. As a plug-in method, RM Operation basically has three advantages: 1) its implementation makes it naturally friendly for high ratio network pruning. 2) it helps break the depth limitation of RepVGG. 3) it leads to better accuracy-speed trade-off network (RMNet) compared to ResNet and RepVGG. We believe the ideology of RM Operation can inspire many insights on model design for the community in the future. Code is available at: https://github.com/fxmeng/RMNet.	翻訳日:2021-11-02 14:34:45 公開日:2021-11-01
# プロジェクションされたGANがより速く収束 Projected GANs Converge Faster ( http://arxiv.org/abs/2111.01007v1 ) ライセンス: Link先を確認	Axel Sauer, Kashyap Chitta, Jens Müller, Andreas Geiger	(参考訳) GAN(Generative Adversarial Networks)は高品質な画像を生成するが、訓練は難しい。注意深い正規化、大量の計算、高価なハイパーパラメータスイープが必要です。生成したサンプルと実際のサンプルを固定された事前訓練された特徴空間に投影することで、これらの問題に大きく取り組みます。判別器は事前訓練されたモデルの深い層から特徴を完全に活用できないという発見により、チャネルと解像度をまたいだ特徴を混合するより効果的な戦略を提案する。我々の投影GANは画像品質、サンプル効率、収束速度を改善する。さらに、最大1メガピクセルの解像度と互換性があり、22のベンチマークデータセット上で最先端のFréchet Inception Distance(FID)を前進させる。重要なことは、予測されたGANはそれまでの最低値のFIDと最大40倍の速さで一致し、同じ計算リソースからウォールタイム時間を5日から3時間未満に短縮する。 Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of compute, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pretrained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected GAN improves image quality, sample efficiency, and convergence speed. It is further compatible with resolutions of up to one Megapixel and advances the state-of-the-art Fréchet Inception Distance (FID) on twenty-two benchmark datasets. Importantly, Projected GANs match the previously lowest FIDs up to 40 times faster, cutting the wall-clock time from 5 days to less than 3 hours given the same computational resources.	翻訳日:2021-11-02 14:33:59 公開日:2021-11-01
# リレーショナルパスルールマイニングを用いた知識グラフ埋め込みのためのトランスダクティブデータ拡張 Transductive Data Augmentation with Relational Path Rule Mining for Knowledge Graph Embedding ( http://arxiv.org/abs/2111.00974v1 ) ライセンス: Link先を確認	Yushi Hirose, Masashi Shimbo, Taro Watanabe	(参考訳) 知識グラフの完成には、グラフ埋め込みに基づくものと関係経路規則帰納に基づくものという2つの主要な予測モデルが存在する。それぞれ異なる利点と欠点がある。両タイプを生かしたハイブリッドモデルが最近提案されている。ハイブリッドモデルの1つであるUniKERは、リレーショナルパスルールによってトレーニングデータを交互に拡張し、埋め込みモデルを訓練する。その高い予測精度にもかかわらず、拡張データの品質を維持するために低信頼ルールを無視しているため、関係パスルールを十分に活用していない。この制限を緩和するため,関係経路規則と拡張データの信頼度に基づく重み付けによるトランスダクティブデータ拡張を提案する。その結果,本手法は真の回答や類似したエンティティを含むデータを追加することで,組込みモデルの性能を効果的に向上できることがわかった。 For knowledge graph completion, two major types of prediction models exist: one based on graph embeddings, and the other based on relation path rule induction. They have different advantages and disadvantages. To take advantage of both types, hybrid models have been proposed recently. One of the hybrid models, UniKER, alternately augments training data by relation path rules and trains an embedding model. Despite its high prediction accuracy, it does not take full advantage of relation path rules, as it disregards low-confidence rules in order to maintain the quality of augmented data. To mitigate this limitation, we propose transductive data augmentation by relation path rules and confidence-based weighting of augmented data. The results and analysis show that our proposed method effectively improves the performance of the embedding model by augmenting data that include true answers or entities similar to them.	翻訳日:2021-11-02 14:29:19 公開日:2021-11-01
# スペクトル距離によるグラフ構造攻撃 Graph Structural Attack by Spectral Distance ( http://arxiv.org/abs/2111.00684v1 ) ライセンス: Link先を確認	Lu Lin, Ethan Blaser and Hongning Wang	(参考訳) グラフ畳み込みネットワーク(GCNs)は、グラフ学習タスクにおける優れたパフォーマンスのため、関心が高まりつつあるが、敵攻撃に対する脆弱性も示されている。本稿では,フーリエ領域におけるグラフスペクトルフィルタの破壊に有効なグラフ構造攻撃について検討する。スペクトルフィルタの破壊を測定するために、グラフラプラシアンの固有値に基づいてスペクトル距離を定義する。次に,タスク固有の攻撃目標と提案したスペクトル距離を同時に最大化し,エッジ摂動を生成する。実験は、トレーニング時間とテスト時間の両方において、ホワイトボックス設定における提案された攻撃の有効性を示す。筆者らの定性的分析は、攻撃行動とスペクトル分布の強制的な変化の関連性を示し、スペクトル距離の最大化が空間領域におけるグラフの構造特性の変化とフーリエ領域における周波数成分の摂動に有効な方法であることを示す実証的な証拠を提供する。 Graph Convolutional Networks (GCNs) have fueled a surge of interest due to their superior performance on graph learning tasks, but have also been shown to be vulnerable to adversarial attacks. In this paper, an effective graph structural attack is investigated to disrupt graph spectral filters in the Fourier domain. We define the spectral distance based on the eigenvalues of graph Laplacian to measure the disruption of spectral filters. We then generate edge perturbations by simultaneously maximizing a task-specific attack objective and the proposed spectral distance. The experiments demonstrate remarkable effectiveness of the proposed attack in the white-box setting at both training and test time. Our qualitative analysis shows the connection between the attack behavior and the imposed changes on the spectral distribution, which provides empirical evidence that maximizing spectral distance is an effective way to change the structural property of graphs in the spatial domain and perturb the frequency components in the Fourier domain.	翻訳日:2021-11-02 14:27:59 公開日:2021-11-01
# sim上で検証し、実データで検出する -- ドメインランダム化のためのモデル選択 Validate on Sim, Detect on Real -- Model Selection for Domain Randomization ( http://arxiv.org/abs/2111.00765v1 ) ライセンス: Link先を確認	Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar	(参考訳) sim2realと呼ばれるロボットのスキルを学ぶ実践的なアプローチは、シミュレーションで制御ポリシーを訓練し、それを実際のロボットにデプロイする。sim2real転移を改善する一般的な手法はドメインランダム化(DR: domain randomization)に基づいており、現実世界へのより良い一般化を期待して、ランダムに生成されたさまざまなドメインでポリシーを訓練する。ポリシー学習とDRアルゴリズムの両方において、多くのハイパーパラメーターがあるため、多くの訓練されたモデルが出来上がり、その中で最良のモデルを選択するには、実際のロボットに対してコストがかかる。本研究では、現実世界でポリシーを実行することなくポリシーをランク付けできるか、を問う。 我々の主な考え方は、事前定義された現実世界データの集合が、アウト・オブ・ディストリビューション(OOD)検出技術を用いて、すべてのポリシーを評価することができるということである。ある意味で、このアプローチは、現実世界の実行前にポリシーを評価するための"ユニットテスト"と見なすことができる。しかし、OODスコア自体が不正確であり、特定のOODメソッドに非常に敏感であることがわかった。本研究の主な貢献は,OODとシミュレーションにおける評価を組み合わせた,単純かつ効果的なポリシースコアである。我々のスコア - VSDR - は、追加の現実世界データを必要とすることなく、ポリシーランキングの精度を大幅に向上させることができることを示す。画像入力を伴うロボットグリップタスクにおいて,VSDRがsim2real転送に与える影響を評価する。我々は、様々なDRパラメータとOOD手法を広範囲に評価し、VSDRがボード全体のポリシー選択を改善することを示す。さらに重要なことは,本手法が格付けを著しく向上し,ベースラインに比べてデータ量が大幅に少ないことである。 A practical approach to learning robot skills, often termed sim2real, is to train control policies in simulation and then deploy them on a real robot. Popular techniques to improve the sim2real transfer build on domain randomization (DR): Training the policy on a diverse set of randomly generated domains with the hope of better generalization to the real world. Due to the large number of hyper-parameters in both the policy learning and DR algorithms, one often ends up with a large number of trained models, where choosing the best model among them demands costly evaluation on the real robot. In this work we ask: Can we rank the policies without running them in the real world? Our main idea is that a predefined set of real world data can be used to evaluate all policies, using out-of-distribution detection (OOD) techniques. In a sense, this approach can be seen as a "unit test" to evaluate policies before any real world execution.
However, we find that by itself, the OOD score can be inaccurate and very sensitive to the particular OOD method. Our main contribution is a simple-yet-effective policy score that combines OOD with an evaluation in simulation. We show that our score - VSDR - can significantly improve the accuracy of policy ranking without requiring additional real world data. We evaluate the effectiveness of VSDR on sim2real transfer in a robotic grasping task with image inputs. We extensively evaluate different DR parameters and OOD methods, and show that VSDR improves policy selection across the board. More importantly, our method achieves significantly better ranking, and uses significantly less data compared to baselines.	翻訳日:2021-11-02 14:27:43 公開日:2021-11-01
# マルチエージェント環境における独立強化学習アルゴリズムの検討 Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments ( http://arxiv.org/abs/2111.01100v1 ) ライセンス: Link先を確認	Ken Ming Lee, Sriram Ganapathi Subramanian, Mark Crowley	(参考訳) 独立強化学習アルゴリズムは、マルチエージェント環境で最良のポリシーを見つけるための理論的保証はない。しかし、実際には、先行研究は、いくつかの領域における独立アルゴリズムによる良い性能と、他の領域での悪いパフォーマンスを報告している。さらに、独立したアルゴリズムの強みと弱みに関する包括的な研究が文献に欠けている。本稿では,マルチエージェント環境の3つの主要カテゴリ,すなわち協調的,競争的,混合的環境にまたがる4つのpettingzoo環境における独立アルゴリズムの性能を実証的に比較する。完全観測可能な環境では、独立アルゴリズムが協調的かつ競争的な環境でマルチエージェントアルゴリズムと同等の性能を発揮することを示す。混合環境において,独立したアルゴリズムで訓練されたエージェントは,個別によく行動することを学ぶが,同盟国との協力や敵との競争を学ばないことを示す。また,協調的部分可観測環境において,再帰性を加えることで独立アルゴリズムの学習が向上することを示した。 Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.	翻訳日:2021-11-02 14:27:20 公開日:2021-11-01
# (参考訳) OpenStreetMapデータを用いた自転車共有システムの駅立地計画への移動学習アプローチ Transfer Learning Approach to Bicycle-sharing Systems' Station Location Planning using OpenStreetMap Data ( http://arxiv.org/abs/2111.00990v1 ) ライセンス: CC BY-SA 4.0	Kamil Raczycki, Piotr Szymański	(参考訳) 自転車共有システム(BSS)は、先進地域の大規模で富裕な都市の多くの市民にとって日々の現実となっている。しかしながら、自転車共有ステーションのレイアウトを計画するには、通常、高価なデータ収集、旅行行動の調査、そしてステーションレイアウトの最適化が必要となる。多くの小さな都市や町、特に発展途上国では、こうした計画の資金調達が困難である。 BSSの計画にもかなりの時間がかかる。しかし、パンデミックが示すように、自治体は自転車の公共交通機関を離れる市民を含むモビリティシフトに迅速に対応する必要がある。自転車の需要の増加に対処するためには、自転車シェアリングシステムを迅速に提供することが重要である。本稿では,bssレイアウト設計におけるコストと時間の問題に対処し,空間埋め込み手法を用いて,計画の合理化とプロセスを容易にする新しいソリューションを提案する。 openstreetmapの公開データとヨーロッパの34都市からの駅配置のみに基づいて、uber h3離散グローバルグリッドシステムを使用して都市をマイクロリージョンに分割し、トランスファーラーニングを使用して、異なる都市の既存のシステムに基づいて駅を配置する価値のある地域を示す方法が開発されている。この作業の結果は、駅レイアウトを基準都市の選択で計画する際の意思決定においてプランナーを支援するメカニズムである。 Bicycle-sharing systems (BSS) have become a daily reality for many citizens of larger, wealthier cities in developed regions. However, planning the layout of bicycle-sharing stations usually requires expensive data gathering, surveying travel behavior and trip modelling followed by station layout optimization. Many smaller cities and towns, especially in developing areas, may have difficulty financing such projects. Planning a BSS also takes a considerable amount of time. Yet as the pandemic has shown us, municipalities will face the need to adapt rapidly to mobility shifts, which include citizens leaving public transport for bicycles. Laying out a bike sharing system quickly will become critical in addressing the increase in bike demand. This paper addresses the problem of cost and time in BSS layout design and proposes a new solution to streamline and facilitate the process of such planning by using spatial embedding methods.
Based only on publicly available data from OpenStreetMap, and station layouts from 34 cities in Europe, a method has been developed to divide cities into micro-regions using the Uber H3 discrete global grid system and to indicate regions where it is worth placing a station based on existing systems in different cities using transfer learning. The result of the work is a mechanism to support planners in their decision making when planning a station layout with a choice of reference cities.	翻訳日:2021-11-02 14:24:48 公開日:2021-11-01
# トランスを用いた家畜のモニタリング Livestock Monitoring with Transformer ( http://arxiv.org/abs/2111.00801v1 ) ライセンス: Link先を確認	Bhavesh Tangirala, Ishan Bhandari, Daniel Laszlo, Deepak K. Gupta, Rajat M. Thomas, Devanshu Arya	(参考訳) 家畜の行動の追跡は、現代の家畜農場における早期発見と伝染病の予防を可能にする。経済的利益とは別に、これは家畜農場で使用される抗生物質の量を減らし、それ以外はヒトの食生活に入り、抗生物質耐性の流行を緩和する。標準的なビデオカメラは、ほとんどの現代農場で利用でき、家畜をモニターできる。しかし、ほとんどのコンピュータビジョンアルゴリズムは、主に、このタスクで性能が悪い。一農場で飼育されている動物と同一の外観で、明らかな空間的特徴がないもの (二)既存のトラッカーのいずれも長期間の堅牢性がなく、 (iii)照明の変化、頻繁な閉塞、カメラアングルの変化、動物のサイズなど実世界の状況は、モデルが一般化することを困難にしている。これらの課題を踏まえて,グループ内豚を対象としたエンド・ツー・エンド行動監視システムを開発し,インスタンスレベルのセグメンテーション,トラッキング,アクション認識,再識別(star)タスクを同時に行う。本稿では, トランスフォーマーアーキテクチャを用いて, グループ豚のインスタンスレベルの埋め込みを学習する, エンドツーエンド多目的家畜監視フレームワークであるStarformerを紹介する。実屋内養豚環境における豚の行動分類, セグメンテーション, セグメンテーション, 追跡, 行動分類を含むビデオシーケンスからなる, 慎重に整理されたデータセットであるPigtraceを提案する。 STARタスクを同時に最適化することで、スターフォーマーは個々のタスクでトレーニングされた一般的なベースラインモデルより優れていることを示す。 Tracking the behaviour of livestock enables early detection and thus prevention of contagious diseases in modern animal farms. Apart from economic gains, this would reduce the amount of antibiotics used in livestock farming which otherwise enters the human diet, exacerbating the epidemic of antibiotic resistance - a leading cause of death. We could use standard video cameras, available in most modern farms, to monitor livestock. However, most computer vision algorithms perform poorly on this task, primarily because (i) animals bred in farms look identical, lacking any obvious spatial signature, (ii) none of the existing trackers are robust over long durations, and (iii) real-world conditions such as changing illumination, frequent occlusion, varying camera angles, and sizes of the animals make it hard for models to generalize. Given these challenges, we develop an end-to-end behaviour monitoring system for group-housed pigs to perform simultaneous instance level segmentation, tracking, action recognition and re-identification (STAR) tasks.
We present starformer, the first end-to-end multiple-object livestock monitoring framework that learns instance-level embeddings for grouped pigs through the use of a transformer architecture. For benchmarking, we present Pigtrace, a carefully curated dataset comprising video sequences with instance-level bounding box, segmentation, tracking and activity classification of pigs in a real indoor farming environment. Using simultaneous optimization on STAR tasks, we show that starformer outperforms popular baseline models trained for individual tasks.	翻訳日:2021-11-02 14:08:21 公開日:2021-11-01
# 3次元表面認識画像合成のための生成操作場 Generative Occupancy Fields for 3D Surface-Aware Image Synthesis ( http://arxiv.org/abs/2111.00969v1 ) ライセンス: Link先を確認	Xudong Xu, Xingang Pan, Dahua Lin, Bo Dai	(参考訳) 生成放射場の出現は、3d認識画像合成の発展を著しく促進した。 radianceフィールドでの累積レンダリングプロセスは、勾配がボリューム全体に分散するが、拡散したオブジェクト表面につながるため、これらの生成モデルのトレーニングをずっと簡単にする。一方、放射場と比較すると、占有表現は本質的に決定論的曲面を保証できる。しかし、私たちが生成モデルに直接占有表現を適用すると、訓練中は物体表面上のスパース勾配のみを受け取り、最終的に収束問題に悩まされる。本稿では,コンパクトな物体表面を学習できる生成的放射場に基づく新しいモデルである生成的占有場(gof)を提案する。 GOFの重要な洞察は、放射場における累積レンダリングから、学習面がより正確になるにつれて、表面点のみのレンダリングへの専用の遷移である。このように、GOFは2つの表現の利点を統一されたフレームワークで組み合わせる。実際には、そのレンダリング過程におけるサンプリング領域をボリューム全体から表面周辺の最小隣接領域に徐々に縮小することにより、輝度場からマーチから占有率表現への開始のトレーニングタイム遷移を実現する。複数のデータセットに関する総合的な実験を通して、GOFは高画質画像を3次元整合性で合成し、コンパクトで滑らかな物体表面を同時に学習できることを実証した。コード、モデル、デモビデオはhttps://sheldontsui.github.io/projects/gofで入手できる。 The advent of generative radiance fields has significantly promoted the development of 3D-aware image synthesis. The cumulative rendering process in radiance fields makes training these generative models much easier since gradients are distributed over the entire volume, but leads to diffused object surfaces. In the meantime, compared to radiance fields occupancy representations could inherently ensure deterministic surfaces. However, if we directly apply occupancy representations to generative models, during training they will only receive sparse gradients located on object surfaces and eventually suffer from the convergence problem. In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding its training convergence. The key insight of GOF is a dedicated transition from the cumulative rendering in radiance fields to rendering with only the surface points as the learned surface gets more and more accurate. In this way, GOF combines the merits of two representations in a unified framework. 
In practice, the training-time transition, which starts from radiance fields and marches toward occupancy representations, is achieved in GOF by gradually shrinking the sampling region in its rendering process from the entire volume to a minimal neighboring region around the surface. Through comprehensive experiments on multiple datasets, we demonstrate that GOF can synthesize high-quality images with 3D consistency and simultaneously learn compact and smooth object surfaces. Code, models, and demo videos are available at https://sheldontsui.github.io/projects/GOF	翻訳日:2021-11-02 14:07:24 公開日:2021-11-01
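The shrinking sampling region described in the GOF abstract above can be illustrated with a minimal sketch. The function name `sampling_interval`, the `progress` parameter, the `min_half_width` default, and the linear schedule are all assumptions made for illustration; the abstract does not specify the schedule the authors use.

```python
def sampling_interval(surface_depth, near, far, progress, min_half_width=0.01):
    """Return (lo, hi) depth bounds for sampling points along one ray.

    progress = 0.0 samples the whole [near, far] volume (radiance-field style);
    progress = 1.0 samples only a small neighborhood around the estimated surface.
    The linear interpolation between the two is a hypothetical schedule.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    # Final, minimal region around the surface, kept inside [near, far].
    lo_target = max(near, surface_depth - min_half_width)
    hi_target = min(far, surface_depth + min_half_width)
    # Move both bounds linearly from the full volume toward the surface.
    lo = near + progress * (lo_target - near)
    hi = far + progress * (hi_target - far)
    return lo, hi


# Bounds at the start, middle, and end of the (hypothetical) schedule.
start = sampling_interval(2.0, 0.5, 4.0, progress=0.0)
middle = sampling_interval(2.0, 0.5, 4.0, progress=0.5)
end = sampling_interval(2.0, 0.5, 4.0, progress=1.0)
```

In this sketch the interval always contains the estimated surface depth, so early iterations still receive gradients from the whole volume while late iterations concentrate samples near the surface.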
# 長期文書分類の比較研究 Comparative Study of Long Document Classification ( http://arxiv.org/abs/2111.00702v1 ) ライセンス: Link先を確認	Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi	(参考訳) インターネット上の文書形式で保存される情報の量は急速に増加している。そのため、これらの文書を最適に整理・維持することが求められている。テキスト分類アルゴリズムは、テキスト内の単語間の複雑な関係を研究し、文書の意味論を解釈しようとする。これらのアルゴリズムはここ数年で大きく進化した。単純な機械学習アルゴリズムからトランスフォーマーベースのアーキテクチャまで、多くの進歩がありました。しかし、既存の文献は異なるデータセットに対する異なるアプローチを分析しており、機械学習アルゴリズムの性能を比較することは困難である。本研究では,機械学習の標準手法を用いて,長い文書分類を再考する。単純なNaive Bayesから6つの標準テキスト分類データセット上の複雑なBERTまでのアプローチをベンチマークする。本稿では,長い文書データセットに対して異なるアルゴリズムを徹底的に比較する。長い文書分類は単純なタスクであり、基本的なアルゴリズムでさえ、ほとんどのデータセットにおいてBERTベースのアプローチと競合的に実行されます。 BERTベースのモデルはすべてのデータセットで一貫して良好に動作し、計算コストが懸念されない場合、文書分類タスクに盲目的に使用できる。浅層モデルのカテゴリでは、すべてのデータセットで適切に機能する生のBiLSTM + Maxアーキテクチャの使用を提案する。さらに単純なGlove + Attention bag of words modelは、より単純なユースケースに利用できる。高度なモデルを使用することの重要性は、比較的難しいタスクであるIMDBの感情データセットで明らかである。 The amount of information stored in the form of documents on the internet has been increasing rapidly. Thus it has become a necessity to organize and maintain these documents in an optimum manner. Text classification algorithms study the complex relationships between words in a text and try to interpret the semantics of the document. These algorithms have evolved significantly in the past few years. There has been a lot of progress from simple machine learning algorithms to transformer-based architectures. However, existing literature has analyzed different approaches on different data sets thus making it difficult to compare the performance of machine learning algorithms. In this work, we revisit long document classification using standard machine learning approaches. We benchmark approaches ranging from simple Naive Bayes to complex BERT on six standard text classification datasets. We present an exhaustive comparison of different algorithms on a range of long document datasets. 
We reiterate that long document classification is a simpler task and that even basic algorithms perform competitively with BERT-based approaches on most of the datasets. The BERT-based models perform consistently well on all the datasets and can be blindly used for the document classification task when the computational cost is not a concern. In the shallow-model category, we suggest the usage of the raw BiLSTM + Max architecture, which performs decently across all the datasets. An even simpler GloVe + Attention bag-of-words model can be utilized for simpler use cases. The importance of using sophisticated models is clearly visible on the IMDB sentiment dataset, which is a comparatively harder task.	翻訳日:2021-11-02 14:04:54 公開日:2021-11-01
# 解釈可能なコントラスト語ムーバーの埋め込み Interpretable contrastive word mover's embedding ( http://arxiv.org/abs/2111.01023v1 ) ライセンス: Link先を確認	Ruijie Jiang, Julia Gouvea, Eric Miller, David Hammer, Shuchin Aeron	(参考訳) 本稿では,分類のための文書の教師付き埋め込み,すなわちコントラスト語ムーバーの埋め込みに対する一般的なアプローチが,解釈可能性を加えることで著しく向上することを示す。この解釈性は、クラスタリング促進機構をコントラスト損失に組み込むことによって達成される。いくつかの公開データセットでは,提案手法が既存のベースラインに対して大幅に改善すると同時に,特定のクラスを最も代表するキーワードのセットを識別することでクラスタへの解釈を提供する。本稿は,学習科学(LS)の領域に根ざした課題である,科学的な文章や思考のための学生の作業を評価するための自然言語処理(NLP)手法の開発の必要性が背景にある。この文脈では,本手法が生物学の授業における実験報告に関連する学生の作業の有意義な評価につながることを示し,LS研究者が学生の理解を深め,科学的思考過程の証拠を評価するのに役立つことを示す。 This paper shows that a popular approach to the supervised embedding of documents for classification, namely, contrastive Word Mover's Embedding, can be significantly enhanced by adding interpretability. This interpretability is achieved by incorporating a clustering promoting mechanism into the contrastive loss. On several public datasets, we show that our method improves significantly upon existing baselines while providing interpretation to the clusters via identifying a set of keywords that are the most representative of a particular class. Our approach was motivated in part by the need to develop Natural Language Processing (NLP) methods for the novel problem of assessing student work for scientific writing and thinking - a problem that is central to the area of (educational) Learning Sciences (LS). In this context, we show that our approach leads to a meaningful assessment of the student work related to lab reports from a biology class and can help LS researchers gain insights into student understanding and assess evidence of scientific thought processes.	翻訳日:2021-11-02 14:04:36 公開日:2021-11-01
# Collage: ディープラーニングバックエンドの自動統合 Collage: Automated Integration of Deep Learning Backends ( http://arxiv.org/abs/2111.00655v1 ) ライセンス: Link先を確認	Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia	(参考訳) ディープラーニング(DL)アプリケーションの効率的なデプロイに対する強い要求は、リッチなDLエコシステムの迅速な開発を促す。高速な進歩に追いつくためには、dlフレームワークが様々な最適化されたライブラリやランタイムをバックエンドとして効率的に統合し、それらを適切に使用することで、可能な限り高速な実行可能ファイルを生成することが不可欠である。しかし、現在のdlフレームワークは多様なバックエンドを統合するためにかなりの手作業を必要とし、しばしば高いパフォーマンスを提供することができない。本稿では,dlバックエンドを統合するための自動フレームワークであるcollageを提案する。 Collageは、ユーザがさまざまなバックエンドの機能を正確に指定できるバックエンド登録インターフェースを提供する。 Collageは利用可能なバックエンドの仕様を活用することで、特定のワークロードと実行環境に対して最適化されたバックエンド配置を検索する。評価の結果,コラージュは手動の介入なしに複数のバックエンドを自動的に統合し,2つのNVIDIA GPUとIntel CPUで既存のフレームワークを1.21x,1.39x,1.40xで上回ります。 Strong demands for efficient deployment of Deep Learning (DL) applications prompt the rapid development of a rich DL ecosystem. To keep up with its fast advancement, it is crucial for DL frameworks to efficiently integrate a variety of optimized libraries and runtimes as their backends and generate the fastest possible executable by using them properly. However, current DL frameworks require significant manual effort to integrate diverse backends and often fail to deliver high performance. In this paper, we propose Collage, an automatic framework for integrating DL backends. Collage provides a backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage searches for an optimized backend placement for a given workload and execution environment. Our evaluation shows that Collage automatically integrates multiple backends together without manual intervention, and outperforms existing frameworks by 1.21x, 1.39x, 1.40x on two different NVIDIA GPUs and an Intel CPU respectively.	翻訳日:2021-11-02 14:02:19 公開日:2021-11-01
# RMNA:ルールマイニングを用いた近隣アグリゲーションに基づく知識グラフ表現学習モデル RMNA: A Neighbor Aggregation-Based Knowledge Graph Representation Learning Model Using Rule Mining ( http://arxiv.org/abs/2111.00658v1 ) ライセンス: Link先を確認	Ling Chen, Jun Cui, Xing Tang, Chaodu Song, Yuntao Qian, Yansheng Li, and Yongjun Zhang	(参考訳) 最先端の伝統的な表現学習(TRL)モデルは知識グラフの完成度において競争性能を示すが、実体の埋め込みの間にパラメータ共有はなく、実体間の接続が弱い。そこで,隣接集約型表現学習(narl)モデルを提案する。しかし、既存のNARLモデルは、複数のホップ隣人の情報を無視したり、階層的な隣人の集約によって、複数のホップ隣人の完全性を破壊したりする。本稿では,ルールマイニングアルゴリズムを用いてホルンルールを取得しフィルタするRMNAというNARLモデルを提案する。また,選択されたホルンルールを用いて,貴重なマルチホップ隣人をワンホップ隣人に変換するので,これらのワンホップ隣人を集約することで,有意義なマルチホップ隣人の情報を完全に活用することができる。実験では,RMNAと最先端TRLモデル,NARLモデルを比較した。その結果,RMNAは競争力のある性能を示した。 Although the state-of-the-art traditional representation learning (TRL) models show competitive performance on knowledge graph completion, there is no parameter sharing between the embeddings of entities, and the connections between entities are weak. Therefore, neighbor aggregation-based representation learning (NARL) models are proposed, which encode the information in the neighbors of an entity into its embeddings. However, existing NARL models either only utilize one-hop neighbors, ignoring the information in multi-hop neighbors, or utilize multi-hop neighbors by hierarchical neighbor aggregation, destroying the completeness of multi-hop neighbors. In this paper, we propose a NARL model named RMNA, which obtains and filters Horn rules through a rule mining algorithm, and uses the selected Horn rules to transform valuable multi-hop neighbors into one-hop neighbors, so that the information in valuable multi-hop neighbors can be completely utilized by aggregating these one-hop neighbors. In experiments, we compare RMNA with the state-of-the-art TRL models and NARL models. The results show that RMNA has a competitive performance.	翻訳日:2021-11-02 14:02:02 公開日:2021-11-01
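As a rough illustration of how a Horn rule can turn a two-hop neighbor into a one-hop neighbor, as the RMNA abstract above describes, the following sketch applies a single length-two rule to a toy triple set. The function `apply_horn_rule`, the entity and relation names, and the rule itself are hypothetical; rule mining, rule filtering, and the aggregation network from the paper are not reproduced here.

```python
from collections import defaultdict


def apply_horn_rule(triples, body, head):
    """Apply the rule body[0](x, y) AND body[1](y, z) => head(x, z).

    triples is an iterable of (head_entity, relation, tail_entity).
    Returns the set of derived one-hop triples (x, head, z).
    """
    r1, r2 = body
    # Index the second body relation by its source entity for the join on y.
    r2_targets = defaultdict(list)
    for h, r, t in triples:
        if r == r2:
            r2_targets[h].append(t)
    derived = set()
    for h, r, t in triples:
        if r == r1:
            for z in r2_targets[t]:
                derived.add((h, head, z))
    return derived


# Toy knowledge graph (hypothetical data).
triples = {
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("bob", "born_in", "lyon"),
    ("lyon", "city_of", "france"),
}
# Hypothetical rule: born_in(x, y) AND city_of(y, z) => nationality(x, z)
new_edges = apply_horn_rule(triples, ("born_in", "city_of"), "nationality")
```

After this step, `france` is a direct neighbor of `alice` and `bob`, so a one-hop aggregator could use it without hierarchical multi-hop aggregation.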
# 畳み込みニューラルネットワークの解法の拡張によるグラフニューラルネットワークのエッジレベル説明 Edge-Level Explanations for Graph Neural Networks by Extending Explainability Methods for Convolutional Neural Networks ( http://arxiv.org/abs/2111.00722v1 ) ライセンス: Link先を確認	Tetsu Kasanishi, Xueting Wang, and Toshihiko Yamasaki	(参考訳) グラフニューラルネットワーク(GNN)は、グラフデータを入力として扱うディープラーニングモデルであり、トラフィック予測や分子特性予測といった様々なタスクに適用される。しかしながら、GNNの複雑さのため、入力のどの部分がGNNモデルの出力に影響を与えるかを分析することは困難である。本研究では,GNNに対して局所解釈型モデル非依存記述(LIME)やグラディエント・ベース・サリエンシマップ,グラディエント・クラス活性化マッピング(Grad-CAM)などの畳み込みニューラルネットワーク(CNN)の説明可能性手法を拡張し,入力グラフのどのエッジが重要かを予測する。実験結果から,limeベースの手法は実環境における複数タスクの最も効率的な説明可能性であり,gnnによる説明可能性の最先端手法よりも優れていることが示唆された。 Graph Neural Networks (GNNs) are deep learning models that take graph data as inputs, and they are applied to various tasks such as traffic prediction and molecular property prediction. However, owing to the complexity of the GNNs, it has been difficult to analyze which parts of inputs affect the GNN model's outputs. In this study, we extend explainability methods for Convolutional Neural Networks (CNNs), such as Local Interpretable Model-Agnostic Explanations (LIME), Gradient-Based Saliency Maps, and Gradient-Weighted Class Activation Mapping (Grad-CAM) to GNNs, and predict which edges in the input graphs are important for GNN decisions. The experimental results indicate that the LIME-based approach is the most efficient explainability method for multiple tasks in the real-world situation, outperforming even the state-of-the-art method in GNN explainability.	翻訳日:2021-11-02 14:01:42 公開日:2021-11-01
# 交通予測のための適応型マルチレセプティブフィールド空間-時間グラフ畳み込みネットワーク Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Network for Traffic Forecasting ( http://arxiv.org/abs/2111.00724v1 ) ライセンス: Link先を確認	Xing Wang (1), Juan Zhao (1), Lin Zhu (1), Xu Zhou (2), Zhao Li (2), Junlan Feng (1), Chao Deng (1), Yong Zhang (2) ((1) China Mobile Research Institute, Beijing, China, (2) Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China)	(参考訳) モバイルネットワークトラフィック予測は,日々のネットワーク運用において重要な機能のひとつだ。商用モバイルネットワークは、大きく、異種で、複雑で、動的である。これらの本質的な特徴により、グラフ畳み込みネットワークに基づく予測手法や、自動車交通予測に成功している様々な注意メカニズムといった最近の高度なアルゴリズムでも、モバイルネットワークトラフィック予測は解決されない。本稿では,この問題を時空間シーケンス予測タスクとして用いた。本稿では,移動基地局のトラフィック動態をモデル化するために,新しい深層学習ネットワークアーキテクチャである適応多受容場空間時間グラフ畳み込みネットワーク(AMF-STGCN)を提案する。 AMF-STGCNは,(1)移動ネットワークにおける複雑な時空間依存性を共同でモデル化し,(2)異種基地局の様々な受容場を捕捉するための注意機構を適用し,(3)完全に接続されたディープネットワークに基づく余分なデコーダを導入して,マルチステップ予測による誤り伝播課題を克服する。 2つの異なるドメインからの4つの実世界のデータセットの実験は、一貫してamf-stgcnが最先端のメソッドを上回ることを示している。 Mobile network traffic forecasting is one of the key functions in daily network operation. A commercial mobile network is large, heterogeneous, complex and dynamic. These intrinsic features make mobile network traffic forecasting far from being solved even with recent advanced algorithms such as graph convolutional network-based prediction approaches and various attention mechanisms, which have been proved successful in vehicle traffic forecasting. In this paper, we cast the problem as a spatial-temporal sequence prediction task. We propose a novel deep learning network architecture, Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Networks (AMF-STGCN), to model the traffic dynamics of mobile base stations. 
AMF-STGCN extends GCN by (1) jointly modeling the complex spatial-temporal dependencies in mobile networks, (2) applying attention mechanisms to capture the various receptive fields of heterogeneous base stations, and (3) introducing an extra decoder based on a fully connected deep network to overcome the error propagation challenge in multi-step forecasting. Experiments on four real-world datasets from two different domains consistently show that AMF-STGCN outperforms the state-of-the-art methods.	翻訳日:2021-11-02 14:01:24 公開日:2021-11-01
# マルコフ報酬の表現性について On the Expressivity of Markov Reward ( http://arxiv.org/abs/2111.00876v1 ) ライセンス: Link先を確認	David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh	(参考訳) リワードは強化学習エージェントの推進力である。本稿では,エージェントが実行するタスクをキャプチャする手段として,報酬の表現性を理解することを目的としている。本研究は,(1)許容される行動のセット,(2)行動上の部分順序付け,(3)軌道上の部分順序付けという,3つの新しい「タスク」の抽象概念を中心に構成する。私たちの主な結果は、報酬はこれらのタスクの多くを表現できるが、それぞれのタスクタイプには、マルコフ報酬関数がキャプチャできないインスタンスが存在することを示しています。次に,マルコフ報酬関数を構成する多項式時間アルゴリズムのセットを提供し,エージェントがこれら3種類のタスクを最適化し,その報酬関数が存在しないかを正しく判断する。結論は,我々の理論的知見を裏付ける実証的研究である。 Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.	翻訳日:2021-11-02 14:01:03 公開日:2021-11-01
# 2段階因果MDPの干渉効率アルゴリズム Intervention Efficient Algorithm for Two-Stage Causal MDPs ( http://arxiv.org/abs/2111.00886v1 ) ライセンス: Link先を確認	Rahul Madhavan, Aurghya Maiti, Gaurav Sinha and Siddharth Barman	(参考訳) マルコフ決定過程 (MDP) では、状態が確率的に報酬を生成する因果グラフに対応する。この設定では、学習者の目標は、各状態の変数に介入することで高い報酬をもたらす原子的介入を特定することである。最近の因果関係の枠組みを一般化し、それぞれの状態に平行な因果関係グラフを持つ2段階の因果関係のMDPに対する(単純な)最小化保証を開発する。インスタンス依存の後悔の束縛を実現するアルゴリズムを提案する。このアルゴリズムの重要な特徴は、凸最適化を利用して探索問題に対処することである。後悔の保証が本質的にきついインスタンスのクラスを特定し、理論的結果を実験的に検証する。 We study Markov Decision Processes (MDP) wherein states correspond to causal graphs that stochastically generate rewards. In this setup, the learner's goal is to identify atomic interventions that lead to high rewards by intervening on variables at each state. Generalizing the recent causal-bandit framework, the current work develops (simple) regret minimization guarantees for two-stage causal MDPs, with parallel causal graph at each state. We propose an algorithm that achieves an instance dependent regret bound. A key feature of our algorithm is that it utilizes convex optimization to address the exploration problem. We identify classes of instances wherein our regret guarantee is essentially tight, and experimentally validate our theoretical results.	翻訳日:2021-11-02 14:00:50 公開日:2021-11-01
# ノード数が変動する線形非ガウス有向非巡回グラフの学習 Learning linear non-Gaussian directed acyclic graph with diverging number of nodes ( http://arxiv.org/abs/2111.00740v1 ) ライセンス: Link先を確認	Ruixuan Zhao and Xin He and Junhui Wang	(参考訳) 有向非巡回グラフ(DAG)として表される非巡回モデルは、収集ノード間の方向因果関係を表現するために広く用いられている。本稿では,高次元の場合において,連続的な非ガウス分布の雑音が生じるような非線形ガウスDAGを効率よく学習する方法を提案する。これは、ガウス雑音を仮定する既存のDAG学習法と、正確なDAG回復を達成するための分散仮定を付加している。提案手法は,DAG学習を促進するためにトポロジカル層の概念を活用する。特に、トポロジ的層をボトムアップ的に正確に再構成することができ、各層内のノード間の親子関係も一貫して確立できることを示す。さらに,提案手法はDAG学習の文献で広く想定されている親の忠実さや親の忠実さの仮定を必要としない。その利点は、さまざまなシミュレーション例で人気のあるライバルたちとの数値比較や、covid-19の世界的な拡散に関する実際の応用によっても支持されている。 Acyclic model, often depicted as a directed acyclic graph (DAG), has been widely employed to represent directional causal relations among collected nodes. In this article, we propose an efficient method to learn linear non-Gaussian DAG in high dimensional cases, where the noises can be of any continuous non-Gaussian distribution. This is in sharp contrast to most existing DAG learning methods assuming Gaussian noise with additional variance assumptions to attain exact DAG recovery. The proposed method leverages a novel concept of topological layer to facilitate the DAG learning. Particularly, we show that the topological layers can be exactly reconstructed in a bottom-up fashion, and the parent-child relations among nodes in each layer can also be consistently established. More importantly, the proposed method does not require the faithfulness or parental faithfulness assumption which has been widely assumed in the literature of DAG learning. Its advantage is also supported by the numerical comparison against some popular competitors in various simulated examples as well as a real application on the global spread of COVID-19.	翻訳日:2021-11-02 13:59:27 公開日:2021-11-01
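To make the notion of a topological layer in the abstract above more concrete, here is a small sketch that computes layers for a DAG that is already known. It assumes a node's layer is the length of the longest directed path from that node to a childless node, so leaves sit in layer 0; this definition is an assumption based on the "bottom-up" wording, and the paper's actual contribution (recovering the layers from data) is not shown. The name `topological_layers` and the example graph are hypothetical.

```python
def topological_layers(children):
    """Map each node to its layer: 0 for leaves, else 1 + max layer of its children.

    children maps a node to the list of nodes it points to.
    The input is assumed to be acyclic.
    """
    layer = {}

    def visit(node):
        if node not in layer:
            kids = children.get(node, [])
            layer[node] = 0 if not kids else 1 + max(visit(k) for k in kids)
        return layer[node]

    for node in children:
        visit(node)
    return layer


# Hypothetical four-node DAG: a -> b, a -> c, b -> d, c -> d.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
layers = topological_layers(dag)
```

Under this definition every edge points from a higher layer to a strictly lower one, which is why the layers can be built from the leaves upward.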
# 正規化フローを用いたポチトグラフィーの不確かさ定量化 Uncertainty quantification for ptychography using normalizing flows ( http://arxiv.org/abs/2111.00745v1 ) ライセンス: Link先を確認	Agnimitra Dasgupta and Zichao Wendy Di	(参考訳) 高分解能・非破壊的な材料キャラクタリゼーションに欠かせない手法として、Ptychographyは大規模な非線形・非凸逆問題を示すが、本質的な光子統計はこれらの課題に対処するための統計に基づく深層学習アプローチに明確な機会を与える。本研究は, 高次元後部サロゲートを得るための正規化流を探索し, また, 復元に伴う不確かさのキャラクタリゼーションを可能にする。地中真実の欠如による復元品質の判断, 突発的な人工物発見, 返却された不確実性パターンを用いた将来の実験の指導において, 極めて望ましい能力である。提案手法は, 音を付加した合成試料と, 様々な物理実験環境での性能を示す。 Ptychography, as an essential tool for high-resolution and nondestructive material characterization, presents a challenging large-scale nonlinear and non-convex inverse problem; however, its intrinsic photon statistics create clear opportunities for statistical-based deep learning approaches to tackle these challenges, which has been underexplored. In this work, we explore normalizing flows to obtain a surrogate for the high-dimensional posterior, which also enables the characterization of the uncertainty associated with the reconstruction: an extremely desirable capability when judging the reconstruction quality in the absence of ground truth, spotting spurious artifacts and guiding future experiments using the returned uncertainty patterns. We demonstrate the performance of the proposed method on a synthetic sample with added noise and in various physical experimental settings.	翻訳日:2021-11-02 13:59:09 公開日:2021-11-01
# (参考訳) コントラスト学習は予習からファインタニングまで対向的ロバスト性を維持するか? When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? ( http://arxiv.org/abs/2111.01124v1 ) ライセンス: CC0 1.0	Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, Chuang Gan	(参考訳) Contrastive Learning (CL)は、一般化可能な特徴表現を学習し、その上に線形分類器を微調整することで、下流タスクの最先端のパフォーマンスを達成する。しかし, 画像分類において, 対向ロバスト性は不可欠となるため, CLが下流タスクに対するロバスト性を維持することができるかどうかは不明である。主な課題は、自己指導型事前学習+教師型微調整パラダイムにおいて、事前訓練から微調整までの学習課題のミスマッチにより、対人的堅牢性が容易に忘れられることである。このような課題を,“クロスタスクロバスト性転送可能性”と呼んでいる。上記の問題に対処するため,本論文では,ロバスト性向上のレンズを通してcl原理を再検討し,発展させる。 1) 画像の高周波成分はモデルのロバスト性を向上させるのに有用であり, (2) 擬似超視覚刺激(例:特徴クラスタリング)によるclの強化は、忘れずにロバスト性を維持するのに役立つ。本稿では,新しい設計を取り入れた新しい対向型コントラスト事前学習フレームワークAdvCLを提案する。本稿では,AdvCLがモデル精度と微調整効率を損なうことなく,タスク間の堅牢性伝達性を向上できることを示す。本稿では,AdvCLが複数のデータセット(CIFAR-10,CIFAR-100,STL-10)とファインタニングスキーム(線形評価とフルモデルファインタニング)において,最先端の自己教師型学習手法よりも優れていることを示す。 Contrastive learning (CL) can learn generalizable feature representations and achieve the state-of-the-art performance of downstream tasks by finetuning a linear classifier on top of it. However, as adversarial robustness becomes vital in image classification, it remains unclear whether or not CL is able to preserve robustness to downstream tasks. The main challenge is that in the self-supervised pretraining + supervised finetuning paradigm, adversarial robustness is easily forgotten due to a learning task mismatch from pretraining to finetuning. We call such a challenge 'cross-task robustness transferability'. To address the above problem, in this paper we revisit and advance CL principles through the lens of robustness enhancement. We show that (1) the design of contrastive views matters: High-frequency components of images are beneficial to improving model robustness; (2) Augmenting CL with pseudo-supervision stimulus (e.g., resorting to feature clustering) helps preserve robustness without forgetting. 
Equipped with our new designs, we propose AdvCL, a novel adversarial contrastive pretraining framework. We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency. With a thorough experimental study, we demonstrate that AdvCL outperforms the state-of-the-art self-supervised robust learning methods across multiple datasets (CIFAR-10, CIFAR-100, and STL-10) and finetuning schemes (linear evaluation and full model finetuning).	翻訳日:2021-11-02 13:55:41 公開日:2021-11-01
# vpfnet:マルチクラス3dオブジェクト検出のためのvoxel-pixel fusion network VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection ( http://arxiv.org/abs/2111.00966v1 ) ライセンス: Link先を確認	Chia-Hung Wang, Hsueh-Wei Chen, Li-Chen Fu	(参考訳) 多くのLiDARを用いた大規模物体検出法、単一クラス物体検出法、あるいは簡単な状況下では、非常によく機能すると主張した。しかし,イメージセマンティクスの活用に失敗したため,小型物体の検出や硬い状況下での性能は,融合ベースのものを超えなかった。本稿では,複雑な環境下での検知性能を高めるために,LiDARとカメラセンサデータストリームを併用した深層学習(DL)組み込み核融合型3Dオブジェクト検出ネットワーク,Voxel-Pixel Fusion Network (VPFNet)を提案する。このネットワーク内では、voxel-pixel fusion(vpf)層と呼ばれ、voxel-pixelペアの幾何学的関係を利用して、voxelの特徴とピクセルの特徴を適切なメカニズムで融合する。さらに,voxel-pixel対の特性を考慮し,核融合効果を誘導・増強するために,いくつかのパラメータが特に設計されている。提案手法は,マルチレベル難易度下でのマルチクラス3次元オブジェクト検出タスクのKITTIベンチマークで評価し,平均平均精度(mAP)ですべての最先端手法より優れていることを示す。ここでの我々のアプローチは、挑戦的な歩行者クラスでKITTIのリーダーボードにランクインしている点も注目に値する。 Many LiDAR-based methods for detecting large objects, single-class object detection, or under easy situations were claimed to perform quite well. However, their performances of detecting small objects or under hard situations did not surpass those of the fusion-based ones due to failure to leverage the image semantics. In order to elevate the detection performance in a complicated environment, this paper proposes a deep learning (DL)-embedded fusion-based multi-class 3D object detection network which admits both LiDAR and camera sensor data streams, named Voxel-Pixel Fusion Network (VPFNet). Inside this network, a key novel component is called Voxel-Pixel Fusion (VPF) layer, which takes advantage of the geometric relation of a voxel-pixel pair and fuses the voxel features and the pixel features with proper mechanisms. Moreover, several parameters are particularly designed to guide and enhance the fusion effect after considering the characteristics of a voxel-pixel pair. Finally, the proposed method is evaluated on the KITTI benchmark for multi-class 3D object detection task under multilevel difficulty, and is shown to outperform all state-of-the-art methods in mean average precision (mAP). 
It is also noteworthy that our approach ranks first on the KITTI leaderboard for the challenging pedestrian class.	翻訳日:2021-11-02 13:27:37 公開日:2021-11-01
# MOST-GAN:遠交顔画像操作のための3次元形状型GAN MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation ( http://arxiv.org/abs/2111.01048v1 ) ライセンス: Link先を確認	Safa C. Medin, Bernhard Egger, Anoop Cherian, Ye Wang, Joshua B. Tenenbaum, Xiaoming Liu, Tim K. Marks	(参考訳) 最近のgans(generative adversarial network)の進歩は、顔画像合成において顕著な成果をもたらしている。スタイルベースのganを用いる手法は、印象的なフォトリアリスティックな顔画像を生成することができるが、生成した顔の特徴を有意義で不連続な方法で制御することはしばしば困難である。事前のアプローチは、以前に訓練されたGANの潜在空間内で、このような意味制御と非絡み合いを実現することを目的としている。対照的に,3次元形状,アルベド,ポーズ,照明などの顔の物理的属性を事前にモデル化し,デザインによる絡み合いを解消する枠組みを提案する。提案手法であるMOST-GANは,スタイルベースGANの表現力と光リアリズムと非線形3D形態素モデルの物理的歪みと柔軟性を統合し,最先端の2Dヘア操作ネットワークと結合する。 MOST-GANは、その物理的特性を完全に3D制御した肖像画の写実的な操作を実現し、照明、表情、およびフルプロファイルビューまでのポーズの極端な操作を可能にする。 Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis. While methods that use style-based GANs can generate strikingly photorealistic face images, it is often difficult to control the characteristics of the generated faces in a meaningful and disentangled way. Prior approaches aim to achieve such semantic control and disentanglement within the latent space of a previously trained GAN. In contrast, we propose a framework that a priori models physical attributes of the face such as 3D shape, albedo, pose, and lighting explicitly, thus providing disentanglement by design. Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models, which we couple with a state-of-the-art 2D hair manipulation network. MOST-GAN achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations up to full profile view.	翻訳日:2021-11-02 13:27:12 公開日:2021-11-01
# ACGANの再起動: 安定トレーニングによる補助分類型GAN Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training ( http://arxiv.org/abs/2111.01118v1 ) ライセンス: Link先を確認	Minguk Kang, Woohyeon Shim, Minsu Cho, Jaesik Park	(参考訳) 条件付き生成逆数ネットワーク(cGAN)は、クラス情報をGANに組み込んで現実的な画像を生成する。最も一般的なcGANの1つは、ソフトマックスクロスエントロピー損失(ACGAN)を持つ補助分類器GANであるが、データセットのクラス数が増加するにつれて、ACGANのトレーニングが困難であることが広く知られている。 ACGANはまた、多様性の欠如により容易に分類できるサンプルを生成する傾向がある。本稿では,ACGANの治療法を2つ紹介する。まず,分類器内での勾配爆発は早期学習において望ましくない崩壊を引き起こし,入力ベクトルを単位超球面に投影することで問題を解くことができる。次に,データ対データクロスエントロピー損失 (d2d-ce) を提案する。本稿では,Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN)を提案する。実験結果から,ReACGANはCIFAR10, Tiny-ImageNet, CUB200, ImageNetのデータセット上で,最先端の生成結果が得られることがわかった。また、ReACGANは差別化可能な拡張による利点があり、D2D-CEがStyleGAN2アーキテクチャと調和していることを検証する。モデル重みと代表的なcGANの実装を提供するソフトウェアパッケージはhttps://github.com/POSTECH-CVLab/PyTorch-StudioGANで公開されている。 Conditional Generative Adversarial Networks (cGAN) generate realistic images by incorporating class information into GAN. While one of the most popular cGANs is an auxiliary classifier GAN with softmax cross-entropy loss (ACGAN), it is widely known that training ACGAN is challenging as the number of classes in the dataset increases. ACGAN also tends to generate easily classifiable samples with a lack of diversity. In this paper, we introduce two cures for ACGAN. First, we identify that gradient exploding in the classifier can cause an undesirable collapse in early training, and projecting input vectors onto a unit hypersphere can resolve the problem. Second, we propose the Data-to-Data Cross-Entropy loss (D2D-CE) to exploit relational information in the class-labeled dataset. On this foundation, we propose the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). The experimental results show that ReACGAN achieves state-of-the-art generation results on CIFAR10, Tiny-ImageNet, CUB200, and ImageNet datasets. 
We also verify that ReACGAN benefits from differentiable augmentations and that D2D-CE harmonizes with StyleGAN2 architecture. Model weights and a software package that provides implementations of representative cGANs and all experiments in our paper are available at https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.	翻訳日:2021-11-02 13:26:52 公開日:2021-11-01
# 強化学習におけるサンプル複雑度の地平線依存性の解決 Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning ( http://arxiv.org/abs/2111.00633v1 ) ライセンス: Link先を確認	Yuanzhi Li, Ruosong Wang, Lin F. Yang	(参考訳) 近年,強化学習(RL)におけるサンプル複雑性の地平線依存性の理解への関心が高まっている。特に、地平線長が$H$のRL環境においては、状態と動作の数が固定されている場合に、$\mathrm{polylog}(H)$エピソードの環境相互作用を用いて$O(1)$-最適ポリシーを学習する、確率的近似的に正しい(PAC)アルゴリズムが存在することが先行研究で示されている。しかし、$\mathrm{polylog}(H)$ の依存が必要かどうかはまだ不明である。本研究では,同じPAC保証を実現しながら環境相互作用を$O(1)$エピソードしか使用しないアルゴリズムを開発することでこの問題を解決し,RLにおけるサンプル複雑性の地平線依存性を完全に決着させる。我々は、(i)割引および有限地平線マルコフ決定過程(MDP)における値関数間の関係を確立すること、(ii)MDPにおける新しい摂動解析によって、この限界を達成する。我々の新しい技術は独立した興味を持ち、RLの関連する問題に適用できると考えている。 Recently there has been a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed. It is yet unknown whether the $\mathrm{polylog}(H)$ dependence is necessary or not. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs) and (ii) a novel perturbation analysis in MDPs. We believe our new techniques are of independent interest and could be applied in related questions in RL.	翻訳日:2021-11-02 13:26:28 公開日:2021-11-01
# 分散非凸最適化のための通信圧縮適応勾配法 Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization ( http://arxiv.org/abs/2111.00705v1 ) ライセンス: Link先を確認	Yujia Wang, Lu Lin and Jinghui Chen	(参考訳) トレーニングデータセットの規模が爆発的に増えているため、近年、分散学習への関心が高まっている。主なボトルネックの1つは、中央サーバとローカルワーカーの間の通信コストが大きいことである。誤りフィードバック圧縮は確率勾配勾配(SGD)による通信コストの低減に成功していることが証明されているが、大規模機械学習モデルのトレーニングに広く用いられている保証付き通信効率の高い適応勾配法を構築する試みは、はるかに少ない。本稿では,分散非凸最適化問題に対する通信圧縮型AMSGradを提案する。提案する分散学習フレームワークは,効果的な勾配圧縮戦略とワーカーサイドモデル更新設計を特徴とする。提案手法は,確率的非凸最適化設定において,非圧縮バニラ AMSGrad と同じ繰り返しの複雑度で,一階定常点に収束することを示す。様々なベンチマーク実験が我々の理論を裏付けている。 Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are much fewer attempts in building communication-efficient adaptive gradient methods with provable guarantees, which are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problem, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to the first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.	翻訳日:2021-11-02 13:26:09 公開日:2021-11-01
# ニューラルネットワークにおける自由確率、ニュートンリリパッドおよびジャコビアン Free Probability, Newton lilypads and Jacobians of neural networks ( http://arxiv.org/abs/2111.00841v1 ) ライセンス: Link先を確認	Reda Chhaibi, Tariq Daouda, Ezechiel Kahn	(参考訳) ニューラルネットワークの学習過程における勾配降下は多くの不安定性を伴う。ヤコビアンのスペクトル密度はロバスト性を分析する重要な要素である。ペニントンらの研究に続いて、そのようなヤコビアンは自由確率論からの自由乗法畳み込みを用いてモデル化される。本稿では,関連するスペクトル密度を計算するための信頼性の高い高速手法を提案する。この方法は制御され、証明された収束を有する。我々の手法は, 適応的なニュートン・ラフソンスキームに基づいてアトラクションの流域を探索し, チェーン化することで, ライパッドのような連続的な流域と, 目的に向かってのステップを見つける。本稿では,学習プロセスがネットワークの深さ,層幅,初期化選択によってどのように影響を受けるかを評価することにより,本手法の適用性を示す。 Gradient descent during the learning process of a neural network can be subject to many instabilities. The spectral density of the Jacobian is a key component for analyzing robustness. Following the works of Pennington et al., such Jacobians are modeled using free multiplicative convolutions from Free Probability Theory. We present a reliable and very fast method for computing the associated spectral densities. This method has a controlled and proven convergence. Our technique is based on an adaptative Newton-Raphson scheme, by finding and chaining basins of attraction: the Newton algorithm finds contiguous lilypad-like basins and steps from one to the next, heading towards the objective. We demonstrate the applicability of our method by using it to assess how the learning process is affected by network depth, layer widths and initialization choices: empirically, final test losses are very correlated to our Free Probability metrics.	翻訳日:2021-11-02 13:24:46 公開日:2021-11-01
# (参考訳) ロバストな質問応答のためのイントロスペクティブ蒸留 Introspective Distillation for Robust Question Answering ( http://arxiv.org/abs/2111.01026v1 ) ライセンス: CC BY 4.0	Yulei Niu, Hanwang Zhang	(参考訳) 質問応答(QA)モデルは、例えば、視覚的QAにおける言語事前分布(language prior)や、読解における位置バイアスといったデータバイアスを利用することでよく知られている。近年の脱バイアス法は, 分布内(ID)性能を著しく犠牲にして, 分布外(OOD)の一般化性を向上している。したがって、これらはテスト分布が事前に知られている領域にのみ適用できる。本稿では,QAの両世界を最大限に活用するために,IntroD (Introspective Distillation) と呼ばれる新しい脱バイアス法を提案する。我々の主要な技術的貢献は,トレーニングサンプルが事実的なIDの世界に適合するか,あるいは反事実的なOODの世界に適合するかを内省することによって,OODとIDの帰納バイアスをブレンドすることである。視覚的QAデータセットのVQA v2, VQA-CP, 読解データセットのSQuADでの実験により, 提案したIntroDは, 他の脱バイアス法と比較して競合性のあるOOD性能を維持しつつ, 脱バイアスを行わない手法と比較してID性能をほとんど犠牲にせず, 場合によってはそれを上回ることが示された。 Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension. Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance. Therefore, they are only applicable in domains where the test distribution is known in advance. In this paper, we present a novel debiasing method called Introspective Distillation (IntroD) to make the best of both worlds for QA. Our key technical contribution is to blend the inductive bias of OOD and ID by introspecting whether a training sample fits in the factual ID world or the counterfactual OOD one. Experiments on visual QA datasets VQA v2, VQA-CP, and reading comprehension dataset SQuAD demonstrate that our proposed IntroD maintains the competitive OOD performance compared to other debiasing methods, while sacrificing little or even achieving better ID performance compared to the non-debiasing ones.	翻訳日:2021-11-02 13:20:35 公開日:2021-11-01
# コントラスト学習の一般化に向けて Towards the Generalization of Contrastive Self-Supervised Learning ( http://arxiv.org/abs/2111.00743v1 ) ライセンス: Link先を確認	Weiran Huang and Mingyang Yi and Xuyang Zhao	(参考訳) 近年,学習にラベルのないデータしか必要としない自己教師型学習が注目されている。コントラスト学習は、自己教師あり学習のための一般的なアプローチであり、実際に経験的にうまく機能する。しかし,下流課題における一般化能力の理論的理解は十分に研究されていない。そこで,本研究では,自己教師付き事前学習モデルがダウンストリームタスクにどのように一般化するかを理論的に説明する。具体的には,クラス中心と密集したクラス内サンプルを区別する特徴空間に入力データを組み込む場合,自己教師付きモデルが下流分類タスクにおいて一般化能を有することを示す。以上の結論により、SimCLR と Barlow Twins は2つの正準コントラスト自己監督法である。上記の特徴空間はいずれの手法でも得られることが証明され、下流分類タスクの一般化におけるそれらの成功を説明する。最後に, 理論的な知見を検証するため, 様々な実験を行った。 Recently, self-supervised learning has attracted great attention since it only requires unlabeled data for training. Contrastive learning is a popular approach for self-supervised learning and empirically performs well in practice. However, the theoretical understanding of its generalization ability on downstream tasks is not well studied. To this end, we present a theoretical explanation of how contrastive self-supervised pre-trained models generalize to downstream tasks. Concretely, we quantitatively show that the self-supervised model has generalization ability on downstream classification tasks if it embeds input data into a feature space with distinguishing centers of classes and closely clustered intra-class samples. With the above conclusion, we further explore SimCLR and Barlow Twins, which are two canonical contrastive self-supervised methods. We prove that the aforementioned feature space can be obtained via any of the methods, and thus explain their success on the generalization on downstream classification tasks. Finally, various experiments are also conducted to verify our theoretical findings.	翻訳日:2021-11-02 12:58:30 公開日:2021-11-01
# ベイズ最適化のためのディープカーネル獲得関数のエンドツーエンド学習 End-to-End Learning of Deep Kernel Acquisition Functions for Bayesian Optimization ( http://arxiv.org/abs/2111.00639v1 ) ライセンス: Link先を確認	Tomoharu Iwata	(参考訳) 複雑な構造を持つ高次元データに対するベイズ最適化(BO)のために、ガウス過程(GP)のためのニューラルネットワークベースのカーネルは、ディープラーニングの高表現力によって柔軟な代理関数を学習するために使われてきた。しかし、既存の手法では、BO性能を直接改善しない限界確率を最大化してニューラルネットワークを訓練している。本稿では,boが求める真の最適値と最良値との差を最小化するニューラルネットワークを用いた,boのメタ学習手法を提案する。我々は,現在評価されているデータポイントを入力として,次に評価すべきデータポイントをニューラルネットワークによって出力するポリシをモデル化する。このモデルでは、ニューラルネットワークベースのカーネルは、取得関数とgpを介してギャップをバックプロパゲーションすることにより、取得関数に適するように訓練される。我々のモデルは、複数のタスクから強化学習フレームワークによって訓練されている。ニューラルネットワークはさまざまなタスク間で共有されるため、複数のトレーニングタスクからBOに関する知識を収集し、その知識を見えないテストタスクに使用することができる。 3つのテキスト文書データセットを用いた実験において,提案手法が既存の手法よりも優れたBO性能を実現することを示す。 For Bayesian optimization (BO) on high-dimensional data with complex structure, neural network-based kernels for Gaussian processes (GPs) have been used to learn flexible surrogate functions by the high representation power of deep learning. However, existing methods train neural networks by maximizing the marginal likelihood, which do not directly improve the BO performance. In this paper, we propose a meta-learning method for BO with neural network-based kernels that minimizes the expected gap between the true optimum value and the best value found by BO. We model a policy, which takes the current evaluated data points as input and outputs the next data point to be evaluated, by a neural network, where neural network-based kernels, GPs, and mutual information-based acquisition functions are used as its layers. With our model, the neural network-based kernel is trained to be appropriate for the acquisition function by backpropagating the gap through the acquisition function and GP. Our model is trained by a reinforcement learning framework from multiple tasks. Since the neural network is shared across different tasks, we can gather knowledge on BO from multiple training tasks, and use the knowledge for unseen test tasks. 
In experiments using three text document datasets, we demonstrate that the proposed method achieves better BO performance than the existing methods.	翻訳日:2021-11-02 12:58:03 公開日:2021-11-01
# NOTMAD: サンプル特異構造とパラメータによるベイズネットワークの推定 NOTMAD: Estimating Bayesian Networks with Sample-Specific Structures and Parameters ( http://arxiv.org/abs/2111.01104v1 ) ライセンス: Link先を確認	Ben Lengerich, Caleb Ellington, Bryon Aragam, Eric P. Xing, Manolis Kellis	(参考訳) 文脈固有のベイズネットワーク(即ち有向非巡回グラフ、dag)は変数間の文脈依存関係を識別するが、非巡回性要求によって引き起こされる非凸性は、文脈固有の推定子(例えばグラフ生成関数)間での情報共有を困難にする。このため、コンテキスト固有のベイズネットワークを推定する既存の手法では、データセットをサブサンプルに分割し、統計的パワーと解像度を制限し、多次元および潜在コンテキストの使用を防止している。この課題を克服するために,NOTEARSを最適化したアーチティパルDAG(NOTMAD)を提案する。 NOTMADは、コンテキスト固有のベイジアンネットワークを、サンプルコンテキストに応じてアーキティパルネットワークを混合することを学ぶ関数の出力としてモデル化する。原型的ネットワークは、文脈固有のネットワークと共同で推定され、事前の知識は不要である。我々は、この非巡回性制約を混合関数に逆伝播する滑らかな正規化損失としてエンコードし、この方法でNOTMADはコンテキスト固有の非巡回グラフ間で情報を共有し、ベイズ的ネットワーク構造とパラメータを単一サンプル解像度で推定することができる。がんの形態的変異に対応する患者特異的遺伝子発現ネットワークを含む分析および実験を通じて,notmadおよびサンプル特異的ネットワーク推論の有用性を実証する。 Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favored breaking datasets into subsamples, limiting statistical power and resolution, and preventing the use of multidimensional and latent contexts. To overcome this challenge, we propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD). NOTMAD models context-specific Bayesian networks as the output of a function which learns to mix archetypal networks according to sample context. The archetypal networks are estimated jointly with the context-specific networks and do not require any prior knowledge. 
We encode the acyclicity constraint as a smooth regularization loss which is back-propagated to the mixing function; in this way, NOTMAD shares information between context-specific acyclic graphs, enabling the estimation of Bayesian network structures and parameters at even single-sample resolution. We demonstrate the utility of NOTMAD and sample-specific network inference through analysis and experiments, including patient-specific gene expression networks which correspond to morphological variation in cancer.	翻訳日:2021-11-02 12:57:42 公開日:2021-11-01
# 確率ゲートを用いた支援:理論と線形モデルへの応用 Support Recovery with Stochastic Gates: Theory and Application for Linear Models ( http://arxiv.org/abs/2110.15960v2 ) ライセンス: Link先を確認	Soham Jana, Henry Li, Yutaro Yamada, Ofir Lindenbaum	(参考訳) 本研究では,独立かつ同一に分布する正規誤差を持つ線形モデルにおいて,係数ベクトル($\beta^*$)の同時回復と推定の問題を解析する。確率ゲート(stg)[ylnk20]の非線形ペナルティに基づくペナライズ最小二乗推定器を用いて係数を推定する。ガウス設計行列を考えると、stgベースの推定器は、次元および$\beta^*$の妥当な条件下で真のデータ生成係数ベクトルに収束し、その支持集合を高い確率で検出する。一般非線形モデル用に設計された既存のSTG推定器を改善するために,線形モデル設定のための新しいプロジェクションベースアルゴリズムを提案する。この新しい手法は, 合成データ解析におけるリカバリを支援するために, 多くの古典的推定器を上回っている。 We analyze the problem of simultaneous support recovery and estimation of the coefficient vector ($\beta^*$) in a linear model with independent and identically distributed Normal errors. We apply the penalized least square estimator based on non-linear penalties of stochastic gates (STG) [YLNK20] to estimate the coefficients. Considering Gaussian design matrices we show that under reasonable conditions on dimension and sparsity of $\beta^*$ the STG based estimator converges to the true data generating coefficient vector and also detects its support set with high probability. We propose a new projection based algorithm for linear models setup to improve upon the existing STG estimator that was originally designed for general non-linear models. Our new procedure outperforms many classical estimators for support recovery in synthetic data analysis.	翻訳日:2021-11-02 11:19:22 公開日:2021-11-01
# 形状認識型3次元画像合成のためのシェーディングガイド生成命令モデル A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis ( http://arxiv.org/abs/2110.15678v2 ) ライセンス: Link先を確認	Xingang Pan, Xudong Xu, Chen Change Loy, Christian Theobalt, Bo Dai	(参考訳) 生成放射場の発展は、3D認識画像合成の境界を押し上げている。これらの手法は,複数の視点から3次元物体が現実的に見えるという観察に触発され,正則化として多視点制約を導入し,有効3次元放射場を2次元画像から学習する。進行にもかかわらず、形状と色のあいまいさのために正確な3D形状を捉えることができず、下流のタスクでは適用性が制限される。本研究では,この曖昧さに対処するために,新たに改良された形状表現を学習可能なシェーディング誘導型生成暗黙モデルを提案する。私たちの重要な洞察は、正確な3d形状は異なる照明条件下でもリアルなレンダリングをもたらすだろうということです。照明を明示的にモデル化し、様々な照明条件でシェーディングを行うことにより、マルチライト制約を実現する。勾配は、合成された画像を判別器に供給することによって導出される。表面正規化計算の計算負荷を補うために, 表面追跡による効率的なボリュームレンダリング戦略を考案し, 学習時間と推定時間をそれぞれ24%, 48%削減した。提案手法は, 正確な3次元形状を把握しながら, 光リアルな3次元画像合成を実現する。本研究では,既存の手法に対する3次元形状再構成手法の性能向上を実証し,画像照明への適用性を示す。私たちのコードはhttps://github.com/xingangpan/shadeganでリリースします。 The advancement of generative radiance fields has pushed the boundary of 3D-aware image synthesis. Motivated by the observation that a 3D object should look realistic from multiple viewpoints, these methods introduce a multi-view constraint as regularization to learn valid 3D radiance fields from 2D images. Despite the progress, they often fall short of capturing accurate 3D shapes due to the shape-color ambiguity, limiting their applicability in downstream tasks. In this work, we address this ambiguity by proposing a novel shading-guided generative implicit model that is able to learn a starkly improved shape representation. Our key insight is that an accurate 3D shape should also yield a realistic rendering under different lighting conditions. This multi-lighting constraint is realized by modeling illumination explicitly and performing shading with various lighting conditions. Gradients are derived by feeding the synthesized images to a discriminator. 
To compensate for the additional computational burden of calculating surface normals, we further devise an efficient volume rendering strategy via surface tracking, reducing the training and inference time by 24% and 48%, respectively. Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis while capturing accurate underlying 3D shapes. We demonstrate improved performance of our approach on 3D shape reconstruction against existing methods, and show its applicability on image relighting. Our code will be released at https://github.com/XingangPan/ShadeGAN.	翻訳日:2021-11-02 11:19:08 公開日:2021-11-01


# 多部系における協調と依存

Cooperation and dependencies in multipartite systems ( http://arxiv.org/abs/2003.12489v3 )

ライセンス: Link先を確認

Waldemar Klobus, Marek Miller, Mahasweta Pandit, Ray Ganardi, Lukas Knips, Jan Dziewior, Jasmin Meinecke, Harald Weinfurter, Wieslaw Laskowski, Tomasz Paterek

(参考訳) 本稿では,グローバルシステムのサブシステム間の依存性の度合いを把握できる情報理論量化器を提案する。量化器は、多くの性質を共有しているにもかかわらず、多部相関の測度とは異なる。古典的だけでなく量子システムにも直接計算可能であり、2つのサブシステム間の条件付き相互情報の比較に還元される。例えば、対称量子秘密共有のために新しい量子化器を使うことの利点を示す。また、ローカル操作下での条件付き相互情報の単調性の欠如を特徴付ける不等式を証明し、直観的な理解を提供する。これは、ここで導入された多成分依存測度と多成分相関の区別を示す。

We propose an information-theoretic quantifier for the advantage gained from cooperation that captures the degree of dependency between subsystems of a global system. The quantifier is distinct from measures of multipartite correlations despite sharing many properties with them. It is directly computable for classical as well as quantum systems and reduces to comparing the respective conditional mutual information between any two subsystems. As an example, we show the benefits of using the new quantifier for symmetric quantum secret sharing. We also prove an inequality characterizing the lack of monotonicity of conditional mutual information under local operations and provide an intuitive understanding of it. This underlines the distinction between the multipartite dependence measure introduced here and multipartite correlations.

翻訳日:2023-05-27 18:23:16 公開日:2021-11-01
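
For the classical case, the conditional mutual information mentioned in the abstract above can be illustrated with a small sketch. It does not implement the authors' quantifier; it only computes the ingredient quantity $I(A:B|C) = H(AC) + H(BC) - H(ABC) - H(C)$ from a joint probability table. The helper names (`marginal`, `entropy`, `cond_mutual_info`) and the XOR example distribution are assumptions chosen for illustration.

```python
from collections import defaultdict
from math import log2


def marginal(p, keep):
    """Marginalise a joint table {outcome tuple: probability} onto the indices in `keep`."""
    out = defaultdict(float)
    for outcome, prob in p.items():
        out[tuple(outcome[i] for i in keep)] += prob
    return dict(out)


def entropy(p):
    """Shannon entropy in bits of a probability table."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def cond_mutual_info(p, a, b, c=()):
    """I(A:B|C) in bits; a, b, c are tuples of variable indices (c may be empty)."""
    def h(idx):
        return entropy(marginal(p, idx))
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


# Illustrative distribution (an assumption): A and B are fair independent bits, C = A XOR B.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

unconditional = cond_mutual_info(xor, (0,), (1,))        # I(A:B)
conditional = cond_mutual_info(xor, (0,), (1,), (2,))    # I(A:B|C)
print(unconditional, conditional)
```

In this XOR example, A and B share no information on their own (0 bits), but conditioning on C makes them fully dependent (1 bit). This is the kind of gap that a comparison of conditional mutual informations can expose.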

# 仮想量子ビットを用いた多層熱機械の設計

Simplifying the design of multilevel thermal machines using virtual qubits ( http://arxiv.org/abs/2009.03832v3 )

ライセンス: Link先を確認

Ayaka Usui, Wolfgang Niedenzu, Marcus Huber

(参考訳) 量子熱力学は、しばしば大規模で複雑な環境と相互作用する小さな量子機械のダイナミクスを扱う。仮想量子ビット、衝突モデル、リセットマスター方程式は、数量子マシンと熱環境に結合した2次元ターゲットシステムの定性的挙動を予測するための非常に有用なツールとなっている。全ての物理系に対する単純化されたモデルパラメータの整合性は知られていないが、定性的予測は実装に関係なく量子機械の一般的な設計を可能にする。より大規模で複雑なマシンに結合した多次元システムのモデリングに複数の競合する仮想キュービットを導入することで、これらのツールを一般化する。 3次元のターゲットに対する完全な物理力学をシミュレートすることにより、現実的なセットアップにおける物理変化の定性的特徴を正確に予測し、数量子ビットを超える自律量子マシンを設計するために「ダイアル」として使用できるリセットモデルの一般的な性質を明らかにする。次に,マルチキュービットマシンに結合した任意の次元システムに対するリセットモデルの一般解析解を提案する。最後に, 改良された3レベルレーザーを, 実験結果の例示として紹介する。

Quantum thermodynamics often deals with the dynamics of small quantum machines interfacing with a large and complex environment. Virtual qubits, collisional models and reset master equations have become highly useful tools for predicting the qualitative behaviour of two-dimensional target systems coupled to few-qubit machines and a thermal environment. While few successes in matching the simplified model parameters for all possible physical systems are known, the qualitative predictions still allow for a general design of quantum machines irrespective of the implementation. We generalise these tools by introducing multiple competing virtual qubits for modelling multi-dimensional systems coupled to larger and more complex machines. By simulating the full physical dynamics for targets with three dimensions, we uncover general properties of reset models that can be used as `dials' to correctly predict the qualitative features of physical changes in a realistic setup and thus design autonomous quantum machines beyond a few qubits. We then present a general analytic solution of the reset model for arbitrary-dimensional systems coupled to multi-qubit machines. Finally, we showcase an improved three-level laser as an exemplary application of our results.

翻訳日:2023-05-03 05:04:55 公開日:2021-11-01

# 量子液体中の高速不純物の動的量子チェレンコフ転移

Dynamical quantum Cherenkov transition of fast impurities in quantum liquids ( http://arxiv.org/abs/2101.00030v2 )

ライセンス: Link先を確認

Kushal Seetharam, Yulia Shchadilova, Fabian Grusdt, Mikhail B. Zvonarev, Eugene Demler

(参考訳) 相互作用する量子多体媒体における移動不純物のダイナミクスを理解するという課題は、環境の不純物と励起状態の間の絡み合いを、幅広いエネルギースケールで含む必要性から生じる。本稿では, ボゴリューボフ励起を流し始めると, 三次元量子ボース流体中に注入される有限質量不純物の運動について検討する。我々は不純物の速度が不純物とボソンの相互作用の強さと不純物の反動エネルギーに依存する臨界値を超えたときの力学の遷移を明らかにする。インジェクション実験では, この2つのレジームは, 不純物速度の非破壊特性だけでなく, ロスシュミットエコー, becで励起された密度リップル, 散乱ボソニック粒子の運動量分布にも有意差が認められた。この遷移は動的量子チェレンコフ効果の顕在化であり、ラムゼー干渉法、RF分光法、吸収イメージング法、飛行時間イメージング法を用いて超低温原子で実験的に観測可能である。

The challenge of understanding the dynamics of a mobile impurity in an interacting quantum many-body medium comes from the necessity of including entanglement between the impurity and excited states of the environment in a wide range of energy scales. In this paper, we investigate the motion of a finite mass impurity injected into a three-dimensional quantum Bose fluid as it starts shedding Bogoliubov excitations. We uncover a transition in the dynamics as the impurity's velocity crosses a critical value which depends on the strength of the interaction between the impurity and bosons as well as the impurity's recoil energy. We find that in injection experiments, the two regimes differ not only in the character of the impurity velocity abatement, but also exhibit qualitative differences in the Loschmidt echo, density ripples excited in the BEC, and momentum distribution of scattered bosonic particles. The transition is a manifestation of a dynamical quantum Cherenkov effect, and should be experimentally observable with ultracold atoms using Ramsey interferometry, RF spectroscopy, absorption imaging, and time-of-flight imaging.

翻訳日:2023-04-18 05:31:39 公開日:2021-11-01

# ネットワークにおけるクリークとキャビティの計算

Computing Cliques and Cavities in Networks ( http://arxiv.org/abs/2101.00536v3 )

ライセンス: Link先を確認

Dinghua Shi, Zhifeng Chen, Xiang Sun, Qinghua Chen, Chuang Ma, Yang Lou and Guanrong Chen

(参考訳) 複雑なネットワークには、ノード、エッジ、三角形などの完全部分グラフが含まれており、これらは異なる次数の単体やクリークと呼ばれる。特に、高次のクリークからなるキャビティは脳機能に重要な役割を果たす。最大クリークの探索はNP完全問題であるため、与えられたネットワークの計算可能性を判定するためにkコア分解を用いる。計算可能なネットワークに対して、異なる次数のクリークを見つけるための実装可能なアルゴリズムを備えた探索法を設計し、オイラー標数も求める。次に、隣接する次数のクリークの境界行列のランクを用いてベッチ数を計算する。さらに、異なる次数のキャビティを求めるための最適化アルゴリズムも設計する。最後に、このアルゴリズムを代表的なデータセットから得たC. elegansの神経ネットワークに適用し、そのすべてのクリークと異なる次数のいくつかのキャビティを見つけ、その構造と機能のさらなる数学的解析と計算の基盤を提供する。

Complex networks contain complete subgraphs such as nodes, edges, triangles, etc., referred to as simplices and cliques of different orders. Notably, cavities consisting of higher-order cliques play an important role in brain functions. Since searching for maximum cliques is an NP-complete problem, we use k-core decomposition to determine the computability of a given network. For a computable network, we design a search method with an implementable algorithm for finding cliques of different orders, obtaining also the Euler characteristic number. Then, we compute the Betti numbers by using the ranks of boundary matrices of adjacent cliques. Furthermore, we design an optimized algorithm for finding cavities of different orders. Finally, we apply the algorithm to the neuronal network of C. elegans with data from one typical dataset, and find all of its cliques and some cavities of different orders, providing a basis for further mathematical analysis and computation of its structure and function.

翻訳日:2023-04-18 00:13:42 公開日:2021-11-01
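
The relation between clique counts, boundary-matrix ranks, Betti numbers, and the Euler characteristic described in the abstract above can be sketched on a toy graph. This is a brute-force illustration, not the authors' search method or their optimized cavity-finding algorithm. The function names and the 4-cycle example are assumptions, and the sketch assumes the graph has no cliques above the chosen `max_order`.

```python
from fractions import Fraction
from itertools import combinations


def cliques_by_order(nodes, edges, max_order):
    """Return {k: list of k-cliques}, where a k-clique has k + 1 mutually adjacent nodes."""
    edge_set = {frozenset(e) for e in edges}
    result = {}
    for k in range(max_order + 1):
        result[k] = [
            cand for cand in combinations(sorted(nodes), k + 1)
            if all(frozenset(pair) in edge_set for pair in combinations(cand, 2))
        ]
    return result


def boundary_matrix(faces, simplices):
    """Signed boundary matrix: rows are (k-1)-cliques, columns are k-cliques."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for i in range(len(s)):
            mat[index[s[:i] + s[i + 1:]]][j] = (-1) ** i
    return mat


def rank(mat):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in mat]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def betti_numbers(cliques):
    """beta_k = n_k - rank(d_k) - rank(d_{k+1}), assuming no cliques above the top order."""
    top = max(cliques)
    ranks = {0: 0, top + 1: 0}
    for k in range(1, top + 1):
        ranks[k] = rank(boundary_matrix(cliques[k - 1], cliques[k]))
    return [len(cliques[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]


# Toy example (an assumption): a 4-cycle, which encloses one 1-dimensional cavity.
square = cliques_by_order(range(4), [(0, 1), (1, 2), (2, 3), (0, 3)], max_order=2)
betti = betti_numbers(square)
euler = sum((-1) ** k * len(c) for k, c in square.items())
print(betti, euler)
```

The alternating sum of clique counts and the alternating sum of Betti numbers agree, which gives a cheap consistency check on any clique enumeration.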

# 自然言語テキストからの因果関係の抽出に関する調査

A Survey on Extraction of Causal Relations from Natural Language Text ( http://arxiv.org/abs/2101.06426v2 )

ライセンス: Link先を確認

Jie Yang, Soyeon Caren Han, Josiah Poon

(参考訳) 人間の認知の重要な要素として、因果関係はテキストに頻繁に現れ、テキストから因果関係を計算することで、予測タスクのための因果関係を構築するのに役立つ。既存の因果抽出技術には、知識ベース、統計機械学習(ML)ベース、深層学習ベースアプローチなどがある。各メソッドには長所と短所がある。例えば、知識ベースのメソッドは理解できるが、広範な手動のドメイン知識が必要であり、ドメイン間の適用性が低い。統計的機械学習手法は自然言語処理(NLP)ツールキットによってより自動化される。しかし、機能工学は労働集約的であり、ツールキットはエラー伝播を引き起こす可能性がある。近年,その強力な表現学習能力と計算資源の急速な増加により,深層学習技術がNLP研究者から注目を集めている。その制限には、高い計算コストと適切な注釈付きトレーニングデータの欠如が含まれる。本稿では,因果抽出に関する総合的な調査を行う。まず, 因果関係抽出における一次形式, 明示的因果関係, 暗黙的因果関係, および相互因果関係について紹介する。次に、因果関係抽出のためのベンチマークデータセットとモデリングアセスメント手法をリストアップする。そこで本研究では,3つの手法を代表システムで概説する。最後に、既存のオープンチャレンジを潜在的な方向性で強調する。

As an essential component of human cognition, cause-effect relations appear frequently in text, and curating cause-effect relations from text helps in building causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each method has its advantages and weaknesses. For example, knowledge-based methods are understandable but require extensive manual domain knowledge and have poor cross-domain applicability. Statistical machine learning methods are more automated because of natural language processing (NLP) toolkits. However, feature engineering is labor-intensive, and toolkits may lead to error propagation. In the past few years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid increase in computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We initially introduce primary forms existing in the causality extraction: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and modeling assessment methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight existing open challenges with their potential directions.

翻訳日:2023-04-15 01:02:37 公開日:2021-11-01

# 条件数を2次的に改善した正定値量子線形系の解法について

On solving classes of positive-definite quantum linear systems with quadratically improved runtime in the condition number ( http://arxiv.org/abs/2101.11868v3 )

ライセンス: Link先を確認

Davide Orsucci, Vedran Dunjko

(参考訳) 量子線形システム(QLS)問題を解く量子アルゴリズムは、計算に難解な微分方程式の解や機械学習の高速化など、近年最も研究されている量子アルゴリズムの一つである。 QLSソルバの効率を左右する基本的なパラメータは係数行列$A$の条件数$\kappa$であり、QLS問題の発端から知られているように、最悪の場合のランタイムは$\kappa$に対して少なくとも線形にスケールする[Harrow, Hassidim and Lloyd, PRL 103, 150502 (2009)]。しかし、正定値行列の場合、古典的なアルゴリズムは$\sqrt{\kappa}$でスケールするランタイムで線形システムを解くことができ、これは不定値の場合と比べて二次的な改善である。したがって、QLSソルバが類似した改善を達成できるかどうかを問うことは自然である。本研究ではこの問いに否定的に答え、$A$が正定値の場合にもQLSの求解には$\kappa$に線形なランタイムが必要であることを示す。次に、この下界を回避できる正定値QLSの広いクラスを特定し、$\kappa$の二次的スピードアップを特徴とする2つの新しい量子アルゴリズムを提示する: 1つは、$A^{-1}$の行列ブロック符号化を効率的に実装することに基づいており、もう1つは、系を前処理するために$A = L L^\dagger$という形式の分解を構成する。これらの方法は広く適用でき、どちらもBQP完全問題を効率的に解くことができる。

Quantum algorithms for solving the Quantum Linear System (QLS) problem are among the most investigated quantum algorithms of recent times, with potential applications including the solution of computationally intractable differential equations and speed-ups in machine learning. A fundamental parameter governing the efficiency of QLS solvers is $\kappa$, the condition number of the coefficient matrix $A$, as it has been known since the inception of the QLS problem that for worst-case instances the runtime scales at least linearly in $\kappa$ [Harrow, Hassidim and Lloyd, PRL 103, 150502 (2009)]. However, for the case of positive-definite matrices classical algorithms can solve linear systems with a runtime scaling as $\sqrt{\kappa}$, a quadratic improvement compared to the indefinite case. It is then natural to ask whether QLS solvers may hold an analogous improvement. In this work we answer the question in the negative, showing that solving a QLS entails a runtime linear in $\kappa$ also when $A$ is positive definite. We then identify broad classes of positive-definite QLS where this lower bound can be circumvented and present two new quantum algorithms featuring a quadratic speed-up in $\kappa$: the first is based on efficiently implementing a matrix-block-encoding of $A^{-1}$, the second constructs a decomposition of the form $A = L L^\dagger$ to precondition the system. These methods are widely applicable and both allow BQP-complete problems to be solved efficiently.

翻訳日:2023-04-13 11:58:17 公開日:2021-11-01
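
The classical $\sqrt{\kappa}$ scaling for positive-definite systems mentioned in the abstract above is the behaviour usually associated with the conjugate gradient method. The following is a minimal textbook sketch of that classical method, included only as background. It is not one of the quantum algorithms from the paper, and the function name, tolerance, and 2x2 test system are assumptions.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A (nested lists)."""
    n = len(b)
    x = [0.0] * n
    r = [float(v) for v in b]   # residual b - A x for the initial guess x = 0
    p = list(r)                 # first search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        # The new direction is A-conjugate to the previous ones.
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x


# Small positive-definite system (an assumption); the exact solution is (1/11, 7/11).
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

The number of iterations needed to reach a fixed accuracy grows roughly with $\sqrt{\kappa}$ for this method, which is the classical benchmark the abstract compares against.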

# ディラックフェルミオンの測定誘起相転移の有効理論

Effective Theory for the Measurement-Induced Phase Transition of Dirac Fermions ( http://arxiv.org/abs/2102.08381v5 )

ライセンス: Link先を確認

M. Buchhold, Y. Minoguchi, A. Altland, S. Diehl

(参考訳) 測定に曝露された波動関数は、決定論的ユニタリおよび確率論的測定によって引き起こされる状態の更新を伴う純粋状態ダイナミクスを実行し、量子軌道を定義する。多くの粒子系では、これらの異なる動力学要素の競合は量子相転移に似たシナリオを引き起こす。単一量子軌道のランダム性に拘わらず、それにアクセスするために、軌道プロジェクタの$n$-次モーメントのアンサンブル平均に対して$n$-replica Keldysh場理論を構築する。鍵となる発見は、この場の理論が無期限に加熱される1組の自由度に分解され、残りの$n-1$個は有効な非エルミートハミルトニアンによって生成される純粋状態発展の形式に書き直せることである。この分離は自由理論では厳密であり、相互作用理論でも有用である。特に局所的に測定されたディラックフェルミオンを$(1+1)$次元で研究する。これは長波長では、観測下にある相互作用するラッティンジャー液体にボソン化できる。このモデルでは、非エルミートハミルトニアンは複素係数を持つ量子サイン・ゴルドンモデルに対応する。繰り込み群解析により、対数的なエンタングルメントエントロピー成長を伴うギャップレス臨界相と、ギャップのある面積則相が明らかになり、両者はベレジンスキー–コステリッツ–サウレス転移によって隔てられる。ここで現われる物理的描像は、観測速度を増加させると、軌道波動関数が測定演算子の固有状態にピン留めされるというものである。

A wave function exposed to measurements undergoes pure state dynamics, with deterministic unitary and probabilistic measurement induced state updates, defining a quantum trajectory. For many-particle systems, the competition of these different elements of dynamics can give rise to a scenario similar to quantum phase transitions. To access it despite the randomness of single quantum trajectories, we construct an $n$-replica Keldysh field theory for the ensemble average of the $n$-th moment of the trajectory projector. A key finding is that this field theory decouples into one set of degrees of freedom that heats up indefinitely, while $n-1$ others can be cast into the form of pure state evolutions generated by an effective non-Hermitian Hamiltonian. This decoupling is exact for free theories, and useful for interacting ones. In particular, we study locally measured Dirac fermions in $(1+1)$ dimensions, which can be bosonized to a monitored interacting Luttinger liquid at long wavelengths. For this model, the non-Hermitian Hamiltonian corresponds to a quantum Sine-Gordon model with complex coefficients. A renormalization group analysis reveals a gapless critical phase with logarithmic entanglement entropy growth, and a gapped area law phase, separated by a Berezinskii-Kosterlitz-Thouless transition. The physical picture emerging here is a pinning of the trajectory wave function into eigenstates of the measurement operators upon increasing the monitoring rate.

翻訳日:2023-04-11 00:14:12 公開日:2021-11-01

# 非自明な交換則を持つヨルダン・ウィグナー変換と量子ビット

Jordan-Wigner transformation and qubits with nontrivial exchange rule ( http://arxiv.org/abs/2103.04629v3 )

ライセンス: Link先を確認

Alexander Yu. Vlasov

(参考訳) 有名な(スピンのない)フェルミオン量子ビットは、通常のフェルミオンと比較してより微妙な考察を必要とするかもしれない。局所フェルミオンモードのモデルを考慮すると、公式には「占有」状態 |1> のみが粒子交換に関して反対称性に関係するが、「真空」状態 |0> は関係しない。このような「配置」によってインデックス付けされたフェルミオン量子ビットに対する交換規則の導入は、一般的な超選択原理により疑わしいように見える。しかし、そのような「超指数」量子ビットの一貫した代数的構成が本研究で提示される。考えられる手法は超空間の構成と何らかの関係があるが、量子ビットモデルの一般化に用いられる超対称性の標準定義といくつかの違いがある。

Well-known (spinless) fermionic qubits may need more subtle consideration in comparison with usual (spinful) fermions. Taking into account a model with local fermionic modes, formally only the 'occupied' states |1> could be relevant for antisymmetry with respect to particles interchange, but 'vacuum' state |0> is not. Introduction of exchange rule for such fermionic qubits indexed by some 'positions' may look questionable due to general super-selection principle. However, a consistent algebraic construction of such 'super-indexed' qubits is presented in this work. Considered method has some relation with construction of super-spaces, but it has some differences with standard definition of supersymmetry sometimes used for generalizations of qubit model.

翻訳日:2023-04-08 18:22:21 公開日:2021-11-01
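
For orientation, the sketch below builds the standard textbook Jordan-Wigner operators for two modes and checks the canonical anticommutation relations. It is not the paper's 'super-indexed' construction; the helper names (`kron`, `matmul`, `dagger`, `anticommutator`) and the basis convention |0> = (1, 0), |1> = (0, 1) are assumptions made only for this illustration.

```python
# Standard two-mode Jordan-Wigner construction (textbook form, NOT the
# paper's "super-indexed" variant). Entries are integers, so checks are exact.

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    """Plain matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Hermitian conjugate; a transpose suffices for real matrices."""
    return [list(row) for row in zip(*a)]

def anticommutator(a, b):
    """Return ab + ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(ab))]
            for i in range(len(ab))]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]  # maps |1> to |0> (assumed basis convention)

a1 = kron(LOWER, I2)  # mode 1: no string operator needed
a2 = kron(Z, LOWER)   # mode 2: the Z on mode 1 supplies the fermionic sign

ZERO4 = [[0] * 4 for _ in range(4)]
ID4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
```

Without the `Z` factor in `a2`, the two lowering operators would commute rather than anticommute, which is the sign bookkeeping that the exchange rule for qubits has to reproduce.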

# フォトニック量子コンピュータ上での強相関多ボソン波動関数の符号化:引力型ボース・ハバードモデルへの応用

Encoding strongly-correlated many-boson wavefunctions on a photonic quantum computer: application to the attractive Bose-Hubbard model ( http://arxiv.org/abs/2103.15021v4 )

ライセンス: Link先を確認

Saad Yalouz, Bruno Senjean, Filippo Miatto, Vedran Dunjko

(参考訳) 変分量子アルゴリズム(VQA)は、特に短期的に利用可能なデバイスの観点から、複雑な強い相関の量子多体系の特性を決定する最も有望な方法の一つであると考えられている。この文脈において、多体波動関数を符号化する効率的な量子回路の開発は、VQAの成功の鍵の一つである。フェルミオン系の固有状態をエンコードする現在の量子デバイスの可能性の研究に多大な努力が払われているが、ボソニック系のエンコーディングについてはほとんど知られていない。本研究では,連続変数(cv)フォトニック系量子回路を用いた(単純だがリッチな)bose-hubbardモデルの基底状態の符号化について検討する。 2つの異なるアンザッツアーキテクチャを導入し、提案した連続可変量子回路が(99%以上の忠実度を持つ)強相関多ボソン波動関数をわずか数層で効率的にエンコードできることを実証した。多数のボソン系の基底状態を近似するアンザッツの適合性の研究の他に、変分量子固有解法アルゴリズムにおけるアンザッツの使用の初期評価を行い、エネルギー最小化による発見を行う。この目的のために,実験系におけるハミルトンエネルギーの測定手法を導入し,サンプリングノイズの影響について検討する。

Variational quantum algorithms (VQA) are considered as some of the most promising methods to determine the properties of complex strongly correlated quantum many-body systems, especially from the perspective of devices available in the near term. In this context, the development of efficient quantum circuit ansatze to encode a many-body wavefunction is one of the keys for the success of a VQA. Great efforts have been invested to study the potential of current quantum devices to encode the eigenstates of fermionic systems, but little is known about the encoding of bosonic systems. In this work, we investigate the encoding of the ground state of the (simple but rich) attractive Bose-Hubbard model using a Continuous-Variable (CV) photonic-based quantum circuit. We introduce two different ansatz architectures and demonstrate that the proposed continuous variable quantum circuits can efficiently encode (with a fidelity higher than 99%) the strongly correlated many-boson wavefunction with just a few layers, in all many-body regimes and for different number of bosons and initial states. Beyond the study of the suitability of the ansatz to approximate the ground states of many-boson systems, we also perform initial evaluations of the use of the ansatz in a variational quantum eigensolver algorithm to find it through energy minimization. To this end we also introduce a scheme to measure the Hamiltonian energy in an experimental system, and study the effect of sampling noise.

翻訳日:2023-04-06 08:10:37 公開日:2021-11-01

# 長距離キタエフ鎖におけるハミルトンパラメータ推定における超ハイゼンベルクスケーリング

Super-Heisenberg scaling in Hamiltonian parameter estimation in the long-range Kitaev chain ( http://arxiv.org/abs/2104.07120v2 )

ライセンス: Link先を確認

Jing Yang, Shengshi Pang, Adolfo del Campo and Andrew N. Jordan

(参考訳) 量子力学において、非線形多体相互作用はハイゼンベルクスケーリングを超えるためにハミルトンパラメータ推定の精度を高めることができる。本稿では,長距離相互作用を持つ線形系における相互作用強度の推定と,キタエフ鎖を用いたケーススタディとして,相互作用範囲を変化させた量子フィッシャー情報におけるハイゼンベルクからスーパーハイゼンベルクへの遷移について考察する。さらに,量子制御により,量子フィッシャー情報の事前因子が向上することを示す。本研究は,多体量子メソロジーにおける最適量子制御と長距離相互作用の利点を探求する。

In quantum metrology, nonlinear many-body interactions can enhance the precision of Hamiltonian parameter estimation to surpass the Heisenberg scaling. Here, we consider the estimation of the interaction strength in linear systems with long-range interactions and using the Kitaev chains as a case study, we establish a transition from the Heisenberg to super-Heisenberg scaling in the quantum Fisher information by varying the interaction range. We further show that quantum control can improve the prefactor of the quantum Fisher information. Our results explore the advantage of optimal quantum control and long-range interactions in many-body quantum metrology.

翻訳日:2023-04-03 20:54:17 公開日:2021-11-01

# 非平衡定常状態におけるエンタングルメント測度:一次元における厳密な結果

Entanglement Measures in a Nonequilibrium Steady State: Exact Results in One Dimension ( http://arxiv.org/abs/2105.00740v2 )

ライセンス: Link先を確認

Shachar Fraenkel, Moshe Goldstein

(参考訳) エンタングルメントは、凝縮系多体系の研究において顕著な役割を果たす: エンタングルメント測度は、量子情報プロトコルにおけるこれらの系の利用可能性を定量化するだけでなく、その物理にも光を当てる。しかし、特に平衡から外れた系では、厳密な解析的結果は依然として少ない。本研究では、中心近傍に任意の散乱領域を有する一様な強束縛鎖からなる典型的な一次元フェルミオン系が、絶対零度で直流バイアス電圧を受ける場合を検討する。したがって、系は電流が流れる非平衡定常状態に保たれるが、それでも純粋な量子状態によって記述することができる。フィッシャー・ハートウィッグ予想の一般化を用いて、サブシステムとその補集合との間の二部エンタングルメントエントロピーの厳密な計算を行い、サブシステムの長さに対するエンタングルメントのスケーリングが、体積則の線形項と対数項の両方を含む極めて珍しいものであることを示す。線形項は散乱による不完全な透過と関連しており、レヴィトフ・レソヴィックの完全計数統計公式の一般化を与える。対数項は分布関数におけるフェルミ不連続性から生じる。また、粒子数分解エンタングルメントの厳密な式も得られる。主要次数ではエンタングルメントの等分配が成り立つが、それを破る最初の項はサブシステムのサイズとともに増大し、これは従来研究されていた系では観察されなかった新しい挙動である。我々は、一般的な結果を単一不純物サイトを有する強束縛鎖の具体的モデルに適用し、解析式が数値計算とよく一致していることを示す。解析結果は、複数の散乱領域がある場合に対応するようさらに一般化される。

Entanglement plays a prominent role in the study of condensed matter many-body systems: Entanglement measures not only quantify the possible use of these systems in quantum information protocols, but also shed light on their physics. However, exact analytical results remain scarce, especially for systems out of equilibrium. In this work we examine a paradigmatic one-dimensional fermionic system that consists of a uniform tight-binding chain with an arbitrary scattering region near its center, which is subject to a DC bias voltage at zero temperature. The system is thus held in a current-carrying nonequilibrium steady state, which can nevertheless be described by a pure quantum state. Using a generalization of the Fisher-Hartwig conjecture, we present an exact calculation of the bipartite entanglement entropy of a subsystem with its complement, and show that the scaling of entanglement with the length of the subsystem is highly unusual, containing both a volume-law linear term and a logarithmic term. The linear term is related to imperfect transmission due to scattering, and provides a generalization of the Levitov-Lesovik full counting statistics formula. The logarithmic term arises from the Fermi discontinuities in the distribution function. Our analysis also produces an exact expression for the particle-number-resolved entanglement. We find that although to leading order entanglement equipartition applies, the first term breaking it grows with the size of the subsystem, a novel behavior not observed in previously studied systems. We apply our general results to a concrete model of a tight-binding chain with a single impurity site, and show that the analytical expressions are in good agreement with numerical calculations. The analytical results are further generalized to accommodate the case of multiple scattering regions.

翻訳日:2023-04-01 17:59:43 公開日:2021-11-01

# フェルミオン格子モデルにおける多体スカー状態に対する群論的アプローチ

Group theoretic approach to many-body scar states in fermionic lattice models ( http://arxiv.org/abs/2106.10300v3 )

ライセンス: Link先を確認

Kiryl Pakrouski, Preethi N. Pallegar, Fedor K. Popov, Igor R. Klebanov

(参考訳) 先行研究 [arXiv:2007.00845] では、高対称状態の3つの族が、$H_0+OT$という形の任意のスピン-1/2フェルミオンハミルトニアンに対して多体スカーであることが示された。ここで$T$は適切なリー群の生成子である。これらの族の1つは有名な$\eta$-ペアリング状態で構成されている。スカーの通常の特性に加えて、これらの状態の族は電磁ノイズの影響を受けにくく、量子情報の保存と処理に有利である。本稿では、ハバード相互作用やハイゼンベルク相互作用、およびそれらを含むハミルトニアンなど、多くのよく知られた結合項が要求される形式であり、微調整を伴わずにこれらの状態をスカーとして支持することを示す。トポロジカルなモデルを含む、よく使われる多くのモデルに対する明示的な$H_0+OT$分解が提供される。実験的な実装を容易にするため、これらのモデルの低エネルギー部分空間がスカーのみから構成される条件について議論する。さらに、スカーを持つ新しいモデルを設計するためのビルディングブロックとして使用できる生成子$T$をすべて書き下す。最も興味深いものとして、スピン軌道結合ホッピング項と超伝導ペアリング項が含まれる。このフレームワークを非エルミートな開放系にも拡張し、スカー部分空間がコヒーレントな時間発展を続け、"復活"を示すことを実証する。拡張された2D $tJU$モデルの完全な数値的研究は、不変スカーの新規な性質を明確に示し、我々の発見を支持する。

It has been shown [arXiv:2007.00845] that three families of highly symmetric states are many-body scars for any spin-1/2 fermionic Hamiltonian of the form $H_0+OT$, where $T$ is a generator of an appropriate Lie group. One of these families consists of the well-known $\eta$-pairing states. In addition to having the usual properties of scars, these families of states are insensitive to electromagnetic noise and have advantages for storing and processing quantum information. In this paper we show that a number of well-known coupling terms, such as the Hubbard and the Heisenberg interactions, and the Hamiltonians containing them, are of the required form and support these states as scars without fine-tuning. The explicit $H_0+OT$ decomposition for a number of most commonly used models, including topological ones, is provided. To facilitate possible experimental implementations, we discuss the conditions for the low-energy subspace of these models to be comprised solely of scars. Further, we write down all the generators $T$ that can be used as building blocks for designing new models with scars, most interestingly including the spin-orbit coupled hopping and superconducting pairing terms. We expand this framework to the non-Hermitian open systems and demonstrate that for them the scar subspace continues to undergo coherent time evolution and exhibit the "revivals". A full numerical study of an extended 2D $tJU$ model explicitly illustrates the novel properties of the invariant scars and supports our findings.

翻訳日:2023-03-26 08:05:57 公開日:2021-11-01

# コプレーナマイクロ波共振器の近接場テラヘルツナノスコピー

Near-Field Terahertz Nanoscopy of Coplanar Microwave Resonators ( http://arxiv.org/abs/2106.12907v2 )

ライセンス: Link先を確認

Xiao Guo, Xin He, Zach Degnan, Bogdan C. Donose, Karl Bertling, Arkady Fedorov, Aleksandar D. Rakić, Peter Jacobson

(参考訳) 超伝導量子回路は主要な量子コンピューティングプラットフォームの一つである。超伝導量子コンピューティングを実用上重要な点に進めるためには、デコヒーレンスにつながる物質不完全性を特定し、対処することが重要である。ここでは、超伝導量子プロセッサの最も特徴的な構成要素の一つであるシリコン上の湿式エッチングアルミニウム共振器の局所誘電特性とキャリア濃度を探索するためにテラヘルツ走査近接場光学顕微鏡(SNOM)を用いる。近年開発されたベクトルキャリブレーション法を用いてマイクロ波フィードライン近傍の分光からTHz誘電率を抽出する。抽出された誘電率をdrudeモデルに適合させることにより,エッチングチャネル内のシリコンはバッファオキシドエッチングシリコンよりもキャリア濃度が高いことを見出し,キャリア濃度を低減するための後処理法を検討する。その結果,近接場thz調査は量子デバイスにおける電位損失チャネルの定量的評価と同定に応用できることがわかった。

Superconducting quantum circuits are one of the leading quantum computing platforms. To advance superconducting quantum computing to a point of practical importance, it is critical to identify and address material imperfections that lead to decoherence. Here, we use terahertz Scanning Near-field Optical Microscopy (SNOM) to probe the local dielectric properties and carrier concentrations of wet-etched aluminum resonators on silicon, one of the most characteristic components of the superconducting quantum processors. Using a recently developed vector calibration technique, we extract the THz permittivity from spectroscopy in proximity to the microwave feedline. Fitting the extracted permittivity to the Drude model, we find that silicon in the etched channel has a carrier concentration greater than buffer oxide etched silicon and we explore post-processing methods to reduce the carrier concentrations. Our results show that near-field THz investigations can be applied to quantitatively evaluate and identify potential loss channels in quantum devices.

翻訳日:2023-03-25 16:20:35 公開日:2021-11-01

# 勾配を用いたヘブ学習--現代ディープラーニングフレームワークを用いたヘブ型畳み込みニューラルネットワーク

Hebbian learning with gradients: Hebbian convolutional neural networks with modern deep learning frameworks ( http://arxiv.org/abs/2107.01729v2 )

ライセンス: Link先を確認

Thomas Miconi

(参考訳) 深層学習ネットワークは一般に非生物学的学習法を用いる。対照的に、Hebbian Learningのような生物学的にもっとも有効な学習に基づくネットワークは、比較的性能が劣り、実装の難しさを示している。ここでは,階層的,畳み込み型ニューラルネットワークにおけるヘビアン学習が,最新のディープラーニングフレームワークとほぼ簡単に実装可能であることを示す。勾配が平ヘビアン規則(dw ~= xy)、グロスベルクの星内規則(dw ~= y(x-w))、オジャの規則(dw ~= y(x-yw))を正確に実装した式を提供する。アプリケーションとして,オブジェクト認識のためのヘビアン畳み込み多層ネットワークを構築する。このようなネットワークの上位層は大規模で単純な特徴(ガボライクなフィルタやブロブ)を学習する傾向にあり,従来報告されていたデコード性能の低下が説明できる。この傾向に対処するために、我々は、学習された特徴をスパーサーし、性能を大幅に向上させ、情報が連続する層を越えて増加するようにするための介入(可塑性の少ないデンサー活性化、層間の接続の刈り込み)を導入する。我々は、より高度な技術(動的刺激、トレース学習、フィードバック接続など)と、現代のディープラーニングフレームワークが提供する膨大な計算能力によって、多層ヘビーネットワークのパフォーマンスと生物学的関連性が大幅に向上すると仮定する。

Deep learning networks generally use non-biological learning methods. By contrast, networks based on more biologically plausible learning, such as Hebbian learning, show comparatively poor performance and difficulties of implementation. Here we show that Hebbian learning in hierarchical, convolutional neural networks can be implemented almost trivially with modern deep learning frameworks, by using specific losses whose gradients produce exactly the desired Hebbian updates. We provide expressions whose gradients exactly implement a plain Hebbian rule (dw ~= xy), Grossberg's instar rule (dw ~= y(x-w)), and Oja's rule (dw ~= y(x-yw)). As an application, we build Hebbian convolutional multi-layer networks for object recognition. We observe that higher layers of such networks tend to learn large, simple features (Gabor-like filters and blobs), explaining the previously reported decrease in decoding performance over successive layers. To combat this tendency, we introduce interventions (denser activations with sparse plasticity, pruning of connections between layers) which result in sparser learned features, massively increase performance, and allow information to increase over successive layers. We hypothesize that more advanced techniques (dynamic stimuli, trace learning, feedback connections, etc.), together with the massive computational boost offered by modern deep learning frameworks, could greatly improve the performance and biological relevance of multi-layer Hebbian networks.

翻訳日:2023-03-23 11:21:25 公開日:2021-11-01
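
The abstract's central claim, that suitable losses have gradients equal to Hebbian updates, can be checked on a single linear unit. The sketch below is a minimal illustration and not the author's implementation: the surrogate losses `hebb_loss`, `instar_loss` and `oja_loss` are assumed forms chosen so that gradient descent reproduces each rule, the output `y` is frozen as a constant (the role that detaching or stopping the gradient would play in a deep learning framework), and a finite-difference gradient stands in for automatic differentiation.

```python
# Minimal sketch: surrogate losses whose negative gradient w.r.t. w equals
# the plain Hebbian, instar and Oja updates for one linear unit y = w . x.
# The loss forms and all names here are illustrative assumptions.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def num_grad(loss, w, eps=1e-6):
    """Central finite-difference gradient of loss(w) for a list w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grad.append((loss(w_plus) - loss(w_minus)) / (2 * eps))
    return grad

x = [0.5, -1.0, 2.0]   # presynaptic input
w = [0.1, 0.3, -0.2]   # current weights
y = dot(w, x)          # postsynaptic output, held constant inside the losses

def hebb_loss(v):      # negative gradient: y * x
    return -y * dot(v, x)

def instar_loss(v):    # negative gradient: y * (x - v)
    return 0.5 * y * sum((xi - vi) ** 2 for xi, vi in zip(x, v))

def oja_loss(v):       # negative gradient: y * (x - y * v)
    return -y * dot(v, x) + 0.5 * y * y * dot(v, v)

# Gradient-descent directions (negative gradients) at the current weights.
hebb_update = [-g for g in num_grad(hebb_loss, w)]
instar_update = [-g for g in num_grad(instar_loss, w)]
oja_update = [-g for g in num_grad(oja_loss, w)]

# Closed-form Hebbian rules quoted in the abstract.
hebb_target = [y * xi for xi in x]
instar_target = [y * (xi - wi) for xi, wi in zip(x, w)]
oja_target = [y * (xi - y * wi) for xi, wi in zip(x, w)]
```

Freezing `y` matters: if the loss were differentiated through `y = w . x` as well, extra terms would appear and the gradient would no longer match the local Hebbian form.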

# 部分空間が完全または真に絡み合うための単純な十分条件

Simple sufficient condition for subspace to be completely or genuinely entangled ( http://arxiv.org/abs/2107.07530v2 )

ライセンス: Link先を確認

Maciej Demianowicz, Grzegorz Rajchel-Mieldzioć, and Remigiusz Augusiak

(参考訳) 単純十分条件を導入することで、二成分あるいは多成分ヒルベルト空間の部分空間が絡み合っているかどうかを判断することができる。我々の基準の主な要素は、幾何学的エンタングルメントの測度で表される部分空間にまたがるベクトルの絡み合いの観点から、部分空間の最小エンタングルメント上の境界である。基準は完全かつ真に絡み合った部分空間にも適用できる。いくつかの重要なシナリオでその有用性を探る。さらに、この条件から直接従う混合状態の絡み合い基準を述べる。補助的な結果として、$d$-レベルのディック状態の絡み合いの一般化幾何測度の公式を提供する。

We introduce a simple sufficient criterion, which allows one to tell whether a subspace of a bipartite or multipartite Hilbert space is entangled. The main ingredient of our criterion is a bound on the minimal entanglement of a subspace in terms of entanglement of vectors spanning that subspace expressed for geometrical measures of entanglement. The criterion is applicable to both completely and genuinely entangled subspaces. We explore its usefulness in several important scenarios. Further, an entanglement criterion for mixed states following directly from the condition is stated. As an auxiliary result we provide a formula for the generalized geometric measure of entanglement of the $d$--level Dicke states.

翻訳日:2023-03-22 05:15:23 公開日:2021-11-01

# 量子バッテリの抽出可能な仕事のゆらぎと帯電力の限界

Fluctuations in Extractable Work and Bounds on the Charging Power of Quantum Batteries ( http://arxiv.org/abs/2107.08620v2 )

ライセンス: Link先を確認

Shang-Yung Wang

(参考訳) 自由エネルギー作用素のゆらぎが量子電池の充電電力を束縛しているという主張に対する最近の意見の相違により、我々は元の導出を批判的に分析する。この分析は、上記の主張が閉系力学と開系力学の両方に当てはまらないことを示している。本結果は,充電用量子電池の作業内容に対して,自由エネルギー演算子は一貫した定量演算子ではないことを示す。

Motivated by a recent disagreement about the claim that fluctuations in the free energy operator bound the charging power of a quantum battery, we present a critical analysis of the original derivation. The analysis shows that the above claim does not hold for both closed- and open-system dynamics. Our results indicate that the free energy operator is not a consistent quantifying operator for the work content of a charging quantum battery.

翻訳日:2023-03-21 21:26:30 公開日:2021-11-01

# Aharonov-Bohm熱機関の性能向上のための非線形領域

Non-linear regime for enhanced performance of an Aharonov-Bohm heat engine ( http://arxiv.org/abs/2107.13222v2 )

ライセンス: Link先を確認

Géraldine Haack and Francesco Giazotto

(参考訳) ナノスケールにおける熱輸送と量子熱力学は、近年、特に量子技術の文脈において注目を集めている。量子技術に関する実験は非線形状態で行われることが期待される。本研究では,Aharonov-Bohm(AB)干渉計を熱機関として動作させるための線形応答系に基づく以前の結果に基づいて構築する。非線形系では、このメソスコピック量子機械が達成できるチューナビリティ、大きな効率性、熱力を示し、このab環が完全量子系において効率的な熱機械を開発するためのエキサイティングな視点を明らかにした。

Thermal transport and quantum thermodynamics at the nanoscale is nowadays garnering an increasing attention, in particular in the context of quantum technologies. Experiments relevant for quantum technology are expected to be performed in the non-linear regime. In this work, we build on previous results derived in the linear response regime for the performance of an Aharonov-Bohm (AB) interferometer operated as heat engine. In the non-linear regime, we demonstrate the tunability, large efficiency and thermopower that this mesoscopic quantum machine can achieve, confirming the exciting perspectives that this AB ring offers for developing efficient thermal machines in the fully quantum regime.

翻訳日:2023-03-20 17:19:49 公開日:2021-11-01

# 境界駆動自由フェルミオン鎖の待ち時間統計

Waiting-times statistics in boundary driven free fermion chains ( http://arxiv.org/abs/2108.11850v3 )

ライセンス: Link先を確認

Gabriel T. Landi

(参考訳) 両端のリンドブラッド浴に結合した量子鎖の待ち時間分布(WTD)について検討した。我々の焦点は自由フェルミオン鎖であり、そこでは単粒子行列の項で閉形式表現を導き、任意の大きさの鎖を研究できる。その際、非エルミート的プロパゲータを含む2点相関関数の公式も導出する。

We study the waiting-time distributions (WTDs) of quantum chains coupled to two Lindblad baths at each end. Our focus is on free fermion chains, where we derive closed-form expressions in terms of single-particle matrices, allowing one to study arbitrarily large chain sizes. In doing so, we also derive formulas for 2-point correlation functions involving non-Hermitian propagators.

翻訳日:2023-03-17 03:07:23 公開日:2021-11-01

# 量子誤差緩和のための実践的枠組み

A Practical Framework for Quantum Error Mitigation ( http://arxiv.org/abs/2110.05389v2 )

ライセンス: Link先を確認

Zhenyu Cai

(参考訳) 量子エラー軽減は、近い将来、量子機械の実用化において重要な役割を果たすことが期待されている。したがって、多くの量子エラー緩和スキームをコヒーレントなフレームワークの下で提案し、その基礎となる接続を強調し、実用的性能のガイダンスを提供することが重要である。本稿では,現在最先端の量子エラー緩和方式のほとんどを含む線形量子エラー緩和という一般的なフレームワークを構築する。フレームワーク内では、量子エラー緩和は、ノイズ状態からエラー緩和状態を抽出するものと見なすことができる。抽出率と呼ばれる抽出に成功した誤り軽減状態の割合は、与えられた緩和スキームのコスト効果を示す。この枠組みを用いることで,様々な緩和手法における忠実度向上,サンプリングオーバーヘッド,抽出率の導出と比較が可能となる。フレームワークによって提供される構造、洞察、直観は、新しいスキームのさらなる発展の基盤となりうる。

Quantum error mitigation is expected to play a crucial role in the practical applications of quantum machines for the foreseeable future. Thus it is important to put the numerous quantum error mitigation schemes proposed under a coherent framework that can highlight their underlying connections while providing guidance for their practical performance. In this article, we construct a general framework named linear quantum error mitigation that includes most of the state-of-the-art quantum error mitigation schemes. Within the framework, quantum error mitigation can be effectively viewed as extracting the error-mitigated state out of the noisy state. The fraction of error-mitigated state that is successfully extracted, called extraction rate, will indicate the cost-effectiveness of the given mitigation scheme. Using the framework, we can derive and compare the fidelity boost, sampling overhead and extraction rate across various mitigation schemes under practical assumptions. The structure, insights and intuitions provided by the framework can serve as a basis for further developments of new schemes.

翻訳日:2023-03-11 19:09:37 公開日:2021-11-01

# 可観測物の不確かさ領域と状態非依存の不確かさ関係

Uncertainty regions of observables and state-independent uncertainty relations ( http://arxiv.org/abs/2110.14134v2 )

ライセンス: Link先を確認

Lin Zhang and Shunlong Luo and Shao-Ming Fei and Junde Wu

(参考訳) 可観測物の分散や偏差の和に対する最適状態独立な下界は、不確実な制限状態に達する実験の増加にとって重要である。本稿では、一様ハール測度から導かれるランダム量子状態における2つ以上の量子観測可能な量子状態のタプルによって形成される不確かさ領域を決定することにより、分散や偏差の密接な不確かさ関係を計算するための枠組みを提案する。これらの不確かさ領域の分析式から, 2, 3, 任意の観測値の差分あるいは偏差の和で満たされる状態非依存の不確かさの不等式を示し, 両部類および三部体系において, 実験的に交絡検出基準を導出した。

The optimal state-independent lower bounds for the sum of variances or deviations of observables are of significance for the growing number of experiments that reach the uncertainty limited regime. We present a framework for computing the tight uncertainty relations of variance or deviation via determining the uncertainty regions, which are formed by the tuples of two or more of quantum observables in random quantum states induced from the uniform Haar measure on the purified states. From the analytical formulae of these uncertainty regions, we present state-independent uncertainty inequalities satisfied by the sum of variances or deviations of two, three and arbitrarily many observables, from which experimentally friendly entanglement detection criteria are derived for bipartite and tripartite systems.

翻訳日:2023-03-10 03:33:50 公開日:2021-11-01

# 文脈性、微調整、目的論的説明

Contextuality, Fine-Tuning, and Teleological Explanation ( http://arxiv.org/abs/2110.15898v2 )

ライセンス: Link先を確認

Emily Adlam

(参考訳) 私は、文脈性に問題があるという直感の源泉として様々な提案を評価し、究極的には、文脈性は微調整の観点から考えるのが最適である、と結論づける。量子力学の他の微調整問題と同様に、この振る舞いは物理学の遠隔的特徴の顕在化として理解することができる。最後に、文脈分析に用いられてきたいくつかの形式的な数学的枠組みについて論じ、それらの結果が科学リアリストによってどのように解釈されるべきかを考察する。この議論の過程で、私はいくつかの新しい数学的結果を得た。私は、準備の文脈性は微調整の一形態であることを示し、測定の文脈性は、閉因果ループを禁ずるグローバルな制約に訴えることで説明できることを示し、また、古典的な存在論的モデルから負の確率が、疫学的な制約とともに生じることを実証する。

I assess various proposals for the source of the intuition that there is something problematic about contextuality, ultimately concluding that contextuality is best thought of in terms of fine-tuning. I then argue that as with other fine-tuning problems in quantum mechanics, this behaviour can be understood as a manifestation of teleological features of physics. Finally I discuss several formal mathematical frameworks that have been used to analyse contextuality and consider how their results should be interpreted by scientific realists. In the course of this discussion I obtain several new mathematical results - I demonstrate that preparation contextuality is a form of fine-tuning, I show that measurement contextuality can be explained by appeal to a global constraint forbidding closed causal loops, and I demonstrate how negative probabilities can arise from a classical ontological model together with an epistemic restriction.

翻訳日:2023-03-09 22:51:06 公開日:2021-11-01

# パス干渉による超伝導量子ビット状態読み出しの改善

Improved superconducting qubit state readout by path interference ( http://arxiv.org/abs/2111.00736v1 )

ライセンス: Link先を確認

Zhiling Wang, Zenghui Bao, Yukai Wu, Yan Li, Cheng Ma, Tianqi Cai, Yipu Song, Hongyi Zhang, Luming Duan

(参考訳) 高忠実なシングルショット量子ビット状態の読み出しは多くの量子情報処理プロトコルに必須である。超伝導量子回路では、量子ビット状態は通常、透過または反射からマイクロ波空洞の分散周波数シフトを検出することによって決定される。本稿では,伝送信号と反射信号の間の構成的干渉を利用して,量子状態の読み出しを最適化し,より解決された状態の識別と量子状態の読み出し精度の向上を実証する。簡便かつ便利な手法として、空洞光子状態の識別に基づく他のクビット状態読み出し手法と組み合わせることで、クビット状態読み出しをさらに改善することができる。

High fidelity single shot qubit state readout is essential for many quantum information processing protocols. In superconducting quantum circuit, the qubit state is usually determined by detecting the dispersive frequency shift of a microwave cavity from either transmission or reflection. In this paper, we demonstrate the use of constructive interference between the transmitted and reflected signal to optimize the qubit state readout, with which we find a better resolved state discrimination and an improved qubit readout fidelity. As a simple and convenient approach, our scheme can be combined with other qubit readout methods based on the discrimination of cavity photon states to further improve the qubit state readout.

翻訳日:2023-03-09 17:15:04 公開日:2021-11-01

# 誤情報曝露後のtwitterユーザーの行動変化の分析

Analyzing Behavioral Changes of Twitter Users After Exposure to Misinformation ( http://arxiv.org/abs/2111.00700v1 )

ライセンス: Link先を確認

Yichen Wang, Richard Han, Tamara Lehman, Qin Lv, and Shivakant Mishra

(参考訳) 近年、ソーシャルメディアプラットフォームは誤情報を広めるために利用されてきた。オンライン誤報はユーザの信念に影響を与え、偏光のような社会的影響に結びついている。本研究は,誤報が特定のユーザの行動に与える影響に着目し,誤報に晒された後,一般のTwitterユーザが行動を変えたかどうかを理解することを目的とする。露出したユーザーの前後の行動を比較して、投稿したツイートの頻度やツイートの感情に重大な変化が生じたかどうかを判断する。以上の結果から,利用者の行動に統計学的に有意な変化がみられた。言語距離分析により,露光前の露出ユーザとベースラインユーザとの間には,すでに違いが見られた。また,マルチ露光群と極端変化群という2つの特定のユーザグループの特徴について検討した。最後に,誤報ツイートの出現後のユーザの行動の変化が,フォロワー数や投稿者のフォロワー数によって異なるかどうかを調査し,その行動変化がすべて類似していることを示す。

Social media platforms have been exploited to disseminate misinformation in recent years. The widespread online misinformation has been shown to affect users' beliefs and is connected to social impact such as polarization. In this work, we focus on misinformation's impact on specific user behavior and aim to understand whether general Twitter users changed their behavior after being exposed to misinformation. We compare the before and after behavior of exposed users to determine whether the frequency of the tweets they posted, or the sentiment of their tweets underwent any significant change. Our results indicate that users overall exhibited statistically significant changes in behavior across some of these metrics. Through language distance analysis, we show that exposed users were already different from baseline users before the exposure. We also study the characteristics of two specific user groups, multi-exposure and extreme change groups, which were potentially highly impacted. Finally, we study if the changes in the behavior of the users after exposure to misinformation tweets vary based on the number of their followers or the number of followers of the tweet authors, and find that their behavioral changes are all similar.

翻訳日:2023-03-09 17:14:34 公開日:2021-11-01

# 打ち切りノイマン級数による量子誤差の緩和

Mitigating Quantum Errors via Truncated Neumann Series ( http://arxiv.org/abs/2111.00691v1 )

ライセンス: Link先を確認

Kun Wang, Yu-Ao Chen, and Xin Wang

(参考訳) 量子ゲートと量子ハードウェア上の測定は、必然的に量子エラーを引き起こすハードウェアの不完全性に直面する。このような避けられないエラーの軽減は、量子ハードウェアのパワーをより深く探求するために不可欠である。本稿では,ニューマン級数を用いた量子期待値計算において,量子ゲートと測定誤差を軽減できる統一フレームワークを提案する。基本的な考え方は、量子デバイスのシーケンシャルな応用によって生成される異なる順序の量子エラーと慎重に選択された係数を線形に組み合わせることで、その逆を近似することで量子エラーの効果をキャンセルすることである。注目すべきは、推定誤差は、停止順序で指数関数的に減衰し、量子デバイスのノイズ抵抗が適度である限り、帰納誤差軽減オーバーヘッドはシステムサイズに依存しないことである。異なる量子誤差に対してこのフレームワークを数値的にテストし,計算精度が大幅に向上していることを確認した。我々のフレームワークは、量子ゲートと測定誤差を統一的に緩和し、いかなるエラー構造も想定せず、また量子エラーを完全に特徴づけるためにトモグラフィーの手順も必要とせず、そして最も重要なのはスケーラビリティである。これらのアドバンテージは、我々の量子エラー軽減フレームワークを効率的かつ実用的なものにし、量子アプリケーションを提供するための短期量子デバイスの能力を拡張します。

Quantum gates and measurements on quantum hardware are inevitably subject to hardware imperfections that lead to quantum errors. Mitigating such unavoidable errors is crucial to explore the power of quantum hardware better. In this paper, we propose a unified framework that can mitigate quantum gate and measurement errors in computing quantum expectation values utilizing the truncated Neumann series. The essential idea is to cancel the effect of quantum error by approximating its inverse via linearly combining quantum errors of different orders produced by sequential applications of the quantum devices with carefully chosen coefficients. Remarkably, the estimation error decays exponentially in the truncated order, and the incurred error mitigation overhead is independent of the system size, as long as the noise resistance of the quantum device is moderate. We numerically test this framework for different quantum errors and find that the computation accuracy is substantially improved. Our framework possesses several vital advantages: it mitigates quantum gate and measurement errors in a unified manner, it neither assumes any error structure nor requires the tomography procedure to completely characterize the quantum errors, and most importantly, it is scalable. These advantages empower our quantum error mitigation framework to be efficient and practical and extend the ability of near-term quantum devices to deliver quantum applications.

翻訳日:2023-03-09 17:14:15 公開日:2021-11-01
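
The idea of approximating an inverse by a truncated Neumann series can be illustrated on a small matrix. The sketch below relies on the standard identity sum_{k=0}^{K} (I - A)^k = sum_{j=0}^{K} (-1)^j C(K+1, j+1) A^j, which rewrites the truncated series as a linear combination of repeated applications of A. The 2x2 column-stochastic matrix `A`, standing in for a noise process, and the helper names are assumptions for illustration; this is not the paper's protocol.

```python
from math import comb

# Truncated Neumann series as a linear combination of powers of A.
# A is a hypothetical 2x2 noise matrix chosen only for this example.

def matmul(a, b):
    """Plain matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(a, K):
    """Approximate the inverse of a by sum_j (-1)^j C(K+1, j+1) a^j, j = 0..K."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    power = identity(n)  # holds a^j, starting from a^0
    for j in range(K + 1):
        coeff = (-1) ** j * comb(K + 1, j + 1)
        for r in range(n):
            for c in range(n):
                result[r][c] += coeff * power[r][c]
        power = matmul(power, a)
    return result

def residual(a, K):
    """Largest entry of |approx_inverse * a - I|; equals max |(I - a)^(K+1)|."""
    prod = matmul(neumann_inverse(a, K), a)
    n = len(a)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

A = [[0.95, 0.10],
     [0.05, 0.90]]

residuals = [residual(A, K) for K in range(6)]
```

For this `A`, the matrix I - A has eigenvalues 0 and 0.15, so each extra order shrinks the residual by a factor of 0.15. This mirrors the abstract's statement that the estimation error decays exponentially in the truncation order when the noise is moderate.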

# 非ガウス資源を用いた連続可変量子テレポーテーションの強化

Enhancing Continuous Variable Quantum Teleportation using Non-Gaussian Resources ( http://arxiv.org/abs/2111.00672v1 )

ライセンス: Link先を確認

Eduardo Villaseñor and Robert Malaney

(参考訳) 連続可変(CV)非ガウス資源は、CVベースの量子通信とCVベースの計算のための量子エラー補正の実現において基礎となる。本研究では, CV非ガウス状態をノイズチャネルによるコヒーレントおよび圧縮状態の伝送の文脈における量子テレポーテーション資源状態として用いることを検討する。異なる非ガウス的資源状態の配列を検討し、各資源に対して達成された状態テレポーテーションの忠実度を計算する。以上の結果から,非ガウス状態の使用は,従来のCVテレポーテーション(ガウス2モード圧縮真空状態)と比較して大きな優位性を示した。ファイバーベースの量子通信では、特定の非ガウス状態を用いることで、量子テレポーテーションの範囲が約40%増加する。衛星と地上の量子通信において、ミシウス衛星と一致する開口構成のために、量子テレポーテーションの実行可能な範囲は700kmから1200kmを超える。これらの結果は、地球と宇宙の両方のネットワークにおける実用的および実現可能な量子通信の性能が著しく向上したことを示している。

Continuous Variable (CV) non-Gaussian resources are fundamental in the realization of quantum error correction for CV-based quantum communications and CV-based computing. In this work, we investigate the use of CV non-Gaussian states as quantum teleportation resource states in the context of the transmission of coherent and squeezed states through noisy channels. We consider an array of different non-Gaussian resource states, and compute the fidelity of state teleportation achieved for each resource. Our results show that the use of non-Gaussian states presents a significant advantage compared to the traditional resource adopted for CV teleportation; the Gaussian two-mode squeezed vacuum state. In fiber-based quantum communications, the range of quantum teleportation is increased by approximately 40% via the use of certain non-Gaussian states. In satellite-to-ground quantum communications, for aperture configurations consistent with the Micius satellite, the viable range of quantum teleportation is increased from 700 km to over 1200 km. These results represent a significant increase in the performance of pragmatic and realizable quantum communications in both terrestrial and space-based networks.

翻訳日:2023-03-09 17:13:54 公開日:2021-11-01

# ボルン則の確立におけるカオス的軌道と秩序軌道の役割

The role of chaotic and ordered trajectories in establishing Born's rule ( http://arxiv.org/abs/2111.00846v1 )

ライセンス: Link先を確認

Athanasios C. Tzemos and George Contopoulos

(参考訳) 様々な量子の絡み合いに対するボルンの法則が満たされる(あるいは満たさない)とき、2つの絡み合ったボヘミアン量子ビットの軌道、順序とカオスについて詳細に研究した。エンタングルメントの任意の非零値とカオス的軌道が共存し、エンタングルメントが減少するに従って順序付けられた軌道の割合が増加する。ゼロエンタングルメントと最大エンタングルメントの極端なケースでは、順序とカオスの軌道だけが対応する。このモデルのカオス軌跡はエルゴード的であり、任意の絡み合いの値に対して、その点の極限分布は初期条件に依存しない。そのため、ボルンの規則の動的確立(あるいはそうでない)に責任を持つ秩序とカオスの軌道の比率である。

We study in detail the trajectories, ordered and chaotic, of two entangled Bohmian qubits when their initial preparation satisfies (or not) Born's rule for various amounts of quantum entanglement. For any non zero value of entanglement ordered and chaotic trajectories coexist and the proportion of ordered trajectories increases with the decrease of the entanglement. In the extreme cases of zero and maximum entanglement we have only ordered and chaotic trajectories correspondingly. The chaotic trajectories of this model are ergodic, for any given value of entanglement, namely the limiting distribution of their points does not depend on their initial conditions. Consequently it is the ratio between ordered and chaotic trajectories which is responsible for the dynamical establishment (or not) of Born's rule.

翻訳日:2023-03-09 17:09:31 公開日:2021-11-01

# 多体局在化は反復量子最適化を可能にする

Many-body localization enables iterative quantum optimization ( http://arxiv.org/abs/2111.00842v1 )

ライセンス: Link先を確認

Hanteng Wang, Hsiu-Chung Yeh, Alex Kamenev

(参考訳) 我々は,反復量子プロトコルを提案し,ガラス状エネルギー環境を用いて最適化問題を解く。これは多体局在遷移の三臨界点付近の周期的サイクリングに基づいている。これにより、各反復が局所エネルギーの最小値を求める非指数的に小さな確率に導くことが保証される。もう1つの重要な要素は、サイクルパラメータを現在達成されている最適状態("参照"状態)に調整し、より深い最小値が見つかるとリセットすることである。三項臨界点の位置が分かっていれば、このアルゴリズムは多項式時間で任意の精度で絶対最小値に近づくことができることを示す。

We suggest an iterative quantum protocol that allows one to solve optimization problems with a glassy energy landscape. It is based on periodic cycling around the tricritical point of the many-body localization transition. This ensures that each iteration leads to a non-exponentially small probability of finding a lower local energy minimum. The other key ingredient is to tailor the cycle parameters to a currently achieved optimal state (the "reference" state) and to reset them once a deeper minimum is found. We show that, if the position of the tricritical point is known, the algorithm allows one to approach the absolute minimum with any given precision in polynomial time.

翻訳日:2023-03-09 17:09:16 公開日:2021-11-01

# 挑戦的だが機会に満ちている: 小学校におけるプログラミングに対する教師の視点

Challenging but Full of Opportunities: Teachers' Perspectives on Programming in Primary Schools ( http://arxiv.org/abs/2111.00799v1 )

ライセンス: Link先を確認

Luisa Greifenstein, Isabella Graßl, Gordon Fraser

(参考訳) 学校カリキュラムに計算論的思考が広く定着したことで、教師は小学校の段階から子供たちにプログラミングを教える必要がある。これは最近の動きであるため、小学校の教師はプログラミングの最善の教え方について十分な準備ができておらず、なぜ教えなければならないのかも十分に理解していない可能性がある。これらの問いをより深く理解するために、実践経験から得られた知見と、養成課程にある教員の予想とを対比した。小学校でプログラミングを教えた経験のある教師200名と養成課程の教員97名を調査し、プログラミングを教える際の課題、子供たちがプログラミングを学ぶときに生じる機会、およびその両方に実践で対処するための戦略を明らかにした。多くの課題や機会は正しく予想されているが、いくつかの不一致も見つかり、これらは小学校教師がプログラミングを教える準備をより良く整えられるよう、教員養成課程のカリキュラム改訂に役立てることができる。

The widespread establishment of computational thinking in school curricula requires teachers to introduce children to programming already at primary school level. As this is a recent development, primary school teachers may neither be adequately prepared for how to best teach programming, nor may they be fully aware of why they have to do so. In order to gain a better understanding of these questions, we contrast insights taken from practical experiences with the anticipations of teachers in training. By surveying 200 teachers who have taught programming at primary schools and 97 teachers in training, we identify relevant challenges when teaching programming, opportunities that arise when children learn programming, and strategies for addressing both of these in practice. While many challenges and opportunities are correctly anticipated, we find several disagreements that can inform revisions of the curricula in teaching studies to better prepare primary school teachers for teaching programming at primary schools.

翻訳日:2023-03-09 17:08:46 公開日:2021-11-01

# ソフトコアポテンシャルをもつ位置不規則イジングスピンのダイナミクス

Dynamics of position disordered Ising spins with a soft-core potential ( http://arxiv.org/abs/2111.00779v1 )

ライセンス: Link先を確認

Canzhu Tan, Xiaodong Lin, Yabing Zhou, Y. H. Jiang, Matthias Weidemüller, Bing Zhu

(参考訳) $d$次元の均質な分布およびガウス分布でランダムに配置されたイジングスピンの磁化緩和を、ソフトコア型の2体相互作用ポテンシャル $\propto1/[1+(r/R_c)^\alpha]$ ($\alpha\ge d$) のもとで理論的に研究する。ここで $r$ はスピン間距離、$R_c$ はソフトコア半径である。ダイナミクスは、全てのスピンが横方向に偏極した状態から始まる。均質な場合には、熱力学極限で解析的な表式が導かれ、緩和は $\propto\exp(-t^2)$ として始まり、長時間では漸近的に指数 $\beta=d/\alpha$ の引き伸ばされた指数関数則に従う。その中間では、振幅が減衰する振動的な振る舞いが見られる。ガウス型の試料では、平均スピン間距離 $l_\rho$ を用いた比 $l_\rho/R_c$ によって系の乱れの程度を制御でき、磁化ダイナミクスを数値的に調べる。 $l_\rho/R_c\ll1$ の極限では、スピンの位置の乱れにもかかわらず、全磁化についてコヒーレントな多体ダイナミクスが回復する。逆の極限 $l_\rho/R_c\gg1$ では、磁化の初期の速い減衰の後、後の時間に均質な場合と似たダイナミクスが現れる。 $d=3, \alpha=6$ の漸近的な時間発展に対して引き伸ばし指数 $\beta\approx0.18$ が得られ、これは均質な場合 ($\beta=0.5$) とは異なる。

We theoretically study magnetization relaxation of Ising spins distributed randomly in a $d$-dimensional homogeneous and Gaussian profile under a soft-core two-body interaction potential $\propto1/[1+(r/R_c)^\alpha]$ ($\alpha\ge d$), where $r$ is the inter-spin distance and $R_c$ is the soft-core radius. The dynamics starts with all spins polarized in the transverse direction. In the homogeneous case, an analytic expression is derived in the thermodynamic limit, which starts as $\propto\exp(-t^2)$ and follows a stretched-exponential law asymptotically at long times with an exponent $\beta=d/\alpha$. In between, an oscillating behaviour is observed with a damped amplitude. For Gaussian samples, the degree of disorder in the system can be controlled by the ratio $l_\rho/R_c$, with $l_\rho$ the mean inter-spin distance, and the magnetization dynamics is investigated numerically. In the limit of $l_\rho/R_c\ll1$, a coherent many-body dynamics is recovered for the total magnetization despite the position disorder of spins. In the opposite limit of $l_\rho/R_c\gg1$, a dynamics similar to that in the homogeneous case emerges at later times after an initial fast decay of the magnetization. We obtain a stretched exponent of $\beta\approx0.18$ for the asymptotic evolution with $d=3, \alpha=6$, which is different from that in the homogeneous case ($\beta=0.5$).

翻訳日:2023-03-09 17:07:38 公開日:2021-11-01
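
The relaxation described in this abstract can be illustrated with a minimal Python sketch. It assumes the Ising Hamiltonian $H=\sum_{i<j}J_{ij}\sigma^z_i\sigma^z_j$ (with $\hbar=1$) and all spins initially polarized along $+x$, for which the exact single-spin result $\langle\sigma^x_i\rangle(t)=\prod_{j\neq i}\cos(2J_{ij}t)$ holds. The prefactor convention, the parameter `j0`, and the helper names below are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def soft_core_coupling(r, j0=1.0, r_c=1.0, alpha=6):
    """Soft-core pair coupling J(r) = j0 / (1 + (r / r_c)**alpha)."""
    return j0 / (1.0 + (r / r_c) ** alpha)


def magnetization(positions, t, j0=1.0, r_c=1.0, alpha=6):
    """Spin-averaged <sigma_x>(t) for Ising-coupled spins starting along +x.

    Uses <sigma^x_i>(t) = prod_{j != i} cos(2 J_ij t), which assumes
    H = sum_{i<j} J_ij sigma^z_i sigma^z_j with hbar = 1.
    """
    n = len(positions)
    total = 0.0
    for i in range(n):
        prod = 1.0
        for j in range(n):
            if j == i:
                continue
            r = math.dist(positions[i], positions[j])
            prod *= math.cos(2.0 * soft_core_coupling(r, j0, r_c, alpha) * t)
        total += prod
    return total / n


def gaussian_sample(n, sigma, seed=0):
    """Draw n spin positions from an isotropic 3D Gaussian of width sigma."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in range(n)]


if __name__ == "__main__":
    spins = gaussian_sample(200, sigma=3.0)
    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(t, magnetization(spins, t))
```

Changing `sigma` relative to `r_c` moves the sample between the two regimes mentioned in the abstract, small and large $l_\rho/R_c$. Extracting a stretched exponent would additionally require averaging over many samples and a fit, which this sketch does not attempt.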

# 線形方程式系に対する行列合同のQUBO定式化への応用について

On the application of matrix congruence to QUBO formulations for systems of linear equations ( http://arxiv.org/abs/2111.00747v1 )

ライセンス: Link先を確認

Sun Woo Park, Hyunju Lee, Byung Chun Kim, Youngho Woo, and Kyungtaek Jun

(参考訳) 量子コンピューティングアルゴリズムに関する最近の研究は、計算モデルの強化に寄与しうる量子コンピュータの特徴を掘り起こすことに焦点を当てている。様々なアプローチの中でも、量子アニーリング法は線形方程式系の2次非制約バイナリ最適化(QUBO)定式化を効果的に並列化する。本稿では、実対称行列が対角行列と合同であることを利用して、これらの定式化を単純化する。さらに、QR分解やSVDなどの古典的アルゴリズムを上回りうる、提案するQUBOモデルの計算上の利点を示す。

Recent studies on quantum computing algorithms focus on excavating features of quantum computers which have potential for contributing to computational model enhancements. Among various approaches, quantum annealing methods effectively parallelize quadratic unconstrained binary optimization (QUBO) formulations of systems of linear equations. In this paper, we simplify these formulations by exploiting congruence of real symmetric matrices to diagonal matrices. We further exhibit computational merits of the proposed QUBO models, which can outperform classical algorithms such as QR and SVD decomposition.

翻訳日:2023-03-09 17:07:06 公開日:2021-11-01
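
As a rough illustration of what a QUBO formulation of a linear system looks like, the sketch below builds the standard least-squares QUBO for $Ax=b$ with binary unknowns: minimizing $\lVert Ax-b\rVert^2$ and using $x_i^2=x_i$ to fold the linear terms into the diagonal. This is the textbook construction, not the congruence-based simplification proposed in the paper, and the function names are hypothetical. A brute-force search stands in for the annealer.

```python
from itertools import product


def build_qubo(a, b):
    """Upper-triangular QUBO matrix Q with x^T Q x = ||A x - b||^2 - b^T b.

    Valid for binary x only, because it relies on x_i**2 == x_i.
    """
    m, n = len(a), len(a[0])
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            gram = sum(a[k][i] * a[k][j] for k in range(m))  # (A^T A)_ij
            if i == j:
                atb = sum(a[k][i] * b[k] for k in range(m))  # (A^T b)_i
                q[i][i] = gram - 2.0 * atb
            else:
                q[i][j] = 2.0 * gram
    return q


def qubo_energy(q, x):
    """Evaluate x^T Q x for an upper-triangular Q."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force_minimum(q):
    """Exhaustive minimizer; a stand-in for a quantum annealer on tiny problems."""
    return min(product((0, 1), repeat=len(q)), key=lambda x: qubo_energy(q, x))


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 1.0]]
    b = [3.0, 4.0]  # chosen so that x = (1, 1) solves A x = b
    print(brute_force_minimum(build_qubo(a, b)))
```

Real-valued unknowns would need a fixed-point binary expansion of each component before this construction applies; that step is omitted here.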

# 薄膜Al/AlO$_x$/Alジョセフソン接合をもつ3次元トランスモンの1 Tに迫る磁場耐性

Magnetic-field resilience of 3D transmons with thin-film Al/AlO$_x$/Al Josephson junctions approaching 1 T ( http://arxiv.org/abs/2111.01115v1 )

ライセンス: Link先を確認

J. Krause, C. Dickel, E. Vaal, M. Vielmetter, J. Feng, R. Bounds, G. Catelani, J. M. Fink, Yoichi Ando

(参考訳) 磁場耐性をもつ超伝導回路は、センシングへの応用や、スピン量子ビットやトポロジカル量子ビット、電気機械素子を含むハイブリッド量子計算アーキテクチャ、さらには磁束ノイズや準粒子損失の研究を可能にする。薄膜3次元アルミニウムトランスモンのスペクトルとコヒーレンス時間に対する、最大1 Tの面内磁場の影響を調べた。強磁場の影響を受けない銅製キャビティを用いることで、トランスモンに対する磁場の効果だけを調べることができる。同一のキャビティ内で冷却した単一接合トランスモンとSQUIDトランスモンのデータを示す。予想どおり、超伝導ギャップの抑制と幾何学的なフラウンホーファー的寄与により、トランスモン周波数は磁場の増加とともに低下する。それでも薄膜トランスモンは強い磁場耐性を示し、どちらのトランスモンも少なくとも0.65 Tまでマイクロ秒程度のコヒーレンスを保ち、$T_1$ は測定可能な全範囲で1 $\mathrm{\mu}$sを上回る。 SQUIDの分光は、使用した磁石の限界である1 Tまで可能である。薄膜アルミニウムジョセフソン接合は、高磁場領域の超伝導回路に適したハードウェアであると結論づける。

Magnetic-field-resilient superconducting circuits enable sensing applications and hybrid quantum-computing architectures involving spin or topological qubits and electro-mechanical elements, as well as studying flux noise and quasiparticle loss. We investigate the effect of in-plane magnetic fields up to 1 T on the spectrum and coherence times of thin-film 3D aluminum transmons. Using a copper cavity, unaffected by strong magnetic fields, we can solely probe the magnetic-field effect on the transmons. We present data on a single-junction and a SQUID transmon that were cooled down in the same cavity. As expected, transmon frequencies decrease with increasing fields, due to a suppression of the superconducting gap and a geometric Fraunhofer-like contribution. Nevertheless, the thin-film transmons show strong magnetic-field resilience: both transmons display microsecond coherence up to at least 0.65 T, and $T_1$ remains above 1 $\mathrm{\mu}$s over the entire measurable range. SQUID spectroscopy is feasible up to 1 T, the limit of our magnet. We conclude that thin-film aluminum Josephson junctions are suitable hardware for superconducting circuits in the high-magnetic-field regime.

翻訳日:2023-03-09 17:01:24 公開日:2021-11-01

# 分子集合体の吸収スペクトルのシミュレーション:確率的純粋状態の階層的アプローチ

Simulation of absorption spectra of molecular aggregates: a Hierarchy of Stochastic Pure States approach ( http://arxiv.org/abs/2111.01089v1 )

ライセンス: Link先を確認

Lipeng Chen, Doran I. G. Bennett and Alexander Eisfeld

(参考訳) 電子励起が振動自由度と強く、かつ構造をもって結合している分子集合体について、分光学的観測量をシミュレーションすることは、重要だが困難な課題である。純粋状態の階層(HOPS)は、局所的な確率的軌道に基づく形式的に厳密な解を与える。大きな集合体の吸収スペクトルのシミュレーションにHOPSの局在性を活用するには、正規化された軌道による定式化が必要である。ここでは、ケット状態とブラ状態が異なる電子ヒルベルト空間で伝播する、正規化されたダイアディック方程式を与える。この研究は、吸収スペクトルのシミュレーションに適応的HOPS法を適用する道を開き、さらに電場との相互作用に関して摂動的な非線形分光の定式化への道も開く。

The simulation of spectroscopic observables for molecular aggregates with strong and structured coupling of electronic excitation to vibrational degrees of freedom is an important but challenging task. The hierarchy of pure states (HOPS) provides a formally exact solution based on local, stochastic trajectories. Exploiting the localization of HOPS for the simulation of absorption spectra in large aggregates requires a formulation in terms of normalized trajectories. Here we provide a normalized dyadic equation where the ket- and bra-states are propagated in different electronic Hilbert spaces. This work opens the door to applying adaptive HOPS methods for the simulation of absorption spectra and also to a formulation for non-linear spectroscopy that is perturbative with respect to interactions with the electric field.

翻訳日:2023-03-09 17:01:01 公開日:2021-11-01

# 光吸収ターゲットを用いた量子照明

Quantum illumination with a light absorbing target ( http://arxiv.org/abs/2111.01069v1 )

ライセンス: Link先を確認

Rivu Gupta, Saptarshi Roy, Tamoghna Das, Aditi Sen De

(参考訳) 量子照明(QI)プロトコルの課題は、通常は部分反射ビームスプリッターでモデル化されるターゲットの存在を検出することである。ターゲットが入射光の一部を吸収する場合のQIの性能を解析し、シナリオをより現実的なものにする。このような特性をもつターゲットをモデル化する光学系を提示し、量子領域における検出可能性をチャーノフ境界(CB)の観点から調べる。アイドラーなしの構成ではQIにコヒーレント状態を用い、シグナル・アイドラー方式では2モードスクイーズド真空(TMSV)状態を用いる。どちらの場合も、吸収量の増加に伴うCBの低下として示される、吸収による検出効率の向上を報告する。興味深いことに、吸収が存在する場合、より強い熱的背景のもとでターゲット検出の効率が高まりうることを示す。さらに、有限の吸収量でも量子優位性が持続することを確認した。しかし、TMSVがもたらす量子優位性は吸収とともに単調に減少し、高吸収領域では無視できるほど小さくなる。また、低い反射率と吸収の極限において、コヒーレント状態とTMSV状態がそれぞれの構成(アイドラーなし、シグナル・アイドラー)で最適であることも示す。

In a quantum illumination (QI) protocol, the task is to detect the presence of the target which is typically modelled by a partially reflecting beam splitter. We analyze the performance of QI when the target absorbs part of the light that falls on it, thereby making the scenario more realistic. We present an optical setup that models a target with these characteristics and explore its detectability in the quantum domain in terms of the Chernoff bound (CB). For an idler-free setup, we use the coherent state for QI while the two-mode squeezed vacuum (TMSV) state is employed in the signal-idler scheme. In both cases, we report an absorption-induced enhancement of the detection efficiency indicated by a lowering of CB with increasing amounts of absorption. Interestingly, we show that in the presence of absorption, a more intense thermal background can lead to target detection with enhanced efficiency. Moreover, we observe that the quantum advantage persists even for finite amounts of absorption. However, we find that the quantum advantage offered by TMSV decreases monotonically with absorption, and becomes vanishingly small in the high absorption regime. We also demonstrate the optimality of both the coherent and the TMSV states in their respective setups (idler-free and signal-idler) in the limit of low reflectivity and absorption.

翻訳日:2023-03-09 17:00:28 公開日:2021-11-01

# 多モード連続可変絡み合い分布におけるクロストーク補償

Cross talk compensation in multimode continuous-variable entanglement distribution ( http://arxiv.org/abs/2111.00948v1 )

ライセンス: Link先を確認

Olena Kovalenko, Vladyslav C. Usenko and Radim Filip

(参考訳) 2モード圧縮状態は、連続変数およびハイブリッド量子情報プロトコルを遠隔で使用するスケーラブルでロバストな絡み合いリソースである。平行な類似チャネルを伝播する2モード圧縮状態の多モード分布における線形クロストークの効果を考察する。まず, 分布ガウスの絡み合いの劣化を低減するため, チャネル内への最初の2モードスクイージングは, クロストークの存在下で既に最適化されるべきであることを示す。第2に,チャネル透過率がすべてのモードに対して同じであればクロストークを完全に補償できる,絡み合いを使用する前に,モード間の相対位相と受信側における線形結合の同時最適化を提案する。どちらのモードでも同様の透過率値を持つ現実的なチャネルの場合、クロストークはいまだにほとんど補償される。モード干渉に依存する手法は、別のペアの測定とフィードフォワード制御を用いて、1組のモードにおける絡み合い局在の代替手法を克服する。我々の理論的結果は、クロストークによるスケーラブルな量子ネットワークにおけるマルチモード連続可変フォトニック絡み合いのより効率的な利用への道を開いた。

Two-mode squeezed states are scalable and robust entanglement resources for continuous-variable and hybrid quantum information protocols at a distance. We consider the effect of a linear cross talk in the multimode distribution of two-mode squeezed states propagating through parallel similar channels. First, to reduce degradation of the distributed Gaussian entanglement, we show that the initial two-mode squeezing entering the channel should be optimized already in the presence of a small cross talk. Second, we suggest simultaneous optimization of relative phase between the modes and their linear coupling on a receiver side prior to the use of entanglement, which can fully compensate the cross talk once the channel transmittance is the same for all the modes. For the realistic channels with similar transmittance values for either of the modes, the cross talk can be still largely compensated. This method relying on the mode interference overcomes an alternative method of entanglement localization in one pair of modes using measurement on another pair and feed-forward control. Our theoretical results pave the way to more efficient use of multimode continuous-variable photonic entanglement in scalable quantum networks with cross talk.

翻訳日:2023-03-09 16:59:16 公開日:2021-11-01

# アフシャールの二重スリット実験のシミュレーション

Simulation of Afshar's Double Slit Experiment ( http://arxiv.org/abs/2111.01220v1 )

ライセンス: Link先を確認

Bret Gergely and Herman Batelaan

(参考訳) Shahriar S. Afsharは、2007年に行った改良版の二重スリット実験が相補性に反すると主張した[1]。彼は標準的な二重スリット実験に2つの変更を加えた。第1に、スリットとスクリーンの間の、干渉の極小の位置にワイヤーグリッドを追加した。第2に、ワイヤーグリッドの直後に収束レンズを配置した。その考え方は、ワイヤーグリッドが干渉極小の存在(波動的振る舞い)を示す一方で、レンズによって同時にどの経路を通ったかの情報(粒子的振る舞い)が得られる、というものである。より最近では、John G. Cramer [2] が、この実験は量子力学の交流解釈(TIQM)を支持すると論じた。彼の議論は、TIQMを支持する立場からボーアの相補性を批判的に検討するものである。本研究では、量子力学の経路積分形式[3]を用いたシミュレーションによってこの実験を解析し、結果がEnglert、Greenberg、Yasin(E-G-Y)による波動と粒子の双対性関係[4, 5]と一致することを見いだした。量子力学の解釈のテストベッドとしてAfsharの実験を用いることには限界があると結論づける。

Shahriar S. Afshar claimed that his 2007 modified version of the double-slit experiment violates complementarity [1]. He makes two modifications to the standard double-slit experiment. First, he adds a wire grid that is placed in between the slits and the screen at locations of interference minima. The second modification is to place a converging lens just after the wire grid. The idea is that the wire grid implies the existence of interference minima (wave-like behavior), while the lens can simultaneously obtain which-way information (particle-like behavior). More recently, John G. Cramer [2] argued that the experiment bolstered the Transactional Interpretation of Quantum Mechanics (TIQM). His argument scrutinizes Bohr's complementarity in favor of TIQM. We analyze this experiment by simulation using the path integral formulation of quantum mechanics [3] and find that it agrees with the wave-particle duality relation given by Englert, Greenberg and Yasin (E-G-Y) [4, 5]. We conclude that the use of Afshar's experiment to provide a testbed for quantum mechanical interpretations is limited.

翻訳日:2023-03-09 16:50:31 公開日:2021-11-01

# 局所的同時状態判別

Local simultaneous state discrimination ( http://arxiv.org/abs/2111.01209v1 )

ライセンス: Link先を確認

Christian Majenz, Maris Ozols, Christian Schaffner, Mehrdad Tahmasbi

(参考訳) 量子状態判別は、量子情報理論で研究される最も基本的な問題の1つである。その応用は通信路符号化から計測学、暗号まで多岐にわたる。本稿では、この課題の新しい変種である局所同時状態判別(LSSD)を導入する。判別問題のこれまでの分散型の変種では、共同で答えを出すために当事者間で何らかの通信が常に許されていたが、LSSDの当事者は通信できず、全員が同時に正しく答えなければならない。この同時性のため、例えば古典的状態に対しても、問題は非分散の判別タスクに自明には帰着しない。この問題はそれ自体興味深いが、量子暗号においても現れる。問題を導入した後、いくつかの特徴づけの結果を与える。具体的には、次のことを示す例を与える: (i) 局所的な判別の最適戦略は、古典的状態であっても、LSSDの最適戦略と一致するとは限らない、(ii) 追加のエンタングルした資源によってLSSDの最適成功確率が高まりうる、(iii) 量子より強いノンシグナリング資源は、エンタングルメントを用いる戦略と比べて、場合によってはより高い成功確率を可能にする。最後に、(古典的な)3者LSSDにおける最適戦略を見つけることがNP困難であることを示す。

Quantum state discrimination is one of the most fundamental problems studied in quantum information theory. Applications range from channel coding to metrology and cryptography. In this work, we introduce a new variant of this task: Local Simultaneous State Discrimination (LSSD). While previous distributed variants of the discrimination problem always allowed some communication between the parties to come up with a joint answer, the parties in LSSD cannot communicate and have to simultaneously answer correctly. This simultaneity implies, e.g., that for classical states, the problem does not trivialize to a non-distributed distinguishing task. While interesting in its own right, this problem also arises in quantum cryptography. After introducing the problem, we give a number of characterization results. We give examples showing that i) the optimal strategy for local discrimination need not coincide with the optimal strategy for LSSD, even for classical states, ii) an additional entangled resource can increase the optimal success probability in LSSD, and iii) stronger-than-quantum non-signalling resources can allow for a higher success probability in some cases, compared to strategies using entanglement. Finally, we show that finding the optimal strategy in (classical) 3-party LSSD is NP-hard.

翻訳日:2023-03-09 16:49:51 公開日:2021-11-01
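
To make the classical two-party version of this task concrete, the following Python sketch brute-forces the best deterministic local strategies. It assumes a specific reading of the abstract: a referee samples $(x,a,b)$ from a known joint distribution, Alice sees only $a$, Bob sees only $b$, and they win only if both output $x$. That formalization and the function name `lssd_value` are assumptions for illustration; the paper's own definition may differ in details.

```python
from itertools import product


def lssd_value(p, xs, a_vals, b_vals):
    """Best success probability over deterministic local guessing strategies.

    p maps (x, a, b) -> probability. A strategy is a pair of lookup tables
    a -> guess and b -> guess; the players win when both guesses equal x.
    The search is exponential in the alphabet sizes, so it only suits toy cases.
    """
    best = 0.0
    for f in product(xs, repeat=len(a_vals)):
        guess_a = dict(zip(a_vals, f))
        for g in product(xs, repeat=len(b_vals)):
            guess_b = dict(zip(b_vals, g))
            win = sum(
                prob
                for (x, a, b), prob in p.items()
                if guess_a[a] == x and guess_b[b] == x
            )
            best = max(best, win)
    return best


if __name__ == "__main__":
    bits = (0, 1)
    # Both players receive a perfect copy of the uniformly random bit x.
    perfect = {(x, x, x): 0.5 for x in bits}
    # Both players receive independent uniform noise unrelated to x.
    noise = {(x, a, b): 0.125 for x in bits for a in bits for b in bits}
    print(lssd_value(perfect, bits, bits, bits))
    print(lssd_value(noise, bits, bits, bits))
```

Without shared resources, randomized local strategies are mixtures of deterministic ones, so restricting the search to deterministic tables loses nothing in this classical setting. Entangled or non-signalling resources, which the abstract discusses, are outside the scope of this sketch.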

# 量子場からの相関の収穫を妨害する

Sabotaging the harvesting of correlations from quantum fields ( http://arxiv.org/abs/2111.01191v1 )

ライセンス: Link先を確認

Abhisek Sahu, Irene Melgarejo-Lermas and Eduardo Martín-Martínez

(参考訳) 量子場に結合した2者間における、古典的相関および量子的相関の非摂動的な収穫を研究する。まず、任意の数の2準位系が時間的に局所的に量子場と結合するシナリオを考える。次に、2つのターゲット検出器(AliceとBob)が場との相互作用を通じて相関を獲得する能力に対して、追加の検出器(侵入者)の存在が与える影響を調べる。この非摂動的領域における様々な相関尺度の収穫を解析し、たとえ侵入者が1つだけでも、AliceとBobのどちらか一方の因果的過去に作用することで、両者の間の相関収穫を完全に妨害できることを示す。具体的には、侵入者が場と相互作用することで、場そのものが当事者の一方をエントロピーで「氾濫」させうることを示す。これにより、AliceとBobはいかなる相関も獲得できなくなる。さらに、この種の攻撃に対しては防御できないことを示す。

We study the non-perturbative harvesting of classical and quantum correlations between two parties coupled to a quantum field. First, we consider a scenario with an arbitrary number of two-level systems that couple to a quantum field locally in time. Then, we study the impact of the presence of additional detectors (interlopers) on the ability for two target detectors (Alice and Bob) to acquire correlations through their interaction with the field. We analyze the harvesting of different correlation measures in this non-perturbative regime and we demonstrate that even a single interloper can completely sabotage all correlation harvesting between Alice and Bob by acting on the causal past of one of them. Specifically, we show that the interloper is able to interact with the field so that the field itself `floods' one of the parties with entropy. This prevents Alice and Bob from acquiring any correlations. Furthermore, we show that this kind of attack cannot be defended against.

翻訳日:2023-03-09 16:49:22 公開日:2021-11-01

# ゲージ理論の量子熱化:カオス、乱流、普遍性

Quantum thermalization of gauge theories: chaos, turbulence and universality ( http://arxiv.org/abs/2111.01155v1 )

ライセンス: Link先を確認

Niklas Mueller, Torsten V. Zache, Robert Ott

(参考訳) 本講演では、2+1次元時空における $\mathbf{Z}_2$ 格子ゲージ理論の実時間熱化ダイナミクスについて述べる。古典的な熱化は一般にカオス的挙動、乱流、普遍性と結びつけられるが、量子力学系におけるこれらの現象の現れ方は明らかではない。しかし、エンタングルメント構造というレンズを通して見ると、量子熱化は特徴的な段階を経て進行し、カオス、乱流、普遍性という古典的な対応物と驚くほど類似した現象を示すことがわかる。

In this talk, we discuss real-time thermalization dynamics of $\mathbf{Z}_2$ Lattice Gauge Theory in 2+1 spacetime dimensions. While classical thermalization is commonly associated with chaotic behavior, turbulence and universality, the manifestation of these phenomena in quantum mechanical systems is not clear. However, when viewed through the lens of Entanglement Structure, we find that quantum thermalization proceeds in characteristic stages and reveals phenomena remarkably similar to their classical counterparts: chaos, turbulence and universality.

翻訳日:2023-03-09 16:48:50 公開日:2021-11-01

# Kuwahara et al., Intensity Interference in a Coherent Spin-Polarized Electron Beam, Phys. Rev. Lett. 126, 125501 (2021) へのコメント

Comment on Kuwahara et al., Intensity Interference in a Coherent Spin-Polarized Electron Beam, Phys. Rev. Lett. 126, 125501 (2021) ( http://arxiv.org/abs/2111.02890v1 )

ライセンス: Link先を確認

Herman Batelaan, Sam Keramati and T. J. Gay

(参考訳) Kuwaharaら [1] がHanbury Brown-Twiss型の電子アンチバンチングディップ(同論文の図3)を観測したという主張は、電子源の放出率が光の偏光に依存することによって説明できる可能性がある。彼らのGaAs/GaAsP試料の歪みは一軸性であり、光電子放出における線形二色性は最大で15%程度 [7] と予想され、報告された0.1%の効果よりはるかに大きい。円偏光についても同様の懸念がある。

The claim that Kuwahara et al. [1] have reported the observation of a Hanbury Brown-Twiss electron antibunching dip (their Fig. 3) could possibly be explained as an electron source emission rate dependency on the light polarization. Strain on their GaAs/GaAsP sample is uniaxial, and one would expect a linear dichroism in the photoemission possibly as large as 15% [7] - much larger than the 0.1% reported effect. The same concern exists for circularly polarized light.

翻訳日:2023-03-09 16:40:34 公開日:2021-11-01

# 完全セキュアな分散スーパーデンス符号化: 最適性の絡み合い要件

Absolutely Secure Distributed Superdense Coding: Entanglement Requirement for Optimality ( http://arxiv.org/abs/2111.01563v1 )

ライセンス: Link先を確認

Sagnik Dutta, Asmita Banerjee, Prasanta K. Panigrahi

(参考訳) 超高密度符号化は、エンタングルメントを資源として用い、量子チャネルを通じて古典情報を安全に伝送する。超高密度符号化の方式は、その容量がHolevo限界に達するとき最適である。最適性のためには、AliceとBobの二分割にまたがる最大エンタングルメントが必要であるが、絶対的な多者間エンタングルメントも真の多者間エンタングルメントも必要ではないことを示す。偶数ビットまたは奇数ビットの情報のいずれかしか送信できなかった従来の方式とは異なり、真の多者間エンタングル状態であるGHZ状態を用いて任意のビット数の情報を送る、一般化された高密度符号化プロトコルを示した。異なるパウリ演算子の固有基底で表したGHZ状態は固有のパリティパターンで特徴づけられ、これによりプロトコルの絶対的な安全性を保証するセキュリティ検査手法を定式化できる。この手法は、資源となる情報が空間的に離れた当事者間に分散しているシナリオにも同様に適用できることを示す。最後に、Bobに送る量子ビット数を最適化することで、多対一の当事者間の絶対安全な一方向量子通信を完全に記述する分散高密度符号化法を構築する。

Superdense coding uses entanglement as a resource to communicate classical information securely through quantum channels. A superdense coding method is optimal when its capacity reaches the Holevo bound. We show that for optimality, maximal entanglement is a necessity across the bipartition of Alice and Bob, but neither absolute nor genuine multipartite entanglement is required. Unlike the previous schemes, which can transmit either even or odd bits of information, we have demonstrated a generalized dense coding protocol using the genuine multipartite entangled GHZ state to send arbitrary information bits. Expressed in the eigenbasis of different Pauli operators, the GHZ state is characterized by a unique parity pattern which enables us to formulate a security checking technique to ensure absolute security of the protocol. We show this method to be equally applicable in a scenario where the resource information is distributed among spatially separated parties. Finally, optimizing the number of qubits sent to Bob, we construct a distributed dense coding method, which completely depicts absolutely secure one-way quantum communication from many parties to one.

翻訳日:2023-03-09 16:40:22 公開日:2021-11-01

# 多安定キャビティマグノニック系を用いた長時間メモリと三値論理ゲート

Long-Time Memory and Ternary Logic Gate Using a Multistable Cavity Magnonic System ( http://arxiv.org/abs/2111.01558v1 )

ライセンス: Link先を確認

Rui-Chang Shen, Yi-Pu Wang, Jie Li, Shi-Yao Zhu, G. S. Agarwal, and J. Q. You

(参考訳) マルチスタビリティは動的システムの異常な非線形特性であり、メモリとスイッチを実装するために探索することができる。ここではKerr非線形性を有する3モードキャビティマグノン系のトライスタビリティを実験的に実現した。三安定領域の3つの安定状態は、特定の駆動条件下でのキャビティマグノン偏光子の周波数シフトの安定解に対応する。安定した状態にあるシステムは、システムが経験した履歴に依存しており、この状態は、履歴情報を格納するために利用できる。我々の実験では、メモリ時間は5.11秒に達する。さらに, このマルチスタブルハイブリッドシステムを用いて, オンオフ特性のよい3次論理ゲートを実演する。我々の新しい発見は、空洞マグノニクスに基づく情報保存と処理への道を開いた。

Multistability is an extraordinary nonlinear property of dynamical systems and can be explored to implement memory and switches. Here we experimentally realize tristability in a three-mode cavity magnonic system with Kerr nonlinearity. The three stable states in the tristable region correspond to the stable solutions of the frequency shift of the cavity magnon polariton under specific driving conditions. We find that which stable state the system stays in depends on the history experienced by the system, and this state can be harnessed to store the history information. In our experiment, the memory time can reach as long as 5.11 s. Moreover, we demonstrate a ternary logic gate with good on-off characteristics using this multistable hybrid system. Our new findings pave a way towards cavity magnonics-based information storage and processing.

翻訳日:2023-03-09 16:40:02 公開日:2021-11-01

# 熱画像における物体検出のための教師なし画像生成拡張適応

Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images ( http://arxiv.org/abs/2002.06770v3 )

ライセンス: Link先を確認

Peng Liu, Fuyu Li, Wanyi Li

(参考訳) 熱画像における物体検出は重要なコンピュータビジョンタスクであり、無人車両、ロボット工学、監視、暗視など多くの応用がある。ディープラーニングに基づく検出器は大きく進歩しているが、通常は大量のラベル付き訓練データを必要とする。しかし、熱画像における物体検出のためのラベル付きデータは少なく、収集にも費用がかかる。大量にあるラベル付き可視画像を活用し、それらを熱画像領域に適応させる方法が、解決の望まれる課題である。本稿では、熱画像における物体検出のための、教師なしの画像生成強化適応法を提案する。可視領域と熱領域のギャップを減らすため、提案手法は、可視のソース領域のアノテーション情報を保持しつつ、ターゲット画像に似た擬似熱画像を生成する。画像生成は、CycleGANに基づく画像間変換と強度反転変換からなる。生成された擬似熱画像は、新たなソース領域として用いられる。その上で、既製のDomain Adaptive Faster RCNNを用いて、生成された中間領域と熱画像のターゲット領域とのギャップを減らす。実験により、提案手法の有効性と優位性を示す。

Object detection in thermal images is an important computer vision task and has many applications such as unmanned vehicles, robotics, surveillance and night vision. Deep learning based detectors have achieved major progress, but they usually need a large amount of labelled training data. However, labelled data for object detection in thermal images is scarce and expensive to collect. How to take advantage of the large number of labelled visible images and adapt them to the thermal image domain remains to be solved. This paper proposes an unsupervised image-generation enhanced adaptation method for object detection in thermal images. To reduce the gap between the visible domain and the thermal domain, the proposed method generates simulated fake thermal images that are similar to the target images and preserves the annotation information of the visible source domain. The image generation includes a CycleGAN based image-to-image translation and an intensity inversion transformation. The generated fake thermal images are used as a renewed source domain. Then the off-the-shelf Domain Adaptive Faster RCNN is utilized to reduce the gap between the generated intermediate domain and the thermal target domain. Experiments demonstrate the effectiveness and superiority of the proposed method.

翻訳日:2022-12-31 13:01:07 公開日:2021-11-01
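
Of the two image-generation steps named in this abstract, the intensity inversion is simple enough to sketch directly. The snippet below assumes the plainest possible form, flipping each grayscale value around the maximum intensity; the abstract does not specify the exact transformation, so both the formula and the function name are assumptions. The CycleGAN translation step is not sketched.

```python
def invert_intensity(image, max_value=255):
    """Return a copy of a grayscale image with every intensity flipped.

    `image` is a list of rows of integer pixel values in [0, max_value].
    Bright regions become dark and vice versa; the layout is unchanged,
    so bounding-box annotations remain valid for the inverted image.
    """
    return [[max_value - pixel for pixel in row] for row in image]


if __name__ == "__main__":
    patch = [[0, 100], [255, 30]]
    print(invert_intensity(patch))
```

Because the operation only changes pixel values and not geometry, labels from the visible source images carry over unchanged, which is the property the abstract relies on.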

# SWAG:スパースラーニングのためのラッパー手法

SWAG: A Wrapper Method for Sparse Learning ( http://arxiv.org/abs/2006.12837v2 )

ライセンス: Link先を確認

Roberto Molinari, Gaetan Bakalli, Stéphane Guerrier, Cesare Miglioli, Samuel Orso, Mucyo Karemera, Olivier Scaillet

(参考訳) 機械学習の手法やアルゴリズムの大部分は、ユーザの優先度に必ずしも一致するとは限らない予測性能に高い優先度を与える。多くの場合、工学から遺伝学までさまざまな分野の実践者や研究者は、特にすべての属性が利用できるわけではない設定において、結果の解釈可能性と再現性を必要としている。その結果、機械学習アルゴリズムのアウトプットをより解釈しやすくし、ユーザが属性の可用性に基づいて選択できる(予測性能の観点から)「等価な」学習者のライブラリを提供することが、これらの学習者をテストおよび/または予測/識別目的で利用するために必要となる。そこで本研究では,利用者が指定した学習方法に基づき,属性空間をゆるやかに探索し,データ収集とストレージコストの低さを生かした疎学習者のライブラリを探索する,スクリーニングとラッパーのアプローチを組み合わせた手法を提案する。この新しい方法は (i)容易に解釈できる属性の低次元ネットワークを提供する。 (ii)強力な学習者と同等の予測力を定義する属性の組み合わせの多様性に基づき、結果の潜在的な再現性を高める。我々はこのアルゴリズムを "Sparse Wrapper AlGorithm" (SWAG) と呼ぶ。

The majority of machine learning methods and algorithms give high priority to prediction performance which may not always correspond to the priority of the users. In many cases, practitioners and researchers in different fields, going from engineering to genetics, require interpretability and replicability of the results especially in settings where, for example, not all attributes may be available to them. As a consequence, there is the need to make the outputs of machine learning algorithms more interpretable and to deliver a library of "equivalent" learners (in terms of prediction performance) that users can select based on attribute availability in order to test and/or make use of these learners for predictive/diagnostic purposes. To address these needs, we propose to study a procedure that combines screening and wrapper approaches which, based on a user-specified learning method, greedily explores the attribute space to find a library of sparse learners with consequent low data collection and storage costs. This new method (i) delivers a low-dimensional network of attributes that can be easily interpreted and (ii) increases the potential replicability of results based on the diversity of attribute combinations defining strong learners with equivalent predictive power. We call this algorithm "Sparse Wrapper AlGorithm" (SWAG).

翻訳日:2022-11-17 21:58:41 公開日:2021-11-01

# 闇に光をもたらす: 統一されたフレームワーク下での知識グラフ埋め込みモデルの大規模評価

Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework ( http://arxiv.org/abs/2006.13365v5 )

ライセンス: Link先を確認

Mehdi Ali, Max Berrendorf, Charles Tapley Hoyt, Laurent Vermue, Mikhail Galkin, Sahand Sharifzadeh, Asja Fischer, Volker Tresp, Jens Lehmann

(参考訳) 最近発表された知識グラフ埋め込みモデルは、実装、訓練、評価の方法が不均一であるため、公正で徹底した比較が難しくなっている。これまでに発表された結果の再現性を評価するため、PyKEENソフトウェアパッケージで21の相互作用モデルを再実装し、評価した。ここでは、報告されたハイパーパラメータで再現できた結果、別のハイパーパラメータでのみ再現できた結果、まったく再現できなかった結果を概説し、その理由についての考察も与える。次に、4つのデータセットで、数千の実験と24,804 GPU時間の計算からなる大規模ベンチマークを行った。ベストプラクティス、各モデルの最良の設定、および以前に発表された最良の設定を改善できる点についての知見を示す。我々の結果は、モデルの性能がモデルアーキテクチャだけで決まるのではなく、モデルアーキテクチャ、訓練手法、損失関数、逆関係の明示的なモデリングの組み合わせが重要であることを浮き彫りにする。いくつかのアーキテクチャは、慎重に設定すれば最先端手法と競合する結果を得られるという証拠を示す。我々の解釈の根拠となるコード、実験設定、結果、分析はすべて https://github.com/pykeen/pykeen と https://github.com/pykeen/benchmarking で公開している。

The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. In order to assess the reproducibility of previously published results, we re-implemented and evaluated 21 interaction models in the PyKEEN software package. Here, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, and we provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model's performance, which is not determined by the model architecture alone. We provide evidence that several architectures can obtain results competitive to the state-of-the-art when configured carefully. We have made all code, experimental configurations, results, and analyses that lead to our interpretations available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking.

翻訳日:2022-11-17 21:32:37 公開日:2021-11-01

# ガウス過程と適応的離散化を用いたパレートアクティブラーニング

Pareto Active Learning with Gaussian Processes and Adaptive Discretization ( http://arxiv.org/abs/2006.14061v2 )

ライセンス: Link先を確認

Andi Nika, Kerem Bozgan, Sepehr Elahi, Çağın Ararat, Cem Tekin

(参考訳) ガウス過程(GP)からサンプリングされたベクトル値目的関数 $\boldsymbol{f}$ を最適化する問題を考える。ここで、GPの添字集合は、設計からなる性質のよいコンパクト距離空間 $({\cal X},d)$ である。 $\boldsymbol{f}$ は事前には未知であり、設計 $x$ で $\boldsymbol{f}$ を評価すると $\boldsymbol{f}(x)$ のノイズを含む観測値が得られると仮定する。 ${\cal X}$ の濃度が大きい場合、全探索によるパレート最適設計の同定は現実的でないため、GPからサンプリングされた関数の滑らかさと $({\cal X},d)$ の構造を利用して高速に学習するアルゴリズム、Adaptive $\boldsymbol{\epsilon}$-PALを提案する。本質的に、Adaptive $\boldsymbol{\epsilon}$-PALは木構造に基づく適応的離散化手法を用いて、できるだけ少ない評価回数で $\boldsymbol{\epsilon}$-正確な設計のパレート集合を同定する。 $\boldsymbol{\epsilon}$-正確なパレート集合の同定に必要なサンプル複雑度について、情報量型と距離次元型の両方のバウンドを与える。また、複数のベンチマークデータセットにおいて、本アルゴリズムが他のパレート集合同定手法を上回ることを実験的に示す。

We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a well-behaved, compact metric space $({\cal X},d)$ of designs. We assume that $\boldsymbol{f}$ is not known beforehand and that evaluating $\boldsymbol{f}$ at design $x$ results in a noisy observation of $\boldsymbol{f}(x)$. Since identifying the Pareto optimal designs via exhaustive search is infeasible when the cardinality of ${\cal X}$ is large, we propose an algorithm, called Adaptive $\boldsymbol{\epsilon}$-PAL, that exploits the smoothness of the GP-sampled function and the structure of $({\cal X},d)$ to learn fast. In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs in as few evaluations as possible. We provide both information-type and metric dimension-type bounds on the sample complexity of $\boldsymbol{\epsilon}$-accurate Pareto set identification. We also experimentally show that our algorithm outperforms other Pareto set identification methods on several benchmark datasets.

翻訳日:2022-11-17 10:07:31 公開日:2021-11-01
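
To make the notion of an $\boldsymbol{\epsilon}$-accurate Pareto set in the entry above more concrete, here is a minimal sketch for a finite list of already known objective vectors (maximization). The function names and the simple coverage rule are assumptions made for illustration. This is not the Adaptive $\boldsymbol{\epsilon}$-PAL algorithm, which works with GP confidence regions and an adaptive tree discretization.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def eps_covers(candidate, points, eps):
    """Simplified epsilon-accuracy check (an assumption of this sketch):
    every point must be matched, up to the slack eps[i] in objective i,
    by at least one member of the candidate set."""
    return all(
        any(all(c_i + e_i >= y_i for c_i, y_i, e_i in zip(c, y, eps)) for c in candidate)
        for y in points
    )


designs = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(designs)
```

With a slack of `(1, 1)` the single design `(2, 4)` already covers every design in this toy list, which is the kind of saving in evaluations that an $\boldsymbol{\epsilon}$-accurate target allows.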

# すべての障害モードが等しく作成される訳ではない: 説明可能な(誤)分類のためのディープニューラルネットワークのトレーニング

Not all Failure Modes are Created Equal: Training Deep Neural Networks for Explicable (Mis)Classification ( http://arxiv.org/abs/2006.14841v2 )

ライセンス: Link先を確認

Alberto Olmo, Sailik Sengupta, Subbarao Kambhampati

(参考訳) ディープニューラルネットワークは、しばしば画像分類タスクで不安定であり、入力を誤分類することが知られている。これらの誤分類は避けられないかもしれないが、全ての障害モードを等しく扱うことはできない。特定の誤分類(例えば、犬の画像を飛行機に分類する)は、人間を困惑させ、システムに対する人間の信頼を損なう。さらに悪いことに、こうした誤り(例えば、人が霊長類として誤分類されること)は、有害な社会的影響をもたらす可能性がある。そこで本研究では,説明不能な誤りを減らすことを目的とする。この課題に対処するために、まず、どのクラスが意味的に近く、どのクラスが遠いのかという人間の期待($M^h$)を捉えたクラスレベルのセマンティクスを得る方法について議論する。 CIFAR-10, CIFAR-100, ImageNetなどの一般的な画像ベンチマークでは, 被験者実験や公開されている人手で整備された知識ベースを活用すれば, クラスレベルのセマンティクスが容易に得られることを示す。第二に,重み付き損失関数(WLF)を用いて,説明不能性の重みに応じて誤分類にペナルティを課す手法を提案する。最後に,提案手法を用いた既存の分類器のトレーニング(あるいは微調整)により,(1)同等のtop-1精度,(2)分布内および分布外(OOD)の両方のテストデータにおけるより説明可能な障害モード,(3)既存の研究に比べて追加の人手ラベルの収集に要するコストの大幅な削減,を備えたディープニューラルネットワークが得られることを示す。

Deep Neural Networks are often brittle on image classification tasks and are known to misclassify inputs. While these misclassifications may be inevitable, not all failure modes can be considered equal. Certain misclassifications (e.g., classifying an image of a dog as an airplane) can perplex humans and result in the loss of human trust in the system. Even worse, some errors (e.g., a person misclassified as a primate) can have odious societal impacts. Thus, in this work, we aim to reduce inexplicable errors. To address this challenge, we first discuss methods to obtain the class-level semantics that capture the human's expectation ($M^h$) regarding which classes are semantically close vs. ones that are far away. We show that for popular image benchmarks (like CIFAR-10, CIFAR-100, ImageNet), class-level semantics can be readily obtained by leveraging either human subject studies or publicly available human-curated knowledge bases. Second, we propose the use of Weighted Loss Functions (WLFs) to penalize misclassifications by the weight of their inexplicability. Finally, we show that training (or fine-tuning) existing classifiers with the proposed methods leads to Deep Neural Networks that (1) have comparable top-1 accuracy, (2) have more explicable failure modes on both in-distribution and out-of-distribution (OOD) test data, and (3) incur significantly less cost in gathering additional human labels compared to existing works.

翻訳日:2022-11-16 20:46:45 公開日:2021-11-01
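
The idea of penalizing misclassifications by how inexplicable they are can be sketched as follows. The abstract does not give the exact form of the Weighted Loss Functions, so the loss below (cross-entropy plus an expected semantic-distance penalty), the `distance` matrix, and the `lam` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(logits, true_class, distance, lam=1.0):
    """Cross-entropy plus lam times the expected semantic distance between
    the true class and the predicted distribution (hypothetical form)."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_class])
    penalty = sum(distance[true_class][j] * p for j, p in enumerate(probs))
    return cross_entropy + lam * penalty


# Toy class order: dog, cat, airplane. Distances are made-up values in [0, 1].
D = [
    [0.0, 0.2, 1.0],
    [0.2, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
loss_dog_as_cat = weighted_loss([0.0, 2.0, 0.0], 0, D)
loss_dog_as_airplane = weighted_loss([0.0, 0.0, 2.0], 0, D)
```

Both toy predictions are equally wrong in terms of plain cross-entropy, but the dog-as-airplane prediction receives the larger loss because of the distance term.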

# 自閉症スペクトラム障害の神経画像診断とリハビリテーションのための深層学習

Deep Learning for Neuroimaging-based Diagnosis and Rehabilitation of Autism Spectrum Disorder: A Review ( http://arxiv.org/abs/2007.01285v4 )

ライセンス: Link先を確認

Marjane Khodatars, Afshin Shoeibi, Delaram Sadeghi, Navid Ghassemi, Mahboobeh Jafari, Parisa Moridian, Ali Khadem, Roohallah Alizadehsani, Assef Zare, Yinan Kong, Abbas Khosravi, Saeid Nahavandi, Sadiq Hussain, U. Rajendra Acharya, Michael Berk

(参考訳) 自閉症スペクトラム障害(ASD)の正確な診断と効果的なリハビリテーションが本疾患の管理に不可欠である。人工知能(AI)技術は、医師が自動診断とリハビリテーションの手順を適用するのを助ける。 AI技術は、従来の機械学習(ML)アプローチとディープラーニング(DL)技術で構成される。従来のml法は様々な特徴抽出と分類技術を用いるが、dlでは特徴抽出と分類のプロセスは知的かつ統合的に達成される。 ASDの診断のためのDL法は神経画像に基づくアプローチに焦点を当てている。神経イメージング技術は、ASD診断に有用な非侵襲性疾患マーカーである。構造的および機能的ニューロイメージング技術は、医師に脳の構造(解剖学と構造的接続)と機能(活動と機能的接続)に関する重要な情報を提供する。脳の複雑な構造と機能のため、DLのような強力なAI技術を活用することなく、神経画像データを用いたASD診断のための最適な手順を提案することは困難である。本稿では,ASDを識別するためのDLネットワークを用いた研究について述べる。 DLネットワークを利用したASD患者を支援するためのリハビリテーションツールも評価した。最後に,ASDの自動検出と修復において重要な課題を提示し,今後の課題を提案する。

Accurate diagnosis of Autism Spectrum Disorder (ASD) followed by effective rehabilitation is essential for the management of this disorder. Artificial intelligence (AI) techniques can help physicians apply automatic diagnosis and rehabilitation procedures. AI techniques comprise traditional machine learning (ML) approaches and deep learning (DL) techniques. Conventional ML methods employ various feature extraction and classification techniques, but in DL, the process of feature extraction and classification is accomplished intelligently and integrally. DL methods for diagnosis of ASD have focused on neuroimaging-based approaches. Neuroimaging techniques are non-invasive disease markers potentially useful for ASD diagnosis. Structural and functional neuroimaging techniques provide physicians with substantial information about the structure (anatomy and structural connectivity) and function (activity and functional connectivity) of the brain. Due to the intricate structure and function of the brain, proposing optimum procedures for ASD diagnosis with neuroimaging data without exploiting powerful AI techniques like DL may be challenging. In this paper, studies conducted with the aid of DL networks to distinguish ASD are investigated. Rehabilitation tools that utilize DL networks to support ASD patients are also assessed. Finally, we present important challenges in the automated detection and rehabilitation of ASD and propose some directions for future work.

翻訳日:2022-11-14 14:02:43 公開日:2021-11-01

# 不確実性推定を用いた変分オートエンコーダによる分布外サンプルの検出

Detecting Out-of-distribution Samples via Variational Auto-encoder with Reliable Uncertainty Estimation ( http://arxiv.org/abs/2007.08128v3 )

ライセンス: Link先を確認

Xuming Ran, Mingkun Xu, Lingrui Mei, Qi Xu, Quanying Liu

(参考訳) 変分オートエンコーダ(VAE)は、ディープニューラルネットワークアーキテクチャとベイズ法から豊かな表現能力を持つ影響のある生成モデルである。しかしながら、VAEモデルは、分布外入力(OOD)に対して、分布外入力(ID)よりも高い確率を割り当てる弱点がある。この問題に対処するため、OOD入力の深い理解には確実な不確実性推定が重要であると考えられる。本研究では,INCPVAEと呼ばれるVAEのエンコーダに統合可能な改良型ノイズコントラッシブ先行(INCP)を提案する。 INCPは拡張性があり、VAEと互換性があり、不確実性評価のためのINCPの利点も採用している。各種データセットに対する実験により,標準のVAEと比較してOODデータの不確実性推定に優れ,異常検出タスクにおいて堅牢であることが示された。 INCPVAEモデルは、OOD入力に対する確実な不確実性を推定し、VAEモデルにおけるOOD問題を解く。

Variational autoencoders (VAEs) are influential generative models with rich representation capabilities derived from deep neural network architectures and Bayesian methods. However, VAE models have the weakness of assigning a higher likelihood to out-of-distribution (OOD) inputs than to in-distribution (ID) inputs. To address this problem, reliable uncertainty estimation is considered critical for an in-depth understanding of OOD inputs. In this study, we propose an improved noise contrastive prior (INCP) that can be integrated into the encoder of VAEs; the resulting model is called INCPVAE. INCP is scalable, trainable and compatible with VAEs, and it also adopts the merits from the INCP for uncertainty estimation. Experiments on various datasets demonstrate that, compared to standard VAEs, our model is superior in uncertainty estimation for OOD data and is robust in anomaly detection tasks. The INCPVAE model obtains reliable uncertainty estimation for OOD inputs and solves the OOD problem in VAE models.

翻訳日:2022-11-09 22:04:06 公開日:2021-11-01

# クロスドメイン少数ショット認識のための中間レベルパターンの再検討

Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition ( http://arxiv.org/abs/2008.03128v4 )

ライセンス: Link先を確認

Yixiong Zou, Shanghang Zhang, JianPeng Yu, Yonghong Tian, José M. F. Moura

(参考訳) 既存のマイノリティ・ショット・ラーニング(fsl)メソッドは通常ベースクラスを想定し、新しいクラスは同じドメイン(ドメイン内設定)からのものである。しかし、実際には、いくつかの特別なドメインがベースクラスを構築するのに十分なトレーニングサンプルを集めることは不可能である。この問題を解決するために, 一般ドメインベースクラスから特殊ドメイン新規クラスへ知識を転送するために, クロスドメインfsl (cdfsl) が最近提案されている。既存のcdfslは主に近接ドメイン間の転送に重点を置いているが、実際のアプリケーションで新しいクラスが現れる場合、遠隔ドメイン間の転送を考えることは稀であり、さらに難しい。本稿では,新しいクラスがベースクラスから離れた領域にあるcdfslの難解なサブセットを,メインストリームfsl作業においてより転送可能でありながら未検討である中レベルの特徴を再検討することで検討する。中間レベルの特徴の識別性を高めるために,各サンプルの識別情報を学ぶために,中間レベルの特徴を奨励する残差予測タスクを提案する。特に、このメカニズムはドメイン内のFSLやCDFSLに近いドメインでも有効である。したがって、同じトレーニングフレームワークの下で、クロスドメインFSLとインドメインFSLの両方に2種類の機能を提供します。 2つの挑戦的な医療データセットを含む6つの公開データセットの両方の設定下での実験は、我々の理論的根拠を検証し、最先端のパフォーマンスを示す。コードはリリースされる。

Existing few-shot learning (FSL) methods usually assume that base classes and novel classes are from the same domain (in-domain setting). However, in practice, it may be infeasible to collect sufficient training samples for some special domains to construct base classes. To solve this problem, cross-domain FSL (CDFSL) was proposed very recently to transfer knowledge from general-domain base classes to special-domain novel classes. Existing CDFSL works mostly focus on transferring between near domains and rarely consider transferring between distant domains, which is needed in practice, since any novel classes could appear in real-world applications, and which is even more challenging. In this paper, we study a challenging subset of CDFSL where the novel classes are in distant domains from base classes, by revisiting the mid-level features, which are more transferable yet under-explored in mainstream FSL work. To boost the discriminability of mid-level features, we propose a residual-prediction task to encourage mid-level features to learn discriminative information of each sample. Notably, such a mechanism also benefits in-domain FSL and CDFSL in near domains. Therefore, we provide two types of features for cross-domain and in-domain FSL respectively, under the same training framework. Experiments under both settings on six public datasets, including two challenging medical datasets, validate our rationale and demonstrate state-of-the-art performance. Code will be released.

翻訳日:2022-11-02 01:40:12 公開日:2021-11-01

# Berrut Approximated Coded Computing: 多項式コンピューティング以外のストラグラー耐性

Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing ( http://arxiv.org/abs/2009.08327v3 )

ライセンス: Link先を確認

Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

(参考訳) 大規模データセットで複雑なモデルをトレーニングするために分散学習を使用する際の大きな課題のひとつは、ストラグラー効果に対処することだ。解法として,計算タスクに冗長性を効率的に付加するコード計算が最近提案されている。この技術では、符号化はデータセットにまたがって使用され、一定の大きさのワーカーノードの任意のサブセットの結果が最終的な結果を取り戻すのに十分であるように、符号化されたデータ上で計算される。これらのアプローチの主な課題は、(1)多項式関数の計算に限られていること、(2)データのサイズとモデル複雑性(多項式の次数)の乗算によって、待機するサーバのサブセットのサイズが大きくなること、(3)実数上での計算では数値的に安定ではないこと、である。本稿では,多項式関数計算に限らない別の手法として,berrut近似符号化計算(bacc)を提案する。さらに、マスターノードは、利用可能なワーカーノードの任意のサブセットの結果を用いて、最終的な結果を概ね計算することができる。近似アプローチは計算量の低い数値的に安定であることが証明されている。また,分散学習問題などの異なる環境でのシミュレーション結果を用いて,近似の精度を理論的に確立し検証した。特に、baccはサーバーのクラスタ上でディープニューラルネットワークをトレーニングするために使われ、収束率の点で繰り返し計算(繰り返し符号化)を上回っています。

One of the major challenges in using distributed learning to train complicated models with large data sets is dealing with the straggler effect. As a solution, coded computation has recently been proposed to efficiently add redundancy to the computation tasks. In this technique, coding is used across data sets, and computation is done over coded data, such that the results of an arbitrary subset of worker nodes with a certain size are enough to recover the final results. The major challenges with those approaches are that (1) they are limited to polynomial function computations, (2) the size of the subset of servers that we need to wait for grows with the product of the size of the data set and the model complexity (the degree of the polynomial), which can be prohibitively large, and (3) they are not numerically stable for computation over real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC) as an alternative approach, which is not limited to polynomial function computation. In addition, the master node can approximately calculate the final results using the outcomes of any arbitrary subset of available worker nodes. The approximation approach is proven to be numerically stable with low computational complexity. In addition, the accuracy of the approximation is established theoretically and verified by simulation results in different settings such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, where it outperforms repetitive computation (repetition coding) in terms of the rate of convergence.

翻訳日:2022-10-17 12:25:32 公開日:2021-11-01
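
The name BACC refers to Berrut's rational interpolant, which can be evaluated from any subset of sample points. The sketch below implements only that interpolation formula, $r(x) = \sum_i \frac{(-1)^i}{x - x_i} f_i \,/\, \sum_i \frac{(-1)^i}{x - x_i}$, in plain Python. How BACC chooses the encoding and decoding points is not stated in the abstract. The choice here to re-index the alternating signs over whichever nodes are available is an assumption of this sketch.

```python
def berrut_interpolate(nodes, values, x):
    """Evaluate Berrut's rational interpolant through (nodes[i], values[i]) at x.

    nodes must be distinct and sorted in increasing order; the alternating
    signs (-1)**i follow that ordering.
    """
    numerator = 0.0
    denominator = 0.0
    for i, (x_i, f_i) in enumerate(zip(nodes, values)):
        if x == x_i:
            return f_i  # the interpolant passes through the data exactly
        weight = (-1.0) ** i / (x - x_i)
        numerator += weight * f_i
        denominator += weight
    return numerator / denominator


# Toy "worker" results: f evaluated at five points, with worker 2 straggling.
all_nodes = [-1.0, -0.5, 0.0, 0.5, 1.0]
results = {0: 1.0, 1: 0.25, 3: 0.25, 4: 1.0}  # f(x) = x * x; index 2 is missing
available = sorted(results)
sub_nodes = [all_nodes[i] for i in available]
sub_values = [results[i] for i in available]
estimate = berrut_interpolate(sub_nodes, sub_values, 0.0)  # approximate f(0)
```

The estimate is only an approximation of the missing value, which matches the abstract's point that the master node recovers the result approximately rather than exactly.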

# 小データを用いた幼児ポーズ推定のための不変表現学習

Invariant Representation Learning for Infant Pose Estimation with Small Data ( http://arxiv.org/abs/2010.06100v5 )

ライセンス: Link先を確認

Xiaofei Huang, Nihang Fu, Shuangjun Liu, Sarah Ostadabbas

(参考訳) 幼児の運動分析は、幼児の発達研究において重要な話題である。しかしながら、人間のポーズ推定の応用はますます広くなってきているが、大規模成人のポーズデータセットでトレーニングされたモデルは、体比とポーズの多用途性が著しく異なるため、幼児のポーズの推定にほとんど成功していない。さらに、プライバシとセキュリティの考慮事項は、堅牢なモデルのトレーニングに必要な適切な幼児ポーズデータの提供をゼロから妨げている。そこで本稿では,1) 幼児用合成画像と, 生成した合成幼児用画像とを組み合わせたハイブリッド合成・実幼児用画像(syrip)データセットの構築と公開を行い, (2) 隣接する成人用画像と合成幼児用画像の知識を, 微調整型ドメイン対応幼児用画像(fidip)推定モデルに転送できる多段階不変表現学習戦略を提案する。我々は,SyRIPデータセットでトレーニングされたモデルと同一のネットワーク構造を用いたアブレーション研究を行い,他の公立幼児ポーズデータセットでトレーニングされたモデルよりも顕著な改善を示した。複雑度が異なるポーズ推定バックボーンネットワークと統合されたfidipは、これらのモデルの微調整バージョンよりも一貫してパフォーマンスが良い。最新のDarkPoseモデルを用いた幼児のポーズ推定では、平均的精度(mAP)は93.6である。

Infant motion analysis is a topic with critical importance in early childhood development studies. However, while the applications of human pose estimation have become more and more broad, models trained on large-scale adult pose datasets are barely successful in estimating infant poses due to the significant differences in their body ratio and the versatility of their poses. Moreover, the privacy and security considerations hinder the availability of adequate infant pose data required for training of a robust model from scratch. To address this problem, this paper presents (1) building and publicly releasing a hybrid synthetic and real infant pose (SyRIP) dataset with small yet diverse real infant images as well as generated synthetic infant poses and (2) a multi-stage invariant representation learning strategy that could transfer the knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose (FiDIP) estimation model. In our ablation study, with identical network structure, models trained on SyRIP dataset show noticeable improvement over the ones trained on the only other public infant pose datasets. Integrated with pose estimation backbone networks with varying complexity, FiDIP performs consistently better than the fine-tuned versions of those models. One of our best infant pose estimation performers on the state-of-the-art DarkPose model shows mean average precision (mAP) of 93.6.

翻訳日:2022-10-07 23:58:23 公開日:2021-11-01

# 言語間一般化改善のための多言語BERT言語句の探索

Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization ( http://arxiv.org/abs/2010.10041v4 )

ライセンス: Link先を確認

Chi-Liang Liu and Tsung-Yuan Hsu and Yung-Sung Chuang and Chung-Yi Li and Hung-yi Lee

(参考訳) 多言語BERT (m-BERT) には、言語情報と意味情報の両方が含まれている。我々は、言語のトークンの埋め込みを平均化することによって、言語の表現を得ることができることを見出した。この言語表現を前提として、トークン埋め込みを操作することで多言語BERTの出力言語を制御し、教師なしトークン翻訳を実現する。さらに、この観測に基づいて、m-BERTの言語間能力を改善するために、計算的に安価で効果的なアプローチを提案する。

Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information. We find that the representation of a language can be obtained by simply averaging the embeddings of the tokens of the language. Given this language representation, we control the output languages of multilingual BERT by manipulating the token embeddings, thus achieving unsupervised token translation. We further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT based on this observation.

翻訳日:2022-10-05 05:52:23 公開日:2021-11-01
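
The two observations in the entry above (a language is represented by the mean of its token embeddings, and shifting token embeddings changes the output language) can be sketched with toy vectors. The 3-dimensional embeddings, the `shift_language` rule, and the nearest-neighbour lookup are assumptions for illustration; a real experiment would use m-BERT hidden states, and the exact manipulation used by the authors is not given in the abstract.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def shift_language(embedding, src_mean, tgt_mean, alpha=1.0):
    """Move an embedding away from the source-language mean and toward the target one."""
    return [e - alpha * s + alpha * t for e, s, t in zip(embedding, src_mean, tgt_mean)]


def nearest(query, vocab):
    """Return the token in vocab whose embedding is closest to query (squared L2)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(vocab, key=lambda token: sqdist(query, vocab[token]))


# Toy embeddings: the last coordinate plays the role of the "language" direction.
english = {"dog": [1.0, 0.0, 5.0], "cat": [0.0, 1.0, 5.0]}
german = {"Hund": [1.0, 0.0, -5.0], "Katze": [0.0, 1.0, -5.0]}

en_mean = mean_vector(list(english.values()))
de_mean = mean_vector(list(german.values()))
translated = nearest(shift_language(english["dog"], en_mean, de_mean), german)
```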

# 教師なし複数質問に対する回答:基礎知識から学び始める

Unsupervised Multiple Choices Question Answering: Start Learning from Basic Knowledge ( http://arxiv.org/abs/2010.11003v2 )

ライセンス: Link先を確認

Chi-Liang Liu and Hung-yi Lee

(参考訳) 本稿では,ほぼ教師なしの多肢選択式質問応答(MCQA: Multiple Choices Question Answering)の可能性について検討する。 MCQAモデルは、非常に基本的な知識から出発して、ある選択肢が他の選択肢よりも正しい確率が高いことを知る。この情報は、非常にノイズが多いものの、MCQAモデルのトレーニングを導く。提案手法は RACE においてベースライン手法よりも優れており,MC500 では一部の教師あり学習手法にも匹敵する。

In this paper, we study the possibility of almost unsupervised Multiple Choices Question Answering (MCQA). Starting from very basic knowledge, the MCQA model knows that some choices have higher probabilities of being correct than others. This information, though very noisy, guides the training of an MCQA model. The proposed method is shown to outperform the baseline approaches on RACE and is even comparable with some supervised learning approaches on MC500.

翻訳日:2022-10-04 22:49:39 公開日:2021-11-01

# 分散予測のための因果意味表現の学習

Learning Causal Semantic Representation for Out-of-Distribution Prediction ( http://arxiv.org/abs/2011.01681v5 )

ライセンス: Link先を確認

Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu

(参考訳) 従来の教師付き学習法、特に深層学習法は、学習された表現がドメイン固有の相関によって意味的要因と変動要因を混合し、意味的要素のみがアウトオブディストリビューション(ood)の例に敏感であることが判明した。この問題を解決するために,因果推論に基づく因果意味生成モデル(CSG)を提案し,その2つの要因を個別にモデル化し,共通かつ困難な単一トレーニング領域からのOOD予測手法を開発する。これらの手法は因果不変の原理に基づいており、効率的な学習と容易な予測のための変分ベイズにおける新しい設計である。理論的には、ある条件下では、CSGはトレーニングデータに適合させることで意味的因子を識別できることを証明し、この意味的識別はOOD一般化誤差の有界性と適応の成功を保証する。実証実験では、OOD性能は一般的なベースラインよりも向上した。

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG) based on causal reasoning so that the two factors are modeled separately, and develop methods for OOD prediction from a single training domain, which is common and challenging. The methods are based on the causal invariance principle, with a novel design in variational Bayes for both efficient learning and easy prediction. Theoretically, we prove that under certain conditions, CSG can identify the semantic factor by fitting training data, and this semantic identification guarantees the boundedness of the OOD generalization error and the success of adaptation. An empirical study shows improved OOD performance over prevailing baselines.

翻訳日:2022-09-30 03:41:39 公開日:2021-11-01

# 量子畳み込みニューラルネットワークにおける不規則高原の欠如

Absence of Barren Plateaus in Quantum Convolutional Neural Networks ( http://arxiv.org/abs/2011.02966v2 )

ライセンス: Link先を確認

Arthur Pesah, M. Cerezo, Samson Wang, Tyler Volkoff, Andrew T. Sornborger, Patrick J. Coles

(参考訳) 量子ニューラルネットワーク(QNN)は、量子データを効率的に分析する可能性に興奮を引き起こしている。しかし、この興奮は、多くのqnnアーキテクチャにおいて、barren plateau landscapesとして知られる指数関数的に消失する勾配の存在によって温められている。近年、量子畳み込みニューラルネットワーク(qcnns)が提案されており、関連するデータ特徴に関する情報を保存しながら量子ビット数を削減する畳み込み層とプール層が連なる。本研究では,qcnnアーキテクチャにおけるパラメータの勾配スケーリングを厳密に解析する。勾配のばらつきは多項式よりも早く消えることが分かり、QCNNが不規則な高原を示さないことが示唆された。これは、他の多くのQNNアーキテクチャとは異なり、ランダムに初期化されたQCNNのトレーニング可能性に関する分析的な保証を提供する。本研究の結果を導出するために,Haar分散ユニタリに対する期待値を解析するグラフベースの新しい手法を導入する。最後に,解析結果を検証するために数値シミュレーションを行う。

Quantum neural networks (QNNs) have generated excitement around the possibility of efficiently analyzing quantum data. But this excitement has been tempered by the existence of exponentially vanishing gradients, known as barren plateau landscapes, for many QNN architectures. Recently, Quantum Convolutional Neural Networks (QCNNs) have been proposed, involving a sequence of convolutional and pooling layers that reduce the number of qubits while preserving information about relevant data features. In this work we rigorously analyze the gradient scaling for the parameters in the QCNN architecture. We find that the variance of the gradient vanishes no faster than polynomially, implying that QCNNs do not exhibit barren plateaus. This provides an analytical guarantee for the trainability of randomly initialized QCNNs, which highlights QCNNs as being trainable under random initialization unlike many other QNN architectures. To derive our results we introduce a novel graph-based method to analyze expectation values over Haar-distributed unitaries, which will likely be useful in other contexts. Finally, we perform numerical simulations to verify our analytical results.

翻訳日:2022-09-29 11:48:56 公開日:2021-11-01

# 高次ボロノイ図を用いた$k$-Nearest近傍分類器の逆例

Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams ( http://arxiv.org/abs/2011.09719v2 )

ライセンス: Link先を確認

Chawin Sitawarin, Evgenios M. Kornaropoulos, Dawn Song, David Wagner

(参考訳) 逆例は機械学習モデルにおいて広く研究されている現象である。注目を集めているのはニューラルネットワークだが、他の実用的なモデルもこの問題に悩まされている。そこで本研究では,最小ノルムの逆数例を求めるために,$k$-nearest 近傍分類の逆数ロバスト性を評価するアルゴリズムを提案する。従来の提案と異なり,与えられた入力点から外へ拡大する探索を行うことにより幾何学的アプローチをとる。高いレベルでは、探索半径は、入力点と異なる分類を行う細胞を見つけるまで、近くのボロノイ細胞へと拡大する。アルゴリズムを大規模な$k$にスケールするために、様々なデータセットにおいて、ベースラインと比較してより少ないノルムで摂動を求める近似ステップを導入する。さらに、我々のアプローチが競合より優れているデータセットの構造特性を分析する。

Adversarial examples are a widely studied phenomenon in machine learning models. While most of the attention has been focused on neural networks, other practical models also suffer from this issue. In this work, we propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification, i.e., finding a minimum-norm adversarial example. Diverging from previous proposals, we take a geometric approach by performing a search that expands outwards from a given input point. On a high level, the search radius expands to the nearby Voronoi cells until we find a cell that classifies differently from the input point. To scale the algorithm to a large $k$, we introduce approximation steps that find perturbations with smaller norm, compared to the baselines, in a variety of datasets. Furthermore, we analyze the structural properties of a dataset where our approach outperforms the competition.

翻訳日:2022-09-23 20:16:56 公開日:2021-11-01

# データ効率の高い電波銀河の分類

Data-Efficient Classification of Radio Galaxies ( http://arxiv.org/abs/2011.13311v2 )

ライセンス: Link先を確認

Ashwin Samudre, Lijo George, Mahak Bansal, Yogesh Wadadekar

(参考訳) 電波銀河からの連続放出は、一般的にFRI、FRII、ベント、コンプレックスなどの異なる形態分類に分類される。本稿では,小規模データセット($\sim 2000$ sample)を用いた深層学習法を用いて,形態学に基づく電波銀河分類の課題について検討する。本研究では, サイクリック学習率や識別学習などの高度な技術を用いた事前訓練DenseNetモデルを用いて, ツインネットワークに基づく数ショット学習手法を適用し, モデルを高速に学習する。我々は、ベント型銀河とFRII型銀河の最大の混同源である最高の性能モデルを用いて、92\%以上の分類精度を達成する。私たちの結果は、小さながキュレーションされたデータセットに焦点を合わせることで、ニューラルネットワークのトレーニングにベストプラクティスを使うことが、よい結果をもたらすことを示しています。自動分類技術は、近日中に数十万個の新しい電波銀河を検出すると期待されている次世代の電波望遠鏡による調査に欠かせない。

The continuum emission from radio galaxies can be generally classified into different morphological classes such as FRI, FRII, Bent, or Compact. In this paper, we explore the task of radio galaxy classification based on morphology using deep learning methods with a focus on using a small scale dataset ($\sim 2000$ samples). We apply few-shot learning techniques based on Twin Networks and transfer learning techniques using a pre-trained DenseNet model with advanced techniques like cyclical learning rate and discriminative learning to train the model rapidly. We achieve a classification accuracy of over 92\% using our best performing model with the biggest source of confusion being between Bent and FRII type galaxies. Our results show that focusing on a small but curated dataset along with the use of best practices to train the neural network can lead to good results. Automated classification techniques will be crucial for upcoming surveys with next generation radio telescopes which are expected to detect hundreds of thousands of new radio galaxies in the near future.

翻訳日:2022-09-20 12:35:12 公開日:2021-11-01
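
The abstract above names a cyclical learning rate among its training techniques but gives no schedule. As a hedged illustration, the sketch below implements the commonly used triangular policy, in which the rate rises linearly from `base_lr` to `max_lr` over `step_size` iterations and then falls back. The parameter names and values are assumptions, not the authors' settings.

```python
import math


def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate: one full cycle lasts 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Hypothetical settings: sweep between 0.001 and 0.01 with a half-cycle of 100 iterations.
schedule = [triangular_lr(i, 100, 0.001, 0.01) for i in range(0, 401, 50)]
```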

# (参考訳) PerSpeechNorm: ペルシャ語の音声処理正規化ツールキット

PerSpeechNorm: A Persian Toolkit for Speech Processing Normalization ( http://arxiv.org/abs/2111.03470v1 )

ライセンス: CC BY 4.0

Romina Oji, Seyedeh Fatemeh Razavi, Sajjad Abdi Dehsorkh, Alireza Hariri, Hadi Asheri, Reshad Hosseini

(参考訳) 一般に、音声処理モデルは音響モデルとともに言語モデルで構成される。言語モデルの複雑さとバリエーションに関わらず、クリーニング、正規化、トークン化という3つの重要な前処理ステップが言語モデルで必要である。上述のステップの中で、正規化ステップは、純粋なテキストアプリケーションで統一されたフォーマットに不可欠である。しかし、音声処理モジュールの組み込み言語モデルでは、正規化は形式統一に限定されない。さらに、読みやすいシンボル、番号等を、どのように発音するかに変換する必要がある。音声処理モジュールに組み込み言語モデルのためのペルシア正規化ツールキットは存在しないので,本論文では,音声処理におけるテキスト処理のためのオープンソース正規化ツールキットを提案する。簡潔に言えば、記号(普通通貨、#、@、urlなど)、数字(日付、時間、電話番号、国定コードなど)といった異なる読みやすいペルシア語のテキストを考える。他のペルシア語テキスト正規化ツールとの比較は、音声処理における提案手法の優位性を示している。また,提案した関数の1つ(文分離)に対するモデルの性能を,HAZMやParsivarといった他の共通自然言語ライブラリと比較すると,提案手法の適切な性能を示す。さらに,ペルシャ語ウィキペディアデータの評価により,提案手法の適切な性能が確認された。

In general, speech processing models consist of a language model along with an acoustic model. Regardless of the language model's complexity and variants, three critical pre-processing steps are needed in language models: cleaning, normalization, and tokenization. Among these steps, normalization is essential for format unification in pure textual applications. However, for embedded language models in speech processing modules, normalization is not limited to format unification. It also has to convert each readable symbol, number, etc., to the way it is pronounced. To the best of our knowledge, there is no Persian normalization toolkit for embedded language models in speech processing modules, so in this paper we propose an open-source normalization toolkit for text processing in speech applications. Briefly, we consider different kinds of readable Persian text, such as symbols (common currencies, #, @, URL, etc.), numbers (date, time, phone number, national code, etc.), and so on. Comparison with other available Persian textual normalization tools indicates the superiority of the proposed method in speech processing. Also, comparing the model's performance for one of the proposed functions (sentence separation) with other common natural language libraries such as HAZM and Parsivar indicates the proper performance of the proposed method. Besides, its evaluation on some Persian Wikipedia data confirms the proper performance of the proposed method.

翻訳日:2021-11-14 15:44:28 公開日:2021-11-01

# 整数計画のための大近所探索ポリシーの学習

Learning Large Neighborhood Search Policy for Integer Programming ( http://arxiv.org/abs/2111.03466v1 )

ライセンス: Link先を確認

Yaoxin Wu, Wen Song, Zhiguang Cao and Jie Zhang

(参考訳) 本稿では,整数プログラミング (IP) のための大規模近傍探索 (LNS) ポリシーを学習するための深層強化学習 (RL) 手法を提案する。 RLポリシーはデフォールト演算子として訓練され、各ステップで変数のサブセットを選択し、IPソルバによって修復演算子として再最適化される。しかし、可変部分集合の組合せ数は典型的なrlアルゴリズムの直接適用を妨げている。この課題に取り組むために、私たちはすべてのサブセットを各変数のバイナリ決定に分解することで表現します。次に,各変数のポリシを並列に学習するためにニューラルネットワークを設計,カスタマイズされたアクタ-クリティックアルゴリズムでトレーニングする。提案手法を4つの代表的IP問題に対して評価する。結果は、SCIPよりもはるかに少ない時間でより良いソリューションを見つけることができ、同じランタイムで他のLSSベースラインよりも大幅に優れていることを示している。さらに、これらの利点は、ポリシーがより大きな問題に一般化するときに特に持続する。また、gurobiによるさらなる実験により、この最先端の商用解法を同じ時間内に実現できることが判明した。

We propose a deep reinforcement learning (RL) method to learn large neighborhood search (LNS) policy for integer programming (IP). The RL policy is trained as the destroy operator to select a subset of variables at each step, which is reoptimized by an IP solver as the repair operator. However, the combinatorial number of variable subsets prevents direct application of typical RL algorithms. To tackle this challenge, we represent all subsets by factorizing them into binary decisions on each variable. We then design a neural network to learn policies for each variable in parallel, trained by a customized actor-critic algorithm. We evaluate the proposed method on four representative IP problems. Results show that it can find better solutions than SCIP in much less time, and significantly outperform other LNS baselines with the same runtime. Moreover, these advantages notably persist when the policies generalize to larger problems. Further experiments with Gurobi also reveal that our method can outperform this state-of-the-art commercial solver within the same time limit.

翻訳日:2021-11-14 15:12:53 公開日:2021-11-01
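
The destroy-and-repair loop described in the entry above, with the subset choice factorized into one binary decision per variable, can be sketched on a toy 0/1 knapsack. Everything specific here is an assumption for illustration: the learned policy is replaced by independent coin flips with probability `p`, and the IP solver used as repair operator is replaced by brute-force enumeration over the freed variables.

```python
import itertools
import random


def objective(values, weights, capacity, x):
    """Knapsack value of assignment x, or -inf if the capacity is exceeded."""
    if sum(w * x_i for w, x_i in zip(weights, x)) > capacity:
        return float("-inf")
    return sum(v * x_i for v, x_i in zip(values, x))


def repair(values, weights, capacity, x, free):
    """Re-optimize the freed variables exactly while keeping all others fixed."""
    best, best_val = list(x), objective(values, weights, capacity, x)
    for bits in itertools.product([0, 1], repeat=len(free)):
        candidate = list(x)
        for index, bit in zip(free, bits):
            candidate[index] = bit
        val = objective(values, weights, capacity, candidate)
        if val > best_val:
            best, best_val = candidate, val
    return best


def lns(values, weights, capacity, steps=20, p=0.5, seed=0):
    """Large neighborhood search: destroy (pick variables), then repair (re-optimize)."""
    rng = random.Random(seed)
    x = [0] * len(values)  # the empty knapsack is always feasible
    for _ in range(steps):
        # Destroy step: one independent binary decision per variable.
        free = [i for i in range(len(values)) if rng.random() < p]
        x = repair(values, weights, capacity, x, free)
    return x


values, weights, capacity = [10, 13, 7, 8], [3, 4, 2, 3], 7
solution = lns(values, weights, capacity)
```

Because the repair step never accepts a worse assignment, the objective is non-decreasing across steps. A learned destroy policy would aim to pick the variables whose re-optimization yields the largest improvement.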

# RADAMS:IDoS攻撃に対するレジリエントで適応的なアラートと注意管理戦略

RADAMS: Resilient and Adaptive Alert and Attention Management Strategy against Informational Denial-of-Service (IDoS) Attacks ( http://arxiv.org/abs/2111.03463v1 )

ライセンス: Link先を確認

Linan Huang and Quanyan Zhu

(参考訳) 人間の注意欠陥を利用した攻撃は、サイバーセキュリティに深刻な脅威をもたらしている。本研究では,人間の操作を過負荷にし,実際の攻撃を隠蔽するために大量のフェント攻撃を発生させるIDoS攻撃という,新たなタイプのアクティブアタック攻撃を特定し,正式に定義する。人間の要因(例えば、専門知識、ストレス、効率のレベル)と経験的結果(例えば、ヤークス・ドッドソンの法則とサンクコスト誤認)を組み込んで、オペレータの注意力のダイナミクスとその意思決定プロセスとそのリアルタイムの警告監視と検査をモデル化します。そこで我々は,警告の可観測性に基づいて警告を選択的に強調するResilient and Adaptive Data-driven alert and Attention Management Strategy (RADAMS)を開発した。 RADAMSは強化学習を使用して、様々な人間のオペレータ向けにカスタマイズされた、転送可能な設計を実現し、IDoS攻撃を進化させる。統合モデリングと理論的分析は、製品原則(Product Principle of Attention, PPoA)、基本的限界、重要な人的・経済的要因間のトレードオフにつながる。実験結果は,提案手法がデフォルト戦略を上回り,最大20%のidosリスクを低減できることを示した。さらに、この戦略はコスト、攻撃頻度、人的注意力の多様さに耐性がある。我々は,注意リスク等価性,攻撃者のジレンマ,半真正銘の最適攻撃戦略などの興味深い現象を認識した。

Attacks exploiting human attentional vulnerability have posed severe threats to cybersecurity. In this work, we identify and formally define a new type of proactive attentional attacks called Informational Denial-of-Service (IDoS) attacks that generate a large volume of feint attacks to overload human operators and hide real attacks among feints. We incorporate human factors (e.g., levels of expertise, stress, and efficiency) and empirical results (e.g., the Yerkes-Dodson law and the sunk cost fallacy) to model the operators' attention dynamics and their decision-making processes along with the real-time alert monitoring and inspection. To assist human operators in timely and accurately dismissing the feints and escalating the real attacks, we develop a Resilient and Adaptive Data-driven alert and Attention Management Strategy (RADAMS) that de-emphasizes alerts selectively based on the alerts' observable features. RADAMS uses reinforcement learning to achieve a customized and transferable design for various human operators and evolving IDoS attacks. The integrated modeling and theoretical analysis lead to the Product Principle of Attention (PPoA), fundamental limits, and the tradeoff among crucial human and economic factors. Experimental results corroborate that the proposed strategy outperforms the default strategy and can reduce the IDoS risk by as much as 20%. Besides, the strategy is resilient to large variations of costs, attack frequencies, and human attention capacities. We have recognized interesting phenomena such as attentional risk equivalency, attacker's dilemma, and the half-truth optimal attack strategy.

翻訳日:2021-11-14 15:12:11 公開日:2021-11-01

# 深層学習による繊維強化複合材料の応力場予測

Stress field prediction in fiber-reinforced composite materials using a deep learning approach ( http://arxiv.org/abs/2111.05271v1 )

ライセンス: Link先を確認

Anindya Bhaduri, Ashwini Gupta, Lori Graham-Brady

(参考訳) 計算応力解析は材料システム設計における重要なステップである。有限要素法 (FEM) は複雑な材料系の応力解析を行う標準的な手法である。ストレス分析を加速する方法は、femをデータ駆動機械学習ベースのストレス分析アプローチに置き換えることである。本研究では, 繊維強化マトリックス複合材料システムについて考察し, 深層学習ツールを用いて応力場予測のためのFEM手法の代替手法を提案する。まず, 空間構成の異なる繊維の固定数の複合材料系に対する応力場マップの予測を試みた。具体的には,複合材料中の繊維の空間配置と対応するフォン・ミセス応力場とのマッピングを試みた。これは畳み込みニューラルネットワーク(CNN)、特にU-Netアーキテクチャを使用して、トレーニングデータと同じ数のファイバーを持つシステムの真のストレスマップを使用して達成される。 u-netはエンコーダ・デコーダネットワークであり,本研究では複合材料イメージを入力として入力画像と同じ大きさの応力場画像を出力する。トレーニングサンプルの異なる初期化を行い,少数のトレーニングサンプルに対する予測精度の感度を求めることにより,ロバスト性解析を行う。複合材料系の繊維数が同じ体積率で増加すると、その形状を正確に表現するためには、より微細な有限要素メッシュ離散化が必要である。これにより計算コストが増大する。そこで, 本研究の目的は, 比較的安価な繊維数が少ない系の真の応力マップからの情報を用いて, 空間構成の異なる繊維数が多いシステムの応力場を予測することである。

Computational stress analysis is an important step in the design of material systems. The finite element method (FEM) is a standard approach for performing stress analysis of complex material systems. One way to accelerate stress analysis is to replace FEM with a data-driven, machine-learning-based approach. In this study, we consider a fiber-reinforced matrix composite material system and use deep learning tools to find an alternative to the FEM approach for stress field prediction. We first try to predict stress field maps for composite material systems with a fixed number of fibers in varying spatial configurations. Specifically, we try to find a mapping between the spatial arrangement of the fibers in the composite material and the corresponding von Mises stress field. This is achieved by using a convolutional neural network (CNN), specifically a U-Net architecture, with true stress maps of systems with the same number of fibers as training data. U-Net is an encoder-decoder network which in this study takes the composite material image as input and outputs a stress field image of the same size as the input image. We perform a robustness analysis by taking different initializations of the training samples to find the sensitivity of the prediction accuracy to the small number of training samples. When the number of fibers in the composite material system is increased for the same volume fraction, a finer finite element mesh discretization is required to represent the geometry accurately. This leads to an increase in the computational cost. Thus, the secondary goal here is to predict the stress field for systems with a larger number of fibers in varying spatial configurations, using information from the true stress maps of relatively cheaper systems with fewer fibers.

翻訳日:2021-11-14 15:10:47 公開日:2021-11-01

# テクノロジーの世代交代:コンピュータ科学と神経外科、そしてVRのユースケース

Generational Frameshifts in Technology: Computer Science and Neurosurgery, The VR Use Case ( http://arxiv.org/abs/2110.15719v2 )

ライセンス: Link先を確認

Samuel R. Browd, Maya Sharma, Chetan Sharma

(参考訳) 私たちは、神経外科の実践を変えるために協力的に集結する技術が合流する、歴史上のユニークな瞬間にいます。これらの技術変革は、神経外科の術中パフォーマンス向上ツールや方法の改善、非同期神経外科訓練とシミュレーションのためのスケーラブルなソリューション、および、品質評価、請求書作成、結果測定、外科的ベストプラクティスの普及などの基本的変化を可能にする手術データの広範囲にわたる集約を含む、全面的に導入される。手術の詳細を把握し,手術の各部位を解析しながら,より安全かつ効率的に手術を行う能力は,当科の領域とすべての外科専門分野に全く新しい画期的な展開をもたらす。手術室内の全てのコンポーネントのデジタル化により、コンピュータや計算科学の様々な分野を活用して、位置に関係なく高品質な神経外科治療のケアと提供を改善する新たな洞察を得ることができる。神経外科の民主化は進行中であり、現代の世界のこれらのツールの開発、抽出、導入によって推進されるでしょう。仮想現実(virtual reality)は、消費者が直面するテクノロジーが、産業や医療において明確な役割を担っていることを示す良い例であり、人間の能力と相互作用をスケールするための新しいパラダイムを作る様々なコンピュータサイエンス技術の融合の顕著な例である。著者らは、近い将来のオペレーティングルームを実現するために必要な、無数の計算科学とデータ科学を紹介、紹介するテクノロジエコシステムについて説明している。

We are at a unique moment in history where there is a confluence of technologies which will synergistically come together to transform the practice of neurosurgery. These technological transformations will be all-encompassing, including improved tools and methods for intraoperative performance of neurosurgery, scalable solutions for asynchronous neurosurgical training and simulation, as well as broad aggregation of operative data allowing fundamental changes in quality assessment, billing, outcome measures, and dissemination of surgical best practices. The ability to perform surgery more safely and more efficiently while capturing the operative details and parsing each component of the operation will open an entirely new epoch advancing our field and all surgical specialties. The digitization of all components within the operating room will allow us to leverage the various fields within computer and computational science to obtain new insights that will improve care and delivery of the highest quality neurosurgery regardless of location. The democratization of neurosurgery is at hand and will be driven by our development, extraction, and adoption of these tools of the modern world. Virtual reality provides a good example of how consumer-facing technologies are finding a clear role in industry and medicine and serves as a notable example of the confluence of various computer science technologies creating a novel paradigm for scaling human ability and interactions. The authors describe the technology ecosystem that has come and highlight a myriad of computational and data sciences that will be necessary to enable the operating room of the near future.

翻訳日:2021-11-07 12:02:41 公開日:2021-11-01

# タスクガイドグラフ変換による弱教師付き概念マップ生成

Weakly Supervised Concept Map Generation through Task-Guided Graph Translation ( http://arxiv.org/abs/2110.15720v2 )

ライセンス: Link先を確認

Jiaying Lu, Xiangjue Dong, Carl Yang

(参考訳) 近年、自由テキストから知識を適切に構造化した要約を提供することの利点から、概念地図生成技術の急速な発展を目撃している。従来の教師なしメソッドはタスク指向のコンセプトマップを生成しないが、深層生成モデルは大量のトレーニングデータを必要とする。本稿では,GT-D2G(Graph Translation based Document-To-Graph)を提案する。汎用NLPパイプラインを利用して意味豊かな初期グラフを導出し,文書ラベルの弱い管理下でより簡潔な構造に翻訳する。これらの概念マップの品質と解釈性は,3つの実世界のコーパス上での人間による評価によって検証され,文書ラベルの不足による制御実験において,下流作業におけるそれらの有用性はさらに実証された。

Recent years have witnessed the rapid development of concept map generation techniques due to their advantages in providing well-structured summarization of knowledge from free texts. Traditional unsupervised methods do not generate task-oriented concept maps, whereas deep generative models require large amounts of training data. In this work, we present GT-D2G (Graph Translation based Document-To-Graph), an automatic concept map generation framework that leverages generalized NLP pipelines to derive semantic-rich initial graphs, and translates them into more concise structures under the weak supervision of document labels. The quality and interpretability of such concept maps are validated through human evaluation on three real-world corpora, and their utility in the downstream task is further demonstrated in the controlled experiments with scarce document labels.

翻訳日:2021-11-07 11:38:29 公開日:2021-11-01

# (参考訳) 医用画像解析のためのディープニューラルネットワークの透明性:解釈可能性の検討

Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods ( http://arxiv.org/abs/2111.02398v1 )

ライセンス: CC BY 4.0

Zohaib Salahuddin, Henry C Woodruff, Avishek Chatterjee and Philippe Lambin

(参考訳) 人工知能は、診断と治療決定のための多くの臨床応用に有用な助けとして登場した。ディープニューラルネットワークは、利用可能なデータと計算能力の急速な増加により、多くのタスクで臨床医と同等あるいは優れたパフォーマンスを示している。信頼できるAIの原則に従うためには、AIシステムは透明性、堅牢、公正、そして説明責任を保証することが不可欠である。現在のディープニューラルソリューションは、意思決定プロセスに関する詳細の理解が欠如しているため、ブラックボックスと呼ばれる。したがって、日常的な臨床ワークフローに組み込む前に、ディープニューラルネットワークの解釈可能性を確保する必要がある。本総説では, 医用画像解析用深層学習モデルの理解に用いられてきた9種類の解釈可能性手法を, 生成した説明書の種類と技術的類似性に基づいて, 体系的キーワード検索と専門知識を用いて同定した。さらに,様々な解釈方法によって得られた説明を評価するための進歩について報告する。最後に, 医用画像解析における深部ニューラルネットワークの解釈可能性に関する限界, 解釈可能性手法と今後の方向性について考察する。

Artificial Intelligence has emerged as a useful aid in numerous clinical applications for diagnosis and treatment decisions. Deep neural networks have shown performance equal to or better than that of clinicians in many tasks, owing to the rapid increase in available data and computational power. In order to conform to the principles of trustworthy AI, it is essential that the AI system be transparent, robust, and fair, and that it ensure accountability. Current deep neural solutions are referred to as black boxes due to a lack of understanding of the specifics of the decision-making process. Therefore, there is a need to ensure the interpretability of deep neural networks before they can be incorporated into the routine clinical workflow. In this narrative review, we used systematic keyword searches and domain expertise to identify nine different types of interpretability methods that have been used for understanding deep learning models for medical image analysis applications, grouped by the type of generated explanations and technical similarities. Furthermore, we report the progress made towards evaluating the explanations produced by various interpretability methods. Finally, we discuss limitations, provide guidelines for using interpretability methods, and outline future directions concerning the interpretability of deep neural networks for medical image analysis.

翻訳日:2021-11-06 06:31:53 公開日:2021-11-01

# (参考訳) スクラッチから同時に刈り取った構造と重みを学習する:注意に基づくアプローチ

Learning Pruned Structure and Weights Simultaneously from Scratch: an Attention based Approach ( http://arxiv.org/abs/2111.02399v1 )

ライセンス: CC BY 4.0

Qisheng He, Ming Dong, Loren Schwiebert, Weisong Shi

(参考訳) ディープラーニングモデルには通常、数百万のトレーニング可能なウェイトが含まれているため、ストレージスペースの削減とランタイム効率の向上という、より効率的なネットワーク構造に対する需要が高まっている。プルーニングは最も人気のあるネットワーク圧縮技術の一つである。本稿では,非構造化プルーニングパイプライン,注意に基づく同時スパース構造と重み学習(ASWL)を提案する。従来のチャネルワイドやウェイトワイドアテンション機構とは異なり、ASWLは各層に対する層ワイドアテンションによるプルーニング比を計算する効率的なアルゴリズムを提案し、密集ネットワークとスパースネットワークの重みをランダムに初期化した重みから同時に学習するように追跡する。 MNIST, Cifar10, ImageNet を用いた実験により, ASWL は最先端のネットワークプルーニング手法と比較して, 精度, プルーニング率, 動作効率で優れたプルーニング結果が得られることを示した。

As a deep learning model typically contains millions of trainable weights, there has been a growing demand for more efficient network structures with reduced storage space and improved run-time efficiency. Pruning is one of the most popular network compression techniques. In this paper, we propose a novel unstructured pruning pipeline, Attention-based Simultaneous sparse structure and Weight Learning (ASWL). Unlike traditional channel-wise or weight-wise attention mechanisms, ASWL uses an efficient algorithm to calculate the pruning ratio of each layer through layer-wise attention, and the weights of both the dense network and the sparse network are tracked so that the pruned structure is learned simultaneously from randomly initialized weights. Our experiments on MNIST, Cifar10, and ImageNet show that ASWL achieves superior pruning results in terms of accuracy, pruning ratio, and operating efficiency when compared with state-of-the-art network pruning methods.

翻訳日:2021-11-06 05:36:19 公開日:2021-11-01

# (参考訳) 医用画像分類のための深部AUCの最大化 : 課題と機会

Deep AUC Maximization for Medical Image Classification: Challenges and Opportunities ( http://arxiv.org/abs/2111.02400v1 )

ライセンス: CC BY 4.0

Tianbao Yang

(参考訳) 本稿では,医療画像分類において,AUC の最大化による新たな深層学習手法(いわゆる \underline{\bf D}eep \underline{\bf A}UC \underline{\bf M}aximization あるいは {\bf DAM} )がもたらした機会と課題について論じる。 AUCは医学画像分類の標準的な性能指標であるため、AUCを直接最適化することで、従来の損失関数(例えばクロスエントロピー損失)を最小化するよりも、ディープニューラルネットワークを学習する方が優れたパフォーマンスが得られる。近年,大規模医用画像分類に深部auc最大化を用いる傾向がみられた。本稿では,最近の結果を強調して考察する。 i) DAMの確率的非凸最適化アルゴリズムによる進歩 (ii)様々な医用画像分類問題における有望な結果次に、機能学習、大規模最適化、信頼できるAIモデルの学習という3つの観点から、医療画像分類におけるDAMの課題と機会について論じる。

In this extended abstract, we present and discuss the opportunities and challenges brought about by a new deep learning method based on AUC maximization (Deep AUC Maximization, or DAM) for medical image classification. Since AUC (area under the ROC curve) is a standard performance measure for medical image classification, directly optimizing AUC could achieve better performance when learning a deep neural network than minimizing a traditional loss function (e.g., cross-entropy loss). Recently, a trend has emerged of using deep AUC maximization for large-scale medical image classification. In this paper, we discuss these recent results by highlighting (i) the advancements brought by stochastic non-convex optimization algorithms for DAM and (ii) the promising results on various medical image classification problems. Then, we discuss challenges and opportunities of DAM for medical image classification from three perspectives: feature learning, large-scale optimization, and learning trustworthy AI models.
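To make the quantity being optimized concrete: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes that pairwise statistic directly with NumPy. The function name `pairwise_auc` is hypothetical, and this is only the evaluation metric, not the DAM training algorithm, which needs a differentiable surrogate because the pairwise comparison itself has no useful gradient.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Labels are expected to be 0 (negative) or 1 (positive).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]          # all positive-vs-negative score gaps
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Three of the four positive/negative pairs are ordered correctly.
auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```

The explicit pair matrix costs memory proportional to the number of positives times the number of negatives, which is one reason large-scale settings call for the stochastic algorithms the abstract mentions.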

翻訳日:2021-11-06 05:21:44 公開日:2021-11-01

# (参考訳) 物理インフォームド機械学習を用いたcfd問題の数値近似

Numerical Approximation in CFD Problems Using Physics Informed Machine Learning ( http://arxiv.org/abs/2111.02987v1 )

ライセンス: CC BY 4.0

Siddharth Rout, Vikas Dwivedi, Balaji Srinivasan

(参考訳) 本論文は、計算コストと実行時間を低く抑えつつ、幅広いCFD問題に普遍的に使用できる代替近似法を見つけるための様々な手法に焦点を当てている。この中心的な目標に対する有用性を見極めるため、機械学習の分野の様々な技術を検討した。定常移流拡散問題(steady advection diffusion problem)を、ある手法がどの程度の複雑さまで解を与えられるかを理解するためのテストケースとして用いた。最終的に、計算データによるトレーニングなしに微分方程式を解くことが可能な、物理インフォームド機械学習技術に焦点を当てる。 I.E. LagarisらおよびM. Raissiらによる既存の一般的な手法を詳しく調査する。これらの手法は移流支配の問題を解くことができない。そこで、移流支配問題を解くために、分散物理インフォームドニューラルネットワーク (DPINN) と呼ばれる物理インフォームド手法を提案する。これは、領域を分割し、他の物理ベースの制約を平均二乗損失項として導入することで、従来手法の柔軟性と能力を高める。この手法の可能性を一通り探るために様々な実験を行う。異なる調整可能パラメータに対する手法の振る舞いを理解するため、パラメトリックスタディも行う。この手法を、定常移流拡散問題と非定常の矩形パルス問題で検証し、非常に正確な結果を得た。 Extreme Learning Machine (ELM) は、調整可能なパラメータを犠牲にする代わりに非常に高速なニューラルネットワークアルゴリズムである。提案モデルのELMに基づく変種を、移流拡散問題で検証する。 ELMは複雑な最適化を単純化し、手法が非反復的であるため、解は一度の計算で得られる。 ELMベースの変種は単純なDPINN法よりもうまく機能するようである。あわせて、将来の様々な発展の余地を論文全体を通して示唆する。

The thesis focuses on various techniques to find an alternative approximation method that could be used universally for a wide range of CFD problems, but with low computational cost and low runtime. Various techniques within the field of machine learning have been explored to gauge their utility in fulfilling this core ambition. The steady advection-diffusion problem has been used as the test case to understand the level of complexity up to which a method can provide a solution. Ultimately, the focus is on physics-informed machine learning techniques, where solving differential equations is possible without any training on computed data. The prevalent methods by I.E. Lagaris et al. and M. Raissi et al. are explored thoroughly. These prevalent methods cannot solve advection-dominant problems. A physics-informed method, called the Distributed Physics Informed Neural Network (DPINN), is proposed to solve advection-dominant problems. It increases the flexibility and capability of older methods by splitting the domain and introducing other physics-based constraints as mean squared loss terms. Various experiments are done to explore the end-to-end possibilities of the method. A parametric study is also done to understand the behavior of the method with respect to different tunable parameters. The method is tested on steady advection-diffusion problems and unsteady square pulse problems. Very accurate results are recorded. The extreme learning machine (ELM) is a very fast neural network algorithm that comes at the cost of tunable parameters. The ELM-based variant of the proposed model is tested on the advection-diffusion problem. ELM makes the complex optimization simpler and, since the method is non-iterative, the solution is obtained in a single shot. The ELM-based variant seems to work better than the simple DPINN method. The scope for various future developments is also hinted at throughout the thesis.
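The "single shot" property of an ELM comes from fixing the hidden-layer weights at random values and solving only for the output weights with linear least squares. The sketch below illustrates that idea on a plain regression task rather than on the PDE residuals used in the thesis; the function names, the `tanh` activation, and the hidden-layer size are assumptions chosen for illustration.

```python
import numpy as np

def elm_fit(x, y, n_hidden=50, seed=0):
    """Fit an extreme learning machine: random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    W = rng.normal(size=(x.shape[1], n_hidden))   # hidden weights, never trained
    b = rng.normal(size=n_hidden)                 # hidden biases, never trained
    H = np.tanh(x @ W + b)                        # hidden-layer feature matrix
    # The only "training" step: one linear least-squares solve, no iterations.
    beta, *_ = np.linalg.lstsq(H, np.asarray(y, dtype=float), rcond=None)
    return W, b, beta

def elm_predict(x, W, b, beta):
    """Evaluate a fitted ELM at new inputs."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    return np.tanh(x @ W + b) @ beta

# Example: approximate one period of a sine wave on [0, 1].
x_train = np.linspace(0.0, 1.0, 200)
y_train = np.sin(2.0 * np.pi * x_train)
W, b, beta = elm_fit(x_train, y_train)
y_fit = elm_predict(x_train, W, b, beta)
```

In a physics-informed variant, the rows of the least-squares system would presumably come from the differential operator applied to the hidden features instead of from data, which stays a linear solve only when the PDE is linear in the unknown.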

翻訳日:2021-11-06 05:12:28 公開日:2021-11-01

# 前・逆PDE問題に対するグラディエント・エンハンス物理インフォームドニューラルネットワーク

Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems ( http://arxiv.org/abs/2111.02801v1 )

ライセンス: Link先を確認

Jeremy Yu, Lu Lu, Xuhui Meng, George Em Karniadakis

(参考訳) ディープラーニングは物理インフォームドニューラルネットワーク(PINN)を通じて偏微分方程式(PDE)を解くのに有効なツールであることが示されている。 PINNはPDE残差をニューラルネットワークの損失関数に埋め込んでおり、様々な順問題および逆問題のPDEの解決に成功している。しかし、第一世代のPINNの欠点の1つは、訓練点が多くても精度が限られることが多いことである。そこで本研究では, PINNの精度とトレーニング効率を向上させるために, 勾配強化物理インフォームドニューラルネットワーク(gPINN)を提案する。 gPINNはPDE残差の勾配情報を利用して、その勾配を損失関数に埋め込む。我々はgPINNを広範囲にテストし、順問題と逆問題の両方においてgPINNの有効性を実証した。数値結果から,gPINNはより少ないトレーニングポイントでPINNよりも優れた性能を示した。さらに,gPINNと,トレーニング中にトレーニング点の分布を適応的に改善する方法であるresidual-based adaptive refinement (RAR) を組み合わせることで,特に急勾配の解を持つPDEにおいて,gPINNの性能をさらに向上させた。

Deep learning has been shown to be an effective tool in solving partial differential equations (PDEs) through physics-informed neural networks (PINNs). PINNs embed the PDE residual into the loss function of the neural network, and have been successfully employed to solve diverse forward and inverse PDE problems. However, one disadvantage of the first generation of PINNs is that they usually have limited accuracy even with many training points. Here, we propose a new method, gradient-enhanced physics-informed neural networks (gPINNs), for improving the accuracy and training efficiency of PINNs. gPINNs leverage gradient information of the PDE residual and embed the gradient into the loss function. We tested gPINNs extensively and demonstrated the effectiveness of gPINNs in both forward and inverse PDE problems. Our numerical results show that gPINN performs better than PINN with fewer training points. Furthermore, we combined gPINN with the method of residual-based adaptive refinement (RAR), a method for improving the distribution of training points adaptively during training, to further improve the performance of gPINN, especially in PDEs with solutions that have steep gradients.
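The phrase "embed the gradient into the loss function" can be written out as follows. This is a sketch in our own notation, not copied from the paper: $f$ denotes the PDE residual of the network output, $x_i$ are $N_f$ residual training points in $d$ input dimensions, and the weights $w_{g_j}$ are assumed hyperparameters. Boundary, initial-condition, and data terms are omitted.

```latex
\begin{align}
  \mathcal{L}_f &= \frac{1}{N_f} \sum_{i=1}^{N_f} \bigl| f(x_i) \bigr|^2
    && \text{(standard PINN residual loss)} \\
  \mathcal{L}_{g_j} &= \frac{1}{N_f} \sum_{i=1}^{N_f}
    \left| \frac{\partial f}{\partial x_j}(x_i) \right|^2,
    \quad j = 1, \dots, d
    && \text{(gradient-enhancement terms)} \\
  \mathcal{L} &= \mathcal{L}_f + \sum_{j=1}^{d} w_{g_j}\, \mathcal{L}_{g_j}
    && \text{(gPINN loss, without boundary or data terms)}
\end{align}
```

The reasoning is that if $f$ vanishes everywhere in the domain, its derivatives vanish as well, so penalizing $\partial f / \partial x_j$ adds a consistent constraint at each training point without requiring new points.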

翻訳日:2021-11-05 15:45:12 公開日:2021-11-01

# 6gセンシングのための自己教師付き無線視覚表現学習

Self-Supervised Radio-Visual Representation Learning for 6G Sensing ( http://arxiv.org/abs/2111.02887v1 )

ライセンス: Link先を確認

Mohammed Alloulah, Akash Deep Singh, Maximilian Arnold

(参考訳) 将来の6Gセルネットワークでは、共同通信およびセンシングプロトコルにより、ネットワークは環境を認識でき、統一された通信知覚基盤の上に多くの新しいアプリケーションの扉を開く。しかし、センシングシーンの粗い無線表現の解釈は困難であり、これらの創発的システムの可能性を妨げる。無線と視覚を組み合わせることで、人間の介入を最小限に抑える無線のみのセンシングモデルを自動的に学習する。私たちは、何百万もの未解決のデータポイントをフィードできる無線センシングモデルを構築したいと考えています。そこで我々は,近年の自己教師型学習の進歩を活用し,新たなラベルのない無線-視覚協調学習手法を定式化した。本手法は,共通線形分類ベンチマークに従って実装・評価し,質的・定量的な性能指標を報告する。本評価では, 下流センシングデモンストラクタに対して, ラジオ・ビジュアル・セルフ・スーパービジョンで学習した表現が良好に動作し, ラベル付きデータが少ない場合, 完全に教師付き表現よりも優れることを示す。これは、自己教師付き学習が将来のスケーラブルな無線センシングシステムにとって重要な実現可能性を示している。

In future 6G cellular networks, a joint communication and sensing protocol will allow the network to perceive the environment, opening the door for many new applications atop a unified communication-perception infrastructure. However, interpreting the sparse radio representation of sensing scenes is challenging, which hinders the potential of these emergent systems. We propose to combine radio and vision to automatically learn a radio-only sensing model with minimal human intervention. We want to build a radio sensing model that can feed on millions of uncurated data points. To this end, we leverage recent advances in self-supervised learning and formulate a new label-free radio-visual co-learning scheme, whereby vision trains radio via cross-modal mutual information. We implement and evaluate our scheme according to the common linear classification benchmark, and report qualitative and quantitative performance metrics. In our evaluation, the representation learnt by radio-visual self-supervision works well for a downstream sensing demonstrator, and outperforms its fully-supervised counterpart when less labelled data is used. This indicates that self-supervised learning could be an important enabler for future scalable radio sensing systems.

翻訳日:2021-11-05 15:44:25 公開日:2021-11-01

# 機械学習に基づく分子フラグメントリンクのためのデカップリング座標

Decoupled coordinates for machine learning-based molecular fragment linking ( http://arxiv.org/abs/2111.02930v1 )

ライセンス: Link先を確認

Markus Fleck and Noah Weber and Christopher Trummer

(参考訳) 機械学習に基づく分子フラグメントリンクの最近の進歩は、生成プロセスにリンクすべきフラグメントの相対的配向を示す構造情報を伝えることの重要性を示している。しかし、そのような構造情報は完全な相対座標系の形ではまだ提供されていない。結合長、結合角、ねじれ角の分離集合の数学的詳細を精巧化し、座標系が完備であることが示されている。生成したリンカーの品質に対する重要な影響を数値的に示す。異なる種類の自由度における信頼性情報量について検討した。アブレーション研究と情報理論的解析を行う。提案した利点は、リンカ設計における標準的グッドプラクティスとして、完全かつ分離された相対座標系の適用を示唆している。

Recent developments in machine-learning based molecular fragment linking have demonstrated the importance of informing the generation process with structural information specifying the relative orientation of the fragments to be linked. However, such structural information has not yet been provided in the form of a complete relative coordinate system. Mathematical details for a decoupled set of bond lengths, bond angles and torsion angles are elaborated and the coordinate system is demonstrated to be complete. Significant impact on the quality of the generated linkers is demonstrated numerically. The amount of reliable information within the different types of degrees of freedom is investigated. Ablation studies and an information-theoretical analysis are performed. The presented benefits suggest the application of a complete and decoupled relative coordinate system as a standard good practice in linker design.
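The three kinds of degrees of freedom named in the abstract (bond lengths, bond angles, torsion angles) are standard internal coordinates that can be computed from Cartesian atom positions. The sketch below shows only these primitive quantities; which atoms of the two fragments the paper uses to build its complete relative coordinate system is not stated in the abstract, so the function names and the sample points are purely illustrative.

```python
import numpy as np

def bond_length(a, b):
    """Distance between atoms a and b."""
    return float(np.linalg.norm(np.subtract(b, a)))

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u = np.subtract(a, b)
    v = np.subtract(c, b)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def torsion_angle(a, b, c, d):
    """Dihedral angle a-b-c-d in degrees, in the range (-180, 180]."""
    b1 = np.subtract(b, a)
    b2 = np.subtract(c, b)
    b3 = np.subtract(d, c)
    n1 = np.cross(b1, b2)                        # normal of the plane (a, b, c)
    n2 = np.cross(b2, b3)                        # normal of the plane (b, c, d)
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))   # completes an in-plane frame with n1
    return float(np.degrees(np.arctan2(np.dot(m1, n2), np.dot(n1, n2))))

# Four points chosen so that every quantity is easy to check by hand.
a, b, c, d = (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)
length = bond_length(a, b)           # 1.0
angle = bond_angle(a, b, c)          # 90 degrees
torsion = torsion_angle(a, b, c, d)  # magnitude 90 degrees
```

Using `arctan2` for the torsion keeps the sign information that a plain `arccos` of the two plane normals would lose.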

翻訳日:2021-11-05 15:43:50 公開日:2021-11-01

# (参考訳) 全スライドイメージングのための深層学習に基づく複数インスタンス学習の依存性の会計

Accounting for Dependencies in Deep Learning Based Multiple Instance Learning for Whole Slide Imaging ( http://arxiv.org/abs/2111.01556v1 )

ライセンス: CC BY 4.0

Andriy Myronenko, Ziyue Xu, Dong Yang, Holger Roth, Daguang Xu

(参考訳) 多重インスタンス学習(MIL)は、スライド画像全体(WSI)を分類するための重要なアルゴリズムである。組織学のWSIは数十億のピクセルを持つことがあり、膨大な計算とアノテーションの課題を生み出す。通常、このような画像は一連のパッチ(インスタンスのバッグ)に分割され、バッグレベルのクラスラベルのみが与えられる。ディープラーニングに基づくMIL手法は、畳み込みニューラルネットワーク(CNN)を用いてインスタンス特徴を算出する。提案手法もディープラーニングに基づいており、以下の2つの貢献がある。まず、自己注意型Transformerブロックを埋め込んでインスタンス間の依存関係を捉えることで、トレーニング中にインスタンス間の依存関係を明示的に考慮することを提案する。例えば、腫瘍のグレードは、WSI内の異なる場所にあるいくつかの特定のパターンの存在に依存することがあり、パッチ間の依存関係を考慮する必要がある。次に,インスタンス擬似ラベルに基づくインスタンス単位の損失関数を提案する。提案手法を複数のベースライン法と比較し,11K以上の画像を持つ公開されている最大のWSIデータセットであるPANDA challengeデータセット上で評価し,最先端の結果を示す。

Multiple instance learning (MIL) is a key algorithm for classification of whole slide images (WSI). Histology WSIs can have billions of pixels, which creates enormous computational and annotation challenges. Typically, such images are divided into a set of patches (a bag of instances), where only bag-level class labels are provided. Deep learning based MIL methods calculate instance features using a convolutional neural network (CNN). Our proposed approach is also deep learning based, with the following two contributions. Firstly, we propose to explicitly account for dependencies between instances during training by embedding self-attention Transformer blocks that capture these dependencies. For example, a tumor grade may depend on the presence of several particular patterns at different locations in a WSI, which requires accounting for dependencies between patches. Secondly, we propose an instance-wise loss function based on instance pseudo-labels. We compare the proposed algorithm to multiple baseline methods, evaluate it on the PANDA challenge dataset, the largest publicly available WSI dataset with over 11K images, and demonstrate state-of-the-art results.

翻訳日:2021-11-04 02:28:00 公開日:2021-11-01

# (参考訳) 総合的, 臨床的に正確な頭頸部臓器 : 階層的深層学習による大規模多施設研究

Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study ( http://arxiv.org/abs/2111.01544v1 )

ライセンス: CC BY 4.0

Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yanping Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, Jing Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai Jin, Tsung-Ying Ho

(参考訳) 放射線治療後の合併症を軽減するためには,正確なリスク臓器(OAR)セグメンテーションが重要である。コンセンサスガイドラインでは、頭頸部(H&N)領域に40以上のOARを推奨しているが、この作業の労力コストが非常に高いため、ほとんどの機関は、OARの小さなサブセットのみを輪郭描出し、他のOARに関連する線量分布を無視する、大幅に単純化されたプロトコルを選択している。本稿では,42個のH&N OARの包括的集合を正確に輪郭描出するために,ディープラーニングを用いた新しい,自動化された,高効率な階層化OARセグメンテーション(SOARS)システムを提案する。 SOARSは42のOARをアンカー、中レベル、小型かつ困難のサブカテゴリに階層化し、ニューラルアーキテクチャ探索(NAS)の原則によって各カテゴリに特化したニューラルネットワークアーキテクチャを導出する。内部の施設の訓練用患者176名を用いてSOARSモデルを構築し,6施設の外部患者1327名に対して独立に評価を行った。各施設の評価において、Diceスコアで他の最先端手法を一貫して少なくとも3〜5%上回った(他の指標では最大36%の相対誤差低減)。さらに重要なことに、大規模なマルチユーザ研究により、SOARSの予測の98%は直接的な臨床受け入れのためにごくわずかな修正しか必要としないか修正不要であり(放射線腫瘍医の作業負荷を90%削減)、そのセグメンテーションおよび線量測定の精度はユーザー間の変動の範囲内かそれより小さいことが示された。これらの結果から,H&N癌放射線治療ワークフローにおけるOAR輪郭描出プロセスに対するSOARSの高い臨床適用性が確認され,効率,包括性,品質が向上した。

Accurate organ at risk (OAR) segmentation is critical to reduce post-treatment complications of radiotherapy. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region; however, due to the predictably prohibitive labor cost of this task, most institutions choose a substantially simplified protocol, delineating a smaller subset of OARs and neglecting the dose distributions associated with the others. In this work, we propose a novel, automated, and highly effective stratified OAR segmentation (SOARS) system using deep learning to precisely delineate a comprehensive set of 42 H&N OARs. SOARS stratifies the 42 OARs into anchor, mid-level, and small & hard subcategories, with neural network architectures derived specifically for each category by neural architecture search (NAS) principles. We built SOARS models using 176 training patients from an internal institution and independently evaluated them on 1327 external patients across six different institutions. SOARS consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score in each institutional evaluation (up to 36% relative error reduction in other metrics). More importantly, extensive multi-user studies demonstrated that 98% of the SOARS predictions need only very minor or no revisions for direct clinical acceptance (saving 90% of radiation oncologists' workload), and that their segmentation and dosimetric accuracy are within or smaller than the inter-user variation. These findings confirm the strong clinical applicability of SOARS for the OAR delineation process in H&N cancer radiotherapy workflows, with improved efficiency, comprehensiveness, and quality.
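The Dice score used for the headline comparison measures the overlap between a predicted mask and a reference mask. A minimal sketch for binary masks follows; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

# One voxel overlaps out of two predicted and two reference voxels.
score = dice_score([1, 1, 0, 0], [1, 0, 1, 0])  # 0.5
```

Because the denominator is the total size of both masks, a fixed number of mislabeled voxels lowers the Dice score of a small organ far more than that of a large one.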

翻訳日:2021-11-04 02:18:48 公開日:2021-11-01

# (参考訳) 若年者のための皮質下構造分割データベース

Sub-cortical structure segmentation database for young population ( http://arxiv.org/abs/2111.01561v1 )

ライセンス: CC BY 4.0

Jayanthi Sivaswamy, Alphin J Thottupattu, Mythri V, Raghav Mehta, R Sheelakumari, Chandrasekharan Kesavadas

(参考訳) MRIスキャンによる皮質下構造の分離は多くの神経学的診断において重要である。これは面倒なタスク機械学習であり、特に深層学習(DL)手法が研究されている。脳の構造的複雑さは、大きな高品質なセグメンテーションデータセットを必要とし、皮質下構造セグメンテーションのための優れたdlベースのソリューションを開発する。これに向けて、114, 1.5 Tesla, T1 MRIスキャンのセットをリリースし、14の皮質下構造を手動で記述しています。データセットのスキャンは、健康な若い被験者(男性58名、女性56名)から取得され、すべての構造は経験豊富な放射線学の専門家によって手動で記述されている。このデータセットを用いてセグメンテーション実験を行い,深層学習法を用いて精度の高い結果が得られることを示した。

Segmentation of sub-cortical structures from MRI scans is of interest in many neurological diagnoses. Since this is a laborious task, machine learning and specifically deep learning (DL) methods have been explored. The structural complexity of the brain demands a large, high-quality segmentation dataset to develop good DL-based solutions for sub-cortical structure segmentation. Towards this, we are releasing a set of 114 1.5 Tesla T1 MRI scans with manual delineations for 14 sub-cortical structures. The scans in the dataset were acquired from healthy young (21-30 years) subjects (58 male and 56 female), and all the structures were manually delineated by experienced radiology experts. Segmentation experiments have been conducted with this dataset, and the results demonstrate that accurate results can be obtained with deep learning methods.

翻訳日:2021-11-04 02:16:52 公開日:2021-11-01

# (参考訳) 頭頸部放射線治療における臓器コンチューリングのためのベイズモデルの比較

Comparing Bayesian Models for Organ Contouring in Head and Neck Radiotherapy ( http://arxiv.org/abs/2111.01134v1 )

ライセンス: CC BY 4.0

Prerak Mody, Nicolas Chaves-de-Plaza, Klaus Hildebrandt, Rene van Egmond, Huib de Ridder, Marius Staring

(参考訳) 放射線治療における臓器コントゥーリングの深層学習モデルは臨床応用が期待されているが、現在では予測された輪郭の自動品質評価(QA)のためのツールがほとんどない。ベイズモデルとその関連する不確実性を用いて、不正確な予測を検出するプロセスを自動化することができる。本研究では,予測校正誤差 (ECE) と定性的尺度 (R-AvU) を用いて,自動コントゥーリングのためのベイズモデルDropOutとFlipOutについて検討する。モデルが低ECEを信頼に値するものとみなすべきであることはよく理解されている。しかし、QAの文脈では、モデルは不正確な領域では高い不確実性を持ち、正確な領域では低い不確実性を持つ必要がある。このような振る舞いは、エキスパートユーザの視覚的な注意を、潜在的に不正確なリージョンに向け、QAプロセスのスピードアップにつながる可能性がある。 R-AvUグラフを用いて、精度と不正確な領域における異なるモデルの挙動を質的に比較する。 MICCAI2015 Head and Neck Segmentation ChallengeとDeepMindTCIA CTデータセットで、DropOut-DICE、Dropout-CE (Cross Entropy)、FlipOut-CEの3つのモデルを用いて実験が行われた。その結果,DropOut-DICEはECEが最も高く,Dropout-CEとFlipOut-CEはECEが低かった。 DropOut-CEとFlipOut-CEの違いをよりよく理解するために、R-AvUグラフを使用して、FlipOut-CEはDropOut-CEよりも不正確な領域における不確実性カバレッジが優れていることを示す。このような量的および質的なメトリクスの組み合わせは、臨床環境でQAツールとしてデプロイできるモデルを選択するのに役立つ新しいアプローチを探求する。

Deep learning models for organ contouring in radiotherapy are poised for clinical usage, but currently there exist few tools for automated quality assessment (QA) of the predicted contours. Using Bayesian models and their associated uncertainty, one can potentially automate the process of detecting inaccurate predictions. We investigate two Bayesian models for auto-contouring, DropOut and FlipOut, using a quantitative measure, expected calibration error (ECE), and a qualitative measure, region-based accuracy-vs-uncertainty (R-AvU) graphs. It is well understood that a model should have low ECE to be considered trustworthy. However, in a QA context, a model should also have high uncertainty in inaccurate regions and low uncertainty in accurate regions. Such behaviour could direct the visual attention of expert users to potentially inaccurate regions, leading to a speed-up in the QA process. Using R-AvU graphs, we qualitatively compare the behaviour of different models in accurate and inaccurate regions. Experiments are conducted on the MICCAI2015 Head and Neck Segmentation Challenge and on the DeepMindTCIA CT dataset using three models: DropOut-DICE, DropOut-CE (Cross Entropy), and FlipOut-CE. Quantitative results show that DropOut-DICE has the highest ECE, while DropOut-CE and FlipOut-CE have the lowest ECE. To better understand the difference between DropOut-CE and FlipOut-CE, we use the R-AvU graph, which shows that FlipOut-CE has better uncertainty coverage in inaccurate regions than DropOut-CE. Such a combination of quantitative and qualitative metrics explores a new approach that helps to select which model can be deployed as a QA tool in clinical settings.
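Expected calibration error compares a model's stated confidence with its observed accuracy, bin by bin. The sketch below uses the common equal-width binning formulation; the number of bins and the function name are assumptions, and the paper's exact binning for voxel-wise segmentation output may differ.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins on (0, 1].

    confidences: predicted probability of the chosen class, one per prediction.
    correct: 1 if the prediction was right, 0 otherwise.
    A confidence of exactly 0 falls outside every bin and is ignored.
    """
    conf = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap   # weight the gap by the bin's share of samples
    return float(ece)

# Two bins, each holding half the samples and each off by 0.05.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])  # 0.05
```

A low ECE says that confidence matches accuracy on average, but it says nothing about where in the image the uncertainty sits, which is the gap the R-AvU graphs are meant to fill.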

翻訳日:2021-11-04 02:03:06 公開日:2021-11-01

# (参考訳) 全脳認知復号における深層伝達学習の評価

Evaluating deep transfer learning for whole-brain cognitive decoding ( http://arxiv.org/abs/2111.01562v1 )

ライセンス: CC BY 4.0

Armin W. Thomas and Ulman Lindenberger and Wojciech Samek and Klaus-Robert Müller

(参考訳) 多くの分野の研究で、少量のサンプルを持つデータセットにおけるディープラーニング(DL)モデルの性能を改善するのに、転送学習(TL)が適していることが示されている。この経験的成功は、機能的神経画像データを用いた認知的デコード解析へのtlの適用に対する関心を惹き起こした。本稿では,全脳機能型磁気共鳴画像(fMRI)データから,認知状態(顔や家の画像など)の復号化にDLモデルを適用するためのTLを体系的に評価する。まず,公開fmriデータセット上で2つのdlアーキテクチャを事前学習し,その性能を独立した実験タスクと完全に独立したデータセットで評価した。事前訓練されたモデルは、常に高い復号精度を達成し、通常、事前訓練されていないモデル変種よりも訓練時間とデータが少ない。これらの利点は、トレーニング済みモデルが新しいデータでトレーニングする際、学習した特徴の多くを再利用できることから生じており、事前トレーニングの利点をもたらすメカニズムに関する深い洞察を提供する。しかし, 学習済みモデルの復号決定を解釈する際に, DLモデルを用いた全脳認知復号化の難しさも浮き彫りにしている。

Research in many fields has shown that transfer learning (TL) is well-suited to improve the performance of deep learning (DL) models in datasets with small numbers of samples. This empirical success has triggered interest in the application of TL to cognitive decoding analyses with functional neuroimaging data. Here, we systematically evaluate TL for the application of DL models to the decoding of cognitive states (e.g., viewing images of faces or houses) from whole-brain functional Magnetic Resonance Imaging (fMRI) data. We first pre-train two DL architectures on a large, public fMRI dataset and subsequently evaluate their performance in an independent experimental task and a fully independent dataset. The pre-trained models consistently achieve higher decoding accuracies and generally require less training time and data than model variants that were not pre-trained, clearly underlining the benefits of pre-training. We demonstrate that these benefits arise from the ability of the pre-trained models to reuse many of their learned features when training with new data, providing deeper insights into the mechanisms giving rise to the benefits of pre-training. Yet, we also surface nuanced challenges for whole-brain cognitive decoding with DL models when interpreting the decoding decisions of the pre-trained models, as these have learned to utilize the fMRI data in unforeseen and counterintuitive ways to identify individual cognitive states.

翻訳日:2021-11-04 01:56:42 公開日:2021-11-01

# (参考訳) ニューラルネットワークトレーニングダイナミクスの局所性の検討

Investigating the locality of neural network training dynamics ( http://arxiv.org/abs/2111.01166v1 )

ライセンス: CC BY 4.0

Soham Dan, Phanideep Gampa and Anirbit Mukherjee

(参考訳) ディープラーニングの理論における基本的な探求は、学習アルゴリズムが取る重み空間における軌道の性質を理解することである。非常に最近分離されたそのような特性の1つは、「局所弾性」(S_{\rm rel}$)であり、サンプルデータポイントが別のデータポイントでの予測に与える影響の伝播を定量化するものである。本研究では,新しい理論的知見と,この性質のより慎重な実証的証拠を様々な設定で提供することにより,局所弾性の包括的研究を行う。まず、分類設定に特有なものとして、$s_{\rm rel}$という元の概念の新しい定義を提案する。 SVHN、CIFAR-10、CIFAR-100の最先端ニューラルネットワークトレーニングに関する実験では、新しい$S_{\rm rel}$が、サンプルデータと同じクラス内で予測を変更するのに好まれる重み更新の特性をどのように検出するかを示す。次に、最初の$s_{\rm rel}$が2ドルのフェーズの振る舞いを示す回帰を行うニューラルネットワークの例を例示して、トレーニングは$s_{\rm rel}$が急速に変化する場合の最初の弾性フェーズ、$s_{\rm rel}$が大きくなる場合の最終的な非弾性フェーズを経て行われることを実証する。最後に、元の$s_{\rm rel}$関数の閉形式式を得ることができる勾配フローによる学習の複数の例を示す。これらの導出公式のプロットを調べることによって、回帰設定における$s_{\rm rel}$の実験的に検出された性質のいくつかを理論的に実証した。

A fundamental quest in the theory of deep learning is to understand the properties of the trajectories in weight space that a learning algorithm takes. One such property that was very recently isolated is "local elasticity" ($S_{\rm rel}$), which quantifies the propagation of the influence of a sampled data point on the prediction at another data point. In this work, we perform a comprehensive study of local elasticity by providing new theoretical insights and more careful empirical evidence of this property in a variety of settings. Firstly, specific to the classification setting, we suggest a new definition of the original idea of $S_{\rm rel}$. Via experiments on state-of-the-art neural networks trained on SVHN, CIFAR-10, and CIFAR-100, we demonstrate how our new $S_{\rm rel}$ detects the tendency of the weight updates to prefer changing predictions within the same class as the sampled data. Next, we demonstrate via examples of neural nets doing regression that the original $S_{\rm rel}$ reveals a two-phase behaviour: training proceeds via an initial elastic phase, in which $S_{\rm rel}$ changes rapidly, and an eventual inelastic phase, in which $S_{\rm rel}$ remains large. Lastly, we give multiple examples of learning via gradient flows for which one can get a closed-form expression of the original $S_{\rm rel}$ function. By studying the plots of these derived formulas, we give a theoretical demonstration of some of the experimentally detected properties of $S_{\rm rel}$ in the regression setting.

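As a rough illustration of what $S_{\rm rel}$ measures, the sketch below assumes the ratio form of the original idea: the absolute change in the prediction at a second point divided by the absolute change at the sampled point, after one SGD step on the sampled point. The linear model, the squared loss, and the names `predict`, `sgd_step` and `s_rel` are illustrative assumptions, not the setup or the new classification-specific definition of the paper. For this linear model the ratio reduces to $|\langle x, x' \rangle| / \|x\|^2$, so it depends only on the geometry of the two inputs.

```python
def predict(w, x):
    """Linear model f(x; w) = <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_step(w, x, y, lr=0.1):
    """One SGD step on the squared loss 0.5 * (f(x; w) - y)**2 at sample (x, y)."""
    residual = predict(w, x) - y
    return [wi - lr * residual * xi for wi, xi in zip(w, x)]


def s_rel(w, x, y, x_other, lr=0.1):
    """Assumed ratio form: |change in f at x_other| / |change in f at x|."""
    w_new = sgd_step(w, x, y, lr)
    change_other = abs(predict(w_new, x_other) - predict(w, x_other))
    change_self = abs(predict(w_new, x) - predict(w, x))
    return change_other / change_self


w0 = [0.5, -0.2]
x, y = [1.0, 0.0], 2.0
print(s_rel(w0, x, y, [1.0, 1.0]))  # second point overlaps with x
print(s_rel(w0, x, y, [0.0, 1.0]))  # second point is orthogonal to x
```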
翻訳日:2021-11-04 01:55:29 公開日:2021-11-01

# (参考訳) 組合せ空間上のベイズ最適化のための潜在空間と構造化カーネルの組み合わせ

Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces ( http://arxiv.org/abs/2111.01186v1 )

ライセンス: CC BY 4.0

Aryan Deshwal and Janardhan Rao Doppa

(参考訳) 我々は高価なブラックボックス関数評価を用いて組合せ空間(シーケンス、木、グラフなど)を最適化する問題を考える。例えば、物理的な実験室実験を用いた薬物設計のための分子の最適化である。ベイズ最適化(Bayesian optimization、BO)は、学習されたサロゲートモデルに導かれて高い有用性を持つ入力をインテリジェントに選択することで、そのような問題を解決する効率的なフレームワークである。組合せ空間に対する最近のBOアプローチは、ディープ生成モデル(DGM)を用いて構造の潜在表現を学習することで、連続空間上のBOに帰着させるものである。連続空間から選択された入力は、関数評価を行うために離散構造に復号される。しかし、潜在空間上のサロゲートモデルはDGMが学習した情報のみを使用しており、その情報は対象のブラックボックス関数を近似するために望ましい帰納バイアスを持たない可能性がある。この欠点を克服するため、本論文ではLADDERと呼ばれる原理的なアプローチを提案する。鍵となる考え方は、より優れたサロゲートモデリングのために、復号された構造から得られる構造情報を学習された潜在空間表現と明示的に統合する、新しい構造結合カーネルを定義することである。実世界のベンチマークでの実験により、LADDERは潜在空間上のBO手法を大幅に上回り、最先端手法と同等以上の性能を示した。

We consider the problem of optimizing combinatorial spaces (e.g., sequences, trees, and graphs) using expensive black-box function evaluations. An example is optimizing molecules for drug design using physical lab experiments. Bayesian optimization (BO) is an efficient framework for solving such problems by intelligently selecting the inputs with high utility guided by a learned surrogate model. A recent BO approach for combinatorial spaces is through a reduction to BO over continuous spaces by learning a latent representation of structures using deep generative models (DGMs). The selected input from the continuous space is decoded into a discrete structure for performing function evaluation. However, the surrogate model over the latent space only uses the information learned by the DGM, which may not have the desired inductive bias to approximate the target black-box function. To overcome this drawback, this paper proposes a principled approach referred to as LADDER. The key idea is to define a novel structure-coupled kernel that explicitly integrates the structural information from decoded structures with the learned latent space representation for better surrogate modeling. Our experiments on real-world benchmarks show that LADDER significantly improves over the BO over latent space method, and performs better than or similar to state-of-the-art methods.

翻訳日:2021-11-04 00:37:47 公開日:2021-11-01

# (参考訳) 保持ペダルを用いたピアノ音楽生成の学習

Learning To Generate Piano Music With Sustain Pedals ( http://arxiv.org/abs/2111.01216v1 )

ライセンス: CC BY 4.0

Joann Ching and Yi-Hsuan Yang

(参考訳) 近年、音楽情報検索コミュニティにおいて、音声信号からピアノペダルを検出する研究への関心が高まっている。しかし、我々の知る限り、近年のシンボリック音楽の生成モデルがピアノのペダルを考慮に入れることはまれである。本研究では、Kongらが提案した採譜モデルを用いて、AILabs1k7データセットのピアノ演奏の音声記録からペダル情報を取得し、Hsiaoらが提案したCompound Word Transformerを修正して、ペダル関連トークンを他の音楽トークンとともに生成するTransformerデコーダを構築する。本研究は推定されたサステインペダル情報を学習データとして用いているが、結果はさらなる改善への期待と、ピアノ演奏生成のタスクにサステインペダルを取り入れることの重要性を示している。

Recent years have witnessed a growing interest in research related to the detection of piano pedals from audio signals in the music information retrieval community. However, to the best of our knowledge, recent generative models for symbolic music have rarely taken piano pedals into account. In this work, we employ the transcription model proposed by Kong et al. to get pedal information from the audio recordings of piano performance in the AILabs1k7 dataset, and then modify the Compound Word Transformer proposed by Hsiao et al. to build a Transformer decoder that generates pedal-related tokens along with other musical tokens. While the work is done by using inferred sustain pedal information as training data, the results show promise for further improvement and the importance of involving the sustain pedal in piano performance generation tasks.

翻訳日:2021-11-04 00:19:16 公開日:2021-11-01

# (参考訳) 空中計算によるロバスト連合学習

Robust Federated Learning via Over-The-Air Computation ( http://arxiv.org/abs/2111.01221v1 )

ライセンス: CC BY 4.0

Houssem Sifaou and Geoffrey Ye Li

(参考訳) 本稿では,ビザンチン攻撃に対する空中フェデレート学習のロバスト性について検討する。モデル更新の単純な平均化は、悪意のあるクライアントのローカルモデル更新のランダムあるいは意図的な修正に対して、学習タスクを脆弱にする。本稿では,このような攻撃に対して,フェデレート学習のためのオーバー・ザ・エア計算の利点を保ちながら,ロバストな伝達と集約の枠組みを提案する。提案した堅牢な連合学習では、参加するクライアントをランダムにグループに分割し、各グループに送信時間スロットを割り当てる。パラメータサーバは、ロバスト集約技術を用いて異なるグループの結果を集約し、別のトレーニングラウンドのためにクライアントに結果を送信する。また,提案アルゴリズムの収束性も解析する。数値シミュレーションは、ビザンツ攻撃に対する提案されたアプローチの堅牢性を確認する。

This paper investigates the robustness of over-the-air federated learning to Byzantine attacks. The simple averaging of the model updates via over-the-air computation makes the learning task vulnerable to random or intended modifications of the local model updates of some malicious clients. We propose a transmission and aggregation framework that is robust to such attacks while preserving the benefits of over-the-air computation for federated learning. For the proposed robust federated learning, the participating clients are randomly divided into groups and a transmission time slot is allocated to each group. The parameter server aggregates the results of the different groups using a robust aggregation technique and conveys the result to the clients for another training round. We also analyze the convergence of the proposed algorithm. Numerical simulations confirm the robustness of the proposed approach to Byzantine attacks.

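The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the within-group average stands in for what over-the-air computation would deliver in one time slot, and the coordinate-wise median is an assumed choice of robust aggregation rule, since the abstract does not name the one used. The function names `group_means` and `robust_aggregate` are hypothetical. With nine clients in three groups, a single malicious client can corrupt at most one group mean, so the median across groups stays at the honest value.

```python
import random
from statistics import median


def group_means(updates, num_groups, rng):
    """Randomly split client updates into groups and average within each group.

    The within-group average stands in for the sum that over-the-air
    computation would deliver to the server in one time slot.
    """
    order = list(range(len(updates)))
    rng.shuffle(order)
    groups = [order[g::num_groups] for g in range(num_groups)]
    dim = len(updates[0])
    return [
        [sum(updates[i][d] for i in group) / len(group) for d in range(dim)]
        for group in groups
    ]


def robust_aggregate(updates, num_groups, seed=0):
    """Coordinate-wise median of the group means (an assumed robust rule)."""
    means = group_means(updates, num_groups, random.Random(seed))
    dim = len(updates[0])
    return [median(m[d] for m in means) for d in range(dim)]


honest = [[1.0, -1.0] for _ in range(8)]
byzantine = [[1000.0, 1000.0]]  # one malicious client
print(robust_aggregate(honest + byzantine, num_groups=3))
```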
翻訳日:2021-11-04 00:15:16 公開日:2021-11-01

# (参考訳) スパース連続注意のためのカーネル変形指数型分布族

Kernel Deformed Exponential Families for Sparse Continuous Attention ( http://arxiv.org/abs/2111.01222v1 )

ライセンス: CC BY 4.0

Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg

(参考訳) 注意機構は、確率重みに関してデータ表現の期待値を取る。これにより、重要な特徴に焦点を当てた要約統計量が得られる。近年、Martinsら(2020, 2021)は、指数型分布族と変形指数型分布族に由来する単峰性の注意密度に着目した連続注意機構を提案した。後者はスパースな台を持つ。Farinhasら(2021)はこれを拡張し、密な台を持つ柔軟なクラスであるガウス混合注意密度を用いた。本稿では、これを2つの一般的で柔軟なクラス、すなわちカーネル指数型分布族と、そのスパース版として我々が新たに導入するカーネル変形指数型分布族に拡張する。理論面では、カーネル指数型分布族と変形指数型分布族の両方に対する新たな存在結果を示し、変形版がカーネル指数型分布族と同様の近似能力を持つことを示す。実験により、カーネル変形指数型分布族はデータ領域の複数のコンパクトな領域に注意を向けられることが示された。

Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, Martins et al. (2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families: the latter has sparse support. Farinhas et al. (2021) extended this to use Gaussian mixture attention densities, which are a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families. Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.

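The first sentence of the abstract, attention as an expectation under probability weights, and the contrast between dense and sparse support can be illustrated in a finite toy setting. The sketch below is not the continuous or kernel-based construction of the paper: it compares softmax weights (all strictly positive) with sparsemax weights (some exactly zero), chosen here only as a familiar discrete example of sparse support. The scores and values are made up for illustration.

```python
import math


def softmax(scores):
    """Dense probability weights: every entry is strictly positive."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumulative, tau = 0.0, 0.0
    for j, zj in enumerate(z, start=1):
        cumulative += zj
        # The support is a prefix of the sorted scores; keep the last j that qualifies.
        if 1.0 + j * zj > cumulative:
            tau = (cumulative - 1.0) / j
    return [max(s - tau, 0.0) for s in scores]


def attend(weights, values):
    """Attention output: the expectation of the values under the weights."""
    return sum(w * v for w, v in zip(weights, values))


scores = [2.0, 1.0, -1.0]
values = [10.0, 20.0, 30.0]
print(softmax(scores), attend(softmax(scores), values))
print(sparsemax(scores), attend(sparsemax(scores), values))
```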
翻訳日:2021-11-04 00:04:43 公開日:2021-11-01

# (参考訳) 大規模事前学習型言語モデルによる自然言語処理の最近の進歩:調査

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey ( http://arxiv.org/abs/2111.01243v1 )

ライセンス: CC BY 4.0

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, and Dan Roth

(参考訳) BERTのような、トレーニング済みのトランスフォーマーベースの大規模言語モデルは、自然言語処理(NLP)の分野を大きく変えた。本稿では,これらの大規模言語モデルを用いたNLPタスクの事前学習,微調整,プロンプト,テキスト生成といった手法を用いた最近の研究について述べる。また,事前学習した言語モデルを用いて学習補助やその他の目的のためのデータを生成する手法を提案する。我々は,今後の研究の限界と方向性に関する議論を締めくくっている。

Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. We also present approaches that use pre-trained language models to generate data for training augmentation or other purposes. We conclude with discussions on limitations and suggested directions for future research.

翻訳日:2021-11-03 23:39:17 公開日:2021-11-01

# (参考訳) 単一画像からのアイ・イン・ハンドカメラキャリブレーションの学習

Learning Eye-in-Hand Camera Calibration from a Single Image ( http://arxiv.org/abs/2111.01245v1 )

ライセンス: CC BY 4.0

Eugene Valassakis, Kamil Dreczkowski, Edward Johns

(参考訳) アイ・イン・ハンドカメラのキャリブレーションは、ロボット工学における基本的かつ長く研究されてきた問題である。本稿では、モデルを完全に合成データのみで学習しつつ、単一のRGB画像からこの問題をオンラインで解くための学習ベースの手法について研究する。我々は3つの主要なアプローチを検討する。画像から外部パラメータ行列を直接予測する直接回帰モデル、2次元キーポイントを回帰してからPnPを用いる疎対応モデル、そして回帰した深度マップとセグメンテーションマップを用いてICPによる姿勢推定を可能にする密対応モデルである。実験では、これらの手法を相互に、また確立された古典的手法と比較し、直接回帰が他のアプローチを上回るという驚くべき結果を得た。さらに、これらの結果についての洞察を深めるためにノイズ感度分析を行う。

Eye-in-hand camera calibration is a fundamental and long-studied problem in robotics. We present a study on using learning-based methods for solving this problem online from a single RGB image, whilst training our models with entirely synthetic data. We study three main approaches: one direct regression model that directly predicts the extrinsic matrix from an image, one sparse correspondence model that regresses 2D keypoints and then uses PnP, and one dense correspondence model that uses regressed depth and segmentation maps to enable ICP pose estimation. In our experiments, we benchmark these methods against each other and against well-established classical methods, to find the surprising result that direct regression outperforms other approaches, and we perform noise-sensitivity analysis to gain further insights into these results.

翻訳日:2021-11-03 23:38:23 公開日:2021-11-01

# (参考訳) ヤコビアンスイッチング線形力学系を用いたリバースエンジニアリングリカレントニューラルネットワーク

Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems ( http://arxiv.org/abs/2111.01256v1 )

ライセンス: CC BY 4.0

Jimmy T.H. Smith, Scott W. Linderman, David Sussillo

(参考訳) リカレントニューラルネットワーク(RNN)は時系列データを処理するための強力なモデルであるが、それがどのように機能するかを理解することは依然として難しい。この理解を深めることは、機械学習と神経科学の両方のコミュニティにとって大きな関心事である。訓練済みRNNをその固定点の周りで線形化してリバースエンジニアリングする枠組みは洞察を与えてきたが、このアプローチには大きな課題がある。具体的には、RNNのダイナミクスを調べる際にどの固定点の周りで展開するかを選ぶことの難しさや、線形化したダイナミクスで非線形ダイナミクスを再構成する際の誤差の蓄積である。本稿では、新しいスイッチング線形力学系(SLDS)の定式化とRNNを共同訓練することで、これらの制約を克服する新しいモデルを提案する。共同訓練されたRNNの1次テイラー級数展開と、RNNの固定点を選び出すように訓練された補助関数が、SLDSのダイナミクスを規定する。その結果として、RNNをよく近似する訓練済みのSLDS変種、状態空間の各点に対して固定点を生成できる補助関数、そして可能な限り1次の項が計算を担うようにダイナミクスが正則化された訓練済みの非線形RNNが得られる。このモデルは訓練後の固定点最適化を不要にし、状態空間の任意の点でSLDSの学習済みダイナミクスを曖昧さなく調べることを可能にする。また、スイッチ間でパラメータを共有しながら、SLDSモデルをスイッチング点の連続多様体へと一般化する。RNNのリバースエンジニアリングに関する先行研究に関連する2つの合成タスクで、モデルの有用性を検証する。さらに、LFADSのようなより複雑なアーキテクチャにおいて我々のモデルをドロップインで利用できることを示し、このLFADSハイブリッドを用いて非ヒト霊長類の運動系の単一試行スパイク活動を解析する。

Recurrent neural networks (RNNs) are powerful models for processing time-series data, but it remains challenging to understand how they function. Improving this understanding is of substantial interest to both the machine learning and neuroscience communities. The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges. These include difficulty choosing which fixed point to expand around when studying RNN dynamics and error accumulation when reconstructing the nonlinear dynamics with the linearized dynamics. We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation. A first-order Taylor series expansion of the co-trained RNN and an auxiliary function trained to pick out the RNN's fixed points govern the SLDS dynamics. The results are a trained SLDS variant that closely approximates the RNN, an auxiliary function that can produce a fixed point for each point in state-space, and a trained nonlinear RNN whose dynamics have been regularized such that its first-order terms perform the computation, if possible. This model removes the post-training fixed point optimization and allows us to unambiguously study the learned dynamics of the SLDS at any point in state-space. It also generalizes SLDS models to continuous manifolds of switching points while sharing parameters across switches. We validate the utility of the model on two synthetic tasks relevant to previous work reverse engineering RNNs. We then show that our model can be used as a drop-in in more complex architectures, such as LFADS, and apply this LFADS hybrid to analyze single-trial spiking activity from the motor system of a non-human primate.

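The first-order Taylor expansion around a fixed point that the abstract refers to can be sketched for a scalar RNN. Everything here is an illustrative assumption: a one-dimensional autonomous update `tanh(w * h + b)`, a hand-picked expansion point, and the hypothetical names `rnn_step`, `jacobian` and `linearized_step`. In the paper the expansion points come from a learned auxiliary function, which this sketch does not model. The printout shows the linearized step tracking the nonlinear one closely near the fixed point and drifting further away from it, which is the error accumulation issue the abstract mentions.

```python
import math


def rnn_step(h, w=0.5, b=0.0):
    """Scalar autonomous RNN update h_next = tanh(w * h + b)."""
    return math.tanh(w * h + b)


def jacobian(h, w=0.5, b=0.0):
    """Derivative of rnn_step with respect to h (a 1x1 Jacobian)."""
    return w * (1.0 - math.tanh(w * h + b) ** 2)


def linearized_step(h, h_star, w=0.5, b=0.0):
    """First-order Taylor expansion of rnn_step around the point h_star."""
    return rnn_step(h_star, w, b) + jacobian(h_star, w, b) * (h - h_star)


h_star = 0.0  # with b = 0, tanh(0) = 0, so h_star is a fixed point
for h in (0.05, 0.1, 0.5):
    print(h, rnn_step(h), linearized_step(h, h_star))
```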
翻訳日:2021-11-03 23:14:49 公開日:2021-11-01

# (参考訳) 累積自己回帰自己注意による脳動態

Brain dynamics via Cumulative Auto-Regressive Self-Attention ( http://arxiv.org/abs/2111.01271v1 )

ライセンス: CC BY 4.0

Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis

(参考訳) 多変量動的プロセスは、個々の時系列を表すコンポーネント間の重み付け接続グラフによって直感的に記述されることが多い。このグラフをピアソン相関行列として単純な表現であっても、脳画像文献で示されるように、有益で予測的である。しかしながら、強力なグラフニューラルネットワーク(GNN)は、同様の設定でより良いパフォーマンスを期待されている。本研究では,脳画像アプリケーションにおいて,深部GNNよりもかなり浅く,予測精度に優れるモデルを提案する。本モデルは,各時系列の自己回帰構造を学習し,学習した表現間の有向接続グラフを,エンドツーエンドで自己認識機構を用いて推定する。患者とコントロール間の分類器としてのモデルの教師付きトレーニングにより、有向接続グラフを生成し、各被験者に予測される時系列の構成要素を強調するモデルが得られる。統合失調症患者とコントロールを分類する機能的神経画像データセットについて検討した。

Multivariate dynamical processes can often be intuitively described by a weighted connectivity graph between components representing each individual time-series. Even a simple representation of this graph as a Pearson correlation matrix may be informative and predictive as demonstrated in the brain imaging literature. However, there is a consensus expectation that powerful graph neural networks (GNNs) should perform better in similar settings. In this work, we present a model that is considerably shallower than deep GNNs, yet outperforms them in predictive accuracy in a brain imaging application. Our model learns the autoregressive structure of individual time series and estimates directed connectivity graphs between the learned representations via a self-attention mechanism in an end-to-end fashion. The supervised training of the model as a classifier between patients and controls results in a model that generates directed connectivity graphs and highlights the components of the time-series that are predictive for each subject. We demonstrate our results on a functional neuroimaging dataset classifying schizophrenia patients and controls.

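The simple baseline mentioned in the abstract, a connectivity graph represented as a Pearson correlation matrix between component time series, can be sketched as below. The three short series are made-up toy data and the function names are hypothetical. Note that this matrix is symmetric and therefore undirected, unlike the directed graphs that the paper estimates with self-attention.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm


def connectivity_matrix(series):
    """Symmetric matrix whose (i, j) entry is the correlation of series i and j."""
    return [[pearson(a, b) for b in series] for a in series]


series = [
    [1.0, 2.0, 3.0, 4.0],  # component 0
    [2.0, 4.0, 6.0, 8.0],  # scaled copy of component 0
    [4.0, 3.0, 2.0, 1.0],  # component 0 reversed in time
]
for row in connectivity_matrix(series):
    print([round(r, 3) for r in row])
```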
翻訳日:2021-11-03 22:50:44 公開日:2021-11-01

# (参考訳) マルチネットワークInfoMax:グラフ畳み込みネットワークを含む事前学習手法

Multi network InfoMax: A pre-training method involving graph convolutional networks ( http://arxiv.org/abs/2111.01276v1 )

ライセンス: CC BY 4.0

Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis

(参考訳) データから異なる特徴やそれらの関係を発見することは、分類などのさまざまなタスクにとって重要な知識を見つけるのに役立つ。ニューロイメージングでは、これらの特徴は脳障害の理解、分類、そして場合によっては予防に役立つ可能性がある。高性能な過パラメータ化ディープラーニング(DL)モデルのモデルイントロスペクションは、これらの特徴や関係を見つけるのに役立つ。しかし、DLモデルが高い性能を達成するには、多くの分野ではめったに得られない多数のラベル付きトレーニングサンプル($n$)が必要となる。本稿では、入力サンプルの2つの高レベル埋め込み間の相互情報量を最大化することに基づく、グラフ畳み込み/ニューラルネットワーク(GCN/GNN)を用いた事前学習手法を提案する。最近提案された事前学習手法の多くは、アーキテクチャに含まれうる多数のネットワークのうち1つだけを事前学習する。ほとんどすべてのDLモデルは複数のネットワークのアンサンブルであるため、我々はモデル内の2つの異なるネットワーク(畳み込みネットワークとグラフネットワーク)から高レベル埋め込みを取り出す。学習された高レベルのグラフ潜在表現は、下流のグラフ分類タスクの性能を高め、大量のラベル付きデータサンプルの必要性を回避する。被験者を健常対照群(HC)と統合失調症群(SZ)に分類するニューロイメージングデータセットに本手法を適用する。実験の結果、事前学習したモデルは事前学習していないモデルを大きく上回り、同等の性能を得るのに必要なデータが$50\%$少ないことがわかった。

Discovering distinct features and their relations from data can help us uncover valuable knowledge crucial for various tasks, e.g., classification. In neuroimaging, these features could help to understand, classify, and possibly prevent brain disorders. Model introspection of highly performant overparameterized deep learning (DL) models could help find these features and relations. However, to achieve high performance, DL models require numerous labeled training samples ($n$), which are rarely available in many fields. This paper presents a pre-training method involving graph convolutional/neural networks (GCNs/GNNs), based on maximizing mutual information between two high-level embeddings of an input sample. Many of the recently proposed pre-training methods pre-train one of many possible networks of an architecture. Since almost every DL model is an ensemble of multiple networks, we take our high-level embeddings from two different networks of a model (a convolutional network and a graph network). The learned high-level graph latent representations help increase performance for downstream graph classification tasks and bypass the need for a high number of labeled data samples. We apply our method to a neuroimaging dataset for classifying subjects into healthy control (HC) and schizophrenia (SZ) groups. Our experiments show that the pre-trained model significantly outperforms the non-pre-trained model and requires $50\%$ less data for similar performance.

翻訳日:2021-11-03 22:42:51 公開日:2021-11-01

# PointNu-Net : 臨床現場における複数組織の組織学的核セグメンテーションと分類の同時実行

PointNu-Net: Simultaneous Multi-tissue Histology Nuclei Segmentation and Classification in the Clinical Wild ( http://arxiv.org/abs/2111.01557v1 )

ライセンス: Link先を確認

Kai Yao and Kaizhu Huang and Jie Sun and Amir Hussain and Curran Jude

(参考訳) 自動核セグメンテーションと分類は、デジタル病理学において重要な役割を果たす。しかしながら、以前の作業は、主に多様性とサイズが限定されたデータに基づいており、結果が疑わしいか、あるいは実際のダウンストリームタスクで誤解を招くようにしている。本稿では,「臨床ワイルド」からのデータを扱うことができる信頼性とロバストな手法を構築することを目的とする。具体的には, haematoxylin および eosin (h&e) 染色組織病理データからの核を同時検出, 分割, 分類する新しい方法の検討と, 最近の大規模データセット pannuke を用いたアプローチの評価を行った。本稿では,各核の中心点を決定するために,各核の検出と分類を新しい意味的キーポイント推定問題として扱う。次に、動的インスタンスセグメンテーションを用いて、核中心点に対する対応する類別マスクを求める。 2つの同時実行課題を分離することにより、クラス認識検出とクラス非依存セグメンテーションの恩恵を受け、性能が大幅に向上する。提案手法は19の異なる組織タイプにまたがる核分画と分類において優れた性能を示し,新たなベンチマーク結果を得た。

Automatic nuclei segmentation and classification plays a vital role in digital pathology. However, previous works are mostly built on data with limited diversity and small sizes, making the results questionable or misleading in actual downstream tasks. In this paper, we aim to build a reliable and robust method capable of dealing with data from 'the clinical wild'. Specifically, we study and design a new method to simultaneously detect, segment, and classify nuclei from Haematoxylin and Eosin (H&E) stained histopathology data, and evaluate our approach using the recent largest dataset: PanNuke. We address the detection and classification of each nucleus as a novel semantic keypoint estimation problem to determine the center point of each nucleus. Next, the corresponding class-agnostic masks for nuclei center points are obtained using dynamic instance segmentation. By decoupling two simultaneous challenging tasks, our method can benefit from class-aware detection and class-agnostic segmentation, thus leading to a significant performance boost. We demonstrate the superior performance of our proposed approach for nuclei segmentation and classification across 19 different tissue types, delivering new benchmark results.

翻訳日:2021-11-03 15:09:28 公開日:2021-11-01

# 時系列・計量・機械学習・ディープラーニングモデルを用いた株価予測

Stock Price Prediction Using Time Series, Econometric, Machine Learning, and Deep Learning Models ( http://arxiv.org/abs/2111.01137v1 )

ライセンス: Link先を確認

Ananda Chatterjee, Hrisav Bhowmick, and Jaydip Sen

(参考訳) 研究者たちは長年にわたり、株価予測のための信頼性が高く正確な予測モデルの開発に取り組んできた。文献によれば、予測モデルが正しく設計され洗練されていれば、将来の株価を入念かつ忠実に推定できる。本稿では、株価予測のための時系列モデル、計量経済モデル、および各種の学習ベースのモデルを示す。どのセクターでどのモデルが最も優れているかを知るため、2004年1月から2019年12月までのInfosys、ICICI、SUN PHARMAのデータをモデルの学習とテストに用いた。本稿では、1つの時系列モデル(Holt-Winters指数平滑法)、1つの計量経済モデル(ARIMA)、2つの機械学習モデル(ランダムフォレストとMARS)、2つのディープラーニングベースのモデル(シンプルなRNNとLSTM)を扱う。MARSは最も性能の良い機械学習モデルであり、LSTMは最も性能の良いディープラーニングモデルであることが示された。しかし全体としては、IT(Infosysデータ)、銀行(ICICIデータ)、ヘルス(SUN PHARMAデータ)の3分野すべてにおいて、MARSが販売予測で最も性能の良いモデルであることが示された。

For a long time, researchers have been developing a reliable and accurate predictive model for stock price prediction. According to the literature, if predictive models are correctly designed and refined, they can painstakingly and faithfully estimate future stock values. This paper demonstrates a set of time series, econometric, and various learning-based models for stock price prediction. The data of Infosys, ICICI, and SUN PHARMA from the period of January 2004 to December 2019 was used here for training and testing the models to know which model performs best in which sector. One time series model (Holt-Winters Exponential Smoothing), one econometric model (ARIMA), two machine learning models (Random Forest and MARS), and two deep learning-based models (simple RNN and LSTM) have been included in this paper. MARS has been proved to be the best performing machine learning model, while LSTM has proved to be the best performing deep learning model. But overall, for all three sectors - IT (on Infosys data), Banking (on ICICI data), and Health (on SUN PHARMA data), MARS has proved to be the best performing model in sales forecasting.

翻訳日:2021-11-03 15:09:05 公開日:2021-11-01

# OPF-Learn:AC Optimal Power Flowデータセット作成のためのオープンソースフレームワーク

OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets ( http://arxiv.org/abs/2111.01228v1 )

ライセンス: Link先を確認

Trager Joswig-Jones, Ahmed S. Zamzam, Kyri Baker

(参考訳) 再生可能エネルギー発電の増加により、不確実性を管理するための交流最適潮流(AC OPF)に対するデータ駆動アプローチへの関心が高まっているが、規律あるデータセット作成とベンチマークが欠如しているため、文献中のアプローチ間の有用な比較ができていない。信頼を得るためには、モデルは幅広い運用条件にわたって解を確実に予測できなければならない。本稿では、JuliaおよびPython向けのOPF-Learnパッケージを開発する。本パッケージは計算効率の良い手法を用いて、AC OPF実行可能領域の広い範囲にまたがる代表的なデータセットを作成する。負荷プロファイルは、AC OPFの実行可能集合を含む凸集合から一様にサンプリングされる。見つかった実行不可能な点それぞれについて、緩和された定式化の性質を利用して得られる実行不可能性証明を用いて凸集合を縮小する。このフレームワークは、文献に見られる従来の手法に比べて実行可能空間全体をよりよく代表するデータセットを生成し、機械学習モデルの性能を向上させることが示されている。

Increasing levels of renewable generation motivate a growing interest in data-driven approaches for AC optimal power flow (AC OPF) to manage uncertainty; however, a lack of disciplined dataset creation and benchmarking prohibits useful comparison among approaches in the literature. To instill confidence, models must be able to reliably predict solutions across a wide range of operating conditions. This paper develops the OPF-Learn package for Julia and Python, which uses a computationally efficient approach to create representative datasets that span a wide spectrum of the AC OPF feasible region. Load profiles are uniformly sampled from a convex set that contains the AC OPF feasible set. For each infeasible point found, the convex set is reduced using infeasibility certificates, found by using properties of a relaxed formulation. The framework is shown to generate datasets that are more representative of the entire feasible space versus traditional techniques seen in the literature, improving machine learning model performance.

翻訳日:2021-11-03 15:08:41 公開日:2021-11-01

# マルチスケールモデリングのためのマルチ解像度X線マイクロCT画像の深部学習

Deep learning of multi-resolution X-Ray micro-CT images for multi-scale modelling ( http://arxiv.org/abs/2111.01270v1 )

ライセンス: Link先を確認

Samuel J. Jackson and Yufu Niu and Sojwal Manoorkar and Peyman Mostaghimi and Ryan T. Armstrong

(参考訳) X線マイクロCT(マイクロコンピュータ断層撮影)には視野と解像度の間に本質的なトレードオフがあり、マルチスケール多孔質系のキャラクタリゼーション、解析、モデル開発を制限している。本稿では、3次元のEnhanced Deep Super Resolution(EDSR)畳み込みニューラルネットワークを開発し、低解像度データから広い空間スケールにわたる高解像度データを生成することで、これらのトレードオフを克服する。ベントハイマー岩石試料の対になった高解像度(HR, 2$\mu$m)と低解像度(LR, 6$\mu$m)の画像データを用いてネットワークを訓練する。訓練試料の未使用のLR・HRデータと、異なる微細構造を持つ別の試料を用いて、テキスト分析、セグメンテーション挙動、細孔ネットワークモデル(PNM)による多相流シミュレーションといったさまざまな指標でネットワークを検証する。検証済みのEDSRネットワークを用いて、長さ6-7cmの各フルコア試料に対して約1000個の高解像度REVサブボリューム画像を生成する(画像全体のサイズは約6000x6000x32000ボクセル)。各サブボリュームはPNMから予測された固有の岩石物理特性を持ち、それらを組み合わせて各試料の3次元連続体スケールモデルを作成する。低キャピラリー数でのドレナージ(排水)過程の非混和性流れを、ある範囲の分率流量にわたってシミュレートし、実験で得られた圧力および3次元飽和度と1対1で直接比較する。EDSRで生成したモデルは、不均質性が存在する場合、特に幅広い細孔径分布に遭遇する流れ領域において、ベースとなるLRモデルよりも正確に実験挙動を予測する。モデルは概して、実験の再現性の範囲内で飽和度を、また3桁にわたって相対浸透率を正確に予測する。実証されたワークフローはキャリブレーションなしで完全に予測的であり、他の方法では扱えない真にマルチスケールな不均質系における流れを撮像、シミュレート、解析する可能性を開く。

There are inherent field-of-view and resolution trade-offs in X-Ray micro-computed tomography imaging, which limit the characterization, analysis and model development of multi-scale porous systems. In this paper, we overcome these trade-offs by developing a 3D Enhanced Deep Super Resolution (EDSR) convolutional neural network to create enhanced, high-resolution data over large spatial scales from low-resolution data. Paired high-resolution (HR, 2$\mu$m) and low-resolution (LR, 6$\mu$m) image data from a Bentheimer rock sample are used to train the network. Unseen LR and HR data from the training sample, and another sample with a distinct micro-structure, are used to validate the network with various metrics: textual analysis, segmentation behaviour and pore-network model (PNM) multiphase flow simulations. The validated EDSR network is used to generate ~1000 high-resolution REV subvolume images for each full core sample of length 6-7cm (total image sizes are ~6000x6000x32000 voxels). Each subvolume has distinct petrophysical properties predicted from PNMs, which are combined to create a 3D continuum-scale model of each sample. Drainage immiscible flow at low capillary number is simulated across a range of fractional flows and compared directly to experimental pressures and 3D saturations on a 1:1 basis. The EDSR generated model is more accurate than the base LR model at predicting experimental behaviour in the presence of heterogeneities, especially in flow regimes where a wide distribution of pore sizes is encountered. The models are generally accurate at predicting saturations to within the experimental repeatability and relative permeability across three orders of magnitude. The demonstrated workflow is fully predictive, without calibration, and opens up the possibility to image, simulate and analyse flow in truly multi-scale heterogeneous systems that are otherwise intractable.

翻訳日:2021-11-03 15:08:23 公開日:2021-11-01

# ネスト力学系としてのディープニューラルネットワーク

Deep neural networks as nested dynamical systems ( http://arxiv.org/abs/2111.01297v1 )

ライセンス: Link先を確認

David I. Spivak, Timothy Hosgood

(参考訳) ディープニューラルネットワークと実際の脳との間にはしばしば類推がなされ、それは名称そのものからも示唆されている。すなわち、ディープニューラルネットワークの「ニューロン」は脳のニューロン(混乱を避けるために言えば神経細胞)に対応するはずだというものである。しかし我々は、この類推は型チェックすら通らない、つまり構造的に欠陥があると主張する。ヘッブ学習についての「共に発火する細胞は共に結線される」というやや安易な要約に沿って、本稿ではこの類推は別のものであるべきだと論じる。ディープニューラルネットワークの「ニューロン」は変化する重みを管理しているため、むしろ脳内のシナプスに近い。一方、情報の流れを生み出すという点で、神経細胞に近いのはディープニューラルネットワークのワイヤーのほうである。神経細胞は単なるワイヤー以上のものに思えるという直観はまさに正しく、本稿で探究する厳密な圏論的類推によって正当化される。本稿全体を通じて、「ニューロン」を引用符付きにするか人工ニューロンと呼ぶことで、人工ニューロンと神経細胞を同一視することの誤りを強調し続ける。まず、ディープニューラルネットワークを、非常に制限された相互作用パターンを持つネストされた力学系とみなす方法を説明し、次に、工学全般で有用だが状況の変化に適応できない、力学系のより一般的な相互作用を説明する。前述のとおり、両者が埋め込まれる数学的形式によって、ある類推が必然的に導かれる。その結果得られる包括的な一般化を、我々は深く相互作用する学習システム(deeply interacting learning systems)と呼ぶ。これは制御理論のように複雑な相互作用を持ちながら、ディープニューラルネットワークのように状況に適応する。

There is an analogy that is often made between deep neural networks and actual brains, suggested by the nomenclature itself: the "neurons" in deep neural networks should correspond to neurons (or nerve cells, to avoid confusion) in the brain. We claim, however, that this analogy doesn't even type check: it is structurally flawed. In agreement with the slightly glib summary of Hebbian learning as "cells that fire together wire together", this article makes the case that the analogy should be different. Since the "neurons" in deep neural networks are managing the changing weights, they are more akin to the synapses in the brain; instead, it is the wires in deep neural networks that are more like nerve cells, in that they are what cause the information to flow. An intuition that nerve cells seem like more than mere wires is exactly right, and is justified by a precise category-theoretic analogy which we will explore in this article. Throughout, we will continue to highlight the error in equating artificial neurons with nerve cells by leaving "neuron" in quotes or by calling them artificial neurons. We will first explain how to view deep neural networks as nested dynamical systems with a very restricted sort of interaction pattern, and then explain a more general sort of interaction for dynamical systems that is useful throughout engineering, but which fails to adapt to changing circumstances. As mentioned, an analogy is then forced upon us by the mathematical formalism in which they are both embedded. We call the resulting encompassing generalization deeply interacting learning systems: they have complex interaction as in control theory, but adaptation to circumstances as in deep neural networks.

翻訳日:2021-11-03 15:07:49 公開日:2021-11-01

# 証明可能に効率的で簡潔かつ正確な説明

Provably efficient, succinct, and precise explanations ( http://arxiv.org/abs/2111.01576v1 )

ライセンス: Link先を確認

Guy Blanc, Jane Lange, and Li-Yang Tan

(参考訳) 任意のブラックボックスモデル$f$の予測を説明する問題を考える。すなわち、$f$へのクエリアクセスとインスタンス$x$が与えられたとき、それらを合わせると$f(x)$を本質的に決定するような、$x$の特徴の小さな集合を出力する。我々は、返される説明の簡潔さと精度について証明可能な保証を持つ効率的なアルゴリズムを設計する。従来のアルゴリズムは、効率的だがそのような保証を欠くか、保証は達成するが非効率であるかのいずれかであった。我々のアルゴリズムは、決定木を暗黙的に学習する問題との関連を通じて得られる。この学習タスクが暗黙的であることにより、$f$の複雑さのために扱えないほど大きな代理決定木が必要となる場合でも、効率的なアルゴリズムが可能になる。学習理論、局所計算アルゴリズム、計算複雑性理論の手法を組み合わせることで、暗黙的な学習問題を解く。「暗黙的な学習による説明」という我々のアプローチは、事後説明のためのこれまで別々であった2つの方法、すなわちグローバルな説明とローカルな説明の要素を併せ持ち、両者の利点を享受できると我々は主張する。

We consider the problem of explaining the predictions of an arbitrary blackbox model $f$: given query access to $f$ and an instance $x$, output a small set of $x$'s features that in conjunction essentially determines $f(x)$. We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lacked such guarantees, or achieved such guarantees but were inefficient. We obtain our algorithm via a connection to the problem of implicitly learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of $f$ necessitates an intractably large surrogate decision tree. We solve the implicit learning problem by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of "explaining by implicit learning" shares elements of two previously disparate methods for post-hoc explanations, global and local explanations, and we make the case that it enjoys advantages of both.

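To make the notion of a feature set that "essentially determines $f(x)$" concrete, the sketch below estimates the precision of a candidate explanation using only query access to the model. It is not the algorithm of the paper, which finds such sets efficiently via implicit decision tree learning; it only checks a given set. The uniform distribution over binary features, the toy model `blackbox`, and the function name `precision` are assumptions for illustration. Fixing both relevant features makes the prediction certain, while fixing only one leaves it close to a coin flip.

```python
import random


def precision(f, x, features, num_samples=2000, seed=0):
    """Estimate Pr[f(y) == f(x)] when y agrees with x on `features`
    and every other coordinate is resampled uniformly from {0, 1}."""
    rng = random.Random(seed)
    target = f(x)
    hits = 0
    for _ in range(num_samples):
        y = [xi if i in features else rng.randint(0, 1) for i, xi in enumerate(x)]
        hits += f(y) == target
    return hits / num_samples


def blackbox(x):
    """Toy model used only through queries: AND of the first two features."""
    return x[0] and x[1]


x = [1, 1, 0, 1]
print(precision(blackbox, x, features={0, 1}))  # both relevant features fixed
print(precision(blackbox, x, features={0}))     # one relevant feature left free
```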
翻訳日:2021-11-03 15:07:01 公開日:2021-11-01

# VOICeデータセットのノイズを含む音声に対するYou Only Hear Once(YOHO)アルゴリズムの頑健性評価

Evaluating robustness of You Only Hear Once(YOHO) Algorithm on noisy audios in the VOICe Dataset ( http://arxiv.org/abs/2111.01205v1 )

ライセンス: Link先を確認

Soham Tiwari, Kshitiz Lakhotia, Manjunath Mulimani

(参考訳) マシンリスニングにおける音響イベント検出(SED)は、音声ファイル内のさまざまな音を識別し、音声中の特定の音イベントの開始時刻と終了時刻を特定するものである。SEDは、音声監視、音声認識、マルチメディアデータベース内のデータの文脈に基づくインデックス作成や検索など、さまざまな用途で利用されている。しかし、現実のシナリオでは、さまざまな音源からの音声に干渉するノイズや外乱が含まれていないことはまれである。本稿では、ノイズの多い音声データに対するYou Only Hear Once(YOHO)アルゴリズムの性能を検証する。コンピュータビジョンにおけるYou Only Look Once(YOLO)アルゴリズムに着想を得たYOHOアルゴリズムは、Music Speech Detection Dataset、TUT Sound Event、Urban-SEDなどのデータセットにおいて、さまざまな最先端アルゴリズムと同等の性能を、より短い推論時間で達成できる。本稿では、異なる音対雑音比(SNR)のノイズを含む音声ファイルからなるVOICeデータセットにおけるYOHOアルゴリズムの性能を調べる。YOHOは、VOICeデータセットの論文で報告された最高性能のSEDアルゴリズムを上回るか、少なくとも同等の性能を示し、より短い時間で推論を行うことができた。

Sound event detection (SED) in machine listening entails identifying the different sounds in an audio file and identifying the start and end time of a particular sound event in the audio. SED finds use in various applications such as audio surveillance, speech recognition, and context-based indexing and retrieval of data in a multimedia database. However, in real-life scenarios, the audios from various sources are seldom devoid of any interfering noise or disturbance. In this paper, we test the performance of the You Only Hear Once (YOHO) algorithm on noisy audio data. Inspired by the You Only Look Once (YOLO) algorithm in computer vision, the YOHO algorithm can match the performance of the various state-of-the-art algorithms on datasets such as Music Speech Detection Dataset, TUT Sound Event, and Urban-SED datasets but at lower inference times. In this paper, we explore the performance of the YOHO algorithm on the VOICe dataset containing audio files with noise at different sound-to-noise ratios (SNR). YOHO could outperform or at least match the best performing SED algorithms reported in the VOICe dataset paper and make inferences in less time.

翻訳日:2021-11-03 14:43:48 公開日:2021-11-01

# 生成するな - Sinkhornダイバージェンスを用いた差分プライベート生成モデルの学習

Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence ( http://arxiv.org/abs/2111.01177v1 )

ライセンス: Link先を確認

Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis

(参考訳) 大量のデータで訓練された機械学習モデルはいくつかの分野でブレークスルーをもたらしてきたが、データへのアクセスが制限されているため、プライバシーに敏感な領域での展開は依然として限られている。プライベートデータに対してプライバシー制約付きで訓練された生成モデルは、この課題を回避し、代わりにプライベートデータへの間接的なアクセスを提供できる。本稿では、プライベートデータから差分プライバシーを満たしつつデータ分布を学習するための、最適輸送に基づく新しい生成手法であるDP-Sinkhornを提案する。DP-Sinkhornは、厳密な最適輸送距離の計算効率の良い近似であるSinkhornダイバージェンスを、モデルとデータの間で差分プライベートな方法で最小化し、勾配推定のバイアス-分散トレードオフを制御する新しい手法を用いる。主に敵対的生成ネットワークに基づいている、差分プライベートな生成モデルを訓練する既存のアプローチとは異なり、我々は敵対的目的関数に依存しない。敵対的目的関数は、特にプライバシー制約によって課されるノイズの存在下では、最適化が難しいことで知られている。したがって、DP-Sinkhornは訓練と展開が容易である。実験では、複数の画像モデリングベンチマークで最先端を上回り、情報量のあるRGB画像の差分プライベートな合成を示す。プロジェクトページ: https://nv-tlabs.github.io/DP-Sinkhorn

Although machine learning models trained on massive data have led to breakthroughs in several areas, their deployment in privacy-sensitive domains remains limited due to restricted access to data. Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead. We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. DP-Sinkhorn minimizes the Sinkhorn divergence, a computationally efficient approximation to the exact optimal transport distance, between the model and data in a differentially private manner and uses a novel technique for controlling the bias-variance trade-off of gradient estimates. Unlike existing approaches for training differentially private generative models, which are mostly based on generative adversarial networks, we do not rely on adversarial objectives, which are notoriously difficult to optimize, especially in the presence of noise imposed by privacy constraints. Hence, DP-Sinkhorn is easy to train and deploy. Experimentally, we improve upon the state-of-the-art on multiple image modeling benchmarks and show differentially private synthesis of informative RGB images. Project page: https://nv-tlabs.github.io/DP-Sinkhorn.

翻訳日:2021-11-03 14:38:45 公開日:2021-11-01
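
As a rough illustration of the Sinkhorn divergence named in the DP-Sinkhorn abstract above, the following pure-Python sketch computes an entropy-regularized transport cost between two small 1-D point sets and debiases it with the two self-terms. It is not the authors' method: there is no differential privacy and no generator here, and the function names, the squared-distance cost, the uniform weights, and the value of `eps` are all assumptions chosen for illustration.

```python
import math


def entropic_ot_cost(xs, ys, eps=1.0, iters=200):
    """Transport cost <P, C> of the entropy-regularized plan P between
    two uniform-weight 1-D point sets, found by Sinkhorn iterations."""
    n, m = len(xs), len(ys)
    # Squared-distance ground cost (an assumption for this sketch).
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def sinkhorn_divergence(xs, ys, eps=1.0):
    """Debiased form: W(x, y) - (W(x, x) + W(y, y)) / 2."""
    return (entropic_ot_cost(xs, ys, eps)
            - 0.5 * entropic_ot_cost(xs, xs, eps)
            - 0.5 * entropic_ot_cost(ys, ys, eps))
```

Subtracting the self-terms makes the divergence of a point set with itself zero, which the raw regularized cost does not guarantee. For large costs or small `eps`, the kernel entries underflow, so a practical implementation would work in the log domain.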

# DAGに基づく分散型フェデレーション学習による暗黙的なモデル特殊化

Implicit Model Specialization through DAG-based Decentralized Federated Learning ( http://arxiv.org/abs/2111.01257v1 )

ライセンス: Link先を確認

Jossekin Beilharz, Bjarne Pfitzner, Robert Schmid, Paul Geppert, Bernd Arnrich, and Andreas Polze

(参考訳) フェデレーション学習(連合学習)により、分散したクライアントのグループは、プライベートデータ上で共通の機械学習モデルをトレーニングできる。モデル更新の交換は、中央のエンティティによって、あるいは例えばブロックチェーンなどにより分散的に管理される。しかし、すべてのクライアントにわたって強く一般化するため、これらのアプローチは独立同分布でない(非IID)データには適さない。モデル更新の有向非巡回グラフ(DAG)に基づく、フェデレーション学習における分散化とパーソナライズへの統一的なアプローチを提案する。単一のグローバルモデルをトレーニングする代わりに、クライアントはローカルデータに特化しつつ、それぞれのデータの類似性に応じて他のクライアントからのモデル更新を利用する。この特殊化は、DAGベースの通信とモデル更新の選択から暗黙的に現れる。このように、データのサブセットに焦点を当てた特殊なモデルの進化を可能にすることで、集中型あるいはブロックチェーンベースのセットアップでのフェデレーション学習よりも、非IIDデータをうまくカバーできる。私たちの知る限りでは、提案するソリューションは、完全に分散化されたフェデレーション学習において、パーソナライゼーションとポイズニング耐性を統合する最初の方法である。評価の結果,3つの異なるデータセットにおいて、モデル更新のDAGに基づく通信からモデルの特殊化が直接現れることがわかった。さらに,フェデレート平均化と比較してモデル精度が安定し,クライアント間のばらつきも小さいことを示す。

Federated learning allows a group of distributed clients to train a common machine learning model on private data. The exchange of model updates is managed either by a central entity or in a decentralized way, e.g. by a blockchain. However, the strong generalization across all clients makes these approaches unsuited for non-independent and identically distributed (non-IID) data. We propose a unified approach to decentralization and personalization in federated learning that is based on a directed acyclic graph (DAG) of model updates. Instead of training a single global model, clients specialize on their local data while using the model updates from other clients dependent on the similarity of their respective data. This specialization implicitly emerges from the DAG-based communication and selection of model updates. Thus, we enable the evolution of specialized models, which focus on a subset of the data and therefore cover non-IID data better than federated learning in a centralized or blockchain-based setup. To the best of our knowledge, the proposed solution is the first to unite personalization and poisoning robustness in fully decentralized federated learning. Our evaluation shows that the specialization of models emerges directly from the DAG-based communication of model updates on three different datasets. Furthermore, we show stable model accuracy and less variance across clients when compared to federated averaging.

翻訳日:2021-11-03 14:38:22 公開日:2021-11-01

# 潜在状態と変更点検出のためのネットワーククラスタリング

Network Clustering for Latent State and Changepoint Detection ( http://arxiv.org/abs/2111.01273v1 )

ライセンス: Link先を確認

Madeline Navarro and Genevera I. Allen and Michael Weylandt

(参考訳) ネットワークモデルは、幅広い構造化データソースを分析するための強力で柔軟なフレームワークを提供する。しかし、多くの状況において、基礎となる現象の異なる側面を捉えたり、時間とともに変化する振る舞いを捉えるために複数のネットワークを構築することができる。このような環境では、共通構造のパターンを特定するために、関連するネットワークをクラスタリングすることがしばしば有用である。本稿では,ネットワーククラスタリングの課題に対する凸アプローチを提案する。提案手法では,コンベックス融合ペナルティを用いてスムースなツリー状クラスタ構造を誘導し,クラスタ数を事前に選択する必要がなくなる。コンベックスネットワーククラスタリングのための効率的なアルゴリズムを提案し,その有効性を合成例で示す。

Network models provide a powerful and flexible framework for analyzing a wide range of structured data sources. In many situations of interest, however, multiple networks can be constructed to capture different aspects of an underlying phenomenon or to capture changing behavior over time. In such settings, it is often useful to cluster together related networks in an attempt to identify patterns of common structure. In this paper, we propose a convex approach for the task of network clustering. Our approach uses a convex fusion penalty to induce a smoothly-varying tree-like cluster structure, eliminating the need to select the number of clusters a priori. We provide an efficient algorithm for convex network clustering and demonstrate its effectiveness on synthetic examples.

翻訳日:2021-11-03 14:37:55 公開日:2021-11-01

# 透明、解釈可能、簡潔、シミュレーション可能な(TIPS)機械学習モデルを用いたCOVID-19パンデミックダイナミクスのモデル化 : システム思考とシステム同定の観点からのケーススタディ

Modelling COVID-19 Pandemic Dynamics Using Transparent, Interpretable, Parsimonious and Simulatable (TIPS) Machine Learning Models: A Case Study from Systems Thinking and System Identification Perspectives ( http://arxiv.org/abs/2111.01763v1 )

ライセンス: Link先を確認

Hua-Liang Wei and S.A. Billings

(参考訳) 新型コロナウイルス(COVID-19)の流行以降、パンデミックのダイナミクスに関する膨大な数の論文が文献に登場しており、その多くは感受性者・感染者・除去者(SIR)モデルや感受性者・曝露者・感染者・除去者(SEIR)モデル、あるいはその変種を用いて、コロナウイルスの感染拡大をシミュレートし研究している。 SIRとSEIRは、常微分方程式(ODE)の初期値問題(IVP)のクラスである連続時間モデルである。回帰や機械学習などの離散時間モデルもCOVID-19のパンデミックデータの分析(例:感染者数の予測)に応用されているが、これらの手法のほとんどは、事前知識に基づいて事前選択された少数の入力変数を含む単純化されたモデルを使用するか、非常に複雑なモデル(例:ディープラーニング)を使用しており、特定の予測目的にのみ焦点を当て、モデルの解釈可能性にはほとんど注意を払っていない。再生産数(R値)、感染者数、死亡者数の間などに内在する時間差や時間遅れの関係の調査に焦点をあて、システム思考と動的視点からパンデミックの拡大を分析した研究は比較的少ない。本研究は, システム工学とシステム同定のアプローチを用いて, 透明(Transparent), 解釈可能(Interpretable), 簡潔(Parsimonious), シミュレーション可能(Simulatable)な(TIPS)動的機械学習モデルを構築し, R値とCOVID-19による感染者数および死亡者数との関連を確立することを初めて提案する。 TIPSモデルは、よく知られたNARMAX(Nonlinear AutoRegressive Moving Average with eXogenous inputs)モデルに基づいて開発されており、COVID-19パンデミックのダイナミクスをよりよく理解するのに役立つ。英国のCOVID-19データに関するケーススタディを実施し、新たな知見を詳述する。提案手法と関連する新たな知見は、COVID-19パンデミックの感染拡大のダイナミクスをよりよく理解するために有用である。

Since the outbreak of COVID-19, an astronomical number of publications on the pandemic dynamics have appeared in the literature, many of which use the susceptible-infected-removed (SIR) and susceptible-exposed-infected-removed (SEIR) models, or their variants, to simulate and study the spread of the coronavirus. SIR and SEIR are continuous-time models, which are a class of initial value problems (IVPs) of ordinary differential equations (ODEs). Discrete-time models such as regression and machine learning have also been applied to analyze COVID-19 pandemic data (e.g., predicting infection cases), but most of these methods either use simplified models involving a small number of input variables pre-selected based on a priori knowledge, or use very complicated models (e.g., deep learning), purely focusing on certain prediction purposes and paying little attention to model interpretability. Relatively few studies have investigated the inherent time-lagged or time-delayed relationships, e.g., between the reproduction number (R number), infection cases, and deaths, analyzing the pandemic spread from a systems thinking and dynamic perspective. The present study, for the first time, proposes using a systems engineering and system identification approach to build transparent, interpretable, parsimonious and simulatable (TIPS) dynamic machine learning models, establishing links between the R number, the infection cases and deaths caused by COVID-19. The TIPS models are developed based on the well-known NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous inputs) model, which can help better understand the COVID-19 pandemic dynamics. A case study on the UK COVID-19 data is carried out, and new findings are detailed. The proposed method and the associated new findings are useful for better understanding the spread dynamics of the COVID-19 pandemic.

翻訳日:2021-11-03 14:33:28 公開日:2021-11-01
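
The abstract above describes SIR as an initial value problem for a system of ODEs. The sketch below shows what that means in practice with a forward-Euler integration of the standard SIR equations on population fractions. It illustrates only the baseline continuous-time model that the abstract contrasts with, not the TIPS/NARMAX models; the function name, step size, and parameter values are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, dt=0.1, steps=2000):
    """Forward-Euler integration of the SIR initial value problem
        dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I
    on population fractions. Returns the list of (S, I, R) states."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # flow S -> I during one step
        new_removals = gamma * i * dt        # flow I -> R during one step
        s = s - new_infections
        i = i + new_infections - new_removals
        r = r + new_removals
        history.append((s, i, r))
    return history


# Hypothetical parameters: transmission rate 0.3, removal rate 0.1 per time unit.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0)
```

Because every flow leaves one compartment and enters another, the three fractions always sum to the initial total, which is a convenient sanity check for any implementation.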

# 位相層による二次分散関数有向非巡回グラフの効率的な学習

Efficient Learning of Quadratic Variance Function Directed Acyclic Graphs via Topological Layers ( http://arxiv.org/abs/2111.01560v1 )

ライセンス: Link先を確認

Wei Zhou and Xin He and Wei Zhong and Junhui Wang

(参考訳) 有向非巡回グラフ(DAG)モデルは、多くのアプリケーション領域における確率変数間の因果関係を表現するために広く用いられている。本稿では,親が与えられたときの各ノードの条件付き分散が条件付き平均の二次関数である、非ガウス型DAGモデルの特殊なクラスについて述べる。このような非ガウス型DAGモデルのクラスはかなり柔軟であり、ポアソン分布、二項分布、幾何分布、指数分布、ガンマ分布を含む多くの一般的な分布を特別な場合として含む。学習を容易にするために,トポロジカル層という新しい概念を導入し,効率的なDAG学習アルゴリズムを開発した。このアルゴリズムは、まず階層的な方法でトポロジカル層を再構築し、次に異なる層のノード間の有向エッジを復元するもので、文献にある既存のアルゴリズムの多くよりも計算コストがはるかに小さい。その利点は、多くのシミュレーション例や、NBA選手の統計データとAlibabaが収集した化粧品販売データを含む2つの実データセットへの応用でも示されている。

Directed acyclic graph (DAG) models are widely used to represent causal relationships among random variables in many application domains. This paper studies a special class of non-Gaussian DAG models, where the conditional variance of each node given its parents is a quadratic function of its conditional mean. Such a class of non-Gaussian DAG models is fairly flexible and admits many popular distributions as special cases, including Poisson, Binomial, Geometric, Exponential, and Gamma. To facilitate learning, we introduce a novel concept of topological layers, and develop an efficient DAG learning algorithm. It first reconstructs the topological layers in a hierarchical fashion and then recovers the directed edges between nodes in different layers, which requires much less computational cost than most existing algorithms in the literature. Its advantage is also demonstrated in a number of simulated examples, as well as in applications to two real-life datasets: NBA player statistics data and cosmetic sales data collected by Alibaba.

翻訳日:2021-11-03 14:22:44 公開日:2021-11-01
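
To make the "variance is a quadratic function of the mean" property in the abstract above concrete, the sketch below checks it for the five listed distributions using their textbook means and variances. The parameterization Var = beta0 * mean + beta1 * mean^2 (a quadratic with no constant term) and the function names are assumptions made for this illustration, and the geometric distribution is taken to count failures before the first success.

```python
def qvf_variance(mean, beta0, beta1):
    """Quadratic variance function: Var = beta0 * mean + beta1 * mean**2."""
    return beta0 * mean + beta1 * mean ** 2


def qvf_examples(lam=3.0, n=10, p=0.3, rate=2.0, shape=4.0, scale=0.5):
    """Map each distribution to (mean, variance, beta0, beta1)."""
    return {
        "Poisson": (lam, lam, 1.0, 0.0),
        "Binomial": (n * p, n * p * (1 - p), 1.0, -1.0 / n),
        # Geometric counted as failures before the first success.
        "Geometric": ((1 - p) / p, (1 - p) / p ** 2, 1.0, 1.0),
        "Exponential": (1 / rate, 1 / rate ** 2, 0.0, 1.0),
        "Gamma": (shape * scale, shape * scale ** 2, 0.0, 1.0 / shape),
    }


for name, (mean, var, b0, b1) in qvf_examples().items():
    # The closed-form variance matches the quadratic function of the mean.
    assert abs(qvf_variance(mean, b0, b1) - var) < 1e-12, name
```

In a DAG model of this kind, `mean` would be the conditional mean of a node given its parents, so the same identity holds conditionally at every node.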

# Arch-Net: アーキテクチャに依存しないモデル展開のためのモデル蒸留

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ( http://arxiv.org/abs/2111.01135v1 )

ライセンス: Link先を確認

Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou

(参考訳) ディープニューラルネットワークの計算能力に対する膨大な要求は、現実世界への応用にとって大きなハードルとなる。最近の特定用途向け集積回路(ASIC)チップの多くは、ニューラルネットワークアクセラレーション専用のハードウェアサポートを備えている。しかし、ASICは開発に数年を要するため、ニューラルアーキテクチャ研究の最新の進展に必然的に追い越される。例えば、トランスフォーマーネットワークは、多くの人気のあるチップでネイティブサポートされていないため、デプロイが難しい。本稿では,ASICのほとんどのアーキテクチャで効率的にサポートされるオペレータのみからなるニューラルネットワークのファミリーであるArch-Netを提案する。 Arch-Netを生成する際、Layer NormalizationやEmbedding Layerのようなあまり一般的でないネットワーク構成要素は、ラベル不要のBlockwise Model Distillationを通じて段階的に除去され、同時に性能を最大化するためにサブ8ビット量子化が行われる。機械翻訳と画像分類タスクの実証結果から,最新のニューラルアーキテクチャを、高速に動作し同等の精度を持つArch-Netに変換し,複数の量産ASICチップにデプロイできることを確認した。コードはhttps://github.com/megvii-research/Arch-Netで公開予定である。

The vast computational requirements of Deep Neural Networks are a major hurdle to their real-world applications. Many recent Application Specific Integrated Circuit (ASIC) chips feature dedicated hardware support for Neural Network Acceleration. However, as ASICs take multiple years to develop, they are inevitably out-paced by the latest development in Neural Architecture Research. For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy. In this paper, we propose Arch-Net, a family of Neural Networks made up of only operators efficiently supported across most architectures of ASICs. When an Arch-Net is produced, less common network constructs, like Layer Normalization and Embedding Layers, are eliminated in a progressive manner through label-free Blockwise Model Distillation, while performing sub-8-bit quantization at the same time to maximize performance. Empirical results on machine translation and image classification tasks confirm that we can transform the latest Neural Architectures into fast-running and equally accurate Arch-Nets, ready for deployment on multiple mass-produced ASIC chips. The code will be available at https://github.com/megvii-research/Arch-Net.

翻訳日:2021-11-03 14:21:36 公開日:2021-11-01

# グラフベーススーパービジョンを用いたシーケンストランスダクション

Sequence Transduction with Graph-based Supervision ( http://arxiv.org/abs/2111.01272v1 )

ライセンス: Link先を確認

Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

(参考訳) リカレントニューラルネットワークトランスデューサ(RNN-T)の目的関数は、今日の最高の製品向け自動音声認識(ASR)システムを構築する上で大きな役割を果たしている。コネクショニスト時間分類(CTC)の目的関数と同様に、RNN-T損失は、全和(full-sum)トレーニングのための格子を形成するアライメントの集合をどのように生成するかを定義する特定のルールを使用する。しかし、これらのルールが最適であり、最良のASR結果をもたらすかどうかは、まだほとんど分かっていない。本研究では,RNN-T損失を一般化してラベルのグラフ表現を受け入れる新たなトランスデューサ目的関数を提案する。これにより、アライメントの制限や異なる遷移規則の検討など、トレーニング格子を操作するための柔軟で効率的なフレームワークが得られる。 CTCのような格子を持つトランスデューサベースのASRは、標準のRNN-Tよりも優れた結果を達成すると同時に、厳密に単調なアライメントを保証し、復号処理のより良い最適化を可能にすることを実証する。例えば、提案したCTCライクなトランスデューサシステムは、LibriSpeechのtest-other条件で単語誤り率5.9%を達成し、これは同等のRNN-Tベースのシステムに対して相対4.8%の改善に相当する。

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if these rules are optimal and do lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, for example for restricting alignments or studying different transition rules. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results compared to standard RNN-T, while also ensuring a strictly monotonic alignment, which will allow better optimization of the decoding procedure. For example, the proposed CTC-like transducer system achieves a word error rate of 5.9% for the test-other condition of LibriSpeech, corresponding to an improvement of 4.8% relative to an equivalent RNN-T based system.

翻訳日:2021-11-03 14:19:25 公開日:2021-11-01

# 大規模デジタル実験における機械学習を用いた因果分節解析の枠組み

A framework for causal segmentation analysis with machine learning in large-scale digital experiments ( http://arxiv.org/abs/2111.01223v1 )

ライセンス: Link先を確認

Nima S. Hejazi, Wenjing Zheng, Sathya Anand

(参考訳) 本稿では,大規模デジタル実験において,ユーザのサブグループ間での処置の効果の違いを明らかにすることを目的とした,因果セグメント発見のためのエンドツーエンドの方法論フレームワークを提案する。因果推論とノン/セミパラメトリック統計の最近の進展に基づき,(1)サブグループ固有の処置効果に基づいて、候補となる処置から利益を得られるユーザセグメントを発見すること,(2)予測されたセグメント固有の利益や害に基づいて、ユニットを研究の処置群に動的に割り当てることの因果的影響を評価すること,の2つの目的を統一した。提案手法はモデル非依存で、最先端の機械学習アルゴリズムを推定手順に組み込むことができ、ランダム化A/Bテストや準実験に適用できる。オープンソースのRパッケージ実装であるsherlockを紹介する。

We present an end-to-end methodological framework for causal segment discovery that aims to uncover differential impacts of treatments across subgroups of users in large-scale digital experiments. Building on recent developments in causal inference and non/semi-parametric statistics, our approach unifies two objectives: (1) the discovery of user segments that stand to benefit from a candidate treatment based on subgroup-specific treatment effects, and (2) the evaluation of causal impacts of dynamically assigning units to a study's treatment arm based on their predicted segment-specific benefit or harm. Our proposal is model-agnostic, capable of incorporating state-of-the-art machine learning algorithms into the estimation procedure, and is applicable in randomized A/B tests and quasi-experiments. An open source R package implementation, sherlock, is introduced.

翻訳日:2021-11-03 14:17:02 公開日:2021-11-01

# 意図しない選択:持続的資格化率格差と介入

Unintended Selection: Persistent Qualification Rate Disparities and Interventions ( http://arxiv.org/abs/2111.01201v1 )

ライセンス: Link先を確認

Reilly Raab, Yang Liu

(参考訳) 機械学習におけるグループレベルの格差のダイナミクスを現実的に、かつ公平にモデリングすることは、まだ未解決の問題である。特に、人為的に分けられた人々のグループ間に固有の違いを仮定せず、むしろ孤立した部分集団の不均等な初期条件に訴えることで格差を内生的に説明するモデルが望まれる。本論文では、各エージェントは、資格(例えばローンに対する資格)を表す「真の」バイナリラベル$Y$に基づく実数値の特徴量$X$(例えばクレジットスコア)を持つ。各エージェントは、(1) $X$を観測するベイズ最適な機械学習分類器からバイナリ分類ラベル$\hat{Y}$(例えばローン承認)を受け取ることと、(2)自身が属する孤立したエージェントグループ$G$内で成功している戦略(例えば昇給を求める)を模倣することによって資格$Y$を更新し得ることを、交互に行う。我々は、異なるグループ間での資格率$\Pr(Y=1)$の格差と、全体の母集団で繰り返し再訓練されるベイズ最適分類器の列のもとでこの格差がどのように変化するかを考える。我々は,模倣過程のクラスから導かれるレプリケータ方程式を用いて,各部分集団(グループ)の資格率の進化をモデル化する。分類器の均一配置により、初期の資格密度以外のすべての面でグループが同一であっても,非自明な平衡状態の集合に対して部分集団間の資格率の差が無期限に持続し得ることを示す。次に,一般に提案されている公平性介入が、グループレベルの資格率格差を恒久的に排除できる新しいフィードバック制御機構とともに,この力学系に与える影響をシミュレーションする。最後に、モデルと知見の限界について議論し、将来の研究の可能性を概説する。

Realistically -- and equitably -- modeling the dynamics of group-level disparities in machine learning remains an open problem. In particular, we desire models that do not suppose inherent differences between artificial groups of people -- but rather endogenize disparities by appeal to unequal initial conditions of insular subpopulations. In this paper, agents each have a real-valued feature $X$ (e.g., credit score) informed by a "true" binary label $Y$ representing qualification (e.g., for a loan). Each agent alternately (1) receives a binary classification label $\hat{Y}$ (e.g., loan approval) from a Bayes-optimal machine learning classifier observing $X$ and (2) may update their qualification $Y$ by imitating successful strategies (e.g., seek a raise) within an isolated group $G$ of agents to which they belong. We consider the disparity of qualification rates $\Pr(Y=1)$ between different groups and how this disparity changes subject to a sequence of Bayes-optimal classifiers repeatedly retrained on the global population. We model the evolving qualification rates of each subpopulation (group) using the replicator equation, which derives from a class of imitation processes. We show that differences in qualification rates between subpopulations can persist indefinitely for a set of non-trivial equilibrium states due to uniformed classifier deployments, even when groups are identical in all aspects except initial qualification densities. We next simulate the effects of commonly proposed fairness interventions on this dynamical system along with a new feedback control mechanism capable of permanently eliminating group-level qualification rate disparities. We conclude by discussing the limitations of our model and findings and by outlining potential future work.

翻訳日:2021-11-03 14:13:31 公開日:2021-11-01
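
The abstract above models each group's qualification rate with the replicator equation. As a minimal sketch of that kind of dynamics, the code below integrates the two-strategy replicator equation dq/dt = q(1 - q)(f_1(q) - f_0(q)) for two isolated groups that differ only in their initial rate. The payoff gap `coordination_gap` is a hypothetical stand-in chosen so that two stable equilibria exist; in the paper the payoffs would instead come from the deployed classifier, which is not reproduced here.

```python
def replicator_step(q, fitness_gap, dt):
    """One Euler step of dq/dt = q * (1 - q) * (f_1(q) - f_0(q)),
    where q is the fraction of a group that is qualified."""
    return q + dt * q * (1.0 - q) * fitness_gap(q)


def simulate_group(q0, fitness_gap, dt=0.1, steps=2000):
    """Evolve one isolated group's qualification rate from q0."""
    q = q0
    for _ in range(steps):
        q = replicator_step(q, fitness_gap, dt)
    return q


def coordination_gap(q):
    """Hypothetical payoff gap: being qualified pays off only when more
    than half of one's own group is already qualified."""
    return q - 0.5


# Two groups, identical except for their initial qualification rates.
low_group = simulate_group(0.3, coordination_gap)
high_group = simulate_group(0.7, coordination_gap)
```

Under this assumed payoff the two groups settle at opposite equilibria, so the initial gap persists indefinitely even though both groups follow the same rule, which is the qualitative behavior the abstract describes.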

# Minimax Optimization: Convex-submodular の場合

Minimax Optimization: The Case of Convex-Submodular ( http://arxiv.org/abs/2111.01262v1 )

ライセンス: Link先を確認

Arman Adibi, Aryan Mokhtari, Hamed Hassani

(参考訳) ミニマックス最適化は、機械学習、ゲーム理論、制御理論における様々な応用に取り組む上で中心的な役割を果たしてきた。これまでの文献は主に連続領域におけるそのような問題の研究に重点を置いており、例えば凸凹(convex-concave)ミニマックス最適化は現在かなりの程度まで理解されている。それでも、ミニマックス問題は連続領域をはるかに超えて、連続・離散混合領域や完全離散領域にまで及ぶ。本稿では、最小化がユークリッド空間に属する連続変数に関するもので、最大化が与えられた基底集合の部分集合に関するものである、連続・離散混合ミニマックス問題について研究する。目的関数が連続変数に関して凸であり、離散変数に関して劣モジュラ(submodular)であるような、凸-劣モジュラミニマックス問題のクラスを導入する。このような問題は機械学習アプリケーションに頻繁に現れるが、アルゴリズム的および理論的観点からそれらに対処する方法についてはほとんど分かっていない。このような問題に対して、まず、鞍点を得ることはいかなる近似に対しても困難であることを示し、それゆえ(近似)最適性の新たな概念を導入する。次に, 凸かつ単調劣モジュラなミニマックス問題を解くためのいくつかのアルゴリズム手法を提示し,我々の最適性の概念に基づいて,その収束率、計算複雑性、最終解の質を特徴付ける。提案アルゴリズムは反復的であり、離散最適化と連続最適化の両方のツールを組み合わせる。最後に,提案手法の有効性を示す数値実験を行う。

Minimax optimization has been central in addressing various applications in machine learning, game theory, and control theory. Prior literature has thus far mainly focused on studying such problems in the continuous domain, e.g., convex-concave minimax optimization is now understood to a significant extent. Nevertheless, minimax problems extend far beyond the continuous domain to mixed continuous-discrete domains or even fully discrete domains. In this paper, we study mixed continuous-discrete minimax problems where the minimization is over a continuous variable belonging to Euclidean space and the maximization is over subsets of a given ground set. We introduce the class of convex-submodular minimax problems, where the objective is convex with respect to the continuous variable and submodular with respect to the discrete variable. Even though such problems appear frequently in machine learning applications, little is known about how to address them from algorithmic and theoretical perspectives. For such problems, we first show that obtaining saddle points is hard up to any approximation, and thus introduce new notions of (near-) optimality. We then provide several algorithmic procedures for solving convex and monotone-submodular minimax problems and characterize their convergence rates, computational complexity, and quality of the final solution according to our notions of optimality. Our proposed algorithms are iterative and combine tools from both discrete and continuous optimization. Finally, we provide numerical experiments to showcase the effectiveness of our proposed methods.

翻訳日:2021-11-03 14:13:00 公開日:2021-11-01

# Switch Point biased Self-Training: コードスイッチングのための事前学習モデルの再利用

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching ( http://arxiv.org/abs/2111.01231v1 )

ライセンス: Link先を確認

Parul Chopra, Sai Krishna Rallabandi, Alan W Black, Khyathi Raghavi Chandu

(参考訳) コードスイッチング(CS)は、多言語コミュニティにおいてコミュニケーションを容易にすることから広く見られる現象であるが、言語処理においては依然として研究が進んでいない問題である。その主な理由は、(1)事前学習された大規模多言語モデルを活用する取り組みが少ないこと、(2)注釈付きデータが不足していることである。 CSにおいて多言語モデルの性能が低くなる特徴的なケースは、スイッチポイントを生む文内での言語の混合である。まず、POSとNERという2つの系列ラベリングタスクを4つの異なる言語ペアで一連の事前学習モデルを用いてベンチマークし,問題を特定するとともに、その中で最も性能の高いモデルであるchar-BERTを選択する((1)への対処)。次に,未注釈データを活用することで,スイッチポイントバイアスを用いて既存の事前学習モデルを再利用する自己学習手法を提案する((2)への対処)。最後に、両タスクにおいて2つの異なる言語ペアで全体的な性能を維持しながら,スイッチポイントにおける性能のギャップを小さくすることで,我々のアプローチが両タスクで良好に動作することを示す。コードはhttps://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training で利用可能である。

Code-switching (CS), a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities, still remains an understudied problem in language processing. The primary reasons behind this are: (1) minimal efforts in leveraging large pretrained multilingual models, and (2) the lack of annotated data. The distinguishing case of low performance of multilingual models in CS is the intra-sentence mixing of languages leading to switch points. We first benchmark two sequence labeling tasks, POS and NER, on 4 different language pairs with a suite of pretrained models to identify the problems and select the best performing model, char-BERT, among them (addressing (1)). We then propose a self-training method to repurpose the existing pretrained models using a switch-point bias by leveraging unannotated data (addressing (2)). We finally demonstrate that our approach performs well on both tasks by reducing the gap between the switch point performance while retaining the overall performance on two distinct language pairs in both the tasks. Our code is available here: https://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training.

翻訳日:2021-11-03 14:12:37 公開日:2021-11-01

# 映像理解モデルを視覚的に説明するための勾配周波数変調

Gradient Frequency Modulation for Visually Explaining Video Understanding Models ( http://arxiv.org/abs/2111.01215v1 )

ライセンス: Link先を確認

Xinmiao Lin, Wentao Bao, Matthew Wright, Yu Kong

(参考訳) 多くのアプリケーションでは、なぜ機械学習モデルが意思決定を行うのかを理解することが不可欠であるが、これは最先端のニューラルネットワークのブラックボックスの性質によって阻害されている。このため、ビデオ理解の分野を含む深層学習における説明可能性に注目が集まっている。映像データの時間的次元から,映像行動認識モデルを説明する主な課題は,既存の文献では無視されている時空間的に一貫した視覚説明を作ることである。本稿では,映像理解モデルの意思決定を説明するために,周波数ベース極値摂動(f-ep)を提案する。摂動法によって与えられる説明は、空間的・時間的にともにノイズと非スムースであるため、離散コサイン変換(dct)を用いてニューラルネットワークモデルから勾配写像の周波数を変調する。実験では,f-ep がモデルの意思決定をより忠実に表現する時空間的一貫性のある説明を提供することを示す。

In many applications, it is essential to understand why a machine learning model makes the decisions it does, but this is inhibited by the black-box nature of state-of-the-art neural networks. Because of this, increasing attention has been paid to explainability in deep learning, including in the area of video understanding. Due to the temporal dimension of video data, the main challenge of explaining a video action recognition model is to produce spatiotemporally consistent visual explanations, which has been ignored in the existing literature. In this paper, we propose Frequency-based Extremal Perturbation (F-EP) to explain a video understanding model's decisions. Because the explanations given by perturbation methods are noisy and non-smooth both spatially and temporally, we propose to modulate the frequencies of gradient maps from the neural network model with a Discrete Cosine Transform (DCT). We show in a range of experiments that F-EP provides more spatiotemporally consistent explanations that more faithfully represent the model's decisions compared to the existing state-of-the-art methods.

翻訳日:2021-11-03 14:10:08 公開日:2021-11-01
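
The abstract above smooths noisy explanation maps by modulating their frequencies with a Discrete Cosine Transform. The sketch below shows the underlying idea on a 1-D signal: transform with an orthonormal DCT-II, zero out the high-frequency coefficients, and transform back. It is a generic low-pass illustration, not the F-EP procedure itself; the function names and the hard cutoff `keep` are assumptions, and a real gradient map would be 2-D or 3-D.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x * math.cos(math.pi * (t + 0.5) * k / n)
                               for t, x in enumerate(signal)))
    return out


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos(math.pi * (t + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for t in range(n)]


def low_pass(signal, keep):
    """Keep the `keep` lowest-frequency DCT coefficients, zero the rest."""
    coeffs = dct(signal)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(filtered)


noisy_gradient = [1.0, 2.0, 4.0, 3.0]   # hypothetical values
smoothed = low_pass(noisy_gradient, keep=2)
```

Keeping every coefficient reproduces the input, while keeping only the first one collapses the signal to its mean, so `keep` controls how aggressively high-frequency noise is removed.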

# ニューラルシーンフロー事前分布

Neural Scene Flow Prior ( http://arxiv.org/abs/2111.01253v1 )

ライセンス: Link先を確認

Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey

(参考訳) ディープラーニング革命以前、多くの知覚アルゴリズムは実行時最適化と強力な事前分布/正則化ペナルティに基づいていた。コンピュータビジョンにおけるその代表例がオプティカルフローとシーンフローである。教師付き学習により、明示的な正則化の必要性はほとんどなくなった。その代わりに、教師付き学習の手法は大量のラベル付きデータに頼って事前統計を捉えるが、そうしたデータは多くの問題で必ずしも容易に入手できない。ニューラルネットワークの学習には最適化が使用されるが、このネットワークの重みは実行時には凍結される。その結果、これらの学習ベースの解法はドメイン固有であり、統計的に異なる他のシナリオにはうまく一般化しない。本稿では,主に実行時最適化と強い正則化に依存するシーンフロー問題を再考する。ここでの中心的なイノベーションは、ニューラルネットワークのアーキテクチャを新しいタイプの暗黙的正則化器として用いる、ニューラルシーンフロー事前分布(neural scene flow prior)の導入である。学習ベースのシーンフロー手法とは異なり、最適化は実行時に行われ、我々のアプローチはオフラインのデータセットを必要としないため、自動運転のような新しい環境へのデプロイに理想的である。多層パーセプトロン(MLP)のみに基づくアーキテクチャがシーンフローの事前分布として使用できることを示す。我々の手法は、シーンフローベンチマークで既存手法と同等かそれ以上の結果を達成する。また、ニューラル事前分布による暗黙的かつ連続的なシーンフロー表現により、点群の系列にわたる密な長期対応を推定できる。密な動き情報はシーンフロー場によって表現され、動きベクトルを積分することで点を時間方向に伝播できる。我々は,LiDAR点群の系列を蓄積することで,この能力を示す。

Before the deep learning revolution, many perception algorithms were based on runtime optimization in conjunction with a strong prior/regularization penalty. A prime example of this in computer vision is optical and scene flow. Supervised learning has largely displaced the need for explicit regularization. Instead, they rely on large amounts of labeled data to capture prior statistics, which are not always readily available for many problems. Although optimization is employed to learn the neural network, the weights of this network are frozen at runtime. As a result, these learning solutions are domain-specific and do not generalize well to other statistically different scenarios. This paper revisits the scene flow problem that relies predominantly on runtime optimization and strong regularization. A central innovation here is the inclusion of a neural scene flow prior, which uses the architecture of neural networks as a new type of implicit regularizer. Unlike learning-based scene flow methods, optimization occurs at runtime, and our approach needs no offline datasets -- making it ideal for deployment in new environments such as autonomous driving. We show that an architecture based exclusively on multilayer perceptrons (MLPs) can be used as a scene flow prior. Our method attains competitive -- if not better -- results on scene flow benchmarks. Also, our neural prior's implicit and continuous scene flow representation allows us to estimate dense long-term correspondences across a sequence of point clouds. The dense motion information is represented by scene flow fields where points can be propagated through time by integrating motion vectors. We demonstrate such a capability by accumulating a sequence of lidar point clouds.

翻訳日:2021-11-03 14:09:52 公開日:2021-11-01

# 運動境界とオクルージョンの同時検出

Joint Detection of Motion Boundaries and Occlusions ( http://arxiv.org/abs/2111.01261v1 )

ライセンス: Link先を確認

Hannah Halin Kim, Shuzhi Yu, Carlo Tomasi

(参考訳) 本研究では,映像中の運動境界(MB)とオクルージョン領域(Occ)を、時間の順方向と逆方向の両方で同時に検出する畳み込みニューラルネットワークMONetを提案する。オプティカルフローはMBに沿って不連続で、Occでは定義されない一方、多くのフロー推定器は滑らかさとあらゆる場所で定義されたフローを仮定するため、検出は困難である。 2つの時間方向を同時に推論するため、推定マップを2つのフレーム間で直接ワープする。フレーム間の外観の不一致は、しばしばMBやOccの近傍であることを示すため、一方のフレームの各特徴について、探索範囲内の対応する特徴との最小の不一致を記録するコストブロックを構築する。このコストブロックは2次元であり、フロー解析で使われる4次元のコストボリュームよりもはるかに安価である。コストブロックの特徴はエンコーダで計算され、MBとOccの推定はデコーダで計算される。デコーダ層を粗から細(coarse-to-fine)ではなく細から粗(fine-to-coarse)に配置することで性能が向上することがわかった。 MONetは、SintelとFlyingChairsOccの両ベンチマークにおいて、それらでファインチューニングを行うことなく、両タスクで従来の最先端を上回る。

We propose MONet, a convolutional neural network that jointly detects motion boundaries (MBs) and occlusion regions (Occs) in video both forward and backward in time. Detection is difficult because optical flow is discontinuous along MBs and undefined in Occs, while many flow estimators assume smoothness and a flow defined everywhere. To reason in the two time directions simultaneously, we direct-warp the estimated maps between the two frames. Since appearance mismatches between frames often signal vicinity to MBs or Occs, we construct a cost block that for each feature in one frame records the lowest discrepancy with matching features in a search range. This cost block is two-dimensional, and much less expensive than the four-dimensional cost volumes used in flow analysis. Cost-block features are computed by an encoder, and MB and Occ estimates are computed by a decoder. We found that arranging decoder layers fine-to-coarse, rather than coarse-to-fine, improves performance. MONet outperforms the prior state of the art for both tasks on the Sintel and FlyingChairsOcc benchmarks without any fine-tuning on them.

翻訳日:2021-11-03 14:09:26 公開日:2021-11-01
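
The cost block described in the MONet abstract above records, for each feature in one frame, the lowest discrepancy with features of the other frame inside a search range. The sketch below implements that idea for scalar features on small 2-D grids. The scalar features, the absolute-difference discrepancy, the square search window, and the function name are all assumptions made for illustration; the paper's features and matching details are not reproduced here.

```python
def cost_block(frame_a, frame_b, radius=1):
    """For every position in frame_a, store the smallest absolute difference
    to any value of frame_b within a (2*radius + 1)^2 search window.
    The result has the same 2-D shape as the input frames."""
    rows, cols = len(frame_a), len(frame_a[0])
    block = []
    for i in range(rows):
        row = []
        for j in range(cols):
            best = float("inf")
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < rows and 0 <= x < cols:   # stay inside the frame
                        best = min(best, abs(frame_a[i][j] - frame_b[y][x]))
            row.append(best)
        block.append(row)
    return block


# Hypothetical frames: the value 5 in frame_a has no counterpart in frame_b.
frame_a = [[1, 1, 5],
           [1, 1, 1]]
frame_b = [[1, 1, 1],
           [1, 1, 1]]
costs = cost_block(frame_a, frame_b)
```

The output keeps one number per position, so it stays two-dimensional; a full cost volume would instead store one number per position and per candidate displacement. High values mark features with no good match nearby, which is the kind of appearance mismatch the abstract associates with motion boundaries and occlusions.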

# クロスモーダルビデオ検索のためのモダリティのマスキング

Masking Modalities for Cross-modal Video Retrieval ( http://arxiv.org/abs/2111.01300v1 )

ライセンス: Link先を確認

Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

(参考訳) 大規模なラベルなしデータセットでの事前学習は、コンピュータビジョンと自然言語処理の分野で顕著な性能向上を示している。大規模な教示ビデオ(instructional video)データセットの登場を踏まえ、ビデオエンコーダを事前学習するための一般的な戦略は、付随する発話を弱い教師信号として使うことである。しかし、発話は事前学習の教師信号として使用されるため、ビデオエンコーダに入力されることはなく、エンコーダはそのモダリティを処理することを学習しない。我々は、話し言葉に含まれる豊富な手がかりを活用できないという、現在の事前学習手法のこの欠点に対処する。提案手法は,利用可能なビデオモダリティのすべて、すなわち外観、音、書き起こされた発話を教師信号として用いて,ビデオエンコーダを事前学習することである。入力のうち1つのモダリティ全体をマスクし、他の2つのモダリティを使ってそれを予測する。これにより、各モダリティが他のモダリティと協調するよう促され、ビデオエンコーダは、発話だけでなく外観や音も処理することを学習する。 How2R、YouCook2、Condensed Moviesデータセットにおいて、ビデオ検索に対する「モダリティマスキング」事前学習手法の優れた性能を示す。

Pre-training on large scale unlabelled datasets has shown impressive performance improvements in the fields of computer vision and natural language processing. Given the advent of large-scale instructional video datasets, a common strategy for pre-training video encoders is to use the accompanying speech as weak supervision. However, as speech is used to supervise the pre-training, it is never seen by the video encoder, which does not learn to process that modality. We address this drawback of current pre-training methods, which fail to exploit the rich cues in spoken language. Our proposal is to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech. We mask an entire modality in the input and predict it using the other two modalities. This encourages each modality to collaborate with the others, and our video encoder learns to process appearance and audio as well as speech. We show the superior performance of our "modality masking" pre-training approach for video retrieval on the How2R, YouCook2 and Condensed Movies datasets.

翻訳日:2021-11-03 14:09:09 公開日:2021-11-01

# プロンプトレベルEMA非応答予測のためのトランスフォーマ

Transformers for prompt-level EMA non-response prediction ( http://arxiv.org/abs/2111.01193v1 )

ライセンス: Link先を確認

Supriya Nagesh, Alexander Moreno, Stephanie M. Carpenter, Jamie Yap, Soujanya Chatterjee, Steven Lloyd Lizotte, Neng Wan, Santosh Kumar, Cho Lam, David W. Wetter, Inbal Nahum-Shani, James M. Rehg

(参考訳) Ecological Momentary Assessment (EMA)は、モバイルヘルス(mHealth)研究や治療プログラムの参加者から、現在の認知状態、感情、行動、環境要因を測定するための重要な心理学的データソースである。参加者がEMAプロンプトに応答しない非応答は、広く見られる問題である。非応答を正確に予測できれば、EMAの配信を改善し、コンプライアンス向上のための介入を開発するのに利用できる。先行研究では、非応答を予測するために古典的な機械学習モデルが検討されてきた。しかし、ますます大規模なEMAデータセットが利用可能になるにつれて、他の分野で有効であったディープラーニングモデルを活用できる可能性がある。近年,トランスフォーマーモデルはNLPなどの領域で最先端の性能を示している。この研究は、EMAデータ分析におけるトランスフォーマーの利用を初めて探求するものである。 EMAデータにトランスフォーマーを適用する際の3つの重要な問いに取り組む。 (1) 入力表現、(2) 時間情報のエンコーディング、(3) 下流の予測タスクの性能向上に対する事前学習の有用性。トランスフォーマーモデルは非応答予測でAUC 0.77を達成し、古典的なMLモデルやLSTMベースのディープラーニングモデルよりも大幅に優れている。我々は,4万件のEMAサンプルのコーパスで学習した予測モデルを研究コミュニティに無償で提供し,将来のトランスフォーマーベースのEMA分析研究の発展を促進する。

Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors from participants in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be utilized to improve EMA delivery and develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is the potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: (1) input representation, (2) encoding of temporal information, and (3) the utility of pre-training for improving downstream prediction task performance. The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models. We will make a predictive model trained on a corpus of 40K EMA samples freely available to the research community, in order to facilitate the development of future transformer-based EMA analysis works.

翻訳日:2021-11-03 14:04:15 公開日:2021-11-01

# 時系列生成のためのSig-Wasserstein GANs

Sig-Wasserstein GANs for Time Series Generation ( http://arxiv.org/abs/2111.01207v1 )

ライセンス: Link先を確認

Hao Ni, Lukasz Szpruch, Marc Sabate-Vidales, Baoren Xiao, Magnus Wiese, Shujian Liao

(参考訳) 合成データ(Synthetic data)は、AI機械学習パイプラインの開発とデプロイを著しく加速する新興技術である。本研究では,連続時間確率モデルと新たに提案された$w_1$メトリックを組み合わせることで,高忠実度時系列生成器sigwganを開発した。前者は確率微分方程式に基づくLogsig-RNNモデルであり、後者は時系列によって誘導される測度を特徴づける普遍的および原理的な数学的特徴に由来する。 SigWGAN は計算的に挑戦する GAN min-max 問題を高忠実度サンプルを生成しながら教師あり学習に変換することができる。一般的な量的リスクモデルと経験的金融データから得られた合成データから,提案モデルを検証する。コードはhttps://github.com/sigcgans/sig-wasserstein-gans.gitで入手できる。

Synthetic data is an emerging technology that can significantly accelerate the development and deployment of AI machine learning pipelines. In this work, we develop high-fidelity time-series generators, the SigWGAN, by combining continuous-time stochastic models with the newly proposed signature $W_1$ metric. The former are the Logsig-RNN models based on stochastic differential equations, whereas the latter originates from the universal and principled mathematical features that characterize the measure induced by time series. SigWGAN turns the computationally challenging GAN min-max problem into supervised learning while generating high-fidelity samples. We validate the proposed model on both synthetic data generated by popular quantitative risk models and empirical financial data. Code is available at https://github.com/SigCGANs/Sig-Wasserstein-GANs.git.

翻訳日:2021-11-03 14:03:55 公開日:2021-11-01

# サーバグレードハードウェアなしでのヒューマンレベル制御

Human-Level Control without Server-Grade Hardware ( http://arxiv.org/abs/2111.01264v1 )

ライセンス: Link先を確認

Brett Daley and Christopher Amato

(参考訳) ディープQネットワーク(DQN)は強化学習の大きなマイルストーンであり、人間レベルの制御ポリシーが報酬の最大化を通じて生の視覚入力から直接学習できることを初めて実証した。導入から何年も経っても、DQNは、その多くのイノベーションが後継手法に採用されているため、研究コミュニティにとって依然として重要である。それでも、ハードウェアの大幅な進歩にもかかわらず、DQNの最初のAtari 2600実験は完全な複製に費用がかかるままであった。これは、最先端のハードウェアや大規模なクラウドコンピューティングリソースにアクセスできない研究者にとって、大きな障壁となる。そこで本研究では,深層強化学習研究へのアクセスを改善するために,ヘテロジニアスなCPU-GPUデスクトップシステムを最大限活用するために設計された,並列かつ同期化された新しい実行フレームワークを活用したDQN実装を提案する。 NVIDIA GeForce GTX 1080 GPU 1基だけで、我々の実装は2億フレームのAtari実験のトレーニング時間を25時間から9時間に短縮する。我々の論文で紹介されたアイデアは、多くのオフポリシー深層強化学習法に一般化できるはずである。

Deep Q-Network (DQN) marked a major milestone for reinforcement learning, demonstrating for the first time that human-level control policies could be learned directly from raw visual inputs via reward maximization. Even years after its introduction, DQN remains highly relevant to the research community since many of its innovations have been adopted by successor methods. Nevertheless, despite significant hardware advances in the interim, DQN's original Atari 2600 experiments remain costly to replicate in full. This poses an immense barrier to researchers who cannot afford state-of-the-art hardware or lack access to large-scale cloud computing resources. To facilitate improved access to deep reinforcement learning research, we introduce a DQN implementation that leverages a novel concurrent and synchronized execution framework designed to maximally utilize a heterogeneous CPU-GPU desktop system. With just one NVIDIA GeForce GTX 1080 GPU, our implementation reduces the training time of a 200-million-frame Atari experiment from 25 hours to just 9 hours. The ideas introduced in our paper should be generalizable to a large number of off-policy deep reinforcement learning methods.

翻訳日:2021-11-03 14:03:42 公開日:2021-11-01

# 車両グリッド統合を考慮した電気自動車充電ステーションの運転学習

Learning to Operate an Electric Vehicle Charging Station Considering Vehicle-grid Integration ( http://arxiv.org/abs/2111.01294v1 )

ライセンス: Link先を確認

Zuzhao Ye, Yuanqi Gao, Nanpeng Yu

(参考訳) 電気自動車(EV)の急速な普及により、EV充電ステーションの広範な設置が求められている。充電ステーションの収益性を最大化するために、充電と電気グリッドサービスの両方を提供するインテリジェントコントローラが大いに求められている。しかし,EVの到着時間や充電要求が不確実であることから,最適な充電スケジュールを決定することは困難である。本稿では、充電ステーションの利益を最大化するために、新しい集中型アロケーションと分散実行(CADE)強化学習(RL)フレームワークを提案する。集中配置プロセスでは、EVは待機スポットまたは充電スポットに割り当てられる。分散実行プロセスでは、各充電器は共有リプレイメモリからアクション値関数を学習しながら、独自の充電/放電決定を行う。このCADEフレームワークはRLアルゴリズムのスケーラビリティとサンプル効率を大幅に改善する。数値計算により,提案したCADEフレームワークは計算効率が高く,拡張性も高く,ベースラインモデル予測制御(MPC)よりも優れていた。また,強化学習エージェントの内部動作を説明するために,学習した行動値関数の詳細な分析を行う。

The rapid adoption of electric vehicles (EVs) calls for the widespread installation of EV charging stations. To maximize the profitability of charging stations, intelligent controllers that provide both charging and electric grid services are in great need. However, it is challenging to determine the optimal charging schedule due to the uncertain arrival time and charging demands of EVs. In this paper, we propose a novel centralized allocation and decentralized execution (CADE) reinforcement learning (RL) framework to maximize the charging station's profit. In the centralized allocation process, EVs are allocated to either the waiting or charging spots. In the decentralized execution process, each charger makes its own charging/discharging decision while learning the action-value functions from a shared replay memory. This CADE framework significantly improves the scalability and sample efficiency of the RL algorithm. Numerical results show that the proposed CADE framework is both computationally efficient and scalable, and significantly outperforms the baseline model predictive control (MPC). We also provide an in-depth analysis of the learned action-value function to explain the inner working of the reinforcement learning agent.

翻訳日:2021-11-03 14:02:01 公開日:2021-11-01

# ハードウェアを意識したニューラルアーキテクチャ検索のためのプロキシデバイス

One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search ( http://arxiv.org/abs/2111.01203v1 )

ライセンス: Link先を確認

Bingqian Lu and Jianyi Yang and Weiwen Jiang and Yiyu Shi and Shaolei Ren

(参考訳) 畳み込みニューラルネットワーク(cnns)は、視覚ベースの自律運転やビデオコンテンツ分析など、多くの現実のアプリケーションで使われている。様々なターゲットデバイスでcnn推論を実行するには、ハードウェアアウェアニューラルアーキテクチャ検索(nas)が不可欠である。効率的なハードウェア対応NASの重要な要件は、異なるアーキテクチャをランク付けするための推論レイテンシの高速評価である。ターゲットデバイス毎の遅延予測器の構築は、技術状況において一般的に使用されているが、非常に多様なデバイスの存在下でスケーラビリティに欠ける、非常に時間を要するプロセスである。本研究では,レイテンシのモノトニック性(monotonicity)を活用することでスケーラビリティの課題に対処します。強いレイテンシのモノトニック性が存在する場合、最適性を損なうことなく、新しいターゲットデバイス上で1つのプロキシデバイスを検索したアーキテクチャを再利用できる。強い遅延単調性がない場合、遅延単調性を大幅に向上させる効率的なプロキシ適応手法を提案する。最後に、我々は、MobileNet-V2、MobileNet-V3、NAS-Bench-201、ProxylessNAS、FBNetなど、複数の主要な検索空間上で異なるプラットフォームで実験を行い、アプローチを検証する。我々の結果は、ひとつのプロキシデバイスを使用することで、デバイス毎のNASとほぼ同じPareto-Optimalアーキテクチャを見つけることができ、各デバイス用の遅延予測器を構築することの禁止コストを回避することができることを強調している。

Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device is common practice in the state of the art, this is a very time-consuming process that lacks scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity -- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.

翻訳日:2021-11-03 12:48:47 公開日:2021-11-01
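
The abstract above describes latency monotonicity as correlated latency rankings across devices. One common way to quantify how well two rankings agree is Spearman's rank correlation; the sketch below is only an illustration under that assumption and is not taken from the paper. The function names and the latency numbers are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical latencies (ms) of the same five architectures on two devices.
proxy_latency_ms = [12.0, 15.5, 9.8, 20.1, 17.3]
target_latency_ms = [30.2, 41.0, 25.5, 55.7, 44.9]
print(spearman(proxy_latency_ms, target_latency_ms))
```

A value close to 1 means the proxy device orders the architectures almost the same way as the target device, which is the situation in which the abstract says architectures found on the proxy can be reused.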

# ASMDD:アラビア音声誤認識検出データセット

ASMDD: Arabic Speech Mispronunciation Detection Dataset ( http://arxiv.org/abs/2111.01136v1 )

ライセンス: Link先を確認

Salah A. Aly, Abdelrahman Salah, Hesham M. Eraqi

(参考訳) エジプト語対話におけるアラビア語の誤発音検出の最大のデータセットを紹介する。データセットは、アラビア語で最も頻繁に使われる上位100語を表す注釈付きオーディオファイルで構成されており、100人のエジプト人の子供(2歳から8歳)が発音している。データセットは、専門家リスナーによるセグメント発音誤り検出に基づいて収集、注釈付けされる。

The largest dataset of Arabic speech mispronunciation detections in Egyptian dialogues is introduced. The dataset is composed of annotated audio files representing the top 100 words that are most frequently used in the Arabic language, pronounced by 100 Egyptian children (aged between 2 and 8 years old). The dataset is collected and annotated on segmental pronunciation error detections by expert listeners.

翻訳日:2021-11-03 12:44:25 公開日:2021-11-01

# ディープラーニングを用いたツイートの因果関係の同定--2017-2021年の糖尿病関連ツイートを事例として

Identifying causal associations in tweets using deep learning: Use case on diabetes-related tweets from 2017-2021 ( http://arxiv.org/abs/2111.01225v1 )

ライセンス: Link先を確認

Adrian Ahne (1 and 2), Vivek Khetan (3), Xavier Tannier (4), Md Imbessat Hassan Rizvi (5), Thomas Czernichow (2), Francisco Orchard (2), Charline Bour (6), Andrew Fano (3), Guy Fagherazzi (6) ((1) Paris-Saclay University, UVSQ, Inserm, Gustave Roussy, Exposome and Heredity team, CESP, F-94805, Villejuif, France, (2) Epiconcept, Paris, France, (3) Accenture Labs, San Francisco, USA, (4) Sorbonne University, Inserm, University Sorbonne Paris Nord, Laboratoire d'Informatique Medicale et d'Ingenierie des Connaissances pour la e-Sante, LIMICS, Paris, France, (5) Indian Institute of Science, Bengaluru, India, (6) Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg)

(参考訳) 目的: 糖尿病関連ツイートにおける明示的・暗黙的な因果関係を抽出し, 因果性の観点から, 糖尿病オンラインコミュニティ内で共有されている意見, 感情, 観察をよりよく理解するためのツールを提供する。資料と方法:2017年4月から2021年1月の間に、3000万以上の英語の糖尿病関連ツイートが収集された。ディープラーニングと自然言語処理は、個人的および感情的なコンテンツのツイートに焦点を当てるために適用された。 cause-effect-tweetデータセットが手動でラベル付けされ、トレーニングに使用される 1) 因果関係を含む因果関係文を検出するための微調整Bertweetモデル 2) BERTをベースとしたCRFモデルを用いて, 因果関係を抽出した。原因と影響は半教師付きアプローチでクラスター化され、インタラクティブな因果効果ネットワークで可視化された。結果: 不均衡データセットでは68%のリコールで因果文が検出された。 BERTをベースとしたCRFモデルは68%のマクロリコールで原因効果検出のための細調整BERTモデルより優れていた。これにより96,676件の大義関連判決が下された。ディアベテス」は中央クラスタとして同定され、「死」と「インスリン」が続く。インスリン価格関連原因は、しばしば「死」と関連づけられた。結論: 因果文を検出し, 明示的, 暗黙的, 単語的および多語的原因とそれに対応する効果を, BERTベースのアーキテクチャを活用し, 原因効果ネットワークとして可視化した糖尿病関連ツイートで表す新しい手法を開発した。実生活における因果関係を抽出し,ソーシャルメディアデータから報告した患者報告の結果は,糖尿病研究において有用な補完的情報源となる。

Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect associations in patient-reported, diabetes-related tweets and provide a tool to better understand opinions, feelings and observations shared within the diabetes online community from a causality perspective. Materials and Methods: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect-tweet dataset was manually labeled and used to train (1) a fine-tuned Bertweet model to detect sentences containing a causal association and (2) a CRF model with BERT-based features to extract possible cause-effect associations. Causes and effects were clustered in a semi-supervised approach and visualised in an interactive cause-effect network. Results: Causal sentences were detected with a recall of 68% in an imbalanced dataset. A CRF model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection with a macro recall of 68%. This led to 96,676 sentences with cause-effect associations. "Diabetes" was identified as the central cluster, followed by "Death" and "Insulin". Insulin-pricing-related causes were frequently associated with "Death". Conclusions: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single- and multi-word causes and corresponding effects as expressed in diabetes-related tweets, leveraging BERT-based architectures and visualised as a cause-effect network. Extracting causal associations from real-life, patient-reported outcomes in social media data provides a useful complementary source of information in diabetes research.

翻訳日:2021-11-03 12:44:19 公開日:2021-11-01

# 不確実なコスト関数を持つユーザのための低コストアルゴリズムリコース

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions ( http://arxiv.org/abs/2111.01235v1 )

ライセンス: Link先を確認

Prateek Yadav, Peter Hase, Mohit Bansal

(参考訳) 機械学習モデル決定に影響を及ぼす人々のアルゴリズム的会話を特定するという問題は近年注目を集めている。最近の作業は、ユーザ満足度と直接リンクされる、ユーザによるコストのモデルである。しかし、すべてのユーザー間で共有される単一のグローバルなコスト関数を想定している。これは、ユーザが機能変更に伴う異なるコストと機能に対して行動する意思について異なる好みを持っている場合、非現実的な仮定である。本研究では,ユーザ固有のコスト関数の概念を形式化し,ユーザのための行動可能なリコースを識別する新しい手法を提案する。デフォルトでは,ユーザのコスト関数はrecourseメソッドから隠されていると仮定するが,我々のフレームワークでは,ユーザの好みやコスト関数を部分的にあるいは完全に指定することができる。提案する目的関数である「期待最小コスト(EMC)」は,(1)利用者に選択肢のセットを提示する場合,利用者が採用できる少なくとも1つの低コストソリューションが存在することが重要であり,(2)利用者の真のコスト関数を知らない場合には,まず可算コスト関数をサンプリングし,ユーザの満足度を概算し,期待するユーザにとって良好なコストを実現するためのセットを見つけることができる。我々は,新しい離散最適化アルゴリズムであるコスト最適化局所探索 (cols) を用いてemcを最適化する。ユーザコストをシミュレーションした人気のある実世界のデータセットの実験的評価により,強いベースライン法と比較して25.89ポイントのユーザを満足できることがわかった。また, 標準的公平度指標を用いて, 比較した手法よりも, 集団間でより公平なソリューションを提供できることを示すとともに, コスト関数分布の誤特定にロバストな手法であることを検証した。

The problem of identifying algorithmic recourse for people affected by machine learning model decisions has received much attention recently. Some recent works model user-incurred cost, which is directly linked to user satisfaction. But they assume a single global cost function that is shared across all users. This is an unrealistic assumption when users have dissimilar preferences about their willingness to act upon a feature and different costs associated with changing that feature. In this work, we formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users. By default, we assume that users' cost functions are hidden from the recourse method, though our framework allows users to partially or completely specify their preferences or cost function. We propose an objective function, Expected Minimum Cost (EMC), based on two key ideas: (1) when presenting a set of options to a user, it is vital that there is at least one low-cost solution the user could adopt; (2) when we do not know the user's true cost function, we can approximately optimize for user satisfaction by first sampling plausible cost functions, then finding a set that achieves a good cost for the user in expectation. We optimize EMC with a novel discrete optimization algorithm, Cost-Optimized Local Search (COLS), which is guaranteed to improve the recourse set quality over iterations. Experimental evaluation on popular real-world datasets with simulated user costs demonstrates that our method satisfies up to 25.89 percentage points more users compared to strong baseline methods. Using standard fairness metrics, we also show that our method can provide more fair solutions across demographic groups than comparable methods, and we verify that our method is robust to misspecification of the cost function distribution.

翻訳日:2021-11-03 12:43:52 公開日:2021-11-01
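
The Expected Minimum Cost (EMC) objective in the abstract above combines two steps: take the cheapest option in the recourse set under each sampled cost function, then average over the samples. The sketch below illustrates only that arithmetic. The linear cost model, the feature names and the weights are assumptions made for illustration; the paper's actual cost functions and its COLS search procedure are not reproduced here.

```python
def linear_cost(weights):
    """Build a hypothetical cost function: weighted sum of absolute feature changes."""
    def cost(recourse):
        return sum(weights[name] * abs(delta) for name, delta in recourse.items())
    return cost


def expected_minimum_cost(recourse_set, sampled_cost_fns):
    """Average, over sampled cost functions, of the cheapest option in the set."""
    total = 0.0
    for cost in sampled_cost_fns:
        total += min(cost(recourse) for recourse in recourse_set)
    return total / len(sampled_cost_fns)


# Two plausible users: one finds changing income cheap, the other education.
sampled_cost_fns = [
    linear_cost({"income": 1.0, "education": 3.0}),
    linear_cost({"income": 4.0, "education": 2.0}),
]

# Each recourse maps a feature name to the change the user would have to make.
single_option = [{"income": 1.0}]
diverse_options = [{"income": 1.0}, {"education": 1.0}]

print(expected_minimum_cost(single_option, sampled_cost_fns))
print(expected_minimum_cost(diverse_options, sampled_cost_fns))
```

With these made-up weights the diverse set has the lower EMC, because each sampled user finds at least one option that is cheap for them. This mirrors the first key idea stated in the abstract.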

# HRViT:マルチスケール高分解能ビジョントランス

HRViT: Multi-Scale High-Resolution Vision Transformer ( http://arxiv.org/abs/2111.01236v1 )

ライセンス: Link先を確認

Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan

(参考訳) ビジョントランスフォーマー (vits) は、コンピュータビジョンタスクの優れた性能で多くの注目を集めている。単一スケールの低分解能表現の限界に対処するため、事前の作業では、階層構造を持つ高分解能密度予測タスクにViTを適用してピラミッドの特徴を生成する。しかし、その分類のようなシーケンシャルトポロジーを考えると、マルチスケールの表現学習はまだViTでは未探索である。意味的にリッチで空間的に精度の高いマルチスケール表現を学習する能力を高めるために,我々は高解像度のマルチブランチアーキテクチャをHRViTと呼ばれる視覚変換器と効率的に統合し,高密度予測タスクのParetoを新たなレベルに押し上げる。我々は、異種分岐設計を探求し、線形層における冗長性を低減し、モデル性能とハードウェア効率のバランスをとるためにモデル非線形性を強化した。提案したHRViTは、ADE20K上の50.20% mIoUと、セマンティックセグメンテーションタスクのためのCityscapes上の83.16% mIoUを達成し、最先端のMiTとCSWinを平均1.78 mIoUの改善、28%のパラメータ削減、21%のFLOPs還元を達成し、HRViTを強力な視覚バックボーンとしての可能性を示している。

Vision transformers (ViTs) have attracted much attention for their superior performance on computer vision tasks. To address their limitations of single-scale low-resolution representations, prior work adapts ViTs to high-resolution dense prediction tasks with hierarchical architectures to generate pyramid features. However, multi-scale representation learning is still under-explored on ViTs, given their classification-like sequential topology. To enhance ViTs with more capability to learn semantically-rich and spatially-precise multi-scale representations, in this work, we present an efficient integration of high-resolution multi-branch architectures with vision transformers, dubbed HRViT, pushing the Pareto front of dense prediction tasks to a new level. We explore heterogeneous branch design, reduce the redundancy in linear layers, and augment the model nonlinearity to balance the model performance and hardware efficiency. The proposed HRViT achieves 50.20% mIoU on ADE20K and 83.16% mIoU on Cityscapes for semantic segmentation tasks, surpassing state-of-the-art MiT and CSWin with an average of +1.78 mIoU improvement, 28% parameter reduction, and 21% FLOPs reduction, demonstrating the potential of HRViT as strong vision backbones.

翻訳日:2021-11-03 12:43:22 公開日:2021-11-01

# 顔のランドマーク検出における合成画像による人口バイアスの発見

Using Synthetic Images To Uncover Population Biases In Facial Landmarks Detection ( http://arxiv.org/abs/2111.01683v1 )

ライセンス: Link先を確認

Ran Shadmi, Jonathan Laserson, Gil Elbaz

(参考訳) トレーニングされたモデルのパフォーマンスを分析して弱点を特定するには、テスト用のデータの一部を脇に置いておく必要がある。テストセットは、ターゲット集団のすべての関連するサブグループに対して統計的に重要なバイアスを検出するのに十分な大きさでなければならない。この要件は、特にデータ格納型アプリケーションでは、満足しにくい場合があります。合成テストセットを生成することで,この問題を克服することを提案する。我々は、顔のランドマーク検出タスクを使用して、実際のデータセットで観察されるすべてのバイアスが、慎重に設計された合成データセットで見られることを示し、提案を検証する。これは、合成テストセットがモデルの弱点を効率的に検出し、量や多様性の観点から実テストセットの限界を克服できることを示している。

In order to analyze a trained model's performance and identify its weak spots, one has to set aside a portion of the data for testing. The test set has to be large enough to detect statistically significant biases with respect to all the relevant sub-groups in the target population. This requirement may be difficult to satisfy, especially in data-hungry applications. We propose to overcome this difficulty by generating a synthetic test set. We use the face landmarks detection task to validate our proposal by showing that all the biases observed on real datasets are also seen on a carefully designed synthetic dataset. This shows that synthetic test sets can efficiently detect a model's weak spots and overcome limitations of real test sets in terms of quantity and/or diversity.

翻訳日:2021-11-03 12:41:01 公開日:2021-11-01

# (参考訳) VSEC:ベトナムのSpelling Correctionのためのトランスフォーマーベースモデル

VSEC: Transformer-based Model for Vietnamese Spelling Correction ( http://arxiv.org/abs/2111.00640v1 )

ライセンス: CC BY 4.0

Dinh-Truong Do, Ha Thanh Nguyen, Thang Ngoc Bui, Dinh Hieu Vo

(参考訳) スペル誤り訂正は、自然言語処理における長い歴史を持つトピックの1つである。これまでの研究は目覚ましい成果を上げたが、依然として課題は残っている。ベトナム語では、タスクの最先端の方法は、隣接する音節から音節の文脈を推測する。しかし、2つの(あるいはそれ以上の)綴りミスが互いに近くにある場合、モデルはコンテキストを失う可能性があるため、この手法の精度は満足できない。本稿では,ベトナム語の綴り誤りを訂正する新しい手法を提案する。深層学習モデルを用いてミスタイプエラーとミススペルエラーの問題に取り組む。特に埋め込み層はバイトペア符号化技術によって駆動される。 Transformerアーキテクチャに基づくシーケンスモデルとシーケンスモデルにより、我々のアプローチは、同じ問題に関する以前の研究とは異なるものになる。実験では,スペルエラーをランダムに導入した大規模な合成データセットを用いてモデルを訓練する。提案手法の性能を現実的なデータセットを用いて検証する。このデータセットは、9,341の異なるベトナム語文に11,202の人造ミススペルを含んでいる。実験の結果, 検出した86.8%の誤差と81.5%の誤りが検出され, それぞれ5.6%, 2.2%の改善が得られた。

Spelling error correction is one of the topics with a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the context if two (or more) spelling mistakes stand near each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle the problems of mistyped errors and misspelled errors by using a deep learning model. The embedding layer, in particular, is powered by the byte pair encoding technique. The sequence-to-sequence model based on the Transformer architecture makes our approach different from previous work on the same problem. In the experiment, we train the model with a large synthetic dataset into which spelling errors are randomly introduced. We test the performance of the proposed method using a realistic dataset. This dataset contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences. The experimental results show that our method achieves encouraging performance with 86.8% of errors detected and 81.5% of errors corrected, which improves on the state-of-the-art approach by 5.6% and 2.2%, respectively.

翻訳日:2021-11-03 04:15:22 公開日:2021-11-01

# (参考訳) trivoc:ロバストなポイントクラウド登録のための効率的な投票ベースのコンセンサス最大化

TriVoC: Efficient Voting-based Consensus Maximization for Robust Point Cloud Registration with Extreme Outlier Ratios ( http://arxiv.org/abs/2111.00657v1 )

ライセンス: CC0 1.0

Lei Sun, Lu Deng

(参考訳) 対応ベースの点雲登録は、ロボットの知覚とコンピュータビジョンの基盤であり、2点雲を対応づける最良の剛性変換を推定することを目指している。しかし、3次元キーポイントマッチングアプローチのロバスト性が限られているため、おそらく大きな数のアウトリアーは対応式の中に存在しがちであり、ロバストな登録手法が必須となる。残念なことに、既存のロバストな手法は、高または極端な外れ値比に直面した場合に、それぞれ独自の制限(高い計算コストや限られたロバスト性)を持つ。本稿では, 頑健な登録問題に対して, TriVoC (Triple-layered Voting with Consensus maximization) という, 高速, 決定論的, 確固な解決法を提案する。最小の3点セットの選択を3つの連続したレイヤに分解し,各レイヤにおいて,ペアワイズ等長制約に基づいて効率的な投票・対応ソートフレームワークを設計する。このように、3点集合は、ソートされたシーケンスに従って縮小された対応集合から独立して選択することができ、計算コストを大幅に下げる一方、確率的終了条件を満たす限り、最大のコンセンサスセット(最終イリヤセット)を達成するための強い保証を提供する。様々な実験によって、我々の解法トライボックは最大99%の外れ値に対して堅牢であり、極端な外れ値比でも精度が高く、時間効率が高く、実世界のアプリケーションでも実用的であり、他の最先端の競合よりもパフォーマンスが優れていることが示された。

Correspondence-based point cloud registration is a cornerstone in robotics perception and computer vision, which seeks to estimate the best rigid transformation aligning two point clouds from the putative correspondences. However, due to the limited robustness of 3D keypoint matching approaches, outliers, probably in large numbers, are prone to exist among the correspondences, which makes robust registration methods imperative. Unfortunately, existing robust methods have their own limitations (e.g. high computational cost or limited robustness) when facing high or extreme outlier ratios, probably unsuitable for practical use. In this paper, we present a novel, fast, deterministic and guaranteed robust solver, named TriVoC (Triple-layered Voting with Consensus maximization), for the robust registration problem. We decompose the selecting of the minimal 3-point sets into 3 consecutive layers, and in each layer we design an efficient voting and correspondence sorting framework on the basis of the pairwise equal-length constraint. In this manner, the 3-point sets can be selected independently from the reduced correspondence sets according to the sorted sequence, which can significantly lower the computational cost and meanwhile provide a strong guarantee to achieve the largest consensus set (as the final inlier set) as long as a probabilistic termination condition is fulfilled. Varied experiments show that our solver TriVoC is robust against up to 99% outliers, highly accurate, time-efficient even with extreme outlier ratios, and also practical for real-world applications, showing performance superior to other state-of-the-art competitors.

翻訳日:2021-11-03 04:06:59 公開日:2021-11-01
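
The pairwise equal-length constraint mentioned in the abstract above rests on the fact that a rigid transformation preserves distances: for two correct correspondences, the distance between the two source points equals the distance between the two target points. The sketch below shows only this consistency voting idea. It is not the triple-layered TriVoC solver, and the function name, the tolerance `tol` and the sample points are assumptions made for illustration.

```python
import math


def equal_length_votes(correspondences, tol=1e-6):
    """Count, for each correspondence, how many others are length-consistent with it.

    `correspondences` is a list of (source_point, target_point) pairs,
    each point being a tuple of 3 coordinates.
    """
    n = len(correspondences)
    votes = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            (src_i, dst_i), (src_j, dst_j) = correspondences[i], correspondences[j]
            # A rigid motion keeps |src_i - src_j| equal to |dst_i - dst_j|.
            if abs(math.dist(src_i, src_j) - math.dist(dst_i, dst_j)) <= tol:
                votes[i] += 1
                votes[j] += 1
    return votes


def rigid_motion(p):
    """Hypothetical ground truth: 90-degree rotation about z, then a translation."""
    x, y, z = p
    return (-y + 1.0, x + 2.0, z + 3.0)


source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
correspondences = [(p, rigid_motion(p)) for p in source]
# Append one wrong match (an outlier) that does not follow the rigid motion.
correspondences.append(((5.0, 5.0, 5.0), (0.0, 0.0, 0.0)))

print(equal_length_votes(correspondences))
```

Sorting correspondences by vote count pushes likely inliers to the front, which is the kind of ordering the abstract says is used to select 3-point sets from reduced correspondence sets. With noisy real data, `tol` would have to reflect the expected noise level.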

# (参考訳) 新しい特徴的ヒト外観データセットを用いた人間および機械の顔検出の評価

Evaluation of Human and Machine Face Detection using a Novel Distinctive Human Appearance Dataset ( http://arxiv.org/abs/2111.00660v1 )

ライセンス: CC BY 4.0

Necdet Gurkan and Jordan W. Suchow

(参考訳) 顔検出はコンピュータビジョンの分野で長年の課題であり、究極の目標は、制約のない環境で人間の顔を正確にローカライズすることである。ポーズ、解像度、照明、オクルージョン、ビューポイント \cite{merler2019diversity} といった要因が組み合わさっているため、これらのシステムの正確性には大きな技術的ハードルがある。しかし、機械学習の最近の進歩により、顔検出システムは驚くほど精度が向上し、主にデータ駆動のディープラーニングモデルである \cite{wang2017 detectioning} に基づいている。奨励的ではあるが、配備システムの顔検出性能と社会的責任を制限する重要な側面は、人間の外見に固有の多様性である。あらゆる人間の外観は、その遺産、アイデンティティ、経験、自己表現の目に見える表現など、個人に特有の何かを反映している。しかし, 顔の大きさや形状, 肌の色, 体調, 身体の装飾などの違いに直面すると, 顔検出システムの性能に疑問がある。この目的に向けて,表情を低頻度で表現し,顔のデータセットでアンサンプリングされる傾向の強い特徴的人間出現データセットを収集した。そして,これらの画像中の顔を検出する能力について,最先端の顔検出モデルの評価を行った。評価結果は,顔検出アルゴリズムがこれらの多様な外観によく適応していないことを示す。現在の顔検出モデルの評価と特徴付けは、より公平で正確な顔検出システムの構築に向けた研究と開発を加速する。

Face detection is a long-standing challenge in the field of computer vision, with the ultimate goal being to accurately localize human faces in an unconstrained environment. There are significant technical hurdles in making these systems accurate due to confounding factors related to pose, image resolution, illumination, occlusion, and viewpoint \cite{merler2019diversity}. That being said, with recent developments in machine learning, face-detection systems have achieved extraordinary accuracy, largely built on data-driven deep-learning models \cite{wang2017detecting}. Though encouraging, a critical aspect that limits face-detection performance and social responsibility of deployed systems is the inherent diversity of human appearance. Every human appearance reflects something unique about a person, including their heritage, identity, experiences, and visible manifestations of self-expression. However, there are questions about how well face-detection systems perform when faced with varying face size and shape, skin color, body modification, and body ornamentation. Towards this goal, we collected the Distinctive Human Appearance dataset, an image set that represents appearances with low frequency and that tend to be undersampled in face datasets. Then, we evaluated current state-of-the-art face-detection models in their ability to detect faces in these images. The evaluation results show that face-detection algorithms do not generalize well to these diverse appearances. Evaluating and characterizing the state of current face-detection models will accelerate research and development towards creating fairer and more accurate face-detection systems.

翻訳日:2021-11-03 03:30:21 公開日:2021-11-01

# (参考訳) 勧告の比較解説

Comparative Explanations of Recommendations ( http://arxiv.org/abs/2111.00670v1 )

ライセンス: CC BY 4.0

Aobo Yang, Nan Wang, Renqin Cai, Hongbo Deng, Hongning Wang

(参考訳) レコメンデーションは基本的に比較(またはランク付け)のプロセスであるため、ある項目が他の項目よりも優れていると信じている理由、すなわち推奨項目に関する比較説明をユーザーに示す必要がある。理想的には、説明を読むと、ユーザーはシステムと同じ項目のランキングに到達すべきである。残念ながら、このような比較説明にはほとんど研究の注意が払われていない。本研究では,レコメンダシステムからランク付けされた項目群間の相対的な比較を説明するための抽出・再定義アーキテクチャを開発した。推奨項目ごとに、まず関連レビューから1つの文を抽出し、参照項目の集合に対して最適な比較を行う。そして、この抽出文を生成モデルを介して対象ユーザに対してさらに調音し、その項目を推奨する理由をよりよく説明する。我々はBLEUに基づく新しい説明品質指標を設計し、汎用コンテンツの生成を避けるために抽出・精錬部品のエンドツーエンドトレーニングを指導する。 2つの大規模レコメンデーションベンチマークデータセットに対する広範囲なオフライン評価と、最先端のレコメンデーションアルゴリズムに対する真面目なユーザ調査は、比較説明の必要性とソリューションの有効性を示している。

As recommendation is essentially a comparative (or ranking) process, a good explanation should illustrate to users why an item is believed to be better than another, i.e., comparative explanations about the recommended items. Ideally, after reading the explanations, a user should reach the same ranking of items as the system's. Unfortunately, little research attention has yet been paid to such comparative explanations. In this work, we develop an extract-and-refine architecture to explain the relative comparisons among a set of ranked items from a recommender system. For each recommended item, we first extract one sentence from its associated reviews that best suits the desired comparison against a set of reference items. Then this extracted sentence is further articulated with respect to the target user through a generative model to better explain why the item is recommended. We design a new explanation quality metric based on BLEU to guide the end-to-end training of the extraction and refinement components, which avoids generation of generic content. Extensive offline evaluations on two large recommendation benchmark datasets and serious user studies against an array of state-of-the-art explainable recommendation algorithms demonstrate the necessity of comparative explanations and the effectiveness of our solution.

翻訳日:2021-11-03 03:17:14 公開日:2021-11-01

# (参考訳) 特徴豊かさを有する蒸留物体検出器

Distilling Object Detectors with Feature Richness ( http://arxiv.org/abs/2111.00674v1 )

ライセンス: CC BY 4.0

Zhixing Du, Rui Zhang, Ming Chang, Xishan Zhang, Shaoli Liu, Tianshi Chen, Tianshi Chen

(参考訳) 近年、大規模深層モデルが大きな成功を収めているが、計算の複雑さと巨大なストレージ要件により、リソース制限のあるデバイスにデプロイすることが大きな課題となっている。モデル圧縮・加速法として、知識蒸留は教師検出器から暗黒知識を伝達することにより、小型モデルの性能を効果的に向上させる。しかし、既存の蒸留法に基づく検出法のほとんどは、主に2つの制限がある境界ボックス付近の特徴を模倣している。まず、バウンディングボックスの外にある有益な機能を無視する。第二に、これらの手法は教師検出器によって背景と見なされるいくつかの特徴を模倣する。以上の課題に対処するため,蒸留時の一般化検出性を向上する重要な特徴を選択するために,FRS(Feature-Richness Score)法を提案する。提案手法は,境界ボックスの外にある重要な特徴を効果的に検索し,境界ボックス内の有害な特徴を取り除く。本手法は,アンカーベース,アンカーフリー両検出器において優れた性能を示す。例えば、resnet-50のretinanetはcoco2017データセットのマップで39.7%に達し、resnet-101ベースの教師検出器38.9%を0.8%上回っている。

In recent years, large-scale deep models have achieved great success, but their huge computational complexity and massive storage requirements make it a great challenge to deploy them on resource-limited devices. As a model compression and acceleration method, knowledge distillation effectively improves the performance of small models by transferring the dark knowledge from the teacher detector. However, most of the existing distillation-based detection methods mainly imitate features near bounding boxes, which suffers from two limitations. First, they ignore the beneficial features outside the bounding boxes. Second, these methods imitate some features which are mistakenly regarded as the background by the teacher detector. To address the above issues, we propose a novel Feature-Richness Score (FRS) method to choose important features that improve generalized detectability during distilling. The proposed method effectively retrieves the important features outside the bounding boxes and removes the detrimental features within the bounding boxes. Extensive experiments show that our methods achieve excellent performance on both anchor-based and anchor-free detectors. For example, RetinaNet with ResNet-50 achieves 39.7% in mAP on the COCO2017 dataset, which even surpasses the 38.9% of the ResNet-101 based teacher detector by 0.8%.

翻訳日:2021-11-03 02:59:47 公開日:2021-11-01

# (参考訳) 球面埋め込みの領域適応

Domain-adaptation of spherical embeddings ( http://arxiv.org/abs/2111.00677v1 )

ライセンス: CC BY 4.0

Mihalis Gongolidis, Jeremy Minton, Ronin Wu, Valentin Stauber, Jason Hoelscher-Obermaier and Viktor Botev

(参考訳) 特定のドメインの言語に汎用的な埋め込みを更新する埋め込みモデルのドメイン適応は、効果的なモデルをスクラッチからトレーニングするのに不十分なデータを持つドメインにとって実証済みのテクニックである。化学出版物はそのような分野の1つであり、科学用語と過剰な用語が一般的な言語モデルのパフォーマンスを阻害する。近年の arXiv:1911.01196 で提案されている球面埋め込みモデル (JoSE) は,多次元単位球上での訓練において,単語と文書の埋め込みを共同で学習する。しかし、トレーニング中のグローバル回転による非収束は、ドメイン適応を妨げている。本研究では,埋め込み空間のグローバルなローテーションに対応する手法を開発し,ドメイン固有トレーニング中に単語や文書を更新する手法を提案する。 2つの新しい文書分類データセットがgeneral and chemistry scientific journalsから照合され、提案された更新トレーニング戦略とベンチマークモデルを比較する。当社の戦略は、word2vecに似たレベルまでドメイン適応のパフォーマンスコストを削減できることを示します。

Domain adaptation of embedding models, updating a generic embedding to the language of a specific domain, is a proven technique for domains that have insufficient data to train an effective model from scratch. Chemistry publications are one such domain, where scientific jargon and overloaded terminology inhibit the performance of a general language model. The recent spherical embedding model (JoSE) proposed in arXiv:1911.01196 jointly learns word and document embeddings during training on the multi-dimensional unit sphere, which performs well for document classification and word correlation tasks. However, we show that non-convergence caused by global rotations during its training prevents it from being used for domain adaptation. In this work, we develop methods to counter the global rotation of the embedding space and propose strategies to update words and documents during domain-specific training. Two new document classification datasets are collated from general and chemistry scientific journals to compare the proposed update training strategies with benchmark models. We show that our strategies are able to reduce the performance cost of domain adaptation to a level similar to Word2Vec.

翻訳日:2021-11-03 02:44:10 公開日:2021-11-01

# (参考訳) Discourse Comprehension: 文接続を表現するための質問応答フレームワーク

Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections ( http://arxiv.org/abs/2111.00701v1 )

ライセンス: CC BY 4.0

Wei-Jen Ko, Cutter Dalton, Mark Simmons, Eliza Fisher, Greg Durrett, Junyi Jessy Li

(参考訳) 単純なファクトイドの質問応答を通じてテキスト理解が大幅に進歩してきたが、談話のより包括的な理解は依然として大きな課題である。テキストを読みながら批判的に反省する人は、好奇心に駆られ、しばしば公然とした質問を提起し、内容の深い理解を反映し、答えるには複雑な推論を必要とする。この種の談話理解のためのモデルを構築して評価する上での重要な課題は、注釈付きデータの欠如である。本稿では,ニュース文書の理解を目的としたスケーラブルなデータ収集を実現するための新しいパラダイムを提案する。得られたコーパスであるDCQA(Discourse Comprehension by Question Answering)は、607の英語文書からなる22,430の質問回答ペアで構成されている。 DCQAは、言論と文間のセマンティックリンクの両方を自由形式のオープンエンドの質問形式でキャプチャする。 INQUISITIVEデータセットからの質問に注釈を付けた評価セットでは、DCQAがオープンな質問に答えるための貴重な監視を提供することを示す。さらに,既存の質問応答資源を活用した事前学習手法を設計,合成データを用いて不可解な質問に適応する。

While there has been substantial progress in text comprehension through simple factoid question answering, more holistic comprehension of a discourse still presents a major challenge. Someone critically reflecting on a text as they read it will pose curiosity-driven, often open-ended questions, which reflect deep understanding of the content and require complex reasoning to answer. A key challenge in building and evaluating models for this type of discourse comprehension is the lack of annotated data, especially since finding answers to such questions (which may not be answered at all) requires high cognitive load for annotators over long documents. This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents, viewing these questions through the lens of discourse. The resulting corpus, DCQA (Discourse Comprehension by Question Answering), consists of 22,430 question-answer pairs across 607 English documents. DCQA captures both discourse and semantic links between sentences in the form of free-form, open-ended questions. On an evaluation set that we annotated on questions from the INQUISITIVE dataset, we show that DCQA provides valuable supervision for answering open-ended questions. We additionally design pre-training methods utilizing existing question-answering resources, and use synthetic data to accommodate unanswerable questions.

翻訳日:2021-11-03 02:41:12 公開日:2021-11-01

# (参考訳) アウトラインとファイリング: ナレッジグラフ上の複雑な質問に答える階層的クエリグラフ生成

Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph ( http://arxiv.org/abs/2111.00732v1 )

ライセンス: CC BY 4.0

Yongrui Chen, Huiying Li, Guilin Qi, Tianxing Wu, and Tenggou Wang

(参考訳) クエリグラフの構築は、自然言語の質問に答えるナレッジグラフ上で、正確な実行可能なSPARQLを構築することを目的としている。最近のアプローチはnnベースのクエリグラフのランキングでうまく機能しているが、複雑なsparql構文、ランキングのための巨大な検索スペース、ローカルなあいまいさを伴うノイズの多いクエリグラフという3つの新しい課題がある。本稿ではこれらの課題に対処する。当初、一般的な複雑なSPARQL構文を頂点とエッジからなるサブグラフとみなし、それらを適応するための新しい統一クエリグラフ文法を提案する。次に,問合せグラフを構築するための新しい二段階アプローチを提案する。第1段階では、最上位の$k$関連インスタンス(エンティティ、関係など)は、候補インスタンスとして単純な戦略によって収集される。第2段階では、グラフ生成モデルが階層生成を行う。まず、頂点とエッジが空のスロットであるグラフ構造を概説し、次に適切なインスタンスをスロットに埋め、クエリグラフを完成させる。このアプローチでは,クエリグラフ全体の耐え難い検索空間を手頃なサブスペースに分解する一方で,グローバル構造情報を活用して局所曖昧性を排除する。実験結果から,本手法は最も難しいKGQAベンチマークの最先端性を大幅に向上し,複雑な問題に対して優れた性能を示すことが示された。

Query graph building aims to build correct executable SPARQL over the knowledge graph for answering natural language questions. Although recent approaches perform well with NN-based query graph ranking, more complex questions bring three new challenges: complicated SPARQL syntax, a huge search space for ranking, and noisy query graphs with local ambiguity. This paper addresses these challenges. First, we regard common complicated SPARQL syntax as sub-graphs comprising vertices and edges and propose a new unified query graph grammar to accommodate them. We then propose a new two-stage approach to build query graphs. In the first stage, the top-$k$ related instances (entities, relations, etc.) are collected by simple strategies as the candidate instances. In the second stage, a graph generation model performs hierarchical generation. It first outlines a graph structure whose vertices and edges are empty slots, and then fills the appropriate instances into the slots, thereby completing the query graph. Our approach decomposes the intractable search space of entire query graphs into affordable sub-spaces of operations, while leveraging global structural information to eliminate local ambiguity. The experimental results demonstrate that our approach greatly improves on the state of the art on the hardest KGQA benchmarks and performs excellently on complex questions.

翻訳日:2021-11-03 02:24:56 公開日:2021-11-01

# (参考訳) 信念を広める集団からのロバストな深層学習

Robust Deep Learning from Crowds with Belief Propagation ( http://arxiv.org/abs/2111.00734v1 )

ライセンス: CC BY 4.0

Hoyoung Kim, Seunghyuk Cho, Dongwoo Kim, Jungseul Ok

(参考訳) クラウドソーシングシステムにより、クラウドワーカーから騒がしいラベルを収集できます。ワーカとタスク間のローカル依存関係を表すグラフィカルモデルは、ノイズの多い回答から真のラベルを推論する原則的な方法を提供する。しかし、多くの場合、真のラベルではなく、クラウドソースされたデータセットから直接見えないデータに取り組んでいる予測モデルが必要です。真のラベルを推論し、同時に予測モデルを学習するために、ニューラルネットワークがタスク特徴から真のラベルを生成する新しいデータ生成プロセスを提案する。変動推論と深層学習を交互に交互に行うEMフレームワークを考案し,真のラベルを推測し,ニューラルネットワークを更新する。合成データと実データを用いた実験結果から,信念伝達に基づくemアルゴリズムは頑健であることが分かる。一業務の特徴の腐敗二前任のマルチモーダル又はミスマッチ労働者、及び三多くの業務に騒音を提出するスパマーは少ない。

Crowdsourcing systems enable us to collect noisy labels from crowd workers. A graphical model representing local dependencies between workers and tasks provides a principled way of reasoning over the true labels from the noisy answers. In many cases, however, what is needed is not the true labels themselves but a predictive model that works on unseen data and is learned directly from the crowdsourced dataset. To infer true labels and learn a predictive model simultaneously, we propose a new data-generating process, where a neural network generates the true labels from task features. We devise an EM framework alternating variational inference and deep learning to infer the true labels and to update the neural network, respectively. Experimental results with synthetic and real datasets show that a belief-propagation-based EM algorithm is robust to i) corruption in task features, ii) a multi-modal or mismatched worker prior, and iii) a few spammers submitting noise to many tasks.
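
To make the EM idea concrete, here is a deliberately simplified sketch. It is not the paper's method: there is no neural network and no belief propagation. It assumes binary labels, a uniform label prior, and a "one-coin" worker model in which each worker is correct with a single unknown probability. The function name and the toy data are hypothetical.

```python
def one_coin_em(answers, n_iters=20):
    """answers maps (task, worker) -> 0/1 label.
    Returns (posterior P(true label = 1) per task, estimated accuracy per worker)."""
    tasks = sorted({t for t, _ in answers})
    workers = sorted({w for _, w in answers})
    # Initialise each task's posterior with its share of 1-votes.
    post = {}
    for t in tasks:
        votes = [a for (tt, _), a in answers.items() if tt == t]
        post[t] = sum(votes) / len(votes)
    acc = {}
    for _ in range(n_iters):
        # M-step: accuracy = expected fraction of a worker's answers that match the truth.
        for w in workers:
            agree, total = 0.0, 0
            for (t, ww), a in answers.items():
                if ww == w:
                    agree += post[t] if a == 1 else 1.0 - post[t]
                    total += 1
            acc[w] = min(max(agree / total, 1e-3), 1.0 - 1e-3)  # keep away from 0 and 1
        # E-step: recompute each task's posterior from the current accuracies.
        for t in tasks:
            like1, like0 = 1.0, 1.0
            for (tt, w), a in answers.items():
                if tt == t:
                    like1 *= acc[w] if a == 1 else 1.0 - acc[w]
                    like0 *= acc[w] if a == 0 else 1.0 - acc[w]
            post[t] = like1 / (like1 + like0)
    return post, acc

# Two reliable workers and one spammer who always answers 1.
truth = {"t0": 1, "t1": 0, "t2": 1, "t3": 0, "t4": 1, "t5": 0}
answers = {}
for task, label in truth.items():
    answers[(task, "w0")] = label
    answers[(task, "w1")] = label
    answers[(task, "w2")] = 1
posteriors, accuracies = one_coin_em(answers)
```

On this toy data the spammer's estimated accuracy ends up well below that of the two reliable workers, so its votes carry little weight in the posteriors.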

翻訳日:2021-11-03 01:50:52 公開日:2021-11-01

# (参考訳) URIR:知識グラフに基づくユーザRNNエンコーダとアイテムエンコーダの推薦アルゴリズム

URIR: Recommendation algorithm of user RNN encoder and item encoder based on knowledge graph ( http://arxiv.org/abs/2111.00739v1 )

ライセンス: CC BY 4.0

Na zhao, Zhen Long, Zhi-Dan Zhao, Jian Wang

(参考訳) 情報量が多いため,ユーザが興味を持っているものを見つけることは困難である。ユーザ体験を改善するため,音楽レコメンデーションや映画レコメンデーション,オンラインショッピングなどのシナリオで広く利用されている。近年,知識グラフ(KG)はレコメンデーションシステムの性能向上に有効なツールであることが証明されている。しかし、レコメンデーションにナレッジグラフを適用する際の大きな課題は、ナレッジグラフを使ってより良いユーザコードやアイテムコードを取得する方法である。そこで本研究では,知識グラフ(URIR)に基づくユーザリカレントニューラルネットワーク(RNN)エンコーダとアイテムエンコーダ推薦アルゴリズムを提案する。本研究は,アイテムの表現ベクトルを生成するために高レベルな隣接情報をキャプチャしてアイテムをエンコードし,ユーザの表現ベクトルを生成するためにrnnおよびアイテムの表現ベクトルを適用し,ユーザの表現ベクトルおよびアイテムの表現ベクトルに対して内部積演算を行い,アイテムとのインタラクションの確率を得る。 3つの実世界のデータセットに関する数値実験により、URIRはAUC、Precision、Recall、MRRなどの指標における最先端アルゴリズムよりも優れた性能を示している。これは、urirがナレッジグラフを効果的に使用して、より良いユーザコードとアイテムコードを取得し、よりよい推奨結果を得ることができることを意味する。

Due to the large amount of information available, it is difficult for users to find what they are interested in among the many choices. To improve users' experience, recommendation systems have been widely used in music recommendation, movie recommendation, online shopping, and other scenarios. Recently, the Knowledge Graph (KG) has been proven to be an effective tool for improving the performance of recommendation systems. However, a major challenge in applying knowledge graphs to recommendation is how to use them to obtain better user codes and item codes. In response to this problem, this research proposes a user Recurrent Neural Network (RNN) encoder and item encoder recommendation algorithm based on Knowledge Graph (URIR). The algorithm encodes items by capturing high-level neighbor information to generate items' representation vectors, applies an RNN to the items' representation vectors to encode users and generate users' representation vectors, and then performs an inner product operation on users' and items' representation vectors to obtain the probabilities of users' interactions with items. Numerical experiments on three real-world datasets demonstrate that URIR outperforms state-of-the-art algorithms on metrics such as AUC, Precision, Recall, and MRR. This implies that URIR can effectively use knowledge graphs to obtain better user codes and item codes, thereby producing better recommendation results.
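
The final scoring step described above can be sketched in a few lines. The abstract only mentions an inner product; squashing it through a sigmoid to obtain a probability is an assumption made here, and the vectors and item names are made up for illustration.

```python
import math

def interaction_probability(user_vec, item_vec):
    """Inner product of the two representation vectors, mapped to (0, 1)
    with a sigmoid (the sigmoid is an assumption, not stated in the abstract)."""
    score = sum(u * v for u, v in zip(user_vec, item_vec))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical user and item representation vectors.
user = [0.2, -0.5, 1.0]
items = {"item_a": [0.1, -0.4, 0.9], "item_b": [-0.3, 0.8, -1.0]}

# Rank items by predicted interaction probability, highest first.
ranked = sorted(items, key=lambda name: interaction_probability(user, items[name]), reverse=True)
```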

翻訳日:2021-11-03 01:28:17 公開日:2021-11-01

# (参考訳) 3次元脳腫瘍MRIのセマンティックセグメンテーションにおける冗長性の検討

Redundancy Reduction in Semantic Segmentation of 3D Brain Tumor MRIs ( http://arxiv.org/abs/2111.00742v1 )

ライセンス: CC BY 4.0

Md Mahfuzur Rahman Siddiquee, Andriy Myronenko

(参考訳) また、multimodal brain tumor segmentation challenge (brats) 2021ではさらに大きなデータセットを提供し、疾患の分析と治療計画に必要な脳腫瘍の分割方法の協力と研究を容易にする。 BraTS 2021の大規模なデータセットサイズと現代的なGPUの出現は、データから腫瘍表現を学ぶためのディープラーニングベースのアプローチによりよい機会を提供する。本研究では,エンコーダ・デコーダに基づくセグメンテーションネットワークを維持しつつ,摂動下での冗長性を最小限に抑えるネットワークトレーニングプロセスの改良に焦点をあてた。ネットワークが訓練された場合、信頼性に基づくアンサンブル技術を導入し、パフォーマンスをさらに向上する。本手法をBraTS 2021検証ボード上で評価し, 腫瘍コア, 腫瘍コア, 腫瘍全体に対する平均ダイス0.8600, 0.8868, 0.9265を得た。私たちのチーム(NVAUTO)の応募は、ETとTCのスコアで上位10チーム、WTのスコアで上位10チームだった。

Another year of the multimodal brain tumor segmentation challenge (BraTS) 2021 provides an even larger dataset to facilitate collaboration and research on brain tumor segmentation methods, which are necessary for disease analysis and treatment planning. The large dataset size of BraTS 2021 and the advent of modern GPUs provide a better opportunity for deep-learning based approaches to learn tumor representation from the data. In this work, we kept an encoder-decoder based segmentation network, but focused on a modification of the network training process that minimizes redundancy under perturbations. Given a set of trained networks, we introduce a confidence-based ensembling technique to further improve the performance. We evaluated the method on the BraTS 2021 validation board and achieved average Dice scores of 0.8600, 0.8868 and 0.9265 for enhanced tumor core (ET), tumor core (TC) and whole tumor (WT), respectively. Our team's (NVAUTO) submission was the top-performing one in terms of ET and TC scores and within the top 10 teams in terms of WT scores.
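
For readers unfamiliar with the metric, the Dice score reported above is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on flattened binary masks follows; the masks are toy data, and returning 1.0 when both masks are empty is a convention chosen here, not necessarily the one used by the challenge.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Both masks empty: treat as perfect agreement (a convention, assumed here).
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
score = dice_score(pred, truth)  # 2 * 2 / (3 + 3)
```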

翻訳日:2021-11-03 01:16:53 公開日:2021-11-01

# (参考訳) 品質評価データセットを効率的に生成する新しいツール

A New Tool for Efficiently Generating Quality Estimation Datasets ( http://arxiv.org/abs/2111.00767v1 )

ライセンス: CC BY 4.0

Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim

(参考訳) 品質推定(QE)トレーニングのためのデータの構築は費用がかかり、かなりの人的労力を要する。本研究では、qeを実行しながらデータ中心のアプローチに注目し、入力として単言語または並列コーパスのみを受信してqeデータセットを生成する完全自動擬似qeデータセット生成ツールを提案する。これにより、データ拡張または複数の言語ペアにQEの適用性を活用するように促すことにより、QE性能が向上する。さらに、このツールがコミュニティにQEデータセットを開発するための新しい安価な方法を提供すると考えているので、ユーザフレンドリーなQEデータセット生成ツールを公開するつもりです。

Building data for quality estimation (QE) training is expensive and requires significant human labor. In this study, we focus on a data-centric approach to QE and propose a fully automatic pseudo-QE dataset generation tool that generates QE datasets from only a monolingual or parallel corpus as input. Consequently, QE performance is enhanced either through data augmentation or by extending the applicability of QE to multiple language pairs. Further, we intend to publicly release this user-friendly QE dataset generation tool, as we believe it provides the community with a new, inexpensive method for developing QE datasets.

翻訳日:2021-11-03 01:09:47 公開日:2021-11-01

# (参考訳) AdaPool: 情報保持ダウンサンプリングのための指数適応型プール

AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling ( http://arxiv.org/abs/2111.00772v1 )

ライセンス: CC BY 4.0

Alexandros Stergiou and Ronald Poppe

(参考訳) プール層は畳み込みニューラルネットワーク(cnns)の重要な構成要素であり、計算オーバーヘッドを削減し、畳み込み操作の受容野を増加させる。彼らは入力ボリュームによく似たサンプル化されたボリュームを作成し、理想的には計算とメモリ効率の両立を目指している。両方の要件を共同で満たすことは困難である。この目的のために、適応的で指数関数的に重み付けされたプール法である $\textit{adaPool}$ を提案する。提案手法では,dice-sorensen係数の指数値と指数最大値に基づく2組のプーリングカーネルのパラメータ化融合を用いる。 adaPoolの重要な性質は、その双方向性である。一般的なプーリング法とは対照的に、ウェイトはダウンサンプリングされたアクティベーションマップをアップサンプルするために使うことができる。このメソッドを $\textit{adaUnPool}$ とします。 adaPoolは画像やビデオの分類やオブジェクト検出など,さまざまなタスクを通じて,ディテールの保存性の向上を実証する。次に,画像および映像フレームの超解像とフレーム補間タスクにおけるadaunpoolの評価を行う。ベンチマークには、新しい高品質で高フレームレートのビデオデータセットである$\textit{Inter4K}$を導入する。組み合わせた実験により、adaPoolはタスクやバックボーンアーキテクチャにまたがる優れた結果を体系的に達成し、微妙な計算とメモリオーバーヘッドを発生させることを示した。

Pooling layers are essential building blocks of Convolutional Neural Networks (CNNs) that reduce computational overhead and increase the receptive fields of subsequent convolutional operations. They aim to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. It is a challenge to meet both requirements jointly. To this end, we propose an adaptive and exponentially weighted pooling method named $\textit{adaPool}$. Our proposed method uses a parameterized fusion of two sets of pooling kernels that are based on the exponent of the Dice-Sorensen coefficient and the exponential maximum, respectively. A key property of adaPool is its bidirectional nature. In contrast to common pooling methods, the weights can be used to upsample a downsampled activation map. We term this method $\textit{adaUnPool}$. We demonstrate how adaPool improves the preservation of detail through a range of tasks including image and video classification and object detection. We then evaluate adaUnPool on image and video frame super-resolution and frame interpolation tasks. For benchmarking, we introduce $\textit{Inter4K}$, a novel high-quality, high frame-rate video dataset. Our combined experiments demonstrate that adaPool systematically achieves better results across tasks and backbone architectures, while introducing only a minor additional computational and memory overhead.
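
As a small illustration of one of the two kernel families mentioned above, the sketch below pools a 1D signal with exponential-maximum weights, that is, a softmax-weighted sum over each region. It omits the Dice-Sorensen-based kernel and the learned fusion, so it is not adaPool itself; the function names, the 1D setting, and the kernel size are assumptions for illustration.

```python
import math

def exp_max_pool(region):
    """Softmax-weighted sum of the activations in one pooling region."""
    shift = max(region)  # subtract the max for numerical stability
    weights = [math.exp(a - shift) for a in region]
    norm = sum(weights)
    return sum(w * a for w, a in zip(weights, region)) / norm

def pool_1d(signal, kernel=2):
    """Non-overlapping exponential-maximum pooling over a 1D list."""
    return [exp_max_pool(signal[i:i + kernel])
            for i in range(0, len(signal) - kernel + 1, kernel)]

pooled = pool_1d([1.0, 3.0, 2.0, 2.0, -1.0, 0.5])
```

Each output lies between the region's mean and its maximum, which is how this kind of weighting keeps more detail than average pooling without discarding everything but the peak as max pooling does.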

翻訳日:2021-11-03 01:05:43 公開日:2021-11-01

# (参考訳) AIを利用した支払いシステムのためのスマートルーティングソリューション

An AI-powered Smart Routing Solution for Payment Systems ( http://arxiv.org/abs/2111.00783v1 )

ライセンス: CC0 1.0

Ramya Bygari, Aayush Gupta, Shashwat Raghuvanshi, Aakanksha Bapna, Birendra Sahu

(参考訳) 現在のデジタル化時代には、オンライン決済システムがかなりの関心を集めている。支払いシステムの効率性の向上は、ビジネスの収益に大きな影響を与えるため重要である。ゲートウェイは、すべてのトランザクションがルーティングされる支払いシステムの不可欠なコンポーネントである。オンライン決済システムでは、支払い処理は価格、方法、リスクチェックなど様々な設定によってこれらのゲートウェイと統合される。これらの構成を端末と呼ぶ。各ゲートウェイには複数の端末が関連付けられる。支払いトランザクションを最良の端末にルーティングすることは、支払いトランザクションが成功する確率を高めるために不可欠である。機械学習(ML)と人工知能(AI)技術は、過去のパフォーマンスと様々な支払い関連属性に基づいて、最適な端末を正確に予測するために使用することができる。我々は静的モジュールと動的モジュールからなるパイプラインを考案した。静的モジュールは、静的ルールとゲートウェイのダウンタイムを予測するロジスティック回帰モデルを使用して、端末の初期フィルタリングを行う。その後、動的モジュールは成功率、支払い属性、タイムラグなどに基づいて多くの新しい特徴を計算し、端末動作を正確にモデル化する。これらの特徴を適応時間減衰率アルゴリズムを用いてリアルタイムにフィードバックループを用いて更新し、ランダムフォレスト分類器に渡して端末毎の成功確率を予測する。このパイプラインは現在razorpayで運用中であり、数百万のトランザクションをリアルタイムにルーティングし、すべての支払い方法(クレジットカード、デビットカード、upi、ネットバンキング)で成功率を4-6\%向上させている。これにより、当社の決済システムはパフォーマンス低下に対する耐性が向上し、ユーザエクスペリエンスが向上し、商人への信頼が増し、ビジネスの収益が向上しました。

In the current era of digitization, online payment systems are attracting considerable interest. Improving the efficiency of a payment system is important since it has a substantial impact on revenues for businesses. A gateway is an integral component of a payment system through which every transaction is routed. In an online payment system, payment processors integrate with these gateways by means of various configurations such as pricing, methods, risk checks, etc. These configurations are called terminals. Each gateway can have multiple terminals associated with it. Routing a payment transaction through the best terminal is crucial to increase the probability of a payment transaction being successful. Machine learning (ML) and artificial intelligence (AI) techniques can be used to accurately predict the best terminals based on their previous performance and various payment-related attributes. We have devised a pipeline consisting of static and dynamic modules. The static module does the initial filtering of the terminals using static rules and a logistic regression model that predicts gateway downtimes. Subsequently, the dynamic module computes many novel features based on success rate, payment attributes, time lag, etc. to model the terminal behaviour accurately. These features are updated in real time through a feedback loop using an adaptive time decay rate algorithm, and are passed to a random forest classifier to predict the success probabilities for every terminal. This pipeline is currently in production at Razorpay, routing millions of transactions in real time, and has given a 4-6% improvement in success rate across all payment methods (credit card, debit card, UPI, net banking). This has made our payment system more resilient to performance drops, which has improved the user experience, instilled more trust in the merchants, and boosted the revenue of the business.
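
One way to picture a time-decayed success-rate feature of the kind mentioned above is an exponentially decayed counter per terminal. The sketch below uses a fixed half-life, whereas the abstract describes an adaptive decay rate whose details are not given; the class name, the half-life value, and the timestamps are all hypothetical.

```python
import math

class DecayedSuccessRate:
    """Success rate in which older transactions count exponentially less.
    Timestamps are in seconds and assumed to be non-decreasing."""

    def __init__(self, half_life_s=300.0):
        self.decay = math.log(2.0) / half_life_s
        self.successes = 0.0
        self.attempts = 0.0
        self.last_ts = None

    def update(self, ts, success):
        # Shrink the running counts by the time elapsed since the last update.
        if self.last_ts is not None:
            factor = math.exp(-self.decay * (ts - self.last_ts))
            self.successes *= factor
            self.attempts *= factor
        self.last_ts = ts
        self.attempts += 1.0
        self.successes += 1.0 if success else 0.0

    def rate(self, prior=0.5):
        # Fall back to a prior when the terminal has no history yet.
        return prior if self.attempts == 0 else self.successes / self.attempts

terminal = DecayedSuccessRate(half_life_s=300.0)
for ts in range(0, 5):          # five old successes
    terminal.update(ts, True)
for ts in range(3000, 3005):    # five recent failures, about ten half-lives later
    terminal.update(ts, False)
recent_rate = terminal.rate()
```

A plain average over these ten transactions would be 0.5; the decayed rate is close to zero because the recent failures dominate, which is the behaviour a router needs to react quickly to a terminal that has just started failing.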

翻訳日:2021-11-03 00:41:23 公開日:2021-11-01

# (参考訳) 大規模製品ネットワークにおける動的価格と需要学習 : PAC-Bayesianアプローチ

Dynamic Pricing and Demand Learning on a Large Network of Products: A PAC-Bayesian Approach ( http://arxiv.org/abs/2111.00790v1 )

ライセンス: CC BY 4.0

Bora Keskin, David Simchi-Levi, Prem Talwai

(参考訳) 私たちは、T$の期間でN$製品の大規模なネットワークを提供する売り手を考えています。販売者は、製品の線形需要モデルのパラメータを知らず、販売観察に基づいて需要モデルを学ぶために製品価格を動的に調整することができる。売り手は、その疑似レグレット、すなわち、基盤となる需要モデルを知っている透視能力者に対する期待収益損失を最小化することを目指している。我々は,製品ネットワークの様々な接続特性を特徴付けるために,製品間の需要関係のばらばらな集合を考える。特に,(1)ネットワーク上の接続数を制限する$l_0$ sparsity,(2)クロスプロダクト価格感受性の大きさを制約する対角的スパーシティ,(3)ネットワークノード上の類似度メトリックの漸近的減衰を制約するスペクトルスパーシティという新たな概念の3つの異なるスパーシティフレームワークについて検討した。我々は,不確実性とpac-bayesianアプローチの楽観性を組み合わせた動的価格学習政策を提案し,この方針がn$と$t$の漸近的最適性能を達成することを示す。また,スペクトル・非対角性の場合,ネットワークが密集している場合でも,売り手は疑似レグレット線形をn$で得ることができることを示した。

We consider a seller offering a large network of $N$ products over a time horizon of $T$ periods. The seller does not know the parameters of the products' linear demand model, and can dynamically adjust product prices to learn the demand model based on sales observations. The seller aims to minimize its pseudo-regret, i.e., the expected revenue loss relative to a clairvoyant who knows the underlying demand model. We consider a sparse set of demand relationships between products to characterize various connectivity properties of the product network. In particular, we study three different sparsity frameworks: (1) $L_0$ sparsity, which constrains the number of connections in the network, (2) off-diagonal sparsity, which constrains the magnitude of cross-product price sensitivities, and (3) a new notion of spectral sparsity, which constrains the asymptotic decay of a similarity metric on network nodes. We propose a dynamic pricing-and-learning policy that combines the optimism-in-the-face-of-uncertainty and PAC-Bayesian approaches, and show that this policy achieves asymptotically optimal performance in terms of $N$ and $T$. We also show that in the case of spectral and off-diagonal sparsity, the seller can have a pseudo-regret linear in $N$, even when the network is dense.
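
To make the pseudo-regret definition concrete, the sketch below works through the simplest possible case: a single product with expected demand a - b·p, so expected revenue is p(a - b·p) and the clairvoyant's price is a/(2b). This single-product setting, the parameter values, and the price sequence are illustrative assumptions; the paper itself studies a network of $N$ products.

```python
def expected_revenue(price, a, b):
    """Expected revenue under linear demand d(p) = a - b * p."""
    return price * (a - b * price)

def pseudo_regret(prices, a, b):
    """Expected revenue lost relative to a clairvoyant who always charges a / (2b)."""
    best_price = a / (2.0 * b)
    best = expected_revenue(best_price, a, b)
    return sum(best - expected_revenue(p, a, b) for p in prices)

# With a = 10 and b = 2 the optimal price is 2.5 and yields revenue 12.5.
regret = pseudo_regret([1.0, 2.0, 2.5, 3.0], a=10.0, b=2.0)  # 4.5 + 0.5 + 0.0 + 0.5
```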

翻訳日:2021-11-03 00:31:57 公開日:2021-11-01

# (参考訳) 局所シナプス塑性によるイベントベース時空間特徴記述子学習:コンピュータビジョンの生物学的現実的視点

Learning Event-based Spatio-Temporal Feature Descriptors via Local Synaptic Plasticity: A Biologically-realistic Perspective of Computer Vision ( http://arxiv.org/abs/2111.00791v1 )

ライセンス: CC BY 4.0

Ali Safa, Hichem Sahli, André Bourdoux, Ilja Ocket, Francky Catthoor, Georges Gielen

(参考訳) 視覚野で経験的に観察されるように,スパイクタイミング依存塑性学習(STDP)を用いたスパイク皮質アンサンブルを最適化した理論を提案する。提案手法を用いて,N-MNIST,CIFAR10-DVS,IBM DVS128ジェスチャデータセットでそれぞれ評価するイベントベースカメラのための,完全接続型,畳み込み型,アクションベースの機能記述器のクラスを構築した。 CIFAR10-DVSでは,従来のイベントベースの特徴記述子 (+8%) と比較して, 精度が向上した。最新のSTDPシステムに比べて精度が大幅に向上した(N-MNISTでは+10%、IBM DVS128 Gestureでは+7.74%)。ニューロモルフィックエッジデバイスにおける超低消費電力学習に加えて、私たちの研究は、生物学的に現実的で最適化に基づく皮質視覚の理論への道を開くのに役立ちます。

We present an optimization-based theory describing spiking cortical ensembles equipped with Spike-Timing-Dependent Plasticity (STDP) learning, as empirically observed in the visual cortex. Using our methods, we build a class of fully-connected, convolutional and action-based feature descriptors for event-based cameras, which we assess on N-MNIST, the challenging CIFAR10-DVS, and the IBM DVS128 Gesture dataset, respectively. We report significant accuracy improvements compared to conventional state-of-the-art event-based feature descriptors (+8% on CIFAR10-DVS). We report large improvements in accuracy compared to state-of-the-art STDP-based systems (+10% on N-MNIST, +7.74% on IBM DVS128 Gesture). In addition to ultra-low-power learning in neuromorphic edge devices, our work helps pave the way towards a biologically-realistic, optimization-based theory of cortical vision.
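
For background, the classic pair-based STDP rule strengthens a synapse when the presynaptic spike precedes the postsynaptic one and weakens it otherwise, with an effect that fades exponentially with the spike-time gap. The sketch below shows that textbook rule, not the optimization-based formulation of the paper; the amplitudes and time constants are arbitrary illustrative values.

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre in milliseconds."""
    if dt_ms > 0:   # pre fires before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:   # post fires before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0

# Nearly coincident causal spikes change the weight more than distant ones.
changes = [stdp_delta_w(dt) for dt in (5.0, 50.0, -5.0)]
```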

翻訳日:2021-11-03 00:30:49 公開日:2021-11-01

# (参考訳) 予測モデルにおける共起バイアスの統計的定量化

Statistical quantification of confounding bias in predictive modelling ( http://arxiv.org/abs/2111.00814v1 )

ライセンス: CC BY 4.0

Tamas Spisak

(参考訳) 非パラメトリックな統計テストの欠如は、多くの研究分野における堅牢で有効で一般化可能な予測モデルの開発を著しく妨げている。ここでは、ある共同設立変数に対して、未確立モデルと完全構築モデルのヌル仮説をそれぞれ探索する部分的および完全共創テストを提案する。テストは、機械学習でよく見られる非正規および非線形依存予測においても、タイプiのエラーと高い統計力に対する厳密な制御を提供する。 Human Connectome ProjectとAutism Brain Imaging Data Exchangeのデータセットからトレーニングされた脳の機能的コネクティビティデータに基づいて、提案されたテストを適用することで、これまで報告されていない、あるいは最先端のコンファウンド緩和アプローチで修正が難しい共同創設者が明らかになった。 mlconfound(https://mlconfound.readthedocs.io)パッケージに実装されたこのテストは、予測モデルの一般化性と神経生物学的妥当性の評価と改善を支援し、臨床的に有用な機械学習バイオマーカーの開発を促進する。

The lack of non-parametric statistical tests for confounding bias significantly hampers the development of robust, valid and generalizable predictive models in many fields of research. Here I propose the partial and full confounder tests, which, for a given confounder variable, probe the null hypotheses of unconfounded and fully confounded models, respectively. The tests provide a strict control for Type I errors and high statistical power, even for non-normally and non-linearly dependent predictions, often seen in machine learning. Applying the proposed tests on models trained on functional brain connectivity data from the Human Connectome Project and the Autism Brain Imaging Data Exchange dataset reveals confounders that were previously unreported or found to be hard to correct for with state-of-the-art confound mitigation approaches. The tests, implemented in the package mlconfound (https://mlconfound.readthedocs.io), can aid the assessment and improvement of the generalizability and neurobiological validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers.

翻訳日:2021-11-03 00:12:48 公開日:2021-11-01

# (参考訳) ロバストネスのためのエッセンス仕様の改訂に向けて

Towards Reformulating Essence Specifications for Robustness ( http://arxiv.org/abs/2111.00821v1 )

ライセンス: CC BY-SA 4.0

Özgür Akgün, Alan M. Frisch, Ian P. Gent, Christopher Jefferson, Ian Miguel, Peter Nightingale, András Z. Salamon

(参考訳) Essence言語は、ユーザが制約モデリング決定を行う上で上の抽象レベルで制約問題を指定することを可能にする。エッセンス仕様は、一連のリファインメントルールを使用するConjure自動モデリングツールを使用して制約モデルに洗練される。しかし本質は、与えられた問題を特定するための多くの等価な方法があるリッチ言語である。したがって、ユーザーはドメイン属性や抽象型の使用を省略できるため、適用可能な洗練されたルールが少なくなり、選択する出力モデルのセットが削減される。本稿では,入力エッセンス仕様の変動に直面した出力制約モデルの品質のロバスト性を高めるために,この情報を自動回復する問題に対処する。我々は、決定変数の型を変更したり、ドメインを縮小する属性を追加することができる改革ルールを提示します。本手法の有効性を,変換仕様から生成できるモデルの量と品質の観点から示す。

The Essence language allows a user to specify a constraint problem at a level of abstraction above that at which constraint modelling decisions are made. Essence specifications are refined into constraint models using the Conjure automated modelling tool, which employs a suite of refinement rules. However, Essence is a rich language in which there are many equivalent ways to specify a given problem. A user may therefore omit the use of domain attributes or abstract types, resulting in fewer refinement rules being applicable and therefore a reduced set of output models from which to select. This paper addresses the problem of recovering this information automatically to increase the robustness of the quality of the output constraint models in the face of variation in the input Essence specification. We present reformulation rules that can change the type of a decision variable or add attributes that shrink its domain. We demonstrate the efficacy of this approach in terms of the quantity and quality of models Conjure can produce from the transformed specification compared with the original.

翻訳日:2021-11-02 23:47:04 公開日:2021-11-01

# (参考訳) 再現性レンズによる人工知能の公正性・説明責任・信頼度・透明性の教育

Teaching Fairness, Accountability, Confidentiality, and Transparency in Artificial Intelligence through the Lens of Reproducibility ( http://arxiv.org/abs/2111.00826v1 )

ライセンス: CC BY 4.0

Ana Lucic, Maurits Bleeker, Sami Jullien, Samarth Bhargav, Maarten de Rijke

(参考訳) 本研究は,アムステルダム大学の公正性,説明可能性,信頼性,透明性に関する技術的・大学院レベルのコース(FACT-AI)について,再現性のレンズを通してFACT-AIの概念を教える。コースの焦点は、トップAIカンファレンスから既存のFACT-AIアルゴリズムを再現し、彼らの経験に関するレポートを書くことに基づくグループプロジェクトである。コースの最初のイテレーションで、私たちはグループプロジェクトのコード実装を備えたオープンソースリポジトリを作成しました。第2イテレーションでは、学生に対して、機械学習再現性チャレンジにグループプロジェクトを提出するように勧めました。我々は、1年が世界的なパンデミックと一致した2年間の授業を指導した経験を振り返り、大学院レベルのaiプログラムで再現性を通じてファクトaiを教えるためのガイドラインを提案する。将来、教員が大学に同様のコースを開設する上で有用なリソースになることを願っている。

In this work we explain the setup for a technical, graduate-level course on Fairness, Accountability, Confidentiality and Transparency in Artificial Intelligence (FACT-AI) at the University of Amsterdam, which teaches FACT-AI concepts through the lens of reproducibility. The focal point of the course is a group project based on reproducing existing FACT-AI algorithms from top AI conferences, and writing a report about their experiences. In the first iteration of the course, we created an open source repository with the code implementations from the group projects. In the second iteration, we encouraged students to submit their group projects to the Machine Learning Reproducibility Challenge, which resulted in 9 reports from our course being accepted to the challenge. We reflect on our experience teaching the course over two academic years, where one year coincided with a global pandemic, and propose guidelines for teaching FACT-AI through reproducibility in graduate-level AI programs. We hope this can be a useful resource for instructors to set up similar courses at their universities in the future.

翻訳日:2021-11-02 23:37:47 公開日:2021-11-01

# (参考訳) 低リソース言語における名前付きエンティティ認識のためのディープラーニングトランスフォーマーアーキテクチャ:最先端の成果

Deep Learning Transformer Architecture for Named Entity Recognition on Low Resourced Languages: State of the art results ( http://arxiv.org/abs/2111.00830v1 )

ライセンス: CC BY 4.0

Ridewaan Hanslo

(参考訳) 本稿では,低リソースの南アフリカ(SA)言語10言語を対象としたNERのためのディープラーニング(DL)トランスフォーマーアーキテクチャモデルの評価について述べる。さらに、これらのDLトランスモデルを他のニューラルネットワークおよび機械学習(ML)NERモデルと比較した。その結果,言語毎に離散的な微調整パラメータを適用すると,トランスフォーマーモデルの性能が著しく向上することがわかった。さらに、微調整トランスフォーマーモデルは、低リソースのsa言語でnerを使った他のニューラルネットワークや機械学習モデルよりも優れている。例えば、トランスフォーマーモデルは、条件付き確率場mlモデルを上回る平均f-scoreを含む10のsa言語のうち6言語で最高のf-scoreを生成した。さらなる研究は、フレーズチャンキング、機械翻訳、パート・オブ・スパイチ・タギングなど、他の自然言語処理タスクやアプリケーションにおける、より最近のトランスフォーマーアーキテクチャモデルを評価する可能性がある。

This paper reports on the evaluation of Deep Learning (DL) transformer architecture models for Named-Entity Recognition (NER) on ten low-resourced South African (SA) languages. In addition, these DL transformer models were compared to other Neural Network and Machine Learning (ML) NER models. The findings show that transformer models significantly improve performance when applying discrete fine-tuning parameters per language. Furthermore, fine-tuned transformer models outperform other neural network and machine learning models with NER on the low-resourced SA languages. For example, the transformer models generated the highest F-scores for six of the ten SA languages, including the highest average F-score surpassing the Conditional Random Fields ML model. Additional research could evaluate the more recent transformer architecture models on other Natural Language Processing tasks and applications, such as Phrase chunking, Machine Translation, and Part-of-Speech tagging.
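
The F-scores compared above are harmonic means of precision and recall. A minimal entity-level sketch follows, in which an entity counts as correct only when both its span and its type match exactly; that exact-match convention and the toy annotations are assumptions for illustration, since the abstract does not state the scoring scheme.

```python
def f_score(gold, predicted):
    """Entity-level F1 over (start, end, type) tuples with exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")]
predicted = [(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG")]  # one wrong entity type
score = f_score(gold, predicted)  # precision = recall = 2/3
```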

翻訳日:2021-11-02 23:21:37 公開日:2021-11-01

# (参考訳) deep learning seeded importance sampling による重力波の高速定位

Swift sky localization of gravitational waves using deep learning seeded importance sampling ( http://arxiv.org/abs/2111.00833v1 )

ライセンス: CC BY 4.0

Alex Kolmus, Grégory Baltus, Justin Janquart, Twan van Laarhoven, Sarah Caudill, and Tom Heskes

(参考訳) 重力波の天起源の高速で高精度で信頼性の高い推定は、リアルタイムのマルチメッセンガー天文学を可能にする。現在のベイズ推論方法論は、正確かつ信頼性が高いが、遅い。ディープラーニングモデルは、重力波の推論タスクに対して正確で極めて高速であることを示してきたが、その出力はニューラルネットワークのブラックボックスの性質のために本質的に疑わしい。本研究では,多頭部畳み込みニューラルネットワークによって生成された近似後部への重要サンプリングを適用し,ベイズ推論と深層学習に参加する。ニューラルネットワークは、ligoおよびvirgo検出器に与えられたシミュレーション重力波注入のための空座標と2つの質量のフォン・ミセス・フィッシャー分布とガウス分布をパラメータ化する。ベイズ推定を数分で生成する予測に非常によく似た、見えない重力波イベントのためのスカイマップを生成する。さらに、ニューラルネットワークから予測不良を検出し、素早くフラグを立てることができます。

Fast, highly accurate, and reliable inference of the sky origin of gravitational waves would enable real-time multi-messenger astronomy. Current Bayesian inference methodologies, although highly accurate and reliable, are slow. Deep learning models have proven accurate and extremely fast for inference tasks on gravitational waves, but their output is inherently questionable due to the black-box nature of neural networks. In this work, we combine Bayesian inference and deep learning by applying importance sampling to an approximate posterior generated by a multi-headed convolutional neural network. The neural network parametrizes von Mises-Fisher and Gaussian distributions for the sky coordinates and the two masses, respectively, of simulated gravitational-wave injections in the LIGO and Virgo detectors. Within a few minutes, we generate sky maps for unseen gravitational-wave events that closely resemble the predictions obtained with Bayesian inference. Furthermore, we can detect poor predictions from the neural network and quickly flag them.
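
The core idea, reweighting samples from a cheap approximate posterior toward the true one, can be sketched in one dimension. Here a Gaussian stands in for the network's proposal and another Gaussian for the true posterior; both are toy assumptions, as is the use of the effective sample size (ESS) as the signal for a poor proposal, since the abstract does not say how poor predictions are detected.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_sample(target_pdf, proposal_mean, proposal_std, n, rng):
    """Self-normalised importance sampling with a Gaussian proposal.
    Returns the estimated target mean and the effective sample size."""
    samples = [rng.gauss(proposal_mean, proposal_std) for _ in range(n)]
    weights = [target_pdf(x) / normal_pdf(x, proposal_mean, proposal_std) for x in samples]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, samples)) / total
    ess = total * total / sum(w * w for w in weights)
    return mean, ess

rng = random.Random(0)
target = lambda x: normal_pdf(x, 1.0, 0.5)   # stand-in for the true posterior

# A proposal that roughly covers the target versus one that misses it.
good_mean, good_ess = importance_sample(target, 0.8, 1.0, 20000, rng)
_, poor_ess = importance_sample(target, 4.0, 0.3, 20000, rng)
```

With the well-placed proposal the estimate lands near the true mean of 1.0 and a large share of the samples remain effective; with the misplaced proposal the ESS collapses, which gives a cheap way to flag a prediction that should not be trusted.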

翻訳日:2021-11-02 23:10:35 公開日:2021-11-01

# (参考訳) GradCAMを用いた実時間MRI変動のシミュレーションによるディープラーニングモデルと視覚的説明の改善

Simulating Realistic MRI variations to Improve Deep Learning model and visual explanations using GradCAM ( http://arxiv.org/abs/2111.00837v1 )

ライセンス: CC BY 4.0

Muhammad Ilyas Patel, Shrey Singla, Razeem Ahmad Ali Mattathodi, Sumit Sharma, Deepam Gautam, Srinivasa Rao Kundeti

(参考訳) 医療分野では、MRIのランドマーク検出は、スキャン計画や画像登録などのタスクにおいて、医療技術者の努力を減らす上で重要な役割を果たす。まず、脳の解剖学に散在する88のランドマーク -- 矢状、冠、軸 -- が手動で注釈付けされ、その後、専門臨床技術者のガイドラインは、斜めスキャンでも重要なアトラスのランドマークを特定するために、既存のランドマークのより適切な位置化のために解剖学的に取られる。限られたデータ可用性を克服するため,合成3次元ボリュームデータを生成するために,現実的なデータ拡張を実装した。修正されたHighRes3DNetモデルを用いて脳MRIボリュームランドマーク検出問題を解決する。未発見のデータ上でトレーニングされたモデルを視覚的に説明し、より弱いモデルからより強固なモデルを識別するために、勾配重み付きクラスアクティベーションマッピング(grad-cam)を実装し、モデルが集中している領域を強調する粗いローカライズマップを作成します。実験の結果,提案手法は良好な結果を示し,パイプライン全体を多数のランドマークや他の解剖学に拡張できることがわかった。

In the medical field, landmark detection in MRI plays an important role in reducing medical technicians' effort in tasks like scan planning and image registration. First, 88 landmarks spread across the brain anatomy in the three views (sagittal, coronal, and axial) are manually annotated. Guidelines from expert clinical technicians are then applied sub-anatomy-wise to better localize the existing landmarks, so that the important atlas landmarks can be identified and located even in oblique scans. To overcome limited data availability, we implement realistic data augmentation to generate synthetic 3D volumetric data. We use a modified HighRes3DNet model to solve the brain MRI volumetric landmark detection problem. To visually explain our trained model on unseen data, and to distinguish a stronger model from a weaker one, we implement Gradient-weighted Class Activation Mapping (Grad-CAM), which produces a coarse localization map highlighting the regions the model is focusing on. Our experiments show that the proposed method gives favorable results, and that the overall pipeline can be extended to a variable number of landmarks and to other anatomies.

翻訳日:2021-11-02 22:54:26 公開日:2021-11-01

# (参考訳) 大規模ディープラーニング最適化: 総合的な調査

Large-Scale Deep Learning Optimizations: A Comprehensive Survey ( http://arxiv.org/abs/2111.00856v1 )

ライセンス: CC BY 4.0

Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You

(参考訳) ディープラーニングは、幅広いAIアプリケーションで有望な結果を得た。より大きなデータセットとモデルにより、継続的にパフォーマンスが向上する。しかし、私たちは一般的に、より多くの計算と通信に長いトレーニング時間を費やしています。本研究では,モデル精度とモデル効率に関して,大規模深層学習の最適化に関する明確なスケッチを提供する。我々は,大規模バッチ学習で発生する一般化ギャップの解答的トピックを最適化するために最もよく用いられるアルゴリズムについて検討し,通信オーバヘッドに対処し,メモリフットプリントを削減するためのSOTA戦略を概観する。

Deep learning has achieved promising results on a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, we generally spend longer training time on more computation and communication. In this survey, we aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency. We investigate the algorithms most commonly used for optimization, elaborate on the debatable topic of the generalization gap that arises in large-batch training, and review the SOTA strategies for addressing communication overhead and reducing memory footprint.

翻訳日:2021-11-02 22:39:47 公開日:2021-11-01

# (参考訳) pcaに基づくマルチタスク学習:ランダムマトリクスによるアプローチ

PCA-based Multi Task Learning: a Random Matrix Approach ( http://arxiv.org/abs/2111.00924v1 )

ライセンス: CC BY 4.0

Malik Tiomoko, Romain Couillet and Frédéric Pascal

(参考訳) 本稿では,人気主成分分析(PCA)に基づく教師付き学習スキーム \cite{barshan2011supervised,bair2006prediction} における,\emph{computationally efficient} multi-task learning (MTL)拡張の提案と理論的解析を行う。その分析が明らかにする (i) デフォルト学習は,\emph{negative transfer} に苦しむことによって劇的に失敗することがあるが, (ii)データラベルの単純なカウンタ測定は負の転送を回避し、必ずしも性能の向上をもたらす。合成および実データベンチマーク実験の支援により,提案手法は最先端のMTL法と同等の性能を示すが,計算コストは大幅に削減された。

The article proposes and theoretically analyses a \emph{computationally efficient} multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes \cite{barshan2011supervised,bair2006prediction}. The analysis reveals that (i) by default learning may dramatically fail by suffering from \emph{negative transfer}, but that (ii) simple counter-measures on data labels avert negative transfer and necessarily result in improved performances. Supporting experiments on synthetic and real data benchmarks show that the proposed method achieves comparable performance with state-of-the-art MTL methods but at a \emph{significantly reduced computational cost}.

翻訳日:2021-11-02 22:38:56 公開日:2021-11-01

# (参考訳) 領域不確実性定量化による半教師あり学習

Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification ( http://arxiv.org/abs/2111.00928v1 )

ライセンス: CC BY 4.0

Zhenyu Wang, Yali Li, Ye Guo, Shengjin Wang

(参考訳) 半教師付き学習は、大量のラベルのないデータをパフォーマンス向上に活用することを目的としている。既存の作品は主に画像分類に焦点を当てている。本稿では,ラベル付きデータの収集に手間がかかる物体検出のための半教師付き学習について述べる。現在の手法は、擬似ラベルによって生成されるノイズの多い領域によって容易に妨げられる。雑音ラベリングに対処するため,領域の不確実性を定量化して雑音耐性半教師付き学習を提案する。まず, 擬似ラベルによるノイズの異なる形態による悪影響について検討した。そこで本研究では,異なる強度の領域の耐雑音特性を同定することにより,領域の不確かさを定量化する。領域不確かさをインポートし、マルチピーク確率分布アウトプットを促進することにより、不確実性をトレーニングに導入し、さらに耐雑音学習を実現する。 PASCAL VOCとMS COCOの併用実験により,本手法の異常な性能を実証した。

Semi-supervised learning aims to leverage a large amount of unlabeled data for performance boosting. Existing works primarily focus on image classification. In this paper, we delve into semi-supervised learning for object detection, where labeled data are more labor-intensive to collect. Current methods are easily distracted by noisy regions generated by pseudo labels. To combat the noisy labeling, we propose noise-resistant semi-supervised learning by quantifying the region uncertainty. We first investigate the adverse effects brought by different forms of noise associated with pseudo labels. Then we propose to quantify the uncertainty of regions by identifying the noise-resistant properties of regions over different strengths. By importing the region uncertainty quantification and promoting multipeak probability distribution output, we introduce uncertainty into training and further achieve noise-resistant learning. Experiments on both PASCAL VOC and MS COCO demonstrate the extraordinary performance of our method.

翻訳日:2021-11-02 22:21:38 公開日:2021-11-01

# (参考訳) あらゆる境界:双方向境界を持つエネルギーベースモデルのトレーニング

Bounds all around: training energy-based models with bidirectional bounds ( http://arxiv.org/abs/2111.00929v1 )

ライセンス: CC BY 4.0

Cong Geng, Jia Wang, Zhiyong Gao, Jes Frellsen, Søren Hauberg

(参考訳) エネルギーベースモデル(EBM)は密度推定のためのエレガントなフレームワークを提供するが、それらは訓練が難しいことで知られている。近年の研究では、変動値関数を持つミニマックスゲームを通じてebmを訓練する生成的敵ネットワークとの関連が確立されている。本稿では,ebmログライクな双方向バウンドを提案し,低バウンドを最大化し,ミニマックスゲームを解く際の上限を最小化する。我々は、トレーニングを安定させる勾配ペナルティに縛り付けられたペナルティをリンクし、最高のエンジニアリングプラクティスの基盤を提供します。境界を評価するために、ebm生成器のヤコビ決定式の新規かつ効率的な推定器を開発した。これらの開発はトレーニングを著しく安定させ,高品質な密度推定とサンプル生成を実現している。

Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game. We link one bound to a gradient penalty that stabilizes training, thereby providing grounding for best engineering practice. To evaluate the bounds we develop a new and efficient estimator of the Jacobi-determinant of the EBM generator. We demonstrate that these developments significantly stabilize training and yield high-quality density estimation and sample generation.

翻訳日:2021-11-02 22:08:52 公開日:2021-11-01

# (参考訳) 注意機構を用いたNested Multiple Instance Learning

Nested Multiple Instance Learning with Attention Mechanisms ( http://arxiv.org/abs/2111.00947v1 )

ライセンス: CC BY 4.0

Saul Fuster, Trygve Eftestøl, Kjersti Engan

(参考訳) 多重インスタンス学習(MIL)は、未知のラベルを持つデータの複数のインスタンスをバッグに分類する弱い教師付き学習の一種である。個々のインスタンスに関する知識は不完全であるため、ラベルはインスタンスを含むバッグに割り当てられる。この方法はラベル付きデータに適合するが、画像への関心領域の発見や時系列信号の集合におけるイベントの検出など、インスタンスの集合間の関連性が必要な、より複雑なシナリオを解決するための深さが欠けている。 Nested MILは、最外側のバッグだけがラベル付けされ、インナーバッグとインスタンスが潜在ラベルとして表現されるバッグ内のラベル付きバッグについて検討している。さらに,各インスタンスが弱いバッグラベルに与える影響を認識できるように,アテンション機構を用いて解釈可能性を高めることを提案する。古典的画像データセットにおける実験により,提案モデルが画像領域の関連インスタンスの発見だけでなく,高精度な性能を提供することが示された。

Multiple instance learning (MIL) is a type of weakly supervised learning where multiple instances of data with unknown labels are sorted into bags. Since knowledge about the individual instances is incomplete, labels are assigned to the bags containing the instances. While this method fits diverse applications where labelled data is scarce, it lacks depth for solving more complex scenarios where associations between sets of instances have to be made, like finding relevant regions of interest in an image or detecting events in a set of time-series signals. Nested MIL considers labelled bags within bags, where only the outermost bag is labelled and inner-bags and instances are represented as latent labels. In addition, we propose using an attention mechanism to add interpretability, providing awareness of the impact of each instance on the weak bag label. Experiments on classical image datasets show that our proposed model achieves high accuracy and also spots relevant instances in image regions.

翻訳日:2021-11-02 21:49:09 公開日:2021-11-01
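
The nested, attention-weighted pooling described in the abstract above can be sketched roughly as follows: instances are pooled into an inner-bag embedding with softmax attention weights, and the inner-bag embeddings are pooled again into the outer-bag embedding. This is a minimal sketch under stated assumptions: the names `attention_pool` and `nested_pool` are hypothetical, and a single fixed scoring vector stands in for the learned attention network a real model would use.

```python
import math


def attention_pool(instances, score_vector):
    """Attention-weighted average of the embeddings in one bag."""
    # Assumption: attention score = dot product with a fixed vector.
    scores = [sum(w * x for w, x in zip(score_vector, inst)) for inst in instances]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [sum(a * inst[d] for a, inst in zip(weights, instances))
              for d in range(dim)]
    return pooled, weights


def nested_pool(outer_bag, score_vector):
    """Pool each inner bag first, then pool the inner-bag embeddings."""
    inner_embeddings = [attention_pool(inner, score_vector)[0] for inner in outer_bag]
    return attention_pool(inner_embeddings, score_vector)


outer_bag = [
    [[0.1, 0.0], [0.2, 0.1]],  # inner bag 0: low-scoring instances
    [[2.0, 1.0], [0.0, 0.1]],  # inner bag 1: contains a high-scoring instance
]
bag_embedding, inner_weights = nested_pool(outer_bag, score_vector=[1.0, 1.0])
```

The returned `inner_weights` are what would be inspected for interpretability: they indicate how much each inner bag contributed to the outer-bag representation.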

# (参考訳) gtfs2vec -- マイクロリージョンにおける公共交通提供の比較のためのGTFS埋め込みの学習

gtfs2vec -- Learning GTFS Embeddings for comparing Public Transport Offer in Microregions ( http://arxiv.org/abs/2111.00960v1 )

ライセンス: CC BY-SA 4.0

Piotr Gramacki, Szymon Woźniak, Piotr Szymański

(参考訳) 欧州48都市を選定し,公共交通機関の時刻表をgtfs形式で収集した。 UberのH3空間指数を用いて、各都市を六角形に分割した。時刻表データに基づいて、各地域における公共交通機関の可用性の量と多様性を記述する特定の機能を作成しました。次に、各領域を埋め込むための自己連想型ディープニューラルネットワークを訓練した。このような表現を準備した上で,階層的クラスタリングアプローチを用いて類似領域を識別した。そこで我々は,領域間のユークリッド距離を持つ凝集クラスタリングアルゴリズムとウォード法を用いてクラスタ内分散を最小化した。最後に、得られたクラスタを異なるレベルで分析し、公共交通機関の可用性を質的に記述するいくつかのクラスタを特定した。本研究は, 分析都市の特徴と一致し, 公共交通機関のスケジュール特性に類似した地域を検索できることを示した。

We selected 48 European cities and gathered their public transport timetables in the GTFS format. We utilized Uber's H3 spatial index to divide each city into hexagonal micro-regions. Based on the timetable data we created certain features describing the quantity and variety of public transport availability in each region. Next, we trained an auto-associative deep neural network to embed each of the regions. Having such prepared representations, we then used a hierarchical clustering approach to identify similar regions. To do so, we utilized an agglomerative clustering algorithm with a Euclidean distance between regions and Ward's method to minimize in-cluster variance. Finally, we analyzed the obtained clusters at different levels to identify some number of clusters that qualitatively describe public transport availability. We showed that our typology matches the characteristics of analyzed cities and allows successful searching for areas with similar public transport schedule characteristics.

翻訳日:2021-11-02 21:38:11 公開日:2021-11-01
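
The clustering step named in the abstract above (agglomerative clustering with Euclidean distance and Ward's method) can be sketched in a few lines. The version below is a naive illustration, not the authors' pipeline: `ward_clustering` is a hypothetical helper, the five 2D points stand in for learned region embeddings, and at each step it merges the pair of clusters whose union increases the within-cluster sum of squares the least.

```python
def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (Euclidean)."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        dim = len(points[0])
        return [sum(points[m][d] for m in members) / len(members)
                for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy "region embeddings": three regions near the origin, two near (5, 5).
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]]
regions = ward_clustering(embeddings, 2)
```

Stopping the merge loop at different values of `n_clusters` corresponds to reading the hierarchy at different levels, as the abstract describes.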

# (参考訳) 天文学における深層学習アルゴリズムのロバスト性-銀河形態学的研究

Robustness of deep learning algorithms in astronomy -- galaxy morphology studies ( http://arxiv.org/abs/2111.00961v1 )

ライセンス: CC BY 4.0

A. Ćiprijanović, D. Kafkes, G. N. Perdue, K. Pedro, G. Snyder, F. J. Sánchez, S. Madireddy, S. Wild, B. Nord

(参考訳) ディープラーニングモデルは、特に科学データの高次元とボリュームを扱うために、幅広い科学領域で広く採用されている。しかし、これらのモデルは複雑さと過小パラメータ化のために不安定になりがちであり、特に、実際の科学データでよく見られる圧縮やぼやけといった一般的な画像処理によって現れる不注意な逆向きの摂動が原因である。この不安定さを理解し、これらの敵対的摂動に対して堅牢なモデルを開発することが重要である。本研究では、露光時間からの観測ノイズの影響と、LSSTモックデータにおける異なる形態の銀河の識別を訓練したResNet18の性能に対する圧縮や望遠鏡誤差のプロキシとしての1ピクセル攻撃の最悪のシナリオについて検討する。我々はまた、このタイプの自然発生攻撃の場合に、ドメイン適応技術がモデルのロバスト性を改善するのにどのように役立つかを検討し、科学者がより信頼できる安定したモデルを構築するのを助ける。

Deep learning models are being increasingly adopted in a wide array of scientific domains, especially to handle the high dimensionality and volume of scientific data. However, these models tend to be brittle due to their complexity and overparametrization, especially to the inadvertent adversarial perturbations that can appear due to common image processing such as compression or blurring, which are often seen with real scientific data. It is crucial to understand this brittleness and develop models robust to these adversarial perturbations. To this end, we study the effect of observational noise from the exposure time, as well as the worst-case scenario of a one-pixel attack as a proxy for compression or telescope errors, on the performance of ResNet18 trained to distinguish between galaxies of different morphologies in LSST mock data. We also explore how domain adaptation techniques can help improve model robustness in the case of this type of naturally occurring attack and help scientists build more trustworthy and stable models.

翻訳日:2021-11-02 21:27:51 公開日:2021-11-01

# (参考訳) iflow:一様コーダによる効率的なロスレス圧縮のための数値可逆流れ

iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder ( http://arxiv.org/abs/2111.00965v1 )

ライセンス: CC BY 4.0

Shifeng Zhang, Ning Kang, Tom Ryder and Zhenguo Li

(参考訳) 世界は2020年に$59 ZB$ ($5.9 \times 10^{13} GB$) のデータを生産したと推定され、データストレージと送信の両方で莫大なコストがかかった。幸いなことに、ディープジェネレーティブモデルの最近の進歩は、いわゆる「ニューラル圧縮」アルゴリズムの新たなクラスを先導し、圧縮比の点で従来のコーデックを大きく上回っている。残念ながら、ニューラルネットワークの圧縮は、その帯域幅が限られているため、商業的関心をほとんど集めていないため、非常に効率的なフレームワークの開発は、実用上非常に重要である。本稿では,高圧縮比を実現するための大きな能力を示す正規化フローを用いた無損失圧縮について論じる。そこで我々は,効率的なロスレス圧縮を実現する新しい手法iFlowを紹介する。まず, モジュールスケール変換(MST)と, MSTに基づく数値逆流変換の新たなファミリを提案する。次に、iFlowに組み込まれた高速均一分散コーデックであるUniform Base Conversion System (UBCS)を導入し、効率的な圧縮を実現する。 iFlowは最先端圧縮比を達成し、他の高性能スキームよりも$5\times$高速である。さらに,本論文では,フローに基づく幅広いアルゴリズムの符号化時間を高速化する手法を提案する。

It was estimated that the world produced $59 ZB$ ($5.9 \times 10^{13} GB$) of data in 2020, resulting in the enormous costs of both data storage and transmission. Fortunately, recent advances in deep generative models have spearheaded a new class of so-called "neural compression" algorithms, which significantly outperform traditional codecs in terms of compression ratio. Unfortunately, the application of neural compression garners little commercial interest due to its limited bandwidth; therefore, developing highly efficient frameworks is of critical practical importance. In this paper, we discuss lossless compression using normalizing flows which have demonstrated a great capacity for achieving high compression ratios. As such, we introduce iFlow, a new method for achieving efficient lossless compression. We first propose Modular Scale Transform (MST) and a novel family of numerically invertible flow transformations based on MST. Then we introduce the Uniform Base Conversion System (UBCS), a fast uniform-distribution codec incorporated into iFlow, enabling efficient compression. iFlow achieves state-of-the-art compression ratios and is $5\times$ quicker than other high-performance schemes. Furthermore, the techniques presented in this paper can be used to accelerate coding time for a broad class of flow-based algorithms.

翻訳日:2021-11-02 21:18:10 公開日:2021-11-01

# (参考訳) Hex2vec - OpenStreetMapタグでH3ヘキサゴナルを埋め込むコンテキスト認識

Hex2vec -- Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags ( http://arxiv.org/abs/2111.00970v1 )

ライセンス: CC BY-SA 4.0

Szymon Woźniak, Piotr Szymański

(参考訳) 空間的および地理的データの表現学習は、ディープニューラルネットワークを用いた領域間の類似性検出と高品質な推論を可能にする、急速に発展する分野である。しかし過去のアプローチでは、ラスター画像(地図、通り、衛星写真)、移動データ、あるいは道路ネットワークの埋め込みに集中していた。本稿では,マイクログリッドにおける都市機能と土地利用に関して,OpenStreetMap領域のベクトル表現を学習するための最初のアプローチを提案する。土地利用, 建築, 都市域の機能, 水の種類, 緑その他の自然地域の主な特徴に関連するOSMタグのサブセットを同定する。タギングの質を手作業で検証し,訓練対象都市36都市を選定した。 UberのH3インデックスは、都市を六角形に分割するために使われ、OSMタグは六角形ごとに集約された。負サンプリングを用いたスキップグラムモデルに基づくhex2vec法を提案する。結果として得られるベクトル表現は、ベクトルベースの言語モデルに見られるものと同様、地図特性のセマンティック構造を示す。また, ポーランドの6都市における地域類似度検出の知見を提示し, 集積クラスタリングにより得られた地域型について提案する。

Representation learning of spatial and geographic data is a rapidly developing field which allows for similarity detection between areas and high-quality inference using deep neural networks. Past approaches, however, concentrated on embedding raster imagery (maps, street or satellite photos), mobility data, or road networks. In this paper we propose the first approach to learning vector representations of OpenStreetMap regions with respect to urban functions and land-use in a micro-region grid. We identify a subset of OSM tags related to major characteristics of land-use, building and urban region functions, types of water, green or other natural areas. Through manual verification of tagging quality, we selected 36 cities for training region representations. Uber's H3 index was used to divide the cities into hexagons, and OSM tags were aggregated for each hexagon. We propose the hex2vec method based on the Skip-gram model with negative sampling. The resulting vector representations showcase semantic structures of the map characteristics, similar to ones found in vector-based language models. We also present insights from region similarity detection in six Polish cities and propose a region typology obtained through agglomerative clustering.

翻訳日:2021-11-02 20:21:21 公開日:2021-11-01
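
The abstract above builds on the Skip-gram model with negative sampling. As a hedged illustration of that objective only, the sketch below computes the negative-sampling loss for one (center, context) pair of embedding vectors. The function name `sgns_loss` is hypothetical, and reading "center" as a hexagon and "context" as a neighbouring hexagon is an assumption; the abstract does not spell out how training pairs are formed.

```python
import math


def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def log_sigmoid(x):
        # log(sigmoid(x)), written to avoid overflow for large |x|.
        if x >= 0:
            return -math.log1p(math.exp(-x))
        return x - math.log1p(math.exp(x))

    # Pull the true context towards the center ...
    loss = -log_sigmoid(dot(center, context))
    # ... and push each sampled negative away from it.
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss


aligned = sgns_loss([1.0, 0.0], [1.0, 0.0], negatives=[[-1.0, 0.0]])
misaligned = sgns_loss([1.0, 0.0], [-1.0, 0.0], negatives=[[1.0, 0.0]])
```

The loss is lower when the context vector points the same way as the center and the negative sample points away, which is the geometry that lets similar regions end up close together in the embedding space.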

# (参考訳) 転置学習に基づく発音スコアリング手法

A transfer learning based approach for pronunciation scoring ( http://arxiv.org/abs/2111.00976v1 )

ライセンス: CC BY 4.0

Marcelo Sancinetti, Jazmin Vidal, Cyntia Bonomi, Luciana Ferrer

(参考訳) 音声レベルの発音のスコア付けは難しい課題であり、人間の注釈装置とは程遠いパフォーマンスである。標準システムは、ネイティブデータのみを持つ自動音声認識(asr)用に訓練されたモデルを使用して、フレーズ内の各電話機のスコアを生成する。非ネイティブデータを使用してタスクのために特別にトレーニングされたシステムを使用する場合、パフォーマンスが向上している。しかし、このようなシステムは、このタスクのためにラベル付けされたデータセットが少なく、通常は小さいという課題に直面している。本稿では,asrに訓練されたモデルを活用して,発音スコアリングのタスクに適応するトランスファー学習に基づくアプローチを提案する。本稿では,いくつかの設計選択の効果を分析し,その性能をGOPシステムと比較する。最終システムは,不必要な修正率の低減を優先するコスト関数として,評価研究のためのデータベースであるEpaDBのGOPシステムよりも20%優れている。

Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the challenge that datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. We analyze the effect of several design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final system is 20% better than the GOP system on EpaDB, a database for pronunciation scoring research, for a cost function that prioritizes low rates of unnecessary corrections.

翻訳日:2021-11-02 20:08:05 公開日:2021-11-01

# (参考訳) 部分適応部分モジュラー最大化

Partial-Adaptive Submodular Maximization ( http://arxiv.org/abs/2111.00986v1 )

ライセンス: CC BY 4.0

Shaojie Tang and Jing Yuan

(参考訳) 典型的な適応的逐次決定問題の目標は、いくつかの部分的な観測に基づいて項目群を逐次選択する対話的ポリシーを設計し、期待される有用性を最大化することである。プール型能動学習や適応的影響最大化を含む実世界の多くのアプリケーションの有用性は適応的部分モジュラリティの性質を満たすことが示されている。しかしながら、適応部分モジュラー最大化に関する既存の研究のほとんどは、完全な適応設定に焦点を当てており、次の選択を行う前に、次の選択からフィードバックを待つ必要がある。このアプローチは過去のフィードバックを最大限に活用して情報的決定を行うことができるが、すべての選択が事前に行われる非適応的なソリューションと比較して、選択プロセスを完成させるのに長い時間がかかるかもしれない。本稿では,バッチ内で複数の選択を同時に行い,それらの実現を同時に観察できる部分適応部分モジュラー最大化の問題について検討する。我々のアプローチは、過去の選択から観察を待つ時間を削減するとともに、適応性の利点を享受します。最善の知識では、非単調適応部分モジュラー最大化問題に対する部分適応ポリシーは知られていない。我々はこの問題を,濃度制約とknapsack制約の両方の下で検討し,どちらの場合においても効果的かつ効率的な解法を開発する。我々はまた、バッチクエリの複雑さ、すなわち、ポリシーが選択プロセスの完了に要するバッチの数を、いくつかの仮定の下で分析する。

The goal of a typical adaptive sequential decision making problem is to design an interactive policy that selects a group of items sequentially, based on some partial observations, to maximize the expected utility. It has been shown that the utility functions of many real-world applications, including pool-based active learning and adaptive influence maximization, satisfy the property of adaptive submodularity. However, most existing studies on adaptive submodular maximization focus on the fully adaptive setting, i.e., one must wait for the feedback from \emph{all} past selections before making the next selection. Although this approach can take full advantage of feedback from the past to make informed decisions, it may take longer to complete the selection process than the non-adaptive solution, where all selections are made in advance before any observations take place. In this paper, we explore the problem of partial-adaptive submodular maximization, where one is allowed to make multiple selections in a batch simultaneously and observe their realizations together. Our approach enjoys the benefits of adaptivity while reducing the time spent waiting for the observations from past selections. To the best of our knowledge, no results are known for partial-adaptive policies for the non-monotone adaptive submodular maximization problem. We study this problem under both cardinality and knapsack constraints, and develop effective and efficient solutions for both cases. We also analyze the batch query complexity of our policy, i.e., the number of batches a policy takes to complete the selection process, under some additional assumptions.

翻訳日:2021-11-02 19:54:29 公開日:2021-11-01

# (参考訳) 手話理解のための手話理解モデル--ナイジェリア手話言語を事例として

Sign-to-Speech Model for Sign Language Understanding: A Case Study of Nigerian Sign Language ( http://arxiv.org/abs/2111.00995v1 )

ライセンス: CC BY 4.0

Steven Kolawole, Opeyemi Osakuade, Nayan Saxena, Babatunde Kazeem Olorisade

(参考訳) 本稿では,ナイジェリアを事例として,アフリカのサハラ以南地域において,手話に精通していない一般社会と難聴者のコミュニケーション障壁を低減し,難聴症例が最も多い地域社会のコミュニケーション障壁を緩和することを目的とした。このデータセットはナイジェリア手話言語の先駆的なデータセットであり、関連する利害関係者と共同で作成された。 2つの異なるオブジェクト検出モデルと分類モデルに対する準備状態のデータを前処理し,手話からテキストへの変換タスクにおけるモデル性能を測定するために多様な評価指標を用いた。最後に、予測した手話テキストを音声に変換し、リアルタイムに動作し、手話/フレーズをテキストに変換し、次に音声に変換する印象的な結果を達成する軽量アプリケーションにおいて、最高のパフォーマンスモデルを展開する。

Through this paper, we seek to reduce the communication barrier between the hearing-impaired community and the larger society who are usually not familiar with sign language in the sub-Saharan region of Africa with the largest occurrences of hearing disability cases, while using Nigeria as a case study. The dataset is a pioneer dataset for the Nigerian Sign Language and was created in collaboration with relevant stakeholders. We pre-processed the data in readiness for two different object detection models and a classification model and employed diverse evaluation metrics to gauge model performance on sign-language to text conversion tasks. Finally, we convert the predicted sign texts to speech and deploy the best performing model in a lightweight application that works in real-time and achieves impressive results converting sign words/phrases to text and subsequently, into speech.

翻訳日:2021-11-02 19:31:34 公開日:2021-11-01

# (参考訳) 時間的文脈から少し助けを借りて:マルチモーダル・エゴセントリックなアクション認識

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition ( http://arxiv.org/abs/2111.01024v1 )

ライセンス: CC BY 4.0

Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen

(参考訳) エゴセントリックなビデオでは、アクションは素早く続く。行動の時間的文脈を活かし、認識性能を向上させるために周囲の行動に出席することを学ぶ手法を提案する。時間的文脈を組み込むために,映像や音声を入力モダリティとして取り入れるトランスフォーマーに基づくマルチモーダルモデルを提案し,その予測を強化するためにアクションシーケンスコンテキストを提供する明示的な言語モデルを提案する。我々は,EPIC-KITCHENSとEGTEAデータセットを用いて,最先端の性能を報告する。音声入力のモダリティと言語モデルを用いて予測を再スコア化することで,時間的文脈の活用のメリットを実証する。コードとモデル:https://github.com/ekazakos/MTCN。

In egocentric videos, actions occur in quick succession. We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance. To incorporate the temporal context, we propose a transformer-based multimodal model that ingests video and audio as input modalities, with an explicit language model providing action sequence context to enhance the predictions. We test our approach on EPIC-KITCHENS and EGTEA datasets reporting state-of-the-art performance. Our ablations showcase the advantage of utilising temporal context as well as incorporating audio input modality and language model to rescore predictions. Code and models at: https://github.com/ekazakos/MTCN.

翻訳日:2021-11-02 19:25:34 公開日:2021-11-01

# (参考訳) 材料科学・化学のための解釈・説明可能な機械学習

Interpretable and Explainable Machine Learning for Materials Science and Chemistry ( http://arxiv.org/abs/2111.01037v1 )

ライセンス: CC BY 4.0

Felipe Oviedo, Juan Lavista Ferres, Tonio Buonassisi, Keith Butler

(参考訳) 材料科学におけるデータ駆動アプローチの普及は、科学的発見を成功させるための機械学習モデルの真の可能性を実現するための、エキサイティングな初期段階にあるが、それらは純粋に予測能力を超えた性質を持つ必要がある。モデルの予測と内部動作は、人間の専門家によるある程度の説明可能性を提供し、潜在的なモデル問題や制限の特定を可能にし、モデル予測への信頼を築き、科学的洞察につながる予期せぬ相関を明らかにするべきである。本稿では,材料科学・化学における解釈可能性・説明可能性技術の応用を概説し,これらの技術が科学研究の成果をどう改善するかを論じる。

While the uptake of data-driven approaches for materials science is at an exciting, early stage, to realise the true potential of machine learning models for successful scientific discovery, they must have qualities beyond purely predictive power. The predictions and inner workings of models should provide a certain degree of explainability by human experts, permitting the identification of potential model issues or limitations, building trust on model predictions and unveiling unexpected correlations that may lead to scientific insights. In this work, we summarize applications of interpretability and explainability techniques for materials science and chemistry and discuss how these techniques can improve the outcome of scientific studies.

翻訳日:2021-11-02 19:02:58 公開日:2021-11-01

# (参考訳) カオス力学系における同化の学習

Learning to Assimilate in Chaotic Dynamical Systems ( http://arxiv.org/abs/2111.01058v1 )

ライセンス: CC BY 4.0

Michael McCabe and Jed Brown

(参考訳) カオスシステムにおけるシミュレーションに基づく予測の精度は、予測の初期化時のシステム状態の高品質な推定に大きく依存する。データ同化法は、雑音、不完全観測、システム力学の数値モデルを体系的に組み合わせて、これらの初期条件を推測するために用いられる。我々は, 基底真理データを必要とせず, 雑音の連続した観測から力学系を同化する学習フレームワークであるアモータライズド・アシミレーションを導入する。我々は,自己教師付き記述から,微分可能なシミュレーションを用いて動的システム設定へ強力な結果を拡張することにより,フレームワークのモチベーションを高める。複数のベンチマークシステムにまたがる実験結果から,広く利用されているデータ同化手法に対するアプローチの有効性が示唆された。

The accuracy of simulation-based forecasting in chaotic systems is heavily dependent on high-quality estimates of the system state at the time the forecast is initialized. Data assimilation methods are used to infer these initial conditions by systematically combining noisy, incomplete observations and numerical models of system dynamics to produce effective estimation schemes. We introduce amortized assimilation, a framework for learning to assimilate in dynamical systems from sequences of noisy observations with no need for ground truth data. We motivate the framework by extending powerful results from self-supervised denoising to the dynamical systems setting through the use of differentiable simulation. Experimental results across several benchmark systems highlight the improved effectiveness of our approach over widely-used data assimilation methods.

翻訳日:2021-11-02 18:42:14 公開日:2021-11-01
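
For readers unfamiliar with data assimilation, the idea in the abstract above of combining a model forecast with a noisy observation can be illustrated by the classical scalar Kalman analysis step. This is a textbook baseline, not the amortized assimilation method the paper introduces; the function name `analysis_step` and the numbers are assumptions chosen for illustration.

```python
def analysis_step(forecast, forecast_var, observation, obs_var):
    """Blend a model forecast with a noisy observation (scalar Kalman update)."""
    # The gain weighs the observation by how uncertain the forecast is.
    gain = forecast_var / (forecast_var + obs_var)
    state = forecast + gain * (observation - forecast)
    variance = (1.0 - gain) * forecast_var
    return state, variance


# Equal uncertainty in forecast and observation: the estimate lands halfway.
state, variance = analysis_step(forecast=2.0, forecast_var=1.0,
                                observation=4.0, obs_var=1.0)
```

The updated state would then serve as the initial condition for the next forecast, which is why its quality matters so much in chaotic systems.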

# (参考訳) ZeBRA:ゼロデータに基づく繰り返しビットフリップ攻撃によるニューラルネットワークの高精度破壊

ZeBRA: Precisely Destroying Neural Networks with Zero-Data Based Repeated Bit Flip Attack ( http://arxiv.org/abs/2111.01080v1 )

ライセンス: CC BY 4.0

Dahoon Park, Kon-Woo Kwon, Sunghoon Im, Jaeha Kung

(参考訳) 本稿では,自己攻撃データセットを合成することにより,ディープニューラルネットワーク(dnn)を高精度に破壊するゼロデータ型反復ビットフリップ攻撃(zebra)を提案する。対向重み攻撃に関する多くの先行研究は、重みパラメータだけでなく、攻撃対象の脆弱なビットを探索する際のトレーニングやテストデータセットも必要である。本研究では,被害者のdnnモデルにおけるバッチ正規化層の統計を利用して,蒸留対象データと呼ばれる攻撃データセットを合成する。蒸留したターゲットデータを備えたzebraアルゴリズムは、トレーニングやテストデータセットにアクセスせずに、モデルの脆弱なビットを検索できる。そこで本手法は,DNNの安全のために,敵の重み付け攻撃をより致命的なものにする。実験の結果,従来の攻撃法と比較して,DNNの破壊に要するビットフリップ数は平均で2.0x (CIFAR-10) と1.6x (ImageNet) が少ないことがわかった。コードは https://github.com/pdh930105/ZeBRA で入手できる。

In this paper, we present Zero-data Based Repeated bit flip Attack (ZeBRA) that precisely destroys deep neural networks (DNNs) by synthesizing its own attack datasets. Many prior works on adversarial weight attack require not only the weight parameters, but also the training or test dataset in searching vulnerable bits to be attacked. We propose to synthesize the attack dataset, named distilled target data, by utilizing the statistics of batch normalization layers in the victim DNN model. Equipped with the distilled target data, our ZeBRA algorithm can search vulnerable bits in the model without accessing training or test dataset. Thus, our approach makes the adversarial weight attack more fatal to the security of DNNs. Our experimental results show that, on average, 2.0x (CIFAR-10) and 1.6x (ImageNet) fewer bit flips are required to destroy DNNs compared to the previous attack method. Our code is available at https://github.com/pdh930105/ZeBRA.

翻訳日:2021-11-02 18:19:11 公開日:2021-11-01
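
To make the notion of a bit flip in the abstract above concrete, the sketch below shows what flipping a single bit does to one quantized weight. It assumes signed 8-bit two's-complement weights, which the abstract does not state, and the helpers `flip_bit` and `most_damaging_bit` are hypothetical; ZeBRA's actual search for vulnerable bits (guided by the distilled target data) is not reproduced here.

```python
def flip_bit(weight, bit):
    """Flip one bit of a signed 8-bit (two's complement) weight."""
    if not -128 <= weight <= 127 or not 0 <= bit <= 7:
        raise ValueError("expected an int8 weight and a bit index in 0..7")
    raw = (weight & 0xFF) ^ (1 << bit)       # view as unsigned byte, flip the bit
    return raw - 256 if raw >= 128 else raw  # convert back to signed


def most_damaging_bit(weight):
    """Bit whose flip changes the weight's value the most (by magnitude only)."""
    return max(range(8), key=lambda b: abs(flip_bit(weight, b) - weight))


small_weight = 5
after_lsb_flip = flip_bit(small_weight, 0)   # lowest bit: value changes by 1
after_msb_flip = flip_bit(small_weight, 7)   # sign bit: value changes by 128
```

The asymmetry between the two flips is why such attacks concentrate on a small number of high-order bits rather than perturbing many weights slightly.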

# マルチメディア推薦のためのコントラストモダリティ融合による潜在構造マイニング

Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation ( http://arxiv.org/abs/2111.00678v1 )

ライセンス: Link先を確認

Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Mengqi Zhang, Shu Wu, Liang Wang

(参考訳) 近年,マルチメディアリコメンデーションへの関心が高まっている。マルチモーダルコンテンツを用いたアイテムの対話性を予測することを目的としている。これまでの研究は、サイド情報を含むマルチモーダル機能によるユーザ・テーマインタラクションのモデリングに焦点を当てている。しかし、この方式はマルチメディアレコメンデーションには適していない。まず、協調的なアイテム-アイテム間の関係のみが、高次アイテム-ユーザ-アイテム間の共起によって暗黙的にモデル化される。これらのマルチモーダルコンテンツに基づく潜在的セマンティック・アイテム・イテム構造は、より優れたアイテム表現を学習し、候補項目を包括的に発見するための推奨モデルを支援するのに有用である。第2に, 細粒度マルチモーダル核融合を無視する先行研究である。複数モードにアクセスできることで、豊富な情報を取得することができるが、線形結合や過去の作業における連結による単純な粗粒融合は、内容情報や項目の関係を十分に理解するには不十分である、と我々は論じ、このために、contRastive mOdality fusion method (MICRO) を用いた潜伏構造を提案する。具体化するために,各モダリティの項目間関係を学習する新しいモダリティ対応構造学習モジュールを考案した。学習したモダリティ対応アイテムの関係に基づき、モダリティ対応アイテム表現にアイテム親和性を明示的に注入するグラフ畳み込みを行う。そして,マルチモーダルな特徴を融合する新しいコントラスト手法を設計する。これらの強化された項目表現は、より正確な推奨を行うために既存の協調フィルタリングメソッドにプラグインすることができる。実世界のデータセットに関する広範な実験は、最先端のベースラインよりも優れた方法を示している。

Recent years have witnessed growing interest in multimedia recommendation, which aims to predict whether a user will interact with an item with multimodal contents. Previous studies focus on modeling user-item interactions with multimodal features included as side information. However, this scheme is not well-designed for multimedia recommendation. Firstly, only collaborative item-item relationships are implicitly modeled through high-order item-user-item co-occurrences. We argue that the latent semantic item-item structures underlying these multimodal contents could be beneficial for learning better item representations and assist the recommender models to comprehensively discover candidate items. Secondly, previous studies disregard fine-grained multimodal fusion. Although having access to multiple modalities might allow us to capture rich information, we argue that the simple coarse-grained fusion by linear combination or concatenation in previous work is insufficient to fully understand content information and item relationships. To this end, we propose a latent structure MIning with ContRastive mOdality fusion method (MICRO for brevity). To be specific, we devise a novel modality-aware structure learning module, which learns item-item relationships for each modality. Based on the learned modality-aware latent item relationships, we perform graph convolutions that explicitly inject item affinities to modality-aware item representations. Then, we design a novel contrastive method to fuse multimodal features. These enriched item representations can be plugged into existing collaborative filtering methods to make more accurate recommendations. Extensive experiments on real-world datasets demonstrate the superiority of our method over state-of-the-art baselines.

翻訳日:2021-11-02 18:01:58 公開日:2021-11-01

# 階層型情報構造を用いた分散協調強化学習

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure ( http://arxiv.org/abs/2111.00781v1 )

ライセンス: Link先を確認

Hsu Kao, Chen-Yu Wei, Vijay Subramanian

(参考訳) 情報非対称性のため,マルチエージェント強化学習(MARL)の問題は困難である。この課題を克服するために、既存の手法では高いレベルの調整やエージェント間のコミュニケーションを必要とすることが多い。我々は、アプリケーションに生じる階層的な情報構造を持つ2エージェントマルチアームバンド(MAB)とマルコフ決定プロセス(MDP)について検討し、協調や通信を必要としないよりシンプルで効率的なアルゴリズムを提案する。構造では、各ステップで "leader" がまず彼女のアクションを選択し、その後 "follower" がリーダーのアクションを観察した後、彼のアクションを決定する。 2つのエージェントは、共同アクションに依存する同じ報酬(およびMDP設定における同じ状態遷移)を観察します。バンドイット設定には,$\widetilde{\mathcal{O}}(\sqrt{ABT})$ と$\mathcal{O}(\log(T))$ の近似的ギャップ非依存的後悔と,$A$ と $B$ がそれぞれリーダーとフォロワーのアクション数であり,$T$ がステップ数であるような階層的バンドイットアルゴリズムを提案する。我々はさらに,複数のフォロワと深い階層を持つケースにまで拡張し,それぞれが最適に近い後悔の限界を得る。 MDP の設定では、$\widetilde{\mathcal{O}}(\sqrt{H^7S^2ABT})$ regret、ただし$H$ は1エピソードあたりのステップ数、$S$ はステート数、$T$ はエピソード数である。これは、$A, B$、および$T$という観点で既存の下限と一致する。

Multi-agent reinforcement learning (MARL) problems are challenging due to information asymmetry. To overcome this challenge, existing methods often require a high level of coordination or communication between the agents. We consider two-agent multi-armed bandits (MABs) and Markov decision processes (MDPs) with a hierarchical information structure arising in applications, which we exploit to propose simpler and more efficient algorithms that require no coordination or communication. In the structure, in each step the "leader" chooses her action first, and then the "follower" decides his action after observing the leader's action. The two agents observe the same reward (and the same state transition in the MDP setting) that depends on their joint action. For the bandit setting, we propose a hierarchical bandit algorithm that achieves a near-optimal gap-independent regret of $\widetilde{\mathcal{O}}(\sqrt{ABT})$ and a near-optimal gap-dependent regret of $\mathcal{O}(\log(T))$, where $A$ and $B$ are the numbers of actions of the leader and the follower, respectively, and $T$ is the number of steps. We further extend to the case of multiple followers and the case with a deep hierarchy, obtaining near-optimal regret bounds in both. For the MDP setting, we obtain $\widetilde{\mathcal{O}}(\sqrt{H^7S^2ABT})$ regret, where $H$ is the number of steps per episode, $S$ is the number of states, and $T$ is the number of episodes. This matches the existing lower bound in terms of $A, B$, and $T$.

翻訳日:2021-11-02 18:01:33 公開日:2021-11-01
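
The leader-follower information structure in the abstract above can be illustrated with a naive two-level bandit: the leader runs UCB1 over her actions, and the follower, who sees the leader's choice, runs a separate UCB1 for each leader action. This is a simplified sketch and not the paper's algorithm, so none of the regret bounds quoted above are claimed for it; `ucb_pick`, `run_leader_follower`, and the deterministic reward table are assumptions made for illustration.

```python
import math


def ucb_pick(counts, sums, t):
    """Return the arm with the largest UCB1 index; untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a]
               + math.sqrt(2 * math.log(t) / counts[a]))


def run_leader_follower(reward, steps):
    """Leader picks first; follower observes her action, then picks his own."""
    n_a, n_b = len(reward), len(reward[0])
    lead_n, lead_s = [0] * n_a, [0.0] * n_a
    fol_n = [[0] * n_b for _ in range(n_a)]
    fol_s = [[0.0] * n_b for _ in range(n_a)]
    pulls = [[0] * n_b for _ in range(n_a)]
    for t in range(1, steps + 1):
        a = ucb_pick(lead_n, lead_s, t)
        # The follower keeps one bandit per leader action, with its own clock.
        b = ucb_pick(fol_n[a], fol_s[a], lead_n[a] + 1)
        r = reward[a][b]  # both agents observe the same reward
        lead_n[a] += 1
        lead_s[a] += r
        fol_n[a][b] += 1
        fol_s[a][b] += r
        pulls[a][b] += 1
    return pulls


# Deterministic toy rewards: the best joint action is leader=1, follower=0.
reward_table = [[0.1, 0.2],
                [0.9, 0.3]]
pulls = run_leader_follower(reward_table, steps=2000)
```

No messages are exchanged between the two learners; the only coupling is that the follower conditions on the leader's observed action and both see the shared reward.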

# 画像としてのプログラムの符号化:ソースコードの視覚的表現の評価

Encoding Program as Image: Evaluating Visual Representation of Source Code ( http://arxiv.org/abs/2111.01097v1 )

ライセンス: Link先を確認

Md Rafiqul Islam Rabin, Mohammad Amin Alipour

(参考訳) ニューラルネットワークの入力ベクトルにソースコードをエンコードするいくつかのアプローチがある。これらのアプローチは、入力プログラムの様々な構文的特徴と意味的特徴をエンコーディングに含めようとしている。本稿では,入力プログラムのスナップショットに基づくソースコードの新しい表現であるCode2Snapshotについて検討する。この表現のいくつかのバリエーションを評価し、その性能を入力プログラムの豊かな構文的特徴と意味的特徴を利用した最先端表現と比較する。コード要約タスクにおけるCode2Snapshotの実用性に関する予備的な研究は、入力プログラムの単純なスナップショットが最先端表現に匹敵する性能を持つことを示唆している。興味深いことに、入力プログラムを難読化してもCode2Snapshotの性能への影響はわずかであり、このことは、いくつかのタスクでは、ニューラルネットワークが入力プログラムの構造のみに依存することで高いパフォーマンスを提供する可能性があることを示唆している。

There are several approaches to encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs. We evaluate several variations of this representation and compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization task suggests that simple snapshots of input programs have comparable performance to the state-of-the-art representations. Interestingly, obscuring the input programs has an insignificant impact on Code2Snapshot's performance, suggesting that, for some tasks, neural models may provide high performance by relying merely on the structure of input programs.

翻訳日:2021-11-02 18:00:38 公開日:2021-11-01

# (参考訳) FaceScape: 単一視点3次元顔再構成のための3次元顔データセットとベンチマーク

FaceScape: 3D Facial Dataset and Benchmark for Single-View 3D Face Reconstruction ( http://arxiv.org/abs/2111.01082v1 )

ライセンス: CC BY 4.0

Hao Zhu, Haotian Yang, Longwei Guo, Yidi Zhang, Yanru Wang, Mingkai Huang, Qiu Shen, Ruigang Yang, Xun Cao

(参考訳) 本稿では,大規模で詳細な3次元顔データセットであるFaceScapeと,単一視点からの3次元顔再構成を評価するための対応するベンチマークについて述べる。 FaceScapeデータで学習することにより、単一の画像入力から精巧でリグ可能な3次元顔モデルを予測する新しいアルゴリズムを提案する。 FaceScapeデータセットは、938人の被験者からそれぞれ20種類の表情で撮影された18,760のテクスチャ付き3D顔を提供する。 3Dモデルは細孔レベルの顔形状を含んでおり、位相的に均一になるように処理されている。これらの微細な3次元顔モデルは、粗い形状のための3次元形態モデルと、詳細な幾何学のための変位マップとして表現することができる。大規模かつ高精度なデータセットを活用して、深層ニューラルネットワークを用いて表情固有の動的詳細を学習する新しいアルゴリズムをさらに提案する。学習された関係は、単一の画像入力から3次元顔を予測するシステムの基礎となる。従来の方法とは異なり、予測した3Dモデルはリグ可能であり、異なる表情の下で高度に詳細な幾何学を備えている。また、FaceScapeデータを用いてin-the-wildおよびin-the-labのベンチマークを生成し、最新の単一視点顔再構成手法を評価する。精度はカメラのポーズと焦点距離の次元で報告・分析され、忠実で包括的な評価が得られるとともに、新たな課題が明らかになる。前例のないデータセット、ベンチマーク、コードは、研究目的で一般に公開されている。

In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and the corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. The FaceScape dataset provides 18,760 textured 3D faces, captured from 938 subjects, each with 20 specific expressions. The 3D models contain pore-level facial geometry that is also processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for rough shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn the expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Unlike previous methods, our predicted 3D models are riggable, with highly detailed geometry under different expressions. We also use FaceScape data to generate in-the-wild and in-the-lab benchmarks to evaluate recent methods of single-view face reconstruction. The accuracy is reported and analyzed along the dimensions of camera pose and focal length, which provides a faithful and comprehensive evaluation and reveals new challenges. The unprecedented dataset, benchmark, and code have been released to the public for research purposes.

翻訳日:2021-11-02 17:57:24 公開日:2021-11-01

# To Talk or To Work: モバイルエッジデバイス上での遅延効率の良いフェデレーション学習

To Talk or to Work: Delay Efficient Federated Learning over Mobile Edge Devices ( http://arxiv.org/abs/2111.00637v1 )

ライセンス: Link先を確認

Pavana Prakash, Jiahao Ding, Maoqiang Wu, Minglei Shu, Rong Yu, and Miao Pan

(参考訳) 新たな分散機械学習パラダイムであるフェデレーション・ラーニング(FL)は、エッジコンピューティングと融合し、モバイルエッジデバイス上で新たなアプリケーションを持つ有望な分野である。 FLでは、モバイルデバイスは、モデル更新だけを共有することで、中央サーバの調整の下で、自身のデータに基づいてモデルをトレーニングするため、トレーニングデータをプライベートに保持する。しかし、データの中心的な可用性がなければ、計算ノードは収束を達成するためにしばしばモデル更新を伝える必要がある。したがって、ローカルモデル更新を作成するためのローカルな計算時間と、それらをサーバに送受信するのに要する時間とが、全体の時間を遅らせることになる。さらに、信頼性の低いネットワーク接続は、これらの更新の効率的な通信を妨げる可能性がある。そこで本稿では,モデルが収束するために必要な全体の時間(計算遅延と通信遅延の両方からなる)と通信ラウンド数を削減する、遅延効率の良いFL機構を提案する。遅延に寄与する様々なパラメータの影響を探求し,無線通信(話す)と局所計算(働く)のトレードオフのバランスを図る。全体の時間との関係を最適化問題として定式化し,広範なシミュレーションによりアプローチの有効性を示す。

Federated learning (FL), an emerging distributed machine learning paradigm, in confluence with edge computing is a promising area with novel applications over mobile edge devices. In FL, since mobile devices collaborate to train a model based on their own data under the coordination of a central server by sharing just the model updates, training data is kept private. However, without the central availability of data, computing nodes need to communicate the model updates often to attain convergence. Hence, the local computation time to create local model updates, along with the time taken for transmitting them to and from the server, results in a delay in the overall time. Furthermore, unreliable network connections may obstruct efficient communication of these updates. To address these issues, in this paper we propose a delay-efficient FL mechanism that reduces the overall time (consisting of both the computation and communication latencies) and the communication rounds required for the model to converge. Exploring the impact of various parameters contributing to delay, we seek to balance the trade-off between wireless communication (to talk) and local computation (to work). We formulate a relation with overall time as an optimization problem and demonstrate the efficacy of our approach through extensive simulations.

翻訳日:2021-11-02 17:30:07 公開日:2021-11-01

# GCNear: ニアメモリ処理による効率的なGCNトレーニングのためのハイブリッドアーキテクチャ

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing ( http://arxiv.org/abs/2111.00680v1 )

ライセンス: Link先を確認

Zhe Zhou and Cong Li and Xuechao Wei and Guangyu Sun

(参考訳) 近年,グラフ畳み込みネットワーク (GCN) は非ユークリッドグラフデータを解析するための最先端のアルゴリズムとなっている。しかし、特に大きなグラフで効率的なGCNトレーニングを実現することは困難である。理由は多岐にわたる。 1)GCNトレーニングは、かなりのメモリフットプリントを発生させる。大規模なグラフ上のフルバッチトレーニングは、バックプロパゲーションのために中間データをバッファするために数百から数千ギガバイトのメモリを必要とする。 2)GCNトレーニングには、メモリ集約的なデータ削減と、計算集約的な特徴量/勾配の更新操作の両方が含まれる。このような異質な性質は、現在のCPU/GPUプラットフォームに挑戦する。 3) グラフの不規則性と複雑なトレーニングデータフローは,GCN訓練システムの効率向上の難しさを共に増大させる。本稿では,これらの課題に対処するためのハイブリッドアーキテクチャであるGCNearを提案する。具体的には、GCNearはDIMMベースのメモリシステムを採用し、容易にスケールできるメモリ容量を提供する。ヘテロジニアスの性質に合わせて、GCNトレーニング操作をメモリ集約的なReduce操作と計算集約的なUpdate操作に分類する。次に、高い集約ローカル帯域幅をフル活用して、Reduce操作をオンDIMMのNMEにオフロードする。Update操作の処理には、十分な計算能力を持つCAEを採用している。さらに,GCNタスクの不規則性に対処し,GCNearの性能を改善するための最適化手法を提案する。また,GCNearのスケーラビリティを評価するためのMulti-GCNearシステムを提案する。

Recently, Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data. However, it is challenging to realize efficient GCN training, especially on large graphs. The reasons are manifold: 1) GCN training incurs a substantial memory footprint. Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data reduction and computation-intensive features/gradients update operations. Such a heterogeneous nature challenges current CPU/GPU platforms. 3) The irregularity of graphs and the complex training dataflow jointly increase the difficulty of improving a GCN training system's efficiency. This paper presents GCNear, a hybrid architecture to tackle these challenges. Specifically, GCNear adopts a DIMM-based memory system to provide easy-to-scale memory capacity. To match the heterogeneous nature, we categorize GCN training operations as memory-intensive Reduce and computation-intensive Update operations. We then offload Reduce operations to on-DIMM NMEs, making full use of the high aggregated local bandwidth. We adopt a CAE with sufficient computation capacity to process Update operations. We further propose several optimization strategies to deal with the irregularity of GCN tasks and improve GCNear's performance. We also propose a Multi-GCNear system to evaluate the scalability of GCNear.
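To make the Reduce/Update split in this abstract concrete, here is a minimal NumPy sketch of one GCN layer in which neighbor aggregation (irregular, memory-bound) and the dense feature transform (regular, compute-bound) are separate functions. The function names, the mean aggregation rule, and the toy graph are assumptions for illustration; they are not taken from GCNear, and real GCNs often use a symmetric degree normalization instead of a plain mean.

```python
import numpy as np


def reduce_step(neighbors, features):
    """Memory-bound phase: gather and average neighbor features (irregular access)."""
    out = np.zeros_like(features, dtype=float)
    for node, nbrs in enumerate(neighbors):
        out[node] = features[nbrs].mean(axis=0)
    return out


def update_step(aggregated, weight):
    """Compute-bound phase: dense matrix multiply followed by ReLU."""
    return np.maximum(aggregated @ weight, 0.0)


def gcn_layer(neighbors, features, weight):
    return update_step(reduce_step(neighbors, features), weight)


# Toy path graph 0 - 1 - 2 with self-loops; values are illustrative only.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weight = np.eye(2)
hidden = gcn_layer(neighbors, features, weight)
```

Keeping the two phases apart mirrors the idea of sending each one to the hardware that suits it, whatever that hardware is.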

翻訳日:2021-11-02 17:29:47 公開日:2021-11-01

# 公正なオンラインランキング学習のための探索と活用のトレードオフの調整

Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank ( http://arxiv.org/abs/2111.00735v1 )

ライセンス: Link先を確認

Yiling Jia, Hongning Wang

(参考訳) オンライン・ラーニング・トゥ・ランク(OL2R)は、オフラインの教師付きランキング・モデル学習に必要な高価な関連性ラベリングを避けられるという利点により、近年大きな研究関心を集めている。そのような解は未知数を探索し(例えば、故意に選択された結果をトップの位置に提示する)、関連性の推定を改善する。しかしこれは、OL2Rの期間中に異なるグループのアイテムが異なる扱いを受ける可能性があるという、ランキングの公平性に関する懸念を引き起こす。しかし、既存の公正ランキングソリューションは、通常、結果の関連性に関する知識や、十分な性能を持つランカーを事前に必要とし、これはOL2Rの設定と矛盾するため、公正性を保証するために直接適用することはできない。本稿では,OL2Rにおいてグループ露出によって定義される公平性を達成するための汎用フレームワークを提案する。鍵となる考え方は、公正性制御、関連性学習、オンラインランキング品質のために探索と活用を調整することである。特に、モデルが関連性フィードバックのために結果の集合を探索する場合、ランダムな置換のサブセット内に探索を限定し、フィードバックが不偏であるままグループ間の公平性が維持されるようにする。理論的には、このような戦略が、公平性を得るためにOL2Rの後悔にもたらす歪みは最小であることを証明する。既存の公正OL2Rソリューションと比較して提案ソリューションの有効性を示すために,2つの公開ランキング学習ベンチマークデータセットにおいて,大規模な実証分析を行う。

Online learning to rank (OL2R) has attracted great research interest in recent years, thanks to its advantages in avoiding the expensive relevance labeling required in offline supervised ranking model learning. Such a solution explores the unknowns (e.g., intentionally presenting selected results in top positions) to improve its relevance estimation. This, however, triggers concerns about its ranking fairness: different groups of items might receive differential treatment during the course of OL2R. But existing fair ranking solutions usually require knowledge of result relevance or a well-performing ranker beforehand, which contradicts the setting of OL2R, and thus they cannot be directly applied to guarantee fairness. In this work, we propose a general framework to achieve fairness defined by group exposure in OL2R. The key idea is to calibrate exploration and exploitation for fairness control, relevance learning and online ranking quality. In particular, when the model is exploring a set of results for relevance feedback, we confine the exploration within a subset of random permutations, where fairness across groups is maintained while the feedback is still unbiased. Theoretically, we prove that such a strategy introduces minimum distortion in OL2R's regret to obtain fairness. Extensive empirical analysis is performed on two public learning-to-rank benchmark datasets to demonstrate the effectiveness of the proposed solution compared to existing fair OL2R solutions.

翻訳日:2021-11-02 17:29:29 公開日:2021-11-01

# 無差別な中毒攻撃はショートカットである

Indiscriminate Poisoning Attacks Are Shortcuts ( http://arxiv.org/abs/2111.00898v1 )

ライセンス: Link先を確認

Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

(参考訳) トレーニングデータに知覚不可能な摂動を加えてトレーニングモデルのテストエラーを最大化する無差別なデータ中毒攻撃は、不正なデータの使用を防止できると考えられるため、トレンドとなっている。本研究では,これらの摂動が原理的に働く理由について考察する。高度な中毒攻撃の摂動は、対応するサンプルの標的ラベルを割り当てた場合にはほぼ線形分離可能であり、そのため学習目的に対するショートカットとして機能することがわかった。この重要な集団的性質はこれまで明らかにされていなかった。さらに,線形分離性がまさに中毒攻撃の原動力であることの検証も行う。線形分離可能なデータを摂動として合成し、そのような合成摂動が故意に作られた攻撃と同じくらい強力であることを示す。我々の発見は, ショートカットが知覚不可能な規模で通常の特徴と混在しているとしても, 深層学習はショートカットに大きく依存するため, ショートカット学習の問題が従来信じられていたよりも深刻であることを示唆している。この発見は、事前訓練された特徴抽出器がこれらの中毒攻撃を効果的に無効にすることも示唆している。

Indiscriminate data poisoning attacks, which add imperceptible perturbations to training data to maximize the test error of trained models, have become a trendy topic because they are thought to be capable of preventing unauthorized use of data. In this work, we investigate why these perturbations work in principle. We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples, and hence can work as shortcuts for the learning objective. This important population property has not been unveiled before. Moreover, we further verify that linear separability is indeed the workhorse for poisoning attacks. We synthesize linearly separable data as perturbations and show that such synthetic perturbations are as powerful as the deliberately crafted attacks. Our finding suggests that the shortcut learning problem is more serious than previously believed, as deep learning heavily relies on shortcuts even if they are of an imperceptible scale and mixed together with the normal features. This finding also suggests that pre-trained feature extractors would disable these poisoning attacks effectively.
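The idea of "synthesizing linearly separable data as perturbations" can be sketched as follows. This is a simplified construction chosen for illustration and not the paper's generation procedure: every class receives one fixed sign pattern scaled to a small budget `eps`, and a linear scorer then recovers the label from the perturbation alone. The function names, the pattern dimension, and the budget `8 / 255` are assumptions.

```python
import numpy as np


def make_shortcut_perturbations(labels, dim, n_classes, eps, seed=0):
    """Give every class one fixed +/-eps sign pattern and copy it to its samples."""
    rng = np.random.default_rng(seed)
    patterns = eps * rng.choice([-1.0, 1.0], size=(n_classes, dim))
    return patterns[labels], patterns


def linear_accuracy(perturbations, labels, patterns):
    """Classify each perturbation with the linear scores <delta, pattern_k>."""
    scores = perturbations @ patterns.T
    return float(np.mean(scores.argmax(axis=1) == labels))


labels = np.array([0, 1, 2, 0, 1, 2])
deltas, patterns = make_shortcut_perturbations(labels, dim=32, n_classes=3, eps=8 / 255)
accuracy = linear_accuracy(deltas, labels, patterns)
```

Because the label is recoverable from a tiny additive pattern by a linear rule, a network trained on such data can fit the pattern instead of the image content, which is the shortcut behavior the abstract describes.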

翻訳日:2021-11-02 17:29:05 公開日:2021-11-01

# ランダム化シミュレーションによるロボット学習: レビュー

Robot Learning from Randomized Simulations: A Review ( http://arxiv.org/abs/2111.00956v1 )

ライセンス: Link先を確認

Fabio Muratore, Fabio Ramos, Greg Turk, Wenhao Yu, Michael Gienger and Jan Peters

(参考訳) ディープラーニングの台頭は、大量のデータを必要とする方法を好むロボット研究のパラダイムシフトを引き起こしている。このようなデータセットを物理プラットフォーム上で生成するのは法外にコストがかかる。したがって、最先端のアプローチは、データ生成が高速かつ安価であるシミュレーションで学習し、その知識を実際のロボットに転送する(sim-to-real)。現実的になりつつあるにもかかわらず、すべてのシミュレーターはモデルに基づいて構築されているため、必然的に不完全である。これは、ロボットの制御ポリシーの学習を容易にし、しばしば「現実のギャップ」と呼ばれるシミュレーションと現実のミスマッチを克服するために、シミュレータをどのように修正できるかという問題を引き起こす。本稿では, ランダム化シミュレーションから学習する手法である「ドメインランダム化」に着目し, ロボット工学におけるsim-to-real研究の総合的なレビューを行う。

The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. It is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the 'reality gap'. We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named 'domain randomization' which is a method for learning from randomized simulations.
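A minimal sketch of domain randomization, assuming the simplest variant in which simulator parameters are drawn uniformly and independently before every episode. The parameter names and ranges in `PARAM_RANGES` and the `run_episode` callback are hypothetical; they stand in for whatever a real simulator exposes.

```python
import numpy as np

# Hypothetical parameter ranges; a real setup would take them from the robot and simulator.
PARAM_RANGES = {"mass_kg": (0.8, 1.2), "friction": (0.5, 1.0), "motor_delay_s": (0.0, 0.02)}


def sample_domain(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator parameters uniformly from the given ranges."""
    return {name: float(rng.uniform(lo, hi)) for name, (lo, hi) in ranges.items()}


def collect_randomized_episodes(run_episode, n_episodes, seed=0):
    """Run each episode in a freshly randomized simulator instance."""
    rng = np.random.default_rng(seed)
    return [run_episode(sample_domain(rng)) for _ in range(n_episodes)]


# Stand-in for a simulator rollout: it only echoes the sampled parameters.
episodes = collect_randomized_episodes(lambda params: params, n_episodes=5)
```

A policy trained on rollouts gathered this way sees many plausible versions of the dynamics, which is the intuition behind using randomization to bridge the reality gap.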

翻訳日:2021-11-02 17:28:45 公開日:2021-11-01

# STORM+:非凸最適化のためのモーメント付き完全適応SGD

STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization ( http://arxiv.org/abs/2111.01040v1 )

ライセンス: Link先を確認

Kfir Y. Levy, Ali Kavis, Volkan Cevher

(参考訳) 本研究では,目的が滑らかな損失関数に対する期待値である確率的非凸最適化問題を調査し,近似定常点を求めることを目的とする。このような問題に対処する最も一般的なアプローチは分散還元法であり、これはこの場合の下限に合致するタイトな収束率を得ることでも知られている。それにもかかわらず、これらの技術は適切に選択された「メガバッチサイズ」と連動してアンカーポイントを注意深く維持する必要がある。これにより、実用性を弱めるハイパーパラメータチューニング問題が発生する。近年, [Cutkosky and Orabona, 2019] は, アンカーポイントや大規模なバッチサイズの使用を避けるために再帰運動量を利用することができ, それでもこの設定に最適なレートが得られることを示した。しかし、STORMと呼ばれるそれらの手法は、滑らかさの知識と勾配ノルムの上界に大きく依存している。本研究では,完全にパラメータフリーで大規模なバッチサイズを必要とせず、近似定常点を求めるために最適な $O(1/T^{1/3})$ レートを達成する新しい手法STORM+を提案する。我々の研究は、学習率と運動量パラメータを適応的に設定する新しいアプローチとともに、STORMアルゴリズムに基づいている。

In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require careful maintenance of anchor points in conjunction with appropriately selected "mega-batch sizes". This leads to a challenging hyperparameter tuning problem that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batch sizes, and still obtain the optimal rate for this setting. Yet, their method, called STORM, crucially relies on knowledge of the smoothness, as well as a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch sizes, and obtains the optimal $O(1/T^{1/3})$ rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
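The recursive momentum estimator that STORM builds on can be sketched in a few lines. The version below uses a constant learning rate `lr` and a constant momentum parameter `a`, which is a deliberate simplification: STORM sets both from observed gradients, and the adaptive rule of STORM+ is not reproduced here. The function `storm`, the noisy quadratic objective, and all numeric values are assumptions for illustration.

```python
import numpy as np


def storm(grad, x0, steps, lr=0.1, a=0.1, seed=0):
    """Recursive-momentum SGD; grad(x, noise) must reuse `noise` as the sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = grad(x, rng.standard_normal(x.shape))  # first estimate: plain stochastic gradient
    for _ in range(steps):
        x_prev, x = x, x - lr * d
        noise = rng.standard_normal(x.shape)   # one fresh sample per step
        # Evaluate the SAME sample at the new and the previous iterate.
        d = grad(x, noise) + (1.0 - a) * (d - grad(x_prev, noise))
    return x


# Toy objective f(x) = 0.5 * ||x||^2 with additive gradient noise (illustrative only).
noisy_grad = lambda x, noise: x + 0.1 * noise
x_final = storm(noisy_grad, x0=[5.0, 5.0], steps=200)
```

The correction term `d - grad(x_prev, noise)` is what removes the need for anchor points or large batches: the same sample is evaluated at two consecutive iterates, so most of its noise cancels.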

翻訳日:2021-11-02 17:27:40 公開日:2021-11-01

# 資源効率のよい連合学習

Resource-Efficient Federated Learning ( http://arxiv.org/abs/2111.01108v1 )

ライセンス: Link先を確認

Ahmed M. Abdelmoniem and Atal Narayan Sahu and Marco Canini and Suhaib A. Fahmy

(参考訳) フェデレーション学習(FL)は,ローカルデータを用いた学習者による分散トレーニングを可能にし,プライバシの強化とコミュニケーションの削減を実現する。しかし、デプロイメントの規模が拡大するにつれて、データ分布の不均一性、デバイス能力、参加者の可用性に関する多くの課題が生じ、モデル収束とバイアスの両方に影響を与える可能性がある。既存のFLスキームは、公平性を改善するためにランダムな参加者選択を用いるが、これはリソースの非効率な使用とより低い品質のトレーニングをもたらす可能性がある。本研究では,FLにおける資源効率の課題に系統的に取り組み,知的な参加者選択のメリットと,遅延する参加者(ストラグラー)からの更新を取り込むことのメリットを示す。我々は、これらの要因がいかにリソース効率を向上させつつ、トレーニングされたモデルの品質も改善するかを示す。

Federated Learning (FL) enables distributed training by learners using local data, thereby enhancing privacy and reducing communication. However, it presents numerous challenges relating to the heterogeneity of the data distribution, device capabilities, and participant availability as deployments scale, which can impact both model convergence and bias. Existing FL schemes use random participant selection to improve fairness; however, this can result in inefficient use of resources and lower quality training. In this work, we systematically address the question of resource efficiency in FL, showing the benefits of intelligent participant selection, and incorporation of updates from straggling participants. We demonstrate how these factors enable resource efficiency while also improving trained model quality.

翻訳日:2021-11-02 17:27:19 公開日:2021-11-01

# (参考訳) 磁気共鳴画像の画質指標とニューラルネットワークのセグメンテーション精度の相関

Correlation between image quality metrics of magnetic resonance images and the neural network segmentation accuracy ( http://arxiv.org/abs/2111.01093v1 )

ライセンス: CC BY 4.0

Rajarajeswari Muthusivarajan, Adrian Celaya, Joshua P. Yung, Satish Viswanath, Daniel S. Marcus, Caroline Chung, David Fuentes

Deep neural networks with multilevel connections process input data in complex ways to learn the information. A network's learning efficiency depends not only on the complex neural network architecture but also on the input training images. Medical image segmentation with deep neural networks for skull stripping or tumor segmentation from magnetic resonance images enables learning both global and local features of the images. Though medical images are collected in a controlled environment, there may be artifacts or equipment-based variance that cause inherent bias in the input set. In this study, we investigated the correlation between the image quality metrics (IQMs) of MR images and the neural network segmentation accuracy. For that, we used the 3D DenseNet architecture and trained the network on the same input while applying different methodologies to select the training data set based on the IQM values. The difference in segmentation accuracy between models trained on random inputs and models trained on IQM-based inputs sheds light on the role of image quality metrics in segmentation accuracy. By using the image quality metrics to choose the training inputs, we may further tune the learning efficiency of the network and the segmentation accuracy.
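One plausible way to quantify the relationship this abstract studies is to pair a per-image quality metric with a per-image Dice score and compute a correlation coefficient. The sketch below only defines the two helpers; the choice of Dice and of Pearson correlation is an assumption, since the abstract does not name the accuracy measure or the statistic used.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap of two binary masks (1.0 means identical)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, truth).sum() / denom)


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])
```

In use, `pearson(iqm_values, dice_values)` would take one IQM value and one Dice score per test image, with both lists coming from the actual experiment rather than from this sketch.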

翻訳日:2021-11-02 17:24:17 公開日:2021-11-01

# 拡大するマルチマーケット独占と情報価値の非凹性

Expanding Multi-Market Monopoly and Nonconcavity in the Value of Information ( http://arxiv.org/abs/2111.00839v1 )

ライセンス: Link先を確認

Stefan Behringer

(参考訳) 本稿では,相互に接続している可能性のある複数の市場においてランダムに成長する需要に直面した、価格設定を行う独占者という具体的設定におけるベイズ的逆問題について検討する。 Kalman-Bucy-Stratonovichフィルタを用いた完全動的離散モデルにおいて、独占者にとっての信号の情報価値を調べると、それが信号の分散に関して非単調になりうることが分かる。情報価値に関する文献の古典的な静的設定では、この関係は凸にも凹にもなりうるが、常に単調である。非単調性の存在は、系の外生的な成長率に決定的に依存する。

In this paper I investigate a Bayesian inverse problem in the specific setting of a price-setting monopolist facing a randomly growing demand in multiple possibly interconnected markets. Investigating the Value of Information of a signal to the monopolist in a fully dynamic discrete model employing the Kalman-Bucy-Stratonovich filter, I find that it may be non-monotonic in the variance of the signal. In the classical static settings of the Value of Information literature this relationship may be convex or concave, but it is always monotonic. The existence of the non-monotonicity depends critically on the exogenous growth rate of the system.

翻訳日:2021-11-02 17:12:15 公開日:2021-11-01

# (参考訳) AIへのステークホルダーの参加:「多様なステークホルダーを加えてかき混ぜる」を超えて

Stakeholder Participation in AI: Beyond "Add Diverse Stakeholders and Stir" ( http://arxiv.org/abs/2111.01122v1 )

ライセンス: CC BY 4.0

Fernando Delgado, Stephen Yang, Michael Madaio, Qian Yang

(参考訳) HCIとAI研究には、AIシステムの設計は、AIに影響されるステークホルダーを関与させ、強化する必要があるという意見の一致が増えている。しかし、ai設計に利害関係者が参加すべき方法は明らかではない。このワークショップペーパーは、既存の文献の参加に関する合成と、最近公開された調査およびAI研究者や実践者に対する数十の半構造化されたインタビューを通じて、現在のプラクティスの実証分析を通じて、AI設計における「参加的転換」を掘り下げることを目的としている。本稿は,AI設計における参加実践の現代的景観を詳述した経験的知見の集合を,我々の文献合成と経験的研究に基づいて分析するための概念的枠組みを提案する。これらの発見は、PD of AIがAI、HCI、その他の研究コミュニティをどのように前進させるべきか、より原則化された議論をブートストラップに役立てることができる。

There is a growing consensus in HCI and AI research that the design of AI systems needs to engage and empower stakeholders who will be affected by AI. However, the manner in which stakeholders should participate in AI design is unclear. This workshop paper aims to ground what we dub a 'participatory turn' in AI design by synthesizing existing literature on participation and through empirical analysis of its current practices via a survey of recently published research and a dozen semi-structured interviews with AI researchers and practitioners. Based on our literature synthesis and empirical research, this paper presents a conceptual framework for analyzing participatory approaches to AI design and articulates a set of empirical findings that together detail the contemporary landscape of participatory practice in AI design. These findings can help bootstrap a more principled discussion on how PD of AI should move forward across AI, HCI, and other research communities.

翻訳日:2021-11-02 17:05:07 公開日:2021-11-01

# マルチエージェント知覚のための蒸留コラボレーショングラフの学習

Learning Distilled Collaboration Graph for Multi-Agent Perception ( http://arxiv.org/abs/2111.00643v1 )

ライセンス: Link先を確認

Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, Wenjun Zhang

(参考訳) マルチエージェント知覚のための性能と帯域幅のトレードオフを改善するために,訓練可能でポーズを考慮した、エージェント間の適応的な協調をモデル化する新しい蒸留コラボレーショングラフ(DiscoGraph)を提案する。私たちの重要な新規性は2つの側面にある。まず,知識蒸留によってDiscoGraphを学習する教師と生徒のフレームワークを提案する。教師モデルは、全体視点の入力を用いた早期の協調を採用し、生徒モデルは、単一視点の入力を用いた中間的な協調に基づいている。本枠組みは,生徒モデルにおけるコラボレーション後の特徴マップが教師モデルの対応する特徴マップに一致するように制約することにより,DiscoGraphを訓練する。次に、DiscoGraphにおいて行列値のエッジウェイトを提案する。このような行列において、各要素は特定の空間領域におけるエージェント間の注意を反映し、エージェントが情報量の多い領域を適応的に強調することができる。推論中は、蒸留コラボレーションネットワーク(DiscoNet)という名前の生徒モデルのみを使用すればよい。教師と生徒のフレームワークのおかげで、共有されたDiscoNetを持つ複数のエージェントが、全体的な視点を持つ仮想的な教師モデルの性能に協調的に近づくことができる。 CARLAとSUMOの協調シミュレーションを用いて合成した大規模マルチエージェント知覚データセットであるV2X-Sim 1.0で本手法の有効性を検証した。マルチエージェント3Dオブジェクト検出における定量的および定性的実験により、DiscoNetは最先端の協調知覚法よりも優れた性能と帯域幅のトレードオフを達成できるだけでなく、より簡明な設計の根拠ももたらすことが示された。私たちのコードは https://github.com/ai4ce/DiscoNet で公開されています。

To promote a better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs an early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match the correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model, named the distilled collaboration network (DiscoNet). Thanks to the teacher-student framework, multiple agents with the shared DiscoNet could collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring a more straightforward design rationale. Our code is available on https://github.com/ai4ce/DiscoNet.
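The two ingredients named in this abstract, a matrix-valued edge weight and a feature-level distillation constraint, can be sketched with NumPy. Everything here is an assumption for illustration: the attention logits `scores` are random rather than learned, the fusion is a per-cell softmax over agents, and the distillation term is a mean squared error, which may differ from the loss used in DiscoNet.

```python
import numpy as np


def fuse_with_spatial_attention(features, scores):
    """features: (N, C, H, W) maps from N agents; scores: (N, H, W) attention logits."""
    scores = scores - scores.max(axis=0, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)            # softmax over agents, per cell
    fused = (weights[:, None, :, :] * features).sum(axis=0)  # (C, H, W)
    return fused, weights


def distillation_loss(student_map, teacher_map):
    """Mean squared error between student and teacher feature maps."""
    return float(np.mean((student_map - teacher_map) ** 2))


rng = np.random.default_rng(0)
features = rng.standard_normal((3, 4, 8, 8))  # 3 agents, 4 channels, 8x8 grid
scores = rng.standard_normal((3, 8, 8))
fused, weights = fuse_with_spatial_attention(features, scores)
```

Each `weights[n]` is an H x W matrix rather than a single scalar, so an agent can be trusted in one region of the scene and ignored in another.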

翻訳日:2021-11-02 16:53:23 公開日:2021-11-01

# ロバスト最適輸送による正確な点雲登録

Accurate Point Cloud Registration with Robust Optimal Transport ( http://arxiv.org/abs/2111.00648v1 )

ライセンス: Link先を確認

Zhengyang Shen, Jean Feydy, Peirong Liu, Ariel Hern\'an Curiale, Ruben San Jose Estepar, Raul San Jose Estepar, Marc Niethammer

(参考訳) 本研究は, 形状マッチングに対するロバスト最適輸送(OT)の利用について検討する。具体的には、最近のOTソルバが、ポイントクラウド登録のための最適化ベースの手法と深層学習手法の両方を改善し、手頃な計算コストで精度を高めることを示す。本稿は、現代のOT理論の実践的な概要から始まる。次に、このフレームワークを形状マッチングに使用する際の主な困難に対する解決策を提供する。最後に, 部分形状の剛体登録, KITTIデータセットのシーンフロー推定, 吸気と呼気の間の肺血管樹のノンパラメトリック登録など, 幅広い課題に対する輸送強化登録モデルの性能を紹介する。 OTベースの手法は,KITTIと肺登録課題において,精度とスケーラビリティの両面で最先端の結果を得る。また、密にサンプリングされた点を持つ1,010対の肺血管樹からなる新しい公開データセットPVT1010もリリースした。このデータセットは、非常に複雑な形状と変形を持つ、ポイントクラウド登録アルゴリズムにとって難しいユースケースを提供する。我々の研究は、ロバストOTが幅広い登録モデルの高速な事前調整と微調整を可能にし、コンピュータビジョンツールボックスのための新しいキーメソッドを提供することを示した。私たちのコードとデータセットは、https://github.com/uncbiag/robot からオンラインで利用できます。

This work investigates the use of robust optimal transport (OT) for shape matching. Specifically, we show that recent OT solvers improve both optimization-based and deep learning methods for point cloud registration, boosting accuracy at an affordable computational cost. This manuscript starts with a practical overview of modern OT theory. We then provide solutions to the main difficulties in using this framework for shape matching. Finally, we showcase the performance of transport-enhanced registration models on a wide range of challenging tasks: rigid registration for partial shapes; scene flow estimation on the Kitti dataset; and nonparametric registration of lung vascular trees between inspiration and expiration. Our OT-based methods achieve state-of-the-art results on Kitti and for the challenging lung registration task, both in terms of accuracy and scalability. We also release PVT1010, a new public dataset of 1,010 pairs of lung vascular trees with densely sampled points. This dataset provides a challenging use case for point cloud registration algorithms with highly complex shapes and deformations. Our work demonstrates that robust OT enables fast pre-alignment and fine-tuning for a wide range of registration models, thereby providing a new key method for the computer vision toolbox. Our code and dataset are available online at: https://github.com/uncbiag/robot.
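For readers unfamiliar with robust OT solvers, the following sketch shows one common formulation: entropic optimal transport whose marginal constraints are relaxed with a KL penalty, solved by Sinkhorn-style scaling iterations. Whether this matches the exact solver used in the paper is an assumption, and the parameters `eps` and `rho` as well as the toy 1-D point clouds are illustrative choices.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=0.5, rho=1.0, n_iter=200):
    """Entropic OT with KL-relaxed marginals; returns the transport plan."""
    kernel = np.exp(-cost / eps)
    power = rho / (rho + eps)  # tends to 1 (balanced Sinkhorn) as rho grows
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (kernel @ v)) ** power
        v = (b / (kernel.T @ u)) ** power
    return u[:, None] * kernel * v[None, :]


# Toy 1-D point clouds; the last source point is an outlier.
x = np.array([0.0, 0.1, 0.2, 0.3, 3.0])
y = np.array([0.05, 0.15, 0.25, 0.35])
cost = (x[:, None] - y[None, :]) ** 2
plan = unbalanced_sinkhorn(np.full(5, 0.2), np.full(4, 0.25), cost)
source_mass = plan.sum(axis=1)  # mass actually transported from each source point
```

Because the marginals are only softly enforced, the distant source point sends almost no mass, which is the kind of robustness to outliers and partial overlap that makes this family of solvers attractive for registration.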

翻訳日:2021-11-02 16:52:56 公開日:2021-11-01

# 画像ノイズ除去における自己検証

Self-Verification in Image Denoising ( http://arxiv.org/abs/2111.00666v1 )

ライセンス: Link先を確認

Huangxing Lin, Yihong Zhuang, Delu Zeng, Yue Huang, Xinghao Ding, John Paisley

(参考訳) 画像のノイズ除去のための、自己検証と呼ばれる新しい正則化を考案する。この正則化は、従来の事前定義された事前分布ではなく、ネットワークが学習した深い画像事前分布を用いて定式化される。具体的には、ネットワークの出力を「事前分布」として扱い、これに「再びノイズを付加」した後で再度ノイズ除去する。再度ノイズ除去された画像とその事前分布との比較は、ネットワークのノイズ除去能力の自己検証として解釈することができる。自己検証は,画像復元に必要な低レベルな画像統計をネットワークが捉えることを促すことを示す。また,この自己検証正則化に基づき,クリーンな画像を一度も見ていなくても,ネットワークがノイズ除去を学習できることを示す。この学習戦略は自己教師型であり,自己検証画像ノイズ除去(SVID)と呼ぶ。 SVIDは学習に基づく手法と従来のモデルに基づくノイズ除去手法の混合と見なすことができ、ネットワークの出力を用いて正則化を適応的に定式化する。観測された劣化データのみを用いて,様々なノイズ除去タスクへのSVIDの適用を示す。教師付きCNNに近いノイズ除去性能を実現することができる。

We devise a new regularization, called self-verification, for image denoising. This regularization is formulated using a deep image prior learned by the network, rather than a traditional predefined prior. Specifically, we treat the output of the network as a "prior" that we denoise again after "re-noising". The comparison between the re-denoised image and its prior can be interpreted as a self-verification of the network's denoising ability. We demonstrate that self-verification encourages the network to capture low-level image statistics needed to restore the image. Based on this self-verification regularization, we further show that the network can learn to denoise even if it has not seen any clean images. This learning strategy is self-supervised, and we refer to it as Self-Verification Image Denoising (SVID). SVID can be seen as a mixture of learning-based methods and traditional model-based denoising methods, in which regularization is adaptively formulated using the output of the network. We show the application of SVID to various denoising tasks using only observed corrupted data. It can achieve denoising performance close to that of supervised CNNs.
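The self-verification loop described in this abstract (denoise, re-noise, denoise again, compare) can be written down directly. The sketch below is only a reading of that description: the Gaussian re-noising model, the squared-error comparison, and the 3x3 mean filter standing in for the network are assumptions, and the full SVID training objective is not reproduced.

```python
import numpy as np


def box_denoiser(image):
    """Stand-in denoiser: 3x3 mean filter with edge padding (not a trained network)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def self_verification_loss(denoise, noisy, sigma, rng):
    """Denoise, re-noise the result, denoise again, and compare with the first output."""
    prior = denoise(noisy)                                       # first pass, treated as the prior
    renoised = prior + sigma * rng.standard_normal(prior.shape)  # assumed Gaussian re-noising
    return float(np.mean((denoise(renoised) - prior) ** 2))


rng = np.random.default_rng(0)
noisy = rng.standard_normal((16, 16))
loss = self_verification_loss(box_denoiser, noisy, sigma=0.1, rng=rng)
```

No clean image appears anywhere in the computation, which is why a term of this kind can be used when only corrupted observations are available.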

翻訳日:2021-11-02 16:52:35 公開日:2021-11-01

# OctField: 3Dモデリングのための階層的陰関数

OctField: Hierarchical Implicit Functions for 3D Modeling ( http://arxiv.org/abs/2111.01067v1 )

ライセンス: Link先を確認

Jia-Heng Tang, Weikai Chen, Jie Yang, Bo Wang, Songrun Liu, Bo Yang, Lin Gao

(参考訳) 局所化された陰関数の最近の進歩により、ニューラル陰的表現は大きなシーンにもスケールできるようになった。しかし、これらのアプローチが用いる3次元空間の規則的な分割は、表面占有率の疎性と幾何学的詳細の粒度のばらつきを考慮に入れていない。その結果、メモリフットプリントは入力ボリュームとともに3乗で増大し、中程度の密度の分解でも計算コストが法外に高くなる。そこで本研究では,複雑な表面を低いメモリと計算予算で高精度に符号化できる、OctFieldと呼ぶ3次元曲面の学習可能な階層的陰的表現を提案する。我々のアプローチの鍵は、3Dシーンの適応的な分解であり、関心のある表面の周囲にのみ局所的な陰関数を配置する。この目的を達成するために、表面占有率と部分形状の豊かさに応じて3次元空間を適応的に分割する階層的な八分木(octree)構造を導入する。 octreeは離散的かつ微分不可能であるため、さらに、octreeセルの分割を確率的プロセスとしてモデル化し、octree構造と表面形状の両方を微分可能な形で再帰的に符号化・復号する新しい階層ネットワークを提案する。様々な形状モデリングおよび再構成タスクにおけるOctFieldの価値を示し、代替手法よりも優れていることを示す。

Recent advances in localized implicit functions have enabled neural implicit representation to be scalable to large scenes. However, the regular subdivision of 3D space employed by these approaches fails to take into account the sparsity of the surface occupancy and the varying granularities of geometric details. As a result, its memory footprint grows cubically with the input volume, leading to a prohibitive computational cost even at a moderately dense decomposition. In this work, we present a learnable hierarchical implicit representation for 3D surfaces, dubbed OctField, that allows high-precision encoding of intricate surfaces with a low memory and computational budget. The key to our approach is an adaptive decomposition of 3D scenes that only distributes local implicit functions around the surface of interest. We achieve this goal by introducing a hierarchical octree structure to adaptively subdivide the 3D space according to the surface occupancy and the richness of part geometry. As the octree is discrete and non-differentiable, we further propose a novel hierarchical network that models the subdivision of octree cells as a probabilistic process and recursively encodes and decodes both octree structure and surface geometry in a differentiable manner. We demonstrate the value of OctField for a range of shape modeling and reconstruction tasks, showing superiority over alternative approaches.
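The adaptive decomposition idea can be illustrated with a plain, non-learned octree that refines only the cells that may contain the surface. This sketch covers just the spatial subdivision; the probabilistic, differentiable encoder-decoder described in the abstract is not modeled. The sphere used as the surface, the function names, and the depth limit are assumptions.

```python
import numpy as np


def sphere_sdf(point, radius=0.5):
    """Signed distance to a sphere centered at the origin (toy surface)."""
    return float(np.linalg.norm(point) - radius)


def build_octree(center, half_size, depth, max_depth, sdf=sphere_sdf):
    """Return leaf cells as (center, half_size, depth); split only cells near the surface."""
    center = np.asarray(center, dtype=float)
    half_diagonal = half_size * np.sqrt(3.0)
    near_surface = abs(sdf(center)) <= half_diagonal  # conservative intersection test
    if depth == max_depth or not near_surface:
        return [(center, half_size, depth)]
    leaves = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child = center + half_size * np.array([dx, dy, dz])
                leaves += build_octree(child, half_size / 2.0, depth + 1, max_depth, sdf)
    return leaves


leaves = build_octree([0.0, 0.0, 0.0], half_size=1.0, depth=0, max_depth=3)
```

Cells far from the surface stay coarse, so the number of leaves grows with the surface area rather than with the volume, which is the saving over a regular grid that the abstract points to.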

翻訳日:2021-11-02 16:50:02 公開日:2021-11-01

# refinegan: 精度の高いピッチと強度応答を持つグラウンド真理よりも優れた波形を普遍的に生成する

RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses ( http://arxiv.org/abs/2111.00962v1 )

ライセンス: Link先を確認

Shengyuan Xu, Wenxiao Zhao, Jing Guo

(参考訳) GAN(Generative Adversarial Network)に基づく高忠実度波形生成へのアプローチの多くは、その性能向上のために識別器に大きく依存している。しかし、このGAN法の過剰使用は、生成過程に大きな不確実性をもたらし、しばしばピッチと強度のミスマッチを引き起こし、歌声合成(SVS)のような敏感なケースでは致命的である。この問題に対処するため,高速な実時間生成機能を備えた高忠実なニューラルボコーダであるRefineGANを提案し,ロバスト性,ピッチと強度の精度,フルバンドオーディオ生成に着目した。学習過程の安定化と神経ボコーダのロバスト性を維持するために,マルチスケールのスペクトログラムに基づく損失関数を用いたピッチ誘導型洗練アーキテクチャを用いた。この方法で生成された音声は、地中音と比較した場合、主観的テストにおいて優れた性能を示す。この結果から, スピーカが生み出す欠陥や記録処理を除去することにより, 波形再構成時の忠実度も向上した。さらに、ある特定の種類のデータに基づいて訓練されたモデルが、全く見えない言語と目に見えない話者で同じように機能することを示した。生成されたサンプルペアはhttps://timedomain-tech.github.io/refinegan/で提供される。

Most GAN (Generative Adversarial Network)-based approaches to high-fidelity waveform generation rely heavily on discriminators to improve their performance. However, the over-use of this GAN method introduces much uncertainty into the generation process and often results in mismatches of pitch and intensity, which is fatal in sensitive use cases such as singing voice synthesis (SVS). To address this problem, we propose RefineGAN, a high-fidelity neural vocoder with faster-than-real-time generation capability that focuses on robustness, pitch and intensity accuracy, and full-band audio generation. We employed a pitch-guided refine architecture with a multi-scale spectrogram-based loss function to help stabilize the training process and maintain the robustness of the neural vocoder while using the GAN-based training method. Audio generated using this method shows better performance in subjective tests than the ground-truth audio. This result shows that fidelity is even improved during waveform reconstruction by eliminating defects produced by the speaker and the recording procedure. Moreover, a further study shows that models trained on a specified type of data perform equally well on totally unseen languages and unseen speakers. Generated sample pairs are provided at https://timedomain-tech.github.io/refinegan/.
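To make the phrase "multi-scale spectrogram-based loss" concrete, here is a minimal NumPy sketch. It is not RefineGAN's actual loss: the FFT sizes, the hop of a quarter window, the Hann window and the L1 distance on magnitudes are all assumptions chosen for illustration, and `stft_magnitude` and `multi_scale_spectrogram_loss` are hypothetical names.

```python
import numpy as np


def stft_magnitude(x, n_fft, hop):
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))


def multi_scale_spectrogram_loss(pred, target, fft_sizes=(256, 512, 1024)):
    """Mean L1 distance between magnitude spectrograms, averaged over
    several time-frequency resolutions (assumed scales)."""
    total = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4
        diff = stft_magnitude(pred, n_fft, hop) - stft_magnitude(target, n_fft, hop)
        total += float(np.mean(np.abs(diff)))
    return total / len(fft_sizes)
```

Comparing at several resolutions penalises both fine temporal errors (short windows) and fine frequency errors such as pitch drift (long windows), which is one plausible reason such a loss helps stabilise training.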

翻訳日:2021-11-02 16:48:53 公開日:2021-11-01

# (参考訳) オープンワールドサンプリングによる種子不均衡データのコントラスト学習の改善

Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling ( http://arxiv.org/abs/2111.01004v1 )

ライセンス: CC BY 4.0

Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

(参考訳) 対照的な学習アプローチは、ターゲットクラスのラベルがほとんどない視覚表現の学習において大きな成功を収めた。これは、インターネット規模の外部ソースからのラベルなしイメージをより多く取り入れて、そのパフォーマンスを向上させるという、キュレートされた"シード"ベンチマークを超えてそれらをスケールアップする可能性を示すものだ。しかし、実際には、より大きなモデルサイズとより長いトレーニングを必要とするため、ラベルのないデータが大量に必要となる。さらに、open-world unlabeledデータは、通常、暗黙のlong-tailクラスまたは属性分布に従うが、その多くはターゲットクラスに属しない。したがって、ラベルのないデータをすべて盲目的に活用すれば、データの不均衡と邪魔になる可能性がある。このことは、関連するクラスに対して一般化可能でバランスの取れた多様な表現を学ぶために、外部ソースからラベルのないデータを戦略的に選択する原則的なアプローチを模索する動機となっている。本研究では,(1)無作為データ拡張によるサンプルの実証的コントラスト損失期待(ECLE)のソートによるテールクラスからのサンプルのサンプリングを促進するテールネス,(2)学習を妨げかねないアウトリーチを拒否する近接性,(3)サンプルの集合における多様性を保証するダイバーシティの3つの簡単な原則に従う,MAK(Model-Aware K-center)と呼ばれるオープンワールドなラベル付きデータサンプリングフレームワークを提案する。実験では,ImageNet-100-LT(ラベルなし)をシードデータセットと2つの"ノイズ"外部データソースとして使用することにより,MAKは,フルショット設定と少数ショット設定の線形分類器評価により,学習した機能の全体的な表現品質とクラスバランス性の両方を一貫して改善できることを示した。コードは以下の通り。 \url{https://github.com/VITA-Group/MAK

Contrastive learning approaches have achieved great success in learning visual representations with few labels of the target classes. That implies a tantalizing possibility of scaling them up beyond a curated "seed" benchmark by incorporating more unlabeled images from internet-scale external sources to enhance their performance. However, in practice, a larger amount of unlabeled data will require more computing resources due to the bigger model size and longer training needed. Moreover, open-world unlabeled data usually follows an implicit long-tail class or attribute distribution, and much of it does not belong to the target classes. Blindly leveraging all unlabeled data can hence lead to data imbalance as well as distraction issues. This motivates us to seek a principled approach to strategically select unlabeled data from an external source, in order to learn generalizable, balanced and diverse representations for relevant classes. In this work, we present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK), which follows three simple principles: (1) tailness, which encourages sampling of examples from tail classes, by sorting the empirical contrastive loss expectation (ECLE) of samples over random data augmentations; (2) proximity, which rejects the out-of-distribution outliers that may distract training; and (3) diversity, which ensures diversity in the set of sampled examples. Empirically, using ImageNet-100-LT (without labels) as the seed dataset and two "noisy" external data sources, we demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features, as evaluated via linear classifier evaluation on full-shot and few-shot settings. The code is available at: \url{https://github.com/VITA-Group/MAK}
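The "diversity" principle can be illustrated with the classic farthest-point K-center heuristic. This sketch covers only that one ingredient and is not the MAK procedure itself: the tailness (ECLE) and proximity terms are omitted, and `k_center_greedy` with its parameters is a hypothetical name.

```python
import numpy as np


def k_center_greedy(features, seed_idx, budget):
    """Greedily pick `budget` rows of `features`, each time taking the row
    farthest from everything selected so far (seeds included)."""
    features = np.asarray(features, dtype=float)
    # Distance from every candidate to its nearest already-selected point.
    min_dist = np.full(len(features), np.inf)
    for s in seed_idx:
        d = np.linalg.norm(features - features[s], axis=1)
        min_dist = np.minimum(min_dist, d)
    picked = []
    for _ in range(budget):
        i = int(np.argmax(min_dist))
        picked.append(i)
        d = np.linalg.norm(features - features[i], axis=1)
        min_dist = np.minimum(min_dist, d)
    return picked
```

With features `[[0, 0], [0.1, 0], [10, 0], [0, 5]]`, seed index `0` and a budget of 2, the near-duplicate at index 1 is skipped in favour of the two far-away points.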

翻訳日:2021-11-02 16:46:11 公開日:2021-11-01

# 構造情報の鍵:3次元物体検出における自己注意型RoI特徴エクストラクタ

Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection ( http://arxiv.org/abs/2111.00931v1 )

ライセンス: Link先を確認

Diankun Zhang, Zhijie Zheng, Xueting Bi, Xiaojun Liu

(参考訳) RoIのすべての特徴がグリッドピクセルから得られる2Dオブジェクト検出とは異なり、3Dポイントクラウドオブジェクト検出のRoI特徴抽出はより多様である。本稿ではまず,2つの最先端モデルPV-RCNNとVoxel-RCNNの構造と性能の違いを比較し,解析する。そして,2つのモデル間の性能差は点情報からではなく,構造情報から生じることがわかった。ボクセルの特徴は、点雲にダウンサンプリングする代わりに量子化を行うので、点雲全体の完全な情報を含むことができるため、より構造的な情報を含んでいる。ボクセルの特徴の強い構造情報は、正確な位置情報を持っていなくても、この検出器を我々の実験で高い性能にします。そこで我々は3次元物体検出の鍵となる構造情報を提案する。以上の結論に基づき、3次元提案から抽出した特徴の構造化情報を強化する自己注意型RoI特徴抽出器(SARFE)を提案する。 SARFEは既存の3D検出器で簡単に使用できるプラグアンドプレイモジュールである。我々のSARFEは、KITTIデータセットとWaymo Openデータセットの両方で評価されます。新たに導入されたSARFEにより、リアルタイム能力を維持しながら、KITTIデータセット上でのサイクリストの大きなマージンで最先端の3D検出器の性能を向上する。

Unlike 2D object detection, where all RoI features come from grid pixels, RoI feature extraction in 3D point cloud object detection is more diverse. In this paper, we first compare and analyze the differences in structure and performance between two state-of-the-art models, PV-RCNN and Voxel-RCNN. We find that the performance gap between the two models comes not from point information but from structural information. Voxel features contain more structural information because they quantize the point cloud instead of downsampling it, so they retain essentially the complete information of the whole point cloud. This stronger structural information gives the detector higher performance in our experiments even though the voxel features lack accurate location information. We therefore propose that structural information is the key to 3D object detection. Based on this conclusion, we propose a Self-Attention RoI Feature Extractor (SARFE) to enhance the structural information of the features extracted from 3D proposals. SARFE is a plug-and-play module that can be easily used with existing 3D detectors. SARFE is evaluated on both the KITTI dataset and the Waymo Open dataset. With the newly introduced SARFE, we improve the performance of state-of-the-art 3D detectors by a large margin on the cyclist class of the KITTI dataset while keeping real-time capability.

翻訳日:2021-11-02 16:30:54 公開日:2021-11-01

# 3次元人文推定のための高次インプシシトフェアリングネットワーク

Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation ( http://arxiv.org/abs/2111.00950v1 )

ライセンス: Link先を確認

Jianning Quan and A. Ben Hamza

(参考訳) 人間の3Dポーズを推定することは、主に人体の関節の複雑さ、閉塞、照明条件の変動など、難しい課題であることが証明されている。本稿では,2次元から3次元のポーズ推定のための初期残差接続を持つ高次グラフ畳み込みフレームワークを提案する。ノード特徴の集約にマルチホップ近傍を用いることにより,体節間の長距離依存性を捉えることができる。さらに,ネットワークアーキテクチャにおいて設計により統合された残差接続を活用し,ネットワーク深度が増大するにつれて,学習した特徴表現が入力層の初期特徴から重要な情報を保持することを保証する。 2つの標準ベンチマークで行った実験と改善研究は、我々のモデルの有効性を示し、3次元ポーズ推定のための強力なベースライン法よりも優れた性能を実現した。

Estimating a 3D human pose has proven to be a challenging task, primarily because of the complexity of the human body joints, occlusions, and variability in lighting conditions. In this paper, we introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation. Using multi-hop neighborhoods for node feature aggregation, our model is able to capture the long-range dependencies between body joints. Moreover, our approach leverages residual connections, which are integrated by design in our network architecture, ensuring that the learned feature representations retain important information from the initial features of the input layer as the network depth increases. Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model, achieving superior performance over strong baseline methods for 3D human pose estimation.
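As a rough illustration of "multi-hop aggregation with an initial residual connection", the sketch below sums features propagated over 1 to K hops of a normalized adjacency matrix and adds a projection of the input-layer features. The exact layer in the paper may differ; the update rule, `normalize_adj`, `higher_order_layer` and all weight shapes are assumptions.

```python
import numpy as np


def normalize_adj(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ A_hat @ d_inv_sqrt


def higher_order_layer(A_norm, H, H0, hop_weights, W0):
    """ReLU( sum_k A_norm^k H W_k + H0 W0 ), k = 1..len(hop_weights).

    H  : current joint features, shape (num_joints, d_in)
    H0 : initial (input-layer) features, kept as a residual at every depth
    """
    out = H0 @ W0
    A_pow = np.eye(len(A_norm))
    for W_k in hop_weights:
        A_pow = A_pow @ A_norm          # A_norm raised to the k-th power
        out = out + A_pow @ H @ W_k
    return np.maximum(out, 0.0)
```

On a 4-joint chain, a feature placed on joint 0 reaches joint 2 only when two hops are used, which is the long-range dependency the abstract mentions.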

翻訳日:2021-11-02 16:30:33 公開日:2021-11-01

# ウェアラブルカメラと多モード融合による人間軌道予測

Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion ( http://arxiv.org/abs/2111.00993v1 )

ライセンス: Link先を確認

Jianing Qiu, Lipeng Chen, Xiao Gu, Frank P.-W. Lo, Ya-Yen Tsai, Jiankai Sun, Jiaqi Liu and Benny Lo

(参考訳) 本稿では,密集空間における自我中心型カメラ装着者(自我者)の軌跡予測の問題に対処する。現実世界を歩き回るさまざまなカメラの装着者のデータから得られた軌道予測能力は、視覚障害者のナビゲーション支援や、移動ロボットにおける人間のナビゲーション行動のシミュレーション、人間とロボットのインタラクションの改善に移すことができる。この目的のために、カメラを装着した混雑した空間を航行する人々の実際の軌跡を含む、新しいエゴセントリックな人間の軌道予測データセットを構築し、豊かな文脈データを抽出した。我々は,カメラ装着者の過去の軌跡,近所の人々の過去の軌跡,シーンの意味やシーンの深さなどの環境を予測するために,3つの異なるモダリティを抽出し,活用する。複数のモードを融合する新しいカスケードクロスアテンション機構を組み込んだトランスフォーマベースのエンコーダ・デコーダニューラルネットワークモデルは、カメラ装着者の将来の軌道を予測するために設計されている。実験により,エゴセントリックな人軌道予測において,本モデルが最先端の手法より優れていることが示された。

In this paper, we address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces. The trajectory forecasting ability learned from the data of different camera wearers walking around in the real world can be transferred to assist visually impaired people in navigation, as well as to instill human navigation behaviours in mobile robots, enabling better human-robot interactions. To this end, a novel egocentric human trajectory forecasting dataset was constructed, containing real trajectories of people navigating in crowded spaces wearing a camera, as well as extracted rich contextual data. We extract and utilize three different modalities to forecast the trajectory of the camera wearer, i.e., his/her past trajectory, the past trajectories of nearby people, and the environment such as the scene semantics or the depth of the scene. A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism that fuses multiple modalities, has been designed to predict the future trajectory of the camera wearer. Extensive experiments have been conducted, and the results have shown that our model outperforms the state-of-the-art methods in egocentric human trajectory forecasting.

翻訳日:2021-11-02 16:30:21 公開日:2021-11-01

# Render In-between: Motion Guided Video Synthesis for Action Interpolation

Render In-between: Motion Guided Video Synthesis for Action Interpolation ( http://arxiv.org/abs/2111.01029v1 )

ライセンス: Link先を確認

Hsuan-I Ho, Xu Chen, Jie Song, Otmar Hilliges

(参考訳) 人間のアクティビティのアップサンプリングは、ゲームからエンターテイメント、スポーツ放送に至るまで、多くの潜在的なアプリケーションにおいて、興味深いが難しい課題だ。この環境でビデオフレームを合成することの主な難しさは、人間の動きの非常に複雑で非線形な性質と、身体の複雑な外観とテクスチャに起因する。本稿では,現実的な人間の動きと外観を創出できる動き誘導型フレームアップサンプリングフレームワークを提案する。大規模モーションキャプチャデータセット(AMASS)を利用して、フレーム間の非線形骨格運動を推定する新しいモーションモデルを訓練する。高フレームレートのポーズ予測は、ニューラルレンダリングパイプラインがフルフレーム出力を生成するために使用し、ポーズと背景の一貫性を考慮している。私たちのパイプラインでは、低フレームレートビデオと非ペアの人間のモーションデータしか必要とせず、トレーニングのために高フレームレートビデオは必要ありません。さらに,この課題に対して,人間の活動の高品質かつ高フレームレートのビデオからなる最初の評価データセットを寄贈する。最先端の映像補間技術と比較して,提案手法はより高い品質と精度で中間フレームを生成し,これは画素レベルの指標,分布指標,比較ユーザ評価における最先端の結果から明らかである。私たちのコードと収集したデータセットは https://git.io/Render-In-Between で利用可能です。

Upsampling videos of human activity is an interesting yet challenging task with many potential applications ranging from gaming to entertainment and sports broadcasting. The main difficulty in synthesizing video frames in this setting stems from the highly complex and non-linear nature of human motion and the complex appearance and texture of the body. We propose to address these issues in a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance. A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset (AMASS). The high-frame-rate pose predictions are then used by a neural rendering pipeline to produce the full-frame output, taking the pose and background consistency into consideration. Our pipeline requires only low-frame-rate videos and unpaired human motion data, and does not require high-frame-rate videos for training. Furthermore, we contribute the first evaluation dataset that consists of high-quality and high-frame-rate videos of human activities for this task. Compared with state-of-the-art video interpolation techniques, our method produces in-between frames with better quality and accuracy, as evidenced by state-of-the-art results on pixel-level metrics, distributional metrics, and comparative user evaluations. Our code and the collected dataset are available at https://git.io/Render-In-Between.

翻訳日:2021-11-02 16:29:58 公開日:2021-11-01

# 英語誤読検出と診断のための非自己回帰的エンドツーエンドニューラルモデリングの検討

Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis ( http://arxiv.org/abs/2111.00844v1 )

ライセンス: Link先を確認

Hsin-Wei Wang, Bi-Cheng Yan, Hsuan-Sheng Chiu, Yung-Chang Hsu, Berlin Chen

(参考訳) エンド・ツー・エンド(E2E)ニューラル・モデリングは、コンピュータ支援言語訓練(CAPT)システムの開発を主な研究分野としており、従来の発音に基づく手法と競合する性能を示している。しかし、CAPTの現在のE2Eニューラルメソッドは、少なくとも2つの重要な課題に直面している。一方、E2E法のほとんどは、左から右へのビームサーチで自己回帰的に動作し、L2学習者の発音を指示する。しかし、これは推論の速度が非常に遅くなり、必然的に実用を妨げます。一方、E2Eニューラルメソッドは通常データ欲求であり、非ネイティブなトレーニングデータが不足すると、誤発音の検出と診断(MD&D)に対する効果が低下することがしばしばある。そこで我々は,非自己回帰(NAR)E2Eニューラルモデリングを利用した新しいMD&D手法を提案し,従来のE2Eニューラル手法と同等の性能を維持しつつ,推論時間を劇的に高速化した。さらに,本手法のNAR E2Eモデル上に積み重ねた発音モデリングネットワークを設計・開発し,MD&Dの有効性をさらに向上する。 DNN-HMM音響モデル上に構築された最上位のE2Eモデルと象徴的発音スコアに基づく手法と比較して,L2-ARCTIC英語データセットを用いた実験により本手法の有効性が検証された。

End-to-end (E2E) neural modeling has emerged as one predominant school of thought to develop computer-assisted language training (CAPT) systems, showing performance competitive with conventional pronunciation-scoring based methods. However, current E2E neural methods for CAPT face at least two pivotal challenges. On the one hand, most E2E methods operate in an autoregressive manner with left-to-right beam search to dictate the pronunciations of an L2 learner. This, however, leads to very slow inference, which inevitably hinders their practical use. On the other hand, E2E neural methods are normally data-hungry, and an insufficient amount of nonnative training data often reduces their efficacy on mispronunciation detection and diagnosis (MD&D). In response, we put forward a novel MD&D method that leverages non-autoregressive (NAR) E2E neural modeling to dramatically speed up inference while maintaining performance in line with conventional E2E neural methods. In addition, we design and develop a pronunciation modeling network stacked on top of the NAR E2E models of our method to further boost the effectiveness of MD&D. Empirical experiments conducted on the L2-ARCTIC English dataset seem to validate the feasibility of our method, in comparison to some top-of-the-line E2E models and an iconic pronunciation-scoring based method built on a DNN-HMM acoustic model.

翻訳日:2021-11-02 16:29:35 公開日:2021-11-01

# 擬球面コントラスト発散

Pseudo-Spherical Contrastive Divergence ( http://arxiv.org/abs/2111.00780v1 )

ライセンス: Link先を確認

Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon

(参考訳) エネルギーベースモデル(EBM)は柔軟な分布パラメトリゼーションを提供する。しかし、分配関数が計算困難であるため、通常は最尤推定のためにコントラスト発散を通じて訓練される。本稿では,EBMの最尤学習を一般化するための擬球面コントラスト発散(PS-CD)を提案する。PS-CDは、厳密に適切な均質スコアリングルールの族の最大化から導出され、計算困難な分配関数の計算を回避し、コントラスト発散を特別な場合として含む一般化された学習目的の族を提供する。さらにPS-CDでは,追加の計算コストや変分ミニマックス最適化を伴わずに,多様な学習目的を柔軟に選択してEBMを訓練することができる。提案手法の理論的解析と、合成データおよび一般的な画像データセットの両方に関する広範な実験により、PS-CDの有効性とモデリングの柔軟性、およびデータの汚染に対する堅牢性が示され、最尤法と$f$-EBMよりも優れていることが示された。

Energy-based models (EBMs) offer flexible distribution parametrization. However, due to the intractable partition function, they are typically trained via contrastive divergence for maximum likelihood estimation. In this paper, we propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of EBMs. PS-CD is derived from the maximization of a family of strictly proper homogeneous scoring rules, which avoids the computation of the intractable partition function and provides a generalized family of learning objectives that include contrastive divergence as a special case. Moreover, PS-CD allows us to flexibly choose various learning objectives to train EBMs without additional computational cost or variational minimax optimization. Theoretical analysis on the proposed method and extensive experiments on both synthetic data and commonly used image datasets demonstrate the effectiveness and modeling flexibility of PS-CD, as well as its robustness to data contamination, thus showing its superiority over maximum likelihood and $f$-EBMs.
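For readers unfamiliar with the term, the textbook form of the pseudo-spherical scoring rule shows why homogeneity removes the partition function. The notation below ($\gamma > 0$, unnormalized density $\tilde q_\theta = e^{-E_\theta}$, normalizer $Z_\theta$) is an assumption made for illustration; the paper's exact objective may be parametrized differently.

```latex
% Pseudo-spherical score of a density q at an observation x (gamma > 0):
S_\gamma(x, q) \;=\;
  \frac{q(x)^{\gamma}}
       {\left( \int q(y)^{\gamma+1}\, \mathrm{d}y \right)^{\gamma/(\gamma+1)}}

% Homogeneity: rescaling q by any constant c > 0 leaves the score unchanged,
S_\gamma(x, c\,q) \;=\;
  \frac{c^{\gamma}\, q(x)^{\gamma}}
       {c^{\gamma} \left( \int q(y)^{\gamma+1}\, \mathrm{d}y \right)^{\gamma/(\gamma+1)}}
  \;=\; S_\gamma(x, q)

% so with q_\theta = \tilde q_\theta / Z_\theta and \tilde q_\theta = e^{-E_\theta}:
S_\gamma(x, q_\theta) \;=\; S_\gamma(x, \tilde q_\theta)
```

The normalizer $Z_\theta$ therefore never has to be evaluated; what remains is an integral of $\tilde q_\theta^{\gamma+1}$, which in practice has to be approximated, for example with samples.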

翻訳日:2021-11-02 16:22:41 公開日:2021-11-01

# Back to Basics:IMPによる効率的なネットワーク圧縮

Back to Basics: Efficient Network Compression via IMP ( http://arxiv.org/abs/2111.00843v1 )

ライセンス: Link先を確認

Max Zimmer, Christoph Spiegel, Sebastian Pokutta

(参考訳) ネットワークプルーニング(Network pruning)は、推論時の性能劣化をほとんど伴わずにディープニューラルネットワークを効果的に圧縮する、広く使われている手法である。イテレーティブ・マグニチュード・プルーニング(IMP)は、ネットワークプルーニングの最も確立されたアプローチの1つであり、いくつかの反復的なトレーニングとプルーニングステップで構成されており、ネットワークのパフォーマンスのかなりの部分がプルーニング後に失われ、その後の再トレーニングフェーズで回復される。ベンチマーク参照として一般的に使用されるが、次のような主張がしばしばなされる。 a) 訓練段階にスパース化を取り入れないため、準最適な状態に達すること。 b) そのグローバルな選択基準が最適な層ごとのプルーニング率を適切に決定できないこと。 c) その反復的な性質のために遅く、競争力がないこと。最近提案された再訓練技術に照らして,IMPとpruning-during-trainingアルゴリズムを比較し,選択基準の修正案を評価し,実際に必要とされるイテレーション数と総トレーニング時間について検討した。再学習にSLRを用いたIMPは,計算オーバーヘッドなし,あるいはわずかな計算オーバーヘッドで,最先端のpruning-during-trainingアプローチよりも優れた性能を示しうること,大域的なマグニチュード選択基準は,より複雑なアプローチとほぼ競合するものであること,IMPのスパース性対性能のトレードオフの大部分を達成するために実際に必要となる再学習エポックはごくわずかであることがわかった。我々の目標は、基本的なIMPが既に、より複雑な、あるいはパラメータの多いアプローチに匹敵する、あるいはさらに優れた、最先端のプルーニング結果を提供できることを示し、また、将来の研究のためのより現実的で容易に実現可能なベースラインを確立することである。

Network pruning is a widely used technique for effectively compressing Deep Neural Networks with little to no degradation in performance during inference. Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning, consisting of several iterative training and pruning steps, where a significant amount of the network's performance is lost after pruning and then recovered in the subsequent retraining phase. While commonly used as a benchmark reference, it is often argued that a) it reaches suboptimal states by not incorporating sparsification into the training phase, b) its global selection criterion fails to properly determine optimal layer-wise pruning rates and c) its iterative nature makes it slow and non-competitive. In light of recently proposed retraining techniques, we investigate these claims through rigorous and consistent experiments where we compare IMP to pruning-during-training algorithms, evaluate proposed modifications of its selection criterion and study the number of iterations and total training time actually required. We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches with little or no computational overhead, that the global magnitude selection criterion is largely competitive with more complex approaches and that only a few retraining epochs are needed in practice to achieve most of the sparsity-vs.-performance tradeoff of IMP. Our goals are both to demonstrate that basic IMP can already provide state-of-the-art pruning results on par with or even outperforming more complex or heavily parameterized approaches and also to establish a more realistic yet easily realisable baseline for future research.
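The prune-then-retrain loop with a global magnitude criterion can be sketched in a few lines of NumPy. This is an illustrative skeleton, not the authors' code: `imp`, `prune_frac` and the user-supplied `retrain` callback are hypothetical, and the retraining schedule (SLR in the abstract) is left entirely to that callback.

```python
import numpy as np


def imp(layers, rounds, prune_frac, retrain):
    """Iterative magnitude pruning on a list of (already trained) weight arrays.

    Each round removes the `prune_frac` fraction of the surviving weights
    with the smallest absolute value, ranked across ALL layers at once
    (global criterion), then calls `retrain(layers, masks)`.
    Ties at the threshold are pruned together, so slightly more than the
    requested fraction may be removed.
    """
    masks = [np.ones(w.shape, dtype=bool) for w in layers]
    for _ in range(rounds):
        surviving = np.concatenate([np.abs(w[m]) for w, m in zip(layers, masks)])
        k = int(prune_frac * surviving.size)       # weights to remove this round
        if k > 0:
            threshold = np.sort(surviving)[k - 1]  # k-th smallest magnitude
            masks = [m & (np.abs(w) > threshold) for w, m in zip(layers, masks)]
        layers = [w * m for w, m in zip(layers, masks)]
        # Retrain, then re-apply the masks so pruned weights stay at zero.
        layers = [w * m for w, m in zip(retrain(layers, masks), masks)]
    return layers, masks
```

Because the threshold is computed over the concatenated magnitudes, layers with many small weights end up sparser than others without any per-layer rate being specified.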

翻訳日:2021-11-02 16:22:24 公開日:2021-11-01

# 機械学習による作物収量最適化

Machine Learning aided Crop Yield Optimization ( http://arxiv.org/abs/2111.00963v1 )

ライセンス: Link先を確認

Chace Ashcraft, Kiran Karra

(参考訳) 本稿では,openai体育館インタフェースを用いた作物シミュレーション環境を提案し,現代的深層強化学習(drl)アルゴリズムを用いて収率を最適化する。 DRLアルゴリズムは,水や肥料の使用量などの制約要因を最小化しつつ,収穫量の最適化を支援する新しい政策やアプローチの発見に有用であることを示す。我々は、このハイブリッドプラントモデルとデータ駆動アプローチにより、作物収量を最適化する新しい戦略が、人口増加と気候変動による世界の食料需要に対応するのに役立つことを示唆する。

We present a crop simulation environment with an OpenAI Gym interface, and apply modern deep reinforcement learning (DRL) algorithms to optimize yield. We empirically show that DRL algorithms may be useful in discovering new policies and approaches to help optimize crop yield, while simultaneously minimizing constraining factors such as water and fertilizer usage. We propose that this hybrid plant modeling and data-driven approach for discovering new strategies to optimize crop yield may help address upcoming global food demands due to population expansion and climate change.

翻訳日:2021-11-02 16:20:47 公開日:2021-11-01

# PDE-READ:ディープラーニングを用いた人間可読部分微分方程式探索

PDE-READ: Human-readable Partial Differential Equation Discovery using Deep Learning ( http://arxiv.org/abs/2111.00998v1 )

ライセンス: Link先を確認

Robert Stephany, Christopher Earls

(参考訳) PDE発見は、複雑な物理系の予測モデルを明らかにすることを約束するが、測定がまばらでノイズの多い場合には困難である。本稿では,2つの有理ニューラルネットワークと原理的スパース回帰アルゴリズムを用いて,システムの応答を支配する隠れたダイナミクスを同定する新しい手法を提案する。第1のネットワークはシステム応答関数を、第2のネットワークはシステムの進化を駆動する隠れPDEを学習する。次に,パラメータフリーなスパース回帰アルゴリズムを用いて,隠れたPDEの可読な形式を第2ネットワークから抽出する。我々はPDE-READと呼ばれるオープンソースライブラリにアプローチを実装した。提案手法は, 熱, バーガース, コルテヴェーグ・ド・ブリーズ方程式を顕著な整合性で同定する。提案手法は空間と雑音の両方に対して前例のない頑健であり,実世界の観測データに適用可能であることを示す。

PDE discovery shows promise for uncovering predictive models for complex physical systems but has difficulty when measurements are sparse and noisy. We introduce a new approach for PDE discovery that uses two Rational Neural Networks and a principled sparse regression algorithm to identify the hidden dynamics that govern a system's response. The first network learns the system response function, while the second learns a hidden PDE which drives the system's evolution. We then use a parameter-free sparse regression algorithm to extract a human-readable form of the hidden PDE from the second network. We implement our approach in an open-source library called PDE-READ. Our approach successfully identifies the Heat, Burgers, and Korteweg-De Vries equations with remarkable consistency. We demonstrate that our approach is unprecedentedly robust to both sparsity and noise and is, therefore, applicable to real-world observational data.

翻訳日:2021-11-02 16:20:37 公開日:2021-11-01

# 分散原理は、ドロップアウトがフラットなミニマを見つける理由を説明する

A variance principle explains why dropout finds flatter minima ( http://arxiv.org/abs/2111.01022v1 )

ライセンス: Link先を確認

Zhongwang Zhang, Hanxu Zhou, Zhi-Qin John Xu

(参考訳) ドロップアウトはディープラーニングにおいて大きな成功をおさめたが、高次元パラメータ空間における優れた一般化解を見つけるのにどのように役立つかはほとんど分かっていない。本研究では,ドロップアウトによる学習では,標準的な勾配降下訓練と比較して,より平坦な最小値を持つニューラルネットワークが得られることを示す。さらに,ドロップアウトがより平坦な最小値を見つけるメカニズムを実験によって検討した。ノイズの分散は損失ランドスケープのより鋭い方向で大きくなるという「分散原理(Variance Principle)」を提案する。既存の研究によると、SGDは分散原理を満たしており、これが訓練をより平坦な最小値へと導く。我々の研究は、ドロップアウトによって生じるノイズも分散原理を満たすことを示し、これによりドロップアウトがより平坦な最小値を見つける理由が説明される。総じて,分散原理はドロップアウトとSGDの重要な類似点であり,訓練をより平坦な最小値へ導き,優れた一般化をもたらすことを指摘する。

Although dropout has achieved great success in deep learning, little is known about how it helps the training find a good generalization solution in the high-dimensional parameter space. In this work, we show that training with dropout finds a neural network with a flatter minimum compared with standard gradient descent training. We further study the underlying mechanism of why dropout finds flatter minima through experiments. We propose a Variance Principle: the variance of the noise is larger in the sharper directions of the loss landscape. Existing works show that SGD satisfies the variance principle, which leads the training to flatter minima. Our work shows that the noise induced by dropout also satisfies the variance principle, which explains why dropout finds flatter minima. In general, our work points out that the variance principle is an important similarity between dropout and SGD that leads the training to find flatter minima and obtain good generalization.

翻訳日:2021-11-02 16:20:23 公開日:2021-11-01

# FedFm:エッジノードにおける障害軽減のためのロバストなフェデレーション学習アプローチを目指して

FedFm: Towards a Robust Federated Learning Approach For Fault Mitigation at the Edge Nodes ( http://arxiv.org/abs/2111.01074v1 )

ライセンス: Link先を確認

Manupriya Gupta, Pavas Goyal, Rohit Verma, Rajeev Shorey, Huzur Saran

(参考訳) フェデレーション学習は、"データからモデルへのsend"から"モデルからデータへのsend"へと変化します。エッジエコシステムで使用すると、さまざまな手段でデータを収集し、異なるネットワークチャネルを介して接続する多数の異種エッジデバイスがトレーニングプロセスに関与します。このようなエコシステムにおけるエッジデバイスの障害は、デバイス障害やネットワーク上の問題によるものだ。本稿では、まず、FLモデルにおけるエッジデバイス数の影響を分析し、モデルに寄与する最適なデバイス数を選択するための戦略を提供する。選択したデバイスが失敗した場合,エッジエコシステムがどのように振る舞うかを観察し,堅牢なフェデレート学習技術を保証するための緩和戦略を提供する。

Federated Learning deviates from the norm of "send data to model" to "send model to data". When used in an edge ecosystem, numerous heterogeneous edge devices collecting data through different means and connected through different network channels get involved in the training process. Failure of edge devices in such an ecosystem due to device fault or network issues is highly likely. In this paper, we first analyse the impact of the number of edge devices on an FL model and provide a strategy to select an optimal number of devices that would contribute to the model. We observe how the edge ecosystem behaves when the selected devices fail and provide a mitigation strategy to ensure a robust Federated Learning technique.

翻訳日:2021-11-02 16:20:05 公開日:2021-11-01

# SmartSplit: スマートフォン環境におけるCNN分割のためのレイテンシ・エネルギメモリ最適化

SmartSplit: Latency-Energy-Memory Optimisation for CNN Splitting on Smartphone Environment ( http://arxiv.org/abs/2111.01077v1 )

ライセンス: Link先を確認

Ishan Prakash, Aniruddh Bansal, Rohit Verma, Rajeev Shorey

(参考訳) スマートフォン業界では、すべての処理をユーザに近づけて、プライバシの懸念に対処する必要性から、人工知能が中心的存在となっている。複数のAIアプリケーションで使用されている畳み込みニューラルネットワーク(CNN)は、非常にリソースと計算集約性が高い。次世代スマートフォンはAI対応チップを備えているが、多くのアプリケーションがスマートフォン上で同時に実行されるため、最小限のメモリとエネルギー利用が不可欠である。これを踏まえると、処理の一部をクラウドサーバにオフロードすることで、スマートフォンのワークロードを最適化することは、研究の重要な方向である。本稿では,スマートフォンとクラウドサーバ間でCNNを分割する可能性について,エンドツーエンドのレイテンシ,メモリ利用,エネルギー消費を最適化する多目的最適化問題を定式化することによって分析する。我々は、最適化問題を解決するために、意思決定に基づくアプローチによる遺伝的アルゴリズムであるsmartsplitを設計した。実験では複数のCNNモデルを用いて,スマートフォンとクラウドサーバのCNN分割が実現可能であることを示す。提案されたアプローチであるSmartSplitは、他の最先端のアプローチよりも優れている。

Artificial Intelligence has now taken centre stage in the smartphone industry owing to the need to bring all processing close to the user and to address privacy concerns. Convolutional Neural Networks (CNNs), which are used by several AI applications, are highly resource and computation intensive. Although new generation smartphones come with AI-enabled chips, minimal memory and energy utilisation is essential as many applications are run concurrently on a smartphone. In light of this, optimising the workload on the smartphone by offloading a part of the processing to a cloud server is an important direction of research. In this paper, we analyse the feasibility of splitting CNNs between a smartphone and a cloud server by formulating a multi-objective optimisation problem that optimises the end-to-end latency, memory utilisation, and energy consumption. We design SmartSplit, a Genetic Algorithm with decision analysis based approach to solve the optimisation problem. Our experiments run with multiple CNN models show that splitting a CNN between a smartphone and a cloud server is feasible. The proposed approach, SmartSplit, fares better than other state-of-the-art approaches.

翻訳日:2021-11-02 16:19:54 公開日:2021-11-01

# winogradの最小フィルタリングに基づく高速畳み込み:導入と開発

Fast Convolution based on Winograd Minimum Filtering: Introduction and Development ( http://arxiv.org/abs/2111.00977v1 )

ライセンス: Link先を確認

Gan Tong and Libo Huang

(参考訳) 畳み込みニューラルネットワーク(CNN)は様々な分野で広く使われており、重要な役割を果たしている。畳み込み演算子は畳み込みニューラルネットワークの基本的なコンポーネントであり、ネットワークトレーニングと推論の最も時間を要する部分でもある。近年、FFTやWinogradなどの高速な畳み込みアルゴリズムが提案されている。このうち、ウィノグラードの畳み込みは畳み込みにおける乗算演算を著しく減少させ、FFT畳み込みよりもメモリ空間を小さくする。したがって、ウィノグラード畳み込みは数年のうちに高速畳み込み実装の最初の選択肢となった。現在、畳み込みアルゴリズムの体系的な概要は存在しない。本稿は、このギャップを埋め、フォローアップ研究者に詳細なリファレンスを提供することを目的としている。本稿では,アルゴリズム拡張,アルゴリズム最適化,実装,アプリケーションという3つの側面からウィノグラード畳み込みの開発を要約し,最終的に将来的な方向性について簡単な展望を述べる。

Convolutional Neural Networks (CNNs) have been widely used in various fields and play an important role. Convolution operators are the fundamental component of convolutional neural networks, and they are also the most time-consuming part of network training and inference. In recent years, researchers have proposed several fast convolution algorithms, including FFT and Winograd. Among them, Winograd convolution significantly reduces the multiplication operations in convolution, and it also takes up less memory space than FFT convolution. Therefore, Winograd convolution quickly became the first choice for fast convolution implementation within a few years. At present, there is no systematic summary of the Winograd convolution algorithm. This article aims to fill this gap and provide a detailed reference for follow-up researchers. It summarizes the development of Winograd convolution from three aspects: algorithm expansion, algorithm optimization, and implementation and application, and finally gives a brief outlook on possible future directions.
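The claim that Winograd's minimal filtering reduces multiplications can be checked on the smallest standard case, F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The sketch below uses the well-known F(2,3) formulas; `winograd_f23` is a hypothetical function name.

```python
import numpy as np


def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g slid over 4 inputs d (correlation
    form, as used in CNNs), using 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side sums depend only on g, so they can be precomputed once
    # per filter and reused for every input tile.
    u_plus = (g0 + g1 + g2) / 2
    u_minus = (g0 - g1 + g2) / 2
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u_plus
    m3 = (d2 - d1) * u_minus
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])
```

For `d = [1, 2, 3, 4]` and `g = [1, 0, -1]` this gives the same result as `np.correlate(d, g, mode="valid")`. Larger tiles and 2D kernels are obtained by nesting the same idea, at the cost of more additions and reduced numerical accuracy.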

翻訳日:2021-11-02 16:16:48 公開日:2021-11-01

# (参考訳) 混合確率推定とPU学習 : 最近のアプローチ

Mixture Proportion Estimation and PU Learning: A Modern Approach ( http://arxiv.org/abs/2111.00980v1 )

ライセンス: CC BY 4.0

Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

(参考訳) 正の例と(正のクラスと負のクラスの両方から)ラベルされていない例のみを考えると、正確な正の逆負の分類器を推定することを期待できる。形式的には、このタスクは2つのサブタスクに分けられる。 (i)混合比率推定(mpe) --非ラベルデータ中の正の例の比率を決定する。 (ii)pu-learning -このような推定を行い、所望の正負の分類法を学習する。残念ながら、両方の問題の古典的な方法は高次元の設定で分解される。一方、最近提案されたヒューリスティックスは理論的コヒーレンスを欠き、ハイパーパラメータチューニングに依存する。本稿では,pu-learningの単純な目的であるbest bin estimation (bbe) (mpe) とconditional value ignoring risk (cvir) の2つの簡単な手法を提案する。どちらの手法も経験的に従来の手法を支配しており、BBEでは、正の例の小さな部分集合をきれいに分離するためにモデルを訓練できるたびに保持する形式的な保証を確立する。最終アルゴリズム(TED)$^n$は2つの手順を交互に行い、混合比推定器と分類器の両方を著しく改善する。

Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm, (TED)$^n$, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier.
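The idea behind estimating the mixture proportion from a "cleanly separated" bin can be sketched as follows: for each score threshold, compare the fraction of unlabeled examples above it with the fraction of held-out positives above it, and pick the threshold minimizing an upper confidence bound on that ratio. This is a simplified sketch, not the paper's implementation: the Hoeffding-style slack term and the defaults `delta=0.1`, `gamma=0.01` are assumptions, and `best_bin_estimate` is a hypothetical name.

```python
import numpy as np


def best_bin_estimate(pos_scores, unl_scores, delta=0.1, gamma=0.01):
    """Estimate the fraction of positives among the unlabeled examples from
    classifier scores on held-out positives and on unlabeled data."""
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    unl = np.sort(np.asarray(unl_scores, dtype=float))
    n_p, n_u = len(pos), len(unl)
    thresholds = np.unique(np.concatenate([pos, unl]))
    # Fraction of each sample scoring at or above every candidate threshold.
    q_p = (n_p - np.searchsorted(pos, thresholds, side="left")) / n_p
    q_u = (n_u - np.searchsorted(unl, thresholds, side="left")) / n_u
    # Assumed Hoeffding-style slack for the two empirical fractions.
    slack = (np.sqrt(np.log(4 / delta) / (2 * n_u))
             + np.sqrt(np.log(4 / delta) / (2 * n_p)))
    bound = np.full(len(thresholds), np.inf)
    valid = q_p > 0
    bound[valid] = (q_u[valid] + (1 + gamma) * slack) / q_p[valid]
    best = int(np.argmin(bound))
    return q_u[best] / q_p[best]
```

In a bin that contains almost no negatives, the ratio of the two fractions approaches the true proportion of positives; the slack term discourages bins that are too small to estimate reliably.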

翻訳日:2021-11-02 16:14:23 公開日:2021-11-01

# 2次元解剖学的ランドマーク検出のための特徴集約と細分化ネットワーク

Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection ( http://arxiv.org/abs/2111.00659v1 )

ライセンス: Link先を確認

Yueyuan Ao and Hong Wu

(参考訳) 解剖学的ランドマークの局在は臨床診断、治療計画、研究に不可欠である。本稿では,解剖学的ランドマークの自動検出のための,FARNet(Feature aggregate and refinement Network)という新しいディープネットワークを提案する。医療領域における限られたトレーニングデータの問題を緩和するため,本ネットワークでは,自然画像に事前学習したディープネットワークをバックボーンネットワークとして採用し,いくつかの人気ネットワークを比較した。我々のFARNetには、マルチスケール機能融合のためのマルチスケール機能集約モジュールと、高分解能熱マップ回帰のための機能改善モジュールが含まれています。粗大な監督が2つのモジュールに適用され、エンドツーエンドのトレーニングが促進される。さらに,高精度ヒートマップ回帰のための指数重み付き中心損失という新しい損失関数を提案し,ランドマーク近傍の画素の損失に着目し,遠方からの損失を抑制する。本研究のネットワークは,頭部X線写真,手指X線写真,脊椎X線写真を含む3つの解剖学的ランドマーク検出データセットで評価され,3つのデータセットすべてで最先端のパフォーマンスが達成されている。コードは以下の通り。 \url{https://github.com/JuvenileInWind/FARNet}

Localization of anatomical landmarks is essential for clinical diagnosis, treatment planning, and research. In this paper, we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks. To alleviate the problem of limited training data in the medical domain, our network adopts a deep network pre-trained on natural images as the backbone network, and we compare several popular backbone networks. Our FARNet also includes a multi-scale feature aggregation module for multi-scale feature fusion and a feature refinement module for high-resolution heatmap regression. Coarse-to-fine supervision is applied to the two modules to facilitate end-to-end training. We further propose a novel loss function named Exponential Weighted Center loss for accurate heatmap regression, which focuses on the losses from the pixels near landmarks and suppresses the ones from far away. Our network has been evaluated on three publicly available anatomical landmark detection datasets, including cephalometric radiographs, hand radiographs, and spine radiographs, and achieves state-of-the-art performance on all three datasets. Code is available at: https://github.com/JuvenileInWind/FARNet

翻訳日:2021-11-02 15:36:58 公開日:2021-11-01

# 反復ロバスト変換同期の学習

Learning Iterative Robust Transformation Synchronization ( http://arxiv.org/abs/2111.00728v1 )

ライセンス: Link先を確認

Zi Jian Yew and Gim Hee Lee

(参考訳) 変換同期は、与えられた対関係運動の集合から絶対変換を回復する問題である。その有用性にもかかわらず、ノイズや外向きの相対運動の影響や、解析的にモデル化し、高い忠実度で抑制することの難しさにより、この問題は依然として困難である。本研究では,ロバストな損失関数を手作りすることを避け,グラフニューラルネットワーク(gnns)を用いて変換同期を学習する方法を提案する。複雑なマルチステージパイプラインを使用する以前の作業とは異なり、各ステップは、接空間におけるインクリメンタルな更新を予測することによって、前回のイテレーションからの絶対的なステップを洗練する、単一の重み共有メッセージパッシング層で構成される反復的アプローチを採用している。外れ値の影響を減らすために、メッセージは集約の前に重み付けされる。我々の反復的アプローチは明示的な初期化ステップの必要性を軽減し、アイデンティティの初期ポーズとうまく機能する。提案手法は単純ではあるが,SO(3) と SE(3) の同時同期実験により,既存の手工・学習同期手法に対して良好に動作することを示す。

Transformation Synchronization is the problem of recovering absolute transformations from a given set of pairwise relative motions. Despite its usefulness, the problem remains challenging due to the influences from noisy and outlier relative motions, and the difficulty to model analytically and suppress them with high fidelity. In this work, we avoid handcrafting robust loss functions, and propose to use graph neural networks (GNNs) to learn transformation synchronization. Unlike previous works which use complicated multi-stage pipelines, we use an iterative approach where each step consists of a single weight-shared message passing layer that refines the absolute poses from the previous iteration by predicting an incremental update in the tangent space. To reduce the influence of outliers, the messages are weighted before aggregation. Our iterative approach alleviates the need for an explicit initialization step and performs well with identity initial poses. Although our approach is simple, we show that it performs favorably against existing handcrafted and learned synchronization methods through experiments on both SO(3) and SE(3) synchronization.

翻訳日:2021-11-02 15:36:38 公開日:2021-11-01
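
The abstract above describes refining absolute poses by predicting an incremental update in the tangent space, starting from identity poses. The following is only a minimal sketch of that update step for SO(3), not the authors' implementation. The names `so3_exp`, `refine`, and `predicted_updates` are hypothetical, the updates are hard-coded stand-ins for what a learned network would output, and left-multiplication is an assumed convention.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(omega):
    """Rodrigues' formula: map a tangent (axis-angle) vector to a rotation matrix."""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    # Skew-symmetric matrix of omega (not normalised by theta).
    k = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    k2 = matmul(k, k)
    if theta < 1e-8:
        # Small-angle limits of sin(t)/t and (1 - cos(t))/t^2.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[IDENTITY[i][j] + a * k[i][j] + b * k2[i][j] for j in range(3)]
            for i in range(3)]


def refine(rotation, omega):
    """Apply one incremental update; left-multiplication is an assumed convention."""
    return matmul(so3_exp(omega), rotation)


# Hypothetical stand-in for per-iteration network outputs:
# two 45-degree turns about the z axis.
predicted_updates = [(0.0, 0.0, math.pi / 4), (0.0, 0.0, math.pi / 4)]

pose = IDENTITY  # identity initial pose, as mentioned in the abstract
for omega in predicted_updates:
    pose = refine(pose, omega)
```

Working in the tangent space keeps every intermediate pose a valid rotation, which is one plausible reason to prefer it over regressing matrix entries directly. The SE(3) case would additionally carry a translation component, which this sketch omits.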

# バイアス修正モジュールによる局所表現の改善によるショット学習

Few-shot learning with improved local representations via bias rectify module ( http://arxiv.org/abs/2111.00754v1 )

ライセンス: Link先を確認

Chao Dong, Qi Ye, Wenchao Meng, Kaixiang Yang

(参考訳) メトリック学習に基づく最近のアプローチは、マイナショット学習において大きな進歩を遂げている。しかし、それらのほとんどは画像レベルの表現方法に限られており、クラス内のバリエーションや空間的知識を適切に扱えないため、望ましくないパフォーマンスを生み出す。本稿では,特徴表現の構造に存在する空間情報を十分に活用するために,深バイアス正規化ネットワーク(dbrn)を提案する。まず,クラス内変異による悪影響を軽減するためにバイアス修正モジュールを用いた。 bias rectifyモジュールは、異なる重み付けによって分類に対してより識別可能な特徴に焦点を合わせることができる。トレーニングデータを完全に活用するために,サポートセットから生成されたプロトタイプをより代表的なものにするためのプロトタイプ拡張機構を設計する。提案手法の有効性を検証するため,本手法は最先端手法を上回ることができるため,人気のあるマイナショット分類ベンチマークを用いて広範囲に実験を行った。

Recent approaches based on metric learning have achieved great progress in few-shot learning. However, most of them are limited to image-level representations, which fail to properly handle intra-class variations and spatial knowledge and thus produce undesirable performance. In this paper, we propose a Deep Bias Rectify Network (DBRN) to fully exploit the spatial information that exists in the structure of the feature representations. We first employ a bias rectify module to alleviate the adverse impact caused by intra-class variations. The bias rectify module is able to focus on the features that are more discriminative for classification by assigning them different weights. To make full use of the training data, we design a prototype augment mechanism that makes the prototypes generated from the support set more representative. To validate the effectiveness of our method, we conducted extensive experiments on various popular few-shot classification benchmarks, and our method can outperform state-of-the-art methods.

翻訳日:2021-11-02 15:36:20 公開日:2021-11-01

# 衝突認識因子による相互作用手の単眼3次元再構成

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements ( http://arxiv.org/abs/2111.00763v1 )

ライセンス: Link先を確認

Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy

(参考訳) 3dインタラクションハンドレコンストラクションは、人間と機械の相互作用と人間の行動を理解するのに不可欠である。この分野での以前の作業では、深度画像のような補助入力に依存するか、単眼のrgb画像を使用する場合のみ片手で処理できる。シングルハンド法は、両手間の相互作用を明示的にモデル化できないため、密接に相互作用する手に適用すると、衝突した手メッシュを生成する傾向がある。本稿では,単眼単一rgb画像から3次元インタラクションハンドを再構築する最初の試みを行う。高精度な3dポーズと最小限の衝突を伴う3dハンドメッシュを生成できる。これは2段階のフレームワークによって実現されている。具体的には、第1段階は畳み込みニューラルネットワークを採用し、衝突を許容するがポーズ精度の高いハンドメッシュを奨励する粗い予測を生成する。第2段階は、3dポーズの正確性を維持しながら、一連の因子化された改良を通じて衝突を段階的に緩和する。効率と精度のトレードオフを考慮し, 分解改質の可能性を慎重に検討する。 interhand2.6m のような大規模データセットにおける広範囲な量的・質的結果が提案手法の有効性を示している。

3D interacting hand reconstruction is essential to facilitate human-machine interaction and human behaviors understanding. Previous works in this field either rely on auxiliary inputs such as depth images or they can only handle a single hand if monocular single RGB images are used. Single-hand methods tend to generate collided hand meshes, when applied to closely interacting hands, since they cannot model the interactions between two hands explicitly. In this paper, we make the first attempt to reconstruct 3D interacting hands from monocular single RGB images. Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions. This is made possible via a two-stage framework. Specifically, the first stage adopts a convolutional neural network to generate coarse predictions that tolerate collisions but encourage pose-accurate hand meshes. The second stage progressively ameliorates the collisions through a series of factorized refinements while retaining the preciseness of 3D poses. We carefully investigate potential implementations for the factorized refinement, considering the trade-off between efficiency and accuracy. Extensive quantitative and qualitative results on large-scale datasets such as InterHand2.6M demonstrate the effectiveness of the proposed approach.

翻訳日:2021-11-02 15:36:03 公開日:2021-11-01

# 注意的特徴集約による高密度予測

Dense Prediction with Attentive Feature Aggregation ( http://arxiv.org/abs/2111.00770v1 )

ライセンス: Link先を確認

Yung-Hsu Yang, Thomas E. Huang, Samuel Rota Bulò, Peter Kontschieder, Fisher Yu

(参考訳) 異なる層にまたがる特徴から情報を集約することは、密集予測モデルに不可欠な操作である。限定的な表現性にもかかわらず、機能結合は集約操作の選択を支配する。本稿では,AFA(Attentive Feature Aggregation)を導入し,より表現力のある非線形操作で異なるネットワーク層を融合させる。 AFAは、層活性化の重み付き平均を計算するために空間的注意とチャネル的注意の両方を利用する。ニューラルボリュームレンダリングに触発されて、AFAをスケールスペースレンダリング(SSR)で拡張し、マルチスケール予測の後期融合を行う。 afaは、既存のネットワーク設計の幅広い範囲に適用できる。 cityscapes、bdd100k、mapillary vistasなどのセマンティックセグメンテーションベンチマークに対して、無視できる計算量とパラメータオーバーヘッドで、一貫性と大幅な改善が得られました。特に、AFAは、Cityscapes上でのDeep Layer Aggregation(DLA)モデルの性能を約6%向上させる。実験結果から,afa はセグメンテーションマップを段階的に洗練し,境界詳細を改善することを学び,bsds500 および nyudv2 における境界検出ベンチマークの最新の結果を得ることができた。コードとビデオのリソースはhttp://vis.xyz/pub/dla-afaで入手できる。

Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering (SSR) to perform late fusion of multi-scale predictions. AFA is applicable to a wide range of existing network designs. Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead. In particular, AFA improves the performance of the Deep Layer Aggregation (DLA) model by nearly 6% mIoU on Cityscapes. Our experimental analyses show that AFA learns to progressively refine segmentation maps and to improve boundary details, leading to new state-of-the-art results on boundary detection benchmarks on BSDS500 and NYUDv2. Code and video resources are available at http://vis.xyz/pub/dla-afa.

翻訳日:2021-11-02 15:34:46 公開日:2021-11-01

# PP-ShiTu:実用軽量画像認識システム

PP-ShiTu: A Practical Lightweight Image Recognition System ( http://arxiv.org/abs/2111.00775v1 )

ライセンス: Link先を確認

Shengyu Wei, Ruoyu Guo, Cheng Cui, Bin Lu, Shuilong Dong, Tingquan Gao, Yuning Du, Ying Zhou, Xueying Lyu, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

(参考訳) 近年,画像認識アプリケーションの開発が急速に進んでいる。顔認識、歩行者と車両の再識別、ランドマーク検索、製品認識など、さまざまな分野で多くの研究と技術が登場している。本稿では,主物体検出,特徴抽出,ベクトル探索の3つのモジュールからなる実用軽量画像認識システムPP-ShiTuを提案する。メトリクス学習,ディープハッシュ,知識蒸留,モデル量子化といった一般的な戦略を導入し,精度と推論速度を向上させる。上記の戦略では、PP-ShiTuは混合データセットでトレーニングされたモデルのセットで、さまざまなシナリオでうまく機能する。異なるデータセットとベンチマークの実験により、システムは画像認識の異なる領域で広く有効であることが示された。上記のモデルはすべてオープンソースで、コードはGitHubリポジトリPaddleClas on PaddlePaddleで公開されている。

In recent years, image recognition applications have developed rapidly. A large number of studies and techniques have emerged in different fields, such as face recognition, pedestrian and vehicle re-identification, landmark retrieval, and product recognition. In this paper, we propose a practical lightweight image recognition system, named PP-ShiTu, consisting of the following 3 modules, mainbody detection, feature extraction and vector search. We introduce popular strategies including metric learning, deep hash, knowledge distillation and model quantization to improve accuracy and inference speed. With strategies above, PP-ShiTu works well in different scenarios with a set of models trained on a mixed dataset. Experiments on different datasets and benchmarks show that the system is widely effective in different domains of image recognition. All the above mentioned models are open-sourced and the code is available in the GitHub repository PaddleClas on PaddlePaddle.

翻訳日:2021-11-02 15:34:24 公開日:2021-11-01

# 凸形状を持つ測地線モデル

Geodesic Models with Convexity Shape Prior ( http://arxiv.org/abs/2111.00794v1 )

ライセンス: Link先を確認

Da Chen and Jean-Marie Mirebeau and Minglei Shu and Xuecheng Tai and Laurent D. Cohen

(参考訳) アイコン方程式に基づく最小測地モデルは、様々な画像セグメンテーションのシナリオで適切な解を見つけることができる。既存の測地線に基づくセグメンテーションアプローチは、通常、測地線曲線を計算するためにユークリッド曲線の長さや曲率ペナル化長さなどの幾何正規化項と共に画像特徴を利用する。本稿では, より複雑な問題として, 凸形状を持つ曲率ペナル化された測地線経路を求める。我々は、平面曲線を高次元の向きに依存した空間にマッピングできる配向リフト戦略に依存する新しい測地モデルを確立する。凸形状は、特定の曲率の制約をコードする局所測地線メトリクスの構築のための制約となる。そして、向き上げ空間における測地距離とそれに対応する閉測地路を、最先端のハミルトン高速マーチング法により効率的に計算することができる。さらに,提案する測地線モデルをアクティブな輪郭に適用することにより,凸形状の利点と曲率ペナリゼーションを保った効率的なインタラクティブ画像分割アルゴリズムを実現する。

The minimal geodesic models based on the Eikonal equations are capable of finding suitable solutions in various image segmentation scenarios. Existing geodesic-based segmentation approaches usually exploit image features in conjunction with geometric regularization terms, such as Euclidean curve length or curvature-penalized length, for computing geodesic curves. In this paper, we take into account a more complicated problem: finding curvature-penalized geodesic paths with a convexity shape prior. We establish new geodesic models relying on the strategy of orientation-lifting, by which a planar curve can be mapped to a high-dimensional orientation-dependent space. The convexity shape prior serves as a constraint for the construction of local geodesic metrics encoding a particular curvature constraint. Then the geodesic distances and the corresponding closed geodesic paths in the orientation-lifted space can be efficiently computed through the state-of-the-art Hamiltonian fast marching method. In addition, we apply the proposed geodesic models to active contours, leading to efficient interactive image segmentation algorithms that preserve the advantages of the convexity shape prior and curvature penalization.

翻訳日:2021-11-02 15:34:11 公開日:2021-11-01

# LSTA-Net:スケルトンに基づく行動認識のための長期時空間アグリゲーションネットワーク

LSTA-Net: Long short-term Spatio-Temporal Aggregation Network for Skeleton-based Action Recognition ( http://arxiv.org/abs/2111.00823v1 )

ライセンス: Link先を確認

Tailin Chen, Shidong Wang, Desen Zhou, Yu Guan

(参考訳) 様々な時空間依存のモデル化は、スケルトンシーケンスにおける人間の行動を認識する鍵となる。既存の手法の多くは、ダイナミックジョイントの依存関係を引き出すためにトラバーサルルールやグラフトポロジの設計に過度に依存しており、これは遠方かつ重要なジョイントの関連性を反映していない。さらに, 局所的な運用により, 重要な長距離時間情報については, 既存の作品ではよく調べられていない。この問題に対処するため,我々はlsta-netを提案する。このネットワークは,長期・短距離の依存関係を時空間的に効果的に捉えることができる。我々は,空間的特徴の集約と時間的特徴の集約を交互に行う純粋因子化アーキテクチャにモデルを考案する。特徴集約効果を改善するため、チャネルワイドアテンション機構も設計・採用されている。 3つの公開ベンチマークデータセットで広範な実験を行い,提案手法は空間領域と時間領域における長短距離依存性の両方を捉えることができ,他の最先端手法よりも高い結果が得られることが示唆された。コードはhttps://github.com/tailin1009/lsta-net。

Modelling various spatio-temporal dependencies is the key to recognising human actions in skeleton sequences. Most existing methods rely excessively on the design of traversal rules or graph topologies to capture the dependencies of the dynamic joints, which is inadequate to reflect the relationships between distant yet important joints. Furthermore, due to the locally adopted operations, important long-range temporal information is not well explored in existing works. To address this issue, in this work we propose LSTA-Net: a novel Long short-term Spatio-Temporal Aggregation Network, which can effectively capture the long/short-range dependencies in a spatio-temporal manner. We design our model as a purely factorised architecture which alternately performs spatial feature aggregation and temporal feature aggregation. To improve the feature aggregation effect, a channel-wise attention mechanism is also designed and employed. Extensive experiments were conducted on three public benchmark datasets, and the results suggest that our approach can capture both long- and short-range dependencies in the space and time domain, yielding higher results than other state-of-the-art methods. Code is available at https://github.com/tailin1009/LSTA-Net.

翻訳日:2021-11-02 15:33:54 公開日:2021-11-01

# 破損不変人物再同定のためのベンチマーク

Benchmarks for Corruption Invariant Person Re-identification ( http://arxiv.org/abs/2111.00880v1 )

ライセンス: Link先を確認

Minghui Chen, Zhiqiang Wang, Feng Zheng

(参考訳) 安全クリティカルなアプリケーションに人体再識別(ReID)モデルをデプロイする場合、さまざまな画像破損に対するモデルの堅牢性を理解することが重要となる。しかし、person reidの現在の評価では、クリーンデータセットのパフォーマンスのみを検討し、さまざまな腐敗したシナリオでイメージを無視している。本研究では,6つのReIDベンチマークを総合的に構築し,腐敗不変表現を学習する。 ReID の分野では,マーケット-1501,CUHK03,MSMT17,RegDB,SYSU-MM01 など,単品間および多品間データセットにおける汚い不変性学習の徹底的な研究を行う。最近の21種類のreid法のロバスト性性能を再現・検討した結果,以下の知見を得た。 1) 変圧器モデルの方がcnnモデルに比べて劣化画像に対して頑健である。 2) ランダム消去の確率を増大させることにより, モデル劣化の堅牢性が損なわれる。 3) クロスデータセットの一般化は汚職の堅牢性の向上とともに改善する。上記の観察を解析することにより,多様な腐敗に対するロバスト性の向上を実現する,単一および相互モダリティreidデータセットの強固なベースラインを提案する。私たちのコードはhttps://github.com/MinghuiChen43/CIL-ReIDで公開されています。

When deploying person re-identification (ReID) models in safety-critical applications, it is pivotal to understand the robustness of the model against a diverse array of image corruptions. However, current evaluations of person ReID only consider the performance on clean datasets and ignore images in various corrupted scenarios. In this work, we comprehensively establish six ReID benchmarks for learning corruption invariant representation. In the field of ReID, we are the first to conduct an exhaustive study on corruption invariant learning in single- and cross-modality datasets, including Market-1501, CUHK03, MSMT17, RegDB, SYSU-MM01. After reproducing and examining the robustness performance of 21 recent ReID methods, we make the following observations: 1) transformer-based models are more robust towards corrupted images than CNN-based models, 2) increasing the probability of random erasing (a commonly used augmentation method) hurts model corruption robustness, 3) cross-dataset generalization improves as corruption robustness increases. By analyzing the above observations, we propose a strong baseline on both single- and cross-modality ReID datasets which achieves improved robustness against diverse corruptions. Our code is available at https://github.com/MinghuiChen43/CIL-ReID.

翻訳日:2021-11-02 15:33:34 公開日:2021-11-01

# リテラリートイデータセットを用いた階層画像分類

Hierarchical Image Classification with A Literally Toy Dataset ( http://arxiv.org/abs/2111.00892v1 )

ライセンス: Link先を確認

Long He, Dandan Song, Liang Zheng

(参考訳) 画像分類における教師なし領域適応(UDA)は依然として大きな課題である。既存のUDAイメージデータセットでは、クラスは通常フラットな方法で整理され、平易な分類器を訓練することができる。しかし、いくつかのシナリオでは、フラットなカテゴリはいくつかのベースクラスに由来する。例えば、バギーはクラス鳥に属する。階層的画像分類として,クラスが上述の特徴を持ち,フラットクラスとベースクラスが階層的に分類される分類タスクを定義する。直感的には、このような階層構造を活用することは、階層的なイメージ分類に役立つ。本稿では,ラベル階層から学習した特徴を融合させて分類性能を向上させる。具体的には,階層ラベルとUDA技術を用いて特徴抽出器を訓練し,入力画像の複数の特徴を出力する。それらの機能は、最後にきめ細かいクラスを予測するために結合される。この研究はlego-15という新しいデータセットで行われます。 lego-15のデータセットには、レゴブロックの合成画像と実際の画像が15種類含まれています。各クラスは粗いレベルラベルと中間レベルラベルに由来する。例えば、"85080"クラスは、レンガ(粗い)とレンガ(中間)に関連付けられている。本稿では,本手法が階層画像分類におけるUDAのベースラインを一貫した改善をもたらすことを示す。大規模なアブレーションと変種研究は、新しいデータセットと調査アルゴリズムに関する洞察を提供する。

Unsupervised domain adaptation (UDA) in image classification remains a big challenge. In existing UDA image datasets, classes are usually organized in a flattened way, where a plain classifier can be trained. Yet in some scenarios, the flat categories originate from some base classes. For example, budgies belong to the class bird. We define hierarchical image classification as the classification task in which classes have the characteristics above and the flat classes and the base classes are organized hierarchically. Intuitively, leveraging such hierarchical structure will benefit hierarchical image classification, e.g., two easily confused classes may belong to entirely different base classes. In this paper, we improve the performance of classification by fusing features learned from a hierarchy of labels. Specifically, we train feature extractors supervised by hierarchical labels and with UDA technology, which will output multiple features for an input image. The features are subsequently concatenated to predict the finest-grained class. This study is conducted with a new dataset named Lego-15. Consisting of synthetic images and real images of the Lego bricks, the Lego-15 dataset contains 15 classes of bricks. Each class originates from a coarse-level label and a middle-level label. For example, class "85080" is associated with bricks (coarse) and bricks round (middle). In this dataset, we demonstrate that our method brings about consistent improvement over the baseline in UDA in hierarchical image classification. Extensive ablation and variant studies provide insights into the new dataset and the investigated algorithm.

翻訳日:2021-11-02 15:33:15 公開日:2021-11-01

# PP-PicoDet: モバイルデバイスのリアルタイムオブジェクト検出器

PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices ( http://arxiv.org/abs/2111.00902v1 )

ライセンス: Link先を確認

Guanghua Yu, Qinyao Chang, Wenyu Lv, Chang Xu, Cheng Cui, Wei Ji, Qingqing Dang, Kaipeng Deng, Guanzhong Wang, Yuning Du, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

(参考訳) 精度と効率のトレードオフは、オブジェクト検出において難しい問題である。本稿では,オブジェクト検出のための重要な最適化とニューラルネットワークアーキテクチャの選択を研究し,精度と効率を向上させることを目的とする。軽量物体検出モデルにおけるアンカーフリー戦略の適用性について検討する。我々は,バックボーン構造を強化し,首の軽量構造を設計し,ネットワークの特徴抽出能力を向上させる。ラベル割り当て戦略と損失関数を改善し,トレーニングをより安定かつ効率的にする。これらの最適化により, PP-PicoDetと呼ばれる, モバイル機器の物体検出性能に優れたリアルタイム物体検出ファミリを新たに構築する。我々のモデルは、他の一般的なモデルと比べて精度とレイテンシのトレードオフが良くなります。わずか0.99MパラメータのPicoDet-Sは30.6%のmAPを実現しており、これはmAPの絶対4.8%の改善であり、YOLOX-Nanoと比較してモバイルCPUの推論遅延を55%削減している。入力サイズが320のとき、モバイルARM CPU上で123 FPS(Paddle Liteを使用した150 FPS)に達する。わずか3.3MパラメータのPicoDet-Lは40.9%のmAPを達成するが、これは絶対3.7%の改善であり、YOLOv5sよりも44%高速である。図1に示すように、私たちのモデルは軽量オブジェクト検出の最先端の結果をはるかに上回っています。コードと事前学習されたモデルはhttps://github.com/paddlepaddle/paddledetectionで入手できる。

Achieving a better accuracy-efficiency trade-off has been a challenging problem in object detection. In this work, we are dedicated to studying key optimizations and neural network architecture choices for object detection to improve accuracy and efficiency. We investigate the applicability of the anchor-free strategy on lightweight object detection models. We enhance the backbone structure and design the lightweight structure of the neck, which improves the feature extraction ability of the network. We improve the label assignment strategy and loss function to make training more stable and efficient. Through these optimizations, we create a new family of real-time object detectors, named PP-PicoDet, which achieves superior performance on object detection for mobile devices. Our models achieve better trade-offs between accuracy and latency compared to other popular models. PicoDet-S with only 0.99M parameters achieves 30.6% mAP, which is an absolute 4.8% improvement in mAP while reducing mobile CPU inference latency by 55% compared to YOLOX-Nano, and is an absolute 7.1% improvement in mAP compared to NanoDet. It reaches 123 FPS (150 FPS using Paddle Lite) on mobile ARM CPU when the input size is 320. PicoDet-L with only 3.3M parameters achieves 40.9% mAP, which is an absolute 3.7% improvement in mAP and 44% faster than YOLOv5s. As shown in Figure 1, our models far outperform the state-of-the-art results for lightweight object detection. Code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.

翻訳日:2021-11-02 15:32:56 公開日:2021-11-01

# DFCANet:クロスドメインアイリス提示検出のためのDense Feature Calibration-Attention Guided Network

DFCANet: Dense Feature Calibration-Attention Guided Network for Cross Domain Iris Presentation Attack Detection ( http://arxiv.org/abs/2111.00919v1 )

ライセンス: Link先を確認

Gaurav Jaswal, Aman Verma, Sumantra Dutta Roy, Raghavendra Ramachandra

(参考訳) アイリス提示攻撃検知(IPAD)は、個人認証の確保に不可欠であり、アイリス認識システムとして広く利用されている。しかし、既存のIPADアルゴリズムは、制約のない環境でのキャプチャと、ボナフィドとアタックサンプル間の高い視覚的相関のため、目に見えない領域とクロスドメインのシナリオをうまく一般化しない。虹彩画像の複雑なテクスチャおよび形態パターンにおけるこれらの類似性は、さらなる性能劣化に寄与する。そこで本稿では,これらの欠点を解消するために,局所的に分布する虹彩パターンをグローバルに配置したdfcanet: dense feature calibration and attention guided networkを提案する。 DFCANetは特徴校正畳み込みと残差学習の利点を高め、ドメイン固有の虹彩特徴表現を生成する。キャリブレーションされた特徴マップのいくつかのチャネルは、より顕著な情報を含んでいるため、チャネルアテンション機構を通じてチャネルを横断する識別的特徴学習を行う。提案モデルの課題を強化するため,DFCANetを非切除眼および非正常眼虹彩画像上で動作させる。クロスドメインとドメイン内シナリオに対する広範囲な実験は、一貫性のある結果を示している。 DFCANetは最先端の手法と比較して,ベンチマークIIITD CLI, IIIT CSD, NDCLD13データベースのパフォーマンスが大幅に向上した。さらに,不連続な虹彩データの特徴とデータ不足を克服するために,新しいインクリメンタル学習ベースの手法が導入された。また,攻撃カテゴリのソフトレンズを様々なクロスドメインプロトコルで評価する難易度シナリオも追求する。コードは公開される予定だ。

Iris presentation attack detection (IPAD) is essential for securing personal identity in widely used iris recognition systems. However, existing IPAD algorithms do not generalize well to unseen and cross-domain scenarios because of capture in unconstrained environments and the high visual correlation between bona fide and attack samples. These similarities in intricate textural and morphological patterns of iris ocular images contribute further to performance degradation. To alleviate these shortcomings, this paper proposes DFCANet: Dense Feature Calibration and Attention Guided Network, which calibrates the locally spread iris patterns with the globally located ones. Leveraging the advantages of feature calibration convolution and residual learning, DFCANet generates domain-specific iris feature representations. Since some channels in the calibrated feature maps contain more prominent information, we capitalize on discriminative feature learning across the channels through the channel attention mechanism. In order to intensify the challenge for our proposed model, we make DFCANet operate over non-segmented and non-normalized ocular iris images. Extensive experimentation conducted over challenging cross-domain and intra-domain scenarios shows consistently superior results. Compared to state-of-the-art methods, DFCANet achieves significant gains in performance on the benchmark IIITD CLI, IIIT CSD and NDCLD13 databases. Further, a novel incremental learning-based methodology has been introduced so as to overcome disentangled iris-data characteristics and data scarcity. This paper also pursues the challenging scenario that considers soft-lens under the attack category with evaluation performed under various cross-domain protocols. The code will be made publicly available.

翻訳日:2021-11-02 15:32:32 公開日:2021-11-01

# (参考訳) 深層学習に適合する論理規則: 船型分類の新しいアプローチ

Logic Rules Meet Deep Learning: A Novel Approach for Ship Type Classification ( http://arxiv.org/abs/2111.01042v1 )

ライセンス: CC BY 4.0

Manolis Pitsikalis, Thanh-Toan Do, Alexei Lisitsa and Shan Luo

(参考訳) 海運産業は、国際貿易と経済の重要な要素であるが、法律の遵守と安全性を確保するためには、監視が必要である。本稿では, 自動識別システムから送信された船舶データと, 船舶画像とを組み合わせた, 船舶型分類モデルを提案する。我々のアプローチの主な構成要素は、より高速なR-CNNディープニューラルネットワークとIF-THENルールを備えたニューロファジィシステムである。実世界のデータを用いてモデルを評価し、この組み合わせの利点を示しながら、他の手法と比較する。その結果,本モデルでは,ブラックボックスのアプローチとは対照的に説明可能性のレベルを維持しつつ,検討した次のベストモデルと比較して,予測スコアを最大15.4%向上させることができることがわかった。

The shipping industry is an important component of global trade and the economy; however, in order to ensure law compliance and safety, it needs to be monitored. In this paper, we present a novel ship type classification model that combines vessel data transmitted by the Automatic Identification System with vessel imagery. The main components of our approach are the Faster R-CNN Deep Neural Network and a Neuro-Fuzzy system with IF-THEN rules. We evaluate our model using real-world data and showcase the advantages of this combination while also comparing it with other methods. Results show that our model can increase prediction scores by up to 15.4% when compared with the next best model we considered, while also maintaining a level of explainability as opposed to common black box approaches.

翻訳日:2021-11-02 15:31:05 公開日:2021-11-01

# 都市知識グラフによる知識駆動サイト選択

Knowledge-driven Site Selection via Urban Knowledge Graph ( http://arxiv.org/abs/2111.00787v1 )

ライセンス: Link先を確認

Yu Liu, Jingtao Ding, Yong Li

(参考訳) サイト選択は、ビジネスの成功にとって重要な新しい店舗の最適な場所を決定する。特に、多元都市データを用いた人工知能の幅広い応用は、インテリジェントなサイト選択を有望にする。しかし、既存のデータ駆動手法は機能工学に大きく依存しており、ビジネスの一般化と複雑な関係モデリングの問題に直面している。ジレンマを除去するために,本研究では知識グラフ(KG)からアイデアを借用し,KnowSiteの略であるサイト選択のための知識駆動モデルを提案する。具体的には, 蒸留知識と高次意味論に動機づけられ, まず, 都市の重要要素と意味関係を捉えた都市kg (urbankg) を構築した。 UrbanKGに基づいて, サイト決定のためのエンコーダ・デコーダ構造に入力される意味表現の事前学習手法を採用する。マルチリレーショナルメッセージパッシングとリレーショナルパスに基づくアテンション機構を開発したKnowSiteは,各種ビジネスとサイト選択基準との関係を明らかにする。 2つのデータセットに対する大規模な実験により、KnowSiteは、有効性と説明可能性の両方で代表ベースラインを上回っていることが示された。

Site selection determines optimal locations for new stores, which is of crucial importance to business success. In particular, the wide application of artificial intelligence with multi-source urban data makes intelligent site selection promising. However, existing data-driven methods rely heavily on feature engineering and face the issues of business generalization and complex relationship modeling. To resolve this dilemma, in this work we borrow ideas from knowledge graphs (KG) and propose a knowledge-driven model for site selection, KnowSite for short. Specifically, motivated by the distilled knowledge and rich semantics in KG, we first construct an urban KG (UrbanKG) that captures cities' key elements and semantic relationships. Based on UrbanKG, we employ pre-training techniques for semantic representations, which are fed into an encoder-decoder structure for site decisions. With multi-relational message passing and a relation path-based attention mechanism, KnowSite successfully reveals the relationship between various businesses and site selection criteria. Extensive experiments on two datasets demonstrate that KnowSite outperforms representative baselines in both effectiveness and explainability.

翻訳日:2021-11-02 15:18:36 公開日:2021-11-01

# 五目(ごもく):ゲームとプレイヤーワインの分析

Gomoku: analysis of the game and of the player Wine ( http://arxiv.org/abs/2111.01016v1 )

ライセンス: Link先を確認

Lorenzo Piazzo, Michele Scarpiniti and Enzo Baccarelli

(参考訳) 五目(ごもく)は、古典的なボードゲームで、新しい人工知能(AI)技術を試すのに理想的に適している。本報告では,新たなゴモクプレイヤーの作成を希望する開発者を支援することを目的として,既存のゲームよりも広く,より深いゲームコンセプトと戦略の分析を行う。また,人工的プレーヤの一般構造について論じた上で,インターネット上で自由に利用でき,現代的プレーヤの組織化方法の優れた例である,ワインという名の強い五目プレーヤを提示・分析した。

Gomoku, also known as five in a row, is a classical board game, ideally suited for quickly testing novel Artificial Intelligence (AI) techniques. With the aim of facilitating a developer willing to write a new Gomoku player, in this report we present an analysis of the main game concepts and strategies, which is wider and deeper than existing ones. Moreover, after discussing the general structure of an artificial player, we present and analyse a strong Gomoku player, named Wine, the code of which is freely available on the Internet and which is an excellent example of how a modern player is organised.

翻訳日:2021-11-02 15:18:17 公開日:2021-11-01

# (参考訳) FREGAN : ビデオのフレームレート向上のための生成的敵ネットワークの応用

FREGAN : an application of generative adversarial networks in enhancing the frame rate of videos ( http://arxiv.org/abs/2111.01105v1 )

ライセンス: CC BY 4.0

Rishik Mishra, Neeraj Gupta, Nitya Shukla

(参考訳) デジタルビデオは個々のフレームの集合であり、シーンが各フレームのタイムスライスを利用した映像をストリーミングする。高いリフレッシュレートと高いフレームレートは、すべてのハイテクアプリケーションの要求である。ビデオのアクショントラッキングは簡単になり、リフレッシュレートが高いため、ゲームアプリケーションでは動きがスムーズになる。画面に表示されるフレーム間の時間が少なくなるため、より高速なレスポンスを提供する。 fregan(frame rate enhancement generative adversarial network)モデルが提案されており、過去のフレームのシーケンスに基づいてビデオシーケンスの将来フレームを予測する。本稿では,ganモデルについて検討し,ビデオのフレームレート向上のためにfreganを提案する。提案手法では,フーバー損失を損失関数として用いた。超解像に優れた結果をもたらし,フレームレート向上の応用において,その性能を再現しようと試みた。標準データセット(ucf101およびrfree500)における提案モデルの有効性を検証した。実験の結果,提案モデルはピーク信号対雑音比(psnr)34.94,構造類似度指数(ssim)0.95であった。

A digital video is a collection of individual frames; when the video is streamed, each frame of the scene occupies its own time slice. High refresh rates and high frame rates are in demand across high-technology applications. With a high refresh rate, action tracking in videos becomes easier and motion becomes smoother in gaming applications. It also provides a faster response because less time elapses between the frames displayed on the screen. We propose the FREGAN (Frame Rate Enhancement Generative Adversarial Network) model, which predicts future frames of a video sequence based on a sequence of past frames. In this paper, we investigate the GAN model and propose FREGAN for the enhancement of frame rate in videos. We use the Huber loss as the loss function in the proposed FREGAN. It has provided excellent results in super-resolution, and we have tried to replicate that performance in the application of frame rate enhancement. We have validated the effectiveness of the proposed model on the standard datasets (UCF101 and RFree500). The experimental outcomes illustrate that the proposed model has a Peak signal-to-noise ratio (PSNR) of 34.94 and a Structural Similarity Index (SSIM) of 0.95.

翻訳日:2021-11-02 15:12:45 公開日:2021-11-01
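
The FREGAN abstract above names the Huber loss as its training loss and PSNR as one of its metrics. As a reminder of the textbook definitions, here is a minimal sketch. The function names, the `delta` threshold, and the 8-bit `max_value` of 255 are illustrative assumptions, not values taken from the paper.

```python
import math


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

The linear branch of the Huber loss makes it less sensitive to large per-pixel errors than a squared-error loss, while the quadratic branch keeps it smooth near zero.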

# 弱ショット学習のためのインフルエンシャル・プロトタイプネットワーク : 皮膚科における事例研究

Influential Prototypical Networks for Few Shot Learning: A Dermatological Case Study ( http://arxiv.org/abs/2111.00698v1 )

ライセンス: Link先を確認

Ranjana Roy Chowdhury, Deepti R. Bathula

(参考訳) プロトタイプネットワーク(PN)は単純だが効果的なショットラーニング戦略である。ユークリッド距離を計算して各クラスの原型表現に分類する,メートル法に基づくメタラーニング手法である。従来のpn属性はすべてのサンプルに等しく重要であり、各クラスに属するサポートサンプル埋め込みを平均化することによってプロトタイプを生成する。そこで本研究では, 支持試料分布への影響に対応する試料に重みを付与するPNの新たなバージョンを提案する。試料を含む試料分布の平均埋込量と試料を除いた最大平均差(mmd)に基づいて試料の影響重みを算出する。提案するipnetの包括的評価は,3種類のベンチマークdermatological datasetにおける他のpnsとの比較により行った。 ipnetは、3つのデータセットと様々なn-way、k-shot分類タスクで魅力的な結果を持つすべてのベースラインモデルを上回る。クロスドメイン適応実験からの発見により、IPNetの堅牢性と一般化性がさらに確立される。

Prototypical network (PN) is a simple yet effective few shot learning strategy. It is a metric-based meta-learning technique where classification is performed by computing Euclidean distances to prototypical representations of each class. Conventional PN attributes equal importance to all samples and generates prototypes by simply averaging the support sample embeddings belonging to each class. In this work, we propose a novel version of PN that attributes weights to support samples corresponding to their influence on the support sample distribution. Influence weights of samples are calculated based on maximum mean discrepancy (MMD) between the mean embeddings of sample distributions including and excluding the sample. Comprehensive evaluation of our proposed influential PN (IPNet) is performed by comparing its performance with other baseline PNs on three different benchmark dermatological datasets. IPNet outperforms all baseline models with compelling results across all three datasets and various N-way, K-shot classification tasks. Findings from cross-domain adaptation experiments further establish the robustness and generalizability of IPNet.
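
The weighting idea in this abstract can be illustrated with a small, hedged sketch. It assumes a linear kernel, under which the MMD between the support set with and without a sample reduces to the distance between the two mean embeddings. The mapping from influence to weight (a softmax over negative influence) and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def influence_weights(support):
    """Weights for the support embeddings of one class (needs >= 2 samples)."""
    full_mean = mean(support)
    influences = []
    for i in range(len(support)):
        rest = support[:i] + support[i + 1:]
        # Linear-kernel MMD: distance between mean embeddings with/without sample i.
        influences.append(dist(full_mean, mean(rest)))
    # Assumption: samples that shift the mean less receive larger weights.
    scores = [math.exp(-value) for value in influences]
    norm = sum(scores)
    return [s / norm for s in scores]


def weighted_prototype(support):
    """Influence-weighted average of the support embeddings."""
    weights = influence_weights(support)
    dim = len(support[0])
    return [sum(w * v[d] for w, v in zip(weights, support)) for d in range(dim)]


def classify(query, support_by_class):
    """Assign the query to the class with the nearest weighted prototype."""
    protos = {label: weighted_prototype(s) for label, s in support_by_class.items()}
    return min(protos, key=lambda label: dist(query, protos[label]))
```

With uniform weights this reduces to a conventional prototypical network, which makes the sketch a convenient baseline for comparing weighting schemes.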

翻訳日:2021-11-02 15:02:30 公開日:2021-11-01

# 単一項目ファッションレコメンデーション:クロスドメインレコメンデーションに向けて

Single-Item Fashion Recommender: Towards Cross-Domain Recommendations ( http://arxiv.org/abs/2111.00758v1 )

ライセンス: Link先を確認

Seyed Omid Mohammadi, Hossein Bodaghi, Ahmad Kalhor (University of Tehran, College of Engineering, School of Electrical and Computer Engineering, Tehran, Iran)

(参考訳) 現在、レコメンダシステムと検索エンジンはファッションeコマースにおいて不可欠な役割を担っている。それでも、多くの課題があり、この研究はいくつかの課題に取り組みます。この記事ではまず,並列ニューラルネットワークを用いて1つのファッションアイテムショップイメージを入力として,店舗で利用可能な類似アイテムをリストアップしてショップ内レコメンデーションを行う,コンテンツベースのファッションレコメンデーションシステムを提案する。次に、ユーザの好みに基づいて結果をパーソナライズするように、同じ構造が強化される。この研究は、ドメイン外のクエリに対してより堅牢なシステムを実現するバックグラウンド拡張技術を導入し、カタログショップイメージのトレーニングセットのみを使用して、ストリート・ツー・ショップのレコメンデーションを可能にする。さらに,本論文の最後の貢献は,客観的人間得点と呼ばれるレコメンデーションタスクのための新しい評価指標である。この方法は、人間のスコアラーの主観評価から解釈可能で比較可能なスコアを生成する、完全にカスタマイズ可能なフレームワークである。

Nowadays, recommender systems and search engines play an integral role in fashion e-commerce. Still, many challenges lie ahead, and this study tries to tackle some. This article first suggests a content-based fashion recommender system that uses a parallel neural network to take a single fashion item shop image as input and make in-shop recommendations by listing similar items available in the store. Next, the same structure is enhanced to personalize the results based on user preferences. This work then introduces a background augmentation technique that makes the system more robust to out-of-domain queries, enabling it to make street-to-shop recommendations using only a training set of catalog shop images. Moreover, the last contribution of this paper is a new evaluation metric for recommendation tasks called objective-guided human score. This method is an entirely customizable framework that produces interpretable, comparable scores from subjective evaluations of human scorers.

翻訳日:2021-11-02 15:02:15 公開日:2021-11-01

# 線形時間不変系の安全学習

Safe Learning of Linear Time-Invariant Systems ( http://arxiv.org/abs/2111.00631v1 )

ライセンス: Link先を確認

Farhad Farokhi, Alex S. Leong, Mohammad Zamani, Iman Shames

(参考訳) 離散時間線形時間不変システムの同時学習と制御における安全性を検討する。利用状態の測定回数に基づいて,システムの学習モデルに基づく厳密な信頼性境界を提供する。これらの境界は、潜在的に時間的制約のある最適化問題によってシステムへの制御入力を変更するために使用される。安全性に制約のある最適化が実現可能な解決策が存在する場合, 安全セットを最小限の確率で退避させることが証明できる。この最適化問題は、学習中のモデルの不確実性を考慮した安全制約を厳格化することにより、より計算に優しい形式に再構成される。学習モデルの信頼性が向上するにつれて、締め付けは減少する。最終的に、励起の持続下では、より多くの測定値が収集されるにつれて、締め付けは無視される。

We consider safety in simultaneous learning and control of discrete-time linear time-invariant systems. We provide rigorous confidence bounds on the learned model of the system based on the number of utilized state measurements. These bounds are used to modify control inputs to the system via an optimization problem with potentially time-varying safety constraints. We prove that the state can only exit the safe set with small probability, provided a feasible solution to the safety-constrained optimization exists. This optimization problem is then reformulated in a more computationally-friendly format by tightening the safety constraints to account for model uncertainty during learning. The tightening decreases as the confidence in the learned model improves. We finally prove that, under persistence of excitation, the tightening becomes negligible as more measurements are gathered.

翻訳日:2021-11-02 15:00:13 公開日:2021-11-01

# SADGA: Text-to-SQLのための構造対応デュアルグラフ集約ネットワーク

SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL ( http://arxiv.org/abs/2111.00653v1 )

ライセンス: Link先を確認

Ruichu Cai, Jinjie Yuan, Boyan Xu, Zhifeng Hao

(参考訳) 質問の自然言語をSQLクエリに翻訳することを目的としたText-to-SQLタスクが最近注目を集めている。 Text-to-SQLの最も難しい問題のひとつは、トレーニング済みモデルを未知のデータベーススキーマに一般化する方法であり、これはクロスドメインText-to-SQLタスクとも呼ばれる。鍵となるのは、(i)質問とデータベーススキーマをモデル化する符号化方法と、(ii)質問中の単語とデータベーススキーマのテーブル/カラム間のマッピングを学ぶための質問・スキーマリンク手法の一般化可能性である。上述の2つの課題に着目して、クロスドメインText-to-SQLのための構造対応デュアルグラフ集約ネットワーク(SADGA)を提案する。 SADGAでは、自然言語の質問とデータベーススキーマの両方に統一的な符号化モデルを提供するために、グラフ構造を採用する。提案する統一モデリングに基づいて,構造認識型アグリゲーション手法をさらに考案し,質問グラフとスキーマグラフのマッピングを学習する。本手法は,Global Graph Linking,Local Graph Linking,Dual-Graph Aggregation Mechanismを特徴とする。提案手法の性能を実証的に調べるだけでなく、難易度の高いText-to-SQLベンチマークであるSpiderにおいて、執筆時点で3位を達成した。

The Text-to-SQL task, aiming to translate the natural language of the questions into SQL queries, has drawn much attention recently. One of the most challenging problems of Text-to-SQL is how to generalize the trained model to the unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method to model the question and the database schema and (ii) the question-schema linking method to learn the mapping between words in the question and tables/columns in the database schema. Focusing on the above two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt the graph structure to provide a unified encoding model for both the natural language question and database schema. Based on the proposed unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question-graph and schema-graph. The structure-aware aggregation method is featured with Global Graph Linking, Local Graph Linking, and Dual-Graph Aggregation Mechanism. We not only study the performance of our proposal empirically but also achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.

翻訳日:2021-11-02 14:58:50 公開日:2021-11-01

# Adapterによる教師なしドメイン適応

Unsupervised Domain Adaptation with Adapter ( http://arxiv.org/abs/2111.00667v1 )

ライセンス: Link先を確認

Rongsheng Zhang, Yinhe Zheng, Xiaoxi Mao, Minlie Huang

(参考訳) 事前学習言語モデル(PrLM)を用いた教師なしドメイン適応(UDA)は、これらの事前学習モデルが様々なドメインから学んだ一般的な知識を組み込んでいるため、有望な結果を得た。しかし、小さなドメイン固有のコーパス上でPrLMの全てのパラメータを微調整することは、学習された汎用知識を歪め、また各ドメインに微調整されたPrLM全体を配置するコストも高くなる。本稿では,教師なしドメイン適応のためのアダプタベースの微調整手法について検討する。具体的には、いくつかのトレーニング可能なアダプタモジュールをPrLMに挿入し、元のPrLMのパラメータを微調整時に固定することで、組み込みの汎用知識を保存する。転移可能な特徴をよりよく捉えるために、混合ドメインコーパスを用いてこれらのアダプタを訓練するドメイン融合スキームを導入する。 2つのベンチマークデータセットに関する詳細な実験を行い,提案手法が異なるタスク,データセットサイズ,ドメイン類似性において有効であることを示す。

Unsupervised domain adaptation (UDA) with pre-trained language models (PrLM) has achieved promising results since these pre-trained models embed generic knowledge learned from various domains. However, fine-tuning all the parameters of the PrLM on a small domain-specific corpus distorts the learned generic knowledge, and it is also expensive to deploy a whole fine-tuned PrLM for each domain. This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation. Specifically, several trainable adapter modules are inserted in a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM at fine-tuning. A domain-fusion scheme is introduced to train these adapters using a mix-domain corpus to better capture transferable features. Elaborate experiments on two benchmark datasets are carried out, and the results demonstrate that our approach is effective with different tasks, dataset sizes, and domain similarities.

翻訳日:2021-11-02 14:57:24 公開日:2021-11-01

# 非対格動詞と非能格動詞の教師なし発見

Unsupervised Discovery of Unaccusative and Unergative Verbs ( http://arxiv.org/abs/2111.00808v1 )

ライセンス: Link先を確認

Sharid Loáiciga, Luca Bevacqua, Christian Hardmeier

(参考訳) 英語の非能格動詞と非対格動詞を教師なしで検出する手法を提案する。これらのカテゴリにより、動詞の意味役割を知らなくても、使役・起動交替に関与する動詞を識別できる。この手法は、候補動詞の自動詞文の変種を生成し、言語モデルをプロービングすることに基づく。アノテーション付きリソースに依存しないという利点を備えつつ、類似のアプローチと同等の結果を得た。

We present an unsupervised method to detect English unergative and unaccusative verbs. These categories allow us to identify verbs participating in the causative-inchoative alternation without knowing the semantic roles of the verb. The method is based on the generation of intransitive sentence variants of candidate verbs and probing a language model. We obtained results on par with similar approaches, with the added benefit of not relying on annotated resources.

翻訳日:2021-11-02 14:57:06 公開日:2021-11-01

# スパン抽出のためのラベル知識を用いた拡張言語表現

Enhanced Language Representation with Label Knowledge for Span Extraction ( http://arxiv.org/abs/2111.00884v1 )

ライセンス: Link先を確認

Pan Yang, Xin Cong, Zhenyun Sun, Xingwu Liu

(参考訳) プレーンテキストからテキストスパン(単語やフレーズなど)を抽出することを目的としたスパン抽出は、情報抽出の基本的なプロセスである。近年の研究では,スパン抽出タスクを質問応答問題として形式化すること(QA形式化)で,テキスト表現を強化するラベル知識を導入し,最先端のパフォーマンスを実現している。しかし、QA形式化はラベルの知識を完全に活用せず、トレーニング/推論の効率が低い。これらの問題に対処するために,ラベル知識を統合する新しいパラダイムを導入し,ラベル知識をテキスト表現に明示的かつ効率的に統合する新しいモデルを提案する。具体的には、テキストとラベルアノテーションを独立してエンコードし、精巧に設計された意味融合モジュールによってラベルの知識をテキスト表現に統合する。我々は,フラットNER,ネストNER,イベント検出の3つの典型的なスパン抽出タスクについて広範な実験を行った。実験結果は、1) 提案手法が4つのベンチマークで最先端の性能を実現すること、2) QA形式化パラダイムと比較して、トレーニング時間と推論時間を平均でそれぞれ76%、77%削減することを示している。コードとデータは https://github.com/Akeepers/LEAR から入手できる。

Span extraction, aiming to extract text spans (such as words or phrases) from plain texts, is a fundamental process in Information Extraction. Recent works introduce the label knowledge to enhance the text representation by formalizing the span extraction task into a question answering problem (QA Formalization), which achieves state-of-the-art performance. However, QA Formalization does not fully exploit the label knowledge and suffers from low efficiency in training/inference. To address those problems, we introduce a new paradigm to integrate label knowledge and further propose a novel model to explicitly and efficiently integrate label knowledge into text representations. Specifically, it encodes texts and label annotations independently and then integrates label knowledge into text representation with an elaborate-designed semantics fusion module. We conduct extensive experiments on three typical span extraction tasks: flat NER, nested NER, and event detection. The empirical results show that 1) our method achieves state-of-the-art performance on four benchmarks, and 2) reduces training time and inference time by 76% and 77% on average, respectively, compared with the QA Formalization paradigm. Our code and data are available at https://github.com/Akeepers/LEAR.

翻訳日:2021-11-02 14:57:01 公開日:2021-11-01

# トランスフォーマーモデルを用いた言語間ヘイトスピーチ検出

Cross-lingual Hate Speech Detection using Transformer Models ( http://arxiv.org/abs/2111.00981v1 )

ライセンス: Link先を確認

Teodor Tița, Arkaitz Zubiaga

(参考訳) 言語横断設定におけるヘイトスピーチ検出は、中規模および大規模オンラインプラットフォームにおいて最も関心を寄せる分野である。この問題をグローバルな規模で適切に解決できないことは、道徳的に疑わしい現実の出来事、人間の死、そして憎悪そのものの永続性につながってきた。本稿では、この重要なソーシャルデータサイエンス課題について、英語からフランス語へ、その逆方向、および各言語単独での学習を行い、微調整および改変を加えた多言語トランスフォーマーモデル(mBERT, XLM-RoBERTa)の能力を示す。反復的改善と比較誤差分析に関する節も含む。

Hate speech detection within a cross-lingual setting represents a paramount area of interest for all medium and large-scale online platforms. Failing to properly address this issue on a global scale has already led over time to morally questionable real-life events, human deaths, and the perpetuation of hate itself. This paper illustrates the capabilities of fine-tuned altered multi-lingual Transformer models (mBERT, XLM-RoBERTa) regarding this crucial social data science task with cross-lingual training from English to French, vice-versa and each language on its own, including sections about iterative improvement and comparative error analysis.

翻訳日:2021-11-02 14:56:44 公開日:2021-11-01

# (参考訳) cGANの分類と非分類の統一的視点

A Unified View of cGANs with and without Classifiers ( http://arxiv.org/abs/2111.01035v1 )

ライセンス: CC BY 4.0

Si-An Chen, Chun-Liang Li, Hsuan-Tien Lin

(参考訳) Conditional Generative Adversarial Networks (cGANs) は、クラス条件分布からサンプリングできる暗黙の生成モデルである。既存のcGANは幅広い異なる識別器の設計と訓練目的に基づいている。初期の研究で一般的な設計の一つは、良い分類器が間違ったクラスで生成されたサンプルを除去するのに役立つと仮定して、トレーニング中に分類器を含めることである。しかし、cGANに分類器を含めると、容易に分類できるサンプルだけを生成する副作用が生じることが多い。近年、いくつかの代表的cGANはこの欠点を回避し、分類器を使わずに最先端の性能に到達している。分類器を復活させてより良いcGANを設計できるかどうかは、依然として不明である。本研究では,cGANを改善するために,分類器を適切に活用できることを実証する。まず、結合確率分布の分解を用いて、cGANの目標と分類の目標を統一的なフレームワークとして結びつける。このフレームワークは、分布をパラメータ化するための古典的なエネルギーモデルとともに、cGANに対する分類器の使用を原則的に正当化する。 ACGAN、ProjGAN、ContraGANなどの一般的なcGAN変種を、異なるレベルの近似を持つ特別なケースとして説明し、cGANの理解に統一的な視点と新たな洞察をもたらす。実験の結果,提案したフレームワークにインスパイアされた設計は,複数のベンチマークデータセット,特に最も困難なImageNetにおいて,最先端のcGANよりも優れていた。コードはhttps://github.com/sian-chen/PyTorch-ECGANで公開されている。

Conditional Generative Adversarial Networks (cGANs) are implicit generative models that allow sampling from class-conditional distributions. Existing cGANs are based on a wide range of different discriminator designs and training objectives. One popular design in earlier works is to include a classifier during training with the assumption that good classifiers can help eliminate samples generated with wrong classes. Nevertheless, including classifiers in cGANs often comes with a side effect of only generating easy-to-classify samples. Recently, some representative cGANs avoid the shortcoming and reach state-of-the-art performance without having classifiers. Somehow it remains unanswered whether the classifiers can be resurrected to design better cGANs. In this work, we demonstrate that classifiers can be properly leveraged to improve cGANs. We start by using the decomposition of the joint probability distribution to connect the goals of cGANs and classification as a unified framework. The framework, along with a classic energy model to parameterize distributions, justifies the use of classifiers for cGANs in a principled manner. It explains several popular cGAN variants, such as ACGAN, ProjGAN, and ContraGAN, as special cases with different levels of approximations, which provides a unified view and brings new insights to understanding cGANs. Experimental results demonstrate that the design inspired by the proposed framework outperforms state-of-the-art cGANs on multiple benchmark datasets, especially on the most challenging ImageNet. The code is available at https://github.com/sian-chen/PyTorch-ECGAN.

翻訳日:2021-11-02 14:54:43 公開日:2021-11-01

# RMNet: ネットワークから残差接続を等価に取り除く

RMNet: Equivalently Removing Residual Connection from Networks ( http://arxiv.org/abs/2111.00687v1 )

ライセンス: Link先を確認

Fanxu Meng, Hao Cheng, Jiaxin Zhuang, Ke Li, Xing Sun

(参考訳) 残差接続は、非常に深いニューラルネットワークのトレーニングを可能にするが、マルチブランチトポロジーのため、オンライン推論には適さない。このため、多くの研究者が推論時に残差接続を持たないDNNの設計に取り組んでいる。例えば、RepVGGはデプロイ時にマルチブランチトポロジをVGGライクな(シングルブランチ)モデルに再パラメータ化し、ネットワークが比較的浅い場合に優れたパフォーマンスを示す。しかし、RepVGGはResNetをVGGに等価に変換することはできない。なぜなら再パラメータ化法は線形ブロックにのみ適用でき、非線形層(ReLU)を残差接続の外に置く必要があるため、特に深いネットワークにおいて表現能力が制限されるからである。本稿では,この問題を解決し,ResBlock上でのRM(reserving and merging)操作により,バニラResNetの残差接続を等価に除去することを提案する。具体的には、RM操作により、入力特徴マップが情報を保持したままブロックを通過し、ブロックの最後に全ての情報をマージすることで、元の出力を変更することなく残差接続を除去することができる。プラグイン手法としてRM操作は基本的に3つの利点がある。 1)その実装により、高比率のネットワークプルーニングに自然に適している。 2) RepVGG の深さ制限を破るのに役立つ。 3) ResNet や RepVGG と比較して,精度と速度のトレードオフに優れたネットワーク(RMNet)を実現する。 RM操作の考え方は、将来、コミュニティのモデル設計に関する多くの洞察を刺激できると信じている。コードは https://github.com/fxmeng/RMNet で公開されている。

Although residual connection enables training very deep neural networks, it is not friendly for online inference due to its multi-branch topology. This encourages many researchers to work on designing DNNs without residual connections at inference. For example, RepVGG re-parameterizes multi-branch topology to a VGG-like (single-branch) model when deploying, showing great performance when the network is relatively shallow. However, RepVGG can not transform ResNet to VGG equivalently because re-parameterizing methods can only be applied to linear blocks and the non-linear layers (ReLU) have to be put outside of the residual connection which results in limited representation ability, especially for deeper networks. In this paper, we aim to remedy this problem and propose to remove the residual connection in a vanilla ResNet equivalently by a reserving and merging (RM) operation on ResBlock. Specifically, the RM operation allows input feature maps to pass through the block while reserving their information and merges all the information at the end of each block, which can remove residual connections without changing the original output. As a plug-in method, RM Operation basically has three advantages: 1) its implementation makes it naturally friendly for high ratio network pruning. 2) it helps break the depth limitation of RepVGG. 3) it leads to better accuracy-speed trade-off network (RMNet) compared to ResNet and RepVGG. We believe the ideology of RM Operation can inspire many insights on model design for the community in the future. Code is available at: https://github.com/fxmeng/RMNet.

翻訳日:2021-11-02 14:34:45 公開日:2021-11-01

# プロジェクションされたGANがより速く収束

Projected GANs Converge Faster ( http://arxiv.org/abs/2111.01007v1 )

ライセンス: Link先を確認

Axel Sauer, Kashyap Chitta, Jens Müller, Andreas Geiger

(参考訳) GAN(Generative Adversarial Networks)は高品質な画像を生成するが、訓練は難しい。注意深い正規化、大量の計算、高価なハイパーパラメータスイープが必要である。生成したサンプルと実際のサンプルを固定された事前訓練済みの特徴空間に投影することで、これらの問題を大きく前進させる。判別器は事前訓練されたモデルの深い層の特徴を完全には活用できないという発見に基づき、チャネルと解像度をまたいで特徴を混合する、より効果的な戦略を提案する。我々のProjected GANは画像品質、サンプル効率、収束速度を改善する。さらに、最大1メガピクセルの解像度と互換性があり、22のベンチマークデータセット上で最先端のFréchet Inception Distance(FID)を前進させる。重要なことは、Projected GANはそれまでの最低値のFIDに最大40倍の速さで到達し、同じ計算リソースでウォールクロック時間を5日から3時間未満に短縮する。

Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of compute, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pretrained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected GAN improves image quality, sample efficiency, and convergence speed. It is further compatible with resolutions of up to one Megapixel and advances the state-of-the-art Fréchet Inception Distance (FID) on twenty-two benchmark datasets. Importantly, Projected GANs match the previously lowest FIDs up to 40 times faster, cutting the wall-clock time from 5 days to less than 3 hours given the same computational resources.

翻訳日:2021-11-02 14:33:59 公開日:2021-11-01

# リレーショナルパスルールマイニングを用いた知識グラフ埋め込みのためのトランスダクティブデータ拡張

Transductive Data Augmentation with Relational Path Rule Mining for Knowledge Graph Embedding ( http://arxiv.org/abs/2111.00974v1 )

ライセンス: Link先を確認

Yushi Hirose, Masashi Shimbo, Taro Watanabe

(参考訳) 知識グラフの完成には、グラフ埋め込みに基づくものと関係経路規則帰納に基づくものという2つの主要な予測モデルが存在する。それぞれ異なる利点と欠点がある。両タイプを生かしたハイブリッドモデルが最近提案されている。ハイブリッドモデルの1つであるUniKERは、リレーショナルパスルールによってトレーニングデータを交互に拡張し、埋め込みモデルを訓練する。その高い予測精度にもかかわらず、拡張データの品質を維持するために低信頼ルールを無視しているため、関係パスルールを十分に活用していない。この制限を緩和するため,関係経路規則と拡張データの信頼度に基づく重み付けによるトランスダクティブデータ拡張を提案する。その結果,本手法は真の回答や類似したエンティティを含むデータを追加することで,組込みモデルの性能を効果的に向上できることがわかった。

For knowledge graph completion, two major types of prediction models exist: one based on graph embeddings, and the other based on relation path rule induction. They have different advantages and disadvantages. To take advantage of both types, hybrid models have been proposed recently. One of the hybrid models, UniKER, alternately augments training data by relation path rules and trains an embedding model. Despite its high prediction accuracy, it does not take full advantage of relation path rules, as it disregards low-confidence rules in order to maintain the quality of augmented data. To mitigate this limitation, we propose transductive data augmentation by relation path rules and confidence-based weighting of augmented data. The results and analysis show that our proposed method effectively improves the performance of the embedding model by augmenting data that include true answers or entities similar to them.
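
To make the augmentation step more concrete, the following minimal sketch applies a single two-hop relation path rule to a set of triples and attaches the rule confidence to each inferred triple as a training weight. The rule format, the function name `augment_with_path_rule`, and the use of the raw rule confidence as the weight are illustrative assumptions; the abstract does not specify the exact weighting scheme.

```python
def augment_with_path_rule(triples, body, head, confidence):
    """Apply the rule body[0](x, y) AND body[1](y, z) => head(x, z).

    Returns (head_entity, relation, tail_entity, weight) tuples for inferred
    triples that are not already present; each weight is the rule confidence.
    Hypothetical helper for illustration only.
    """
    known = set(triples)
    # Index the second hop by its head entity for a simple join on y.
    second_hop = {}
    for h, r, t in triples:
        if r == body[1]:
            second_hop.setdefault(h, []).append(t)
    augmented = {}
    for x, r, y in triples:
        if r != body[0]:
            continue
        for z in second_hop.get(y, []):
            candidate = (x, head, z)
            if candidate not in known:
                augmented[candidate] = confidence
    return [(h, r, t, w) for (h, r, t), w in sorted(augmented.items())]
```

For example, given `born_in(alice, paris)` and `city_of(paris, france)`, the rule `born_in AND city_of => nationality` with confidence 0.8 would add `nationality(alice, france)` with weight 0.8. Keeping low-confidence rules but down-weighting their outputs is the point of contrast with discarding such rules outright.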

翻訳日:2021-11-02 14:29:19 公開日:2021-11-01

# スペクトル距離によるグラフ構造攻撃

Graph Structural Attack by Spectral Distance ( http://arxiv.org/abs/2111.00684v1 )

ライセンス: Link先を確認

Lu Lin, Ethan Blaser and Hongning Wang

(参考訳) グラフ畳み込みネットワーク(GCNs)は、グラフ学習タスクにおける優れたパフォーマンスのため、関心が高まりつつあるが、敵攻撃に対する脆弱性も示されている。本稿では,フーリエ領域におけるグラフスペクトルフィルタの破壊に有効なグラフ構造攻撃について検討する。スペクトルフィルタの破壊を測定するために、グラフラプラシアンの固有値に基づいてスペクトル距離を定義する。次に,タスク固有の攻撃目標と提案したスペクトル距離を同時に最大化し,エッジ摂動を生成する。実験は、トレーニング時間とテスト時間の両方において、ホワイトボックス設定における提案された攻撃の有効性を示す。筆者らの定性的分析は、攻撃行動とスペクトル分布の強制的な変化の関連性を示し、スペクトル距離の最大化が空間領域におけるグラフの構造特性の変化とフーリエ領域における周波数成分の摂動に有効な方法であることを示す実証的な証拠を提供する。

Graph Convolutional Networks (GCNs) have fueled a surge of interest due to their superior performance on graph learning tasks, but have also been shown to be vulnerable to adversarial attacks. In this paper, an effective graph structural attack is investigated to disrupt graph spectral filters in the Fourier domain. We define the spectral distance based on the eigenvalues of the graph Laplacian to measure the disruption of spectral filters. We then generate edge perturbations by simultaneously maximizing a task-specific attack objective and the proposed spectral distance. The experiments demonstrate remarkable effectiveness of the proposed attack in the white-box setting at both training and test time. Our qualitative analysis shows the connection between the attack behavior and the imposed changes on the spectral distribution, which provides empirical evidence that maximizing spectral distance is an effective way to change the structural property of graphs in the spatial domain and perturb the frequency components in the Fourier domain.

翻訳日:2021-11-02 14:27:59 公開日:2021-11-01

# シミュレーションで検証し、実世界で検出する -- ドメインランダム化のためのモデル選択

Validate on Sim, Detect on Real -- Model Selection for Domain Randomization ( http://arxiv.org/abs/2111.00765v1 )

ライセンス: Link先を確認

Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar

(参考訳) sim2realと呼ばれるロボットのスキルを学ぶ実践的なアプローチは、シミュレーションで制御ポリシーを訓練し、それを実際のロボットにデプロイする。sim2real転送を改善するための一般的なテクニックは、ドメインランダム化(DR: domain randomization)に基づいており、現実世界へのより良い一般化を期待して、ランダムに生成されたさまざまなドメインでポリシーをトレーニングする。ポリシー学習とDRアルゴリズムの両方に多くのハイパーパラメーターがあるため、多くの訓練済みモデルが出来上がり、その中から最良のモデルを選択するには、実際のロボット上でのコストのかかる評価が必要になる。本研究では、現実世界でポリシーを実行することなくポリシーをランク付けできるかを問う。我々の主な考え方は、事前定義された現実世界データの集合を用いて、分布外検出(OOD)技術によってすべてのポリシーを評価できるということである。ある意味で、このアプローチは、現実世界での実行前にポリシーを評価するための"ユニットテスト"と見なすことができる。しかし、OODスコア単独では不正確になり得て、特定のOOD手法に非常に敏感であることがわかった。本研究の主な貢献は,OODとシミュレーションにおける評価を組み合わせた,単純だが効果的なポリシースコアである。我々のスコア - VSDR - は、追加の現実世界データを必要とすることなく、ポリシーランキングの精度を大幅に向上させることができることを示す。画像入力を伴うロボット把持タスクにおいて,sim2real転送に対するVSDRの有効性を評価する。我々は、様々なDRパラメータとOOD手法を広範囲に評価し、VSDRが全般的にポリシー選択を改善することを示す。さらに重要なことは,本手法がベースラインに比べてランキングを著しく向上させ、使用するデータ量も大幅に少ないことである。

A practical approach to learning robot skills, often termed sim2real, is to train control policies in simulation and then deploy them on a real robot. Popular techniques to improve the sim2real transfer build on domain randomization (DR): Training the policy on a diverse set of randomly generated domains with the hope of better generalization to the real world. Due to the large number of hyper-parameters in both the policy learning and DR algorithms, one often ends up with a large number of trained models, where choosing the best model among them demands costly evaluation on the real robot. In this work we ask: Can we rank the policies without running them in the real world? Our main idea is that a predefined set of real world data can be used to evaluate all policies, using out-of-distribution detection (OOD) techniques. In a sense, this approach can be seen as a "unit test" to evaluate policies before any real world execution. However, we find that by itself, the OOD score can be inaccurate and very sensitive to the particular OOD method. Our main contribution is a simple-yet-effective policy score that combines OOD with an evaluation in simulation. We show that our score - VSDR - can significantly improve the accuracy of policy ranking without requiring additional real world data. We evaluate the effectiveness of VSDR on sim2real transfer in a robotic grasping task with image inputs. We extensively evaluate different DR parameters and OOD methods, and show that VSDR improves policy selection across the board. More importantly, our method achieves significantly better ranking, and uses significantly less data compared to baselines.

翻訳日:2021-11-02 14:27:43 公開日:2021-11-01

# マルチエージェント環境における独立強化学習アルゴリズムの検討

Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments ( http://arxiv.org/abs/2111.01100v1 )

ライセンス: Link先を確認

Ken Ming Lee, Sriram Ganapathi Subramanian, Mark Crowley

(参考訳) 独立強化学習アルゴリズムは、マルチエージェント環境で最良のポリシーを見つけるための理論的保証はない。しかし、実際には、先行研究は、いくつかの領域における独立アルゴリズムによる良い性能と、他の領域での悪いパフォーマンスを報告している。さらに、独立したアルゴリズムの強みと弱みに関する包括的な研究が文献に欠けている。本稿では,マルチエージェント環境の3つの主要カテゴリ,すなわち協調的,競争的,混合的環境にまたがる4つのPettingZoo環境における独立アルゴリズムの性能を実証的に比較する。完全観測可能な環境では、独立アルゴリズムが協調的かつ競争的な環境でマルチエージェントアルゴリズムと同等の性能を発揮することを示す。混合環境において,独立したアルゴリズムで訓練されたエージェントは,個別によく行動することを学ぶが,味方との協力や敵との競争を学ばないことを示す。また,協調的部分可観測環境において,再帰性を加えることで独立アルゴリズムの学習が向上することを示した。

Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.

翻訳日:2021-11-02 14:27:20 公開日:2021-11-01

# (参考訳) OpenStreetMapデータを用いた自転車共有システムの駅立地計画への移動学習アプローチ

Transfer Learning Approach to Bicycle-sharing Systems' Station Location Planning using OpenStreetMap Data ( http://arxiv.org/abs/2111.00990v1 )

ライセンス: CC BY-SA 4.0

Kamil Raczycki, Piotr Szymański

(参考訳) 自転車共有システム(BSS)は、先進地域の大規模で富裕な都市の多くの市民にとって日々の現実となっている。しかしながら、自転車共有ステーションのレイアウトを計画するには、通常、高価なデータ収集、移動行動の調査、移動モデリング、そしてステーションレイアウトの最適化が必要となる。多くの小さな都市や町、特に発展途上地域では、こうした計画の資金調達が困難である。 BSSの計画にもかなりの時間がかかる。しかし、パンデミックが示すように、自治体は公共交通機関から自転車へ移行する市民を含むモビリティシフトに迅速に対応する必要がある。自転車の需要の増加に対処するためには、自転車シェアリングシステムを迅速に配置することが重要になる。本稿では,BSSレイアウト設計におけるコストと時間の問題に対処し,空間埋め込み手法を用いて,計画プロセスを合理化し容易にする新しいソリューションを提案する。 OpenStreetMapの公開データとヨーロッパの34都市のステーション配置のみに基づいて、Uber H3離散グローバルグリッドシステムを使用して都市をマイクロリージョンに分割し、転移学習を使用して、異なる都市の既存のシステムに基づいてステーションを配置する価値のある地域を示す方法を開発した。この研究の成果は、参照都市を選択してステーションレイアウトを計画する際に、プランナーの意思決定を支援するメカニズムである。

Bicycle-sharing systems (BSS) have become a daily reality for many citizens of larger, wealthier cities in developed regions. However, planning the layout of bicycle-sharing stations usually requires expensive data gathering, surveying travel behavior and trip modelling followed by station layout optimization. Many smaller cities and towns, especially in developing areas, may have difficulty financing such projects. Planning a BSS also takes a considerable amount of time. Yet as the pandemic has shown us, municipalities will face the need to adapt rapidly to mobility shifts, which include citizens leaving public transport for bicycles. Laying out a bike sharing system quickly will become critical in addressing the increase in bike demand. This paper addresses the problem of cost and time in BSS layout design and proposes a new solution to streamline and facilitate the process of such planning by using spatial embedding methods. Based only on publicly available data from OpenStreetMap, and station layouts from 34 cities in Europe, a method has been developed to divide cities into micro-regions using the Uber H3 discrete global grid system and to indicate regions where it is worth placing a station based on existing systems in different cities using transfer learning. The result of the work is a mechanism to support planners in their decision making when planning a station layout with a choice of reference cities.

翻訳日:2021-11-02 14:24:48 公開日:2021-11-01

# Transformerを用いた家畜のモニタリング

Livestock Monitoring with Transformer ( http://arxiv.org/abs/2111.00801v1 )

ライセンス: Link先を確認

Bhavesh Tangirala, Ishan Bhandari, Daniel Laszlo, Deepak K. Gupta, Rajat M. Thomas, Devanshu Arya

(参考訳) 家畜の行動の追跡は、現代の家畜農場における伝染病の早期発見と予防を可能にする。経済的利益とは別に、これは家畜農場で使用される抗生物質の量を減らす。抗生物質は、さもなければヒトの食生活に入り込み、主要な死因である抗生物質耐性の流行を悪化させる。ほとんどの現代農場で利用できる標準的なビデオカメラを使って、家畜をモニターできる。しかし、ほとんどのコンピュータビジョンアルゴリズムは、このタスクで性能が悪い。その主な理由は、(i)農場で飼育されている動物は外観が同一で、明らかな空間的特徴がないこと、(ii)既存のトラッカーのいずれも長期間にわたる堅牢性がないこと、(iii)照明の変化、頻繁な遮蔽、カメラアングルの変化、動物のサイズなど実世界の状況が、モデルの一般化を困難にしていることである。これらの課題を踏まえて,群飼育の豚を対象としたエンド・ツー・エンド行動監視システムを開発し,インスタンスレベルのセグメンテーション,トラッキング,アクション認識,再識別(STAR)タスクを同時に行う。本稿では, トランスフォーマーアーキテクチャを用いて, 群飼育の豚のインスタンスレベルの埋め込みを学習する, 初のエンドツーエンド多目的家畜監視フレームワークであるstarformerを紹介する。ベンチマーク用に、実際の屋内養豚環境における豚のインスタンスレベルのバウンディングボックス, セグメンテーション, 追跡, 行動分類を含むビデオシーケンスからなる, 慎重に整理されたデータセットであるPigtraceを提案する。 STARタスクを同時に最適化することで、starformerは個々のタスクでトレーニングされた一般的なベースラインモデルより優れていることを示す。

Tracking the behaviour of livestock enables early detection and thus prevention of contagious diseases in modern animal farms. Apart from economic gains, this would reduce the amount of antibiotics used in livestock farming, which otherwise enter the human diet, exacerbating the epidemic of antibiotic resistance - a leading cause of death. We could use standard video cameras, available in most modern farms, to monitor livestock. However, most computer vision algorithms perform poorly on this task, primarily because (i) animals bred in farms look identical, lacking any obvious spatial signature, (ii) none of the existing trackers are robust for long durations, and (iii) real-world conditions such as changing illumination, frequent occlusion, varying camera angles, and sizes of the animals make it hard for models to generalize. Given these challenges, we develop an end-to-end behaviour monitoring system for group-housed pigs to perform simultaneous instance level segmentation, tracking, action recognition and re-identification (STAR) tasks. We present starformer, the first end-to-end multiple-object livestock monitoring framework that learns instance-level embeddings for grouped pigs through the use of transformer architecture. For benchmarking, we present Pigtrace, a carefully curated dataset comprising video sequences with instance level bounding box, segmentation, tracking and activity classification of pigs in real indoor farming environment. Using simultaneous optimization on STAR tasks we show that starformer outperforms popular baseline models trained for individual tasks.

翻訳日:2021-11-02 14:08:21 公開日:2021-11-01

# 3次元表面認識画像合成のための生成占有場

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis ( http://arxiv.org/abs/2111.00969v1 )

ライセンス: Link先を確認

Xudong Xu, Xingang Pan, Dahua Lin, Bo Dai

(参考訳) 生成放射場の出現は、3d認識画像合成の発展を著しく促進した。 radianceフィールドでの累積レンダリングプロセスは、勾配がボリューム全体に分散するが、拡散したオブジェクト表面につながるため、これらの生成モデルのトレーニングをずっと簡単にする。一方、放射場と比較すると、占有表現は本質的に決定論的曲面を保証できる。しかし、私たちが生成モデルに直接占有表現を適用すると、訓練中は物体表面上のスパース勾配のみを受け取り、最終的に収束問題に悩まされる。本稿では,コンパクトな物体表面を学習できる生成的放射場に基づく新しいモデルである生成的占有場(gof)を提案する。 GOFの重要な洞察は、放射場における累積レンダリングから、学習面がより正確になるにつれて、表面点のみのレンダリングへの専用の遷移である。このように、GOFは2つの表現の利点を統一されたフレームワークで組み合わせる。実際には、そのレンダリング過程におけるサンプリング領域をボリューム全体から表面周辺の最小隣接領域に徐々に縮小することにより、輝度場からマーチから占有率表現への開始のトレーニングタイム遷移を実現する。複数のデータセットに関する総合的な実験を通して、GOFは高画質画像を3次元整合性で合成し、コンパクトで滑らかな物体表面を同時に学習できることを実証した。コード、モデル、デモビデオはhttps://sheldontsui.github.io/projects/gofで入手できる。

The advent of generative radiance fields has significantly promoted the development of 3D-aware image synthesis. The cumulative rendering process in radiance fields makes training these generative models much easier since gradients are distributed over the entire volume, but leads to diffused object surfaces. In the meantime, compared to radiance fields, occupancy representations could inherently ensure deterministic surfaces. However, if we directly apply occupancy representations to generative models, during training they will only receive sparse gradients located on object surfaces and eventually suffer from the convergence problem. In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding its training convergence. The key insight of GOF is a dedicated transition from the cumulative rendering in radiance fields to rendering with only the surface points as the learned surface gets more and more accurate. In this way, GOF combines the merits of two representations in a unified framework. In practice, the training-time transition, which starts from radiance fields and marches to occupancy representations, is achieved in GOF by gradually shrinking the sampling region in its rendering process from the entire volume to a minimal neighboring region around the surface. Through comprehensive experiments on multiple datasets, we demonstrate that GOF can synthesize high-quality images with 3D consistency and simultaneously learn compact and smooth object surfaces. Code, models, and demo videos are available at https://sheldontsui.github.io/projects/GOF

翻訳日:2021-11-02 14:07:24 公開日:2021-11-01
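
The shrinking sampling region mentioned in the GOF abstract above can be pictured with a small sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the function name `sampling_interval`, the linear schedule driven by `progress`, and the `min_half_width` floor are all hypothetical. The abstract only states that the region shrinks gradually from the entire volume to a minimal neighborhood around the surface.

```python
def sampling_interval(near, far, surface_depth, progress, min_half_width=0.01):
    """Return the (start, end) depth range to sample along one ray.

    Hypothetical schedule: at progress=0 the whole [near, far] range is
    sampled; at progress=1 only a small window around the estimated
    surface depth remains. The linear interpolation is an assumption.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    # Target window around the current surface estimate, clipped to the volume.
    target_start = max(near, surface_depth - min_half_width)
    target_end = min(far, surface_depth + min_half_width)
    # Move both endpoints from the full range toward the target window.
    start = (1.0 - progress) * near + progress * target_start
    end = (1.0 - progress) * far + progress * target_end
    return start, end
```

Calling `sampling_interval(0.0, 1.0, 0.5, progress)` with `progress` rising from 0 to 1 yields intervals that narrow from the full ray toward a thin band around depth 0.5.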

# 長文文書分類の比較研究

Comparative Study of Long Document Classification ( http://arxiv.org/abs/2111.00702v1 )

ライセンス: Link先を確認

Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi

(参考訳) インターネット上の文書形式で保存される情報の量は急速に増加している。そのため、これらの文書を最適に整理・維持することが求められている。テキスト分類アルゴリズムは、テキスト内の単語間の複雑な関係を研究し、文書の意味論を解釈しようとする。これらのアルゴリズムはここ数年で大きく進化した。単純な機械学習アルゴリズムからトランスフォーマーベースのアーキテクチャまで、多くの進歩がありました。しかし、既存の文献は異なるデータセットに対する異なるアプローチを分析しており、機械学習アルゴリズムの性能を比較することは困難である。本研究では,機械学習の標準手法を用いて,長い文書分類を再考する。単純なNaive Bayesから6つの標準テキスト分類データセット上の複雑なBERTまでのアプローチをベンチマークする。本稿では,長い文書データセットに対して異なるアルゴリズムを徹底的に比較する。長い文書分類は単純なタスクであり、基本的なアルゴリズムでさえ、ほとんどのデータセットにおいてBERTベースのアプローチと競合的に実行されます。 BERTベースのモデルはすべてのデータセットで一貫して良好に動作し、計算コストが懸念されない場合、文書分類タスクに盲目的に使用できる。浅層モデルのカテゴリでは、すべてのデータセットで適切に機能する生のBiLSTM + Maxアーキテクチャの使用を提案する。さらに単純なGlove + Attention bag of words modelは、より単純なユースケースに利用できる。高度なモデルを使用することの重要性は、比較的難しいタスクであるIMDBの感情データセットで明らかである。

The amount of information stored in the form of documents on the internet has been increasing rapidly. Thus it has become a necessity to organize and maintain these documents in an optimum manner. Text classification algorithms study the complex relationships between words in a text and try to interpret the semantics of the document. These algorithms have evolved significantly in the past few years. There has been a lot of progress from simple machine learning algorithms to transformer-based architectures. However, existing literature has analyzed different approaches on different data sets, thus making it difficult to compare the performance of machine learning algorithms. In this work, we revisit long document classification using standard machine learning approaches. We benchmark approaches ranging from simple Naive Bayes to complex BERT on six standard text classification datasets. We present an exhaustive comparison of different algorithms on a range of long document datasets. We reiterate that long document classification is a simpler task and even basic algorithms perform competitively with BERT-based approaches on most of the datasets. The BERT-based models perform consistently well on all the datasets and can be blindly used for the document classification task when the computational cost is not a concern. In the shallow-model category, we suggest the usage of a raw BiLSTM + Max architecture, which performs decently across all the datasets. An even simpler GloVe + Attention bag-of-words model can be utilized for simpler use cases. The importance of using sophisticated models is clearly visible in the IMDB sentiment dataset, which is a comparatively harder task.

翻訳日:2021-11-02 14:04:54 公開日:2021-11-01

# 解釈可能なコントラスト語ムーバーの埋め込み

Interpretable contrastive word mover's embedding ( http://arxiv.org/abs/2111.01023v1 )

ライセンス: Link先を確認

Ruijie Jiang, Julia Gouvea, Eric Miller, David Hammer, Shuchin Aeron

(参考訳) 本稿では,分類のための文書の教師付き埋め込み,すなわちコントラスト語ムーバーの埋め込みに対する一般的なアプローチが,解釈可能性を加えることで著しく向上することを示す。この解釈性は、クラスタリング促進機構をコントラスト損失に組み込むことによって達成される。いくつかの公開データセットでは,提案手法が既存のベースラインに対して大幅に改善すると同時に,特定のクラスを最も代表するキーワードのセットを識別することでクラスタへの解釈を提供する。本稿は,学習科学(LS)の領域に根ざした課題である,科学的な文章や思考のための学生の作業を評価するための自然言語処理(NLP)手法の開発の必要性が背景にある。この文脈では,本手法が生物学の授業における実験報告に関連する学生の作業の有意義な評価につながることを示し,LS研究者が学生の理解を深め,科学的思考過程の証拠を評価するのに役立つことを示す。

This paper shows that a popular approach to the supervised embedding of documents for classification, namely, contrastive Word Mover's Embedding, can be significantly enhanced by adding interpretability. This interpretability is achieved by incorporating a clustering-promoting mechanism into the contrastive loss. On several public datasets, we show that our method improves significantly upon existing baselines while providing interpretation to the clusters via identifying a set of keywords that are the most representative of a particular class. Our approach was motivated in part by the need to develop Natural Language Processing (NLP) methods for the novel problem of assessing student work for scientific writing and thinking, a problem that is central to the area of (educational) Learning Sciences (LS). In this context, we show that our approach leads to a meaningful assessment of the student work related to lab reports from a biology class and can help LS researchers gain insights into student understanding and assess evidence of scientific thought processes.

翻訳日:2021-11-02 14:04:36 公開日:2021-11-01

# Collage: ディープラーニングバックエンドの自動統合

Collage: Automated Integration of Deep Learning Backends ( http://arxiv.org/abs/2111.00655v1 )

ライセンス: Link先を確認

Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia

(参考訳) ディープラーニング(DL)アプリケーションの効率的なデプロイに対する強い要求は、リッチなDLエコシステムの迅速な開発を促す。高速な進歩に追いつくためには、dlフレームワークが様々な最適化されたライブラリやランタイムをバックエンドとして効率的に統合し、それらを適切に使用することで、可能な限り高速な実行可能ファイルを生成することが不可欠である。しかし、現在のdlフレームワークは多様なバックエンドを統合するためにかなりの手作業を必要とし、しばしば高いパフォーマンスを提供することができない。本稿では,dlバックエンドを統合するための自動フレームワークであるcollageを提案する。 Collageは、ユーザがさまざまなバックエンドの機能を正確に指定できるバックエンド登録インターフェースを提供する。 Collageは利用可能なバックエンドの仕様を活用することで、特定のワークロードと実行環境に対して最適化されたバックエンド配置を検索する。評価の結果,コラージュは手動の介入なしに複数のバックエンドを自動的に統合し,2つのNVIDIA GPUとIntel CPUで既存のフレームワークを1.21x,1.39x,1.40xで上回ります。

Strong demands for efficient deployment of Deep Learning (DL) applications prompt the rapid development of a rich DL ecosystem. To keep up with its fast advancement, it is crucial for DL frameworks to efficiently integrate a variety of optimized libraries and runtimes as their backends and generate the fastest possible executable by using them properly. However, current DL frameworks require significant manual effort to integrate diverse backends and often fail to deliver high performance. In this paper, we propose Collage, an automatic framework for integrating DL backends. Collage provides a backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage searches for an optimized backend placement for a given workload and execution environment. Our evaluation shows that Collage automatically integrates multiple backends together without manual intervention, and outperforms existing frameworks by 1.21x, 1.39x, 1.40x on two different NVIDIA GPUs and an Intel CPU respectively.

翻訳日:2021-11-02 14:02:19 公開日:2021-11-01

# RMNA:ルールマイニングを用いた近隣アグリゲーションに基づく知識グラフ表現学習モデル

RMNA: A Neighbor Aggregation-Based Knowledge Graph Representation Learning Model Using Rule Mining ( http://arxiv.org/abs/2111.00658v1 )

ライセンス: Link先を確認

Ling Chen, Jun Cui, Xing Tang, Chaodu Song, Yuntao Qian, Yansheng Li, and Yongjun Zhang

(参考訳) 最先端の伝統的な表現学習(TRL)モデルは知識グラフの完成度において競争性能を示すが、実体の埋め込みの間にパラメータ共有はなく、実体間の接続が弱い。そこで,隣接集約型表現学習(narl)モデルを提案する。しかし、既存のNARLモデルは、複数のホップ隣人の情報を無視したり、階層的な隣人の集約によって、複数のホップ隣人の完全性を破壊したりする。本稿では,ルールマイニングアルゴリズムを用いてホルンルールを取得しフィルタするRMNAというNARLモデルを提案する。また,選択されたホルンルールを用いて,貴重なマルチホップ隣人をワンホップ隣人に変換するので,これらのワンホップ隣人を集約することで,有意義なマルチホップ隣人の情報を完全に活用することができる。実験では,RMNAと最先端TRLモデル,NARLモデルを比較した。その結果,RMNAは競争力のある性能を示した。

Although the state-of-the-art traditional representation learning (TRL) models show competitive performance on knowledge graph completion, there is no parameter sharing between the embeddings of entities, and the connections between entities are weak. Therefore, neighbor aggregation-based representation learning (NARL) models are proposed, which encode the information in the neighbors of an entity into its embeddings. However, existing NARL models either only utilize one-hop neighbors, ignoring the information in multi-hop neighbors, or utilize multi-hop neighbors by hierarchical neighbor aggregation, destroying the completeness of multi-hop neighbors. In this paper, we propose a NARL model named RMNA, which obtains and filters Horn rules through a rule mining algorithm, and uses the selected Horn rules to transform valuable multi-hop neighbors into one-hop neighbors; therefore, the information in valuable multi-hop neighbors can be completely utilized by aggregating these one-hop neighbors. In experiments, we compare RMNA with the state-of-the-art TRL models and NARL models. The results show that RMNA achieves competitive performance.

翻訳日:2021-11-02 14:02:02 公開日:2021-11-01
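
The idea in the RMNA abstract above, turning a multi-hop neighbor into a one-hop neighbor via a Horn rule, can be sketched for the simplest case of a length-2 rule r1(x, y) AND r2(y, z) => r3(x, z). This is a minimal illustration only: the helper names, the toy relations, and the hand-written rule are hypothetical, and the rule mining and filtering steps that the abstract describes are not shown.

```python
def apply_horn_rule(triples, body, head):
    """Derive one-hop edges implied by a length-2 Horn rule.

    body = (r1, r2) and head = r3 encode r1(x, y) AND r2(y, z) => r3(x, z).
    Returns only the (x, r3, z) triples not already present in `triples`.
    """
    r1, r2 = body
    # Index r2 edges by source entity so the join on y is a dictionary lookup.
    by_source = {}
    for h, r, t in triples:
        if r == r2:
            by_source.setdefault(h, set()).add(t)
    derived = set()
    for x, r, y in triples:
        if r == r1:
            for z in by_source.get(y, ()):
                derived.add((x, head, z))
    return derived - set(triples)


def one_hop_neighbors(triples, entity):
    """Return the (relation, neighbor) pairs directly attached to `entity`."""
    return {(r, t) for h, r, t in triples if h == entity}


# Toy knowledge graph: "france" is a two-hop neighbor of "alice".
kg = {("alice", "born_in", "paris"), ("paris", "city_of", "france")}
new_edges = apply_horn_rule(kg, body=("born_in", "city_of"), head="nationality")
augmented = kg | new_edges
```

After the rule is applied, `one_hop_neighbors(augmented, "alice")` contains `("nationality", "france")`, so a one-hop aggregator would see the formerly two-hop entity directly.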

# 畳み込みニューラルネットワークの説明可能性手法の拡張によるグラフニューラルネットワークのエッジレベル説明

Edge-Level Explanations for Graph Neural Networks by Extending Explainability Methods for Convolutional Neural Networks ( http://arxiv.org/abs/2111.00722v1 )

ライセンス: Link先を確認

Tetsu Kasanishi, Xueting Wang, and Toshihiko Yamasaki

(参考訳) グラフニューラルネットワーク(GNN)は、グラフデータを入力として扱うディープラーニングモデルであり、トラフィック予測や分子特性予測といった様々なタスクに適用される。しかしながら、GNNの複雑さのため、入力のどの部分がGNNモデルの出力に影響を与えるかを分析することは困難である。本研究では,GNNに対して局所解釈型モデル非依存記述(LIME)やグラディエント・ベース・サリエンシマップ,グラディエント・クラス活性化マッピング(Grad-CAM)などの畳み込みニューラルネットワーク(CNN)の説明可能性手法を拡張し,入力グラフのどのエッジが重要かを予測する。実験結果から,limeベースの手法は実環境における複数タスクの最も効率的な説明可能性であり,gnnによる説明可能性の最先端手法よりも優れていることが示唆された。

Graph Neural Networks (GNNs) are deep learning models that take graph data as inputs, and they are applied to various tasks such as traffic prediction and molecular property prediction. However, owing to the complexity of GNNs, it has been difficult to analyze which parts of the inputs affect a GNN model's outputs. In this study, we extend explainability methods for Convolutional Neural Networks (CNNs), such as Local Interpretable Model-Agnostic Explanations (LIME), Gradient-Based Saliency Maps, and Gradient-Weighted Class Activation Mapping (Grad-CAM), to GNNs, and predict which edges in the input graphs are important for GNN decisions. The experimental results indicate that the LIME-based approach is the most efficient explainability method for multiple tasks in real-world situations, outperforming even the state-of-the-art method in GNN explainability.

翻訳日:2021-11-02 14:01:42 公開日:2021-11-01

# 交通予測のための適応型マルチレセプティブフィールド空間-時間グラフ畳み込みネットワーク

Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Network for Traffic Forecasting ( http://arxiv.org/abs/2111.00724v1 )

ライセンス: Link先を確認

Xing Wang (1), Juan Zhao (1), Lin Zhu (1), Xu Zhou (2), Zhao Li (2), Junlan Feng (1), Chao Deng (1), Yong Zhang (2) ((1) China Mobile Research Institute, Beijing, China, (2) Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China)

(参考訳) モバイルネットワークトラフィック予測は,日々のネットワーク運用において重要な機能のひとつだ。商用モバイルネットワークは、大きく、異種で、複雑で、動的である。これらの本質的な特徴により、グラフ畳み込みネットワークに基づく予測手法や、自動車交通予測に成功している様々な注意メカニズムといった最近の高度なアルゴリズムでも、モバイルネットワークトラフィック予測は解決されない。本稿では,この問題を時空間シーケンス予測タスクとして用いた。本稿では,移動基地局のトラフィック動態をモデル化するために,新しい深層学習ネットワークアーキテクチャである適応多受容場空間時間グラフ畳み込みネットワーク(AMF-STGCN)を提案する。 AMF-STGCNは,(1)移動ネットワークにおける複雑な時空間依存性を共同でモデル化し,(2)異種基地局の様々な受容場を捕捉するための注意機構を適用し,(3)完全に接続されたディープネットワークに基づく余分なデコーダを導入して,マルチステップ予測による誤り伝播課題を克服する。 2つの異なるドメインからの4つの実世界のデータセットの実験は、一貫してamf-stgcnが最先端のメソッドを上回ることを示している。

Mobile network traffic forecasting is one of the key functions in daily network operation. A commercial mobile network is large, heterogeneous, complex and dynamic. These intrinsic features make mobile network traffic forecasting far from being solved even with recent advanced algorithms such as graph convolutional network-based prediction approaches and various attention mechanisms, which have been proved successful in vehicle traffic forecasting. In this paper, we cast the problem as a spatial-temporal sequence prediction task. We propose a novel deep learning network architecture, Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Networks (AMF-STGCN), to model the traffic dynamics of mobile base stations. AMF-STGCN extends GCN by (1) jointly modeling the complex spatial-temporal dependencies in mobile networks, (2) applying attention mechanisms to capture various Receptive Fields of heterogeneous base stations, and (3) introducing an extra decoder based on a fully connected deep network to conquer the error propagation challenge with multi-step forecasting. Experiments on four real-world datasets from two different domains consistently show AMF-STGCN outperforms the state-of-the-art methods.

翻訳日:2021-11-02 14:01:24 公開日:2021-11-01

# マルコフ報酬の表現性について

On the Expressivity of Markov Reward ( http://arxiv.org/abs/2111.00876v1 )

ライセンス: Link先を確認

David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

(参考訳) リワードは強化学習エージェントの推進力である。本稿では,エージェントが実行するタスクをキャプチャする手段として,報酬の表現性を理解することを目的としている。本研究は,(1)許容される行動のセット,(2)行動上の部分順序付け,(3)軌道上の部分順序付けという,3つの新しい「タスク」の抽象概念を中心に構成する。私たちの主な結果は、報酬はこれらのタスクの多くを表現できるが、それぞれのタスクタイプには、マルコフ報酬関数がキャプチャできないインスタンスが存在することを示しています。次に,マルコフ報酬関数を構成する多項式時間アルゴリズムのセットを提供し,エージェントがこれら3種類のタスクを最適化し,その報酬関数が存在しないかを正しく判断する。結論は,我々の理論的知見を裏付ける実証的研究である。

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

翻訳日:2021-11-02 14:01:03 公開日:2021-11-01

# 2段階因果MDPのための介入効率の良いアルゴリズム

Intervention Efficient Algorithm for Two-Stage Causal MDPs ( http://arxiv.org/abs/2111.00886v1 )

ライセンス: Link先を確認

Rahul Madhavan, Aurghya Maiti, Gaurav Sinha and Siddharth Barman

(参考訳) 状態が確率的に報酬を生成する因果グラフに対応するマルコフ決定過程 (MDP) を研究する。この設定では、学習者の目標は、各状態の変数に介入することで高い報酬をもたらす原子的介入を特定することである。最近の因果バンディットの枠組みを一般化し、各状態に並列因果グラフを持つ2段階因果MDPに対する(単純)後悔最小化の保証を与える。インスタンス依存の後悔上界を達成するアルゴリズムを提案する。このアルゴリズムの重要な特徴は、凸最適化を利用して探索問題に対処することである。後悔の保証が本質的にタイトとなるインスタンスのクラスを特定し、理論的結果を実験的に検証する。

We study Markov Decision Processes (MDPs) wherein states correspond to causal graphs that stochastically generate rewards. In this setup, the learner's goal is to identify atomic interventions that lead to high rewards by intervening on variables at each state. Generalizing the recent causal-bandit framework, the current work develops (simple) regret minimization guarantees for two-stage causal MDPs, with a parallel causal graph at each state. We propose an algorithm that achieves an instance-dependent regret bound. A key feature of our algorithm is that it utilizes convex optimization to address the exploration problem. We identify classes of instances wherein our regret guarantee is essentially tight, and experimentally validate our theoretical results.

翻訳日:2021-11-02 14:00:50 公開日:2021-11-01

# ノード数が発散する線形非ガウス有向非巡回グラフの学習

Learning linear non-Gaussian directed acyclic graph with diverging number of nodes ( http://arxiv.org/abs/2111.00740v1 )

ライセンス: Link先を確認

Ruixuan Zhao and Xin He and Junhui Wang

(参考訳) 有向非巡回グラフ(DAG)として表される非巡回モデルは、収集されたノード間の方向性のある因果関係を表現するために広く用いられている。本稿では,高次元の場合において,ノイズが任意の連続な非ガウス分布に従いうる線形非ガウスDAGを効率よく学習する方法を提案する。これは、正確なDAG復元を達成するためにガウス雑音と追加の分散仮定を置く既存のDAG学習法の多くとは対照的である。提案手法は,DAG学習を促進するためにトポロジカル層という新しい概念を活用する。特に、トポロジカル層をボトムアップ的に正確に再構成することができ、各層内のノード間の親子関係も一貫して確立できることを示す。さらに重要なことに,提案手法はDAG学習の文献で広く仮定されている忠実性や親忠実性の仮定を必要としない。その利点は、さまざまなシミュレーション例における主要な競合手法との数値比較や、COVID-19の世界的な拡散に関する実データへの応用によっても支持されている。

The acyclic model, often depicted as a directed acyclic graph (DAG), has been widely employed to represent directional causal relations among collected nodes. In this article, we propose an efficient method to learn linear non-Gaussian DAGs in high-dimensional cases, where the noises can be of any continuous non-Gaussian distribution. This is in sharp contrast to most existing DAG learning methods, which assume Gaussian noise with additional variance assumptions to attain exact DAG recovery. The proposed method leverages a novel concept of topological layer to facilitate the DAG learning. Particularly, we show that the topological layers can be exactly reconstructed in a bottom-up fashion, and the parent-child relations among nodes in each layer can also be consistently established. More importantly, the proposed method does not require the faithfulness or parental faithfulness assumption which has been widely assumed in the DAG learning literature. Its advantage is also supported by the numerical comparison against some popular competitors in various simulated examples as well as a real application on the global spread of COVID-19.

翻訳日:2021-11-02 13:59:27 公開日:2021-11-01

# 正規化フローを用いたタイコグラフィの不確かさ定量化

Uncertainty quantification for ptychography using normalizing flows ( http://arxiv.org/abs/2111.00745v1 )

ライセンス: Link先を確認

Agnimitra Dasgupta and Zichao Wendy Di

(参考訳) 高分解能かつ非破壊的な材料キャラクタリゼーションに不可欠な手法であるタイコグラフィ(ptychography)は、大規模で非線形かつ非凸な困難な逆問題をもたらす。一方で、その本質的な光子統計は、これらの課題に取り組む統計ベースの深層学習アプローチに明確な機会を与えるが、これまで十分に探求されていない。本研究では, 高次元の事後分布のサロゲートを得るために正規化フローを検討する。これにより、再構成に伴う不確かさの特徴付けも可能になる。これは、正解(ground truth)がない状況での再構成品質の判断, 偽のアーティファクトの発見, 得られた不確かさのパターンを用いた将来の実験の指針づくりにおいて, 極めて望ましい能力である。提案手法の性能を, ノイズを付加した合成試料と, 様々な物理実験設定で示す。

Ptychography, as an essential tool for high-resolution and nondestructive material characterization, presents a challenging large-scale nonlinear and non-convex inverse problem; however, its intrinsic photon statistics create clear opportunities, so far underexplored, for statistics-based deep learning approaches to tackle these challenges. In this work, we explore normalizing flows to obtain a surrogate for the high-dimensional posterior, which also enables the characterization of the uncertainty associated with the reconstruction: an extremely desirable capability when judging the reconstruction quality in the absence of ground truth, spotting spurious artifacts and guiding future experiments using the returned uncertainty patterns. We demonstrate the performance of the proposed method on a synthetic sample with added noise and in various physical experimental settings.

翻訳日:2021-11-02 13:59:09 公開日:2021-11-01

# (参考訳) コントラスト学習はいつ事前学習から微調整まで敵対的ロバスト性を維持するか?

When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? ( http://arxiv.org/abs/2111.01124v1 )

ライセンス: CC0 1.0

Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, Chuang Gan

(参考訳) コントラスト学習(CL)は、一般化可能な特徴表現を学習し、その上に線形分類器を微調整することで、下流タスクで最先端の性能を達成できる。しかし, 画像分類において敵対的ロバスト性が不可欠になるにつれ, CLが下流タスクへロバスト性を保持できるかどうかは依然として不明である。主な課題は、自己教師あり事前学習+教師あり微調整のパラダイムにおいて、事前学習から微調整への学習タスクの不一致により、敵対的ロバスト性が容易に忘れられてしまうことである。このような課題を「クロスタスクのロバスト性転移可能性」と呼ぶ。上記の問題に対処するため,本論文では,ロバスト性向上の観点からCLの原理を再検討し,発展させる。 (1) コントラストビューの設計が重要であり、画像の高周波成分はモデルのロバスト性向上に有益であること, (2) 擬似教師信号(例:特徴クラスタリングの利用)でCLを補強することが、忘却なくロバスト性を保持するのに役立つことを示す。これらの新しい設計を取り入れ、新しい敵対的コントラスト事前学習フレームワークAdvCLを提案する。AdvCLがモデル精度と微調整効率を損なうことなく,クロスタスクのロバスト性転移可能性を向上できることを示す。徹底的な実験により,AdvCLが複数のデータセット(CIFAR-10,CIFAR-100,STL-10)と微調整方式(線形評価とフルモデル微調整)において,最先端の自己教師ありロバスト学習手法よりも優れていることを示す。

Contrastive learning (CL) can learn generalizable feature representations and achieve state-of-the-art performance on downstream tasks by finetuning a linear classifier on top of it. However, as adversarial robustness becomes vital in image classification, it remains unclear whether or not CL is able to preserve robustness to downstream tasks. The main challenge is that in the self-supervised pretraining + supervised finetuning paradigm, adversarial robustness is easily forgotten due to a learning task mismatch from pretraining to finetuning. We call such a challenge 'cross-task robustness transferability'. To address the above problem, in this paper we revisit and advance CL principles through the lens of robustness enhancement. We show that (1) the design of contrastive views matters: high-frequency components of images are beneficial to improving model robustness; (2) augmenting CL with pseudo-supervision stimulus (e.g., resorting to feature clustering) helps preserve robustness without forgetting. Equipped with our new designs, we propose AdvCL, a novel adversarial contrastive pretraining framework. We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency. With a thorough experimental study, we demonstrate that AdvCL outperforms the state-of-the-art self-supervised robust learning methods across multiple datasets (CIFAR-10, CIFAR-100, and STL-10) and finetuning schemes (linear evaluation and full model finetuning).

翻訳日:2021-11-02 13:55:41 公開日:2021-11-01

# VPFNet: マルチクラス3Dオブジェクト検出のためのVoxel-Pixel Fusion Network

VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection ( http://arxiv.org/abs/2111.00966v1 )

ライセンス: Link先を確認

Chia-Hung Wang, Hsueh-Wei Chen, Li-Chen Fu

(参考訳) 多くのLiDARを用いた大規模物体検出法、単一クラス物体検出法、あるいは簡単な状況下では、非常によく機能すると主張した。しかし,イメージセマンティクスの活用に失敗したため,小型物体の検出や硬い状況下での性能は,融合ベースのものを超えなかった。本稿では,複雑な環境下での検知性能を高めるために,LiDARとカメラセンサデータストリームを併用した深層学習(DL)組み込み核融合型3Dオブジェクト検出ネットワーク,Voxel-Pixel Fusion Network (VPFNet)を提案する。このネットワーク内では、voxel-pixel fusion(vpf)層と呼ばれ、voxel-pixelペアの幾何学的関係を利用して、voxelの特徴とピクセルの特徴を適切なメカニズムで融合する。さらに,voxel-pixel対の特性を考慮し,核融合効果を誘導・増強するために,いくつかのパラメータが特に設計されている。提案手法は,マルチレベル難易度下でのマルチクラス3次元オブジェクト検出タスクのKITTIベンチマークで評価し,平均平均精度(mAP)ですべての最先端手法より優れていることを示す。ここでの我々のアプローチは、挑戦的な歩行者クラスでKITTIのリーダーボードにランクインしている点も注目に値する。

Many LiDAR-based methods have been claimed to perform quite well when detecting large objects, detecting a single object class, or operating in easy situations. However, when detecting small objects or operating in hard situations, their performance does not surpass that of fusion-based methods because they fail to leverage image semantics. In order to elevate the detection performance in a complicated environment, this paper proposes a deep learning (DL)-embedded fusion-based multi-class 3D object detection network which admits both LiDAR and camera sensor data streams, named Voxel-Pixel Fusion Network (VPFNet). Inside this network, a key novel component is the Voxel-Pixel Fusion (VPF) layer, which takes advantage of the geometric relation of a voxel-pixel pair and fuses the voxel features and the pixel features with proper mechanisms. Moreover, several parameters are particularly designed to guide and enhance the fusion effect after considering the characteristics of a voxel-pixel pair. Finally, the proposed method is evaluated on the KITTI benchmark for the multi-class 3D object detection task under multilevel difficulty, and is shown to outperform all state-of-the-art methods in mean average precision (mAP). It is also noteworthy that our approach here ranks first on the KITTI leaderboard for the challenging pedestrian class.

翻訳日:2021-11-02 13:27:37 公開日:2021-11-01

# MOST-GAN: 非絡み合い顔画像操作のための3D Morphable StyleGAN

MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation ( http://arxiv.org/abs/2111.01048v1 )

ライセンス: Link先を確認

Safa C. Medin, Bernhard Egger, Anoop Cherian, Ye Wang, Joshua B. Tenenbaum, Xiaoming Liu, Tim K. Marks

(参考訳) 最近のgans(generative adversarial network)の進歩は、顔画像合成において顕著な成果をもたらしている。スタイルベースのganを用いる手法は、印象的なフォトリアリスティックな顔画像を生成することができるが、生成した顔の特徴を有意義で不連続な方法で制御することはしばしば困難である。事前のアプローチは、以前に訓練されたGANの潜在空間内で、このような意味制御と非絡み合いを実現することを目的としている。対照的に,3次元形状,アルベド,ポーズ,照明などの顔の物理的属性を事前にモデル化し,デザインによる絡み合いを解消する枠組みを提案する。提案手法であるMOST-GANは,スタイルベースGANの表現力と光リアリズムと非線形3D形態素モデルの物理的歪みと柔軟性を統合し,最先端の2Dヘア操作ネットワークと結合する。 MOST-GANは、その物理的特性を完全に3D制御した肖像画の写実的な操作を実現し、照明、表情、およびフルプロファイルビューまでのポーズの極端な操作を可能にする。

Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis. While methods that use style-based GANs can generate strikingly photorealistic face images, it is often difficult to control the characteristics of the generated faces in a meaningful and disentangled way. Prior approaches aim to achieve such semantic control and disentanglement within the latent space of a previously trained GAN. In contrast, we propose a framework that a priori models physical attributes of the face such as 3D shape, albedo, pose, and lighting explicitly, thus providing disentanglement by design. Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models, which we couple with a state-of-the-art 2D hair manipulation network. MOST-GAN achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations up to full profile view.

翻訳日:2021-11-02 13:27:12 公開日:2021-11-01

# ACGANの再起動: 安定トレーニングによる補助分類型GAN

Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training ( http://arxiv.org/abs/2111.01118v1 )

ライセンス: Link先を確認

Minguk Kang, Woohyeon Shim, Minsu Cho, Jaesik Park

(参考訳) 条件付き生成逆数ネットワーク(cGAN)は、クラス情報をGANに組み込んで現実的な画像を生成する。最も一般的なcGANの1つは、ソフトマックスクロスエントロピー損失(ACGAN)を持つ補助分類器GANであるが、データセットのクラス数が増加するにつれて、ACGANのトレーニングが困難であることが広く知られている。 ACGANはまた、多様性の欠如により容易に分類できるサンプルを生成する傾向がある。本稿では,ACGANの治療法を2つ紹介する。まず,分類器内での勾配爆発は早期学習において望ましくない崩壊を引き起こし,入力ベクトルを単位超球面に投影することで問題を解くことができる。次に,データ対データクロスエントロピー損失 (d2d-ce) を提案する。本稿では,Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN)を提案する。実験結果から,ReACGANはCIFAR10, Tiny-ImageNet, CUB200, ImageNetのデータセット上で,最先端の生成結果が得られることがわかった。また、ReACGANは差別化可能な拡張による利点があり、D2D-CEがStyleGAN2アーキテクチャと調和していることを検証する。モデル重みと代表的なcGANの実装を提供するソフトウェアパッケージはhttps://github.com/POSTECH-CVLab/PyTorch-StudioGANで公開されている。

Conditional Generative Adversarial Networks (cGAN) generate realistic images by incorporating class information into GAN. While one of the most popular cGANs is an auxiliary classifier GAN with softmax cross-entropy loss (ACGAN), it is widely known that training ACGAN is challenging as the number of classes in the dataset increases. ACGAN also tends to generate easily classifiable samples with a lack of diversity. In this paper, we introduce two cures for ACGAN. First, we identify that gradient exploding in the classifier can cause an undesirable collapse in early training, and projecting input vectors onto a unit hypersphere can resolve the problem. Second, we propose the Data-to-Data Cross-Entropy loss (D2D-CE) to exploit relational information in the class-labeled dataset. On this foundation, we propose the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). The experimental results show that ReACGAN achieves state-of-the-art generation results on CIFAR10, Tiny-ImageNet, CUB200, and ImageNet datasets. We also verify that ReACGAN benefits from differentiable augmentations and that D2D-CE harmonizes with StyleGAN2 architecture. Model weights and a software package that provides implementations of representative cGANs and all experiments in our paper are available at https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.

翻訳日:2021-11-02 13:26:52 公開日:2021-11-01
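
The first remedy named in the ReACGAN abstract above, projecting input vectors onto a unit hypersphere, amounts to L2 normalization. The sketch below shows only that projection in plain Python; the function name and the `eps` guard are assumptions for illustration, and it does not reproduce the authors' classifier or the D2D-CE loss.

```python
import math


def project_to_unit_hypersphere(vector, eps=1e-12):
    """Scale `vector` to unit L2 norm; `eps` guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


embedding = [3.0, 4.0]
unit = project_to_unit_hypersphere(embedding)
unit_norm = math.sqrt(sum(x * x for x in unit))
```

Because every projected vector has norm one regardless of how large the raw embedding grows, any quantity computed from it no longer scales with the embedding's magnitude, which is one plausible reading of why the abstract links this step to avoiding exploding gradients.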

# 強化学習におけるサンプル複雑度のホライズン依存性の決着

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning ( http://arxiv.org/abs/2111.00633v1 )

ライセンス: Link先を確認

Yuanzhi Li, Ruosong Wang, Lin F. Yang

(参考訳) 近年,強化学習(RL)におけるサンプル複雑度のホライズン依存性の理解への関心が高まっている。特に、ホライズン長が$H$のRL環境においては、状態と行動の数が固定されている場合に、$\mathrm{polylog}(H)$エピソードの環境との相互作用を用いて$O(1)$-最適な方策を学習する確率的近似的に正しい(PAC)アルゴリズムが存在することが、先行研究で示されている。しかし、$\mathrm{polylog}(H)$ の依存が必要かどうかはまだ不明であった。本研究では,同じPAC保証を達成しつつ,$O(1)$エピソードの環境との相互作用のみを用いるアルゴリズムを開発することでこの問いを解決し,RLにおけるサンプル複雑度のホライズン依存性を完全に決着させる。この上界は、(i) 割引付きマルコフ決定過程(MDP)と有限ホライズンMDPにおける価値関数の間の関係の確立と、(ii) MDPにおける新しい摂動解析によって達成される。我々の新しい技術は独立した興味を持ち、RLの関連する問題に適用できると考えている。

Recently there has been a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed. It is yet unknown whether the $\mathrm{polylog}(H)$ dependence is necessary or not. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs) and (ii) a novel perturbation analysis in MDPs. We believe our new techniques are of independent interest and could be applied in related questions in RL.

翻訳日:2021-11-02 13:26:28 公開日:2021-11-01

# 分散非凸最適化のための通信圧縮適応勾配法

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization ( http://arxiv.org/abs/2111.00705v1 )

ライセンス: Link先を確認

Yujia Wang, Lu Lin and Jinghui Chen

(参考訳) トレーニングデータセットの規模が爆発的に増えているため、近年、分散学習への関心が高まっている。主なボトルネックの1つは、中央サーバとローカルワーカーの間の通信コストが大きいことである。誤差フィードバック圧縮は確率的勾配降下法(SGD)における通信コストの低減に成功していることが示されているが、大規模機械学習モデルのトレーニングに広く用いられている適応勾配法について、証明可能な保証を持つ通信効率の高い手法を構築する試みは、はるかに少ない。本稿では,分散非凸最適化問題に対して、証明可能な効率性を持つ新しい通信圧縮型AMSGradを提案する。提案する分散学習フレームワークは,効果的な勾配圧縮戦略とワーカー側のモデル更新設計を特徴とする。提案する通信効率の高い分散適応勾配法が,確率的非凸最適化の設定において,非圧縮のバニラAMSGradと同じ反復複雑度で一階定常点に収束することを証明する。様々なベンチマークでの実験が我々の理論を裏付けている。

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there have been far fewer attempts to build communication-efficient adaptive gradient methods with provable guarantees, even though adaptive gradient methods are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problems, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to the first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.

翻訳日:2021-11-02 13:26:09 公開日:2021-11-01
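
The two ingredients named in the abstract above, error-feedback compression and AMSGrad, can each be written in a few lines. The sketch below is a generic single-worker illustration under stated assumptions: top-k sparsification stands in for the compressor, the helper names and hyperparameter defaults are hypothetical, and how the paper combines the pieces across server and workers is not reproduced here.

```python
import math


def top_k_compress(vector, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def compress_with_error_feedback(gradient, residual, k):
    """Compress gradient plus carried-over residual.

    Returns (message, new_residual); whatever the compressor dropped is
    stored in new_residual and added back before the next compression.
    """
    corrected = [g + e for g, e in zip(gradient, residual)]
    message = top_k_compress(corrected, k)
    new_residual = [c - m for c, m in zip(corrected, message)]
    return message, new_residual


def amsgrad_step(params, grad, state, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One in-place AMSGrad update; `state` holds the m, v and v_hat lists."""
    for i, g in enumerate(grad):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # AMSGrad keeps the running maximum of v, so the step size never grows.
        state["v_hat"][i] = max(state["v_hat"][i], state["v"][i])
        params[i] -= lr * state["m"][i] / (math.sqrt(state["v_hat"][i]) + eps)
    return params
```

A worker in this sketch would call `compress_with_error_feedback` on each local gradient, transmit only `message`, and keep `new_residual` locally; the receiver could then feed the (aggregated) messages to `amsgrad_step`.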

# ニューラルネットワークにおける自由確率、ニュートンリリパッドおよびジャコビアン

Free Probability, Newton lilypads and Jacobians of neural networks ( http://arxiv.org/abs/2111.00841v1 )

ライセンス: Link先を確認

Reda Chhaibi, Tariq Daouda, Ezechiel Kahn

(参考訳) ニューラルネットワークの学習過程における勾配降下は多くの不安定性を伴う。ヤコビアンのスペクトル密度はロバスト性を分析する重要な要素である。ペニントンらの研究に続いて、そのようなヤコビアンは自由確率論からの自由乗法畳み込みを用いてモデル化される。本稿では,関連するスペクトル密度を計算するための信頼性の高い高速手法を提案する。この方法は制御され、証明された収束を有する。我々の手法は, 適応的なニュートン・ラフソンスキームに基づいてアトラクションの流域を探索し, チェーン化することで, ライパッドのような連続的な流域と, 目的に向かってのステップを見つける。本稿では,学習プロセスがネットワークの深さ,層幅,初期化選択によってどのように影響を受けるかを評価することにより,本手法の適用性を示す。

Gradient descent during the learning process of a neural network can be subject to many instabilities. The spectral density of the Jacobian is a key component for analyzing robustness. Following the works of Pennington et al., such Jacobians are modeled using free multiplicative convolutions from Free Probability Theory. We present a reliable and very fast method for computing the associated spectral densities. This method has a controlled and proven convergence. Our technique is based on an adaptive Newton-Raphson scheme, by finding and chaining basins of attraction: the Newton algorithm finds contiguous lilypad-like basins and steps from one to the next, heading towards the objective. We demonstrate the applicability of our method by using it to assess how the learning process is affected by network depth, layer widths and initialization choices: empirically, final test losses are strongly correlated with our Free Probability metrics.
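
For readers unfamiliar with Newton-Raphson iteration, the basic building block can be sketched as below. This is a plain damped Newton iteration for a scalar equation, offered only as background; it is not the lilypad chaining scheme of the paper, and the step-halving rule is an assumption made for this illustration. The name `damped_newton` is hypothetical.

```python
def damped_newton(f, df, x0, tol=1e-12, max_iter=100):
    """Find a root of f starting from x0, halving the step when it overshoots.

    Assumes df(x) is nonzero at every iterate (no safeguard is included).
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        step = fx / df(x)  # full Newton step
        t = 1.0
        # Shrink the step until the residual |f| actually decreases.
        while abs(f(x - t * step)) >= abs(fx) and t > 1e-8:
            t *= 0.5
        x = x - t * step
    return x
```

A plain Newton step converges quickly only inside a basin of attraction around the root, which is why the choice of starting point matters.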

翻訳日:2021-11-02 13:24:46 公開日:2021-11-01

# (参考訳) ロバストな質問応答のためのイントロスペクティブ蒸留

Introspective Distillation for Robust Question Answering ( http://arxiv.org/abs/2111.01026v1 )

ライセンス: CC BY 4.0

Yulei Niu, Hanwang Zhang

(参考訳) 質問応答(QA)モデルは、例えば、視覚的QAに先行する言語や、読解における位置バイアスといったデータバイアスを利用するためによく知られている。近年の脱バイアス法は, 分配内(ID)性能を著しく犠牲にして, 分配外(OOD)の一般化性を向上している。したがって、これらはテスト分布が事前に知られている領域にのみ適用できる。本稿では,QAの両世界を最大限に活用するために,IntroD (Introspective Distillation) と呼ばれる新しい脱臭法を提案する。我々は,OODとIDの帰納バイアスを,トレーニングサンプルが現実のIDの世界に適合するか,あるいは偽のOODに適合するかを検査することによってブレンドすることを目的とする。視覚的QAデータセットのVQA v2, VQA-CP, 読解理解データセットのSQuAD実験により, 提案したIntroDは, 他のデバイアス手法と比較して競合性のあるOOD性能を維持しつつ, より優れたID性能を実現していることが示された。

Question answering (QA) models are well known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension. Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance. Therefore, they are only applicable in domains where the test distribution is known in advance. In this paper, we present a novel debiasing method called Introspective Distillation (IntroD) to make the best of both worlds for QA. Our key technical contribution is to blend the inductive bias of OOD and ID by introspecting whether a training sample fits in the factual ID world or the counterfactual OOD one. Experiments on visual QA datasets VQA v2, VQA-CP, and reading comprehension dataset SQuAD demonstrate that our proposed IntroD maintains competitive OOD performance compared to other debiasing methods, while sacrificing little or even achieving better ID performance compared to the non-debiasing ones.

翻訳日:2021-11-02 13:20:35 公開日:2021-11-01

# コントラスト学習の一般化に向けて

Towards the Generalization of Contrastive Self-Supervised Learning ( http://arxiv.org/abs/2111.00743v1 )

ライセンス: Link先を確認

Weiran Huang and Mingyang Yi and Xuyang Zhao

(参考訳) 近年,学習にラベルのないデータしか必要としない自己教師型学習が注目されている。コントラスト学習は、自己教師あり学習のための一般的なアプローチであり、実際に経験的にうまく機能する。しかし,下流課題における一般化能力の理論的理解は十分に研究されていない。そこで,本研究では,自己教師付き事前学習モデルがダウンストリームタスクにどのように一般化するかを理論的に説明する。具体的には,クラス中心と密集したクラス内サンプルを区別する特徴空間に入力データを組み込む場合,自己教師付きモデルが下流分類タスクにおいて一般化能を有することを示す。以上の結論により、SimCLR と Barlow Twins は2つの正準コントラスト自己監督法である。上記の特徴空間はいずれの手法でも得られることが証明され、下流分類タスクの一般化におけるそれらの成功を説明する。最後に, 理論的な知見を検証するため, 様々な実験を行った。

Recently, self-supervised learning has attracted great attention since it only requires unlabeled data for training. Contrastive learning is a popular approach for self-supervised learning and empirically performs well in practice. However, the theoretical understanding of its generalization ability on downstream tasks is not well studied. To this end, we present a theoretical explanation of how contrastive self-supervised pre-trained models generalize to downstream tasks. Concretely, we quantitatively show that the self-supervised model has generalization ability on downstream classification tasks if it embeds input data into a feature space with distinguishing centers of classes and closely clustered intra-class samples. With the above conclusion, we further explore SimCLR and Barlow Twins, which are two canonical contrastive self-supervised methods. We prove that the aforementioned feature space can be obtained via either method, and thus explain their success in generalizing to downstream classification tasks. Finally, various experiments are also conducted to verify our theoretical findings.
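
The condition on the feature space (well-separated class centers, tightly clustered intra-class samples) can be illustrated with a nearest-center classifier. This is a minimal sketch under the assumption that downstream classification is done by assigning each embedded sample to the closest class mean; whether this matches the classifier analyzed in the paper is not stated in the abstract. The names `class_centers` and `predict` are hypothetical.

```python
def class_centers(features, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centers, x):
    """Assign x to the class whose center is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centers, key=lambda y: sq_dist(centers[y], x))
```

When centers are far apart relative to the spread within each class, this rule makes few mistakes, which gives an intuition for why such a feature space supports downstream classification.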

翻訳日:2021-11-02 12:58:30 公開日:2021-11-01

# ベイズ最適化のためのディープカーネル獲得関数のエンドツーエンド学習

End-to-End Learning of Deep Kernel Acquisition Functions for Bayesian Optimization ( http://arxiv.org/abs/2111.00639v1 )

ライセンス: Link先を確認

Tomoharu Iwata

(参考訳) 複雑な構造を持つ高次元データに対するベイズ最適化(BO)のために、ガウス過程(GP)のためのニューラルネットワークベースのカーネルは、ディープラーニングの高表現力によって柔軟な代理関数を学習するために使われてきた。しかし、既存の手法では、BO性能を直接改善しない限界確率を最大化してニューラルネットワークを訓練している。本稿では,boが求める真の最適値と最良値との差を最小化するニューラルネットワークを用いた,boのメタ学習手法を提案する。我々は,現在評価されているデータポイントを入力として,次に評価すべきデータポイントをニューラルネットワークによって出力するポリシをモデル化する。このモデルでは、ニューラルネットワークベースのカーネルは、取得関数とgpを介してギャップをバックプロパゲーションすることにより、取得関数に適するように訓練される。我々のモデルは、複数のタスクから強化学習フレームワークによって訓練されている。ニューラルネットワークはさまざまなタスク間で共有されるため、複数のトレーニングタスクからBOに関する知識を収集し、その知識を見えないテストタスクに使用することができる。 3つのテキスト文書データセットを用いた実験において,提案手法が既存の手法よりも優れたBO性能を実現することを示す。

For Bayesian optimization (BO) on high-dimensional data with complex structure, neural network-based kernels for Gaussian processes (GPs) have been used to learn flexible surrogate functions through the high representational power of deep learning. However, existing methods train neural networks by maximizing the marginal likelihood, which does not directly improve BO performance. In this paper, we propose a meta-learning method for BO with neural network-based kernels that minimizes the expected gap between the true optimum value and the best value found by BO. We model a policy, which takes the current evaluated data points as input and outputs the next data point to be evaluated, by a neural network, where neural network-based kernels, GPs, and mutual information-based acquisition functions are used as its layers. With our model, the neural network-based kernel is trained to be appropriate for the acquisition function by backpropagating the gap through the acquisition function and GP. Our model is trained by a reinforcement learning framework from multiple tasks. Since the neural network is shared across different tasks, we can gather knowledge on BO from multiple training tasks, and use the knowledge for unseen test tasks. In experiments using three text document datasets, we demonstrate that the proposed method achieves better BO performance than the existing methods.
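
The "gap between the true optimum value and the best value found by BO" can be made concrete with a small BO loop. The sketch below is an assumption-laden illustration: it uses a fixed RBF kernel instead of a neural network-based kernel, an upper-confidence-bound acquisition instead of the mutual information-based one named in the abstract, and no learning at all. It only shows how the gap is measured for a maximization problem over a finite candidate set. All names (`rbf`, `solve`, `gp_posterior`, `bo_gap`) are hypothetical.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel on scalars (fixed, not learned)."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Zero-mean GP posterior mean and variance at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)  # clamp tiny negative values from round-off


def bo_gap(objective, candidates, n_steps, beta=2.0):
    """Run a UCB-driven BO loop and return (true optimum - best value found)."""
    xs = [candidates[0]]
    ys = [objective(xs[0])]
    for _ in range(n_steps):
        remaining = [x for x in candidates if x not in xs]
        if not remaining:
            break

        def ucb(x):
            mean, var = gp_posterior(xs, ys, x)
            return mean + beta * math.sqrt(var)

        x_next = max(remaining, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    true_opt = max(objective(x) for x in candidates)
    return true_opt - max(ys)
```

A gap of zero means the loop found the best candidate; the abstract describes training the kernel so that this quantity is small in expectation across tasks.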

翻訳日:2021-11-02 12:58:03 公開日:2021-11-01

# NOTMAD: サンプル特異構造とパラメータによるベイズネットワークの推定

NOTMAD: Estimating Bayesian Networks with Sample-Specific Structures and Parameters ( http://arxiv.org/abs/2111.01104v1 )

ライセンス: Link先を確認

Ben Lengerich, Caleb Ellington, Bryon Aragam, Eric P. Xing, Manolis Kellis

(参考訳) 文脈固有のベイズネットワーク(即ち有向非巡回グラフ、dag)は変数間の文脈依存関係を識別するが、非巡回性要求によって引き起こされる非凸性は、文脈固有の推定子(例えばグラフ生成関数)間での情報共有を困難にする。このため、コンテキスト固有のベイズネットワークを推定する既存の手法では、データセットをサブサンプルに分割し、統計的パワーと解像度を制限し、多次元および潜在コンテキストの使用を防止している。この課題を克服するために,NOTEARSを最適化したアーチティパルDAG(NOTMAD)を提案する。 NOTMADは、コンテキスト固有のベイジアンネットワークを、サンプルコンテキストに応じてアーキティパルネットワークを混合することを学ぶ関数の出力としてモデル化する。原型的ネットワークは、文脈固有のネットワークと共同で推定され、事前の知識は不要である。我々は、この非巡回性制約を混合関数に逆伝播する滑らかな正規化損失としてエンコードし、この方法でNOTMADはコンテキスト固有の非巡回グラフ間で情報を共有し、ベイズ的ネットワーク構造とパラメータを単一サンプル解像度で推定することができる。がんの形態的変異に対応する患者特異的遺伝子発現ネットワークを含む分析および実験を通じて,notmadおよびサンプル特異的ネットワーク推論の有用性を実証する。

Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favored breaking datasets into subsamples, limiting statistical power and resolution, and preventing the use of multidimensional and latent contexts. To overcome this challenge, we propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD). NOTMAD models context-specific Bayesian networks as the output of a function which learns to mix archetypal networks according to sample context. The archetypal networks are estimated jointly with the context-specific networks and do not require any prior knowledge. We encode the acyclicity constraint as a smooth regularization loss which is back-propagated to the mixing function; in this way, NOTMAD shares information between context-specific acyclic graphs, enabling the estimation of Bayesian network structures and parameters at even single-sample resolution. We demonstrate the utility of NOTMAD and sample-specific network inference through analysis and experiments, including patient-specific gene expression networks which correspond to morphological variation in cancer.
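
The smooth acyclicity penalty can be illustrated with the NOTEARS characterization $h(W) = \mathrm{tr}(e^{W \circ W}) - d$, which is zero exactly when the weighted adjacency matrix $W$ describes an acyclic graph. The sketch below assumes this form of the constraint (suggested by the method's name, but not spelled out in the abstract) and evaluates it with a truncated power series. The mixing function shown is a plain weighted sum, which is an assumption for illustration. The names `acyclicity` and `mix_archetypes` are hypothetical.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(W, terms=20):
    """Truncated series for tr(exp(W * W)) - d, with * the elementwise product.

    Computes sum_{k=1..terms} tr(A^k) / k! where A[i][j] = W[i][j] ** 2.
    The value is 0 for a DAG and positive when W contains a directed cycle.
    """
    d = len(W)
    A = [[w * w for w in row] for row in W]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    total, factorial = 0.0, 1.0
    for k in range(1, terms + 1):
        power = matmul(power, A)
        factorial *= k
        total += sum(power[i][i] for i in range(d)) / factorial
    return total


def mix_archetypes(weights, archetypes):
    """Weighted sum of archetypal adjacency matrices (illustrative mixing rule)."""
    d = len(archetypes[0])
    return [[sum(w * arch[i][j] for w, arch in zip(weights, archetypes))
             for j in range(d)] for i in range(d)]
```

A mixture of acyclic archetypes is not automatically acyclic: mixing a graph with the single edge 0→1 and one with the single edge 1→0 produces a 2-cycle, which is the kind of case a penalty on the mixed network is needed to catch.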

翻訳日:2021-11-02 12:57:42 公開日:2021-11-01

# 確率ゲートを用いた支援:理論と線形モデルへの応用

Support Recovery with Stochastic Gates: Theory and Application for Linear Models ( http://arxiv.org/abs/2110.15960v2 )

ライセンス: Link先を確認

Soham Jana, Henry Li, Yutaro Yamada, Ofir Lindenbaum

(参考訳) 本研究では,独立かつ同一に分布する正規誤差を持つ線形モデルにおいて,係数ベクトル(\beta^*$)の同時回復と推定の問題を解析する。確率ゲート(stg)[ylnk20]の非線形ペナルティに基づくペナライズ最小二乗推定器を用いて係数を推定する。ガウス設計行列を考えると、stgベースの推定器は、次元および$\beta^*$の妥当な条件下で真のデータ生成係数ベクトルに収束し、その支持集合を高い確率で検出する。一般非線形モデル用に設計された既存のSTG推定器を改善するために,線形モデル設定のための新しいプロジェクションベースアルゴリズムを提案する。この新しい手法は, 合成データ解析におけるリカバリを支援するために, 多くの古典的推定器を上回っている。

We analyze the problem of simultaneous support recovery and estimation of the coefficient vector ($\beta^*$) in a linear model with independent and identically distributed Normal errors. We apply the penalized least squares estimator based on non-linear penalties of stochastic gates (STG) [YLNK20] to estimate the coefficients. Considering Gaussian design matrices, we show that under reasonable conditions on dimension and sparsity of $\beta^*$ the STG-based estimator converges to the true data generating coefficient vector and also detects its support set with high probability. We propose a new projection-based algorithm for the linear model setting to improve upon the existing STG estimator that was originally designed for general non-linear models. Our new procedure outperforms many classical estimators for support recovery in synthetic data analysis.
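
A stochastic gate and its penalty can be sketched as follows. This assumes the commonly used STG form, in which each gate is a Gaussian-perturbed parameter clipped to $[0, 1]$ and the penalty is the expected number of open gates; the exact penalty analyzed in the paper should be checked against [YLNK20]. The names `sample_gate`, `prob_gate_open`, and `stg_penalty`, and the default `sigma=0.5`, are assumptions for this example.

```python
import math
import random


def sample_gate(mu, sigma=0.5, rng=random):
    """Draw one gate value z = clip(mu + eps, 0, 1) with eps ~ N(0, sigma^2)."""
    eps = rng.gauss(0.0, sigma)
    return max(0.0, min(1.0, mu + eps))


def prob_gate_open(mu, sigma=0.5):
    """P(z > 0) = Phi(mu / sigma), with Phi the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))


def stg_penalty(mus, sigma=0.5):
    """Expected number of open gates, a smooth surrogate for the support size."""
    return sum(prob_gate_open(mu, sigma) for mu in mus)
```

Because the penalty is differentiable in each `mu`, it can be minimized together with the least squares loss by gradient methods, unlike a direct count of nonzero coefficients.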

翻訳日:2021-11-02 11:19:22 公開日:2021-11-01

# 形状認識型3次元画像合成のためのシェーディングガイド生成命令モデル

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis ( http://arxiv.org/abs/2110.15678v2 )

ライセンス: Link先を確認

Xingang Pan, Xudong Xu, Chen Change Loy, Christian Theobalt, Bo Dai

(参考訳) 生成放射場の発展は、3D認識画像合成の境界を押し上げている。これらの手法は,複数の視点から3次元物体が現実的に見えるという観察に触発され,正則化として多視点制約を導入し,有効3次元放射場を2次元画像から学習する。進行にもかかわらず、形状と色のあいまいさのために正確な3D形状を捉えることができず、下流のタスクでは適用性が制限される。本研究では,この曖昧さに対処するために,新たに改良された形状表現を学習可能なシェーディング誘導型生成暗黙モデルを提案する。私たちの重要な洞察は、正確な3d形状は異なる照明条件下でもリアルなレンダリングをもたらすだろうということです。照明を明示的にモデル化し、様々な照明条件でシェーディングを行うことにより、マルチライト制約を実現する。勾配は、合成された画像を判別器に供給することによって導出される。表面正規化計算の計算負荷を補うために, 表面追跡による効率的なボリュームレンダリング戦略を考案し, 学習時間と推定時間をそれぞれ24%, 48%削減した。提案手法は, 正確な3次元形状を把握しながら, 光リアルな3次元画像合成を実現する。本研究では,既存の手法に対する3次元形状再構成手法の性能向上を実証し,画像照明への適用性を示す。私たちのコードはhttps://github.com/xingangpan/shadeganでリリースします。

The advancement of generative radiance fields has pushed the boundary of 3D-aware image synthesis. Motivated by the observation that a 3D object should look realistic from multiple viewpoints, these methods introduce a multi-view constraint as regularization to learn valid 3D radiance fields from 2D images. Despite the progress, they often fall short of capturing accurate 3D shapes due to the shape-color ambiguity, limiting their applicability in downstream tasks. In this work, we address this ambiguity by proposing a novel shading-guided generative implicit model that is able to learn a starkly improved shape representation. Our key insight is that an accurate 3D shape should also yield a realistic rendering under different lighting conditions. This multi-lighting constraint is realized by modeling illumination explicitly and performing shading with various lighting conditions. Gradients are derived by feeding the synthesized images to a discriminator. To compensate for the additional computational burden of calculating surface normals, we further devise an efficient volume rendering strategy via surface tracking, reducing the training and inference time by 24% and 48%, respectively. Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis while capturing accurate underlying 3D shapes. We demonstrate improved performance of our approach on 3D shape reconstruction against existing methods, and show its applicability on image relighting. Our code will be released at https://github.com/XingangPan/ShadeGAN.
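
Explicit illumination modeling of the kind described above can be illustrated with Lambertian shading, where the rendered color depends on the angle between the surface normal and the light direction. This is a minimal sketch assuming a Lambertian model with ambient and diffuse coefficients; the abstract does not specify the shading model, and the released code should be consulted for the actual one. The names `lambertian_shade`, `k_ambient`, and `k_diffuse` and their default values are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def lambertian_shade(albedo, normal, light_dir, k_ambient=0.3, k_diffuse=0.7):
    """Shade an RGB albedo: albedo * (k_ambient + k_diffuse * max(0, n . l))."""
    n = normalize(normal)
    l = normalize(light_dir)
    cosine = max(0.0, sum(a * b for a, b in zip(n, l)))
    return [c * (k_ambient + k_diffuse * cosine) for c in albedo]
```

Since the shaded color changes with the normal, an inaccurate shape produces implausible images under some lighting conditions, which is the signal a multi-lighting constraint relies on.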

翻訳日:2021-11-02 11:19:08 公開日:2021-11-01

PDF登録状況（公開日: 20211101）