Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20200731となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 忠実度変動を用いたノイズテレポーテーションの性能評価 Rating the performance of noisy teleportation using fluctuations in fidelity ( http://arxiv.org/abs/2001.11463v2 ) ライセンス: Link先を確認	Saptarshi Roy and Arkaprabha Ghosal	(参考訳) 量子テレポーテーションは、量子世界の最も先駆的な特徴の1つである。通常、テレポーテーションプロトコルの品質はその平均忠実度によってのみ判断される。本研究では, テレポーテーションの性能を, 忠実度と不確かさの両方の観点から解析する。具体的には,テレポータビリティスコア(telportability score)という,忠実度と偏差の両面から貢献する量を定義した。また、量子状態のテレポーテーションが1つまたは複数の中間ステップで必要とされるプロトコルに必要な感度を考慮に入れている。我々は,無騒音シナリオにおけるテレポータビリティスコアを計算し,資源状態の絡み合い内容と単調に増加することを示す。 n鎖リピータのような構成であっても、結果は変わらない。しかし、ノイズの存在下では、テレポータビリティスコアは、当初共有されていたリソース状態の絡み合い内容に関して、時々非単調な振る舞いを示すことができる。具体的には、局所的なビットフリップおよびビット位相フリップノイズの下では、より少ないエンタングル状態は、システムパラメータの特定の選択に対して高いテレポータビリティスコアを持つことができる。グローバルな非偏極ノイズの存在下では、低絡み合いの資源状態と高感度要求に対して、ノイズのないシナリオと比較して、ノイズの多い状態の方がテレポータビリティスコアがよい。 Quantum teleportation is one of the most pioneering features of the quantum world. Typically, the quality of a teleportation protocol is solely judged by its average fidelity. In this work, we analyze the performance of teleportation in terms of both fidelity and the deviation in fidelity. Specifically, we define a quantity called teleportability score, which incorporates contributions from both the fidelity and its deviation. It also takes into account the sensitivity one requires for a protocol in which the teleportation of a quantum state is required in one or many intermediate steps. We compute the teleportability score in the noiseless scenario and find that it increases monotonically with the entanglement content of the resource state. The result remains same even if we consider an n-chain repeater-like configuration. However, in the presence of noise, the teleportability score, can sometime display a nonmonotonic behaviour with respect to the entanglement content of the initially shared resource state. Specifically, under local bit-flip and bit-phase-flip noise, lesser entangled states can have higher teleportability score for certain choice of system parameters. In the presence of global depolarizing noise, for low entangled resource states and high sensitivity requirements, the noisy states can have better a teleportability score in comparison to the noiseless scenario.	翻訳日:2023-06-05 04:44:43 公開日:2020-07-31
# 異方性ホッピングをもつ立方格子上のボース・ハバード模型の量子相転移 Quantum phase transition of the Bose-Hubbard model on cubic lattice with anisotropic hopping ( http://arxiv.org/abs/2002.10602v2 ) ライセンス: Link先を確認	Tao Wang and Xue-Feng Zhang	(参考訳) 量子多体系では、次元性は量子相転移のタイプにおいて重要な役割を果たす。次元クロスオーバー中の量子系の研究のために,高次シンボリック強結合拡大法を用いて,異方性ホッピングを伴う立方体格子上のボース・ハバード模型について検討した。 mott-絶縁体と超流動相の境界を8階まで拡張した解析級数を計算する。臨界指数は pad\'{e} re-summation 法によって抽出される。一方、余剰充填における臨界点も得られ、それらは再正規化群理論の予測とよく一致する。ギャップエネルギーのスケーリングと全位相図は最終的に与えられ、将来の研究における実験および数値シミュレーションのベンチマークとして捉えることができる。 In quantum many-body system, dimensionality plays a critical role on type of the quantum phase transition. In order to study the quantum system during dimensional crossover, we studied the Bose-Hubbard model on cubic lattice with anisotropic hopping by using the high order symbolic strong coupling expansion method. The analytic series expanded boundaries between the Mott-insulator and superfluid phase up to eighth order are calculated. The critical exponents are extracted by Pad\'{e} re-summation method, which clearly shows the dimensional crossover behavior. Meanwhile, the critical points at commensurate filling can also be obtained, and they match well with the prediction of renormalization group theory. The scaling of the gap energy and whole phase diagram are given at last, and they can be taken as the benchmark for experiment and numerical simulations in the future study.	翻訳日:2023-06-02 00:11:46 公開日:2020-07-31
# 多モード光キャビティにおける光子を介する1次元ガスのピエルス遷移 Photon-mediated Peierls Transition of a 1D Gas in a Multimode Optical Cavity ( http://arxiv.org/abs/2002.12285v3 ) ライセンス: Link先を確認	Colin Rylands, Yudan Guo, Benjamin L. Lev, Jonathan Keeling, and Victor Galitski	(参考訳) 電荷密度波に対するピエルス不安定性はフォノン駆動の強相関物理学の正準例であり、トポロジカル量子物質とエキゾティック超伝導と密接に関連している。多モード共焦点空洞内に閉じ込められた相互作用するボース原子やフェルミ原子の1次元チューブシステムを用いて、類似した光子を媒介するパイエルス遷移を実現する方法を提案する。キャビティをポンピングすることで、原子系におけるキャビティを媒介する金属--絶縁体遷移が実現される。 tonks-girardeau極限における強く相互作用するボソンの場合、この遷移は(フェルミオン化を通じて)パイエルス不安定であると理解することができる。この計算を相互作用強度の有限値に拡張し、キャビティ場と質量ギャップの双方について解析式を導出する。それらは次元のない物質-光のカップリングに非自明な力法則を依存させる。 The Peierls instability toward a charge density wave is a canonical example of phonon-driven strongly correlated physics and is intimately related to topological quantum matter and exotic superconductivity. We propose a method to realize an analogous photon-mediated Peierls transition, using a system of one-dimensional tubes of interacting Bose or Fermi atoms trapped inside a multimode confocal cavity. Pumping the cavity transversely engineers a cavity-mediated metal--to--insulator transition in the atomic system. For strongly interacting bosons in the Tonks-Girardeau limit, this transition can be understood (through fermionization) as being the Peierls instability. We extend the calculation to finite values of the interaction strength and derive analytic expressions for both the cavity field and mass gap. They display nontrivial power law dependence on the dimensionless matter-light coupling.	翻訳日:2023-06-01 12:27:29 公開日:2020-07-31
# 非平衡モンテカルロシミュレーションからの強結合 Strong coupling from non-equilibrium Monte Carlo simulations ( http://arxiv.org/abs/2003.13734v2 ) ライセンス: Link先を確認	Olmo Francesconi, Marco Panero, David Preti	(参考訳) 格子上の非平衡モンテカルロシミュレーションを用いて、schr\"odinger-functional schemeにおける非可換ゲージ理論の動作結合を計算する。 We compute the running coupling of non-Abelian gauge theories in the Schr\"odinger-functional scheme, by means of non-equilibrium Monte Carlo simulations on the lattice.	翻訳日:2023-05-27 12:03:48 公開日:2020-07-31
# 量子設計に割り当てられたPOVMの不確実性関係のR\'{e}nyi定式化 R\'{e}nyi formulation of uncertainty relations for POVMs assigned to a quantum design ( http://arxiv.org/abs/2004.05576v3 ) ライセンス: Link先を確認	Alexey E. Rastegin	(参考訳) 情報エントロピーは、不確実性原理によって課される制約を表現する強力で柔軟な方法を提供する。このアプローチは量子情報理論の問題への応用に非常に適していると思われる。このような質問は、ある特定の構造を持つ測定を含むのが典型的である。後者はしばしば、十分一般的なスコープの不確実性関係に従うエントロピー境界を改善することができる。量子設計は量子情報理論の多くの問題で使われており、関連する測定に対する不確かさの関係が興味深い。本稿では、量子設計に割り当てられた povm に対する min-エントロピーと r\'{e}nyi エントロピーの観点からの不確実性関係を求める。 Landau--Pollak型の関連も扱う。 2次元の量子設計の例を用いて、得られた下界を以前のものと比較する。エントロピーステアリングの不等式への影響を簡潔に論じる。 Information entropies provide powerful and flexible way to express restrictions imposed by the uncertainty principle. This approach seems to be very suitable in application to problems of quantum information theory. It is typical that questions of such a kind involve measurements having one or another specific structure. The latter often allows us to improve entropic bounds that follow from uncertainty relations of sufficiently general scope. Quantum designs have found use in many issues of quantum information theory, whence uncertainty relations for related measurements are of interest. In this paper, we obtain uncertainty relations in terms of min-entropies and R\'{e}nyi entropies for POVMs assigned to a quantum design. Relations of the Landau--Pollak type are addressed as well. Using examples of quantum designs in two dimensions, the obtained lower bounds are then compared with the previous ones. An impact on entropic steering inequalities is briefly discussed.	翻訳日:2023-05-25 02:26:47 公開日:2020-07-31
# ランドーレベル問題としての位相空間量子力学 Phase Space Quantum Mechanics as a Landau Level Problem ( http://arxiv.org/abs/2004.11455v2 ) ライセンス: Link先を確認	Kun Yang	(参考訳) 位相空間における量子力学の定式化問題と、量子力学粒子の特定のランダウレベルへの運動を投影する問題との関連性を指摘する。特に、量子ホール効果の研究で広く用いられている最低ランダウレベル波動関数は、この文脈で実際に位相空間波動関数であることが示されている。簡単な問題を解析して,この理解の有用性を実証し,他のユーティリティを提案する。 We point out the connection between the problem of formulating quantum mechanics in phase space and projecting the motion of a quantum mechanical particle onto a particular Landau level. In particular, we show that lowest Landau level wave functions, which are widely used in studies of quantum Hall effect, are actually phase space wave functions in this context. We demonstrate the usefulness of this understanding by analyzing some simple problems, and propose other utilities.	翻訳日:2023-05-22 08:11:02 公開日:2020-07-31
# 一軸シフト中心を持つ解析的射影対称正則ガウス関数 Analytically projected rotationally symmetric explicitly correlated Gaussian Functions with one-axis-shifted centers ( http://arxiv.org/abs/2005.00092v2 ) ライセンス: Link先を確認	Andrea Muolo and Markus Reiher	(参考訳) 任意の角運動量とパリティを持つN-粒子系の波動関数を拡張するための新しい有意相関関数形式を示す。我々は,前回の研究 [j. chem. phys. 149, 184105 (2018)] で数値的に活用した投影に基づく手法を開発し,一軸偏移中心と明確に相関し,積分射影作用素を解析的に解いてハミルトニアンおよび角運動量作用素の行列要素を導出する。ボルン-オッペンハイマー近似を仮定しない変分小体計算は、3粒子系と4粒子系のいくつかの回転励起状態に対して提示される。新たな形式主義を,小原子や分子の性質の高精度計算のための統一的な枠組みとして利用できることを示す。 A new explicitly correlated functional form for expanding the wave function of an N-particle system with arbitrary angular momentum and parity is presented. We develop the projection-based approach, numerically exploited in our previous work [J. Chem. Phys. 149, 184105 (2018)], to explicitly correlated Gausssians with one-axis shifted centers and derive the matrix elements for the Hamiltonian and the angular momentum operators by analytically solving the integral projection operator. Variational few-body calculations without assuming the Born-Oppenheimer approximation are presented for several rotationally excited states of three- and four-particle systems. We show how the new formalism can be used as a unified framework for high-accuracy calculations of properties of small atoms and molecules.	翻訳日:2023-05-21 16:54:43 公開日:2020-07-31
# 量子ネットワーク幾何学のスペクトル次元の探索 Probing the spectral dimension of quantum network geometries ( http://arxiv.org/abs/2005.09665v2 ) ライセンス: Link先を確認	Johannes Nokkala, Jyrki Piilo, Ginestra Bianconi	(参考訳) ノードが結合された量子発振器である「フラバー付き量子ネットワーク幾何」 (QNGF) によって記述されたオープン量子システムの環境を考える。 QNGFの幾何学的性質は、有限スペクトル次元を示すネットワークのラプラシア行列のスペクトル特性に反映され、QNGFの正規モードの周波数も決定される。補助開量子系をネットワークに結合し、低周波領域における正規モード周波数を探索することにより、事前未知のスペクトル次元を間接的に推定できることを示す。この意味では、これは例えば発振器の素周波数や定数結合強度の値ではなく、ネットワーク幾何学の特性である。数値的な証拠は、この推定が高周波遮断とノイズ、あるいは正常モード周波数の欠如の両方に頑健であることを示している。補助システムをランダム結合強度を持つネットワークノードのサブセットに結合し、正規モード周波数の十分大きなサブセットを明らかにし解決する。 We consider an environment for an open quantum system described by a "Quantum Network Geometry with Flavor" (QNGF) in which the nodes are coupled quantum oscillators. The geometrical nature of QNGF is reflected in the spectral properties of the Laplacian matrix of the network which display a finite spectral dimension, determining also the frequencies of the normal modes of QNGFs. We show that an a priori unknown spectral dimension can be indirectly estimated by coupling an auxiliary open quantum system to the network and probing the normal mode frequencies in the low frequency regime. We find that the network parameters do not affect the estimate; in this sense it is a property of the network geometry, rather than the values of, e.g., oscillator bare frequencies or the constant coupling strength. Numerical evidence suggests that the estimate is also robust both to small changes in the high frequency cutoff and noisy or missing normal mode frequencies. We propose to couple the auxiliary system to a subset of network nodes with random coupling strengths to reveal and resolve a sufficiently large subset of normal mode frequencies.	翻訳日:2023-05-19 08:04:55 公開日:2020-07-31
# フロッケダイナミクスのための量子カオス測度 Quantum chaos measures for Floquet dynamics ( http://arxiv.org/abs/2007.07283v2 ) ライセンス: Link先を確認	Amin A. Nizami	(参考訳) 周期的に曲がりくねったフロッケシステム(例えば、蹴られたローター)は、カオスのパラダイム的かつ説明的な単純なモデルである。非可積分量子力学には、Loschmidtエコー、ウィグナー関数、スペクトル関数、OTOCなどのカオス的挙動の存在(または遷移)の診断測度がある。我々は、これらの測度を、駆動量子系のユニタリフロッケ作用素の固有系の観点から解析的に計算する。また、量子キックローターに対するフロケ作用素の固有方程式の等価な代替形式も決定する。蹴り上げられたローターのより単純な積分可能な変種に対して、その動力学の表現論的導出を与える。 Periodically kicked Floquet systems such as the kicked rotor are a paradigmatic and illustrative simple model of chaos. For non-integrable quantum dynamics there are several diagnostic measures of the presence of (or the transition to) chaotic behaviour including the Loschmidt echo, Wigner function, spectral function and OTOC. We analytically compute these measures in terms of the eigensystem of the unitary Floquet operator of driven quantum systems. We also determine equivalent alternative forms of the eigen-equation of the Floquet operator for the quantum kicked rotor. For a simpler integrable variant of the kicked rotor, we give a representation theoretic derivation of its dynamics.	翻訳日:2023-05-10 01:58:35 公開日:2020-07-31
# スパース量子ノイズの高速推定 Fast Estimation of Sparse Quantum Noise ( http://arxiv.org/abs/2007.07901v2 ) ライセンス: Link先を確認	Robin Harper, Wenjun Yu, Steven T. Flammia	(参考訳) 量子コンピュータがフォールトトレランスしきい値に近づくにつれ、大規模量子デバイスにおけるノイズの診断と特徴化がますます重要になっている。ノイズチャンネルの最も重要なクラスの1つは、理論的トラクタビリティと実験的妥当性の両方の理由から、パウリチャンネルのクラスである。ここでは、$s$非ゼロパウリ誤差率を$s$スパース、$n$量子パウリノイズチャネル、あるいはより一般に$s$最大のパウリ誤差率で推定する実用的なアルゴリズムを提案する。このアルゴリズムは厳密なリカバリ保証を持ち、$O(n^2)$測定、$O(s n^2)$古典的な処理時間、Clifford量子回路のみを使用する。我々は,IBM 14-qubit 超伝導デバイスのデータに対して,単純化されたクリフォード回路を用いたアルゴリズムのヒューリスティックバージョンを実験的に検証した。これらのデータは、信号が測定ノイズフロアより2桁下にある場合でも、任意の重み付きパウリ誤差の確率を正確に正確に推定できることを示している。 As quantum computers approach the fault tolerance threshold, diagnosing and characterizing the noise on large scale quantum devices is increasingly important. One of the most important classes of noise channels is the class of Pauli channels, for reasons of both theoretical tractability and experimental relevance. Here we present a practical algorithm for estimating the $s$ nonzero Pauli error rates in an $s$-sparse, $n$-qubit Pauli noise channel, or more generally the $s$ largest Pauli error rates. The algorithm comes with rigorous recovery guarantees and uses only $O(n^2)$ measurements, $O(s n^2)$ classical processing time, and Clifford quantum circuits. We experimentally validate a heuristic version of the algorithm that uses simplified Clifford circuits on data from an IBM 14-qubit superconducting device and our open source implementation. These data show that accurate and precise estimation of the probability of arbitrary-weight Pauli errors is possible even when the signal is two orders of magnitude below the measurement noise floor.	翻訳日:2023-05-09 09:03:20 公開日:2020-07-31
# 条件付き確率の規則は量子論において有効である [ゲルマンとヤオの「ベイズ統計のホール」について] The rule of conditional probability is valid in quantum theory [Comment on Gelman & Yao's "Holes in Bayesian Statistics"] ( http://arxiv.org/abs/2007.08160v3 ) ライセンス: Link先を確認	P.G.L. Porta Mana	(参考訳) 最近の写本で、Gelman & Yao (2020) は「通常の条件確率の規則は量子領域で失敗する」と主張し、「確率論は真ではない(量子物理学)」と主張し、量子二重スリット実験の例でこれらの主張を支持することを主張した。本論は量子論のいくつかの関連する文献を思い出し、そのことを示している。 (i)ゲルマン・アンド・ヤオの主張は偽であり、実際、量子例は確率論の規則を確認する。 (ii) 量子の例に見られる特定の不等式は、例えば urn からの描画のような非常に非量子的な例にも現れることが示されるので、この問題には量子理論に特有のものはない。引用された原稿の量子論に関する誤記や不正確な記述も修正されている。 In a recent manuscript, Gelman & Yao (2020) claim that "the usual rules of conditional probability fail in the quantum realm" and that "probability theory isn't true (quantum physics)" and purport to support these statements with the example of a quantum double-slit experiment. The present comment recalls some relevant literature in quantum theory and shows that (i) Gelman & Yao's statements are false; in fact, the quantum example confirms the rules of probability theory; (ii) the particular inequality found in the quantum example can be shown to appear also in very non-quantum examples, such as drawing from an urn; thus there is nothing peculiar to quantum theory in this matter. A couple of wrong or imprecise statements about quantum theory in the cited manuscript are also corrected.	翻訳日:2023-05-09 07:17:32 公開日:2020-07-31
# フラグ状態スカッシングモデルを用いた非平衡位相符号化BB84プロトコルの鍵レート改善 Improving key rates of the unbalanced phase-encoded BB84 protocol using the flag-state squashing model ( http://arxiv.org/abs/2007.08662v2 ) ライセンス: Link先を確認	Nicky Kai Hong Li and Norbert L\"utkenhaus	(参考訳) 位相符号化bb84の実装はすべて、実際には不平衡振幅を持つ信号状態を持つ。したがって、プライオリによる元のセキュリティ分析は適用されない。以前のセキュリティ証明では、通常のBB84の動作を回復するために、マルチフォトンパルスの信号タグを使用する。不均衡信号の場合、光子数分割攻撃はeveに完全な情報を漏らすことはない。本研究では,フラグ状態のスカッシングモデルを用いて,多光子生成プライベート情報のいくつかを解析で保存する。数値的な証明手法を用いることで, 従来公表されていた低損失状態と比較して, キーレートが大幅に向上した。信頼できないダークカウントの通常のシナリオは、あるパラメーター状態において概念的な困難に陥ることが判明した。そこで本稿では,信頼度の高いダークカウントシナリオについても論じる。また,信頼された装置によって損失の一部が引き起こされることが分かっている場合,鍵レートの上昇も報告する。これらのキーレートの改善は、実験的な設定を変更せずに達成できることを強調する。 All phase-encoded BB84 implementations have signal states with unbalanced amplitudes in practice. Thus, the original security analyses a priori do not apply to them. Previous security proofs use signal tagging of multi-photon pulses to recover the behaviour of regular BB84. This is overly conservative, as for unbalanced signals, the photon-number splitting attack does not leak full information to Eve. In this work, we exploit the flag-state squashing model to preserve some parts of the multi-photon generated private information in our analysis. Using a numerical proof technique, we obtain significantly higher key rates compared with previously published results in the low-loss regime. It turns out that the usual scenario of untrusted dark counts runs into conceptual difficulties in some parameter regime. Thus, we discuss the trusted dark count scenario in this paper as well. We also report a gain in key rates when part of the total loss is known to be induced by a trusted device. We highlight that all these key rate improvements can be achieved without modification of the experimental setup.	翻訳日:2023-05-09 06:53:21 公開日:2020-07-31
# 極小熱環境による絡み合いの突然死 Sudden death of entanglement induced by a minimal thermal environment ( http://arxiv.org/abs/2007.09140v2 ) ライセンス: Link先を確認	G.L. De\c{c}ordi and A. Vidiella-Barranco	(参考訳) 熱状態における1つのモード電磁界と1つのモード電磁界を結合した2つの相互作用する2レベル系(量子ビット)のダイナミクスについて検討した。フィールドは、多くの自由度を持つ熱貯水池を通して環境をモデル化する通常のアプローチとは対照的に、小さな環境の役割を担っている。提案したモデルの解析解は,2量子系の特性量子特性に対する小環境とのカップリングの影響を解明するものである。量子エンタングルメントとコヒーレンスの時間進化について検討し、関連する結合定数への依存性と環境の有効温度の影響を検証した。興味深いことに、このような単純なシステムでは、突然の死亡と突然の絡み合いが生じる可能性がある。また、分離されたキュービットを、他のキュービットとフィールドモードで構成された複合環境に結合すると考えられる別のパーティションについても論じる。 We study the dynamics of two interacting two-level systems (qubits) having one of them isolated and the other coupled to a single mode electromagnetic field in a thermal state. The field plays the role of a small environment, in contrast to the usual approach of modeling an environment via a thermal reservoir with many degrees of freedom. We find the analytical solution of the proposed model, which allows us to investigate the consequences of the coupling to the small environment on characteristic quantum features of the two-qubit system. We study the time evolution of quantum entanglement and coherence, verifying the dependence on the relevant coupling constants as well as the influence of the effective temperature of the environment. Interestingly, we find that both sudden death and sudden birth of entanglement may occur in such a simple system. We also discuss a different partition, in which the isolated qubit is considered to be coupled to a composite environment, constituted by the other qubit plus the field mode.	翻訳日:2023-05-09 04:51:22 公開日:2020-07-31
# 中性Tb(II)(Cp$^{\rm{iPr5}}$)$_2$単一分子磁石における電気的に調整された超微細スペクトル Electrically tuned hyperfine spectrum in neutral Tb(II)(Cp$^{\rm{iPr5}}$)$_2$ single-molecule magnet ( http://arxiv.org/abs/2007.15798v1 ) ライセンス: Link先を確認	Robert L. Smith, Aleksander L. Wysocki, and Kyungwha Park	(参考訳) 長いスピンコヒーレンス時間を持つ分子スピン量子ビットと、そのような量子ビット上の非侵襲的な操作方法が要求される。分子電子スピンレベルと核スピンレベルの両方を量子ビットとして使用できることを示した。ドーパントを持つ固体系では、電子スピン密度がドーパント核で高い場合、電場が核スピン量子ビットレベル間の間隔を効果的に変化させることが示されている。このような固体系に着想を得て、Ln$^{2+}$の特異な電子配置を持つ二価ランタニド(Ln)錯体はLn核スピンと電子自由度との間に強い相互作用を持ち、相互作用の電気的チューニングを行う。例えば、中性Tb(II)(Cp$^{\rm{iPr5}}$)$_2$単分子磁石(SMM)における$^{159}$Tb核の電子構造と超微細相互作用を、制限された活性空間状態相互作用に含まれるスピン-軌道相互作用を持つ完全活性空間自己整合磁場法を用いて研究する。計算の結果,低エネルギー状態は4f^8(6s,5d_{z^2})^1$,4$f^8$(5$d_{x^2-y^2}$)$^1$,4f^8(5d_{xy})^1$構成から生じることがわかった。我々は,超微細な相互作用パラメータと電子核スペクトルを多構成アプローチで計算する。超微細相互作用は、Tb(III)Pc$_2$SMMよりも1桁大きい。これは、Tb核スピンと6s,5d)$軌道の占有に由来する核の電子スピン密度との間の強いフェルミ接触相互作用に由来する。また、フェルミ接触項の電場への応答が電子核レベル分離の電気的チューニングをもたらすことも明らかにした。この超微細スターク効果は量子コンピューティングにおける分子核スピンの応用に有用かもしれない。 Molecular spin qubits with long spin coherence time as well as non-invasive operation methods on such qubits are in high demand. It was shown that both molecular electronic and nuclear spin levels can be used as qubits. In solid state systems with dopants, an electric field was shown to effectively change the spacing between the nuclear spin qubit levels when the electron spin density is high at the nucleus of the dopant. Inspired by such solid-state systems, we propose that divalent lanthanide (Ln) complexes with an unusual electronic configuration of Ln$^{2+}$ have a strong interaction between the Ln nuclear spin and the electronic degrees of freedom, which renders electrical tuning of the interaction. As an example, we study electronic structure and hyperfine interaction of the $^{159}$Tb nucleus in a neutral Tb(II)(Cp$^{\rm{iPr5}}$)$_2$ single-molecule magnet (SMM) using the complete active space self-consistent field method with spin-orbit interaction included within the restricted active space state interaction. Our calculations show that the low-energy states arise from $4f^8(6s,5d_{z^2})^1$, 4$f^8$(5$d_{x^2-y^2}$)$^1$, and $4f^8(5d_{xy})^1$ configurations. We compute the hyperfine interaction parameters and the electronic-nuclear spectrum within our multiconfigurational approach. We find that the hyperfine interaction is about one order of magnitude greater than that for Tb(III)Pc$_2$ SMMs. This stems from the strong Fermi contact interaction between the Tb nuclear spin and the electron spin density at the nucleus that originates from the occupation of the $(6s,5d)$ orbitals. We also uncover that the response of the Fermi contact term to electric field results in electrical tuning of the electronic-nuclear level separations. This hyperfine Stark effect may be useful for applications of molecular nuclear spins for quantum computing.	翻訳日:2023-05-07 12:56:12 公開日:2020-07-31
# フェニルスルホニル-カルバゾールtadfエミッタの電子遷移研究への量子コンピューティングの応用 Applications of Quantum Computing for Investigations of Electronic Transitions in Phenylsulfonyl-carbazole TADF Emitters ( http://arxiv.org/abs/2007.15795v1 ) ライセンス: Link先を確認	Qi Gao, Gavin O. Jones, Mario Motta, Michihiko Sugawara, Hiroshi C. Watanabe, Takao Kobayashi, Eriko Watanabe, Yu-ya Ohnishi, Hajime Nakamura and Naoki Yamamoto	(参考訳) 有機発光ダイオード(oled)用熱活性化遅延蛍光(tadf)エミッタとして提案されるフェニルスルホニル-カルバゾール化合物の第一一重項(s1)および三重項(t1)励起状態の量子化学研究を,量子シミュレータおよびデバイス上での量子方程式-運動変動量子固有解法(qeom-vqe)と変分量子デフレレーション(vqd)アルゴリズムを用いて行った。これらの量子シミュレーションは, tadf分子の最大占有量と最小占有分子軌道(homo, lumo)からなる活性空間上で, 二重ゼータ品質基底セットを用いて行った。量子シミュレータ上での計算により予測されるS1とT1(\Delta E_{st}$)のエネルギー分離の差は実験データとよく一致していることがわかった。 qeom-vqeアルゴリズムとvqdアルゴリズムを用いて, 量子デバイス上で誤差緩和を伴わずにシミュレーションを行うことにより, 励起状態に対する16mhaと88mhaの違いが確認された。状態トモグラフィーによる誤差軽減を利用して、量子状態の浄化とエネルギー値の補正を行うことで、未緩和結果に対する大きな誤差は、少なくとも正確な値に関して3mHaの違いに改善することができる。その結果、量子シミュレーションによって予測される$\Delta E_{st}$の値と実験で得られた値との間には優れた一致が見いだされた。 A quantum chemistry study of the first singlet (S1) and triplet (T1) excited states of phenylsulfonyl-carbazole compounds, proposed as useful thermally activated delayed fluorescence (TADF) emitters for organic light emitting diode (OLED) applications, was performed with the quantum Equation-Of-Motion Variational Quantum Eigensolver (qEOM-VQE) and Variational Quantum Deflation (VQD) algorithms on quantum simulators and devices. These quantum simulations were performed with double zeta quality basis sets on an active space comprising the highest occupied and lowest unoccupied molecular orbitals (HOMO, LUMO) of the TADF molecules. The differences in energy separations between S1 and T1 ($\Delta E_{st}$) predicted by calculations on quantum simulators were found to be in excellent agreement with experimental data. Differences of 16 and 88 mHa with respect to exact energies were found for excited states by using the qEOM-VQE and VQD algorithms, respectively, to perform simulations on quantum devices without error mitigation. By utilizing error mitigation by state tomography to purify the quantum states and correct energy values, the large errors found for unmitigated results could be improved to differences of, at most, 3 mHa with respect to exact values. Consequently, excellent agreement could be found between values of $\Delta E_{st}$ predicted by quantum simulations and those found in experiments.	翻訳日:2023-05-07 12:55:41 公開日:2020-07-31
# 二次元水素のcr\'amer-rao複雑性 Cr\'amer-Rao complexity of the two-dimensional confined hydrogen ( http://arxiv.org/abs/2007.15913v1 ) ライセンス: Link先を確認	C. R. Esta\~n\'on, N. Aquino, D. Puertas-Centeno, J. S. Dehesa	(参考訳) 2次元に閉じ込められた水素原子の内部障害は、統計量 Cr'amer-Rao による 1\textit{s}, 2\textit{s}, 2\textit{p}, 3\textit{d} 量子状態の閉じ込め半径で数値的に研究される。まず, 分散の閉じ込め依存性と電子分布の位置と運動量拡散のフィッシャー情報について計算し, 考察する。次に, Cr\'amer-Rao複雑性測定(平均値と電子分布の勾配含量に関する電荷濃度の組合せバランスを定量化する)を, 位置と運動量空間で検討した。閉じ込めは、この2つの成分測度によって全ての量子状態に対するシステムの複雑さを区別する。 The internal disorder of the two-dimensional confined hydrogenic atom is numerically studied in terms of the confinement radius for the 1\textit{s}, 2\textit{s}, 2\textit{p} and 3\textit{d} quantum states by means of the statistical Cr\'amer-Rao complexity measure. First, the confinement dependence of the variance and the Fisher information of the position and momentum spreading of its electron distribution are computed and discussed. Then, the Cr\'amer-Rao complexity measure (which quantifies the combined balance of the charge concentration around the mean value and the gradient content of the electron distribution) is investigated in position and momentum spaces. We found that confinement does distinguish complexity of the system for all quantum states by means of this two component measure.	翻訳日:2023-05-07 12:53:52 公開日:2020-07-31
# 多体系における局所相互作用による絡み合い補正 Entanglement correction due to local interactions in many-body systems ( http://arxiv.org/abs/2007.15908v1 ) ライセンス: Link先を確認	Yevheniia Cheipesh, Lorenzo Cevolani, Stefan Kehrein	(参考訳) フロー方程式ホログラフィー法の摂動拡張に基づいて,弱および局所的に相互作用するフェルミオンの2部分母エントロピーの領域法則の補正を行う。 1次元および2次元の場合(および高次元の場合)の明示的な計算は、相互作用強度が最大$u^2$の非相互作用フェルミオンの絡み合いエントロピーに対する主補正はスケーリングに影響を与えないが、主対数項の前因子に準粒子残基を乗じるだけであることを示している。地域法に準じた用語も存在する。相互作用強度は$U^2$に比例し、システムサイズと線形にスケールする。 The correction to the area law for the bipartite min-entanglement entropy of weakly and locally interacting fermions is calculated based on a perturbative extension of the flow equation holography method. Explicit calculations for the one- and two-dimensional case (and similarly for higher dimensions) show that the leading correction to the entanglement entropy of non-interacting fermions up to $U^2$ in the interaction strength does not change the scaling, but only affects the pre-factor of the leading logarithmic term multiplying it by the quasiparticle residue. A term sub-leading to the area law is also present. It is proportional to $U^2$ in the interaction strength and scales linearly with the system size.	翻訳日:2023-05-07 12:53:36 公開日:2020-07-31
# 減音効果を考慮した量子リモートセンシング Quantum remote sensing under the effect of dephasing ( http://arxiv.org/abs/2007.15903v1 ) ライセンス: Link先を確認	Hideaki Okane, Hideaki Hakoshima, Yuki Takeuchi, Yuya Seki and Yuichiro Matsuzaki	(参考訳) 量子リモートセンシング(QRS)は、量子ビットセンサの測定結果に関するセキュリティを追加するスキームである。クライアントは量子センサを持つリモートサーバに測定タスクを委譲し、Eavesdropper(Eve)はサーバ側に保存されたすべての古典的な情報を盗む。量子特性を用いることで、QRSはクライアントがEveよりも知覚結果に関する情報を得る情報ゲインに関する非対称性を提供する。しかし、量子状態はデコヒーレンスに対して脆弱であるため、そのようなQRSが現実的な雑音の影響下で実際に有用であるかどうかは不明である。本稿では,対象フィールドとの相互作用を強調するqrsの性能について検討する。 QRSでは、クライアントとサーバはベルペアを共有する必要があり、ベルペアの欠陥は、検知のためのサーバ側の体系的な方法で状態準備エラーにつながる。我々はデフォーカスと状態準備の誤りの効果を考察する。クライアント側の不確実性は、繰り返し数$M$ for small $M$の平方根によって減少する。一方、大規模な$m$の場合、状態準備エラーはデファスメントと同様に関連し、不確実性は$m$で対数的に減少する。我々はクライアントとイブの間で得た情報を比較する。これにより、非対称な利得がデファス化の効果の下でも維持される条件が得られる。 The quantum remote sensing (QRS) is a scheme to add security about the measurement results of a qubit-based sensor. A client delegates a measurement task to a remote server that has a quantum sensor, and eavesdropper (Eve) steals every classical information stored in the server side. By using quantum properties, the QRS provides an asymmetricity about the information gain where the client gets more information about the sensing results than Eve. However, quantum states are fragile against decoherence, and so it is not clear whether such a QRS is practically useful under the effect of realistic noise. Here, we investigate the performance of the QRS with dephasing during the interaction with the target fields. In the QRS, the client and server need to share a Bell pair, and an imperfection of the Bell pair leads to a state preparation error in a systematic way on the server side for the sensing. We consider the effect of both dephasing and state preparation error. The uncertainty of the client side decreases with the square root of the repetition number $M$ for small $M$, which is the same scaling as the standard quantum metrology. On the other hand, for large $M$, the state preparation error becomes as relevant as the dephasing, and the uncertainty decreases logarithmically with $M$. We compare the information gain between the client and Eve. This leads us to obtain the conditions for the asymmetric gain to be maintained even under the effect of dephasing.	翻訳日:2023-05-07 12:53:22 公開日:2020-07-31
# 3つの同一ボソン:非整数次元および外部場における特性 Three identical bosons: Properties in non-integer dimensions and in external fields ( http://arxiv.org/abs/2007.15900v1 ) ライセンス: Link先を確認	E. Garrido and A.S. Jensen	(参考訳) 3次元(3次元)空間から2次元(2次元)空間に連続的に絞り込まれる3体系について検討する。このようなスクイージングは、一つの軸に沿って作用する外部閉じ込め電位によって得られる。しかし、この手順は数値的に要求されるか、特に大きな絞り込みシナリオでは不可能である。代用として、パラメータとして$d$という次元を使い、2\leq d \leq 3$の範囲内で連続的に変化する。 $d$-計算の単純さは、進行的閉じ込め後の3体状態の進化を研究するために利用される。 3dで相対的に$s$-waveを持つ3つの同一スピンレスボソンと調和振動子スクイージングポテンシャルの場合には考慮される。 2つの方法から得られた結果を比較し,両手法の次元,絞り長,波動関数に関するそれらの間の変換を行う。すべての計算はより単純な$d$-method内で完全に可能であるが、同じ幾何学を外部ポテンシャルで同時に提供する。 Three-body systems that are continuously squeezed from a three-dimensional (3D) space into a two-dimensional (2D) space are investigated. Such a squeezing can be obtained by means of an external confining potential acting along a single axis. However, this procedure can be numerically demanding, or even undoable, especially for large squeezed scenarios. An alternative is provided by use of the dimension $d$ as a parameter that changes continuously within the range $2\leq d \leq 3$. The simplicity of the $d$-calculations is exploited to investigate the evolution of three-body states after progressive confinement. The case of three identical spinless bosons with relative $s$-waves in 3D, and a harmonic oscillator squeezing potential is considered. We compare results from the two methods and provide a translation between them, relating dimension, squeezing length, and wave functions from both methods. All calculations are then possible entirely within the simpler $d$-method, but simultaneously providing the equivalent geometry with the external potential.	翻訳日:2023-05-07 12:52:59 公開日:2020-07-31
# 絡み合うチャネルのヌルスペースとその応用 Nullspaces of Entanglement Breaking Channels and Applications ( http://arxiv.org/abs/2007.15893v1 ) ライセンス: Link先を確認	D.W. Kribs, J. Levick, K. Olfert, R. Pereira, M. Rahaman	(参考訳) 絡み合うチャネルのヌルスペース構造と関連する応用について検討する。トレースゼロ行列のすべての作用素空間は、絡み合う破壊チャネルのヌル空間であることを示す。相補的なチャネル挙動と絡み合うチャネルのヌル空間に基づいて、量子チャネルの混合ユニタリ性をテストする。我々は、ある種の絡み合うチャネルのクラスに対するプライベート代数の存在を保証する条件を特定する。 We investigate the nullspace structures of entanglement breaking channels, and related applications. We show that every operator space of trace zero matrices is the nullspace of an entanglement breaking channel. We derive a test for mixed unitarity of quantum channels based on complementary channel behaviour and entanglement breaking channel nullspaces. We identify conditions that guarantee the existence of private algebras for certain classes of entanglement breaking channels.	翻訳日:2023-05-07 12:52:42 公開日:2020-07-31
# 絡み合いの少ない単純な量子位置検証プロトコルを破る Breaking simple quantum position verification protocols with little entanglement ( http://arxiv.org/abs/2007.15808v1 ) ライセンス: Link先を確認	Andrea Olivo, Ulysse Chabaud, Andr\'e Chailloux, Fr\'ed\'eric Grosshans	(参考訳) inqc(instantaneous nonlocal quantum computation)は、見かけの量子および相対論的制約を回避し、指数的絡み合いコストでジェネリック量子位置検証(qpv)プロトコル(遠方証明器の位置をセキュアに検証する)を攻撃可能にする。我々は,最大絡み合ったキューディットのペアを共有する敵について検討し,1光子を1つの角度で偏光したQPVプロトコルの実用的ファミリに対する低次元INQC攻撃を,$\theta$で行う。クリフォード階層の外に座っているもの(例えば$\pi/6$)や、$\theta$が$\simeq 5\cdot 10^{-3}$以上のエラーをプロトコルのキュービットに2つのebitを持つ敵に対して許容できないことなど、いくつかの合理的な角度に対する正確な攻撃を見つける。 Instantaneous nonlocal quantum computation (INQC) evades apparent quantum and relativistic constraints and allows to attack generic quantum position verification (QPV) protocols (aiming at securely certifying the location of a distant prover) at an exponential entanglement cost. We consider adversaries sharing maximally entangled pairs of qudits and find low-dimensional INQC attacks against the simple practical family of QPV protocols based on single photons polarized at an angle $\theta$. We find exact attacks against some rational angles, including some sitting outside of the Clifford hierarchy (e.g. $\pi/6$), and show no $\theta$ allows to tolerate errors higher than $\simeq 5\cdot 10^{-3}$ against adversaries holding two ebits per protocol's qubit.	翻訳日:2023-05-07 12:51:38 公開日:2020-07-31
# ハイブリッド職場における食事嗜好の分析 Seating preference analysis for hybrid workplaces ( http://arxiv.org/abs/2007.15807v1 ) ライセンス: Link先を確認	Mohammad Saiedur Rahaman, Shaw Kudo, Tim Rawling, Yongli Ren, and Flora D. Salim	(参考訳) フレキシブルな仕事の性質の増大と、新型コロナウイルス(covid-19)の規制による近年の要件により、職場はよりハイブリッドになってきている(例えば、伝統的なオフィススペースや、自宅など他の場所で働くことができる)。作業場は設計、レイアウト、利用可能な設備が異なるため、多くの作業員は適切な調整が難しいと感じている。最終的に、これは仕事の生産性や、集中、ストレス、ムードなどの関連するパラメータに悪影響を及ぼす。この負の作業経験を引き起こす重要な要因の1つは、利用可能な座席配置に直接関連している。本稿では、新型コロナウイルス以前のデータを用いて、37人の従業員の様々な座席選択を理解するための分析を行い、ハイブリッド職場環境における調査結果を分析した。また、我々の発見がより広範なハイブリッドな作業環境にどのように適応できるかを示す意味のリストについても論じる。 Due to the increasing nature of flexible work and the recent requirements from COVID-19 restrictions, workplaces are becoming more hybrid (i.e. allowing workers to work between traditional office spaces and elsewhere including from home). Since workplaces are different in design, layout and available facilities, many workers find it difficult to adjust accordingly. Eventually, this impacts negatively towards work productivity and other related parameters including concentration, stress, and mood while at work. One of the key factors that causes this negative work experience is directly linked to the available seating arrangements. In this paper, we conduct an analysis to understand various seating preferences of 37 workers with varying demographics, using the data collected pre-COVID-19, and analyse the findings in the context of hybrid workplace settings. We also discuss a list of implications illustrating how our findings can be adapted across wider hybrid work settings.	翻訳日:2023-05-07 12:51:21 公開日:2020-07-31
# 音響光学変調器を用いた光空間モードの高速生成と検出 Fast Generation and Detection of Spatial Modes of Light using an Acousto-Optic Modulator ( http://arxiv.org/abs/2007.16115v1 ) ライセンス: Link先を確認	Boris Braverman, Alexander Skerjanc, Nicholas Sullivan, Robert W. Boyd	(参考訳) 光の空間モードは、古典的および量子的情報をエンコードするのに使用できる高次元空間を提供する。空間光変調器やデジタルマイクロミラー装置などの高解像度位相マスクを再構成する必要があるため、これらのモードを動的に生成・測定するための現在のアプローチは遅い。光の空間モードを更新するプロセスは、AOM(Acousto-optic modulator)のような高速な画像保存光学スイッチで静止相マスクのセットを多重化することにより、大幅に加速することができる。両パスAOMを用いて5つの軌道角運動量状態のうちの1つを最大500kHzのスイッチング速度で生成する。次に,このシステムを用いて2次元ヒルベルト空間における空間モードの高速量子トモグラフィーを行い,未知の状態を3つの偏りのない基底からなる6つの空間モードに投影する。我々は平均96.9%の忠実度で任意の状態を1ミリ秒未満で再構築することができる。 Spatial modes of light provide a high-dimensional space that can be used to encode both classical and quantum information. Current approaches for dynamically generating and measuring these modes are slow, due to the need to reconfigure a high-resolution phase mask such as a spatial light modulator or digital micromirror device. The process of updating the spatial mode of light can be greatly accelerated by multiplexing a set of static phase masks with a fast, image-preserving optical switch, such as an acousto-optic modulator (AOM). We experimentally realize this approach, using a double-pass AOM to generate one of five orbital angular momentum states with a switching rate of up to 500 kHz. We then apply this system to perform fast quantum state tomography of spatial modes of light in a 2-dimensional Hilbert space, by projecting the unknown state onto six spatial modes comprising three mutually unbiased bases. We are able to reconstruct arbitrary states in under 1 ms with an average fidelity of 96.9%.	翻訳日:2023-05-07 12:43:37 公開日:2020-07-31
# 超伝導ナノワイヤ単光子検出器の2x2多重画素アレイの量子検出器トモグラフィー Quantum detector tomography of a 2x2 multi-pixel array of superconducting nanowire single photon detectors ( http://arxiv.org/abs/2007.16048v1 ) ライセンス: Link先を確認	Timon Schapeler, Jan Philipp Hoepker, Tim J. Bartley	(参考訳) 超伝導ナノワイヤ単光子検出器の商用2x2アレイの量子検出器トモグラフィーを実証する。本研究は, 検出器物理に関係なく, 効率, 暗数, クロストーク確率などの検出器固有値を直接抽出できることを示す。これらの数値は、デバイスの再構成された正の演算子値測定(POVM)の4つの要素から直接識別される。検出器トモグラフィーにより抽出された効率と暗カウント確率の値は,これらの量の独立測定値と良好な一致を示し,クロストーク確率の直感的な操作定義を提供する。最後に,再構成に必要なパラメータを慎重に選択し,データの過度なスムース化を回避する必要があることを示す。 We demonstrate quantum detector tomography of a commercial 2x2 array of superconducting nanowire single photon detectors. We show that detector-specific figures of merit including efficiency, dark-count and cross-talk probabilities can be directly extracted, without recourse to the underlying detector physics. These figures of merit are directly identified from just four elements of the reconstructed positive operator valued measure (POVM) of the device. We show that the values for efficiency and dark-count probability extracted by detector tomography show excellent agreement with independent measurements of these quantities, and we provide an intuitive operational definition for cross-talk probability. Finally, we show that parameters required for the reconstruction must be carefully chosen to avoid oversmoothing the data.	翻訳日:2023-05-07 12:42:28 公開日:2020-07-31
# 単一光子アバランシェダイオードカメラによる量子照明イメージング Quantum illumination imaging with a single-photon avalanche diode camera ( http://arxiv.org/abs/2007.16037v1 ) ライセンス: Link先を確認	Hugo Defienne, Jiuxuan Zhao, Edoardo Charbon, Daniele Faccio	(参考訳) 単光子-バランシェダイオード(SPAD)アレイは、バイオフォトニクス、光学測光、量子光学において必須のツールである。しかし、画素数が少なく、量子効率が低く、フィリング係数も小さいため、実用的なイメージングへの応用は妨げられている。本稿では,100kピクセルspadカメラを用いたフルフィールドエンタングル光子対相関イメージングを示す。 5億対以上の位置間の光子一致を測定することで、撮像系の全点拡散関数と、空間的に絡み合った光子対によって照らされた対象物の高分解能画像を求める。我々は、我々の撮像手法が成層光に対して堅牢であることを示し、量子イメージング技術が実験室を超えて量子LiDARのような実世界のアプリケーションに移行することを可能にする。 Single-photon-avalanche diode (SPAD) arrays are essential tools in biophotonics, optical ranging and sensing and quantum optics. However, their small number of pixels, low quantum efficiency and small fill factor have so far hindered their use for practical imaging applications. Here, we demonstrate full-field entangled photon pair correlation imaging using a 100-kpixels SPAD camera. By measuring photon coincidences between more than 500 million pairs of positions, we retrieve the full point spread function of the imaging system and subsequently high-resolution images of target objects illuminated by spatially entangled photon pairs. We show that our imaging approach is robust against stray light, enabling quantum imaging technologies to move beyond laboratory experiments towards real-world applications such as quantum LiDAR.	翻訳日:2023-05-07 12:42:14 公開日:2020-07-31
# 強化学習を用いた量子コンパイラの量子ビットルーティング Using Reinforcement Learning to Perform Qubit Routing in Quantum Compilers ( http://arxiv.org/abs/2007.15957v1 ) ライセンス: Link先を確認	Matteo G. Pozzi (1), Steven J. Herbert (1 and 2), Akash Sengupta (3), Robert D. Mullins (1) ((1) University of Cambridge Computer Laboratory, (2) Cambridge Quantum Computing, (3) Department of Engineering, University of Cambridge)	(参考訳) 量子ルーティング(Qubit routing)とは、ターゲットの量子コンピュータの接続制約を満たすために量子回路を変更するタスクである。これはSWAPゲートを回路に挿入することで、論理ゲートが隣接する物理量子ビット間でのみ発生するようにする。 SWAPゲートが付加する回路深度を最小化することが目的である。本稿では,深層q学習パラダイムの修正版を用いた量子ビットルーティング手法を提案する。このシステムは、ランダム回路とリアル回路の両方で現在利用可能な最も先進的な量子コンパイラの2つから、短期的なアーキテクチャサイズでqubitルーティング手順を上回ることができる。 "Qubit routing" refers to the task of modifying quantum circuits so that they satisfy the connectivity constraints of a target quantum computer. This involves inserting SWAP gates into the circuit so that the logical gates only ever occur between adjacent physical qubits. The goal is to minimise the circuit depth added by the SWAP gates. In this paper, we propose a qubit routing procedure that uses a modified version of the deep Q-learning paradigm. The system is able to outperform the qubit routing procedures from two of the most advanced quantum compilers currently available, on both random and realistic circuits, across near-term architecture sizes.	翻訳日:2023-05-07 12:41:12 公開日:2020-07-31
# 起業金融研究における西洋的イデオロギー的均質性--高引用出版物からのエビデンス Western ideological homogeneity in entrepreneurial finance research: Evidence from highly cited publications ( http://arxiv.org/abs/2008.00016v1 ) ライセンス: Link先を確認	Minh-Hoang Nguyen, Huyen Thanh T. Nguyen, Thanh-Hang Pham, Manh-Toan Ho and Quan-Hoang Vuong	(参考訳) 起業家はグローバルな持続可能な開発において重要な役割を果たすが、限られた金融資源は業績と生存率を制限している。したがって、起業家金融の規律は、金融と起業家精神の関係を探求するために生まれます。起業家精神のグローバルな存在にもかかわらず、起業家金融の文献は西洋のイデオロギー的均質であると疑われている。本研究の目的は、起業家金融文学における西洋のイデオロギー的均質性の存在を検討することである。我々は,マインドスポンジ機構と文献分析(Y-インデックスと社会構造)を用いて,Web of Scienceデータベースから抽出された412の高度に引用された論文を分析し,起業家金融の中核イデオロギーの集合における異質性に対する弱い耐性と西洋イデオロギーの優位性を見出した。これらの結果は著者、機関、国レベルで一致しており、この分野における西洋のイデオロギー的同質性の存在の強い証拠を示している。筆者らは,イデオロギー的均質性の欠点を避けるため,研究トピックの多様化と知識交換の促進を積極的に行うことを推奨する。さらに、他の科学分野におけるイデオロギーの多様性を評価できる方法として、マインドスポンジ機構の合成と文献分析が提案されている。 Entrepreneurs play crucial roles in global sustainable development, but limited financial resources constrain their performance and survival rate. Entrepreneurial finance discipline is, therefore, born to explore the connection between finance and entrepreneurship. Despite the global presence of entrepreneurship, the literature of entrepreneurial finance is suspected to be Western ideologically homogenous. Thus, the objective of this study is to examine the existence of Western ideological homogeneity in entrepreneurial finance literature. Employing the mindsponge mechanism and bibliometric analyses (Y-index and social structure), we analyze 412 highly cited publications extracted from Web of Science database and find Western ideological dominance as well as weak tolerance towards heterogeneity in the set of core ideologies of entrepreneurial finance. These results are consistent across author-, institution-, and country-levels, which reveals strong evidence for the existence of Western ideological homogeneity in the field. We recommend editors, reviewers, and authors to have proactive actions to diversify research topics and enhancing knowledge exchange to avoid the shortfalls of ideological homogeneity. Moreover, the synthesis of mindsponge mechanism and bibliometric analyses are suggested as a possible way to evaluate the state of ideological diversity in other scientific disciplines.	翻訳日:2023-05-07 12:35:24 公開日:2020-07-31
# 放射状schr\"odinger方程式のユニタリ、連続体、定常摂動理論 Unitary, continuum, stationary perturbation theory for the radial Schr\"odinger equation ( http://arxiv.org/abs/2008.01831v1 ) ライセンス: Link先を確認	Scott E. Hoffmann	(参考訳) ポアンカル群生成子の可換体は、単位変換が自由生成元と相互作用する相対論的理論の生成元を関連付ける場合、形式的に変化しない。非相対論的な場合、生成体のユニタリ変換の概念をテストし、自由かつ相互作用するハミルトン多様体はユニタリ変換によって関連付ける必要がある。他の著者はこの概念を時間依存摂動理論に適用し、摂動論における時間発展作用素の各々の順序へのユニタリティを与え、標準摂動理論よりも改善することを示した。この場合、定常摂動理論は球対称ポテンシャルから散乱するための放射状シュレーディンガー方程式の近似解を見つけるために構成することができる。カップリング定数において、第1および第2次位相シフトに対して一般式を求める。本研究では,S波位相シフトに対する第1次および第2次コントリビューションと,対応する正解の第2次への拡張との完全な一致を求める。 The commutators of the Poincar\'e group generators will be unchanged in form if a unitary transformation relates the free generators to the generators of an interacting relativistic theory. We test the concept of unitary transformations of generators in the nonrelativistic case, requiring that the free and interacting Hamiltonians be related by a unitary transformation. Other authors have applied this concept to time-dependent perturbation theory to give unitarity of the time evolution operator to each order in perturbation theory, with results that show improvement over the standard perturbation theory. In our case, a stationary perturbation theory can be constructed to find approximate solutions of the radial Schr\"odinger equation for scattering from a spherically symmetric potential. General formulae are obtained for the phase shifts at first and second order in the coupling constant. We test the method on a simple system with a known exact solution and find complete agreement between our first- and second-order contributions to the s-wave phase shifts and the corresponding expansion to second order of the exact solution.	翻訳日:2023-05-07 12:24:47 公開日:2020-07-31
# トラップ型超伝導光子検出器による捕捉イオン量子状態の読み出し State Readout of a Trapped Ion Qubit Using a Trap-Integrated Superconducting Photon Detector ( http://arxiv.org/abs/2008.00065v1 ) ライセンス: Link先を確認	S. L. Todaro, V. B. Verma, K. C. McCormick, D. T. C. Allcock, R. P. Mirin, D. J. Wineland, S. W. Nam, A. C. Wilson, D. Leibfried, and D. H. Slichter	(参考訳) トラップ型光子検出器を用いた捕捉イオン量子ビットの高忠実度状態読み出しについて報告する。このトラップ構造に作製された超伝導ナノワイヤ単光子検出器(SNSPD)を用いた状態依存型イオン蛍光光子を数えることにより、表面電界rfイオントラップに保持される1つの$^9$Be$^+$イオンの超微細量子状態を決定する。平均読み出し忠実度は 0.9991(1) であり、平均読み出し期間は 46 $\mu$s であり、読み出しレーザビームの偏光不純物とオフ共振光励起によって制限される。イオンと検出器の間に干渉する光学素子がないため、イオン蛍光を自己校正光源として利用し、検出器の量子効率と光子入射角と偏光への依存性を決定することができる。 We report high-fidelity state readout of a trapped ion qubit using a trap-integrated photon detector. We determine the hyperfine qubit state of a single $^9$Be$^+$ ion held in a surface-electrode rf ion trap by counting state-dependent ion fluorescence photons with a superconducting nanowire single-photon detector (SNSPD) fabricated into the trap structure. The average readout fidelity is 0.9991(1), with a mean readout duration of 46 $\mu$s, and is limited by the polarization impurity of the readout laser beam and by off-resonant optical pumping. Because there are no intervening optical elements between the ion and the detector, we can use the ion fluorescence as a self-calibrated photon source to determine the detector quantum efficiency and its dependence on photon incidence angle and polarization.	翻訳日:2023-05-07 12:24:04 公開日:2020-07-31
# データから知識から行動へ:スマートグリッドの実現 From Data to Knowledge to Action: Enabling the Smart Grid ( http://arxiv.org/abs/2008.00055v1 ) ライセンス: Link先を確認	Randal E. Bryant, Randy H. Katz, Chase Hensel, and Erwin P. Gianchandani	(参考訳) 我が国の発電、送電、配電のためのインフラである「グリッド」は、何世紀にもわたる技術に基づく遺物である。大規模プラントによる高価で中央集権的な発電と、大規模な送電・流通システムで構成されている。要求がどうであれ、すべての加入者に同時に高品質な電力を提供することを試みており、それゆえ、各配布ポイントにおけるピークアグリゲーション需要までのサイズでなければならない。最終的にシステムはエンドツーエンドの同期を必要とするため、"バッフィング(buffering)"エネルギを格納するメカニズムが欠如しており、グリッド間の共有や"上流(upstream)"停止時の独立操作を複雑にしている。最近のブラックアウトは、既存のグリッドの問題を示している。さらに、この構造は太陽や風といった再生可能エネルギー源の高度に可変な性質に対応できない。多くの人々は、電気エネルギーの生成、分配、消費のためのより分散的で適応的で市場ベースのインフラである「スマートグリッド」に期待を向けている。この新しいアプローチは、既存の配電システムに比べて環境への影響を低減しつつ、効率とレジリエンスを高めるように設計されている。スマートグリッドの当初の計画では、既存の情報技術を広く活用することを示唆している。特に、データ分析の最近の進歩は、データマイニング、機械学習などである。エネルギーの使用方法や現在のエネルギーグリッドに課している要求の種類に関する豊富なデータを理解するのに役立ち、スマートグリッドを大幅に強化し、最終的にはその影響を増幅する可能性を持っている。ここでは、電力網が10年でどう見えるか、特に、データ分析アプローチに対する連邦政府の投資が、このビジョンを実現する上でいかに重要かを説明します。 Our nation's infrastructure for generating, transmitting, and distributing electricity - "The Grid" - is a relic based in many respects on century-old technology. It consists of expensive, centralized generation via large plants, and a massive transmission and distribution system. It strives to deliver high-quality power to all subscribers simultaneously - no matter what their demand - and must therefore be sized to the peak aggregate demand at each distribution point. Ultimately, the system demands end-to-end synchronization, and it lacks a mechanism for storing ("buffering") energy, thus complicating sharing among grids or independent operation during an "upstream" outage. Recent blackouts demonstrate the existing grid's problems - failures are rare but spectacular. Moreover, the structure cannot accommodate the highly variable nature of renewable energy sources such as solar and wind. Many people are pinning their hopes on the "smart grid" - i.e., a more distributed, adaptive, and market-based infrastructure for the generation, distribution, and consumption of electrical energy. This new approach is designed to yield greater efficiency and resilience, while reducing environmental impact, compared to the existing electricity distribution system. Initial plans for the smart grid suggest it will make extensive use of existing information technology. In particular, recent advances in data analytics - i.e., data mining, machine learning, etc. - have the potential to greatly enhance the smart grid and, ultimately, amplify its impact, by helping us make sense of an increasing wealth of data about how we use energy and the kinds of demands that we are placing upon the current energy grid. Here we describe what the electricity grid could look like in 10 years, and specifically how Federal investment in data analytics approaches are critical to realizing this vision.	翻訳日:2023-05-07 12:23:49 公開日:2020-07-31
# 関連するOTOC作用素:古典力学のフットプリント Relevant OTOC operators: footprints of the classical dynamics ( http://arxiv.org/abs/2008.00046v1 ) ライセンス: Link先を確認	Pablo D. Bergamasco, Gabriel G. Carlo and Alejandro M. F. Rivas	(参考訳) out-of-time order correlator (otoc) は最近、量子情報のスクランブルと絡み合いに関連付けられた様々な領域で関連づけられている。また、量子複雑性の指標として提案されている。この意味で、OTOC-REの定理は、作用素の完備基底にまとめられたOTOCを第二レニイエントロピーに関連付ける。ここでは、パウリ、リフレクション、翻訳演算子で構築されたような物理的に意味のある基底上でOTOC-RE対応を研究した。この進化は、異なるダイナミクスを持つ2つの摂動と結合されたアーノルドキャットマップからなるパラダイム的二成分系によって与えられる。関係作用素の小さな集合上の和は、エントロピーの非常によい近似を得るのに十分であり、したがって、時間 t 0 まで、力学の性格を明らかにするのに十分であることを示す。逆に、これは複雑さの別の自然な指標、すなわち時間に伴う関連する演算子の数のスケーリングを提供する。位相空間で表されるとき、これらの集合は、選択された基底に応じて深さの異なる古典力学足跡を明らかにする。 The out-of-time order correlator (OTOC) has recently become relevant in different areas where it has been linked to scrambling of quantum information and entanglement. It has also been proposed as a good indicator of quantum complexity. In this sense, the OTOC-RE theorem relates the OTOCs summed over a complete base of operators to the second Renyi entropy. Here we have studied the OTOC-RE correspondence on physically meaningful bases like the ones constructed with the Pauli, reflection, and translation operators. The evolution is given by a paradigmatic bi-partite system consisting of two perturbed and coupled Arnold cat maps with different dynamics. We show that the sum over a small set of relevant operators, is enough in order to obtain a very good approximation for the entropy and hence to reveal the character of the dynamics, up to a time t 0 . In turn, this provides with an alternative natural indicator of complexity, i.e. the scaling of the number of relevant operators with time. When represented in phase space, each one of these sets reveals the classical dynamical footprints with different depth according to the chosen base.	翻訳日:2023-05-07 12:23:20 公開日:2020-07-31
# データから知識から行動へ:21世紀のグローバル・エンバーサ From Data to Knowledge to Action: A Global Enabler for the 21st Century ( http://arxiv.org/abs/2008.00045v1 ) ライセンス: Link先を確認	Eric Horvitz and Tom Mitchell	(参考訳) コンピュータと数理科学の進歩は、真の証拠に基づく意思決定を可能にする前例のない能力を生み出した。これらの能力は、科学、社会、政府の課題に関する決定を支援するために、データの大規模なキャプチャと、そのデータの洞察とレコメンデーションへの変換を可能にする。主な進歩は、リッチなデータストリームの可用性の上昇、大量のデータの保存と検索のコストの急落、計算能力とメモリの指数関数的な増加、機械学習と推論を実行するための方法の多さの飛躍などである。これらの進歩は、大量のデータを活用して洞察を創造し、意思決定を導く能力の転換点を生み出しました。商業、科学、教育、芸術、エンターテイメントのWebへの移行により、人間の活動に関する前例のない量の構造化された、構造化されていないデータベースが利用できるようになる。科学において、新しい明らかなパラダイムとセンシング技術は、基本的に新しい種類の低コストセンサー(ゲノムマイクロアレイなど)や、前例のない範囲と解像度を提供するビューアを通じて、大量のデータを作成している。データはデータ中心の分析に大きなチャンスをもたらす。これまでのところ、これらの大規模なデータセットから学習する可能性の表面をひっかいただけだった。意思決定者に対して洞察を提供し、行動やポリシーの質を高めるために、私たちの新しい機能をもっと広範囲にタップする機会がある。 A confluence of advances in the computer and mathematical sciences has unleashed unprecedented capabilities for enabling true evidence-based decision making. These capabilities are making possible the large-scale capture of data and the transformation of that data into insights and recommendations in support of decisions about challenging problems in science, society, and government. Key advances include jumps in the availability of rich streams of data, precipitous drops in the cost of storing and retrieving massive amounts of data, exponential increases in computing power and memory, and jumps in the prowess of methods for performing machine learning and reasoning. These advances have come together to create an inflection point in our ability to harness large amounts of data for generating insights and guiding decision making. The shift of commerce, science, education, art, and entertainment to the web makes available unprecedented quantities of structured and unstructured databases about human activities - much of it available to anyone who wishes to mine it for insights. In the sciences, new evidential paradigms and sensing technologies are making available great quantities of data, via use of fundamentally new kinds of low-cost sensors (e.g., genomic microarrays) or through viewers that provide unprecedented scope and resolution. The data pose a huge opportunity for data-centric analyses. To date, we have only scratched the surface of the potential for learning from these large-scale data sets. Opportunities abound for tapping our new capabilities more broadly to provide insights to decision makers and to enhance the quality of their actions and policies.	翻訳日:2023-05-07 12:23:04 公開日:2020-07-31
# 最初の原理による量子熱力学のモデル:量子ハロまたは小さな環境 The Model of Quantum Thermodynamics From the First Principles: Quantum Halo or Small Environment ( http://arxiv.org/abs/2008.00040v1 ) ライセンス: Link先を確認	Ashot Gevorkyan	(参考訳) 結合系 (js) - `quantum system (qs)+thermal bath (tb) の進化は、ランジュバン・シュル(langevin-schr\"{o}dinger)型の確率微分方程式を満たす複雑な確率的過程の枠組みで考えられている。環境と相互にランダムに相互作用する2つの線形結合振動子をQSとして選択する。相互作用が白色ランダム過程の法則に従う場合、QSの統計パラメータとその環境のすべての構成は、二重積分と二階偏微分方程式の解の形で解析的に実行される。時間依存フォン・ノイマンエントロピーとその一般化の表現は、jsで起こる自己組織化と絡み合い過程を考慮して得られる。数学的には、TB における JS の緩和の結果、小さな量子化された環境が形成されることが証明され、これは QS の継続あるいはハローと解釈できる。連成2本の線形発振器の崩壊により形成されたベル状態は、環境の影響を考慮して構成される。 QSの漸近状態の$(in)$と$(out$)への遷移は、TBの影響を考慮して詳細に研究される。モデル問題の枠組みの中で、第一原理から量子熱力学を構築する可能性は、追加条件を使わずに証明される。 The evolution of the joint system (JS) - ``quantum system (QS)+thermal bath (TB)" is considered in the framework of a complex probabilistic processes that satisfies the stochastic differential equation of the Langevin-Schr\"{o}dinger type. Two linearly coupled oscillators that randomly interact with the environment and with each other are selected as QS. In the case when the interactions obey the law of a white random process, all the construction of the statistical parameters of the QS and its environment are performed analytically in the form of double integrals and solutions of second-order partial differential equations. Expressions of time-dependent von Neumann entropy and its generalization are obtained, taking into account the self-organization and entanglement processes occurring in the JS. It is mathematically proved that as a result of the relaxation of JS in the TB, a small quantized environment is formed, which can be interpreted as a continuation of QS or its halo. Bell states formed as a result of the decay of coupled two linear oscillators are constructed taking into account the influence of the environment. The transitions between $(in)$ and $(out$) asymptotic states of QS are studied in detail taking into account the influence of TB. Within the framework of the model problem, the possibility of constructing quantum thermodynamics from the first principle is proved without using any additional conditions.	翻訳日:2023-05-07 12:22:40 公開日:2020-07-31
# 次世代コンピューティングの可能性と課題 Opportunities and Challenges for Next Generation Computing ( http://arxiv.org/abs/2008.00023v1 ) ライセンス: Link先を確認	Gregory D. Hager, Mark D. Hill, and Katherine Yelick	(参考訳) コンピューティングは、ビジネスや農業からコミュニケーション、エンターテイメントに至るまで、私たちの生活のほぼすべての側面を劇的に変えました。国家としては、エネルギー、輸送、防衛のためのシステム設計におけるコンピューティングに依存しており、コンピューティングは世界の根本的な理解を改善し、健康と環境における大きな課題に対するソリューションの開発を支援する科学的発見を促進する。なぜなら、私たちのイノベーションは、過去数十年で性能とコストパフォーマンスが100万倍に向上したコンピュータ上で実行することができるからです。この背景にある推進力はムーアの法則と呼ばれるチップ毎のトランジスタの倍増を繰り返している。デナード・スケーリング(Dennard Scaling)は、これらのパフォーマンスの倍増をほぼ一定のパワーで実現したイネーブルだが、いずれのトレンドも課題に直面している。過去30年間のこの2つのトレンドの影響について考えてみましょう。 1980年代のスーパーコンピュータ(例えばCray 2)は2Gflops近くで評価され、200KW近い電力を消費した。当時、気象予報から核兵器研究まで、高性能で全国規模の用途に使われていた。同じような性能のコンピュータがポケットに収まり、消費電力は10ワット以下になった。つまり、ペタフロロップスケールのマシン(例えば1Pflop(=1015オペレーション/秒)のパフォーマンスに約500KWを必要とするCray XK7)を取り込み、そのプロセスを繰り返すことになる。そんなコンピューターをポケットに入れたら何ができますか。高容量コンピューティングの状況をどのように変えるのか? 本稿では,パーソナル・スケール・コンピューティングと国家規模コンピューティングの双方において,劇的なパフォーマンス向上の機会と課題を明らかにし,この規模のコンピューティングを実現する上での「アウト・オブ・ザ・ボックス」の可能性について論じる。 Computing has dramatically changed nearly every aspect of our lives, from business and agriculture to communication and entertainment. As a nation, we rely on computing in the design of systems for energy, transportation and defense; and computing fuels scientific discoveries that will improve our fundamental understanding of the world and help develop solutions to major challenges in health and the environment. Computing has changed our world, in part, because our innovations can run on computers whose performance and cost-performance has improved a million-fold over the last few decades. A driving force behind this has been a repeated doubling of the transistors per chip, dubbed Moore's Law. A concomitant enabler has been Dennard Scaling that has permitted these performance doublings at roughly constant power, but, as we will see, both trends face challenges. Consider for a moment the impact of these two trends over the past 30 years. A 1980's supercomputer (e.g. a Cray 2) was rated at nearly 2 Gflops and consumed nearly 200 KW of power. At the time, it was used for high performance and national-scale applications ranging from weather forecasting to nuclear weapons research. A computer of similar performance now fits in our pocket and consumes less than 10 watts. What would be the implications of a similar computing/power reduction over the next 30 years - that is, taking a petaflop-scale machine (e.g. the Cray XK7 which requires about 500 KW for 1 Pflop (=1015 operations/sec) performance) and repeating that process? What is possible with such a computer in your pocket? How would it change the landscape of high capacity computing? In the remainder of this paper, we articulate some opportunities and challenges for dramatic performance improvements of both personal to national scale computing, and discuss some "out of the box" possibilities for achieving computing at this scale.	翻訳日:2023-05-07 12:22:15 公開日:2020-07-31
# モノのインターネットのトレンドの加速による安全、セキュリティ、プライバシの脅威 Safety, Security, and Privacy Threats Posed by Accelerating Trends in the Internet of Things ( http://arxiv.org/abs/2008.00017v1 ) ライセンス: Link先を確認	Kevin Fu, Tadayoshi Kohno, Daniel Lopresti, Elizabeth Mynatt, Klara Nahrstedt, Shwetak Patel, Debra Richardson, and Ben Zorn	(参考訳) IoT(Internet of Things)はすでに、産業や都市、家庭を変革している。全ての産業におけるこの変革の経済的価値は1兆ドルと見積もられ、エネルギー効率、健康、生産性に対する社会的影響は巨大である。相互接続されたスマートデバイスの潜在的な利点は、あらゆるデバイスにセンサーとインテリジェンスを埋め込む際のリスクと悪用の可能性を高める。 iotデバイスの増加に関する主要な問題のひとつは、安全かつ安全に運用するために必要となる複雑さの増加である。この複雑さの増加によって、新しい安全性、セキュリティ、プライバシ、ユーザビリティの課題は、個人が1つのデバイスを保護するだけで直面する困難な課題をはるかに超えます。スマートデバイスやデバイスの集合が引き起こす負の傾向に注目し,セキュリティや物理的安全性,プライバシ,ユーザビリティなどに関わる問題は厳密に相互接続され,4つすべてを同時に対処するソリューションが必要であると主張する。既存の技術に基づく個々のデバイスに対する厳密な安全性とセキュリティ基準が必要である。同様に、個人がデバイスのコレクションを確実に管理する最良の方法を決定する研究は、このようなシステムの今後の展開を導く必要がある。 The Internet of Things (IoT) is already transforming industries, cities, and homes. The economic value of this transformation across all industries is estimated to be trillions of dollars and the societal impact on energy efficiency, health, and productivity are enormous. Alongside potential benefits of interconnected smart devices comes increased risk and potential for abuse when embedding sensing and intelligence into every device. One of the core problems with the increasing number of IoT devices is the increased complexity that is required to operate them safely and securely. This increased complexity creates new safety, security, privacy, and usability challenges far beyond the difficult challenges individuals face just securing a single device. We highlight some of the negative trends that smart devices and collections of devices cause and we argue that issues related to security, physical safety, privacy, and usability are tightly interconnected and solutions that address all four simultaneously are needed. Tight safety and security standards for individual devices based on existing technology are needed. Likewise research that determines the best way for individuals to confidently manage collections of devices must guide the future deployments of such systems.	翻訳日:2023-05-07 12:21:44 公開日:2020-07-31
# side-tuning: 追加サイドネットワークによるネットワーク適応のためのベースライン Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks ( http://arxiv.org/abs/1912.13503v4 ) ライセンス: Link先を確認	Jeffrey O Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik	(参考訳) 望ましいタスクのためにニューラルネットワークをトレーニングする場合、ランダムに初期化された重みから始めるよりも、トレーニング済みのネットワークに適応する方がよい。適応性は、トレーニングデータが不足している場合、単一の学習者が複数のタスクを実行する必要がある場合、あるいはネットワークで事前をエンコードしたい場合に役立つ。ネットワーク適応のための最も一般的なアプローチは、微調整と固定特徴抽出器として事前訓練されたネットワークの利用である。本稿では,その代替案であるサイドチューニングを提案する。サイドチューニングは、(変更されていない)事前トレーニングされたネットワークと総和で融合した軽量な"サイド"ネットワークをトレーニングすることで、事前トレーニングされたネットワークに適応する。この単純な方法は、既存のソリューションと同等かそれ以上に機能し、微調整、固定機能、その他の一般的なアプローチに関する基本的な問題を解決します。特に、サイドチューニングは過度に適合しにくく、漸近的に一貫性があり、漸進的な学習における破滅的な忘れに苦しむことはない。本研究では,インクリメンタル学習 (icifar, itaskonomy),強化学習,模倣学習 (visual navigation in habitat), nlp質問応答学習 (squad v2) およびシングルタスク伝達学習 (taskonomy) など,様々なシナリオにおけるサイドチューニングの性能を示す。 When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights. Adaptation can be useful in cases when training data is scarce, when a single learner needs to perform multiple tasks, or when one wishes to encode priors in the network. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. In this paper, we propose a straightforward alternative: side-tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network via summation. This simple method works as well as or better than existing solutions and it resolves some of the basic issues with fine-tuning, fixed features, and other common approaches. In particular, side-tuning is less prone to overfitting, is asymptotically consistent, and does not suffer from catastrophic forgetting in incremental learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including incremental learning (iCIFAR, iTaskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.	翻訳日:2023-01-16 20:07:19 公開日:2020-07-31
# 非同期イベントベースデータのための微分可能なリカレントサーフェス A Differentiable Recurrent Surface for Asynchronous Event-Based Data ( http://arxiv.org/abs/2001.03455v2 ) ライセンス: Link先を確認	Marco Cannici, Marco Ciccone, Andrea Romanoni, Matteo Matteucci	(参考訳) dynamic vision sensor (dvss) 輝度変化の対象となるピクセルに対応するイベントを非同期にストリームする。古典的な視覚装置とは異なり、シーンの粗い表現を生成する。したがって、標準的なコンピュータビジョンアルゴリズムを適用するには、イベントをフレームやイベントサーフェスに統合する必要がある。これは通常、余分なヒューリスティックを用いてフレームを再構築する手作りのグリッドによって達成される。本稿では,イベントを効率的に処理し,エンドツーエンドのタスク依存型イベントサーフェスを学ぶための,lstm(long short-term memory)セルのグリッドであるmatrix-lstmを提案する。既存の再構成手法と比較すると,MVSECベンチマークでは光学フロー推定の柔軟性や表現性が向上し,N-Carsデータセット上でのイベントベースオブジェクト分類の最先端性が改善されている。 Dynamic Vision Sensors (DVSs) asynchronously stream events in correspondence of pixels subject to brightness changes. Differently from classic vision devices, they produce a sparse representation of the scene. Therefore, to apply standard computer vision algorithms, events need to be integrated into a frame or event-surface. This is usually attained through hand-crafted grids that reconstruct the frame using ad-hoc heuristics. In this paper, we propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently process events and learn end-to-end task-dependent event-surfaces. Compared to existing reconstruction approaches, our learned event-surface shows good flexibility and expressiveness on optical flow estimation on the MVSEC benchmark and it improves the state-of-the-art of event-based object classification on the N-Cars dataset.	翻訳日:2023-01-12 23:14:50 公開日:2020-07-31
# 機械学習モデルに何を尋ねますか? 人文モデル対話に基づくモデル記述のためのユーザニーズの同定 What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations ( http://arxiv.org/abs/2002.05674v3 ) ライセンス: Link先を確認	Micha{\l} Ku\'zba, Przemys{\l}aw Biecek	(参考訳) 最近、eXplainable Artificial Intelligenceの分野では、メソッドが増えている。驚いたことに、彼らの開発は、エンドユーザーのニーズの研究ではなく、モデル開発者によって進められます。ニーズの分析は、完了すれば、オープンな質問の研究ではなく、A/Bテストの形式を取る。人間のオペレータはmlモデルに何を尋ねるのか?」という問いに答えるために,予測モデルの決定を説明する会話システムを提案する。本研究では,タイタニック号の生存確率を予測するための機械学習モデルについて,Dr_antというチャットボットを開発した。モデルのさまざまな側面についてDr_ant氏と話し、予測の背後にある根拠を理解することができます。 1000以上の対話のコーパスを収集し、ユーザが聞きたい最も一般的なタイプの質問を分析します。我々の知る限り、これは人間の操作者のニーズを予測モデルの対話的かつ反復的な対話探索から収集する会話システムを用いた最初の研究である。 Recently we see a rising number of methods in the field of eXplainable Artificial Intelligence. To our surprise, their development is driven by model developers rather than a study of needs for human end users. The analysis of needs, if done, takes the form of an A/B test rather than a study of open questions. To answer the question "What would a human operator like to ask the ML model?" we propose a conversational system explaining decisions of the predictive model. In this experiment, we developed a chatbot called dr_ant to talk about machine learning model trained to predict survival odds on Titanic. People can talk with dr_ant about different aspects of the model to understand the rationale behind its predictions. Having collected a corpus of 1000+ dialogues, we analyse the most common types of questions that users would like to ask. To our knowledge, it is the first study which uses a conversational system to collect the needs of human operators from the interactive and iterative dialogue explorations of a predictive model.	翻訳日:2023-01-03 03:32:56 公開日:2020-07-31
# ノイズブレーカー:ノイズ解析で導かれるグラデーショナルイメージ NoiseBreaker: Gradual Image Denoising Guided by Noise Analysis ( http://arxiv.org/abs/2002.07487v2 ) ライセンス: Link先を確認	Florian Lemarchand, Erwan Nogues and Maxime Pelcat	(参考訳) 完全な教師付きディープラーニングベースのデノイザは現在、最もパフォーマンスの高いイメージデノイザソリューションである。しかし、それらはきれいな参照画像を必要とする。対象雑音が複雑である場合、例えば、未知の一次雑音と未知の強度の混合からなる場合、完全な教師付き解は問題に適したトレーニングセットを構築することの困難さによって制限される。本稿では,画像中の支配ノイズを反復的に検出し,調整されたデノイザーを用いて除去する漸進的デノイジング戦略を提案する。この手法は混合雑音に対する美術ブラインドデノイザーの状態に追従することを示す。さらに, ノイズ解析により, ノイズの種類だけでなく, 騒音強度も効率的に誘導できることを示した。この手法は、遭遇した雑音の性質についての洞察を提供し、既存のデノイザーを新しいノイズの性質で拡張することができる。この特徴により、様々なデノイジングケースに適応する。 Fully supervised deep-learning based denoisers are currently the most performing image denoising solutions. However, they require clean reference images. When the target noise is complex, e.g. composed of an unknown mixture of primary noises with unknown intensity, fully supervised solutions are limited by the difficulty to build a suited training set for the problem. This paper proposes a gradual denoising strategy that iteratively detects the dominating noise in an image, and removes it using a tailored denoiser. The method is shown to keep up with state of the art blind denoisers on mixture noises. Moreover, noise analysis is demonstrated to guide denoisers efficiently not only on noise type, but also on noise intensity. The method provides an insight on the nature of the encountered noise, and it makes it possible to extend an existing denoiser with new noise nature. This feature makes the method adaptive to varied denoising cases.	翻訳日:2022-12-30 20:26:30 公開日:2020-07-31
# 実世界の人間-ロボット協調強化学習 Real-World Human-Robot Collaborative Reinforcement Learning ( http://arxiv.org/abs/2003.01156v2 ) ライセンス: Link先を確認	Ali Shafti, Jonas Tjomsland, William Dudley and A. Aldo Faisal	(参考訳) 人間とインテリジェントロボット(embodied ai)の現実世界における直感的なコラボレーションは、ロボット工学の多くの望ましい応用にとって必須の目的である。明示的なコミュニケーションに関する多くの研究があるが、人間とロボットがどのように暗黙的に相互作用するか、運動適応レベルに焦点を当てている。本研究では,2つの直交軸の回転に動作を制限し,各軸を1人のプレイヤーに割り当てることにより,人間ロボット協調迷路ゲームの現実的な構成について述べる。この結果、人間もエージェントも自分でゲームを解くことはできない。我々は,ロボットエージェントの制御に深層強化学習を用い,実世界のプレイの30分以内に,いかなる事前学習も行わない結果を得る。次に、この設定を用いて、協調ゲームのためのポリシーを共同学習する際に、人間/エージェントの行動と適応に関する体系的な実験を行う。本研究では,人間とロボットエージェント間の時間的相互政治学習の結果を提示し,各参加者のエージェントがゲームプレイの表現として機能することを示す。これにより、エージェントのポリシーと自身のエージェントのポリシーを比較して、エージェントが自分自身と異なるエージェントと遊ぶ場合の成功を関連付けることができます。 The intuitive collaboration of humans and intelligent robots (embodied AI) in the real-world is an essential objective for many desirable applications of robotics. Whilst there is much research regarding explicit communication, we focus on how humans and robots interact implicitly, on motor adaptation level. We present a real-world setup of a human-robot collaborative maze game, designed to be non-trivial and only solvable through collaboration, by limiting the actions to rotations of two orthogonal axes, and assigning each axes to one player. This results in neither the human nor the agent being able to solve the game on their own. We use deep reinforcement learning for the control of the robotic agent, and achieve results within 30 minutes of real-world play, without any type of pre-training. We then use this setup to perform systematic experiments on human/agent behaviour and adaptation when co-learning a policy for the collaborative game. We present results on how co-policy learning occurs over time between the human and the robotic agent resulting in each participant's agent serving as a representation of how they would play the game. This allows us to relate a person's success when playing with different agents than their own, by comparing the policy of the agent with that of their own agent.	翻訳日:2022-12-27 04:40:25 公開日:2020-07-31
# ニューロモルフィックハードウェアを用いたエネルギー効率の高いマップレスナビゲーションのための深層・スパイキングニューラルネットワークの強化共学習 Reinforcement co-Learning of Deep and Spiking Neural Networks for Energy-Efficient Mapless Navigation with Neuromorphic Hardware ( http://arxiv.org/abs/2003.01157v2 ) ライセンス: Link先を確認	Guangzhi Tang, Neelesh Kumar, Konstantinos P. Michmizos	(参考訳) エネルギー効率の良いマップレスナビゲーションは、限られたオンボードリソースで未知の環境を探索するモバイルロボットにとって不可欠である。近年の深部強化学習(DRL)アプローチはナビゲーションに成功しているが、その高エネルギー消費はいくつかのロボットアプリケーションでの使用を制限する。本稿では、スパイクニューラルネットワークのエネルギー効率とDRLの最適性を組み合わせたニューロモルフィックなアプローチを提案し、それをマップレスナビゲーションの学習制御ポリシーでベンチマークする。我々のハイブリッド・フレームワークである深層決定主義的政策勾配(SDDPG)は、スパイキングアクターネットワーク(SAN)と深い批判ネットワークから構成されており、この2つのネットワークは勾配降下を用いて共同で訓練されている。共同学習は、2つのネットワーク間の相乗的情報交換を可能にし、共有表現学習を通じて相互の制限を克服した。アプローチを評価するため、トレーニング済みのSANをIntelのLoihiニューロモルフィックプロセッサにデプロイした。シミュレーションおよび実世界の複雑な環境において,本手法はJetson TX2のDDPGと比較して75倍のエネルギーを消費し,目標への航法成功率も1%から4.2%に向上した。これらの結果は、ニューロモルフィックハードウェアで自律ロボットを制御する脳に触発されたアルゴリズムを設計するための継続的な取り組みを強化する。 Energy-efficient mapless navigation is crucial for mobile robots as they explore unknown environments with limited on-board resources. Although the recent deep reinforcement learning (DRL) approaches have been successfully applied to navigation, their high energy consumption limits their use in several robotic applications. Here, we propose a neuromorphic approach that combines the energy-efficiency of spiking neural networks with the optimality of DRL and benchmark it in learning control policies for mapless navigation. Our hybrid framework, spiking deep deterministic policy gradient (SDDPG), consists of a spiking actor network (SAN) and a deep critic network, where the two networks were trained jointly using gradient descent. The co-learning enabled synergistic information exchange between the two networks, allowing them to overcome each other's limitations through a shared representation learning. To evaluate our approach, we deployed the trained SAN on Intel's Loihi neuromorphic processor. When validated on simulated and real-world complex environments, our method on Loihi consumed 75 times less energy per inference as compared to DDPG on Jetson TX2, and also exhibited a higher rate of successful navigation to the goal, which ranged from 1% to 4.2% and depended on the forward-propagation timestep size. These results reinforce our ongoing efforts to design brain-inspired algorithms for controlling autonomous robots with neuromorphic hardware.	翻訳日:2022-12-27 04:14:33 公開日:2020-07-31
# RGBDフィデューシャルセンシングとリカレントニューラルネットワークを用いた効率的なケーブル駆動手術ロボット Efficiently Calibrating Cable-Driven Surgical Robots with RGBD Fiducial Sensing and Recurrent Neural Networks ( http://arxiv.org/abs/2003.08520v4 ) ライセンス: Link先を確認	Minho Hwang, Brijen Thananjeyan, Samuel Paradis, Daniel Seita, Jeffrey Ichnowski, Danyal Fer, Thomas Low, and Ken Goldberg	(参考訳) Intuitive surgery's da Vinci Research Kit (dVRK) のようなケーブル駆動型手術補助装置(RSA)を用いた手術用サブタスクの自動化は,ケーブルストレッチやヒステリシスなどのケーブル関連効果の制御が困難である。 rgbdセンシングで追跡するエンドエフェクタと腕に3dプリントされたフィドゥクアル座標フレームを配置することで、ロボットを効率的に校正する新しい手法を提案する。関節間の結合と履歴に依存した効果を計測するために, サンプル軌跡からのデータを解析し, モデリングへの13のアプローチを検討する。これらのモデルには線形回帰とlstmリカレントニューラルネットワークが含まれており、それぞれ時間的ウィンドウ長が異なり、補償的フィードバックを提供する。提案手法では,1800試料のデータ収集に31分,モデルトレーニングに1分以下を要した。基準軌道の試験セットの結果は、トレーニングされたモデルが物理ロボットの平均追尾誤差を2.96mmから0.65mmに低減できることを示している。 fls pegトランスファー外科医訓練タスクのオープンループ軌道の実行結果から、最良のモデルは成功率を39.4 %から96.7 %に増加させ、熟練した外科患者に匹敵する性能をもたらすことが示唆された。コードや3Dプリント可能なモデルを含む補助材料はhttps://sites.google.com/berkeley.edu/surgical-calibrationで入手できる。 Automation of surgical subtasks using cable-driven robotic surgical assistants (RSAs) such as Intuitive Surgical's da Vinci Research Kit (dVRK) is challenging due to imprecision in control from cable-related effects such as cable stretching and hysteresis. We propose a novel approach to efficiently calibrate such robots by placing a 3D printed fiducial coordinate frames on the arm and end-effector that is tracked using RGBD sensing. To measure the coupling and history-dependent effects between joints, we analyze data from sampled trajectories and consider 13 approaches to modeling. These models include linear regression and LSTM recurrent neural networks, each with varying temporal window length to provide compensatory feedback. With the proposed method, data collection of 1800 samples takes 31 minutes and model training takes under 1 minute. Results on a test set of reference trajectories suggest that the trained model can reduce the mean tracking error of the physical robot from 2.96 mm to 0.65 mm. Results on the execution of open-loop trajectories of the FLS peg transfer surgeon training task suggest that the best model increases success rate from 39.4 % to 96.7 %, producing performance comparable to that of an expert surgical resident. Supplementary materials, including code and 3D-printable models, are available at https://sites.google.com/berkeley.edu/surgical-calibration	翻訳日:2022-12-22 04:42:02 公開日:2020-07-31
# テスト時のコスト効率の高い機能獲得による異種間意思決定支援 Peri-Diagnostic Decision Support Through Cost-Efficient Feature Acquisition at Test-Time ( http://arxiv.org/abs/2003.14127v2 ) ライセンス: Link先を確認	Gerome Vivar, Kamilia Mullakaeva, Andreas Zwergal, Nassir Navab, and Seyed-Ahmad Ahmadi	(参考訳) 医学におけるコンピュータ支援診断(CADx)アルゴリズムは、医師に患者固有の意思決定支援を提供する。これらのアルゴリズムは通常、高次元マルチモーダル検査データの完全取得後に適用され、しばしば特徴完全性を仮定する。しかし、検査コスト、侵襲性、または徴候の欠如により、このようなケースはめったにない。 CADxのサブプロブレムは,これまでにCADxコミュニティでほとんど注目されていないが,取得段階を含む診断ワークフロー全体において,医師を指導することを目的としている。我々は、医師の視点から「これまで収集された証拠を収集し、最も正確で効率的な診断予測を達成するために、次にどの検査を行うべきか」という質問をモデル化した。本研究では,入力層でのドロップアウトの利用と,テスト時にトレーニングされたネットワークの勾配の統合により,機能の重要性を動的に属性づけする手法を提案する。 2つの公衆医療と2つの合成データセットを用いて,提案手法の有効性を検証する。その結果,提案手法は従来手法よりもコスト効率が高く,全体の精度も高いことがわかった。これは直接的に、患者にとって不要な検査を減らし、医師にとってより早く、よりコストが低く、より正確な意思決定支援に繋がる。 Computer-aided diagnosis (CADx) algorithms in medicine provide patient-specific decision support for physicians. These algorithms are usually applied after full acquisition of high-dimensional multimodal examination data, and often assume feature-completeness. This, however, is rarely the case due to examination costs, invasiveness, or a lack of indication. A sub-problem in CADx, which to our knowledge has received very little attention among the CADx community so far, is to guide the physician during the entire peri-diagnostic workflow, including the acquisition stage. We model the following question, asked from a physician's perspective: "Given the evidence collected so far, which examination should I perform next, in order to achieve the most accurate and efficient diagnostic prediction?". In this work, we propose a novel approach which is enticingly simple: use dropout at the input layer, and integrated gradients of the trained network at test-time to attribute feature importance dynamically. We validate and explain the effectiveness of our proposed approach using two public medical and two synthetic datasets. Results show that our proposed approach is more cost- and feature-efficient than prior approaches and achieves a higher overall accuracy. This directly translates to less unnecessary examinations for patients, and a quicker, less costly and more accurate decision support for the physician.	翻訳日:2022-12-18 00:22:58 公開日:2020-07-31
# 2D-3Dライン対応付き先行LiDARマップにおける単眼カメラの定位 Monocular Camera Localization in Prior LiDAR Maps with 2D-3D Line Correspondences ( http://arxiv.org/abs/2004.00740v2 ) ライセンス: Link先を確認	Huai Yu, Weikun Zhen, Wen Yang, Ji Zhang, Sebastian Scherer	(参考訳) 既存の地図における軽量カメラのローカライゼーションは、視覚ベースのナビゲーションに不可欠である。現在、視覚および視覚慣性オドメトリ(vo\&vio)技術は状態推定のためによく開発されているが、ループ閉包時に必然的に蓄積されたドリフトとポーズジャンプがある。これらの問題を解決するために,直接2D-3D線対応を用いた先行LiDARマップにおける効率的な単眼カメラのローカライズ手法を提案する。画像とLiDAR点雲の出現差とモダリティギャップに対処するため,LDARマップから幾何学的3D線をオフラインに抽出し,ビデオシーケンスからロバストな2D線をオンライン抽出する。 VIOからのポーズ予測により、粗い2D-3D線対応を効率的に得ることができる。次に、カメラポーズと2D-3D対応を、対応の投影誤差を最小化し、出力を拒否することで繰り返し最適化する。 eurocmavデータセットと収集したデータセットにおける実験結果から,提案手法は,構造化された環境でのドリフトやジャンプを蓄積することなく,効率的にカメラポーズを推定できることが示されている。 Light-weight camera localization in existing maps is essential for vision-based navigation. Currently, visual and visual-inertial odometry (VO\&VIO) techniques are well-developed for state estimation but with inevitable accumulated drifts and pose jumps upon loop closure. To overcome these problems, we propose an efficient monocular camera localization method in prior LiDAR maps using direct 2D-3D line correspondences. To handle the appearance differences and modality gaps between LiDAR point clouds and images, geometric 3D lines are extracted offline from LiDAR maps while robust 2D lines are extracted online from video sequences. With the pose prediction from VIO, we can efficiently obtain coarse 2D-3D line correspondences. Then the camera poses and 2D-3D correspondences are iteratively optimized by minimizing the projection error of correspondences and rejecting outliers. Experimental results on the EurocMav dataset and our collected dataset demonstrate that the proposed method can efficiently estimate camera poses without accumulated drifts or pose jumps in structured environments.	翻訳日:2022-12-17 19:13:00 公開日:2020-07-31
# 変形を考慮した3次元モデル埋め込みと検索 Deformation-Aware 3D Model Embedding and Retrieval ( http://arxiv.org/abs/2004.01228v3 ) ライセンス: Link先を確認	Mikaela Angelina Uy and Jingwei Huang and Minhyuk Sung and Tolga Birdal and Leonidas Guibas	(参考訳) 本稿では,与えられた問合せ形状に変形可能な3次元モデルの検索問題を導入し,この検索課題を解決するための新しい深部変形認識埋め込みを提案する。 3Dモデル検索は、ノイズと部分的な3Dスキャンからクリーンで完全な3Dモデルを復元するための基本的な操作である。しかし、3次元形状の有限集合を考えると、クエリに最も近いモデルでさえ満足できないかもしれない。これにより、検索したモデルに適合するように3次元モデル変形技術を適用する動機付けとなる。しかし、多くの3次元変形技術では、元のモデルの重要な特徴を保存し、クエリへの変形モデルの完全適合を防止するために、一定の制限が課されている。この変形モデルとクエリ間のギャップは、典型的なメトリック学習技術では扱えないモデル間の非対称な関係を誘導する。そこで本研究では,位置依存型自我中心距離場を利用して非対称な関係を学習する新しい深層埋め込み手法を提案する。また,組込みネットワークを訓練するための2つの戦略を提案する。これらの手法は、合成データと実データの両方で実験において、他のベースラインよりも優れていることを示す。プロジェクトページはhttps://deformscan2cad.github.io/で閲覧できます。 We introduce a new problem of retrieving 3D models that are deformable to a given query shape and present a novel deep deformation-aware embedding to solve this retrieval task. 3D model retrieval is a fundamental operation for recovering a clean and complete 3D model from a noisy and partial 3D scan. However, given a finite collection of 3D shapes, even the closest model to a query may not be satisfactory. This motivates us to apply 3D model deformation techniques to adapt the retrieved model so as to better fit the query. Yet, certain restrictions are enforced in most 3D deformation techniques to preserve important features of the original model that prevent a perfect fitting of the deformed model to the query. This gap between the deformed model and the query induces asymmetric relationships among the models, which cannot be handled by typical metric learning techniques. Thus, to retrieve the best models for fitting, we propose a novel deep embedding approach that learns the asymmetric relationships by leveraging location-dependent egocentric distance fields. We also propose two strategies for training the embedding network. We demonstrate that both of these approaches outperform other baselines in our experiments with both synthetic and real data. Our project page can be found at https://deformscan2cad.github.io/.	翻訳日:2022-12-17 10:14:06 公開日:2020-07-31
# CLARIAHのオントロジー:歴史・言語・メディアの相互運用性を目指して Ontologies in CLARIAH: Towards Interoperability in History, Language and Media ( http://arxiv.org/abs/2004.02845v2 ) ライセンス: Link先を確認	Albert Mero\~no-Pe\~nuela, Victor de Boer, Marieke van Erp, Richard Zijdeman, Rick Mourits, Willem Melder, Auke Rijpma, Ruben Schalk	(参考訳) デジタル人文科学の最も重要な目標の1つは、研究者に新たな研究課題のためのデータとツールを提供することである。ここでfairの原則は、データが必要な場合に有用なフレームワークを提供する: findable, 様々なソースに散在することが多い; アクセス可能; 一部はオフラインまたはペイウォールの背後にあるのでアクセス可能; 相互運用可能; 標準の知識表現形式と共有語彙を使用する; 適切なライセンスと許可によって再利用する。多様な人文科学領域からのデータの統合は簡単ではなく、「経済の富は18世紀に均等に分配されたか?」「破壊的なメディアイベントを中心に構築された物語は何か?」といった研究課題や、学者の準備段階(データ収集、知識組織、清掃など)を考慮する必要がある。本章では,オランダ国立プロジェクト clariah で開発・統合されたオントロジーとツールについて記述し,パラダイム的データ表現(文体コーパス,構造化データ,マルチメディア)を持つ人文科学(言語学,社会・経済史,メディア研究)の「ピラーズ」という3つの基本領域のデータセットから,これらの問題に対処した。このようなオントロジーとツールを用いて,一般化と再利用性の観点から学んだ教訓を要約する。 One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, either by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework as these state that data needs to be: Findable, as they are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, thus using standard knowledge representation formats and shared vocabularies; and Reusable, through adequate licensing and permissions. Integrating data from diverse humanities domains is not trivial, research questions such as "was economic wealth equally distributed in the 18th century?", or "what are narratives constructed around disruptive media events?") and preparation phases (e.g. data collection, knowledge organisation, cleaning) of scholars need to be taken into account. In this chapter, we describe the ontologies and tools developed and integrated in the Dutch national project CLARIAH to address these issues across datasets from three fundamental domains or "pillars" of the humanities (linguistics, social and economic history, and media studies) that have paradigmatic data representations (textual corpora, structured data, and multimedia). We summarise the lessons learnt from using such ontologies and tools in these domains from a generalisation and reusability perspective.	翻訳日:2022-12-16 07:29:35 公開日:2020-07-31
# never stop learning: ロボット強化学習における微調整の有効性 Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning ( http://arxiv.org/abs/2004.10190v2 ) ライセンス: Link先を確認	Ryan Julian, Benjamin Swanson, Gaurav S. Sukhatme, Sergey Levine, Chelsea Finn, and Karol Hausman	(参考訳) ロボット学習システムの大きな約束の一つは、過ちから学び、絶えず変化する環境に適応できることだ。この可能性にもかかわらず、今日のロボット学習システムのほとんどは、固定ポリシーとしてデプロイされており、デプロイ後に適応されていない。学習済みの振る舞いを、現実世界の新しい環境、オブジェクト、パーセプションに効率的に適応できるだろうか? 本稿では,継続的な適応を促進するロボット学習フレームワークのための手法と実証的証拠を提案する。特に, 背景, 物体形状, 外観, 照明条件, ロボット形態の変化など, オフポリシー強化学習による微調整により, 視覚に基づくロボット操作ポリシを新たなバリエーションに適用する方法を実証する。さらに、この適応はタスクをゼロから学習するために必要なデータの0.2%未満を使用する。私たちは、事前トレーニングされたポリシーを適用するアプローチが、微調整の過程でかなりのパフォーマンス向上につながること、rlによる事前トレーニングが不可欠であることを見出します。また、これらの肯定的な結果は、連続的な学習環境に限られており、新しいタスクの連続したデータを用いて、単一のポリシーの行を繰り返し微調整する。我々の経験的な結論は、シミュレーション操作タスクの実験と、580,000の把持で事前訓練された実際のロボット把持システムに関する52のユニークな微調整実験によって一貫して支持されている。 One of the great promises of robot learning systems is that they will be able to learn from their mistakes and continuously adapt to ever-changing environments. Despite this potential, most of the robot learning systems today are deployed as a fixed policy and they are not being adapted after their deployment. Can we efficiently adapt previously learned behaviors to new environments, objects and percepts in the real world? In this paper, we present a method and empirical evidence towards a robot learning framework that facilitates continuous adaption. In particular, we demonstrate how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning, including changes in background, object shape and appearance, lighting conditions, and robot morphology. Further, this adaptation uses less than 0.2% of the data necessary to learn the task from scratch. We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning, and that pre-training via RL is essential: training from scratch or adapting from supervised ImageNet features are both unsuccessful with such small amounts of data. We also find that these positive results hold in a limited continual learning setting, in which we repeatedly fine-tune a single lineage of policies using data from a succession of new tasks. Our empirical conclusions are consistently supported by experiments on simulated manipulation tasks, and by 52 unique fine-tuning experiments on a real robotic grasping system pre-trained on 580,000 grasps.	翻訳日:2022-12-11 05:55:09 公開日:2020-07-31
# DR-SPAAM:2次元距離データにおける人物検出のための空間的・自己回帰モデル DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data ( http://arxiv.org/abs/2004.14079v2 ) ライセンス: Link先を確認	Dan Jia, Alexander Hermans, and Bastian Leibe	(参考訳) 2次元LiDARを用いた人物検出は,2次元範囲データの低情報化による課題である。 LiDAR点のばらつきによる問題を緩和するため、現在の最先端手法は複数の過去のスキャンを融合させ、組み合わせたスキャンを用いて検出を行う。このような後方視の融合の欠点は、すべてのスキャンを明示的にアライメントする必要があること、そして必要なアライメント操作によってパイプライン全体のコストが高くなることだ。本稿では,異なるタイミングで取得したスキャンを組み合わせるための代替戦略を用いた人物検出ネットワークを提案する。距離ロバスト空間注意自動回帰モデル (DR-SPAAM) は前方視のパラダイムに従っている。中間機能をbackboneネットワークからテンプレートとして保持し、新しいスキャンが利用可能になったときにテンプレートを更新します。更新されたフィーチャーテンプレートは、現在シーンにいる人を検出するために使用される。 DROWデータセットでは,既存の最先端技術よりも約4倍高速で,専用GPUを搭載したラップトップで87.2FPS,NVIDIA Jetson AGX組み込みGPUで22.6FPSで動作する。 PyTorchと、事前トレーニングされたモデルを含むROSノードでコードをリリースします。 Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the necessary alignment operation makes the whole pipeline more expensive -- often too expensive for real-world applications. In this paper, we propose a person detection network which uses an alternative strategy to combine scans obtained at different times. Our method, Distance Robust SPatial Attention and Auto-regressive Model (DR-SPAAM), follows a forward looking paradigm. It keeps the intermediate features from the backbone network as a template and recurrently updates the template when a new scan becomes available. The updated feature template is in turn used for detecting persons currently in the scene. On the DROW dataset, our method outperforms the existing state-of-the-art, while being approximately four times faster, running at 87.2 FPS on a laptop with a dedicated GPU and at 22.6 FPS on an NVIDIA Jetson AGX embedded GPU. We release our code in PyTorch and a ROS node including pre-trained models.	翻訳日:2022-12-08 14:29:09 公開日:2020-07-31
# 前歯部光コヒーレンス断層撮影における隅角閉鎖緑内障の評価 AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography ( http://arxiv.org/abs/2005.02258v3 ) ライセンス: Link先を確認	Huazhu Fu, Fei Li, Xu Sun, Xingxing Cao, Jingan Liao, Jose Ignacio Orlando, Xing Tao, Yuexiang Li, Shihao Zhang, Mingkui Tan, Chenglang Yuan, Cheng Bian, Ruitao Xie, Jiongcheng Li, Xiaomeng Li, Jing Wang, Le Geng, Panming Li, Huaying Hao, Jiang Liu, Yan Kong, Yongyong Ren, Hrvoje Bogunovic, Xiulan Zhang, Yanwu Xu	(参考訳) アングル閉鎖緑内障(ACG)は開角緑内障よりも攻撃的な疾患であり、前室角度(ACA)の異常な解剖学的構造が眼圧を上昇させ、徐々に緑内障を発症し、最終的には視力障害や盲目を引き起こす。 Anterior Segment Optical Coherence Tomography (AS-OCT) は、開角度から角度閉鎖を識別する高速で接触のない方法を提供する。緑内障の診断のために多くの医用画像解析アルゴリズムが開発されたが、AS-OCTイメージングに焦点を当てた研究はごくわずかである。特に、既存のメソッドを統一的に評価するための公開as-octデータセットは存在せず、アングルクロージャ検出と評価のための自動化技術の開発の進捗を制限している。そこで我々は,MICCAI 2019と共同で開催したAngle closure Glaucoma Evaluation Challenge (AGE)を組織した。年齢課題は, 強膜刺激の局在と角度閉鎖分類の2つの課題から成っていた。そこで我々は199人の患者から4800個の注釈付きAS-OCT画像の大規模なデータセットを公開し、異なるモデルをベンチマークし比較するための評価フレームワークを提案した。 AGEチャレンジでは200以上のチームがオンラインに登録され、1100以上の結果がオンライン評価のために提出された。最終的に8チームがオンサイトチャレンジに参加した。本稿では,これらの8つの課題を要約し,その2つの課題に対して対応する結果を解析する。我々はさらに制限と今後の方向性について議論する。 AGEチャレンジでは,最高性能のユークリッド距離は平均10ピクセル (10um) であり,角度クロージャ分類のタスクでは,全てのアルゴリズムが良好な性能を達成し,100%の精度を得ることができた。 Angle closure glaucoma (ACG) is a more aggressive disease than open-angle glaucoma, where the abnormal anatomical structures of the anterior chamber angle (ACA) may cause an elevated intraocular pressure and gradually lead to glaucomatous optic neuropathy and eventually to visual impairment and blindness. Anterior Segment Optical Coherence Tomography (AS-OCT) imaging provides a fast and contactless way to discriminate angle closure from open angle. Although many medical image analysis algorithms have been developed for glaucoma diagnosis, only a few studies have focused on AS-OCT imaging. In particular, there is no public AS-OCT dataset available for evaluating the existing methods in a uniform way, which limits progress in the development of automated techniques for angle closure detection and assessment. To address this, we organized the Angle closure Glaucoma Evaluation challenge (AGE), held in conjunction with MICCAI 2019. The AGE challenge consisted of two tasks: scleral spur localization and angle closure classification. For this challenge, we released a large dataset of 4800 annotated AS-OCT images from 199 patients, and also proposed an evaluation framework to benchmark and compare different models. During the AGE challenge, over 200 teams registered online, and more than 1100 results were submitted for online evaluation. Finally, eight teams participated in the onsite challenge. In this paper, we summarize these eight onsite challenge methods and analyze their corresponding results for the two tasks. We further discuss limitations and future directions. In the AGE challenge, the top-performing approach had an average Euclidean Distance of 10 pixels (10um) in scleral spur localization, while in the task of angle closure classification, all the algorithms achieved satisfactory performances, with two best obtaining an accuracy rate of 100%.	翻訳日:2022-12-06 14:17:26 公開日:2020-07-31
# 自動マルチラベル分類法のロバストな実験評価 A Robust Experimental Evaluation of Automated Multi-Label Classification Methods ( http://arxiv.org/abs/2005.08083v2 ) ライセンス: Link先を確認	Alex G. C. de S\'a, Cristiano G. Pimenta, Gisele L. Pappa and Alex A. Freitas	(参考訳) 機械学習(AutoML)は、与えられた学習タスクに対するアルゴリズムの選択と設定を扱うために登場した。 AutoMLの進歩に伴い、特に従来の分類や回帰問題に対していくつかの効果的な手法が導入された。 AutoMLの成功とは別に、いくつかの問題が未解決のままである。特に問題の一つは、さまざまなタイプのデータを扱うautomlメソッドが欠如していることだ。このシナリオに基づいて,マルチラベル分類(MLC)問題に対するAutoMLにアプローチする。 mlcでは、それぞれの例が複数のクラスラベルに同時に関連付けられるが、標準的な分類タスクとは異なり、例は1つのクラスラベルに関連付けられる。本研究では,14のデータセットと3つの設計された検索空間に対して,2つの進化的手法,1つのベイズ最適化法,1つのランダム探索法,1つのグリージー探索法と5つの自動多ラベル分類法の比較を行った。全体として、最も顕著な方法は、標準文法に基づく遺伝的プログラミング(GGP)探索法、すなわちAuto-MEKA$_{GGP}$に基づくものである。 Auto-MEKA$_{GGP}$は, 比較において最高の平均値を示し, グリーディ探索法と比較した場合を除き, 異なる探索空間における他の手法よりも統計的に優れていた。 Automated Machine Learning (AutoML) has emerged to deal with the selection and configuration of algorithms for a given learning task. With the progression of AutoML, several effective methods were introduced, especially for traditional classification and regression problems. Apart from the AutoML success, several issues remain open. One issue, in particular, is the lack of ability of AutoML methods to deal with different types of data. Based on this scenario, this paper approaches AutoML for multi-label classification (MLC) problems. In MLC, each example can be simultaneously associated to several class labels, unlike the standard classification task, where an example is associated to just one class label. In this work, we provide a general comparison of five automated multi-label classification methods -- two evolutionary methods, one Bayesian optimization method, one random search and one greedy search -- on 14 datasets and three designed search spaces. Overall, we observe that the most prominent method is the one based on a canonical grammar-based genetic programming (GGP) search method, namely Auto-MEKA$_{GGP}$. Auto-MEKA$_{GGP}$ presented the best average results in our comparison and was statistically better than all the other methods in different search spaces and evaluated measures, except when compared to the greedy search method.	翻訳日:2022-12-02 12:41:17 公開日:2020-07-31
# 超低リソース自動音声認識のための生成型adversarial training data adaptation Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition ( http://arxiv.org/abs/2005.09256v2 ) ライセンス: Link先を確認	Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara	(参考訳) 言語文化の遺産を保存するために、絶滅危惧言語の音声データを書き起こし、アーカイブすることが重要であり、自動音声認識(asr)はこのプロセスを容易にする強力なツールである。しかし、絶滅危惧言語は一般に多くの話者を持つ大きなコーパスを持たないため、訓練されたASRモデルの性能は概してかなり劣っている。それでも、書き起こさなければならない自発的な音声データの多くの記録が残されていることが多い。本研究では,この話者スパーシティ問題を解決するために,学習音声データ全体を変換し,テスト話者のように聞こえるようにし,高精度なasrシステムを構築することを提案する。本研究では,CycleGANをベースとした非並列音声変換技術を用いて,テスト話者の音声に近いラベル付きトレーニングデータを構築する。 AinuとMboshiの2つの低リソースコーパスに対する話者適応手法の評価を行った。 Ainu corpusの電話誤り率を35-60%改善し,Mboshi corpusでは40%改善した。このアプローチは、教師なし適応とこれら2つのコーパスによる多言語訓練という、2つの従来の手法よりも優れていた。 It is important to transcribe and archive speech data of endangered languages for preserving heritages of verbal culture and automatic speech recognition (ASR) is a powerful tool to facilitate this process. However, since endangered languages do not generally have large corpora with many speakers, the performance of ASR models trained on them are considerably poor in general. Nevertheless, we are often left with a lot of recordings of spontaneous speech data that have to be transcribed. In this work, for mitigating this speaker sparsity problem, we propose to convert the whole training speech data and make it sound like the test speaker in order to develop a highly accurate ASR system for this speaker. For this purpose, we utilize a CycleGAN-based non-parallel voice conversion technology to forge a labeled training data that is close to the test speaker's speech. We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi. We obtained 35-60% relative improvement in phone error rate on the Ainu corpus, and 40% relative improvement was attained on the Mboshi corpus. This approach outperformed two conventional methods namely unsupervised adaptation and multilingual training with these two corpora.	翻訳日:2022-12-01 14:15:11 公開日:2020-07-31
# 振動データを用いた回転軸の機械学習による不均衡検出 Machine Learning-Based Unbalance Detection of a Rotating Shaft Using Vibration Data ( http://arxiv.org/abs/2005.12742v3 ) ライセンス: Link先を確認	Oliver Mey, Willi Neudeck, Andr\'e Schneider and Olaf Enge-Rosenblatt	(参考訳) 振動センサによる回転機械の故障検出は、早期に機械の損傷を検知し、適切な対策を講じて生産停止を防ぐことができる。機械学習を用いた振動データの解析は、関連する分析労力の大幅な削減と、さらなる診断精度の向上を約束する。ここでは、アンバランス検出のためのアルゴリズムの開発と評価の基礎として使用されるデータセットを公開する。この目的のために3Dプリントホルダを用いて回転軸に様々な大きさのアンバランスを取付けた。速度は近似から近似まで。 630 RPMから2330 RPMの3つのセンサを用いて, 回転軸の振動を毎秒4096値のサンプリング速度で記録した。不均衡強度ごとに開発と評価データセットが利用可能である。このように記録されたデータセットを用いて、完全に接続された畳み込みニューラルネットワーク、Hidden Markov ModelsおよびRandom Forest分類を自動抽出時系列特徴に基づいてテストした。評価データセット上で98.6 %の予測精度で、スケールしたfft変換振動データを入力として受信する完全接続ニューラルネットワークを用いて、最良の結果が得られる。 Fault detection at rotating machinery with the help of vibration sensors offers the possibility to detect damage to machines at an early stage and to prevent production downtimes by taking appropriate measures. The analysis of the vibration data using methods of machine learning promises a significant reduction in the associated analysis effort and a further improvement in diagnostic accuracy. Here we publish a dataset which is used as a basis for the development and evaluation of algorithms for unbalance detection. For this purpose, unbalances of various sizes were attached to a rotating shaft using a 3D-printed holder. In a speed range from approx. 630 RPM to 2330 RPM, three sensors were used to record vibrations on the rotating shaft at a sampling rate of 4096 values per second. A development and an evaluation dataset are available for each unbalance strength. Using the dataset recorded in this way, fully connected and convolutional neural networks, Hidden Markov Models and Random Forest classifications on the basis of automatically extracted time series features were tested. With a prediction accuracy of 98.6 % on the evaluation dataset, the best result could be achieved with a fully-connected neural network that receives the scaled FFT-transformed vibration data as input.	翻訳日:2022-11-29 00:50:13 公開日:2020-07-31
# ニューラルネットワークモデルの抽出に対する保護 A Protection against the Extraction of Neural Network Models ( http://arxiv.org/abs/2005.12782v3 ) ライセンス: Link先を確認	Herv\'e Chabanne and Vincent Despiegel and Linda Guiga	(参考訳) oracleがニューラルネットワーク(nn)にアクセスすると、その基盤となるモデルを抽出することができる。ここでは,基盤となるnnの予測をほとんど変更せず,リバースエンジニアリングのタスクを複雑化させる寄生層を追加することで保護を導入する。提案手法は,畳み込みnnを用いた雑音idマッピングを近似する。新たな寄生層の導入が攻撃を複雑化する理由を説明する。我々は,保護されたNNの性能と精度に関する実験を報告する。 Given oracle access to a Neural Network (NN), it is possible to extract its underlying model. We here introduce a protection by adding parasitic layers which keep the underlying NN's predictions mostly unchanged while complexifying the task of reverse-engineering. Our countermeasure relies on approximating a noisy identity mapping with a Convolutional NN. We explain why the introduction of new parasitic layers complexifies the attacks. We report experiments regarding the performance and the accuracy of the protected NN.	翻訳日:2022-11-28 23:47:14 公開日:2020-07-31
# 時間列回帰と連続正規化流れに対する離散化最適化と最適分散 Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows ( http://arxiv.org/abs/2005.13420v2 ) ライセンス: Link先を確認	Derek Onken and Lars Ruthotto	(参考訳) ニューラルネットワークを用いた時系列回帰と連続正規化フロー(CNF)に対する離散化最適化(Disc-Opt)と最適化分散(Opt-Disc)アプローチの比較を行った。ニューラルODEは、ニューラルネットワーク成分を持つ通常の微分方程式(ODE)である。神経odeのトレーニングは、重みが制御であり、隠れた特徴が状態である最適な制御問題である。トレーニングの各イテレーションでは、odeの前方と後方のタイムを解決し、大量の計算、時間、メモリを必要とする。画像分類タスクにおける Opt-Disc と Disc-Opt のアプローチを比較すると、Gholami et al. (2019) は勾配の精度が保証されているために Disc-Opt が好ましいことを示唆している。本稿では,時系列回帰とCNFの比較を行う。分類とは異なり、これらのタスクにおける有意義なモデルは、CNFの可逆性など、正確な最終的な出力を超える追加の要求を満たす必要がある。数値実験により、注意深い数値処理を行うことで、Opt-Discと同等の性能をトレーニングコストを大幅に削減できることを示した。 Disc-Optはトレーニング時間を39%から97%に減らした7つの問題のうち6つでコストを削減した。 We compare the discretize-optimize (Disc-Opt) and optimize-discretize (Opt-Disc) approaches for time-series regression and continuous normalizing flows (CNFs) using neural ODEs. Neural ODEs are ordinary differential equations (ODEs) with neural network components. Training a neural ODE is an optimal control problem where the weights are the controls and the hidden features are the states. Every training iteration involves solving an ODE forward and another backward in time, which can require large amounts of computation, time, and memory. Comparing the Opt-Disc and Disc-Opt approaches in image classification tasks, Gholami et al. (2019) suggest that Disc-Opt is preferable due to the guaranteed accuracy of gradients. In this paper, we extend the comparison to neural ODEs for time-series regression and CNFs. Unlike in classification, meaningful models in these tasks must also satisfy additional requirements beyond accurate final-time output, e.g., the invertibility of the CNF. Through our numerical experiments, we demonstrate that with careful numerical treatment, Disc-Opt methods can achieve similar performance as Opt-Disc at inference with drastically reduced training costs. Disc-Opt reduced costs in six out of seven separate problems with training time reduction ranging from 39% to 97%, and in one case, Disc-Opt reduced training from nine days to less than one day.	翻訳日:2022-11-28 08:21:21 公開日:2020-07-31
# ロバスト推定と制御のための神経収縮メトリクス:凸最適化アプローチ Neural Contraction Metrics for Robust Estimation and Control: A Convex Optimization Approach ( http://arxiv.org/abs/2006.04361v3 ) ライセンス: Link先を確認	Hiroyasu Tsukamoto and Soon-Jo Chung	(参考訳) 本稿では,ニューラル収縮メトリック(ncm)の概念を用いたロバストな非線形推定と制御のための新しいディープラーニングフレームワークを提案する。 NCMは、非線形システムの指数的安定性に必要な必要十分条件である最適収縮距離の大域的近似に、ディープ・ロング・短期記憶リカレントニューラルネットワークを使用する。この最適性は、オフラインでサンプリングされた収縮測度が、摂動と摂動系軌道の間の定常ユークリッド距離の上界を最小化するための凸最適化問題の解であることに由来する。そこで本稿では,NCMを用いた非線形システムの最適推定器と制御器の設計について述べる。この枠組みの性能はロレンツ振動子状態推定と宇宙船の最適運動計画問題によって示される。 This paper presents a new deep learning-based framework for robust nonlinear estimation and control using the concept of a Neural Contraction Metric (NCM). The NCM uses a deep long short-term memory recurrent neural network for a global approximation of an optimal contraction metric, the existence of which is a necessary and sufficient condition for exponential stability of nonlinear systems. The optimality stems from the fact that the contraction metrics sampled offline are the solutions of a convex optimization problem to minimize an upper bound of the steady-state Euclidean distance between perturbed and unperturbed system trajectories. We demonstrate how to exploit NCMs to design an online optimal estimator and controller for nonlinear systems with bounded disturbances utilizing their duality. The performance of our framework is illustrated through Lorenz oscillator state estimation and spacecraft optimal motion planning problems.	翻訳日:2022-11-24 01:42:26 公開日:2020-07-31
# ハンティントン病における持続発声からの発声マーカー Vocal markers from sustained phonation in Huntington's Disease ( http://arxiv.org/abs/2006.05365v3 ) ライセンス: Link先を確認	Rachid Riad and Hadrien Titeux and Laurie Lemoine and Justine Montillot and Jennifer Hamet Bagnou and Xuan Nga Cao and Emmanuel Dupoux and Anne-Catherine Bachoud-L\'evi	(参考訳) 疾患修正治療は現在神経変性疾患で評価されている。ハンティントン病は、前マンニフェスト遺伝子キャリアにおいてさえ、自動的に臨床マーカーを設計するユニークな機会である。音声障害を臨床マーカーとして検討し, 診断と遺伝子担体について検討した。音声特徴と変調パワースペクトル特徴の2つの特徴セットを使用しました。発声は遺伝子キャリアーのサブクリニカル障害の同定には不十分であることがわかった。以上の結果から, ハンティントン病の臨床成績の予測には, 音像の特徴が適していると考えられた。 Disease-modifying treatments are currently assessed in neurodegenerative diseases. Huntington's Disease represents a unique opportunity to design automatic sub-clinical markers, even in premanifest gene carriers. We investigated phonatory impairments as potential clinical markers and propose them for both diagnosis and gene carriers follow-up. We used two sets of features: Phonatory features and Modulation Power Spectrum Features. We found that phonation is not sufficient for the identification of sub-clinical disorders of premanifest gene carriers. According to our regression results, Phonatory features are suitable for the predictions of clinical performance in Huntington's Disease.	翻訳日:2022-11-23 15:19:59 公開日:2020-07-31
# 差別化可能なレンダリング: 調査 Differentiable Rendering: A Survey ( http://arxiv.org/abs/2006.12057v2 ) ライセンス: Link先を確認	Hiroharu Kato, Deniz Beker, Mihai Morariu, Takahiro Ando, Toru Matsuoka, Wadim Kehl, Adrien Gaidon	(参考訳) ディープニューラルネットワーク(DNN)は、オブジェクト検出やイメージセグメンテーションなどの視覚関連タスクにおいて、顕著なパフォーマンス向上を示している。その成功にもかかわらず、通常は画像を形成する3dオブジェクトの理解が欠如しており、シーンに関する3d情報を収集したり、簡単に注釈を付けることは必ずしも不可能である。微分可能レンダリングは、3dオブジェクトの勾配を画像を通して計算し伝播できる新しいフィールドである。また、3Dデータ収集とアノテーションの要求を減らし、様々なアプリケーションで高い成功率を実現する。本稿では,既存の文献を概観し,微分可能レンダリングの現状,応用,オープンリサーチの問題について考察する。 Deep neural networks (DNNs) have shown remarkable performance improvements on vision-related tasks such as object detection or image segmentation. Despite their success, they generally lack the understanding of 3D objects which form the image, as it is not always possible to collect 3D information about the scene or to easily annotate it. Differentiable rendering is a novel field which allows the gradients of 3D objects to be calculated and propagated through images. It also reduces the requirement of 3D data collection and annotation, while enabling higher success rate in various applications. This paper reviews existing literature and discusses the current state of differentiable rendering, its applications and open research problems.	翻訳日:2022-11-18 06:39:56 公開日:2020-07-31
# SRFlow: 正規化フローによる超解法空間の学習 SRFlow: Learning the Super-Resolution Space with Normalizing Flow ( http://arxiv.org/abs/2006.14200v2 ) ライセンス: Link先を確認	Andreas Lugmayr and Martin Danelljan and Luc Van Gool and Radu Timofte	(参考訳) 超解像度は、与えられた低解像度画像の複数の予測を可能にするため、不適切な問題である。この基本的な事実は、最先端のディープラーニングベースのアプローチによって無視されている。これらの手法は、リコンストラクションと敵対的損失の組み合わせを使って決定論的マッピングを訓練する。そこで本研究では,低解像度入力の出力条件分布を学習可能な正規化フローベースの超解像法であるsrflowを提案する。我々のモデルは、単一の損失、すなわち負のログ類似性を用いて原則的に訓練される。したがって、SRFlowは問題の本質を直接的に説明し、多彩なフォトリアリスティックな高解像度画像を予測することを学ぶ。また,srflowが学習した強像後段を柔軟な画像操作技術として活用し,他の画像からのコンテンツの転送などによる超解像の高分解能化を実現する。我々は, 顔や超解像性全般について, 広範囲にわたる実験を行った。 SRFlowは、PSNRと知覚品質指標の両方の観点から、最先端のGANベースのアプローチよりも優れており、超解解の空間を探索することで多様性を実現する。 Super-resolution is an ill-posed problem, since it allows for multiple predictions for a given low-resolution image. This fundamental fact is largely ignored by state-of-the-art deep learning based approaches. These methods instead train a deterministic mapping using combinations of reconstruction and adversarial losses. In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input. Our model is trained in a principled manner using a single loss, namely the negative log-likelihood. SRFlow therefore directly accounts for the ill-posed nature of the problem, and learns to predict diverse photo-realistic high-resolution images. Moreover, we utilize the strong image posterior learned by SRFlow to design flexible image manipulation techniques, capable of enhancing super-resolved images by, e.g., transferring content from other images. We perform extensive experiments on faces, as well as on super-resolution in general. SRFlow outperforms state-of-the-art GAN-based approaches in terms of both PSNR and perceptual quality metrics, while allowing for diversity through the exploration of the space of super-resolved solutions.	翻訳日:2022-11-17 04:24:38 公開日:2020-07-31
# ピアレビューにおける紙入札最適化のためのスーパーアルゴリズム A SUPER Algorithm to Optimize Paper Bidding in Peer Review ( http://arxiv.org/abs/2007.07079v2 ) ライセンス: Link先を確認	Tanner Fiez, Nihar B. Shah, Lillian Ratliff	(参考訳) 多くのアプリケーションがユーザのシーケンシャルな到着を伴い、各ユーザにアイテムの順序を示す必要がある。主な例(この記事の焦点となる)は、レビュー者が順次システムに入り、各レビュアーが提出された論文のリストを表示し、レビュアーがいくつかの論文をレビューするために "bids" する必要があるカンファレンスピアレビューの入札プロセスである。示されている論文の順序は、プライマシー効果による入札に大きな影響を与える。表示すべき論文の順序を決定する際、競合する2つの目標がある。 (i)各紙に十分な数の入札を得ること、及び (二)関連項目を提示して審査員を満足させる。本稿では,この問題を原則的に研究する枠組みの開発から始める。この目的のために,AアルゴリズムにインスパイアされたSUPERアルゴリズムを提案する。理論的には、アルゴリズムの局所最適性保証を示し、人気ベースラインがかなり最適であることを示す。さらに, 類似性に関するコミュニティモデルでは, SUPER* がほぼ最適であるのに対して, 人気ベースラインはかなり最適であることを示す。 ICLR 2018の実際のデータと合成データの実験では、SUPERは既存のシステムにデプロイされたベースラインをかなり上回り、必要な入札を50～75%以下に減らし、さまざまな現実の複雑さにも堅牢であることがわかった。 A number of applications involve sequential arrival of users, and require showing each user an ordering of items. A prime example (which forms the focus of this paper) is the bidding process in conference peer review where reviewers enter the system sequentially, each reviewer needs to be shown the list of submitted papers, and the reviewer then "bids" to review some papers. The order of the papers shown has a significant impact on the bids due to primacy effects. In deciding on the ordering of papers to show, there are two competing goals: (i) obtaining sufficiently many bids for each paper, and (ii) satisfying reviewers by showing them relevant items. In this paper, we begin by developing a framework to study this problem in a principled manner. We present an algorithm called SUPER, inspired by the A* algorithm, for this goal. Theoretically, we show a local optimality guarantee of our algorithm and prove that popular baselines are considerably suboptimal. Moreover, under a community model for the similarities, we prove that SUPER* is near-optimal whereas the popular baselines are considerably suboptimal. In experiments on real data from ICLR 2018 and synthetic data, we find that SUPER* considerably outperforms baselines deployed in existing systems, consistently reducing the number of papers with fewer than requisite bids by 50-75% or more, and is also robust to various real world complexities.	翻訳日:2022-11-16 07:32:16 公開日:2020-07-31
# メタSAC: メタグラディエントによるソフトアクター・クライトのエントロピー温度の自動調整 Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient ( http://arxiv.org/abs/2007.01932v2 ) ライセンス: Link先を確認	Yufei Wang, Tianwei Ni	(参考訳) 探索探索ジレンマは、強化学習において長い間重要な問題であった。本稿では,これら2つのバランスをとるための新しい手法を提案する。提案手法は,従来のタスク報酬とポリシのエントロピーのバランスをとる「エントロピー温度」を用いて,エクスプロイトと探索のトレードオフを制御するソフトアクタ・クリティカル(SAC)アルゴリズムに基づいている。 SACはこのハイパーパラメータに非常に敏感であることが実証的に示されており、自動調整に制約付き最適化を用いるフォローアップ作業(SAC-v2)にはいくつかの制限がある。提案手法の中核は,SACのエントロピー温度を自動調整するために,メタグラディエントと新しいメタ目的を併用することである。我々は,Meta-SACがいくつかのMujocoベンチマークタスクにおいて有望な性能を達成し,最も困難なタスクの一つであるHumanoid-v2において,SAC-v2を10%以上上回っていることを示す。 Exploration-exploitation dilemma has long been a crucial issue in reinforcement learning. In this paper, we propose a new approach to automatically balance between these two. Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and exploration. It is empirically shown that SAC is very sensitive to this hyperparameter, and the follow-up work (SAC-v2), which uses constrained optimization for automatic adjustment, has some limitations. The core of our method, namely Meta-SAC, is to use metagradient along with a novel meta objective to automatically tune the entropy temperature in SAC. We show that Meta-SAC achieves promising performances on several of the Mujoco benchmarking tasks, and outperforms SAC-v2 over 10% in one of the most challenging tasks, humanoid-v2.	翻訳日:2022-11-14 04:43:20 公開日:2020-07-31
# ModeNet:学習ビデオ符号化のためのモード選択ネットワーク ModeNet: Mode Selection Network For Learned Video Coding ( http://arxiv.org/abs/2007.02532v2 ) ライセンス: Link先を確認	Th\'eo Ladune (IETR), Pierrick Philippe, Wassim Hamidouche (IETR), Lu Zhang (IETR), Olivier D\'eforges (IETR)	(参考訳) 本稿では,深層学習に基づくビデオ圧縮を強化するため,モード選択ネットワーク(ModeNet)を提案する。従来のビデオコーディングにインスパイアされたModeNetの目的は、いくつかのコーディングモード間の競争を可能にすることである。提案したModeNetは,各ピクセルを最も適した符号化モードに割り当てるために使用されるフレームの画素分割を学習し,伝達する。 modenetは異なるコーディングモードと共に訓練され、レート分散コストを最小限に抑える。これは、異なるコーディングツール間の競合を可能にするために、他のシステムに一般化できる柔軟なコンポーネントである。 Mod-eNetの関心は、Pフレームのコーディングタスクで研究され、予測値からフレームをコーディングする手法の設計に使用される。 modenetベースのシステムは、学習画像圧縮2020(clic20)のpフレーム符号化トラック条件で評価することで、魅力的な性能を達成している。 In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component which can be generalized to other systems to allow competition between different coding tools. Mod-eNet interest is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.	翻訳日:2022-11-13 02:26:27 公開日:2020-07-31
# シーンコンテキストによる長期人間の動作予測 Long-term Human Motion Prediction with Scene Context ( http://arxiv.org/abs/2007.03672v3 ) ライセンス: Link先を確認	Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik	(参考訳) 人間の動きはゴール指向であり、シーン内の物体の空間配置に影響される。将来の人間の動きを計画するためには、環境を知覚することが不可欠だ。人間の動きを予測する既存の研究は、シーンの文脈に注意を払わないため、長期的な予測に苦慮している。本研究では,この課題に対処するためのシーンコンテキストを活用する新しい3段階フレームワークを提案する。 1つのシーンイメージと2Dポーズ履歴が与えられた後、まず複数の人間の動作目標を抽出し、各目標に向けて3Dヒューマンパスを計画し、最終的に各パスに続く3Dヒューマンポーズシーケンスを予測する。安定したトレーニングと厳密な評価のために,クリーンアノテーションを用いた多様な合成データセットを寄贈する。合成データと実データの両方において,本手法は既存の手法に比べて一貫した定量的,定性的な改善を示す。 Human movement is goal-directed and influenced by the spatial layout of the objects in the scene. To plan future human motion, it is crucial to perceive the environment -- imagine how hard it is to navigate a new room with lights off. Existing works on predicting human motion do not pay attention to the scene context and thus struggle in long-term prediction. In this work, we propose a novel three-stage framework that exploits scene context to tackle this task. Given a single scene image and 2D pose histories, our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path. For stable training and rigorous evaluation, we contribute a diverse synthetic dataset with clean annotations. In both synthetic and real datasets, our method shows consistent quantitative and qualitative improvements over existing methods.	翻訳日:2022-11-12 20:08:34 公開日:2020-07-31
# AttentionNAS:ビデオ分類のための時空間注意細胞探索 AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification ( http://arxiv.org/abs/2007.12034v2 ) ライセンス: Link先を確認	Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani and Wei Hua	(参考訳) 畳み込み操作には2つの制限がある:(1)同じフィルタを全ての位置に適用する場所を明示的にモデル化しないこと、(2)小さな近傍でのみ動作するような長距離依存性のモデル化には不向きである。どちらの制限も注意操作によって緩和できるが、特にビデオに注意を向ける場合、注意を使用するように多くの設計選択が決定される。ビデオに注意を向ける原理的な方法を目指して,時空間注意細胞探索の課題に対処する。そこで本稿では, セルの様々な設計選択を柔軟に探索できる, 時空間アテンションセルの新しい探索空間を提案する。検出されたアテンションセルは既存のバックボーンネットワーク(例えばI3DやS3D)にシームレスに挿入することができ、Kinetics-600とMiTのデータセットでビデオ分類精度を2%以上改善することができる。検出された注意セルは、両方のデータセット上の非ローカルブロックよりも優れており、異なるモダリティ、バックボーン、データセットにまたがる強力な一般化を示している。注意細胞をI3D-R50に挿入すると、両方のデータセットで最先端のパフォーマンスが得られる。 Convolutional operations have two limitations: (1) do not explicitly model where to focus as the same filter is applied to all the positions, and (2) are unsuitable for modeling long-range dependencies as they only operate on a small neighborhood. While both limitations can be alleviated by attention operations, many design choices remain to be determined to use attention, especially when applying attention to videos. Towards a principled way of applying attention to videos, we address the task of spatiotemporal attention cell search. We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell. The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets. The discovered attention cells outperform non-local blocks on both datasets, and demonstrate strong generalization across different modalities, backbones, and datasets. Inserting our attention cells into I3D-R50 yields state-of-the-art performance on both datasets.	翻訳日:2022-11-07 12:21:16 公開日:2020-07-31
# ディープラーニングモデルに基づく積極的なタスク管理 Proactive Tasks Management based on a Deep Learning Model ( http://arxiv.org/abs/2007.12857v2 ) ライセンス: Link先を確認	Kostas Kolomvatsos, Christos Anagnotopoulos	(参考訳) 広く普及するコンピューティングアプリケーションは、アクティビティを促進するユーザを取り巻くインテリジェンスを扱う。このインテリジェンスは、組み込みシステムやエンドユーザーとの密接な距離にあるデバイスに組み込まれたソフトウェアコンポーネントという形で提供される。 ecノードはiot(internet of things)インフラストラクチャに存在するデバイスが収集したデータに対して,さまざまなタスクを実行することができる。本稿では,要求に基づく知的かつ積極的なタスク管理モデルを提案する。デマンドはECノードで利用可能なタスクの使用に関心のあるユーザやアプリケーションの数を表しており、その人気を特徴づけている。我々は、Deep Machine Learning(DML)モデル、特にLong Short Term Memory(LSTM)ネットワークを利用して、各タスクに対する需要指標の分布を学習し、将来の関心を見積もる。この情報は歴史的な観察と組み合わされ、限られた関心のためにどのタスクがオフロードされるかを決定する意思決定スキームをサポートする。意思決定において、各タスクが割り当てられる処理ノードに追加される可能性のある負荷も考慮に入れておく必要があります。本モデルには,提案機構を評価するための実験シミュレーションが多数添付されている。提案手法は,最も効率的な割当を結論づけつつ,飛べば決定できることを示す。 Pervasive computing applications deal with intelligence surrounding users that can facilitate their activities. This intelligence is provided in the form of software components incorporated in embedded systems or devices in close distance with end users.One example infrastructure that can host intelligent pervasive services is the Edge Computing (EC) infrastructure. EC nodes can execute a number of tasks for data collected by devices present in the Internet of Things (IoT) infrastructure. In this paper, we propose an intelligent, proactive tasks management model based on the demand. Demand depicts the number of users or applications interested in using the available tasks in EC nodes, thus, characterizing their popularity. We rely on a Deep Machine Learning (DML) model and more specifically on a Long Short Term Memory (LSTM) network to learn the distribution of demand indicators for each task and estimate the future interest. This information is combined with historical observations and support a decision making scheme to conclude which tasks will be offloaded due to limited interest on them. We have to notice that in our decision making, we also take into consideration the load that every task may add to the processing node where it will be allocated. The description of our model is accompanied by a large set of experimental simulations for evaluating the proposed mechanism. We provide numerical results and reveal that the proposed scheme is capable of deciding on the fly while concluding the most efficient allocation.	翻訳日:2022-11-07 01:28:04 公開日:2020-07-31
# 深層学習を用いた除草ハーバリウムスカンからの植物器官の検出とアノテーション Detection and Annotation of Plant Organs from Digitized Herbarium Scans using Deep Learning ( http://arxiv.org/abs/2007.13106v2 ) ライセンス: Link先を確認	Sohaib Younis, Marco Schmidt, Claus Weiland, Stefan Dressler, Bernhard Seeger, Thomas Hickler	(参考訳) ヘルバリウム標本がデジタル化され、オンラインリポジトリからアクセスできるようになるにつれて、高度なコンピュータビジョン技術がそれらから情報を抽出するために使用されている。ハーバリウムシート上の特定の植物器官の存在は、様々な科学的文脈において有用な情報であり、これらの臓器の自動認識は、そのような情報の動員に役立つ。本研究では,より高速なr-cnnを用いて,植物器官の検出に深層学習を用いる。実験では,6種類の植物器官に何千ものバウンディングボックスを含む数百のエルバリウムスキャンを手作業で注釈し,植物器官検出モデルの訓練と評価に用いた。モデルは葉や茎で特にうまく機能し、花はシートに多く含まれていたが、同様に認識されなかった。 As herbarium specimens are increasingly becoming digitized and accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts and automatic recognition of these organs will help mobilize such information. In our study we use deep learning to detect plant organs on digitized herbarium specimens with Faster R-CNN. For our experiment we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them for training and evaluating the plant organ detection model. The model worked particularly well on leaves and stems, while flowers were also present in large numbers in the sheets, but not equally well recognized.	翻訳日:2022-11-06 20:11:50 公開日:2020-07-31
# 解剖学的に可変なXCATファントムを用いた3次元心臓MR画像合成のためのXCAT-GAN XCAT-GAN for Synthesizing 3D Consistent Labeled Cardiac MR Images on Anatomically Variable XCAT Phantoms ( http://arxiv.org/abs/2007.13408v2 ) ライセンス: Link先を確認	Sina Amirrajab, Samaneh Abbasi-Sureshjani, Yasmina Al Khalil, Cristian Lorenz, Juergen Weese, Josien Pluim, and Marcel Breeuwer	(参考訳) GAN(Generative Adversarial Network)は、高忠実度画像の合成による有望なデータ濃縮ソリューションを提供する。しかし,新しい解剖学的変化を伴うラベル付き画像の大量生成は未発見のままである。そこで本研究では, 4D eXtended Cardiac と Torso (XCAT) コンピュータ化ヒトファントムを用いて, 解剖学的変化が大きい仮想被験者の集団に心磁気共鳴画像(CMR)を合成する方法を提案する。本研究では,4-classと8-class XCAT-GANの2つの条件付き画像合成手法について検討した。 4-class法は心臓のアノテーションのみに依存し,8-class法では心周囲臓器の多部ラベルマップが予測され,条件付き画像合成のためのより良いガイダンスが得られた。いずれの手法も、条件付きXCAT-GANを実画像と対応するラベルとの組み合わせで訓練し、その後、推論時にXCATから派生したラベルに置き換える。そのため、トレーニングされたネットワークは、組織固有のテクスチャを新しいラベルマップに正確に転送する。終末期期および終末期期における合成cmr画像の33個の仮想被写体を作成し,そのデータの有効性について検討した。その結果, 実画像(40巻)の20%に留まらず, 合成CMR画像の付加によりセグメンテーション性能が保たれることがわかった。さらに, 実データ拡張のための合成画像の利用改善は, ハウスドルフ距離を最大28%削減し, ディススコアを最大5%向上させることで明らかとなり, 全次元において地上の真実と高い類似性を示した。 Generative adversarial networks (GANs) have provided promising data enrichment solutions by synthesizing high-fidelity images. However, generating large sets of labeled images with new anatomical variations remains unexplored. We propose a novel method for synthesizing cardiac magnetic resonance (CMR) images on a population of virtual subjects with a large anatomical variation, introduced using the 4D eXtended Cardiac and Torso (XCAT) computerized human phantom. We investigate two conditional image synthesis approaches grounded on a semantically-consistent mask-guided image generation technique: 4-class and 8-class XCAT-GANs. The 4-class technique relies on only the annotations of the heart; while the 8-class technique employs a predicted multi-tissue label map of the heart-surrounding organs and provides better guidance for our conditional image synthesis. For both techniques, we train our conditional XCAT-GAN with real images paired with corresponding labels and subsequently at the inference time, we substitute the labels with the XCAT derived ones. Therefore, the trained network accurately transfers the tissue-specific textures to the new label maps. By creating 33 virtual subjects of synthetic CMR images at the end-diastolic and end-systolic phases, we evaluate the usefulness of such data in the downstream cardiac cavity segmentation task under different augmentation strategies. Results demonstrate that even with only 20% of real images (40 volumes) seen during training, segmentation performance is retained with the addition of synthetic CMR images. Moreover, the improvement in utilizing synthetic images for augmenting the real data is evident through the reduction of Hausdorff distance up to 28% and an increase in the Dice score up to 5%, indicating a higher similarity to the ground truth in all dimensions.	翻訳日:2022-11-06 08:19:36 公開日:2020-07-31
# 超高解像度画像のセグメンテーション Foveation for Segmentation of Ultra-High Resolution Images ( http://arxiv.org/abs/2007.15124v2 ) ライセンス: Link先を確認	Chen Jin, Ryutaro Tanno, Moucheng Xu, Thomy Mertzanidou, Daniel C. Alexander	(参考訳) 超高解像度画像のセグメンテーションは、その巨大なサイズ、数百万から数十億ピクセルからなるため、難しい。典型的なソリューションとしては、入力イメージを固定サイズのパッチに分割したり、メモリ制約を満たすためにダウンサンプリングしたりする。このような操作は、視野(FoV)、すなわち空間カバレッジと画像解像度において情報損失を引き起こす。しかし、セグメンテーションのパフォーマンスへの影響はまだ検討されていない。本研究では,fovと解像度のトレードオフが超高解像度画像のセグメンテーション性能に影響を及ぼすことを示すモチベーション実験から始める。次に、与えられた超高解像度画像に対して、入力パッチの適切な構成(FoV/解像度トレードオフ)を適応的に選択し、画像の各空間位置で下流セグメンテーションモデルにフィードする、学習可能なデータローダであるFoveationモジュールを導入する。フェーベーションモジュールは、タスク性能を最大化するためにセグメンテーションネットワークと共同で訓練される。我々は、FoV/解像度トレードオフのパッチでトレーニングされた場合よりも、Foveationモジュールがセグメンテーション性能を一貫して向上することを示す3つの公開高解像度画像データセットを実証する。本手法は,deepglobe aerial image dataset における sota 性能を実現する。 Gleason2019 の病理組織学的データセットでは,最も臨床的に重要で曖昧な2つのクラス (Gleason Grade 3 と 4 ) のセグメンテーション精度が13.1%,7.5%向上し,6 名の人間専門家の平均性能を6.5%,7.5%向上させた。私たちのコードとトレーニングされたモデルは、$\text{https://github.com/lxasqjc/Foveation-Segmentation}$で利用可能です。 Segmentation of ultra-high resolution images is challenging because of their enormous size, consisting of millions or even billions of pixels. Typical solutions include dividing input images into patches of fixed size and/or down-sampling to meet memory constraints. Such operations incur information loss in the field-of-view (FoV) i.e., spatial coverage and the image resolution. The impact on segmentation performance is, however, as yet understudied. In this work, we start with a motivational experiment which demonstrates that the trade-off between FoV and resolution affects the segmentation performance on ultra-high resolution images---and furthermore, its influence also varies spatially according to the local patterns in different areas. We then introduce foveation module, a learnable "dataloader" which, for a given ultra-high resolution image, adaptively chooses the appropriate configuration (FoV/resolution trade-off) of the input patch to feed to the downstream segmentation model at each spatial location of the image. The foveation module is jointly trained with the segmentation network to maximise the task performance. We demonstrate on three publicly available high-resolution image datasets that the foveation module consistently improves segmentation performance over the cases trained with patches of fixed FoV/resolution trade-off. Our approach achieves the SoTA performance on the DeepGlobe aerial image dataset. On the Gleason2019 histopathology dataset, our model achieves better segmentation accuracy for the two most clinically important and ambiguous classes (Gleason Grade 3 and 4) than the top performers in the challenge by 13.1% and 7.5%, and improves on the average performance of 6 human experts by 6.5% and 7.5%. Our code and trained models are available at $\text{https://github.com/lxasqjc/Foveation-Segmentation}$.	翻訳日:2022-11-05 20:08:44 公開日:2020-07-31
# 考えることを学ぶか、何をするかを学ぶか--活気ある学習の微妙な基礎 Learning what they think vs. learning what they do: The micro-foundations of vicarious learning ( http://arxiv.org/abs/2007.15264v2 ) ライセンス: Link先を確認	Sanghyun Park and Phanish Puranam	(参考訳) 活気のある学習は組織学習の重要な要素です。私たちは、活発な学習の基礎となる2つの基本的なプロセスを理論化し、モデル化します。私たちのモデルの分析は、3つの重要な洞察を示します。第一に、活気ある学習者のシステムにエージェントがいない場合でも、どちらのプロセスでも活気ある学習は有益である。第二に、信念共有による活気ある学習は、行動と成果の相互観察よりも普遍的に優れているわけではない。特に,行動と成果の相互観測可能性の実現は,タスク環境が価値に大きな違いのある選択肢がほとんどなく,時間的プレッシャーがない場合に,信念の共有よりも優れている。第三に、ヴィカリアス学習の対称性は実際には信念共有に悪影響を及ぼすが、観察学習を改善する。これら3つの結果は、自発的な学習が自己完結するバイアスド・信念にどのように影響を及ぼすかの結果として示される。 Vicarious learning is a vital component of organizational learning. We theorize and model two fundamental processes underlying vicarious learning: observation of actions (learning what they do) vs. belief sharing (learning what they think). The analysis of our model points to three key insights. First, vicarious learning through either process is beneficial even when no agent in a system of vicarious learners begins with a knowledge advantage. Second, vicarious learning through belief sharing is not universally better than mutual observation of actions and outcomes. Specifically, enabling mutual observability of actions and outcomes is superior to sharing of beliefs when the task environment features few alternatives with large differences in their value and there are no time pressures. Third, symmetry in vicarious learning in fact adversely affects belief sharing but improves observational learning. All three results are shown to be the consequence of how vicarious learning affects self-confirming biased beliefs.	翻訳日:2022-11-05 14:33:48 公開日:2020-07-31
# 外部知識からの学習による多ラベルゼロショット分類 Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge ( http://arxiv.org/abs/2007.15610v2 ) ライセンス: Link先を確認	He Huang, Yuanwei Chen, Wei Tang, Wenhao Zheng, Qing-Guo Chen, Yao Hu, Philip Yu	(参考訳) マルチラベルゼロショット分類は、入力画像に対する複数の未知のクラスラベルを予測することを目的としている。シングルレーベルよりも難しい。一方、各イメージに割り当てられたラベルの制限されていない数は、モデルが見たクラスにもっと簡単にオーバーフィットする。一方、既存のマルチラベル分類データセットでは、目に見えるクラスと見えないクラスの間に大きな意味的ギャップがある。このような難題に対処するために,外部知識からの伝達を学習する多ラベルゼロショット分類フレームワークを提案する。 ImageNetは特徴抽出器の事前訓練によく使われており、大きめできめ細かなラベル空間を持つ。これにより、目に見えるクラスと見えないクラスを橋渡しし、一般化を促進する外部知識として活用するモチベーションが生まれます。具体的には、ターゲットデータセットのクラスだけでなく、ImageNetのクラスも含む知識グラフを構築する。対象とするデータセットではimagenetラベルが利用できないため,拡張知識グラフで初期状態を推測する新しいposvaeモジュールを提案する。次に,関係グラフ畳み込みネットワーク(RGCN)を設計し,クラス間で情報を伝達し,知識伝達を実現する。 2つのベンチマークデータセットの実験結果から,提案手法の有効性が示された。 Multi-label zero-shot classification aims to predict multiple unseen class labels for an input image. It is more challenging than its single-label counterpart. On one hand, the unconstrained number of labels assigned to each image makes the model more easily overfit to those seen classes. On the other hand, there is a large semantic gap between seen and unseen classes in the existing multi-label classification datasets. To address these difficult issues, this paper introduces a novel multi-label zero-shot classification framework by learning to transfer from external knowledge. We observe that ImageNet is commonly used to pretrain the feature extractor and has a large and fine-grained label space. This motivates us to exploit it as external knowledge to bridge the seen and unseen classes and promote generalization. Specifically, we construct a knowledge graph including not only classes from the target dataset but also those from ImageNet. Since ImageNet labels are not available in the target dataset, we propose a novel PosVAE module to infer their initial states in the extended knowledge graph. Then we design a relational graph convolutional network (RGCN) to propagate information among classes and achieve knowledge transfer. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed approach.	翻訳日:2022-11-05 14:24:31 公開日:2020-07-31
# 正規化RBFカーネルによるサンプル効率の向上 Improving Sample Efficiency with Normalized RBF Kernels ( http://arxiv.org/abs/2007.15397v2 ) ライセンス: Link先を確認	Sebastian Pineda-Arango, David Obando-Paniagua, Alperen Dedeoglu, Philip Kurzend\"orfer, Friedemann Schestag and Randolf Scholz	(参考訳) ディープラーニングモデルでは、少ないデータでより多くを学ぶことが重要になっています。本稿では,正規化されたラジアル基底関数(RBF)カーネルを用いたニューラルネットワークを用いて,サンプル効率の向上を図る。さらに,このような出力層がクラスがコンパクトかつ分離された埋め込み空間をどのように見つけるかを示す。そこで本研究では,このようなニューラルネットワークを分類タスクで学習するための2段階の手法を提案する。 CIFAR-10 と CIFAR-100 の実験により,通常のカーネルを出力層として持つネットワークは,SoftMax 出力層を用いたネットワークと比較して,提案手法によりより高効率,高コンパクト,高分離性が得られることを示した。 In deep learning models, learning more with less data is becoming more important. This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency. Moreover, we show how this kind of output layer can find embedding spaces where the classes are compact and well-separated. In order to achieve this, we propose a two-phase method to train those type of neural networks on classification tasks. Experiments on CIFAR-10 and CIFAR-100 show that networks with normalized kernels as output layer can achieve higher sample efficiency, high compactness and well-separability through the presented method in comparison to networks with SoftMax output layer.	翻訳日:2022-11-05 13:21:41 公開日:2020-07-31
# 1量子化ディープニューラルネットワーク量子状態を持つ二次元スピンレス格子フェルミオンの相 Phases of two-dimensional spinless lattice fermions with first-quantized deep neural-network quantum states ( http://arxiv.org/abs/2008.00118v1 ) ライセンス: Link先を確認	James Stokes, Javier Robledo Moreno, Eftychios A. Pnevmatikakis, Giuseppe Carleo	(参考訳) 格子上の強結合フェルミオン系を解析するために, 第一量子化ディープニューラルネットワーク技術を開発した。畳み込み残差ブロックを持つ深い残差ネットワークを利用するスレーター・ジャストロウインスパイアアンサッツを用いて、最寄り-近距離相互作用を持つ正方格子上のスピンレスフェルミオンの基底状態を近似的に決定する。ニューラルネットワークのansatzの柔軟性は、エネルギーと相関関数の両方において、小さなシステムでの正確な対角化結果と比較して高い精度をもたらす。大規模系では, 相互作用強度と粒子密度の関数として, 金属相と電荷相の境界を正確に推定する。 First-quantized deep neural network techniques are developed for analyzing strongly coupled fermionic systems on the lattice. Using a Slater-Jastrow inspired ansatz which exploits deep residual networks with convolutional residual blocks, we approximately determine the ground state of spinless fermions on a square lattice with nearest-neighbor interactions. The flexibility of the neural-network ansatz results in a high level of accuracy when compared to exact diagonalization results on small systems, both for energy and correlation functions. On large systems, we obtain accurate estimates of the boundaries between metallic and charge ordered phases as a function of the interaction strength and the particle density.	翻訳日:2022-11-04 07:20:38 公開日:2020-07-31
# ニューラルネットワークによるアイスフォビック性能の予測 Using neural networks to predict icephobic performance ( http://arxiv.org/abs/2008.00966v1 ) ライセンス: Link先を確認	Rahul Ramachandran	(参考訳) 超疎水性表面に触発されたアイスホビック表面は、アイシング問題に対する受動的解を提供する。しかし, 超疎水化を助長する物質的特徴によっては, 耐氷性能に悪影響を及ぼす可能性があるため, アイスフォビシティのモデル化は困難である。本研究では, 人工ニューラルネットワークを用いたアイスフォビシティのモデル化手法を提案する。人工ニューラルネットワークモデルを用いて, コンクリートの凍結性能を予測した。実験データを用いて, 凍結条件下で水滴の表面着氷強度と再着氷係数(cor)を推算した。材料, 塗料組成物, および環境条件をモデル入力変数として用いた。多層パーセプトロンを用いて根平均二乗誤差 0.08, 90%信頼区間 [0.042, 0.151] のcorを予測した。このモデルは展開後の0.92の係数を持つ。氷の付着強度は試料の幅広い値に対して変化するため, 混合密度ネットワークをモデルとして, マルチモーダルデータの基盤となる関係を学習した。判定係数は0.96。アイスフォビック性能における入力変数の相対的重要性を置換重要度を用いて算出した。開発モデルはコンクリートの耐氷性を最適化する上で有益である。 Icephobic surfaces inspired by superhydrophobic surfaces offer a passive solution to the problem of icing. However, modeling icephobicity is challenging because some material features that aid superhydrophobicity can adversely affect the icephobic performance. This study presents a new approach based on artificial neural networks to model icephobicity. Artificial neural network models were developed to predict the icephobic performance of concrete. The models were trained on experimental data to predict the surface ice adhesion strength and the coefficient of restitution (COR) of water droplet bouncing off the surface under freezing conditions. The material and coating compositions, and environmental condition were used as the models' input variables. A multilayer perceptron was trained to predict COR with a root mean squared error of 0.08, and a 90% confidence interval of [0.042, 0.151]. The model had a coefficient of determination of 0.92 after deployment. Since ice adhesion strength varied over a wide range of values for the samples, a mixture density network was model was developed to learn the underlying relationship in the multimodal data. Coefficient of determination for the model was 0.96. The relative importance of the input variables in icephobic performance were calculated using permutation importance. The developed models will be beneficial to optimize icephobicity of concrete.	翻訳日:2022-11-04 07:20:26 公開日:2020-07-31
# 時間ワープ編集距離の改善 --線形メモリにおける並列動的プログラム Improved Time Warp Edit Distance -- A Parallel Dynamic Program in Linear Memory ( http://arxiv.org/abs/2007.16135v1 ) ライセンス: Link先を確認	Garrett Wright	(参考訳) 編集距離(edit distance)は、動的プログラミング問題の古典的なファミリーであり、タイムワープ編集距離(time warp edit distance)は、計量と時間弾性の概念を用いて問題を洗練する。大規模並列化と線形記憶のみを必要とする,改良されたタイムワープ編集距離アルゴリズムを提案する。この方法は、オリジナルの動的プログラム空間をカバーするために3対角帯の行列を用いる。対角線更新のすべての要素は並列に計算できる。コア法(core method)は、twed long common subsequence data dependenceの特徴であり、類似のバンドサブプロブレム構造を共有する動的プログラムに適用できる。このアルゴリズムはPythonバインディングを備えたCUDA Cライブラリとして実装されている。挑戦的な問題のスピードアップは驚くべきことです。 Edit Distance is a classic family of dynamic programming problems, among which Time Warp Edit Distance refines the problem with the notion of a metric and temporal elasticity. A novel Improved Time Warp Edit Distance algorithm that is both massively parallelizable and requiring only linear storage is presented. This method uses the procession of a three diagonal band to cover the original dynamic program space. Every element of the diagonal update can be computed in parallel. The core method is a feature of the TWED Longest Common Subsequence data dependence and is applicable to dynamic programs that share similar band subproblem structure. The algorithm has been implemented as a CUDA C library with Python bindings. Speedups for challenging problems are phenomenal.	翻訳日:2022-11-04 07:19:21 公開日:2020-07-31
# CorrSigNet: 放射線画像と病理画像からCRRELated Prostate Cancer SIGnaturesを学習してコンピュータ支援診断を改善する CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis ( http://arxiv.org/abs/2008.00119v1 ) ライセンス: Link先を確認	Indrani Bhattacharya and Arun Seetharaman and Wei Shao and Rewa Sood and Christian A. Kunder and Richard E. Fan and Simon John Christoph Soerensen and Jeffrey B. Wang and Pejman Ghanouni and Nikola C. Teslovich and James D. Brooks and Geoffrey A. Sonn and Mirabela Rusu	(参考訳) MRIは前立腺がんのスクリーニングやステージングに広く用いられている。しかし、多くの前立腺癌はMRIでは容易に識別できない微妙な特徴を有しており、診断に失敗し、放射線学の解釈に異常が生じる。機械学習モデルは、がんの同定を改善するために開発されたが、現在のモデルは、MRIによる特徴を用いてがんを局在させるが、切除組織で観察される疾患の病態の特徴を考慮しない。本稿では,MRIで前立腺癌を局所化する2段階自動モデルであるCorrSigNetを提案する。まず, 共通表現学習を用いて, 対応する病理組織学的特徴に関連付けられたがんのmri所見を学習する。第二に、学習した相関MRI機能を使って、畳み込みニューラルネットワークをトレーニングし、前立腺がんを局所化する。病理組織像は、相関する特徴を学習するために第1段階のみ使用される。これらの特徴は、(病理組織学や手術なしで)新しい患者のmriから抽出され、がんを局在化することができる。前立腺切除術を施行した806スライス75例を対象に,本フレームワークを訓練,検証した。前立腺癌症例20例(139例,24例,112万画素)の独立した検査群を用いて,1画素あたりの感度0.81,特異度0.71,auc 0.86,1レシオンauc$0.96 \pm 0.07$を,mriを用いた前立腺癌予測における現在の精度を上回った。 Magnetic Resonance Imaging (MRI) is widely used for screening and staging prostate cancer. However, many prostate cancers have subtle features which are not easily identifiable on MRI, resulting in missed diagnoses and alarming variability in radiologist interpretation. Machine learning models have been developed in an effort to improve cancer identification, but current models localize cancer using MRI-derived features, while failing to consider the disease pathology characteristics observed on resected tissue. In this paper, we propose CorrSigNet, an automated two-step model that localizes prostate cancer on MRI by capturing the pathology features of cancer. First, the model learns MRI signatures of cancer that are correlated with corresponding histopathology features using Common Representation Learning. Second, the model uses the learned correlated MRI features to train a Convolutional Neural Network to localize prostate cancer. The histopathology images are used only in the first step to learn the correlated features. Once learned, these correlated features can be extracted from MRI of new patients (without histopathology or surgery) to localize cancer. We trained and validated our framework on a unique dataset of 75 patients with 806 slices who underwent MRI followed by prostatectomy surgery. We tested our method on an independent test set of 20 prostatectomy patients (139 slices, 24 cancerous lesions, 1.12M pixels) and achieved a per-pixel sensitivity of 0.81, specificity of 0.71, AUC of 0.86 and a per-lesion AUC of $0.96 \pm 0.07$, outperforming the current state-of-the-art accuracy in predicting prostate cancer using MRI.	翻訳日:2022-11-04 07:16:43 公開日:2020-07-31
# 高次元線形モデルにおける経験ベイズ後方の変分近似 Variational approximations of empirical Bayes posteriors in high-dimensional linear models ( http://arxiv.org/abs/2007.15930v1 ) ライセンス: Link先を確認	Yue Yang and Ryan Martin	(参考訳) 高次元では、先行尾部は後部計算と漸近性集中率の両方に有意な影響を及ぼす。後方計算を比較的シンプルに保ちながら最適速度を達成するために,データ駆動センタを用いた薄い共役前処理を特徴とする経験的ベイズ法が最近提案されている。共役先行法は計算負担の一部を緩和するが、マルコフ連鎖モンテカルロ法は依然として必要であり、次元が高ければ高価である。本稿では, 実験的なベイズ後部への変分近似を開発し, 計算が高速で, 原点の最適濃度特性を保っている。シミュレーションでは,本手法は,多種多様な高次元環境における文献における既存変分近似よりも優れた性能を示した。 In high-dimensions, the prior tails can have a significant effect on both posterior computation and asymptotic concentration rates. To achieve optimal rates while keeping the posterior computations relatively simple, an empirical Bayes approach has recently been proposed, featuring thin-tailed conjugate priors with data-driven centers. While conjugate priors ease some of the computational burden, Markov chain Monte Carlo methods are still needed, which can be expensive when dimension is high. In this paper, we develop a variational approximation to the empirical Bayes posterior that is fast to compute and retains the optimal concentration rate properties of the original. In simulations, our method is shown to have superior performance compared to existing variational approximations in the literature across a wide range of high-dimensional settings.	翻訳日:2022-11-04 07:16:11 公開日:2020-07-31
# モチーフに基づくグラフ畳み込み多層ネットワークを用いたグラフの表現学習 Representation Learning of Graphs Using Graph Convolutional Multilayer Networks Based on Motifs ( http://arxiv.org/abs/2007.15838v1 ) ライセンス: Link先を確認	Xing Li, Wei Wei, Xiangnan Feng, Xue Liu, Zhiming Zheng	(参考訳) グラフ構造は一般的に使用されるデータ記憶モードであり、グラフ内のノードの低次元埋め込み表現は、ノード分類、リンク予測など、様々な典型的なタスクで非常に有用であることが判明した。しかし、既存のアプローチのほとんどはグラフ内の二項関係(すなわちエッジ)から始まり、グラフの高階局所構造(すなわちモチーフ)を生かしていない。本稿では,ノードの特徴情報とグラフの高次局所構造を利用して,それまで認識されていなかったデータに対してノード埋め込みを効果的に生成する新しいフレームワークmgcmnを提案する。研究により、異なるタイプのネットワークには異なるキーモチーフがあることが判明した。また,提案手法のベースライン法に対する利点は,引用ネットワークとソーシャルネットワークのデータセットに関する数多くの実験で実証されている。同時に,分類精度の向上とクラスタリング係数との正の相関が明らかになった。高次構造情報を用いることで、グラフニューラルネットワークの学習効率を大幅に向上させ、新たな学習モードの確立を促進することができると考えられる。 The graph structure is a commonly used data storage mode, and it turns out that the low-dimensional embedded representation of nodes in the graph is extremely useful in various typical tasks, such as node classification, link prediction , etc. However, most of the existing approaches start from the binary relationship (i.e., edges) in the graph and have not leveraged the higher order local structure (i.e., motifs) of the graph. Here, we propose mGCMN -- a novel framework which utilizes node feature information and the higher order local structure of the graph to effectively generate node embeddings for previously unseen data. Through research we have found that different types of networks have different key motifs. And the advantages of our method over the baseline methods have been demonstrated in a large number of experiments on citation network and social network datasets. At the same time, a positive correlation between increase of the classification accuracy and the clustering coefficient is revealed. It is believed that using high order structural information can truly manifest the potential of the network, which will greatly improve the learning efficiency of the graph neural network and promote a brand-new learning mode establishment.	翻訳日:2022-11-04 07:14:51 公開日:2020-07-31
# DeepCOVIDNet:不均一な特徴と相互作用を用いた新型コロナウイルスの予測監視のための解釈可能なディープラーニングモデル DeepCOVIDNet: An Interpretable Deep Learning Model for Predictive Surveillance of COVID-19 Using Heterogeneous Features and their Interactions ( http://arxiv.org/abs/2008.00115v1 ) ライセンス: Link先を確認	Ankit Ramchandani, Chao Fan, Ali Mostafavi	(参考訳) 本稿では,今後新型コロナウイルス感染者の増加範囲を予測するための深層学習モデルを提案し,多変量時系列と多変量空間時系列データの等次元表現を計算するための新しい手法を提案する。この手法により, 国勢調査データ, 地域内移動性, 地域間移動性, 社会的分散性データ, 感染の過去成長など, 様々な特徴を取り入れ, それらの特徴間の複雑な相互作用を学ぶことができる。様々な情報源から収集したデータを用いて,米国全郡で7日間の感染者増加範囲を推定した。また,このモデルを用いて,感染拡大の予測に最も影響力のある特徴を同定する。また、特徴のペアを分析し、観察された2次相互作用の量を推定する。実験により,提案モデルが良好な予測性能と極めて解釈可能な特徴分析結果を得ることで,新型コロナウイルス等の全国レベルのパンデミック監視のための標準疫学モデルを補完する可能性が示唆された。深層学習モデルから得られた結果と結果は、政策立案者や研究者に効果的な緩和と対応戦略を考案する上で有益である。さらなる開発と実験を素早く進めるために、提案されたモデルを実装するために使用されるコードは、完全にオープンソース化された。 In this paper, we propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days and we present a novel method to compute equidimensional representations of multivariate time series and multivariate spatial time series data. Using this novel method, the proposed model can both take in a large number of heterogeneous features, such as census data, intra-county mobility, inter-county mobility, social distancing data, past growth of infection, among others, and learn complex interactions between these features. Using data collected from various sources, we estimate the range of increase in infected cases seven days into the future for all U.S. counties. In addition, we use the model to identify the most influential features for prediction of the growth of infection. We also analyze pairs of features and estimate the amount of observed second-order interaction between them. Experiments show that the proposed model obtains satisfactory predictive performance and fairly interpretable feature analysis results; hence, the proposed model could complement the standard epidemiological models for national-level surveillance of pandemics, such as COVID-19. The results and findings obtained from the deep learning model could potentially inform policymakers and researchers in devising effective mitigation and response strategies. To fast-track further development and experimentation, the code used to implement the proposed model has been made fully open source.	翻訳日:2022-11-04 07:13:59 公開日:2020-07-31
# 音響シーン分類におけるデバイス適応のための神経ラベル埋め込みによるリレーショナル教師学生学習 Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification ( http://arxiv.org/abs/2008.00110v1 ) ライセンス: Link先を確認	Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee	(参考訳) 本稿では,ニューラルラベル埋め込み(NLE)と関係教師学習(RTSL)を活用した音響シーン分類におけるデバイスミスマッチ問題に対処するドメイン適応フレームワークを提案する。提案手法では,音響シーンクラス間の構造的関係を考慮し,デバイスに依存しない関係を捉える。トレーニング段階では、転写可能な知識はソースドメインからNLEに凝縮される。次に、適応段階において、従来の教師学習でしばしば必要とされるペア・ソース・ターゲットデータを用いることなく、適応対象モデルを学習するための新しいRTSL戦略を採用する。提案するフレームワークはDCASE 2018 Task1bデータセットで評価されている。 AlexNet-L深層分類モデルによる実験結果から,提案手法の有効性が確認された。 NLE-alone適応は、従来のデバイス適応や教師による適応技術と好適に比較できる。 RTSLによるNLEはさらに分類精度を向上させる。 In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification leveraging upon neural label embedding (NLE) and relational teacher student learning (RTSL). Taking into account the structural relationships between acoustic scene classes, our proposed framework captures such relationships which are intrinsically device-independent. In the training stage, transferable knowledge is condensed in NLE from the source domain. Next in the adaptation stage, a novel RTSL strategy is adopted to learn adapted target models without using paired source-target data often required in conventional teacher student learning. The proposed framework is evaluated on the DCASE 2018 Task1b data set. Experimental results based on AlexNet-L deep classification models confirm the effectiveness of our proposed approach for mismatch situations. NLE-alone adaptation compares favourably with the conventional device adaptation and teacher student based adaptation techniques. NLE with RTSL further improves the classification accuracy.	翻訳日:2022-11-04 07:07:27 公開日:2020-07-31
# DBLSTM-CTCを用いた手書き文字認識における暗黙的・明示的言語モデル情報の効果に関する研究 A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition ( http://arxiv.org/abs/2008.01532v1 ) ライセンス: Link先を確認	Qi Liu, Lijuan Wang, Qiang Huo	(参考訳) 直交時間分類(CTC)出力層を備えたD-BLSTM(Deep Bidirectional Long Short-Term Memory)が手書き認識のための最先端ソリューションとして確立されている。 CTC目的関数を用いて訓練されたDBLSTMは、文字モデリングのためのローカル文字画像依存性と暗黙的な言語モデリングのための長距離コンテキスト依存性の両方を学ぶことはよく知られている。本稿では,DBLSTM-CTCを用いた手書き文字認識における暗黙的および明示的言語モデル情報の効果について,明示的言語モデルを用いた復号化の性能の比較を行った。 100万行のトレーニング文を使用してDBLSTMをトレーニングしても、明示的な言語モデルを使用することは有用である。このような大規模トレーニング問題に対処するために,mini-batch based epochwise back propagation through time (bptt)アルゴリズムを用いて,dblstmのctcトレーニング用gpuベースのトレーニングツールを開発した。 Deep Bidirectional Long Short-Term Memory (D-BLSTM) with a Connectionist Temporal Classification (CTC) output layer has been established as one of the state-of-the-art solutions for handwriting recognition. It is well known that the DBLSTM trained by using a CTC objective function will learn both local character image dependency for character modeling and long-range contextual dependency for implicit language modeling. In this paper, we study the effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition by comparing the performance of using or without using an explicit language model in decoding. It is observed that even using one million lines of training sentences to train the DBLSTM, using an explicit language model is still helpful. To deal with such a large-scale training problem, a GPU-based training tool has been developed for CTC training of DBLSTM by using a mini-batch based epochwise Back Propagation Through Time (BPTT) algorithm.	翻訳日:2022-11-04 07:07:11 公開日:2020-07-31
# LVCSRのための将来ベクトル拡張LSTM言語モデル Future Vector Enhanced LSTM Language Model for LVCSR ( http://arxiv.org/abs/2008.01832v1 ) ライセンス: Link先を確認	Qi Liu, Yanmin Qian, Kai Yu	(参考訳) 言語モデル(LM)は,大語彙連続音声認識(LVCSR)において重要な役割を果たす。しかしながら、従来の言語モデルは、与えられた歴史を持つ次の単一の単語しか予測しないが、連続した単語列の予測は通常、LVCSRで要求され有用である。学習中の単一単語予測モデルと読み出し要求における長期シーケンス予測のミスマッチは、性能低下につながる可能性がある。本稿では,将来ベクトルを用いた拡張長短期メモリ(LSTM)LMを提案する。与えられた履歴に加えて、シーケンスの残りの部分は将来のベクターによって埋め込まれる。この将来のベクターはlstm lmに組み込むことができるので、より長期のシーケンスレベルの情報をモデル化することができる。実験の結果,提案したLSTM LMはBLEUスコアよりも長期的シーケンス予測が優れていることがわかった。音声認識では,提案するLSTM LMは若干の利得が得られるが,従来のLSTM LMとの大きな相補性が得られると考えられる。新たなLSTM LMと従来のLSTM LMを併用することで,単語誤り率を大幅に向上させることができる。 Language models (LM) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models only predict next single word with given history, while the consecutive predictions on a sequence of words are usually demanded and useful in LVCSR. The mismatch between the single word prediction modeling in trained and the long term sequence prediction in read demands may lead to the performance degradation. In this paper, a novel enhanced long short-term memory (LSTM) LM using the future vector is proposed. In addition to the given history, the rest of the sequence will be also embedded by future vectors. This future vector can be incorporated with the LSTM LM, so it has the ability to model much longer term sequence level information. Experiments show that, the proposed new LSTM LM gets a better result on BLEU scores for long term sequence prediction. For the speech recognition rescoring, although the proposed LSTM LM obtains very slight gains, the new model seems obtain the great complementary with the conventional LSTM LM. Rescoring using both the new and conventional LSTM LMs can achieve a very large improvement on the word error rate.	翻訳日:2022-11-04 07:06:54 公開日:2020-07-31
# ビデオ領域適応のための逆二部グラフ学習 Adversarial Bipartite Graph Learning for Video Domain Adaptation ( http://arxiv.org/abs/2007.15829v1 ) ライセンス: Link先を確認	Yadan Luo, Zi Huang, Zijian Wang, Zheng Zhang, Mahsa Baktashmotlagh	(参考訳) 異なる領域間のモデルを適応させることに焦点を当てたドメイン適応技術は、ソース(トレーニング)とターゲット(テスト)ドメイン間での空間的および時間的な大きなシフトのため、ビデオ認識領域ではほとんど研究されない。このように、視覚領域適応に関する最近の研究は、逆学習を利用して、ソースとターゲットビデオの表現を統一し、特徴伝達性を強化しているため、ビデオにはあまり効果がない。この制限を克服するために,本研究では,ドメイン不変表現を学習する代わりに,ドメインに依存しないビデオ分類器を学習し,両部グラフのネットワークトポロジとソースターゲットの相互作用を直接モデル化するAdversarial Bipartite Graph (ABG) 学習フレームワークを提案する。具体的には、ソースフレームとターゲットフレームを異種頂点としてサンプリングし、2種類のノードを接続するエッジがそれらの親和性を測定する。メッセージパッシングを通じて、それぞれの頂点は、その異種の隣人から特徴を集約し、同じクラスから来る特徴を均等に混合させる。トレーニングとテストステージでビデオ分類器をこのようなクロスドメイン表現に明示的に公開することで,ラベル付きソースデータへの偏りが少なくなり,結果としてターゲットドメインのより優れた一般化が可能になるのです。モデルキャパシティをさらに高め,難読化タスクにおけるアーキテクチャのロバスト性を検証するため,ビデオレベルの二部グラフを付加した半教師付き環境での作業にモデルを拡張した。 4つのベンチマークで行った大規模な実験は、ビデオ認識におけるSOTA法に対する提案手法の有効性を実証している。 Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area due to the significant spatial and temporal shifts across the source (i.e. training) and target (i.e. test) domains. As such, recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations and strengthen the feature transferability are not highly effective on the videos. To overcome this limitation, in this paper, we learn a domain-agnostic video classifier instead of learning domain-invariant representations, and propose an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions with a network topology of the bipartite graph. Specifically, the source and target frames are sampled as heterogeneous vertexes while the edges connecting two types of nodes measure the affinity among them. Through message-passing, each vertex aggregates the features from its heterogeneous neighbors, forcing the features coming from the same class to be mixed evenly. Explicitly exposing the video classifier to such cross-domain representations at the training and test stages makes our model less biased to the labeled source data, which in-turn results in achieving a better generalization on the target domain. To further enhance the model capacity and testify the robustness of the proposed architecture on difficult transfer tasks, we extend our model to work in a semi-supervised setting using an additional video-level bipartite graph. Extensive experiments conducted on four benchmarks evidence the effectiveness of the proposed approach over the SOTA methods on the task of video recognition.	翻訳日:2022-11-04 07:06:16 公開日:2020-07-31
# デモビデオからの動作コードの推定 Estimating Motion Codes from Demonstration Videos ( http://arxiv.org/abs/2007.15841v1 ) ライセンス: Link先を確認	Maxat Alibayev, David Paulius and Yu Sun	(参考訳) 運動分類学は、操作をバイナリ符号化された表現としてエンコードすることができる。これらの動き符号は、接触や軌道タイプを含む動きの機械的特徴を記述する埋め込み空間における操作動作を本質的に表わす。埋め込みにモーションコードを使用する主な利点は、動きをロボット関連の特徴でより適切に定義でき、それらの距離をこれらの動き特徴を用いてより合理的に測定できることである。本稿では,実演ビデオから動作コードを教師なしで抽出する深層学習パイプラインを開発し,その知識をロボットに適切に表現し,活用する。評価の結果,EPIC-KITCHENSデータセットにおける動作のデモから動作符号を抽出できることが示唆された。 A motion taxonomy can encode manipulations as a binary-encoded representation, which we refer to as motion codes. These motion codes innately represent a manipulation action in an embedded space that describes the motion's mechanical features, including contact and trajectory type. The key advantage of using motion codes for embedding is that motions can be more appropriately defined with robotic-relevant features, and their distances can be more reasonably measured using these motion features. In this paper, we develop a deep learning pipeline to extract motion codes from demonstration videos in an unsupervised manner so that knowledge from these videos can be properly represented and used for robots. Our evaluations show that motion codes can be extracted from demonstrations of action in the EPIC-KITCHENS dataset.	翻訳日:2022-11-04 07:05:46 公開日:2020-07-31
# ロバスト糖尿病網膜症スクリーニングにおける残肝-CycleGANを用いたカメラ適応 Residual-CycleGAN based Camera Adaptation for Robust Diabetic Retinopathy Screening ( http://arxiv.org/abs/2007.15874v1 ) ライセンス: Link先を確認	Dalu Yang, Yehui Yang, Tiantian Huang, Binghong Wu, Lei Wang, Yanwu Xu	(参考訳) 基礎画像からの糖尿病網膜症(dr)の自動検出に焦点をあてた広範な研究がある。しかし、実際のdrスクリーニングでこれらのモデルを適用する場合、精度の低下は観測され、そこでは、訓練中のim-ageをキャプチャするために使用されるものと、楽しいカメラブランドが異なる。カメラブランドが1つしか持たないが、他のブランドのカメラで撮られたイマージで優れたパフォーマンスを実現している、ラベル付きファンドイメージの分類モデルをどうやってトレーニングできるのか? 本稿では,dr分類モデルの性能に及ぼすファンドスカメラブランドのドメインシフトの影響を,実験的な観点から定量的に検証する。さらに,領域適応によりカメラブランドの差異を緩和し,対象カメラ画像の分類性能の向上を図るために,カメラ指向残像GANを提案する。 EyePACS da-tasetとプライベートデータセットの両方での大規模なアブレーション実験により、カメラのブランド差が分類性能にシグニフィカライズし、提案したメソオードが対象領域におけるモデル性能を効果的に改善できることが証明された。我々は、eyepacs da-tasetの各画像のカメラブランドを推測し、ラベル付けし、さらにドメイン適応の研究のためにカメラブランドのラベルを公表する。 There are extensive researches focusing on automated diabetic reti-nopathy (DR) detection from fundus images. However, the accuracy drop is ob-served when applying these models in real-world DR screening, where the fun-dus camera brands are different from the ones used to capture the training im-ages. How can we train a classification model on labeled fundus images ac-quired from only one camera brand, yet still achieves good performance on im-ages taken by other brands of cameras? In this paper, we quantitatively verify the impact of fundus camera brands related domain shift on the performance of DR classification models, from an experimental perspective. Further, we pro-pose camera-oriented residual-CycleGAN to mitigate the camera brand differ-ence by domain adaptation and achieve increased classification performance on target camera images. Extensive ablation experiments on both the EyePACS da-taset and a private dataset show that the camera brand difference can signifi-cantly impact the classification performance and prove that our proposed meth-od can effectively improve the model performance on the target domain. We have inferred and labeled the camera brand for each image in the EyePACS da-taset and will publicize the camera brand labels for further research on domain adaptation.	翻訳日:2022-11-04 07:05:05 公開日:2020-07-31
# Visual SLAMのための動的物体追跡とマスキング Dynamic Object Tracking and Masking for Visual SLAM ( http://arxiv.org/abs/2008.00072v1 ) ライセンス: Link先を確認	Jonathan Vincent, Mathieu Labb\'e, Jean-Samuel Lauzon, Fran\c{c}ois Grondin, Pier-Marc Comtois-Rivet, Fran\c{c}ois Michaud	(参考訳) 動的環境下では、動いた物体から得られる視覚的特徴によって視覚的SLAM技術の性能が損なわれる可能性がある。一つの解決策は、それらのオブジェクトを識別して、その視覚的特徴をローカライズとマッピングのために取り除くことである。本稿では,深層ニューラルネットワーク,拡張kalmanフィルタ,視覚スラムを用いて,動的環境(gtx 1080の約14fps)における局所化とマッピングの両方を改善する,シンプルで高速なパイプラインを提案する。 RTAB-MapをビジュアルSLAMとして使用したTUMデータセットからの動的シーケンスの結果から,本手法は他の最先端手法と同じようなローカライゼーション性能を実現するとともに,追跡された動的オブジェクトの位置,それらの動的オブジェクトを含まない3Dマップ,パイプライン全体を適度に移動するロボット上でのループクロージャ検出の改善などが示唆された。 In dynamic environments, performance of visual SLAM techniques can be impaired by visual features taken from moving objects. One solution is to identify those objects so that their visual features can be removed for localization and mapping. This paper presents a simple and fast pipeline that uses deep neural networks, extended Kalman filters and visual SLAM to improve both localization and mapping in dynamic environments (around 14 fps on a GTX 1080). Results on the dynamic sequences from the TUM dataset using RTAB-Map as visual SLAM suggest that the approach achieves similar localization performance compared to other state-of-the-art methods, while also providing the position of the tracked dynamic objects, a 3D map free of those dynamic objects, better loop closure detection with the whole pipeline able to run on a robot moving at moderate speed.	翻訳日:2022-11-04 07:04:19 公開日:2020-07-31
# ベータスタビライザを用いたディープラーニングに関する研究 An Investigation on Deep Learning with Beta Stabilizer ( http://arxiv.org/abs/2008.01173v1 ) ライセンス: Link先を確認	Qi Liu, Tian Tan, Kai Yu	(参考訳) 人工ニューラルネットワーク(ANN)は、手書き認識や音声認識などの多くのアプリケーションで使われている。ニューラルネットワークのトレーニング手順において,学習率が重要であることはよく知られている。学習率の初期値が最終結果に合致し得ることが示され、この値は実際に常に手動で設定される。ベータ安定化器と呼ばれる新しいパラメータを導入し、初期学習率の感度を下げた。しかし、この手法はシグモイド活性化機能を持つディープニューラルネットワーク(DNN)に対してのみ提案されている。本稿では,ベータ安定化器を長期記憶(LSTM)に拡張し,LSTMとDNNを含む様々なモデルに対するベータ安定化器パラメータの効果を検討した。ベータ安定化パラメータは、Reluアクティベーション関数とLSTMを持つDNNでほぼ同じ性能で学習率の感度を低下させることができると結論付けた。しかし, 可溶性活性化機能を有するDNNとLSTMに対するβ安定化剤の効果は, シグモイド活性化機能を有するDNNに対する影響よりも小さいことがわかった。 Artificial neural networks (ANN) have been used in many applications such like handwriting recognition and speech recognition. It is well-known that learning rate is a crucial value in the training procedure for artificial neural networks. It is shown that the initial value of learning rate can confoundedly affect the final result and this value is always set manually in practice. A new parameter called beta stabilizer has been introduced to reduce the sensitivity of the initial learning rate. But this method has only been proposed for deep neural network (DNN) with sigmoid activation function. In this paper we extended beta stabilizer to long short-term memory (LSTM) and investigated the effects of beta stabilizer parameters on different models, including LSTM and DNN with relu activation function. It is concluded that beta stabilizer parameters can reduce the sensitivity of learning rate with almost the same performance on DNN with relu activation function and LSTM. However, it is shown that the effects of beta stabilizer on DNN with relu activation function and LSTM are fewer than the effects on DNN with sigmoid activation function.	翻訳日:2022-11-04 06:57:43 公開日:2020-07-31
# ブラジャーとケッツでブラケットするブラケット Bracketing brackets with bras and kets ( http://arxiv.org/abs/2008.12247v1 ) ライセンス: Link先を確認	Emily Clark, Angelie Vincent, J. Nathan Kutz, and Steven L. Brunton	(参考訳) ブラケットは航空機の製造と設計、部品の接合、重量の維持、ワイヤーの保持、継手強化に欠かせない要素である。全ての航空機で数百から数千の独特なブラケットが使用されるが、多くの異なるブラケットの製造は非効率で高価である。幸運にも、いわゆる「異なる」ブラケットの多くは、実際には互いに非常に似ているか、あるいは同一である。本稿では,ブラケットデータの階層的クラスタリングに基づいて,現在のブラケットの大きなカタログから比較的小さな代表ブラケット群を構築するための,データ駆動型フレームワークを提案する。現代の商用機では、テストセットの半分を十分に正確に記述しながら、ブラケットの完全なセットを30 %削減できることがわかった。このアプローチは、内積の「bra」と「ket」である2つの括弧間の多目的類似性を定量化する内積を設計することに基づいている。航空機製造におけるブラケット数を削減するために,本アルゴリズムを実証するが,大規模コンポーネントの標準化にも適用できる。 Brackets are an essential component in aircraft manufacture and design, joining parts together, supporting weight, holding wires, and strengthening joints. Hundreds or thousands of unique brackets are used in every aircraft, but manufacturing a large number of distinct brackets is inefficient and expensive. Fortunately, many so-called "different" brackets are in fact very similar or even identical to each other. In this manuscript, we present a data-driven framework for constructing a comparatively small group of representative brackets from a large catalog of current brackets, based on hierarchical clustering of bracket data. We find that for a modern commercial aircraft, the full set of brackets can be reduced by 30\% while still describing half of the test set sufficiently accurately. This approach is based on designing an inner product that quantifies a multi-objective similarity between two brackets, which are the "bra" and the "ket" of the inner product. Although we demonstrate this algorithm to reduce the number of brackets in aerospace manufacturing, it may be generally applied to any large-scale component standardization effort.	翻訳日:2022-11-04 06:57:25 公開日:2020-07-31
# 実行時情報を組み込んだ準最適反応合成 Near-Optimal Reactive Synthesis Incorporating Runtime Information ( http://arxiv.org/abs/2007.16107v1 ) ライセンス: Link先を確認	Suda Bharadwaj, Abraham P. Vinod, Rayna Dimitrova, Ufuk Topcu	(参考訳) 我々は,動的環境におけるミッション仕様を満たす戦略を計算し,性能指標を最適化する,最適反応合成の問題を考える。実行時にのみ利用可能なタスククリティカルな情報を戦略合成に組み込んで,パフォーマンスを向上させる。このような時間変化情報を利用する既存のアプローチは、リアルタイムアプリケーションでは計算不可能なオンライン再合成を必要とする。本稿では,候補のインスタンス化に対応する戦略のセット(事前特定代表情報シナリオ)を事前に合成する。そこで我々は,すべての安全性と生存目標を満たしながら,実行時の戦略を動的に切り替える新しいスイッチング機構を提案する。また、パフォーマンスサブオプティリティの境界を特徴付ける。そこで本研究では,ロボットの目標位置の可能性をリアルタイムで更新するロボット運動計画手法と,都市空力移動のための航空交通管理問題について紹介する。 We consider the problem of optimal reactive synthesis - compute a strategy that satisfies a mission specification in a dynamic environment, and optimizes a performance metric. We incorporate task-critical information, that is only available at runtime, into the strategy synthesis in order to improve performance. Existing approaches to utilising such time-varying information require online re-synthesis, which is not computationally feasible in real-time applications. In this paper, we pre-synthesize a set of strategies corresponding to candidate instantiations (pre-specified representative information scenarios). We then propose a novel switching mechanism to dynamically switch between the strategies at runtime while guaranteeing all safety and liveness goals are met. We also characterize bounds on the performance suboptimality. We demonstrate our approach on two examples - robotic motion planning where the likelihood of the position of the robot's goal is updated in real-time, and an air traffic management problem for urban air mobility.	翻訳日:2022-11-04 06:57:08 公開日:2020-07-31
# 複数属性選択決定のためのルックアヘッドおよびハイブリッドサンプル割り当て手順 Lookahead and Hybrid Sample Allocation Procedures for Multiple Attribute Selection Decisions ( http://arxiv.org/abs/2007.16119v1 ) ライセンス: Link先を確認	Jeffrey W. Herrmann and Kunal Mehta	(参考訳) 属性は、意思決定者が検討している代替案に関する重要な情報を提供する。大きさが不確実な場合、意思決定者はどの選択肢が本当に最良ののかわからないため、属性の測定は意思決定者がよりよい判断を下すのに役立つかもしれない。本稿では、各測定値が1つの属性の1つのサンプルを1つの代替として生成する設定について考察する。収集すべきサンプル数が一定であれば、どのサンプルを取得するかを決定し、測定を行い、属性の規模に関する事前の信念を更新し、代替案を選択する必要がある。本稿では,複数の属性選択決定に対するサンプル割当問題を提案し,不確かさをモデル化するために離散分布を用いた場合の2つの逐次的ルックアヘッド手順を提案する。 2つの手順は似ているが、異なる品質基準(と損失関数)を反映しており、これは異なる決定ルールを動機付けている。そこで本研究では,まず一様アロケーション手順を用いてサンプルを割り当て,次にシーケンシャルなルックアヘッド手順を用いて,シーケンシャルなプロシージャとハイブリッドなプロシージャの性能を評価するためのシミュレーション研究を行った。その結果,初期標本の多く(すべてではないが)を均一な割当手順で割当てることによって,全体の計算労力を削減できるだけでなく,平均的な機会コストが低く,真にベストな代替案も選択できることが示唆された。 Attributes provide critical information about the alternatives that a decision-maker is considering. When their magnitudes are uncertain, the decision-maker may be unsure about which alternative is truly the best, so measuring the attributes may help the decision-maker make a better decision. This paper considers settings in which each measurement yields one sample of one attribute for one alternative. When given a fixed number of samples to collect, the decision-maker must determine which samples to obtain, make the measurements, update prior beliefs about the attribute magnitudes, and then select an alternative. This paper presents the sample allocation problem for multiple attribute selection decisions and proposes two sequential, lookahead procedures for the case in which discrete distributions are used to model the uncertain attribute magnitudes. The two procedures are similar but reflect different quality measures (and loss functions), which motivate different decision rules: (1) select the alternative with the greatest expected utility and (2) select the alternative that is most likely to be the truly best alternative. We conducted a simulation study to evaluate the performance of the sequential procedures and hybrid procedures that first allocate some samples using a uniform allocation procedure and then use the sequential, lookahead procedure. The results indicate that the hybrid procedures are effective; allocating many (but not all) of the initial samples with the uniform allocation procedure not only reduces overall computational effort but also selects alternatives that have lower average opportunity cost and are more often truly best.	翻訳日:2022-11-04 06:56:55 公開日:2020-07-31
# the tactician (extended version):coqのためのシームレスでインタラクティブな戦術学習者と証明者 The Tactician (extended version): A Seamless, Interactive Tactic Learner and Prover for Coq ( http://arxiv.org/abs/2008.00120v1 ) ライセンス: Link先を確認	Lasse Blaauwbroek, Josef Urban and Herman Geuvers	(参考訳) 我々はCoq Proof Assistantの戦術学習者であり証明者であるTacticianを紹介する。 Tacticianは、一般的な証明戦略のコントロールを維持しながら、ユーザーが戦術的証明決定を行うのを助ける。この目的のために、tacticianは以前に書かれた戦術スクリプトから学習し、実行すべき次の戦術について提案するか、証明合成の負担を完全に引き受ける。 Tacticianの目標は、ユーザに対してシームレスでインタラクティブで直感的なエクスペリエンスと、堅牢で適応的な証明自動化を提供することだ。本稿では,ユーザの視点からのTacticianの概要を概観し,大規模に学習しながら,パッケージ依存管理の日常的利用と課題について述べる。最後に、tacticianのcoqプラグインと機械学習プラットフォームとしての実装について紹介する。 We present Tactician, a tactic learner and prover for the Coq Proof Assistant. Tactician helps users make tactical proof decisions while they retain control over the general proof strategy. To this end, Tactician learns from previously written tactic scripts and gives users either suggestions about the next tactic to be executed or altogether takes over the burden of proof synthesis. Tactician's goal is to provide users with a seamless, interactive, and intuitive experience together with robust and adaptive proof automation. In this paper, we give an overview of Tactician from the user's point of view, regarding both day-to-day usage and issues of package dependency management while learning in the large. Finally, we give a peek into Tactician's implementation as a Coq plugin and machine learning platform.	翻訳日:2022-11-04 06:56:15 公開日:2020-07-31
# 非同期分散マイクロホンを用いた発話会議記録システム Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones ( http://arxiv.org/abs/2007.15868v1 ) ライセンス: Link先を確認	Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu	(参考訳) 本稿では,非同期マイクロホンを用いた新しい音声書き起こしフレームワークを提案する。音声同期、話者ダイアリゼーション、誘導音源分離を用いた発話音声強調、自動音声認識、重複低減で構成されている。音声強調の前に話者ダイアリゼーションを行うことで、マイクロホン間のサンプリング周波数ミスマッチを考慮せずに重複音声を処理することができる。実際の会議データセットから,11個の分散マイクロホンを用いて28.7%の文字誤り率(CER)を達成し,テーブル中央のモノラルマイクロホンは38.2%のCERを示した。また,本フレームワークは21.8 %のcerを達成し,ヘッドセット用マイクロホンによる音声認識では2.1ポイントの精度を示した。 A novel framework for meeting transcription using asynchronous microphones is proposed in this paper. It consists of audio synchronization, speaker diarization, utterance-wise speech enhancement using guided source separation, automatic speech recognition, and duplication reduction. Doing speaker diarization before speech enhancement enables the system to deal with overlapped speech without considering sampling frequency mismatch between microphones. Evaluation on our real meeting datasets showed that our framework achieved a character error rate (CER) of 28.7 % by using 11 distributed microphones, while a monaural microphone placed on the center of the table had a CER of 38.2 %. We also showed that our framework achieved CER of 21.8 %, which is only 2.1 percentage points higher than the CER in headset microphone-based transcription.	翻訳日:2022-11-04 06:56:02 公開日:2020-07-31
# 部分発話を用いた音響シーン分類における音響セグメントモデルに基づくセグメント単位選択手法 An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances ( http://arxiv.org/abs/2008.00107v1 ) ライセンス: Link先を確認	Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee	(参考訳) 本稿では,音響シーン分類(asc)のための情報が少ない音声録音における音響セグメントを除去するサブ発話単位選択フレームワークを提案する。このアプローチは,音響シーン全体の空間をカバーする音響セグメント単位の普遍セットを基盤としている。まず、これらの単位を音響セグメントモデル(ASM)でモデル化し、音響シーンの発話を音響セグメント単位のシーケンスにトークン化する。次に、情報検索における停止語の概念と並行して、ASMを自動的に検出する。最後に、ほとんどの音響シーンの検索においてインデックス化能力の低いため、停止ASMに関連する音響セグメントをブロックする。全発話を含むシーンモデルとは対照的に、ASM除去サブ発話、すなわち音節を停止しない音響発話は、最終分類のためのAlexNet-Lバックエンドへの入力として使用される。 dcase 2018データセットでは、シーン分類の精度が、発話全体の68%からセグメント選択による72.1%に向上した。これはデータ拡張やアンサンブル戦略を使わずに競合する精度を示す。さらに,本手法は注意を払ってAlexNet-Lと比較した。 In this paper, we propose a sub-utterance unit selection framework to remove acoustic segments in audio recordings that carry little information for acoustic scene classification (ASC). Our approach is built upon a universal set of acoustic segment units covering the overall acoustic scene space. First, those units are modeled with acoustic segment models (ASMs) used to tokenize acoustic scene utterances into sequences of acoustic segment units. Next, paralleling the idea of stop words in information retrieval, stop ASMs are automatically detected. Finally, acoustic segments associated with the stop ASMs are blocked, because of their low indexing power in retrieval of most acoustic scenes. In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification. On the DCASE 2018 dataset, scene classification accuracy increases from 68%, with whole utterances, to 72.1%, with segment selection. This represents a competitive accuracy without any data augmentation, and/or ensemble strategy. Moreover, our approach compares favourably to AlexNet-L with attention.	翻訳日:2022-11-04 06:55:25 公開日:2020-07-31
# 低光度画像におけるサルエント物体検出のための画像強調の検討 Exploring Image Enhancement for Salient Object Detection in Low Light Images ( http://arxiv.org/abs/2007.16124v1 ) ライセンス: Link先を確認	Xin Xu, Shiqin Wang, Zheng Wang, Xiaolong Zhang, and Ruimin Hu	(参考訳) 非一様照明環境で撮影される低光画像は通常、シーン深度と対応する環境光で劣化する。この劣化により、劣化した画像モダリティの厳しい物体情報損失が発生し、低コントラスト特性と人工光の影響により、顕著な物体検出がより困難になる。しかし,実世界のシナリオでは実現不可能な十分な明るさ環境下で画像が撮影されるという仮定に基づいて,既存の正当性物体検出モデルを開発した。本研究では,低照度画像における物体検出を容易にする画像強調手法を提案する。提案モデルでは, 物理照明モデルを深部ニューラルネットワークに直接埋め込んで, 低光画像の劣化を記述する。さらに、非局所ブロック層を用いて、物体の局所的内容と、その局所的に有利な領域との差をキャプチャする。定量的評価のために,画素レベルの人間ラベル付地中アノテーションを用いた低照度画像データセットを構築し,4つの公開データセットとベンチマークデータセットで有望な結果を報告する。 Low light images captured in a non-uniform illumination environment usually are degraded with the scene depth and the corresponding environment lights. This degradation results in severe object information loss in the degraded image modality, which makes the salient object detection more challenging due to low contrast property and artificial light influence. However, existing salient object detection models are developed based on the assumption that the images are captured under a sufficient brightness environment, which is impractical in real-world scenarios. In this work, we propose an image enhancement approach to facilitate the salient object detection in low light images. The proposed model directly embeds the physical lighting model into the deep neural network to describe the degradation of low light images, in which the environment light is treated as a point-wise variate and changes with local content. Moreover, a Non-Local-Block Layer is utilized to capture the difference of local content of an object against its local neighborhood favoring regions. To quantitative evaluation, we construct a low light Images dataset with pixel-level human-labeled ground-truth annotations and report promising results on four public datasets and our benchmark dataset.	翻訳日:2022-11-04 06:48:56 公開日:2020-07-31
# ハイブリッド特徴選択モデルに基づくPalm静脈同定 Palm Vein Identification based on hybrid features selection model ( http://arxiv.org/abs/2007.16195v1 ) ライセンス: Link先を確認	Mohammed Hamzah Abed, Ali H. Alsaeedi, Ali D. Alfoudi, Abayomi M. Otebolaku, Yasmeen Sajid Razooqi	(参考訳) Palm vein Identification (PVI) は、セキュリティと認証システムを強化するために使われる生体認証技術である。手のひら静脈パターンの鍵となる特徴は、その個性、忘れられない、意図しない、無許可の人物が取ることができないことである。しかし,手のひら静脈パターンから抽出した特徴は,高い冗長性を有する。本稿では,2次元離散ウェーブレット変換,主成分分析 (PCA) と粒子群最適化 (PSO) (2D-DWTPP) の組合せモデルを提案する。 2D-DWT抽出はヤシ静脈像から特徴を抽出し,PCAはヤシ静脈像の冗長性を低下させる。このシステムは、ラッパーモデルに基づいて高い尊敬機能を選択するように訓練されている。 PSOは、機能の最適なサブセットによってラッパーモデルを供給します。提案システムは4つの分類器を目的関数として使用し、サポートベクターマシン(svm)、k極近傍(knn)、決定木(dt)、na\"ive bayes(nb)を含むvpiを決定する。実験結果から,提案システムIitはSVMで最高の結果を得た。提案した2D-DWTPPモデルについて評価し,特徴選択のないAlexnetや分類器と比較して,顕著な効率性を示した。実験的に、このモデルは (98.65) に反映され、一方 alexnet は (63.5) 、応用分類器は特徴選択なし (78.79) である。 Palm vein identification (PVI) is a modern biometric security technique used for increasing security and authentication systems. The key characteristics of palm vein patterns include, its uniqueness to each individual, unforgettable, non-intrusive and cannot be taken by an unauthorized person. However, the extracted features from the palm vein pattern are huge with high redundancy. In this paper, we propose a combine model of two-Dimensional Discrete Wavelet Transform, Principal Component Analysis (PCA), and Particle Swarm Optimization (PSO) (2D-DWTPP) to enhance prediction of vein palm patterns. The 2D-DWT Extracts features from palm vein images, PCA reduces the redundancy in palm vein features. The system has been trained in selecting high reverent features based on the wrapper model. The PSO feeds wrapper model by an optimal subset of features. The proposed system uses four classifiers as an objective function to determine VPI which include Support Vector Machine (SVM), K Nearest Neighbor (KNN), Decision Tree (DT) and Na\"ive Bayes (NB). The empirical result shows the proposed system Iit satisfied best results with SVM. The proposed 2D-DWTPP model has been evaluated and the results shown remarkable efficiency in comparison with Alexnet and classifier without feature selection. Experimentally, our model has better accuracy reflected by (98.65) while Alexnet has (63.5) and applied classifier without feature selection has (78.79).	翻訳日:2022-11-04 06:48:35 公開日:2020-07-31
# 車両計数のための物体検出と追尾アルゴリズム:比較分析 Object Detection and Tracking Algorithms for Vehicle Counting: A Comparative Analysis ( http://arxiv.org/abs/2007.16198v1 ) ライセンス: Link先を確認	Vishal Mandal and Yaw Adu-Gyamfi	(参考訳) ディープラーニングとハイパフォーマンスコンピューティングの分野における急速な進歩は、ビデオベースの車両計数システムの範囲を大きく拡大した。本稿では,関心領域(ROI)の異なる車両のクラスを検出し,追跡するために,アートオブジェクトの検出と追跡アルゴリズムのいくつかの状態をデプロイする。 ROIにおける車両の正確な検出と追跡の目標は、正確な車両数を得ることである。異なるトラッキングシステムと組み合わせたオブジェクト検出モデルの複数組み合わせを、最良の車両カウントフレームワークへのアクセスに適用する。モデルでは、異なる気象条件、閉塞、低照度設定に関連する課題に対処し、計算に富んだトレーニングとフィードバックサイクルを通じて車両情報や軌道を効率的に抽出する。ルイジアナ州交通開発局から得られた9時間以上の交通映像データの手動で集計した地上情報と比較し、すべてのモデルの組み合わせから得られた自動車両数を検証した。実験の結果、センタネットとディープソート、ディテクトロン2とディープソート、そしてヨーロフ4とディープソートの組み合わせは、全車両で最高の総計率を生み出した。 The rapid advancement in the field of deep learning and high performance computing has highly augmented the scope of video based vehicle counting system. In this paper, the authors deploy several state of the art object detection and tracking algorithms to detect and track different classes of vehicles in their regions of interest (ROI). The goal of correctly detecting and tracking vehicles' in their ROI is to obtain an accurate vehicle count. Multiple combinations of object detection models coupled with different tracking systems are applied to access the best vehicle counting framework. The models' addresses challenges associated to different weather conditions, occlusion and low-light settings and efficiently extracts vehicle information and trajectories through its computationally rich training and feedback cycles. The automatic vehicle counts resulting from all the model combinations are validated and compared against the manually counted ground truths of over 9 hours' traffic video data obtained from the Louisiana Department of Transportation and Development. Experimental results demonstrate that the combination of CenterNet and Deep SORT, Detectron2 and Deep SORT, and YOLOv4 and Deep SORT produced the best overall counting percentage for all vehicles.	翻訳日:2022-11-04 06:48:05 公開日:2020-07-31
# 車両検出・追跡のための視覚注意手がかりの活用 Utilising Visual Attention Cues for Vehicle Detection and Tracking ( http://arxiv.org/abs/2008.00106v1 ) ライセンス: Link先を確認	Feiyan Hu, Venkatesh G M, Noel E. O'Connor, Alan F. Smeaton and Suzanne Little	(参考訳) Advanced Driver-Assistance Systems (ADAS)は多くの研究者から注目を集めている。視覚ベースのセンサーは、運転中に人間の視覚行動をエミュレートする最も近い方法だ。本稿では,物体検出と追跡に視覚的注意(saliency)を利用する方法について検討する。調査します 1) 2段階物体検出装置において,<emph{subjectness} 注目度マップや,<emph{objectness} 注目度マップなどの視覚的注意度マップが,領域提案生成を容易にするか。 2)複数の物体の追跡に視覚的注意マップをどのように利用できるか。本稿では,物体を同時に検出し,対象性と主観性マップを生成し,計算力を節約するニューラルネットワークを提案する。さらに,逐次モンテカルロ確率仮説密度(phd)フィルタを用いて追跡中に視覚注意マップを活用した。実験はKITTIとDETRACのデータセットを用いて行われた。視覚的注意と階層的特徴の使用により、オブジェクト検出における$\approx$8\%が大幅に改善され、KITTIデータセット上で$\approx$4\%のトラッキング性能が向上した。 Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a \emph{subjectness} attention or saliency map and an \emph{objectness} attention map can facilitate region proposal generation in a 2-stage object detector; 2) How a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects as and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of $\approx$8\% in object detection which effectively increased tracking performance by $\approx$4\% on KITTI dataset.	翻訳日:2022-11-04 06:46:52 公開日:2020-07-31
# Prolog-based Dialog Engineを用いたインタラクティブテキストグラフマイニング Interactive Text Graph Mining with a Prolog-based Dialog Engine ( http://arxiv.org/abs/2008.00956v1 ) ライセンス: Link先を確認	Paul Tarau and Eduardo Blanco	(参考訳) ニューラルネットワークベースの依存性パーサとグラフベースの自然言語処理モジュールの上に、テキスト文書から抽出されたランキングファクトデータベースを対話的に探索するPrologベースのダイアログエンジンを設計する。依存グラフを再編成し,文の最も関連性の高い要素に着目し,文識別子をグラフノードとして統合する。さらに、グラフをランク付けした後、依存リンクとWordNetが主観動詞オブジェクト、is-a、part-of関係という形でもたらす暗黙のセマンティック情報を利用する。 Prologの事実とその推測結果に基づいて、ダイアログエンジンはクエリに関するテキストグラフを専門とし、ドキュメントの最も関連性の高いコンテンツ要素をインタラクティブに公開する。統合システムのオープンソースコードはhttps://github.com/ptarau/DeepRank で公開されている。論理プログラミングの理論と実践(tplp)における考察。 On top of a neural network-based dependency parser and a graph-based natural language processing module we design a Prolog-based dialog engine that explores interactively a ranked fact database extracted from a text document. We reorganize dependency graphs to focus on the most relevant content elements of a sentence and integrate sentence identifiers as graph nodes. Additionally, after ranking the graph we take advantage of the implicit semantic information that dependency links and WordNet bring in the form of subject-verb-object, is-a and part-of relations. Working on the Prolog facts and their inferred consequences, the dialog engine specializes the text graph with respect to a query and reveals interactively the document's most relevant content elements. The open-source code of the integrated system is available at https://github.com/ptarau/DeepRank . Under consideration in Theory and Practice of Logic Programming (TPLP).	翻訳日:2022-11-04 06:46:32 公開日:2020-07-31
# ニューラルネットワークの変性と脳との関係 Neural Network Degeneration and its Relationship to the Brain ( http://arxiv.org/abs/2008.00053v1 ) ライセンス: Link先を確認	Jacob Adamczyk	(参考訳) 本稿では,脳の小さな部分としてのニューラルネットワーク(NN)の適用について述べる。生物学的コネクトームを表すネットワークは、空間的にも時間的にも変化する。ここで適用される劣化技法は「重み劣化」、「重みスクランブル」、「可変活性化機能」である。これらの方法は、アルツハイマー病、ハンティントン病、パーキンソン病、脳卒中や脳腫瘍などの神経変性疾患の研究に光を当てることを目的としている。メモリ損失と一般化学習障害に対する基本的な洞察は、ネットワーク劣化時のネットワークのエラー関数を監視することによって得られる。各面の生物学的意義についても論じる。 This report discusses the application of neural networks (NNs) as small segments of the brain. The networks representing the biological connectome are altered both spatially and temporally. The degradation techniques applied here are "weight degradation", "weight scrambling", and variable activation function. These methods aim to shine light on the study of neurodegenerative diseases such as Alzheimer's, Huntington's and Parkinson's disease as well as strokes and brain tumors disrupting the flow of information in the brain's network. Fundamental insights to memory loss and generalized learning dysfunction are gained by monitoring the network's error function during network degradation. The biological significance of each facet is also discussed.	翻訳日:2022-11-04 06:39:46 公開日:2020-07-31
# lemma:マルチエージェントマルチタスクアクティビティを学習するためのマルチビューデータセット LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities ( http://arxiv.org/abs/2007.15781v1 ) ライセンス: Link先を確認	Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-chun Zhu	(参考訳) 人間の行動を理解し解釈することは長年の挑戦であり、人工知能における知覚の重要な指標である。しかし、ゴール指向アクション、同時マルチタスク、マルチエージェント間のコラボレーションなど、日常的な活動の衝動的な要素は、これまでの文献ではほとんど失われている。補題データセットを導入して,これらの欠落した次元に対して,細心の注意を払って設計した設定で対処するための単一のホームを提供し,異なる学習目標を強調するためにタスクやエージェントの数が異なる。我々は、人間と物体の相互作用による原子間相互作用を密に注釈し、日常の活動の構成性、スケジューリング、割り当ての土台として提供する。さらに,合成行動認識と行動/タスク予測ベンチマークをベースラインモデルで作成し,構成行動理解と時間的推論の能力を測定する。この取り組みにより、マシンビジョンコミュニティは、目標指向の人間活動を調べ、現実世界におけるタスクのスケジューリングと割り当てをさらに研究できることを期待します。 Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence. However, a few imperative components of daily human activities are largely missed in prior literature, including the goal-directed actions, concurrent multi-tasks, and collaborations among multi-agents. We introduce the LEMMA dataset to provide a single home to address these missing dimensions with meticulously designed settings, wherein the number of tasks and agents varies to highlight different learning objectives. We densely annotate the atomic-actions with human-object interactions to provide ground-truths of the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability of compositional action understanding and temporal reasoning. We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.	翻訳日:2022-11-04 06:39:35 公開日:2020-07-31
# AR-Net:効果的な行動認識のための適応フレーム分解能 AR-Net: Adaptive Frame Resolution for Efficient Action Recognition ( http://arxiv.org/abs/2007.15796v1 ) ライセンス: Link先を確認	Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, and Rogerio Feris	(参考訳) 行動認識はコンピュータビジョンにおいてオープンかつ挑戦的な問題である。現在の最先端モデルは優れた認識結果を提供するが、その計算費用は現実世界の多くのアプリケーションに対する影響を制限する。本稿では,提案手法であるar-net(adaptive resolution network,適応解像度ネットワーク)を提案する。具体的には、映像フレームを与えられた場合、アクション認識モデルによる処理にどの入力解像度を使用するべきかを、精度と効率の両立を目標としてポリシーネットワークを用いて決定する。標準バックプロパゲーションを用いた認識モデルと協調してポリシーネットワークを効率的に訓練する。いくつかの挑戦的行動認識ベンチマークデータセットに関する広範な実験は、最先端手法に対する提案手法の有効性をよく示している。プロジェクトページはhttps://mengyuest.github.io/AR-Netにある。 Action recognition is an open and challenging problem in computer vision. While current state-of-the-art models offer excellent recognition results, their computational expense limits their impact for many real-world applications. In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition in long untrimmed videos. Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on several challenging action recognition benchmark datasets well demonstrate the efficacy of our proposed approach over state-of-the-art methods. The project page can be found at https://mengyuest.github.io/AR-Net	翻訳日:2022-11-04 06:39:17 公開日:2020-07-31
# ETH-XGaze:極端ヘッドポーズにおける注視推定のための大規模データセットと注視変動 ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation ( http://arxiv.org/abs/2007.15837v1 ) ライセンス: Link先を確認	Xucong Zhang and Seonwook Park and Thabo Beeler and Derek Bradley and Siyu Tang and Otmar Hilliges	(参考訳) 視線推定はコンピュータビジョン、人間のコンピュータインタラクション、ロボット工学の多くの応用における基本的なタスクである。多くの最先端のメソッドはカスタムデータセット上でトレーニングされ、テストされるため、メソッド間の比較が困難になる。さらに、既存の視線推定データセットは、頭部ポーズと視線変動が制限されており、異なるプロトコルとメトリクスを用いて評価を行う。本稿では,頭部の極端な姿勢下での視線の異なる100万以上の高解像度画像からなる,eth-xgazeと呼ばれる新しい視線推定データセットを提案する。このデータセットは,18台のデジタルslrカメラと調整可能な照明条件を含むカスタムハードウェアセットアップと,地上の真理観測目標を記録する校正システムを用いて,110名の参加者から収集した。我々のデータセットは、異なる頭部ポーズと視線角度で視線推定手法の堅牢性を大幅に改善できることを示す。さらに,ETH-XGazeの標準化された実験プロトコルと評価基準を定義し,今後の視線推定研究を統一する。データセットとベンチマークのWebサイトはhttps://ait.ethz.ch/projects/2020/ETH-XGazeで公開されている。 Gaze estimation is a fundamental task in many applications of computer vision, human computer interaction and robotics. Many state-of-the-art methods are trained and tested on custom datasets, making comparison across methods challenging. Furthermore, existing gaze estimation datasets have limited head pose and gaze variations, and the evaluations are conducted using different protocols and metrics. In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. We collect this dataset from 110 participants with a custom hardware setup including 18 digital SLR cameras and adjustable illumination conditions, and a calibrated system to record ground truth gaze targets. We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles. Additionally, we define a standardized experimental protocol and evaluation metric on ETH-XGaze, to better unify gaze estimation research going forward. The dataset and benchmark website are available at https://ait.ethz.ch/projects/2020/ETH-XGaze	翻訳日:2022-11-04 06:38:50 公開日:2020-07-31
# 深層ニューラルネットワークの特徴可視化のための塩分駆動クラスインプレッション Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks ( http://arxiv.org/abs/2007.15861v1 ) ライセンス: Link先を確認	Sravanti Addepalli, Dipesh Tamboli, R. Venkatesh Babu, Biplab Banerjee	(参考訳) 本稿では,各クラスのインプレッションを分類器のメモリから抽出するデータフリーな手法を提案する。ディープラーニングのレジームは、トレーニングデータから特定のクラスの異なるパターン(あるいは特徴)を抽出するように分類器に権限を与えます。これらのモデルをクリティカルなアプリケーションにデプロイする前に、分類に不可欠なと思われる機能を視覚化するのが有利である。既存の可視化手法は,背景特徴と前景特徴の両方からなる高信頼画像を生成する。これにより、あるクラスの重要な機能が何であるかを判断するのは難しい。本研究では,与えられたタスクにおいて最も重要な識別的特徴を視覚化するための,サリエンシー駆動手法を提案する。既存のメソッドのもう一つの欠点は、生成されたビジュアライゼーションの信頼性が、与えられたクラスの複数のインスタンスを作成することで高まることである。我々は,画像ごとの単一オブジェクトの開発にアルゴリズムを制限し,信頼性の高い特徴を抽出し,その結果の可視化を向上する。さらに,2つ以上のクラスの自然な融合画像として,否定画像の生成を実証する。 In this paper, we propose a data-free method of extracting Impressions of each class from the classifier's memory. The Deep Learning regime empowers classifiers to extract distinct patterns (or features) of a given class from training data, which is the basis on which they generalize to unseen data. Before deploying these models on critical applications, it is advantageous to visualize the features considered to be essential for classification. Existing visualization methods develop high confidence images consisting of both background and foreground features. This makes it hard to judge what the crucial features of a given class are. In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task. Another drawback of existing methods is that confidence of the generated visualizations is increased by creating multiple instances of the given class. We restrict the algorithm to develop a single object per image, which helps further in extracting features of high confidence and also results in better visualizations. We further demonstrate the generation of negative images as naturally fused images of two or more classes.	翻訳日:2022-11-04 06:38:14 公開日:2020-07-31
# DynaMiTe:ロバストリアルタイム特徴マッチングのための時間制約付き動的局所運動モデル DynaMiTe: A Dynamic Local Motion Model with Temporal Constraints for Robust Real-Time Feature Matching ( http://arxiv.org/abs/2007.16005v1 ) ライセンス: Link先を確認	Patrick Ruhkamp and Ruiqi Gong and Nassir Navab and Benjamin Busam	(参考訳) 特徴量に基づくビジュアルオドメトリーとSLAM法では,リアルタイムに正確なカメラポーズ推定を行うために,連続した画像フレーム間の正確な対応が求められている。現在の特徴マッチングパイプラインは、特徴抽出器の記述能力にのみ依存するか、計算的に複雑な最適化スキームを必要とする。本稿では,ディスクリプタ入力に非依存な軽量パイプラインDynaMiTeを提案する。この手法の理論的バックボーンは、特徴マッチングの確率的定式化と、物理的動機づけのある制約の研究にある。動的適応可能な局所運動モデルは、効率的なデータ構造に特徴群をカプセル化する。時間的制約は局所運動モデルの情報を時間的に伝達するので、マッチングの検索空間の複雑さも軽減する。 dynamiteは、高いフレームレートでマッチング精度とカメラポーズ推定の両面で優れた結果を達成し、計算効率は高く、最先端のマッチング手法よりも優れている。 Feature based visual odometry and SLAM methods require accurate and fast correspondence matching between consecutive image frames for precise camera pose estimation in real-time. Current feature matching pipelines either rely solely on the descriptive capabilities of the feature extractor or need computationally complex optimization schemes. We present the lightweight pipeline DynaMiTe, which is agnostic to the descriptor input and leverages spatial-temporal cues with efficient statistical measures. The theoretical backbone of the method lies within a probabilistic formulation of feature matching and the respective study of physically motivated constraints. A dynamically adaptable local motion model encapsulates groups of features in an efficient data structure. Temporal constraints transfer information of the local motion model across time, thus additionally reducing the search space complexity for matching. DynaMiTe achieves superior results both in terms of matching accuracy and camera pose estimation with high frame rates, outperforming state-of-the-art matching methods while being computationally more efficient.	翻訳日:2022-11-04 06:37:56 公開日:2020-07-31
# 自動運転車の交通制御ジェスチャー認識 Traffic Control Gesture Recognition for Autonomous Vehicles ( http://arxiv.org/abs/2007.16072v1 ) ライセンス: Link先を確認	Julian Wiederer, Arij Bouazizi, Ulrich Kressel and Vasileios Belagiannis	(参考訳) 車の運転手は、交通官のジェスチャーに反応する方法を知っています。道路交通制御のジェスチャー認識機能がない限り、これは自動運転車には当てはまらないことは明らかだ。本研究では、交通制御ジェスチャー認識のための学習データを提供するため、既存の自動運転データセットの制限に対処する。本稿では3Dボディスケルトン入力に基づくデータセットを導入し,時間ステップ毎に交通制御のジェスチャー分類を行う。私たちのデータセットは、複数のアクターによる250のシーケンスで構成されています。このデータセットを評価するために,再帰的ネットワーク,注意機構,時間的畳み込みネットワーク,グラフ畳み込みネットワークなどのディープニューラルネットワークに基づく8つの逐次処理モデルを提案する。我々は、データセットに対する全てのアプローチの広範な評価と分析、および現実世界の定量的評価について述べる。コードとデータセットが公開されている。 A car driver knows how to react on the gestures of the traffic officers. Clearly, this is not the case for the autonomous vehicle, unless it has road traffic control gesture recognition functionalities. In this work, we address the limitation of the existing autonomous driving datasets to provide learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body skeleton input to perform traffic control gesture classification on every time step. Our dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence. To evaluate our dataset, we propose eight sequential processing models based on deep neural networks such as recurrent networks, attention mechanism, temporal convolutional networks and graph convolutional networks. We present an extensive evaluation and analysis of all approaches for our dataset, as well as real-world quantitative evaluation. The code and dataset is publicly available.	翻訳日:2022-11-04 06:37:40 公開日:2020-07-31
# 医用画像分類のための畳み込みニューラルネットワークにおける新しいグローバル空間注意機構 A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification ( http://arxiv.org/abs/2007.15897v1 ) ライセンス: Link先を確認	Linchuan Xu, Jun Huang, Atsushi Nitanda, Ryo Asaoka, Kenji Yamanishi	(参考訳) 画像分類を含む視覚的タスクのパフォーマンスと解釈性を改善するために、畳み込みニューラルネットワーク(CNN)に空間的注意が導入された。空間的注意の本質は、同じ層またはチャネル内でアクティベーションの相対的重要性を表す重みマップを学ぶことである。既存の注意のメカニズムはすべて、重みマップが画像に特有であるという意味で局所的な注意である。しかし, 医療分野では, 画像の集合が同一対象と同一の症状を記録し, 同一の構造的内容を共有するため, すべての画像が同じ重みマップを共有する必要がある場合がある。本稿では,医療画像の分類を主目的とし,cnnにおける新たな空間的注目機構を提案する。グローバルウェイトマップは重要なピクセルと重要でないピクセルの間の決定境界によってインスタンス化される。また,画素内のすべての画像の強度が画素の特徴であるバイナリ分類器によって決定境界を実現することを提案する。バイナリ分類は画像分類CNNに統合され、CNNと共に最適化される。 2つの医用画像データセットと1つの表情データセットの実験により、googlenet, vgg, resnet, densenetの4つの強力なcnnの性能向上だけでなく、有意義な出席領域も得られ、ドメインのイメージの内容を理解するのに有用であることが示された。 Spatial attention has been introduced to convolutional neural networks (CNNs) for improving both their performance and interpretability in visual tasks including image classification. The essence of the spatial attention is to learn a weight map which represents the relative importance of activations within the same layer or channel. All existing attention mechanisms are local attentions in the sense that weight maps are image-specific. However, in the medical field, there are cases that all the images should share the same weight map because the set of images record the same kind of symptom related to the same object and thereby share the same structural content. In this paper, we thus propose a novel global spatial attention mechanism in CNNs mainly for medical image classification. The global weight map is instantiated by a decision boundary between important pixels and unimportant pixels. And we propose to realize the decision boundary by a binary classifier in which the intensities of all images at a pixel are the features of the pixel. The binary classification is integrated into an image classification CNN and is to be optimized together with the CNN. Experiments on two medical image datasets and one facial expression dataset showed that with the proposed attention, not only the performance of four powerful CNNs which are GoogleNet, VGG, ResNet, and DenseNet can be improved, but also meaningful attended regions can be obtained, which is beneficial for understanding the content of images of a domain.	翻訳日:2022-11-04 06:31:30 公開日:2020-07-31
# リモートセンシングのためのニューラルスタイル転送 Neural Style Transfer for Remote Sensing ( http://arxiv.org/abs/2007.15920v1 ) ライセンス: Link先を確認	Maria Karatzoglidi, Georgios Felekis and Eleni Charou	(参考訳) Leon A. Gatys らの論文 "A Neural Algorithm of Artistic Style" で概説された有名なテクニックは、学術文学と産業応用の両方においてトレンドとなっている。 Neural Style Transfer (NST)は、2D画像の芸術的スタイリング、ユーザ支援作成ツール、エンターテイメントアプリケーションのための制作ツールなど、幅広い用途に欠かせないツールである。本研究の目的は,NSTアルゴリズムに基づく衛星画像から芸術地図を作成する方法を提案することである。この方法は3つの基本的なステップを含む (i)衛星画像における意味的画像分割の適用、その内容の分類(陸水) (二)各クラス及びクラスに対するニューラルスタイル転送の適用 (iii)コラージュ、すなわち、前段の2つの様式化された画像の組み合わせからなる芸術的画像の作成。 The well-known technique outlined in the paper of Leon A. Gatys et al., A Neural Algorithm of Artistic Style, has become a trending topic both in academic literature and industrial applications. Neural Style Transfer (NST) constitutes an essential tool for a wide range of applications, such as artistic stylization of 2D images, user-assisted creation tools and production tools for entertainment applications. The purpose of this study is to present a method for creating artistic maps from satellite images, based on the NST algorithm. This method includes three basic steps (i) application of semantic image segmentation on the original satellite image, dividing its content into classes (i.e. land, water), (ii) application of neural style transfer for each class and (iii) creation of a collage, i.e. an artistic image consisting of a combination of the two stylized image generated on the previous step.	翻訳日:2022-11-04 06:31:07 公開日:2020-07-31
# 3D検出ネットワークを用いた乳房超音波自動診断 Computer-aided Tumor Diagnosis in Automated Breast Ultrasound using 3D Detection Network ( http://arxiv.org/abs/2007.16133v1 ) ライセンス: Link先を確認	Junxiong Yu, Chaoyu Chen, Xin Yang, Yi Wang, Dan Yan, Jianxing Zhang, Dong Ni	(参考訳) 自動乳房超音波(ABUS)は、乳がんの診断と診断のための新しい将来性のある画像モダリティであり、直感的な3D情報と診断価値の高い冠動脈平面情報を提供することができる。しかし、ABUS画像から腫瘍を手動でスクリーニング・診断することは非常に時間がかかり、異常の見落としが生じる可能性がある。そこで本研究では, 病変部位を同定し, 良性腫瘍, 悪性腫瘍と分類するための新しい2段階3D検出ネットワークを提案する。具体的には,abus画像中の病変を同定するために,頻繁に使用されるセグメンテーションネットワークではなく,3次元検出ネットワークを提案する。新しい類似性損失は、病変と背景を効果的に区別するように設計されている。次に、検出された病変を良性または悪性と識別する分類ネットワークを用いる。分類タスクと局所化タスクの相関を改善するために,IoUバランスの取れた分類損失を採用する。良性腫瘍145例,悪性腫瘍273例の418例を対象に,本ネットワークの有効性を検証した。実験により, ネットワークの感度は97.66%, 1.23偽陽性 (FPs) であり, 曲線(AUC) 値0.8720以下の領域を有することがわかった。 Automated breast ultrasound (ABUS) is a new and promising imaging modality for breast cancer detection and diagnosis, which could provide intuitive 3D information and coronal plane information with great diagnostic value. However, manually screening and diagnosing tumors from ABUS images is very time-consuming and overlooks of abnormalities may happen. In this study, we propose a novel two-stage 3D detection network for locating suspected lesion areas and further classifying lesions as benign or malignant tumors. Specifically, we propose a 3D detection network rather than frequently-used segmentation network to locate lesions in ABUS images, thus our network can make full use of the spatial context information in ABUS images. A novel similarity loss is designed to effectively distinguish lesions from background. Then a classification network is employed to identify the located lesions as benign or malignant. An IoU-balanced classification loss is adopted to improve the correlation between classification and localization task. The efficacy of our network is verified from a collected dataset of 418 patients with 145 benign tumors and 273 malignant tumors. Experiments show our network attains a sensitivity of 97.66% with 1.23 false positives (FPs), and has an area under the curve(AUC) value of 0.8720.	翻訳日:2022-11-04 06:30:52 公開日:2020-07-31
# 自己教師付き学習による臨床脳波信号の構造解明 Uncovering the structure of clinical EEG signals with self-supervised learning ( http://arxiv.org/abs/2007.16104v1 ) ライセンス: Link先を確認	Hubert Banville, Omar Chehab, Aapo Hyv\"arinen, Denis-Alexander Engemann, Alexandre Gramfort	(参考訳) 目的。教師付き学習パラダイムは、しばしば利用可能なラベル付きデータの量によって制限される。この現象は脳波(EEG)などの臨床関連データに特に問題があり、専門的な専門知識や人的処理時間の観点からラベル付けに費用がかかる。その結果、脳波データで学習するために設計されたディープラーニングアーキテクチャは、従来の機能ベースアプローチとよく似た、比較的浅いモデルとパフォーマンスを生み出した。しかし、ほとんどの状況では、ラベルのないデータは豊富に利用できる。このラベルのないデータから情報を抽出することで、ラベルへのアクセスが制限されているにもかかわらず、ディープニューラルネットワークで競合性能に達することができるかもしれない。アプローチ。脳波信号の表現を学習するために,ラベルのないデータの構造を発見するための有望な手法である自己教師学習(SSL)について検討した。具体的には,脳波に基づく睡眠ステージングと病理診断という2つの臨床関連課題に対して,時間的文脈予測と対比的予測符号化に基づく2つの課題を検討した。何千もの録音を伴う2つの大規模公開データセットの実験を行い、ベースライン比較を行った。主な結果。 SSLで学習した機能に基づいてトレーニングされた線形分類器は、低ラベルのデータレギュレーションで純粋に監視されたディープニューラルネットワークよりも優れ、すべてのラベルが利用可能になった時に競争力のあるパフォーマンスを達成した。さらに,各手法で得られた埋込みは,年齢効果などの生理現象や臨床現象に関連する明らかな潜伏構造を示した。重要なこと。脳波データに対する自己教師あり学習アプローチの利点を実証する。我々の結果は、SSLが脳波データのディープラーニングモデルをより広く活用する道を開くことを示唆している。 Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models and performances at best similar to those of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach. We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction as well as contrastive predictive coding on two clinically-relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results. Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled data regimes while reaching competitive performance when all labels were available. Additionally, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance. We demonstrate the benefit of self-supervised learning approaches on EEG data. Our results suggest that SSL may pave the way to a wider use of deep learning models on EEG data.	翻訳日:2022-11-04 06:30:01 公開日:2020-07-31
# 構造化宝くじを用いた食事深層音響モデル Diet deep generative audio models with structured lottery ( http://arxiv.org/abs/2007.16170v1 ) ライセンス: Link先を確認	Philippe Esling, Ninon Devis, Adrien Bitton, Antoine Caillon, Axel Chemla--Romeu-Santos, Constance Douwes	(参考訳) ディープラーニングモデルは、ほとんどのオーディオアプリケーション分野で非常に成功したソリューションを提供している。しかし、これらのモデルの高精度さは、膨大な計算コストを犠牲にしている。この側面は、提案されたモデルの品質を評価する際に、ほとんど常に見過ごされている。しかし、モデルは複雑さを考慮せずに評価するべきではない。この側面はオーディオアプリケーションにおいて特に重要であり、リアルタイム制約のある特殊な組み込みハードウェアに大きく依存している。本稿では,深層音響モデルにおける宝くじチケット仮説を研究することにより,深層モデルが過小評価されているという最近の観測結果について述べる。この仮説は、非常に効率的な小さなサブネットワークが深いモデルに存在し、孤立して訓練された場合、より大きなモデルよりも高い精度を提供するというものである。しかし、宝くじは構造化されていないマスキングに依存するため、結果として得られるモデルはディスクサイズや推論時間に何の利益も与えない。そこで我々は,構造化トリミングを行う手法を開発した。これはグローバル選択に頼り、相互情報に基づく特定の基準を導入する必要があることを示す。まず、小型モデルが大型モデルよりも精度が高いという驚くべき結果を確認する。さらに, モデル重量の最大95%を, 精度の大幅な低下を伴わずに除去できることを示した。したがって、wavenet、sing、ddspなどの一般的な手法をまたいで、高い精度で最大100倍小さい生成音声の非常に軽量なモデルを得ることができる。 Raspberry PiとArduinoにこれらのモデルを埋め込む理論的境界について検討し、大きなGPUモデルと同じ品質のCPU上で生成モデルを得ることができることを示す。最後に,組込みプラットフォーム上での深層生成音声モデルの実装の可能性について論じる。 Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computation cost. This aspect is almost always overlooked in evaluating the quality of proposed models. However, models should not be evaluated without taking into account their complexity. This aspect is especially critical in audio applications, which heavily relies on specialized embedded hardware with real-time constraints. In this paper, we build on recent observations that deep models are highly overparameterized, by studying the lottery ticket hypothesis on deep generative audio models. This hypothesis states that extremely efficient small sub-networks exist in deep models and would provide higher accuracy than larger models if trained in isolation. However, lottery tickets are found by relying on unstructured masking, which means that resulting models do not provide any gain in either disk size or inference time. Instead, we develop here a method aimed at performing structured trimming. We show that this requires to rely on global selection and introduce a specific criterion based on mutual information. First, we confirm the surprising result that smaller models provide higher accuracy than their large counterparts. We further show that we can remove up to 95% of the model weights without significant degradation in accuracy. Hence, we can obtain very light models for generative audio across popular methods such as Wavenet, SING or DDSP, that are up to 100 times smaller with commensurate accuracy. We study the theoretical bounds for embedding these models on Raspberry Pi and Arduino, and show that we can obtain generative models on CPU with equivalent quality as large GPU models. Finally, we discuss the possibility of implementing deep generative audio models on embedded platforms.	翻訳日:2022-11-04 06:29:36 公開日:2020-07-31
# 宝くじトリミングによる超軽量深度MIR Ultra-light deep MIR by trimming lottery tickets ( http://arxiv.org/abs/2007.16187v1 ) ライセンス: Link先を確認	Philippe Esling, Theis Bazin, Adrien Bitton, Tristan Carsault, Ninon Devis	(参考訳) 音楽情報検索における現状の成果は、主にディープラーニングのアプローチに支配されている。これらはすべてのタスクに対して前例のない精度を提供する。しかし、これらのモデルの一貫して見過ごされがちな欠点は、驚くほど複雑であり、それが成功に不可欠であるように思える。本稿では,抽選券仮説に基づくモデル刈り込み手法を提案することで,この問題に対処した。個々の重みをマスクする代わりに、ユニット全体の構造的なトリミングを通じてパラメータを明示的に削除できるように、元のアプローチを変更します。これにより,サイズやメモリ,操作数といった面で,事実上軽量なモデルが実現される。本提案は,精度を損なうことなく,最大90%のモデルパラメータを除去できることを示す。我々は、より小さな圧縮比(ネットワークの最大85%)で、より軽いモデルが、より重いモデルよりも一貫して優れているという驚くべき結果を確認した。我々はこれらの結果を,音声分類,ピッチ認識,コード抽出,ドラムの書き起こし,オンセット推定など,多数のMIRタスクで示す。 MIRの超軽量ディープラーニングモデルはCPU上で動作し、最小限の精度で組み込みデバイスに適合する。 Current state-of-the-art results in Music Information Retrieval are largely dominated by deep learning approaches. These provide unprecedented accuracy across all tasks. However, the consistently overlooked downside of these models is their stunningly massive complexity, which seems concomitantly crucial to their success. In this paper, we address this issue by proposing a model pruning method based on the lottery ticket hypothesis. We modify the original approach to allow for explicitly removing parameters, through structured trimming of entire units, instead of simply masking individual weights. This leads to models which are effectively lighter in terms of size, memory and number of operations. We show that our proposal can remove up to 90% of the model parameters without loss of accuracy, leading to ultra-light deep MIR models. We confirm the surprising result that, at smaller compression ratios (removing up to 85% of a network), lighter models consistently outperform their heavier counterparts. We exhibit these results on a large array of MIR tasks including audio classification, pitch recognition, chord extraction, drum transcription and onset estimation. The resulting ultra-light deep learning models for MIR can run on CPU, and can even fit on embedded devices with minimal degradation of accuracy.	翻訳日:2022-11-04 06:29:11 公開日:2020-07-31
# SimulEval: 同時翻訳のための評価ツールキット SimulEval: An Evaluation Toolkit for Simultaneous Translation ( http://arxiv.org/abs/2007.16193v1 ) ライセンス: Link先を確認	Xutai Ma, Mohammad Javad Dousti, Changhan Wang, Jiatao Gu, Juan Pino	(参考訳) テキストと音声の同時翻訳は、モデルが完全なソース入力を読む前に翻訳を開始するリアルタイムおよび低レイテンシシナリオに焦点を当てる。同時翻訳モデルの評価は、レイテンシが翻訳品質に加えて考慮すべき要素であることから、オフラインモデルよりも複雑である。研究コミュニティは、同時翻訳のための新しいモデリングアプローチに重点を置いているが、現在では普遍的な評価手順を欠いている。そこで本研究では,テキストと音声の同時翻訳のための簡易かつ汎用的な評価ツールキットであるSimulEvalを提案する。サーバクライアントスキームを導入し、同時に翻訳シナリオを作成し、サーバがソース入力を送り、評価のための予測を受け取り、クライアントがカスタマイズされたポリシーを実行する。ポリシーが与えられたら、自動的に同時デコードを実行し、いくつかの一般的なレイテンシメトリクスをまとめて報告する。また、テキスト同時翻訳から音声タスクへの遅延メトリクスも適用する。さらに、SimulEvalは、システムの同時復号プロセスをよりよく理解するための可視化インターフェースを備えている。 SimulEvalはすでに、IWSLT 2020の同時音声翻訳タスクに広く使われている。コードは出版時に公開される。 Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario where the model starts translating before reading the complete source input. Evaluating simultaneous translation models is more complex than offline models because the latency is another factor to consider in addition to translation quality. The research community, despite its growing focus on novel modeling approaches to simultaneous translation, currently lacks a universal evaluation procedure. Therefore, we present SimulEval, an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation. A server-client scheme is introduced to create a simultaneous translation scenario, where the server sends source input and receives predictions for evaluation and the client executes customized policies. Given a policy, it automatically performs simultaneous decoding and collectively reports several popular latency metrics. We also adapt latency metrics from text simultaneous translation to the speech task. Additionally, SimulEval is equipped with a visualization interface to provide better understanding of the simultaneous decoding process of a system. SimulEval has already been extensively used for the IWSLT 2020 shared task on simultaneous speech translation. Code will be released upon publication.	翻訳日:2022-11-04 06:28:33 公開日:2020-07-31
# ランキング指向推薦システムグラフの埋め込み Embedding Ranking-Oriented Recommender System Graphs ( http://arxiv.org/abs/2007.16173v1 ) ライセンス: Link先を確認	Taher Hekmatfar, Saman Haratizadeh, Sama Goliaei	(参考訳) グラフベースレコメンダシステム(grss)は、データのグラフィカル表現における構造情報を解析し、特に直接ユーザ・テーマ関係データがスパースである場合に、より優れたレコメンデーションを行う。主要なレコメンデーションシステムを構成するランク指向のGRSは、主にノードの類似度を測定するために好み(またはランク)データのグラフィカルな表現を使い、そこから近隣のメカニズムを使ってレコメンデーションリストを推測することができる。本稿では,グラフベースの新しいランク指向推薦フレームワークであるPGRecを提案する。 PGRecは、PrefGraphと呼ばれる新しいグラフ構造によって、アイテムよりもユーザの好みをモデル化する。このグラフは、要素化と深層学習の両方の手法を利用して、ユーザ、アイテム、嗜好を表すベクトルを抽出する改良された埋め込みアプローチによって活用される。結果の埋め込みは、最終的なレコメンデーションリストが推測されるユーザの未知のペアワイズ選好を予測するために使用される。本研究では,提案手法の性能評価を行い,pgrecが,映画レンスデータセットのndcg@10において,ベースラインアルゴリズムを最大3.2%上回ることを示した。 Graph-based recommender systems (GRSs) analyze the structural information in the graphical representation of data to make better recommendations, especially when the direct user-item relation data is sparse. Ranking-oriented GRSs that form a major class of recommendation systems, mostly use the graphical representation of preference (or rank) data for measuring node similarities, from which they can infer a recommendation list using a neighborhood-based mechanism. In this paper, we propose PGRec, a novel graph-based ranking-oriented recommendation framework. PGRec models the preferences of the users over items, by a novel graph structure called PrefGraph. This graph is then exploited by an improved embedding approach, taking advantage of both factorization and deep learning methods, to extract vectors representing users, items, and preferences. The resulting embedding are then used for predicting users' unknown pairwise preferences from which the final recommendation lists are inferred. We have evaluated the performance of the proposed method against the state of the art model-based and neighborhood-based recommendation methods, and our experiments show that PGRec outperforms the baseline algorithms up to 3.2% in terms of NDCG@10 in different MovieLens datasets.	翻訳日:2022-11-04 06:22:50 公開日:2020-07-31
# 深層学習に基づく変調分類器に対する複数アンテナによる逆攻撃 Adversarial Attacks with Multiple Antennas Against Deep Learning-Based Modulation Classifiers ( http://arxiv.org/abs/2007.16204v1 ) ライセンス: Link先を確認	Brian Kim and Yalin E. Sagduyu and Tugba Erpek and Kemal Davaslioglu and Sennur Ulukus	(参考訳) 本稿では,受信者が異なる変調方式の受信者に信号を送信し,受信者が深層学習に基づく分類器を用いて受信信号の変調方式を分類する無線通信システムを考える。同時に、敵は複数のアンテナを用いて敵の摂動を送信し、分類器を騙して受信した信号を誤分類する。敵の機械学習の観点から、敵の攻撃性能を改善するために、敵の複数のアンテナを利用する方法を示す。 2つの主要なポイントは、敵の複数のアンテナ、すなわちアンテナ間の電力配分とチャネルの多様性の利用を利用して検討される。まず,一つのアンテナを持つ複数の独立した敵は,同じ総電力で複数のアンテナを持つ1つの敵に比べて攻撃性能が向上しないことを示す。そこで我々は,1つのアンテナのみに電力を割り当てたり,チャネルゲインに比例あるいは逆比例するなど,複数のアンテナ間で電力を割り当てる様々な方法を検討する。チャネルの多様性を利用して,シンボルレベルで最大チャンネル利得を有するチャネルを介して逆摂動を伝達する攻撃を提案する。この攻撃は,アンテナ間のチャネルのばらつきやチャネル相関の観点から,異なるチャネル条件下での他の攻撃と比較して,分類精度が著しく低下することを示す。また,チャネルの多様性を活かして敵攻撃を行うアンテナの数が増加するにつれて,攻撃の成功が著しく向上することを示す。 We consider a wireless communication system, where a transmitter sends signals to a receiver with different modulation types while the receiver classifies the modulation types of the received signals using its deep learning-based classifier. Concurrently, an adversary transmits adversarial perturbations using its multiple antennas to fool the classifier into misclassifying the received signals. From the adversarial machine learning perspective, we show how to utilize multiple antennas at the adversary to improve the adversarial (evasion) attack performance. Two main points are considered while exploiting the multiple antennas at the adversary, namely the power allocation among antennas and the utilization of channel diversity. First, we show that multiple independent adversaries, each with a single antenna cannot improve the attack performance compared to a single adversary with multiple antennas using the same total power. Then, we consider various ways to allocate power among multiple antennas at a single adversary such as allocating power to only one antenna, and proportional or inversely proportional to the channel gain. By utilizing channel diversity, we introduce an attack to transmit the adversarial perturbation through the channel with the largest channel gain at the symbol level. We show that this attack reduces the classifier accuracy significantly compared to other attacks under different channel conditions in terms of channel variance and channel correlation across antennas. Also, we show that the attack success improves significantly as the number of antennas increases at the adversary that can better utilize channel diversity to craft adversarial attacks.	翻訳日:2022-11-04 06:22:27 公開日:2020-07-31
# マルウェアデータの意味のあるクラスタの同定 Identifying meaningful clusters in malware data ( http://arxiv.org/abs/2008.01175v1 ) ライセンス: Link先を確認	Renato Cordeiro de Amorim and Carlos David Lopez Ruiz	(参考訳) ドライブ・バイ・ダウンのマルウェアデータに意味のあるクラスタを見つけることは特に難しい作業である。マルウェアデータは、幅広い濃度の異なる重なり合うクラスタを含む傾向にある。これは、マルウェアのサンプルの間にかなり類似している可能性があるためである(いくつかは同一の家系に属すると言われている)。クラスタリングアルゴリズムは通常、正規化されたデータセットに適用される。しかし、正規化のプロセスは、クラスタリングに類似した貢献をするために異なる範囲の値で特徴を設定することを目的としている。意味の薄いものよりも有意義な機能を好まないので、データ前処理の段階を期待すべきだろう。本稿では,上記の問題に正確に対処する手法を提案する。クラスタ間の分離を増加させることのできる反復データ前処理方法である。それぞれの機能のクラスタ内の関連度を計算し、それをデータ再スケーリングファクタとして使用します。これを収束するまで繰り返すことで、マルウェアデータはクリアなクラスタに分離され、平均的なシルエット幅が増加した。 Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations of cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, the process of normalisation aims at setting features with different range values to have a similar contribution to the clustering. It does not favour more meaningful features over those that are less meaningful, an effect one should perhaps expect of the data pre-processing stage. In this paper we introduce a method to deal precisely with the problem above. This is an iterative data pre-processing method capable of aiding to increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature, and then it uses these as a data rescaling factor. By repeating this until convergence our malware data was separated in clear clusters, leading to a higher average silhouette width.	翻訳日:2022-11-04 06:21:23 公開日:2020-07-31
# 実世界信号のクラウドソーシング音声品質評価予測のためのピラミッドリカレントネットワーク A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals ( http://arxiv.org/abs/2007.15797v1 ) ライセンス: Link先を確認	Xuan Dong and Donald S. Williamson	(参考訳) 目的語質尺度の現実的能力は,(1)実環境を適切にモデル化しない模擬データから,(2)主観的評価と強く相関しない客観的スコアを推定することにより,制限される。さらに、リスナー品質評価を伴う現実世界の信号の大規模なデータセットは、現在存在しないため、現実世界の評価が容易になる。本稿では,人間の聞き手によって評価される実世界の音声信号の知覚的品質を収集し,予測する。まず,2つの実世界のコーパス上でクラウドソースによる聞き取り調査を行い,質の高い評価データセットを収集した。さらに、注目機構を備えたピラミッド双方向長期記憶(pBLSTM)ネットワークを用いて、人間の品質評価を予測する新しい手法を開発した。その結果,予測スコアが人的判断と強く相関する従来の評価手法よりも統計的に低い推定誤差が得られた。 The real-world capabilities of objective speech quality measures are limited since current measures (1) are developed from simulated data that does not adequately model real environments; or they (2) predict objective scores that are not always strongly correlated with subjective ratings. Additionally, a large dataset of real-world signals with listener quality ratings does not currently exist, which would help facilitate real-world assessment. In this paper, we collect and predict the perceptual quality of real-world speech signals that are evaluated by human listeners. We first collect a large quality rating dataset by conducting crowdsourced listening studies on two real-world corpora. We further develop a novel approach that predicts human quality ratings using a pyramid bidirectional long short term memory (pBLSTM) network with an attention mechanism. The results show that the proposed model achieves statistically lower estimation errors than prior assessment approaches, where the predicted scores strongly correlate with human judgments.	翻訳日:2022-11-04 06:20:48 公開日:2020-07-31
# 大規模肺炎と気胸を用いた弱監督型一段階視覚と言語疾患の検出 Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies ( http://arxiv.org/abs/2007.15778v1 ) ライセンス: Link先を確認	Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, and Daguang Xu	(参考訳) 詳細なラベルがないため、大きなデータセットにもかかわらず、医療画像における臨床関連オブジェクトの検出は困難である。ラベル問題に対処するために、自然言語情報を含む検出アーキテクチャを用いてシーンレベルのラベルを利用する。特に肺炎と気胸に焦点をあてたMIMIC-CXRデータセットに,放射線技師によるペアリングボックスと自然言語アノテーションを新たに導入した。このデータセットと合わせて,クラスアクティベーションマッピング(CAM)や勾配CAM,およびNIH ChestXray-14およびMIMIC-CXRデータセットに対する関連する実装との強力なベースライン比較とともに,弱教師付きトランスフォーマー層選択型ワンステージデュアルヘッド検出アーキテクチャ(LITERATI)を提案する。視覚言語アーキテクチャの進歩から借用したliterati法は、純粋に監督された方法でスケールする検出のために、画像と参照表現(自然言語で画像にローカライズされたオブジェクト)の入力を示す。アーキテクチャの変更は、3つの障害に対処する - 教師付き視覚と言語検出を弱教師付きで実装し、臨床参照表現自然言語情報を取り入れ、マップ確率の高い忠実度検出を生成する。それにもかかわらず、微妙な参照、マルチインスタンス仕様、比較的冗長な医療報告を含む放射線医学的アノテーションの難易度は、スケールでの視覚言語検出タスクを将来的な調査に刺激し続ける。 Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR dataset. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection that scales in a purely weakly supervised fashion. The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures the vision language detection task at scale remains stimulating for future investigation.	翻訳日:2022-11-04 06:20:34 公開日:2020-07-31
# 身体を見る: 心理的苦痛における身体の身振りと自己適応の自動分析 Looking At The Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress ( http://arxiv.org/abs/2007.15815v1 ) ライセンス: Link先を確認	Weizhe Lin, Indigo Orton, Qingbiao Li, Gabriela Pavarini, Marwa Mahmoud	(参考訳) 心理的苦痛は社会において重要かつ増大する問題である。このような苦痛の自動検出、評価、分析は、研究の活発な領域である。顔、頭、声といったモダリティと比較して、これらのタスクに身体のモダリティを使用することを研究する研究は比較的少ない。これは、利用可能なデータセットが限られていることや、有用な身体機能を自動的に抽出するのが難しいことによる。最近のポーズ推定とディープラーニングの進歩により、このモダリティとドメインに対する新しいアプローチが可能になった。そこで本研究では,短時間のインタビューや自己報告の苦難ラベルのための全身ビデオを含む新しいデータセットを収集・分析した。本研究では,自己適応者のサブセットである自己適応とフィジットを自動的に検出する新たな手法を提案する。統計的身体動作とフィジット機能の分析を行い、被験者の行動にどう影響するかを探索する。そこで本研究では,マルチモーダル・ディープ・デノイジングオートエンコーダとフィッシャーベクトルエンコーディングの改良を用いた特徴表現を組み合わせるマルチモーダル手法を提案する。提案モデルでは,自己報告型不安度と抑うつ度とをラベル付けしたデータセットにおいて,音声視覚機能と自動検出型行動手がかりを組み合わせることで,苦痛レベルを予測できることを実証した。 Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full body videos for short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels.	翻訳日:2022-11-04 06:20:03 公開日:2020-07-31
# アナカタバティック慣性:PSOのための粒子ワイド適応慣性 Anakatabatic Inertia: Particle-wise Adaptive Inertia for PSO ( http://arxiv.org/abs/2008.00979v1 ) ライセンス: Link先を確認	Sini\v{s}a Dru\v{z}eta, Stefan Ivi\'c	(参考訳) 粒子群最適化の開発を通じて、粒子慣性は可能な方法の改善を研究する方法の重要な側面として確立されてきた。先行研究の継続として, 個々の粒子の適合性向上に基づく慣性重み適応の新たな一般化手法, anakatabatic inertiaを提案する。この手法により、粒子の増大または減少に対応する各粒子の慣性重量値、すなわち粒子の昇降(アナバティック)または下降(カタバティック)運動によって条件づけられる。提案する慣性重み制御フレームワークは、cec 2014テストスイートの30のテスト機能でメタ最適化され、テストされた。提案手法は, 使用するPSO法(Standard PSOおよびTVAC-PSO)毎に4種類のアナカタバティックモデルを生成した。ベンチマーク実験の結果, anakatabatic inertiaモデルを用いた場合, 標準pso(最終フィットネス最小値が0.09桁まで低下する)と, 比較的強力なtvac-pso(最終フィットネス最小値が0.59桁まで低下する)の精度向上が, ほとんどが方法の性能に悪影響を及ぼすことなく確実に達成できることがわかった。 Throughout the course of the development of Particle Swarm Optimization, particle inertia has been established as an important aspect of the method for researching possible method improvements. As a continuation of our previous research, we propose a novel generalized technique of inertia weight adaptation based on individual particle's fitness improvement, called anakatabatic inertia. This technique allows for adapting inertia weight value for each particle corresponding to the particle's increasing or decreasing fitness, i.e. conditioned by particle's ascending (anabatic) or descending (katabatic) movement. The proposed inertia weight control framework was metaoptimized and tested on the 30 test functions of the CEC 2014 test suite. The conducted procedure produced four anakatabatic models, two for each of the PSO methods used (Standard PSO and TVAC-PSO). The benchmark testing results show that using the proposed anakatabatic inertia models reliably yield moderate improvements in accuracy of Standard PSO (final fitness minimum reduced up to 0.09 orders of magnitude) and rather strong improvements for TVAC-PSO (final fitness minimum reduced up to 0.59 orders of magnitude), mostly without any adverse effects on the method's performance.	翻訳日:2022-11-04 06:13:30 公開日:2020-07-31
# トロイの木馬ニューラルネットワークの実用的検出:データ制限とデータフリーケース Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases ( http://arxiv.org/abs/2007.15802v1 ) ライセンス: Link先を確認	Ren Wang, Gaoyuan Zhang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong, Meng Wang	(参考訳) トレーニングデータが悪質に改ざんされた場合、取得したディープニューラルネットワーク(DNN)の予測は、トロイの木馬攻撃(または中毒バックドア攻撃)と呼ばれる敵によって操作できる。トロイの木馬攻撃に対するDNNの堅牢性の欠如は、下流アプリケーションにおけるリアルタイム機械学習(ML)システムに大きなダメージを与える可能性がある。本稿では,データカース方式におけるトロイの木馬ネットワーク(trojannet)検出の問題点について検討する。まず,データ限定型TrojanNet検出器(TND)を提案する。トロイの木馬攻撃と,各サンプル攻撃,全サンプルユニバーサル攻撃を含む予測・回避攻撃の関連を探索することにより,有効なデータ制限型tndを確立できることを示す。さらに,データサンプルにアクセスせずにTrojanNetを検出できるデータフリーTNDを提案する。このようなTNDは、ランダムノイズ入力においてもトロイの木馬の挙動を示す隠れニューロンの内部応答を利用して構築可能であることを示す。提案手法の有効性は, CIFAR-10, GTSRB, ImageNetなど, 異なるモデルアーキテクチャおよびデータセット下での広範な実験により評価される。 When the training data are maliciously tampered, the predictions of the acquired deep neural network (DNN) can be manipulated by an adversary known as the Trojan attack (or poisoning backdoor attack). The lack of robustness of DNNs against Trojan attacks could significantly harm real-life machine learning (ML) systems in downstream applications, therefore posing widespread concern to their trustworthiness. In this paper, we study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime, where only the weights of a trained DNN are accessed by the detector. We first propose a data-limited TrojanNet detector (TND), when only a few data samples are available for TrojanNet detection. We show that an effective data-limited TND can be established by exploring connections between Trojan attack and prediction-evasion adversarial attacks including per-sample attack as well as all-sample universal attack. In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples. We show that such a TND can be built by leveraging the internal response of hidden neurons, which exhibits the Trojan behavior even at random noise inputs. The effectiveness of our proposals is evaluated by extensive experiments under different model architectures and datasets including CIFAR-10, GTSRB, and ImageNet.	翻訳日:2022-11-04 06:12:23 公開日:2020-07-31
# 深い直接的可能性のノックオフ Deep Direct Likelihood Knockoffs ( http://arxiv.org/abs/2007.15835v1 ) ライセンス: Link先を確認	Mukund Sudarshan, Wesley Tansey, Rajesh Ranganath	(参考訳) 予測モデリングでは、ディープニューラルネットワークなどのブラックボックス機械学習手法を使用して最先端のパフォーマンスを実現することが多い。科学的領域では、科学者は予測を行うのにどの特徴が実際に重要なのかを知りたがることが多い。これらの発見は、コストのかかるフォローアップ実験につながる可能性があり、発見に対するエラー率があまり高くないことが重要である。 model-xのノックオフにより、fdrを制御して重要な機能を発見できる。しかし、ノックオフには、いわゆる"swap"プロパティに準拠しながら、ノックオフ機能を正確にモデル化できるリッチな生成モデルが必要である。我々は、ノックオフスワップ特性がもたらすKLの発散を直接最小化するDeep Direct Likelihood Knockoffs (DDLK) を開発した。 DDLKは、まず特徴の明示的な可能性を最大化し、次に特徴とノックオフの結合分布とそれらのスワップ間のKLのばらつきを最小化する。生成したノックオフが任意のスワップで有効であることを保証するため、DDLKはGumbel-Softmaxトリックを使用して、最悪のスワップでノックオフジェネレータを最適化する。 DDLKはベースラインよりも高いパワーを持ち、COVID-19の震源の1つである大規模なデータセットを含む様々な合成および実際のベンチマークでの偽発見率を制御する。 Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the FDR. However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called "swap" property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving a large dataset from one of the epicenters of COVID-19.	翻訳日:2022-11-04 06:11:42 公開日:2020-07-31
# 再利用距離の学習 Learning Forward Reuse Distance ( http://arxiv.org/abs/2007.15859v1 ) ライセンス: Link先を確認	Pengcheng Li, Yongbin Gu	(参考訳) キャッシング技術は、コンピュータアーキテクチャにおけるwebキャッシュからインフラストラクチャ、memcached、メモリキャッシュに至るまで、クラウドコンピューティングの時代において広く使われている。キャッシュされたデータの予測は、キャッシュ管理とパフォーマンスを大幅に改善する。近年のディープラーニング技術の進歩は、新しいインテリジェントキャッシュ置換ポリシーの設計を可能にする。本研究では,将来のデータアクセスを予測する学習支援手法を提案する。 LSTMに基づく強力なリカレントニューラルネットワークモデルにより,キャッシュトレースのみを入力として,高い予測精度が得られることがわかった。高い精度は、慎重に作られたローカリティ駆動の機能設計から得られる。高い予測精度に触発され、擬似PTポリシーを提案し、Microsoft Researchから13の現実世界のストレージワークロードに対して評価する。その結果、新しいキャッシュポリシーは、最先端の実用的なポリシーを最大19.2%改善し、平均でオプトより2.3%高いミス率しか発生しないことがわかった。 Caching techniques are widely used in the era of cloud computing from applications, such as Web caches to infrastructures, Memcached and memory caches in computer architectures. Prediction of cached data can greatly help improve cache management and performance. The recent advancement of deep learning techniques enables the design of novel intelligent cache replacement policies. In this work, we propose a learning-aided approach to predict future data accesses. We find that a powerful LSTM-based recurrent neural network model can provide high prediction accuracy based on only a cache trace as input. The high accuracy results from a carefully crafted locality-driven feature design. Inspired by the high prediction accuracy, we propose a pseudo OPT policy and evaluate it upon 13 real-world storage workloads from Microsoft Research. Results demonstrate that the new cache policy improves state-of-art practical policies by up to 19.2% and incurs only 2.3% higher miss ratio than OPT on average.	翻訳日:2022-11-04 06:11:20 公開日:2020-07-31
# 非定常問題に適用可能な最適化を用いた深層ロボット学習に向けて Towards Deep Robot Learning with Optimizer applicable to Non-stationary Problems ( http://arxiv.org/abs/2007.15890v1 ) ライセンス: Link先を確認	Taisuke Kobayashi	(参考訳) 本稿では,d-amsgradと呼ばれる深層学習のための新しい最適化器を提案する。実世界のデータでは、ロボットのスキルを学ぶために使用するデータセットからノイズや外れ値を排除することはできない。この問題は、データをリアルタイムで収集することで学習するロボットにとって特に重要であり、手作業ではソートできない。そのため、この問題を解決するためにいくつかのノイズローバストオプティマイザが開発され、Adam Optimizationr の変種である AmsGrad は、その収束の証明を持っている。しかし、実際にはロボットのシナリオにおける学習性能は向上しない。この理由は、ほとんどのロボット学習問題は静止していないと仮定されているが、AmsGradは学習中に最大2番目の運動量を与えると仮定している。非定常問題に適応するために, 最大2次運動量を緩やかに減少させる改良版を提案する。提案するオプティマイザは,ベースラインと同じ世界的最適点に達する能力を有し,その性能はロボティクス問題におけるベースラインよりも優れていた。 This paper proposes a new optimizer for deep learning, named d-AmsGrad. In the real-world data, noise and outliers cannot be excluded from dataset to be used for learning robot skills. This problem is especially striking for robots that learn by collecting data in real time, which cannot be sorted manually. Several noise-robust optimizers have therefore been developed to resolve this problem, and one of them, named AmsGrad, which is a variant of Adam optimizer, has a proof of its convergence. However, in practice, it does not improve learning performance in robotics scenarios. This reason is hypothesized that most of robot learning problems are non-stationary, but AmsGrad assumes the maximum second momentum during learning to be stationarily given. In order to adapt to the non-stationary problems, an improved version, which slowly decays the maximum second momentum, is proposed. The proposed optimizer has the same capability of reaching the global optimum as baselines, and its performance outperformed that of the baselines in robotics problems.	翻訳日:2022-11-04 06:11:08 公開日:2020-07-31
# 機械学習のためのグラフ信号処理 : レビューと新しい視点 Graph signal processing for machine learning: A review and new perspectives ( http://arxiv.org/abs/2007.16061v1 ) ライセンス: Link先を確認	Xiaowen Dong, Dorina Thanou, Laura Toni, Michael Bronstein, Pascal Frossard	(参考訳) 大規模構造化データの効率的な表現、処理、分析、可視化、特にネットワークやグラフのような複雑なドメインに関連するものなどは、現代の機械学習において重要な問題である。グラフ信号処理(gsp)は、グラフでサポートされているデータを扱うことを目的とした信号処理モデルとアルゴリズムの活気ある分野であり、この課題に対処するために新たな研究の道を開く。本稿では、グラフフィルタや変換といったgspの概念とツールが、新しい機械学習アルゴリズムの開発にもたらしたいくつかの重要な貢献についてレビューする。特に,データ構造とリレーショナル・プライオリティの活用,データと計算効率の向上,モデル解釈可能性の向上という3つの側面に注目した。さらに, 応用数学と信号処理, 機械学習とネットワーク科学の橋渡しとなるであろうgsp技術の今後の発展について, 新たな視点を提示する。これらの異なる分野にまたがる交配は、現代における複雑なデータ分析の多くの課題を解き放つのに役立つかもしれない。 The effective representation, processing, analysis, and visualization of large-scale structured data, especially those related to complex domains such as networks and graphs, are one of the key questions in modern machine learning. Graph signal processing (GSP), a vibrant branch of signal processing models and algorithms that aims at handling data supported on graphs, opens new paths of research to address this challenge. In this article, we review a few important contributions made by GSP concepts and tools, such as graph filters and transforms, to the development of novel machine learning algorithms. In particular, our discussion focuses on the following three aspects: exploiting data structure and relational priors, improving data and computational efficiency, and enhancing model interpretability. Furthermore, we provide new perspectives on future development of GSP techniques that may serve as a bridge between applied mathematics and signal processing on one side, and machine learning and network science on the other. Cross-fertilization across these different disciplines may help unlock the numerous challenges of complex data analysis in the modern age.	翻訳日:2022-11-04 06:10:50 公開日:2020-07-31
# ディープラーニング分類器における逐次ドリフト検出 Sequential Drift Detection in Deep Learning Classifiers ( http://arxiv.org/abs/2007.16109v1 ) ライセンス: Link先を確認	Samuel Ackerman, Parijat Dube, Eitan Farchi	(参考訳) ニューラルネットワーク埋め込みを用いて,適切な逐次決定枠組み内でドリフト検出を定式化し,データドリフトの検出を行う。これにより、統計検査を繰り返し適用しながらも、誤報率の制御が可能となる。変更検出アルゴリズムは,誤報の回避と迅速な検出のトレードオフに自然に直面するため,これら2つの懸念のバランスをとるアルゴリズムの能力を評価する損失関数を導入し,一連の実験で使用する。 We utilize neural network embeddings to detect data drift by formulating the drift detection within an appropriate sequential decision framework. This enables control of the false alarm rate although the statistical tests are repeatedly applied. Since change detection algorithms naturally face a tradeoff between avoiding false alarms and quick correct detection, we introduce a loss function which evaluates an algorithm's ability to balance these two concerns, and we use it in a series of experiments.	翻訳日:2022-11-04 06:10:32 公開日:2020-07-31
# ニューラルネットワークを用いた半水中のヒーブとサージ運動の予測 Predicting heave and surge motions of a semi-submersible with neural networks ( http://arxiv.org/abs/2007.15973v1 ) ライセンス: Link先を確認	Xiaoxian Guo and Xiantao Zhang and Xinliang Tian and Xin Li and Wenyue Lu	(参考訳) 船舶や浮遊プラットフォームでのリアルタイム動き予測は、動き補償システムの性能を向上させるのに役立つ。また、移動に関して重要なオフショア作戦に便利な早期警戒情報を提供することもできる。本研究では,半潜水艇のヒーブ動作とサージ動作を予測するために,LSTMに基づく長期記憶モデルを開発した。訓練とテストデータは、中国の上海江東大学(上海市)の深海盆地で実施された模型実験から得られたものだ。動きと測定波はLSTM細胞に供給され、その後サーブラート完全連結(FC)層を通過して予測を得た。測定された波の助けを借りて、予測は平均90%近い精度で46.5秒まで伸びた。トレーニングされたモデルは、ノイズ拡張データセットを使用して、0.8までのノイズレベルで効果的に動作した。さらなるステップとして、モデルは動き自体に基づいてのみ動きを予測することができる。モデルのアーキテクチャに関するセンシティブな研究に基づいて,機械学習モデル構築のためのガイドラインを提案する。提案するLSTMモデルでは, 船体波励起運動を予測する能力が強い。 Real-time motion prediction of a vessel or a floating platform can help to improve the performance of motion compensation systems. It can also provide useful early-warning information for offshore operations that are critical with regard to motion. In this study, a long short-term memory (LSTM) -based machine learning model was developed to predict heave and surge motions of a semi-submersible. The training and test data came from a model test carried out in the deep-water ocean basin, at Shanghai Jiao Tong University, China. The motion and measured waves were fed into LSTM cells and then went through serval fully connected (FC) layers to obtain the prediction. With the help of measured waves, the prediction extended 46.5 s into future with an average accuracy close to 90%. Using a noise-extended dataset, the trained model effectively worked with a noise level up to 0.8. As a further step, the model could predict motions only based on the motion itself. Based on sensitive studies on the architectures of the model, guidelines for the construction of the machine learning model are proposed. The proposed LSTM model shows a strong ability to predict vessel wave-excited motions.	翻訳日:2022-11-04 06:04:57 公開日:2020-07-31
# 寒冷後部とアリアティック不確かさ Cold Posteriors and Aleatoric Uncertainty ( http://arxiv.org/abs/2008.00029v1 ) ライセンス: Link先を確認	Ben Adlam, Jasper Snoek, and Samuel L. Smith	(参考訳) 近年の研究では、ベイズニューラルネットワークにおいて、検証セット(「コールド後部」効果)で後部の「温度」をチューニングすることで、正確な推論よりも優れていることが観察されている。この現象を解釈するために、ベイズニューラルネットワークでよく使われる先行は、多くの分類データセット上のラベルのアレラトリック不確かさを著しく過大評価することができると論じる。この問題は、ラベルの品質が高いMNISTやCIFARのような学術ベンチマークで特に顕著である。ガウス過程回帰の特別な場合、任意の正の温度は修正前の修正後の有効な後部に対応し、この温度を調整することは経験的ベイズと直接類似する。分類タスクでは、事前の修正と温度のチューニングの間に直接的な等価性はないが、温度の低下はトレーニングセットの既存の例をリラベルすることで、ほとんど情報を得ることができないという私たちの信念をよりよく反映するモデルにつながる可能性がある。したがって、冷えた後部は必ずしも正確な推論手順と一致しないが、それらはしばしば我々の真の以前の信念を反映していると信じている。 Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect). To help interpret this phenomenon, we argue that commonly used priors in Bayesian neural networks can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets. This problem is particularly pronounced in academic benchmarks like MNIST or CIFAR, for which the quality of the labels is high. For the special case of Gaussian process regression, any positive temperature corresponds to a valid posterior under a modified prior, and tuning this temperature is directly analogous to empirical Bayes. On classification tasks, there is no direct equivalence between modifying the prior and tuning the temperature, however reducing the temperature can lead to models which better reflect our belief that one gains little information by relabeling existing examples in the training set. Therefore although cold posteriors do not always correspond to an exact inference procedure, we believe they may often better reflect our true prior beliefs.	翻訳日:2022-11-04 06:03:54 公開日:2020-07-31
# サイクル学習率を用いた深層強化学習 Deep Reinforcement Learning using Cyclical Learning Rates ( http://arxiv.org/abs/2008.01171v1 ) ライセンス: Link先を確認	Ralf Gulde, Marc Tuscher, Akos Csiszar, Oliver Riedel and Alexander Verl	(参考訳) 深層強化学習(Dep Reinforcement Learning, DRL)法は、しばしば問題を解決するためにハイパーパラメータの微妙なチューニングに依存する。確率勾配降下(SGD)に基づく最適化手順における最も影響力のあるパラメータの1つは、学習率である。循環学習について検討し,様々なDRL問題に対する一般循環学習率の定義法を提案する。本稿では,複素DRL問題に適用した循環学習法を提案する。実験の結果,循環学習は,高度に調整された固定学習率と同等あるいはそれ以上の結果が得られることがわかった。本稿では、DRL設定における循環学習率の最初の適用例を示し、手動ハイパーパラメータチューニングの克服に向けた第一歩となる。 Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD) is the learning rate. We investigate cyclical learning and propose a method for defining a general cyclical learning rate for various DRL problems. In this paper we present a method for cyclical learning applied to complex DRL problems. Our experiments show that, utilizing cyclical learning achieves similar or even better results than highly tuned fixed learning rates. This paper presents the first application of cyclical learning rates in DRL settings and is a step towards overcoming manual hyperparameter tuning.	翻訳日:2022-11-04 06:03:12 公開日:2020-07-31
# 加速度センサを用いた歩行認識のための特徴学習 Feature Learning for Accelerometer based Gait Recognition ( http://arxiv.org/abs/2007.15958v1 ) ライセンス: Link先を確認	Szil\'ard Nemes, Margit Antal	(参考訳) 音声や物体認識などのパターンマッチングの最近の進歩は、歩行認識のための深層学習ソリューションによる特徴学習の実現を支援する。過去論文では、このタスクのために教師付き方法で訓練されたディープニューラルネットワークを評価した。本研究では,教師なしアプローチと教師なしアプローチの両方について検討した。同様のアーキテクチャをエンドツーエンドモデルとオートエンコーダに組み込んだ特徴抽出器を,歩行検証システムにおいて優れた表現を学習する能力に基づいて比較した。両方の特徴抽出器はIDNetデータセットでトレーニングされ、その後ZJU-GaitAccelデータセットで特徴抽出に使用された。その結果、オートエンコーダは、特徴学習能力に関して差別的なエンドツーエンドモデルに非常に近いこと、そして完全な畳み込みモデルは、訓練戦略に関係なく優れた特徴表現を学習できることを示した。 Recent advances in pattern matching, such as speech or object recognition support the viability of feature learning with deep learning solutions for gait recognition. Past papers have evaluated deep neural networks trained in a supervised manner for this task. In this work, we investigated both supervised and unsupervised approaches. Feature extractors using similar architectures incorporated into end-to-end models and autoencoders were compared based on their ability of learning good representations for a gait verification system. Both feature extractors were trained on the IDNet dataset then used for feature extraction on the ZJU-GaitAccel dataset. Results show that autoencoders are very close to discriminative end-to-end models with regards to their feature learning ability and that fully convolutional models are able to learn good feature representations, regardless of the training strategy.	翻訳日:2022-11-04 06:03:00 公開日:2020-07-31
# Transformer-XLを用いたソースコードの言語モデリング Language Modelling for Source Code with Transformer-XL ( http://arxiv.org/abs/2007.15813v1 ) ライセンス: Link先を確認	Thomas Dowdell, Hongyu Zhang	(参考訳) 自然言語のテキストと同様に、ソフトウェアは「自然性」を示しており、統計言語モデルによって捉えることができる。近年,ディープラーニングによるソフトウェアの自然性を表現するために,ニューラルネットワークモデルが提案されている。本稿では,rnnモデルやtransformer-xlモデルを含む,ソースコードのための最先端ニューラルネットワークモデルの実験的評価を行う。大規模なPythonコードコーパスの実験により,Transformer-XL モデルは RNN ベースのモデル(LSTM や GRU モデルを含む)よりも計算コストがはるかに少なく,ソフトウェアの自然性を捉えることができることがわかった。 It has been found that software, like natural language texts, exhibits "naturalness", which can be captured by statistical language models. In recent years, neural language models have been proposed to represent the naturalness of software through deep learning. In this paper, we conduct an experimental evaluation of state-of-the-art neural language models for source code, including RNN-based models and Transformer-XL based models. Through experiments on a large-scale Python code corpus, we find that the Transformer-XL model outperforms RNN-based models (including LSTM and GRU models) in capturing the naturalness of software, with far less computational cost.	翻訳日:2022-11-04 06:02:12 公開日:2020-07-31
# 大規模金融コーパスによるNERの性能向上 Improving NER's Performance with Massive financial corpus ( http://arxiv.org/abs/2007.15871v1 ) ライセンス: Link先を確認	Han Zhang	(参考訳) 大きなディープニューラルネットワークのトレーニングには大量の高品質なアノテーションデータが必要ですが、時間と労力のコストは中小企業には高すぎるのです。企業名の認識タスクを,小規模かつ低品質なトレーニングデータを用いて開始し,モデルトレーニング速度の向上と最低労働コストによるパフォーマンスの予測を行う。本手法は,Albert-small や Electra-small といったエリート言語モデルの事前学習,蒸留の知識,多段階学習を含む。その結果,リコール率を20ポイント近く引き上げ,BERT-CRFモデルの4倍速くなることがわかった。 Training large deep neural networks needs massive high quality annotation data, but the time and labor costs are too expensive for small business. We start a company-name recognition task with a small scale and low quality training data, then using skills to enhanced model training speed and predicting performance with minimum labor cost. The methods we use involve pre-training a lite language model such as Albert-small or Electra-small in financial corpus, knowledge of distillation and multi-stage learning. The result is that we raised the recall rate by nearly 20 points and get 4 times as fast as BERT-CRF model.	翻訳日:2022-11-04 05:55:31 公開日:2020-07-31
# 臨床エンティティ抽出の機械学習のためのロバストベンチマーク Robust Benchmarking for Machine Learning of Clinical Entity Extraction ( http://arxiv.org/abs/2007.16127v1 ) ライセンス: Link先を確認	Monica Agrawal, Chloe O'Connell, Yasmin Fatemi, Ariel Levy, David Sontag	(参考訳) 臨床研究は、しばしば自由テキスト臨床ノートにのみ存在する患者の物語の要素を理解する必要がある。音符を下流で使用するための構造化データに変換するために、これらの要素は一般的に抽出され、医学用語に正規化される。本研究では,最先端システムの性能を監査し,改善領域を示す。 2019 n2c2共有タスクにおける臨床エンティティ正規化システムに対する高いタスク精度は、誤解を招くものであり、基盤となる性能は依然として不安定である。一般的な概念(95.3%)では正規化精度が高いが、トレーニングデータでは認識できない概念(69.3%)ではずっと低い。医療用語の不整合,既存のラベル付けスキーマの制限,狭い評価手法によって,現在のアプローチが妨げられていることを示す。これらの問題に対処するために、臨床エンティティ抽出のためのアノテーションフレームワークを再構築し、堅牢なエンドツーエンドシステムベンチマークを可能にする。 2つのアノテータ間の新たなフレームワークからのアノテーションの一致を評価し、エンティティ認識のための Jaccard 類似度 0.73 とエンティティ正規化のための 0.83 を達成した。本稿では,エンティティ認識と正規化におけるメソッド開発を促進させる基準標準の作成の必要性を実証する手法を提案する。 Clinical studies often require understanding elements of a patient's narrative that exist only in free text clinical notes. To transform notes into structured data for downstream use, these elements are commonly extracted and normalized to medical vocabularies. In this work, we audit the performance of and indicate areas of improvement for state-of-the-art systems. We find that high task accuracies for clinical entity normalization systems on the 2019 n2c2 Shared Task are misleading, and underlying performance is still brittle. Normalization accuracy is high for common concepts (95.3%), but much lower for concepts unseen in training data (69.3%). We demonstrate that current approaches are hindered in part by inconsistencies in medical vocabularies, limitations of existing labeling schemas, and narrow evaluation techniques. We reformulate the annotation framework for clinical entity extraction to factor in these issues to allow for robust end-to-end system benchmarking. We evaluate concordance of annotations from our new framework between two annotators and achieve a Jaccard similarity of 0.73 for entity recognition and an agreement of 0.83 for entity normalization. We propose a path forward to address the demonstrated need for the creation of a reference standard to spur method development in entity recognition and normalization.	翻訳日:2022-11-04 05:55:18 公開日:2020-07-31
# パーキンソン病の学習に基づくコンピュータ支援処方モデル--データ駆動の視点から Learning-based Computer-aided Prescription Model for Parkinson's Disease: A Data-driven Perspective ( http://arxiv.org/abs/2007.16103v1 ) ライセンス: Link先を確認	Yinghuan Shi and Wanqi Yang and Kim-Han Thung and Hao Wang and Yang Gao and Yang Pan and Li Zhang and Dinggang Shen	(参考訳) 本稿では「PD患者に対する自動処方薬処方」という新たな課題について考察する。この目標を達成するために、まずデータセットを収集して 1) PD患者の症状, および 2)神経科医が提供した処方薬。次に, 観察した症状と処方薬の関係を学習し, 新たな処方薬モデルを構築した。最後に,新来の患者に対しては,処方薬モデルにより,観察された症状に対して適切な処方薬を推奨できる(予測)。方法論的な部分から,提案したモデルであるPrescription viA Learning lAtent Symptoms (PALAS)は,データの多モード表現を用いた処方を推奨できる。 PALASでは、症状と処方薬の関係をより良くモデル化するために、潜伏症状空間が学習される。さらに,PALASの効率的な交互最適化手法を提案する。本手法は,南京脳病院における136人のpd患者から収集したデータを用いて評価した。本研究は,他の競合手法と比較して,提案手法の有効性と臨床効果を示すものである。 In this paper, we study a novel problem: "automatic prescription recommendation for PD patients." To realize this goal, we first build a dataset by collecting 1) symptoms of PD patients, and 2) their prescription drug provided by neurologists. Then, we build a novel computer-aided prescription model by learning the relation between observed symptoms and prescription drug. Finally, for the new coming patients, we could recommend (predict) suitable prescription drug on their observed symptoms by our prescription model. From the methodology part, our proposed model, namely Prescription viA Learning lAtent Symptoms (PALAS), could recommend prescription using the multi-modality representation of the data. In PALAS, a latent symptom space is learned to better model the relationship between symptoms and prescription drug, as there is a large semantic gap between them. Moreover, we present an efficient alternating optimization method for PALAS. We evaluated our method using the data collected from 136 PD patients at Nanjing Brain Hospital, which can be regarded as a large dataset in PD research community. The experimental results demonstrate the effectiveness and clinical potential of our method in this recommendation task, if compared with other competing methods.	翻訳日:2022-11-04 05:45:47 公開日:2020-07-31
# グラフニューラルネットワークにおけるニューラルアーキテクチャ探索 Neural Architecture Search in Graph Neural Networks ( http://arxiv.org/abs/2008.00077v1 ) ライセンス: Link先を確認	Matheus Nunes and Gisele L. Pappa	(参考訳) グラフデータに対する解析的タスクの実行は、リレーショナル情報の多様さと高可用性のため、ますます興味深いものになりつつある。しかし、画像や文とは異なり、ネットワークにはシーケンスの概念はない。ノード(とエッジ)は絶対的な順序に従わず、従来の機械学習(ML)アルゴリズムがパターンを認識して、この種のデータに基づいて予測を一般化することは難しい。グラフニューラルネットワーク(GNN)はこの問題にうまく対処した。これらは畳み込みの概念をグラフドメインに一般化した後に人気になった。しかし、それらは多くのハイパーパラメータを持ち、その設計と最適化は現在、ヒューリスティックスや経験的直観に基づく手作りである。 neural architecture search (nas)メソッドは、この問題に対する興味深い解決策である。本稿では,強化学習に基づく2つのnas法と,進化的アルゴリズムに基づく2つのnas法を比較した。その結果、2つの探索空間上の7つのデータセットを考察し、どちらの方法もランダムな探索と類似の精度を持つことを示し、探索空間の次元が実際に問題と関係しているかどうかという疑問を提起した。 Performing analytical tasks over graph data has become increasingly interesting due to the ubiquity and large availability of relational information. However, unlike images or sentences, there is no notion of sequence in networks. Nodes (and edges) follow no absolute order, and it is hard for traditional machine learning (ML) algorithms to recognize a pattern and generalize their predictions on this type of data. Graph Neural Networks (GNN) successfully tackled this problem. They became popular after the generalization of the convolution concept to the graph domain. However, they possess a large number of hyperparameters and their design and optimization is currently hand-made, based on heuristics or empirical intuition. Neural Architecture Search (NAS) methods appear as an interesting solution to this problem. In this direction, this paper compares two NAS methods for optimizing GNN: one based on reinforcement learning and a second based on evolutionary algorithms. Results consider 7 datasets over two search spaces and show that both methods obtain similar accuracies to a random search, raising the question of how many of the search space dimensions are actually relevant to the problem.	翻訳日:2022-11-04 05:45:10 公開日:2020-07-31
# 遺伝学的改善@ICSE 2020 Genetic Improvement @ ICSE 2020 ( http://arxiv.org/abs/2007.15987v1 ) ライセンス: Link先を確認	William B. Langdon, Westley Weimer, Justyna Petke, Erik Fredericks, Seongmin Lee, Emily Winter, Michail Basios, Myra B. Cohen, Aymeric Blot, Markus Wagner, Bobby R. Bruce, Shin Yoo, Simos Gerasimou, Oliver Krauss, Yu Huang and Michael Gerten	(参考訳) facebookの基調講演と正式なプレゼンテーション(議事録に記録されている)に続いて、第8回国際遺伝子改善ワークショップであるgi-2020 @ icse(2020年7月3日金曜の第42回acm/ieee国際ソフトウェアエンジニアリング会議の一部として開催)で幅広い議論が行われた。トピックとしては、産業の取り込み、ヒューマンファクタ、説明責任(説明可能性、正当化可能性、エクスプロビリティ)、GIベンチマークがある。我々はまた、近年の様々なオンラインアプローチ(例えばSBST 2020)を、対面2面インタラクションなしでインターネット上でWWWを介して仮想コンピュータサイエンス会議やワークショップを開催することと対比した。最後に、コロナウイルスのCovid-19パンデミックが来年と将来の研究にどのように影響するかを推測する。 Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take up, human factors, explainabiloity (explainability, justifyability, exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops via the WWW on the Internet without face-2-face interaction. Finally we speculate on how the Coronavirus Covid-19 Pandemic will affect research next year and into the future.	翻訳日:2022-11-04 05:44:52 公開日:2020-07-31
# 画像の自動生成音素キャプションの評価 Evaluating Automatically Generated Phoneme Captions for Images ( http://arxiv.org/abs/2007.15916v1 ) ライセンス: Link先を確認	Justin van der Hout, Zolt\'an D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg	(参考訳) Image2Speechは画像の音声記述を生成する比較的新しいタスクである。本稿では,この課題の評価について検討する。そこでまず,音素配列からなる画像キャプションを生成するImage2Speechシステムを開発した。このシステムはFlickr8kコーパスでオリジナルのImage2Speechシステムより優れていた。その後、これらの音素キャプションを文に変換する。キャプションは人間の評価者によって画像の記述が優れているとして評価された。最後に, 結果の客観的な測定値は, これらの評価値と相関した。 BLEU4は人間のレーティングと完全に相関しないが、調査された指標の中では最も高い相関関係を示しており、Image2Speechタスクの現在ある最高のメトリクスである。現在の指標は、入力が単語であると仮定するという事実によって制限されている。 Image2Speechタスクのより適切なメトリックは、入力を単語の一部、すなわち音素の一部と仮定するべきである。 Image2Speech is the relatively new task of generating a spoken description of an image. This paper presents an investigation into the evaluation of this task. For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences. This system outperformed the original Image2Speech system on the Flickr8k corpus. Subsequently, these phoneme captions were converted into sentences of words. The captions were rated by human evaluators for their goodness of describing the image. Finally, several objective metric scores of the results were correlated with these human ratings. Although BLEU4 does not perfectly correlate with human ratings, it obtained the highest correlation among the investigated metrics, and is the best currently existing metric for the Image2Speech task. Current metrics are limited by the fact that they assume their input to be words. A more appropriate metric for the Image2Speech task should assume its input to be parts of words, i.e. phonemes, instead.	翻訳日:2022-11-04 05:44:37 公開日:2020-07-31
# 部分メチル化アルジトール酢酸のGC-EIMSスペクトル同定のためのパーゼン密度推定を用いた強化学習と多次元ベイズ分類を用いたニューラルネットワークの比較研究 A Comparative study of Artificial Neural Networks Using Reinforcement learning and Multidimensional Bayesian Classification Using Parzen Density Estimation for Identification of GC-EIMS Spectra of Partially Methylated Alditol Acetates ( http://arxiv.org/abs/2008.02072v1 ) ライセンス: Link先を確認	Faramarz Valafar, Homayoun Valafar	(参考訳) 本研究では, 部分メチル化Alditol Acetates (PMAAs) のガスクロマトグラフィー-電子衝突質量スペクトル (GC-EIMS) データベース用パターン認識検索エンジンの開発について報告する。また,本研究に用いられた2つのパターン認識技術の比較結果を報告する。最初の手法はベイズ分類器とパーゼン密度推定器を用いた統計手法である。第2のテクニックは、強化学習でトレーニングされたニューラルネットワークモジュールだ。ここでは、両システムが少量の雑音でスペクトルを特定するのに優れていることを示す。両方のシステムの性能は劣化信号-雑音比(SNR)で劣化する。部分スペクトル(データを見逃す)を扱う場合、人工ニューラルネットワークシステムの性能が向上する。開発システムはワールドワイドウェブ上に実装されており、GC-EIMS機器に記録されたこれらの分子のスペクトルを用いてPMAAを識別することを目的としている。したがって、このシステムはGC-EIMSスペクトルの計器やカラム依存の変動に敏感である。 This study reports the development of a pattern recognition search engine for a World Wide Web-based database of gas chromatography-electron impact mass spectra (GC-EIMS) of partially methylated Alditol Acetates (PMAAs). Here, we also report comparative results for two pattern recognition techniques that were employed for this study. The first technique is a statistical technique using Bayesian classifiers and Parzen density estimators. The second technique involves an artificial neural network module trained with reinforcement learning. We demonstrate here that both systems perform well in identifying spectra with small amounts of noise. Both system's performance degrades with degrading signal-to-noise ratio (SNR). When dealing with partial spectra (missing data), the artificial neural network system performs better. The developed system is implemented on the world wide web, and is intended to identify PMAAs using submitted spectra of these molecules recorded on any GC-EIMS instrument. The system, therefore, is insensitive to instrument and column dependent variations in GC-EIMS spectra.	翻訳日:2022-11-04 05:44:02 公開日:2020-07-31
# HMCNAS:隠れマルコフ連鎖とベイズ最適化を用いたニューラルネットワーク探索 HMCNAS: Neural Architecture Search using Hidden Markov Chains and Bayesian Optimization ( http://arxiv.org/abs/2007.16149v1 ) ライセンス: Link先を確認	Vasco Lopes and Lu\'is A. Alexandre	(参考訳) Neural Architecture Searchは、さまざまなタスクにおいて最先端のパフォーマンスを達成した。しかし、最終的なモデルアーキテクチャ、サンプルするレイヤの数、強制操作、小さな検索空間など、解決される問題や生成されたモデルに関連する人間定義を必要とする多くの仮定は、最終的にシステムへのバイアスを引き起こすコストで、より高いパフォーマンスを持つモデルを持つことに貢献する。本稿では,2つの新しい構成要素からなるHMCNASを提案する。一人間が設計したモデルに関する情報を利用して、複雑な探索空間を自律的に生成する方法二人間の定義したパラメータや小さな探索空間に頼ることなく、ゼロから競合するCNNを生成することができるベイズ最適化付き進化的アルゴリズム。実験の結果,提案手法は競争的アーキテクチャを極めて短時間で得ることができることがわかった。 HMCNASは、特定のタスクに関する人間的な知識を必要とせずに、競争モデルを作成する方法を提供することによって、NASを一般化するステップを提供する。 Neural Architecture Search has achieved state-of-the-art performance in a variety of tasks, out-performing human-designed networks. However, many assumptions, that require human definition, related with the problems being solved or the models generated are still needed: final model architectures, number of layers to be sampled, forced operations, small search spaces, which ultimately contributes to having models with higher performances at the cost of inducing bias into the system. In this paper, we propose HMCNAS, which is composed of two novel components: i) a method that leverages information about human-designed models to autonomously generate a complex search space, and ii) an Evolutionary Algorithm with Bayesian Optimization that is capable of generating competitive CNNs from scratch, without relying on human-defined parameters or small search spaces. The experimental results show that the proposed approach results in competitive architectures obtained in a very short time. HMCNAS provides a step towards generalizing NAS, by providing a way to create competitive models, without requiring any human knowledge about the specific task.	翻訳日:2022-11-04 05:38:13 公開日:2020-07-31
# ニューラル言語生成:定式化、方法、および評価 Neural Language Generation: Formulation, Methods, and Evaluation ( http://arxiv.org/abs/2007.15780v1 ) ライセンス: Link先を確認	Cristina Garbacea, Qiaozhu Mei	(参考訳) ニューラルネットワークに基づく生成モデリングの最近の進歩は、コンピュータシステムが人間とシームレスに会話でき、自然言語を理解できることを期待している。ニューラルネットワークは、さまざまなユーザニーズを満たすさまざまなコンテキストやタスクにおいて、さまざまな成功度に対するテキストの抜粋を生成するために使用されている。特に、大規模データセットでトレーニングされた高容量のディープラーニングモデルは、明示的な監視信号がなくても、データのパターンを学習する非並列的な能力を示し、現実的で一貫性のあるテキストを生成するための、多くの新しい可能性を開く。自然言語生成の分野は急速に進化しているが、解決すべきオープンな課題がまだたくさんある。本調査では,自然言語生成の問題を明確に定義し,分類する。我々は, 自然言語の生成が実用上重要であるような, 一般的な定式化のインスタンス化である特定のアプリケーションタスクについて検討する。次に、多様なテキストを生成するのに使用される方法とニューラルネットワークアーキテクチャの概要を紹介する。それにもかかわらず、これらの生成モデルによって生成されたテキストの品質を評価する標準的な方法は存在しない。この目的のために、自然言語生成システムの評価に関する現在のアプローチをレビューする。我々は、この調査が神経自然言語生成の定式化、方法、および評価の有益な概要を提供することを期待している。 Recent advances in neural network-based generative modeling have reignited the hopes in having computer systems capable of seamlessly conversing with humans and able to understand natural language. Neural architectures have been employed to generate text excerpts to various degrees of success, in a multitude of contexts and tasks that fulfil various user needs. Notably, high capacity deep learning models trained on large scale datasets demonstrate unparalleled abilities to learn patterns in the data even in the lack of explicit supervision signals, opening up a plethora of new possibilities regarding producing realistic and coherent texts. While the field of natural language generation is evolving rapidly, there are still many open challenges to address. In this survey we formally define and categorize the problem of natural language generation. We review particular application tasks that are instantiations of these general formulations, in which generating natural language is of practical importance. Next we include a comprehensive outline of methods and neural architectures employed for generating diverse texts. Nevertheless, there is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck towards the progress of the field. To this end, we also review current approaches to evaluating natural language generation systems. We hope this survey will provide an informative overview of formulations, methods, and assessments of neural natural language generation.	翻訳日:2022-11-04 05:37:10 公開日:2020-07-31
# 医療報告書からの信頼できる情報抽出のための浅層cnnモデルのモデル削減 Model Reduction of Shallow CNN Model for Reliable Deployment of Information Extraction from Medical Reports ( http://arxiv.org/abs/2008.01572v1 ) ライセンス: Link先を確認	Abhishek K Dubey and Alina Peluso and Jacob Hinkle and Devanshu Agarawal and Zilong Tan	(参考訳) 浅層畳み込みニューラルネットワーク(cnn)は、がん病理報告から情報を抽出するための時間テストツールである。 Shallow CNNはこのタスクを、多くのNLPタスクの最先端を保持するBERTなど、他のディープラーニングモデルと競合的に実行する。この偏心現象の背景にある主な洞察は、がん病理報告からの情報抽出はタスクを実行するためにドメイン固有のテキストセグメントをほんの数個しか必要とせず、そのため、ほとんどのテキストとコンテキストがそのタスクに過大に働くことである。シャローCNNモデルは、ラベル付きトレーニングセットからこれらのキーショートテキストセグメントを識別するのに適しているが、識別されたテキストセグメントは人間には不明瞭である。本研究では,CNNフィルタと関連するテキストセグメントとの信頼性の高い接続を実現するためのモデル縮小ツールの開発により,このギャップを埋める。我々は,n-gram存在表現の線形変換を変換重みに先立って非負性および疎性で近似することにより,浅部CNN表現の複雑さを低減し,解釈可能なモデルを得る。提案手法は,従来認識されていたトレードオフ境界と,モデルの縮小による説明可能性とのギャップを橋渡しするものである。 Shallow Convolution Neural Network (CNN) is a time-tested tool for the information extraction from cancer pathology reports. Shallow CNN performs competitively on this task to other deep learning models including BERT, which holds the state-of-the-art for many NLP tasks. The main insight behind this eccentric phenomenon is that the information extraction from cancer pathology reports require only a small number of domain-specific text segments to perform the task, thus making the most of the texts and contexts excessive for the task. Shallow CNN model is well-suited to identify these key short text segments from the labeled training set; however, the identified text segments remain obscure to humans. In this study, we fill this gap by developing a model reduction tool to make a reliable connection between CNN filters and relevant text segments by discarding the spurious connections. We reduce the complexity of shallow CNN representation by approximating it with a linear transformation of n-gram presence representation with a non-negativity and sparsity prior on the transformation weights to obtain an interpretable model. Our approach bridge the gap between the conventionally perceived trade-off boundary between accuracy on the one side and explainability on the other by model reduction.	翻訳日:2022-11-04 05:36:04 公開日:2020-07-31

Title

Authors

Abstract

論文公表日・翻訳日

# 忠実度変動を用いたノイズテレポーテーションの性能評価

Rating the performance of noisy teleportation using fluctuations in fidelity ( http://arxiv.org/abs/2001.11463v2 )

ライセンス: Link先を確認

Saptarshi Roy and Arkaprabha Ghosal

(参考訳) 量子テレポーテーションは、量子世界の最も先駆的な特徴の1つである。通常、テレポーテーションプロトコルの品質はその平均忠実度によってのみ判断される。本研究では, テレポーテーションの性能を, 忠実度と不確かさの両方の観点から解析する。具体的には,テレポータビリティスコア(telportability score)という,忠実度と偏差の両面から貢献する量を定義した。また、量子状態のテレポーテーションが1つまたは複数の中間ステップで必要とされるプロトコルに必要な感度を考慮に入れている。我々は,無騒音シナリオにおけるテレポータビリティスコアを計算し,資源状態の絡み合い内容と単調に増加することを示す。 n鎖リピータのような構成であっても、結果は変わらない。しかし、ノイズの存在下では、テレポータビリティスコアは、当初共有されていたリソース状態の絡み合い内容に関して、時々非単調な振る舞いを示すことができる。具体的には、局所的なビットフリップおよびビット位相フリップノイズの下では、より少ないエンタングル状態は、システムパラメータの特定の選択に対して高いテレポータビリティスコアを持つことができる。グローバルな非偏極ノイズの存在下では、低絡み合いの資源状態と高感度要求に対して、ノイズのないシナリオと比較して、ノイズの多い状態の方がテレポータビリティスコアがよい。

Quantum teleportation is one of the most pioneering features of the quantum world. Typically, the quality of a teleportation protocol is solely judged by its average fidelity. In this work, we analyze the performance of teleportation in terms of both fidelity and the deviation in fidelity. Specifically, we define a quantity called teleportability score, which incorporates contributions from both the fidelity and its deviation. It also takes into account the sensitivity one requires for a protocol in which the teleportation of a quantum state is required in one or many intermediate steps. We compute the teleportability score in the noiseless scenario and find that it increases monotonically with the entanglement content of the resource state. The result remains same even if we consider an n-chain repeater-like configuration. However, in the presence of noise, the teleportability score, can sometime display a nonmonotonic behaviour with respect to the entanglement content of the initially shared resource state. Specifically, under local bit-flip and bit-phase-flip noise, lesser entangled states can have higher teleportability score for certain choice of system parameters. In the presence of global depolarizing noise, for low entangled resource states and high sensitivity requirements, the noisy states can have better a teleportability score in comparison to the noiseless scenario.

翻訳日:2023-06-05 04:44:43 公開日:2020-07-31

# 異方性ホッピングをもつ立方格子上のボース・ハバード模型の量子相転移

Quantum phase transition of the Bose-Hubbard model on cubic lattice with anisotropic hopping ( http://arxiv.org/abs/2002.10602v2 )

ライセンス: Link先を確認

Tao Wang and Xue-Feng Zhang

(参考訳) 量子多体系では、次元性は量子相転移のタイプにおいて重要な役割を果たす。次元クロスオーバー中の量子系の研究のために,高次シンボリック強結合拡大法を用いて,異方性ホッピングを伴う立方体格子上のボース・ハバード模型について検討した。 mott-絶縁体と超流動相の境界を8階まで拡張した解析級数を計算する。臨界指数は pad\'{e} re-summation 法によって抽出される。一方、余剰充填における臨界点も得られ、それらは再正規化群理論の予測とよく一致する。ギャップエネルギーのスケーリングと全位相図は最終的に与えられ、将来の研究における実験および数値シミュレーションのベンチマークとして捉えることができる。

In quantum many-body system, dimensionality plays a critical role on type of the quantum phase transition. In order to study the quantum system during dimensional crossover, we studied the Bose-Hubbard model on cubic lattice with anisotropic hopping by using the high order symbolic strong coupling expansion method. The analytic series expanded boundaries between the Mott-insulator and superfluid phase up to eighth order are calculated. The critical exponents are extracted by Pad\'{e} re-summation method, which clearly shows the dimensional crossover behavior. Meanwhile, the critical points at commensurate filling can also be obtained, and they match well with the prediction of renormalization group theory. The scaling of the gap energy and whole phase diagram are given at last, and they can be taken as the benchmark for experiment and numerical simulations in the future study.

翻訳日:2023-06-02 00:11:46 公開日:2020-07-31

# 多モード光キャビティにおける光子を介する1次元ガスのピエルス遷移

Photon-mediated Peierls Transition of a 1D Gas in a Multimode Optical Cavity ( http://arxiv.org/abs/2002.12285v3 )

ライセンス: Link先を確認

Colin Rylands, Yudan Guo, Benjamin L. Lev, Jonathan Keeling, and Victor Galitski

(参考訳) 電荷密度波に対するピエルス不安定性はフォノン駆動の強相関物理学の正準例であり、トポロジカル量子物質とエキゾティック超伝導と密接に関連している。多モード共焦点空洞内に閉じ込められた相互作用するボース原子やフェルミ原子の1次元チューブシステムを用いて、類似した光子を媒介するパイエルス遷移を実現する方法を提案する。キャビティをポンピングすることで、原子系におけるキャビティを媒介する金属--絶縁体遷移が実現される。 tonks-girardeau極限における強く相互作用するボソンの場合、この遷移は(フェルミオン化を通じて)パイエルス不安定であると理解することができる。この計算を相互作用強度の有限値に拡張し、キャビティ場と質量ギャップの双方について解析式を導出する。それらは次元のない物質-光のカップリングに非自明な力法則を依存させる。

The Peierls instability toward a charge density wave is a canonical example of phonon-driven strongly correlated physics and is intimately related to topological quantum matter and exotic superconductivity. We propose a method to realize an analogous photon-mediated Peierls transition, using a system of one-dimensional tubes of interacting Bose or Fermi atoms trapped inside a multimode confocal cavity. Pumping the cavity transversely engineers a cavity-mediated metal--to--insulator transition in the atomic system. For strongly interacting bosons in the Tonks-Girardeau limit, this transition can be understood (through fermionization) as being the Peierls instability. We extend the calculation to finite values of the interaction strength and derive analytic expressions for both the cavity field and mass gap. They display nontrivial power law dependence on the dimensionless matter-light coupling.

翻訳日:2023-06-01 12:27:29 公開日:2020-07-31

# 非平衡モンテカルロシミュレーションからの強結合

Strong coupling from non-equilibrium Monte Carlo simulations ( http://arxiv.org/abs/2003.13734v2 )

ライセンス: Link先を確認

Olmo Francesconi, Marco Panero, David Preti

(参考訳) 格子上の非平衡モンテカルロシミュレーションを用いて、schr\"odinger-functional schemeにおける非可換ゲージ理論の動作結合を計算する。

We compute the running coupling of non-Abelian gauge theories in the Schr\"odinger-functional scheme, by means of non-equilibrium Monte Carlo simulations on the lattice.

翻訳日:2023-05-27 12:03:48 公開日:2020-07-31

# 量子設計に割り当てられたPOVMの不確実性関係のR\'{e}nyi定式化

R\'{e}nyi formulation of uncertainty relations for POVMs assigned to a quantum design ( http://arxiv.org/abs/2004.05576v3 )

ライセンス: Link先を確認

Alexey E. Rastegin

(参考訳) 情報エントロピーは、不確実性原理によって課される制約を表現する強力で柔軟な方法を提供する。このアプローチは量子情報理論の問題への応用に非常に適していると思われる。このような質問は、ある特定の構造を持つ測定を含むのが典型的である。後者はしばしば、十分一般的なスコープの不確実性関係に従うエントロピー境界を改善することができる。量子設計は量子情報理論の多くの問題で使われており、関連する測定に対する不確かさの関係が興味深い。本稿では、量子設計に割り当てられた povm に対する min-エントロピーと r\'{e}nyi エントロピーの観点からの不確実性関係を求める。 Landau--Pollak型の関連も扱う。 2次元の量子設計の例を用いて、得られた下界を以前のものと比較する。エントロピーステアリングの不等式への影響を簡潔に論じる。

Information entropies provide powerful and flexible way to express restrictions imposed by the uncertainty principle. This approach seems to be very suitable in application to problems of quantum information theory. It is typical that questions of such a kind involve measurements having one or another specific structure. The latter often allows us to improve entropic bounds that follow from uncertainty relations of sufficiently general scope. Quantum designs have found use in many issues of quantum information theory, whence uncertainty relations for related measurements are of interest. In this paper, we obtain uncertainty relations in terms of min-entropies and R\'{e}nyi entropies for POVMs assigned to a quantum design. Relations of the Landau--Pollak type are addressed as well. Using examples of quantum designs in two dimensions, the obtained lower bounds are then compared with the previous ones. An impact on entropic steering inequalities is briefly discussed.

翻訳日:2023-05-25 02:26:47 公開日:2020-07-31

# ランドーレベル問題としての位相空間量子力学

Phase Space Quantum Mechanics as a Landau Level Problem ( http://arxiv.org/abs/2004.11455v2 )

ライセンス: Link先を確認

Kun Yang

(参考訳) 位相空間における量子力学の定式化問題と、量子力学粒子の特定のランダウレベルへの運動を投影する問題との関連性を指摘する。特に、量子ホール効果の研究で広く用いられている最低ランダウレベル波動関数は、この文脈で実際に位相空間波動関数であることが示されている。簡単な問題を解析して,この理解の有用性を実証し,他のユーティリティを提案する。

We point out the connection between the problem of formulating quantum mechanics in phase space and projecting the motion of a quantum mechanical particle onto a particular Landau level. In particular, we show that lowest Landau level wave functions, which are widely used in studies of quantum Hall effect, are actually phase space wave functions in this context. We demonstrate the usefulness of this understanding by analyzing some simple problems, and propose other utilities.

翻訳日:2023-05-22 08:11:02 公開日:2020-07-31

# 一軸シフト中心を持つ解析的射影対称正則ガウス関数

Analytically projected rotationally symmetric explicitly correlated Gaussian Functions with one-axis-shifted centers ( http://arxiv.org/abs/2005.00092v2 )

ライセンス: Link先を確認

Andrea Muolo and Markus Reiher

(参考訳) 任意の角運動量とパリティを持つN-粒子系の波動関数を拡張するための新しい有意相関関数形式を示す。我々は,前回の研究 [j. chem. phys. 149, 184105 (2018)] で数値的に活用した投影に基づく手法を開発し,一軸偏移中心と明確に相関し,積分射影作用素を解析的に解いてハミルトニアンおよび角運動量作用素の行列要素を導出する。ボルン-オッペンハイマー近似を仮定しない変分小体計算は、3粒子系と4粒子系のいくつかの回転励起状態に対して提示される。新たな形式主義を,小原子や分子の性質の高精度計算のための統一的な枠組みとして利用できることを示す。

A new explicitly correlated functional form for expanding the wave function of an N-particle system with arbitrary angular momentum and parity is presented. We develop the projection-based approach, numerically exploited in our previous work [J. Chem. Phys. 149, 184105 (2018)], to explicitly correlated Gausssians with one-axis shifted centers and derive the matrix elements for the Hamiltonian and the angular momentum operators by analytically solving the integral projection operator. Variational few-body calculations without assuming the Born-Oppenheimer approximation are presented for several rotationally excited states of three- and four-particle systems. We show how the new formalism can be used as a unified framework for high-accuracy calculations of properties of small atoms and molecules.

翻訳日:2023-05-21 16:54:43 公開日:2020-07-31

# 量子ネットワーク幾何学のスペクトル次元の探索

Probing the spectral dimension of quantum network geometries ( http://arxiv.org/abs/2005.09665v2 )

ライセンス: Link先を確認

Johannes Nokkala, Jyrki Piilo, Ginestra Bianconi

(参考訳) ノードが結合された量子発振器である「フラバー付き量子ネットワーク幾何」 (QNGF) によって記述されたオープン量子システムの環境を考える。 QNGFの幾何学的性質は、有限スペクトル次元を示すネットワークのラプラシア行列のスペクトル特性に反映され、QNGFの正規モードの周波数も決定される。補助開量子系をネットワークに結合し、低周波領域における正規モード周波数を探索することにより、事前未知のスペクトル次元を間接的に推定できることを示す。この意味では、これは例えば発振器の素周波数や定数結合強度の値ではなく、ネットワーク幾何学の特性である。数値的な証拠は、この推定が高周波遮断とノイズ、あるいは正常モード周波数の欠如の両方に頑健であることを示している。補助システムをランダム結合強度を持つネットワークノードのサブセットに結合し、正規モード周波数の十分大きなサブセットを明らかにし解決する。

We consider an environment for an open quantum system described by a "Quantum Network Geometry with Flavor" (QNGF) in which the nodes are coupled quantum oscillators. The geometrical nature of QNGF is reflected in the spectral properties of the Laplacian matrix of the network which display a finite spectral dimension, determining also the frequencies of the normal modes of QNGFs. We show that an a priori unknown spectral dimension can be indirectly estimated by coupling an auxiliary open quantum system to the network and probing the normal mode frequencies in the low frequency regime. We find that the network parameters do not affect the estimate; in this sense it is a property of the network geometry, rather than the values of, e.g., oscillator bare frequencies or the constant coupling strength. Numerical evidence suggests that the estimate is also robust both to small changes in the high frequency cutoff and noisy or missing normal mode frequencies. We propose to couple the auxiliary system to a subset of network nodes with random coupling strengths to reveal and resolve a sufficiently large subset of normal mode frequencies.

翻訳日:2023-05-19 08:04:55 公開日:2020-07-31

# フロッケダイナミクスのための量子カオス測度

Quantum chaos measures for Floquet dynamics ( http://arxiv.org/abs/2007.07283v2 )

ライセンス: Link先を確認

Amin A. Nizami

(参考訳) 周期的に曲がりくねったフロッケシステム(例えば、蹴られたローター)は、カオスのパラダイム的かつ説明的な単純なモデルである。非可積分量子力学には、Loschmidtエコー、ウィグナー関数、スペクトル関数、OTOCなどのカオス的挙動の存在(または遷移)の診断測度がある。我々は、これらの測度を、駆動量子系のユニタリフロッケ作用素の固有系の観点から解析的に計算する。また、量子キックローターに対するフロケ作用素の固有方程式の等価な代替形式も決定する。蹴り上げられたローターのより単純な積分可能な変種に対して、その動力学の表現論的導出を与える。

Periodically kicked Floquet systems such as the kicked rotor are a paradigmatic and illustrative simple model of chaos. For non-integrable quantum dynamics there are several diagnostic measures of the presence of (or the transition to) chaotic behaviour including the Loschmidt echo, Wigner function, spectral function and OTOC. We analytically compute these measures in terms of the eigensystem of the unitary Floquet operator of driven quantum systems. We also determine equivalent alternative forms of the eigen-equation of the Floquet operator for the quantum kicked rotor. For a simpler integrable variant of the kicked rotor, we give a representation theoretic derivation of its dynamics.

翻訳日:2023-05-10 01:58:35 公開日:2020-07-31

# スパース量子ノイズの高速推定

Fast Estimation of Sparse Quantum Noise ( http://arxiv.org/abs/2007.07901v2 )

ライセンス: Link先を確認

Robin Harper, Wenjun Yu, Steven T. Flammia

(参考訳) 量子コンピュータがフォールトトレランスしきい値に近づくにつれ、大規模量子デバイスにおけるノイズの診断と特徴化がますます重要になっている。ノイズチャンネルの最も重要なクラスの1つは、理論的トラクタビリティと実験的妥当性の両方の理由から、パウリチャンネルのクラスである。ここでは、$s$非ゼロパウリ誤差率を$s$スパース、$n$量子パウリノイズチャネル、あるいはより一般に$s$最大のパウリ誤差率で推定する実用的なアルゴリズムを提案する。このアルゴリズムは厳密なリカバリ保証を持ち、$O(n^2)$測定、$O(s n^2)$古典的な処理時間、Clifford量子回路のみを使用する。我々は,IBM 14-qubit 超伝導デバイスのデータに対して,単純化されたクリフォード回路を用いたアルゴリズムのヒューリスティックバージョンを実験的に検証した。これらのデータは、信号が測定ノイズフロアより2桁下にある場合でも、任意の重み付きパウリ誤差の確率を正確に正確に推定できることを示している。

As quantum computers approach the fault tolerance threshold, diagnosing and characterizing the noise on large scale quantum devices is increasingly important. One of the most important classes of noise channels is the class of Pauli channels, for reasons of both theoretical tractability and experimental relevance. Here we present a practical algorithm for estimating the $s$ nonzero Pauli error rates in an $s$-sparse, $n$-qubit Pauli noise channel, or more generally the $s$ largest Pauli error rates. The algorithm comes with rigorous recovery guarantees and uses only $O(n^2)$ measurements, $O(s n^2)$ classical processing time, and Clifford quantum circuits. We experimentally validate a heuristic version of the algorithm that uses simplified Clifford circuits on data from an IBM 14-qubit superconducting device and our open source implementation. These data show that accurate and precise estimation of the probability of arbitrary-weight Pauli errors is possible even when the signal is two orders of magnitude below the measurement noise floor.

翻訳日:2023-05-09 09:03:20 公開日:2020-07-31

# 条件付き確率の規則は量子論において有効である [ゲルマンとヤオの「ベイズ統計のホール」について]

The rule of conditional probability is valid in quantum theory [Comment on Gelman & Yao's "Holes in Bayesian Statistics"] ( http://arxiv.org/abs/2007.08160v3 )

ライセンス: Link先を確認

P.G.L. Porta Mana

(参考訳) 最近の写本で、Gelman & Yao (2020) は「通常の条件確率の規則は量子領域で失敗する」と主張し、「確率論は真ではない(量子物理学)」と主張し、量子二重スリット実験の例でこれらの主張を支持することを主張した。本論は量子論のいくつかの関連する文献を思い出し、そのことを示している。 (i)ゲルマン・アンド・ヤオの主張は偽であり、実際、量子例は確率論の規則を確認する。 (ii) 量子の例に見られる特定の不等式は、例えば urn からの描画のような非常に非量子的な例にも現れることが示されるので、この問題には量子理論に特有のものはない。引用された原稿の量子論に関する誤記や不正確な記述も修正されている。

In a recent manuscript, Gelman & Yao (2020) claim that "the usual rules of conditional probability fail in the quantum realm" and that "probability theory isn't true (quantum physics)" and purport to support these statements with the example of a quantum double-slit experiment. The present comment recalls some relevant literature in quantum theory and shows that (i) Gelman & Yao's statements are false; in fact, the quantum example confirms the rules of probability theory; (ii) the particular inequality found in the quantum example can be shown to appear also in very non-quantum examples, such as drawing from an urn; thus there is nothing peculiar to quantum theory in this matter. A couple of wrong or imprecise statements about quantum theory in the cited manuscript are also corrected.

翻訳日:2023-05-09 07:17:32 公開日:2020-07-31

# フラグ状態スカッシングモデルを用いた非平衡位相符号化BB84プロトコルの鍵レート改善

Improving key rates of the unbalanced phase-encoded BB84 protocol using the flag-state squashing model ( http://arxiv.org/abs/2007.08662v2 )

ライセンス: Link先を確認

Nicky Kai Hong Li and Norbert L\"utkenhaus

(参考訳) 位相符号化bb84の実装はすべて、実際には不平衡振幅を持つ信号状態を持つ。したがって、プライオリによる元のセキュリティ分析は適用されない。以前のセキュリティ証明では、通常のBB84の動作を回復するために、マルチフォトンパルスの信号タグを使用する。不均衡信号の場合、光子数分割攻撃はeveに完全な情報を漏らすことはない。本研究では,フラグ状態のスカッシングモデルを用いて,多光子生成プライベート情報のいくつかを解析で保存する。数値的な証明手法を用いることで, 従来公表されていた低損失状態と比較して, キーレートが大幅に向上した。信頼できないダークカウントの通常のシナリオは、あるパラメーター状態において概念的な困難に陥ることが判明した。そこで本稿では,信頼度の高いダークカウントシナリオについても論じる。また,信頼された装置によって損失の一部が引き起こされることが分かっている場合,鍵レートの上昇も報告する。これらのキーレートの改善は、実験的な設定を変更せずに達成できることを強調する。

All phase-encoded BB84 implementations have signal states with unbalanced amplitudes in practice. Thus, the original security analyses a priori do not apply to them. Previous security proofs use signal tagging of multi-photon pulses to recover the behaviour of regular BB84. This is overly conservative, as for unbalanced signals, the photon-number splitting attack does not leak full information to Eve. In this work, we exploit the flag-state squashing model to preserve some parts of the multi-photon generated private information in our analysis. Using a numerical proof technique, we obtain significantly higher key rates compared with previously published results in the low-loss regime. It turns out that the usual scenario of untrusted dark counts runs into conceptual difficulties in some parameter regime. Thus, we discuss the trusted dark count scenario in this paper as well. We also report a gain in key rates when part of the total loss is known to be induced by a trusted device. We highlight that all these key rate improvements can be achieved without modification of the experimental setup.

翻訳日:2023-05-09 06:53:21 公開日:2020-07-31

# 極小熱環境による絡み合いの突然死

Sudden death of entanglement induced by a minimal thermal environment ( http://arxiv.org/abs/2007.09140v2 )

ライセンス: Link先を確認

G.L. De\c{c}ordi and A. Vidiella-Barranco

(参考訳) 熱状態における1つのモード電磁界と1つのモード電磁界を結合した2つの相互作用する2レベル系(量子ビット)のダイナミクスについて検討した。フィールドは、多くの自由度を持つ熱貯水池を通して環境をモデル化する通常のアプローチとは対照的に、小さな環境の役割を担っている。提案したモデルの解析解は,2量子系の特性量子特性に対する小環境とのカップリングの影響を解明するものである。量子エンタングルメントとコヒーレンスの時間進化について検討し、関連する結合定数への依存性と環境の有効温度の影響を検証した。興味深いことに、このような単純なシステムでは、突然の死亡と突然の絡み合いが生じる可能性がある。また、分離されたキュービットを、他のキュービットとフィールドモードで構成された複合環境に結合すると考えられる別のパーティションについても論じる。

We study the dynamics of two interacting two-level systems (qubits) having one of them isolated and the other coupled to a single mode electromagnetic field in a thermal state. The field plays the role of a small environment, in contrast to the usual approach of modeling an environment via a thermal reservoir with many degrees of freedom. We find the analytical solution of the proposed model, which allows us to investigate the consequences of the coupling to the small environment on characteristic quantum features of the two-qubit system. We study the time evolution of quantum entanglement and coherence, verifying the dependence on the relevant coupling constants as well as the influence of the effective temperature of the environment. Interestingly, we find that both sudden death and sudden birth of entanglement may occur in such a simple system. We also discuss a different partition, in which the isolated qubit is considered to be coupled to a composite environment, constituted by the other qubit plus the field mode.

翻訳日:2023-05-09 04:51:22 公開日:2020-07-31

# 中性Tb(II)(Cp$^{\rm{iPr5}}$)$_2$単一分子磁石における電気的に調整された超微細スペクトル

Electrically tuned hyperfine spectrum in neutral Tb(II)(Cp$^{\rm{iPr5}}$)$_2$ single-molecule magnet ( http://arxiv.org/abs/2007.15798v1 )

ライセンス: Link先を確認

Robert L. Smith, Aleksander L. Wysocki, and Kyungwha Park

(参考訳) 長いスピンコヒーレンス時間を持つ分子スピン量子ビットと、そのような量子ビット上の非侵襲的な操作方法が要求される。分子電子スピンレベルと核スピンレベルの両方を量子ビットとして使用できることを示した。ドーパントを持つ固体系では、電子スピン密度がドーパント核で高い場合、電場が核スピン量子ビットレベル間の間隔を効果的に変化させることが示されている。このような固体系に着想を得て、Ln$^{2+}$の特異な電子配置を持つ二価ランタニド(Ln)錯体はLn核スピンと電子自由度との間に強い相互作用を持ち、相互作用の電気的チューニングを行う。例えば、中性Tb(II)(Cp$^{\rm{iPr5}}$)$_2$単分子磁石(SMM)における$^{159}$Tb核の電子構造と超微細相互作用を、制限された活性空間状態相互作用に含まれるスピン-軌道相互作用を持つ完全活性空間自己整合磁場法を用いて研究する。計算の結果,低エネルギー状態は4f^8(6s,5d_{z^2})^1$,4$f^8$(5$d_{x^2-y^2}$)$^1$,4f^8(5d_{xy})^1$構成から生じることがわかった。我々は,超微細な相互作用パラメータと電子核スペクトルを多構成アプローチで計算する。超微細相互作用は、Tb(III)Pc$_2$SMMよりも1桁大きい。これは、Tb核スピンと6s,5d)$軌道の占有に由来する核の電子スピン密度との間の強いフェルミ接触相互作用に由来する。また、フェルミ接触項の電場への応答が電子核レベル分離の電気的チューニングをもたらすことも明らかにした。この超微細スターク効果は量子コンピューティングにおける分子核スピンの応用に有用かもしれない。

Molecular spin qubits with long spin coherence time as well as non-invasive operation methods on such qubits are in high demand. It was shown that both molecular electronic and nuclear spin levels can be used as qubits. In solid state systems with dopants, an electric field was shown to effectively change the spacing between the nuclear spin qubit levels when the electron spin density is high at the nucleus of the dopant. Inspired by such solid-state systems, we propose that divalent lanthanide (Ln) complexes with an unusual electronic configuration of Ln$^{2+}$ have a strong interaction between the Ln nuclear spin and the electronic degrees of freedom, which renders electrical tuning of the interaction. As an example, we study electronic structure and hyperfine interaction of the $^{159}$Tb nucleus in a neutral Tb(II)(Cp$^{\rm{iPr5}}$)$_2$ single-molecule magnet (SMM) using the complete active space self-consistent field method with spin-orbit interaction included within the restricted active space state interaction. Our calculations show that the low-energy states arise from $4f^8(6s,5d_{z^2})^1$, 4$f^8$(5$d_{x^2-y^2}$)$^1$, and $4f^8(5d_{xy})^1$ configurations. We compute the hyperfine interaction parameters and the electronic-nuclear spectrum within our multiconfigurational approach. We find that the hyperfine interaction is about one order of magnitude greater than that for Tb(III)Pc$_2$ SMMs. This stems from the strong Fermi contact interaction between the Tb nuclear spin and the electron spin density at the nucleus that originates from the occupation of the $(6s,5d)$ orbitals. We also uncover that the response of the Fermi contact term to electric field results in electrical tuning of the electronic-nuclear level separations. This hyperfine Stark effect may be useful for applications of molecular nuclear spins for quantum computing.

翻訳日:2023-05-07 12:56:12 公開日:2020-07-31

# フェニルスルホニル-カルバゾールtadfエミッタの電子遷移研究への量子コンピューティングの応用

Applications of Quantum Computing for Investigations of Electronic Transitions in Phenylsulfonyl-carbazole TADF Emitters ( http://arxiv.org/abs/2007.15795v1 )

ライセンス: Link先を確認

Qi Gao, Gavin O. Jones, Mario Motta, Michihiko Sugawara, Hiroshi C. Watanabe, Takao Kobayashi, Eriko Watanabe, Yu-ya Ohnishi, Hajime Nakamura and Naoki Yamamoto

(参考訳) 有機発光ダイオード(oled)用熱活性化遅延蛍光(tadf)エミッタとして提案されるフェニルスルホニル-カルバゾール化合物の第一一重項(s1)および三重項(t1)励起状態の量子化学研究を,量子シミュレータおよびデバイス上での量子方程式-運動変動量子固有解法(qeom-vqe)と変分量子デフレレーション(vqd)アルゴリズムを用いて行った。これらの量子シミュレーションは, tadf分子の最大占有量と最小占有分子軌道(homo, lumo)からなる活性空間上で, 二重ゼータ品質基底セットを用いて行った。量子シミュレータ上での計算により予測されるS1とT1(\Delta E_{st}$)のエネルギー分離の差は実験データとよく一致していることがわかった。 qeom-vqeアルゴリズムとvqdアルゴリズムを用いて, 量子デバイス上で誤差緩和を伴わずにシミュレーションを行うことにより, 励起状態に対する16mhaと88mhaの違いが確認された。状態トモグラフィーによる誤差軽減を利用して、量子状態の浄化とエネルギー値の補正を行うことで、未緩和結果に対する大きな誤差は、少なくとも正確な値に関して3mHaの違いに改善することができる。その結果、量子シミュレーションによって予測される$\Delta E_{st}$の値と実験で得られた値との間には優れた一致が見いだされた。

A quantum chemistry study of the first singlet (S1) and triplet (T1) excited states of phenylsulfonyl-carbazole compounds, proposed as useful thermally activated delayed fluorescence (TADF) emitters for organic light emitting diode (OLED) applications, was performed with the quantum Equation-Of-Motion Variational Quantum Eigensolver (qEOM-VQE) and Variational Quantum Deflation (VQD) algorithms on quantum simulators and devices. These quantum simulations were performed with double zeta quality basis sets on an active space comprising the highest occupied and lowest unoccupied molecular orbitals (HOMO, LUMO) of the TADF molecules. The differences in energy separations between S1 and T1 ($\Delta E_{st}$) predicted by calculations on quantum simulators were found to be in excellent agreement with experimental data. Differences of 16 and 88 mHa with respect to exact energies were found for excited states by using the qEOM-VQE and VQD algorithms, respectively, to perform simulations on quantum devices without error mitigation. By utilizing error mitigation by state tomography to purify the quantum states and correct energy values, the large errors found for unmitigated results could be improved to differences of, at most, 3 mHa with respect to exact values. Consequently, excellent agreement could be found between values of $\Delta E_{st}$ predicted by quantum simulations and those found in experiments.

翻訳日:2023-05-07 12:55:41 公開日:2020-07-31

# 二次元水素のcr\'amer-rao複雑性

Cr\'amer-Rao complexity of the two-dimensional confined hydrogen ( http://arxiv.org/abs/2007.15913v1 )

ライセンス: Link先を確認

C. R. Esta\~n\'on, N. Aquino, D. Puertas-Centeno, J. S. Dehesa

(参考訳) 2次元に閉じ込められた水素原子の内部障害は、統計量 Cr'amer-Rao による 1\textit{s}, 2\textit{s}, 2\textit{p}, 3\textit{d} 量子状態の閉じ込め半径で数値的に研究される。まず, 分散の閉じ込め依存性と電子分布の位置と運動量拡散のフィッシャー情報について計算し, 考察する。次に, Cr\'amer-Rao複雑性測定(平均値と電子分布の勾配含量に関する電荷濃度の組合せバランスを定量化する)を, 位置と運動量空間で検討した。閉じ込めは、この2つの成分測度によって全ての量子状態に対するシステムの複雑さを区別する。

The internal disorder of the two-dimensional confined hydrogenic atom is numerically studied in terms of the confinement radius for the 1\textit{s}, 2\textit{s}, 2\textit{p} and 3\textit{d} quantum states by means of the statistical Cr\'amer-Rao complexity measure. First, the confinement dependence of the variance and the Fisher information of the position and momentum spreading of its electron distribution are computed and discussed. Then, the Cr\'amer-Rao complexity measure (which quantifies the combined balance of the charge concentration around the mean value and the gradient content of the electron distribution) is investigated in position and momentum spaces. We found that confinement does distinguish complexity of the system for all quantum states by means of this two component measure.

翻訳日:2023-05-07 12:53:52 公開日:2020-07-31

# 多体系における局所相互作用による絡み合い補正

Entanglement correction due to local interactions in many-body systems ( http://arxiv.org/abs/2007.15908v1 )

ライセンス: Link先を確認

Yevheniia Cheipesh, Lorenzo Cevolani, Stefan Kehrein

(参考訳) フロー方程式ホログラフィー法の摂動拡張に基づいて,弱および局所的に相互作用するフェルミオンの2部分母エントロピーの領域法則の補正を行う。 1次元および2次元の場合(および高次元の場合)の明示的な計算は、相互作用強度が最大$u^2$の非相互作用フェルミオンの絡み合いエントロピーに対する主補正はスケーリングに影響を与えないが、主対数項の前因子に準粒子残基を乗じるだけであることを示している。地域法に準じた用語も存在する。相互作用強度は$U^2$に比例し、システムサイズと線形にスケールする。

The correction to the area law for the bipartite min-entanglement entropy of weakly and locally interacting fermions is calculated based on a perturbative extension of the flow equation holography method. Explicit calculations for the one- and two-dimensional case (and similarly for higher dimensions) show that the leading correction to the entanglement entropy of non-interacting fermions up to $U^2$ in the interaction strength does not change the scaling, but only affects the pre-factor of the leading logarithmic term multiplying it by the quasiparticle residue. A term sub-leading to the area law is also present. It is proportional to $U^2$ in the interaction strength and scales linearly with the system size.

翻訳日:2023-05-07 12:53:36 公開日:2020-07-31

# 減音効果を考慮した量子リモートセンシング

Quantum remote sensing under the effect of dephasing ( http://arxiv.org/abs/2007.15903v1 )

ライセンス: Link先を確認

Hideaki Okane, Hideaki Hakoshima, Yuki Takeuchi, Yuya Seki and Yuichiro Matsuzaki

(参考訳) 量子リモートセンシング(QRS)は、量子ビットセンサの測定結果に関するセキュリティを追加するスキームである。クライアントは量子センサを持つリモートサーバに測定タスクを委譲し、Eavesdropper(Eve)はサーバ側に保存されたすべての古典的な情報を盗む。量子特性を用いることで、QRSはクライアントがEveよりも知覚結果に関する情報を得る情報ゲインに関する非対称性を提供する。しかし、量子状態はデコヒーレンスに対して脆弱であるため、そのようなQRSが現実的な雑音の影響下で実際に有用であるかどうかは不明である。本稿では,対象フィールドとの相互作用を強調するqrsの性能について検討する。 QRSでは、クライアントとサーバはベルペアを共有する必要があり、ベルペアの欠陥は、検知のためのサーバ側の体系的な方法で状態準備エラーにつながる。我々はデフォーカスと状態準備の誤りの効果を考察する。クライアント側の不確実性は、繰り返し数$M$ for small $M$の平方根によって減少する。一方、大規模な$m$の場合、状態準備エラーはデファスメントと同様に関連し、不確実性は$m$で対数的に減少する。我々はクライアントとイブの間で得た情報を比較する。これにより、非対称な利得がデファス化の効果の下でも維持される条件が得られる。

The quantum remote sensing (QRS) is a scheme to add security about the measurement results of a qubit-based sensor. A client delegates a measurement task to a remote server that has a quantum sensor, and eavesdropper (Eve) steals every classical information stored in the server side. By using quantum properties, the QRS provides an asymmetricity about the information gain where the client gets more information about the sensing results than Eve. However, quantum states are fragile against decoherence, and so it is not clear whether such a QRS is practically useful under the effect of realistic noise. Here, we investigate the performance of the QRS with dephasing during the interaction with the target fields. In the QRS, the client and server need to share a Bell pair, and an imperfection of the Bell pair leads to a state preparation error in a systematic way on the server side for the sensing. We consider the effect of both dephasing and state preparation error. The uncertainty of the client side decreases with the square root of the repetition number $M$ for small $M$, which is the same scaling as the standard quantum metrology. On the other hand, for large $M$, the state preparation error becomes as relevant as the dephasing, and the uncertainty decreases logarithmically with $M$. We compare the information gain between the client and Eve. This leads us to obtain the conditions for the asymmetric gain to be maintained even under the effect of dephasing.

翻訳日:2023-05-07 12:53:22 公開日:2020-07-31

# 3つの同一ボソン:非整数次元および外部場における特性

Three identical bosons: Properties in non-integer dimensions and in external fields ( http://arxiv.org/abs/2007.15900v1 )

ライセンス: Link先を確認

E. Garrido and A.S. Jensen

(参考訳) 3次元(3次元)空間から2次元(2次元)空間に連続的に絞り込まれる3体系について検討する。このようなスクイージングは、一つの軸に沿って作用する外部閉じ込め電位によって得られる。しかし、この手順は数値的に要求されるか、特に大きな絞り込みシナリオでは不可能である。代用として、パラメータとして$d$という次元を使い、2\leq d \leq 3$の範囲内で連続的に変化する。 $d$-計算の単純さは、進行的閉じ込め後の3体状態の進化を研究するために利用される。 3dで相対的に$s$-waveを持つ3つの同一スピンレスボソンと調和振動子スクイージングポテンシャルの場合には考慮される。 2つの方法から得られた結果を比較し,両手法の次元,絞り長,波動関数に関するそれらの間の変換を行う。すべての計算はより単純な$d$-method内で完全に可能であるが、同じ幾何学を外部ポテンシャルで同時に提供する。

Three-body systems that are continuously squeezed from a three-dimensional (3D) space into a two-dimensional (2D) space are investigated. Such a squeezing can be obtained by means of an external confining potential acting along a single axis. However, this procedure can be numerically demanding, or even undoable, especially for large squeezed scenarios. An alternative is provided by use of the dimension $d$ as a parameter that changes continuously within the range $2\leq d \leq 3$. The simplicity of the $d$-calculations is exploited to investigate the evolution of three-body states after progressive confinement. The case of three identical spinless bosons with relative $s$-waves in 3D, and a harmonic oscillator squeezing potential is considered. We compare results from the two methods and provide a translation between them, relating dimension, squeezing length, and wave functions from both methods. All calculations are then possible entirely within the simpler $d$-method, but simultaneously providing the equivalent geometry with the external potential.

翻訳日:2023-05-07 12:52:59 公開日:2020-07-31

# 絡み合うチャネルのヌルスペースとその応用

Nullspaces of Entanglement Breaking Channels and Applications ( http://arxiv.org/abs/2007.15893v1 )

ライセンス: Link先を確認

D.W. Kribs, J. Levick, K. Olfert, R. Pereira, M. Rahaman

(参考訳) 絡み合うチャネルのヌルスペース構造と関連する応用について検討する。トレースゼロ行列のすべての作用素空間は、絡み合う破壊チャネルのヌル空間であることを示す。相補的なチャネル挙動と絡み合うチャネルのヌル空間に基づいて、量子チャネルの混合ユニタリ性をテストする。我々は、ある種の絡み合うチャネルのクラスに対するプライベート代数の存在を保証する条件を特定する。

We investigate the nullspace structures of entanglement breaking channels, and related applications. We show that every operator space of trace zero matrices is the nullspace of an entanglement breaking channel. We derive a test for mixed unitarity of quantum channels based on complementary channel behaviour and entanglement breaking channel nullspaces. We identify conditions that guarantee the existence of private algebras for certain classes of entanglement breaking channels.

翻訳日:2023-05-07 12:52:42 公開日:2020-07-31

# 絡み合いの少ない単純な量子位置検証プロトコルを破る

Breaking simple quantum position verification protocols with little entanglement ( http://arxiv.org/abs/2007.15808v1 )

ライセンス: Link先を確認

Andrea Olivo, Ulysse Chabaud, Andr\'e Chailloux, Fr\'ed\'eric Grosshans

(参考訳) inqc(instantaneous nonlocal quantum computation)は、見かけの量子および相対論的制約を回避し、指数的絡み合いコストでジェネリック量子位置検証(qpv)プロトコル(遠方証明器の位置をセキュアに検証する)を攻撃可能にする。我々は,最大絡み合ったキューディットのペアを共有する敵について検討し,1光子を1つの角度で偏光したQPVプロトコルの実用的ファミリに対する低次元INQC攻撃を,$\theta$で行う。クリフォード階層の外に座っているもの(例えば$\pi/6$)や、$\theta$が$\simeq 5\cdot 10^{-3}$以上のエラーをプロトコルのキュービットに2つのebitを持つ敵に対して許容できないことなど、いくつかの合理的な角度に対する正確な攻撃を見つける。

Instantaneous nonlocal quantum computation (INQC) evades apparent quantum and relativistic constraints and allows to attack generic quantum position verification (QPV) protocols (aiming at securely certifying the location of a distant prover) at an exponential entanglement cost. We consider adversaries sharing maximally entangled pairs of qudits and find low-dimensional INQC attacks against the simple practical family of QPV protocols based on single photons polarized at an angle $\theta$. We find exact attacks against some rational angles, including some sitting outside of the Clifford hierarchy (e.g. $\pi/6$), and show no $\theta$ allows to tolerate errors higher than $\simeq 5\cdot 10^{-3}$ against adversaries holding two ebits per protocol's qubit.

翻訳日:2023-05-07 12:51:38 公開日:2020-07-31

# ハイブリッド職場における食事嗜好の分析

Seating preference analysis for hybrid workplaces ( http://arxiv.org/abs/2007.15807v1 )

ライセンス: Link先を確認

Mohammad Saiedur Rahaman, Shaw Kudo, Tim Rawling, Yongli Ren, and Flora D. Salim

(参考訳) フレキシブルな仕事の性質の増大と、新型コロナウイルス(covid-19)の規制による近年の要件により、職場はよりハイブリッドになってきている(例えば、伝統的なオフィススペースや、自宅など他の場所で働くことができる)。作業場は設計、レイアウト、利用可能な設備が異なるため、多くの作業員は適切な調整が難しいと感じている。最終的に、これは仕事の生産性や、集中、ストレス、ムードなどの関連するパラメータに悪影響を及ぼす。この負の作業経験を引き起こす重要な要因の1つは、利用可能な座席配置に直接関連している。本稿では、新型コロナウイルス以前のデータを用いて、37人の従業員の様々な座席選択を理解するための分析を行い、ハイブリッド職場環境における調査結果を分析した。また、我々の発見がより広範なハイブリッドな作業環境にどのように適応できるかを示す意味のリストについても論じる。

Due to the increasing nature of flexible work and the recent requirements from COVID-19 restrictions, workplaces are becoming more hybrid (i.e. allowing workers to work between traditional office spaces and elsewhere including from home). Since workplaces are different in design, layout and available facilities, many workers find it difficult to adjust accordingly. Eventually, this impacts negatively towards work productivity and other related parameters including concentration, stress, and mood while at work. One of the key factors that causes this negative work experience is directly linked to the available seating arrangements. In this paper, we conduct an analysis to understand various seating preferences of 37 workers with varying demographics, using the data collected pre-COVID-19, and analyse the findings in the context of hybrid workplace settings. We also discuss a list of implications illustrating how our findings can be adapted across wider hybrid work settings.

翻訳日:2023-05-07 12:51:21 公開日:2020-07-31

# 音響光学変調器を用いた光空間モードの高速生成と検出

Fast Generation and Detection of Spatial Modes of Light using an Acousto-Optic Modulator ( http://arxiv.org/abs/2007.16115v1 )

ライセンス: Link先を確認

Boris Braverman, Alexander Skerjanc, Nicholas Sullivan, Robert W. Boyd

(参考訳) 光の空間モードは、古典的および量子的情報をエンコードするのに使用できる高次元空間を提供する。空間光変調器やデジタルマイクロミラー装置などの高解像度位相マスクを再構成する必要があるため、これらのモードを動的に生成・測定するための現在のアプローチは遅い。光の空間モードを更新するプロセスは、AOM(Acousto-optic modulator)のような高速な画像保存光学スイッチで静止相マスクのセットを多重化することにより、大幅に加速することができる。両パスAOMを用いて5つの軌道角運動量状態のうちの1つを最大500kHzのスイッチング速度で生成する。次に,このシステムを用いて2次元ヒルベルト空間における空間モードの高速量子トモグラフィーを行い,未知の状態を3つの偏りのない基底からなる6つの空間モードに投影する。我々は平均96.9%の忠実度で任意の状態を1ミリ秒未満で再構築することができる。

Spatial modes of light provide a high-dimensional space that can be used to encode both classical and quantum information. Current approaches for dynamically generating and measuring these modes are slow, due to the need to reconfigure a high-resolution phase mask such as a spatial light modulator or digital micromirror device. The process of updating the spatial mode of light can be greatly accelerated by multiplexing a set of static phase masks with a fast, image-preserving optical switch, such as an acousto-optic modulator (AOM). We experimentally realize this approach, using a double-pass AOM to generate one of five orbital angular momentum states with a switching rate of up to 500 kHz. We then apply this system to perform fast quantum state tomography of spatial modes of light in a 2-dimensional Hilbert space, by projecting the unknown state onto six spatial modes comprising three mutually unbiased bases. We are able to reconstruct arbitrary states in under 1 ms with an average fidelity of 96.9%.

翻訳日:2023-05-07 12:43:37 公開日:2020-07-31

# 超伝導ナノワイヤ単光子検出器の2x2多重画素アレイの量子検出器トモグラフィー

Quantum detector tomography of a 2x2 multi-pixel array of superconducting nanowire single photon detectors ( http://arxiv.org/abs/2007.16048v1 )

ライセンス: Link先を確認

Timon Schapeler, Jan Philipp Hoepker, Tim J. Bartley

(参考訳) 超伝導ナノワイヤ単光子検出器の商用2x2アレイの量子検出器トモグラフィーを実証する。本研究は, 検出器物理に関係なく, 効率, 暗数, クロストーク確率などの検出器固有値を直接抽出できることを示す。これらの数値は、デバイスの再構成された正の演算子値測定(POVM)の4つの要素から直接識別される。検出器トモグラフィーにより抽出された効率と暗カウント確率の値は,これらの量の独立測定値と良好な一致を示し,クロストーク確率の直感的な操作定義を提供する。最後に,再構成に必要なパラメータを慎重に選択し,データの過度なスムース化を回避する必要があることを示す。

We demonstrate quantum detector tomography of a commercial 2x2 array of superconducting nanowire single photon detectors. We show that detector-specific figures of merit including efficiency, dark-count and cross-talk probabilities can be directly extracted, without recourse to the underlying detector physics. These figures of merit are directly identified from just four elements of the reconstructed positive operator valued measure (POVM) of the device. We show that the values for efficiency and dark-count probability extracted by detector tomography show excellent agreement with independent measurements of these quantities, and we provide an intuitive operational definition for cross-talk probability. Finally, we show that parameters required for the reconstruction must be carefully chosen to avoid oversmoothing the data.

翻訳日:2023-05-07 12:42:28 公開日:2020-07-31

# 単一光子アバランシェダイオードカメラによる量子照明イメージング

Quantum illumination imaging with a single-photon avalanche diode camera ( http://arxiv.org/abs/2007.16037v1 )

ライセンス: Link先を確認

Hugo Defienne, Jiuxuan Zhao, Edoardo Charbon, Daniele Faccio

(参考訳) 単光子-バランシェダイオード(SPAD)アレイは、バイオフォトニクス、光学測光、量子光学において必須のツールである。しかし、画素数が少なく、量子効率が低く、フィリング係数も小さいため、実用的なイメージングへの応用は妨げられている。本稿では,100kピクセルspadカメラを用いたフルフィールドエンタングル光子対相関イメージングを示す。 5億対以上の位置間の光子一致を測定することで、撮像系の全点拡散関数と、空間的に絡み合った光子対によって照らされた対象物の高分解能画像を求める。我々は、我々の撮像手法が成層光に対して堅牢であることを示し、量子イメージング技術が実験室を超えて量子LiDARのような実世界のアプリケーションに移行することを可能にする。

Single-photon-avalanche diode (SPAD) arrays are essential tools in biophotonics, optical ranging and sensing and quantum optics. However, their small number of pixels, low quantum efficiency and small fill factor have so far hindered their use for practical imaging applications. Here, we demonstrate full-field entangled photon pair correlation imaging using a 100-kpixels SPAD camera. By measuring photon coincidences between more than 500 million pairs of positions, we retrieve the full point spread function of the imaging system and subsequently high-resolution images of target objects illuminated by spatially entangled photon pairs. We show that our imaging approach is robust against stray light, enabling quantum imaging technologies to move beyond laboratory experiments towards real-world applications such as quantum LiDAR.

翻訳日:2023-05-07 12:42:14 公開日:2020-07-31

# 強化学習を用いた量子コンパイラの量子ビットルーティング

Using Reinforcement Learning to Perform Qubit Routing in Quantum Compilers ( http://arxiv.org/abs/2007.15957v1 )

ライセンス: Link先を確認

Matteo G. Pozzi (1), Steven J. Herbert (1 and 2), Akash Sengupta (3), Robert D. Mullins (1) ((1) University of Cambridge Computer Laboratory, (2) Cambridge Quantum Computing, (3) Department of Engineering, University of Cambridge)

(参考訳) 量子ルーティング(Qubit routing)とは、ターゲットの量子コンピュータの接続制約を満たすために量子回路を変更するタスクである。これはSWAPゲートを回路に挿入することで、論理ゲートが隣接する物理量子ビット間でのみ発生するようにする。 SWAPゲートが付加する回路深度を最小化することが目的である。本稿では,深層q学習パラダイムの修正版を用いた量子ビットルーティング手法を提案する。このシステムは、ランダム回路とリアル回路の両方で現在利用可能な最も先進的な量子コンパイラの2つから、短期的なアーキテクチャサイズでqubitルーティング手順を上回ることができる。

"Qubit routing" refers to the task of modifying quantum circuits so that they satisfy the connectivity constraints of a target quantum computer. This involves inserting SWAP gates into the circuit so that the logical gates only ever occur between adjacent physical qubits. The goal is to minimise the circuit depth added by the SWAP gates. In this paper, we propose a qubit routing procedure that uses a modified version of the deep Q-learning paradigm. The system is able to outperform the qubit routing procedures from two of the most advanced quantum compilers currently available, on both random and realistic circuits, across near-term architecture sizes.

翻訳日:2023-05-07 12:41:12 公開日:2020-07-31

# 起業金融研究における西洋的イデオロギー的均質性--高引用出版物からのエビデンス

Western ideological homogeneity in entrepreneurial finance research: Evidence from highly cited publications ( http://arxiv.org/abs/2008.00016v1 )

ライセンス: Link先を確認

Minh-Hoang Nguyen, Huyen Thanh T. Nguyen, Thanh-Hang Pham, Manh-Toan Ho and Quan-Hoang Vuong

(参考訳) 起業家はグローバルな持続可能な開発において重要な役割を果たすが、限られた金融資源は業績と生存率を制限している。したがって、起業家金融の規律は、金融と起業家精神の関係を探求するために生まれます。起業家精神のグローバルな存在にもかかわらず、起業家金融の文献は西洋のイデオロギー的均質であると疑われている。本研究の目的は、起業家金融文学における西洋のイデオロギー的均質性の存在を検討することである。我々は,マインドスポンジ機構と文献分析(Y-インデックスと社会構造)を用いて,Web of Scienceデータベースから抽出された412の高度に引用された論文を分析し,起業家金融の中核イデオロギーの集合における異質性に対する弱い耐性と西洋イデオロギーの優位性を見出した。これらの結果は著者、機関、国レベルで一致しており、この分野における西洋のイデオロギー的同質性の存在の強い証拠を示している。筆者らは,イデオロギー的均質性の欠点を避けるため,研究トピックの多様化と知識交換の促進を積極的に行うことを推奨する。さらに、他の科学分野におけるイデオロギーの多様性を評価できる方法として、マインドスポンジ機構の合成と文献分析が提案されている。

Entrepreneurs play crucial roles in global sustainable development, but limited financial resources constrain their performance and survival rate. Entrepreneurial finance discipline is, therefore, born to explore the connection between finance and entrepreneurship. Despite the global presence of entrepreneurship, the literature of entrepreneurial finance is suspected to be Western ideologically homogenous. Thus, the objective of this study is to examine the existence of Western ideological homogeneity in entrepreneurial finance literature. Employing the mindsponge mechanism and bibliometric analyses (Y-index and social structure), we analyze 412 highly cited publications extracted from Web of Science database and find Western ideological dominance as well as weak tolerance towards heterogeneity in the set of core ideologies of entrepreneurial finance. These results are consistent across author-, institution-, and country-levels, which reveals strong evidence for the existence of Western ideological homogeneity in the field. We recommend editors, reviewers, and authors to have proactive actions to diversify research topics and enhancing knowledge exchange to avoid the shortfalls of ideological homogeneity. Moreover, the synthesis of mindsponge mechanism and bibliometric analyses are suggested as a possible way to evaluate the state of ideological diversity in other scientific disciplines.

翻訳日:2023-05-07 12:35:24 公開日:2020-07-31

# 放射状schr\"odinger方程式のユニタリ、連続体、定常摂動理論

Unitary, continuum, stationary perturbation theory for the radial Schr\"odinger equation ( http://arxiv.org/abs/2008.01831v1 )

ライセンス: Link先を確認

Scott E. Hoffmann

(参考訳) ポアンカル群生成子の可換体は、単位変換が自由生成元と相互作用する相対論的理論の生成元を関連付ける場合、形式的に変化しない。非相対論的な場合、生成体のユニタリ変換の概念をテストし、自由かつ相互作用するハミルトン多様体はユニタリ変換によって関連付ける必要がある。他の著者はこの概念を時間依存摂動理論に適用し、摂動論における時間発展作用素の各々の順序へのユニタリティを与え、標準摂動理論よりも改善することを示した。この場合、定常摂動理論は球対称ポテンシャルから散乱するための放射状シュレーディンガー方程式の近似解を見つけるために構成することができる。カップリング定数において、第1および第2次位相シフトに対して一般式を求める。本研究では,S波位相シフトに対する第1次および第2次コントリビューションと,対応する正解の第2次への拡張との完全な一致を求める。

The commutators of the Poincar\'e group generators will be unchanged in form if a unitary transformation relates the free generators to the generators of an interacting relativistic theory. We test the concept of unitary transformations of generators in the nonrelativistic case, requiring that the free and interacting Hamiltonians be related by a unitary transformation. Other authors have applied this concept to time-dependent perturbation theory to give unitarity of the time evolution operator to each order in perturbation theory, with results that show improvement over the standard perturbation theory. In our case, a stationary perturbation theory can be constructed to find approximate solutions of the radial Schr\"odinger equation for scattering from a spherically symmetric potential. General formulae are obtained for the phase shifts at first and second order in the coupling constant. We test the method on a simple system with a known exact solution and find complete agreement between our first- and second-order contributions to the s-wave phase shifts and the corresponding expansion to second order of the exact solution.

翻訳日:2023-05-07 12:24:47 公開日:2020-07-31

# トラップ型超伝導光子検出器による捕捉イオン量子状態の読み出し

State Readout of a Trapped Ion Qubit Using a Trap-Integrated Superconducting Photon Detector ( http://arxiv.org/abs/2008.00065v1 )

ライセンス: Link先を確認

S. L. Todaro, V. B. Verma, K. C. McCormick, D. T. C. Allcock, R. P. Mirin, D. J. Wineland, S. W. Nam, A. C. Wilson, D. Leibfried, and D. H. Slichter

(参考訳) トラップ型光子検出器を用いた捕捉イオン量子ビットの高忠実度状態読み出しについて報告する。このトラップ構造に作製された超伝導ナノワイヤ単光子検出器(SNSPD)を用いた状態依存型イオン蛍光光子を数えることにより、表面電界rfイオントラップに保持される1つの$^9$Be$^+$イオンの超微細量子状態を決定する。平均読み出し忠実度は 0.9991(1) であり、平均読み出し期間は 46 $\mu$s であり、読み出しレーザビームの偏光不純物とオフ共振光励起によって制限される。イオンと検出器の間に干渉する光学素子がないため、イオン蛍光を自己校正光源として利用し、検出器の量子効率と光子入射角と偏光への依存性を決定することができる。

We report high-fidelity state readout of a trapped ion qubit using a trap-integrated photon detector. We determine the hyperfine qubit state of a single $^9$Be$^+$ ion held in a surface-electrode rf ion trap by counting state-dependent ion fluorescence photons with a superconducting nanowire single-photon detector (SNSPD) fabricated into the trap structure. The average readout fidelity is 0.9991(1), with a mean readout duration of 46 $\mu$s, and is limited by the polarization impurity of the readout laser beam and by off-resonant optical pumping. Because there are no intervening optical elements between the ion and the detector, we can use the ion fluorescence as a self-calibrated photon source to determine the detector quantum efficiency and its dependence on photon incidence angle and polarization.

翻訳日:2023-05-07 12:24:04 公開日:2020-07-31

# データから知識から行動へ:スマートグリッドの実現

From Data to Knowledge to Action: Enabling the Smart Grid ( http://arxiv.org/abs/2008.00055v1 )

ライセンス: Link先を確認

Randal E. Bryant, Randy H. Katz, Chase Hensel, and Erwin P. Gianchandani

(参考訳) 我が国の発電、送電、配電のためのインフラである「グリッド」は、何世紀にもわたる技術に基づく遺物である。大規模プラントによる高価で中央集権的な発電と、大規模な送電・流通システムで構成されている。要求がどうであれ、すべての加入者に同時に高品質な電力を提供することを試みており、それゆえ、各配布ポイントにおけるピークアグリゲーション需要までのサイズでなければならない。最終的にシステムはエンドツーエンドの同期を必要とするため、"バッフィング(buffering)"エネルギを格納するメカニズムが欠如しており、グリッド間の共有や"上流(upstream)"停止時の独立操作を複雑にしている。最近のブラックアウトは、既存のグリッドの問題を示している。さらに、この構造は太陽や風といった再生可能エネルギー源の高度に可変な性質に対応できない。多くの人々は、電気エネルギーの生成、分配、消費のためのより分散的で適応的で市場ベースのインフラである「スマートグリッド」に期待を向けている。この新しいアプローチは、既存の配電システムに比べて環境への影響を低減しつつ、効率とレジリエンスを高めるように設計されている。スマートグリッドの当初の計画では、既存の情報技術を広く活用することを示唆している。特に、データ分析の最近の進歩は、データマイニング、機械学習などである。エネルギーの使用方法や現在のエネルギーグリッドに課している要求の種類に関する豊富なデータを理解するのに役立ち、スマートグリッドを大幅に強化し、最終的にはその影響を増幅する可能性を持っている。ここでは、電力網が10年でどう見えるか、特に、データ分析アプローチに対する連邦政府の投資が、このビジョンを実現する上でいかに重要かを説明します。

Our nation's infrastructure for generating, transmitting, and distributing electricity - "The Grid" - is a relic based in many respects on century-old technology. It consists of expensive, centralized generation via large plants, and a massive transmission and distribution system. It strives to deliver high-quality power to all subscribers simultaneously - no matter what their demand - and must therefore be sized to the peak aggregate demand at each distribution point. Ultimately, the system demands end-to-end synchronization, and it lacks a mechanism for storing ("buffering") energy, thus complicating sharing among grids or independent operation during an "upstream" outage. Recent blackouts demonstrate the existing grid's problems - failures are rare but spectacular. Moreover, the structure cannot accommodate the highly variable nature of renewable energy sources such as solar and wind. Many people are pinning their hopes on the "smart grid" - i.e., a more distributed, adaptive, and market-based infrastructure for the generation, distribution, and consumption of electrical energy. This new approach is designed to yield greater efficiency and resilience, while reducing environmental impact, compared to the existing electricity distribution system. Initial plans for the smart grid suggest it will make extensive use of existing information technology. In particular, recent advances in data analytics - i.e., data mining, machine learning, etc. - have the potential to greatly enhance the smart grid and, ultimately, amplify its impact, by helping us make sense of an increasing wealth of data about how we use energy and the kinds of demands that we are placing upon the current energy grid. Here we describe what the electricity grid could look like in 10 years, and specifically how Federal investment in data analytics approaches are critical to realizing this vision.

翻訳日:2023-05-07 12:23:49 公開日:2020-07-31

# 関連するOTOC作用素:古典力学のフットプリント

Relevant OTOC operators: footprints of the classical dynamics ( http://arxiv.org/abs/2008.00046v1 )

ライセンス: Link先を確認

Pablo D. Bergamasco, Gabriel G. Carlo and Alejandro M. F. Rivas

(参考訳) out-of-time order correlator (otoc) は最近、量子情報のスクランブルと絡み合いに関連付けられた様々な領域で関連づけられている。また、量子複雑性の指標として提案されている。この意味で、OTOC-REの定理は、作用素の完備基底にまとめられたOTOCを第二レニイエントロピーに関連付ける。ここでは、パウリ、リフレクション、翻訳演算子で構築されたような物理的に意味のある基底上でOTOC-RE対応を研究した。この進化は、異なるダイナミクスを持つ2つの摂動と結合されたアーノルドキャットマップからなるパラダイム的二成分系によって与えられる。関係作用素の小さな集合上の和は、エントロピーの非常によい近似を得るのに十分であり、したがって、時間 t 0 まで、力学の性格を明らかにするのに十分であることを示す。逆に、これは複雑さの別の自然な指標、すなわち時間に伴う関連する演算子の数のスケーリングを提供する。位相空間で表されるとき、これらの集合は、選択された基底に応じて深さの異なる古典力学足跡を明らかにする。

The out-of-time order correlator (OTOC) has recently become relevant in different areas where it has been linked to scrambling of quantum information and entanglement. It has also been proposed as a good indicator of quantum complexity. In this sense, the OTOC-RE theorem relates the OTOCs summed over a complete base of operators to the second Renyi entropy. Here we have studied the OTOC-RE correspondence on physically meaningful bases like the ones constructed with the Pauli, reflection, and translation operators. The evolution is given by a paradigmatic bi-partite system consisting of two perturbed and coupled Arnold cat maps with different dynamics. We show that the sum over a small set of relevant operators, is enough in order to obtain a very good approximation for the entropy and hence to reveal the character of the dynamics, up to a time t 0 . In turn, this provides with an alternative natural indicator of complexity, i.e. the scaling of the number of relevant operators with time. When represented in phase space, each one of these sets reveals the classical dynamical footprints with different depth according to the chosen base.

翻訳日:2023-05-07 12:23:20 公開日:2020-07-31

# データから知識から行動へ:21世紀のグローバル・エンバーサ

From Data to Knowledge to Action: A Global Enabler for the 21st Century ( http://arxiv.org/abs/2008.00045v1 )

ライセンス: Link先を確認

Eric Horvitz and Tom Mitchell

(参考訳) コンピュータと数理科学の進歩は、真の証拠に基づく意思決定を可能にする前例のない能力を生み出した。これらの能力は、科学、社会、政府の課題に関する決定を支援するために、データの大規模なキャプチャと、そのデータの洞察とレコメンデーションへの変換を可能にする。主な進歩は、リッチなデータストリームの可用性の上昇、大量のデータの保存と検索のコストの急落、計算能力とメモリの指数関数的な増加、機械学習と推論を実行するための方法の多さの飛躍などである。これらの進歩は、大量のデータを活用して洞察を創造し、意思決定を導く能力の転換点を生み出しました。商業、科学、教育、芸術、エンターテイメントのWebへの移行により、人間の活動に関する前例のない量の構造化された、構造化されていないデータベースが利用できるようになる。科学において、新しい明らかなパラダイムとセンシング技術は、基本的に新しい種類の低コストセンサー(ゲノムマイクロアレイなど)や、前例のない範囲と解像度を提供するビューアを通じて、大量のデータを作成している。データはデータ中心の分析に大きなチャンスをもたらす。これまでのところ、これらの大規模なデータセットから学習する可能性の表面をひっかいただけだった。意思決定者に対して洞察を提供し、行動やポリシーの質を高めるために、私たちの新しい機能をもっと広範囲にタップする機会がある。

A confluence of advances in the computer and mathematical sciences has unleashed unprecedented capabilities for enabling true evidence-based decision making. These capabilities are making possible the large-scale capture of data and the transformation of that data into insights and recommendations in support of decisions about challenging problems in science, society, and government. Key advances include jumps in the availability of rich streams of data, precipitous drops in the cost of storing and retrieving massive amounts of data, exponential increases in computing power and memory, and jumps in the prowess of methods for performing machine learning and reasoning. These advances have come together to create an inflection point in our ability to harness large amounts of data for generating insights and guiding decision making. The shift of commerce, science, education, art, and entertainment to the web makes available unprecedented quantities of structured and unstructured databases about human activities - much of it available to anyone who wishes to mine it for insights. In the sciences, new evidential paradigms and sensing technologies are making available great quantities of data, via use of fundamentally new kinds of low-cost sensors (e.g., genomic microarrays) or through viewers that provide unprecedented scope and resolution. The data pose a huge opportunity for data-centric analyses. To date, we have only scratched the surface of the potential for learning from these large-scale data sets. Opportunities abound for tapping our new capabilities more broadly to provide insights to decision makers and to enhance the quality of their actions and policies.

翻訳日:2023-05-07 12:23:04 公開日:2020-07-31

# 最初の原理による量子熱力学のモデル:量子ハロまたは小さな環境

The Model of Quantum Thermodynamics From the First Principles: Quantum Halo or Small Environment ( http://arxiv.org/abs/2008.00040v1 )

ライセンス: Link先を確認

Ashot Gevorkyan

(参考訳) 結合系 (js) - `quantum system (qs)+thermal bath (tb) の進化は、ランジュバン・シュル(langevin-schr\"{o}dinger)型の確率微分方程式を満たす複雑な確率的過程の枠組みで考えられている。環境と相互にランダムに相互作用する2つの線形結合振動子をQSとして選択する。相互作用が白色ランダム過程の法則に従う場合、QSの統計パラメータとその環境のすべての構成は、二重積分と二階偏微分方程式の解の形で解析的に実行される。時間依存フォン・ノイマンエントロピーとその一般化の表現は、jsで起こる自己組織化と絡み合い過程を考慮して得られる。数学的には、TB における JS の緩和の結果、小さな量子化された環境が形成されることが証明され、これは QS の継続あるいはハローと解釈できる。連成2本の線形発振器の崩壊により形成されたベル状態は、環境の影響を考慮して構成される。 QSの漸近状態の$(in)$と$(out$)への遷移は、TBの影響を考慮して詳細に研究される。モデル問題の枠組みの中で、第一原理から量子熱力学を構築する可能性は、追加条件を使わずに証明される。

The evolution of the joint system (JS) - ``quantum system (QS)+thermal bath (TB)" is considered in the framework of a complex probabilistic processes that satisfies the stochastic differential equation of the Langevin-Schr\"{o}dinger type. Two linearly coupled oscillators that randomly interact with the environment and with each other are selected as QS. In the case when the interactions obey the law of a white random process, all the construction of the statistical parameters of the QS and its environment are performed analytically in the form of double integrals and solutions of second-order partial differential equations. Expressions of time-dependent von Neumann entropy and its generalization are obtained, taking into account the self-organization and entanglement processes occurring in the JS. It is mathematically proved that as a result of the relaxation of JS in the TB, a small quantized environment is formed, which can be interpreted as a continuation of QS or its halo. Bell states formed as a result of the decay of coupled two linear oscillators are constructed taking into account the influence of the environment. The transitions between $(in)$ and $(out$) asymptotic states of QS are studied in detail taking into account the influence of TB. Within the framework of the model problem, the possibility of constructing quantum thermodynamics from the first principle is proved without using any additional conditions.

翻訳日:2023-05-07 12:22:40 公開日:2020-07-31

# 次世代コンピューティングの可能性と課題

Opportunities and Challenges for Next Generation Computing ( http://arxiv.org/abs/2008.00023v1 )

ライセンス: Link先を確認

Gregory D. Hager, Mark D. Hill, and Katherine Yelick

(参考訳) コンピューティングは、ビジネスや農業からコミュニケーション、エンターテイメントに至るまで、私たちの生活のほぼすべての側面を劇的に変えました。国家としては、エネルギー、輸送、防衛のためのシステム設計におけるコンピューティングに依存しており、コンピューティングは世界の根本的な理解を改善し、健康と環境における大きな課題に対するソリューションの開発を支援する科学的発見を促進する。なぜなら、私たちのイノベーションは、過去数十年で性能とコストパフォーマンスが100万倍に向上したコンピュータ上で実行することができるからです。この背景にある推進力はムーアの法則と呼ばれるチップ毎のトランジスタの倍増を繰り返している。デナード・スケーリング(Dennard Scaling)は、これらのパフォーマンスの倍増をほぼ一定のパワーで実現したイネーブルだが、いずれのトレンドも課題に直面している。過去30年間のこの2つのトレンドの影響について考えてみましょう。 1980年代のスーパーコンピュータ(例えばCray 2)は2Gflops近くで評価され、200KW近い電力を消費した。当時、気象予報から核兵器研究まで、高性能で全国規模の用途に使われていた。同じような性能のコンピュータがポケットに収まり、消費電力は10ワット以下になった。つまり、ペタフロロップスケールのマシン(例えば1Pflop(=1015オペレーション/秒)のパフォーマンスに約500KWを必要とするCray XK7)を取り込み、そのプロセスを繰り返すことになる。そんなコンピューターをポケットに入れたら何ができますか。高容量コンピューティングの状況をどのように変えるのか? 本稿では,パーソナル・スケール・コンピューティングと国家規模コンピューティングの双方において,劇的なパフォーマンス向上の機会と課題を明らかにし,この規模のコンピューティングを実現する上での「アウト・オブ・ザ・ボックス」の可能性について論じる。

Computing has dramatically changed nearly every aspect of our lives, from business and agriculture to communication and entertainment. As a nation, we rely on computing in the design of systems for energy, transportation and defense; and computing fuels scientific discoveries that will improve our fundamental understanding of the world and help develop solutions to major challenges in health and the environment. Computing has changed our world, in part, because our innovations can run on computers whose performance and cost-performance has improved a million-fold over the last few decades. A driving force behind this has been a repeated doubling of the transistors per chip, dubbed Moore's Law. A concomitant enabler has been Dennard Scaling that has permitted these performance doublings at roughly constant power, but, as we will see, both trends face challenges. Consider for a moment the impact of these two trends over the past 30 years. A 1980's supercomputer (e.g. a Cray 2) was rated at nearly 2 Gflops and consumed nearly 200 KW of power. At the time, it was used for high performance and national-scale applications ranging from weather forecasting to nuclear weapons research. A computer of similar performance now fits in our pocket and consumes less than 10 watts. What would be the implications of a similar computing/power reduction over the next 30 years - that is, taking a petaflop-scale machine (e.g. the Cray XK7 which requires about 500 KW for 1 Pflop (=1015 operations/sec) performance) and repeating that process? What is possible with such a computer in your pocket? How would it change the landscape of high capacity computing? In the remainder of this paper, we articulate some opportunities and challenges for dramatic performance improvements of both personal to national scale computing, and discuss some "out of the box" possibilities for achieving computing at this scale.

翻訳日:2023-05-07 12:22:15 公開日:2020-07-31

# モノのインターネットのトレンドの加速による安全、セキュリティ、プライバシの脅威

Safety, Security, and Privacy Threats Posed by Accelerating Trends in the Internet of Things ( http://arxiv.org/abs/2008.00017v1 )

ライセンス: Link先を確認

Kevin Fu, Tadayoshi Kohno, Daniel Lopresti, Elizabeth Mynatt, Klara Nahrstedt, Shwetak Patel, Debra Richardson, and Ben Zorn

(参考訳) IoT(Internet of Things)はすでに、産業や都市、家庭を変革している。全ての産業におけるこの変革の経済的価値は1兆ドルと見積もられ、エネルギー効率、健康、生産性に対する社会的影響は巨大である。相互接続されたスマートデバイスの潜在的な利点は、あらゆるデバイスにセンサーとインテリジェンスを埋め込む際のリスクと悪用の可能性を高める。 iotデバイスの増加に関する主要な問題のひとつは、安全かつ安全に運用するために必要となる複雑さの増加である。この複雑さの増加によって、新しい安全性、セキュリティ、プライバシ、ユーザビリティの課題は、個人が1つのデバイスを保護するだけで直面する困難な課題をはるかに超えます。スマートデバイスやデバイスの集合が引き起こす負の傾向に注目し,セキュリティや物理的安全性,プライバシ,ユーザビリティなどに関わる問題は厳密に相互接続され,4つすべてを同時に対処するソリューションが必要であると主張する。既存の技術に基づく個々のデバイスに対する厳密な安全性とセキュリティ基準が必要である。同様に、個人がデバイスのコレクションを確実に管理する最良の方法を決定する研究は、このようなシステムの今後の展開を導く必要がある。

The Internet of Things (IoT) is already transforming industries, cities, and homes. The economic value of this transformation across all industries is estimated to be trillions of dollars and the societal impact on energy efficiency, health, and productivity are enormous. Alongside potential benefits of interconnected smart devices comes increased risk and potential for abuse when embedding sensing and intelligence into every device. One of the core problems with the increasing number of IoT devices is the increased complexity that is required to operate them safely and securely. This increased complexity creates new safety, security, privacy, and usability challenges far beyond the difficult challenges individuals face just securing a single device. We highlight some of the negative trends that smart devices and collections of devices cause and we argue that issues related to security, physical safety, privacy, and usability are tightly interconnected and solutions that address all four simultaneously are needed. Tight safety and security standards for individual devices based on existing technology are needed. Likewise research that determines the best way for individuals to confidently manage collections of devices must guide the future deployments of such systems.

翻訳日:2023-05-07 12:21:44 公開日:2020-07-31

# side-tuning: 追加サイドネットワークによるネットワーク適応のためのベースライン

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks ( http://arxiv.org/abs/1912.13503v4 )

ライセンス: Link先を確認

Jeffrey O Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

(参考訳) 望ましいタスクのためにニューラルネットワークをトレーニングする場合、ランダムに初期化された重みから始めるよりも、トレーニング済みのネットワークに適応する方がよい。適応性は、トレーニングデータが不足している場合、単一の学習者が複数のタスクを実行する必要がある場合、あるいはネットワークで事前をエンコードしたい場合に役立つ。ネットワーク適応のための最も一般的なアプローチは、微調整と固定特徴抽出器として事前訓練されたネットワークの利用である。本稿では,その代替案であるサイドチューニングを提案する。サイドチューニングは、(変更されていない)事前トレーニングされたネットワークと総和で融合した軽量な"サイド"ネットワークをトレーニングすることで、事前トレーニングされたネットワークに適応する。この単純な方法は、既存のソリューションと同等かそれ以上に機能し、微調整、固定機能、その他の一般的なアプローチに関する基本的な問題を解決します。特に、サイドチューニングは過度に適合しにくく、漸近的に一貫性があり、漸進的な学習における破滅的な忘れに苦しむことはない。本研究では,インクリメンタル学習 (icifar, itaskonomy),強化学習,模倣学習 (visual navigation in habitat), nlp質問応答学習 (squad v2) およびシングルタスク伝達学習 (taskonomy) など,様々なシナリオにおけるサイドチューニングの性能を示す。

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights. Adaptation can be useful in cases when training data is scarce, when a single learner needs to perform multiple tasks, or when one wishes to encode priors in the network. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. In this paper, we propose a straightforward alternative: side-tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network via summation. This simple method works as well as or better than existing solutions and it resolves some of the basic issues with fine-tuning, fixed features, and other common approaches. In particular, side-tuning is less prone to overfitting, is asymptotically consistent, and does not suffer from catastrophic forgetting in incremental learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including incremental learning (iCIFAR, iTaskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.

翻訳日:2023-01-16 20:07:19 公開日:2020-07-31

# 非同期イベントベースデータのための微分可能なリカレントサーフェス

A Differentiable Recurrent Surface for Asynchronous Event-Based Data ( http://arxiv.org/abs/2001.03455v2 )

ライセンス: Link先を確認

Marco Cannici, Marco Ciccone, Andrea Romanoni, Matteo Matteucci

(参考訳) dynamic vision sensor (dvss) 輝度変化の対象となるピクセルに対応するイベントを非同期にストリームする。古典的な視覚装置とは異なり、シーンの粗い表現を生成する。したがって、標準的なコンピュータビジョンアルゴリズムを適用するには、イベントをフレームやイベントサーフェスに統合する必要がある。これは通常、余分なヒューリスティックを用いてフレームを再構築する手作りのグリッドによって達成される。本稿では,イベントを効率的に処理し,エンドツーエンドのタスク依存型イベントサーフェスを学ぶための,lstm(long short-term memory)セルのグリッドであるmatrix-lstmを提案する。既存の再構成手法と比較すると,MVSECベンチマークでは光学フロー推定の柔軟性や表現性が向上し,N-Carsデータセット上でのイベントベースオブジェクト分類の最先端性が改善されている。

Dynamic Vision Sensors (DVSs) asynchronously stream events in correspondence of pixels subject to brightness changes. Differently from classic vision devices, they produce a sparse representation of the scene. Therefore, to apply standard computer vision algorithms, events need to be integrated into a frame or event-surface. This is usually attained through hand-crafted grids that reconstruct the frame using ad-hoc heuristics. In this paper, we propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently process events and learn end-to-end task-dependent event-surfaces. Compared to existing reconstruction approaches, our learned event-surface shows good flexibility and expressiveness on optical flow estimation on the MVSEC benchmark and it improves the state-of-the-art of event-based object classification on the N-Cars dataset.

翻訳日:2023-01-12 23:14:50 公開日:2020-07-31

# 機械学習モデルに何を尋ねますか? 人文モデル対話に基づくモデル記述のためのユーザニーズの同定

What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations ( http://arxiv.org/abs/2002.05674v3 )

ライセンス: Link先を確認

Micha{\l} Ku\'zba, Przemys{\l}aw Biecek

(参考訳) 最近、eXplainable Artificial Intelligenceの分野では、メソッドが増えている。驚いたことに、彼らの開発は、エンドユーザーのニーズの研究ではなく、モデル開発者によって進められます。ニーズの分析は、完了すれば、オープンな質問の研究ではなく、A/Bテストの形式を取る。人間のオペレータはmlモデルに何を尋ねるのか?」という問いに答えるために,予測モデルの決定を説明する会話システムを提案する。本研究では,タイタニック号の生存確率を予測するための機械学習モデルについて,Dr_antというチャットボットを開発した。モデルのさまざまな側面についてDr_ant氏と話し、予測の背後にある根拠を理解することができます。 1000以上の対話のコーパスを収集し、ユーザが聞きたい最も一般的なタイプの質問を分析します。我々の知る限り、これは人間の操作者のニーズを予測モデルの対話的かつ反復的な対話探索から収集する会話システムを用いた最初の研究である。

Recently we see a rising number of methods in the field of eXplainable Artificial Intelligence. To our surprise, their development is driven by model developers rather than a study of needs for human end users. The analysis of needs, if done, takes the form of an A/B test rather than a study of open questions. To answer the question "What would a human operator like to ask the ML model?" we propose a conversational system explaining decisions of the predictive model. In this experiment, we developed a chatbot called dr_ant to talk about machine learning model trained to predict survival odds on Titanic. People can talk with dr_ant about different aspects of the model to understand the rationale behind its predictions. Having collected a corpus of 1000+ dialogues, we analyse the most common types of questions that users would like to ask. To our knowledge, it is the first study which uses a conversational system to collect the needs of human operators from the interactive and iterative dialogue explorations of a predictive model.

翻訳日:2023-01-03 03:32:56 公開日:2020-07-31

# ノイズブレーカー:ノイズ解析で導かれるグラデーショナルイメージ

NoiseBreaker: Gradual Image Denoising Guided by Noise Analysis ( http://arxiv.org/abs/2002.07487v2 )

ライセンス: Link先を確認

Florian Lemarchand, Erwan Nogues and Maxime Pelcat

(参考訳) 完全な教師付きディープラーニングベースのデノイザは現在、最もパフォーマンスの高いイメージデノイザソリューションである。しかし、それらはきれいな参照画像を必要とする。対象雑音が複雑である場合、例えば、未知の一次雑音と未知の強度の混合からなる場合、完全な教師付き解は問題に適したトレーニングセットを構築することの困難さによって制限される。本稿では,画像中の支配ノイズを反復的に検出し,調整されたデノイザーを用いて除去する漸進的デノイジング戦略を提案する。この手法は混合雑音に対する美術ブラインドデノイザーの状態に追従することを示す。さらに, ノイズ解析により, ノイズの種類だけでなく, 騒音強度も効率的に誘導できることを示した。この手法は、遭遇した雑音の性質についての洞察を提供し、既存のデノイザーを新しいノイズの性質で拡張することができる。この特徴により、様々なデノイジングケースに適応する。

Fully supervised deep-learning based denoisers are currently the most performing image denoising solutions. However, they require clean reference images. When the target noise is complex, e.g. composed of an unknown mixture of primary noises with unknown intensity, fully supervised solutions are limited by the difficulty to build a suited training set for the problem. This paper proposes a gradual denoising strategy that iteratively detects the dominating noise in an image, and removes it using a tailored denoiser. The method is shown to keep up with state of the art blind denoisers on mixture noises. Moreover, noise analysis is demonstrated to guide denoisers efficiently not only on noise type, but also on noise intensity. The method provides an insight on the nature of the encountered noise, and it makes it possible to extend an existing denoiser with new noise nature. This feature makes the method adaptive to varied denoising cases.

翻訳日:2022-12-30 20:26:30 公開日:2020-07-31

# 実世界の人間-ロボット協調強化学習

Real-World Human-Robot Collaborative Reinforcement Learning ( http://arxiv.org/abs/2003.01156v2 )

ライセンス: Link先を確認

Ali Shafti, Jonas Tjomsland, William Dudley and A. Aldo Faisal

(参考訳) 人間とインテリジェントロボット(embodied ai)の現実世界における直感的なコラボレーションは、ロボット工学の多くの望ましい応用にとって必須の目的である。明示的なコミュニケーションに関する多くの研究があるが、人間とロボットがどのように暗黙的に相互作用するか、運動適応レベルに焦点を当てている。本研究では,2つの直交軸の回転に動作を制限し,各軸を1人のプレイヤーに割り当てることにより,人間ロボット協調迷路ゲームの現実的な構成について述べる。この結果、人間もエージェントも自分でゲームを解くことはできない。我々は,ロボットエージェントの制御に深層強化学習を用い,実世界のプレイの30分以内に,いかなる事前学習も行わない結果を得る。次に、この設定を用いて、協調ゲームのためのポリシーを共同学習する際に、人間/エージェントの行動と適応に関する体系的な実験を行う。本研究では,人間とロボットエージェント間の時間的相互政治学習の結果を提示し,各参加者のエージェントがゲームプレイの表現として機能することを示す。これにより、エージェントのポリシーと自身のエージェントのポリシーを比較して、エージェントが自分自身と異なるエージェントと遊ぶ場合の成功を関連付けることができます。

The intuitive collaboration of humans and intelligent robots (embodied AI) in the real-world is an essential objective for many desirable applications of robotics. Whilst there is much research regarding explicit communication, we focus on how humans and robots interact implicitly, on motor adaptation level. We present a real-world setup of a human-robot collaborative maze game, designed to be non-trivial and only solvable through collaboration, by limiting the actions to rotations of two orthogonal axes, and assigning each axes to one player. This results in neither the human nor the agent being able to solve the game on their own. We use deep reinforcement learning for the control of the robotic agent, and achieve results within 30 minutes of real-world play, without any type of pre-training. We then use this setup to perform systematic experiments on human/agent behaviour and adaptation when co-learning a policy for the collaborative game. We present results on how co-policy learning occurs over time between the human and the robotic agent resulting in each participant's agent serving as a representation of how they would play the game. This allows us to relate a person's success when playing with different agents than their own, by comparing the policy of the agent with that of their own agent.

翻訳日:2022-12-27 04:40:25 公開日:2020-07-31

# ニューロモルフィックハードウェアを用いたエネルギー効率の高いマップレスナビゲーションのための深層・スパイキングニューラルネットワークの強化共学習

Reinforcement co-Learning of Deep and Spiking Neural Networks for Energy-Efficient Mapless Navigation with Neuromorphic Hardware ( http://arxiv.org/abs/2003.01157v2 )

ライセンス: Link先を確認

Guangzhi Tang, Neelesh Kumar, Konstantinos P. Michmizos

(参考訳) エネルギー効率の良いマップレスナビゲーションは、限られたオンボードリソースで未知の環境を探索するモバイルロボットにとって不可欠である。近年の深部強化学習(DRL)アプローチはナビゲーションに成功しているが、その高エネルギー消費はいくつかのロボットアプリケーションでの使用を制限する。本稿では、スパイクニューラルネットワークのエネルギー効率とDRLの最適性を組み合わせたニューロモルフィックなアプローチを提案し、それをマップレスナビゲーションの学習制御ポリシーでベンチマークする。我々のハイブリッド・フレームワークである深層決定主義的政策勾配(SDDPG)は、スパイキングアクターネットワーク(SAN)と深い批判ネットワークから構成されており、この2つのネットワークは勾配降下を用いて共同で訓練されている。共同学習は、2つのネットワーク間の相乗的情報交換を可能にし、共有表現学習を通じて相互の制限を克服した。アプローチを評価するため、トレーニング済みのSANをIntelのLoihiニューロモルフィックプロセッサにデプロイした。シミュレーションおよび実世界の複雑な環境において,本手法はJetson TX2のDDPGと比較して75倍のエネルギーを消費し,目標への航法成功率も1%から4.2%に向上した。これらの結果は、ニューロモルフィックハードウェアで自律ロボットを制御する脳に触発されたアルゴリズムを設計するための継続的な取り組みを強化する。

Energy-efficient mapless navigation is crucial for mobile robots as they explore unknown environments with limited on-board resources. Although the recent deep reinforcement learning (DRL) approaches have been successfully applied to navigation, their high energy consumption limits their use in several robotic applications. Here, we propose a neuromorphic approach that combines the energy-efficiency of spiking neural networks with the optimality of DRL and benchmark it in learning control policies for mapless navigation. Our hybrid framework, spiking deep deterministic policy gradient (SDDPG), consists of a spiking actor network (SAN) and a deep critic network, where the two networks were trained jointly using gradient descent. The co-learning enabled synergistic information exchange between the two networks, allowing them to overcome each other's limitations through a shared representation learning. To evaluate our approach, we deployed the trained SAN on Intel's Loihi neuromorphic processor. When validated on simulated and real-world complex environments, our method on Loihi consumed 75 times less energy per inference as compared to DDPG on Jetson TX2, and also exhibited a higher rate of successful navigation to the goal, which ranged from 1% to 4.2% and depended on the forward-propagation timestep size. These results reinforce our ongoing efforts to design brain-inspired algorithms for controlling autonomous robots with neuromorphic hardware.

翻訳日:2022-12-27 04:14:33 公開日:2020-07-31

# RGBDフィデューシャルセンシングとリカレントニューラルネットワークを用いた効率的なケーブル駆動手術ロボット

Efficiently Calibrating Cable-Driven Surgical Robots with RGBD Fiducial Sensing and Recurrent Neural Networks ( http://arxiv.org/abs/2003.08520v4 )

ライセンス: Link先を確認

Minho Hwang, Brijen Thananjeyan, Samuel Paradis, Daniel Seita, Jeffrey Ichnowski, Danyal Fer, Thomas Low, and Ken Goldberg

(参考訳) Intuitive surgery's da Vinci Research Kit (dVRK) のようなケーブル駆動型手術補助装置(RSA)を用いた手術用サブタスクの自動化は,ケーブルストレッチやヒステリシスなどのケーブル関連効果の制御が困難である。 rgbdセンシングで追跡するエンドエフェクタと腕に3dプリントされたフィドゥクアル座標フレームを配置することで、ロボットを効率的に校正する新しい手法を提案する。関節間の結合と履歴に依存した効果を計測するために, サンプル軌跡からのデータを解析し, モデリングへの13のアプローチを検討する。これらのモデルには線形回帰とlstmリカレントニューラルネットワークが含まれており、それぞれ時間的ウィンドウ長が異なり、補償的フィードバックを提供する。提案手法では,1800試料のデータ収集に31分,モデルトレーニングに1分以下を要した。基準軌道の試験セットの結果は、トレーニングされたモデルが物理ロボットの平均追尾誤差を2.96mmから0.65mmに低減できることを示している。 fls pegトランスファー外科医訓練タスクのオープンループ軌道の実行結果から、最良のモデルは成功率を39.4 %から96.7 %に増加させ、熟練した外科患者に匹敵する性能をもたらすことが示唆された。コードや3Dプリント可能なモデルを含む補助材料はhttps://sites.google.com/berkeley.edu/surgical-calibrationで入手できる。

Automation of surgical subtasks using cable-driven robotic surgical assistants (RSAs) such as Intuitive Surgical's da Vinci Research Kit (dVRK) is challenging due to imprecision in control from cable-related effects such as cable stretching and hysteresis. We propose a novel approach to efficiently calibrate such robots by placing a 3D printed fiducial coordinate frames on the arm and end-effector that is tracked using RGBD sensing. To measure the coupling and history-dependent effects between joints, we analyze data from sampled trajectories and consider 13 approaches to modeling. These models include linear regression and LSTM recurrent neural networks, each with varying temporal window length to provide compensatory feedback. With the proposed method, data collection of 1800 samples takes 31 minutes and model training takes under 1 minute. Results on a test set of reference trajectories suggest that the trained model can reduce the mean tracking error of the physical robot from 2.96 mm to 0.65 mm. Results on the execution of open-loop trajectories of the FLS peg transfer surgeon training task suggest that the best model increases success rate from 39.4 % to 96.7 %, producing performance comparable to that of an expert surgical resident. Supplementary materials, including code and 3D-printable models, are available at https://sites.google.com/berkeley.edu/surgical-calibration

翻訳日:2022-12-22 04:42:02 公開日:2020-07-31

# テスト時のコスト効率の高い機能獲得による異種間意思決定支援

Peri-Diagnostic Decision Support Through Cost-Efficient Feature Acquisition at Test-Time ( http://arxiv.org/abs/2003.14127v2 )

ライセンス: Link先を確認

Gerome Vivar, Kamilia Mullakaeva, Andreas Zwergal, Nassir Navab, and Seyed-Ahmad Ahmadi

(参考訳) 医学におけるコンピュータ支援診断(CADx)アルゴリズムは、医師に患者固有の意思決定支援を提供する。これらのアルゴリズムは通常、高次元マルチモーダル検査データの完全取得後に適用され、しばしば特徴完全性を仮定する。しかし、検査コスト、侵襲性、または徴候の欠如により、このようなケースはめったにない。 CADxのサブプロブレムは,これまでにCADxコミュニティでほとんど注目されていないが,取得段階を含む診断ワークフロー全体において,医師を指導することを目的としている。我々は、医師の視点から「これまで収集された証拠を収集し、最も正確で効率的な診断予測を達成するために、次にどの検査を行うべきか」という質問をモデル化した。本研究では,入力層でのドロップアウトの利用と,テスト時にトレーニングされたネットワークの勾配の統合により,機能の重要性を動的に属性づけする手法を提案する。 2つの公衆医療と2つの合成データセットを用いて,提案手法の有効性を検証する。その結果,提案手法は従来手法よりもコスト効率が高く,全体の精度も高いことがわかった。これは直接的に、患者にとって不要な検査を減らし、医師にとってより早く、よりコストが低く、より正確な意思決定支援に繋がる。

Computer-aided diagnosis (CADx) algorithms in medicine provide patient-specific decision support for physicians. These algorithms are usually applied after full acquisition of high-dimensional multimodal examination data, and often assume feature-completeness. This, however, is rarely the case due to examination costs, invasiveness, or a lack of indication. A sub-problem in CADx, which to our knowledge has received very little attention among the CADx community so far, is to guide the physician during the entire peri-diagnostic workflow, including the acquisition stage. We model the following question, asked from a physician's perspective: "Given the evidence collected so far, which examination should I perform next, in order to achieve the most accurate and efficient diagnostic prediction?". In this work, we propose a novel approach which is enticingly simple: use dropout at the input layer, and integrated gradients of the trained network at test-time to attribute feature importance dynamically. We validate and explain the effectiveness of our proposed approach using two public medical and two synthetic datasets. Results show that our proposed approach is more cost- and feature-efficient than prior approaches and achieves a higher overall accuracy. This directly translates to less unnecessary examinations for patients, and a quicker, less costly and more accurate decision support for the physician.

翻訳日:2022-12-18 00:22:58 公開日:2020-07-31

# 2D-3Dライン対応付き先行LiDARマップにおける単眼カメラの定位

Monocular Camera Localization in Prior LiDAR Maps with 2D-3D Line Correspondences ( http://arxiv.org/abs/2004.00740v2 )

ライセンス: Link先を確認

Huai Yu, Weikun Zhen, Wen Yang, Ji Zhang, Sebastian Scherer

(参考訳) 既存の地図における軽量カメラのローカライゼーションは、視覚ベースのナビゲーションに不可欠である。現在、視覚および視覚慣性オドメトリ(vo\&vio)技術は状態推定のためによく開発されているが、ループ閉包時に必然的に蓄積されたドリフトとポーズジャンプがある。これらの問題を解決するために,直接2D-3D線対応を用いた先行LiDARマップにおける効率的な単眼カメラのローカライズ手法を提案する。画像とLiDAR点雲の出現差とモダリティギャップに対処するため,LDARマップから幾何学的3D線をオフラインに抽出し,ビデオシーケンスからロバストな2D線をオンライン抽出する。 VIOからのポーズ予測により、粗い2D-3D線対応を効率的に得ることができる。次に、カメラポーズと2D-3D対応を、対応の投影誤差を最小化し、出力を拒否することで繰り返し最適化する。 eurocmavデータセットと収集したデータセットにおける実験結果から,提案手法は,構造化された環境でのドリフトやジャンプを蓄積することなく,効率的にカメラポーズを推定できることが示されている。

Light-weight camera localization in existing maps is essential for vision-based navigation. Currently, visual and visual-inertial odometry (VO\&VIO) techniques are well-developed for state estimation but with inevitable accumulated drifts and pose jumps upon loop closure. To overcome these problems, we propose an efficient monocular camera localization method in prior LiDAR maps using direct 2D-3D line correspondences. To handle the appearance differences and modality gaps between LiDAR point clouds and images, geometric 3D lines are extracted offline from LiDAR maps while robust 2D lines are extracted online from video sequences. With the pose prediction from VIO, we can efficiently obtain coarse 2D-3D line correspondences. Then the camera poses and 2D-3D correspondences are iteratively optimized by minimizing the projection error of correspondences and rejecting outliers. Experimental results on the EurocMav dataset and our collected dataset demonstrate that the proposed method can efficiently estimate camera poses without accumulated drifts or pose jumps in structured environments.

翻訳日:2022-12-17 19:13:00 公開日:2020-07-31

# 変形を考慮した3次元モデル埋め込みと検索

Deformation-Aware 3D Model Embedding and Retrieval ( http://arxiv.org/abs/2004.01228v3 )

ライセンス: Link先を確認

Mikaela Angelina Uy and Jingwei Huang and Minhyuk Sung and Tolga Birdal and Leonidas Guibas

(参考訳) 本稿では,与えられた問合せ形状に変形可能な3次元モデルの検索問題を導入し,この検索課題を解決するための新しい深部変形認識埋め込みを提案する。 3Dモデル検索は、ノイズと部分的な3Dスキャンからクリーンで完全な3Dモデルを復元するための基本的な操作である。しかし、3次元形状の有限集合を考えると、クエリに最も近いモデルでさえ満足できないかもしれない。これにより、検索したモデルに適合するように3次元モデル変形技術を適用する動機付けとなる。しかし、多くの3次元変形技術では、元のモデルの重要な特徴を保存し、クエリへの変形モデルの完全適合を防止するために、一定の制限が課されている。この変形モデルとクエリ間のギャップは、典型的なメトリック学習技術では扱えないモデル間の非対称な関係を誘導する。そこで本研究では,位置依存型自我中心距離場を利用して非対称な関係を学習する新しい深層埋め込み手法を提案する。また,組込みネットワークを訓練するための2つの戦略を提案する。これらの手法は、合成データと実データの両方で実験において、他のベースラインよりも優れていることを示す。プロジェクトページはhttps://deformscan2cad.github.io/で閲覧できます。

We introduce a new problem of retrieving 3D models that are deformable to a given query shape and present a novel deep deformation-aware embedding to solve this retrieval task. 3D model retrieval is a fundamental operation for recovering a clean and complete 3D model from a noisy and partial 3D scan. However, given a finite collection of 3D shapes, even the closest model to a query may not be satisfactory. This motivates us to apply 3D model deformation techniques to adapt the retrieved model so as to better fit the query. Yet, certain restrictions are enforced in most 3D deformation techniques to preserve important features of the original model that prevent a perfect fitting of the deformed model to the query. This gap between the deformed model and the query induces asymmetric relationships among the models, which cannot be handled by typical metric learning techniques. Thus, to retrieve the best models for fitting, we propose a novel deep embedding approach that learns the asymmetric relationships by leveraging location-dependent egocentric distance fields. We also propose two strategies for training the embedding network. We demonstrate that both of these approaches outperform other baselines in our experiments with both synthetic and real data. Our project page can be found at https://deformscan2cad.github.io/.

翻訳日:2022-12-17 10:14:06 公開日:2020-07-31

# CLARIAHのオントロジー:歴史・言語・メディアの相互運用性を目指して

Ontologies in CLARIAH: Towards Interoperability in History, Language and Media ( http://arxiv.org/abs/2004.02845v2 )

ライセンス: Link先を確認

Albert Mero\~no-Pe\~nuela, Victor de Boer, Marieke van Erp, Richard Zijdeman, Rick Mourits, Willem Melder, Auke Rijpma, Ruben Schalk

(参考訳) デジタル人文科学の最も重要な目標の1つは、研究者に新たな研究課題のためのデータとツールを提供することである。ここでfairの原則は、データが必要な場合に有用なフレームワークを提供する: findable, 様々なソースに散在することが多い; アクセス可能; 一部はオフラインまたはペイウォールの背後にあるのでアクセス可能; 相互運用可能; 標準の知識表現形式と共有語彙を使用する; 適切なライセンスと許可によって再利用する。多様な人文科学領域からのデータの統合は簡単ではなく、「経済の富は18世紀に均等に分配されたか?」「破壊的なメディアイベントを中心に構築された物語は何か?」といった研究課題や、学者の準備段階(データ収集、知識組織、清掃など)を考慮する必要がある。本章では,オランダ国立プロジェクト clariah で開発・統合されたオントロジーとツールについて記述し,パラダイム的データ表現(文体コーパス,構造化データ,マルチメディア)を持つ人文科学(言語学,社会・経済史,メディア研究)の「ピラーズ」という3つの基本領域のデータセットから,これらの問題に対処した。このようなオントロジーとツールを用いて,一般化と再利用性の観点から学んだ教訓を要約する。

One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, either by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework as these state that data needs to be: Findable, as they are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, thus using standard knowledge representation formats and shared vocabularies; and Reusable, through adequate licensing and permissions. Integrating data from diverse humanities domains is not trivial, research questions such as "was economic wealth equally distributed in the 18th century?", or "what are narratives constructed around disruptive media events?") and preparation phases (e.g. data collection, knowledge organisation, cleaning) of scholars need to be taken into account. In this chapter, we describe the ontologies and tools developed and integrated in the Dutch national project CLARIAH to address these issues across datasets from three fundamental domains or "pillars" of the humanities (linguistics, social and economic history, and media studies) that have paradigmatic data representations (textual corpora, structured data, and multimedia). We summarise the lessons learnt from using such ontologies and tools in these domains from a generalisation and reusability perspective.

翻訳日:2022-12-16 07:29:35 公開日:2020-07-31

# never stop learning: ロボット強化学習における微調整の有効性

Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning ( http://arxiv.org/abs/2004.10190v2 )

ライセンス: Link先を確認

Ryan Julian, Benjamin Swanson, Gaurav S. Sukhatme, Sergey Levine, Chelsea Finn, and Karol Hausman

(参考訳) ロボット学習システムの大きな約束の一つは、過ちから学び、絶えず変化する環境に適応できることだ。この可能性にもかかわらず、今日のロボット学習システムのほとんどは、固定ポリシーとしてデプロイされており、デプロイ後に適応されていない。学習済みの振る舞いを、現実世界の新しい環境、オブジェクト、パーセプションに効率的に適応できるだろうか? 本稿では,継続的な適応を促進するロボット学習フレームワークのための手法と実証的証拠を提案する。特に, 背景, 物体形状, 外観, 照明条件, ロボット形態の変化など, オフポリシー強化学習による微調整により, 視覚に基づくロボット操作ポリシを新たなバリエーションに適用する方法を実証する。さらに、この適応はタスクをゼロから学習するために必要なデータの0.2%未満を使用する。私たちは、事前トレーニングされたポリシーを適用するアプローチが、微調整の過程でかなりのパフォーマンス向上につながること、rlによる事前トレーニングが不可欠であることを見出します。また、これらの肯定的な結果は、連続的な学習環境に限られており、新しいタスクの連続したデータを用いて、単一のポリシーの行を繰り返し微調整する。我々の経験的な結論は、シミュレーション操作タスクの実験と、580,000の把持で事前訓練された実際のロボット把持システムに関する52のユニークな微調整実験によって一貫して支持されている。

One of the great promises of robot learning systems is that they will be able to learn from their mistakes and continuously adapt to ever-changing environments. Despite this potential, most of the robot learning systems today are deployed as a fixed policy and they are not being adapted after their deployment. Can we efficiently adapt previously learned behaviors to new environments, objects and percepts in the real world? In this paper, we present a method and empirical evidence towards a robot learning framework that facilitates continuous adaption. In particular, we demonstrate how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning, including changes in background, object shape and appearance, lighting conditions, and robot morphology. Further, this adaptation uses less than 0.2% of the data necessary to learn the task from scratch. We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning, and that pre-training via RL is essential: training from scratch or adapting from supervised ImageNet features are both unsuccessful with such small amounts of data. We also find that these positive results hold in a limited continual learning setting, in which we repeatedly fine-tune a single lineage of policies using data from a succession of new tasks. Our empirical conclusions are consistently supported by experiments on simulated manipulation tasks, and by 52 unique fine-tuning experiments on a real robotic grasping system pre-trained on 580,000 grasps.

翻訳日:2022-12-11 05:55:09 公開日:2020-07-31

# DR-SPAAM:2次元距離データにおける人物検出のための空間的・自己回帰モデル

DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data ( http://arxiv.org/abs/2004.14079v2 )

ライセンス: Link先を確認

Dan Jia, Alexander Hermans, and Bastian Leibe

(参考訳) 2次元LiDARを用いた人物検出は,2次元範囲データの低情報化による課題である。 LiDAR点のばらつきによる問題を緩和するため、現在の最先端手法は複数の過去のスキャンを融合させ、組み合わせたスキャンを用いて検出を行う。このような後方視の融合の欠点は、すべてのスキャンを明示的にアライメントする必要があること、そして必要なアライメント操作によってパイプライン全体のコストが高くなることだ。本稿では,異なるタイミングで取得したスキャンを組み合わせるための代替戦略を用いた人物検出ネットワークを提案する。距離ロバスト空間注意自動回帰モデル (DR-SPAAM) は前方視のパラダイムに従っている。中間機能をbackboneネットワークからテンプレートとして保持し、新しいスキャンが利用可能になったときにテンプレートを更新します。更新されたフィーチャーテンプレートは、現在シーンにいる人を検出するために使用される。 DROWデータセットでは,既存の最先端技術よりも約4倍高速で,専用GPUを搭載したラップトップで87.2FPS,NVIDIA Jetson AGX組み込みGPUで22.6FPSで動作する。 PyTorchと、事前トレーニングされたモデルを含むROSノードでコードをリリースします。

Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the necessary alignment operation makes the whole pipeline more expensive -- often too expensive for real-world applications. In this paper, we propose a person detection network which uses an alternative strategy to combine scans obtained at different times. Our method, Distance Robust SPatial Attention and Auto-regressive Model (DR-SPAAM), follows a forward looking paradigm. It keeps the intermediate features from the backbone network as a template and recurrently updates the template when a new scan becomes available. The updated feature template is in turn used for detecting persons currently in the scene. On the DROW dataset, our method outperforms the existing state-of-the-art, while being approximately four times faster, running at 87.2 FPS on a laptop with a dedicated GPU and at 22.6 FPS on an NVIDIA Jetson AGX embedded GPU. We release our code in PyTorch and a ROS node including pre-trained models.

翻訳日:2022-12-08 14:29:09 公開日:2020-07-31

# 前歯部光コヒーレンス断層撮影における隅角閉鎖緑内障の評価

AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography ( http://arxiv.org/abs/2005.02258v3 )

ライセンス: Link先を確認

Huazhu Fu, Fei Li, Xu Sun, Xingxing Cao, Jingan Liao, Jose Ignacio Orlando, Xing Tao, Yuexiang Li, Shihao Zhang, Mingkui Tan, Chenglang Yuan, Cheng Bian, Ruitao Xie, Jiongcheng Li, Xiaomeng Li, Jing Wang, Le Geng, Panming Li, Huaying Hao, Jiang Liu, Yan Kong, Yongyong Ren, Hrvoje Bogunovic, Xiulan Zhang, Yanwu Xu

(参考訳) アングル閉鎖緑内障(ACG)は開角緑内障よりも攻撃的な疾患であり、前室角度(ACA)の異常な解剖学的構造が眼圧を上昇させ、徐々に緑内障を発症し、最終的には視力障害や盲目を引き起こす。 Anterior Segment Optical Coherence Tomography (AS-OCT) は、開角度から角度閉鎖を識別する高速で接触のない方法を提供する。緑内障の診断のために多くの医用画像解析アルゴリズムが開発されたが、AS-OCTイメージングに焦点を当てた研究はごくわずかである。特に、既存のメソッドを統一的に評価するための公開as-octデータセットは存在せず、アングルクロージャ検出と評価のための自動化技術の開発の進捗を制限している。そこで我々は,MICCAI 2019と共同で開催したAngle closure Glaucoma Evaluation Challenge (AGE)を組織した。年齢課題は, 強膜刺激の局在と角度閉鎖分類の2つの課題から成っていた。そこで我々は199人の患者から4800個の注釈付きAS-OCT画像の大規模なデータセットを公開し、異なるモデルをベンチマークし比較するための評価フレームワークを提案した。 AGEチャレンジでは200以上のチームがオンラインに登録され、1100以上の結果がオンライン評価のために提出された。最終的に8チームがオンサイトチャレンジに参加した。本稿では,これらの8つの課題を要約し,その2つの課題に対して対応する結果を解析する。我々はさらに制限と今後の方向性について議論する。 AGEチャレンジでは,最高性能のユークリッド距離は平均10ピクセル (10um) であり,角度クロージャ分類のタスクでは,全てのアルゴリズムが良好な性能を達成し,100%の精度を得ることができた。

Angle closure glaucoma (ACG) is a more aggressive disease than open-angle glaucoma, where the abnormal anatomical structures of the anterior chamber angle (ACA) may cause an elevated intraocular pressure and gradually lead to glaucomatous optic neuropathy and eventually to visual impairment and blindness. Anterior Segment Optical Coherence Tomography (AS-OCT) imaging provides a fast and contactless way to discriminate angle closure from open angle. Although many medical image analysis algorithms have been developed for glaucoma diagnosis, only a few studies have focused on AS-OCT imaging. In particular, there is no public AS-OCT dataset available for evaluating the existing methods in a uniform way, which limits progress in the development of automated techniques for angle closure detection and assessment. To address this, we organized the Angle closure Glaucoma Evaluation challenge (AGE), held in conjunction with MICCAI 2019. The AGE challenge consisted of two tasks: scleral spur localization and angle closure classification. For this challenge, we released a large dataset of 4800 annotated AS-OCT images from 199 patients, and also proposed an evaluation framework to benchmark and compare different models. During the AGE challenge, over 200 teams registered online, and more than 1100 results were submitted for online evaluation. Finally, eight teams participated in the onsite challenge. In this paper, we summarize these eight onsite challenge methods and analyze their corresponding results for the two tasks. We further discuss limitations and future directions. In the AGE challenge, the top-performing approach had an average Euclidean Distance of 10 pixels (10um) in scleral spur localization, while in the task of angle closure classification, all the algorithms achieved satisfactory performances, with two best obtaining an accuracy rate of 100%.

翻訳日:2022-12-06 14:17:26 公開日:2020-07-31

# 自動マルチラベル分類法のロバストな実験評価

A Robust Experimental Evaluation of Automated Multi-Label Classification Methods ( http://arxiv.org/abs/2005.08083v2 )

ライセンス: Link先を確認

Alex G. C. de S\'a, Cristiano G. Pimenta, Gisele L. Pappa and Alex A. Freitas

(参考訳) 機械学習(AutoML)は、与えられた学習タスクに対するアルゴリズムの選択と設定を扱うために登場した。 AutoMLの進歩に伴い、特に従来の分類や回帰問題に対していくつかの効果的な手法が導入された。 AutoMLの成功とは別に、いくつかの問題が未解決のままである。特に問題の一つは、さまざまなタイプのデータを扱うautomlメソッドが欠如していることだ。このシナリオに基づいて,マルチラベル分類(MLC)問題に対するAutoMLにアプローチする。 mlcでは、それぞれの例が複数のクラスラベルに同時に関連付けられるが、標準的な分類タスクとは異なり、例は1つのクラスラベルに関連付けられる。本研究では,14のデータセットと3つの設計された検索空間に対して,2つの進化的手法,1つのベイズ最適化法,1つのランダム探索法,1つのグリージー探索法と5つの自動多ラベル分類法の比較を行った。全体として、最も顕著な方法は、標準文法に基づく遺伝的プログラミング(GGP)探索法、すなわちAuto-MEKA$_{GGP}$に基づくものである。 Auto-MEKA$_{GGP}$は, 比較において最高の平均値を示し, グリーディ探索法と比較した場合を除き, 異なる探索空間における他の手法よりも統計的に優れていた。

Automated Machine Learning (AutoML) has emerged to deal with the selection and configuration of algorithms for a given learning task. With the progression of AutoML, several effective methods were introduced, especially for traditional classification and regression problems. Apart from the AutoML success, several issues remain open. One issue, in particular, is the lack of ability of AutoML methods to deal with different types of data. Based on this scenario, this paper approaches AutoML for multi-label classification (MLC) problems. In MLC, each example can be simultaneously associated to several class labels, unlike the standard classification task, where an example is associated to just one class label. In this work, we provide a general comparison of five automated multi-label classification methods -- two evolutionary methods, one Bayesian optimization method, one random search and one greedy search -- on 14 datasets and three designed search spaces. Overall, we observe that the most prominent method is the one based on a canonical grammar-based genetic programming (GGP) search method, namely Auto-MEKA$_{GGP}$. Auto-MEKA$_{GGP}$ presented the best average results in our comparison and was statistically better than all the other methods in different search spaces and evaluated measures, except when compared to the greedy search method.

翻訳日:2022-12-02 12:41:17 公開日:2020-07-31

# 超低リソース自動音声認識のための生成型adversarial training data adaptation

Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition ( http://arxiv.org/abs/2005.09256v2 )

ライセンス: Link先を確認

Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

(参考訳) 言語文化の遺産を保存するために、絶滅危惧言語の音声データを書き起こし、アーカイブすることが重要であり、自動音声認識(asr)はこのプロセスを容易にする強力なツールである。しかし、絶滅危惧言語は一般に多くの話者を持つ大きなコーパスを持たないため、訓練されたASRモデルの性能は概してかなり劣っている。それでも、書き起こさなければならない自発的な音声データの多くの記録が残されていることが多い。本研究では,この話者スパーシティ問題を解決するために,学習音声データ全体を変換し,テスト話者のように聞こえるようにし,高精度なasrシステムを構築することを提案する。本研究では,CycleGANをベースとした非並列音声変換技術を用いて,テスト話者の音声に近いラベル付きトレーニングデータを構築する。 AinuとMboshiの2つの低リソースコーパスに対する話者適応手法の評価を行った。 Ainu corpusの電話誤り率を35-60%改善し,Mboshi corpusでは40%改善した。このアプローチは、教師なし適応とこれら2つのコーパスによる多言語訓練という、2つの従来の手法よりも優れていた。

It is important to transcribe and archive speech data of endangered languages for preserving heritages of verbal culture and automatic speech recognition (ASR) is a powerful tool to facilitate this process. However, since endangered languages do not generally have large corpora with many speakers, the performance of ASR models trained on them are considerably poor in general. Nevertheless, we are often left with a lot of recordings of spontaneous speech data that have to be transcribed. In this work, for mitigating this speaker sparsity problem, we propose to convert the whole training speech data and make it sound like the test speaker in order to develop a highly accurate ASR system for this speaker. For this purpose, we utilize a CycleGAN-based non-parallel voice conversion technology to forge a labeled training data that is close to the test speaker's speech. We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi. We obtained 35-60% relative improvement in phone error rate on the Ainu corpus, and 40% relative improvement was attained on the Mboshi corpus. This approach outperformed two conventional methods namely unsupervised adaptation and multilingual training with these two corpora.

翻訳日:2022-12-01 14:15:11 公開日:2020-07-31

# 振動データを用いた回転軸の機械学習による不均衡検出

Machine Learning-Based Unbalance Detection of a Rotating Shaft Using Vibration Data ( http://arxiv.org/abs/2005.12742v3 )

ライセンス: Link先を確認

Oliver Mey, Willi Neudeck, Andr\'e Schneider and Olaf Enge-Rosenblatt

(参考訳) 振動センサによる回転機械の故障検出は、早期に機械の損傷を検知し、適切な対策を講じて生産停止を防ぐことができる。機械学習を用いた振動データの解析は、関連する分析労力の大幅な削減と、さらなる診断精度の向上を約束する。ここでは、アンバランス検出のためのアルゴリズムの開発と評価の基礎として使用されるデータセットを公開する。この目的のために3Dプリントホルダを用いて回転軸に様々な大きさのアンバランスを取付けた。速度は近似から近似まで。 630 RPMから2330 RPMの3つのセンサを用いて, 回転軸の振動を毎秒4096値のサンプリング速度で記録した。不均衡強度ごとに開発と評価データセットが利用可能である。このように記録されたデータセットを用いて、完全に接続された畳み込みニューラルネットワーク、Hidden Markov ModelsおよびRandom Forest分類を自動抽出時系列特徴に基づいてテストした。評価データセット上で98.6 %の予測精度で、スケールしたfft変換振動データを入力として受信する完全接続ニューラルネットワークを用いて、最良の結果が得られる。

Fault detection at rotating machinery with the help of vibration sensors offers the possibility to detect damage to machines at an early stage and to prevent production downtimes by taking appropriate measures. The analysis of the vibration data using methods of machine learning promises a significant reduction in the associated analysis effort and a further improvement in diagnostic accuracy. Here we publish a dataset which is used as a basis for the development and evaluation of algorithms for unbalance detection. For this purpose, unbalances of various sizes were attached to a rotating shaft using a 3D-printed holder. In a speed range from approx. 630 RPM to 2330 RPM, three sensors were used to record vibrations on the rotating shaft at a sampling rate of 4096 values per second. A development and an evaluation dataset are available for each unbalance strength. Using the dataset recorded in this way, fully connected and convolutional neural networks, Hidden Markov Models and Random Forest classifications on the basis of automatically extracted time series features were tested. With a prediction accuracy of 98.6 % on the evaluation dataset, the best result could be achieved with a fully-connected neural network that receives the scaled FFT-transformed vibration data as input.

翻訳日:2022-11-29 00:50:13 公開日:2020-07-31

# ニューラルネットワークモデルの抽出に対する保護

A Protection against the Extraction of Neural Network Models ( http://arxiv.org/abs/2005.12782v3 )

ライセンス: Link先を確認

Herv\'e Chabanne and Vincent Despiegel and Linda Guiga

(参考訳) oracleがニューラルネットワーク(nn)にアクセスすると、その基盤となるモデルを抽出することができる。ここでは,基盤となるnnの予測をほとんど変更せず,リバースエンジニアリングのタスクを複雑化させる寄生層を追加することで保護を導入する。提案手法は,畳み込みnnを用いた雑音idマッピングを近似する。新たな寄生層の導入が攻撃を複雑化する理由を説明する。我々は,保護されたNNの性能と精度に関する実験を報告する。

Given oracle access to a Neural Network (NN), it is possible to extract its underlying model. We here introduce a protection by adding parasitic layers which keep the underlying NN's predictions mostly unchanged while complexifying the task of reverse-engineering. Our countermeasure relies on approximating a noisy identity mapping with a Convolutional NN. We explain why the introduction of new parasitic layers complexifies the attacks. We report experiments regarding the performance and the accuracy of the protected NN.

翻訳日:2022-11-28 23:47:14 公開日:2020-07-31

# 時間列回帰と連続正規化流れに対する離散化最適化と最適分散

Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows ( http://arxiv.org/abs/2005.13420v2 )

ライセンス: Link先を確認

Derek Onken and Lars Ruthotto

(参考訳) ニューラルネットワークを用いた時系列回帰と連続正規化フロー(CNF)に対する離散化最適化(Disc-Opt)と最適化分散(Opt-Disc)アプローチの比較を行った。ニューラルODEは、ニューラルネットワーク成分を持つ通常の微分方程式(ODE)である。神経odeのトレーニングは、重みが制御であり、隠れた特徴が状態である最適な制御問題である。トレーニングの各イテレーションでは、odeの前方と後方のタイムを解決し、大量の計算、時間、メモリを必要とする。画像分類タスクにおける Opt-Disc と Disc-Opt のアプローチを比較すると、Gholami et al. (2019) は勾配の精度が保証されているために Disc-Opt が好ましいことを示唆している。本稿では,時系列回帰とCNFの比較を行う。分類とは異なり、これらのタスクにおける有意義なモデルは、CNFの可逆性など、正確な最終的な出力を超える追加の要求を満たす必要がある。数値実験により、注意深い数値処理を行うことで、Opt-Discと同等の性能をトレーニングコストを大幅に削減できることを示した。 Disc-Optはトレーニング時間を39%から97%に減らした7つの問題のうち6つでコストを削減した。

We compare the discretize-optimize (Disc-Opt) and optimize-discretize (Opt-Disc) approaches for time-series regression and continuous normalizing flows (CNFs) using neural ODEs. Neural ODEs are ordinary differential equations (ODEs) with neural network components. Training a neural ODE is an optimal control problem where the weights are the controls and the hidden features are the states. Every training iteration involves solving an ODE forward and another backward in time, which can require large amounts of computation, time, and memory. Comparing the Opt-Disc and Disc-Opt approaches in image classification tasks, Gholami et al. (2019) suggest that Disc-Opt is preferable due to the guaranteed accuracy of gradients. In this paper, we extend the comparison to neural ODEs for time-series regression and CNFs. Unlike in classification, meaningful models in these tasks must also satisfy additional requirements beyond accurate final-time output, e.g., the invertibility of the CNF. Through our numerical experiments, we demonstrate that with careful numerical treatment, Disc-Opt methods can achieve similar performance as Opt-Disc at inference with drastically reduced training costs. Disc-Opt reduced costs in six out of seven separate problems with training time reduction ranging from 39% to 97%, and in one case, Disc-Opt reduced training from nine days to less than one day.

翻訳日:2022-11-28 08:21:21 公開日:2020-07-31

# ロバスト推定と制御のための神経収縮メトリクス:凸最適化アプローチ

Neural Contraction Metrics for Robust Estimation and Control: A Convex Optimization Approach ( http://arxiv.org/abs/2006.04361v3 )

ライセンス: Link先を確認

Hiroyasu Tsukamoto and Soon-Jo Chung

(参考訳) 本稿では,ニューラル収縮メトリック(ncm)の概念を用いたロバストな非線形推定と制御のための新しいディープラーニングフレームワークを提案する。 NCMは、非線形システムの指数的安定性に必要な必要十分条件である最適収縮距離の大域的近似に、ディープ・ロング・短期記憶リカレントニューラルネットワークを使用する。この最適性は、オフラインでサンプリングされた収縮測度が、摂動と摂動系軌道の間の定常ユークリッド距離の上界を最小化するための凸最適化問題の解であることに由来する。そこで本稿では,NCMを用いた非線形システムの最適推定器と制御器の設計について述べる。この枠組みの性能はロレンツ振動子状態推定と宇宙船の最適運動計画問題によって示される。

This paper presents a new deep learning-based framework for robust nonlinear estimation and control using the concept of a Neural Contraction Metric (NCM). The NCM uses a deep long short-term memory recurrent neural network for a global approximation of an optimal contraction metric, the existence of which is a necessary and sufficient condition for exponential stability of nonlinear systems. The optimality stems from the fact that the contraction metrics sampled offline are the solutions of a convex optimization problem to minimize an upper bound of the steady-state Euclidean distance between perturbed and unperturbed system trajectories. We demonstrate how to exploit NCMs to design an online optimal estimator and controller for nonlinear systems with bounded disturbances utilizing their duality. The performance of our framework is illustrated through Lorenz oscillator state estimation and spacecraft optimal motion planning problems.

翻訳日:2022-11-24 01:42:26 公開日:2020-07-31

# ハンティントン病における持続発声からの発声マーカー

Vocal markers from sustained phonation in Huntington's Disease ( http://arxiv.org/abs/2006.05365v3 )

ライセンス: Link先を確認

Rachid Riad and Hadrien Titeux and Laurie Lemoine and Justine Montillot and Jennifer Hamet Bagnou and Xuan Nga Cao and Emmanuel Dupoux and Anne-Catherine Bachoud-L\'evi

(参考訳) 疾患修正治療は現在神経変性疾患で評価されている。ハンティントン病は、前マンニフェスト遺伝子キャリアにおいてさえ、自動的に臨床マーカーを設計するユニークな機会である。音声障害を臨床マーカーとして検討し, 診断と遺伝子担体について検討した。音声特徴と変調パワースペクトル特徴の2つの特徴セットを使用しました。発声は遺伝子キャリアーのサブクリニカル障害の同定には不十分であることがわかった。以上の結果から, ハンティントン病の臨床成績の予測には, 音像の特徴が適していると考えられた。

Disease-modifying treatments are currently assessed in neurodegenerative diseases. Huntington's Disease represents a unique opportunity to design automatic sub-clinical markers, even in premanifest gene carriers. We investigated phonatory impairments as potential clinical markers and propose them for both diagnosis and gene carriers follow-up. We used two sets of features: Phonatory features and Modulation Power Spectrum Features. We found that phonation is not sufficient for the identification of sub-clinical disorders of premanifest gene carriers. According to our regression results, Phonatory features are suitable for the predictions of clinical performance in Huntington's Disease.

翻訳日:2022-11-23 15:19:59 公開日:2020-07-31

# 差別化可能なレンダリング: 調査

Differentiable Rendering: A Survey ( http://arxiv.org/abs/2006.12057v2 )

ライセンス: Link先を確認

Hiroharu Kato, Deniz Beker, Mihai Morariu, Takahiro Ando, Toru Matsuoka, Wadim Kehl, Adrien Gaidon

(参考訳) ディープニューラルネットワーク(DNN)は、オブジェクト検出やイメージセグメンテーションなどの視覚関連タスクにおいて、顕著なパフォーマンス向上を示している。その成功にもかかわらず、通常は画像を形成する3dオブジェクトの理解が欠如しており、シーンに関する3d情報を収集したり、簡単に注釈を付けることは必ずしも不可能である。微分可能レンダリングは、3dオブジェクトの勾配を画像を通して計算し伝播できる新しいフィールドである。また、3Dデータ収集とアノテーションの要求を減らし、様々なアプリケーションで高い成功率を実現する。本稿では,既存の文献を概観し,微分可能レンダリングの現状,応用,オープンリサーチの問題について考察する。

Deep neural networks (DNNs) have shown remarkable performance improvements on vision-related tasks such as object detection or image segmentation. Despite their success, they generally lack the understanding of 3D objects which form the image, as it is not always possible to collect 3D information about the scene or to easily annotate it. Differentiable rendering is a novel field which allows the gradients of 3D objects to be calculated and propagated through images. It also reduces the requirement of 3D data collection and annotation, while enabling higher success rate in various applications. This paper reviews existing literature and discusses the current state of differentiable rendering, its applications and open research problems.

翻訳日:2022-11-18 06:39:56 公開日:2020-07-31

# SRFlow: 正規化フローによる超解法空間の学習

SRFlow: Learning the Super-Resolution Space with Normalizing Flow ( http://arxiv.org/abs/2006.14200v2 )

ライセンス: Link先を確認

Andreas Lugmayr and Martin Danelljan and Luc Van Gool and Radu Timofte

(参考訳) 超解像度は、与えられた低解像度画像の複数の予測を可能にするため、不適切な問題である。この基本的な事実は、最先端のディープラーニングベースのアプローチによって無視されている。これらの手法は、リコンストラクションと敵対的損失の組み合わせを使って決定論的マッピングを訓練する。そこで本研究では,低解像度入力の出力条件分布を学習可能な正規化フローベースの超解像法であるsrflowを提案する。我々のモデルは、単一の損失、すなわち負のログ類似性を用いて原則的に訓練される。したがって、SRFlowは問題の本質を直接的に説明し、多彩なフォトリアリスティックな高解像度画像を予測することを学ぶ。また,srflowが学習した強像後段を柔軟な画像操作技術として活用し,他の画像からのコンテンツの転送などによる超解像の高分解能化を実現する。我々は, 顔や超解像性全般について, 広範囲にわたる実験を行った。 SRFlowは、PSNRと知覚品質指標の両方の観点から、最先端のGANベースのアプローチよりも優れており、超解解の空間を探索することで多様性を実現する。

Super-resolution is an ill-posed problem, since it allows for multiple predictions for a given low-resolution image. This fundamental fact is largely ignored by state-of-the-art deep learning based approaches. These methods instead train a deterministic mapping using combinations of reconstruction and adversarial losses. In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input. Our model is trained in a principled manner using a single loss, namely the negative log-likelihood. SRFlow therefore directly accounts for the ill-posed nature of the problem, and learns to predict diverse photo-realistic high-resolution images. Moreover, we utilize the strong image posterior learned by SRFlow to design flexible image manipulation techniques, capable of enhancing super-resolved images by, e.g., transferring content from other images. We perform extensive experiments on faces, as well as on super-resolution in general. SRFlow outperforms state-of-the-art GAN-based approaches in terms of both PSNR and perceptual quality metrics, while allowing for diversity through the exploration of the space of super-resolved solutions.

翻訳日:2022-11-17 04:24:38 公開日:2020-07-31

# ピアレビューにおける紙入札最適化のためのスーパー*アルゴリズム

A SUPER* Algorithm to Optimize Paper Bidding in Peer Review ( http://arxiv.org/abs/2007.07079v2 )

ライセンス: Link先を確認

Tanner Fiez, Nihar B. Shah, Lillian Ratliff

(参考訳) 多くのアプリケーションがユーザのシーケンシャルな到着を伴い、各ユーザにアイテムの順序を示す必要がある。主な例(この記事の焦点となる)は、レビュー者が順次システムに入り、各レビュアーが提出された論文のリストを表示し、レビュアーがいくつかの論文をレビューするために "bids" する必要があるカンファレンスピアレビューの入札プロセスである。示されている論文の順序は、プライマシー効果による入札に大きな影響を与える。表示すべき論文の順序を決定する際、競合する2つの目標がある。 (i)各紙に十分な数の入札を得ること、及び (二)関連項目を提示して審査員を満足させる。本稿では,この問題を原則的に研究する枠組みの開発から始める。この目的のために,A*アルゴリズムにインスパイアされたSUPER*アルゴリズムを提案する。理論的には、アルゴリズムの局所最適性保証を示し、人気ベースラインがかなり最適であることを示す。さらに, 類似性に関するコミュニティモデルでは, SUPER* がほぼ最適であるのに対して, 人気ベースラインはかなり最適であることを示す。 ICLR 2018の実際のデータと合成データの実験では、SUPER*は既存のシステムにデプロイされたベースラインをかなり上回り、必要な入札を50～75%以下に減らし、さまざまな現実の複雑さにも堅牢であることがわかった。

A number of applications involve sequential arrival of users, and require showing each user an ordering of items. A prime example (which forms the focus of this paper) is the bidding process in conference peer review where reviewers enter the system sequentially, each reviewer needs to be shown the list of submitted papers, and the reviewer then "bids" to review some papers. The order of the papers shown has a significant impact on the bids due to primacy effects. In deciding on the ordering of papers to show, there are two competing goals: (i) obtaining sufficiently many bids for each paper, and (ii) satisfying reviewers by showing them relevant items. In this paper, we begin by developing a framework to study this problem in a principled manner. We present an algorithm called SUPER*, inspired by the A* algorithm, for this goal. Theoretically, we show a local optimality guarantee of our algorithm and prove that popular baselines are considerably suboptimal. Moreover, under a community model for the similarities, we prove that SUPER* is near-optimal whereas the popular baselines are considerably suboptimal. In experiments on real data from ICLR 2018 and synthetic data, we find that SUPER* considerably outperforms baselines deployed in existing systems, consistently reducing the number of papers with fewer than requisite bids by 50-75% or more, and is also robust to various real world complexities.

翻訳日:2022-11-16 07:32:16 公開日:2020-07-31

# メタSAC: メタグラディエントによるソフトアクター・クライトのエントロピー温度の自動調整

Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient ( http://arxiv.org/abs/2007.01932v2 )

ライセンス: Link先を確認

Yufei Wang, Tianwei Ni

(参考訳) 探索探索ジレンマは、強化学習において長い間重要な問題であった。本稿では,これら2つのバランスをとるための新しい手法を提案する。提案手法は,従来のタスク報酬とポリシのエントロピーのバランスをとる「エントロピー温度」を用いて,エクスプロイトと探索のトレードオフを制御するソフトアクタ・クリティカル(SAC)アルゴリズムに基づいている。 SACはこのハイパーパラメータに非常に敏感であることが実証的に示されており、自動調整に制約付き最適化を用いるフォローアップ作業(SAC-v2)にはいくつかの制限がある。提案手法の中核は,SACのエントロピー温度を自動調整するために,メタグラディエントと新しいメタ目的を併用することである。我々は,Meta-SACがいくつかのMujocoベンチマークタスクにおいて有望な性能を達成し,最も困難なタスクの一つであるHumanoid-v2において,SAC-v2を10%以上上回っていることを示す。

Exploration-exploitation dilemma has long been a crucial issue in reinforcement learning. In this paper, we propose a new approach to automatically balance between these two. Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and exploration. It is empirically shown that SAC is very sensitive to this hyperparameter, and the follow-up work (SAC-v2), which uses constrained optimization for automatic adjustment, has some limitations. The core of our method, namely Meta-SAC, is to use metagradient along with a novel meta objective to automatically tune the entropy temperature in SAC. We show that Meta-SAC achieves promising performances on several of the Mujoco benchmarking tasks, and outperforms SAC-v2 over 10% in one of the most challenging tasks, humanoid-v2.

翻訳日:2022-11-14 04:43:20 公開日:2020-07-31

# ModeNet:学習ビデオ符号化のためのモード選択ネットワーク

ModeNet: Mode Selection Network For Learned Video Coding ( http://arxiv.org/abs/2007.02532v2 )

ライセンス: Link先を確認

Th\'eo Ladune (IETR), Pierrick Philippe, Wassim Hamidouche (IETR), Lu Zhang (IETR), Olivier D\'eforges (IETR)

(参考訳) 本稿では,深層学習に基づくビデオ圧縮を強化するため,モード選択ネットワーク(ModeNet)を提案する。従来のビデオコーディングにインスパイアされたModeNetの目的は、いくつかのコーディングモード間の競争を可能にすることである。提案したModeNetは,各ピクセルを最も適した符号化モードに割り当てるために使用されるフレームの画素分割を学習し,伝達する。 modenetは異なるコーディングモードと共に訓練され、レート分散コストを最小限に抑える。これは、異なるコーディングツール間の競合を可能にするために、他のシステムに一般化できる柔軟なコンポーネントである。 Mod-eNetの関心は、Pフレームのコーディングタスクで研究され、予測値からフレームをコーディングする手法の設計に使用される。 modenetベースのシステムは、学習画像圧縮2020(clic20)のpフレーム符号化トラック条件で評価することで、魅力的な性能を達成している。

In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component which can be generalized to other systems to allow competition between different coding tools. Mod-eNet interest is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.

翻訳日:2022-11-13 02:26:27 公開日:2020-07-31

# シーンコンテキストによる長期人間の動作予測

Long-term Human Motion Prediction with Scene Context ( http://arxiv.org/abs/2007.03672v3 )

ライセンス: Link先を確認

Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik

(参考訳) 人間の動きはゴール指向であり、シーン内の物体の空間配置に影響される。将来の人間の動きを計画するためには、環境を知覚することが不可欠だ。人間の動きを予測する既存の研究は、シーンの文脈に注意を払わないため、長期的な予測に苦慮している。本研究では,この課題に対処するためのシーンコンテキストを活用する新しい3段階フレームワークを提案する。 1つのシーンイメージと2Dポーズ履歴が与えられた後、まず複数の人間の動作目標を抽出し、各目標に向けて3Dヒューマンパスを計画し、最終的に各パスに続く3Dヒューマンポーズシーケンスを予測する。安定したトレーニングと厳密な評価のために,クリーンアノテーションを用いた多様な合成データセットを寄贈する。合成データと実データの両方において,本手法は既存の手法に比べて一貫した定量的,定性的な改善を示す。

Human movement is goal-directed and influenced by the spatial layout of the objects in the scene. To plan future human motion, it is crucial to perceive the environment -- imagine how hard it is to navigate a new room with lights off. Existing works on predicting human motion do not pay attention to the scene context and thus struggle in long-term prediction. In this work, we propose a novel three-stage framework that exploits scene context to tackle this task. Given a single scene image and 2D pose histories, our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path. For stable training and rigorous evaluation, we contribute a diverse synthetic dataset with clean annotations. In both synthetic and real datasets, our method shows consistent quantitative and qualitative improvements over existing methods.

翻訳日:2022-11-12 20:08:34 公開日:2020-07-31

# AttentionNAS:ビデオ分類のための時空間注意細胞探索

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification ( http://arxiv.org/abs/2007.12034v2 )

ライセンス: Link先を確認

Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani and Wei Hua

(参考訳) 畳み込み操作には2つの制限がある:(1)同じフィルタを全ての位置に適用する場所を明示的にモデル化しないこと、(2)小さな近傍でのみ動作するような長距離依存性のモデル化には不向きである。どちらの制限も注意操作によって緩和できるが、特にビデオに注意を向ける場合、注意を使用するように多くの設計選択が決定される。ビデオに注意を向ける原理的な方法を目指して,時空間注意細胞探索の課題に対処する。そこで本稿では, セルの様々な設計選択を柔軟に探索できる, 時空間アテンションセルの新しい探索空間を提案する。検出されたアテンションセルは既存のバックボーンネットワーク(例えばI3DやS3D)にシームレスに挿入することができ、Kinetics-600とMiTのデータセットでビデオ分類精度を2%以上改善することができる。検出された注意セルは、両方のデータセット上の非ローカルブロックよりも優れており、異なるモダリティ、バックボーン、データセットにまたがる強力な一般化を示している。注意細胞をI3D-R50に挿入すると、両方のデータセットで最先端のパフォーマンスが得られる。

Convolutional operations have two limitations: (1) do not explicitly model where to focus as the same filter is applied to all the positions, and (2) are unsuitable for modeling long-range dependencies as they only operate on a small neighborhood. While both limitations can be alleviated by attention operations, many design choices remain to be determined to use attention, especially when applying attention to videos. Towards a principled way of applying attention to videos, we address the task of spatiotemporal attention cell search. We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell. The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets. The discovered attention cells outperform non-local blocks on both datasets, and demonstrate strong generalization across different modalities, backbones, and datasets. Inserting our attention cells into I3D-R50 yields state-of-the-art performance on both datasets.

翻訳日:2022-11-07 12:21:16 公開日:2020-07-31

# ディープラーニングモデルに基づく積極的なタスク管理

Proactive Tasks Management based on a Deep Learning Model ( http://arxiv.org/abs/2007.12857v2 )

ライセンス: Link先を確認

Kostas Kolomvatsos, Christos Anagnotopoulos

(参考訳) 広く普及するコンピューティングアプリケーションは、アクティビティを促進するユーザを取り巻くインテリジェンスを扱う。このインテリジェンスは、組み込みシステムやエンドユーザーとの密接な距離にあるデバイスに組み込まれたソフトウェアコンポーネントという形で提供される。 ecノードはiot(internet of things)インフラストラクチャに存在するデバイスが収集したデータに対して,さまざまなタスクを実行することができる。本稿では,要求に基づく知的かつ積極的なタスク管理モデルを提案する。デマンドはECノードで利用可能なタスクの使用に関心のあるユーザやアプリケーションの数を表しており、その人気を特徴づけている。我々は、Deep Machine Learning(DML)モデル、特にLong Short Term Memory(LSTM)ネットワークを利用して、各タスクに対する需要指標の分布を学習し、将来の関心を見積もる。この情報は歴史的な観察と組み合わされ、限られた関心のためにどのタスクがオフロードされるかを決定する意思決定スキームをサポートする。意思決定において、各タスクが割り当てられる処理ノードに追加される可能性のある負荷も考慮に入れておく必要があります。本モデルには,提案機構を評価するための実験シミュレーションが多数添付されている。提案手法は,最も効率的な割当を結論づけつつ,飛べば決定できることを示す。

Pervasive computing applications deal with intelligence surrounding users that can facilitate their activities. This intelligence is provided in the form of software components incorporated in embedded systems or devices in close distance with end users.One example infrastructure that can host intelligent pervasive services is the Edge Computing (EC) infrastructure. EC nodes can execute a number of tasks for data collected by devices present in the Internet of Things (IoT) infrastructure. In this paper, we propose an intelligent, proactive tasks management model based on the demand. Demand depicts the number of users or applications interested in using the available tasks in EC nodes, thus, characterizing their popularity. We rely on a Deep Machine Learning (DML) model and more specifically on a Long Short Term Memory (LSTM) network to learn the distribution of demand indicators for each task and estimate the future interest. This information is combined with historical observations and support a decision making scheme to conclude which tasks will be offloaded due to limited interest on them. We have to notice that in our decision making, we also take into consideration the load that every task may add to the processing node where it will be allocated. The description of our model is accompanied by a large set of experimental simulations for evaluating the proposed mechanism. We provide numerical results and reveal that the proposed scheme is capable of deciding on the fly while concluding the most efficient allocation.

翻訳日:2022-11-07 01:28:04 公開日:2020-07-31

# 深層学習を用いた除草ハーバリウムスカンからの植物器官の検出とアノテーション

Detection and Annotation of Plant Organs from Digitized Herbarium Scans using Deep Learning ( http://arxiv.org/abs/2007.13106v2 )

ライセンス: Link先を確認

Sohaib Younis, Marco Schmidt, Claus Weiland, Stefan Dressler, Bernhard Seeger, Thomas Hickler

(参考訳) ヘルバリウム標本がデジタル化され、オンラインリポジトリからアクセスできるようになるにつれて、高度なコンピュータビジョン技術がそれらから情報を抽出するために使用されている。ハーバリウムシート上の特定の植物器官の存在は、様々な科学的文脈において有用な情報であり、これらの臓器の自動認識は、そのような情報の動員に役立つ。本研究では,より高速なr-cnnを用いて,植物器官の検出に深層学習を用いる。実験では,6種類の植物器官に何千ものバウンディングボックスを含む数百のエルバリウムスキャンを手作業で注釈し,植物器官検出モデルの訓練と評価に用いた。モデルは葉や茎で特にうまく機能し、花はシートに多く含まれていたが、同様に認識されなかった。

As herbarium specimens are increasingly becoming digitized and accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts and automatic recognition of these organs will help mobilize such information. In our study we use deep learning to detect plant organs on digitized herbarium specimens with Faster R-CNN. For our experiment we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them for training and evaluating the plant organ detection model. The model worked particularly well on leaves and stems, while flowers were also present in large numbers in the sheets, but not equally well recognized.

翻訳日:2022-11-06 20:11:50 公開日:2020-07-31

# 解剖学的に可変なXCATファントムを用いた3次元心臓MR画像合成のためのXCAT-GAN

XCAT-GAN for Synthesizing 3D Consistent Labeled Cardiac MR Images on Anatomically Variable XCAT Phantoms ( http://arxiv.org/abs/2007.13408v2 )

ライセンス: Link先を確認

Sina Amirrajab, Samaneh Abbasi-Sureshjani, Yasmina Al Khalil, Cristian Lorenz, Juergen Weese, Josien Pluim, and Marcel Breeuwer

(参考訳) GAN(Generative Adversarial Network)は、高忠実度画像の合成による有望なデータ濃縮ソリューションを提供する。しかし,新しい解剖学的変化を伴うラベル付き画像の大量生成は未発見のままである。そこで本研究では, 4D eXtended Cardiac と Torso (XCAT) コンピュータ化ヒトファントムを用いて, 解剖学的変化が大きい仮想被験者の集団に心磁気共鳴画像(CMR)を合成する方法を提案する。本研究では,4-classと8-class XCAT-GANの2つの条件付き画像合成手法について検討した。 4-class法は心臓のアノテーションのみに依存し,8-class法では心周囲臓器の多部ラベルマップが予測され,条件付き画像合成のためのより良いガイダンスが得られた。いずれの手法も、条件付きXCAT-GANを実画像と対応するラベルとの組み合わせで訓練し、その後、推論時にXCATから派生したラベルに置き換える。そのため、トレーニングされたネットワークは、組織固有のテクスチャを新しいラベルマップに正確に転送する。終末期期および終末期期における合成cmr画像の33個の仮想被写体を作成し,そのデータの有効性について検討した。その結果, 実画像(40巻)の20%に留まらず, 合成CMR画像の付加によりセグメンテーション性能が保たれることがわかった。さらに, 実データ拡張のための合成画像の利用改善は, ハウスドルフ距離を最大28%削減し, ディススコアを最大5%向上させることで明らかとなり, 全次元において地上の真実と高い類似性を示した。

Generative adversarial networks (GANs) have provided promising data enrichment solutions by synthesizing high-fidelity images. However, generating large sets of labeled images with new anatomical variations remains unexplored. We propose a novel method for synthesizing cardiac magnetic resonance (CMR) images on a population of virtual subjects with a large anatomical variation, introduced using the 4D eXtended Cardiac and Torso (XCAT) computerized human phantom. We investigate two conditional image synthesis approaches grounded on a semantically-consistent mask-guided image generation technique: 4-class and 8-class XCAT-GANs. The 4-class technique relies on only the annotations of the heart; while the 8-class technique employs a predicted multi-tissue label map of the heart-surrounding organs and provides better guidance for our conditional image synthesis. For both techniques, we train our conditional XCAT-GAN with real images paired with corresponding labels and subsequently at the inference time, we substitute the labels with the XCAT derived ones. Therefore, the trained network accurately transfers the tissue-specific textures to the new label maps. By creating 33 virtual subjects of synthetic CMR images at the end-diastolic and end-systolic phases, we evaluate the usefulness of such data in the downstream cardiac cavity segmentation task under different augmentation strategies. Results demonstrate that even with only 20% of real images (40 volumes) seen during training, segmentation performance is retained with the addition of synthetic CMR images. Moreover, the improvement in utilizing synthetic images for augmenting the real data is evident through the reduction of Hausdorff distance up to 28% and an increase in the Dice score up to 5%, indicating a higher similarity to the ground truth in all dimensions.

翻訳日:2022-11-06 08:19:36 公開日:2020-07-31

# 超高解像度画像のセグメンテーション

Foveation for Segmentation of Ultra-High Resolution Images ( http://arxiv.org/abs/2007.15124v2 )

ライセンス: Link先を確認

Chen Jin, Ryutaro Tanno, Moucheng Xu, Thomy Mertzanidou, Daniel C. Alexander

(参考訳) 超高解像度画像のセグメンテーションは、その巨大なサイズ、数百万から数十億ピクセルからなるため、難しい。典型的なソリューションとしては、入力イメージを固定サイズのパッチに分割したり、メモリ制約を満たすためにダウンサンプリングしたりする。このような操作は、視野(FoV)、すなわち空間カバレッジと画像解像度において情報損失を引き起こす。しかし、セグメンテーションのパフォーマンスへの影響はまだ検討されていない。本研究では,fovと解像度のトレードオフが超高解像度画像のセグメンテーション性能に影響を及ぼすことを示すモチベーション実験から始める。次に、与えられた超高解像度画像に対して、入力パッチの適切な構成(FoV/解像度トレードオフ)を適応的に選択し、画像の各空間位置で下流セグメンテーションモデルにフィードする、学習可能なデータローダであるFoveationモジュールを導入する。フェーベーションモジュールは、タスク性能を最大化するためにセグメンテーションネットワークと共同で訓練される。我々は、FoV/解像度トレードオフのパッチでトレーニングされた場合よりも、Foveationモジュールがセグメンテーション性能を一貫して向上することを示す3つの公開高解像度画像データセットを実証する。本手法は,deepglobe aerial image dataset における sota 性能を実現する。 Gleason2019 の病理組織学的データセットでは,最も臨床的に重要で曖昧な2つのクラス (Gleason Grade 3 と 4 ) のセグメンテーション精度が13.1%,7.5%向上し,6 名の人間専門家の平均性能を6.5%,7.5%向上させた。私たちのコードとトレーニングされたモデルは、$\text{https://github.com/lxasqjc/Foveation-Segmentation}$で利用可能です。

Segmentation of ultra-high resolution images is challenging because of their enormous size, consisting of millions or even billions of pixels. Typical solutions include dividing input images into patches of fixed size and/or down-sampling to meet memory constraints. Such operations incur information loss in the field-of-view (FoV) i.e., spatial coverage and the image resolution. The impact on segmentation performance is, however, as yet understudied. In this work, we start with a motivational experiment which demonstrates that the trade-off between FoV and resolution affects the segmentation performance on ultra-high resolution images---and furthermore, its influence also varies spatially according to the local patterns in different areas. We then introduce foveation module, a learnable "dataloader" which, for a given ultra-high resolution image, adaptively chooses the appropriate configuration (FoV/resolution trade-off) of the input patch to feed to the downstream segmentation model at each spatial location of the image. The foveation module is jointly trained with the segmentation network to maximise the task performance. We demonstrate on three publicly available high-resolution image datasets that the foveation module consistently improves segmentation performance over the cases trained with patches of fixed FoV/resolution trade-off. Our approach achieves the SoTA performance on the DeepGlobe aerial image dataset. On the Gleason2019 histopathology dataset, our model achieves better segmentation accuracy for the two most clinically important and ambiguous classes (Gleason Grade 3 and 4) than the top performers in the challenge by 13.1% and 7.5%, and improves on the average performance of 6 human experts by 6.5% and 7.5%. Our code and trained models are available at $\text{https://github.com/lxasqjc/Foveation-Segmentation}$.

翻訳日:2022-11-05 20:08:44 公開日:2020-07-31

# 考えることを学ぶか、何をするかを学ぶか--活気ある学習の微妙な基礎

Learning what they think vs. learning what they do: The micro-foundations of vicarious learning ( http://arxiv.org/abs/2007.15264v2 )

ライセンス: Link先を確認

Sanghyun Park and Phanish Puranam

(参考訳) 活気のある学習は組織学習の重要な要素です。私たちは、活発な学習の基礎となる2つの基本的なプロセスを理論化し、モデル化します。私たちのモデルの分析は、3つの重要な洞察を示します。第一に、活気ある学習者のシステムにエージェントがいない場合でも、どちらのプロセスでも活気ある学習は有益である。第二に、信念共有による活気ある学習は、行動と成果の相互観察よりも普遍的に優れているわけではない。特に,行動と成果の相互観測可能性の実現は,タスク環境が価値に大きな違いのある選択肢がほとんどなく,時間的プレッシャーがない場合に,信念の共有よりも優れている。第三に、ヴィカリアス学習の対称性は実際には信念共有に悪影響を及ぼすが、観察学習を改善する。これら3つの結果は、自発的な学習が自己完結するバイアスド・信念にどのように影響を及ぼすかの結果として示される。

Vicarious learning is a vital component of organizational learning. We theorize and model two fundamental processes underlying vicarious learning: observation of actions (learning what they do) vs. belief sharing (learning what they think). The analysis of our model points to three key insights. First, vicarious learning through either process is beneficial even when no agent in a system of vicarious learners begins with a knowledge advantage. Second, vicarious learning through belief sharing is not universally better than mutual observation of actions and outcomes. Specifically, enabling mutual observability of actions and outcomes is superior to sharing of beliefs when the task environment features few alternatives with large differences in their value and there are no time pressures. Third, symmetry in vicarious learning in fact adversely affects belief sharing but improves observational learning. All three results are shown to be the consequence of how vicarious learning affects self-confirming biased beliefs.

翻訳日:2022-11-05 14:33:48 公開日:2020-07-31

# 外部知識からの学習による多ラベルゼロショット分類

Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge ( http://arxiv.org/abs/2007.15610v2 )

ライセンス: Link先を確認

He Huang, Yuanwei Chen, Wei Tang, Wenhao Zheng, Qing-Guo Chen, Yao Hu, Philip Yu

(参考訳) マルチラベルゼロショット分類は、入力画像に対する複数の未知のクラスラベルを予測することを目的としている。シングルレーベルよりも難しい。一方、各イメージに割り当てられたラベルの制限されていない数は、モデルが見たクラスにもっと簡単にオーバーフィットする。一方、既存のマルチラベル分類データセットでは、目に見えるクラスと見えないクラスの間に大きな意味的ギャップがある。このような難題に対処するために,外部知識からの伝達を学習する多ラベルゼロショット分類フレームワークを提案する。 ImageNetは特徴抽出器の事前訓練によく使われており、大きめできめ細かなラベル空間を持つ。これにより、目に見えるクラスと見えないクラスを橋渡しし、一般化を促進する外部知識として活用するモチベーションが生まれます。具体的には、ターゲットデータセットのクラスだけでなく、ImageNetのクラスも含む知識グラフを構築する。対象とするデータセットではimagenetラベルが利用できないため,拡張知識グラフで初期状態を推測する新しいposvaeモジュールを提案する。次に,関係グラフ畳み込みネットワーク(RGCN)を設計し,クラス間で情報を伝達し,知識伝達を実現する。 2つのベンチマークデータセットの実験結果から,提案手法の有効性が示された。

Multi-label zero-shot classification aims to predict multiple unseen class labels for an input image. It is more challenging than its single-label counterpart. On one hand, the unconstrained number of labels assigned to each image makes the model more easily overfit to those seen classes. On the other hand, there is a large semantic gap between seen and unseen classes in the existing multi-label classification datasets. To address these difficult issues, this paper introduces a novel multi-label zero-shot classification framework by learning to transfer from external knowledge. We observe that ImageNet is commonly used to pretrain the feature extractor and has a large and fine-grained label space. This motivates us to exploit it as external knowledge to bridge the seen and unseen classes and promote generalization. Specifically, we construct a knowledge graph including not only classes from the target dataset but also those from ImageNet. Since ImageNet labels are not available in the target dataset, we propose a novel PosVAE module to infer their initial states in the extended knowledge graph. Then we design a relational graph convolutional network (RGCN) to propagate information among classes and achieve knowledge transfer. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed approach.

翻訳日:2022-11-05 14:24:31 公開日:2020-07-31

# 正規化RBFカーネルによるサンプル効率の向上

Improving Sample Efficiency with Normalized RBF Kernels ( http://arxiv.org/abs/2007.15397v2 )

ライセンス: Link先を確認

Sebastian Pineda-Arango, David Obando-Paniagua, Alperen Dedeoglu, Philip Kurzend\"orfer, Friedemann Schestag and Randolf Scholz

(参考訳) ディープラーニングモデルでは、少ないデータでより多くを学ぶことが重要になっています。本稿では,正規化されたラジアル基底関数(RBF)カーネルを用いたニューラルネットワークを用いて,サンプル効率の向上を図る。さらに,このような出力層がクラスがコンパクトかつ分離された埋め込み空間をどのように見つけるかを示す。そこで本研究では,このようなニューラルネットワークを分類タスクで学習するための2段階の手法を提案する。 CIFAR-10 と CIFAR-100 の実験により,通常のカーネルを出力層として持つネットワークは,SoftMax 出力層を用いたネットワークと比較して,提案手法によりより高効率,高コンパクト,高分離性が得られることを示した。

In deep learning models, learning more with less data is becoming more important. This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency. Moreover, we show how this kind of output layer can find embedding spaces where the classes are compact and well-separated. In order to achieve this, we propose a two-phase method to train those type of neural networks on classification tasks. Experiments on CIFAR-10 and CIFAR-100 show that networks with normalized kernels as output layer can achieve higher sample efficiency, high compactness and well-separability through the presented method in comparison to networks with SoftMax output layer.

翻訳日:2022-11-05 13:21:41 公開日:2020-07-31

# 1量子化ディープニューラルネットワーク量子状態を持つ二次元スピンレス格子フェルミオンの相

Phases of two-dimensional spinless lattice fermions with first-quantized deep neural-network quantum states ( http://arxiv.org/abs/2008.00118v1 )

ライセンス: Link先を確認

James Stokes, Javier Robledo Moreno, Eftychios A. Pnevmatikakis, Giuseppe Carleo

(参考訳) 格子上の強結合フェルミオン系を解析するために, 第一量子化ディープニューラルネットワーク技術を開発した。畳み込み残差ブロックを持つ深い残差ネットワークを利用するスレーター・ジャストロウインスパイアアンサッツを用いて、最寄り-近距離相互作用を持つ正方格子上のスピンレスフェルミオンの基底状態を近似的に決定する。ニューラルネットワークのansatzの柔軟性は、エネルギーと相関関数の両方において、小さなシステムでの正確な対角化結果と比較して高い精度をもたらす。大規模系では, 相互作用強度と粒子密度の関数として, 金属相と電荷相の境界を正確に推定する。

First-quantized deep neural network techniques are developed for analyzing strongly coupled fermionic systems on the lattice. Using a Slater-Jastrow inspired ansatz which exploits deep residual networks with convolutional residual blocks, we approximately determine the ground state of spinless fermions on a square lattice with nearest-neighbor interactions. The flexibility of the neural-network ansatz results in a high level of accuracy when compared to exact diagonalization results on small systems, both for energy and correlation functions. On large systems, we obtain accurate estimates of the boundaries between metallic and charge ordered phases as a function of the interaction strength and the particle density.

翻訳日:2022-11-04 07:20:38 公開日:2020-07-31

# ニューラルネットワークによるアイスフォビック性能の予測

Using neural networks to predict icephobic performance ( http://arxiv.org/abs/2008.00966v1 )

ライセンス: Link先を確認

Rahul Ramachandran

(参考訳) 超疎水性表面に触発されたアイスホビック表面は、アイシング問題に対する受動的解を提供する。しかし, 超疎水化を助長する物質的特徴によっては, 耐氷性能に悪影響を及ぼす可能性があるため, アイスフォビシティのモデル化は困難である。本研究では, 人工ニューラルネットワークを用いたアイスフォビシティのモデル化手法を提案する。人工ニューラルネットワークモデルを用いて, コンクリートの凍結性能を予測した。実験データを用いて, 凍結条件下で水滴の表面着氷強度と再着氷係数(cor)を推算した。材料, 塗料組成物, および環境条件をモデル入力変数として用いた。多層パーセプトロンを用いて根平均二乗誤差 0.08, 90%信頼区間 [0.042, 0.151] のcorを予測した。このモデルは展開後の0.92の係数を持つ。氷の付着強度は試料の幅広い値に対して変化するため, 混合密度ネットワークをモデルとして, マルチモーダルデータの基盤となる関係を学習した。判定係数は0.96。アイスフォビック性能における入力変数の相対的重要性を置換重要度を用いて算出した。開発モデルはコンクリートの耐氷性を最適化する上で有益である。

Icephobic surfaces inspired by superhydrophobic surfaces offer a passive solution to the problem of icing. However, modeling icephobicity is challenging because some material features that aid superhydrophobicity can adversely affect the icephobic performance. This study presents a new approach based on artificial neural networks to model icephobicity. Artificial neural network models were developed to predict the icephobic performance of concrete. The models were trained on experimental data to predict the surface ice adhesion strength and the coefficient of restitution (COR) of water droplet bouncing off the surface under freezing conditions. The material and coating compositions, and environmental condition were used as the models' input variables. A multilayer perceptron was trained to predict COR with a root mean squared error of 0.08, and a 90% confidence interval of [0.042, 0.151]. The model had a coefficient of determination of 0.92 after deployment. Since ice adhesion strength varied over a wide range of values for the samples, a mixture density network was model was developed to learn the underlying relationship in the multimodal data. Coefficient of determination for the model was 0.96. The relative importance of the input variables in icephobic performance were calculated using permutation importance. The developed models will be beneficial to optimize icephobicity of concrete.

翻訳日:2022-11-04 07:20:26 公開日:2020-07-31

# 時間ワープ編集距離の改善 --線形メモリにおける並列動的プログラム

Improved Time Warp Edit Distance -- A Parallel Dynamic Program in Linear Memory ( http://arxiv.org/abs/2007.16135v1 )

ライセンス: Link先を確認

Garrett Wright

(参考訳) 編集距離(edit distance)は、動的プログラミング問題の古典的なファミリーであり、タイムワープ編集距離(time warp edit distance)は、計量と時間弾性の概念を用いて問題を洗練する。大規模並列化と線形記憶のみを必要とする,改良されたタイムワープ編集距離アルゴリズムを提案する。この方法は、オリジナルの動的プログラム空間をカバーするために3対角帯の行列を用いる。対角線更新のすべての要素は並列に計算できる。コア法(core method)は、twed long common subsequence data dependenceの特徴であり、類似のバンドサブプロブレム構造を共有する動的プログラムに適用できる。このアルゴリズムはPythonバインディングを備えたCUDA Cライブラリとして実装されている。挑戦的な問題のスピードアップは驚くべきことです。

Edit Distance is a classic family of dynamic programming problems, among which Time Warp Edit Distance refines the problem with the notion of a metric and temporal elasticity. A novel Improved Time Warp Edit Distance algorithm that is both massively parallelizable and requiring only linear storage is presented. This method uses the procession of a three diagonal band to cover the original dynamic program space. Every element of the diagonal update can be computed in parallel. The core method is a feature of the TWED Longest Common Subsequence data dependence and is applicable to dynamic programs that share similar band subproblem structure. The algorithm has been implemented as a CUDA C library with Python bindings. Speedups for challenging problems are phenomenal.

翻訳日:2022-11-04 07:19:21 公開日:2020-07-31

# CorrSigNet: 放射線画像と病理画像からCRRELated Prostate Cancer SIGnaturesを学習してコンピュータ支援診断を改善する

CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis ( http://arxiv.org/abs/2008.00119v1 )

ライセンス: Link先を確認

Indrani Bhattacharya and Arun Seetharaman and Wei Shao and Rewa Sood and Christian A. Kunder and Richard E. Fan and Simon John Christoph Soerensen and Jeffrey B. Wang and Pejman Ghanouni and Nikola C. Teslovich and James D. Brooks and Geoffrey A. Sonn and Mirabela Rusu

(参考訳) MRIは前立腺がんのスクリーニングやステージングに広く用いられている。しかし、多くの前立腺癌はMRIでは容易に識別できない微妙な特徴を有しており、診断に失敗し、放射線学の解釈に異常が生じる。機械学習モデルは、がんの同定を改善するために開発されたが、現在のモデルは、MRIによる特徴を用いてがんを局在させるが、切除組織で観察される疾患の病態の特徴を考慮しない。本稿では,MRIで前立腺癌を局所化する2段階自動モデルであるCorrSigNetを提案する。まず, 共通表現学習を用いて, 対応する病理組織学的特徴に関連付けられたがんのmri所見を学習する。第二に、学習した相関MRI機能を使って、畳み込みニューラルネットワークをトレーニングし、前立腺がんを局所化する。病理組織像は、相関する特徴を学習するために第1段階のみ使用される。これらの特徴は、(病理組織学や手術なしで)新しい患者のmriから抽出され、がんを局在化することができる。前立腺切除術を施行した806スライス75例を対象に,本フレームワークを訓練,検証した。前立腺癌症例20例(139例,24例,112万画素)の独立した検査群を用いて,1画素あたりの感度0.81,特異度0.71,auc 0.86,1レシオンauc$0.96 \pm 0.07$を,mriを用いた前立腺癌予測における現在の精度を上回った。

Magnetic Resonance Imaging (MRI) is widely used for screening and staging prostate cancer. However, many prostate cancers have subtle features which are not easily identifiable on MRI, resulting in missed diagnoses and alarming variability in radiologist interpretation. Machine learning models have been developed in an effort to improve cancer identification, but current models localize cancer using MRI-derived features, while failing to consider the disease pathology characteristics observed on resected tissue. In this paper, we propose CorrSigNet, an automated two-step model that localizes prostate cancer on MRI by capturing the pathology features of cancer. First, the model learns MRI signatures of cancer that are correlated with corresponding histopathology features using Common Representation Learning. Second, the model uses the learned correlated MRI features to train a Convolutional Neural Network to localize prostate cancer. The histopathology images are used only in the first step to learn the correlated features. Once learned, these correlated features can be extracted from MRI of new patients (without histopathology or surgery) to localize cancer. We trained and validated our framework on a unique dataset of 75 patients with 806 slices who underwent MRI followed by prostatectomy surgery. We tested our method on an independent test set of 20 prostatectomy patients (139 slices, 24 cancerous lesions, 1.12M pixels) and achieved a per-pixel sensitivity of 0.81, specificity of 0.71, AUC of 0.86 and a per-lesion AUC of $0.96 \pm 0.07$, outperforming the current state-of-the-art accuracy in predicting prostate cancer using MRI.

翻訳日:2022-11-04 07:16:43 公開日:2020-07-31

# 高次元線形モデルにおける経験ベイズ後方の変分近似

Variational approximations of empirical Bayes posteriors in high-dimensional linear models ( http://arxiv.org/abs/2007.15930v1 )

ライセンス: Link先を確認

Yue Yang and Ryan Martin

(参考訳) 高次元では、先行尾部は後部計算と漸近性集中率の両方に有意な影響を及ぼす。後方計算を比較的シンプルに保ちながら最適速度を達成するために,データ駆動センタを用いた薄い共役前処理を特徴とする経験的ベイズ法が最近提案されている。共役先行法は計算負担の一部を緩和するが、マルコフ連鎖モンテカルロ法は依然として必要であり、次元が高ければ高価である。本稿では, 実験的なベイズ後部への変分近似を開発し, 計算が高速で, 原点の最適濃度特性を保っている。シミュレーションでは,本手法は,多種多様な高次元環境における文献における既存変分近似よりも優れた性能を示した。

In high-dimensions, the prior tails can have a significant effect on both posterior computation and asymptotic concentration rates. To achieve optimal rates while keeping the posterior computations relatively simple, an empirical Bayes approach has recently been proposed, featuring thin-tailed conjugate priors with data-driven centers. While conjugate priors ease some of the computational burden, Markov chain Monte Carlo methods are still needed, which can be expensive when dimension is high. In this paper, we develop a variational approximation to the empirical Bayes posterior that is fast to compute and retains the optimal concentration rate properties of the original. In simulations, our method is shown to have superior performance compared to existing variational approximations in the literature across a wide range of high-dimensional settings.

翻訳日:2022-11-04 07:16:11 公開日:2020-07-31

# モチーフに基づくグラフ畳み込み多層ネットワークを用いたグラフの表現学習

Representation Learning of Graphs Using Graph Convolutional Multilayer Networks Based on Motifs ( http://arxiv.org/abs/2007.15838v1 )

ライセンス: Link先を確認

Xing Li, Wei Wei, Xiangnan Feng, Xue Liu, Zhiming Zheng

(参考訳) グラフ構造は一般的に使用されるデータ記憶モードであり、グラフ内のノードの低次元埋め込み表現は、ノード分類、リンク予測など、様々な典型的なタスクで非常に有用であることが判明した。しかし、既存のアプローチのほとんどはグラフ内の二項関係(すなわちエッジ)から始まり、グラフの高階局所構造(すなわちモチーフ)を生かしていない。本稿では,ノードの特徴情報とグラフの高次局所構造を利用して,それまで認識されていなかったデータに対してノード埋め込みを効果的に生成する新しいフレームワークmgcmnを提案する。研究により、異なるタイプのネットワークには異なるキーモチーフがあることが判明した。また,提案手法のベースライン法に対する利点は,引用ネットワークとソーシャルネットワークのデータセットに関する数多くの実験で実証されている。同時に,分類精度の向上とクラスタリング係数との正の相関が明らかになった。高次構造情報を用いることで、グラフニューラルネットワークの学習効率を大幅に向上させ、新たな学習モードの確立を促進することができると考えられる。

The graph structure is a commonly used data storage mode, and it turns out that the low-dimensional embedded representation of nodes in the graph is extremely useful in various typical tasks, such as node classification, link prediction , etc. However, most of the existing approaches start from the binary relationship (i.e., edges) in the graph and have not leveraged the higher order local structure (i.e., motifs) of the graph. Here, we propose mGCMN -- a novel framework which utilizes node feature information and the higher order local structure of the graph to effectively generate node embeddings for previously unseen data. Through research we have found that different types of networks have different key motifs. And the advantages of our method over the baseline methods have been demonstrated in a large number of experiments on citation network and social network datasets. At the same time, a positive correlation between increase of the classification accuracy and the clustering coefficient is revealed. It is believed that using high order structural information can truly manifest the potential of the network, which will greatly improve the learning efficiency of the graph neural network and promote a brand-new learning mode establishment.

翻訳日:2022-11-04 07:14:51 公開日:2020-07-31

# DeepCOVIDNet:不均一な特徴と相互作用を用いた新型コロナウイルスの予測監視のための解釈可能なディープラーニングモデル

DeepCOVIDNet: An Interpretable Deep Learning Model for Predictive Surveillance of COVID-19 Using Heterogeneous Features and their Interactions ( http://arxiv.org/abs/2008.00115v1 )

ライセンス: Link先を確認

Ankit Ramchandani, Chao Fan, Ali Mostafavi

(参考訳) 本稿では,今後新型コロナウイルス感染者の増加範囲を予測するための深層学習モデルを提案し,多変量時系列と多変量空間時系列データの等次元表現を計算するための新しい手法を提案する。この手法により, 国勢調査データ, 地域内移動性, 地域間移動性, 社会的分散性データ, 感染の過去成長など, 様々な特徴を取り入れ, それらの特徴間の複雑な相互作用を学ぶことができる。様々な情報源から収集したデータを用いて,米国全郡で7日間の感染者増加範囲を推定した。また,このモデルを用いて,感染拡大の予測に最も影響力のある特徴を同定する。また、特徴のペアを分析し、観察された2次相互作用の量を推定する。実験により,提案モデルが良好な予測性能と極めて解釈可能な特徴分析結果を得ることで,新型コロナウイルス等の全国レベルのパンデミック監視のための標準疫学モデルを補完する可能性が示唆された。深層学習モデルから得られた結果と結果は、政策立案者や研究者に効果的な緩和と対応戦略を考案する上で有益である。さらなる開発と実験を素早く進めるために、提案されたモデルを実装するために使用されるコードは、完全にオープンソース化された。

In this paper, we propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days and we present a novel method to compute equidimensional representations of multivariate time series and multivariate spatial time series data. Using this novel method, the proposed model can both take in a large number of heterogeneous features, such as census data, intra-county mobility, inter-county mobility, social distancing data, past growth of infection, among others, and learn complex interactions between these features. Using data collected from various sources, we estimate the range of increase in infected cases seven days into the future for all U.S. counties. In addition, we use the model to identify the most influential features for prediction of the growth of infection. We also analyze pairs of features and estimate the amount of observed second-order interaction between them. Experiments show that the proposed model obtains satisfactory predictive performance and fairly interpretable feature analysis results; hence, the proposed model could complement the standard epidemiological models for national-level surveillance of pandemics, such as COVID-19. The results and findings obtained from the deep learning model could potentially inform policymakers and researchers in devising effective mitigation and response strategies. To fast-track further development and experimentation, the code used to implement the proposed model has been made fully open source.

翻訳日:2022-11-04 07:13:59 公開日:2020-07-31

# 音響シーン分類におけるデバイス適応のための神経ラベル埋め込みによるリレーショナル教師学生学習

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification ( http://arxiv.org/abs/2008.00110v1 )

ライセンス: Link先を確認

Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

(参考訳) 本稿では,ニューラルラベル埋め込み(NLE)と関係教師学習(RTSL)を活用した音響シーン分類におけるデバイスミスマッチ問題に対処するドメイン適応フレームワークを提案する。提案手法では,音響シーンクラス間の構造的関係を考慮し,デバイスに依存しない関係を捉える。トレーニング段階では、転写可能な知識はソースドメインからNLEに凝縮される。次に、適応段階において、従来の教師学習でしばしば必要とされるペア・ソース・ターゲットデータを用いることなく、適応対象モデルを学習するための新しいRTSL戦略を採用する。提案するフレームワークはDCASE 2018 Task1bデータセットで評価されている。 AlexNet-L深層分類モデルによる実験結果から,提案手法の有効性が確認された。 NLE-alone適応は、従来のデバイス適応や教師による適応技術と好適に比較できる。 RTSLによるNLEはさらに分類精度を向上させる。

In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification leveraging upon neural label embedding (NLE) and relational teacher student learning (RTSL). Taking into account the structural relationships between acoustic scene classes, our proposed framework captures such relationships which are intrinsically device-independent. In the training stage, transferable knowledge is condensed in NLE from the source domain. Next in the adaptation stage, a novel RTSL strategy is adopted to learn adapted target models without using paired source-target data often required in conventional teacher student learning. The proposed framework is evaluated on the DCASE 2018 Task1b data set. Experimental results based on AlexNet-L deep classification models confirm the effectiveness of our proposed approach for mismatch situations. NLE-alone adaptation compares favourably with the conventional device adaptation and teacher student based adaptation techniques. NLE with RTSL further improves the classification accuracy.

翻訳日:2022-11-04 07:07:27 公開日:2020-07-31

# DBLSTM-CTCを用いた手書き文字認識における暗黙的・明示的言語モデル情報の効果に関する研究

A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition ( http://arxiv.org/abs/2008.01532v1 )

ライセンス: Link先を確認

Qi Liu, Lijuan Wang, Qiang Huo

(参考訳) 直交時間分類(CTC)出力層を備えたD-BLSTM(Deep Bidirectional Long Short-Term Memory)が手書き認識のための最先端ソリューションとして確立されている。 CTC目的関数を用いて訓練されたDBLSTMは、文字モデリングのためのローカル文字画像依存性と暗黙的な言語モデリングのための長距離コンテキスト依存性の両方を学ぶことはよく知られている。本稿では,DBLSTM-CTCを用いた手書き文字認識における暗黙的および明示的言語モデル情報の効果について,明示的言語モデルを用いた復号化の性能の比較を行った。 100万行のトレーニング文を使用してDBLSTMをトレーニングしても、明示的な言語モデルを使用することは有用である。このような大規模トレーニング問題に対処するために,mini-batch based epochwise back propagation through time (bptt)アルゴリズムを用いて,dblstmのctcトレーニング用gpuベースのトレーニングツールを開発した。

Deep Bidirectional Long Short-Term Memory (D-BLSTM) with a Connectionist Temporal Classification (CTC) output layer has been established as one of the state-of-the-art solutions for handwriting recognition. It is well known that the DBLSTM trained by using a CTC objective function will learn both local character image dependency for character modeling and long-range contextual dependency for implicit language modeling. In this paper, we study the effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition by comparing the performance of using or without using an explicit language model in decoding. It is observed that even using one million lines of training sentences to train the DBLSTM, using an explicit language model is still helpful. To deal with such a large-scale training problem, a GPU-based training tool has been developed for CTC training of DBLSTM by using a mini-batch based epochwise Back Propagation Through Time (BPTT) algorithm.

翻訳日:2022-11-04 07:07:11 公開日:2020-07-31

# LVCSRのための将来ベクトル拡張LSTM言語モデル

Future Vector Enhanced LSTM Language Model for LVCSR ( http://arxiv.org/abs/2008.01832v1 )

ライセンス: Link先を確認

Qi Liu, Yanmin Qian, Kai Yu

(参考訳) 言語モデル(LM)は,大語彙連続音声認識(LVCSR)において重要な役割を果たす。しかしながら、従来の言語モデルは、与えられた歴史を持つ次の単一の単語しか予測しないが、連続した単語列の予測は通常、LVCSRで要求され有用である。学習中の単一単語予測モデルと読み出し要求における長期シーケンス予測のミスマッチは、性能低下につながる可能性がある。本稿では,将来ベクトルを用いた拡張長短期メモリ(LSTM)LMを提案する。与えられた履歴に加えて、シーケンスの残りの部分は将来のベクターによって埋め込まれる。この将来のベクターはlstm lmに組み込むことができるので、より長期のシーケンスレベルの情報をモデル化することができる。実験の結果,提案したLSTM LMはBLEUスコアよりも長期的シーケンス予測が優れていることがわかった。音声認識では,提案するLSTM LMは若干の利得が得られるが,従来のLSTM LMとの大きな相補性が得られると考えられる。新たなLSTM LMと従来のLSTM LMを併用することで,単語誤り率を大幅に向上させることができる。

Language models (LM) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models only predict next single word with given history, while the consecutive predictions on a sequence of words are usually demanded and useful in LVCSR. The mismatch between the single word prediction modeling in trained and the long term sequence prediction in read demands may lead to the performance degradation. In this paper, a novel enhanced long short-term memory (LSTM) LM using the future vector is proposed. In addition to the given history, the rest of the sequence will be also embedded by future vectors. This future vector can be incorporated with the LSTM LM, so it has the ability to model much longer term sequence level information. Experiments show that, the proposed new LSTM LM gets a better result on BLEU scores for long term sequence prediction. For the speech recognition rescoring, although the proposed LSTM LM obtains very slight gains, the new model seems obtain the great complementary with the conventional LSTM LM. Rescoring using both the new and conventional LSTM LMs can achieve a very large improvement on the word error rate.

翻訳日:2022-11-04 07:06:54 公開日:2020-07-31

# ビデオ領域適応のための逆二部グラフ学習

Adversarial Bipartite Graph Learning for Video Domain Adaptation ( http://arxiv.org/abs/2007.15829v1 )

ライセンス: Link先を確認

Yadan Luo, Zi Huang, Zijian Wang, Zheng Zhang, Mahsa Baktashmotlagh

(参考訳) 異なる領域間のモデルを適応させることに焦点を当てたドメイン適応技術は、ソース(トレーニング)とターゲット(テスト)ドメイン間での空間的および時間的な大きなシフトのため、ビデオ認識領域ではほとんど研究されない。このように、視覚領域適応に関する最近の研究は、逆学習を利用して、ソースとターゲットビデオの表現を統一し、特徴伝達性を強化しているため、ビデオにはあまり効果がない。この制限を克服するために,本研究では,ドメイン不変表現を学習する代わりに,ドメインに依存しないビデオ分類器を学習し,両部グラフのネットワークトポロジとソースターゲットの相互作用を直接モデル化するAdversarial Bipartite Graph (ABG) 学習フレームワークを提案する。具体的には、ソースフレームとターゲットフレームを異種頂点としてサンプリングし、2種類のノードを接続するエッジがそれらの親和性を測定する。メッセージパッシングを通じて、それぞれの頂点は、その異種の隣人から特徴を集約し、同じクラスから来る特徴を均等に混合させる。トレーニングとテストステージでビデオ分類器をこのようなクロスドメイン表現に明示的に公開することで,ラベル付きソースデータへの偏りが少なくなり,結果としてターゲットドメインのより優れた一般化が可能になるのです。モデルキャパシティをさらに高め,難読化タスクにおけるアーキテクチャのロバスト性を検証するため,ビデオレベルの二部グラフを付加した半教師付き環境での作業にモデルを拡張した。 4つのベンチマークで行った大規模な実験は、ビデオ認識におけるSOTA法に対する提案手法の有効性を実証している。

Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area due to the significant spatial and temporal shifts across the source (i.e. training) and target (i.e. test) domains. As such, recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations and strengthen the feature transferability are not highly effective on the videos. To overcome this limitation, in this paper, we learn a domain-agnostic video classifier instead of learning domain-invariant representations, and propose an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions with a network topology of the bipartite graph. Specifically, the source and target frames are sampled as heterogeneous vertexes while the edges connecting two types of nodes measure the affinity among them. Through message-passing, each vertex aggregates the features from its heterogeneous neighbors, forcing the features coming from the same class to be mixed evenly. Explicitly exposing the video classifier to such cross-domain representations at the training and test stages makes our model less biased to the labeled source data, which in-turn results in achieving a better generalization on the target domain. To further enhance the model capacity and testify the robustness of the proposed architecture on difficult transfer tasks, we extend our model to work in a semi-supervised setting using an additional video-level bipartite graph. Extensive experiments conducted on four benchmarks evidence the effectiveness of the proposed approach over the SOTA methods on the task of video recognition.

翻訳日:2022-11-04 07:06:16 公開日:2020-07-31

# デモビデオからの動作コードの推定

Estimating Motion Codes from Demonstration Videos ( http://arxiv.org/abs/2007.15841v1 )

ライセンス: Link先を確認

Maxat Alibayev, David Paulius and Yu Sun

(参考訳) 運動分類学は、操作をバイナリ符号化された表現としてエンコードすることができる。これらの動き符号は、接触や軌道タイプを含む動きの機械的特徴を記述する埋め込み空間における操作動作を本質的に表わす。埋め込みにモーションコードを使用する主な利点は、動きをロボット関連の特徴でより適切に定義でき、それらの距離をこれらの動き特徴を用いてより合理的に測定できることである。本稿では,実演ビデオから動作コードを教師なしで抽出する深層学習パイプラインを開発し,その知識をロボットに適切に表現し,活用する。評価の結果,EPIC-KITCHENSデータセットにおける動作のデモから動作符号を抽出できることが示唆された。

A motion taxonomy can encode manipulations as a binary-encoded representation, which we refer to as motion codes. These motion codes innately represent a manipulation action in an embedded space that describes the motion's mechanical features, including contact and trajectory type. The key advantage of using motion codes for embedding is that motions can be more appropriately defined with robotic-relevant features, and their distances can be more reasonably measured using these motion features. In this paper, we develop a deep learning pipeline to extract motion codes from demonstration videos in an unsupervised manner so that knowledge from these videos can be properly represented and used for robots. Our evaluations show that motion codes can be extracted from demonstrations of action in the EPIC-KITCHENS dataset.

翻訳日:2022-11-04 07:05:46 公開日:2020-07-31

# ロバスト糖尿病網膜症スクリーニングにおける残肝-CycleGANを用いたカメラ適応

Residual-CycleGAN based Camera Adaptation for Robust Diabetic Retinopathy Screening ( http://arxiv.org/abs/2007.15874v1 )

ライセンス: Link先を確認

Dalu Yang, Yehui Yang, Tiantian Huang, Binghong Wu, Lei Wang, Yanwu Xu

(参考訳) 基礎画像からの糖尿病網膜症(dr)の自動検出に焦点をあてた広範な研究がある。しかし、実際のdrスクリーニングでこれらのモデルを適用する場合、精度の低下は観測され、そこでは、訓練中のim-ageをキャプチャするために使用されるものと、楽しいカメラブランドが異なる。カメラブランドが1つしか持たないが、他のブランドのカメラで撮られたイマージで優れたパフォーマンスを実現している、ラベル付きファンドイメージの分類モデルをどうやってトレーニングできるのか? 本稿では,dr分類モデルの性能に及ぼすファンドスカメラブランドのドメインシフトの影響を,実験的な観点から定量的に検証する。さらに,領域適応によりカメラブランドの差異を緩和し,対象カメラ画像の分類性能の向上を図るために,カメラ指向残像GANを提案する。 EyePACS da-tasetとプライベートデータセットの両方での大規模なアブレーション実験により、カメラのブランド差が分類性能にシグニフィカライズし、提案したメソオードが対象領域におけるモデル性能を効果的に改善できることが証明された。我々は、eyepacs da-tasetの各画像のカメラブランドを推測し、ラベル付けし、さらにドメイン適応の研究のためにカメラブランドのラベルを公表する。

There are extensive researches focusing on automated diabetic reti-nopathy (DR) detection from fundus images. However, the accuracy drop is ob-served when applying these models in real-world DR screening, where the fun-dus camera brands are different from the ones used to capture the training im-ages. How can we train a classification model on labeled fundus images ac-quired from only one camera brand, yet still achieves good performance on im-ages taken by other brands of cameras? In this paper, we quantitatively verify the impact of fundus camera brands related domain shift on the performance of DR classification models, from an experimental perspective. Further, we pro-pose camera-oriented residual-CycleGAN to mitigate the camera brand differ-ence by domain adaptation and achieve increased classification performance on target camera images. Extensive ablation experiments on both the EyePACS da-taset and a private dataset show that the camera brand difference can signifi-cantly impact the classification performance and prove that our proposed meth-od can effectively improve the model performance on the target domain. We have inferred and labeled the camera brand for each image in the EyePACS da-taset and will publicize the camera brand labels for further research on domain adaptation.

翻訳日:2022-11-04 07:05:05 公開日:2020-07-31

# Visual SLAMのための動的物体追跡とマスキング

Dynamic Object Tracking and Masking for Visual SLAM ( http://arxiv.org/abs/2008.00072v1 )

ライセンス: Link先を確認

Jonathan Vincent, Mathieu Labb\'e, Jean-Samuel Lauzon, Fran\c{c}ois Grondin, Pier-Marc Comtois-Rivet, Fran\c{c}ois Michaud

(参考訳) 動的環境下では、動いた物体から得られる視覚的特徴によって視覚的SLAM技術の性能が損なわれる可能性がある。一つの解決策は、それらのオブジェクトを識別して、その視覚的特徴をローカライズとマッピングのために取り除くことである。本稿では,深層ニューラルネットワーク,拡張kalmanフィルタ,視覚スラムを用いて,動的環境(gtx 1080の約14fps)における局所化とマッピングの両方を改善する,シンプルで高速なパイプラインを提案する。 RTAB-MapをビジュアルSLAMとして使用したTUMデータセットからの動的シーケンスの結果から,本手法は他の最先端手法と同じようなローカライゼーション性能を実現するとともに,追跡された動的オブジェクトの位置,それらの動的オブジェクトを含まない3Dマップ,パイプライン全体を適度に移動するロボット上でのループクロージャ検出の改善などが示唆された。

In dynamic environments, performance of visual SLAM techniques can be impaired by visual features taken from moving objects. One solution is to identify those objects so that their visual features can be removed for localization and mapping. This paper presents a simple and fast pipeline that uses deep neural networks, extended Kalman filters and visual SLAM to improve both localization and mapping in dynamic environments (around 14 fps on a GTX 1080). Results on the dynamic sequences from the TUM dataset using RTAB-Map as visual SLAM suggest that the approach achieves similar localization performance compared to other state-of-the-art methods, while also providing the position of the tracked dynamic objects, a 3D map free of those dynamic objects, better loop closure detection with the whole pipeline able to run on a robot moving at moderate speed.

翻訳日:2022-11-04 07:04:19 公開日:2020-07-31

# ベータスタビライザを用いたディープラーニングに関する研究

An Investigation on Deep Learning with Beta Stabilizer ( http://arxiv.org/abs/2008.01173v1 )

ライセンス: Link先を確認

Qi Liu, Tian Tan, Kai Yu

(参考訳) 人工ニューラルネットワーク(ANN)は、手書き認識や音声認識などの多くのアプリケーションで使われている。ニューラルネットワークのトレーニング手順において,学習率が重要であることはよく知られている。学習率の初期値が最終結果に合致し得ることが示され、この値は実際に常に手動で設定される。ベータ安定化器と呼ばれる新しいパラメータを導入し、初期学習率の感度を下げた。しかし、この手法はシグモイド活性化機能を持つディープニューラルネットワーク(DNN)に対してのみ提案されている。本稿では,ベータ安定化器を長期記憶(LSTM)に拡張し,LSTMとDNNを含む様々なモデルに対するベータ安定化器パラメータの効果を検討した。ベータ安定化パラメータは、Reluアクティベーション関数とLSTMを持つDNNでほぼ同じ性能で学習率の感度を低下させることができると結論付けた。しかし, 可溶性活性化機能を有するDNNとLSTMに対するβ安定化剤の効果は, シグモイド活性化機能を有するDNNに対する影響よりも小さいことがわかった。

Artificial neural networks (ANN) have been used in many applications such like handwriting recognition and speech recognition. It is well-known that learning rate is a crucial value in the training procedure for artificial neural networks. It is shown that the initial value of learning rate can confoundedly affect the final result and this value is always set manually in practice. A new parameter called beta stabilizer has been introduced to reduce the sensitivity of the initial learning rate. But this method has only been proposed for deep neural network (DNN) with sigmoid activation function. In this paper we extended beta stabilizer to long short-term memory (LSTM) and investigated the effects of beta stabilizer parameters on different models, including LSTM and DNN with relu activation function. It is concluded that beta stabilizer parameters can reduce the sensitivity of learning rate with almost the same performance on DNN with relu activation function and LSTM. However, it is shown that the effects of beta stabilizer on DNN with relu activation function and LSTM are fewer than the effects on DNN with sigmoid activation function.

翻訳日:2022-11-04 06:57:43 公開日:2020-07-31

# ブラジャーとケッツでブラケットするブラケット

Bracketing brackets with bras and kets ( http://arxiv.org/abs/2008.12247v1 )

ライセンス: Link先を確認

Emily Clark, Angelie Vincent, J. Nathan Kutz, and Steven L. Brunton

(参考訳) ブラケットは航空機の製造と設計、部品の接合、重量の維持、ワイヤーの保持、継手強化に欠かせない要素である。全ての航空機で数百から数千の独特なブラケットが使用されるが、多くの異なるブラケットの製造は非効率で高価である。幸運にも、いわゆる「異なる」ブラケットの多くは、実際には互いに非常に似ているか、あるいは同一である。本稿では,ブラケットデータの階層的クラスタリングに基づいて,現在のブラケットの大きなカタログから比較的小さな代表ブラケット群を構築するための,データ駆動型フレームワークを提案する。現代の商用機では、テストセットの半分を十分に正確に記述しながら、ブラケットの完全なセットを30 %削減できることがわかった。このアプローチは、内積の「bra」と「ket」である2つの括弧間の多目的類似性を定量化する内積を設計することに基づいている。航空機製造におけるブラケット数を削減するために,本アルゴリズムを実証するが,大規模コンポーネントの標準化にも適用できる。

Brackets are an essential component in aircraft manufacture and design, joining parts together, supporting weight, holding wires, and strengthening joints. Hundreds or thousands of unique brackets are used in every aircraft, but manufacturing a large number of distinct brackets is inefficient and expensive. Fortunately, many so-called "different" brackets are in fact very similar or even identical to each other. In this manuscript, we present a data-driven framework for constructing a comparatively small group of representative brackets from a large catalog of current brackets, based on hierarchical clustering of bracket data. We find that for a modern commercial aircraft, the full set of brackets can be reduced by 30\% while still describing half of the test set sufficiently accurately. This approach is based on designing an inner product that quantifies a multi-objective similarity between two brackets, which are the "bra" and the "ket" of the inner product. Although we demonstrate this algorithm to reduce the number of brackets in aerospace manufacturing, it may be generally applied to any large-scale component standardization effort.

翻訳日:2022-11-04 06:57:25 公開日:2020-07-31

# 実行時情報を組み込んだ準最適反応合成

Near-Optimal Reactive Synthesis Incorporating Runtime Information ( http://arxiv.org/abs/2007.16107v1 )

ライセンス: Link先を確認

Suda Bharadwaj, Abraham P. Vinod, Rayna Dimitrova, Ufuk Topcu

(参考訳) 我々は,動的環境におけるミッション仕様を満たす戦略を計算し,性能指標を最適化する,最適反応合成の問題を考える。実行時にのみ利用可能なタスククリティカルな情報を戦略合成に組み込んで,パフォーマンスを向上させる。このような時間変化情報を利用する既存のアプローチは、リアルタイムアプリケーションでは計算不可能なオンライン再合成を必要とする。本稿では,候補のインスタンス化に対応する戦略のセット(事前特定代表情報シナリオ)を事前に合成する。そこで我々は,すべての安全性と生存目標を満たしながら,実行時の戦略を動的に切り替える新しいスイッチング機構を提案する。また、パフォーマンスサブオプティリティの境界を特徴付ける。そこで本研究では,ロボットの目標位置の可能性をリアルタイムで更新するロボット運動計画手法と,都市空力移動のための航空交通管理問題について紹介する。

We consider the problem of optimal reactive synthesis - compute a strategy that satisfies a mission specification in a dynamic environment, and optimizes a performance metric. We incorporate task-critical information, that is only available at runtime, into the strategy synthesis in order to improve performance. Existing approaches to utilising such time-varying information require online re-synthesis, which is not computationally feasible in real-time applications. In this paper, we pre-synthesize a set of strategies corresponding to candidate instantiations (pre-specified representative information scenarios). We then propose a novel switching mechanism to dynamically switch between the strategies at runtime while guaranteeing all safety and liveness goals are met. We also characterize bounds on the performance suboptimality. We demonstrate our approach on two examples - robotic motion planning where the likelihood of the position of the robot's goal is updated in real-time, and an air traffic management problem for urban air mobility.

翻訳日:2022-11-04 06:57:08 公開日:2020-07-31

# 複数属性選択決定のためのルックアヘッドおよびハイブリッドサンプル割り当て手順

Lookahead and Hybrid Sample Allocation Procedures for Multiple Attribute Selection Decisions ( http://arxiv.org/abs/2007.16119v1 )

ライセンス: Link先を確認

Jeffrey W. Herrmann and Kunal Mehta

(参考訳) 属性は、意思決定者が検討している代替案に関する重要な情報を提供する。大きさが不確実な場合、意思決定者はどの選択肢が本当に最良ののかわからないため、属性の測定は意思決定者がよりよい判断を下すのに役立つかもしれない。本稿では、各測定値が1つの属性の1つのサンプルを1つの代替として生成する設定について考察する。収集すべきサンプル数が一定であれば、どのサンプルを取得するかを決定し、測定を行い、属性の規模に関する事前の信念を更新し、代替案を選択する必要がある。本稿では,複数の属性選択決定に対するサンプル割当問題を提案し,不確かさをモデル化するために離散分布を用いた場合の2つの逐次的ルックアヘッド手順を提案する。 2つの手順は似ているが、異なる品質基準(と損失関数)を反映しており、これは異なる決定ルールを動機付けている。そこで本研究では,まず一様アロケーション手順を用いてサンプルを割り当て,次にシーケンシャルなルックアヘッド手順を用いて,シーケンシャルなプロシージャとハイブリッドなプロシージャの性能を評価するためのシミュレーション研究を行った。その結果,初期標本の多く(すべてではないが)を均一な割当手順で割当てることによって,全体の計算労力を削減できるだけでなく,平均的な機会コストが低く,真にベストな代替案も選択できることが示唆された。

Attributes provide critical information about the alternatives that a decision-maker is considering. When their magnitudes are uncertain, the decision-maker may be unsure about which alternative is truly the best, so measuring the attributes may help the decision-maker make a better decision. This paper considers settings in which each measurement yields one sample of one attribute for one alternative. When given a fixed number of samples to collect, the decision-maker must determine which samples to obtain, make the measurements, update prior beliefs about the attribute magnitudes, and then select an alternative. This paper presents the sample allocation problem for multiple attribute selection decisions and proposes two sequential, lookahead procedures for the case in which discrete distributions are used to model the uncertain attribute magnitudes. The two procedures are similar but reflect different quality measures (and loss functions), which motivate different decision rules: (1) select the alternative with the greatest expected utility and (2) select the alternative that is most likely to be the truly best alternative. We conducted a simulation study to evaluate the performance of the sequential procedures and hybrid procedures that first allocate some samples using a uniform allocation procedure and then use the sequential, lookahead procedure. The results indicate that the hybrid procedures are effective; allocating many (but not all) of the initial samples with the uniform allocation procedure not only reduces overall computational effort but also selects alternatives that have lower average opportunity cost and are more often truly best.

翻訳日:2022-11-04 06:56:55 公開日:2020-07-31

# the tactician (extended version):coqのためのシームレスでインタラクティブな戦術学習者と証明者

The Tactician (extended version): A Seamless, Interactive Tactic Learner and Prover for Coq ( http://arxiv.org/abs/2008.00120v1 )

ライセンス: Link先を確認

Lasse Blaauwbroek, Josef Urban and Herman Geuvers

(参考訳) 我々はCoq Proof Assistantの戦術学習者であり証明者であるTacticianを紹介する。 Tacticianは、一般的な証明戦略のコントロールを維持しながら、ユーザーが戦術的証明決定を行うのを助ける。この目的のために、tacticianは以前に書かれた戦術スクリプトから学習し、実行すべき次の戦術について提案するか、証明合成の負担を完全に引き受ける。 Tacticianの目標は、ユーザに対してシームレスでインタラクティブで直感的なエクスペリエンスと、堅牢で適応的な証明自動化を提供することだ。本稿では,ユーザの視点からのTacticianの概要を概観し,大規模に学習しながら,パッケージ依存管理の日常的利用と課題について述べる。最後に、tacticianのcoqプラグインと機械学習プラットフォームとしての実装について紹介する。

We present Tactician, a tactic learner and prover for the Coq Proof Assistant. Tactician helps users make tactical proof decisions while they retain control over the general proof strategy. To this end, Tactician learns from previously written tactic scripts and gives users either suggestions about the next tactic to be executed or altogether takes over the burden of proof synthesis. Tactician's goal is to provide users with a seamless, interactive, and intuitive experience together with robust and adaptive proof automation. In this paper, we give an overview of Tactician from the user's point of view, regarding both day-to-day usage and issues of package dependency management while learning in the large. Finally, we give a peek into Tactician's implementation as a Coq plugin and machine learning platform.

翻訳日:2022-11-04 06:56:15 公開日:2020-07-31

# 非同期分散マイクロホンを用いた発話会議記録システム

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones ( http://arxiv.org/abs/2007.15868v1 )

ライセンス: Link先を確認

Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

(参考訳) 本稿では,非同期マイクロホンを用いた新しい音声書き起こしフレームワークを提案する。音声同期、話者ダイアリゼーション、誘導音源分離を用いた発話音声強調、自動音声認識、重複低減で構成されている。音声強調の前に話者ダイアリゼーションを行うことで、マイクロホン間のサンプリング周波数ミスマッチを考慮せずに重複音声を処理することができる。実際の会議データセットから,11個の分散マイクロホンを用いて28.7%の文字誤り率(CER)を達成し,テーブル中央のモノラルマイクロホンは38.2%のCERを示した。また,本フレームワークは21.8 %のcerを達成し,ヘッドセット用マイクロホンによる音声認識では2.1ポイントの精度を示した。

A novel framework for meeting transcription using asynchronous microphones is proposed in this paper. It consists of audio synchronization, speaker diarization, utterance-wise speech enhancement using guided source separation, automatic speech recognition, and duplication reduction. Doing speaker diarization before speech enhancement enables the system to deal with overlapped speech without considering sampling frequency mismatch between microphones. Evaluation on our real meeting datasets showed that our framework achieved a character error rate (CER) of 28.7 % by using 11 distributed microphones, while a monaural microphone placed on the center of the table had a CER of 38.2 %. We also showed that our framework achieved CER of 21.8 %, which is only 2.1 percentage points higher than the CER in headset microphone-based transcription.

翻訳日:2022-11-04 06:56:02 公開日:2020-07-31

# 部分発話を用いた音響シーン分類における音響セグメントモデルに基づくセグメント単位選択手法

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances ( http://arxiv.org/abs/2008.00107v1 )

ライセンス: Link先を確認

Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee

(参考訳) 本稿では,音響シーン分類(asc)のための情報が少ない音声録音における音響セグメントを除去するサブ発話単位選択フレームワークを提案する。このアプローチは,音響シーン全体の空間をカバーする音響セグメント単位の普遍セットを基盤としている。まず、これらの単位を音響セグメントモデル(ASM)でモデル化し、音響シーンの発話を音響セグメント単位のシーケンスにトークン化する。次に、情報検索における停止語の概念と並行して、ASMを自動的に検出する。最後に、ほとんどの音響シーンの検索においてインデックス化能力の低いため、停止ASMに関連する音響セグメントをブロックする。全発話を含むシーンモデルとは対照的に、ASM除去サブ発話、すなわち音節を停止しない音響発話は、最終分類のためのAlexNet-Lバックエンドへの入力として使用される。 dcase 2018データセットでは、シーン分類の精度が、発話全体の68%からセグメント選択による72.1%に向上した。これはデータ拡張やアンサンブル戦略を使わずに競合する精度を示す。さらに,本手法は注意を払ってAlexNet-Lと比較した。

In this paper, we propose a sub-utterance unit selection framework to remove acoustic segments in audio recordings that carry little information for acoustic scene classification (ASC). Our approach is built upon a universal set of acoustic segment units covering the overall acoustic scene space. First, those units are modeled with acoustic segment models (ASMs) used to tokenize acoustic scene utterances into sequences of acoustic segment units. Next, paralleling the idea of stop words in information retrieval, stop ASMs are automatically detected. Finally, acoustic segments associated with the stop ASMs are blocked, because of their low indexing power in retrieval of most acoustic scenes. In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification. On the DCASE 2018 dataset, scene classification accuracy increases from 68%, with whole utterances, to 72.1%, with segment selection. This represents a competitive accuracy without any data augmentation, and/or ensemble strategy. Moreover, our approach compares favourably to AlexNet-L with attention.

翻訳日:2022-11-04 06:55:25 公開日:2020-07-31

# 低光度画像におけるサルエント物体検出のための画像強調の検討

Exploring Image Enhancement for Salient Object Detection in Low Light Images ( http://arxiv.org/abs/2007.16124v1 )

ライセンス: Link先を確認

Xin Xu, Shiqin Wang, Zheng Wang, Xiaolong Zhang, and Ruimin Hu

(参考訳) 非一様照明環境で撮影される低光画像は通常、シーン深度と対応する環境光で劣化する。この劣化により、劣化した画像モダリティの厳しい物体情報損失が発生し、低コントラスト特性と人工光の影響により、顕著な物体検出がより困難になる。しかし,実世界のシナリオでは実現不可能な十分な明るさ環境下で画像が撮影されるという仮定に基づいて,既存の正当性物体検出モデルを開発した。本研究では,低照度画像における物体検出を容易にする画像強調手法を提案する。提案モデルでは, 物理照明モデルを深部ニューラルネットワークに直接埋め込んで, 低光画像の劣化を記述する。さらに、非局所ブロック層を用いて、物体の局所的内容と、その局所的に有利な領域との差をキャプチャする。定量的評価のために,画素レベルの人間ラベル付地中アノテーションを用いた低照度画像データセットを構築し,4つの公開データセットとベンチマークデータセットで有望な結果を報告する。

Low light images captured in a non-uniform illumination environment usually are degraded with the scene depth and the corresponding environment lights. This degradation results in severe object information loss in the degraded image modality, which makes the salient object detection more challenging due to low contrast property and artificial light influence. However, existing salient object detection models are developed based on the assumption that the images are captured under a sufficient brightness environment, which is impractical in real-world scenarios. In this work, we propose an image enhancement approach to facilitate the salient object detection in low light images. The proposed model directly embeds the physical lighting model into the deep neural network to describe the degradation of low light images, in which the environment light is treated as a point-wise variate and changes with local content. Moreover, a Non-Local-Block Layer is utilized to capture the difference of local content of an object against its local neighborhood favoring regions. To quantitative evaluation, we construct a low light Images dataset with pixel-level human-labeled ground-truth annotations and report promising results on four public datasets and our benchmark dataset.

翻訳日:2022-11-04 06:48:56 公開日:2020-07-31

# ハイブリッド特徴選択モデルに基づくPalm静脈同定

Palm Vein Identification based on hybrid features selection model ( http://arxiv.org/abs/2007.16195v1 )

ライセンス: Link先を確認

Mohammed Hamzah Abed, Ali H. Alsaeedi, Ali D. Alfoudi, Abayomi M. Otebolaku, Yasmeen Sajid Razooqi

(参考訳) Palm vein Identification (PVI) は、セキュリティと認証システムを強化するために使われる生体認証技術である。手のひら静脈パターンの鍵となる特徴は、その個性、忘れられない、意図しない、無許可の人物が取ることができないことである。しかし,手のひら静脈パターンから抽出した特徴は,高い冗長性を有する。本稿では,2次元離散ウェーブレット変換,主成分分析 (PCA) と粒子群最適化 (PSO) (2D-DWTPP) の組合せモデルを提案する。 2D-DWT抽出はヤシ静脈像から特徴を抽出し,PCAはヤシ静脈像の冗長性を低下させる。このシステムは、ラッパーモデルに基づいて高い尊敬機能を選択するように訓練されている。 PSOは、機能の最適なサブセットによってラッパーモデルを供給します。提案システムは4つの分類器を目的関数として使用し、サポートベクターマシン(svm)、k極近傍(knn)、決定木(dt)、na\"ive bayes(nb)を含むvpiを決定する。実験結果から,提案システムIitはSVMで最高の結果を得た。提案した2D-DWTPPモデルについて評価し,特徴選択のないAlexnetや分類器と比較して,顕著な効率性を示した。実験的に、このモデルは (98.65) に反映され、一方 alexnet は (63.5) 、応用分類器は特徴選択なし (78.79) である。

Palm vein identification (PVI) is a modern biometric security technique used for increasing security and authentication systems. The key characteristics of palm vein patterns include, its uniqueness to each individual, unforgettable, non-intrusive and cannot be taken by an unauthorized person. However, the extracted features from the palm vein pattern are huge with high redundancy. In this paper, we propose a combine model of two-Dimensional Discrete Wavelet Transform, Principal Component Analysis (PCA), and Particle Swarm Optimization (PSO) (2D-DWTPP) to enhance prediction of vein palm patterns. The 2D-DWT Extracts features from palm vein images, PCA reduces the redundancy in palm vein features. The system has been trained in selecting high reverent features based on the wrapper model. The PSO feeds wrapper model by an optimal subset of features. The proposed system uses four classifiers as an objective function to determine VPI which include Support Vector Machine (SVM), K Nearest Neighbor (KNN), Decision Tree (DT) and Na\"ive Bayes (NB). The empirical result shows the proposed system Iit satisfied best results with SVM. The proposed 2D-DWTPP model has been evaluated and the results shown remarkable efficiency in comparison with Alexnet and classifier without feature selection. Experimentally, our model has better accuracy reflected by (98.65) while Alexnet has (63.5) and applied classifier without feature selection has (78.79).

翻訳日:2022-11-04 06:48:35 公開日:2020-07-31

# 車両計数のための物体検出と追尾アルゴリズム:比較分析

Object Detection and Tracking Algorithms for Vehicle Counting: A Comparative Analysis ( http://arxiv.org/abs/2007.16198v1 )

ライセンス: Link先を確認

Vishal Mandal and Yaw Adu-Gyamfi

(参考訳) ディープラーニングとハイパフォーマンスコンピューティングの分野における急速な進歩は、ビデオベースの車両計数システムの範囲を大きく拡大した。本稿では,関心領域(ROI)の異なる車両のクラスを検出し,追跡するために,アートオブジェクトの検出と追跡アルゴリズムのいくつかの状態をデプロイする。 ROIにおける車両の正確な検出と追跡の目標は、正確な車両数を得ることである。異なるトラッキングシステムと組み合わせたオブジェクト検出モデルの複数組み合わせを、最良の車両カウントフレームワークへのアクセスに適用する。モデルでは、異なる気象条件、閉塞、低照度設定に関連する課題に対処し、計算に富んだトレーニングとフィードバックサイクルを通じて車両情報や軌道を効率的に抽出する。ルイジアナ州交通開発局から得られた9時間以上の交通映像データの手動で集計した地上情報と比較し、すべてのモデルの組み合わせから得られた自動車両数を検証した。実験の結果、センタネットとディープソート、ディテクトロン2とディープソート、そしてヨーロフ4とディープソートの組み合わせは、全車両で最高の総計率を生み出した。

The rapid advancement in the field of deep learning and high performance computing has highly augmented the scope of video based vehicle counting system. In this paper, the authors deploy several state of the art object detection and tracking algorithms to detect and track different classes of vehicles in their regions of interest (ROI). The goal of correctly detecting and tracking vehicles' in their ROI is to obtain an accurate vehicle count. Multiple combinations of object detection models coupled with different tracking systems are applied to access the best vehicle counting framework. The models' addresses challenges associated to different weather conditions, occlusion and low-light settings and efficiently extracts vehicle information and trajectories through its computationally rich training and feedback cycles. The automatic vehicle counts resulting from all the model combinations are validated and compared against the manually counted ground truths of over 9 hours' traffic video data obtained from the Louisiana Department of Transportation and Development. Experimental results demonstrate that the combination of CenterNet and Deep SORT, Detectron2 and Deep SORT, and YOLOv4 and Deep SORT produced the best overall counting percentage for all vehicles.

翻訳日:2022-11-04 06:48:05 公開日:2020-07-31

# 車両検出・追跡のための視覚注意手がかりの活用

Utilising Visual Attention Cues for Vehicle Detection and Tracking ( http://arxiv.org/abs/2008.00106v1 )

ライセンス: Link先を確認

Feiyan Hu, Venkatesh G M, Noel E. O'Connor, Alan F. Smeaton and Suzanne Little

(参考訳) Advanced Driver-Assistance Systems (ADAS)は多くの研究者から注目を集めている。視覚ベースのセンサーは、運転中に人間の視覚行動をエミュレートする最も近い方法だ。本稿では,物体検出と追跡に視覚的注意(saliency)を利用する方法について検討する。調査します 1) 2段階物体検出装置において,<emph{subjectness} 注目度マップや,<emph{objectness} 注目度マップなどの視覚的注意度マップが,領域提案生成を容易にするか。 2)複数の物体の追跡に視覚的注意マップをどのように利用できるか。本稿では,物体を同時に検出し,対象性と主観性マップを生成し,計算力を節約するニューラルネットワークを提案する。さらに,逐次モンテカルロ確率仮説密度(phd)フィルタを用いて追跡中に視覚注意マップを活用した。実験はKITTIとDETRACのデータセットを用いて行われた。視覚的注意と階層的特徴の使用により、オブジェクト検出における$\approx$8\%が大幅に改善され、KITTIデータセット上で$\approx$4\%のトラッキング性能が向上した。

Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a \emph{subjectness} attention or saliency map and an \emph{objectness} attention map can facilitate region proposal generation in a 2-stage object detector; 2) How a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects as and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of $\approx$8\% in object detection which effectively increased tracking performance by $\approx$4\% on KITTI dataset.

翻訳日:2022-11-04 06:46:52 公開日:2020-07-31

# Prolog-based Dialog Engineを用いたインタラクティブテキストグラフマイニング

Interactive Text Graph Mining with a Prolog-based Dialog Engine ( http://arxiv.org/abs/2008.00956v1 )

ライセンス: Link先を確認

Paul Tarau and Eduardo Blanco

(参考訳) ニューラルネットワークベースの依存性パーサとグラフベースの自然言語処理モジュールの上に、テキスト文書から抽出されたランキングファクトデータベースを対話的に探索するPrologベースのダイアログエンジンを設計する。依存グラフを再編成し,文の最も関連性の高い要素に着目し,文識別子をグラフノードとして統合する。さらに、グラフをランク付けした後、依存リンクとWordNetが主観動詞オブジェクト、is-a、part-of関係という形でもたらす暗黙のセマンティック情報を利用する。 Prologの事実とその推測結果に基づいて、ダイアログエンジンはクエリに関するテキストグラフを専門とし、ドキュメントの最も関連性の高いコンテンツ要素をインタラクティブに公開する。統合システムのオープンソースコードはhttps://github.com/ptarau/DeepRank で公開されている。論理プログラミングの理論と実践(tplp)における考察。

On top of a neural network-based dependency parser and a graph-based natural language processing module we design a Prolog-based dialog engine that explores interactively a ranked fact database extracted from a text document. We reorganize dependency graphs to focus on the most relevant content elements of a sentence and integrate sentence identifiers as graph nodes. Additionally, after ranking the graph we take advantage of the implicit semantic information that dependency links and WordNet bring in the form of subject-verb-object, is-a and part-of relations. Working on the Prolog facts and their inferred consequences, the dialog engine specializes the text graph with respect to a query and reveals interactively the document's most relevant content elements. The open-source code of the integrated system is available at https://github.com/ptarau/DeepRank . Under consideration in Theory and Practice of Logic Programming (TPLP).

翻訳日:2022-11-04 06:46:32 公開日:2020-07-31

# ニューラルネットワークの変性と脳との関係

Neural Network Degeneration and its Relationship to the Brain ( http://arxiv.org/abs/2008.00053v1 )

ライセンス: Link先を確認

Jacob Adamczyk

(参考訳) 本稿では,脳の小さな部分としてのニューラルネットワーク(NN)の適用について述べる。生物学的コネクトームを表すネットワークは、空間的にも時間的にも変化する。ここで適用される劣化技法は「重み劣化」、「重みスクランブル」、「可変活性化機能」である。これらの方法は、アルツハイマー病、ハンティントン病、パーキンソン病、脳卒中や脳腫瘍などの神経変性疾患の研究に光を当てることを目的としている。メモリ損失と一般化学習障害に対する基本的な洞察は、ネットワーク劣化時のネットワークのエラー関数を監視することによって得られる。各面の生物学的意義についても論じる。

This report discusses the application of neural networks (NNs) as small segments of the brain. The networks representing the biological connectome are altered both spatially and temporally. The degradation techniques applied here are "weight degradation", "weight scrambling", and variable activation function. These methods aim to shine light on the study of neurodegenerative diseases such as Alzheimer's, Huntington's and Parkinson's disease as well as strokes and brain tumors disrupting the flow of information in the brain's network. Fundamental insights to memory loss and generalized learning dysfunction are gained by monitoring the network's error function during network degradation. The biological significance of each facet is also discussed.

翻訳日:2022-11-04 06:39:46 公開日:2020-07-31

# lemma:マルチエージェントマルチタスクアクティビティを学習するためのマルチビューデータセット

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities ( http://arxiv.org/abs/2007.15781v1 )

ライセンス: Link先を確認

Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-chun Zhu

(参考訳) 人間の行動を理解し解釈することは長年の挑戦であり、人工知能における知覚の重要な指標である。しかし、ゴール指向アクション、同時マルチタスク、マルチエージェント間のコラボレーションなど、日常的な活動の衝動的な要素は、これまでの文献ではほとんど失われている。補題データセットを導入して,これらの欠落した次元に対して,細心の注意を払って設計した設定で対処するための単一のホームを提供し,異なる学習目標を強調するためにタスクやエージェントの数が異なる。我々は、人間と物体の相互作用による原子間相互作用を密に注釈し、日常の活動の構成性、スケジューリング、割り当ての土台として提供する。さらに,合成行動認識と行動/タスク予測ベンチマークをベースラインモデルで作成し,構成行動理解と時間的推論の能力を測定する。この取り組みにより、マシンビジョンコミュニティは、目標指向の人間活動を調べ、現実世界におけるタスクのスケジューリングと割り当てをさらに研究できることを期待します。

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence. However, a few imperative components of daily human activities are largely missed in prior literature, including the goal-directed actions, concurrent multi-tasks, and collaborations among multi-agents. We introduce the LEMMA dataset to provide a single home to address these missing dimensions with meticulously designed settings, wherein the number of tasks and agents varies to highlight different learning objectives. We densely annotate the atomic-actions with human-object interactions to provide ground-truths of the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability of compositional action understanding and temporal reasoning. We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.

翻訳日:2022-11-04 06:39:35 公開日:2020-07-31

# AR-Net:効果的な行動認識のための適応フレーム分解能

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition ( http://arxiv.org/abs/2007.15796v1 )

ライセンス: Link先を確認

Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, and Rogerio Feris

(参考訳) 行動認識はコンピュータビジョンにおいてオープンかつ挑戦的な問題である。現在の最先端モデルは優れた認識結果を提供するが、その計算費用は現実世界の多くのアプリケーションに対する影響を制限する。本稿では,提案手法であるar-net(adaptive resolution network,適応解像度ネットワーク)を提案する。具体的には、映像フレームを与えられた場合、アクション認識モデルによる処理にどの入力解像度を使用するべきかを、精度と効率の両立を目標としてポリシーネットワークを用いて決定する。標準バックプロパゲーションを用いた認識モデルと協調してポリシーネットワークを効率的に訓練する。いくつかの挑戦的行動認識ベンチマークデータセットに関する広範な実験は、最先端手法に対する提案手法の有効性をよく示している。プロジェクトページはhttps://mengyuest.github.io/AR-Netにある。

Action recognition is an open and challenging problem in computer vision. While current state-of-the-art models offer excellent recognition results, their computational expense limits their impact for many real-world applications. In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition in long untrimmed videos. Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on several challenging action recognition benchmark datasets well demonstrate the efficacy of our proposed approach over state-of-the-art methods. The project page can be found at https://mengyuest.github.io/AR-Net

翻訳日:2022-11-04 06:39:17 公開日:2020-07-31

# ETH-XGaze:極端ヘッドポーズにおける注視推定のための大規模データセットと注視変動

ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation ( http://arxiv.org/abs/2007.15837v1 )

ライセンス: Link先を確認

Xucong Zhang and Seonwook Park and Thabo Beeler and Derek Bradley and Siyu Tang and Otmar Hilliges

(参考訳) 視線推定はコンピュータビジョン、人間のコンピュータインタラクション、ロボット工学の多くの応用における基本的なタスクである。多くの最先端のメソッドはカスタムデータセット上でトレーニングされ、テストされるため、メソッド間の比較が困難になる。さらに、既存の視線推定データセットは、頭部ポーズと視線変動が制限されており、異なるプロトコルとメトリクスを用いて評価を行う。本稿では,頭部の極端な姿勢下での視線の異なる100万以上の高解像度画像からなる,eth-xgazeと呼ばれる新しい視線推定データセットを提案する。このデータセットは,18台のデジタルslrカメラと調整可能な照明条件を含むカスタムハードウェアセットアップと,地上の真理観測目標を記録する校正システムを用いて,110名の参加者から収集した。我々のデータセットは、異なる頭部ポーズと視線角度で視線推定手法の堅牢性を大幅に改善できることを示す。さらに,ETH-XGazeの標準化された実験プロトコルと評価基準を定義し,今後の視線推定研究を統一する。データセットとベンチマークのWebサイトはhttps://ait.ethz.ch/projects/2020/ETH-XGazeで公開されている。

Gaze estimation is a fundamental task in many applications of computer vision, human computer interaction and robotics. Many state-of-the-art methods are trained and tested on custom datasets, making comparison across methods challenging. Furthermore, existing gaze estimation datasets have limited head pose and gaze variations, and the evaluations are conducted using different protocols and metrics. In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. We collect this dataset from 110 participants with a custom hardware setup including 18 digital SLR cameras and adjustable illumination conditions, and a calibrated system to record ground truth gaze targets. We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles. Additionally, we define a standardized experimental protocol and evaluation metric on ETH-XGaze, to better unify gaze estimation research going forward. The dataset and benchmark website are available at https://ait.ethz.ch/projects/2020/ETH-XGaze

翻訳日:2022-11-04 06:38:50 公開日:2020-07-31

# 深層ニューラルネットワークの特徴可視化のための塩分駆動クラスインプレッション

Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks ( http://arxiv.org/abs/2007.15861v1 )

ライセンス: Link先を確認

Sravanti Addepalli, Dipesh Tamboli, R. Venkatesh Babu, Biplab Banerjee

(参考訳) 本稿では,各クラスのインプレッションを分類器のメモリから抽出するデータフリーな手法を提案する。ディープラーニングのレジームは、トレーニングデータから特定のクラスの異なるパターン(あるいは特徴)を抽出するように分類器に権限を与えます。これらのモデルをクリティカルなアプリケーションにデプロイする前に、分類に不可欠なと思われる機能を視覚化するのが有利である。既存の可視化手法は,背景特徴と前景特徴の両方からなる高信頼画像を生成する。これにより、あるクラスの重要な機能が何であるかを判断するのは難しい。本研究では,与えられたタスクにおいて最も重要な識別的特徴を視覚化するための,サリエンシー駆動手法を提案する。既存のメソッドのもう一つの欠点は、生成されたビジュアライゼーションの信頼性が、与えられたクラスの複数のインスタンスを作成することで高まることである。我々は,画像ごとの単一オブジェクトの開発にアルゴリズムを制限し,信頼性の高い特徴を抽出し,その結果の可視化を向上する。さらに,2つ以上のクラスの自然な融合画像として,否定画像の生成を実証する。

In this paper, we propose a data-free method of extracting Impressions of each class from the classifier's memory. The Deep Learning regime empowers classifiers to extract distinct patterns (or features) of a given class from training data, which is the basis on which they generalize to unseen data. Before deploying these models on critical applications, it is advantageous to visualize the features considered to be essential for classification. Existing visualization methods develop high confidence images consisting of both background and foreground features. This makes it hard to judge what the crucial features of a given class are. In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task. Another drawback of existing methods is that confidence of the generated visualizations is increased by creating multiple instances of the given class. We restrict the algorithm to develop a single object per image, which helps further in extracting features of high confidence and also results in better visualizations. We further demonstrate the generation of negative images as naturally fused images of two or more classes.

翻訳日:2022-11-04 06:38:14 公開日:2020-07-31

# DynaMiTe:ロバストリアルタイム特徴マッチングのための時間制約付き動的局所運動モデル

DynaMiTe: A Dynamic Local Motion Model with Temporal Constraints for Robust Real-Time Feature Matching ( http://arxiv.org/abs/2007.16005v1 )

ライセンス: Link先を確認

Patrick Ruhkamp and Ruiqi Gong and Nassir Navab and Benjamin Busam

(参考訳) 特徴量に基づくビジュアルオドメトリーとSLAM法では,リアルタイムに正確なカメラポーズ推定を行うために,連続した画像フレーム間の正確な対応が求められている。現在の特徴マッチングパイプラインは、特徴抽出器の記述能力にのみ依存するか、計算的に複雑な最適化スキームを必要とする。本稿では,ディスクリプタ入力に非依存な軽量パイプラインDynaMiTeを提案する。この手法の理論的バックボーンは、特徴マッチングの確率的定式化と、物理的動機づけのある制約の研究にある。動的適応可能な局所運動モデルは、効率的なデータ構造に特徴群をカプセル化する。時間的制約は局所運動モデルの情報を時間的に伝達するので、マッチングの検索空間の複雑さも軽減する。 dynamiteは、高いフレームレートでマッチング精度とカメラポーズ推定の両面で優れた結果を達成し、計算効率は高く、最先端のマッチング手法よりも優れている。

Feature based visual odometry and SLAM methods require accurate and fast correspondence matching between consecutive image frames for precise camera pose estimation in real-time. Current feature matching pipelines either rely solely on the descriptive capabilities of the feature extractor or need computationally complex optimization schemes. We present the lightweight pipeline DynaMiTe, which is agnostic to the descriptor input and leverages spatial-temporal cues with efficient statistical measures. The theoretical backbone of the method lies within a probabilistic formulation of feature matching and the respective study of physically motivated constraints. A dynamically adaptable local motion model encapsulates groups of features in an efficient data structure. Temporal constraints transfer information of the local motion model across time, thus additionally reducing the search space complexity for matching. DynaMiTe achieves superior results both in terms of matching accuracy and camera pose estimation with high frame rates, outperforming state-of-the-art matching methods while being computationally more efficient.

翻訳日:2022-11-04 06:37:56 公開日:2020-07-31

# 自動運転車の交通制御ジェスチャー認識

Traffic Control Gesture Recognition for Autonomous Vehicles ( http://arxiv.org/abs/2007.16072v1 )

ライセンス: Link先を確認

Julian Wiederer, Arij Bouazizi, Ulrich Kressel and Vasileios Belagiannis

(参考訳) 車の運転手は、交通官のジェスチャーに反応する方法を知っています。道路交通制御のジェスチャー認識機能がない限り、これは自動運転車には当てはまらないことは明らかだ。本研究では、交通制御ジェスチャー認識のための学習データを提供するため、既存の自動運転データセットの制限に対処する。本稿では3Dボディスケルトン入力に基づくデータセットを導入し,時間ステップ毎に交通制御のジェスチャー分類を行う。私たちのデータセットは、複数のアクターによる250のシーケンスで構成されています。このデータセットを評価するために,再帰的ネットワーク,注意機構,時間的畳み込みネットワーク,グラフ畳み込みネットワークなどのディープニューラルネットワークに基づく8つの逐次処理モデルを提案する。我々は、データセットに対する全てのアプローチの広範な評価と分析、および現実世界の定量的評価について述べる。コードとデータセットが公開されている。

A car driver knows how to react on the gestures of the traffic officers. Clearly, this is not the case for the autonomous vehicle, unless it has road traffic control gesture recognition functionalities. In this work, we address the limitation of the existing autonomous driving datasets to provide learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body skeleton input to perform traffic control gesture classification on every time step. Our dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence. To evaluate our dataset, we propose eight sequential processing models based on deep neural networks such as recurrent networks, attention mechanism, temporal convolutional networks and graph convolutional networks. We present an extensive evaluation and analysis of all approaches for our dataset, as well as real-world quantitative evaluation. The code and dataset is publicly available.

翻訳日:2022-11-04 06:37:40 公開日:2020-07-31

# 医用画像分類のための畳み込みニューラルネットワークにおける新しいグローバル空間注意機構

A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification ( http://arxiv.org/abs/2007.15897v1 )

ライセンス: Link先を確認

Linchuan Xu, Jun Huang, Atsushi Nitanda, Ryo Asaoka, Kenji Yamanishi

(参考訳) 画像分類を含む視覚的タスクのパフォーマンスと解釈性を改善するために、畳み込みニューラルネットワーク(CNN)に空間的注意が導入された。空間的注意の本質は、同じ層またはチャネル内でアクティベーションの相対的重要性を表す重みマップを学ぶことである。既存の注意のメカニズムはすべて、重みマップが画像に特有であるという意味で局所的な注意である。しかし, 医療分野では, 画像の集合が同一対象と同一の症状を記録し, 同一の構造的内容を共有するため, すべての画像が同じ重みマップを共有する必要がある場合がある。本稿では,医療画像の分類を主目的とし,cnnにおける新たな空間的注目機構を提案する。グローバルウェイトマップは重要なピクセルと重要でないピクセルの間の決定境界によってインスタンス化される。また,画素内のすべての画像の強度が画素の特徴であるバイナリ分類器によって決定境界を実現することを提案する。バイナリ分類は画像分類CNNに統合され、CNNと共に最適化される。 2つの医用画像データセットと1つの表情データセットの実験により、googlenet, vgg, resnet, densenetの4つの強力なcnnの性能向上だけでなく、有意義な出席領域も得られ、ドメインのイメージの内容を理解するのに有用であることが示された。

Spatial attention has been introduced to convolutional neural networks (CNNs) for improving both their performance and interpretability in visual tasks including image classification. The essence of the spatial attention is to learn a weight map which represents the relative importance of activations within the same layer or channel. All existing attention mechanisms are local attentions in the sense that weight maps are image-specific. However, in the medical field, there are cases that all the images should share the same weight map because the set of images record the same kind of symptom related to the same object and thereby share the same structural content. In this paper, we thus propose a novel global spatial attention mechanism in CNNs mainly for medical image classification. The global weight map is instantiated by a decision boundary between important pixels and unimportant pixels. And we propose to realize the decision boundary by a binary classifier in which the intensities of all images at a pixel are the features of the pixel. The binary classification is integrated into an image classification CNN and is to be optimized together with the CNN. Experiments on two medical image datasets and one facial expression dataset showed that with the proposed attention, not only the performance of four powerful CNNs which are GoogleNet, VGG, ResNet, and DenseNet can be improved, but also meaningful attended regions can be obtained, which is beneficial for understanding the content of images of a domain.

翻訳日:2022-11-04 06:31:30 公開日:2020-07-31

# リモートセンシングのためのニューラルスタイル転送

Neural Style Transfer for Remote Sensing ( http://arxiv.org/abs/2007.15920v1 )

ライセンス: Link先を確認

Maria Karatzoglidi, Georgios Felekis and Eleni Charou

(参考訳) Leon A. Gatys らの論文 "A Neural Algorithm of Artistic Style" で概説された有名なテクニックは、学術文学と産業応用の両方においてトレンドとなっている。 Neural Style Transfer (NST)は、2D画像の芸術的スタイリング、ユーザ支援作成ツール、エンターテイメントアプリケーションのための制作ツールなど、幅広い用途に欠かせないツールである。本研究の目的は,NSTアルゴリズムに基づく衛星画像から芸術地図を作成する方法を提案することである。この方法は3つの基本的なステップを含む (i)衛星画像における意味的画像分割の適用、その内容の分類(陸水) (二)各クラス及びクラスに対するニューラルスタイル転送の適用 (iii)コラージュ、すなわち、前段の2つの様式化された画像の組み合わせからなる芸術的画像の作成。

The well-known technique outlined in the paper of Leon A. Gatys et al., A Neural Algorithm of Artistic Style, has become a trending topic both in academic literature and industrial applications. Neural Style Transfer (NST) constitutes an essential tool for a wide range of applications, such as artistic stylization of 2D images, user-assisted creation tools and production tools for entertainment applications. The purpose of this study is to present a method for creating artistic maps from satellite images, based on the NST algorithm. This method includes three basic steps (i) application of semantic image segmentation on the original satellite image, dividing its content into classes (i.e. land, water), (ii) application of neural style transfer for each class and (iii) creation of a collage, i.e. an artistic image consisting of a combination of the two stylized image generated on the previous step.

翻訳日:2022-11-04 06:31:07 公開日:2020-07-31

# 3D検出ネットワークを用いた乳房超音波自動診断

Computer-aided Tumor Diagnosis in Automated Breast Ultrasound using 3D Detection Network ( http://arxiv.org/abs/2007.16133v1 )

ライセンス: Link先を確認

Junxiong Yu, Chaoyu Chen, Xin Yang, Yi Wang, Dan Yan, Jianxing Zhang, Dong Ni

(参考訳) 自動乳房超音波(ABUS)は、乳がんの診断と診断のための新しい将来性のある画像モダリティであり、直感的な3D情報と診断価値の高い冠動脈平面情報を提供することができる。しかし、ABUS画像から腫瘍を手動でスクリーニング・診断することは非常に時間がかかり、異常の見落としが生じる可能性がある。そこで本研究では, 病変部位を同定し, 良性腫瘍, 悪性腫瘍と分類するための新しい2段階3D検出ネットワークを提案する。具体的には,abus画像中の病変を同定するために,頻繁に使用されるセグメンテーションネットワークではなく,3次元検出ネットワークを提案する。新しい類似性損失は、病変と背景を効果的に区別するように設計されている。次に、検出された病変を良性または悪性と識別する分類ネットワークを用いる。分類タスクと局所化タスクの相関を改善するために,IoUバランスの取れた分類損失を採用する。良性腫瘍145例,悪性腫瘍273例の418例を対象に,本ネットワークの有効性を検証した。実験により, ネットワークの感度は97.66%, 1.23偽陽性 (FPs) であり, 曲線(AUC) 値0.8720以下の領域を有することがわかった。

Automated breast ultrasound (ABUS) is a new and promising imaging modality for breast cancer detection and diagnosis, which could provide intuitive 3D information and coronal plane information with great diagnostic value. However, manually screening and diagnosing tumors from ABUS images is very time-consuming and overlooks of abnormalities may happen. In this study, we propose a novel two-stage 3D detection network for locating suspected lesion areas and further classifying lesions as benign or malignant tumors. Specifically, we propose a 3D detection network rather than frequently-used segmentation network to locate lesions in ABUS images, thus our network can make full use of the spatial context information in ABUS images. A novel similarity loss is designed to effectively distinguish lesions from background. Then a classification network is employed to identify the located lesions as benign or malignant. An IoU-balanced classification loss is adopted to improve the correlation between classification and localization task. The efficacy of our network is verified from a collected dataset of 418 patients with 145 benign tumors and 273 malignant tumors. Experiments show our network attains a sensitivity of 97.66% with 1.23 false positives (FPs), and has an area under the curve(AUC) value of 0.8720.

翻訳日:2022-11-04 06:30:52 公開日:2020-07-31

# 自己教師付き学習による臨床脳波信号の構造解明

Uncovering the structure of clinical EEG signals with self-supervised learning ( http://arxiv.org/abs/2007.16104v1 )

ライセンス: Link先を確認

Hubert Banville, Omar Chehab, Aapo Hyv\"arinen, Denis-Alexander Engemann, Alexandre Gramfort

(参考訳) 目的。教師付き学習パラダイムは、しばしば利用可能なラベル付きデータの量によって制限される。この現象は脳波(EEG)などの臨床関連データに特に問題があり、専門的な専門知識や人的処理時間の観点からラベル付けに費用がかかる。その結果、脳波データで学習するために設計されたディープラーニングアーキテクチャは、従来の機能ベースアプローチとよく似た、比較的浅いモデルとパフォーマンスを生み出した。しかし、ほとんどの状況では、ラベルのないデータは豊富に利用できる。このラベルのないデータから情報を抽出することで、ラベルへのアクセスが制限されているにもかかわらず、ディープニューラルネットワークで競合性能に達することができるかもしれない。アプローチ。脳波信号の表現を学習するために,ラベルのないデータの構造を発見するための有望な手法である自己教師学習(SSL)について検討した。具体的には,脳波に基づく睡眠ステージングと病理診断という2つの臨床関連課題に対して,時間的文脈予測と対比的予測符号化に基づく2つの課題を検討した。何千もの録音を伴う2つの大規模公開データセットの実験を行い、ベースライン比較を行った。主な結果。 SSLで学習した機能に基づいてトレーニングされた線形分類器は、低ラベルのデータレギュレーションで純粋に監視されたディープニューラルネットワークよりも優れ、すべてのラベルが利用可能になった時に競争力のあるパフォーマンスを達成した。さらに,各手法で得られた埋込みは,年齢効果などの生理現象や臨床現象に関連する明らかな潜伏構造を示した。重要なこと。脳波データに対する自己教師あり学習アプローチの利点を実証する。我々の結果は、SSLが脳波データのディープラーニングモデルをより広く活用する道を開くことを示唆している。

Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models and performances at best similar to those of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach. We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction as well as contrastive predictive coding on two clinically-relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results. Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled data regimes while reaching competitive performance when all labels were available. Additionally, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance. We demonstrate the benefit of self-supervised learning approaches on EEG data. Our results suggest that SSL may pave the way to a wider use of deep learning models on EEG data.

翻訳日:2022-11-04 06:30:01 公開日:2020-07-31

# 構造化宝くじを用いた食事深層音響モデル

Diet deep generative audio models with structured lottery ( http://arxiv.org/abs/2007.16170v1 )

ライセンス: Link先を確認

Philippe Esling, Ninon Devis, Adrien Bitton, Antoine Caillon, Axel Chemla--Romeu-Santos, Constance Douwes

(参考訳) ディープラーニングモデルは、ほとんどのオーディオアプリケーション分野で非常に成功したソリューションを提供している。しかし、これらのモデルの高精度さは、膨大な計算コストを犠牲にしている。この側面は、提案されたモデルの品質を評価する際に、ほとんど常に見過ごされている。しかし、モデルは複雑さを考慮せずに評価するべきではない。この側面はオーディオアプリケーションにおいて特に重要であり、リアルタイム制約のある特殊な組み込みハードウェアに大きく依存している。本稿では,深層音響モデルにおける宝くじチケット仮説を研究することにより,深層モデルが過小評価されているという最近の観測結果について述べる。この仮説は、非常に効率的な小さなサブネットワークが深いモデルに存在し、孤立して訓練された場合、より大きなモデルよりも高い精度を提供するというものである。しかし、宝くじは構造化されていないマスキングに依存するため、結果として得られるモデルはディスクサイズや推論時間に何の利益も与えない。そこで我々は,構造化トリミングを行う手法を開発した。これはグローバル選択に頼り、相互情報に基づく特定の基準を導入する必要があることを示す。まず、小型モデルが大型モデルよりも精度が高いという驚くべき結果を確認する。さらに, モデル重量の最大95%を, 精度の大幅な低下を伴わずに除去できることを示した。したがって、wavenet、sing、ddspなどの一般的な手法をまたいで、高い精度で最大100倍小さい生成音声の非常に軽量なモデルを得ることができる。 Raspberry PiとArduinoにこれらのモデルを埋め込む理論的境界について検討し、大きなGPUモデルと同じ品質のCPU上で生成モデルを得ることができることを示す。最後に,組込みプラットフォーム上での深層生成音声モデルの実装の可能性について論じる。

Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computation cost. This aspect is almost always overlooked in evaluating the quality of proposed models. However, models should not be evaluated without taking into account their complexity. This aspect is especially critical in audio applications, which heavily relies on specialized embedded hardware with real-time constraints. In this paper, we build on recent observations that deep models are highly overparameterized, by studying the lottery ticket hypothesis on deep generative audio models. This hypothesis states that extremely efficient small sub-networks exist in deep models and would provide higher accuracy than larger models if trained in isolation. However, lottery tickets are found by relying on unstructured masking, which means that resulting models do not provide any gain in either disk size or inference time. Instead, we develop here a method aimed at performing structured trimming. We show that this requires to rely on global selection and introduce a specific criterion based on mutual information. First, we confirm the surprising result that smaller models provide higher accuracy than their large counterparts. We further show that we can remove up to 95% of the model weights without significant degradation in accuracy. Hence, we can obtain very light models for generative audio across popular methods such as Wavenet, SING or DDSP, that are up to 100 times smaller with commensurate accuracy. We study the theoretical bounds for embedding these models on Raspberry Pi and Arduino, and show that we can obtain generative models on CPU with equivalent quality as large GPU models. Finally, we discuss the possibility of implementing deep generative audio models on embedded platforms.

翻訳日:2022-11-04 06:29:36 公開日:2020-07-31

# 宝くじトリミングによる超軽量深度MIR

Ultra-light deep MIR by trimming lottery tickets ( http://arxiv.org/abs/2007.16187v1 )

ライセンス: Link先を確認

Philippe Esling, Theis Bazin, Adrien Bitton, Tristan Carsault, Ninon Devis

(参考訳) 音楽情報検索における現状の成果は、主にディープラーニングのアプローチに支配されている。これらはすべてのタスクに対して前例のない精度を提供する。しかし、これらのモデルの一貫して見過ごされがちな欠点は、驚くほど複雑であり、それが成功に不可欠であるように思える。本稿では,抽選券仮説に基づくモデル刈り込み手法を提案することで,この問題に対処した。個々の重みをマスクする代わりに、ユニット全体の構造的なトリミングを通じてパラメータを明示的に削除できるように、元のアプローチを変更します。これにより,サイズやメモリ,操作数といった面で,事実上軽量なモデルが実現される。本提案は,精度を損なうことなく,最大90%のモデルパラメータを除去できることを示す。我々は、より小さな圧縮比(ネットワークの最大85%)で、より軽いモデルが、より重いモデルよりも一貫して優れているという驚くべき結果を確認した。我々はこれらの結果を,音声分類,ピッチ認識,コード抽出,ドラムの書き起こし,オンセット推定など,多数のMIRタスクで示す。 MIRの超軽量ディープラーニングモデルはCPU上で動作し、最小限の精度で組み込みデバイスに適合する。

Current state-of-the-art results in Music Information Retrieval are largely dominated by deep learning approaches. These provide unprecedented accuracy across all tasks. However, the consistently overlooked downside of these models is their stunningly massive complexity, which seems concomitantly crucial to their success. In this paper, we address this issue by proposing a model pruning method based on the lottery ticket hypothesis. We modify the original approach to allow for explicitly removing parameters, through structured trimming of entire units, instead of simply masking individual weights. This leads to models which are effectively lighter in terms of size, memory and number of operations. We show that our proposal can remove up to 90% of the model parameters without loss of accuracy, leading to ultra-light deep MIR models. We confirm the surprising result that, at smaller compression ratios (removing up to 85% of a network), lighter models consistently outperform their heavier counterparts. We exhibit these results on a large array of MIR tasks including audio classification, pitch recognition, chord extraction, drum transcription and onset estimation. The resulting ultra-light deep learning models for MIR can run on CPU, and can even fit on embedded devices with minimal degradation of accuracy.

翻訳日:2022-11-04 06:29:11 公開日:2020-07-31

# SimulEval: 同時翻訳のための評価ツールキット

SimulEval: An Evaluation Toolkit for Simultaneous Translation ( http://arxiv.org/abs/2007.16193v1 )

ライセンス: Link先を確認

Xutai Ma, Mohammad Javad Dousti, Changhan Wang, Jiatao Gu, Juan Pino

(参考訳) テキストと音声の同時翻訳は、モデルが完全なソース入力を読む前に翻訳を開始するリアルタイムおよび低レイテンシシナリオに焦点を当てる。同時翻訳モデルの評価は、レイテンシが翻訳品質に加えて考慮すべき要素であることから、オフラインモデルよりも複雑である。研究コミュニティは、同時翻訳のための新しいモデリングアプローチに重点を置いているが、現在では普遍的な評価手順を欠いている。そこで本研究では,テキストと音声の同時翻訳のための簡易かつ汎用的な評価ツールキットであるSimulEvalを提案する。サーバクライアントスキームを導入し、同時に翻訳シナリオを作成し、サーバがソース入力を送り、評価のための予測を受け取り、クライアントがカスタマイズされたポリシーを実行する。ポリシーが与えられたら、自動的に同時デコードを実行し、いくつかの一般的なレイテンシメトリクスをまとめて報告する。また、テキスト同時翻訳から音声タスクへの遅延メトリクスも適用する。さらに、SimulEvalは、システムの同時復号プロセスをよりよく理解するための可視化インターフェースを備えている。 SimulEvalはすでに、IWSLT 2020の同時音声翻訳タスクに広く使われている。コードは出版時に公開される。

Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario where the model starts translating before reading the complete source input. Evaluating simultaneous translation models is more complex than offline models because the latency is another factor to consider in addition to translation quality. The research community, despite its growing focus on novel modeling approaches to simultaneous translation, currently lacks a universal evaluation procedure. Therefore, we present SimulEval, an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation. A server-client scheme is introduced to create a simultaneous translation scenario, where the server sends source input and receives predictions for evaluation and the client executes customized policies. Given a policy, it automatically performs simultaneous decoding and collectively reports several popular latency metrics. We also adapt latency metrics from text simultaneous translation to the speech task. Additionally, SimulEval is equipped with a visualization interface to provide better understanding of the simultaneous decoding process of a system. SimulEval has already been extensively used for the IWSLT 2020 shared task on simultaneous speech translation. Code will be released upon publication.

翻訳日:2022-11-04 06:28:33 公開日:2020-07-31

# ランキング指向推薦システムグラフの埋め込み

Embedding Ranking-Oriented Recommender System Graphs ( http://arxiv.org/abs/2007.16173v1 )

ライセンス: Link先を確認

Taher Hekmatfar, Saman Haratizadeh, Sama Goliaei

(参考訳) グラフベースレコメンダシステム(grss)は、データのグラフィカル表現における構造情報を解析し、特に直接ユーザ・テーマ関係データがスパースである場合に、より優れたレコメンデーションを行う。主要なレコメンデーションシステムを構成するランク指向のGRSは、主にノードの類似度を測定するために好み(またはランク)データのグラフィカルな表現を使い、そこから近隣のメカニズムを使ってレコメンデーションリストを推測することができる。本稿では,グラフベースの新しいランク指向推薦フレームワークであるPGRecを提案する。 PGRecは、PrefGraphと呼ばれる新しいグラフ構造によって、アイテムよりもユーザの好みをモデル化する。このグラフは、要素化と深層学習の両方の手法を利用して、ユーザ、アイテム、嗜好を表すベクトルを抽出する改良された埋め込みアプローチによって活用される。結果の埋め込みは、最終的なレコメンデーションリストが推測されるユーザの未知のペアワイズ選好を予測するために使用される。本研究では,提案手法の性能評価を行い,pgrecが,映画レンスデータセットのndcg@10において,ベースラインアルゴリズムを最大3.2%上回ることを示した。

Graph-based recommender systems (GRSs) analyze the structural information in the graphical representation of data to make better recommendations, especially when the direct user-item relation data is sparse. Ranking-oriented GRSs that form a major class of recommendation systems, mostly use the graphical representation of preference (or rank) data for measuring node similarities, from which they can infer a recommendation list using a neighborhood-based mechanism. In this paper, we propose PGRec, a novel graph-based ranking-oriented recommendation framework. PGRec models the preferences of the users over items, by a novel graph structure called PrefGraph. This graph is then exploited by an improved embedding approach, taking advantage of both factorization and deep learning methods, to extract vectors representing users, items, and preferences. The resulting embedding are then used for predicting users' unknown pairwise preferences from which the final recommendation lists are inferred. We have evaluated the performance of the proposed method against the state of the art model-based and neighborhood-based recommendation methods, and our experiments show that PGRec outperforms the baseline algorithms up to 3.2% in terms of NDCG@10 in different MovieLens datasets.

翻訳日:2022-11-04 06:22:50 公開日:2020-07-31

# 深層学習に基づく変調分類器に対する複数アンテナによる逆攻撃

Adversarial Attacks with Multiple Antennas Against Deep Learning-Based Modulation Classifiers ( http://arxiv.org/abs/2007.16204v1 )

ライセンス: Link先を確認

Brian Kim and Yalin E. Sagduyu and Tugba Erpek and Kemal Davaslioglu and Sennur Ulukus

(参考訳) 本稿では,受信者が異なる変調方式の受信者に信号を送信し,受信者が深層学習に基づく分類器を用いて受信信号の変調方式を分類する無線通信システムを考える。同時に、敵は複数のアンテナを用いて敵の摂動を送信し、分類器を騙して受信した信号を誤分類する。敵の機械学習の観点から、敵の攻撃性能を改善するために、敵の複数のアンテナを利用する方法を示す。 2つの主要なポイントは、敵の複数のアンテナ、すなわちアンテナ間の電力配分とチャネルの多様性の利用を利用して検討される。まず,一つのアンテナを持つ複数の独立した敵は,同じ総電力で複数のアンテナを持つ1つの敵に比べて攻撃性能が向上しないことを示す。そこで我々は,1つのアンテナのみに電力を割り当てたり,チャネルゲインに比例あるいは逆比例するなど,複数のアンテナ間で電力を割り当てる様々な方法を検討する。チャネルの多様性を利用して,シンボルレベルで最大チャンネル利得を有するチャネルを介して逆摂動を伝達する攻撃を提案する。この攻撃は,アンテナ間のチャネルのばらつきやチャネル相関の観点から,異なるチャネル条件下での他の攻撃と比較して,分類精度が著しく低下することを示す。また,チャネルの多様性を活かして敵攻撃を行うアンテナの数が増加するにつれて,攻撃の成功が著しく向上することを示す。

We consider a wireless communication system, where a transmitter sends signals to a receiver with different modulation types while the receiver classifies the modulation types of the received signals using its deep learning-based classifier. Concurrently, an adversary transmits adversarial perturbations using its multiple antennas to fool the classifier into misclassifying the received signals. From the adversarial machine learning perspective, we show how to utilize multiple antennas at the adversary to improve the adversarial (evasion) attack performance. Two main points are considered while exploiting the multiple antennas at the adversary, namely the power allocation among antennas and the utilization of channel diversity. First, we show that multiple independent adversaries, each with a single antenna cannot improve the attack performance compared to a single adversary with multiple antennas using the same total power. Then, we consider various ways to allocate power among multiple antennas at a single adversary such as allocating power to only one antenna, and proportional or inversely proportional to the channel gain. By utilizing channel diversity, we introduce an attack to transmit the adversarial perturbation through the channel with the largest channel gain at the symbol level. We show that this attack reduces the classifier accuracy significantly compared to other attacks under different channel conditions in terms of channel variance and channel correlation across antennas. Also, we show that the attack success improves significantly as the number of antennas increases at the adversary that can better utilize channel diversity to craft adversarial attacks.

翻訳日:2022-11-04 06:22:27 公開日:2020-07-31

# マルウェアデータの意味のあるクラスタの同定

Identifying meaningful clusters in malware data ( http://arxiv.org/abs/2008.01175v1 )

ライセンス: Link先を確認

Renato Cordeiro de Amorim and Carlos David Lopez Ruiz

(参考訳) ドライブ・バイ・ダウンのマルウェアデータに意味のあるクラスタを見つけることは特に難しい作業である。マルウェアデータは、幅広い濃度の異なる重なり合うクラスタを含む傾向にある。これは、マルウェアのサンプルの間にかなり類似している可能性があるためである(いくつかは同一の家系に属すると言われている)。クラスタリングアルゴリズムは通常、正規化されたデータセットに適用される。しかし、正規化のプロセスは、クラスタリングに類似した貢献をするために異なる範囲の値で特徴を設定することを目的としている。意味の薄いものよりも有意義な機能を好まないので、データ前処理の段階を期待すべきだろう。本稿では,上記の問題に正確に対処する手法を提案する。クラスタ間の分離を増加させることのできる反復データ前処理方法である。それぞれの機能のクラスタ内の関連度を計算し、それをデータ再スケーリングファクタとして使用します。これを収束するまで繰り返すことで、マルウェアデータはクリアなクラスタに分離され、平均的なシルエット幅が増加した。

Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations of cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, the process of normalisation aims at setting features with different range values to have a similar contribution to the clustering. It does not favour more meaningful features over those that are less meaningful, an effect one should perhaps expect of the data pre-processing stage. In this paper we introduce a method to deal precisely with the problem above. This is an iterative data pre-processing method capable of aiding to increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature, and then it uses these as a data rescaling factor. By repeating this until convergence our malware data was separated in clear clusters, leading to a higher average silhouette width.

翻訳日:2022-11-04 06:21:23 公開日:2020-07-31

# 実世界信号のクラウドソーシング音声品質評価予測のためのピラミッドリカレントネットワーク

A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals ( http://arxiv.org/abs/2007.15797v1 )

ライセンス: Link先を確認

Xuan Dong and Donald S. Williamson

(参考訳) 目的語質尺度の現実的能力は,(1)実環境を適切にモデル化しない模擬データから,(2)主観的評価と強く相関しない客観的スコアを推定することにより,制限される。さらに、リスナー品質評価を伴う現実世界の信号の大規模なデータセットは、現在存在しないため、現実世界の評価が容易になる。本稿では,人間の聞き手によって評価される実世界の音声信号の知覚的品質を収集し,予測する。まず,2つの実世界のコーパス上でクラウドソースによる聞き取り調査を行い,質の高い評価データセットを収集した。さらに、注目機構を備えたピラミッド双方向長期記憶(pBLSTM)ネットワークを用いて、人間の品質評価を予測する新しい手法を開発した。その結果,予測スコアが人的判断と強く相関する従来の評価手法よりも統計的に低い推定誤差が得られた。

The real-world capabilities of objective speech quality measures are limited since current measures (1) are developed from simulated data that does not adequately model real environments; or they (2) predict objective scores that are not always strongly correlated with subjective ratings. Additionally, a large dataset of real-world signals with listener quality ratings does not currently exist, which would help facilitate real-world assessment. In this paper, we collect and predict the perceptual quality of real-world speech signals that are evaluated by human listeners. We first collect a large quality rating dataset by conducting crowdsourced listening studies on two real-world corpora. We further develop a novel approach that predicts human quality ratings using a pyramid bidirectional long short term memory (pBLSTM) network with an attention mechanism. The results show that the proposed model achieves statistically lower estimation errors than prior assessment approaches, where the predicted scores strongly correlate with human judgments.

翻訳日:2022-11-04 06:20:48 公開日:2020-07-31

# 大規模肺炎と気胸を用いた弱監督型一段階視覚と言語疾患の検出

Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies ( http://arxiv.org/abs/2007.15778v1 )

ライセンス: Link先を確認

Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, and Daguang Xu

(参考訳) 詳細なラベルがないため、大きなデータセットにもかかわらず、医療画像における臨床関連オブジェクトの検出は困難である。ラベル問題に対処するために、自然言語情報を含む検出アーキテクチャを用いてシーンレベルのラベルを利用する。特に肺炎と気胸に焦点をあてたMIMIC-CXRデータセットに,放射線技師によるペアリングボックスと自然言語アノテーションを新たに導入した。このデータセットと合わせて,クラスアクティベーションマッピング(CAM)や勾配CAM,およびNIH ChestXray-14およびMIMIC-CXRデータセットに対する関連する実装との強力なベースライン比較とともに,弱教師付きトランスフォーマー層選択型ワンステージデュアルヘッド検出アーキテクチャ(LITERATI)を提案する。視覚言語アーキテクチャの進歩から借用したliterati法は、純粋に監督された方法でスケールする検出のために、画像と参照表現(自然言語で画像にローカライズされたオブジェクト)の入力を示す。アーキテクチャの変更は、3つの障害に対処する - 教師付き視覚と言語検出を弱教師付きで実装し、臨床参照表現自然言語情報を取り入れ、マップ確率の高い忠実度検出を生成する。それにもかかわらず、微妙な参照、マルチインスタンス仕様、比較的冗長な医療報告を含む放射線医学的アノテーションの難易度は、スケールでの視覚言語検出タスクを将来的な調査に刺激し続ける。

Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR dataset. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection that scales in a purely weakly supervised fashion. The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures the vision language detection task at scale remains stimulating for future investigation.

翻訳日:2022-11-04 06:20:34 公開日:2020-07-31

# 身体を見る: 心理的苦痛における身体の身振りと自己適応の自動分析

Looking At The Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress ( http://arxiv.org/abs/2007.15815v1 )

ライセンス: Link先を確認

Weizhe Lin, Indigo Orton, Qingbiao Li, Gabriela Pavarini, Marwa Mahmoud

(参考訳) 心理的苦痛は社会において重要かつ増大する問題である。このような苦痛の自動検出、評価、分析は、研究の活発な領域である。顔、頭、声といったモダリティと比較して、これらのタスクに身体のモダリティを使用することを研究する研究は比較的少ない。これは、利用可能なデータセットが限られていることや、有用な身体機能を自動的に抽出するのが難しいことによる。最近のポーズ推定とディープラーニングの進歩により、このモダリティとドメインに対する新しいアプローチが可能になった。そこで本研究では,短時間のインタビューや自己報告の苦難ラベルのための全身ビデオを含む新しいデータセットを収集・分析した。本研究では,自己適応者のサブセットである自己適応とフィジットを自動的に検出する新たな手法を提案する。統計的身体動作とフィジット機能の分析を行い、被験者の行動にどう影響するかを探索する。そこで本研究では,マルチモーダル・ディープ・デノイジングオートエンコーダとフィッシャーベクトルエンコーディングの改良を用いた特徴表現を組み合わせるマルチモーダル手法を提案する。提案モデルでは,自己報告型不安度と抑うつ度とをラベル付けしたデータセットにおいて,音声視覚機能と自動検出型行動手がかりを組み合わせることで,苦痛レベルを予測できることを実証した。

Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full body videos for short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels.

翻訳日:2022-11-04 06:20:03 公開日:2020-07-31

# アナカタバティック慣性:PSOのための粒子ワイド適応慣性

Anakatabatic Inertia: Particle-wise Adaptive Inertia for PSO ( http://arxiv.org/abs/2008.00979v1 )

ライセンス: Link先を確認

Sini\v{s}a Dru\v{z}eta, Stefan Ivi\'c

(参考訳) 粒子群最適化の開発を通じて、粒子慣性は可能な方法の改善を研究する方法の重要な側面として確立されてきた。先行研究の継続として, 個々の粒子の適合性向上に基づく慣性重み適応の新たな一般化手法, anakatabatic inertiaを提案する。この手法により、粒子の増大または減少に対応する各粒子の慣性重量値、すなわち粒子の昇降(アナバティック)または下降(カタバティック)運動によって条件づけられる。提案する慣性重み制御フレームワークは、cec 2014テストスイートの30のテスト機能でメタ最適化され、テストされた。提案手法は, 使用するPSO法(Standard PSOおよびTVAC-PSO)毎に4種類のアナカタバティックモデルを生成した。ベンチマーク実験の結果, anakatabatic inertiaモデルを用いた場合, 標準pso(最終フィットネス最小値が0.09桁まで低下する)と, 比較的強力なtvac-pso(最終フィットネス最小値が0.59桁まで低下する)の精度向上が, ほとんどが方法の性能に悪影響を及ぼすことなく確実に達成できることがわかった。

Throughout the course of the development of Particle Swarm Optimization, particle inertia has been established as an important aspect of the method for researching possible method improvements. As a continuation of our previous research, we propose a novel generalized technique of inertia weight adaptation based on individual particle's fitness improvement, called anakatabatic inertia. This technique allows for adapting inertia weight value for each particle corresponding to the particle's increasing or decreasing fitness, i.e. conditioned by particle's ascending (anabatic) or descending (katabatic) movement. The proposed inertia weight control framework was metaoptimized and tested on the 30 test functions of the CEC 2014 test suite. The conducted procedure produced four anakatabatic models, two for each of the PSO methods used (Standard PSO and TVAC-PSO). The benchmark testing results show that using the proposed anakatabatic inertia models reliably yield moderate improvements in accuracy of Standard PSO (final fitness minimum reduced up to 0.09 orders of magnitude) and rather strong improvements for TVAC-PSO (final fitness minimum reduced up to 0.59 orders of magnitude), mostly without any adverse effects on the method's performance.

翻訳日:2022-11-04 06:13:30 公開日:2020-07-31

# トロイの木馬ニューラルネットワークの実用的検出:データ制限とデータフリーケース

Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases ( http://arxiv.org/abs/2007.15802v1 )

ライセンス: Link先を確認

Ren Wang, Gaoyuan Zhang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong, Meng Wang

(参考訳) トレーニングデータが悪質に改ざんされた場合、取得したディープニューラルネットワーク(DNN)の予測は、トロイの木馬攻撃(または中毒バックドア攻撃)と呼ばれる敵によって操作できる。トロイの木馬攻撃に対するDNNの堅牢性の欠如は、下流アプリケーションにおけるリアルタイム機械学習(ML)システムに大きなダメージを与える可能性がある。本稿では,データカース方式におけるトロイの木馬ネットワーク(trojannet)検出の問題点について検討する。まず,データ限定型TrojanNet検出器(TND)を提案する。トロイの木馬攻撃と,各サンプル攻撃,全サンプルユニバーサル攻撃を含む予測・回避攻撃の関連を探索することにより,有効なデータ制限型tndを確立できることを示す。さらに,データサンプルにアクセスせずにTrojanNetを検出できるデータフリーTNDを提案する。このようなTNDは、ランダムノイズ入力においてもトロイの木馬の挙動を示す隠れニューロンの内部応答を利用して構築可能であることを示す。提案手法の有効性は, CIFAR-10, GTSRB, ImageNetなど, 異なるモデルアーキテクチャおよびデータセット下での広範な実験により評価される。

When the training data are maliciously tampered, the predictions of the acquired deep neural network (DNN) can be manipulated by an adversary known as the Trojan attack (or poisoning backdoor attack). The lack of robustness of DNNs against Trojan attacks could significantly harm real-life machine learning (ML) systems in downstream applications, therefore posing widespread concern to their trustworthiness. In this paper, we study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime, where only the weights of a trained DNN are accessed by the detector. We first propose a data-limited TrojanNet detector (TND), when only a few data samples are available for TrojanNet detection. We show that an effective data-limited TND can be established by exploring connections between Trojan attack and prediction-evasion adversarial attacks including per-sample attack as well as all-sample universal attack. In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples. We show that such a TND can be built by leveraging the internal response of hidden neurons, which exhibits the Trojan behavior even at random noise inputs. The effectiveness of our proposals is evaluated by extensive experiments under different model architectures and datasets including CIFAR-10, GTSRB, and ImageNet.

翻訳日:2022-11-04 06:12:23 公開日:2020-07-31

# 深い直接的可能性のノックオフ

Deep Direct Likelihood Knockoffs ( http://arxiv.org/abs/2007.15835v1 )

ライセンス: Link先を確認

Mukund Sudarshan, Wesley Tansey, Rajesh Ranganath

(参考訳) 予測モデリングでは、ディープニューラルネットワークなどのブラックボックス機械学習手法を使用して最先端のパフォーマンスを実現することが多い。科学的領域では、科学者は予測を行うのにどの特徴が実際に重要なのかを知りたがることが多い。これらの発見は、コストのかかるフォローアップ実験につながる可能性があり、発見に対するエラー率があまり高くないことが重要である。 model-xのノックオフにより、fdrを制御して重要な機能を発見できる。しかし、ノックオフには、いわゆる"swap"プロパティに準拠しながら、ノックオフ機能を正確にモデル化できるリッチな生成モデルが必要である。我々は、ノックオフスワップ特性がもたらすKLの発散を直接最小化するDeep Direct Likelihood Knockoffs (DDLK) を開発した。 DDLKは、まず特徴の明示的な可能性を最大化し、次に特徴とノックオフの結合分布とそれらのスワップ間のKLのばらつきを最小化する。生成したノックオフが任意のスワップで有効であることを保証するため、DDLKはGumbel-Softmaxトリックを使用して、最悪のスワップでノックオフジェネレータを最適化する。 DDLKはベースラインよりも高いパワーを持ち、COVID-19の震源の1つである大規模なデータセットを含む様々な合成および実際のベンチマークでの偽発見率を制御する。

Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the FDR. However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called "swap" property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving a large dataset from one of the epicenters of COVID-19.

翻訳日:2022-11-04 06:11:42 公開日:2020-07-31

# 再利用距離の学習

Learning Forward Reuse Distance ( http://arxiv.org/abs/2007.15859v1 )

ライセンス: Link先を確認

Pengcheng Li, Yongbin Gu

(参考訳) キャッシング技術は、コンピュータアーキテクチャにおけるwebキャッシュからインフラストラクチャ、memcached、メモリキャッシュに至るまで、クラウドコンピューティングの時代において広く使われている。キャッシュされたデータの予測は、キャッシュ管理とパフォーマンスを大幅に改善する。近年のディープラーニング技術の進歩は、新しいインテリジェントキャッシュ置換ポリシーの設計を可能にする。本研究では,将来のデータアクセスを予測する学習支援手法を提案する。 LSTMに基づく強力なリカレントニューラルネットワークモデルにより,キャッシュトレースのみを入力として,高い予測精度が得られることがわかった。高い精度は、慎重に作られたローカリティ駆動の機能設計から得られる。高い予測精度に触発され、擬似PTポリシーを提案し、Microsoft Researchから13の現実世界のストレージワークロードに対して評価する。その結果、新しいキャッシュポリシーは、最先端の実用的なポリシーを最大19.2%改善し、平均でオプトより2.3%高いミス率しか発生しないことがわかった。

Caching techniques are widely used in the era of cloud computing from applications, such as Web caches to infrastructures, Memcached and memory caches in computer architectures. Prediction of cached data can greatly help improve cache management and performance. The recent advancement of deep learning techniques enables the design of novel intelligent cache replacement policies. In this work, we propose a learning-aided approach to predict future data accesses. We find that a powerful LSTM-based recurrent neural network model can provide high prediction accuracy based on only a cache trace as input. The high accuracy results from a carefully crafted locality-driven feature design. Inspired by the high prediction accuracy, we propose a pseudo OPT policy and evaluate it upon 13 real-world storage workloads from Microsoft Research. Results demonstrate that the new cache policy improves state-of-art practical policies by up to 19.2% and incurs only 2.3% higher miss ratio than OPT on average.

翻訳日:2022-11-04 06:11:20 公開日:2020-07-31

# 非定常問題に適用可能な最適化を用いた深層ロボット学習に向けて

Towards Deep Robot Learning with Optimizer applicable to Non-stationary Problems ( http://arxiv.org/abs/2007.15890v1 )

ライセンス: Link先を確認

Taisuke Kobayashi

(参考訳) 本稿では,d-amsgradと呼ばれる深層学習のための新しい最適化器を提案する。実世界のデータでは、ロボットのスキルを学ぶために使用するデータセットからノイズや外れ値を排除することはできない。この問題は、データをリアルタイムで収集することで学習するロボットにとって特に重要であり、手作業ではソートできない。そのため、この問題を解決するためにいくつかのノイズローバストオプティマイザが開発され、Adam Optimizationr の変種である AmsGrad は、その収束の証明を持っている。しかし、実際にはロボットのシナリオにおける学習性能は向上しない。この理由は、ほとんどのロボット学習問題は静止していないと仮定されているが、AmsGradは学習中に最大2番目の運動量を与えると仮定している。非定常問題に適応するために, 最大2次運動量を緩やかに減少させる改良版を提案する。提案するオプティマイザは,ベースラインと同じ世界的最適点に達する能力を有し,その性能はロボティクス問題におけるベースラインよりも優れていた。

This paper proposes a new optimizer for deep learning, named d-AmsGrad. In the real-world data, noise and outliers cannot be excluded from dataset to be used for learning robot skills. This problem is especially striking for robots that learn by collecting data in real time, which cannot be sorted manually. Several noise-robust optimizers have therefore been developed to resolve this problem, and one of them, named AmsGrad, which is a variant of Adam optimizer, has a proof of its convergence. However, in practice, it does not improve learning performance in robotics scenarios. This reason is hypothesized that most of robot learning problems are non-stationary, but AmsGrad assumes the maximum second momentum during learning to be stationarily given. In order to adapt to the non-stationary problems, an improved version, which slowly decays the maximum second momentum, is proposed. The proposed optimizer has the same capability of reaching the global optimum as baselines, and its performance outperformed that of the baselines in robotics problems.

翻訳日:2022-11-04 06:11:08 公開日:2020-07-31

# 機械学習のためのグラフ信号処理 : レビューと新しい視点

Graph signal processing for machine learning: A review and new perspectives ( http://arxiv.org/abs/2007.16061v1 )

ライセンス: Link先を確認

Xiaowen Dong, Dorina Thanou, Laura Toni, Michael Bronstein, Pascal Frossard

(参考訳) 大規模構造化データの効率的な表現、処理、分析、可視化、特にネットワークやグラフのような複雑なドメインに関連するものなどは、現代の機械学習において重要な問題である。グラフ信号処理(gsp)は、グラフでサポートされているデータを扱うことを目的とした信号処理モデルとアルゴリズムの活気ある分野であり、この課題に対処するために新たな研究の道を開く。本稿では、グラフフィルタや変換といったgspの概念とツールが、新しい機械学習アルゴリズムの開発にもたらしたいくつかの重要な貢献についてレビューする。特に,データ構造とリレーショナル・プライオリティの活用,データと計算効率の向上,モデル解釈可能性の向上という3つの側面に注目した。さらに, 応用数学と信号処理, 機械学習とネットワーク科学の橋渡しとなるであろうgsp技術の今後の発展について, 新たな視点を提示する。これらの異なる分野にまたがる交配は、現代における複雑なデータ分析の多くの課題を解き放つのに役立つかもしれない。

The effective representation, processing, analysis, and visualization of large-scale structured data, especially those related to complex domains such as networks and graphs, are one of the key questions in modern machine learning. Graph signal processing (GSP), a vibrant branch of signal processing models and algorithms that aims at handling data supported on graphs, opens new paths of research to address this challenge. In this article, we review a few important contributions made by GSP concepts and tools, such as graph filters and transforms, to the development of novel machine learning algorithms. In particular, our discussion focuses on the following three aspects: exploiting data structure and relational priors, improving data and computational efficiency, and enhancing model interpretability. Furthermore, we provide new perspectives on future development of GSP techniques that may serve as a bridge between applied mathematics and signal processing on one side, and machine learning and network science on the other. Cross-fertilization across these different disciplines may help unlock the numerous challenges of complex data analysis in the modern age.

翻訳日:2022-11-04 06:10:50 公開日:2020-07-31

# ディープラーニング分類器における逐次ドリフト検出

Sequential Drift Detection in Deep Learning Classifiers ( http://arxiv.org/abs/2007.16109v1 )

ライセンス: Link先を確認

Samuel Ackerman, Parijat Dube, Eitan Farchi

(参考訳) ニューラルネットワーク埋め込みを用いて,適切な逐次決定枠組み内でドリフト検出を定式化し,データドリフトの検出を行う。これにより、統計検査を繰り返し適用しながらも、誤報率の制御が可能となる。変更検出アルゴリズムは,誤報の回避と迅速な検出のトレードオフに自然に直面するため,これら2つの懸念のバランスをとるアルゴリズムの能力を評価する損失関数を導入し,一連の実験で使用する。

We utilize neural network embeddings to detect data drift by formulating the drift detection within an appropriate sequential decision framework. This enables control of the false alarm rate although the statistical tests are repeatedly applied. Since change detection algorithms naturally face a tradeoff between avoiding false alarms and quick correct detection, we introduce a loss function which evaluates an algorithm's ability to balance these two concerns, and we use it in a series of experiments.

翻訳日:2022-11-04 06:10:32 公開日:2020-07-31

# ニューラルネットワークを用いた半水中のヒーブとサージ運動の予測

Predicting heave and surge motions of a semi-submersible with neural networks ( http://arxiv.org/abs/2007.15973v1 )

ライセンス: Link先を確認

Xiaoxian Guo and Xiantao Zhang and Xinliang Tian and Xin Li and Wenyue Lu

(参考訳) 船舶や浮遊プラットフォームでのリアルタイム動き予測は、動き補償システムの性能を向上させるのに役立つ。また、移動に関して重要なオフショア作戦に便利な早期警戒情報を提供することもできる。本研究では,半潜水艇のヒーブ動作とサージ動作を予測するために,LSTMに基づく長期記憶モデルを開発した。訓練とテストデータは、中国の上海江東大学(上海市)の深海盆地で実施された模型実験から得られたものだ。動きと測定波はLSTM細胞に供給され、その後サーブラート完全連結(FC)層を通過して予測を得た。測定された波の助けを借りて、予測は平均90%近い精度で46.5秒まで伸びた。トレーニングされたモデルは、ノイズ拡張データセットを使用して、0.8までのノイズレベルで効果的に動作した。さらなるステップとして、モデルは動き自体に基づいてのみ動きを予測することができる。モデルのアーキテクチャに関するセンシティブな研究に基づいて,機械学習モデル構築のためのガイドラインを提案する。提案するLSTMモデルでは, 船体波励起運動を予測する能力が強い。

Real-time motion prediction of a vessel or a floating platform can help to improve the performance of motion compensation systems. It can also provide useful early-warning information for offshore operations that are critical with regard to motion. In this study, a long short-term memory (LSTM) -based machine learning model was developed to predict heave and surge motions of a semi-submersible. The training and test data came from a model test carried out in the deep-water ocean basin, at Shanghai Jiao Tong University, China. The motion and measured waves were fed into LSTM cells and then went through serval fully connected (FC) layers to obtain the prediction. With the help of measured waves, the prediction extended 46.5 s into future with an average accuracy close to 90%. Using a noise-extended dataset, the trained model effectively worked with a noise level up to 0.8. As a further step, the model could predict motions only based on the motion itself. Based on sensitive studies on the architectures of the model, guidelines for the construction of the machine learning model are proposed. The proposed LSTM model shows a strong ability to predict vessel wave-excited motions.

翻訳日:2022-11-04 06:04:57 公開日:2020-07-31

# 寒冷後部とアリアティック不確かさ

Cold Posteriors and Aleatoric Uncertainty ( http://arxiv.org/abs/2008.00029v1 )

ライセンス: Link先を確認

Ben Adlam, Jasper Snoek, and Samuel L. Smith

(参考訳) 近年の研究では、ベイズニューラルネットワークにおいて、検証セット(「コールド後部」効果)で後部の「温度」をチューニングすることで、正確な推論よりも優れていることが観察されている。この現象を解釈するために、ベイズニューラルネットワークでよく使われる先行は、多くの分類データセット上のラベルのアレラトリック不確かさを著しく過大評価することができると論じる。この問題は、ラベルの品質が高いMNISTやCIFARのような学術ベンチマークで特に顕著である。ガウス過程回帰の特別な場合、任意の正の温度は修正前の修正後の有効な後部に対応し、この温度を調整することは経験的ベイズと直接類似する。分類タスクでは、事前の修正と温度のチューニングの間に直接的な等価性はないが、温度の低下はトレーニングセットの既存の例をリラベルすることで、ほとんど情報を得ることができないという私たちの信念をよりよく反映するモデルにつながる可能性がある。したがって、冷えた後部は必ずしも正確な推論手順と一致しないが、それらはしばしば我々の真の以前の信念を反映していると信じている。

Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect). To help interpret this phenomenon, we argue that commonly used priors in Bayesian neural networks can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets. This problem is particularly pronounced in academic benchmarks like MNIST or CIFAR, for which the quality of the labels is high. For the special case of Gaussian process regression, any positive temperature corresponds to a valid posterior under a modified prior, and tuning this temperature is directly analogous to empirical Bayes. On classification tasks, there is no direct equivalence between modifying the prior and tuning the temperature, however reducing the temperature can lead to models which better reflect our belief that one gains little information by relabeling existing examples in the training set. Therefore although cold posteriors do not always correspond to an exact inference procedure, we believe they may often better reflect our true prior beliefs.

翻訳日:2022-11-04 06:03:54 公開日:2020-07-31

# サイクル学習率を用いた深層強化学習

Deep Reinforcement Learning using Cyclical Learning Rates ( http://arxiv.org/abs/2008.01171v1 )

ライセンス: Link先を確認

Ralf Gulde, Marc Tuscher, Akos Csiszar, Oliver Riedel and Alexander Verl

(参考訳) 深層強化学習(Dep Reinforcement Learning, DRL)法は、しばしば問題を解決するためにハイパーパラメータの微妙なチューニングに依存する。確率勾配降下(SGD)に基づく最適化手順における最も影響力のあるパラメータの1つは、学習率である。循環学習について検討し,様々なDRL問題に対する一般循環学習率の定義法を提案する。本稿では,複素DRL問題に適用した循環学習法を提案する。実験の結果,循環学習は,高度に調整された固定学習率と同等あるいはそれ以上の結果が得られることがわかった。本稿では、DRL設定における循環学習率の最初の適用例を示し、手動ハイパーパラメータチューニングの克服に向けた第一歩となる。

Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD) is the learning rate. We investigate cyclical learning and propose a method for defining a general cyclical learning rate for various DRL problems. In this paper we present a method for cyclical learning applied to complex DRL problems. Our experiments show that, utilizing cyclical learning achieves similar or even better results than highly tuned fixed learning rates. This paper presents the first application of cyclical learning rates in DRL settings and is a step towards overcoming manual hyperparameter tuning.

翻訳日:2022-11-04 06:03:12 公開日:2020-07-31

# 加速度センサを用いた歩行認識のための特徴学習

Feature Learning for Accelerometer based Gait Recognition ( http://arxiv.org/abs/2007.15958v1 )

ライセンス: Link先を確認

Szil\'ard Nemes, Margit Antal

(参考訳) 音声や物体認識などのパターンマッチングの最近の進歩は、歩行認識のための深層学習ソリューションによる特徴学習の実現を支援する。過去論文では、このタスクのために教師付き方法で訓練されたディープニューラルネットワークを評価した。本研究では,教師なしアプローチと教師なしアプローチの両方について検討した。同様のアーキテクチャをエンドツーエンドモデルとオートエンコーダに組み込んだ特徴抽出器を,歩行検証システムにおいて優れた表現を学習する能力に基づいて比較した。両方の特徴抽出器はIDNetデータセットでトレーニングされ、その後ZJU-GaitAccelデータセットで特徴抽出に使用された。その結果、オートエンコーダは、特徴学習能力に関して差別的なエンドツーエンドモデルに非常に近いこと、そして完全な畳み込みモデルは、訓練戦略に関係なく優れた特徴表現を学習できることを示した。

Recent advances in pattern matching, such as speech or object recognition support the viability of feature learning with deep learning solutions for gait recognition. Past papers have evaluated deep neural networks trained in a supervised manner for this task. In this work, we investigated both supervised and unsupervised approaches. Feature extractors using similar architectures incorporated into end-to-end models and autoencoders were compared based on their ability of learning good representations for a gait verification system. Both feature extractors were trained on the IDNet dataset then used for feature extraction on the ZJU-GaitAccel dataset. Results show that autoencoders are very close to discriminative end-to-end models with regards to their feature learning ability and that fully convolutional models are able to learn good feature representations, regardless of the training strategy.

翻訳日:2022-11-04 06:03:00 公開日:2020-07-31

# Transformer-XLを用いたソースコードの言語モデリング

Language Modelling for Source Code with Transformer-XL ( http://arxiv.org/abs/2007.15813v1 )

ライセンス: Link先を確認

Thomas Dowdell, Hongyu Zhang

(参考訳) 自然言語のテキストと同様に、ソフトウェアは「自然性」を示しており、統計言語モデルによって捉えることができる。近年,ディープラーニングによるソフトウェアの自然性を表現するために,ニューラルネットワークモデルが提案されている。本稿では,rnnモデルやtransformer-xlモデルを含む,ソースコードのための最先端ニューラルネットワークモデルの実験的評価を行う。大規模なPythonコードコーパスの実験により,Transformer-XL モデルは RNN ベースのモデル(LSTM や GRU モデルを含む)よりも計算コストがはるかに少なく,ソフトウェアの自然性を捉えることができることがわかった。

It has been found that software, like natural language texts, exhibits "naturalness", which can be captured by statistical language models. In recent years, neural language models have been proposed to represent the naturalness of software through deep learning. In this paper, we conduct an experimental evaluation of state-of-the-art neural language models for source code, including RNN-based models and Transformer-XL based models. Through experiments on a large-scale Python code corpus, we find that the Transformer-XL model outperforms RNN-based models (including LSTM and GRU models) in capturing the naturalness of software, with far less computational cost.

翻訳日:2022-11-04 06:02:12 公開日:2020-07-31

# 大規模金融コーパスによるNERの性能向上

Improving NER's Performance with Massive financial corpus ( http://arxiv.org/abs/2007.15871v1 )

ライセンス: Link先を確認

Han Zhang

(参考訳) 大きなディープニューラルネットワークのトレーニングには大量の高品質なアノテーションデータが必要ですが、時間と労力のコストは中小企業には高すぎるのです。企業名の認識タスクを,小規模かつ低品質なトレーニングデータを用いて開始し,モデルトレーニング速度の向上と最低労働コストによるパフォーマンスの予測を行う。本手法は,Albert-small や Electra-small といったエリート言語モデルの事前学習,蒸留の知識,多段階学習を含む。その結果,リコール率を20ポイント近く引き上げ,BERT-CRFモデルの4倍速くなることがわかった。

Training large deep neural networks needs massive high quality annotation data, but the time and labor costs are too expensive for small business. We start a company-name recognition task with a small scale and low quality training data, then using skills to enhanced model training speed and predicting performance with minimum labor cost. The methods we use involve pre-training a lite language model such as Albert-small or Electra-small in financial corpus, knowledge of distillation and multi-stage learning. The result is that we raised the recall rate by nearly 20 points and get 4 times as fast as BERT-CRF model.

翻訳日:2022-11-04 05:55:31 公開日:2020-07-31

# 臨床エンティティ抽出の機械学習のためのロバストベンチマーク

Robust Benchmarking for Machine Learning of Clinical Entity Extraction ( http://arxiv.org/abs/2007.16127v1 )

ライセンス: Link先を確認

Monica Agrawal, Chloe O'Connell, Yasmin Fatemi, Ariel Levy, David Sontag

(参考訳) 臨床研究は、しばしば自由テキスト臨床ノートにのみ存在する患者の物語の要素を理解する必要がある。音符を下流で使用するための構造化データに変換するために、これらの要素は一般的に抽出され、医学用語に正規化される。本研究では,最先端システムの性能を監査し,改善領域を示す。 2019 n2c2共有タスクにおける臨床エンティティ正規化システムに対する高いタスク精度は、誤解を招くものであり、基盤となる性能は依然として不安定である。一般的な概念(95.3%)では正規化精度が高いが、トレーニングデータでは認識できない概念(69.3%)ではずっと低い。医療用語の不整合,既存のラベル付けスキーマの制限,狭い評価手法によって,現在のアプローチが妨げられていることを示す。これらの問題に対処するために、臨床エンティティ抽出のためのアノテーションフレームワークを再構築し、堅牢なエンドツーエンドシステムベンチマークを可能にする。 2つのアノテータ間の新たなフレームワークからのアノテーションの一致を評価し、エンティティ認識のための Jaccard 類似度 0.73 とエンティティ正規化のための 0.83 を達成した。本稿では,エンティティ認識と正規化におけるメソッド開発を促進させる基準標準の作成の必要性を実証する手法を提案する。

Clinical studies often require understanding elements of a patient's narrative that exist only in free text clinical notes. To transform notes into structured data for downstream use, these elements are commonly extracted and normalized to medical vocabularies. In this work, we audit the performance of and indicate areas of improvement for state-of-the-art systems. We find that high task accuracies for clinical entity normalization systems on the 2019 n2c2 Shared Task are misleading, and underlying performance is still brittle. Normalization accuracy is high for common concepts (95.3%), but much lower for concepts unseen in training data (69.3%). We demonstrate that current approaches are hindered in part by inconsistencies in medical vocabularies, limitations of existing labeling schemas, and narrow evaluation techniques. We reformulate the annotation framework for clinical entity extraction to factor in these issues to allow for robust end-to-end system benchmarking. We evaluate concordance of annotations from our new framework between two annotators and achieve a Jaccard similarity of 0.73 for entity recognition and an agreement of 0.83 for entity normalization. We propose a path forward to address the demonstrated need for the creation of a reference standard to spur method development in entity recognition and normalization.

翻訳日:2022-11-04 05:55:18 公開日:2020-07-31

# パーキンソン病の学習に基づくコンピュータ支援処方モデル--データ駆動の視点から

Learning-based Computer-aided Prescription Model for Parkinson's Disease: A Data-driven Perspective ( http://arxiv.org/abs/2007.16103v1 )

ライセンス: Link先を確認

Yinghuan Shi and Wanqi Yang and Kim-Han Thung and Hao Wang and Yang Gao and Yang Pan and Li Zhang and Dinggang Shen

(参考訳) 本稿では「PD患者に対する自動処方薬処方」という新たな課題について考察する。この目標を達成するために、まずデータセットを収集して 1) PD患者の症状, および 2)神経科医が提供した処方薬。次に, 観察した症状と処方薬の関係を学習し, 新たな処方薬モデルを構築した。最後に,新来の患者に対しては,処方薬モデルにより,観察された症状に対して適切な処方薬を推奨できる(予測)。方法論的な部分から,提案したモデルであるPrescription viA Learning lAtent Symptoms (PALAS)は,データの多モード表現を用いた処方を推奨できる。 PALASでは、症状と処方薬の関係をより良くモデル化するために、潜伏症状空間が学習される。さらに,PALASの効率的な交互最適化手法を提案する。本手法は,南京脳病院における136人のpd患者から収集したデータを用いて評価した。本研究は,他の競合手法と比較して,提案手法の有効性と臨床効果を示すものである。

In this paper, we study a novel problem: "automatic prescription recommendation for PD patients." To realize this goal, we first build a dataset by collecting 1) symptoms of PD patients, and 2) their prescription drug provided by neurologists. Then, we build a novel computer-aided prescription model by learning the relation between observed symptoms and prescription drug. Finally, for the new coming patients, we could recommend (predict) suitable prescription drug on their observed symptoms by our prescription model. From the methodology part, our proposed model, namely Prescription viA Learning lAtent Symptoms (PALAS), could recommend prescription using the multi-modality representation of the data. In PALAS, a latent symptom space is learned to better model the relationship between symptoms and prescription drug, as there is a large semantic gap between them. Moreover, we present an efficient alternating optimization method for PALAS. We evaluated our method using the data collected from 136 PD patients at Nanjing Brain Hospital, which can be regarded as a large dataset in PD research community. The experimental results demonstrate the effectiveness and clinical potential of our method in this recommendation task, if compared with other competing methods.

翻訳日:2022-11-04 05:45:47 公開日:2020-07-31

# グラフニューラルネットワークにおけるニューラルアーキテクチャ探索

Neural Architecture Search in Graph Neural Networks ( http://arxiv.org/abs/2008.00077v1 )

ライセンス: Link先を確認

Matheus Nunes and Gisele L. Pappa

(参考訳) グラフデータに対する解析的タスクの実行は、リレーショナル情報の多様さと高可用性のため、ますます興味深いものになりつつある。しかし、画像や文とは異なり、ネットワークにはシーケンスの概念はない。ノード(とエッジ)は絶対的な順序に従わず、従来の機械学習(ML)アルゴリズムがパターンを認識して、この種のデータに基づいて予測を一般化することは難しい。グラフニューラルネットワーク(GNN)はこの問題にうまく対処した。これらは畳み込みの概念をグラフドメインに一般化した後に人気になった。しかし、それらは多くのハイパーパラメータを持ち、その設計と最適化は現在、ヒューリスティックスや経験的直観に基づく手作りである。 neural architecture search (nas)メソッドは、この問題に対する興味深い解決策である。本稿では,強化学習に基づく2つのnas法と,進化的アルゴリズムに基づく2つのnas法を比較した。その結果、2つの探索空間上の7つのデータセットを考察し、どちらの方法もランダムな探索と類似の精度を持つことを示し、探索空間の次元が実際に問題と関係しているかどうかという疑問を提起した。

Performing analytical tasks over graph data has become increasingly interesting due to the ubiquity and large availability of relational information. However, unlike images or sentences, there is no notion of sequence in networks. Nodes (and edges) follow no absolute order, and it is hard for traditional machine learning (ML) algorithms to recognize a pattern and generalize their predictions on this type of data. Graph Neural Networks (GNN) successfully tackled this problem. They became popular after the generalization of the convolution concept to the graph domain. However, they possess a large number of hyperparameters and their design and optimization is currently hand-made, based on heuristics or empirical intuition. Neural Architecture Search (NAS) methods appear as an interesting solution to this problem. In this direction, this paper compares two NAS methods for optimizing GNN: one based on reinforcement learning and a second based on evolutionary algorithms. Results consider 7 datasets over two search spaces and show that both methods obtain similar accuracies to a random search, raising the question of how many of the search space dimensions are actually relevant to the problem.

翻訳日:2022-11-04 05:45:10 公開日:2020-07-31

# 遺伝学的改善@ICSE 2020

Genetic Improvement @ ICSE 2020 ( http://arxiv.org/abs/2007.15987v1 )

ライセンス: Link先を確認

William B. Langdon, Westley Weimer, Justyna Petke, Erik Fredericks, Seongmin Lee, Emily Winter, Michail Basios, Myra B. Cohen, Aymeric Blot, Markus Wagner, Bobby R. Bruce, Shin Yoo, Simos Gerasimou, Oliver Krauss, Yu Huang and Michael Gerten

(参考訳) facebookの基調講演と正式なプレゼンテーション(議事録に記録されている)に続いて、第8回国際遺伝子改善ワークショップであるgi-2020 @ icse(2020年7月3日金曜の第42回acm/ieee国際ソフトウェアエンジニアリング会議の一部として開催)で幅広い議論が行われた。トピックとしては、産業の取り込み、ヒューマンファクタ、説明責任(説明可能性、正当化可能性、エクスプロビリティ)、GIベンチマークがある。我々はまた、近年の様々なオンラインアプローチ(例えばSBST 2020)を、対面2面インタラクションなしでインターネット上でWWWを介して仮想コンピュータサイエンス会議やワークショップを開催することと対比した。最後に、コロナウイルスのCovid-19パンデミックが来年と将来の研究にどのように影響するかを推測する。

Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take up, human factors, explainabiloity (explainability, justifyability, exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops via the WWW on the Internet without face-2-face interaction. Finally we speculate on how the Coronavirus Covid-19 Pandemic will affect research next year and into the future.

翻訳日:2022-11-04 05:44:52 公開日:2020-07-31

# 画像の自動生成音素キャプションの評価

Evaluating Automatically Generated Phoneme Captions for Images ( http://arxiv.org/abs/2007.15916v1 )

ライセンス: Link先を確認

Justin van der Hout, Zolt\'an D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg

(参考訳) Image2Speechは画像の音声記述を生成する比較的新しいタスクである。本稿では,この課題の評価について検討する。そこでまず,音素配列からなる画像キャプションを生成するImage2Speechシステムを開発した。このシステムはFlickr8kコーパスでオリジナルのImage2Speechシステムより優れていた。その後、これらの音素キャプションを文に変換する。キャプションは人間の評価者によって画像の記述が優れているとして評価された。最後に, 結果の客観的な測定値は, これらの評価値と相関した。 BLEU4は人間のレーティングと完全に相関しないが、調査された指標の中では最も高い相関関係を示しており、Image2Speechタスクの現在ある最高のメトリクスである。現在の指標は、入力が単語であると仮定するという事実によって制限されている。 Image2Speechタスクのより適切なメトリックは、入力を単語の一部、すなわち音素の一部と仮定するべきである。

Image2Speech is the relatively new task of generating a spoken description of an image. This paper presents an investigation into the evaluation of this task. For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences. This system outperformed the original Image2Speech system on the Flickr8k corpus. Subsequently, these phoneme captions were converted into sentences of words. The captions were rated by human evaluators for their goodness of describing the image. Finally, several objective metric scores of the results were correlated with these human ratings. Although BLEU4 does not perfectly correlate with human ratings, it obtained the highest correlation among the investigated metrics, and is the best currently existing metric for the Image2Speech task. Current metrics are limited by the fact that they assume their input to be words. A more appropriate metric for the Image2Speech task should assume its input to be parts of words, i.e. phonemes, instead.

翻訳日:2022-11-04 05:44:37 公開日:2020-07-31

# 部分メチル化アルジトール酢酸のGC-EIMSスペクトル同定のためのパーゼン密度推定を用いた強化学習と多次元ベイズ分類を用いたニューラルネットワークの比較研究

A Comparative study of Artificial Neural Networks Using Reinforcement learning and Multidimensional Bayesian Classification Using Parzen Density Estimation for Identification of GC-EIMS Spectra of Partially Methylated Alditol Acetates ( http://arxiv.org/abs/2008.02072v1 )

ライセンス: Link先を確認

Faramarz Valafar, Homayoun Valafar

(参考訳) 本研究では, 部分メチル化Alditol Acetates (PMAAs) のガスクロマトグラフィー-電子衝突質量スペクトル (GC-EIMS) データベース用パターン認識検索エンジンの開発について報告する。また,本研究に用いられた2つのパターン認識技術の比較結果を報告する。最初の手法はベイズ分類器とパーゼン密度推定器を用いた統計手法である。第2のテクニックは、強化学習でトレーニングされたニューラルネットワークモジュールだ。ここでは、両システムが少量の雑音でスペクトルを特定するのに優れていることを示す。両方のシステムの性能は劣化信号-雑音比(SNR)で劣化する。部分スペクトル(データを見逃す)を扱う場合、人工ニューラルネットワークシステムの性能が向上する。開発システムはワールドワイドウェブ上に実装されており、GC-EIMS機器に記録されたこれらの分子のスペクトルを用いてPMAAを識別することを目的としている。したがって、このシステムはGC-EIMSスペクトルの計器やカラム依存の変動に敏感である。

This study reports the development of a pattern recognition search engine for a World Wide Web-based database of gas chromatography-electron impact mass spectra (GC-EIMS) of partially methylated Alditol Acetates (PMAAs). Here, we also report comparative results for two pattern recognition techniques that were employed for this study. The first technique is a statistical technique using Bayesian classifiers and Parzen density estimators. The second technique involves an artificial neural network module trained with reinforcement learning. We demonstrate here that both systems perform well in identifying spectra with small amounts of noise. Both system's performance degrades with degrading signal-to-noise ratio (SNR). When dealing with partial spectra (missing data), the artificial neural network system performs better. The developed system is implemented on the world wide web, and is intended to identify PMAAs using submitted spectra of these molecules recorded on any GC-EIMS instrument. The system, therefore, is insensitive to instrument and column dependent variations in GC-EIMS spectra.

翻訳日:2022-11-04 05:44:02 公開日:2020-07-31

# HMCNAS:隠れマルコフ連鎖とベイズ最適化を用いたニューラルネットワーク探索

HMCNAS: Neural Architecture Search using Hidden Markov Chains and Bayesian Optimization ( http://arxiv.org/abs/2007.16149v1 )

ライセンス: Link先を確認

Vasco Lopes and Lu\'is A. Alexandre

(参考訳) Neural Architecture Searchは、さまざまなタスクにおいて最先端のパフォーマンスを達成した。しかし、最終的なモデルアーキテクチャ、サンプルするレイヤの数、強制操作、小さな検索空間など、解決される問題や生成されたモデルに関連する人間定義を必要とする多くの仮定は、最終的にシステムへのバイアスを引き起こすコストで、より高いパフォーマンスを持つモデルを持つことに貢献する。本稿では,2つの新しい構成要素からなるHMCNASを提案する。一人間が設計したモデルに関する情報を利用して、複雑な探索空間を自律的に生成する方法二人間の定義したパラメータや小さな探索空間に頼ることなく、ゼロから競合するCNNを生成することができるベイズ最適化付き進化的アルゴリズム。実験の結果,提案手法は競争的アーキテクチャを極めて短時間で得ることができることがわかった。 HMCNASは、特定のタスクに関する人間的な知識を必要とせずに、競争モデルを作成する方法を提供することによって、NASを一般化するステップを提供する。

Neural Architecture Search has achieved state-of-the-art performance in a variety of tasks, out-performing human-designed networks. However, many assumptions, that require human definition, related with the problems being solved or the models generated are still needed: final model architectures, number of layers to be sampled, forced operations, small search spaces, which ultimately contributes to having models with higher performances at the cost of inducing bias into the system. In this paper, we propose HMCNAS, which is composed of two novel components: i) a method that leverages information about human-designed models to autonomously generate a complex search space, and ii) an Evolutionary Algorithm with Bayesian Optimization that is capable of generating competitive CNNs from scratch, without relying on human-defined parameters or small search spaces. The experimental results show that the proposed approach results in competitive architectures obtained in a very short time. HMCNAS provides a step towards generalizing NAS, by providing a way to create competitive models, without requiring any human knowledge about the specific task.

翻訳日:2022-11-04 05:38:13 公開日:2020-07-31

# ニューラル言語生成:定式化、方法、および評価

Neural Language Generation: Formulation, Methods, and Evaluation ( http://arxiv.org/abs/2007.15780v1 )

ライセンス: Link先を確認

Cristina Garbacea, Qiaozhu Mei

(参考訳) ニューラルネットワークに基づく生成モデリングの最近の進歩は、コンピュータシステムが人間とシームレスに会話でき、自然言語を理解できることを期待している。ニューラルネットワークは、さまざまなユーザニーズを満たすさまざまなコンテキストやタスクにおいて、さまざまな成功度に対するテキストの抜粋を生成するために使用されている。特に、大規模データセットでトレーニングされた高容量のディープラーニングモデルは、明示的な監視信号がなくても、データのパターンを学習する非並列的な能力を示し、現実的で一貫性のあるテキストを生成するための、多くの新しい可能性を開く。自然言語生成の分野は急速に進化しているが、解決すべきオープンな課題がまだたくさんある。本調査では,自然言語生成の問題を明確に定義し,分類する。我々は, 自然言語の生成が実用上重要であるような, 一般的な定式化のインスタンス化である特定のアプリケーションタスクについて検討する。次に、多様なテキストを生成するのに使用される方法とニューラルネットワークアーキテクチャの概要を紹介する。それにもかかわらず、これらの生成モデルによって生成されたテキストの品質を評価する標準的な方法は存在しない。この目的のために、自然言語生成システムの評価に関する現在のアプローチをレビューする。我々は、この調査が神経自然言語生成の定式化、方法、および評価の有益な概要を提供することを期待している。

Recent advances in neural network-based generative modeling have reignited the hopes in having computer systems capable of seamlessly conversing with humans and able to understand natural language. Neural architectures have been employed to generate text excerpts to various degrees of success, in a multitude of contexts and tasks that fulfil various user needs. Notably, high capacity deep learning models trained on large scale datasets demonstrate unparalleled abilities to learn patterns in the data even in the lack of explicit supervision signals, opening up a plethora of new possibilities regarding producing realistic and coherent texts. While the field of natural language generation is evolving rapidly, there are still many open challenges to address. In this survey we formally define and categorize the problem of natural language generation. We review particular application tasks that are instantiations of these general formulations, in which generating natural language is of practical importance. Next we include a comprehensive outline of methods and neural architectures employed for generating diverse texts. Nevertheless, there is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck towards the progress of the field. To this end, we also review current approaches to evaluating natural language generation systems. We hope this survey will provide an informative overview of formulations, methods, and assessments of neural natural language generation.

翻訳日:2022-11-04 05:37:10 公開日:2020-07-31

# 医療報告書からの信頼できる情報抽出のための浅層cnnモデルのモデル削減

Model Reduction of Shallow CNN Model for Reliable Deployment of Information Extraction from Medical Reports ( http://arxiv.org/abs/2008.01572v1 )

ライセンス: Link先を確認

Abhishek K Dubey and Alina Peluso and Jacob Hinkle and Devanshu Agarawal and Zilong Tan

(参考訳) 浅層畳み込みニューラルネットワーク(cnn)は、がん病理報告から情報を抽出するための時間テストツールである。 Shallow CNNはこのタスクを、多くのNLPタスクの最先端を保持するBERTなど、他のディープラーニングモデルと競合的に実行する。この偏心現象の背景にある主な洞察は、がん病理報告からの情報抽出はタスクを実行するためにドメイン固有のテキストセグメントをほんの数個しか必要とせず、そのため、ほとんどのテキストとコンテキストがそのタスクに過大に働くことである。シャローCNNモデルは、ラベル付きトレーニングセットからこれらのキーショートテキストセグメントを識別するのに適しているが、識別されたテキストセグメントは人間には不明瞭である。本研究では,CNNフィルタと関連するテキストセグメントとの信頼性の高い接続を実現するためのモデル縮小ツールの開発により,このギャップを埋める。我々は,n-gram存在表現の線形変換を変換重みに先立って非負性および疎性で近似することにより,浅部CNN表現の複雑さを低減し,解釈可能なモデルを得る。提案手法は,従来認識されていたトレードオフ境界と,モデルの縮小による説明可能性とのギャップを橋渡しするものである。

Shallow Convolution Neural Network (CNN) is a time-tested tool for the information extraction from cancer pathology reports. Shallow CNN performs competitively on this task to other deep learning models including BERT, which holds the state-of-the-art for many NLP tasks. The main insight behind this eccentric phenomenon is that the information extraction from cancer pathology reports require only a small number of domain-specific text segments to perform the task, thus making the most of the texts and contexts excessive for the task. Shallow CNN model is well-suited to identify these key short text segments from the labeled training set; however, the identified text segments remain obscure to humans. In this study, we fill this gap by developing a model reduction tool to make a reliable connection between CNN filters and relevant text segments by discarding the spurious connections. We reduce the complexity of shallow CNN representation by approximating it with a linear transformation of n-gram presence representation with a non-negativity and sparsity prior on the transformation weights to obtain an interpretable model. Our approach bridge the gap between the conventionally perceived trade-off boundary between accuracy on the one side and explainability on the other by model reduction.

翻訳日:2022-11-04 05:36:04 公開日:2020-07-31

PDF登録状況（公開日: 20200731）