Fugu-MT: arXiv Paper Translations

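This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.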

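The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.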

Papers with a publication date of 2020-12-30.

Title	Authors	Abstract	Publication date / Translation date
# Semiclassical dynamics of a disordered two-dimensional Hubbard model with long-range interactions ( http://arxiv.org/abs/2002.05549v2 ) License: see link	Adam S. Sajna, Anatoli Polkovnikov	Quench dynamics in a two-dimensional system of interacting fermions is analyzed within the semiclassical truncated Wigner approximation (TWA). The models with short-range and long-range interactions are considered. We show that in the latter case, the TWA is very accurate, becoming asymptotically exact in the infinite-range limit, provided that the semiclassical Hamiltonian is correctly identified. Within the TWA, different dynamical timescales of charges and spins can be clearly distinguished. Interestingly, for weak and moderate disorder strength, we observe subdiffusive behavior of charges, while spins exhibit diffusive dynamics. At strong disorder, the quantum Fisher information shows logarithmic growth in time with a slower increase for charges than for spins. It is shown that in contrast to the short-range model, strong inhomogeneities such as domain walls in the initial state can significantly slow down thermalization dynamics, especially at weak disorder. This behavior can pose additional challenges in designing cold-atom experimental protocols aimed at analyzing possible many-body localization in such systems. While within this approach we cannot make any definite statements about the existence of a many-body localized phase, we see a very fast crossover as a function of disorder strength from a rapidly thermalizing to a slow glassy-like regime both for the short-range and long-range models.	Translated: 2023-06-03 19:09:03 Published: 2020-12-30
# Thermodynamics of Minimal Coupling Quantum Heat Engines ( http://arxiv.org/abs/2003.05788v5 ) License: see link	Marcin Łobejko, Paweł Mazurek, Michał Horodecki	The minimal-coupling quantum heat engine is a thermal machine consisting of an explicit energy storage system, heat baths, and a working body, which alternately couples to subsystems through discrete strokes -- energy-conserving two-body quantum operations. Within this paradigm, we present a general framework of quantum thermodynamics, where a work extraction process is fundamentally limited by a flow of non-passive energy (ergotropy), while energy dissipation is expressed through a flow of passive energy. It turns out that small dimensionality of the working body and a restriction only to two-body operations make the engine fundamentally irreversible. Our main result is finding the optimal efficiency and work production per cycle within the whole class of irreversible minimal-coupling engines composed of three strokes and with the two-level working body, where we take into account all possible quantum correlations between the working body and the battery. One of the key new tools is the introduced "control-marginal state" -- one which acts only on a working body Hilbert space, but encapsulates all features regarding work extraction of the total working body-battery system. In addition, we propose a generalization of the many-stroke engine, and we analyze efficiency vs extracted work trade-offs, as well as work fluctuations after many cycles of the running of the engine.	Translated: 2023-05-29 08:26:33 Published: 2020-12-30
# Unification of the wave and guidance equations for spin 1/2 ( http://arxiv.org/abs/2003.06058v2 ) License: see link	Peter Holland	We generalize our previous unification of the Schrödinger and guidance equations in a single inhomogeneous Schrödinger equation to a Riemannian manifold with an external vector potential. A special case yields the unified theory for a spin 1/2 rigid rotator. The theory is proved to be symmetrical under the Galileo group, the unified field that integrates the particle and guiding wave being a 2-spinor.	Translated: 2023-05-29 08:18:07 Published: 2020-12-30
# Optically detected magnetic resonance in neutral silicon vacancy centers in diamond via bound exciton states ( http://arxiv.org/abs/2004.12544v2 ) License: see link	Zi-Huai Zhang, Paul Stevenson, Gergo Thiering, Brendon C. Rose, Ding Huang, Andrew M. Edmonds, Matthew L. Markham, Stephen A. Lyon, Adam Gali, and Nathalie P. de Leon	Neutral silicon vacancy (SiV0) centers in diamond are promising candidates for quantum networks because of their excellent optical properties and long spin coherence times. However, spin-dependent fluorescence in such defects has been elusive due to poor understanding of the excited state fine structure and limited off-resonant spin polarization. Here we report the realization of optically detected magnetic resonance and coherent control of SiV0 centers at cryogenic temperatures, enabled by efficient optical spin polarization via previously unreported higher-lying excited states. We assign these states as bound exciton states using group theory and density functional theory. These bound exciton states enable new control schemes for SiV0 as well as other emerging defect systems.	Translated: 2023-05-22 00:27:42 Published: 2020-12-30
# Models of zero-range interaction for the bosonic trimer at unitarity ( http://arxiv.org/abs/2006.02426v3 ) License: see link	Alessandro Michelangeli	We present the mathematical construction of the physically relevant quantum Hamiltonians for a three-body system consisting of identical bosons mutually coupled by a two-body interaction of zero range. For a large part of the presentation, infinite scattering length will be considered (the unitarity regime). The subject has several precursors in the mathematical literature. We proceed through an operator-theoretic construction of the self-adjoint extensions of the minimal operator obtained by restricting the free Hamiltonian to wave-functions that vanish in the vicinity of the coincidence hyperplanes: all extensions thus model an interaction precisely supported at the spatial configurations where particles come on top of each other. Among them, we select the physically relevant ones, by implementing in the operator construction the presence of the specific short-scale structure suggested by formal physical arguments that are ubiquitous in the physical literature on zero-range methods. This is done by applying at different stages the self-adjoint extension schemes à la Kreĭn-Višik-Birman and à la von Neumann. We produce a class of canonical models for which we also analyse the structure of the negative bound states. Bosonicity and zero range combined together make such canonical models display the typical Thomas and Efimov spectra, i.e., sequences of energy eigenvalues accumulating to both minus infinity and zero. We also discuss a type of regularisation that prevents such spectral instability while retaining an effective short-scale pattern. Besides the operator qualification, we also present the associated energy quadratic forms. We structured our analysis so as to clarify certain steps of the operator-theoretic construction that are notoriously subtle for the correct identification of a domain of self-adjointness.	Translated: 2023-05-17 06:32:19 Published: 2020-12-30
# Empirica: a virtual lab for high-throughput macro-level experiments ( http://arxiv.org/abs/2006.11398v2 ) License: see link	Abdullah Almaatouq, Joshua Becker, James P. Houghton, Nicolas Paton, Duncan J. Watts, Mark E. Whiting	Virtual labs allow researchers to design high-throughput and macro-level experiments that are not feasible in traditional in-person physical lab settings. Despite the increasing popularity of online research, researchers still face many technical and logistical barriers when designing and deploying virtual lab experiments. While several platforms exist to facilitate the development of virtual lab experiments, they typically present researchers with a stark trade-off between usability and functionality. We introduce Empirica: a modular virtual lab that offers a solution to the usability-functionality trade-off by employing a "flexible defaults" design strategy. This strategy enables us to maintain complete "build anything" flexibility while offering a development platform that is accessible to novice programmers. Empirica's architecture is designed to allow for parameterizable experimental designs, reusable protocols, and rapid development. These features will increase the accessibility of virtual lab experiments, remove barriers to innovation in experiment design, and enable rapid progress in the understanding of distributed human computation.	Translated: 2023-05-13 09:13:41 Published: 2020-12-30
# Duality in quantum transport models ( http://arxiv.org/abs/2008.03476v2 ) License: see link	Rouven Frassek, Cristian Giardinà, Jorge Kurchan	We develop the 'duality approach', which has been extensively studied for classical models of transport, for quantum systems in contact with a thermal 'Lindbladian' bath. The method (a) provides a mapping of the original model to a simpler one, containing only a few particles, and (b) shows that any dynamic process of this kind with generic baths may be mapped onto one with equilibrium baths. We exemplify this through the study of a particular model: the quantum symmetric exclusion process introduced in [D. Bernard, T. Jin, Phys. Rev. Lett. 123, 080601 (2019)]. As in the classical case, the whole construction becomes intelligible by considering the dynamical symmetries of the problem.	Translated: 2023-05-06 19:58:23 Published: 2020-12-30
# Flat bands and $Z_2$ topological phases in a non-Abelian kagome lattice ( http://arxiv.org/abs/2008.10738v2 ) License: see link	Zhenxiang Gao and Zhihao Lan	We introduce a non-Abelian kagome lattice model that has both time-reversal and inversion symmetries and study the flat band physics and topological phases of this model. Due to the coexistence of both time-reversal and inversion symmetries, the energy bands consist of three doubly degenerate bands whose energy and conditions for the presence of flat bands could be obtained analytically, allowing us to tune the flat band with respect to the other two dispersive bands from the top to the middle and then to the bottom of the three bands. We further study the gapped phases of the model and show that they belong to the same phase, as the band gaps only close at discrete points of the parameter space, making any two gapped phases adiabatically connected to each other without closing the band gap. Using the Pfaffian approach based on the time-reversal symmetry and parity characterization from the inversion symmetry, we calculate the bulk topological invariants and demonstrate that the unique gapped phases belong to the $Z_2$ quantum spin Hall phase, which is further confirmed by the edge state calculations.	Translated: 2023-05-05 02:02:03 Published: 2020-12-30
# On-demand quantum storage of photonic qubits in an on-chip waveguide ( http://arxiv.org/abs/2009.01796v3 ) License: see link	Chao Liu, Tian-Xiang Zhu, Ming-Xu Su, You-Zhi Ma, Zong-Quan Zhou, Chuan-Feng Li, and Guang-Can Guo	Photonic quantum memory is the core element in quantum information processing (QIP). For scalable and convenient practical applications, great efforts have been devoted to integrated quantum memories based on various waveguides fabricated in solids. However, on-demand storage of qubits, which is an essential requirement for QIP, is still challenging to implement using such integrated quantum memories. Here we report the on-demand storage of time-bin qubits in an on-chip waveguide memory on the surface of a $^{151}$Eu$^{3+}$:Y$_2$SiO$_5$ crystal, utilizing the Stark-modulated atomic frequency comb protocol. A qubit storage fidelity of $99.3\%\pm0.2\%$ is obtained with an input of 0.5 photons per pulse, far beyond the highest fidelity achievable using the classical measure-and-prepare strategy. The developed integrated quantum memory with on-demand retrieval capability represents an important step towards practical applications of integrated quantum nodes in quantum networks.	Translated: 2023-05-03 22:55:16 Published: 2020-12-30
# Computational Phase Transitions: Benchmarking Ising Machines and Quantum Optimisers ( http://arxiv.org/abs/2009.05579v2 ) License: see link	Hariphan Philathong and Vishwa Akshay and Ksenia Samburskaya and Jacob Biamonte	While there are various approaches to benchmark physical processors, recent findings have focused on computational phase transitions. This is due to several factors. Importantly, the hardest instances appear to be well-concentrated in a narrow region, with a control parameter allowing uniform random distributions of problem instances with similar computational challenge. It has been established that one could observe a computational phase transition in a distribution produced from coherent Ising machine(s). In terms of quantum approximate optimisation, the ability of the quantum algorithm to function depends critically on a problem's constraint-to-variable ratio (called density). The critical density dependence of performance resulted in what were called reachability deficits. In this perspective we recall the background needed to understand how to apply computational phase transitions in various benchmarking tasks and we survey several such contemporary findings.	Translated: 2023-05-02 22:19:39 Published: 2020-12-30
# Unravelling the role of coherence in the first law of quantum thermodynamics ( http://arxiv.org/abs/2009.11370v3 ) License: see link	Bertúlio de Lima Bernardo	One of the fundamental questions in the emerging field of quantum thermodynamics is the role played by coherence in energetic processes that occur at the quantum level. Here, we address this issue by investigating two different quantum versions of the first law of thermodynamics, derived from the classical definitions of work and heat. By doing so, we find that there exists a mathematical inconsistency between the two scenarios. We further show that the energetic contribution of the dynamics of coherence is the key ingredient to establish the consistency. Some examples involving two-level atomic systems are discussed in order to illustrate our findings.	Translated: 2023-05-01 04:36:24 Published: 2020-12-30
# Quantum Information for Particle Theorists ( http://arxiv.org/abs/2010.02931v2 ) License: see link	Joseph D. Lykken	Lectures given at the Theoretical Advanced Study Institute (TASI 2020), 1-26 June 2020. The topics covered include quantum circuits, entanglement, quantum teleportation, Bell inequalities, quantum entropy and decoherence, classical versus quantum measurement, the area law for entanglement entropy in quantum field theory, and simulating quantum field theory on a quantum computer. Along the way we confront the fundamental sloppiness of how we all learned (and some of us taught) quantum mechanics in college. Links to a Python notebook and Mathematica notebooks will allow the reader to reproduce and extend the calculations, as well as perform five experiments on a quantum simulator.	Translated: 2023-04-29 20:14:30 Published: 2020-12-30
# Raman Sideband Cooling in Optical Tweezer Arrays for Rydberg Dressing ( http://arxiv.org/abs/2010.07838v2 ) License: see link	Nikolaus Lorenz, Lorenzo Festa, Lea-Marina Steinert, Christian Gross	Single neutral atoms trapped in optical tweezers and laser-coupled to Rydberg states provide a fast and flexible platform to generate configurable atomic arrays for quantum simulation. The platform is especially suited to study quantum spin systems in various geometries. However, for experiments requiring continuous trapping, inhomogeneous light shifts induced by the trapping potential and temperature broadening impose severe limitations. Here we show how Raman sideband cooling allows one to overcome those limitations, thus setting the stage for Rydberg dressing in tweezer arrays.	Translated: 2023-04-29 00:25:46 Published: 2020-12-30
# Quantum simulation and computing with Rydberg-interacting qubits ( http://arxiv.org/abs/2011.03031v2 ) License: see link	M. Morgado and S. Whitlock	Arrays of optically trapped atoms excited to Rydberg states have recently emerged as a competitive physical platform for quantum simulation and computing, where high-fidelity state preparation and readout, quantum logic gates and controlled quantum dynamics of more than 100 qubits have all been demonstrated. These systems are now approaching the point where reliable quantum computations with hundreds of qubits and realistically thousands of multiqubit gates with low error rates should be within reach for the first time. In this article we give an overview of the Rydberg quantum toolbox, emphasizing the high degree of flexibility for encoding qubits, performing quantum operations and engineering quantum many-body Hamiltonians. We then review the state of the art concerning high-fidelity quantum operations and logic gates as well as quantum simulations in many-body regimes. Finally, we discuss computing schemes that are particularly suited to the Rydberg platform and some of the remaining challenges on the road to general-purpose quantum simulators and quantum computers.	Translated: 2023-04-25 05:18:21 Published: 2020-12-30
# Constructing tensor network wavefunction for a generic two-dimensional quantum phase transition via thermofield double states ( http://arxiv.org/abs/2012.14152v2 ) License: see link	Wen-Tao Xu and Guang-Ming Zhang	The most important feature of two-dimensional quantum Rokhsar-Kivelson (RK) type models is that their ground state wavefunction norms can be mapped into the partition functions of two-dimensional statistical models so that the quantum phase transitions become the thermal phase transitions of the corresponding statistical models. For a generic quantum critical point, we generalize the framework of RK wavefunctions by introducing the concept of the thermofield double (TFD) state, which is a purification of the equilibrium density operator. Moreover, by expressing the TFD state in terms of the projected entangled pair state, its $N$th-order Rényi entropy results in a three-dimensional statistical model in Euclidean spacetime, describing the generic quantum phase transitions. Using the toric code model with two parallel magnetic fields as an example, we explain these ideas and derive the partition function of the three-dimensional $Z_2$ lattice gauge-Higgs model, where the phase transitions are characterized by the three-dimensional universality classes.	Translated: 2023-04-19 01:58:38 Published: 2020-12-30
# Deterministic loading and phase shaping of microwaves onto a single artificial atom ( http://arxiv.org/abs/2012.15084v1 ) License: see link	W.-J. Lin, Y. Lu, P. Y. Wen, Y.-T. Cheng, C.-P. Lee, K.-T. Lin, K.-H. Chiang, M. C. Hsieh, J. C. Chen, C.-S. Chuu, F. Nori, A. F. Kockum, G.-D. Lin, P. Delsing and I.-C. Hoi	Loading quantum information deterministically onto a quantum node is an important step towards a quantum network. Here, we demonstrate that coherent-state microwave photons, with an optimal temporal waveform, can be efficiently loaded onto a single superconducting artificial atom in a semi-infinite one-dimensional (1D) transmission-line waveguide. Using a weak coherent state (average photon number N<<1) with an exponentially rising waveform, whose time constant matches the decoherence time of the artificial atom, we demonstrate a loading efficiency of above 94% from 1D semi-free space to the artificial atom. We also show that Fock-state microwave photons can be deterministically loaded with an efficiency of 98.5%. We further manipulate the phase of the coherent state exciting the atom, enabling coherent control of the loading process. Our results open up promising applications in realizing quantum networks based on waveguide quantum electrodynamics (QED).	Translated: 2023-04-18 08:08:21 Published: 2020-12-30
# Note on a Product Formula Related to Quantum Zeno Dynamics ( http://arxiv.org/abs/2012.15061v1 ) License: see link	Pavel Exner and Takashi Ichinose	Given a nonnegative self-adjoint operator $H$ acting on a separable Hilbert space and an orthogonal projection $P$ such that $H_P := (H^{1/2}P)^*(H^{1/2}P)$ is densely defined, we prove that $\lim_{n\rightarrow \infty} (P\,\mathrm{e}^{-itH/n}P)^n = \mathrm{e}^{-itH_P}P$ holds in the strong operator topology. We also derive modifications of this product formula and its extension to the situation when $P$ is replaced by a strongly continuous projection-valued function satisfying $P(0)=P$.	Translated: 2023-04-18 08:08:03 Published: 2020-12-30
# The quantum Zeno and anti-Zeno effects with driving fields in the weak and strong coupling regimes ( http://arxiv.org/abs/2012.15040v1 ) License: see link	Mehwish Majeed and Adam Zaman Chaudhry	Repeated measurements in quantum mechanics can freeze (the quantum Zeno effect) or enhance (the quantum anti-Zeno effect) the time-evolution of a quantum system. In this paper, we present a general treatment of the quantum Zeno and anti-Zeno effects for arbitrary driven open quantum systems, assuming only that the system-environment coupling is weak. In particular, we obtain a general expression for the effective decay rate of a two-level system subjected to arbitrary driving fields as well as periodic measurements. We demonstrate that the driving fields change the decay rate, and hence the quantum Zeno and anti-Zeno behavior, both qualitatively and quantitatively. We also extend our results to systems consisting of more than one two-level system, as well as a two-level system strongly coupled to an environment of harmonic oscillators, to further illustrate the non-trivial effect of the driving fields on the quantum Zeno and anti-Zeno effects.	Translated: 2023-04-18 08:07:39 Published: 2020-12-30
# Hybrid III-V diamond photonic platform for quantum nodes based on neutral silicon vacancy centers in diamond ( http://arxiv.org/abs/2012.15018v1 ) License: see link	Ding Huang, Alex Abulnaga, Sacha Welinski, Mouktik Raha, Jeff D. Thompson, and Nathalie P. de Leon	Integrating atomic quantum memories based on color centers in diamond with on-chip photonic devices would enable entanglement distribution over long distances. However, efforts towards integration have been challenging because color centers can be highly sensitive to their environment, and their properties degrade in nanofabricated structures. Here, we describe a heterogeneously integrated, on-chip, III-V diamond platform designed for neutral silicon vacancy (SiV0) centers in diamond that circumvents the need for etching the diamond substrate. Through evanescent coupling to SiV0 centers near the surface of diamond, the platform will enable Purcell enhancement of SiV0 emission and efficient frequency conversion to the telecommunication C-band. The proposed structures can be realized with readily available fabrication techniques.	Translated: 2023-04-18 08:06:50 Published: 2020-12-30
# Entropic Analysis to Assess impact of Policies on Disorders and Conflicts within a system: Case Study of Traffic intersection as 12-Qubit Social Quantum System ( http://arxiv.org/abs/2012.15012v1 ) License: see link	Rakesh Kumar Pandey	Entropic analysis of a scenario at a traffic intersection is attempted in detail. The model is utilized to define Conflict Entropy. It is shown that with the use of strategies (policies) like installing traffic lights and constructing flyovers, the Entropy is reduced, thereby making the traffic ordered. It is shown that these policies help in reducing the Entropy and eliminating the Conflict Entropy completely in both cases. Such an analysis can find immense application in deciding a favorable policy and in the formulation of artificial intelligence algorithms. A striking similarity is found between the traffic intersection and Quantum systems of twelve qubits, which opens up a new scope for studying traffic flows to understand the behavior of Quantum Systems.	Translated: 2023-04-18 08:06:24 Published: 2020-12-30
# Consistency of quantum computation and the equivalence principle ( http://arxiv.org/abs/2012.14990v1 ) License: see link	Marcin Nowakowski	The equivalence principle, being one of the building blocks of general relativity, seems to be also crucial for the analysis of quantum effects in gravity. In this paper we consider the question of whether the equivalence principle has to hold for consistency of performing quantum computation in a gravitational field. We propose an analysis with a looped evolution consisting of steps both in the gravitational field and in the accelerated reference frame. We show that without the equivalence principle the looped quantum evolution cannot be unitary and loses its consistency. For this reasoning the equivalence principle is formulated in terms of the gauge transformations and is analyzed for particles acquiring appropriate phases associated with the actions over the looped path. In consequence, to keep consistency of quantum operations in a gravitational field, it is required to keep some quantum variant of the equivalence principle. This proves the importance of the quantized versions of this fundamental gravitational principle for quantum information processing.	Translated: 2023-04-18 08:05:17 Published: 2020-12-30
# 時間における2ボソン量子干渉 Two-boson quantum interference in time ( http://arxiv.org/abs/2012.15165v1 ) ライセンス: Link先を確認	Nicolas J. Cerf and Michael G. Jabbour	(参考訳) 有名なHong-Ou-Mandel効果は、2粒子量子干渉の典型例である。そのルーツは、パウリ原理が規定する同一量子粒子の対称性にある。ビームスプリッタ(透過率1/2)に入射する2つの同一のボソンは、光や物質を用いた多数の実験で確認されたように、両方の出力ポートで同時計数として検出されることはない。ここでは、部分時間反転がビームスプリッタの線形結合を増幅に変換することを確立する。この双対性から、量子増幅器(ゲイン2)における予期されていなかった2ボソン干渉効果の存在を推定し、基礎となるメカニズムを時間的(timelike)な識別不可能性と同定する。この基本的なメカニズムは任意のボソニックなボゴリューボフ変換に共通するため、量子物理学における幅広い影響を予想する。 The celebrated Hong-Ou-Mandel effect is the paradigm of two-particle quantum interference. It has its roots in the symmetry of identical quantum particles, as dictated by the Pauli principle. Two identical bosons impinging on a beam splitter (of transmittance 1/2) cannot be detected in coincidence at both output ports, as confirmed in numerous experiments with light or even matter. Here, we establish that partial time reversal transforms the beamsplitter linear coupling into amplification. We infer from this duality the existence of an unsuspected two-boson interferometric effect in a quantum amplifier (of gain 2) and identify the underlying mechanism as timelike indistinguishability. This fundamental mechanism is generic to any bosonic Bogoliubov transformation, so we anticipate wide implications in quantum physics.	翻訳日:2023-04-18 07:57:04 公開日:2020-12-30
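The beam-splitter statement in the abstract above can be checked with a few lines of arithmetic. The sketch below is not taken from the paper; it assumes the textbook model of a lossless beam splitter with one particle entering each input port, for which the coincidence probability of two indistinguishable bosons is (T - R)^2 with R = 1 - T. The function name and its arguments are illustrative choices.

```python
def coincidence_probability(transmittance: float, indistinguishable: bool = True) -> float:
    """Probability that two particles entering different input ports of a
    lossless beam splitter leave through different output ports.

    Assumes the textbook single-mode model; this is an illustration, not the
    calculation carried out in the paper.
    """
    if not 0.0 <= transmittance <= 1.0:
        raise ValueError("transmittance must lie in [0, 1]")
    reflectance = 1.0 - transmittance
    if indistinguishable:
        # The "both transmitted" and "both reflected" amplitudes have opposite
        # signs for bosons, so they interfere destructively.
        return (transmittance - reflectance) ** 2
    # Distinguishable particles: the two alternatives add as probabilities.
    return transmittance ** 2 + reflectance ** 2


# At transmittance 1/2 the coincidences vanish for identical bosons,
# while distinguishable particles still coincide half of the time.
hom_dip = coincidence_probability(0.5)
classical = coincidence_probability(0.5, indistinguishable=False)
```

Scanning `transmittance` away from 1/2 shows the suppression is specific to the balanced case, which is the regime the abstract refers to.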
# kugel-khomskii模型における量子絡み合い、局所指標と外部場の影響 Quantum entanglement, local indicators and effect of external fields in the Kugel-Khomskii model ( http://arxiv.org/abs/2012.15134v1 ) ライセンス: Link先を確認	V.E. Valiulin, A.V. Mikheyenkov, N.M. Chtchelkatchev, K.I. Kugel	(参考訳) 厳密な対角化手法を用いて、異なるサブシステム間交換項を持つ2スピンモデル (kugel-khomskii) によって記述された有限鎖のエネルギースペクトルと波動関数を決定する。発見された解は、このクラスのモデルに固有の量子エンタングルメントの問題に対処する可能性を与える。我々は,絡み合いの適切な数値尺度として扱われるコンカレンスの計算に重点を置いている。また,局所的絡み合い指標と考えられる2点相関関数の挙動を解析した。我々は、非零絡み合い領域を含むモデルの位相図を構築する。両スピン変数が絡み合う領域に共役する外部磁場の発音効果は、モデルのパラメータによっても絡み合いを増強し弱めることができる。 Using the exact diagonalization technique, we determine the energy spectrum and wave functions for finite chains described by the two-spin (Kugel--Khomskii) model with different types of intersubsystem exchange terms. The found solutions provide a possibility to address the problem of quantum entanglement inherent to this class of models. We put the main emphasis on the calculations of the concurrence treated as an adequate numerical measure of the entanglement. We also analyze the behavior of two-site correlation functions considered as a local indicator of entanglement. We construct the phase diagrams of the models involving the regions of nonzero entanglement. The pronounced effect of external fields, conjugated to both spin variables on the regions with entanglement, could both enhance and weaken the entanglement depending on the parameters of the models.	翻訳日:2023-04-18 07:56:28 公開日:2020-12-30
# Webのルーチン性と予測可能性の限界:Web追跡データを用いたデモグラフィックと行動差の調査 Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data ( http://arxiv.org/abs/2012.15112v1 ) ライセンス: Link先を確認	Juhi Kulshrestha, Marcos Oliveira, Orkut Karacalik, Denis Bonnay, Claudia Wagner	(参考訳) web上の人間の活動や動きを理解することは、計算社会科学者にとって重要であるだけでなく、レコメンデーション、キャッシング、広告、パーソナライゼーションのためのオンラインシステム設計のための貴重なガイダンスを提供する。そこで本研究では,web上のルーチンに従う傾向にあり,web訪問の繰り返しパターンが,閲覧行動の予測可能性を高めることを実証する。ウェブ上での人体移動の予測可能性の不確実性と理論的限界を測定するための情報理論フレームワークを提案する。異なる設計決定が測定に与える影響を体系的に評価する。このフレームワークをドイツのインターネットユーザのWeb追跡データセットに適用する。私たちの経験的結果は、Web上の個人のルーチンは、平均85%の閲覧行動を予測可能にしますが、その値は個人によって異なります。ユーザの予測能力の違いは、その人口統計学的特性と行動学的特性によってある程度説明できる。 Understanding human activities and movements on the Web is not only important for computational social scientists but can also offer valuable guidance for the design of online systems for recommendations, caching, advertising, and personalization. In this work, we demonstrate that people tend to follow routines on the Web, and these repetitive patterns of web visits increase their browsing behavior's achievable predictability. We present an information-theoretic framework for measuring the uncertainty and theoretical limits of predictability of human mobility on the Web. We systematically assess the impact of different design decisions on the measurement. We apply the framework to a web tracking dataset of German internet users. Our empirical results highlight that individual's routines on the Web make their browsing behavior predictable to 85% on average, though the value varies across individuals. We observe that these differences in the users' predictabilities can be explained to some extent by their demographic and behavioral attributes.	翻訳日:2023-04-18 07:55:42 公開日:2020-12-30
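One common way to turn an entropy estimate into a "theoretical limit of predictability" is to invert Fano's inequality, as is often done in the human-mobility literature. Whether the paper above uses exactly this formulation is an assumption; the sketch below only illustrates the idea. Given an entropy S (in bits) of a sequence over N distinct states, it solves S = H(p) + (1 - p) log2(N - 1) for the largest achievable prediction accuracy p. All names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable, with 0*log(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_predictability(entropy_bits: float, n_states: int, tol: float = 1e-10) -> float:
    """Upper bound on prediction accuracy implied by Fano's inequality.

    Solves entropy_bits = H(p) + (1 - p) * log2(n_states - 1) for p in
    [1/n_states, 1] by bisection (the left-hand side is decreasing in p there).
    """
    if n_states < 2:
        raise ValueError("need at least two states")
    if not 0.0 <= entropy_bits <= math.log2(n_states):
        raise ValueError("entropy must lie in [0, log2(n_states)]")

    def gap(p: float) -> float:
        return binary_entropy(p) + (1.0 - p) * math.log2(n_states - 1) - entropy_bits

    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid  # bound not yet reached: predictability can be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A fully regular browsing sequence (entropy 0) gives a bound of 1, and a uniformly random one over N sites gives 1/N; real users fall in between.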
# 動的相転移としての断熱量子輸送の散逸的破綻 Dissipative failure of adiabatic quantum transport as a dynamical phase transition ( http://arxiv.org/abs/2012.15212v1 ) ライセンス: Link先を確認	Fergus Barratt, Aleix Bou Comas, Philip J. D. Crowley, Vadim Oganesyan, Peter Sollich, Andrew G. Green	(参考訳) 絡み合いは断熱的な量子輸送の中心的資源である。位相緩和(デフェージング)は、軌道に偏りを与え、成功と失敗の間の遷移を駆動することで、その資源の利用可能性に影響を与える。この絡み合いの枯渇は、量子技術の実用化において重要である。本稿では、断熱輸送の失敗を動的相転移として理解することで、断熱計算の失敗に対する新たな視点を示す。これらのアイデアは、2スピン系における断熱量子輸送のトイモデルで実証される。 Entanglement is the central resource in adiabatic quantum transport. Dephasing affects the availability of that resource by biasing trajectories, driving transitions between success and failure. This depletion of entanglement is important for the practical implementation of quantum technologies. We present a new perspective on the failure of adiabatic computation by understanding the failure of adiabatic transport as a dynamical phase transition. These ideas are demonstrated in a toy model of adiabatic quantum transport in a two-spin system.	翻訳日:2023-04-18 07:47:18 公開日:2020-12-30
# 実世界ツインフィールド量子鍵分布のためのコヒーレント位相伝送 Coherent phase transfer for real-world twin-field quantum key distribution ( http://arxiv.org/abs/2012.15199v1 ) ライセンス: Link先を確認	Cecilia Clivati, Alice Meda, Simone Donadello, Salvatore Virzì, Marco Genovese, Filippo Levi, Alberto Mura, Mirko Pittaluga, Zhiliang L. Yuan, Andrew J. Shields, Marco Lucamarini, Ivo Pietro Degiovanni, Davide Calonico	(参考訳) 量子力学は、本質的にセキュアな暗号鍵を光学的に分配することを可能にする。ツインフィールド量子鍵分布は長距離ファイバでの実装において最も有望な手法であるが、当事者間の通信チャネルの光学長を安定化する必要がある。スプールファイバに基づく原理実証実験では、これは量子通信に周期的な調整フレームをインターリーブすることで達成された。このアプローチでは、鍵ストリーミングのデューティサイクルを長くするとチャネル長の制御が緩くなるという代償を伴い、実世界でこの手法を用いて鍵転送を成功させることは依然として大きな課題である。周波数計測に由来する干渉法を用いて,鍵のストリーミングとチャネル長制御を同時に行う方法を開発し,65 dBの損失を有する206 kmのフィールド敷設ファイバ上で実証した。提案手法は,チャネル長の変動に起因する量子ビット誤り率を<1%に低減し,実世界の量子通信に有効な解となる。 Quantum mechanics allows the distribution of intrinsically secure encryption keys by optical means. Twin-field quantum key distribution is the most promising technique for its implementation on long-distance fibers, but requires stabilizing the optical length of the communication channels between parties. In proof-of-principle experiments based on spooled fibers, this was achieved by interleaving the quantum communication with periodical adjustment frames. In this approach, longer duty cycles for the key streaming come at the cost of a looser control of channel length, and a successful key transfer using this technique in the real world remains a significant challenge. Using interferometry techniques derived from frequency metrology, we developed a solution for simultaneous key streaming and channel length control, and demonstrate it on a 206 km field-deployed fiber with 65 dB loss. Our technique reduces the quantum-bit-error-rate contributed by channel length variations to <1%, representing an effective solution for real-world quantum communications.	翻訳日:2023-04-18 07:47:11 公開日:2020-12-30
# 拡張重力理論における量子非局所性 Quantum nonlocality in extended theories of gravity ( http://arxiv.org/abs/2012.15331v1 ) ライセンス: Link先を確認	Victor A. S. V. Bittencourt, Massimo Blasone, Fabrizio Illuminati, Gaetano Lambiase, Giuseppe Gaetano Luciano, Luciano Petruzziello	(参考訳) 本研究では, 質量をもつ粒子の内部自由度における純粋状態のアインシュタイン-ポドルスキー-ローゼン相関が, 拡張重力理論によって記述される曲がった時空背景からどのような影響を受けるかを調べる。アインシュタイン・ヒルベルト作用に対する補正が曲率不変量について二次であるモデルを考え、弱場極限に焦点をあてる。非局所的な量子相関をClauser-Horne-Shimony-Holt(CHSH)不等式の破れによって定量化し、曲がった背景が、一般相対性理論に由来する主要項と、アインシュタイン重力への補正に由来するさらなる寄与によって、その破れをどのように抑制するかを示す。この結果は、光子対のような質量を持たない粒子に一般化することができ、拡張重力模型の精密な実験的検証を考案するために利用できる。 We investigate how pure-state Einstein-Podolsky-Rosen correlations in the internal degrees of freedom of massive particles are affected by a curved spacetime background described by extended theories of gravity. We consider models for which the corrections to the Einstein-Hilbert action are quadratic in the curvature invariants and we focus on the weak-field limit. We quantify nonlocal quantum correlations by means of the violation of the Clauser-Horne-Shimony-Holt inequality, and show how a curved background suppresses the violation by a leading term due to general relativity and a further contribution due to the corrections to Einstein gravity. Our results can be generalized to massless particles such as photon pairs and can thus be suitably exploited to devise precise experimental tests of extended models of gravity.	翻訳日:2023-04-18 07:39:09 公開日:2020-12-30
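For readers unfamiliar with the Clauser-Horne-Shimony-Holt quantity mentioned above, the following sketch evaluates it in the flat-spacetime textbook case only. It assumes a spin singlet with correlation E(a, b) = -cos(a - b) for coplanar measurement directions; the gravitational corrections studied in the paper are not modelled, and the function names are illustrative.

```python
import math


def singlet_correlation(angle_a: float, angle_b: float) -> float:
    """Spin correlation of a singlet state for coplanar measurement directions
    (flat-spacetime textbook result; no gravitational correction)."""
    return -math.cos(angle_a - angle_b)


def chsh_value(a, a_prime, b, b_prime, correlation=singlet_correlation) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b').

    Local hidden-variable models satisfy |S| <= 2.
    """
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))


# Standard optimal settings: |S| reaches 2*sqrt(2) (the Tsirelson bound).
s_flat = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
```

A background-dependent suppression of the kind described in the abstract would correspond to passing a different `correlation` function whose magnitude is reduced; any such function would have to come from the paper itself.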
# 浮揚ナノ球の量子運動におけるベクトルポラリトン Vectorial polaritons in the quantum motion of a levitated nanosphere ( http://arxiv.org/abs/2012.15265v1 ) ライセンス: Link先を確認	A. Ranfagni, P. Vezio, M. Calamai, A. Chowdhury, F. Marino, and F. Marin	(参考訳) 電磁場の素励起(光子)と量子化された機械振動(フォノン)の強い結合は、フォノンポラリトンとして知られるハイブリッド準粒子状態を生成する。その典型的な特徴は結合系の固有周波数間の反交差(avoided crossing)であり、Jaynes-Cummingsハミルトニアンによって典型的に示され、空洞光子が原子、イオン、励起子、スピンアンサンブル、超伝導量子ビットと結合する量子電磁力学実験で観察される。本研究では, 光学浮揚されたナノ球の量子運動におけるフォノンポラリトンの生成を実証する。粒子は光ピンセットによって高真空中に捕捉され、ピンセット光子のコヒーレント散乱によって単一のキャビティモードに強く結合される。 2次元運動は2つのほぼ縮退した成分に分かれ、これらは光学キャビティモードとともに3自由度を持つオプトメカニカル系を定義する。そのため、強結合領域に入ると、三部量子系に典型的な分散則を持つハイブリッドな光・機械状態が観察される。注目すべきことに、ここでの運動の独立成分は平面上の物理的な振動方向を特定し、それは光の偏光と同様に、ポラリトン場にベクトル的性質を与える。本研究は,光子成分とフォノン成分間の量子情報伝達のための新しいプロトコルへの道を開き,室温でのオプトメカニカルな絡み合い状態の実証に向けた鍵となるステップを示す。 The strong coupling between elementary excitations of the electromagnetic field (photons) and quantized mechanical vibrations (phonons) produces hybrid quasi-particle states, known as phonon-polaritons. Their typical signature is the avoided crossing between the eigenfrequencies of the coupled system, as paradigmatically illustrated by the Jaynes-Cummings Hamiltonian, and observed in quantum electrodynamics experiments where cavity photons are coupled to atoms, ions, excitons, spin ensembles and superconducting qubits. In this work, we demonstrate the generation of phonon-polaritons in the quantum motion of an optically levitated nanosphere. The particle is trapped in high vacuum by an optical tweezer and strongly coupled to a single cavity mode by coherent scattering of the tweezer photons. The two-dimensional motion splits into two nearly degenerate components that, together with the optical cavity mode, define an optomechanical system with three degrees of freedom. As such, when entering the strong coupling regime, we observe hybrid light-mechanical states with a dispersion law typical of tripartite quantum systems. 
Remarkably, the independent components of motion here identify a physical vibration direction on a plane that, similarly to the polarization of light, confers a vectorial nature to the polariton field. Our results pave the way to novel protocols for quantum information transfer between photonic and phononic components and represent a key step towards the demonstration of optomechanical entangled states at room temperature.	翻訳日:2023-04-18 07:37:27 公開日:2020-12-30
# 勾配の流れを罠にかける方法 How to trap a gradient flow ( http://arxiv.org/abs/2001.02968v3 ) ライセンス: Link先を確認	Sébastien Bubeck and Dan Mikulincer	(参考訳) 我々は、$\mathbb{R}^d$ のコンパクトな領域上の滑らかな関数の $\varepsilon$-近似定常点を見つける問題を考える。勾配降下のような次元に依存しないアプローチとは対照的に、ここでは$d$が有限かつ潜在的に小さい場合に焦点を当てる。この視点は1993年にVavasisによって探求され、彼は任意の固定された有限次元$d$に対して、勾配降下のオラクル複雑性 $O(1/\varepsilon^2)$ を改善するアルゴリズムを提案した。例えば、$d=2$の場合、Vavasisのアプローチは複雑性 $O(1/\varepsilon)$ を得る。さらに$d=2$で、彼は決定論的アルゴリズムに対して$\Omega(1/\sqrt{\varepsilon})$という下界も証明した(我々はこの結果をランダム化アルゴリズムに拡張する)。我々の主な貢献は、勾配流トラップ法(GFT)と呼ぶアルゴリズムと、そのオラクル複雑性の解析である。次元 $d=2$ において、GFT は Vavasis の下界とのギャップを(対数因子を除いて)閉じ、複雑性が $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$ であることを示す。次元$d=3$では、複雑性 $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$ を示し、Vavasisの$O\left(1 / \varepsilon^{1.2} \right)$を改善する。より高い次元では、GFTは、勾配降下やVavasisのアルゴリズムの多項式深さとは対照的に、対数的な並列深さの戦略であるという注目すべき性質を持つ。この高次元領域において、GFTの総作業量は、この問題に対する他の唯一の既知の多対数深さ戦略である単純なグリッド探索を2次的に改善する。我々はこの結果を、任意の固定次元においてVavasisのアルゴリズムを改善する cut and flow (CF) という別のアルゴリズムで補強する。 We consider the problem of finding an $\varepsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. In contrast with dimension-free approaches such as gradient descent, we focus here on the case where $d$ is finite, and potentially small. This viewpoint was explored in 1993 by Vavasis, who proposed an algorithm which, for any fixed finite dimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity of gradient descent. For example, for $d=2$, Vavasis' approach obtains the complexity $O(1/\varepsilon)$. Moreover, for $d=2$ he also proved a lower bound of $\Omega(1/\sqrt{\varepsilon})$ for deterministic algorithms (we extend this result to randomized algorithms). Our main contribution is an algorithm, which we call gradient flow trapping (GFT), and the analysis of its oracle complexity. 
In dimension $d=2$, GFT closes the gap with Vavasis' lower bound (up to a logarithmic factor), as we show that it has complexity $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$. In dimension $d=3$, we show a complexity of $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$, improving upon Vavasis' $O\left(1 / \varepsilon^{1.2} \right)$. In higher dimensions, GFT has the remarkable property of being a logarithmic parallel depth strategy, in stark contrast with the polynomial depth of gradient descent or Vavasis' algorithm. In this higher dimensional regime, the total work of GFT improves quadratically upon the only other known polylogarithmic depth strategy for this problem, namely naive grid search. We augment this result with another algorithm, named cut and flow (CF), which improves upon Vavasis' algorithm in any fixed dimension.	翻訳日:2023-01-13 05:25:17 公開日:2020-12-30
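To make the baseline in this abstract concrete, here is a minimal sketch of the dimension-free method it is compared against: plain gradient descent with step size 1/L, stopped once the gradient norm drops below ε. It is not the GFT or CF algorithm from the paper, and it ignores the compact-domain constraint. The test function, the smoothness constant, and all names are assumptions chosen for illustration.

```python
import math


def gradient_descent_to_stationarity(grad, x0, smoothness, eps, max_iters=100_000):
    """Run gradient descent with step 1/L until ||grad f(x)|| <= eps.

    For an L-smooth function the worst-case number of gradient queries scales
    as O(1/eps^2), which is the baseline quoted in the abstract.
    Returns the approximate stationary point and the number of steps taken.
    """
    x = list(x0)
    for iteration in range(max_iters):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) <= eps:
            return x, iteration
        x = [xi - gi / smoothness for xi, gi in zip(x, g)]
    raise RuntimeError("no eps-stationary point found within max_iters")


def example_grad(x):
    """Gradient of the illustrative function f(x) = sin(x[0]) + cos(x[1]),
    which is 1-smooth (its Hessian entries are bounded by 1 in magnitude)."""
    return [math.cos(x[0]), -math.sin(x[1])]


point, steps = gradient_descent_to_stationarity(
    example_grad, [1.0, 1.0], smoothness=1.0, eps=1e-6
)
```

On this well-conditioned example the loop stops after a handful of steps; the O(1/ε²) figure is a worst-case bound, which is what the low-dimensional algorithms in the paper improve upon.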
# RatLesNetv2: げっ歯類脳病変セグメンテーションのための完全畳み込みネットワーク RatLesNetv2: A Fully Convolutional Network for Rodent Brain Lesion Segmentation ( http://arxiv.org/abs/2001.09138v4 ) ライセンス: Link先を確認	Juan Miguel Valverde, Artem Shatillo, Riccardo de Feo, Olli Gröhn, Alejandra Sierra, Jussi Tohka	(参考訳) げっ歯類の磁気共鳴(MR)脳画像における病変のセグメンテーションのために、RatLesNetv2と名付けた完全畳み込みニューラルネットワーク(ConvNet)を提案する。 RatLesNetv2のアーキテクチャはオートエンコーダに似ており、最適化を容易にする残差ブロックを組み込んでいる。 RatLesNetv2は3次元画像でエンドツーエンドにトレーニングされ、前処理を必要としない。創薬のための局所脳虚血研究に用いられた、9つの異なる病変段階にある671匹のラットの916件のT2強調ラット脳MRIスキャンからなる非常に大規模なデータセットでRatLesNetv2を評価した。さらに,医療画像セグメンテーション用に設計された他の3つのConvNetと性能を比較した。 RatLesNetv2は他のConvNetと同等以上のDice係数を達成し、穴が著しく少なくハウスドルフ距離が小さい、より現実的でコンパクトなセグメンテーションを生成した。 RatLesNetv2のセグメンテーションのDiceスコアは、手動セグメンテーションの評価者間一致も上回った。結論として、RatLesNetv2は、自動病変セグメンテーションに利用でき、人間の作業量の削減と再現性の向上につながる。 RatLesNetv2はhttps://github.com/jmlipman/RatLesNetv2で公開されている。 We present a fully convolutional neural network (ConvNet), named RatLesNetv2, for segmenting lesions in rodent magnetic resonance (MR) brain images. The RatLesNetv2 architecture resembles an autoencoder and incorporates residual blocks that facilitate its optimization. RatLesNetv2 is trained end to end on three-dimensional images and requires no preprocessing. We evaluated RatLesNetv2 on an exceptionally large dataset composed of 916 T2-weighted rat brain MRI scans of 671 rats at nine different lesion stages that were used to study focal cerebral ischemia for drug development. In addition, we compared its performance with three other ConvNets specifically designed for medical image segmentation. RatLesNetv2 obtained Dice coefficient values similar to or higher than those of the other ConvNets, and it produced much more realistic and compact segmentations with notably fewer holes and lower Hausdorff distance. The Dice scores of RatLesNetv2 segmentations also exceeded inter-rater agreement of manual segmentations. In conclusion, RatLesNetv2 could be used for automated lesion segmentation, reducing human workload and improving reproducibility. 
RatLesNetv2 is publicly available at https://github.com/jmlipman/RatLesNetv2.	翻訳日:2023-01-07 05:24:46 公開日:2020-12-30
# 雑音入力を伴うマルチクラスガウス過程分類 Multi-class Gaussian Process Classification with Noisy Inputs ( http://arxiv.org/abs/2001.10523v3 ) ライセンス: Link先を確認	Carlos Villacampa-Calvo, Bryan Zaldivar, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato	(参考訳) 機械学習コミュニティでは、観測されたデータの入力属性にノイズがないと仮定することが一般的である。それでも、測定が完全に正確であることはないため、実際の問題では入力ノイズを伴うシナリオが一般的である。この入力ノイズが考慮されない場合、教師付き機械学習手法の性能は準最適になると予想される。本稿では,マルチクラス分類問題に着目し,ガウス過程(GP)を基礎となる分類器として利用する。天体物理学分野のデータセットを動機として、観測されたデータは入力にノイズを含む可能性があると仮定する。そこで我々は,入力ノイズを考慮できる複数のマルチクラスGP分類器を考案する。このような分類器は、モデルの潜在変数の事後分布を近似する変分推論を用いて効率的に訓練することができる。また、状況によっては事前にノイズ量が分かっている場合がある。このような場合、その情報を提案手法に容易に導入することができる。この事前情報は、よりよい性能をもたらすことが期待される。提案手法を,合成データと実データを含むいくつかの実験により評価した。これには、UCIリポジトリからのいくつかのデータセット、MNISTデータセット、天体物理学からのデータセットが含まれる。その結果,分類誤差は手法間で同程度であるものの,テスト対数尤度の観点では,提案手法の予測分布が入力ノイズを無視するGPベースの分類器の予測分布よりも優れていることがわかった。 It is a common practice in the machine learning community to assume that the observed data are noise-free in the input attributes. Nevertheless, scenarios with input noise are common in real problems, as measurements are never perfectly accurate. If this input noise is not taken into account, a supervised machine learning method is expected to perform sub-optimally. In this paper, we focus on multi-class classification problems and use Gaussian processes (GPs) as the underlying classifier. Motivated by a data set coming from the astrophysics domain, we hypothesize that the observed data may contain noise in the inputs. Therefore, we devise several multi-class GP classifiers that can account for input noise. Such classifiers can be efficiently trained using variational inference to approximate the posterior distribution of the latent variables of the model. Moreover, in some situations, the amount of noise can be known beforehand. If this is the case, it can be readily introduced in the proposed methods. This prior information is expected to lead to better performance results. 
We have evaluated the proposed methods by carrying out several experiments, involving synthetic and real data. These include several data sets from the UCI repository, the MNIST data set and a data set coming from astrophysics. The results obtained show that, although the classification error is similar across methods, the predictive distribution of the proposed methods is better, in terms of the test log-likelihood, than the predictive distribution of a classifier based on GPs that ignores input noise.	翻訳日:2023-01-06 02:24:13 公開日:2020-12-30
# 高速モデル推論のためのニューラルネットワーク圧縮フレームワーク Neural Network Compression Framework for fast model inference ( http://arxiv.org/abs/2002.08679v4 ) ライセンス: Link先を確認	Alexander Kozlov and Ivan Lazarevich and Vasily Shamporov and Nikolay Lyalyushkin and Yury Gorbachev	(参考訳) 本稿では,ニューラルネット圧縮フレームワーク(nncf)と呼ばれる,微調整によるニューラルネットワーク圧縮のための新しいフレームワークを提案する。様々なネットワーク圧縮手法の最近の進歩を活用し、スパーシティ、量子化、バイナリ化などいくつかの手法を実装している。これらの方法では、汎用ハードウェア計算ユニット(cpu、gpu)や特別なディープラーニングアクセラレータ上で効率的に実行できる、よりハードウェアフレンドリーなモデルを得ることができる。提案手法は,従来の精度を維持しつつ,推論時間を高速化するために,幅広いモデルに適用可能であることを示す。フレームワークはトレーニングサンプル内で使用することができ、それが供給されるか、あるいは最小限の適応で既存のトレーニングコードにシームレスに統合可能なスタンドアロンパッケージとして使用することができる。現在、NNCFのPyTorchバージョンがOpenVINO Training Extensionsの一部としてhttps://github.com/openvinotoolkit/nncfで公開されている。 In this work we present a new framework for neural networks compression with fine-tuning, which we called Neural Network Compression Framework (NNCF). It leverages recent advances of various network compression methods and implements some of them, such as sparsity, quantization, and binarization. These methods allow getting more hardware-friendly models which can be efficiently run on general-purpose hardware computation units (CPU, GPU) or special Deep Learning accelerators. We show that the developed methods can be successfully applied to a wide range of models to accelerate the inference time while keeping the original accuracy. The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code with minimal adaptations. Currently, a PyTorch version of NNCF is available as a part of OpenVINO Training Extensions at https://github.com/openvinotoolkit/nncf.	翻訳日:2022-12-30 08:12:00 公開日:2020-12-30
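As a rough illustration of one of the compression methods named in this abstract, the sketch below simulates symmetric uniform quantization of a weight vector ("fake quantization": quantize to signed integers, then map back to floats). This is a generic textbook scheme written in plain Python; it does not use or reproduce the NNCF API, and the function name and parameters are hypothetical.

```python
def fake_quantize(weights, num_bits=8):
    """Simulate symmetric uniform quantization of a list of float weights.

    Each weight is rounded to the nearest of 2**num_bits - 1 evenly spaced
    levels centred on zero, then mapped back to a float. Returns the
    dequantized weights and the scale (step size) that was used.
    Generic illustration only; not the NNCF implementation.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 1.0
    q_max = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max_abs / q_max                  # float value of one integer step
    quantized = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return [q * scale for q in quantized], scale


weights = [-1.0, -0.3, 0.0, 0.42, 0.9]
dequantized, scale = fake_quantize(weights, num_bits=8)
```

The rounding error per weight is at most half a step, which is why fine-tuning with such simulated quantization in the loop, as the abstract describes, can often recover the original accuracy.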
# ディープネットワーク最適化アルゴリズムの収束保証に関する基礎的アプローチ An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks ( http://arxiv.org/abs/2002.09051v2 ) ライセンス: Link先を確認	Vincent Roulet and Zaid Harchaoui	(参考訳) 本稿では,基本引数と計算に基づく深層ネットワークの最適化アルゴリズムの収束保証を得るための手法を提案する。収束解析は、機械学習ソフトウェアにおけるディープネットワークの実装の中心となる最適化オラクルの分析構造と計算構造を中心に展開される。深層ネットワークの学習に使用される一階最適化アルゴリズムの収束挙動を制御する滑らかさ定数の推定を体系的に計算する方法を提案する。現代のディープネットワークで発生する多様なサンプルコンポーネントとアーキテクチャは、そのアプローチを説明するために展示物にまたがる。 We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations. The convergence analysis revolves around the analytical and computational structures of optimization oracles central to the implementation of deep networks in machine learning software. We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks. A diverse set of example components and architectures arising in modern deep networks intersperse the exposition to illustrate the approach.	翻訳日:2022-12-30 07:07:57 公開日:2020-12-30
# 機械学習研究における再現性の向上 (NeurIPS 2019 Reproducibility Programからの報告) Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program) ( http://arxiv.org/abs/2003.12206v4 ) ライセンス: Link先を確認	Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, Hugo Larochelle	(参考訳) 機械学習研究の課題の1つは、提示・公表された結果が健全で信頼性が高いことを保証することである。再現性、すなわち同じコードとデータ(利用可能であれば)を使用して論文や講演で示されたものと同様の結果を得ることは、研究結果の信頼性を検証するために必要なステップである。再現性はまた、オープンでアクセスしやすい研究を促進するための重要なステップであり、科学コミュニティが新しい発見を迅速に統合し、アイデアを実践に転換することを可能にする。再現性はまた、意図しないエラーを減らす可能性のある堅牢な実験ワークフローの使用を促進する。 2019年、機械学習研究の主要な国際会議であるNeural Information Processing Systems (NeurIPS) カンファレンスは、機械学習研究の実施、コミュニケーション、評価に関するコミュニティ全体の標準を改善するために設計された再現性プログラムを導入した。プログラムには3つのコンポーネントが含まれている: コード提出ポリシー、コミュニティ全体の再現性チャレンジ、および論文投稿プロセスの一部としての機械学習再現性チェックリストの導入である。本稿では,これら各コンポーネントについて,その展開方法と,このイニシアティブから学ぶことができたことを述べる。 One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is, obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible research, thereby allowing the scientific community to quickly integrate new findings and convert ideas to practice. Reproducibility also promotes the use of robust experimental workflows, which potentially reduce unintentional errors. In 2019, the Neural Information Processing Systems (NeurIPS) conference, the premier international conference for research in machine learning, introduced a reproducibility program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research. 
The program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. In this paper, we describe each of these components, how it was deployed, as well as what we were able to learn from this initiative.	翻訳日:2022-12-19 04:37:05 公開日:2020-12-30
# ResNeSt: 分割アテンションネットワーク ResNeSt: Split-Attention Networks ( http://arxiv.org/abs/2004.08955v2 ) ライセンス: Link先を確認	Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola	(参考訳) 特徴マップの注意とマルチパス表現が視覚認識に重要であることはよく知られている。本稿では,異なるネットワークブランチにチャネル毎の注意を向け,機能横断的なインタラクションを捉え,多様な表現を学習するモジュラー化アーキテクチャを提案する。我々の設計は単純で統一された計算ブロックとなり、少数の変数だけでパラメータ化できる。我々のモデルはResNeStと呼ばれ、画像分類の精度と遅延トレードオフにおいてEfficientNetより優れています。さらに、ResNeStはバックボーンとして機能するいくつかの公開ベンチマークにおいて優れた転送学習結果を達成しており、COCO-LVISチャレンジの勝者として採用されている。完全なシステムと事前訓練されたモデルのソースコードが公開されている。 It is well known that featuremap attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification. In addition, ResNeSt has achieved superior transfer learning results on several public benchmarks serving as the backbone, and has been adopted by the winning entries of COCO-LVIS challenge. The source code for complete system and pretrained models are publicly available.	翻訳日:2022-12-12 00:21:37 公開日:2020-12-30
# ニューラルネットワーク高速化のための最近のFPGAにおける低電圧動作の実験的検討 An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration ( http://arxiv.org/abs/2005.03451v2 ) ライセンス: Link先を確認	Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu	(参考訳) 本研究では,フィールドプログラマブルゲートアレイ(FPGA)にマッピングされた畳み込みニューラルネットワーク(CNN)アクセラレータの電力効率を向上させるために,回路の供給電圧を公称レベル以下に下げるアンダーボルティング(低電圧化)手法を実証的に評価する。安全な電圧レベルを下回るアンダーボルティングは、回路遅延の過度な増加によるタイミング障害を引き起こす可能性がある。このようなアクセラレータの信頼性と電力のトレードオフを評価する。具体的には、実FPGAの複数のコンポーネントの低電圧動作を実験的に検討し、対応するCNNアクセラレータの信頼性挙動を特徴付け、低電圧動作の欠点を最小化するための手法を提案し、アンダーボルティングを量子化とプルーニングというアーキテクチャ上のCNN最適化技術と組み合わせる。環境温度がアクセラレータの信頼性・電力トレードオフに及ぼす影響についても検討する。我々は,最新のXilinx ZCU102 FPGAプラットフォームの3つの同一サンプル上で,5つの最先端の画像分類CNNベンチマークを用いて実験を行った。このアプローチにより、ソフトウェアとハードウェアの両方のばらつきに対するアンダーボルティング手法の効果を調べることができる。我々はアンダーボルティングにより3倍以上の電力効率(GOPs/W)向上を得る。このゲインのうち2.6倍は、電圧ガードバンド領域、すなわち最悪の環境・回路条件でも正しい機能を保証するためにFPGAベンダーが設定した公称レベル以下の安全な電圧領域を取り除いた結果である。電力効率向上の43%は、ガードバンドをさらに下回るアンダーボルティングによるものであり、CNNアクセラレータの精度低下という代償を伴う。この精度低下を防ぐ効果的な周波数アンダースケーリング手法を評価し,それによって電力効率の向上が43%から25%に低下することを確認した。 We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. 
We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.	翻訳日:2022-12-07 01:23:45 公開日:2020-12-30
# 深層学習に基づくテキスト・スタイル・トランスファーのレビュー Review of Text Style Transfer Based on Deep Learning ( http://arxiv.org/abs/2005.02914v3 ) ライセンス: Link先を確認	Xiangyang Li, Guo Pu, Keyu Ming, Pu Li, Jie Wang, Yuxuan Wang	(参考訳) テキストスタイル転送は最近の自然言語処理におけるホットな問題であり、主に、テキストに変更を加えて特定の状況や聴衆、目的に適応させる方法を研究する。テキストのスタイルは、通常、形態論、文法、感情、複雑さ、流暢さ、時制、トーンなど多くの側面を含んでいる。従来のテキストスタイル転送モデルでは、テキストスタイルは一般的に専門家の知識と手作りのルールに依存していたが、自然言語処理の分野でのディープラーニングの適用により、深層学習に基づくテキストスタイル転送手法が盛んに研究され始めた。近年,自然言語処理研究において,テキストスタイル転送がホットな問題となっている。本稿では,近年の深層学習に基づくテキストスタイル転送モデルの研究をまとめ,主要な研究の方向性と進歩を要約し,分析し,比較する。さらに、テキストスタイル転送によく使用される公開データセットや評価指標も紹介する。最後に、既存のテキストスタイル転送モデルの特徴を要約し、深層学習に基づくテキストスタイル転送モデルの今後の開発動向を分析し予測する。 Text style transfer is a hot issue in recent natural language processing, which mainly studies how to adapt text to different specific situations, audiences, and purposes by making some changes. The style of a text usually includes many aspects such as morphology, grammar, emotion, complexity, fluency, tense, tone, and so on. In the traditional text style transfer model, the text style generally relies on expert knowledge and hand-designed rules, but with the application of deep learning in the field of natural language processing, text style transfer methods based on deep learning have started to be heavily researched. In recent years, text style transfer has become a hot issue in natural language processing research. This article summarizes the research on text style transfer models based on deep learning in recent years, and summarizes, analyzes, and compares the main research directions and progress. In addition, the article also introduces public data sets and evaluation indicators commonly used for text style transfer. Finally, the characteristics of existing text style transfer models are summarized, and the future development trend of text style transfer models based on deep learning is analyzed and forecasted.	翻訳日:2022-12-06 04:58:22 公開日:2020-12-30
# 学習特徴とビュー合成による長期視覚定位のための基準ポーズ生成 Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis ( http://arxiv.org/abs/2005.05179v4 ) ライセンス: Link先を確認	Zichao Zhang, Torsten Sattler, Davide Scaramuzza	(参考訳) 視覚的ローカライゼーションは、自動運転と拡張現実を可能にする重要な技術のひとつだ。正確な6自由度(DoF)参照ポーズを持つ高品質データセットは、既存のメソッドのベンチマークと改善の基盤である。伝統的に、参照ポーズはStructure-from-Motion (SfM)を介して得られる。しかし、SfM自体は、画像が昼夜の変化など異なる条件下で撮影された場合に失敗しやすい局所特徴に依存している。同時に、特徴の対応を手動でアノテートする方法はスケーラブルではなく、不正確になる可能性がある。本研究では,3次元モデルのレンダリングと実画像との間の、学習された特徴による特徴マッチングに基づいて参照ポーズを生成する半自動手法を提案する。初期ポーズ推定が与えられると、現在のポーズ推定からのモデルのレンダリングに対する特徴マッチングに基づいて、ポーズを反復的に洗練する。我々は,一般的なAachen Day-Nightデータセットの夜間参照ポーズを大幅に改善し,最先端の視覚的ローカライゼーション手法が元の参照ポーズから予測されるよりも優れた性能(最大$47\%$)を示すことを明らかにする。我々は、データセットを新しい夜間テスト画像で拡張し、新しい参照ポーズに対する不確実性推定を提供し、新しい評価基準を導入する。参照ポーズとフレームワークは、論文の出版時に公開する予定である。 Visual Localization is one of the key enabling technologies for autonomous driving and augmented reality. High quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features which are prone to fail when images are taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to $47\%$) than predicted by the original reference poses. 
We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.	翻訳日:2022-12-04 20:28:51 公開日:2020-12-30
# 非漸近解析による連続空間MDPにおけるモンテカルロ計画 POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis ( http://arxiv.org/abs/2006.04672v2 ) ライセンス: Link先を確認	Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar	(参考訳) モンテカルロ・ツリー・サーチ(MCTS)に代表されるモンテカルロ計画は、有限空間の応用において顕著な性能を示した。本稿では,連続的な状態・行動空間を持つ環境におけるモンテカルロ計画を考える。これは制御やロボット工学に重要な応用を持つにもかかわらず、理解がはるかに進んでいない問題である。我々は,階層的楽観的最適化(HOO)(Bubeck et al., 2011)と呼ばれる連続腕バンディット戦略でMCTSを増強するアルゴリズムであるPOLY-HOOTを紹介する。具体的には,上側信頼境界におけるボーナス項に対数ではなく適切な多項式を用いることでHOOを強化した。このような多項式ボーナスは、AlphaGo Zero(Silver et al., 2017b)における経験的成功と、有限空間MCTS(Shah et al., 2019)の理論的保証を達成する上での重要な役割によって動機付けられている。非定常バンディット問題における拡張HOOアルゴリズムのリグレットを初めて考察する。この結果をビルディングブロックとして用いることで、POLY-HOOTの非漸近収束保証を確立する:値推定は多項式速度で最適値関数の任意の小さな近傍に収束する。理論的な知見を裏付ける実験結果も提供します。 Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. 
We further provide experimental results that corroborate our theoretical findings.	翻訳日:2022-11-24 00:13:30 公開日:2020-12-30
# 最近傍サンプリングを用いた条件付き相互情報量のためのニューラル推定器 Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling ( http://arxiv.org/abs/2006.07225v3 ) ライセンス: Link先を確認	Sina Molavipour, Germán Bassi, Mikael Skoglund	(参考訳) サンプルの集合から相互情報量(MI)や条件付き相互情報量(CMI)を推定することは、長年の課題である。この領域における最近の研究は、人工ニューラルネットワークの近似能力を活用し、従来の手法よりも改善されている。この新しいアプローチにおける重要な課題の1つは、元のデータセットから、サンプルが特定の積密度関数に従って分布する別のデータセットを得る必要があることである。 CMIを推定する場合,これは特に困難である。本稿では,リサンプリングを行い,標本平均に対する高信頼の集中不等式を導出するために,k近傍法 (k-NN) に基づく新しい手法を提案する。次に、ニューラルネットワーク分類器を訓練するためにこの技術を使用し、それに応じてCMIを推定する。この手法を用いて3つの推定器を提案し、それらの一致性を証明し、それらと文献における類似のアプローチとの比較を行い、推定器の精度と分散の観点からCMI推定の改善を実験的に示す。 The estimation of mutual information (MI) or conditional mutual information (CMI) from a set of samples is a long-standing problem. A recent line of work in this area has leveraged the approximation power of artificial neural networks and has shown improvements over conventional methods. One important challenge in this new approach is the need to obtain, given the original dataset, a different set where the samples are distributed according to a specific product density function. This is particularly challenging when estimating CMI. In this paper, we introduce a new technique, based on k nearest neighbors (k-NN), to perform the resampling and derive high-confidence concentration bounds for the sample average. Then the technique is employed to train a neural network classifier and the CMI is estimated accordingly. We propose three estimators using this technique and prove their consistency, make a comparison between them and similar approaches in the literature, and experimentally show improvements in estimating the CMI in terms of accuracy and variance of the estimators.	翻訳日:2022-11-22 04:53:18 公開日:2020-12-30
# 半教師付き分類のためのクラス注意拡散ネットワーク Class-Attentive Diffusion Network for Semi-Supervised Classification ( http://arxiv.org/abs/2006.10222v3 ) ライセンス: Link先を確認	Jongin Lim, Daeho Um, Hyung Jin Chang, Dae Ung Jo, Jin Young Choi	(参考訳) 近年,半教師付き分類のためのグラフニューラルネットワークが広く研究されている。しかし、既存の手法は限られた隣人の情報のみを使用し、グラフのクラス間接続を扱わない。本稿では,Kホップ近傍のうち,おそらく同一クラスのノードを適応的に集約する新しい集約スキームであるAdaCAD(Adaptive aggregation with Class-Attentive Diffusion)を提案する。そこで我々はまず,Class-Attentive Diffusion(CAD)と呼ばれる新しい確率過程を提案し,クラス内ノードへの注意を強め,クラス間ノードへの注意を弱める。グラフ構造のみによって決定される遷移行列を持つ既存の拡散法とは対照的に,CADは分類器を用いたクラス注意遷移行列の設計により,ノードの特徴とグラフ構造の両方を考慮している。さらに,局所的なクラス文脈に応じてノードごとに拡散結果の反映率を変える適応的更新手法を提案する。主な利点として、AdaCADはノードラベルとグラフトポロジの相違に起因するクラス間特徴の望ましくない混合の問題を軽減する。 AdaCADに基づき,Class-Attentive Diffusion Network(CAD-Net)と呼ぶ単純なモデルを構築した。7つのベンチマークデータセットでの広範な実験により提案手法の有効性が一貫して示され,CAD-Netは最先端の手法を大きく上回る。コードはhttps://github.com/ljin0429/CAD-Netで入手できる。 Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes probably of the same class among K-hop neighbors. To this end, we first propose a novel stochastic process, called Class-Attentive Diffusion (CAD), that strengthens attention to intra-class nodes and attenuates attention to inter-class nodes. In contrast to the existing diffusion methods with a transition matrix determined solely by the graph structure, CAD considers both the node features and the graph structure with the design of our class-attentive transition matrix that utilizes a classifier. Then, we further propose an adaptive update scheme that leverages different reflection ratios of the diffusion result for each node depending on the local class-context. 
As the main advantage, AdaCAD alleviates the problem of undesired mixing of inter-class features caused by discrepancies between node labels and the graph topology. Built on AdaCAD, we construct a simple model called Class-Attentive Diffusion Network (CAD-Net). Extensive experiments on seven benchmark datasets consistently demonstrate the efficacy of the proposed method and our CAD-Net significantly outperforms the state-of-the-art methods. Code is available at https://github.com/ljin0429/CAD-Net.	翻訳日:2022-11-19 10:08:59 公開日:2020-12-30
# ANOVA平均次元の効率的な推定とニューラルネット分類への応用 Efficient estimation of the ANOVA mean dimension, with an application to neural net classification ( http://arxiv.org/abs/2007.01281v4 ) ライセンス: Link先を確認	Christopher Hoyt and Art B. Owen	(参考訳) $d$変数のブラックボックス関数の平均次元は、その関数が高次または低次の相互作用にどの程度支配されているかを要約する便利な方法である。 $2^d-1$個の分散成分で表されるが、leave-one-out法で推定できる$d$個のSobol'指標の和として書くことができる。本稿では、これらのleave-one-out法の分散を比較する。すなわち、winding stairsと呼ばれるギブスサンプラー、ベースラインから各変数を1つずつ変化させるラジアルサンプラー、関数評価を再利用しないため他の手法の約2倍のコストがかかるナイーブサンプラーである。加法的関数では、ラジアル法とwinding stairsが最も効率的である。乗法的関数の場合、因子の尖度が高いとナイーブ法が最も効率的になりやすい。例として、MNISTデータセットの数字を分類するニューラルネットワーク分類器の平均次元について考察する。この分類器は$784$ピクセルの関数である。この問題では、winding stairsが最適なアルゴリズムである。最終softmax層への入力の平均次元は$1.35$から$2.0$の範囲であることがわかった。 The mean dimension of a black box function of $d$ variables is a convenient way to summarize the extent to which it is dominated by high or low order interactions. It is expressed in terms of $2^d-1$ variance components but it can be written as the sum of $d$ Sobol' indices that can be estimated by leave one out methods. We compare the variance of these leave one out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations and so costs about double the other methods. For an additive function the radial and winding stairs are most efficient. For a multiplicative function the naive method can easily be most efficient if the factors have high kurtosis. As an illustration we consider the mean dimension of a neural network classifier of digits from the MNIST data set. The classifier is a function of $784$ pixels. For that problem, winding stairs is the best algorithm. We find that inputs to the final softmax layer have mean dimensions ranging from $1.35$ to $2.0$.	翻訳日:2022-11-14 15:02:44 公開日:2020-12-30
# EfficientHRNet:軽量高分解能マルチパーソンポーズ推定のための効率的なスケーリング EfficientHRNet: Efficient Scaling for Lightweight High-Resolution Multi-Person Pose Estimation ( http://arxiv.org/abs/2007.08090v2 ) ライセンス: Link先を確認	Christopher Neff, Aneri Sheth, Steven Furgurson, Hamed Tabkhi	(参考訳) 多くの新興スマートIoTアプリケーションの軽量なマルチパーソンポーズ推定に対する需要が高まっている。しかし、既存のアルゴリズムは大きなモデルサイズと厳しい計算要求を持ち、リアルタイムアプリケーションやリソース制約のあるハードウェアへのデプロイには不適である。軽量でリアルタイムなアプローチは極めて稀であり、精度が劣るコストがかかる。本稿では,リソース制約されたデバイス上でリアルタイムに動作可能な軽量な多人数ポーズ推定装置であるEfficientHRNetを提案する。 EfficientHRNetは、高解像度の特徴表現によるモデルスケーリングの最近の進歩を統合することで、高精度なモデルを作成しながら、リアルタイムのパフォーマンスを達成するのに十分な計算量を削減している。最大のモデルは現在の最先端の4.4%の精度で、モデルサイズは1/3、計算量は1/6でnvidia jetson xavierでは23fpsとなる。最上位のリアルタイムアプローチと比較して、EfficientHRNetは22%の精度向上を実現し、1/3のパワーで同様のFPSを実現している。あらゆるレベルで、効率の良いHRNetは他のボトムアップな2次元ポーズ推定手法よりも計算効率が良く、高い競争精度を実現している。 There is an increasing demand for lightweight multi-person pose estimation for many emerging smart IoT applications. However, the existing algorithms tend to have large model sizes and intense computational requirements, making them ill-suited for real-time applications and deployment on resource-constrained hardware. Lightweight and real-time approaches are exceedingly rare and come at the cost of inferior accuracy. In this paper, we present EfficientHRNet, a family of lightweight multi-person human pose estimators that are able to perform in real-time on resource-constrained devices. By unifying recent advances in model scaling with high-resolution feature representations, EfficientHRNet creates highly accurate models while reducing computation enough to achieve real-time performance. The largest model is able to come within 4.4% accuracy of the current state-of-the-art, while having 1/3 the model size and 1/6 the computation, achieving 23 FPS on Nvidia Jetson Xavier. Compared to the top real-time approach, EfficientHRNet increases accuracy by 22% while achieving similar FPS with 1/3 the power. 
At every level, EfficientHRNet proves to be more computationally efficient than other bottom-up 2D human pose estimation approaches, while achieving highly competitive accuracy.	翻訳日:2022-11-09 23:00:12 公開日:2020-12-30
# ロボットハンドオーバ開始のためのジェスチャー認識 Gesture Recognition for Initiating Human-to-Robot Handovers ( http://arxiv.org/abs/2007.09945v2 ) ライセンス: Link先を確認	Jun Kwan, Chinkye Tan and Akansel Cosgun	(参考訳) 人間とロボットのハンドオーバは多くの人間とロボットのインタラクションシナリオに役立ちます。人間がハンドオーバを開始する意図を認識させることが重要であり、ハンドオーバが意図されていなければ、ロボットは人間からオブジェクトを取り出そうとしない。ハンドオーバジェスチャー認識は単一のRGB画像のバイナリ分類問題として機能する。 rgb画像から関連する特徴を抽出するために、物体を検出するための3つの別個のニューラルネットワークモジュール、人体キーポイントと頭部方向を実装し、特徴ベクトルをディープニューラルネットワークに渡してバイナリ分類を行う。以上の結果から,ハンドオーバ動作は90%以上の精度で正しく識別できることがわかった。機能の抽象化により、アプローチはモジュール化され、異なるオブジェクトや人体タイプに一般化できます。 Human-to-Robot handovers are useful for many Human-Robot Interaction scenarios. It is important to recognize when a human intends to initiate handovers, so that the robot does not try to take objects from humans when a handover is not intended. We pose the handover gesture recognition as a binary classification problem in a single RGB image. Three separate neural network modules for detecting the object, human body key points and head orientation, are implemented to extract relevant features from the RGB images, and then the feature vectors are passed into a deep neural net to perform binary classification. Our results show that the handover gestures are correctly identified with an accuracy of over 90%. The abstraction of the features makes our approach modular and generalizable to different objects and human body types.	翻訳日:2022-11-08 14:35:14 公開日:2020-12-30
# 正規化エピポーラ誤差の幾何学的解釈 Geometric Interpretations of the Normalized Epipolar Error ( http://arxiv.org/abs/2008.01254v7 ) ライセンス: Link先を確認	Seong Hun Lee, Javier Civera	(参考訳) 本研究では,正規化エピポーラ誤差の幾何学的解釈を提供する。最も注目すべき点として、この誤差が次の量に直接関係していることを示す。(1)2つの逆投影光線間の最短距離、(2)2つの境界エピポーラ面間の二面角、(3) $L_1$最適な角度再投影誤差である。 In this work, we provide geometric interpretations of the normalized epipolar error. Most notably, we show that it is directly related to the following quantities: (1) the shortest distance between the two backprojected rays, (2) the dihedral angle between the two bounding epipolar planes, and (3) the $L_1$-optimal angular reprojection error.	翻訳日:2022-11-03 00:24:02 公開日:2020-12-30
# 因果的コモンセンス・プロット・オーダリングによるストーリーテリングの自動化 Automated Storytelling via Causal, Commonsense Plot Ordering ( http://arxiv.org/abs/2009.00829v2 ) ライセンス: Link先を確認	Prithviraj Ammanabrolu, Wesley Cheung, William Broniec, Mark O. Riedl	(参考訳) 自動ストーリープロット生成は、プロットイベントの一貫性のあるシーケンスを生成するタスクである。プロットイベント間の因果関係は、ストーリーとプロットの一貫性の知覚を高めると考えられている。本研究では,コモンセンス推論から推定される因果関係として,ソフト因果関係の概念を導入する。 C2POは、この概念をCausal, Commonsense Plot Orderingを通じて運用する物語生成のアプローチである。人間の参加者によるプロトコルを用いて,異なる常識推論と帰納的バイアスを持つベースラインシステムと比較してシステムを評価し,知覚されるストーリー品質におけるソフト因果関係の役割を明らかにする。これらの研究を通じて、ストーリーテリングジャンルにおけるコモンセンス規範の変化がストーリー品質の知覚にどのように影響するかを考察する。 Automated story plot generation is the task of generating a coherent sequence of plot events. Causal relations between plot events are believed to increase the perception of story and plot coherence. In this work, we introduce the concept of soft causal relations as causal relations inferred from commonsense reasoning. We demonstrate C2PO, an approach to narrative generation that operationalizes this concept through Causal, Commonsense Plot Ordering. Using human-participant protocols, we evaluate our system against baseline systems with different commonsense reasoning and inductive biases to determine the role of soft causal relations in perceived story quality. Through these studies we also probe the interplay of how changes in commonsense norms across storytelling genres affect perceptions of story quality.	翻訳日:2022-10-22 18:25:31 公開日:2020-12-30
# 胸部X線異常分類に応用した深層階層型マルチラベル分類 Deep Hiearchical Multi-Label Classification Applied to Chest X-Ray Abnormality Taxonomies ( http://arxiv.org/abs/2009.05609v3 ) ライセンス: Link先を確認	Haomin Chen, Shun Miao, Daguang Xu, Gregory D. Hager, Adam P. Harrison	(参考訳) CXRは決定的かつ極めて一般的な診断ツールであり、CADソリューションの研究に繋がる。しかし,高い分類精度と臨床的分類を尊重し,組み込む有意義なモデル予測がcadユーザビリティに不可欠である。そこで本研究では,CXR CADのためのHMLCアプローチを提案する。他の階層システムとは異なり、まずネットワークを訓練して条件付き確率を直接モデル化し、非条件付き確率で精製することが性能向上の鍵となる。さらに,非条件確率に対して数値的に安定なクロスエントロピー損失関数を定式化し,具体的な性能向上を実現する。最後に,HMLCが欠落したラベルや不完全ラベルの管理に有効であることを示す。我々の知る限りでは、HMLCを医療画像CADに適用するのは初めてである。我々はPLCOデータセットのCXRアームから異常ラベルを検出するためのアプローチを広範囲に評価した。完全なラベルを使用する場合、このデータセットで報告されている最上位の 0.887 の平均 AUC を報告する。これらの結果はPadChestデータセットの補助的な実験によって支援され、AUCとAPでそれぞれ1.2%と4.1%の大幅な改善が報告された。最後に、HMLCアプローチが不完全なラベル付きデータをよりうまく扱えることを示す。これらの性能改善は、分類学的予測の本質的な有用性と相まって、本手法がCXR CADにとって有用なステップであることを示唆している。 CXRs are a crucial and extraordinarily common diagnostic tool, leading to heavy research for CAD solutions. However, both high classification accuracy and meaningful model predictions that respect and incorporate clinical taxonomies are crucial for CAD usability. To this end, we present a deep HMLC approach for CXR CAD. Different than other hierarchical systems, we show that first training the network to model conditional probability directly and then refining it with unconditional probabilities is key in boosting performance. In addition, we also formulate a numerically stable cross-entropy loss function for unconditional probabilities that provides concrete performance improvements. Finally, we demonstrate that HMLC can be an effective means to manage missing or incomplete labels. To the best of our knowledge, we are the first to apply HMLC to medical imaging CAD. We extensively evaluate our approach on detecting abnormality labels from the CXR arm of the PLCO dataset, which comprises over $198,000$ manually annotated CXRs. 
When using complete labels, we report a mean AUC of 0.887, the highest yet reported for this dataset. These results are supported by ancillary experiments on the PadChest dataset, where we also report significant improvements, 1.2% and 4.1% in AUC and AP, respectively over strong "flat" classifiers. Finally, we demonstrate that our HMLC approach can much better handle incompletely labelled data. These performance improvements, combined with the inherent usefulness of taxonomic predictions, indicate that our approach represents a useful step forward for CXR CAD.	翻訳日:2022-10-19 20:49:03 公開日:2020-12-30
# YOLObile:圧縮コンパイル協調設計によるモバイルデバイス上のリアルタイムオブジェクト検出 YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design ( http://arxiv.org/abs/2009.05697v2 ) ライセンス: Link先を確認	Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang	(参考訳) 物体検出技術の急速な発展と幅広い利用は、物体検出器の精度と速度の両方に注目を集めた。しかし、現在の最先端のオブジェクト検出研究は、大きなモデルを用いる精度指向で高いレイテンシを招くか、軽量モデルを用いる速度指向で精度を犠牲にするかのいずれかである。本研究では,圧縮コンパイル協調設計によりモバイル端末上でリアルタイムなオブジェクト検出を行う YOLObile フレームワークを提案する。任意のカーネルサイズに対して新しいブロックパンチプルーニング方式を提案する。モバイルデバイス上での計算効率を向上させるため,GPU-CPU協調方式と高度なコンパイラ支援最適化が採用されている。実験結果から, 提案するプルーニング方式は49.0 mAPでYOLOv4の14$\times$の圧縮率を達成することが示された。 YOLObileフレームワークでは,Samsung Galaxy S20上でGPUを用いて17 FPSの推論速度を実現する。提案したGPU-CPU協調方式を取り入れることで、推論速度は19.1 FPSに向上し、元のYOLOv4を5$\times$の高速化で上回った。ソースコードは https://github.com/nightsnack/YOLObile にある。 The rapid development and wide utilization of object detection techniques have aroused attention on both accuracy and speed of object detectors. However, the current state-of-the-art object detection works are either accuracy-oriented using a large model but leading to high latency or speed-oriented using a lightweight model but sacrificing accuracy. In this work, we propose YOLObile framework, a real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves 14$\times$ compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using GPU on Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed is increased to 19.1 FPS, and outperforms the original YOLOv4 by 5$\times$ speedup. Source code is at: https://github.com/nightsnack/YOLObile.	翻訳日:2022-10-19 07:41:26 公開日:2020-12-30
# Puzzle Mix: 最適Mixupのためのサリエンシーと局所統計の活用 Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup ( http://arxiv.org/abs/2009.06962v2 ) ライセンス: Link先を確認	Jang-Hyun Kim, Wonho Choo, Hyun Oh Song	(参考訳) 深層ニューラルネットワークはトレーニング分布の適合において優れた性能を発揮する一方、学習されたネットワークは過度に適合する傾向があり、敵対的攻撃を受けやすい。この点に関して,最近,ミックスアップに基づく拡張手法がいくつか提案されている。しかし、これらのアプローチは主に、未確認の仮想例の作成に重点を置いており、時にはネットワークに誤解を招く監視信号を提供することもある。そこで本研究では,自然例のサリエンシー情報と基礎となる統計情報を明示的に活用するミックスアップ手法である Puzzle Mix を提案する。これにより、最適混合マスクのためのマルチラベル目的と、サリエンシーで割り引いた最適輸送目的とを交互に最適化する興味深い最適化問題が生じる。実験の結果, CIFAR-100, Tiny-ImageNet, ImageNetにおいて, Puzzle Mixは他のミックスアップ手法と比較して最先端の汎化性能と敵対的ロバスト性を達成することが示された。ソースコードはhttps://github.com/snu-mllab/PuzzleMixで入手できる。 While deep neural networks achieve great performance on fitting the training distribution, the learned networks are prone to overfitting and are susceptible to adversarial attacks. In this regard, a number of mixup based augmentation methods have been recently proposed. However, these approaches mainly focus on creating previously unseen virtual examples and can sometimes provide misleading supervisory signal to the network. To this end, we propose Puzzle Mix, a mixup method for explicitly utilizing the saliency information and the underlying statistics of the natural examples. This leads to an interesting optimization problem alternating between the multi-label objective for optimal mixing mask and saliency discounted optimal transport objective. Our experiments show Puzzle Mix achieves the state of the art generalization and the adversarial robustness results compared to other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet datasets. The source code is available at https://github.com/snu-mllab/PuzzleMix.	翻訳日:2022-10-18 05:14:49 公開日:2020-12-30
# ビデオに基づく人物認識のためのフレームアグリゲーションとマルチモーダルフュージョンフレームワーク Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition ( http://arxiv.org/abs/2010.09290v2 ) ライセンス: Link先を確認	Fangtao Li, Wenzhe Wang, Zihe Liu, Haoran Wang, Chenghao Yan, Bin Wu	(参考訳) 映像ベースの人物認識は、人物がブロックされぼやけられ、撮影角度が変化するため困難である。以前の研究では常に静止画の人物認識に焦点が当てられ、ビデオフレーム間の類似性と連続性を無視していた。上記の課題に対処するために,顔の特徴を集約し,映像中の人物を特定するためのマルチモーダル情報を含む,ビデオベースの人物認識のための新しいフレーム集約・マルチモーダルフュージョン(FAMF)フレームワークを提案する。フレームアグリゲーションのために,任意の数の特徴を入力として,特徴品質に基づいて固定長アグリゲーションを演算する,netvlad( attentionvlad)に基づく新しい学習可能な層を提案する。本稿では,NetVLADにアテンション機構を導入することで,低品質フレームの影響を効果的に低減できることを示す。ビデオのマルチモデル情報について,多層マルチモーダルアテンション(MLMA)モジュールを提案する。 iQIYI-VID-2019データセットの実験結果から,我々のフレームワークは他の最先端手法よりも優れた性能を示した。 Video-based person recognition is challenging due to persons being blocked and blurred, and the variation of shooting angle. Previous research always focused on person recognition on still images, ignoring similarity and continuity between video frames. To tackle the challenges above, we propose a novel Frame Aggregation and Multi-Modal Fusion (FAMF) framework for video-based person recognition, which aggregates face features and incorporates them with multi-modal information to identify persons in videos. For frame aggregation, we propose a novel trainable layer based on NetVLAD (named AttentionVLAD), which takes arbitrary number of features as input and computes a fixed-length aggregation feature based on feature quality. We show that introducing an attention mechanism to NetVLAD can effectively decrease the impact of low-quality frames. For the multi-model information of videos, we propose a Multi-Layer Multi-Modal Attention (MLMA) module to learn the correlation of multi-modality by adaptively updating Gram matrix. Experimental results on iQIYI-VID-2019 dataset show that our framework outperforms other state-of-the-art methods.	翻訳日:2022-10-05 22:51:37 公開日:2020-12-30
# FTBNN: 1ビットCNNの非線形性を再考する FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond ( http://arxiv.org/abs/2010.09294v4 ) ライセンス: Link先を確認	Zhuo Su, Linpu Fang, Deke Guo, Dewen Hu, Matti Pietikäinen, Li Liu	(参考訳) 重みとアクティベーションの両方を1ビットにバイナライズするバイナリニューラルネットワーク(BNN)は、計算の大幅な高速化とメモリフットプリントの大幅な削減という利点がリソース制約のあるデバイスの開発に適しているため、近年広く研究されている。 BNN構造を訓練するために量子化誤差を低減しようとする従来の手法とは対照的に、我々は、二値化された畳み込み処理はそのような誤差を最小化する目標に向かうほど線形性が増大し、それによってBNNの識別能力が損なわれると主張する。本稿では,その矛盾を解消するために,適切な非線形モジュールを再検討し,チューニングし,精度とトレーニング効率の観点から大規模ImageNetデータセットで最先端の性能を達成する強力なベースラインを実現する。さらに,提案するBNNモデルは,精度を損なうことなく,効率的なバイナリ操作をより有効に利用することにより,さらに圧縮できる余地が大きいことが判明した。さらに、グループ実行の助けを借りて、BNNモデルの限られた容量を増やすこともできる。これらの知見に基づいて,計算コストがより低い場合でも,ベースラインのtop-1精度をさらに4～5%改善することができる。コードは https://github.com/zhuogege1943/ftbnn で公開します。 Binary neural networks (BNNs), where both weights and activations are binarized into 1 bit, have been widely studied in recent years due to its great benefit of highly accelerated computation and substantially reduced memory footprint that appeal to the development of resource constrained devices. In contrast to previous methods tending to reduce the quantization error for training BNN structures, we argue that the binarized convolution process owns an increasing linearity towards the target of minimizing such error, which in turn hampers BNN's discriminative ability. In this paper, we re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance on the large-scale ImageNet dataset in terms of accuracy and training efficiency. To go further, we find that the proposed BNN model still has much potential to be compressed by making a better use of the efficient binary operations, without losing accuracy. In addition, the limited capacity of the BNN model can also be increased with the help of group execution. Based on these insights, we are able to improve the baseline with an additional 4~5% top-1 accuracy gain even with less computational cost. 
Our code will be made public at https://github.com/zhuogege1943/ftbnn.	翻訳日:2022-10-05 22:09:28 公開日:2020-12-30
# Emformer:低レイテンシストリーミング音声認識のための効率的なメモリトランスフォーマーに基づく音響モデル Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition ( http://arxiv.org/abs/2010.10759v4 ) ライセンス: Link先を確認	Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer	(参考訳) 本稿では低遅延ストリーミング音声認識のための効率的なメモリトランスフォーマーEmformerを提案する。 Emformerでは、長期履歴コンテキストを拡張メモリバンクに蒸留することで、自己注意の計算複雑性を低減する。キャッシュ機構は、左コンテキストに対する自己注意のキーと値の計算を省く。 Emformerは、低レイテンシモデルをサポートするために、トレーニングに並列化ブロック処理を適用する。ベンチマークのLibriSpeechデータに対して実験を行う。平均遅延960 msでは、Emformerのtest-cleanでのWERは$2.50\%$、test-otherでは$5.62\%$となる。強力なベースラインである拡張メモリトランスフォーマー(AM-TRF)と比較すると、Emformerは学習を$4.6$倍高速化し、デコード時のリアルタイムファクター(RTF)を相対的に$18\%$削減し、WERをtest-cleanで$17\%$、test-otherで$9\%$相対的に削減する。平均遅延80 msの低遅延シナリオでは、Emformerのtest-cleanでのWERは$3.01\%$、test-otherでは$7.09\%$である。同じ遅延とモデルサイズのLSTMベースラインと比較すると、Emformerはtest-cleanとtest-otherでそれぞれ$9\%$と$16\%$の相対的なWER削減を達成する。 This paper proposes an efficient memory transformer Emformer for low latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce self-attention's computation complexity. A cache mechanism saves the computation for the key and value in self-attention for the left context. Emformer applies a parallelized block processing in training to support low latency models. We carry out experiments on benchmark LibriSpeech data. Under average latency of 960 ms, Emformer gets WER $2.50\%$ on test-clean and $5.62\%$ on test-other. Comparing with a strong baseline augmented memory transformer (AM-TRF), Emformer gets $4.6$ folds training speedup and $18\%$ relative real-time factor (RTF) reduction in decoding with relative WER reduction $17\%$ on test-clean and $9\%$ on test-other. For a low latency scenario with an average latency of 80 ms, Emformer achieves WER $3.01\%$ on test-clean and $7.09\%$ on test-other. 
Comparing with the LSTM baseline with the same latency and model size, Emformer gets relative WER reduction $9\%$ and $16\%$ on test-clean and test-other, respectively.	翻訳日:2022-10-04 23:14:54 公開日:2020-12-30
# 自然言語における構成性と構造依存のモデル化 Modelling Compositionality and Structure Dependence in Natural Language ( http://arxiv.org/abs/2012.02038v2 ) ライセンス: Link先を確認	Karthikeya Ramesh Kaushik, Andrea E. Martin	(参考訳) 人間は既知の宇宙で最も洗練された計算機械を持っている。豊かな記述力の言語を理解し、驚くべき明快さで同じ環境でコミュニケーションすることができる。自然言語に興味を持つ多くのコントリビュータの2つ – 構成性と構造依存の性質 – は十分に文書化されており、興味深いモデリング質問を行うための広大なスペースを提供する。これらの疑問に答え始める最初のステップは、形式的な言葉理論を基礎づけることである。言語学と集合論に基づいて、これらの概念の形式化がこの論文の前半で述べられている。私たちは、言語を処理する認知システムが、構造的に定義されたドメインに依存する時間ベースのインクリメンタルな操作など、特定の機能的制約を持つ必要があることを目にします。このフォーマルな設定を分析した結果の観察は、モデリング演習の一環として検討される。単語埋め込み技術の進歩により、リレーショナルラーニングのモデルはカスタムデータセットでシミュレートされ、最初のセクションで記述された制約のいくつかを満たす時間ベースのロールフィラー結合メカニズムがいかに満たされるかを示す。モデルが構造をマッピングする能力とシンボリック・コネクショニストアーキテクチャは、認知的に妥当な実装を可能にします。形式化とシミュレーションは、言語理論によって課される制約を認識し、これらの制約を実現するための関係学習の認知モデルによって提示される機会を探求する試みである。 Human beings possess the most sophisticated computational machinery in the known universe. We can understand language of rich descriptive power, and communicate in the same environment with astonishing clarity. Two of the many contributors to the interest in natural language - the properties of Compositionality and Structure Dependence, are well documented, and offer a vast space to ask interesting modelling questions. The first step to begin answering these questions is to ground verbal theory in formal terms. Drawing on linguistics and set theory, a formalisation of these ideas is presented in the first half of this thesis. We see how cognitive systems that process language need to have certain functional constraints, viz. time based, incremental operations that rely on a structurally defined domain. The observations that result from analysing this formal setup are examined as part of a modelling exercise. Using the advances of word embedding techniques, a model of relational learning is simulated with a custom dataset to demonstrate how a time based role-filler binding mechanism satisfies some of the constraints described in the first section. 
The model's ability to map structure, along with its symbolic-connectionist architecture makes for a cognitively plausible implementation. The formalisation and simulation are together an attempt to recognise the constraints imposed by linguistic theory, and explore the opportunities presented by a cognitive model of relation learning to realise these constraints.	翻訳日:2022-09-22 09:08:32 公開日:2020-12-30
# deep gravity:深層ニューラルネットワークと地理情報を用いたモビリティフロー生成の促進 Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information ( http://arxiv.org/abs/2012.00489v2 ) ライセンス: Link先を確認	Filippo Simini, Gianni Barlacchi, Massimiliano Luca, Luca Pappalardo	(参考訳) 都市内および都市間における個人の移動は、客観的・主観的幸福、イノベーションの拡散、流行の広がり、環境の質といった、我々の社会の重要な側面に影響を与えます。このため, 位置の特性を考慮し, 実際の流れに関する情報を一切含まない, 一連の地理的位置間の流れを生成することによる, フロー生成の困難な問題に対する関心が高まっている。フロー生成に対する既存の解決策は、主に重力モデルや放射モデルのような機械的なアプローチに基づいており、過度な拡散や土地利用や輸送網といった重要な変数を無視し、これらの変数間の非線形関係を記述できない。本稿では,多機能深層重力モデル(mfdg)をフロー生成の有効な解として提案する。一方、mfdgモデルは、自発的地理情報データ(openstreetmap)から抽出された多数の変数(例えば、土地利用と道路網の特徴、輸送、食品、健康施設)を利用する。一方,本モデルは深層ニューラルネットワークを用いて,これらの変数間の複雑な非線形関係を記述する。イングランドにおける通勤流に着目した実験により,mfdgモデルは,深層ニューラルネットワークを使用しない機械モデルや地理自発的データを活用しない機械モデルよりも高い性能(人口密度の高い領域では最大250\%)を達成していることが示された。本研究では,時空間データを扱う深層学習コミュニティのための新しい課題であるフロー生成問題の正確な定義を提案し,現状の統計モデルよりもはるかに優れた深層ニューラルネットワークモデルを提案する。 The movements of individuals within and among cities influence key aspects of our society, such as the objective and subjective well-being, the diffusion of innovations, the spreading of epidemics, and the quality of the environment. For this reason, there is increasing interest around the challenging problem of flow generation, which consists in generating the flows between a set of geographic locations, given the characteristics of the locations and without any information about the real flows. Existing solutions to flow generation are mainly based on mechanistic approaches, such as the gravity model and the radiation model, which suffer from underfitting and overdispersion, neglect important variables such as land use and the transportation network, and cannot describe non-linear relationships between these variables. In this paper, we propose the Multi-Feature Deep Gravity (MFDG) model as an effective solution to flow generation. 
On the one hand, the MFDG model exploits a large number of variables (e.g., characteristics of land use and the road network; transport, food, and health facilities) extracted from voluntary geographic information data (OpenStreetMap). On the other hand, our model exploits deep neural networks to describe complex non-linear relationships between those variables. Our experiments, conducted on commuting flows in England, show that the MFDG model achieves a significant increase in the performance (up to 250\% for highly populated areas) than mechanistic models that do not use deep neural networks, or that do not exploit geographic voluntary data. Our work presents a precise definition of the flow generation problem, which is a novel task for the deep learning community working with spatio-temporal data, and proposes a deep neural network model that significantly outperforms current state-of-the-art statistical models.	翻訳日:2021-05-30 19:30:26 公開日:2020-12-30
# スペクトル分布認識画像生成 Spectral Distribution Aware Image Generation ( http://arxiv.org/abs/2012.03110v2 ) ライセンス: Link先を確認	Steffen Jung and Margret Keuper	(参考訳) フォトリアリスティック画像の深部生成モデルの最近の進歩は、高品質な視覚結果をもたらしている。このようなモデルは、人間の目で実際の画像と容易に区別できないような、所定のトレーニング分布からデータを生成することを学習する。しかし、このような偽画像の検出に関する最近の研究は、それらの周波数スペクトルのアーティファクトが実際に容易に識別できることを指摘している。本稿では,スペクトル判別器を用いて実データの周波数分布に応じて画像を生成することを提案する。提案する判別器は軽量でモジュール性があり、一般的なgan損失が異なる安定して動作する。この結果から,実際の周波数スペクトルによる画像生成がより容易であり,検出が困難であることが示唆された。 Recent advances in deep generative models for photo-realistic images have led to high quality visual results. Such models learn to generate data from a given training distribution such that generated images can not be easily distinguished from real images by the human eye. Yet, recent work on the detection of such fake images pointed out that they are actually easily distinguishable by artifacts in their frequency spectra. In this paper, we propose to generate images according to the frequency distribution of the real data by employing a spectral discriminator. The proposed discriminator is lightweight, modular and works stably with different commonly used GAN losses. We show that the resulting models can better generate images with realistic frequency spectra, which are thus harder to detect by this cue.	翻訳日:2021-05-22 12:04:58 公開日:2020-12-30
# メモリゲートリカレントネットワーク Memory-Gated Recurrent Networks ( http://arxiv.org/abs/2012.13121v2 ) ライセンス: Link先を確認	Yaquan Zhang, Qi Wu, Nanbo Peng, Min Dai, Jing Zhang, Hu Wang	(参考訳) 多変量連続学習の本質は、データの依存関係を抽出する方法にある。集中治療単位の時間毎医療記録や多周波数の音声時系列といったこれらのデータセットは、個々の構成要素(マージナルメモリ)に強い連続依存を示すだけでなく、横断的な依存関係(ジョイントメモリ)において不要な記憶を示すことが多い。データ生成プロセスの根底にある関節分布の進化における多変量的複雑さのため、我々はデータ駆動型アプローチを採用し、メモリゲート型リカレントネットワーク(mGRN)と呼ばれる新しいリカレントネットワークアーキテクチャを構築し、ゲートは境界メモリとジョイントメモリという2つの異なる種類の記憶を明示的に制御する。様々な公開データセットに対する包括的シミュレーション研究と実証実験を組み合わせることで,提案したmGRNアーキテクチャは,多変量時系列を対象とする最先端アーキテクチャを一貫して上回ることを示す。 The essence of multivariate sequential learning is all about how to extract dependencies in data. These data sets, such as hourly medical records in intensive care units and multi-frequency phonetic time series, often time exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). Because of the multivariate complexity in the evolution of the joint distribution that underlies the data generating process, we take a data-driven approach and construct a novel recurrent network architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory. Through a combination of comprehensive simulation studies and empirical experiments on a range of public datasets, we show that our proposed mGRN architecture consistently outperforms state-of-the-art architectures targeting multivariate time series.	翻訳日:2021-04-25 08:06:58 公開日:2020-12-30
# 非ランバート測光ステレオにおけるフレーム間およびフレーム内表現の学習 Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo ( http://arxiv.org/abs/2012.13720v2 ) ライセンス: Link先を確認	Yanlong Cao, Binjie Ding, Zewei He, Jiangxin Yang, Jingxi Chen, Yanpeng Cao and Xin Li	(参考訳) 本稿では,2段階の畳み込みニューラルネットワーク(CNN)アーキテクチャを構築し,異なる光方向で撮像された画像の任意の数に基づいてフレーム間およびフレーム間表現を構築し,非ランベルト物体の正確な正規推定を行う。光度ステレオ問題に対して,フレーム間およびフレーム間特徴抽出モジュールを配置するための最適スキームを特定するために,多数のネットワーク設計手法を実験的に検討した。さらに, フレーム内空間畳み込みにおいて, 不正な背景領域からの干渉を除去し, 暗黒材料や鋳型シャドウを用いた表面の正常推定精度を効果的に向上させるため, 容易に得られる被写体マスクを提案する。提案する2段式光計測ステレオcnnモデル(mt-ps-cnn)は,精度と効率の両面で最先端の光計測ステレオ技術に好適である。さらに, 複素幾何の非ランベルト対象に対して, 高精度でリッチな面正規細部を予測でき, 希薄な照明分布と密集した照明分布の両方で, 安定して入力を行うことができる。 In this paper, we build a two-stage Convolutional Neural Network (CNN) architecture to construct inter- and intra-frame representations based on an arbitrary number of images captured under different light directions, performing accurate normal estimation of non-Lambertian objects. We experimentally investigate numerous network design alternatives for identifying the optimal scheme to deploy inter-frame and intra-frame feature extraction modules for the photometric stereo problem. Moreover, we propose to utilize the easily obtained object mask for eliminating adverse interference from invalid background regions in intra-frame spatial convolutions, thus effectively improve the accuracy of normal estimation for surfaces made of dark materials or with cast shadows. Experimental results demonstrate that proposed masked two-stage photometric stereo CNN model (MT-PS-CNN) performs favorably against state-of-the-art photometric stereo techniques in terms of both accuracy and efficiency. In addition, the proposed method is capable of predicting accurate and rich surface normal details for non-Lambertian objects of complex geometry and performs stably given inputs captured in both sparse and dense lighting distributions.	翻訳日:2021-04-25 01:10:07 公開日:2020-12-30
# (参考訳) 複合リスクは、教師なしドメイン適応アプローチのパフォーマンスにどのように影響するか? How does the Combined Risk Affect the Performance of Unsupervised Domain Adaptation Approaches? ( http://arxiv.org/abs/2101.01104v1 ) ライセンス: CC BY 4.0	Li Zhong, Zhen Fang, Feng Liu, Jie Lu, Bo Yuan, Guangquan Zhang	(参考訳) unsupervised domain adaptation (uda)は、ソースドメインからのラベル付きサンプルとターゲットドメインからのラベルなしサンプルでターゲット分類器をトレーニングすることを目的としている。古典的なUDA学習バウンダリは、ターゲットのリスクは、ソースリスク、分散の相違、複合リスクの3つの項によって上限づけられていることを示している。組み合わせリスクが小さな固定値であるという仮定に基づいて、この境界値に基づく手法は、ソースリスクと分布不一致の推定を最小化することでターゲット分類器を訓練する。しかし、両方の推定器を最小化すると、複合リスクが増大し、ターゲットのリスクは制御不能になる。したがって、組み合わせたリスクを制御できなければ、ターゲット分類器は理想的な性能を達成できない。複合リスクを制御するために、重要な課題は、ターゲットドメイン内のラベル付きサンプルの有効性に根ざしている。この課題に対処するため,E-MixNetという手法を提案する。 e-mixnetは、ラベル付きソースサンプルと疑似ラベル付きターゲットサンプルに汎用的なビビナル分布である強化ミックスアップを使用して、複合リスクのプロキシを計算する。実験により、プロキシはソースリスクと分散の相違を最小化する際に、結合リスクの増加を効果的に抑制できることが示された。さらに,4つのUDA手法の損失関数に複合リスクのプロキシを付加すると,それらの性能も向上することを示した。 Unsupervised domain adaptation (UDA) aims to train a target classifier with labeled samples from the source domain and unlabeled samples from the target domain. Classical UDA learning bounds show that target risk is upper bounded by three terms: source risk, distribution discrepancy, and combined risk. Based on the assumption that the combined risk is a small fixed value, methods based on this bound train a target classifier by only minimizing estimators of the source risk and the distribution discrepancy. However, the combined risk may increase when minimizing both estimators, which makes the target risk uncontrollable. Hence the target classifier cannot achieve ideal performance if we fail to control the combined risk. To control the combined risk, the key challenge takes root in the unavailability of the labeled samples in the target domain. To address this key challenge, we propose a method named E-MixNet. 
E-MixNet employs enhanced mixup, a generic vicinal distribution, on the labeled source samples and pseudo-labeled target samples to calculate a proxy of the combined risk. Experiments show that the proxy can effectively curb the increase of the combined risk when minimizing the source risk and distribution discrepancy. Furthermore, we show that adding the proxy of the combined risk to the loss functions of four representative UDA methods also improves their performance.	翻訳日:2021-04-18 19:46:12 公開日:2020-12-30
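The abstract above builds its risk proxy from mixup applied to labeled source samples and pseudo-labeled target samples. As a rough illustration only, the sketch below shows plain mixup (a convex blend of two samples and their labels), not the paper's "enhanced mixup", whose details the abstract does not give. The function name `mixup`, the `alpha` value, and the toy data are all assumptions made for this example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight.

    This is standard mixup, used here as a stand-in; it is not the
    "enhanced mixup" of E-MixNet.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Hypothetical toy data: one labeled source sample and one
# pseudo-labeled target sample, two classes.
source_x, source_y = [1.0, 0.0, 2.0], [1.0, 0.0]
target_x, target_pseudo_y = [0.0, 4.0, 2.0], [0.0, 1.0]

mixed_x, mixed_y, lam = mixup(
    source_x, source_y, target_x, target_pseudo_y, rng=random.Random(0)
)
print(lam, mixed_x, mixed_y)
```

The mixed label stays a valid probability vector because both inputs are one-hot and the weights sum to one.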
# (参考訳) DeepSphere: グラフベースの球面CNN DeepSphere: a graph-based spherical CNN ( http://arxiv.org/abs/2012.15000v1 ) ライセンス: CC BY 4.0	Micha\"el Defferrard, Martino Milani, Fr\'ed\'erick Gusset, Nathana\"el Perraudin	(参考訳) 球形ニューラルネットワークの畳み込みを設計するには、効率と回転同分散の微妙なトレードオフが必要である。サンプル球のグラフ表現に基づくDeepSphereは、これらの2つのデシダータの間に制御可能なバランスを打つ。この貢献は2つある。まず、各頂点および近傍の数に関して、基礎となるグラフによる等式の影響について理論的および実証的に検討する。次に,DeepSphereの問題点について検討した。実験は最先端のパフォーマンスを示し、この定式化の効率性と柔軟性を示す。おそらく意外なことに、以前の研究と比較すると、異方性フィルターは不必要に支払う価格になるかもしれない。私たちのコードはhttps://github.com/deepsphereで利用可能です。 Designing a convolution for a spherical neural network requires a delicate tradeoff between efficiency and rotation equivariance. DeepSphere, a method based on a graph representation of the sampled sphere, strikes a controllable balance between these two desiderata. This contribution is twofold. First, we study both theoretically and empirically how equivariance is affected by the underlying graph with respect to the number of vertices and neighbors. Second, we evaluate DeepSphere on relevant problems. Experiments show state-of-the-art performance and demonstrates the efficiency and flexibility of this formulation. Perhaps surprisingly, comparison with previous work suggests that anisotropic filters might be an unnecessary price to pay. Our code is available at https://github.com/deepsphere	翻訳日:2021-04-18 19:10:40 公開日:2020-12-30
# (参考訳) openvidial:ビジュアルコンテキストを備えた大規模オープンドメイン対話データセット OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts ( http://arxiv.org/abs/2012.15015v1 ) ライセンス: CC BY 4.0	Yuxian Meng, Shuhe Wang, Qinghong Han, Xiaofei Sun, Fei Wu, Rui Yan and Jiwei Li	(参考訳) 人間が会話するとき、話者が次に何を言うかは、彼が見るものによって大きく異なる。残念ながら、既存の対話モデルは、先行するテキストコンテキストのみに基づいて対話発話を生成しており、視覚的コンテキストはほとんど考慮されない。これは、視覚的コンテキストと組み合わせた発話を伴う大規模マルチモジュール対話データセットがないためである。本稿では,大規模多モジュール対話データセットである {\bf openvidial} をリリースする。対話のターンと視覚的コンテキストは、映画やテレビシリーズから抽出され、各対話のターンは、それが行われる対応する視覚的コンテキストとペアリングされる。 OpenViDialには、合計で1100万回の対話があり、画像に格納されている視覚的コンテキストは1100万回である。このデータセットに基づいて,CNNから抽出した粗粒度画像特徴から,より高速なR-CNNから抽出した細粒度オブジェクト特徴まで,テキストと視覚の両方のコンテキストを活用するエンコーダ・デコーダモデル群を提案する。視覚情報は対話生成の質を著しく向上させ,対話学習のためのマルチモーダル機能の統合の必要性を検証する。我々の研究は、大規模マルチモーダル対話学習への重要な一歩である。 When humans converse, what a speaker will say next significantly depends on what he sees. Unfortunately, existing dialogue models generate dialogue utterances only based on preceding textual contexts, and visual contexts are rarely considered. This is due to a lack of a large-scale multi-module dialogue dataset with utterances paired with visual contexts. In this paper, we release {\bf OpenViDial}, a large-scale multi-module dialogue dataset. The dialogue turns and visual contexts are extracted from movies and TV series, where each dialogue turn is paired with the corresponding visual context in which it takes place. OpenViDial contains a total number of 1.1 million dialogue turns, and thus 1.1 million visual contexts stored in images. Based on this dataset, we propose a family of encoder-decoder models leveraging both textual and visual contexts, from coarse-grained image features extracted from CNNs to fine-grained object features extracted from Faster R-CNNs. We observe that visual information significantly improves dialogue generation qualities, verifying the necessity of integrating multi-modal features for dialogue learning. 
Our work marks an important step towards large-scale multi-modal dialogue learning.	翻訳日:2021-04-18 18:32:48 公開日:2020-12-30
# (参考訳) デヴァナガリ詩の言語識別 Language Identification of Devanagari Poems ( http://arxiv.org/abs/2012.15023v1 ) ライセンス: CC BY-SA 4.0	Priyankit Acharya, Aditya Ku. Pathak, Rakesh Ch. Balabantaray, and Anil Ku. Singh	(参考訳) 言語識別は、いくつかのテキスト処理パイプラインにおいて非常に重要な部分である。この分野では広範な研究が行われている。本稿では,インドの10のデヴァナガリ系言語(Angika, Awadhi, Braj, Bhojpuri, Chhattisgarhi, Garhwali, Haryanvi, Hindi, Magahi, Maithili)を対象とする詩分析タスクのための詩の自動言語識別手法を提案する。長さの異なる詩のコーパスを照合し,語彙レベルで10言語間の詩の類似性を検討した。最後に、教師付き機械学習とディープラーニング技術に基づく各種言語識別システムを適用し、評価する。 Language identification is a very important part of several text processing pipelines. Extensive research has been done in this field. This paper proposes a procedure for automatic language identification of poems for a poem analysis task covering 10 Devanagari-based languages of India, i.e., Angika, Awadhi, Braj, Bhojpuri, Chhattisgarhi, Garhwali, Haryanvi, Hindi, Magahi, and Maithili. We collated corpora of poems of varying length and studied the similarity of poems among the 10 languages at the lexical level. Finally, various language identification systems based on supervised machine learning and deep learning techniques are applied and evaluated.	翻訳日:2021-04-18 17:37:38 公開日:2020-12-30
# (参考訳) 糖尿病管理システムを用いた眼底および舌デジタル画像処理のための機械学習技術のレビュー A Review of Machine Learning Techniques for Applied Eye Fundus and Tongue Digital Image Processing with Diabetes Management System ( http://arxiv.org/abs/2012.15025v1 ) ライセンス: CC BY 4.0	Wei Xiang Lim, Zhiyuan Chen, Amr Ahmed, Tissa Chandesa and Iman Liao	(参考訳) 糖尿病は世界的な流行であり、警戒速度で増加している。国際糖尿病連盟(idf)は、世界規模の糖尿病患者数は48%増加し、4億2500万人(2017年)から6億2900万人(2045年)と予測している。さらに糖尿病は何百万人もの死者を招き、その数は急増している。そこで本稿では糖尿病の背景とその合併症について述べる。また,眼底および舌のデジタル画像を用いた糖尿病管理システムにおける革新的応用と過去の研究について検討した。市販の糖尿病管理システムによる既存の眼底および舌デジタル画像処理と,過去の文献による最先端の機械学習技術について概説した。本研究の目的は,糖尿病研究の概要と,この世界的な流行を解決するための新しい機械学習技術を提案することである。 Diabetes is a global epidemic and it is increasing at an alarming rate. The International Diabetes Federation (IDF) projected that the total number of people with diabetes globally may increase by 48%, from 425 million (year 2017) to 629 million (year 2045). Moreover, diabetes had caused millions of deaths and the number is increasing drastically. Therefore, this paper addresses the background of diabetes and its complications. In addition, this paper investigates innovative applications and past researches in the areas of diabetes management system with applied eye fundus and tongue digital images. Different types of existing applied eye fundus and tongue digital image processing with diabetes management systems in the market and state-of-the-art machine learning techniques from previous literature have been reviewed. The implication of this paper is to have an overview in diabetic research and what new machine learning techniques can be proposed in solving this global epidemic.	翻訳日:2021-04-18 17:29:44 公開日:2020-12-30
# (参考訳) LiDARセンサデータを用いたオンラインSVMによるインクリメンタル学習 Incremental learning with online SVMs on LiDAR sensory data ( http://arxiv.org/abs/2101.01667v1 ) ライセンス: CC BY 4.0	Le Dinh Van Khoa and Zhiyuan Chen	(参考訳) パイプラインの送電システムは、エネルギー産業において長い間存在してきた成長の側面の1つである。サービスを維持するためのパイプ内探索のコストは常に、この業界で注目を集めている。通常の探査方法(例) 磁束漏れと渦電流)は、各パイプのマイルストーンに固定されたセンサーを確立するか、パイプ内を移動するセンサーを運ぶ。大量のセンサーが備わっているため、メンテナンスプロセスは非常に困難である。解決策の1つは、感覚データ分析のための機械学習技術を実装することである。 SVMはカーネルのトリックでこの問題を解決できるが、カーネルの計算はデータサイズにも依存する。サポートベクトルの数が本当に大きくなると、プロセスが急速に誇張されるためです。特に、LiDARは極めて速い速度でスピンし、入力データの流れは最終的に大きな膨張をもたらす可能性がある。提案手法では,各サンプルを瞬時に学習し,サポートするカーネルを同時に計算する。本研究では,lidarセンサデータのみを扱うオンラインサポートベクターマシン(svms)を用いたインクリメンタル学習手法を提案する。 The pipelines transmission system is one of the growing aspects, which has existed for a long time in the energy industry. The cost of in-pipe exploration for maintaining service always draws lots of attention in this industry. Normally exploration methods (e.g. Magnetic flux leakage and eddy current) will establish the sensors stationary for each pipe milestone or carry sensors to travel inside the pipe. It makes the maintenance process very difficult due to the massive amount of sensors. One of the solutions is to implement machine learning techniques for the analysis of sensory data. Although SVMs can resolve this issue with kernel trick, the problem is that computing the kernel depends on the data size too. It is because the process can be exaggerated quickly if the number of support vectors becomes really large. Particularly LiDAR spins with an extremely rapid rate and the flow of input data might eventually lead to massive expansion. In our proposed approach, each sample is learned in an instant way and the supported kernel is computed simultaneously. In this research, incremental learning approach with online support vector machines (SVMs) is presented, which aims to deal with LiDAR sensory data only.	翻訳日:2021-04-18 17:22:50 公開日:2020-12-30
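The abstract above describes learning each incoming sample immediately rather than retraining on the full data set. The sketch below illustrates that idea with a Pegasos-style stochastic sub-gradient update for a linear SVM. This is a deliberately simpler stand-in: the paper works with kernel SVMs, and its actual update rule is not given in the abstract. The class name `OnlineLinearSVM`, the regularization value, the omission of a bias term, and the toy stream are assumptions for illustration.

```python
class OnlineLinearSVM:
    """Linear SVM trained one sample at a time (Pegasos-style, no bias term)."""

    def __init__(self, n_features, lam=0.01):
        self.w = [0.0] * n_features  # weight vector
        self.lam = lam               # L2 regularization strength
        self.t = 0                   # number of samples seen so far

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def partial_fit(self, x, y):
        """Update the model with a single sample; y must be -1 or +1."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)   # decaying step size
        margin = y * self.decision(x)     # margin under the current weights
        # Regularization step: shrink the weights.
        self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
        # Hinge-loss step: only samples inside the margin move the weights.
        if margin < 1.0:
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, x)]

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1


# Hypothetical 2-D stream standing in for features derived from sensor readings.
stream = [
    ([2.0, 1.5], 1),
    ([-1.5, -2.0], -1),
    ([1.0, 2.5], 1),
    ([-2.5, -1.0], -1),
] * 5

model = OnlineLinearSVM(n_features=2)
for features, label in stream:
    model.partial_fit(features, label)

print(model.predict([3.0, 3.0]), model.predict([-3.0, -3.0]))
```

Each update costs time proportional to the number of features, independent of how many samples have already been seen. A kernelized variant would instead have to store support vectors, which is the growth problem the abstract points to.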
# (参考訳) 組立予測モデルによる石油・ガス産業の設備故障解析 Equipment Failure Analysis for Oil and Gas Industry with an Ensemble Predictive Model ( http://arxiv.org/abs/2012.15030v1 ) ライセンス: CC BY 4.0	Chen ZhiYuan, Olugbenro. O. Selere and Nicholas Lu Chee Seng	(参考訳) 本稿では,smo(sequential minimal optimization)トレーニングアルゴリズムを用いた支援ベクトル機械(svm)分類器の分類精度の向上を目的として,油・ガス機器データから故障や正常インスタンスを適切に分類する。近年の故障解析では,SMOトレーニングアルゴリズムを実装せずにSVM技術を用いているが,本研究では,SMOトレーニングアルゴリズムを用いた場合,提案手法の方が優れた性能が得られることを示す。さらに、SVM分類器の性能を向上させるために、ハイブリッドルールベースとニューラルネットワーク分類器であるアンサンブルアプローチを実装した(SMOトレーニングアルゴリズムを用いて)。最適化研究は、不均衡データセットを扱う際の分類器の性能低下の結果である。選択されたベストパフォーマンス分類器は、不均衡なデータの問題を処理できる効率的なアンサンブル予測モデルを作成するスタックングアンサンブル法を用いて、SVM分類器(SMOトレーニングアルゴリズム)と組み合わせる。この予測モデルの分類性能は、SMOトレーニングアルゴリズムおよび他の多くの従来の分類器によるSVMよりもかなり優れている。 This paper aims at improving the classification accuracy of a Support Vector Machine (SVM) classifier with Sequential Minimal Optimization (SMO) training algorithm in order to properly classify failure and normal instances from oil and gas equipment data. Recent applications of failure analysis have made use of the SVM technique without implementing SMO training algorithm, while in our study we show that the proposed solution can perform much better when using the SMO training algorithm. Furthermore, we implement the ensemble approach, which is a hybrid rule based and neural network classifier to improve the performance of the SVM classifier (with SMO training algorithm). The optimization study is as a result of the underperformance of the classifier when dealing with imbalanced dataset. The selected best performing classifiers are combined together with SVM classifier (with SMO training algorithm) by using the stacking ensemble method which is to create an efficient ensemble predictive model that can handle the issue of imbalanced data. The classification performance of this predictive model is considerably better than the SVM with and without SMO training algorithm and many other conventional classifiers.	
翻訳日:2021-04-18 17:15:37 公開日:2020-12-30
# (参考訳) サポートベクターマシンによる故障の教師なしリアルタイム予測 Unsupervised Real Time Prediction of Faults Using the Support Vector Machine ( http://arxiv.org/abs/2012.15032v1 ) ライセンス: CC BY 4.0	Zhiyuan Chen, Isa Dino and Nik Ahmad Akram	(参考訳) 本稿では,smo(sequential minimal optimization)トレーニングアルゴリズムを用いた支援ベクトル機械(svm)分類器の分類精度の向上を目的として,油・ガス機器データから故障や正常インスタンスを適切に分類する。近年の故障解析では,SMOトレーニングアルゴリズムを実装せずにSVM技術を用いているが,本研究では,SMOトレーニングアルゴリズムを用いた場合,提案手法の方が優れた性能が得られることを示す。さらに、SVM分類器の性能を向上させるために、ハイブリッドルールベースとニューラルネットワーク分類器であるアンサンブルアプローチを実装した(SMOトレーニングアルゴリズムを用いて)。最適化研究は、不均衡データセットを扱う際の分類器の性能低下の結果である。選択されたベストパフォーマンス分類器は、不均衡なデータの問題を処理できる効率的なアンサンブル予測モデルを作成するスタックングアンサンブル法を用いて、SVM分類器(SMOトレーニングアルゴリズム)と組み合わせる。この予測モデルの分類性能は、SMOトレーニングアルゴリズムおよび他の多くの従来の分類器によるSVMよりもかなり優れている。 This paper aims at improving the classification accuracy of a Support Vector Machine (SVM) classifier with Sequential Minimal Optimization (SMO) training algorithm in order to properly classify failure and normal instances from oil and gas equipment data. Recent applications of failure analysis have made use of the SVM technique without implementing SMO training algorithm, while in our study we show that the proposed solution can perform much better when using the SMO training algorithm. Furthermore, we implement the ensemble approach, which is a hybrid rule based and neural network classifier to improve the performance of the SVM classifier (with SMO training algorithm). The optimization study is as a result of the underperformance of the classifier when dealing with imbalanced dataset. The selected best performing classifiers are combined together with SVM classifier (with SMO training algorithm) by using the stacking ensemble method which is to create an efficient ensemble predictive model that can handle the issue of imbalanced data. The classification performance of this predictive model is considerably better than the SVM with and without SMO training algorithm and many other conventional classifiers.	翻訳日:2021-04-18 17:05:24 公開日:2020-12-30
# (参考訳) Spoken vs. の人間による評価オープンドメインQAのためのビジュアル説明 Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA ( http://arxiv.org/abs/2012.15075v1 ) ライセンス: CC BY-SA 4.0	Ana Valeria Gonzalez, Gagan Bansal, Angela Fan, Robin Jia, Yashar Mehdad and Srinivasan Iyer	(参考訳) オープンドメインQAシステム(ODQA)のユーザへの予測についての説明研究が盛んに行われているが,説明がユーザ信頼を高める程度の評価には至っていない。 ODQAは音声アシスタントにおいて最もユビキタスであるが、現在の研究はビジュアルディスプレイを用いた説明のみを評価し、他のモダリティに対する最もパフォーマンスの高い説明に関する結論を誤って外挿する可能性がある。これらの問題を緩和するために、odqaシステムの回答をいつ受け入れるかをユーザーが正確に判断するのに役立つ説明を計測するユーザー調査を行う。従来の作業とは異なり、説明モダリティ(例えば、音声またはビジュアルインターフェースを介してユーザと通信されるか、モダリティ間のコントラスト効果か)を制御する。その結果,得られた証拠文から導かれた説明は,モダリティにまたがる強いベースライン(信頼度)を上回ることができるが,実際にモダリティによって変化する最良の説明戦略であることがわかった。我々は,現在の説明に共通する障害事例を示し,説明のエンドツーエンド評価を強調し,デプロイと異なるプロキシモダリティで評価することを警告する。 While research on explaining predictions of open-domain QA systems (ODQA) to users is gaining momentum, most works have failed to evaluate the extent to which explanations improve user trust. While few works evaluate explanations using user studies, they employ settings that may deviate from the end-user's usage in-the-wild: ODQA is most ubiquitous in voice-assistants, yet current research only evaluates explanations using a visual display, and may erroneously extrapolate conclusions about the most performant explanations to other modalities. To alleviate these issues, we conduct user studies that measure whether explanations help users correctly decide when to accept or reject an ODQA system's answer. Unlike prior work, we control for explanation modality, e.g., whether they are communicated to users through a spoken or visual interface, and contrast effectiveness across modalities. Our results show that explanations derived from retrieved evidence passages can outperform strong baselines (calibrated confidence) across modalities but the best explanation strategy in fact changes with the modality. 
We show common failure cases of current explanations, emphasize end-to-end evaluation of explanations, and caution against evaluating them in proxy modalities that are different from deployment.	翻訳日:2021-04-18 16:38:28 公開日:2020-12-30
# (参考訳) sindhiのためのサブワード誘導ニューラルワードセグメンテーションモデル A Subword Guided Neural Word Segmentation Model for Sindhi ( http://arxiv.org/abs/2012.15079v1 ) ライセンス: CC BY 4.0	Wazir Ali, Jay Kumar, Zenglin Xu, Congjian Luo, Junyu Lu, Junming Shao, Rajesh Kumar, and Yazhou Ren	(参考訳) ディープニューラルネットワークは、自然言語処理(nlp)における手動特徴工学の負担を軽減するために、テキスト表現の学習に複数の処理層を用いる。このようなテキスト表現はラベルのないデータから特徴を抽出するために広く使われている。セグメンテーションという言葉は多くの言語にとって基本的かつ必然的な前提条件である。 Sindhiはリソース不足の言語であり、空間欠落、空間挿入の問題、セグメンテーションのためのラベル付きコーパスがないため、セグメンテーションは困難である。本稿では,Syndhi のための Subword Guided Neural Word Segmenter (SGNWS) を用いたラベル付きデータを用いた教師付き Sindhi Word Segmentation (SWS) について検討する。テキスト表現を学習するために,2方向長短項記憶(BiLSTM),自己注意機構,条件付きランダムフィールド(CRF)を活用する形態素レベルで単語情報をキャプチャするために,サブワード表現を繰り返しニューラルネットワークに組み込む。提案したSGNWSモデルは機能工学に頼らずに98.51%のF1値を達成する。実験の結果,既存のsindhi単語セグメンタよりも,提案モデルの利点が示された。 Deep neural networks employ multiple processing layers for learning text representations to alleviate the burden of manual feature engineering in Natural Language Processing (NLP). Such text representations are widely used to extract features from unlabeled data. The word segmentation is a fundamental and inevitable prerequisite for many languages. Sindhi is an under-resourced language, whose segmentation is challenging as it exhibits space omission, space insertion issues, and lacks the labeled corpus for segmentation. In this paper, we investigate supervised Sindhi Word Segmentation (SWS) using unlabeled data with a Subword Guided Neural Word Segmenter (SGNWS) for Sindhi. In order to learn text representations, we incorporate subword representations to recurrent neural architecture to capture word information at morphemic-level, which takes advantage of Bidirectional Long-Short Term Memory (BiLSTM), self-attention mechanism, and Conditional Random Field (CRF). Our proposed SGNWS model achieves an F1 value of 98.51% without relying on feature engineering. The empirical results demonstrate the benefits of the proposed model over the existing Sindhi word segmenters.	
翻訳日:2021-04-18 16:19:02 公開日:2020-12-30
# (参考訳) 機械学習予測の説明--運用プロセスへの適用のための必須ステップ Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes ( http://arxiv.org/abs/2012.15103v1 ) ライセンス: CC BY 4.0	Giorgio Visani, Federico Chesani, Enrico Bagli, Davide Capuzzo and Alessandro Poluzzi	(参考訳) 世界経済では、信用会社は金融業者としての活動を通じて、経済発展において中心的な役割を果たす。この重要なタスクにはいくつかの欠点があり、主に債務者が提供されたクレジットを返済できないリスクがある。したがって、信用リスクモデリング(crm)、すなわち債務者が債務を返済しない確率の評価が最重要役割を担っている。統計的なアプローチは長年にわたってうまく活用され、CRMの最もよく使われる方法となった。近年,CRMタスクに機械学習およびディープラーニング技術が適用され,予測品質と性能が著しく向上している。しかし、そのような手法は、通常、彼らが生み出すスコアについて信頼できる説明を与えない。その結果、多くの機械・深層学習技術は、例えばGDPRのような西洋諸国の規制に従わない。本稿では, LIME(Local Interpretable Model-Agnostic Explanations)技術を用いて, この分野における説明可能性問題に対処し, 実際の信用リスクデータセットへの採用を示し, 最終的にその健全性とタスクの採用とコンプライアンスを保証するために必要な改善について議論する。 In the global economy, credit companies play a central role in economic development, through their activity as money lenders. This important task comes with some drawbacks, mainly the risk of the debtors not being able to repay the provided credit. Therefore, Credit Risk Modelling (CRM), namely the evaluation of the probability that a debtor will not repay the due amount, plays a paramount role. Statistical approaches have been successfully exploited since long, becoming the most used methods for CRM. Recently, also machine and deep learning techniques have been applied to the CRM task, showing an important increase in prediction quality and performances. However, such techniques usually do not provide reliable explanations for the scores they come up with. As a consequence, many machine and deep learning techniques fail to comply with western countries' regulations such as, for example, GDPR. 
In this paper we suggest using the LIME (Local Interpretable Model-agnostic Explanations) technique to tackle the explainability problem in this field. We show its use on a real credit-risk dataset and finally discuss its soundness and the improvements needed to guarantee its adoption and compliance with the task.	翻訳日:2021-04-18 15:13:02 公開日:2020-12-30
# (参考訳) Dual-Camera Compressive Hyperspectral Imaging による高速ハイパースペクトル画像再生 Fast Hyperspectral Image Recovery via Non-iterative Fusion of Dual-Camera Compressive Hyperspectral Imaging ( http://arxiv.org/abs/2012.15104v1 ) ライセンス: CC BY 4.0	Wei He, Naoto Yokoya, and Xin Yuan	(参考訳) Coded Aperture snapshot Spectrum Imaging (CASSI) は、1つの符号化された2次元(2D)計測を用いて3次元ハイパースペクトル画像(HSI)をキャプチャする有望な手法である。異常な性質のため、様々な正規化器を用いて2次元計測から3次元データを再構成している。残念ながら、精度と計算の複雑さは満足できない。 1つの実現可能な解決策は、CASSIにおけるRGB測定などの追加情報を活用することである。本稿では, CASSI と RGB の組合せを考慮し, HSI 再構成のための新しい融合モデルを提案する。スペクトル基底と空間係数からなるhsiのスペクトル低ランク特性について検討した。具体的には、RGB測定を用いて係数を推定し、CASSI測定は直交スペクトルベースを提供する。さらに,hsiのスペクトル低ランク特性を向上させるパッチ処理戦略を提案する。提案したモデルは、非局所的な処理やイテレーション、RGB検出器のスペクトル検出行列を必要としない。シミュレーションおよび実HSIデータセットの大規模な実験により,提案手法は品質だけでなく,5000回以上の再現を高速化することを示す。 Coded aperture snapshot spectral imaging (CASSI) is a promising technique to capture the three-dimensional hyperspectral image (HSI) using a single coded two-dimensional (2D) measurement, in which algorithms are used to perform the inverse problem. Due to the ill-posed nature, various regularizers have been exploited to reconstruct the 3D data from the 2D measurement. Unfortunately, the accuracy and computational complexity are unsatisfied. One feasible solution is to utilize additional information such as the RGB measurement in CASSI. Considering the combined CASSI and RGB measurement, in this paper, we propose a new fusion model for the HSI reconstruction. We investigate the spectral low-rank property of HSI composed of a spectral basis and spatial coefficients. Specifically, the RGB measurement is utilized to estimate the coefficients, meanwhile the CASSI measurement is adopted to provide the orthogonal spectral basis. We further propose a patch processing strategy to enhance the spectral low-rank property of HSI. The proposed model neither requires non-local processing or iteration, nor the spectral sensing matrix of the RGB detector. 
Extensive experiments on both simulated and real HSI datasets demonstrate that our proposed method not only outperforms the previous state of the art in quality but also speeds up reconstruction by more than 5000 times.	翻訳日:2021-04-18 15:04:47 publication placeholder
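The spectral low-rank property mentioned in the abstract above can be written compactly. The notation below is an assumption chosen for illustration (the abstract defines no symbols): the HSI cube with $B$ bands and $M \times N$ pixels is unfolded into a matrix $\mathbf{X}$, $\mathbf{E}$ is the orthogonal spectral basis, $\mathbf{A}$ holds the spatial coefficients, and $k$ is the assumed rank.

```latex
\[
  \mathbf{X} \approx \mathbf{E}\,\mathbf{A},
  \qquad
  \mathbf{X} \in \mathbb{R}^{B \times MN},\quad
  \mathbf{E} \in \mathbb{R}^{B \times k},\quad
  \mathbf{A} \in \mathbb{R}^{k \times MN},\quad
  \mathbf{E}^{\top}\mathbf{E} = \mathbf{I}_k,\quad
  k \ll B .
\]
```

Read this way, the abstract's division of labor is that the RGB measurement informs $\mathbf{A}$ and the CASSI measurement informs $\mathbf{E}$. How each is actually estimated is not stated in the abstract.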
# (参考訳) DUT-LF Saliency:Versatile DatasetとLight Field-to-RGB Saliency Detection DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection ( http://arxiv.org/abs/2012.15124v1 ) ライセンス: CC BY 4.0	Yongri Piao and Zhengkun Rong and Shuang Xu and Miao Zhang and Huchuan Lu	(参考訳) 光電界データは、塩分検出に好適な特性を示す。学習に基づく光場塩分検出の成功は、モデルのより汎用性を高めるための包括的なデータセットの構築方法、高次元光フィールドデータの有効活用方法、デスクトップコンピュータやモバイルデバイスの汎用性を達成するためのフレキシブルモデルの設計方法に大きく依存している。これらの質問に答えるために、まず、rgb、rgb-dおよびlight field saliency detectionの汎用アプリケーションを可能にする大規模データセットを導入し、102のクラスと4204のサンプルを含む。次に,FocalストリームとRGBストリームからなる非対称な2ストリームモデルを提案する。 Focalストリームは、デスクトップコンピュータ上でより高いパフォーマンスを実現し、フォーカスネスの知識をRGBストリームに転送するように設計されている。 RGBストリームは3つの蒸留方式を通じてモバイルデバイスの柔軟性とメモリ/計算効率を保証する。実験は、我々の焦点ストリームが最先端のパフォーマンスを達成することを実証する。 rgb ストリームは dutlf-v2 上で top-2 f-measure を達成し、モデルサイズを 83% 削減し、最高の実行方法と比較して fps を 5 倍向上させる。さらに,提案する蒸留スキームはrgb塩分モデルに適用でき,柔軟性を確保しつつ優れた性能を実現する。 Light field data exhibit favorable characteristics conducive to saliency detection. The success of learning-based light field saliency detection is heavily dependent on how a comprehensive dataset can be constructed for higher generalizability of models, how high dimensional light field data can be effectively exploited, and how a flexible model can be designed to achieve versatility for desktop computers and mobile devices. To answer these questions, first we introduce a large-scale dataset to enable versatile applications for RGB, RGB-D and light field saliency detection, containing 102 classes and 4204 samples. Second, we present an asymmetrical two-stream model consisting of the Focal stream and RGB stream. The Focal stream is designed to achieve higher performance on desktop computers and transfer focusness knowledge to the RGB stream, relying on two tailor-made modules. The RGB stream guarantees the flexibility and memory/computation efficiency on mobile devices through three distillation schemes. 
Experiments demonstrate that our Focal stream achieves state-of-the-art performance. The RGB stream achieves a Top-2 F-measure on DUTLF-V2 while reducing the model size by 83% and boosting FPS by 5 times compared with the best-performing method. Furthermore, our proposed distillation schemes are applicable to RGB saliency models, achieving impressive performance gains while ensuring flexibility.	翻訳日:2021-04-18 14:43:39 公開日:2020-12-30
# (参考訳) 古代ゲームにおける現代技術 Modern Techniques for Ancient Games ( http://arxiv.org/abs/2101.10066v1 ) ライセンス: CC BY 4.0	Cameron Browne	(参考訳) ゲームは、共有された文化的過去と人間の文明の発展に関する豊富な知識を提供する可能性があるが、初期のゲームに対する私たちの理解は不完全であり、しばしば信頼できない再建に基づいている。本稿では,現在進行中の5年間の研究プロジェクトであるDigital Ludemeプロジェクトについて述べる。 Games potentially provide a wealth of knowledge about our shared cultural past and the development of human civilisation, but our understanding of early games is incomplete and often based on unreliable reconstructions. This paper describes the Digital Ludeme Project, a five-year research project currently underway that aims to address such issues using modern computational techniques.	翻訳日:2021-04-18 13:51:43 公開日:2020-12-30
# (参考訳) JPEG圧縮に対するロバストデータハイディングに向けて:擬似微分型ディープラーニングアプローチ Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach ( http://arxiv.org/abs/2101.00973v1 ) ライセンス: CC BY 4.0	Chaoning Zhang, Adil Karjauv, Philipp Benz, In So Kweon	(参考訳) データ隠蔽は、認証と所有権を保護するために広く使われているアプローチである。画像やビデオのようなほとんどのマルチメディアコンテンツは圧縮形式で送信または保存される。 JPEGのようなこのような損失の多い圧縮は、隠れたデータを破壊する可能性があるため、堅牢なデータ隠蔽の必要性が高まる。これらの圧縮に対抗できるデータ隠蔽の目標を達成することは、依然としてオープンな課題である。近年、ディープラーニングはデータの隠蔽に大きな成功を収めている一方、jpegの非微分性は、損失のある圧縮に対する堅牢性を改善するために深いパイプラインを訓練することが難しくなっている。既存のSOTAアプローチは、同様の操作を行う異なるモジュールで、微分不可能な部分を置き換える。 a) 大規模なエンジニアリング努力; (b) 圧縮攻撃のホワイトボックス知識を必要とする; (c) jpegのような単純な圧縮でのみ機能する。本研究では,上記の全ての制限に同時に対処するための,シンプルで効果的なアプローチを提案する。 JPEG以外にも、さまざまな画像やビデオの損失圧縮アルゴリズムに対する堅牢性を向上する手法が示されている。 Data hiding is one widely used approach for protecting authentication and ownership. Most multimedia content like images and videos are transmitted or saved in the compressed form. This kind of lossy compression, such as JPEG, can destroy the hidden data, which raises the need of robust data hiding. It is still an open challenge to achieve the goal of data hiding that can be against these compressions. Recently, deep learning has shown large success in data hiding, while non-differentiability of JPEG makes it challenging to train a deep pipeline for improving robustness against lossy compression. The existing SOTA approaches replace the non-differentiable parts with differentiable modules that perform similar operations. Multiple limitations exist: (a) large engineering effort; (b) requiring a white-box knowledge of compression attacks; (c) only works for simple compression like JPEG. In this work, we propose a simple yet effective approach to address all the above limitations at once. Beyond JPEG, our approach has been shown to improve robustness against various image and video lossy compression algorithms.	翻訳日:2021-04-18 13:39:14 公開日:2020-12-30
# (参考訳) 構文を意識したローカル注意によるbertの改善 Improving BERT with Syntax-aware Local Attention ( http://arxiv.org/abs/2012.15150v1 ) ライセンス: CC BY 4.0	Zhongli Li, Qingyu Zhou, Chao Li, Ke Xu, Yunbo Cao	(参考訳) BERTのような、トレーニング済みのTransformerベースのニューラルネットワークモデルは、さまざまなNLPタスクにおいて顕著な成果を上げている。近年の研究では、注意に基づくモデルが地域に対するより集中的な注意の恩恵を受けることが示された。その多くは、線形スパン内の注意範囲を制限するか、機械翻訳や質問応答のような特定のタスクに限定する。本稿では,構文構造における距離に基づいて注意範囲を制限した構文認識型局所的注意を提案する。提案した構文認識ローカルアテンションは、BERTのような事前訓練された言語モデルと統合して、構文的に関連する単語にフォーカスするためにモデルをレンダリングすることができる。文分類やシーケンスラベリングタスクなど,シングルセンテンスベンチマークの各種実験を行った。実験結果は、すべてのベンチマークデータセット上でBERTよりも一貫した利得を示している。本研究は,構文的に関連した単語に注目が集まることにより,より優れた性能が得られることを示す。 Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on varieties of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions. Most of them restrict the attention scope within a linear span, or confine to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, where the attention scopes are restrained based on the distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pretrained language models, such as BERT, to render the model to focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. The extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.	翻訳日:2021-04-18 13:32:12 公開日:2020-12-30
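The abstract above restricts attention scopes by distance in the syntactic structure. One way to picture this, as a minimal sketch under stated assumptions, is a boolean mask that lets token i attend to token j only if they are within a fixed number of edges in the dependency tree. The helper names, the hard threshold, and the toy parse are hypothetical. The paper's actual masking rule and its integration with BERT are not described in the abstract.

```python
from collections import deque


def tree_distances(heads):
    """Pairwise edge distances in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    n = len(heads)
    neighbors = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)

    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        # Breadth-first search from each token over the undirected tree.
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if dist[start][nxt] is None:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


def local_attention_mask(heads, max_distance):
    """mask[i][j] is True when token i may attend to token j."""
    dist = tree_distances(heads)
    return [[d is not None and d <= max_distance for d in row] for row in dist]


# Hypothetical parse of "She reads long books":
# "reads" is the root; "She" and "books" attach to "reads"; "long" attaches to "books".
tokens = ["She", "reads", "long", "books"]
heads = [1, -1, 3, 1]

mask = local_attention_mask(heads, max_distance=1)
for token, row in zip(tokens, mask):
    print(token, row)
```

In a Transformer, a mask like this would typically be applied by setting the disallowed attention logits to a large negative value before the softmax.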
# (参考訳) 時間移動可能な摂動:オンライン・ビジュアル・オブジェクト・トラッカに対する効率的な一発攻撃 Temporally-Transferable Perturbations: Efficient, One-Shot Adversarial Attacks for Online Visual Object Trackers ( http://arxiv.org/abs/2012.15183v1 ) ライセンス: CC BY 4.0	Krishna Kanth Nakka and Mathieu Salzmann	(参考訳) 近年、シアムネットワークに基づくトラッカーは、視覚オブジェクト追跡(vot)に非常に効果的で効率的なものとなっている。これらの手法は、視覚認識タスクのための多くのディープネットワークと同様に、敵の攻撃に対して脆弱であることが示されているが、既存のVOTトラッカーに対する攻撃は全ての入力フレームの探索領域の摂動を必要とする。本稿では,オブジェクトテンプレート画像のみから,時間移動可能な1つの逆摂動を生成するフレームワークを提案する。この混乱はすべての検索画像に追加され、事実上コストがかからず、トラッカーを騙すことに成功した。実験により,本手法は,未目標シナリオにおける標準VOTベンチマークに対する最先端攻撃よりも優れていることが示された。さらに,我々のフォーマリズムは,様々な方向の摂動をプリ計算することにより,トラッカーが任意の軌道に追従することを強制する攻撃に自然に及んでいることを示す。 In recent years, the trackers based on Siamese networks have emerged as highly effective and efficient for visual object tracking (VOT). While these methods were shown to be vulnerable to adversarial attacks, as most deep networks for visual recognition tasks, the existing attacks for VOT trackers all require perturbing the search region of every input frame to be effective, which comes at a non-negligible cost, considering that VOT is a real-time task. In this paper, we propose a framework to generate a single temporally transferable adversarial perturbation from the object template image only. This perturbation can then be added to every search image, which comes at virtually no cost, and still, successfully fool the tracker. Our experiments evidence that our approach outperforms the state-of-the-art attacks on the standard VOT benchmarks in the untargeted scenario. Furthermore, we show that our formalism naturally extends to targeted attacks that force the tracker to follow any given trajectory by precomputing diverse directional perturbations.	翻訳日:2021-04-18 13:11:49 公開日:2020-12-30
# (参考訳) 科学論文のインパクト予測の簡易化 Simplifying Impact Prediction for Scientific Articles ( http://arxiv.org/abs/2012.15192v1 ) ライセンス: CC BY 4.0	Thanasis Vergoulis, Ilias Kanellos, Giorgos Giannopoulos, Theodore Dalamagas	(参考訳) 記事の期待される影響を見積もることは、さまざまなアプリケーション(例えば、記事/コオペレータ推奨)に有用である。既存のほとんどのアプローチは、各記事が近い将来受ける引用の正確な数を予測しようとするが、これは難しい回帰分析問題である。さらに、ほとんどのアプローチは、多数の記事に対して適切に満たせない要件である、各記事に対する豊富なメタデータの存在に依存しています。本研究では,より単純な機械学習問題を解くこと,期待される影響に基づく記事の分類が現実の多くのアプリケーションに十分であるという事実を活用し,最小限の記事メタデータを用いて学習可能な簡易モデルを提案する。最後に, このモデルの様々な構成について検討し, 上記の分類問題を解く上での有効性を評価する。 Estimating the expected impact of an article is valuable for various applications (e.g., article/cooperator recommendation). Most existing approaches attempt to predict the exact number of citations each article will receive in the near future, however this is a difficult regression analysis problem. Moreover, most approaches rely on the existence of rich metadata for each article, a requirement that cannot be adequately fulfilled for a large number of them. In this work, we take advantage of the fact that solving a simpler machine learning problem, that of classifying articles based on their expected impact, is adequate for many real world applications and we propose a simplified model that can be trained using minimal article metadata. Finally, we examine various configurations of this model and evaluate their effectiveness in solving the aforementioned classification problem.	翻訳日:2021-04-18 12:56:22 公開日:2020-12-30
# (参考訳) Crossover-SGD: 分散ディープラーニングにおけるゴシップベース通信による大規模ミニバッチ問題の緩和とスケーラビリティ向上 Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability ( http://arxiv.org/abs/2012.15198v1 ) ライセンス: CC BY 4.0	Sangho Yeo, Minho Bae, Minjoong Jeong, Oh-kyoung Kwon, Sangyoon Oh	(参考訳) 分散ディープラーニングは、大規模なデータセットと複雑なモデルのためのディープラーニングのトレーニング時間を短縮する効果的な方法である。しかし、ネットワークオーバーヘッドによるスケーラビリティの制限により、すべてのワーカーのパラメータの同期が困難になる。この問題を解決するため, 作業者数に関係なく, 安定したスケーラビリティを示すゴシップ方式が提案されている。しかし、一般的にゴシップ方式を使用するには、大規模なミニバッチの検証精度を検証する必要がある。そこで本研究では,まず,大規模ミニバッチ問題におけるゴシップ法の特性を実証的に検討し,バッチサイズ数の増加とワーカ数の増加に対して,allreduce-sgd(stochasticgradient descent)よりも高い検証精度を維持できることを確認した。しかし,gossipに基づくモデルの遅延パラメータ伝搬は,大規模ノードスケールでの検証精度を低下させる。この問題に対処するため,重みパラメータの遅延伝搬を,セグメントワイド通信と負荷分散ランダムネットワークトポロジにより緩和するクロスオーバーSGDを提案する。また,ゴシップに基づくコミュニケーション手法における労働者数を制限するため,階層的なコミュニケーションも行う。提案手法の有効性を検証するため,我々は実験実験を行い,我々のクロスオーバーSGDがSGP(Stochastic Gradient Push)よりも高いノードスケーラビリティを示した。 Distributed deep learning is an effective way to reduce the training time of deep learning for large datasets as well as complex models. However, the limited scalability caused by network overheads makes it difficult to synchronize the parameters of all workers. To resolve this problem, gossip-based methods that demonstrates stable scalability regardless of the number of workers have been proposed. However, to use gossip-based methods in general cases, the validation accuracy for a large mini-batch needs to be verified. To verify this, we first empirically study the characteristics of gossip methods in a large mini-batch problem and observe that the gossip methods preserve higher validation accuracy than AllReduce-SGD(Stochastic Gradient Descent) when the number of batch sizes is increased and the number of workers is fixed. However, the delayed parameter propagation of the gossip-based models decreases validation accuracy in large node scales. 
To cope with this problem, we propose Crossover-SGD, which alleviates the delayed propagation of weight parameters via segment-wise communication and a load-balancing random network topology. We also adapt hierarchical communication to limit the number of workers in gossip-based communication methods. To validate the effectiveness of our proposed method, we conduct empirical experiments and observe that Crossover-SGD shows higher node scalability than SGP (Stochastic Gradient Push).	翻訳日:2021-04-18 12:39:33 公開日:2020-12-30
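For readers unfamiliar with gossip-based communication, a generic pairwise-averaging round can be sketched as below. This is an assumption-laden toy and not Crossover-SGD itself: each worker holds a single scalar parameter, workers are paired uniformly at random, and the segment-wise and hierarchical communication described in the abstract are not modeled. The function names are hypothetical.

```python
import random


def gossip_round(params, rng):
    """One round: workers are randomly paired and each pair averages its parameters."""
    order = list(range(len(params)))
    rng.shuffle(order)
    new_params = list(params)
    # Pair consecutive workers in the shuffled order; an odd worker out is left unchanged.
    for a, b in zip(order[0::2], order[1::2]):
        mean = (params[a] + params[b]) / 2.0
        new_params[a] = new_params[b] = mean
    return new_params


def run_gossip(params, rounds, seed=0):
    """Repeat gossip rounds with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    for _ in range(rounds):
        params = gossip_round(params, rng)
    return params


workers = [0.0, 4.0, 8.0, 12.0]
after = run_gossip(workers, rounds=20)
```

Pairwise averaging keeps the global mean of the parameters unchanged while shrinking their spread, without any all-to-all synchronization. The price is that information from one worker needs several rounds to reach all the others, which is the delayed propagation the abstract refers to.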
# (参考訳) 無線センサネットワークにおけるエネルギー効率の最適化のための学習 Learning to Optimize Energy Efficiency in Energy Harvesting Wireless Sensor Networks ( http://arxiv.org/abs/2012.15203v1 ) ライセンス: CC BY 4.0	Debamita Ghosh and Manjesh K. Hanawal and Nikola Zlatanov	(参考訳) エネルギー効率の最大化を目的とした,エネルギー源による複数のエネルギー収穫ノードへの無線電力伝送について検討した。ソースは各タイムスロット内の利用可能な電力レベルのいずれかを使用してノードにエネルギーを送信し、ノードは収穫したエネルギーを使用してエネルギー源に情報を送信する。ソースはチャネルの状態情報を持っておらず、与えられたノードから受信したコードワードがうまくデコードされたかどうかのみを判断する。この限られた情報により、ソースはネットワークのエネルギー効率を最大化する最適な電力レベルを学ぶ必要がある。この問題を確率的多元帯域問題としてモデル化し,エネルギー効率を最大化するエネルギー源の最適送信電力を学習する上信頼境界に基づくアルゴリズムを開発した。数値結果は,提案アルゴリズムの性能保証を検証し,ベンチマーク手法と比較して有意な向上を示した。 We study wireless power transmission by an energy source to multiple energy harvesting nodes with the aim to maximize the energy efficiency. The source transmits energy to the nodes using one of the available power levels in each time slot and the nodes transmit information back to the energy source using the harvested energy. The source does not have any channel state information and it only knows whether a received codeword from a given node was successfully decoded or not. With this limited information, the source has to learn the optimal power level that maximizes the energy efficiency of the network. We model the problem as a stochastic Multi-Armed Bandits problem and develop an Upper Confidence Bound based algorithm, which learns the optimal transmit power of the energy source that maximizes the energy efficiency. Numerical results validate the performance guarantees of the proposed algorithm and show significant gains compared to the benchmark schemes.	翻訳日:2021-04-18 12:18:59 公開日:2020-12-30
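The bandit formulation can be illustrated with the standard UCB1 rule. This is a sketch under stated assumptions rather than the authors' algorithm: each power level is one arm, the reward is a hypothetical energy-efficiency proxy scaled to [0, 1] (a successful decode divided by the normalized power spent), and the power levels and success probabilities in the simulated link are made up for illustration.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` slots; pull(arm) must return a reward in [0, 1]."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the index
        else:
            # Empirical mean plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def make_simulated_link(powers, success_probs, seed=0):
    """Hypothetical environment: only decode success or failure is observed."""
    rng = random.Random(seed)
    p_min = min(powers)

    def pull(arm):
        decoded = rng.random() < success_probs[arm]
        # Reward: successful decode per unit of (normalized) transmit power.
        return p_min / powers[arm] if decoded else 0.0

    return pull


pull = make_simulated_link(powers=[1.0, 2.0, 4.0], success_probs=[0.2, 0.7, 0.9])
pull_counts = ucb1(pull, n_arms=3, horizon=5000)
```

The source never sees the channel state here. It only observes whether decoding succeeded, which matches the limited-feedback setting described in the abstract.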
# (参考訳) 構造プローブにおける直交制約の導入 Introducing Orthogonal Constraint in Structural Probes ( http://arxiv.org/abs/2012.15228v1 ) ライセンス: CC BY-SA 4.0	Tomasz Limisiewicz and David Mareček	(参考訳) NLPにおける事前訓練モデルの成功により、表現の解釈に重点が置かれた。最も顕著なアプローチの1つは構造的プロービング(Hewitt and Manning, 2019)で、言語構造のトポロジーを近似するために言語ベクトル空間の線形射影が実行される。本研究では、この写像を (1) 同型な空間回転と (2) 最も関連性の高い方向を特定してスケールする線形スケーリングに分解する。埋め込みに隠された情報をアンタングルする手法の能力を検証するための新しい構造的タスクを導入する。提案手法がマルチタスク環境で実行可能であることを実験的に示す。さらに、直交制約は、特定の言語特徴をコードする埋め込み部分空間を識別し、プローブの暗記に対する脆弱さを減らす。 With the recent success of pre-trained models in NLP, a significant focus has been put on interpreting their representations. One of the most prominent approaches is structural probing (Hewitt and Manning, 2019), where a linear projection of the language vector space is performed in order to approximate the topology of linguistic structures. In this work, we decompose this mapping into (1) an isomorphic space rotation and (2) a linear scaling that identifies and scales the most relevant directions. We introduce novel structural tasks to examine our method's ability to disentangle information hidden in the embeddings. We experimentally show that our approach can be performed in a multitask setting. Moreover, the orthogonal constraint identifies embedding subspaces encoding specific linguistic features and makes the probe less vulnerable to memorization.	翻訳日:2021-04-18 12:08:24 公開日:2020-12-30
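The decomposition described in this abstract can be written out as a short formula. The notation below is an assumption for illustration and is not taken from the abstract: $h_i$ and $h_j$ are the embeddings of two tokens, $B$ is the linear map of the original structural probe, $V$ is an orthogonal matrix (the rotation), and $\bar{d}$ is a vector of per-dimension scaling factors.

```latex
\begin{align}
  % Linear structural probe: squared distance after a learned linear map B
  d_B(h_i, h_j)^2 &= \lVert B\,(h_i - h_j) \rVert_2^2 \\
  % Assumed decomposition: orthogonal rotation V followed by element-wise scaling
  d_{\bar{d},V}(h_i, h_j)^2 &= \lVert \bar{d} \odot V\,(h_i - h_j) \rVert_2^2,
  \qquad V^{\top} V = I
\end{align}
```

Under this reading, the rotation $V$ can be shared across probing tasks while each task learns its own $\bar{d}$. Dimensions with scaling factors near zero are then ignored for that task, which is one way to see how the constraint identifies task-specific subspaces.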
# (参考訳) 不均衡データセット最適化のための新しい再サンプリング手法 A Novel Resampling Technique for Imbalanced Dataset Optimization ( http://arxiv.org/abs/2012.15231v1 ) ライセンス: CC BY 4.0	Ivan Letteri, Antonio Di Cecco, Abeer Dyoub, Giuseppe Della Penna	(参考訳) 膨大な量のデータにもかかわらず、特定の関心のある出来事は依然として極めて稀である。まれな事象の分類は、不正取引、マルウェアのトラフィック分析、ネットワーク侵入検出など、多くのドメインで一般的な問題である。さまざまなデータセットに対する機械学習アプローチを用いたマルウェア検出のための多くの研究が開発されているが、MTA-KDD'19データセットのみが、日々の悪意のあるトラフィックの代表セットを更新する特質を持っている。この日次更新はデータセットの追加値であるが、rrw最適化mta-kdd'19のクラス不均衡問題のために潜在的な可能性がある。実際のデータセットにおけるクラス分散の難しさを,safe,borderline,realy,outlierの4種類のマイノリティクラス例から把握する。本研究では,クラス不均衡問題に対する1-Nearest Neighbour(G1Nos)オーバーサンプリングアルゴリズムの2つのバージョンを開発した。 G1Nosアルゴリズムの最初のモジュールは、Im Balance Degreeの臨界しきい値を特定する係数ベースのインスタンス選択シルエットを実行する。 (ID)2番目のモジュールはSMOTEライクなオーバーサンプリングアルゴリズムを用いて合成サンプルを生成する。クラスのバランシングは、使用済みデータセットの2つのクラス間の比率を再確立するために、G1Nosアルゴリズムによって行われます。実験結果から, オーバーサンプリングアルゴリズムは他の2つのSOTA手法よりも有効であることがわかった。 Despite the enormous amount of data, particular events of interest can still be quite rare. Classification of rare events is a common problem in many domains, such as fraudulent transactions, malware traffic analysis and network intrusion detection. Many studies have been developed for malware detection using machine learning approaches on various datasets, but as far as we know only the MTA-KDD'19 dataset has the peculiarity of updating the representative set of malicious traffic on a daily basis. This daily updating is the added value of the dataset, but it translates into a potential due to the class imbalance problem that the RRw-Optimized MTA-KDD'19 will occur. We capture difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. In this work, we developed two versions of Generative Silhouette Resampling 1-Nearest Neighbour (G1Nos) oversampling algorithms for dealing with class imbalance problem. 
The first module of the G1Nos algorithms performs silhouette-coefficient-based instance selection, identifying the critical threshold of the Imbalance Degree (ID); the second module generates synthetic samples using a SMOTE-like oversampling algorithm. The G1Nos algorithms balance the classes by re-establishing the proportions between the two classes of the dataset used. The experimental results show that our oversampling algorithms work better than the other two SOTA methodologies on all the metrics considered.	翻訳日:2021-04-18 11:58:45 公開日:2020-12-30
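The "SMOTE-like" generation step can be sketched as plain interpolation between a minority sample and its nearest minority neighbour. This is a generic 1-nearest-neighbour SMOTE-style sketch, not the G1Nos algorithm: the silhouette-based instance selection is not reproduced, and the function names are hypothetical. It assumes at least two minority samples.

```python
import random


def nearest_neighbour(point, others):
    """Return the element of `others` closest to `point` (squared Euclidean distance)."""
    return min(others, key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)))


def smote_like(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Exclude the base sample itself when searching for its neighbour.
        others = [p for p in minority if p is not base]
        neigh = nearest_neighbour(base, others)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neigh)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority, n_new=5)
```

Every synthetic point lies on a line segment between two existing minority samples, so the minority class grows without copying samples verbatim.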
# (参考訳) AI開発レースは異種ネットワークで仲介できる AI Development Race Can Be Mediated on Heterogeneous Networks ( http://arxiv.org/abs/2012.15234v1 ) ライセンス: CC BY 4.0	Theodor Cimpeanu, Francisco C. Santos, Luis Moniz Pereira, Tom Lenaerts and The Anh Han	(参考訳) 人工知能(AI)の分野は、研究、ビジネス、および政策に一定のレベルの不安をもたらしている。緊張はAIレースの物語によってさらに高まっており、多くの利害関係者が行方不明になることを恐れている。現実であろうとなかろうと、この物語に対する信念は有害であり、一部の利害関係者は安全上の予防や社会的結果を無視しなければならないと感じている。混合した世界での理想化された技術競争を記述したゲーム理論モデルから始め、人種間の相互作用構造の違いが集団的選択や規制行動の要件をいかに変えられるかを検討する。その結果、参加者がつながりや相互影響(例えば、スケールフリーネットワークが当事者間の相互作用を形作る場合)で強い多様性を表わすと、均質な設定に存在する衝突が著しく減少し、規制措置の必要性が低下することが示された。さらに, 技術ガバナンスと規制は, 企業や国家間の特許の不均一性と不平等から利益を得られ, 倫理的かつ持続的なaiの利用に向けて, 少数の参加者に対して, 細心の注意を払って介入を行うことが期待できる。 The field of Artificial Intelligence (AI) has been introducing a certain level of anxiety in research, business and also policy. Tensions are further heightened by an AI race narrative which makes many stakeholders fear that they might be missing out. Whether real or not, a belief in this narrative may be detrimental as some stakeholders will feel obliged to cut corners on safety precautions or ignore societal consequences. Starting from a game-theoretical model describing an idealised technology race in a well-mixed world, here we investigate how different interaction structures among race participants can alter collective choices and requirements for regulatory actions. Our findings indicate that, when participants portray a strong diversity in terms of connections and peer-influence (e.g., when scale-free networks shape interactions among parties), the conflicts that exist in homogeneous settings are significantly reduced, thereby lessening the need for regulatory actions. 
Furthermore, our results suggest that technology governance and regulation may profit from the world's patent heterogeneity and inequality among firms and nations to design and implement meticulous interventions on a minority of participants capable of influencing an entire population towards an ethical and sustainable use of AI.	翻訳日:2021-04-18 11:42:41 公開日:2020-12-30
# (参考訳) Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation ( http://arxiv.org/abs/2012.15244v1 ) ライセンス: CC BY 4.0	Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, Pål Halvorsen	(参考訳) 大腸癌は世界で3番目に多いがんの原因である。 Global Cancer Statistics 2018によると、開発途上国と先進国の両方で大腸癌の発生が増加している。ポリープなどの大腸異常の早期発見はがん予防に重要であり, 自動ポリープセグメンテーションが重要な役割を担っている。最近の早期発見と治療の進歩にかかわらず、推定されるポリープの見落とし率は依然として約20%である。自動診断システムによるサポートは、見落とされたポリープの潜在的な解決策の1つである可能性がある。このような検出システムは、低コストな設計ソリューションを助け、医師の時間を節約することができる。本稿では,2020 Medico challengeを紹介し,関連する作業とデータセットに関する情報を提供し,課題と評価指標を説明し,Medico challengeの編成の必要性について論じる。 Colorectal cancer is the third most common cause of cancer worldwide. According to Global Cancer Statistics 2018, the incidence of colorectal cancer is increasing in both developing and developed countries. Early detection of colon anomalies such as polyps is important for cancer prevention, and automatic polyp segmentation can play a crucial role in this. Despite recent advances in early detection and treatment options, the estimated polyp miss rate is still around 20%. Support via an automated computer-aided diagnosis system could be one of the potential solutions for the overlooked polyps. Such detection systems can help low-cost design solutions and save doctors time, which they could for example use to perform more patient examinations. In this paper, we introduce the 2020 Medico challenge, provide some information on related work and the dataset, describe the task and evaluation metrics, and discuss the necessity of organizing the Medico challenge.	翻訳日:2021-04-18 11:41:40 公開日:2020-12-30
# (参考訳) DDANet: Dual Decoder Attention Network for Automatic Polyp Segmentation DDANet: Dual Decoder Attention Network for Automatic Polyp Segmentation ( http://arxiv.org/abs/2012.15245v1 ) ライセンス: CC BY 4.0	Nikhil Kumar Tomar, Debesh Jha, Sharib Ali, Håvard D. Johansen, Dag Johansen, Michael A. Riegler, and Pål Halvorsen	(参考訳) 大腸内視鏡は大腸ポリープの検査と検出のためのゴールドスタンダードである。ポリープの位置特定と輪郭描出は、治療(例えば、手術計画)と予後決定において重要な役割を果たす。ポリープセグメンテーションは臨床分析のための詳細な境界情報を提供することができる。畳み込みニューラルネットワークは大腸内視鏡の性能を改善した。しかしながら、ポリープは通常、クラス内およびクラス間変異やノイズなど、様々な課題を抱えている。ポリープ評価のための手動ラベリングは、専門家の時間を必要とし、ヒューマンエラー(例えば、病変の見落とし)を起こしやすいが、自動化され、正確で、高速なセグメンテーションによって、描出された病変境界の品質を改善し、見落とし率を減らすことができる。 Endotect challengeは、公開のHyperkvasirでトレーニングし、未公開のデータセットでテストすることで、コンピュータビジョンのメソッドをベンチマークする機会を提供する。本稿では,デュアルデコーダアテンションネットワークに基づく「DDANet」と呼ばれる新しいアーキテクチャを提案する。実験により, Kvasir-SEGデータセットを用いてトレーニングし, 未知のデータセット上で試験したモデルは, Dice係数0.7874, mIoU0.7010, 再現率0.7987, 適合率0.8577を達成し, モデルの一般化能力を実証した。 Colonoscopy is the gold standard for examination and detection of colorectal polyps. Localization and delineation of polyps can play a vital role in treatment (e.g., surgical planning) and prognostic decision making. Polyp segmentation can provide detailed boundary information for clinical analysis. Convolutional neural networks have improved the performance in colonoscopy. However, polyps usually pose various challenges, such as intra- and inter-class variation and noise. While manual labeling for polyp assessment requires time from experts and is prone to human error (e.g., missed lesions), an automated, accurate, and fast segmentation can improve the quality of delineated lesion boundaries and reduce the miss rate. The Endotect challenge provides an opportunity to benchmark computer vision methods by training on the publicly available Hyperkvasir and testing on a separate unseen dataset. In this paper, we propose a novel architecture called "DDANet" based on a dual decoder attention network. 
Our experiments demonstrate that the model trained on the Kvasir-SEG dataset and tested on an unseen dataset achieves a dice coefficient of 0.7874, mIoU of 0.7010, recall of 0.7987, and a precision of 0.8577, demonstrating the generalization ability of our model.	翻訳日:2021-04-18 11:34:54 公開日:2020-12-30
# (参考訳) U-Net-ResNet50を用いた自動ポリープセグメンテーション Automatic Polyp Segmentation using U-Net-ResNet50 ( http://arxiv.org/abs/2012.15247v1 ) ライセンス: CC BY 4.0	Saruar Alam, Nikhil Kumar Tomar, Aarati Thakur, Debesh Jha, Ashish Rauniyar	(参考訳) ポリープは大腸癌の前駆病変であり、大腸癌は世界中のがん関連死亡の主要な原因の1つと考えられている。大腸内視鏡は大腸ポリープの同定、局在化、除去の標準的手順である。大腸内視鏡検査では, 形状, サイズ, 周囲の組織類似性の変動により, 大腸ポリープが見落とされることが多い。大腸内視鏡検査において, 自動的, 正確かつ高速なポリープ分割法を用いることで, 多くの大腸ポリープを容易に検出・除去することができる。「Medico automatic polyp segmentation challenge」は、ポリープセグメンテーションを研究し、効率的かつ正確なセグメンテーションアルゴリズムを構築する機会を提供する。プリトレーニングされたResNet50をポリープセグメンテーションのエンコーダとしてU-Netを使用する。本モデルでは,この課題に対して提供されるKvasir-SEGデータセットに基づいてトレーニングを行い,オーガナイザのデータセットで検証し,Dice係数0.8154,Jaccard係数0.7396,再現率0.8533,適合率0.8532,正解率0.9506,F2スコア0.8272を達成し,モデルの一般化能力を示す。 Polyps are precursors of colorectal cancer, which is considered one of the leading causes of cancer-related deaths worldwide. Colonoscopy is the standard procedure for the identification, localization, and removal of colorectal polyps. Due to variability in shape, size, and surrounding tissue similarity, colorectal polyps are often missed by clinicians during colonoscopy. With the use of an automatic, accurate, and fast polyp segmentation method during the colonoscopy, many colorectal polyps can be easily detected and removed. The "Medico automatic polyp segmentation challenge" provides an opportunity to study polyp segmentation and build an efficient and accurate segmentation algorithm. We use the U-Net with pre-trained ResNet50 as the encoder for the polyp segmentation. The model is trained on the Kvasir-SEG dataset provided for the challenge and tested on the organizer's dataset and achieves a dice coefficient of 0.8154, Jaccard of 0.7396, recall of 0.8533, precision of 0.8532, accuracy of 0.9506, and F2 score of 0.8272, demonstrating the generalization ability of our model.	翻訳日:2021-04-18 11:27:22 公開日:2020-12-30
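The evaluation metrics quoted in the polyp segmentation abstracts above (Dice, Jaccard/IoU, recall, precision, accuracy, F2) all follow from the pixel-level confusion counts. The sketch below uses the standard definitions on a single pair of flattened binary masks. How the challenge organizers aggregate scores over images is not stated in the abstracts and is not modeled here, and the function name is hypothetical.

```python
def segmentation_metrics(pred, truth):
    """Compute overlap metrics for two flat sequences of 0/1 pixel labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    union = tp + fp + fn
    return {
        # Guard against empty denominators; two empty masks count as a perfect match.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
        "jaccard": tp / union if union else 1.0,
        "accuracy": (tp + tn) / len(truth) if len(truth) else 0.0,
        # F-beta with beta = 2 weights recall more heavily than precision.
        "f2": 5 * tp / (5 * tp + 4 * fn + fp) if union else 1.0,
    }


scores = segmentation_metrics(pred=[1, 1, 0, 0, 1, 0], truth=[1, 0, 0, 1, 1, 0])
```

Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), so they always rank two models on a single image the same way. Accuracy also counts true-negative background pixels, which is why it tends to be much higher than the overlap scores on images where the polyp is small.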
# (参考訳) 機械学習におけるフェアネスの最大相関手法 A Maximal Correlation Approach to Imposing Fairness in Machine Learning ( http://arxiv.org/abs/2012.15259v1 ) ライセンス: CC BY 4.0	Joshua Lee, Yuheng Bu, Prasanna Sattigeri, Rameswar Panda, Gregory Wornell, Leonid Karlinsky, Rogerio Feris	(参考訳) 機械学習アルゴリズムの人気が高まり、多くの産業に多様化するにつれて、その公平性に関する倫理的および法的懸念が益々関連している。我々は,情報理論の観点から,アルゴリズムフェアネスの問題を探究する。最大相関フレームワーク(maximal correlation framework)は、フェアネス制約を表現するために導入されたもので、独立性と分離ベースのフェアネス基準を強制する正規化子を導出し、既存のアルゴリズムよりも計算効率のよい離散変数と連続変数の両方の最適化アルゴリズムを許容できることが示されている。これらのアルゴリズムは、スムーズなパフォーマンス・フェアネストレードオフ曲線を提供し、離散データセット(compas, adult)と連続データセット(communities and crime)の両方において最先端の手法と競合する。 As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information-theoretic view. The maximal correlation framework is introduced for expressing fairness constraints and shown to be capable of being used to derive regularizers that enforce independence and separation-based fairness criteria, which admit optimization algorithms for both discrete and continuous variables which are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance-fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).	翻訳日:2021-04-18 11:22:32 公開日:2020-12-30
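For reference, the maximal correlation that this abstract builds on is commonly defined (the Hirschfeld-Gebelein-Rényi maximal correlation) as below. The second display is only a schematic of how such a quantity can serve as a regularizer. The symbols $\hat{Y}$ (model prediction), $Z$ (sensitive attribute), $\mathcal{L}$ (task loss), and $\lambda$ (trade-off weight) are notation assumed here, not taken from the abstract.

```latex
\begin{align}
  \rho_m(X; Y) &= \sup_{f,\, g}\; \mathbb{E}\!\left[ f(X)\, g(Y) \right] \\
  \text{s.t.}\quad
  & \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0, \qquad
    \mathbb{E}[f(X)^2] = \mathbb{E}[g(Y)^2] = 1
\end{align}

% Schematic independence-style regularizer (assumed form):
\begin{equation}
  \min_{\theta}\; \mathcal{L}(\theta) + \lambda\, \rho_m\big(\hat{Y}_{\theta};\, Z\big)
\end{equation}
```

Because $\rho_m(X; Y) = 0$ exactly when $X$ and $Y$ are independent, driving the penalty toward zero pushes the predictions toward independence from the sensitive attribute. A separation-style criterion would instead penalize the dependence conditionally on the true label. Sweeping $\lambda$ traces out the performance-fairness trade-off curve mentioned in the abstract.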
# (参考訳) 情報ゲインを用いた言語横断形容詞順序予測 Predicting cross-linguistic adjective order with information gain ( http://arxiv.org/abs/2012.15263v1 ) ライセンス: CC BY 4.0	William Dyer, Richard Futrell, Zoey Liu, and Gregory Scontras	(参考訳) 言語は名詞の前、後、または周囲の複数の形容詞の配置が異なるが、通常、それらの形容詞の相対的な順序で強い言語内傾向を示す(例えば、英語では「big blue box」、フランス語では「grande boîte bleue」、アラビア語では「alsundūq al'azraq alkabīr」)。我々は,情報ゲインの最大化に基づく形容詞順の新しい定量化を推し進める。本モデルでは,フランス型ANA配列の左右非対称性を,AANおよびNAA順序と同じアプローチで解決する。 32の言語にまたがって、好まれる形容詞の順序は、情報獲得を最大化する効率的なアルゴリズムをほとんど反映している。 Languages vary in their placement of multiple adjectives before, after, or surrounding the noun, but they typically exhibit strong intra-language tendencies on the relative order of those adjectives (e.g., the preference for "big blue box" in English, "grande boîte bleue" in French, and "alsundūq al'azraq alkabīr" in Arabic). We advance a new quantitative account of adjective order across typologically-distinct languages based on maximizing information gain. Our model addresses the left-right asymmetry of French-type ANA sequences with the same approach as AAN and NAA orderings, without appeal to other mechanisms. We find that, across 32 languages, the preferred order of adjectives largely mirrors an efficient algorithm of maximizing information gain.	翻訳日:2021-04-18 11:05:13 公開日:2020-12-30
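Information gain, the quantity this abstract maximizes, can be illustrated with its usual entropy-reduction definition. The sketch below is an assumption about the general idea rather than the authors' exact formulation: a word is treated as a boolean feature that splits a set of labeled items, and its gain is the drop in label entropy after the split. The toy data and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(labels, feature):
    """Entropy reduction obtained by splitting `labels` on a boolean feature."""
    groups = {True: [], False: []}
    for label, flag in zip(labels, feature):
        groups[bool(flag)].append(label)
    # Weighted entropy that remains inside the two groups after the split.
    remainder = sum(
        len(g) / len(labels) * entropy(g) for g in groups.values() if g
    )
    return entropy(labels) - remainder


# Toy example: four referents, two candidate words.
referents = ["box", "box", "ball", "ball"]
splits_by_shape = [1, 1, 0, 0]   # perfectly separates the referents
splits_evenly = [1, 0, 1, 0]     # tells us nothing about the referent
gain_a = information_gain(referents, splits_by_shape)
gain_b = information_gain(referents, splits_evenly)
```

A greedy ordering that places the word with the larger gain first is one simple way to read "maximizing information gain". Whether this matches the ordering procedure used in the paper is not something the abstract settles.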
# (参考訳) DEER: イベント時間推論のためのデータ効率の良い言語モデル DEER: A Data Efficient Language Model for Event Temporal Reasoning ( http://arxiv.org/abs/2012.15283v1 ) ライセンス: CC BY 4.0	Rujun Han, Xiang Ren, Nanyun Peng	(参考訳) BERT、RoBERTa、ELECTRAなどの事前訓練言語モデル(LM)は、様々な下流NLPタスクの性能向上に有効である。近年、研究者はこれらのLMのトレーニング目標にドメインとタスク固有の知識を取り入れ、下流タスクを扱うモデルの能力をさらに強化している。しかしながら、これらのLMはイベントの時間的推論に特化して設計されていない。本稿では,イベントの時間的関係に着目した言語モデルDEERを提案する。具体的には,イベント時相理解のための機械読解と情報抽出タスクをシミュレートするために,多数のトレーニングサンプルを作成し,イベント時相推論のlms能力を強化するためにジェネレータ・判別器構造を活用する。実験の結果, DEER は SOTA の結果を達成でき,特に 5 つの広く使用されているデータセットの低リソース環境では有効であることがわかった。 Pretrained language models (LMs) such as BERT, RoBERTa, and ELECTRA are effective at improving the performances of a variety of downstream NLP tasks. Recently, researchers have incorporated domain and task-specific knowledge in these LMs' training objectives and further enhanced models' capability of handling downstream tasks. However, none of these LMs are designed specifically for event temporal reasoning. We propose DEER, a language model that is trained to focus on event temporal relations and performs better under low-resource settings than original LMs. More specifically, we create a large number of training samples to simulate the machine reading comprehension and information extraction tasks for event temporal understanding and leverage a generator-discriminator structure to reinforce the LMs' capability of event temporal reasoning. Our experimental results show that DEER can achieve SOTA results and works particularly well in low-resource settings across 5 widely used datasets.	翻訳日:2021-04-18 10:50:52 公開日:2020-12-30
# (参考訳) バッグ外評価とサブバッキングによる分類のための最適木選定法 Optimal trees selection for classification via out-of-bag assessment and sub-bagging ( http://arxiv.org/abs/2012.15301v1 ) ライセンス: CC BY 4.0	Zardad Khan, Naz Gul, Nosheen Faiz, Asma Gul, Werner Adler, Berthold Lausen	(参考訳) 機械学習手法に対するトレーニングデータサイズの影響は過去20年間にわたってよく研究されてきた。一般に、木ベースの機械学習手法の予測性能は、トレーニングデータのサイズが大きくなるにつれて低下して改善される。本研究では,本手法が内部検証によるトレーニング観測から学習できない最適樹木アンサンブル(OTE)について検討する。そこで,OTEは内部検証におけるトレーニング観察の損失に対応するため,修正木選択法を提案する。第1の方法では、各木に対する個別および集団のパフォーマンス評価において、対応するOOB(out-of-bag)観測を使用する。木は、OOB観測に基づいて個々のパフォーマンスに基づいてランク付けされる。特定の上位木を選定し、最も正確な木から開始し、その後に1つずつ木を付加し、その木を付加するために採取したブートストラップ標本から残したOOB観測を用いて、その影響を記録する。アンサンブルの予測精度を向上させると木が選択される。第2のアプローチでは、木はランダムなサブセット上で成長し、ブートストラップサンプルではなく、トレーニングデータのサブバッギング(sub-bagging)として知られています。各試料からの残りの観察は、第1法と同様に、対応する木々の個体および集合的評価に使用される。 21個のベンチマークデータセットの解析とシミュレーション研究により,OTEや他の最先端手法と比較して改良された手法の性能が向上した。 The effect of training data size on machine learning methods has been well investigated over the past two decades. The predictive performance of tree based machine learning methods, in general, improves with a decreasing rate as the size of training data increases. We investigate this in optimal trees ensemble (OTE) where the method fails to learn from some of the training observations due to internal validation. Modified tree selection methods are thus proposed for OTE to cater for the loss of training observations in internal validation. In the first method, corresponding out-of-bag (OOB) observations are used in both individual and collective performance assessment for each tree. Trees are ranked based on their individual performance on the OOB observations. A certain number of top ranked trees is selected and starting from the most accurate tree, subsequent trees are added one by one and their impact is recorded by using the OOB observations left out from the bootstrap sample taken for the tree being added. A tree is selected if it improves predictive accuracy of the ensemble. 
In the second approach, trees are grown on random subsets of the training data taken without replacement (known as sub-bagging) instead of bootstrap samples (taken with replacement). The observations left out of each subset are used in both the individual and collective assessments of the corresponding tree, as in the first method. Analyses of 21 benchmark datasets and simulation studies show improved performance of the modified methods in comparison to OTE and other state-of-the-art methods.	翻訳日:2021-04-18 10:36:00 公開日:2020-12-30
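The rank-then-add selection procedure described in this abstract can be sketched as a greedy forward search. This is a simplified illustration, not the authors' method: each "tree" is represented only by its 0/1 predictions, and a single shared set of held-out observations stands in for the per-tree out-of-bag sets. The tie-breaking rule in the vote and all function names are assumptions.

```python
def accuracy(pred, truth):
    """Fraction of positions where the prediction matches the truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(member_preds):
    """Column-wise majority vote over binary predictions; ties go to class 1."""
    n = len(member_preds)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*member_preds)]


def select_trees(tree_preds, truth):
    """Rank trees by individual accuracy, then keep a tree only if it improves the ensemble."""
    ranked = sorted(
        range(len(tree_preds)),
        key=lambda i: accuracy(tree_preds[i], truth),
        reverse=True,
    )
    chosen = [ranked[0]]  # start from the most accurate tree
    best = accuracy(tree_preds[ranked[0]], truth)
    for i in ranked[1:]:
        trial = chosen + [i]
        score = accuracy(majority_vote([tree_preds[j] for j in trial]), truth)
        if score > best:  # accept only strict improvements
            chosen, best = trial, score
    return chosen, best


held_out_truth = [1, 1, 0, 0]
tree_preds = [
    [1, 0, 0, 0],  # misses the second positive
    [0, 1, 0, 0],  # misses the first positive
    [0, 0, 1, 1],  # wrong everywhere
]
chosen, ensemble_accuracy = select_trees(tree_preds, held_out_truth)
```

In this toy run the two complementary trees are kept and the uninformative one is rejected. Because acceptance requires a strict improvement, the ensemble can end up much smaller than the pool of grown trees.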
# (参考訳) マルチモーダルMR画像を用いた脳腫瘍セグメンテーションのためのH2NF-Net: BraTS Challenge 2020 セグメンテーションタスク第2位のソリューション H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task ( http://arxiv.org/abs/2012.15318v1 ) ライセンス: CC BY 4.0	Haozhe Jia, Weidong Cai, Heng Huang, Yong Xia	(参考訳) 本稿では,マルチモーダルMR画像中の脳腫瘍を分割するハイブリッド高分解能・非局所特徴ネットワーク(H2NF-Net)を提案する。我々のH2NF-Netは、単一のHNF-NetとカスケードされたHNF-Netを使用して、異なる脳腫瘍のサブリージョンを分割し、それらの予測を組み合わせて最終セグメンテーションとする。我々は、Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020データセットでモデルをトレーニングし、評価した。テストセットでの結果,単一モデルとカスケードモデルの組み合わせにより,造影腫瘍,腫瘍全体,腫瘍コアについてそれぞれ平均Diceスコア0.78751,0.91290,0.85461,ハウスドルフ距離(95%)26.57525,4.18426,4.97162を達成した。提案手法は,80名近い参加者のうち,BraTS 2020チャレンジのセグメンテーションタスクで2位となった。 In this paper, we propose a Hybrid High-resolution and Non-local Feature Network (H2NF-Net) to segment brain tumors in multimodal MR images. Our H2NF-Net uses single and cascaded HNF-Nets to segment different brain tumor sub-regions and combines the predictions as the final segmentation. We trained and evaluated our model on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset. The results on the test set show that the combination of the single and cascaded models achieved average Dice scores of 0.78751, 0.91290, and 0.85461, as well as Hausdorff distances (95%) of 26.57525, 4.18426, and 4.97162 for the enhancing tumor, whole tumor, and tumor core, respectively. Our method won second place in the BraTS 2020 challenge segmentation task out of nearly 80 participants.	翻訳日:2021-04-18 09:30:20 公開日:2020-12-30
# (参考訳) kōan: 修正CBOW実装 kōan: A Corrected CBOW Implementation ( http://arxiv.org/abs/2012.15332v1 ) ライセンス: CC BY-SA 4.0	Ozan İrsoy, Adrian Benton, Karl Stratos	(参考訳) NLPコミュニティでは、CBOW(continuous bag-of-words)単語埋め込みはスキップグラム(SG)埋め込みよりも性能が劣る傾向にあるという共通認識がある。この信念は、トレーニング目標の理論的差異よりも、公式実装の word2vec.c や Gensim などの標準ソフトウェアライブラリにおけるCBOW実装の欠陥に基づいていることが分かる。 CBOWの正しい実装は、学習が3倍以上速く、様々な内在的・外在的なタスクにおいてSGと完全に競合する単語埋め込みをもたらすことを示す。私たちは実装であるkōanを https://github.com/bloomberg/koan でリリースします。 It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives and more on faulty CBOW implementations in standard software libraries such as the official implementation word2vec.c and Gensim. We show that our correct implementation of CBOW yields word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks while being more than three times as fast to train. We release our implementation, kōan, at https://github.com/bloomberg/koan.	翻訳日:2021-04-18 09:06:21 公開日:2020-12-30
# (参考訳) ニューラルネットワークを用いた空間ガウス過程モデルの高速共分散パラメータ推定 Fast covariance parameter estimation of spatial Gaussian process models using neural networks ( http://arxiv.org/abs/2012.15339v1 ) ライセンス: CC BY 4.0	Florian Gerber and Douglas W. Nychka	(参考訳) ガウス過程(GP)は空間的に参照されたデータの一般的なモデルであり、記述的ステートメント、新しい場所での予測、新しいフィールドのシミュレーションを可能にする。共分散関数をパラメータ化するのにはいくつかのパラメータが十分であり、データからこれらのパラメータを推定するために最大可能性(ML)法が用いられる。しかし、mlメソッドは計算的に要求される。例えば、局所的な推定の場合、最小サイズのウィンドウに共分散モデルを適用することさえも、データ解析の典型的な計算資源を圧倒することができる。この制限は、ML推定を近似するためにニューラルネットワーク(NN)メソッドを使用するというアイデアを動機付けている。我々はnnを入力として適度な大きさの空間場または変量線を取り、範囲と信号間の共分散パラメータを返すように訓練する。トレーニングが完了すると、nnsはml推定と同等の精度で見積もりを提供し、100倍以上のスピードアップを行う。気候科学の応用によって動機付けられた特定の共分散推定問題に焦点をあてるが、この研究はより複雑で空間的な問題にも容易に拡張でき、計算統計学における機械学習の利用の実証となる。 Gaussian processes (GPs) are a popular model for spatially referenced data and allow descriptive statements, predictions at new locations, and simulation of new fields. Often a few parameters are sufficient to parameterize the covariance function, and maximum likelihood (ML) methods can be used to estimate these parameters from data. ML methods, however, are computationally demanding. For example, in the case of local likelihood estimation, even fitting covariance models on modest size windows can overwhelm typical computational resources for data analysis. This limitation motivates the idea of using neural network (NN) methods to approximate ML estimates. We train NNs to take moderate size spatial fields or variograms as input and return the range and noise-to-signal covariance parameters. Once trained, the NNs provide estimates with a similar accuracy compared to ML estimation and at a speedup by a factor of 100 or more. Although we focus on a specific covariance estimation problem motivated by a climate science application, this work can be easily extended to other, more complex, spatial problems and provides a proof-of-concept for this use of machine learning in computational statistics.	翻訳日:2021-04-18 08:56:26 公開日:2020-12-30
# (参考訳) ビデオモザイクアプリケーションにおける情報重なりフレームのアクティブアノテーション Active Annotation of Informative Overlapping Frames in Video Mosaicking Applications ( http://arxiv.org/abs/2012.15343v1 ) ライセンス: CC BY 4.0	Loic Peter, Marcel Tella-Amo, Dzhoshkun Ismail Shakir, Jan Deprest, Sebastien Ourselin, Juan Eugenio Iglesias, Tom Vercauteren	(参考訳) ビデオモザイクは、再建されたシーンのグローバルな一貫性を確保するために、シーケンス内の遠くのタイムポイントに位置する重なり合うフレームを登録する必要がある。しかし,このような長距離ペアの完全自動登録は,画像自体の登録が困難である場合には困難である。本稿では,配列内の長距離対対応のアクティブアノテーションのための効率的なフレームワークを提案する。当社のフレームワークでは,提案する各ペアに視覚的対応を提供するoracleエージェント(例えば,人間ユーザや信頼できるマッチングアルゴリズム)に情報提供を求めるイメージのペアを提案している。フレームオーバーラップの2つの相補的およびオンライン適応可能なモデルと組み合わせた、原則付きアノテーション報酬に基づく反復戦略に基づいて、インフォーマティブペアを検索する。モザイクの効率的な構築に加えて、我々のフレームワークは、評価や学習目的で使用できる副産物として、地上の真実のランドマーク対応を提供する。本手法は, 人工的およびインタラクティブなシナリオにおいて, 合成配列の実験, 航空画像用データセット, 胎児手術時の胎盤モザイク用臨床データセットを用いて評価した。 Video mosaicking requires the registration of overlapping frames located at distant timepoints in the sequence to ensure global consistency of the reconstructed scene. However, fully automated registration of such long-range pairs is (i) challenging when the registration of images itself is difficult; and (ii) computationally expensive for long sequences due to the large number of candidate pairs for registration. In this paper, we introduce an efficient framework for the active annotation of long-range pairwise correspondences in a sequence. Our framework suggests pairs of images that are sought to be informative to an oracle agent (e.g., a human user, or a reliable matching algorithm) who provides visual correspondences on each suggested pair. Informative pairs are retrieved according to an iterative strategy based on a principled annotation reward coupled with two complementary and online adaptable models of frame overlap. In addition to the efficient construction of a mosaic, our framework provides, as a by-product, ground truth landmark correspondences which can be used for evaluation or learning purposes. 
We evaluate our approach in both automated and interactive scenarios via experiments on synthetic sequences, on a publicly available dataset for aerial imaging and on a clinical dataset for placenta mosaicking during fetal surgery.	翻訳日:2021-04-18 08:34:37 公開日:2020-12-30
# (参考訳) DynaSent: 感情分析のための動的ベンチマーク DynaSent: A Dynamic Benchmark for Sentiment Analysis ( http://arxiv.org/abs/2012.15349v1 ) ライセンス: CC BY 4.0	Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela	(参考訳) DynaSent ('Dynamic Sentiment') は,三者間感情分析(正・負・中性)のための新しい英語ベンチマークタスクである。 DynaSentは自然に発生する文とオープンソースのDynabenchプラットフォームを使って作成された文を組み合わせる。 DynaSentには合計121,634の文があり、それぞれが5人のクラウドワーカーによって検証されており、その開発とテストの分割は、私たちが開発した最高のモデルであっても、チャンスパフォーマンスを生み出すように設計されている。ここでは、データセットの作成作業について報告し、品質の向上とアーティファクトの削減に要したステップに注目します。 DynaSentの中性カテゴリが他のベンチマークの同等のカテゴリよりも一貫性があるという証拠も提示し、連続的な微調整よりも各ラウンドでモデルをスクラッチから訓練することの根拠を示す。 We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilitates human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers, and its development and test splits are designed to produce chance performance for even the best models we have been able to develop; when future models solve this task, we will use them to create DynaSent version 2, continuing the dynamic evolution of this benchmark. Here, we report on the dataset creation effort, focusing on the steps we took to increase quality and reduce artifacts. We also present evidence that DynaSent's Neutral category is more coherent than the comparable category in other benchmarks, and we motivate training models from scratch for each round over successive fine-tuning.	翻訳日:2021-04-18 07:52:42 公開日:2020-12-30
# (参考訳) bert(および他のトランスフォーマーモデル)埋め込みによる文脈的意味的特徴の導出 Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings ( http://arxiv.org/abs/2012.15353v1 ) ライセンス: CC BY 4.0	Jacob Turton, David Vinson, Robert Elliott Smith	(参考訳) BERTのようなトランスフォーマーアーキテクチャに基づくモデルは、自然言語処理の分野で重要な一歩を踏み出した。重要なことは、文脈における単語に関する重要な意味情報をキャプチャする単語埋め込みの作成を可能にすることである。しかし、単一の実体として、これらの埋め込みは解釈が難しく、それらを作成するのに使われたモデルは不透明であると説明されている。 Binderと同僚は直感的な埋め込み空間を提案し、それぞれの次元は65のコアセマンティックな特徴の1つに基づいている。残念ながら、スペースは535ワードの小さなデータセットでしか存在せず、使用は制限されている。以前の研究(Utsumi, 2018, 2020, Turton, Vinson & Smith, 2020)では、Binderの機能は静的な埋め込みから派生し、大きな新しい語彙への外挿に成功した。次のステップとして,Binder の機能は BERT 埋め込み空間から導出可能であることを示す。これはコンテキスト化されたBinder埋め込みを提供し、コンテキスト内の単語間の意味的差異を理解するのに役立つ。さらに、BERTモデルの異なるレイヤ間でセマンティック機能がどのように表現されるかについての洞察も提供する。 Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information about words in context. However, as single entities, these embeddings are difficult to interpret and the models used to create them have been described as opaque. Binder and colleagues proposed an intuitive embedding space where each dimension is based on one of 65 core semantic features. Unfortunately, the space only exists for a small dataset of 535 words, limiting its uses. Previous work (Utsumi, 2018, 2020, Turton, Vinson & Smith, 2020) has shown that Binder features can be derived from static embeddings and successfully extrapolated to a large new vocabulary. Taking the next step, this paper demonstrates that Binder features can be derived from the BERT embedding space. This provides contextualised Binder embeddings, which can aid in understanding semantic differences between words in context. It additionally provides insights into how semantic features are represented across the different layers of the BERT model.	
翻訳日:2021-04-18 07:28:11 公開日:2020-12-30
# (参考訳) OSTeC:ワンショットテクスチャコンプリート OSTeC: One-Shot Texture Completion ( http://arxiv.org/abs/2012.15370v1 ) ライセンス: CC BY 4.0	Baris Gecer, Jiankang Deng, Stefanos Zafeiriou	(参考訳) ここ数年は、高品質なフォトリアリスティックな顔画像の合成において、非線形生成モデルが大きな成功を収めている。最近の3D顔テクスチャの再構築と、1つの画像アプローチからのポーズ操作は、画像から画像へのジェネレータネットワーク(GAN)をトレーニングする大規模でクリーンな顔データセットに依存している。しかし、このような大規模な高解像度3dテクスチャデータセットの収集は、年齢と民族のバランスを維持するのに非常にコストがかかり、困難である。さらに、回帰に基づくアプローチは、中間条件への一般化に悩まされ、目標像への微調整ができない。本研究では,大規模なテクスチャデータセットを必要とせず,むしろ2d顔生成器に格納された知識を活用する,ワンショット3d顔テクスチャ補完のための教師なしアプローチを提案する。提案手法は、3次元の入力画像を回転させ、可視部に基づいて2次元顔生成器で回転画像を再構成することにより、未検出領域を充填する。最後に、UV画像平面の異なる角度で最も目に見えるテクスチャを縫い合わせる。さらに,完成したテクスチャをジェネレータに投影することで,対象画像をフロンダリゼーションする。定性的かつ定量的な実験により,完成したuvテクスチャとフロントイメージは高品質であり,元のアイデンティティに類似しており,3dmmフィッティングのためのテクスチャganモデルのトレーニングやポーズ不変顔認識の改善に使用することができる。 The last few years have witnessed the great success of non-linear generative models in synthesizing high-quality photorealistic face images. Many recent 3D facial texture reconstruction and pose manipulation from a single image approaches still rely on large and clean face datasets to train image-to-image Generative Adversarial Networks (GANs). Yet the collection of such a large scale high-resolution 3D texture dataset is still very costly and difficult to maintain age/ethnicity balance. Moreover, regression-based approaches suffer from generalization to the in-the-wild conditions and are unable to fine-tune to a target-image. In this work, we propose an unsupervised approach for one-shot 3D facial texture completion that does not require large-scale texture datasets, but rather harnesses the knowledge stored in 2D face generators. The proposed approach rotates an input image in 3D and fill-in the unseen regions by reconstructing the rotated image in a 2D face generator, based on the visible parts. Finally, we stitch the most visible textures at different angles in the UV image-plane. Further, we frontalize the target image by projecting the completed texture into the generator. 
The qualitative and quantitative experiments demonstrate that the completed UV textures and frontalized images are of high quality, resemble the original identity, and can be used to train a texture GAN model for 3DMM fitting and improve pose-invariant face recognition.	翻訳日:2021-04-18 07:04:31 公開日:2020-12-30
# (参考訳) 自己監督型機能距離を用いたモデルベース視覚計画 Model-Based Visual Planning with Self-Supervised Functional Distances ( http://arxiv.org/abs/2012.15373v1 ) ライセンス: CC BY 4.0	Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine	(参考訳) 汎用ロボットはその環境の中で様々なタスクを完了できなければならない。各タスクを指定できるひとつの魅力的な方法は、ゴールの観察である。しかし、特に手書きの報酬関数が利用できない場合、強化学習による目標達成政策の学習は難しい問題である。学習されたダイナミクスモデルは、報酬やタスク指向データなしで環境について学ぶための有望なアプローチであるが、そのようなモデルで目標に到達する計画には、観測と目標状態の間の機能的類似性の概念が必要である。本稿では,視覚力学モデルとモデルフリー強化学習を用いて学習した動的距離関数を併用した,モデルベース視覚目標到達のための自己教師あり手法を提案する。当社のアプローチは、オフラインでラベルのないデータを使用して完全に学習し、大規模で多様なデータセットにスケールすることが現実的になります。実験では,実世界のロボットを用いて,様々なタスクを遂行するモデル,ロボットアームをシミュレートした不注意な物体を移動させるモデル,さらには引き出しの開閉を学習する手法が有効であることを見出した。比較すると,本手法はモデルフリーとモデルベース先行手法の両方で大幅に優れていた。ビデオとビジュアライゼーションは以下の通りである。 A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when hand-engineered reward functions are not available. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model as well as a dynamical distance function learned using model-free reinforcement learning. Our approach learns entirely using offline, unlabeled data, making it practical to scale to large and diverse datasets. In our experiments, we find that our method can successfully learn models that perform a variety of tasks at test-time, moving objects amid distractors with a simulated robotic arm and even learning to open and close a drawer using a real-world robot. 
In comparisons, we find that this approach substantially outperforms both model-free and model-based prior methods. Videos and visualizations are available here: http://sites.google.com/berkeley.edu/mbold.	翻訳日:2021-04-18 06:40:12 公開日:2020-12-30
# 普遍視覚指導による正確な単語表現 Accurate Word Representations with Universal Visual Guidance ( http://arxiv.org/abs/2012.15086v1 ) ライセンス: Link先を確認	Zhuosheng Zhang, Haojie Yu, Hai Zhao, Rui Wang, Masao Utiyama	(参考訳) 単語表現は、ニューラルネットワーク理解モデルの基本コンポーネントである。近年,事前学習型言語モデル (PrLM) は,文脈化語表現の新しいパフォーマンス手法を提供する。 prlmは一般に、非文脈化モデルよりも正確な文脈化単語表現を提供するが、マルチモーダリティから単語表現のヒントが多様でないテキストコンテキストの列にはまだ従わない。そこで本稿では,視覚指導から従来の単語埋め込みを視覚的に強調する視覚的表現法を提案する。詳細は,各単語が多様な関連画像に対応するマルチモーダルシードデータセットから,小規模の単語画像辞書を構築する。テキストとペア画像は並列に符号化され、次にマルチモーダル表現を統合するアテンション層が続く。本手法は曖昧さの精度を大幅に向上させる。 12の自然言語理解および機械翻訳タスクの実験により,提案手法の有効性と一般化能力がさらに検証された。 Word representation is a fundamental component in neural language understanding models. Recently, pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally give more accurate contextualized word representations than non-contextualized models do, they are still subject to a sequence of text contexts without diverse hints for word representation from multimodality. This paper thus proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images. The texts and paired images are encoded in parallel, followed by an attention layer to integrate the multimodal representations. We show that the method substantially improves the accuracy of disambiguation. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.	翻訳日:2021-04-18 06:08:40 公開日:2020-12-30
# 順序の乱れ:自然言語理解タスクにおける文中の単語の逐次順序はどの程度重要か? Out of Order: How important is the sequential order of words in a sentence in Natural Language Understanding tasks? ( http://arxiv.org/abs/2012.15180v1 ) ライセンス: Link先を確認	Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen	(参考訳) 最先端の自然言語理解モデルは、シーケンスの最も重要な特徴の1つである単語の順序を考慮しているのか? 必ずしもそうではない! 入力単語がランダムにシャッフルされた後も,多くのGLUEタスクで訓練されたBERTベースの分類器の正しい予測の75%から90%が変化しなかった。 BERTの埋め込みは文脈を反映することで知られているが、各単語の下流タスクへの貢献は、単語のコンテキストがシャッフルされた後もほとんど変化しない。 BERTベースのモデルは、表面的な手がかり(例えば、感情分析におけるキーワードの感情、あるいは自然言語推論におけるシーケンスペア入力間の単語単位の類似性)を利用して、トークンがランダムな順序で並べられた場合でも正しい決定を下すことができる。単語順序情報を捉えるよう分類器に促すことで、ほとんどのGLUEタスク、SQuAD 2.0、および分布外サンプルでの性能が向上する。我々の研究は、多くのGLUEタスクが文の意味を理解することを機械に要求していないことを示唆している。 Do state-of-the-art natural language understanding models care about word order - one of the most important characteristics of a sequence? Not always! We found 75% to 90% of the correct predictions of BERT-based classifiers, trained on many GLUE tasks, remain constant after input words are randomly shuffled. Although BERT embeddings are famously contextual, the contribution of each individual word to downstream tasks is almost unchanged even after the word's context is shuffled. BERT-based models are able to exploit superficial cues (e.g. the sentiment of keywords in sentiment analysis; or the word-wise similarity between sequence-pair inputs in natural language inference) to make correct decisions when tokens are arranged in random orders. Encouraging classifiers to capture word order information improves the performance on most GLUE tasks, SQuAD 2.0 and out-of-samples. Our work suggests that many GLUE tasks are not challenging machines to understand the meaning of a sentence.	翻訳日:2021-04-18 06:08:28 公開日:2020-12-30
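The shuffling probe described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: `shuffle_words`, `fraction_stable`, and the toy `keyword_classifier` are hypothetical names introduced here, and the paper's exact protocol (tokenization, n-gram shuffles, model choice) is not reproduced. The sketch only shows the idea of measuring how many originally correct predictions survive a random reordering of the input words.

```python
import random


def shuffle_words(sentence, seed=0):
    """Return the sentence with its whitespace-separated words in random order."""
    words = sentence.split()
    rng = random.Random(seed)  # seeded so the probe is repeatable
    rng.shuffle(words)
    return " ".join(words)


def fraction_stable(classify, examples, seed=0):
    """Among examples the classifier gets right, return the fraction that it
    still gets right after the input words are shuffled."""
    correct = [(s, y) for s, y in examples if classify(s) == y]
    if not correct:
        return 0.0
    stable = sum(classify(shuffle_words(s, seed)) == y for s, y in correct)
    return stable / len(correct)


def keyword_classifier(sentence):
    """Hypothetical stand-in for a trained model: a bag-of-words sentiment
    rule that, by construction, ignores word order entirely."""
    positive = {"great", "good", "love"}
    words = [w.lower().strip(".,!?") for w in sentence.split()]
    return "pos" if any(w in positive for w in words) else "neg"


examples = [("I love this great movie", "pos"), ("this was a dull film", "neg")]
stability = fraction_stable(keyword_classifier, examples)
```

A classifier that relies only on keyword cues is fully order-invariant, so its stability here is 1.0; the abstract reports that BERT-based classifiers land surprisingly close to this regime on many GLUE tasks.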
# メタ認知による言語キャリブレーション:対話エージェント応答と期待された正しさの整合 Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness ( http://arxiv.org/abs/2012.14983v1 ) ライセンス: Link先を確認	Sabrina J. Mielke, Arthur Szlam, Y-Lan Boureau, Emily Dinan	(参考訳) オープンドメインの対話エージェントは大幅に改善されているが、直観的な質問に対して自信を持って知識を暗示したり、疑問を呈したりする。疑わしい(または自信のある)言語表現は、モデルの答えが正しくない(または正しい)可能性と一致するか? この意味でこれらのモデルは校正が不十分であることがわかったが、モデル内の表現は正確さの確率を正確に予測するために使用できることを示した。制御可能な生成モデルのトレーニングにこれらの正確性予測を組み込むことで、言語キャリブレーションを大幅に改善した対話エージェントを得る。 Open-domain dialogue agents have vastly improved, but still confidently hallucinate knowledge or express doubt when asked straightforward questions. In this work, we analyze whether state-of-the-art chit-chat models can express metacognition capabilities through their responses: does a verbalized expression of doubt (or confidence) match the likelihood that the model's answer is incorrect (or correct)? We find that these models are poorly calibrated in this sense, yet we show that the representations within the models can be used to accurately predict likelihood of correctness. By incorporating these correctness predictions into the training of a controllable generation model, we obtain a dialogue agent with greatly improved linguistic calibration.	翻訳日:2021-04-18 06:08:11 公開日:2020-12-30
# テーブル上のオープンファクトチェックのための共同検証とリランク Joint Verification and Reranking for Open Fact Checking Over Tables ( http://arxiv.org/abs/2012.15115v1 ) ライセンス: Link先を確認	Michael Schlichtkrull, Vladimir Karpukhin, Barlas Oğuz, Mike Lewis, Wen-tau Yih, Sebastian Riedel	(参考訳) 構造化情報は事実クレームの自動検証のための重要な知識源である。しかし,本研究の大部分はテキストデータに重点を置いており,近年では各クレームに対する適切な証拠が既に回収されていると推定されるクローズド・ドメイン・セッティングに関する調査も行われている。本稿では,オープンドメイン設定における構造化データに対する検証について検討し,検証コンポーネント内の証拠文書を融合する検証モデルを導入する。我々のオープンドメインモデルは、TabFactデータセットのクローズドドメイン状態に匹敵するパフォーマンスを実現し、複数のテーブルを含めることによるパフォーマンス向上と、ヒューリスティックな検索ベースラインに対する大幅な改善を示す。 Structured information is an important knowledge source for automatic verification of factual claims. Nevertheless, the majority of existing research into this task has focused on textual data, and the few recent inquiries into structured data have been for the closed-domain setting where appropriate evidence for each claim is assumed to have already been retrieved. In this paper, we investigate verification over structured data in the open-domain setting, introducing a joint reranking-and-verification model which fuses evidence documents in the verification component. Our open-domain model achieves performance comparable to the closed-domain state-of-the-art on the TabFact dataset, and demonstrates performance gains from the inclusion of multiple tables as well as a significant improvement over a heuristic retrieval baseline.	翻訳日:2021-04-18 06:08:00 公開日:2020-12-30
# ペシミズムはオフラインRLに対して証明可能な形で効率的か? Is Pessimism Provably Efficient for Offline RL? ( http://arxiv.org/abs/2012.15085v1 ) ライセンス: Link先を確認	Ying Jin, Zhuoran Yang, Zhaoran Wang	(参考訳) 本研究では,事前収集したデータセットに基づく最適ポリシー学習を目的としたオフライン強化学習(RL)について検討する。環境とのさらなる相互作用が欠如しているため、オフラインのRLはデータセットのカバー不足に悩まされ、既存の理論分析の多くでは扱えない。本稿では,不確かさ量化器をペナルティ関数として組み込んだ値反復アルゴリズムの悲観的変種(PEVI)を提案する。このようなペナルティ関数は、オンラインRLの探索を促進するためのボーナス関数の符号をひっくり返すだけで、一般的な関数近似器と容易に実装でき、互換性がある。データセットの十分なカバレッジを仮定せずに、一般的なマルコフ決定プロセス(MDPs)に対するPEVIの準最適性にデータ依存的な上限を確立する。線形 MDP に特化する場合、情報理論の下界と次元および地平線の乗法的因子を除いて一致する。言い換えれば、悲観主義は証明可能な効率だけでなく、ミニマックス最適でもある。特にデータセットが与えられた場合、学習されたポリシは、他のポリシが改善できないため、すべてのポリシの中で「最善の努力」として機能します。我々の理論的分析は, データセットによって十分にカバーされず, 最適方針にとって情報を持たない「無関係」な軌道から生じる疑似相関の概念を排除する上で, 悲観主義が果たす重要な役割を解明するものである。 We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. Such a penalty function simply flips the sign of the bonus function for promoting exploration in online RL, which makes it easily implementable and compatible with general function approximators. Without assuming the sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the "best effort" among all policies, as no other policies can do better. 
Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from the "irrelevant" trajectories that are less covered by the dataset and not informative for the optimal policy.	翻訳日:2021-04-18 06:07:45 公開日:2020-12-30
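To make the "flip the sign of the bonus" idea in the abstract above concrete, here is a minimal tabular sketch of pessimistic value iteration. It is an illustration under stated assumptions, not the paper's algorithm as published: the function name `pessimistic_value_iteration`, the stationary empirical model, and the count-based penalty `beta / sqrt(n)` are choices made here for a small finite MDP with rewards in [0, 1]; the abstract only says that an uncertainty quantifier is subtracted as a penalty.

```python
import math
from collections import defaultdict


def pessimistic_value_iteration(dataset, n_states, n_actions, horizon, beta=1.0):
    """Backward value iteration on an empirical model built from offline
    transitions (s, a, r, s_next), subtracting an uncertainty penalty.

    Assumption made for this sketch: the penalty is beta / sqrt(visit count)."""
    counts = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for s, a, r, s_next in dataset:
        counts[(s, a)] += 1
        reward_sum[(s, a)] += r
        next_counts[(s, a)][s_next] += 1

    value = [0.0] * n_states  # value after the final step is zero
    policy = []
    for h in range(horizon, 0, -1):
        q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[(s, a)]
                if n == 0:
                    continue  # never observed: keep the most pessimistic value, 0
                r_hat = reward_sum[(s, a)] / n
                expected_next = sum(
                    c / n * value[s2] for s2, c in next_counts[(s, a)].items()
                )
                penalty = beta / math.sqrt(n)  # bonus with its sign flipped
                # Clip to the range of achievable returns from step h onward.
                q[s][a] = min(max(r_hat + expected_next - penalty, 0.0), horizon - h + 1)
        greedy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
        value = [q[s][greedy[s]] for s in range(n_states)]
        policy.append(greedy)
    policy.reverse()  # policy[h - 1][s] is the action at step h in state s
    return policy, value


# One state, two actions: action 0 is well covered, action 1 was seen once.
data = [(0, 0, 0.6, 0)] * 100 + [(0, 1, 1.0, 0)]
pessimistic_policy, pessimistic_value = pessimistic_value_iteration(data, 1, 2, horizon=1)
greedy_policy, _ = pessimistic_value_iteration(data, 1, 2, horizon=1, beta=0.0)
```

In this toy dataset the single lucky observation of action 1 would win without the penalty (`beta=0.0`), while the pessimistic version prefers the well-covered action 0. This mirrors, in a very small setting, the spurious-correlation issue the abstract attributes to poorly covered trajectories.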
# ERICA:コントラスト学習による事前学習言語モデルのエンティティと関係理解の改善 ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning ( http://arxiv.org/abs/2012.15022v1 ) ライセンス: Link先を確認	Yujia Qin, Yankai Lin, Ryuichi Takanobu, Zhiyuan Liu, Peng Li, Heng Ji, Minlie Huang, Maosong Sun, Jie Zhou	(参考訳) 事前訓練された言語モデル(PLM)は、様々な下流自然言語処理(NLP)タスクで高いパフォーマンスを示している。しかし、plmはテキスト中の事実の知識をうまく捉えられないため、テキスト全体の理解、特に文書レベルの言語理解タスクにおいて重要である。この問題に対処するため,本研究では,ERICA という新たなコントラスト学習フレームワークを事前学習段階で提案し,エンティティとその関係をテキストでより深く理解する。具体的には、(1)エンティティをよりよく理解するために、与えられたヘッドエンティティと関係によって推測できる末尾エンティティを区別するエンティティ識別タスクを提案する。 2)関係をよりよく理解するために、2つのエンティティペアが関係セマンティクスにおいて近接しているか否かを区別する関係識別タスクを用いる。実験の結果,本フレームワークは,関係抽出や読解といった文書レベルの言語理解タスクにおいて,特に低リソース環境下で一貫した改善を実現していることがわかった。一方、ERICAは文レベルのタスクで同等またはより良いパフォーマンスを達成する。さらなる研究のために、データセット、ソースコード、事前学習された言語モデルをリリースします。 Pre-trained Language Models (PLMs) have shown strong performance in various downstream Natural Language Processing (NLP) tasks. However, PLMs still cannot well capture the factual knowledge in the text, which is crucial for understanding the whole text, especially for document-level language understanding tasks. To address this issue, we propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text. Specifically, (1) to better understand entities, we propose an entity discrimination task that distinguishes which tail entity can be inferred by the given head entity and relation. (2) Besides, to better understand relations, we employ a relation discrimination task which distinguishes whether two entity pairs are close or not in relational semantics. Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks, including relation extraction and reading comprehension, especially under low resource setting. 
Meanwhile, ERICA achieves comparable or better performance on sentence-level tasks. We will release the datasets, source codes and pre-trained language models for further research explorations.	翻訳日:2021-04-18 06:07:23 公開日:2020-12-30
# SemGloVe:BERTによるGloVeのセマンティック共同発生 SemGloVe: Semantic Co-occurrences for GloVe from BERT ( http://arxiv.org/abs/2012.15197v1 ) ライセンス: Link先を確認	Leilei Gan, Zhiyang Teng, Yue Zhang, Linchao Zhu, Fei Wu, Yi Yang	(参考訳) GloVeは単語共起行列から統計情報を活用することで単語埋め込みを学習する。しかし、行列中の単語ペアは、定義済みのローカルコンテキストウィンドウから抽出され、限定された単語ペアと潜在的に意味のない単語ペアにつながる可能性がある。本稿では,BERTから静的なGloVe単語の埋め込みに意味的共起を蒸留するSemGloVeを提案する。特に,マスク付き言語モデルと多頭部注意重みに基づく共起統計を抽出する2つのモデルを提案する。提案手法は,局所的なウィンドウ仮定によって制限されることなく単語ペアを抽出し,単語ペア間の意味的距離を直接考慮して共起重みを定義できる。いくつかの単語類似性データセットと4つの外部タスクの実験は、SemGloVeがGloVeより優れていることを示している。 GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices. However, word pairs in the matrices are extracted from a predefined local context window, which might lead to limited word pairs and potentially semantic irrelevant word pairs. In this paper, we propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings. Particularly, we propose two models to extract co-occurrence statistics based on either the masked language model or the multi-head attention weights of BERT. Our methods can extract word pairs without limiting by the local window assumption and can define the co-occurrence weights by directly considering the semantic distance between word pairs. Experiments on several word similarity datasets and four external tasks show that SemGloVe can outperform GloVe.	翻訳日:2021-04-18 06:07:05 公開日:2020-12-30
# 対話システムにおける言語理解のロバストネステスト Robustness Testing of Language Understanding in Dialog Systems ( http://arxiv.org/abs/2012.15262v1 ) ライセンス: Link先を確認	Jiexi Liu, Ryuichi Takanobu, Jiaxin Wen, Dazhen Wan, Weiran Nie, Hongyan Li, Cheng Li, Wei Peng, Minlie Huang	(参考訳) ダイアログシステムにおけるほとんどの言語理解モデルは、少量の注釈付きトレーニングデータに基づいて訓練され、同じ分布から小さなセットで評価される。しかし、これらのモデルが実際に自然摂動にさらされると、システム障害や望ましくない出力につながる可能性がある。本稿では,自然言語理解モデルの頑健性に関する包括的評価と分析を行い,実世界の対話システムにおける言語理解に関する3つの重要な側面,すなわち言語多様性,音声特性,雑音の摂動について述べる。本稿では,対話システムにおけるロバスト性問題をテストするために,自然な摂動を近似するモデル非依存ツールキットLAUGを提案する。この3つの側面をカバーする4つのデータ拡張アプローチがlaugで組み立てられ、最先端モデルにおける重要な堅牢性問題を明らかにする。 LAUGによる拡張データセットは、ダイアログシステムにおける言語理解の堅牢性テストの今後の研究を促進するために使用できる。 Most language understanding models in dialog systems are trained on a small amount of annotated training data, and evaluated in a small set from the same distribution. However, these models can lead to system failure or undesirable outputs when being exposed to natural perturbation in practice. In this paper, we conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models, and introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation. We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems. Four data augmentation approaches covering the three aspects are assembled in LAUG, which reveals critical robustness issues in state-of-the-art models. The augmented dataset through LAUG can be used to facilitate future research on the robustness testing of language understanding in dialog systems.	翻訳日:2021-04-18 06:06:51 公開日:2020-12-30
# グラフ-テキスト問題としての地図からのランドマークナビゲーション命令の生成 Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem ( http://arxiv.org/abs/2012.15329v1 ) ライセンス: Link先を確認	Raphael Schumann and Stefan Riezler	(参考訳) 自動車に焦点をあてたナビゲーションサービスは、名前付き通りの曲がり角と距離に基づいており、人間によって自然に使用されるナビゲーション指示は、ランドマークと呼ばれる物理的オブジェクトを中心にしている。本稿では,OpenStreetMap表現を入力とし,人間の自然言語命令から可視的かつ健全なランドマークを含むナビゲーション命令を生成するニューラルネットワークを提案する。地図上の経路は、自然言語命令にデコードされる位置および回転不変グラフ表現に符号化される。われわれの研究は、ストリートビューで人間のナビゲーションによって検証された7,672件のクラウドソースインスタンスのデータセットに基づいている。評価の結果,本システムで生成したナビゲーション命令は,人間が生成した命令と類似しており,ストリートビューでのナビゲーションが成功していることがわかった。 Car-focused navigation services are based on turns and distances of named streets, whereas navigation instructions naturally used by humans are centered around physical objects called landmarks. We present a neural model that takes OpenStreetMap representations as input and learns to generate navigation instructions that contain visible and salient landmarks from human natural language instructions. Routes on the map are encoded in a location- and rotation-invariant graph representation that is decoded into natural language instructions. Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View. Our evaluation shows that the navigation instructions generated by our system have similar properties as human-generated instructions, and lead to successful human navigation in Street View.	翻訳日:2021-04-18 06:06:36 公開日:2020-12-30
# 口語ニューラルマシン翻訳のための合成ソース言語拡張 Synthetic Source Language Augmentation for Colloquial Neural Machine Translation ( http://arxiv.org/abs/2012.15178v1 ) ライセンス: Link先を確認	Asrul Sani Ariesandy, Mukhlis Amien, Alham Fikri Aji, Radityo Eko Prasojo	(参考訳) ニューラルネットワーク翻訳(NMT)は通常ドメインに依存し、スタイルに依存し、多くのトレーニングデータを必要とする。最先端のNMTモデルは、しばしばソース言語の語彙的バリエーションを扱うのに不足しており、この点において並列データの欠如は、既存のモデルを体系的に改善する上で難しいハードルである。そこで本研究では,youtube と twitter から収集したインドネシア英語テストセットを開発した。インドネシア語正規語のソースに対して合成スタイル拡張を行い、新しいテストデータよりもベースラインId-Enモデル(BLEU)を改善したことを示す。 Neural machine translation (NMT) is typically domain-dependent and style-dependent, and it requires lots of training data. State-of-the-art NMT models often fall short in handling colloquial variations of its source language and the lack of parallel data in this regard is a challenging hurdle in systematically improving the existing models. In this work, we develop a novel colloquial Indonesian-English test-set collected from YouTube transcript and Twitter. We perform synthetic style augmentation to the source of formal Indonesian language and show that it improves the baseline Id-En models (in BLEU) over the new test data.	翻訳日:2021-04-18 06:06:21 公開日:2020-12-30
# 小さなデータセット上でのより深いトランスフォーマーの最適化:テキストからsqlへの意味解析への応用 Optimizing Deeper Transformers on Small Datasets: An Application on Text-to-SQL Semantic Parsing ( http://arxiv.org/abs/2012.15355v1 ) ライセンス: Link先を確認	Peng Xu, Wei Yang, Wenjie Zi, Keyi Tang, Chengyang Huang, Jackie Chi Kit Cheung, Yanshuai Cao	(参考訳) スクラッチからディープトランスフォーマーをトレーニングするには大きなデータセットが必要であるという一般的な信念のため、人々は小さなデータセットを微調整する際、トレーニング済みのモデルの上に浅い層と単純な層しか使用しない。適切な初期化とトレーニング技術によって、非常に深いトランスフォーマーの利点は、小さなデータセットを使用しても、ハードな構造的予測タスクに引き継がれることが示されます。特に,意味解析タスクのために48層のトランスフォーマーをトレーニングした。これらは、予め訓練されたRoBERTaの24層と、スクラッチから訓練された24層からなる。トレーニングステップが少なく、タスク固有の事前トレーニングがないため、挑戦的なクロスドメインのText-to-SQLセマンティックパーシングベンチマークであるSpider上で、アートパフォーマンスの状態を取得する。我々は、従来のT-Fixup作業に触発された新しいデータ依存トランスフォーマー固定更新初期化スキーム(DT-Fixup)を導出した。さらなる誤差解析により、変圧器モデルの深さを増大させることで、推論や構造的理解を必要とするケースの一般化が向上することを示した。 Due to the common belief that training deep transformers from scratch requires large datasets, people usually only use shallow and simple additional layers on top of pre-trained models during fine-tuning on small datasets. We provide evidence that this does not always need to be the case: with proper initialization and training techniques, the benefits of very deep transformers are shown to carry over to hard structural prediction tasks, even using small datasets. In particular, we successfully train 48 layers of transformers for a semantic parsing task. These comprise 24 fine-tuned transformer layers from pre-trained RoBERTa and 24 relation-aware transformer layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain the state of the art performance on the challenging cross-domain Text-to-SQL semantic parsing benchmark Spider. We achieve this by deriving a novel Data dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. 
Further error analysis demonstrates that increasing the depth of the transformer model can help improve generalization on the cases requiring reasoning and structural understanding.	翻訳日:2021-04-18 06:06:09 公開日:2020-12-30
# 不自然な言語推論 Unnatural Language Inference ( http://arxiv.org/abs/2101.00010v1 ) ライセンス: Link先を確認	Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams	(参考訳) 自然言語理解は、大規模な事前学習されたトランスフォーマーネットワークの導入によって、分岐した瞬間を目撃している。これらのモデルは、特に自然言語推論(NLI)を含む様々なタスクで最先端を達成する。多くの研究は、モデルに埋め込まれた大きな表現空間がいくつかの構文情報や意味情報を符号化していることを示した。しかし、本当に「構文を知る」ためには、モデルはその入力が構文規則に違反していることを認識し、それに従って推論を計算する必要がある。本稿では,roberta や bart のような最先端 nli モデルが,無作為に並べ替えられた単語の例に対して不変であり,時にはよりよく機能することを示す。反復探索により、原語と同じ単語で置換された仮説前提ペアを含むNLIテストセットのランダム化バージョンを構築することができるが、大きな事前訓練されたモデルや、変換前の最先端エンコーダによって完全な精度で分類できる。問題は言語であり,モデル不変であり,それゆえ根本原因を考察する。この効果を部分的に緩和するために,簡単なトレーニング手法を提案する。我々の発見は、自然言語理解モデルと、その進捗を測定するために使われるタスクが、本当に人間のような構文理解を必要とするという考えに疑問を投げかけている。 Natural Language Understanding has witnessed a watershed moment with the introduction of large pre-trained Transformer networks. These models achieve state-of-the-art on various tasks, notably including Natural Language Inference (NLI). Many studies have shown that the large representation space imbibed by the models encodes some syntactic and semantic information. However, to really "know syntax", a model must recognize when its input violates syntactic rules and calculate inferences accordingly. In this work, we find that state-of-the-art NLI models, such as RoBERTa and BART are invariant to, and sometimes even perform better on, examples with randomly reordered words. With iterative search, we are able to construct randomized versions of NLI test sets, which contain permuted hypothesis-premise pairs with the same words as the original, yet are classified with perfect accuracy by large pre-trained models, as well as pre-Transformer state-of-the-art encoders. We find the issue to be language and model invariant, and hence investigate the root cause. To partially alleviate this effect, we propose a simple training methodology. 
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.	翻訳日:2021-04-18 06:05:47 公開日:2020-12-30
# ホップワイズ・アテンションを用いた適応グラフ拡散ネットワーク Adaptive Graph Diffusion Networks with Hop-wise Attention ( http://arxiv.org/abs/2012.15024v1 ) ライセンス: Link先を確認	Chuxiong Sun, Guoshi Wu	(参考訳) グラフニューラルネットワーク(GNN)は近年注目を集め、多くの分野で最先端のパフォーマンスを実現している。より深いGNNは理論上、より深い近隣情報をキャプチャすることができる。しかし、しばしば過剰フィッティングや過剰スムーシングの問題に苦しむ。複雑度と一般化性を維持しつつ,より深い情報を取り込むため,ホップワイズ・アテンションを用いた適応型グラフ拡散ネットワーク(AGDNs-HA)を提案する。異なる次数のマルチホップ近傍アグリゲーションを単一層に積み重ねる。次に、各ノードに対して学習可能で適応的なホップワイズ・アテンションの助けを借りてそれらを統合する。半教師付きノード分類タスクを用いた標準データセットの実験結果から,提案手法は有意な改善が得られた。 Graph Neural Networks (GNNs) have received much attention in recent years and have achieved state-of-the-art performance in many fields. Deeper GNNs can theoretically capture deeper neighborhood information. However, they often suffer from problems of over-fitting and over-smoothing. In order to incorporate deeper information while preserving considerable complexity and generalization ability, we propose Adaptive Graph Diffusion Networks with Hop-wise Attention (AGDNs-HA). We stack multi-hop neighborhood aggregations of different orders into a single layer. Then we integrate them with the help of hop-wise attention, which is learnable and adaptive for each node. Experimental results on the standard dataset with semi-supervised node classification task show that our proposed methods achieve significant improvements.	翻訳日:2021-04-18 06:05:27 公開日:2020-12-30
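The two steps named in the abstract above (stacking multi-hop aggregations in one layer, then mixing them per node with hop-wise attention) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the row-normalized adjacency with self-loops, the scoring rule (a LeakyReLU over a dot product between `attn_vec` and the concatenation of the node's own features with its k-hop features), and the helper names are not taken from the paper, and learnable weight matrices are omitted.

```python
import math


def row_normalize(adj):
    """Add self-loops to a dense adjacency matrix and normalize each row to sum to 1."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        out.append([v / total for v in row])
    return out


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def hopwise_attention_layer(adj, feats, attn_vec, num_hops=2):
    """Return (output features, per-node hop weights) for one sketch layer."""
    a_hat = row_normalize(adj)
    hops = [feats]  # hop 0 is the node's own features
    for _ in range(num_hops):
        hops.append(matmul(a_hat, hops[-1]))  # k-hop aggregation

    outputs, weights = [], []
    for i in range(len(feats)):
        scores = []
        for hop in hops:
            z = hops[0][i] + hop[i]  # list concatenation: [own features, k-hop features]
            s = sum(w * v for w, v in zip(attn_vec, z))
            scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax over hops
        total = sum(exps)
        theta = [e / total for e in exps]
        weights.append(theta)
        dim = len(feats[i])
        outputs.append(
            [sum(theta[k] * hops[k][i][d] for k in range(len(hops))) for d in range(dim)]
        )
    return outputs, weights


# Path graph 0 - 1 - 2 with two features per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, hop_weights = hopwise_attention_layer(adj, feats, attn_vec=[0.0] * 4, num_hops=2)
```

With an all-zero `attn_vec` every hop receives the same weight, so the layer reduces to a plain average over hops; a trained attention vector would let each node weight near and far neighborhoods differently, which is the adaptivity the abstract refers to.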
# infer-avae: 逆変分オートエンコーダに基づく属性推論モデル Infer-AVAE: An Attribute Inference Model Based on Adversarial Variational Autoencoder ( http://arxiv.org/abs/2012.15005v1 ) ライセンス: Link先を確認	Yadong Zhou, Zhihao Ding, Xiaoming Liu, Chao Shen, Lingling Tong, Xiaohong Guan	(参考訳) ソーシャルネットワーク上でのユーザ属性の空間性に欠ける属性推論は、既存のデータとユーザ間のソーシャル接続などの追加情報に基づいて、欠落した属性を推測することを目的としている。近年,変分オートエンコーダ (VAE) が半教師付き方式でこの問題の解決に成功している。しかし、エンコーダが学習した潜伏表現には、不十分な情報または無駄な情報が含まれる:i) MLPは、入力データをうまく再構築するが、欠落部分の完了に失敗する;i) GNNは、社会的つながりに応じて情報をマージするが、GNNでは共通の問題である過密化に悩まされる。さらに、既存の手法ではデコーダの規制を無視しており、十分な推論能力がなく、厳しいオーバーフィッティングに直面している。上記の問題に対処するため,逆VAE(Infer-AVAE)に基づく属性推論モデルを提案する。私たちのモデルは、エンコーダ内のmlpとgnnを意図的に統一して、2つの潜在表現を学習します。次に,2つの表現の違いを活用するために,敵対ネットワークを訓練し,強靭な表現のためにMPPを用いてGNNを誘導する。さらに、識別器としてデコーダを特に訓練するために、損失関数に相互情報制約を導入する。したがって、属性推論の表現における補助情報をよりよく活用することができる。実世界のsnsデータセットに基づいて, 実験結果から, 本モデルの平均的な精度は7.0%向上した。 Facing the sparsity of user attributes on social networks, attribute inference aims at inferring missing attributes based on existing data and additional information such as social connections between users. Recently, Variational Autoencoders (VAEs) have been successfully applied to solve the problem in a semi-supervised way. However, the latent representations learned by the encoder contain either insufficient or useless information: i) MLPs can successfully reconstruct the input data but fail in completing missing part, ii) GNNs merge information according to social connections but suffer from over-smoothing, which is a common problem with GNNs. Moreover, existing methods neglect regulating the decoder, as a result, it lacks adequate inference ability and faces severe overfitting. To address the above issues, we propose an attribute inference model based on adversarial VAE (Infer-AVAE). Our model deliberately unifies MLPs and GNNs in encoder to learn dual latent representations: one contains only the observed attributes of each user, the other converges extra information from the neighborhood. 
Then, an adversarial network is trained to leverage the differences between the two representations, and adversarial training is conducted to guide GNNs using MLPs for robust representations. Moreover, a mutual information constraint is introduced in the loss function to specifically train the decoder as a discriminator. Thus, it can make better use of auxiliary information in the representations for attribute inference. Based on real-world social network datasets, experimental results demonstrate that our model outperforms the state of the art by 7.0% in accuracy on average.	翻訳日:2021-04-18 06:04:55 公開日:2020-12-30
# 3層ニューラルネットワークのSGD分布ダイナミクス SGD Distributional Dynamics of Three Layer Neural Networks ( http://arxiv.org/abs/2012.15036v1 ) ライセンス: Link先を確認	Victor Luo, Yazhen Wang and Glenn Fung	(参考訳) ビッグデータ分析の台頭に伴い、多層ニューラルネットワークは最も強力な機械学習手法の1つとして浮上した。しかし、その理論的な数学的性質はまだ完全には理解されていない。ニューラルネットワークのトレーニングには非凸な目的関数の最適化が必要であり、これは通常、確率勾配降下法(SGD)を用いて行われる。本稿では,Mei et al. (2018) の平均場結果を,1つの隠れ層を持つ2層ニューラルネットワークから,2つの隠れ層を持つ3層ニューラルネットワークへ拡張することを目的とする。SGDのダイナミクスが非線形偏微分方程式の集合によって捉えられることを示し、2つの隠れ層における重みの分布が独立であることを証明する。シミュレーションと実世界データに基づく探索的な作業についても詳述する。 With the rise of big data analytics, multi-layer neural networks have surfaced as one of the most powerful machine learning methods. However, their theoretical mathematical properties are still not fully understood. Training a neural network requires optimizing a non-convex objective function, typically done using stochastic gradient descent (SGD). In this paper, we seek to extend the mean field results of Mei et al. (2018) from two-layer neural networks with one hidden layer to three-layer neural networks with two hidden layers. We will show that the SGD dynamics is captured by a set of non-linear partial differential equations, and prove that the distributions of weights in the two hidden layers are independent. We will also detail exploratory work done based on simulation and real-world data.	翻訳日:2021-04-18 06:04:24 公開日:2020-12-30
# mm-fsod: メタとメトリックの統合したマイナショットオブジェクト検出 MM-FSOD: Meta and metric integrated few-shot object detection ( http://arxiv.org/abs/2012.15159v1 ) ライセンス: Link先を確認	Yuewen Li, Wenquan Feng, Shuchang Lyu, Qi Zhao, Xuliang Li	(参考訳) オブジェクト検出タスクでは、cnn(convolutional neural networks)モデルはトレーニングプロセスにおいて、常に大量の注釈付き例を必要とします。高価なアノテーションの依存性を減らすために、少数のオブジェクト検出が研究の焦点となっている。本稿では,メタラーニングとメトリック学習を統合した効果的なオブジェクト検出フレームワーク(MM-FSOD)を提案する。我々のモデルは、トレーニングサンプルにない新しいカテゴリを正確に認識できるクラスに依存しない検出モデルである。具体的には,クラス内平均プロトタイプを学習するためのメタ表現モジュール(MRモジュール)を提案する。 MRモジュールは、高度な特徴を再構築する能力を得るためにメタラーニング法で訓練される。クエリロア機能を持つサポートプロトタイプ間の特徴の類似性をさらに高めるために,分類器として機能するピアソン計量モジュール(prモジュール)を提案する。これまでの一般的な計量法と比較すると、コサイン距離メートル法である。 prモジュールは、モデルを識別的な埋め込み空間にアライメント可能にする。ベンチマークデータセット FSOD, MS COCO, PASCAL VOC に関する広範な実験を行い, 本モデルの有効性と有効性を示す。従来の手法と比較して、MM-FSODは最先端(SOTA)結果が得られる。 In the object detection task, CNN (Convolutional neural networks) models always need a large amount of annotated examples in the training process. To reduce the dependency of expensive annotations, few-shot object detection has become an increasing research focus. In this paper, we present an effective object detection framework (MM-FSOD) that integrates metric learning and meta-learning to tackle the few-shot object detection task. Our model is a class-agnostic detection model that can accurately recognize new categories, which are not appearing in training samples. Specifically, to fast learn the features of new categories without a fine-tuning process, we propose a meta-representation module (MR module) to learn intra-class mean prototypes. MR module is trained with a meta-learning method to obtain the ability to reconstruct high-level features. To further conduct similarity of features between support prototype with query RoIs features, we propose a Pearson metric module (PR module) which serves as a classifier. Compared to the previous commonly used metric method, cosine distance metric. 
The PR module enables the model to align features into a discriminative embedding space. We conduct extensive experiments on benchmark datasets FSOD, MS COCO, and PASCAL VOC to demonstrate the feasibility and efficiency of our model. Compared with previous methods, MM-FSOD achieves state-of-the-art (SOTA) results.	翻訳日:2021-04-18 06:04:13 公開日:2020-12-30
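The MM-FSOD entry above contrasts a Pearson metric with the cosine distance metric. As a hedged illustration of the difference (the abstract does not specify how the PR module computes its score, so the functions below are generic textbook definitions with hypothetical names), Pearson correlation can be written as cosine similarity applied to mean-centred vectors:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # undefined for all-zero vectors

def pearson_similarity(u, v):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centred_u = [a - mean_u for a in u]
    centred_v = [b - mean_v for b in v]
    return cosine_similarity(centred_u, centred_v)  # undefined for constant vectors

# Two feature vectors that differ only by a constant offset.
prototype = [1.0, 2.0, 3.0]
query = [11.0, 12.0, 13.0]
```

For `prototype` and `query` the Pearson score is 1 while the cosine score is below 1, which shows the practical difference: centring removes a shared offset that cosine similarity still penalises.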
# laif:ai、ドイツのディープラーニング、suetterlinの文字認識と生成 LAIF: AI, Deep Learning for Germany Suetterlin Letter Recognition and Generation ( http://arxiv.org/abs/2101.10450v1 ) ライセンス: Link先を確認	Enkhtogtokh Togootogtokh, Christian Klasen	(参考訳) ディープラーニングAI技術の初期の実装として成功したのは、文字認識だった。人工知能(AI)の最近の進歩により、手書き文字認識や自動生成といった複雑な問題に対して、より強固な技術がもたらされる。本研究では,ドイツにおける文字認識・生成のためのLudwig AI Framework(LAIF)というディープラーニングフレームワークを提案する。 Suetterlin文字を認識するために,我々は深層畳み込みニューラルネットワークを提案する。本研究は,手書き文字のハードコピーのラベル付けに膨大なコストと深層モデルのトレーニング用データがないことから,手書き文字を合成データとして生成する深部生成逆数ネットワークを用いた手法についても紹介する。ソースコードはhttps://github.com/enkhtogtokh/LAIFリポジトリにある。 One of the successful early implementation of deep learning AI technology was on letter recognition. With the recent breakthrough of artificial intelligence (AI) brings more solid technology for complex problems like handwritten letter recognition and even automatic generation of them. In this research, we proposed deep learning framework called Ludwig AI Framework(LAIF) for Germany Suetterlin letter recognition and generation. To recognize Suetterlin letter, we proposed deep convolutional neural network. Since lack of big amount of data to train for the deep models and huge cost to label existing hard copy of handwritten letters, we also introduce the methodology with deep generative adversarial network to generate handwritten letters as synthetic data. Main source code is in https://github.com/enkhtogtokh/LAIF repository.	翻訳日:2021-04-18 06:03:53 公開日:2020-12-30
# 時系列予測のための局所モデルアンサンブル Ensembles of Localised Models for Time Series Forecasting ( http://arxiv.org/abs/2012.15059v1 ) ライセンス: Link先を確認	Rakshitha Godahewa, Kasun Bandara, Geoffrey I. Webb, Slawek Smyl, Christoph Bergmeir	(参考訳) 今日では大量のデータが利用可能になっているため、Global Forecasting Models (GFM)として知られる、時系列の集合にまたがって訓練された予測モデルは、個々の時系列で動作する従来の単変量予測モデルをしばしば上回っている。 GFMは通常、すべての時系列で同じパラメータのセットを共有するため、特にデータセットが不均一な状況において、特定の時系列に十分に局所化されないという問題があることが多い。本稿では,汎用的なGFMと単変量モデルにアンサンブル手法を適用して,この問題を解決する方法について検討する。私たちの研究は,時系列をクラスタリングしてクラスタごとに別々のサブモデルを訓練する手法,いわゆる専門家アンサンブルアプローチ,グローバルモデルとローカルモデルのヘテロジニアスアンサンブルの構築など,関連する現在のアプローチを体系化し,比較する。これらのアプローチのギャップを埋めて、異なる基盤となるGFMモデルタイプに一般化する。次に,クラスタ数とクラスタのシードを変化させて得られる異なるクラスタ上で複数のGFMをトレーニングする,クラスタ化アンサンブルの新たな手法を提案する。フィードフォワードニューラルネットワーク,リカレントニューラルネットワーク,プール回帰モデルを基礎となるGFMとして6つの公開データセットで評価した結果,提案モデルはベースラインのGFMや単変量予測手法よりも有意に高い精度を達成できることがわかった。 With large quantities of data typically available nowadays, forecasting models that are trained across sets of time series, known as Global Forecasting Models (GFM), are regularly outperforming traditional univariate forecasting models that work on isolated series. As GFMs usually share the same set of parameters across all time series, they often have the problem of not being localised enough to a particular series, especially in situations where datasets are heterogeneous. We study how ensembling techniques can be used with generic GFMs and univariate models to solve this issue. Our work systematises and compares relevant current approaches, namely clustering series and training separate submodels per cluster, the so-called ensemble of specialists approach, and building heterogeneous ensembles of global and local models. We fill some gaps in the approaches and generalise them to different underlying GFM model types. We then propose a new methodology of clustered ensembles where we train multiple GFMs on different clusters of series, obtained by changing the number of clusters and cluster seeds. 
Using Feed-forward Neural Networks, Recurrent Neural Networks, and Pooled Regression models as the underlying GFMs, in our evaluation on six publicly available datasets, the proposed models are able to achieve significantly higher accuracy than baseline GFM models and univariate forecasting methods.	翻訳日:2021-04-18 06:03:40 公開日:2020-12-30
# 公正制約下でのニューラルネットワーク分類器の訓練 Provably Training Neural Network Classifiers under Fairness Constraints ( http://arxiv.org/abs/2012.15274v1 ) ライセンス: Link先を確認	You-Lin Chen, Zhaoran Wang, Mladen Kolar	(参考訳) 公正な制約下での分類器の訓練は、道徳的、法的、ビジネス上の理由により、機械学習コミュニティで注目を集めている。しかし、アルゴリズムの公正性に対処する最近のいくつかの研究は、非凸性や人種や性別などの保護されたグループ間での差別化不可能な公正性基準によるロジスティック回帰やサポートベクターマシンのような単純なモデルにのみ焦点を当てている。ニューラルネットワークは、近年最も広く使われている分類モデルであり、理論的保証がない。本稿では,ニューラルネットワークにおけるアルゴリズムフェアネスの文献の欠如を補うことを目的とする。特に,過パラメータ化ニューラルネットワークが公平性制約を満たしていることを示す。公平なニューラルネットワーク分類器を構築する上で重要な要素は、ニューラルネットワークと関連するアプリケーションのオンライン学習に独立して関心を持つ可能性のある、オーバーパラメータ化体制におけるニューラルネットワークの非回帰分析を確立することである。 Training a classifier under fairness constraints has gotten increasing attention in the machine learning community thanks to moral, legal, and business reasons. However, several recent works addressing algorithmic fairness have only focused on simple models such as logistic regression or support vector machines due to non-convex and non-differentiable fairness criteria across protected groups, such as race or gender. Neural networks, the most widely used models for classification nowadays, are precluded and lack theoretical guarantees. This paper aims to fill this missing but crucial part of the literature of algorithmic fairness for neural networks. In particular, we show that overparametrized neural networks could meet the fairness constraints. The key ingredient of building a fair neural network classifier is establishing no-regret analysis for neural networks in the overparameterization regime, which may be of independent interest in the online learning of neural networks and related applications.	翻訳日:2021-04-18 06:03:17 公開日:2020-12-30
# Riesz表現子の逆推定 Adversarial Estimation of Riesz Representers ( http://arxiv.org/abs/2101.00009v1 ) ライセンス: Link先を確認	Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis	(参考訳) 任意の関数空間内の線形汎関数のriesz表現子を推定する逆アプローチを提案する。リース表現器と近似誤差を近似するために用いられる関数空間の局所化ラデマッハ複雑性に基づくオラクルの不等式を証明する。これらの不等式は、高次元スパース線型関数、ニューラルネットワーク、カーネルヒルベルト空間の再現など、多くの関心関数空間に対する高速有限サンプル平均二乗誤差率を意味する。我々のアプローチは、最近導入された機械学習技術でRiesz表現子を推定する新しい方法を提供する。半パラメトリックモデルにおける構造・因果パラメータの非バイアス化,モーメント方程式の自動直交化,および資産価格の文脈における確率的割引係数の推定において,我々の推定器をどのように利用できるかを示す。 We provide an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces. We prove oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and the approximation error. These inequalities imply fast finite sample mean-squared-error rates for many function spaces of interest, such as high-dimensional sparse linear functions, neural networks and reproducing kernel Hilbert spaces. Our approach offers a new way of estimating Riesz representers with a plethora of recently introduced machine learning techniques. We show how our estimator can be used in the context of de-biasing structural/causal parameters in semi-parametric models, for automated orthogonalization of moment equations and for estimating the stochastic discount factor in the context of asset pricing.	翻訳日:2021-04-18 06:03:03 公開日:2020-12-30
# 3D-UNetを用いたMRI脳腫瘍セグメント化と不確実性評価 MRI brain tumor segmentation and uncertainty estimation using 3D-UNet architectures ( http://arxiv.org/abs/2012.15294v1 ) ライセンス: Link先を確認	Laura Mora Ballestar and Veronica Vilaplana	(参考訳) 3次元磁気共鳴画像(MRI)における脳腫瘍のセグメンテーションの自動化は、疾患の診断と治療を評価する鍵となる。近年では、畳み込みニューラルネットワーク(CNN)がこのタスクの結果を改善している。しかし、3D-CNNでは高いメモリ消費が依然として問題となっている。また,医療診断において特に重要な不確実性情報を含んでいない方法が多い。本研究は,パッチベースの手法で訓練した3Dエンコーダデコーダアーキテクチャについて検討し,メモリ消費を低減し,不均衡なデータの影響を低減する。異なるトレーニング済みモデルを使用して、各モデルの特性を活用するアンサンブルを生成し、性能を向上させる。また,テスト時ドロップアウト (TTD) とテスト時データ拡張 (TTA) をそれぞれ用いて, 認識論的(epistemic)および偶然的(aleatoric)なボクセル単位の不確実性情報を導入する。さらに、セグメンテーションの精度を高めるためのハイブリッドアプローチも提案する。本研究で提案するモデルと不確実性推定指標は,BraTS'20 Challengeのタスク1および3において,腫瘍のセグメンテーションと不確実性推定に用いられた。 Automation of brain tumor segmentation in 3D magnetic resonance images (MRIs) is key to assess the diagnostic and treatment of the disease. In recent years, convolutional neural networks (CNNs) have shown improved results in the task. However, high memory consumption is still a problem in 3D-CNNs. Moreover, most methods do not include uncertainty information, which is especially critical in medical diagnosis. This work studies 3D encoder-decoder architectures trained with patch-based techniques to reduce memory consumption and decrease the effect of unbalanced data. The different trained models are then used to create an ensemble that leverages the properties of each model, thus increasing the performance. We also introduce voxel-wise uncertainty information, both epistemic and aleatoric using test-time dropout (TTD) and data-augmentation (TTA) respectively. In addition, a hybrid approach is proposed that helps increase the accuracy of the segmentation. The model and uncertainty estimation measurements proposed in this work have been used in the BraTS'20 Challenge for task 1 and 3 regarding tumor segmentation and uncertainty estimation.	翻訳日:2021-04-18 06:02:47 公開日:2020-12-30
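The entry above obtains voxel-wise uncertainty from repeated stochastic forward passes (test-time dropout or test-time augmentation). The abstract does not state which uncertainty measure is used, so the sketch below assumes one common choice, the entropy of the averaged class probabilities; the function names and the per-voxel list-of-lists input format are hypothetical.

```python
import math

def mean_prediction(samples):
    """Average class probabilities over T stochastic forward passes."""
    num_passes = len(samples)
    num_classes = len(samples[0])
    return [sum(s[c] for s in samples) / num_passes for c in range(num_classes)]

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def voxel_uncertainty(samples):
    """Return (mean class probabilities, predictive entropy) for one voxel.

    samples: one probability vector per stochastic pass, e.g. each pass
    run with dropout left active or with a different input augmentation.
    """
    mean = mean_prediction(samples)
    return mean, entropy(mean)
```

Passes that agree give low entropy, while passes that disagree push the averaged distribution toward uniform and the entropy toward its maximum, which flags the voxel as uncertain.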
# 説明可能性:医療画像におけるバックドア攻撃 Explainability Matters: Backdoor Attacks on Medical Imaging ( http://arxiv.org/abs/2101.00008v1 ) ライセンス: Link先を確認	Munachiso Nwadike, Takumi Miyawaki, Esha Sarkar, Michail Maniatakos, Farah Shamout	(参考訳) 深層ニューラルネットワークは、モデルトレーニングの前にトレーニングセットに簡単に導入できるバックドア攻撃に対して脆弱であることが示されている。最近の研究は、自然画像やおもちゃのデータセットに対するバックドア攻撃の調査に焦点を当てている。その結果、バックドアの正確な影響は、医療画像などの複雑な実世界応用においてはまだ完全には理解されていない。本稿では,胸部X線写真を用いたマルチラベル疾患分類タスクに対するバックドア攻撃の影響を,攻撃者がトレーニングデータセットを操作して攻撃を実行することを前提として検討する。最先端アーキテクチャの広範な評価は、トレーニングセットに数ピクセルの摂動を持つイメージを導入することで、アタッカーがトレーニング手順に関与せずにバックドアをうまく実行できることを示しています。単純な3$\times$3ピクセルトリガは、感染した画像のセットの受信操作特性(AUROC)曲線の下で最大1.00エリアを達成することができる。クリーンな画像のセットでは、バックドアニューラルネットワークは最大0.85AUROCを達成することができ、攻撃のステルス性を強調した。深層学習に基づく診断システムの使用が臨床実践で増加するにつれ,空間的局所化されたバックドアを推論時間で識別できるため,この文脈では説明可能性が不可欠であることを示す。 Deep neural networks have been shown to be vulnerable to backdoor attacks, which could be easily introduced to the training set prior to model training. Recent work has focused on investigating backdoor attacks on natural images or toy datasets. Consequently, the exact impact of backdoors is not yet fully understood in complex real-world applications, such as in medical imaging where misdiagnosis can be very costly. In this paper, we explore the impact of backdoor attacks on a multi-label disease classification task using chest radiography, with the assumption that the attacker can manipulate the training dataset to execute the attack. Extensive evaluation of a state-of-the-art architecture demonstrates that by introducing images with few-pixel perturbations into the training set, an attacker can execute the backdoor successfully without having to be involved with the training procedure. A simple 3$\times$3 pixel trigger can achieve up to 1.00 Area Under the Receiver Operating Characteristic (AUROC) curve on the set of infected images. In the set of clean images, the backdoored neural network could still achieve up to 0.85 AUROC, highlighting the stealthiness of the attack. 
As the use of deep-learning-based diagnostic systems proliferates in clinical practice, we also show how explainability is indispensable in this context, as it can identify spatially localized backdoors at inference time.	翻訳日:2021-04-18 06:02:29 公開日:2020-12-30
# リザーバ・トランスフォーマー Reservoir Transformer ( http://arxiv.org/abs/2012.15045v1 ) ライセンス: Link先を確認	Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela	(参考訳) いくつかの層がランダムに初期化されたまま更新されない場合でも、トランスフォーマーは優れた性能を得ることを示す。機械学習において古くから確立されたアイデアに着想を得て,通常のトランスフォーマー層の間に挟み込んだ様々な非線形の「リザーバ」層を探索し,様々な機械翻訳と(マスク)言語モデリングタスクにおいて,収束までの実時間(wall-clock)計算時間と全体的な性能の両方が改善することを示す。 We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.	翻訳日:2021-04-18 06:01:55 公開日:2020-12-30
# 語彙単純化による事前学習言語モデルの拡張 Enhancing Pre-trained Language Model with Lexical Simplification ( http://arxiv.org/abs/2012.15070v1 ) ライセンス: Link先を確認	Rongzhou Bao, Jiayi Wang, Zhuosheng Zhang, Hai Zhao	(参考訳) 人間の読者と事前学習された言語モデル(PrLM)の両方にとって、語彙の多様性は、与えられた文の基本的な意味を理解する際に混乱と不正確をもたらす可能性がある。複雑な単語を簡単な代替語で置換することにより、語彙単純化(LS)はそのような語彙の多様性を減らし、文の理解性を向上する。本稿では,LSを活用し,テキスト分類におけるPrLMの性能を効果的に向上する手法を提案する。所定の文に対して規則に基づく単純化処理を適用する。 PrLMは、簡略化されたバージョンからの補助的な入力で、与えられた文の実際のラベルを予測する。強力なPrLM(BERTとELECTRA)をベースラインとして,テキスト分類タスクの性能をさらに向上させることができる。 For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple alternatives, lexical simplification (LS) is a recognized method to reduce such lexical diversity, and therefore to improve the understandability of sentences. In this paper, we leverage LS and propose a novel approach which can effectively improve the performance of PrLMs in text classification. A rule-based simplification process is applied to a given sentence. PrLMs are encouraged to predict the real label of the given sentence with auxiliary inputs from the simplified version. Using strong PrLMs (BERT and ELECTRA) as baselines, our approach can still further improve the performance in various text classification tasks.	翻訳日:2021-04-18 06:01:46 公開日:2020-12-30
# 位置情報分離によるゼロショット翻訳の改善 Improving Zero-Shot Translation by Disentangling Positional Information ( http://arxiv.org/abs/2012.15127v1 ) ライセンス: Link先を確認	Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li	(参考訳) 多言語ニューラルマシン翻訳は、トレーニングで見ていない言語ペア間で直接翻訳する能力、すなわちゼロショット翻訳の能力を示している。概念的には魅力的だが、しばしば低い出力品質に悩まされる。新しい翻訳方向への一般化の難しさは、モデル表現が訓練で見られた言語対に非常に特化していることを示唆している。言語固有の表現を引き起こす主な要因が入力トークンとの位置対応であることを示す。あるエンコーダ層の残差接続を取り除くことで,これを容易に緩和できることを示す。この修正により、教師付き方向の品質を維持しながら、ゼロショット翻訳で最大18.5 BLEUポイントの改善が得られる。改善は関連言語間で特に顕著であり、そこでは提案モデルがピボットベースの翻訳を上回る。さらに,このアプローチでは新しい言語の統合が容易になり、翻訳範囲が大きく拡大する。隠れ層の出力の徹底的な検査により、我々のアプローチが実際により言語に依存しない表現につながることを示す。 Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. By thorough inspections of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations.	翻訳日:2021-04-18 06:01:32 公開日:2020-12-30
# オープンドメイン質問応答のためのメモリ効率のよいベースライン A Memory Efficient Baseline for Open Domain Question Answering ( http://arxiv.org/abs/2012.15156v1 ) ライセンス: Link先を確認	Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave	(参考訳) 近年,高密度表現に基づく検索システムにより,オープンドメイン質問応答や関連するタスクが大幅に改善されている。このアプローチは非常に効果的だが、知識ソース全体の密度の高いベクトルをメモリに保持する必要があるため、メモリ集約型である。本稿では,高密度レトリバー・リーダーシステムのメモリフットプリントを低減する方法について検討する。本稿では,次元削減,ベクトル量子化,通過フィルタリングの3つの手法を検討する。我々は,TriviaQAとNaturalQuestionsという2つの質問応答ベンチマークに対するアプローチを評価し,6Gb未満のメモリで競合するシステムを実現できることを示した。 Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks. While very effective, this approach is also memory intensive, as the dense vectors for the whole knowledge source need to be kept in memory. In this paper, we study how the memory footprint of dense retriever-reader systems can be reduced. We consider three strategies to reduce the index size: dimension reduction, vector quantization and passage filtering. We evaluate our approach on two question answering benchmarks: TriviaQA and NaturalQuestions, showing that it is possible to get competitive systems using less than 6Gb of memory.	翻訳日:2021-04-18 06:01:21 公開日:2020-12-30
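The three strategies named in the entry above (dimension reduction, vector quantization, passage filtering) each shrink one factor of a dense index whose size is roughly passages × dimensions × bits per dimension. The arithmetic can be sketched as follows; every number here (20 million passages, 768 dimensions, 32-bit floats, the reduced settings) is a hypothetical value chosen for illustration and is not taken from the paper.

```python
def index_size_gb(num_passages, dim, bits_per_dim):
    """Approximate size of a flat dense index in gigabytes (1 GB = 1e9 bytes)."""
    return num_passages * dim * bits_per_dim / 8 / 1e9

# Hypothetical starting point: float32 vectors for every passage.
baseline = index_size_gb(20_000_000, 768, 32)

# Dimension reduction: keep 256 of the 768 dimensions (assumed setting).
reduced = index_size_gb(20_000_000, 256, 32)

# Vector quantization: spend 8 bits per dimension instead of 32 (assumed setting).
quantized = index_size_gb(20_000_000, 256, 8)

# Passage filtering: keep half of the passages (assumed setting).
filtered = index_size_gb(10_000_000, 256, 8)

for name, size in [("baseline", baseline), ("reduced", reduced),
                   ("quantized", quantized), ("filtered", filtered)]:
    print(f"{name:>10}: {size:.2f} GB")
```

Because the three factors multiply, the savings compound, which is why combining the strategies can bring an index that starts in the tens of gigabytes down to a few gigabytes under assumptions like these.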
# シーケンス・ツー・シーケンス・モデルは置換暗号を解読できるか? Can Sequence-to-Sequence Models Crack Substitution Ciphers? ( http://arxiv.org/abs/2012.15229v1 ) ライセンス: Link先を確認	Nada Aldarrab and Jonathan May	(参考訳) 歴史的暗号の解読は難しい問題である。ターゲットの平文の言語は不明な場合があり、暗号文には多くのノイズが含まれうる。最先端の解読法は、平文の言語が既知であると仮定し、ビームサーチとニューラル言語モデルを用いて、与えられた暗号に対する平文の候補仮説をスコアリングする。本稿では,単純な置換暗号を解くためのエンドツーエンドの多言語モデルを提案する。合成暗号と実際の歴史的暗号でモデルを検証し、提案手法が明示的な言語識別なしにテキストを解読でき、なおノイズに対して頑健であることを示す。 Decipherment of historical ciphers is a challenging problem. The language of the target plaintext might be unknown, and ciphertext can have a lot of noise. State-of-the-art decipherment methods use beam search and a neural language model to score candidate plaintext hypotheses for a given cipher, assuming plaintext language is known. We propose an end-to-end multilingual model for solving simple substitution ciphers. We test our model on synthetic and real historical ciphers and show that our proposed method can decipher text without explicit language identification and can still be robust to noise.	翻訳日:2021-04-18 06:01:10 公開日:2020-12-30
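For readers unfamiliar with the object being attacked in the entry above, a simple substitution cipher replaces each plaintext letter with a fixed cipher letter under a secret one-to-one key. The sketch below only generates and inverts such a cipher (for example, to build synthetic training pairs); it is not the authors' decipherment model, and the function names and the lowercase-ASCII alphabet are assumptions.

```python
import random
import string

def make_key(seed):
    """Build a random one-to-one letter mapping from a seed (hypothetical helper)."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def apply_key(text, key):
    """Substitute every mapped character; leave spaces and punctuation untouched."""
    return "".join(key.get(ch, ch) for ch in text)

def invert_key(key):
    """Reverse the mapping so that ciphertext can be turned back into plaintext."""
    return {cipher: plain for plain, cipher in key.items()}

key = make_key(seed=0)
plaintext = "decipherment of historical ciphers"
ciphertext = apply_key(plaintext, key)
recovered = apply_key(ciphertext, invert_key(key))
```

Decipherment is the harder inverse problem: recovering `plaintext` from `ciphertext` alone, without `key` and, in the setting described above, without knowing the plaintext language.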
# 教師なしラベル対応イベントトリガーと引数分類 Unsupervised Label-aware Event Trigger and Argument Classification ( http://arxiv.org/abs/2012.15243v1 ) ライセンス: Link先を確認	Hongming Zhang, Haoyu Wang, Dan Roth	(参考訳) イベントを識別し、事前に定義されたイベントタイプにマッピングすることは、長い間、自然言語処理の重要な問題であった。これまでの研究のほとんどは、イベントタイプのラベルに含まれる意味を無視しながら、労働集約的でドメイン固有のアノテーションに大きく依存してきた。その結果、学習したモデルは、新しいイベントタイプが導入されうる新しいドメインに効果的に一般化することができない。本稿では,まず利用可能なツール(SRLなど)でイベントを識別し,次に提案する教師なし分類モデルを用いて事前定義されたイベントタイプに自動的にマッピングする,教師なしイベント抽出パイプラインを提案する。アノテーション付きデータに頼るのではなく、本モデルは特定されたイベントのセマンティクスとイベントタイプラベルのセマンティクスを照合する。具体的には、事前訓練された言語モデルを利用して、イベントトリガと引数の両方について事前定義された型を文脈的に表現する。表現の類似性によって特定されたイベントを対象の型にマッピングした後、イベントオントロジー(例えば、引数型 "Victim" はイベント型 "Attack" の引数としてのみ現れる)を、予測を正則化するためのグローバルな制約として使用する。提案手法は、33のトリガ型と22の引数型を持つACE-2005データセットでテストした場合、非常に効果的であることが示されている。アノテーションを一切使わずに、83%のトリガと54%の引数を正しい型にマッピングすることに成功し、これは従来のゼロショット手法の性能のほぼ2倍である。 Identifying events and mapping them to pre-defined event types has long been an important natural language processing problem. Most previous work has been heavily relying on labor-intensive and domain-specific annotations while ignoring the semantic meaning contained in the labels of the event types. As a result, the learned models cannot effectively generalize to new domains, where new event types could be introduced. In this paper, we propose an unsupervised event extraction pipeline, which first identifies events with available tools (e.g., SRL) and then automatically maps them to pre-defined event types with our proposed unsupervised classification model. Rather than relying on annotated data, our model matches the semantics of identified events with those of event type labels. Specifically, we leverage pre-trained language models to contextually represent pre-defined types for both event triggers and arguments. 
After we map identified events to the target types via representation similarity, we use the event ontology (e.g., argument type "Victim" can only appear as the argument of event type "Attack") as global constraints to regularize the prediction. The proposed approach is shown to be very effective when tested on the ACE-2005 dataset, which has 33 trigger and 22 argument types. Without using any annotation, we successfully map 83% of the triggers and 54% of the arguments to the correct types, almost doubling the performance of previous zero-shot approaches.	翻訳日:2021-04-18 06:00:59 公開日:2020-12-30
# 生成型adversarial networkを用いた教師なし深部画像強調 Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network ( http://arxiv.org/abs/2012.15020v1 ) ライセンス: Link先を確認	Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, and Sam Kwong	(参考訳) 画像の美的品質を改善することは困難であり、一般の人々から強く望まれている。この問題に対処するため,既存のアルゴリズムの多くは,低品質の写真とそれに対応する専門家による修正版からなるペアデータから自動写真強調器を学習する教師付き学習法に基づいている。しかし、専門家が修正した写真のスタイルや特徴は一般ユーザーのニーズや好みに合わない可能性がある。本稿では,多数のペア画像から学習するのではなく,所望の特性を持つ画像の集合から,対応する画像から画像へのマッピングを教師なしで学習する,教師なし画像強調生成逆数ネットワーク(UEGAN)を提案する。提案モデルは,よりリッチなグローバル・ローカル特徴を捉えるために,変調機構とアテンション機構を組み込んだ単一のディープGANに基づいている。提案モデルに基づいて,教師なし画像強調を扱うための2つの損失を導入する。(1)強調画像と入力画像の内容が同じになるように,事前学習したVGGネットワークの特徴領域におけるL2正則化として定義される忠実度損失と,(2)入力画像に所望の特性を与えるために,相対論的ヒンジ敵対的損失として定式化された品質損失である。定量的,定性的な結果の両方から,提案モデルが画像の美的品質を効果的に改善することを示す。コードは https://github.com/eezkni/UEGAN で入手できる。 Improving the aesthetic quality of images is challenging and eagerly sought by the public. To address this problem, most existing algorithms are based on supervised learning methods to learn an automatic photo enhancer for paired data, which consists of low-quality photos and corresponding expert-retouched versions. However, the style and characteristics of photos retouched by experts may not meet the needs or preferences of general users. In this paper, we present an unsupervised image enhancement generative adversarial network (UEGAN), which learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner, rather than learning on a large number of paired images. The proposed model is based on a single deep GAN which embeds the modulation and attention mechanisms to capture richer global and local features. 
Based on the proposed model, we introduce two losses to deal with the unsupervised image enhancement: (1) fidelity loss, which is defined as an L2 regularization in the feature domain of a pre-trained VGG network to ensure the content between the enhanced image and the input image is the same, and (2) quality loss that is formulated as a relativistic hinge adversarial loss to endow the input image with the desired characteristics. Both quantitative and qualitative results show that the proposed model effectively improves the aesthetic quality of images. Our code is available at: https://github.com/eezkni/UEGAN.	翻訳日:2021-04-18 06:00:32 公開日:2020-12-30
# NBNet: 部分空間射影による画像デノイジングのためのノイズ基底学習 NBNet: Noise Basis Learning for Image Denoising with Subspace Projection ( http://arxiv.org/abs/2012.15028v1 ) ライセンス: Link先を確認	Shen Cheng, Yuzhi Wang, Haibin Huang, Donghao Liu, Haoqiang Fan, Shuaicheng Liu	(参考訳) 本稿では,画像デノイジングのための新しいフレームワークであるNBNetを紹介する。従来の研究と異なり,画像適応的な射影による雑音低減という新たな視点から,この困難な問題に取り組むことを提案する。具体的には,特徴空間における再構成基底の集合を学習することにより,信号と雑音を分離できるネットワークを訓練することを提案する。その後、信号部分空間の対応する基底を選択し、入力をその空間に射影することにより、画像デノイジングを実現することができる。我々の重要な洞察は、特に低照度や弱いテクスチャの領域において、射影が入力信号の局所的な構造を自然に維持できるということである。この目的に向けて,基底生成と部分空間射影を明示的に学習するために設計された非局所的な部分空間アテンションモジュールであるSSAを提案する。さらに、エンド・ツー・エンドの画像デノイジング用に設計されたUNet構造のネットワークであるNBNetにSSAを組み込む。SIDDやDNDなどのベンチマークで評価を行い、NBNetは大幅に少ない計算コストでPSNRおよびSSIMの最先端性能を達成する。 In this paper, we introduce NBNet, a novel framework for image denoising. Unlike previous works, we propose to tackle this challenging problem from a new perspective: noise reduction by image-adaptive projection. Specifically, we propose to train a network that can separate signal and noise by learning a set of reconstruction basis in the feature space. Subsequently, image denoising can be achieved by selecting corresponding basis of the signal subspace and projecting the input into such space. Our key insight is that projection can naturally maintain the local structure of input signal, especially for areas with low light or weak textures. Towards this end, we propose SSA, a non-local subspace attention module designed explicitly to learn the basis generation as well as the subspace projection. We further incorporate SSA with NBNet, a UNet structured network designed for end-to-end image denoising. We conduct evaluations on benchmarks, including SIDD and DND, and NBNet achieves state-of-the-art performance on PSNR and SSIM with significantly less computational cost.	翻訳日:2021-04-18 06:00:09 公開日:2020-12-30
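The NBNet entry above rests on projecting an input onto a signal subspace spanned by a set of basis vectors. The abstract gives no formula, so the sketch below shows only the standard orthogonal projection, computed by orthonormalising the basis with Gram-Schmidt; in the paper the basis would be produced by the network, whereas here it is a hand-written list, and all function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis, eps=1e-12):
    """Orthonormalise basis vectors, dropping any that are linearly dependent."""
    ortho = []
    for vec in basis:
        w = list(vec)
        for q in ortho:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            ortho.append([wi / norm for wi in w])
    return ortho

def project(x, basis):
    """Orthogonally project x onto the subspace spanned by the basis vectors."""
    out = [0.0] * len(x)
    for q in gram_schmidt(basis):
        coeff = dot(x, q)
        out = [o + coeff * qi for o, qi in zip(out, q)]
    return out

# Toy example: the basis spans the xy-plane, so the z component is removed.
signal_basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
noisy = [3.0, 4.0, 5.0]
denoised = project(noisy, signal_basis)
```

Projection is idempotent: applying `project` to `denoised` returns it unchanged, which is the sense in which components already inside the signal subspace are preserved while everything outside it is discarded.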
# SkiNet: 不確実性推定と説明可能性を備えた皮膚病変診断のためのディープラーニングソリューション SkiNet: A Deep Learning Solution for Skin Lesion Diagnosis with Uncertainty Estimation and Explainability ( http://arxiv.org/abs/2012.15049v1 ) ライセンス: Link先を確認	Rajeev Kumar Singh, Rohan Gorantla, Sai Giridhar Allada, Narra Pratap	(参考訳) 皮膚がんは最も一般的なヒト悪性腫瘍であると考えられている。アメリカでは毎年500万件の新しい皮膚がんの症例が記録されている。皮膚病変の早期診断と評価は臨床的に非常に重要であるが, 発展途上国では皮膚科医と患者の比率が著しく低下している。そこで,SkiNet として知られる深層学習型アーキテクチャは,より高速なスクリーニングソリューションと,臨床診断過程において新たに訓練された医師に支援を提供することを目的としている。スキレットの設計と開発の主な動機はホワイトボックスソリューションを提供することであり、医療従事者によるコンピュータ支援診断システムの普及に不可欠な信頼と解釈の重大な問題に対処することである。 SkiNetは2段階のパイプラインで、病変のセグメンテーションに続いて、病変の分類を行う。提案手法では,モンテカルロ・ドロップアウト法とテスト時間拡張法を用いて認識論的不確かさを推定し,塩分に基づく手法を用いて深層学習モデルのポストホックな説明を行った。公開データセットISIC-2018は実験およびアブレーション研究に使用される。その結果、モデルの信頼性と透明性をモデル予測に組み込むことで、医療従事者の懐疑を緩和する一方で、従来のベンチマーク上でのモデルの堅牢性を確立した。 Skin cancer is considered to be the most common human malignancy. Around 5 million new cases of skin cancer are recorded in the United States annually. Early identification and evaluation of skin lesions is of great clinical significance, but the disproportionate dermatologist-patient ratio poses significant problem in most developing nations. Therefore a deep learning based architecture, known as SkiNet, is proposed with an objective to provide faster screening solution and assistance to newly trained physicians in the clinical diagnosis process. The main motive behind Skinet's design and development is to provide a white box solution, addressing a critical problem of trust and interpretability which is crucial for the wider adoption of Computer-aided diagnosis systems by the medical practitioners. SkiNet is a two-stage pipeline wherein the lesion segmentation is followed by the lesion classification. 
In our SkiNet methodology, Monte Carlo dropout and test time augmentation techniques have been employed to estimate epistemic and aleatoric uncertainty, while saliency-based methods are explored to provide post-hoc explanations of the deep learning models. The publicly available dataset, ISIC-2018, is used to perform experimentation and ablation studies. The results establish the robustness of the model on the traditional benchmarks while addressing the black-box nature of such models to alleviate the skepticism of medical practitioners by incorporating transparency and confidence to the model's prediction.	翻訳日:2021-04-18 05:59:51 公開日:2020-12-30
# RTS3D: 自律運転のための4次元特徴整合埋め込み空間からのリアルタイムステレオ3D検出 RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving ( http://arxiv.org/abs/2012.15072v1 ) ライセンス: Link先を確認	Peixuan Li, Shun Su, Huaici Zhao	(参考訳) Pseudo-LiDAR表現を用いた最近の画像ベース3Dオブジェクト検出法は優れた機能を示しているが、LiDAR法と比較して効率と精度の顕著な差は残っている。さらに、スタンドアローン深度推定器の過度信頼は、トレーニング段階では大量のピクセル単位のアノテーションを必要とし、推論段階ではより多くの計算を必要とし、実世界のスケーリングアプリケーションを制限する。本稿では,RTS3Dというステレオ画像から効率よく高精度な3Dオブジェクト検出手法を提案する。擬似ライダー類似手法における3次元占有空間と異なり,新しい4次元特徴整合埋め込み (fce) 空間を深度監督なしで3次元シーンの中間表現として設計する。 FCE空間は、ステレオペアから歪んだマルチスケールの特徴一貫性を探索することによって、オブジェクトの構造と意味情報を符号化する。さらに,FCE空間雑音の影響を低減するために,意味誘導型RBF (Radial Basis Function) と構造認識型アテンションモジュールを考案した。 KITTIベンチマークの実験では、RTS3Dはステレオ画像3D検出のための最初の真のリアルタイムシステム(FPS$>$24)であり、従来の最先端手法と比較して平均精度が10\%向上している。コードはhttps://github.com/Banconxuan/RTS3Dで入手できる。 Although the recent image-based 3D object detection methods using Pseudo-LiDAR representation have shown great capabilities, a notable gap in efficiency and accuracy still exist compared with LiDAR-based methods. Besides, over-reliance on the stand-alone depth estimator, requiring a large number of pixel-wise annotations in the training stage and more computation in the inferencing stage, limits the scaling application in the real world. In this paper, we propose an efficient and accurate 3D object detection method from stereo images, named RTS3D. Different from the 3D occupancy space in the Pseudo-LiDAR similar methods, we design a novel 4D feature-consistent embedding (FCE) space as the intermediate representation of the 3D scene without depth supervision. The FCE space encodes the object's structural and semantic information by exploring the multi-scale feature consistency warped from stereo pair. Furthermore, a semantic-guided RBF (Radial Basis Function) and a structure-aware attention module are devised to reduce the influence of FCE space noise without instance mask supervision. 
Experiments on the KITTI benchmark show that RTS3D is the first true real-time system (FPS$>$24) for stereo image 3D detection, while achieving a $10\%$ improvement in average precision compared with the previous state-of-the-art method. The code will be available at https://github.com/Banconxuan/RTS3D	翻訳日:2021-04-18 05:59:27 公開日:2020-12-30
# 視点: ジャミング, 特徴学習, 遅延学習を一体化する深層学習のためのフェーズダイアグラム Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training ( http://arxiv.org/abs/2012.15110v1 ) ライセンス: Link先を確認	Mario Geiger, Leonardo Petrini and Matthieu Wyart	(参考訳) ディープラーニングアルゴリズムは、画像認識やgoプレイなど、さまざまなタスクにおける技術革命の責任を負う。しかし、なぜ働くのかは分かっていない。最終的には、高次元の空間の幾何学とそれに伴う次元の呪いのために、一般的に不可能である高次元のデータを分類する。どのような構造、対称性、不変性が、画像などのデータを学習可能にするかを理解することは、根本的な課題である。他のパズルとしては、(i)学習は高次元の損失を最小化することに対応しており、これは一般に凸ではなく、悪いミニマに陥る可能性がある。 (ii)データが完全に適合している状況でも、適合パラメータの数によってパワーを予測するディープラーニングは増加する。本書では,最近の研究成果を概観し,それらが与える(まだ説明されていない)次元パラドックスの呪いについて考察する。我々は、$(h,\alpha)$平面で、$h$はネットワーク幅、$\alpha$は初期化時のネットワーク出力のスケールであり、MNISTとCIFAR 10のために、その平面におけるパフォーマンスの新たな体系的な尺度を提供する。我々は、異なる学習体制をフェーズダイアグラムにまとめることができると論じる。臨界点の直線は、過小パラメータの位相から過小パラメータの位相を鋭く除く。過パラメータのネットでは、学習は滑らかなクロスオーバーによって分離された2つのレジームで動作することができる。大規模な初期化ではカーネルメソッドに対応し、小さな初期化ではデータの不変量とともに学習することができる。我々は、これらの異なる相の性質、遷移の相違、そしていくつかのオープンな疑問についてレビューする。本治療は,物理システムとの類似性を強調し,議論をスケーリングし,これらの結果を定量的に評価するための数値観測器の開発を行った。 Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck bad minima. (ii) Deep learning predicting power increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. 
We base our theoretical discussion on the $(h,\alpha)$ plane where $h$ is the network width and $\alpha$ the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR 10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, the transition separating them, and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically.	翻訳日:2021-04-18 05:58:39 公開日:2020-12-30
# PMGT-VR:分散化近位勾配アルゴリズムの分散化 PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction ( http://arxiv.org/abs/2012.15010v1 ) ライセンス: Link先を確認	Haishan Ye, Wei Xiong, and Tong Zhang	(参考訳) 本稿では分散複合最適化問題について考察する。本研究では,マルチコンセンサス,勾配追従,分散低減といった複数の手法を組み合わせたpmgt-vrと呼ばれる分散分散分散分散型近位勾配アルゴリズムフレームワークを提案する。提案手法は,集中型アルゴリズムの模倣に依拠し,この枠組みに基づくアルゴリズムが集中型アルゴリズムと同様の収束率を達成することを示す。また、PMGT-SAGAとPMGT-LSVRGの2つの代表アルゴリズムを記述・解析し、それらを既存の最先端近位アルゴリズムと比較する。我々の知る限り、PMGT-VRは分散合成最適化問題を解く最初の分散還元法である。提案手法の有効性を示すために数値実験を行った。 This paper considers the decentralized composite optimization problem. We propose a novel decentralized variance-reduced proximal-gradient algorithmic framework, called PMGT-VR, which is based on a combination of several techniques including multi-consensus, gradient tracking, and variance reduction. The proposed framework relies on an imitation of centralized algorithms and we demonstrate that algorithms under this framework achieve convergence rates similar to that of their centralized counterparts. We also describe and analyze two representative algorithms, PMGT-SAGA and PMGT-LSVRG, and compare them to existing state-of-the-art proximal algorithms. To the best of our knowledge, PMGT-VR is the first variance-reduction method that can solve decentralized composite optimization problems. Numerical experiments are provided to demonstrate the effectiveness of the proposed algorithms.	翻訳日:2021-04-18 05:58:08 公開日:2020-12-30
# エンドツーエンド予測と最適化プロセスのリスク保証 Risk Guarantees for End-to-End Prediction and Optimization Processes ( http://arxiv.org/abs/2012.15046v1 ) ライセンス: Link先を確認	Nam Ho-Nguyen and Fatma K{\i}l{\i}n\c{c}-Karzan	(参考訳) 予測モデルは最適化モデルのパラメータを推定するためにしばしば用いられる。エンドツーエンドの観点では、真の目標は、優れた最適化性能を達成することだが、予測性能は単独で測定される。パラメータの推定における優れた予測性能は、後続の最適化性能をもたらすと信じられているが、それに対する正式な理論的保証は特に不足している。本稿では,予測性能が最適化性能をどのように支配するかを明確に記述できる条件について検討する。より弱い条件では漸近収束の結果が得られるが、より強い条件では予測性能の観点から最適化性能を正確に定量化することができる。一般に、これらの条件の検証は非自明なタスクである。それでも、我々の弱い条件は、学習理論の文献からよく知られたフィッシャー整合性の概念と等価であることを示す。これにより、いくつかの損失関数に対してより弱い条件を簡単にチェックできる。また、二乗誤差損失関数が我々のより強い条件を満たすことも確認する。その結果、二乗損失で測定した予測性能と対称損失関数のクラスと、それに続く最適化性能との正確な理論的関係が導出される。ポートフォリオ最適化,分数ナップサック,多クラス分類問題に関する計算研究において,複数の予測損失関数(フィッシャー一貫性のあるもの,そうでないもの)を用いた最適化性能を比較し,損失関数の一貫性の欠如が性能に有害な影響を与えることを実証する。 Prediction models are often employed in estimating parameters of optimization models. Despite the fact that in an end-to-end view, the real goal is to achieve good optimization performance, the prediction performance is measured on its own. While it is usually believed that good prediction performance in estimating the parameters will result in good subsequent optimization performance, formal theoretical guarantees on this are notably lacking. In this paper, we explore conditions that allow us to explicitly describe how the prediction performance governs the optimization performance. Our weaker condition allows for an asymptotic convergence result, while our stronger condition allows for exact quantification of the optimization performance in terms of the prediction performance. In general, verification of these conditions is a non-trivial task. Nevertheless, we show that our weaker condition is equivalent to the well-known Fisher consistency concept from the learning theory literature. This then allows us to easily check our weaker condition for several loss functions. We also establish that the squared error loss function satisfies our stronger condition. 
Consequently, we derive the exact theoretical relationship between prediction performance measured with the squared loss, as well as a class of symmetric loss functions, and the subsequent optimization performance. In a computational study on portfolio optimization, fractional knapsack and multiclass classification problems, we compare the optimization performance of several prediction loss functions (some that are Fisher consistent and some that are not) and demonstrate that lack of consistency of the loss function can indeed have a detrimental effect on performance.	翻訳日:2021-04-18 05:57:55 公開日:2020-12-30
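The gap between prediction quality and optimization quality described in this abstract can be illustrated on the fractional knapsack, one of the problems the authors mention. The following is only a minimal sketch, not the paper's experimental setup: the item values, weights, capacity, and both prediction vectors are made-up numbers, and `optimization_regret` is a name introduced here for illustration.

```python
def fractional_knapsack(values, weights, capacity):
    """Greedy solution of the fractional knapsack: fill by value/weight ratio.

    Returns the fraction of each item taken (floats in [0, 1]).
    """
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    fractions = [0.0] * len(values)
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        take = min(1.0, remaining / weights[i])
        fractions[i] = take
        remaining -= take * weights[i]
    return fractions


def objective(values, fractions):
    """Total value collected for a given vector of item fractions."""
    return sum(v * f for v, f in zip(values, fractions))


def optimization_regret(true_values, predicted_values, weights, capacity):
    """True objective lost by optimizing against predicted values."""
    best = objective(true_values,
                     fractional_knapsack(true_values, weights, capacity))
    decided = fractional_knapsack(predicted_values, weights, capacity)
    return best - objective(true_values, decided)


# Illustrative (made-up) instance.
true_values = [10.0, 6.0, 4.2]
weights = [5.0, 4.0, 2.0]
capacity = 5.0

good_prediction = [9.0, 5.0, 4.5]  # inaccurate values, same ratio ordering
bad_prediction = [4.0, 8.0, 1.0]   # puts the wrong item first

regret_good = optimization_regret(true_values, good_prediction, weights, capacity)
regret_bad = optimization_regret(true_values, bad_prediction, weights, capacity)
```

In this toy instance the first prediction has nonzero prediction error yet incurs no optimization regret, because it preserves the value-to-weight ordering, while the second prediction changes the ordering and loses objective value.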
# メカニカルエンジニアリングにおけるデータサイエンスとそのアプローチ A Review into Data Science and Its Approaches in Mechanical Engineering ( http://arxiv.org/abs/2012.15358v1 ) ライセンス: Link先を確認	Ashkan Yousefi Zadeh, Meysam Shahbazy	(参考訳) 今日では、インテリジェントシステムを使用して、デバイスやファクトリのさまざまなコンポーネントのパフォーマンスと最適化を改善することは避けられない。さらに、ビジネスや医学研究、工学研究などにおいて、より良い意思決定を行うための適切な予測を持つことが不可欠です。これらの手法の最新かつ最も広く使われている分野の1つはデータサイエンスと呼ばれる分野であり、科学者、エンジニア、工場の全員がキャリアで学び、利用する必要がある。本稿では,データサイエンスについて概説し,その手法,特に機械工学における利用法,および機械工学におけるデータサイエンスの展開方法について概説する。はじめに、異なるデータサイエンスの定義とその技術における背景をレビューした。以下に、データ科学者が研究で行うべきプロセスであるデータサイエンス方法論について論じる。また、その研究にデータサイエンス手法を用いた機械工学分野の研究について概説する。最終的に、論文でレビューされた課題、なぜ機械工学の研究やプロジェクトにおいてデータサイエンスを使う必要があるのか、という議論がなされている。 Nowadays it is inevitable to use intelligent systems to improve the performance and optimization of different components of devices or factories. Furthermore, it's so essential to have appropriate predictions to make better decisions in businesses, medical studies, and engineering studies, etc. One of the newest and most widely used of these methods is a field called Data Science that all of the scientists, engineers, and factories need to learn and use in their careers. This article briefly introduced data science and reviewed its methods, especially it's usages in mechanical engineering and challenges and ways of developing data science in mechanical engineering. In the introduction, different definitions of data science and its background in technology reviewed. In the following, data science methodology which is the process that a data scientist needs to do in its works been discussed. Further, some researches in the mechanical engineering area that used data science methods in their studies, are reviewed. Eventually, it has been discussed according to the subjects that have been reviewed in the article, why it is necessary to use data science in mechanical engineering researches and projects.	翻訳日:2021-04-18 05:57:21 公開日:2020-12-30
# 法医学的目的のための畳み込み長短期記憶ネットワークによる損傷指紋認識 Damaged Fingerprint Recognition by Convolutional Long Short-Term Memory Networks for Forensic Purposes ( http://arxiv.org/abs/2012.15041v1 ) ライセンス: Link先を確認	Jaouhar Fattahi and Mohamed Mejri	(参考訳) 指紋認識は、しばしば犯罪者に対する証拠を確立するためのゲームを変えるステップである。しかし、犯罪者が故意に指紋を改ざんして、技術者や自動センサーが指紋を認識するのを難しくし、捜査官が法医学的な手続きで彼らに対して強力な証拠を確立するのが面倒になることが、ますますわかってきています。この意味で、ディープラーニングは、損傷した指紋の認識を支援する主要候補として現れる。特に畳み込みアルゴリズムです本稿では,Convolutional Long Short-Term Memory Networkによる損傷指紋の認識に着目した。我々は,このモデルのアーキテクチャを示し,95%の精度,99%の精度,95%のリコール,99%のaucに接近する性能を示す。 Fingerprint recognition is often a game-changing step in establishing evidence against criminals. However, we are increasingly finding that criminals deliberately alter their fingerprints in a variety of ways to make it difficult for technicians and automatic sensors to recognize their fingerprints, making it tedious for investigators to establish strong evidence against them in a forensic procedure. In this sense, deep learning comes out as a prime candidate to assist in the recognition of damaged fingerprints. In particular, convolution algorithms. In this paper, we focus on the recognition of damaged fingerprints by Convolutional Long Short-Term Memory networks. We present the architecture of our model and demonstrate its performance which exceeds 95% accuracy, 99% precision, and approaches 95% recall and 99% AUC.	翻訳日:2021-04-18 05:57:03 公開日:2020-12-30
# 品質アテンション・ジェネレーティブ・アドバイザリ・ネットワークによる未ペア画像強調 Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network ( http://arxiv.org/abs/2012.15052v1 ) ライセンス: Link先を確認	Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, and Sam Kwong	(参考訳) 本研究では,ユーザが提供する高品質画像の特徴を活かし,低画質画像のエンリッチ化を可能にする非ペア画像エンハンスメントモデルについて検討する。本稿では,品質アテンションモジュール (QAM) を組み込んだ双方向生成支援ネットワーク (GAN) に基づく,未ペアデータに基づく品質アテンション生成敵ネットワーク (QAGAN) を提案する。提案したQAGANの重要な新規性は、2つのドメインから直接ドメイン関連品質の注意を学習するようにジェネレータに注入されたQAMにある。より具体的には、提案したQAMは、空間的特性から意味的特性を効果的に選択し、チャネル的にそれぞれスタイル関連属性を適応的に組み込むことを可能にする。そこで,提案したQAGANでは,識別器だけでなく,生成器が両方のドメインに直接アクセスし,生成器がマッピング関数を学習できるようにする。提案手法は, 未経験学習に基づく最先端の手法と比較して, 客観的・主観的評価の両面において, 優れた性能が得られることを示す。 In this work, we aim to learn an unpaired image enhancement model, which can enrich low-quality images with the characteristics of high-quality images provided by users. We propose a quality attention generative adversarial network (QAGAN) trained on unpaired data based on the bidirectional Generative Adversarial Network (GAN) embedded with a quality attention module (QAM). The key novelty of the proposed QAGAN lies in the injected QAM for the generator such that it learns domain-relevant quality attention directly from the two domains. More specifically, the proposed QAM allows the generator to effectively select semantic-related characteristics from the spatial-wise and adaptively incorporate style-related attributes from the channel-wise, respectively. Therefore, in our proposed QAGAN, not only discriminators but also the generator can directly access both domains which significantly facilitates the generator to learn the mapping function. Extensive experimental results show that, compared with the state-of-the-art methods based on unpaired learning, our proposed method achieves better performance in both objective and subjective evaluations.	翻訳日:2021-04-18 05:56:51 公開日:2020-12-30
# 脳動脈瘤セグメンテーションにおける大コンテキストの探索 Exploring Large Context for Cerebral Aneurysm Segmentation ( http://arxiv.org/abs/2012.15136v1 ) ライセンス: Link先を確認	Jun Ma, Ziwei Nie	(参考訳) 脳動脈瘤の診断, モニタリング, 治療計画には, 3次元CTからの動脈瘤の自動分画が重要である。本報告では,MICCAI 2020 CADA チャレンジにおける大動脈瘤分節法の主な技術について概説する。主な貢献は、大きなパッチサイズで3D U-Netを設定することで、大きなコンテキストを得ることができることです。 MICCAI 2020 CADAテストデータセットでは,平均Jaccard 0.7593で2位となった。私たちのコードとトレーニングされたモデルは、 \url{https://github.com/junma11/cada2020}で公開されている。 Automated segmentation of aneurysms from 3D CT is important for the diagnosis, monitoring, and treatment planning of the cerebral aneurysm disease. This short paper briefly presents the main technique details of the aneurysm segmentation method in the MICCAI 2020 CADA challenge. The main contribution is that we configure the 3D U-Net with a large patch size, which can obtain the large context. Our method ranked second on the MICCAI 2020 CADA testing dataset with an average Jaccard of 0.7593. Our code and trained models are publicly available at \url{https://github.com/JunMa11/CADA2020}.	翻訳日:2021-04-18 05:56:30 公開日:2020-12-30
# Exact, Approximate, Error-Tolerant Graph Matchingのアルゴリズム Some Algorithms on Exact, Approximate and Error-Tolerant Graph Matching ( http://arxiv.org/abs/2012.15279v1 ) ライセンス: Link先を確認	Shri Prakash Dwivedi	(参考訳) このグラフは、その表現力とオブジェクト間の関係を示す固有の能力から、工学や科学において最も広く使われている数学的構造の一つである。本研究の目的は,グラフの表現力を利用した新しいグラフマッチング手法を導入し,構造的パターン認識に適用することである。本稿では,様々な正確かつ不正確なグラフマッチング手法について広範な調査を行う。準同型の概念を用いたグラフマッチングについて述べる。グラフマッチングアルゴリズムのカテゴリが提示され、関係性のある指標を用いて重要でないノードを除去することでグラフサイズを小さくする。本稿では,より少ない次数ノードを縮合することで,与えられたグラフを別のグラフに変換するノード収縮を用いた誤り耐性グラフマッチング手法を提案する。この手法を用いてグラフ編集距離を延長し,実行時間と精度のトレードオフとして利用することができる。本稿では,各ノードの集中度情報を利用することで,グラフマッチングのアプローチについて述べる。グラフマッチング問題は本質的にグラフの幾何学と位相に関係している。幾何グラフを用いたグラフ類似度測定の新しい手法を提案する。 2つの幾何グラフ間の頂点距離を頂点の位置を用いて定義し、頂点のみを持つすべてのグラフの集合上の計量であることを示す。辺の角方向,長さ,位置に基づいて2つのグラフ間の辺距離を定義する。次に頂点距離と辺距離の概念を組み合わせて、2つの幾何グラフの間のグラフ距離を定義し、それを計量であることを示す。最後に,提案するグラフ類似性フレームワークを用いて,正確かつエラーに耐性のあるグラフマッチングを行う。 The graph is one of the most widely used mathematical structures in engineering and science because of its representational power and inherent ability to demonstrate the relationship between objects. The objective of this work is to introduce the novel graph matching techniques using the representational power of the graph and apply it to structural pattern recognition applications. We present an extensive survey of various exact and inexact graph matching techniques. Graph matching using the concept of homeomorphism is presented. A category of graph matching algorithms is presented, which reduces the graph size by removing the less important nodes using some measure of relevance. We present an approach to error-tolerant graph matching using node contraction where the given graph is transformed into another graph by contracting smaller degree nodes. We use this scheme to extend the notion of graph edit distance, which can be used as a trade-off between execution time and accuracy. 
We describe an approach to graph matching that uses node centrality information to reduce the graph size by removing a fraction of nodes from both graphs based on a given centrality measure. The graph matching problem is inherently linked to the geometry and topology of graphs. We introduce a novel approach to measure graph similarity using geometric graphs. We define the vertex distance between two geometric graphs using the position of their vertices and show it to be a metric over the set of all graphs with vertices only. We define edge distance between two graphs based on the angular orientation, length and position of the edges. Then we combine the notion of vertex distance and edge distance to define the graph distance between two geometric graphs and show it to be a metric. Finally, we use the proposed graph similarity framework to perform exact and error-tolerant graph matching.	翻訳日:2021-04-18 05:56:22 公開日:2020-12-30
# 次数補正ブロックモデルに対する調整型チ二乗試験 Adjusted chi-square test for degree-corrected block models ( http://arxiv.org/abs/2012.15047v1 ) ライセンス: Link先を確認	Linfan Zhang and Arash A. Amini	(参考訳) 次数補正確率ブロックモデル(DCSBM)の適合性試験を提案する。このテストは、d_1,\dots,d_n$観測値を持つ多項分布群間の平均値の等式を測定する調整されたチ二乗統計に基づく。ネットワークモデルの文脈では、$n$という多重項の数は、観測値の$d_i$よりもはるかに速く成長するので、設定は古典的な漸近から逸脱する。単純な調整は、$\{d_i\}$の調和平均が無限大に大きくなる限り、統計学が分布に収束することを示す。この結果は、$d_i$の役割をノード$i$の度合いで果たすような大きなスパースネットワークに適用できる。我々の分布結果は漸近的ではなく、明示的な定数を持ち、目標分布へのコルモゴロフ-スミルノフ距離の有限サンプル境界を与える。順次適用した場合、テストはコミュニティの数を決定するためにも使用できる。テストはadjacency matrixの(row)圧縮バージョンで動作し、次数で条件付けされ、その結果、大きなスパースネットワークに対して高度にスケーラブルである。我々は、$K$コミュニティのテスト時に$(K+1)$-communityの割り当てに基づいて列を圧縮するという新しいアイデアを取り入れた。この手法は, 計算効率を犠牲にすることなく, 逐次的応用のパワーを増大させ, コミュニティ数回復における一貫性を実証する。テスト統計は特定の代替品に依存しないため、そのユーティリティはシーケンシャルなテストを超えて、dcsbmファミリー以外の幅広い代替品に対して同時にテストすることができる。シミュレーションおよび実データを用いた大規模数値実験によるアプローチの有効性を示す。特に、Facebook-100データセットにテストを適用すると、少数のコミュニティを持つDCSBMは、ほぼすべてのケースに適していないことが分かりました。 We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $\{d_i\}$ grows to infinity. This result applies to large sparse networks where the role of $d_i$ is played by the degree of node $i$. Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution. When applied sequentially, the test can also be used to determine the number of communities. 
The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the columns based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family. We show the effectiveness of the approach by extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, we find that a DCSBM with a small number of communities is far from a good fit in almost all cases.	翻訳日:2021-04-18 05:55:57 公開日:2020-12-30
# 弾性ネットによる特徴ランク付けと選択 Elastic Net based Feature Ranking and Selection ( http://arxiv.org/abs/2012.14982v1 ) ライセンス: Link先を確認	Shaode Yu, Haobo Chen, Hang Yu, Zhicheng Zhang, Xiaokun Liang, Wenjian Qin, Yaoqin Xie, Ping Shi	(参考訳) 特徴選択はデータ表現とインテリジェントな診断において重要である。 Elastic netは最も広く使われている機能セレクタの1つである。しかしながら、選択された特徴はトレーニングデータに依存しており、正規化回帰専用の重み付けは、特徴ランキングに使用される場合の重要性に関係せず、モデル解釈可能性と拡張性が低下する。本研究では,データ分割と弾性ネットによる特徴選択を複数回行った結果,直感的なアイデアが得られた。選択された特徴の頻度に関係し、特徴の重要性を示す指標として周波数を使用する。特徴量を周波数順にソートした後、線形支持ベクトルマシンは漸進的に分類を行う。最終的に、予測性能を比較して識別特徴のコンパクトなサブセットを選択する。乳がんデータセット (BCDR-F03, WDBC, GSE 10810, GSE 15852) の実験結果から, 提案フレームワークは弾力性ネットに対する競争力や優れた性能を達成し, より少ない特徴を連続的に選択できることが示唆された。高次元の小型データセットの一貫性をさらに強化するには、今後の作業にもっと注意を払う必要がある。提案されたフレームワークはオンラインでアクセスできる(https://github.com/nicoyucn/elasticnetfr)。 Feature selection is important in data representation and intelligent diagnosis. Elastic net is one of the most widely used feature selectors. However, the features selected are dependant on the training data, and their weights dedicated for regularized regression are irrelevant to their importance if used for feature ranking, that degrades the model interpretability and extension. In this study, an intuitive idea is put at the end of multiple times of data splitting and elastic net based feature selection. It concerns the frequency of selected features and uses the frequency as an indicator of feature importance. After features are sorted according to their frequency, linear support vector machine performs the classification in an incremental manner. At last, a compact subset of discriminative features is selected by comparing the prediction performance. Experimental results on breast cancer data sets (BCDR-F03, WDBC, GSE 10810, and GSE 15852) suggest that the proposed framework achieves competitive or superior performance to elastic net and with consistent selection of fewer features. 
Further improving its consistency on high-dimensional, small-sample-size data sets deserves more attention in our future work. The proposed framework is accessible online (https://github.com/NicoYuCN/elasticnetFR).	翻訳日:2021-04-18 05:55:26 公開日:2020-12-30
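The frequency-based ranking idea in this abstract (repeat the data split, run a feature selector each time, and rank features by how often they are selected) can be sketched as follows. This is a hedged illustration, not the authors' code: `correlation_selector` is a hypothetical stand-in used in place of the elastic net so the example needs only the standard library, and the threshold, number of splits, and toy data are assumptions. The incremental linear SVM step of the paper is not shown.

```python
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlation_selector(rows, targets, threshold=0.8):
    """Hypothetical stand-in for the elastic net: keep features whose
    absolute correlation with the target exceeds `threshold`."""
    n_features = len(rows[0])
    return [j for j in range(n_features)
            if abs(pearson([r[j] for r in rows], targets)) > threshold]


def selection_frequency(rows, targets, selector, n_splits=50,
                        train_fraction=0.7, seed=0):
    """Count how often each feature is selected over random training splits."""
    rng = random.Random(seed)
    counts = Counter()
    indices = list(range(len(rows)))
    n_train = max(2, int(train_fraction * len(rows)))
    for _ in range(n_splits):
        sample = rng.sample(indices, n_train)
        selected = selector([rows[i] for i in sample],
                            [targets[i] for i in sample])
        counts.update(selected)
    return counts


# Toy data: feature 0 determines the target, feature 1 is constant,
# feature 2 is the parity of the sample index.
rows = [[float(i), 1.0, float(i % 2)] for i in range(20)]
targets = [2.0 * i + 1.0 for i in range(20)]

counts = selection_frequency(rows, targets, correlation_selector)
ranking = sorted(range(3), key=lambda j: counts[j], reverse=True)
```

Any selector with the same `(rows, targets) -> list of feature indices` shape, including an actual elastic net fit, could be passed in place of the stand-in.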
# 繰り返しニューラルネットワークを用いたスタックベースバッファオーバーフロー検出 Stack-based Buffer Overflow Detection using Recurrent Neural Networks ( http://arxiv.org/abs/2012.15116v1 ) ライセンス: Link先を確認	William Arild Dahl, Laszlo Erdodi, Fabio Massimo Zennaro	(参考訳) ソフトウェアにおける脆弱性の検出は、アプリケーションの開発とデプロイにおいて重要な課題である。最も知られて危険な脆弱性の1つはスタックベースのバッファオーバーフローであり、潜在的な攻撃者が悪意のあるコードを実行できる可能性がある。本稿では,最近の機械学習モデル,特にリカレントニューラルネットワークを用いて,プログラムのアセンブリコードにスタックベースのバッファオーバーフロー脆弱性を検出することを検討する。アセンブリコードは汎用的で一般的な表現であるため、この言語に焦点を当てることで、複数の異なるプログラミング言語で記述されたプログラムを検討できる。さらに,コードを自然言語として扱うことができるという仮説をサブスクライブし,自然言語処理に一般的に使用される標準アーキテクチャを用いてアセンブリコードを処理する。本研究は,自然言語仮説の妥当性と,再帰的ニューラルネットワークを用いた脆弱性検出の可能性を確認することを目的とした一連の実験を行う。その結果,当社のアーキテクチャは,コンテキストに強く依存する,微妙なスタックベースのバッファオーバーフロー脆弱性を捕捉できることが分かった。 Detecting vulnerabilities in software is a critical challenge in the development and deployment of applications. One of the most known and dangerous vulnerabilities is stack-based buffer overflows, which may allow potential attackers to execute malicious code. In this paper we consider the use of modern machine learning models, specifically recurrent neural networks, to detect stack-based buffer overflow vulnerabilities in the assembly code of a program. Since assembly code is a generic and common representation, focusing on this language allows us to potentially consider programs written in several different programming languages. Moreover, we subscribe to the hypothesis that code may be treated as natural language, and thus we process assembly code using standard architectures commonly employed in natural language processing. We perform a set of experiments aimed at confirming the validity of the natural language hypothesis and the feasibility of using recurrent neural networks for detecting vulnerabilities. 
Our results show that our architecture is able to capture subtle stack-based buffer overflow vulnerabilities that strongly depend on the context, thus suggesting that this approach may be extended to real-world settings, as well as to other forms of vulnerability detection.	翻訳日:2021-04-18 05:55:06 公開日:2020-12-30
# 公共交通機関の類似性分類 Similarity Classification of Public Transit Stations ( http://arxiv.org/abs/2012.15267v1 ) ライセンス: Link先を確認	Hannah Bast, Patrick Brosi and Markus N\"ather	(参考訳) 2つの公共交通機関の駅識別子 A と B がラベルと地理的座標を持つ場合、A と B が同一の駅を表すかどうかを決定する。例えば "St Pancras International at (51.5306, -0.1253) や "London St Pancras at (51.5319, -0.1269) では、答えは "Yes" となる。この問題は、地理的情報システム、スケジュールのマージ、ルート計画、マップマッチングなど、公共交通機関のデータを使用する領域で頻繁に発生する。地理的距離と単純な文字列類似度尺度に基づくいくつかのベースライン手法を検討する。また、より精巧な文字列類似度尺度を実験し、手動で正規化ルールを作成します。実験の結果,これらのベースライン法は良好な結果をもたらすが,十分に満足できるものではないことがわかった。そこで我々は,2つの駅間のトリグラムの一致,距離,相互織りグリッド上の位置を訓練したランダムフォレスト分類器に基づくアプローチを開発した。すべてのアプローチは、OpenStreetMap (OSM)データから得られた幅広い真実のデータセットに基づいて評価される。全てのデータセットにおいて、我々の学習に基づくアプローチはF1スコアを99%以上達成し、最も精巧なベースラインアプローチ(TFIDFスコアと地理的距離に基づく)でさえもF1スコアを94%以上達成し、地理的距離閾値を用いた単純なアプローチはF1スコアを75%しか達成していない。トレーニングとテストの両方のデータセットが公開されています。 We study the following problem: given two public transit station identifiers A and B, each with a label and a geographic coordinate, decide whether A and B describe the same station. For example, for "St Pancras International" at (51.5306, -0.1253) and "London St Pancras" at (51.5319, -0.1269), the answer would be "Yes". This problem frequently arises in areas where public transit data is used, for example in geographic information systems, schedule merging, route planning, or map matching. We consider several baseline methods based on geographic distance and simple string similarity measures. We also experiment with more elaborate string similarity measures and manually created normalization rules. Our experiments show that these baseline methods produce good, but not fully satisfactory results. We therefore develop an approach based on a random forest classifier which is trained on matching trigrams between two stations, their distance, and their position on an interwoven grid. 
All approaches are evaluated on extensive ground truth datasets we generated from OpenStreetMap (OSM) data: (1) The union of Great Britain and Ireland and (2) the union of Germany, Switzerland, and Austria. On all datasets, our learning-based approach achieves an F1 score of over 99%, while even the most elaborate baseline approach (based on TFIDF scores and the geographic distance) achieves an F1 score of at most 94%, and a naive approach of using a geographical distance threshold achieves an F1 score of only 75%. Both our training and testing datasets are publicly available.	翻訳日:2021-04-18 05:54:47 公開日:2020-12-30
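A baseline of the kind this abstract mentions (geographic distance combined with a simple string similarity) might look like the sketch below. It is not the authors' random forest classifier and not necessarily their baseline either: the character-trigram Jaccard similarity, both thresholds, and the function names are assumptions chosen for illustration. The example pair is the one given in the abstract.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def trigrams(label):
    """Set of character trigrams of the lower-cased label."""
    s = label.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_jaccard(a, b):
    """Jaccard similarity of the two labels' trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def same_station(a, b, max_dist_m=250.0, min_sim=0.2):
    """Threshold rule; both thresholds are illustrative assumptions."""
    (label_a, lat_a, lon_a), (label_b, lat_b, lon_b) = a, b
    close = haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_dist_m
    similar = trigram_jaccard(label_a, label_b) >= min_sim
    return close and similar


# Example pair from the abstract, plus a far-away station with the same label.
st_pancras_a = ("St Pancras International", 51.5306, -0.1253)
st_pancras_b = ("London St Pancras", 51.5319, -0.1269)
far_away = ("London St Pancras", 48.8809, 2.3553)

distance = haversine_m(51.5306, -0.1253, 51.5319, -0.1269)
match = same_station(st_pancras_a, st_pancras_b)
no_match = same_station(st_pancras_b, far_away)
```

Hand-set thresholds like these are exactly what a learned classifier replaces; the sketch only shows which two signals such a rule combines.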
# タブラルファイナンシャルデータを用いた信用リスクモニタリングのための逐次深層学習 Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data ( http://arxiv.org/abs/2012.15330v1 ) ライセンス: Link先を確認	Jillian M. Clements, Di Xu, Nooshin Yousefi, Dmitry Efimov	(参考訳) 機械学習は、銀行業界における財政的損失を防ぐ上で重要な役割を果たす。おそらく、毎年数十億ドルの損失をもたらす可能性のある最も関連する予測タスクは、信用リスクの評価(すなわち債務不履行のリスク)である。今日、信用リスクを予測するための機械学習からの利益の多くは、勾配強化決定木モデルによってもたらされている。しかし、これらの利益は高価な新しいデータソースや高度に設計されたフィーチャを追加せずに高まり始めます。本稿では,新たなモデル入力に依存しない深層学習を用いて,信用リスク評価のための新しい手法を考案する試みについて述べる。本稿では,コストのかかる財務データの長い履歴列を利用する深層再帰的および因果的畳み込みに基づくニューラルネットワークを用いた,新たなクレジットカードトランザクションサンプリング手法を提案する。我々は,時間的畳み込みネットワークを用いた逐次的ディープラーニングアプローチが,非シーケンスツリーベースモデルのベンチマークを上回っており,大幅な貯蓄と早期の信用リスク検出を達成していることを示す。また,本手法により,シーケンスを効率よくメモリに格納し,高速なオンライン学習と推論を行うことが可能となる実運用環境において,本手法が採用される可能性を示した。 Machine learning plays an essential role in preventing financial losses in the banking industry. Perhaps the most pertinent prediction task that can result in billions of dollars in losses each year is the assessment of credit risk (i.e., the risk of default on debt). Today, much of the gains from machine learning to predict credit risk are driven by gradient boosted decision tree models. However, these gains begin to plateau without the addition of expensive new data sources or highly engineered features. In this paper, we present our attempts to create a novel approach to assessing credit risk using deep learning that does not rely on new model inputs. We propose a new credit card transaction sampling technique to use with deep recurrent and causal convolution-based neural networks that exploits long historical sequences of financial data without costly resource requirements. We show that our sequential deep learning approach using a temporal convolutional network outperformed the benchmark non-sequential tree-based model, achieving significant financial savings and earlier detection of credit risk. 
We also demonstrate the potential for our approach to be used in a production environment, where our sampling technique allows for sequences to be stored efficiently in memory and used for fast online learning and inference.	翻訳日:2021-04-18 05:54:19 公開日:2020-12-30
# スペクトログラムとCNNを用いたLoRaの高周波指紋識別 Radio Frequency Fingerprint Identification for LoRa Using Spectrogram and CNN ( http://arxiv.org/abs/2101.01668v1 ) ライセンス: Link先を確認	Guanxiong Shen, Junqing Zhang, Alan Marshall, Linning Peng, and Xianbin Wang	(参考訳) RFFI(Radio frequency fingerprint Identification)は、無線デバイス固有のハードウェア特性に依存する新しいデバイス認証技術である。我々は,spectrogram and convolutional neural network (cnn) に基づく長距離(lora)システムのためのrffiスキームを設計した。具体的には,lora信号の細粒度時間周波数特性を表すために分光計を用いた。さらに, 即時キャリア周波数オフセット(CFO)がドリフトしており, 誤分類が発生し, システムの安定性を著しく損なうことが判明した。最後に、CNN出力を推定したCFOで調整できるハイブリッド分類器を設計した。 CFOの平均値は比較的安定しているため、推定されたCFOが範囲外になるCNN予測を除外することができる。 20個のLoRaデバイス(DUT)とUniversal Software Radio Peripheral (USRP) N210受信機を用いて実無線環境で実験を行った。 IQベースのRFFIスキームとFFTベースのRFFIスキームを比較することで、スペクトルベースのスキームは20のLoRa DUTに対して97.61%の最良の分類精度に達することができる。 Radio frequency fingerprint identification (RFFI) is an emerging device authentication technique that relies on intrinsic hardware characteristics of wireless devices. We designed an RFFI scheme for Long Range (LoRa) systems based on spectrogram and convolutional neural network (CNN). Specifically, we used spectrogram to represent the fine-grained time-frequency characteristics of LoRa signals. In addition, we revealed that the instantaneous carrier frequency offset (CFO) is drifting, which will result in misclassification and significantly compromise the system stability; we demonstrated CFO compensation is an effective mitigation. Finally, we designed a hybrid classifier that can adjust CNN outputs with the estimated CFO. The mean value of CFO remains relatively stable, hence it can be used to rule out CNN predictions whose estimated CFO falls out of the range. We performed experiments in real wireless environments using 20 LoRa devices under test (DUTs) and a Universal Software Radio Peripheral (USRP) N210 receiver. By comparing with the IQ-based and FFT-based RFFI schemes, our spectrogram-based scheme can reach the best classification accuracy, i.e., 97.61% for 20 LoRa DUTs.	
翻訳日:2021-04-18 05:53:40 公開日:2020-12-30
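The hybrid classifier described in this abstract, which uses the estimated CFO to rule out CNN predictions, could be sketched roughly as below. This is an assumption-laden illustration rather than the authors' implementation: `hybrid_predict`, the per-device reference CFO table, the tolerance, and the fallback to the plain CNN decision when no device is compatible are all hypothetical choices, and the numbers are made up.

```python
def hybrid_predict(cnn_probs, estimated_cfo_hz, reference_cfo_hz, tolerance_hz):
    """Pick the most probable device whose reference CFO is compatible
    with the CFO estimated from the received packet.

    cnn_probs        -- per-device probabilities from the CNN
    estimated_cfo_hz -- CFO estimated for the packet being classified
    reference_cfo_hz -- per-device mean CFO recorded at enrolment (assumed)
    tolerance_hz     -- allowed deviation from the reference (assumed)
    """
    candidates = [i for i, ref in enumerate(reference_cfo_hz)
                  if abs(estimated_cfo_hz - ref) <= tolerance_hz]
    if not candidates:
        # No device matches the CFO: fall back to the plain CNN decision.
        candidates = list(range(len(cnn_probs)))
    return max(candidates, key=lambda i: cnn_probs[i])


# Made-up numbers: the CNN slightly prefers device 0, but the packet's CFO
# is only compatible with device 1.
cnn_probs = [0.48, 0.45, 0.07]
reference_cfo_hz = [-1200.0, 2500.0, 300.0]

cnn_only = max(range(len(cnn_probs)), key=lambda i: cnn_probs[i])
with_cfo = hybrid_predict(cnn_probs, 2400.0, reference_cfo_hz, 500.0)
```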
# 非並列調音音声合成のための多視点時間アライメント

Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis ( http://arxiv.org/abs/2012.15184v1 )

ライセンス: Link先を確認

Jose A. Gonzalez-Lopez and Miriam Gonzalez-Atienza and Alejandro Gomez-Alanis and Jose L. Perez-Cordoba and Phil D. Green

(参考訳) A2A(Articulatory-to-acoustic)合成は、調音器官の動きを計測したデータから可聴音声を生成することを指す。この手法には、病気や怪我のためにもはや話せない人々への口頭コミュニケーションの回復など、多くの応用がある。これまで最も成功した技術は教師付き学習フレームワークを採用しており、時間同期した調音・音声の記録を用いて教師付き機械学習アルゴリズムを訓練し、後から調音運動を音声へマッピングするために使用する。しかし、これは並列データが利用できない場合、例えば、既に声を失い、調音データのみを取得できるような場合、A2A技術の適用を妨げている。本研究では、多視点学習理論に基づくこの問題に対する解法を提案する。提案アルゴリズムは、両ビューが最大相関する共通潜在空間に投影したうえで動的時間伸縮(dynamic time warping)を適用することにより、同一の音声内容を含む非整列の調音系列と音響系列の対の間の最適な時間アライメントを求める。この概念のいくつかの変種が議論され、検討されている。非整列シナリオで生成された音声の質は、並列シナリオで得られたものと同程度であることを示す。

Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury. Most successful techniques so far adopt a supervised learning framework, in which time-synchronous articulatory-and-speech recordings are used to train a supervised machine learning algorithm that can be used later to map articulator movements to speech. This, however, prevents the application of A2A techniques in cases where parallel data is unavailable, e.g., a person has already lost her/his voice and only articulatory data can be captured. In this work, we propose a solution to this problem based on the theory of multi-view learning. The proposed algorithm attempts to find an optimal temporal alignment between pairs of non-aligned articulatory-and-acoustic sequences with the same phonetic content by projecting them into a common latent space where both views are maximally correlated and then applying dynamic time warping. Several variants of this idea are discussed and explored. We show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario.

翻訳日:2021-04-18 05:53:23 公開日:2020-12-30

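The alignment step named in the abstract above, dynamic time warping (DTW), can be sketched as follows. This is only the textbook DTW recursion on scalar sequences; the projection of both views into a shared latent space, which the paper applies first, is omitted here, and the function name `dtw` and the absolute-difference cost are assumptions for illustration.

```python
def dtw(seq_a, seq_b):
    """Return (total_cost, path) aligning two 1-D sequences with dynamic time warping."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of seq_a with the first j of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` aligns the repeated `2` in the second sequence to the single `2` in the first at zero total cost.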
# 予算付き確率的効用最大化のためのテストスコアアルゴリズム

Test Score Algorithms for Budgeted Stochastic Utility Maximization ( http://arxiv.org/abs/2012.15194v1 )

ライセンス: Link先を確認

Dabeen Lee, Milan Vojnovic, Se-Young Yun

(参考訳) 効用最大化問題を解くための個別アイテムスコアに基づくアルゴリズム設計の最近の発展を動機として、我々は、予算付き確率的効用最大化問題の解法として、観測された個々のアイテムのパフォーマンスデータの統計量として定義されるテストスコアを使用する枠組みを研究する。既存のスコアリング機構である複製テストスコアを拡張して、アイテムの値だけでなく不均一なアイテムコストも取り込む。複製テストスコアのみに基づいてアイテムを選択する自然な貪欲アルゴリズムが、幅広い効用関数のクラスに対して最適値の定数倍以内の解を出力することを示す。我々のアルゴリズムと近似保証は、テストスコアが個々のアイテム値の周辺分布に関する特定の期待値のノイズを含む推定値であると仮定しており、これによりアルゴリズムが実用的になり、ノイズのない推定を仮定する先行研究を拡張している。さらに、同じ近似保証を維持しつつ、アイテムがストリーミング形式で到着する設定にアルゴリズムを適応できることを示す。合成データとAcademia.StackExchange Q&Aフォーラムのデータセットを用いた数値結果を示し、我々のテストスコアアルゴリズムが、関数値を評価するために値オラクルへのアクセスを必要とするベンチマークアルゴリズムと競合する性能、場合によってはそれを上回る性能を達成できることを示す。

Motivated by recent developments in designing algorithms based on individual item scores for solving utility maximization problems, we study the framework of using test scores, defined as a statistic of observed individual item performance data, for solving the budgeted stochastic utility maximization problem. We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values. We show that a natural greedy algorithm that selects items solely based on their replication test scores outputs solutions within a constant factor of the optimum for a broad class of utility functions. Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values with respect to marginal distributions of individual item values, thus making our algorithms practical and extending previous work that assumes noiseless estimates. Moreover, we show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee. We present numerical results, using synthetic data and data sets from the Academia.StackExchange Q&A forum, which show that our test score algorithm can achieve competitiveness, and in some cases better performance than a benchmark algorithm that requires access to a value oracle to evaluate function values.

翻訳日:2021-04-18 05:53:04 公開日:2020-12-30

# 長距離相互作用をもつ不規則2次元ハバードモデルの半古典力学

Semiclassical dynamics of a disordered two-dimensional Hubbard model with long-range interactions ( http://arxiv.org/abs/2002.05549v2 )

ライセンス: Link先を確認

Adam S. Sajna, Anatoli Polkovnikov

(参考訳) 相互作用するフェルミオンの二次元系におけるクエンチダイナミクスを、半古典的な切断ウィグナー近似(TWA)の範囲で解析する。短距離と長距離の相互作用を持つモデルを考える。後者の場合、TWAは非常に正確であり、半古典的ハミルトニアンが正しく同定されるならば、無限範囲極限において漸近的に厳密となることを示す。TWAでは、電荷とスピンの異なる動的時間スケールをはっきりと区別することができる。興味深いことに、弱いまたは中程度の乱れの強さでは電荷の劣拡散挙動が観察される一方、スピンは拡散的なダイナミクスを示す。強い乱れでは、量子フィッシャー情報は時間に対して対数的な成長を示し、電荷の方がスピンよりも増加が遅い。短距離モデルとは対照的に、初期状態のドメイン壁のような強い不均一性は、特に弱い乱れにおいて、熱化ダイナミクスを著しく遅くしうることを示す。この振る舞いは、このような系における多体局在の可能性を解析することを目的とした冷却原子実験プロトコルの設計において、さらなる課題をもたらす可能性がある。このアプローチでは多体局在相の存在について明確な主張はできないが、短距離モデルと長距離モデルの両方において、乱れの強さの関数として、急速に熱化する領域から遅いガラス的な領域への非常に速いクロスオーバーが見られる。

Quench dynamics in a two-dimensional system of interacting fermions is analyzed within the semiclassical truncated Wigner approximation (TWA). The models with short-range and long-range interactions are considered. We show that in the latter case, the TWA is very accurate, becoming asymptotically exact in the infinite-range limit, provided that the semiclassical Hamiltonian is correctly identified. Within the TWA, different dynamical timescales of charges and spins can be clearly distinguished. Interestingly, for a weak and moderate disorder strength, we observe subdiffusive behavior of charges, while spins exhibit diffusive dynamics. At strong disorder, the quantum Fisher information shows logarithmic growth in time with a slower increase for charges than for spins. It is shown that in contrast to the short-range model, strong inhomogeneities such as domain walls in the initial state can significantly slow down thermalization dynamics, especially at weak disorder. This behavior can put additional challenges in designing cold-atom experimental protocols aimed to analyze possible many-body localization in such systems. While within this approach we cannot make any definite statements about the existence of a many-body localized phase, we see a very fast crossover as a function of disorder strength from rapidly thermalizing to a slow glassy like regime both for the short-range and long-range models.

翻訳日:2023-06-03 19:09:03 公開日:2020-12-30

# 最小結合型量子熱エンジンの熱力学

Thermodynamics of Minimal Coupling Quantum Heat Engines ( http://arxiv.org/abs/2003.05788v5 )

ライセンス: Link先を確認

Marcin Łobejko, Paweł Mazurek, Michał Horodecki

(参考訳) 最小結合型量子熱エンジンは、明示的なエネルギー貯蔵システム、熱浴、作業体からなる熱機械で、作業体は離散的なストローク(エネルギー保存型の2体量子操作)を通して各サブシステムに交互に結合する。このパラダイムの中で、仕事抽出過程は非パッシブエネルギー(エルゴトロピー)の流れによって根本的に制限され、一方、エネルギー散逸はパッシブエネルギーの流れによって表現されるという、量子熱力学の一般的な枠組みを示す。作業体の次元が小さいことと2体操作のみへの制限により、エンジンは根本的に不可逆であることが分かる。我々の主な成果は、3ストロークと2準位の作業体からなる不可逆な最小結合エンジンのクラス全体において、1サイクル当たりの最適な効率と仕事量を求めたことであり、そこでは作業体と電池の間のあらゆる量子相関を考慮に入れている。主要な新しいツールの1つは、ここで導入する「制御周辺状態(control-marginal state)」である。これは作業体のヒルベルト空間にのみ作用するが、作業体と電池の全系の仕事抽出に関する全ての特徴を内包する。また、多ストロークエンジンへの一般化を提案するとともに、効率と抽出仕事のトレードオフや、エンジンを何サイクルも動作させた後の仕事のゆらぎを解析する。

The minimal-coupling quantum heat engine is a thermal machine consisting of an explicit energy storage system, heat baths, and a working body, which alternatively couples to subsystems through discrete strokes -- energy-conserving two-body quantum operations. Within this paradigm, we present a general framework of quantum thermodynamics, where a work extraction process is fundamentally limited by a flow of non-passive energy (ergotropy), while energy dissipation is expressed through a flow of passive energy. It turns out that small dimensionality of the working body and a restriction only to two-body operations make the engine fundamentally irreversible. Our main result is finding the optimal efficiency and work production per cycle within the whole class of irreversible minimal-coupling engines composed of three strokes and with the two-level working body, where we take into account all possible quantum correlations between the working body and the battery. One of the key new tools is the introduced "control-marginal state" -- one which acts only on a working body Hilbert space, but encapsulates all features regarding work extraction of the total working body-battery system. In addition, we propose a generalization of the many-stroke engine, and we analyze efficiency vs extracted work trade-offs, as well as work fluctuations after many cycles of the running of the engine.

翻訳日:2023-05-29 08:26:33 公開日:2020-12-30
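The abstract above ties work extraction to ergotropy (non-passive energy). As a small worked illustration, the sketch below computes the ergotropy of a single qubit from the standard definition, energy of the state minus energy of its passive counterpart. The function name `qubit_ergotropy` and the choice H = diag(0, gap) are assumptions; this is not the paper's engine model.

```python
import math


def qubit_ergotropy(rho00, rho01, rho11, gap=1.0):
    """Ergotropy of a qubit state rho for H = diag(0, gap), with gap > 0.

    rho00, rho11: real populations (rho00 + rho11 = 1); rho01: coherence (may be complex).
    Ergotropy = Tr(rho H) - Tr(rho_passive H), where the passive state places the
    larger eigenvalue of rho on the ground level and the smaller on the excited level.
    """
    energy = gap * rho11
    # Eigenvalues of a 2x2 density matrix are 1/2 +/- radius.
    radius = math.sqrt(((rho00 - rho11) / 2) ** 2 + abs(rho01) ** 2)
    smallest_eigenvalue = 0.5 - radius
    passive_energy = gap * smallest_eigenvalue
    return energy - passive_energy
```

A diagonal state with more population in the ground level is already passive and yields zero, while the coherent superposition with `rho00 = rho01 = rho11 = 0.5` yields half the gap, which shows how coherence alone can carry extractable work.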

# スピン1/2の波動と誘導方程式の統一

Unification of the wave and guidance equations for spin 1/2 ( http://arxiv.org/abs/2003.06058v2 )

ライセンス: Link先を確認

Peter Holland

(参考訳) 我々は、シュロディンガー方程式と誘導方程式を単一の不均質なシュロディンガー方程式から外部ベクトルポテンシャルを持つリーマン多様体へ一般化する。特別の場合、スピン1/2剛回転子に対する統一理論が得られる。この理論は、粒子と導波を2スピノールとして統合する統一場であるガリレオ群の下で対称であることが証明されている。

We generalize our previous unification of the Schrödinger and guidance equations in a single inhomogeneous Schrödinger equation to a Riemannian manifold with an external vector potential. A special case yields the unified theory for a spin 1/2 rigid rotator. The theory is proved to be symmetrical under the Galileo group, the unified field that integrates the particle and guiding wave being a 2-spinor.

翻訳日:2023-05-29 08:18:07 公開日:2020-12-30

# ダイヤモンド中の中性シリコン空孔中心における束縛励起子状態を介した光検出磁気共鳴

Optically detected magnetic resonance in neutral silicon vacancy centers in diamond via bound exciton states ( http://arxiv.org/abs/2004.12544v2 )

ライセンス: Link先を確認

Zi-Huai Zhang, Paul Stevenson, Gergo Thiering, Brendon C. Rose, Ding Huang, Andrew M. Edmonds, Matthew L. Markham, Stephen A. Lyon, Adam Gali, and Nathalie P. de Leon

(参考訳) ダイヤモンド中の中性シリコン空孔(SiV0)中心は、優れた光学特性と長いスピンコヒーレンス時間のために量子ネットワークの有望な候補である。しかし、励起状態の微細構造の理解が不十分であり、非共鳴でのスピン偏極も限られていたため、これらの欠陥におけるスピン依存蛍光はこれまで捉えがたいものであった。本稿では、これまで報告されていなかったより高エネルギーの励起状態を介した効率的な光学的スピン偏極によって可能となった、極低温におけるSiV0中心の光検出磁気共鳴とコヒーレント制御の実現について報告する。これらの状態は、群論と密度汎関数理論を用いて束縛励起子状態として同定する。これらの束縛励起子状態は、SiV0と他の新興欠陥系に対する新しい制御スキームを可能にする。

Neutral silicon vacancy (SiV0) centers in diamond are promising candidates for quantum networks because of their excellent optical properties and long spin coherence times. However, spin-dependent fluorescence in such defects has been elusive due to poor understanding of the excited state fine structure and limited off-resonant spin polarization. Here we report the realization of optically detected magnetic resonance and coherent control of SiV0 centers at cryogenic temperatures, enabled by efficient optical spin polarization via previously unreported higher-lying excited states. We assign these states as bound exciton states using group theory and density functional theory. These bound exciton states enable new control schemes for SiV0 as well as other emerging defect systems.

翻訳日:2023-05-22 00:27:42 公開日:2020-12-30

# ユニタリティにおけるボソニック三量体に対するゼロレンジ相互作用のモデル

Models of zero-range interaction for the bosonic trimer at unitarity ( http://arxiv.org/abs/2006.02426v3 )

ライセンス: Link先を確認

Alessandro Michelangeli

(参考訳) ゼロレンジの2体相互作用によって相互に結合された同一のボソンからなる3体系について、物理的に意味のある量子ハミルトニアンの数学的構成を示す。本稿の大部分では、無限大の散乱長(ユニタリティ領域)を考える。この主題には数学の文献にいくつかの先行研究がある。我々は、自由ハミルトニアンを一致超平面の近傍で消える波動関数に制限して得られる極小作用素の自己共役拡大を、作用素論的に構成する。したがって、すべての拡大は、粒子が互いに重なる空間配置にちょうど台を持つ相互作用をモデル化する。このうち、ゼロレンジ法に関する物理の文献で広く用いられる形式的な物理的議論が示唆する特定の短距離構造を作用素の構成に組み込むことにより、物理的に意味のあるものを選択する。これは、異なる段階において Kreĭn-Višik-Birman 流および von Neumann 流の自己共役拡大スキームを適用することでなされる。我々は標準的なモデルのクラスを構成し、負エネルギーの束縛状態の構造も解析する。ボソン性とゼロレンジの組み合わせにより、そのような標準モデルは典型的なトーマススペクトルとエフィモフスペクトル、すなわちマイナス無限大と零の両方に集積するエネルギー固有値列を示す。また、有効な短距離パターンを維持しながらこのようなスペクトル不安定性を防ぐ正則化についても論じる。作用素としての特徴づけの他に、関連するエネルギー二次形式も提示する。我々は、自己共役性の定義域を正しく同定するうえで微妙であることが知られている作用素論的構成の特定のステップを明確にするように、解析を構成した。

We present the mathematical construction of the physically relevant quantum Hamiltonians for a three-body system consisting of identical bosons mutually coupled by a two-body interaction of zero range. For a large part of the presentation, infinite scattering length will be considered (the unitarity regime). The subject has several precursors in the mathematical literature. We proceed through an operator-theoretic construction of the self-adjoint extensions of the minimal operator obtained by restricting the free Hamiltonian to wave-functions that vanish in the vicinity of the coincidence hyperplanes: all extensions thus model an interaction precisely supported at the spatial configurations where particles come on top of each other. Among them, we select the physically relevant ones, by implementing in the operator construction the presence of the specific short-scale structure suggested by formal physical arguments that are ubiquitous in the physical literature on zero-range methods. This is done by applying at different stages the self-adjoint extension schemes à la Kreĭn-Višik-Birman and à la von Neumann. We produce a class of canonical models for which we also analyse the structure of the negative bound states. Bosonicity and zero range combined together make such canonical models display the typical Thomas and Efimov spectra, i.e., sequences of energy eigenvalues accumulating to both minus infinity and zero. We also discuss a type of regularisation that prevents such spectral instability while retaining an effective short-scale pattern. Besides the operator qualification, we also present the associated energy quadratic forms. We structured our analysis so as to clarify certain steps of the operator-theoretic construction that are notoriously subtle for the correct identification of a domain of self-adjointness.

翻訳日:2023-05-17 06:32:19 公開日:2020-12-30

# Empirica:高スループットマクロレベルの実験のための仮想ラボ

Empirica: a virtual lab for high-throughput macro-level experiments ( http://arxiv.org/abs/2006.11398v2 )

ライセンス: Link先を確認

Abdullah Almaatouq, Joshua Becker, James P. Houghton, Nicolas Paton, Duncan J. Watts, Mark E. Whiting

(参考訳) virtual labsは、従来の物理実験室では実現不可能な、高スループットとマクロレベルの実験を研究者が設計できる。オンライン研究の人気は高まっているが、バーチャルラボの実験の設計とデプロイでは、研究者は依然として多くの技術的および物流的障壁に直面している。バーチャルラボ実験の開発を容易にするプラットフォームはいくつか存在するが、彼らは通常、研究者にユーザビリティと機能の間の重大なトレードオフを提示している。 Empiricaは"フレキシブルなデフォルト"設計戦略を採用することで、ユーザビリティと機能のトレードオフに対するソリューションを提供するモジュール型の仮想ラボです。この戦略により、初心者プログラマが利用できる開発プラットフォームを提供しながら、完全な"ビルド・アズ・ア"の柔軟性を維持できます。 Empiricaのアーキテクチャはパラメータ化可能な実験設計、再利用可能なプロトコル、迅速な開発を可能にするように設計されている。これらの機能は、仮想ラボ実験のアクセシビリティを高め、実験設計におけるイノベーションの障壁を取り除き、分散ヒト計算の迅速な理解を可能にする。

Virtual labs allow researchers to design high-throughput and macro-level experiments that are not feasible in traditional in-person physical lab settings. Despite the increasing popularity of online research, researchers still face many technical and logistical barriers when designing and deploying virtual lab experiments. While several platforms exist to facilitate the development of virtual lab experiments, they typically present researchers with a stark trade-off between usability and functionality. We introduce Empirica: a modular virtual lab that offers a solution to the usability-functionality trade-off by employing a "flexible defaults" design strategy. This strategy enables us to maintain complete "build anything" flexibility while offering a development platform that is accessible to novice programmers. Empirica's architecture is designed to allow for parameterizable experimental designs, reusable protocols, and rapid development. These features will increase the accessibility of virtual lab experiments, remove barriers to innovation in experiment design, and enable rapid progress in the understanding of distributed human computation.

翻訳日:2023-05-13 09:13:41 公開日:2020-12-30

# 量子輸送モデルにおける双対性

Duality in quantum transport models ( http://arxiv.org/abs/2008.03476v2 )

ライセンス: Link先を確認

Rouven Frassek, Cristian Giardinà, Jorge Kurchan

(参考訳) 古典的輸送モデルに対して広く研究されてきた「双対性アプローチ」を、熱的な「リンドブラッド型」の浴と接触する量子系に対して展開する。この手法は、(a) 元のモデルを少数の粒子のみを含むより単純なモデルへ写像し、(b) 一般的な浴を持つこの種の任意の動的過程が、平衡浴を持つ過程に写像できることを示す。我々は、[D. Bernard, T. Jin, Phys. Rev. Lett. 123, 080601 (2019)] で導入された量子対称排他過程という特定のモデルの研究を通じて、これを例示する。古典的な場合と同様に、全体の構成は問題の力学的対称性を考慮すれば理解可能となる。

We develop the 'duality approach', which has been extensively studied for classical models of transport, for quantum systems in contact with a thermal 'Lindbladian' bath. The method (a) provides a mapping of the original model to a simpler one containing only a few particles, and (b) shows that any dynamic process of this kind with generic baths may be mapped onto one with equilibrium baths. We exemplify this through the study of a particular model: the quantum symmetric exclusion process introduced in [D. Bernard, T. Jin, Phys. Rev. Lett. 123, 080601 (2019)]. As in the classical case, the whole construction becomes intelligible by considering the dynamical symmetries of the problem.

翻訳日:2023-05-06 19:58:23 公開日:2020-12-30

# 非アベリアカゴメ格子におけるフラットバンドと$Z_2$位相位相

Flat bands and $Z_2$ topological phases in a non-Abelian kagome lattice ( http://arxiv.org/abs/2008.10738v2 )

ライセンス: Link先を確認

Zhenxiang Gao and Zhihao Lan

(参考訳) 時間反転対称性と反転対称性の両方を持つ非アーベルカゴメ格子モデルを導入し、このモデルのフラットバンドの物理とトポロジカル相を研究する。時間反転と反転の両方の対称性が共存しているため、エネルギーバンドは二重に縮退した3つのバンドから構成され、そのエネルギーとフラットバンドが存在する条件を解析的に得ることができる。これにより、フラットバンドの位置を、他の2つの分散バンドに対して、3つのバンドの最上部から中央へ、さらに最下部へと調整することができる。さらに、モデルのギャップのある相を調べ、バンドギャップがパラメータ空間の離散的な点でのみ閉じることから、それらが同一の相に属し、任意の2つのギャップのある相がバンドギャップを閉じることなく互いに断熱的につながることを示す。時間反転対称性に基づくパフィアンの方法と、反転対称性によるパリティの特徴づけを用いて、バルクのトポロジカル不変量を計算し、この唯一のギャップのある相が$Z_2$量子スピンホール相に属することを示す。これはエッジ状態の計算によってさらに確認される。

We introduce a non-Abelian kagome lattice model that has both time-reversal and inversion symmetries and study the flat band physics and topological phases of this model. Due to the coexistence of both time-reversal and inversion symmetries, the energy bands consist of three doubly degenerate bands whose energy and conditions for the presence of flat bands could be obtained analytically, allowing us to tune the flat band with respect to the other two dispersive bands from the top to the middle and then to the bottom of the three bands. We further study the gapped phases of the model and show that they belong to the same phase as the band gaps only close at discrete points of the parameter space, making any two gapped phases adiabatically connected to each other without closing the band gap. Using the Pfaffian approach based on the time-reversal symmetry and parity characterization from the inversion symmetry, we calculate the bulk topological invariants and demonstrate that the unique gapped phases belong to the $Z_2$ quantum spin Hall phase, which is further confirmed by the edge state calculations.

翻訳日:2023-05-05 02:02:03 公開日:2020-12-30
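To make the notion of a kagome flat band concrete, the sketch below evaluates the bands of the ordinary spinless nearest-neighbour kagome tight-binding model, which is a simpler stand-in and not the non-Abelian model of the abstract above. The Bloch Hamiltonian $H(k) = -2t\,[[0,c_1,c_2],[c_1,0,c_3],[c_2,c_3,0]]$ with $c_i = \cos(k\cdot d_i)$ has one band pinned at $E = 2t$ for every $k$. The lattice vectors, function names, and hopping sign convention are assumptions for illustration.

```python
import math


def kagome_bands(kx, ky, t=1.0):
    """Return (lower, upper, flat) bands of the nearest-neighbour kagome model."""
    # Lattice vectors a1 = (1, 0), a2 = (1/2, sqrt(3)/2); bonds lie along half-vectors.
    k1 = kx / 2
    k2 = kx / 4 + math.sqrt(3) * ky / 4
    c1, c2, c3 = math.cos(k1), math.cos(k2), math.cos(k2 - k1)
    s = c1 * c1 + c2 * c2 + c3 * c3
    root = math.sqrt(max(0.0, 4 * s - 3))  # clip tiny negative rounding errors
    return (-t * (1 + root), -t * (1 - root), 2 * t)


def char_poly(energy, kx, ky, t=1.0):
    """det(H(k) - energy * I); it vanishes exactly when `energy` is an eigenvalue."""
    k1 = kx / 2
    k2 = kx / 4 + math.sqrt(3) * ky / 4
    h1, h2, h3 = (-2 * t * math.cos(x) for x in (k1, k2, k2 - k1))
    return -energy ** 3 + 2 * h1 * h2 * h3 + energy * (h1 * h1 + h2 * h2 + h3 * h3)
```

Evaluating `char_poly` at each value returned by `kagome_bands` gives zero up to rounding, which checks both the dispersive bands and the momentum-independent flat band without a numerical eigensolver.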

# オンチップ導波路におけるフォトニック量子ビットのオンデマンド量子ストレージ

On-demand quantum storage of photonic qubits in an on-chip waveguide ( http://arxiv.org/abs/2009.01796v3 )

ライセンス: Link先を確認

Chao Liu, Tian-Xiang Zhu, Ming-Xu Su, You-Zhi Ma, Zong-Quan Zhou, Chuan-Feng Li, and Guang-Can Guo

(参考訳) フォトニック量子メモリは、量子情報処理(QIP)の中核要素である。スケーラブルで便利な実用のために、固体中に作製された様々な導波路に基づく集積量子メモリに多大な努力が払われている。しかし、QIPの必須要件である量子ビットのオンデマンド保存は、そのような集積量子メモリを用いて実装することが依然として困難である。本稿では、スターク変調原子周波数コムプロトコルを用いて、$^{151}$Eu$^{3+}$:Y$_2$SiO$_5$結晶の表面上のオンチップ導波路メモリにおける時間ビン量子ビットのオンデマンド保存について報告する。1パルスあたり0.5光子の入力で$99.3\%\pm0.2\%$の量子ビット保存忠実度が得られ、これは古典的な測定・再準備(measure-and-prepare)戦略で達成可能な最高忠実度をはるかに超える。オンデマンド読み出し機能を備えたこの集積量子メモリは、量子ネットワークにおける集積量子ノードの実用化に向けた重要なステップである。

Photonic quantum memory is the core element in quantum information processing (QIP). For scalable and convenient practical applications, great efforts have been devoted to the integrated quantum memory based on various waveguides fabricated in solids. However, on-demand storage of qubits, which is an essential requirement for QIP, is still challenging to implement using such integrated quantum memory. Here we report the on-demand storage of time-bin qubits in an on-chip waveguide memory on the surface of a $^{151}$Eu$^{3+}$:Y$_2$SiO$_5$ crystal, utilizing the Stark-modulated atomic frequency comb protocol. A qubit storage fidelity of $99.3\%\pm0.2\%$ is obtained with an input of 0.5 photons per pulse, far beyond the highest fidelity achievable using the classical measure-and-prepare strategy. The developed integrated quantum memory with the on-demand retrieval capability represents an important step towards practical applications of integrated quantum nodes in quantum networks.

翻訳日:2023-05-03 22:55:16 公開日:2020-12-30

# 計算相転移:ベンチマーク装置と量子オプティマイザ

Computational Phase Transitions: Benchmarking Ising Machines and Quantum Optimisers ( http://arxiv.org/abs/2009.05579v2 )

ライセンス: Link先を確認

Hariphan Philathong and Vishwa Akshay and Ksenia Samburskaya and Jacob Biamonte

(参考訳) 物理プロセッサのベンチマークには様々なアプローチがあるが、最近の研究は計算相転移に焦点を当てている。これはいくつかの要因による。重要なことに、最も難しいインスタンスは狭い領域に集中しており、同様の計算課題を持つ問題インスタンスの均一なランダム分布を可能にする制御パラメータがある。コヒーレントイジングマシン(s)から生成された分布における計算相転移を観察できることが確立されている。量子近似最適化の観点では、量子アルゴリズムが機能する能力は、変数比(密度と呼ばれる)に制約される問題の比率に決定的に依存する。性能に対する臨界密度依存性は、いわゆる到達可能性の欠陥を引き起こした。この観点からは,様々なベンチマーキングタスクに計算位相遷移をどのように適用するかを理解するために必要な背景を思い出し,これらの現代的知見のいくつかを調査した。

While there are various approaches to benchmark physical processors, recent findings have focused on computational phase transitions. This is due to several factors. Importantly, the hardest instances appear to be well-concentrated in a narrow region, with a control parameter allowing uniform random distributions of problem instances with similar computational challenge. It has been established that one could observe a computational phase transition in a distribution produced from coherent Ising machine(s). In terms of quantum approximate optimisation, the ability of the quantum algorithm to function depends critically on a problem's constraint-to-variable ratio (called density). The critical dependence of performance on density resulted in what were called reachability deficits. In this perspective we recall the background needed to understand how to apply computational phase transitions in various benchmarking tasks and we survey several such contemporary findings.

翻訳日:2023-05-02 22:19:39 公開日:2020-12-30
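The control parameter described in the abstract above, the constraint-to-variable ratio or density, can be explored with a small experiment. The sketch below takes random 3-SAT as an assumed example problem and estimates by brute force how often an instance is satisfiable at a given density. All function names, the instance sizes, and the seed are illustrative choices, and the brute-force check is only practical for a handful of variables.

```python
import itertools
import random


def random_ksat(num_vars, num_clauses, k, rng):
    """Draw a random k-SAT instance as a list of clauses of (variable, is_negated) literals."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(num_vars), k)
        clauses.append([(v, rng.random() < 0.5) for v in variables])
    return clauses


def is_satisfiable(clauses, num_vars):
    """Brute-force search over all 2**num_vars assignments (small instances only)."""
    for assignment in itertools.product([False, True], repeat=num_vars):
        # A literal (v, neg) is true when the variable's value differs from `neg`.
        if all(any(assignment[v] != neg for v, neg in clause) for clause in clauses):
            return True
    return False


def sat_fraction(num_vars, density, k=3, trials=20, seed=0):
    """Fraction of random instances at the given clause/variable density that are satisfiable."""
    rng = random.Random(seed)
    num_clauses = round(density * num_vars)
    hits = sum(
        is_satisfiable(random_ksat(num_vars, num_clauses, k, rng), num_vars)
        for _ in range(trials)
    )
    return hits / trials
```

Sweeping `density` from well below to well above the commonly quoted 3-SAT threshold (roughly 4.3 clauses per variable) shows the satisfiable fraction dropping from near one to near zero; at such small sizes the drop is smooth rather than sharp.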

# 量子熱力学第一法則におけるコヒーレンスの役割の解明

Unravelling the role of coherence in the first law of quantum thermodynamics ( http://arxiv.org/abs/2009.11370v3 )

ライセンス: Link先を確認

Bertúlio de Lima Bernardo

(参考訳) 量子熱力学の新興分野における基本的な問題の一つは、量子レベルで起こるエネルギー過程におけるコヒーレンスの役割である。ここでは、仕事と熱の古典的な定義から導かれた熱力学の第1法則の2つの異なる量子バージョンを調べることでこの問題に対処する。そうすることで、両方のシナリオに数学的に矛盾があることが分かりました。さらに,一貫性を確立する上で,コヒーレンスのダイナミクスのエネルギー的寄与が重要な要素であることを示す。 2段階の原子系を含むいくつかの例について考察した。

One of the fundamental questions in the emerging field of quantum thermodynamics is the role played by coherence in energetic processes that occur at the quantum level. Here, we address this issue by investigating two different quantum versions of the first law of thermodynamics, derived from the classical definitions of work and heat. By doing so, we find out that there exists a mathematical inconsistency between both scenarios. We further show that the energetic contribution of the dynamics of coherence is the key ingredient to establish the consistency. Some examples involving two-level atomic systems are discussed in order to illustrate our findings.

翻訳日:2023-05-01 04:36:24 公開日:2020-12-30

# 粒子理論者のための量子情報

Quantum Information for Particle Theorists ( http://arxiv.org/abs/2010.02931v2 )

ライセンス: Link先を確認

Joseph D. Lykken

(参考訳) 理論高等研究所(TASI 2020)での講義は2020年6月1-26日。対象となったトピックは、量子回路、絡み合い、量子テレポーテーション、ベルの不等式、量子エントロピーとデコヒーレンス、古典的および量子計測、量子場理論における絡み合いエントロピーの領域法則、量子コンピュータ上の量子場理論のシミュレーションである。その過程で私たちは、大学で量子力学を学ぶ(そして教える)方法の根本的なゆるやかさに直面しました。 PythonノートブックとMathematicaノートブックへのリンクにより、読者は計算を再現して拡張でき、量子シミュレータで5つの実験を行うことができる。

Lectures given at the Theoretical Advanced Study Institute (TASI 2020), 1-26 June 2020. The topics covered include quantum circuits, entanglement, quantum teleportation, Bell inequalities, quantum entropy and decoherence, classical versus quantum measurement, the area law for entanglement entropy in quantum field theory, and simulating quantum field theory on a quantum computer. Along the way we confront the fundamental sloppiness of how we all learned (and some of us taught) quantum mechanics in college. Links to a Python notebook and Mathematica notebooks will allow the reader to reproduce and extend the calculations, as well as perform five experiments on a quantum simulator.

翻訳日:2023-04-29 20:14:30 公開日:2020-12-30
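Among the topics listed in the abstract above are Bell inequalities. The sketch below is not taken from the lecture notebooks; it is an independent, minimal calculation of the CHSH combination for the Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$, with measurement settings chosen (as an assumption) to reach the Tsirelson value $2\sqrt{2}$, above the classical bound of 2.

```python
import math


def expectation(state, op_a, op_b):
    """<state| op_a (x) op_b |state> for a real two-qubit state indexed as state[2*i + j]."""
    total = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for m in range(2):
                    total += state[2 * i + j] * op_a[i][k] * op_b[j][m] * state[2 * k + m]
    return total


Z = [[1.0, 0.0], [0.0, -1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
r = 1 / math.sqrt(2)

# Bob measures along the two diagonals between Z and X.
B0 = [[r * (Z[i][j] + X[i][j]) for j in range(2)] for i in range(2)]
B1 = [[r * (Z[i][j] - X[i][j]) for j in range(2)] for i in range(2)]

bell = [r, 0.0, 0.0, r]  # (|00> + |11>) / sqrt(2)

# CHSH combination with Alice measuring Z or X.
chsh = (
    expectation(bell, Z, B0)
    + expectation(bell, Z, B1)
    + expectation(bell, X, B0)
    - expectation(bell, X, B1)
)
```

Each of the four correlators has magnitude $1/\sqrt{2}$, so `chsh` evaluates to $4/\sqrt{2} = 2\sqrt{2}$.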

# ライドバーグドレッシング用光ツイーザアレイのラマンサイドバンド冷却

Raman Sideband Cooling in Optical Tweezer Arrays for Rydberg Dressing ( http://arxiv.org/abs/2010.07838v2 )

ライセンス: Link先を確認

Nikolaus Lorenz, Lorenzo Festa, Lea-Marina Steinert, Christian Gross

(参考訳) 単一中性原子が光学トワイザーに閉じ込められ、レーザー結合されたライドバーグ状態は、量子シミュレーションのために構成可能な原子配列を生成する高速で柔軟なプラットフォームを提供する。このプラットフォームは、様々な幾何学における量子スピン系の研究に特に適している。しかし、連続トラッピングを必要とする実験では、トラッピングポテンシャルと温度拡大によって引き起こされる不均質な光シフトは厳しい制限を課す。ここでは、ラマンサイドバンド冷却がこれらの制限を克服し、トワイザー配列のライドバーグドレッシングのステージを準備する様子を示す。

Single neutral atoms trapped in optical tweezers and laser-coupled to Rydberg states provide a fast and flexible platform to generate configurable atomic arrays for quantum simulation. The platform is especially suited to study quantum spin systems in various geometries. However, for experiments requiring continuous trapping, inhomogeneous light shifts induced by the trapping potential and temperature broadening impose severe limitations. Here we show how Raman sideband cooling allows one to overcome those limitations, thus, preparing the stage for Rydberg dressing in tweezer arrays.

翻訳日:2023-04-29 00:25:46 公開日:2020-12-30

# Rydberg-interacting qubits を用いた量子シミュレーションと計算

Quantum simulation and computing with Rydberg-interacting qubits ( http://arxiv.org/abs/2011.03031v2 )

ライセンス: Link先を確認

M. Morgado and S. Whitlock

(参考訳) ライドバーグ状態に励起された光学的に捕捉された原子の配列は、最近量子シミュレーションと計算のための競争的物理プラットフォームとして出現し、高忠実度状態の準備と読み出し、量子論理ゲートと100量子ビット以上の量子力学制御が全て実証されている。これらのシステムは現在、数百の量子ビットと数千のマルチキュービットゲートを持つ信頼性のある量子計算が、エラー率の低い領域に初めて到達すべき点に近づいている。本稿では、量子ビットの符号化、量子演算の実行、量子多体ハミルトニアンのエンジニアリングにおいて高い柔軟性を強調した、rydberg量子ツールボックスの概要を示す。次に,高忠実度量子演算と論理ゲート,および多体状態における量子シミュレーションに関する現状を概観する。最後に、Rydbergプラットフォームに特に適する計算スキームと、汎用量子シミュレータや量子コンピュータへの道のりにおける残りの課題について論じる。

Arrays of optically trapped atoms excited to Rydberg states have recently emerged as a competitive physical platform for quantum simulation and computing, where high-fidelity state preparation and readout, quantum logic gates and controlled quantum dynamics of more than 100 qubits have all been demonstrated. These systems are now approaching the point where reliable quantum computations with hundreds of qubits and realistically thousands of multiqubit gates with low error rates should be within reach for the first time. In this article we give an overview of the Rydberg quantum toolbox, emphasizing the high degree of flexibility for encoding qubits, performing quantum operations and engineering quantum many-body Hamiltonians. We then review the state-of-the-art concerning high-fidelity quantum operations and logic gates as well as quantum simulations in many-body regimes. Finally, we discuss computing schemes that are particularly suited to the Rydberg platform and some of the remaining challenges on the road to general purpose quantum simulators and quantum computers.

翻訳日:2023-04-25 05:18:21 公開日:2020-12-30

# 熱場二重状態による2次元量子相転移のためのテンソルネットワーク波動関数の構成

Constructing tensor network wavefunction for a generic two-dimensional quantum phase transition via thermofield double states ( http://arxiv.org/abs/2012.14152v2 )

ライセンス: Link先を確認

Wen-Tao Xu and Guang-Ming Zhang

(参考訳) 二次元量子Rokhsar-Kivelson (RK) 型モデルの最も重要な特徴は、基底状態の波動関数のノルムを2次元統計モデルの分配関数に写像できることであり、その結果、量子相転移は対応する統計モデルの熱的相転移となる。一般的な量子臨界点に対して、平衡密度作用素の純粋化である熱場二重状態(TFD)の概念を導入することにより、RK波動関数の枠組みを一般化する。さらに、TFD状態を射影エンタングルドペア状態(PEPS)で表現することにより、その$N$次のRényiエントロピーはユークリッド時空における3次元統計モデルに帰着し、一般的な量子相転移を記述する。2つの平行磁場を持つトーリック符号モデルを例として、これらのアイデアを説明し、相転移が3次元普遍性クラスによって特徴づけられる3次元$Z_2$格子ゲージ・ヒッグスモデルの分配関数を導出する。

The most important feature of two-dimensional quantum Rokhsar-Kivelson (RK) type models is that their ground state wavefunction norms can be mapped into the partition functions of two-dimensional statistical models so that the quantum phase transitions become the thermal phase transitions of the corresponding statistical models. For a generic quantum critical point, we generalize the framework of RK wavefunctions by introducing the concept of the thermofield double (TFD) state, which is a purification of the equilibrium density operator. Moreover, by expressing the TFD state in terms of the projected entangled pair state, its $N$th-order Rényi entropy results in a three-dimensional statistical model in Euclidean spacetime, describing the generic quantum phase transitions. Using the toric code model with two parallel magnetic fields as an example, we explain these ideas and derive the partition function of the three-dimensional $Z_2$ lattice gauge-Higgs model, where the phase transitions are characterized by the three-dimensional universality classes.

翻訳日:2023-04-19 01:58:38 公開日:2020-12-30
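For readers unfamiliar with the thermofield double state mentioned in the abstract above, its textbook definition is reproduced here as a reminder. The notation (energy eigenstates $|n\rangle$ with energies $E_n$, an auxiliary copy $|\tilde{n}\rangle$, inverse temperature $\beta$) is the standard one and is assumed rather than taken from the paper, whose construction may use a different basis.

```latex
\rho_\beta = \frac{e^{-\beta H}}{Z}, \qquad
Z = \sum_n e^{-\beta E_n}, \qquad
|\mathrm{TFD}\rangle = \frac{1}{\sqrt{Z}} \sum_n e^{-\beta E_n/2}\, |n\rangle \otimes |\tilde{n}\rangle, \qquad
\mathrm{Tr}_{\mathrm{aux}}\, |\mathrm{TFD}\rangle\langle \mathrm{TFD}| = \rho_\beta .
```

The last identity is what "purification of the equilibrium density operator" means: tracing out the auxiliary copy returns the thermal state, and the normalization factor of the state is fixed by the partition function $Z$.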

# マイクロ波の単一人工原子への決定論的負荷と位相形成

Deterministic loading and phase shaping of microwaves onto a single artificial atom ( http://arxiv.org/abs/2012.15084v1 )

ライセンス: Link先を確認

W.-J. Lin, Y. Lu, P. Y. Wen, Y.-T. Cheng, C.-P. Lee, K.-T. Lin, K.-H. Chiang, M. C. Hsieh, J. C. Chen, C.-S. Chuu, F. Nori, A. F. Kockum, G.-D. Lin, P. Delsing and I.-C. Hoi

(参考訳) 量子ノードに決定的に量子情報をロードすることは、量子ネットワークへの重要なステップである。ここでは、コヒーレント状態のマイクロ波光子と最適時相波形が、半無限の1次元(1次元)伝送線路導波路内の単一の超伝導人工原子に効率よくロードできることを実証する。原子のデコヒーレンス時間と時間定数が一致する指数上昇波形を持つ弱コヒーレント状態(平均光子数n<<1)を用いて,1次元半自由空間から人工原子への94%以上の負荷効率を示す。また、フォック状態のマイクロ波光子は98.5%の効率で決定的にロードできることを示した。我々はさらに、原子を励起するコヒーレント状態の位相を操作し、ローディングプロセスのコヒーレント制御を可能にする。その結果,導波路量子電磁力学(QED)に基づく量子ネットワークの実現に期待できる応用が得られた。

Loading quantum information deterministically onto a quantum node is an important step towards a quantum network. Here, we demonstrate that coherent-state microwave photons, with an optimal temporal waveform, can be efficiently loaded onto a single superconducting artificial atom in a semi-infinite one-dimensional (1D) transmission-line waveguide. Using a weak coherent state (average photon number N<<1) with an exponentially rising waveform, whose time constant matches the decoherence time of the artificial atom, we demonstrate a loading efficiency of above 94% from 1D semi-free space to the artificial atom. We also show that Fock-state microwave photons can be deterministically loaded with an efficiency of 98.5%. We further manipulate the phase of the coherent state exciting the atom, enabling coherent control of the loading process. Our results open up promising applications in realizing quantum networks based on waveguide quantum electrodynamics (QED).

翻訳日:2023-04-18 08:08:21 公開日:2020-12-30

# 量子ゼノダイナミクスに関連する積公式について

Note on a Product Formula Related to Quantum Zeno Dynamics ( http://arxiv.org/abs/2012.15061v1 )

ライセンス: Link先を確認

Pavel Exner and Takashi Ichinose

(参考訳) 分離可能ヒルベルト空間上で作用する非負自己共役作用素 $H$ と直交射影 $P$ が、$H_P := (H^{1/2}P)^*(H^{1/2}P)$ が密に定義されるとき、強い作用素位相において$\lim_{n\rightarrow \infty} (P\,\mathrm{e}^{-itH/n}P)^n = \mathrm{e}^{-itH_P}P$ が成り立つことを証明する。また、この積公式の修正と、$P$が$P(0)=P$を満たす強い連続射影値関数に置き換えられる状況への拡張も導かれる。

Given a nonnegative self-adjoint operator $H$ acting on a separable Hilbert space and an orthogonal projection $P$ such that $H_P := (H^{1/2}P)^*(H^{1/2}P)$ is densely defined, we prove that $\lim_{n\rightarrow \infty} (P\,\mathrm{e}^{-itH/n}P)^n = \mathrm{e}^{-itH_P}P$ holds in the strong operator topology. We also derive modifications of this product formula and its extension to the situation when $P$ is replaced by a strongly continuous projection-valued function satisfying $P(0)=P$.

翻訳日:2023-04-18 08:08:03 公開日:2020-12-30
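The product formula in the abstract above can be checked numerically in the simplest finite-dimensional setting. The sketch below assumes a two-level Hamiltonian $H = [[a, g], [g, b]]$ (nonnegative for the default values) and the rank-one projection $P = |0\rangle\langle 0|$, for which $H_P$ reduces to $PHP = a\,|0\rangle\langle 0|$. Function names and parameter values are illustrative, and the closed form for the one-step amplitude is the standard expression for a 2x2 Hermitian matrix.

```python
import cmath
import math


def zeno_amplitude(n, t=1.0, a=2.0, b=1.0, g=1.0):
    """<0| (P exp(-i t H / n) P)^n |0> for H = [[a, g], [g, b]] and P = |0><0|."""
    mean = (a + b) / 2
    delta = (a - b) / 2
    r = math.hypot(delta, g)  # half the splitting between the two eigenvalues of H
    s = t / n
    # <0| exp(-i s H) |0> written out explicitly for a 2x2 Hermitian H.
    one_step = cmath.exp(-1j * s * mean) * (
        math.cos(r * s) - 1j * math.sin(r * s) * delta / r
    )
    # P has rank one, so the n-fold product is the n-th power of this number.
    return one_step ** n


def zeno_limit(t=1.0, a=2.0):
    """<0| exp(-i t H_P) |0>, using H_P = P H P = a |0><0| in this finite-dimensional case."""
    return cmath.exp(-1j * t * a)
```

Comparing `zeno_amplitude(n)` with `zeno_limit()` for increasing `n` shows the convergence: a single step leaks amplitude into the second level, while many projected steps confine the evolution to the range of $P$.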

# 弱および強結合状態における駆動場に対する量子ゼノ効果と反ゼノ効果

The quantum Zeno and anti-Zeno effects with driving fields in the weak and strong coupling regimes ( http://arxiv.org/abs/2012.15040v1 )

ライセンス: Link先を確認

Mehwish Majeed and Adam Zaman Chaudhry

(参考訳) 量子力学における繰り返しの測定は、量子系の時間進化を凍結(量子ゼノ効果)または強化(量子反ゼノ効果)することができる。本稿では,システム環境結合の弱さのみを仮定して,任意の開量子系に対する量子ゼノ効果と反ゼノ効果の一般解法を提案する。特に,任意の駆動場に従属する2レベル系の有効減衰率と周期的測定の一般式を得た。その結果, 駆動場は減衰速度を変化させるので, 量子ゼノと反ゼノの挙動は定性的かつ定量的に変化することがわかった。また、量子ゼノ効果と反ゼノ効果に対する駆動場の非自明な効果をさらに説明するために、複数の2レベル系と、調和振動子の環境に強く結合した2レベル系からなる系にも結果を拡張した。

Repeated measurements in quantum mechanics can freeze (the quantum Zeno effect) or enhance (the quantum anti-Zeno effect) the time-evolution of a quantum system. In this paper, we present a general treatment of the quantum Zeno and anti-Zeno effects for arbitrary driven open quantum systems, assuming only that the system-environment coupling is weak. In particular, we obtain a general expression for the effective decay rate of a two-level system subjected to arbitrary driving fields as well as periodic measurements. We demonstrate that the driving fields change the decay rate, and hence the quantum Zeno and anti-Zeno behavior, both qualitatively and quantitatively. We also extend our results to systems consisting of more than one two-level system, as well as a two-level system strongly coupled to an environment of harmonic oscillators, to further illustrate the non-trivial effect of the driving fields on the quantum Zeno and anti-Zeno effects.

翻訳日:2023-04-18 08:07:39 公開日:2020-12-30
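The "effective decay rate" referred to in the abstract above is commonly defined as follows; this is the usual convention in the Zeno and anti-Zeno literature and is stated here as an assumption, since the abstract itself does not spell it out. Here $s(\tau)$ denotes the survival probability after a single measurement interval $\tau$, and correlations between successive intervals are neglected.

```latex
S(N\tau) = \left[ s(\tau) \right]^{N} \equiv e^{-\Gamma(\tau)\, N \tau}
\quad\Longrightarrow\quad
\Gamma(\tau) = -\frac{1}{\tau} \ln s(\tau)
```

With this definition, the Zeno regime is where $\Gamma(\tau)$ decreases as the measurement interval $\tau$ is shortened, and the anti-Zeno regime is where $\Gamma(\tau)$ increases as $\tau$ is shortened.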

# ダイヤモンド中性シリコン空孔中心に基づく量子ノード用ハイブリッドIII-Vダイヤモンドフォトニックプラットフォーム

Hybrid III-V diamond photonic platform for quantum nodes based on neutral silicon vacancy centers in diamond ( http://arxiv.org/abs/2012.15018v1 )

ライセンス: Link先を確認

Ding Huang, Alex Abulnaga, Sacha Welinski, Mouktik Raha, Jeff D. Thompson, and Nathalie P. de Leon

(参考訳) ダイヤモンドのカラーセンタに基づく原子量子メモリとオンチップフォトニックデバイスを統合することで、長距離の絡み合い分布が可能になる。しかし、色中心は環境に非常に敏感であり、ナノファブリケート構造ではその性質が劣化するため、統合への取り組みは困難である。本稿では,ダイヤモンド基板のエッチングを回避し,中性シリコン空隙(siv0)センタ用に設計された異種集積型,オンチップ,iii-vダイヤモンドプラットフォームについて述べる。ダイヤモンド表面近くのSiV0中心へのエバネッセント結合により、このプラットフォームはSiV0放出のパーセル増強と、通信Cバンドへの効率的な周波数変換を可能にする。提案した構造は手軽に製造できる技術で実現できる。

Integrating atomic quantum memories based on color centers in diamond with on-chip photonic devices would enable entanglement distribution over long distances. However, efforts towards integration have been challenging because color centers can be highly sensitive to their environment, and their properties degrade in nanofabricated structures. Here, we describe a heterogeneously integrated, on-chip, III-V diamond platform designed for neutral silicon vacancy (SiV0) centers in diamond that circumvents the need for etching the diamond substrate. Through evanescent coupling to SiV0 centers near the surface of diamond, the platform will enable Purcell enhancement of SiV0 emission and efficient frequency conversion to the telecommunication C-band. The proposed structures can be realized with readily available fabrication techniques.

翻訳日:2023-04-18 08:06:50 公開日:2020-12-30

# システム内の無秩序と衝突に対する政策の影響を評価するエントロピー解析:12量子ビット社会量子システムとしての交通交差点の事例研究

Entropic Analysis to Assess impact of Policies on Disorders and Conflicts within a system: Case Study of Traffic intersection as 12-Qubit Social Quantum System ( http://arxiv.org/abs/2012.15012v1 )

ライセンス: Link先を確認

Rakesh Kumar Pandey

(参考訳) 交通交差点におけるシナリオのエントロピー解析を詳細に試みる。このモデルを用いて衝突エントロピー(Conflict Entropy)を定義する。信号機の設置や高架道路(フライオーバー)の建設といった戦略(政策)を用いることでエントロピーが減少し、交通が秩序立つことを示す。これらの政策は、いずれの場合もエントロピーの低減に寄与し、衝突エントロピーを完全に排除することを示す。このような解析は、望ましい政策の決定や人工知能アルゴリズムの定式化に幅広く応用できる。交通交差点と12量子ビットの量子系との間に顕著な類似性が見出され、量子系の振る舞いを理解するために交通流を研究するという新たな研究領域が開かれる。

Entropic analysis of a scenario at a traffic intersection is attempted in detail. The model is used to define Conflict Entropy. It is shown that strategies (policies) such as installing traffic lights and constructing flyovers reduce the Entropy, thereby making the traffic ordered. These policies help reduce the Entropy and eliminate the Conflict Entropy completely in both cases. Such an analysis can find wide application in deciding on a favorable policy and in formulating artificial intelligence algorithms. A striking similarity is found between the traffic intersection and quantum systems of twelve qubits, which opens up a new scope for studying traffic flows to understand the behavior of quantum systems.

翻訳日:2023-04-18 08:06:24 公開日:2020-12-30

# 量子計算の一貫性と等価原理

Consistency of quantum computation and the equivalence principle ( http://arxiv.org/abs/2012.14990v1 )

ライセンス: Link先を確認

Marcin Nowakowski

(参考訳) 一般相対性理論の構成要素の1つである同値原理は、重力における量子効果の分析にも重要であると考えられる。本稿では,等値原理が重力場における量子計算の一貫性を保たなければならないかどうかを考察する。本稿では,重力場と加速参照フレームの両方のステップからなるループ進化解析法を提案する。等価原理がなければ、ループ量子進化はユニタリでないことを示し、一貫性を緩める。この理由から、同値原理はゲージ変換によって定式化され、ループされた経路上の作用に関連する適切な位相を得る粒子に対して解析される。結果として、重力場における量子演算の一貫性を維持するためには、同値原理の量子変種を保持する必要がある。これは量子情報処理におけるこの基本的な重力原理の量子化バージョンの重要性を証明している。

The equivalence principle, one of the building blocks of general relativity, also seems to be crucial for the analysis of quantum effects in gravity. In this paper we consider whether the equivalence principle has to hold for quantum computation in a gravitational field to be consistent. We propose an analysis with a looped evolution consisting of steps both in the gravitational field and in the accelerated reference frame. We show that without the equivalence principle the looped quantum evolution cannot be unitary and loses its consistency. For this reasoning, the equivalence principle is formulated in terms of gauge transformations and is analyzed for particles acquiring appropriate phases associated with the actions over the looped path. Consequently, to keep quantum operations in a gravitational field consistent, some quantum variant of the equivalence principle must hold. This demonstrates the importance of quantized versions of this fundamental gravitational principle for quantum information processing.

翻訳日:2023-04-18 08:05:17 公開日:2020-12-30

# 時間における2ボソン量子干渉

Two-boson quantum interference in time ( http://arxiv.org/abs/2012.15165v1 )

ライセンス: Link先を確認

Nicolas J. Cerf and Michael G. Jabbour

(参考訳) 有名なHong-Ou-Mandel効果は、2粒子量子干渉のパラダイムである。ポーリ原理(pauli principle)による同一の量子粒子の対称性にそのルーツがある。ビームスプリッタ(透過率1/2)に衝突する2つの同一のボゾンは、光や物質の多数の実験で確認されたように、両方の出力ポートで偶然検出できない。ここでは、部分時間反転がビームスプリッタ線形結合を増幅に変換することを確立する。この双対性から、量子増幅器(ゲイン2)における2-ボソン干渉効果の存在を推定し、基礎となるメカニズムを時間的非識別可能性と同定する。この基本的なメカニズムはボソニック・ボゴリューボフ変換に汎用的であるため、量子物理学における幅広い影響を予想する。

The celebrated Hong-Ou-Mandel effect is the paradigm of two-particle quantum interference. It has its roots in the symmetry of identical quantum particles, as dictated by the Pauli principle. Two identical bosons impinging on a beam splitter (of transmittance 1/2) cannot be detected in coincidence at both output ports, as confirmed in numerous experiments with light or even matter. Here, we establish that partial time reversal transforms the beamsplitter linear coupling into amplification. We infer from this duality the existence of an unsuspected two-boson interferometric effect in a quantum amplifier (of gain 2) and identify the underlying mechanism as timelike indistinguishability. This fundamental mechanism is generic to any bosonic Bogoliubov transformation, so we anticipate wide implications in quantum physics.
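
The coincidence suppression described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' calculation: it assumes a lossless beam splitter written in a real-valued convention (chosen here for illustration) and uses the standard result that the amplitude for two identical bosons, one per input port, to leave through different ports is the permanent of the 2x2 transfer matrix. The function names are hypothetical.

```python
import math


def beam_splitter(transmittance):
    """2x2 transfer matrix of a lossless beam splitter (real-valued convention, assumed)."""
    t = math.sqrt(transmittance)
    r = math.sqrt(1.0 - transmittance)
    return [[t, r], [-r, t]]


def coincidence_probability(transmittance):
    """Probability that two identical bosons, one per input port, exit in different ports."""
    u = beam_splitter(transmittance)
    # Permanent of U: the "both transmitted" and "both reflected" paths interfere.
    amplitude = u[0][0] * u[1][1] + u[0][1] * u[1][0]
    return amplitude ** 2


def distinguishable_coincidence_probability(transmittance):
    """Same quantity for distinguishable particles: probabilities add, not amplitudes."""
    return transmittance ** 2 + (1.0 - transmittance) ** 2


if __name__ == "__main__":
    for T in (0.5, 0.7, 1.0):
        print(T, coincidence_probability(T), distinguishable_coincidence_probability(T))
```

At transmittance 1/2 the two paths cancel and the coincidence probability vanishes for identical bosons, while distinguishable particles would still coincide half of the time. The amplifier counterpart discussed in the abstract is not modeled here.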

翻訳日:2023-04-18 07:57:04 公開日:2020-12-30

# Kugel-Khomskii模型における量子絡み合い、局所指標と外部場の影響

Quantum entanglement, local indicators and effect of external fields in the Kugel-Khomskii model ( http://arxiv.org/abs/2012.15134v1 )

ライセンス: Link先を確認

V.E. Valiulin, A.V. Mikheyenkov, N.M. Chtchelkatchev, K.I. Kugel

(参考訳) 厳密対角化の手法を用いて、異なる種類のサブシステム間交換項を持つ2スピン(Kugel-Khomskii)モデルで記述される有限鎖のエネルギースペクトルと波動関数を決定する。得られた解により、このクラスのモデルに固有の量子エンタングルメントの問題に取り組むことができる。我々は、エンタングルメントの適切な数値尺度として扱われるコンカレンスの計算に重点を置く。また、エンタングルメントの局所的な指標と考えられる2サイト相関関数の挙動を解析する。さらに、エンタングルメントが非零となる領域を含むモデルの相図を構築する。両方のスピン変数に共役な外部場は、エンタングルメントを持つ領域に顕著な影響を与え、モデルのパラメータに応じてエンタングルメントを増強することも弱めることもある。

Using the exact diagonalization technique, we determine the energy spectrum and wave functions for finite chains described by the two-spin (Kugel-Khomskii) model with different types of intersubsystem exchange terms. The solutions found make it possible to address the problem of quantum entanglement inherent to this class of models. We put the main emphasis on calculating the concurrence, treated as an adequate numerical measure of the entanglement. We also analyze the behavior of two-site correlation functions, considered as a local indicator of entanglement. We construct the phase diagrams of the models, including the regions of nonzero entanglement. External fields conjugated to both spin variables have a pronounced effect on the regions with entanglement and can either enhance or weaken the entanglement, depending on the parameters of the models.

翻訳日:2023-04-18 07:56:28 公開日:2020-12-30

# Webのルーチン性と予測可能性の限界:Web追跡データを用いたデモグラフィックと行動差の調査

Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data ( http://arxiv.org/abs/2012.15112v1 )

ライセンス: Link先を確認

Juhi Kulshrestha, Marcos Oliveira, Orkut Karacalik, Denis Bonnay, Claudia Wagner

(参考訳) web上の人間の活動や動きを理解することは、計算社会科学者にとって重要であるだけでなく、レコメンデーション、キャッシング、広告、パーソナライゼーションのためのオンラインシステム設計のための貴重なガイダンスを提供する。そこで本研究では,web上のルーチンに従う傾向にあり,web訪問の繰り返しパターンが,閲覧行動の予測可能性を高めることを実証する。ウェブ上での人体移動の予測可能性の不確実性と理論的限界を測定するための情報理論フレームワークを提案する。異なる設計決定が測定に与える影響を体系的に評価する。このフレームワークをドイツのインターネットユーザのWeb追跡データセットに適用する。私たちの経験的結果は、Web上の個人のルーチンは、平均85%の閲覧行動を予測可能にしますが、その値は個人によって異なります。ユーザの予測能力の違いは、その人口統計学的特性と行動学的特性によってある程度説明できる。

Understanding human activities and movements on the Web is not only important for computational social scientists but can also offer valuable guidance for the design of online systems for recommendations, caching, advertising, and personalization. In this work, we demonstrate that people tend to follow routines on the Web, and these repetitive patterns of web visits increase their browsing behavior's achievable predictability. We present an information-theoretic framework for measuring the uncertainty and theoretical limits of predictability of human mobility on the Web. We systematically assess the impact of different design decisions on the measurement. We apply the framework to a web tracking dataset of German internet users. Our empirical results highlight that individuals' routines on the Web make their browsing behavior predictable to 85% on average, though the value varies across individuals. We observe that these differences in the users' predictabilities can be explained to some extent by their demographic and behavioral attributes.
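
To make the idea of a "theoretical limit of predictability" concrete, here is a minimal sketch of one common way such a limit is computed in the mobility literature: estimate the entropy of a visit sequence, then solve Fano's inequality for the largest achievable prediction accuracy. This is an assumption for illustration; the abstract does not say which entropy estimator or bound the authors use, and the simple frequency-based entropy below ignores the order of visits. All function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Frequency-based entropy (bits) of a sequence of visited sites; ignores ordering."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_predictability(entropy_bits, n_states, tol=1e-10):
    """Solve entropy = H_b(p) + (1 - p) * log2(n - 1) for p in [1/n, 1] by bisection."""
    if n_states < 2:
        return 1.0

    def fano(p):
        return binary_entropy(p) + (1.0 - p) * math.log2(n_states - 1)

    # fano() decreases from log2(n) at p = 1/n to 0 at p = 1.
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano(mid) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    visits = list("aaabaaacaaabaaad")  # toy browsing history, one letter per site
    s = shannon_entropy(visits)
    print(s, max_predictability(s, len(set(visits))))
```

A sequence with lower entropy yields a higher bound, which is the sense in which routine browsing increases achievable predictability.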

翻訳日:2023-04-18 07:55:42 公開日:2020-12-30

# 動的相転移としての断熱量子輸送の散逸失敗

Dissipative failure of adiabatic quantum transport as a dynamical phase transition ( http://arxiv.org/abs/2012.15212v1 )

ライセンス: Link先を確認

Fergus Barratt, Aleix Bou Comas, Philip J. D. Crowley, Vadim Oganesyan, Peter Sollich, Andrew G. Green

(参考訳) エンタングルメントは断熱量子輸送の中心的な資源である。位相緩和(デフェージング)は、軌道に偏りを与えて成功と失敗の間の遷移を引き起こすことで、この資源の利用可能性に影響を与える。このエンタングルメントの枯渇は、量子技術の実用化において重要である。本稿では、断熱輸送の失敗を動的相転移として理解することで、断熱計算の失敗に対する新たな視点を示す。これらのアイデアを、2スピン系における断熱量子輸送のトイモデルで実証する。

Entanglement is the central resource in adiabatic quantum transport. Dephasing affects the availability of that resource by biasing trajectories, driving transitions between success and failure. This depletion of entanglement is important for the practical implementation of quantum technologies. We present a new perspective on the failure of adiabatic computation by understanding the failure of adiabatic transport as a dynamical phase transition. These ideas are demonstrated in a toy model of adiabatic quantum transport in a two-spin system.

翻訳日:2023-04-18 07:47:18 公開日:2020-12-30

# 実世界ツインフィールド量子鍵分布のためのコヒーレント位相転送

Coherent phase transfer for real-world twin-field quantum key distribution ( http://arxiv.org/abs/2012.15199v1 )

ライセンス: Link先を確認

Cecilia Clivati, Alice Meda, Simone Donadello, Salvatore Virzì, Marco Genovese, Filippo Levi, Alberto Mura, Mirko Pittaluga, Zhiliang L. Yuan, Andrew J. Shields, Marco Lucamarini, Ivo Pietro Degiovanni, Davide Calonico

(参考訳) 量子力学は、本質的にセキュアな暗号鍵を光学的に分配することを可能にする。ツインフィールド量子鍵分布は長距離ファイバの実装において最も有望な手法であるが、パーティ間の通信チャネルの光学的長さを安定化する必要がある。スプールファイバに基づく原理実証実験では、周期的な調整フレームで量子通信をインターリーブすることで達成された。このアプローチでは、キーストリーミングの長いデューティサイクルはチャネル長のゆるやかな制御のコストで実現され、実世界でこのテクニックを使ったキー転送の成功は依然として大きな課題である。周波数計測から得られた干渉法を用いて,65dbの損失を有する206kmのフィールド展開ファイバ上で,鍵のストリーミングとチャネル長制御を同時に行う方法を開発した。提案手法は,チャネル長変化による量子ビットエラーレートを<1%に削減し,実世界の量子通信に有効な解を示す。

Quantum mechanics allows the distribution of intrinsically secure encryption keys by optical means. Twin-field quantum key distribution is the most promising technique for its implementation on long-distance fibers, but requires stabilizing the optical length of the communication channels between parties. In proof-of-principle experiments based on spooled fibers, this was achieved by interleaving the quantum communication with periodic adjustment frames. In this approach, longer duty cycles for the key streaming come at the cost of a looser control of channel length, and a successful key transfer using this technique in the real world remains a significant challenge. Using interferometry techniques derived from frequency metrology, we developed a solution for simultaneous key streaming and channel length control, and demonstrated it on a 206 km field-deployed fiber with 65 dB loss. Our technique reduces the quantum-bit-error-rate contributed by channel length variations to <1%, representing an effective solution for real-world quantum communications.
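
To give a sense of scale for the quoted channel figures, the short sketch below converts the 65 dB loss into a power transmittance and an average loss per kilometre. Only the two numbers from the abstract are used; the function name is hypothetical, and the per-kilometre value is a plain average that does not separate fiber attenuation from connector or splice losses.

```python
def db_to_transmittance(loss_db):
    """Convert an attenuation in decibels to a power transmittance in (0, 1]."""
    return 10 ** (-loss_db / 10)


loss_db = 65.0     # total channel loss quoted in the abstract
length_km = 206.0  # fiber length quoted in the abstract

transmittance = db_to_transmittance(loss_db)
average_loss_per_km = loss_db / length_km

print(f"transmittance: {transmittance:.3e}")
print(f"average loss: {average_loss_per_km:.3f} dB/km")
```

A 65 dB loss means that roughly three photons in ten million survive the channel, which is why the phase stability of the link matters so much for the error rate.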

翻訳日:2023-04-18 07:47:11 公開日:2020-12-30

# 拡張重力理論における量子非局所性

Quantum nonlocality in extended theories of gravity ( http://arxiv.org/abs/2012.15331v1 )

ライセンス: Link先を確認

Victor A. S. V. Bittencourt, Massimo Blasone, Fabrizio Illuminati, Gaetano Lambiase, Giuseppe Gaetano Luciano, Luciano Petruzziello

(参考訳) 本研究では、質量を持つ粒子の内部自由度における純粋状態のアインシュタイン-ポドルスキー-ローゼン相関が、拡張重力理論によって記述される曲がった時空背景からどのような影響を受けるかを検討する。アインシュタイン・ヒルベルト作用に対する補正が曲率不変量の2次であるモデルを考え、弱場極限に焦点を当てる。非局所的な量子相関をClauser-Horne-Shimony-Holt不等式の破れによって定量化し、曲がった背景が、一般相対性理論による主要項と、アインシュタイン重力への補正によるさらなる寄与とによって、この破れをどのように抑制するかを示す。この結果は光子対のような質量を持たない粒子にも一般化でき、拡張重力模型の精密な実験的検証の考案に活用できる。

We investigate how pure-state Einstein-Podolsky-Rosen correlations in the internal degrees of freedom of massive particles are affected by a curved spacetime background described by extended theories of gravity. We consider models for which the corrections to the Einstein-Hilbert action are quadratic in the curvature invariants and we focus on the weak-field limit. We quantify nonlocal quantum correlations by means of the violation of the Clauser-Horne-Shimony-Holt inequality, and show how a curved background suppresses the violation by a leading term due to general relativity and a further contribution due to the corrections to Einstein gravity. Our results can be generalized to massless particles such as photon pairs and can thus be suitably exploited to devise precise experimental tests of extended models of gravity.
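
As a reference point for the violation being suppressed, the following minimal sketch evaluates the Clauser-Horne-Shimony-Holt combination for a spin singlet in flat spacetime. It is only the textbook baseline, not the curved-background calculation of the paper; the measurement angles and all names are choices made here for illustration.

```python
import math


def spin_observable(theta):
    """Spin measurement along angle theta in the x-z plane: cos(theta)*Z + sin(theta)*X."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices, returned as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def expectation(state, operator):
    """<state|operator|state> for a real 4-component state vector."""
    return sum(state[i] * operator[i][j] * state[j]
               for i in range(4) for j in range(4))


# Singlet state (|01> - |10>) / sqrt(2) in the basis |00>, |01>, |10>, |11>.
SINGLET = [0.0, 1.0 / math.sqrt(2), -1.0 / math.sqrt(2), 0.0]


def correlation(theta_a, theta_b):
    """Correlation of the two spin measurements on the singlet state."""
    return expectation(SINGLET, kron(spin_observable(theta_a), spin_observable(theta_b)))


def chsh_value(a, a_prime, b, b_prime):
    """CHSH combination; local hidden-variable models satisfy |S| <= 2."""
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))


if __name__ == "__main__":
    s = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
    print(abs(s), 2 * math.sqrt(2))
```

With these angles the magnitude of the CHSH value reaches the quantum maximum of 2√2; any suppression due to a curved background would show up as a reduction from this value.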

翻訳日:2023-04-18 07:39:09 公開日:2020-12-30

# 浮揚ナノ球の量子運動におけるベクトルポラリトン

Vectorial polaritons in the quantum motion of a levitated nanosphere ( http://arxiv.org/abs/2012.15265v1 )

ライセンス: Link先を確認

A. Ranfagni, P. Vezio, M. Calamai, A. Chowdhury, F. Marino, and F. Marin

(参考訳) 電磁場の素励起(光子)と量子化された機械振動(フォノン)の強い結合は、フォノンポラリトンとして知られるハイブリッド準粒子状態を生成する。その典型的な特徴は結合系の固有周波数間の反交差であり、Jaynes-Cummingsハミルトニアンによって典型的に説明され、共振器光子が原子、イオン、励起子、スピンアンサンブル、超伝導量子ビットと結合する量子電磁力学実験で観測されている。本研究では、光浮揚されたナノ球の量子運動におけるフォノンポラリトンの生成を実証する。粒子は光ピンセットによって高真空中に捕捉され、ピンセット光子のコヒーレント散乱によって単一の共振器モードに強く結合される。2次元運動はほぼ縮退した2つの成分に分かれ、光共振器モードとともに3自由度を持つオプトメカニカル系を定義する。そのため、強結合領域に入ると、三体量子系に典型的な分散則を持つ光と機械のハイブリッド状態が観測される。注目すべきことに、ここでの運動の独立成分は平面上の物理的な振動方向を定め、光の偏光と同様に、ポラリトン場にベクトル的な性質を与える。本研究の結果は、フォトニック成分とフォノニック成分の間の量子情報転送のための新しいプロトコルへの道を開き、室温でのオプトメカニカルなエンタングル状態の実証に向けた重要な一歩となる。

The strong coupling between elementary excitations of the electromagnetic field (photons) and quantized mechanical vibrations (phonons) produces hybrid quasi-particle states, known as phonon-polaritons. Their typical signature is the avoided crossing between the eigenfrequencies of the coupled system, as paradigmatically illustrated by the Jaynes-Cummings Hamiltonian, and observed in quantum electrodynamics experiments where cavity photons are coupled to atoms, ions, excitons, spin ensembles and superconducting qubits. In this work, we demonstrate the generation of phonon-polaritons in the quantum motion of an optically levitated nanosphere. The particle is trapped in high vacuum by an optical tweezer and strongly coupled to a single cavity mode by coherent scattering of the tweezer photons. The two-dimensional motion splits into two nearly degenerate components that, together with the optical cavity mode, define an optomechanical system with three degrees of freedom. As such, when entering the strong coupling regime, we observe hybrid light-mechanical states with a dispersion law typical of tripartite quantum systems. Remarkably, the independent components of motion here identify a physical vibration direction on a plane that, similarly to the polarization of light, confers a vectorial nature to the polariton field. Our results pave the way to novel protocols for quantum information transfer between photonic and phononic components and represent a key step towards the demonstration of optomechanical entangled states at room temperature.

翻訳日:2023-04-18 07:37:27 公開日:2020-12-30

# 勾配の流れを罠にかける方法

How to trap a gradient flow ( http://arxiv.org/abs/2001.02968v3 )

ライセンス: Link先を確認

Sébastien Bubeck and Dan Mikulincer

(参考訳) $\mathbb{R}^d$ のコンパクト領域上の滑らかな関数について、$\varepsilon$-近似定常点を見つける問題を考える。勾配降下のような次元に依存しないアプローチとは対照的に、ここでは $d$ が有限で、潜在的に小さい場合に焦点を当てる。この視点は1993年にVavasisによって探求された。Vavasisは、任意の固定された有限次元 $d$ に対して、勾配降下のオラクル複雑性 $O(1/\varepsilon^2)$ を改善するアルゴリズムを提案した。例えば $d=2$ の場合、Vavasisのアプローチは複雑性 $O(1/\varepsilon)$ を得る。さらに $d=2$ について、彼は決定論的アルゴリズムに対する下界 $\Omega(1/\sqrt{\varepsilon})$ も証明した(我々はこの結果をランダム化アルゴリズムに拡張する)。我々の主な貢献は、勾配流トラッピング(GFT)と呼ぶアルゴリズムと、そのオラクル複雑性の解析である。次元 $d=2$ において、GFTは複雑性 $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$ を持つことを示し、Vavasisの下界とのギャップを(対数因子を除いて)閉じる。次元 $d=3$ では複雑性 $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$ を示し、Vavasisの $O\left(1 / \varepsilon^{1.2} \right)$ を改善する。より高い次元では、GFTは対数的な並列深さの戦略であるという注目すべき性質を持ち、勾配降下やVavasisのアルゴリズムの多項式深さとは著しく対照的である。この高次元領域において、GFTの総作業量は、この問題に対して他に知られている唯一の多重対数深さの戦略である単純なグリッド探索を2次的に改善する。この結果を、任意の固定次元でVavasisのアルゴリズムを改善する \emph{cut and flow} (CF) という別のアルゴリズムで補強する。

We consider the problem of finding an $\varepsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. In contrast with dimension-free approaches such as gradient descent, we focus here on the case where $d$ is finite, and potentially small. This viewpoint was explored in 1993 by Vavasis, who proposed an algorithm which, for any fixed finite dimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity of gradient descent. For example for $d=2$, Vavasis' approach obtains the complexity $O(1/\varepsilon)$. Moreover for $d=2$ he also proved a lower bound of $\Omega(1/\sqrt{\varepsilon})$ for deterministic algorithms (we extend this result to randomized algorithms). Our main contribution is an algorithm, which we call gradient flow trapping (GFT), and the analysis of its oracle complexity. In dimension $d=2$, GFT closes the gap with Vavasis' lower bound (up to a logarithmic factor), as we show that it has complexity $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$. In dimension $d=3$, we show a complexity of $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$, improving upon Vavasis' $O\left(1 / \varepsilon^{1.2} \right)$. In higher dimensions, GFT has the remarkable property of being a logarithmic parallel depth strategy, in stark contrast with the polynomial depth of gradient descent or Vavasis' algorithm. In this higher dimensional regime, the total work of GFT improves quadratically upon the only other known polylogarithmic depth strategy for this problem, namely naive grid search. We augment this result with another algorithm, named \emph{cut and flow} (CF), which improves upon Vavasis' algorithm in any fixed dimension.
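
For orientation, the baseline that the paper improves upon can be sketched as follows: plain gradient descent with step size 1/L on an L-smooth function, stopped as soon as the gradient norm falls below ε, with every gradient evaluation counted as one oracle query. This is not gradient flow trapping and not Vavasis' algorithm. The test function, the starting point, and the fact that the compact domain is ignored are simplifying assumptions made here.

```python
import math


def gradient_descent_to_stationarity(grad, x0, smoothness, epsilon, max_queries=100_000):
    """Run gradient descent until the gradient norm is at most epsilon.

    Returns the final point and the number of gradient (oracle) queries used.
    """
    x = list(x0)
    for queries in range(1, max_queries + 1):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) <= epsilon:
            return x, queries
        # Standard step size 1/L for an L-smooth objective.
        x = [xi - gi / smoothness for xi, gi in zip(x, g)]
    raise RuntimeError("no epsilon-stationary point found within the query budget")


def example_gradient(point):
    """Gradient of the illustrative function f(x, y) = cos(x) + y**2 / 2 (1-smooth)."""
    x, y = point
    return [-math.sin(x), y]


if __name__ == "__main__":
    point, queries = gradient_descent_to_stationarity(
        example_gradient, [1.0, 1.0], smoothness=1.0, epsilon=1e-6
    )
    print(point, queries)
```

On this easy example very few queries are needed; the $O(1/\varepsilon^2)$ figure quoted in the abstract is the worst-case guarantee of this scheme over all smooth functions.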

翻訳日:2023-01-13 05:25:17 公開日:2020-12-30

# RatLesNetv2: げっ歯類脳病変分離のための完全な畳み込みネットワーク

RatLesNetv2: A Fully Convolutional Network for Rodent Brain Lesion Segmentation ( http://arxiv.org/abs/2001.09138v4 )

ライセンス: Link先を確認

Juan Miguel Valverde, Artem Shatillo, Riccardo de Feo, Olli Gröhn, Alejandra Sierra, Jussi Tohka

(参考訳) げっ歯類の磁気共鳴(MR)脳画像における病変のセグメンテーションのために、RatLesNetv2と名付けた完全畳み込みニューラルネットワーク(ConvNet)を提案する。RatLesNetv2のアーキテクチャはオートエンコーダに似ており、最適化を容易にする残差ブロックを組み込んでいる。RatLesNetv2は3次元画像でエンドツーエンドに訓練され、前処理を必要としない。創薬に向けた局所脳虚血の研究に用いられた、9つの異なる病変段階にある671匹のラットのT2強調脳MRI 916件からなる非常に大規模なデータセットでRatLesNetv2を評価した。さらに、医用画像セグメンテーション用に設計された他の3つのConvNetと性能を比較した。RatLesNetv2は他のConvNetと同等以上のDice係数を示し、穴が少なくハウスドルフ距離も小さい、より現実的でコンパクトなセグメンテーションを生成した。RatLesNetv2によるセグメンテーションのDiceスコアは、手動セグメンテーションの評価者間一致度も上回った。結論として、RatLesNetv2は病変の自動セグメンテーションに利用でき、人手の作業量を減らし、再現性を向上させうる。RatLesNetv2は https://github.com/jmlipman/RatLesNetv2 で公開されている。

We present a fully convolutional neural network (ConvNet), named RatLesNetv2, for segmenting lesions in rodent magnetic resonance (MR) brain images. RatLesNetv2 architecture resembles an autoencoder and it incorporates residual blocks that facilitate its optimization. RatLesNetv2 is trained end to end on three-dimensional images and it requires no preprocessing. We evaluated RatLesNetv2 on an exceptionally large dataset composed of 916 T2-weighted rat brain MRI scans of 671 rats at nine different lesion stages that were used to study focal cerebral ischemia for drug development. In addition, we compared its performance with three other ConvNets specifically designed for medical image segmentation. RatLesNetv2 obtained similar to higher Dice coefficient values than the other ConvNets and it produced much more realistic and compact segmentations with notably fewer holes and lower Hausdorff distance. The Dice scores of RatLesNetv2 segmentations also exceeded inter-rater agreement of manual segmentations. In conclusion, RatLesNetv2 could be used for automated lesion segmentation, reducing human workload and improving reproducibility. RatLesNetv2 is publicly available at https://github.com/jmlipman/RatLesNetv2.
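
The Dice coefficient used for the comparison above measures the overlap between a predicted and a reference segmentation mask. A minimal sketch of the standard definition follows; it works on flattened binary masks and is not taken from the RatLesNetv2 code base. The function name and the convention for two empty masks are assumptions made here.

```python
def dice_coefficient(prediction, reference):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two flattened binary masks."""
    pred = [bool(v) for v in prediction]
    ref = [bool(v) for v in reference]
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    if total == 0:
        # Both masks are empty; treated as perfect agreement (a convention, assumed).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    manual = [1, 0, 0, 0, 1, 1]
    print(dice_coefficient(predicted, manual))
```

A value of 1 means the masks coincide and 0 means they do not overlap. Dice alone does not capture holes or outlying voxels, which is presumably why the abstract also reports the Hausdorff distance.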

翻訳日:2023-01-07 05:24:46 公開日:2020-12-30

# 雑音入力を用いたマルチクラスガウス過程分類

Multi-class Gaussian Process Classification with Noisy Inputs ( http://arxiv.org/abs/2001.10523v3 )

ライセンス: Link先を確認

Carlos Villacampa-Calvo, Bryan Zaldivar, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

(参考訳) 機械学習コミュニティでは、観測されたデータが入力属性のノイズフリーであると仮定することが一般的である。それでも、実際の問題では入力ノイズを伴うシナリオが一般的であり、測定が完全に正確ではない。この入力ノイズが考慮されない場合、教師付き機械学習手法が準最適に実行されることが期待される。本稿では,マルチクラス分類問題に着目し,ガウス過程(GP)を基礎となる分類器として利用する。天体物理学領域から得られたデータセットにより、観測されたデータは入力にノイズを含む可能性があると仮定する。そこで我々は,入力ノイズを考慮できるマルチクラスgp分類器を考案する。このような分類器は、モデルの潜在変数の後方分布を近似するために変分推論を用いて効率的に訓練することができる。また、状況によっては事前に騒音量を知ることができる。このような場合、提案手法に容易に導入することができる。この事前情報は、よりよいパフォーマンス結果をもたらすことが期待されている。提案手法を,合成データと実データを含むいくつかの実験により評価した。これには、UCIリポジトリからのいくつかのデータセット、MNISTデータセット、天体物理学からのデータセットが含まれる。その結果,入力ノイズを無視するgpsに基づく分類器の予測分布よりも,提案手法の予測分布が類似しているものの,テストログの類似性の観点からは,提案手法の予測分布が優れていることがわかった。

It is a common practice in the machine learning community to assume that the observed data are noise-free in the input attributes. Nevertheless, scenarios with input noise are common in real problems, as measurements are never perfectly accurate. If this input noise is not taken into account, a supervised machine learning method is expected to perform sub-optimally. In this paper, we focus on multi-class classification problems and use Gaussian processes (GPs) as the underlying classifier. Motivated by a data set coming from the astrophysics domain, we hypothesize that the observed data may contain noise in the inputs. Therefore, we devise several multi-class GP classifiers that can account for input noise. Such classifiers can be efficiently trained using variational inference to approximate the posterior distribution of the latent variables of the model. Moreover, in some situations, the amount of noise can be known before-hand. If this is the case, it can be readily introduced in the proposed methods. This prior information is expected to lead to better performance results. We have evaluated the proposed methods by carrying out several experiments, involving synthetic and real data. These include several data sets from the UCI repository, the MNIST data set and a data set coming from astrophysics. The results obtained show that, although the classification error is similar across methods, the predictive distribution of the proposed methods is better, in terms of the test log-likelihood, than the predictive distribution of a classifier based on GPs that ignores input noise.

翻訳日:2023-01-06 02:24:13 公開日:2020-12-30

# 高速モデル推論のためのニューラルネットワーク圧縮フレームワーク

Neural Network Compression Framework for fast model inference ( http://arxiv.org/abs/2002.08679v4 )

ライセンス: Link先を確認

Alexander Kozlov and Ivan Lazarevich and Vasily Shamporov and Nikolay Lyalyushkin and Yury Gorbachev

(参考訳) 本稿では、Neural Network Compression Framework(NNCF)と呼ぶ、微調整を伴うニューラルネットワーク圧縮のための新しいフレームワークを提案する。様々なネットワーク圧縮手法の最近の進歩を活用し、スパース化、量子化、二値化などいくつかの手法を実装している。これらの手法により、汎用ハードウェアの計算ユニット(CPU、GPU)や専用のディープラーニングアクセラレータ上で効率的に実行できる、よりハードウェアに適したモデルが得られる。開発した手法が、元の精度を維持しつつ推論時間を短縮するために、幅広いモデルに適用できることを示す。フレームワークは、付属のトレーニングサンプルの中で使用することも、最小限の適応で既存のトレーニングコードにシームレスに統合できるスタンドアロンパッケージとして使用することもできる。現在、NNCFのPyTorchバージョンがOpenVINO Training Extensionsの一部として https://github.com/openvinotoolkit/nncf で公開されている。

In this work we present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF). It leverages recent advances in various network compression methods and implements some of them, such as sparsity, quantization, and binarization. These methods yield more hardware-friendly models that can be run efficiently on general-purpose hardware computation units (CPU, GPU) or special Deep Learning accelerators. We show that the developed methods can be successfully applied to a wide range of models to accelerate inference while keeping the original accuracy. The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code with minimal adaptations. Currently, a PyTorch version of NNCF is available as a part of OpenVINO Training Extensions at https://github.com/openvinotoolkit/nncf.
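
To illustrate what quantization does to a set of weights, here is a minimal, generic sketch of symmetric uniform quantization to signed 8-bit integers. It does not use or reproduce the NNCF API; the function names and the per-tensor scaling scheme are assumptions chosen for illustration, and fine-tuning, which the framework relies on to recover accuracy, is not shown.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers in [-q_max, q_max] using one shared scale."""
    q_max = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    quantized = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_symmetric(weights)
    restored = dequantize(q, scale)
    print(q, scale, restored)
```

The round-trip error of each weight is at most half a quantization step, and integer arithmetic on the quantized values is what makes such models cheaper to run on CPUs, GPUs, and dedicated accelerators.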

翻訳日:2022-12-30 08:12:00 公開日:2020-12-30

# ディープネットワーク最適化アルゴリズムの収束保証に関する基礎的アプローチ

An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks ( http://arxiv.org/abs/2002.09051v2 )

ライセンス: Link先を確認

Vincent Roulet and Zaid Harchaoui

(参考訳) 本稿では,基本引数と計算に基づく深層ネットワークの最適化アルゴリズムの収束保証を得るための手法を提案する。収束解析は、機械学習ソフトウェアにおけるディープネットワークの実装の中心となる最適化オラクルの分析構造と計算構造を中心に展開される。深層ネットワークの学習に使用される一階最適化アルゴリズムの収束挙動を制御する滑らかさ定数の推定を体系的に計算する方法を提案する。現代のディープネットワークで発生する多様なサンプルコンポーネントとアーキテクチャは、そのアプローチを説明するために展示物にまたがる。

We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations. The convergence analysis revolves around the analytical and computational structures of optimization oracles central to the implementation of deep networks in machine learning software. We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks. A diverse set of example components and architectures arising in modern deep networks intersperse the exposition to illustrate the approach.

翻訳日:2022-12-30 07:07:57 公開日:2020-12-30

# 機械学習研究における再現性の向上 (NeurIPS 2019 Reproducibility Programからの報告)

Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program) ( http://arxiv.org/abs/2003.12206v4 )

ライセンス: Link先を確認

Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, Hugo Larochelle

(参考訳) 機械学習研究の課題の1つは、提示された結果が健全で信頼性が高いことを保証することである。論文や講演で示されるような結果を得る再現性は、同じコードとデータ(利用可能であれば)を使用して、研究結果の信頼性を検証するために必要なステップである。再現性はまた、オープンでアクセスしやすい研究を促進するための重要なステップであり、科学コミュニティが新しい発見を迅速に統合し、アイデアを実践に転換することを可能にする。再現性はまた、意図しないエラーを減らす可能性のある堅牢な実験ワークフローの使用を促進する。 2019年、ニューラル情報処理システム(NeurIPS)カンファレンスは、機械学習研究のための国際会議であり、機械学習研究の実施、コミュニケーション、評価に関するコミュニティ全体の標準を改善するために設計された再現性プログラムを導入した。プログラムには3つのコンポーネントが含まれている: コード提出ポリシー、コミュニティ全体の再現性課題、および、ペーパー提出プロセスの一部として機械学習再現性チェックリストが含まれる。本稿では,これら各コンポーネントについて,デプロイ方法,このイニシアティブからどのようなことを学ぶことができたか,などについて述べる。

One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible research, thereby allowing the scientific community to quickly integrate new findings and convert ideas to practice. Reproducibility also promotes the use of robust experimental workflows, which potentially reduce unintentional errors. In 2019, the Neural Information Processing Systems (NeurIPS) conference, the premier international conference for research in machine learning, introduced a reproducibility program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research. The program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. In this paper, we describe each of these components, how it was deployed, as well as what we were able to learn from this initiative.

翻訳日:2022-12-19 04:37:05 公開日:2020-12-30

# ResNeSt: 分割アテンションネットワーク

ResNeSt: Split-Attention Networks ( http://arxiv.org/abs/2004.08955v2 )

ライセンス: Link先を確認

Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

(参考訳) 特徴マップの注意とマルチパス表現が視覚認識に重要であることはよく知られている。本稿では,異なるネットワークブランチにチャネル毎の注意を向け,機能横断的なインタラクションを捉え,多様な表現を学習するモジュラー化アーキテクチャを提案する。我々の設計は単純で統一された計算ブロックとなり、少数の変数だけでパラメータ化できる。我々のモデルはResNeStと呼ばれ、画像分類の精度と遅延トレードオフにおいてEfficientNetより優れています。さらに、ResNeStはバックボーンとして機能するいくつかの公開ベンチマークにおいて優れた転送学習結果を達成しており、COCO-LVISチャレンジの勝者として採用されている。完全なシステムと事前訓練されたモデルのソースコードが公開されている。

It is well known that feature-map attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture, which applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in the accuracy and latency trade-off on image classification. In addition, ResNeSt has achieved superior transfer learning results on several public benchmarks when serving as the backbone, and has been adopted by the winning entries of the COCO-LVIS challenge. The source code for the complete system and the pretrained models is publicly available.

翻訳日:2022-12-12 00:21:37 公開日:2020-12-30

# ニューラルネットワーク高速化のための最近のFPGAにおける低電圧動作の実験的検討

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration ( http://arxiv.org/abs/2005.03451v2 )

ライセンス: Link先を確認

Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu

(参考訳) 本研究では、フィールドプログラマブルゲートアレイ(FPGA)にマッピングされた畳み込みニューラルネットワーク(CNN)アクセラレータの電力効率を向上させるために、回路の供給電圧を公称レベルより下げるアンダーボルティング手法を実験的に評価する。安全な電圧レベルを下回るアンダーボルティングは、回路遅延の過度な増加によるタイミング障害を引き起こす可能性がある。このようなアクセラレータにおける信頼性と電力のトレードオフを評価する。具体的には、実FPGAの複数のコンポーネントの低電圧動作を実験的に調べ、対応するCNNアクセラレータの信頼性挙動を特徴付け、低電圧動作の欠点を最小化する手法を提案し、アンダーボルティングを量子化とプルーニングというアーキテクチャ上のCNN最適化技術と組み合わせる。また、環境温度がアクセラレータの信頼性と電力のトレードオフに及ぼす影響を調べる。最新のXilinx ZCU102 FPGAプラットフォームの同一サンプル3台で、最先端の画像分類CNNベンチマーク5種を用いて実験を行う。このアプローチにより、ソフトウェアとハードウェアの両方のばらつきに対するアンダーボルティング手法の効果を調べることができる。アンダーボルティングにより3倍以上の電力効率(GOPs/W)向上を達成する。この向上のうち2.6倍は、電圧ガードバンド領域、すなわち最悪の環境条件および回路条件でも正しい機能を保証するためにFPGAベンダーが設定した、公称レベルより下の安全な電圧領域を取り除いた結果である。電力効率向上のうち43%は、ガードバンドをさらに下回るアンダーボルティングによるものであり、CNNアクセラレータの精度低下という代償を伴う。この精度低下を防ぐ効果的な周波数アンダースケーリング手法を評価し、それによって電力効率の向上が43%から25%に減少することを確認した。

We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.

翻訳日:2022-12-07 01:23:45 公開日:2020-12-30

# 深層学習に基づくテキスト・スタイル・トランスファーのレビュー

Review of Text Style Transfer Based on Deep Learning ( http://arxiv.org/abs/2005.02914v3 )

ライセンス: Link先を確認

Xiangyang Li, Guo Pu, Keyu Ming, Pu Li, Jie Wang, Yuxuan Wang

(参考訳) 最近の自然言語処理ではテキストスタイルの転送がホットな問題であり、主に特定の状況や聴衆、目的に適応するためにテキストを研究している。テキストのスタイルは、通常、形態、文法、感情、複雑さ、流動性、緊張、トーンなど多くの側面を含んでいる。従来のテキストスタイル伝達モデルでは、テキストスタイルは一般的に専門家の知識と手作りのルールに依存しているが、自然言語処理の分野でのディープラーニングの適用により、深層学習に基づくテキストスタイル転送手法が研究され始めた。近年,自然言語処理研究において,テキストスタイル転送がホットな問題となっている。本稿では,近年の深層学習に基づくテキストスタイル伝達モデルの研究を要約し,本研究の方向性と進歩を要約し,分析し,比較する。さらに、テキストスタイルの転送によく使用される公開データセットや評価指標も紹介する。最後に、既存のテキストスタイル転送モデルの特徴を要約し、ディープラーニングに基づくテキストスタイル転送モデルの今後の開発動向を分析し予測する。

Text style transfer is a hot topic in recent natural language processing research; it mainly studies how to adapt a text to specific situations, audiences, and purposes by making some changes. The style of a text usually includes many aspects such as morphology, grammar, emotion, complexity, fluency, tense, tone, and so on. Traditional text style transfer models generally rely on expert knowledge and hand-designed rules, but with the application of deep learning in the field of natural language processing, text style transfer methods based on deep learning have started to be heavily researched. This article summarizes the research on text style transfer models based on deep learning in recent years, and summarizes, analyzes, and compares the main research directions and progress. In addition, the article introduces public datasets and evaluation indicators commonly used for text style transfer. Finally, the characteristics of existing text style transfer models are summarized, and the future development trend of text style transfer models based on deep learning is analyzed and forecasted.

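Translated: 2022-12-06 04:58:22 Published: 2020-12-30

# Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis ( http://arxiv.org/abs/2005.05179v4 )

License: see the linked page

Zichao Zhang, Torsten Sattler, Davide Scaramuzza

(Reference translation) Visual localization is one of the key technologies for autonomous driving and augmented reality. High-quality datasets with accurate 6 degree-of-freedom (DoF) reference poses are the basis for benchmarking and improving existing methods. Traditionally, reference poses are obtained via Structure-from-Motion (SfM). However, SfM itself depends on local features that tend to fail on images taken under different conditions, such as day-night changes. At the same time, manually annotating feature correspondences is not scalable and is potentially inaccurate. In this work, we propose a semi-automated method for generating reference poses based on feature matching between renderings of a 3D model and real images. Given an initial pose estimate, the pose is refined iteratively based on feature matches against a rendering of the model from the current pose estimate. We greatly improve the nighttime reference poses of the popular Aachen Day-Night dataset and show that current state-of-the-art visual localization methods perform better (by up to 47%) than the original reference poses predict. We extend the dataset with new nighttime test images, provide uncertainty estimates for the new reference poses, and introduce a new evaluation criterion. We will release the reference poses and the framework upon publication.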

Visual localization is one of the key enabling technologies for autonomous driving and augmented reality. High-quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features, which are prone to fail when images were taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to $47\%$) than predicted by the original reference poses. We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.

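Translated: 2022-12-04 20:28:51 Published: 2020-12-30

# POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis ( http://arxiv.org/abs/2006.04672v2 )

License: see the linked page

Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar

(Reference translation) Monte-Carlo planning, exemplified by Monte-Carlo Tree Search (MCTS), has shown remarkable performance in applications with finite spaces. This paper considers Monte-Carlo planning in environments with continuous state-action spaces, a much less understood setting with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous-armed bandit strategy called Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical success in AlphaGo Zero (Silver et al., 2017b) and by its key role in obtaining theoretical guarantees for finite-space MCTS (Shah et al., 2019). For the first time, we study the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges at a polynomial rate to an arbitrarily small neighborhood of the optimal value function. We also provide experimental results that support the theoretical findings.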

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.
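The contrast between a logarithmic and a polynomial exploration bonus can be sketched as follows. This is only an illustration of the shape of the two terms, not the bonus defined in the paper: the function names and the exponents `alpha` and `beta` are assumptions chosen for the example.

```python
import math


def log_bonus(total_pulls: int, node_pulls: int) -> float:
    """Classic UCB-style bonus sqrt(2 ln n / s)."""
    return math.sqrt(2.0 * math.log(total_pulls) / node_pulls)


def poly_bonus(total_pulls: int, node_pulls: int,
               alpha: float = 0.25, beta: float = 0.5) -> float:
    """Polynomial bonus n**alpha / s**beta (exponents are illustrative only)."""
    return total_pulls ** alpha / node_pulls ** beta


# For a node visited 5 times, compare how the two bonuses grow with n.
for n in (10, 1_000, 100_000):
    print(n, round(log_bonus(n, 5), 3), round(poly_bonus(n, 5), 3))
```

With a fixed visit count, the polynomial term keeps growing much faster in the total number of rounds than the logarithmic one, so rarely visited nodes are revisited more aggressively.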

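Translated: 2022-11-24 00:13:30 Published: 2020-12-30

# Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling ( http://arxiv.org/abs/2006.07225v3 )

License: see the linked page

Sina Molavipour, Germán Bassi, Mikael Skoglund

(Reference translation) Estimating mutual information (MI) or conditional mutual information (CMI) from a set of samples is a long-standing problem. Recent work in this area leverages the approximation power of artificial neural networks and improves on conventional methods. One key challenge in this new approach is that, given the original dataset, one needs a different dataset whose samples are distributed according to a specific product density function. This is particularly difficult when estimating CMI. In this paper, we introduce a new technique based on k nearest neighbors (k-NN) to perform the resampling and derive high-confidence concentration bounds for the sample average. The technique is then used to train a neural network classifier, from which the CMI is estimated. We propose three estimators using this technique, prove their consistency, compare them with each other and with similar approaches in the literature, and show experimentally improved CMI estimation in terms of estimator accuracy and variance.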

The estimation of mutual information (MI) or conditional mutual information (CMI) from a set of samples is a long-standing problem. A recent line of work in this area has leveraged the approximation power of artificial neural networks and has shown improvements over conventional methods. One important challenge in this new approach is the need to obtain, given the original dataset, a different set where the samples are distributed according to a specific product density function. This is particularly challenging when estimating CMI. In this paper, we introduce a new technique, based on k nearest neighbors (k-NN), to perform the resampling and derive high-confidence concentration bounds for the sample average. Then the technique is employed to train a neural network classifier and the CMI is estimated accordingly. We propose three estimators using this technique and prove their consistency, make a comparison between them and similar approaches in the literature, and experimentally show improvements in estimating the CMI in terms of accuracy and variance of the estimators.
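The resampling step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `knn_product_resample` is a hypothetical helper, `z` is taken to be scalar, and the neighbor is drawn uniformly among the `k` nearest. Each sample keeps its own `x` and `z` but borrows `y` from a sample with a nearby `z`, which approximates draws from the product density p(x|z) p(y|z) p(z).

```python
import random


def knn_product_resample(samples, k=3, seed=0):
    """samples: list of (x, y, z) tuples with scalar z.

    Returns a list of (x_i, y_j, z_i), where j is chosen uniformly among
    the k nearest neighbours of z_i (excluding i itself).
    """
    rng = random.Random(seed)
    resampled = []
    for i, (x, _, z) in enumerate(samples):
        # Indices of all other samples, ordered by distance in z.
        others = sorted((j for j in range(len(samples)) if j != i),
                        key=lambda j: abs(samples[j][2] - z))
        j = rng.choice(others[:k])
        resampled.append((x, samples[j][1], z))
    return resampled


data = [(i, 10 * i, float(i)) for i in range(6)]
print(knn_product_resample(data, k=2))
```

A classifier trained to distinguish the original triples from the resampled ones can then be used to estimate the CMI, as the abstract outlines.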

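Translated: 2022-11-22 04:53:18 Published: 2020-12-30

# Class-Attentive Diffusion Network for Semi-Supervised Classification ( http://arxiv.org/abs/2006.10222v3 )

License: see the linked page

Jongin Lim, Daeho Um, Hyung Jin Chang, Dae Ung Jo, Jin Young Choi

(Reference translation) Graph neural networks for semi-supervised classification have been widely studied in recent years. However, existing methods use only the information of a limited set of neighbors and do not handle inter-class connections in the graph. This paper proposes Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes that are probably of the same class among K-hop neighbors. To this end, we first propose a new stochastic process called Class-Attentive Diffusion (CAD), which strengthens attention to intra-class nodes and weakens attention to inter-class nodes. In contrast to existing diffusion methods, whose transition matrix is determined by the graph structure alone, CAD considers both node features and graph structure through a class-attentive transition matrix designed with a classifier. We further propose an adaptive update scheme that uses a different reflection ratio of the diffusion result for each node depending on its local class context. As its main advantage, AdaCAD alleviates the undesired mixing of inter-class features caused by discrepancies between node labels and graph topology. Building on AdaCAD, we construct a simple model called CAD-Net. Experiments consistently validate the effectiveness of the proposed method, and CAD-Net outperforms state-of-the-art methods. Code is available at https://github.com/ljin0429/CAD-Net.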

Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes probably of the same class among K-hop neighbors. To this end, we first propose a novel stochastic process, called Class-Attentive Diffusion (CAD), that strengthens attention to intra-class nodes and attenuates attention to inter-class nodes. In contrast to the existing diffusion methods with a transition matrix determined solely by the graph structure, CAD considers both the node features and the graph structure with the design of our class-attentive transition matrix that utilizes a classifier. Then, we further propose an adaptive update scheme that leverages different reflection ratios of the diffusion result for each node depending on the local class-context. As the main advantage, AdaCAD alleviates the problem of undesired mixing of inter-class features caused by discrepancies between node labels and the graph topology. Built on AdaCAD, we construct a simple model called Class-Attentive Diffusion Network (CAD-Net). Extensive experiments on seven benchmark datasets consistently demonstrate the efficacy of the proposed method and our CAD-Net significantly outperforms the state-of-the-art methods. Code is available at https://github.com/ljin0429/CAD-Net.

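Translated: 2022-11-19 10:08:59 Published: 2020-12-30

# Efficient estimation of the ANOVA mean dimension, with an application to neural net classification ( http://arxiv.org/abs/2007.01281v4 )

License: see the linked page

Christopher Hoyt and Art B. Owen

(Reference translation) The mean dimension of a black-box function of $d$ variables is a convenient way to summarize the extent to which it is dominated by high- or low-order interactions. It is expressed in terms of $2^d-1$ variance components, but it can be written as a sum of $d$ Sobol' indices, which can be estimated by leave-one-out methods. We compare such methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations. For additive functions, the radial and winding stairs samplers are the most efficient. For multiplicative functions, the naive method can be the most efficient when the factors have high kurtosis. As an illustration, we consider the mean dimension of a neural network classifier of digits from the MNIST dataset. The classifier is a function of $784$ pixels. For that problem, winding stairs is the best algorithm. We find that the inputs to the final softmax layer have mean dimensions ranging from $1.35$ to $2.0$.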

The mean dimension of a black box function of $d$ variables is a convenient way to summarize the extent to which it is dominated by high or low order interactions. It is expressed in terms of $2^d-1$ variance components but it can be written as the sum of $d$ Sobol' indices that can be estimated by leave one out methods. We compare the variance of these leave one out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations and so costs about double the other methods. For an additive function the radial and winding stairs are most efficient. For a multiplicative function the naive method can easily be most efficient if the factors have high kurtosis. As an illustration we consider the mean dimension of a neural network classifier of digits from the MNIST data set. The classifier is a function of $784$ pixels. For that problem, winding stairs is the best algorithm. We find that inputs to the final softmax layer have mean dimensions ranging from $1.35$ to $2.0$.
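The radial leave-one-out idea can be sketched as below. This is a small illustration, assuming independent Uniform(0, 1) inputs and the standard Jansen-type identity, in which half the mean squared change of `f` when only variable `j` is resampled equals the (unnormalized) total Sobol' index of `j`. The name `mean_dimension` and its parameters are choices made for this example, not the authors' code.

```python
import random


def mean_dimension(f, d, n=20000, seed=0):
    """Radial-sampler estimate of the mean dimension of f on [0, 1]^d.

    Estimates sum_j tau_j^2 / sigma^2, with
    tau_j^2 = 0.5 * E[(f(x) - f(x with x_j replaced by z_j))^2].
    """
    rng = random.Random(seed)
    total_index_sum = 0.0
    values = []
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # baseline point
        z = [rng.random() for _ in range(d)]  # independent replacement values
        fx = f(x)
        values.append(fx)
        for j in range(d):
            y = list(x)
            y[j] = z[j]  # change one variable at a time from the baseline
            total_index_sum += 0.5 * (fx - f(y)) ** 2
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (total_index_sum / n) / variance


print(mean_dimension(lambda x: sum(x), d=3, n=2000))     # additive: close to 1
print(mean_dimension(lambda x: x[0] * x[1], d=2, n=2000))  # product: above 1
```

An additive function has mean dimension exactly 1, while for the product of two uniform variables the exact value is 8/7; the estimates approach these values as `n` grows.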

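Translated: 2022-11-14 15:02:44 Published: 2020-12-30

# EfficientHRNet: Efficient Scaling for Lightweight High-Resolution Multi-Person Pose Estimation ( http://arxiv.org/abs/2007.08090v2 )

License: see the linked page

Christopher Neff, Aneri Sheth, Steven Furgurson, Hamed Tabkhi

(Reference translation) Demand for lightweight multi-person pose estimation is growing in many emerging smart IoT applications. However, existing algorithms have large model sizes and heavy computational requirements, making them ill-suited to real-time applications and deployment on resource-constrained hardware. Lightweight, real-time approaches are extremely rare and come at the cost of lower accuracy. This paper proposes EfficientHRNet, a family of lightweight multi-person pose estimators that can run in real time on resource-constrained devices. By unifying recent advances in model scaling with high-resolution feature representations, EfficientHRNet produces highly accurate models while reducing computation enough to achieve real-time performance. The largest model comes within 4.4% of the accuracy of the current state of the art with 1/3 the model size and 1/6 the computation, reaching 23 FPS on an Nvidia Jetson Xavier. Compared with the top real-time approach, EfficientHRNet improves accuracy by 22% and achieves similar FPS at 1/3 the power. At every level, EfficientHRNet is more computationally efficient than other bottom-up 2D pose estimation approaches while achieving highly competitive accuracy.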

There is an increasing demand for lightweight multi-person pose estimation for many emerging smart IoT applications. However, the existing algorithms tend to have large model sizes and intense computational requirements, making them ill-suited for real-time applications and deployment on resource-constrained hardware. Lightweight and real-time approaches are exceedingly rare and come at the cost of inferior accuracy. In this paper, we present EfficientHRNet, a family of lightweight multi-person human pose estimators that are able to perform in real-time on resource-constrained devices. By unifying recent advances in model scaling with high-resolution feature representations, EfficientHRNet creates highly accurate models while reducing computation enough to achieve real-time performance. The largest model is able to come within 4.4% accuracy of the current state-of-the-art, while having 1/3 the model size and 1/6 the computation, achieving 23 FPS on Nvidia Jetson Xavier. Compared to the top real-time approach, EfficientHRNet increases accuracy by 22% while achieving similar FPS with 1/3 the power. At every level, EfficientHRNet proves to be more computationally efficient than other bottom-up 2D human pose estimation approaches, while achieving highly competitive accuracy.

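Translated: 2022-11-09 23:00:12 Published: 2020-12-30

# Gesture Recognition for Initiating Human-to-Robot Handovers ( http://arxiv.org/abs/2007.09945v2 )

License: see the linked page

Jun Kwan, Chinkye Tan and Akansel Cosgun

(Reference translation) Human-to-robot handovers are useful in many human-robot interaction scenarios. It is important to recognize when a human intends to initiate a handover, so that the robot does not try to take an object from a human when no handover is intended. We pose handover gesture recognition as a binary classification problem on a single RGB image. To extract relevant features from the RGB image, we implement three separate neural network modules that detect the object, human body keypoints, and head orientation, and pass the feature vectors to a deep neural network that performs the binary classification. Our results show that handover gestures are correctly identified with over 90% accuracy. The feature abstraction makes the approach modular and generalizable to different objects and human body types.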

Human-to-Robot handovers are useful for many Human-Robot Interaction scenarios. It is important to recognize when a human intends to initiate handovers, so that the robot does not try to take objects from humans when a handover is not intended. We pose the handover gesture recognition as a binary classification problem in a single RGB image. Three separate neural network modules for detecting the object, human body key points and head orientation, are implemented to extract relevant features from the RGB images, and then the feature vectors are passed into a deep neural net to perform binary classification. Our results show that the handover gestures are correctly identified with an accuracy of over 90%. The abstraction of the features makes our approach modular and generalizable to different objects and human body types.

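Translated: 2022-11-08 14:35:14 Published: 2020-12-30

# Geometric Interpretations of the Normalized Epipolar Error ( http://arxiv.org/abs/2008.01254v7 )

License: see the linked page

Seong Hun Lee, Javier Civera

(Reference translation) In this work, we provide geometric interpretations of the normalized epipolar error. Most notably, these are: (1) the shortest distance between the two backprojected rays, (2) the dihedral angle between the two bounding epipolar planes, and (3) the $L_1$-optimal angular reprojection error.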

In this work, we provide geometric interpretations of the normalized epipolar error. Most notably, we show that it is directly related to the following quantities: (1) the shortest distance between the two backprojected rays, (2) the dihedral angle between the two bounding epipolar planes, and (3) the $L_1$-optimal angular reprojection error.
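The first of these relations can be checked numerically with elementary geometry. The sketch below is written under assumed conventions that may differ from the paper's: camera 0 sits at the origin, camera 1 at position `t`, both bearing vectors are expressed in the frame of camera 0, and the normalized epipolar error is taken to be the absolute scalar triple product of the unit translation and the two unit bearings. All function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(p / norm for p in a)


def normalized_epipolar_error(t, f0, f1):
    """|t_hat . (f0_hat x f1_hat)| for unit translation and unit bearings."""
    return abs(dot(unit(t), cross(unit(f0), unit(f1))))


def ray_distance(t, f0, f1):
    """Shortest distance between the line through the origin along f0
    and the line through t along f1 (non-parallel rays assumed)."""
    n = cross(unit(f0), unit(f1))
    return abs(dot(t, n)) / math.sqrt(dot(n, n))


# Two rays that meet at a 3D point give zero error and zero distance.
point, t = (1.0, 2.0, 5.0), (1.0, 0.0, 0.0)
f1 = tuple(p - q for p, q in zip(point, t))
print(normalized_epipolar_error(t, point, f1), ray_distance(t, point, f1))
```

Under these conventions the distance equals the baseline length times the error divided by the sine of the angle between the two rays, so the error vanishes exactly when the rays intersect.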

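Translated: 2022-11-03 00:24:02 Published: 2020-12-30

# Automated Storytelling via Causal, Commonsense Plot Ordering ( http://arxiv.org/abs/2009.00829v2 )

License: see the linked page

Prithviraj Ammanabrolu, Wesley Cheung, William Broniec, Mark O. Riedl

(Reference translation) Automated story plot generation is the task of generating a coherent sequence of plot events. Causal relations between plot events are believed to increase the perception of story and plot coherence. In this work, we introduce the concept of soft causal relations, i.e., causal relations inferred from commonsense reasoning. C2PO is an approach to narrative generation that operationalizes this concept through Causal, Commonsense Plot Ordering. Using human-participant protocols, we evaluate our system against baseline systems with different commonsense reasoning and inductive biases to determine the role of soft causal relations in perceived story quality. Through these studies, we also examine how changes in commonsense norms across storytelling genres affect perceptions of story quality.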

Automated story plot generation is the task of generating a coherent sequence of plot events. Causal relations between plot events are believed to increase the perception of story and plot coherence. In this work, we introduce the concept of soft causal relations as causal relations inferred from commonsense reasoning. We demonstrate C2PO, an approach to narrative generation that operationalizes this concept through Causal, Commonsense Plot Ordering. Using human-participant protocols, we evaluate our system against baseline systems with different commonsense reasoning and inductive biases to determine the role of soft causal relations in perceived story quality. Through these studies we also probe how changes in commonsense norms across storytelling genres affect perceptions of story quality.

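Translated: 2022-10-22 18:25:31 Published: 2020-12-30

# Deep Hiearchical Multi-Label Classification Applied to Chest X-Ray Abnormality Taxonomies ( http://arxiv.org/abs/2009.05609v3 )

License: see the linked page

Haomin Chen, Shun Miao, Daguang Xu, Gregory D. Hager, Adam P. Harrison

(Reference translation) CXRs are a crucial and extremely common diagnostic tool, which has led to research on CAD solutions. However, both high classification accuracy and meaningful model predictions that respect and incorporate clinical taxonomies are essential for CAD usability. To this end, we propose an HMLC approach for CXR CAD. Unlike other hierarchical systems, first training the network to model conditional probabilities directly and then refining it with unconditional probabilities is key to improving performance. In addition, we formulate a numerically stable cross-entropy loss function for unconditional probabilities that yields concrete performance gains. Finally, we show that HMLC is effective for handling missing or incomplete labels. To the best of our knowledge, this is the first application of HMLC to medical imaging CAD. We extensively evaluate the approach on detecting abnormality labels from the CXR arm of the PLCO dataset. With complete labels, we report a mean AUC of 0.887, the highest reported for this dataset. These results are supported by ancillary experiments on the PadChest dataset, where we report significant improvements of 1.2% and 4.1% in AUC and AP, respectively. Finally, we show that the HMLC approach handles incompletely labeled data better. These performance improvements, together with the inherent usefulness of taxonomic predictions, suggest that the approach is a useful step for CXR CAD.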

Chest X-rays (CXRs) are a crucial and extraordinarily common diagnostic tool, leading to heavy research into computer-aided diagnosis (CAD) solutions. However, both high classification accuracy and meaningful model predictions that respect and incorporate clinical taxonomies are crucial for CAD usability. To this end, we present a deep hierarchical multi-label classification (HMLC) approach for CXR CAD. Unlike other hierarchical systems, we show that first training the network to model conditional probability directly and then refining it with unconditional probabilities is key in boosting performance. In addition, we formulate a numerically stable cross-entropy loss function for unconditional probabilities that provides concrete performance improvements. Finally, we demonstrate that HMLC can be an effective means to manage missing or incomplete labels. To the best of our knowledge, we are the first to apply HMLC to medical imaging CAD. We extensively evaluate our approach on detecting abnormality labels from the CXR arm of the PLCO dataset, which comprises over $198,000$ manually annotated CXRs. When using complete labels, we report a mean AUC of 0.887, the highest yet reported for this dataset. These results are supported by ancillary experiments on the PadChest dataset, where we also report significant improvements, 1.2% and 4.1% in AUC and AP, respectively, over strong "flat" classifiers. Finally, we demonstrate that our HMLC approach can much better handle incompletely labelled data. These performance improvements, combined with the inherent usefulness of taxonomic predictions, indicate that our approach represents a useful step forward for CXR CAD.

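Translated: 2022-10-19 20:49:03 Published: 2020-12-30

# YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design ( http://arxiv.org/abs/2009.05697v2 )

License: see the linked page

Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

(Reference translation) The rapid development and wide use of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detection work is either accuracy-oriented, using large models at the cost of high latency, or speed-oriented, using lightweight models at the cost of accuracy. In this work, we propose the YOLObile framework for real-time object detection on mobile devices. We propose a new block-punched pruning scheme for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme and advanced compiler-assisted optimizations are adopted. Experimental results show that our pruning scheme achieves a 14$\times$ compression rate of YOLOv4 with 49.0 mAP. With the YOLObile framework, we achieve an inference speed of 17 FPS using the GPU on a Samsung Galaxy S20. Incorporating the proposed GPU-CPU collaborative scheme raises the inference speed to 19.1 FPS, a 5$\times$ speedup over the original YOLOv4. The source code is at https://github.com/nightsnack/YOLObile.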

The rapid development and wide utilization of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using a large model but incurring high latency, or speed-oriented, using a lightweight model but sacrificing accuracy. In this work, we propose the YOLObile framework for real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14$\times$ compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed is increased to 19.1 FPS, a 5$\times$ speedup over the original YOLOv4. Source code is at https://github.com/nightsnack/YOLObile.

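Translated: 2022-10-19 07:41:26 Published: 2020-12-30

# Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup ( http://arxiv.org/abs/2009.06962v2 )

License: see the linked page

Jang-Hyun Kim, Wonho Choo, Hyun Oh Song

(Reference translation) While deep neural networks fit the training distribution very well, the learned networks tend to overfit and are vulnerable to adversarial attacks. In this regard, several mixup-based augmentation methods have recently been proposed. However, these approaches focus mainly on creating previously unseen virtual examples and can sometimes give the network a misleading supervisory signal. We therefore propose Puzzle Mix, a mixup method that explicitly exploits the saliency information and the underlying statistics of natural examples. This leads to an interesting optimization problem that alternates between a multi-label objective for the optimal mixing mask and a saliency-discounted optimal transport objective. Compared with other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet, Puzzle Mix achieves state-of-the-art generalization and adversarial robustness. The source code is available at https://github.com/snu-mllab/PuzzleMix.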

While deep neural networks achieve great performance on fitting the training distribution, the learned networks are prone to overfitting and are susceptible to adversarial attacks. In this regard, a number of mixup-based augmentation methods have been recently proposed. However, these approaches mainly focus on creating previously unseen virtual examples and can sometimes provide a misleading supervisory signal to the network. To this end, we propose Puzzle Mix, a mixup method for explicitly utilizing the saliency information and the underlying statistics of the natural examples. This leads to an interesting optimization problem alternating between the multi-label objective for the optimal mixing mask and the saliency-discounted optimal transport objective. Our experiments show that Puzzle Mix achieves state-of-the-art generalization and adversarial robustness results compared to other mixup methods on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets. The source code is available at https://github.com/snu-mllab/PuzzleMix.
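The basic operation that mask-based mixup methods build on can be sketched as follows. This is a deliberately reduced illustration: the mask is supplied by hand, whereas Puzzle Mix optimizes the mask and a transport plan from saliency, both of which are omitted here. `mask_mix` is a hypothetical helper operating on flat lists of pixel values.

```python
def mask_mix(x0, x1, mask):
    """Mix two inputs element-wise with a mask in [0, 1].

    Returns the mixed input and the mixing ratio lam (the mask mean),
    which would also weight the two labels: (1 - lam) * y0 + lam * y1.
    """
    mixed = [(1 - m) * a + m * b for a, b, m in zip(x0, x1, mask)]
    lam = sum(mask) / len(mask)
    return mixed, lam


# Keep the first half of x0 and take the second half from x1.
print(mask_mix([1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 1, 1]))
```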

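Translated: 2022-10-18 05:14:49 Published: 2020-12-30

# Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition ( http://arxiv.org/abs/2010.09290v2 )

License: see the linked page

Fangtao Li, Wenzhe Wang, Zihe Liu, Haoran Wang, Chenghao Yan, Bin Wu

(Reference translation) Video-based person recognition is difficult because people are occluded and blurred and the shooting angle changes. Previous work has mostly focused on person recognition in still images, ignoring the similarity and continuity between video frames. To address these challenges, we propose a new Frame Aggregation and Multi-Modal Fusion (FAMF) framework for video-based person recognition, which aggregates face features and incorporates multi-modal information to identify persons in videos. For frame aggregation, we propose a new trainable layer based on NetVLAD (AttentionVLAD), which takes an arbitrary number of features as input and computes a fixed-length aggregated feature based on feature quality. We show that introducing an attention mechanism into NetVLAD effectively reduces the impact of low-quality frames. For the multi-modal information of videos, we propose a Multi-Layer Multi-Modal Attention (MLMA) module. Experimental results on the iQIYI-VID-2019 dataset show that our framework outperforms other state-of-the-art methods.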

Video-based person recognition is challenging because persons are occluded and blurred and the shooting angle varies. Previous research has mostly focused on person recognition in still images, ignoring the similarity and continuity between video frames. To tackle these challenges, we propose a novel Frame Aggregation and Multi-Modal Fusion (FAMF) framework for video-based person recognition, which aggregates face features and combines them with multi-modal information to identify persons in videos. For frame aggregation, we propose a novel trainable layer based on NetVLAD (named AttentionVLAD), which takes an arbitrary number of features as input and computes a fixed-length aggregation feature based on feature quality. We show that introducing an attention mechanism to NetVLAD can effectively decrease the impact of low-quality frames. For the multi-modal information of videos, we propose a Multi-Layer Multi-Modal Attention (MLMA) module to learn the correlation between modalities by adaptively updating the Gram matrix. Experimental results on the iQIYI-VID-2019 dataset show that our framework outperforms other state-of-the-art methods.

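Translated: 2022-10-05 22:51:37 Published: 2020-12-30

# FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond ( http://arxiv.org/abs/2010.09294v4 )

License: see the linked page

Zhuo Su, Linpu Fang, Deke Guo, Dewen Hu, Matti Pietikäinen, Li Liu

(Reference translation) Binary neural networks (BNNs), which binarize both weights and activations to 1 bit, have been widely studied in recent years because their highly accelerated computation and greatly reduced memory footprint appeal to the development of resource-constrained devices. In contrast to previous methods that reduce the quantization error when training BNN structures, we argue that the binarized convolution process becomes increasingly linear as such error is minimized, which in turn harms the discriminative ability of BNNs. To resolve this contradiction, this paper re-examines and tunes appropriate non-linear modules, yielding a strong baseline that achieves state-of-the-art performance on the large-scale ImageNet dataset in terms of accuracy and training efficiency. We further find that the proposed BNN model can be compressed considerably more, without losing accuracy, by making better use of efficient binary operations. In addition, the limited capacity of the BNN model can be increased with the help of group execution. Based on these findings, the baseline can be improved by 4~5% in accuracy even at lower computational cost. The code will be made public at https://github.com/zhuogege1943/ftbnn.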

Binary neural networks (BNNs), where both weights and activations are binarized into 1 bit, have been widely studied in recent years because their highly accelerated computation and substantially reduced memory footprint appeal to the development of resource-constrained devices. In contrast to previous methods that tend to reduce the quantization error when training BNN structures, we argue that the binarized convolution process exhibits increasing linearity as such error is minimized, which in turn hampers the BNN's discriminative ability. In this paper, we re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance on the large-scale ImageNet dataset in terms of accuracy and training efficiency. Going further, we find that the proposed BNN model still has much potential to be compressed by making better use of the efficient binary operations, without losing accuracy. In addition, the limited capacity of the BNN model can also be increased with the help of group execution. Based on these insights, we are able to improve the baseline with an additional 4~5% top-1 accuracy gain even with less computational cost. Our code will be made public at https://github.com/zhuogege1943/ftbnn.
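The "efficient binary operations" mentioned above usually refer to replacing multiply-accumulate with XOR and bit counting. The sketch below shows that general identity for vectors of +1/-1 values packed into integers; it is not taken from the FTBNN code, and `pack` and `binary_dot` are names chosen for this example.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Positions where the bits differ contribute -1 and the rest +1,
    so the result is n - 2 * popcount(a XOR b).
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack(a), pack(b), len(a)), sum(p * q for p, q in zip(a, b)))
```

Both printed values agree, which is why a binarized convolution can be evaluated with bitwise instructions instead of floating-point multiplications.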

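Translated: 2022-10-05 22:09:28 Published: 2020-12-30

# Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition ( http://arxiv.org/abs/2010.10759v4 )

License: see the linked page

Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer

(Reference translation) This paper proposes Emformer, an efficient memory transformer for low-latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce the computational complexity of self-attention. A cache mechanism saves the key and value computation in self-attention for the left context. Emformer applies parallelized block processing in training to support low-latency models. Experiments are carried out on the LibriSpeech benchmark. At an average latency of 960 ms, Emformer achieves a WER of 2.50% on test-clean and 5.62% on test-other. Compared with a strong baseline, the augmented memory transformer (AM-TRF), Emformer obtains a 4.6-fold training speedup and an 18% relative real-time factor (RTF) reduction in decoding, with relative WER reductions of 17% on test-clean and 9% on test-other. In a low-latency scenario with an average latency of 80 ms, Emformer achieves a WER of 3.01% on test-clean and 7.09% on test-other. Compared with an LSTM baseline of the same latency and model size, Emformer obtains relative WER reductions of 9% and 16% on test-clean and test-other, respectively.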

This paper proposes Emformer, an efficient memory transformer for low-latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce the computational complexity of self-attention. A cache mechanism saves the computation for the key and value in self-attention for the left context. Emformer applies parallelized block processing in training to support low-latency models. We carry out experiments on the benchmark LibriSpeech data. Under an average latency of 960 ms, Emformer achieves a WER of $2.50\%$ on test-clean and $5.62\%$ on test-other. Compared with a strong baseline, the augmented memory transformer (AM-TRF), Emformer obtains a $4.6$-fold training speedup and an $18\%$ relative real-time factor (RTF) reduction in decoding, with relative WER reductions of $17\%$ on test-clean and $9\%$ on test-other. For a low-latency scenario with an average latency of 80 ms, Emformer achieves a WER of $3.01\%$ on test-clean and $7.09\%$ on test-other. Compared with the LSTM baseline with the same latency and model size, Emformer obtains relative WER reductions of $9\%$ and $16\%$ on test-clean and test-other, respectively.

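Translated: 2022-10-04 23:14:54 Published: 2020-12-30

# Modelling Compositionality and Structure Dependence in Natural Language ( http://arxiv.org/abs/2012.02038v2 )

License: see the linked page

Karthikeya Ramesh Kaushik, Andrea E. Martin

(Reference translation) Human beings possess the most sophisticated computational machinery in the known universe. We can understand language of rich descriptive power and communicate in the same environment with remarkable clarity. Two of the many contributors to the interest in natural language, the properties of compositionality and structure dependence, are well documented and offer a vast space for asking interesting modelling questions. The first step towards answering these questions is to ground verbal theory in formal terms. Drawing on linguistics and set theory, a formalisation of these concepts is given in the first half of this thesis. We see that cognitive systems that process language need to have certain functional constraints, such as time-based, incremental operations that rely on a structurally defined domain. The observations that result from analysing this formal setup are examined as part of a modelling exercise. Using advances in word embedding techniques, a model of relational learning is simulated on a custom dataset to show how a time-based role-filler binding mechanism satisfies some of the constraints described in the first section. The model's ability to map structure, together with its symbolic-connectionist architecture, allows a cognitively plausible implementation. The formalisation and simulation are an attempt to recognise the constraints imposed by linguistic theory and to explore the opportunities offered by a cognitive model of relation learning for realising these constraints.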

Human beings possess the most sophisticated computational machinery in the known universe. We can understand language of rich descriptive power, and communicate in the same environment with astonishing clarity. Two of the many contributors to the interest in natural language - the properties of Compositionality and Structure Dependence, are well documented, and offer a vast space to ask interesting modelling questions. The first step to begin answering these questions is to ground verbal theory in formal terms. Drawing on linguistics and set theory, a formalisation of these ideas is presented in the first half of this thesis. We see how cognitive systems that process language need to have certain functional constraints, viz. time based, incremental operations that rely on a structurally defined domain. The observations that result from analysing this formal setup are examined as part of a modelling exercise. Using the advances of word embedding techniques, a model of relational learning is simulated with a custom dataset to demonstrate how a time based role-filler binding mechanism satisfies some of the constraints described in the first section. The model's ability to map structure, along with its symbolic-connectionist architecture makes for a cognitively plausible implementation. The formalisation and simulation are together an attempt to recognise the constraints imposed by linguistic theory, and explore the opportunities presented by a cognitive model of relation learning to realise these constraints.

翻訳日:2022-09-22 09:08:32 公開日:2020-12-30

# deep gravity:深層ニューラルネットワークと地理情報を用いたモビリティフロー生成の促進

Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information ( http://arxiv.org/abs/2012.00489v2 )

ライセンス: Link先を確認

Filippo Simini, Gianni Barlacchi, Massimiliano Luca, Luca Pappalardo

(参考訳) 都市内および都市間における個人の移動は、客観的・主観的幸福、イノベーションの拡散、流行の広がり、環境の質といった、我々の社会の重要な側面に影響を与えます。このため, 位置の特性を考慮し, 実際の流れに関する情報を一切含まない, 一連の地理的位置間の流れを生成することによる, フロー生成の困難な問題に対する関心が高まっている。フロー生成に対する既存の解決策は、主に重力モデルや放射モデルのような機械的なアプローチに基づいており、アンダーフィッティングや過分散の問題を抱え、土地利用や輸送網といった重要な変数を無視し、これらの変数間の非線形関係を記述できない。本稿では,多機能深層重力モデル(MFDG)をフロー生成の有効な解として提案する。一方、MFDGモデルは、自発的地理情報データ(OpenStreetMap)から抽出された多数の変数(例えば、土地利用と道路網の特徴、輸送、食品、健康施設)を利用する。一方,本モデルは深層ニューラルネットワークを用いて,これらの変数間の複雑な非線形関係を記述する。イングランドにおける通勤流に着目した実験により,MFDGモデルは,深層ニューラルネットワークを使用しない機械モデルや地理自発的データを活用しない機械モデルよりも高い性能(人口密度の高い領域では最大250%の向上)を達成していることが示された。本研究では,時空間データを扱う深層学習コミュニティのための新しい課題であるフロー生成問題の正確な定義を提案し,現状の統計モデルよりもはるかに優れた深層ニューラルネットワークモデルを提案する。

The movements of individuals within and among cities influence key aspects of our society, such as the objective and subjective well-being, the diffusion of innovations, the spreading of epidemics, and the quality of the environment. For this reason, there is increasing interest around the challenging problem of flow generation, which consists in generating the flows between a set of geographic locations, given the characteristics of the locations and without any information about the real flows. Existing solutions to flow generation are mainly based on mechanistic approaches, such as the gravity model and the radiation model, which suffer from underfitting and overdispersion, neglect important variables such as land use and the transportation network, and cannot describe non-linear relationships between these variables. In this paper, we propose the Multi-Feature Deep Gravity (MFDG) model as an effective solution to flow generation. On the one hand, the MFDG model exploits a large number of variables (e.g., characteristics of land use and the road network; transport, food, and health facilities) extracted from voluntary geographic information data (OpenStreetMap). On the other hand, our model exploits deep neural networks to describe complex non-linear relationships between those variables. Our experiments, conducted on commuting flows in England, show that the MFDG model achieves a significant increase in performance (up to 250% for highly populated areas) over mechanistic models that do not use deep neural networks or that do not exploit voluntary geographic data. Our work presents a precise definition of the flow generation problem, which is a novel task for the deep learning community working with spatio-temporal data, and proposes a deep neural network model that significantly outperforms current state-of-the-art statistical models.
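
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of a classical singly constrained gravity model, not the MFDG model itself. It assumes each origin's total outflow is known and splits it among destinations in proportion to destination population divided by a power of distance. The function name `gravity_flows`, the exponent `beta`, and the toy inputs are illustrative assumptions, not taken from the paper.

```python
import math


def gravity_flows(populations, coords, total_outflow, beta=2.0):
    """Distribute each origin's outflow over destinations.

    Weight of destination j seen from origin i: populations[j] / d_ij ** beta.
    All names and the default beta are illustrative assumptions.
    """
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # no self-flow in this sketch
            else:
                d = math.dist(coords[i], coords[j])
                weights.append(populations[j] / d ** beta)
        norm = sum(weights)
        if norm > 0:
            for j in range(n):
                flows[i][j] = total_outflow[i] * weights[j] / norm
    return flows


# Toy example: two destinations at equal distance from origin 0.
flows = gravity_flows(
    populations=[100, 200, 400],
    coords=[(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)],
    total_outflow=[30.0, 0.0, 0.0],
)
```

In this toy case the destination with twice the population receives twice the flow. The sketch has no way to use land-use or road-network features, which is the limitation the abstract attributes to mechanistic models.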

翻訳日:2021-05-30 19:30:26 公開日:2020-12-30

# スペクトル分布認識画像生成

Spectral Distribution Aware Image Generation ( http://arxiv.org/abs/2012.03110v2 )

ライセンス: Link先を確認

Steffen Jung and Margret Keuper

(参考訳) フォトリアリスティック画像の深部生成モデルの最近の進歩は、高品質な視覚結果をもたらしている。このようなモデルは、人間の目で実際の画像と容易に区別できないような、所定のトレーニング分布からデータを生成することを学習する。しかし、このような偽画像の検出に関する最近の研究は、それらの周波数スペクトルのアーティファクトが実際に容易に識別できることを指摘している。本稿では,スペクトル判別器を用いて実データの周波数分布に応じて画像を生成することを提案する。提案する判別器は軽量でモジュール性があり、一般的なgan損失が異なる安定して動作する。この結果から,実際の周波数スペクトルによる画像生成がより容易であり,検出が困難であることが示唆された。

Recent advances in deep generative models for photo-realistic images have led to high quality visual results. Such models learn to generate data from a given training distribution such that generated images can not be easily distinguished from real images by the human eye. Yet, recent work on the detection of such fake images pointed out that they are actually easily distinguishable by artifacts in their frequency spectra. In this paper, we propose to generate images according to the frequency distribution of the real data by employing a spectral discriminator. The proposed discriminator is lightweight, modular and works stably with different commonly used GAN losses. We show that the resulting models can better generate images with realistic frequency spectra, which are thus harder to detect by this cue.

翻訳日:2021-05-22 12:04:58 公開日:2020-12-30

# メモリゲートリカレントネットワーク

Memory-Gated Recurrent Networks ( http://arxiv.org/abs/2012.13121v2 )

ライセンス: Link先を確認

Yaquan Zhang, Qi Wu, Nanbo Peng, Min Dai, Jing Zhang, Hu Wang

(参考訳) 多変量連続学習の本質は、データの依存関係を抽出する方法にある。集中治療単位の時間毎医療記録や多周波数の音声時系列といったこれらのデータセットは、個々の構成要素(マージナルメモリ)に強い連続依存を示すだけでなく、横断的な依存関係(ジョイントメモリ)において不要な記憶を示すことが多い。データ生成プロセスの根底にある関節分布の進化における多変量的複雑さのため、我々はデータ駆動型アプローチを採用し、メモリゲート型リカレントネットワーク(mGRN)と呼ばれる新しいリカレントネットワークアーキテクチャを構築し、ゲートは境界メモリとジョイントメモリという2つの異なる種類の記憶を明示的に制御する。様々な公開データセットに対する包括的シミュレーション研究と実証実験を組み合わせることで,提案したmGRNアーキテクチャは,多変量時系列を対象とする最先端アーキテクチャを一貫して上回ることを示す。

The essence of multivariate sequential learning is all about how to extract dependencies in data. These data sets, such as hourly medical records in intensive care units and multi-frequency phonetic time series, often time exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). Because of the multivariate complexity in the evolution of the joint distribution that underlies the data generating process, we take a data-driven approach and construct a novel recurrent network architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory. Through a combination of comprehensive simulation studies and empirical experiments on a range of public datasets, we show that our proposed mGRN architecture consistently outperforms state-of-the-art architectures targeting multivariate time series.

翻訳日:2021-04-25 08:06:58 公開日:2020-12-30

# 非ランバート測光ステレオにおけるフレーム間およびフレーム内表現の学習

Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo ( http://arxiv.org/abs/2012.13720v2 )

ライセンス: Link先を確認

Yanlong Cao, Binjie Ding, Zewei He, Jiangxin Yang, Jingxi Chen, Yanpeng Cao and Xin Li

(参考訳) 本稿では,2段階の畳み込みニューラルネットワーク(CNN)アーキテクチャを構築し,異なる光方向で撮像された画像の任意の数に基づいてフレーム間およびフレーム間表現を構築し,非ランベルト物体の正確な正規推定を行う。光度ステレオ問題に対して,フレーム間およびフレーム間特徴抽出モジュールを配置するための最適スキームを特定するために,多数のネットワーク設計手法を実験的に検討した。さらに, フレーム内空間畳み込みにおいて, 不正な背景領域からの干渉を除去し, 暗黒材料や鋳型シャドウを用いた表面の正常推定精度を効果的に向上させるため, 容易に得られる被写体マスクを提案する。提案する2段式光計測ステレオcnnモデル(mt-ps-cnn)は,精度と効率の両面で最先端の光計測ステレオ技術に好適である。さらに, 複素幾何の非ランベルト対象に対して, 高精度でリッチな面正規細部を予測でき, 希薄な照明分布と密集した照明分布の両方で, 安定して入力を行うことができる。

In this paper, we build a two-stage Convolutional Neural Network (CNN) architecture to construct inter- and intra-frame representations based on an arbitrary number of images captured under different light directions, performing accurate normal estimation of non-Lambertian objects. We experimentally investigate numerous network design alternatives for identifying the optimal scheme to deploy inter-frame and intra-frame feature extraction modules for the photometric stereo problem. Moreover, we propose to utilize the easily obtained object mask for eliminating adverse interference from invalid background regions in intra-frame spatial convolutions, thus effectively improve the accuracy of normal estimation for surfaces made of dark materials or with cast shadows. Experimental results demonstrate that proposed masked two-stage photometric stereo CNN model (MT-PS-CNN) performs favorably against state-of-the-art photometric stereo techniques in terms of both accuracy and efficiency. In addition, the proposed method is capable of predicting accurate and rich surface normal details for non-Lambertian objects of complex geometry and performs stably given inputs captured in both sparse and dense lighting distributions.

翻訳日:2021-04-25 01:10:07 公開日:2020-12-30

# (参考訳) 複合リスクは、教師なしドメイン適応アプローチのパフォーマンスにどのように影響するか?

How does the Combined Risk Affect the Performance of Unsupervised Domain Adaptation Approaches? ( http://arxiv.org/abs/2101.01104v1 )

ライセンス: CC BY 4.0

Li Zhong, Zhen Fang, Feng Liu, Jie Lu, Bo Yuan, Guangquan Zhang

(参考訳) unsupervised domain adaptation (uda)は、ソースドメインからのラベル付きサンプルとターゲットドメインからのラベルなしサンプルでターゲット分類器をトレーニングすることを目的としている。古典的なUDA学習バウンダリは、ターゲットのリスクは、ソースリスク、分散の相違、複合リスクの3つの項によって上限づけられていることを示している。組み合わせリスクが小さな固定値であるという仮定に基づいて、この境界値に基づく手法は、ソースリスクと分布不一致の推定を最小化することでターゲット分類器を訓練する。しかし、両方の推定器を最小化すると、複合リスクが増大し、ターゲットのリスクは制御不能になる。したがって、組み合わせたリスクを制御できなければ、ターゲット分類器は理想的な性能を達成できない。複合リスクを制御するために、重要な課題は、ターゲットドメイン内のラベル付きサンプルの有効性に根ざしている。この課題に対処するため,E-MixNetという手法を提案する。 e-mixnetは、ラベル付きソースサンプルと疑似ラベル付きターゲットサンプルに汎用的なビビナル分布である強化ミックスアップを使用して、複合リスクのプロキシを計算する。実験により、プロキシはソースリスクと分散の相違を最小化する際に、結合リスクの増加を効果的に抑制できることが示された。さらに,4つのUDA手法の損失関数に複合リスクのプロキシを付加すると,それらの性能も向上することを示した。

Unsupervised domain adaptation (UDA) aims to train a target classifier with labeled samples from the source domain and unlabeled samples from the target domain. Classical UDA learning bounds show that target risk is upper bounded by three terms: source risk, distribution discrepancy, and combined risk. Based on the assumption that the combined risk is a small fixed value, methods based on this bound train a target classifier by only minimizing estimators of the source risk and the distribution discrepancy. However, the combined risk may increase when minimizing both estimators, which makes the target risk uncontrollable. Hence the target classifier cannot achieve ideal performance if we fail to control the combined risk. To control the combined risk, the key challenge takes root in the unavailability of the labeled samples in the target domain. To address this key challenge, we propose a method named E-MixNet. E-MixNet employs enhanced mixup, a generic vicinal distribution, on the labeled source samples and pseudo-labeled target samples to calculate a proxy of the combined risk. Experiments show that the proxy can effectively curb the increase of the combined risk when minimizing the source risk and distribution discrepancy. Furthermore, we show that if the proxy of the combined risk is added into loss functions of four representative UDA methods, their performance is also improved.
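
The abstract does not say how E-MixNet's "enhanced mixup" differs from plain mixup, so the sketch below shows only standard mixup: a convex combination of two inputs and of their (soft) labels, with the mixing weight drawn from a Beta distribution. The function names and the default `alpha` are assumptions made for illustration; in the setting described above, one sample would come from the labeled source data and the other from pseudo-labeled target data.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Return the convex combination of two samples and their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# A source sample with a one-hot label and a target sample with a pseudo-label.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

The mixed label stays a valid probability vector because both inputs are, which is what lets a loss computed on mixed samples act as a trainable proxy term.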

翻訳日:2021-04-18 19:46:12 公開日:2020-12-30

# (参考訳) DeepSphere: グラフベースの球面CNN

DeepSphere: a graph-based spherical CNN ( http://arxiv.org/abs/2012.15000v1 )

ライセンス: CC BY 4.0

Michaël Defferrard, Martino Milani, Frédérick Gusset, Nathanaël Perraudin

(参考訳) 球形ニューラルネットワークの畳み込みを設計するには、効率と回転同分散の微妙なトレードオフが必要である。サンプル球のグラフ表現に基づくDeepSphereは、これらの2つのデシダータの間に制御可能なバランスを打つ。この貢献は2つある。まず、各頂点および近傍の数に関して、基礎となるグラフによる等式の影響について理論的および実証的に検討する。次に,DeepSphereの問題点について検討した。実験は最先端のパフォーマンスを示し、この定式化の効率性と柔軟性を示す。おそらく意外なことに、以前の研究と比較すると、異方性フィルターは不必要に支払う価格になるかもしれない。私たちのコードはhttps://github.com/deepsphereで利用可能です。

Designing a convolution for a spherical neural network requires a delicate tradeoff between efficiency and rotation equivariance. DeepSphere, a method based on a graph representation of the sampled sphere, strikes a controllable balance between these two desiderata. This contribution is twofold. First, we study both theoretically and empirically how equivariance is affected by the underlying graph with respect to the number of vertices and neighbors. Second, we evaluate DeepSphere on relevant problems. Experiments show state-of-the-art performance and demonstrate the efficiency and flexibility of this formulation. Perhaps surprisingly, comparison with previous work suggests that anisotropic filters might be an unnecessary price to pay. Our code is available at https://github.com/deepsphere

翻訳日:2021-04-18 19:10:40 公開日:2020-12-30

# (参考訳) openvidial:ビジュアルコンテキストを備えた大規模オープンドメイン対話データセット

OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts ( http://arxiv.org/abs/2012.15015v1 )

ライセンス: CC BY 4.0

Yuxian Meng, Shuhe Wang, Qinghong Han, Xiaofei Sun, Fei Wu, Rui Yan and Jiwei Li

(参考訳) 人間が会話するとき、話者が次に何を言うかは、彼が見るものによって大きく異なる。残念ながら、既存の対話モデルは、先行するテキストコンテキストのみに基づいて対話発話を生成しており、視覚的コンテキストはほとんど考慮されない。これは、視覚的コンテキストと組み合わせた発話を伴う大規模マルチモーダル対話データセットがないためである。本稿では,大規模マルチモーダル対話データセットである OpenViDial をリリースする。対話のターンと視覚的コンテキストは、映画やテレビシリーズから抽出され、各対話のターンは、それが行われる対応する視覚的コンテキストとペアリングされる。 OpenViDialには、合計で110万の対話ターンがあり、画像に格納されている視覚的コンテキストも110万件である。このデータセットに基づいて,CNNから抽出した粗粒度画像特徴から,Faster R-CNNから抽出した細粒度オブジェクト特徴まで,テキストと視覚の両方のコンテキストを活用するエンコーダ・デコーダモデル群を提案する。視覚情報は対話生成の質を著しく向上させ,対話学習のためのマルチモーダル機能の統合の必要性を検証する。我々の研究は、大規模マルチモーダル対話学習への重要な一歩である。

When humans converse, what a speaker will say next significantly depends on what he sees. Unfortunately, existing dialogue models generate dialogue utterances only based on preceding textual contexts, and visual contexts are rarely considered. This is due to a lack of a large-scale multi-modal dialogue dataset with utterances paired with visual contexts. In this paper, we release OpenViDial, a large-scale multi-modal dialogue dataset. The dialogue turns and visual contexts are extracted from movies and TV series, where each dialogue turn is paired with the corresponding visual context in which it takes place. OpenViDial contains a total of 1.1 million dialogue turns, and thus 1.1 million visual contexts stored in images. Based on this dataset, we propose a family of encoder-decoder models leveraging both textual and visual contexts, from coarse-grained image features extracted from CNNs to fine-grained object features extracted from Faster R-CNNs. We observe that visual information significantly improves dialogue generation qualities, verifying the necessity of integrating multi-modal features for dialogue learning. Our work marks an important step towards large-scale multi-modal dialogue learning.

翻訳日:2021-04-18 18:32:48 公開日:2020-12-30

# (参考訳) デヴァナガリ詩の言語識別

Language Identification of Devanagari Poems ( http://arxiv.org/abs/2012.15023v1 )

ライセンス: CC BY-SA 4.0

Priyankit Acharya, Aditya Ku. Pathak, Rakesh Ch. Balabantaray, and Anil Ku. Singh

(参考訳) 言語識別は、いくつかのテキスト処理パイプラインで非常に重要な部分です。この分野では広範な研究が行われている。本稿では,インドにおける10のデバナガリ言語からなる詩分析課題における詩の自動言語識別手法を提案する。 Angika, Awadhi, Braj, Bhojpuri, Chhattisgarhi, Garhwali, Haryanvi, Hindi, Magahi, Maithili。長さの異なる詩のコーパスを照合し,語彙レベルで10言語間の詩の類似性を検討した。最後に、教師付き機械学習とディープラーニング技術に基づく各種言語識別システムを適用し、評価する。

Language Identification is a very important part of several text processing pipelines. Extensive research has been done in this field. This paper proposes a procedure for automatic language identification of poems for the poem analysis task, covering 10 Devanagari-based languages of India, i.e., Angika, Awadhi, Braj, Bhojpuri, Chhattisgarhi, Garhwali, Haryanvi, Hindi, Magahi, and Maithili. We collated corpora of poems of varying length and studied the similarity of poems among the 10 languages at the lexical level. Finally, various language identification systems based on supervised machine learning and deep learning techniques are applied and evaluated.
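
The abstract does not name the lexical similarity measure that was used. One common choice, shown here purely as an assumed illustration, is the Jaccard overlap between the vocabularies of two corpora. The function name and the placeholder tokens are hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between the vocabularies of two token lists."""
    vocab_a, vocab_b = set(tokens_a), set(tokens_b)
    if not vocab_a and not vocab_b:
        return 0.0  # convention chosen for this sketch
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


# Placeholder tokens standing in for words from two poem corpora.
similarity = jaccard("w1 w2 w3".split(), "w2 w3 w4".split())
```

A high overlap between two closely related languages would suggest that word-level features alone may struggle to separate them.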

翻訳日:2021-04-18 17:37:38 公開日:2020-12-30

# (参考訳) 糖尿病管理システムを用いた眼底および舌デジタル画像処理のための機械学習技術のレビュー

A Review of Machine Learning Techniques for Applied Eye Fundus and Tongue Digital Image Processing with Diabetes Management System ( http://arxiv.org/abs/2012.15025v1 )

ライセンス: CC BY 4.0

Wei Xiang Lim, Zhiyuan Chen, Amr Ahmed, Tissa Chandesa and Iman Liao

(参考訳) 糖尿病は世界的な流行であり、警戒速度で増加している。国際糖尿病連盟(idf)は、世界規模の糖尿病患者数は48%増加し、4億2500万人(2017年)から6億2900万人(2045年)と予測している。さらに糖尿病は何百万人もの死者を招き、その数は急増している。そこで本稿では糖尿病の背景とその合併症について述べる。また,眼底および舌のデジタル画像を用いた糖尿病管理システムにおける革新的応用と過去の研究について検討した。市販の糖尿病管理システムによる既存の眼底および舌デジタル画像処理と,過去の文献による最先端の機械学習技術について概説した。本研究の目的は,糖尿病研究の概要と,この世界的な流行を解決するための新しい機械学習技術を提案することである。

Diabetes is a global epidemic and it is increasing at an alarming rate. The International Diabetes Federation (IDF) projected that the total number of people with diabetes globally may increase by 48%, from 425 million (year 2017) to 629 million (year 2045). Moreover, diabetes had caused millions of deaths and the number is increasing drastically. Therefore, this paper addresses the background of diabetes and its complications. In addition, this paper investigates innovative applications and past researches in the areas of diabetes management system with applied eye fundus and tongue digital images. Different types of existing applied eye fundus and tongue digital image processing with diabetes management systems in the market and state-of-the-art machine learning techniques from previous literature have been reviewed. The implication of this paper is to have an overview in diabetic research and what new machine learning techniques can be proposed in solving this global epidemic.

翻訳日:2021-04-18 17:29:44 公開日:2020-12-30

# (参考訳) LiDARセンサデータを用いたオンラインSVMによるインクリメンタル学習

Incremental learning with online SVMs on LiDAR sensory data ( http://arxiv.org/abs/2101.01667v1 )

ライセンス: CC BY 4.0

Le Dinh Van Khoa and Zhiyuan Chen

(参考訳) パイプラインの送電システムは、エネルギー産業において長い間存在してきた成長の側面の1つである。サービスを維持するためのパイプ内探索のコストは常に、この業界で注目を集めている。通常の探査方法(例) 磁束漏れと渦電流)は、各パイプのマイルストーンに固定されたセンサーを確立するか、パイプ内を移動するセンサーを運ぶ。大量のセンサーが備わっているため、メンテナンスプロセスは非常に困難である。解決策の1つは、感覚データ分析のための機械学習技術を実装することである。 SVMはカーネルのトリックでこの問題を解決できるが、カーネルの計算はデータサイズにも依存する。サポートベクトルの数が本当に大きくなると、プロセスが急速に誇張されるためです。特に、LiDARは極めて速い速度でスピンし、入力データの流れは最終的に大きな膨張をもたらす可能性がある。提案手法では,各サンプルを瞬時に学習し,サポートするカーネルを同時に計算する。本研究では,lidarセンサデータのみを扱うオンラインサポートベクターマシン(svms)を用いたインクリメンタル学習手法を提案する。

Pipeline transmission systems are a long-established and still growing part of the energy industry. The cost of in-pipe exploration for maintaining service always draws a lot of attention in this industry. Conventional exploration methods (e.g., magnetic flux leakage and eddy current) either install stationary sensors at each pipe milestone or carry sensors that travel inside the pipe. The large number of sensors makes the maintenance process very difficult. One solution is to apply machine learning techniques to the analysis of sensory data. Although SVMs can address this issue with the kernel trick, the cost of computing the kernel also depends on the data size, and it grows quickly when the number of support vectors becomes very large. In particular, LiDAR spins at an extremely rapid rate, and the flow of input data might eventually lead to massive expansion. In our proposed approach, each sample is learned as it arrives and the supporting kernel is computed simultaneously. In this research, an incremental learning approach with online support vector machines (SVMs) is presented, which aims to deal with LiDAR sensory data only.
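
To make the idea of learning one sample at a time with a kernel concrete, here is a minimal sketch of an online kernel perceptron. It is a simpler relative of an online SVM and is not the authors' algorithm; the class name, the RBF kernel, and the toy stream are all assumptions. It stores a sample only when the current model misclassifies it, so the stored set (the analogue of the support vectors) grows with the data stream.

```python
import math


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class OnlineKernelPerceptron:
    """Learns from one labeled sample at a time; labels are +1 or -1."""

    def __init__(self, kernel=rbf):
        self.kernel = kernel
        self.support = []  # stored (sample, label) pairs

    def decision(self, x):
        return sum(y * self.kernel(sx, x) for sx, y in self.support)

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def partial_fit(self, x, y):
        # Store the sample only if it is currently on the wrong side.
        if y * self.decision(x) <= 0:
            self.support.append((x, y))


model = OnlineKernelPerceptron()
stream = [([0.0, 0.0], -1), ([1.0, 1.0], 1), ([0.1, 0.0], -1), ([0.9, 1.0], 1)]
for point, label in stream:
    model.partial_fit(point, label)
```

Each prediction costs one kernel evaluation per stored sample, which illustrates why the abstract worries about the number of support vectors under a fast LiDAR input stream.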

翻訳日:2021-04-18 17:22:50 公開日:2020-12-30

# (参考訳) 組立予測モデルによる石油・ガス産業の設備故障解析

Equipment Failure Analysis for Oil and Gas Industry with an Ensemble Predictive Model ( http://arxiv.org/abs/2012.15030v1 )

ライセンス: CC BY 4.0

Chen ZhiYuan, Olugbenro. O. Selere and Nicholas Lu Chee Seng

(参考訳) 本稿では,smo(sequential minimal optimization)トレーニングアルゴリズムを用いた支援ベクトル機械(svm)分類器の分類精度の向上を目的として,油・ガス機器データから故障や正常インスタンスを適切に分類する。近年の故障解析では,SMOトレーニングアルゴリズムを実装せずにSVM技術を用いているが,本研究では,SMOトレーニングアルゴリズムを用いた場合,提案手法の方が優れた性能が得られることを示す。さらに、SVM分類器の性能を向上させるために、ハイブリッドルールベースとニューラルネットワーク分類器であるアンサンブルアプローチを実装した(SMOトレーニングアルゴリズムを用いて)。最適化研究は、不均衡データセットを扱う際の分類器の性能低下の結果である。選択されたベストパフォーマンス分類器は、不均衡なデータの問題を処理できる効率的なアンサンブル予測モデルを作成するスタックングアンサンブル法を用いて、SVM分類器(SMOトレーニングアルゴリズム)と組み合わせる。この予測モデルの分類性能は、SMOトレーニングアルゴリズムおよび他の多くの従来の分類器によるSVMよりもかなり優れている。

This paper aims at improving the classification accuracy of a Support Vector Machine (SVM) classifier with Sequential Minimal Optimization (SMO) training algorithm in order to properly classify failure and normal instances from oil and gas equipment data. Recent applications of failure analysis have made use of the SVM technique without implementing SMO training algorithm, while in our study we show that the proposed solution can perform much better when using the SMO training algorithm. Furthermore, we implement the ensemble approach, which is a hybrid rule based and neural network classifier to improve the performance of the SVM classifier (with SMO training algorithm). The optimization study is as a result of the underperformance of the classifier when dealing with imbalanced dataset. The selected best performing classifiers are combined together with SVM classifier (with SMO training algorithm) by using the stacking ensemble method which is to create an efficient ensemble predictive model that can handle the issue of imbalanced data. The classification performance of this predictive model is considerably better than the SVM with and without SMO training algorithm and many other conventional classifiers.

翻訳日:2021-04-18 17:15:37 公開日:2020-12-30

# (参考訳) サポートベクターマシンによる故障の教師なしリアルタイム予測

Unsupervised Real Time Prediction of Faults Using the Support Vector Machine ( http://arxiv.org/abs/2012.15032v1 )

ライセンス: CC BY 4.0

Zhiyuan Chen, Isa Dino and Nik Ahmad Akram

翻訳日:2021-04-18 17:05:24 公開日:2020-12-30

# (参考訳) オープンドメインQAにおける音声説明と視覚的説明の人間による評価

Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA ( http://arxiv.org/abs/2012.15075v1 )

ライセンス: CC BY-SA 4.0

Ana Valeria Gonzalez, Gagan Bansal, Angela Fan, Robin Jia, Yashar Mehdad and Srinivasan Iyer

(参考訳) オープンドメインQAシステム(ODQA)のユーザへの予測についての説明研究が盛んに行われているが,説明がユーザ信頼を高める程度の評価には至っていない。 ODQAは音声アシスタントにおいて最もユビキタスであるが、現在の研究はビジュアルディスプレイを用いた説明のみを評価し、他のモダリティに対する最もパフォーマンスの高い説明に関する結論を誤って外挿する可能性がある。これらの問題を緩和するために、odqaシステムの回答をいつ受け入れるかをユーザーが正確に判断するのに役立つ説明を計測するユーザー調査を行う。従来の作業とは異なり、説明モダリティ(例えば、音声またはビジュアルインターフェースを介してユーザと通信されるか、モダリティ間のコントラスト効果か)を制御する。その結果,得られた証拠文から導かれた説明は,モダリティにまたがる強いベースライン(信頼度)を上回ることができるが,実際にモダリティによって変化する最良の説明戦略であることがわかった。我々は,現在の説明に共通する障害事例を示し,説明のエンドツーエンド評価を強調し,デプロイと異なるプロキシモダリティで評価することを警告する。

While research on explaining predictions of open-domain QA systems (ODQA) to users is gaining momentum, most works have failed to evaluate the extent to which explanations improve user trust. While a few works evaluate explanations using user studies, they employ settings that may deviate from the end-user's usage in the wild: ODQA is most ubiquitous in voice assistants, yet current research only evaluates explanations using a visual display, and may erroneously extrapolate conclusions about the most performant explanations to other modalities. To alleviate these issues, we conduct user studies that measure whether explanations help users correctly decide when to accept or reject an ODQA system's answer. Unlike prior work, we control for explanation modality, e.g., whether they are communicated to users through a spoken or visual interface, and contrast effectiveness across modalities. Our results show that explanations derived from retrieved evidence passages can outperform strong baselines (calibrated confidence) across modalities but the best explanation strategy in fact changes with the modality. We show common failure cases of current explanations, emphasize end-to-end evaluation of explanations, and caution against evaluating them in proxy modalities that are different from deployment.

翻訳日:2021-04-18 16:38:28 公開日:2020-12-30

# (参考訳) sindhiのためのサブワード誘導ニューラルワードセグメンテーションモデル

A Subword Guided Neural Word Segmentation Model for Sindhi ( http://arxiv.org/abs/2012.15079v1 )

ライセンス: CC BY 4.0

Wazir Ali, Jay Kumar, Zenglin Xu, Congjian Luo, Junyu Lu, Junming Shao, Rajesh Kumar, and Yazhou Ren

(参考訳) ディープニューラルネットワークは、自然言語処理(nlp)における手動特徴工学の負担を軽減するために、テキスト表現の学習に複数の処理層を用いる。このようなテキスト表現はラベルのないデータから特徴を抽出するために広く使われている。セグメンテーションという言葉は多くの言語にとって基本的かつ必然的な前提条件である。 Sindhiはリソース不足の言語であり、空間欠落、空間挿入の問題、セグメンテーションのためのラベル付きコーパスがないため、セグメンテーションは困難である。本稿では,Syndhi のための Subword Guided Neural Word Segmenter (SGNWS) を用いたラベル付きデータを用いた教師付き Sindhi Word Segmentation (SWS) について検討する。テキスト表現を学習するために,2方向長短項記憶(BiLSTM),自己注意機構,条件付きランダムフィールド(CRF)を活用する形態素レベルで単語情報をキャプチャするために,サブワード表現を繰り返しニューラルネットワークに組み込む。提案したSGNWSモデルは機能工学に頼らずに98.51%のF1値を達成する。実験の結果,既存のsindhi単語セグメンタよりも,提案モデルの利点が示された。

Deep neural networks employ multiple processing layers for learning text representations to alleviate the burden of manual feature engineering in Natural Language Processing (NLP). Such text representations are widely used to extract features from unlabeled data. The word segmentation is a fundamental and inevitable prerequisite for many languages. Sindhi is an under-resourced language, whose segmentation is challenging as it exhibits space omission, space insertion issues, and lacks the labeled corpus for segmentation. In this paper, we investigate supervised Sindhi Word Segmentation (SWS) using unlabeled data with a Subword Guided Neural Word Segmenter (SGNWS) for Sindhi. In order to learn text representations, we incorporate subword representations to recurrent neural architecture to capture word information at morphemic-level, which takes advantage of Bidirectional Long-Short Term Memory (BiLSTM), self-attention mechanism, and Conditional Random Field (CRF). Our proposed SGNWS model achieves an F1 value of 98.51% without relying on feature engineering. The empirical results demonstrate the benefits of the proposed model over the existing Sindhi word segmenters.

翻訳日:2021-04-18 16:19:02 公開日:2020-12-30

# (参考訳) 機械学習予測の説明--運用プロセスへの適用のための必須ステップ

Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes ( http://arxiv.org/abs/2012.15103v1 )

ライセンス: CC BY 4.0

Giorgio Visani, Federico Chesani, Enrico Bagli, Davide Capuzzo and Alessandro Poluzzi

(参考訳) 世界経済では、信用会社は金融業者としての活動を通じて、経済発展において中心的な役割を果たす。この重要なタスクにはいくつかの欠点があり、主に債務者が提供されたクレジットを返済できないリスクがある。したがって、信用リスクモデリング(crm)、すなわち債務者が債務を返済しない確率の評価が最重要役割を担っている。統計的なアプローチは長年にわたってうまく活用され、CRMの最もよく使われる方法となった。近年,CRMタスクに機械学習およびディープラーニング技術が適用され,予測品質と性能が著しく向上している。しかし、そのような手法は、通常、彼らが生み出すスコアについて信頼できる説明を与えない。その結果、多くの機械・深層学習技術は、例えばGDPRのような西洋諸国の規制に従わない。本稿では, LIME(Local Interpretable Model-Agnostic Explanations)技術を用いて, この分野における説明可能性問題に対処し, 実際の信用リスクデータセットへの採用を示し, 最終的にその健全性とタスクの採用とコンプライアンスを保証するために必要な改善について議論する。

In the global economy, credit companies play a central role in economic development, through their activity as money lenders. This important task comes with some drawbacks, mainly the risk of the debtors not being able to repay the provided credit. Therefore, Credit Risk Modelling (CRM), namely the evaluation of the probability that a debtor will not repay the due amount, plays a paramount role. Statistical approaches have been successfully exploited for a long time, becoming the most used methods for CRM. Recently, machine and deep learning techniques have also been applied to the CRM task, showing an important increase in prediction quality and performance. However, such techniques usually do not provide reliable explanations for the scores they come up with. As a consequence, many machine and deep learning techniques fail to comply with western countries' regulations such as, for example, GDPR. In this paper we suggest using the LIME (Local Interpretable Model-agnostic Explanations) technique to tackle the explainability problem in this field; we show its application to a real credit-risk dataset and finally discuss its soundness and the improvements necessary to guarantee its adoption and compliance with the task.

翻訳日:2021-04-18 15:13:02 公開日:2020-12-30

# (参考訳) Dual-Camera Compressive Hyperspectral Imaging による高速ハイパースペクトル画像再生

Fast Hyperspectral Image Recovery via Non-iterative Fusion of Dual-Camera Compressive Hyperspectral Imaging ( http://arxiv.org/abs/2012.15104v1 )

ライセンス: CC BY 4.0

Wei He, Naoto Yokoya, and Xin Yuan

(参考訳) Coded aperture snapshot spectral imaging (CASSI) は、1つの符号化された2次元(2D)計測を用いて3次元ハイパースペクトル画像(HSI)をキャプチャする有望な手法である。不良設定問題であるため、様々な正規化器を用いて2次元計測から3次元データを再構成している。残念ながら、精度と計算の複雑さは満足できない。 1つの実現可能な解決策は、CASSIにおけるRGB測定などの追加情報を活用することである。本稿では, CASSI と RGB の組合せを考慮し, HSI 再構成のための新しい融合モデルを提案する。スペクトル基底と空間係数からなるHSIのスペクトル低ランク特性について検討した。具体的には、RGB測定を用いて係数を推定し、CASSI測定は直交スペクトル基底を提供する。さらに,HSIのスペクトル低ランク特性を向上させるパッチ処理戦略を提案する。提案したモデルは、非局所的な処理やイテレーション、RGB検出器のスペクトル検出行列を必要としない。シミュレーションおよび実HSIデータセットの大規模な実験により,提案手法は品質面で従来の最先端手法を上回るだけでなく,再構成を5000倍以上高速化することを示す。

Coded aperture snapshot spectral imaging (CASSI) is a promising technique to capture the three-dimensional hyperspectral image (HSI) using a single coded two-dimensional (2D) measurement, in which algorithms are used to solve the inverse problem. Due to the ill-posed nature of this problem, various regularizers have been exploited to reconstruct the 3D data from the 2D measurement. Unfortunately, the accuracy and computational complexity are unsatisfactory. One feasible solution is to utilize additional information such as the RGB measurement in CASSI. Considering the combined CASSI and RGB measurement, in this paper, we propose a new fusion model for the HSI reconstruction. We investigate the spectral low-rank property of HSI composed of a spectral basis and spatial coefficients. Specifically, the RGB measurement is utilized to estimate the coefficients, while the CASSI measurement is adopted to provide the orthogonal spectral basis. We further propose a patch processing strategy to enhance the spectral low-rank property of HSI. The proposed model requires neither non-local processing nor iteration, nor the spectral sensing matrix of the RGB detector. Extensive experiments on both simulated and real HSI datasets demonstrate that our proposed method not only outperforms the previous state of the art in quality but also speeds up the reconstruction by more than 5000 times.

翻訳日:2021-04-18 15:04:47 公開日:2020-12-30

# (参考訳) DUT-LF Saliency:Versatile DatasetとLight Field-to-RGB Saliency Detection

DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection ( http://arxiv.org/abs/2012.15124v1 )

ライセンス: CC BY 4.0

Yongri Piao and Zhengkun Rong and Shuang Xu and Miao Zhang and Huchuan Lu

(参考訳) ライトフィールドデータは、顕著性検出に好適な特性を示す。学習に基づくライトフィールド顕著性検出の成功は、モデルのより汎用性を高めるための包括的なデータセットの構築方法、高次元ライトフィールドデータの有効活用方法、デスクトップコンピュータやモバイルデバイスの汎用性を達成するためのフレキシブルモデルの設計方法に大きく依存している。これらの質問に答えるために、まず、RGB、RGB-D、およびライトフィールドの顕著性検出の汎用アプリケーションを可能にする大規模データセットを導入し、102のクラスと4204のサンプルを含む。次に,FocalストリームとRGBストリームからなる非対称な2ストリームモデルを提案する。 Focalストリームは、デスクトップコンピュータ上でより高いパフォーマンスを実現し、フォーカスネスの知識をRGBストリームに転送するように設計されている。 RGBストリームは3つの蒸留方式を通じてモバイルデバイスの柔軟性とメモリ/計算効率を保証する。実験は、我々のFocalストリームが最先端のパフォーマンスを達成することを実証する。 RGBストリームは DUTLF-V2 上で Top-2 の F-measure を達成し、モデルサイズを 83% 削減し、最も性能の高い手法と比較して FPS を 5 倍向上させる。さらに,提案する蒸留スキームはRGB顕著性モデルに適用でき,柔軟性を確保しつつ優れた性能を実現する。

Light field data exhibit favorable characteristics conducive to saliency detection. The success of learning-based light field saliency detection is heavily dependent on how a comprehensive dataset can be constructed for higher generalizability of models, how high dimensional light field data can be effectively exploited, and how a flexible model can be designed to achieve versatility for desktop computers and mobile devices. To answer these questions, first we introduce a large-scale dataset to enable versatile applications for RGB, RGB-D and light field saliency detection, containing 102 classes and 4204 samples. Second, we present an asymmetrical two-stream model consisting of the Focal stream and RGB stream. The Focal stream is designed to achieve higher performance on desktop computers and transfer focusness knowledge to the RGB stream, relying on two tailor-made modules. The RGB stream guarantees the flexibility and memory/computation efficiency on mobile devices through three distillation schemes. Experiments demonstrate that our Focal stream achieves state-of-the-art performance. The RGB stream achieves Top-2 F-measure on DUTLF-V2 while reducing the model size by 83% and boosting FPS by 5 times, compared with the best performing method. Furthermore, our proposed distillation schemes are applicable to RGB saliency models, achieving impressive performance gains while ensuring flexibility.

翻訳日:2021-04-18 14:43:39 公開日:2020-12-30

# (参考訳) 古代ゲームにおける現代技術

Modern Techniques for Ancient Games ( http://arxiv.org/abs/2101.10066v1 )

ライセンス: CC BY 4.0

Cameron Browne

(参考訳) ゲームは、共有された文化的過去と人間の文明の発展に関する豊富な知識を提供する可能性があるが、初期のゲームに対する私たちの理解は不完全であり、しばしば信頼できない再建に基づいている。本稿では,現在進行中の5年間の研究プロジェクトであるDigital Ludemeプロジェクトについて述べる。

Games potentially provide a wealth of knowledge about our shared cultural past and the development of human civilisation, but our understanding of early games is incomplete and often based on unreliable reconstructions. This paper describes the Digital Ludeme Project, a five-year research project currently underway that aims to address such issues using modern computational techniques.

翻訳日:2021-04-18 13:51:43 公開日:2020-12-30

# (参考訳) JPEG圧縮に対するロバストデータハイディングに向けて:擬似微分型ディープラーニングアプローチ

Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach ( http://arxiv.org/abs/2101.00973v1 )

ライセンス: CC BY 4.0

Chaoning Zhang, Adil Karjauv, Philipp Benz, In So Kweon

(参考訳) データ隠蔽は、認証と所有権を保護するために広く使われているアプローチである。画像やビデオのようなほとんどのマルチメディアコンテンツは圧縮形式で送信または保存される。 JPEGのようなこのような損失の多い圧縮は、隠れたデータを破壊する可能性があるため、堅牢なデータ隠蔽の必要性が高まる。これらの圧縮に対抗できるデータ隠蔽の目標を達成することは、依然としてオープンな課題である。近年、ディープラーニングはデータの隠蔽に大きな成功を収めている一方、jpegの非微分性は、損失のある圧縮に対する堅牢性を改善するために深いパイプラインを訓練することが難しくなっている。既存のSOTAアプローチは、同様の操作を行う異なるモジュールで、微分不可能な部分を置き換える。 a) 大規模なエンジニアリング努力; (b) 圧縮攻撃のホワイトボックス知識を必要とする; (c) jpegのような単純な圧縮でのみ機能する。本研究では,上記の全ての制限に同時に対処するための,シンプルで効果的なアプローチを提案する。 JPEG以外にも、さまざまな画像やビデオの損失圧縮アルゴリズムに対する堅牢性を向上する手法が示されている。

Data hiding is a widely used approach for protecting authentication and ownership. Most multimedia content, such as images and videos, is transmitted or saved in compressed form. Lossy compression such as JPEG can destroy the hidden data, which raises the need for robust data hiding. Achieving data hiding that is robust against such compression remains an open challenge. Recently, deep learning has shown great success in data hiding, but the non-differentiability of JPEG makes it challenging to train a deep pipeline for improving robustness against lossy compression. The existing SOTA approaches replace the non-differentiable parts with differentiable modules that perform similar operations. These approaches have multiple limitations: (a) they require a large engineering effort; (b) they require white-box knowledge of the compression attack; (c) they only work for simple compression such as JPEG. In this work, we propose a simple yet effective approach that addresses all of the above limitations at once. Beyond JPEG, our approach has been shown to improve robustness against various lossy image and video compression algorithms.

翻訳日:2021-04-18 13:39:14 公開日:2020-12-30

# (参考訳) 構文を意識したローカル注意によるbertの改善

Improving BERT with Syntax-aware Local Attention ( http://arxiv.org/abs/2012.15150v1 )

ライセンス: CC BY 4.0

Zhongli Li, Qingyu Zhou, Chao Li, Ke Xu, Yunbo Cao

(参考訳) BERTのような、トレーニング済みのTransformerベースのニューラルネットワークモデルは、さまざまなNLPタスクにおいて顕著な成果を上げている。近年の研究では、注意に基づくモデルが地域に対するより集中的な注意の恩恵を受けることが示された。その多くは、線形スパン内の注意範囲を制限するか、機械翻訳や質問応答のような特定のタスクに限定する。本稿では,構文構造における距離に基づいて注意範囲を制限した構文認識型局所的注意を提案する。提案した構文認識ローカルアテンションは、BERTのような事前訓練された言語モデルと統合して、構文的に関連する単語にフォーカスするためにモデルをレンダリングすることができる。文分類やシーケンスラベリングタスクなど,シングルセンテンスベンチマークの各種実験を行った。実験結果は、すべてのベンチマークデータセット上でBERTよりも一貫した利得を示している。本研究は,構文的に関連した単語に注目が集まることにより,より優れた性能が得られることを示す。

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a variety of NLP tasks. Recent work has shown that attention-based models can benefit from more focused attention over local regions. Most of these approaches restrict the attention scope to a linear span or are confined to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, where the attention scopes are restricted based on distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pretrained language models, such as BERT, so that the model focuses on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.

翻訳日:2021-04-18 13:32:12 公開日:2020-12-30

# (参考訳) 時間移動可能な摂動:オンライン・ビジュアル・オブジェクト・トラッカに対する効率的な一発攻撃

Temporally-Transferable Perturbations: Efficient, One-Shot Adversarial Attacks for Online Visual Object Trackers ( http://arxiv.org/abs/2012.15183v1 )

ライセンス: CC BY 4.0

Krishna Kanth Nakka and Mathieu Salzmann

(参考訳) 近年、シアムネットワークに基づくトラッカーは、視覚オブジェクト追跡(vot)に非常に効果的で効率的なものとなっている。これらの手法は、視覚認識タスクのための多くのディープネットワークと同様に、敵の攻撃に対して脆弱であることが示されているが、既存のVOTトラッカーに対する攻撃は全ての入力フレームの探索領域の摂動を必要とする。本稿では,オブジェクトテンプレート画像のみから,時間移動可能な1つの逆摂動を生成するフレームワークを提案する。この混乱はすべての検索画像に追加され、事実上コストがかからず、トラッカーを騙すことに成功した。実験により,本手法は,未目標シナリオにおける標準VOTベンチマークに対する最先端攻撃よりも優れていることが示された。さらに,我々のフォーマリズムは,様々な方向の摂動をプリ計算することにより,トラッカーが任意の軌道に追従することを強制する攻撃に自然に及んでいることを示す。

In recent years, trackers based on Siamese networks have emerged as highly effective and efficient for visual object tracking (VOT). While these methods, like most deep networks for visual recognition tasks, have been shown to be vulnerable to adversarial attacks, the existing attacks on VOT trackers all require perturbing the search region of every input frame to be effective, which comes at a non-negligible cost, considering that VOT is a real-time task. In this paper, we propose a framework to generate a single temporally transferable adversarial perturbation from the object template image only. This perturbation can then be added to every search image at virtually no cost and still successfully fools the tracker. Our experiments show that our approach outperforms the state-of-the-art attacks on the standard VOT benchmarks in the untargeted scenario. Furthermore, we show that our formalism naturally extends to targeted attacks that force the tracker to follow any given trajectory by precomputing diverse directional perturbations.

翻訳日:2021-04-18 13:11:49 公開日:2020-12-30

# (参考訳) 科学論文のインパクト予測の簡易化

Simplifying Impact Prediction for Scientific Articles ( http://arxiv.org/abs/2012.15192v1 )

ライセンス: CC BY 4.0

Thanasis Vergoulis, Ilias Kanellos, Giorgos Giannopoulos, Theodore Dalamagas

(参考訳) 記事の期待される影響を見積もることは、さまざまなアプリケーション(例えば、記事/コオペレータ推奨)に有用である。既存のほとんどのアプローチは、各記事が近い将来受ける引用の正確な数を予測しようとするが、これは難しい回帰分析問題である。さらに、ほとんどのアプローチは、多数の記事に対して適切に満たせない要件である、各記事に対する豊富なメタデータの存在に依存しています。本研究では,より単純な機械学習問題を解くこと,期待される影響に基づく記事の分類が現実の多くのアプリケーションに十分であるという事実を活用し,最小限の記事メタデータを用いて学習可能な簡易モデルを提案する。最後に, このモデルの様々な構成について検討し, 上記の分類問題を解く上での有効性を評価する。

Estimating the expected impact of an article is valuable for various applications (e.g., article/cooperator recommendation). Most existing approaches attempt to predict the exact number of citations each article will receive in the near future; however, this is a difficult regression analysis problem. Moreover, most approaches rely on the existence of rich metadata for each article, a requirement that cannot be adequately fulfilled for a large number of articles. In this work, we take advantage of the fact that solving a simpler machine learning problem, that of classifying articles based on their expected impact, is adequate for many real-world applications, and we propose a simplified model that can be trained using minimal article metadata. Finally, we examine various configurations of this model and evaluate their effectiveness in solving the aforementioned classification problem.

翻訳日:2021-04-18 12:56:22 公開日:2020-12-30

# (参考訳) Crossover-SGD: 分散ディープラーニングにおけるゴシップベース通信による大規模ミニバッチ問題の緩和とスケーラビリティ向上

Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability ( http://arxiv.org/abs/2012.15198v1 )

ライセンス: CC BY 4.0

Sangho Yeo, Minho Bae, Minjoong Jeong, Oh-kyoung Kwon, Sangyoon Oh

(参考訳) 分散ディープラーニングは、大規模なデータセットと複雑なモデルのためのディープラーニングのトレーニング時間を短縮する効果的な方法である。しかし、ネットワークオーバーヘッドによるスケーラビリティの制限により、すべてのワーカーのパラメータの同期が困難になる。この問題を解決するため, 作業者数に関係なく, 安定したスケーラビリティを示すゴシップ方式が提案されている。しかし、一般的にゴシップ方式を使用するには、大規模なミニバッチの検証精度を検証する必要がある。そこで本研究では,まず,大規模ミニバッチ問題におけるゴシップ法の特性を実証的に検討し,バッチサイズ数の増加とワーカ数の増加に対して,allreduce-sgd(stochasticgradient descent)よりも高い検証精度を維持できることを確認した。しかし,gossipに基づくモデルの遅延パラメータ伝搬は,大規模ノードスケールでの検証精度を低下させる。この問題に対処するため,重みパラメータの遅延伝搬を,セグメントワイド通信と負荷分散ランダムネットワークトポロジにより緩和するクロスオーバーSGDを提案する。また,ゴシップに基づくコミュニケーション手法における労働者数を制限するため,階層的なコミュニケーションも行う。提案手法の有効性を検証するため,我々は実験実験を行い,我々のクロスオーバーSGDがSGP(Stochastic Gradient Push)よりも高いノードスケーラビリティを示した。

Distributed deep learning is an effective way to reduce the training time of deep learning for large datasets as well as complex models. However, the limited scalability caused by network overheads makes it difficult to synchronize the parameters of all workers. To resolve this problem, gossip-based methods that demonstrate stable scalability regardless of the number of workers have been proposed. However, to use gossip-based methods in general cases, their validation accuracy for large mini-batches needs to be verified. To this end, we first empirically study the characteristics of gossip methods in the large mini-batch problem and observe that they preserve higher validation accuracy than AllReduce-SGD (Stochastic Gradient Descent) when the batch size is increased and the number of workers is fixed. However, the delayed parameter propagation of the gossip-based models decreases validation accuracy at large node scales. To cope with this problem, we propose Crossover-SGD, which alleviates the delayed propagation of weight parameters via segment-wise communication and a load-balancing random network topology. We also adapt hierarchical communication to limit the number of workers in gossip-based communication methods. To validate the effectiveness of our proposed method, we conduct empirical experiments and observe that our Crossover-SGD shows higher node scalability than SGP (Stochastic Gradient Push).

翻訳日:2021-04-18 12:39:33 公開日:2020-12-30

# (参考訳) 無線センサネットワークにおけるエネルギー効率の最適化のための学習

Learning to Optimize Energy Efficiency in Energy Harvesting Wireless Sensor Networks ( http://arxiv.org/abs/2012.15203v1 )

ライセンス: CC BY 4.0

Debamita Ghosh and Manjesh K. Hanawal and Nikola Zlatanov

(参考訳) エネルギー効率の最大化を目的とした,エネルギー源による複数のエネルギー収穫ノードへの無線電力伝送について検討した。ソースは各タイムスロット内の利用可能な電力レベルのいずれかを使用してノードにエネルギーを送信し、ノードは収穫したエネルギーを使用してエネルギー源に情報を送信する。ソースはチャネルの状態情報を持っておらず、与えられたノードから受信したコードワードがうまくデコードされたかどうかのみを判断する。この限られた情報により、ソースはネットワークのエネルギー効率を最大化する最適な電力レベルを学ぶ必要がある。この問題を確率的多元帯域問題としてモデル化し,エネルギー効率を最大化するエネルギー源の最適送信電力を学習する上信頼境界に基づくアルゴリズムを開発した。数値結果は,提案アルゴリズムの性能保証を検証し,ベンチマーク手法と比較して有意な向上を示した。

We study wireless power transmission by an energy source to multiple energy harvesting nodes with the aim of maximizing the energy efficiency. The source transmits energy to the nodes using one of the available power levels in each time slot, and the nodes transmit information back to the energy source using the harvested energy. The source does not have any channel state information; it only knows whether a received codeword from a given node was successfully decoded or not. With this limited information, the source has to learn the optimal power level that maximizes the energy efficiency of the network. We model the problem as a stochastic multi-armed bandit problem and develop an Upper Confidence Bound based algorithm, which learns the optimal transmit power of the energy source that maximizes the energy efficiency. Numerical results validate the performance guarantees of the proposed algorithm and show significant gains compared to the benchmark schemes.
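As a rough illustration of the bandit formulation in this abstract, the sketch below runs the generic UCB1 rule over a few transmit power levels using only one-bit decoding feedback. The power levels, the decoding-success probabilities, and the reward (decoding success divided by transmit power, rescaled to [0, 1]) are hypothetical placeholders chosen for the example; this is not the paper's algorithm or system model.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Pick an arm with the UCB1 rule at round t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before trusting the bounds
            return arm
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb(power_levels, success_prob, rounds, seed=0):
    """Simulate a source learning which power level to use.

    The reward is a toy energy-efficiency proxy (an assumption of this
    sketch): 1 if the codeword is decoded, divided by the transmit power
    and rescaled so that it lies in [0, 1].
    """
    rng = random.Random(seed)
    counts = [0] * len(power_levels)
    means = [0.0] * len(power_levels)
    min_power = min(power_levels)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        decoded = rng.random() < success_prob[arm]  # one-bit feedback only
        reward = (min_power / power_levels[arm]) if decoded else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means
```

With hypothetical levels `[1.0, 2.0, 4.0]` and success probabilities `[0.1, 0.9, 0.95]`, the middle level has the highest expected reward, so over many rounds it should collect most of the plays.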

翻訳日:2021-04-18 12:18:59 公開日:2020-12-30

# (参考訳) 構造プローブにおける直交制約の導入

Introducing Orthogonal Constraint in Structural Probes ( http://arxiv.org/abs/2012.15228v1 )

ライセンス: CC BY-SA 4.0

Tomasz Limisiewicz and David Mareček

(参考訳) NLPにおける事前訓練モデルの成功により、表現の解釈に重点が置かれた。最も顕著なアプローチの1つは構造的プロッピング(hewitt and manning, 2019)で、言語構造のトポロジーを近似するために言語ベクトル空間の線形射影が実行される。本研究では、この写像を 1 つの同型空間回転に分解する; 2. 最も関係のある方向を特定してスケールする線形スケーリング。埋め込みに隠された情報をアンタングルする手法の能力を検証するための新しい構造的タスクを導入する。提案手法がマルチタスク環境で実行可能であることを実験的に示す。さらに、直交制約は、特定の言語特徴をコードする埋め込み部分空間を識別し、プローブを暗記に対する脆弱さを減らす。

With the recent success of pre-trained models in NLP, significant focus has been put on interpreting their representations. One of the most prominent approaches is structural probing (Hewitt and Manning, 2019), where a linear projection of the language vector space is performed in order to approximate the topology of linguistic structures. In this work, we decompose this mapping into (1) an isomorphic space rotation and (2) a linear scaling that identifies and scales the most relevant directions. We introduce novel structural tasks to examine our method's ability to disentangle information hidden in the embeddings. We experimentally show that our approach can be performed in a multitask setting. Moreover, the orthogonal constraint identifies embedding subspaces encoding specific linguistic features and makes the probe less vulnerable to memorization.
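The decomposition described above can be pictured with a small numerical sketch: a probe distance computed as the squared norm of a rotated and then per-dimension scaled difference of two embeddings. The function names, the random rotation obtained from a QR factorization, and the toy dimensions are assumptions made for illustration; how the rotation and scaling are actually trained is not covered here.

```python
import numpy as np


def random_orthogonal(dim, seed=0):
    """Return a random orthogonal matrix (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def probe_distance(h_i, h_j, rotation, scaling):
    """Squared distance between two embeddings after rotation and scaling.

    rotation: (dim, dim) orthogonal matrix.
    scaling:  (dim,) vector of per-direction weights.
    """
    projected = scaling * (rotation @ (np.asarray(h_i) - np.asarray(h_j)))
    return float(projected @ projected)
```

With all scaling weights equal to one, the rotation alone leaves distances unchanged; setting some weights to zero restricts the distance to a subspace of the rotated embedding, which is the intuition behind reading off task-specific directions from the scaling vector.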

翻訳日:2021-04-18 12:08:24 公開日:2020-12-30

# (参考訳) 不均衡データセット最適化のための新しい再サンプリング手法

A Novel Resampling Technique for Imbalanced Dataset Optimization ( http://arxiv.org/abs/2012.15231v1 )

ライセンス: CC BY 4.0

Ivan Letteri, Antonio Di Cecco, Abeer Dyoub, Giuseppe Della Penna

(参考訳) 膨大な量のデータにもかかわらず、特定の関心のある出来事は依然として極めて稀である。まれな事象の分類は、不正取引、マルウェアのトラフィック分析、ネットワーク侵入検出など、多くのドメインで一般的な問題である。さまざまなデータセットに対する機械学習アプローチを用いたマルウェア検出のための多くの研究が開発されているが、MTA-KDD'19データセットのみが、日々の悪意のあるトラフィックの代表セットを更新する特質を持っている。この日次更新はデータセットの追加値であるが、rrw最適化mta-kdd'19のクラス不均衡問題のために潜在的な可能性がある。実際のデータセットにおけるクラス分散の難しさを,safe,borderline,realy,outlierの4種類のマイノリティクラス例から把握する。本研究では,クラス不均衡問題に対する1-Nearest Neighbour(G1Nos)オーバーサンプリングアルゴリズムの2つのバージョンを開発した。 G1Nosアルゴリズムの最初のモジュールは、Im Balance Degreeの臨界しきい値を特定する係数ベースのインスタンス選択シルエットを実行する。 (ID)2番目のモジュールはSMOTEライクなオーバーサンプリングアルゴリズムを用いて合成サンプルを生成する。クラスのバランシングは、使用済みデータセットの2つのクラス間の比率を再確立するために、G1Nosアルゴリズムによって行われます。実験結果から, オーバーサンプリングアルゴリズムは他の2つのSOTA手法よりも有効であることがわかった。

Despite the enormous amount of data, particular events of interest can still be quite rare. Classification of rare events is a common problem in many domains, such as fraudulent transactions, malware traffic analysis and network intrusion detection. Many studies have addressed malware detection using machine learning approaches on various datasets, but as far as we know only the MTA-KDD'19 dataset has the peculiarity of updating the representative set of malicious traffic on a daily basis. This daily updating is the added value of the dataset, but it also means that the RRw-Optimized MTA-KDD'19 can potentially suffer from the class imbalance problem. We capture the difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. In this work, we develop two versions of the Generative Silhouette Resampling 1-Nearest Neighbour (G1Nos) oversampling algorithm for dealing with the class imbalance problem. The first module of the G1Nos algorithms performs a silhouette-coefficient-based instance selection, identifying the critical threshold of the Imbalance Degree (ID); the second module generates synthetic samples using a SMOTE-like oversampling algorithm. Our G1Nos algorithms balance the classes by re-establishing the proportions between the two classes of the dataset used. The experimental results show that our oversampling algorithms work better than two other SOTA methodologies on all the metrics considered.
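To make the "SMOTE-like" step concrete, here is a minimal sketch of interpolation-based oversampling that uses a single nearest minority neighbour per point. It is a simplified stand-in, not the G1Nos algorithm: the silhouette-based selection module is omitted, and the function name and parameters are hypothetical.

```python
import numpy as np


def smote_like_oversample(minority, n_new, seed=0):
    """Create n_new synthetic minority samples by 1-nearest-neighbour interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and its nearest minority neighbour.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise squared distances; the diagonal is masked so that a point
    # is never selected as its own neighbour.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    base = rng.integers(0, len(minority), size=n_new)
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return minority[base] + gap * (minority[nearest[base]] - minority[base])
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the bounding box of the original minority class.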

翻訳日:2021-04-18 11:58:45 公開日:2020-12-30

# (参考訳) AI開発レースは異種ネットワークで仲介できる

AI Development Race Can Be Mediated on Heterogeneous Networks ( http://arxiv.org/abs/2012.15234v1 )

ライセンス: CC BY 4.0

Theodor Cimpeanu, Francisco C. Santos, Luis Moniz Pereira, Tom Lenaerts and The Anh Han

(参考訳) 人工知能(AI)の分野は、研究、ビジネス、および政策に一定のレベルの不安をもたらしている。緊張はAIレースの物語によってさらに高まっており、多くの利害関係者が行方不明になることを恐れている。現実であろうとなかろうと、この物語に対する信念は有害であり、一部の利害関係者は安全上の予防や社会的結果を無視しなければならないと感じている。混合した世界での理想化された技術競争を記述したゲーム理論モデルから始め、人種間の相互作用構造の違いが集団的選択や規制行動の要件をいかに変えられるかを検討する。その結果、参加者がつながりや相互影響(例えば、スケールフリーネットワークが当事者間の相互作用を形作る場合)で強い多様性を表わすと、均質な設定に存在する衝突が著しく減少し、規制措置の必要性が低下することが示された。さらに, 技術ガバナンスと規制は, 企業や国家間の特許の不均一性と不平等から利益を得られ, 倫理的かつ持続的なaiの利用に向けて, 少数の参加者に対して, 細心の注意を払って介入を行うことが期待できる。

The field of Artificial Intelligence (AI) has been introducing a certain level of anxiety in research, business and policy. Tensions are further heightened by an AI race narrative, which makes many stakeholders fear that they might be missing out. Whether real or not, a belief in this narrative may be detrimental, as some stakeholders will feel obliged to cut corners on safety precautions or ignore societal consequences. Starting from a game-theoretical model describing an idealised technology race in a well-mixed world, we investigate how different interaction structures among race participants can alter collective choices and requirements for regulatory actions. Our findings indicate that, when participants exhibit strong diversity in terms of connections and peer influence (e.g., when scale-free networks shape interactions among parties), the conflicts that exist in homogeneous settings are significantly reduced, thereby lessening the need for regulatory actions. Furthermore, our results suggest that technology governance and regulation may profit from the world's patent heterogeneity and inequality among firms and nations to design and implement meticulous interventions on a minority of participants capable of influencing an entire population towards an ethical and sustainable use of AI.

翻訳日:2021-04-18 11:42:41 公開日:2020-12-30

# (参考訳) Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation

Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation ( http://arxiv.org/abs/2012.15244v1 )

ライセンス: CC BY 4.0

Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, Pål Halvorsen

(参考訳) 大腸癌は世界で3番目に多いがんの原因である。 Global Cancer Statistics 2018によると、開発途上国と先進国の両方で大腸癌の発生が増加している。ポリープなどの大腸異常の早期発見はがん予防に重要であり, 自動ポリープセグメンテーションが重要な役割を担っている。最近の早期発見と治療の進歩にかかわらず、推定されたポリプミス率は20%である。自動診断システムによるサポートは、見落とされたポリープの潜在的な解決策の1つである可能性がある。このような検出システムは、低コストな設計ソリューションを助け、医師の時間を節約することができる。本稿では,2020 Medico challengeを紹介し,関連する作業とデータセットに関する情報を提供し,課題と評価指標を説明し,Medico challengeの編成の必要性について論じる。

Colorectal cancer is the third most common cause of cancer worldwide. According to Global Cancer Statistics 2018, the incidence of colorectal cancer is increasing in both developing and developed countries. Early detection of colon anomalies such as polyps is important for cancer prevention, and automatic polyp segmentation can play a crucial role in this. Despite recent advances in early detection and treatment options, the estimated polyp miss rate is still around 20%. Support via an automated computer-aided diagnosis system could be one potential solution for overlooked polyps. Such detection systems can help provide low-cost design solutions and save doctors time, which they could, for example, use to perform more patient examinations. In this paper, we introduce the 2020 Medico challenge, provide some information on related work and the dataset, describe the task and evaluation metrics, and discuss the necessity of organizing the Medico challenge.

翻訳日:2021-04-18 11:41:40 公開日:2020-12-30

# (参考訳) DDANet: Dual Decoder Attention Network for Automatic Polyp Segmentation

DDANet: Dual Decoder Attention Network for Automatic Polyp Segmentation ( http://arxiv.org/abs/2012.15245v1 )

ライセンス: CC BY 4.0

Nikhil Kumar Tomar, Debesh Jha, Sharib Ali, Håvard D. Johansen, Dag Johansen, Michael A. Riegler, and Pål Halvorsen

(参考訳) 大腸内視鏡は大腸ポリープの検査と検出のための金の標準である。ポリープの局在とデライン化は、治療(例えば、手術計画)と予後決定において重要な役割を果たす。ポリープセグメンテーションは臨床分析のための詳細な境界情報を提供することができる。畳み込みニューラルネットワークは大腸内視鏡の性能を改善した。しかしながら、ポリプは通常、クラス内およびクラス間変異やノイズなど、様々な課題を抱えている。ポリープ評価のための手動ラベリングは、専門家の時間を必要とし、ヒューマンエラー(例えば、欠落した病変)を起こしやすいが、自動化され、正確で、高速に分割することで、脱線した病変の境界の品質を改善し、欠落率を減らすことができる。 Endotect challengeは、公開のHyperkvasirでトレーニングし、未公開のデータセットでテストすることで、コンピュータビジョンのメソッドをベンチマークする機会を提供する。本稿では,デュアルデコーダアテンションネットワークに基づく「DDANet」と呼ばれる新しいアーキテクチャを提案する。実験により, Kvasir-SEGデータセットを用いてトレーニングし, 未知のデータセット上で試験したモデルは, ダイス係数0.7874, mIoU0.7010, リコール0.7987, 精度0.8577を達成し, モデルの一般化能力を実証した。

Colonoscopy is the gold standard for examination and detection of colorectal polyps. Localization and delineation of polyps can play a vital role in treatment (e.g., surgical planning) and prognostic decision making. Polyp segmentation can provide detailed boundary information for clinical analysis. Convolutional neural networks have improved performance in colonoscopy. However, polyps usually pose various challenges, such as intra- and inter-class variation and noise. While manual labeling for polyp assessment requires time from experts and is prone to human error (e.g., missed lesions), automated, accurate, and fast segmentation can improve the quality of delineated lesion boundaries and reduce the miss rate. The Endotect challenge provides an opportunity to benchmark computer vision methods by training on the publicly available Hyperkvasir and testing on a separate unseen dataset. In this paper, we propose a novel architecture called "DDANet" based on a dual decoder attention network. Our experiments demonstrate that the model trained on the Kvasir-SEG dataset and tested on an unseen dataset achieves a Dice coefficient of 0.7874, mIoU of 0.7010, recall of 0.7987, and precision of 0.8577, demonstrating the generalization ability of our model.

翻訳日:2021-04-18 11:34:54 公開日:2020-12-30

# (参考訳) U-Net-ResNet50を用いた自動ポリープセグメンテーション

Automatic Polyp Segmentation using U-Net-ResNet50 ( http://arxiv.org/abs/2012.15247v1 )

ライセンス: CC BY 4.0

Saruar Alam, Nikhil Kumar Tomar, Aarati Thakur, Debesh Jha, Ashish Rauniyar

(参考訳) ポリープは大腸癌の前身であり、世界中のがん関連死亡の原因の1つと考えられている。大腸内視鏡は大腸ポリープの同定、局在化、除去の標準的手順である。大腸内視鏡検査では, 形状, サイズ, 周囲の組織類似性の変動により, 大腸ポリープが欠落することが多い。大腸内視鏡検査において, 自動的, 正確かつ高速なポリープ分割法を用いることで, 多くの大腸ポリープを容易に検出・除去することができる。「Medico automatic polyp segmentation challenge」は、ポリプセグメンテーションを研究し、効率的かつ正確なセグメンテーションアルゴリズムを構築する機会を提供する。プリトレーニングされたResNet50をポリプセグメンテーションのエンコーダとしてU-Netを使用する。本モデルでは,この課題に対して提供されるKvasir-SEGデータセットに基づいてトレーニングを行い,オーガナイザのデータセットで検証し,ディス係数0.8154,ジャカード0.7396,リコール0.8533,精度0.8532,精度0.9506,F2スコア0.8272を達成し,モデルの一般化能力を示す。

Polyps are the precursors of colorectal cancer, which is considered one of the leading causes of cancer-related deaths worldwide. Colonoscopy is the standard procedure for the identification, localization, and removal of colorectal polyps. Due to variability in shape and size and their similarity to surrounding tissue, colorectal polyps are often missed by clinicians during colonoscopy. With the use of an automatic, accurate, and fast polyp segmentation method during colonoscopy, many colorectal polyps can be easily detected and removed. The "Medico automatic polyp segmentation challenge" provides an opportunity to study polyp segmentation and build an efficient and accurate segmentation algorithm. We use U-Net with a pre-trained ResNet50 as the encoder for polyp segmentation. The model is trained on the Kvasir-SEG dataset provided for the challenge and tested on the organizer's dataset, achieving a Dice coefficient of 0.8154, Jaccard of 0.7396, recall of 0.8533, precision of 0.8532, accuracy of 0.9506, and F2 score of 0.8272, demonstrating the generalization ability of our model.
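For readers unfamiliar with the metrics reported in these polyp segmentation abstracts, the sketch below computes them from a single pair of binary masks using the standard pixel-count definitions. The function name and the choice to return 0 when a denominator is empty are assumptions of this example; a challenge may aggregate per image or over the whole test set, which is not specified here.

```python
import numpy as np


def _ratio(num, den):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return float(num) / float(den) if den else 0.0


def segmentation_metrics(pred_mask, true_mask):
    """Compute common binary segmentation metrics from two masks."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    tp = int(np.sum(pred & true))    # polyp pixels correctly predicted
    fp = int(np.sum(pred & ~true))   # background predicted as polyp
    fn = int(np.sum(~pred & true))   # polyp pixels that were missed
    tn = int(np.sum(~pred & ~true))  # background correctly predicted
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),  # also called IoU
        "recall": recall,
        "precision": precision,
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "f2": _ratio(5 * precision * recall, 4 * precision + recall),
    }
```

The F2 score weights recall more heavily than precision, which suits a setting where a missed polyp is costlier than a false alarm.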

翻訳日:2021-04-18 11:27:22 公開日:2020-12-30

# (参考訳) 機械学習におけるフェアネスの最大相関手法

A Maximal Correlation Approach to Imposing Fairness in Machine Learning ( http://arxiv.org/abs/2012.15259v1 )

ライセンス: CC BY 4.0

Joshua Lee, Yuheng Bu, Prasanna Sattigeri, Rameswar Panda, Gregory Wornell, Leonid Karlinsky, Rogerio Feris

(参考訳) 機械学習アルゴリズムの人気が高まり、多くの産業に多様化するにつれて、その公平性に関する倫理的および法的懸念が益々関連している。我々は,情報理論の観点から,アルゴリズムフェアネスの問題を探究する。最大相関フレームワーク(maximal correlation framework)は、フェアネス制約を表現するために導入されたもので、独立性と分離ベースのフェアネス基準を強制する正規化子を導出し、既存のアルゴリズムよりも計算効率のよい離散変数と連続変数の両方の最適化アルゴリズムを許容できることが示されている。これらのアルゴリズムは、スムーズなパフォーマンス・フェアネストレードオフ曲線を提供し、離散データセット(compas, adult)と連続データセット(communities and crime)の両方において最先端の手法と競合する。

As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information-theoretic view. We introduce the maximal correlation framework for expressing fairness constraints and show that it can be used to derive regularizers that enforce independence- and separation-based fairness criteria. These regularizers admit optimization algorithms for both discrete and continuous variables that are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance-fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).
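As background for the maximal correlation idea, the sketch below computes the Hirschfeld-Gebelein-Rényi maximal correlation of two discrete variables from their joint probability table. It relies on the standard result for finite alphabets that this quantity equals the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)). The function name is hypothetical, strictly positive marginals are assumed, and the way the paper turns this quantity into a training regularizer is not reproduced here.

```python
import numpy as np


def hgr_maximal_correlation(joint):
    """Maximal correlation of two discrete variables from a joint table.

    joint[x, y] holds (possibly unnormalised) joint counts or probabilities.
    Assumes every row and column has positive total mass.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    q = joint / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(q, compute_uv=False)
    # The largest singular value is always 1; the next one is the
    # maximal correlation (0 if one variable is constant).
    return float(singular_values[1]) if singular_values.size > 1 else 0.0
```

A value near zero indicates that a prediction and a sensitive attribute are close to independent, which is the sense in which such a quantity can express an independence-style fairness criterion.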

翻訳日:2021-04-18 11:22:32 公開日:2020-12-30

# (参考訳) 情報ゲインを用いた言語横断形容詞順序予測

Predicting cross-linguistic adjective order with information gain ( http://arxiv.org/abs/2012.15263v1 )

ライセンス: CC BY 4.0

William Dyer, Richard Futrell, Zoey Liu, and Gregory Scontras

(参考訳) 言語は名詞の前、後、または周囲の複数の形容詞の配置が異なるが、通常、それらの形容詞の相対的な順序で強い言語内傾向を示す(例えば、英語では「big blue box」、フランス語では「grande boîte bleue」、アラビア語では「alsundūq al'azraq alkabīr」)。我々は,情報ゲインの最大化に基づく形容詞順の新しい定量化を推し進める。本モデルでは,フランス型ANA配列の左右非対称性を,AANおよびNAA順序と同じアプローチで解決する。 32の言語にまたがって、好まれる形容詞の順序は、情報獲得を最大化する効率的なアルゴリズムをほとんど反映している。

Languages vary in their placement of multiple adjectives before, after, or surrounding the noun, but they typically exhibit strong intra-language tendencies on the relative order of those adjectives (e.g., the preference for 'big blue box' in English, 'grande boîte bleue' in French, and 'alsundūq al'azraq alkabīr' in Arabic). We advance a new quantitative account of adjective order across typologically distinct languages based on maximizing information gain. Our model addresses the left-right asymmetry of French-type ANA sequences with the same approach as AAN and NAA orderings, without appeal to other mechanisms. We find that, across 32 languages, the preferred order of adjectives largely mirrors an efficient algorithm of maximizing information gain.

翻訳日:2021-04-18 11:05:13 公開日:2020-12-30

# (参考訳) DEER: イベント時間推論のためのデータ効率の良い言語モデル

DEER: A Data Efficient Language Model for Event Temporal Reasoning ( http://arxiv.org/abs/2012.15283v1 )

ライセンス: CC BY 4.0

Rujun Han, Xiang Ren, Nanyun Peng

(参考訳) BERT、RoBERTa、ELECTRAなどの事前訓練言語モデル(LM)は、様々な下流NLPタスクの性能向上に有効である。近年、研究者はこれらのLMのトレーニング目標にドメインとタスク固有の知識を取り入れ、下流タスクを扱うモデルの能力をさらに強化している。しかしながら、これらのLMはイベントの時間的推論に特化して設計されていない。本稿では,イベントの時間的関係に着目した言語モデルDEERを提案する。具体的には,イベント時相理解のための機械読解と情報抽出タスクをシミュレートするために,多数のトレーニングサンプルを作成し,イベント時相推論のlms能力を強化するためにジェネレータ・判別器構造を活用する。実験の結果, DEER は SOTA の結果を達成でき,特に 5 つの広く使用されているデータセットの低リソース環境では有効であることがわかった。

Pretrained language models (LMs) such as BERT, RoBERTa, and ELECTRA are effective at improving the performance of a variety of downstream NLP tasks. Recently, researchers have incorporated domain- and task-specific knowledge into these LMs' training objectives and further enhanced the models' capability of handling downstream tasks. However, none of these LMs is designed specifically for event temporal reasoning. We propose DEER, a language model that is trained to focus on event temporal relations and performs better under low-resource settings than the original LMs. More specifically, we create a large number of training samples to simulate the machine reading comprehension and information extraction tasks for event temporal understanding and leverage a generator-discriminator structure to reinforce the LMs' capability of event temporal reasoning. Our experimental results show that DEER can achieve SOTA results and works particularly well in low-resource settings across 5 widely used datasets.

翻訳日:2021-04-18 10:50:52 公開日:2020-12-30

# (参考訳) バッグ外評価とサブバッキングによる分類のための最適木選定法

Optimal trees selection for classification via out-of-bag assessment and sub-bagging ( http://arxiv.org/abs/2012.15301v1 )

ライセンス: CC BY 4.0

Zardad Khan, Naz Gul, Nosheen Faiz, Asma Gul, Werner Adler, Berthold Lausen

(参考訳) 機械学習手法に対するトレーニングデータサイズの影響は過去20年間にわたってよく研究されてきた。一般に、木ベースの機械学習手法の予測性能は、トレーニングデータのサイズが大きくなるにつれて低下して改善される。本研究では,本手法が内部検証によるトレーニング観測から学習できない最適樹木アンサンブル(OTE)について検討する。そこで,OTEは内部検証におけるトレーニング観察の損失に対応するため,修正木選択法を提案する。第1の方法では、各木に対する個別および集団のパフォーマンス評価において、対応するOOB(out-of-bag)観測を使用する。木は、OOB観測に基づいて個々のパフォーマンスに基づいてランク付けされる。特定の上位木を選定し、最も正確な木から開始し、その後に1つずつ木を付加し、その木を付加するために採取したブートストラップ標本から残したOOB観測を用いて、その影響を記録する。アンサンブルの予測精度を向上させると木が選択される。第2のアプローチでは、木はランダムなサブセット上で成長し、ブートストラップサンプルではなく、トレーニングデータのサブバッギング(sub-bagging)として知られています。各試料からの残りの観察は、第1法と同様に、対応する木々の個体および集合的評価に使用される。 21個のベンチマークデータセットの解析とシミュレーション研究により,OTEや他の最先端手法と比較して改良された手法の性能が向上した。

The effect of training data size on machine learning methods has been well investigated over the past two decades. The predictive performance of tree-based machine learning methods, in general, improves at a decreasing rate as the size of the training data increases. We investigate this in the optimal trees ensemble (OTE), where the method fails to learn from some of the training observations due to internal validation. Modified tree selection methods are thus proposed for OTE to compensate for the loss of training observations in internal validation. In the first method, the corresponding out-of-bag (OOB) observations are used in both the individual and collective performance assessment of each tree. Trees are ranked based on their individual performance on the OOB observations. A certain number of top-ranked trees is selected and, starting from the most accurate tree, subsequent trees are added one by one, and their impact is recorded using the OOB observations left out of the bootstrap sample taken for the tree being added. A tree is selected if it improves the predictive accuracy of the ensemble. In the second approach, trees are grown on random subsets of the training data taken without replacement (known as sub-bagging) instead of bootstrap samples (taken with replacement). The remaining observations from each sample are used in both the individual and collective assessments of each corresponding tree, as in the first method. Analysis of 21 benchmark datasets and simulation studies show improved performance of the modified methods in comparison to OTE and other state-of-the-art methods.
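The difference between the two sampling schemes in this abstract can be shown with a short index-level sketch: a bootstrap sample drawn with replacement versus a sub-bagging subset drawn without replacement, each paired with the observations it leaves out for assessment. The function names and the 70% subset fraction used in the usage note are illustrative assumptions; tree fitting and the ensemble selection loop are not shown.

```python
import numpy as np


def bootstrap_split(n, rng):
    """Draw a bootstrap sample (with replacement) and its out-of-bag indices."""
    in_bag = rng.integers(0, n, size=n)
    out_of_bag = np.setdiff1d(np.arange(n), in_bag)
    return in_bag, out_of_bag


def subbag_split(n, fraction, rng):
    """Draw a subset without replacement (sub-bagging) and the left-out indices."""
    size = int(round(fraction * n))
    in_bag = rng.choice(n, size=size, replace=False)
    out_of_bag = np.setdiff1d(np.arange(n), in_bag)
    return in_bag, out_of_bag
```

For example, `subbag_split(100, 0.7, np.random.default_rng(0))` trains a tree on 70 distinct observations and keeps the other 30 for assessing that tree, whereas a bootstrap sample of size 100 contains repeats and leaves out a random number of observations.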

翻訳日:2021-04-18 10:36:00 公開日:2020-12-30

# (参考訳) マルチモーダルMR画像を用いた脳腫瘍切開のためのH2NF-Net : 第2回 BraTS Challenge 2020 Segmentation Task

H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task ( http://arxiv.org/abs/2012.15318v1 )

ライセンス: CC BY 4.0

Haozhe Jia, Weidong Cai, Heng Huang, Yong Xia

(参考訳) 本稿では,マルチモーダルMR画像中の脳腫瘍を分割するハイブリッド高分解能・非局所特徴ネットワーク(H2NF-Net)を提案する。我々のH2NF-Netは、単一かつカスケードされたHNF-Netを使用して、異なる脳腫瘍のサブリージョンを分割し、予測を最終セグメンテーションとして組み合わせます。我々は、マルチモーダル脳腫瘍分離チャレンジ(BraTS)2020データセットでモデルをトレーニングし、評価した。その結果,単発モデルと縦型モデルの組み合わせにより,0.78751,0.91290,0.85461のdiceスコアと26.57525,4.18426,4.97162のハウスドルフ距離がそれぞれ0.78751,0.91290,0.85461であった。提案手法は,80名近い参加者のうち,brats 2020チャレンジセグメンテーションタスクで2位となった。

In this paper, we propose a Hybrid High-resolution and Non-local Feature Network (H2NF-Net) to segment brain tumors in multimodal MR images. Our H2NF-Net uses single and cascaded HNF-Nets to segment different brain tumor sub-regions and combines their predictions as the final segmentation. We trained and evaluated our model on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset. The results on the test set show that the combination of the single and cascaded models achieved average Dice scores of 0.78751, 0.91290, and 0.85461, as well as Hausdorff distances (95%) of 26.57525, 4.18426, and 4.97162 for the enhancing tumor, whole tumor, and tumor core, respectively. Our method won second place in the BraTS 2020 challenge segmentation task out of nearly 80 participants.

翻訳日:2021-04-18 09:30:20 公開日:2020-12-30

# (参考訳) kōan: 修正CBOW実装

kōan: A Corrected CBOW Implementation ( http://arxiv.org/abs/2012.15332v1 )

ライセンス: CC BY-SA 4.0

Ozan İrsoy, Adrian Benton, Karl Stratos

(参考訳) NLPコミュニティでは、CBOW(continuous bag-of-words)ワードの埋め込みがスキップグラム(SG)埋め込みを過小評価する傾向にあるという共通認識がある。この信念は、トレーニング目標の理論的差異よりも、公式実装の word2vec.c や Gensim などの標準ソフトウェアライブラリにおけるCBOW実装の欠陥に基づいていることが分かる。 CBOWの正しい実装は、学習の3倍以上の速さで、様々な本質的・外生的なタスクにおいてSGと完全に競合する単語埋め込みをもたらすことを示す。私たちは実装であるkōanを https://github.com/bloomberg/koan でリリースします。

It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives and more on faulty CBOW implementations in standard software libraries such as the official implementation word2vec.c and Gensim. We show that our correct implementation of CBOW yields word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks while being more than three times as fast to train. We release our implementation, kōan, at https://github.com/bloomberg/koan.

翻訳日:2021-04-18 09:06:21 公開日:2020-12-30
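The kōan abstract does not say what the implementation flaw is, so the sketch below makes no claim about word2vec.c or Gensim. It only illustrates, under stated assumptions, what a mathematically consistent CBOW update with negative sampling looks like when the hidden vector is the average of the context vectors: the gradient reaching each context vector carries a factor of 1/C, where C is the number of context words. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_loss(context_vecs, target_vec, negative_vecs):
    """Negative-sampling loss with the hidden vector as the context mean."""
    h = context_vecs.mean(axis=0)
    pos = -np.log(sigmoid(target_vec @ h))
    neg = -np.log(sigmoid(-(negative_vecs @ h))).sum()
    return pos + neg

def cbow_context_grad(context_vecs, target_vec, negative_vecs):
    """Gradient of the loss w.r.t. any single context vector."""
    num_context = context_vecs.shape[0]
    h = context_vecs.mean(axis=0)
    # Gradient w.r.t. the averaged hidden vector h.
    grad_h = (sigmoid(target_vec @ h) - 1.0) * target_vec
    grad_h = grad_h + sigmoid(negative_vecs @ h) @ negative_vecs
    # Chain rule through the mean: each context vector receives grad_h / C.
    return grad_h / num_context
```

Comparing `cbow_context_grad` against a finite-difference estimate of `cbow_loss` is a quick way to confirm that the division by the context size belongs in the update.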

# (参考訳) ニューラルネットワークを用いた空間ガウス過程モデルの高速共分散パラメータ推定

Fast covariance parameter estimation of spatial Gaussian process models using neural networks ( http://arxiv.org/abs/2012.15339v1 )

ライセンス: CC BY 4.0

Florian Gerber and Douglas W. Nychka

(参考訳) ガウス過程(GP)は空間参照データの一般的なモデルであり、記述的な説明、新しい地点での予測、新しいフィールドのシミュレーションを可能にする。多くの場合、共分散関数は少数のパラメータで十分にパラメータ化でき、これらのパラメータはデータから最尤(ML)法で推定できる。しかし、ML法は計算コストが高い。例えば局所尤度推定の場合、中程度の大きさのウィンドウに共分散モデルを当てはめるだけでも、データ解析で一般的な計算資源を圧倒しうる。この制限が、ニューラルネットワーク(NN)でML推定を近似するというアイデアの動機である。我々は、中程度の大きさの空間フィールドまたはバリオグラムを入力とし、レンジパラメータと雑音対信号比の共分散パラメータを返すようにNNを訓練する。訓練後のNNは、ML推定と同程度の精度の推定値を100倍以上高速に提供する。本研究は気候科学の応用を動機とする特定の共分散推定問題に焦点を当てているが、より複雑な他の空間問題にも容易に拡張でき、計算統計学における機械学習のこの種の利用の概念実証となる。

Gaussian processes (GPs) are a popular model for spatially referenced data and allow descriptive statements, predictions at new locations, and simulation of new fields. Often a few parameters are sufficient to parameterize the covariance function, and maximum likelihood (ML) methods can be used to estimate these parameters from data. ML methods, however, are computationally demanding. For example, in the case of local likelihood estimation, even fitting covariance models on modest size windows can overwhelm typical computational resources for data analysis. This limitation motivates the idea of using neural network (NN) methods to approximate ML estimates. We train NNs to take moderate size spatial fields or variograms as input and return the range and noise-to-signal covariance parameters. Once trained, the NNs provide estimates with a similar accuracy compared to ML estimation and at a speedup by a factor of 100 or more. Although we focus on a specific covariance estimation problem motivated by a climate science application, this work can be easily extended to other, more complex, spatial problems and provides a proof-of-concept for this use of machine learning in computational statistics.

翻訳日:2021-04-18 08:56:26 公開日:2020-12-30
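To make the two covariance parameters in the Gaussian process abstract concrete, here is a minimal sketch of how a spatial field could be simulated for given values of the range and the noise-to-signal ratio. The exponential covariance, the unit signal variance, and the names `exponential_covariance` and `simulate_field` are assumptions for illustration; the abstract does not state which covariance family the authors use.

```python
import numpy as np

def exponential_covariance(coords, range_param, noise_to_signal):
    """Covariance exp(-d / range) plus a nugget, with unit signal variance."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    signal = np.exp(-dists / range_param)
    return signal + noise_to_signal * np.eye(len(coords))

def simulate_field(coords, range_param, noise_to_signal, rng):
    """Draw one zero-mean Gaussian field at the given coordinates."""
    cov = exponential_covariance(coords, range_param, noise_to_signal)
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(len(coords))
```

Fields simulated this way, paired with the parameters that generated them, are the kind of input-output examples a network could be trained on to map a field back to its parameters.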

# (参考訳) ビデオモザイクアプリケーションにおける情報重なりフレームのアクティブアノテーション

Active Annotation of Informative Overlapping Frames in Video Mosaicking Applications ( http://arxiv.org/abs/2012.15343v1 )

ライセンス: CC BY 4.0

Loic Peter, Marcel Tella-Amo, Dzhoshkun Ismail Shakir, Jan Deprest, Sebastien Ourselin, Juan Eugenio Iglesias, Tom Vercauteren

(参考訳) ビデオモザイキングでは、再構成されたシーンの大域的な一貫性を確保するために、シーケンス内の離れた時点にある重なり合うフレーム同士を位置合わせ(レジストレーション)する必要がある。しかし、このような長距離ペアの完全自動レジストレーションは、(i)画像のレジストレーション自体が難しい場合には困難であり、(ii)長いシーケンスでは候補ペアの数が多いため計算コストが高い。本稿では、シーケンス内の長距離ペア間対応をアクティブにアノテーションするための効率的なフレームワークを提案する。本フレームワークは、情報量が多いと見込まれる画像ペアをオラクルエージェント(例えば人間のユーザや信頼できるマッチングアルゴリズム)に提示し、オラクルは提示された各ペアについて視覚的対応を与える。情報量の多いペアは、原理に基づくアノテーション報酬と、フレーム重なりに関する相補的かつオンラインで適応可能な2つのモデルを組み合わせた反復戦略によって選択される。モザイクを効率的に構築できることに加え、本フレームワークは副産物として、評価や学習に利用できる正解ランドマーク対応を提供する。本手法を、合成シーケンス、公開されている航空画像データセット、胎児手術中の胎盤モザイキングの臨床データセットを用いた実験により、自動およびインタラクティブの両方のシナリオで評価した。

Video mosaicking requires the registration of overlapping frames located at distant timepoints in the sequence to ensure global consistency of the reconstructed scene. However, fully automated registration of such long-range pairs is (i) challenging when the registration of images itself is difficult; and (ii) computationally expensive for long sequences due to the large number of candidate pairs for registration. In this paper, we introduce an efficient framework for the active annotation of long-range pairwise correspondences in a sequence. Our framework suggests pairs of images that are sought to be informative to an oracle agent (e.g., a human user, or a reliable matching algorithm) who provides visual correspondences on each suggested pair. Informative pairs are retrieved according to an iterative strategy based on a principled annotation reward coupled with two complementary and online adaptable models of frame overlap. In addition to the efficient construction of a mosaic, our framework provides, as a by-product, ground truth landmark correspondences which can be used for evaluation or learning purposes. We evaluate our approach in both automated and interactive scenarios via experiments on synthetic sequences, on a publicly available dataset for aerial imaging and on a clinical dataset for placenta mosaicking during fetal surgery.

翻訳日:2021-04-18 08:34:37 公開日:2020-12-30

# (参考訳) DynaSent: 感情分析のための動的ベンチマーク

DynaSent: A Dynamic Benchmark for Sentiment Analysis ( http://arxiv.org/abs/2012.15349v1 )

ライセンス: CC BY 4.0

Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela

(参考訳) 本稿では、3値(肯定・否定・中立)の感情分析のための新しい英語ベンチマークタスクであるDynaSent('Dynamic Sentiment')を紹介する。DynaSentは、自然に出現する文と、人間とモデルをループに組み込んだデータセット作成を支援するオープンソースのDynabenchプラットフォームを用いて作成された文を組み合わせている。DynaSentには合計121,634の文があり、それぞれが5人のクラウドワーカーによって検証されている。開発用およびテスト用の分割は、我々が開発できた最良のモデルでさえチャンスレベルの性能になるように設計されており、将来のモデルがこのタスクを解決した際には、それらを用いてDynaSentバージョン2を作成し、このベンチマークの動的な進化を継続する。ここでは、データセット作成の取り組みについて、品質を高めアーティファクトを減らすために講じた手順を中心に報告する。また、DynaSentの中立カテゴリが他のベンチマークの対応するカテゴリよりも一貫性が高いという証拠を示し、逐次的なファインチューニングではなく、各ラウンドでモデルをスクラッチから訓練することの根拠を述べる。

We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilitates human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers, and its development and test splits are designed to produce chance performance for even the best models we have been able to develop; when future models solve this task, we will use them to create DynaSent version 2, continuing the dynamic evolution of this benchmark. Here, we report on the dataset creation effort, focusing on the steps we took to increase quality and reduce artifacts. We also present evidence that DynaSent's Neutral category is more coherent than the comparable category in other benchmarks, and we motivate training models from scratch for each round over successive fine-tuning.

翻訳日:2021-04-18 07:52:42 公開日:2020-12-30

# (参考訳) BERT(および他のトランスフォーマーモデル)の埋め込みからの文脈化された意味的特徴の導出

Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings ( http://arxiv.org/abs/2012.15353v1 )

ライセンス: CC BY 4.0

Jacob Turton, David Vinson, Robert Elliott Smith

(参考訳) BERTのようなトランスフォーマーアーキテクチャに基づくモデルは、自然言語処理の分野で重要な一歩を踏み出した。重要なことは、文脈における単語に関する重要な意味情報をキャプチャする単語埋め込みの作成を可能にすることである。しかし、単一の実体として、これらの埋め込みは解釈が難しく、それらを作成するのに使われたモデルは不透明であると説明されている。 Binderと同僚は直感的な埋め込み空間を提案し、それぞれの次元は65のコアセマンティックな特徴の1つに基づいている。残念ながら、スペースは535ワードの小さなデータセットでしか存在せず、使用は制限されている。以前の研究(Utsumi, 2018, 2020, Turton, Vinson & Smith, 2020)では、Binderの機能は静的な埋め込みから派生し、大きな新しい語彙への外挿に成功した。次のステップとして,Binder の機能は BERT 埋め込み空間から導出可能であることを示す。これはコンテキスト化されたBinder埋め込みを提供し、コンテキスト内の単語間の意味的差異を理解するのに役立つ。さらに、BERTモデルの異なるレイヤ間でセマンティック機能がどのように表現されるかについての洞察も提供する。

Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information about words in context. However, as single entities, these embeddings are difficult to interpret and the models used to create them have been described as opaque. Binder and colleagues proposed an intuitive embedding space where each dimension is based on one of 65 core semantic features. Unfortunately, the space only exists for a small dataset of 535 words, limiting its uses. Previous work (Utsumi, 2018, 2020, Turton, Vinson & Smith, 2020) has shown that Binder features can be derived from static embeddings and successfully extrapolated to a large new vocabulary. Taking the next step, this paper demonstrates that Binder features can be derived from the BERT embedding space. This provides contextualised Binder embeddings, which can aid in understanding semantic differences between words in context. It additionally provides insights into how semantic features are represented across the different layers of the BERT model.

翻訳日:2021-04-18 07:28:11 公開日:2020-12-30

# (参考訳) OSTeC:ワンショットテクスチャコンプリート

OSTeC: One-Shot Texture Completion ( http://arxiv.org/abs/2012.15370v1 )

ライセンス: CC BY 4.0

Baris Gecer, Jiankang Deng, Stefanos Zafeiriou

(参考訳) ここ数年、非線形生成モデルは高品質でフォトリアリスティックな顔画像の合成において大きな成功を収めている。単一画像からの3D顔テクスチャ再構成やポーズ操作に関する最近の手法の多くは、画像から画像への敵対的生成ネットワーク(GAN)を訓練するために、依然として大規模でクリーンな顔データセットに依存している。しかし、このような大規模で高解像度の3Dテクスチャデータセットの収集は依然として非常にコストが高く、年齢や民族のバランスを保つことも難しい。さらに、回帰ベースの手法は実環境(in-the-wild)の条件への一般化に難があり、対象画像に合わせた微調整ができない。本研究では、大規模なテクスチャデータセットを必要とせず、2D顔生成器に蓄えられた知識を活用する、ワンショット3D顔テクスチャ補完のための教師なし手法を提案する。提案手法は、入力画像を3Dで回転させ、可視部分に基づいて回転画像を2D顔生成器で再構成することにより、見えていない領域を補完する。最後に、異なる角度で最もよく見えるテクスチャをUV画像平面上でつなぎ合わせる。さらに、補完したテクスチャを生成器に投影することで、対象画像を正面化する。定性的および定量的な実験により、補完されたUVテクスチャと正面化画像は高品質で元のアイデンティティによく似ており、3DMMフィッティングのためのテクスチャGANモデルの訓練やポーズ不変な顔認識の改善に利用できることが示された。

The last few years have witnessed the great success of non-linear generative models in synthesizing high-quality photorealistic face images. Many recent approaches to 3D facial texture reconstruction and pose manipulation from a single image still rely on large and clean face datasets to train image-to-image Generative Adversarial Networks (GANs). Yet collecting such a large-scale, high-resolution 3D texture dataset is still very costly, and it is difficult to maintain age/ethnicity balance. Moreover, regression-based approaches generalize poorly to in-the-wild conditions and are unable to fine-tune to a target image. In this work, we propose an unsupervised approach for one-shot 3D facial texture completion that does not require large-scale texture datasets, but rather harnesses the knowledge stored in 2D face generators. The proposed approach rotates an input image in 3D and fills in the unseen regions by reconstructing the rotated image in a 2D face generator, based on the visible parts. Finally, we stitch the most visible textures at different angles in the UV image-plane. Further, we frontalize the target image by projecting the completed texture into the generator. The qualitative and quantitative experiments demonstrate that the completed UV textures and frontalized images are of high quality, resemble the original identity, and can be used to train a texture GAN model for 3DMM fitting and to improve pose-invariant face recognition.

翻訳日:2021-04-18 07:04:31 公開日:2020-12-30

# (参考訳) 自己監督型機能距離を用いたモデルベース視覚計画

Model-Based Visual Planning with Self-Supervised Functional Distances ( http://arxiv.org/abs/2012.15373v1 )

ライセンス: CC BY 4.0

Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine

(参考訳) 汎用ロボットは、その環境の中でさまざまなタスクを遂行できなければならない。各タスクを指定する魅力的な方法の一つは、ゴールの観測として与えることである。しかし、強化学習によるゴール到達方策の学習は、特に手作業で設計された報酬関数が利用できない場合、依然として難しい問題である。学習されたダイナミクスモデルは、報酬やタスク指向のデータなしで環境について学ぶための有望なアプローチであるが、そのようなモデルでゴールに到達するよう計画するには、観測とゴール状態の間の機能的類似性の概念が必要である。本稿では、視覚ダイナミクスモデルと、モデルフリー強化学習で学習した動的距離関数の両方を用いる、モデルベースの視覚的ゴール到達のための自己教師あり手法を提案する。本手法はオフラインのラベルなしデータのみから学習するため、大規模で多様なデータセットへのスケールが現実的である。実験では、本手法がテスト時にさまざまなタスクを実行するモデルを学習できることを確認した。具体的には、シミュレーションのロボットアームで妨害物のある中で物体を移動させ、さらに実世界のロボットで引き出しの開閉も学習した。比較では、本手法はモデルフリーおよびモデルベースの従来手法をともに大幅に上回った。ビデオと可視化は http://sites.google.com/berkeley.edu/mbold で公開されている。

A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when hand-engineered reward functions are not available. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model as well as a dynamical distance function learned using model-free reinforcement learning. Our approach learns entirely using offline, unlabeled data, making it practical to scale to large and diverse datasets. In our experiments, we find that our method can successfully learn models that perform a variety of tasks at test-time, moving objects amid distractors with a simulated robotic arm and even learning to open and close a drawer using a real-world robot. In comparisons, we find that this approach substantially outperforms both model-free and model-based prior methods. Videos and visualizations are available here: http://sites.google.com/berkeley.edu/mbold.

翻訳日:2021-04-18 06:40:12 公開日:2020-12-30

# 普遍視覚指導による正確な単語表現

Accurate Word Representations with Universal Visual Guidance ( http://arxiv.org/abs/2012.15086v1 )

ライセンス: Link先を確認

Zhuosheng Zhang, Haojie Yu, Hai Zhao, Rui Wang, Masao Utiyama

(参考訳) 単語表現は、ニューラル言語理解モデルの基本的な構成要素である。近年、事前学習言語モデル(PrLM)は、系列レベルの文脈をモデリングに活用することで、文脈化単語表現のための高性能な新手法を提供している。PrLMは一般に非文脈化モデルよりも正確な文脈化単語表現を与えるが、依然としてテキスト文脈の系列に縛られており、マルチモダリティに由来する多様な手がかりを単語表現に取り込めていない。そこで本稿では、視覚的ガイダンスから得られる多面的な語義によって従来の単語埋め込みを明示的に強化する視覚表現手法を提案する。具体的には、各単語が多様な関連画像に対応するマルチモーダルシードデータセットから、小規模な単語・画像辞書を構築する。テキストと対になる画像は並列に符号化され、その後にマルチモーダル表現を統合するアテンション層が続く。本手法により曖昧性解消の精度が大幅に向上することを示す。12の自然言語理解および機械翻訳タスクの実験により、提案手法の有効性と汎化能力がさらに検証された。

Word representation is a fundamental component in neural language understanding models. Recently, pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally give more accurate contextualized word representations than non-contextualized models do, they are still subject to a sequence of text contexts without diverse hints for word representation from multimodality. This paper thus proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images. The texts and paired images are encoded in parallel, followed by an attention layer to integrate the multimodal representations. We show that the method substantially improves the accuracy of disambiguation. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.

翻訳日:2021-04-18 06:08:40 公開日:2020-12-30

# 順序の乱れ:自然言語理解タスクにおいて文中の単語の語順はどの程度重要か?

Out of Order: How important is the sequential order of words in a sentence in Natural Language Understanding tasks? ( http://arxiv.org/abs/2012.15180v1 )

ライセンス: Link先を確認

Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen

(参考訳) 最先端の自然言語理解モデルは、系列の最も重要な特徴の一つである語順を気にしているだろうか? 必ずしもそうではない。多くのGLUEタスクで訓練されたBERTベースの分類器では、入力単語をランダムにシャッフルした後も、正しい予測の75%から90%が変わらなかった。BERTの埋め込みは文脈的であることで知られているが、下流タスクに対する個々の単語の寄与は、その単語の文脈がシャッフルされた後もほとんど変化しない。BERTベースのモデルは、トークンがランダムな順序で並べられていても、表面的な手がかり(例えば、感情分析におけるキーワードの感情や、自然言語推論における系列ペア入力間の単語単位の類似性)を利用して正しい判断を下すことができる。分類器に語順情報を捉えるよう促すと、ほとんどのGLUEタスク、SQuAD 2.0、および分布外サンプルでの性能が向上する。本研究は、多くのGLUEタスクが、文の意味を理解することを機械に要求していないことを示唆している。

Do state-of-the-art natural language understanding models care about word order, one of the most important characteristics of a sequence? Not always! We found that 75% to 90% of the correct predictions of BERT-based classifiers, trained on many GLUE tasks, remain constant after input words are randomly shuffled. Although BERT embeddings are famously contextual, the contribution of each individual word to downstream tasks is almost unchanged even after the word's context is shuffled. BERT-based models are able to exploit superficial cues (e.g. the sentiment of keywords in sentiment analysis; or the word-wise similarity between sequence-pair inputs in natural language inference) to make correct decisions when tokens are arranged in random orders. Encouraging classifiers to capture word order information improves the performance on most GLUE tasks, SQuAD 2.0 and out-of-samples. Our work suggests that many GLUE tasks are not challenging machines to understand the meaning of a sentence.

翻訳日:2021-04-18 06:08:28 公開日:2020-12-30
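The word-order probe described in the Out of Order abstract rests on a simple perturbation: keep the words of a sentence but randomize their order, then check whether a classifier's prediction changes. A minimal sketch of that perturbation, assuming whitespace tokenization (the abstract does not specify the authors' exact shuffling procedure), could look like this; `shuffle_words` is a hypothetical helper name.

```python
import random

def shuffle_words(sentence, seed=None):
    """Return the sentence with its whitespace-separated words reordered."""
    words = sentence.split()
    rng = random.Random(seed)  # seeded for reproducible perturbations
    rng.shuffle(words)
    return " ".join(words)
```

Feeding both the original and the shuffled sentence to the same classifier, and counting how often the label stays the same, gives the kind of invariance rate the abstract reports.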

# メタ認知による言語キャリブレーション:対話エージェント応答と期待された正しさの整合

Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness ( http://arxiv.org/abs/2012.14983v1 )

ライセンス: Link先を確認

Sabrina J. Mielke, Arthur Szlam, Y-Lan Boureau, Emily Dinan

(参考訳) オープンドメインの対話エージェントは大幅に進歩したが、依然として自信を持って誤った知識を述べたり(ハルシネーション)、平易な質問に対して疑念を示したりする。本研究では、最先端の雑談モデルが応答を通じてメタ認知能力を表現できるかを分析する。すなわち、言語化された疑念(または自信)の表現は、モデルの回答が誤っている(または正しい)可能性と一致しているだろうか? これらのモデルはこの意味で校正が不十分であることがわかったが、モデル内部の表現を用いれば正しさの確率を正確に予測できることを示す。これらの正しさの予測を制御可能な生成モデルの訓練に組み込むことで、言語的キャリブレーションが大幅に改善された対話エージェントを得る。

Open-domain dialogue agents have vastly improved, but still confidently hallucinate knowledge or express doubt when asked straightforward questions. In this work, we analyze whether state-of-the-art chit-chat models can express metacognition capabilities through their responses: does a verbalized expression of doubt (or confidence) match the likelihood that the model's answer is incorrect (or correct)? We find that these models are poorly calibrated in this sense, yet we show that the representations within the models can be used to accurately predict likelihood of correctness. By incorporating these correctness predictions into the training of a controllable generation model, we obtain a dialogue agent with greatly improved linguistic calibration.

翻訳日:2021-04-18 06:08:11 公開日:2020-12-30

# テーブル上のオープンファクトチェックのための共同検証とリランク

Joint Verification and Reranking for Open Fact Checking Over Tables ( http://arxiv.org/abs/2012.15115v1 )

ライセンス: Link先を確認

Michael Schlichtkrull, Vladimir Karpukhin, Barlas Oğuz, Mike Lewis, Wen-tau Yih, Sebastian Riedel

(参考訳) 構造化情報は、事実に関する主張を自動検証するための重要な知識源である。しかし、このタスクに関する既存研究の大部分はテキストデータに焦点を当てており、構造化データを扱った最近の少数の研究も、各主張に対する適切な証拠がすでに検索済みであると仮定するクローズドドメイン設定を対象としている。本稿では、オープンドメイン設定における構造化データに対する検証を検討し、検証コンポーネント内で証拠文書を融合する、リランキングと検証の共同モデルを導入する。我々のオープンドメインモデルは、TabFactデータセットにおいてクローズドドメインの最先端手法に匹敵する性能を達成し、複数のテーブルを含めることによる性能向上と、ヒューリスティックな検索ベースラインに対する大幅な改善を示す。

Structured information is an important knowledge source for automatic verification of factual claims. Nevertheless, the majority of existing research into this task has focused on textual data, and the few recent inquiries into structured data have been for the closed-domain setting where appropriate evidence for each claim is assumed to have already been retrieved. In this paper, we investigate verification over structured data in the open-domain setting, introducing a joint reranking-and-verification model which fuses evidence documents in the verification component. Our open-domain model achieves performance comparable to the closed-domain state-of-the-art on the TabFact dataset, and demonstrates performance gains from the inclusion of multiple tables as well as a significant improvement over a heuristic retrieval baseline.

翻訳日:2021-04-18 06:08:00 公開日:2020-12-30

# 悲観主義はオフラインRLにおいて証明可能な意味で効率的か?

Is Pessimism Provably Efficient for Offline RL? ( http://arxiv.org/abs/2012.15085v1 )

ライセンス: Link先を確認

Ying Jin, Zhuoran Yang, Zhaoran Wang

(参考訳) 本研究では、事前に収集されたデータセットに基づいて最適方策を学習することを目的とするオフライン強化学習(RL)を研究する。環境との追加の相互作用がないため、オフラインRLはデータセットのカバレッジ不足に悩まされ、既存の理論解析の多くはこれを扱えていない。本稿では、不確実性定量化子をペナルティ関数として組み込んだ、価値反復アルゴリズムの悲観的変種(PEVI)を提案する。このペナルティ関数は、オンラインRLで探索を促進するためのボーナス関数の符号を反転させただけのものであり、実装が容易で一般的な関数近似器とも両立する。データセットの十分なカバレッジを仮定せずに、一般のマルコフ決定過程(MDP)に対するPEVIの準最適性について、データ依存の上界を確立する。線形MDPに特化した場合、この上界は次元とホライズンの乗法的因子を除いて情報理論的下界と一致する。言い換えれば、悲観主義は証明可能な意味で効率的であるだけでなく、ミニマックス最適でもある。特に、データセットが与えられたとき、他のどの方策もそれ以上良くはできないという意味で、学習された方策はすべての方策の中での「ベストエフォート」となる。我々の理論解析は、データセットによるカバーが少なく最適方策にとって情報量のない「無関係な」軌道から生じる、見かけ上の相関(spurious correlation)を排除する上で、悲観主義が決定的な役割を果たすことを明らかにする。

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. Such a penalty function simply flips the sign of the bonus function for promoting exploration in online RL, which makes it easily implementable and compatible with general function approximators. Without assuming the sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the "best effort" among all policies, as no other policies can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from the "irrelevant" trajectories that are less covered by the dataset and not informative for the optimal policy.

翻訳日:2021-04-18 06:07:45 公開日:2020-12-30
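The core idea in the pessimism abstract, subtracting an uncertainty penalty inside value iteration, can be sketched for a small tabular, finite-horizon problem. This is only an illustration under several assumptions that are not in the abstract: rewards lie in [0, 1], the penalty is a count-based `beta / sqrt(N)`, and the function name, argument layout, and clipping range are hypothetical choices.

```python
import numpy as np

def pessimistic_value_iteration(data, n_states, n_actions, horizon, beta=1.0):
    """data: iterable of (h, s, a, r, s_next) with h in 0..horizon-1."""
    counts = np.zeros((horizon, n_states, n_actions))
    reward_sum = np.zeros((horizon, n_states, n_actions))
    next_counts = np.zeros((horizon, n_states, n_actions, n_states))
    for h, s, a, r, s_next in data:
        counts[h, s, a] += 1
        reward_sum[h, s, a] += r
        next_counts[h, s, a, s_next] += 1

    safe = np.maximum(counts, 1.0)        # avoid division by zero
    r_hat = reward_sum / safe             # empirical mean reward
    p_hat = next_counts / safe[..., None] # empirical transition model
    # Unseen pairs get zero estimated reward and the full penalty beta.
    penalty = beta / np.sqrt(safe)

    value = np.zeros((horizon + 1, n_states))
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        # Pessimistic Q: empirical Bellman backup minus the penalty.
        q = r_hat[h] + p_hat[h] @ value[h + 1] - penalty[h]
        q = np.clip(q, 0.0, horizon - h)
        policy[h] = q.argmax(axis=1)
        value[h] = q.max(axis=1)
    return policy, value
```

With this penalty, an action that looks good on a single sample can lose to a slightly worse action that is well covered by the data, which is the behaviour pessimism is meant to produce.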

# ERICA:コントラスト学習による事前学習言語モデルのエンティティと関係理解の改善

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning ( http://arxiv.org/abs/2012.15022v1 )

ライセンス: Link先を確認

Yujia Qin, Yankai Lin, Ryuichi Takanobu, Zhiyuan Liu, Peng Li, Heng Ji, Minlie Huang, Maosong Sun, Jie Zhou

(参考訳) 事前訓練された言語モデル(PLM)は、様々な下流自然言語処理(NLP)タスクで高いパフォーマンスを示している。しかし、plmはテキスト中の事実の知識をうまく捉えられないため、テキスト全体の理解、特に文書レベルの言語理解タスクにおいて重要である。この問題に対処するため,本研究では,ERICA という新たなコントラスト学習フレームワークを事前学習段階で提案し,エンティティとその関係をテキストでより深く理解する。具体的には、(1)エンティティをよりよく理解するために、与えられたヘッドエンティティと関係によって推測できる末尾エンティティを区別するエンティティ識別タスクを提案する。 2)関係をよりよく理解するために、2つのエンティティペアが関係セマンティクスにおいて近接しているか否かを区別する関係識別タスクを用いる。実験の結果,本フレームワークは,関係抽出や読解といった文書レベルの言語理解タスクにおいて,特に低リソース環境下で一貫した改善を実現していることがわかった。一方、ERICAは文レベルのタスクで同等またはより良いパフォーマンスを達成する。さらなる研究のために、データセット、ソースコード、事前学習された言語モデルをリリースします。

Pre-trained Language Models (PLMs) have shown strong performance in various downstream Natural Language Processing (NLP) tasks. However, PLMs still cannot well capture the factual knowledge in the text, which is crucial for understanding the whole text, especially for document-level language understanding tasks. To address this issue, we propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text. Specifically, (1) to better understand entities, we propose an entity discrimination task that distinguishes which tail entity can be inferred by the given head entity and relation. (2) Besides, to better understand relations, we employ a relation discrimination task which distinguishes whether two entity pairs are close or not in relational semantics. Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks, including relation extraction and reading comprehension, especially under low resource setting. Meanwhile, ERICA achieves comparable or better performance on sentence-level tasks. We will release the datasets, source codes and pre-trained language models for further research explorations.

翻訳日:2021-04-18 06:07:23 公開日:2020-12-30
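The ERICA abstract builds on contrastive learning, where a representation is pulled toward a positive example and pushed away from negatives. As a generic sketch of such an objective (an InfoNCE-style loss with cosine similarity, which is an assumption and not necessarily the exact loss used in ERICA), consider the following; `cosine`, `info_nce_loss`, and the temperature value are illustrative choices.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Negative log-probability of picking the positive among all candidates."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits = logits - logits.max()  # stabilise the softmax numerically
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos
```

In an entity discrimination setting, the anchor could be the representation of a head entity and relation, the positive the correct tail entity, and the negatives the other entities in the document.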

# SemGloVe:BERTから得るGloVeのための意味的共起

SemGloVe: Semantic Co-occurrences for GloVe from BERT ( http://arxiv.org/abs/2012.15197v1 )

ライセンス: Link先を確認

Leilei Gan, Zhiyang Teng, Yue Zhang, Linchao Zhu, Fei Wu, Yi Yang

(参考訳) GloVeは単語共起行列から統計情報を活用することで単語埋め込みを学習する。しかし、行列中の単語ペアは、定義済みのローカルコンテキストウィンドウから抽出され、限定された単語ペアと潜在的に意味のない単語ペアにつながる可能性がある。本稿では,BERTから静的なGloVe単語の埋め込みに意味的共起を蒸留するSemGloVeを提案する。特に,マスク付き言語モデルと多頭部注意重みに基づく共起統計を抽出する2つのモデルを提案する。提案手法は,局所的なウィンドウ仮定によって制限されることなく単語ペアを抽出し,単語ペア間の意味的距離を直接考慮して共起重みを定義できる。いくつかの単語類似性データセットと4つの外部タスクの実験は、SemGloVeがGloVeより優れていることを示している。

GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices. However, word pairs in the matrices are extracted from a predefined local context window, which might lead to limited word pairs and potentially semantic irrelevant word pairs. In this paper, we propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings. Particularly, we propose two models to extract co-occurrence statistics based on either the masked language model or the multi-head attention weights of BERT. Our methods can extract word pairs without limiting by the local window assumption and can define the co-occurrence weights by directly considering the semantic distance between word pairs. Experiments on several word similarity datasets and four external tasks show that SemGloVe can outperform GloVe.

翻訳日:2021-04-18 06:07:05 公開日:2020-12-30
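For context on what SemGloVe replaces, the sketch below shows the conventional local-window co-occurrence counting that the abstract describes as a limitation, with pairs that are d positions apart contributing 1/d to the count. It illustrates the baseline only, not the BERT-based extraction proposed in the paper; the function name and the symmetric counting are assumptions.

```python
from collections import defaultdict

def window_cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts within a fixed window, weighted by 1/distance."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        # Look back at most `window` positions from the current word.
        for j in range(max(0, i - window), i):
            weight = 1.0 / (i - j)
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return dict(counts)
```

Any pair of words farther apart than `window` never appears in the resulting table, however related the words are, which is the restriction the abstract sets out to remove.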

# 対話システムにおける言語理解のロバストネステスト

Robustness Testing of Language Understanding in Dialog Systems ( http://arxiv.org/abs/2012.15262v1 )

ライセンス: Link先を確認

Jiexi Liu, Ryuichi Takanobu, Jiaxin Wen, Dazhen Wan, Weiran Nie, Hongyan Li, Cheng Li, Wei Peng, Minlie Huang

(参考訳) ダイアログシステムにおけるほとんどの言語理解モデルは、少量の注釈付きトレーニングデータに基づいて訓練され、同じ分布から小さなセットで評価される。しかし、これらのモデルが実際に自然摂動にさらされると、システム障害や望ましくない出力につながる可能性がある。本稿では,自然言語理解モデルの頑健性に関する包括的評価と分析を行い,実世界の対話システムにおける言語理解に関する3つの重要な側面,すなわち言語多様性,音声特性,雑音の摂動について述べる。本稿では,対話システムにおけるロバスト性問題をテストするために,自然な摂動を近似するモデル非依存ツールキットLAUGを提案する。この3つの側面をカバーする4つのデータ拡張アプローチがlaugで組み立てられ、最先端モデルにおける重要な堅牢性問題を明らかにする。 LAUGによる拡張データセットは、ダイアログシステムにおける言語理解の堅牢性テストの今後の研究を促進するために使用できる。

Most language understanding models in dialog systems are trained on a small amount of annotated training data, and evaluated in a small set from the same distribution. However, these models can lead to system failure or undesirable outputs when being exposed to natural perturbation in practice. In this paper, we conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models, and introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation. We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems. Four data augmentation approaches covering the three aspects are assembled in LAUG, which reveals critical robustness issues in state-of-the-art models. The augmented dataset through LAUG can be used to facilitate future research on the robustness testing of language understanding in dialog systems.

翻訳日:2021-04-18 06:06:51 公開日:2020-12-30

# グラフ-テキスト問題としての地図からのランドマークナビゲーション命令の生成

Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem ( http://arxiv.org/abs/2012.15329v1 )

ライセンス: Link先を確認

Raphael Schumann and Stefan Riezler

(参考訳) 自動車に焦点をあてたナビゲーションサービスは、名前付き通りの曲がり角と距離に基づいており、人間によって自然に使用されるナビゲーション指示は、ランドマークと呼ばれる物理的オブジェクトを中心にしている。本稿では,OpenStreetMap表現を入力とし,人間の自然言語命令から可視的かつ健全なランドマークを含むナビゲーション命令を生成するニューラルネットワークを提案する。地図上の経路は、自然言語命令にデコードされる位置および回転不変グラフ表現に符号化される。われわれの研究は、ストリートビューで人間のナビゲーションによって検証された7,672件のクラウドソースインスタンスのデータセットに基づいている。評価の結果,本システムで生成したナビゲーション命令は,人間が生成した命令と類似しており,ストリートビューでのナビゲーションが成功していることがわかった。

Car-focused navigation services are based on turns and distances of named streets, whereas navigation instructions naturally used by humans are centered around physical objects called landmarks. We present a neural model that takes OpenStreetMap representations as input and learns to generate navigation instructions that contain visible and salient landmarks from human natural language instructions. Routes on the map are encoded in a location- and rotation-invariant graph representation that is decoded into natural language instructions. Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View. Our evaluation shows that the navigation instructions generated by our system have similar properties as human-generated instructions, and lead to successful human navigation in Street View.

翻訳日:2021-04-18 06:06:36 公開日:2020-12-30

# 口語ニューラルマシン翻訳のための合成ソース言語拡張

Synthetic Source Language Augmentation for Colloquial Neural Machine Translation ( http://arxiv.org/abs/2012.15178v1 )

ライセンス: Link先を確認

Asrul Sani Ariesandy, Mukhlis Amien, Alham Fikri Aji, Radityo Eko Prasojo

(参考訳) ニューラル機械翻訳(NMT)は一般にドメインと文体に依存し、大量の訓練データを必要とする。最先端のNMTモデルは、原言語の口語的なバリエーションをうまく扱えないことが多く、この点に関する対訳データの不足は、既存モデルを体系的に改善する上での難しい障壁である。本研究では、YouTubeの書き起こしとTwitterから収集した、新しい口語インドネシア語・英語テストセットを構築する。フォーマルなインドネシア語の原文に対して合成的な文体拡張を行い、それが新しいテストデータにおいてベースラインのId-Enモデルを(BLEUで)改善することを示す。

Neural machine translation (NMT) is typically domain-dependent and style-dependent, and it requires lots of training data. State-of-the-art NMT models often fall short in handling colloquial variations of its source language and the lack of parallel data in this regard is a challenging hurdle in systematically improving the existing models. In this work, we develop a novel colloquial Indonesian-English test-set collected from YouTube transcript and Twitter. We perform synthetic style augmentation to the source of formal Indonesian language and show that it improves the baseline Id-En models (in BLEU) over the new test data.

翻訳日:2021-04-18 06:06:21 公開日:2020-12-30

# 小さなデータセット上でのより深いトランスフォーマーの最適化:Text-to-SQL意味解析への応用

Optimizing Deeper Transformers on Small Datasets: An Application on Text-to-SQL Semantic Parsing ( http://arxiv.org/abs/2012.15355v1 )

ライセンス: Link先を確認

Peng Xu, Wei Yang, Wenjie Zi, Keyi Tang, Chengyang Huang, Jackie Chi Kit Cheung, Yanshuai Cao

(参考訳) ディープトランスフォーマーをスクラッチから訓練するには大規模なデータセットが必要であるという通念のため、小規模データセットでのファインチューニングでは、事前学習済みモデルの上に浅く単純な追加層しか用いないのが普通である。我々は、必ずしもそうである必要はないという証拠を示す。適切な初期化と訓練技術を用いれば、非常に深いトランスフォーマーの利点は、小規模データセットであっても難しい構造予測タスクに引き継がれる。特に、意味解析タスクのために48層のトランスフォーマーの訓練に成功した。これらは、事前学習済みRoBERTa由来でファインチューニングされる24層のトランスフォーマー層と、スクラッチから訓練される24層の関係認識(relation-aware)トランスフォーマー層からなる。より少ない訓練ステップで、タスク固有の事前学習なしに、難易度の高いクロスドメインのText-to-SQL意味解析ベンチマークであるSpiderで最先端の性能を達成した。これは、先行研究のT-Fixupに着想を得た、新しいデータ依存のトランスフォーマー固定更新初期化スキーム(DT-Fixup)を導出することで実現した。さらなる誤り分析により、トランスフォーマーモデルの深さを増やすことが、推論や構造理解を必要とするケースでの汎化の向上に役立ちうることが示された。

Due to the common belief that training deep transformers from scratch requires large datasets, people usually only use shallow and simple additional layers on top of pre-trained models during fine-tuning on small datasets. We provide evidence that this does not always need to be the case: with proper initialization and training techniques, the benefits of very deep transformers are shown to carry over to hard structural prediction tasks, even using small datasets. In particular, we successfully train 48 layers of transformers for a semantic parsing task. These comprise 24 fine-tuned transformer layers from pre-trained RoBERTa and 24 relation-aware transformer layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain the state of the art performance on the challenging cross-domain Text-to-SQL semantic parsing benchmark Spider. We achieve this by deriving a novel Data dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis demonstrates that increasing the depth of the transformer model can help improve generalization on the cases requiring reasoning and structural understanding.

翻訳日:2021-04-18 06:06:09 公開日:2020-12-30

# 不自然な言語推論

Unnatural Language Inference ( http://arxiv.org/abs/2101.00010v1 )

ライセンス: Link先を確認

Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams

(参考訳) 自然言語理解は、大規模な事前学習済みTransformerネットワークの登場によって画期的な転換点を迎えた。これらのモデルは、自然言語推論(NLI)をはじめとするさまざまなタスクで最先端の性能を達成している。多くの研究が、これらのモデルが獲得した大きな表現空間に、ある程度の構文情報と意味情報が符号化されていることを示してきた。しかし、本当に「構文を知っている」と言えるためには、モデルは入力が構文規則に違反していることを認識し、それに応じて推論を行わなければならない。本研究では、RoBERTaやBARTのような最先端のNLIモデルが、単語をランダムに並べ替えた例に対して不変であり、時にはより良い性能さえ示すことを明らかにする。反復的な探索により、元と同じ単語からなる並べ替えられた仮説・前提ペアを含みながら、大規模な事前学習済みモデルやTransformer以前の最先端エンコーダによって完全な精度で分類される、ランダム化版のNLIテストセットを構築できる。この問題は言語やモデルによらず見られるため、その根本原因を調査する。この効果を部分的に緩和するために、簡単な訓練手法を提案する。我々の発見は、自然言語理解モデルと、その進歩を測るために用いられるタスクが、人間のような構文理解を本当に必要としているという考えに疑問を投げかける。

Natural Language Understanding has witnessed a watershed moment with the introduction of large pre-trained Transformer networks. These models achieve state-of-the-art on various tasks, notably including Natural Language Inference (NLI). Many studies have shown that the large representation space imbibed by the models encodes some syntactic and semantic information. However, to really "know syntax", a model must recognize when its input violates syntactic rules and calculate inferences accordingly. In this work, we find that state-of-the-art NLI models, such as RoBERTa and BART are invariant to, and sometimes even perform better on, examples with randomly reordered words. With iterative search, we are able to construct randomized versions of NLI test sets, which contain permuted hypothesis-premise pairs with the same words as the original, yet are classified with perfect accuracy by large pre-trained models, as well as pre-Transformer state-of-the-art encoders. We find the issue to be language and model invariant, and hence investigate the root cause. To partially alleviate this effect, we propose a simple training methodology. Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.

翻訳日:2021-04-18 06:05:47 公開日:2020-12-30

# ホップワイズを考慮した適応グラフ拡散ネットワーク

Adaptive Graph Diffusion Networks with Hop-wise Attention ( http://arxiv.org/abs/2012.15024v1 )

ライセンス: Link先を確認

Chuxiong Sun, Guoshi Wu

(参考訳) グラフニューラルネットワーク(GNN)は近年注目を集め、多くの分野で最先端のパフォーマンスを実現している。より深いGNNは理論上、より深い近隣情報をキャプチャすることができる。しかし、しばしば過剰フィッティングや過剰スムーシングの問題に苦しむ。複雑度と一般化性を維持しつつ,より深い情報を取り込むため,ホップワイズ・アテンション(agdns-ha)を用いた適応型グラフ拡散ネットワークを提案する。異なる順序の複数のホップ近傍のアグリゲーションを単一層に積み重ねる。次に、各ノードに対して学習可能で適応的なホップ対応の助けを借りて統合する。半教師付きノード分類タスクを用いた標準データセットの実験結果から,提案手法は有意な改善が得られた。

Graph Neural Networks (GNNs) have received much attention in recent years and have achieved state-of-the-art performance in many fields. Deeper GNNs can theoretically capture deeper neighborhood information. However, they often suffer from over-fitting and over-smoothing. In order to incorporate deeper information while preserving considerable complexity and generalization ability, we propose Adaptive Graph Diffusion Networks with Hop-wise Attention (AGDNs-HA). We stack multi-hop neighborhood aggregations of different orders into a single layer. Then we integrate them with the help of hop-wise attention, which is learnable and adaptive for each node. Experimental results on the standard dataset with a semi-supervised node classification task show that our proposed methods achieve significant improvements.
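The idea of stacking multi-hop aggregations in one layer and mixing them with per-node attention can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' architecture: the abstract does not give the attention formula, so a fixed `query` vector stands in for learnable parameters, and row-normalised averaging is assumed as the aggregation.

```python
import math


def row_normalize(adj):
    """Turn an adjacency matrix into a row-stochastic transition matrix."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def propagate(trans, feats):
    """One hop of neighbourhood averaging: returns trans @ feats."""
    n, d = len(feats), len(feats[0])
    return [
        [sum(trans[i][j] * feats[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def hopwise_attention_layer(adj, feats, query, num_hops=2):
    """Aggregate hops 0..num_hops, then mix them with per-node softmax weights."""
    trans = row_normalize(adj)
    hops = [feats]
    for _ in range(num_hops):
        hops.append(propagate(trans, hops[-1]))
    outputs, weights = [], []
    for i in range(len(feats)):
        # Score each hop's representation of node i (assumed dot-product scoring).
        scores = [sum(q * x for q, x in zip(query, hop[i])) for hop in hops]
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(w[h] * hops[h][i][k] for h in range(len(hops)))
            for k in range(len(feats[0]))
        ])
    return outputs, weights


# Path graph 0-1-2 with self-loops; features and query are hypothetical.
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, -0.5]
out, attn = hopwise_attention_layer(adj, feats, query, num_hops=2)
```

Each row of `attn` holds one weight per hop for a single node, which is what makes the mixing adaptive per node.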

翻訳日:2021-04-18 06:05:27 公開日:2020-12-30

# infer-avae: 逆変分オートエンコーダに基づく属性推論モデル

Infer-AVAE: An Attribute Inference Model Based on Adversarial Variational Autoencoder ( http://arxiv.org/abs/2012.15005v1 )

ライセンス: Link先を確認

Yadong Zhou, Zhihao Ding, Xiaoming Liu, Chao Shen, Lingling Tong, Xiaohong Guan

(参考訳) ソーシャルネットワーク上のユーザ属性のスパース性に対し、属性推論は、既存のデータとユーザ間のソーシャル接続などの追加情報に基づいて、欠落した属性を推測することを目的としている。近年,変分オートエンコーダ (VAE) が半教師付き方式でこの問題の解決に成功している。しかし、エンコーダが学習した潜在表現には、不十分な情報または無駄な情報が含まれる: i) MLPは、入力データをうまく再構築するが、欠落部分の補完に失敗する; ii) GNNは、社会的つながりに応じて情報をマージするが、GNNに共通の問題である過平滑化に悩まされる。さらに、既存の手法ではデコーダの正則化を無視しており、その結果、十分な推論能力がなく、深刻な過学習に直面している。上記の問題に対処するため,敵対的VAEに基づく属性推論モデル(Infer-AVAE)を提案する。私たちのモデルは、エンコーダ内のMLPとGNNを意図的に統一して、2つの潜在表現を学習します。一方は各ユーザの観測された属性のみを含み、もう一方は近傍からの追加情報を集約する。次に,2つの表現の違いを活用するために敵対的ネットワークを訓練し、頑健な表現を得るためにMLPを用いてGNNを誘導する敵対的学習を行う。さらに、識別器としてデコーダを特に訓練するために、損失関数に相互情報量の制約を導入する。したがって、属性推論の表現における補助情報をよりよく活用することができる。実世界のソーシャルネットワークデータセットに基づく実験結果から, 本モデルは最先端手法を精度で平均7.0%上回ることが示された。

Facing the sparsity of user attributes on social networks, attribute inference aims at inferring missing attributes based on existing data and additional information such as social connections between users. Recently, Variational Autoencoders (VAEs) have been successfully applied to solve the problem in a semi-supervised way. However, the latent representations learned by the encoder contain either insufficient or useless information: i) MLPs can successfully reconstruct the input data but fail to complete the missing part; ii) GNNs merge information according to social connections but suffer from over-smoothing, which is a common problem with GNNs. Moreover, existing methods neglect regulating the decoder; as a result, it lacks adequate inference ability and faces severe overfitting. To address the above issues, we propose an attribute inference model based on adversarial VAE (Infer-AVAE). Our model deliberately unifies MLPs and GNNs in the encoder to learn dual latent representations: one contains only the observed attributes of each user, while the other aggregates extra information from the neighborhood. Then, an adversarial network is trained to leverage the differences between the two representations, and adversarial training is conducted to guide GNNs using MLPs for robust representations. Furthermore, a mutual information constraint is introduced in the loss function to specifically train the decoder as a discriminator. Thus, it can make better use of auxiliary information in the representations for attribute inference. Based on real-world social network datasets, experimental results demonstrate that our model outperforms the state of the art by 7.0% in accuracy on average.

翻訳日:2021-04-18 06:04:55 公開日:2020-12-30

# 3層ニューラルネットワークのSGD分布ダイナミクス

SGD Distributional Dynamics of Three Layer Neural Networks ( http://arxiv.org/abs/2012.15036v1 )

ライセンス: Link先を確認

Victor Luo, Yazhen Wang and Glenn Fung

(参考訳) ビッグデータ分析の台頭に伴い、多層ニューラルネットワークは最も強力な機械学習手法の1つとして浮上した。しかし、その理論的な数学的性質はまだ完全には理解されていない。ニューラルネットワークのトレーニングには、通常確率的勾配降下法(SGD)を用いて行われる非凸目的関数の最適化が必要である。本稿では,Mei et al. (2018) の平均場結果を、1つの隠れ層を持つ2層ニューラルネットワークから2つの隠れ層を持つ3層ニューラルネットワークへ拡張することを目指す。SGDのダイナミクスが非線形偏微分方程式の集合によって捉えられることを示し、2つの隠れ層における重みの分布が独立であることを証明する。シミュレーションと実世界データに基づく探索的な取り組みについても詳述する。

With the rise of big data analytics, multi-layer neural networks have surfaced as one of the most powerful machine learning methods. However, their theoretical mathematical properties are still not fully understood. Training a neural network requires optimizing a non-convex objective function, typically done using stochastic gradient descent (SGD). In this paper, we seek to extend the mean field results of Mei et al. (2018) from two-layer neural networks with one hidden layer to three-layer neural networks with two hidden layers. We will show that the SGD dynamics is captured by a set of non-linear partial differential equations, and prove that the distributions of weights in the two hidden layers are independent. We will also detail exploratory work done based on simulation and real-world data.

翻訳日:2021-04-18 06:04:24 公開日:2020-12-30

# MM-FSOD: メタ学習とメトリック学習を統合した少数ショット物体検出

MM-FSOD: Meta and metric integrated few-shot object detection ( http://arxiv.org/abs/2012.15159v1 )

ライセンス: Link先を確認

Yuewen Li, Wenquan Feng, Shuchang Lyu, Qi Zhao, Xuliang Li

(参考訳) オブジェクト検出タスクでは、cnn(convolutional neural networks)モデルはトレーニングプロセスにおいて、常に大量の注釈付き例を必要とします。高価なアノテーションの依存性を減らすために、少数のオブジェクト検出が研究の焦点となっている。本稿では,メタラーニングとメトリック学習を統合した効果的なオブジェクト検出フレームワーク(MM-FSOD)を提案する。我々のモデルは、トレーニングサンプルにない新しいカテゴリを正確に認識できるクラスに依存しない検出モデルである。具体的には,クラス内平均プロトタイプを学習するためのメタ表現モジュール(MRモジュール)を提案する。 MRモジュールは、高度な特徴を再構築する能力を得るためにメタラーニング法で訓練される。クエリロア機能を持つサポートプロトタイプ間の特徴の類似性をさらに高めるために,分類器として機能するピアソン計量モジュール(prモジュール)を提案する。これまでの一般的な計量法と比較すると、コサイン距離メートル法である。 prモジュールは、モデルを識別的な埋め込み空間にアライメント可能にする。ベンチマークデータセット FSOD, MS COCO, PASCAL VOC に関する広範な実験を行い, 本モデルの有効性と有効性を示す。従来の手法と比較して、MM-FSODは最先端(SOTA)結果が得られる。

In the object detection task, CNN (convolutional neural network) models always need a large number of annotated examples in the training process. To reduce the dependency on expensive annotations, few-shot object detection has become an increasing research focus. In this paper, we present an effective object detection framework (MM-FSOD) that integrates metric learning and meta-learning to tackle the few-shot object detection task. Our model is a class-agnostic detection model that can accurately recognize new categories that do not appear in the training samples. Specifically, to quickly learn the features of new categories without a fine-tuning process, we propose a meta-representation module (MR module) to learn intra-class mean prototypes. The MR module is trained with a meta-learning method to obtain the ability to reconstruct high-level features. To further measure the similarity of features between support prototypes and query RoI features, we propose a Pearson metric module (PR module), which serves as a classifier. Compared to the previously common metric, the cosine distance metric, the PR module enables the model to align features into a discriminative embedding space. We conduct extensive experiments on the benchmark datasets FSOD, MS COCO, and PASCAL VOC to demonstrate the feasibility and efficiency of our model. Compared with previous methods, MM-FSOD achieves state-of-the-art (SOTA) results.
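The contrast between a Pearson-based metric and cosine similarity can be made concrete with a short sketch. Pearson correlation equals the cosine similarity of mean-centred vectors, so it ignores a constant offset added to every feature, whereas plain cosine similarity does not. The vectors below are made-up stand-ins for a support prototype and a query RoI feature; how the PR module uses the score inside the detector is not shown in the abstract and is not modelled here.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Pearson correlation: cosine similarity after subtracting each vector's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical feature vectors.
prototype = [0.2, 0.9, 0.4, 0.7]
roi = [0.1, 0.8, 0.5, 0.6]
shifted_roi = [v + 5.0 for v in roi]  # same pattern, large constant offset

pearson_gap = abs(pearson_similarity(prototype, roi) - pearson_similarity(prototype, shifted_roi))
cosine_gap = abs(cosine_similarity(prototype, roi) - cosine_similarity(prototype, shifted_roi))
```

Here `pearson_gap` is zero up to rounding, while `cosine_gap` is clearly non-zero, which is one plausible reason to prefer the Pearson form when feature offsets vary between images.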

翻訳日:2021-04-18 06:04:13 公開日:2020-12-30

# laif:ai、ドイツのディープラーニング、suetterlinの文字認識と生成

LAIF: AI, Deep Learning for Germany Suetterlin Letter Recognition and Generation ( http://arxiv.org/abs/2101.10450v1 )

ライセンス: Link先を確認

Enkhtogtokh Togootogtokh, Christian Klasen

(参考訳) ディープラーニングAI技術の初期の実装として成功したのは、文字認識だった。人工知能(AI)の最近の進歩により、手書き文字認識や自動生成といった複雑な問題に対して、より強固な技術がもたらされる。本研究では,ドイツにおける文字認識・生成のためのLudwig AI Framework(LAIF)というディープラーニングフレームワークを提案する。 Suetterlin文字を認識するために,我々は深層畳み込みニューラルネットワークを提案する。本研究は,手書き文字のハードコピーのラベル付けに膨大なコストと深層モデルのトレーニング用データがないことから,手書き文字を合成データとして生成する深部生成逆数ネットワークを用いた手法についても紹介する。ソースコードはhttps://github.com/enkhtogtokh/LAIFリポジトリにある。

One of the successful early implementations of deep learning AI technology was letter recognition. Recent breakthroughs in artificial intelligence (AI) bring more solid technology for complex problems like handwritten letter recognition and even automatic generation of handwritten letters. In this research, we propose a deep learning framework called Ludwig AI Framework (LAIF) for German Suetterlin letter recognition and generation. To recognize Suetterlin letters, we propose a deep convolutional neural network. Because there is a lack of large amounts of data to train deep models, and labeling existing hard copies of handwritten letters is very costly, we also introduce a methodology based on a deep generative adversarial network to generate handwritten letters as synthetic data. The main source code is in the https://github.com/enkhtogtokh/LAIF repository.

翻訳日:2021-04-18 06:03:53 公開日:2020-12-30

# 時系列予測のための局所モデルアンサンブル

Ensembles of Localised Models for Time Series Forecasting ( http://arxiv.org/abs/2012.15059v1 )

ライセンス: Link先を確認

Rakshitha Godahewa, Kasun Bandara, Geoffrey I. Webb, Slawek Smyl, Christoph Bergmeir

(参考訳) 今日では大量のデータが利用可能になっているため、Global Forecasting Models (GFM)として知られる、時系列の集合にまたがって訓練された予測モデルは、孤立した時系列で動作する従来の単変量予測モデルよりも定期的に優れている。GFMは通常、すべての時系列で同じパラメータのセットを共有するため、特にデータセットが不均一な状況において、特定の時系列に十分に局所化されないという問題があることが多い。本稿では,汎用的なGFMと単変量モデルにアンサンブル手法を用いて、この問題を解決する方法について検討する。私たちの研究は,時系列をクラスタリングしてクラスタごとに別々のサブモデルを訓練する手法,いわゆる専門家アンサンブルアプローチ,グローバルモデルとローカルモデルのヘテロジニアスアンサンブルの構築など,関連する現在のアプローチを体系化し,比較する。これらのアプローチのギャップを埋めて、異なる基盤となるGFMモデルタイプに一般化する。次に,クラスタ数とクラスタのシードを変化させて得られる異なるクラスタ上で複数のGFMを訓練する,クラスタ化アンサンブルの新たな手法を提案する。フィードフォワードニューラルネットワーク,リカレントニューラルネットワーク,プール回帰モデルを基礎となるGFMとして、6つの公開データセットで評価した結果,提案モデルはベースラインのGFMモデルや単変量予測手法よりも有意に高い精度を達成できることがわかった。

With large quantities of data typically available nowadays, forecasting models that are trained across sets of time series, known as Global Forecasting Models (GFM), are regularly outperforming traditional univariate forecasting models that work on isolated series. As GFMs usually share the same set of parameters across all time series, they often have the problem of not being localised enough to a particular series, especially in situations where datasets are heterogeneous. We study how ensembling techniques can be used with generic GFMs and univariate models to solve this issue. Our work systematises and compares relevant current approaches, namely clustering series and training separate submodels per cluster, the so-called ensemble of specialists approach, and building heterogeneous ensembles of global and local models. We fill some gaps in the approaches and generalise them to different underlying GFM model types. We then propose a new methodology of clustered ensembles where we train multiple GFMs on different clusters of series, obtained by changing the number of clusters and cluster seeds. Using Feed-forward Neural Networks, Recurrent Neural Networks, and Pooled Regression models as the underlying GFMs, in our evaluation on six publicly available datasets, the proposed models are able to achieve significantly higher accuracy than baseline GFM models and univariate forecasting methods.

翻訳日:2021-04-18 06:03:40 公開日:2020-12-30

# 公正制約下でのニューラルネットワーク分類器の訓練

Provably Training Neural Network Classifiers under Fairness Constraints ( http://arxiv.org/abs/2012.15274v1 )

ライセンス: Link先を確認

You-Lin Chen, Zhaoran Wang, Mladen Kolar

(参考訳) 公正な制約下での分類器の訓練は、道徳的、法的、ビジネス上の理由により、機械学習コミュニティで注目を集めている。しかし、アルゴリズムの公正性に対処する最近のいくつかの研究は、非凸性や人種や性別などの保護されたグループ間での差別化不可能な公正性基準によるロジスティック回帰やサポートベクターマシンのような単純なモデルにのみ焦点を当てている。ニューラルネットワークは、近年最も広く使われている分類モデルであり、理論的保証がない。本稿では,ニューラルネットワークにおけるアルゴリズムフェアネスの文献の欠如を補うことを目的とする。特に,過パラメータ化ニューラルネットワークが公平性制約を満たしていることを示す。公平なニューラルネットワーク分類器を構築する上で重要な要素は、ニューラルネットワークと関連するアプリケーションのオンライン学習に独立して関心を持つ可能性のある、オーバーパラメータ化体制におけるニューラルネットワークの非回帰分析を確立することである。

Training a classifier under fairness constraints has gotten increasing attention in the machine learning community thanks to moral, legal, and business reasons. However, several recent works addressing algorithmic fairness have only focused on simple models such as logistic regression or support vector machines due to non-convex and non-differentiable fairness criteria across protected groups, such as race or gender. Neural networks, the most widely used models for classification nowadays, are precluded and lack theoretical guarantees. This paper aims to fill this missing but crucial part of the literature of algorithmic fairness for neural networks. In particular, we show that overparametrized neural networks could meet the fairness constraints. The key ingredient of building a fair neural network classifier is establishing no-regret analysis for neural networks in the overparameterization regime, which may be of independent interest in the online learning of neural networks and related applications.

翻訳日:2021-04-18 06:03:17 公開日:2020-12-30

# Riesz表現子の逆推定

Adversarial Estimation of Riesz Representers ( http://arxiv.org/abs/2101.00009v1 )

ライセンス: Link先を確認

Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

(参考訳) 任意の関数空間内の線形汎関数のriesz表現子を推定する逆アプローチを提案する。リース表現器と近似誤差を近似するために用いられる関数空間の局所化ラデマッハ複雑性に基づくオラクルの不等式を証明する。これらの不等式は、高次元スパース線型関数、ニューラルネットワーク、カーネルヒルベルト空間の再現など、多くの関心関数空間に対する高速有限サンプル平均二乗誤差率を意味する。我々のアプローチは、最近導入された機械学習技術でRiesz表現子を推定する新しい方法を提供する。半パラメトリックモデルにおける構造・因果パラメータの非バイアス化,モーメント方程式の自動直交化,および資産価格の文脈における確率的割引係数の推定において,我々の推定器をどのように利用できるかを示す。

We provide an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces. We prove oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and the approximation error. These inequalities imply fast finite sample mean-squared-error rates for many function spaces of interest, such as high-dimensional sparse linear functions, neural networks and reproducing kernel Hilbert spaces. Our approach offers a new way of estimating Riesz representers with a plethora of recently introduced machine learning techniques. We show how our estimator can be used in the context of de-biasing structural/causal parameters in semi-parametric models, for automated orthogonalization of moment equations and for estimating the stochastic discount factor in the context of asset pricing.

翻訳日:2021-04-18 06:03:03 公開日:2020-12-30

# 3D-UNetを用いたMRI脳腫瘍セグメント化と不確実性評価

MRI brain tumor segmentation and uncertainty estimation using 3D-UNet architectures ( http://arxiv.org/abs/2012.15294v1 )

ライセンス: Link先を確認

Laura Mora Ballestar and Veronica Vilaplana

(参考訳) 3次元磁気共鳴画像(MRI)における脳腫瘍のセグメンテーションの自動化は、疾患の診断と治療を評価する鍵となる。近年では、畳み込みニューラルネットワーク(CNN)がこのタスクの結果を改善している。しかし、3D-CNNでは高いメモリ消費が依然として問題となっている。また,医療診断において特に重要な不確実性情報を含んでいない方法が多い。本研究は,パッチベースの手法で訓練した3Dエンコーダデコーダアーキテクチャについて検討し,メモリ消費を低減し,不均衡なデータの影響を低減する。訓練した異なるモデルを使用して、各モデルの特性を活用するアンサンブルを生成し、性能を向上する。また,テスト時ドロップアウト (TTD) とテスト時データ拡張 (TTA) をそれぞれ用いて, 認識論的 (epistemic) 不確実性と偶然的 (aleatoric) 不確実性の両方のボクセル単位の不確実性情報を導入する。さらに、セグメンテーションの精度を高めるためのハイブリッドアプローチも提案する。本研究で提案するモデルと不確実性推定は,BraTS'20 Challengeのタスク1と3(腫瘍のセグメンテーションと不確実性推定)で使用された。

Automation of brain tumor segmentation in 3D magnetic resonance images (MRIs) is key to assessing the diagnosis and treatment of the disease. In recent years, convolutional neural networks (CNNs) have shown improved results in the task. However, high memory consumption is still a problem in 3D-CNNs. Moreover, most methods do not include uncertainty information, which is especially critical in medical diagnosis. This work studies 3D encoder-decoder architectures trained with patch-based techniques to reduce memory consumption and decrease the effect of unbalanced data. The different trained models are then used to create an ensemble that leverages the properties of each model, thus increasing the performance. We also introduce voxel-wise uncertainty information, both epistemic and aleatoric, using test-time dropout (TTD) and test-time data augmentation (TTA), respectively. In addition, a hybrid approach is proposed that helps increase the accuracy of the segmentation. The model and uncertainty estimation measurements proposed in this work have been used in the BraTS'20 Challenge for tasks 1 and 3, regarding tumor segmentation and uncertainty estimation.
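Test-time dropout, mentioned above as the source of the epistemic uncertainty estimate, can be sketched in a few lines: dropout stays active at inference, the same input is passed through the model many times, and the spread of the outputs is read as uncertainty. The tiny linear "model", its weights, the feature values, and the number of passes below are all hypothetical and stand in for a 3D segmentation network.

```python
import random
import statistics


def forward_with_dropout(x, weights, drop_prob, rng):
    """One stochastic forward pass: each weight is dropped with probability drop_prob."""
    kept = [
        0.0 if rng.random() < drop_prob else w / (1.0 - drop_prob)  # inverted dropout scaling
        for w in weights
    ]
    return sum(w * xi for w, xi in zip(kept, x))


def mc_dropout_predict(x, weights, drop_prob=0.5, passes=200, seed=0):
    """Return the mean prediction and its variance over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, weights, drop_prob, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.pvariance(samples)


# Hypothetical features for one voxel and hypothetical model weights.
voxel_features = [0.8, 0.1, 0.5, 0.3]
weights = [1.2, -0.7, 0.4, 0.9]

mean_pred, variance = mc_dropout_predict(voxel_features, weights)
_, no_dropout_variance = mc_dropout_predict(voxel_features, weights, drop_prob=0.0)
```

With dropout disabled every pass gives the same output, so the variance collapses to zero; with dropout enabled the variance gives a per-voxel uncertainty score.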

翻訳日:2021-04-18 06:02:47 公開日:2020-12-30

# 説明可能性:医療画像におけるバックドア攻撃

Explainability Matters: Backdoor Attacks on Medical Imaging ( http://arxiv.org/abs/2101.00008v1 )

ライセンス: Link先を確認

Munachiso Nwadike, Takumi Miyawaki, Esha Sarkar, Michail Maniatakos, Farah Shamout

(参考訳) 深層ニューラルネットワークは、モデルトレーニングの前にトレーニングセットに簡単に導入できるバックドア攻撃に対して脆弱であることが示されている。最近の研究は、自然画像やおもちゃのデータセットに対するバックドア攻撃の調査に焦点を当てている。その結果、バックドアの正確な影響は、医療画像などの複雑な実世界応用においてはまだ完全には理解されていない。本稿では,胸部X線写真を用いたマルチラベル疾患分類タスクに対するバックドア攻撃の影響を,攻撃者がトレーニングデータセットを操作して攻撃を実行することを前提として検討する。最先端アーキテクチャの広範な評価は、トレーニングセットに数ピクセルの摂動を持つイメージを導入することで、アタッカーがトレーニング手順に関与せずにバックドアをうまく実行できることを示しています。単純な3$\times$3ピクセルトリガは、感染した画像のセットの受信操作特性(AUROC)曲線の下で最大1.00エリアを達成することができる。クリーンな画像のセットでは、バックドアニューラルネットワークは最大0.85AUROCを達成することができ、攻撃のステルス性を強調した。深層学習に基づく診断システムの使用が臨床実践で増加するにつれ,空間的局所化されたバックドアを推論時間で識別できるため,この文脈では説明可能性が不可欠であることを示す。

Deep neural networks have been shown to be vulnerable to backdoor attacks, which could be easily introduced to the training set prior to model training. Recent work has focused on investigating backdoor attacks on natural images or toy datasets. Consequently, the exact impact of backdoors is not yet fully understood in complex real-world applications, such as in medical imaging where misdiagnosis can be very costly. In this paper, we explore the impact of backdoor attacks on a multi-label disease classification task using chest radiography, with the assumption that the attacker can manipulate the training dataset to execute the attack. Extensive evaluation of a state-of-the-art architecture demonstrates that by introducing images with few-pixel perturbations into the training set, an attacker can execute the backdoor successfully without having to be involved with the training procedure. A simple 3$\times$3 pixel trigger can achieve up to 1.00 Area Under the Receiver Operating Characteristic (AUROC) curve on the set of infected images. In the set of clean images, the backdoored neural network could still achieve up to 0.85 AUROC, highlighting the stealthiness of the attack. As the use of deep learning based diagnostic systems proliferates in clinical practice, we also show how explainability is indispensable in this context, as it can identify spatially localized backdoors in inference time.

翻訳日:2021-04-18 06:02:29 公開日:2020-12-30

# リザーバ・トランスフォーマー

Reservoir Transformer ( http://arxiv.org/abs/2012.15045v1 )

ライセンス: Link先を確認

Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

(参考訳) いくつかの層がランダムに初期化され、一度も更新されない場合でも、トランスフォーマーは印象的な性能を得ることを示す。機械学習における古くから確立されたアイデアに着想を得て,通常のトランスフォーマー層の間に配置した様々な非線形の「リザーバ」層を探索し,様々な機械翻訳と(マスク)言語モデリングタスクにおいて,収束までの実時間(wall-clock)計算時間と全体的な性能の両方が改善することを示す。

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.

翻訳日:2021-04-18 06:01:55 公開日:2020-12-30

# 語彙単純化による事前学習言語モデルの拡張

Enhancing Pre-trained Language Model with Lexical Simplification ( http://arxiv.org/abs/2012.15070v1 )

ライセンス: Link先を確認

Rongzhou Bao, Jiayi Wang, Zhuosheng Zhang, Hai Zhao

(参考訳) 人間の読者と事前学習された言語モデル(PrLM)の両方にとって、語彙の多様性は、与えられた文の基本的な意味を理解する際に混乱と不正確をもたらす可能性がある。複雑な単語を簡単な代替語で置換することにより、語彙単純化(LS)はそのような語彙の多様性を減らし、文の理解性を向上する。本稿では,LSを活用し,テキスト分類におけるPrLMの性能を効果的に向上する手法を提案する。所定の文に対して規則に基づく単純化処理を適用する。 PrLMは、簡略化されたバージョンからの補助的な入力で、与えられた文の実際のラベルを予測する。強力なPrLM(BERTとELECTRA)をベースラインとして,テキスト分類タスクの性能をさらに向上させることができる。

For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple alternatives, lexical simplification (LS) is a recognized method to reduce such lexical diversity, and therefore to improve the understandability of sentences. In this paper, we leverage LS and propose a novel approach which can effectively improve the performance of PrLMs in text classification. A rule-based simplification process is applied to a given sentence. PrLMs are encouraged to predict the real label of the given sentence with auxiliary inputs from the simplified version. Using strong PrLMs (BERT and ELECTRA) as baselines, our approach can still further improve the performance in various text classification tasks.

翻訳日:2021-04-18 06:01:46 公開日:2020-12-30

# 位置情報分離によるゼロショット翻訳の改善

Improving Zero-Shot Translation by Disentangling Positional Information ( http://arxiv.org/abs/2012.15127v1 )

ライセンス: Link先を確認

Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li

(参考訳) 多言語ニューラルマシン翻訳は、トレーニングで見えない言語ペア間で直接翻訳する能力を示している。ゼロショット翻訳。概念的には魅力的だが、しばしば低い出力品質に悩まされる。新しい翻訳方向への一般化の難しさは、モデル表現が訓練で見られる言語対に非常に特有であることを示している。言語固有の表現を引き起こす主な要因が入力トークンの位置対応であることを示す。エンコーダ層の残余接続をなくすことで,これを容易に緩和できることを示す。この修正により、教師付き方向の品質を維持しながら、ゼロショット翻訳で最大18.5 BLEUポイントを得ることができる。提案したモデルがピボットベースの翻訳より優れている関連言語間では特に改善が顕著である。さらに,このアプローチでは,翻訳範囲を大きく拡大する新しい言語の統合が容易になる。隠れた層出力の徹底的な検査により、我々のアプローチが言語に依存しない表現につながることを示す。

Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. By thorough inspections of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations.

翻訳日:2021-04-18 06:01:32 公開日:2020-12-30

# オープンドメイン質問応答のためのメモリ効率のよいベースライン

A Memory Efficient Baseline for Open Domain Question Answering ( http://arxiv.org/abs/2012.15156v1 )

ライセンス: Link先を確認

Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave

(参考訳) 近年,高密度表現に基づく検索システムにより,オープンドメイン質問応答や関連するタスクが大幅に改善されている。このアプローチは非常に効果的だが、知識ソース全体の密度の高いベクトルをメモリに保持する必要があるため、メモリ集約型である。本稿では,高密度レトリバー・リーダーシステムのメモリフットプリントを低減する方法について検討する。本稿では,次元削減,ベクトル量子化,通過フィルタリングの3つの手法を検討する。我々は,TriviaQAとNaturalQuestionsという2つの質問応答ベンチマークに対するアプローチを評価し,6Gb未満のメモリで競合するシステムを実現できることを示した。

Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks. While very effective, this approach is also memory intensive, as the dense vectors for the whole knowledge source need to be kept in memory. In this paper, we study how the memory footprint of dense retriever-reader systems can be reduced. We consider three strategies to reduce the index size: dimension reduction, vector quantization and passage filtering. We evaluate our approach on two question answering benchmarks: TriviaQA and NaturalQuestions, showing that it is possible to get competitive systems using less than 6Gb of memory.
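A back-of-the-envelope calculation shows how the three strategies named above act on the size of a dense index. All numbers in this sketch are assumptions chosen for illustration (passage count, 768-dimensional float32 vectors, a reduced dimension of 256, one byte per dimension after quantization, and keeping half of the passages); none of them come from the paper, and one byte per dimension is a simplification of real vector quantization schemes.

```python
def index_size_bytes(num_passages, dim, bytes_per_value):
    """Memory needed to hold one dense vector per passage."""
    return num_passages * dim * bytes_per_value


NUM_PASSAGES = 1_000_000  # hypothetical knowledge-source size

# Baseline: 768-dimensional float32 vectors (assumed).
full = index_size_bytes(NUM_PASSAGES, dim=768, bytes_per_value=4)

# Dimension reduction: project vectors down to 256 dimensions (assumed).
reduced = index_size_bytes(NUM_PASSAGES, dim=256, bytes_per_value=4)

# Quantization: store one byte per dimension instead of four (simplified).
quantized = index_size_bytes(NUM_PASSAGES, dim=256, bytes_per_value=1)

# Passage filtering: keep only half of the passages (assumed ratio).
filtered = index_size_bytes(NUM_PASSAGES // 2, dim=256, bytes_per_value=1)
```

Under these assumptions the index shrinks from about 3.07 GB to 128 MB, and the savings multiply because each strategy reduces a different factor of the product.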

翻訳日:2021-04-18 06:01:21 公開日:2020-12-30

# 系列変換(Sequence-to-Sequence)モデルは換字式暗号を解読できるか?

Can Sequence-to-Sequence Models Crack Substitution Ciphers? ( http://arxiv.org/abs/2012.15229v1 )

ライセンス: Link先を確認

Nada Aldarrab and Jonathan May

(参考訳) 歴史的暗号の解読は難しい問題である。対象となる平文の言語が不明な場合があり、暗号文には多くのノイズが含まれうる。最先端の解読法では、平文の言語が既知であると仮定したうえで、ビームサーチとニューラル言語モデルを用いて、与えられた暗号に対する候補平文仮説をスコアリングする。我々は、単純な換字式暗号を解くためのエンドツーエンドの多言語モデルを提案する。合成暗号と実際の歴史的暗号でモデルを検証し、提案手法が明示的な言語識別なしにテキストを解読でき、なおかつノイズに対して頑健であることを示す。

Decipherment of historical ciphers is a challenging problem. The language of the target plaintext might be unknown, and ciphertext can have a lot of noise. State-of-the-art decipherment methods use beam search and a neural language model to score candidate plaintext hypotheses for a given cipher, assuming plaintext language is known. We propose an end-to-end multilingual model for solving simple substitution ciphers. We test our model on synthetic and real historical ciphers and show that our proposed method can decipher text without explicit language identification and can still be robust to noise.
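For readers unfamiliar with the task, a simple substitution cipher replaces each plaintext letter with a fixed cipher letter under a one-to-one key. The sketch below builds such a key, enciphers a message, and inverts the key; it illustrates the problem setup only, not the sequence-to-sequence solver. The helper names, the seed, and the message are assumptions for illustration.

```python
import random
import string


def make_key(seed):
    """Build a one-to-one mapping from each lowercase letter to a cipher letter."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def apply_key(text, key):
    """Substitute every character found in the key; leave spaces and others unchanged."""
    return "".join(key.get(ch, ch) for ch in text)


def invert_key(key):
    """Swap the mapping direction so that ciphertext maps back to plaintext."""
    return {cipher: plain for plain, cipher in key.items()}


key = make_key(seed=7)
plaintext = "attack at dawn"
ciphertext = apply_key(plaintext, key)
recovered = apply_key(ciphertext, invert_key(key))
```

A decipherment model sees only `ciphertext` and has to recover `plaintext` without the key, ideally without being told which language the plaintext is in.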

翻訳日:2021-04-18 06:01:10 公開日:2020-12-30

# 教師なしラベル対応イベントトリガーと引数分類

Unsupervised Label-aware Event Trigger and Argument Classification ( http://arxiv.org/abs/2012.15243v1 )

ライセンス: Link先を確認

Hongming Zhang, Haoyu Wang, Dan Roth

(参考訳) イベントを識別し、事前に定義されたイベントタイプにマッピングすることは、長い間、自然言語処理の重要な問題でした。これまでの作業のほとんどは、イベントタイプのラベルに含まれる意味を無視しながら、労働集約的およびドメイン固有のアノテーションに大きく依存していました。その結果、学習したモデルは、新しいイベントタイプが導入されうる新しいドメインに効果的に一般化することができません。本稿では,まず利用可能なツール(SRLなど)でイベントを識別し,提案する教師なし分類モデルを用いて,事前定義されたイベントタイプに自動的にマッピングする,教師なしイベント抽出パイプラインを提案する。アノテーション付きデータに頼るのではなく、モデルが特定したイベントのセマンティクスとイベントタイプラベルのセマンティクスを一致させるのです。具体的には、事前訓練された言語モデルを利用して、イベントトリガと引数の両方の事前定義された型を文脈的に表現する。表現類似性によって特定されたイベントを対象の型にマップした後、イベントオントロジー(例えば、引数型 "Victim" はイベント型 "Attack" の引数としてのみ現れる)を、予測を規則化するためのグローバルな制約として使用します。提案手法は、33のトリガ型と22の引数型を持つACE-2005データセットでテストした場合、非常に効果的であることが示されている。アノテーションを使わずに、83%のトリガと54%の引数を正しい型にマッピングすることに成功し、従来のゼロショット手法の性能をほぼ2倍にした。

Identifying events and mapping them to pre-defined event types has long been an important natural language processing problem. Most previous work has been heavily relying on labor-intensive and domain-specific annotations while ignoring the semantic meaning contained in the labels of the event types. As a result, the learned models cannot effectively generalize to new domains, where new event types could be introduced. In this paper, we propose an unsupervised event extraction pipeline, which first identifies events with available tools (e.g., SRL) and then automatically maps them to pre-defined event types with our proposed unsupervised classification model. Rather than relying on annotated data, our model matches the semantics of identified events with those of event type labels. Specifically, we leverage pre-trained language models to contextually represent pre-defined types for both event triggers and arguments. After we map identified events to the target types via representation similarity, we use the event ontology (e.g., argument type "Victim" can only appear as the argument of event type "Attack") as global constraints to regularize the prediction. The proposed approach is shown to be very effective when tested on the ACE-2005 dataset, which has 33 trigger and 22 argument types. Without using any annotation, we successfully map 83% of the triggers and 54% of the arguments to the correct types, almost doubling the performance of previous zero-shot approaches.

翻訳日:2021-04-18 06:00:59 公開日:2020-12-30

# 生成型adversarial networkを用いた教師なし深部画像強調

Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network ( http://arxiv.org/abs/2012.15020v1 )

ライセンス: Link先を確認

Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, and Sam Kwong

(参考訳) 画像の美的品質を改善することは挑戦的であり、一般の人々から強く求められている。この問題に対処するため,既存のアルゴリズムの多くは,低品質の写真とそれに対応する専門家修正バージョンからなるペアデータから自動写真強調器を学習する教師付き学習法に基づいている。しかし、専門家が修正した写真のスタイルや特徴は一般ユーザーのニーズや好みに合わない可能性がある。本稿では,多数のペア画像で学習するのではなく,所望の特性を持つ画像の集合から,対応する画像間マッピングを教師なしで学習する,教師なし画像強調生成敵対ネットワーク(UEGAN)を提案する。提案モデルは,よりリッチなグローバル・ローカル特徴を捉えるために,変調機構とアテンション機構を組み込んだ単一の深層GANに基づいている。提案モデルに基づいて,教師なし画像強調を扱うための2つの損失を導入する。(1)強調画像と入力画像の内容が同じであることを保証するために,事前学習したVGGネットワークの特徴領域におけるL2正則化として定義される忠実度損失,(2)入力画像に所望の特性を与えるために,相対論的ヒンジ敵対的損失として定式化された品質損失である。定量的,定性的な結果の両方が,提案モデルが画像の美的品質を効果的に改善することを示す。コードは https://github.com/eezkni/UEGAN で入手できる。

Improving the aesthetic quality of images is challenging and eagerly sought by the public. To address this problem, most existing algorithms are based on supervised learning methods that learn an automatic photo enhancer from paired data, which consists of low-quality photos and corresponding expert-retouched versions. However, the style and characteristics of photos retouched by experts may not meet the needs or preferences of general users. In this paper, we present an unsupervised image enhancement generative adversarial network (UEGAN), which learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner, rather than learning on a large number of paired images. The proposed model is based on a single deep GAN that embeds modulation and attention mechanisms to capture richer global and local features. Based on the proposed model, we introduce two losses to deal with unsupervised image enhancement: (1) a fidelity loss, which is defined as an L2 regularization in the feature domain of a pre-trained VGG network to ensure that the content of the enhanced image and the input image is the same, and (2) a quality loss, which is formulated as a relativistic hinge adversarial loss to endow the input image with the desired characteristics. Both quantitative and qualitative results show that the proposed model effectively improves the aesthetic quality of images. Our code is available at: https://github.com/eezkni/UEGAN.
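The abstract names a relativistic hinge adversarial loss without giving its formula. The sketch below assumes the commonly used relativistic average hinge formulation, in which each discriminator score is compared against the mean score of the opposite class; whether UEGAN uses exactly this form is an assumption, and the function names and scores are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def relativistic_hinge_d_loss(real_scores, fake_scores):
    """Discriminator side: real scores should exceed the average fake score by a margin."""
    mean_real, mean_fake = mean(real_scores), mean(fake_scores)
    real_term = mean([max(0.0, 1.0 - (r - mean_fake)) for r in real_scores])
    fake_term = mean([max(0.0, 1.0 + (f - mean_real)) for f in fake_scores])
    return real_term + fake_term


def relativistic_hinge_g_loss(real_scores, fake_scores):
    """Generator side: the same comparison with the roles of real and fake swapped."""
    mean_real, mean_fake = mean(real_scores), mean(fake_scores)
    fake_term = mean([max(0.0, 1.0 - (f - mean_real)) for f in fake_scores])
    real_term = mean([max(0.0, 1.0 + (r - mean_fake)) for r in real_scores])
    return fake_term + real_term


# Hypothetical discriminator outputs for desired-quality images and enhanced images.
real_scores = [2.0, 3.0]
fake_scores = [-2.0, -1.0]
d_loss = relativistic_hinge_d_loss(real_scores, fake_scores)
g_loss = relativistic_hinge_g_loss(real_scores, fake_scores)
```

With these scores the discriminator already separates the two sets by more than the margin, so its loss is zero while the generator loss is large, which is the pressure that would push enhanced images toward the desired characteristics.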

翻訳日:2021-04-18 06:00:32 公開日:2020-12-30

# NBNet: 部分空間射影による画像デノイジングのためのノイズ基底学習

NBNet: Noise Basis Learning for Image Denoising with Subspace Projection ( http://arxiv.org/abs/2012.15028v1 )

ライセンス: Link先を確認

Shen Cheng, Yuzhi Wang, Haibin Huang, Donghao Liu, Haoqiang Fan, Shuaicheng Liu

(参考訳) 本稿では,画像デノイジングのための新しいフレームワークであるNBNetを紹介する。従来の研究と異なり,画像適応射影による雑音低減という新たな視点から,この困難な問題に取り組むことを提案する。具体的には,特徴空間における再構成基底の集合を学習することにより,信号と雑音を分離できるネットワークを訓練することを提案する。その後、信号部分空間の対応する基底を選択し、入力をその空間に射影することにより、画像デノイジングを実現できる。我々の重要な洞察は、特に低照度やテクスチャの弱い領域において、射影が入力信号の局所的な構造を自然に維持できるということだ。この目的に向けて,基底生成と部分空間射影を明示的に学習するために設計された非局所部分空間アテンションモジュールであるSSAを提案する。さらに、エンドツーエンドの画像デノイジング用に設計されたUNet構造のネットワークであるNBNetにSSAを組み込む。SIDDやDNDなどのベンチマークで評価を行い、NBNetは計算コストを大幅に抑えつつ、PSNRとSSIMで最先端の性能を達成する。

In this paper, we introduce NBNet, a novel framework for image denoising. Unlike previous works, we propose to tackle this challenging problem from a new perspective: noise reduction by image-adaptive projection. Specifically, we propose to train a network that can separate signal and noise by learning a set of reconstruction bases in the feature space. Subsequently, image denoising can be achieved by selecting the corresponding basis of the signal subspace and projecting the input into that space. Our key insight is that projection can naturally maintain the local structure of the input signal, especially for areas with low light or weak textures. Towards this end, we propose SSA, a non-local subspace attention module designed explicitly to learn the basis generation as well as the subspace projection. We further incorporate SSA with NBNet, a UNet-structured network designed for end-to-end image denoising. We conduct evaluations on benchmarks, including SIDD and DND, and NBNet achieves state-of-the-art performance on PSNR and SSIM with significantly less computational cost.
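The projection step at the heart of this description can be sketched with plain linear algebra: given basis vectors that span a signal subspace, the input is projected onto that subspace and the remainder is treated as noise. In NBNet the basis is predicted by the network for each image; in this sketch it is a fixed, hypothetical pair of vectors, and Gram-Schmidt orthonormalisation is an implementation choice made here for simplicity.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis, eps=1e-12):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis of the same subspace."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors that add no new direction
            ortho.append([wi / norm for wi in w])
    return ortho


def project(x, basis):
    """Orthogonal projection of x onto the subspace spanned by basis."""
    out = [0.0] * len(x)
    for q in orthonormalize(basis):
        c = dot(x, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


# Hypothetical signal basis and noisy feature vector.
basis = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
noisy = [2.0, 0.5, -1.0]
denoised = project(noisy, basis)
residual = [n - d for n, d in zip(noisy, denoised)]
```

The `residual` is orthogonal to every basis vector, so whatever lies in the signal subspace is preserved exactly and only the out-of-subspace component is removed.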

翻訳日:2021-04-18 06:00:09 公開日:2020-12-30

# SkiNet: 不確実性推定と説明可能性を備えた皮膚病変診断のためのディープラーニングソリューション

SkiNet: A Deep Learning Solution for Skin Lesion Diagnosis with Uncertainty Estimation and Explainability ( http://arxiv.org/abs/2012.15049v1 )

ライセンス: Link先を確認

Rajeev Kumar Singh, Rohan Gorantla, Sai Giridhar Allada, Narra Pratap

(参考訳) 皮膚がんは最も一般的なヒト悪性腫瘍であると考えられている。アメリカでは毎年500万件の新しい皮膚がんの症例が記録されている。皮膚病変の早期診断と評価は臨床的に非常に重要であるが, 発展途上国では皮膚科医と患者の比率が著しく低下している。そこで,SkiNet として知られる深層学習型アーキテクチャは,より高速なスクリーニングソリューションと,臨床診断過程において新たに訓練された医師に支援を提供することを目的としている。スキレットの設計と開発の主な動機はホワイトボックスソリューションを提供することであり、医療従事者によるコンピュータ支援診断システムの普及に不可欠な信頼と解釈の重大な問題に対処することである。 SkiNetは2段階のパイプラインで、病変のセグメンテーションに続いて、病変の分類を行う。提案手法では,モンテカルロ・ドロップアウト法とテスト時間拡張法を用いて認識論的不確かさを推定し,塩分に基づく手法を用いて深層学習モデルのポストホックな説明を行った。公開データセットISIC-2018は実験およびアブレーション研究に使用される。その結果、モデルの信頼性と透明性をモデル予測に組み込むことで、医療従事者の懐疑を緩和する一方で、従来のベンチマーク上でのモデルの堅牢性を確立した。

Skin cancer is considered to be the most common human malignancy. Around 5 million new cases of skin cancer are recorded in the United States annually. Early identification and evaluation of skin lesions is of great clinical significance, but the disproportionate dermatologist-patient ratio poses a significant problem in most developing nations. Therefore, a deep-learning-based architecture, known as SkiNet, is proposed with the objective of providing a faster screening solution and assistance to newly trained physicians in the clinical diagnosis process. The main motive behind SkiNet's design and development is to provide a white-box solution, addressing a critical problem of trust and interpretability which is crucial for the wider adoption of computer-aided diagnosis systems by medical practitioners. SkiNet is a two-stage pipeline wherein the lesion segmentation is followed by the lesion classification. In our SkiNet methodology, Monte Carlo dropout and test-time augmentation techniques have been employed to estimate epistemic and aleatoric uncertainty, while saliency-based methods are explored to provide post-hoc explanations of the deep learning models. The publicly available dataset, ISIC-2018, is used to perform experimentation and ablation studies. The results establish the robustness of the model on the traditional benchmarks while addressing the black-box nature of such models to alleviate the skepticism of medical practitioners by incorporating transparency and confidence into the model's prediction.
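
As a rough illustration of how repeated stochastic forward passes (for example with dropout left on, or over augmented copies of an image) can be turned into uncertainty estimates, the sketch below uses one common entropy-based decomposition. The abstract does not state which formulas SkiNet uses, so this decomposition is an assumption, and `decompose_uncertainty` together with the sample probabilities are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def decompose_uncertainty(prob_samples):
    """Split predictive uncertainty using per-pass class probabilities.

    prob_samples: list of class-probability vectors, one per stochastic pass.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the mean prediction,
      aleatoric = mean entropy of the individual predictions,
      epistemic = total - aleatoric (the mutual information).
    """
    n = len(prob_samples)
    k = len(prob_samples[0])
    mean = [sum(p[c] for p in prob_samples) / n for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in prob_samples) / n
    return total, aleatoric, total - aleatoric


# Hypothetical outputs of four stochastic passes on one image (2 classes).
passes = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.95, 0.05]]
total, aleatoric, epistemic = decompose_uncertainty(passes)
```

When every pass agrees, the epistemic term is zero and all remaining uncertainty is attributed to the data; when confident passes disagree with each other, the epistemic term dominates.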

翻訳日:2021-04-18 05:59:51 公開日:2020-12-30

# RTS3D: 自律運転のための4次元特徴整合埋め込み空間からのリアルタイムステレオ3D検出

RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving ( http://arxiv.org/abs/2012.15072v1 )

ライセンス: Link先を確認

Peixuan Li, Shun Su, Huaici Zhao

(参考訳) Pseudo-LiDAR表現を用いた最近の画像ベース3Dオブジェクト検出法は優れた機能を示しているが、LiDAR法と比較して効率と精度の顕著な差は残っている。さらに、スタンドアローン深度推定器の過度信頼は、トレーニング段階では大量のピクセル単位のアノテーションを必要とし、推論段階ではより多くの計算を必要とし、実世界のスケーリングアプリケーションを制限する。本稿では,RTS3Dというステレオ画像から効率よく高精度な3Dオブジェクト検出手法を提案する。擬似ライダー類似手法における3次元占有空間と異なり,新しい4次元特徴整合埋め込み (fce) 空間を深度監督なしで3次元シーンの中間表現として設計する。 FCE空間は、ステレオペアから歪んだマルチスケールの特徴一貫性を探索することによって、オブジェクトの構造と意味情報を符号化する。さらに,FCE空間雑音の影響を低減するために,意味誘導型RBF (Radial Basis Function) と構造認識型アテンションモジュールを考案した。 KITTIベンチマークの実験では、RTS3Dはステレオ画像3D検出のための最初の真のリアルタイムシステム(FPS$>$24)であり、従来の最先端手法と比較して平均精度が10\%向上している。コードはhttps://github.com/Banconxuan/RTS3Dで入手できる。

Although recent image-based 3D object detection methods using the Pseudo-LiDAR representation have shown great capabilities, a notable gap in efficiency and accuracy still exists compared with LiDAR-based methods. Besides, over-reliance on a stand-alone depth estimator, which requires a large number of pixel-wise annotations in the training stage and more computation in the inference stage, limits scaling to real-world applications. In this paper, we propose an efficient and accurate 3D object detection method from stereo images, named RTS3D. Different from the 3D occupancy space in Pseudo-LiDAR-like methods, we design a novel 4D feature-consistent embedding (FCE) space as the intermediate representation of the 3D scene without depth supervision. The FCE space encodes the object's structural and semantic information by exploring the multi-scale feature consistency warped from the stereo pair. Furthermore, a semantic-guided RBF (Radial Basis Function) and a structure-aware attention module are devised to reduce the influence of FCE space noise without instance mask supervision. Experiments on the KITTI benchmark show that RTS3D is the first true real-time system (FPS$>$24) for stereo image 3D detection, while achieving a $10\%$ improvement in average precision compared with the previous state-of-the-art method. The code will be available at https://github.com/Banconxuan/RTS3D

翻訳日:2021-04-18 05:59:27 公開日:2020-12-30

# 視点: ジャミング, 特徴学習, 遅延学習を一体化する深層学習のためのフェーズダイアグラム

Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training ( http://arxiv.org/abs/2012.15110v1 )

ライセンス: Link先を確認

Mario Geiger, Leonardo Petrini and Matthieu Wyart

(参考訳) ディープラーニングアルゴリズムは、画像認識や囲碁など、さまざまなタスクにおける技術革命をもたらした。しかし、なぜ機能するのかは理解されていない。結局のところ、これらのアルゴリズムは高次元に存在するデータの分類に成功しているが、これは高次元空間の幾何学とそれに伴う次元の呪いのために、一般には不可能な芸当である。どのような構造、対称性、不変性が、画像などのデータを学習可能にするかを理解することは、根本的な課題である。他のパズルとしては、(i)学習は高次元の損失を最小化することに対応しており、これは一般に凸ではなく、悪い極小値に陥る可能性がある。 (ii)データが完全に適合している状況でも、ディープラーニングの予測能力は適合パラメータの数とともに向上する。本稿では,(i)と(ii)を解明する最近の研究成果を概観し,それらが(まだ説明されていない)次元の呪いのパラドックスに与える視点について考察する。我々は理論的議論を$(h,\alpha)$平面に基づいて行う。ここで$h$はネットワーク幅、$\alpha$は初期化時のネットワーク出力のスケールであり、MNISTとCIFAR 10について、その平面における性能の新たな体系的な測定結果を提供する。我々は、異なる学習体制をフェーズダイアグラムにまとめることができると論じる。臨界点の線が、過小パラメータ相と過剰パラメータ相を鋭く区切る。過剰パラメータのネットでは、学習は滑らかなクロスオーバーによって分離された2つのレジームで動作することができる。大きな初期化ではカーネル法に対応し、小さな初期化ではデータの不変量とともに特徴を学習することができる。我々は、これらの異なる相の性質、それらを隔てる遷移の性質、そしていくつかの未解決問題についてレビューする。本稿の扱いでは、物理システムとの類似性、スケーリングの議論、およびこれらの結果を経験的に定量的に検証するための数値観測量の開発を強調する。

Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck in bad minima, and (ii) the predictive power of deep learning increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the $(h,\alpha)$ plane where $h$ is the network width and $\alpha$ the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR 10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically.

翻訳日:2021-04-18 05:58:39 公開日:2020-12-30

# PMGT-VR: 分散低減を伴う非集中型近位勾配アルゴリズムフレームワーク

PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction ( http://arxiv.org/abs/2012.15010v1 )

ライセンス: Link先を確認

Haishan Ye, Wei Xiong, and Tong Zhang

(参考訳) 本稿では非集中型の複合最適化問題について考察する。本研究では,マルチコンセンサス,勾配追跡,分散低減といった複数の手法を組み合わせた,PMGT-VRと呼ばれる新しい非集中型の分散低減近位勾配アルゴリズムフレームワークを提案する。提案するフレームワークは集中型アルゴリズムの模倣に基づいており,このフレームワークに基づくアルゴリズムが集中型アルゴリズムと同様の収束率を達成することを示す。また、PMGT-SAGAとPMGT-LSVRGの2つの代表的なアルゴリズムを記述・解析し、それらを既存の最先端近位アルゴリズムと比較する。我々の知る限り、PMGT-VRは非集中型の複合最適化問題を解くことができる最初の分散低減法である。提案アルゴリズムの有効性を示すために数値実験を行った。

This paper considers the decentralized composite optimization problem. We propose a novel decentralized variance-reduced proximal-gradient algorithmic framework, called PMGT-VR, which is based on a combination of several techniques including multi-consensus, gradient tracking, and variance reduction. The proposed framework relies on an imitation of centralized algorithms and we demonstrate that algorithms under this framework achieve convergence rates similar to that of their centralized counterparts. We also describe and analyze two representative algorithms, PMGT-SAGA and PMGT-LSVRG, and compare them to existing state-of-the-art proximal algorithms. To the best of our knowledge, PMGT-VR is the first variance-reduction method that can solve decentralized composite optimization problems. Numerical experiments are provided to demonstrate the effectiveness of the proposed algorithms.

翻訳日:2021-04-18 05:58:08 公開日:2020-12-30

# エンドツーエンド予測と最適化プロセスのリスク保証

Risk Guarantees for End-to-End Prediction and Optimization Processes ( http://arxiv.org/abs/2012.15046v1 )

ライセンス: Link先を確認

Nam Ho-Nguyen and Fatma K{\i}l{\i}n\c{c}-Karzan

(参考訳) 予測モデルは最適化モデルのパラメータを推定するためにしばしば用いられる。エンドツーエンドの観点では、真の目標は、優れた最適化性能を達成することだが、予測性能は単独で測定される。パラメータの推定における優れた予測性能は、後続の最適化性能をもたらすと信じられているが、それに対する正式な理論的保証は特に不足している。本稿では,予測性能が最適化性能をどのように支配するかを明確に記述できる条件について検討する。より弱い条件では漸近収束の結果が得られるが、より強い条件では予測性能の観点から最適化性能を正確に定量化することができる。一般に、これらの条件の検証は非自明なタスクである。それでも、我々の弱い条件は、学習理論の文献からよく知られたフィッシャー整合性の概念と等価であることを示す。これにより、いくつかの損失関数に対してより弱い条件を簡単にチェックできる。また、二乗誤差損失関数が我々のより強い条件を満たすことも確認する。その結果、二乗損失で測定した予測性能と対称損失関数のクラスと、それに続く最適化性能との正確な理論的関係が導出される。ポートフォリオ最適化,分数ナップサック,多クラス分類問題に関する計算研究において,複数の予測損失関数(フィッシャー一貫性のあるもの,そうでないもの)を用いた最適化性能を比較し,損失関数の一貫性の欠如が性能に有害な影響を与えることを実証する。

Prediction models are often employed in estimating parameters of optimization models. Despite the fact that in an end-to-end view, the real goal is to achieve good optimization performance, the prediction performance is measured on its own. While it is usually believed that good prediction performance in estimating the parameters will result in good subsequent optimization performance, formal theoretical guarantees on this are notably lacking. In this paper, we explore conditions that allow us to explicitly describe how the prediction performance governs the optimization performance. Our weaker condition allows for an asymptotic convergence result, while our stronger condition allows for exact quantification of the optimization performance in terms of the prediction performance. In general, verification of these conditions is a non-trivial task. Nevertheless, we show that our weaker condition is equivalent to the well-known Fisher consistency concept from the learning theory literature. This then allows us to easily check our weaker condition for several loss functions. We also establish that the squared error loss function satisfies our stronger condition. Consequently, we derive the exact theoretical relationship between prediction performance measured with the squared loss, as well as a class of symmetric loss functions, and the subsequent optimization performance. In a computational study on portfolio optimization, fractional knapsack and multiclass classification problems, we compare the optimization performance of using several prediction loss functions (some that are Fisher consistent and some that are not) and demonstrate that lack of consistency of the loss function can indeed have a detrimental effect on performance.

翻訳日:2021-04-18 05:57:55 公開日:2020-12-30

# メカニカルエンジニアリングにおけるデータサイエンスとそのアプローチ

A Review into Data Science and Its Approaches in Mechanical Engineering ( http://arxiv.org/abs/2012.15358v1 )

ライセンス: Link先を確認

Ashkan Yousefi Zadeh, Meysam Shahbazy

(参考訳) 今日では、インテリジェントシステムを使用して、デバイスやファクトリのさまざまなコンポーネントのパフォーマンスと最適化を改善することは避けられない。さらに、ビジネスや医学研究、工学研究などにおいて、より良い意思決定を行うための適切な予測を持つことが不可欠です。これらの手法の最新かつ最も広く使われている分野の1つはデータサイエンスと呼ばれる分野であり、科学者、エンジニア、工場の全員がキャリアで学び、利用する必要がある。本稿では,データサイエンスについて概説し,その手法,特に機械工学における利用法,および機械工学におけるデータサイエンスの展開方法について概説する。はじめに、異なるデータサイエンスの定義とその技術における背景をレビューした。以下に、データ科学者が研究で行うべきプロセスであるデータサイエンス方法論について論じる。また、その研究にデータサイエンス手法を用いた機械工学分野の研究について概説する。最終的に、論文でレビューされた課題、なぜ機械工学の研究やプロジェクトにおいてデータサイエンスを使う必要があるのか、という議論がなされている。

Nowadays, it is inevitable to use intelligent systems to improve the performance and optimization of different components of devices or factories. Furthermore, it is essential to have appropriate predictions to make better decisions in business, medical studies, engineering studies, and other fields. One of the newest and most widely used of these methods is a field called data science, which scientists, engineers, and factories need to learn and use in their work. This article briefly introduces data science and reviews its methods, especially its uses in mechanical engineering, as well as the challenges and ways of developing data science in mechanical engineering. In the introduction, different definitions of data science and its background in technology are reviewed. Next, the data science methodology, which is the process a data scientist needs to follow in their work, is discussed. Further, some studies in the mechanical engineering area that used data science methods are reviewed. Finally, based on the subjects reviewed in the article, it is discussed why it is necessary to use data science in mechanical engineering research and projects.

翻訳日:2021-04-18 05:57:21 公開日:2020-12-30

# 法医学的目的のための畳み込み長短期記憶ネットワークによる損傷指紋認識

Damaged Fingerprint Recognition by Convolutional Long Short-Term Memory Networks for Forensic Purposes ( http://arxiv.org/abs/2012.15041v1 )

ライセンス: Link先を確認

Jaouhar Fattahi and Mohamed Mejri

(参考訳) 指紋認識は、しばしば犯罪者に対する証拠を確立するうえで形勢を一変させるステップである。しかし、犯罪者が故意に指紋をさまざまな方法で改変して、技術者や自動センサーが指紋を認識するのを難しくし、捜査官が法医学的な手続きで彼らに対して強力な証拠を確立するのが面倒になることが、ますますわかってきています。この意味で、ディープラーニング、特に畳み込みアルゴリズムは、損傷した指紋の認識を支援する主要候補として現れる。本稿では,Convolutional Long Short-Term Memory Networkによる損傷指紋の認識に着目した。我々は,このモデルのアーキテクチャを示し,95%を超える正解率と99%を超える適合率,95%に迫る再現率と99%に迫るAUCという性能を示す。

Fingerprint recognition is often a game-changing step in establishing evidence against criminals. However, we are increasingly finding that criminals deliberately alter their fingerprints in a variety of ways to make it difficult for technicians and automatic sensors to recognize their fingerprints, making it tedious for investigators to establish strong evidence against them in a forensic procedure. In this sense, deep learning, and convolutional algorithms in particular, emerges as a prime candidate to assist in the recognition of damaged fingerprints. In this paper, we focus on the recognition of damaged fingerprints by Convolutional Long Short-Term Memory networks. We present the architecture of our model and demonstrate its performance, which exceeds 95% accuracy and 99% precision and approaches 95% recall and 99% AUC.

翻訳日:2021-04-18 05:57:03 公開日:2020-12-30

# 品質アテンション・ジェネレーティブ・アドバイザリ・ネットワークによる未ペア画像強調

Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network ( http://arxiv.org/abs/2012.15052v1 )

ライセンス: Link先を確認

Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, and Sam Kwong

(参考訳) 本研究では,ユーザが提供する高品質画像の特徴を活かし,低画質画像のエンリッチ化を可能にする非ペア画像エンハンスメントモデルについて検討する。本稿では,品質アテンションモジュール (QAM) を組み込んだ双方向生成支援ネットワーク (GAN) に基づく,未ペアデータに基づく品質アテンション生成敵ネットワーク (QAGAN) を提案する。提案したQAGANの重要な新規性は、2つのドメインから直接ドメイン関連品質の注意を学習するようにジェネレータに注入されたQAMにある。より具体的には、提案したQAMは、空間的特性から意味的特性を効果的に選択し、チャネル的にそれぞれスタイル関連属性を適応的に組み込むことを可能にする。そこで,提案したQAGANでは,識別器だけでなく,生成器が両方のドメインに直接アクセスし,生成器がマッピング関数を学習できるようにする。提案手法は, 未経験学習に基づく最先端の手法と比較して, 客観的・主観的評価の両面において, 優れた性能が得られることを示す。

In this work, we aim to learn an unpaired image enhancement model, which can enrich low-quality images with the characteristics of high-quality images provided by users. We propose a quality attention generative adversarial network (QAGAN) trained on unpaired data based on the bidirectional Generative Adversarial Network (GAN) embedded with a quality attention module (QAM). The key novelty of the proposed QAGAN lies in the injected QAM for the generator such that it learns domain-relevant quality attention directly from the two domains. More specifically, the proposed QAM allows the generator to effectively select semantic-related characteristics from the spatial-wise and adaptively incorporate style-related attributes from the channel-wise, respectively. Therefore, in our proposed QAGAN, not only discriminators but also the generator can directly access both domains which significantly facilitates the generator to learn the mapping function. Extensive experimental results show that, compared with the state-of-the-art methods based on unpaired learning, our proposed method achieves better performance in both objective and subjective evaluations.

翻訳日:2021-04-18 05:56:51 公開日:2020-12-30

# 脳動脈瘤セグメンテーションにおける大コンテキストの探索

Exploring Large Context for Cerebral Aneurysm Segmentation ( http://arxiv.org/abs/2012.15136v1 )

ライセンス: Link先を確認

Jun Ma, Ziwei Nie

(参考訳) 脳動脈瘤の診断, モニタリング, 治療計画には, 3次元CTからの動脈瘤の自動分画が重要である。本報告では,MICCAI 2020 CADA チャレンジにおける大動脈瘤分節法の主な技術について概説する。主な貢献は、大きなパッチサイズで3D U-Netを設定することで、大きなコンテキストを得ることができることです。 MICCAI 2020 CADAテストデータセットでは,平均Jaccard 0.7593で2位となった。私たちのコードとトレーニングされたモデルは、 \url{https://github.com/junma11/cada2020}で公開されている。

Automated segmentation of aneurysms from 3D CT is important for the diagnosis, monitoring, and treatment planning of cerebral aneurysm disease. This short paper briefly presents the main technical details of our aneurysm segmentation method for the MICCAI 2020 CADA challenge. The main contribution is that we configure the 3D U-Net with a large patch size, which allows it to capture a large context. Our method ranked second on the MICCAI 2020 CADA testing dataset with an average Jaccard of 0.7593. Our code and trained models are publicly available at \url{https://github.com/JunMa11/CADA2020}.
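
The Jaccard score quoted above is the ratio of intersection to union between a predicted mask and the reference mask. A minimal sketch on flattened binary masks is given below; the function name `jaccard`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not the challenge's official evaluation code.

```python
def jaccard(pred, target):
    """Jaccard index |A ∩ B| / |A ∪ B| for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 1D stand-ins for flattened 3D segmentation volumes.
prediction = [1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 1, 0]
score = jaccard(prediction, reference)  # 2 shared voxels out of 4 in the union
```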

翻訳日:2021-04-18 05:56:30 公開日:2020-12-30

# Exact, Approximate, Error-Tolerant Graph Matchingのアルゴリズム

Some Algorithms on Exact, Approximate and Error-Tolerant Graph Matching ( http://arxiv.org/abs/2012.15279v1 )

ライセンス: Link先を確認

Shri Prakash Dwivedi

(参考訳) このグラフは、その表現力とオブジェクト間の関係を示す固有の能力から、工学や科学において最も広く使われている数学的構造の一つである。本研究の目的は,グラフの表現力を利用した新しいグラフマッチング手法を導入し,構造的パターン認識に適用することである。本稿では,様々な正確かつ不正確なグラフマッチング手法について広範な調査を行う。準同型の概念を用いたグラフマッチングについて述べる。グラフマッチングアルゴリズムのカテゴリが提示され、関係性のある指標を用いて重要でないノードを除去することでグラフサイズを小さくする。本稿では,より少ない次数ノードを縮合することで,与えられたグラフを別のグラフに変換するノード収縮を用いた誤り耐性グラフマッチング手法を提案する。この手法を用いてグラフ編集距離を延長し,実行時間と精度のトレードオフとして利用することができる。本稿では,各ノードの集中度情報を利用することで,グラフマッチングのアプローチについて述べる。グラフマッチング問題は本質的にグラフの幾何学と位相に関係している。幾何グラフを用いたグラフ類似度測定の新しい手法を提案する。 2つの幾何グラフ間の頂点距離を頂点の位置を用いて定義し、頂点のみを持つすべてのグラフの集合上の計量であることを示す。辺の角方向,長さ,位置に基づいて2つのグラフ間の辺距離を定義する。次に頂点距離と辺距離の概念を組み合わせて、2つの幾何グラフの間のグラフ距離を定義し、それを計量であることを示す。最後に,提案するグラフ類似性フレームワークを用いて,正確かつエラーに耐性のあるグラフマッチングを行う。

The graph is one of the most widely used mathematical structures in engineering and science because of its representational power and inherent ability to demonstrate the relationship between objects. The objective of this work is to introduce the novel graph matching techniques using the representational power of the graph and apply it to structural pattern recognition applications. We present an extensive survey of various exact and inexact graph matching techniques. Graph matching using the concept of homeomorphism is presented. A category of graph matching algorithms is presented, which reduces the graph size by removing the less important nodes using some measure of relevance. We present an approach to error-tolerant graph matching using node contraction where the given graph is transformed into another graph by contracting smaller degree nodes. We use this scheme to extend the notion of graph edit distance, which can be used as a trade-off between execution time and accuracy. We describe an approach to graph matching by utilizing the various node centrality information, which reduces the graph size by removing a fraction of nodes from both graphs based on a given centrality measure. The graph matching problem is inherently linked to the geometry and topology of graphs. We introduce a novel approach to measure graph similarity using geometric graphs. We define the vertex distance between two geometric graphs using the position of their vertices and show it to be a metric over the set of all graphs with vertices only. We define edge distance between two graphs based on the angular orientation, length and position of the edges. Then we combine the notion of vertex distance and edge distance to define the graph distance between two geometric graphs and show it to be a metric. Finally, we use the proposed graph similarity framework to perform exact and error-tolerant graph matching.

翻訳日:2021-04-18 05:56:22 公開日:2020-12-30

# 次数補正ブロックモデルに対する調整カイ二乗検定

Adjusted chi-square test for degree-corrected block models ( http://arxiv.org/abs/2012.15047v1 )

ライセンス: Link先を確認

Linfan Zhang and Arash A. Amini

(参考訳) 次数補正確率ブロックモデル(DCSBM)の適合度検定を提案する。この検定は、$d_1,\dots,d_n$個の観測値を持つ$n$個の多項分布のグループ間で平均が等しいかを測る調整カイ二乗統計量に基づく。ネットワークモデルの文脈では、多項分布の数$n$は観測値の数$d_i$よりもはるかに速く増大するため、設定は古典的な漸近論から逸脱する。単純な調整により、$\{d_i\}$の調和平均が無限大に発散する限り、帰無仮説の下で統計量が分布収束することを示す。この結果は、ノード$i$の次数が$d_i$の役割を果たす大規模なスパースネットワークに適用できる。我々の分布に関する結果は非漸近的で、明示的な定数を持ち、目標分布へのコルモゴロフ-スミルノフ距離の有限サンプル境界を与える。逐次的に適用した場合、この検定はコミュニティの数を決定するためにも使用できる。検定は隣接行列の(行)圧縮版に対して、次数で条件付けて動作し、その結果、大規模なスパースネットワークに対して高度にスケーラブルである。我々は、$K$個のコミュニティを検定する際に$(K+1)$コミュニティの割り当てに基づいて列を圧縮するという新しいアイデアを取り入れた。この手法は, 計算効率を犠牲にすることなく逐次的適用における検出力を高め, コミュニティ数の復元における一致性を証明する。検定統計量は特定の対立仮説に依存しないため、その有用性は逐次検定にとどまらず、DCSBM族以外の幅広い対立仮説に対して同時に検定することができる。シミュレーションおよび実データを用いた大規模な数値実験により、アプローチの有効性を示す。特に、Facebook-100データセットに検定を適用すると、少数のコミュニティを持つDCSBMは、ほぼすべてのケースで良好な適合からほど遠いことが分かった。

We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $\{d_i\}$ grows to infinity. This result applies to large sparse networks where the role of $d_i$ is played by the degree of node $i$. Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution. When applied sequentially, the test can also be used to determine the number of communities. The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the columns based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family. We show the effectiveness of the approach by extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, we find that a DCSBM with a small number of communities is far from a good fit in almost all cases.

翻訳日:2021-04-18 05:55:57 公開日:2020-12-30

# 弾性ネットによる特徴ランク付けと選択

Elastic Net based Feature Ranking and Selection ( http://arxiv.org/abs/2012.14982v1 )

ライセンス: Link先を確認

Shaode Yu, Haobo Chen, Hang Yu, Zhicheng Zhang, Xiaokun Liang, Wenjian Qin, Yaoqin Xie, Ping Shi

(参考訳) 特徴選択はデータ表現とインテリジェントな診断において重要である。 Elastic netは最も広く使われている機能セレクタの1つである。しかしながら、選択された特徴はトレーニングデータに依存しており、正規化回帰専用の重み付けは、特徴ランキングに使用される場合の重要性に関係せず、モデル解釈可能性と拡張性が低下する。本研究では,データ分割と弾性ネットによる特徴選択を複数回行った結果,直感的なアイデアが得られた。選択された特徴の頻度に関係し、特徴の重要性を示す指標として周波数を使用する。特徴量を周波数順にソートした後、線形支持ベクトルマシンは漸進的に分類を行う。最終的に、予測性能を比較して識別特徴のコンパクトなサブセットを選択する。乳がんデータセット (BCDR-F03, WDBC, GSE 10810, GSE 15852) の実験結果から, 提案フレームワークは弾力性ネットに対する競争力や優れた性能を達成し, より少ない特徴を連続的に選択できることが示唆された。高次元の小型データセットの一貫性をさらに強化するには、今後の作業にもっと注意を払う必要がある。提案されたフレームワークはオンラインでアクセスできる(https://github.com/nicoyucn/elasticnetfr)。

Feature selection is important in data representation and intelligent diagnosis. Elastic net is one of the most widely used feature selectors. However, the selected features depend on the training data, and their weights, which are tuned for regularized regression, do not reflect their importance when used for feature ranking, which degrades model interpretability and extension. In this study, an intuitive idea is applied after multiple rounds of data splitting and elastic-net-based feature selection: it counts how often each feature is selected and uses this frequency as an indicator of feature importance. After features are sorted according to their frequency, a linear support vector machine performs the classification in an incremental manner. Finally, a compact subset of discriminative features is selected by comparing the prediction performance. Experimental results on breast cancer data sets (BCDR-F03, WDBC, GSE 10810, and GSE 15852) suggest that the proposed framework achieves competitive or superior performance compared with elastic net while consistently selecting fewer features. How to further enhance its consistency on high-dimensional, small-sample-size data sets deserves more attention in our future work. The proposed framework is accessible online (https://github.com/NicoYuCN/elasticnetFR).

翻訳日:2021-04-18 05:55:26 公開日:2020-12-30

# 繰り返しニューラルネットワークを用いたスタックベースバッファオーバーフロー検出

Stack-based Buffer Overflow Detection using Recurrent Neural Networks ( http://arxiv.org/abs/2012.15116v1 )

ライセンス: Link先を確認

William Arild Dahl, Laszlo Erdodi, Fabio Massimo Zennaro

(参考訳) ソフトウェアにおける脆弱性の検出は、アプリケーションの開発とデプロイにおいて重要な課題である。最も知られて危険な脆弱性の1つはスタックベースのバッファオーバーフローであり、潜在的な攻撃者が悪意のあるコードを実行できる可能性がある。本稿では,最近の機械学習モデル,特にリカレントニューラルネットワークを用いて,プログラムのアセンブリコードにスタックベースのバッファオーバーフロー脆弱性を検出することを検討する。アセンブリコードは汎用的で一般的な表現であるため、この言語に焦点を当てることで、複数の異なるプログラミング言語で記述されたプログラムを検討できる。さらに,コードを自然言語として扱うことができるという仮説をサブスクライブし,自然言語処理に一般的に使用される標準アーキテクチャを用いてアセンブリコードを処理する。本研究は,自然言語仮説の妥当性と,再帰的ニューラルネットワークを用いた脆弱性検出の可能性を確認することを目的とした一連の実験を行う。その結果,当社のアーキテクチャは,コンテキストに強く依存する,微妙なスタックベースのバッファオーバーフロー脆弱性を捕捉できることが分かった。

Detecting vulnerabilities in software is a critical challenge in the development and deployment of applications. One of the most known and dangerous vulnerabilities is stack-based buffer overflows, which may allow potential attackers to execute malicious code. In this paper we consider the use of modern machine learning models, specifically recurrent neural networks, to detect stack-based buffer overflow vulnerabilities in the assembly code of a program. Since assembly code is a generic and common representation, focusing on this language allows us to potentially consider programs written in several different programming languages. Moreover, we subscribe to the hypothesis that code may be treated as natural language, and thus we process assembly code using standard architectures commonly employed in natural language processing. We perform a set of experiments aimed at confirming the validity of the natural language hypothesis and the feasibility of using recurrent neural networks for detecting vulnerabilities. Our results show that our architecture is able to capture subtle stack-based buffer overflow vulnerabilities that strongly depend on the context, thus suggesting that this approach may be extended to real-world setting, as well as to other forms of vulnerability detection.

翻訳日:2021-04-18 05:55:06 公開日:2020-12-30

# 公共交通機関の類似性分類

Similarity Classification of Public Transit Stations ( http://arxiv.org/abs/2012.15267v1 )

ライセンス: Link先を確認

Hannah Bast, Patrick Brosi and Markus N\"ather

(参考訳) 2つの公共交通機関の駅識別子 A と B がラベルと地理的座標を持つ場合、A と B が同一の駅を表すかどうかを決定する。例えば "St Pancras International at (51.5306, -0.1253) や "London St Pancras at (51.5319, -0.1269) では、答えは "Yes" となる。この問題は、地理的情報システム、スケジュールのマージ、ルート計画、マップマッチングなど、公共交通機関のデータを使用する領域で頻繁に発生する。地理的距離と単純な文字列類似度尺度に基づくいくつかのベースライン手法を検討する。また、より精巧な文字列類似度尺度を実験し、手動で正規化ルールを作成します。実験の結果,これらのベースライン法は良好な結果をもたらすが,十分に満足できるものではないことがわかった。そこで我々は,2つの駅間のトリグラムの一致,距離,相互織りグリッド上の位置を訓練したランダムフォレスト分類器に基づくアプローチを開発した。すべてのアプローチは、OpenStreetMap (OSM)データから得られた幅広い真実のデータセットに基づいて評価される。全てのデータセットにおいて、我々の学習に基づくアプローチはF1スコアを99%以上達成し、最も精巧なベースラインアプローチ(TFIDFスコアと地理的距離に基づく)でさえもF1スコアを94%以上達成し、地理的距離閾値を用いた単純なアプローチはF1スコアを75%しか達成していない。トレーニングとテストの両方のデータセットが公開されています。

We study the following problem: given two public transit station identifiers A and B, each with a label and a geographic coordinate, decide whether A and B describe the same station. For example, for "St Pancras International" at (51.5306, -0.1253) and "London St Pancras" at (51.5319, -0.1269), the answer would be "Yes". This problem frequently arises in areas where public transit data is used, for example in geographic information systems, schedule merging, route planning, or map matching. We consider several baseline methods based on geographic distance and simple string similarity measures. We also experiment with more elaborate string similarity measures and manually created normalization rules. Our experiments show that these baseline methods produce good, but not fully satisfactory results. We therefore develop an approach based on a random forest classifier which is trained on matching trigrams between two stations, their distance, and their position on an interwoven grid. All approaches are evaluated on extensive ground truth datasets we generated from OpenStreetMap (OSM) data: (1) The union of Great Britain and Ireland and (2) the union of Germany, Switzerland, and Austria. On all datasets, our learning-based approach achieves an F1 score of over 99%, while even the most elaborate baseline approach (based on TFIDF scores and the geographic distance) achieves an F1 score of at most 94%, and a naive approach of using a geographical distance threshold achieves an F1 score of only 75%. Both our training and testing datasets are publicly available.
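
The two baseline signals mentioned in this abstract, geographic distance and string similarity, can be sketched for the St Pancras example as follows. This is only an illustration of the features, not the authors' classifier: the helper names, the lowercase character-trigram Jaccard measure, and the mean Earth radius of 6,371 km are assumptions made here.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    radius_m = 6371000.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def trigrams(label):
    """Set of lowercase character trigrams of a station label."""
    s = label.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two labels' trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# The example pair from the abstract.
dist = haversine_m(51.5306, -0.1253, 51.5319, -0.1269)
sim = trigram_similarity("St Pancras International", "London St Pancras")
```

The two stations are under 200 m apart, yet their labels share well under half of their trigrams, which illustrates why a single distance threshold or a single string measure is a weak decision rule and why a learned combination of such features can do better.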

翻訳日:2021-04-18 05:54:47 公開日:2020-12-30

# タブラルファイナンシャルデータを用いた信用リスクモニタリングのための逐次深層学習

Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data ( http://arxiv.org/abs/2012.15330v1 )

ライセンス: Link先を確認

Jillian M. Clements, Di Xu, Nooshin Yousefi, Dmitry Efimov

(参考訳) 機械学習は、銀行業界における財政的損失を防ぐ上で重要な役割を果たす。おそらく、毎年数十億ドルの損失をもたらす可能性のある最も関連する予測タスクは、信用リスク(すなわち債務不履行のリスク)の評価である。今日、信用リスク予測における機械学習による改善の多くは、勾配ブースティング決定木モデルによってもたらされている。しかし、高価な新しいデータソースや高度に設計された特徴量を追加しない限り、これらの改善は頭打ちになり始める。本稿では,新たなモデル入力に依存しない深層学習を用いて,信用リスク評価のための新しい手法を考案する試みについて述べる。本稿では,高コストな計算資源を必要とせずに財務データの長い履歴系列を活用する,深層再帰型および因果畳み込みベースのニューラルネットワークとともに用いる新たなクレジットカード取引サンプリング手法を提案する。我々は,時間的畳み込みネットワークを用いた逐次的ディープラーニングアプローチが,ベンチマークである非逐次的なツリーベースモデルを上回り,大幅な財務的節約と早期の信用リスク検出を達成したことを示す。また,提案するサンプリング手法により系列を効率よくメモリに格納し,高速なオンライン学習と推論に利用できるため,本手法が実運用環境で利用できる可能性を示す。

Machine learning plays an essential role in preventing financial losses in the banking industry. Perhaps the most pertinent prediction task that can result in billions of dollars in losses each year is the assessment of credit risk (i.e., the risk of default on debt). Today, much of the gains from machine learning to predict credit risk are driven by gradient boosted decision tree models. However, these gains begin to plateau without the addition of expensive new data sources or highly engineered features. In this paper, we present our attempts to create a novel approach to assessing credit risk using deep learning that does not rely on new model inputs. We propose a new credit card transaction sampling technique to use with deep recurrent and causal convolution-based neural networks that exploits long historical sequences of financial data without costly resource requirements. We show that our sequential deep learning approach using a temporal convolutional network outperformed the benchmark non-sequential tree-based model, achieving significant financial savings and earlier detection of credit risk. We also demonstrate the potential for our approach to be used in a production environment, where our sampling technique allows for sequences to be stored efficiently in memory and used for fast online learning and inference.

翻訳日:2021-04-18 05:54:19 公開日:2020-12-30

# スペクトログラムとCNNを用いたLoRaの高周波指紋識別

Radio Frequency Fingerprint Identification for LoRa Using Spectrogram and CNN ( http://arxiv.org/abs/2101.01668v1 )

ライセンス: Link先を確認

Guanxiong Shen, Junqing Zhang, Alan Marshall, Linning Peng, and Xianbin Wang

(参考訳) 無線周波数フィンガープリント識別(RFFI)は、無線デバイス固有のハードウェア特性に依拠する新しいデバイス認証技術である。我々は、スペクトログラムと畳み込みニューラルネットワーク(CNN)に基づく、Long Range (LoRa)システム向けのRFFI方式を設計した。具体的には、LoRa信号の細粒度な時間周波数特性を表現するためにスペクトログラムを用いた。さらに、瞬時キャリア周波数オフセット(CFO)がドリフトしており、これが誤分類を引き起こしてシステムの安定性を著しく損なうことを明らかにし、CFO補償が有効な緩和策であることを示した。最後に、推定したCFOを用いてCNNの出力を調整できるハイブリッド分類器を設計した。CFOの平均値は比較的安定しているため、推定CFOが範囲外となるCNN予測を除外するために利用できる。20台のLoRa被試験デバイス(DUT)とUniversal Software Radio Peripheral (USRP) N210受信機を用いて、実際の無線環境で実験を行った。IQベースおよびFFTベースのRFFI方式と比較して、我々のスペクトログラムベースの方式は最良の分類精度、すなわち20台のLoRa DUTに対して97.61%を達成できる。

Radio frequency fingerprint identification (RFFI) is an emerging device authentication technique that relies on intrinsic hardware characteristics of wireless devices. We designed an RFFI scheme for Long Range (LoRa) systems based on spectrograms and a convolutional neural network (CNN). Specifically, we used spectrograms to represent the fine-grained time-frequency characteristics of LoRa signals. In addition, we revealed that the instantaneous carrier frequency offset (CFO) drifts, which results in misclassification and significantly compromises system stability; we demonstrated that CFO compensation is an effective mitigation. Finally, we designed a hybrid classifier that can adjust CNN outputs with the estimated CFO. The mean value of the CFO remains relatively stable, hence it can be used to rule out CNN predictions whose estimated CFO falls outside the range. We performed experiments in real wireless environments using 20 LoRa devices under test (DUTs) and a Universal Software Radio Peripheral (USRP) N210 receiver. Compared with the IQ-based and FFT-based RFFI schemes, our spectrogram-based scheme reaches the best classification accuracy, i.e., 97.61% for 20 LoRa DUTs.
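
The idea of using the estimated CFO to rule out CNN predictions can be sketched as a simple range filter. This is a hedged illustration only: the function `hybrid_predict`, the per-device reference CFO list, the tolerance, and all numeric values are assumptions made for the example, and the abstract does not say how the paper's hybrid classifier combines the two sources.

```python
def hybrid_predict(cnn_probs, estimated_cfo_hz, reference_cfo_hz, tolerance_hz):
    """Pick the most probable device among those whose reference CFO is plausible.

    cnn_probs        -- per-device probabilities from a classifier (hypothetical input)
    estimated_cfo_hz -- CFO estimated from the received packet
    reference_cfo_hz -- per-device mean CFO recorded at enrollment (assumed available)
    tolerance_hz     -- allowed deviation from the reference (assumed parameter)

    Returns the device index, or None when no device matches the estimated CFO.
    """
    candidates = [
        i for i, ref in enumerate(reference_cfo_hz)
        if abs(ref - estimated_cfo_hz) <= tolerance_hz
    ]
    if not candidates:
        return None
    # Among the CFO-plausible devices, trust the classifier's ranking.
    return max(candidates, key=lambda i: cnn_probs[i])


# Device 0 has the highest probability but an implausible CFO, so device 1 wins.
probs = [0.5, 0.4, 0.1]
refs = [1000.0, -2000.0, -2100.0]
print(hybrid_predict(probs, -2050.0, refs, 200.0))
```

Returning `None` when no device is plausible is a design choice of this sketch; a real system might instead fall back to the unfiltered prediction.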

翻訳日:2021-04-18 05:53:40 公開日:2020-12-30

# 非並列な調音-音響音声合成のための多視点時間アライメント

Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis ( http://arxiv.org/abs/2012.15184v1 )

ライセンス: Link先を確認

Jose A. Gonzalez-Lopez and Miriam Gonzalez-Atienza and Alejandro Gomez-Alanis and Jose L. Perez-Cordoba and Phil D. Green

(参考訳) 調音-音響(A2A: Articulatory-to-acoustic)合成とは、計測した調音器官の動きから可聴音声を生成することを指す。この技術には、病気やけがのために話すことができなくなった人々の音声コミュニケーションを回復するなど、多くの応用がある。これまで最も成功している手法は教師あり学習の枠組みを採用しており、時間同期した調音データと音声の記録を用いて教師あり機械学習アルゴリズムを訓練し、後にそれを用いて調音器官の動きを音声にマッピングする。しかしこのことは、並列データが利用できない場合、例えば、すでに声を失っており調音データしか取得できない場合に、A2A技術の適用を妨げる。本研究では、多視点学習の理論に基づいて、この問題に対する解決策を提案する。提案アルゴリズムは、同一の音声内容をもつ時間整合のとれていない調音系列と音響系列のペアを、両ビューの相関が最大となる共通の潜在空間に射影し、その後に動的時間伸縮法を適用することで、両者の最適な時間アライメントを求める。このアイデアのいくつかの変種について議論し、検討する。時間整合のとれていないシナリオで生成された音声の品質は、並列シナリオで得られる品質と同程度であることを示す。

Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury. Most successful techniques so far adopt a supervised learning framework, in which time-synchronous articulatory-and-speech recordings are used to train a supervised machine learning algorithm that can be used later to map articulator movements to speech. This, however, prevents the application of A2A techniques in cases where parallel data is unavailable, e.g., a person has already lost her/his voice and only articulatory data can be captured. In this work, we propose a solution to this problem based on the theory of multi-view learning. The proposed algorithm attempts to find an optimal temporal alignment between pairs of non-aligned articulatory-and-acoustic sequences with the same phonetic content by projecting them into a common latent space where both views are maximally correlated and then applying dynamic time warping. Several variants of this idea are discussed and explored. We show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario.
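
The dynamic time warping step named in the abstract can be sketched as follows. The sketch assumes both sequences have already been projected into a shared latent space of equal dimension; that projection is the multi-view learning part and is not shown here. The function `dtw_path`, the Euclidean frame distance, and the toy sequences are illustrative assumptions, not the paper's implementation.

```python
import math


def dtw_path(xs, ys):
    """Classic dynamic time warping between two sequences of equal-dimension vectors.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning xs[i] with ys[j], from the first frames to the last frames.
    """
    n, m = len(xs), len(ys)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(xs[i - 1], ys[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy latent sequences: the second one lingers on its first frame.
articulatory = [(0.0,), (1.0,), (2.0,)]
acoustic = [(0.0,), (0.0,), (1.0,), (2.0,)]
total, alignment = dtw_path(articulatory, acoustic)
print(total, alignment)
```

The returned index pairs could then be used to build frame-level training pairs from recordings that were not captured simultaneously.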

翻訳日:2021-04-18 05:53:23 公開日:2020-12-30

# 予算制約付き確率的ユーティリティ最大化のためのテストスコアアルゴリズム

Test Score Algorithms for Budgeted Stochastic Utility Maximization ( http://arxiv.org/abs/2012.15194v1 )

ライセンス: Link先を確認

Dabeen Lee, Milan Vojnovic, Se-Young Yun

(参考訳) 個々のアイテムのスコアに基づいてユーティリティ最大化問題を解くアルゴリズムの設計に関する最近の進展を動機として、我々は、観測された個々のアイテムの性能データの統計量として定義されるテストスコアを用いて、予算制約付き確率的ユーティリティ最大化問題を解く枠組みを研究する。既存のスコアリング機構である複製テストスコア(replication test scores)を拡張し、アイテムの価値に加えて不均一なアイテムコストも取り込む。複製テストスコアのみに基づいてアイテムを選択する自然な貪欲アルゴリズムが、幅広いクラスのユーティリティ関数に対して、最適値の定数倍以内の解を出力することを示す。我々のアルゴリズムと近似保証は、テストスコアが、個々のアイテム価値の周辺分布に関する特定の期待値のノイズを含む推定値であることを仮定している。これにより我々のアルゴリズムは実用的なものとなり、ノイズのない推定値を仮定する先行研究を拡張する。さらに、同じ近似保証を維持しつつ、アイテムがストリーミング形式で到着する設定に我々のアルゴリズムを適応させる方法を示す。合成データとAcademia.StackExchange Q&Aフォーラムのデータセットを用いた数値結果を示し、我々のテストスコアアルゴリズムが、関数値の評価に値オラクルへのアクセスを必要とするベンチマークアルゴリズムに匹敵し、場合によってはそれを上回る性能を達成できることを示す。

Motivated by recent developments in designing algorithms based on individual item scores for solving utility maximization problems, we study the framework of using test scores, defined as a statistic of observed individual item performance data, for solving the budgeted stochastic utility maximization problem. We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values. We show that a natural greedy algorithm that selects items solely based on their replication test scores outputs solutions within a constant factor of the optimum for a broad class of utility functions. Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values with respect to marginal distributions of individual item values, thus making our algorithms practical and extending previous work that assumes noiseless estimates. Moreover, we show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee. We present numerical results, using synthetic data and data sets from the Academia.StackExchange Q&A forum, which show that our test score algorithm can be competitive with, and in some cases outperform, a benchmark algorithm that requires access to a value oracle to evaluate function values.
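
A score-based greedy selection under a budget can be sketched as below. This is a generic illustration under stated assumptions: each item is assumed to carry a precomputed score that already reflects its cost, and the rule "take items in descending score order, skipping any that no longer fit" is a common greedy heuristic rather than the paper's exact procedure. The definition of replication test scores is not given in the abstract, so the scores here are made-up numbers.

```python
def greedy_by_test_score(items, budget):
    """Select items in descending score order while the budget allows.

    items  -- iterable of (name, score, cost) tuples; scores are assumed to be
              precomputed per-item statistics (hypothetical values in the example)
    budget -- total cost that the selected set must not exceed

    Returns the list of selected item names in the order they were chosen.
    """
    chosen = []
    remaining = budget
    for name, _score, cost in sorted(items, key=lambda it: it[1], reverse=True):
        if cost <= remaining:  # skip items that would exceed the budget
            chosen.append(name)
            remaining -= cost
    return chosen


candidates = [("a", 0.9, 4), ("b", 0.8, 3), ("c", 0.5, 2), ("d", 0.4, 1)]
print(greedy_by_test_score(candidates, budget=5))
```

Because the rule only reads per-item scores, it never needs to evaluate the utility of a whole set, which is the practical appeal of score-based selection over methods that query a value oracle.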

翻訳日:2021-04-18 05:53:04 公開日:2020-12-30

PDF登録状況（公開日: 20201230）